From 82624418d5940ef628a51ca07aa222073fa9a388 Mon Sep 17 00:00:00 2001 From: lemon Date: Thu, 3 Mar 2022 10:11:01 +0100 Subject: change some things around --- content/b/_index.md | 6 - content/b/c99-vla-tricks.md | 308 -------------------------------------------- content/b/first.md | 15 --- 3 files changed, 329 deletions(-) delete mode 100644 content/b/_index.md delete mode 100644 content/b/c99-vla-tricks.md delete mode 100644 content/b/first.md (limited to 'content/b') diff --git a/content/b/_index.md b/content/b/_index.md deleted file mode 100644 index c504e1b..0000000 --- a/content/b/_index.md +++ /dev/null @@ -1,6 +0,0 @@ -+++ -title = "Blog" -sort_by = "date" -template = "blog.html" -page_template = "blog-page.html" -+++ diff --git a/content/b/c99-vla-tricks.md b/content/b/c99-vla-tricks.md deleted file mode 100644 index b2b07eb..0000000 --- a/content/b/c99-vla-tricks.md +++ /dev/null @@ -1,308 +0,0 @@ ---- -title: "C99 doesn't need function bodies, or 'VLAs are Turing complete'" -date: 2022-02-19 ---- - -Preface -======= - -The 1999 revision to the C programming language brought several interesting -changes and additions to the standard. Among those are variable length arrays, -or VLAs for short, which allow array types to have lengths that are determined -at runtime. These can show up in different contexts, but for the purposes of -this post we will be looking at VLAs as parameters to functions. For example: - - void f(int n, float v[n]); - -Since array parameters decay to their corresponding pointer type, that -declaration is functionally equivalent, and compatible ([6.7.5.3p7]) to the -following: - - void f(int n, float *v); - -The length of `v` in the first declaration is given by the value of a previous -parameter of the function, which can only be known at runtime. However, in a -function prototype (i.e. not a function definition) like the above, the length -of a VLA parameter is never computed and essentially ignored ([6.7.5.2p5]). -But in a function definition, the length is evaluated and must be greater than -zero. In practice this is rather useless because the sizeof operator doesn't -work as one might expect it to with array parameters ([6.5.3.4-note88]): - - void f(int n, float param[n]) { - float local[n]; - int a = sizeof local, - b = sizeof param; - - printf("n = %d\n", n); - printf("sizeof local = %d\n", a); // this will be n * sizeof(float) - printf("sizeof param = %d\n", b); // but this will be sizeof(float *) - } - -In other words, using sizeof with a VLA will evaluate the runtime-known length -and calculate the array size based on that, except when that VLA is a function -parameter, then, for whatever reason, it is the size of the corresponding -pointer type (in this example, `sizeof local === n * sizeof(float)` but `sizeof -param == sizeof(float *)`). The length of a VLA parameter may be used when e.g. -computing indices when accessing multi-dimensional variable length arrays. - -Alas, the standard mandates that the variable array length be computed when -the function is called. Of course, the expression in between the square -brackets is not limited to simple expressions like `n`, so one can write -something like: - - void f(int a, int b, int m[(a + b) / 2]) {} - -or - - void f(int x, int b[abs(x)]) {} - -or even - - void f(int v[getchar()]) {} - - -[6.7.5.3p7]: http://port70.net/~nsz/c/c99/n1256.html#6.7.5.3p7 -[6.7.5.2p5]: http://port70.net/~nsz/c/c99/n1256.html#6.7.5.2p5 -[6.5.3.4-note88]: http://port70.net/~nsz/c/c99/n1256.html#note88 - - -Disembodiment -============= - -The following program should give you an idea of the kind of constructs -that these rules allow for ([try it on compiler explorer][a]): - - int main(int argc, char *argv[printf("Hello")]) - { - printf(" world!\n"); - } - -The length expression of the VLA is evaluated before any of the statements in -`main`. I couldn't find anywhere in the standard saying whether this evaluation -order is well-defined but it is what clang and gcc do, and luckily, it does not -matter for the sake of this article, as we will see shortly. - -Let us refer to the subset of C99 where function bodies must be empty as -*disembodied C*. You might naturally ask yourself what things can be -accomplished in this subset (though you can probably guess the answer from the -title of this page). In other words, what can you do in C if you are limited to -just evaluating expressions and no statements? - -- Using the comma operator, we can sequence arbitrarily many expressions for - their side effects, so - - void f() { - printf("Press enter to confirm: "); - getchar(); - printf("thanks.\n"); - } - - becomes - - void f(char _[( - printf("Press enter to confirm: "), - getchar(), - printf("thanks.\n"), - 1 - )]) {} - - In a comma expression, the operands are evaluated left-to-right and the value - of the last operand is the resulting value of the whole expression. The `1` - at the end ensures that the evaluated array length is >0 (to avoid UB). - -- Functions in disembodied C are going to need a dummy parameter where the VLA - length expression evaluation can take place. For consistency, we will denote - it as `char _[...]` and give it the value `""` (the empty string) when calling - said functions (note that the value we give it doesn't actually matter, though - its size should be at least as big as the computed VLA size to avoid UB). - -- If-else statements can be replaced with ternary conditional expressions, such that - - void f(int n) { - if (n < 0) - printf("negative!"); - else if (n > 0) - printf("positive!"); - else - printf("zero!"); - } - - becomes - - void f(int n, char _[( - (n < 0) ? - printf("negative!") - : (n > 0) ? - printf("positive!") - : - printf("zero!") - , 1 - )]) {} - -- Remember that the VLA length expression can access previous function arguments, - so parameter passing is essentially unchanged. - -- We cannot return values, but we can use out parameters by taking advantage of - the fact that assignments are expressions in C, so instead of - - int imax(int a, int b) { - return a > b ? a : b; - } - - we can write - - void imax(int *out, int a, int b, char _[ - (*out = a > b ? a : b), - 1 - ]) {} - -- We cannot define local variables inside of expressions, but we can just add - extra function parameters to use as temporaries, rewriting - - void fswapf(float *a, float *b) { - float tmp = *a; - *a = *b; - *b = tmp; - } - - as - - static void fswapf_aux(float *a, float *b, float tmp, char _[( - tmp = *a, - *a = *b, - *b = tmp, - 1 - )]) {} - - void fswapf(float *a, float *b, char _[( - fswapf_aux(a, b, 0, ""), 1 - )]) {} - - Alternatively, if re-entrancy and thread-safety are disregarded, we could - just use global (static) variables. - -- What about loops? Clearly we cannot use `while` or `for` inside expressions. - Thankfully, they are unnecessary thanks to recursion. For example: - - float sum(float *v, size_t n) { - float sum = 0.0f; - for (size_t i = 0; i < n; ++i) - sum += v[i]; - return sum; - } - - can be expressed as - - /* the forward declaration is necessary */ - static void sum_aux(float *out, float *v, size_t n, char *); - static void sum_aux(float *out, float *v, size_t n, char _[( - (n > 0) ? ( - *out += *v, - sum_aux(out, v + 1, n - 1, ""), - 1 - ) : 1 - )]) {} - - void sum(float *out, float *v, size_t n, char _[( - *out = 0.0f, - sum_aux(out, v, n, ""), - 1 - )]) {} - - In fact, any imperative-style loop can be turned into an equivalent recursive - loop (as any functional programmer will be happy to demonstrate), though since - C lacks closures or any form of anonymous functions, it can get quite unwieldy - (I hope you like auxiliary functions). - - The astute reader might point out that these two versions of `sum` are not - equivalent because the recursive definition may cause a stack overflow for - large enough values of `n`. This is unfortunately true, and the major hurdle - for the practicality of disembodied C, but does not preclude Turing - completeness (an ideal [Turing machine] has infinite memory at its - disposal). Luckily, modern compilers are smart, and if we write our functions - carefully to be [tail recursive], they will often be able to perform tail - call elimination, removing the risk of a stack overflow. For the above - example, both gcc and clang are able to optimize the tail call ([see on - compiler explorer][b]). - -- Although not relevant to standard C99, in case you had the idea of using [gcc - statement expressions] to bypass these limitations, the compiler will stop - you right on your tracks since it doesn't allow those outside of function - bodies. - - -[Turing machine]: https://en.wikipedia.org/wiki/Turing_machine -[tail recursive]: https://en.wikipedia.org/wiki/Tail_call -[gcc statement expressions]: https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html -[a]: https://godbolt.org/z/f3chYMvh5 -[b]: https://godbolt.org/z/b1aG78os5 - - -Limitations -=========== - -After all of that, it seems that basically anything can be done in disembodied -C, so, is there something that *can't* be expressed in this C99 subset? - -With the rules that we've laid down so far, yes. In particular: - -- It is not possible in the general case to use APIs that require callbacks. - For example, look at the standard `qsort` function: - - void qsort(void *base, size_t nmemb, size_t size, - int (*compar)(const void *, const void *)); - - The `compar` parameter is a pointer to a function that should return the - result of comparing two values in the given array. Since its parameters are - void pointers, we can't have the VLA argument we need to perform computations - (arrays of `void` are not valid C), and we also cannot provide an `int` - return value. Thus there is no way (that I know of) to use this function in - standards-compliant disembodied C. - - This is not a dealbreaker since we could just reimplement `qsort` ;). - -- We cannot access the program arguments. The entry point of a program in - disembodied C looks like this: - - int main(int argc, char *argv[ /* code here ... */ ]) {} - - In the VLA length expression for `argv`, `argc` is in scope, but `argv` is - not. - - Fortunately there is a way to work around this: we can use the common - extension where `main` can take a third argument which is a pointer to the - environment. According to the standard ([J.5.1]), this is implemented in - hosted environments. In practice, POSIX environments and Microsoft both - support it. Then, our entry point looks like this instead: - - int main(int argc, char **argv, char *envp[/* ... */]) {} - - Now `argv` is in scope within the `envp` array length expression. - - Side note: even though we can't `return`, `main` is the exception to the rule - that reaching the closing `}` of a function returning non-void is verboten, - and is equivalent to returning zero ([5.1.2.2.3]). If we need to terminate - with an exit code other than zero, we can use `exit()`. - - -[J.5.1]: http://port70.net/~nsz/c/c99/n1256.html#J.5.1 -[5.1.2.2.3]: http://port70.net/~nsz/c/c99/n1256.html#5.1.2.2.3 - - -Final Words -=========== - -It seems that with enough effort anything can be done with these peculiar -restrictions. As an example, I wrote implementations of the [MD5] and [SHA256] -hash functions as well as [Conway's Game of Life using SDL for graphics][life]. -I was also working on a more interesting and "useful" program but I kind of -lost motivation halfway through. I'll update this article if I ever come back -to finish that. - -In case it wasn't clear enough, there is zero practical reason to ever do any -of this. I hope you'll agree it's a fun party trick though :). - - -[MD5]: https://git.sr.ht/~lsof/x/tree/main/item/vla/md5sum.c -[SHA256]: https://git.sr.ht/~lsof/x/tree/main/item/vla/sha256sum.c -[life]: https://git.sr.ht/~lsof/x/tree/main/item/vla/life.c diff --git a/content/b/first.md b/content/b/first.md deleted file mode 100644 index 291b364..0000000 --- a/content/b/first.md +++ /dev/null @@ -1,15 +0,0 @@ -+++ -title = "test post" -date = 2022-02-02 -+++ - -# test -lorem ipsum dolor sit amet - - u16 rng(u16 s){ - s^=s<<8,s=(s<<8 - |s>>8)^((s&0xff - )<<1),s=0x7e00^ - (s>>1)^(0x9e74& - -(~s&1));return - 46497^s?s:s^s;} -- cgit v1.2.3