From 82624418d5940ef628a51ca07aa222073fa9a388 Mon Sep 17 00:00:00 2001 From: lemon Date: Thu, 3 Mar 2022 10:11:01 +0100 Subject: change some things around --- content/b/_index.md | 6 - content/b/c99-vla-tricks.md | 308 -------------------------------------------- content/b/first.md | 15 --- content/w/_index.md | 6 + content/w/c99-vla-tricks.md | 308 ++++++++++++++++++++++++++++++++++++++++++++ content/w/first.md | 15 +++ static/mips.html | 4 + static/style.css | 11 ++ templates/blog-page.html | 5 +- templates/blog.html | 4 +- templates/index.html | 2 +- 11 files changed, 352 insertions(+), 332 deletions(-) delete mode 100644 content/b/_index.md delete mode 100644 content/b/c99-vla-tricks.md delete mode 100644 content/b/first.md create mode 100644 content/w/_index.md create mode 100644 content/w/c99-vla-tricks.md create mode 100644 content/w/first.md diff --git a/content/b/_index.md b/content/b/_index.md deleted file mode 100644 index c504e1b..0000000 --- a/content/b/_index.md +++ /dev/null @@ -1,6 +0,0 @@ -+++ -title = "Blog" -sort_by = "date" -template = "blog.html" -page_template = "blog-page.html" -+++ diff --git a/content/b/c99-vla-tricks.md b/content/b/c99-vla-tricks.md deleted file mode 100644 index b2b07eb..0000000 --- a/content/b/c99-vla-tricks.md +++ /dev/null @@ -1,308 +0,0 @@ ---- -title: "C99 doesn't need function bodies, or 'VLAs are Turing complete'" -date: 2022-02-19 ---- - -Preface -======= - -The 1999 revision to the C programming language brought several interesting -changes and additions to the standard. Among those are variable length arrays, -or VLAs for short, which allow array types to have lengths that are determined -at runtime. These can show up in different contexts, but for the purposes of -this post we will be looking at VLAs as parameters to functions. For example: - - void f(int n, float v[n]); - -Since array parameters decay to their corresponding pointer type, that -declaration is functionally equivalent, and compatible ([6.7.5.3p7]) to the -following: - - void f(int n, float *v); - -The length of `v` in the first declaration is given by the value of a previous -parameter of the function, which can only be known at runtime. However, in a -function prototype (i.e. not a function definition) like the above, the length -of a VLA parameter is never computed and essentially ignored ([6.7.5.2p5]). -But in a function definition, the length is evaluated and must be greater than -zero. In practice this is rather useless because the sizeof operator doesn't -work as one might expect it to with array parameters ([6.5.3.4-note88]): - - void f(int n, float param[n]) { - float local[n]; - int a = sizeof local, - b = sizeof param; - - printf("n = %d\n", n); - printf("sizeof local = %d\n", a); // this will be n * sizeof(float) - printf("sizeof param = %d\n", b); // but this will be sizeof(float *) - } - -In other words, using sizeof with a VLA will evaluate the runtime-known length -and calculate the array size based on that, except when that VLA is a function -parameter, then, for whatever reason, it is the size of the corresponding -pointer type (in this example, `sizeof local === n * sizeof(float)` but `sizeof -param == sizeof(float *)`). The length of a VLA parameter may be used when e.g. -computing indices when accessing multi-dimensional variable length arrays. - -Alas, the standard mandates that the variable array length be computed when -the function is called. Of course, the expression in between the square -brackets is not limited to simple expressions like `n`, so one can write -something like: - - void f(int a, int b, int m[(a + b) / 2]) {} - -or - - void f(int x, int b[abs(x)]) {} - -or even - - void f(int v[getchar()]) {} - - -[6.7.5.3p7]: http://port70.net/~nsz/c/c99/n1256.html#6.7.5.3p7 -[6.7.5.2p5]: http://port70.net/~nsz/c/c99/n1256.html#6.7.5.2p5 -[6.5.3.4-note88]: http://port70.net/~nsz/c/c99/n1256.html#note88 - - -Disembodiment -============= - -The following program should give you an idea of the kind of constructs -that these rules allow for ([try it on compiler explorer][a]): - - int main(int argc, char *argv[printf("Hello")]) - { - printf(" world!\n"); - } - -The length expression of the VLA is evaluated before any of the statements in -`main`. I couldn't find anywhere in the standard saying whether this evaluation -order is well-defined but it is what clang and gcc do, and luckily, it does not -matter for the sake of this article, as we will see shortly. - -Let us refer to the subset of C99 where function bodies must be empty as -*disembodied C*. You might naturally ask yourself what things can be -accomplished in this subset (though you can probably guess the answer from the -title of this page). In other words, what can you do in C if you are limited to -just evaluating expressions and no statements? - -- Using the comma operator, we can sequence arbitrarily many expressions for - their side effects, so - - void f() { - printf("Press enter to confirm: "); - getchar(); - printf("thanks.\n"); - } - - becomes - - void f(char _[( - printf("Press enter to confirm: "), - getchar(), - printf("thanks.\n"), - 1 - )]) {} - - In a comma expression, the operands are evaluated left-to-right and the value - of the last operand is the resulting value of the whole expression. The `1` - at the end ensures that the evaluated array length is >0 (to avoid UB). - -- Functions in disembodied C are going to need a dummy parameter where the VLA - length expression evaluation can take place. For consistency, we will denote - it as `char _[...]` and give it the value `""` (the empty string) when calling - said functions (note that the value we give it doesn't actually matter, though - its size should be at least as big as the computed VLA size to avoid UB). - -- If-else statements can be replaced with ternary conditional expressions, such that - - void f(int n) { - if (n < 0) - printf("negative!"); - else if (n > 0) - printf("positive!"); - else - printf("zero!"); - } - - becomes - - void f(int n, char _[( - (n < 0) ? - printf("negative!") - : (n > 0) ? - printf("positive!") - : - printf("zero!") - , 1 - )]) {} - -- Remember that the VLA length expression can access previous function arguments, - so parameter passing is essentially unchanged. - -- We cannot return values, but we can use out parameters by taking advantage of - the fact that assignments are expressions in C, so instead of - - int imax(int a, int b) { - return a > b ? a : b; - } - - we can write - - void imax(int *out, int a, int b, char _[ - (*out = a > b ? a : b), - 1 - ]) {} - -- We cannot define local variables inside of expressions, but we can just add - extra function parameters to use as temporaries, rewriting - - void fswapf(float *a, float *b) { - float tmp = *a; - *a = *b; - *b = tmp; - } - - as - - static void fswapf_aux(float *a, float *b, float tmp, char _[( - tmp = *a, - *a = *b, - *b = tmp, - 1 - )]) {} - - void fswapf(float *a, float *b, char _[( - fswapf_aux(a, b, 0, ""), 1 - )]) {} - - Alternatively, if re-entrancy and thread-safety are disregarded, we could - just use global (static) variables. - -- What about loops? Clearly we cannot use `while` or `for` inside expressions. - Thankfully, they are unnecessary thanks to recursion. For example: - - float sum(float *v, size_t n) { - float sum = 0.0f; - for (size_t i = 0; i < n; ++i) - sum += v[i]; - return sum; - } - - can be expressed as - - /* the forward declaration is necessary */ - static void sum_aux(float *out, float *v, size_t n, char *); - static void sum_aux(float *out, float *v, size_t n, char _[( - (n > 0) ? ( - *out += *v, - sum_aux(out, v + 1, n - 1, ""), - 1 - ) : 1 - )]) {} - - void sum(float *out, float *v, size_t n, char _[( - *out = 0.0f, - sum_aux(out, v, n, ""), - 1 - )]) {} - - In fact, any imperative-style loop can be turned into an equivalent recursive - loop (as any functional programmer will be happy to demonstrate), though since - C lacks closures or any form of anonymous functions, it can get quite unwieldy - (I hope you like auxiliary functions). - - The astute reader might point out that these two versions of `sum` are not - equivalent because the recursive definition may cause a stack overflow for - large enough values of `n`. This is unfortunately true, and the major hurdle - for the practicality of disembodied C, but does not preclude Turing - completeness (an ideal [Turing machine] has infinite memory at its - disposal). Luckily, modern compilers are smart, and if we write our functions - carefully to be [tail recursive], they will often be able to perform tail - call elimination, removing the risk of a stack overflow. For the above - example, both gcc and clang are able to optimize the tail call ([see on - compiler explorer][b]). - -- Although not relevant to standard C99, in case you had the idea of using [gcc - statement expressions] to bypass these limitations, the compiler will stop - you right on your tracks since it doesn't allow those outside of function - bodies. - - -[Turing machine]: https://en.wikipedia.org/wiki/Turing_machine -[tail recursive]: https://en.wikipedia.org/wiki/Tail_call -[gcc statement expressions]: https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html -[a]: https://godbolt.org/z/f3chYMvh5 -[b]: https://godbolt.org/z/b1aG78os5 - - -Limitations -=========== - -After all of that, it seems that basically anything can be done in disembodied -C, so, is there something that *can't* be expressed in this C99 subset? - -With the rules that we've laid down so far, yes. In particular: - -- It is not possible in the general case to use APIs that require callbacks. - For example, look at the standard `qsort` function: - - void qsort(void *base, size_t nmemb, size_t size, - int (*compar)(const void *, const void *)); - - The `compar` parameter is a pointer to a function that should return the - result of comparing two values in the given array. Since its parameters are - void pointers, we can't have the VLA argument we need to perform computations - (arrays of `void` are not valid C), and we also cannot provide an `int` - return value. Thus there is no way (that I know of) to use this function in - standards-compliant disembodied C. - - This is not a dealbreaker since we could just reimplement `qsort` ;). - -- We cannot access the program arguments. The entry point of a program in - disembodied C looks like this: - - int main(int argc, char *argv[ /* code here ... */ ]) {} - - In the VLA length expression for `argv`, `argc` is in scope, but `argv` is - not. - - Fortunately there is a way to work around this: we can use the common - extension where `main` can take a third argument which is a pointer to the - environment. According to the standard ([J.5.1]), this is implemented in - hosted environments. In practice, POSIX environments and Microsoft both - support it. Then, our entry point looks like this instead: - - int main(int argc, char **argv, char *envp[/* ... */]) {} - - Now `argv` is in scope within the `envp` array length expression. - - Side note: even though we can't `return`, `main` is the exception to the rule - that reaching the closing `}` of a function returning non-void is verboten, - and is equivalent to returning zero ([5.1.2.2.3]). If we need to terminate - with an exit code other than zero, we can use `exit()`. - - -[J.5.1]: http://port70.net/~nsz/c/c99/n1256.html#J.5.1 -[5.1.2.2.3]: http://port70.net/~nsz/c/c99/n1256.html#5.1.2.2.3 - - -Final Words -=========== - -It seems that with enough effort anything can be done with these peculiar -restrictions. As an example, I wrote implementations of the [MD5] and [SHA256] -hash functions as well as [Conway's Game of Life using SDL for graphics][life]. -I was also working on a more interesting and "useful" program but I kind of -lost motivation halfway through. I'll update this article if I ever come back -to finish that. - -In case it wasn't clear enough, there is zero practical reason to ever do any -of this. I hope you'll agree it's a fun party trick though :). - - -[MD5]: https://git.sr.ht/~lsof/x/tree/main/item/vla/md5sum.c -[SHA256]: https://git.sr.ht/~lsof/x/tree/main/item/vla/sha256sum.c -[life]: https://git.sr.ht/~lsof/x/tree/main/item/vla/life.c diff --git a/content/b/first.md b/content/b/first.md deleted file mode 100644 index 291b364..0000000 --- a/content/b/first.md +++ /dev/null @@ -1,15 +0,0 @@ -+++ -title = "test post" -date = 2022-02-02 -+++ - -# test -lorem ipsum dolor sit amet - - u16 rng(u16 s){ - s^=s<<8,s=(s<<8 - |s>>8)^((s&0xff - )<<1),s=0x7e00^ - (s>>1)^(0x9e74& - -(~s&1));return - 46497^s?s:s^s;} diff --git a/content/w/_index.md b/content/w/_index.md new file mode 100644 index 0000000..4ecec2e --- /dev/null +++ b/content/w/_index.md @@ -0,0 +1,6 @@ ++++ +title = "Writings" +sort_by = "date" +template = "blog.html" +page_template = "blog-page.html" ++++ diff --git a/content/w/c99-vla-tricks.md b/content/w/c99-vla-tricks.md new file mode 100644 index 0000000..b2b07eb --- /dev/null +++ b/content/w/c99-vla-tricks.md @@ -0,0 +1,308 @@ +--- +title: "C99 doesn't need function bodies, or 'VLAs are Turing complete'" +date: 2022-02-19 +--- + +Preface +======= + +The 1999 revision to the C programming language brought several interesting +changes and additions to the standard. Among those are variable length arrays, +or VLAs for short, which allow array types to have lengths that are determined +at runtime. These can show up in different contexts, but for the purposes of +this post we will be looking at VLAs as parameters to functions. For example: + + void f(int n, float v[n]); + +Since array parameters decay to their corresponding pointer type, that +declaration is functionally equivalent, and compatible ([6.7.5.3p7]) to the +following: + + void f(int n, float *v); + +The length of `v` in the first declaration is given by the value of a previous +parameter of the function, which can only be known at runtime. However, in a +function prototype (i.e. not a function definition) like the above, the length +of a VLA parameter is never computed and essentially ignored ([6.7.5.2p5]). +But in a function definition, the length is evaluated and must be greater than +zero. In practice this is rather useless because the sizeof operator doesn't +work as one might expect it to with array parameters ([6.5.3.4-note88]): + + void f(int n, float param[n]) { + float local[n]; + int a = sizeof local, + b = sizeof param; + + printf("n = %d\n", n); + printf("sizeof local = %d\n", a); // this will be n * sizeof(float) + printf("sizeof param = %d\n", b); // but this will be sizeof(float *) + } + +In other words, using sizeof with a VLA will evaluate the runtime-known length +and calculate the array size based on that, except when that VLA is a function +parameter, then, for whatever reason, it is the size of the corresponding +pointer type (in this example, `sizeof local === n * sizeof(float)` but `sizeof +param == sizeof(float *)`). The length of a VLA parameter may be used when e.g. +computing indices when accessing multi-dimensional variable length arrays. + +Alas, the standard mandates that the variable array length be computed when +the function is called. Of course, the expression in between the square +brackets is not limited to simple expressions like `n`, so one can write +something like: + + void f(int a, int b, int m[(a + b) / 2]) {} + +or + + void f(int x, int b[abs(x)]) {} + +or even + + void f(int v[getchar()]) {} + + +[6.7.5.3p7]: http://port70.net/~nsz/c/c99/n1256.html#6.7.5.3p7 +[6.7.5.2p5]: http://port70.net/~nsz/c/c99/n1256.html#6.7.5.2p5 +[6.5.3.4-note88]: http://port70.net/~nsz/c/c99/n1256.html#note88 + + +Disembodiment +============= + +The following program should give you an idea of the kind of constructs +that these rules allow for ([try it on compiler explorer][a]): + + int main(int argc, char *argv[printf("Hello")]) + { + printf(" world!\n"); + } + +The length expression of the VLA is evaluated before any of the statements in +`main`. I couldn't find anywhere in the standard saying whether this evaluation +order is well-defined but it is what clang and gcc do, and luckily, it does not +matter for the sake of this article, as we will see shortly. + +Let us refer to the subset of C99 where function bodies must be empty as +*disembodied C*. You might naturally ask yourself what things can be +accomplished in this subset (though you can probably guess the answer from the +title of this page). In other words, what can you do in C if you are limited to +just evaluating expressions and no statements? + +- Using the comma operator, we can sequence arbitrarily many expressions for + their side effects, so + + void f() { + printf("Press enter to confirm: "); + getchar(); + printf("thanks.\n"); + } + + becomes + + void f(char _[( + printf("Press enter to confirm: "), + getchar(), + printf("thanks.\n"), + 1 + )]) {} + + In a comma expression, the operands are evaluated left-to-right and the value + of the last operand is the resulting value of the whole expression. The `1` + at the end ensures that the evaluated array length is >0 (to avoid UB). + +- Functions in disembodied C are going to need a dummy parameter where the VLA + length expression evaluation can take place. For consistency, we will denote + it as `char _[...]` and give it the value `""` (the empty string) when calling + said functions (note that the value we give it doesn't actually matter, though + its size should be at least as big as the computed VLA size to avoid UB). + +- If-else statements can be replaced with ternary conditional expressions, such that + + void f(int n) { + if (n < 0) + printf("negative!"); + else if (n > 0) + printf("positive!"); + else + printf("zero!"); + } + + becomes + + void f(int n, char _[( + (n < 0) ? + printf("negative!") + : (n > 0) ? + printf("positive!") + : + printf("zero!") + , 1 + )]) {} + +- Remember that the VLA length expression can access previous function arguments, + so parameter passing is essentially unchanged. + +- We cannot return values, but we can use out parameters by taking advantage of + the fact that assignments are expressions in C, so instead of + + int imax(int a, int b) { + return a > b ? a : b; + } + + we can write + + void imax(int *out, int a, int b, char _[ + (*out = a > b ? a : b), + 1 + ]) {} + +- We cannot define local variables inside of expressions, but we can just add + extra function parameters to use as temporaries, rewriting + + void fswapf(float *a, float *b) { + float tmp = *a; + *a = *b; + *b = tmp; + } + + as + + static void fswapf_aux(float *a, float *b, float tmp, char _[( + tmp = *a, + *a = *b, + *b = tmp, + 1 + )]) {} + + void fswapf(float *a, float *b, char _[( + fswapf_aux(a, b, 0, ""), 1 + )]) {} + + Alternatively, if re-entrancy and thread-safety are disregarded, we could + just use global (static) variables. + +- What about loops? Clearly we cannot use `while` or `for` inside expressions. + Thankfully, they are unnecessary thanks to recursion. For example: + + float sum(float *v, size_t n) { + float sum = 0.0f; + for (size_t i = 0; i < n; ++i) + sum += v[i]; + return sum; + } + + can be expressed as + + /* the forward declaration is necessary */ + static void sum_aux(float *out, float *v, size_t n, char *); + static void sum_aux(float *out, float *v, size_t n, char _[( + (n > 0) ? ( + *out += *v, + sum_aux(out, v + 1, n - 1, ""), + 1 + ) : 1 + )]) {} + + void sum(float *out, float *v, size_t n, char _[( + *out = 0.0f, + sum_aux(out, v, n, ""), + 1 + )]) {} + + In fact, any imperative-style loop can be turned into an equivalent recursive + loop (as any functional programmer will be happy to demonstrate), though since + C lacks closures or any form of anonymous functions, it can get quite unwieldy + (I hope you like auxiliary functions). + + The astute reader might point out that these two versions of `sum` are not + equivalent because the recursive definition may cause a stack overflow for + large enough values of `n`. This is unfortunately true, and the major hurdle + for the practicality of disembodied C, but does not preclude Turing + completeness (an ideal [Turing machine] has infinite memory at its + disposal). Luckily, modern compilers are smart, and if we write our functions + carefully to be [tail recursive], they will often be able to perform tail + call elimination, removing the risk of a stack overflow. For the above + example, both gcc and clang are able to optimize the tail call ([see on + compiler explorer][b]). + +- Although not relevant to standard C99, in case you had the idea of using [gcc + statement expressions] to bypass these limitations, the compiler will stop + you right on your tracks since it doesn't allow those outside of function + bodies. + + +[Turing machine]: https://en.wikipedia.org/wiki/Turing_machine +[tail recursive]: https://en.wikipedia.org/wiki/Tail_call +[gcc statement expressions]: https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html +[a]: https://godbolt.org/z/f3chYMvh5 +[b]: https://godbolt.org/z/b1aG78os5 + + +Limitations +=========== + +After all of that, it seems that basically anything can be done in disembodied +C, so, is there something that *can't* be expressed in this C99 subset? + +With the rules that we've laid down so far, yes. In particular: + +- It is not possible in the general case to use APIs that require callbacks. + For example, look at the standard `qsort` function: + + void qsort(void *base, size_t nmemb, size_t size, + int (*compar)(const void *, const void *)); + + The `compar` parameter is a pointer to a function that should return the + result of comparing two values in the given array. Since its parameters are + void pointers, we can't have the VLA argument we need to perform computations + (arrays of `void` are not valid C), and we also cannot provide an `int` + return value. Thus there is no way (that I know of) to use this function in + standards-compliant disembodied C. + + This is not a dealbreaker since we could just reimplement `qsort` ;). + +- We cannot access the program arguments. The entry point of a program in + disembodied C looks like this: + + int main(int argc, char *argv[ /* code here ... */ ]) {} + + In the VLA length expression for `argv`, `argc` is in scope, but `argv` is + not. + + Fortunately there is a way to work around this: we can use the common + extension where `main` can take a third argument which is a pointer to the + environment. According to the standard ([J.5.1]), this is implemented in + hosted environments. In practice, POSIX environments and Microsoft both + support it. Then, our entry point looks like this instead: + + int main(int argc, char **argv, char *envp[/* ... */]) {} + + Now `argv` is in scope within the `envp` array length expression. + + Side note: even though we can't `return`, `main` is the exception to the rule + that reaching the closing `}` of a function returning non-void is verboten, + and is equivalent to returning zero ([5.1.2.2.3]). If we need to terminate + with an exit code other than zero, we can use `exit()`. + + +[J.5.1]: http://port70.net/~nsz/c/c99/n1256.html#J.5.1 +[5.1.2.2.3]: http://port70.net/~nsz/c/c99/n1256.html#5.1.2.2.3 + + +Final Words +=========== + +It seems that with enough effort anything can be done with these peculiar +restrictions. As an example, I wrote implementations of the [MD5] and [SHA256] +hash functions as well as [Conway's Game of Life using SDL for graphics][life]. +I was also working on a more interesting and "useful" program but I kind of +lost motivation halfway through. I'll update this article if I ever come back +to finish that. + +In case it wasn't clear enough, there is zero practical reason to ever do any +of this. I hope you'll agree it's a fun party trick though :). + + +[MD5]: https://git.sr.ht/~lsof/x/tree/main/item/vla/md5sum.c +[SHA256]: https://git.sr.ht/~lsof/x/tree/main/item/vla/sha256sum.c +[life]: https://git.sr.ht/~lsof/x/tree/main/item/vla/life.c diff --git a/content/w/first.md b/content/w/first.md new file mode 100644 index 0000000..291b364 --- /dev/null +++ b/content/w/first.md @@ -0,0 +1,15 @@ ++++ +title = "test post" +date = 2022-02-02 ++++ + +# test +lorem ipsum dolor sit amet + + u16 rng(u16 s){ + s^=s<<8,s=(s<<8 + |s>>8)^((s&0xff + )<<1),s=0x7e00^ + (s>>1)^(0x9e74& + -(~s&1));return + 46497^s?s:s^s;} diff --git a/static/mips.html b/static/mips.html index 219034c..90e66bd 100644 --- a/static/mips.html +++ b/static/mips.html @@ -86,6 +86,10 @@ hr {
+ +

MIPS Assembler

Input MIPS assembly code below. Labels, .org and simple .word directives are also recognized. Comments start with ';' and extend to the end of the line.

diff --git a/static/style.css b/static/style.css index a3046b6..77be7bc 100644 --- a/static/style.css +++ b/static/style.css @@ -37,6 +37,17 @@ p { text-align: justify; } +nav.navbar { + padding-top: 10px; +} + +nav.navbar > a { + padding: 1px 20px; + background: var(--pre-bg); + border: 1px solid var(--border1); + border-radius: 6px; +} + h1.title { line-height: 1.2; font-size: 40px; diff --git a/templates/blog-page.html b/templates/blog-page.html index b8263e0..ddf34f0 100644 --- a/templates/blog-page.html +++ b/templates/blog-page.html @@ -3,6 +3,10 @@ {% block title %}{{ page.title }}{% endblock title %} {% block content %} + +

{{ page.title }}

@@ -10,5 +14,4 @@
{{ page.content | safe }} -
Go back {% endblock content %} diff --git a/templates/blog.html b/templates/blog.html index afee41d..fb7ea21 100644 --- a/templates/blog.html +++ b/templates/blog.html @@ -3,6 +3,9 @@ {% block title %}lemon's site{% endblock title %} {% block content %} +

{{ section.title }}

@@ -11,5 +14,4 @@
  • {{ page.title }} ({{ page.date }})
  • {% endfor %} -
    Go back {% endblock content %} diff --git a/templates/index.html b/templates/index.html index d6c2755..3eee716 100644 --- a/templates/index.html +++ b/templates/index.html @@ -12,7 +12,7 @@ share some of that stuff here.