summaryrefslogtreecommitdiff
path: root/content/w/c99-vla-tricks.md
blob: 539ddd850c7a638d0dfcd8f238c38786fab07817 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
---
title: "C99 doesn't need function bodies, or 'VLAs are Turing complete'"
date: 2022-02-19
taxonomies:
  tags: ["programming", "c"]
---

Preface
=======

The 1999 revision to the C programming language brought several interesting
changes and additions to the standard. Among those are variable length arrays,
or VLAs for short, which allow array types to have lengths that are determined
at runtime. These can show up in different contexts, but for the purposes of
this post we will be looking at VLAs as parameters to functions. For example:

    void f(int n, float v[n]);

Since array parameters decay to their corresponding pointer type, that
declaration is functionally equivalent, and compatible ([6.7.5.3p7]) to the
following:

    void f(int n, float *v);

The length of `v` in the first declaration is given by the value of a previous
parameter of the function, which can only be known at runtime. However, in a
function prototype (i.e. not a function definition) like the above, the length
of a VLA parameter is never computed and essentially ignored ([6.7.5.2p5]).
But in a function definition, the length is evaluated and must be greater than
zero. In practice this is rather useless because the sizeof operator doesn't
work as one might expect it to with array parameters ([6.5.3.4-note88]):

    void f(int n, float param[n]) {
        float local[n];
        int a = sizeof local,
            b = sizeof param;

        printf("n = %d\n", n);
        printf("sizeof local = %d\n", a); // this will be n * sizeof(float)
        printf("sizeof param = %d\n", b); // but this will be sizeof(float *)
    }

In other words, using sizeof with a VLA will evaluate the runtime-known length
and calculate the array size based on that, except when that VLA is a function
parameter, then, for whatever reason, it is the size of the corresponding
pointer type (in this example, `sizeof local === n * sizeof(float)` but `sizeof
param == sizeof(float *)`). The length of a VLA parameter may be used when e.g.
computing indices when accessing multi-dimensional variable length arrays.

Alas, the standard mandates that the variable array length be computed when
the function is called. Of course, the expression in between the square
brackets is not limited to simple expressions like `n`, so one can write
something like:

    void f(int a, int b, int m[(a + b) / 2]) {}

or

    void f(int x, int b[abs(x)]) {}

or even

    void f(int v[getchar()]) {}


[6.7.5.3p7]: http://port70.net/~nsz/c/c99/n1256.html#6.7.5.3p7
[6.7.5.2p5]: http://port70.net/~nsz/c/c99/n1256.html#6.7.5.2p5
[6.5.3.4-note88]: http://port70.net/~nsz/c/c99/n1256.html#note88


Disembodiment
=============

The following program should give you an idea of the kind of constructs
that these rules allow for ([try it on compiler explorer][a]):

    int main(int argc, char *argv[printf("Hello")])
    {
        printf(" world!\n");
    }

The length expression of the VLA is evaluated before any of the statements in
`main`. I couldn't find anywhere in the standard saying whether this evaluation
order is well-defined but it is what clang and gcc do, and luckily, it does not
matter for the sake of this article, as we will see shortly.

Let us refer to the subset of C99 where function bodies must be empty as
*disembodied C*. You might naturally ask yourself what things can be
accomplished in this subset (though you can probably guess the answer from the
title of this page). In other words, what can you do in C if you are limited to
just evaluating expressions and no statements?

- Using the comma operator, we can sequence arbitrarily many expressions for
  their side effects, so

      void f() {
          printf("Press enter to confirm: ");
          getchar();
          printf("thanks.\n");
      }

  becomes

      void f(char _[(
          printf("Press enter to confirm: "),
          getchar(),
          printf("thanks.\n"),
          1
      )]) {}

  In a comma expression, the operands are evaluated left-to-right and the value
  of the last operand is the resulting value of the whole expression. The `1`
  at the end ensures that the evaluated array length is >0 (to avoid UB).

- Functions in disembodied C are going to need a dummy parameter where the VLA
  length expression evaluation can take place. For consistency, we will denote
  it as `char _[...]` and give it the value `""` (the empty string) when calling
  said functions (note that the value we give it doesn't actually matter, though
  its size should be at least as big as the computed VLA size to avoid UB).

- If-else statements can be replaced with ternary conditional expressions, such that

      void f(int n) {
          if (n < 0)
              printf("negative!");
          else if (n > 0)
              printf("positive!");
          else
              printf("zero!");
      }

  becomes

      void f(int n, char _[(
          (n < 0) ?
              printf("negative!")
          : (n > 0) ?
              printf("positive!")
          :
              printf("zero!")
          , 1
      )]) {}

- Remember that the VLA length expression can access previous function arguments,
  so parameter passing is essentially unchanged.

- We cannot return values, but we can use out parameters by taking advantage of
  the fact that assignments are expressions in C, so instead of

      int imax(int a, int b) {
          return a > b ? a : b;
      }

  we can write

      void imax(int *out, int a, int b, char _[
          (*out = a > b ? a : b),
          1
      ]) {}

- We cannot define local variables inside of expressions, but we can just add
  extra function parameters to use as temporaries, rewriting

      void fswapf(float *a, float *b) {
          float tmp = *a;
          *a = *b;
          *b = tmp;
      }

  as

      static void fswapf_aux(float *a, float *b, float tmp, char _[(
          tmp = *a,
          *a = *b,
          *b = tmp,
          1
      )]) {}

      void fswapf(float *a, float *b, char _[(
          fswapf_aux(a, b, 0, ""), 1
      )]) {}

  Alternatively, if re-entrancy and thread-safety are disregarded, we could
  just use global (static) variables.

- What about loops? Clearly we cannot use `while` or `for` inside expressions.
  Thankfully, they are unnecessary thanks to recursion. For example:

      float sum(float *v, size_t n) {
          float sum = 0.0f;
          for (size_t i = 0; i < n; ++i)
              sum += v[i];
          return sum;
      }

  can be expressed as

      /* the forward declaration is necessary */
      static void sum_aux(float *out, float *v, size_t n, char *);
      static void sum_aux(float *out, float *v, size_t n, char _[(
          (n > 0) ? (
              *out += *v,
              sum_aux(out, v + 1, n - 1, ""),
              1
          ) : 1
      )]) {}

      void sum(float *out, float *v, size_t n, char _[(
          *out = 0.0f,
          sum_aux(out, v, n, ""),
          1
      )]) {}

  In fact, any imperative-style loop can be turned into an equivalent recursive
  loop (as any functional programmer will be happy to demonstrate), though since
  C lacks closures or any form of anonymous functions, it can get quite unwieldy
  (I hope you like auxiliary functions).

  The astute reader might point out that these two versions of `sum` are not
  equivalent because the recursive definition may cause a stack overflow for
  large enough values of `n`. This is unfortunately true, and the major hurdle
  for the practicality of disembodied C, but does not preclude Turing
  completeness (an ideal [Turing machine] has infinite memory at its
  disposal). Luckily, modern compilers are smart, and if we write our functions
  carefully to be [tail recursive], they will often be able to perform tail
  call elimination, removing the risk of a stack overflow. For the above
  example, both gcc and clang are able to optimize the tail call ([see on
  compiler explorer][b]).

- Although not relevant to standard C99, in case you had the idea of using [gcc
  statement expressions] to bypass these limitations, the compiler will stop
  you right on your tracks since it doesn't allow those outside of function
  bodies.


[Turing machine]: https://en.wikipedia.org/wiki/Turing_machine
[tail recursive]: https://en.wikipedia.org/wiki/Tail_call
[gcc statement expressions]: https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html
[a]: https://godbolt.org/z/f3chYMvh5
[b]: https://godbolt.org/z/b1aG78os5


Limitations
===========

After all of that, it seems that basically anything can be done in disembodied
C, so, is there something that *can't* be expressed in this C99 subset?

With the rules that we've laid down so far, yes. In particular:

- It is not possible in the general case to use APIs that require callbacks.
  For example, look at the standard `qsort` function:

       void qsort(void *base, size_t nmemb, size_t size,
                  int (*compar)(const void *, const void *));
  
  The `compar` parameter is a pointer to a function that should return the
  result of comparing two values in the given array. Since its parameters are
  void pointers, we can't have the VLA argument we need to perform computations
  (arrays of `void` are not valid C), and we also cannot provide an `int`
  return value. Thus there is no way (that I know of) to use this function in
  standards-compliant disembodied C.

  This is not a dealbreaker since we could just reimplement `qsort` ;).

- We cannot access the program arguments. The entry point of a program in
  disembodied C looks like this:

      int main(int argc, char *argv[ /* code here ... */ ]) {}

  In the VLA length expression for `argv`, `argc` is in scope, but `argv` is
  not.

  Fortunately there is a way to work around this: we can use the common
  extension where `main` can take a third argument which is a pointer to the
  environment. According to the standard ([J.5.1]), this is implemented in
  hosted environments. In practice, POSIX environments and Microsoft both
  support it. Then, our entry point looks like this instead:

      int main(int argc, char **argv, char *envp[/* ... */]) {}

  Now `argv` is in scope within the `envp` array length expression.

  Side note: even though we can't `return`, `main` is the exception to the rule
  that reaching the closing `}` of a function returning non-void is verboten,
  and is equivalent to returning zero ([5.1.2.2.3]). If we need to terminate
  with an exit code other than zero, we can use `exit()`.


[J.5.1]: http://port70.net/~nsz/c/c99/n1256.html#J.5.1
[5.1.2.2.3]: http://port70.net/~nsz/c/c99/n1256.html#5.1.2.2.3


Final Words
===========

It seems that with enough effort anything can be done with these peculiar
restrictions. As an example, I wrote implementations of the [MD5] and [SHA256]
hash functions as well as [Conway's Game of Life using SDL for graphics][life].
I was also working on a more interesting and "useful" program but I kind of
lost motivation halfway through. I'll update this article if I ever come back
to finish that.

In case it wasn't clear enough, there is zero practical reason to ever do any
of this. I hope you'll agree it's a fun party trick though :).


[MD5]: https://git.sr.ht/~lsof/x/tree/main/item/vla/md5sum.c
[SHA256]: https://git.sr.ht/~lsof/x/tree/main/item/vla/sha256sum.c
[life]: https://git.sr.ht/~lsof/x/tree/main/item/vla/life.c