| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | .editorconfig | 4 |
| -rw-r--r-- | .gitignore | 1 |
| -rw-r--r-- | README.md | 160 |
| -rwxr-xr-x | bootstrap.sh | 2 |
| -rw-r--r-- | doc/cstd.md | 36 |
| -rw-r--r-- | main.c | 16 |
6 files changed, 196 insertions, 23 deletions
diff --git a/.editorconfig b/.editorconfig
index 5697a97..ec3b152 100644
--- a/.editorconfig
+++ b/.editorconfig
@@ -12,3 +12,7 @@ indent_size = 3
 [Makefile]
 indent_style = tab
+
+[*.md]
+indent_style = space
+indent_size = 2
diff --git a/.gitignore b/.gitignore
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,5 @@
 hostconfig.h
+config.mk
 antcc
 antcc0
 antcc1
diff --git a/README.md b/README.md
--- a/README.md
+++ b/README.md
@@ -1,27 +1,155 @@
-`antcc` is a C compiler using its own custom backend. Currently still in a
-experimental stage, but can compile successfully some real-world C codebases (e.g. lua, sqlite3, oksh).
+`antcc` is a small C compiler using its own independent backend.
-Report bugs in the [issue tracker](https://codeberg.org/lsof/antcc/issues), or
-by sending me an email.
+Supports [most of C11 and some C23 features](doc/cstd.md), as well as some GNU extensions.
-# Supported targets
+Currently still in an experimental stage, but can successfully build some
+real-world C codebases such as Lua, SQLite3 and oksh (and itself).
-- For now just x86-64 POSIX (Sys-V + ELF). Only tested on linux so far.
+`antcc` is inspired by other small C compilers like
+[TCC](https://bellard.org/tcc/),
+[cproc](https://git.sr.ht/~mcf/cproc),
+[chibicc](https://github.com/rui314/chibicc),
+and backends like [QBE](https://c9x.me/compile/) and [LLVM](https://llvm.org/).
-# Building
+## Requirements
-```
-./configure
+`antcc` is written in standard C11 and can be built with any conforming
+compiler toolchain. The `Makefile` requires GNU Make. At runtime, an existing
+C compiler is currently required for calling the linker with the appropriate
+libc runtime paths (eventually, the driver should only depend on the linker by
+determining those linker paths and flags at `configure` time).
+
+## Building
+
+Run `./configure` to create `hostconfig.h` and `config.mk` for your system.
+
+Build with
-make # outputs ./antcc executable
-# or
-make opt #compile with optimizations
-# or
+```
+make
+#or
+make opt #compile with -O2
+#or
 make dbg #compile with UBsan and Asan
 ```
-# Usage
+Install with `(sudo) make install`.
+
+## Supported targets
+
+For now just x86-64 POSIX (Sys-V + ELF). aarch64 backend is in the works. Tested and known to work:
-The driver is still incomplete but it mimics that of compilers like gcc.
+ - `x86_64-linux-gnu`
+ - `x86_64-linux-musl`
+
+## Usage
+
+The driver is still incomplete but it mimics that of compilers like gcc, see `--help`.
 `antcc` compiles translation units to object files directly, but the driver
-will invoke an external linker command to output an executable if `-c` isn't passed.
+will invoke an external command to link to an executable if `-c` isn't passed.
+
+Cross-compilation is partially supported: cross-compiling object files works
+but an external cross-compiling toolchain for linking is required; the driver
+will try to find one (invoking e.g. `aarch64-linux-gnu-gcc`, or falling back
+to [`zig cc`](https://andrewkelley.me/post/zig-cc-powerful-drop-in-replacement-gcc-clang.html)),
+and appropriate include paths must be manually specified. You can specify the compiler target architecture with `-target <triple>`.
+
+## Testing
+
+`bootstrap.sh` will bootstrap the compiler in 3 stages:
+ - Stage 0 builds the compiler with the system's C compiler
+ - Stage 1 builds the compiler with the stage 0 output
+ - Stage 2 builds the compiler with the stage 1 output
+ - Then stage 1 and 2 outputs are verified to be identical
+
+There are tests in the `test` directory:
+ - `test/run.sh`: local tests
+ - `test/lua.sh`: compile Lua 5.4.0 and run its testsuite
+ - `test/c-testsuite.sh`: run [c-testsuite](https://github.com/c-testsuite/c-testsuite)
+
+## Issues and contributing
+
+You can report issues on the [issue tracker](https://codeberg.org/lsof/antcc/issues).
+
+Contributions are welcome, send as pull requests [on Codeberg](https://codeberg.org/lsof/antcc/pulls).
+
+## Internals & Design
+
+C type representation (`type.h` & `type.c`) is shared by the frontend and
+backend because the backend is responsible for ABI-specific lowering of calling
+conventions.
+
+The C frontend is structured like so:
+ - Compiler driver (`main.c`), which parses command line options, inputs and
+   outputs and calls out to the core compiler to build individual object
+   files and possibly invoke an external command to link them together.
+
+ - Tokenizer & preprocessor (`c/lex.c`): The input file is scanned on-demand,
+   initially reading characters into an internal buffer after performing
+   backslash-newline deletion (and possibly trigraph substitution), then producing
+   one token at a time when the parser requests the next one. Preprocessing
+   (directives & macro expansion) is also done on the fly.
+
+ - Parser & IR generation (`c/c.c`): The handwritten parser reads declarations
+   and keeps them in a symbol table/environment. Static data is written to
+   buffers that correspond to the .rodata/.data sections of the final object
+   file, emitting relocations to the object file interface too. Function
+   bodies are parsed and transformed into the IR in one pass. Expressions are
+   parsed into expression trees before being emitted or compile-time evaluated
+   (`c/eval.c`), but there is no whole-program AST. When the end of a
+   function definition is reached, the backend is called to perform all of the
+   passes that will finally transform it into machine code written to the
+   .text section.
+
+The backend (`ir/*`) uses an IR in Static Single Assignment (SSA) form.
+Instructions have a return type and up to two operands. Because of SSA form,
+temporaries are simply referenced by the instruction that provides their
+definition, so an explicit output operand is not required. The list of
+instructions is defined in `ir/op.def`. Each basic block in the control flow
+graph consists of 0 or more phi functions, followed by 0 or more instructions,
+terminated by a jump (unconditional/conditional branch, return, or trap).
+
+The builder API (`ir/builder.c`) used by the frontend performs peephole
+optimizations on the fly, mainly constant folding.
+
+Object file interface routines are in `obj/obj.[c/h]`, ELF implementation in
+`obj/elf.[c/h]`. Support for other object formats like PE and Mach-O is planned.
+Debug information in the form of DWARF is also planned, but it is a sizeable
+undertaking.
+
+The `-d...` compiler flag can be used to print the output of different stages
+of the backend for debugging.
+
+The backend performs the following main passes:
+ - ABI lowering (`ir/abi0.c`, `x86_64/sysv.c`): implements target calling
+   convention details, such as lowering structures being passed/returned by
+   value in registers or the stack.
+ - Intrinsics lowering (`ir/intrin.c`): lowers some intrinsics emitted by the
+   frontend (currently just structcopy)
+ - mem2reg (`ir/mem2reg.c`): lower stack slots into SSA temporaries. This is
+   an important pass because the frontend puts every C variable into a stack
+   slot, and this pass transforms those into temporaries and phi instructions
+   in SSA form instructions when possible (most of the time, unless they are
+   aggregates or their address is taken), which is also how clang/LLVM does
+   it. Can be disabled with -O0.
+ - With -O1+ optimizations enabled
+   + inlining (`ir/inliner.c`)
+   + common-subexpression elimination (`ir/cse.c`),
+   + general arithmetic simplifications, branch simplification
+     (`ir/simpl.c`)
+
+ - Stack lowering (`ir/stack.c`): `alloca` instructions are deleted and
+   corresponding stack slots replaced with calculated stack offsets.
+ - Instruction selection (`ir/isel.c`, `x86_64/isel.c`): architecture-specific
+   instruction selection, addressing mode utilization, introduction of
+   register constraints.
+ - Register allocation (`ir/regalloc.c`): performs linear scan register
+   allocation. Spilling has a lot of room for improvement, at the moment the
+   current interval is spilled when there are no free registers, with no other
+   heuristics. A scratch register is reserved for operations with spilled
+   temporaries.
+ - Code emission (`x86_64/emit.c`): binary code for the target architecture is
+   emitted directly (not textual assembly). Relocations are deferred to the
+   object file interface too.
+
+[ ... ]
diff --git a/bootstrap.sh b/bootstrap.sh
index 1761ac5..b2f3fa9 100755
--- a/bootstrap.sh
+++ b/bootstrap.sh
@@ -27,3 +27,5 @@ echo
 echo "== Stage 2 (compiling with stage 1 output) =="
 X ./antcc1 $opt $cflags -o antcc2 $src
 X md5sum antcc2
+
+(X cmp antcc1 antcc2) && echo ok. || (echo 'bootstrap FAIL!'; exit 1)
diff --git a/doc/cstd.md b/doc/cstd.md
new file mode 100644
index 0000000..63e39cd
--- /dev/null
+++ b/doc/cstd.md
@@ -0,0 +1,36 @@
+A list of missing standard C features:
+
+## ANSI/C89
+ - K&R style function definitions with type declarations: `h(a, b) int a; double b; { ... }`
+
+## C99
+ - Variable-length arrays (VLAs)
+ - Proper `long double` support in platforms with extended floats (currently equivalent to `double`)
+ - `complex` types, `<tgmath.h>` header
+ - digraphs
+ - Universal character names (`\uXXXX`, `\UXXXXXXXX`)
+ - IEEE 754 float support Annex F IEC 60559 (`FLT_EVAL_METHOD`, `FENV_ACCESS` pragma) (not even GCC or Clang care about this)
+
+## C11
+ - `_Alignas`, `max_align_t`
+ - Multithreading support (`_Thread_local`, atomics)
+ - `u8".."` string literals
+
+## C23
+ - Decimal floating-point types (`_Decimal32`, `_Decimal64`, and `_Decimal128`)
+ - Bit-precise integers (`_BitInt`)
+ - `nullptr`, `nullptr_t`
+ - Binary integer constants
+ - `char8_t`
+ - Digit separator '
+ - Attributes (`[[...]]` syntax)
+ - Labels followed by declarations and }
+ - `true` and `false` keywords
+ - `auto` for type inference, `typeof_unqual`, `constexpr`
+ - `unreachable` macro in `<stddef.h>`
+ - checked int arithmetic (`<stdckdint.h>`)
+ - tagged type compatibility [N3037](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3037.pdf)
+ - storage class specifier for compound literals
+ - preprocessor `#embed` directive, `__has_include`, `__has_c_attribute`
+ - Pragmas for float rounding direction (`STDC FENV_ROUND`, `STDC FENV_DEC_ROUND`)
+ - IEEE 754 decimal floats, interchange and extended types. various `<float.h>` macros,...
diff --git a/main.c b/main.c
--- a/main.c
+++ b/main.c
@@ -219,10 +219,11 @@ optparse(char **args)
       } else if (*arg == 'g') {
          /* TODO debug info */
       } else if (*arg == 'O') {
-         if (!arg[1]) ccopt.o = 0; /* default opts */
-         else if (arg[1] == '1') ccopt.o = OPT1;
-         else if ((uint)arg[1] - '1' < 9) ccopt.o = OPT2;
-         else if (arg[1] == '0') ccopt.o = OPT0;
+         char o = arg[1];
+         if (!o || o == 'g') ccopt.o = 0; /* default opts */
+         else if (o == '1' || o == 's' || o == 'z') ccopt.o = OPT1;
+         else if ((uint)o - '1' < 9) ccopt.o = OPT2;
+         else if (o == '0') ccopt.o = OPT0;
          else goto Bad;
       } else if (*arg == 'D' || *arg == 'U') {
          void cpppredef(bool undef, const char *cmd);
@@ -646,7 +647,8 @@ sysinclpaths(void)
 static void
 prihelp(void)
 {
-   pfmt("Usage: antcc [options] infile(s)...\n"
+   pfmt("antcc version "ANTCC_VERSION_STR"\n"
+        "Usage: antcc [options] infile(s)...\n"
         " antcc [options] -run infile [arguments...]\n"
         " antcc [options] infile(s)... -run [arguments...]\n"
         "Options:\n"
@@ -665,8 +667,8 @@ prihelp(void)
         " -llib \tLink with library\n"
         " -fpie \tEmit code for position independent executable\n"
         " -fpic \tEmit position independent code\n"
-        " -O[0|1..] \tSet optimization level\n"
-        " -x[c|o] \tSpecify type of next input file (C, object)\n"
+        " -O<..> \tSet optimization level (0|g|1|2|s|z) (default: -Og)\n"
+        " -x<c|o> \tSpecify type of next input file (C, object)\n"
         " -W[...] \tTurn on warnings (stub)\n"
         " -Werror \tTurn warnings into errors\n"
         " -w \tSuppress warnings\n"
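The `-O` handling changed in the `optparse` hunk can be read as a small mapping from the character after `-O` to an optimization level. The following is a minimal standalone sketch of that mapping, not the code from `main.c`: `enum olevel`, `OLEVEL_BAD`, and `parse_olevel` are hypothetical names standing in for `ccopt.o`, `OPT0`/`OPT1`/`OPT2`, and the `goto Bad` path. It assumes, as the diff suggests, that the default level (`0`) is distinct from `OPT0`, and like the diff it only inspects the first character after `-O`.

```c
/* Hypothetical stand-ins for the values the diff stores in ccopt.o. */
enum olevel {
   OLEVEL_BAD = -1,     /* corresponds to the `goto Bad` path */
   OLEVEL_DEFAULT = 0,  /* plain -O or -Og ("default opts") */
   OLEVEL_0,            /* -O0 */
   OLEVEL_1,            /* -O1, -Os, -Oz */
   OLEVEL_2             /* -O2 through -O9 */
};

/* Map the text following "-O" (e.g. "", "g", "2") to a level. */
enum olevel
parse_olevel(const char *suffix)
{
   char o = suffix ? suffix[0] : '\0';

   if (!o || o == 'g') return OLEVEL_DEFAULT;
   if (o == '1' || o == 's' || o == 'z') return OLEVEL_1;
   /* Unsigned wraparound makes this range test true only for '1'..'9';
      '1' was handled above, so it effectively covers '2'..'9'. */
   if ((unsigned)o - '1' < 9u) return OLEVEL_2;
   if (o == '0') return OLEVEL_0;
   return OLEVEL_BAD;
}
```

Under these assumptions, `parse_olevel("s")` and `parse_olevel("z")` both yield `OLEVEL_1`, mirroring how the new code treats `-Os` and `-Oz` like `-O1`, while any unrecognized suffix maps to the error value.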