author lemon <lsof@mailbox.org> 2026-02-21 12:35:55 +0100
committer lemon <lsof@mailbox.org> 2026-02-21 12:35:55 +0100
commit 8aa0c6a5526a69b7f1b992990db59b275dfc2d80
tree fbe2e86eefa82ffc681445e991998f9b12a521f7 /README.md
parent 8dfd823f9f8b4d9bc43af473ab2758b3f1cc2012
project documentation
Diffstat (limited to 'README.md')
-rw-r--r-- README.md 160
1 files changed, 144 insertions, 16 deletions
diff --git a/README.md b/README.md
index d0bfcb7..c0db173 100644
--- a/README.md
+++ b/README.md
@@ -1,27 +1,155 @@
-`antcc` is a C compiler using its own custom backend. Currently still in a
-experimental stage, but can compile successfully some real-world C codebases (e.g. lua, sqlite3, oksh).
+`antcc` is a small C compiler using its own independent backend.
-Report bugs in the [issue tracker](https://codeberg.org/lsof/antcc/issues), or
-by sending me an email.
+Supports [most of C11 and some C23 features](doc/cstd.md), as well as some GNU extensions.
-# Supported targets
+Currently still in an experimental stage, but can successfully build some
+real-world C codebases such as Lua, SQLite3 and oksh (and itself).
-- For now just x86-64 POSIX (Sys-V + ELF). Only tested on linux so far.
+`antcc` is inspired by other small C compilers like
+[TCC](https://bellard.org/tcc/),
+[cproc](https://git.sr.ht/~mcf/cproc),
+[chibicc](https://github.com/rui314/chibicc),
+and backends like [QBE](https://c9x.me/compile/) and [LLVM](https://llvm.org/).
-# Building
+## Requirements
-```
-./configure
+`antcc` is written in standard C11 and can be built with any conforming
+compiler toolchain. The `Makefile` requires GNU Make. At runtime, an existing
+C compiler is currently required for calling the linker with the appropriate
+libc runtime paths (eventually, the driver should only depend on the linker by
+determining those linker paths and flags at `configure` time).
+
+## Building
+
+Run `./configure` to create `hostconfig.h` and `config.mk` for your system.
+
+Build with
-make # outputs ./antcc executable
-# or
-make opt #compile with optimizations
-# or
+```
+make
+#or
+make opt #compile with -O2
+#or
make dbg #compile with UBsan and Asan
```
-# Usage
+Install with `(sudo) make install`.
+
+## Supported targets
+
+For now just x86-64 POSIX (Sys-V + ELF). An aarch64 backend is in the works. Tested and known to work:
-The driver is still incomplete but it mimics that of compilers like gcc.
+ - `x86_64-linux-gnu`
+ - `x86_64-linux-musl`
+
+## Usage
+
+The driver is still incomplete but it mimics that of compilers like gcc; see `--help`.
`antcc` compiles translation units to object files directly, but the driver
-will invoke an external linker command to output an executable if `-c` isn't passed.
+will invoke an external command to link to an executable if `-c` isn't passed.
+
+Cross-compilation is partially supported: cross-compiling object files works
+but an external cross-compiling toolchain for linking is required; the driver
+will try to find one (invoking e.g. `aarch64-linux-gnu-gcc`, or falling back
+to [`zig cc`](https://andrewkelley.me/post/zig-cc-powerful-drop-in-replacement-gcc-clang.html)),
+and appropriate include paths must be manually specified. You can specify the compiler target architecture with `-target <triple>`.
+
+## Testing
+
+`bootstrap.sh` will bootstrap the compiler in 3 stages:
+ - Stage 0 builds the compiler with the system's C compiler
+ - Stage 1 builds the compiler with the stage 0 output
+ - Stage 2 builds the compiler with the stage 1 output
+ - Then stage 1 and 2 outputs are verified to be identical
+
+There are tests in the `test` directory:
+ - `test/run.sh`: local tests
+ - `test/lua.sh`: compile Lua 5.4.0 and run its testsuite
+ - `test/c-testsuite.sh`: run [c-testsuite](https://github.com/c-testsuite/c-testsuite)
+
+## Issues and contributing
+
+You can report issues on the [issue tracker](https://codeberg.org/lsof/antcc/issues).
+
+Contributions are welcome; send them as pull requests [on Codeberg](https://codeberg.org/lsof/antcc/pulls).
+
+## Internals & Design
+
+C type representation (`type.h` & `type.c`) is shared by the frontend and
+backend because the backend is responsible for ABI-specific lowering of calling
+conventions.
+
+The C frontend is structured like so:
+ - Compiler driver (`main.c`), which parses command line options, inputs and
+ outputs and calls out to the core compiler to build individual object
+ files and possibly invoke an external command to link them together.
+
+ - Tokenizer & preprocessor (`c/lex.c`): The input file is scanned on-demand,
+ initially reading characters into an internal buffer after performing
+ backslash-newline deletion (and possibly trigraph substitution), then producing
+ one token at a time when the parser requests the next one. Preprocessing
+ (directives & macro expansion) is also done on the fly.
+
+ - Parser & IR generation (`c/c.c`): The handwritten parser reads declarations
+ and keeps them in a symbol table/environment. Static data is written to
+ buffers that correspond to the .rodata/.data sections of the final object
+ file, emitting relocations to the object file interface too. Function
+ bodies are parsed and transformed into the IR in one pass. Expressions are
+ parsed into expression trees before being emitted or compile-time evaluated
+ (`c/eval.c`), but there is no whole-program AST. When the end of a
+ function definition is reached, the backend is called to perform all of the
+ passes that will finally transform it into machine code written to the
+ .text section.
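
The backslash-newline deletion step mentioned for the tokenizer can be sketched as below. This is only an illustration: `splice_lines` is a hypothetical helper, not a function from `c/lex.c`, and it assumes the whole input is already in memory as a NUL-terminated string with `\n` line endings, whereas the real lexer fills its buffer on demand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not antcc's actual code: copy `src` into `dst`,
 * dropping every backslash that is immediately followed by a newline,
 * together with that newline. `dst` needs room for the length of `src`
 * plus the terminating NUL. Returns the length of the result. */
static size_t splice_lines(char *dst, const char *src)
{
    size_t n = 0;

    assert(dst != NULL && src != NULL);
    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '\n') {
            src += 2;               /* delete the backslash-newline pair */
            continue;
        }
        dst[n++] = *src++;          /* everything else is copied verbatim */
    }
    dst[n] = '\0';
    return n;
}
```

With this sketch, a macro definition continued over two physical lines reaches the tokenizer as one logical line, so a directive ends at the first newline that is not preceded by a backslash.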
+
+The backend (`ir/*`) uses an IR in Static Single Assignment (SSA) form.
+Instructions have a return type and up to two operands. Because of SSA form,
+temporaries are simply referenced by the instruction that provides their
+definition, so an explicit output operand is not required. The list of
+instructions is defined in `ir/op.def`. Each basic block in the control flow
+graph consists of 0 or more phi functions, followed by 0 or more instructions,
+terminated by a jump (unconditional/conditional branch, return, or trap).
+
+The builder API (`ir/builder.c`) used by the frontend performs peephole
+optimizations on the fly, mainly constant folding.
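
As a rough illustration of the two paragraphs above, here is a minimal sketch of an SSA-style instruction list where a temporary is named by the index of its defining instruction, plus a builder that folds constants while emitting. All names (`struct ins`, `build_binop`, the `OP_*` opcodes) are hypothetical and are not taken from `ir/builder.c` or `ir/op.def`; signed overflow during folding is ignored for brevity.

```c
#include <assert.h>

enum op { OP_CONST, OP_PARAM, OP_ADD, OP_MUL };  /* hypothetical opcodes */

/* No output operand: the instruction's own index in the array is the
 * name of the temporary it defines. */
struct ins {
    enum op op;
    long imm;       /* constant value, used by OP_CONST only */
    int lhs, rhs;   /* indices of the defining instructions, or -1 */
};

enum { MAX_INS = 64 };
struct func { struct ins ins[MAX_INS]; int n; };

/* Append one instruction and return its index (the new temporary). */
static int emit(struct func *f, enum op op, long imm, int lhs, int rhs)
{
    assert(f->n < MAX_INS);
    f->ins[f->n] = (struct ins){ op, imm, lhs, rhs };
    return f->n++;
}

static int build_const(struct func *f, long v)
{
    return emit(f, OP_CONST, v, -1, -1);
}

/* Build `lhs op rhs`; when both operands are constants, emit the folded
 * constant instead of the arithmetic instruction. */
static int build_binop(struct func *f, enum op op, int lhs, int rhs)
{
    const struct ins *a = &f->ins[lhs], *b = &f->ins[rhs];

    assert(op == OP_ADD || op == OP_MUL);
    if (a->op == OP_CONST && b->op == OP_CONST) {
        long v = (op == OP_ADD) ? a->imm + b->imm : a->imm * b->imm;
        return build_const(f, v);
    }
    return emit(f, op, 0, lhs, rhs);
}
```

In this sketch, building `2 * 3` yields a single `OP_CONST` with value 6, while adding a parameter to that constant still emits an `OP_ADD` whose operands are the indices of the two defining instructions.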
+
+Object file interface routines are in `obj/obj.[c/h]`, with the ELF implementation in
+`obj/elf.[c/h]`. Support for other object formats like PE and Mach-O is planned.
+Debug information in the form of DWARF is also planned, but it is a sizeable
+undertaking.
+
+The `-d...` compiler flag can be used to print the output of different stages
+of the backend for debugging.
+
+The backend performs the following main passes:
+ - ABI lowering (`ir/abi0.c`, `x86_64/sysv.c`): implements target calling
+ convention details, such as lowering structures being passed/returned by
+ value in registers or the stack.
+ - Intrinsics lowering (`ir/intrin.c`): lowers some intrinsics emitted by the
+ frontend (currently just structcopy)
+ - mem2reg (`ir/mem2reg.c`): lowers stack slots into SSA temporaries. This is
+ an important pass because the frontend puts every C variable into a stack
+ slot, and this pass transforms those into temporaries and phi instructions
+ in SSA form when possible (most of the time, unless they are
+ aggregates or their address is taken), which is also how clang/LLVM does
+ it. Can be disabled with -O0.
+ - With -O1+ optimizations enabled:
+ + inlining (`ir/inliner.c`)
+ + common-subexpression elimination (`ir/cse.c`)
+ + general arithmetic simplifications, branch simplification
+ (`ir/simpl.c`)
+
+ - Stack lowering (`ir/stack.c`): `alloca` instructions are deleted and
+ corresponding stack slots replaced with calculated stack offsets.
+ - Instruction selection (`ir/isel.c`, `x86_64/isel.c`): architecture-specific
+ instruction selection, addressing mode utilization, introduction of
+ register constraints.
+ - Register allocation (`ir/regalloc.c`): performs linear scan register
+ allocation. Spilling has a lot of room for improvement: at the moment the
+ current interval is spilled when there are no free registers, with no other
+ heuristics. A scratch register is reserved for operations with spilled
+ temporaries.
+ - Code emission (`x86_64/emit.c`): binary code for the target architecture is
+ emitted directly (not textual assembly). Relocations are deferred to the
+ object file interface too.
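
To make the mem2reg conditions from the pass list concrete, the following small C input shows which locals would qualify for promotion according to the description above. The function names are made up for this example, and it only illustrates the stated rule (scalar, address never taken); it says nothing about what the other passes do afterwards.

```c
#include <assert.h>

/* `sum` and `i` are scalars whose address is never taken, so a
 * mem2reg-style pass can replace their stack slots with SSA temporaries.
 * Both are reassigned inside the loop, so each needs a phi at the loop
 * header to merge the initial value with the updated one. */
static int sum_to(int n)
{
    int sum = 0;
    for (int i = 1; i <= n; i++)
        sum += i;
    return sum;
}

static void bump(int *p) { *p += 1; }

/* `counter` has its address taken and `pair` is an aggregate, so by the
 * rule described above both would stay in stack slots. */
static int not_promoted(void)
{
    int counter = 0;
    struct { int a, b; } pair = { 1, 2 };

    bump(&counter);
    assert(counter == 1);
    return counter + pair.a + pair.b;
}
```

Dumping the IR for such a file with the `-d...` flag, once with and once without `-O0`, should be a convenient way to see the difference between stack-slot and temporary form.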
+
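
The spilling policy described for the register allocator (spill the current interval when no register is free) can be sketched as follows. This is a simplified model under stated assumptions, not the code in `ir/regalloc.c`: the `struct interval` layout, the two-register machine, and the `linear_scan` function are all hypothetical, intervals are assumed to be pre-sorted by start point, and the reserved scratch register is not modelled.

```c
#include <assert.h>

enum { NREGS = 2, SPILLED = -1 };

/* Live range [start, end] of one temporary, and the register it received. */
struct interval { int start, end, reg; };

/* Linear scan over intervals sorted by increasing start point. */
static void linear_scan(struct interval *iv, int n)
{
    int owner[NREGS];   /* index of the interval holding each register, or -1 */

    for (int r = 0; r < NREGS; r++)
        owner[r] = -1;

    for (int i = 0; i < n; i++) {
        assert(i == 0 || iv[i - 1].start <= iv[i].start);

        /* Expire intervals that ended before the current one starts. */
        for (int r = 0; r < NREGS; r++)
            if (owner[r] >= 0 && iv[owner[r]].end < iv[i].start)
                owner[r] = -1;

        /* Take the first free register; if there is none, the current
         * interval is the one that gets spilled, with no other heuristic. */
        iv[i].reg = SPILLED;
        for (int r = 0; r < NREGS; r++) {
            if (owner[r] < 0) {
                owner[r] = i;
                iv[i].reg = r;
                break;
            }
        }
    }
}
```

For the intervals `[0,10]`, `[1,3]`, `[2,5]`, `[4,8]` this sketch assigns registers 0 and 1 to the first two, spills the third because both registers are busy at point 2, and gives register 1 to the fourth once `[1,3]` has expired. A smarter policy would, for example, compare end points and spill whichever live interval ends last.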
+[ ... ]