-rw-r--r--.editorconfig4
-rw-r--r--.gitignore1
-rw-r--r--README.md160
-rwxr-xr-xbootstrap.sh2
-rw-r--r--doc/cstd.md36
-rw-r--r--main.c16
6 files changed, 196 insertions, 23 deletions
diff --git a/.editorconfig b/.editorconfig
index 5697a97..ec3b152 100644
--- a/.editorconfig
+++ b/.editorconfig
@@ -12,3 +12,7 @@ indent_size = 3
[Makefile]
indent_style = tab
+
+[*.md]
+indent_style = space
+indent_size = 2
diff --git a/.gitignore b/.gitignore
index 3d6828c..3b34cc8 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,5 @@
hostconfig.h
+config.mk
antcc
antcc0
antcc1
diff --git a/README.md b/README.md
index d0bfcb7..c0db173 100644
--- a/README.md
+++ b/README.md
@@ -1,27 +1,155 @@
-`antcc` is a C compiler using its own custom backend. Currently still in a
-experimental stage, but can compile successfully some real-world C codebases (e.g. lua, sqlite3, oksh).
+`antcc` is a small C compiler using its own independent backend.
-Report bugs in the [issue tracker](https://codeberg.org/lsof/antcc/issues), or
-by sending me an email.
+Supports [most of C11 and some C23 features](doc/cstd.md), as well as some GNU extensions.
-# Supported targets
+Currently still in an experimental stage, but can successfully build some
+real-world C codebases such as Lua, SQLite3 and oksh (and itself).
-- For now just x86-64 POSIX (Sys-V + ELF). Only tested on linux so far.
+`antcc` is inspired by other small C compilers like
+[TCC](https://bellard.org/tcc/),
+[cproc](https://git.sr.ht/~mcf/cproc),
+[chibicc](https://github.com/rui314/chibicc),
+and backends like [QBE](https://c9x.me/compile/) and [LLVM](https://llvm.org/).
-# Building
+## Requirements
-```
-./configure
+`antcc` is written in standard C11 and can be built with any conforming
+compiler toolchain. The `Makefile` requires GNU Make. At runtime, an existing
+C compiler is currently required for calling the linker with the appropriate
+libc runtime paths (eventually, the driver should only depend on the linker by
+determining those linker paths and flags at `configure` time).
+
+## Building
+
+Run `./configure` to create `hostconfig.h` and `config.mk` for your system.
+
+Build with
-make # outputs ./antcc executable
-# or
-make opt #compile with optimizations
-# or
+```
+make
+#or
+make opt #compile with -O2
+#or
make dbg #compile with UBsan and Asan
```
-# Usage
+Install with `(sudo) make install`.
+
+## Supported targets
+
+For now just x86-64 POSIX (Sys-V + ELF). An aarch64 backend is in the works. Tested and known to work:
-The driver is still incomplete but it mimics that of compilers like gcc.
+ - `x86_64-linux-gnu`
+ - `x86_64-linux-musl`
+
+## Usage
+
+The driver is still incomplete but it mimics that of compilers like gcc, see `--help`.
`antcc` compiles translation units to object files directly, but the driver
-will invoke an external linker command to output an executable if `-c` isn't passed.
+will invoke an external command to link to an executable if `-c` isn't passed.
+
+Cross-compilation is partially supported: cross-compiling object files works
+but an external cross-compiling toolchain for linking is required; the driver
+will try to find one (invoking e.g. `aarch64-linux-gnu-gcc`, or falling back
+to [`zig cc`](https://andrewkelley.me/post/zig-cc-powerful-drop-in-replacement-gcc-clang.html)),
+and appropriate include paths must be manually specified. You can specify the compiler target architecture with `-target <triple>`.
+
+## Testing
+
+`bootstrap.sh` will bootstrap the compiler in 3 stages:
+ - Stage 0 builds the compiler with the system's C compiler
+ - Stage 1 builds the compiler with the stage 0 output
+ - Stage 2 builds the compiler with the stage 1 output
+ - Then stage 1 and 2 outputs are verified to be identical
+
+There are tests in the `test` directory:
+ - `test/run.sh`: local tests
+ - `test/lua.sh`: compile Lua 5.4.0 and run its testsuite
+ - `test/c-testsuite.sh`: run [c-testsuite](https://github.com/c-testsuite/c-testsuite)
+
+## Issues and contributing
+
+You can report issues on the [issue tracker](https://codeberg.org/lsof/antcc/issues).
+
+Contributions are welcome; send them as pull requests [on Codeberg](https://codeberg.org/lsof/antcc/pulls).
+
+## Internals & Design
+
+C type representation (`type.h` & `type.c`) is shared by the frontend and
+backend because the backend is responsible for ABI-specific lowering of calling
+conventions.
+
+The C frontend is structured like so:
+ - Compiler driver (`main.c`), which parses command line options, inputs and
+ outputs and calls out to the core compiler to build individual object
+ files and possibly invoke an external command to link them together.
+
+ - Tokenizer & preprocessor (`c/lex.c`): The input file is scanned on-demand,
+ initially reading characters into an internal buffer after performing
+ backslash-newline deletion (and possibly trigraph substitution), then producing
+ one token at a time when the parser requests the next one. Preprocessing
+ (directives & macro expansion) is also done on the fly.
+
+ - Parser & IR generation (`c/c.c`): The handwritten parser reads declarations
+ and keeps them in a symbol table/environment. Static data is written to
+ buffers that correspond to the .rodata/.data sections of the final object
+ file, emitting relocations to the object file interface too. Function
+ bodies are parsed and transformed into the IR in one pass. Expressions are
+ parsed into expression trees before being emitted or compile-time evaluated
+ (`c/eval.c`), but there is no whole-program AST. When the end of a
+ function definition is reached, the backend is called to perform all of the
+ passes that will finally transform it into machine code written to the
+ .text section.
+
+The backend (`ir/*`) uses an IR in Static Single Assignment (SSA) form.
+Instructions have a return type and up to two operands. Because of SSA form,
+temporaries are simply referenced by the instruction that provides their
+definition, so an explicit output operand is not required. The list of
+instructions is defined in `ir/op.def`. Each basic block in the control flow
+graph consists of 0 or more phi functions, followed by 0 or more instructions,
+terminated by a jump (unconditional/conditional branch, return, or trap).
+
+The builder API (`ir/builder.c`) used by the frontend performs peephole
+optimizations on the fly, mainly constant folding.
+
+Object file interface routines are in `obj/obj.[c/h]`, with the ELF implementation in
+`obj/elf.[c/h]`. Support for other object formats like PE and Mach-O is planned.
+Debug information in the form of DWARF is also planned, but it is a sizeable
+undertaking.
+
+The `-d...` compiler flag can be used to print the output of different stages
+of the backend for debugging.
+
+The backend performs the following main passes:
+ - ABI lowering (`ir/abi0.c`, `x86_64/sysv.c`): implements target calling
+ convention details, such as lowering structures being passed/returned by
+ value in registers or the stack.
+ - Intrinsics lowering (`ir/intrin.c`): lowers some intrinsics emitted by the
+ frontend (currently just structcopy)
+ - mem2reg (`ir/mem2reg.c`): lower stack slots into SSA temporaries. This is
+ an important pass because the frontend puts every C variable into a stack
+ slot, and this pass transforms those into SSA temporaries and phi
+ instructions when possible (most of the time, unless they are
+ aggregates or their address is taken), which is also how clang/LLVM does
+ it. Can be disabled with -O0.
+ - With -O1+ optimizations enabled
+ + inlining (`ir/inliner.c`)
+ + common-subexpression elimination (`ir/cse.c`),
+ + general arithmetic simplifications, branch simplification
+ (`ir/simpl.c`)
+
+ - Stack lowering (`ir/stack.c`): `alloca` instructions are deleted and
+ corresponding stack slots replaced with calculated stack offsets.
+ - Instruction selection (`ir/isel.c`, `x86_64/isel.c`): architecture-specific
+ instruction selection, addressing mode utilization, introduction of
+ register constraints.
+ - Register allocation (`ir/regalloc.c`): performs linear scan register
+ allocation. Spilling has a lot of room for improvement: at the moment the
+ current interval is spilled when there are no free registers, with no other
+ heuristics. A scratch register is reserved for operations with spilled
+ temporaries.
+ - Code emission (`x86_64/emit.c`): binary code for the target architecture is
+ emitted directly (not textual assembly). Relocations are deferred to the
+ object file interface too.
+
+[ ... ]
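
Note on the README's IR description above: the point that SSA temporaries are "simply referenced by the instruction that provides their definition" and that the builder folds constants on the fly may be easier to see in code. The following is only a minimal sketch under assumed names (`struct instr`, `ir_binop`, the `OP_*` and `TY_*` enumerators are all hypothetical and not taken from `ir/builder.c` or `ir/op.def`); it shows an instruction with a result type and up to two operands, where an operand is just the index of its defining instruction.

```c
#include <assert.h>
#include <stdint.h>

/* All names here are hypothetical; antcc's real opcodes are listed in ir/op.def. */
enum op { OP_PARAM, OP_CONST, OP_ADD, OP_MUL };
enum ty { TY_I32, TY_I64 };

#define NOARG  (-1)
#define MAXINS 64

struct instr {
    enum op op;
    enum ty type;   /* type of the value this instruction produces */
    int arg[2];     /* operands: indices of the defining instructions, or NOARG */
    int64_t imm;    /* payload, only meaningful for OP_CONST */
};

struct func {
    struct instr ins[MAXINS];
    int n;
};

/* Append an instruction; its index doubles as the name of the SSA temporary,
 * so no explicit output operand is stored. */
static int emit(struct func *f, enum op op, enum ty type, int a, int b, int64_t imm)
{
    assert(f->n < MAXINS);
    f->ins[f->n] = (struct instr){ op, type, { a, b }, imm };
    return f->n++;
}

int ir_param(struct func *f, enum ty type)
{
    return emit(f, OP_PARAM, type, NOARG, NOARG, 0);
}

int ir_const(struct func *f, enum ty type, int64_t v)
{
    return emit(f, OP_CONST, type, NOARG, NOARG, v);
}

/* Builder entry point for OP_ADD / OP_MUL with a peephole constant fold:
 * if both operands are constants, emit the folded constant instead.
 * Folding wraps at 64 bits here; a real folder would also truncate the
 * result to the width of `type`. */
int ir_binop(struct func *f, enum op op, enum ty type, int a, int b)
{
    assert(op == OP_ADD || op == OP_MUL);
    struct instr x = f->ins[a], y = f->ins[b];
    if (x.op == OP_CONST && y.op == OP_CONST) {
        uint64_t l = (uint64_t)x.imm, r = (uint64_t)y.imm;
        return ir_const(f, type, (int64_t)(op == OP_ADD ? l + r : l * r));
    }
    return emit(f, op, type, a, b, 0);
}
```

With this shape, building `2 + 3` yields a single constant instruction, while `param * (2 + 3)` yields a multiply whose operands are the indices of the parameter and of the folded constant.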
diff --git a/bootstrap.sh b/bootstrap.sh
index 1761ac5..b2f3fa9 100755
--- a/bootstrap.sh
+++ b/bootstrap.sh
@@ -27,3 +27,5 @@ echo
echo "== Stage 2 (compiling with stage 1 output) =="
X ./antcc1 $opt $cflags -o antcc2 $src
X md5sum antcc2
+
+(X cmp antcc1 antcc2) && echo ok. || (echo 'bootstrap FAIL!'; exit 1)
diff --git a/doc/cstd.md b/doc/cstd.md
new file mode 100644
index 0000000..63e39cd
--- /dev/null
+++ b/doc/cstd.md
@@ -0,0 +1,36 @@
+A list of missing standard C features:
+
+## ANSI/C89
+ - K&R style function definitions with type declarations: `h(a, b) int a; double b; { ... }`
+
+## C99
+ - Variable-length arrays (VLAs)
+ - Proper `long double` support in platforms with extended floats (currently equivalent to `double`)
+ - `complex` types, `<tgmath.h>` header
+ - digraphs
+ - Universal character names (`\uXXXX`, `\UXXXXXXXX`)
+ - IEEE 754 float support per Annex F / IEC 60559 (`FLT_EVAL_METHOD`, `FENV_ACCESS` pragma) (not even GCC or Clang care about this)
+
+## C11
+ - `_Alignas`, `max_align_t`
+ - Multithreading support (`_Thread_local`, atomics)
+ - `u8".."` string literals
+
+## C23
+ - Decimal floating-point types (`_Decimal32`, `_Decimal64`, and `_Decimal128`)
+ - Bit-precise integers (`_BitInt`)
+ - `nullptr`, `nullptr_t`
+ - Binary integer constants
+ - `char8_t`
+ - Digit separator '
+ - Attributes (`[[...]]` syntax)
+ - Labels followed by declarations and }
+ - `true` and `false` keywords
+ - `auto` for type inference, `typeof_unqual`, `constexpr`
+ - `unreachable` macro in `<stddef.h>`
+ - checked int arithmetic (`<stdckdint.h>`)
+ - tagged type compatibility [N3037](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3037.pdf)
+ - storage class specifier for compound literals
+ - preprocessor `#embed` directive, `__has_include`, `__has_c_attribute`
+ - Pragmas for float rounding direction (`STDC FENV_ROUND`, `STDC FENV_DEC_ROUND`)
+ - IEEE 754 decimal floats, interchange and extended types. various `<float.h>` macros,...
diff --git a/main.c b/main.c
index 47664c2..12374af 100644
--- a/main.c
+++ b/main.c
@@ -219,10 +219,11 @@ optparse(char **args)
} else if (*arg == 'g') {
/* TODO debug info */
} else if (*arg == 'O') {
- if (!arg[1]) ccopt.o = 0; /* default opts */
- else if (arg[1] == '1') ccopt.o = OPT1;
- else if ((uint)arg[1] - '1' < 9) ccopt.o = OPT2;
- else if (arg[1] == '0') ccopt.o = OPT0;
+ char o = arg[1];
+ if (!o || o == 'g') ccopt.o = 0; /* default opts */
+ else if (o == '1' || o == 's' || o == 'z') ccopt.o = OPT1;
+ else if ((uint)o - '1' < 9) ccopt.o = OPT2;
+ else if (o == '0') ccopt.o = OPT0;
else goto Bad;
} else if (*arg == 'D' || *arg == 'U') {
void cpppredef(bool undef, const char *cmd);
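
Note on the `-O` hunk above: the order of the checks matters, because `(uint)o - '1' < 9` is an unsigned range test that accepts `'1'` through `'9'`; `'1'` never reaches it only because the `OPT1` branch is tested first, and `'0'` wraps around to a huge unsigned value and falls through to its own branch. The standalone sketch below mirrors that logic for illustration; the function name `parse_olevel` and the enumerator values are assumptions, not antcc's actual definitions.

```c
#include <assert.h>

/* Hypothetical values; in main.c only "0 means default" is visible in the diff. */
enum optlevel { OPT_BAD = -1, OPT_DEFAULT = 0, OPT0, OPT1, OPT2 };

/* `arg` points at the 'O' of an option such as "O2" (leading '-' already consumed). */
int parse_olevel(const char *arg)
{
    char o = arg[1];
    if (!o || o == 'g')
        return OPT_DEFAULT;                 /* -O and -Og */
    if (o == '1' || o == 's' || o == 'z')
        return OPT1;                        /* -O1, -Os, -Oz */
    if ((unsigned)o - '1' < 9)
        return OPT2;                        /* '2'..'9'; '1' was handled above */
    if (o == '0')
        return OPT0;
    return OPT_BAD;                         /* corresponds to `goto Bad` */
}
```

For example, `parse_olevel("Os")` returns `OPT1` and `parse_olevel("O3")` returns `OPT2`, matching the new help text `(0|g|1|2|s|z)` with higher digits treated like `2`.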
@@ -646,7 +647,8 @@ sysinclpaths(void)
static void
prihelp(void)
{
- pfmt("Usage: antcc [options] infile(s)...\n"
+ pfmt("antcc version "ANTCC_VERSION_STR"\n"
+ "Usage: antcc [options] infile(s)...\n"
" antcc [options] -run infile [arguments...]\n"
" antcc [options] infile(s)... -run [arguments...]\n"
"Options:\n"
@@ -665,8 +667,8 @@ prihelp(void)
" -llib \tLink with library\n"
" -fpie \tEmit code for position independent executable\n"
" -fpic \tEmit position independent code\n"
- " -O[0|1..] \tSet optimization level\n"
- " -x[c|o] \tSpecify type of next input file (C, object)\n"
+ " -O<..> \tSet optimization level (0|g|1|2|s|z) (default: -Og)\n"
+ " -x<c|o> \tSpecify type of next input file (C, object)\n"
" -W[...] \tTurn on warnings (stub)\n"
" -Werror \tTurn warnings into errors\n"
" -w \tSuppress warnings\n"