path: root/ir
Commit message | Author | Age | Files | Lines
* x86-64/emit: implement single-exit-point ret with jump threading lemon2025-12-162-1/+3
|
* bitset: better implementation of bsiter() and stuff lemon2025-12-162-2/+2
Also changed the type to size_t for portability.
* mem2reg: fix obvious inefficiency lemon2025-12-161-16/+10
deltrivialphis() was iterating over every variable instead of just looking at the variable being examined. I'd been wondering why mem2reg was such a bottleneck for a testcase like the sqlite3 amalgamation. It's easy to miss the forest for the trees.
* create distinct interned string type lemon2025-12-153-10/+10
Interned strings are used pervasively, so it's a good idea to add a layer of type safety to differentiate them from general cstrs and avoid potential bugs from comparing non-interned and interned strings. Not that that's happened so far that I can remember, but it could.

I'm 90% sure it's legal to alias `struct {char c;}` pointers with `char` pointers. This specific typedef gives type safety but with a simple one-way `internstr -> const char *` typecast (with `&istr->c`). Converting the other way around is more intentional: a straight up cast `(internstr)cstr` which sticks out as unchecked and probably wrong, or calling the intern(cstr) function, which is the right way.
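A minimal sketch of the typedef idea described in this commit message. Only `internstr`, `intern()` and the `&istr->c` conversion come from the message; the struct tag `internstr_`, the helper `istr_cstr()`, and the toy linear-search table are assumptions made for illustration, not the project's actual code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Distinct pointer type: the compiler diagnoses passing an internstr
 * where a const char * is expected (and vice versa). */
typedef const struct internstr_ { char c; } *internstr;

/* The cheap one-way conversion internstr -> const char *. */
const char *istr_cstr(internstr s) { return &s->c; }

/* Toy intern table (hypothetical): linear search, fixed capacity. */
enum { MAXINTERN = 256 };
static char *interned[MAXINTERN];
static int ninterned;

/* Returns the unique interned copy of s, so that two interned strings
 * are equal exactly when their pointers are equal. */
internstr intern(const char *s)
{
    for (int i = 0; i < ninterned; i++)
        if (strcmp(interned[i], s) == 0)
            return (internstr)interned[i];
    assert(ninterned < MAXINTERN);
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    assert(copy != NULL);
    memcpy(copy, s, n);
    interned[ninterned++] = copy;
    return (internstr)copy;
}
```

With this shape, `intern("foo") == intern("foo")` is a plain pointer comparison, while comparing an `internstr` against a `const char *` draws a compiler diagnostic instead of silently comparing addresses.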
* move intern() to mem.c lemon2025-12-151-1/+0
Being in lex.c was vestigial, since it was being used all over the frontend and backend.
* regalloc: fix lifetime construction for nested loops lemon2025-12-151-17/+34
Previously, given something like

```
1  a = ...
2  loop { // outer
3    b = do something with a
4    loop { // inner
5      ...
6      if (b < 0)
7        break 'inner;
8      if (b == 0)
9        return;
10     ...
11   }
12 }
```

Regalloc thought outer goes from 2..6, because 6 is the last place where flow jumps directly back to 2. So `a` would have the lifetime [1,7). However if neither the break nor return are taken, the inner loop repeats and then control could flow back to 7 -> 3. But now the physical location for `a` might have been clobbered between 8..10, which is wrong.

This fixes that by making sure the outer loop is considered to span 2..10. The way I went about it might not be the best way of doing it. I'm not 100% certain that it's fully correct and will always find the correct loopend, either. It's surprising it took this long to hit this edge case.
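One way to get the wider span, sketched under assumptions: positions are linearized so that a loop header precedes its body, and back edges are already known. This is not necessarily how the commit does it (the message itself is unsure about its approach); `struct backedge` and `loopends()` are hypothetical names. The idea is to take the furthest back-edge source per header, then widen each loop until it covers every loop whose header lies inside it.

```c
#include <assert.h>

/* A jump from position `from` back to the loop header at position `to`. */
struct backedge { int from, to; };

/* end[h] receives the last position of the loop headed at h,
 * or -1 when h is not a loop header. */
void loopends(const struct backedge *edges, int nedges, int *end, int npos)
{
    for (int i = 0; i < npos; i++)
        end[i] = -1;

    /* Step 1: the furthest back-edge source that targets each header. */
    for (int i = 0; i < nedges; i++) {
        assert(0 <= edges[i].to && edges[i].to <= edges[i].from);
        assert(edges[i].from < npos);
        if (edges[i].from > end[edges[i].to])
            end[edges[i].to] = edges[i].from;
    }

    /* Step 2: a loop must also span any loop that starts inside it.
     * Repeat until nothing grows (handles deeper nesting). */
    for (int changed = 1; changed; ) {
        changed = 0;
        for (int h = 0; h < npos; h++) {
            if (end[h] < 0)
                continue;
            for (int k = h + 1; k <= end[h]; k++) {
                if (end[k] > end[h]) {
                    end[h] = end[k];
                    changed = 1;
                }
            }
        }
    }
}
```

Feeding it the example above as the back edges 6 -> 2 (from the message) and 10 -> 4 (assumed for the inner loop) widens the outer loop from 2..6 to 2..10, the span the commit aims for.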
* only put dats can in .text now when emitting it lemon2025-12-142-3/+3
|
* various relocation-related optimizations lemon2025-12-141-4/+6
With 59ca5a8db, querying if a symbol is defined is cheap. If we're compiling code that calls foo() and we defined foo() in this compilation unit, we already know its offset within the .text section, so use it instead of emitting a relocation for the linker to handle.

Also, put small literal data in the .text section instead of .rodata. This seems to improve performance (cache locality?), and as a bonus, it will be good for aarch64's instr encoding with smallish PC-relative offsets.
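A small sketch of the first optimization, assuming an x86-64 `call rel32` whose 32-bit displacement is relative to the end of the 4-byte immediate. `struct sym` and `resolvecall()` are hypothetical names for illustration; the real symbol table is not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical symbol record: whether it is defined in this
 * compilation unit and, if so, its offset within .text. */
struct sym {
    bool defined;
    uint32_t textoff;
};

/* immoff is the .text offset of the 4-byte displacement field.
 * Returns true and stores the displacement when the target is known
 * locally; returns false when a relocation must be emitted instead. */
bool resolvecall(const struct sym *s, uint32_t immoff, int32_t *disp)
{
    if (!s->defined)
        return false;
    /* rel32 is measured from the byte after the immediate.
     * The unsigned subtraction wraps; the cast reinterprets it as
     * signed, which holds on the usual two's-complement targets. */
    *disp = (int32_t)(s->textoff - (immoff + 4));
    return true;
}
```

For a callee at offset 0x10 and a call opcode at 0x20 (immediate at 0x21), the displacement is 0x10 - 0x25 = -21, and no relocation entry is needed.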
* regalloc: fix bug with phi move of stack <- stack lemon2025-12-132-6/+5
|
* Add -O optimization flag lemon2025-12-131-2/+4
|
* fix position independent loads of function symbols. lemon2025-12-133-6/+6
For `extern int x[1];`, can use PCREL32 for &x. But for `extern int x(int)`, must use GOTREL, when not being called directly (that's PLT). Therefore the type of an external symbol (actually just whether it denotes a function) matters when deciding what kind of relocation to emit, so keep that information.
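The decision described here can be sketched as a tiny function. The relocation names PCREL32, GOTREL and PLT are taken from the message; the enum, the `REL_` prefix and `pickreloc()` are hypothetical, and the sketch only covers external symbols in position-independent code.

```c
#include <stdbool.h>

enum relkind { REL_PCREL32, REL_GOTREL, REL_PLT };

/* isfunc: the external symbol denotes a function.
 * iscall: the reference is a direct call, not an address-of. */
enum relkind pickreloc(bool isfunc, bool iscall)
{
    if (!isfunc)
        return REL_PCREL32;   /* e.g. &x for `extern int x[1];` */
    if (iscall)
        return REL_PLT;       /* direct call x(...) */
    return REL_GOTREL;        /* taking the address of a function */
}
```

The only input beyond the use site is whether the symbol is a function, which is why that bit has to be kept with the symbol.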
* rename arraylength macro -> countof lemon2025-12-115-22/+22
|
* ir: bump MAXINSTR lemon2025-12-101-1/+1
|
* parallel move; implement reg<->stack swp lemon2025-12-101-3/+18
|
* regalloc: optimize a little edge case better lemon2025-12-101-4/+6
With two-address instructions one needs to make sure the dst doesn't get allocated to the same reg as the right-hand operand:

    %r = mul %x, %y  ; %y cannot be %r

Except, if the operands are the same

    %r = mul %x, %x  ; if %x is dead after this instr, it's fine to allocate %r to the same reg
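To see why the constraint exists, here is a toy simulation, assuming the two-address lowering `mov dst, lhs; mul dst, rhs`. The register file is just an array, and both function names are hypothetical.

```c
#include <stdbool.h>

/* Simulates the two-address lowering of `%r = mul %x, %y`:
 *   mov dst, lhs
 *   mul dst, rhs
 * d, a, b are the register indices assigned to %r, %x, %y. */
long mul2addr(long regs[], int d, int a, int b)
{
    regs[d] = regs[a];   /* mov: overwrites regs[b] if d == b */
    regs[d] *= regs[b];  /* mul: reads the (possibly clobbered) rhs */
    return regs[d];
}

/* The relaxed rule from the message: dst may share the rhs register
 * only when both operands are the same value and it dies here. */
bool dst_may_share_rhs(int lhsval, int rhsval, bool rhs_dead_after)
{
    return lhsval == rhsval && rhs_dead_after;
}
```

With x = 3 in register 0 and y = 5 in register 1, allocating the result to register 1 yields 9 instead of 15, because the `mov` destroys y before the `mul` reads it. When both operands live in the same register, the `mov` is a no-op and the result is still correct.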
* misc fixes lemon2025-12-101-1/+1
|
* rega: change assert for spilled callee. it's ok if nspill==1 lemon2025-12-091-1/+1
|
* abi: fix aggregate passed by regs 2nd reg offset lemon2025-12-062-24/+28
It was broken for, e.g., `struct { i32 a; f64 b; }` (it would try to load/store b from byte offset 4, not 8). Introduce r2off, and realize that in x86-64 it's always 8; even `struct {i32 a; f32 b;}` gets passed in one (integer) register. But not so in (future) ABIs like RISC-V; I believe there `{i32, f32}` would get passed in 1 integer and 1 float register (r2off = 4).
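A byte-level sketch of what r2off controls, assuming the aggregate is laid out with `a` at offset 0 and `b` at offset 8 as in the `{i32, f64}` case above. `load2ndreg()` and `R2OFF_X86_64` are hypothetical names; only `r2off` and the value 8 come from the message.

```c
#include <stdint.h>
#include <string.h>

/* Per the message, the second register of an aggregate passed in two
 * registers always starts at byte 8 on x86-64. */
enum { R2OFF_X86_64 = 8 };

/* Loads the 8 bytes of the aggregate that travel in the second
 * register, starting at byte offset r2off. */
uint64_t load2ndreg(const void *agg, int r2off)
{
    uint64_t bits;
    memcpy(&bits, (const unsigned char *)agg + r2off, sizeof bits);
    return bits;
}
```

Loading from offset 4 instead (the old behavior for this struct) would pick up padding plus half of `b`, which is the bug being fixed.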
* add command-line predefined macros (-D, -U) lemon2025-12-061-2/+0
|
* ir: float fold div/0 lemon2025-12-051-4/+3
|
* regalloc: kill dead defs of physical regs lemon2025-12-041-8/+16
|
* c: make tentative definitions work lemon2025-12-021-1/+1
|
* abi/isel: aggregate args in stack wip lemon2025-11-271-9/+31
|
* regalloc: skip dead phis lemon2025-11-261-1/+4
|
* ir: simplify some occurrences of single-argument phis lemon2025-11-242-8/+17
|
* ir.h: tweak mkintrin() definition to work with tinycc lemon2025-11-241-1/+1
|
* ir: implement cvtu64f. and other bug fixes lemon2025-11-231-2/+35
compiler is bootstrapping?! however, stage1 and stage2+ executables aren't bit-identical.. small differences in the codegen.. need to look into that
* implement cvtfXu64 by lowering it in builder lemon2025-11-231-9/+46
this should probably be in a separate pass?
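The log does not show how cvtfXu64 is lowered. As an illustration only, one common way to build a float-to-u64 conversion out of the signed conversion that x86-64 provides is to split the range at 2^63. The function name below is hypothetical, and NaN or out-of-range inputs are deliberately not handled.

```c
#include <stdint.h>

/* f64 -> u64 using only a signed f64 -> i64 conversion.
 * Assumes 0 <= x < 2^64; NaN and out-of-range inputs are not handled. */
uint64_t f64_to_u64(double x)
{
    const double two63 = 9223372036854775808.0; /* 2^63, exact as a double */
    if (x < two63)
        return (uint64_t)(int64_t)x;            /* fits the signed range */
    /* Shift into the signed range, convert, then put the top bit back. */
    return (uint64_t)(int64_t)(x - two63) ^ (UINT64_C(1) << 63);
}
```

In IR terms this becomes a compare, a branch or select, one subtraction, the signed conversion and an xor, which is the kind of expansion that could live either in the builder or in a separate lowering pass.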
* c: check actual reachability for non-void func may not return value lemon2025-11-232-0/+22
|
* implement float varargs, and some other fixes lemon2025-11-233-7/+17
|
* make sure indirect function call pointer does not end up in clobber reg lemon2025-11-221-2/+2
|
* ir: freeblk: clear preds lemon2025-11-221-0/+2
|
* ir/ir.c: fix assert in mkcallarg lemon2025-11-221-1/+1
|
* ir/dump: initialize out buffer statically lemon2025-11-221-3/+1
|
* regalloc: merge overlapping fixed intervals better lemon2025-11-221-1/+12
|
* irdump: print alloca # bytes lemon2025-11-211-0/+3
|
* ir: implement dominator tree computation lemon2025-11-213-0/+40
|
* ir: barebones IR passes checked contracts lemon2025-11-217-2/+26
|
* remove umul lemon2025-11-213-3/+1
|
* change op names to match 285063eba44 lemon2025-11-218-142/+142
|
* rename IR classes to reflect bitsize lemon2025-11-219-46/+46
|
* regalloc: assert nops aren't being used lemon2025-11-211-0/+1
|
* ir/builder: peephole optimize branch with constant conditional lemon2025-11-211-4/+14
|
* mem2reg: implement marker algorithm from Braun et al lemon2025-11-211-8/+40
|
* mem2reg: store pending phis implicitly lemon2025-11-211-12/+8
|
* ir: fix delpred when npred becomes 1 lemon2025-11-211-2/+12
|
* ir/dump: print block predecessors lemon2025-11-211-2/+10
|
* cfg: sortrpo delete unreachable blocks with allocas by hoisting them to the entry block lemon2025-11-211-6/+7
|
* isel: lower allocas a different way, such that stk address gets materialized when necessary lemon2025-11-201-1/+1
|
* ir: for easier debugging, keep ctype in dats, print as literal when possible lemon2025-11-203-21/+53
|