path: root/ir
Commit message | Author | Age | Files | Lines
* simpl: handle multiplication by negative po2 too | lemon | 2025-12-22 | 1 | -5/+9
|
* simpl: optimize unsigned & signed division by power of 2 | lemon | 2025-12-21 | 3 | -19/+66
|
* ir: simpl: optimize some constant multiplications | lemon | 2025-12-21 | 2 | -18/+61
| Reuse irbinop() and irunop() for the constant results cases.
* rega: fix 3ff0bfcb | lemon | 2025-12-21 | 1 | -4/+1
|
* driver: -fsyntax-only | lemon | 2025-12-20 | 1 | -1/+1
|
* rega: fix infinite loop when compiling infinite loop | lemon | 2025-12-20 | 1 | -1/+4
|
* some static assertions for packed type sizes | lemon | 2025-12-20 | 1 | -0/+2
|
* backend: unify pass memory allocation strategies | lemon | 2025-12-20 | 7 | -31/+21
| It was all over the place for temporary data structures used by individual passes.
| Now there is an arena specifically for that, which is nicer.
* backend: general simplification pass skeleton | lemon | 2025-12-20 | 3 | -1/+154
|
* copyopt: optimize same-arg phis with multiple preds | lemon | 2025-12-20 | 1 | -2/+6
|
* ir/regalloc: struct alloc -> union alloc | lemon | 2025-12-20 | 1 | -16/+15
|
* ir: move some filluses() to ir.c, rename optmem.c -> mem2reg.c | lemon | 2025-12-19 | 4 | -30/+29
|
* ir: move cls2load to interface | lemon | 2025-12-18 | 3 | -7/+7
| There's plenty of code duplication like this around that I'm looking to reduce.
* regalloc+emit: get rid of xsave/xrestore hack | lemon | 2025-12-18 | 2 | -51/+63
| Was used for situations where we needed to spill more than 1 temporary and
| had to use a register that is already in use. Instead of push/pop, we can just
| allocate and set aside specific stack slots for this purpose.
|
| Also, reworked linearscan() interval sets to separate FPR/GPR intervals.
* rega: implement stack<->stack swap for parallel moves | lemon | 2025-12-18 | 1 | -29/+34
|
* x86_64: for vararg calls, write to EAX in isel | lemon | 2025-12-18 | 1 | -8/+25
| Also, in regalloc ensure fixed intervals are sorted
* x86-64/emit: implement single-exit-point ret with jump threading | lemon | 2025-12-16 | 2 | -1/+3
|
* bitset: better implementation of bsiter() and stuff | lemon | 2025-12-16 | 2 | -2/+2
| Also changed the type to size_t for portability
* mem2reg: fix obvious inefficiency | lemon | 2025-12-16 | 1 | -16/+10
| deltrivialphis() was iterating over every variable instead of just looking at
| the variable being examined.
|
| And I'd been wondering why mem2reg was such a bottleneck for a testcase like
| sqlite3 amalgamation.. it's easy to miss the forest for the trees.
* create distinct interned string type | lemon | 2025-12-15 | 3 | -10/+10
| Interned strings are used pervasively, so it's a good idea to add a layer of
| type safety to differentiate them from general cstrs and avoid potential bugs
| from comparing non-interned and interned strings. Not that that's happened so
| far that I can remember, but it could.
|
| I'm 90% sure it's legal to alias `struct {char c;}` pointers with `char`
| pointers. This specific typedef gives type safety but with a simple one-way
| `internstr -> const char *` typecast (with `&istr->c`).
|
| Converting the other way around is more intentional: a straight up cast
| `(internstr)cstr` which sticks out as unchecked and probably wrong, or calling
| the intern(cstr) function, which is the right way.
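A minimal sketch of the typedef described above, under assumptions: only `internstr`, `intern(cstr)`, `struct {char c;}` and `&istr->c` come from the commit message. The struct tag, the helper names `istr_cstr` and `istr_eq`, and the toy linear-search table are hypothetical stand-ins, not the project's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed shape: a pointer to a one-char struct, so that a plain
 * `const char *` is not implicitly accepted where an internstr is expected. */
typedef const struct internchar { char c; } *internstr;

/* Cheap one-way conversion: interned string -> ordinary C string. */
static const char *istr_cstr(internstr s) { return &s->c; }

/* Toy intern table (linear search, fixed capacity); stands in for the
 * real intern(), whose implementation is not shown in the log. */
enum { MAXINTERNED = 64 };
static char *interned[MAXINTERNED];
static int ninterned;

static internstr intern(const char *cstr)
{
    for (int i = 0; i < ninterned; i++)
        if (strcmp(interned[i], cstr) == 0)
            return (internstr)interned[i];   /* already interned */
    assert(ninterned < MAXINTERNED);
    size_t n = strlen(cstr) + 1;
    char *copy = malloc(n);
    assert(copy != NULL);
    memcpy(copy, cstr, n);
    interned[ninterned++] = copy;
    return (internstr)copy;
}

/* Interned strings can be compared by pointer identity. */
static int istr_eq(internstr a, internstr b) { return a == b; }
```

With this shape, passing a `const char *` directly to `istr_eq` is an incompatible-pointer-type diagnostic in C, while `istr_cstr(intern("foo"))` hands the text back to any function that wants an ordinary string.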
* move intern() to mem.c | lemon | 2025-12-15 | 1 | -1/+0
| Being in lex.c was vestigial, since it was being used all over the frontend
| and backend.
* regalloc: fix lifetime construction for nested loops | lemon | 2025-12-15 | 1 | -17/+34
| Previously, given something like
| ```
|  1  a = ...
|  2  loop { // outer
|  3      b = do something with a
|  4      loop { // inner
|  5          ...
|  6          if (b < 0)
|  7              break 'inner;
|  8          if (b == 0)
|  9              return;
| 10          ...
| 11      }
| 12  }
| ```
| Regalloc thought outer goes from 2..6, because 6 is the last place where flow
| jumps directly back to 2. So `a` would have the lifetime [1,7). However, if
| neither the break nor the return is taken, the inner loop repeats and then
| control could flow back to 7 -> 3. But now the physical location for `a` might
| have been clobbered between 8..10, which is wrong.
|
| This fixes that by making sure the outer loop is considered to span 2..10.
|
| The way I went about it might not be the best way of doing it. I'm not 100%
| certain that it's fully correct and will always find the correct loopend,
| either.
|
| It's surprising it took this long to hit this edge case.
* only put dats can in .text now when emitting it | lemon | 2025-12-14 | 2 | -3/+3
|
* various relocation related optimization | lemon | 2025-12-14 | 1 | -4/+6
| With 59ca5a8db, querying if a symbol is defined is cheap. If we're compiling
| code that calls foo() and we defined foo() in this compilation unit, we
| already know its offset within the .text section, so use it instead of
| emitting a relocation for the linker to handle.
|
| Also, put small literal data in the .text section instead of .rodata. This
| seems to improve performance (cache locality?), and as a bonus, it will be
| good for aarch64's instr encoding with smallish PC-relative offsets.
* regalloc: fix bug with phi move of stack <- stack | lemon | 2025-12-13 | 2 | -6/+5
|
* Add -O optimization flag | lemon | 2025-12-13 | 1 | -2/+4
|
* fix position independent loads of function symbols. | lemon | 2025-12-13 | 3 | -6/+6
| For `extern int x[1];`, can use PCREL32 for &x. But for `extern int x(int)`,
| must use GOTREL, when not being called directly (that's PLT).
|
| Therefore the type of an external symbol (actually just whether it denotes a
| function) matters when deciding what kind of relocation to emit, so keep that
| information.
* rename arraylength macro -> countof | lemon | 2025-12-11 | 5 | -22/+22
|
* ir: bump MAXINSTR | lemon | 2025-12-10 | 1 | -1/+1
|
* parallel move; implement reg<->stack swp | lemon | 2025-12-10 | 1 | -3/+18
|
* regalloc: optimize a little edge case better | lemon | 2025-12-10 | 1 | -4/+6
| With two-address instructions one needs to make sure the dst doesn't get
| allocated to the same reg as the right-hand operand:
|
|     %r = mul %x, %y ; %y cannot be %r
|
| Except, if the operands are the same
|
|     %r = mul %x, %x ; if %x is dead after this instr, it's fine to allocate %r to the same reg
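The rule above can be sketched as a small predicate. This is an illustration under assumptions, not the allocator's actual code: `struct instr2`, its fields, and `dst_may_share_rhs_reg` are hypothetical names, and the liveness fact is passed in as a precomputed flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shape for `%dst = op %lhs, %rhs`; not the project's real IR. */
struct instr2 {
    int dst, lhs, rhs;    /* virtual register numbers */
    bool rhs_dead_after;  /* true if %rhs has no use after this instruction */
};

/* On a two-address target the instruction is lowered roughly as
 *     mov dst, lhs
 *     op  dst, rhs
 * so if dst and rhs share a register while lhs differs, the mov overwrites
 * rhs before it is read. Sharing is only safe when both operands are the
 * same temporary and that temporary dies at this instruction. */
static bool dst_may_share_rhs_reg(const struct instr2 *in)
{
    assert(in != NULL);
    if (in->lhs != in->rhs)
        return false;            /* %r = mul %x, %y: %y cannot be %r */
    return in->rhs_dead_after;   /* %r = mul %x, %x: fine if %x dies here */
}
```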
* misc fixes | lemon | 2025-12-10 | 1 | -1/+1
|
* rega: change assert for spilled callee. it's ok if nspill==1 | lemon | 2025-12-09 | 1 | -1/+1
|
* abi: fix aggregate passed by regs 2nd reg offset | lemon | 2025-12-06 | 2 | -24/+28
| It was broken for e.g. `struct { i32 a; f64 b; }` (would try to load/store b
| from byte offset 4, not 8).
|
| Introduce r2off, realize in x86-64 it's always 8; even `struct {i32 a; f32 b;}`
| gets passed in one (integer) register. But not so in (future) ABIs like
| RISC-V, I believe there `{i32, f32}` would get passed in 1 integer and 1 float
| register (r2off = 4).
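A small sketch of why the offset is 8 on x86-64, assuming the System V rule that small aggregates are split into 8-byte chunks. The names `X86_64_R2OFF` and `nregs_x86_64` are hypothetical; only `r2off` itself appears in the commit message. The `offsetof` value holds on targets where `double` is 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The two example aggregates from the commit message, spelled in C. */
struct i32_f64 { int32_t a; double b; };
struct i32_f32 { int32_t a; float  b; };

/* Assumed constant: on x86-64 the second register always carries the
 * bytes starting at offset 8, whatever the field types are. */
enum { X86_64_R2OFF = 8 };

/* Hypothetical helper: registers needed for a register-passed aggregate
 * of 1..16 bytes, one per 8-byte chunk. */
static int nregs_x86_64(size_t size)
{
    assert(size > 0 && size <= 16);
    return size > 8 ? 2 : 1;
}
```

Here `offsetof(struct i32_f64, b)` equals `X86_64_R2OFF` because of padding after `a`, while `struct i32_f32` is only 8 bytes and fits in a single register.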
* add command-line predefined macros (-D, -U) | lemon | 2025-12-06 | 1 | -2/+0
|
* ir: float fold div/0 | lemon | 2025-12-05 | 1 | -4/+3
|
* regalloc: kill dead defs of physical regs | lemon | 2025-12-04 | 1 | -8/+16
|
* c: make tentative definitions work | lemon | 2025-12-02 | 1 | -1/+1
|
* abi/isel: aggregate args in stack wip | lemon | 2025-11-27 | 1 | -9/+31
|
* regalloc: skip dead phis | lemon | 2025-11-26 | 1 | -1/+4
|
* ir: simplify some occurrences of single-argument phis | lemon | 2025-11-24 | 2 | -8/+17
|
* ir.h: tweak mkintrin() definition to work with tinycc | lemon | 2025-11-24 | 1 | -1/+1
|
* ir: implement cvtu64f. and other bug fixes | lemon | 2025-11-23 | 1 | -2/+35
| compiler is bootstrapping?! however, stage1 and stage2+ executables aren't
| bit-identical.. small differences in the codegen.. need to look into that
* implement cvtfXu64 by lowering it in builder | lemon | 2025-11-23 | 1 | -9/+46
| this should probably be in a separate pass?
* c: check actual reachability for non-void func may not return value | lemon | 2025-11-23 | 2 | -0/+22
|
* implement float varargs, and some other fixes | lemon | 2025-11-23 | 3 | -7/+17
|
* make sure indirect function call pointer does not end up in clobber reg | lemon | 2025-11-22 | 1 | -2/+2
|
* ir: freeblk: clear preds | lemon | 2025-11-22 | 1 | -0/+2
|
* ir/ir.c: fix assert in mkcallarg | lemon | 2025-11-22 | 1 | -1/+1
|
* ir/dump: initialize out buffer statically | lemon | 2025-11-22 | 1 | -3/+1
|