path: root/ir
Commit message | Author | Age | Files | Lines
* fix various warningslemon2026-02-243-9/+7
|
* inline: fix undefined value returnslemon2026-02-241-1/+2
| | | | | | Previously, if an inlined function had a return statement with no value (control flow reaching the closing brace of the function), the inlined body would use an invalid null reference. Turn it into undef.
* IR: just use an array for extended constantslemon2026-02-195-37/+25
| | | | | The extra work of using a hashtable to intern them is probably unnecessary.
* cfg: dominator computation should ignore blocks with no predecessorslemon2026-02-191-1/+1
| | | | | | | These didn't show up at this point before, but inlining (for example, of a noreturn function) can now introduce them, and the pass ordering means they wouldn't have been cleaned up before filldom(). An unreachable block having no dominator makes sense too.
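To illustrate the idea of a dominator computation that skips predecessor-less blocks, here is a minimal sketch using the simple iterative bitset formulation. It is not the project's filldom(); `struct cfg`, `filldom_sketch` and the size limits are hypothetical names chosen for the example, and block 0 is assumed to be the entry.

```c
#include <assert.h>
#include <stdint.h>

enum { MAXBLK = 32, MAXPRED = 4 };   /* hypothetical limits for the sketch */

struct cfg {
    int nblk;                        /* block 0 is assumed to be the entry */
    int npred[MAXBLK];
    int pred[MAXBLK][MAXPRED];
};

/* dom[b] is a bitset of the blocks that dominate b; 0 means "no dominator". */
static void filldom_sketch(const struct cfg *g, uint32_t dom[MAXBLK])
{
    assert(g->nblk >= 1 && g->nblk <= MAXBLK);
    uint32_t all = g->nblk == 32 ? UINT32_MAX : ((uint32_t)1 << g->nblk) - 1;

    dom[0] = 1;
    for (int b = 1; b < g->nblk; b++)
        dom[b] = g->npred[b] ? all : 0;      /* no predecessors: no dominator */

    for (int changed = 1; changed;) {
        changed = 0;
        for (int b = 1; b < g->nblk; b++) {
            if (g->npred[b] == 0)
                continue;                    /* ignore unreachable block */
            uint32_t d = all;
            int seen = 0;
            for (int i = 0; i < g->npred[b]; i++) {
                uint32_t pd = dom[g->pred[b][i]];
                if (pd == 0)
                    continue;                /* predecessor is itself unreachable */
                d &= pd;
                seen = 1;
            }
            d = seen ? (d | (uint32_t)1 << b) : 0;
            if (d != dom[b]) {
                dom[b] = d;
                changed = 1;
            }
        }
    }
}
```

With a CFG where block 3 has no predecessors but jumps into block 2, block 2 still ends up dominated by the entry and block 1, and block 3 gets an empty set.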
* ir: regalloc & x86-64 isel bugfixeslemon2026-02-181-1/+1
|
* ir/dump: sanity checklemon2026-02-181-0/+1
|
* ir/rpo: sanity checkslemon2026-02-181-4/+7
|
* ir: basic inlining pass implementationlemon2026-02-184-11/+333
|
* simpl: fix simplifying known cond brancheslemon2026-01-111-8/+3
|
* ir/simpl: stub out some code that wasn't properly tested and is brokenlemon2026-01-091-0/+2
| | | | I'll figure it out later, but I'd better not have a broken trunk
* codegen: eliminate redundant consecutive ret sequenceslemon2026-01-081-0/+1
|
* ir: only stub out float <-> u64 cvt on x86lemon2026-01-081-4/+3
| | | | hackish..
* irsimpl: optimize away cond branches after constant propagationlemon2026-01-071-21/+69
|
* basic CSElemon2026-01-044-0/+119
|
* ir bugfixeslemon2026-01-042-4/+14
|
* rega: fix spill copy of i32 -> i64lemon2026-01-043-6/+6
|
* backend: separate instrs for integer/float storelemon2025-12-317-19/+30
|
* aarch64 isel syms with offsetlemon2025-12-311-6/+3
|
* ir/builder: fix bug optimizing x+x as x-x -> 0lemon2025-12-261-2/+3
|
* avoid GOT relocations in unnecessary instanceslemon2025-12-253-7/+12
| | | | | Also change xcon to have a flagset for symbols (whether it's a function, locally defined; later: thread local, etc).
* ir: arena-backed linked list for uselistslemon2025-12-243-115/+90
| | | | | Much simpler than the growable buffers, and it seems to be just as efficient, if not a little faster, when benchmarked.
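As a rough illustration of what an arena-backed use list can look like, the sketch below pairs a bump allocator with a singly linked list of uses. All names here (`struct arena`, `arena_alloc`, `struct use`, `adduse`) are hypothetical and the layout is an assumption; the project's actual structures are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bump arena: memory is handed out linearly and freed all at once. */
struct arena { unsigned char *base; size_t used, cap; };

static void *arena_alloc(struct arena *a, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);   /* power of two */
    size_t off = (a->used + align - 1) & ~(align - 1);
    if (off > a->cap || size > a->cap - off)
        return NULL;                                    /* arena exhausted */
    a->used = off + size;
    return a->base + off;
}

/* One node per use; nodes are never freed individually. */
struct use { struct use *next; int user; };
struct value { struct use *uses; int nuse; };

/* Prepend a use in O(1); returns 0 if the arena is out of space. */
static int adduse(struct arena *a, struct value *v, int user)
{
    struct use *u = arena_alloc(a, sizeof *u, _Alignof(struct use));
    if (u == NULL)
        return 0;
    u->user = user;
    u->next = v->uses;
    v->uses = u;
    v->nuse++;
    return 1;
}
```

Compared with a growable buffer per value, there is no reallocation or capacity bookkeeping, and dropping every list is a single reset of `used`.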
* abi0: get rid of manual instruse reorderinglemon2025-12-241-12/+1
| | | | Vestigial, wasn't enough and we're sorting uses in mem2reg now.
* lower alloca as a separate pass before isellemon2025-12-234-0/+46
|
* ir: use BIT macro for regset (1<< is wrong for u64)lemon2025-12-232-6/+8
|
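The reason `1<<` is wrong for a 64-bit set: the literal `1` has type `int`, so the shift is undefined once the count reaches the width of `int` (typically 32). A minimal sketch of the fix follows; the `BIT` definition and the `regset` helpers are assumptions for illustration, since the project's real macro is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the project's BIT macro: widen before shifting. */
#define BIT(n) ((uint64_t)1 << (n))

typedef uint64_t regset;   /* assumed: one bit per physical register */

static inline regset rsadd(regset s, unsigned reg)
{
    assert(reg < 64);
    return s | BIT(reg);
}

static inline int rstest(regset s, unsigned reg)
{
    assert(reg < 64);
    return (s & BIT(reg)) != 0;
}
```

With this definition, register numbers 32 and above land in the upper half of the set instead of invoking undefined behavior.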
* simpl: handle multiplication by negative po2 toolemon2025-12-221-5/+9
|
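One way to picture the negative power-of-two case: multiplying by -(2^k) is a left shift followed by a negation. The helper below is a hypothetical sketch on wrapping 32-bit unsigned values, not the simplifier's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* x * -(2^k), computed as 0 - (x << k) in wrapping unsigned arithmetic. */
static uint32_t mul_neg_pow2(uint32_t x, unsigned k)
{
    assert(k < 32);
    return (uint32_t)(0u - (uint32_t)(x << k));
}
```

Working on unsigned values keeps the wraparound well defined; the bit pattern is the same as for a two's complement signed multiply.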
* simpl: optimize unsigned & signed division by power of 2lemon2025-12-213-19/+66
|
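For reference, the usual shape of this optimization: unsigned division by 2^k is a logical right shift, while signed division needs a bias of 2^k - 1 added to negative dividends so that the shift rounds toward zero like C's `/`. The functions below are a hypothetical sketch; they assume `>>` on a negative `int32_t` is an arithmetic shift, which is implementation-defined in C but holds on mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned x / 2^k is just a logical shift. */
static uint32_t udiv_pow2(uint32_t x, unsigned k)
{
    assert(k < 32);
    return x >> k;
}

/* Signed x / 2^k, rounding toward zero.
 * ASSUMPTION: right shift of a negative int32_t is arithmetic. */
static int32_t sdiv_pow2(int32_t x, unsigned k)
{
    assert(k >= 1 && k < 31);
    int32_t bias = x < 0 ? (int32_t)(((uint32_t)1 << k) - 1) : 0;
    return (x + bias) >> k;   /* cannot overflow: bias is only added to negatives */
}
```

Without the bias, `-7 >> 1` would give -4, whereas `-7 / 2` is -3.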
* ir: simpl: optimize some constant multiplicationslemon2025-12-212-18/+61
| | | | Reuse irbinop() and irunop() for the constant results cases.
* rega: fix 3ff0bfcblemon2025-12-211-4/+1
|
* driver: -fsyntax-onlylemon2025-12-201-1/+1
|
* rega: fix infinite loop when compiling infinite looplemon2025-12-201-1/+4
|
* some static assertions for packed type sizeslemon2025-12-201-0/+2
|
* backend: unify pass memory allocation strategieslemon2025-12-207-31/+21
| | | | | | It was all over the place for temporary data structures used by individual passes. Now there is an arena specifically for that, which is nicer.
* backend: general simplification pass skeletonlemon2025-12-203-1/+154
|
* copyopt: optimize same-arg phis with multiple predslemon2025-12-201-2/+6
|
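The check behind this kind of phi optimization can be sketched as follows: if every argument of a phi is the same value, the phi can be replaced by that value regardless of how many predecessors the block has. The representation (values as non-negative ints, -1 for "no single value") and the name `phi_same_arg` are assumptions for the example.

```c
#include <assert.h>

/* Returns the value all phi arguments agree on, or -1 if they differ.
 * Values are modeled as non-negative ints (hypothetical representation). */
static int phi_same_arg(const int *args, int nargs)
{
    assert(nargs > 0);
    for (int i = 1; i < nargs; i++)
        if (args[i] != args[0])
            return -1;
    return args[0];
}
```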
* ir/regalloc: struct alloc -> union alloclemon2025-12-201-16/+15
|
* ir: move some filluses() to ir.c, rename optmem.c -> mem2reg.clemon2025-12-194-30/+29
|
* ir: move cls2load to interfacelemon2025-12-183-7/+7
| | | | | There's plenty of code duplication like this around I'm looking to reduce.
* regalloc+emit: get rid of xsave/xrestore hacklemon2025-12-182-51/+63
| | | | | | | Was used for situations where we needed to spill more than one temporary and had to use a register that was already in use. Instead of push/pop, we can just allocate and set aside specific stack slots for this purpose. Also, reworked linearscan() interval sets to separate FPR/GPR intervals.
* rega: implement stack<->stack swap for parallel moveslemon2025-12-181-29/+34
|
* x86_64: for vararg calls, write to EAX in isellemon2025-12-181-8/+25
| | | | Also, in regalloc ensure fixed intervals are sorted
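Background for the vararg change: the System V x86-64 ABI expects `%al` to hold an upper bound on the number of vector registers used to pass arguments to a variadic function, and at most eight XMM registers carry arguments. A hypothetical helper computing that value might look like this (the `isfloat` array and the function name are illustrative, not taken from the project):

```c
#include <assert.h>

enum { MAX_XMM_ARGS = 8 };   /* xmm0..xmm7 carry floating-point arguments */

/* Value to load into %al before a variadic call: the number of
 * floating-point arguments passed in vector registers, capped at 8. */
static unsigned vararg_al_value(const unsigned char *isfloat, int nargs)
{
    assert(nargs >= 0);
    unsigned n = 0;
    for (int i = 0; i < nargs; i++)
        if (isfloat[i] && n < MAX_XMM_ARGS)
            n++;
    return n;
}
```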
* x86-64/emit: implement single-exit-point ret with jump threadinglemon2025-12-162-1/+3
|
* bitset: better implementation of bsiter() and stufflemon2025-12-162-2/+2
| | | | Also changed the type to size_t for portability
* mem2reg: fix obvious inefficiencylemon2025-12-161-16/+10
| | | | | | | deltrivialphis() was iterating over every variable instead of just looking at the variable being examined. And I'd been wondering why mem2reg was such a bottleneck for a testcase like sqlite3 amalgamation.. it's easy to miss the forest for the trees.
* create distinct interned string typelemon2025-12-153-10/+10
Interned strings are used pervasively, so it's a good idea to add a layer of type safety to differentiate them from general cstrs and avoid potential bugs from comparing non-interned and interned strings. Not that that's happened so far that I can remember, but it could.

I'm 90% sure it's legal to alias `struct {char c;}` pointers with `char` pointers. This specific typedef gives type safety but with a simple one-way `internstr -> const char *` typecast (with `&istr->c`). Converting the other way around is more intentional: a straight up cast `(internstr)cstr` which sticks out as unchecked and probably wrong, or calling the intern(cstr) function, which is the right way.
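A minimal sketch of the pattern described above, assuming a typedef along the lines of `const struct { char c; } *`. The names `internstr` and `intern()` come from the commit message; the struct tag, the `istr_cstr` helper and the toy linear-search table are hypothetical and only stand in for the real interning machinery.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Distinct pointer type: a plain `const char *` will not convert to it implicitly. */
typedef const struct internstr_ { char c; } *internstr;

/* The cheap one-way conversion back to a C string (hypothetical helper). */
static const char *istr_cstr(internstr s) { return &s->c; }

/* Toy intern table with linear search; a real one would use a hash table. */
enum { MAXINTERN = 64 };
static internstr interned[MAXINTERN];
static size_t ninterned;

static internstr intern(const char *s)
{
    for (size_t i = 0; i < ninterned; i++)
        if (strcmp(istr_cstr(interned[i]), s) == 0)
            return interned[i];          /* already interned: same pointer */
    assert(ninterned < MAXINTERN);
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    assert(copy != NULL);
    memcpy(copy, s, n);
    return interned[ninterned++] = (internstr)copy;
}
```

Two interned strings can then be compared with `==`, while passing an ordinary `char *` where an `internstr` is expected draws a compiler diagnostic.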
* move intern() to mem.clemon2025-12-151-1/+0
| | | | | Being in lex.c was vestigial, since it was being used all over the frontend and backend.
* regalloc: fix lifetime construction for nested loopslemon2025-12-151-17/+34
Previously, given something like

```
 1 a = ...
 2 loop { // outer
 3   b = do something with a
 4   loop { // inner
 5     ...
 6     if (b < 0)
 7       break 'inner;
 8     if (b == 0)
 9       return;
10     ...
11   }
12 }
```

Regalloc thought outer goes from 2..6, because 6 is the last place where flow jumps directly back to 2. So `a` would have the lifetime [1,7). However if neither the break nor return are taken, the inner loop repeats and then control could flow back to 7 -> 3. But now the physical location for `a` might have been clobbered between 8..10, which is wrong. This fixes that by making sure the outer loop is considered to span 2..10.

The way I went about it might not be the best way of doing it. I'm not 100% certain that it's fully correct and will always find the correct loopend, either. It's surprising it took this long to hit this edge case.
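One conservative way to get the effect described above is to widen any loop whose range contains the header of another loop until it covers that loop's end too. The sketch below is an assumption about how this could be done, not the approach taken in the commit; `struct loop`, `extend_loop_ends` and `live_end` are hypothetical names, and positions are inclusive linear instruction numbers.

```c
#include <assert.h>

struct loop { int start, end; };   /* inclusive linear positions */

/* If another loop's header lies inside loop i, loop i must also cover that
 * loop's end. Repeat until nothing changes. */
static void extend_loop_ends(struct loop *loops, int n)
{
    assert(n >= 0);
    for (int changed = 1; changed;) {
        changed = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (loops[j].start > loops[i].start &&
                    loops[j].start <= loops[i].end &&
                    loops[j].end > loops[i].end) {
                    loops[i].end = loops[j].end;
                    changed = 1;
                }
    }
}

/* Exclusive end of a lifetime: a value defined before a loop and used inside
 * it must stay live until the end of that loop. */
static int live_end(int def, int lastuse, const struct loop *loops, int n)
{
    int end = lastuse + 1;
    for (int i = 0; i < n; i++)
        if (def < loops[i].start &&
            lastuse >= loops[i].start && lastuse <= loops[i].end &&
            loops[i].end + 1 > end)
            end = loops[i].end + 1;
    return end;
}
```

On the example from the commit message, the outer loop starts as 2..6 and the inner as 4..10; after extension the outer loop spans 2..10, so `a` (defined at 1, used at 3) lives over [1,11) instead of [1,7). Over-extending a loop only lengthens lifetimes, so the approximation errs on the safe side.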
* only put dats can in .text now when emitting itlemon2025-12-142-3/+3
|
* various relocation-related optimizationslemon2025-12-141-4/+6
| | | | | | | | | | With 59ca5a8db, querying if a symbol is defined is cheap. If we're compiling code that calls foo() and we defined foo() in this compilation unit, we already know its offset within the .text section, so use it instead of emitting a relocation for the linker to handle. Also, put small literal data in the .text section instead of .rodata. This seems to improve performance (cache locality?), and as a bonus, it will be good for aarch64's instr encoding with smallish PC-relative offsets.
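To make the first point concrete: for an x86-64 `call rel32`, the displacement is the target offset minus the offset of the byte following the 4-byte field, so a call to a function already placed in the same .text section can be patched directly. The sketch below is illustrative only; `struct sym` and `patch_rel32` are hypothetical, and it assumes the symbol cannot be preempted at link or load time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical symbol record: whether it is defined in this unit, and where. */
struct sym { int defined; uint32_t textoff; };

/* Try to resolve a PC-relative 32-bit field at `fieldoff` within .text.
 * Returns 1 if patched in place, 0 if a relocation must be emitted instead. */
static int patch_rel32(unsigned char *text, uint32_t fieldoff, const struct sym *s)
{
    assert(text != 0);
    if (!s->defined)
        return 0;
    /* Relative to the end of the 4-byte field; wraps correctly for backward calls. */
    uint32_t disp = s->textoff - (fieldoff + 4);
    for (int i = 0; i < 4; i++)
        text[fieldoff + i] = (unsigned char)(disp >> (8 * i));   /* little-endian */
    return 1;
}
```

A call at offset 0 (opcode byte `0xe8`, field at offset 1) to a function at offset 0x10 gets the displacement 0x0b; a backward call yields a negative displacement in two's complement.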
* regalloc: fix bug with phi move of stack <- stacklemon2025-12-132-6/+5
|
* Add -O optimization flaglemon2025-12-131-2/+4
|