For those who have been following my adventure in the world of IRA, or just want to take advantage of some of the time I've spent on this, here's a detailed recap of my exploits and my current status (which I'm very happy with):
I initially thought IRA was calculating costs wrongly for our architecture and so I experimented with adding more costs in different places. This effectively raised the priority of coloring certain allocnos and it led to good performance on some benchmarks, but it was really a complete hack and not an acceptable solution, so I went back to the drawing board. After chatting with Vlad and Jeff, I realised IRA was calculating costs correctly and the issue was that the costs only reflect the preferences of each allocno, as opposed to their "agnosticism" (i.e. not caring what register they get), and so instructions that will happily take any register would take whichever one you showed them first; this meant that high priority agnostic instructions (that are colored first) were pinching BOTTOM_REGS before the lower-priority special instructions got a chance. To address this, I came up with the simple solution of changing the REG_ALLOC_ORDER, where I show TOP_CREGS (the top half of our register set) to agnostic instructions before the callee-save BOTTOM_REGS (c7-c15) by default, so they take TOP_CREGS when all other costs are equal; meanwhile, allocnos requiring BOTTOM_REGS were given them because they'd been left unallocated and their costs were lower than the TOP_CREGS. This resulted in good performance improvements on some key benchmarks, where previously agnostic instructions were unwittingly using up these special BOTTOM_REGS and leading to us allocating TOP_CREGS for operands in special instructions (which then required spills and moves to get a BOTTOM_REG in time to do the special instruction). The main drawback of this alternate REG_ALLOC_ORDER was that we reduce the chance to reuse a callee-save register if we pick a TOP_CREG, because it doesn't work on all instructions, and I ended up getting regressions on some low-pressure benchmarks because we used more callee-save registers than previously. I then tried various other ways of achieving the performance improvements I got on the high register pressure benchmarks and eliminating the regressions on the low-pressure ones: * I tried using '?' constraints on agnostic instructions, such that they erred to TOP_CREGS instead of callee-save BOTTOM_REGS, but this led to too much bias away from the more reusable BOTTOM_REGS (even when there was no need to avoid them) and I subsequently got regressions; * I tried restricting the effect of the '?', by not using it for the register preferencing pass and by only adding on the alt_cost if I'd determined the allocno never requires BOTTOM_REG, and this got some decent performance, but I was still effectively penalising agnostic instructions for using BOTTOM_REGS regardless of whether others allocnos were demanding them, so I got regressions; * I tried coloring the ones that need BOTTOM_REGS first, which maximises the reuse opportunities, but this was essentially a rejection of the existing tried-and-tested coloring heuristics in IRA and consequently led to some regressions; * I tried combining the new REG_ALLOC_ORDER with a modified coloring heuristic that uses the need for BOTTOM_REGS as a tie-breaker when two allocnos are otherwise indistinguishable, but this was overriding the ALLOCNO_NUM ordering (which I guess represents live range starting points?), so I got some regressions. Fortunately, I was hit by another epiphany on Monday night and I realised I could emulate the new REG_ALLOC_ORDER dynamically within assign_hard_reg and only when necessary. When allocating for an agnostic allocno, if it conflicted with other allocnos that needed BOTTOM_REGS, I added 1 to the full_cost of each of the BOTTOM_REGS so this allocno took a TOP_CREG. This gave me my best performance overall, but there were still some regressions on low-pressure benchmarks. Whilst investigating them, I realised I had gone beyond my original new REG_ALLOC_ORDER because I was now avoiding all BOTTOM_REGS instead of just the callee-save BOTTOM_REGS, so I changed assign_hard_reg so it only adds 1 to each callee-save BOTTOM_REG. This means we happily use up the caller-save BOTTOM_REGS (c1 - c6), regardless of whether the allocno is agnostic or not, and this leads to more reusability in low-pressure cases (whereas avoiding all BOTTOM_REGS for an agnostic instruction decreases reusability). Note that I had been experimenting with Priority algorithm and Chaitin-Briggs up until this point and have decided to stick with Priority for now because we are using Priority already and this minimises the disruption and reduces the risk of regressions on untested inputs. (Results generally show CB to be better performing and smaller code-size, but it suffers badly in very high pressure cases and spills more than Priority.) As of yesterday, I had two versions of the Priority algorithm IRA: one that biases against all BOTTOM_REGS when an agnostic allocno conflicts with one that wants BOTTOM_REGS; and one that biases against only the callee-save BOTTOM_REGS. The former got me better performance on high-pressure benchmarks and the latter got me better performance on the ones with some low pressure. Either were an improvement on what I started with (default IRA), but I wanted to have the best of both worlds and so I needed a hybrid that biases against only callee-save BOTTOM_REGS for low-pressure and biases against all BOTTOM_REGS for high-pressure. Sure, I could have put in some ugly hack that looks at the register pressure number and picks one or the other, but instead I came up with a more intelligent solution, which takes advantage of our single caller-save TOP_CREG (c16), which appears after the caller-save BOTTOM_REGS in our REG_ALLOC_ORDER. Now, when an agnostic allocno conflicts with one that wants BOTTOM_REGS, I add 1 to the cost of c16 too, so we still give out c1, c2 etc early on (because they have lower costs than all the callee-save regs and they appear earlier in the REG_ALLOC_ORDER than c16) to whichever allocno gets there first. But later, when all the caller-save regs are allocated, we will use our first callee-save reg (agnostic allocnos will take c17, ones that need BOTTOM_REGS will take c7). At this point, the future cost of this allocated callee-save reg becomes cheaper in future than the caller-save regs, and we opt to use this in preference to the caller-save regs (including c1-c6). The result is that, for high pressure, we are holding back all the BOTTOM_REGS for those that need them. The performance of this version is my best ever and I am very happy with it. It doesn't quite get the highest performance I've seen on one of the high- pressure benchmarks, because we may still be giving up some BOTTOM_REGS to agnostic instructions early on, but it's pretty pretty good otherwise. Thanks for reading and happy allocating!