Hi, I have been looking a bit into the vec4 spilling code and this series implements a few improvements. The main changes are in patches 1 and 4, which add small optimizations. The remaining patches are all minor changes.
Also, I noticed that enabling spilling of everything (which is what I used to test these changes) makes additional piglit tests fail. I did not look into why this happens, but this series seems to improve things slightly, probably because it saves a few scratch loads in some cases. Specifically, these are the results I get on my IvyBridge laptop for a full piglit run forcing spilling of everything in the vec4 backend:

With master:      crash: 5, fail: 205, pass: 18630, skip: 9187, warn: 3
With this series: crash: 5, fail: 191, pass: 18639, skip: 9192, warn: 3

Besides the changes implemented in this series, I also evaluated other ideas based on initial work by Ben. However, I ended up discarding them because they would not bring the benefits he was anticipating. I discuss the rationale for each one below:

1) Reuse the same vgrf for all the scratch reads

The current code allocates a new vgrf every time the spilled register is read by any instruction. Even if that increases the vgrf count, it is actually the key to successful spilling.

As far as I understand the register allocation process, we need to spill when there are conflicts between live vgrfs that cannot be allocated simultaneously. These conflicts ultimately come from the liveness analysis, and generally, the longer the life span of a vgrf, the more difficult it is to allocate. Once we decide to spill a register, we turn it into multiple short-lived vgrfs. Because these registers are short-lived, they are easy to allocate, so we reduce the average number of conflicts in the allocation process and get one step closer to success.
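The rewrite described above can be illustrated with a minimal sketch. It assumes a toy IR in which scratch messages are ordinary instructions; `Inst`, `SpillStats`, `spill_reg` and `live_span` are hypothetical names, not the code in brw_vec4_reg_allocate.cpp. Modeled loosely on the titles of patches 1 and 4 in this series, it emits one scratch read per reading instruction into a fresh temp and can optionally reuse the temp written by the immediately preceding instruction instead of reading it back.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy IR, for illustration only (not Mesa's vec4 IR):
// each instruction writes at most one vgrf and reads up to three.
enum Op { ALU, SCRATCH_READ, SCRATCH_WRITE };

struct Inst {
  Op op;
  int dst;     // vgrf written, or -1
  int src[3];  // vgrfs read, or -1
};

struct SpillStats {
  int reads = 0;   // scratch reads emitted
  int writes = 0;  // scratch writes emitted
};

// Rewrites all accesses to the spilled vgrf `reg` through fresh temps.
// Each instruction that reads `reg` gets ONE scratch read into a new temp,
// even if several of its operands read `reg`. If skip_read_after_write is
// set and the previous instruction just wrote `reg`, the temp it wrote is
// reused and no scratch read is emitted.
std::vector<Inst> spill_reg(const std::vector<Inst>& code, int reg,
                            int& next_vgrf, bool skip_read_after_write,
                            SpillStats& stats) {
  assert(reg >= 0 && reg < next_vgrf);
  std::vector<Inst> out;
  int last_written_temp = -1;  // temp holding `reg` if the previous inst wrote it
  for (Inst inst : code) {
    bool reads_reg = false;
    for (int s : inst.src) reads_reg = reads_reg || s == reg;

    if (reads_reg) {
      int temp = last_written_temp;
      if (!skip_read_after_write || temp == -1) {
        temp = next_vgrf++;
        out.push_back({SCRATCH_READ, temp, {-1, -1, -1}});
        stats.reads++;
      }
      for (int& s : inst.src)
        if (s == reg) s = temp;
    }

    last_written_temp = -1;
    if (inst.dst == reg) {
      // Write into a fresh temp, then store it to scratch right away.
      int temp = next_vgrf++;
      inst.dst = temp;
      out.push_back(inst);
      out.push_back({SCRATCH_WRITE, -1, {temp, -1, -1}});
      stats.writes++;
      last_written_temp = temp;
    } else {
      out.push_back(inst);
    }
  }
  return out;
}

// Distance (in instructions) between the first and last access to `vgrf`,
// or 0 if it is never accessed.
int live_span(const std::vector<Inst>& code, int vgrf) {
  int first = -1, last = -1;
  for (int i = 0; i < static_cast<int>(code.size()); i++) {
    bool used = code[i].dst == vgrf;
    for (int s : code[i].src) used = used || s == vgrf;
    if (used) {
      if (first == -1) first = i;
      last = i;
    }
  }
  return first == -1 ? 0 : last - first;
}
```

On a four-instruction example in which v0 is written by the first instruction, read twice by the second and once by the fourth, spilling v0 emits two scratch reads (one per instruction, not one per operand) and one scratch write. Every temp it introduces stays live for a single instruction, while v0 itself spanned three. With `skip_read_after_write` set, the read for the second instruction disappears.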
If, on the other hand, we reuse the same vgrf for all scratch reads of the spilled register, we end up with a vgrf that has exactly the same life span as the register we spilled, and therefore exactly the same conflicts. That is, we are back in the situation we started from, only now with one extra vgrf. It is even worse if we try to allocate a single vgrf for all our spills, since that just wouldn't work: as soon as we try to spill more than one operand of the same instruction, we would have a problem.

2) Allow spilling of registers with size > 1

I think this is useless in the vec4 backend, because by the time we reach register allocation there are no registers with size > 1: GRF array access is pushed to scratch, and then the split_virtual_grfs pass splits anything that still has size > 1 into registers with size = 1. To my shame, I only realized this after making the changes and noticing that a full piglit run never hit the case of registers with size > 1.

3) Make spilling costly for adjacent registers

The idea here was to increase the spilling cost of vgrfs that are written by one instruction and immediately used by the next. The obvious problem is that this only considers two instructions. If the same register is used again much later in the program, it could actually be the source of a lot of conflicts (because it is alive for all that time), and we want to spill it. One of the patches in my series is based on this idea, but instead of preventing the register from being spilled, it simply avoids emitting the scratch read for that operand.

The second problem is that the current algorithm that selects the best register to spill already accounts for this, in a broader and more useful way: it selects the vgrf with the best benefit / cost ratio.
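The selection rule described above can be sketched with a deliberately simplified model. This is a toy, not Mesa's implementation: benefit is just the number of overlapping live ranges, cost is the number of scratch accesses spilling would add, and `LiveRange`, `benefit` and `choose_spill_reg` are hypothetical names. The real allocator's benefit and cost computations are more involved and are not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model of a spill candidate (not Mesa's data structures):
// the vgrf is live from instruction `start` to instruction `end`, and
// spilling it would add `accesses` scratch messages.
struct LiveRange {
  int vgrf;
  int start;
  int end;
  int accesses;
};

// Simplification: two ranges interfere if they overlap; a range ending at
// the instruction where another begins is not counted as a conflict.
bool interferes(const LiveRange& a, const LiveRange& b) {
  return a.start < b.end && b.start < a.end;
}

// Benefit of spilling ranges[i]: how many other ranges it interferes with.
int benefit(const std::vector<LiveRange>& ranges, std::size_t i) {
  assert(i < ranges.size());
  int n = 0;
  for (std::size_t j = 0; j < ranges.size(); j++)
    if (j != i && interferes(ranges[i], ranges[j])) n++;
  return n;
}

// Returns the vgrf with the best benefit / cost ratio, or -1 if no
// candidate has a positive cost (treated here as "not spillable").
int choose_spill_reg(const std::vector<LiveRange>& ranges) {
  int best = -1;
  float best_ratio = 0.0f;
  for (std::size_t i = 0; i < ranges.size(); i++) {
    if (ranges[i].accesses <= 0) continue;
    float ratio = static_cast<float>(benefit(ranges, i)) / ranges[i].accesses;
    if (best == -1 || ratio > best_ratio) {
      best = ranges[i].vgrf;
      best_ratio = ratio;
    }
  }
  return best;
}
```

For instance, with live ranges [0,9], [1,2], [2,4], [3,8] and [6,7], each costing two accesses, the long-lived [0,9] range interferes with all four others (ratio 2.0) and is chosen, while [1,2], written and immediately read, interferes only with the long-lived one and gets the lowest ratio (0.5).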
The benefit is computed based on the number of interferences the vgrf produces, so for a short-lived register that is written once and immediately used, it computes a very small benefit, which makes it unlikely to be selected for spilling. (Actually, if this is the best candidate we can spill, we will fail to allocate anyway, since spilling a register like this won't get us any closer to a successful allocation!) On the other hand, if that same register is used again much later in the program, the algorithm will probably compute a high benefit, since as a long-lived register it is likely to cause a lot of interferences. In summary, the current algorithm seems to handle this case better.

Iago Toral Quiroga (5):
  i965/vec4: Only emit one scratch read per instruction for spilled registers
  i965/vec4: Remove checks for reladdr when checking for spillable registers
  i965/vec4: Register spilling should never see registers with size != 1
  i965/vec4: Don't emit scratch reads for a spilled register we have just written
  i965: Add a debug option for spilling everything in vec4 code

 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4.cpp             |  2 +-
 .../drivers/dri/i965/brw_vec4_reg_allocate.cpp     | 60 ++++++++++++++++++----
 src/mesa/drivers/dri/i965/intel_debug.c            |  3 +-
 src/mesa/drivers/dri/i965/intel_debug.h            |  5 +-
 5 files changed, 57 insertions(+), 15 deletions(-)

-- 
1.9.1

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev