https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103252

--- Comment #5 from Jason A. Donenfeld <jason at zx2c4 dot com> ---
> This one is fine/ok as GCC is using k0 as a spill register rather than
> spilling to memory. 32bit x86 has limited registers and all. There is nothing
> odd about this one even.

Right, okay, I see what's happening there. I suppose it's debatable whether using k0 is actually faster than spilling to the stack and letting the frontend optimize away the stack access, or whether the frontend misses that and there's a memory latency penalty for the spill. Presumably the risk of a penalty is considered too high.

For the original example, though, it doesn't seem to be saving a spill at all. The non-k0 code is clearly better than the k0 code. I don't know much about how the allocator works and interacts with the various optimizer passes, but I wonder whether spilling to a mask register should be given a higher cost than spilling to a GPR.
