yeah, I was mainly commenting on the questionable performance gains. We
can't just assume fewer instructions == more perf, as we don't really
know what changing the instructions actually means for performance.
And right, I wasn't really taking LoadPropagation into account, but it
seems that at least nvidia prefers XM
It seems multiplications by negative powers of two are nonexistent in
shader-db, so a specialized optimization for them would probably not be
worth it.
It seems my approach gives better instruction counts in shader-db than
your approach, since it can generate shorter (for things like a * 7) an
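For reference, one common way a multiply like a * 7 can be shortened is a shift followed by a subtract. This is only a sketch of that general identity on 32-bit values, not necessarily the sequence either approach in this thread emits; the function names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit integer registers with wraparound


def mul7_shift_sub(a: int) -> int:
    """Compute a * 7 as (a << 3) - a, wrapped to 32 bits."""
    a &= MASK32
    return ((a << 3) - a) & MASK32


def mul7_reference(a: int) -> int:
    """Plain 32-bit multiply by 7, used only to check the identity."""
    return ((a & MASK32) * 7) & MASK32


if __name__ == "__main__":
    # The two forms agree even when the shift overflows 32 bits.
    for value in (0, 1, 3, 0x7FFFFFFF, 0xFFFFFFFF):
        assert mul7_shift_sub(value) == mul7_reference(value)
    print(mul7_shift_sub(3))
```

Whether the shorter form is actually faster still depends on the hardware, which is the caveat raised earlier in the thread.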
I think we could do something else (which may even cover more cases):
1. try to use a shl (we already do that)
2. use shladd for all power-of-two negative immediates (are we already
doing it? I think we miss a lot of opts
where "worse" instructions could include modifier
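The two steps listed above could be sketched roughly as follows. This is a toy model, not the actual compiler pass: shladd is assumed here to compute (a << shift) + c with an optional negate modifier on the shifted source, and lower_mul_imm, shladd and is_pow2 are hypothetical names chosen for this example.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit integer registers with wraparound


def is_pow2(x: int) -> bool:
    """True for 1, 2, 4, 8, ..."""
    return x > 0 and (x & (x - 1)) == 0


def shladd(a: int, shift: int, c: int, neg_a: bool = False) -> int:
    """Assumed semantics: (+/-)(a << shift) + c, wrapped to 32 bits."""
    v = (a << shift) & MASK32
    if neg_a:
        v = (-v) & MASK32
    return (v + c) & MASK32


def lower_mul_imm(a: int, imm: int):
    """Return (chosen op, 32-bit result) for a * imm."""
    a &= MASK32
    if is_pow2(imm):
        # step 1: positive power of two becomes a plain shift
        return "shl", (a << (imm.bit_length() - 1)) & MASK32
    if imm < 0 and is_pow2(-imm):
        # step 2: negative power of two becomes shladd with a negated source
        return "shladd", shladd(a, (-imm).bit_length() - 1, 0, neg_a=True)
    # everything else falls back to a real multiply in this sketch
    return "mul", (a * imm) & MASK32


if __name__ == "__main__":
    for imm in (8, -4, 7):
        print(imm, lower_mul_imm(5, imm))
```

The negate modifier is what lets a single instruction cover the negative case; without it, a separate negation would be needed after the shift.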
Strongly mitigates the harm from the previous commit, which made many
integer multiplications much heavier on register usage and instruction
count.
total instructions in shared programs : 5839715 -> 5801926 (-0.65%)
total gprs used in shared programs: 670553 -> 669853 (-0.10%)
total shared us