On 10/11/24 5:40 PM, Andrew Waterman wrote:
>> Whether or not we should use vmv.v.i vs vmv.s.x for loading [-16..15]
>> into the 0th element is probably uarch dependent.  The tradeoff is
>> loading the GPR vs the broadcast in the vector unit.  I didn't bother
>> with this case.
>
> Note that this tradeoff is only interesting when LMUL is small.  When
> LMUL is large, vmv.v.i does a lot more work than vmv.s.x (writing
> multiple vector registers versus just one).

Very true and I would expect LMUL <= 1 to be the most common case.
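
For concreteness, the two sequences being compared look roughly like the sketch below. The register numbers, the immediate, and the vsetivli configuration are purely illustrative and are not taken from the patch.

```asm
# Illustrative vector configuration: 4 elements, SEW=32, LMUL=1
vsetivli  zero, 4, e32, m1, ta, ma

# Option A: broadcast the 5-bit immediate; writes every element up to vl
vmv.v.i   v8, 5

# Option B: load the constant into a GPR, then write only element 0
li        a0, 5
vmv.s.x   v8, a0
```

Option A avoids the GPR load but does the splat in the vector unit; Option B pays for the li but touches a single vector register regardless of LMUL.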


Mostly it's a matter of spotting something dumb and fixing it rather than having to answer questions later about dumb codegen. I doubt any of these cases matter in practice.


Jeff
