On 8/3/23 08:31, Kito Cheng wrote:
> I am working on that. It seems the cost of the vsetvli instruction becomes 0
> due to this change, so loop invariant motion no longer hoists vsetvli.
I haven't looked yet (generating baseline rvv.exp data right now).  But
before I went to bed last night I was worried that a change snuck
through that shouldn't have (changing the toplevel INSN/SET cost
handling -- that wasn't supposed to be in the commit).  I was too tired
to verify and correct without possibly mucking it up further.

That'll be the first thing to look at.  The costing change was supposed
to only affect if-then-else constructs, not sets in general.
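
To make the intended scope concrete, here is a minimal toy sketch in C. It is not GCC code: the `toy_*` names, the enum and the cost numbers are all hypothetical stand-ins invented for illustration. It only models the idea that the special cost applies when the source of a set is an if-then-else, that every other set keeps its ordinary cost, and that a pass which moves an invariant only when its cost is positive stops hoisting anything whose cost has dropped to 0.

```c
/* Toy stand-ins for RTL codes; these are NOT GCC's real definitions. */
enum toy_code { TOY_REG, TOY_PLUS, TOY_IF_THEN_ELSE, TOY_VSETVL };

/* A set reduced to the one thing this sketch looks at: its source code. */
struct toy_set { enum toy_code src_code; };

/* Hypothetical base costs in arbitrary units. */
static int toy_base_cost(enum toy_code code)
{
    switch (code) {
    case TOY_REG:          return 1;
    case TOY_PLUS:         return 4;
    case TOY_IF_THEN_ELSE: return 8;
    case TOY_VSETVL:       return 4;
    }
    return 4;
}

/* Only an if-then-else source gets the special cost (ite_cost);
   all other sets fall through to their base cost. */
static int toy_set_cost(const struct toy_set *set, int ite_cost)
{
    if (set->src_code == TOY_IF_THEN_ELSE)
        return ite_cost;
    return toy_base_cost(set->src_code);
}

/* Assumed hoisting rule for this toy: an invariant is only worth
   moving out of a loop if it has a positive cost. */
static int toy_worth_hoisting(const struct toy_set *set, int ite_cost)
{
    return toy_set_cost(set, ite_cost) > 0;
}
```

With the guard in place, a `TOY_VSETVL` set keeps its base cost and still counts as worth hoisting, whatever value is passed for `ite_cost`. Without the guard, every set would pick up the special cost, which is the kind of accidental change to general set costing described above.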


> If so, I think the simplest fix is adding more checks on the set
> cost - only check whether the SET_SRC is an if-then-else?
No, the simple fix is to just remove the errant part of the commit :-0 My tests aren't done, but that does seem to dramatically help. Given it wasn't supposed to go in as-is and it's causing major problems, I'll probably just rip it out even though my testing isn't done.



> Let me run the regression to see if that works. The current
> vsetvli cost is too high (4x~5x), but I think that should be fixed later
> with a more complete experiment.
Exactly. I think we need to do a full audit of the costing paths. I've been slowly devising a way to do that and I'll probably give it to Raphael or Jivan once I've fleshed it out a bit more in my head.

The goal is to make sure the costs are sensible and consistent across the different interfaces. A cost failure is actually a bit hard to find because all that happens is you get the wrong set of transformations -- but the code still works correctly, it's just not as efficient as it should be. It doesn't have to be perfect, but we've clearly got a problem.

WRT vsetvli costing. That may ultimately be something that's uarch dependent. We're working on the assumption that vsetvlis are common in the code stream and they need to be very efficient from the hardware standpoint (think as cheap or cheaper than any simple ALU instruction). I probably can't say what we're doing, but I bet it wouldn't be a surprise to others doing a high performance V implementation.

jeff
