On 06/09/2024 08:06, Robin Dapp wrote:
There were absolutely problems without this. It's a while ago now, so I'm
struggling with the details, but as GCC only applies the mask to selected
operations there were all sorts of issues that crept in. Zeroing the
undefined lanes seemed to match the middle end assumptions (or, at least it
made the UB consistent?) or maybe I read it in the code somewhere. Sorry,
it's years since I wrote that.

So we only found two instances of this problem and both were related to
_Bools.  In case you have more cases, it would be greatly appreciated
to verify the series with them.  If you don't mind, would it be possible
to comment out the zeroing, re-run the testsuite and check for FAILs?

I looked it up, and it was an execution failure in testcase gfortran.dg/assumed_rank_1.f90 that prompted me to add the initialization.

I believe I observed other cases of this too, but I can't find a list.

It shouldn't be too hard to run the test you suggest, but I won't have the results today.

This sounds like a generally good plan. Better than just zeroing it and
hoping that's right anyway. ;)

So, in theory, is it better if amdgcn allows both? Or is that one little
move immediate instruction in the backend going to produce better/cleaner
middle end code?

The new predicate is supposed to inform the vectorizer of what it "prefers",
i.e. what the hardware does anyway.  So if amdgcn leaves the inactive elements
undefined the predicate should only accept undefined as well.
Once the vectorizer requires zeros (or something other than undefined),
it will explicitly emit a zeroing merge/blend in gimple.  That way the
zeroing can easily be combined with surrounding code.

Of course amdgcn could also advertise zero and then always force a zero before
loading as you currently do.  That would be unconditional, though, and the
combination with surrounding RTL might also be a bit more difficult than when
it's exposed in gimple already.
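
To make the difference concrete, here is a toy model of the two options. It is
plain Python, not GCC code: masked_load, blend_zero and the "stale" vector are
names made up for this sketch, and the stale values only stand in for whatever
an undefined lane might happen to contain.

```python
# Toy model only: lists stand in for vector registers, and the "stale"
# argument stands in for the unknown contents of undefined lanes.

def masked_load(src, mask, stale):
    """Active lanes read from src; inactive lanes keep their stale contents."""
    return [s if m else old for s, m, old in zip(src, mask, stale)]


def blend_zero(vec, mask):
    """Explicit blend: keep active lanes, force inactive lanes to zero."""
    return [v if m else 0 for v, m in zip(vec, mask)]


src = [10, 20, 30, 40]
mask = [True, False, True, False]
stale = [7, 7, 7, 7]  # arbitrary values, assumed for illustration

# Option 1: the target reports "undefined"; inactive lanes hold leftovers.
loaded = masked_load(src, mask, stale)

# Option 2: a consumer that needs zeros adds a separate, visible blend step.
zeroed = blend_zero(loaded, mask)

print(loaded)
print(zeroed)
```

In this model a consumer that looks at every lane (a sum, say) only gives a
meaningful answer on the blended vector, which is why the zeroing has to exist
somewhere; the open question is only whether it is a separate step that can be
combined with other code or something hidden inside the load.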

OK, good to know, thanks!

Andrew
