On Thu, 05 Sep 2024 14:57:06 PDT (-0700), a...@baylibre.com wrote:
On Thu, 5 Sept 2024, 21:10 Robin Dapp, <rdapp....@gmail.com> wrote:

> > +(define_predicate "maskload_else_operand"
> > +  (and (match_code "const_int,const_vector")
> > +       (match_test "op == CONST0_RTX (GET_MODE (op))")))
>
> This forces maskload and mask_gather_load to only accept zero here, but
> in fact the hardware would allow us to accept any value (including
> undefined).
>
> Here's the expand code:
>
>    /* Masked lanes are required to hold zero.  */
>    emit_move_insn (operands[0], gcn_vec_constant (<MODE>mode, 0));
>
>    emit_insn (gen_gather<mode>_expr_exec (operands[0], addr, as, v,
>                                           operands[0], exec));

Where does this requirement come from?  Did you face problems when not
setting the masked lanes to zero?  I also realized the original problem
with a gather load, but, as Richi noticed in the meanwhile, only happens
with padding types.


There were absolutely problems without this. It's a while ago now, so I'm
struggling with the details, but as GCC only applies the mask to selected
operations there were all sorts of issues that crept in. Zeroing the
undefined lanes seemed to match the middle end assumptions (or, at least it
made the UB consistent?) or maybe I read it in the code somewhere. Sorry,
it's years since I wrote that.

Sorry to just randomly chime in here, but I bet Robin's asleep and I think you guys might just be talking past each other (and for some reason this one's poking my phone, so I kept seeing the messages).

[Also something's a bit broken with the thread in general, it looks like some bits didn't make inbox.sourceware.org. They're all in my gmail, so hopefully they're possible to dig up on your end. Here's the best top-level I could find: https://inbox.sourceware.org/gcc-patches/d3ddv8vm4ar5.usibt8rv5...@gmail.com/]

We ran into this unwritten "masked elements are zeroed" requirement in RISC-V as well, which resulted in a handful of bugs. The initial proposed workaround was to jam a bunch of zeroing into the backend, but that's both a bit inefficient and appears to have not fully worked due to some quirks of how the RISC-V vector extension configuration works <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115336>.

So Robin now has this patch set to adujst the middle end to be explicit about what values it assumes for the unmasked elements, which we're then taking advantage of on RISC-V by not inserting the pre-zeroing.

Generally the idea for the patch is to not require unconditional zeroing
everything but also enable undefined (like riscv) and leave the decision
to the vectorizer.  So it would only zero if needed.


This sounds like a generally good plan. Better than just zero it and hope
that's right anyway. ;)

So, in theory, is it better if amdgcn allows both? Or is that one little
move immediate instruction in the backend going to produce better/cleaner
middle end code?

IIUC the best code would come frome backends matching what the HW does and relying on the middle end to insert explicit ops to set the unmasked values where necessary. Unless the hardware has some faster way to insert the zeros on masked elements (which IIUC is how x86 and aarch64 work).

I think Robin was just trying to adjust the backends to have no functional changes while still respecting the new on-mask element pattern flavors.

Robin would know better, though.


Andrew

Reply via email to