On Thu, 5 Sept 2024, 21:10 Robin Dapp, <rdapp....@gmail.com> wrote:

> > > +(define_predicate "maskload_else_operand"
> > > +  (and (match_code "const_int,const_vector")
> > > +       (match_test "op == CONST0_RTX (GET_MODE (op))")))
> >
> > This forces maskload and mask_gather_load to only accept zero here, but
> > in fact the hardware would allow us to accept any value (including
> > undefined).
> >
> > Here's the expand code:
> >
> >    /* Masked lanes are required to hold zero.  */
> >    emit_move_insn (operands[0], gcn_vec_constant (<MODE>mode, 0));
> >
> >    emit_insn (gen_gather<mode>_expr_exec (operands[0], addr, as, v,
> >                                           operands[0], exec));
>
> Where does this requirement come from?  Did you face problems when not
> setting the masked lanes to zero?  I also realized the original problem
> with a gather load, but, as Richi noticed in the meanwhile, only happens
> with padding types.
>

There were absolutely problems without this. It's a while ago now, so I'm
struggling with the details, but as GCC only applies the mask to selected
operations there were all sorts of issues that crept in. Zeroing the
undefined lanes seemed to match the middle end assumptions (or, at least it
made the UB consistent?) or maybe I read it in the code somewhere. Sorry,
it's years since I wrote that.

Generally the idea for the patch is to not require unconditional zeroing
> everything but also enable undefined (like riscv) and leave the decision
> to the vectorizer.  So it would only zero if needed.


This sounds like a generally good plan. Better than just zero it and hope
that's right anyway. ;)

So, in theory, is it better if amdgcn allows both? Or is that one little
move immediate instruction in the backend going to produce better/cleaner
middle end code?

Andrew

Reply via email to