On Tue, Dec 27, 2022 at 9:46 PM Jeff Law <jeffreya...@gmail.com> wrote:
>
>
>
> On 12/19/22 00:44, Richard Biener wrote:
> > On Sat, Dec 17, 2022 at 2:54 AM Jeff Law via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >>
> >>
> >> On 12/16/22 18:44, 钟居哲 wrote:
> >>> Yes, VNx4DF has only 4 bits in its mask mode for loads and stores.
> >>> For example, with vlm or vsm we will load/store 8 bits??? (I am not sure
> >>> the hardware can load/store 4 bits, but I am sure it definitely does not
> >>> load/store the whole register size.)
> >> More likely than not you end up loading a larger quantity with the high
> >> bits zero'd.  Interesting that we're using a packed model.  I'd been
> >> told it was fairly expensive to implement in hardware relative to the
> >> cost of implementing the sparse model.
> >
> > Since the masks are extra inputs, if you use a packed model you need
> > to wire fewer bits into the execution units for the masks, which I guess
> > is actually cheaper.  Yes, producing the masks might be more complicated.
> We went through this at a prior employer and the hardware guys argued
> strongly that a packed model for mask registers was just too expensive
> to implement.  I don't think it was the # of wires, but the muxes.  The
> number of wires into the unit was an issue when we started talking about
> sub-byte masking :-)
>
> Conceptually on the hardware side each bit in the mask corresponds to a
> byte in a vector register.  When the element size is 8 bits, then
> obviously there is a 1:1 correspondence between potentially masked
> elements and bits in the mask register.
>
> When the element size is 32 bits, then there are 3 don't care bits in
> the mask register, then a single bit that is queried for masked
> operations.  So if you had a 128bit vector with 32 bits per element, a
> mask register might have a value like:
>
> 0xxx 1xxx 1xxx 0xxx
>
> A 128 bit vector with 64 bits per element might be:
>
> 0xxx xxxx 1xxx xxxx
>
> Where the xxxs are don't cares and the 0/1 are the masks.
>
>
>
> >
> > The only "issue" might be with 4, 2 and 1 bit masks, which would
> > have a size of 8 bits but a smaller precision, so endianness might
> > play a role.
> >
> > Btw, this is all similar to AVX512, where we don't even use
> > vector BI modes but integer modes for the mask, which
> > then becomes QImode for 1, 2, 4 and 8 bit masks and
> > HImode for 16, SImode for 32 and DImode for 64 bit masks.
> Right.  I think in hindsight that might have been a mistake.

Yes, vector BI modes would have been better here.
On GCN the mask is always DImode; that would have been the
other (better) alternative here, I think.

Richard.
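
The AVX512 mask-mode choice quoted above (QImode for 1, 2, 4 and 8 bit
masks, HImode for 16, SImode for 32, DImode for 64) can be sketched as a
small lookup.  This is only an illustration: mask_mode_bytes and
mask_mode_name are hypothetical helpers, not functions from the GCC
sources.

```c
#include <assert.h>

/* Hypothetical helper, not GCC code: size in bytes of the integer mode
   that holds an AVX512-style mask of NBITS one-bit elements.  */
static unsigned
mask_mode_bytes (unsigned nbits)
{
  assert (nbits >= 1 && nbits <= 64);
  if (nbits <= 8)
    return 1;   /* QImode: 1, 2, 4 and 8 bit masks all share one byte.  */
  if (nbits <= 16)
    return 2;   /* HImode */
  if (nbits <= 32)
    return 4;   /* SImode */
  return 8;     /* DImode */
}

/* Hypothetical helper: the mode name for the same mask.  */
static const char *
mask_mode_name (unsigned nbits)
{
  switch (mask_mode_bytes (nbits))
    {
    case 1: return "QImode";
    case 2: return "HImode";
    case 4: return "SImode";
    default: return "DImode";
    }
}
```

A 4 bit mask therefore occupies a full byte (size 8 bits, precision 4),
which is the case where the unused high bits come into play.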

>
> jeff
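
The sparse and packed mask layouts discussed in this thread can be
compared with a minimal sketch.  It assumes that, in the sparse model,
element i is controlled by mask bit i * (bytes per element), i.e. the
first bit of each group in the diagrams above, and that the remaining
bits of the group are don't cares.  The function names are hypothetical
and the bit numbering is an assumption, not a description of any
particular hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Sparse model (assumed layout): one mask bit per vector byte; element
   ELEM is controlled by the first bit of its group of ELEM_BYTES bits.  */
static unsigned
sparse_mask_bit (unsigned elem, unsigned elem_bytes)
{
  return elem * elem_bytes;
}

/* Packed model: one mask bit per element, with no gaps.  */
static unsigned
packed_mask_bit (unsigned elem)
{
  return elem;
}

/* Convert a sparse mask for a vector of VEC_BYTES bytes into packed
   form, ignoring the don't-care bits.  */
static uint64_t
sparse_to_packed (uint64_t sparse, unsigned vec_bytes, unsigned elem_bytes)
{
  assert (elem_bytes > 0 && vec_bytes <= 64 && vec_bytes % elem_bytes == 0);
  uint64_t packed = 0;
  unsigned nelems = vec_bytes / elem_bytes;
  for (unsigned i = 0; i < nelems; i++)
    if ((sparse >> sparse_mask_bit (i, elem_bytes)) & 1u)
      packed |= (uint64_t) 1 << packed_mask_bit (i);
  return packed;
}
```

For the 128 bit vector with 32 bit elements, the sparse mask spans 16
bits while the packed mask needs only 4; the conversion loop is the kind
of per-element selection the hardware would have to do with muxes.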
