On Tue, Dec 27, 2022 at 9:46 PM Jeff Law <jeffreya...@gmail.com> wrote:
>
> On 12/19/22 00:44, Richard Biener wrote:
> > On Sat, Dec 17, 2022 at 2:54 AM Jeff Law via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >> On 12/16/22 18:44, 钟居哲 wrote:
> >>> Yes, VNx4DF only has 4 bits in mask mode for loads and stores.
> >>> For example, with vlm or vsm we will load/store 8 bits ??? (I am not
> >>> sure the hardware can load/store 4 bits, but I am sure it definitely
> >>> does not load/store the whole register size)
> >> More likely than not you end up loading a larger quantity with the high
> >> bits zero'd.  Interesting that we're using a packed model.  I'd been
> >> told it was fairly expensive to implement in hardware relative to the
> >> cost of implementing the sparse model.
> >
> > Since the masks are extra inputs, if you use a packed model you need
> > to wire fewer bits into the execution units for the masks, which I guess
> > is actually cheaper.  Yes, producing the masks might be more complicated.
> We went through this at a prior employer and the hardware guys argued
> strongly that a packed model for mask registers was just too expensive
> to implement.  I don't think it was the # of wires, but the muxes.  The
> number of wires into the unit was an issue when we started talking about
> sub-byte masking :-)
>
> Conceptually on the hardware side each bit in the mask corresponds to a
> byte in a vector register.  When the element size is 8 bits, then
> obviously there is a 1:1 correspondence between potentially masked
> elements and bits in the mask register.
>
> When the element size is 32 bits, then for each element there are 3
> don't care bits in the mask register and a single bit that is queried
> for masked operations.  So if you had a 128 bit vector with 32 bits per
> element, a mask register might have a value like:
>
> 0xxx 1xxx 1xxx 0xxx
>
> A 128 bit vector with 64 bits per element might be:
>
> 0xxx xxxx 1xxx xxxx
>
> Where the xxxs are don't cares and the 0/1 are the masks.
>
> >
> > The only "issue" might be with 4, 2 and 1 bit masks, which would
> > have a size of 8 bits but a smaller precision, so endianness might
> > play a role.
> >
> > Btw, this is all similar to AVX512 where we don't even use
> > vector BI modes but integer modes for the mask, which
> > then becomes QImode for 1, 2, 4 and 8 bit masks and
> > HImode for 16, SImode for 32 and DImode for 64 bit masks.
> Right.  I think in hindsight that might have been a mistake.

Yes, vector BI modes would have been better here.  On GCN the mask is
always DImode; that would have been the other (better) alternative
here, I think.

Richard.

>
> jeff
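
For readers following the packed vs. sparse discussion above, here is a
minimal C sketch of the two mask layouts and of the integer-mode sizing
mentioned for AVX512.  It is only an illustration, not code from GCC or
from any hardware manual.  The function names (packed_to_sparse,
sparse_to_packed, mask_mode_bits) are hypothetical, and the sketch
assumes that mask bit 0 corresponds to byte 0 of the vector, that the
meaningful bit of each element is the one for its first byte, and that
the whole mask fits in 64 bits.  The thread does not specify the bit
order of the "0xxx 1xxx ..." diagrams, so that ordering is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Sparse model (assumed layout): one mask bit per vector byte; only the
   bit for the first byte of each element is queried.  This helper expands
   a packed mask (one bit per element) into that layout, leaving the
   don't-care bits as zero. */
static inline uint64_t packed_to_sparse(uint64_t packed, unsigned nelems,
                                        unsigned elem_bytes)
{
    assert(nelems * elem_bytes <= 64);   /* keep every shift below 64 */
    uint64_t sparse = 0;
    for (unsigned i = 0; i < nelems; i++)
        if ((packed >> i) & 1)
            sparse |= (uint64_t)1 << (i * elem_bytes);
    return sparse;
}

/* Inverse direction: read only the queried bit of each element and
   ignore whatever sits in the don't-care positions. */
static inline uint64_t sparse_to_packed(uint64_t sparse, unsigned nelems,
                                        unsigned elem_bytes)
{
    assert(nelems * elem_bytes <= 64);
    uint64_t packed = 0;
    for (unsigned i = 0; i < nelems; i++)
        if ((sparse >> (i * elem_bytes)) & 1)
            packed |= (uint64_t)1 << i;
    return packed;
}

/* Width in bits of the integer mode that would hold a packed mask of
   nelems bits, following the mapping quoted in the thread:
   QImode (8) for 1, 2, 4 and 8 bit masks, HImode (16), SImode (32),
   DImode (64). */
static inline unsigned mask_mode_bits(unsigned nelems)
{
    assert(nelems >= 1 && nelems <= 64);
    if (nelems <= 8)
        return 8;
    if (nelems <= 16)
        return 16;
    if (nelems <= 32)
        return 32;
    return 64;
}
```

Under these assumptions, a 128 bit vector with 32 bit elements and
elements 1 and 2 active has packed mask 0x6 and sparse mask 0x110, and
the packed form needs only an 8 bit (QImode-sized) container even though
its precision is 4 bits.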