On 12/19/22 00:44, Richard Biener wrote:
On Sat, Dec 17, 2022 at 2:54 AM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:



On 12/16/22 18:44, 钟居哲 wrote:
Yes, VNx4DF only has 4 bits in mask mode in case of load and store.
For example, with vlm or vsm we will load/store 8 bits ??? (I am not sure
the hardware can load/store 4 bits, but I am sure it definitely does not
load/store the whole register size)
More likely than not you end up loading a larger quantity with the high
bits zero'd.  Interesting that we're using a packed model.  I'd been
told it was fairly expensive to implement in hardware relative to the
cost of implementing the sparse model.

Since the masks are extra inputs, if you use a packed model you need
to wire fewer bits into the execution units for the masks, which I guess
is actually cheaper.  Yes, producing the masks might be more complicated.
We went through this at a prior employer and the hardware guys argued strongly that a packed model for mask registers was just too expensive to implement. I don't think it was the # of wires, but the muxes. The number of wires into the unit was an issue when we started talking about sub-byte masking :-)

Conceptually on the hardware side each bit in the mask corresponds to a byte in a vector register. When the element size is 8 bits, then obviously there is a 1:1 correspondence between potentially masked elements and bits in the mask register.

When the element size is 32 bits, then there are 3 don't care bits in the mask register, then a single bit that is queried for masked operations. So if you had a 128bit vector with 32 bits per element, a mask register might have a value like:

0xxx 1xxx 1xxx 0xxx

A 128 bit vector with 64 bits per element might be:

0xxx xxxx 1xxx xxxx

Where the xxxs are don't cares and the 0/1 are the masks.
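
As a rough illustration of the two layouts (a sketch only, not how any
particular hardware or GCC represents masks), the following Python builds
the sparse strings shown above and the equivalent packed form. The function
names are made up for this example, and the left-to-right element order
simply follows the way the strings are written above.

```python
def sparse_mask(active, elem_bytes):
    """Sparse model: one mask bit per vector byte.

    Only the first bit of each element's group is queried; the remaining
    elem_bytes - 1 bits are don't-cares, written here as 'x'.
    """
    groups = []
    for flag in active:
        groups.append(("1" if flag else "0") + "x" * (elem_bytes - 1))
    return "".join(groups)


def packed_mask(active):
    """Packed model: one mask bit per element, no don't-care bits."""
    return "".join("1" if flag else "0" for flag in active)


def group4(bits):
    """Insert a space every 4 characters, matching the notation above."""
    return " ".join(bits[i:i + 4] for i in range(0, len(bits), 4))


# 128-bit vector, 32-bit (4-byte) elements: 4 elements, 16 sparse mask bits.
print(group4(sparse_mask([0, 1, 1, 0], 4)))
# 128-bit vector, 64-bit (8-byte) elements: 2 elements, 16 sparse mask bits.
print(group4(sparse_mask([0, 1], 8)))
# The same four-element predicate in the packed model needs only 4 bits.
print(packed_mask([0, 1, 1, 0]))
```

In the sparse model the mask width depends only on the vector size in
bytes, while in the packed model it depends on the element count.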




The only "issue" might be with 4, 2 and 1 bit masks, which would
have a size of 8 bits but a lower precision, so endianness might
play a role.

Btw, this is all similar to AVX512 where we don't even use
vector BI modes but integer modes for the mask, which
then become QImode for 1, 2, 4 and 8 bit masks and
HImode for 16, SImode for 32 and DImode for 64 bit masks.
Right.  I think in hindsight that might have been a mistake.
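
For reference, the AVX512 mapping quoted above can be written down as a
small table. This is only a sketch in Python: the table and helper names
are invented for illustration, and the mode sizes assume the usual 8/16/32/64
bit widths of QImode/HImode/SImode/DImode. It also makes the size versus
precision gap for 1, 2 and 4 bit masks explicit.

```python
# Mask bit count -> (integer mode used for the mask, size of that mode in bits),
# following the mapping described in the quoted text.
MASK_MODES = {
    1: ("QImode", 8),
    2: ("QImode", 8),
    4: ("QImode", 8),
    8: ("QImode", 8),
    16: ("HImode", 16),
    32: ("SImode", 32),
    64: ("DImode", 64),
}


def mask_mode(nbits):
    """Return (mode name, mode size in bits) for an nbits-wide mask."""
    return MASK_MODES[nbits]


def unused_bits(nbits):
    """Bits of the mode that carry no mask information (size minus precision)."""
    _, size = MASK_MODES[nbits]
    return size - nbits


for n in sorted(MASK_MODES):
    name, size = mask_mode(n)
    print(f"{n:2d}-bit mask -> {name} (size {size}, unused {unused_bits(n)})")
```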

jeff
