On Mon, 13 Feb 2023, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > On Mon, 13 Feb 2023, juzhe.zh...@rivai.ai wrote:
> >
> >> >> But then GET_MODE_PRECISION (GET_MODE_INNER (..)) should always be 1?
> >> Yes, I think so.
> >> 
> >> Let's explain RVV more clearly.
> >> Let's suppose we have vector-length = 64bits in RVV CPU.
> >> VNx1BI is exactly 1 consecutive bits.
> >> VNx2BI is exactly 2 consecutive bits.
> >> VNx4BI is exactly 4 consecutive bits.
> >> VNx8BI is exactly 8 consecutive bits.
> >> 
> >> For VNx1BI (vbool64_t ), we load it wich this asm:
> >> vsetvl e8mf8
> >> vlm.v
> >> 
> >> For VNx2BI (vbool32_t ), we load it wich this asm:
> >> vsetvl e8mf4
> >> vlm.v
> >> 
> >> For VNx4BI (vbool16_t ), we load it wich this asm:
> >> vsetvl e8mf2
> >> vlm.v
> >> 
> >> For VNx8BI (vbool8_t ), we load it wich this asm:
> >> vsetvl e8m1
> >> vlm.v
> >> 
> >> In case of this code sequence:
> >> vbool16_t v4 = *(vbool16_t *)in;
> >> vbool8_t v3 = *(vbool8_t*)in;
> >> 
> >> Since VNx4BI (vbool16_t ) is smaller than VNx8BI (vbool8_t )
> >> We can't just use the data loaded by VNx4BI (vbool16_t ) in  VNx8BI 
> >> (vbool8_t ).
> >> But we can use the data loaded by VNx8BI (vbool8_t  ) in  VNx4BI 
> >> (vbool16_t ).
> >>
> >> In this example, GCC thinks data loaded for vbool8_t v3 can be replaced by 
> >> vbool16_t v4 which is already loaded
> >> It's incorrect for RVV.
> >
> > OK, so the 'vlm.v' instruction will zero the padding bits (according to
> > vsetvl), but I doubt the memory subsystem will not load a whole byte.
> >
> > Then GET_MODE_PRECISION of VNx4BI has to be smaller than 
> > GET_MODE_PRECISION of VNx8BI, even if their size is the same.
> >
> > I suppose that ADJUST_NUNITS should be able to do this, but then we
> > have in aarch64-modes.def
> >
> > VECTOR_BOOL_MODE (VNx16BI, 16, BI, 2);
> > VECTOR_BOOL_MODE (VNx8BI, 8, BI, 2);
> > VECTOR_BOOL_MODE (VNx4BI, 4, BI, 2);
> > VECTOR_BOOL_MODE (VNx2BI, 2, BI, 2);
> >
> > ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8);
> > ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4);
> > ADJUST_NUNITS (VNx4BI, aarch64_sve_vg * 2);
> > ADJUST_NUNITS (VNx2BI, aarch64_sve_vg);
> >
> > so all VNxMBI modes are 2 bytes in size but their component is always
> > BImode but IIRC the elements of VNx2BImode occupy 4 bits each?
> 
> Yeah.  Only the low bit is significant, so it's still a 1-bit element.
> But the padding is distributed evenly across the elements rather than
> being grouped at one end of the predicate.

I wonder what we'd do for a target that makes the high bit significant ;)

> > For riscv we have
> >
> > VECTOR_BOOL_MODE (VNx1BI, 1, BI, 1);
> > ADJUST_NUNITS (VNx1BI, riscv_v_adjust_nunits (VNx1BImode, 1));
> >
> > so here it would be natural to set the mode precision to
> > a poly-int computed by the component precision times nunits?  OTOH
> > we have to look at the component precision vs. size as well and
> >
> > /* Single bit mode used for booleans.  */ 
> > BOOL_MODE (BI, 1, 1); 
> >
> > BOOL_MODE is not documented, but its precision and size, so BImode
> > has a size of 1.  That makes VECTOR_BOOL_MODE very special since
> > the layout isn't derived from the component mode.  Deriving the
> > layout from the precision would make aarch64 incorrect and
> > would need BI2 and BI4 modes at least.
> 
> I think the elements have to stay BI for AArch64.  Using BI2 (with a
> precision of 2) would make both bits significant.

I think what's "wrong" with a BImode component mode is not the
precision but the size - we don't support bit-precision component
types on the GENERIC side but for bool vector modes we pack the
components to a bit size and aarch64 has varying bit sizes here
(and thus components with padding).  I don't think we support
modes with sizes less than a unit but since bool modes are special
we could re-purpose their precision to mean bitsize.

> I'm not sure the RVV case fits into the existing mode layout scheme.
> AFAIK we don't currently support vector modes with padding at one end.
> If that's right, the fix is likely to involve more than just tweaking
> the mode parameters.
> 
> What's the byte size of VNx1BI, expressed as a function of N?
> If it's CEIL (N, 8) then we don't have a way of representing that yet.

PARTIAL_VECTOR_MODE?  (ick)

Richard.

Reply via email to