On Mon, 13 Feb 2023, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > On Mon, 13 Feb 2023, juzhe.zh...@rivai.ai wrote:
> >
> >> >> But then GET_MODE_PRECISION (GET_MODE_INNER (..)) should always be 1?
> >> Yes, I think so.
> >>
> >> Let's explain RVV more clearly.
> >> Let's suppose we have vector-length = 64bits in RVV CPU.
> >> VNx1BI is exactly 1 consecutive bits.
> >> VNx2BI is exactly 2 consecutive bits.
> >> VNx4BI is exactly 4 consecutive bits.
> >> VNx8BI is exactly 8 consecutive bits.
> >>
> >> For VNx1BI (vbool64_t ), we load it with this asm:
> >> vsetvl e8mf8
> >> vlm.v
> >>
> >> For VNx2BI (vbool32_t ), we load it with this asm:
> >> vsetvl e8mf4
> >> vlm.v
> >>
> >> For VNx4BI (vbool16_t ), we load it with this asm:
> >> vsetvl e8mf2
> >> vlm.v
> >>
> >> For VNx8BI (vbool8_t ), we load it with this asm:
> >> vsetvl e8m1
> >> vlm.v
> >>
> >> In case of this code sequence:
> >> vbool16_t v4 = *(vbool16_t *)in;
> >> vbool8_t v3 = *(vbool8_t*)in;
> >>
> >> Since VNx4BI (vbool16_t ) is smaller than VNx8BI (vbool8_t )
> >> We can't just use the data loaded by VNx4BI (vbool16_t ) in VNx8BI
> >> (vbool8_t ).
> >> But we can use the data loaded by VNx8BI (vbool8_t ) in VNx4BI
> >> (vbool16_t ).
> >>
> >> In this example, GCC thinks data loaded for vbool8_t v3 can be replaced by
> >> vbool16_t v4 which is already loaded
> >> It's incorrect for RVV.
> >
> > OK, so the 'vlm.v' instruction will zero the padding bits (according to
> > vsetvl), but I doubt the memory subsystem will not load a whole byte.
> >
> > Then GET_MODE_PRECISION of VNx4BI has to be smaller than
> > GET_MODE_PRECISION of VNx8BI, even if their size is the same.
> >
> > I suppose that ADJUST_NUNITS should be able to do this, but then we
> > have in aarch64-modes.def
> >
> > VECTOR_BOOL_MODE (VNx16BI, 16, BI, 2);
> > VECTOR_BOOL_MODE (VNx8BI, 8, BI, 2);
> > VECTOR_BOOL_MODE (VNx4BI, 4, BI, 2);
> > VECTOR_BOOL_MODE (VNx2BI, 2, BI, 2);
> >
> > ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8);
> > ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4);
> > ADJUST_NUNITS (VNx4BI, aarch64_sve_vg * 2);
> > ADJUST_NUNITS (VNx2BI, aarch64_sve_vg);
> >
> > so all VNxMBI modes are 2 bytes in size but their component is always
> > BImode but IIRC the elements of VNx2BImode occupy 4 bits each?
>
> Yeah.  Only the low bit is significant, so it's still a 1-bit element.
> But the padding is distributed evenly across the elements rather than
> being grouped at one end of the predicate.

I wonder what we'd do for a target that makes the high bit significant ;)

> > For riscv we have
> >
> > VECTOR_BOOL_MODE (VNx1BI, 1, BI, 1);
> > ADJUST_NUNITS (VNx1BI, riscv_v_adjust_nunits (VNx1BImode, 1));
> >
> > so here it would be natural to set the mode precision to
> > a poly-int computed by the component precision times nunits?  OTOH
> > we have to look at the component precision vs. size as well and
> >
> > /* Single bit mode used for booleans.  */
> > BOOL_MODE (BI, 1, 1);
> >
> > BOOL_MODE is not documented, but its precision and size, so BImode
> > has a size of 1.  That makes VECTOR_BOOL_MODE very special since
> > the layout isn't derived from the component mode.  Deriving the
> > layout from the precision would make aarch64 incorrect and
> > would need BI2 and BI4 modes at least.
>
> I think the elements have to stay BI for AArch64.  Using BI2 (with a
> precision of 2) would make both bits significant.

I think what's "wrong" with a BImode component mode is not the precision
but the size - we don't support bit-precision component types on the
GENERIC side but for bool vector modes we pack the components to a
bit size and aarch64 has varying bit sizes here (and thus components
with padding).  I don't think we support modes with sizes less than
a unit but since bool modes are special we could re-purpose their
precision to mean bitsize.

> I'm not sure the RVV case fits into the existing mode layout scheme.
> AFAIK we don't currently support vector modes with padding at one end.
> If that's right, the fix is likely to involve more than just tweaking
> the mode parameters.
>
> What's the byte size of VNx1BI, expressed as a function of N?
> If it's CEIL (N, 8) then we don't have a way of representing that yet.

PARTIAL_VECTOR_MODE?  (ick)

Richard.
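
For reference, the size-versus-precision point in the RVV example can be sketched in plain C. This is only an illustration under assumptions taken from the thread (mask bits packed from the low end, padding zeroed by the load); `CEIL_DIV`, `mask_bytes` and `load_mask` are hypothetical names and are not part of GCC or of the RVV intrinsics.

```c
#include <stdint.h>

/* Round-up division, as in the CEIL (N, 8) mentioned above.  */
#define CEIL_DIV(x, y) (((x) + (y) - 1) / (y))

/* Hypothetical helper: bytes needed to store NBITS tightly packed
   mask bits.  */
static unsigned
mask_bytes (unsigned nbits)
{
  return CEIL_DIV (nbits, 8u);
}

/* Hypothetical model of a mask load from one byte of memory: keep the
   low NBITS bits and zero the padding above them.  The zeroing is an
   assumption taken from the discussion, not from the ISA manual.  */
static uint8_t
load_mask (uint8_t mem, unsigned nbits)
{
  if (nbits >= 8)
    return mem;
  return (uint8_t) (mem & ((1u << nbits) - 1u));
}
```

With a 64-bit vector length, mask_bytes gives 1 for 1, 2, 4 and 8 bits alike, so the byte size alone cannot tell the four modes apart.  With 0xA5 in memory, load_mask (0xA5, 4) yields 0x05 while load_mask (0xA5, 8) yields 0xA5: the wider load can stand in for the narrower one after masking, but not the other way round.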