On 22/11/24, Richard Henderson wrote:
> On 11/20/24 19:49, Anton Johansson wrote:
> > Adds new functions to the gvec API for truncating, sign- or zero
> > extending vector elements. Currently implemented as helper functions,
> > these may be mapped onto host vector instructions in the future.
> >
> > For the time being, allows translation of more complicated vector
> > instructions by helper-to-tcg.
> >
> > Signed-off-by: Anton Johansson <a...@rev.ng>
> > ---
> >  accel/tcg/tcg-runtime-gvec.c     | 41 +++++++++++++++++
> >  accel/tcg/tcg-runtime.h          | 22 +++++++++
> >  include/tcg/tcg-op-gvec-common.h | 18 ++++++++
> >  tcg/tcg-op-gvec.c                | 78 ++++++++++++++++++++++++++++++++
> >  4 files changed, 159 insertions(+)
> >
> > diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
> > index afca89baa1..685c991e6a 100644
> > --- a/accel/tcg/tcg-runtime-gvec.c
> > +++ b/accel/tcg/tcg-runtime-gvec.c
> > @@ -1569,3 +1569,44 @@ void HELPER(gvec_bitsel)(void *d, void *a, void *b, void *c, uint32_t desc)
> >      }
> >      clear_high(d, oprsz, desc);
> >  }
> > +
> > +#define DO_SZ_OP1(NAME, DSTTY, SRCTY)                                 \
> > +void HELPER(NAME)(void *d, void *a, uint32_t desc)                    \
> > +{                                                                     \
> > +    intptr_t oprsz = simd_oprsz(desc);                                \
> > +    intptr_t elsz = oprsz/sizeof(DSTTY);                              \
> > +    intptr_t i;                                                       \
> > +                                                                      \
> > +    for (i = 0; i < elsz; ++i) {                                      \
> > +        SRCTY aa = *((SRCTY *) a + i);                                \
> > +        *((DSTTY *) d + i) = aa;                                      \
> > +    }                                                                 \
> > +    clear_high(d, oprsz, desc);                                       \
>
> This formulation is not valid.
>
> (1) Generic forms must *always* operate strictly on columns.  This
> formulation is either expanding a narrow vector to a wider vector or
> compressing a wider vector to a narrow vector.
>
> (2) This takes no care for byte ordering of the data between columns.  This
> is where sticking strictly to columns helps, in that we can assume that data
> is host-endian *within the column*, but we cannot assume anything about the
> element indexing of ptr + i.
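
If I understand the column-strict requirement correctly, an in-place
variant would keep the same element type on both sides and only touch
data within each column, roughly like the sketch below (untested, the
helper name is made up, and it assumes the usual tcg-runtime-gvec.c
environment, i.e. HELPER(), simd_oprsz() and clear_high()):

    /* Hypothetical column-strict helper: zero-extend the low 32 bits of
     * each 64-bit column in place, in the style of the existing helpers
     * in accel/tcg/tcg-runtime-gvec.c.
     */
    void HELPER(gvec_zext32_in_col)(void *d, void *a, uint32_t desc)
    {
        intptr_t oprsz = simd_oprsz(desc);
        intptr_t i;

        for (i = 0; i < oprsz; i += sizeof(uint64_t)) {
            /* d[i] = a[i] & mask, with equal source and dest element type. */
            *(uint64_t *)(d + i) = *(uint64_t *)(a + i) & 0xffffffffull;
        }
        clear_high(d, oprsz, desc);
    }

Since index i only ever selects a whole 64-bit column here, this also
avoids the byte-ordering problem from (2).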

Concerning (1) and (2): is this a limitation imposed on generic vector
ops. to simplify mapping to host vector instructions, where the
padding/alignment of elements might differ?  From my understanding, the
helper above should be fine since we can assume contiguous elements?
But maybe it doesn't make sense to add a gvec op. that is only
implemented via a helper, I'm not sure.

> (3) This takes no care for element overlap if A == D.

Ah, good point!

> The only form of sign/zero-extract that you may add generically is an alias
> for
>
>     d[i] = a[i] & mask
>
> or
>
>     d[i] = (a[i] << shift) >> shift
>
> where A and D use the same element type.  We could add new tcg opcodes for
> these (particularly the second, for sign-extension), though x86_64 does not
> support it, afaics.

I see.  I don't think we can make this work for Hexagon vector ops.  As
an example, consider V6_vadduwsat, which performs an unsigned saturating
add of 32-bit elements.  Currently we emit

void emit_V6_vadduwsat(intptr_t vec2, intptr_t vec7, intptr_t vec6)
{
    VectorMem mem = {0};
    intptr_t vec5 = temp_new_gvec(&mem, 256);
    tcg_gen_gvec_zext(MO_64, MO_32, vec5, vec7, 256, 128, 256);
    intptr_t vec1 = temp_new_gvec(&mem, 256);
    tcg_gen_gvec_zext(MO_64, MO_32, vec1, vec6, 256, 128, 256);
    tcg_gen_gvec_add(MO_64, vec1, vec1, vec5, 256, 256);
    intptr_t vec3 = temp_new_gvec(&mem, 256);
    tcg_gen_gvec_dup_imm(MO_64, vec3, 256, 256, 4294967295ull);
    tcg_gen_gvec_umin(MO_64, vec1, vec1, vec3, 256, 256);
    tcg_gen_gvec_trunc(MO_32, MO_64, vec2, vec1, 128, 256, 128);
}

so we really do rely on the size-changing property of zext here: the
input vectors are 128 bytes and we expand them to 256 bytes.  We could
expand vector operations within the instruction to the largest vector
size, but we would then need to zext and trunc to the destination and
source registers anyway.

//Anton
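
P.S. For the two same-element-type aliases you mention, I assume we can
already spell them with the existing gvec ops along these lines
(untested sketch; dofs, aofs, oprsz and maxsz are placeholders):

    /* d[i] = a[i] & mask: zero-extend the low 32 bits of each
     * 64-bit element in place. */
    tcg_gen_gvec_andi(MO_64, dofs, aofs, 0xffffffffull, oprsz, maxsz);

    /* d[i] = (a[i] << shift) >> shift: sign-extend the low 32 bits of
     * each 64-bit element in place, as a left shift followed by an
     * arithmetic right shift. */
    tcg_gen_gvec_shli(MO_64, dofs, aofs, 32, oprsz, maxsz);
    tcg_gen_gvec_sari(MO_64, dofs, dofs, 32, oprsz, maxsz);

which I take to be the motivation for adding new tcg opcodes,
particularly for the sign-extending form.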