On 11/20/24 19:49, Anton Johansson wrote:
Adds new functions to the gvec API for truncating, sign- or zero
extending vector elements.  Currently implemented as helper functions,
these may be mapped onto host vector instructions in the future.

For the time being, allows translation of more complicated vector
instructions by helper-to-tcg.

Signed-off-by: Anton Johansson <a...@rev.ng>
---
  accel/tcg/tcg-runtime-gvec.c     | 41 +++++++++++++++++
  accel/tcg/tcg-runtime.h          | 22 +++++++++
  include/tcg/tcg-op-gvec-common.h | 18 ++++++++
  tcg/tcg-op-gvec.c                | 78 ++++++++++++++++++++++++++++++++
  4 files changed, 159 insertions(+)

diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
index afca89baa1..685c991e6a 100644
--- a/accel/tcg/tcg-runtime-gvec.c
+++ b/accel/tcg/tcg-runtime-gvec.c
@@ -1569,3 +1569,44 @@ void HELPER(gvec_bitsel)(void *d, void *a, void *b, void *c, uint32_t desc)
      }
      clear_high(d, oprsz, desc);
  }
+
+#define DO_SZ_OP1(NAME, DSTTY, SRCTY)                                      \
+void HELPER(NAME)(void *d, void *a, uint32_t desc)                         \
+{                                                                          \
+    intptr_t oprsz = simd_oprsz(desc);                                     \
+    intptr_t elsz = oprsz/sizeof(DSTTY);                                   \
+    intptr_t i;                                                            \
+                                                                           \
+    for (i = 0; i < elsz; ++i) {                                           \
+        SRCTY aa = *((SRCTY *) a + i);                                     \
+        *((DSTTY *) d + i) = aa;                                           \
+    }                                                                      \
+    clear_high(d, oprsz, desc);                                            \

This formulation is not valid.

(1) Generic forms must *always* operate strictly on columns. This formulation is either expanding a narrow vector to a wider vector or compressing a wider vector to a narrow vector.

(2) This takes no care for byte ordering of the data between columns. This is where sticking strictly to columns helps, in that we can assume that data is host-endian *within the column*, but we cannot assume anything about the element indexing of ptr + i.

(3) This takes no care for element overlap if A == D: a store to d[i] can clobber source elements that have not been read yet.
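
To make point (3) concrete, here is a minimal standalone sketch, outside of QEMU. The names widen_u8_to_u16 and in_place_widening_corrupts are hypothetical and exist only for this illustration; the loop mirrors the quoted macro with DSTTY = uint16_t and SRCTY = uint8_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the quoted loop: widen n bytes to n 16-bit elements. */
static void widen_u8_to_u16(void *d, const void *a, size_t n)
{
    assert(d != NULL && a != NULL);
    for (size_t i = 0; i < n; ++i) {
        uint8_t aa = *((const uint8_t *)a + i);
        /* When d == a, this store also overwrites source bytes 2i and 2i+1. */
        *((uint16_t *)d + i) = aa;
    }
}

/* Returns 1 if widening in place (A == D) loses the original element 1. */
static int in_place_widening_corrupts(void)
{
    const uint8_t src[8] = { 1, 2, 3, 4, 0, 0, 0, 0 };
    uint16_t buf[4];

    memcpy(buf, src, sizeof(buf));
    widen_u8_to_u16(buf, buf, 4);
    return buf[1] != 2;
}
```

The first store already overwrites source byte 1 before it is read, so element 1 never comes out as 2. What it comes out as instead depends on host byte order, which is the concern in point (2).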

The only form of sign/zero-extract that you may add generically is an alias for

  d[i] = a[i] & mask

or

  d[i] = (a[i] << shift) >> shift

where A and D use the same element type. We could add new tcg opcodes for these (particularly the second, for sign-extension), though x86_64 does not support it, afaics.
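
As a rough sketch of what these two column-wise forms compute, assuming 32-bit elements: the function names zext_cols32 and sext_cols32 and the bits parameter are hypothetical, not part of the gvec API. The sign-extending variant assumes that right-shifting a negative signed value is an arithmetic shift, which the C standard leaves implementation-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zero-extend the low `bits` bits of each 32-bit column: d[i] = a[i] & mask. */
static void zext_cols32(uint32_t *d, const uint32_t *a, size_t n, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    uint32_t mask = bits == 32 ? UINT32_MAX : (UINT32_C(1) << bits) - 1;

    for (size_t i = 0; i < n; ++i) {
        d[i] = a[i] & mask;
    }
}

/* Sign-extend the low `bits` bits of each column: d[i] = (a[i] << shift) >> shift. */
static void sext_cols32(int32_t *d, const int32_t *a, size_t n, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    unsigned shift = 32 - bits;

    for (size_t i = 0; i < n; ++i) {
        /* Shift left as unsigned to avoid signed-overflow undefined behaviour. */
        uint32_t shifted = (uint32_t)a[i] << shift;
        /* Assumption: >> on a negative int32_t is an arithmetic shift. */
        d[i] = (int32_t)shifted >> shift;
    }
}
```

Because source and destination elements have the same width, each column is read before it is written and no other column is touched, so d == a is safe and the element indexing order does not matter.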


r~
