On 2024/7/25 2:04 PM, Richard Henderson wrote:
On 7/17/24 23:39, Max Chou wrote:
+static inline QEMU_ALWAYS_INLINE void
+vext_continus_ldst_host(CPURISCVState *env, vext_ldst_elem_fn_host *ldst_host,
+                        void *vd, uint32_t evl, uint32_t reg_start, void *host,
+                        uint32_t esz, bool is_load)
+{
+#if TARGET_BIG_ENDIAN != HOST_BIG_ENDIAN
+    for (; reg_start < evl; reg_start++, host += esz) {
+        uint32_t byte_off = reg_start * esz;
+        ldst_host(vd, byte_off, host);
+    }
+#else
+    uint32_t byte_offset = reg_start * esz;
+    uint32_t size = (evl - reg_start) * esz;
+
+    if (is_load) {
+        memcpy(vd + byte_offset, host, size);
+    } else {
+        memcpy(host, vd + byte_offset, size);
+    }
+#endif
First, TARGET_BIG_ENDIAN is always false, so this reduces to
HOST_BIG_ENDIAN.
Second, even if TARGET_BIG_ENDIAN were true, this optimization would
be wrong, because of the element ordering given in vector_internals.h
(i.e. H1 etc).
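For context, the element-ordering macros being referred to follow this pattern (a sketch after the H macros in QEMU's vector_internals.h; the names H1_BE/H1_LE are mine and the exact definitions may differ):

```c
#include <stdint.h>

/* On a big-endian host, byte elements are swizzled within each 64-bit
 * lane, so element i of a byte vector lives at byte offset i ^ 7,
 * not at byte offset i. */
#define H1_BE(x) ((x) ^ 7)

/* On a little-endian host the macro is the identity, which is the
 * only case where a plain memcpy of the element bytes is valid. */
#define H1_LE(x) (x)
```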
Thanks for the suggestions.
I missed the element ordering here.
I'll fix this in v6.
Third, this can be done with C if, instead of cpp ifdef, so that you
always compile-test both sides.
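The shape being suggested is something like the following sketch (not the actual patch; HOST_BIG_ENDIAN is assumed to be a 0/1 macro as in QEMU's headers, and copy_elems/copy_region are illustrative names):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HOST_BIG_ENDIAN 0  /* stand-in for QEMU's definition */

/* Hypothetical per-element path: byte-swap each element. */
static void copy_elems(uint8_t *dst, const uint8_t *src,
                       uint32_t nelem, uint32_t esz)
{
    for (uint32_t i = 0; i < nelem; i++) {
        for (uint32_t b = 0; b < esz; b++) {
            dst[i * esz + b] = src[i * esz + (esz - 1 - b)];
        }
    }
}

static void copy_region(uint8_t *dst, const uint8_t *src,
                        uint32_t nelem, uint32_t esz)
{
    if (HOST_BIG_ENDIAN) {
        /* Both arms are parsed and type-checked no matter how
         * HOST_BIG_ENDIAN is defined; the dead arm is removed by
         * constant folding, so there is no runtime cost. */
        copy_elems(dst, src, nelem, esz);
    } else {
        memcpy(dst, src, (size_t)nelem * esz);
    }
}
```

With a cpp ifdef, the untaken side is never even parsed, so breakage there goes unnoticed until someone builds on the other kind of host.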
Fourth... what are the atomicity guarantees of RVV? I didn't
immediately see anything in the RVV manual, which suggests that the
atomicity is the same as that of individual integer loads of the same
size. Because memcpy makes no atomicity guarantees, you can only use
this optimization for byte loads/stores.
For big-endian bytes, you can optimize this to 64-bit little-endian
operations.
Compare arm gen_sve_ldr.
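The 64-bit chunking idea can be illustrated in plain C (this is a host-side sketch of the access pattern, not the TCG code that gen_sve_ldr actually emits; ldq_le/stq_le/copy_le64 are illustrative names):

```c
#include <stdint.h>

/* Load a 64-bit little-endian value from a possibly unaligned pointer. */
static uint64_t ldq_le(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Store a 64-bit value as little-endian bytes. */
static void stq_le(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = v & 0xff;
        v >>= 8;
    }
}

/* Copy the bulk of a byte region with 8-byte little-endian accesses,
 * then finish any tail byte by byte. For byte elements the
 * little-endian chunking preserves element order. */
static void copy_le64(uint8_t *dst, const uint8_t *src, uint32_t size)
{
    uint32_t i = 0;
    for (; i + 8 <= size; i += 8) {
        stq_le(dst + i, ldq_le(src + i));
    }
    for (; i < size; i++) {
        dst[i] = src[i];
    }
}
```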
Thanks for the suggestion.
I'll check arm gen_sve_ldr.
r~