Thanks for the helpful suggestion. The attached v2 patch tries to implement it.
It was bootstrapped and regtested on x86, aarch64, and Power 10, and also regtested on rv64gcv_zvl512b.  Those are all little-endian (sub)targets, though, so it would certainly be helpful if you could run additional aarch64 big-endian tests.  I haven't seen any scan-assembler failures so far.

Regards
 Robin

[PATCH v2] varasm: Use native_encode_rtx for constant vectors.

optimize_constant_pool hashes vector masks by native_encode_rtx and merges identically hashed values in the constant pool.  Afterwards the optimized values are written out in output_constant_pool_2.  However, native_encode_rtx and output_constant_pool_2 disagree in their encoding of vector masks: native_encode_rtx does not pad with zeroes while output_constant_pool_2 implicitly does.

In RVV's shuffle-evenodd-run.c there are two masks

  (a) "0101" for V4BI
  (b) "01010101" for V8BI

that have the same representation/encoding ("1010101") in native_encode_rtx.  output_constant_pool_2 uses "101" for (a) and "1010101" for (b).

Now, optimize_constant_pool might happen to merge both masks, keeping (a) as the representative.  output_constant_pool_2 will then output "1010", which is only valid for the first mask, as the implicit zero padding does not agree with (b).  (b)'s "1010101" works for both masks, since a V4BI load ignores the trailing four padding bits.

This patch makes output_constant_pool_2 use native_encode_rtx so that both functions agree on an encoding and the correct constant is output.

gcc/ChangeLog:

	* varasm.cc (output_constant_pool_2): Use native_encode_rtx for
	building the memory image of a const vector mask.
---
 gcc/varasm.cc | 40 +++++++++++++---------------------------
 1 file changed, 13 insertions(+), 27 deletions(-)

diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index 0068ec2ce4d..9e86efcd178 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -4301,34 +4301,20 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align)
     {
       gcc_assert (GET_CODE (x) == CONST_VECTOR);
 
-      /* Pick the smallest integer mode that contains at least one
-	 whole element.  Often this is byte_mode and contains more
-	 than one element.  */
-      unsigned int nelts = GET_MODE_NUNITS (mode);
-      unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts;
-      unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
-      scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
-      unsigned int mask = GET_MODE_MASK (GET_MODE_INNER (mode));
-
-      /* We allow GET_MODE_PRECISION (mode) <= GET_MODE_BITSIZE (mode) but
-	 only properly handle cases where the difference is less than a
-	 byte.  */
-      gcc_assert (GET_MODE_BITSIZE (mode) - GET_MODE_PRECISION (mode) <
-		  BITS_PER_UNIT);
-
-      /* Build the constant up one integer at a time.  */
-      unsigned int elts_per_int = int_bits / elt_bits;
-      for (unsigned int i = 0; i < nelts; i += elts_per_int)
+      auto_vec<target_unit, 128> buffer;
+      buffer.truncate (0);
+      buffer.reserve (GET_MODE_SIZE (mode));
+
+      bool ok = native_encode_rtx (mode, x, buffer, 0, GET_MODE_SIZE (mode));
+      gcc_assert (ok);
+
+      for (unsigned i = 0;
+	   i < GET_MODE_SIZE (mode) / GET_MODE_SIZE (byte_mode);
+	   i++)
 	{
-	  unsigned HOST_WIDE_INT value = 0;
-	  unsigned int limit = MIN (nelts - i, elts_per_int);
-	  for (unsigned int j = 0; j < limit; ++j)
-	    {
-	      auto elt = INTVAL (CONST_VECTOR_ELT (x, i + j));
-	      value |= (elt & mask) << (j * elt_bits);
-	    }
-	  output_constant_pool_2 (int_mode, gen_int_mode (value, int_mode),
-				  i != 0 ? MIN (align, int_bits) : align);
+	  unsigned HOST_WIDE_INT value = buffer[i];
+	  output_constant_pool_2 (byte_mode, gen_int_mode (value, byte_mode),
+				  i == 0 ? 0 : 1);
 	}
       break;
     }
-- 
2.47.1