"Robin Dapp" <rdapp....@gmail.com> writes: > [PATCH] varasm: Use native_encode_rtx for constant vectors. > > optimize_constant_pool hashes vector masks by native_encode_rtx and > merges identically hashed values in the constant pool. Afterwards the > optimized values are written in output_constant_pool_2. > > However, native_encode_rtx and output_constant_pool_2 disagree in their > encoding of vector masks: native_encode_rtx does not pad with zeroes > while output_constant_pool_2 implicitly does. > > In RVV's shuffle-evenodd-run.c there are two masks > (a) "0101" for V4BI > (b) "01010101" for V8BI and > that have the same representation/encoding ("1010101") in native_encode_rtx. > output_constant_pool_2 uses "101" for (a) and "1010101" for (b). > > Now, optimize_constant_pool might happen to merge both masks using > (a) as representative. Then, output_constant_pool_2 will output "1010" > which is only valid for the second mask as the implicit zero padding > doesn't agree with (b). > (b)'s "1010101" works for both masks as a V4BI load will ignore the last four > padding bits. > > This patch makes output_constant_pool_2 use native_encode_rtx so both > functions will agree on an encoding and output the correct constant. > > gcc/ChangeLog: > > * varasm.cc (output_constant_pool_2): Use native_encode_rtx for > building the memory image of a const vector mask. > --- > gcc/varasm.cc | 32 ++++++++++++-------------------- > 1 file changed, 12 insertions(+), 20 deletions(-)
Thanks for trying it this way. > diff --git a/gcc/varasm.cc b/gcc/varasm.cc > index 0068ec2ce4d..584d1864077 100644 > --- a/gcc/varasm.cc > +++ b/gcc/varasm.cc > @@ -4301,34 +4301,26 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, > unsigned int align) > { > gcc_assert (GET_CODE (x) == CONST_VECTOR); > > - /* Pick the smallest integer mode that contains at least one > - whole element. Often this is byte_mode and contains more > - than one element. */ > + auto_vec<target_unit, 128> buffer; > + buffer.truncate (0); > + buffer.reserve (GET_MODE_SIZE (mode)); > + > + bool ok = native_encode_rtx (mode, x, buffer, 0, GET_MODE_SIZE (mode)); > + gcc_assert (ok); > + This part looks good. > unsigned int nelts = GET_MODE_NUNITS (mode); > unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts; > unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT); > scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require (); > - unsigned int mask = GET_MODE_MASK (GET_MODE_INNER (mode)); > > - /* We allow GET_MODE_PRECISION (mode) <= GET_MODE_BITSIZE (mode) but > - only properly handle cases where the difference is less than a > - byte. */ > - gcc_assert (GET_MODE_BITSIZE (mode) - GET_MODE_PRECISION (mode) < > - BITS_PER_UNIT); > - > - /* Build the constant up one integer at a time. */ > - unsigned int elts_per_int = int_bits / elt_bits; > - for (unsigned int i = 0; i < nelts; i += elts_per_int) > + for (unsigned i = 0; > + i < GET_MODE_SIZE (mode) / GET_MODE_SIZE (int_mode); > + i += GET_MODE_SIZE (int_mode)) > { > unsigned HOST_WIDE_INT value = 0; > - unsigned int limit = MIN (nelts - i, elts_per_int); > - for (unsigned int j = 0; j < limit; ++j) > - { > - auto elt = INTVAL (CONST_VECTOR_ELT (x, i + j)); > - value |= (elt & mask) << (j * elt_bits); > - } > + memcpy (&value, buffer.address () + i, GET_MODE_SIZE (int_mode)); > output_constant_pool_2 (int_mode, gen_int_mode (value, int_mode), > - i != 0 ? MIN (align, int_bits) : align); > + align); But this part doesn't look endian-safe, since it mixes tha target endianness of the encoded buffer with the host endianness of "value" (via the the memcpy). If we wanted to contiue generating the constant in int_mode, we'd need to decode sections of the buffer back to int_mode first. That doesn't seem worth the effort though. How about just using the global byte_mode and iterating over every element of the buffer? The alignment would be "align" for i == 0 and 1 otherwise. That might require some scan-assembler updates, but it seems worth it. I can test for aarch64 testsuite updates next week if that'd help. Thanks, Richard