"Robin Dapp" <rdapp....@gmail.com> writes:
>> "Robin Dapp" <rdapp....@gmail.com> writes:
>>> But here I intended to just change the encoded value in memory and to
>>> prevent it from aliasing with other encoded values.
>>> In an actual instruction the values we "padded" (or that are beyond the
>>> vector length) are ignored.  For the purpose of hashing, a mask of
>>> "1010" and one of "10101010" should be distinct, which they are not
>>> right now because we don't stop at the length, so to say.
>>
>> But what modes do "1010" and "10101010" represent here?  And which code
>> treats them as equal?  (It should be OK if the hashes are equal, as long
>> as the equality function distinguishes them.)
>
> I realize I didn't fully detail that in the original post, sorry :/
>
> In my test case we have two functions, one with the first mask ("1010"
> for V4BI) and another one performing a similar operation, just for
> V8BI ("10101010").
>
> In optimize_constant_pool we use native_encode_rtx to encode the value
> into a buffer (of size GET_MODE_SIZE, which is one byte here).
> IIRC the whole function works at byte granularity.
>
> htab->find_slot_with_hash then finds the same hash for the V4BI and
> V8BI masks, and we wrongly unify them.

Ah, OK, thanks!  I don't remember seeing that code before.
In some ways, merging the entries seems like a nice feature... if it
worked correctly.  But optimize_constant_pool does mean that
native_encode_rtx and output_constant_pool_2 must agree on what the
byte image looks like, even for padding bits.

I wonder if we should instead get rid of:

    case MODE_VECTOR_BOOL:
      {
	gcc_assert (GET_CODE (x) == CONST_VECTOR);

	/* Pick the smallest integer mode that contains at least one
	   whole element.  Often this is byte_mode and contains more
	   than one element.  */
	unsigned int nelts = GET_MODE_NUNITS (mode);
	unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts;
	unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
	scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
	unsigned int mask = GET_MODE_MASK (GET_MODE_INNER (mode));

	/* We allow GET_MODE_PRECISION (mode) <= GET_MODE_BITSIZE (mode) but
	   only properly handle cases where the difference is less than a
	   byte.  */
	gcc_assert (GET_MODE_BITSIZE (mode) - GET_MODE_PRECISION (mode)
		    < BITS_PER_UNIT);

	/* Build the constant up one integer at a time.  */
	unsigned int elts_per_int = int_bits / elt_bits;
	for (unsigned int i = 0; i < nelts; i += elts_per_int)
	  {
	    unsigned HOST_WIDE_INT value = 0;
	    unsigned int limit = MIN (nelts - i, elts_per_int);
	    for (unsigned int j = 0; j < limit; ++j)
	      {
		auto elt = INTVAL (CONST_VECTOR_ELT (x, i + j));
		value |= (elt & mask) << (j * elt_bits);
	      }
	    output_constant_pool_2 (int_mode, gen_int_mode (value, int_mode),
				    i != 0 ? MIN (align, int_bits) : align);
	  }
	break;
      }

from output_constant_pool_2 and make it defer to native_encode_rtx
instead.  That seems like the most direct way of avoiding mishaps.

E.g. another way in which different routines could make different
choices is in whether, for SVE's VNx8BI (say), they fill the upper bit
of each 2-bit element with 0s or with a copy of the low bit.  Both
choices are valid in principle, and sharing the same code between both
routines would make sure that they make the same choice.

Thanks,
Richard