"H.J. Lu" <hjl.to...@gmail.com> writes:
> On 10/18/18, Richard Sandiford <richard.sandif...@arm.com> wrote:
>> "H.J. Lu" <hjl.to...@gmail.com> writes:
>>> On 10/18/18, Richard Sandiford <richard.sandif...@arm.com> wrote:
>>>> "H.J. Lu" <hjl.to...@gmail.com> writes:
>>>>> On 10/18/18, Richard Sandiford <richard.sandif...@arm.com> wrote:
>>>>>> "H.J. Lu" <hjl.to...@gmail.com> writes:
>>>>>>> On 10/17/18, Marc Glisse <marc.gli...@inria.fr> wrote:
>>>>>>>> On Wed, 17 Oct 2018, H.J. Lu wrote:
>>>>>>>>
>>>>>>>>> We may simplify
>>>>>>>>>
>>>>>>>>>  (subreg (vec_merge (vec_duplicate X) (vector) (const_int 1)) 0)
>>>>>>>>>
>>>>>>>>> to X when mode of X is the same as of mode of subreg.
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> we already have code to simplify vec_select(vec_merge):
>>>>>>>>
>>>>>>>>      /* If we select elements in a vec_merge that all come from the
>>>>>>>> same
>>>>>>>>         operand, select from that operand directly.  */
>>>>>>>>
>>>>>>>> It would make sense to me to make the subreg transform as similar to
>>>>>>>> it
>>>>>>>> as
>>>>>>>> possible, in particular you don't need to special case
>>>>>>>> vec_duplicate,
>>>>>>>> the
>>>>>>>> transformation would see that everything comes from the first
>>>>>>>> vector,
>>>>>>>> produce (subreg (vec_duplicate X) 0), and let another transformation
>>>>>>>> optimize that.
>>>>>>
>>>>>> Sorry, didn't see this before the OK.
>>>>>>
>>>>>>> What do you mean by another transformation? If simplify_subreg
>>>>>>> doesn't
>>>>>>> return X for
>>>>>>>
>>>>>>>   (subreg (vec_merge (vec_duplicate X)
>>>>>>>                      (vector)
>>>>>>>                      (const_int ((1 << N) | M)))
>>>>>>>           (N * sizeof (X)))
>>>>>>>
>>>>>>>
>>>>>>> no further transformation will be done.
>>>>>>
>>>>>> I think the point was that we should transform:
>>>>>>
>>>>>>   (subreg (vec_merge X
>>>>>>               (vector)
>>>>>>               (const_int ((1 << N) | M)))
>>>>>>    (N * sizeof (X)))
>>>>>>
>>>>>> into:
>>>>>>
>>>>>>   simplify_gen_subreg (outermode, X, innermode, byte)
>>>>>>
>>>>>> which should further simplify when X is a vec_duplicate.
>>>>>
>>>>> But sizeof (X) is the size of scalar of vec_dup.  How do we
>>>>> check the mask of vec_merge?
>>>>
>>>> Yeah, should be sizeof (outermode) (which was the same thing
>>>> in the original pattern, but not here).
>>>>
>>>> Richard
>>>>
>>>
>>> Like this
>>>
>>> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
>>> index b0cf3bbb2a9..e12b5c0e165 100644
>>> --- a/gcc/simplify-rtx.c
>>> +++ b/gcc/simplify-rtx.c
>>> @@ -6601,20 +6601,21 @@ simplify_subreg (machine_mode outermode, rtx op,
>>>        return NULL_RTX;
>>>      }
>>>
>>> -  /* Return X for
>>> -  (subreg (vec_merge (vec_duplicate X)
>>> +  /* Simplify
>>> +  (subreg (vec_merge (X)
>>>             (vector)
>>>             (const_int ((1 << N) | M)))
>>> -     (N * sizeof (X)))
>>> +     (N * sizeof (outermode)))
>>> +     to
>>> +  (subreg ((X) (N * sizeof (outermode)))
>>
>> Stray "(": (subreg (X) (N * sizeof (outermode)))
>>
>> OK with that change if it passes testing.
>
> The self-test failed for 32-bit compiler:
>
> expected: (reg:QI 342)
>   actual: (subreg:QI (vec_merge:V128QI (vec_duplicate:V128QI (reg:QI 342))
>         (reg:V128QI 343)
>         (const_int 65 [0x41])) 64)
>
> since
>
> && (UINTVAL (XEXP (op, 2)) & (HOST_WIDE_INT_1U << idx)) != 0)
>
> works only up to vectors with 64 elements for 32-bit compilers.

Since HOST_WIDE_INT should be 64 bits for both 32-bit and 64-bit
compilers, I guess the test invoked UB and so only worked by chance
for 64-bit compilers?

> Should we limit the self-test to vectors with 64 elements?

HOST_BITS_PER_WIDE_INT elements probably.  I wondered at first whether
we should try to support CONST_WIDE_INT, but that just changes the limit
to WIDE_INT_MAX_PRECISION.

Thanks,
Richard

Reply via email to