"H.J. Lu" <hjl.to...@gmail.com> writes: > On 10/18/18, Richard Sandiford <richard.sandif...@arm.com> wrote: >> "H.J. Lu" <hjl.to...@gmail.com> writes: >>> On 10/18/18, Richard Sandiford <richard.sandif...@arm.com> wrote: >>>> "H.J. Lu" <hjl.to...@gmail.com> writes: >>>>> On 10/18/18, Richard Sandiford <richard.sandif...@arm.com> wrote: >>>>>> "H.J. Lu" <hjl.to...@gmail.com> writes: >>>>>>> On 10/17/18, Marc Glisse <marc.gli...@inria.fr> wrote: >>>>>>>> On Wed, 17 Oct 2018, H.J. Lu wrote: >>>>>>>> >>>>>>>>> We may simplify >>>>>>>>> >>>>>>>>> (subreg (vec_merge (vec_duplicate X) (vector) (const_int 1)) 0) >>>>>>>>> >>>>>>>>> to X when mode of X is the same as of mode of subreg. >>>>>>>> >>>>>>>> Hello, >>>>>>>> >>>>>>>> we already have code to simplify vec_select(vec_merge): >>>>>>>> >>>>>>>> /* If we select elements in a vec_merge that all come from the >>>>>>>> same >>>>>>>> operand, select from that operand directly. */ >>>>>>>> >>>>>>>> It would make sense to me to make the subreg transform as similar to >>>>>>>> it >>>>>>>> as >>>>>>>> possible, in particular you don't need to special case >>>>>>>> vec_duplicate, >>>>>>>> the >>>>>>>> transformation would see that everything comes from the first >>>>>>>> vector, >>>>>>>> produce (subreg (vec_duplicate X) 0), and let another transformation >>>>>>>> optimize that. >>>>>> >>>>>> Sorry, didn't see this before the OK. >>>>>> >>>>>>> What do you mean by another transformation? If simplify_subreg >>>>>>> doesn't >>>>>>> return X for >>>>>>> >>>>>>> (subreg (vec_merge (vec_duplicate X) >>>>>>> (vector) >>>>>>> (const_int ((1 << N) | M))) >>>>>>> (N * sizeof (X))) >>>>>>> >>>>>>> >>>>>>> no further transformation will be done. >>>>>> >>>>>> I think the point was that we should transform: >>>>>> >>>>>> (subreg (vec_merge X >>>>>> (vector) >>>>>> (const_int ((1 << N) | M))) >>>>>> (N * sizeof (X))) >>>>>> >>>>>> into: >>>>>> >>>>>> simplify_gen_subreg (outermode, X, innermode, byte) >>>>>> >>>>>> which should further simplify when X is a vec_duplicate. >>>>> >>>>> But sizeof (X) is the size of scalar of vec_dup. How do we >>>>> check the mask of vec_merge? >>>> >>>> Yeah, should be sizeof (outermode) (which was the same thing >>>> in the original pattern, but not here). >>>> >>>> Richard >>>> >>> >>> Like this >>> >>> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c >>> index b0cf3bbb2a9..e12b5c0e165 100644 >>> --- a/gcc/simplify-rtx.c >>> +++ b/gcc/simplify-rtx.c >>> @@ -6601,20 +6601,21 @@ simplify_subreg (machine_mode outermode, rtx op, >>> return NULL_RTX; >>> } >>> >>> - /* Return X for >>> - (subreg (vec_merge (vec_duplicate X) >>> + /* Simplify >>> + (subreg (vec_merge (X) >>> (vector) >>> (const_int ((1 << N) | M))) >>> - (N * sizeof (X))) >>> + (N * sizeof (outermode))) >>> + to >>> + (subreg ((X) (N * sizeof (outermode))) >> >> Stray "(": (subreg (X) (N * sizeof (outermode))) >> >> OK with that change if it passes testing. > > The self-test failed for 32-bit compiler: > > expected: (reg:QI 342) > actual: (subreg:QI (vec_merge:V128QI (vec_duplicate:V128QI (reg:QI 342)) > (reg:V128QI 343) > (const_int 65 [0x41])) 64) > > since > > && (UINTVAL (XEXP (op, 2)) & (HOST_WIDE_INT_1U << idx)) != 0) > > works only up to vectors with 64 elements for 32-bit compilers.
Since HOST_WIDE_INT should be 64 bits for both 32-bit and 64-bit compilers, I guess the test invoked UB and so only worked by chance for 64-bit compilers? > Should we limit the self-test to vectors with 64 elements? HOST_BITS_PER_WIDE_INT elements probably. I wondered at first whether we should try to support CONST_WIDE_INT, but that just changes the limit to WIDE_INT_MAX_PRECISION. Thanks, Richard