Richard Sandiford <richard.sandif...@arm.com> writes:
> Jakub Jelinek <ja...@redhat.com> writes:
>> On Wed, Apr 14, 2021 at 05:31:23PM +0100, Richard Sandiford wrote:
>>> > +(define_split
>>> > +  [(set (match_operand:GPI 0 "register_operand")
>>> > + (LOGICAL:GPI
>>> > +   (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
>>> > +                        (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
>>> > +            (match_operand:GPI 4 "const_int_operand"))
>>> > +   (zero_extend:GPI (match_operand 3 "register_operand"))))]
>>> > +  "can_create_pseudo_p ()
>>> > +   && REG_P (operands[1])
>>> > +   && REG_P (operands[3])
>>> > +   && REGNO (operands[1]) == REGNO (operands[3])
>>> > +   && ((unsigned HOST_WIDE_INT)
>>> > +       trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
>>> > +                    << INTVAL (operands[2]), <MODE>mode)
>>> > +       == UINTVAL (operands[4]))"
>>> 
>>> IMO this would be easier to understand as:
>>> 
>>>    && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
>>>                        << INTVAL (operands[2]), <MODE>mode)
>>>        == INTVAL (operands[4]))
>>> 
>>> (At first I thought the cast and UINTVAL were trying to escape the
>>> sign-extension canonicalisation.)
>>
>> It is ok to write it that way, you're right, I wrote it with
>> UINTVAL etc. because I initially didn't use trunc_int_for_mode
>> but that is wrong for SImode if the mask is shifted into bit 31.
>>
>>> I'm not sure about this one though.  The REGNO checks mean that this is
>>> effectively for hard registers only.  I thought one of the reasons for
>>> make_more_copies was to avoid combining hard registers like this, so I'm
>>> not sure we should have a pattern that specifically targets them.
>>> 
>>> Segher, have I misunderstood?
>>
>> Yes, this one works only with the hard regs, the problem is that when
>> the hard regs are there, combiner doesn't try anything else, so without
>> such splitter it punts on that.
>> If I add yet another testcase which doesn't have hard registers, like:
>> unsigned
>> or_shift2 (void)
>> {
>>   unsigned char i = 0;
>>   asm volatile ("" : "+r" (i));
>>   return i | (i << 11);
>> }
>> then my patch doesn't handle that case, and the only splitter that would
>> help would need to deal with:
>> (set (reg/i:SI 0 x0)
>>     (ior:SI (and:SI (ashift:SI (subreg:SI (reg:QI 97 [ i ]) 0)
>>                 (const_int 11 [0xb]))
>>             (const_int 522240 [0x7f800]))
>>         (zero_extend:SI (reg:QI 97 [ i ]))))
>> I have added another combine splitter for this below.  But as you can
>> see, what combiner simplification comes with isn't really consistent
>> and orthogonal, different operations in there look quite differently :(.
>
> Hmm, OK.  Still, the above looks reasonable on first principles.
>
>>> These two look good to me apart from the cast nit.  The last one feels
>>> like it's more general than just sign_extends though.  I guess it would
>>> work for any duplicated operation that can be performed in a single
>>> instruction.
>>
>> True, but only very small portion of them can actually make it through,
>> it needs something that combine has been able to propagate into another
>> instruction.  So if we know about other insns that would look the same
>> and would actually be ever matched, we can e.g. define an operator predicate
>> for it, but until we have testcases for that, not sure it is worth it.
>>
>> Here is an updated patch that handles also the zero extends without hard
>> registers and doesn't have the UHWI casts (but untested for now except
>> for the testcase):
>>
>> 2021-04-14  Jakub Jelinek  <ja...@redhat.com>
>>
>>      PR target/100056
>>      * config/aarch64/aarch64.md (*<LOGICAL:optab>_<SHIFT:optab><mode>3):
>>      Add combine splitters for *<LOGICAL:optab>_ashl<mode>3 with
>>      ZERO_EXTEND, SIGN_EXTEND or AND.
>>
>>      * gcc.target/aarch64/pr100056.c: New test.
>>
>> --- gcc/config/aarch64/aarch64.md.jj 2021-04-13 20:41:45.030040848 +0200
>> +++ gcc/config/aarch64/aarch64.md    2021-04-14 19:07:41.641623978 +0200
>> @@ -4431,6 +4431,75 @@ (define_insn "*<LOGICAL:optab>_<SHIFT:op
>>    [(set_attr "type" "logic_shift_imm")]
>>  )
>>  
>> +(define_split
>> +  [(set (match_operand:GPI 0 "register_operand")
>> +    (LOGICAL:GPI
>> +      (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
>> +                           (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
>> +               (match_operand:GPI 4 "const_int_operand"))
>> +      (zero_extend:GPI (match_operand 3 "register_operand"))))]
>> +  "can_create_pseudo_p ()
>> +   && REG_P (operands[1])
>> +   && REG_P (operands[3])
>> +   && REGNO (operands[1]) == REGNO (operands[3])
>> +   && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
>> +                       << INTVAL (operands[2]), <MODE>mode)
>> +       == INTVAL (operands[4]))"
>> +  [(set (match_dup 4) (zero_extend:GPI (match_dup 3)))
>> +   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
>> +                               (match_dup 4)))]
>> +  "operands[4] = gen_reg_rtx (<MODE>mode);"
>> +)
>> +
>> +(define_split
>> +  [(set (match_operand:GPI 0 "register_operand")
>> +    (LOGICAL:GPI
>> +      (and:GPI (ashift:GPI (match_operator:GPI 4 "subreg_lowpart_operator"
>> +                             [(match_operand 1 "register_operand")])
>> +                           (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
>> +               (match_operand:GPI 3 "const_int_operand"))
>> +      (zero_extend:GPI (match_dup 1))))]
>> +  "can_create_pseudo_p ()
>> +   && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[1]))
>> +                       << INTVAL (operands[2]), <MODE>mode)
>> +       == INTVAL (operands[3]))"
>> +  [(set (match_dup 4) (zero_extend:GPI (match_dup 1)))
>> +   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
>> +                               (match_dup 4)))]
>> +  "operands[4] = gen_reg_rtx (<MODE>mode);"
>
> Minor, but it might be better to use operand 5 for the temporary here.
> Otherwise this looks good apart from the open question about whether
> we should be doing complex combinations involving hard regs.  Let's see
> what Segher says about that side.

OK, so based on what Segher said about possibly extending
make_more_copies for GCC 12, let's go with this for GCC 11.
It'll then be easy to remove the hard-register version for
GCC 12 if that's appropriate.

Thanks,
Richard

>> +)
>> +
>> +(define_split
>> +  [(set (match_operand:GPI 0 "register_operand")
>> +    (LOGICAL:GPI
>> +      (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
>> +                           (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
>> +               (match_operand:GPI 4 "const_int_operand"))
>> +      (and:GPI (match_dup 1) (match_operand:GPI 3 "const_int_operand"))))]
>> +  "can_create_pseudo_p ()
>> +   && pow2_or_zerop (UINTVAL (operands[3]) + 1)
>> +   && (trunc_int_for_mode (UINTVAL (operands[3])
>> +                       << INTVAL (operands[2]), <MODE>mode)
>> +       == INTVAL (operands[4]))"
>> +  [(set (match_dup 4) (and:GPI (match_dup 1) (match_dup 3)))
>> +   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
>> +                               (match_dup 4)))]
>> +  "operands[4] = gen_reg_rtx (<MODE>mode);"
>> +)
>> +
>> +(define_split
>> +  [(set (match_operand:GPI 0 "register_operand")
>> +    (LOGICAL:GPI
>> +      (ashift:GPI (sign_extend:GPI (match_operand 1 "register_operand"))
>> +                  (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
>> +      (sign_extend:GPI (match_dup 1))))]
>> +  "can_create_pseudo_p ()"
>> +  [(set (match_dup 4) (sign_extend:GPI (match_dup 1)))
>> +   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
>> +                               (match_dup 4)))]
>> +  "operands[4] = gen_reg_rtx (<MODE>mode);"
>> +)
>> +
>>  (define_insn "*<optab>_rol<mode>3"
>>    [(set (match_operand:GPI 0 "register_operand" "=r")
>>      (LOGICAL:GPI (rotate:GPI
>> --- gcc/testsuite/gcc.target/aarch64/pr100056.c.jj   2021-04-14 
>> 18:54:25.885626705 +0200
>> +++ gcc/testsuite/gcc.target/aarch64/pr100056.c      2021-04-14 
>> 19:00:00.837828080 +0200
>> @@ -0,0 +1,58 @@
>> +/* PR target/100056 */
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +/* { dg-final { scan-assembler-not {\t[us]bfiz\tw[0-9]+, w[0-9]+, 11} } } */
>> +
>> +int
>> +or_shift_u8 (unsigned char i)
>> +{
>> +  return i | (i << 11);
>> +}
>> +
>> +int
>> +or_shift_u3a (unsigned i)
>> +{
>> +  i &= 7;
>> +  return i | (i << 11);
>> +}
>> +
>> +int
>> +or_shift_u3b (unsigned i)
>> +{
>> +  i = (i << 29) >> 29;
>> +  return i | (i << 11);
>> +}
>> +
>> +int
>> +or_shift_s16 (signed short i)
>> +{
>> +  return i | (i << 11);
>> +}
>> +
>> +int
>> +or_shift_s8 (signed char i)
>> +{
>> +  return i | (i << 11);
>> +}
>> +
>> +int
>> +or_shift_s13 (int i)
>> +{
>> +  i = (i << 19) >> 19;
>> +  return i | (i << 11);
>> +}
>> +
>> +int
>> +or_shift_s3 (int i)
>> +{
>> +  i = (i << 29) >> 29;
>> +  return i | (i << 11);
>> +}
>> +
>> +int
>> +or_shift_u8_asm (unsigned char x)
>> +{
>> +  unsigned char i = x;
>> +  asm volatile ("" : "+r" (i));
>> +  return i | (i << 11);
>> +}
>>
>>
>>      Jakub

Reply via email to