Eric Botcazou <ebotca...@adacore.com> writes:
>> The comment says that we're trying to match:
>>
>>    1. (set (reg:SI) (zero_extend:SI (plus:QI (mem:QI) (const_int))))
>>    2. (set (reg:QI) (plus:QI (mem:QI) (const_int)))
>>    3. (set (reg:QI) (plus:QI (subreg:QI) (const_int)))
>>    4. (set (reg:CC) (compare:CC (subreg:QI) (const_int)))
>>    5. (set (reg:CC) (compare:CC (plus:QI (mem:QI) (const_int))))
>>    6. (set (reg:SI) (leu:SI (subreg:QI) (const_int)))
>>    7. (set (reg:SI) (leu:SI (subreg:QI) (const_int)))
>>    8. (set (reg:SI) (leu:SI (plus:QI ...)))
>>
>> And I think that's what we should be matching in cases where the
>> extension isn't redundant, even on RISC targets.
>
> Which one(s) exactly?  Most of the RISC targets we have are parameterized
> (WORD_REGISTER_OPERATIONS, PROMOTE_MODE, etc) to avoid operations in modes
> smaller than the word mode.
The first one, sorry.

>> The problem here isn't really about which mode is on the plus,
>> but whether we recognise that the extension instruction is redundant.
>> I.e. we start with:
>>
>>   (insn 9 8 10 2 (set (reg:SI 120)
>>           (plus:SI (subreg:SI (reg:QI 118) 0)
>>               (const_int -48 [0xffffffffffffffd0]))) test.c:6 -1
>>        (nil))
>>   (insn 10 9 11 2 (set (reg:SI 121)
>>           (and:SI (reg:SI 120)
>>               (const_int 255 [0xff]))) test.c:6 -1
>>        (nil))
>>   (insn 11 10 12 2 (set (reg:CC 100 cc)
>>           (compare:CC (reg:SI 121)
>>               (const_int 9 [0x9]))) test.c:6 -1
>>        (nil))
>>
>> and what we want combine to do is to recognise that insn 10 is redundant
>> and reduce the sequence to:
>>
>>   (insn 9 8 10 2 (set (reg:SI 120)
>>           (plus:SI (subreg:SI (reg:QI 118) 0)
>>               (const_int -48 [0xffffffffffffffd0]))) test.c:6 -1
>>        (nil))
>>   (insn 11 10 12 2 (set (reg:CC 100 cc)
>>           (compare:CC (reg:SI 120)
>>               (const_int 9 [0x9]))) test.c:6 -1
>>        (nil))
>>
>> But insn 10 is redundant on all targets, not just RISC ones.
>> It isn't about whether the target has a QImode addition or not.
>
> That's theoretical though since, on x86 for example, the redundant
> instruction isn't even generated because of the QImode addition...

Not for this testcase, sure, but we use an SImode addition and keep the
equivalent redundant extension until combine for:

  int
  foo (unsigned char *x)
  {
    return (((unsigned int) *x - 48) & 0xff) < 10;
  }

Immediately before combine:

  (insn 7 6 8 2 (parallel [
              (set (reg:SI 93 [ D.1753 ])
                  (plus:SI (reg:SI 92 [ D.1753 ])
                      (const_int -48 [0xffffffffffffffd0])))
              (clobber (reg:CC 17 flags))
          ]) /tmp/foo.c:3 261 {*addsi_1}
       (expr_list:REG_DEAD (reg:SI 92 [ D.1753 ])
          (expr_list:REG_UNUSED (reg:CC 17 flags)
              (nil))))
  (insn 8 7 9 2 (set (reg:SI 94 [ D.1753 ])
          (zero_extend:SI (subreg:QI (reg:SI 93 [ D.1753 ]) 0)))
       /tmp/foo.c:3 133 {*zero_extendqisi2}
       (expr_list:REG_DEAD (reg:SI 93 [ D.1753 ])
          (nil)))
  (insn 9 8 10 2 (set (reg:CC 17 flags)
          (compare:CC (reg:SI 94 [ D.1753 ])
              (const_int 9 [0x9]))) /tmp/foo.c:3 7 {*cmpsi_1}
       (expr_list:REG_DEAD (reg:SI 94 [ D.1753 ])
          (nil)))

What saves us isn't QImode addition but QImode comparison.  After combine:

  (insn 7 6 8 2 (parallel [
              (set (reg:SI 93 [ D.1753 ])
                  (plus:SI (reg:SI 92 [ D.1753 ])
                      (const_int -48 [0xffffffffffffffd0])))
              (clobber (reg:CC 17 flags))
          ]) /tmp/foo.c:3 261 {*addsi_1}
       (expr_list:REG_DEAD (reg:SI 92 [ D.1753 ])
          (expr_list:REG_UNUSED (reg:CC 17 flags)
              (nil))))
  (note 8 7 9 2 NOTE_INSN_DELETED)
  (insn 9 8 10 2 (set (reg:CC 17 flags)
          (compare:CC (subreg:QI (reg:SI 93 [ D.1753 ]) 0)
              (const_int 9 [0x9]))) /tmp/foo.c:3 5 {*cmpqi_1}
       (expr_list:REG_DEAD (reg:SI 93 [ D.1753 ])
          (nil)))

        movzbl  (%rdi), %eax
        subl    $48, %eax
        cmpb    $9, %al
        setbe   %al
        movzbl  %al, %eax
        ret

(The patch didn't affect things here.)

FWIW, change the testcase to:

  int
  foo (unsigned char *x)
  {
    return (((unsigned int) *x - 48) & 0x1ff) < 10;
  }

and we keep the redundant AND, again regardless of whether the patch
is applied.

>> Well, I think making the simplify-rtx code conditional on the target
>> would be the wrong way to go.  If we really can't live with it being
>> unconditional then I think we should revert it.  But like I say I think
>> it would be better to make combine recognise the redundancy even with
>> the new form.  (Or as I say, longer term, not to rely on combine to
>> eliminate redundant extensions.)  But I don't have time to do that
>> myself...
>
> It helps x86 so we won't revert it.  My fear is that we'll need to add
> code in other places to RISCify back the result of this
> "simplification".
But that's the problem with trying to do the optimisation in this way.
We first simplify a truncation of an SImode addition X.  Then we
simplify a zero extension of that truncation.  Then we have the
opportunity to realise that the zero extension wasn't necessary after
all, so we actually want to undo both simplifications and go back to
the original addition pattern.

So undoing the simplifications is precisely what we're aiming for here,
again regardless of RISCness.

Thanks,
Richard
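
P.S. For anyone who wants to convince themselves of the equivalence
being relied on in the 0xff testcase above: the explicit extension
(insn 8 in the x86 dump, or the AND with 255 in the first sequence)
contributes nothing to the final result, because only the low byte of
the subtraction matters once the comparison is done in QImode.  A
throwaway self-check, purely illustrative and not part of any patch
or testcase:

  #include <assert.h>

  int
  main (void)
  {
    for (unsigned int c = 0; c <= 0xff; c++)
      {
        /* What is computed before combine: an SImode subtraction,
           a zero extension of its low byte, then the comparison.  */
        unsigned int extended = ((c - 48) & 0xff) < 10;

        /* What is left after combine: the same SImode subtraction,
           with the comparison done directly on the low byte (cmpb).  */
        unsigned int narrowed = (unsigned char) (c - 48) < 10;

        assert (extended == narrowed);
      }
    return 0;
  }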