On Feb 18, 2015, at 4:42 PM, Kyrill Tkachov <kyrylo.tkac...@arm.com> wrote:

> 
> On 18/02/15 12:32, Maxim Kuvyrkov wrote:
>> On Feb 18, 2015, at 2:35 PM, Kyrill Tkachov <kyrylo.tkac...@arm.com> wrote:
>> 
>>> Hi all,
>>> 
>>> This patch fixes a wrong-code bug with the *aarch64_lshr_sisd_or_int_<mode>3
>>> pattern and its associated splitters. The problem is that for the 2nd
>>> alternative it will split a right-shift into a SISD left-shift by the 
>>> negated
>>> amount to be shifted by (the ushl instruction allows such semantics).
>>> The splitter generates this RTL:
>>> 
>>> (set (match_dup 2)
>>>       (unspec:QI [(match_dup 2)] UNSPEC_SISD_NEG))
>>> (set (match_dup 0)
>>>       (unspec:SI [(match_dup 1) (match_dup 2)] UNSPEC_USHL_2S))
>>> 
>>> The problem here is that the shift amount register is negated without 
>>> telling
>>> the register allocator about it (and it can't figure it out itself).
>>> So if you try to use the register that operand 2 is assigned to later on,
>>> you get the negated shift amount instead!
>>> 
>>> The testcase in the patch demonstrates the simple code that can get 
>>> miscompiled
>>> due to this behaviour.
>>> 
>>> The solution in this patch is to negate the shift amount into the output
>>> operand (operand 0) and mark it as an earlyclobber in that alternative.
>>> This is actually exactly what the very similar
>>> *aarch64_ashr_sisd_or_int_<mode>3 pattern does below.
>>> I believe this is the safest and simplest fix at this stage.
>>> 
>>> This bug was exposed on the Linaro 4.9 branch that happened to have the 
>>> perfect
>>> storm of costs and register pressure and ended up miscompiling
>>> the TEST_BIT macro in ira-costs.c during a build of trunk by the generated
>>> Linaro compiler, generating essentially code like:
>>> 
>>> .L141:
>>>    neg    d8, d8               //d8 negated!
>>>    ushl    v0.2s, v11.2s, v8.2s // shift right => shift left by neg amount
>>>    fmov    w0, s0
>>>        <...irrelevant code...>
>>>        b    .L140
>>> <...>
>>> .L140:
>>>    fmov    w0, s8  // s8/d8 used and incremented assuming it had not 
>>> changed at L141
>>>    add    w0, w0, 1
>>>    fmov    s8, w0
>>>    fmov    w1, s10
>>>    cmp    w0, w1
>>>    bne    .L141
>>> 
>>> 
>>> Basically d8 is negated and later used as if it had not been at .L140 
>>> leading
>>> to completely wrong behaviour.
>>> 
>>> With this patch that particular part of the assembly now contains at L141:
>>>        neg     d0, d8
>>>        ushl    v0.2s, v11.2s, v0.2s
>>>        fmov    w0, s0
>>> 
>>> Leaving the original shift amount in d8 intact.
>>> 
>>> This bug occurs on FSF trunk and 4.9 branch (not on 4.8 as the offending
>>> pattern was introduced for 4.9)
>>> Bootstrapped and tested on trunk and 4.9.
>>> 
>>> Ok for trunk and 4.9?
>> First of all, applauses!  I realize how difficult it was to reduce this 
>> problem.
> 
> Thanks!
>> 
>> Your patch looks OK to me, but I can't shake off feeling that it will 
>> pessimize cases when d8 is not used afterwards.  In particular, your patch 
>> makes it impossible to use same register for output (operand 0) and inputs 
>> (operands 1 and 2).
>> 
>> Did you consider using SCRATCHes instead of re-using operand 0 with early 
>> clobber like in the attached [untested] patch?  If I got it all correct, 
>> register allocator will get more freedom in deciding which register to use 
>> for negated shift temporary, while still allowing reusing register from 
>> operand 0 for one of the inputs.
> 
> I considered it (but didn't try it) because we end up demanding a scratch 
> register unnecessarily for the two alternatives that don't split which might 
> pessimize register allocation.

That's not the case.  The "X" constraint in (match_scratch) is special; it 
tells RA to not allocate register.

> 
> For stage 4 I think my proposed fix is the minimal one and it keeps 
> consistent with the other patterns in that area that were added all together 
> with:
> https://gcc.gnu.org/ml/gcc-patches/2013-08/msg01130.html

I think this approach is OK, as long as we revisit the possibility of using 
SCRATCHes in these and similar patterns at stage 1.

Thanks,

--
Maxim Kuvyrkov
www.linaro.org


Reply via email to