On Feb 18, 2015, at 4:42 PM, Kyrill Tkachov <kyrylo.tkac...@arm.com> wrote:
> On 18/02/15 12:32, Maxim Kuvyrkov wrote:
>> On Feb 18, 2015, at 2:35 PM, Kyrill Tkachov <kyrylo.tkac...@arm.com> wrote:
>>
>>> Hi all,
>>>
>>> This patch fixes a wrong-code bug with the
>>> *aarch64_lshr_sisd_or_int_<mode>3 pattern and its associated splitters.
>>> The problem is that for the 2nd alternative it will split a right shift
>>> into a SISD left shift by the negated shift amount (the ushl instruction
>>> allows such semantics). The splitter generates this RTL:
>>>
>>>   (set (match_dup 2)
>>>        (unspec:QI [(match_dup 2)] UNSPEC_SISD_NEG))
>>>   (set (match_dup 0)
>>>        (unspec:SI [(match_dup 1) (match_dup 2)] UNSPEC_USHL_2S))
>>>
>>> The problem here is that the shift amount register is negated without
>>> telling the register allocator about it (and it can't figure that out by
>>> itself). So if you try to use the register that operand 2 is assigned to
>>> later on, you get the negated shift amount instead!
>>>
>>> The testcase in the patch demonstrates the simple code that can get
>>> miscompiled due to this behaviour.
>>>
>>> The solution in this patch is to negate the shift amount into the output
>>> operand (operand 0) and mark it as an earlyclobber in that alternative.
>>> This is exactly what the very similar *aarch64_ashr_sisd_or_int_<mode>3
>>> pattern below it already does. I believe this is the safest and simplest
>>> fix at this stage.
>>>
>>> This bug was exposed on the Linaro 4.9 branch, which happened to have
>>> the perfect storm of costs and register pressure: the generated Linaro
>>> compiler miscompiled the TEST_BIT macro in ira-costs.c during a build of
>>> trunk, generating essentially code like:
>>>
>>>   .L141:
>>>           neg     d8, d8                   // d8 negated!
>>>           ushl    v0.2s, v11.2s, v8.2s     // shift right => shift left by negated amount
>>>           fmov    w0, s0
>>>           <...irrelevant code...>
>>>           b       .L140
>>>   <...>
>>>   .L140:
>>>           fmov    w0, s8    // s8/d8 used and incremented assuming it had not changed at .L141
>>>           add     w0, w0, 1
>>>           fmov    s8, w0
>>>           fmov    w1, s10
>>>           cmp     w0, w1
>>>           bne     .L141
>>>
>>> Basically, d8 is negated at .L141 and later used at .L140 as if it had
>>> not been, leading to completely wrong behaviour.
>>>
>>> With this patch that particular part of the assembly now contains, at
>>> .L141:
>>>
>>>           neg     d0, d8
>>>           ushl    v0.2s, v11.2s, v0.2s
>>>           fmov    w0, s0
>>>
>>> leaving the original shift amount in d8 intact.
>>>
>>> This bug occurs on FSF trunk and the 4.9 branch (not on 4.8, as the
>>> offending pattern was introduced for 4.9).
>>> Bootstrapped and tested on trunk and 4.9.
>>>
>>> Ok for trunk and 4.9?
>>
>> First of all, applause! I realize how difficult it was to reduce this
>> problem.
>
> Thanks!
>
>> Your patch looks OK to me, but I can't shake off the feeling that it
>> will pessimize cases where d8 is not used afterwards. In particular,
>> your patch makes it impossible to use the same register for the output
>> (operand 0) and the inputs (operands 1 and 2).
>>
>> Did you consider using SCRATCHes instead of re-using operand 0 with an
>> earlyclobber, as in the attached [untested] patch? If I got it all
>> correct, the register allocator will get more freedom in deciding which
>> register to use for the negated shift temporary, while still being
>> allowed to reuse the register of operand 0 for one of the inputs.
>
> I considered it (but didn't try it) because we would end up demanding a
> scratch register unnecessarily for the two alternatives that don't
> split, which might pessimize register allocation.

That's not the case. The "X" constraint in (match_scratch) is special; it
tells the register allocator not to allocate a register for it.

> For stage 4 I think my proposed fix is the minimal one, and it keeps
> consistent with the other patterns in that area, which were all added
> together with:
> https://gcc.gnu.org/ml/gcc-patches/2013-08/msg01130.html

I think this approach is OK, as long as we revisit the possibility of
using SCRATCHes in these and similar patterns at stage 1.

Thanks,

--
Maxim Kuvyrkov
www.linaro.org