[PING**2] [PATCH, ARM] correctly encode the CC reg data flow

Bernd Edlinger Sat, 29 Apr 2017 10:22:37 -0700

Ping...


On 04/20/17 20:11, Bernd Edlinger wrote:
> Ping...
>
> for this patch:
> https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01351.html
>
> On 01/18/17 16:36, Bernd Edlinger wrote:
>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>> Hi,
>>>>>>
>>>>>> this is related to PR77308, the follow-up patch will depend on this
>>>>>> one.
>>>>>>
>>>>>> When trying to split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>>> before reload, a mis-compilation in the libgcc function
>>>>>> __gnu_satfractdasq was discovered, see [1] for more details.
>>>>>>
>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>> up into this:
>>>>>>
>>>>>>    [(set (reg:CC CC_REGNUM)
>>>>>>          (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>     (parallel [(set (reg:CC CC_REGNUM)
>>>>>>                     (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>                (set (match_dup 2)
>>>>>>                     (minus:SI (match_dup 5)
>>>>>>                              (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>
>>>>>>    [(set (reg:CC CC_REGNUM)
>>>>>>          (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>     (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>                (set (reg:CC CC_REGNUM)
>>>>>>                     (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>>
>>>>>> The problem is that the reg:CC set by the *subsi3_carryin_compare
>>>>>> does not mention that it also depends on the reg:CC
>>>>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>>>>> redundant and thus was removed, because the data values are
>>>>>> identical.
>>>>>>
>>>>>> I think that applies to a number of similar patterns where data
>>>>>> flow happens through the CC reg.
>>>>>>
>>>>>> So this is a kind of correctness issue, and should be fixed
>>>>>> independently from the optimization issue PR77308.
>>>>>>
>>>>>> Therefore I think the patterns need to specify the true
>>>>>> value that will be in the CC reg, in order for cse to
>>>>>> know what the instructions are really doing.
>>>>>>
>>>>>>
>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>> Is it OK for trunk?
>>>>>>
>>>>>
>>>>> I agree you've found a valid problem here, but I have some issues with
>>>>> the patch itself.
>>>>>
>>>>>
>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>   [(set (reg:CC_NCV CC_REGNUM)
>>>>>     (compare:CC_NCV
>>>>>       (match_operand:DI 1 "register_operand" "r")
>>>>>       (match_operand:DI 2 "register_operand" "r")))
>>>>>    (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>     (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>   "TARGET_32BIT"
>>>>>   "#"
>>>>>   "&& reload_completed"
>>>>>   [(parallel [(set (reg:CC CC_REGNUM)
>>>>>            (compare:CC (match_dup 1) (match_dup 2)))
>>>>>           (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
>>>>>    (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>            (compare:CC_C
>>>>>              (zero_extend:DI (match_dup 4))
>>>>>              (plus:DI (zero_extend:DI (match_dup 5))
>>>>>                   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>           (set (match_dup 3)
>>>>>            (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>
>>>>>
>>>>> This pattern is now no longer self-consistent in that before the split
>>>>> the overall result for the condition register is in mode CC_NCV, but
>>>>> afterwards it is just CC_C.
>>>>>
>>>>> I think CC_NCV is the correct mode (the N, C and V bits all correctly
>>>>> reflect the result of the 64-bit comparison), but that then implies
>>>>> that the cc mode of subsi3_carryin_compare is incorrect as well and
>>>>> should in fact also be CC_NCV.  Thinking about this pattern, I'm
>>>>> inclined to agree that CC_NCV is the correct mode for this operation.
>>>>>
>>>>> I'm not sure if there are other consequences that will fall out from
>>>>> fixing this (it's possible that we might need a change to
>>>>> select_cc_mode as well).
>>>>>
>>>>
>>>> Yes, this is still a bit awkward...
>>>>
>>>> The N and V bits will be correct for subdi3_compare1, which is a
>>>> 64-bit comparison, but comparing zero_extend:DI (match_dup 4) with
>>>> (plus:DI ...) only gets the C bit correct; the expression for N and V
>>>> is a different one.
>>>>
>>>> It probably works, because the subsi3_carryin_compare instruction sets
>>>> more CC bits than the pattern explicitly specifies.
>>>> We know the subsi3_carryin_compare also computes the N and V bits, but
>>>> it is hard to write down the correct rtl expression for them.
>>>>
>>>> In theory the pattern should describe everything correctly,
>>>> maybe, like:
>>>>
>>>> (set (reg:CC_C CC_REGNUM)
>>>>      (compare:CC_C
>>>>        (zero_extend:DI (match_dup 4))
>>>>        (plus:DI (zero_extend:DI (match_dup 5))
>>>>                 (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>> (set (reg:CC_NV CC_REGNUM)
>>>>      (compare:CC_NV
>>>>        (match_dup 4)
>>>>        (plus:SI (match_dup 5)
>>>>                 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>> (set (match_dup 3)
>>>>      (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>                (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))
>>>>
>>>>
>>>> But I doubt that setting CC_REGNUM with two different modes
>>>> in parallel will work?
>>>>
>>>> Another idea would be to invent a CC_CNV_NOOV mode that implicitly
>>>> defines C from the DImode result and N and V from the SImode result,
>>>> similar to CC_NOOVmode, which also leaves open which bits it
>>>> really defines?
>>>>
>>>>
>>>> What do you think?
>>>>
>>>>
>>>> Thanks
>>>> Bernd.
>>>
>>> I think maybe the right solution is to invent a new CCmode
>>> that defines C as if the comparison is done in DImode
>>> but N and V as if the comparison is done in SImode.
>>>
>>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>>> only N and Z are set correctly), but in a different patch of course.
>>>
>>> Attached is a new version that implements the new CCmode.
>>>
>>> How do you like this new version?
>>>
>>> It seems to be able to build a cross-compiler at least.
>>>
>>> I will start a new bootstrap with this new patch, but that can take some
>>> time until I have definitive results.
>>>
>>> Is there still a chance that it can go into gcc-7 or should it wait
>>> for the next stage1?
>>>
>>> Thanks
>>> Bernd.
>>
>>
>> I thought I should also look at where the subdi_compare1 and the
>> negdi2_compare patterns are used, and check whether the caller is fine
>> with not having all CC bits available.
>>
>> And indeed usubv<mode>4 turns out to be questionable, because it
>> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
>> CCmode) which is inconsistent when subdi3_compare1 no longer uses
>> CCmode.
>>
>> To correct this, the branch should use CC_Cmode which is always defined.
>>
>> So I tried to test this pattern, with the following test programs,
>> and found that the code actually improves when the branch uses CC_Cmode
>> instead of CCmode, both for SImode and DImode data, which was a bit
>> surprising.
>>
>> I used this test program to see how the usubv<mode>4 pattern works:
>>
>> cat test.c (DImode)
>> unsigned long long x, y, z;
>> int b;
>> void test()
>> {
>>   b = __builtin_sub_overflow (y,z, &x);
>> }
>>
>>
>> The unpatched code used 8 bytes more stack than the patched code,
>> because the DImode subtraction is effectively done twice.
>>
>> cat test1.c (SImode)
>> unsigned long x, y, z;
>> int b;
>> void test()
>> {
>>   b = __builtin_sub_overflow (y,z, &x);
>> }
>>
>> which generates (unpatched):
>>         cmp     r3, r0
>>         sub     ip, r3, r0
>>
>> instead of the expected (patched):
>>         subs    r3, r3, r2
>>
>>
>> The condition is extracted by ifconversion and/or combine
>> and complicates the resulting code instead of simplifying.
>>
>> I think this happens only when the branch and the subsi/di3_compare1
>> are using the same CC mode.
>>
>> That does not happen when the CC modes disagree, as with the
>> proposed patch.  All other uses of the pattern are already using
>> CC_Cmode or CC_Vmode in the branch, and these do not change.
>>
>> Attached is an updated version of the patch that happens to
>> improve the code generation of the usubsi4 and usubdi4 patterns
>> as a side effect.
>>
>>
>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.
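
For readers following the data-flow argument in the quoted thread, here is a
minimal C sketch of why the second instruction of a split 64-bit compare
depends on the flags set by the first.  It is a hypothetical model, not GCC or
libgcc code: the names struct flags, cmp32, sbc32, geu64_via_halves and
check_model are invented for illustration, and it assumes the usual ARM
convention that C is set when a subtraction does not borrow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the ARM N, C and V flags (illustration only). */
struct flags { bool n, c, v; };

/* Model of "cmp a, b": flags of a - b.  C is set when there is no borrow. */
static struct flags cmp32(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    struct flags f;
    f.n = (r >> 31) != 0;
    f.c = a >= b;
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;
    return f;
}

/* Model of "sbcs": flags of a - b - !carry_in.  The result depends on
   carry_in, i.e. on the flags left behind by the previous instruction. */
static struct flags sbc32(uint32_t a, uint32_t b, bool carry_in)
{
    uint32_t borrow = carry_in ? 0u : 1u;
    uint32_t r = a - b - borrow;
    struct flags f;
    f.n = (r >> 31) != 0;
    f.c = (uint64_t)a >= (uint64_t)b + borrow;
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;
    return f;
}

/* Unsigned 64-bit "a >= b" computed as a low-word compare followed by a
   high-word subtract-with-carry; only the final C flag is used. */
static bool geu64_via_halves(uint64_t a, uint64_t b)
{
    struct flags lo = cmp32((uint32_t)a, (uint32_t)b);
    struct flags hi = sbc32((uint32_t)(a >> 32), (uint32_t)(b >> 32), lo.c);
    return hi.c;
}

/* Small self-check that can be called from a test driver. */
static void check_model(void)
{
    /* Same high words, different low words: the outcome differs. */
    assert(geu64_via_halves(0x100000005ULL, 0x100000001ULL));
    assert(!geu64_via_halves(0x100000000ULL, 0x100000001ULL));
    /* The final C flag agrees with a plain 64-bit comparison. */
    assert(geu64_via_halves(0x100000000ULL, 1ULL) == (0x100000000ULL >= 1ULL));
}
```

In this model sbc32(1, 1, true) and sbc32(1, 1, false) see identical data
operands yet produce different carry results.  That is the dependency on the
earlier reg:CC that the thread argues the RTL pattern has to express, so that
an earlier compare is not treated as redundant.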
