Ping...
On 06/01/17 18:00, Bernd Edlinger wrote:
> Ping...
>
> On 05/12/17 18:49, Bernd Edlinger wrote:
>> Ping...
>>
>> On 04/29/17 19:21, Bernd Edlinger wrote:
>>> Ping...
>>>
>>> On 04/20/17 20:11, Bernd Edlinger wrote:
>>>> Ping...
>>>>
>>>> for this patch:
>>>> https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01351.html
>>>>
>>>> On 01/18/17 16:36, Bernd Edlinger wrote:
>>>>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>>>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> this is related to PR77308, the follow-up patch will depend on
>>>>>>>>> this one.
>>>>>>>>>
>>>>>>>>> When trying to split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>>>>>> before reload, a mis-compilation in libgcc function
>>>>>>>>> __gnu_satfractdasq was discovered, see [1] for more details.
>>>>>>>>>
>>>>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>>>>> up into this:
>>>>>>>>>
>>>>>>>>>   [(set (reg:CC CC_REGNUM)
>>>>>>>>>         (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>>>>    (parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>>                    (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>>>>               (set (match_dup 2)
>>>>>>>>>                    (minus:SI (match_dup 5)
>>>>>>>>>                              (ltu:SI (reg:CC_C CC_REGNUM)
>>>>>>>>>                                      (const_int 0))))])]
>>>>>>>>>
>>>>>>>>>   [(set (reg:CC CC_REGNUM)
>>>>>>>>>         (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>>>>    (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>>>>               (set (reg:CC CC_REGNUM)
>>>>>>>>>                    (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>>>>>
>>>>>>>>> The problem is that the reg:CC set by the *subsi3_carryin_compare
>>>>>>>>> does not mention that it also depends on the reg:CC
>>>>>>>>> from before.
>>>>>>>>> Therefore the *arm_cmpsi_insn appears to be
>>>>>>>>> redundant and thus got removed, because the data values are
>>>>>>>>> identical.
>>>>>>>>>
>>>>>>>>> I think that applies to a number of similar patterns where data
>>>>>>>>> flow is happening through the CC reg.
>>>>>>>>>
>>>>>>>>> So this is a kind of correctness issue, and should be fixed
>>>>>>>>> independently from the optimization issue PR77308.
>>>>>>>>>
>>>>>>>>> Therefore I think the patterns need to specify the true
>>>>>>>>> value that will be in the CC reg, in order for cse to
>>>>>>>>> know what the instructions are really doing.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>>>>> Is it OK for trunk?
>>>>>>>>>
>>>>>>>>
>>>>>>>> I agree you've found a valid problem here, but I have some issues
>>>>>>>> with the patch itself.
>>>>>>>>
>>>>>>>>
>>>>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>>>>   [(set (reg:CC_NCV CC_REGNUM)
>>>>>>>>         (compare:CC_NCV
>>>>>>>>           (match_operand:DI 1 "register_operand" "r")
>>>>>>>>           (match_operand:DI 2 "register_operand" "r")))
>>>>>>>>    (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>>>>         (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>>>>   "TARGET_32BIT"
>>>>>>>>   "#"
>>>>>>>>   "&& reload_completed"
>>>>>>>>   [(parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>                    (compare:CC (match_dup 1) (match_dup 2)))
>>>>>>>>               (set (match_dup 0)
>>>>>>>>                    (minus:SI (match_dup 1) (match_dup 2)))])
>>>>>>>>    (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>>>>                    (compare:CC_C
>>>>>>>>                      (zero_extend:DI (match_dup 4))
>>>>>>>>                      (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>>                               (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>>               (set (match_dup 3)
>>>>>>>>                    (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>>                              (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>>>
>>>>>>>>
>>>>>>>> This pattern is now no longer self-consistent in that before the
>>>>>>>> split the overall result for the condition register is in mode
>>>>>>>> CC_NCV, but
>>>>>>>> afterwards it is just CC_C.
>>>>>>>>
>>>>>>>> I think CC_NCV is the correct mode (the N, C and V bits all
>>>>>>>> correctly reflect the result of the 64-bit comparison), but that
>>>>>>>> then implies that the cc mode of subsi3_carryin_compare is
>>>>>>>> incorrect as well and should in fact also be CC_NCV. Thinking
>>>>>>>> about this pattern, I'm inclined to agree that CC_NCV is the
>>>>>>>> correct mode for this operation.
>>>>>>>>
>>>>>>>> I'm not sure if there are other consequences that will fall out
>>>>>>>> from fixing this (it's possible that we might need a change to
>>>>>>>> select_cc_mode as well).
>>>>>>>>
>>>>>>>
>>>>>>> Yes, this is still a bit awkward...
>>>>>>>
>>>>>>> The N and V bits will be correct for the subdi3_compare1, which is
>>>>>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>>>>>>> only gets the C bit correct, the expression for N and V is a
>>>>>>> different one.
>>>>>>>
>>>>>>> It probably works, because the subsi3_carryin_compare instruction
>>>>>>> sets more CC bits than the pattern explicitly specifies.
>>>>>>> We know the subsi3_carryin_compare also computes the NV bits, but
>>>>>>> it is hard to write down the correct rtl expression for it.
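The statement quoted above, that N, C and V reflect the 64-bit comparison, can be checked with a small C model. This is only a sketch under stated assumptions: the names nzcv_t, sbcs_model, cmp64_model and ncv_match_64bit are invented for illustration and are not GCC or libgcc code, and the model treats SUBS/SBCS as a + ~b + carry_in on 32-bit words. It suggests that C, N and V from the high-word SBCS agree with a direct 64-bit computation, while Z only reflects the high word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag record, for illustration only.  */
typedef struct { uint32_t result; int n, z, c, v; } nzcv_t;

/* Model of SBCS: a - b - !carry_in, computed as a + ~b + carry_in.
   With carry_in == 1 this is the same as SUBS.  */
static nzcv_t sbcs_model(uint32_t a, uint32_t b, int carry_in)
{
    assert(carry_in == 0 || carry_in == 1);
    uint64_t wide = (uint64_t)a + (uint64_t)(uint32_t)~b + (uint64_t)carry_in;
    nzcv_t f;
    f.result = (uint32_t)wide;
    f.n = (int)(f.result >> 31);
    f.z = f.result == 0;
    f.c = (int)(wide >> 32);                       /* carry = no borrow */
    f.v = (int)(((a ^ b) & (a ^ f.result)) >> 31); /* signed overflow */
    return f;
}

/* 64-bit compare as SUBS on the low words, then SBCS on the high words.  */
static nzcv_t cmp64_model(uint64_t a, uint64_t b)
{
    nzcv_t lo = sbcs_model((uint32_t)a, (uint32_t)b, 1);
    return sbcs_model((uint32_t)(a >> 32), (uint32_t)(b >> 32), lo.c);
}

/* Returns 1 if C, N and V agree with flags computed directly in 64 bits.  */
static int ncv_match_64bit(uint64_t a, uint64_t b)
{
    nzcv_t f = cmp64_model(a, b);
    uint64_t diff = a - b;
    int c = a >= b;
    int n = (int)(diff >> 63);
    int v = (int)(((a ^ b) & (a ^ diff)) >> 63);
    return f.c == c && f.n == n && f.v == v;
}
```

In this model cmp64_model (1, 0) sets Z although the operands differ, because only the high words are compared by the second step. That is consistent with a mode that promises N, C and V but not Z.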
>>>>>>>
>>>>>>> In theory the pattern should describe everything correctly,
>>>>>>> maybe, like:
>>>>>>>
>>>>>>> (set (reg:CC_C CC_REGNUM)
>>>>>>>      (compare:CC_C
>>>>>>>        (zero_extend:DI (match_dup 4))
>>>>>>>        (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>                 (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>> (set (reg:CC_NV CC_REGNUM)
>>>>>>>      (compare:CC_NV
>>>>>>>        (match_dup 4)
>>>>>>>        (plus:SI (match_dup 5)
>>>>>>>                 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>> (set (match_dup 3)
>>>>>>>      (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>                (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))
>>>>>>>
>>>>>>>
>>>>>>> But I doubt that it will work to set CC_REGNUM with two different
>>>>>>> modes in parallel?
>>>>>>>
>>>>>>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>>>>>>> defines C from the DImode result, and NV from the SImode result,
>>>>>>> similar to the CC_NOOVmode, that also leaves open which
>>>>>>> bits it really defines?
>>>>>>>
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Bernd.
>>>>>>
>>>>>> I think maybe the right solution is to invent a new CCmode
>>>>>> that defines C as if the comparison is done in DImode
>>>>>> but N and V as if the comparison is done in SImode.
>>>>>>
>>>>>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>>>>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>>>>>> only N and Z are set correctly), but in a different patch of course.
>>>>>>
>>>>>> Attached is a new version that implements the new CCmode.
>>>>>>
>>>>>> How do you like this new version?
>>>>>>
>>>>>> It seems to be able to build a cross-compiler at least.
>>>>>>
>>>>>> I will start a new bootstrap with this new patch, but that can take
>>>>>> some time until I have definitive results.
>>>>>>
>>>>>> Is there still a chance that it can go into gcc-7 or should it wait
>>>>>> for the next stage1?
>>>>>>
>>>>>> Thanks
>>>>>> Bernd.
>>>>>
>>>>>
>>>>> I thought I should also look at where the subdi_compare1 and the
>>>>> negdi2_compare patterns are used, and check whether the caller is
>>>>> fine with not having all CC bits available.
>>>>>
>>>>> And indeed usubv<mode>4 turns out to be questionable, because it
>>>>> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
>>>>> CCmode) which is inconsistent when subdi3_compare1 no longer uses
>>>>> CCmode.
>>>>>
>>>>> To correct this, the branch should use CC_Cmode which is always
>>>>> defined.
>>>>>
>>>>> So I tried to test this pattern, with the following test programs,
>>>>> and found that the code actually improves when the branch uses
>>>>> CC_Cmode instead of CCmode, both for SImode and DImode data, which
>>>>> was a bit surprising.
>>>>>
>>>>> I used this test program to see how the usubv<mode>4 pattern works:
>>>>>
>>>>> cat test.c (DImode)
>>>>> unsigned long long x, y, z;
>>>>> int b;
>>>>> void test()
>>>>> {
>>>>>   b = __builtin_sub_overflow (y,z, &x);
>>>>> }
>>>>>
>>>>>
>>>>> unpatched code used 8 bytes more stack than patched,
>>>>> because the DImode subtraction is effectively done twice.
>>>>>
>>>>> cat test1.c (SImode)
>>>>> unsigned long x, y, z;
>>>>> int b;
>>>>> void test()
>>>>> {
>>>>>   b = __builtin_sub_overflow (y,z, &x);
>>>>> }
>>>>>
>>>>> which generates (unpatched):
>>>>>         cmp     r3, r0
>>>>>         sub     ip, r3, r0
>>>>>
>>>>> instead of expected (patched):
>>>>>         subs    r3, r3, r2
>>>>>
>>>>>
>>>>> The condition is extracted by ifconversion and/or combine
>>>>> and complicates the resulting code instead of simplifying it.
>>>>>
>>>>> I think this happens only when the branch and the subsi/di3_compare1
>>>>> are using the same CC mode.
>>>>>
>>>>> That does not happen when the CC modes disagree, as with the
>>>>> proposed patch. All other uses of the pattern are already using
>>>>> CC_Cmode or CC_Vmode in the branch, and these do not change.
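For readers who want to try the quoted test programs on a host machine, here is a minimal sketch of what the builtin computes for unsigned operands. It assumes GCC or Clang (for __builtin_sub_overflow), and the helper names usub32_overflows, usub64_overflows and borrow_equals_ltu are hypothetical, introduced only for illustration. Fixed-width types are used because unsigned long is 32 bits on arm-linux-gnueabihf but 64 bits on LP64 hosts.

```c
#include <assert.h>
#include <stdint.h>

/* SImode case: returns 1 when y - z wraps around, and stores the
   wrapped difference in *x.  */
static int usub32_overflows(uint32_t y, uint32_t z, uint32_t *x)
{
    return __builtin_sub_overflow(y, z, x);
}

/* DImode case: the same operation on 64-bit operands.  */
static int usub64_overflows(uint64_t y, uint64_t z, uint64_t *x)
{
    return __builtin_sub_overflow(y, z, x);
}

/* For unsigned operands the overflow result is just the borrow,
   that is y < z, which is the condition an LTU branch tests.
   Returns 1 when the builtin agrees with that comparison.  */
static int borrow_equals_ltu(uint64_t y, uint64_t z)
{
    uint64_t x;
    int ovf = usub64_overflows(y, z, &x);
    assert(x == y - z);   /* the stored result is the wrapped difference */
    return ovf == (y < z);
}
```

Since only the borrow is consumed, a branch that looks at the carry alone is sufficient here, which fits the suggestion above to use CC_Cmode for the branch.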
>>>>>
>>>>> Attached is an updated version of the patch, that happens to
>>>>> improve the code generation of the usubsi4 and usubdi4 patterns,
>>>>> as a side effect.
>>>>>
>>>>>
>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>> Is it OK for trunk?
>>>>>
>>>>>
>>>>> Thanks
>>>>> Bernd.