Ping...
On 04/29/17 19:21, Bernd Edlinger wrote:
> Ping...
>
> On 04/20/17 20:11, Bernd Edlinger wrote:
>> Ping...
>>
>> for this patch:
>> https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01351.html
>>
>> On 01/18/17 16:36, Bernd Edlinger wrote:
>>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> this is related to PR77308, the follow-up patch will depend on
>>>>>>> this one.
>>>>>>>
>>>>>>> When trying to split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>>>> before reload, a mis-compilation in the libgcc function
>>>>>>> __gnu_satfractdasq was discovered, see [1] for more details.
>>>>>>>
>>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>>> up into this:
>>>>>>>
>>>>>>>   [(set (reg:CC CC_REGNUM)
>>>>>>>         (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>>    (parallel [(set (reg:CC CC_REGNUM)
>>>>>>>                    (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>>               (set (match_dup 2)
>>>>>>>                    (minus:SI (match_dup 5)
>>>>>>>                              (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>>
>>>>>>>   [(set (reg:CC CC_REGNUM)
>>>>>>>         (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>>    (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>>               (set (reg:CC CC_REGNUM)
>>>>>>>                    (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>>>
>>>>>>> The problem is that the *subsi3_carryin_compare pattern does not
>>>>>>> mention that the reg:CC it sets also depends on the reg:CC
>>>>>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>>>>>> redundant and thus got removed, because the data values are
>>>>>>> identical.
>>>>>>>
>>>>>>> I think that applies to a number of similar patterns where data
>>>>>>> flow is happening through the CC reg.
>>>>>>>
>>>>>>> So this is a kind of correctness issue, and should be fixed
>>>>>>> independently from the optimization issue PR77308.
>>>>>>>
>>>>>>> Therefore I think the patterns need to specify the true
>>>>>>> value that will be in the CC reg, in order for cse to
>>>>>>> know what the instructions are really doing.
>>>>>>>
>>>>>>>
>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>>> Is it OK for trunk?
>>>>>>>
>>>>>>
>>>>>> I agree you've found a valid problem here, but I have some issues
>>>>>> with the patch itself.
>>>>>>
>>>>>>
>>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>>   [(set (reg:CC_NCV CC_REGNUM)
>>>>>>         (compare:CC_NCV
>>>>>>           (match_operand:DI 1 "register_operand" "r")
>>>>>>           (match_operand:DI 2 "register_operand" "r")))
>>>>>>    (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>>         (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>>   "TARGET_32BIT"
>>>>>>   "#"
>>>>>>   "&& reload_completed"
>>>>>>   [(parallel [(set (reg:CC CC_REGNUM)
>>>>>>                    (compare:CC (match_dup 1) (match_dup 2)))
>>>>>>               (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
>>>>>>    (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>>                    (compare:CC_C
>>>>>>                      (zero_extend:DI (match_dup 4))
>>>>>>                      (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>                               (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>               (set (match_dup 3)
>>>>>>                    (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>                              (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>
>>>>>>
>>>>>> This pattern is now no longer self-consistent in that before the
>>>>>> split the overall result for the condition register is in mode
>>>>>> CC_NCV, but afterwards it is just CC_C.
>>>>>>
>>>>>> I think CC_NCV is the correct mode (the N, C and V bits all
>>>>>> correctly reflect the result of the 64-bit comparison), but that
>>>>>> then implies that the cc mode of subsi3_carryin_compare is
>>>>>> incorrect as well and should in fact also be CC_NCV.
>>>>>> Thinking about this pattern, I'm inclined to agree
>>>>>> that CC_NCV is the correct mode for this operation.
>>>>>>
>>>>>> I'm not sure if there are other consequences that will fall out from
>>>>>> fixing this (it's possible that we might need a change to
>>>>>> select_cc_mode as well).
>>>>>>
>>>>>
>>>>> Yes, this is still a bit awkward...
>>>>>
>>>>> The N and V bits will be correct for subdi3_compare1, which is
>>>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>>>>> only gets the C bit correct; the expression for N and V is a
>>>>> different one.
>>>>>
>>>>> It probably works, because the subsi3_carryin_compare instruction sets
>>>>> more CC bits than the pattern explicitly specifies.
>>>>> We know the subsi3_carryin_compare also computes the N and V bits,
>>>>> but it is hard to write down the correct rtl expression for it.
>>>>>
>>>>> In theory the pattern should describe everything correctly,
>>>>> maybe, like:
>>>>>
>>>>> (set (reg:CC_C CC_REGNUM)
>>>>>      (compare:CC_C
>>>>>        (zero_extend:DI (match_dup 4))
>>>>>        (plus:DI (zero_extend:DI (match_dup 5))
>>>>>                 (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>> (set (reg:CC_NV CC_REGNUM)
>>>>>      (compare:CC_NV
>>>>>        (match_dup 4)
>>>>>        (plus:SI (match_dup 5)
>>>>>                 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>> (set (match_dup 3)
>>>>>      (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>                (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))
>>>>>
>>>>>
>>>>> But I doubt that it will work to set CC_REGNUM with two different
>>>>> modes in parallel?
>>>>>
>>>>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>>>>> defines C from the DImode result, and NV from the SImode result,
>>>>> similar to CC_NOOVmode, which also leaves open which bits it
>>>>> really defines?
>>>>>
>>>>>
>>>>> What do you think?
>>>>>
>>>>>
>>>>> Thanks
>>>>> Bernd.
>>>>
>>>> I think maybe the right solution is to invent a new CCmode
>>>> that defines C as if the comparison is done in DImode
>>>> but N and V as if the comparison is done in SImode.
>>>>
>>>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>>>> only N and Z are set correctly), but in a different patch of course.
>>>>
>>>> Attached is a new version that implements the new CCmode.
>>>>
>>>> How do you like this new version?
>>>>
>>>> It seems to be able to build a cross-compiler at least.
>>>>
>>>> I will start a new bootstrap with this new patch, but it can take
>>>> some time until I have definitive results.
>>>>
>>>> Is there still a chance that it can go into gcc-7 or should it wait
>>>> for the next stage1?
>>>>
>>>> Thanks
>>>> Bernd.
>>>
>>>
>>> I thought I should also look at where the subdi_compare1 and the
>>> negdi2_compare patterns are used, and check whether the caller is
>>> fine with not having all CC bits available.
>>>
>>> And indeed usubv<mode>4 turns out to be questionable, because it
>>> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
>>> CCmode) which is inconsistent when subdi3_compare1 no longer uses
>>> CCmode.
>>>
>>> To correct this, the branch should use CC_Cmode which is always defined.
>>>
>>> So I tried to test this pattern, with the following test programs,
>>> and found that the code actually improves when the branch uses CC_Cmode
>>> instead of CCmode, both for SImode and DImode data, which was a bit
>>> surprising.
>>>
>>> I used this test program to see how the usubv<mode>4 pattern works:
>>>
>>> cat test.c (DImode)
>>> unsigned long long x, y, z;
>>> int b;
>>> void test()
>>> {
>>>   b = __builtin_sub_overflow (y,z, &x);
>>> }
>>>
>>>
>>> The unpatched code used 8 bytes more stack than the patched code,
>>> because the DImode subtraction is effectively done twice.
>>>
>>> cat test1.c (SImode)
>>> unsigned long x, y, z;
>>> int b;
>>> void test()
>>> {
>>>   b = __builtin_sub_overflow (y,z, &x);
>>> }
>>>
>>> which generates (unpatched):
>>>         cmp     r3, r0
>>>         sub     ip, r3, r0
>>>
>>> instead of expected (patched):
>>>         subs    r3, r3, r2
>>>
>>>
>>> The condition is extracted by ifconversion and/or combine
>>> and complicates the resulting code instead of simplifying.
>>>
>>> I think this happens only when the branch and the subsi/di3_compare1
>>> is using the same CC mode.
>>>
>>> That does not happen when the CC modes disagree, as with the
>>> proposed patch.  All other uses of the pattern are already using
>>> CC_Cmode or CC_Vmode in the branch, and these do not change.
>>>
>>> Attached is an updated version of the patch, that happens to
>>> improve the code generation of the usubsi4 and usubdi4 pattern,
>>> as a side effect.
>>>
>>>
>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>> Is it OK for trunk?
>>>
>>>
>>> Thanks
>>> Bernd.
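
For readers following the CC_NCV discussion above, here is a small sketch in plain C that models the flag results of a 64-bit comparison done as a 32-bit compare of the low words followed by a subtract-with-carry of the high words. It is not GCC code and not part of the patch. The names flags_t, add_with_carry, cmp64_split, cmp64_reference and ncv_equal are made up for this illustration, and the model assumes the usual ARM convention that a subtraction is an addition of the inverted operand with carry-in, so that C set means "no borrow".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the four ARM condition flags. */
typedef struct { bool n, z, c, v; } flags_t;

/* 32-bit x + y + carry_in, recording the flags of the result. */
static uint32_t add_with_carry(uint32_t x, uint32_t y, bool cin, flags_t *f)
{
    uint64_t wide = (uint64_t)x + (uint64_t)y + (uint64_t)cin;
    uint32_t r = (uint32_t)wide;
    f->n = (r >> 31) & 1;
    f->z = (r == 0);
    f->c = (wide >> 32) != 0;                 /* unsigned carry out */
    f->v = (((x ^ r) & (y ^ r)) >> 31) & 1;   /* signed overflow */
    return r;
}

/* Flags after "cmp alo, blo" followed by "sbcs tmp, ahi, bhi". */
static flags_t cmp64_split(uint64_t a, uint64_t b)
{
    flags_t f;
    /* cmp: alo - blo == alo + ~blo + 1 */
    add_with_carry((uint32_t)a, ~(uint32_t)b, true, &f);
    /* sbcs: ahi - bhi - !C == ahi + ~bhi + C */
    add_with_carry((uint32_t)(a >> 32), ~(uint32_t)(b >> 32), f.c, &f);
    return f;
}

/* Flags a true 64-bit "a - b" comparison would produce. */
static flags_t cmp64_reference(uint64_t a, uint64_t b)
{
    flags_t f;
    uint64_t r = a - b;
    f.n = (r >> 63) & 1;
    f.z = (r == 0);
    f.c = (a >= b);                           /* C set means no borrow */
    f.v = (((a ^ b) & (a ^ r)) >> 63) & 1;    /* signed overflow of a - b */
    return f;
}

/* True if N, C and V agree; Z is deliberately ignored. */
static bool ncv_equal(flags_t p, flags_t q)
{
    return p.n == q.n && p.c == q.c && p.v == q.v;
}
```

In this model the split sequence and the 64-bit reference agree on N, C and V, while Z reflects only the high word: for a = 0x100000001 and b = 0x100000000 the split sequence sets Z although the values differ. That is consistent with a mode name that lists only N, C and V.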