Ping... for this patch: https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01351.html
On 01/18/17 16:36, Bernd Edlinger wrote:
> On 01/13/17 19:28, Bernd Edlinger wrote:
>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>> Hi,
>>>>>
>>>>> this is related to PR77308, the follow-up patch will depend on this
>>>>> one.
>>>>>
>>>>> When trying to split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>> before reload, a mis-compilation in libgcc function __gnu_satfractdasq
>>>>> was discovered, see [1] for more details.
>>>>>
>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>> up into this:
>>>>>
>>>>>    [(set (reg:CC CC_REGNUM)
>>>>>          (compare:CC (match_dup 0) (match_dup 1)))
>>>>>     (parallel [(set (reg:CC CC_REGNUM)
>>>>>                     (compare:CC (match_dup 3) (match_dup 4)))
>>>>>                (set (match_dup 2)
>>>>>                     (minus:SI (match_dup 5)
>>>>>                               (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>
>>>>>    [(set (reg:CC CC_REGNUM)
>>>>>          (compare:CC (match_dup 2) (match_dup 3)))
>>>>>     (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>                (set (reg:CC CC_REGNUM)
>>>>>                     (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>
>>>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>>>> redundant and thus got removed, because the data values are identical.
>>>>>
>>>>> I think that applies to a number of similar patterns where data
>>>>> flow is happening through the CC reg.
>>>>>
>>>>> So this is a kind of correctness issue, and should be fixed
>>>>> independently from the optimization issue PR77308.
>>>>>
>>>>> Therefore I think the patterns need to specify the true
>>>>> value that will be in the CC reg, in order for cse to
>>>>> know what the instructions are really doing.
>>>>>
>>>>>
>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>> Is it OK for trunk?
>>>>>
>>>>
>>>> I agree you've found a valid problem here, but I have some issues with
>>>> the patch itself.
>>>>
>>>>
>>>> (define_insn_and_split "subdi3_compare1"
>>>>   [(set (reg:CC_NCV CC_REGNUM)
>>>>         (compare:CC_NCV
>>>>           (match_operand:DI 1 "register_operand" "r")
>>>>           (match_operand:DI 2 "register_operand" "r")))
>>>>    (set (match_operand:DI 0 "register_operand" "=&r")
>>>>         (minus:DI (match_dup 1) (match_dup 2)))]
>>>>   "TARGET_32BIT"
>>>>   "#"
>>>>   "&& reload_completed"
>>>>   [(parallel [(set (reg:CC CC_REGNUM)
>>>>                    (compare:CC (match_dup 1) (match_dup 2)))
>>>>               (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
>>>>    (parallel [(set (reg:CC_C CC_REGNUM)
>>>>                    (compare:CC_C
>>>>                      (zero_extend:DI (match_dup 4))
>>>>                      (plus:DI (zero_extend:DI (match_dup 5))
>>>>                               (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>               (set (match_dup 3)
>>>>                    (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>                              (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>
>>>>
>>>> This pattern is now no longer self-consistent in that before the split
>>>> the overall result for the condition register is in mode CC_NCV, but
>>>> afterwards it is just CC_C.
>>>>
>>>> I think CC_NCV is the correct mode (the N, C and V bits all correctly
>>>> reflect the result of the 64-bit comparison), but that then implies
>>>> that the cc mode of subsi3_carryin_compare is incorrect as well and
>>>> should in fact also be CC_NCV.  Thinking about this pattern, I'm
>>>> inclined to agree that CC_NCV is the correct mode for this operation.
>>>>
>>>> I'm not sure if there are other consequences that will fall out from
>>>> fixing this (it's possible that we might need a change to
>>>> select_cc_mode as well).
>>>>
>>>
>>> Yes, this is still a bit awkward...
>>>
>>> The N and V bits will be the correct result for the subdi3_compare1,
>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>>> only gets the C bit correct; the expression for N and V is a different
>>> one.
>>>
>>> It probably works, because the subsi3_carryin_compare instruction sets
>>> more CC bits than the pattern explicitly specifies.
>>> We know the subsi3_carryin_compare also computes the NV bits, but it is
>>> hard to write down the correct rtl expression for it.
>>>
>>> In theory the pattern should describe everything correctly,
>>> maybe, like:
>>>
>>> set (reg:CC_C CC_REGNUM)
>>>     (compare:CC_C
>>>       (zero_extend:DI (match_dup 4))
>>>       (plus:DI (zero_extend:DI (match_dup 5))
>>>                (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>> set (reg:CC_NV CC_REGNUM)
>>>     (compare:CC_NV
>>>       (match_dup 4))
>>>       (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))
>>> set (match_dup 3)
>>>     (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>               (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>
>>>
>>> But I doubt that will work to set CC_REGNUM with two different modes
>>> in parallel?
>>>
>>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>>> defines C from the DImode result, and NV from the SImode result,
>>> similar to the CC_NOOVmode, that also leaves open which bits
>>> it really defines?
>>>
>>>
>>> What do you think?
>>>
>>>
>>> Thanks
>>> Bernd.
>>
>> I think maybe the right solution is to invent a new CCmode
>> that defines C as if the comparison is done in DImode
>> but N and V as if the comparison is done in SImode.
>>
>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>> only N and Z are set correctly), but in a different patch of course.
>>
>> Attached is a new version that implements the new CCmode.
>>
>> How do you like this new version?
>>
>> It seems to be able to build a cross-compiler at least.
>>
>> I will start a new bootstrap with this new patch, but that can take some
>> time until I have definitive results.
>>
>> Is there still a chance that it can go into gcc-7 or should it wait
>> for the next stage1?
>>
>> Thanks
>> Bernd.
>
>
> I thought I should also look at where the subdi_compare1 and the
> negdi2_compare patterns are used, and look if the caller is fine with
> not having all CC bits available.
>
> And indeed usubv<mode>4 turns out to be questionable, because it
> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
> CCmode) which is inconsistent when subdi3_compare1 no longer uses
> CCmode.
>
> To correct this, the branch should use CC_Cmode which is always defined.
>
> So I tried to test this pattern, with the following test programs,
> and found that the code actually improves when the branch uses CC_Cmode
> instead of CCmode, both for SImode and DImode data, which was a bit
> surprising.
>
> I used this test program to see how the usubv<mode>4 pattern works:
>
> cat test.c (DImode)
> unsigned long long x, y, z;
> int b;
> void test()
> {
>    b = __builtin_sub_overflow (y,z, &x);
> }
>
>
> unpatched code used 8 bytes more stack than patched,
> because the DImode subtraction is effectively done twice.
>
> cat test1.c (SImode)
> unsigned long x, y, z;
> int b;
> void test()
> {
>    b = __builtin_sub_overflow (y,z, &x);
> }
>
> which generates (unpatched):
>          cmp     r3, r0
>          sub     ip, r3, r0
>
> instead of expected (patched):
>          subs    r3, r3, r2
>
>
> The condition is extracted by ifconversion and/or combine
> and complicates the resulting code instead of simplifying.
>
> I think this happens only when the branch and the subsi/di3_compare1
> are using the same CC mode.
>
> That does not happen when the CC modes disagree, as with the
> proposed patch.  All other uses of the pattern are already using
> CC_Cmode or CC_Vmode in the branch, and these do not change.
>
> Attached is an updated version of the patch, that happens to
> improve the code generation of the usubsi4 and usubdi4 patterns,
> as a side effect.
>
>
> Bootstrapped and reg-tested on arm-linux-gnueabihf.
> Is it OK for trunk?
>
>
> Thanks
> Bernd.
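
For anyone who wants to sanity-check the carry semantics discussed in the
thread without an ARM target, here is a minimal sketch in plain C.  It
compares __builtin_sub_overflow on 64-bit operands with a hand-written
two-step model: subtract the low words, then subtract the high words minus
the borrow, taking the final borrow from a widened comparison in the same
shape as the compare:CC_C expression quoted above.  The function names
usub64_builtin and usub64_two_step are hypothetical and exist only for this
illustration; the model is my reading of the split sequence, not code from
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper: the operation the test programs in the thread
   exercise.  Returns true when a - b wraps (unsigned borrow out). */
bool usub64_builtin(uint64_t a, uint64_t b, uint64_t *res)
{
    assert(res != NULL);
    return __builtin_sub_overflow(a, b, res);
}

/* Hypothetical model of the two-instruction split on a 32-bit target.
   This is an assumption about what the RTL describes, written in C. */
bool usub64_two_step(uint64_t a, uint64_t b, uint64_t *res)
{
    assert(res != NULL);

    uint32_t alo = (uint32_t)a, ahi = (uint32_t)(a >> 32);
    uint32_t blo = (uint32_t)b, bhi = (uint32_t)(b >> 32);

    /* Step 1: low-word subtraction; the borrow plays the role of
       (ltu (reg:CC_C CC_REGNUM) (const_int 0)). */
    uint32_t lo = alo - blo;
    bool borrow_lo = alo < blo;

    /* Step 2: high-word subtraction with borrow-in.  The borrow-out is
       computed in 64 bits, mirroring
       (compare (zero_extend ahi) (plus (zero_extend bhi) borrow)). */
    uint32_t hi = ahi - bhi - (uint32_t)borrow_lo;
    bool borrow_hi = (uint64_t)ahi < (uint64_t)bhi + (uint64_t)borrow_lo;

    *res = ((uint64_t)hi << 32) | lo;
    return borrow_hi;
}
```

The point of the widened comparison in step 2 is that bhi plus the borrow
can itself be 2^32, which a 32-bit comparison would lose; that is why the
C bit is described in DImode while the result is SImode.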