Ramana,

On 7 July 2014 13:48, Venkataramanan Kumar
<venkataramanan.ku...@linaro.org> wrote:
> Hi Ramana/Maxim,
>
> On 18 June 2014 16:05, Venkataramanan Kumar
> <venkataramanan.ku...@linaro.org> wrote:
>> Hi Ramana,
>>
>> On 18 June 2014 15:29, Ramana Radhakrishnan <ramana....@googlemail.com>
>> wrote:
>>> On Mon, Jun 16, 2014 at 1:53 PM, Venkataramanan Kumar
>>> <venkataramanan.ku...@linaro.org> wrote:
>>>> Hi Maintainers,
>>>>
>>>> This patch fixes PR 60617, which occurs when we turn on the reload
>>>> pass in Thumb-2 mode.
>>>>
>>>> It occurs for the pattern "*ior_scc_scc", which gets generated for
>>>> the third argument of the function call below.
>>>>
>>>> JIT::emitStoreInt32(dst, regT0, (op1 == dst || op2 == dst));
>>>>
>>>> (----snip---)
>>>> (insn 634 633 635 27 (parallel [
>>>>             (set (reg:SI 3 r3)
>>>>                 (ior:SI (eq:SI (reg/v:SI 110 [ dst ])  <== this operand gets
>>>>                             (reg/v:SI 112 [ op2 ]))        register r5 assigned
>>>>                         (eq:SI (reg/v:SI 110 [ dst ])  <== this operand
>>>>                             (reg/v:SI 111 [ op1 ]))))
>>>>             (clobber (reg:CC 100 cc))
>>>>         ]) ../Source/JavaScriptCore/jit/JITArithmetic32_64.cpp:179 300 {*ior_scc_scc}
>>>> (----snip---)
>>>>
>>>> The issue here is that the above pattern demands 5 low registers
>>>> (LO_REGS).
>>>>
>>>> But when we are in reload, register r0 is used for the pointer to
>>>> the class, r1 and r2 for the first and second arguments, and r7 as
>>>> the frame pointer.
>>>>
>>>> So we are left with r3, r4, r5 and r6. But the pattern above needs
>>>> five LO_REGS, hence we get a spill failure when processing the last
>>>> register operand in that pattern.
>>>>
>>>> In the ARM port, TARGET_LIKELY_SPILLED_CLASS is defined for Thumb-1,
>>>> and for Thumb-2 mode there is a mention of using LO_REGS in this
>>>> comment:
>>>>
>>>> "Care should be taken to avoid adding thumb-2 patterns that require
>>>> many low registers"
>>>>
>>>> So the conservative fix is to not allow this pattern in Thumb-2 mode.
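(Editorial note: for readers without the WebKit sources at hand, the shape of source code that reduces to this *ior_scc_scc pattern can be sketched as a standalone C function. This is a reduced illustration only, not the actual JavaScriptCore code; the function name is made up and the argument names follow the RTL dump above.)

```c
/* Reduced illustration of the kind of expression that expands to
   *ior_scc_scc: two equality tests against the same value, combined
   with ||, materialized as an integer result.  On the 4.9 branch with
   the old reload pass and -mthumb, this shape of expression can demand
   five low registers at once.  */
int needs_ior_scc_scc(int dst, int op1, int op2)
{
    return (op1 == dst) || (op2 == dst);
}
```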
>>>
>>> I don't have an alternative solution off the top of my head and
>>> probably need to go do some digging.
>>>
>>> It sounds like the conservative fix, but what's the impact of doing
>>> so? Have you measured it in terms of performance or code size on a
>>> range of benchmarks?
>>>
>>
>> I haven't done any benchmark testing yet. I will try to run some
>> benchmarks with my patch.
>>
>>>> I allowed these patterns in Thumb-2 when we have constant operands
>>>> for the comparison. That makes the target tests
>>>> arm/thumb2-cond-cmp-1.c to thumb2-cond-cmp-4.c pass.
>>>
>>> That sounds fine and fair - no trouble there.
>>>
>>> My concern is with removing the register alternatives and losing the
>>> ability to trigger conditional compares on 4.9 and trunk for Thumb-2
>>> until the time the "new" conditional compare work makes it in.
>>>
>>> Ramana
>
> I tested this conservative fix with Coremark (ran it on a Chromebook)
> and SPEC CPU2006 (cross-compiled and compared assembly differences).
>
> With Coremark there are no performance issues. In fact there are no
> assembly differences with the CPU flags for A15 and A9.
>
> For SPEC2006 I cross-compiled and compared assembly differences with
> and without the patch (-O3 -fno-common). I have not run these
> benchmarks.
>
> There are major code differences, and they are due to conditional
> compare instructions not getting generated, as you expected. This also
> results in different physical register numbers being assigned in the
> generated code, and there are code scheduling differences when
> comparing against the base.
>
> I am showing a simple test case to illustrate the conditional compare
> difference I am seeing in the SPEC2006 benchmarks.
>
> char a, b;
> int i;
> int f(int j)
> {
>   if ((i >= a) ?
>       (j <= a) : 1)
>     return 1;
>   else
>     return 0;
> }
>
> GCC FSF 4.9
> -----------
>
>         movw    r2, #:lower16:a
>         movw    r3, #:lower16:i
>         movt    r2, #:upper16:a
>         movt    r3, #:upper16:i
>         ldrb    r2, [r2]        @ zero_extendqisi2
>         ldr     r3, [r3]
>         cmp     r2, r3
>         it      le
>         cmple   r2, r0          <== conditional compare instruction generated
>         ite     ge
>         movge   r0, #1
>         movlt   r0, #0
>         bx      lr
>
> With patch
> ----------
>
>         movw    r2, #:lower16:a
>         movw    r3, #:lower16:i
>         movt    r2, #:upper16:a
>         movt    r3, #:upper16:i
>         ldr     r3, [r3]
>         ldrb    r2, [r2]        @ zero_extendqisi2
>         cmp     r2, r3
>         ite     le
>         movle   r3, #0          <== suboptimal moves generated
>         movgt   r3, #1          <==
>         cmp     r2, r0
>         ite     lt
>         movlt   r0, r3
>         orrge   r0, r3, #1      <==
>         bx      lr
>
> The following benchmarks have the largest number of conditional
> compare pattern differences, and also code scheduling changes and
> different physical register numbers in the generated code:
>
> 416.gamess
> 434.zeusmp
> 400.perlbench
> 403.gcc
> 445.gobmk
> 483.xalancbmk
> 401.bzip2
> 433.milc
> 435.gromacs
>
> Assembly file differences are also seen in the benchmarks below, but
> they are not very large:
>
> 436.cactusADM
> 447.dealII
> 454.calculix
> 456.hmmer
> 453.povray
> 465.tonto
> 471.omnetpp
> 459.GemsFDTD
> 437.leslie3d
> 464.h264ref
>
> Based on these results, I am of the opinion that this patch would
> cause some performance hit if it were upstreamed on trunk or GCC 4.9.
>
> Another thing to note is that this bug only occurs when we compile
> with -fno-tree-dce and -mno-lra on GCC 4.9.
>
> Also, on current FSF trunk and GCC 4.9, LRA is the default for ARM,
> and LRA does not have this bug.
>
> My opinion is that the reporter can use this fix to build the package
> that triggered the bug, if it has to be built with -fno-tree-dce.
>
> I am not sure we can do anything at the machine description level to
> prevent this bug, and I am not sure it would be easy to fix in the
> reload pass either.
>
> Do you have any suggestions?
>

So this is a 4.8-only bug, or a 4.9/trunk bug when LRA is disabled.
Ramana, do you think it's worth fixing in 4.8, at the price of a likely
performance degradation, or leaving it open as a "known bug" on that
branch?
Thanks,

Christophe

> regards,
> Venkat.
>
>> This bug does not occur when LRA is enabled. In 4.9 FSF and trunk,
>> the LRA pass is now enabled by default. It may be too conservative,
>> but is there a way to enable this pattern when we have the LRA pass
>> and disable it when we have the old reload pass?
>>
>> regards,
>> Venkat.
>>
>>>> Regression tested against the GCC 4.9 branch, since on trunk this
>>>> bug has been masked since revision 209897.
>>>>
>>>> Please provide your suggestions on this patch.
>>>>
>>>> regards,
>>>> Venkat.