Hi, The patch fixes two IT issues: 1) IT block is splitted for A15 (and Cortex-M).
For A15 tune, max cond insns (max_insns_skipped) is 2, which is set as the maximum allowed insns in an IT block (see thumb2_final_prescan_insn). So IT which has 3 or 4 instructions, will be splitted. Take the first if-then-else in the test case of the patch as example, it will generate ITs like: cmp r0, #10 ite gt subgt r0, r0, r1 suble r0, r1, r0 ite gt addgt r0, r0, #10 suble r0, r0, #7 It does not make sense to split the IT. For cortex-m, the IT can not be folded if the previous insn is 4 BYTES. 2) For arm_v7m_tune, max cond insns is not aligned with its branch cost. In ifcvt.c, the relation between MAX_CONDITIONAL_EXECUTE (max cond insns) and branch cost is: #ifndef MAX_CONDITIONAL_EXECUTE #define MAX_CONDITIONAL_EXECUTE \ (BRANCH_COST (optimize_function_for_speed_p (cfun), false) \ + 1) #endif So the best value of max cond insns for v7m_tune should be 2. Bootstrap and no make check regression on ARM Chrome book. No make check regression for Cortex-M3. Cortex-M4 O2 performance changes on coremark, dhrystone and eembc-v1: coremark: -0.11% dhrystone: 1.26%. a2time01_lite: 2.63% canrdr01_lite: 4.27% iirflt01_lite: 6.51% rspeed01_lite: 6.51% dither01_lite: 7.36% The biggest regression in eembc-v1 is pntrch01_lite: -0.51% All other regressions < 0.1% Cortex-M4 O3 performance changes are similar with O2, except one regression due to loop alignment change. OK for trunk? Thanks! -Zhenqiang 2014-01-14 Zhenqiang Chen <zhenqiang.c...@arm.com> * config/arm/arm.c (arm_v7m_tune): Set max_insns_skipped to 2. (thumb2_final_prescan_insn): Set max to MAX_INSN_PER_IT_BLOCK. testsuite/ChangeLog: 2014-01-14 Zhenqiang Chen <zhenqiang.c...@arm.com> * gcc.target/arm/its.c: New test. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 39d23cc..8751675 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -1696,7 +1696,7 @@ const struct tune_params arm_v7m_tune = &v7m_extra_costs, NULL, /* Sched adj cost. */ 1, /* Constant limit. */ - 5, /* Max cond insns. */ + 2, /* Max cond insns. */ ARM_PREFETCH_NOT_BENEFICIAL, true, /* Prefer constant pool. */ arm_cortex_m_branch_cost, @@ -22131,11 +22131,11 @@ thumb2_final_prescan_insn (rtx insn) int mask; int max; - /* Maximum number of conditionally executed instructions in a block - is minimum of the two max values: maximum allowed in an IT block - and maximum that is beneficial according to the cost model and tune. */ - max = (max_insns_skipped < MAX_INSN_PER_IT_BLOCK) ? - max_insns_skipped : MAX_INSN_PER_IT_BLOCK; + /* max_insns_skipped in the tune was already taken into account in the + cost model of ifcvt pass when generating COND_EXEC insns. At this stage + just emit the IT blocks as we can. It does not make sense to split + the IT blocks. */ + max = MAX_INSN_PER_IT_BLOCK; /* Remove the previous insn from the count of insns to be output. */ if (arm_condexec_count) diff --git a/gcc/testsuite/gcc.target/arm/its.c b/gcc/testsuite/gcc.target/arm/its.c new file mode 100644 index 0000000..8eecdf4 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/its.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +int test (int a, int b) +{ + int r; + if (a > 10) + { + r = a - b; + r += 10; + } + else + { + r = b - a; + r -= 7; + } + if (r > 0) + r -= 3; + return r; +} +/* { dg-final { scan-assembler-times "\tit" 2 { target arm_thumb2 } } } */