>> 1) If -fira-loop-pressure is enabled, it reduces invariant motions by ~24% in
>> my tests. But it does not help total code size. There seems to be an issue
>> with updating "regs_needed" after moving an invariant out of the loop (my
>> benchmark logs show that in ~73% of cases more than one invariant is moved).
>>
>> During tracing, I found that moving an integer constant out of the loop does
>> not increase regs_needed. Function "get_pressure_class_and_nregs (rtx insn,
>> int *nregs)" computes "regs_needed":
>>
>>   *nregs
>>     = ira_reg_class_max_nregs[pressure_class][GET_MODE (SET_SRC (set))];
>>
>> On ARM, the insn to set an integer looks like
>>
>>   (set (reg:SI 183)
>>        (const_int 32 [0x20])) inv1.c:64 182 {*thumb1_movsi_insn}
>>        (nil))
>>
>> GET_MODE (SET_SRC (set)) is VOIDmode and
>> ira_reg_class_max_nregs[pressure_class][VOIDmode] is 0. In one of my test
>> cases, it moves 4 integer constants out of the loop, which leads to spilling.
>>
>> According to the algorithm in "calculate_loop_reg_pressure", moving an
>> invariant out of the loop should affect the register pressure. So I
>> tried adding the following code:
>>
>>   if (! (*nregs))
>>     *nregs = ira_reg_class_max_nregs[pressure_class][GET_MODE (reg)];
>>
>> Logs show it reduces invariant motions by another 32%. But the code size is
>> still far from what disabling the pass gives. Logs show -fira-loop-pressure
>> affects other passes in addition to loop2_invariant (the result of
>> "-fira-loop-pressure -fno-move-loop-invariants" is different from the result
>> of "-fno-move-loop-invariants").
>>
>> 2) By default -fira-loop-pressure is not enabled for -Os, and the logic to
>> compute "regs_used" does not seem sound. The following code is from function
>> "find_invariants_to_move":
>>
>>   {
>>     unsigned int n_regs = DF_REG_SIZE (df);
>>
>>     regs_used = 2;
>>
>>     for (i = 0; i < n_regs; i++)
>>       {
>>         if (!DF_REGNO_FIRST_DEF (i) && DF_REGNO_LAST_USE (i))
>>           {
>>             /* This is a value that is used but not changed inside loop.  */
>>             regs_used++;
>>           }
>>       }
>>   }
>>
>> * There is no loop-related information in the code.
>> * Benchmark logs show the condition (!DF_REGNO_FIRST_DEF (i) &&
>>   DF_REGNO_LAST_USE (i)) is never true.
>
> Still there is code that tries to deal with -Os. Simply disabling the pass
> makes that logic pointless.

If -fira-loop-pressure is not enabled, function estimate_reg_pressure_cost
(cfgloopanal.c) is used to estimate the cost. At the beginning of the
function, it checks

  /* If we have enough registers, we should use them and not restrict
     the transformations unnecessarily.  */
  if (regs_needed + target_res_regs <= available_regs)
    return 0;

Here are the CSiBE benchmark logs taken just before "if (...)" for
ARM/MIPS/PPC/X86:

         available_regs  target_res_regs  regs_needed
  ARM :  9               3                2
  MIPS:  10/26           3                2
  PPC :  18/29           3                2
  X86 :  6/15            3                2

regs_needed is incremented after each invariant motion, so the size_cost of
the first several invariant motions (available_regs - target_res_regs (3) -
regs_needed (2) of them) is always 0. So I prefer to disable the pass if
-fira-loop-pressure is not enabled.

> Thus, please try to fix the code that is there to deal with -Os (a target
> may opt to enable -fira-loop-pressure by default for -Os).

Yes. Targets need tuning before enabling -fira-loop-pressure. With
-fira-loop-pressure, CSiBE logs show that MIPS and PPC improve a little and
X86 regresses a little compared with -fira-loop-pressure disabled.

If -fira-loop-pressure is enabled, the cost check is based on

  if ((int) new_regs[pressure_class]
      + (int) regs_needed[pressure_class]
      + LOOP_DATA (curr_loop)->max_reg_pressure[pressure_class]
      + IRA_LOOP_RESERVED_REGS
      > ira_available_class_regs[pressure_class])

But a register being available does not mean it can be used in any
instruction. E.g. on ARM Cortex-M0, only a few instructions can use r8-r15
(r8-r11 and r13-r15 are already excluded from available_regs). Logs show the
result is much better if r12 is also excluded.

Thanks!
-Zhenqiang
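
P.S. To make the arithmetic behind the early exit in
estimate_reg_pressure_cost concrete, here is a minimal standalone sketch in C.
It is not the GCC code: fits_in_registers and free_motions are hypothetical
helpers that only model the "regs_needed + target_res_regs <= available_regs"
check, under the assumptions that every moved invariant needs exactly one
register and that regs_used stays at its initial value of 2.

```c
/* Hypothetical model of the early exit in estimate_reg_pressure_cost.
   Returns 1 when the check passes, i.e. when the cost would be 0.
   n_new is the number of registers taken by moved invariants,
   n_old the number of registers assumed to be used already.  */
int
fits_in_registers (unsigned n_new, unsigned n_old,
                   unsigned target_res_regs, unsigned available_regs)
{
  unsigned regs_needed = n_new + n_old;

  return regs_needed + target_res_regs <= available_regs;
}

/* Count how many successive invariant motions still pass the check,
   assuming (for illustration only) one new register per motion.  */
unsigned
free_motions (unsigned regs_used, unsigned target_res_regs,
              unsigned available_regs)
{
  unsigned new_regs = 0;

  /* Try to move one more invariant as long as the check still holds
     with that extra register counted in.  */
  while (fits_in_registers (new_regs + 1, regs_used,
                            target_res_regs, available_regs))
    new_regs++;

  return new_regs;
}
```

Under these assumptions, free_motions (2, 3, 9) is 4 for the ARM row and
free_motions (2, 3, 6) is 1 for the lower of the two X86 values, which is the
available_regs - target_res_regs - regs_needed count of motions that see a
zero size cost.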