> > In the PR, the spill happens in the initial basic block of the function,
> > i.e. the one with the highest frequency.
> >
> > Also as noted in the PR, swapping the 'unlikely' branch to 'likely' avoids
> > the spill, even though it does not affect the frequency of the initial
> > basic block, and makes the block with the use more rarely executed.
>
> The spill is mainly decided by 3 insns related to r92:
>
> (insn 3 61 4 2 (set (reg/v:SF 92 [ x ])
>         (reg:SF 102)) "test3.c":7:1 142 {*movsf_internal}
>      (expr_list:REG_DEAD (reg:SF 102)
>
> (insn 9 4 12 2 (set (reg:SI 89 [ _11 ])
>         (subreg:SI (reg/v:SF 92 [ x ]) 0)) "test3.c":3:36 81 {*movsi_internal}
>      (nil))
>
> And
>
> (insn 28 27 29 5 (set (reg:DF 98)
>         (float_extend:DF (reg/v:SF 92 [ x ]))) "test3.c":11:13 163 {*extendsfdf2}
>      (expr_list:REG_DEAD (reg/v:SF 92 [ x ])
>         (nil)))
> (insn 29 28 30 5 (s
>
> The frequency for INSN 3 and INSN 9 is not affected, but the frequency of
> INSN 28 drops from 805 to 89 after swapping "unlikely" and "likely". Because
> of that, the GPR cost decreases a lot, which finally makes the RA choose GPR
> instead of MEM.
>
> GENERAL_REGS:2356,2356
> SSE_REGS:6000,6000
> MEM:4089,4089

But why is SSE_REGS costed so high? r92 is used in SFmode, so it doesn't make
sense that selecting a GPR for it looks cheaper than xmm0.

> Dump of 301.ira:
>
>   a4(r92,l0) costs: AREG:2356,2356 DREG:2356,2356 CREG:2356,2356
>   BREG:2356,2356 SIREG:2356,2356 DIREG:2356,2356 AD_REGS:2356,2356
>   CLOBBERED_REGS:2356,2356 Q_REGS:2356,2356 NON_Q_REGS:2356,2356
>   TLS_GOTBASE_REGS:2356,2356 GENERAL_REGS:2356,2356 SSE_FIRST_REG:6000,6000
>   NO_REX_SSE_REGS:6000,6000 SSE_REGS:6000,6000 \
>   MMX_REGS:19534,19534 INT_SSE_REGS:19534,19534 ALL_REGS:214534,214534
>   MEM:4089,4089
>
> And although there's no spill, there's an extra VMOVD in the later BB which
> looks suboptimal. (I guess we can live with that since it's cold.)

I think that falls out of the wrong decision for the SSE_REGS cost.

Alexander

>
>         vmovd   %eax, %xmm2
>         vcvtss2sd       %xmm2, %xmm2, %xmm1
>         vmulsd  %xmm0, %xmm1, %xmm0
>         vcvtsd2ss       %xmm0, %xmm0, %xmm0
>
> >
> > Do you have a root cause analysis that explains the above?
> >
> > Alexander