https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114729
            Bug ID: 114729
           Summary: RISC-V SPEC2017 507.cactu excessive spills with -fschedule-insns
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vineetg at gcc dot gnu.org
                CC: jeffreyalaw at gmail dot com, kito.cheng at gmail dot com,
                    rdapp at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57953
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57953&action=edit
spec cactu reduced

In RISC-V SPEC runs, Cactu dynamic icounts are the worst of all benchmarks when compared to aarch64 with similar build toggles (-Ofast). As of upstream commit 3fed1609f610 of 2024-01-31:

aarch64: 1,363,212,534,747
risc-v : 2,852,277,890,338

There's an existing issue, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106265, which captures ongoing work to improve the stack/array accesses. However, that is more damage control than a fix. The root cause is excessive stack spills on RISC-V.

Robin noticed these were somehow triggered by the first scheduling pass. Disabling sched1 with -fno-schedule-insns roughly halves the total icount to 1,295,520,619,523, which is even slightly better than aarch64, all things considered.

I ran a reducer (tracking the token sfp in -fverbose-asm output) and was able to get a test which shows a single stack spill (store+load) with default/-fschedule-insns and none with -fno-schedule-insns.

It seems sched1 is moving insns around, but the actual spills are generated by IRA. So this is an interplay of sched1 and IRA.
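The reduction criterion described above (spill present with sched1, absent without it) could be checked by a small script along these lines. This is only a sketch, not the script actually used: the function names are hypothetical, the assembly fragments are made-up lines rather than real compiler output, and it assumes the spill marker shows up as the word `sfp` somewhere on the affected lines of the annotated assembly.

```python
import re

# Assumption: a spill slot access is annotated with the token "sfp"
# (for example as "%sfp") in the comment part of the assembly line.
SFP_TOKEN = re.compile(r"\bsfp\b")


def count_sfp_lines(asm_text: str) -> int:
    """Count assembly lines that mention the sfp token."""
    return sum(1 for line in asm_text.splitlines() if SFP_TOKEN.search(line))


def is_interesting(asm_with_sched1: str, asm_without_sched1: str) -> bool:
    """True when spills appear only in the build that ran sched1."""
    return (count_sfp_lines(asm_with_sched1) > 0
            and count_sfp_lines(asm_without_sched1) == 0)


# Made-up fragments for illustration only (not real compiler output).
ASM_WITH_SCHED1 = """\
    sd   a5,8(sp)    # tmp, %sfp
    ld   a5,8(sp)    # %sfp, tmp
    add  a0,a0,a5
"""
ASM_WITHOUT_SCHED1 = """\
    add  a0,a0,a5
"""

if __name__ == "__main__":
    print(count_sfp_lines(ASM_WITH_SCHED1))
    print(is_interesting(ASM_WITH_SCHED1, ASM_WITHOUT_SCHED1))
```

A reducer would call something like `is_interesting` on the two assembly outputs of each candidate and keep the candidate only when it returns true.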
```
ira
New iteration of spill/restore move
Changing RTL for loop 2 (header bb6)
Changing RTL for loop 1 (header bb4)
  26 vs parent 26:Creating newreg=246 from oldreg=137
  25 vs parent 25:Creating newreg=247 from oldreg=143
  11 vs parent 11:Creating newreg=248 from oldreg=223
  16 vs parent 16:Creating newreg=249 from oldreg=237
Changing RTL for loop 3 (header bb3)
  26 vs parent 26:Creating newreg=250 from oldreg=246
  25 vs parent 25:Creating newreg=251 from oldreg=247
  -1 vs parent 11:Creating newreg=253 from oldreg=248
  16 vs parent 16:Creating newreg=254 from oldreg=249
...
scanning new insn with uid = 181.
scanning new insn with uid = 182.
scanning new insn with uid = 183.
scanning new insn with uid = 184.
changing bb of uid 194
  unscanned insn
scanning new insn with uid = 185.
scanning new insn with uid = 186.
scanning new insn with uid = 187.
scanning new insn with uid = 188.
changing bb of uid 195
  unscanned insn
...
+++Costs: overall 11650, reg 10680, mem 970, ld 485, st 485, move 1366
+++ move loops 0, new jumps 2
...
(insn 9 104 11 2 (set (reg/f:DI 137 [ r.4_4 ])
        (mem/f/c:DI (lo_sum:DI (reg/f:DI 155)
                (symbol_ref:DI ("r") [flags 0x86] <var_decl 0x7a69fcdb1630 r>)) [4 r+0 S8 A64])) {*movdi_64bit}
     (expr_list:REG_DEAD (reg/f:DI 155)
        (expr_list:REG_EQUAL (mem/f/c:DI (symbol_ref:DI ("r") [flags 0x86] <var_decl 0x7a69fcdb1630 r>) [4 r+0 S8 A64])
(insn 115 165 181 2 (set (reg:DI 245)
        (const_int 1 [0x1])) {*movdi_64bit}
     (expr_list:REG_EQUIV (const_int 1 [0x1])
---- spill code start -----
(insn 181 115 182 2 (set (reg/f:DI 246 [orig:137 r.4_4 ] [137])
        (reg/f:DI 137 [ r.4_4 ])) {*movdi_64bit}
     (expr_list:REG_DEAD (reg/f:DI 137 [ r.4_4 ])
(insn 182 181 183 2 (set (reg/f:DI 247 [orig:143 w.9_10 ] [143])
        (reg/f:DI 143 [ w.9_10 ])) {*movdi_64bit}
     (expr_list:REG_DEAD (reg/f:DI 143 [ w.9_10 ])
(insn 183 182 184 2 (set (reg:DI 248 [orig:223 MEM[(int *)j.15_19 + 4B] ] [223])
        (reg:DI 223 [ MEM[(int *)j.15_19 + 4B] ])) {*movdi_64bit}
     (expr_list:REG_DEAD (reg:DI 223 [ MEM[(int *)j.15_19 + 4B] ])
(insn 184 183 174 2 (set (reg:DI 249 [orig:237 _38 ] [237])
        (reg:DI 237 [ _38 ])) {*movdi_64bit}
     (expr_list:REG_DEAD (reg:DI 237 [ _38 ])
---- spill code -----
(jump_insn 174 184 175 2 (set (pc)
        (label_ref 100)) 350 {jump}
     (nil)
 -> 100)
(barrier 175 174 196)
---- spill code start -----
(code_label 196 175 195 10 10 (nil) [1 uses])
(note 195 196 189 10 [bb 10] NOTE_INSN_BASIC_BLOCK)
(insn 189 195 190 10 (set (reg/f:DI 250 [orig:137 r.4_4 ] [137])
        (reg/f:DI 246 [orig:137 r.4_4 ] [137])) {*movdi_64bit}
     (expr_list:REG_DEAD (reg/f:DI 246 [orig:137 r.4_4 ] [137])
(insn 190 189 191 10 (set (reg/f:DI 251 [orig:143 w.9_10 ] [143])
        (reg/f:DI 247 [orig:143 w.9_10 ] [143])) {*movdi_64bit}
     (expr_list:REG_DEAD (reg/f:DI 247 [orig:143 w.9_10 ] [143])
(insn 191 190 192 10 (set (reg/v:DI 252 [orig:152 i ] [152])
        (reg/v:DI 152 [ i ])) 208 {*movdi_64bit}
     (expr_list:REG_DEAD (reg/v:DI 152 [ i ])
(insn 192 191 193 10 (set (reg:DI 253 [orig:223 MEM[(int *)j.15_19 + 4B] ] [223])
        (reg:DI 248 [orig:223 MEM[(int *)j.15_19 + 4B] ] [223])) {*movdi_64bit}
     (expr_list:REG_DEAD (reg:DI 248 [orig:223 MEM[(int *)j.15_19 + 4B] ] [223])
(insn 193 192 97 10 (set (reg:DI 254 [orig:237 _38 ] [237])
        (reg:DI 249 [orig:237 _38 ] [237])) {*movdi_64bit}
     (expr_list:REG_DEAD (reg:DI 249 [orig:237 _38 ] [237])
---- spill code -----
(code_label 97 193 14 3 3 (nil) [1 uses])
(note 14 97 33 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 17 56 21 3 (set (reg:DF 159 [ l ])
        (mem/c:DF (lo_sum:DI (reg/f:DI 234)
```
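To see which original pseudo each of the copies in the dump above comes from, the `Creating newreg=... from oldreg=...` lines can be chained together. The following is a minimal sketch for reading the dump, not part of GCC: the helper names are hypothetical, and the sample text is just the eight log lines quoted above with their prefixes stripped.

```python
import re

# Matches the IRA dump lines of the form
# "Creating newreg=246 from oldreg=137".
NEWREG = re.compile(r"Creating newreg=(\d+) from oldreg=(\d+)")


def parse_copies(dump_text: str) -> dict[int, int]:
    """Map each new pseudo number to the pseudo it was created from."""
    return {int(new): int(old) for new, old in NEWREG.findall(dump_text)}


def original_pseudo(reg: int, copies: dict[int, int]) -> int:
    """Follow the newreg -> oldreg chain back to the first pseudo.

    Assumes the chain has no cycles, which holds for the lines shown
    here because every new pseudo number is larger than its source.
    """
    while reg in copies:
        reg = copies[reg]
    return reg


# The eight "Creating newreg" lines from the dump above.
DUMP = """\
Creating newreg=246 from oldreg=137
Creating newreg=247 from oldreg=143
Creating newreg=248 from oldreg=223
Creating newreg=249 from oldreg=237
Creating newreg=250 from oldreg=246
Creating newreg=251 from oldreg=247
Creating newreg=253 from oldreg=248
Creating newreg=254 from oldreg=249
"""

copies = parse_copies(DUMP)

if __name__ == "__main__":
    for new in sorted(copies):
        print(f"r{new} <- r{original_pseudo(new, copies)}")
```

For the lines shown, this resolves pseudo 250 back to 137 and pseudo 253 back to 223, matching the `[orig:137 ...]` and `[orig:223 ...]` annotations on insns 189 and 192.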