https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115458
--- Comment #16 from Jeffrey A. Law <law at gcc dot gnu.org> ---
This is looking more and more like an LRA bug to me.

After IRA we have this sequence:

(insn 30 28 31 2 (set (reg:QI 10 a0)
        (reg:QI 151 [ D.129558 ])) "j.C":71:30 284 {*movqi_internal}
     (nil))
(call_insn 31 30 73 2 (parallel [
            (set (reg:RVVM8QI 104 v8)
                (call (mem:SI (symbol_ref:DI ("_Z15InterleaveUpperI4SimdIcLl1EEu14__rvv_int8m8_tET0_T_S3_S3_") [flags 0x41] <function_decl 0x7ffff0b9cd00 InterleaveUpper>) [0 InterleaveUpper S4 A32])
                    (const_int 0 [0])))
            (use (unspec:SI [
                        (const_int 1 [0x1])
                    ] UNSPEC_CALLEE_CC))
            (clobber (reg:SI 1 ra))
        ]) "j.C":71:30 468 {call_value_internal}
     (expr_list:REG_DEAD (reg:QI 10 a0)
        (expr_list:REG_DEAD (reg:RVVM8QI 112 v16)
            (expr_list:REG_CALL_DECL (symbol_ref:DI ("_Z15InterleaveUpperI4SimdIcLl1EEu14__rvv_int8m8_tET0_T_S3_S3_") [flags 0x41] <function_decl 0x7ffff0b9cd00 InterleaveUpper>)
                (nil))))
    (expr_list:QI (use (reg:QI 10 a0))
        (expr_list:RVVM8QI (use (reg:RVVM8QI 104 v8))
            (expr_list:RVVM8QI (use (reg:RVVM8QI 112 v16))
                (nil)))))
(insn 73 31 33 2 (parallel [
            (set (reg:RVVM8QI 159 [ v8 ])
                (reg:RVVM8QI 104 v8))
            (use (reg:SI 67 vtype))
        ]) "j.C":71:30 2850 {*movrvvm8qi_reg_whole_vtype}
     (nil))
(insn 33 73 37 2 (set (reg:RVVM1BI 145 [ _16 ])
        (if_then_else:RVVM1BI (unspec:RVVM1BI [
                    (const_vector:RVVM1BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (const_int 0 [0])
                    (const_int 2 [0x2])
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (gt:RVVM1BI (reg:RVVM8QI 135 [ _4 ])
                (const_vector:RVVM8QI repeat [
                        (const_int 0 [0])
                    ]))
            (unspec:RVVM1BI [
                    (reg:DI 0 zero)
                ] UNSPEC_VUNDEF))) "j.C":51:34 12655 {*pred_cmprvvm8qi_narrow}
     (expr_list:REG_DEAD (reg:DI 0 zero)
        (nil)))
(insn 37 33 35 2 (set (reg:QI 10 a0)
        (reg:QI 151 [ D.129558 ])) "j.C":72:17 discrim 1 284 {*movqi_internal}
     (nil))
(insn 35 37 38 2 (set (reg:RVVM8QI 112 v16)
        (if_then_else:RVVM8QI (unspec:RVVM1BI [
                    (reg:RVVM1BI 145 [ _16 ])
                    (const_int 0 [0])
                    (const_int 2 [0x2])
                    (const_int 0 [0]) repeated x2
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (minus:RVVM8QI (reg:RVVM8QI 135 [ _4 ])
                (reg:RVVM8QI 159 [ v8 ]))
            (reg:RVVM8QI 159 [ v8 ]))) "j.C":72:17 discrim 1 5590 {pred_subrvvm8qi}
     (expr_list:REG_DEAD (reg:SI 67 vtype)
        (expr_list:REG_DEAD (reg:SI 66 vl)
            (nil))))
(call_insn 38 35 40 2 (parallel [
            (call (mem:SI (symbol_ref:DI ("_Z14AssertVecEqualI4SimdIcLl1EEEvT_DTcl4ZerocvS2__EEES3_") [flags 0x41] <function_decl 0x7ffff0b93c00 AssertVecEqual>) [0 AssertVecEqual S4 A32])
                (const_int 0 [0]))
            (use (unspec:SI [
                        (const_int 1 [0x1])
                    ] UNSPEC_CALLEE_CC))
            (clobber (reg:SI 1 ra))
        ]) "j.C":72:17 discrim 1 467 {call_internal}
     (expr_list:REG_DEAD (reg:QI 10 a0)
        (expr_list:REG_DEAD (reg:RVVM8QI 112 v16)
            (expr_list:REG_DEAD (reg:RVVM8QI 104 v8)
                (expr_list:REG_CALL_DECL (symbol_ref:DI ("_Z14AssertVecEqualI4SimdIcLl1EEEvT_DTcl4ZerocvS2__EEES3_") [flags 0x41] <function_decl 0x7ffff0b93c00 AssertVecEqual>)
                    (nil)))))
    (expr_list:QI (use (reg:QI 10 a0))
        (expr_list:QI (use (reg:QI 10 a0))
            (expr_list:RVVM8QI (use (reg:RVVM8QI 104 v8))
                (expr_list:RVVM8QI (use (reg:RVVM8QI 112 v16))
                    (nil)))))

A few points to note.

Insn 31 is obviously a call which sets v8..v15, as expected. We do copy the result into a pseudo at insn 73. But note we keep v8..v15 live. That is potentially worrisome.

Insn 33 is pretty straightforward, though it is worth noting that with LMUL=8 the output is an earlyclobber.

Insn 35 is our problem. Note that it has a hard register output (v16..v23); that's a result of it being used as an argument register for the call at insn 38. The call at insn 38 is also using the value in v8..v15 as the other argument.

If we assume that v8..v15 are live and v0 is the mask, that leaves just v16..v23 and v24..v31 for the input operands. I.e., we're going to have to tie an input to an output to get a valid allocation. But neither of the arguments to the MINUS is dead. Clearly we're going to need some reloading.

If we didn't have v8..v15 already live, then we could potentially have had an easy allocation.
I explored that a bit by hacking up cse and using compile-time flags to ensure the copy that was originally between insns 35 and 38 didn't get removed.

I'll write more tomorrow, but the deeper I get into this the more it feels like an LRA issue.

Yeah, it sucks that with LMUL=8 we're going to blow out the register file and reload like crazy. Clearly those who wanted this concept to gang up registers weren't compiler junkies. The code we're going to get for this scenario is going to be awful because we've blown out the register file, but we shouldn't ICE.
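
For anyone following along, the register-pressure argument for insn 35 can be checked with a few lines of arithmetic. This is only a toy model written for illustration, not anything LRA does: the helper names (register_groups, free_groups) are made up, and it assumes 32 vector registers, LMUL=8 groups aligned on multiples of 8, the mask pinned to v0, and v8..v15 live across the insn.

```python
# Toy model of the LMUL=8 register file around insn 35.
# All names here are hypothetical; this is not LRA's algorithm.
VECTOR_REGS = 32
LMUL = 8


def register_groups(lmul=LMUL, total=VECTOR_REGS):
    """Return aligned register groups as (first, last) register numbers."""
    return [(base, base + lmul - 1) for base in range(0, total, lmul)]


def free_groups(groups, mask_reg, live_groups):
    """Drop the group containing the mask register and any group already live."""
    return [
        g for g in groups
        if not (g[0] <= mask_reg <= g[1]) and g not in live_groups
    ]


groups = register_groups()
available = free_groups(groups, mask_reg=0, live_groups=[(8, 15)])

# Insn 35 has one output (pinned to v16..v23) plus two inputs that are
# both still live, so without tying it needs three distinct groups.
groups_needed_without_tying = 3

print(available)                                     # [(16, 23), (24, 31)]
print(len(available) < groups_needed_without_tying)  # True
```

With only two groups left for three operands, the only way out is to tie an input to the output or to reload, which matches the reasoning above.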