Hello Richard: On 10/06/24 3:20 pm, Richard Sandiford wrote: > Ajit Agarwal <aagar...@linux.ibm.com> writes: >> Hello Richard: >> >> On 10/06/24 2:52 pm, Richard Sandiford wrote: >>> Ajit Agarwal <aagar...@linux.ibm.com> writes: >>>> On 10/06/24 2:12 pm, Richard Sandiford wrote: >>>>> Ajit Agarwal <aagar...@linux.ibm.com> writes: >>>>>>>>>>>>>> + >>>>>>>>>>>>>> + rtx set = single_set (insn); >>>>>>>>>>>>>> + if (set == NULL_RTX) >>>>>>>>>>>>>> + return false; >>>>>>>>>>>>>> + >>>>>>>>>>>>>> + rtx op0 = SET_SRC (set); >>>>>>>>>>>>>> + rtx_code code = GET_CODE (op0); >>>>>>>>>>>>>> + >>>>>>>>>>>>>> + // This check is added as register pairs are not >>>>>>>>>>>>>> generated >>>>>>>>>>>>>> + // by RA for neg:V2DF (fma: V2DF (reg1) >>>>>>>>>>>>>> + // (reg2) >>>>>>>>>>>>>> + // (neg:V2DF (reg3))) >>>>>>>>>>>>>> + if (GET_RTX_CLASS (code) == RTX_UNARY) >>>>>>>>>>>>>> + return false; >>>>>>>>>>>>> >>>>>>>>>>>>> What's special about (neg (fma ...))? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I am not sure why register allocator fails allocating register >>>>>>>>>>>> pairs with >>>>>>>>>>>> NEG Unary operation with fma operand. I have not debugged register >>>>>>>>>>>> allocator why the NEG >>>>>>>>>>>> Unary operation with fma operand. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> For neg (fma ...) cases because of subreg 128 bits from OOmode 256 >>>>>>>>>> bits are >>>>>>>>>> set correctly. >>>>>>>>>> IRA marked them spill candidates as spill priority is zero. >>>>>>>>>> >>>>>>>>>> Due to this LRA reload pass couldn't allocate register pairs. >>>>>>>>> >>>>>>>>> I think this is just restating the symptom though. I suppose the same >>>>>>>>> kind of questions apply here too: what was the instruction before the >>>>>>>>> pass runs, what was the instruction after the pass runs, and why is >>>>>>>>> the rtl change incorrect (by the meaning above)? >>>>>>>>> >>>>>>>> >>>>>>>> Original case where we dont do load fusion, spill happens, in that >>>>>>>> case we dont require sequential register pairs to be generated for 2 >>>>>>>> loads >>>>>>>> for. Hence it worked. >>>>>>>> >>>>>>>> rtl change is correct and there is no error. >>>>>>>> >>>>>>>> for load fusion spill happens and we dont generate sequential register >>>>>>>> pairs >>>>>>>> because pf spill candidate and lxvp gives incorrect results as >>>>>>>> sequential register >>>>>>>> pairs are required for lxvp. >>>>>>> >>>>>>> Can you go into more detail? How is the lxvp represented? And how do >>>>>>> we end up not getting a sequential register pair? What does the rtl >>>>>>> look like (before and after things have gone wrong)? >>>>>>> >>>>>>> It seems like either the rtl is not describing the result of the fusion >>>>>>> correctly or there is some problem in the .md description of lxvp. >>>>>>> >>>>>> >>>>>> After fusion pass: >>>>>> >>>>>> (insn 9299 2472 2412 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] >>>>>> [240]) >>>>>> (mem:V2DF (plus:DI (reg:DI 8 8 [orig:1285 ivtmp.886 ] [1285]) >>>>>> (const_int 16 [0x10])) [1 MEM <vector(2) real(kind=8)> >>>>>> [(real(kind=8) *)_4188]+16 S16 A64])) "shell_lam.fppized.f":238:72 1190 >>>>>> {vsx_movv2df_64bit} >>>>>> (nil)) >>>>>> (insn 2412 9299 2477 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] >>>>>> [240]) >>>>>> (neg:V2DF (fma:V2DF (reg:V2DF 39 7 [ MEM <vector(2) >>>>>> real(kind=8)> [(real(kind=8) *)_4050]+16 ]) >>>>>> (reg:V2DF 44 12 [3119]) >>>>>> (neg:V2DF (reg:V2DF 51 19 [orig:240 vect__302.545 ] >>>>>> [240]))))) {*vsx_nfmsv2df4} >>>>>> (nil)) >>>>>> >>>>>> In LRA reload. >>>>>> >>>>>> (insn 2472 2461 2412 161 (set (reg:OO 2572 [ vect__300.543_236 ]) >>>>>> (mem:OO (reg:DI 4260 [orig:1285 ivtmp.886 ] [1285]) [1 MEM >>>>>> <vector(2) real(kind=8)> [(real(kind=8) *)_4188]+0 S16 A64])) >>>>>> "shell_lam.fppized.f":238:72 2187 {*movoo} >>>>>> (expr_list:REG_EQUIV (mem:OO (reg:DI 4260 [orig:1285 ivtmp.886 ] >>>>>> [1285]) [1 MEM <vector(2) real(kind=8)> [(real(kind=8) *)_4188]+0 S16 >>>>>> A64]) >>>>>> (nil))) >>>>>> (insn 2412 2472 2477 161 (set (reg:V2DF 240 [ vect__302.545 ]) >>>>>> (neg:V2DF (fma:V2DF (subreg:V2DF (reg:OO 2561 [ MEM <vector(2) >>>>>> real(kind=8)> [(real(kind=8) *)_4050] ]) 16) >>>>>> (reg:V2DF 4283 [3119]) >>>>>> (neg:V2DF (subreg:V2DF (reg:OO 2572 [ vect__300.543_236 >>>>>> ]) 16))))) {*vsx_nfmsv2df4} >>>>>> (nil)) >>>>>> >>>>>> >>>>>> In LRA reload sequential registers are not generated as r2572 is splled >>>>>> and move to spill location >>>>>> in stack and subsequent uses loads from stack. Hence sequential >>>>>> registers pairs are not generated. >>>>>> >>>>>> lxvp vsx0, 0(r1). >>>>>> >>>>>> It loads from from r1+0 into vsx0 and vsx1 and appropriate uses use >>>>>> sequential register pairs. >>>>>> >>>>>> Without load fusion since 2 loads exists and 2 loads need not require >>>>>> sequential registers >>>>>> hence it worked but with load fusion and using lxvp it requires >>>>>> sequential register pairs. >>>>> >>>>> Do you mean that this is a performance regression? I.e. the fact that >>>>> lxvp requires sequential registers causes extra spilling, due to having >>>>> less allocation freedom? >>>>> >>>>> Or is it a correctness problem? If so, what is it? Nothing in the rtl >>>>> above looks wrong in principle (although I've no idea if the REG_EQUIV >>>>> is correct in this context). What does the allocated code look like, >>>>> and why is it wrong? >>>>> >>>>> If (reg:OO 2561) is spilled and then one half of it used, only that half >>>>> needs to be loaded from the spill slot. E.g. if (reg:OO 2561) is reloaded >>>>> for insn 2412 on its own, only the second half of the register needs to be >>>>> loaded from memory. >>>>> >>>> >>>> This is bwaves spec 2017 benchmark. Spill happening in register allocator >>>> could be because of less registers available in order to generate >>>> sequential registers for lxvp. >>>> >>>> Because of spill sequential registers are not generated and breaks the >>>> correctness. REG_EQUIV is generated by IRA. >>>> >>>> Allocated code because of spill doesn't generate sequential registers. >>>> In LRA reload because of spill marked for IRA adjust to another >>>> register causing not to generate sequential registers. >>>> >>>> reg:OO 2572 is spilled not reg:OO 2561. Because of spilled its >>>> loaded from memory instead of generating sequential registers. >>>> >>>> Other reasons for spilling because of long live range for reg:OO 2572. >>>> >>>> We can add heuristics in fusion code not to fuse for longer live >>>> ranges that should solve the problem. >>> >>> This doesn't describe the real problem though. It's natural for >>> registers to be spilled sometimes. That would happen for OOmode >>> registers even if we didn't form them in the fusion pass. And it's >>> normal GCC semantics that the two hard registers allocated to an OOmode >>> pseudo are consecutive. >>> >>> Like I asked above, please show the allocated code (including spills >>> and reloads) and explain why it's wrong. >>> >> >> After LRA reload: >> >> (insn 9299 2472 2412 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] >> [240]) >> (mem:V2DF (plus:DI (reg:DI 8 8 [orig:1285 ivtmp.886 ] [1285]) >> (const_int 16 [0x10])) [1 MEM <vector(2) real(kind=8)> >> [(real(kind=8) *)_4188]+16 S16 A64])) "shell_lam.fppized.f":238:72 1190 >> {vsx_movv2df_64bit} >> (nil)) >> (insn 2412 9299 2477 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] >> [240]) >> (neg:V2DF (fma:V2DF (reg:V2DF 39 7 [ MEM <vector(2) real(kind=8)> >> [(real(kind=8) *)_4050]+16 ]) >> (reg:V2DF 44 12 [3119]) >> (neg:V2DF (reg:V2DF 51 19 [orig:240 vect__302.545 ] >> [240]))))) {*vsx_nfmsv2df4} >> (nil)) >> >> (insn 2473 9311 9312 187 (set (reg:V2DF 38 6 [orig:905 vect__302.545 ] [905]) >> (neg:V2DF (fma:V2DF (reg:V2DF 44 12 [3119]) >> (reg:V2DF 38 6 [orig:2561 MEM <vector(2) real(kind=8)> >> [(real(kind=8) *)_4050] ] [2561]) >> (neg:V2DF (reg:V2DF 47 15 [5266]))))) {*vsx_nfmsv2df4} >> (nil)) >> >> In the above allocated code it assign registers 51 and 47 and they are not >> sequential. > > The reload for 2412 looks valid. What was the original pre-reload > version of insn 2473? Also, what happened to insn 2472? Was it deleted? >
This is preload version of 2473: (insn 2473 2396 2478 161 (set (reg:V2DF 905 [ vect__302.545 ]) (neg:V2DF (fma:V2DF (reg:V2DF 4283 [3119]) (subreg:V2DF (reg:OO 2561 [ MEM <vector(2) real(kind=8)> [(real(kind=8) *)_4050] ]) 0) (neg:V2DF (subreg:V2DF (reg:OO 2572 [ vect__300.543_236 ]) 0))))) {*vsx_nfmsv2df4} (expr_list:REG_DEAD (reg:OO 2572 [ vect__300.543_236 ]) (expr_list:REG_DEAD (reg:OO 2561 [ MEM <vector(2) real(kind=8)> [(real(kind=8) *)_4050] ]) (nil)))) insn 2472 is replaced with 9299 after reload. > Richard Thanks & Regards Ajit