Ajit Agarwal <aagar...@linux.ibm.com> writes: >>>>>>>>> + >>>>>>>>> + rtx set = single_set (insn); >>>>>>>>> + if (set == NULL_RTX) >>>>>>>>> + return false; >>>>>>>>> + >>>>>>>>> + rtx op0 = SET_SRC (set); >>>>>>>>> + rtx_code code = GET_CODE (op0); >>>>>>>>> + >>>>>>>>> + // This check is added as register pairs are not generated >>>>>>>>> + // by RA for neg:V2DF (fma: V2DF (reg1) >>>>>>>>> + // (reg2) >>>>>>>>> + // (neg:V2DF (reg3))) >>>>>>>>> + if (GET_RTX_CLASS (code) == RTX_UNARY) >>>>>>>>> + return false; >>>>>>>> >>>>>>>> What's special about (neg (fma ...))? >>>>>>>> >>>>>>> >>>>>>> I am not sure why register allocator fails allocating register pairs >>>>>>> with >>>>>>> NEG Unary operation with fma operand. I have not debugged register >>>>>>> allocator why the NEG >>>>>>> Unary operation with fma operand. >>>>>> >>>>> >>>>> For neg (fma ...) cases because of subreg 128 bits from OOmode 256 bits >>>>> are >>>>> set correctly. >>>>> IRA marked them spill candidates as spill priority is zero. >>>>> >>>>> Due to this LRA reload pass couldn't allocate register pairs. >>>> >>>> I think this is just restating the symptom though. I suppose the same >>>> kind of questions apply here too: what was the instruction before the >>>> pass runs, what was the instruction after the pass runs, and why is >>>> the rtl change incorrect (by the meaning above)? >>>> >>> >>> Original case where we dont do load fusion, spill happens, in that >>> case we dont require sequential register pairs to be generated for 2 loads >>> for. Hence it worked. >>> >>> rtl change is correct and there is no error. >>> >>> for load fusion spill happens and we dont generate sequential register pairs >>> because pf spill candidate and lxvp gives incorrect results as sequential >>> register >>> pairs are required for lxvp. >> >> Can you go into more detail? How is the lxvp represented? And how do >> we end up not getting a sequential register pair? What does the rtl >> look like (before and after things have gone wrong)? >> >> It seems like either the rtl is not describing the result of the fusion >> correctly or there is some problem in the .md description of lxvp. >> > > After fusion pass: > > (insn 9299 2472 2412 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] [240]) > (mem:V2DF (plus:DI (reg:DI 8 8 [orig:1285 ivtmp.886 ] [1285]) > (const_int 16 [0x10])) [1 MEM <vector(2) real(kind=8)> > [(real(kind=8) *)_4188]+16 S16 A64])) "shell_lam.fppized.f":238:72 1190 > {vsx_movv2df_64bit} > (nil)) > (insn 2412 9299 2477 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] [240]) > (neg:V2DF (fma:V2DF (reg:V2DF 39 7 [ MEM <vector(2) real(kind=8)> > [(real(kind=8) *)_4050]+16 ]) > (reg:V2DF 44 12 [3119]) > (neg:V2DF (reg:V2DF 51 19 [orig:240 vect__302.545 ] > [240]))))) {*vsx_nfmsv2df4} > (nil)) > > In LRA reload. > > (insn 2472 2461 2412 161 (set (reg:OO 2572 [ vect__300.543_236 ]) > (mem:OO (reg:DI 4260 [orig:1285 ivtmp.886 ] [1285]) [1 MEM <vector(2) > real(kind=8)> [(real(kind=8) *)_4188]+0 S16 A64])) > "shell_lam.fppized.f":238:72 2187 {*movoo} > (expr_list:REG_EQUIV (mem:OO (reg:DI 4260 [orig:1285 ivtmp.886 ] [1285]) > [1 MEM <vector(2) real(kind=8)> [(real(kind=8) *)_4188]+0 S16 A64]) > (nil))) > (insn 2412 2472 2477 161 (set (reg:V2DF 240 [ vect__302.545 ]) > (neg:V2DF (fma:V2DF (subreg:V2DF (reg:OO 2561 [ MEM <vector(2) > real(kind=8)> [(real(kind=8) *)_4050] ]) 16) > (reg:V2DF 4283 [3119]) > (neg:V2DF (subreg:V2DF (reg:OO 2572 [ vect__300.543_236 ]) > 16))))) {*vsx_nfmsv2df4} > (nil)) > > > In LRA reload sequential registers are not generated as r2572 is splled and > move to spill location > in stack and subsequent uses loads from stack. Hence sequential registers > pairs are not generated. > > lxvp vsx0, 0(r1). > > It loads from from r1+0 into vsx0 and vsx1 and appropriate uses use > sequential register pairs. > > Without load fusion since 2 loads exists and 2 loads need not require > sequential registers > hence it worked but with load fusion and using lxvp it requires sequential > register pairs.
Do you mean that this is a performance regression? I.e. the fact that lxvp requires sequential registers causes extra spilling, due to having less allocation freedom? Or is it a correctness problem? If so, what is it? Nothing in the rtl above looks wrong in principle (although I've no idea if the REG_EQUIV is correct in this context). What does the allocated code look like, and why is it wrong? If (reg:OO 2561) is spilled and then one half of it used, only that half needs to be loaded from the spill slot. E.g. if (reg:OO 2561) is reloaded for insn 2412 on its own, only the second half of the register needs to be loaded from memory. Richard