Hello Richard:

On 11/06/24 8:59 pm, Richard Sandiford wrote:
> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>> On 11/06/24 7:07 pm, Richard Sandiford wrote:
>>> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>>>> Hello Richard:
>>>> On 11/06/24 6:12 pm, Richard Sandiford wrote:
>>>>> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>>>>>> Hello Richard:
>>>>>>
>>>>>> On 11/06/24 5:15 pm, Richard Sandiford wrote:
>>>>>>> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>>>>>>>> Hello Richard:
>>>>>>>> On 11/06/24 4:56 pm, Ajit Agarwal wrote:
>>>>>>>>> Hello Richard:
>>>>>>>>>
>>>>>>>>> On 11/06/24 4:36 pm, Richard Sandiford wrote:
>>>>>>>>>> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>>>>>>>>>>>>>>> After LRA reload:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (insn 9299 2472 2412 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] [240])
>>>>>>>>>>>>>>>         (mem:V2DF (plus:DI (reg:DI 8 8 [orig:1285 ivtmp.886 ] [1285])
>>>>>>>>>>>>>>>                 (const_int 16 [0x10])) [1 MEM <vector(2) real(kind=8)> [(real(kind=8) *)_4188]+16 S16 A64])) "shell_lam.fppized.f":238:72 1190 {vsx_movv2df_64bit}
>>>>>>>>>>>>>>>      (nil))
>>>>>>>>>>>>>>> (insn 2412 9299 2477 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] [240])
>>>>>>>>>>>>>>>         (neg:V2DF (fma:V2DF (reg:V2DF 39 7 [ MEM <vector(2) real(kind=8)> [(real(kind=8) *)_4050]+16 ])
>>>>>>>>>>>>>>>                 (reg:V2DF 44 12 [3119])
>>>>>>>>>>>>>>>                 (neg:V2DF (reg:V2DF 51 19 [orig:240 vect__302.545 ] [240]))))) {*vsx_nfmsv2df4}
>>>>>>>>>>>>>>>      (nil))
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (insn 2473 9311 9312 187 (set (reg:V2DF 38 6 [orig:905 vect__302.545 ] [905])
>>>>>>>>>>>>>>>         (neg:V2DF (fma:V2DF (reg:V2DF 44 12 [3119])
>>>>>>>>>>>>>>>                 (reg:V2DF 38 6 [orig:2561 MEM <vector(2) real(kind=8)> [(real(kind=8) *)_4050] ] [2561])
>>>>>>>>>>>>>>>                 (neg:V2DF (reg:V2DF 47 15 [5266]))))) {*vsx_nfmsv2df4}
>>>>>>>>>>>>>>>      (nil))
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In the code allocated above, it assigns registers 51 and 47,
>>>>>>>>>>>>>>> and they are not sequential.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The reload for 2412 looks valid.  What was the original
>>>>>>>>>>>>>> pre-reload version of insn 2473?  Also, what happened to
>>>>>>>>>>>>>> insn 2472?  Was it deleted?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is the pre-reload version of 2473:
>>>>>>>>>>>>>
>>>>>>>>>>>>> (insn 2473 2396 2478 161 (set (reg:V2DF 905 [ vect__302.545 ])
>>>>>>>>>>>>>         (neg:V2DF (fma:V2DF (reg:V2DF 4283 [3119])
>>>>>>>>>>>>>                 (subreg:V2DF (reg:OO 2561 [ MEM <vector(2) real(kind=8)> [(real(kind=8) *)_4050] ]) 0)
>>>>>>>>>>>>>                 (neg:V2DF (subreg:V2DF (reg:OO 2572 [ vect__300.543_236 ]) 0))))) {*vsx_nfmsv2df4}
>>>>>>>>>>>>>      (expr_list:REG_DEAD (reg:OO 2572 [ vect__300.543_236 ])
>>>>>>>>>>>>>         (expr_list:REG_DEAD (reg:OO 2561 [ MEM <vector(2) real(kind=8)> [(real(kind=8) *)_4050] ])
>>>>>>>>>>>>>             (nil))))
>>>>>>>>>>>>>
>>>>>>>>>>>>> insn 2472 is replaced with 9299 after reload.
>>>>>>>>>>>>
>>>>>>>>>>>> You'd have to check the dumps to be sure, but I think 9299 is
>>>>>>>>>>>> instead generated as an input reload of 2412, rather than
>>>>>>>>>>>> being a replacement of insn 2472.  T
>>>>>>>>>>>
>>>>>>>>>>> Yes, it is generated for 2412. The predecessor of 2412 is a load
>>>>>>>>>>> from plus offset, as in 2412 we have
>>>>>>>>>>> (subreg:V2DF (reg:OO 2572) 16).
>>>>>>>>>>>
>>>>>>>>>>> This is not correct, as we are not generating lxvp and it is a
>>>>>>>>>>> normal load lxv.
>>>>>>>>>>> As a normal load is generated in the predecessor insn of 2412
>>>>>>>>>>> with a plus constant offset, it breaks the correctness.
>>>>>>>>>>
>>>>>>>>>> Not using lxvp is a deliberate choice though.
>>>>>>>>>>
>>>>>>>>>> If a (reg:OO R) has been spilled, there's no requirement for LRA
>>>>>>>>>> to load both halves of R when only one half is needed.
>>>>>>>>>> LRA just loads what it needs into whichever registers happen
>>>>>>>>>> to be free.
>>>>>>>>>>
>>>>>>>>>> If the reload of R instead used lxvp, LRA would be forced to free
>>>>>>>>>> up another register for the other half of R, even though that value
>>>>>>>>>> would never be used.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If a (reg:OO R) 16 is loaded when it is spilled, then the loaded
>>>>>>>>> value will be from offset 16, whereas it should be loaded from
>>>>>>>>> offset zero. In the load fusion pass we are replacing
>>>>>>>>> (reg:V2DI R) with subreg (reg:OO R) 16, and hence the loaded value
>>>>>>>>> is from offset 16, and that's why it's breaking the correctness.
>>>>>>>>>
>>>>>>>>> Similarly, we are replacing (reg:V2DI R) 16 with subreg (reg:OO R) 0;
>>>>>>>>> the loaded value should be from offset 16, but instead it is
>>>>>>>>> loaded from offset zero, and that's why we are breaking the
>>>>>>>>> correctness.
>>>>>>>>>
>>>>>>>>
>>>>>>>> If a (reg:OO R) 16 is loaded when it is spilled, then the loaded
>>>>>>>> value will be from offset 16, whereas it should be loaded from
>>>>>>>> offset zero. In the load fusion pass we are replacing
>>>>>>>> (reg:V2DI R) with subreg (reg:OO R) 16, and hence the loaded value
>>>>>>>> is from offset 16 when it should be loaded from offset zero.
>>>>>>>> That's why it's breaking the correctness.
>>>>>>>>
>>>>>>>> Similarly, we are replacing (reg:V2DI R) 16 with subreg (reg:OO R) 0;
>>>>>>>> the loaded value should be from offset 16, but instead it is
>>>>>>>> loaded from offset zero, and that's why we are breaking the
>>>>>>>> correctness.
>>>>>>>
>>>>>>> I don't understand, sorry.
>>>>>>> (subreg:V2DI (reg:OO R) 0) is always
>>>>>>>
>>>>>>> (a) the first hard register in (reg:OO R), when the whole of R
>>>>>>>     is stored in hard registers
>>>>>>> (b) at address offset 0 from the start of (reg:OO R), when R is
>>>>>>>     spilled to memory
>>>>>>>
>>>>>>> Similarly, (subreg:V2DI (reg:OO R) 16) is always
>>>>>>>
>>>>>>> (c) the second hard register in (reg:OO R), when the whole of R
>>>>>>>     is stored in hard registers
>>>>>>> (d) at address offset 16 from the start of (reg:OO R), when R is
>>>>>>>     spilled to memory
>>>>>>>
>>>>>>
>>>>>> Yes, but we are replacing the use of the value loaded from offset 16
>>>>>> with subreg (reg:OO R) 0, and similarly we are replacing the use of
>>>>>> the value loaded from offset 0 with subreg (reg:OO R) 16, as we are
>>>>>> swapping the use operand.
>>>>>>
>>>>>> When it is spilled, it's vice versa: subreg (reg:OO R) 16 should be
>>>>>> loaded from offset 0 and subreg (reg:OO R) 0 should be loaded
>>>>>> from offset 16, as we are swapping the use operand.
>>>>>>
>>>>>> This is the semantics of lxvp.
>>>>>
>>>>> Hmm, OK.  Does that mean that:
>>>>>
>>>>>   lxvp A,B
>>>>>
>>>>> loads A+1 from B and A from B+16?  (I couldn't find an online
>>>>> description of the instruction btw -- is there one?)
>>>>>
>>>>
>>>> Yes, that's correct. I couldn't find an online document that
>>>> describes it either.
>>>
>>> Thanks, I think I get it now.
>>>
>>
>> Thanks a lot. What should we be doing about the neg (fma)
>> correctness failures with load fusion?
>
> I think it would involve:
>
> - describing lxvp and stxvp as unspec patterns, as I mentioned
>   in the previous reply
>
> - making plain movoo split loads and stores into individual
>   lxv and stxvs.  (Or, alternatively, it could use lxvp and stxvp,
>   but internally swap the registers after load and before store.)
>   That is, movoo should load the lower-numbered register from the
>   lower address and the higher-numbered register from the higher
>   address, and likewise for stores.
>
Would you mind elaborating on the above?

> - make the fusion pass replace the first load result with
>   (subreg:V2DI (reg:OO R) 16) and the second load result with
>   (subreg:V2DI (reg:OO R) 0), as I think it already does.
>
> Thanks,
> Richard

Thanks & Regards
Ajit
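
P.S. To make the pairing discussed above concrete, here is a small C model of it. It is only a sketch: the lxvp behaviour is taken from the description in this thread (A+1 loaded from B, A from B+16), not from the ISA document, and the names model_lxvp, model_movoo_load, subreg_in_regs and subreg_in_spill are hypothetical helpers invented for illustration, not GCC functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { VEC_BYTES = 16 };

/* One 128-bit vector register, modelled as raw bytes.  */
typedef struct { uint8_t bytes[VEC_BYTES]; } vec128;

/* "lxvp A,B" as described in this thread (an assumption of this sketch):
   register A+1 is loaded from B and register A from B+16.  */
static void model_lxvp(vec128 *regs, int a, const uint8_t *b)
{
    memcpy(regs[a + 1].bytes, b, VEC_BYTES);
    memcpy(regs[a].bytes, b + VEC_BYTES, VEC_BYTES);
}

/* Plain movoo load as suggested above: the lower-numbered register
   comes from the lower address, the higher-numbered from the higher.  */
static void model_movoo_load(vec128 *regs, int a, const uint8_t *b)
{
    memcpy(regs[a].bytes, b, VEC_BYTES);
    memcpy(regs[a + 1].bytes, b + VEC_BYTES, VEC_BYTES);
}

/* (subreg:V2DI (reg:OO R) off) when R lives in hard registers a, a+1:
   offset 0 is the first register, offset 16 the second.  */
static const vec128 *subreg_in_regs(const vec128 *regs, int a, int off)
{
    assert(off == 0 || off == VEC_BYTES);
    return &regs[a + off / VEC_BYTES];
}

/* The same subreg when R is spilled: plain address offset into the slot.  */
static const uint8_t *subreg_in_spill(const uint8_t *slot, int off)
{
    assert(off == 0 || off == VEC_BYTES);
    return slot + off;
}
```

In this model, after model_lxvp the value from the lower address sits in the second register, so the first load result corresponds to subreg offset 16 and the second to subreg offset 0, as in the last bullet. After model_movoo_load, subreg_in_regs and subreg_in_spill agree for both offsets, which is the property a single-half reload with lxv relies on.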