Hello Richard: On 11/06/24 7:07 pm, Richard Sandiford wrote: > Ajit Agarwal <aagar...@linux.ibm.com> writes: >> Hello Richard: >> On 11/06/24 6:12 pm, Richard Sandiford wrote: >>> Ajit Agarwal <aagar...@linux.ibm.com> writes: >>>> Hello Richard: >>>> >>>> On 11/06/24 5:15 pm, Richard Sandiford wrote: >>>>> Ajit Agarwal <aagar...@linux.ibm.com> writes: >>>>>> Hello Richard: >>>>>> On 11/06/24 4:56 pm, Ajit Agarwal wrote: >>>>>>> Hello Richard: >>>>>>> >>>>>>> On 11/06/24 4:36 pm, Richard Sandiford wrote: >>>>>>>> Ajit Agarwal <aagar...@linux.ibm.com> writes: >>>>>>>>>>>>> After LRA reload: >>>>>>>>>>>>> >>>>>>>>>>>>> (insn 9299 2472 2412 187 (set (reg:V2DF 51 19 [orig:240 >>>>>>>>>>>>> vect__302.545 ] [240]) >>>>>>>>>>>>> (mem:V2DF (plus:DI (reg:DI 8 8 [orig:1285 ivtmp.886 ] >>>>>>>>>>>>> [1285]) >>>>>>>>>>>>> (const_int 16 [0x10])) [1 MEM <vector(2) >>>>>>>>>>>>> real(kind=8)> [(real(kind=8) *)_4188]+16 S16 A64])) >>>>>>>>>>>>> "shell_lam.fppized.f":238:72 1190 {vsx_movv2df_64bit} >>>>>>>>>>>>> (nil)) >>>>>>>>>>>>> (insn 2412 9299 2477 187 (set (reg:V2DF 51 19 [orig:240 >>>>>>>>>>>>> vect__302.545 ] [240]) >>>>>>>>>>>>> (neg:V2DF (fma:V2DF (reg:V2DF 39 7 [ MEM <vector(2) >>>>>>>>>>>>> real(kind=8)> [(real(kind=8) *)_4050]+16 ]) >>>>>>>>>>>>> (reg:V2DF 44 12 [3119]) >>>>>>>>>>>>> (neg:V2DF (reg:V2DF 51 19 [orig:240 vect__302.545 >>>>>>>>>>>>> ] [240]))))) {*vsx_nfmsv2df4} >>>>>>>>>>>>> (nil)) >>>>>>>>>>>>> >>>>>>>>>>>>> (insn 2473 9311 9312 187 (set (reg:V2DF 38 6 [orig:905 >>>>>>>>>>>>> vect__302.545 ] [905]) >>>>>>>>>>>>> (neg:V2DF (fma:V2DF (reg:V2DF 44 12 [3119]) >>>>>>>>>>>>> (reg:V2DF 38 6 [orig:2561 MEM <vector(2) >>>>>>>>>>>>> real(kind=8)> [(real(kind=8) *)_4050] ] [2561]) >>>>>>>>>>>>> (neg:V2DF (reg:V2DF 47 15 [5266]))))) >>>>>>>>>>>>> {*vsx_nfmsv2df4} >>>>>>>>>>>>> (nil)) >>>>>>>>>>>>> >>>>>>>>>>>>> In the above allocated code it assign registers 51 and 47 and >>>>>>>>>>>>> they are not sequential. >>>>>>>>>>>> >>>>>>>>>>>> The reload for 2412 looks valid. What was the original pre-reload >>>>>>>>>>>> version of insn 2473? Also, what happened to insn 2472? Was it >>>>>>>>>>>> deleted? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> This is preload version of 2473: >>>>>>>>>>> >>>>>>>>>>> (insn 2473 2396 2478 161 (set (reg:V2DF 905 [ vect__302.545 ]) >>>>>>>>>>> (neg:V2DF (fma:V2DF (reg:V2DF 4283 [3119]) >>>>>>>>>>> (subreg:V2DF (reg:OO 2561 [ MEM <vector(2) >>>>>>>>>>> real(kind=8)> [(real(kind=8) *)_4050] ]) 0) >>>>>>>>>>> (neg:V2DF (subreg:V2DF (reg:OO 2572 [ >>>>>>>>>>> vect__300.543_236 ]) 0))))) {*vsx_nfmsv2df4} >>>>>>>>>>> (expr_list:REG_DEAD (reg:OO 2572 [ vect__300.543_236 ]) >>>>>>>>>>> (expr_list:REG_DEAD (reg:OO 2561 [ MEM <vector(2) >>>>>>>>>>> real(kind=8)> [(real(kind=8) *)_4050] ]) >>>>>>>>>>> (nil)))) >>>>>>>>>>> >>>>>>>>>>> insn 2472 is replaced with 9299 after reload. >>>>>>>>>> >>>>>>>>>> You'd have to check the dumps to be sure, but I think 9299 is instead >>>>>>>>>> generated as an input reload of 2412, rather than being a replacement >>>>>>>>>> of insn 2472. T >>>>>>>>> >>>>>>>>> Yes it is generated for 2412. The predecessor of 2412 is load from >>>>>>>>> plus offset as in 2412 we have subreg:V2DF (reg OO 2572) 16). >>>>>>>>> >>>>>>>>> This is not correct as we are not generating lxvp and it is >>>>>>>>> normal load lxv. >>>>>>>>> As normal load is generated in predecessor insn of 2412 with >>>>>>>>> plus constant offset it breaks the correctness. >>>>>>>> >>>>>>>> Not using lxvp is a deliberate choice though. >>>>>>>> >>>>>>>> If a (reg:OO R) has been spilled, there's no requirement for LRA >>>>>>>> to load both halves of R when only one half is needed. LRA just >>>>>>>> loads what it needs into whichever registers happen to be free. >>>>>>>> >>>>>>>> If the reload of R instead used lxvp, LRA would be forced to free >>>>>>>> up another register for the other half of R, even though that value >>>>>>>> would never be used. >>>>>>>> >>>>>>> >>>>>>> If a (reg:OO R ) 16 is loaded when it is spilled then loaded value >>>>>>> will be from plus offset 16 instead it should be loaded value >>>>>>> from zero offset. As in load fusion pass we are replacing >>>>>>> (reg:V2DI R) with subreg (reg:OO R) 16 and hence loaded value >>>>>>> is from plus 16 offsets and thats why its breaking the correctness. >>>>>>> >>>>>>> Similarly we are replacing (reg:V2DI R) 16 with subreg (reg:OO R) 0 >>>>>>> and loaded value is from 16 offset instead its loading from zero >>>>>>> offset and thats why we are breaking the correctness. >>>>>>> >>>>>> >>>>>> If a (reg:OO R ) 16 is loaded when it is spilled then loaded value >>>>>> will be from plus offset 16 instead it should be loaded value >>>>>> from zero offset. As in load fusion pass we are replacing >>>>>> (reg:V2DI R) with subreg (reg:OO R) 16 and hence loaded value >>>>>> is from plus 16 offsets instead it should load from zero offset. >>>>>> Thats why its breaking the correctness. >>>>>> >>>>>> Similarly we are replacing (reg:V2DI R) 16 with subreg (reg:OO R) 0 >>>>>> and loaded value is from 16 offset instead its loading from zero >>>>>> offset and thats why we are breaking the correctness. >>>>> >>>>> I don't understand, sorry. (subreg:V2DI (reg:OO R) 0) is always >>>>> >>>>> (a) the first hard register in (reg:OO R), when the whole of R >>>>> is stored in hard registers >>>>> (b) at address offset 0 from the start of (reg:OO R), when R is >>>>> spilled to memory >>>>> >>>>> Similarly, (subreg:V2DI (reg:OO R) 16) is always >>>>> >>>>> (c) the second hard register in (reg:OO R), when the whole of R >>>>> is stored in hard registers >>>>> (d) at address offset 16 from the start of (reg:OO R), when R is >>>>> spilled to memory >>>>> >>>> >>>> Yes but we are replacing use of loaded value from plus 16 offset >>>> with subreg (reg OO ) 0 and similarly we are replacing use of loaded >>>> value from 0 offset with subreg (reg OO ) 16 as we are swapping >>>> the use operand. >>>> >>>> When it is spilled its vice versa subreg (reg OO ) 16 should be >>>> loaded from 0 offset and subreg (reg OO) 0 should be loaded >>>> from 16 offset as we are swapping the use operand. >>>> >>>> This is the semantics of lxvp. >>> >>> Hmm, OK. Does that mean that: >>> >>> lxvp A,B >>> >>> loads A+1 from B and A from B+16? (I couldn't find an online >>> description of the instruction btw -- is there one?) >>> >> >> Yes thats correct. Even I didn't find online document that >> describes the same. > > Thanks, I think I get it now. >
Thanks a lot. Can I know what should we be doing with neg (fma) correctness failures with load fusion. >>> If lxvp does behave like that, it shouldn't be described as a normal move: >>> >>> [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,ZwO,wa") >>> (match_operand:OO 1 "input_operand" "ZwO,wa,wa"))] >>> >>> since the rules I listed above are target-independent. >>> >> >> Rules for load for lxvp should as normal, but how the use >> of lxvp defines should be different for lxvp than normal >> load. > > I don't think that's true though. Because of the rules above, > GCC expects that: > > (set (reg:OO R) (mem:OO ADDR)) > > will load ADDR+0 into R and ADDR+16 into R+1. Anything else will > cause exactly the kind of mix-up that you're seeing here. > > In the worst case, the load should be described as: > > (set (reg:OO R) > (unspec:OO [(mem:OO ADDR)] UNSPEC_LXVP)) > > although there are other alternatives that could be used if OOmode > weren't defined as an OPAQUE_MODE. > If not described as above breaks the correctness. >>> Alternatively, if lxvp is described as a normal move, >>> with the 128-bit pieces swapped compared to GCC's normal order, >>> TARGET_CAN_CHANGE_MODE_CLASS must prevent subregs that select >>> one half of the register (or less). But that would defeat exactly >>> the optimisation that you're trying to do. >>> >> >> Sorry, I didn't get how TARGET_CAN_CHANGE_MODE_CLASS prevents >> subreg that selects one half of the registers. > > It doesn't prevent the subregs at such, but it stops the mode > change from happening in certain registers. > > For example, if we have (subreg:V2DI (reg:OO R) 0), and if > TARGET_CAN_CHANGE_MODE_CLASS disallows changes between V2DI > and OO in all register classes that can hold OO, then the register > allocator would handle the subreg by spilling register R to memory > and loading a V2DI piece of it from there. > > This would allow the port to store OOmode in an unusual order. > But like I say, it would also defeat exactly what you're trying > to do here. > Sure. Thanks. > Thanks, > Richard Thanks & Regards Ajit