Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

Richard Sandiford Tue, 11 Jun 2024 08:29:14 -0700

Ajit Agarwal <aagar...@linux.ibm.com> writes:
> On 11/06/24 7:07 pm, Richard Sandiford wrote:
>> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>>> Hello Richard:
>>> On 11/06/24 6:12 pm, Richard Sandiford wrote:
>>>> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>>>>> Hello Richard:
>>>>>
>>>>> On 11/06/24 5:15 pm, Richard Sandiford wrote:
>>>>>> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>>>>>>> Hello Richard:
>>>>>>> On 11/06/24 4:56 pm, Ajit Agarwal wrote:
>>>>>>>> Hello Richard:
>>>>>>>>
>>>>>>>> On 11/06/24 4:36 pm, Richard Sandiford wrote:
>>>>>>>>> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>>>>>>>>>>>>>> After LRA reload:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (insn 9299 2472 2412 187 (set (reg:V2DF 51 19 [orig:240 
>>>>>>>>>>>>>> vect__302.545 ] [240])
>>>>>>>>>>>>>>         (mem:V2DF (plus:DI (reg:DI 8 8 [orig:1285 ivtmp.886 ] 
>>>>>>>>>>>>>> [1285])
>>>>>>>>>>>>>>                 (const_int 16 [0x10])) [1 MEM <vector(2) 
>>>>>>>>>>>>>> real(kind=8)> [(real(kind=8) *)_4188]+16 S16 A64])) 
>>>>>>>>>>>>>> "shell_lam.fppized.f":238:72 1190 {vsx_movv2df_64bit}
>>>>>>>>>>>>>>      (nil))
>>>>>>>>>>>>>> (insn 2412 9299 2477 187 (set (reg:V2DF 51 19 [orig:240 
>>>>>>>>>>>>>> vect__302.545 ] [240])
>>>>>>>>>>>>>>         (neg:V2DF (fma:V2DF (reg:V2DF 39 7 [ MEM <vector(2) 
>>>>>>>>>>>>>> real(kind=8)> [(real(kind=8) *)_4050]+16 ])
>>>>>>>>>>>>>>                 (reg:V2DF 44 12 [3119])
>>>>>>>>>>>>>>                 (neg:V2DF (reg:V2DF 51 19 [orig:240 
>>>>>>>>>>>>>> vect__302.545 ] [240]))))) {*vsx_nfmsv2df4}
>>>>>>>>>>>>>>      (nil))
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (insn 2473 9311 9312 187 (set (reg:V2DF 38 6 [orig:905 
>>>>>>>>>>>>>> vect__302.545 ] [905])
>>>>>>>>>>>>>>         (neg:V2DF (fma:V2DF (reg:V2DF 44 12 [3119])
>>>>>>>>>>>>>>                 (reg:V2DF 38 6 [orig:2561 MEM <vector(2) 
>>>>>>>>>>>>>> real(kind=8)> [(real(kind=8) *)_4050] ] [2561])
>>>>>>>>>>>>>>                 (neg:V2DF (reg:V2DF 47 15 [5266]))))) 
>>>>>>>>>>>>>> {*vsx_nfmsv2df4}
>>>>>>>>>>>>>>      (nil))
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In the above allocated code it assigns registers 51 and 47, and 
>>>>>>>>>>>>>> they are not sequential.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The reload for 2412 looks valid.  What was the original pre-reload
>>>>>>>>>>>>> version of insn 2473?  Also, what happened to insn 2472?  Was it 
>>>>>>>>>>>>> deleted?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> This is the pre-reload version of 2473:
>>>>>>>>>>>>
>>>>>>>>>>>> (insn 2473 2396 2478 161 (set (reg:V2DF 905 [ vect__302.545 ])
>>>>>>>>>>>>         (neg:V2DF (fma:V2DF (reg:V2DF 4283 [3119])
>>>>>>>>>>>>                 (subreg:V2DF (reg:OO 2561 [ MEM <vector(2) 
>>>>>>>>>>>> real(kind=8)> [(real(kind=8) *)_4050] ]) 0)
>>>>>>>>>>>>                 (neg:V2DF (subreg:V2DF (reg:OO 2572 [ 
>>>>>>>>>>>> vect__300.543_236 ]) 0))))) {*vsx_nfmsv2df4}
>>>>>>>>>>>>      (expr_list:REG_DEAD (reg:OO 2572 [ vect__300.543_236 ])
>>>>>>>>>>>>         (expr_list:REG_DEAD (reg:OO 2561 [ MEM <vector(2) 
>>>>>>>>>>>> real(kind=8)> [(real(kind=8) *)_4050] ])
>>>>>>>>>>>>             (nil))))
>>>>>>>>>>>>
>>>>>>>>>>>> insn 2472 is replaced with 9299 after reload.
>>>>>>>>>>>
>>>>>>>>>>> You'd have to check the dumps to be sure, but I think 9299 is instead
>>>>>>>>>>> generated as an input reload of 2412, rather than being a replacement
>>>>>>>>>>> of insn 2472.  T
>>>>>>>>>>
>>>>>>>>>> Yes, it is generated for 2412. The predecessor of 2412 is a load from
>>>>>>>>>> a plus offset, since in 2412 we have (subreg:V2DF (reg:OO 2572) 16).
>>>>>>>>>>
>>>>>>>>>> This is not correct, as we are not generating lxvp; it is a
>>>>>>>>>> normal lxv load.
>>>>>>>>>> Because a normal load with a plus constant offset is generated in the
>>>>>>>>>> predecessor insn of 2412, it breaks correctness.
>>>>>>>>>
>>>>>>>>> Not using lxvp is a deliberate choice though.
>>>>>>>>>
>>>>>>>>> If a (reg:OO R) has been spilled, there's no requirement for LRA
>>>>>>>>> to load both halves of R when only one half is needed.  LRA just
>>>>>>>>> loads what it needs into whichever registers happen to be free.
>>>>>>>>>
>>>>>>>>> If the reload of R instead used lxvp, LRA would be forced to free
>>>>>>>>> up another register for the other half of R, even though that value
>>>>>>>>> would never be used.
>>>>>>>>>
>>>>>>>>
>>>>>>>> If (subreg (reg:OO R) 16) is loaded after R is spilled, the loaded
>>>>>>>> value comes from offset 16, whereas it should come from
>>>>>>>> offset zero. In the load fusion pass we are replacing
>>>>>>>> (reg:V2DI R) with (subreg (reg:OO R) 16), and hence the value is
>>>>>>>> loaded from offset 16; that's why it breaks correctness.
>>>>>>>>
>>>>>>>> Similarly, we are replacing (reg:V2DI R) 16 with (subreg (reg:OO R) 0),
>>>>>>>> and the value should come from offset 16 but is instead loaded from
>>>>>>>> offset zero; that's why we are breaking correctness.
>>>>>>>>
>>>>>>>
>>>>>>> If (subreg (reg:OO R) 16) is loaded after R is spilled, the loaded
>>>>>>> value comes from offset 16, whereas it should come from
>>>>>>> offset zero. In the load fusion pass we are replacing
>>>>>>> (reg:V2DI R) with (subreg (reg:OO R) 16), and hence the value is
>>>>>>> loaded from offset 16 when it should be loaded from offset zero.
>>>>>>> That's why it breaks correctness.
>>>>>>>  
>>>>>>> Similarly, we are replacing (reg:V2DI R) 16 with (subreg (reg:OO R) 0),
>>>>>>> and the value should come from offset 16 but is instead loaded from
>>>>>>> offset zero; that's why we are breaking correctness.
>>>>>>
>>>>>> I don't understand, sorry.  (subreg:V2DI (reg:OO R) 0) is always
>>>>>>
>>>>>> (a) the first hard register in (reg:OO R), when the whole of R
>>>>>>     is stored in hard registers
>>>>>> (b) at address offset 0 from the start of (reg:OO R), when R is
>>>>>>     spilled to memory
>>>>>>
>>>>>> Similarly, (subreg:V2DI (reg:OO R) 16) is always
>>>>>>
>>>>>> (c) the second hard register in (reg:OO R), when the whole of R
>>>>>>     is stored in hard registers
>>>>>> (d) at address offset 16 from the start of (reg:OO R), when R is
>>>>>>     spilled to memory
>>>>>>
>>>>>
>>>>> Yes, but we are replacing the use of the value loaded from offset 16
>>>>> with (subreg (reg:OO) 0), and similarly we are replacing the use of the
>>>>> value loaded from offset 0 with (subreg (reg:OO) 16), as we are swapping
>>>>> the use operands.
>>>>>
>>>>> When it is spilled it is vice versa: (subreg (reg:OO) 16) should be
>>>>> loaded from offset 0 and (subreg (reg:OO) 0) should be loaded
>>>>> from offset 16, as we are swapping the use operands.
>>>>>
>>>>> These are the semantics of lxvp.
>>>>
>>>> Hmm, OK.  Does that mean that:
>>>>
>>>>    lxvp A,B
>>>>
>>>> loads A+1 from B and A from B+16?  (I couldn't find an online
>>>> description of the instruction btw -- is there one?)
>>>>
>>>
>>> Yes, that's correct. I couldn't find an online document that
>>> describes it either.
>> 
>> Thanks, I think I get it now.
>> 
>
> Thanks a lot. What should we do about the neg (fma)
> correctness failures with load fusion?


I think it would involve:

- describing lxvp and stxvp as unspec patterns, as I mentioned
  in the previous reply

- making plain movoo split loads and stores into individual
  lxvs and stxvs.  (Or, alternatively, it could use lxvp and stxvp,
  but internally swap the registers after load and before store.)
  That is, movoo should load the lower-numbered register from the
  lower address and the higher-numbered register from the higher
  address, and likewise for stores.

- making the fusion pass replace the first load result with
  (subreg:V2DI (reg:OO R) 16) and the second load result with
  (subreg:V2DI (reg:OO R) 0), as I think it already does.
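
The three points above can be sketched with a toy Python model. This is
not GCC code: registers and memory are plain dictionaries, every function
name is hypothetical, and the lxvp behaviour is only the one described
earlier in this thread (A+1 loaded from B, A loaded from B+16). Under
those assumptions it shows why the swapped subreg offsets stay consistent
when the OO register is spilled with a plain movoo.

```python
# Toy model only; names are illustrative and do not exist in GCC.

def lxvp(regs, mem, a, b):
    """lxvp A,B as described in this thread: A+1 <- [B], A <- [B+16]."""
    regs[a + 1] = mem[b]
    regs[a] = mem[b + 16]

def movoo_store(regs, mem, a, b):
    """Plain movoo store as proposed: lower register to lower address."""
    mem[b] = regs[a]
    mem[b + 16] = regs[a + 1]

def movoo_load(regs, mem, a, b):
    """Plain movoo load as proposed: lower register from lower address."""
    regs[a] = mem[b]
    regs[a + 1] = mem[b + 16]

def subreg_in_regs(regs, a, byte_offset):
    """(subreg:V2DI (reg:OO R) N) when R lives in hard registers A and A+1."""
    return regs[a + byte_offset // 16]

def subreg_in_slot(slot, base, byte_offset):
    """The same subreg when R is spilled: address offset N from the slot start."""
    return slot[base + byte_offset]

def fused_results(regs, a):
    """Fusion-pass mapping: first load -> subreg 16, second load -> subreg 0."""
    return subreg_in_regs(regs, a, 16), subreg_in_regs(regs, a, 0)

# Two adjacent 16-byte vectors that the original pair of loads would read.
mem = {0: "first", 16: "second"}
regs = {}
lxvp(regs, mem, 38, 0)
first, second = fused_results(regs, 38)

# Spill R with plain movoo and read one half back from the stack slot.
slot = {}
movoo_store(regs, slot, 38, 0)
first_after_spill = subreg_in_slot(slot, 0, 16)
```

In this model the value at byte offset 16 of the spill slot is the same
value that (subreg:V2DI (reg:OO R) 16) names in registers, so a reload
that uses a single lxv from slot offset 16 picks up the first load result.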

Thanks,
Richard
