Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

Ajit Agarwal Tue, 11 Jun 2024 08:56:36 -0700

Hello Richard:

On 11/06/24 8:59 pm, Richard Sandiford wrote:
> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>> On 11/06/24 7:07 pm, Richard Sandiford wrote:
>>> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>>>> Hello Richard:
>>>> On 11/06/24 6:12 pm, Richard Sandiford wrote:
>>>>> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>>>>>> Hello Richard:
>>>>>>
>>>>>> On 11/06/24 5:15 pm, Richard Sandiford wrote:
>>>>>>> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>>>>>>>> Hello Richard:
>>>>>>>> On 11/06/24 4:56 pm, Ajit Agarwal wrote:
>>>>>>>>> Hello Richard:
>>>>>>>>>
>>>>>>>>> On 11/06/24 4:36 pm, Richard Sandiford wrote:
>>>>>>>>>> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>>>>>>>>>>>>>>> After LRA reload:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (insn 9299 2472 2412 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] [240])
>>>>>>>>>>>>>>>         (mem:V2DF (plus:DI (reg:DI 8 8 [orig:1285 ivtmp.886 ] [1285])
>>>>>>>>>>>>>>>                 (const_int 16 [0x10])) [1 MEM <vector(2) real(kind=8)> [(real(kind=8) *)_4188]+16 S16 A64])) "shell_lam.fppized.f":238:72 1190 {vsx_movv2df_64bit}
>>>>>>>>>>>>>>>      (nil))
>>>>>>>>>>>>>>> (insn 2412 9299 2477 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] [240])
>>>>>>>>>>>>>>>         (neg:V2DF (fma:V2DF (reg:V2DF 39 7 [ MEM <vector(2) real(kind=8)> [(real(kind=8) *)_4050]+16 ])
>>>>>>>>>>>>>>>                 (reg:V2DF 44 12 [3119])
>>>>>>>>>>>>>>>                 (neg:V2DF (reg:V2DF 51 19 [orig:240 vect__302.545 ] [240]))))) {*vsx_nfmsv2df4}
>>>>>>>>>>>>>>>      (nil))
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (insn 2473 9311 9312 187 (set (reg:V2DF 38 6 [orig:905 vect__302.545 ] [905])
>>>>>>>>>>>>>>>         (neg:V2DF (fma:V2DF (reg:V2DF 44 12 [3119])
>>>>>>>>>>>>>>>                 (reg:V2DF 38 6 [orig:2561 MEM <vector(2) real(kind=8)> [(real(kind=8) *)_4050] ] [2561])
>>>>>>>>>>>>>>>                 (neg:V2DF (reg:V2DF 47 15 [5266]))))) {*vsx_nfmsv2df4}
>>>>>>>>>>>>>>>      (nil))
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In the allocated code above, it assigns registers 51 and 47,
>>>>>>>>>>>>>>> which are not sequential.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The reload for 2412 looks valid.  What was the original pre-reload
>>>>>>>>>>>>>> version of insn 2473?  Also, what happened to insn 2472?  Was it
>>>>>>>>>>>>>> deleted?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is the pre-reload version of 2473:
>>>>>>>>>>>>>
>>>>>>>>>>>>> (insn 2473 2396 2478 161 (set (reg:V2DF 905 [ vect__302.545 ])
>>>>>>>>>>>>>         (neg:V2DF (fma:V2DF (reg:V2DF 4283 [3119])
>>>>>>>>>>>>>                 (subreg:V2DF (reg:OO 2561 [ MEM <vector(2) real(kind=8)> [(real(kind=8) *)_4050] ]) 0)
>>>>>>>>>>>>>                 (neg:V2DF (subreg:V2DF (reg:OO 2572 [ vect__300.543_236 ]) 0))))) {*vsx_nfmsv2df4}
>>>>>>>>>>>>>      (expr_list:REG_DEAD (reg:OO 2572 [ vect__300.543_236 ])
>>>>>>>>>>>>>         (expr_list:REG_DEAD (reg:OO 2561 [ MEM <vector(2) real(kind=8)> [(real(kind=8) *)_4050] ])
>>>>>>>>>>>>>             (nil))))
>>>>>>>>>>>>>
>>>>>>>>>>>>> insn 2472 is replaced with 9299 after reload.
>>>>>>>>>>>>
>>>>>>>>>>>> You'd have to check the dumps to be sure, but I think 9299 is instead
>>>>>>>>>>>> generated as an input reload of 2412, rather than being a replacement
>>>>>>>>>>>> of insn 2472.  T
>>>>>>>>>>>
>>>>>>>>>>> Yes, it is generated for 2412. The predecessor of 2412 is a load from
>>>>>>>>>>> base plus offset, because in 2412 we have
>>>>>>>>>>> (subreg:V2DF (reg:OO 2572) 16).
>>>>>>>>>>>
>>>>>>>>>>> This is not correct, because we are generating a normal lxv load
>>>>>>>>>>> rather than lxvp. Since the normal load in the predecessor insn of
>>>>>>>>>>> 2412 uses a plus-constant offset, it breaks correctness.
>>>>>>>>>>
>>>>>>>>>> Not using lxvp is a deliberate choice though.
>>>>>>>>>>
>>>>>>>>>> If a (reg:OO R) has been spilled, there's no requirement for LRA
>>>>>>>>>> to load both halves of R when only one half is needed.  LRA just
>>>>>>>>>> loads what it needs into whichever registers happen to be free.
>>>>>>>>>>
>>>>>>>>>> If the reload of R instead used lxvp, LRA would be forced to free
>>>>>>>>>> up another register for the other half of R, even though that value
>>>>>>>>>> would never be used.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If (subreg (reg:OO R) 16) is reloaded after R is spilled, the value
>>>>>>>>> is loaded from offset 16, whereas it should be loaded from offset
>>>>>>>>> zero. In the load fusion pass we replace (reg:V2DI R) with
>>>>>>>>> (subreg (reg:OO R) 16), so the value is loaded from offset 16, and
>>>>>>>>> that is why it breaks correctness.
>>>>>>>>>
>>>>>>>>> Similarly, we replace (reg:V2DI R) 16 with (subreg (reg:OO R) 0);
>>>>>>>>> the value should be loaded from offset 16 but is instead loaded from
>>>>>>>>> offset zero, and that is why we break correctness.
>>>>>>>>>
>>>>>>>>
>>>>>>>> If (subreg (reg:OO R) 16) is reloaded after R is spilled, the value
>>>>>>>> is loaded from offset 16, whereas it should be loaded from offset
>>>>>>>> zero. In the load fusion pass we replace (reg:V2DI R) with
>>>>>>>> (subreg (reg:OO R) 16), so the value is loaded from offset 16 when it
>>>>>>>> should be loaded from offset zero.
>>>>>>>> That is why it breaks correctness.
>>>>>>>>
>>>>>>>> Similarly, we replace (reg:V2DI R) 16 with (subreg (reg:OO R) 0);
>>>>>>>> the value should be loaded from offset 16 but is instead loaded from
>>>>>>>> offset zero, and that is why we break correctness.
>>>>>>>
>>>>>>> I don't understand, sorry.  (subreg:V2DI (reg:OO R) 0) is always
>>>>>>>
>>>>>>> (a) the first hard register in (reg:OO R), when the whole of R
>>>>>>>     is stored in hard registers
>>>>>>> (b) at address offset 0 from the start of (reg:OO R), when R is
>>>>>>>     spilled to memory
>>>>>>>
>>>>>>> Similarly, (subreg:V2DI (reg:OO R) 16) is always
>>>>>>>
>>>>>>> (c) the second hard register in (reg:OO R), when the whole of R
>>>>>>>     is stored in hard registers
>>>>>>> (d) at address offset 16 from the start of (reg:OO R), when R is
>>>>>>>     spilled to memory
>>>>>>>
>>>>>>
>>>>>> Yes, but we replace uses of the value loaded from offset 16 with
>>>>>> (subreg (reg:OO R) 0), and similarly we replace uses of the value
>>>>>> loaded from offset 0 with (subreg (reg:OO R) 16), because we are
>>>>>> swapping the use operands.
>>>>>>
>>>>>> When R is spilled it is the other way round: (subreg (reg:OO R) 16)
>>>>>> should be loaded from offset 0 and (subreg (reg:OO R) 0) should be
>>>>>> loaded from offset 16, because we are swapping the use operands.
>>>>>>
>>>>>> These are the semantics of lxvp.
>>>>>
>>>>> Hmm, OK.  Does that mean that:
>>>>>
>>>>>    lxvp A,B
>>>>>
>>>>> loads A+1 from B and A from B+16?  (I couldn't find an online
>>>>> description of the instruction btw -- is there one?)
>>>>>
>>>>
>>>> Yes, that's correct. I could not find an online document that
>>>> describes it either.
>>>
>>> Thanks, I think I get it now.
>>>
>>
>> Thanks a lot. What should we do about the neg (fma)
>> correctness failures with load fusion?
> 
> I think it would involve:
> 
> - describing lxvp and stxvp as unspec patterns, as I mentioned
>   in the previous reply
> 
> - making plain movoo split loads and stores into individual
>   lxvs and stxvs.  (Or, alternatively, it could use lxvp and stxvp,
>   but internally swap the registers after load and before store.)
>   That is, movoo should load the lower-numbered register from the
>   lower address and the higher-numbered register from the higher
>   address, and likewise for stores.
>


Would you mind elaborating on the above?
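
For reference, a minimal C sketch of how the register/address mapping
in the points above could be read. It is only a model, not GCC or
hardware code: the names vec128, model_lxvp, model_movoo_load and
model_subreg are hypothetical, and the lxvp ordering is assumed to be
the one agreed earlier in this thread (A+1 loaded from B, A loaded
from B+16).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 16-byte vector register. */
typedef struct { uint8_t b[16]; } vec128;

/* lxvp A,B as described in this thread (assumption):
   register A+1 gets the bytes at B, register A gets the bytes at B+16. */
static void model_lxvp(vec128 *regs, int a, const uint8_t *mem)
{
    memcpy(regs[a + 1].b, mem, 16);
    memcpy(regs[a].b, mem + 16, 16);
}

/* Plain movoo load as suggested above, split into two lxv loads:
   the lower-numbered register comes from the lower address and the
   higher-numbered register from the higher address. */
static void model_movoo_load(vec128 *regs, int a, const uint8_t *mem)
{
    memcpy(regs[a].b, mem, 16);
    memcpy(regs[a + 1].b, mem + 16, 16);
}

/* (subreg:V2DI (reg:OO R) off) when R lives in hard registers a and a+1:
   offset 0 is the first register, offset 16 is the second. */
static const vec128 *model_subreg(const vec128 *regs, int a, int off)
{
    return &regs[a + off / 16];
}
```

In this model, after model_movoo_load the subreg at offset 0 holds the
bytes from address offset 0, which matches what a spill slot gives.
After model_lxvp the first load result (address offset 0) sits in the
subreg at offset 16, which is why the fusion pass would use
(subreg:V2DI (reg:OO R) 16) for it.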

> - make the fusion pass replace the first load result with
>   (subreg:V2DI (reg:OO R) 16) and the second load result with
>   (subreg:V2DI (reg:OO R) 0), as I think it already does.
> 
> Thanks,
> Richard

Thanks & Regards
Ajit
