Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

Richard Sandiford Tue, 11 Jun 2024 05:42:21 -0700

Ajit Agarwal <aagar...@linux.ibm.com> writes:
> Hello Richard:
>
> On 11/06/24 5:15 pm, Richard Sandiford wrote:
>> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>>> Hello Richard:
>>> On 11/06/24 4:56 pm, Ajit Agarwal wrote:
>>>> Hello Richard:
>>>>
>>>> On 11/06/24 4:36 pm, Richard Sandiford wrote:
>>>>> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>>>>>>>>>> After LRA reload:
>>>>>>>>>>
>>>>>>>>>> (insn 9299 2472 2412 187 (set (reg:V2DF 51 19 [orig:240 
>>>>>>>>>> vect__302.545 ] [240])
>>>>>>>>>>         (mem:V2DF (plus:DI (reg:DI 8 8 [orig:1285 ivtmp.886 ] [1285])
>>>>>>>>>>                 (const_int 16 [0x10])) [1 MEM <vector(2) 
>>>>>>>>>> real(kind=8)> [(real(kind=8) *)_4188]+16 S16 A64])) 
>>>>>>>>>> "shell_lam.fppized.f":238:72 1190 {vsx_movv2df_64bit}
>>>>>>>>>>      (nil))
>>>>>>>>>> (insn 2412 9299 2477 187 (set (reg:V2DF 51 19 [orig:240 
>>>>>>>>>> vect__302.545 ] [240])
>>>>>>>>>>         (neg:V2DF (fma:V2DF (reg:V2DF 39 7 [ MEM <vector(2) 
>>>>>>>>>> real(kind=8)> [(real(kind=8) *)_4050]+16 ])
>>>>>>>>>>                 (reg:V2DF 44 12 [3119])
>>>>>>>>>>                 (neg:V2DF (reg:V2DF 51 19 [orig:240 vect__302.545 ] 
>>>>>>>>>> [240]))))) {*vsx_nfmsv2df4}
>>>>>>>>>>      (nil))
>>>>>>>>>>
>>>>>>>>>> (insn 2473 9311 9312 187 (set (reg:V2DF 38 6 [orig:905 vect__302.545 
>>>>>>>>>> ] [905])
>>>>>>>>>>         (neg:V2DF (fma:V2DF (reg:V2DF 44 12 [3119])
>>>>>>>>>>                 (reg:V2DF 38 6 [orig:2561 MEM <vector(2) 
>>>>>>>>>> real(kind=8)> [(real(kind=8) *)_4050] ] [2561])
>>>>>>>>>>                 (neg:V2DF (reg:V2DF 47 15 [5266]))))) 
>>>>>>>>>> {*vsx_nfmsv2df4}
>>>>>>>>>>      (nil))
>>>>>>>>>>
>>>>>>>>>> In the allocated code above, it assigns registers 51 and 47, and they
>>>>>>>>>> are not sequential.
>>>>>>>>>
>>>>>>>>> The reload for 2412 looks valid.  What was the original pre-reload
>>>>>>>>> version of insn 2473?  Also, what happened to insn 2472?  Was it 
>>>>>>>>> deleted?
>>>>>>>>>
>>>>>>>>
>>>>>>>> This is the pre-reload version of 2473:
>>>>>>>>
>>>>>>>> (insn 2473 2396 2478 161 (set (reg:V2DF 905 [ vect__302.545 ])
>>>>>>>>         (neg:V2DF (fma:V2DF (reg:V2DF 4283 [3119])
>>>>>>>>                 (subreg:V2DF (reg:OO 2561 [ MEM <vector(2) 
>>>>>>>> real(kind=8)> [(real(kind=8) *)_4050] ]) 0)
>>>>>>>>                 (neg:V2DF (subreg:V2DF (reg:OO 2572 [ 
>>>>>>>> vect__300.543_236 ]) 0))))) {*vsx_nfmsv2df4}
>>>>>>>>      (expr_list:REG_DEAD (reg:OO 2572 [ vect__300.543_236 ])
>>>>>>>>         (expr_list:REG_DEAD (reg:OO 2561 [ MEM <vector(2) 
>>>>>>>> real(kind=8)> [(real(kind=8) *)_4050] ])
>>>>>>>>             (nil))))
>>>>>>>>
>>>>>>>> insn 2472 is replaced with 9299 after reload.
>>>>>>>
>>>>>>> You'd have to check the dumps to be sure, but I think 9299 is instead
>>>>>>> generated as an input reload of 2412, rather than being a replacement
>>>>>>> of insn 2472.
>>>>>>
>>>>>> Yes, it is generated for 2412. The predecessor of 2412 is a load from
>>>>>> a plus offset, since in 2412 we have (subreg:V2DF (reg:OO 2572) 16).
>>>>>>
>>>>>> This is not correct, as we are not generating lxvp; it is a
>>>>>> normal load, lxv.
>>>>>> Since a normal load with a plus constant offset is generated in the
>>>>>> predecessor insn of 2412, it breaks correctness.
>>>>>
>>>>> Not using lxvp is a deliberate choice though.
>>>>>
>>>>> If a (reg:OO R) has been spilled, there's no requirement for LRA
>>>>> to load both halves of R when only one half is needed.  LRA just
>>>>> loads what it needs into whichever registers happen to be free.
>>>>>
>>>>> If the reload of R instead used lxvp, LRA would be forced to free
>>>>> up another register for the other half of R, even though that value
>>>>> would never be used.
>>>>>
>>>>
>>>> If a (reg:OO R) 16 is loaded after being spilled, the loaded value
>>>> comes from offset 16, whereas it should come from offset zero.
>>>> In the load fusion pass we replace (reg:V2DI R) with
>>>> subreg (reg:OO R) 16, so the value is loaded from offset 16, and
>>>> that's why it breaks correctness.
>>>>
>>>> Similarly, we replace (reg:V2DI R) 16 with subreg (reg:OO R) 0.
>>>> The value originally came from offset 16, but it is instead loaded
>>>> from offset zero, and that's why we break correctness.
>>>>
>>>
>>> If a (reg:OO R) 16 is loaded after being spilled, the loaded value
>>> comes from offset 16, whereas it should come from offset zero.
>>> In the load fusion pass we replace (reg:V2DI R) with
>>> subreg (reg:OO R) 16, so the value is loaded from offset 16
>>> when it should be loaded from offset zero.
>>> That's why it breaks correctness.
>>>
>>> Similarly, we replace (reg:V2DI R) 16 with subreg (reg:OO R) 0.
>>> The value originally came from offset 16, but it is instead loaded
>>> from offset zero, and that's why we break correctness.
>> 
>> I don't understand, sorry.  (subreg:V2DI (reg:OO R) 0) is always
>> 
>> (a) the first hard register in (reg:OO R), when the whole of R
>>     is stored in hard registers
>> (b) at address offset 0 from the start of (reg:OO R), when R is
>>     spilled to memory
>> 
>> Similarly, (subreg:V2DI (reg:OO R) 16) is always
>> 
>> (c) the second hard register in (reg:OO R), when the whole of R
>>     is stored in hard registers
>> (d) at address offset 16 from the start of (reg:OO R), when R is
>>     spilled to memory
>>
>
> Yes, but we are replacing the use of the value loaded from offset 16
> with subreg (reg OO) 0, and similarly we are replacing the use of the
> value loaded from offset 0 with subreg (reg OO) 16, as we are swapping
> the use operands.
>
> When it is spilled it is vice versa: subreg (reg OO) 16 should be
> loaded from offset 0 and subreg (reg OO) 0 should be loaded
> from offset 16, as we are swapping the use operands.
>
> This is the semantics of lxvp.


Hmm, OK.  Does that mean that:

   lxvp A,B

loads A+1 from B and A from B+16?  (I couldn't find an online
description of the instruction btw -- is there one?)

If lxvp does behave like that, it shouldn't be described as a normal move:

  [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,ZwO,wa")
        (match_operand:OO 1 "input_operand" "ZwO,wa,wa"))]

since the rules I listed above are target-independent.
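
To make the conflict concrete, here is a minimal Python sketch of rules
(a)-(d) from the quoted text.  It is a toy model, not GCC code: every
function name is hypothetical, register 38 is just an example, and the
"swapped" flag stands for the behaviour asked about above, which I have
not confirmed against the ISA.  It also assumes the spill slot has the
same layout as the memory the pair was loaded from.

```python
# Toy model of the target-independent subreg rules (a)-(d).
# All names are hypothetical; this is not GCC code.

VEC_BYTES = 16  # size of one V2DI/V2DF half of an OO pair


def subreg_in_hard_regs(first_regno, byte_offset):
    """Rules (a)/(c): hard register selected by (subreg (reg:OO R) byte_offset)."""
    assert byte_offset in (0, VEC_BYTES)
    return first_regno + byte_offset // VEC_BYTES


def subreg_in_spill_slot(slot_address, byte_offset):
    """Rules (b)/(d): address read for the same subreg once R is spilled."""
    assert byte_offset in (0, VEC_BYTES)
    return slot_address + byte_offset


def pair_load(memory, address, first_regno, swapped):
    """Load 32 bytes as two 16-byte halves into a register pair.

    swapped=False: normal move order (first register gets address + 0).
    swapped=True: the assumed order asked about in the text
    (first register gets address + 16).
    """
    lo, hi = memory[address], memory[address + VEC_BYTES]
    if swapped:
        lo, hi = hi, lo
    return {first_regno: lo, first_regno + 1: hi}


def consistent(swapped):
    """Check that the register view and the memory view of each subreg agree."""
    memory = {0: "half@0", 16: "half@16"}
    regs = pair_load(memory, 0, first_regno=38, swapped=swapped)
    return all(
        regs[subreg_in_hard_regs(38, off)] == memory[subreg_in_spill_slot(0, off)]
        for off in (0, VEC_BYTES)
    )
```

In this model consistent(False) holds and consistent(True) does not,
which is why a swapped pair load could not be described by a plain
OO-mode move pattern.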

Alternatively, if lxvp is described as a normal move,
with the 128-bit pieces swapped compared to GCC's normal order,
TARGET_CAN_CHANGE_MODE_CLASS must prevent subregs that select
one half of the register (or less).  But that would defeat exactly
the optimisation that you're trying to do.
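
As a rough illustration of that second option, the sketch below models
the kind of check the hook would have to make.  It is only an
assumption-laden toy: the real TARGET_CAN_CHANGE_MODE_CLASS takes two
machine modes and a register class, whereas this hypothetical predicate
works on byte sizes and an invented pieces_swapped flag.

```python
# Hypothetical model of a mode-change predicate; not GCC's actual hook.

PAIR_BYTES = 32  # size of an OO register pair
HALF_BYTES = 16  # size of one 128-bit piece


def can_change_mode_class(from_bytes, to_bytes, pieces_swapped):
    """Reject subregs that select half (or less) of a swapped pair."""
    involves_pair = PAIR_BYTES in (from_bytes, to_bytes)
    selects_half_or_less = min(from_bytes, to_bytes) <= HALF_BYTES
    if pieces_swapped and involves_pair and selects_half_or_less:
        return False
    return True
```

With pieces_swapped set, can_change_mode_class(32, 16, True) is False,
so the (subreg:V2DF (reg:OO R) 16) form that the fusion pass relies on
would be rejected.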

Thanks,
Richard
