Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

Ajit Agarwal Tue, 11 Jun 2024 07:40:25 -0700

Hello Richard:

On 11/06/24 7:07 pm, Richard Sandiford wrote:
> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>> Hello Richard:
>> On 11/06/24 6:12 pm, Richard Sandiford wrote:
>>> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>>>> Hello Richard:
>>>>
>>>> On 11/06/24 5:15 pm, Richard Sandiford wrote:
>>>>> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>>>>>> Hello Richard:
>>>>>> On 11/06/24 4:56 pm, Ajit Agarwal wrote:
>>>>>>> Hello Richard:
>>>>>>>
>>>>>>> On 11/06/24 4:36 pm, Richard Sandiford wrote:
>>>>>>>> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>>>>>>>>>>>>> After LRA reload:
>>>>>>>>>>>>>
>>>>>>>>>>>>> (insn 9299 2472 2412 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] [240])
>>>>>>>>>>>>>         (mem:V2DF (plus:DI (reg:DI 8 8 [orig:1285 ivtmp.886 ] [1285])
>>>>>>>>>>>>>                 (const_int 16 [0x10])) [1 MEM <vector(2) real(kind=8)> [(real(kind=8) *)_4188]+16 S16 A64])) "shell_lam.fppized.f":238:72 1190 {vsx_movv2df_64bit}
>>>>>>>>>>>>>      (nil))
>>>>>>>>>>>>> (insn 2412 9299 2477 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] [240])
>>>>>>>>>>>>>         (neg:V2DF (fma:V2DF (reg:V2DF 39 7 [ MEM <vector(2) real(kind=8)> [(real(kind=8) *)_4050]+16 ])
>>>>>>>>>>>>>                 (reg:V2DF 44 12 [3119])
>>>>>>>>>>>>>                 (neg:V2DF (reg:V2DF 51 19 [orig:240 vect__302.545 ] [240]))))) {*vsx_nfmsv2df4}
>>>>>>>>>>>>>      (nil))
>>>>>>>>>>>>>
>>>>>>>>>>>>> (insn 2473 9311 9312 187 (set (reg:V2DF 38 6 [orig:905 vect__302.545 ] [905])
>>>>>>>>>>>>>         (neg:V2DF (fma:V2DF (reg:V2DF 44 12 [3119])
>>>>>>>>>>>>>                 (reg:V2DF 38 6 [orig:2561 MEM <vector(2) real(kind=8)> [(real(kind=8) *)_4050] ] [2561])
>>>>>>>>>>>>>                 (neg:V2DF (reg:V2DF 47 15 [5266]))))) {*vsx_nfmsv2df4}
>>>>>>>>>>>>>      (nil))
>>>>>>>>>>>>>
>>>>>>>>>>>>> In the allocated code above, registers 51 and 47 are assigned,
>>>>>>>>>>>>> and they are not sequential.
>>>>>>>>>>>>
>>>>>>>>>>>> The reload for 2412 looks valid.  What was the original pre-reload
>>>>>>>>>>>> version of insn 2473?  Also, what happened to insn 2472?  Was it 
>>>>>>>>>>>> deleted?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This is the pre-reload version of 2473:
>>>>>>>>>>>
>>>>>>>>>>> (insn 2473 2396 2478 161 (set (reg:V2DF 905 [ vect__302.545 ])
>>>>>>>>>>>         (neg:V2DF (fma:V2DF (reg:V2DF 4283 [3119])
>>>>>>>>>>>                 (subreg:V2DF (reg:OO 2561 [ MEM <vector(2) real(kind=8)> [(real(kind=8) *)_4050] ]) 0)
>>>>>>>>>>>                 (neg:V2DF (subreg:V2DF (reg:OO 2572 [ vect__300.543_236 ]) 0))))) {*vsx_nfmsv2df4}
>>>>>>>>>>>      (expr_list:REG_DEAD (reg:OO 2572 [ vect__300.543_236 ])
>>>>>>>>>>>         (expr_list:REG_DEAD (reg:OO 2561 [ MEM <vector(2) real(kind=8)> [(real(kind=8) *)_4050] ])
>>>>>>>>>>>             (nil))))
>>>>>>>>>>>
>>>>>>>>>>> Insn 2472 is replaced with 9299 after reload.
>>>>>>>>>>
>>>>>>>>>> You'd have to check the dumps to be sure, but I think 9299 is instead
>>>>>>>>>> generated as an input reload of 2412, rather than being a replacement
>>>>>>>>>> of insn 2472.
>>>>>>>>>
>>>>>>>>> Yes, it is generated for 2412. The predecessor of 2412 is a load
>>>>>>>>> from a plus-16 offset, because 2412 uses (subreg:V2DF (reg:OO 2572) 16).
>>>>>>>>>
>>>>>>>>> This is not correct, because we are not generating lxvp; it is a
>>>>>>>>> normal lxv load.
>>>>>>>>> Since a normal load with a plus-constant offset is generated in the
>>>>>>>>> predecessor insn of 2412, correctness breaks.
>>>>>>>>
>>>>>>>> Not using lxvp is a deliberate choice though.
>>>>>>>>
>>>>>>>> If a (reg:OO R) has been spilled, there's no requirement for LRA
>>>>>>>> to load both halves of R when only one half is needed.  LRA just
>>>>>>>> loads what it needs into whichever registers happen to be free.
>>>>>>>>
>>>>>>>> If the reload of R instead used lxvp, LRA would be forced to free
>>>>>>>> up another register for the other half of R, even though that value
>>>>>>>> would never be used.
>>>>>>>>
>>>>>>>
>>>>>>> If (subreg (reg:OO R) 16) is loaded after R has been spilled, the
>>>>>>> value is loaded from offset 16, whereas it should be loaded from
>>>>>>> offset 0. In the load fusion pass we replace (reg:V2DI R) with
>>>>>>> (subreg (reg:OO R) 16), so the value is loaded from offset 16, and
>>>>>>> that's why correctness breaks.
>>>>>>>
>>>>>>> Similarly, we replace the V2DI register loaded from offset 16 with
>>>>>>> (subreg (reg:OO R) 0), so the value is loaded from offset 0 instead
>>>>>>> of offset 16, which again breaks correctness.
>>>>>>>
>>>>>>
>>>>>> If (subreg (reg:OO R) 16) is loaded after R has been spilled, the
>>>>>> value is loaded from offset 16, whereas it should be loaded from
>>>>>> offset 0. In the load fusion pass we replace (reg:V2DI R) with
>>>>>> (subreg (reg:OO R) 16), so the value is loaded from offset 16
>>>>>> instead of offset 0. That's why correctness breaks.
>>>>>>
>>>>>> Similarly, we replace the V2DI register loaded from offset 16 with
>>>>>> (subreg (reg:OO R) 0), so the value is loaded from offset 0 instead
>>>>>> of offset 16, which again breaks correctness.
>>>>>
>>>>> I don't understand, sorry.  (subreg:V2DI (reg:OO R) 0) is always
>>>>>
>>>>> (a) the first hard register in (reg:OO R), when the whole of R
>>>>>     is stored in hard registers
>>>>> (b) at address offset 0 from the start of (reg:OO R), when R is
>>>>>     spilled to memory
>>>>>
>>>>> Similarly, (subreg:V2DI (reg:OO R) 16) is always
>>>>>
>>>>> (c) the second hard register in (reg:OO R), when the whole of R
>>>>>     is stored in hard registers
>>>>> (d) at address offset 16 from the start of (reg:OO R), when R is
>>>>>     spilled to memory
>>>>>
>>>>
>>>> Yes, but we replace uses of the value loaded from offset 16 with
>>>> (subreg (reg:OO) 0), and similarly we replace uses of the value loaded
>>>> from offset 0 with (subreg (reg:OO) 16), because we swap the use
>>>> operands.
>>>>
>>>> When R is spilled, it's the other way round: (subreg (reg:OO) 16)
>>>> should be loaded from offset 0 and (subreg (reg:OO) 0) should be
>>>> loaded from offset 16, again because we swap the use operands.
>>>>
>>>> These are the semantics of lxvp.
>>>
>>> Hmm, OK.  Does that mean that:
>>>
>>>    lxvp A,B
>>>
>>> loads A+1 from B and A from B+16?  (I couldn't find an online
>>> description of the instruction btw -- is there one?)
>>>
>>
>> Yes, that's correct. I couldn't find an online document that
>> describes it either.
> 
> Thanks, I think I get it now.
>


Thanks a lot. Could you tell me what we should do about the neg (fma)
correctness failures with load fusion?

>>> If lxvp does behave like that, it shouldn't be described as a normal move:
>>>
>>>   [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,ZwO,wa")
>>>     (match_operand:OO 1 "input_operand" "ZwO,wa,wa"))]
>>>
>>> since the rules I listed above are target-independent.
>>>
>>
>> The rules for the lxvp load itself should be as normal, but the way
>> the registers defined by lxvp are used should differ from a normal
>> load.
> 
> I don't think that's true though.  Because of the rules above,
> GCC expects that:
> 
>   (set (reg:OO R) (mem:OO ADDR))
> 
> will load ADDR+0 into R and ADDR+16 into R+1.  Anything else will
> cause exactly the kind of mix-up that you're seeing here.
> 
> In the worst case, the load should be described as:
> 
>   (set (reg:OO R)
>        (unspec:OO [(mem:OO ADDR)] UNSPEC_LXVP))
> 
> although there are other alternatives that could be used if OOmode
> weren't defined as an OPAQUE_MODE.
> 

If it is not described as above, correctness breaks.
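A small byte-level model of the ordering point above may help. It is a Python sketch, not GCC code, and all helper names are hypothetical. It compares (subreg:V2DI (reg:OO R) BYTE) read from hard registers (rules (a)/(c)) with the same subreg read from memory at ADDR (rules (b)/(d)), once for GCC's expected order for (set (reg:OO R) (mem:OO ADDR)) and once for the lxvp order described in this thread (A+1 from B, A from B+16).

```python
# Byte-level sketch (hypothetical helpers, not GCC code) of an OOmode value
# held as two 16-byte vector registers: pair[0] is R and pair[1] is R+1.
VEC = 16  # bytes in one V2DF/V2DI register


def load_gcc_order(mem, addr):
    # GCC's rule for (set (reg:OO R) (mem:OO ADDR)): ADDR+0 -> R, ADDR+16 -> R+1
    return [mem[addr:addr + VEC], mem[addr + VEC:addr + 2 * VEC]]


def load_lxvp_order(mem, addr):
    # lxvp A,B as described in the thread: B -> A+1, B+16 -> A
    return [mem[addr + VEC:addr + 2 * VEC], mem[addr:addr + VEC]]


def subreg_in_regs(pair, byte):
    # Rules (a)/(c): (subreg:V2DI (reg:OO R) byte) is hard register R + byte/16
    return pair[byte // VEC]


def subreg_in_mem(mem, addr, byte):
    # Rules (b)/(d): the same subreg is the 16 bytes at ADDR + byte
    return mem[addr + byte:addr + byte + VEC]


mem = bytes(range(2 * VEC))  # byte i holds the value i
for byte in (0, VEC):
    in_mem = subreg_in_mem(mem, 0, byte)
    print(byte,
          subreg_in_regs(load_gcc_order(mem, 0), byte) == in_mem,
          subreg_in_regs(load_lxvp_order(mem, 0), byte) == in_mem)
# prints "0 True False", then "16 True False"
```

With GCC's order the register view and the memory view agree for both halves. With the lxvp order each register half holds the other memory half, so anything that mixes the two views (such as LRA reloading one half of a spilled (reg:OO R) with an ordinary lxv) picks up the wrong value.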

>>> Alternatively, if lxvp is described as a normal move,
>>> with the 128-bit pieces swapped compared to GCC's normal order,
>>> TARGET_CAN_CHANGE_MODE_CLASS must prevent subregs that select
>>> one half of the register (or less).  But that would defeat exactly
>>> the optimisation that you're trying to do.
>>>
>>
>> Sorry, I didn't understand how TARGET_CAN_CHANGE_MODE_CLASS prevents
>> subregs that select one half of the register.
> 
> It doesn't prevent the subregs as such, but it stops the mode
> change from happening in certain registers.
> 
> For example, if we have (subreg:V2DI (reg:OO R) 0), and if
> TARGET_CAN_CHANGE_MODE_CLASS disallows changes between V2DI
> and OO in all register classes that can hold OO, then the register
> allocator would handle the subreg by spilling register R to memory
> and loading a V2DI piece of it from there.
> 
> This would allow the port to store OOmode in an unusual order.
> But like I say, it would also defeat exactly what you're trying
> to do here.
> 
Sure. Thanks.
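A rough Python model of the TARGET_CAN_CHANGE_MODE_CLASS point, under stated assumptions: the hook stand-in, the register-class names and the stxvp order (assumed here to mirror lxvp) are all hypothetical, and none of this is GCC code. When the hook forbids the OO/V2DI mode change, a V2DI piece is obtained by spilling the pair and loading from slot+BYTE; when it allows the change, the subreg selects a hard register directly and, with the swapped order, gets the wrong half.

```python
# Hypothetical model, not GCC code: an OOmode value is a pair of 16-byte
# vector registers [R, R+1], stored by the port in swapped order.
VEC = 16


def can_change_mode_class(from_mode, to_mode, rclass):
    # Stand-in for a port's TARGET_CAN_CHANGE_MODE_CLASS (from, to, rclass):
    # forbid OO <-> V2DI changes in the class that holds OO ("VSX_REGS").
    if {from_mode, to_mode} == {"OO", "V2DI"} and rclass == "VSX_REGS":
        return False
    return True


def lxvp(mem, addr):
    # lxvp as described in the thread: B -> R+1, B+16 -> R
    return [mem[addr + VEC:addr + 2 * VEC], mem[addr:addr + VEC]]


def stxvp(pair):
    # Assumption: the paired store mirrors lxvp, so R+1 -> slot+0, R -> slot+16
    return pair[1] + pair[0]


def subreg_v2di(pair, byte, rclass):
    if can_change_mode_class("OO", "V2DI", rclass):
        # Mode change allowed: the subreg selects hard register R + byte/16
        return pair[byte // VEC]
    # Mode change forbidden: spill the whole pair with the OO move, then load
    # the V2DI piece from slot+byte, as in rules (b)/(d)
    slot = stxvp(pair)
    return slot[byte:byte + VEC]


mem = bytes(range(2 * VEC))
pair = lxvp(mem, 0)
for rclass in ("VSX_REGS", "OTHER_REGS"):  # OTHER_REGS: hook allows the change
    print(rclass, [subreg_v2di(pair, b, rclass) == mem[b:b + VEC] for b in (0, VEC)])
# prints "VSX_REGS [True, True]", then "OTHER_REGS [False, False]"
```

The forbidden path returns the right bytes only by going through memory, which is the cost Richard points out: it would give up the register-level subregs that the load fusion pass relies on.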

> Thanks,
> Richard

Thanks & Regards
Ajit
