Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

Ajit Agarwal Tue, 11 Jun 2024 04:26:39 -0700

Hello Richard:

On 11/06/24 4:36 pm, Richard Sandiford wrote:
> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>>>>>> After LRA reload:
>>>>>>
>>>>>> (insn 9299 2472 2412 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] 
>>>>>> [240])
>>>>>>         (mem:V2DF (plus:DI (reg:DI 8 8 [orig:1285 ivtmp.886 ] [1285])
>>>>>>                 (const_int 16 [0x10])) [1 MEM <vector(2) real(kind=8)> 
>>>>>> [(real(kind=8) *)_4188]+16 S16 A64])) "shell_lam.fppized.f":238:72 1190 
>>>>>> {vsx_movv2df_64bit}
>>>>>>      (nil))
>>>>>> (insn 2412 9299 2477 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] 
>>>>>> [240])
>>>>>>         (neg:V2DF (fma:V2DF (reg:V2DF 39 7 [ MEM <vector(2) 
>>>>>> real(kind=8)> [(real(kind=8) *)_4050]+16 ])
>>>>>>                 (reg:V2DF 44 12 [3119])
>>>>>>                 (neg:V2DF (reg:V2DF 51 19 [orig:240 vect__302.545 ] 
>>>>>> [240]))))) {*vsx_nfmsv2df4}
>>>>>>      (nil))
>>>>>>
>>>>>> (insn 2473 9311 9312 187 (set (reg:V2DF 38 6 [orig:905 vect__302.545 ] 
>>>>>> [905])
>>>>>>         (neg:V2DF (fma:V2DF (reg:V2DF 44 12 [3119])
>>>>>>                 (reg:V2DF 38 6 [orig:2561 MEM <vector(2) real(kind=8)> 
>>>>>> [(real(kind=8) *)_4050] ] [2561])
>>>>>>                 (neg:V2DF (reg:V2DF 47 15 [5266]))))) {*vsx_nfmsv2df4}
>>>>>>      (nil))
>>>>>>
>>>>>> In the above allocated code it assign registers 51 and 47 and they are 
>>>>>> not sequential.
>>>>>
>>>>> The reload for 2412 looks valid.  What was the original pre-reload
>>>>> version of insn 2473?  Also, what happened to insn 2472?  Was it deleted?
>>>>>
>>>>
>>>> This is preload version of 2473:
>>>>
>>>> (insn 2473 2396 2478 161 (set (reg:V2DF 905 [ vect__302.545 ])
>>>>         (neg:V2DF (fma:V2DF (reg:V2DF 4283 [3119])
>>>>                 (subreg:V2DF (reg:OO 2561 [ MEM <vector(2) real(kind=8)> 
>>>> [(real(kind=8) *)_4050] ]) 0)
>>>>                 (neg:V2DF (subreg:V2DF (reg:OO 2572 [ vect__300.543_236 ]) 
>>>> 0))))) {*vsx_nfmsv2df4}
>>>>      (expr_list:REG_DEAD (reg:OO 2572 [ vect__300.543_236 ])
>>>>         (expr_list:REG_DEAD (reg:OO 2561 [ MEM <vector(2) real(kind=8)> 
>>>> [(real(kind=8) *)_4050] ])
>>>>             (nil))))
>>>>
>>>> insn 2472 is replaced with 9299 after reload.
>>>
>>> You'd have to check the dumps to be sure, but I think 9299 is instead
>>> generated as an input reload of 2412, rather than being a replacement
>>> of insn 2472.  T
>>
>> Yes it is generated for 2412. The predecessor of 2412 is load from
>> plus offset as in 2412 we have subreg:V2DF (reg OO 2572) 16).
>>
>> This is not correct as we are not generating lxvp and it is 
>> normal load lxv.
>> As normal load is generated in predecessor insn of 2412 with
>> plus constant offset it breaks the correctness.
> 
> Not using lxvp is a deliberate choice though.
> 
> If a (reg:OO R) has been spilled, there's no requirement for LRA
> to load both halves of R when only one half is needed.  LRA just
> loads what it needs into whichever registers happen to be free.
> 
> If the reload of R instead used lxvp, LRA would be forced to free
> up another register for the other half of R, even though that value
> would never be used.
>


If (subreg (reg:OO R) 16) is reloaded after R has been spilled, the
value is loaded from offset 16, when it should be loaded from offset
zero. The load fusion pass replaces (reg:V2DI R) with
(subreg (reg:OO R) 16), so the reloaded value comes from offset 16,
and that is what breaks correctness.

Similarly, we replace (reg:V2DI R) at offset 16 with
(subreg (reg:OO R) 0). The value should come from offset 16, but it
is loaded from offset zero instead, which also breaks correctness.

These are the replacement semantics the load fusion pass uses in
order to generate lxvp.
>>> That is, LRA needs to reload (subreg:V2DF (reg:OO 2572) 16)
>>> from memory for insn 2412.  It can use the destination of insn 2412 (r51)
>>> as a temporary to do that.  It doesn't need to load the other half of
>>> reg:OO 2572 for this instruction.  That in itself looks ok.
>>>
>>> So it looks like the problem is specific to insn 2473.  Perhaps LRA
>>> thinks that r47 already contains the low half of (reg:OO 2572),
>>> left behind by some previous instruction not shown above?
>>> If LRA is wrong about that -- if r47 doesn't already contain the
>>> low half of (reg:OO 2572) -- then there's a bug somewhere.
>>> But we need to track down and fix the bug rather than try to sidestep
>>> it in the fusion pass.
>>>
>>
>> Similarly for 2473 normal load with 0 offset are generated in predecessor
>> insn as we are generating subreg:V2DF (reg OO 2572) 0 in 2473. As we are not
>> generating lxvp this is not correct and breaks the code.
> 
> That too sounds ok, for the reasons above.
> 
>> Above code is valid if we are generating lxvp that generates
>> sequential registers, but we are not geneating lxvp and normal
>> load is generated and this breaks the code.
> 
> I think you said earlier that the code is miscompiled (fails at
> runtime).  If that's due to an RA issue, then presumably there is
> an instruction that, after RA, is reading the wrong value.  In other
> words, there's presumably a register input somewhere that has the wrong
> contents.  Have you isolated which instruction and register that is?
> 
The code compiles and runs to completion, but we get a miscompare
error.

> Thanks,
> Richard

Thanks & Regards
Ajit
