Hello Richard:

On 12/06/24 3:02 am, Richard Sandiford wrote:
> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>> Hello Richard:
>>
>> On 11/06/24 9:41 pm, Richard Sandiford wrote:
>>> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>>>>>> Thanks a lot. What should we do about the neg (fma)
>>>>>> correctness failures with load fusion?
>>>>>
>>>>> I think it would involve:
>>>>>
>>>>> - describing lxvp and stxvp as unspec patterns, as I mentioned
>>>>>   in the previous reply
>>>>>
>>>>> - making plain movoo split loads and stores into individual
>>>>>   lxvs and stxvs.  (Or, alternatively, it could use lxvp and stxvp,
>>>>>   but internally swap the registers after load and before store.)
>>>>>   That is, movoo should load the lower-numbered register from the
>>>>>   lower address and the higher-numbered register from the higher
>>>>>   address, and likewise for stores.
>>>>>
>>>>
>>>> Would you mind elaborating on the above?
>>>
>>> I think movoo should use rs6000_split_multireg_move for all alternatives,
>>> like movxo does.  movoo should split into 2 V1TI loads/stores and movxo
>>> should split into 4 V1TI loads/stores.  lxvp and stxvp would be
>>> independent patterns of the form:
>>>
>>>   (set ...
>>>        (unspec [...] UNSPEC_FOO))
>>>
>>> ---
>>>
>>
>> In the load fusion pass, I generate the above pattern for adjacent merge
>> pairs.
>>
>>> rs6000_split_multireg_move has:
>>>
>>>   /* The __vector_pair and __vector_quad modes are multi-register
>>>      modes, so if we have to load or store the registers, we have to be
>>>      careful to properly swap them if we're in little endian mode
>>>      below.  This means the last register gets the first memory
>>>      location.  We also need to be careful of using the right register
>>>      numbers if we are splitting XO to OO.  */
>>>
>>> But I don't see how this can work reliably if we allow the kind of
>>> subregs that you want to create here.  The register order is the opposite
>>> from the one that GCC expects.
>>>
>>> This is more a question for the PowerPC maintainers though.
>>>
>>
>> I generated the above unspec pattern and modified the movoo pattern to
>> accept it.  It then goes through rs6000_split_multireg_move, which splits
>> it into 2 V1TI loads and generates consecutive loads with sequential
>> registers.  In the load_fusion pass I generate the subregs along with the
>> load results: subreg (reg OO R) 16 and subreg (reg OO R) 0.
>>
>> But it doesn't generate an lxvp instruction.  If I instead use the above
>> unspec pattern and write a separate pattern in the md file to generate
>> lxvp instead of the normal movoo, then it won't go through
>> rs6000_split_multireg_move.
> 
> I don't understand the last bit, sorry.  Under the scheme I described,
> lxvp should be generated only through an unspec (and no other way).
> Same for stxvp.  The fusion pass should generate those unspecs.
> 
> If the fusion pass has generated the code correctly, the lxvp unspec
> will remain throughout compilation, unless all uses of it are later
> deleted as dead.
> 
> The movoo rtl pattern should continue to be:
> 
>   [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,ZwO,wa")
>       (match_operand:OO 1 "input_operand" "ZwO,wa,wa"))]
> 
> But movoo should generate individual loads, stores and moves.  By design,
> it should never generate lxvp or stxvp.
> 
> This means that, if a fused load is spilled, the sequence will be
> something like:
> 
>   lxvp ...   // original fused load (unspec)
>   ...
>   stxv ...   // store one half to the stack (split from movoo)
>   stxv ...   // store the other half to the stack (split from movoo)
> 
> Then insns that use the pair will load whichever half they need
> from the stack.
> 
> I realise that isn't great, but it should at least be correct.
> 

Thanks a lot. It worked.
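
As a sanity check on the ordering rule discussed above, here is a toy C
model (not GCC code).  It assumes, following the rs6000_split_multireg_move
comment quoted earlier, that a paired load in little-endian mode puts the
first memory location in the last register.  All names here (regfile,
split_pair_load, split_pair_store, paired_load_le, swap_pair) are made up
for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { VEC_BYTES = 16, NUM_REGS = 64 };

/* One 128-bit vector register, modelled as raw bytes. */
typedef struct { uint8_t bytes[VEC_BYTES]; } vec128;

/* Hypothetical register file; not a description of real hardware state. */
typedef struct { vec128 vsr[NUM_REGS]; } regfile;

/* movoo load after splitting: two independent 16-byte loads.
   The lower-numbered register is loaded from the lower address.  */
static void split_pair_load(regfile *rf, unsigned first_reg, const uint8_t *mem)
{
    assert(first_reg + 1 < NUM_REGS);
    memcpy(rf->vsr[first_reg].bytes, mem, VEC_BYTES);
    memcpy(rf->vsr[first_reg + 1].bytes, mem + VEC_BYTES, VEC_BYTES);
}

/* movoo store after splitting: the mirror image of the load above. */
static void split_pair_store(const regfile *rf, unsigned first_reg, uint8_t *mem)
{
    assert(first_reg + 1 < NUM_REGS);
    memcpy(mem, rf->vsr[first_reg].bytes, VEC_BYTES);
    memcpy(mem + VEC_BYTES, rf->vsr[first_reg + 1].bytes, VEC_BYTES);
}

/* ASSUMPTION: paired load in little-endian mode, where the last
   register gets the first memory location.  */
static void paired_load_le(regfile *rf, unsigned first_reg, const uint8_t *mem)
{
    assert(first_reg + 1 < NUM_REGS);
    memcpy(rf->vsr[first_reg + 1].bytes, mem, VEC_BYTES);
    memcpy(rf->vsr[first_reg].bytes, mem + VEC_BYTES, VEC_BYTES);
}

/* The alternative mentioned above: swap the two halves after a paired
   load (or before a paired store) to restore the expected order.  */
static void swap_pair(regfile *rf, unsigned first_reg)
{
    assert(first_reg + 1 < NUM_REGS);
    vec128 tmp = rf->vsr[first_reg];
    rf->vsr[first_reg] = rf->vsr[first_reg + 1];
    rf->vsr[first_reg + 1] = tmp;
}
```

In this model, paired_load_le followed by swap_pair leaves the registers in
the same state as split_pair_load, which is why either approach would give
movoo the register order discussed above.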

> Thanks,
> Richard

Thanks & Regards
Ajit
