Hello Richard:

On 12/06/24 3:02 am, Richard Sandiford wrote:
> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>> Hello Richard:
>>
>> On 11/06/24 9:41 pm, Richard Sandiford wrote:
>>> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>>>>>> Thanks a lot. Can I know what should we be doing with neg (fma)
>>>>>> correctness failures with load fusion.
>>>>>
>>>>> I think it would involve:
>>>>>
>>>>> - describing lxvp and stxvp as unspec patterns, as I mentioned
>>>>>   in the previous reply
>>>>>
>>>>> - making plain movoo split loads and stores into individual
>>>>>   lxv and stxvs.  (Or, alternative, it could use lxvp and stxvp,
>>>>>   but internally swap the registers after load and before store.)
>>>>>   That is, movoo should load the lower-numbered register from the
>>>>>   lower address and the higher-numbered register from the higher
>>>>>   address, and likewise for stores.
>>>>>
>>>>
>>>> Would you mind elaborating the above.
>>>
>>> I think movoo should use rs6000_split_multireg_move for all alternatives,
>>> like movxo does.  movoo should split into 2 V1TI loads/stores and movxo
>>> should split into 4 V1TI loads/stores.  lxvp and stxvp would be
>>> independent patterns of the form:
>>>
>>>   (set ...
>>>        (unspec [...] UNSPEC_FOO))
>>>
>>> ---
>>>
>>
>> In load fusion pass I generate the above pattern for adjacent merge
>> pairs.
>>
>>> rs6000_split_multireg_move has:
>>>
>>>   /* The __vector_pair and __vector_quad modes are multi-register
>>>      modes, so if we have to load or store the registers, we have to be
>>>      careful to properly swap them if we're in little endian mode
>>>      below.  This means the last register gets the first memory
>>>      location.  We also need to be careful of using the right register
>>>      numbers if we are splitting XO to OO.  */
>>>
>>> But I don't see how this can work reliably if we allow the kind of
>>> subregs that you want to create here.  The register order is the opposite
>>> from the one that GCC expects.
>>>
>>> This is more a question for the PowerPC maintainers though.
>>>
>>
>> Above unspec pattern generated and modified the movoo pattern to accept
>> the above spec it goes through the rs6000_split_multireg_move
>> it splits into 2 V1TI loads and generate consecutive loads with sequential
>> registers. In load_fusion pass I generate the subreg along with load results
>> subreg (reg OO R) 16 and subreg (reg OO R) 0.
>>
>> But it doesn't generate lxvp instruction. If above unspec instruction
>> pattern and write separate pattern in md file to generate lxvp instead of
>> normal movoo, then it won't go through rs6000_split_multireg_move
>
> I don't understand the last bit, sorry.  Under the scheme I described,
> lxvp should be generated only through an unspec (and no other way).
> Same for stxvp.  The fusion pass should generate those unspecs.
>
> If the fusion pass has generated the code correctly, the lxvp unspec
> will remain throughout compilation, unless all uses of it are later
> deleted as dead.
>
> The movoo rtl pattern should continue to be:
>
>   [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,ZwO,wa")
>         (match_operand:OO 1 "input_operand" "ZwO,wa,wa"))]
>
> But movoo should generate individual loads, stores and moves.  By design,
> it should never generate lxvp or stxvp.
>
> This means that, if a fused load is spilled, the sequence will be
> something like:
>
>   lxvp ...   // original fused load (unspec)
>   ...
>   stxv ...   // store one half to the stack (split from movoo)
>   stxv ...   // store the other half to the stack (split from movoo)
>
> Then insns that use the pair will load whichever half they need
> from the stack.
>
> I realise that isn't great, but it should at least be correct.
>
Thanks a lot. It worked.

> Thanks,
> Richard

Thanks & Regards
Ajit