Hello Richard:

On 11/06/24 9:41 pm, Richard Sandiford wrote:
> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>>>> Thanks a lot. Can I know what should we be doing with neg (fma)
>>>> correctness failures with load fusion.
>>>
>>> I think it would involve:
>>>
>>> - describing lxvp and stxvp as unspec patterns, as I mentioned
>>>   in the previous reply
>>>
>>> - making plain movoo split loads and stores into individual
>>>   lxv and stxvs.  (Or, alternative, it could use lxvp and stxvp,
>>>   but internally swap the registers after load and before store.)
>>>   That is, movoo should load the lower-numbered register from the
>>>   lower address and the higher-numbered register from the higher
>>>   address, and likewise for stores.
>>>
>>
>> Would you mind elaborating the above.
>
> I think movoo should use rs6000_split_multireg_move for all alternatives,
> like movxo does.  movoo should split into 2 V1TI loads/stores and movxo
> should split into 4 V1TI loads/stores.  lxvp and stxvp would be
> independent patterns of the form:
>
>   (set ...
>        (unspec [...] UNSPEC_FOO))
>
> ---
>
In the load fusion pass I generate the above pattern for adjacent merge pairs.

> rs6000_split_multireg_move has:
>
>   /* The __vector_pair and __vector_quad modes are multi-register
>      modes, so if we have to load or store the registers, we have to be
>      careful to properly swap them if we're in little endian mode
>      below.  This means the last register gets the first memory
>      location.  We also need to be careful of using the right register
>      numbers if we are splitting XO to OO.  */
>
> But I don't see how this can work reliably if we allow the kind of
> subregs that you want to create here.  The register order is the opposite
> from the one that GCC expects.
>
> This is more a question for the PowerPC maintainers though.
>

I generated the above unspec pattern and modified the movoo pattern to
accept it. It then goes through rs6000_split_multireg_move, which splits
it into 2 V1TI loads and generates consecutive loads with sequential
registers. In the load_fusion pass I generate the subregs for the load
results, subreg (reg OO R) 16 and subreg (reg OO R) 0. But this does not
generate an lxvp instruction.

If I instead use the above unspec pattern and write a separate pattern in
the md file to generate lxvp instead of the normal movoo, then it won't go
through rs6000_split_multireg_move.

> And this is one of the (admittedly many) times when I wish GCC's
> subreg model was more like LLVM's. :)
>
> Thanks,
> Richard

Thanks & Regards
Ajit
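
As a side note, the two register orderings discussed in this thread can be
put side by side in a toy C model. This is only an illustrative sketch, not
GCC code: the names reg_linear, reg_swapped and VEC_BYTES are hypothetical,
and the swapped case simply restates the quoted comment that in little
endian mode "the last register gets the first memory location".

```c
/* Toy model, not GCC code: which register of a multi-register group
   receives the 16-byte chunk found at a given byte offset.
   All names here are hypothetical.  */

enum { VEC_BYTES = 16 };   /* size of one vector register in bytes */

/* Ordering proposed for movoo in this thread: the lower-numbered
   register is loaded from the lower address.  */
int
reg_linear (int base_reg, int nregs, int byte_offset)
{
  (void) nregs;   /* unused; kept so both functions share a signature */
  return base_reg + byte_offset / VEC_BYTES;
}

/* Ordering described in the quoted comment for little endian mode:
   the last register gets the first memory location.  */
int
reg_swapped (int base_reg, int nregs, int byte_offset)
{
  return base_reg + (nregs - 1) - byte_offset / VEC_BYTES;
}
```

For an OO pair starting at register 32 (an arbitrary example number), byte
offsets 0 and 16 map to registers 32 and 33 under reg_linear, but to 33 and
32 under reg_swapped. Under this simplified model, that reversal is what
subregs at offsets 0 and 16 would have to account for.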