Ajit Agarwal <aagar...@linux.ibm.com> writes:
> Hello Richard:
>
> On 11/06/24 9:41 pm, Richard Sandiford wrote:
>> Ajit Agarwal <aagar...@linux.ibm.com> writes:
>>>>> Thanks a lot. Can I know what should we be doing with neg (fma)
>>>>> correctness failures with load fusion.
>>>>
>>>> I think it would involve:
>>>>
>>>> - describing lxvp and stxvp as unspec patterns, as I mentioned
>>>>   in the previous reply
>>>>
>>>> - making plain movoo split loads and stores into individual
>>>>   lxv and stxvs.  (Or, alternative, it could use lxvp and stxvp,
>>>>   but internally swap the registers after load and before store.)
>>>>   That is, movoo should load the lower-numbered register from the
>>>>   lower address and the higher-numbered register from the higher
>>>>   address, and likewise for stores.
>>>>
>>>
>>> Would you mind elaborating the above.
>>
>> I think movoo should use rs6000_split_multireg_move for all alternatives,
>> like movxo does.  movoo should split into 2 V1TI loads/stores and movxo
>> should split into 4 V1TI loads/stores.  lxvp and stxvp would be
>> independent patterns of the form:
>>
>>   (set ...
>>        (unspec [...] UNSPEC_FOO))
>>
>> ---
>>
>
> In load fusion pass I generate the above pattern for adjacent merge
> pairs.
>
>> rs6000_split_multireg_move has:
>>
>>   /* The __vector_pair and __vector_quad modes are multi-register
>>      modes, so if we have to load or store the registers, we have to be
>>      careful to properly swap them if we're in little endian mode
>>      below.  This means the last register gets the first memory
>>      location.  We also need to be careful of using the right register
>>      numbers if we are splitting XO to OO.  */
>>
>> But I don't see how this can work reliably if we allow the kind of
>> subregs that you want to create here.  The register order is the opposite
>> from the one that GCC expects.
>>
>> This is more a question for the PowerPC maintainers though.
>>
>
> Above unspec pattern generated and modified the movoo pattern to accept
> the above spec it goes through the rs6000_split_multireg_move
> it splits into 2 V1TI loads and generate consecutive loads with sequential
> registers. In load_fusion pass I generate the subreg along with load results
> subreg (reg OO R) 16 and subreg (reg OO R) 0.
>
> But it doesn't generate lxvp instruction. If above unspec instruction
> pattern and write separate pattern in md file to generate lxvp instead of
> normal movoo, then it won't go through rs6000_split_multireg_move

I don't understand the last bit, sorry.  Under the scheme I described,
lxvp should be generated only through an unspec (and no other way).
Same for stxvp.  The fusion pass should generate those unspecs.

If the fusion pass has generated the code correctly, the lxvp unspec
will remain throughout compilation, unless all uses of it are later
deleted as dead.

The movoo rtl pattern should continue to be:

  [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,ZwO,wa")
        (match_operand:OO 1 "input_operand" "ZwO,wa,wa"))]

But movoo should generate individual loads, stores and moves.  By design,
it should never generate lxvp or stxvp.

This means that, if a fused load is spilled, the sequence will be
something like:

  lxvp ...   // original fused load (unspec)
  ...
  stxv ...   // store one half to the stack (split from movoo)
  stxv ...   // store the other half to the stack (split from movoo)

Then insns that use the pair will load whichever half they need
from the stack.

I realise that isn't great, but it should at least be correct.

Thanks,
Richard
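
For illustration only, the ordering rule for a split movoo (lower-numbered
register at the lower address, higher-numbered register at the higher
address) and the spill-then-reload-one-half sequence can be modelled in
plain C.  This is a toy sketch, not GCC or rs6000 code: regfile_t,
split_movoo_load, split_movoo_store and reload_half are hypothetical names,
the 64-entry register file is an assumption, and byte order within each
16-byte half is deliberately ignored.  It does not model the register
order used by lxvp itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { HALF_BYTES = 16, NUM_REGS = 64 };

/* Hypothetical model: a file of 16-byte vector registers. */
typedef struct {
    uint8_t vsr[NUM_REGS][HALF_BYTES];
} regfile_t;

/* Model of a split movoo load: two independent 16-byte loads.
   The lower-numbered register gets the lower address.  */
static void split_movoo_load(regfile_t *rf, int first_reg, const uint8_t *mem)
{
    assert(first_reg >= 0 && first_reg + 1 < NUM_REGS);
    memcpy(rf->vsr[first_reg], mem, HALF_BYTES);
    memcpy(rf->vsr[first_reg + 1], mem + HALF_BYTES, HALF_BYTES);
}

/* Model of a split movoo store (e.g. a spill): two independent
   16-byte stores, again lower-numbered register to lower address.  */
static void split_movoo_store(const regfile_t *rf, int first_reg, uint8_t *mem)
{
    assert(first_reg >= 0 && first_reg + 1 < NUM_REGS);
    memcpy(mem, rf->vsr[first_reg], HALF_BYTES);
    memcpy(mem + HALF_BYTES, rf->vsr[first_reg + 1], HALF_BYTES);
}

/* Model of a later use that needs only one half of the spilled pair:
   half 0 is the low 16 bytes of the slot, half 1 the high 16 bytes.  */
static void reload_half(regfile_t *rf, int dst_reg, const uint8_t *slot, int half)
{
    assert(half == 0 || half == 1);
    assert(dst_reg >= 0 && dst_reg < NUM_REGS);
    memcpy(rf->vsr[dst_reg], slot + half * HALF_BYTES, HALF_BYTES);
}
```

In this model a pair that is loaded, spilled with split_movoo_store and
then reloaded one half at a time always sees the same bytes at the same
offsets, which is the property the split movoo is meant to guarantee.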