Hi,

On 2021/6/3 06:20, Segher Boessenkool wrote:
> On Wed, Jun 02, 2021 at 03:19:32AM -0500, Xionghu Luo wrote:
>> On P8LE, extra rot64+rot64 load or store instructions are generated
>> in float128 to vector __int128 conversion.
>>
>> This patch teaches pass swaps to also handle such patterns to remove
>> extra swap instructions.
>
> Did you check if this is already handled by simplify-rtx if the mode had
> been TImode (not V1TImode)?  If not, why do you not handle it there?
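For reference, the reason a rot64+rot64 pair can be dropped: rotating a 128-bit value by 64 bits just exchanges its two 64-bit doublewords, so doing it twice is the identity. A minimal C sketch of that arithmetic, assuming a GCC/Clang host with `unsigned __int128`; `rotl128`, `swap_dw` and `check_rot64_is_involution` are hypothetical helpers for illustration, not GCC internals:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rotate a 128-bit value left by n bits, 0 < n < 128.  */
static unsigned __int128 rotl128 (unsigned __int128 x, unsigned n)
{
  return (x << n) | (x >> (128 - n));
}

/* Hypothetical helper: exchange the high and low 64-bit doublewords.  */
static unsigned __int128 swap_dw (unsigned __int128 x)
{
  uint64_t hi = (uint64_t) (x >> 64);
  uint64_t lo = (uint64_t) x;
  return ((unsigned __int128) lo << 64) | hi;
}

/* Rotating by 64 equals a doubleword swap, and two such rotates cancel.  */
static void check_rot64_is_involution (unsigned __int128 x)
{
  assert (rotl128 (x, 64) == swap_dw (x));
  assert (rotl128 (rotl128 (x, 64), 64) == x);
}
```

This only models the value-level identity; it says nothing about which pass is the right place to remove the pair.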
I tried to do it in combine or as a peephole, but a later pass (split2 or
split3) still splits it back into rotate + rotate, because we have a split
after reload.  This pattern is also quite P8LE-specific, so I put it in pass
swaps.  simplify-rtx already simplifies
r124:KF#0=r123:KF#0<-<0x40<-<0x40 to r124:KF#0=r123:KF#0 for register
operations.

vsx.md:

;; The post-reload split requires that we re-permute the source
;; register in case it is still live.
(define_split
  [(set (match_operand:VSX_LE_128 0 "memory_operand")
        (match_operand:VSX_LE_128 1 "vsx_register_operand"))]
  "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)"
  [(const_int 0)]
{
  rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
  rs6000_emit_le_vsx_permute (operands[0], operands[1], <MODE>mode);
  rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
  DONE;
})

Thanks,
Xionghu
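A simplified C model of what the post-reload split above does, assuming each rs6000_emit_le_vsx_permute call on a register amounts to a doubleword swap, and that the middle call (with a memory destination) is a pre-P9 little-endian store that writes the doublewords in swapped order. The type `vreg` and the functions `permute`, `store_swapping`, `split_store` and `check_split_store` are hypothetical names for illustration only, not GCC code:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of a 128-bit VSX register as two 64-bit doublewords.  */
typedef struct { uint64_t dw[2]; } vreg;

/* Assumed effect of rs6000_emit_le_vsx_permute on a register: swap the
   two doublewords.  */
static vreg permute (vreg v)
{
  vreg r = { { v.dw[1], v.dw[0] } };
  return r;
}

/* Assumed effect of the permuting pre-P9 LE store: the doublewords land in
   memory in swapped order.  */
static void store_swapping (uint64_t mem[2], vreg v)
{
  mem[0] = v.dw[1];
  mem[1] = v.dw[0];
}

/* Mirrors the three calls in the define_split body.  */
static void split_store (uint64_t mem[2], vreg *src)
{
  *src = permute (*src);       /* permute the source in place */
  store_swapping (mem, *src);  /* store; the store itself swaps again */
  *src = permute (*src);       /* re-permute the source in case it is live */
}

/* Memory gets the original value and the source register is restored.  */
static void check_split_store (vreg v)
{
  uint64_t mem[2];
  vreg src = v;
  split_store (mem, &src);
  assert (mem[0] == v.dw[0] && mem[1] == v.dw[1]);
  assert (src.dw[0] == v.dw[0] && src.dw[1] == v.dw[1]);
}
```

Under these assumptions the third permute exists only to restore the source register, matching the vsx.md comment about the source possibly still being live.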