On Thu, Jun 10, 2021 at 03:11:08PM +0800, Xionghu Luo wrote:
> On 2021/6/10 00:24, Segher Boessenkool wrote:
> >>   "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR
> >>    && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)"
> >>   [(const_int 0)]
> >> {
> >>   rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
> >>   rs6000_emit_le_vsx_permute (operands[0], operands[1], <MODE>mode);
> >>   rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
> >>   DONE;
> >> })
> >
> > So it seems like it is only 3 insns in the very unlucky case?  Normally
> > it will end up as just one simple store?
>
> I am afraid there is not "simple store" for *TImode on P8LE*.  There is only
> stxvd2x that rotates the element(stvx requires memory to be aligned, not
> suitable pattern), so every vsx_le_perm_store_v1ti must be split to 3
> instructions for alternative 0, it seems incorrect to force the cost to be 4.

Often it could be done as just two insns though?  If the value stored is
not used elsewhere?

So we could make the first alternative cost 8 then as well, which will
also work out for combine, right?

Alternatively we could have what is now the second alternative be the
first, if that is realistic -- that one already has cost 8 (it is just
two machine instructions).


Segher