On Fri, Aug 13, 2021 at 09:42:00AM +0800, Hongtao Liu wrote:
> > So, I wonder if your new routine shouldn't be instead done after
> > in ix86_expand_vec_perm_const_1 after vec_perm_1 among other 2 insn cases
> > and handle the other vpmovdw etc. cases in combine splitters (see that we
> > only use low half or quarter of the result and transform whatever
> > permutation we've used into what we want).
> >
> Got it, i'll try that way.

Note, IMHO the ultimate fix would be to add real support for the
__builtin_shufflevector -1 indices (meaning "I don't care what will be in
that element", perhaps narrowed down to an implementation choice of any
element of the input vector(s) or 0).

As VEC_PERM_EXPR is currently used both for perms by a variable
permutation vector and for constant ones, I think we'd need to introduce
VEC_PERM_CONST_EXPR, which would be exactly like VEC_PERM_EXPR, except
that the last operand would be required to be a VECTOR_CST and that an
all-ones element would mean something different, the "I don't care"
behavior.

The GIMPLE side would be fairly easy, except that there should be some
optimizations eventually, like when only a certain subset of the elements
of a vector is used later, we can mark the other elements as don't care.

The hard part would be backend expansion, especially x86.
I guess we could easily canonicalize VEC_PERM_EXPR with constant
permutations into VEC_PERM_CONST_EXPR by replacing all-ones elements with
their value modulo the number of elements (or twice that for 2 operand
perms), but then in all the routines that recognize something we'd need
to special case the unknown elements to match anything during testing and
for expansion replace them by something that would match.
That is again a lot of work, but not extremely hard.

The hardest would be to deal with expand_vec_perm_1 handling many cases
by trying to recog an instruction.  Either we'd need to represent the
unknown case by a magic CONST_INT_WILDCARD or CONST_INT_RANGE that recog,
with the help of the patterns, would replace by some CONST_INT that
matches it (but note we have all those const_0_to_N_operand predicates
and, in the insn conditions, checks like
(INTVAL (operands[3]) & 1) == 0 and
INTVAL (operands[3]) + 1 == INTVAL (operands[4]) etc.), or we'd need to
either manually or semi-automatically build some code that would try to
guess the right values for the unknown elements before trying to recog
it.

	Jakub
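
The "match anything during testing, replace by something that would match
for expansion" step described above can be illustrated with a small
standalone sketch in plain C.  This is not GCC code and does not use any
GCC internals: the names PERM_DONT_CARE, perm_matches and perm_resolve
are hypothetical, and a permutation is assumed to be just an array of
int selector indices in which -1 marks a don't-care element.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker for a selector element whose value is irrelevant
   (the -1 index of __builtin_shufflevector).  */
#define PERM_DONT_CARE (-1)

/* Testing phase: return true if the requested selector WANT is
   compatible with the fixed selector INSN that some instruction
   implements.  Don't-care elements of WANT match any value.  */
static bool
perm_matches (const int *want, const int *insn, size_t nelt)
{
  for (size_t i = 0; i < nelt; i++)
    if (want[i] != PERM_DONT_CARE && want[i] != insn[i])
      return false;
  return true;
}

/* Expansion phase: if WANT is compatible with INSN, store into OUT a
   fully specified selector in which every don't-care element has been
   replaced by the value INSN uses at that position.  Return false and
   leave OUT untouched if the selectors are incompatible.  */
static bool
perm_resolve (const int *want, const int *insn, int *out, size_t nelt)
{
  if (!perm_matches (want, insn, nelt))
    return false;
  for (size_t i = 0; i < nelt; i++)
    out[i] = want[i] == PERM_DONT_CARE ? insn[i] : want[i];
  return true;
}
```

For instance, a request for { 0, 2, 4, 6, -1, -1, -1, -1 } (only the low
half of the result is used) is compatible with an "even elements"
selector { 0, 2, 4, 6, 8, 10, 12, 14 } and resolves to exactly that
selector, while it is rejected against an "odd elements" selector.  The
sketch only covers patterns with one fixed selector; patterns whose
operands are constrained by conditions relating several selector values
would need the guessing step mentioned above.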