[It looks like I missed hitting the send button on this response]

>
> Seems to be one instruction shorter at least ;-) Yes, there can be much
> worse regressions than that because of the patch (like 40 instructions
> instead of 4, in the x86 backend).
If this is replacing 4 instructions with 40 in the x86 backend, maybe
someone will notice :)

It is not a win in this particular testcase, because the compiler replaces
2 constant permutes (that's about 4 cycles) with a load from the constant
pool and a generic permute, and in addition pollutes the icache with guff
in the constant pool. If you fold 3-4 permutes into a single one then it
might be a win, but not until that point.

> with a-b without first asking the backend whether it might be more
> efficient. One permutation is better than 2.
> It just happens that the range
> of possible permutations is too large (and the basic instructions are too
> strange) for backends to do a good job on them, and thus keeping toplevel
> input as a hint is important.

Of course, the problem here is this change of semantics with the hook
TARGET_VEC_PERM_CONST_OK. Targets were expanding to generic permutes with
constants in the *absence* of being able to deal with them with the
specialized permutes. fwprop will now leave us at a point where each target
has to grow more knowledge about how best to expand a generic constant
permute into a sequence of permutes, rather than just using the generic
permute.

Generating a sequence of permutes from a single constant permute will be a
harder problem than (say) dealing with immediate expansions, so you are
pushing more complexity into the target. In the short term, target
maintainers should definitely get a heads-up that folks using vector
permute intrinsics could actually see performance regressions on their
targets.

Thanks,
Ramana
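P.S. To make the "several constant permutes folded into a single one"
point concrete, here is a minimal sketch in plain C. It is not GCC
internals: it assumes single-input permutes where out[i] = in[mask[i]],
and the names permute, compose_masks and fold_matches_sequence are
hypothetical, picked only for illustration.

```c
#include <assert.h>

#define N 4

/* Generic single-input permute (assumed semantics): out[i] = in[mask[i]]. */
static void permute(const int *in, const unsigned *mask, int *out)
{
    for (int i = 0; i < N; i++) {
        assert(mask[i] < N);
        out[i] = in[mask[i]];
    }
}

/* Fold two constant masks into one.  Applying m1 and then m2 gives
   r[i] = in[m1[m2[i]]], so the folded mask is m[i] = m1[m2[i]]. */
static void compose_masks(const unsigned *m1, const unsigned *m2, unsigned *m)
{
    for (int i = 0; i < N; i++) {
        assert(m2[i] < N);
        m[i] = m1[m2[i]];
    }
}

/* Returns 1 if the folded permute matches the two-step sequence. */
static int fold_matches_sequence(void)
{
    const int in[N] = { 10, 20, 30, 40 };
    const unsigned m1[N] = { 1, 0, 3, 2 };   /* swap within pairs */
    const unsigned m2[N] = { 2, 3, 0, 1 };   /* swap the two halves */
    unsigned m[N];
    int step1[N], two_step[N], folded[N];

    permute(in, m1, step1);        /* first constant permute */
    permute(step1, m2, two_step);  /* second constant permute */

    compose_masks(m1, m2, m);      /* the single folded mask */
    permute(in, m, folded);

    for (int i = 0; i < N; i++)
        if (two_step[i] != folded[i])
            return 0;
    return 1;
}
```

This only shows that the folded mask always exists. Whether a target can
expand that folded mask as cheaply as the original sequence is exactly the
open question above.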