https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68707
--- Comment #23 from alalaw01 at gcc dot gnu.org ---
Yes, difficult. I'm conscious that this is stage 3, and worried about adding too much complexity, especially if we're writing code that we'd eventually drop in favour of a more complete framework later (i.e. in gcc7).

I'm inclined against

> (I wondered
> if load-lanes would require more unrolling we should prefer SLP anyway?).

as we've seen cases where load-lanes requires more unrolling but the code is still much better. Likewise, your argument against

> to query whether _all_ loads need to be permuted with SLP ...
> thus if there is a load node which is not permuted then retain the SLP.

seems convincing. I think the heuristic in comment 16 handles permutation well enough, and beyond that, sharing (rather than permutation) appears to be the critical factor. Unfortunately, as you say, SLP doesn't really handle sharing yet... so

> I fear that to get a better heuristic
> than what is proposed we need to push this for example to
> vect_make_slp_decision where all instances are built

That might be reasonable, but I fear it'd be of dubious benefit without:

> and we'd need to gather some sharing data therein.

I guess if that were a useful step towards

> But then there is only a small step to the point where we could actually
> compare SLP vs. non-SLP costs.

then there is some justification, but the former feels like too much complexity at this stage, especially to do it well. How much do we really want to gather data on the sharing that exists at present, rather than looking at removing that sharing entirely? I'm thinking of e.g. SLP nodes that perform the same computations but with different permutations, too: shouldn't we be aiming to make permutations first-class citizens/operations, and to turn SLP trees into DAGs? Longer-term goals, sure...

So my instinct is to go with the comment 16 patch, and accept that we take the hit in that last testcase (i.e. the one with the sharing).