https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772
rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-01-21
                 CC|                            |rsandifo at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #4 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
To try to summarise a conversation we had on IRC:

As things stand, tree codes like WIDEN_MULT_EXPR are intended to be
code-generated as a hi/lo pair, with both the hi and the lo halves being
vector(N*2) → vector(N) operations (see the intrinsics sketch at the end
of this comment).  This works for BB SLP if the SLP group size is ≥ N*2,
but (as things stand) it is bound to fail otherwise.

On targets that operate on only a single vector size, that hard failure
is not a problem for group sizes < N*2, since we would have failed in
the same place even if we hadn't matched a WIDEN_MULT_EXPR.  But it
hurts on aarch64, where we could instead vectorise the multiplication
and conversions using mixed vector sizes.

I think the conclusion was that:

(1) We should define vector(N) → vector(N) optabs (same lane count,
    twice the element width) for each current widening operation.
    E.g. for the testcase, aarch64 would provide v8qi → v8hi widening
    operations.

(2) We should add directly-mapped internal functions for the new optabs
    (sketched below).

(3) We should make the modifier == NONE paths in vectorizable_conversion
    use the new internal functions for WIDEN_*_EXPRs.
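
To make the contrast concrete, here is a sketch written with AArch64
ACLE intrinsics rather than GCC-internal code (the function names are
invented for the example):

#include <arm_neon.h>

/* Current hi/lo model: each half is a vector(N*2) → vector(N)
   operation (16 x s8 in, 8 x s16 out), so BB SLP needs a group size
   of at least 16 chars.  */
void
widen_mult_hilo (int8x16_t a, int8x16_t b, int16x8_t *lo, int16x8_t *hi)
{
  *lo = vmull_s8 (vget_low_s8 (a), vget_low_s8 (b));  /* SMULL  */
  *hi = vmull_high_s8 (a, b);                         /* SMULL2 */
}

/* Proposed single-vector model: one vector(N) → vector(N) operation
   (v8qi → v8hi), which a group size of only 8 can use directly.  */
int16x8_t
widen_mult_single (int8x8_t a, int8x8_t b)
{
  return vmull_s8 (a, b);  /* SMULL on 64-bit inputs */
}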
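
And for (1) and (2), a rough sketch of what the .def entries might look
like.  All names here are placeholders rather than a decided interface,
and the mode plumbing is glossed over:

/* optabs.def: single-vector widening multiplies, keyed on the input
   mode in the same way as the existing vec_widen_smult_hi/lo optabs.  */
OPTAB_D (vec_widen_smult_optab, "vec_widen_smult_$a")
OPTAB_D (vec_widen_umult_optab, "vec_widen_umult_$a")

/* internal-fn.def: a directly-mapped internal function that chooses
   the signed or unsigned optab from the type of its first argument.  */
DEF_INTERNAL_SIGNED_OPTAB_FN (VEC_WIDEN_MULT, ECF_CONST | ECF_NOTHROW,
                              first, vec_widen_smult, vec_widen_umult,
                              binary)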