https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772

rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-01-21
                 CC|                            |rsandifo at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #4 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
To try to summarise a conversation we had on IRC:

As things stand, codes like WIDEN_MULT_EXPR are intended
to be code-generated as a hi/lo pair, with the hi and lo
halves each being a vector(N*2) → vector(N) operation.
This works for BB SLP if the SLP group size is ≥ N*2,
but (as things stand) is bound to fail otherwise.
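
For concreteness, a minimal sketch of a kernel with this
shape (illustrative only, not necessarily the exact testcase
from this PR):

  void
  f (short *restrict out, signed char *restrict a,
     signed char *restrict b)
  {
    for (int i = 0; i < 16; i++)
      out[i] = a[i] * b[i];  /* recognised as WIDEN_MULT_EXPR */
  }

With 128-bit vectors on aarch64 this becomes an smull/smull2
pair: each half is a vector(16) char → vector(8) short
operation, i.e. vector(N*2) → vector(N) with N = 8.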

On targets that operate on only a single vector size,
a hard failure is not a problem for group sizes < N*2,
since we would have failed in the same place even if
we hadn't matched a WIDEN_MULT_EXPR.  But it hurts on
aarch64 because we could vectorise the multiplication
and conversions using mixed vector sizes.
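
To illustrate (hypothetical example, again not the PR's
testcase): with 128-bit vectors, N = 8 for short, so the
hi/lo scheme needs a group of at least 16 multiplications.
A BB SLP group of 8 is below that and currently fails hard,
even though a single v8qi → v8hi widening multiply (smull
on 64-bit inputs) would cover it:

  void
  g (short *restrict out, signed char *restrict a,
     signed char *restrict b)
  {
    out[0] = a[0] * b[0];
    out[1] = a[1] * b[1];
    out[2] = a[2] * b[2];
    out[3] = a[3] * b[3];
    out[4] = a[4] * b[4];
    out[5] = a[5] * b[5];
    out[6] = a[6] * b[6];
    out[7] = a[7] * b[7];
  }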

I think the conclusion was that:

(1) We should define vector(N) → vector(N) optabs for
    each current widening operation (see the sketch after
    this list).  E.g. for the testcase, aarch64 would
    provide v8qi → v8hi widening operations.

(2) We should add directly-mapped internal functions for the new optabs.

(3) We should make the modifier==NONE paths in vectorizable_conversion
    use the new internal functions for WIDEN_*_EXPRs.
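
As a very rough sketch of how (1)-(3) might fit together
(all names below are hypothetical placeholders, not
necessarily what will be committed), following the existing
DEF_INTERNAL_SIGNED_OPTAB_FN idiom in internal-fn.def:

  /* (1)+(2): a directly-mapped internal function tied to a
     signed/unsigned pair of new single-vector widening optabs
     (hypothetical optab names).  */
  DEF_INTERNAL_SIGNED_OPTAB_FN (VEC_WIDEN_MULT, ECF_CONST | ECF_NOTHROW,
                                first, vec_widen_smul, vec_widen_umul,
                                binary)

  /* (3): the modifier == NONE path in vectorizable_conversion
     would then emit a call along these lines (GIMPLE,
     illustrative), with a v8hi result and v8qi operands:

       v8hi_out = .VEC_WIDEN_MULT (v8qi_a, v8qi_b);  */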
