[Bug target/110625] [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

hliu at amperecomputing dot com via Gcc-bugs Tue, 18 Jul 2023 19:57:57 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625


--- Comment #6 from Hao Liu <hliu at amperecomputing dot com> ---
Thanks for the confirmation about the reduction latency.  I'll create a simple
patch to fix this.

> Discounting the loads, we do have 15 general operations.

That's true, and there are indeed 8 general operations for the scalar loop.
Since count_ops() is accurate, it seems the cost of the vector body may be
too large (Vector inside of loop cost: 51):

    *k_48 4 times vec_perm costs 12 in body
    *k_48 1 times unaligned_load (misalign -1) costs 4 in body
    _5->m1 1 times vec_perm costs 3 in body
    _5->m4 1 times unaligned_load (misalign -1) costs 4 in body
    (int) _24 2 times vec_promote_demote costs 4 in body
    (double) _25 4 times vec_promote_demote costs 8 in body
    _2 * _26 4 times vector_stmt costs 8 in body
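
As a cross-check, the cost lines quoted above can be tallied by kind.  This is
only a minimal sketch, assuming every dump line has exactly the shape shown
here; LINE_RE and tally() are hypothetical helper names and are not part of
GCC.  The quoted lines add up to 43, so they do not by themselves account for
the full reported cost of 51.

```python
import re

# Cost lines copied verbatim from the vectorizer dump quoted above.
DUMP = """\
*k_48 4 times vec_perm costs 12 in body
*k_48 1 times unaligned_load (misalign -1) costs 4 in body
_5->m1 1 times vec_perm costs 3 in body
_5->m4 1 times unaligned_load (misalign -1) costs 4 in body
(int) _24 2 times vec_promote_demote costs 4 in body
(double) _25 4 times vec_promote_demote costs 8 in body
_2 * _26 4 times vector_stmt costs 8 in body
"""

# Assumed line shape: "<stmt> <count> times <kind> [(misalign N)] costs <cost> in body".
LINE_RE = re.compile(
    r"^(?P<stmt>.+) (?P<count>\d+) times (?P<kind>\w+)"
    r"(?: \(misalign -?\d+\))? costs (?P<cost>\d+) in body$"
)


def tally(dump):
    """Sum the 'costs N' field of each recognised line, grouped by cost kind."""
    totals = {}
    for line in dump.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that does not look like a cost line
        kind = match.group("kind")
        totals[kind] = totals.get(kind, 0) + int(match.group("cost"))
    return totals


totals = tally(DUMP)
listed_total = sum(totals.values())

for kind, cost in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{kind:20s} {cost}")
print(f"{'listed total':20s} {listed_total}")
```

Grouped this way, the permutes and the promote/demote conversions make up most
of the quoted cost.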

If the vector body cost were small enough, SLP would still be profitable even
after the cost is increased according to the issue-info.  I'm not very
familiar with this part, and a change here may affect all AArch64 targets, so
I think it would be hard for me to fix.  It would be great if you could look
at how to fix this.
