https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117270

--- Comment #4 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Richard Sandiford <rsand...@gcc.gnu.org>:

https://gcc.gnu.org/g:98b2009b8768f8790dff9edbe00742bcdf2b7482

commit r15-7254-g98b2009b8768f8790dff9edbe00742bcdf2b7482
Author: Richard Sandiford <richard.sandif...@arm.com>
Date:   Tue Jan 28 14:45:11 2025 +0000

    vect: Fix permutation counting in VLA-friendly path [PR117270]

    vectorizable_slp_permutation_1 has two ways of generating the
    permutations: one that looks for repeating patterns and one that
    calculates the permutation index for every output element individually.
    The former works for VLA and VLS whereas the latter only works for VLS.

    There are two justifications for using the repeating code for VLS:
    it gives more testing coverage, and it should reduce the analysis
    overhead for common cases.  This PR kind-of demonstrates both:
    the VLS coverage was showing a bug in the analysis shortcut.

    The bug seems to go back to g:ab7e60cec1a6, which added the
    repeating_p path.  It generated N copies of the permutation vector
    in the repeating case, but didn't multiply the number of permutation
    instructions for costing purposes by N.  So we seem to have been
    undercounting ncopies>1 permutations all this time...

    The problem became more visible with g:8157f3f2d211, which extended
    the repeating code to handle more cases.

    In the patch, I think noutputs is in practice always a multiple
    of unpack_factor, but it seemed more future-proof to handle the
    general case.

    gcc/
            PR tree-optimization/117270
            * tree-vect-slp.cc (vectorizable_slp_permutation_1): Make nperms
            account for the number of times that each permutation will be used
            during transformation.
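
The counting issue described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the code in tree-vect-slp.cc: the names needs_perm, count_once and count_per_use are hypothetical, and the model assumes that output vector j reuses mask j % nmasks of a repeating pattern. It contrasts counting each distinct permutation once with counting it once per use, including the case where the number of outputs is not a multiple of the pattern length.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model, NOT the GCC implementation.
// needs_perm[i] is true when mask i of one repeating pattern requires a
// real permute instruction (false models a mask that needs none).
// Assumption: output vector j reuses mask j % needs_perm.size().

// Counts each distinct permutation once, regardless of how many output
// vectors reuse it. This models the undercounting described in the message.
unsigned count_once(const std::vector<bool> &needs_perm) {
  unsigned n = 0;
  for (bool p : needs_perm)
    n += p ? 1u : 0u;
  return n;
}

// Counts each permutation once per output vector that uses it.
unsigned count_per_use(const std::vector<bool> &needs_perm, unsigned noutputs) {
  assert(!needs_perm.empty());
  const unsigned nmasks = static_cast<unsigned>(needs_perm.size());
  unsigned n = 0;
  for (unsigned i = 0; i < nmasks && i < noutputs; ++i) {
    if (needs_perm[i]) {
      // Mask i is used by outputs i, i + nmasks, i + 2 * nmasks, ...
      // That is ceil((noutputs - i) / nmasks) uses in total.
      n += (noutputs - i + nmasks - 1) / nmasks;
    }
  }
  return n;
}
```

With one costly mask in a pattern of two and eight outputs, count_once reports 1 while count_per_use reports 4. The ceiling division keeps the count correct when noutputs is not a multiple of the pattern length, which is the general case the message mentions handling.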
