Re: How to force gcc to vectorize the loop with particular vectorization width

Jakub Jelinek Thu, 19 Oct 2017 02:02:07 -0700

On Thu, Oct 19, 2017 at 10:38:28AM +0200, Richard Biener wrote:
> On Thu, Oct 19, 2017 at 9:22 AM, Denis Bakhvalov <dendib...@gmail.com> wrote:
> > Hello!
> >
> > I have a hot inner loop which was vectorized by gcc, but I also want
> > the compiler to unroll this loop by some factor.
> > It can be controlled in clang with this pragma:
> > #pragma clang loop vectorize(enable) vectorize_width(8)
> > Please see example here:
> > https://godbolt.org/g/UJoUJn
> >
> > So I want to tell gcc something like this:
> > "I want you to vectorize the loop. After that I want you to unroll
> > this vectorized loop by some defined factor."
> >
> > I was playing with #pragma omp simd with the safelen clause, and
> > #pragma GCC optimize("unroll-loops") with no success. Compiler option
> > -fmax-unroll-times is not suitable for me, because it will affect
> > other parts of the code.
> >
> > Is it possible to achieve this somehow?
> 
> No.


#pragma omp simd has a simdlen clause, which is a hint for the preferred
vectorization factor, but the vectorizer doesn't use it so far.
It probably wouldn't be that hard to at least use it as the starting
factor when the target supports several factors and the hint is one of them.
The vectorizer has some support for using wider vectorization factors
if there are mixed-width types within the same loop, so supporting
2x/4x/8x etc. multiples of the normally chosen width might not be
that hard.
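
For illustration, here is a minimal sketch of how the simdlen hint is
spelled in source.  The function scale_add and its parameters are
hypothetical and not from the original question; build with
-fopenmp-simd so that gcc honors the pragma.  As said above, the clause
is only a hint, so the vectorization factor gcc actually picks may differ.

```c
#include <stddef.h>

/* Hypothetical kernel, used only to show where the clause goes. */
void scale_add(float *restrict dst, const float *restrict src,
               float k, size_t n)
{
    /* simdlen(8) asks for 8 lanes per vector iteration; it is a
       preference, not a requirement the compiler must satisfy. */
    #pragma omp simd simdlen(8)
    for (size_t i = 0; i < n; i++)
        dst[i] += k * src[i];
}
```

Without -fopenmp-simd (or -fopenmp) the pragma is simply ignored and the
loop goes through the normal autovectorization path.
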
What we don't have right now is support for using smaller
vectorization factors, which might sometimes be beneficial for -O2
vectorization of loops with mixed-width types.  We always use the vf
derived from the narrowest type.  Say, when using SSE2 and there is a
char type, we try to use a vf of 16, and if there is also an int type,
operations on it take 4x as many instructions.  The alternative would be
to use a vf of 4 and, for operations on char, do something meaningful in
only 1/4 of the vector elements.  The various x86 vector ISAs have
instructions to widen or narrow for conversions.
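
A minimal sketch of the kind of mixed-width loop meant here, assuming
SSE2's 128-bit vectors (16 char elements or 4 int elements per register).
The function accumulate_bytes is hypothetical.  With a vf of 16, one
vector of bytes feeds four vectors of 32-bit adds per vector iteration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical loop mixing 8-bit and 32-bit element types. */
void accumulate_bytes(int32_t *restrict acc,
                      const uint8_t *restrict in, size_t n)
{
    for (size_t i = 0; i < n; i++)
        acc[i] += in[i];  /* each byte is widened to 32 bits before the add */
}
```

Compiling with something like -O3 -msse2 -fopt-info-vec should report
whether the loop was vectorized; the exact wording of the report varies
between gcc versions.
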

In any case, "no" is the right answer right now; we don't have that
implemented.

        Jakub
