http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59249

            Bug ID: 59249
           Summary: Jump threading makes if-conversion and following
                    vectorization impossible.
           Product: gcc
           Version: 4.8.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bmei at broadcom dot com

I am investigating loops that can be vectorized by LLVM but not by GCC.
One example is a loop that contains more than one if-else construct.

typedef signed char int8;
#define FFT         128

typedef struct {
    int8   exp[FFT];
} feq_t;

void test(feq_t *feq)
{
    int k;
    int feqMinimum = 15;
    int8 *exp = feq->exp;

    for (k=0;k<FFT;k++) {
        exp[k] -= feqMinimum;
        if(exp[k]<-15) exp[k] = -15;
        if(exp[k]>15) exp[k]  = 15;
    }
}

Compile it with GCC 4.8.2 on x86_64:
~/install-4.8/bin/gcc ghs-algorithms_380.c -O2 -fdump-tree-ifcvt-details
-ftree-vectorize  -save-temps

It is not vectorized because the if-else constructs inside the loop cannot be
if-converted. According to the .ifcvt dump file, this is due to a bad if-else
structure (the ifcvt pass complains "only critical predecessors"): one branch
jumps directly into another branch. Digging a bit deeper, I found that this
structure is generated by the dom1 pass when it performs jump threading.
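
To make the shape concrete, here is a hand-written source-level sketch of the
loop body before and after the second test is bypassed. It is only an
illustration, not taken from the dom1 dump, and the exact CFG that GCC builds
may differ. The names clamp_original, clamp_threaded and check_same_result are
made up for this example.

```c
#include <assert.h>

typedef signed char int8;

/* Source-level shape: two independent if-then constructs in sequence. */
int8 clamp_original(int8 v)
{
    if (v < -15) v = -15;
    if (v > 15)  v = 15;
    return v;
}

/* Sketch of the shape after threading: when the first test is taken,
   v is known to be -15, so the second test is known to be false and
   control jumps straight past it. */
int8 clamp_threaded(int8 v)
{
    if (v < -15) {
        v = -15;
        goto done;      /* threaded edge: skips the second test */
    }
    if (v > 15)
        v = 15;
done:
    return v;
}

/* Both shapes compute the same result for every int8 value. */
void check_same_result(void)
{
    for (int i = -128; i <= 127; i++)
        assert(clamp_original((int8)i) == clamp_threaded((int8)i));
}
```

The result is identical, but the two conditionals are no longer separate
diamonds, which is the kind of structure the ifcvt pass rejects.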

So recompile with 

~/install-4.8/bin/gcc ghs-algorithms_380.c -O2 -fdump-tree-ifcvt-details
-ftree-vectorize  -save-temps -fno-tree-dominator-opts

It is magically if-converted and vectorized! The same happens on our target,
where performance is improved greatly in this example.

It seems to me that doing jump threading on architectures that support
if-conversion is not a good idea. The original if-else structures are damaged
so that if-conversion cannot proceed, and neither can vectorization and
possibly other optimizations. Should we try to identify such "bad" jump
threading opportunities and skip them on such architectures?
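
For reference, the form that if-conversion is aiming for replaces each branch
with a select. The following is a hand-written sketch of that form, not GCC
output. The names clamp_select and test_select are hypothetical, and the
constant 15 stands in for feqMinimum.

```c
typedef signed char int8;
#define FFT         128

typedef struct {
    int8   exp[FFT];
} feq_t;

/* Branch-free clamp: each conditional becomes a select, so the loop
   body that calls it is a single basic block. */
int8 clamp_select(int v)
{
    v = (v < -15) ? -15 : v;
    v = (v > 15)  ?  15 : v;
    return (int8)v;
}

void test_select(feq_t *feq)
{
    int8 *exp = feq->exp;

    for (int k = 0; k < FFT; k++) {
        /* Same narrowing as "exp[k] -= feqMinimum" in the original. */
        int8 t = (int8)(exp[k] - 15);
        exp[k] = clamp_select(t);
    }
}
```

A loop body of this shape has no internal control flow, which is what the
vectorizer needs in order to map the two selects onto vector min/max or blend
operations.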

Andrew Pinski slightly modified the code so that the -fno-tree-dominator-opts
trick no longer works.

#define FFT         128

typedef struct {
    signed char   exp[FFT];
} feq_t;

void test(feq_t *feq)
{
    int k;
    int feqMinimum = 15;
    signed char *exp = feq->exp;

    for (k=0;k<FFT;k++) {
        signed char temp = exp[k] - feqMinimum;
        if(temp<-15) temp = -15;
        if(temp>15) temp  = 15;
        exp[k] = temp;
    }
}

This time, however, it is jump threading in the VRP pass that causes the
trouble. With -fno-tree-vrp, the code can be if-converted and vectorized again.
