http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59249
Bug ID: 59249
Summary: Jump threading makes if-conversion and following vectorization impossible.
Product: gcc
Version: 4.8.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: bmei at broadcom dot com

I am investigating loops that can be vectorized by LLVM but not by GCC. One example is a loop that contains more than one if-else construct.

    typedef signed char int8;
    #define FFT 128

    typedef struct {
        int8 exp[FFT];
    } feq_t;

    void test(feq_t *feq)
    {
        int k;
        int feqMinimum = 15;
        int8 *exp = feq->exp;

        for (k=0;k<FFT;k++) {
            exp[k] -= feqMinimum;
            if(exp[k]<-15) exp[k] = -15;
            if(exp[k]>15) exp[k] = 15;
        }
    }

Compile it with 4.8.2 on x86_64:

    ~/install-4.8/bin/gcc ghs-algorithms_380.c -O2 -fdump-tree-ifcvt-details -ftree-vectorize -save-temps

The loop is not vectorized because the if-else constructs inside it cannot be if-converted. Looking into the .ifcvt dump, this is due to a bad if-else structure (the ifcvt pass complains "only critical predecessors"): one branch jumps directly into another branch. Digging a bit deeper, I found that this structure is generated by the dom1 pass when it performs jump threading. So recompile with:

    ~/install-4.8/bin/gcc ghs-algorithms_380.c -O2 -fdump-tree-ifcvt-details -ftree-vectorize -save-temps -fno-tree-dominator-opts

Now the loop is if-converted and vectorized. The same happens on our target, where performance improves greatly for this example.

It seems to me that jump threading is not a good idea for architectures that support if-conversion. The original if-else structures are damaged so that if-conversion cannot proceed, and neither can vectorization or possibly other optimizations. Should we try to identify such "bad" jump threads and skip them for these architectures?

Andrew Pinski slightly modified the code, and with his version the -fno-tree-dominator-opts trick no longer works:
    #define FFT 128

    typedef struct {
        signed char exp[FFT];
    } feq_t;

    void test(feq_t *feq)
    {
        int k;
        int feqMinimum = 15;
        signed char *exp = feq->exp;

        for (k=0;k<FFT;k++) {
            signed char temp = exp[k] - feqMinimum;
            if(temp<-15) temp = -15;
            if(temp>15) temp = 15;
            exp[k] = temp;
        }
    }

This time it is jump threading in the VRP pass that causes the trouble. With -fno-tree-vrp, the code can be if-converted and vectorized again.
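
To make the control-flow difference concrete, here is a minimal sketch at the source level. It is an illustration under stated assumptions, not output taken from the GCC dumps: clamp_threaded is my guess at what the threaded control flow roughly corresponds to (once temp has been set to -15, the second test is known to be false, so that path skips it), and clamp_selects is a hand-written form using two selects, which is the straight-line shape if-conversion aims for. The function names and check_equivalence are hypothetical helpers introduced only for this example. Whether GCC 4.8.2 vectorizes either variant has not been checked here.

```c
#include <assert.h>

#define FFT 128
typedef struct { signed char exp[FFT]; } feq_t;

/* The modified test case from the report: two independent ifs. */
void clamp_two_ifs(feq_t *feq)
{
    int k;
    signed char *exp = feq->exp;
    for (k = 0; k < FFT; k++) {
        signed char temp = exp[k] - 15;
        if (temp < -15) temp = -15;
        if (temp > 15) temp = 15;
        exp[k] = temp;
    }
}

/* ASSUMPTION: rough source-level picture of the threaded control flow.
   After temp = -15 the second test cannot be true, so it is skipped. */
void clamp_threaded(feq_t *feq)
{
    int k;
    signed char *exp = feq->exp;
    for (k = 0; k < FFT; k++) {
        signed char temp = exp[k] - 15;
        if (temp < -15)
            temp = -15;
        else if (temp > 15)
            temp = 15;
        exp[k] = temp;
    }
}

/* Hand-written select form: each statement picks one of two values. */
void clamp_selects(feq_t *feq)
{
    int k;
    signed char *exp = feq->exp;
    for (k = 0; k < FFT; k++) {
        signed char temp = exp[k] - 15;
        temp = temp < -15 ? -15 : temp;
        temp = temp > 15 ? 15 : temp;
        exp[k] = temp;
    }
}

/* Hypothetical helper: feeds every signed char value (-128..127) through
   all three variants, asserts they agree, and returns the number of
   values compared. */
int check_equivalence(void)
{
    int base, k, checked = 0;
    for (base = -128; base <= 0; base += 128) {
        feq_t a, b, c;
        for (k = 0; k < FFT; k++)
            a.exp[k] = b.exp[k] = c.exp[k] = (signed char)(base + k);
        clamp_two_ifs(&a);
        clamp_threaded(&b);
        clamp_selects(&c);
        for (k = 0; k < FFT; k++) {
            assert(a.exp[k] == b.exp[k]);
            assert(a.exp[k] == c.exp[k]);
            checked++;
        }
    }
    return checked;
}
```

All three variants compute the same result, so any difference in vectorization comes from the shape of the control flow rather than from the semantics. To see how a given compiler treats each form, the variants can be compiled with the same options as above (-O2 -ftree-vectorize -fdump-tree-ifcvt-details) and the .ifcvt dumps compared.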