https://llvm.org/bugs/show_bug.cgi?id=26454
Bug ID: 26454
Summary: [3.8.0] omp parallel for simd unexpected behaviour at different optimization levels
Product: OpenMP
Version: unspecified
Hardware: PC
OS: FreeBSD
Status: NEW
Severity: normal
Priority: P
Component: Clang Compiler Support
Assignee: unassignedclangb...@nondot.org
Reporter: bugzi...@hannes.hauswedell.net
CC: llvm-bugs@lists.llvm.org
Classification: Unclassified

Created attachment 15815
  --> https://llvm.org/bugs/attachment.cgi?id=15815&action=edit
small benchmark

I have created a little example to compare the vectorization support of clang vs gcc and the possible benefits of combining omp parallel and simd.

These are the results, g++ is 5.3.0 and clang++ is 3.7.1 / 3.8.d20150720_1 (I know, not the most recent snapshot); I have limited OMP_NUM_THREADS to 2, so that we still get a clear picture.

g++5 test.cpp -std=c++14 -fopenmp -O0
auto:                   4.07434
omp parallel for:       2.03428
omp simd:               3.24567
omp parallel for simd:  1.85369

g++5 test.cpp -std=c++14 -fopenmp -O3
auto:                   0.595322
omp parallel for:       0.410147
omp simd:               0.514423
omp parallel for simd:  0.383947

clang++37 test.cpp -std=c++14 -fopenmp -O0
auto:                   2.91202
omp parallel for:       2.44816
omp simd:               2.95256
omp parallel for simd:  1.82498

clang++37 test.cpp -std=c++14 -fopenmp -O3
auto:                   0.619024
omp parallel for:       0.412554
omp simd:               0.593244
omp parallel for simd:  0.403466

clang++-devel test.cpp -std=c++14 -fopenmp -O0
auto:                   2.91251
omp parallel for:       1.72933
omp simd:               2.95548
omp parallel for simd:  2.14271

clang++-devel test.cpp -std=c++14 -fopenmp -O3
auto:                   0.616876
omp parallel for:       0.289257
omp simd:               0.557144
omp parallel for simd:  0.289215

The first observation: clang38 is faster or the same speed as GCC in auto, omp parallel for and omp simd, both with and without optimization. With optimization there is also a significant speed-up of clang38 over clang37 and gcc! Congratulations :)

For "#pragma omp parallel for simd" it is different.
I know this is an OpenMP 4 feature, and at -O3 both clang37 and clang38 correctly warn me:

warning: loop not vectorized: failed explicitly specified loop vectorization [-Wpass-failed]

Hence the speed of "parallel for" and "parallel for simd" is the same on clang37 and clang38.

At -O0, however, there is no warning, which I would consider a bug, BUT the runtime is also different. It is better than simd and worse than parallel, which means it is doing neither and something else instead. That could be another bug... but what is actually happening there?

Thank you for taking the time and providing this excellent compiler!

PS: Is there an ETA for #pragma omp parallel for simd?
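
For readers without access to the attachment, the four variants being timed can be sketched roughly as below. This is only a minimal sketch and not the attached test.cpp: the axpy kernel, the function names and the seconds() helper are all assumptions made for illustration. Built without -fopenmp, the pragmas are ignored and all four functions run as plain loops.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical kernel (not from the attachment): y[i] = a * x[i] + y[i].

// "auto": plain loop, vectorization is left entirely to the optimizer.
void axpy_auto(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
    for (std::ptrdiff_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// "omp parallel for": iterations are split across threads.
void axpy_parallel_for(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// "omp simd": single thread, explicit request to vectorize the loop.
void axpy_simd(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
    #pragma omp simd
    for (std::ptrdiff_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// "omp parallel for simd": split across threads, each chunk vectorized.
void axpy_parallel_for_simd(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
    #pragma omp parallel for simd
    for (std::ptrdiff_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// Hypothetical timing helper: wall-clock seconds taken by one call to f.
template <typename F>
double seconds(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Each variant could then be timed with something like seconds([&] { axpy_parallel_for_simd(a, x, y); }) on large vectors, with OMP_NUM_THREADS=2 set as in the runs above.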