https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103976
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-01-11
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed. The kernel is outlined even in the if (0) path, but it is
ultimately executed serially (which is faster than using threads).
The only difference when using if (1) is:
--- a-t.c.244t.optimized0 2022-01-11 14:07:52.152665056 +0100
+++ a-t.c.244t.optimized 2022-01-11 14:07:58.696751625 +0100
@@ -121,7 +121,7 @@
# sum_17 = PHI <sum_10(3), 0.0(2)>
# ivtmp_4 = PHI <ivtmp_3(3), 100000000(2)>
.omp_data_o.1.sum = sum_17;
- __builtin_GOMP_parallel (main._omp_fn.0, &.omp_data_o.1, 1, 0);
+ __builtin_GOMP_parallel (main._omp_fn.0, &.omp_data_o.1, 0, 0);
sum_10 = .omp_data_o.1.sum;
.omp_data_o.1 ={v} {CLOBBER};
ivtmp_3 = ivtmp_4 + 4294967295;
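The diff shows that the if clause only changes the third argument of
__builtin_GOMP_parallel, the requested thread count: 1 for if (0) and 0
(use the runtime default) for if (1). The outlined kernel is called either
way. A minimal serial sketch of that dispatch follows. fake_gomp_parallel,
omp_fn_0, run_region and the trip count 4 are all hypothetical stand-ins
(the real testcase is not shown here), and the fake runtime never creates
threads:

```c
#include <assert.h>

/* Hypothetical mirror of .omp_data_o.1: the shared data block that
   carries the reduction variable into and out of the region. */
struct omp_data {
  double sum;
};

/* Hypothetical outlined kernel standing in for main._omp_fn.0.
   The trip count 4 is an assumption for illustration only. */
static void omp_fn_0(void *arg)
{
  struct omp_data *d = arg;
  double partial = 0.0;            /* this thread's private partial sum */
  for (int j = 1; j <= 4; j++)
    partial += 1. / j;
  d->sum += partial;               /* plain add here; the real kernel
                                      merges partials with atomics */
}

/* Records the num_threads argument of the last dispatch. */
static unsigned last_num_threads;

/* Hypothetical stand-in for libgomp's GOMP_parallel (fn, data,
   num_threads, flags). It never creates threads: fn always runs once
   on the calling thread, i.e. what a team of one amounts to. */
static void fake_gomp_parallel(void (*fn)(void *), void *data,
                               unsigned num_threads, unsigned flags)
{
  (void)flags;
  assert(num_threads <= 1);        /* the only values produced below */
  last_num_threads = num_threads;
  fn(data);
}

/* Lowering of "#pragma omp parallel ... if (cond)" without a
   num_threads clause: request 0 (runtime default) or exactly 1. */
static double run_region(int cond, double sum)
{
  struct omp_data d = { sum };
  fake_gomp_parallel(omp_fn_0, &d, cond ? 0u : 1u, 0u);
  return d.sum;
}
```

In the real lowering a false condition still pays for the call into the
runtime and for the outlined kernel's reduction code; it only avoids
starting extra threads.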
So even in the if (0) case the outlined loop kernel still performs the
workload computation and combines the reduction using atomics. Without
-fopenmp we unroll the kernel and constant-evaluate all 1./j.
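To make the last point concrete, here is a hypothetical reproducer of the
shape the dump suggests: a parallel region inside an outer loop (the dump's
ivtmp counts down from 100000000). The inner trip count INNER = 4 and both
function names are assumptions for illustration. The second function spells
out by hand what unrolling and constant-evaluating 1./j leaves for the
serial build:

```c
#include <assert.h>

#define INNER 4   /* hypothetical inner trip count, small enough to unroll */

/* Hypothetical reproducer shape: the parallel region sits inside an
   outer loop. Compiled without -fopenmp the pragma is ignored and
   this is ordinary serial C. */
double kernel_sum(long outer)
{
  assert(outer >= 0);
  double sum = 0.0;
  for (long i = 0; i < outer; i++) {
#pragma omp parallel for reduction(+:sum) if (0)
    for (int j = 1; j <= INNER; j++)
      sum += 1. / j;
  }
  return sum;
}

/* Hand-written equivalent of the serial build after full unrolling:
   each 1./j is a compile-time constant, so no division is left. The
   additions keep their source order because GCC does not reassociate
   floating-point adds without -ffast-math (or -fassociative-math). */
double kernel_sum_folded(long outer)
{
  assert(outer >= 0);
  double sum = 0.0;
  for (long i = 0; i < outer; i++) {
    sum += 1.0;       /* 1./1 */
    sum += 0.5;       /* 1./2 */
    sum += 1. / 3;    /* constant expression, folded at compile time */
    sum += 0.25;      /* 1./4 */
  }
  return sum;
}
```

Built with -fopenmp, the reduction instead accumulates a private partial
sum and merges it into sum, so the two functions may differ in the last
bits.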