On Thu, Apr 10, 2025 at 05:13:12PM +0530, Tejas Belagod wrote:
> Thanks for the explanation. I looked into why some of the tests may have
> failed - my flawed understanding of the reduction clause was why I didn't
> have the += in the loops - it might have passed for me as I probably hit the
> exact omp_get_num_threads () number required for the final += reductions to
> trigger from the declare clause. As Richard said, the += in the loops ought
> to fix the tests. I'm still analysing inscan_reduction_incl () to fix it
> properly.

For scan think about

  scan_a = 0;
  #pragma omp parallel for reduction(inscan, +:scan_a)
  for (int i = 0; i < N; i++)
    {
      simd_scan[i] = scan_a;
      #pragma omp scan exclusive(scan_a)
      scan_a += a[i];
    }

and

  scan_a = 0;
  #pragma omp parallel for reduction(inscan, +:scan_a)
  for (int i = 0; i < N; i++)
    {
      scan_a += a[i];
      #pragma omp scan inclusive(scan_a)
      simd_scan[i] = scan_a;
    }

The directives arrange for the parallel version to compute the same thing
as the serial loops without the directives (at least for unsigned types,
say; for signed or floating point types it can reassociate the additions,
or whatever the reduction operation is).  OpenMP effectively breaks the
two halves of each loop body apart and does magic in between.

	Jakub
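
A minimal sketch of the two loops above wrapped as standalone functions, in case it helps to experiment with them. The function names scan_exclusive and scan_inclusive and their parameter names are illustrative assumptions, not anything from the testsuite under discussion. The element type is unsigned so that the parallel and serial results must agree exactly. Built without -fopenmp the pragmas are ignored and the loops run serially; built with -fopenmp (assuming a compiler that supports inscan reductions) the results should be the same.

```c
#include <stddef.h>

/* Exclusive scan: out[i] receives a[0] + ... + a[i-1], so out[0] is 0.
   The store comes before the scan directive and the update after it.  */
void
scan_exclusive (const unsigned *a, unsigned *out, int n)
{
  unsigned scan_a = 0;
#pragma omp parallel for reduction(inscan, +:scan_a)
  for (int i = 0; i < n; i++)
    {
      out[i] = scan_a;
#pragma omp scan exclusive(scan_a)
      scan_a += a[i];
    }
}

/* Inclusive scan: out[i] receives a[0] + ... + a[i].
   The update comes before the scan directive and the store after it.  */
void
scan_inclusive (const unsigned *a, unsigned *out, int n)
{
  unsigned scan_a = 0;
#pragma omp parallel for reduction(inscan, +:scan_a)
  for (int i = 0; i < n; i++)
    {
      scan_a += a[i];
#pragma omp scan inclusive(scan_a)
      out[i] = scan_a;
    }
}
```

For a = {1, 2, 3, 4, 5}, the exclusive version fills out with {0, 1, 3, 6, 10} and the inclusive version with {1, 3, 6, 10, 15}, which is what the plain serial loops produce.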