On 4/10/25 5:29 PM, Jakub Jelinek wrote:
On Thu, Apr 10, 2025 at 05:13:12PM +0530, Tejas Belagod wrote:
Thanks for the explanation.  I looked into why some of the tests may have
failed - my flawed understanding of the reduction clause was why I didn't
have the += in the loops - it might have passed for me as I probably hit the
exact omp_get_num_threads () number required for the final += reductions to
trigger from the declare clause.  As Richard said, the += in the loops ought
to fix the tests. I'm still analysing inscan_reduction_incl () to fix it
properly.

For scan think about
   scan_a = 0;
#pragma omp parallel for reduction(inscan, +:scan_a)
   for (int i = 0; i < N; i++)
     {
       simd_scan[i] = scan_a;
       #pragma omp scan exclusive(scan_a)
       scan_a += a[i];
     }
and
   scan_a = 0;
#pragma omp parallel for reduction(inscan, +:scan_a)
   for (int i = 0; i < N; i++)
     {
       scan_a += a[i];
       #pragma omp scan inclusive(scan_a)
       simd_scan[i] = scan_a;
     }
The directives arrange for the parallel version to compute the
same thing as the serial loops without the directives (at least
say for unsigned types; for signed or floating point it can
reassociate the additions, or whatever the reduction operation is).
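
To make the "same thing as the serial loops" point concrete, here is a minimal sketch of a self-checking test, assuming unsigned data so that reassociation cannot change the result. The names scan_exclusive, scan_inclusive, check_scans and the size N are made up for illustration and are not from the testsuite. Built without -fopenmp the pragmas are ignored and the loops run serially; built with -fopenmp the results should be the same.

```c
#include <assert.h>

#define N 64	/* hypothetical test size */

/* Exclusive scan: out[i] = a[0] + ... + a[i-1], so out[0] == 0.  */
void
scan_exclusive (const unsigned *a, unsigned *out, int n)
{
  unsigned scan_a = 0;
#pragma omp parallel for reduction(inscan, +:scan_a)
  for (int i = 0; i < n; i++)
    {
      out[i] = scan_a;
      #pragma omp scan exclusive(scan_a)
      scan_a += a[i];
    }
}

/* Inclusive scan: out[i] = a[0] + ... + a[i].  */
void
scan_inclusive (const unsigned *a, unsigned *out, int n)
{
  unsigned scan_a = 0;
#pragma omp parallel for reduction(inscan, +:scan_a)
  for (int i = 0; i < n; i++)
    {
      scan_a += a[i];
      #pragma omp scan inclusive(scan_a)
      out[i] = scan_a;
    }
}

/* Compare both annotated loops against a plain running sum.  */
void
check_scans (void)
{
  unsigned a[N], incl[N], excl[N];
  for (int i = 0; i < N; i++)
    a[i] = 2u * i + 1u;

  scan_exclusive (a, excl, N);
  scan_inclusive (a, incl, N);

  unsigned sum = 0;
  for (int i = 0; i < N; i++)
    {
      assert (excl[i] == sum);	/* sum of a[0..i-1] */
      sum += a[i];
      assert (incl[i] == sum);	/* sum of a[0..i] */
    }
}
```

Calling check_scans () from main is enough to exercise both forms. Note the += sits inside the loop body in each case, on the input side of the scan directive.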

Thanks for the explanations. It's easier to write tests when I think about it this way - it is less confusing.

OpenMP effectively breaks those halves of the loop bodies apart and
does magic in between.

So the magic in the middle is about applying the reduction up to the required iteration (across threads etc.), so that the right value is there when it is read in the other part of the loop?
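
A minimal sketch of one way to picture that middle step, written as plain serial C. This is only a mental model under stated assumptions, not how GCC or libgomp implement it: NCHUNKS stands in for the number of threads, and two_pass_inclusive, chunk_bounds and check_two_pass are made-up names. Each chunk first runs only the input half to get its local total, the totals are prefix-summed across chunks, and then each chunk reruns with the right starting value so the other half reads the correct scan_a.

```c
#include <assert.h>

#define NCHUNKS 4	/* hypothetical stand-in for the thread count */

/* Iteration range [*lo, *hi) owned by chunk T.  */
static void
chunk_bounds (int t, int n, int *lo, int *hi)
{
  int chunk = (n + NCHUNKS - 1) / NCHUNKS;
  *lo = t * chunk;
  *hi = *lo + chunk < n ? *lo + chunk : n;
}

void
two_pass_inclusive (const unsigned *a, unsigned *out, int n)
{
  unsigned start[NCHUNKS];
  int lo, hi;

  /* Pass 1: input half only, one local total per chunk.  */
  for (int t = 0; t < NCHUNKS; t++)
    {
      unsigned s = 0;
      chunk_bounds (t, n, &lo, &hi);
      for (int i = lo; i < hi; i++)
	s += a[i];
      start[t] = s;
    }

  /* The middle step: turn the totals into each chunk's start value.  */
  unsigned offset = 0;
  for (int t = 0; t < NCHUNKS; t++)
    {
      unsigned s = start[t];
      start[t] = offset;
      offset += s;
    }

  /* Pass 2: both halves, starting from the combined value.  */
  for (int t = 0; t < NCHUNKS; t++)
    {
      unsigned scan_a = start[t];
      chunk_bounds (t, n, &lo, &hi);
      for (int i = lo; i < hi; i++)
	{
	  scan_a += a[i];	/* input half */
	  out[i] = scan_a;	/* scan half */
	}
    }
}

/* The chunked version must agree with a plain running sum.  */
void
check_two_pass (void)
{
  unsigned a[10], out[10], sum = 0;
  for (int i = 0; i < 10; i++)
    a[i] = i + 1u;
  two_pass_inclusive (a, out, 10);
  for (int i = 0; i < 10; i++)
    {
      sum += a[i];
      assert (out[i] == sum);
    }
}
```

In this picture the chunks in each pass are independent of one another, which is what would let them run on different threads; only the short middle loop needs the totals from every chunk.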

Thanks,
Tejas.
