On Thu, Apr 10, 2025 at 05:13:12PM +0530, Tejas Belagod wrote:
> Thanks for the explanation.  I looked into why some of the tests may have
> failed - my flawed understanding of the reduction clause was why I didn't
> have the += in the loops - it might have passed for me as I probably hit the
> exact omp_get_num_threads () number required for the final += reductions to
> trigger from the declare clause.  As Richard said, the += in the loops ought
> to fix the tests. I'm still analysing inscan_reduction_incl () to fix it
> properly.

For scan, think about
  scan_a = 0;
#pragma omp parallel for reduction(inscan, +:scan_a)
  for (int i = 0; i < N; i++)
    {
      simd_scan[i] = scan_a;
      #pragma omp scan exclusive(scan_a)
      scan_a += a[i];
    }
and
  scan_a = 0;
#pragma omp parallel for reduction(inscan, +:scan_a)
  for (int i = 0; i < N; i++)
    {
      scan_a += a[i];
      #pragma omp scan inclusive(scan_a)
      simd_scan[i] = scan_a;
    }
The directives arrange for the parallel version to compute the
same thing as the serial loops without the directives (at least
for, say, unsigned types; for signed or floating point types it can
reassociate the additions, or whatever the reduction operation is).
OpenMP effectively breaks the two halves of each loop body apart at
the scan directive and does magic in between.
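
To make that concrete, here is a minimal sketch of both loops wrapped in
functions.  The function names, the parameter names and the choice of
unsigned element type are my assumptions for illustration, not anything
from the testsuite.  Built without -fopenmp the pragmas are ignored and
these are just the serial loops; built with -fopenmp the results should
match exactly, because unsigned addition can be reassociated freely.

```c
/* Exclusive scan: out[i] = a[0] + ... + a[i-1], so out[0] is 0.
   The statement before the scan directive reads the running value,
   the statement after it contributes a[i].  */
void
exclusive_scan (const unsigned *a, unsigned *out, int n)
{
  unsigned scan_a = 0;
#pragma omp parallel for reduction(inscan, +:scan_a)
  for (int i = 0; i < n; i++)
    {
      out[i] = scan_a;
      #pragma omp scan exclusive(scan_a)
      scan_a += a[i];
    }
}

/* Inclusive scan: out[i] = a[0] + ... + a[i].
   Here the contribution comes first and the read comes after the
   scan directive.  */
void
inclusive_scan (const unsigned *a, unsigned *out, int n)
{
  unsigned scan_a = 0;
#pragma omp parallel for reduction(inscan, +:scan_a)
  for (int i = 0; i < n; i++)
    {
      scan_a += a[i];
      #pragma omp scan inclusive(scan_a)
      out[i] = scan_a;
    }
}
```

For a = {1, 2, 3, 4, 5} the exclusive version gives {0, 1, 3, 6, 10} and
the inclusive one {1, 3, 6, 10, 15}, regardless of the number of threads.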

        Jakub
