On Wed, Apr 09, 2025 at 04:01:49PM +0530, Tejas Belagod wrote:
> > It also looks like there might be a missing "+" in simd_reduction:
> > 
> >    #pragma omp simd reduction (+:va, i)
> >    for (j = 0; j < 16; j++)
> >      va = svld1_s32 (svptrue_b32 (), a);
> > 
> >    res = svaddv_s32 (svptrue_b32 (), va);
> > 
> >    if (res != 8)
> >      __builtin_abort ();
> > 
> > since AFAICT the loop is not doing a reduction as things stand.
> > But perhaps that's deliberate, since it does match the != 8 test.
> 
> That's interesting. I thought the reduction definition in the
> 'declare reduction' does the reduction from all the individual iterations
> according to the rules defined in the reduction irrespective of the loop
> structure.  Maybe 'va' doesn't become implicit private and causes a race
> (which may be why I didn't see it in my testing) - I'll try to repro this
> and have a look.

No, reduction privatizes the variable, initializes the private variable with
the initializer from the UDR and, only at the end, reduces all the private
variables into the original one.
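
As a minimal sketch of those three steps with a UDR on a plain struct
(struct acc, the reduction name merge and accumulate () are hypothetical
names used only for illustration, not taken from the testcase under review):

```c
#include <assert.h>

/* Hypothetical two-field accumulator, for illustration only.  */
struct acc { long sum; long count; };

/* The combiner merges one private copy (omp_in) into another copy or the
   original (omp_out); the initializer sets up each private copy.  */
#pragma omp declare reduction (merge : struct acc : \
    omp_out.sum += omp_in.sum, omp_out.count += omp_in.count) \
  initializer (omp_priv = { 0, 0 })

struct acc
accumulate (const int *a, int n)
{
  struct acc r = { 0, 0 };

  assert (n >= 0);
#pragma omp parallel for reduction (merge : r)
  for (int i = 0; i < n; ++i)
    {
      /* Inside the loop, r refers to the private copy.  The body has to
         accumulate into it itself; the combiner is not run here.  */
      r.sum += a[i];
      r.count++;
    }
  /* After the construct, the private copies have been combined into r.  */
  return r;
}
```

Integer fields are used so that the result does not depend on the order in
which the private copies are combined.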

For simd, each SIMD lane has one private copy and there are as many
reductions into the original as there are SIMD lanes; for e.g. worksharing
constructs, each thread has a private copy and there are
omp_get_num_threads () reductions, etc.
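
The per-lane behaviour can be emulated in plain C.  This is only a sketch
under assumptions: LANES, the i % LANES mapping of iterations to lanes and
emulate_simd_reduction () are all made up for illustration, and a real
implementation picks the lane count and mapping itself.  It shows that the
body only ever touches its lane's private copy, so a body that assigns
rather than accumulates (as in the simd_reduction loop quoted above) gives
a result that depends on the number of lanes:

```c
#include <assert.h>

enum { LANES = 4 };  /* assumed lane count, chosen for illustration */

/* Emulate "#pragma omp simd reduction (+:x)" over n iterations.
   If accumulate is nonzero the loop body is "x += a[i]",
   otherwise it is "x = a[i]".  */
long
emulate_simd_reduction (const int *a, int n, int accumulate)
{
  long orig = 0;  /* the original variable */
  long priv[LANES];

  assert (n % LANES == 0);

  /* Each lane's private copy starts from the initializer for '+'.  */
  for (int l = 0; l < LANES; ++l)
    priv[l] = 0;

  for (int i = 0; i < n; ++i)
    {
      int l = i % LANES;  /* assumed mapping of iterations to lanes */
      if (accumulate)
        priv[l] += a[i];
      else
        priv[l] = a[i];
    }

  /* One combine per lane, and only after the whole loop.  */
  for (int l = 0; l < LANES; ++l)
    orig += priv[l];
  return orig;
}
```

With a[i] = i + 1 for 16 elements, the accumulating body yields 136 (the
full sum), while the assigning body yields 13 + 14 + 15 + 16 = 58 under
this 4-lane assumption, and would yield 16 if the pragma were ignored.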

There is no special action at the end of each loop body; it is up to the
user to merge state from each iteration.  In sane code the loop body does
a similar operation to what the reduction does.
So say if you have
float sum = 0;
#pragma omp parallel for reduction (+:sum)
for (int i = 0; i < 1024; ++i)
  sum += a[i];
then if OpenMP pragmas are ignored, all the array members
are summed up in index order, while if say there are 4
threads and each handles 256 iterations, then each thread will
start with sum = 0; and do sum += a[i]; for i from omp_get_thread_num () * 256
to omp_get_thread_num () * 256 + 255 inclusive and finally, in some
unspecified order, the 4 private floats will be summed together.
For floating point that can result in different behavior (different rounding
etc.), but by using the reduction clause the user has said that is OK.
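
The two summation orders can be written out by hand.  This is a sketch
only: serial_sum () and chunked_sum () are hypothetical helpers, the
contiguous equal-sized chunks correspond to the "each thread handles 256
iterations" assumption above, and the private sums are combined in thread
order here, whereas a real runtime may combine them in any order:

```c
#include <assert.h>

/* What the loop computes when the OpenMP pragma is ignored.  */
float
serial_sum (const float *a, int n)
{
  float sum = 0;
  for (int i = 0; i < n; ++i)
    sum += a[i];
  return sum;
}

/* Emulation of reduction (+:sum) with nthreads threads, each handling one
   contiguous chunk of n / nthreads iterations.  */
float
chunked_sum (const float *a, int n, int nthreads)
{
  float sum = 0;  /* the original variable */

  assert (nthreads > 0 && n % nthreads == 0);
  int chunk = n / nthreads;

  for (int t = 0; t < nthreads; ++t)
    {
      float priv = 0;  /* private copy, starts from the '+' initializer */
      for (int i = t * chunk; i < (t + 1) * chunk; ++i)
        priv += a[i];
      /* Combine into the original.  Done in thread order here; the real
         order is unspecified.  */
      sum += priv;
    }
  return sum;
}
```

For small integer-valued inputs both functions agree exactly; for general
floats the different association of the additions can round differently.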

        Jakub
