On Fri, Oct 18, 2024 at 11:52:26AM +0530, Tejas Belagod wrote:
> +/* This worksharing construct binds to an implicit outer parallel region in
> +    whose scope va is declared and therefore is default private.  This causes
> +    the lastprivate clause list item va to be diagnosed as private in the outer
> +    context.  Similarly for constructs for and distribute.  */

So just add #pragma omp parallel around it, then it isn't private in outer
context but shared.

> +#pragma omp sections lastprivate (va) /* { dg-error {lastprivate variable 'va' is private in outer context} } */
> +    {
> +      #pragma omp section
> +      vb = svld1_s32 (svptrue_b32 (), b);
> +      #pragma omp section
> +      vc = svld1_s32 (svptrue_b32 (), c);
> +      #pragma omp section
> +      va = svadd_s32_z (svptrue_b32 (), vb, vc);

This is again racy if there is any parallel around it, whether in main or
within the function.  vb and vc are implicitly shared, and by the time the
last section runs, the first two might not even have started, or might run
concurrently with it.  Also, as the last section is the only one which
modifies the lastprivate variable, it isn't a good example for it.
lastprivate is primarily private: each thread in the parallel has its own
copy, so it is nicer if each section writes to it as a temporary and then
uses it for some operation.
E.g.
      #pragma omp section
      va = svld1_s32 (svptrue_b32 (), b);
      va = svadd_s32_z (svptrue_b32 (), va, svld1_s32 (svptrue_b32 (), c));
and then another section which subtracts instead of adds and yet another
(lexically last) which multiplies rather than adds, and then verify that
lastprivate got the value from the multiplication.
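
A minimal sketch of that shape, assuming a plain int as a stand-in for
svint32_t so it can be built without SVE.  The function name and the
sum/diff/prod outputs are hypothetical, not part of the patch:

```c
/* Scalar stand-in for the SVE test: plain int replaces svint32_t.
   The function name and signature are hypothetical.  */
int
sections_lastprivate (int b, int c, int *sum, int *diff, int *prod)
{
  int va = -1;

#pragma omp parallel
#pragma omp sections lastprivate (va)
  {
    #pragma omp section
    {
      /* Each section uses its private va as a temporary.  */
      va = b;
      va = va + c;
      *sum = va;
    }
    #pragma omp section
    {
      va = b;
      va = va - c;
      *diff = va;
    }
    #pragma omp section
    {
      /* Lexically last section: its va is copied back to the original.  */
      va = b;
      va = va * c;
      *prod = va;
    }
  }
  return va;
}
```

Each section writes a distinct output location, so nothing is shared
between sections, and the returned va should be the product no matter
which thread ran which section.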
> +

Again, put #pragma omp parallel around this

> +#pragma omp for lastprivate (va) /* { dg-error {lastprivate variable 'va' is private in outer context} } */
> +  for (i = 0; i < 1; i++)

and perhaps use more than one iteration.  Ideally do something more
interesting, but on the other hand, as different iterations can be handled
by different threads, there can't be dependencies between the iterations.
> +    {
> +      vb = svld1_s32 (svptrue_b32 (), b);
> +      vc = svld1_s32 (svptrue_b32 (), c);
> +      va = svadd_s32_z (svptrue_b32 (), vb, vc);
> +    }
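
A sketch of a multi-iteration loop without cross-iteration dependencies,
again assuming plain int arrays in place of the SVE vectors.  The function
name and parameters are hypothetical:

```c
/* Scalar stand-in: each iteration reads b[i] and c[i] and writes only a[i],
   so iterations are independent.  Names are hypothetical.  */
int
for_lastprivate (const int *b, const int *c, int *a, int n)
{
  int va = -1;
  int i;

#pragma omp parallel
#pragma omp for lastprivate (va)
  for (i = 0; i < n; i++)
    {
      va = b[i] + c[i];
      a[i] = va;
    }

  /* For n > 0, va holds the value from the sequentially last iteration.  */
  return va;
}
```

The test can then compare both the a[] contents and the final va against
values computed serially.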

> +#pragma omp parallel
> +#pragma omp sections lastprivate (vb, vc)
> +    {
> +      #pragma omp section
> +      vb = svld1_s32 (svptrue_b32 (), b);
> +      #pragma omp section
> +      vc = svld1_s32 (svptrue_b32 (), c);
> +    }

This is invalid: vb is used afterwards even though the lexically last
section doesn't write it.  lastprivate for sections means each thread has
its own copy, and the value from the thread which executed the lexically
last section is copied to the original.
If you are lucky and the same thread handles both sections,
then it would work, but it can be a different thread...
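
One possible way to make this valid, sketched with plain int in place of
svint32_t, is to have the lexically last section assign every list item.
The function name and the out_b/out_c parameters are hypothetical:

```c
/* Scalar stand-in: the lexically last section assigns both vb and vc,
   so the lastprivate copy-out is well defined.  Names are hypothetical.  */
void
sections_two_items (int b, int c, int *out_b, int *out_c)
{
  int vb = -1, vc = -1;

#pragma omp parallel
#pragma omp sections lastprivate (vb, vc)
  {
    #pragma omp section
    {
      vb = b;
      vc = c;
    }
    #pragma omp section
    {
      /* Last section: writes every lastprivate list item.  */
      vb = b + 1;
      vc = c + 1;
    }
  }

  *out_b = vb;
  *out_c = vc;
}
```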

> +#pragma omp parallel
> +#pragma omp for lastprivate (va, vb, vc)
> +  for (i = 0; i < 4; i++)
> +    {
> +      vb = svld1_s32 (svptrue_b32 (), b + i * 8);
> +      vc = svld1_s32 (svptrue_b32 (), c + i * 8);
> +      va = svadd_s32_z (svptrue_b32 (), vb, vc);
> +      svst1_s32 (svptrue_b32 (), a + i * 8, va);

Is svst1 storing just one element, or say 8 elements,
rather than the whole variable-length vector?
If there is overlap between what different threads
write, then it would be racy (likewise if it can load beyond the end of
an array).
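
With an all-true predicate, svst1_s32 stores the whole vector, i.e.
svcntw () 32-bit elements, so a fixed stride of 8 only matches 256-bit
vectors.  A rough sketch of the condition in plain C, where the helper
name and its parameters are hypothetical and width stands for whatever
svcntw () returns at run time:

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if n accesses of `width` elements starting at offsets
   i * stride (i = 0 .. n-1) neither overlap each other nor run past an
   array of `len` elements.  Hypothetical helper for illustration only.  */
bool
accesses_are_safe (size_t n, size_t stride, size_t width, size_t len)
{
  if (n == 0)
    return true;
  /* Neighbouring iterations overlap when a vector is wider than the stride.  */
  if (n > 1 && width > stride)
    return false;
  /* The last access must end within the array.  */
  return (n - 1) * stride + width <= len;
}
```

In the SVE test itself, using svcntw () as the stride (and sizing the
arrays from it) would be one way to keep the stores disjoint for any
vector length.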

        Jakub
