On Fri, Oct 18, 2024 at 11:52:26AM +0530, Tejas Belagod wrote:
> +/* This worksharing construct binds to an implicit outer parallel region in
> +    whose scope va is declared and therefore is default private.  This causes
> +    the lastprivate clause list item va to be diagnosed as private in the outer
> +    context.  Similarly for constructs for and distribute.  */

So just add #pragma omp parallel around it, then it isn't private in outer
context but shared.

> +#pragma omp sections lastprivate (va) /* { dg-error {lastprivate variable 'va' is private in outer context} } */
> +    {
> +      #pragma omp section
> +      vb = svld1_s32 (svptrue_b32 (), b);
> +      #pragma omp section
> +      vc = svld1_s32 (svptrue_b32 (), c);
> +      #pragma omp section
> +      va = svadd_s32_z (svptrue_b32 (), vb, vc);

This is again racy (if there is any parallel around it, whether in main
or within the function): while vb and vc are implicitly shared, by the time
the last section is run, the first two might not even have started, or might
be running concurrently with the third one.
And, as the last section is the only one which modifies the lastprivate
variable, it isn't a good example for it.  lastprivate is primarily private:
each thread in the parallel has its own copy, and it is nice if each section
say writes to it as a temporary and then uses it for some operation.
E.g.
  #pragma omp section
  va = svld1_s32 (svptrue_b32 (), b);
  va = svadd_s32_z (svptrue_b32 (), va, svld1_s32 (svptrue_b32 (), c));
and then another section which subtracts instead of adds, and yet another
which multiplies rather than adds, and then verify lastprivate got the value
from the multiplication.

> +

Again, put #pragma omp parallel around this

> +#pragma omp for lastprivate (va) /* { dg-error {lastprivate variable 'va' is private in outer context} } */
> +  for (i = 0; i < 1; i++)

and perhaps more than one iteration, ideally do something more interesting,
but on the other side, as different iterations can be handled by different
threads, there can't be dependencies between the iterations.
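
To illustrate the shape I have in mind, here is a rough scalar sketch, with plain int standing in for svint32_t so it can be tried on any target. The function names and the out array are made up for illustration and are not part of the patch. Each section uses va purely as its own temporary, and the loop iterations are independent of each other.

```c
#include <assert.h>

/* Hypothetical scalar stand-in for the SVE test: int replaces svint32_t.
   Each section writes va as a private temporary and stores its result to a
   distinct element of out, so there is no race between the sections.  */
int
lastprivate_sections (const int *b, const int *c, int *out)
{
  int va = 0;
#pragma omp parallel
#pragma omp sections lastprivate (va)
  {
    #pragma omp section
    {
      va = b[0];
      va = va + c[0];
      out[0] = va;
    }
    #pragma omp section
    {
      va = b[0];
      va = va - c[0];
      out[1] = va;
    }
    #pragma omp section
    {
      va = b[0];
      va = va * c[0];
      out[2] = va;
    }
  }
  /* The lexically last section did the multiplication, so that is the
     value copied back to the original va.  */
  return va;
}

/* Several independent iterations; va receives the value from the
   sequentially last one (i == n - 1).  */
int
lastprivate_for (const int *b, const int *c, int *a, int n)
{
  int i, va = 0;
  /* Avoid a zero-trip loop, so that va is always assigned by the
     sequentially last iteration.  */
  assert (n > 0);
#pragma omp parallel for lastprivate (va)
  for (i = 0; i < n; i++)
    {
      va = b[i] + c[i];
      a[i] = va;
    }
  return va;
}
```

Built with -fopenmp the sections and iterations may run on different threads. Without it the pragmas are ignored and the code runs sequentially, and the values returned should be the same in both cases.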

> +    {
> +      vb = svld1_s32 (svptrue_b32 (), b);
> +      vc = svld1_s32 (svptrue_b32 (), c);
> +      va = svadd_s32_z (svptrue_b32 (), vb, vc);
> +    }
> +#pragma omp parallel
> +#pragma omp sections lastprivate (vb, vc)
> +    {
> +      #pragma omp section
> +      vb = svld1_s32 (svptrue_b32 (), b);
> +      #pragma omp section
> +      vc = svld1_s32 (svptrue_b32 (), c);
> +    }

This is invalid, vb is used, even when the last section doesn't write it.
lastprivate for sections means each thread has its own copy, and the value
from the thread which executed the (lexically) last section is copied to the
original.  If you are lucky and the same thread handles both sections, then
it would work, but it can be a different thread...

> +#pragma omp parallel
> +#pragma omp for lastprivate (va, vb, vc)
> +  for (i = 0; i < 4; i++)
> +    {
> +      vb = svld1_s32 (svptrue_b32 (), b + i * 8);
> +      vc = svld1_s32 (svptrue_b32 (), c + i * 8);
> +      va = svadd_s32_z (svptrue_b32 (), vb, vc);
> +      svst1_s32 (svptrue_b32 (), a + i * 8, va);

Is svst1 storing just one element, or say 8 elements, and not the whole
variable length vector?  If there is overlap between what different threads
write, then it would be racy (likewise if it can load beyond the end of the
array).

	Jakub