The simple one-line fix in GOMP_taskwait took many hours to find. Shared memory problems are a pain to debug, especially when adding code to dump some state turns a testcase that fails every hundred or so runs into one that takes thousands of times longer to fail.
What happens here is that GOMP_taskwait is called in the parent thread some time after gomp_barrier_handle_tasks has run in the child to the point of clearing the parent's "children" field. However, since there is no acquire barrier in the parent and the child may or may not have reached the release barrier in the mutex unlock, the memory stores in the child are not guaranteed to be seen in order in the parent thread. Thus the parent can see "task->children" clear but not yet see stores done as part of the real work of the child, ie. to "a" and "n" in the testcase. The GOMP_task change fixes a similar potential problem. Bootstrapped and regression tested powerpc-linux. OK to apply? PR libgomp/51376 * task.c (GOMP_taskwait): Don't access task->children outside of task_lock mutex region. (GOMP_task): Likewise. Index: libgomp/task.c =================================================================== --- libgomp/task.c (revision 181833) +++ libgomp/task.c (working copy) @@ -116,12 +116,12 @@ GOMP_task (void (*fn) (void *), void *da } else fn (data); - if (task.children) - { - gomp_mutex_lock (&team->task_lock); - gomp_clear_parent (task.children); - gomp_mutex_unlock (&team->task_lock); - } + if (team != NULL) + gomp_mutex_lock (&team->task_lock); + if (task.children != NULL) + gomp_clear_parent (task.children); + if (team != NULL) + gomp_mutex_unlock (&team->task_lock); gomp_end_task (); } else @@ -290,8 +290,9 @@ GOMP_taskwait (void) struct gomp_task *child_task = NULL; struct gomp_task *to_free = NULL; - if (task == NULL || task->children == NULL) + if (task == NULL) return; + gomp_mutex_lock (&team->task_lock); while (1) { -- Alan Modra Australia Development Lab, IBM