[PING][PATCH][libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end

Tom de Vries Wed, 19 May 2021 07:53:01 -0700

On 4/23/21 6:48 PM, Tom de Vries wrote:
> On 4/23/21 5:45 PM, Alexander Monakov wrote:
>> On Thu, 22 Apr 2021, Tom de Vries wrote:
>>
>>> Ah, I see, agreed, that makes sense.  I was afraid there was some
>>> fundamental problem that I overlooked.
>>>
>>> Here's an updated version.  I've tried to make it clear that the
>>> futex_wait/wake are locally used versions, not generic functionality.
>> Could you please regenerate the patch passing appropriate flags to
>> 'git format-patch' so it presents a rewrite properly (see documentation
>> for --patience and --break-rewrites options). The attached patch was mostly
>> unreadable, I'm afraid.
> Sure.  I did notice that the patch was not readable, but I didn't known
> there were options to improve that, so thanks for pointing that out.
>


Ping.  Any comments?

Thanks,
- Tom

> 0001-libgomp-nvptx-Fix-hang-in-gomp_team_barrier_wait_end.patch
> 
> From d3053a7ec7444b371ee29097a673e637b0d369d9 Mon Sep 17 00:00:00 2001
> From: Tom de Vries <tdevr...@suse.de>
> Date: Tue, 20 Apr 2021 08:47:03 +0200
> Subject: [PATCH 1/4] [libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end
> 
> Consider the following omp fragment.
> ...
>   #pragma omp target
>   #pragma omp parallel num_threads (2)
>   #pragma omp task
>     ;
> ...
> 
> This hangs at -O0 for nvptx.
> 
> Investigating the behaviour gives us the following trace of events:
> - both threads execute GOMP_task, where they:
>   - deposit a task, and
>   - execute gomp_team_barrier_wake
> - thread 1 executes gomp_team_barrier_wait_end and, not being the last thread,
>   proceeds to wait at the team barrier
> - thread 0 executes gomp_team_barrier_wait_end and, being the last thread, it
>   calls gomp_barrier_handle_tasks, where it:
>   - executes both tasks and marks the team barrier done
>   - executes a gomp_team_barrier_wake which wakes up thread 1
> - thread 1 exits the team barrier
> - thread 0 returns from gomp_barrier_handle_tasks and goes to wait at
>   the team barrier.
> - thread 0 hangs.
> 
> To understand why there is a hang here, it's good to understand how things
> are setup for nvptx.  The libgomp/config/nvptx/bar.c implementation is
> a copy of the libgomp/config/linux/bar.c implementation, with uses of both
> futex_wake and do_wait replaced with uses of ptx insn bar.sync:
> ...
>   if (bar->total > 1)
>     asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
> ...
> 
> The point where thread 0 goes to wait at the team barrier, corresponds in
> the linux implementation with a do_wait.  In the linux case, the call to
> do_wait doesn't hang, because it's waiting for bar->generation to become
> a certain value, and if bar->generation already has that value, it just
> proceeds, without any need for coordination with other threads.
> 
> In the nvtpx case, the bar.sync waits until thread 1 joins it in the same
> logical barrier, which never happens: thread 1 is lingering in the
> thread pool at the thread pool barrier (using a different logical barrier),
> waiting to join a new team.
> 
> The easiest way to fix this is to revert to the posix implementation for
> bar.{c,h}.  That however falls back on a busy-waiting approach, and
> does not take advantage of the ptx bar.sync insn.
> 
> Instead, we revert to the linux implementation for bar.c,
> and implement bar.c local functions futex_wait and futex_wake using the
> bar.sync insn.
> 
> This is a WIP version that does not yet take performance into consideration,
> but instead focuses on copying a working version as completely as possible,
> and isolating the machine-specific changes to as few functions as
> possible.
> 
> The bar.sync insn takes an argument specifying how many threads are
> participating, and that doesn't play well with the futex syntax where it's
> not clear in advance how many threads will be woken up.
> 
> This is solved by waking up all waiting threads each time a futex_wait or
> futex_wake happens, and possibly going back to sleep with an updated thread
> count.
> 
> Tested libgomp on x86_64 with nvptx accelerator, both as-is and with
> do_spin hardcoded to 1.
> 
> libgomp/ChangeLog:
> 
> 2021-04-20  Tom de Vries  <tdevr...@suse.de>
> 
>       PR target/99555
>       * config/nvptx/bar.c (generation_to_barrier): New function, copied
>       from config/rtems/bar.c.
>       (futex_wait, futex_wake): New function.
>       (do_spin, do_wait): New function, copied from config/linux/wait.h.
>       (gomp_barrier_wait_end, gomp_barrier_wait_last)
>       (gomp_team_barrier_wake, gomp_team_barrier_wait_end):
>       (gomp_team_barrier_wait_cancel_end, gomp_team_barrier_cancel): Remove
>       and replace with include of config/linux/bar.c.
>       * config/nvptx/bar.h (gomp_barrier_t): Add fields waiters and lock.
>       (gomp_barrier_init): Init new fields.
>       * testsuite/libgomp.c-c++-common/task-detach-6.c: Remove nvptx-specific
>       workarounds.
>       * testsuite/libgomp.c/pr99555-1.c: Same.
>       * testsuite/libgomp.fortran/task-detach-6.f90: Same.
> ---
>  libgomp/config/nvptx/bar.c                    | 388 ++++++++----------
>  libgomp/config/nvptx/bar.h                    |   4 +
>  .../libgomp.c-c++-common/task-detach-6.c      |   8 -
>  libgomp/testsuite/libgomp.c/pr99555-1.c       |   8 -
>  .../libgomp.fortran/task-detach-6.f90         |  12 -
>  5 files changed, 180 insertions(+), 240 deletions(-)
>  rewrite libgomp/config/nvptx/bar.c (76%)
> 
> diff --git a/libgomp/config/nvptx/bar.c b/libgomp/config/nvptx/bar.c
> dissimilarity index 76%
> index c5c2fa8829b..e0e6e5ed839 100644
> --- a/libgomp/config/nvptx/bar.c
> +++ b/libgomp/config/nvptx/bar.c
> @@ -1,212 +1,176 @@
> -/* Copyright (C) 2015-2021 Free Software Foundation, Inc.
> -   Contributed by Alexander Monakov <amona...@ispras.ru>
> -
> -   This file is part of the GNU Offloading and Multi Processing Library
> -   (libgomp).
> -
> -   Libgomp is free software; you can redistribute it and/or modify it
> -   under the terms of the GNU General Public License as published by
> -   the Free Software Foundation; either version 3, or (at your option)
> -   any later version.
> -
> -   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
> -   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
> -   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> -   more details.
> -
> -   Under Section 7 of GPL version 3, you are granted additional
> -   permissions described in the GCC Runtime Library Exception, version
> -   3.1, as published by the Free Software Foundation.
> -
> -   You should have received a copy of the GNU General Public License and
> -   a copy of the GCC Runtime Library Exception along with this program;
> -   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> -   <http://www.gnu.org/licenses/>.  */
> -
> -/* This is an NVPTX specific implementation of a barrier synchronization
> -   mechanism for libgomp.  This type is private to the library.  This
> -   implementation uses atomic instructions and bar.sync instruction.  */
> -
> -#include <limits.h>
> -#include "libgomp.h"
> -
> -
> -void
> -gomp_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
> -{
> -  if (__builtin_expect (state & BAR_WAS_LAST, 0))
> -    {
> -      /* Next time we'll be awaiting TOTAL threads again.  */
> -      bar->awaited = bar->total;
> -      __atomic_store_n (&bar->generation, bar->generation + BAR_INCR,
> -                     MEMMODEL_RELEASE);
> -    }
> -  if (bar->total > 1)
> -    asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
> -}
> -
> -void
> -gomp_barrier_wait (gomp_barrier_t *bar)
> -{
> -  gomp_barrier_wait_end (bar, gomp_barrier_wait_start (bar));
> -}
> -
> -/* Like gomp_barrier_wait, except that if the encountering thread
> -   is not the last one to hit the barrier, it returns immediately.
> -   The intended usage is that a thread which intends to gomp_barrier_destroy
> -   this barrier calls gomp_barrier_wait, while all other threads
> -   call gomp_barrier_wait_last.  When gomp_barrier_wait returns,
> -   the barrier can be safely destroyed.  */
> -
> -void
> -gomp_barrier_wait_last (gomp_barrier_t *bar)
> -{
> -  /* Deferring to gomp_barrier_wait does not use the optimization opportunity
> -     allowed by the interface contract for all-but-last participants.  The
> -     original implementation in config/linux/bar.c handles this better.  */
> -  gomp_barrier_wait (bar);
> -}
> -
> -void
> -gomp_team_barrier_wake (gomp_barrier_t *bar, int count)
> -{
> -  if (bar->total > 1)
> -    asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
> -}
> -
> -void
> -gomp_team_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
> -{
> -  unsigned int generation, gen;
> -
> -  if (__builtin_expect (state & BAR_WAS_LAST, 0))
> -    {
> -      /* Next time we'll be awaiting TOTAL threads again.  */
> -      struct gomp_thread *thr = gomp_thread ();
> -      struct gomp_team *team = thr->ts.team;
> -
> -      bar->awaited = bar->total;
> -      team->work_share_cancelled = 0;
> -      if (__builtin_expect (team->task_count, 0))
> -     {
> -       gomp_barrier_handle_tasks (state);
> -       state &= ~BAR_WAS_LAST;
> -     }
> -      else
> -     {
> -       state &= ~BAR_CANCELLED;
> -       state += BAR_INCR - BAR_WAS_LAST;
> -       __atomic_store_n (&bar->generation, state, MEMMODEL_RELEASE);
> -       if (bar->total > 1)
> -         asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
> -       return;
> -     }
> -    }
> -
> -  generation = state;
> -  state &= ~BAR_CANCELLED;
> -  do
> -    {
> -      if (bar->total > 1)
> -     asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
> -      gen = __atomic_load_n (&bar->generation, MEMMODEL_ACQUIRE);
> -      if (__builtin_expect (gen & BAR_TASK_PENDING, 0))
> -     {
> -       gomp_barrier_handle_tasks (state);
> -       gen = __atomic_load_n (&bar->generation, MEMMODEL_ACQUIRE);
> -     }
> -      generation |= gen & BAR_WAITING_FOR_TASK;
> -    }
> -  while (gen != state + BAR_INCR);
> -}
> -
> -void
> -gomp_team_barrier_wait (gomp_barrier_t *bar)
> -{
> -  gomp_team_barrier_wait_end (bar, gomp_barrier_wait_start (bar));
> -}
> -
> -void
> -gomp_team_barrier_wait_final (gomp_barrier_t *bar)
> -{
> -  gomp_barrier_state_t state = gomp_barrier_wait_final_start (bar);
> -  if (__builtin_expect (state & BAR_WAS_LAST, 0))
> -    bar->awaited_final = bar->total;
> -  gomp_team_barrier_wait_end (bar, state);
> -}
> -
> -bool
> -gomp_team_barrier_wait_cancel_end (gomp_barrier_t *bar,
> -                                gomp_barrier_state_t state)
> -{
> -  unsigned int generation, gen;
> -
> -  if (__builtin_expect (state & BAR_WAS_LAST, 0))
> -    {
> -      /* Next time we'll be awaiting TOTAL threads again.  */
> -      /* BAR_CANCELLED should never be set in state here, because
> -      cancellation means that at least one of the threads has been
> -      cancelled, thus on a cancellable barrier we should never see
> -      all threads to arrive.  */
> -      struct gomp_thread *thr = gomp_thread ();
> -      struct gomp_team *team = thr->ts.team;
> -
> -      bar->awaited = bar->total;
> -      team->work_share_cancelled = 0;
> -      if (__builtin_expect (team->task_count, 0))
> -     {
> -       gomp_barrier_handle_tasks (state);
> -       state &= ~BAR_WAS_LAST;
> -     }
> -      else
> -     {
> -       state += BAR_INCR - BAR_WAS_LAST;
> -       __atomic_store_n (&bar->generation, state, MEMMODEL_RELEASE);
> -       if (bar->total > 1)
> -         asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
> -       return false;
> -     }
> -    }
> -
> -  if (__builtin_expect (state & BAR_CANCELLED, 0))
> -    return true;
> -
> -  generation = state;
> -  do
> -    {
> -      if (bar->total > 1)
> -     asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
> -      gen = __atomic_load_n (&bar->generation, MEMMODEL_ACQUIRE);
> -      if (__builtin_expect (gen & BAR_CANCELLED, 0))
> -     return true;
> -      if (__builtin_expect (gen & BAR_TASK_PENDING, 0))
> -     {
> -       gomp_barrier_handle_tasks (state);
> -       gen = __atomic_load_n (&bar->generation, MEMMODEL_ACQUIRE);
> -     }
> -      generation |= gen & BAR_WAITING_FOR_TASK;
> -    }
> -  while (gen != state + BAR_INCR);
> -
> -  return false;
> -}
> -
> -bool
> -gomp_team_barrier_wait_cancel (gomp_barrier_t *bar)
> -{
> -  return gomp_team_barrier_wait_cancel_end (bar, gomp_barrier_wait_start 
> (bar));
> -}
> -
> -void
> -gomp_team_barrier_cancel (struct gomp_team *team)
> -{
> -  gomp_mutex_lock (&team->task_lock);
> -  if (team->barrier.generation & BAR_CANCELLED)
> -    {
> -      gomp_mutex_unlock (&team->task_lock);
> -      return;
> -    }
> -  team->barrier.generation |= BAR_CANCELLED;
> -  gomp_mutex_unlock (&team->task_lock);
> -  gomp_team_barrier_wake (&team->barrier, INT_MAX);
> -}
> +/* Copyright (C) 2015-2021 Free Software Foundation, Inc.
> +   Contributed by Alexander Monakov <amona...@ispras.ru>
> +
> +   This file is part of the GNU Offloading and Multi Processing Library
> +   (libgomp).
> +
> +   Libgomp is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
> +   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
> +   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> +   more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +/* This is an NVPTX specific implementation of a barrier synchronization
> +   mechanism for libgomp.  This type is private to the library.  This
> +   implementation uses atomic instructions and bar.sync instruction.  */
> +
> +#include <limits.h>
> +#include "libgomp.h"
> +
> +/* For cpu_relax.  */
> +#include "doacross.h"
> +
> +/* Assuming ADDR is &bar->generation, return bar.  Copied from
> +   rtems/bar.c.  */
> +
> +static gomp_barrier_t *
> +generation_to_barrier (int *addr)
> +{
> +  char *bar
> +    = (char *) addr - __builtin_offsetof (gomp_barrier_t, generation);
> +  return (gomp_barrier_t *)bar;
> +}
> +
> +/* Implement futex_wait-like behaviour to plug into the linux/bar.c
> +   implementation.  Assumes ADDR is &bar->generation.   */
> +
> +static inline void
> +futex_wait (int *addr, int val)
> +{
> +  gomp_barrier_t *bar = generation_to_barrier (addr);
> +
> +  if (bar->total < 2)
> +    /* A barrier with less than two threads, nop.  */
> +    return;
> +
> +  gomp_mutex_lock (&bar->lock);
> +
> +  /* Futex semantics: only go to sleep if *addr == val.  */
> +  if (__builtin_expect (__atomic_load_n (addr, MEMMODEL_ACQUIRE) != val, 0))
> +    {
> +      gomp_mutex_unlock (&bar->lock);
> +      return;
> +    }
> +
> +  /* Register as waiter.  */
> +  unsigned int waiters
> +    = __atomic_add_fetch (&bar->waiters, 1, MEMMODEL_ACQ_REL);
> +  if (waiters == 0)
> +    __builtin_abort ();
> +  unsigned int waiter_id = waiters;
> +
> +  if (waiters > 1)
> +    {
> +      /* Wake other threads in bar.sync.  */
> +      asm volatile ("bar.sync 1, %0;" : : "r" (32 * waiters));
> +
> +      /* Ensure that they have updated waiters.  */
> +      asm volatile ("bar.sync 1, %0;" : : "r" (32 * waiters));
> +    }
> +
> +  gomp_mutex_unlock (&bar->lock);
> +
> +  while (1)
> +    {
> +      /* Wait for next thread in barrier.  */
> +      asm volatile ("bar.sync 1, %0;" : : "r" (32 * (waiters + 1)));
> +
> +      /* Get updated waiters.  */
> +      unsigned int updated_waiters
> +     = __atomic_load_n (&bar->waiters, MEMMODEL_ACQUIRE);
> +
> +      /* Notify that we have updated waiters.  */
> +      asm volatile ("bar.sync 1, %0;" : : "r" (32 * (waiters + 1)));
> +
> +      waiters = updated_waiters;
> +
> +      if (waiter_id > waiters)
> +     /* A wake happened, and we're in the group of woken threads.  */
> +     break;
> +
> +      /* Continue waiting.  */
> +    }
> +}
> +
> +/* Implement futex_wake-like behaviour to plug into the linux/bar.c
> +   implementation.  Assumes ADDR is &bar->generation.  */
> +
> +static inline void
> +futex_wake (int *addr, int count)
> +{
> +  gomp_barrier_t *bar = generation_to_barrier (addr);
> +
> +  if (bar->total < 2)
> +    /* A barrier with less than two threads, nop.  */
> +    return;
> +
> +  gomp_mutex_lock (&bar->lock);
> +  unsigned int waiters = __atomic_load_n (&bar->waiters, MEMMODEL_ACQUIRE);
> +  if (waiters == 0)
> +    {
> +      /* No threads to wake.  */
> +      gomp_mutex_unlock (&bar->lock);
> +      return;
> +    }
> +
> +  if (count == INT_MAX)
> +    /* Release all threads.  */
> +    __atomic_store_n (&bar->waiters, 0, MEMMODEL_RELEASE);
> +  else if (count < bar->total)
> +    /* Release count threads.  */
> +    __atomic_add_fetch (&bar->waiters, -count, MEMMODEL_ACQ_REL);
> +  else
> +    /* Count has an illegal value.  */
> +    __builtin_abort ();
> +
> +  /* Wake other threads in bar.sync.  */
> +  asm volatile ("bar.sync 1, %0;" : : "r" (32 * (waiters + 1)));
> +
> +  /* Let them get the updated waiters.  */
> +  asm volatile ("bar.sync 1, %0;" : : "r" (32 * (waiters + 1)));
> +
> +  gomp_mutex_unlock (&bar->lock);
> +}
> +
> +/* Copied from linux/wait.h.  */
> +
> +static inline int do_spin (int *addr, int val)
> +{
> +  unsigned long long i, count = gomp_spin_count_var;
> +
> +  if (__builtin_expect (__atomic_load_n (&gomp_managed_threads,
> +                                      MEMMODEL_RELAXED)
> +                     > gomp_available_cpus, 0))
> +    count = gomp_throttled_spin_count_var;
> +  for (i = 0; i < count; i++)
> +    if (__builtin_expect (__atomic_load_n (addr, MEMMODEL_RELAXED) != val, 
> 0))
> +      return 0;
> +    else
> +      cpu_relax ();
> +  return 1;
> +}
> +
> +/* Copied from linux/wait.h.  */
> +
> +static inline void do_wait (int *addr, int val)
> +{
> +  if (do_spin (addr, val))
> +    futex_wait (addr, val);
> +}
> +
> +/* Reuse the linux implementation.  */
> +#define GOMP_WAIT_H 1
> +#include "../linux/bar.c"
> diff --git a/libgomp/config/nvptx/bar.h b/libgomp/config/nvptx/bar.h
> index 9bf3d914a02..c69426e1629 100644
> --- a/libgomp/config/nvptx/bar.h
> +++ b/libgomp/config/nvptx/bar.h
> @@ -38,6 +38,8 @@ typedef struct
>    unsigned generation;
>    unsigned awaited;
>    unsigned awaited_final;
> +  unsigned waiters;
> +  gomp_mutex_t lock;
>  } gomp_barrier_t;
>  
>  typedef unsigned int gomp_barrier_state_t;
> @@ -57,6 +59,8 @@ static inline void gomp_barrier_init (gomp_barrier_t *bar, 
> unsigned count)
>    bar->awaited = count;
>    bar->awaited_final = count;
>    bar->generation = 0;
> +  bar->waiters = 0;
> +  gomp_mutex_init (&bar->lock);
>  }
>  
>  static inline void gomp_barrier_reinit (gomp_barrier_t *bar, unsigned count)
> diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c 
> b/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c
> index f18b57bf047..e5c2291e6ff 100644
> --- a/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c
> +++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c
> @@ -2,9 +2,6 @@
>  
>  #include <omp.h>
>  #include <assert.h>
> -#include <unistd.h> // For 'alarm'.
> -
> -#include "on_device_arch.h"
>  
>  /* Test tasks with detach clause on an offload device.  Each device
>     thread spawns off a chain of tasks, that can then be executed by
> @@ -12,11 +9,6 @@
>  
>  int main (void)
>  {
> -  //TODO See '../libgomp.c/pr99555-1.c'.
> -  if (on_device_arch_nvptx ())
> -    alarm (4); /*TODO Until resolved, make sure that we exit quickly, with 
> error status.
> -              { dg-xfail-run-if "PR99555" { offload_device_nvptx } } */
> -
>    int x = 0, y = 0, z = 0;
>    int thread_count;
>    omp_event_handle_t detach_event1, detach_event2;
> diff --git a/libgomp/testsuite/libgomp.c/pr99555-1.c 
> b/libgomp/testsuite/libgomp.c/pr99555-1.c
> index bd33b93716b..7386e016fd2 100644
> --- a/libgomp/testsuite/libgomp.c/pr99555-1.c
> +++ b/libgomp/testsuite/libgomp.c/pr99555-1.c
> @@ -2,16 +2,8 @@
>  
>  // { dg-additional-options "-O0" }
>  
> -#include <unistd.h> // For 'alarm'.
> -
> -#include "../libgomp.c-c++-common/on_device_arch.h"
> -
>  int main (void)
>  {
> -  if (on_device_arch_nvptx ())
> -    alarm (4); /*TODO Until resolved, make sure that we exit quickly, with 
> error status.
> -              { dg-xfail-run-if "PR99555" { offload_device_nvptx } } */
> -
>  #pragma omp target
>  #pragma omp parallel // num_threads(1)
>  #pragma omp task
> diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-6.f90 
> b/libgomp/testsuite/libgomp.fortran/task-detach-6.f90
> index e4373b4c6f1..03a3b61540d 100644
> --- a/libgomp/testsuite/libgomp.fortran/task-detach-6.f90
> +++ b/libgomp/testsuite/libgomp.fortran/task-detach-6.f90
> @@ -1,6 +1,5 @@
>  ! { dg-do run }
>  
> -! { dg-additional-sources on_device_arch.c }
>    ! { dg-prune-output "command-line option '-fintrinsic-modules-path=.*' is 
> valid for Fortran but not for C" }
>  
>  ! Test tasks with detach clause on an offload device.  Each device
> @@ -14,17 +13,6 @@ program task_detach_6
>    integer :: x = 0, y = 0, z = 0
>    integer :: thread_count
>  
> -  interface
> -    integer function on_device_arch_nvptx() bind(C)
> -    end function on_device_arch_nvptx
> -  end interface
> -
> -  !TODO See '../libgomp.c/pr99555-1.c'.
> -  if (on_device_arch_nvptx () /= 0) then
> -     call alarm (4, 0); !TODO Until resolved, make sure that we exit 
> quickly, with error status.
> -     ! { dg-xfail-run-if "PR99555" { offload_device_nvptx } }
> -  end if
> -
>    !$omp target map (tofrom: x, y, z) map (from: thread_count)
>      !$omp parallel private (detach_event1, detach_event2)
>        !$omp single
> -- 2.28.0
>

[PING][PATCH][libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end

Reply via email to