https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555

--- Comment #18 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Chung-Lin Tang <clt...@gcc.gnu.org>:

https://gcc.gnu.org/g:fdc7469cf597ec11229ddfc3e9c7a06f3d0fba9d

commit r13-4832-gfdc7469cf597ec11229ddfc3e9c7a06f3d0fba9d
Author: Chung-Lin Tang <clt...@codesourcery.com>
Date:   Wed Dec 21 05:57:45 2022 -0800

    nvptx: reimplement libgomp barriers [PR99555]

    Instead of trying to have the GPU do CPU-with-OS-like things, this new
    barriers implementation for NVPTX uses simplistic bar.* synchronization
    instructions. Tasks are processed after threads have joined, and only if
    team->task_count != 0.

    It is noted that there might be a little bit of performance forfeited in
    cases where earlier-arriving threads could have been used to process tasks
    ahead of other threads, but that would require implementing complex
    futex-wait/wake-like behavior, which is what we're trying to avoid with
    this patch. It is deemed that task processing is not what GPU target
    offloading is usually used for.

    Implementation highlight notes:
    1. gomp_team_barrier_wake() is now an empty function (threads never "wake"
       in the usual manner)
    2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
    3. gomp_barrier_wait_last() is now implemented using "bar.arrive"

    4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
       The main synchronization is done using a 'bar.red' instruction. This
       reduces across all threads the condition (team->task_count != 0), to
       enable the task processing down below if any thread created a task.
       (This bar.red usage means that this patch is dependent on the prior
       NVPTX bar.red GCC patch.)

            PR target/99555

    libgomp/ChangeLog:

            * config/nvptx/bar.c (generation_to_barrier): Remove.
            (futex_wait,futex_wake,do_spin,do_wait): Remove.
            (GOMP_WAIT_H): Remove.
            (#include "../linux/bar.c"): Remove.
            (gomp_barrier_wait_end): New function.
            (gomp_barrier_wait): Likewise.
            (gomp_barrier_wait_last): Likewise.
            (gomp_team_barrier_wait_end): Likewise.
            (gomp_team_barrier_wait): Likewise.
            (gomp_team_barrier_wait_final): Likewise.
            (gomp_team_barrier_wait_cancel_end): Likewise.
            (gomp_team_barrier_wait_cancel): Likewise.
            (gomp_team_barrier_cancel): Likewise.
            * config/nvptx/bar.h (gomp_barrier_t): Remove waiters, lock fields.
            (gomp_barrier_init): Remove init of waiters, lock fields.
            (gomp_team_barrier_wake): Remove prototype, add new static inline
            function.
