On 7/20/20 3:19 PM, Kwok Cheung Yeung wrote: > On 01/07/2020 3:28 pm, Tom de Vries wrote: >> So, I think gcc needs a copy of (some of) the >> gcc/testsuite/gcc.dg/ia64-sync-*.c tests for effective target >> sync_char_short. >> >> However, since this patch only adds partial support, we cannot enable >> sync_char_short for nvptx yet. So, if you stick to partial support, you >> should add a char/short copy of ia64-sync-3.c to gcc.target/nvptx (which >> ideally could be an include of a generic test-case that is active for >> sync_char_short only, with mention that it can be removed once >> sync_char_short is enabled for nvptx). >> > > I have added gcc.target/nvptx/sync.c, which is a version of > ia64-sync-3.c extended to test chars and shorts too.
I've: - added the short/char part of that as gcc.dg/ia64-async-5.c - included the sync_int_long ones of gcc.dg/ia64-async-* in gcc.target/nvptx - did the same for gcc.dg/ia64-async-5.c as part of this patch > I kept the original > int and long tests because sync_int_long isn't indicated as being > supported on nvptx either. > Yep, I proposed a patch to enable that: https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551842.html . >> I looked at the implementation, and it looks ok to me, though I think we >> need to make explicit in a comment what the assumptions are: >> - that we have read and write access to the entire word, and >> - that the word is not volatile. >> > > I've added some extra comments in the implementation. Like I said > previously, the loop accounts for the larger word being volatile. > Right, I known what that loop is intending to do. The loop though may manifest worst-case as a hang, so I've mentioned that in the comment. >> As for the oacc test-case, you could add the __int128 bit, perhaps along >> the lines of how things are done in >> libgomp/testsuite/libgomp.c++/target-8.C ? >> > > I've added a extra test for __int128 types in my libgomp testcase that > runs if 128-bit types are supported. > > I've tested that there are no regressions with the patch on standalone > nvptx, and that the new reduction-16.c testcase passes with both nvptx > and AMD GCN offloading. > > Is this version okay for master and og10? > Pushed as attached to master. Thanks, - Tom
nvptx: Add support for subword compare-and-swap This adds support for __sync_val_compare_and_swap and __sync_bool_compare_and_swap for 1-byte and 2-byte long values, which are not natively supported on nvptx. Build and reg-tested on nvptx. Build and reg-tested libgomp on x86_64 with nvptx accelerator. 2020-07-16 Kwok Cheung Yeung <k...@codesourcery.com> libgcc/ * config/nvptx/atomic.c: New. * config/nvptx/t-nvptx (LIB2ADD): Add atomic.c. gcc/testsuite/ * gcc.target/nvptx/sync-5.c: New. libgomp/ * testsuite/libgomp.c-c++-common/reduction-16.c: New. --- gcc/testsuite/gcc.target/nvptx/ia64-sync-5.c | 2 + libgcc/config/nvptx/atomic.c | 73 ++++++++++++++++++++++ libgcc/config/nvptx/t-nvptx | 3 +- .../testsuite/libgomp.c-c++-common/reduction-16.c | 53 ++++++++++++++++ 4 files changed, 130 insertions(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.target/nvptx/ia64-sync-5.c b/gcc/testsuite/gcc.target/nvptx/ia64-sync-5.c new file mode 100644 index 00000000000..ec40f2ca7a9 --- /dev/null +++ b/gcc/testsuite/gcc.target/nvptx/ia64-sync-5.c @@ -0,0 +1,2 @@ +/* { dg-do run } */ +#include "../../gcc.dg/ia64-sync-5.c" diff --git a/libgcc/config/nvptx/atomic.c b/libgcc/config/nvptx/atomic.c new file mode 100644 index 00000000000..e1ea078692a --- /dev/null +++ b/libgcc/config/nvptx/atomic.c @@ -0,0 +1,73 @@ +/* NVPTX atomic operations + Copyright (C) 2020 Free Software Foundation, Inc. + Contributed by Mentor Graphics. + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +#include <stdbool.h> + +/* Implement __sync_val_compare_and_swap and __sync_bool_compare_and_swap + for 1 and 2-byte values (which are not natively supported) in terms of + __sync_val_compare_and_swap for 4-byte values (which is supported). + This assumes that the contents of the word surrounding the subword + value that we are interested in are accessible as well (which should + normally be the case). Note that if the contents of the word surrounding + the subword changes between the __sync_val_compare_and_swap_4 and the + preceeding load of oldword, while the subword does not, the implementation + loops, which may manifest worst-case as a hang. */ + +#define __SYNC_SUBWORD_COMPARE_AND_SWAP(TYPE, SIZE) \ + \ +TYPE \ +__sync_val_compare_and_swap_##SIZE (TYPE *ptr, TYPE oldval, TYPE newval) \ +{ \ + unsigned int *wordptr = (unsigned int *)((__UINTPTR_TYPE__ ) ptr & ~3UL); \ + int shift = ((__UINTPTR_TYPE__ ) ptr & 3UL) * 8; \ + unsigned int valmask = (1 << (SIZE * 8)) - 1; \ + unsigned int wordmask = ~(valmask << shift); \ + unsigned int oldword = *wordptr; \ + for (;;) \ + { \ + TYPE prevval = (oldword >> shift) & valmask; \ + /* Exit if the subword value previously read from memory is not */ \ + /* equal to the expected value OLDVAL. */ \ + if (__builtin_expect (prevval != oldval, 0)) \ + return prevval; \ + unsigned int newword = oldword & wordmask; \ + newword |= ((unsigned int) newval) << shift; \ + unsigned int prevword \ + = __sync_val_compare_and_swap_4 (wordptr, oldword, newword); \ + /* Exit only if the compare-and-swap succeeds on the whole word */ \ + /* (i.e. the contents of *WORDPTR have not changed since the last */ \ + /* memory read). */ \ + if (__builtin_expect (prevword == oldword, 1)) \ + return oldval; \ + oldword = prevword; \ + } \ +} \ + \ +bool \ +__sync_bool_compare_and_swap_##SIZE (TYPE *ptr, TYPE oldval, TYPE newval) \ +{ \ + return __sync_val_compare_and_swap_##SIZE (ptr, oldval, newval) == oldval; \ +} + +__SYNC_SUBWORD_COMPARE_AND_SWAP (unsigned char, 1) +__SYNC_SUBWORD_COMPARE_AND_SWAP (unsigned short, 2) diff --git a/libgcc/config/nvptx/t-nvptx b/libgcc/config/nvptx/t-nvptx index c4d20c94cbb..ede0bf0f87d 100644 --- a/libgcc/config/nvptx/t-nvptx +++ b/libgcc/config/nvptx/t-nvptx @@ -1,5 +1,6 @@ LIB2ADD=$(srcdir)/config/nvptx/reduction.c \ - $(srcdir)/config/nvptx/mgomp.c + $(srcdir)/config/nvptx/mgomp.c \ + $(srcdir)/config/nvptx/atomic.c LIB2ADDEH= LIB2FUNCS_EXCLUDE=__main diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c new file mode 100644 index 00000000000..d0e82b04790 --- /dev/null +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c @@ -0,0 +1,53 @@ +/* { dg-do run } */ + +#include <stdlib.h> + +#define N 512 + +#define GENERATE_TEST(T) \ +int test_##T (void) \ +{ \ + T a[N], res = 0; \ + \ + for (int i = 0; i < N; ++i) \ + a[i] = i & 1; \ + \ +_Pragma("omp target teams distribute reduction(||:res) defaultmap(tofrom:scalar)") \ + for (int i = 0; i < N; ++i) \ + res = res || a[i]; \ + \ + /* res should be non-zero. */\ + if (!res) \ + return 1; \ + \ +_Pragma("omp target teams distribute reduction(&&:res) defaultmap(tofrom:scalar)") \ + for (int i = 0; i < N; ++i) \ + res = res && a[i]; \ + \ + /* res should be zero. */ \ + return res; \ +} + +GENERATE_TEST(char) +GENERATE_TEST(short) +GENERATE_TEST(int) +GENERATE_TEST(long) +#ifdef __SIZEOF_INT128__ +GENERATE_TEST(__int128) +#endif + +int main(void) +{ + if (test_char ()) + abort (); + if (test_short ()) + abort (); + if (test_int ()) + abort (); + if (test_long ()) + abort (); +#ifdef __SIZEOF_INT128__ + if (test___int128 ()) + abort (); +#endif +}