Re: [PATCH v7] libgfortran: Replace mutex with rwlock
On 2023/12/14 23:50, Richard Earnshaw (lists) wrote: On 09/12/2023 15:39, Lipeng Zhu wrote: This patch try to introduce the rwlock and split the read/write to unit_root tree and unit_cache with rwlock instead of the mutex to increase CPU efficiency. In the get_gfc_unit function, the percentage to step into the insert_unit function is around 30%, in most instances, we can get the unit in the phase of reading the unit_cache or unit_root tree. So split the read/write phase by rwlock would be an approach to make it more parallel. BTW, the IPC metrics can gain around 9x in our test server with 220 cores. The benchmark we used is https://github.com/rwesson/NEAT libgcc/ChangeLog: * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro. (__gthrw): New function. (__gthread_rwlock_rdlock): New function. (__gthread_rwlock_tryrdlock): New function. (__gthread_rwlock_wrlock): New function. (__gthread_rwlock_trywrlock): New function. (__gthread_rwlock_unlock): New function. libgfortran/ChangeLog: * io/async.c (DEBUG_LINE): New macro. * io/async.h (RWLOCK_DEBUG_ADD): New macro. (CHECK_RDLOCK): New macro. (CHECK_WRLOCK): New macro. (TAIL_RWLOCK_DEBUG_QUEUE): New macro. (IN_RWLOCK_DEBUG_QUEUE): New macro. (RDLOCK): New macro. (WRLOCK): New macro. (RWUNLOCK): New macro. (RD_TO_WRLOCK): New macro. (INTERN_RDLOCK): New macro. (INTERN_WRLOCK): New macro. (INTERN_RWUNLOCK): New macro. * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in a comment. (unit_lock): Remove including associated internal_proto. (unit_rwlock): New declarations including associated internal_proto. (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock instead of __gthread_mutex_lock and __gthread_mutex_unlock on unit_lock. * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (st_write_done_worker): Likewise. * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules' comment. Use unit_rwlock variable instead of unit_lock variable. (get_gfc_unit_from_unit_root): New function. (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (close_units): Likewise. (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on unit_lock. * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. It looks like this has broken builds on arm-none-eabi when using newlib: In file included from /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran /runtime/error.c:27: /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h: In function ‘dec_waiting_unlocked’: /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1023:3: error : implicit declaration of function ‘WRLOCK’ [-Wimplicit-function-declaration] 1023 | WRLOCK (&unit_rwlock); | ^~ /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1025:3: error : implicit declaration of function ‘RWUNLOCK’ [-Wimplicit-function-declaration] 1025 | RWUNLOCK (&unit_rwlock); | ^~~~ R. Hi Richard, The root cause is that the macro WRLOCK and RWUNLOCK are not defined in io.h. The reason of x86 platform not failed is that HAVE_ATOMIC_FETCH_ADD is defined then caused above macros were never been used. Code logic show as below: #ifdef HAVE_ATOMIC_FETCH_ADD (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED); #else WRLOCK (&unit_rwlock); u->waiting--; RWUNLOCK (&unit_rwlock); #endif I just draft a patch try to fix this bug, because I didn't have arm platform, would you help to validate if it was fixed on arm platform? diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h index 15daa0995b1..c7f0f7d7d9e 100644 --- a/libgfortran/io/io.h +++ b/libgfortran/io/io.h @@ -1020,9 +1020,15 @@ dec_waiting_unlocked (gfc_unit *u) #ifdef HAVE_ATOMIC_FETCH_ADD (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED); #else - WRLOCK (&unit_rwlock); +#ifdef __GTHREAD_RWLOCK_INIT + __gthread_rwlock_wrlock (&unit_rwlock); + u->waiting--; + __gthread_rwlock_unlock (&unit_rwlock); +#else + __gthread_mutex_lock (&unit_rwlock); u->waiting--; - RWUNLOCK (&unit_rwlock); + __gthread_mutex_unlock (&unit_rwlock); +#endif #endif } Lipeng Zhu
Re: [PATCH v7 4/5] OpenMP/OpenACC: Unordered/non-constant component offset runtime diagnostic
Hi! On 2023-12-14T15:26:38+0100, Tobias Burnus wrote: > On 19.08.23 00:47, Julian Brown wrote: >> This patch adds support for non-constant component offsets in "map" >> clauses for OpenMP (and the equivalants for OpenACC) [...] Should eventually also add some OpenACC test cases? > LGTM with: > > - inclusion of your follow-up fix for shared-memory systems (see email > of August 21) This was applied here: >> --- /dev/null >> +++ b/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c >> +/* { dg-output "(\n|\r|\r\n)" } */ >> +/* { dg-output "libgomp: Mapped array elements must be the same >> .*(\n|\r|\r\n)+" } */ >> +/* { dg-shouldfail "" { offload_device_nonshared_as } } */ ..., and here: >> --- /dev/null >> +++ b/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c >> +/* { dg-output "(\n|\r|\r\n)" } */ >> +/* { dg-output "libgomp: Mapped array elements must be the same >> .*(\n|\r|\r\n)+" } */ >> +/* { dg-shouldfail "" { offload_device_nonshared_as } } */ ..., but not here: >> --- /dev/null >> +++ b/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90 >> +! { dg-output "(\n|\r|\r\n)" } >> +! { dg-output "libgomp: Mapped array elements must be the same >> .*(\n|\r|\r\n)+" } >> +! { dg-shouldfail "" { offload_device_nonshared_as } } Pushed to master branch commit bc7546e32c5a942e240ef97776352d21105ef291 "In 'libgomp.fortran/map-subarray-5.f90', restrict 'dg-output's to 'target offload_device_nonshared_as'", see attached. Grüße Thomas - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 >From bc7546e32c5a942e240ef97776352d21105ef291 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Fri, 15 Dec 2023 13:05:24 +0100 Subject: [PATCH] In 'libgomp.fortran/map-subarray-5.f90', restrict 'dg-output's to 'target offload_device_nonshared_as' ..., as in 'libgomp.c-c++-common/map-arrayofstruct-{2,3}.c'. Minor fix-up for commit f5745dc1426bdb1a53ebaf7af758b2250ccbff02 "OpenMP/OpenACC: Unordered/non-constant component offset runtime diagnostic". libgomp/ * testsuite/libgomp.fortran/map-subarray-5.f90: Restrict 'dg-output's to 'target offload_device_nonshared_as'. --- libgomp/testsuite/libgomp.fortran/map-subarray-5.f90 | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90 b/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90 index e7cdf11e610..59ad01ab76b 100644 --- a/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90 +++ b/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90 @@ -49,6 +49,6 @@ end do end -! { dg-output "(\n|\r|\r\n)" } -! { dg-output "libgomp: Mapped array elements must be the same .*(\n|\r|\r\n)+" } +! { dg-output "(\n|\r|\r\n)" { target offload_device_nonshared_as } } +! { dg-output "libgomp: Mapped array elements must be the same .*(\n|\r|\r\n)+" { target offload_device_nonshared_as } } ! { dg-shouldfail "" { offload_device_nonshared_as } } -- 2.34.1
Re: [PATCH v7] libgfortran: Replace mutex with rwlock
On 15/12/2023 11:31, Lipeng Zhu wrote: On 2023/12/14 23:50, Richard Earnshaw (lists) wrote: On 09/12/2023 15:39, Lipeng Zhu wrote: This patch try to introduce the rwlock and split the read/write to unit_root tree and unit_cache with rwlock instead of the mutex to increase CPU efficiency. In the get_gfc_unit function, the percentage to step into the insert_unit function is around 30%, in most instances, we can get the unit in the phase of reading the unit_cache or unit_root tree. So split the read/write phase by rwlock would be an approach to make it more parallel. BTW, the IPC metrics can gain around 9x in our test server with 220 cores. The benchmark we used is https://github.com/rwesson/NEAT libgcc/ChangeLog: * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro. (__gthrw): New function. (__gthread_rwlock_rdlock): New function. (__gthread_rwlock_tryrdlock): New function. (__gthread_rwlock_wrlock): New function. (__gthread_rwlock_trywrlock): New function. (__gthread_rwlock_unlock): New function. libgfortran/ChangeLog: * io/async.c (DEBUG_LINE): New macro. * io/async.h (RWLOCK_DEBUG_ADD): New macro. (CHECK_RDLOCK): New macro. (CHECK_WRLOCK): New macro. (TAIL_RWLOCK_DEBUG_QUEUE): New macro. (IN_RWLOCK_DEBUG_QUEUE): New macro. (RDLOCK): New macro. (WRLOCK): New macro. (RWUNLOCK): New macro. (RD_TO_WRLOCK): New macro. (INTERN_RDLOCK): New macro. (INTERN_WRLOCK): New macro. (INTERN_RWUNLOCK): New macro. * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in a comment. (unit_lock): Remove including associated internal_proto. (unit_rwlock): New declarations including associated internal_proto. (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock instead of __gthread_mutex_lock and __gthread_mutex_unlock on unit_lock. * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (st_write_done_worker): Likewise. * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules' comment. Use unit_rwlock variable instead of unit_lock variable. (get_gfc_unit_from_unit_root): New function. (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (close_units): Likewise. (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on unit_lock. * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. It looks like this has broken builds on arm-none-eabi when using newlib: In file included from /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran /runtime/error.c:27: /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h: In function ‘dec_waiting_unlocked’: /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1023:3: error : implicit declaration of function ‘WRLOCK’ [-Wimplicit-function-declaration] 1023 | WRLOCK (&unit_rwlock); | ^~ /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1025:3: error : implicit declaration of function ‘RWUNLOCK’ [-Wimplicit-function-declaration] 1025 | RWUNLOCK (&unit_rwlock); | ^~~~ R. Hi Richard, The root cause is that the macro WRLOCK and RWUNLOCK are not defined in io.h. The reason of x86 platform not failed is that HAVE_ATOMIC_FETCH_ADD is defined then caused above macros were never been used. Code logic show as below: #ifdef HAVE_ATOMIC_FETCH_ADD (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED); #else WRLOCK (&unit_rwlock); u->waiting--; RWUNLOCK (&unit_rwlock); #endif I just draft a patch try to fix this bug, because I didn't have arm platform, would you help to validate if it was fixed on arm platform? diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h index 15daa0995b1..c7f0f7d7d9e 100644 --- a/libgfortran/io/io.h +++ b/libgfortran/io/io.h @@ -1020,9 +1020,15 @@ dec_waiting_unlocked (gfc_unit *u) #ifdef HAVE_ATOMIC_FETCH_ADD (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED); #else - WRLOCK (&unit_rwlock); +#ifdef __GTHREAD_RWLOCK_INIT + __gthread_rwlock_wrlock (&unit_rwlock); + u->waiting--; + __gthread_rwlock_unlock (&unit_rwlock); +#else + __gthread_mutex_lock (&unit_rwlock); u->waiting--; - RWUNLOCK (&unit_rwlock); + __gthread_mutex_unlock (&unit_rwlock); +#endif #endif } Lipeng Zhu Hi Lipeng, Thanks for the quick reply. I can confirm that with the above change the bootstrap failure is fixed. However, this shouldn't be considered a formal review; libgfortran is not really my area. I'll be away now until January 2nd. Richard.