Re: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-15 Thread Lipeng Zhu




On 2023/12/14 23:50, Richard Earnshaw (lists) wrote:

On 09/12/2023 15:39, Lipeng Zhu wrote:

This patch try to introduce the rwlock and split the read/write to
unit_root tree and unit_cache with rwlock instead of the mutex to
increase CPU efficiency. In the get_gfc_unit function, the percentage
to step into the insert_unit function is around 30%, in most instances,
we can get the unit in the phase of reading the unit_cache or unit_root
tree. So split the read/write phase by rwlock would be an approach to
make it more parallel.

BTW, the IPC metrics can gain around 9x in our test
server with 220 cores. The benchmark we used is
https://github.com/rwesson/NEAT

libgcc/ChangeLog:

* gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
(__gthrw): New function.
(__gthread_rwlock_rdlock): New function.
(__gthread_rwlock_tryrdlock): New function.
(__gthread_rwlock_wrlock): New function.
(__gthread_rwlock_trywrlock): New function.
(__gthread_rwlock_unlock): New function.

libgfortran/ChangeLog:

* io/async.c (DEBUG_LINE): New macro.
* io/async.h (RWLOCK_DEBUG_ADD): New macro.
(CHECK_RDLOCK): New macro.
(CHECK_WRLOCK): New macro.
(TAIL_RWLOCK_DEBUG_QUEUE): New macro.
(IN_RWLOCK_DEBUG_QUEUE): New macro.
(RDLOCK): New macro.
(WRLOCK): New macro.
(RWUNLOCK): New macro.
(RD_TO_WRLOCK): New macro.
(INTERN_RDLOCK): New macro.
(INTERN_WRLOCK): New macro.
(INTERN_RWUNLOCK): New macro.
* io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
a comment.
(unit_lock): Remove including associated internal_proto.
(unit_rwlock): New declarations including associated internal_proto.
(dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
instead of __gthread_mutex_lock and __gthread_mutex_unlock on
unit_lock.
* io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on
unit_rwlock instead of LOCK and UNLOCK on unit_lock.
(st_write_done_worker): Likewise.
* io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
comment. Use unit_rwlock variable instead of unit_lock variable.
(get_gfc_unit_from_unit_root): New function.
(get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
instead of LOCK and UNLOCK on unit_lock.
(close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of
LOCK and UNLOCK on unit_lock.
(close_units): Likewise.
(newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
unit_lock.
* io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
instead of LOCK and UNLOCK on unit_lock.
(flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
of LOCK and UNLOCK on unit_lock.



It looks like this has broken builds on arm-none-eabi when using newlib:

In file included from /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran
/runtime/error.c:27:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h: In function
‘dec_waiting_unlocked’:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1023:3: error
: implicit declaration of function ‘WRLOCK’ [-Wimplicit-function-declaration]
  1023 |   WRLOCK (&unit_rwlock);
   |   ^~
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1025:3: error
: implicit declaration of function ‘RWUNLOCK’ [-Wimplicit-function-declaration]
  1025 |   RWUNLOCK (&unit_rwlock);
   |   ^~~~


R.


Hi Richard,

The root cause is that the macro WRLOCK and RWUNLOCK are not defined in 
io.h. The reason of x86 platform not failed is that 
HAVE_ATOMIC_FETCH_ADD is defined then caused above macros were never 
been used. Code logic show as below:

#ifdef HAVE_ATOMIC_FETCH_ADD
  (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED);
#else
  WRLOCK (&unit_rwlock);
  u->waiting--;
  RWUNLOCK (&unit_rwlock);
#endif

I just draft a patch try to fix this bug, because I didn't have arm 
platform, would you help to validate if it was fixed on arm platform?


diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h
index 15daa0995b1..c7f0f7d7d9e 100644
--- a/libgfortran/io/io.h
+++ b/libgfortran/io/io.h
@@ -1020,9 +1020,15 @@ dec_waiting_unlocked (gfc_unit *u)
 #ifdef HAVE_ATOMIC_FETCH_ADD
   (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED);
 #else
-  WRLOCK (&unit_rwlock);
+#ifdef __GTHREAD_RWLOCK_INIT
+  __gthread_rwlock_wrlock (&unit_rwlock);
+  u->waiting--;
+  __gthread_rwlock_unlock (&unit_rwlock);
+#else
+  __gthread_mutex_lock (&unit_rwlock);
   u->waiting--;
-  RWUNLOCK (&unit_rwlock);
+  __gthread_mutex_unlock (&unit_rwlock);
+#endif
 #endif
 }


Lipeng Zhu


Re: [PATCH v7 4/5] OpenMP/OpenACC: Unordered/non-constant component offset runtime diagnostic

2023-12-15 Thread Thomas Schwinge
Hi!

On 2023-12-14T15:26:38+0100, Tobias Burnus  wrote:
> On 19.08.23 00:47, Julian Brown wrote:
>> This patch adds support for non-constant component offsets in "map"
>> clauses for OpenMP (and the equivalants for OpenACC) [...]

Should eventually also add some OpenACC test cases?


> LGTM with:
>
> - inclusion of your follow-up fix for shared-memory systems (see email
> of August 21)

This was applied here:

>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c

>> +/* { dg-output "(\n|\r|\r\n)" } */
>> +/* { dg-output "libgomp: Mapped array elements must be the same 
>> .*(\n|\r|\r\n)+" } */
>> +/* { dg-shouldfail "" { offload_device_nonshared_as } } */

..., and here:

>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c

>> +/* { dg-output "(\n|\r|\r\n)" } */
>> +/* { dg-output "libgomp: Mapped array elements must be the same 
>> .*(\n|\r|\r\n)+" } */
>> +/* { dg-shouldfail "" { offload_device_nonshared_as } } */

..., but not here:

>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90

>> +! { dg-output "(\n|\r|\r\n)" }
>> +! { dg-output "libgomp: Mapped array elements must be the same 
>> .*(\n|\r|\r\n)+" }
>> +! { dg-shouldfail "" { offload_device_nonshared_as } }

Pushed to master branch commit bc7546e32c5a942e240ef97776352d21105ef291
"In 'libgomp.fortran/map-subarray-5.f90', restrict 'dg-output's to 'target 
offload_device_nonshared_as'",
see attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From bc7546e32c5a942e240ef97776352d21105ef291 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 15 Dec 2023 13:05:24 +0100
Subject: [PATCH] In 'libgomp.fortran/map-subarray-5.f90', restrict
 'dg-output's to 'target offload_device_nonshared_as'

..., as in 'libgomp.c-c++-common/map-arrayofstruct-{2,3}.c'.

Minor fix-up for commit f5745dc1426bdb1a53ebaf7af758b2250ccbff02
"OpenMP/OpenACC: Unordered/non-constant component offset runtime diagnostic".

	libgomp/
	* testsuite/libgomp.fortran/map-subarray-5.f90: Restrict
	'dg-output's to 'target offload_device_nonshared_as'.
---
 libgomp/testsuite/libgomp.fortran/map-subarray-5.f90 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90 b/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90
index e7cdf11e610..59ad01ab76b 100644
--- a/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90
+++ b/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90
@@ -49,6 +49,6 @@ end do
 
 end
 
-! { dg-output "(\n|\r|\r\n)" }
-! { dg-output "libgomp: Mapped array elements must be the same .*(\n|\r|\r\n)+" }
+! { dg-output "(\n|\r|\r\n)" { target offload_device_nonshared_as } }
+! { dg-output "libgomp: Mapped array elements must be the same .*(\n|\r|\r\n)+" { target offload_device_nonshared_as } }
 ! { dg-shouldfail "" { offload_device_nonshared_as } }
-- 
2.34.1



Re: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-15 Thread Richard Earnshaw




On 15/12/2023 11:31, Lipeng Zhu wrote:



On 2023/12/14 23:50, Richard Earnshaw (lists) wrote:

On 09/12/2023 15:39, Lipeng Zhu wrote:

This patch try to introduce the rwlock and split the read/write to
unit_root tree and unit_cache with rwlock instead of the mutex to
increase CPU efficiency. In the get_gfc_unit function, the percentage
to step into the insert_unit function is around 30%, in most instances,
we can get the unit in the phase of reading the unit_cache or unit_root
tree. So split the read/write phase by rwlock would be an approach to
make it more parallel.

BTW, the IPC metrics can gain around 9x in our test
server with 220 cores. The benchmark we used is
https://github.com/rwesson/NEAT

libgcc/ChangeLog:

* gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
(__gthrw): New function.
(__gthread_rwlock_rdlock): New function.
(__gthread_rwlock_tryrdlock): New function.
(__gthread_rwlock_wrlock): New function.
(__gthread_rwlock_trywrlock): New function.
(__gthread_rwlock_unlock): New function.

libgfortran/ChangeLog:

* io/async.c (DEBUG_LINE): New macro.
* io/async.h (RWLOCK_DEBUG_ADD): New macro.
(CHECK_RDLOCK): New macro.
(CHECK_WRLOCK): New macro.
(TAIL_RWLOCK_DEBUG_QUEUE): New macro.
(IN_RWLOCK_DEBUG_QUEUE): New macro.
(RDLOCK): New macro.
(WRLOCK): New macro.
(RWUNLOCK): New macro.
(RD_TO_WRLOCK): New macro.
(INTERN_RDLOCK): New macro.
(INTERN_WRLOCK): New macro.
(INTERN_RWUNLOCK): New macro.
* io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
a comment.
(unit_lock): Remove including associated internal_proto.
(unit_rwlock): New declarations including associated internal_proto.
(dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
instead of __gthread_mutex_lock and __gthread_mutex_unlock on
unit_lock.
* io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on
unit_rwlock instead of LOCK and UNLOCK on unit_lock.
(st_write_done_worker): Likewise.
* io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
comment. Use unit_rwlock variable instead of unit_lock variable.
(get_gfc_unit_from_unit_root): New function.
(get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
instead of LOCK and UNLOCK on unit_lock.
(close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of
LOCK and UNLOCK on unit_lock.
(close_units): Likewise.
(newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
unit_lock.
* io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
instead of LOCK and UNLOCK on unit_lock.
(flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
of LOCK and UNLOCK on unit_lock.



It looks like this has broken builds on arm-none-eabi when using newlib:

In file included from 
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran

/runtime/error.c:27:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h: In 
function

‘dec_waiting_unlocked’:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1023:3: error
: implicit declaration of function ‘WRLOCK’ 
[-Wimplicit-function-declaration]

  1023 |   WRLOCK (&unit_rwlock);
   |   ^~
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1025:3: error
: implicit declaration of function ‘RWUNLOCK’ 
[-Wimplicit-function-declaration]

  1025 |   RWUNLOCK (&unit_rwlock);
   |   ^~~~


R.


Hi Richard,

The root cause is that the macro WRLOCK and RWUNLOCK are not defined in 
io.h. The reason of x86 platform not failed is that 
HAVE_ATOMIC_FETCH_ADD is defined then caused above macros were never 
been used. Code logic show as below:

#ifdef HAVE_ATOMIC_FETCH_ADD
   (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED);
#else
   WRLOCK (&unit_rwlock);
   u->waiting--;
   RWUNLOCK (&unit_rwlock);
#endif

I just draft a patch try to fix this bug, because I didn't have arm 
platform, would you help to validate if it was fixed on arm platform?


diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h
index 15daa0995b1..c7f0f7d7d9e 100644
--- a/libgfortran/io/io.h
+++ b/libgfortran/io/io.h
@@ -1020,9 +1020,15 @@ dec_waiting_unlocked (gfc_unit *u)
  #ifdef HAVE_ATOMIC_FETCH_ADD
    (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED);
  #else
-  WRLOCK (&unit_rwlock);
+#ifdef __GTHREAD_RWLOCK_INIT
+  __gthread_rwlock_wrlock (&unit_rwlock);
+  u->waiting--;
+  __gthread_rwlock_unlock (&unit_rwlock);
+#else
+  __gthread_mutex_lock (&unit_rwlock);
    u->waiting--;
-  RWUNLOCK (&unit_rwlock);
+  __gthread_mutex_unlock (&unit_rwlock);
+#endif
  #endif
  }


Lipeng Zhu


Hi Lipeng,

Thanks for the quick reply.  I can confirm that with the above change 
the bootstrap failure is fixed.  However, this shouldn't be considered a 
formal review; libgfortran is not really my area.


I'll be away now until January 2nd.

Richard.