On 5/9/2023 10:32 AM, Zhu, Lipeng wrote:
On 1/1/1970 8:00 AM, Bernhard Reutner-Fischer wrote:
On Mon, 8 May 2023 17:44:43 +0800
Lipeng Zhu <lipeng....@intel.com> wrote:
This patch try to introduce the rwlock and split the read/write to
unit_root tree and unit_cache with rwlock instead of the mutex to
increase CPU efficiency. In the get_gfc_unit function, the percentage
to step into the insert_unit function is around 30%, in most
instances, we can get the unit in the phase of reading the unit_cache
or unit_root tree. So split the read/write phase by rwlock would be an
approach to make it more parallel.
BTW, the IPC metrics can gain around 9x in our test server with 220
cores. The benchmark we used is https://github.com/rwesson/NEAT
See commentary typos below.
You did not state if you regression tested the patch?
I use valgrind --tool=helgrind or --tool=drd to test 'make
check-fortran'. Is it necessary to add an additional unit test for this
patch?
Other than that it LGTM but i cannot approve it.
Thank you for your kind help for this patch, is there anything that I
can do or can you help to push this patch forward?
Hi Bernhard,
Is there any other refinement that need I to do for this patch?
Thanks.
diff --git a/libgfortran/io/async.h b/libgfortran/io/async.h index
ad226c8e856..0033cc74252 100644
--- a/libgfortran/io/async.h
+++ b/libgfortran/io/async.h
@@ -210,6 +210,128 @@
DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n",
aio_prefix, #mutex,
mutex); \
Thanks, corrected in Patch v5.
} while (0)
+#ifdef __GTHREAD_RWLOCK_INIT
+#define RWLOCK_DEBUG_ADD(rwlock) do { \
+ aio_rwlock_debug *n; \
+ n = xmalloc (sizeof(aio_rwlock_debug)); \
Missing space before the open brace: sizeof (
Thanks, corrected in Patch v5.
diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c index
82664dc5f98..62f1db21d34 100644
--- a/libgfortran/io/unit.c
+++ b/libgfortran/io/unit.c
@@ -33,34 +33,36 @@ see the files COPYING3 and COPYING.RUNTIME
respectively. If not, see
/* IO locking rules:
- UNIT_LOCK is a master lock, protecting UNIT_ROOT tree and
UNIT_CACHE.
+ UNIT_RWLOCK is a master lock, protecting UNIT_ROOT tree and
UNIT_CACHE.
+ And use the rwlock to spilt read and write phase to UNIT_ROOT tree
+ and UNIT_CACHE to increase CPU efficiency.
s/spilt/split. Maybe:
Using an rwlock improves efficiency by allowing us to separate readers
and writers of both UNIT_ROOT
and UNIT_CACHE.
Thanks, corrected in Patch v5.
@@ -350,6 +356,17 @@ retry:
if (c == 0)
break;
}
+ /* We did not find a unit in the cache nor in the unit list,
create a new
+ (locked) unit and insert into the unit list and cache.
+ Manipulating either or both the unit list and the unit cache
requires to
+ hold a write-lock [for obvious reasons]:
+ 1. By separating the read/write lock, it will greatly reduce the
contention
+ at the read part, while write part is not always necessary or
most
+ unlikely once the unit hit in cache.
+ By separating the read/write lock, we will greatly reduce the
contention
+ on the read part, while the write part is unlikely once the unit
hits
+ the cache.
+ 2. We try to balance the implementation complexity and the
performance
+ gains that fit into current cases we observed by just using a
+ pthread_rwlock. */
Let's drop 2.
Got it, thanks!
thanks,