Re: [PATCH v6] libgfortran: Replace mutex with rwlock

2023-12-08 Thread Jakub Jelinek
On Fri, Aug 18, 2023 at 11:18:19AM +0800, Zhu, Lipeng wrote:
> From: Lipeng Zhu 
> 
> This patch try to introduce the rwlock and split the read/write to
> unit_root tree and unit_cache with rwlock instead of the mutex to
> increase CPU efficiency. In the get_gfc_unit function, the percentage
> to step into the insert_unit function is around 30%, in most instances,
> we can get the unit in the phase of reading the unit_cache or unit_root
> tree. So split the read/write phase by rwlock would be an approach to
> make it more parallel.
> 
> BTW, the IPC metrics can gain around 9x in our test
> server with 220 cores. The benchmark we used is
> https://github.com/rwesson/NEAT
> 
> libgcc/ChangeLog:
> 
> * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro
> (__gthrw): New function
> (__gthread_rwlock_rdlock): New function
> (__gthread_rwlock_tryrdlock): New function
> (__gthread_rwlock_wrlock): New function
> (__gthread_rwlock_trywrlock): New function
> (__gthread_rwlock_unlock): New function
> 
> libgfortran/ChangeLog:
> 
> * io/async.c (DEBUG_LINE): New
> * io/async.h (RWLOCK_DEBUG_ADD): New macro
> (CHECK_RDLOCK): New macro
> (CHECK_WRLOCK): New macro
> (TAIL_RWLOCK_DEBUG_QUEUE): New macro
> (IN_RWLOCK_DEBUG_QUEUE): New macro
> (RDLOCK): New macro
> (WRLOCK): New macro
> (RWUNLOCK): New macro
> (RD_TO_WRLOCK): New macro
> (INTERN_RDLOCK): New macro
> (INTERN_WRLOCK): New macro
> (INTERN_RWUNLOCK): New macro
> * io/io.h (internal_proto): Define unit_rwlock
> * io/transfer.c (st_read_done_worker): Relace unit_lock with unit_rwlock
> (st_write_done_worker): Relace unit_lock with unit_rwlock
> * io/unit.c (get_gfc_unit): Relace unit_lock with unit_rwlock
> (if): Relace unit_lock with unit_rwlock
> (close_unit_1): Relace unit_lock with unit_rwlock
> (close_units): Relace unit_lock with unit_rwlock
> (newunit_alloc): Relace unit_lock with unit_rwlock
> * io/unix.c (flush_all_units): Relace unit_lock with unit_rwlock

The changeLog entries are all incorrect.
1) they should be indented by a tab, not 4 spaces, and should end with
   a dot
2) when several consecutive descriptions have the same text, especially
   when it is long, it should use Likewise. for the 2nd and following
3) (internal_proto) is certainly not what you've changed, from what I can
   see in io.h you've done:
* io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
a comment.
(unit_lock): Remove including associated internal_proto.
(unit_rwlock): New declarations including associated internal_proto.
(dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
instead of __gthread_mutex_lock and __gthread_mutex_unlock on
unit_lock.
   (if) is certainly not what you've changed either, always find what
   function or macro the change was in, or if you remove something, state
   it, if you add something, state it.
4) all the
   Replace unit_lock with unit_rwlock. descriptions only partially match
   reality, you've also changed the operations on those variables.

> --- a/libgfortran/io/async.h
> +++ b/libgfortran/io/async.h
> @@ -207,9 +207,132 @@
>  INTERN_LOCK (&debug_queue_lock); \
>  MUTEX_DEBUG_ADD (mutex); \
>  INTERN_UNLOCK (&debug_queue_lock);   
> \
> -DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", 
> aio_prefix, #mutex, mutex); \
> +DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", 
> aio_prefix, #mutex, \
> +  mutex); \

Why are you changing this at all?

> +#define RD_TO_WRLOCK(rwlock) \
> +  RWUNLOCK (rwlock);\

At least a space before \ (or better tab

> +#define RD_TO_WRLOCK(rwlock) \
> +  RWUNLOCK (rwlock);\

Likewise.

> +  WRLOCK (rwlock);
> +#endif
> +#endif
> +
> +#ifndef __GTHREAD_RWLOCK_INIT
> +#define RDLOCK(rwlock) LOCK (rwlock)
> +#define WRLOCK(rwlock) LOCK (rwlock)
> +#define RWUNLOCK(rwlock) UNLOCK (rwlock)
> +#define RD_TO_WRLOCK(rwlock) {}

do {} while (0)
instead of {}
?

>  #endif
>  
>  #define INTERN_LOCK(mutex) T_ERROR (__gthread_mutex_lock, mutex);
>  
>  #define INTERN_UNLOCK(mutex) T_ERROR (__gthread_mutex_unlock, mutex);
>  
> +#define INTERN_RDLOCK(rwlock) T_ERROR (__gthread_rwlock_rdlock, rwlock);
> +#define INTERN_WRLOCK(rwlock) T_ERROR (__gthread_rwlock_wrlock, rwlock);
> +#define INTERN_RWUNLOCK(rwlock) T_ERROR (__gthread_rwlock_unlock, rwlock);

Admittedly preexisting issue, but I wonder why the ; at the end, all the
uses of RDLOCK etc. I've seen were followed by ;

> --- a/libgfortran/io/unit.c
> +++ b/libgfortran/io/unit.c
> @@ -33,34 +33,36 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
> If not, see
>  
>  
>  /* IO locking rules:
> -   UNIT_LOCK is a master lock, protecting UNIT_ROOT tree and UNIT_CACHE.
> +   UNIT_RWLOCK is a master lock, protect

[PATCH] Fortran: allow NULL() for POINTER, OPTIONAL, CONTIGUOUS dummy [PR111503]

2023-12-08 Thread Harald Anlauf
Dear all,

here's another fix for the CONTIGUOUS attribute: NULL() should
derive its characteristics from its MOLD argument; otherwise it is
"determined by the entity with which the reference is associated".
(F2018:16.9.144).

The testcase is cross-checked with Intel.
NAG rejects cases where MOLD is a pointer.  I think it is wrong here.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From c73b248ec16388ed1ce109fce8a468a87e367085 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Fri, 8 Dec 2023 11:11:08 +0100
Subject: [PATCH] Fortran: allow NULL() for POINTER, OPTIONAL, CONTIGUOUS dummy
 [PR111503]

gcc/fortran/ChangeLog:

	PR fortran/111503
	* expr.cc (gfc_is_simply_contiguous): Determine characteristics of
	NULL() from MOLD argument if present, otherwise treat as present.
	* primary.cc (gfc_variable_attr): Derive attributes of NULL(MOLD)
	from MOLD.

gcc/testsuite/ChangeLog:

	PR fortran/111503
	* gfortran.dg/contiguous_14.f90: New test.
---
 gcc/fortran/expr.cc | 14 
 gcc/fortran/primary.cc  |  4 ++-
 gcc/testsuite/gfortran.dg/contiguous_14.f90 | 39 +
 3 files changed, 56 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/contiguous_14.f90

diff --git a/gcc/fortran/expr.cc b/gcc/fortran/expr.cc
index c668baeef8c..709f3c3cbef 100644
--- a/gcc/fortran/expr.cc
+++ b/gcc/fortran/expr.cc
@@ -5958,6 +5958,20 @@ gfc_is_simply_contiguous (gfc_expr *expr, bool strict, bool permit_element)
   if (expr->expr_type == EXPR_ARRAY)
 return true;

+  if (expr->expr_type == EXPR_NULL)
+{
+  /* F2018:16.9.144  NULL ([MOLD]):
+	 "If MOLD is present, the characteristics are the same as MOLD."
+	 "If MOLD is absent, the characteristics of the result are
+	 determined by the entity with which the reference is associated."
+	 F2018:15.3.2.2 characteristics attributes include CONTIGUOUS.  */
+  if (expr->ts.type == BT_UNKNOWN)
+	return true;
+  else
+	return (gfc_variable_attr (expr, NULL).contiguous
+		|| gfc_variable_attr (expr, NULL).allocatable);
+}
+
   if (expr->expr_type == EXPR_FUNCTION)
 {
   if (expr->value.function.isym)
diff --git a/gcc/fortran/primary.cc b/gcc/fortran/primary.cc
index 7278932b634..f8a1c09d190 100644
--- a/gcc/fortran/primary.cc
+++ b/gcc/fortran/primary.cc
@@ -2627,7 +2627,9 @@ gfc_variable_attr (gfc_expr *expr, gfc_typespec *ts)
   gfc_component *comp;
   bool has_inquiry_part;

-  if (expr->expr_type != EXPR_VARIABLE && expr->expr_type != EXPR_FUNCTION)
+  if (expr->expr_type != EXPR_VARIABLE
+  && expr->expr_type != EXPR_FUNCTION
+  && !(expr->expr_type == EXPR_NULL && expr->ts.type != BT_UNKNOWN))
 gfc_internal_error ("gfc_variable_attr(): Expression isn't a variable");

   sym = expr->symtree->n.sym;
diff --git a/gcc/testsuite/gfortran.dg/contiguous_14.f90 b/gcc/testsuite/gfortran.dg/contiguous_14.f90
new file mode 100644
index 000..21e42311e9c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/contiguous_14.f90
@@ -0,0 +1,39 @@
+! { dg-do compile }
+! PR fortran/111503 - passing NULL() to POINTER, OPTIONAL, CONTIGUOUS dummy
+
+program test
+  implicit none
+  integer, pointer, contiguous :: p(:) => null()
+  integer, allocatable, target :: a(:)
+  type t
+ integer, pointer, contiguous :: p(:) => null()
+ integer, allocatable :: a(:)
+  end type t
+  type(t),   target :: z
+  class(t), allocatable, target :: c
+  print *, is_contiguous (p)
+  allocate (t :: c)
+  call one (p)
+  call one ()
+  call one (null ())
+  call one (null (p))
+  call one (a)
+  call one (null (a))
+  call one (z% p)
+  call one (z% a)
+  call one (null (z% p))
+  call one (null (z% a))
+  call one (c% p)
+  call one (c% a)
+  call one (null (c% p))
+  call one (null (c% a))
+contains
+  subroutine one (x)
+integer, pointer, optional, contiguous, intent(in) :: x(:)
+print *, present (x)
+if (present (x)) then
+   print *, "->", associated (x)
+   if (associated (x)) stop 99
+end if
+  end subroutine one
+end
--
2.35.3



Re: [PATCH] Fortran: allow NULL() for POINTER, OPTIONAL, CONTIGUOUS dummy [PR111503]

2023-12-08 Thread FX Coudert
Hi Harald,

> here's another fix for the CONTIGUOUS attribute: NULL() should
> derive its characteristics from its MOLD argument; otherwise it is
> "determined by the entity with which the reference is associated".
> (F2018:16.9.144).

Looking good to me, but leave 48 hours for someone else to object if they want.

Best,
FX

Re: [patch] OpenMP/Fortran: Implement omp allocators/allocate for ptr/allocatables

2023-12-08 Thread Jakub Jelinek
On Wed, Nov 08, 2023 at 05:58:10PM +0100, Tobias Burnus wrote:
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -11739,6 +11739,7 @@ builtin_fnspec (tree callee)
>   return ".cO ";
>/* Realloc serves both as allocation point and deallocation point.  */
>case BUILT_IN_REALLOC:
> +  case BUILT_IN_GOMP_REALLOC:
>   return ".Cw ";

I must say I've never been sure if one needs to specify those ". " for
integral arguments for which nothing is known; if they would need to be,
then also all the BUILT_IN_GOMP_* other cases would be wrong, but not just
those, also BUILT_IN_*_CHK (which have extra size_t argument) or
here BUILT_IN_REALLOC itself.  So, let's hope it is ok as is.

Otherwise, the middle-end changes look just fine to me, and for Fortran
FE I'm afraid you know it much more than I do.

Jakub



Re: [patch] OpenMP: Add uses_allocators support

2023-12-08 Thread Jakub Jelinek
On Mon, Nov 20, 2023 at 11:42:02AM +0100, Tobias Burnus wrote:
> 2023-11-19  Tobias Burnus  
>   Chung-Lin Tang 
> 
> gcc/ChangeLog:
> 
>   * builtin-types.def (BT_FN_VOID_PTRMODE):
>   (BT_FN_PTRMODE_PTRMODE_INT_PTR): Add.
>   * gimplify.cc (gimplify_bind_expr): Diagnose missing
>   uses_allocators clause.
>   (gimplify_scan_omp_clauses, gimplify_adjust_omp_clauses,
>   gimplify_omp_workshare): Handle uses_allocators.
>   * omp-builtins.def (BUILT_IN_OMP_INIT_ALLOCATOR,
>   BUILT_IN_OMP_DESTROY_ALLOCATOR): Add.
>   * omp-low.cc (scan_sharing_clauses):

Missing description.

> +static tree
> +c_parser_omp_clause_uses_allocators (c_parser *parser, tree list)
> +{
> +  location_t clause_loc = c_parser_peek_token (parser)->location;
> +  tree t = NULL_TREE, nl = list;
> +  matching_parens parens;
> +  if (!parens.require_open (parser))
> +return list;
> +
> +  tree memspace_expr = NULL_TREE;
> +  tree traits_var = NULL_TREE;
> +
> +  struct item_tok
> +  {
> +location_t loc;
> +tree id;
> +item_tok (void) : loc (UNKNOWN_LOCATION), id (NULL_TREE) {}
> +  };
> +  struct item { item_tok name, arg; };
> +  auto_vec *modifiers = NULL, *allocators = NULL;
> +  auto_vec *cur_list = new auto_vec (4);

This is certainly the first time I've seen pointers to auto_vec,
normally one uses just vec in such cases, auto_vec is used typically
on automatic variables to make sure the destruction is done.

But I think all the first parse it as a token soup without checking
anything and only in the second round actually check it is something
we've never done before in exactly the same situations.

The usual way would be to quickly peek at tokens to see if there is
: ahead and decide based on that.

See e.g. c_parser_omp_clause_allocate clause.

That has_modifiers check could be basically copied over with
the names of modifiers changed, or could be done in a loop, or
could be moved into a helper function which could be used
by c_parser_omp_clause_allocate and this function and perhaps
others, pass it the list of modifiers and have it return whether
there are modifiers or not.
c_parser_omp_clause_linear does this too (though, that has one of
the modifiers without arguments).

I'm afraid if parsing of every clause does the parsing so significantly
differently it will be a maintainance nightmare.

> @@ -23648,7 +23861,8 @@ c_parser_omp_target_exit_data (location_t loc, 
> c_parser *parser,
>   | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IN_REDUCTION) \
>   | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_THREAD_LIMIT) \
>   | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IS_DEVICE_PTR)\
> - | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR))
> + | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR)\

Space before \ please (and also  in the line above.

> +   if (strcmp (IDENTIFIER_POINTER (DECL_NAME (t)),
> +   "omp_null_allocator") == 0)
> + {
> +   error_at (OMP_CLAUSE_LOCATION (c),
> + "% cannot be used in "
> + "% clause");
> +   break;
> + }
> +
> +   if (OMP_CLAUSE_USES_ALLOCATORS_MEMSPACE (c)
> +   || OMP_CLAUSE_USES_ALLOCATORS_TRAITS (c))
> + {
> +   error_at (OMP_CLAUSE_LOCATION (c),
> + "modifiers cannot be used with pre-defined "
> + "allocators");
> +   break;
> + }
> + }
> +   t = OMP_CLAUSE_USES_ALLOCATORS_MEMSPACE (c);
> +   if (t != NULL_TREE
> +   && (TREE_CODE (t) != CONST_DECL
> +   || TREE_CODE (TREE_TYPE (t)) != ENUMERAL_TYPE
> +   || strcmp (IDENTIFIER_POINTER (TYPE_IDENTIFIER (TREE_TYPE 
> (t))),
> +  "omp_memspace_handle_t") != 0))
> + {
> +   error_at (OMP_CLAUSE_LOCATION (c), "memspace modifier must be "

Maybe % ?

> + "constant enum of % type");
> +   remove = true;
> +   break;
> + }
> +   t = OMP_CLAUSE_USES_ALLOCATORS_TRAITS (c);
> +   if (t != NULL_TREE)
> + {
> +   bool type_err = false;
> +
> +   if (TREE_CODE (TREE_TYPE (t)) != ARRAY_TYPE
> +   || DECL_SIZE (t) == NULL_TREE)
> + type_err = true;
> +   else
> + {
> +   tree elem_t = TREE_TYPE (TREE_TYPE (t));
> +   if (TREE_CODE (elem_t) != RECORD_TYPE
> +   || strcmp (IDENTIFIER_POINTER (TYPE_IDENTIFIER (elem_t)),
> +  "omp_alloctrait_t") != 0
> +   || !TYPE_READONLY (elem_t))
> + type_err = true;
> + }
> +   if (type_err)
> + {
> +   if (TREE_CODE (t) != ERROR_MARK)

t != error_mark_node

> + error_at (OMP_CLAUSE_LOCATION (c), "traits array %qE must "
> +  

Re: [Patch] OpenMP: Support acquires/release in 'omp require atomic_default_mem_order'

2023-12-08 Thread Jakub Jelinek
On Tue, Nov 28, 2023 at 12:28:05PM +0100, Tobias Burnus wrote:
> I stumbled over this omission when looking at Sandra's patch. It turned out 
> that this is
> a new OpenMP 5.2 feature - probably added to simplify/unify the syntax. I 
> guess the reason
> that release/acquire wasn't added before is that it cannot be universally be 
> used - read/write
> do only accept one of them.

I thought when this was discussed that it was meant to behave right (choose
some more appropriate memory model if one was not allowed), but reading 5.2
I think that is not what ended up in the spec, because [213:11-13] says that
atomic_default_mem_order is as if the argument appeared on any atomic
directive without explicit mem-order clause and atomic directive has the
[314:9-10] restrictions.

I'd bring this to omp-lang whether it was really meant that
#pragma omp requires atomic_default_mem_order (release)
int foo (int *p) {
  int t;
  #pragma omp atomic read
t = *p;
  return t;
}
and
#pragma omp requires atomic_default_mem_order (acquire)
void bar (int *p) {
  #pragma omp atomic write
*p = 0;
}
are meant to be invalid.

Another comment, atomic_default_mem_order arguments aren't handled
just in the requires parsing, but also in context selectors.
So, the additions would need to be reflected in
omp_check_context_selector and omp_context_selector_matches
as well.

Jakub