[gcc r15-1784] gcc: docs: Fix documentation of two hooks

2024-07-02 Thread Matthew Malcomson via Gcc-cvs
https://gcc.gnu.org/g:fef7b8ce9029a0538d0c051b3100ed3deb544b2b

commit r15-1784-gfef7b8ce9029a0538d0c051b3100ed3deb544b2b
Author: Matthew Malcomson 
Date:   Tue Jul 2 11:11:11 2024 +0100

gcc: docs: Fix documentation of two hooks

The `function_attribute_inlinable_p` hook documentation described it
returning the value if it is OK to inline the provided fndecl into "the
current function".  AFAICS This hook is only called when
`current_function_decl` is the same as the `fndecl` argument that the
hook is given, hence asking whether `fndecl` can be inlined into "the
current function" doesn't seem relevant.  Moreover from what I see no
existing implementation of `function_attribute_inlinable_p` uses "the
current function" in any way.
Update the documentation to match this understanding.

The `unspec_may_trap_p` documentation mentioned applying to either
`unspec` or `unspec_volatile`.  AFAICS this hook is only used for
`unspec` codes since c84a808e493a, so I removed the mention of
`unspec_volatile`.

gcc/ChangeLog:

* doc/tm.texi: Regenerated.
* target.def (function_attribute_inlinable_p,
unspec_may_trap_p): Update documentation.

Diff:
---
 gcc/doc/tm.texi | 18 --
 gcc/target.def  | 18 --
 2 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 24c92e2e733..f10d9a59c66 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -10752,10 +10752,10 @@ attribute handlers.  So far this only affects the 
@var{noinit} and
 
 @deftypefn {Target Hook} bool TARGET_FUNCTION_ATTRIBUTE_INLINABLE_P 
(const_tree @var{fndecl})
 @cindex inlining
-This target hook returns @code{true} if it is OK to inline @var{fndecl}
-into the current function, despite its having target-specific
-attributes, @code{false} otherwise.  By default, if a function has a
-target specific attribute attached to it, it will not be inlined.
+This target hook returns @code{false} if the target-specific attributes on
+@var{fndecl} always block it getting inlined, @code{true} otherwise.  By
+default, if a function has a target specific attribute attached to it, it
+will not be inlined.
 @end deftypefn
 
 @deftypefn {Target Hook} bool TARGET_OPTION_VALID_ATTRIBUTE_P (tree 
@var{fndecl}, tree @var{name}, tree @var{args}, int @var{flags})
@@ -12253,12 +12253,10 @@ allocation.
 @end deftypefn
 
 @deftypefn {Target Hook} int TARGET_UNSPEC_MAY_TRAP_P (const_rtx @var{x}, 
unsigned @var{flags})
-This target hook returns nonzero if @var{x}, an @code{unspec} or
-@code{unspec_volatile} operation, might cause a trap.  Targets can use
-this hook to enhance precision of analysis for @code{unspec} and
-@code{unspec_volatile} operations.  You may call @code{may_trap_p_1}
-to analyze inner elements of @var{x} in which case @var{flags} should be
-passed along.
+This target hook returns nonzero if @var{x}, an @code{unspec} might cause
+a trap.  Targets can use this hook to enhance precision of analysis for
+@code{unspec} operations.  You may call @code{may_trap_p_1} to analyze inner
+elements of @var{x} in which case @var{flags} should be passed along.
 @end deftypefn
 
 @deftypefn {Target Hook} void TARGET_SET_CURRENT_FUNCTION (tree @var{decl})
diff --git a/gcc/target.def b/gcc/target.def
index e6f4df963f0..ce4d1ecd58b 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2314,10 +2314,10 @@ attribute handlers.  So far this only affects the 
@var{noinit} and\n\
 DEFHOOK
 (function_attribute_inlinable_p,
  "@cindex inlining\n\
-This target hook returns @code{true} if it is OK to inline @var{fndecl}\n\
-into the current function, despite its having target-specific\n\
-attributes, @code{false} otherwise.  By default, if a function has a\n\
-target specific attribute attached to it, it will not be inlined.",
+This target hook returns @code{false} if the target-specific attributes on\n\
+@var{fndecl} always block it getting inlined, @code{true} otherwise.  By\n\
+default, if a function has a target specific attribute attached to it, it\n\
+will not be inlined.",
  bool, (const_tree fndecl),
  hook_bool_const_tree_false)
 
@@ -4067,12 +4067,10 @@ allocation.",
FLAGS has the same meaning as in rtlanal.cc: may_trap_p_1.  */
 DEFHOOK
 (unspec_may_trap_p,
- "This target hook returns nonzero if @var{x}, an @code{unspec} or\n\
-@code{unspec_volatile} operation, might cause a trap.  Targets can use\n\
-this hook to enhance precision of analysis for @code{unspec} and\n\
-@code{unspec_volatile} operations.  You may call @code{may_trap_p_1}\n\
-to analyze inner elements of @var{x} in which case @var{flags} should be\n\
-passed along.",
+ "This target hook returns nonzero if @var{x}, an @code{unspec} might cause\n\
+a trap.  Targets can use this hook to enhance precision of analysis for\n\
+@code{unspec} operations.  You may call @code{may_trap_p_1} to analyze inner\n\
+elements of @var{x} in which case 

[gcc r15-2244] [MAINTAINERS] Update email and move to DCO

2024-07-24 Thread Matthew Malcomson via Gcc-cvs
https://gcc.gnu.org/g:93ced50d1cf6ac0855934cbbc5eb08e870d7949c

commit r15-2244-g93ced50d1cf6ac0855934cbbc5eb08e870d7949c
Author: Matthew Malcomson 
Date:   Tue Jul 23 14:56:41 2024 +0100

[MAINTAINERS] Update email and move to DCO

* MAINTAINERS: Update my email address.

Signed-off-by: Matthew Malcomson 

Diff:
---
 MAINTAINERS | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 200a223b431f..542d058d727c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -636,7 +636,7 @@ Luis Machadoluisgpm 

 Ziga Mahkovec   ziga
 Vladimir Makarovvmakarov
 David Malcolm   dmalcolm
-Matthew Malcomson   matmal01
+Matthew Malcomson   matmal01
 Mikhail Maltsev miyuki  
 Jose E. Marchesijemarch 
 Stamatis Markianos-Wright   stammark
@@ -922,6 +922,7 @@ Jeff Law

 Jeff Law
 Jeff Law
 H.J. Lu 
+Matthew Malcomson   
 Immad Mir   
 Gaius Mulley
 Andrew Pinski   


[gcc r15-6042] c++: Allow overloaded builtins to be used in SFINAE context

2024-12-09 Thread Matthew Malcomson via Gcc-cvs
https://gcc.gnu.org/g:9ed094a817ecaf43c79505286759b88eb0555be2

commit r15-6042-g9ed094a817ecaf43c79505286759b88eb0555be2
Author: Matthew Malcomson 
Date:   Mon Oct 7 16:42:41 2024 +0100

c++: Allow overloaded builtins to be used in SFINAE context

This commit newly introduces the ability to use overloaded builtins in
C++ SFINAE context.

The goal behind this is in order to ensure there is a single mechanism
that libstdc++ can use to determine whether a given type can be used in
the atomic fetch_add (and similar) builtins.  I am working on another
patch that hopes to use this mechanism to identify whether fetch_add
(and similar) work on floating point types.

Current state of the world:

GCC currently exposes resolved versions of these builtins to the
user, so for GCC it's currently possible to use tests similar to the
below to check for atomic loads on a 2 byte sized object.
  #if __has_builtin(__atomic_load_2)
Clang does not expose resolved versions of the atomic builtins.

clang currently allows SFINAE on builtins, so that C++ code can
check whether a builtin is available on a given type.
GCC does not (and that is what this patch aims to change).

C libraries like libatomic can check whether a given atomic builtin
can work on a given type by using autoconf to check for a
miscompilation when attempting such a use.

My goal:
I would like to enable floating point fetch_add (and similar) in
GCC, in order to use those overloads in libstdc++ implementation of
atomic::fetch_add.
This should allow compilers targeting GPU's which have floating
point fetch_add instructions to emit optimal code.

In order to do that I need some consistent mechanism that libstdc++
can use to identify whether the fetch_add builtins have floating
point overloads (and for which types these exist).

I would hence like to enable SFINAE on builtins, so that libstdc++
can use that mechanism for the floating point fetch_add builtins.

Implementation follows the existing mechanism for handling SFINAE
contexts in c-common.cc.  A boolean is passed into the c-common.cc
function indicating whether these functions should emit errors or not.
This boolean comes from `complain & tf_error` in the C++ frontend.
(Similar to other functions like valid_array_size_p and
c_build_vec_perm_expr).

This is done both for resolve_overloaded_builtin and
check_builtin_function_arguments, both of which can be used in SFINAE
contexts.
I attempted to trigger something using the `reject_gcc_builtin`
function in an SFINAE context.  Given the context where this
function is called from the C++ frontend it looks like it may be
possible, but I did not manage to trigger this in template context
by attempting to do something similar to the testcases added around
those calls.
- I would appreciate any feedback on whether this is something that
  can happen in a template context, and if so some help writing a
  relevant testcase for it.

Both of these functions have target hooks for target specific builtins
that I have updated to take the extra boolean flag.  I have not adjusted
the functions implementing those target hooks (except to update the
declarations) so target specific builtins will still error in SFINAE
contexts.
- I could imagine not updating the target hook definition since nothing
  would use that change.  However I figure that allowing targets to
  decide this behaviour would be the right thing to do eventually, and
  since this is the target-independent part of the change to do that
  this patch should make that change.
  Could adjust if others disagree.

Other relevant points that I'd appreciate reviewers check:
- I did not pass this new flag through
  atomic_bitint_fetch_using_cas_loop since the _BitInt type is not
  available in the C++ frontend and I didn't want if conditions that can
  not be executed in the source.
- I only test non-compile-time-constant types with SVE types, since I do
  not know of a way to get a VLA into a SFINAE context.
- While writing tests I noticed a few differences with clang in this
  area.  I don't think they are problematic but am mentioning them for
  completeness and to allow others to judge if these are a problem).
  - atomic_fetch_add on a boolean is allowed by clang.
  - When __atomic_load is passed an invalid memory model (i.e. too
large), we give an SFINAE failure while clang does not.

Bootstrap and regression tested on AArch64 and x86_64.
Built first stage on targets whose target hook declaration needed
updated (though did not regtest etc).  T

[gcc r15-6051] clang-format AlwaysBreakAfterReturnType to TopLevelDefinitions

2024-12-09 Thread Matthew Malcomson via Gcc-cvs
https://gcc.gnu.org/g:0c83096f19b075ac1b80113828188a0fd64400af

commit r15-6051-g0c83096f19b075ac1b80113828188a0fd64400af
Author: Matthew Malcomson 
Date:   Mon Dec 9 10:51:44 2024 +

clang-format AlwaysBreakAfterReturnType to TopLevelDefinitions

The previous value of TopLevel meant that the function name of
declarations would also be on a new line.  THis does not match the
current formatting of headers.

Manual testing done on c-common.h.

Also set BraceWrapping.BeforeWhile to true to match the formatting
specified for do/while loops in GNU coding standards.
https://www.gnu.org/prep/standards/standards.html#Formatting

Ok for trunk?

contrib/ChangeLog:

* clang-format: AlwaysBreakAfterReturnType set to
TopLevelDefinitions and BraceWrapping.BeforeWhile set to true.

Signed-off-by: Matthew Malcomson 

Diff:
---
 contrib/clang-format | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/clang-format b/contrib/clang-format
index 95f1455c14db..4ed50ab6b268 100644
--- a/contrib/clang-format
+++ b/contrib/clang-format
@@ -27,7 +27,7 @@
 ---
 Language: Cpp
 AccessModifierOffset: -2
-AlwaysBreakAfterReturnType: TopLevel
+AlwaysBreakAfterReturnType: TopLevelDefinitions
 BinPackArguments: true
 BinPackParameters: true
 BraceWrapping:
@@ -42,6 +42,7 @@ BraceWrapping:
   AfterUnion: true
   BeforeCatch: true
   BeforeElse: true
+  BeforeWhile: true
   IndentBraces: true
   SplitEmptyFunction: false
 BreakBeforeBinaryOperators: All


[gcc r15-5989] clang-format BraceWrapping.AfterCaseLabel to true

2024-12-06 Thread Matthew Malcomson via Gcc-cvs
https://gcc.gnu.org/g:788334199917aaa02f696abb0bca02bf9a460547

commit r15-5989-g788334199917aaa02f696abb0bca02bf9a460547
Author: Matthew Malcomson 
Date:   Tue Dec 3 22:13:40 2024 +

clang-format BraceWrapping.AfterCaseLabel to true

This setting seems to better match the indentation that is used in GCC.

Adds an exra level of indentation after braces in a case statement.

Only manual testing done on the switch statements in
c-common.cc:resolve_overloaded_builtin and
alias.cc:record_component_aliases.

Ok for trunk?

contrib/ChangeLog:

* clang-format: Set BraceWrapping.AfterCaseLabel.

Signed-off-by: Matthew Malcomson 

Diff:
---
 contrib/clang-format | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/clang-format b/contrib/clang-format
index f4f708648eb9..95f1455c14db 100644
--- a/contrib/clang-format
+++ b/contrib/clang-format
@@ -32,6 +32,7 @@ BinPackArguments: true
 BinPackParameters: true
 BraceWrapping:
   AfterClass: true
+  AfterCaseLabel: true
   AfterControlStatement: true
   AfterEnum: true
   AfterFunction: true


[gcc r15-7587] gcc: testsuite: Fix builtin-speculation-overloads[14].C testism

2025-02-17 Thread Matthew Malcomson via Gcc-cvs
https://gcc.gnu.org/g:9335ff73a509a1f203de691052d600facd07c3f8

commit r15-7587-g9335ff73a509a1f203de691052d600facd07c3f8
Author: Matthew Malcomson 
Date:   Mon Feb 10 16:24:20 2025 +

gcc: testsuite: Fix builtin-speculation-overloads[14].C testism

When making warnings trigger a failure in template substitution I
could not find any way to trigger the warning about builtin speculation
not being available on the given target.

Turns out I misread the code -- this warning happens when the
speculation_barrier pattern is not defined.

Here we add an effective target to represent
"__builtin_speculation_safe_value is available on this target" and use
that to adjust our test on SFINAE behaviour accordingly.
N.b. this means that we get extra testing -- not just that things work
on targets which support __builtin_speculation_safe_value, but also that
the behaviour works on targets which don't support it.

Tested with AArch64 native, AArch64 cross compiler, and RISC-V cross
compiler (just running the tests that I've changed).

Ok for trunk?

gcc/testsuite/ChangeLog:

PR target/117991
* g++.dg/template/builtin-speculation-overloads.def: SUCCESS
argument in SPECULATION_ASSERTS now uses a macro `true_def`
instead of the literal `true` for arguments which should work
with `__builtin_speculation_safe_value`.
* g++.dg/template/builtin-speculation-overloads1.C: Define
`true_def` macro on command line to compiler according to the
effective target representing that
`__builtin_speculation_safe_value` does something on this
target.
* g++.dg/template/builtin-speculation-overloads4.C: Likewise.
* lib/target-supports.exp
(check_effective_target_speculation_barrier_defined): New.

Signed-off-by: Matthew Malcomson 

Diff:
---
 gcc/testsuite/g++.dg/template/builtin-speculation-overloads.def | 9 ++---
 gcc/testsuite/g++.dg/template/builtin-speculation-overloads1.C  | 2 ++
 gcc/testsuite/g++.dg/template/builtin-speculation-overloads4.C  | 2 ++
 gcc/testsuite/lib/target-supports.exp   | 9 +
 4 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/g++.dg/template/builtin-speculation-overloads.def 
b/gcc/testsuite/g++.dg/template/builtin-speculation-overloads.def
index 39d9b748d524..ada13e6f77c3 100644
--- a/gcc/testsuite/g++.dg/template/builtin-speculation-overloads.def
+++ b/gcc/testsuite/g++.dg/template/builtin-speculation-overloads.def
@@ -15,14 +15,17 @@ class X{};
 class Large { public: int arr[10]; };
 class Incomplete;
 
+/* Using `true_def` in order to account for the fact that if this target
+ * doesn't support __builtin_speculation_safe_value at all everything fails to
+ * substitute.  */
 #define SPECULATION_ASSERTS
\
-  MAKE_SPECULATION_ASSERT (int, true)  
\
+  MAKE_SPECULATION_ASSERT (int, true_def)  
\
   MAKE_SPECULATION_ASSERT (float, false)   
\
   MAKE_SPECULATION_ASSERT (X, false)   
\
   MAKE_SPECULATION_ASSERT (Large, false)   
\
   MAKE_SPECULATION_ASSERT (Incomplete, false)  
\
-  MAKE_SPECULATION_ASSERT (int *, true)
\
-  MAKE_SPECULATION_ASSERT (long, true)
+  MAKE_SPECULATION_ASSERT (int *, true_def)
\
+  MAKE_SPECULATION_ASSERT (long, true_def)
 
 int main() {
 SPECULATION_ASSERTS
diff --git a/gcc/testsuite/g++.dg/template/builtin-speculation-overloads1.C 
b/gcc/testsuite/g++.dg/template/builtin-speculation-overloads1.C
index bc8f1083a994..4c50d4aa6f5e 100644
--- a/gcc/testsuite/g++.dg/template/builtin-speculation-overloads1.C
+++ b/gcc/testsuite/g++.dg/template/builtin-speculation-overloads1.C
@@ -1,5 +1,7 @@
 /* Check that overloaded builtins can be used in templates with SFINAE.  */
 // { dg-do compile { target c++17 } }
+// { dg-additional-options "-Dtrue_def=true" { target 
speculation_barrier_defined } }
+// { dg-additional-options "-Dtrue_def=false" { target { ! 
speculation_barrier_defined } } }
 
 /* Checks performed here:
Various types (some that work, some that don't).  */
diff --git a/gcc/testsuite/g++.dg/template/builtin-speculation-overloads4.C 
b/gcc/testsuite/g++.dg/template/builtin-speculation-overloads4.C
index c024a21fa18e..cc0b3131af77 100644
--- a/gcc/testsuite/g++.dg/template/builtin-speculation-overloads4.C
+++ b/gcc/testsuite/g++.dg/template/builtin-speculation-overloads4.C
@@ -1,5 +1,7 @@
 /* Check that overloaded builtins can be used in templates with SFINAE.  */
 // { dg-do compile { target c++17 } }
+// { dg-additional-o

[gcc] Created branch 'matmal01/heads/openmp-pr119588' in namespace 'refs/users'

2025-09-10 Thread Matthew Malcomson via Gcc-cvs
The branch 'matmal01/heads/openmp-pr119588' was created in namespace 
'refs/users' pointing to:

 f7babdba2650... [RFC] Implement "flat" barrier in libgomp


[gcc(refs/users/matmal01/heads/openmp-pr119588)] [RFC] Implement "flat" barrier in libgomp

2025-09-17 Thread Matthew Malcomson via Gcc-cvs
https://gcc.gnu.org/g:f7babdba2650aad72fb189f0988650e009ef3041

commit f7babdba2650aad72fb189f0988650e009ef3041
Author: Matthew Malcomson 
Date:   Wed Jun 25 15:30:53 2025 +0100

[RFC] Implement "flat" barrier in libgomp

Working on PR119588.  From that PR:
We've seen on some internal workloads (NVPL BLAS running GEMM routine
on a small matrix) that the overhead of a `#pragma omp parallel`
statement when running with a high number of cores (72 or 144) is much
higher with the libgomp implementation than with LLVM's libomp.

In a program which has both some work that can be handled with high
parallelism (so OMP is running with many threads) and a large number
of small pieces of work that need to be performed with low overhead,
this has been seen to cause a significant overhead when accumulated.

--
This patch is still not complete -- especially around spreading the
new interface to the nvptx and gcn targets -- but I've made more
progress and would like to gather feedback on whether people think
this is a reasonable approach.

Points that I would like feedback on:
- Is this extra complexity OK, especially for the futex_waitv
  fallback?
  - I have thought about creating yet another target (in the same way
that the linux/ target is for kernels where `futex` is available).
Would this approach be favoured?
- Are the changes in `gomp_team_start` significant enough for other
  targets that it's worth bringing it into the linux/ target?
  Similar to what I see the nvptx/ target has done for
  `gomp_team_start`  Could have some `ifdef` for the alternate barrier
  API rather than try to modify all the other interfaces to match a
  slightly modified API.
  - As far as I can tell so far the only modifications required by
these changes are in the posix/ target and I've made them.  I
haven't yet checked 100% but I believe the nvptx and gcn targets
don't use this function (believe because they don't define the
`reinit` function and guess they don't use pthreads).

--
This patch introduces a barrier that performs better on the
microbenchmark provided in PR119588.  I have ran various higher-level
benchmarks and have not seen anything I can claim as outside noise
range so far -- though I'm working on reducing the noise in my
benchmarks so hopefully will be able to say that soon.

This patch is very similar to that I added on the PR a month and a bit
ago.  Here I have extended the barrier added there to also be used for
cancellable barriers and to handle tasking (the two main things left
outstanding in the functionality of that patch).

The outline of the barrier is:
1) Each thread associated with a barrier has its own ID (currently the
   team_id of the team that the current thread is a part of).
2) When a thread arrives at the barrier it marks its "arrived flag" as
   having arrived.
3) ID 0 (which is always the thread which does the bookkeeping for
   this team, at the top-level it's the primary thread) waits on each
   of the other threads "arrived flags".  It executes tasks while
   waiting on these threads to arrive.
4) Once ID 0 sees all other threads have arrived it signals to all
   secondary threads that the barrier is complete via incrementing the
   global generation (as usual).

Fundamental differences being that we have thread-local "arrived"
flags instead of a single counter.  That means there is less
contention and appears to speed up the barrier significantly when
threads are hitting the barrier very hard.

Other differences are:
1) The "coordinating" thread is pre-determined rather than "whichever
   thread hits the barrier last".
   - This has some knock-on effects w.r.t. how the barrier is used.
 (See e.g. the changes in work.c).
   - This also means that the behaviour of the pre-determined
 "coordinating" thread must be more complicated while it's waiting
 on other threads to arrive.  Especially see the task handling in
 `gomp_team_barrier_ensure_last`.
   - Because we assign the "coordinating" thread to be the primary
 thread this does mean that in-between points (3) and (4) above
 the primary thread can see the results of the operations of all
 secondary threads.  A point important for another optimisation I
 hope to make later where we only go through one barrier per
 iteration in the main execution loop of `gomp_thread_start` for
 top-level threads.
2) When a barrier needs to be adjusted so that different threads have
   different ID's we must serialise all threads before this adjustment.
   - The previous design had no assignment of threads to s