[gcc r15-1784] gcc: docs: Fix documentation of two hooks
https://gcc.gnu.org/g:fef7b8ce9029a0538d0c051b3100ed3deb544b2b commit r15-1784-gfef7b8ce9029a0538d0c051b3100ed3deb544b2b Author: Matthew Malcomson Date: Tue Jul 2 11:11:11 2024 +0100 gcc: docs: Fix documentation of two hooks The `function_attribute_inlinable_p` hook documentation described it returning the value if it is OK to inline the provided fndecl into "the current function". AFAICS This hook is only called when `current_function_decl` is the same as the `fndecl` argument that the hook is given, hence asking whether `fndecl` can be inlined into "the current function" doesn't seem relevant. Moreover from what I see no existing implementation of `function_attribute_inlinable_p` uses "the current function" in any way. Update the documentation to match this understanding. The `unspec_may_trap_p` documentation mentioned applying to either `unspec` or `unspec_volatile`. AFAICS this hook is only used for `unspec` codes since c84a808e493a, so I removed the mention of `unspec_volatile`. gcc/ChangeLog: * doc/tm.texi: Regenerated. * target.def (function_attribute_inlinable_p, unspec_may_trap_p): Update documentation. Diff: --- gcc/doc/tm.texi | 18 -- gcc/target.def | 18 -- 2 files changed, 16 insertions(+), 20 deletions(-) diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 24c92e2e733..f10d9a59c66 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -10752,10 +10752,10 @@ attribute handlers. So far this only affects the @var{noinit} and @deftypefn {Target Hook} bool TARGET_FUNCTION_ATTRIBUTE_INLINABLE_P (const_tree @var{fndecl}) @cindex inlining -This target hook returns @code{true} if it is OK to inline @var{fndecl} -into the current function, despite its having target-specific -attributes, @code{false} otherwise. By default, if a function has a -target specific attribute attached to it, it will not be inlined. +This target hook returns @code{false} if the target-specific attributes on +@var{fndecl} always block it getting inlined, @code{true} otherwise. By +default, if a function has a target specific attribute attached to it, it +will not be inlined. @end deftypefn @deftypefn {Target Hook} bool TARGET_OPTION_VALID_ATTRIBUTE_P (tree @var{fndecl}, tree @var{name}, tree @var{args}, int @var{flags}) @@ -12253,12 +12253,10 @@ allocation. @end deftypefn @deftypefn {Target Hook} int TARGET_UNSPEC_MAY_TRAP_P (const_rtx @var{x}, unsigned @var{flags}) -This target hook returns nonzero if @var{x}, an @code{unspec} or -@code{unspec_volatile} operation, might cause a trap. Targets can use -this hook to enhance precision of analysis for @code{unspec} and -@code{unspec_volatile} operations. You may call @code{may_trap_p_1} -to analyze inner elements of @var{x} in which case @var{flags} should be -passed along. +This target hook returns nonzero if @var{x}, an @code{unspec} might cause +a trap. Targets can use this hook to enhance precision of analysis for +@code{unspec} operations. You may call @code{may_trap_p_1} to analyze inner +elements of @var{x} in which case @var{flags} should be passed along. @end deftypefn @deftypefn {Target Hook} void TARGET_SET_CURRENT_FUNCTION (tree @var{decl}) diff --git a/gcc/target.def b/gcc/target.def index e6f4df963f0..ce4d1ecd58b 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -2314,10 +2314,10 @@ attribute handlers. So far this only affects the @var{noinit} and\n\ DEFHOOK (function_attribute_inlinable_p, "@cindex inlining\n\ -This target hook returns @code{true} if it is OK to inline @var{fndecl}\n\ -into the current function, despite its having target-specific\n\ -attributes, @code{false} otherwise. By default, if a function has a\n\ -target specific attribute attached to it, it will not be inlined.", +This target hook returns @code{false} if the target-specific attributes on\n\ +@var{fndecl} always block it getting inlined, @code{true} otherwise. By\n\ +default, if a function has a target specific attribute attached to it, it\n\ +will not be inlined.", bool, (const_tree fndecl), hook_bool_const_tree_false) @@ -4067,12 +4067,10 @@ allocation.", FLAGS has the same meaning as in rtlanal.cc: may_trap_p_1. */ DEFHOOK (unspec_may_trap_p, - "This target hook returns nonzero if @var{x}, an @code{unspec} or\n\ -@code{unspec_volatile} operation, might cause a trap. Targets can use\n\ -this hook to enhance precision of analysis for @code{unspec} and\n\ -@code{unspec_volatile} operations. You may call @code{may_trap_p_1}\n\ -to analyze inner elements of @var{x} in which case @var{flags} should be\n\ -passed along.", + "This target hook returns nonzero if @var{x}, an @code{unspec} might cause\n\ +a trap. Targets can use this hook to enhance precision of analysis for\n\ +@code{unspec} operations. You may call @code{may_trap_p_1} to analyze inner\n\ +elements of @var{x} in which case
[gcc r15-2244] [MAINTAINERS] Update email and move to DCO
https://gcc.gnu.org/g:93ced50d1cf6ac0855934cbbc5eb08e870d7949c commit r15-2244-g93ced50d1cf6ac0855934cbbc5eb08e870d7949c Author: Matthew Malcomson Date: Tue Jul 23 14:56:41 2024 +0100 [MAINTAINERS] Update email and move to DCO * MAINTAINERS: Update my email address. Signed-off-by: Matthew Malcomson Diff: --- MAINTAINERS | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 200a223b431f..542d058d727c 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -636,7 +636,7 @@ Luis Machadoluisgpm Ziga Mahkovec ziga Vladimir Makarovvmakarov David Malcolm dmalcolm -Matthew Malcomson matmal01 +Matthew Malcomson matmal01 Mikhail Maltsev miyuki Jose E. Marchesijemarch Stamatis Markianos-Wright stammark @@ -922,6 +922,7 @@ Jeff Law Jeff Law Jeff Law H.J. Lu +Matthew Malcomson Immad Mir Gaius Mulley Andrew Pinski
[gcc r15-6042] c++: Allow overloaded builtins to be used in SFINAE context
https://gcc.gnu.org/g:9ed094a817ecaf43c79505286759b88eb0555be2 commit r15-6042-g9ed094a817ecaf43c79505286759b88eb0555be2 Author: Matthew Malcomson Date: Mon Oct 7 16:42:41 2024 +0100 c++: Allow overloaded builtins to be used in SFINAE context This commit newly introduces the ability to use overloaded builtins in C++ SFINAE context. The goal behind this is in order to ensure there is a single mechanism that libstdc++ can use to determine whether a given type can be used in the atomic fetch_add (and similar) builtins. I am working on another patch that hopes to use this mechanism to identify whether fetch_add (and similar) work on floating point types. Current state of the world: GCC currently exposes resolved versions of these builtins to the user, so for GCC it's currently possible to use tests similar to the below to check for atomic loads on a 2 byte sized object. #if __has_builtin(__atomic_load_2) Clang does not expose resolved versions of the atomic builtins. clang currently allows SFINAE on builtins, so that C++ code can check whether a builtin is available on a given type. GCC does not (and that is what this patch aims to change). C libraries like libatomic can check whether a given atomic builtin can work on a given type by using autoconf to check for a miscompilation when attempting such a use. My goal: I would like to enable floating point fetch_add (and similar) in GCC, in order to use those overloads in libstdc++ implementation of atomic::fetch_add. This should allow compilers targeting GPU's which have floating point fetch_add instructions to emit optimal code. In order to do that I need some consistent mechanism that libstdc++ can use to identify whether the fetch_add builtins have floating point overloads (and for which types these exist). I would hence like to enable SFINAE on builtins, so that libstdc++ can use that mechanism for the floating point fetch_add builtins. Implementation follows the existing mechanism for handling SFINAE contexts in c-common.cc. A boolean is passed into the c-common.cc function indicating whether these functions should emit errors or not. This boolean comes from `complain & tf_error` in the C++ frontend. (Similar to other functions like valid_array_size_p and c_build_vec_perm_expr). This is done both for resolve_overloaded_builtin and check_builtin_function_arguments, both of which can be used in SFINAE contexts. I attempted to trigger something using the `reject_gcc_builtin` function in an SFINAE context. Given the context where this function is called from the C++ frontend it looks like it may be possible, but I did not manage to trigger this in template context by attempting to do something similar to the testcases added around those calls. - I would appreciate any feedback on whether this is something that can happen in a template context, and if so some help writing a relevant testcase for it. Both of these functions have target hooks for target specific builtins that I have updated to take the extra boolean flag. I have not adjusted the functions implementing those target hooks (except to update the declarations) so target specific builtins will still error in SFINAE contexts. - I could imagine not updating the target hook definition since nothing would use that change. However I figure that allowing targets to decide this behaviour would be the right thing to do eventually, and since this is the target-independent part of the change to do that this patch should make that change. Could adjust if others disagree. Other relevant points that I'd appreciate reviewers check: - I did not pass this new flag through atomic_bitint_fetch_using_cas_loop since the _BitInt type is not available in the C++ frontend and I didn't want if conditions that can not be executed in the source. - I only test non-compile-time-constant types with SVE types, since I do not know of a way to get a VLA into a SFINAE context. - While writing tests I noticed a few differences with clang in this area. I don't think they are problematic but am mentioning them for completeness and to allow others to judge if these are a problem). - atomic_fetch_add on a boolean is allowed by clang. - When __atomic_load is passed an invalid memory model (i.e. too large), we give an SFINAE failure while clang does not. Bootstrap and regression tested on AArch64 and x86_64. Built first stage on targets whose target hook declaration needed updated (though did not regtest etc). T
[gcc r15-6051] clang-format AlwaysBreakAfterReturnType to TopLevelDefinitions
https://gcc.gnu.org/g:0c83096f19b075ac1b80113828188a0fd64400af commit r15-6051-g0c83096f19b075ac1b80113828188a0fd64400af Author: Matthew Malcomson Date: Mon Dec 9 10:51:44 2024 + clang-format AlwaysBreakAfterReturnType to TopLevelDefinitions The previous value of TopLevel meant that the function name of declarations would also be on a new line. THis does not match the current formatting of headers. Manual testing done on c-common.h. Also set BraceWrapping.BeforeWhile to true to match the formatting specified for do/while loops in GNU coding standards. https://www.gnu.org/prep/standards/standards.html#Formatting Ok for trunk? contrib/ChangeLog: * clang-format: AlwaysBreakAfterReturnType set to TopLevelDefinitions and BraceWrapping.BeforeWhile set to true. Signed-off-by: Matthew Malcomson Diff: --- contrib/clang-format | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/contrib/clang-format b/contrib/clang-format index 95f1455c14db..4ed50ab6b268 100644 --- a/contrib/clang-format +++ b/contrib/clang-format @@ -27,7 +27,7 @@ --- Language: Cpp AccessModifierOffset: -2 -AlwaysBreakAfterReturnType: TopLevel +AlwaysBreakAfterReturnType: TopLevelDefinitions BinPackArguments: true BinPackParameters: true BraceWrapping: @@ -42,6 +42,7 @@ BraceWrapping: AfterUnion: true BeforeCatch: true BeforeElse: true + BeforeWhile: true IndentBraces: true SplitEmptyFunction: false BreakBeforeBinaryOperators: All
[gcc r15-5989] clang-format BraceWrapping.AfterCaseLabel to true
https://gcc.gnu.org/g:788334199917aaa02f696abb0bca02bf9a460547 commit r15-5989-g788334199917aaa02f696abb0bca02bf9a460547 Author: Matthew Malcomson Date: Tue Dec 3 22:13:40 2024 + clang-format BraceWrapping.AfterCaseLabel to true This setting seems to better match the indentation that is used in GCC. Adds an exra level of indentation after braces in a case statement. Only manual testing done on the switch statements in c-common.cc:resolve_overloaded_builtin and alias.cc:record_component_aliases. Ok for trunk? contrib/ChangeLog: * clang-format: Set BraceWrapping.AfterCaseLabel. Signed-off-by: Matthew Malcomson Diff: --- contrib/clang-format | 1 + 1 file changed, 1 insertion(+) diff --git a/contrib/clang-format b/contrib/clang-format index f4f708648eb9..95f1455c14db 100644 --- a/contrib/clang-format +++ b/contrib/clang-format @@ -32,6 +32,7 @@ BinPackArguments: true BinPackParameters: true BraceWrapping: AfterClass: true + AfterCaseLabel: true AfterControlStatement: true AfterEnum: true AfterFunction: true
[gcc r15-7587] gcc: testsuite: Fix builtin-speculation-overloads[14].C testism
https://gcc.gnu.org/g:9335ff73a509a1f203de691052d600facd07c3f8 commit r15-7587-g9335ff73a509a1f203de691052d600facd07c3f8 Author: Matthew Malcomson Date: Mon Feb 10 16:24:20 2025 + gcc: testsuite: Fix builtin-speculation-overloads[14].C testism When making warnings trigger a failure in template substitution I could not find any way to trigger the warning about builtin speculation not being available on the given target. Turns out I misread the code -- this warning happens when the speculation_barrier pattern is not defined. Here we add an effective target to represent "__builtin_speculation_safe_value is available on this target" and use that to adjust our test on SFINAE behaviour accordingly. N.b. this means that we get extra testing -- not just that things work on targets which support __builtin_speculation_safe_value, but also that the behaviour works on targets which don't support it. Tested with AArch64 native, AArch64 cross compiler, and RISC-V cross compiler (just running the tests that I've changed). Ok for trunk? gcc/testsuite/ChangeLog: PR target/117991 * g++.dg/template/builtin-speculation-overloads.def: SUCCESS argument in SPECULATION_ASSERTS now uses a macro `true_def` instead of the literal `true` for arguments which should work with `__builtin_speculation_safe_value`. * g++.dg/template/builtin-speculation-overloads1.C: Define `true_def` macro on command line to compiler according to the effective target representing that `__builtin_speculation_safe_value` does something on this target. * g++.dg/template/builtin-speculation-overloads4.C: Likewise. * lib/target-supports.exp (check_effective_target_speculation_barrier_defined): New. Signed-off-by: Matthew Malcomson Diff: --- gcc/testsuite/g++.dg/template/builtin-speculation-overloads.def | 9 ++--- gcc/testsuite/g++.dg/template/builtin-speculation-overloads1.C | 2 ++ gcc/testsuite/g++.dg/template/builtin-speculation-overloads4.C | 2 ++ gcc/testsuite/lib/target-supports.exp | 9 + 4 files changed, 19 insertions(+), 3 deletions(-) diff --git a/gcc/testsuite/g++.dg/template/builtin-speculation-overloads.def b/gcc/testsuite/g++.dg/template/builtin-speculation-overloads.def index 39d9b748d524..ada13e6f77c3 100644 --- a/gcc/testsuite/g++.dg/template/builtin-speculation-overloads.def +++ b/gcc/testsuite/g++.dg/template/builtin-speculation-overloads.def @@ -15,14 +15,17 @@ class X{}; class Large { public: int arr[10]; }; class Incomplete; +/* Using `true_def` in order to account for the fact that if this target + * doesn't support __builtin_speculation_safe_value at all everything fails to + * substitute. */ #define SPECULATION_ASSERTS \ - MAKE_SPECULATION_ASSERT (int, true) \ + MAKE_SPECULATION_ASSERT (int, true_def) \ MAKE_SPECULATION_ASSERT (float, false) \ MAKE_SPECULATION_ASSERT (X, false) \ MAKE_SPECULATION_ASSERT (Large, false) \ MAKE_SPECULATION_ASSERT (Incomplete, false) \ - MAKE_SPECULATION_ASSERT (int *, true) \ - MAKE_SPECULATION_ASSERT (long, true) + MAKE_SPECULATION_ASSERT (int *, true_def) \ + MAKE_SPECULATION_ASSERT (long, true_def) int main() { SPECULATION_ASSERTS diff --git a/gcc/testsuite/g++.dg/template/builtin-speculation-overloads1.C b/gcc/testsuite/g++.dg/template/builtin-speculation-overloads1.C index bc8f1083a994..4c50d4aa6f5e 100644 --- a/gcc/testsuite/g++.dg/template/builtin-speculation-overloads1.C +++ b/gcc/testsuite/g++.dg/template/builtin-speculation-overloads1.C @@ -1,5 +1,7 @@ /* Check that overloaded builtins can be used in templates with SFINAE. */ // { dg-do compile { target c++17 } } +// { dg-additional-options "-Dtrue_def=true" { target speculation_barrier_defined } } +// { dg-additional-options "-Dtrue_def=false" { target { ! speculation_barrier_defined } } } /* Checks performed here: Various types (some that work, some that don't). */ diff --git a/gcc/testsuite/g++.dg/template/builtin-speculation-overloads4.C b/gcc/testsuite/g++.dg/template/builtin-speculation-overloads4.C index c024a21fa18e..cc0b3131af77 100644 --- a/gcc/testsuite/g++.dg/template/builtin-speculation-overloads4.C +++ b/gcc/testsuite/g++.dg/template/builtin-speculation-overloads4.C @@ -1,5 +1,7 @@ /* Check that overloaded builtins can be used in templates with SFINAE. */ // { dg-do compile { target c++17 } } +// { dg-additional-o
[gcc] Created branch 'matmal01/heads/openmp-pr119588' in namespace 'refs/users'
The branch 'matmal01/heads/openmp-pr119588' was created in namespace 'refs/users' pointing to: f7babdba2650... [RFC] Implement "flat" barrier in libgomp
[gcc(refs/users/matmal01/heads/openmp-pr119588)] [RFC] Implement "flat" barrier in libgomp
https://gcc.gnu.org/g:f7babdba2650aad72fb189f0988650e009ef3041 commit f7babdba2650aad72fb189f0988650e009ef3041 Author: Matthew Malcomson Date: Wed Jun 25 15:30:53 2025 +0100 [RFC] Implement "flat" barrier in libgomp Working on PR119588. From that PR: We've seen on some internal workloads (NVPL BLAS running GEMM routine on a small matrix) that the overhead of a `#pragma omp parallel` statement when running with a high number of cores (72 or 144) is much higher with the libgomp implementation than with LLVM's libomp. In a program which has both some work that can be handled with high parallelism (so OMP is running with many threads) and a large number of small pieces of work that need to be performed with low overhead, this has been seen to cause a significant overhead when accumulated. -- This patch is still not complete -- especially around spreading the new interface to the nvptx and gcn targets -- but I've made more progress and would like to gather feedback on whether people think this is a reasonable approach. Points that I would like feedback on: - Is this extra complexity OK, especially for the futex_waitv fallback? - I have thought about creating yet another target (in the same way that the linux/ target is for kernels where `futex` is available). Would this approach be favoured? - Are the changes in `gomp_team_start` significant enough for other targets that it's worth bringing it into the linux/ target? Similar to what I see the nvptx/ target has done for `gomp_team_start` Could have some `ifdef` for the alternate barrier API rather than try to modify all the other interfaces to match a slightly modified API. - As far as I can tell so far the only modifications required by these changes are in the posix/ target and I've made them. I haven't yet checked 100% but I believe the nvptx and gcn targets don't use this function (believe because they don't define the `reinit` function and guess they don't use pthreads). -- This patch introduces a barrier that performs better on the microbenchmark provided in PR119588. I have ran various higher-level benchmarks and have not seen anything I can claim as outside noise range so far -- though I'm working on reducing the noise in my benchmarks so hopefully will be able to say that soon. This patch is very similar to that I added on the PR a month and a bit ago. Here I have extended the barrier added there to also be used for cancellable barriers and to handle tasking (the two main things left outstanding in the functionality of that patch). The outline of the barrier is: 1) Each thread associated with a barrier has its own ID (currently the team_id of the team that the current thread is a part of). 2) When a thread arrives at the barrier it marks its "arrived flag" as having arrived. 3) ID 0 (which is always the thread which does the bookkeeping for this team, at the top-level it's the primary thread) waits on each of the other threads "arrived flags". It executes tasks while waiting on these threads to arrive. 4) Once ID 0 sees all other threads have arrived it signals to all secondary threads that the barrier is complete via incrementing the global generation (as usual). Fundamental differences being that we have thread-local "arrived" flags instead of a single counter. That means there is less contention and appears to speed up the barrier significantly when threads are hitting the barrier very hard. Other differences are: 1) The "coordinating" thread is pre-determined rather than "whichever thread hits the barrier last". - This has some knock-on effects w.r.t. how the barrier is used. (See e.g. the changes in work.c). - This also means that the behaviour of the pre-determined "coordinating" thread must be more complicated while it's waiting on other threads to arrive. Especially see the task handling in `gomp_team_barrier_ensure_last`. - Because we assign the "coordinating" thread to be the primary thread this does mean that in-between points (3) and (4) above the primary thread can see the results of the operations of all secondary threads. A point important for another optimisation I hope to make later where we only go through one barrier per iteration in the main execution loop of `gomp_thread_start` for top-level threads. 2) When a barrier needs to be adjusted so that different threads have different ID's we must serialise all threads before this adjustment. - The previous design had no assignment of threads to s