[PATCH] PR58669: does not detect all cpu cores/threads

2013-10-17 Thread Andrew
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58669 Testing: $ /usr/lib/jvm/icedtea-6/bin/java TestProcessors Processors: 8 $ /usr/lib/jvm/gcj-jdk/bin/java -version java version "1.5.0" gij (GNU libgcj) version 4.8.1 $ /usr/lib/jvm/gcj-jdk/bin/java TestProcessors Processors: 1 $ /h
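For context on what the test above measures: the snippet does not show how the patch obtains the processor count, so the following is only a minimal sketch of one common approach on Linux and similar systems, assuming `sysconf(_SC_NPROCESSORS_ONLN)` is available. The function name `online_processors` is hypothetical and is not taken from libgcj.

```c
#include <unistd.h>

/* Hypothetical helper, not libgcj code: number of processors currently
   online, falling back to 1 if the query fails or is unsupported. */
long online_processors(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}
```

A runtime that always reports 1, as gij does in the report, would correspond to always taking the fallback branch here.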

[COMMITTED/13] Fix PR 110386: backprop vs ABSU_EXPR

2023-10-01 Thread Andrew Pinski
From: Andrew Pinski The issue here is that when backprop tries to go and strip sign ops, it skips over ABSU_EXPR but ABSU_EXPR not only does an ABS, it also changes the type to unsigned. Since strip_sign_op_1 is only supposed to strip off sign changing operands and not ones that change types

[COMMITTED/13] Fix PR 111331: wrong code for `a > 28 ? MIN : 29`

2023-10-01 Thread Andrew Pinski
From: Andrew Pinski The problem here is after r6-7425-ga9fee7cdc3c62d0e51730, the comparison to see if the transformation could be done was using the wrong value. Instead of seeing if the inner was LE (for MIN and GE for MAX) the outer value, it was comparing the inner to the value used in the

[COMMITTED] Return TRUE only when a global value is updated.

2023-10-03 Thread Andrew MacLeod
range but turns out it was really being set in DOM2.   Instead they check for the range in the final listing... Bootstrapped on  x86_64-pc-linux-gnu with no regressions. Pushed. Andrew From dae5de2a2353b928cc7099a78d88a40473abefd2 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Wed, 27 Sep

[COMMITTED] Remove pass counting in VRP.

2023-10-03 Thread Andrew MacLeod
ng on there) Bootstraps  on x86_64-pc-linux-gnu with no regressions.   Pushed. Andrew From 29abc475a360ad14d5f692945f2805fba1fdc679 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Thu, 28 Sep 2023 09:19:32 -0400 Subject: [PATCH 2/5] Remove pass counting in VRP. Rather than using a pass cou

Re: [COMMITTED] Return TRUE only when a global value is updated.

2023-10-03 Thread Andrew MacLeod
huh.  thanks,  I'll have a look. Andrew On 10/3/23 11:47, David Edelsohn wrote: This patch caused a bootstrap failure on AIX. during GIMPLE pass: evrp /nasfarm/edelsohn/src/src/libgcc/libgcc2.c: In function '__gcc_bcmp': /nasfarm/edelsohn/src/src/libgcc/libgcc2.c:2910:1: in

Re: [COMMITTED] Return TRUE only when a global value is updated.

2023-10-03 Thread Andrew MacLeod
Give this a try..  I'm testing it here, but x86 doesn't seem to show it anyway for some reason :-P I think i needed to handle pointers special since SSA_NAMES handle pointer ranges different. Andrew On 10/3/23 11:47, David Edelsohn wrote: This patch caused a bootstrap fail

Re: [COMMITTED] Return TRUE only when a global value is updated.

2023-10-03 Thread Andrew MacLeod
perfect.  I'll check it in when my testrun is done. Thanks  .. .  and sorry :-) Andrew On 10/3/23 12:53, David Edelsohn wrote: AIX bootstrap is happier with the patch. Thanks, David On Tue, Oct 3, 2023 at 12:30 PM Andrew MacLeod wrote: Give this a try..  I'm testing it her

Re: [COMMITTED] Remove pass counting in VRP.

2023-10-03 Thread Andrew MacLeod
On 10/3/23 13:02, David Malcolm wrote: On Tue, 2023-10-03 at 10:32 -0400, Andrew MacLeod wrote: Pass counting in VRP is used to decide when to call early VRP, pass the flag to enable warnings, and when the final pass is. If you try to add additional passes, this becomes quite fragile. This

[COMMITTED] Don't use range_info_get_range for pointers.

2023-10-03 Thread Andrew MacLeod
d8808c37d29110872fa51b98e71aef9e160b4692 Author: Andrew MacLeod Date: Tue Oct 3 12:32:10 2023 -0400 Don't use range_info_get_range for pointers. Pointers only track null and nonnull, so we need to handle them specially. * tree-ssanames.cc (set_range_info): Use get_ptr_inf

Re: [PATCH] ipa: Self-DCE of uses of removed call LHSs (PR 108007)

2023-10-04 Thread Andrew Pinski
otstraps just fine as at commit 7eb5ce7f58ed ("Remove pass counting in > VRP."). > > Shall I file a PR, or can you handle it regardless? Let me know if you > need anything from me. It is already filed as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111688 . Thanks, Andrew > > Maciej

Re: [PATCH]AArch64 Handle copysign (x, -1) expansion efficiently

2023-10-05 Thread Andrew Pinski
d") > (match_operand:GPF 1 "register_operand") > - (match_operand:GPF 2 "register_operand")] > + (match_operand:GPF 2 "nonmemory_operand")] >"TARGET_SIMD" > { > - rtx bitmask = gen_reg_rtx (mode); > + machine_mode int_mode =

[COMMITTED 2/3] Add a dom based ranger for fast VRP.

2023-10-05 Thread Andrew MacLeod
from anywhere. Pushed. Andrew From ad8cd713b4e489826e289551b8b8f8f708293a5b Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Fri, 28 Jul 2023 13:18:15 -0400 Subject: [PATCH 2/3] Add a dom based ranger for fast VRP. Provide a dominator based implementation of a range query. * gimple_ran

[COMMITTED 1/3] Add outgoing range vector calculation API.

2023-10-05 Thread Andrew MacLeod
t only looks at whether NAME has a range, and returns it if it does.  no other overhead. Pushed. From 52c1e2c805bc2fd7a30583dce3608b738f3a5ce4 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Tue, 15 Aug 2023 17:29:58 -0400 Subject: [PATCH 1/3] Add outgoing range vector calculation API Pr

[COMMITTED 3/3] Create a fast VRP pass

2023-10-05 Thread Andrew MacLeod
file with the extension .fvrp. pushed. From f4e2dac53fd62fbf2af95e0bf26d24e929fa1f66 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Mon, 2 Oct 2023 18:32:49 -0400 Subject: [PATCH 3/3] Create a fast VRP pass * timevar.def (TV_TREE_FAST_VRP): New. * tree-pass.h (make_pass_fast_vrp): New

[COMMITTED 0/3] Add a FAST VRP pass.

2023-10-05 Thread Andrew MacLeod
side effects).   A little additional  work can reduce the memory footprint further too.  I have done no experiments as yet as to the cost of adding relations, but it would be pretty straightforward as it is just reusing all the same components the main ranger does Andrew

Re: [PATCH]AArch64 Add SVE implementation for cond_copysign.

2023-10-05 Thread Andrew Pinski
form? > > I.e. could we go with your new version of the match.pd patch, and add some > > isel stuff as a follow-on? > > > > Sure if that's what's desired But.. > > The example you posted above is for instance worse for x86 > https://godbolt.org/z/x9ccqxW

[PATCH] MATCH: Fix infinite loop between `vec_cond(vec_cond(a, b, 0), c, d)` and `a & b`

2023-10-05 Thread Andrew Pinski
Match has a pattern which converts `vec_cond(vec_cond(a,b,0), c, d)` into `vec_cond(a & b, c, d)` but since in this case a is a comparison fold will change `a & b` back into `vec_cond(a,b,0)` which causes an infinite loop. The best way to fix this is to enable the patterns for vec_cond(*,vec_cond,*

[committed] amdgcn: silence warning

2023-10-06 Thread Andrew Stubbs
I've just committed this simple patch to silence an enum warning. Andrew amdgcn: silence warning gcc/ChangeLog: * config/gcn/gcn.cc (print_operand): Adjust xcode type to fix warning. diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index f6cff659703..ef3b6472a52 100644 --- a/g

[committed] amdgcn: switch mov insns to compact syntax

2023-10-06 Thread Andrew Stubbs
I've just committed this patch. It should have no functional changes except to make it easier to add new alternatives into the alternative-heavy move instructions. Andrew amdgcn: switch mov insns to compact syntax The move instructions typically have many alternatives (and I'm about to add more

Re: [PATCH] test: Isolate slp-1.c check of target supports vect_strided5

2023-10-06 Thread Andrew Stubbs
{ target vect_strided5 } } } */ This patch causes a test regression on amdgcn because vect_strided5 is true (because check_effective_target_vect_fully_masked is true), but the testcase still gives the message 4 times. Perhaps because amdgcn uses masking and not vect_load_lanes? Andrew

Re: [PATCH]middle-end match.pd: optimize fneg (fabs (x)) to x | (1 << signbit(x)) [PR109154]

2023-10-09 Thread Andrew Pinski
OPYSIGN @0 @1)) > > >>> (coss @0))) > > >>> > > >>> which properly will diagnose a duplicate pattern. There are > > >>> currently no operator lists with just builtins defined (that > > >>> could be fixed, see gencfn-macros

Re: [PATCH] test: Isolate slp-1.c check of target supports vect_strided5

2023-10-09 Thread Andrew Stubbs
"vectorizing stmts using SLP" 3 "vect" { target vect_strided5 && vect_load_lanes } } } */ Could you verify whether it works for you? You need an additional set of curly braces in the second line to avoid a syntax error message, but I get a pass with that change. Thanks Andrew

[COMMITTED] Remove unused get_identity_relation.

2023-10-09 Thread Andrew MacLeod
VREL_EQ... as there is only one.  As it stands, always returns VREL_EQ, so simply use VREL_EQ in the 2 calling locations. Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed. Andrew From 5ee51119d1345f3f13af784455a4ae466766912b Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date

[COMMITTED] PR tree-optimization/111694 - Ensure float equivalences include + and - zero.

2023-10-09 Thread Andrew MacLeod
. Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed. Andrew From b0892b1fc637fadf14d7016858983bc5776a1e69 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Mon, 9 Oct 2023 10:15:07 -0400 Subject: [PATCH 2/2] Ensure float equivalences include + and - zero. A floating point equivalence may not

[PATCH] MATCH: [PR111679] Add alternative simplification of `a | ((~a) ^ b)`

2023-10-09 Thread Andrew Pinski
So currently we have a simplification for `a | ~(a ^ b)` but that does not match the case where we had originally `(~a) | (a ^ b)` so we need to add a new pattern that matches that and uses bitwise_inverted_equal_p that also catches comparisons too. OK? Bootstrapped and tested on x86_64-linux-gnu
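The snippet does not show what the pattern produces. As a sketch of the underlying Boolean algebra only: `(~a) ^ b` equals `~(a ^ b)`, and `a | ~(a ^ b)` reduces to `a | ~b`. The target form `a | ~b` is an assumption here, derived by hand rather than read from match.pd, and the function name is hypothetical. The check below is exhaustive over 8-bit values.

```c
#include <stdint.h>

/* Returns 1 if a | ((~a) ^ b) == a | ~b for every pair of 8-bit values.
   The right-hand side is an assumed target form, not taken from match.pd. */
int or_not_xor_identity_holds(void)
{
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++) {
            uint8_t lhs = (uint8_t)(a | (~a ^ b));
            uint8_t rhs = (uint8_t)(a | ~b);
            if (lhs != rhs)
                return 0;
        }
    return 1;
}
```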

Re: [PATCH] use get_range_query to replace get_global_range_query

2023-10-10 Thread Andrew Pinski
> > + get_range_query (cfun)->range_of_expr (r, bound); > > expand doesn't have a ranger instance so this is a no-op. I'm unsure > if it would be safe given we're half GIMPLE, half RTL. Please leave it > out. It definitely does not work and can&#x

Re: [PATCH] RISC-V Regression: Fix FAIL of bb-slp-pr65935.c for RVV

2023-10-10 Thread Andrew Stubbs
optimizing more than expected makes it low priority). LGTM Andrew

Re: [committed] [PR target/93062] RISC-V: Handle long conditional branches for RISC-V

2023-10-10 Thread Andrew Waterman
pops the RAS and the latter pushes it. Any reason for using a different sequence in one than the other? On Tue, Oct 10, 2023 at 3:11 PM Jeff Law wrote: > > > Ventana has had a variant of this patch from Andrew W. in its tree for > at least a year. I'm dusting it off

[PATCH] MATCH: [PR111282] Simplify `a & (b ^ ~a)` to `a & b`

2023-10-10 Thread Andrew Pinski
While `a & (b ^ ~a)` is optimized to `a & b` on the rtl level, it is always good to optimize this at the gimple level and allows us to match a few extra things including where a is a comparison. Note I had to update/change the testcase and-1.c to avoid matching this case as we can match -2 and 1 a
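The identity in the subject line can be checked bit by bit: where a bit of `a` is 1, `~a` contributes 0 and the XOR leaves `b` unchanged; where it is 0, the AND yields 0 on both sides. A minimal exhaustive check over 8-bit values, with a hypothetical function name:

```c
#include <stdint.h>

/* Returns 1 if a & (b ^ ~a) == a & b for every pair of 8-bit values. */
int and_xor_not_identity_holds(void)
{
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++) {
            uint8_t lhs = (uint8_t)(a & (b ^ ~a));
            uint8_t rhs = (uint8_t)(a & b);
            if (lhs != rhs)
                return 0;
        }
    return 1;
}
```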

Re: [committed] [PR target/93062] RISC-V: Handle long conditional branches for RISC-V

2023-10-11 Thread Andrew Waterman
On Tue, Oct 10, 2023 at 8:26 PM Jeff Law wrote: > > > > On 10/10/23 18:24, Andrew Waterman wrote: > > I remembered another concern since we discussed this patch privately. > > Using ra for long calls results in a sequence that will corrupt the > > return-address sta

[COMMITTED][GCC13] PR tree-optimization/111694 - Ensure float equivalences include + and - zero.

2023-10-11 Thread Andrew MacLeod
Similar patch which was checked into trunk last week.   slight tweak needed as dconstm0 was not exported in gcc 13, otherwise functionally the same Bootstrapped on x86_64-pc-linux-gnu.  pushed. Andrew commit f0efc4b25cba1bd35b08b7dfbab0f8fc81b55c66 Author: Andrew MacLeod Date: Mon Oct 9 13

Re: RISC-V: Support CORE-V XCVMAC and XCVALU extensions

2023-10-11 Thread Andrew Pinski
gcc] Error 2 > make[1]: Leaving directory > '/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage1' > make: *** [Makefile:590: stamps/build-gcc-newlib-stage1] Error 2 This is also recorded as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111777 . It breaks more than just RISCV; it depends on the version of texinfo that is installed too. Thanks, Andrew > > > juzhe.zh...@rivai.ai

[COMMITTED] PR tree-optimization/111622 - Do not add partial equivalences with no uses.

2023-10-13 Thread Andrew MacLeod
.  pushed. Andrew

[COMMITTED] [GCC13] PR tree-optimization/111622 - Do not add partial equivalences with no uses.

2023-10-13 Thread Andrew MacLeod
large, it can consume a lot of time.  Typically, partial equivalence lists are small.   In this case, a lot of dead stmts were not removed, so there was no redundancy elimination and it was causing an issue. Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed. Andrew From

Re: [COMMITTED] PR tree-optimization/111622 - Do not add partial equivalences with no uses.

2023-10-13 Thread Andrew MacLeod
of course the patch would be handy... On 10/13/23 09:23, Andrew MacLeod wrote: Technically PR 111622 exposes a bug in GCC 13, but it's been papered over on trunk by this: commit 9ea74d235c7e7816b996a17c61288f02ef767985 Author: Richard Biener Date:   Thu Sep 14 09:31:23 2023 +0200

[PATCH] MATCH: [PR111432] Simplify `a & (x | CST)` to a when we know that (a & ~CST) == 0

2023-10-13 Thread Andrew Pinski
This adds the simplification `a & (x | CST)` to a when we know that `(a & ~CST) == 0`, in a similar fashion to how `a & CST` is handled. I looked into handling `a | (x & CST)` but I don't see any decent simplifications happening. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
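The condition `(a & ~CST) == 0` says every set bit of `a` is also set in `CST`, so OR-ing `CST` into `x` cannot clear any bit that `a` keeps. A small sketch that checks this exhaustively over 8-bit values follows; the function name is hypothetical and the loop merely stands in for the range information GCC would use to establish the precondition.

```c
#include <stdint.h>

/* Returns 1 if, for all 8-bit a, x and cst with (a & ~cst) == 0,
   a & (x | cst) == a. */
int and_or_cst_identity_holds(void)
{
    for (unsigned a = 0; a < 256; a++)
        for (unsigned cst = 0; cst < 256; cst++) {
            if ((a & ~cst & 0xffu) != 0)
                continue;               /* precondition not met */
            for (unsigned x = 0; x < 256; x++)
                if ((uint8_t)(a & (x | cst)) != (uint8_t)a)
                    return 0;
        }
    return 1;
}
```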

[PATCH 2/2] [c] Fix PR 101364: ICE after error due to diagnose_arglist_conflict not checking for error

2023-10-14 Thread Andrew Pinski
When checking to see if a function declaration has a conflict due to promotions, there is no test to see if the type was an error mark before calling c_type_promotes_to. c_type_promotes_to is not ready for error_mark and causes an ICE. This adds a check for error before the call of c_ty

[PATCH 1/2] Fix ICE due to c_safe_arg_type_equiv_p not checking for error_mark node

2023-10-14 Thread Andrew Pinski
This is a simple error recovery issue when c_safe_arg_type_equiv_p was added in r8-5312-gc65e18d3331aa999. The issue is that after an error, an argument type (of a function type) might turn into an error mark node and c_safe_arg_type_equiv_p was not ready for that. So this just adds a check for err

[PATCH] MATCH: Improve `A CMP 0 ? A : -A` set of patterns to use bitwise_equal_p.

2023-10-15 Thread Andrew Pinski
This improves the `A CMP 0 ? A : -A` set of match patterns to use bitwise_equal_p which allows an nop cast between signed and unsigned. This allows catching a few extra cases which were not being caught before. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog:

[PATCH] Improve factor_out_conditional_operation for conversions and constants

2023-10-15 Thread Andrew Pinski
In the case of a NOP conversion (precisions of the 2 types are equal), factoring out the conversion can be done even if int_fits_type_p returns false and even when the conversion is defined by a statement inside the conditional. Since it is a NOP conversion there is no zero/sign extending happening

[PATCH] [PR31531] MATCH: Improve ~a < ~b and ~a < CST, allow a nop cast inbetween ~ and a/b

2023-10-15 Thread Andrew Pinski
Currently we are able to simplify `~a CMP ~b` to `b CMP a` but we should allow a nop conversion in between the `~` and the `a` which can show up. A similar thing should be done for `~a CMP CST`. I had originally submitted the `~a CMP CST` case as https://gcc.gnu.org/pipermail/gcc-patches/2021-Novem
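The existing `~a CMP ~b` to `b CMP a` rewrite works because bitwise NOT reverses ordering: for two's-complement values `~a` is `-a - 1`, and for unsigned values it is the maximum value minus `a`. A minimal sketch checking the `<` case over all signed and unsigned 8-bit values follows; the function name is hypothetical, and the nop-conversion case from the patch is not modelled.

```c
#include <stdint.h>

/* Returns 1 if (~a < ~b) == (b < a) for all 8-bit signed and unsigned
   operands. */
int not_reverses_less_than(void)
{
    for (int a = -128; a <= 127; a++)
        for (int b = -128; b <= 127; b++) {
            int8_t na = (int8_t)~a, nb = (int8_t)~b;
            if ((na < nb) != (b < a))
                return 0;
        }
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++) {
            uint8_t na = (uint8_t)~a, nb = (uint8_t)~b;
            if ((na < nb) != (b < a))
                return 0;
        }
    return 1;
}
```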

Re: [PATCH] Add files to discourage submissions of PRs to the GitHub mirror.

2023-10-16 Thread Andrew Pinski
G.md > https://github.com/git/git/blob/master/.github/PULL_REQUEST_TEMPLATE.md > What do people think? > I think this is a great idea. Is there a similar one for opening issues too? Thanks, Andrew ChangeLog: > > * .github/CONTRIBUTING.md: New file. > * .github/PULL_R

Re: [PATCH 11/11] aarch64: Add new load/store pair fusion pass.

2023-10-17 Thread Andrew Pinski
On Tue, Oct 17, 2023 at 1:52 PM Alex Coplan wrote: > > This adds a new aarch64-specific RTL-SSA pass dedicated to forming load > and store pairs (LDPs and STPs). > > As a motivating example for the kind of thing this improves, take the > following testcase: > > extern double c[20]; > > double f(do

aarch64: Replace duplicated selftests

2023-10-18 Thread Andrew Carlotti
Pushed as obvious. gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_test_fractional_cost): Test <= instead of testing < twice. diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 2b0de7ca0389be6698c329b54f9501b8ec09183f..9c3c0e705e2e6ea3b55b4a5f1

[0/3] target_version and aarch64 function multiversioning

2023-10-18 Thread Andrew Carlotti
tween target_clones and target/target_version multiversioning, but would require agreement on how to resolve some of the issues discussed in [1]. Thanks, Andrew [1] https://gcc.gnu.org/pipermail/gcc/2023-October/242686.html

[1/3] Add support for target_version attribute

2023-10-18 Thread Andrew Carlotti
This patch adds support for the "target_version" attribute to the middle end and the C++ frontend, which will be used to implement function multiversioning in the aarch64 backend. Note that C++ is currently the only frontend which supports multiversioning using the "target" attribute, whereas the

[2/3] [aarch64] Add function multiversioning support

2023-10-18 Thread Andrew Carlotti
This adds initial support for function multiversioning on aarch64 using the target_version and target_clones attributes. This mostly follows the Beta specification in the ACLE [1], with a few differences that remain to be fixed: - Symbol mangling for target_clones differs from that for target_version

[3/3] WIP/RFC: Fix name mangling for target_clones

2023-10-18 Thread Andrew Carlotti
This is a partial patch to make the mangling of function version names for target_clones match those generated using the target or target_version attributes. It modifies the name of function versions, but does not yet rename the resolved symbol, resulting in a duplicate symbol name (and an error a

[COMMITTED] Fix expansion of `(a & 2) != 1`

2023-10-18 Thread Andrew Pinski
I had a thinko in r14-1600-ge60593f3881c72a96a3fa4844d73e8a2cd14f670 where we would remove the `& CST` part if we ended up not calling expand_single_bit_test. This fixes the problem by introducing a new variable that will be used for calling expand_single_bit_test. As far as I know this can only s

[PATCH] aarch64: [PR110986] Emit csinv again for `a ? ~b : b`

2023-10-18 Thread Andrew Pinski
After r14-3110-g7fb65f10285, the canonical form for `a ? ~b : b` changed to be `-(a) ^ b` that means for aarch64 we need to add a few new insn patterns to be able to catch this and change it to be what is the canonical form for the aarch64 backend. A secondary pattern was needed to support a zero_e
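The two forms agree when `a` is a boolean (0 or 1): negating 1 gives an all-ones mask, so the XOR flips every bit of `b`, while negating 0 gives 0 and leaves `b` unchanged. A minimal sketch with hypothetical function names, comparing both forms on a range of values including the extremes:

```c
#include <limits.h>

/* Conditional form: a ? ~b : b, with a expected to be 0 or 1. */
int cond_not(int a, int b)
{
    return a ? ~b : b;
}

/* Mask form described in the snippet: -(a) ^ b. */
int mask_not(int a, int b)
{
    return -a ^ b;
}

/* Returns 1 if both forms agree for a in {0, 1} on the sampled b values. */
int cond_not_forms_agree(void)
{
    for (int a = 0; a <= 1; a++) {
        for (int b = -1000; b <= 1000; b++)
            if (cond_not(a, b) != mask_not(a, b))
                return 0;
        if (cond_not(a, INT_MIN) != mask_not(a, INT_MIN)
            || cond_not(a, INT_MAX) != mask_not(a, INT_MAX))
            return 0;
    }
    return 1;
}
```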

[committed] amdgcn: deprecate Fiji device and multilib

2023-10-19 Thread Andrew Stubbs
The build has been failing for the last few days because LLVM removed support for the HSACOv3 binary metadata format, which we were still using for the Fiji multilib. The LLVM commit has now been reverted (thank you Pierre van Houtryve), but it's only a temporary reprieve. This patch removes

[PATCH] wwwdocs: gcc-14: mark amdgcn fiji deprecated

2023-10-19 Thread Andrew Stubbs
OK to commit? Andrew gcc-14: mark amdgcn fiji deprecated diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html index c817dde4..91ab8132 100644 --- a/htdocs/gcc-14/changes.html +++ b/htdocs/gcc-14/changes.html @@ -178,6 +178,16 @@ a work-in-progress. +AMD Radeon (GCN) + + +

[PATCH] c: [PR104822] Don't warn about converting NULL to different sso endian

2023-10-19 Thread Andrew Pinski
In the same way that we don't warn about NULL pointer constant conversion to a different named address, we should not warn about conversion to a different sso endian either. This adds the simple check. Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR c/104822 gcc/c/ChangeLog: * c-t

[PATCH] c: [PR100532] Fix ICE when an argument was an error mark

2023-10-19 Thread Andrew Pinski
In the case of convert_argument, we would return the same expression back rather than error_mark_node after the error message about trying to convert to an incomplete type. This causes issues in the gimplifier trying to see if another conversion is needed. The code here dates back to before the rev

Re: [1/3] Add support for target_version attribute

2023-10-19 Thread Andrew Carlotti
On Thu, Oct 19, 2023 at 07:04:09AM +, Richard Biener wrote: > On Wed, 18 Oct 2023, Andrew Carlotti wrote: > > > This patch adds support for the "target_version" attribute to the middle > > end and the C++ frontend, which will be used to implement function > &

Re: [PATCH] [ARC] Add support for HS4x cpus.

2018-07-06 Thread Andrew Burgess
* Claudiu Zissulescu [2018-06-13 12:09:18 +0300]: > From: Claudiu Zissulescu > > This patch adds support for two ARCHS variations. > > Ok to apply? > Claudiu Sorry for the delay, this looks fine. Thanks, Andrew > > gcc/ > 2017-03-10 Claudiu Zissulescu

Re: [PATCH, GCC, AARCH64] Add support for +profile extension

2018-07-09 Thread Andrew Pinski
ntly the driver will still > pass the extension down to the assembler regardless. > > Bootstrapped aarch64-none-linux-gnu and ran regression tests. > > Is it OK for trunk? I have used a similar patch for the last year and a half. Thanks, Andrew > > gcc/ChangeLog: > 2018-07-09

Re: [RFC] Fix recent popcount change is breaking

2018-07-10 Thread Andrew Pinski
it. The only thing you could do is restrict > > replacement of CALL_EXPRs (in SCEV cprop) to those the target > > natively supports. > > How about restricting it in expression_expensive_p ? Is that what you > wanted. Attached patch does this. > Bootstrap and regression testing progr

Re: [RFC] Fix recent popcount change is breaking

2018-07-10 Thread Andrew Pinski
On Tue, Jul 10, 2018 at 6:35 PM Kugan Vivekanandarajah wrote: > > Hi Andrew, > > On 11 July 2018 at 11:19, Andrew Pinski wrote: > > On Tue, Jul 10, 2018 at 6:14 PM Kugan Vivekanandarajah > > wrote: > >> > >> On 10 July 2018 at 23:17, Richard Biener >

Re: [PATCH 1/4] [ARC] Add more additional register names

2018-07-25 Thread Andrew Burgess
All the patches in this series look fine. Thanks, Andrew * Claudiu Zissulescu [2018-07-16 15:29:42 +0300]: > From: claziss > > gcc/ > 2017-06-14 Claudiu Zissulescu > > * config/arc/arc.h (ADDITIONAL_REGISTER_NAMES): Add additional > register names. >

Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp

2018-07-31 Thread Andrew Pinski
imental) (GCC)) > > on -O1 and above. > > > I don't see where the FUD comes in here; either this builtin has a defined > semantics across targets and they are adhered to, or the builtin doesn't have > well defined semantics, or the targets fail to implement those se

[PATCH] Add COMPLEX_VECTOR_INT modes

2023-05-26 Thread Andrew Stubbs
Hi all, I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just do it because the GCC middle-end models DIVMOD's return value as "complex int" type, and there are no vector equivalents of that type. Therefore, this patch adds minimal support for "complex vector int" modes.

Re: [patch] amdgcn: Change -m(no-)xnack to -mxnack=(on,off,any)

2023-05-26 Thread Andrew Stubbs
OK. Andrew On 26/05/2023 15:58, Tobias Burnus wrote: (Update the syntax of the amdgcn commandline option in anticipation of later patches; while -m(no-)xnack is in mainline since r12-2396-gaad32a00b7d2b6 (for PR100208), -mxnack (contrary to -msram-ecc) is currently mostly a stub for later

Re: [PATCH] Add COMPLEX_VECTOR_INT modes

2023-06-05 Thread Andrew Stubbs
On 30/05/2023 07:26, Richard Biener wrote: On Fri, May 26, 2023 at 4:35 PM Andrew Stubbs wrote: Hi all, I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just do it because the GCC middle-end models DIVMOD's return value as "complex int" type, and

Re: [Patch] libgomp: plugin-gcn - support 'unified_address'

2023-06-06 Thread Andrew Stubbs
On 06/06/2023 16:33, Tobias Burnus wrote: Andrew: Does the GCN change look okay to you? This patch permits to use GCN devices with 'omp requires unified_address' which in principle works already, except that the requirement handling did disable it. (It also updates libgomp.tex

Re: [PATCH] Add COMPLEX_VECTOR_INT modes

2023-06-09 Thread Andrew Stubbs
's no DIVMOD support so I couldn't just do a straight comparison. Thanks Andrew

Re: [PATCH] Add COMPLEX_VECTOR_INT modes

2023-06-09 Thread Andrew Stubbs
On 09/06/2023 10:02, Richard Sandiford wrote: Andrew Stubbs writes: On 07/06/2023 20:42, Richard Sandiford wrote: I don't know if this helps (probably not), but we have a similar situation on AArch64: a 64-bit mode like V8QI can be doubled to a 128-bit vector or to a pair of 64-bit ve

[PATCH] vect: Vectorize via libfuncs

2023-06-13 Thread Andrew Stubbs
This patch allows vectorization when operators are available as libfuncs, rather than only as insns. This will be useful for amdgcn where we plan to vectorize loops that contain integer division or modulus, but don't want to generate inline instructions for the division algorithm every time.

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-14 Thread Andrew Stubbs
nts? OK? Btw, testing on GCN would be welcome - the _avx512 paths could work for it so in case the while_ult path fails (not sure if it ever does) it could get _avx512 style masking. Likewise testing on ARM just to see I didn't break anything here. I don't have SVE hardware so testing is probably meaningless. I can set some tests going. Is vect.exp enough? Andrew

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Andrew Stubbs
On 14/06/2023 15:29, Richard Biener wrote: On 14.06.2023 at 16:27, Andrew Stubbs wrote: On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote: This implements fully masked vectorization or a masked epilog for AVX512 style masks which single themselves out by representing each lane

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Andrew Stubbs
On 15/06/2023 10:58, Richard Biener wrote: On Thu, 15 Jun 2023, Andrew Stubbs wrote: On 14/06/2023 15:29, Richard Biener wrote: On 14.06.2023 at 16:27, Andrew Stubbs wrote: On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote: This implements fully masked vectorization or a masked

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Andrew Stubbs
On 15/06/2023 12:06, Richard Biener wrote: On Thu, 15 Jun 2023, Andrew Stubbs wrote: On 15/06/2023 10:58, Richard Biener wrote: On Thu, 15 Jun 2023, Andrew Stubbs wrote: On 14/06/2023 15:29, Richard Biener wrote: On 14.06.2023 at 16:27, Andrew Stubbs wrote: On 14/06/2023 12:54

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Andrew Stubbs
On 15/06/2023 14:34, Richard Biener wrote: On Thu, 15 Jun 2023, Andrew Stubbs wrote: On 15/06/2023 12:06, Richard Biener wrote: On Thu, 15 Jun 2023, Andrew Stubbs wrote: On 15/06/2023 10:58, Richard Biener wrote: On Thu, 15 Jun 2023, Andrew Stubbs wrote: On 14/06/2023 15:29, Richard

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Andrew Stubbs
On 15/06/2023 15:00, Richard Biener wrote: On Thu, 15 Jun 2023, Andrew Stubbs wrote: On 15/06/2023 14:34, Richard Biener wrote: On Thu, 15 Jun 2023, Andrew Stubbs wrote: On 15/06/2023 12:06, Richard Biener wrote: On Thu, 15 Jun 2023, Andrew Stubbs wrote: On 15/06/2023 10:58, Richard

[PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators

2022-07-07 Thread Andrew Stubbs
ement memory that's both high-bandwidth and pinned anyway). Patches 15 to 17 are new work. I can probably approve these myself, but they can't be committed until the rest of the series is approved. Andrew Andrew Stubbs (11): libgomp, nvptx: low-latency memory allocator libgomp: pinned m

[PATCH 02/17] libgomp: pinned memory

2022-07-07 Thread Andrew Stubbs
Implement the OpenMP pinned memory trait on Linux hosts using the mlock syscall. Pinned allocations are performed using mmap, not malloc, to ensure that they can be unpinned safely when freed. libgomp/ChangeLog: * allocator.c (MEMSPACE_ALLOC): Add PIN. (MEMSPACE_CALLOC): Add PIN
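The approach described above (mmap for the allocation, mlock to pin it) can be sketched as follows. This is an illustration under assumptions, not libgomp's code: the names `pinned_alloc` and `pinned_free` are hypothetical, the sketch targets Linux, and it ignores page-size rounding and the allocator bookkeeping the real patch would need.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map `size` bytes of anonymous memory and lock it
   into RAM. Returns NULL if either the mapping or the lock fails (for
   example when RLIMIT_MEMLOCK is too low). */
void *pinned_alloc(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (mlock(p, size) != 0) {
        munmap(p, size);
        return NULL;
    }
    return p;
}

/* Unmapping the region also removes its lock, so no separate munlock
   call is needed. Returns 0 on success, like munmap. */
int pinned_free(void *p, size_t size)
{
    return munmap(p, size);
}
```

Using a dedicated mapping means the whole locked range belongs to one allocation, so freeing it cannot unpin memory that some other allocation still relies on, which is the safety property the snippet attributes to choosing mmap over malloc.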

[PATCH 01/17] libgomp, nvptx: low-latency memory allocator

2022-07-07 Thread Andrew Stubbs
This patch adds support for allocating low-latency ".shared" memory on NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc. The memory can be allocated, reallocated, and freed using a basic but fast algorithm, is thread safe and the size of the low-latency heap can be configured using t

[PATCH 04/17] openmp, nvptx: low-lat memory access traits

2022-07-07 Thread Andrew Stubbs
The NVPTX low latency memory is not accessible outside the team that allocates it, and therefore should be unavailable for allocators with the access trait "all". This change means that the omp_low_lat_mem_alloc predefined allocator now implicitly implies the "pteam" trait. libgomp/ChangeLog:

[PATCH 03/17] libgomp, openmp: Add ompx_pinned_mem_alloc

2022-07-07 Thread Andrew Stubbs
This creates a new predefined allocator as a shortcut for using pinned memory with OpenMP. The name uses the OpenMP extension space and is intended to be consistent with other OpenMP implementations currently in development. The allocator is equivalent to using a custom allocator with the pinned

[PATCH 09/17] openmp: Use libgomp memory allocation functions with unified shared memory.

2022-07-07 Thread Andrew Stubbs
. co-authored-by: Andrew Stubbs --- gcc/omp-low.cc | 174 +++ gcc/passes.def | 1 + gcc/testsuite/c-c++-common/gomp/usm-2.c | 46 ++ gcc/testsuite/c-c++-common/gomp/usm-3.c | 44 ++ gcc/testsuite/g++.dg/gomp/usm-1

[PATCH 06/17] openmp: Add -foffload-memory

2022-07-07 Thread Andrew Stubbs
Add a new option. It's inactive until I add some follow-up patches. gcc/ChangeLog: * common.opt: Add -foffload-memory and its enum values. * coretypes.h (enum offload_memory): New. * doc/invoke.texi: Document -foffload-memory. --- gcc/common.opt | 16 ++

[PATCH 05/17] openmp, nvptx: ompx_unified_shared_mem_alloc

2022-07-07 Thread Andrew Stubbs
This adds support for using Cuda Managed Memory with omp_alloc. It will be used as the underpinnings for "requires unified_shared_memory" in a later patch. There are two new predefined allocators, ompx_unified_shared_mem_alloc and ompx_host_mem_alloc, plus corresponding memory spaces, which can

[PATCH 12/17] Handle cleanup of omp allocated variables (OpenMP 5.0).

2022-07-07 Thread Andrew Stubbs
Currently we only handle an omp allocate directive that is associated with an allocate statement. This statement results in malloc and free calls. The malloc calls are easy to get to as they are in the same block as the allocate directive. But the free calls come in a separate cleanup block. To

[PATCH 07/17] openmp: allow requires unified_shared_memory

2022-07-07 Thread Andrew Stubbs
This is the front-end portion of the Unified Shared Memory implementation. It removes the "sorry, unimplemented message" in C, C++, and Fortran, and sets flag_offload_memory, but is otherwise inactive, for now. It also checks that -foffload-memory isn't set to an incompatible mode. gcc/c/ChangeL

[PATCH 11/17] Translate allocate directive (OpenMP 5.0).

2022-07-07 Thread Andrew Stubbs
gcc/fortran/ChangeLog: * trans-openmp.cc (gfc_trans_omp_clauses): Handle OMP_LIST_ALLOCATOR. (gfc_trans_omp_allocate): New function. (gfc_trans_omp_directive): Handle EXEC_OMP_ALLOCATE. gcc/ChangeLog: * tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_AL

[PATCH 14/17] Lower allocate directive (OpenMP 5.0).

2022-07-07 Thread Andrew Stubbs
This patch looks for malloc/free calls that were generated by allocate statement that is associated with allocate directive and replaces them with GOMP_alloc and GOMP_free. gcc/ChangeLog: * omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_ALLOCATOR. (scan_omp_allocate): New.

[PATCH 08/17] openmp: -foffload-memory=pinned

2022-07-07 Thread Andrew Stubbs
Implement the -foffload-memory=pinned option such that libgomp is instructed to enable fully-pinned memory at start-up. The option is intended to provide a performance boost to certain offload programs without modifying the code. This feature only works on Linux, at present, and simply calls mlo

[PATCH 13/17] Gimplify allocate directive (OpenMP 5.0).

2022-07-07 Thread Andrew Stubbs
gcc/ChangeLog: * doc/gimple.texi: Describe GIMPLE_OMP_ALLOCATE. * gimple-pretty-print.cc (dump_gimple_omp_allocate): New function. (pp_gimple_stmt_1): Call it. * gimple.cc (gimple_build_omp_allocate): New function. * gimple.def (GIMPLE_OMP_ALLOCATE): New no

[PATCH 10/17] Add parsing support for allocate directive (OpenMP 5.0)

2022-07-07 Thread Andrew Stubbs
Currently we only make use of this directive when it is associated with an allocate statement. gcc/fortran/ChangeLog: * dump-parse-tree.cc (show_omp_node): Handle EXEC_OMP_ALLOCATE. (show_code_node): Likewise. * gfortran.h (enum gfc_statement): Add ST_OMP_ALLOCATE.

[PATCH 17/17] amdgcn: libgomp plugin USM implementation

2022-07-07 Thread Andrew Stubbs
Implement the Unified Shared Memory API calls in the GCN plugin. The allocate and free are pretty straight-forward because all "target" memory allocations are compatible with USM, on the right hardware. However, there's no known way to check what memory region was used, after the fact, so we use

[PATCH 15/17] amdgcn: Support XNACK mode

2022-07-07 Thread Andrew Stubbs
The XNACK feature allows memory load instructions to restart safely following a page-miss interrupt. This is useful for shared-memory devices, like APUs, and to implement OpenMP Unified Shared Memory. To support the feature we must be able to set the appropriate meta-data and set the load instru

[PATCH 16/17] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK

2022-07-07 Thread Andrew Stubbs
The AMD GCN runtime must be set to the correct mode for Unified Shared Memory to work, but this is not always clear at compile and link time due to the split nature of the offload compilation pipeline. This patch sets a new attribute on OpenMP offload functions to ensure that the information is p

Re: [PATCH 08/17] openmp: -foffload-memory=pinned

2022-07-07 Thread Andrew Stubbs
On 07/07/2022 12:54, Tobias Burnus wrote: Hi Andrew, On 07.07.22 12:34, Andrew Stubbs wrote: Implement the -foffload-memory=pinned option such that libgomp is instructed to enable fully-pinned memory at start-up.  The option is intended to provide a performance boost to certain offload

Re: [PATCH 08/17] openmp: -foffload-memory=pinned

2022-07-08 Thread Andrew Stubbs
On 08/07/2022 10:00, Tobias Burnus wrote: On 08.07.22 00:18, Andrew Stubbs wrote: Likewise, the 'requires' mechanism could then also be used in '[PATCH 16/17] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK'. No, I don't think so; that environment variable ne

[PATCH] openmp: fix max_vf setting for amdgcn offloading

2022-07-12 Thread Andrew Stubbs
This patch ensures that the maximum vectorization factor used to set the "safelen" attribute on "omp simd" constructs is suitable for all the configured offload devices. Right now it makes the proper adjustment for NVPTX, but otherwise just uses a value suitable for the host system (always x86

[committed] amdgcn: 64-bit not

2022-07-29 Thread Andrew Stubbs
I've committed this patch to enable DImode one's-complement on amdgcn. The hardware doesn't have 64-bit not, and this isn't needed by expand which is happy to use two SImode operations, but the vectorizer isn't so clever. Vector condition masks are DImode on amdgcn, so this has been causing lo
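The remark that a 64-bit one's complement can be built from two 32-bit operations can be sketched in plain C as follows. This is only an illustration of the arithmetic identity, not the machine description from the patch; the helper name `not64_via_halves` is made up for the example.

```c
#include <stdint.h>

/* Hypothetical helper: computes ~x for a 64-bit value using only
   32-bit NOT operations on the low and high halves, the way a target
   without a native 64-bit NOT could split the operation.  */
static uint64_t not64_via_halves(uint64_t x)
{
    uint32_t lo = (uint32_t)x;          /* low 32 bits  */
    uint32_t hi = (uint32_t)(x >> 32);  /* high 32 bits */

    lo = ~lo;                           /* first 32-bit NOT  */
    hi = ~hi;                           /* second 32-bit NOT */

    return ((uint64_t)hi << 32) | lo;   /* recombine the halves */
}
```

Because bitwise NOT acts on each bit independently, the result is the same as applying `~` to the full 64-bit value.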

[committed] amdgcn: 64-bit vector shifts

2022-07-29 Thread Andrew Stubbs
I've committed this patch to implement V64DImode vector-vector and vector-scalar shifts. In particular, these are used by the SIMD "inbranch" clones that I'm working on right now, but it's an omission that ought to have been fixed anyway. Andrew amdgcn: 64-bit vector shifts Enable 64-bit vec

[PATCH] openmp-simd-clone: Match shift type

2022-07-29 Thread Andrew Stubbs
This patch adjusts the generation of SIMD "inbranch" clones that use integer masks to ensure that it vectorizes on amdgcn. The problem was only that an amdgcn mask is DImode and the shift amount was SImode, and the difference causes vectorization to fail. OK for mainline? Andrew openmp-simd-c
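A source-level analogue of the type mismatch described here might look like the sketch below. It is an assumption-laden illustration, not the GIMPLE the patch emits: `lane_active` and `shift_cnt_wide` are hypothetical names, and C itself does not require the shift operands to have matching types. The explicit widening simply mirrors the idea of converting the shift amount to the mask's type before shifting.

```c
#include <stdint.h>

/* Hypothetical helper: tests whether LANE is enabled in a 64-bit
   integer mask (one bit per SIMD lane).  LANE must be below 64,
   otherwise the shift is undefined.  */
static int lane_active(uint64_t mask, uint32_t lane)
{
    /* Widen the 32-bit shift count to the 64-bit mask type first, so
       the shift is done with both operands in the same type.  */
    uint64_t shift_cnt_wide = (uint64_t)lane;

    return (int)((mask >> shift_cnt_wide) & 1u);
}
```

For example, with the mask `0x5` lanes 0 and 2 are active and lane 1 is not.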

Re: [PATCH] openmp-simd-clone: Match shift type

2022-07-29 Thread Andrew Stubbs
TYPE (mask)); g = gimple_build_assign (shift_cnt_conv, NOP_EXPR, shift_cnt); gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING); } Your version gives the same output mine does, at least on amdgcn anyway. Am I OK to commit this version? Andrew openmp-simd-clone: Mat
