[committed] amdgcn: Vector procedure call ABI

2022-08-09 Thread Andrew Stubbs
I've committed this patch for amdgcn. This changes the procedure calling ABI such that vector arguments are passed in vector registers, rather than on the stack as before. The ABI for scalar functions is the same for arguments, but the return value has now moved to a vector register; keeping

[PATCH 0/3] OpenMP SIMD routines

2022-08-09 Thread Andrew Stubbs
ure has backend support for the clones at this time. OK for mainline (patches 1 & 3)? Thanks Andrew Andrew Stubbs (3): omp-simd-clone: Allow fixed-lane vectors amdgcn: OpenMP SIMD routine support vect: inbranch SIMD clones gcc/config/gcn/gcn.cc | 63 gcc

[PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors

2022-08-09 Thread Andrew Stubbs
The vecsize_int/vecsize_float has an assumption that all arguments will use the same bitsize, and vary the number of lanes according to the element size, but this is inappropriate on targets where the number of lanes is fixed and the bitsize varies (i.e. amdgcn). With this change the vecsize can
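To make the distinction concrete, here is a small sketch of the two sizing models. The helper names and the 512-bit example width are illustrative assumptions, not anything from the patch; the 64-lane figure matches amdgcn's V64 vector modes.

```c
#include <assert.h>

/* Fixed-bitsize model: the vector width is constant, so the number of
   lanes shrinks as the element type gets wider.  */
unsigned lanes_for_fixed_bits(unsigned vector_bits, unsigned element_bits)
{
    return vector_bits / element_bits;
}

/* Fixed-lane model: the lane count is constant, so the total vector
   bitsize grows with the element type.  */
unsigned bits_for_fixed_lanes(unsigned lanes, unsigned element_bits)
{
    return lanes * element_bits;
}

/* Hypothetical self-check comparing the two models.  */
void check_vecsize_models(void)
{
    /* A 512-bit vector holds 16 floats but only 8 doubles.  */
    assert(lanes_for_fixed_bits(512, 32) == 16);
    assert(lanes_for_fixed_bits(512, 64) == 8);
    /* A 64-lane vector is 2048 bits of floats or 4096 bits of doubles.  */
    assert(bits_for_fixed_lanes(64, 32) == 2048);
    assert(bits_for_fixed_lanes(64, 64) == 4096);
}
```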

[PATCH 2/3] amdgcn: OpenMP SIMD routine support

2022-08-09 Thread Andrew Stubbs
Enable and configure SIMD clones for amdgcn. This affects both the __simd__ function attribute, and the OpenMP "declare simd" directive. Note that the masked SIMD variants are generated, but the middle end doesn't actually support calling them yet. gcc/ChangeLog: * config/gcn/gcn.cc (g
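For readers unfamiliar with the two spellings mentioned above, a minimal sketch follows. The function names are made up for illustration; whether clones are actually generated and called depends on the target, the GCC version, and the options used (for example -fopenmp or -fopenmp-simd for the pragma).

```c
#include <assert.h>

/* OpenMP spelling: without -fopenmp or -fopenmp-simd the pragma is
   simply ignored and only the scalar function exists.  */
#pragma omp declare simd
float scale_omp(float x)
{
    return 2.0f * x;
}

/* GCC attribute spelling of the same request.  */
__attribute__((simd))
float scale_attr(float x)
{
    return 2.0f * x;
}

/* A loop the vectorizer may turn into calls to the SIMD clones.  */
void apply_scale(float *out, const float *in, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = scale_omp(in[i]) + scale_attr(in[i]);
}

/* Hypothetical self-check: results must match the scalar semantics
   whether or not any clone is used.  */
void check_simd_routines(void)
{
    float in[8], out[8];
    for (int i = 0; i < 8; i++)
        in[i] = (float)i;
    apply_scale(out, in, 8);
    for (int i = 0; i < 8; i++)
        assert(out[i] == 4.0f * (float)i);
}
```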

[PATCH 3/3] vect: inbranch SIMD clones

2022-08-09 Thread Andrew Stubbs
There has been support for generating "inbranch" SIMD clones for a long time, but nothing actually uses them (as far as I can see). This patch adds support for a subset of possible cases (those using mask_mode == VOIDmode). The other cases fail to vectorize, just as before, so there should be n

[PATCH v2 2/3] openmp, nvptx: low-lat memory access traits

2023-08-02 Thread Andrew Stubbs
The NVPTX low latency memory is not accessible outside the team that allocates it, and therefore should be unavailable for allocators with the access trait "all". This change means that the omp_low_lat_mem_alloc predefined allocator now implies the "pteam" trait. libgomp/ChangeLog:

[PATCH v2 0/3] libgomp: OpenMP low-latency omp_alloc

2023-08-02 Thread Andrew Stubbs
ation so both architectures can share the code. Andrew Andrew Stubbs (3): libgomp, nvptx: low-latency memory allocator openmp, nvptx: low-lat memory access traits amdgcn, libgomp: low-latency allocator gcc/config/gcn/gcn-builtins.def | 2 + gcc/config/gc

[PATCH v2 1/3] libgomp, nvptx: low-latency memory allocator

2023-08-02 Thread Andrew Stubbs
This patch adds support for allocating low-latency ".shared" memory on NVPTX GPU devices, via omp_low_lat_mem_space and omp_alloc. The memory can be allocated, reallocated, and freed using a basic but fast algorithm, is thread safe, and the size of the low-latency heap can be configured using t

[PATCH v2 3/3] amdgcn, libgomp: low-latency allocator

2023-08-02 Thread Andrew Stubbs
This implements the OpenMP low-latency memory allocator for AMD GCN using the small per-team LDS memory (Local Data Store). Since addresses can now refer to LDS space, the "Global" address space is no longer compatible. This patch therefore switches the backend to use entirely "Flat" addressing

[committed] amdgcn: Delete inactive libfuncs

2023-06-19 Thread Andrew Stubbs
There were implementations for HImode division in libgcc, but there were no matching libfuncs defined in the compiler, so the code was inactive (GCC only defines SImode and DImode, by default, and amdgcn only adds TImode explicitly). On trying to activate it I find that the definition of TARG

[committed] amdgcn: minimal V64TImode vector support

2023-06-19 Thread Andrew Stubbs
This patch adds just enough TImode vector support to use them for moving data about. This is primarily for the use of divmodv64di4, which will use TImode to return a pair of DImode values. The TImode vectors have no other operators defined, and there are no hardware instructions to support thi

Re: [PATCH] move the (a-b) CMP 0 ? (a-b) : (b-a) optimization from fold_cond_expr_with_comparison to match

2023-10-19 Thread Andrew Pinski
On Mon, Jul 12, 2021 at 4:47 AM Richard Biener via Gcc-patches wrote: > > On Sun, Jul 11, 2021 at 4:12 AM apinski--- via Gcc-patches > wrote: > > > > From: Andrew Pinski > > > > This patch moves the (a-b) CMP 0 ? (a-b) : (b-a) optimization > > from

[committed] amdgcn: add -march=gfx1030 EXPERIMENTAL

2023-10-20 Thread Andrew Stubbs
I've committed this patch that allows building binaries for AMD gfx1030 GPUs. I can't actually test it, however, so somebody else will have to debug it (or wait for me to get my hands on a device). Richi reports that it does not execute correctly, as is. This is an experimental broken feature,

Re: [PATCH] wwwdocs: gcc-14: mark amdgcn fiji deprecated

2023-10-20 Thread Andrew Stubbs
On 19/10/2023 11:07, Tobias Burnus wrote: On 19.10.23 11:49, Andrew Stubbs wrote: OK to commit? (I think as maintainer you don't need approval - but of course comments by others can be helpful; I hope mine are. Additionally, Gerald (CCed) helps with keeping the webpages in good shape (t

[PATCH] vect: Don't set excess bits in uniform masks

2023-10-20 Thread Andrew Stubbs
This patch fixes a wrong-code bug on amdgcn in which the excess "ones" in the mask enable extra lanes that were supposed to be unused and are therefore undefined. Richi suggested an alternative approach involving narrower types and then a zero-extend to the actual mask type. This solved the p

Re: [PATCH 2/2] c++: remove NON_DEPENDENT_EXPR, part 2

2023-10-20 Thread Andrew Pinski
or constexpr and not modify it not to include trees it does not use. In this case NON_DEPENDENT_EXPR was removed and now the rust front-end is broken. Thanks, Andrew > > gcc/cp/ChangeLog: > > * call.cc (build_new_method_call): Remove calls to > buil

[PATCH] convert_to_complex vs invalid_conversion [PR111903]

2023-10-21 Thread Andrew Pinski
convert_to_complex, when creating a COMPLEX_EXPR, does not currently check whether either the real or imag part is error_mark_node. This later on confuses the gimplifier when there was a SAVE_EXPR wrapped around that COMPLEX_EXPR. The simple fix is after calling convert inside convert_to_complex_1,

Re: [PATCH] move the (a-b) CMP 0 ? (a-b) : (b-a) optimization from fold_cond_expr_with_comparison to match

2023-10-21 Thread Andrew Pinski
On Thu, Oct 19, 2023 at 10:13 PM Andrew Pinski wrote: > > On Mon, Jul 12, 2021 at 4:47 AM Richard Biener via Gcc-patches > wrote: > > > > On Sun, Jul 11, 2021 at 4:12 AM apinski--- via Gcc-patches > > wrote: > > > > > > From: Andrew Pinski > &

[PATCHv2] move the (a-b) CMP 0 ? (a-b) : (b-a) optimization from fold_cond_expr_with_comparison to match

2023-10-21 Thread Andrew Pinski
From: Andrew Pinski This patch moves the `(a-b) CMP 0 ? (a-b) : (b-a)` optimization from fold_cond_expr_with_comparison to match. Bootstrapped and tested on x86_64-linux-gnu. Changes in: v2: Removes `(a == b) ? 0 : (b - a)` handling since it was handled via r14-3606-g3d86e7f4a8ae
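As a reminder of what this family of simplifications means at the source level, here is a hedged sketch, assuming signed operands whose difference does not overflow. The `>=` form corresponds to `abs(a-b)` and the `<=` form to `-abs(a-b)`; the function names are illustrative and this is not the match.pd pattern itself.

```c
#include <assert.h>
#include <stdlib.h>

/* (a-b) >= 0 ? (a-b) : (b-a) is the absolute difference.  */
int cond_ge_form(int a, int b)
{
    return (a - b) >= 0 ? (a - b) : (b - a);
}

/* (a-b) <= 0 ? (a-b) : (b-a) is the negated absolute difference.  */
int cond_le_form(int a, int b)
{
    return (a - b) <= 0 ? (a - b) : (b - a);
}

/* Hypothetical self-check over a small range where a-b cannot overflow.  */
void check_abs_forms(void)
{
    for (int a = -8; a <= 8; a++)
        for (int b = -8; b <= 8; b++) {
            assert(cond_ge_form(a, b) == abs(a - b));
            assert(cond_le_form(a, b) == -abs(a - b));
        }
}
```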

Re: [PATCH] gcc.c-torture/execute/builtins/fputs.c: Define _GNU_SOURCE

2023-10-22 Thread Andrew Pinski
a lib-fputs.c file which will define a fputs_unlock which is how it will link even if the libc does not define a fputs_unlock. Thanks, Andrew Pinski > > gcc/testsuite/ > > * gcc.c-torture/execute/builtins/fputs.c (_GNU_SOURCE): > Define. > > --- > gcc/te

[PATCH] Use error_mark_node after error in convert

2023-10-22 Thread Andrew Pinski
While working on PR c/111903, I noticed that convert will convert integer_zero_node to that type after an error instead of returning error_mark_node. From what I can tell this was the old way of not having error recovery since other places in this file do return error_mark_node and the places I

[Committedv2] aarch64: [PR110986] Emit csinv again for `a ? ~b : b`

2023-10-22 Thread Andrew Pinski
After r14-3110-g7fb65f10285, the canonical form for `a ? ~b : b` changed to `-(a) ^ b`, which means that for aarch64 we need to add a few new insn patterns to be able to catch this and change it to what is the canonical form for the aarch64 backend. A secondary pattern was needed to support a zero_e
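The two forms are equivalent when `a` is a boolean (0 or 1): negating it gives either all-zeros or all-ones, and XOR with all-ones is bitwise NOT. A small sketch with illustrative function names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The source-level conditional form.  */
uint32_t select_form(bool a, uint32_t b)
{
    return a ? ~b : b;
}

/* The XOR form: -(a) is 0 when a is false and 0xffffffff when true.  */
uint32_t xor_form(bool a, uint32_t b)
{
    return (-(uint32_t)a) ^ b;
}

/* Hypothetical self-check on a few representative values.  */
void check_csinv_forms(void)
{
    const uint32_t samples[] = { 0u, 1u, 0x1234u, 0x80000000u, 0xffffffffu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(select_form(false, samples[i]) == xor_form(false, samples[i]));
        assert(select_form(true, samples[i]) == xor_form(true, samples[i]));
    }
}
```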

[PATCH] match: Fix the `popcnt(a&b) + popcnt(a|b)` pattern for types [PR111913]

2023-10-23 Thread Andrew Pinski
So this pattern needs a little help on the gimple side of things to know what the type popcount should be. For most builtins, the type is the same as the input but popcount and others are not. And when using it with another outer expression, genmatch needs some slight help to know that the return
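For context, the arithmetic identity that this pattern presumably relies on is `popcount(a&b) + popcount(a|b) == popcount(a) + popcount(b)`: each bit set in both inputs is counted twice on either side, and each bit set in only one input is counted once. The sketch below uses a hand-written popcount so that it does not depend on any builtin; the names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Portable population count: clear the lowest set bit until none remain.  */
unsigned popcount32(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Hypothetical self-check of the identity on all pairs of samples.  */
void check_popcount_identity(void)
{
    const uint32_t s[] = { 0u, 1u, 0xffu, 0xf0f0f0f0u, 0x12345678u, 0xffffffffu };
    const unsigned n = sizeof s / sizeof s[0];
    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < n; j++)
            assert(popcount32(s[i] & s[j]) + popcount32(s[i] | s[j])
                   == popcount32(s[i]) + popcount32(s[j]));
}
```

Note that the result of a popcount is a small count regardless of how wide the input is, which is the kind of input/result type mismatch the message describes.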

Re: Inquiry about ARM gcc5 CVE-2023-4039 Patch

2023-10-23 Thread Andrew Pinski
issue with any correct code that GCC will process. GCC does not consider this a security issue according to its security policy. See the "Security features implemented in GCC" section of https://gcc.gnu.org/git/?p=gcc.git;a=blob_plain;f=SECURITY.txt;hb=HEAD for more information on that policy. Thanks, Andrew Pinski > > Best regards,

[PATCHv2] Improve factor_out_conditional_operation for conversions and constants

2023-10-23 Thread Andrew Pinski
In the case of a NOP conversion (precisions of the 2 types are equal), factoring out the conversion can be done even if int_fits_type_p returns false and even when the conversion is defined by a statement inside the conditional. Since it is a NOP conversion there is no zero/sign extending happening

[PATCH] match: Simplify `a != C1 ? abs(a) : C2` when C2 == abs(C1) [PR111957]

2023-10-24 Thread Andrew Pinski
This adds a match pattern for `a != C1 ? abs(a) : C2` which gets simplified to `abs(a)`. If C1 was originally *_MIN, then change it over to use absu instead of abs. Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/111957 gcc/ChangeLog: * match
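A concrete instance, with C1 = -5 and C2 = 5 chosen purely for illustration: the conditional only special-cases an input for which `abs(a)` already yields C2, so it can be dropped.

```c
#include <assert.h>
#include <stdlib.h>

/* a != C1 ? abs(a) : C2, with C1 = -5 and C2 = abs(C1) = 5.  */
int abs_with_special_case(int a)
{
    return a != -5 ? abs(a) : 5;
}

/* The simplified form.  */
int abs_plain(int a)
{
    return abs(a);
}

/* Hypothetical self-check; the range avoids INT_MIN, where abs overflows.  */
void check_abs_special_case(void)
{
    for (int a = -20; a <= 20; a++)
        assert(abs_with_special_case(a) == abs_plain(a));
}
```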

[PATCH] Improve tree_expr_nonnegative_p by using the ranger [PR111959]

2023-10-24 Thread Andrew Pinski
I noticed we were missing optimizing `a / (1 << b)` when we know that a is nonnegative but only due to ranger information. This adds the use of the global ranger to tree_single_nonnegative_warnv_p for SSA_NAME. I didn't extend tree_single_nonnegative_warnv_p to use the ranger for floating point nor
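The simplification at stake is presumably the replacement of the division with a right shift, which is only valid when `a` cannot be negative. A sketch with illustrative names:

```c
#include <assert.h>

/* Division by a power of two computed at run time.  */
int div_form(int a, int b)
{
    return a / (1 << b);
}

/* Equivalent shift, valid only for non-negative a.  For negative a the
   division truncates toward zero, while an arithmetic shift rounds
   toward negative infinity, so the two forms would disagree.  */
int shift_form(int a, int b)
{
    return a >> b;
}

/* Hypothetical self-check restricted to non-negative a.  */
void check_div_shift(void)
{
    for (int a = 0; a <= 1000; a++)
        for (int b = 0; b <= 10; b++)
            assert(div_form(a, b) == shift_form(a, b));
}
```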

[COMMITTED] Faster irange union for appending ranges.

2023-10-25 Thread Andrew MacLeod
The result is a 2.1% speedup in VRP and a 0.8% speedup in threading, with an overall compile time improvement of 0.14% across the GCC build. Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed. Andrew commit f7dbf6230453c76a19921607601eff968bb70169 Author: Andrew MacLeod Date

Re: [PATCH] match: Simplify `a != C1 ? abs(a) : C2` when C2 == abs(C1) [PR111957]

2023-10-26 Thread Andrew Pinski
On Thu, Oct 26, 2023 at 2:24 AM Richard Biener wrote: > > On Wed, Oct 25, 2023 at 5:37 AM Andrew Pinski wrote: > > > > This adds a match pattern for `a != C1 ? abs(a) : C2` which gets simplified > > to `abs(a)`. if C1 was originally *_MIN then change it over to use absu

Re: [PATCH] Improve tree_expr_nonnegative_p by using the ranger [PR111959]

2023-10-26 Thread Andrew Pinski
On Thu, Oct 26, 2023 at 2:29 AM Richard Biener wrote: > > On Wed, Oct 25, 2023 at 5:51 AM Andrew Pinski wrote: > > > > I noticed we were missing optimizing `a / (1 << b)` when > > we know that a is nonnegative but only due to ranger information. > > This

Re: Ping: [PATCH v2 0/2] Replace intl/ with out-of-tree GNU gettext

2023-10-26 Thread Andrew Pinski
d modern gettext > > > > Ping on this patch series. One comment from me. It would be nice to update install.texi in gcc/doc/ to make a mention of this requirement for non-glibc hosts. Thanks, Andrew Pinski > > TIA, have a lovely night :-) > -- > Arsen Arsenović

Re: [PATCH] testsuite, aarch64: Normalise options to aarch64.exp.

2023-10-26 Thread Andrew Pinski
but that should stabilize. OK for trunk? This fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93619 I think. Thanks, Andrew > thanks > Iain > > --- 8< --- > > When the compiler is configured --with-cpu= and that is different from > the baselines assumed, we see excess te

Re: [PATCH htdocs v2] bugs: Mention -D_GLIBCXX_ASSERTIONS and -D_GLIBCXX_DEBUG

2023-10-26 Thread Andrew Pinski
always feasible because it requires the whole program and any used > libraries to also be built with it (as it breaks ABI). One suggestion to this is also link to the libstdc++ manual on debug mode: https://gcc.gnu.org/onlinedocs/libstdc++/manual/debug_mode.html Thanks, Andrew > > S

[PATCH] MATCH: Simplify `(X &| B) CMP X` if possible [PR 101590]

2023-10-26 Thread Andrew Pinski
From: Andrew Pinski I noticed we were missing these simplifications so let's add them. This adds the following simplifications: U & N <= U -> true U & N > U -> false when U is known to be non-negative. When N is also known to be non-negative, this is also true: U
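The first pair of simplifications can be checked with a short sketch: when U is non-negative, `U & N` has a clear sign bit and only a subset of U's bits, so it can never exceed U. The function name is illustrative, and the negative counterexample assumes two's complement.

```c
#include <assert.h>

/* Evaluates (u & n) <= u for signed operands.  */
int and_le(int u, int n)
{
    return (u & n) <= u;
}

/* Hypothetical self-check: always true for non-negative u, any n.  */
void check_and_le(void)
{
    for (int u = 0; u <= 255; u++)
        for (int n = -256; n <= 255; n++)
            assert(and_le(u, n));
    /* With a negative u the comparison can fail: -1 & 0 is 0, and 0 > -1.  */
    assert(!and_le(-1, 0));
}
```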

Re: [PATCH] MATCH: Simplify `(X &| B) CMP X` if possible [PR 101590]

2023-10-27 Thread Andrew Pinski
On Thu, Oct 26, 2023 at 11:56 PM Richard Biener wrote: > > > > > Am 26.10.2023 um 23:10 schrieb Andrew Pinski : > > > > From: Andrew Pinski > > > > I noticed we were missing these simplifications so let's add them. > > > > This adds the

Re: [PATCH] wwwdocs: gcc-14: mark amdgcn fiji deprecated

2023-10-27 Thread Andrew Stubbs
On 22/10/2023 13:24, Gerald Pfeifer wrote: Hi Andrew, On Fri, 20 Oct 2023, Andrew Stubbs wrote: Additionally, I wonder whether "Fiji" should be changed to "Fiji (gfx803)" in the first line and whether the  "," should be removed in "The ... configuration .

[committed] amdgcn: silence warnings

2023-10-27 Thread Andrew Stubbs
This trivial patch adds the "operands" keyword to the condition in a couple of patterns that cause warnings about "missing" mode specifiers. With the iterators, there were a large number of warnings about these cases that have now been silenced. Andrew amdgcn: silence warnings The operands re

Re: [committed] amdgcn: add -march=gfx1030 EXPERIMENTAL

2023-10-27 Thread Andrew Stubbs
On 20/10/2023 12:51, Andrew Stubbs wrote: I've committed this patch that allows building binaries for AMD gfx1030 GPUs. I can't actually test it, however, so somebody else will have to debug it (or wait for me to get my hands on a device). Richi reports that it does not execute cor

Re: [PATCH] RISC-V: Fix wrong tune parameters on int_div

2023-10-27 Thread Andrew Waterman
On Fri, Oct 27, 2023 at 6:55 AM Jeff Law wrote: > > > > On 10/27/23 01:49, Robin Dapp wrote: > >> @@ -346,7 +346,7 @@ static const struct riscv_tune_param rocket_tune_info > >> = { > >> {COSTS_N_INSNS (4), COSTS_N_INSNS (5)}, /* fp_mul */ > >> {COSTS_N_INSNS (20), COSTS_N_INSNS (20)},

Re: [PATCH] RISC-V: Fix wrong tune parameters on int_div

2023-10-27 Thread Andrew Waterman
On Fri, Oct 27, 2023 at 6:44 AM Jeff Law wrote: > > > > On 10/27/23 01:37, juzhe.zh...@rivai.ai wrote: > > LGTM from my side. > > > > The original integer division COST seems too low. > Almost certainly, though there may be good reasons why it was initially > set so low. I'm generally hesitant to

Re: [PATCH] testsuite, Darwin: Add support for Mach-O function body scans.

2023-10-27 Thread Andrew Pinski
k which seems like a good place to put a test for both formats so when someone changes the function, they could run that testsuite to make sure it is still working for the other format. (Note I am not saying you should add it as part of this patch but it seems like that would be the perfect place for it.) Thanks, Andrew > > Anyway, to make progress, is the revised version OK for trunk? (tested on > aarch64-linux and aarch64-darwin). > thanks > Iain > > >

[PATCH 0/3] start of moving value replacement from phiopt to match

2023-10-29 Thread Andrew Pinski
generation independently of that move. Note this does not add the absorbing_element_p optimizations yet; I filed PR 112271 to record that move. Andrew Pinski (3): MATCH: first of the value replacement moving from phiopt MATCH: Move jump_function_from_stmt support to match.pd MATCH: Add some more

[PATCH 2/3] MATCH: Move jump_function_from_stmt support to match.pd

2023-10-29 Thread Andrew Pinski
This moves the value_replacement support for jump_function_from_stmt to a match pattern. This allows us to optimize things earlier, in phiopt1, rather than waiting for phiopt2, which means phiopt1 needs to be disabled for the vrp03.c testcase. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog:

[PATCH 3/3] MATCH: Add some more value_replacement simplifications to match

2023-10-29 Thread Andrew Pinski
This moves a few more value_replacement simplifications to match. /* a == 1 ? b : a * b -> a * b */ /* a == 1 ? b : b / a -> b / a */ /* a == -1 ? b : a & b -> a & b */ Also adds a testcase to show that we can catch these where value_replacement would not (but other passes would). Bootstrapped and
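Each of the three listed patterns special-cases a value of `a` for which the general expression already gives `b`, so the conditional is redundant. A sketch with illustrative function names (the division check skips a == 0, which is undefined in both forms):

```c
#include <assert.h>

/* a == 1 ? b : a * b, since 1 * b == b.  */
int mul_cond(int a, int b) { return a == 1 ? b : a * b; }

/* a == 1 ? b : b / a, since b / 1 == b.  */
int div_cond(int a, int b) { return a == 1 ? b : b / a; }

/* a == -1 ? b : a & b, since all-ones & b == b.  */
int and_cond(int a, int b) { return a == -1 ? b : (a & b); }

/* Hypothetical self-check against the unconditional expressions.  */
void check_value_replacement(void)
{
    for (int a = -4; a <= 4; a++)
        for (int b = -20; b <= 20; b++) {
            assert(mul_cond(a, b) == a * b);
            assert(and_cond(a, b) == (a & b));
            if (a != 0)
                assert(div_cond(a, b) == b / a);
        }
}
```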

[PATCH 1/3] MATCH: first of the value replacement moving from phiopt

2023-10-29 Thread Andrew Pinski
This moves a few simple patterns that are done in value replacement in phiopt over to match.pd. Just the simple ones which might show up in other code. This allows some optimizations to happen even without depending on sinking happening and in some cases where phiopt is not invoked (cond-1.c

Re: [PATCH] Testsuite, i386: Fix test by passing -march

2023-10-30 Thread Andrew Pinski
On Mon, Oct 30, 2023 at 5:05 AM Iain Sandoe wrote: > > > > > On 30 Oct 2023, at 11:53, FX Coudert wrote: > > > The newly introduced test gcc.target/i386/pr111698.c currently fails on > > Darwin, where the default arch is core2. > > Andrew suggested in https

Re: [PATCH 2/3] MATCH: Move jump_function_from_stmt support to match.pd

2023-10-30 Thread Andrew Pinski
On Mon, Oct 30, 2023 at 2:29 AM Richard Biener wrote: > > On Sun, Oct 29, 2023 at 5:41 PM Andrew Pinski wrote: > > > > This moves the value_replacement support for jump_function_from_stmt > > to match pattern. > > This allows us to optimize things earlier in phio

Re: [PATCH 1/2] match.pd: Support combine cond_len_op + vec_cond similar to cond_op

2023-10-30 Thread Andrew Pinski
), > +num_ops (6) > +{ > + ops[0] = op0; > + ops[1] = op1; > + ops[2] = op2; > + ops[3] = op3; > + ops[4] = op4; > + ops[5] = op5; > +} Hmm, does it make sense to start to use variadic templates for these constructors instead of writing them out? And

Re: [PATCH 1/2] match.pd: Support combine cond_len_op + vec_cond similar to cond_op

2023-10-31 Thread Andrew Pinski
On Tue, Oct 31, 2023 at 12:08 AM Lehua Ding wrote: > > Hi Andrew, > > On 2023/10/31 14:48, Andrew Pinski wrote: > >> +inline > >> +gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in, > >> +

Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-11-02 Thread Andrew Pinski
(_40 != 6.4e+1) // not working It is test_epi32_ps which is failing with TEST_PS macro and the plus operand that uses TESTOP: TESTOP (add, +, float, ps, 0.0f); \ I have not reduced the testcase any further though. Thanks, Andrew Pinski > > Regards &

Re: [1/3] Add support for target_version attribute

2023-11-03 Thread Andrew Carlotti
On Thu, Oct 26, 2023 at 07:41:09PM +0100, Richard Sandiford wrote: > Andrew Carlotti writes: > > This patch adds support for the "target_version" attribute to the middle > > end and the C++ frontend, which will be used to implement function > > multiversioning in th

[COMMITTED 1/2] Remove simple ranges from trailing zero bitmasks.

2023-11-03 Thread Andrew MacLeod
those bits with those bits from the value field. Bootstraps on build-x86_64-pc-linux-gnu with no regressions. Pushed. Andrew From b20f1dce46fb8bb1b142e9087530e546a40edec8 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Tue, 31 Oct 2023 11:51:34 -0400 Subject: [PATCH 1/2] Remove simple range

[COMMITTED 2/2] PR tree-optimization/111766 - Adjust operators equal and not_equal to check bitmasks against constants

2023-11-03 Thread Andrew MacLeod
] [2, +INF] MASK 0xfffe VALUE 0x1 will indicate that any even constants will be false. Bootstraps on x86_64-pc-linux-gnu with no regressions. Pushed. Andrew From eb899fee35b8326b2105c04f58fd58bbdeca9d3b Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Wed, 25 Oct 2023 09:46:50 -0400 Sub
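A simplified model of the check described here, assuming the convention that a set bit in MASK means "unknown" and that VALUE supplies the bits that are known. The struct, the function name, and the 16-bit width are illustrative assumptions rather than the ranger's actual data structures.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for a known-bits pair.  */
struct known_bits {
    uint16_t mask;   /* 1 = bit is unknown */
    uint16_t value;  /* known bit values where mask is 0 */
};

/* Returns nonzero if constant c agrees with every known bit, i.e. the
   tracked value could possibly equal c.  */
int may_equal_constant(struct known_bits kb, uint16_t c)
{
    uint16_t known = (uint16_t)~kb.mask;
    return ((c ^ kb.value) & known) == 0;
}

/* Hypothetical self-check: with MASK 0xfffe VALUE 0x1 the low bit is
   known to be 1, so equality with any even constant is ruled out.  */
void check_known_bits(void)
{
    struct known_bits odd = { 0xfffe, 0x1 };
    for (unsigned c = 0; c < 64; c++)
        assert(may_equal_constant(odd, (uint16_t)c) == (int)(c & 1));
}
```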

Re: [PATCH] Remove unnecessary "& 1" in year_month_day_last::day()

2023-11-05 Thread Andrew Pinski
On Sun, Nov 5, 2023 at 9:13 AM Cassio Neri wrote: > > I could not find any entry in gcc's bugzilla for that. Perhaps my search > wasn't good enough. I filed https://gcc.gnu.org/PR112395 with a first attempt at the patch (will double check it soon). Thanks, Andrew > >

Re: [RFC] vect: disable multiple calls of poly simdclones

2023-11-06 Thread Andrew Stubbs
t least your patch should have come with a testcase (or two). Is there a bugreport tracking this issue? It should affect GCN as well I guess. What does "non-constant simdclones" mean? I'm not sure if this is a thing that can happen on GCN, or not? Andrew

[PATCH][GCC13] PR tree-optimization/105834 - Choose better initial values for ranger.

2023-11-06 Thread Andrew MacLeod
As requested, porting this patch from trunk resolves this PR in GCC 13. Bootstraps on x86_64-pc-linux-gnu with no regressions. OK for the gcc 13 branch? Andrew From 0182a25607fa353274c27ec57ca497c00f1d1b76 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Mon, 6 Nov 2023 11:33:32 -0500

Re: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types

2023-02-13 Thread Andrew Stubbs
ause the middle-end and expand pass give no assistance with that for vectors (unlike scalars). Andrew On 13/02/2023 08:07, Richard Biener via Gcc-patches wrote: On Sat, 11 Feb 2023, juzhe.zh...@rivai.ai wrote: Thanks for contributing this. Hi, Richard. Can you help us with this issue? In RVV

Re: -foffload-memory=pinned (was: [PATCH 1/5] openmp: Add -foffload-memory)

2023-02-13 Thread Andrew Stubbs
On 13/02/2023 14:38, Thomas Schwinge wrote: Hi! On 2022-03-08T11:30:55+, Hafiz Abid Qadeer wrote: From: Andrew Stubbs Add a new option. It will be used in follow-up patches. --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi +@option{-foffload-memory=pinned} forces all host

Re: [PATCH] amdgcn: Add instruction patterns for vector operations on complex numbers

2023-02-14 Thread Andrew Stubbs
On 09/02/2023 20:13, Andrew Jenner wrote: This patch introduces instruction patterns for complex number operations in the GCN machine description. These patterns are cmul, cmul_conj, vec_addsub, vec_fmaddsub, vec_fmsubadd, cadd90, cadd270, cmla and cmls (cmla_conj and cmls_conj were not found

Re: [og12] In 'libgomp/allocator.c:omp_realloc', route 'free' through 'MEMSPACE_FREE' (was: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator)

2023-02-14 Thread Andrew Stubbs
On 14/02/2023 12:54, Thomas Schwinge wrote: Hi Andrew! On 2022-01-13T11:13:51+, Andrew Stubbs wrote: Updated patch: this version fixes some missed cases of malloc in the realloc implementation. Right, and as it seems I've run into another issue: a stray 'free'.

[OG12][committed] amdgcn: OpenMP low-latency allocator

2023-02-16 Thread Andrew Stubbs
These patches implement an LDS memory allocator for OpenMP on AMD. 1. 230216-basic-allocator.patch Separate the allocator from NVPTX so the code can be shared. 2. 230216-amd-low-lat.patch Allocate the memory, adjust the default address space, and hook up the allocator. They will need to be

Re: [og12] Attempt to register OpenMP pinned memory using a device instead of 'mlock' (was: [PATCH] libgomp, openmp: pinned memory)

2023-02-20 Thread Andrew Stubbs
On 17/02/2023 08:12, Thomas Schwinge wrote: Hi Andrew! On 2023-02-16T23:06:44+0100, I wrote: On 2023-02-16T16:17:32+, "Stubbs, Andrew via Gcc-patches" wrote: The mmap implementation was not optimized for a lot of small allocations, and I can't see that issue changin

Re: [og12] Un-break nvptx libgomp build (was: [OG12][committed] amdgcn: OpenMP low-latency allocator)

2023-02-20 Thread Andrew Stubbs
ttached. Oops, thanks Thomas. Andrew

Re: [PATCH 3/3] vect: inbranch SIMD clones

2023-02-23 Thread Andrew Stubbs
ay, if you want, commit the patch as is and tweak the testcases if possible incrementally. I will do so now. It would be nice to fix the testcase oddities, but I don't know how. I wrote the above yesterday, but apparently the email didn't send ... since then some bugs have been reported. I'll try to investigate today, although I think Richi has a fix already. Thanks Andrew

[committed][OG12] libgomp: no need to attach USM pointers

2023-02-23 Thread Andrew Stubbs
This patch fixes a bug in which libgomp doesn't know what to do with attached pointers in fortran derived types when using Unified Shared Memory instead of explicit mappings. I've committed it to the devel/omp/gcc-12 branch (OG12) and will fold it into the next rebase/repost of the USM patches

Re: [PATCH] amdgcn: Enable SIMD vectorization of math functions

2023-03-01 Thread Andrew Stubbs
in our test environment (and anyone else using remote), but the libgomp test should make up for that. Andrew

Re: [PATCH] amdgcn: Enable SIMD vectorization of math functions

2023-03-01 Thread Andrew Stubbs
On 01/03/2023 10:52, Andre Vieira (lists) wrote: On 01/03/2023 10:01, Andrew Stubbs wrote: > On 28/02/2023 23:01, Kwok Cheung Yeung wrote: >> Hello >> >> This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION >> target hook for the AMD GCN a

Re: [PATCH] amdgcn: Add instruction patterns for conditional min/max operations

2023-03-02 Thread Andrew Stubbs
rtx_GTU (VOIDmode, 0, 0), operands[1], operands[2])); + Long lines need to be wrapped, here and elsewhere. Andrew

Re: [PATCH] amdgcn: Enable SIMD vectorization of math functions

2023-03-02 Thread Andrew Stubbs
On 02/03/2023 15:07, Kwok Cheung Yeung wrote: Hello I've made the suggested changes. Should I hold off on committing this until GCC 13 has been branched off? No need, amdgcn is not a primary target and this stuff won't affect anyone else. Please go ahead and commit. Andrew

Re: [PATCH] amdgcn: Add instruction patterns for conditional min/max operations

2023-03-06 Thread Andrew Stubbs
On 03/03/2023 17:05, Paul-Antoine Arras wrote: Le 02/03/2023 à 18:18, Andrew Stubbs a écrit : On 01/03/2023 16:56, Paul-Antoine Arras wrote: This patch introduces instruction patterns for conditional min and max operations (cond_{f|s|u}{max|min}) in the GCN machine description. It also allows

Re: [Patch] GCN update for wwwdocs / libgomp.texi

2023-03-08 Thread Andrew Stubbs
On 08/03/2023 11:06, Tobias Burnus wrote: Next try – this time with both patches. On 08.03.23 12:05, Tobias Burnus wrote: Hi Andrew, attached are two patches related to GCN, one for libgomp.texi documenting an env var and a release-notes update in www docs. OK? Comments? LGTM Andrew

Re: [Patch] gcn/mkoffload.cc: Pass -save-temps on for the hsaco step

2023-03-13 Thread Andrew Stubbs
On 13/03/2023 12:25, Tobias Burnus wrote: Found when comparing '-v -Wl,-v' output as despite -save-temps multiple runs yielded differed results. Fixed as attached. OK for mainline? OK. Andrew

[RFC] DWARF address spaces for local variables

2021-01-22 Thread Andrew Stubbs
he symbol being relocated. (This port does not use DTPrel for anything else.) How should I implement this feature to make it acceptable to commit? Thanks very much Andrew DWARF address spaces for local variables diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c index f0e4636c06a..0c601bf2d0a 1

Re: [RFC] DWARF address spaces for local variables

2021-01-22 Thread Andrew Stubbs
On 22/01/2021 11:42, Andrew Stubbs wrote: @@ -20294,15 +20315,6 @@ add_location_or_const_value_attribute (dw_die_ref die, tree decl, bool cache_p) if (list) { add_AT_location_description (die, DW_AT_location, list); - - addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (decl

[committed] amdgcn: Allow V64DFmode min/max reductions

2021-01-26 Thread Andrew Stubbs
This patch fixes an AMD GCN bug in which attempting to use DFmode vector reductions would cause an ICE. There's no reason not to allow the reductions, so we simply enable them thusly. Andrew amdgcn: Allow V64DFmode min/max reductions I don't know why these were disabled. There

[OG10][committed] amdgcn: Allow V64DFmode min/max reductions

2021-01-26 Thread Andrew Stubbs
Now backported to devel/omp/gcc-10. On 26/01/2021 10:29, Andrew Stubbs wrote: This patch fixes an AMD GCN bug in which attempting to use DFmode vector reductions would cause an ICE. There's no reason not to allow the reductions, so we simply enable them thusly. Andrew

[committed] amdgcn: Add gfx908 support

2021-02-03 Thread Andrew Stubbs
available yet, and don't have a public name yet, but with this we will be ready when they do. Andrew amdgcn: Add gfx908 support gcc/ * config/gcn/gcn-opts.h (enum processor_type): Add PROCESSOR_GFX908. * config/gcn/gcn.c (gcn_omp_device_kind_arch_isa): Add gfx908. (output_file_start)

Re: [RFC] DWARF address spaces for local variables

2021-02-04 Thread Andrew Stubbs
Ping. On 22/01/2021 11:42, Andrew Stubbs wrote: Hi all, Jakub, I need to implement DWARF for local variables that exist in an alternative address space. This happens for OpenACC gang-private variables (or will when the patches are committed) on AMD GCN, at least. This is distinct from

[commit][OG10] nvptx: remove erroneous stack deletion

2021-03-02 Thread Andrew Stubbs
data allocation fails, in the hope that memory can be reallocated more efficiently, but there's an additional, unconditional deallocate that looks like it may have been vestigial debug code, or something. Fixing the issue gives a 3x speed-up running the BabelStream benchmark. Andrew nvptx: remove

[committed][OG10] DWARF: late code range fixup

2021-03-06 Thread Andrew Stubbs
This patch fixes up the DWARF code ranges for offload debugging, again. This time it defers the changes until most other DWARF generation has occurred, because the previous method was causing ICEs on some testcases. This patch will be proposed for mainline in stage 1. Andrew DWARF: late

[committed][OG10] amdgcn: Fix early-debug relocations

2021-03-06 Thread Andrew Stubbs
This patch is now backported to devel/omp/gcc-10. Andrew On 26/11/2020 14:41, Andrew Stubbs wrote: This patch fixes an error in GCN mkoffload that corrupted relocations in the early-debug info. The code now updates the relocation code without zeroing the symbol index. Andrew

[OG11, committed] libgomp amdgcn: Fix issues with dynamic OpenMP thread scaling

2021-08-04 Thread Andrew Stubbs
increased resource usage. Committed to devel/omp/gcc-11. @ Thomas, this should probably be folded into another patch when upstreaming OG11 to mainline. Andrew libgomp amdgcn: Fix issues with dynamic OpenMP thread scaling libgomp/ChangeLog: * config/gcn/bar.h (gomp_barrier_init): Limit

RE: [wwwdocs] gcc-12/changes.html (GCN): >1 workers per gang

2021-08-16 Thread Stubbs, Andrew
ainline until the multiple-worker support is merged > there" > > @Andrew + @Julian: Do you intent to commit it relatively soon? > Regarding the wwwdocs patch, I can hold off until that commit or reword > it to only cover the workers part. Were these not part of the patch set Thomas was working on? Andrew

[PATCH] gdb: Add a dependency between gdb and libbacktrace

2021-08-30 Thread Andrew Burgess
I plan to make use of libbacktrace within GDB. I believe that the patch below needs to be merged into GCC's toplevel directory and then back-ported to the binutils-gdb repository. Is this OK to merge? Thanks, Andrew --- GDB is going to start using libbacktrace, so add a build dependency

[committed] amdgcn: Remove omp_gcn pass

2020-09-18 Thread Andrew Stubbs
to detect if nesting is present, and I don't have a lot of time to fix this issue, I'm just going to let it go, for now. Committed to master. I'll backport it to GCC 10 and OG10 shortly. Andrew amdgcn: Remove omp_gcn pass This pass only had an optimization for obtaining t

[PATCH] amdgcn, nvptx: Disable OMP barriers in nested teams

2020-09-18 Thread Andrew Stubbs
xit then the barriers for the outer region will sync everything up again. OK to commit? Andrew P.S. I can approve the amdgcn portion myself; I'm seeking approval for the nvptx portion. libgomp: disable barriers in nested teams Both GCN and NVPTX allow nested parallel regions, but the barri

Re: [PATCH] amdgcn, nvptx: Disable OMP barriers in nested teams

2020-09-19 Thread Andrew Stubbs
On 18/09/2020 12:25, Andrew Stubbs wrote: This patch fixes a problem in which nested OpenMP parallel regions cause errors if the number of inner teams is not balanced (i.e. the number of loop iterations is not divisible by the number of physical threads). A testcase is included. This updated

Re: [PATCH] dwarf: Multi-register CFI address support

2020-09-21 Thread Andrew Stubbs
Ping. On 03/09/2020 16:29, Andrew Stubbs wrote: On 28/08/2020 13:04, Andrew Stubbs wrote: Hi all, This patch introduces DWARF CFI support for architectures that require multiple registers to hold pointers, such as the stack pointer, frame pointer, and return address. The motivating case is

[committed, OG10] dwarf: Multi-register CFI address support

2020-09-22 Thread Andrew Stubbs
On 03/09/2020 16:29, Andrew Stubbs wrote: OK to commit? (Although, I'll hold off until AMD release the compatible GDB.) The ROCm 3.8 ROCGDB is now released. I'm committing the attached patches to devel/omp/gcc-10 while I wait for review. The first patch is the multi-register C

Re: [PATCH] OpenACC: Separate enter/exit data APIs

2020-09-25 Thread Andrew Stubbs
On 30/07/2020 12:10, Andrew Stubbs wrote: On 29/07/2020 15:05, Andrew Stubbs wrote: This patch does not implement anything new, but simply separates OpenACC 'enter data' and 'exit data' into two libgomp API functions. The original API name is kept for backward compatibi

Re: [PATCH] amdgcn, nvptx: Disable OMP barriers in nested teams

2020-09-28 Thread Andrew Stubbs
d a comment there referring to the code this patch adds. /* Accelerators with fixed thread counts require this to return 1 for nested parallel regions. */ WDYT? Andrew

[PATCH] libgomp: Enforce 1-thread limit in subteams

2020-09-29 Thread Andrew Stubbs
it wasn't completely safe. This patch ensures that the previous assumption is safe, by ignoring the relevant ICV on NVPTX and AMD GCN, neither of which can support it. OK to commit? Andrew libgomp: Enforce 1-thread limit in subteams Accelerators with fixed thread-counts will break if n

Re: [PATCH] dwarf: Multi-register CFI address support

2020-10-05 Thread Andrew Stubbs
Ping. On 21/09/2020 14:51, Andrew Stubbs wrote: Ping. On 03/09/2020 16:29, Andrew Stubbs wrote: On 28/08/2020 13:04, Andrew Stubbs wrote: Hi all, This patch introduces DWARF CFI support for architectures that require multiple registers to hold pointers, such as the stack pointer, frame

[committed] amdgcn: Use scalar instructions for addptrdi3

2020-10-07 Thread Andrew Stubbs
register. It was only the way it was because vector instructions can specify a custom register to clobber. Hopefully this will help prevent unnecessary register moves for address calculations. Andrew amdgcn: Use scalar instructions for addptrdi3 Allow addptr to use SGPRs as well as VGPRs for

Re: [OG12][PATCH] openmp: Fix handling of target constructs in static member

2022-09-13 Thread Andrew Stubbs
On 13/09/2022 12:03, Paul-Antoine Arras wrote: Hello, This patch intends to backport e90af965e5c by Jakub Jelinek to devel/omp/gcc-12. The original patch was described here: https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601189.html I've merged and committed it for you. Andrew

[PATCH] vect: while_ult for integer mask

2022-09-28 Thread Andrew Stubbs
This patch is a prerequisite for some amdgcn patches I'm working on to support shorter vector lengths (having fixed 64 lanes tends to miss optimizations, and masking is not supported everywhere yet). The problem is that, unlike AArch64, I'm not using different mask modes for different sized ve

Re: [PATCH] vect: while_ult for integer mask

2022-09-29 Thread Andrew Stubbs
On 29/09/2022 08:52, Richard Biener wrote: On Wed, Sep 28, 2022 at 5:06 PM Andrew Stubbs wrote: This patch is a prerequisite for some amdgcn patches I'm working on to support shorter vector lengths (having fixed 64 lanes tends to miss optimizations, and masking is not supported everywher

Re: [PATCH] vect: while_ult for integer mask

2022-09-29 Thread Andrew Stubbs
s different because their config register has an explicit length field whereas GCN just uses a mask to limit the length (more like AArch64, I think). The RVV solution uses different logic in the gimple IR; this proposal is indistinguishable from the status quo at that point. Andrew
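
As a rough illustration of the "mask to limit the length" idea discussed in this thread, the sketch below computes a 64-bit integer mask in which bit i is set when start + i < end, which is the usual reading of a while_ult operation. It assumes a fixed 64 lanes with one mask bit per lane; the name `while_ult_mask` and the `LANES` constant are hypothetical and are not taken from the patch.

```c
#include <stdint.h>

/* Assumption: 64 lanes, one mask bit per lane (illustrative only). */
#define LANES 64

/* Hypothetical helper, not from the patch: returns an integer mask
   in which bit i is set when start + i < end.  */
static uint64_t
while_ult_mask (uint64_t start, uint64_t end)
{
  if (start >= end)
    return 0;                   /* No lanes are active.  */

  uint64_t remaining = end - start;
  if (remaining >= LANES)
    return UINT64_MAX;          /* All 64 lanes are active.  */

  /* Set the low 'remaining' bits; the shift count is below 64 here.  */
  return (UINT64_C (1) << remaining) - 1;
}
```

For example, `while_ult_mask (0, 16)` yields 0xffff, so only the first 16 lanes would be enabled on the final, partial iteration of a loop.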

Re: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling

2022-09-29 Thread Andrew Stubbs
Why has prog_finalized been moved? Andrew did suggest a while back to piggyback on the console_output handling, avoiding another atomic access. - If this is still wanted, I like to have some guidance regarding how to actually implement it. The console output ring buffer has the following type: struc

[committed] amdgcn: remove unused variable

2022-09-29 Thread Andrew Stubbs
I've committed this small clean up. It silences a warning. Andrew amdgcn: remove unused variable This was left over from a previous version of the SIMD clone patch. gcc/ChangeLog: * config/gcn/gcn.cc (gcn_simd_clone_compute_vecsize_and_simdlen): Remove unused elt_bits variable.
