I've committed this patch for amdgcn.
This changes the procedure calling ABI such that vector arguments are
passed in vector registers, rather than on the stack as before.
The ABI for scalar functions is the same for arguments, but the return
value has now moved to a vector register; keeping
ure has
backend support for the clones at this time.
OK for mainline (patches 1 & 3)?
Thanks
Andrew
Andrew Stubbs (3):
omp-simd-clone: Allow fixed-lane vectors
amdgcn: OpenMP SIMD routine support
vect: inbranch SIMD clones
gcc/config/gcn/gcn.cc | 63
gcc
The vecsize_int/vecsize_float scheme assumes that all arguments use
the same bitsize and varies the number of lanes according to the element size,
but this is inappropriate on targets where the number of lanes is fixed and
the bitsize varies (i.e. amdgcn).
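A worked illustration of the difference, assuming a 512-bit vector for the fixed-width case (an arbitrary choice) and 64 lanes for amdgcn. With a fixed width, int arguments get 16 lanes and double arguments get 8; with 64 fixed lanes, the same arguments need 2048-bit and 4096-bit vectors respectively. The helper names are hypothetical, not GCC functions.

```c
/* Hypothetical helpers contrasting the two ways of sizing a SIMD clone. */

/* Fixed vector width: the lane count shrinks as the element size grows. */
unsigned lanes_for_fixed_width(unsigned vec_bits, unsigned elt_bits)
{
    return vec_bits / elt_bits;
}

/* Fixed lane count (as on amdgcn): the vector width grows with the
   element size instead. */
unsigned width_for_fixed_lanes(unsigned lanes, unsigned elt_bits)
{
    return lanes * elt_bits;
}
```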
With this change the vecsize can
Enable and configure SIMD clones for amdgcn. This affects both the __simd__
function attribute, and the OpenMP "declare simd" directive.
Note that the masked SIMD variants are generated, but the middle end doesn't
actually support calling them yet.
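For reference, a minimal C sketch of code that requests SIMD clones through the OpenMP "declare simd" directive; the function names are made up. The `inbranch` clause asks for the masked variant, which is what a conditional call inside a vectorized loop would need. Whether GCC actually vectorizes the call depends on the middle-end support discussed here. Without -fopenmp or -fopenmp-simd the pragma is ignored and this is ordinary scalar C.

```c
/* Request a masked ("inbranch") SIMD clone of this function.
   Ignored unless compiled with -fopenmp or -fopenmp-simd. */
#pragma omp declare simd inbranch
float scale(float x, float k)
{
    return x * k;
}

/* The call only happens on some iterations, so a vectorized version
   of this loop would need the masked clone. */
void scale_positive(float *a, int n, float k)
{
    for (int i = 0; i < n; i++)
        if (a[i] > 0.0f)
            a[i] = scale(a[i], k);
}
```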
gcc/ChangeLog:
* config/gcn/gcn.cc (g
There has been support for generating "inbranch" SIMD clones for a long time,
but nothing actually uses them (as far as I can see).
This patch adds support for a subset of possible cases (those using
mask_mode == VOIDmode). The other cases fail to vectorize, just as before,
so there should be n
The NVPTX low latency memory is not accessible outside the team that allocates
it, and therefore should be unavailable for allocators with the access trait
"all". This change means that the omp_low_lat_mem_alloc predefined
allocator now implicitly implies the "pteam" trait.
libgomp/ChangeLog:
ation so both
architectures can share the code.
Andrew
Andrew Stubbs (3):
libgomp, nvptx: low-latency memory allocator
openmp, nvptx: low-lat memory access traits
amdgcn, libgomp: low-latency allocator
gcc/config/gcn/gcn-builtins.def | 2 +
gcc/config/gc
This patch adds support for allocating low-latency ".shared" memory on
NVPTX GPU devices, via omp_low_lat_mem_space and omp_alloc. The memory
can be allocated, reallocated, and freed using a basic but fast algorithm,
is thread safe and the size of the low-latency heap can be configured using
t
This implements the OpenMP low-latency memory allocator for AMD GCN using the
small per-team LDS memory (Local Data Store).
Since addresses can now refer to LDS space, the "Global" address space is
no longer compatible. This patch therefore switches the backend to use
entirely "Flat" addressing
There were implementations for HImode division in libgcc, but there were
no matching libfuncs defined in the compiler, so the code was inactive
(GCC only defines SImode and DImode, by default, and amdgcn only adds
TImode explicitly).
On trying to activate it I find that the definition of
TARG
This patch adds just enough TImode vector support to use them for moving
data about. This is primarily for the use of divmodv64di4, which will
use TImode to return a pair of DImode values.
The TImode vectors have no other operators defined, and there are no
hardware instructions to support thi
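A scalar C analogue of returning a pair of DImode values in one TImode value, using GCC's unsigned __int128 extension (available on 64-bit hosts). It is only an illustration: the real patterns move TImode vectors, and placing the quotient in the low half is an arbitrary choice here, not necessarily the amdgcn layout.

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* Pack the quotient (low 64 bits) and remainder (high 64 bits) of a / b
   into a single 128-bit value. b must be nonzero. */
u128 pack_divmod(int64_t a, int64_t b)
{
    uint64_t q = (uint64_t)(a / b);
    uint64_t r = (uint64_t)(a % b);
    return ((u128)r << 64) | q;
}

/* Extract the two 64-bit halves again. */
int64_t unpack_quot(u128 v) { return (int64_t)(uint64_t)v; }
int64_t unpack_rem(u128 v)  { return (int64_t)(uint64_t)(v >> 64); }
```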
On Mon, Jul 12, 2021 at 4:47 AM Richard Biener via Gcc-patches
wrote:
>
> On Sun, Jul 11, 2021 at 4:12 AM apinski--- via Gcc-patches
> wrote:
> >
> > From: Andrew Pinski
> >
> > This patch moves the (a-b) CMP 0 ? (a-b) : (b-a) optimization
> > from
I've committed this patch that allows building binaries for AMD gfx1030
GPUs. I can't actually test it, however, so somebody else will have to
debug it (or wait for me to get my hands on a device). Richi reports
that it does not execute correctly, as is.
This is an experimental broken feature,
On 19/10/2023 11:07, Tobias Burnus wrote:
On 19.10.23 11:49, Andrew Stubbs wrote:
OK to commit?
(I think as maintainer you don't need approval - but of course comments
by others can be helpful; I hope mine are. Additionally, Gerald (CCed)
helps with keeping the webpages in good shape (t
This patch fixes a wrong-code bug on amdgcn in which the excess "ones"
in the mask enable extra lanes that were supposed to be unused and are
therefore undefined.
Richi suggested an alternative approach involving narrower types and
then a zero-extend to the actual mask type. This solved the p
or constexpr and not modify it not to include trees it does not use.
In this case NON_DEPENDENT_EXPR was removed and now the rust front-end
is broken.
Thanks,
Andrew
>
> gcc/cp/ChangeLog:
>
> * call.cc (build_new_method_call): Remove calls to
> buil
convert_to_complex, when creating a COMPLEX_EXPR, does
not currently check whether either the real or imaginary part
is error_mark_node. This later confuses the gimplifier
when there is a SAVE_EXPR wrapped around that COMPLEX_EXPR.
The simple fix is after calling convert inside convert_to_complex_1,
On Thu, Oct 19, 2023 at 10:13 PM Andrew Pinski wrote:
>
> On Mon, Jul 12, 2021 at 4:47 AM Richard Biener via Gcc-patches
> wrote:
> >
> > On Sun, Jul 11, 2021 at 4:12 AM apinski--- via Gcc-patches
> > wrote:
> > >
> > > From: Andrew Pinski
> &
From: Andrew Pinski
This patch moves the `(a-b) CMP 0 ? (a-b) : (b-a)` optimization
from fold_cond_expr_with_comparison to match.
Bootstrapped and tested on x86_64-linux-gnu.
Changes in:
v2: Removes `(a == b) ? 0 : (b - a)` handling since it was handled
via r14-3606-g3d86e7f4a8ae
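In source terms, the pattern (taking CMP as `>`) is an absolute difference. A minimal C check of that equivalence, assuming a - b does not overflow; long long is used to leave headroom. Function names are illustrative.

```c
#include <stdlib.h>

/* The conditional form handled by the pattern, with CMP being >. */
long long absdiff_select(long long a, long long b)
{
    return (a - b) > 0 ? (a - b) : (b - a);
}

/* The equivalent absolute-value form. */
long long absdiff_abs(long long a, long long b)
{
    return llabs(a - b);
}
```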
a lib-fputs.c file which will
define fputs_unlocked, so that the test links even if the libc does
not define fputs_unlocked.
Thanks,
Andrew Pinski
>
> gcc/testsuite/
>
> * gcc.c-torture/execute/builtins/fputs.c (_GNU_SOURCE):
> Define.
>
> ---
> gcc/te
While working on PR c/111903, I noticed that
convert will convert integer_zero_node to that
type after an error instead of returning error_mark_node.
From what I can tell, this was the old way of not having
error recovery, since other places in this file do return
error_mark_node and the places I
After r14-3110-g7fb65f10285, the canonical form of
`a ? ~b : b` changed to `-(a) ^ b`, which means
that for aarch64 we need to add a few new insn patterns
to catch this and turn it into
the canonical form for the aarch64 backend.
A secondary pattern was needed to support a zero_e
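A quick C check of the identity behind the new canonical form, assuming `a` is a 0/1 truth value (as it is when it comes from a comparison): negating 1 gives all-ones, so XOR with it yields `~b`, while XOR with 0 leaves `b`. Unsigned types keep the negation well defined; the function names are illustrative only.

```c
#include <stdint.h>

/* Original form; a is assumed to be 0 or 1. */
uint32_t select_not(uint32_t a, uint32_t b)
{
    return a ? ~b : b;
}

/* Canonical form: -(a) is all-ones when a == 1 and zero when a == 0. */
uint32_t neg_xor(uint32_t a, uint32_t b)
{
    return -a ^ b;
}
```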
So this pattern needs a little help on the gimple side of things to know what
the type of popcount should be. For most builtins, the type is the same as the
input, but for popcount and some others it is not. And when using it with
another outer expression, genmatch needs some slight help to know that the return
issue with any correct code that GCC will process.
GCC does not consider this a security issue according to its security policy.
See the "Security features implemented in GCC" section of
https://gcc.gnu.org/git/?p=gcc.git;a=blob_plain;f=SECURITY.txt;hb=HEAD
for more information on that policy.
Thanks,
Andrew Pinski
>
> Best regards,
In the case of a NOP conversion (precisions of the 2 types are equal),
factoring out the conversion can be done even if int_fits_type_p returns
false and even when the conversion is defined by a statement inside the
conditional. Since it is a NOP conversion there is no zero/sign extending
happening
This adds a match pattern for `a != C1 ? abs(a) : C2` which gets simplified
to `abs(a)`. If C1 was originally *_MIN, then change it over to use absu instead
of abs.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
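When a == C1 the select yields C2, so it can only be replaced by abs(a) when C2 == abs(C1); the sketch below assumes that pairing (C1 = -5, C2 = 5). It also shows why INT_MIN needs an unsigned absolute value, since abs(INT_MIN) overflows in int. `absu_int` is a hypothetical helper standing in for GCC's absu, not a library function.

```c
#include <limits.h>
#include <stdlib.h>

/* C1 = -5, C2 = 5: both arms agree when a == -5. */
int cond_abs(int a)  { return a != -5 ? abs(a) : 5; }
int plain_abs(int a) { return abs(a); }

/* Hypothetical unsigned absolute value; defined even for INT_MIN. */
unsigned absu_int(int a)
{
    return a < 0 ? 0u - (unsigned)a : (unsigned)a;
}

/* C1 = INT_MIN: the C2 arm must be the unsigned magnitude of INT_MIN. */
unsigned cond_abs_min(int a)
{
    return a != INT_MIN ? (unsigned)abs(a) : (unsigned)INT_MAX + 1u;
}
```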
PR tree-optimization/111957
gcc/ChangeLog:
* match
I noticed we were missing optimizing `a / (1 << b)` when
we know that a is nonnegative but only due to ranger information.
This adds the use of the global ranger to tree_single_nonnegative_warnv_p
for SSA_NAME.
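For non-negative `a`, dividing by `1 << b` is the same as shifting right by `b`, which is the rewrite that knowing `a >= 0` makes possible. A small C illustration with made-up names; the negative case shows why the sign information matters, relying on GCC's arithmetic right shift for signed values (implementation-defined in ISO C).

```c
/* Division truncates toward zero. Requires 0 <= b < 31. */
int div_pow2(int a, int b)
{
    return a / (1 << b);
}

/* Arithmetic shift rounds toward negative infinity for negative a. */
int shift_pow2(int a, int b)
{
    return a >> b;
}
```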
I didn't extend tree_single_nonnegative_warnv_p to use the ranger for floating
point nor
. The result is a
2.1% speedup in VRP and a 0.8% speedup in threading, with an overall
compile time improvement of 0.14% across the GCC build.
Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed.
Andrew
commit f7dbf6230453c76a19921607601eff968bb70169
Author: Andrew MacLeod
Date
On Thu, Oct 26, 2023 at 2:24 AM Richard Biener
wrote:
>
> On Wed, Oct 25, 2023 at 5:37 AM Andrew Pinski wrote:
> >
> > This adds a match pattern for `a != C1 ? abs(a) : C2` which gets simplified
> > to `abs(a)`. if C1 was originally *_MIN then change it over to use absu
On Thu, Oct 26, 2023 at 2:29 AM Richard Biener
wrote:
>
> On Wed, Oct 25, 2023 at 5:51 AM Andrew Pinski wrote:
> >
> > I noticed we were missing optimizing `a / (1 << b)` when
> > we know that a is nonnegative but only due to ranger information.
> > This
d modern gettext
> >
>
> Ping on this patch series.
One comment from me. It would be nice to update install.texi in
gcc/doc/ to make a mention of this requirement for non-glibc hosts.
Thanks,
Andrew Pinski
>
> TIA, have a lovely night :-)
> --
> Arsen Arsenović
but that should stabilize. OK for trunk?
This fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93619 I think.
Thanks,
Andrew
> thanks
> Iain
>
> --- 8< ---
>
> When the compiler is configured --with-cpu= and that is different from
> the baselines assumed, we see excess te
always feasible because it requires the whole program and any used
> libraries to also be built with it (as it breaks ABI).
One suggestion: also link to the libstdc++ manual on debug mode:
https://gcc.gnu.org/onlinedocs/libstdc++/manual/debug_mode.html
Thanks,
Andrew
>
> S
From: Andrew Pinski
I noticed we were missing these simplifications so let's add them.
This adds the following simplifications:
U & N <= U -> true
U & N > U -> false
When U is known to be non-negative.
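A brute-force sanity check of the first two simplifications, assuming a 32-bit two's-complement int: for non-negative U, U & N can only clear bits of U, so it never exceeds U, whatever N is. The helper name and the bounds are arbitrary.

```c
/* Returns 1 if, for every U in [0, limit] and every N in [-limit, limit],
   U & N <= U holds (so U & N > U never does). */
int and_le_holds(int limit)
{
    for (int u = 0; u <= limit; u++)
        for (int n = -limit; n <= limit; n++)
            if ((u & n) > u)
                return 0;
    return 1;
}
```

With a negative U the claim fails: for example, -2 & 1 is 0, which is greater than -2.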
When N is also known to be non-negative, this is also true:
U
On Thu, Oct 26, 2023 at 11:56 PM Richard Biener
wrote:
>
>
>
> > Am 26.10.2023 um 23:10 schrieb Andrew Pinski :
> >
> > From: Andrew Pinski
> >
> > I noticed we were missing these simplifications so let's add them.
> >
> > This adds the
On 22/10/2023 13:24, Gerald Pfeifer wrote:
Hi Andrew,
On Fri, 20 Oct 2023, Andrew Stubbs wrote:
Additionally, I wonder whether "Fiji" should be changed to "Fiji
(gfx803)" in the first line and whether the "," should be removed in
"The ... configuration .
This trivial patch adds the "operands" keyword to the condition in a
couple of patterns that cause warnings about "missing" mode specifiers.
With the iterators, there were a large number of warnings about these
cases that have now been silenced.
Andrew

amdgcn: silence warnings
The operands re
On 20/10/2023 12:51, Andrew Stubbs wrote:
I've committed this patch that allows building binaries for AMD gfx1030
GPUs. I can't actually test it, however, so somebody else will have to
debug it (or wait for me to get my hands on a device). Richi reports
that it does not execute cor
On Fri, Oct 27, 2023 at 6:55 AM Jeff Law wrote:
>
>
>
> On 10/27/23 01:49, Robin Dapp wrote:
> >> @@ -346,7 +346,7 @@ static const struct riscv_tune_param rocket_tune_info
> >> = {
> >> {COSTS_N_INSNS (4), COSTS_N_INSNS (5)}, /* fp_mul */
> >> {COSTS_N_INSNS (20), COSTS_N_INSNS (20)},
On Fri, Oct 27, 2023 at 6:44 AM Jeff Law wrote:
>
>
>
> On 10/27/23 01:37, juzhe.zh...@rivai.ai wrote:
> > LGTM from my side.
> >
> > The original integer division COST seems too low.
> Almost certainly, though there may be good reasons why it was initially
> set so low. I'm generally hesitant to
k which seems like a good place to
put a test for both formats so when someone changes the function, they
could run that testsuite to make sure it is still working for the
other format.
(Note I am not saying you should add it as part of this patch but it
seems like that would be the perfect place for it.)
Thanks,
Andrew
>
> Anyway, to make progress, is the revised version OK for trunk? (tested on
> aarch64-linux and aarch64-darwin).
> thanks
> Iain
>
>
>
generation independently of that move.
Note this does not add the absorbing_element_p optimizations yet; I filed PR 112271
to record that move.
Andrew Pinski (3):
MATCH: first of the value replacement moving from phiopt
MATCH: Move jump_function_from_stmt support to match.pd
MATCH: Add some more
This moves the value_replacement support for jump_function_from_stmt
to match pattern.
This allows us to optimize things earlier in phiopt1 rather than waiting
until phiopt2, which means phiopt1 needs to be disabled for the vrp03.c testcase.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
This moves a few more value_replacement simplifications to match.
/* a == 1 ? b : a * b -> a * b */
/* a == 1 ? b : b / a -> b / a */
/* a == -1 ? b : a & b -> a & b */
Also adds a testcase to show that we can catch these where value_replacement
would not (but other passes would).
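Each of these selects is redundant because, when `a` has the value being tested for, the other arm already evaluates to `b` (1 * b, b / 1 and -1 & b are all b). A small C check of that reasoning, with illustrative function names:

```c
/* a == 1 ? b : a * b  ->  a * b */
int mul_sel(int a, int b) { return a == 1 ? b : a * b; }

/* a == 1 ? b : b / a  ->  b / a  (a must be nonzero) */
int div_sel(int a, int b) { return a == 1 ? b : b / a; }

/* a == -1 ? b : a & b  ->  a & b  (-1 has all bits set) */
int and_sel(int a, int b) { return a == -1 ? b : a & b; }
```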
Bootstrapped and
This moves a few simple patterns that are done in value replacement
in phiopt over to match.pd. Just the simple ones which might show up
in other code.
This allows some optimizations to happen even without depending
on sinking having happened, and in some cases where phiopt is not
invoked (cond-1.c
On Mon, Oct 30, 2023 at 5:05 AM Iain Sandoe wrote:
>
>
>
> > On 30 Oct 2023, at 11:53, FX Coudert wrote:
>
> > The newly introduced test gcc.target/i386/pr111698.c currently fails on
> > Darwin, where the default arch is core2.
> > Andrew suggested in https
On Mon, Oct 30, 2023 at 2:29 AM Richard Biener
wrote:
>
> On Sun, Oct 29, 2023 at 5:41 PM Andrew Pinski wrote:
> >
> > This moves the value_replacement support for jump_function_from_stmt
> > to match pattern.
> > This allows us to optimize things earlier in phio
),
> +num_ops (6)
> +{
> + ops[0] = op0;
> + ops[1] = op1;
> + ops[2] = op2;
> + ops[3] = op3;
> + ops[4] = op4;
> + ops[5] = op5;
> +}
Hmm, does it make sense to start to use variadic templates for these
constructors instead of writing them out?
And
On Tue, Oct 31, 2023 at 12:08 AM Lehua Ding wrote:
>
> Hi Andrew,
>
> On 2023/10/31 14:48, Andrew Pinski wrote:
> >> +inline
> >> +gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
> >> +
(_40 != 6.4e+1) // not working
It is test_epi32_ps which is failing with TEST_PS macro and the plus
operand that uses TESTOP:
TESTOP (add, +, float, ps, 0.0f); \
I have not reduced the testcase any further though.
Thanks,
Andrew Pinski
>
> Regards
&
On Thu, Oct 26, 2023 at 07:41:09PM +0100, Richard Sandiford wrote:
> Andrew Carlotti writes:
> > This patch adds support for the "target_version" attribute to the middle
> > end and the C++ frontend, which will be used to implement function
> > multiversioning in th
those bits with those bits from the value field.
Bootstraps on build-x86_64-pc-linux-gnu with no regressions. Pushed.
Andrew
From b20f1dce46fb8bb1b142e9087530e546a40edec8 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod
Date: Tue, 31 Oct 2023 11:51:34 -0400
Subject: [PATCH 1/2] Remove simple range
] [2, +INF] MASK 0xfffe VALUE 0x1
will indicate that any even constants will be false.
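A small sketch of how such a bitmask rules out constants, assuming the usual known-bits convention (a set MASK bit means the bit is unknown, and VALUE holds the bits that are known) and a 16-bit type to match the 0xfffe mask. With MASK 0xfffe and VALUE 0x1 only bit 0 is known, and it is 1, so the value is odd and cannot equal an even constant. `may_equal` is a hypothetical helper, not GCC's API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Could a value with the given known bits equal constant c?
   Only the known bits (mask == 0) have to match. */
bool may_equal(uint16_t mask, uint16_t value, uint16_t c)
{
    return (uint16_t)((c ^ value) & ~mask) == 0;
}
```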
Bootstraps on x86_64-pc-linux-gnu with no regressions. Pushed.
Andrew
From eb899fee35b8326b2105c04f58fd58bbdeca9d3b Mon Sep 17 00:00:00 2001
From: Andrew MacLeod
Date: Wed, 25 Oct 2023 09:46:50 -0400
Sub
On Sun, Nov 5, 2023 at 9:13 AM Cassio Neri wrote:
>
> I could not find any entry in gcc's bugzilla for that. Perhaps my search
> wasn't good enough.
I filed https://gcc.gnu.org/PR112395 with a first attempt at the patch
(will double check it soon).
Thanks,
Andrew
>
>
t least your patch should have come with a testcase (or two).
Is there a bugreport tracking this issue? It should affect GCN as well
I guess.
What does "non-constant simdclones" mean? I'm not sure if this is a
thing that can happen on GCN, or not?
Andrew
As requested, porting this patch from trunk resolves this PR in GCC 13.
Bootstraps on x86_64-pc-linux-gnu with no regressions. OK for the gcc
13 branch?
Andrew
From 0182a25607fa353274c27ec57ca497c00f1d1b76 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod
Date: Mon, 6 Nov 2023 11:33:32 -0500
ause the middle-end and expand pass give no
assistance with that for vectors (unlike scalars).
Andrew
On 13/02/2023 08:07, Richard Biener via Gcc-patches wrote:
On Sat, 11 Feb 2023, juzhe.zh...@rivai.ai wrote:
Thanks for contributing this.
Hi, Richard. Can you help us with this issue?
In RVV
On 13/02/2023 14:38, Thomas Schwinge wrote:
Hi!
On 2022-03-08T11:30:55+, Hafiz Abid Qadeer wrote:
From: Andrew Stubbs
Add a new option. It will be used in follow-up patches.
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
+@option{-foffload-memory=pinned} forces all host
On 09/02/2023 20:13, Andrew Jenner wrote:
This patch introduces instruction patterns for complex number operations
in the GCN machine description. These patterns are cmul, cmul_conj,
vec_addsub, vec_fmaddsub, vec_fmsubadd, cadd90, cadd270, cmla and cmls
(cmla_conj and cmls_conj were not found
On 14/02/2023 12:54, Thomas Schwinge wrote:
Hi Andrew!
On 2022-01-13T11:13:51+, Andrew Stubbs wrote:
Updated patch: this version fixes some missed cases of malloc in the
realloc implementation.
Right, and as it seems I've run into another issue: a stray 'free'.
These patches implement an LDS memory allocator for OpenMP on AMD.
1. 230216-basic-allocator.patch
Separate the allocator from NVPTX so the code can be shared.
2. 230216-amd-low-lat.patch
Allocate the memory, adjust the default address space, and hook up the
allocator.
They will need to be
On 17/02/2023 08:12, Thomas Schwinge wrote:
Hi Andrew!
On 2023-02-16T23:06:44+0100, I wrote:
On 2023-02-16T16:17:32+, "Stubbs, Andrew via Gcc-patches"
wrote:
The mmap implementation was not optimized for a lot of small allocations, and I
can't see that issue changin
ttached.
Oops, thanks Thomas.
Andrew
ay, if you want, commit the patch as is and tweak the testcases if
possible incrementally.
I will do so now. It would be nice to fix the testcase oddities, but I
don't know how.
I wrote the above yesterday, but apparently the email didn't send ...
since then some bugs have been reported. I'll try to investigate today,
although I think Richi has a fix already.
Thanks
Andrew
This patch fixes a bug in which libgomp doesn't know what to do with
attached pointers in fortran derived types when using Unified Shared
Memory instead of explicit mappings.
I've committed it to the devel/omp/gcc-12 branch (OG12) and will fold it
into the next rebase/repost of the USM patches
in our test environment (and anyone else
using remote), but the libgomp test should make up for that.
Andrew
On 01/03/2023 10:52, Andre Vieira (lists) wrote:
On 01/03/2023 10:01, Andrew Stubbs wrote:
> On 28/02/2023 23:01, Kwok Cheung Yeung wrote:
>> Hello
>>
>> This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
>> target hook for the AMD GCN a
rtx_GTU (VOIDmode, 0, 0), operands[1],
operands[2]));
+
Long lines need to be wrapped, here and elsewhere.
Andrew
On 02/03/2023 15:07, Kwok Cheung Yeung wrote:
Hello
I've made the suggested changes. Should I hold off on committing this
until GCC 13 has been branched off?
No need, amdgcn is not a primary target and this stuff won't affect
anyone else. Please go ahead and commit.
Andrew
On 03/03/2023 17:05, Paul-Antoine Arras wrote:
Le 02/03/2023 à 18:18, Andrew Stubbs a écrit :
On 01/03/2023 16:56, Paul-Antoine Arras wrote:
This patch introduces instruction patterns for conditional min and max
operations (cond_{f|s|u}{max|min}) in the GCN machine description. It
also allows
On 08/03/2023 11:06, Tobias Burnus wrote:
Next try – this time with both patches.
On 08.03.23 12:05, Tobias Burnus wrote:
Hi Andrew,
attached are two patches related to GCN, one for libgomp.texi
documenting an env var
and a release-notes update in www docs.
OK? Comments?
LGTM
Andrew
On 13/03/2023 12:25, Tobias Burnus wrote:
Found when comparing '-v -Wl,-v' output as, despite -save-temps, multiple
runs yielded different results.
Fixed as attached.
OK for mainline?
OK.
Andrew
he symbol being
relocated. (This port does not use DTPrel for anything else.)
How should I implement this feature to make it acceptable to commit?
Thanks very much
Andrew
DWARF address spaces for local variables
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index f0e4636c06a..0c601bf2d0a 1
On 22/01/2021 11:42, Andrew Stubbs wrote:
@@ -20294,15 +20315,6 @@ add_location_or_const_value_attribute (dw_die_ref die,
tree decl, bool cache_p)
if (list)
{
add_AT_location_description (die, DW_AT_location, list);
-
- addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (decl
This patch fixes an AMD GCN bug in which attempting to use DFmode
vector reductions would cause an ICE.
There's no reason not to allow the reductions, so we simply enable them
thusly.
Andrew
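For context, this is the kind of source loop that produces a double-precision max reduction. Whether GCC turns it into a V64DFmode reduction on amdgcn also depends on floating-point options; relaxed math flags such as -ffast-math are typically needed for min/max reductions. A sketch only, with a made-up function name:

```c
#include <math.h>

/* Max reduction over doubles; a candidate for a vector max reduction. */
double max_reduce(const double *a, int n)
{
    double m = -INFINITY;
    for (int i = 0; i < n; i++)
        m = a[i] > m ? a[i] : m;
    return m;
}
```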
amdgcn: Allow V64DFmode min/max reductions
I don't know why these were disabled. There
Now backported to devel/omp/gcc-10.
On 26/01/2021 10:29, Andrew Stubbs wrote:
This patch fixes an AMD GCN bug in which attempting to use DFmode
vector reductions would cause an ICE.
There's no reason not to allow the reductions, so we simply enable them
thusly.
Andrew
available yet, and don't have a
public name yet, but with this we will be ready when they do.
Andrew
amdgcn: Add gfx908 support
gcc/
* config/gcn/gcn-opts.h (enum processor_type): Add PROCESSOR_GFX908.
* config/gcn/gcn.c (gcn_omp_device_kind_arch_isa): Add gfx908.
(output_file_start)
Ping.
On 22/01/2021 11:42, Andrew Stubbs wrote:
Hi all, Jakub,
I need to implement DWARF for local variables that exist in an
alternative address space. This happens for OpenACC gang-private
variables (or will when the patches are committed) on AMD GCN, at least.
This is distinct from
data allocation fails,
in the hope that memory can be reallocated more efficiently, but there's
an additional, unconditional deallocate that looks like it may have been
vestigial debug code, or something.
Fixing the issue gives a 3x speed-up running the BabelStream benchmark.
Andrew
nvptx: remove
This patch fixes up the DWARF code ranges for offload debugging, again.
This time it defers the changes until most other DWARF generation has
occurred, because the previous method was causing ICEs on some testcases.
This patch will be proposed for mainline in stage 1.
Andrew
DWARF: late
This patch is now backported to devel/omp/gcc-10.
Andrew
On 26/11/2020 14:41, Andrew Stubbs wrote:
This patch fixes an error in GCN mkoffload that corrupted relocations in
the early-debug info.
The code now updates the relocation code without zeroing the symbol index.
Andrew
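For readers unfamiliar with the layout: in ELF64 relocations, r_info packs the symbol index into the high 32 bits and the relocation type into the low 32 bits, so changing the type must preserve the upper half. A minimal C sketch of that update; the macros mirror the ELF64_R_SYM, ELF64_R_TYPE and ELF64_R_INFO definitions from <elf.h> but are redefined here to keep the example self-contained, and `set_reloc_type` is a hypothetical helper, not the mkoffload code.

```c
#include <stdint.h>

/* Symbol index in the high 32 bits, relocation type in the low 32 bits. */
#define R_SYM(i)     ((uint32_t)((i) >> 32))
#define R_TYPE(i)    ((uint32_t)((i) & 0xffffffffu))
#define R_INFO(s, t) (((uint64_t)(s) << 32) | (uint64_t)(t))

/* Replace the relocation type while keeping the symbol index. */
uint64_t set_reloc_type(uint64_t r_info, uint32_t new_type)
{
    return R_INFO(R_SYM(r_info), new_type);
}
```

Building the new value as R_INFO(0, new_type) instead would zero the symbol index, which is the kind of corruption described above.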
increased resource usage.
Committed to devel/omp/gcc-11.
@ Thomas, this should probably be folded into another patch when
upstreaming OG11 to mainline.
Andrew
libgomp amdgcn: Fix issues with dynamic OpenMP thread scaling
libgomp/ChangeLog:
* config/gcn/bar.h (gomp_barrier_init): Limit
ainline until the multiple-worker support is merged
> there"
>
> @Andrew + @Julian: Do you intent to commit it relatively soon?
> Regarding the wwwdocs patch, I can hold off until that commit or reword
> it to only cover the workers part.
Were these not part of the patch set Thomas was working on?
Andrew
I plan to make use of libbacktrace within GDB. I believe that the
patch below needs to be merged into GCC's top-level directory and then
back-ported to the binutils-gdb repository.
Is this OK to merge?
Thanks,
Andrew
---
GDB is going to start using libbacktrace, so add a build dependency
to detect if nesting is present, and I don't
have a lot of time to fix this issue, I'm just going to let it go, for now.
Committed to master. I'll backport it to GCC 10 and OG10 shortly.
Andrew
amdgcn: Remove omp_gcn pass
This pass only had an optimization for obtaining t
xit then the barriers for
the outer region will sync everything up again.
OK to commit?
Andrew
P.S. I can approve the amdgcn portion myself; I'm seeking approval for
the nvptx portion.
libgomp: disable barriers in nested teams
Both GCN and NVPTX allow nested parallel regions, but the barri
On 18/09/2020 12:25, Andrew Stubbs wrote:
This patch fixes a problem in which nested OpenMP parallel regions cause
errors if the number of inner teams is not balanced (i.e. the number of
loop iterations is not divisible by the number of physical threads). A
testcase is included.
This updated
Ping.
On 03/09/2020 16:29, Andrew Stubbs wrote:
On 28/08/2020 13:04, Andrew Stubbs wrote:
Hi all,
This patch introduces DWARF CFI support for architectures that require
multiple registers to hold pointers, such as the stack pointer, frame
pointer, and return address. The motivating case is
On 03/09/2020 16:29, Andrew Stubbs wrote:
OK to commit? (Although, I'll hold off until AMD release the
compatible GDB.)
The ROCm 3.8 ROCGDB is now released. I'm committing the attached patches
to devel/omp/gcc-10 while I wait for review.
The first patch is the multi-register C
On 30/07/2020 12:10, Andrew Stubbs wrote:
On 29/07/2020 15:05, Andrew Stubbs wrote:
This patch does not implement anything new, but simply separates
OpenACC 'enter data' and 'exit data' into two libgomp API functions.
The original API name is kept for backward compatibi
d a comment there referring to the code
this patch adds.
/* Accelerators with fixed thread counts require this to return 1 for
nested parallel regions. */
WDYT?
Andrew
it wasn't completely safe.
This patch ensures that the previous assumption is safe, by ignoring the
relevant ICV on NVPTX and AMD GCN, neither of which can support it.
OK to commit?
Andrew
libgomp: Enforce 1-thread limit in subteams
Accelerators with fixed thread-counts will break if n
Ping.
On 21/09/2020 14:51, Andrew Stubbs wrote:
Ping.
On 03/09/2020 16:29, Andrew Stubbs wrote:
On 28/08/2020 13:04, Andrew Stubbs wrote:
Hi all,
This patch introduces DWARF CFI support for architectures that
require multiple registers to hold pointers, such as the stack
pointer, frame
register. It was only the way it was because vector
instructions can specify a custom register to clobber.
Hopefully this will help prevent unnecessary register moves for address
calculations.
Andrew
amdgcn: Use scalar instructions for addptrdi3
Allow addptr to use SGPRs as well as VGPRs for
On 13/09/2022 12:03, Paul-Antoine Arras wrote:
Hello,
This patch intends to backport e90af965e5c by Jakub Jelinek to
devel/omp/gcc-12.
The original patch was described here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601189.html
I've merged and committed it for you.
Andrew
This patch is a prerequisite for some amdgcn patches I'm working on to
support shorter vector lengths (having fixed 64 lanes tends to miss
optimizations, and masking is not supported everywhere yet).
The problem is that, unlike AArch64, I'm not using different mask modes
for different sized ve
On 29/09/2022 08:52, Richard Biener wrote:
On Wed, Sep 28, 2022 at 5:06 PM Andrew Stubbs wrote:
This patch is a prerequisite for some amdgcn patches I'm working on to
support shorter vector lengths (having fixed 64 lanes tends to miss
optimizations, and masking is not supported everywher
s different because their config
register has an explicit length field whereas GCN just uses a mask to
limit the length (more like AArch64, I think).
The RVV solution uses different logic in the gimple IR; this proposal is
indistinguishable from the status quo at that point.
Andrew
Why has prog_finalized been moved?
Andrew did suggest a while back to piggyback on the console_output handling,
avoiding another atomic access. If this is still wanted, I would like some
guidance regarding how to actually implement it.
The console output ring buffer has the following type:
struc
I've committed this small clean up. It silences a warning.
Andrew

amdgcn: remove unused variable
This was left over from a previous version of the SIMD clone patch.
gcc/ChangeLog:
* config/gcn/gcn.cc (gcn_simd_clone_compute_vecsize_and_simdlen):
Remove unused elt_bits variable.