http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58669
Testing:
$ /usr/lib/jvm/icedtea-6/bin/java TestProcessors
Processors: 8
$ /usr/lib/jvm/gcj-jdk/bin/java -version
java version "1.5.0"
gij (GNU libgcj) version 4.8.1
$ /usr/lib/jvm/gcj-jdk/bin/java TestProcessors
Processors: 1
$ /h
From: Andrew Pinski
The issue here is that when backprop tries to go
and strip sign ops, it skips over ABSU_EXPR, but
ABSU_EXPR not only does an ABS, it also changes the
type to unsigned.
Since strip_sign_op_1 is only supposed to strip off
sign-changing operands and not ones that change types
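To see why that matters, here is a quick illustration (my own sketch, not from the patch): an open-coded absu, showing that the result type really is unsigned and the value can fall outside the signed range.

#include <limits.h>
#include <stdio.h>

/* Open-coded equivalent of ABSU_EXPR: absolute value computed into the
   unsigned type, well defined even for INT_MIN.  */
static unsigned int absu (int x)
{
  return x < 0 ? 0u - (unsigned int) x : (unsigned int) x;
}

int main (void)
{
  printf ("%u\n", absu (-5));      /* 5 */
  printf ("%u\n", absu (INT_MIN)); /* 2147483648 with 32-bit int */
  return 0;
}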
From: Andrew Pinski
The problem here is that after r6-7425-ga9fee7cdc3c62d0e51730,
the comparison to see if the transformation could be done was using the
wrong value. Instead of seeing if the inner value was LE (for MIN, and GE for MAX)
the outer value, it was comparing the inner to the value used in the
range, but
turns out it was really being set in DOM2. Instead they check for the
range in the final listing...
Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed.
Andrew
From dae5de2a2353b928cc7099a78d88a40473abefd2 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod
Date: Wed, 27 Sep
ng on there)
Bootstraps on x86_64-pc-linux-gnu with no regressions. Pushed.
Andrew
From 29abc475a360ad14d5f692945f2805fba1fdc679 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod
Date: Thu, 28 Sep 2023 09:19:32 -0400
Subject: [PATCH 2/5] Remove pass counting in VRP.
Rather than using a pass cou
huh. thanks, I'll have a look.
Andrew
On 10/3/23 11:47, David Edelsohn wrote:
This patch caused a bootstrap failure on AIX.
during GIMPLE pass: evrp
/nasfarm/edelsohn/src/src/libgcc/libgcc2.c: In function '__gcc_bcmp':
/nasfarm/edelsohn/src/src/libgcc/libgcc2.c:2910:1: in
Give this a try.. I'm testing it here, but x86 doesn't seem to show it
anyway for some reason :-P
I think I needed to handle pointers specially since SSA_NAMEs handle
pointer ranges differently.
Andrew
On 10/3/23 11:47, David Edelsohn wrote:
This patch caused a bootstrap fail
perfect. I'll check it in when my testrun is done.
Thanks .. . and sorry :-)
Andrew
On 10/3/23 12:53, David Edelsohn wrote:
AIX bootstrap is happier with the patch.
Thanks, David
On Tue, Oct 3, 2023 at 12:30 PM Andrew MacLeod
wrote:
Give this a try.. I'm testing it her
On 10/3/23 13:02, David Malcolm wrote:
On Tue, 2023-10-03 at 10:32 -0400, Andrew MacLeod wrote:
Pass counting in VRP is used to decide when to call early VRP, when to pass
the flag to enable warnings, and which pass is the final one.
If you try to add additional passes, this becomes quite fragile. This
d8808c37d29110872fa51b98e71aef9e160b4692
Author: Andrew MacLeod
Date: Tue Oct 3 12:32:10 2023 -0400
Don't use range_info_get_range for pointers.
Pointers only track null and nonnull, so we need to handle them
specially.
* tree-ssanames.cc (set_range_info): Use get_ptr_inf
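As an illustration (my example, not part of the patch) of what pointer range info amounts to:

/* For a pointer, the only recorded "range" is null / nonnull: after the
   early return below, all we know about p is that it is nonnull.  */
int f (int *p)
{
  if (p == 0)
    return 0;
  return *p;
}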
otstraps just fine as at commit 7eb5ce7f58ed ("Remove pass counting in
> VRP.").
>
> Shall I file a PR, or can you handle it regardless? Let me know if you
> need anything from me.
It is already filed as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111688 .
Thanks,
Andrew
>
> Maciej
d")
> (match_operand:GPF 1 "register_operand")
> - (match_operand:GPF 2 "register_operand")]
> + (match_operand:GPF 2 "nonmemory_operand")]
>"TARGET_SIMD"
> {
> - rtx bitmask = gen_reg_rtx (mode);
> + machine_mode int_mode =
from anywhere.
Pushed.
Andrew
From ad8cd713b4e489826e289551b8b8f8f708293a5b Mon Sep 17 00:00:00 2001
From: Andrew MacLeod
Date: Fri, 28 Jul 2023 13:18:15 -0400
Subject: [PATCH 2/3] Add a dom based ranger for fast VRP.
Provide a dominator based implementation of a range query.
* gimple_ran
t only looks at whether NAME
has a range, and returns it if it does, with no other overhead.
Pushed.
From 52c1e2c805bc2fd7a30583dce3608b738f3a5ce4 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod
Date: Tue, 15 Aug 2023 17:29:58 -0400
Subject: [PATCH 1/3] Add outgoing range vector calculation API
Pr
file with the extension .fvrp.
pushed.
From f4e2dac53fd62fbf2af95e0bf26d24e929fa1f66 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod
Date: Mon, 2 Oct 2023 18:32:49 -0400
Subject: [PATCH 3/3] Create a fast VRP pass
* timevar.def (TV_TREE_FAST_VRP): New.
* tree-pass.h (make_pass_fast_vrp): New
side
effects). A little additional work can reduce the memory footprint
further too. I have done no experiments as yet as to the cost of adding
relations, but it would be pretty straightforward as it is just reusing
all the same components the main ranger does.
Andrew
form?
> > I.e. could we go with your new version of the match.pd patch, and add some
> > isel stuff as a follow-on?
> >
>
> Sure, if that's what's desired. But..
>
> The example you posted above is for instance worse for x86
> https://godbolt.org/z/x9ccqxW
Match has a pattern which converts `vec_cond(vec_cond(a,b,0), c, d)`
into `vec_cond(a & b, c, d)`, but since in this case `a` is a comparison,
fold will change `a & b` back into `vec_cond(a,b,0)`, which causes an
infinite loop.
The best way to fix this is to enable the patterns for vec_cond(*,vec_cond,*
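A sketch of the kind of source that produces this shape, using GNU vector extensions (my own reconstruction, not the testcase from the PR):

typedef int v4si __attribute__ ((vector_size (16)));

v4si f (v4si a, v4si b, v4si c, v4si d)
{
  /* (a > 0) is itself a comparison, so the inner selection below is the
     vec_cond(a,b,0) that fold keeps recreating from `a & b`.  */
  v4si inner = (a > 0) ? b : (v4si) {0, 0, 0, 0};
  return inner ? c : d;
}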
I've just committed this simple patch to silence an enum warning.
Andrew

amdgcn: silence warning
gcc/ChangeLog:
* config/gcn/gcn.cc (print_operand): Adjust xcode type to fix warning.
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index f6cff659703..ef3b6472a52 100644
--- a/g
I've just committed this patch. It should have no functional changes
except to make it easier to add new alternatives into the
alternative-heavy move instructions.
Andrew

amdgcn: switch mov insns to compact syntax
The move instructions typically have many alternatives (and I'm about to add
more
{
target vect_strided5 } } } */
This patch causes a test regression on amdgcn because vect_strided5 is
true (because check_effective_target_vect_fully_masked is true), but the
testcase still gives the message 4 times. Perhaps because amdgcn uses
masking and not vect_load_lanes?
Andrew
OPYSIGN @0 @1))
> > >>> (coss @0)))
> > >>>
> > >>> which properly will diagnose a duplicate pattern. There are
> > >>> currently no operator lists with just builtins defined (that
> > >>> could be fixed, see gencfn-macros
t;vectorizing stmts using SLP" 3
"vect" { target vect_strided5 && vect_load_lanes } } } */
Could you verify whether it works for you?
You need an additional set of curly braces in the second line to avoid a
syntax error message, but I get a pass with that change.
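For reference, my reconstruction of the adjusted directive (the extra braces group the two selectors; not copied from the actual testcase):

/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target { vect_strided5 && vect_load_lanes } } } } */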
Thanks
Andrew
VREL_EQ... as there is only
one. As it stands, it always returns VREL_EQ, so simply use VREL_EQ in the
2 calling locations.
Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed.
Andrew
From 5ee51119d1345f3f13af784455a4ae466766912b Mon Sep 17 00:00:00 2001
From: Andrew MacLeod
Date
.
Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed.
Andrew
From b0892b1fc637fadf14d7016858983bc5776a1e69 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod
Date: Mon, 9 Oct 2023 10:15:07 -0400
Subject: [PATCH 2/2] Ensure float equivalences include + and - zero.
A floating point equivalence may not
So currently we have a simplification for `a | ~(a ^ b)`, but
that does not match the case where we had originally `(~a) | (a ^ b)`,
so we need to add a new pattern that matches that and uses
bitwise_inverted_equal_p,
which also catches comparisons.
OK? Bootstrapped and tested on x86_64-linux-gnu
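For what it's worth, a quick brute-force check (illustration only, not part of the patch) of the identity the new pattern relies on, namely that `(~a) | (a ^ b)` is the same bitwise function as `~(a & b)`:

#include <assert.h>

int main (void)
{
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      assert (((~a | (a ^ b)) & 0xff) == (~(a & b) & 0xff));
  return 0;
}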
> > + get_range_query (cfun)->range_of_expr (r, bound);
>
> expand doesn't have a ranger instance so this is a no-op. I'm unsure
> if it would be safe given we're half GIMPLE, half RTL. Please leave it
> out.
It definitely does not work and can
optimizing more than expected makes it low priority).
LGTM
Andrew
pops the RAS and the latter pushes it.
Any reason for using a different sequence in one than in the other?
On Tue, Oct 10, 2023 at 3:11 PM Jeff Law wrote:
>
>
> Ventana has had a variant of this patch from Andrew W. in its tree for
> at least a year. I'm dusting it off
While `a & (b ^ ~a)` is optimized to `a & b` at the RTL level,
it is still good to optimize this at the GIMPLE level too, and doing so allows
us to match a few extra things, including where `a` is a comparison.
Note I had to update/change the testcase and-1.c to avoid matching
this case as we can match -2 and 1 a
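A minimal example of the fold (my sketch, not the patch's testcase):

/* Illustration only: expected to fold to a & b; the new gimple pattern
   also catches the variant where a is the result of a comparison.  */
unsigned f (unsigned a, unsigned b)
{
  return a & (b ^ ~a);
}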
On Tue, Oct 10, 2023 at 8:26 PM Jeff Law wrote:
>
>
>
> On 10/10/23 18:24, Andrew Waterman wrote:
> > I remembered another concern since we discussed this patch privately.
> > Using ra for long calls results in a sequence that will corrupt the
> > return-address sta
Similar to the patch which was checked into trunk last week. A slight tweak was
needed as dconstm0 was not exported in GCC 13; otherwise it is functionally
the same.
Bootstrapped on x86_64-pc-linux-gnu. Pushed.
Andrew
commit f0efc4b25cba1bd35b08b7dfbab0f8fc81b55c66
Author: Andrew MacLeod
Date: Mon Oct 9 13
gcc] Error 2
> make[1]: Leaving directory
> '/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage1'
> make: *** [Makefile:590: stamps/build-gcc-newlib-stage1] Error 2
This is also recorded as
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111777 . It breaks more
than just RISCV; it depends on the version of texinfo that is
installed too.
Thanks,
Andrew
>
>
> juzhe.zh...@rivai.ai
. pushed.
Andrew
large, it can consume
a lot of time. Typically, partial equivalence lists are small. In
this case, a lot of dead stmts were not removed, so there was no
redundancy elimination and it was causing an issue.
Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed.
Andrew
From
of course the patch would be handy...
On 10/13/23 09:23, Andrew MacLeod wrote:
Technically PR 111622 exposes a bug in GCC 13, but it's been papered
over on trunk by this:
commit 9ea74d235c7e7816b996a17c61288f02ef767985
Author: Richard Biener
Date: Thu Sep 14 09:31:23 2023 +0200
This adds the simplification of `a & (x | CST)` to `a` when we know that
`(a & ~CST) == 0`, in a similar fashion to how `a & CST` is handled.
I looked into handling `a | (x & CST)`, but I don't see any decent
simplifications happening there.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
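A hedged example of the new fold (my sketch, not the patch's testcase), taking CST to be 0xf:

/* The compiler can see (a & ~0xf) == 0 after the masking below, so
   a & (x | 0xf) is expected to simplify to just a.  */
unsigned f (unsigned x, unsigned a)
{
  a &= 0xf;
  return a & (x | 0xf);
}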
When checking to see if a function declaration has a conflict due to
promotions, there is no test to see if the type was an error mark before
calling
c_type_promotes_to. c_type_promotes_to is not ready for error_mark and causes an
ICE.
This adds a check for error_mark before the call of c_ty
This is a simple error recovery issue when c_safe_arg_type_equiv_p
was added in r8-5312-gc65e18d3331aa999. The issue is that after
an error, an argument type (of a function type) might turn
into an error mark node and c_safe_arg_type_equiv_p was not ready
for that. So this just adds a check for err
This improves the `A CMP 0 ? A : -A` set of match patterns to use
bitwise_equal_p, which allows a nop cast between signed and unsigned.
This allows catching a few extra cases which were not being caught before.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
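An example of the kind of extra case this should now catch (my sketch, assuming the nop cast is between int and unsigned int):

/* The negation is done in the unsigned type; bitwise_equal_p lets the
   A CMP 0 ? A : -A pattern still see this as an abs-like selection.  */
int f (int x)
{
  unsigned int ux = (unsigned int) x;
  return x > 0 ? x : (int) -ux;
}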
gcc/ChangeLog:
In the case of a NOP conversion (the precisions of the 2 types are equal),
factoring out the conversion can be done even if int_fits_type_p returns
false and even when the conversion is defined by a statement inside the
conditional. Since it is a NOP conversion, there is no zero/sign extending
happening
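A small illustration (my own, assuming int and unsigned int have the same precision) of the NOP-conversion case being described:

/* The conversion of b happens inside the conditional, but since it is a
   NOP conversion it can be factored out:
     a ? (unsigned) b : 4u   -->   (unsigned) (a ? b : 4)  */
unsigned f (int a, int b)
{
  return a ? (unsigned) b : 4u;
}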
Currently we are able to simplify `~a CMP ~b` to `b CMP a`, but we should allow a nop
conversion in between the `~` and the `a`, which can show up. A similar thing
should
be done for `~a CMP CST`.
I had originally submitted the `~a CMP CST` case as
https://gcc.gnu.org/pipermail/gcc-patches/2021-Novem
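For illustration (my sketch, not the submitted testcases), the three shapes being discussed:

int f (int a, int b)
{
  return ~a < ~b;              /* folds to b < a */
}

int g (unsigned a, unsigned b)
{
  return ~(int) a < ~(int) b;  /* same, with a nop cast between ~ and a */
}

int h (int a)
{
  return ~a < 5;               /* ~a CMP CST: equivalent to a > ~5 */
}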
G.md
> https://github.com/git/git/blob/master/.github/PULL_REQUEST_TEMPLATE.md
> What do people think?
>
I think this is a great idea. Is there a similar one for opening issues too?
Thanks,
Andrew
ChangeLog:
>
> * .github/CONTRIBUTING.md: New file.
> * .github/PULL_R
On Tue, Oct 17, 2023 at 1:52 PM Alex Coplan wrote:
>
> This adds a new aarch64-specific RTL-SSA pass dedicated to forming load
> and store pairs (LDPs and STPs).
>
> As a motivating example for the kind of thing this improves, take the
> following testcase:
>
> extern double c[20];
>
> double f(do
Pushed as obvious.
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_test_fractional_cost):
Test <= instead of testing < twice.
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index
2b0de7ca0389be6698c329b54f9501b8ec09183f..9c3c0e705e2e6ea3b55b4a5f1
tween target_clones and target/target_version multiversioning, but
would require agreement on how to resolve some of the issues discussed in [1].
Thanks,
Andrew
[1] https://gcc.gnu.org/pipermail/gcc/2023-October/242686.html
This patch adds support for the "target_version" attribute to the middle
end and the C++ frontend, which will be used to implement function
multiversioning in the aarch64 backend.
Note that C++ is currently the only frontend which supports
multiversioning using the "target" attribute, whereas the
This adds initial support for function multiversion on aarch64 using the
target_version and target_clones attributes. This mostly follows the
Beta specification in the ACLE [1], with a few differences that remain to
be fixed:
- Symbol mangling for target_clones differs from that for target_version
This is a partial patch to make the mangling of function version names
for target_clones match those generated using the target or
target_version attributes. It modifies the name of function versions,
but does not yet rename the resolved symbol, resulting in a duplicate
symbol name (and an error a
I had a thinko in r14-1600-ge60593f3881c72a96a3fa4844d73e8a2cd14f670
where we would remove the `& CST` part if we ended up not calling
expand_single_bit_test.
This fixes the problem by introducing a new variable that will be used
for calling expand_single_bit_test.
As far as I know this can only s
After r14-3110-g7fb65f10285, the canonical form for
`a ? ~b : b` changed to be `-(a) ^ b`, which means
that for aarch64 we need to add a few new insn patterns
to be able to catch this and change it to
the canonical form for the aarch64 backend.
A secondary pattern was needed to support a zero_e
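As an illustration (my sketch) of the new canonical form for a boolean condition:

/* For a boolean a, -(int) a is either 0 or -1 (all ones), so
   a ? ~b : b is the same as (-(int) a) ^ b.  */
int f (_Bool a, int b)
{
  return a ? ~b : b;
}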
The build has been failing for the last few days because LLVM removed
support for the HSACOv3 binary metadata format, which we were still
using for the Fiji multilib.
The LLVM commit has now been reverted (thank you Pierre van Houtryve),
but it's only a temporary reprieve.
This patch removes
OK to commit?
Andrew

gcc-14: mark amdgcn fiji deprecated
diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index c817dde4..91ab8132 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -178,6 +178,16 @@ a work-in-progress.
+AMD Radeon (GCN)
+
+
+
In a similar way to how we don't warn about NULL pointer constant conversion to
a different named address space, we should not warn about conversion to a
different sso endianness either.
This adds the simple check.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR c/104822
gcc/c/ChangeLog:
* c-t
In the case of convert_argument, we would return the same expression
back rather than error_mark_node after the error message about
trying to convert to an incomplete type. This causes issues in
the gimplifier trying to see if another conversion is needed.
The code here dates back to before the rev
On Thu, Oct 19, 2023 at 07:04:09AM +, Richard Biener wrote:
> On Wed, 18 Oct 2023, Andrew Carlotti wrote:
>
> > This patch adds support for the "target_version" attribute to the middle
> > end and the C++ frontend, which will be used to implement function
> &
* Claudiu Zissulescu [2018-06-13 12:09:18 +0300]:
> From: Claudiu Zissulescu
>
> This patch adds support for two ARCHS variations.
>
> Ok to apply?
> Claudiu
Sorry for the delay, this looks fine.
Thanks,
Andrew
>
> gcc/
> 2017-03-10 Claudiu Zissulescu
ntly the driver will still
> pass the extension down to the assembler regardless.
>
> Bootstrapped aarch64-none-linux-gnu and ran regression tests.
>
> Is it OK for trunk?
I have been using a similar patch for the last year and a half.
Thanks,
Andrew
>
> gcc/ChangeLog:
> 2018-07-09
it. The only thing you could do is restrict
> > replacement of CALL_EXPRs (in SCEV cprop) to those the target
> > natively supports.
>
> How about restricting it in expression_expensive_p? Is that what you
> wanted? Attached patch does this.
> Bootstrap and regression testing progr
On Tue, Jul 10, 2018 at 6:35 PM Kugan Vivekanandarajah
wrote:
>
> Hi Andrew,
>
> On 11 July 2018 at 11:19, Andrew Pinski wrote:
> > On Tue, Jul 10, 2018 at 6:14 PM Kugan Vivekanandarajah
> > wrote:
> >>
> >> On 10 July 2018 at 23:17, Richard Biener
&g
All the patches in this series look fine.
Thanks,
Andrew
* Claudiu Zissulescu [2018-07-16 15:29:42 +0300]:
> From: claziss
>
> gcc/
> 2017-06-14 Claudiu Zissulescu
>
> * config/arc/arc.h (ADDITIONAL_REGISTER_NAMES): Add additional
> register names.
&g
imental) (GCC))
> > on -O1 and above.
>
>
> I don't see where the FUD comes in here; either this builtin has a defined
> semantics across targets and they are adhered to, or the builtin doesn't have
> well defined semantics, or the targets fail to implement those se
Hi all,
I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just
do it because the GCC middle-end models DIVMOD's return value as
"complex int" type, and there are no vector equivalents of that type.
Therefore, this patch adds minimal support for "complex vector int"
modes.
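For context, the scalar analogue of DIVMOD (illustration only): one operation yields both quotient and remainder, which is why the return value is modelled as a two-part "complex int".

#include <stdlib.h>

void f (int a, int b, int *q, int *r)
{
  div_t d = div (a, b);  /* one call, two results */
  *q = d.quot;
  *r = d.rem;
}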
OK.
Andrew
On 26/05/2023 15:58, Tobias Burnus wrote:
(Update the syntax of the amdgcn commandline option in anticipation of
later patches;
while -m(no-)xnack is in mainline since r12-2396-gaad32a00b7d2b6 (for
PR100208),
-mxnack (contrary to -msram-ecc) is currently mostly a stub for later
On 30/05/2023 07:26, Richard Biener wrote:
On Fri, May 26, 2023 at 4:35 PM Andrew Stubbs wrote:
Hi all,
I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just
do it because the GCC middle-end models DIVMOD's return value as
"complex int" type, and
On 06/06/2023 16:33, Tobias Burnus wrote:
Andrew: Does the GCN change look okay to you?
This patch permits to use GCN devices with 'omp requires
unified_address' which
in principle works already, except that the requirement handling did
disable it.
(It also updates libgomp.tex
;s no DIVMOD support so I
couldn't just do a straight comparison.
Thanks
Andrew
On 09/06/2023 10:02, Richard Sandiford wrote:
Andrew Stubbs writes:
On 07/06/2023 20:42, Richard Sandiford wrote:
I don't know if this helps (probably not), but we have a similar
situation on AArch64: a 64-bit mode like V8QI can be doubled to a
128-bit vector or to a pair of 64-bit ve
This patch allows vectorization when operators are available as
libfuncs, rather than only as insns.
This will be useful for amdgcn where we plan to vectorize loops that
contain integer division or modulus, but don't want to generate inline
instructions for the division algorithm every time.
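The sort of loop this is aimed at (my example): integer division has no cheap native vector instruction on amdgcn, so the vectorised body would call a division libfunc instead of open-coding the algorithm for every element.

void f (int *restrict out, const int *restrict a,
        const int *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = a[i] / b[i];
}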
nts?
OK?
Btw, testing on GCN would be welcome - the _avx512 paths could
work for it so in case the while_ult path fails (not sure if
it ever does) it could get _avx512 style masking. Likewise
testing on ARM just to see I didn't break anything here.
I don't have SVE hardware so testing is probably meaningless.
I can set some tests going. Is vect.exp enough?
Andrew
On 14/06/2023 15:29, Richard Biener wrote:
Am 14.06.2023 um 16:27 schrieb Andrew Stubbs :
On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote:
This implemens fully masked vectorization or a masked epilog for
AVX512 style masks which single themselves out by representing
each lane
On 15/06/2023 10:58, Richard Biener wrote:
On Thu, 15 Jun 2023, Andrew Stubbs wrote:
On 14/06/2023 15:29, Richard Biener wrote:
Am 14.06.2023 um 16:27 schrieb Andrew Stubbs :
On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote:
This implemens fully masked vectorization or a masked
On 15/06/2023 12:06, Richard Biener wrote:
On Thu, 15 Jun 2023, Andrew Stubbs wrote:
On 15/06/2023 10:58, Richard Biener wrote:
On Thu, 15 Jun 2023, Andrew Stubbs wrote:
On 14/06/2023 15:29, Richard Biener wrote:
Am 14.06.2023 um 16:27 schrieb Andrew Stubbs :
On 14/06/2023 12:54
On 15/06/2023 14:34, Richard Biener wrote:
On Thu, 15 Jun 2023, Andrew Stubbs wrote:
On 15/06/2023 12:06, Richard Biener wrote:
On Thu, 15 Jun 2023, Andrew Stubbs wrote:
On 15/06/2023 10:58, Richard Biener wrote:
On Thu, 15 Jun 2023, Andrew Stubbs wrote:
On 14/06/2023 15:29, Richard
On 15/06/2023 15:00, Richard Biener wrote:
On Thu, 15 Jun 2023, Andrew Stubbs wrote:
On 15/06/2023 14:34, Richard Biener wrote:
On Thu, 15 Jun 2023, Andrew Stubbs wrote:
On 15/06/2023 12:06, Richard Biener wrote:
On Thu, 15 Jun 2023, Andrew Stubbs wrote:
On 15/06/2023 10:58, Richard
ement memory that's both high-bandwidth and pinned anyway).
Patches 15 to 17 are new work. I can probably approve these myself, but
they can't be committed until the rest of the series is approved.
Andrew
Andrew Stubbs (11):
libgomp, nvptx: low-latency memory allocator
libgomp: pinned m
Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall. Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when freed.
libgomp/ChangeLog:
* allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN
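A minimal sketch of the mechanism described above, assuming a Linux host (not the actual libgomp code):

#include <stddef.h>
#include <sys/mman.h>

static void *pinned_alloc (size_t size)
{
  /* mmap rather than malloc, so the region can later be unpinned and
     released independently of any heap bookkeeping.  */
  void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;
  if (mlock (p, size) != 0)
    {
      munmap (p, size);
      return NULL;
    }
  return p;
}

static void pinned_free (void *p, size_t size)
{
  munlock (p, size);
  munmap (p, size);
}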
This patch adds support for allocating low-latency ".shared" memory on
the NVPTX GPU device, via omp_low_lat_mem_space and omp_alloc. The memory
can be allocated, reallocated, and freed using a basic but fast algorithm;
it is thread safe, and the size of the low-latency heap can be configured using
t
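A hedged usage sketch (not from the patch): requesting the low-latency space through the standard OpenMP allocator routines from inside a target region.

#include <omp.h>

void f (void)
{
#pragma omp target
  {
    int *scratch
      = (int *) omp_alloc (64 * sizeof (int), omp_low_lat_mem_alloc);
    if (scratch)
      {
        scratch[0] = 42;   /* fast ".shared" memory on NVPTX */
        omp_free (scratch, omp_low_lat_mem_alloc);
      }
  }
}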
The NVPTX low latency memory is not accessible outside the team that allocates
it, and therefore should be unavailable for allocators with the access trait
"all". This change means that the omp_low_lat_mem_alloc predefined
allocator now implicitly implies the "pteam" trait.
libgomp/ChangeLog:
This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP. The name uses the OpenMP extension space and is
intended to be consistent with other OpenMP implementations currently in
development.
The allocator is equivalent to using a custom allocator with the pinned
.
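For reference, a sketch of the equivalent custom allocator spelled out by hand (illustration only, using the standard OpenMP allocator API):

#include <omp.h>
#include <stddef.h>

void f (size_t n)
{
  omp_alloctrait_t traits[] = { { omp_atk_pinned, omp_atv_true } };
  omp_allocator_handle_t a
    = omp_init_allocator (omp_default_mem_space, 1, traits);
  double *buf = (double *) omp_alloc (n * sizeof (double), a);
  /* ... use buf ... */
  omp_free (buf, a);
  omp_destroy_allocator (a);
}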
co-authored-by: Andrew Stubbs
---
gcc/omp-low.cc | 174 +++
gcc/passes.def | 1 +
gcc/testsuite/c-c++-common/gomp/usm-2.c | 46 ++
gcc/testsuite/c-c++-common/gomp/usm-3.c | 44 ++
gcc/testsuite/g++.dg/gomp/usm-1
Add a new option. It's inactive until I add some follow-up patches.
gcc/ChangeLog:
* common.opt: Add -foffload-memory and its enum values.
* coretypes.h (enum offload_memory): New.
* doc/invoke.texi: Document -foffload-memory.
---
gcc/common.opt | 16 ++
This adds support for using Cuda Managed Memory with omp_alloc. It will be
used as the underpinnings for "requires unified_shared_memory" in a later
patch.
There are two new predefined allocators, ompx_unified_shared_mem_alloc and
ompx_host_mem_alloc, plus corresponding memory spaces, which can
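A hedged usage sketch of the new allocator named above (ompx_unified_shared_mem_alloc comes from this patch series; the rest is standard OpenMP):

#include <omp.h>
#include <stddef.h>

void f (size_t n)
{
  int *buf = (int *) omp_alloc (n * sizeof (int),
                                ompx_unified_shared_mem_alloc);

#pragma omp target
  for (size_t i = 0; i < n; i++)
    buf[i] = (int) i;   /* same pointer is usable on the device */

  omp_free (buf, ompx_unified_shared_mem_alloc);
}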
Currently we only handle an omp allocate directive that is associated
with an allocate statement. This statement results in malloc and free calls.
The malloc calls are easy to get to, as they are in the same block as the
allocate directive, but the free calls come in a separate cleanup block. To
This is the front-end portion of the Unified Shared Memory implementation.
It removes the "sorry, unimplemented" message in C, C++, and Fortran, and sets
flag_offload_memory, but is otherwise inactive, for now.
It also checks that -foffload-memory isn't set to an incompatible mode.
gcc/c/ChangeL
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_clauses): Handle OMP_LIST_ALLOCATOR.
(gfc_trans_omp_allocate): New function.
(gfc_trans_omp_directive): Handle EXEC_OMP_ALLOCATE.
gcc/ChangeLog:
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_AL
This patch looks for malloc/free calls that were generated by allocate statement
that is associated with allocate directive and replaces them with GOMP_alloc
and GOMP_free.
gcc/ChangeLog:
* omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_ALLOCATOR.
(scan_omp_allocate): New.
Implement the -foffload-memory=pinned option such that libgomp is
instructed to enable fully-pinned memory at start-up. The option is
intended to provide a performance boost to certain offload programs without
modifying the code.
This feature only works on Linux, at present, and simply calls mlo
gcc/ChangeLog:
* doc/gimple.texi: Describe GIMPLE_OMP_ALLOCATE.
* gimple-pretty-print.cc (dump_gimple_omp_allocate): New function.
(pp_gimple_stmt_1): Call it.
* gimple.cc (gimple_build_omp_allocate): New function.
* gimple.def (GIMPLE_OMP_ALLOCATE): New no
Currently we only make use of this directive when it is associated
with an allocate statement.
gcc/fortran/ChangeLog:
* dump-parse-tree.cc (show_omp_node): Handle EXEC_OMP_ALLOCATE.
(show_code_node): Likewise.
* gfortran.h (enum gfc_statement): Add ST_OMP_ALLOCATE.
Implement the Unified Shared Memory API calls in the GCN plugin.
The allocate and free are pretty straightforward because all "target" memory
allocations are compatible with USM, on the right hardware. However, there's
no known way to check what memory region was used, after the fact, so we use
The XNACK feature allows memory load instructions to restart safely following
a page-miss interrupt. This is useful for shared-memory devices, like APUs,
and to implement OpenMP Unified Shared Memory.
To support the feature we must be able to set the appropriate meta-data and
set the load instru
The AMD GCN runtime must be set to the correct mode for Unified Shared Memory
to work, but this is not always clear at compile and link time due to the split
nature of the offload compilation pipeline.
This patch sets a new attribute on OpenMP offload functions to ensure that the
information is p
On 07/07/2022 12:54, Tobias Burnus wrote:
Hi Andrew,
On 07.07.22 12:34, Andrew Stubbs wrote:
Implement the -foffload-memory=pinned option such that libgomp is
instructed to enable fully-pinned memory at start-up. The option is
intended to provide a performance boost to certain offload
On 08/07/2022 10:00, Tobias Burnus wrote:
On 08.07.22 00:18, Andrew Stubbs wrote:
Likewise, the 'requires' mechanism could then also be used in '[PATCH
16/17] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK'.
No, I don't think so; that environment variable ne
This patch ensures that the maximum vectorization factor used to set the
"safelen" attribute on "omp simd" constructs is suitable for all the
configured offload devices.
Right now it makes the proper adjustment for NVPTX, but otherwise just
uses a value suitable for the host system (always x86
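For reference (my example, not from the patch), this is the construct whose implicit safelen the patch adjusts; with an explicit clause it looks like:

void f (float *a, const float *b, int n)
{
#pragma omp simd safelen(64)
  for (int i = 0; i < n; i++)
    a[i] += b[i];
}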
I've committed this patch to enable DImode one's-complement on amdgcn.
The hardware doesn't have 64-bit not, and this isn't needed by expand
which is happy to use two SImode operations, but the vectorizer isn't so
clever. Vector condition masks are DImode on amdgcn, so this has been
causing lo
I've committed this patch to implement V64DImode vector-vector and
vector-scalar shifts.
In particular, these are used by the SIMD "inbranch" clones that I'm
working on right now, but it's an omission that ought to have been fixed
anyway.
Andrew

amdgcn: 64-bit vector shifts
Enable 64-bit vec
This patch adjusts the generation of SIMD "inbranch" clones that use
integer masks to ensure that it vectorizes on amdgcn.
The problem was only that an amdgcn mask is DImode and the shift amount
was SImode, and the difference causes vectorization to fail.
OK for mainline?
Andrew

openmp-simd-c
TYPE (mask));
g = gimple_build_assign (shift_cnt_conv, NOP_EXPR, shift_cnt);
gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
}
Your version gives the same output as mine does, at least on amdgcn anyway.
Am I OK to commit this version?
Andrew
openmp-simd-clone: Mat