This is v2 of the predicate improvements. It contains only
the changed patches rather than all of them.
The main change is to use vn_valueize. But there was another change
dealing with canonicalization of the comparison with constants always
being on the rhs; that is why I am resending them even thoug
After the last patch, we also want to record `(A CMP B) != 0`
as `(A CMP B)` and `(A CMP B) == 0` as `(A CMP B)` with the
true/false edges swapped.
This shows up more due to the new handling of
`(A | B) ==/!= 0` in insert_predicates_for_cond
as now we can notice these comparisons which were not se
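The canonicalization described in the message above can be illustrated with a small C sketch. The helper names are hypothetical and this is not GCC code; it only shows that `(A CMP B) != 0` tests the same thing as `(A CMP B)`, and that `(A CMP B) == 0` is `(A CMP B)` with the outcomes swapped.

```c
#include <stdbool.h>

/* Hypothetical helpers for illustration only; not taken from the patch.  */

/* The comparison as written in the source: (A CMP B) != 0.  */
static bool cond_ne_zero (int a, int b) { return (a < b) != 0; }

/* The canonical form it can be recorded as: (A CMP B).  */
static bool cond_plain (int a, int b) { return a < b; }

/* The comparison as written in the source: (A CMP B) == 0.  */
static bool cond_eq_zero (int a, int b) { return (a < b) == 0; }

/* The same canonical comparison, with the true/false outcomes swapped.  */
static bool cond_plain_swapped (int a, int b) { return !(a < b); }
```

For any inputs, `cond_ne_zero` agrees with `cond_plain`, and `cond_eq_zero` agrees with `cond_plain_swapped`.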
For `(a | b) == 0`, we can "assert" on the true edge that
both `a == 0` and `b == 0` but nothing on the false edge.
For `(a | b) != 0`, we can "assert" on the false edge that
both `a == 0` and `b == 0` but nothing on the true edge.
This adds that predicate and allows us to optimize f0, f1,
and f2 i
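As a rough illustration of the `(a | b) == 0` reasoning in the message above, here is a minimal C sketch. The function names are hypothetical and are not the f0, f1, f2 testcases mentioned in the message; the sketch only shows why both `a == 0` and `b == 0` hold on the true edge.

```c
#include <stdbool.h>

/* Hypothetical helpers for illustration only; not the patch's testcases.  */

/* The condition as written.  */
static bool or_is_zero (unsigned a, unsigned b) { return (a | b) == 0; }

/* What can be asserted on the true edge of that condition.  */
static bool both_zero (unsigned a, unsigned b) { return a == 0 && b == 0; }

static int example (unsigned a, unsigned b)
{
  if ((a | b) == 0)
    {
      /* On this edge a == 0 and b == 0 are both known, so the
         expression below could in principle be folded to 0.  */
      return (a != 0) + (b != 0);
    }
  /* On the false edge nothing is known about a or b individually.  */
  return -1;
}
```

For `(a | b) != 0` the same facts hold on the false edge instead, since it is the same test with the edges swapped.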
Aldy Hernandez writes:
> Martin Jambor writes:
>
>> Hi,
>>
>> Because the simplified way of extracting value ranges from functions
>> does not look at scalar constants (as one of the versions had been
>> doing before) but instead relies on the value range within the jump
>> function already captur
Jan Hubicka writes:
>> > 2024-11-01 Martin Jambor
>> >
>> > * ipa-prop.cc (ipa_compute_jump_functions_for_edge): When creating
>> > value-range jump functions from pointer type constant zero, do so
>> > as if it was not a pointer.
>> > ---
>> > gcc/ipa-prop.cc | 3 ++-
Hi!
As mentioned in the "inline asm: Add new constraint for symbol definitions"
patch description, while the c operand modifier is documented to:
Require a constant operand and print the constant expression with no
punctuation.
it actually doesn't do that with -fpic at least on some targets and
h
This patch adds support for checking bounds of SVE ACLE vector initialization
constructors. It also adds support to construct vector constant from init
constructors.
gcc/ChangeLog:
* c/c-typeck.cc (process_init_element): Add check to restrict
constructor length to the minimum vec
This patch enables ACLE macro __ARM_FEATURE_SVE_VECTOR_OPERATORS to indicate
that C/C++ language operations are available natively on SVE ACLE types.
gcc/ChangeLog:
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_SVE_VECTOR_OPERATORS.
---
gcc/con
This patch adds a change to handle VLA's poly indices.
gcc/ChangeLog:
* cp/decl.cc (reshape_init_array_1): Handle poly indices.
gcc/testsuite/ChangeLog:
* g++.dg/ext/sve-sizeless-1.C: Update test to test initialize error.
* g++.dg/ext/sve-sizeless-2.C: Likewise.
---
gcc
This patch adds an svboolx4_t type, to go alongside the existing
svboolx2_t type. It doesn't require any special ISA support beyond
SVE itself and it currently has no associated instructions.
gcc/
* config/aarch64/aarch64-modes.def (VNx64BI): New mode.
* config/aarch64/aarch64-pro
This patch moves the scalar and single-vector Advanced SIMD types
from arm_neon.h into a private header, so that they can be defined
by arm_sve.h as well. This is needed for the upcoming SVE2.1
hybrid-VLA reductions, which return 128-bit Advanced SIMD vectors.
The approach follows Claudio's patch
On 06/11/2024 17:59, Jakub Jelinek wrote:
On Wed, Nov 06, 2024 at 05:53:53PM +, Andrew Stubbs wrote:
I'm not sure why I didn't see this.
Was it bootstrap tested or just built without bootstrap + tested?
Otherwise it is just a warning.
Apparently I forgot to rerun the bootstrap after maki
Some instructions that were previously restricted to streaming mode
can also be used in non-streaming mode with SVE2.1. This patch adds
support for those, as well as the usual new-extension boilerplate.
A later patch will add the feature macro.
gcc/
* config/aarch64/aarch64-option-extensi
This patch handles the SVE2p1 instructions that are shared
with SME2. This includes the consecutive-register forms of
the 2-register and 4-register loads and stores, but not the
strided-register forms.
gcc/
* config/aarch64/aarch64.h (TARGET_SVE2p1_OR_SME2): New macro.
* config/aa
On Wed, 6 Nov 2024 at 14:52, Torbjorn SVENSSON
wrote:
>
>
>
> On 2024-11-06 14:04, Richard Earnshaw (lists) wrote:
> > On 06/11/2024 12:23, Torbjorn SVENSSON wrote:
> >>
> >>
> >> On 2024-11-06 12:26, Richard Earnshaw (lists) wrote:
> >>> On 06/11/2024 07:44, Christophe Lyon wrote:
> On Wed,
Some code was checking TARGET_STREAMING and TARGET_SME2 separately,
but we now have a macro to test both at once.
gcc/
* config/aarch64/aarch64-sme.md: Use TARGET_STREAMING_SME2
instead of separate TARGET_STREAMING and TARGET_SME2 tests.
* config/aarch64/aarch64-sve2.md: Li
GCC previously used the older assembly syntax for SVE TBL, with no
braces around the second operand. This patch switches to the newer,
official syntax, with braces around the operand.
The initial SVE binutils submission supported both syntaxes, so there
should be no issues with backwards compatib
g:ede97598e2c recorded separate ISA requirements for streaming
and non-streaming mode. The premise there was that AARCH64_FL_SME
should not be included in the streaming mode requirements, since:
(a) an __arm_streaming_compatible function wouldn't be in streaming
mode if SME wasn't available.
On November 6, 2024 10:15:13 AM PST, Jakub Jelinek wrote:
>On Wed, Nov 06, 2024 at 10:03:25AM -0800, H. Peter Anvin wrote:
>> The issue is that we want the frame pointer chain to be maintained, even
>> across alternatives.
>
>If the current function doesn't have frame pointer set up yet (or is in
In the upcoming SVE2.1 svld1q and svst1q intrinsics, the relationship
between the base vector and the data vector differs from existing
gather/scatter intrinsics. This patch adds a new abstraction to
handle the difference.
gcc/
* config/aarch64/aarch64-sve-builtins.h
(function_sha
All extending gather load intrinsics encode the source type in
their name (e.g. svld1sb for an extending load from signed bytes).
The type of the extension result has to be specified using an
explicit type suffix; it isn't something that can be inferred
from the arguments, since there are multiple
Until now, all data arguments to a scatter store needed to have
32-bit or 64-bit elements. This isn't true for the upcoming SVE2.1
svst1q scatter intrinsic, so this patch adds an abstraction around the
restriction.
gcc/
* config/aarch64/aarch64-sve-builtins-shapes.cc
(store_scatte
This patch factors out some of ext_def into a base class,
so that it can be reused for the SVE2.1 svextq intrinsic.
gcc/
* config/aarch64/aarch64-sve-builtins-shapes.cc (ext_base): New base
class, extracted from...
(ext_def): ...here.
---
.../aarch64/aarch64-sve-builtins-s
gcc/
* config/aarch64/aarch64-sve-builtins-sve2.def: Sort entries
alphabetically.
* config/aarch64/aarch64-sve-builtins-sve2.h: Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.cc: Likewise.
---
.../aarch64/aarch64-sve-builtins-sve2.cc | 24 +++---
Past extensions to SVE have required new subsets of all_data; the
SVE2.1 patches will add another. This patch tries to make this more
scalable by defining the multi-size *_data macros to be unions of
single-size *_data macros.
gcc/
* config/aarch64/aarch64-sve-builtins.cc (TYPES_all_data)
On Mon, Nov 04, 2024 at 11:10:05PM -0500, Jason Merrill wrote:
> On 10/30/24 4:59 PM, Marek Polacek wrote:
> > On Wed, Oct 30, 2024 at 09:01:36AM -0400, Patrick Palka wrote:
> > > On Tue, 29 Oct 2024, Marek Polacek wrote:
> > --- a/gcc/cp/cp-tree.h
> > +++ b/gcc/cp/cp-tree.h
> > @@ -451,6 +451,7 @@
Thanks for doing this and sorry for the slow review.
Jennifer Schmitz writes:
> If an SVE intrinsic has predicate pfalse, we can fold the call to
> a simplified assignment statement: For _m, _x, and implicit predication,
> the LHS can be assigned the operand for inactive values and for _z, we can
Richard Sandiford writes:
> Some instructions that were previously restricted to streaming mode
> can also be used in non-streaming mode with SVE2.1. This patch adds
> support for those, as well as the usual new-extension boilerplate.
> A later patch will add the feature macro.
>
> gcc/
> *
On 11/6/24 2:23 PM, Simon Martin wrote:
Even though this PR is very close to PR117101, it's not addressed by the
fix I made through r15-4958-g5821f5c8c89a05 because cxx_placement_new_fn
has the very same issue as std_placement_new_fn_p used to have.
This patch fixes the issue exactly the same, b
writes:
> The AArch64 FEAT_LUT extension is optional from Armv9.2-a and mandatory
> from Armv9.5-a. This extension introduces instructions for lookup table
> read with 2-bit indices.
>
> This patch adds AdvSIMD LUT intrinsics for LUTI2, supporting table
> lookup with 2-bit packed indices. The foll
On Wed, 6 Nov 2024 at 18:39, Michal Jires wrote:
>
> On Wed, 2024-11-06 at 17:33:50 +, Jonathan Wakely wrote:
> >
> > If there's going to be a constructor then it should initialize the members.
> >
> > Otherwise, your original patch was better, because you could write
> > this to get an all-ze
On Wed, 6 Nov 2024, Jan Hubicka wrote:
> > > Thinking about this some more, I think we should just add -fno-malloc-dce
> > > option and do it even if ranges don't guarantee it won't be half of AS or
> > > more, that is really just a special case and not too different from
> > > doing 3 PTRDIFF_MAX
On Wed, Nov 6, 2024 at 4:29 PM Richard Biener
wrote:
>
> On Tue, Nov 5, 2024 at 10:50 PM H.J. Lu wrote:
> >
> > On Tue, Nov 5, 2024 at 5:27 PM Richard Biener
> > wrote:
> > >
> > > On Tue, Nov 5, 2024 at 10:09 AM Richard Biener
> > > wrote:
> > > >
> > > > On Tue, Nov 5, 2024 at 5:23 AM Jeff La
Ok for trunk and releases/gcc-14?
--
Using "dg-do run" with a selector breaks testing arm-none-eabi for any
architecture when check_effective_target_arm_neon_hw returns 0.
gcc/testsuite/ChangeLog:
* g++.dg/vect/pr84556.cc: Change from "dg-do run" with selector
to instead use dg-
Hello!
After some more thinking and considering all recent discussion
(thanks!), I am convinced that a slightly simplified original patch
(attached), now one-liner, is the way to go.
Let's look at the following test:
--cut here--
unsigned long foo (void)
{
return __builtin_ia32_readeflags_u64
On Tue, Nov 5, 2024 at 10:50 PM H.J. Lu wrote:
>
> On Tue, Nov 5, 2024 at 5:27 PM Richard Biener
> wrote:
> >
> > On Tue, Nov 5, 2024 at 10:09 AM Richard Biener
> > wrote:
> > >
> > > On Tue, Nov 5, 2024 at 5:23 AM Jeff Law wrote:
> > > >
> > > >
> > > >
> > > > On 11/4/24 8:13 PM, H.J. Lu wrot
On Wed, 6 Nov 2024, Jakub Jelinek wrote:
> Hi!
>
> encode_tree_to_bitpos uses the more expensive sub_byte_op_p mode in which
> it has to allocate a buffer and do various extra work like shifting the bits
> etc. if bitlen or bitpos aren't multiples of BITS_PER_UNIT, or if bitlen
> doesn't have cor
On Wed, 6 Nov 2024, Jakub Jelinek wrote:
> Hi!
>
> Store merging assumes a merged region won't be too large. The assumption is
> e.g. in using inappropriate types in various spots (e.g. int for bit sizes
> and bit positions in a few spots, or unsigned for the total size in bytes of
> the merged
> > Thinking about this some more, I think we should just add -fno-malloc-dce
> > option and do it even if ranges don't guarantee it won't be half of AS or
> > more, that is really just a special case and not too different from
> > doing 3 PTRDIFF_MAX - 10 allocations and expecting at least one of
On Wed, Nov 06, 2024 at 09:55:29AM +0100, Richard Biener wrote:
> Btw, did you check what happens when doing new/delete without nothrow()
> and either external or internal EH? I think optimizing is OK in all
> cases, but I guess EH edges will prevent the optimization?
I've checked the one with ex
This patch refactors the infrastructure for defining advsimd pragma
intrinsics, adding support for more flexible type and signature
handling in future SIMD extensions.
A new simd_type structure is introduced, which allows for consistent
mode and qualifier management across various advsimd operati
Hi!
Store merging assumes a merged region won't be too large. The assumption is
e.g. in using inappropriate types in various spots (e.g. int for bit sizes
and bit positions in a few spots, or unsigned for the total size in bytes of
the merged region), in doing XNEWVEC for the whole total size of
On Tue, Nov 5, 2024 at 6:16 PM Lewis Hyatt wrote:
>
> On Tue, Nov 05, 2024 at 10:56:30AM +0100, Jakub Jelinek wrote:
> > On Tue, Nov 05, 2024 at 10:42:10AM +0100, Richard Biener wrote:
> > > > Actually, I think cpp_token isn't that big deal, that should be short-lived
> > > > unless using
Hi,
I'm not a maintainer but I think we certainly want to have bootstrap-ubsan
documented and the patch looks good to me.
Cheers,
Filip
On Thu 2024-10-31 21:11:13, Sam James wrote:
> gcc/ChangeLog:
> PR other/116948
>
> * doc/install.texi (Building a native compiler): Mention
> boo
On 2024-11-06 12:26, Richard Earnshaw (lists) wrote:
On 06/11/2024 07:44, Christophe Lyon wrote:
On Wed, 6 Nov 2024 at 07:20, Torbjörn SVENSSON
wrote:
While the regression was reported on GCC15, I'm sure that same
regression will be seen on GCC14 when it's tested in the
arm-linux-gnueabihf
Committed with suggested changes.
-
This patch disables propagation of ipcp information into partitions
where all instances of the node are marked to be inlined.
Motivation:
Incremental LTO needs stable values between compilation
On Wed, 6 Nov 2024, Alexander Monakov wrote:
>
> On Wed, 6 Nov 2024, Richard Biener wrote:
>
> > Since we had malloc/free pair removal for quite some time I think
> > it should stay on by default.
>
> I missed that; now I see what you meant by "not making the existing
> situation worse".
>
> I
Never mind and thanks Richard for comments.
> Sorry for falling back in reviewing - it's not exactly clear the "cheap" form
> is cheaper. When I count the number of gimple statements (sub-expressions)
> the original appears as 3 while the result looks to have 5.
I may have a question about ho
During Incremental LTO, contents of LTO partitions diverge because of
external DIE references (DW_AT_abstract_origin).
External references are in the form 'die_symbol+offset'.
Originally there is only a single die_symbol for each compilation unit and
its offsets are in the 100'000s, which easily diverge.
D
Hi!
encode_tree_to_bitpos uses the more expensive sub_byte_op_p mode in which
it has to allocate a buffer and do various extra work like shifting the bits
etc. if bitlen or bitpos aren't multiples of BITS_PER_UNIT, or if bitlen
doesn't have corresponding integer mode.
The last case is explained la
On Tue, 5 Nov 2024, Jakub Jelinek wrote:
> Hi!
>
> The following patch on top of the PR41045 toplevel extended asm patch
> allows marking inline asms (both toplevel and function-local, admittedly
> it is less useful for the latter, so if you want, I can add restrictions)
> as defining symbols, ei
On Tue, 5 Nov 2024, Jakub Jelinek wrote:
> On Tue, Nov 05, 2024 at 04:47:20PM +0100, Jan Hubicka wrote:
> > > POSIX semantics for malloc involve errno.
> >
> > So if I can check errno to see if malloc failed, I guess even our
> > current behaviour of optimizing away paired malloc+free calls provi
On Wed, 6 Nov 2024, Jakub Jelinek wrote:
> Hi!
>
> As mentioned in the "inline asm: Add new constraint for symbol definitions"
> patch description, while the c operand modifier is documented to:
> Require a constant operand and print the constant expression with no
> punctuation.
> it actually d
libstdc++-v3/ChangeLog:
* include/c_compatibility/complex.h (_GLIBCXX_COMPLEX_H): Move
include guard to start of the header.
* include/c_global/ctgmath (_GLIBCXX_CTGMATH): Likewise.
---
Tested x86_64-linux. Pushed to trunk.
libstdc++-v3/include/c_compatibility/complex.h
Pushed
On Thu, 31 Oct 2024 at 20:06, Jonathan Wakely wrote:
>
> Several member functions of filesystem::directory_iterator and
> filesystem::recursive_directory_iterator currently dereference their
> shared_ptr data member without checking for non-null. Because they use
> operator-> and that func
These headers make no sense for C++ programs, because they either define
different content to the corresponding C header, or define
nothing at all in namespace std. They were all deprecated in C++17, so
add deprecation warnings to them, which can be disabled with
-Wno-deprecated. For C++20 and lat
There are two sets of patterns for FCLAMP: one set for single registers
and one set for multiple registers. The multiple-register set was
correctly gated on SME2, but the single-register set only required SME.
This doesn't matter for ACLE usage, since the intrinsic definitions
are correctly gated.
> x86 doesn't define mask_gather_loadmn, so I think you can drop this
> and all related, only keep the patch I give you in [1]
> Sorry I didn't make that clear last time.
Yes, that works, thanks. Will post a v4 soon.
--
Regards
Robin
On Wed, Nov 6, 2024 at 4:53 PM Jakub Jelinek wrote:
>
> On Wed, Nov 06, 2024 at 04:27:51PM +0100, Uros Bizjak wrote:
> > I see. While my solution would fit nicely with the above
> > ASM_CALL_CONSTRAINT approach, the approach using ASM_CALL_CONSTRAINT
> > is wrong by itself.
> >
> > Oh, well.
> >
>
On Tue, Oct 22, 2024 at 07:48:39PM +0200, Jakub Jelinek wrote:
> On Wed, Oct 16, 2024 at 05:44:05PM +0200, Jakub Jelinek wrote:
> > The following patch adds u{,l,ll,imax}abs builtins, which just fold
> > to ABSU_EXPR, similarly to how {,l,ll,imax}abs builtins fold to
> > ABS_EXPR.
> >
> > Tested o
On 05/11/2024 20:28, Torbjörn SVENSSON wrote:
> Changes since v1:
>
> - Changed from arm_neon to arm_arch_v7a for the required effective target.
>
> Ok for trunk and releases/gcc-14?
>
> --
>
> Force armv7-a as the tests require a neon compatible architecture.
>
> gcc/testsuite/ChangeLog:
>
>
On 06/11/2024 12:23, Torbjorn SVENSSON wrote:
>
>
> On 2024-11-06 12:26, Richard Earnshaw (lists) wrote:
>> On 06/11/2024 07:44, Christophe Lyon wrote:
>>> On Wed, 6 Nov 2024 at 07:20, Torbjörn SVENSSON
>>> wrote:
While the regression was reported on GCC15, I'm sure that same
regr
We currently make vect_check_gather_scatter happy by replacing SSA
name references in DR_REF for gather/scatter DRs but the replacement
process only works once since for the second epilogue we have SSA
names from the first epilogue in DR_REF but as we copied from the
original loop the SSA mapping d
On Wed, Nov 6, 2024 at 12:49 PM Tejas Belagod wrote:
>
> Ensure sizeless types don't end up trying to be canonicalised to
> BIT_FIELD_REFs.
You mean variable-sized? But don't we know, when there's a constant
array index, that the size is at least so this indexing is OK? So
what's wrong with a
We need to check that an epilogue doesn't require LOOP_VINFO_PEELING_FOR_GAPS
in case the main loop didn't (the other way around is OK); otherwise the
computation of whether the epilogue is executed or not gets out of sync.
Bootstrapped and tested on x86_64-unknown-linux-gnu.
* tree-vect-loop.
The following introduces LOOP_VINFO_MAIN_LOOP_INFO alongside
LOOP_VINFO_ORIG_LOOP_INFO so one can access both the main
vectorized loop info and the preceding vectorized epilogue.
This is critical for correctness as we need to disallow never
executed epilogues by costing in vect_analyze_loo
The following is a prototype allowing N possible vector epilogues.
In the end I'd like the target to tell us a set of (or no) vector modes
to consider for the epilogue of the main or the current epilog analyzed loop
in a way similar to how we communicate back suggested_unroll_factor.
The main m
On Wed, Nov 06, 2024 at 02:44:12PM +0300, Alexander Monakov wrote:
> I didn't see a discussion of a more gentle approach where instead of
> replacing the result of malloc with a non-zero constant, we would change
>
> tmp = malloc(sz);
>
> to
>
> tmp = (void *)(sz <= SIZE_MAX / 2);
>
> and l
Die symbols are used for external references.
Typically during LTO, early debug emits 'die_symbol+offset' for each
DIE that may be referenced later. Partitions in the LTRANS phase then
use these references.
Originally, die symbols are handled only in the root comp_unit and
in attributes.
This patch allow
ipa_strub_set_mode_for_new_functions uses node order as a unique,
ever-increasing identifier. The uid satisfies this better, since
order loses its uniqueness with the following patches.
gcc/ChangeLog:
* ipa-strub.cc (ipa_strub_set_mode_for_new_functions): Replace
order with uid.
(pass
On Thu, Oct 31, 2024 at 7:29 AM wrote:
>
> From: Pan Li
>
> There are several forms of the unsigned SAT_ADD. Some of them are
> complicated while others are cheap. This patch would like to simplify
> the complicated form into the cheap ones. For example as below:
>
> From the form 4 (branch)
On 2024-11-06 14:04, Richard Earnshaw (lists) wrote:
On 06/11/2024 12:23, Torbjorn SVENSSON wrote:
On 2024-11-06 12:26, Richard Earnshaw (lists) wrote:
On 06/11/2024 07:44, Christophe Lyon wrote:
On Wed, 6 Nov 2024 at 07:20, Torbjörn SVENSSON
wrote:
While the regression was reported on
On reflection, I'm not so sure about these anymore:
On Mon, Nov 04, 2024 at 06:26:47PM -0500, Marek Polacek wrote:
> + switch (extern int i = 0); /* { dg-error "in condition|both .extern. and initializer" } */
I think this is definitely valid.
> + switch (register int i = 0); /* { dg-error
The following pulls the trigger, defaulting --param vect-force-slp to 1.
I know of no missing features, but there may be minor testsuite and
optimization quality fallout.
Bootstrapped and tested on x86_64-unknown-linux-gnu. I'll amend
PR116578 with the list of FAILs this causes (my baseline is outda
On Tue, Jul 30, 2024 at 09:40:42AM -0700, Andi Kleen wrote:
> From: Andi Kleen
>
> ... that uses -march=native -mtune=native to build a compiler optimized
> for the host.
>
> config/ChangeLog:
>
> * bootstrap-native.mk: New file.
>
> gcc/ChangeLog:
>
> * doc/install.texi: Document
On Mon, Nov 4, 2024 at 2:01 PM Akram Ahmad wrote:
>
> On 31/10/2024 08:00, Richard Biener wrote:
> > On Wed, Oct 30, 2024 at 4:46 PM Akram Ahmad wrote:
> >> On 29/10/2024 12:48, Richard Biener wrote:
> >>> The testcases will FAIL unless the target has support for .SAT_ADD - you
> >>> want to
> >
On Wed, Nov 06, 2024 at 03:27:21PM +, Andrew Stubbs wrote:
> Delay omp_max_vf call until after the host and device compilers have diverged
> so that the max_vf value can be tuned exactly right on both variants.
>
> This change means that the ompdevlow pass must be enabled for functions that
>
This patch series is a rework of the patch originally posted a couple of
years ago:
https://patchwork.sourceware.org/project/gcc/patch/0e1a740e-46d5-ebfa-36f4-9a069ddf8...@codesourcery.com/
The review comments from that time have been addressed, as have the
comments from yesterday's review in the
On Wed, Nov 06, 2024 at 03:27:22PM +, Andrew Stubbs wrote:
> Ensure that the GOMP_MAX_VF does the right thing for explicit schedules, when
> offloading is enabled ("target" directives are present), and is inactive
> otherwise.
>
> This requires enabling the offload-dump scanning features previ
On November 6, 2024 7:27:51 AM PST, Uros Bizjak wrote:
>On Wed, Nov 6, 2024 at 11:57 AM Jakub Jelinek wrote:
>>
>> On Wed, Nov 06, 2024 at 10:44:54AM +0100, Uros Bizjak wrote:
>> > After some more thinking and considering all recent discussion
>> > (thanks!), I am convinced that a slightly simpli
This attempts to simplify and clean up our std::hash code. The primary
benefit is improved diagnostics for users when they do something wrong
involving std::hash or unordered containers. An additional benefit is
that for the unstable ABI (--enable-symvers=gnu-versioned-namespace) we
can reduce the
On Wed, Nov 06, 2024 at 04:27:51PM +0100, Uros Bizjak wrote:
> I see. While my solution would fit nicely with the above
> ASM_CALL_CONSTRAINT approach, the approach using ASM_CALL_CONSTRAINT
> is wrong by itself.
>
> Oh, well.
>
> Anyway, I guess "redzone" clobber you proposed does not remove the
On Wed, Nov 06, 2024 at 07:45:54AM -0800, H. Peter Anvin wrote:
> I suggested __builtin_frame_address(0) as an input constraint (which
> already works in gcc and clang) and the "red-zone" clobber (new) for this
> exact reason (Andrew Pinski, however, summarily closed those BRs.)
I posted a patch f
On Wed, Nov 06, 2024 at 09:42:02AM -0500, Marek Polacek wrote:
> On reflection, I'm not so sure about these anymore:
>
> On Mon, Nov 04, 2024 at 06:26:47PM -0500, Marek Polacek wrote:
> > + switch (extern int i = 0); /* { dg-error "in condition|both .extern. and initializer" } */
>
> I thi
This patch adds a test case to cover C/C++ operators on SVE ACLE types. This
does not cover all types, but covers most representative types.
gcc/testsuite:
* gcc.target/aarch64/sve/acle/general/cops.c: New test.
---
.../aarch64/sve/acle/general/cops.c | 570 ++
This patch adds a check for non-GNU vectors to warn that the index is outside
the range of a fixed vector size. For VLA vectors, we don't diagnose.
gcc/ChangeLog:
* c-family/c-common.cc (convert_vector_to_array_for_subscript): Add
range-check for target vector types.
---
gcc/c-f
Hi, Jeff,
Thank you for the review!
All items have been addressed; please find the comments and PATCH v2 in the message below:
On Mon, Nov 04, 2024 at 04:48:31PM -0700, Jeff Law wrote:
> > + /* Trying to optimize:
> > + (zero_extend:M (subreg:N (not:M (X:M ->
> > + (xor:M (zero_extend:M (s
While adding support for SVE2.1 and SME2.1, I found several
embarrassing mistakes in my earlier SME and SME2 patches. :(
This series tries to fix them.
Tested on aarch64-linux-gnu. I'm planning to commit to trunk on
Thursday evening UTC if there are no comments before then, but please
let me know
On Wed, 6 Nov 2024, Richard Biener wrote:
> Since we had malloc/free pair removal for quite some time I think
> it should stay on by default.
I missed that; now I see what you meant by "not making the existing
situation worse".
I still miss what happened to "correctness trumps performance" :)
Pushed
On Thu, 31 Oct 2024 at 20:08, Jonathan Wakely wrote:
>
> Currently dereferencing an empty shared_ptr prints a complicated
> internal type in the assertion message:
>
> include/bits/shared_ptr_base.h:1377: std::__shared_ptr_access<_Tp, _Lp,
> , >::element_type& std::__shared_ptr_access<_T
The chunk size for SIMD loops should be right for the current device; too big
allocates too much memory, too small is inefficient. Getting it wrong doesn't
actually break anything though.
This patch attempts to choose the optimal setting based on the context. Both
host-fallback and device will g
Delay omp_max_vf call until after the host and device compilers have diverged
so that the max_vf value can be tuned exactly right on both variants.
This change means that the ompdevlow pass must be enabled for functions that
use OpenMP directives with both "simd" and "schedule" enabled.
gcc/Chang
Ensure that the GOMP_MAX_VF does the right thing for explicit schedules, when
offloading is enabled ("target" directives are present), and is inactive
otherwise.
This requires enabling the offload-dump scanning features previously only used
in the libgomp testsuite. The automake scheme used there
This patch adds remapping of node order for each LTO partition.
The resulting order preserves the relative order inside the partition, but
is independent of outside symbols. So if an LTO partition contains
an identical set of symbols, their remapped order will be stable
between compilations.
gcc/ChangeLog:
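The order remapping described in this message can be sketched roughly as follows. This is a simplified illustration, not the patch's implementation: `struct node`, `cmp_order` and `remap_partition_order` are hypothetical names, and the real symbol table nodes carry far more state.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a symbol table node.  */
struct node { int order; };

/* Compare two array elements (pointers to nodes) by their order.  */
static int cmp_order (const void *pa, const void *pb)
{
  const struct node *a = *(const struct node *const *) pa;
  const struct node *b = *(const struct node *const *) pb;
  return (a->order > b->order) - (a->order < b->order);
}

/* Renumber the nodes of one partition to 0..n-1.  The relative order
   inside the partition is kept, but the resulting values no longer
   depend on symbols outside the partition.  Sorts NODES in place.  */
static void remap_partition_order (struct node **nodes, size_t n)
{
  qsort (nodes, n, sizeof *nodes, cmp_order);
  for (size_t i = 0; i < n; i++)
    nodes[i]->order = (int) i;
}
```

With this scheme, a partition whose three symbols had global orders 10, 42, 57 in one compilation and 13, 99, 120 in another ends up with orders 0, 1, 2 both times.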
These patches allow adding additional die symbols, so that
external references represented as 'die_symbol+offset' don't cause
the contents of LTO partitions to diverge.
Bootstrapped/regtested on x86_64-linux
On Wed, Nov 6, 2024 at 11:57 AM Jakub Jelinek wrote:
>
> On Wed, Nov 06, 2024 at 10:44:54AM +0100, Uros Bizjak wrote:
> > After some more thinking and considering all recent discussion
> > (thanks!), I am convinced that a slightly simplified original patch
> > (attached), now one-liner, is the way
On Fri, Nov 01, 2024 at 02:01:18PM -0400, John David Anglin wrote:
> This breaks build on hppa64-hp-hpux11.11. This target has clock_gettime
> but it doesn't have CLOCK_MONOTONIC. It has CLOCK_REALTIME. I modified
> timevar.cc as follows to restore build.
Alternative would be to check for CLOCK
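One way to cope with a target that has clock_gettime but lacks CLOCK_MONOTONIC is a preprocessor fallback to CLOCK_REALTIME. The sketch below is only an assumption about how that could look: `get_time_ns` is a hypothetical helper, it is not the timevar.cc change from this thread, and it relies on CLOCK_MONOTONIC being a preprocessor macro where it is available.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical helper for illustration; not the code from timevar.cc.
   Stores the current time in nanoseconds in *OUT and returns 0, or
   returns -1 if the clock cannot be read.  */
static int get_time_ns (long long *out)
{
#ifdef CLOCK_MONOTONIC
  /* Preferred: unaffected by wall-clock adjustments.  */
  const clockid_t clk = CLOCK_MONOTONIC;
#else
  /* Fallback for targets without CLOCK_MONOTONIC.  */
  const clockid_t clk = CLOCK_REALTIME;
#endif
  struct timespec ts;

  if (clock_gettime (clk, &ts) != 0)
    return -1;
  *out = (long long) ts.tv_sec * 1000000000LL + ts.tv_nsec;
  return 0;
}
```

The trade-off is that CLOCK_REALTIME can jump when the system time is adjusted, so intervals measured with the fallback are less reliable.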
On Wed, Nov 06, 2024 at 09:08:10AM +0100, Richard Biener wrote:
> It would probably be cleanest to have a separate print modifier for
> "symbol for assembler label definition" or so, but given this feature
See the patch I'll post next.
> targets existing uses those already know how to emit the de
On Wed, 6 Nov 2024 at 12:58, Georg-Johann Lay wrote:
>
> For operations like X o= CST, regalloc may spill l-reg X to a d-reg:
> D = X
> D o= CST
> X = D
> where it is better to instead
> D = CST
> X o= D
> This patch adds an according RTL peephole.
>
> Ok for trunk?
Please apply
When optimizing for NOPs in case of overlapping regs in VEC_SELECT expressions,
validate subreg data before using simplify_subreg_regno. There is no real
SUBREG rtx here, but a pseudo subreg call to check if subregs are possible.
gcc/ChangeLog:
* rtlanal.cc (set_noop_p): Validate subreg