[PATCH v2 0/2] VN predicate improvements

2024-11-06 Thread Andrew Pinski
This is v2 of the predicate improvements. It contains only the changed patches rather than all of them. The main change is to use vn_valueize. There was also another change dealing with canonicalization of comparisons so that constants are always on the rhs; that is why I am resending them even thoug

[PATCH v2 2/2] VN: Handle `(A CMP B) !=/== 0` for predicates [PR117414]

2024-11-06 Thread Andrew Pinski
After the last patch, we also want to record `(A CMP B) != 0` as `(A CMP B)` and `(A CMP B) == 0` as `(A CMP B)` with the true/false edges swapped. This shows up more due to the new handling of `(A | B) ==/!= 0` in insert_predicates_for_cond as now we can notice these comparisons which were not se
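
A minimal sketch of the equivalence described above, assuming a plain integer comparison; the function name and return values are hypothetical and not taken from the patch or its testcases. Testing the stored comparison result against zero selects the same edges as the comparison itself, so the nested tests could in principle be folded:

```c
/* Hypothetical example, not taken from the patch's testsuite.  */
int
cmp_ne_zero (int a, int b)
{
  int t = a < b;
  if (t != 0)
    /* This edge is the true edge of `a < b`, so the nested
       comparison is known to be true here.  */
    return (a < b) ? 10 : 11;
  /* This edge is the false edge of `a < b`, so the nested
     comparison is known to be false here.  */
  return (a < b) ? 20 : 21;
}
```

Writing `t == 0` instead would describe the same two edges with true and false swapped.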

[PATCH v2 1/2] VN: Handle `(a | b) !=/== 0` for predicates [PR117414]

2024-11-06 Thread Andrew Pinski
For `(a | b) == 0`, we can "assert" on the true edge that both `a == 0` and `b == 0` but nothing on the false edge. For `(a | b) != 0`, we can "assert" on the false edge that both `a == 0` and `b == 0` but nothing on the true edge. This adds that predicate and allows us to optimize f0, f1, and f2 i
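
As a rough illustration of the predicate described above (a sketch only; the function name is hypothetical and this is not one of the f0, f1, or f2 testcases mentioned in the message), the inner test below is implied on the true edge of `(a | b) == 0`:

```c
/* Hypothetical example, not taken from the patch's testsuite.  */
int
or_is_zero (int a, int b)
{
  if ((a | b) == 0)
    {
      /* True edge: both a == 0 and b == 0 are known here, so this
         inner condition is always true and could be folded away.  */
      if (a == 0 && b == 0)
        return 1;
      return 2;
    }
  /* False edge: nothing is known about a or b individually.  */
  return 0;
}
```

The `!= 0` case is the mirror image: the same two facts hold on the false edge and nothing is recorded on the true edge.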

Re: [PATCH 7/8] ipa: Verify that const jump functions have corresponding value range

2024-11-06 Thread Aldy Hernandez
Aldy Hernandez writes: > Martin Jambor writes: > >> Hi, >> >> Because the simplified way of extracting value ranges from functions >> does not look at scalar constants (as one of the versions had been >> doing before) but instead rely on the value range within the jump >> function already captur

Re: [PATCH 4/8] ipa: Better value ranges for zero pointer constants

2024-11-06 Thread Aldy Hernandez
Jan Hubicka writes: >> > 2024-11-01 Martin Jambor >> > >> > * ipa-prop.cc (ipa_compute_jump_functions_for_edge): When creating >> > value-range jump functions from pointer type constant zero, do so >> > as if it was not a pointer. >> > --- >> > gcc/ipa-prop.cc | 3 ++-

[PATCH] inline-asm: Add support for cc operand modifier

2024-11-06 Thread Jakub Jelinek
Hi! As mentioned in the "inline asm: Add new constraint for symbol definitions" patch description, while the c operand modifier is documented to: Require a constant operand and print the constant expression with no punctuation. it actually doesn't do that with -fpic at least on some targets and h

[PATCH 09/10] c: Fix bounds checking for VLA and construct VLA vector constants

2024-11-06 Thread Tejas Belagod
This patch adds support for checking bounds of SVE ACLE vector initialization constructors. It also adds support to construct vector constant from init constructors. gcc/ChangeLog: * c/c-typeck.cc (process_init_element): Add check to restrict constructor length to the minimum vec

[PATCH 01/10] aarch64: Fix ACLE macro __ARM_FEATURE_SVE_VECTOR_OPERATORS

2024-11-06 Thread Tejas Belagod
This patch enables ACLE macro __ARM_FEATURE_SVE_VECTOR_OPERATORS to indicate that C/C++ language operations are available natively on SVE ACLE types. gcc/ChangeLog: * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define __ARM_FEATURE_SVE_VECTOR_OPERATORS. --- gcc/con

[PATCH 10/10] cp: Fix another assumption in the FE about constant vector indices.

2024-11-06 Thread Tejas Belagod
This patch adds a change to handle VLA's poly indices. gcc/ChangeLog: * cp/decl.cc (reshape_init_array_1): Handle poly indices. gcc/testsuite/ChangeLog: * g++.dg/ext/sve-sizeless-1.C: Update test to test initialize error. * g++.dg/ext/sve-sizeless-2.C: Likewise. --- gcc

[PATCH 10/15] aarch64: Add svboolx4_t

2024-11-06 Thread Richard Sandiford
This patch adds an svboolx4_t type, to go alongside the existing svboolx2_t type. It doesn't require any special ISA support beyond SVE itself and it currently has no associated instructions. gcc/ * config/aarch64/aarch64-modes.def (VNx64BI): New mode. * config/aarch64/aarch64-pro

[PATCH 11/15] aarch64: Define arm_neon.h types in arm_sve.h too

2024-11-06 Thread Richard Sandiford
This patch moves the scalar and single-vector Advanced SIMD types from arm_neon.h into a private header, so that they can be defined by arm_sve.h as well. This is needed for the upcoming SVE2.1 hybrid-VLA reductions, which return 128-bit Advanced SIMD vectors. The approach follows Claudio's patch

Re: [PATCH 1/4] openmp: Tune omp_max_vf for offload targets

2024-11-06 Thread Andrew Stubbs
On 06/11/2024 17:59, Jakub Jelinek wrote: On Wed, Nov 06, 2024 at 05:53:53PM +, Andrew Stubbs wrote: I'm not sure why I didn't see this. Was it bootstrap tested or just built without bootstrap + tested? Otherwise it is just a warning. Apparently I forgot to rerun the bootstrap after maki

[PATCH 12/15] aarch64: Add common subset of SVE2p1 and SME

2024-11-06 Thread Richard Sandiford
Some instructions that were previously restricted to streaming mode can also be used in non-streaming mode with SVE2.1. This patch adds support for those, as well as the usual new-extension boilerplate. A later patch will add the feature macro. gcc/ * config/aarch64/aarch64-option-extensi

[PATCH 13/15] aarch64: Add common subset of SVE2p1 and SME2

2024-11-06 Thread Richard Sandiford
This patch handles the SVE2p1 instructions that are shared with SME2. This includes the consecutive-register forms of the 2-register and 4-register loads and stores, but not the strided-register forms. gcc/ * config/aarch64/aarch64.h (TARGET_SVE2p1_OR_SME2): New macro. * config/aa

Re: [PATCH] testsuite: arm: Use effective-target arm_fp for pr68620.c test

2024-11-06 Thread Christophe Lyon
On Wed, 6 Nov 2024 at 14:52, Torbjorn SVENSSON wrote: > > > > On 2024-11-06 14:04, Richard Earnshaw (lists) wrote: > > On 06/11/2024 12:23, Torbjorn SVENSSON wrote: > >> > >> > >> On 2024-11-06 12:26, Richard Earnshaw (lists) wrote: > >>> On 06/11/2024 07:44, Christophe Lyon wrote: > On Wed,

[PATCH 01/15] aarch64: Make more use of TARGET_STREAMING_SME2

2024-11-06 Thread Richard Sandiford
Some code was checking TARGET_STREAMING and TARGET_SME2 separately, but we now have a macro to test both at once. gcc/ * config/aarch64/aarch64-sme.md: Use TARGET_STREAMING_SME2 instead of separate TARGET_STREAMING and TARGET_SME2 tests. * config/aarch64/aarch64-sve2.md: Li

[PATCH 04/15] aarch64: Use braces in SVE TBL instructions

2024-11-06 Thread Richard Sandiford
GCC previously used the older assembly syntax for SVE TBL, with no braces around the second operand. This patch switches to the newer, official syntax, with braces around the operand. The initial SVE binutils submission supported both syntaxes, so there should be no issues with backwards compatib

[PATCH 02/15] aarch64: Test TARGET_STREAMING instead of TARGET_STREAMING_SME

2024-11-06 Thread Richard Sandiford
g:ede97598e2c recorded separate ISA requirements for streaming and non-streaming mode. The premise there was that AARCH64_FL_SME should not be included in the streaming mode requirements, since: (a) an __arm_streaming_compatible function wouldn't be in streaming mode if SME wasn't available.

Re: [PATCH] middle-end: Use rtx_equal_p in notice_stack_pointer_modification_1 [PR117359]

2024-11-06 Thread H. Peter Anvin
On November 6, 2024 10:15:13 AM PST, Jakub Jelinek wrote: >On Wed, Nov 06, 2024 at 10:03:25AM -0800, H. Peter Anvin wrote: >> The issue is that we want the frame pointer chain to be maintained, even >> across alternatives. > >If the current function doesn't have frame pointer set up yet (or is in

[PATCH 05/15] aarch64: Add an abstraction for vector base addresses

2024-11-06 Thread Richard Sandiford
In the upcoming SVE2.1 svld1q and svst1q intrinsics, the relationship between the base vector and the data vector differs from existing gather/scatter intrinsics. This patch adds a new abstraction to handle the difference. gcc/ * config/aarch64/aarch64-sve-builtins.h (function_sha

[PATCH 07/15] aarch64: Parameterise SVE pointer type inference

2024-11-06 Thread Richard Sandiford
All extending gather load intrinsics encode the source type in their name (e.g. svld1sb for an extending load from signed bytes). The type of the extension result has to be specified using an explicit type suffix; it isn't something that can be inferred from the arguments, since there are multiple

[PATCH 06/15] aarch64: Add an abstraction for scatter store type inference

2024-11-06 Thread Richard Sandiford
Until now, all data arguments to a scatter store needed to have 32-bit or 64-bit elements. This isn't true for the upcoming SVE2.1 svst1q scatter intrinsic, so this patch adds an abstraction around the restriction. gcc/ * config/aarch64/aarch64-sve-builtins-shapes.cc (store_scatte

[PATCH 08/15] aarch64: Factor out part of the SVE ext_def class

2024-11-06 Thread Richard Sandiford
This patch factors out some of ext_def into a base class, so that it can be reused for the SVE2.1 svextq intrinsic. gcc/ * config/aarch64/aarch64-sve-builtins-shapes.cc (ext_base): New base class, extracted from... (ext_def): ...here. --- .../aarch64/aarch64-sve-builtins-s

[PATCH 09/15] aarch64: Sort some SVE2 lists alphabetically

2024-11-06 Thread Richard Sandiford
gcc/ * config/aarch64/aarch64-sve-builtins-sve2.def: Sort entries alphabetically. * config/aarch64/aarch64-sve-builtins-sve2.h: Likewise. * config/aarch64/aarch64-sve-builtins-sve2.cc: Likewise. --- .../aarch64/aarch64-sve-builtins-sve2.cc | 24 +++---

[PATCH 03/15] aarch64: Tweak definition of all_data & co

2024-11-06 Thread Richard Sandiford
Past extensions to SVE have required new subsets of all_data; the SVE2.1 patches will add another. This patch tries to make this more scalable by defining the multi-size *_data macros to be unions of single-size *_data macros. gcc/ * config/aarch64/aarch64-sve-builtins.cc (TYPES_all_data)

[PATCH v4] c++: Implement P2662R3, Pack Indexing [PR113798]

2024-11-06 Thread Marek Polacek
On Mon, Nov 04, 2024 at 11:10:05PM -0500, Jason Merrill wrote: > On 10/30/24 4:59 PM, Marek Polacek wrote: > > On Wed, Oct 30, 2024 at 09:01:36AM -0400, Patrick Palka wrote: > > > On Tue, 29 Oct 2024, Marek Polacek wrote: > > --- a/gcc/cp/cp-tree.h > > +++ b/gcc/cp/cp-tree.h > > @@ -451,6 +451,7 @@

Re: [PATCH] [PR106329] SVE intrinsics: Fold calls with pfalse predicate.

2024-11-06 Thread Richard Sandiford
Thanks for doing this and sorry for the slow review. Jennifer Schmitz writes: > If an SVE intrinsic has predicate pfalse, we can fold the call to > a simplified assignment statement: For _m, _x, and implicit predication, > the LHS can be assigned the operand for inactive values and for _z, we can

Re: [PATCH 12/15] aarch64: Add common subset of SVE2p1 and SME

2024-11-06 Thread Richard Sandiford
Richard Sandiford writes: > Some instructions that were previously restricted to streaming mode > can also be used in non-streaming mode with SVE2.1. This patch adds > support for those, as well as the usual new-extension boilerplate. > A later patch will add the feature macro. > > gcc/ > *

Re: [PATCH] c++: Fix another crash with invalid new operators [PR117463]

2024-11-06 Thread Jason Merrill
On 11/6/24 2:23 PM, Simon Martin wrote: Even though this PR is very close to PR117101, it's not addressed by the fix I made through r15-4958-g5821f5c8c89a05 because cxx_placement_new_fn has the very same issue as std_placement_new_fn_p used to have. This patch fixes the issue exactly the same, b

Re: [PATCH 2/2] aarch64: Add AdvSIMD LUT extension and vluti2{q}_lane{q} intrinsics

2024-11-06 Thread Richard Sandiford
writes: > The AArch64 FEAT_LUT extension is optional from Armv9.2-a and mandatory > from Armv9.5-a. This extension introduces instructions for lookup table > read with 2-bit indices. > > This patch adds AdvSIMD LUT intrinsics for LUTI2, supporting table > lookup with 2-bit packed indices. The foll

Re: [COMMITED] [lto] ipcp don't propagate where not needed

2024-11-06 Thread Jonathan Wakely
On Wed, 6 Nov 2024 at 18:39, Michal Jires wrote: > > On Wed, 2024-11-06 at 17:33:50 +, Jonathan Wakely wrote: > > > > If there's going to be a constructor then it should initialize the members. > > > > Otherwise, your original patch was better, because you could write > > this to get an all-ze

Re: Implement removal of malloc/free pairs with NULL check

2024-11-06 Thread Richard Biener
On Wed, 6 Nov 2024, Jan Hubicka wrote: > > > Thinking about this some more, I think we should just add -fno-malloc-dce > > > option and do it even if ranges don't guarantee it won't be half of AS or > > > more, that is really just a special case and not too different from > > > doing 3 PTRDIFF_MAX

Re: [PATCH] Optimize incoming integer argument promotion

2024-11-06 Thread H.J. Lu
On Wed, Nov 6, 2024 at 4:29 PM Richard Biener wrote: > > On Tue, Nov 5, 2024 at 10:50 PM H.J. Lu wrote: > > > > On Tue, Nov 5, 2024 at 5:27 PM Richard Biener > > wrote: > > > > > > On Tue, Nov 5, 2024 at 10:09 AM Richard Biener > > > wrote: > > > > > > > > On Tue, Nov 5, 2024 at 5:23 AM Jeff La

[PATCH] testsuite: arm: Use effective-target for pr84556.cc test

2024-11-06 Thread Torbjörn SVENSSON
Ok for trunk and releases/gcc-14? -- Using "dg-do run" with a selector breaks testing arm-none-eabi for any architecture when check_effective_target_arm_neon_hw returns 0. gcc/testsuite/ChangeLog: * g++.dg/vect/pr84556.cc: Change from "dg-do run" with selector to instead use dg-

Re: [PATCH] middle-end: Use rtx_equal_p in notice_stack_pointer_modification_1 [PR117359]

2024-11-06 Thread Uros Bizjak
Hello! After some more thinking and considering all recent discussion (thanks!), I am convinced that a slightly simplified original patch (attached), now one-liner, is the way to go. Let's look at the following test: --cut here-- unsigned long foo (void) { return __builtin_ia32_readeflags_u64

Re: [PATCH] Optimize incoming integer argument promotion

2024-11-06 Thread Richard Biener
On Tue, Nov 5, 2024 at 10:50 PM H.J. Lu wrote: > > On Tue, Nov 5, 2024 at 5:27 PM Richard Biener > wrote: > > > > On Tue, Nov 5, 2024 at 10:09 AM Richard Biener > > wrote: > > > > > > On Tue, Nov 5, 2024 at 5:23 AM Jeff Law wrote: > > > > > > > > > > > > > > > > On 11/4/24 8:13 PM, H.J. Lu wrot

Re: [PATCH] store-merging: Don't use sub_byte_op_p mode for empty_ctor_p unless necessary [PR117439]

2024-11-06 Thread Richard Biener
On Wed, 6 Nov 2024, Jakub Jelinek wrote: > Hi! > > encode_tree_to_bitpos uses the more expensive sub_byte_op_p mode in which > it has to allocate a buffer and do various extra work like shifting the bits > etc. if bitlen or bitpos aren't multiples of BITS_PER_UNIT, or if bitlen > doesn't have cor

Re: [PATCH] store-merging: Apply --param=store-merging-max-size= in more spots [PR117439]

2024-11-06 Thread Richard Biener
On Wed, 6 Nov 2024, Jakub Jelinek wrote: > Hi! > > Store merging assumes a merged region won't be too large. The assumption is > e.g. in using inappropriate types in various spots (e.g. int for bit sizes > and bit positions in a few spots, or unsigned for the total size in bytes of > the merged

Re: Implement removal of malloc/free pairs with NULL check

2024-11-06 Thread Jan Hubicka
> > Thinking about this some more, I think we should just add -fno-malloc-dce > > option and do it even if ranges don't guarantee it won't be half of AS or > > more, that is really just a special case and not too different from > > doing 3 PTRDIFF_MAX - 10 allocations and expecting at least one of

Re: Implement removal of malloc/free pairs with NULL check

2024-11-06 Thread Jakub Jelinek
On Wed, Nov 06, 2024 at 09:55:29AM +0100, Richard Biener wrote: > Btw, did you check what happens when doing new/delete without nothrow() > and either external or internal EH? I think optimizing is OK in all > cases, but I guess EH edges will prevent the optimization? I've checked the one with ex

[PATCH 1/2] aarch64: Refactor infrastructure for advsimd intrinsics

2024-11-06 Thread vladimir.miloserdov
This patch refactors the infrastructure for defining advsimd pragma intrinsics, adding support for more flexible type and signature handling in future SIMD extensions. A new simd_type structure is introduced, which allows for consistent mode and qualifier management across various advsimd operati

[PATCH] store-merging: Apply --param=store-merging-max-size= in more spots [PR117439]

2024-11-06 Thread Jakub Jelinek
Hi! Store merging assumes a merged region won't be too large. The assumption is e.g. in using inappropriate types in various spots (e.g. int for bit sizes and bit positions in a few spots, or unsigned for the total size in bytes of the merged region), in doing XNEWVEC for the whole total size of

Re: [PATCH 00/15] Support for 64-bit location_t

2024-11-06 Thread Richard Biener
On Tue, Nov 5, 2024 at 6:16 PM Lewis Hyatt wrote: > > On Tue, Nov 05, 2024 at 10:56:30AM +0100, Jakub Jelinek wrote: > > On Tue, Nov 05, 2024 at 10:42:10AM +0100, Richard Biener wrote: > > > > Actually, I think cpp_token isn't that big deal, that should be > > > > short-lived > > > > unless using

Re: [PATCH 1/2] doc: install: document bootstrap-ubsan

2024-11-06 Thread Filip Kastl
Hi, I'm not a maintainer but I think we certainly want to have bootstrap-ubsan documented and the patch looks good to me. Cheers, Filip On Thu 2024-10-31 21:11:13, Sam James wrote: > gcc/ChangeLog: > PR other/116948 > > * doc/install.texi (Building a native compiler): Mention > boo

Re: [PATCH] testsuite: arm: Use effective-target arm_fp for pr68620.c test

2024-11-06 Thread Torbjorn SVENSSON
On 2024-11-06 12:26, Richard Earnshaw (lists) wrote: On 06/11/2024 07:44, Christophe Lyon wrote: On Wed, 6 Nov 2024 at 07:20, Torbjörn SVENSSON wrote: While the regression was reported on GCC15, I'm sure that same regression will be seen on GCC14 when it's tested in the arm-linux-gnueabihf

[COMMITED] [lto] ipcp don't propagate where not needed

2024-11-06 Thread Michal Jires
Committed with suggested changes. - This patch disables propagation of ipcp information into partitions where all instances of the node are marked to be inlined. Motivation: Incremental LTO needs stable values between compilation

Re: Implement removal of malloc/free pairs with NULL check

2024-11-06 Thread Richard Biener
On Wed, 6 Nov 2024, Alexander Monakov wrote: > > On Wed, 6 Nov 2024, Richard Biener wrote: > > > Since we had malloc/free pair removal for quite some time I think > > it should stay on by default. > > I missed that; now I see what you meant by "not making the existing > situation worse". > > I

RE: [PATCH v2 01/10] Match: Simplify branch form 4 of unsigned SAT_ADD into branchless

2024-11-06 Thread Li, Pan2
Never mind and thanks Richard for comments. > Sorry for falling back in reviewing - it's not exactly clear the "cheap" form > is > cheaper. When I count the number of gimple statements (sub-expressions) > the original appears as 3 while the result looks to have 5. I may have a question about ho

[PATCH 3/3] dwarf: lto: Stabilize external die references.

2024-11-06 Thread Michal Jires
During Incremental LTO, contents of LTO partitions diverge because of external DIE references (DW_AT_abstract_origin). External references are in the form 'die_symbol+offset'. Originally there is only a single die_symbol for each compilation unit and its offsets are in the 100'000s, which easily diverge. D

[PATCH] store-merging: Don't use sub_byte_op_p mode for empty_ctor_p unless necessary [PR117439]

2024-11-06 Thread Jakub Jelinek
Hi! encode_tree_to_bitpos uses the more expensive sub_byte_op_p mode in which it has to allocate a buffer and do various extra work like shifting the bits etc. if bitlen or bitpos aren't multiples of BITS_PER_UNIT, or if bitlen doesn't have corresponding integer mode. The last case is explained la

Re: [RFC PATCH] inline asm: Add new constraint for symbol definitions

2024-11-06 Thread Richard Biener
On Tue, 5 Nov 2024, Jakub Jelinek wrote: > Hi! > > The following patch on top of the PR41045 toplevel extended asm patch > allows marking inline asms (both toplevel and function-local, admittedly > it is less useful for the latter, so if you want, I can add restrictions) > as defining symbols, ei

Re: Implement removal of malloc/free pairs with NULL check

2024-11-06 Thread Richard Biener
On Tue, 5 Nov 2024, Jakub Jelinek wrote: > On Tue, Nov 05, 2024 at 04:47:20PM +0100, Jan Hubicka wrote: > > > POSIX semantics for malloc involve errno. > > > > So if I can check errno to see if malloc failed, I guess even our > > current behaviour of optimizing away paired malloc+free calls provi

Re: [PATCH] inline-asm: Add support for cc operand modifier

2024-11-06 Thread Richard Biener
On Wed, 6 Nov 2024, Jakub Jelinek wrote: > Hi! > > As mentioned in the "inline asm: Add new constraint for symbol definitions" > patch description, while the c operand modifier is documented to: > Require a constant operand and print the constant expression with no > punctuation. > it actually d

[committed] libstdc++: Move include guards to start of headers

2024-11-06 Thread Jonathan Wakely
libstdc++-v3/ChangeLog: * include/c_compatibility/complex.h (_GLIBCXX_COMPLEX_H): Move include guard to start of the header. * include/c_global/ctgmath (_GLIBCXX_CTGMATH): Likewise. --- Tested x86_64-linux. Pushed to trunk. libstdc++-v3/include/c_compatibility/complex.h

Re: [PATCH 1/2] libstdc++: Enable debug assertions for filesystem directory iterators

2024-11-06 Thread Jonathan Wakely
Pushed On Thu, 31 Oct 2024 at 20:06, Jonathan Wakely wrote: > > Several member functions of filesystem::directory_iterator and > filesystem::recursive_directory_iterator currently dereference their > shared_ptr data member without checking for non-null. Because they use > operator-> and that func

[committed v2] libstdc++: Deprecate useless compatibility headers for C++17

2024-11-06 Thread Jonathan Wakely
These headers make no sense for C++ programs, because they either define different content to the corresponding C header, or define nothing at all in namespace std. They were all deprecated in C++17, so add deprecation warnings to them, which can be disabled with -Wno-deprecated. For C++20 and lat

[PATCH 1/3] aarch64: Restrict FCLAMP to SME2

2024-11-06 Thread Richard Sandiford
There are two sets of patterns for FCLAMP: one set for single registers and one set for multiple registers. The multiple-register set was correctly gated on SME2, but the single-register set only required SME. This doesn't matter for ACLE usage, since the intrinsic definitions are correctly gated.

Re: [PATCH v3 7/8] i386: Add else operand to masked loads.

2024-11-06 Thread Robin Dapp
> x86 doesn't define mask_gather_loadmn, so I think you can drop this > and all related, only keep the patch I give you in [1] > Sorry I didn't make that clear last time. Yes, that works, thanks. Will post a v4 soon. -- Regards Robin

Re: [PATCH] middle-end: Use rtx_equal_p in notice_stack_pointer_modification_1 [PR117359]

2024-11-06 Thread Uros Bizjak
On Wed, Nov 6, 2024 at 4:53 PM Jakub Jelinek wrote: > > On Wed, Nov 06, 2024 at 04:27:51PM +0100, Uros Bizjak wrote: > > I see. While my solution would fit nicely with the above > > ASM_CALL_CONSTRAINT approach, the approach using ASM_CALL_CONSTRAINT > > is wrong by itself. > > > > Oh, well. > > >

Patch ping (Re: [PATCH] c: Add u{,l,ll,imax}abs builtins [PR117024])

2024-11-06 Thread Jakub Jelinek
On Tue, Oct 22, 2024 at 07:48:39PM +0200, Jakub Jelinek wrote: > On Wed, Oct 16, 2024 at 05:44:05PM +0200, Jakub Jelinek wrote: > > The following patch adds u{,l,ll,imax}abs builtins, which just fold > > to ABSU_EXPR, similarly to how {,l,ll,imax}abs builtins fold to > > ABS_EXPR. > > > > Tested o
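
For readers unfamiliar with the idea, here is a portable sketch of an absolute value with an unsigned result, which is the kind of operation the message says the new builtins fold to. The helper name is hypothetical; it does not call the proposed builtins and is not code from the patch:

```c
#include <limits.h>

/* Hypothetical helper illustrating an absolute value with an unsigned
   result; it does not call the proposed builtins.  */
unsigned int
uabs_sketch (int x)
{
  /* Convert first, then negate in unsigned arithmetic, so that
     INT_MIN is handled without signed overflow.  */
  unsigned int ux = (unsigned int) x;
  return x < 0 ? 0u - ux : ux;
}
```

Unlike a signed `abs`, this form has a well-defined result for `INT_MIN`, because the result type can represent `INT_MAX + 1`.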

Re: [PATCH v2] testsuite: arm: Use effective-target for attr-neon* tests

2024-11-06 Thread Richard Earnshaw (lists)
On 05/11/2024 20:28, Torbjörn SVENSSON wrote: > Changes since v1: > > - Changed from arm_neon to arm_arch_v7a for the required effective target. > > Ok for trunk and releases/gcc-14? > > -- > > Force armv7-a as the tests require a neon compatible architecture. > > gcc/testsuite/ChangeLog: > >

Re: [PATCH] testsuite: arm: Use effective-target arm_fp for pr68620.c test

2024-11-06 Thread Richard Earnshaw (lists)
On 06/11/2024 12:23, Torbjorn SVENSSON wrote: > > > On 2024-11-06 12:26, Richard Earnshaw (lists) wrote: >> On 06/11/2024 07:44, Christophe Lyon wrote: >>> On Wed, 6 Nov 2024 at 07:20, Torbjörn SVENSSON >>> wrote: While the regression was reported on GCC15, I'm sure that same regr

[PATCH 4/5] Disable gather/scatter for non-first vectorized epilogue

2024-11-06 Thread Richard Biener
We currently make vect_check_gather_scatter happy by replacing SSA name references in DR_REF for gather/scatter DRs but the replacement process only works once since for the second epilogue we have SSA names from the first epilogue in DR_REF but as we copied from the original loop the SSA mapping d

Re: [PATCH 04/10] gimple: Disallow sizeless types in BIT_FIELD_REFs.

2024-11-06 Thread Richard Biener
On Wed, Nov 6, 2024 at 12:49 PM Tejas Belagod wrote: > > Ensure sizeless types don't end up trying to be canonicalised to > BIT_FIELD_REFs. You mean variable-sized? But don't we know, when there's a constant array index, that the size is at least so this indexing is OK? So what's wrong with a

[PATCH 1/5] Check LOOP_VINFO_PEELING_FOR_GAPS on epilog is supported

2024-11-06 Thread Richard Biener
We need to check that an epilogue doesn't require LOOP_VINFO_PEELING_FOR_GAPS in case the main loop didn't (the other way around is OK); otherwise the computation of whether the epilog is executed gets out of sync. Bootstrapped and tested on x86_64-unknown-linux-gnu. * tree-vect-loop.

[PATCH 3/5] Add LOOP_VINFO_MAIN_LOOP_INFO

2024-11-06 Thread Richard Biener
The following introduces LOOP_VINFO_MAIN_LOOP_INFO alongside LOOP_VINFO_ORIG_LOOP_INFO so one can have access to both the main vectorized loop info and the preceding vectorized epilogue. This is critical for correctness as we need to disallow never executed epilogues by costing in vect_analyze_loo

[PATCH 5/5] Allow multiple vectorized epilogs via --param vect-epilogues-nomask=N

2024-11-06 Thread Richard Biener
The following is a prototype allowing N possible vector epilogues. In the end I'd like the target to tell us a set of (or no) vector modes to consider for the epilogue of the main or the current epilog analyzed loop in a way similar as to how we communicate back suggested_unroll_factor. The main m

Re: Implement removal of malloc/free pairs with NULL check

2024-11-06 Thread Jakub Jelinek
On Wed, Nov 06, 2024 at 02:44:12PM +0300, Alexander Monakov wrote: > I didn't see a discussion of a more gentle approach where instead of > replacing the result of malloc with a non-zero constant, we would change > > tmp = malloc(sz); > > to > > tmp = (void *)(sz <= SIZE_MAX / 2); > > and l
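
The replacement quoted above can be sketched as ordinary C, assuming the resulting value is only ever compared against NULL and never dereferenced. The function name is hypothetical and the extra `uintptr_t` cast is added here only to keep the integer-to-pointer conversion explicit:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a removed malloc call: the "pointer" is
   only ever compared against NULL, never dereferenced.  */
void *
malloc_result_sketch (size_t sz)
{
  /* Non-null for requests up to half the address space, null for
     larger ones, mirroring `(void *)(sz <= SIZE_MAX / 2)`.  */
  return (void *) (uintptr_t) (sz <= SIZE_MAX / 2);
}
```

With this shape, a NULL check after an oversized request still sees a failure, while ordinary requests appear to succeed.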

[PATCH 2/3] dwarf: lto: Allow die_symbol outside of comp_unit.

2024-11-06 Thread Michal Jires
Die symbols are used for external references. Typically during LTO, early debug emits 'die_symbol+offset' for each possibly referenced DIE in future. Partitions in LTRANS phase then use these references. Originally die symbols are handled only in root comp_unit and in attributes. This patch allow

[PATCH 1/3] ipa-strub: Replace cgraph_node order with uid.

2024-11-06 Thread Michal Jires
ipa_strub_set_mode_for_new_functions uses node order as unique ever increasing identifier. This is better satisfied with uid. Order loses uniqueness with following patches. gcc/ChangeLog: * ipa-strub.cc (ipa_strub_set_mode_for_new_functions): Replace order with uid. (pass

Re: [PATCH v2 01/10] Match: Simplify branch form 4 of unsigned SAT_ADD into branchless

2024-11-06 Thread Richard Biener
On Thu, Oct 31, 2024 at 7:29 AM wrote: > > From: Pan Li > > There are sorts of forms for the unsigned SAT_ADD. Some of them are > complicated while others are cheap. This patch would like to simplify > the complicated form into the cheap ones. For example as below: > > From the form 4 (branch)
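
The exact "form 4" is cut off in this excerpt, so the following is only a generic sketch of the kind of rewrite under discussion: one branchy and one branchless unsigned saturating add that compute the same result. Both function names are hypothetical and neither is claimed to be the pattern from the patch:

```c
#include <stdint.h>

/* Branchy form: saturate to the maximum when the sum wraps around.  */
uint32_t
sat_add_branch (uint32_t x, uint32_t y)
{
  uint32_t sum = x + y;
  if (sum < x)
    return UINT32_MAX;
  return sum;
}

/* Branchless form: OR the sum with an all-ones mask on overflow.  */
uint32_t
sat_add_branchless (uint32_t x, uint32_t y)
{
  uint32_t sum = x + y;
  /* (sum < x) is 1 on wraparound; negating it as unsigned gives
     0xFFFFFFFF, otherwise the mask is 0.  */
  return sum | (uint32_t) -(uint32_t) (sum < x);
}
```

Counting the operations in the branchless version (add, compare, negate, or) gives a feel for why the review asks whether it is really the cheaper form.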

Re: [PATCH] testsuite: arm: Use effective-target arm_fp for pr68620.c test

2024-11-06 Thread Torbjorn SVENSSON
On 2024-11-06 14:04, Richard Earnshaw (lists) wrote: On 06/11/2024 12:23, Torbjorn SVENSSON wrote: On 2024-11-06 12:26, Richard Earnshaw (lists) wrote: On 06/11/2024 07:44, Christophe Lyon wrote: On Wed, 6 Nov 2024 at 07:20, Torbjörn SVENSSON wrote: While the regression was reported on

Re: [PATCH] c: Implement C2y N3356, if declarations [PR117019]

2024-11-06 Thread Marek Polacek
On reflection, I'm not so sure about these anymore: On Mon, Nov 04, 2024 at 06:26:47PM -0500, Marek Polacek wrote: > + switch (extern int i = 0); /* { dg-error "in condition|both .extern. and > initializer" } */ I think this is definitely valid. > + switch (register int i = 0); /* { dg-error

[PATCH] Set --param vect-force-slp=1

2024-11-06 Thread Richard Biener
The following pulls the trigger, defaulting --param vect-force-slp to 1. I know of no missing features, but there may be minor testsuite and optimization quality fallout. Bootstrapped and tested on x86_64-unknown-linux-gnu. I'll amend PR116578 with the list of FAILs this causes (my baseline is outda

Re: [PATCH] Add a bootstrap-native build config

2024-11-06 Thread Andi Kleen
On Tue, Jul 30, 2024 at 09:40:42AM -0700, Andi Kleen wrote: > From: Andi Kleen > > ... that uses -march=native -mtune=native to build a compiler optimized > for the host. > > config/ChangeLog: > > * bootstrap-native.mk: New file. > > gcc/ChangeLog: > > * doc/install.texi: Document

Re: [PATCH v2 2/2] Match: make SAT_ADD case 7 commutative

2024-11-06 Thread Richard Biener
On Mon, Nov 4, 2024 at 2:01 PM Akram Ahmad wrote: > > On 31/10/2024 08:00, Richard Biener wrote: > > On Wed, Oct 30, 2024 at 4:46 PM Akram Ahmad wrote: > >> On 29/10/2024 12:48, Richard Biener wrote: > >>> The testcases will FAIL unless the target has support for .SAT_ADD - you > >>> want to > >

Re: [PATCH 3/4] openmp: Add IFN_GOMP_MAX_VF

2024-11-06 Thread Jakub Jelinek
On Wed, Nov 06, 2024 at 03:27:21PM +, Andrew Stubbs wrote: > Delay omp_max_vf call until after the host and device compilers have diverged > so that the max_vf value can be tuned exactly right on both variants. > > This change means that the ompdevlow pass must be enabled for functions that >

[PATCH 0/4] openmp: Fix omp_max_vf in offload contexts

2024-11-06 Thread Andrew Stubbs
This patch series is a rework of the patch originally posted a couple of years ago: https://patchwork.sourceware.org/project/gcc/patch/0e1a740e-46d5-ebfa-36f4-9a069ddf8...@codesourcery.com/ The review comments from that time have been addressed, as have the comments from yesterday's review in the

Re: [PATCH 4/4] openmp: Add testcases for omp_max_vf

2024-11-06 Thread Jakub Jelinek
On Wed, Nov 06, 2024 at 03:27:22PM +, Andrew Stubbs wrote: > Ensure that the GOMP_MAX_VF does the right thing for explicit schedules, when > offloading is enabled ("target" directives are present), and is inactive > otherwise. > > This requires enabling the offload-dump scanning features previ

Re: [PATCH] middle-end: Use rtx_equal_p in notice_stack_pointer_modification_1 [PR117359]

2024-11-06 Thread H. Peter Anvin
On November 6, 2024 7:27:51 AM PST, Uros Bizjak wrote: >On Wed, Nov 6, 2024 at 11:57 AM Jakub Jelinek wrote: >> >> On Wed, Nov 06, 2024 at 10:44:54AM +0100, Uros Bizjak wrote: >> > After some more thinking and considering all recent discussion >> > (thanks!), I am convinced that a slightly simpli

[PATCH] libstdc++: Refactor std::hash specializations

2024-11-06 Thread Jonathan Wakely
This attempts to simplify and clean up our std::hash code. The primary benefit is improved diagnostics for users when they do something wrong involving std::hash or unordered containers. An additional benefit is that for the unstable ABI (--enable-symvers=gnu-versioned-namespace) we can reduce the

Re: [PATCH] middle-end: Use rtx_equal_p in notice_stack_pointer_modification_1 [PR117359]

2024-11-06 Thread Jakub Jelinek
On Wed, Nov 06, 2024 at 04:27:51PM +0100, Uros Bizjak wrote: > I see. While my solution would fit nicely with the above > ASM_CALL_CONSTRAINT approach, the approach using ASM_CALL_CONSTRAINT > is wrong by itself. > > Oh, well. > > Anyway, I guess "redzone" clobber you proposed does not remove the

Re: [PATCH] middle-end: Use rtx_equal_p in notice_stack_pointer_modification_1 [PR117359]

2024-11-06 Thread Jakub Jelinek
On Wed, Nov 06, 2024 at 07:45:54AM -0800, H. Peter Anvin wrote: > I suggested __builtin_frame_address(0) as an input constraint (which > already works in gcc and clang) and the "red-zone" clobber (new) for this > exact reason (Andrew Pinski, however, summarily closed those BRs.) I posted a patch f

Re: [PATCH] c: Implement C2y N3356, if declarations [PR117019]

2024-11-06 Thread Marek Polacek
On Wed, Nov 06, 2024 at 09:42:02AM -0500, Marek Polacek wrote: > On reflection, I'm not so sure about these anymore: > > On Mon, Nov 04, 2024 at 06:26:47PM -0500, Marek Polacek wrote: > > + switch (extern int i = 0); /* { dg-error "in condition|both .extern. > > and initializer" } */ > > I thi

[PATCH 07/10] aarch64: Add testcase for C/C++ ops on SVE ACLE types.

2024-11-06 Thread Tejas Belagod
This patch adds a test case to cover C/C++ operators on SVE ACLE types. This does not cover all types, but covers most representative types. gcc/testsuite: * gcc.target/aarch64/sve/acle/general/cops.c: New test. --- .../aarch64/sve/acle/general/cops.c | 570 ++

[PATCH 03/10] c: Range-check indexing of SVE ACLE vectors

2024-11-06 Thread Tejas Belagod
This patch adds a check for non-GNU vectors to warn that the index is outside the range of a fixed vector size. For VLA vectors, we don't diagnose. gcc/ChangeLog: * c-family/c-common.cc (convert_vector_to_array_for_subscript): Add range-check for target vector types. --- gcc/c-f

[PATCH v2] RISC-V: zero_extend(not) -> xor optimization [PR112398]

2024-11-06 Thread Alexey Merzlyakov
Hi, Jeff, Thank you for the review! All items were met, please find the comments and PATCH v2 in the message below: On Mon, Nov 04, 2024 at 04:48:31PM -0700, Jeff Law wrote: > > +  /* Trying to optimize: > > +     (zero_extend:M (subreg:N (not:M (X:M -> > > +     (xor:M (zero_extend:M (s
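
The transformation named in the subject line above rests on a plain bit identity, which can be sketched in C. This is only an illustration, not code from the patch: it assumes a 64-bit wide mode and a 32-bit narrow mode, and the function names are hypothetical. Inverting a value and then zero-extending its low half gives the same result as zero-extending first and XOR-ing with a mask of the narrow mode's bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Before: invert in the wide mode, take the low 32 bits, zero-extend. */
static uint64_t zext_of_not(uint64_t x)
{
    return (uint64_t)(uint32_t)(~x);
}

/* After: zero-extend the low 32 bits first, then flip them with one XOR. */
static uint64_t xor_of_zext(uint64_t x)
{
    return (uint64_t)(uint32_t)x ^ UINT64_C(0xFFFFFFFF);
}

/* Assert that both forms agree on a few representative inputs. */
static void check_forms(void)
{
    static const uint64_t samples[] = {
        0, 1, UINT64_C(0xFFFFFFFF), UINT64_C(0x100000000),
        UINT64_C(0xDEADBEEFCAFEF00D), UINT64_MAX
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(zext_of_not(samples[i]) == xor_of_zext(samples[i]));
}
```

Calling `check_forms()` from a test harness exercises the identity; the mode widths actually handled by the patch may differ from the ones assumed here.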

[PATCH 0/3] aarch64: Fix various issues with the SME support

2024-11-06 Thread Richard Sandiford
While adding support for SVE2.1 and SME2.1, I found several embarrassing mistakes in my earlier SME and SME2 patches. :( This series tries to fix them. Tested on aarch64-linux-gnu. I'm planning to commit to trunk on Thursday evening UTC if there are no comments before then, but please let me know

Re: Implement removal of malloc/free pairs with NULL check

2024-11-06 Thread Alexander Monakov
On Wed, 6 Nov 2024, Richard Biener wrote: > Since we had malloc/free pair removal for quite some time I think > it should stay on by default. I missed that; now I see what you meant by "not making the existing situation worse". I still miss what happened to "correctness trumps performance" :)

Re: [PATCH 2/2] libstdc++: More user-friendly failed assertions from shared_ptr dereference

2024-11-06 Thread Jonathan Wakely
Pushed On Thu, 31 Oct 2024 at 20:08, Jonathan Wakely wrote: > > Currently dereferencing an empty shared_ptr prints a complicated > internal type in the assertion message: > > include/bits/shared_ptr_base.h:1377: std::__shared_ptr_access<_Tp, _Lp, > , >::element_type& std::__shared_ptr_access<_T

[PATCH 2/4] openmp: use offload max_vf for chunk_size

2024-11-06 Thread Andrew Stubbs
The chunk size for SIMD loops should be right for the current device; too big allocates too much memory, too small is inefficient. Getting it wrong doesn't actually break anything though. This patch attempts to choose the optimal setting based on the context. Both host-fallback and device will g

[PATCH 3/4] openmp: Add IFN_GOMP_MAX_VF

2024-11-06 Thread Andrew Stubbs
Delay omp_max_vf call until after the host and device compilers have diverged so that the max_vf value can be tuned exactly right on both variants. This change means that the ompdevlow pass must be enabled for functions that use OpenMP directives with both "simd" and "schedule" enabled. gcc/Chang

[PATCH 4/4] openmp: Add testcases for omp_max_vf

2024-11-06 Thread Andrew Stubbs
Ensure that the GOMP_MAX_VF does the right thing for explicit schedules, when offloading is enabled ("target" directives are present), and is inactive otherwise. This requires enabling the offload-dump scanning features previously only used in the libgomp testsuite. The automake scheme used there

[PATCH 3/3] incremental lto: Remap node order for stability.

2024-11-06 Thread Michal Jires
This patch adds remapping of node order for each lto partition. Resulting order conserves relative order inside partition, but is independent of outside symbols. So if lto partition contains identical set of symbols, their remapped order will be stable between compilations. gcc/ChangeLog:

[PATCH 0/3] dwarf: incremental lto: Stabilize external references.

2024-11-06 Thread Michal Jires
These patches allow adding additional die symbols, so that external references represented as 'die_symbol+offset' don't diverge contents of LTO partitions. Bootstrapped/regtested on x86_64-linux

Re: [PATCH] middle-end: Use rtx_equal_p in notice_stack_pointer_modification_1 [PR117359]

2024-11-06 Thread Uros Bizjak
On Wed, Nov 6, 2024 at 11:57 AM Jakub Jelinek wrote: > > On Wed, Nov 06, 2024 at 10:44:54AM +0100, Uros Bizjak wrote: > > After some more thinking and considering all recent discussion > > (thanks!), I am convinced that a slightly simplified original patch > > (attached), now one-liner, is the way

Re: [PATCH v3] Remove sys/user time in -ftime-report

2024-11-06 Thread Andi Kleen
On Fri, Nov 01, 2024 at 02:01:18PM -0400, John David Anglin wrote: > This breaks build on hppa64-hp-hpux11.11. This target has clock_gettime > but it doesn't have CLOCK_MONOTONIC. It has CLOCK_REALTIME. I modified > timevar.cc as follows to restore build. Alternative would be to check for CLOCK

[RFC PATCH] inline asm, v2: Add new constraint for symbol definitions

2024-11-06 Thread Jakub Jelinek
On Wed, Nov 06, 2024 at 09:08:10AM +0100, Richard Biener wrote: > It would probably be cleanest to have a separate print modifier for > "symbol for assembler label definition" or so, but given this feature See the patch I'll post next. > targets existing uses those already know how to emit the de

Re: [patch,avr] Add an RTL peephole to tweak lower_reg:QI o= cst.

2024-11-06 Thread Denis Chertykov
Wed, 6 Nov 2024 at 12:58, Georg-Johann Lay : > > For operations like X o= CST, regalloc may spill l-reg X to a d-reg: > D = X > D o= CST > X = D > where it is better to instead > D = CST > X o= D > This patch adds an according RTL peephole. > > Ok for trunk? Please apply

[PATCH 06/10] rtl: Validate subreg info when optimizing vec_select.

2024-11-06 Thread Tejas Belagod
When optimizing for NOPs in case of overlapping regs in VEC_SELECT expressions, validate subreg data before using simplify_subreg_regno. There is no real SUBREG rtx here, but a pseudo subreg call to check if subregs are possible. gcc/ChangeLog: * rtlanal.cc (set_noop_p): Validate subreg
