Re: [PATCH 1/2] [x86] Support vector float_truncate for SF to BF.

2024-11-06 Thread Jakub Jelinek
On Thu, Nov 07, 2024 at 01:57:21PM +0800, Hongtao Liu wrote: > > Does it turn the sNaNs into infinities or qNaNs silently? > Yes. Into infinities? > > Given the rounding, flag_rounding_math should avoid the hw instructions, > The default rounding mode for flag_rounding_math is rounding to > neare

Re: [PATCH] [PR106329] SVE intrinsics: Fold calls with pfalse predicate.

2024-11-06 Thread Richard Sandiford
Richard Sandiford writes: > Thanks for doing this and sorry for the slow review. > > Jennifer Schmitz writes: >> If an SVE intrinsic has predicate pfalse, we can fold the call to >> a simplified assignment statement: For _m, _x, and implicit predication, >> the LHS can be assigned the operand for

Re: [PATCH] inline-asm, i386: Add "redzone" clobber support

2024-11-06 Thread Uros Bizjak
On Wed, Nov 6, 2024 at 2:50 PM Jakub Jelinek wrote: > > Hi! > > The following patch adds a "redzone" clobber (recognized just > on targets which choose to recognize it, right now just x86), > with which one can mark the rare case where inline asm pushes > something on the stack or uses call instru

Re: [PATCH 04/10] gimple: Disallow sizeless types in BIT_FIELD_REFs.

2024-11-06 Thread Tejas Belagod
On 11/6/24 6:02 PM, Richard Biener wrote: On Wed, Nov 6, 2024 at 12:49 PM Tejas Belagod wrote: Ensure sizeless types don't end up trying to be canonicalised to BIT_FIELD_REFs. You mean variable-sized? But don't we know, when there's a constant array index, that the size is at least so this

Re: [PATCH 0/4] libsanitizer: merge from upstream

2024-11-06 Thread Sam James
Kito Cheng writes: > The patch set aims to update libsanitizer from upstream. The motivation is > that > RISC-V is changing the shadow offset for AddressSanitizer, and I also plan to > submit another patch set to add dynamic shadow offset support for GCC. > > This is my first time updating it, s

Re: [PATCH] i386: Support cstorebf4 with native bf16 comi

2024-11-06 Thread Uros Bizjak
On Thu, Nov 7, 2024 at 6:58 AM Hongyu Wang wrote: > > Hi, > > We recently supports cbranchbf4 with AVX10_2 native bf16 comi > instructions, so do similar to cstorebf4. > > Bootstrapped & regtested on x86_64-pc-linux-gnu. > Ok for trunk? > > gcc/ChangeLog: > > * config/i386/i386.md (cstoreb

RE: [PATCH] i386: Modify regexp of pr117304-1.c

2024-11-06 Thread Liu, Hongtao
> -Original Message- > From: Hu, Lin1 > Sent: Thursday, November 7, 2024 2:35 PM > To: gcc-patches@gcc.gnu.org > Cc: Liu, Hongtao ; ubiz...@gmail.com > Subject: [PATCH] i386: Modify regexp of pr117304-1.c > > OK, so just modify the regexp. > > Since the test doesn't care if the hint i

[PATCH 2/4] libsanitizer: Apply local patches

2024-11-06 Thread Kito Cheng
This patch just reapplies local patches (will be noted in LOCAL_PATCHES). --- libsanitizer/asan/asan_globals.cpp| 21 --- libsanitizer/asan/asan_interceptors.h | 7 ++- libsanitizer/asan/asan_mapping.h | 2 +- .../sanitizer_linux_libcdep.cpp |

[PATCH 3/4] libsanitizer: Improve FrameIsInternal

2024-11-06 Thread Kito Cheng
`FrameIsInternal` is a function that improves report quality by filtering out internal functions from the sanitizer, allowing it to point to a more precise root cause. However, the current checks are mostly specific to compiler-rt, so we are adding a few more rules to enhance the filtering for libs

[PATCH 4/4] libsanitizer: update test

2024-11-06 Thread Kito Cheng
gcc/testsuite/ChangeLog: * c-c++-common/ubsan/builtin-1.c: Update test case due to sanitizer has change the error message. --- gcc/testsuite/c-c++-common/ubsan/builtin-1.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/c-c++-common/ubsan/buil

[PATCH 0/4] libsanitizer: merge from upstream

2024-11-06 Thread Kito Cheng
The patch set aims to update libsanitizer from upstream. The motivation is that RISC-V is changing the shadow offset for AddressSanitizer, and I also plan to submit another patch set to add dynamic shadow offset support for GCC. This is my first time updating it, so I used my laptop and an AArch64

[PATCH] i386: Modify regexp of pr117304-1.c

2024-11-06 Thread Hu, Lin1
OK, so just modify the regexp. Since the test doesn't care if the hint is correct, modify the regexp of the hint part to avoid future changes to the hint that would cause the test to fail. BRs, Lin gcc/testsuite/ChangeLog: * gcc.target/i386/pr117304-1.c: Modify regexp. --- gcc/testsuit

Re: [PATCH] i386: Add -mavx512vl for pr117304-1.c

2024-11-06 Thread Hongtao Liu
On Thu, Nov 7, 2024 at 2:04 PM Hu, Lin1 wrote: > > > -Original Message- > > From: Liu, Hongtao > > Sent: Thursday, November 7, 2024 11:41 AM > > To: Hu, Lin1 ; gcc-patches@gcc.gnu.org > > Cc: ubiz...@gmail.com > > Subject: RE: [PATCH] i386: Add -mavx512vl for pr117304-1.c > > > > > > > >

RE: [PATCH] i386: Add -mavx512vl for pr117304-1.c

2024-11-06 Thread Hu, Lin1
> -Original Message- > From: Liu, Hongtao > Sent: Thursday, November 7, 2024 11:41 AM > To: Hu, Lin1 ; gcc-patches@gcc.gnu.org > Cc: ubiz...@gmail.com > Subject: RE: [PATCH] i386: Add -mavx512vl for pr117304-1.c > > > > > -Original Message- > > From: Hu, Lin1 > > Sent: Thursday

[PATCH] i386: Support cstorebf4 with native bf16 comi

2024-11-06 Thread Hongyu Wang
Hi, We recently supports cbranchbf4 with AVX10_2 native bf16 comi instructions, so do similar to cstorebf4. Bootstrapped & regtested on x86_64-pc-linux-gnu. Ok for trunk? gcc/ChangeLog: * config/i386/i386.md (cstorebf4): Use vcomsbf16 under TARGET_AVX10_2_256 and -fno-trapping-m

RE: [PATCH] [x86_64] Add microarchtecture tunable for pass_align_tight_loops

2024-11-06 Thread Liu, Hongtao
> -Original Message- > From: Xi Ruoyao > Sent: Thursday, November 7, 2024 1:12 PM > To: Liu, Hongtao ; Mayshao-oc o...@zhaoxin.com>; Hongtao Liu > Cc: gcc-patches@gcc.gnu.org; hubi...@ucw.cz; ubiz...@gmail.com; > richard.guent...@gmail.com; Tim Hu(WH-RD) ; Silvia > Zhao(BJ-RD) ; Louis

Re: [PATCH 1/2] [x86] Support vector float_truncate for SF to BF.

2024-11-06 Thread Hongtao Liu
On Tue, Nov 5, 2024 at 5:19 PM Jakub Jelinek wrote: > > On Tue, Nov 05, 2024 at 05:12:56PM +0800, Hongtao Liu wrote: > > Yes, there's a mismatch between scalar and vector code, I assume users > > may not care much about precision/NAN/INF/denormal behaviors for > > vector code. > > Just like we sup

PING^4: [PATCH] sibcall: Adjust BLKmode argument size for alignment padding

2024-11-06 Thread H.J. Lu
On Sat, Nov 2, 2024 at 6:48 AM H.J. Lu wrote: > > On Sat, Oct 26, 2024 at 7:25 AM H.J. Lu wrote: > > > > On Sun, Oct 20, 2024 at 6:42 AM H.J. Lu wrote: > > > > > > On Sun, Oct 13, 2024, 10:07 AM H.J. Lu wrote: > > >> > > >> Adjust BLKmode argument size for parameter alignment for sibcall check.

Re: [PATCH] [x86_64] Add microarchtecture tunable for pass_align_tight_loops

2024-11-06 Thread Xi Ruoyao
On Thu, 2024-11-07 at 04:58 +, Liu, Hongtao wrote: > > > > Hi all: > > > >     For zhaoxin, I find no improvement when enable > > > > pass_align_tight_loops, and have performance drop in some cases. > > > >     This patch add a new tunable to bypass > > > > pass_align_tight_loops in > > zhaoxin

[PATCH v2][GCC14] aarch64: Add support for FUJITSU-MONAKA (-mcpu=fujitsu-monaka) CPU

2024-11-06 Thread Yuta Mukai (Fujitsu)
Thank you for pushing to trunk. Can I also ask for a backport to GCC14? I have attached the patch for GCC14. FP8 has been excluded from the list as it is not supported in GCC14. Bootstrapped/regtested on aarch64-unknown-linux-gnu. Thanks, Yuta -- Yuta Mukai Fujitsu Limited >> Thank you for the

RE: [PATCH] [x86_64] Add microarchtecture tunable for pass_align_tight_loops

2024-11-06 Thread Liu, Hongtao
> -Original Message- > From: Mayshao-oc > Sent: Thursday, November 7, 2024 11:13 AM > To: Hongtao Liu > Cc: gcc-patches@gcc.gnu.org; hubi...@ucw.cz; Liu, Hongtao > ; ubiz...@gmail.com; richard.guent...@gmail.com; > Tim Hu(WH-RD) ; Silvia Zhao(BJ-RD) > ; Louis Qi(BJ-RD) ; Cobe > Chen(BJ

Re: [PATCH] Optimize incoming integer argument promotion

2024-11-06 Thread H.J. Lu
On Wed, Nov 6, 2024 at 6:01 PM Richard Biener wrote: > > On Wed, Nov 6, 2024 at 10:52 AM H.J. Lu wrote: > > > > On Wed, Nov 6, 2024 at 4:29 PM Richard Biener > > wrote: > > > > > > On Tue, Nov 5, 2024 at 10:50 PM H.J. Lu wrote: > > > > > > > > On Tue, Nov 5, 2024 at 5:27 PM Richard Biener > > >

Re: [PATCH] [x86_64] Add microarchtecture tunable for pass_align_tight_loops

2024-11-06 Thread Mayshao-oc
> > On Thu, Nov 7, 2024 at 10:29 AM MayShao-oc wrote: > > > > Hi all: > >For zhaoxin, I find no improvement when enable pass_align_tight_loops, > > and have performance drop in some cases. > >This patch add a new tunable to bypass pass_align_tight_loops in zhaoxin. > > > >Bootstrapped

[PATCH 4/4] Write S_INLINESITE CodeView symbols

2024-11-06 Thread Mark Harmstone
Translate DW_TAG_inlined_subroutine DIEs into S_INLINESITE CodeView symbols, marking inlined functions. gcc/ * dwarf2codeview.cc (enum cv_sym_type): Add S_INLINESITE and S_INLINESITE_END. (get_func_id): Add declaration. (write_s_inlinesite): New function. (w

[PATCH 2/4] Don't output CodeView line numbers for inlined functions

2024-11-06 Thread Mark Harmstone
If we encounter an inlined function, treat it as another codeview_function, and skip over these when outputting line numbers. This information will instead be output as part of the S_INLINESITE symbols. gcc/ * dwarf2codeview.cc (struct codeview_function): Add parent and inline_bloc

[PATCH 3/4] Write S_INLINEELINES CodeView subsection

2024-11-06 Thread Mark Harmstone
When outputting the .debug$S CodeView section, also write an S_INLINEELINES subsection, which records the filename and line number of the start of each inlined function. gcc/ * dwarf2codeview.cc (DEBUG_S_INLINEELINES): Define. (CV_INLINEE_SOURCE_LINE_SIGNATURE): Define. (st

[PATCH 1/4] Add block parameter to begin_block debug hook

2024-11-06 Thread Mark Harmstone
Add a parameter to the begin_block debug hook that is a pointer to the tree_node of the block in question. CodeView needs this as it records line numbers of inlined functions in a different manner, so we need to be able to tell if the block is actually the start of an inlined function. gcc/

RE: [PATCH] i386: Add -mavx512vl for pr117304-1.c

2024-11-06 Thread Liu, Hongtao
> -Original Message- > From: Hu, Lin1 > Sent: Thursday, November 7, 2024 11:03 AM > To: gcc-patches@gcc.gnu.org > Cc: Liu, Hongtao ; ubiz...@gmail.com > Subject: [PATCH] i386: Add -mavx512vl for pr117304-1.c > > Hi, all > > Testing pr117304-1.c in a machine with only avx2 generates so

[PATCH] i386: Add -mavx512vl for pr117304-1.c

2024-11-06 Thread Hu, Lin1
Hi, all Testing pr117304-1.c in a machine with only avx2 generates some different hints, so add -mavx512vl at its option list. Bootstrapped and regtested on x86-64-pc-linux-gnu. I think it is an obvious commit, but I still waiting for some while. If someone have other suggestion. BRs, Lin gcc/

Re: [PATCH] [x86_64] Add microarchtecture tunable for pass_align_tight_loops

2024-11-06 Thread Hongtao Liu
On Thu, Nov 7, 2024 at 10:29 AM MayShao-oc wrote: > > Hi all: >For zhaoxin, I find no improvement when enable pass_align_tight_loops, > and have performance drop in some cases. >This patch add a new tunable to bypass pass_align_tight_loops in zhaoxin. > >Bootstrapped X86_64. >Ok fo

[PATCH] [x86_64] Add microarchtecture tunable for pass_align_tight_loops

2024-11-06 Thread MayShao-oc
Hi all: For zhaoxin, I find no improvement when enable pass_align_tight_loops, and have performance drop in some cases. This patch add a new tunable to bypass pass_align_tight_loops in zhaoxin. Bootstrapped X86_64. Ok for trunk? BR Mayshao gcc/ChangeLog: * config/i386/i386-fea

RE: [PATCH v2] Doc: Add doc for standard name mask_len_strided_load{store}m

2024-11-06 Thread Li, Pan2
Hi Richard, I would like to double confirm about the doc as I am not the native speaker. It may be referenced by all other developers and I am not sure if there is something misleading or fuzzy. Thanks a lot. Pan -Original Message- From: Li, Pan2 Sent: Wednesday, October 30, 2024 7:56

RE: [PATCH v2 01/10] Match: Simplify branch form 4 of unsigned SAT_ADD into branchless

2024-11-06 Thread Li, Pan2
I see, thanks Tamar for the explanation. > The problem with the rewrite is that it pessimists the code if the saturating > instructions are not recognized afterwards. The original idea is somehow independent with the backend support IFN_SAT_* or not. Given we have sorts of form of IFN_SAT_*, som

[PATCH] Make ix86_align_loops uarch-specific tune.

2024-11-06 Thread liuhongt
Disable the tune for Zhaoxin/CLX/SKX since it could hurt performance for the inner loop. According to last test, align_loop helps performance for SPEC2017 on EMR and Znver4. So I'll still keep the tune for generic part. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Any comment? gcc/

[r15-4988 Regression] FAIL: gcc.dg/gomp/max_vf-1.c scan-tree-dump-times ompexp "__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \\(.*, 16, 0\\);" 1 on Linux/x86_64

2024-11-06 Thread haochen.jiang
On Linux/x86_64, d334f729e53867b838e867375b3f475ba793d96e is the first bad commit commit d334f729e53867b838e867375b3f475ba793d96e Author: Andrew Stubbs Date: Wed Nov 6 12:26:08 2024 + openmp: Add testcases for omp_max_vf caused FAIL: gcc.dg/gomp/max_vf-1.c scan-tree-dump-times ompexp

Re: [PATCH] testsuite: Fix up pr116725.c test [PR116725]

2024-11-06 Thread Hongtao Liu
On Wed, Nov 6, 2024 at 4:59 PM Jakub Jelinek wrote: > > On Fri, Oct 18, 2024 at 02:05:59PM -0400, Antoni Boucher wrote: > > PR target/116725 > > * gcc.target/i386/pr116725.c: Add test using those AVX builtins. > > This test FAILs for me, as I don't have the latest gas aroun

Re: [PATCH 4/8] ipa: Better value ranges for zero pointer constants

2024-11-06 Thread Aldy Hernandez
Jan Hubicka writes: >> > 2024-11-01 Martin Jambor >> > >> > * ipa-prop.cc (ipa_compute_jump_functions_for_edge): When creating >> > value-range jump functions from pointer type constant zero, do so >> > as if it was not a pointer. >> > --- >> > gcc/ipa-prop.cc | 3 ++-

[PATCH v2 2/2] VN: Handle `(A CMP B) !=/== 0` for predicates [PR117414]

2024-11-06 Thread Andrew Pinski
After the last patch, we also want to record `(A CMP B) != 0` as `(A CMP B)` and `(A CMP B) == 0` as `(A CMP B)` with the true/false edges swapped. This shows up more due to the new handling of `(A | B) ==/!= 0` in insert_predicates_for_cond as now we can notice these comparisons which were not se

Re: [PATCH 7/8] ipa: Verify that const jump functions have corresponding value range

2024-11-06 Thread Aldy Hernandez
Aldy Hernandez writes: > Martin Jambor writes: > >> Hi, >> >> Because the simplified way of extracting value ranges from functions >> does not look at scalar constants (as one of the versions had been >> doing before) but instead rely on the value range within the jump >> function already captur

[PATCH v2 1/2] VN: Handle `(a | b) !=/== 0` for predicates [PR117414]

2024-11-06 Thread Andrew Pinski
For `(a | b) == 0`, we can "assert" on the true edge that both `a == 0` and `b == 0` but nothing on the false edge. For `(a | b) != 0`, we can "assert" on the false edge that both `a == 0` and `b == 0` but nothing on the true edge. This adds that predicate and allows us to optimize f0, f1, and f2 i
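
[Illustration, not part of the posted patch] The predicate described in the message above can be sketched in plain C. This is a hypothetical example, not one of the f0/f1/f2 testcases the message refers to; the function names are made up here. It only shows why the true edge of `(a | b) == 0` implies both `a == 0` and `b == 0`, which is what would let a value-numbering pass drop a nested test.

```c
#include <stdbool.h>

/* Hypothetical helper: the combined test as written in the source. */
bool both_zero_via_or (unsigned a, unsigned b)
{
  return (a | b) == 0;
}

/* Hypothetical helper: the two facts that hold on the true edge. */
bool both_zero_direct (unsigned a, unsigned b)
{
  return a == 0 && b == 0;
}

/* Inside the outer branch, `a == 0` is already implied, so the inner
   test can never be true and the `return -1` path is dead code.  */
int redundant_check (unsigned a, unsigned b)
{
  if ((a | b) == 0)
    {
      if (a != 0)
        return -1;
      return 1;
    }
  return 0;
}
```

Nothing is implied on the false edge: `(a | b) != 0` only says that at least one operand is nonzero, not which one.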

[PATCH v2 0/2] VN predicate improvements

2024-11-06 Thread Andrew Pinski
This is v2 of the predicate improvements. This is only the changed patches; rather than all of them. The main change is to use vn_valueize. But there was another change dealing with canonicalization of the comparison with constants always being on the rhs; that is why I am resending them even thoug

Re: [PATCH] testsuite: Adjust jump threading test expectation

2024-11-06 Thread Andrew Pinski
On Tue, Nov 5, 2024 at 4:53 AM Andrew Carlotti wrote: > > This test started failing on aarch64 after 0cfc9c95 in 2023 ("Phi > analyzer - Initialize with range instead of a tree."). > > The only change visible in the pass dumps prior to thread2 is the upper > bounds of some ranges are reduced from

Re: [PATCH] testsuite: Adjust jump threading test expectation

2024-11-06 Thread Aldy Hernandez
Andrew Carlotti writes: > This test started failing on aarch64 after 0cfc9c95 in 2023 ("Phi > analyzer - Initialize with range instead of a tree."). > > The only change visible in the pass dumps prior to thread2 is the upper > bounds of some ranges are reduced from +INF to 7, consistent with the

Re: [PATCH 7/8] ipa: Verify that const jump functions have corresponding value range

2024-11-06 Thread Aldy Hernandez
Martin Jambor writes: > Hi, > > Because the simplified way of extracting value ranges from functions > does not look at scalar constants (as one of the versions had been > doing before) but instead rely on the value range within the jump > function already capturing the constant, I have added a v

Re: [PATCH] Match: Optimize log (x) CMP CST and exp (x) CMP CST operations

2024-11-06 Thread Jeff Law
On 11/6/24 1:12 AM, Soumya AR wrote: On 29 Oct 2024, at 6:59 PM, Richard Biener wrote: External email: Use caution opening links or attachments On Mon, 28 Oct 2024, Soumya AR wrote: This patch implements transformations for the following optimizations. logN(x) CMP CST -> x CMP expN(C

Re: [PATCH v2] RISC-V: zero_extend(not) -> xor optimization [PR112398]

2024-11-06 Thread Jeff Law
On 11/6/24 4:47 AM, Alexey Merzlyakov wrote: This patch adds optimization of the following patterns: (zero_extend:M (subreg:N (not:O==M (X:Q==M -> (xor:M (zero_extend:M (subreg:N (X:M)), mask)) ... where the mask is GET_MODE_MASK (N). For the cases when X:M doesn't have any no
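
[Illustration, not part of the posted patch] The identity behind the transformation quoted above can be checked in C. This is a sketch under assumptions: mode M is taken to be 64-bit and mode N to be 8-bit, so the mask (GET_MODE_MASK (N) in the message) is 0xff. The function names are hypothetical and the code says nothing about how the RISC-V backend implements the pattern.

```c
#include <stdint.h>

/* Assumed shape: zero-extend the low 8 bits of ~x to 64 bits.  */
uint64_t zext_of_not (uint64_t x)
{
  return (uint64_t) (uint8_t) ~x;
}

/* Assumed shape: zero-extend the low 8 bits of x, then flip them
   with the 8-bit mode mask 0xff.  */
uint64_t xor_of_zext (uint64_t x)
{
  return (uint64_t) (uint8_t) x ^ UINT64_C (0xff);
}
```

Both functions agree for every input because truncating `~x` to 8 bits gives the same bits as flipping the low 8 bits of `x`; the upper bits are zero in both results.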

Re: [PATCH] [PR106329] SVE intrinsics: Fold calls with pfalse predicate.

2024-11-06 Thread Richard Sandiford
Thanks for doing this and sorry for the slow review. Jennifer Schmitz writes: > If an SVE intrinsic has predicate pfalse, we can fold the call to > a simplified assignment statement: For _m, _x, and implicit predication, > the LHS can be assigned the operand for inactive values and for _z, we can

[PATCH v4] c++: Implement P2662R3, Pack Indexing [PR113798]

2024-11-06 Thread Marek Polacek
On Mon, Nov 04, 2024 at 11:10:05PM -0500, Jason Merrill wrote: > On 10/30/24 4:59 PM, Marek Polacek wrote: > > On Wed, Oct 30, 2024 at 09:01:36AM -0400, Patrick Palka wrote: > > > On Tue, 29 Oct 2024, Marek Polacek wrote: > > --- a/gcc/cp/cp-tree.h > > +++ b/gcc/cp/cp-tree.h > > @@ -451,6 +451,7 @@

Re: [COMMITED] [lto] ipcp don't propagate where not needed

2024-11-06 Thread Jonathan Wakely
On Wed, 6 Nov 2024 at 18:39, Michal Jires wrote: > > On Wed, 2024-11-06 at 17:33:50 +, Jonathan Wakely wrote: > > > > If there's going to be a constructor then it should initialize the members. > > > > Otherwise, your original patch was better, because you could write > > this to get an all-ze

Re: [PATCH 2/2] aarch64: Add AdvSIMD LUT extension and vluti2{q}_lane{q} intrinsics

2024-11-06 Thread Richard Sandiford
writes: > The AArch64 FEAT_LUT extension is optional from Armv9.2-a and mandatory > from Armv9.5-a. This extension introduces instructions for lookup table > read with 2-bit indices. > > This patch adds AdvSIMD LUT intrinsics for LUTI2, supporting table > lookup with 2-bit packed indices. The foll

Re: [PATCH] testsuite: arm: Use effective-target arm_fp for pr68620.c test

2024-11-06 Thread Torbjorn SVENSSON
On 2024-11-06 19:06, Richard Earnshaw (lists) wrote: On 06/11/2024 13:50, Torbjorn SVENSSON wrote: On 2024-11-06 14:04, Richard Earnshaw (lists) wrote: On 06/11/2024 12:23, Torbjorn SVENSSON wrote: On 2024-11-06 12:26, Richard Earnshaw (lists) wrote: On 06/11/2024 07:44, Christophe Lyo

Re: [PATCH] c++: Fix another crash with invalid new operators [PR117463]

2024-11-06 Thread Jason Merrill
On 11/6/24 2:23 PM, Simon Martin wrote: Even though this PR is very close to PR117101, it's not addressed by the fix I made through r15-4958-g5821f5c8c89a05 because cxx_placement_new_fn has the very same issue as std_placement_new_fn_p used to have. This patch fixes the issue exactly the same, b

Re: [PATCH 12/15] aarch64: Add common subset of SVE2p1 and SME

2024-11-06 Thread Richard Sandiford
Richard Sandiford writes: > Some instructions that were previously restricted to streaming mode > can also be used in non-streaming mode with SVE2.1. This patch adds > support for those, as well as the usual new-extension boilerplate. > A later patch will add the feature macro. > > gcc/ > *

[PATCH] c++: Fix another crash with invalid new operators [PR117463]

2024-11-06 Thread Simon Martin
Even though this PR is very close to PR117101, it's not addressed by the fix I made through r15-4958-g5821f5c8c89a05 because cxx_placement_new_fn has the very same issue as std_placement_new_fn_p used to have. This patch fixes the issue exactly the same, by checking the first parameter against NUL

Re: [PATCH 1/2] aarch64: Refactor infrastructure for advsimd intrinsics

2024-11-06 Thread Richard Sandiford
writes: > This patch refactors the infrastructure for defining advsimd pragma > intrinsics, adding support for more flexible type and signature > handling in future SIMD extensions. > > A new simd_type structure is introduced, which allows for consistent > mode and qualifier management across vari

Re: [COMMITED] [lto] ipcp don't propagate where not needed

2024-11-06 Thread Michal Jires
On Wed, 2024-11-06 at 17:33:50 +, Jonathan Wakely wrote: > > If there's going to be a constructor then it should initialize the members. > > Otherwise, your original patch was better, because you could write > this to get an all-zeros object: > > lto_encoder_entry e{}; > > Now you can't s

[PATCH 15/15] aarch64: Conditionally define __ARM_FEATURE_SVE2p1

2024-11-06 Thread Richard Sandiford
Previous patches are supposed to add full support for SVE2.1, so this patch advertises that through __ARM_FEATURE_SVE2p1. pragma_cpp_predefs_3.c had one fewer pop than push. The final test is triple-nested: - armv8-a (to start with a clean slate, untainted by command-line flags) - the maximal SV

[PATCH 13/15] aarch64: Add common subset of SVE2p1 and SME2

2024-11-06 Thread Richard Sandiford
This patch handles the SVE2p1 instructions that are shared with SME2. This includes the consecutive-register forms of the 2-register and 4-register loads and stores, but not the strided-register forms. gcc/ * config/aarch64/aarch64.h (TARGET_SVE2p1_OR_SME2): New macro. * config/aa

[PATCH 12/15] aarch64: Add common subset of SVE2p1 and SME

2024-11-06 Thread Richard Sandiford
Some instructions that were previously restricted to streaming mode can also be used in non-streaming mode with SVE2.1. This patch adds support for those, as well as the usual new-extension boilerplate. A later patch will add the feature macro. gcc/ * config/aarch64/aarch64-option-extensi

[PATCH 11/15] aarch64: Define arm_neon.h types in arm_sve.h too

2024-11-06 Thread Richard Sandiford
This patch moves the scalar and single-vector Advanced SIMD types from arm_neon.h into a private header, so that they can be defined by arm_sve.h as well. This is needed for the upcoming SVE2.1 hybrid-VLA reductions, which return 128-bit Advanced SIMD vectors. The approach follows Claudio's patch

[PATCH 10/15] aarch64: Add svboolx4_t

2024-11-06 Thread Richard Sandiford
This patch adds an svboolx4_t type, to go alongside the existing svboolx2_t type. It doesn't require any special ISA support beyond SVE itself and it currently has no associated instructions. gcc/ * config/aarch64/aarch64-modes.def (VNx64BI): New mode. * config/aarch64/aarch64-pro

Re: [PATCH 1/4] openmp: Tune omp_max_vf for offload targets

2024-11-06 Thread Andrew Stubbs
On 06/11/2024 17:59, Jakub Jelinek wrote: On Wed, Nov 06, 2024 at 05:53:53PM +, Andrew Stubbs wrote: I'm not sure why I didn't see this. Was it bootstrap tested or just built without bootstrap + tested? Otherwise it is just a warning. Apparently I forgot to rerun the bootstrap after maki

[PATCH 03/15] aarch64: Tweak definition of all_data & co

2024-11-06 Thread Richard Sandiford
Past extensions to SVE have required new subsets of all_data; the SVE2.1 patches will add another. This patch tries to make this more scalable by defining the multi-size *_data macros to be unions of single-size *_data macros. gcc/ * config/aarch64/aarch64-sve-builtins.cc (TYPES_all_data)

[PATCH 09/15] aarch64: Sort some SVE2 lists alphabetically

2024-11-06 Thread Richard Sandiford
gcc/ * config/aarch64/aarch64-sve-builtins-sve2.def: Sort entries alphabetically. * config/aarch64/aarch64-sve-builtins-sve2.h: Likewise. * config/aarch64/aarch64-sve-builtins-sve2.cc: Likewise. --- .../aarch64/aarch64-sve-builtins-sve2.cc | 24 +++---

[PATCH 08/15] aarch64: Factor out part of the SVE ext_def class

2024-11-06 Thread Richard Sandiford
This patch factors out some of ext_def into a base class, so that it can be reused for the SVE2.1 svextq intrinsic. gcc/ * config/aarch64/aarch64-sve-builtins-shapes.cc (ext_base): New base class, extracted from... (ext_def): ...here. --- .../aarch64/aarch64-sve-builtins-s

[PATCH 05/15] aarch64: Add an abstraction for vector base addresses

2024-11-06 Thread Richard Sandiford
In the upcoming SVE2.1 svld1q and svst1q intrinsics, the relationship between the base vector and the data vector differs from existing gather/scatter intrinsics. This patch adds a new abstraction to handle the difference. gcc/ * config/aarch64/aarch64-sve-builtins.h (function_sha

[PATCH 07/15] aarch64: Parameterise SVE pointer type inference

2024-11-06 Thread Richard Sandiford
All extending gather load intrinsics encode the source type in their name (e.g. svld1sb for an extending load from signed bytes). The type of the extension result has to be specified using an explicit type suffix; it isn't something that can be inferred from the arguments, since there are multiple

[PATCH 06/15] aarch64: Add an abstraction for scatter store type inference

2024-11-06 Thread Richard Sandiford
Until now, all data arguments to a scatter store needed to have 32-bit or 64-bit elements. This isn't true for the upcoming SVE2.1 svst1q scatter intrinsic, so this patch adds an abstraction around the restriction. gcc/ * config/aarch64/aarch64-sve-builtins-shapes.cc (store_scatte

Re: [PATCH] middle-end: Use rtx_equal_p in notice_stack_pointer_modification_1 [PR117359]

2024-11-06 Thread H. Peter Anvin
On November 6, 2024 10:15:13 AM PST, Jakub Jelinek wrote: >On Wed, Nov 06, 2024 at 10:03:25AM -0800, H. Peter Anvin wrote: >> The issue is that we want the frame pointer chain to be maintained, even >> across alternatives. > >If the current function doesn't have frame pointer set up yet (or is in

[PATCH 04/15] aarch64: Use braces in SVE TBL instructions

2024-11-06 Thread Richard Sandiford
GCC previously used the older assembly syntax for SVE TBL, with no braces around the second operand. This patch switches to the newer, official syntax, with braces around the operand. The initial SVE binutils submission supported both syntaxes, so there should be no issues with backwards compatib

[PATCH 02/15] aarch64: Test TARGET_STREAMING instead of TARGET_STREAMING_SME

2024-11-06 Thread Richard Sandiford
g:ede97598e2c recorded separate ISA requirements for streaming and non-streaming mode. The premise there was that AARCH64_FL_SME should not be included in the streaming mode requirements, since: (a) an __arm_streaming_compatible function wouldn't be in streaming mode if SME wasn't available.

[PATCH 01/15] aarch64: Make more use of TARGET_STREAMING_SME2

2024-11-06 Thread Richard Sandiford
Some code was checking TARGET_STREAMING and TARGET_SME2 separately, but we now have a macro to test both at once. gcc/ * config/aarch64/aarch64-sme.md: Use TARGET_STREAMING_SME2 instead of separate TARGET_STREAMING and TARGET_SME2 tests. * config/aarch64/aarch64-sve2.md: Li

[PATCH 00/15] aarch64: Add support for SVE2.1

2024-11-06 Thread Richard Sandiford
This series adds support for FEAT_SVE2p1 (-march=...+sve2p1). One thing that the extension does is make some SME and SME2 instructions available outside of streaming mode. It also adds quite a few new instructions. Some of those new instructions are shared with SME2.1, which will be added by a la

Re: [PATCH] middle-end: Use rtx_equal_p in notice_stack_pointer_modification_1 [PR117359]

2024-11-06 Thread Jakub Jelinek
On Wed, Nov 06, 2024 at 10:03:25AM -0800, H. Peter Anvin wrote: > The issue is that we want the frame pointer chain to be maintained, even > across alternatives. If the current function doesn't have frame pointer set up yet (or is in the epilogue after it got restored already), then the chain is s

Re: [PATCH] c: Implement C2y N3356, if declarations [PR117019]

2024-11-06 Thread Joseph Myers
On Wed, 6 Nov 2024, Marek Polacek wrote: > On Wed, Nov 06, 2024 at 09:42:02AM -0500, Marek Polacek wrote: > > On reflection, I'm not so sure about these anymore: > > > > On Mon, Nov 04, 2024 at 06:26:47PM -0500, Marek Polacek wrote: > > > + switch (extern int i = 0); /* { dg-error "in condition

Re: [PATCH] testsuite: arm: Use effective-target arm_fp for pr68620.c test

2024-11-06 Thread Richard Earnshaw (lists)
On 06/11/2024 13:50, Torbjorn SVENSSON wrote: > > > On 2024-11-06 14:04, Richard Earnshaw (lists) wrote: >> On 06/11/2024 12:23, Torbjorn SVENSSON wrote: >>> >>> >>> On 2024-11-06 12:26, Richard Earnshaw (lists) wrote: On 06/11/2024 07:44, Christophe Lyon wrote: > On Wed, 6 Nov 2024 at 0

Re: [PATCH] middle-end: Use rtx_equal_p in notice_stack_pointer_modification_1 [PR117359]

2024-11-06 Thread H. Peter Anvin
On November 6, 2024 8:31:53 AM PST, Uros Bizjak wrote: >On Wed, Nov 6, 2024 at 5:23 PM Jakub Jelinek wrote: >> >> On Wed, Nov 06, 2024 at 05:05:54PM +0100, Uros Bizjak wrote: >> > Please see [1]: >> > >> > /* >> > * This output constraint should be used for any inline asm which has a >> > "call

Re: [PATCH 1/4] openmp: Tune omp_max_vf for offload targets

2024-11-06 Thread Jakub Jelinek
On Wed, Nov 06, 2024 at 05:53:53PM +, Andrew Stubbs wrote: > I'm not sure why I didn't see this. Was it bootstrap tested or just built without bootstrap + tested? Otherwise it is just a warning. > I'm testing the attached patch. If it makes it to stage3, this is ok for trunk. Just 64U would

Re: Implement removal of malloc/free pairs with NULL check

2024-11-06 Thread Jan Hubicka
Hi, this is updated patch which adds -fmalloc-dce flag to control malloc/free removal. I ended up copying what -fallocation-dse does so -fmalloc-dce=1 enables malloc/free removal provided return value is unused otherwise and -fmalloc-dce=2 allows additional NULL pointer checks which it folds to no
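
[Illustration, not part of the posted patch] A minimal sketch of the kind of code the message above seems to target. The function name is hypothetical, and the reading of the two levels is an assumption based only on the truncated description: the allocation's sole uses are a NULL check and the matching free, so it would presumably need the second level (`-fmalloc-dce=2`) for the pair to be removed.

```c
#include <stdlib.h>

/* The pointer is only compared against NULL and then freed, so the
   allocation has no other observable use in this function.  */
int probe_allocation (void)
{
  void *p = malloc (16);
  if (p == NULL)
    return 1;
  free (p);
  return 0;
}
```

If the pair is removed and the check folded, the function would reduce to returning 0; whether a given compiler version does that is not something this sketch asserts.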

Re: [PATCH 1/4] openmp: Tune omp_max_vf for offload targets

2024-11-06 Thread Andrew Stubbs
On 06/11/2024 17:38, Andrew Pinski wrote: + if (ENABLE_OFFLOADING && offload) +{ + for (const char *c = getenv ("OFFLOAD_TARGET_NAMES"); c;) + { + if (startswith (c, "amdgcn")) + return ordered_max (64, omp_max_vf (false)); This causes a bootstrap failure for m

Re: [PATCH 1/4] openmp: Tune omp_max_vf for offload targets

2024-11-06 Thread Jakub Jelinek
On Wed, Nov 06, 2024 at 09:38:21AM -0800, Andrew Pinski wrote: > > + for (const char *c = getenv ("OFFLOAD_TARGET_NAMES"); c;) > > + { > > + if (startswith (c, "amdgcn")) > > + return ordered_max (64, omp_max_vf (false)); > > This causes a bootstrap failure for me (and

[PATCH 10/10] cp: Fix another assumption in the FE about constant vector indices.

2024-11-06 Thread Tejas Belagod
This patch adds a change to handle VLA's poly indices. gcc/ChangeLog: * cp/decl.cc (reshape_init_array_1): Handle poly indices. gcc/testsuite/ChangeLog: * g++.dg/ext/sve-sizeless-1.C: Update test to test initialize error. * g++.dg/ext/sve-sizeless-2.C: Likewise. --- gcc

[PATCH 01/10] aarch64: Fix ACLE macro __ARM_FEATURE_SVE_VECTOR_OPERATORS

2024-11-06 Thread Tejas Belagod
This patch enables ACLE macro __ARM_FEATURE_SVE_VECTOR_OPERATORS to indicate that C/C++ language operations are available natively on SVE ACLE types. gcc/ChangeLog: * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define __ARM_FEATURE_SVE_VECTOR_OPERATORS. --- gcc/con

[PATCH 09/10] c: Fix bounds checking for VLA and construct VLA vector constants

2024-11-06 Thread Tejas Belagod
This patch adds support for checking bounds of SVE ACLE vector initialization constructors. It also adds support to construct vector constant from init constructors. gcc/ChangeLog: * c/c-typeck.cc (process_init_element): Add check to restrict constructor length to the minimum vec

Re: Implement removal of malloc/free pairs with NULL check

2024-11-06 Thread Jakub Jelinek
On Wed, Nov 06, 2024 at 01:08:11PM +0100, Jakub Jelinek wrote: > Though, unsure how that > https://eel.is/c++draft/expr.new#14 > interacts with > https://eel.is/c++draft/expr.new#8 > and we'd have to check if we do the size checking before the > ::operator new/new[] calls or it ::operator new just

RE: [PATCH] avx10_2-comibf-2.c: Require AVX10.2 support

2024-11-06 Thread Liu, Hongtao
> -Original Message- > From: H.J. Lu > Sent: Wednesday, November 6, 2024 4:17 PM > To: Liu, Hongtao ; GCC Patches patc...@gcc.gnu.org>; Uros Bizjak > Subject: [PATCH] avx10_2-comibf-2.c: Require AVX10.2 support > > Since avx10_2-comibf-2.c is a run test, require AVX10.2 support. > >

Re: [PATCH 3/4] VN: Handle `(A CMP B) !=/== 0` for predicates [PR117414]

2024-11-06 Thread Richard Biener
On Sat, Nov 2, 2024 at 4:10 PM Andrew Pinski wrote: > > After the last patch, we also want to record `(A CMP B) != 0` > as `(A CMP B)` and `(A CMP B) == 0` as `(A CMP B)` with the > true/false edges swapped. > > This is enough to fix the original issue in `gcc.dg/tree-ssa/pr111456-1.c` > and make
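
The equivalence quoted above can be illustrated at the source level. This is a small sketch with hypothetical function names, not code from the patch: it only shows that testing a comparison result against zero carries the same information as the comparison itself (or its negation), which is what lets the predicate be recorded in the simpler form.

```c
#include <stdbool.h>

/* The plain predicate `A CMP B`.  */
bool
cmp_plain (int a, int b)
{
  return a < b;
}

/* `(A CMP B) != 0`: same truth value as the plain predicate.  */
bool
cmp_ne_zero (int a, int b)
{
  int t = a < b;
  return t != 0;
}

/* `(A CMP B) == 0`: the plain predicate with true and false swapped.  */
bool
cmp_eq_zero (int a, int b)
{
  int t = a < b;
  return t == 0;
}
```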

Re: [PATCH 1/4] openmp: Tune omp_max_vf for offload targets

2024-11-06 Thread Andrew Pinski
On Wed, Nov 6, 2024 at 7:28 AM Andrew Stubbs wrote: > > If requested, return the vectorization factor appropriate for the offload > device, if any. > > This change gives a significant speedup in the BabelStream "dot" benchmark on > amdgcn. > > The omp_adjust_chunk_size usecase is set "false", for

Re: [COMMITED] [lto] ipcp don't propagate where not needed

2024-11-06 Thread Jonathan Wakely
On 06/11/24 13:39 +0100, Michal Jires wrote: Committed with suggested changes. - This patch disables propagation of ipcp information into partitions where all instances of the node are marked to be inlined. Motivation: Incremen

Re: [PATCH] testsuite: arm: Use effective-target arm_fp for pr68620.c test

2024-11-06 Thread Christophe Lyon
On Wed, 6 Nov 2024 at 14:52, Torbjorn SVENSSON wrote: > > > > On 2024-11-06 14:04, Richard Earnshaw (lists) wrote: > > On 06/11/2024 12:23, Torbjorn SVENSSON wrote: > >> > >> > >> On 2024-11-06 12:26, Richard Earnshaw (lists) wrote: > >>> On 06/11/2024 07:44, Christophe Lyon wrote: > On Wed,

Re: [PATCH] inline-asm, i386: Add "redzone" clobber support

2024-11-06 Thread Jakub Jelinek
On Wed, Nov 06, 2024 at 06:04:43PM +0100, Jan Hubicka wrote: > LGTM, though if asm needs temporary memory it can ask for it explicitly. Sure, but the "redzone" is not about needing extra temporary memory, but about asking the compiler not to put any of its own temporaries in the red zone. The tem

Re: [PATCH] inline-asm, i386: Add "redzone" clobber support

2024-11-06 Thread Jan Hubicka
> Hi! > > The following patch adds a "redzone" clobber (recognized just > on targets which choose to recognize it, right now just x86), > with which one can mark the rare case where inline asm pushes > something on the stack or uses call instruction without taking > red zone into account (i.e. add

Re: [PATCH V2 1/1] Unify registered_pp_pragmas and registered_pragmas

2024-11-06 Thread Jason Merrill
On 11/6/24 8:08 AM, Paul Iannetta wrote: On Mon, Nov 04, 2024 at 01:36:56PM -0500, Jason Merrill wrote: On 11/3/24 12:26 PM, Paul Iannetta wrote: On Fri, Nov 01, 2024 at 11:45:07AM -0400, Jason Merrill wrote: On 10/31/24 6:43 AM, Paul Iannetta wrote: gcc/c-family/ChangeLog: * c-pragm

Re: [PATCH] middle-end: Use rtx_equal_p in notice_stack_pointer_modification_1 [PR117359]

2024-11-06 Thread Jakub Jelinek
On Wed, Nov 06, 2024 at 05:31:53PM +0100, Uros Bizjak wrote: > > So workaround about issues in some kernel tool? > > Not sure if gcc needs to provide workaround for that. > > Just do the call in a separate .subsection or add some magic labels around > > it that the tool can check and disable the wa

Re: [PATCH 3/4] VN: Handle `(A CMP B) !=/== 0` for predicates [PR117414]

2024-11-06 Thread Andrew Pinski
On Wed, Nov 6, 2024 at 4:56 AM Richard Biener wrote: > > On Sat, Nov 2, 2024 at 4:10 PM Andrew Pinski wrote: > > > > After the last patch, we also want to record `(A CMP B) != 0` > > as `(A CMP B)` and `(A CMP B) == 0` as `(A CMP B)` with the > > true/false edges swapped. > > > > This is enough t

[PATCH] inline-asm: Add support for cc operand modifier

2024-11-06 Thread Jakub Jelinek
Hi! As mentioned in the "inline asm: Add new constraint for symbol definitions" patch description, while the c operand modifier is documented to: Require a constant operand and print the constant expression with no punctuation. it actually doesn't do that with -fpic at least on some targets and h

[PATCH 00/10] aarch64: Enable C/C++ operations on SVE ACLE types.

2024-11-06 Thread Tejas Belagod
Hi, This patchset enables C/C++ operations on SVE ACLE types. The changes enable operations on SVE ACLE types to have the same semantics as GNU vector types. These operations like (+, -, &, | etc) behave exactly as they would behave on GNU vector types. The operations are self-contained as in we

Re: [PATCH] middle-end: Use rtx_equal_p in notice_stack_pointer_modification_1 [PR117359]

2024-11-06 Thread Jakub Jelinek
On Wed, Nov 06, 2024 at 10:44:54AM +0100, Uros Bizjak wrote: > After some more thinking and considering all recent discussion > (thanks!), I am convinced that a slightly simplified original patch > (attached), now one-liner, is the way to go. > > Let's look at the following test: > > --cut here--

Re: [PATCH] middle-end: Use rtx_equal_p in notice_stack_pointer_modification_1 [PR117359]

2024-11-06 Thread Uros Bizjak
On Wed, Nov 6, 2024 at 5:23 PM Jakub Jelinek wrote: > > On Wed, Nov 06, 2024 at 05:05:54PM +0100, Uros Bizjak wrote: > > Please see [1]: > > > > /* > > * This output constraint should be used for any inline asm which has a > > "call" > > * instruction. Otherwise the asm may be inserted before

[PATCH 1/3] dwarf: Delete dead code.

2024-11-06 Thread Michal Jires
This if branch checks for comdat_type_p (GTY union tag) and then uses incorrect union variant die_id.die_symbol. There is no way to create this combination of valid values even if we ignore the GTY. Running testsuite with abort() in branch confirms that it is never taken. gcc/ChangeLog:
