Re: [PATCH v3 1/1] Add warning for non-spec compliant FMV in Aarch64

2025-01-17 Thread Richard Sandiford
writes: > This patch adds a warning when FMV is used for Aarch64. > > The reasoning for this is the ACLE [1] spec for FMV has diverged > significantly from the current implementation and we want to prevent > potential future compatibility issues. > > There is a patch for an ACLE compliant version

[PATCH] aarch64: Add missing simd requirements for INS [PR118531]

2025-01-17 Thread Richard Sandiford
In g:b096a6ebe9d9f9fed4c105f6555f724eb32af95c I'd forgotten to gate some uses of INS on TARGET_SIMD. Tested on aarch64-linux-gnu. I'll push around this time on Monday if there are no comments before then. Richard gcc/ PR target/118531 * config/aarch64/aarch64.md (*insv_reg_)

Re: [PATCH v3 1/2] aarch64: Use standard names for saturating arithmetic

2025-01-17 Thread Richard Sandiford
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Friday, January 10, 2025 4:50 PM >> To: Akram Ahmad >> Cc: ktkac...@nvidia.com; gcc-patches@gcc.gnu.org >> Subject: Re: [PATCH v3 1/2] aarch64: Use standard names for sa

Re: [PATCH]AArch64: Drop ILP32 from default elf multilibs after deprecation

2025-01-17 Thread Richard Sandiford
Tamar Christina writes: >> -Original Message- >> From: Kyrylo Tkachov >> Sent: Friday, January 17, 2025 1:22 PM >> To: Tamar Christina >> Cc: GCC Patches ; nd ; Richard >> Earnshaw ; ktkac...@gcc.gnu.org; Richard >> Sandiford >> Subject:

Re: [PATCH v4] AArch64: Add LUTI ACLE for SVE2

2025-01-17 Thread Richard Sandiford
Saurabh Jha writes: > On 1/16/2025 8:44 AM, Richard Sandiford wrote: >> Thanks for the update. Mostly LGTM, but some comments below: >> >> writes: >>> diff --git a/gcc/config/aarch64/aarch64-sve2.md >>> b/gcc/config/aarch64/aarch64-sve2.md >>>

Re: [PATCH] AArch64: Deprecate -mabi=ilp32

2025-01-17 Thread Richard Sandiford
Tamar Christina writes: >> -Original Message- >> From: Wilco Dijkstra >> Sent: Tuesday, January 14, 2025 5:30 PM >> To: Richard Sandiford >> Cc: Richard Earnshaw ; ktkac...@nvidia.com; GCC >> Patches ; sch...@linux-m68k.org >> Subject:

Re: [PATCH]AArch64: have -mcpu=native detect architecture extensions for unknown non-homogenous systems [PR113257]

2025-01-16 Thread Richard Sandiford
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Thursday, January 16, 2025 7:11 AM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >> ; ktkac...@gcc.gnu.org >> Subject: Re

Re: [PATCH 11/11] aarch64: Make AARCH64_FL_CRYPTO always unset

2025-01-16 Thread Richard Sandiford
Andrew Carlotti writes: > This feature flag bit only exists to support the +crypto alias. Outside > of option processing this bit needs to be set or unset consistently. > This patch goes with the latter option. > > gcc/ChangeLog: > > * common/config/aarch64/aarch64-common.cc: Assert that CR

Re: [PATCH 10/11] aarch64: Refactor aarch64_rewrite_mcpu

2025-01-16 Thread Richard Sandiford
Andrew Carlotti writes: > Use aarch64_validate_cpu instead of the existing duplicate (and worse) > version of the -mcpu parsing code. > > The original code used fatal_error; I'm guessing that using error > instead should be ok. > > gcc/ChangeLog: > > * common/config/aarch64/aarch64-common.cc

Re: [PATCH 09/11] aarch64: Rewrite architecture strings for assembler

2025-01-16 Thread Richard Sandiford
Andrew Carlotti writes: > @@ -697,6 +697,50 @@ aarch64_get_extension_string_for_isa_flags > + const struct arch_info *entry; > + for (entry = all_architectures; entry->arch != aarch64_no_arch; entry++) > +{ > + if (entry->arch == arch) > + break; > +} Sorry for the nit, but for

Re: [PATCH 09/11] aarch64: Rewrite architecture strings for assembler

2025-01-16 Thread Richard Sandiford
Andrew Carlotti writes: > Add infrastructure to allow rewriting the architecture strings passed to > the assembler (either as -march options or .arch directives). There was > already canonicalisation everywhere except for an -march driver option > passed directly to the compiler; this patch appli

Re: [PATCH v5 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert pass

2025-01-16 Thread Richard Sandiford
Hongyu Wang writes: > From: Lingling Kong > > Hi, > > Thanks to Richard's review, the v5 patch contains the changes below: > > 1. Separate the maskload/maskstore emit out from noce_emit_cmove, add > a new function emit_mask_load_store in optabs.cc. > 2. Follow the operand order of maskload and m

Re: [PATCH v4] AArch64: Add LUTI ACLE for SVE2

2025-01-16 Thread Richard Sandiford
Thanks for the update. Mostly LGTM, but some comments below: writes: > diff --git a/gcc/config/aarch64/aarch64-sve2.md > b/gcc/config/aarch64/aarch64-sve2.md > index f8cfe08f4c0..0a1dc314f94 100644 > --- a/gcc/config/aarch64/aarch64-sve2.md > +++ b/gcc/config/aarch64/aarch64-sve2.md > @@ -133,6

Re: [PATCH v3] AArch64: Add LUTI ACLE for SVE2

2025-01-16 Thread Richard Sandiford
Saurabh Jha writes: > On 1/8/2025 11:13 AM, Richard Sandiford wrote: >> writes: >>> [...] >>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sve2.def >>> b/gcc/config/aarch64/aarch64-sve-builtins-sve2.def >>> index e726fa1fb68..0c4f8251ac0 10

Re: [PATCH]AArch64: have -mcpu=native detect architecture extensions for unknown non-homogenous systems [PR113257]

2025-01-15 Thread Richard Sandiford
Richard Sandiford writes: > Tamar Christina writes: >> Ok for master? and how do you feel about a backport for the two patches to >> help >> distros? > > Backporting to GCC 14 & GCC 13 sounds good. Not so sure about GCC 12, > since I think we should be ex

Re: [PATCH 3/3] AArch64: Add SVE vector cost to baseline tuning

2025-01-15 Thread Richard Sandiford
Wilco Dijkstra writes: > Hi Richard, > >> Sorry to be awkward, but I don't think we should put >> AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT in base. >> CHEAP_SHIFT_EXTEND is a good base flag because it means we can make full >> use of a certain group of instructions.  FULLY_PIPELINED_FMA simila

Re: [PATCH 2/3] AArch64: Add FULLY_PIPELINED_FMA to tune baseline

2025-01-15 Thread Richard Sandiford
Wilco Dijkstra writes: > ping >   > > Add FULLY_PIPELINED_FMA to tune baseline - this is a generic feature that is > already enabled for some cores, but benchmarking it shows it is faster on all > modern cores (SPECFP improves ~0.17% on Neoverse V1 and 0.04% on Neoverse N1). > > Passes regress & b

Re: [PATCH] tree-optimization/115895 - overrun with masked loop

2025-01-15 Thread Richard Sandiford
Richard Biener writes: > The following addresses the fact that with loop masking (or regular > mask loads) we do not implement load shortening but we override > the case where we need that for correctness. Likewise when we > attempt to use loop masking to handle large trailing gaps we cannot > do

Re: [PATCH]AArch64: have -mcpu=native detect architecture extensions for unknown non-homogenous systems [PR113257]

2025-01-15 Thread Richard Sandiford
Tamar Christina writes: > Ok for master? and how do you feel about a backport for the two patches to > help > distros? Backporting to GCC 14 & GCC 13 sounds good. Not so sure about GCC 12, since I think we should be extra cautious with the "most stable" branch, but let's see what others think.

Re: [wwwdocs] gcc-15: Deprecate ILP32 on AArch64

2025-01-14 Thread Richard Sandiford
Wilco Dijkstra writes: > As suggested in > https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673558.html > update the gcc-15 Changes page: > > Add ILP32 deprecation to Caveats section. OK once the GCC patch has gone in. Thanks, Richard > > --- > > diff --git a/htdocs/gcc-15/changes.html

Re: [PATCH 07/11] aarch64: Move arch/cpu parsing to aarch64-common.cc

2025-01-14 Thread Richard Sandiford
Andrew Carlotti writes: > Aside from moving the functions, the only changes are to make them > non-static, and to use the existing info arrays within aarch64-common.cc > instead of the info arrays remaining in aarch64.cc. > > gcc/ChangeLog: > > * common/config/aarch64/aarch64-common.cc >

Re: [PATCH 08/11] aarch64: Inline aarch64_get_all_extension_candidates

2025-01-14 Thread Richard Sandiford
Andrew Carlotti writes: > gcc/ChangeLog: > > * common/config/aarch64/aarch64-common.cc > (aarch64_get_all_extension_candidates): Inline into... > (aarch64_print_hint_for_extensions): ...this. OK, thanks. Richard > diff --git a/gcc/common/config/aarch64/aarch64-common.cc > b/g

Re: [PATCH 06/11] aarch64: Inline aarch64_print_hint_for_core_or_arch

2025-01-14 Thread Richard Sandiford
Andrew Carlotti writes: > It seems odd that we add "native" to the list for -march but not for > -mcpu. This is probably a bug, but for now we'll preserve the existing > behaviour. Yeah, agree it looks like a bug (but also that it's not something to fix as part of this series). > gcc/ChangeLog:

Re: [PATCH 05/11] aarch64: Adjust option parsing parameter types.

2025-01-14 Thread Richard Sandiford
Andrew Carlotti writes: > Replace `const struct processor *` in output parameters with > `aarch64_arch` or `aarch64_cpu`. > > Replace `std:string` parameter in aarch64_print_hint_for_extensions with > `char *`. > > Also name the return parameters more clearly and consistently. > > gcc/ChangeLog: >

Re: [PATCH 04/11] aarch64: Rename info structs in aarch64-common.cc

2025-01-14 Thread Richard Sandiford
Andrew Carlotti writes: > Also add a (currently unused) processor field to processor_info, and > change name from "" to NULL for the terminating array entries. > > gcc/ChangeLog: > > * common/config/aarch64/aarch64-common.cc > (struct aarch64_option_extension): Rename to.. > (str

Re: [PATCH 03/11] aarch64: Remove redundant generic cpu entry

2025-01-14 Thread Richard Sandiford
Andrew Carlotti writes: > The list of cores in aarch64-common.cc included an explicit "generic" > entry, despite this entry also being present in aarch64-cores.def. > > gcc/ChangeLog: > > * common/config/aarch64/aarch64-common.cc > (all_cores): Remove explicit generic entry. OK, thank

Re: [PATCH 02/11] aarch64: Replace duplicate cpu enums

2025-01-14 Thread Richard Sandiford
Andrew Carlotti writes: > Replace `enum aarch64_processor` and `enum target_cpus` with > `enum aarch64_cpu`, and prefix the entries with `AARCH64_CPU_`. > Also rename aarch64_none to aarch64_no_cpu. > > gcc/ChangeLog: > > * config/aarch64/aarch64-opts.h > (enum aarch64_processor): Rena

Re: [PATCH 01/11] aarch64: Improve mcpu/march conflict check

2025-01-14 Thread Richard Sandiford
Andrew Carlotti writes: > Features from a cpu or base architecture that were explicitly disabled > by a +nofeat option were being incorrectly added back in before checking > for conflicts between -mcpu and -march options. This patch instead > compares the returned feature masks directly. > > gcc/
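
Editorial sketch (not code from the patch): the difference between the two checks described above can be illustrated with plain bitmasks. The feature bits and function names here are hypothetical, and the exact way the old code re-added features is an assumption; the point is only that OR-ing the base features back in before comparing can hide a feature that was switched off with +nofeat.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only.  */
#define DEMO_FEAT_BASE  (UINT64_C (1) << 0)
#define DEMO_FEAT_EXTRA (UINT64_C (1) << 1)

/* Flawed variant (assumed shape): re-adding the base features before
   comparing hides anything the user disabled explicitly.  */
static bool
demo_conflict_after_readding (uint64_t cpu_flags, uint64_t arch_flags,
                              uint64_t base_flags)
{
  return (cpu_flags | base_flags) != (arch_flags | base_flags);
}

/* Direct variant: compare the masks exactly as option parsing
   returned them.  */
static bool
demo_conflict_direct (uint64_t cpu_flags, uint64_t arch_flags)
{
  return cpu_flags != arch_flags;
}
```

With a CPU mask that has DEMO_FEAT_EXTRA removed and an architecture mask that still has it, only the direct comparison reports a difference.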

Re: [RFA] [PR rtl-optimization/109592] Improve fwprop's handling of nested shifts/extensions

2025-01-14 Thread Richard Sandiford
Jeff Law writes: > On 12/30/24 3:02 PM, Richard Sandiford wrote: > >> >> So it seems like it's a bit of a mess :( >> >> If we do try to fix combine, I think something like the attached >> would fit within the current scheme. It is a pure shift-

Re: [PATCH] match: Keep conditional in simplification to constant [PR118140].

2025-01-14 Thread Richard Sandiford
"Robin Dapp" writes: >> OK, thanks. >> >> Richard > > The issue is also present on GCC 14 as well and the patch applies cleanly. > Regtested on rv64gcv_zvl512b. To make it explicit: OK to backport to 14? Yes, thanks. Richard

Re: [PATCH] AArch64: Deprecate -mabi=ilp32

2025-01-14 Thread Richard Sandiford
Wilco Dijkstra writes: > ILP32 was originally intended to make porting to AArch64 easier. Support was > never merged in the Linux kernel or GLIBC, so it has been unsupported for many > years. There isn't a benefit in keeping unsupported features forever, so > deprecate it now (and it could be re

Re: [PATCH]AArch64: don't override march to assembler with mcpu if march is specified [PR110901]

2025-01-14 Thread Richard Sandiford
Tamar Christina writes: > Hi All, > > When both -mcpu and -march are specified, the value of -march wins out. > > This is done correctly for the calls to cc1 and for the assembler directives > we > put out in assembly files. > > However in the call to as we don't do this and instead use the arch

Re: [PATCH]AArch64: have -mcpu=native detect architecture extensions for unknown non-homogenous systems [PR113257]

2025-01-13 Thread Richard Sandiford
Richard Sandiford writes: > Tamar Christina writes: >>> -Original Message- >>> From: Richard Sandiford >>> Sent: Monday, January 13, 2025 6:35 PM >>> To: Tamar Christina >>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >>

Re: [PATCH]AArch64: have -mcpu=native detect architecture extensions for unknown non-homogenous systems [PR113257]

2025-01-13 Thread Richard Sandiford
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Monday, January 13, 2025 6:35 PM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >> ; ktkac...@gcc.gnu.org >> Subject: Re: [PATCH]AArc

[gcc r15-6875] Fix build for STORE_FLAG_VALUE<0 targets [PR118418]

2025-01-13 Thread Richard Sandiford via Gcc-cvs
https://gcc.gnu.org/g:a1a14ce3c39c25fecf052ffde063fc0ecfc2ffa3 commit r15-6875-ga1a14ce3c39c25fecf052ffde063fc0ecfc2ffa3 Author: Richard Sandiford Date: Mon Jan 13 19:37:12 2025 + Fix build for STORE_FLAG_VALUE<0 targets [PR118418] I

[PATCH] Fix build for STORE_FLAG_VALUE<0 targets [PR118418]

2025-01-13 Thread Richard Sandiford
In g:06c4cf398947b53b4bfc65752f9f879bb2d07924 I mishandled signed comparisons of comparison results on STORE_FLAG_VALUE < 0 targets (despite specifically referencing STORE_FLAG_VALUE in the commit message). There, (lt TRUE FALSE) is true, although (ltu FALSE TRUE) still holds. Things get messy wi
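
Editorial sketch (not part of the message): the two orderings mentioned above can be checked in plain C, assuming a target where STORE_FLAG_VALUE is -1, so that a true comparison result is -1 and a false one is 0. The names DEMO_TRUE, DEMO_FALSE, demo_lt and demo_ltu are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding for a STORE_FLAG_VALUE == -1 target.  */
enum { DEMO_FALSE = 0, DEMO_TRUE = -1 };

/* Signed comparison, standing in for RTL "lt".  */
static bool
demo_lt (int32_t a, int32_t b)
{
  return a < b;
}

/* Unsigned comparison, standing in for RTL "ltu".  */
static bool
demo_ltu (int32_t a, int32_t b)
{
  return (uint32_t) a < (uint32_t) b;
}
```

Under this encoding demo_lt (DEMO_TRUE, DEMO_FALSE) is true because -1 < 0, while demo_ltu (DEMO_FALSE, DEMO_TRUE) is also true because 0 is below the all-ones bit pattern.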

Re: [PATCH]AArch64: have -mcpu=native detect architecture extensions for unknown non-homogenous systems [PR113257]

2025-01-13 Thread Richard Sandiford
Tamar Christina writes: > Hi All, > > in g:e91a17fe39c39e98cebe6e1cbc8064ee6846a3a7 we added the ability for > -mcpu=native on unknown CPUs to still enable architecture extensions. > > This has worked great but was only added for homogenous systems. > > However the same thing works for big.LITTLE

Re: [PATCH] AArch64: Deprecate -mabi=ilp32

2025-01-13 Thread Richard Sandiford
Iain Sandoe writes: > Hi Folks, > >> On 10 Jan 2025, at 18:30, Wilco Dijkstra wrote: >> >> Hi Andrew, >> >>> Personally I would like this deprecated even for bare-metal. Yes the >>> iwatch ABI is an ILP32 ABI but I don't see GCC implementing that any >>> time soon and I suspect it would not be

Re: [PATCH 3/3] aarch64: Add +cpa feature flag

2025-01-13 Thread Richard Sandiford
Andrew Carlotti writes: > This doesn't enable anything within the compiler, but this allows the > flag to be passed the assembler. There also doesn't appear to be a > kernel cpuinfo name yet. > > > Ok for master? > > gcc/ChangeLog: > > * config/aarch64/aarch64-arches.def (V9_5A): Add CPA. >

Re: [PATCH 1/3] aarch64: Add command line support for armv9.5-a

2025-01-13 Thread Richard Sandiford
Andrew Carlotti writes: > Ok for master? > > gcc/ChangeLog: > > * config/aarch64/aarch64-arches.def (V9_5A): New. > * doc/invoke.texi: Document armv9.5-a option. > > diff --git a/gcc/config/aarch64/aarch64-arches.def > b/gcc/config/aarch64/aarch64-arches.def > index > fd4881a8ebfbd34

Re: [PATCH] libatomic: Cleanup AArch64 ifunc selection

2025-01-10 Thread Richard Sandiford
Wilco Dijkstra writes: > Hi Richard, > >>> +  /* LSE2 is a prerequisite for atomic LDIAPP/STILP.  */ >>> +  if (!(hwcap & HWCAP_USCAT)) >>> return false; >> >> Is there a reason for not using has_lse2 here?  It'd be good to have >> a comment if so. > > Yes, the MRS instructions cause expensiv

Re: [PATCH 3/3] AArch64: Add SVE vector cost to baseline tuning

2025-01-10 Thread Richard Sandiford
Wilco Dijkstra writes: > Hi Kyrill, > >>> Add AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS and >>> AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT >>> to the baseline tuning since all modern cores use it.  Fix the >>> neoverse512tvb tuning to be >>> like Neoverse V1/V2. >> >> For neoversev512tvb this me

Re: [PATCH] libatomic: Cleanup AArch64 ifunc selection

2025-01-10 Thread Richard Sandiford
Wilco Dijkstra writes: > ping >   > > Simplify and cleanup ifunc selection logic.  Since LRCPC3 does > not imply LSE2, has_rcpc3() should also check LSE2 is enabled. > > Passes regress and bootstrap, OK for commit? > > libatomic: >     * config/linux/aarch64/host-config.h (has_lse2): Cleanup.
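
Editorial sketch (not the libatomic code): the point that LRCPC3 does not imply LSE2 amounts to testing both capability bits. The bit positions and names below are placeholders invented for illustration; real code would take the values from the kernel's hwcap headers, and which hwcap word carries each bit is an assumption here.

```c
#include <stdbool.h>

/* Placeholder bit values, not the kernel's real ones.  */
#define DEMO_HWCAP_USCAT   (1UL << 0)  /* stands in for the LSE2 bit */
#define DEMO_HWCAP2_LRCPC3 (1UL << 1)  /* stands in for the RCPC3 bit */

static bool
demo_has_rcpc3 (unsigned long hwcap, unsigned long hwcap2)
{
  /* LSE2 is treated as a prerequisite, so bail out early without it.  */
  if (!(hwcap & DEMO_HWCAP_USCAT))
    return false;
  return (hwcap2 & DEMO_HWCAP2_LRCPC3) != 0;
}
```

A system that reports the RCPC3 bit without the LSE2 bit is rejected by this check, which is the case the message calls out.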

Re: [PATCH v3 1/2] aarch64: Use standard names for saturating arithmetic

2025-01-10 Thread Richard Sandiford
if the patch lands early stage 4. > On 09/01/2025 23:04, Richard Sandiford wrote: >> Akram Ahmad writes: >>> In the above example, subtraction replaces the adds with subs and the >>> csinv with csel. The 32-bit case follows the same approach. Arithmetic >>> with a co

Re: [PATCH] AArch64: Remove Cortex-A57 FMA steering pass

2025-01-10 Thread Richard Sandiford
Wilco Dijkstra writes: > As a minor cleanup remove Cortex-A57 FMA steering pass. Since Cortex-A57 is > pretty old, there isn't any benefit of keeping this. > > Passes regress & bootstrap, OK for commit? > > gcc: > * config.gcc (extra_objs): Remove cortex-a57-fma-steering.o. > * config

Re: [PATCH] AArch64: Deprecate -mabi=ilp32

2025-01-10 Thread Richard Sandiford
Wilco Dijkstra writes: > ILP32 was originally intended to make porting to AArch64 easier. Support was > never merged in the Linux kernel or GLIBC, so it has been unsupported for many > years. There isn't a benefit in keeping unsupported features forever, so > deprecate it now (and it could be re

Re: [PATCH v2 2/2] aarch64: Use standard names for SVE saturating arithmetic

2025-01-10 Thread Richard Sandiford
Akram Ahmad writes: > Rename the existing SVE unpredicated saturating arithmetic instructions > to use standard names which are used by IFN_SAT_ADD and IFN_SAT_SUB. > > gcc/ChangeLog: > > * config/aarch64/aarch64-sve.md: Rename insns > > gcc/testsuite/ChangeLog: > > * gcc/testsuite/gcc

Re: [PATCH] rtl: Remove invalid compare simplification [PR117186]

2025-01-10 Thread Richard Sandiford
Richard Biener writes: > On Mon, Jan 6, 2025 at 2:12 PM Richard Sandiford > wrote: >> >> g:d882fe5150fbbeb4e44d007bb4964e5b22373021, posted at >> https://gcc.gnu.org/pipermail/gcc-patches/2000-July/033786.html , >> added code to treat: >> >> (set (

[gcc r15-6777] rtl: Remove invalid compare simplification [PR117186]

2025-01-10 Thread Richard Sandiford via Gcc-cvs
https://gcc.gnu.org/g:06c4cf398947b53b4bfc65752f9f879bb2d07924 commit r15-6777-g06c4cf398947b53b4bfc65752f9f879bb2d07924 Author: Richard Sandiford Date: Fri Jan 10 12:51:15 2025 + rtl: Remove invalid compare simplification [PR117186] g:d882fe5150fbbeb4e44d007bb4964e5b22373021

Re: [PATCH v2] Add warning for non-spec compliant FMV in Aarch64

2025-01-10 Thread Richard Sandiford
writes: > This patch adds a warning when FMV is used for Aarch64. > > The reasoning for this is the ACLE [1] spec for FMV has diverged > significantly from the current implementation and we want to prevent > potential future compatibility issues. > > There is a patch for an ACLE compliant version

Re: [PATCH v2] arm: [MVE intrinsics] Fix tuples field name (PR 118332)

2025-01-10 Thread Richard Sandiford
"Richard Earnshaw (lists)" writes: > On 09/01/2025 14:50, Christophe Lyon wrote: >> The previous fix only worked for C, for C++ we need to add more >> information to the underlying type so that >> finish_class_member_access_expr accepts it. >> >> We use the same logic as in aarch64's register_tup

Re: [PATCH] match: Keep conditional in simplification to constant [PR118140].

2025-01-10 Thread Richard Sandiford
"Robin Dapp" writes: > Hi, > > in PR118140 we simplify > > _ifc__33 = .COND_IOR (_41, d_lsm.7_11, _46, d_lsm.7_11); > > to "1": > > Match-and-simplified .COND_IOR (_41, d_lsm.7_11, _46, d_lsm.7_11) to 1 > > when _46 == 1. This happens by removing the conditional and applying > a | 1 = 1. Nor

Re: [PATCH v3 1/2] aarch64: Use standard names for saturating arithmetic

2025-01-09 Thread Richard Sandiford
Akram Ahmad writes: > Hi Kyrill, > > Thanks for the feedback on V2. I found a pattern which works for > the open-coded signed arithmetic, and I've implemented the other > feedback you provided as well. > > I've sent the modified patch in this thread as the SVE patch [2/2] > hasn't been changed, bu

Re: [PATCH 10/10] aarch64: Try to avoid passing new flags to assembler

2025-01-09 Thread Richard Sandiford
Richard Sandiford writes: > Andrew Carlotti writes: >> On Mon, Nov 25, 2024 at 11:26:39PM +0000, Richard Sandiford wrote: >>> Sorry for the slow review. >>> >>> Andrew Carlotti writes: >>> > These new flags (+fcma, +jscvt, +rcpc2, +jscvt, +frintt

Re: Questions about macro fusion pass

2025-01-09 Thread Richard Sandiford via Gcc
Hau Hsu via Gcc writes: > Hi, > > I have a question about GCC's macro fusion pass. > In the GCC internals doc, there is a hook for scheduling: > TARGET_SCHED_MACRO_FUSION_PAIR_P > > I

Re: [PATCH]AArch64: Fix costing of emulated gathers/scatters [PR118188]

2025-01-09 Thread Richard Sandiford
Tamar Christina writes: >> > + After the final loads are done it issues a >> > + vec_construct to recreate the vector from the scalar. For costing >> > when >> > + we see a vec_to_scalar on a stmt with VMAT_GATHER_SCATTER we are >> dealing >> > + with an emulated instruction and

Re: [PATCH v2] AArch64: Block combine_and_move from creating FP literal loads

2025-01-09 Thread Richard Sandiford
Wilco Dijkstra writes: > Hi Richard, > >> The patch below is what I meant.  It passes bootstrap & regression-test >> on aarch64-linux-gnu (and so produces the same results for the tests >> that you changed).  Do you see any problems with this version? >> If not, I think we should go with it. > > T

[gcc r15-6703] aarch64: Fix overly restrictive sibcall check [PR107102]

2025-01-08 Thread Richard Sandiford via Gcc-cvs
https://gcc.gnu.org/g:0de5c20b72a738782e31acce771c6f2085e1014b commit r15-6703-g0de5c20b72a738782e31acce771c6f2085e1014b Author: Richard Sandiford Date: Wed Jan 8 18:20:47 2025 + aarch64: Fix overly restrictive sibcall check [PR107102] aarch64_function_ok_for_sibcall required

Re: [PATCH v2] AArch64: Block combine_and_move from creating FP literal loads

2025-01-08 Thread Richard Sandiford
Wilco Dijkstra writes: > Hi Richard, > >> ...I still think we should avoid testing can_create_pseudo_p. >> Does it work with the last part replaced by: >> >>  if (!DECIMAL_FLOAT_MODE_P (mode)) >>    { >>  if (aarch64_can_const_movi_rtx_p (src, mode) >>  || aarch64_float_const_represent

[clang] [AArch64][Clang] Add support for __arm_agnostic("sme_za_state") (PR #121788)

2025-01-08 Thread Richard Sandiford via cfe-commits
@@ -7559,6 +7559,26 @@ The attributes ``__arm_in(S)``, ``__arm_out(S)``, ``__arm_inout(S)`` and }]; } +def ArmAgnosticDocs : Documentation { + let Category = DocCatArmSmeAttributes; + let Content = [{ +The ``__arm_agnostic`` keyword applies to prototyped function types an

Re: [PATCH] Disable a broken multiversioning optimisation

2025-01-08 Thread Richard Sandiford
Andrew Carlotti writes: > This patch skips redirect_to_specific clone for aarch64 and riscv, > because the optimisation has two flaws: > > 1. It checks the value of the "target" attribute, even on targets that > don't use this attribute for multiversioning. > > 2. The algorithm used is too aggress

Re: [PATCH v3] AArch64: Add LUTI ACLE for SVE2

2025-01-08 Thread Richard Sandiford
writes: > This patch introduces support for LUTI2/LUTI4 ACLE for SVE2. > > LUTI instructions are used for efficient table lookups with 2-bit > or 4-bit indices. LUTI2 reads indexed 8-bit or 16-bit elements from > the low 128 bits of the table vector using packed 2-bit indices, > while LUTI4 can re
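
Editorial sketch (not the ACLE intrinsic and not the exact instruction semantics): a simplified scalar model of a lookup driven by packed 2-bit indices. The function name is hypothetical, and the packing order (four indices per byte, least-significant pair first) is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model: each byte of PACKED holds four 2-bit indices,
   least-significant pair first, and each index selects one of the
   four entries in TABLE.  */
static void
demo_lut2_lookup_u8 (const uint8_t table[4], const uint8_t *packed,
                     size_t n_out, uint8_t *out)
{
  for (size_t i = 0; i < n_out; i++)
    {
      unsigned int shift = (unsigned int) (i % 4) * 2;
      unsigned int idx = (packed[i / 4] >> shift) & 0x3u;
      out[i] = table[idx];
    }
}
```

A 2-bit index can only address four table entries, which is why the lookup needs so little index data per output element.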

Re: [PATCH]AArch64: Fix costing of emulated gathers/scatters [PR118188]

2025-01-08 Thread Richard Sandiford
Tamar Christina writes: >> >> i.e. we use separate address arithmetic and avoid UMOVs. Counting >> >> two loads and one store for each element of the scatter store seems >> >> like overkill for that. >> > >> > Hmm agreed.. >> > >> > How about for stores we increase the load counts by count / 2? >

Re: [PATCH] docs: Document new hardreg PRE pass.

2025-01-08 Thread Richard Sandiford
Jeff Law writes: > On 1/7/25 11:17 AM, Richard Sandiford wrote: >> Andrew Carlotti writes: >>> I forgot to include this in the earlier patch; is this ok for master (once >>> the >>> pass is merged, of course)? >>> >>> gcc/ChangeLog:

Re: [PATCH] Prefer scalar_int_mode if the size - 1 is equal to UNITS_PER_WORD.

2025-01-07 Thread Richard Sandiford
Jeff Law writes: > On 1/7/25 2:09 AM, Tsung Chun Lin wrote: >> Hi, >> >> Could someone help merge this patch if there are no further concerns? > It'll get addressed. Many contributors have been on holiday and are > still catching up. FWIW, I'm happy to push the patch, but wasn't sure how to ch

Re: [PATCH] docs: Document new hardreg PRE pass.

2025-01-07 Thread Richard Sandiford
Andrew Carlotti writes: > I forgot to include this in the earlier patch; is this ok for master (once the > pass is merged, of course)? > > gcc/ChangeLog: > > * doc/passes.texi: Document hardreg PRE pass. > > > diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi > index > 639f6b325c8be47b

Re: [PATCH 10/10] aarch64: Try to avoid passing new flags to assembler

2025-01-07 Thread Richard Sandiford
Andrew Carlotti writes: > On Mon, Nov 25, 2024 at 11:26:39PM +0000, Richard Sandiford wrote: >> Sorry for the slow review. >> >> Andrew Carlotti writes: >> > These new flags (+fcma, +jscvt, +rcpc2, +jscvt, +frintts, +wfxt and +xs) >> > were only recen

[PATCH] aarch64: Fix overly restrictive sibcall check [PR107102]

2025-01-07 Thread Richard Sandiford
aarch64_function_ok_for_sibcall required the caller and callee to use the same PCS variant. However, it should be enough for the callee to preserve at least as much register state as the caller; preserving more state is fine. ARM_PCS_AAPCS64, ARM_PCS_SIMD, and ARM_PCS_SVE agree on what GPRs shoul
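
Editorial sketch (not the aarch64 backend code): the relaxed rule described above is a superset test. Here each PCS variant is modelled as a hypothetical bitmask with one bit per register it preserves across calls; the type and function names are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: one bit per register that a PCS variant preserves.  */
typedef uint64_t demo_preserved_set;

/* A sibcall returns straight to the caller's caller, so the callee
   must preserve every register the caller promised to preserve.
   Preserving more is harmless.  */
static bool
demo_sibcall_ok (demo_preserved_set caller, demo_preserved_set callee)
{
  return (caller & ~callee) == 0;
}
```

Requiring the two variants to be identical corresponds to testing caller == callee, which rejects the harmless case where the callee preserves extra registers.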

Re: [PATCH] arm: [MVE intrinsics] Fix tuples field name (PR 118332)

2025-01-07 Thread Richard Sandiford
Christophe Lyon writes: > A recent commit mistakenly changed the field name for tuples from > 'val' to '__val', but unlike SVE this name is mandated by ACLE. > > The patch simply switches back the name to 'val'. > > PR target/118332 > > gcc/ChangeLog: > > * config/arm/arm-mve-builtins.

[clang] [AArch64][Clang] Add support for __arm_agnostic("sme_za_state") (PR #121788)

2025-01-07 Thread Richard Sandiford via cfe-commits
@@ -7559,6 +7559,26 @@ The attributes ``__arm_in(S)``, ``__arm_out(S)``, ``__arm_inout(S)`` and }]; } +def ArmAgnosticDocs : Documentation { + let Category = DocCatArmSmeAttributes; + let Content = [{ +The ``__arm_agnostic`` keyword applies to prototyped function types an

Re: [PATCH v3] aarch64: remove extra XTN in vector concatenation

2025-01-06 Thread Richard Sandiford
Akram Ahmad writes: > Hi Richard, > > Thanks for the feedback. I've copied in the resulting patch here- if > this is okay, please could it be committed on my behalf? The patch > continues below. > > Many thanks, > > Akram Thanks. LGTM. Pushed to trunk. Richard > --- > > GIMPLE code which perfo

[gcc r15-6609] aarch64: remove extra XTN in vector concatenation

2025-01-06 Thread Richard Sandiford via Gcc-cvs
https://gcc.gnu.org/g:6069f02a486054484ad638b083cb3b9486bb4321 commit r15-6609-g6069f02a486054484ad638b083cb3b9486bb4321 Author: Akram Ahmad Date: Mon Jan 6 20:09:30 2025 + aarch64: remove extra XTN in vector concatenation GIMPLE code which performs a narrowing truncation on t

Re: [RFC][PATCH] AArch64: Remove AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS

2025-01-06 Thread Richard Sandiford
Jennifer Schmitz writes: >> It would also be good to check for performance regressions, now that we have >> a patch to test: >> I will run SPEC2017 with -mcpu=generic and -mcpu=native on Grace, but we >> would appreciate help with benchmarking on other platforms. >> Tamar, would you still be wil

Re: [PATCH v2 5/7] IRA+LRA: Let the backend request to split basic blocks

2025-01-06 Thread Richard Sandiford
for the backend to use whenever it finds it > necessary. > > gcc/ > * function.h (struct function): Add > `split_basic_blocks_after_reload' member. > * lra.cc (lra): Handle it. > * reload1.cc (reload): Likewise. > --- > This was approved by

[PATCH] rtl: Remove invalid compare simplification [PR117186]

2025-01-06 Thread Richard Sandiford
g:d882fe5150fbbeb4e44d007bb4964e5b22373021, posted at https://gcc.gnu.org/pipermail/gcc-patches/2000-July/033786.html , added code to treat: (set (reg:CC cc) (compare:CC (gt:M (reg:CC cc) 0) (lt:M (reg:CC cc) 0))) as a nop. This PR shows that that isn't always correct. The compare in the set a

Re: [RFA] [PR rtl-optimization/107455] Eliminate unnecessary constant load

2025-01-03 Thread Richard Sandiford
Jeff Law writes: > This resurrects a patch from a bit over 2 years ago that I never wrapped > up. IIRC, I ended up catching covid, then in the hospital for an > unrelated issue and it just got dropped on the floor in the insanity. > > The basic idea here is to help postreload-cse eliminate m

Re: [PATCH]AArch64: Implement four and eight chunk VLA concats [PR118272]

2025-01-03 Thread Richard Sandiford
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Friday, January 3, 2025 10:59 AM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >> ; ktkac...@gcc.gnu.org >> Subject: Re: [PATCH]A

[gcc r15-6551] rtlanal: Treat writes to sp as also writing to memory [PR117938]

2025-01-03 Thread Richard Sandiford via Gcc-cvs
https://gcc.gnu.org/g:355475e332f264107ef07555f7c379be7b85942f commit r15-6551-g355475e332f264107ef07555f7c379be7b85942f Author: Richard Sandiford Date: Fri Jan 3 18:12:07 2025 + rtlanal: Treat writes to sp as also writing to memory [PR117938] This PR was about a case in

Re: [PATCH]AArch64: Implement four and eight chunk VLA concats [PR118272]

2025-01-03 Thread Richard Sandiford
Tamar Christina writes: >> > >> > How about instead doing something like: >> > >> > worklist.reserve (nelts); >> > for (int i = 0; i < nelts; ++i) >> > worklist.quick_push (force_reg (elem_mode, XVECEXP (vals, 0, i))); >> > >> > while (nelts > 2) >> > { >> > for (int i = 0; i <

[PATCH] rtlanal: Treat writes to sp as also writing to memory [PR117938]

2025-01-03 Thread Richard Sandiford
This PR was about a case in which late-combine moved a stack deallocation across an earlier stack access. This was possible because the deallocation was missing the RTL-SSA equivalent of a vop, which in turn was because rtl_properties didn't treat the deallocation as writing to memory. I think th

Re: [PATCH]AArch64: Fix costing of emulated gathers/scatters [PR118188]

2025-01-02 Thread Richard Sandiford
Tamar Christina writes: >> >> So I think ideally, we should try to detect whether the indices come >> >> directly from memory or are the result of arithmetic. In the former case, >> >> we should do the loads adjustment above. In the latter case, we should >> >> keep the vec_to_scalar accounting

Re: [PATCH]AArch64: Fix costing of emulated gathers/scatters [PR118188]

2025-01-02 Thread Richard Sandiford
Tamar Christina writes: >> > [...] >> > #define iterations 10 >> > #define LEN_1D 32000 >> > >> > float a[LEN_1D], b[LEN_1D]; >> > >> > float >> > s4115 (int *ip) >> > { >> > float sum = 0.; >> > for (int i = 0; i < LEN_1D; i++) >> > { >> > sum += a[i] * b[ip[i]]; >

[pushed] Use _Float128 in test for PR118184

2025-01-02 Thread Richard Sandiford
The test was failing on x86 because longdouble128 only checks sizeof, rather than a full 128-bit payload. Using _Float128 is more portable and still exposes the original bug. Tested on aarch64-linux-gnu and x86_64-linux-gnu, pushed as obvious. Richard gcc/testsuite/ PR target/118184

[gcc r15-6506] Use _Float128 in test for PR118184

2025-01-02 Thread Richard Sandiford via Gcc-cvs
https://gcc.gnu.org/g:99d5ef700619c28904846399a6f6692af4c56b1b commit r15-6506-g99d5ef700619c28904846399a6f6692af4c56b1b Author: Richard Sandiford Date: Thu Jan 2 17:33:49 2025 + Use _Float128 in test for PR118184 The test was failing on x86 because longdouble128 only checks

Re: [PATCH]AArch64: Implement four and eight chunk VLA concats [PR118272]

2025-01-02 Thread Richard Sandiford
Tamar Christina writes:
> Hi All,
>
> The following testcase
>
> #pragma GCC target ("+sve")
> extern char __attribute__ ((simd, const)) fn3 (int, short);
> void test_fn3 (float *a, float *b, double *c, int n)
> {
>   for (int i = 0; i < n; ++i)
>     a[i] = fn3 (b[i], c[i]);
> }
>
>

Re: [PATCH]AArch64: Fix costing of emulated gathers/scatters [PR118188]

2025-01-02 Thread Richard Sandiford
Tamar Christina writes: > Hi All, > > When a target does not support gathers and scatters the vectorizer tries to > emulate these using scalar loads/stores and a reconstruction of vectors from > scalar. > > The loads are still marked with VMAT_GATHER_SCATTER to indicate that they are > gather/scat

Re: [PATCH] testsuite: libitm: Adjust how libitm.c++ passes link flags

2025-01-02 Thread Richard Sandiford
Matthew Malcomson writes: > On 1/2/25 12:08, Richard Sandiford wrote: >>> +# This set in order to give libitm.c++/c++.exp a nicely named flag to >>> set >>> +# when adding C++ options. >>> +set TEST_ALWAYS_FLAGS "" >> >>

Re: [PATCH] testsuite: libitm: Adjust how libitm.c++ passes link flags

2025-01-02 Thread Richard Sandiford
writes: > From: Matthew Malcomson > > For the `gcc` and `g++` tools we often pass -B/path/to/object/dir in via > `TEST_ALWAYS_FLAGS` (see e.g. asan.exp where this is set). > In libitm.c++/c++.exp we pass that -B flag via the `tool_flags` argument > to `dg-runtest`. > > Passing as the `tool_flags`

Re: [committed] Use u'' instead of '' in libgdiagnostics/conf.py

2025-01-02 Thread Richard Sandiford
Jakub Jelinek writes: > Hi! > > libgdiagnostics/conf.py breaks update-copyright.py --this-year, > which only accepts copyright year in u'' literals in python files, > not in ''. > > u'' strings is what e.g. libgccjit conf.py uses. > Tested by building libgdiagnostics docs without/with this patch,

[PATCH] aarch64: Detect word-level modification in early-ra [PR118184]

2025-01-02 Thread Richard Sandiford
REGMODE_NATURAL_SIZE is set to 64 bits for everything except VLA SVE modes. This means that it's possible to modify (say) the highpart of a TI pseudo or a V2DI pseudo independently of the lowpart. Modifying such highparts requires a reload if the highpart ends up in the upper 64 bits of an FPR, s

[gcc r15-6503] aarch64: Detect word-level modification in early-ra [PR118184]

2025-01-02 Thread Richard Sandiford via Gcc-cvs
https://gcc.gnu.org/g:2b687ad95de61091105d040d6bc06cb3d44ac3d1 commit r15-6503-g2b687ad95de61091105d040d6bc06cb3d44ac3d1 Author: Richard Sandiford Date: Thu Jan 2 11:34:52 2025 + aarch64: Detect word-level modification in early-ra [PR118184] REGMODE_NATURAL_SIZE is set to 64

Re: Questions about uses of (define_subst ...)

2024-12-31 Thread Richard Sandiford via Gcc
Hi, Sorry for the slow reply, didn't see this till now. Benoit Dinechin via Gcc writes: > Hello, > > I use (define_subst ...) in a way that solves my .md factoring > problems but I wonder if this use is expected and if it will be > maintained / documented in the future. > > Target is a 64-bit core

Re: [PATCH] Prefer scalar_int_mode if the size - 1 is equal to UNITS_PER_WORD.

2024-12-31 Thread Richard Sandiford
Tsung Chun Lin writes: > Address Richard's comment. > > Thanks > Jim > > Richard Sandiford wrote on Monday, 30 December 2024 at 7:50 PM: >> >> Tsung Chun Lin writes: >> > From ddb7852c92dc0222af9eeec1deafce753b3a7067 Mon Sep 17 00:00:00 2001 >> > From: Jim

Re: [RFA] [PR rtl-optimization/109592] Improve fwprop's handling of nested shifts/extensions

2024-12-30 Thread Richard Sandiford
Jeff Law writes: > The BZ in question is a failure to recognize a pair of shifts as a sign > extension. > > I originally thought simplify-rtx would be the right framework to > address this problem, but fwprop is actually better. We can write the > recognizer much simpler in that framework. > >

Re: [PATCH] simplify-rtx: Limit number of elts in when encoding.

2024-12-30 Thread Richard Sandiford
Jeff Law writes: > On 12/30/24 8:16 AM, Richard Sandiford wrote: > >> >> The divisor is by definition 1. I think dropping it would make the >> loop more obviously correct, since the same assumption is implicit in >> the loop body. > I'll likely pick this

Re: [PATCH v2 1/1] aarch64: remove extra XTN in vector concatenation

2024-12-30 Thread Richard Sandiford
Akram Ahmad writes: > GIMPLE code which performs a narrowing truncation on the result of a > vector concatenation currently results in an unnecessary XTN being > emitted following a UZP1 to concatenate the operands. In cases such as this, > UZP1 should instead use a smaller arrangement specifier to re

Re: [PATCH] AArch64: Cleanup alignment macros

2024-12-30 Thread Richard Sandiford
Richard Sandiford writes: > Wilco Dijkstra writes: >> Hi Richard, >> >>>> A common case is a constant string which is compared against some >>>> argument. Most string functions work on 8 or 16-byte quantities. If we >>>> ensure the whole

Re: [PATCH] simplify-rtx: Limit number of elts in when encoding.

2024-12-30 Thread Richard Sandiford
Andrew Pinski writes: > On Fri, Dec 27, 2024 at 3:19 AM Robin Dapp wrote: >> >> Thanks for the helpful suggestion. The attached v2 patch tries to implement >> it. >> >> It was bootstrapped and regtested on x86, aarch64 and Power 10. >> Also regtested on rv64gcv_zvl512b. >> >> Those are all littl

[gcc r15-6468] aarch64: Add missing makefile dependency

2024-12-30 Thread Richard Sandiford via Gcc-cvs
https://gcc.gnu.org/g:5f40ff8efde2b8b140f170619e99b6df9722f79d commit r15-6468-g5f40ff8efde2b8b140f170619e99b6df9722f79d Author: Richard Sandiford Date: Mon Dec 30 12:50:56 2024 + aarch64: Add missing makefile dependency gcc/ * config/aarch64/t-aarch64 (aarch64

[gcc r15-6467] aarch64: Use mf8 instead of f8 in builtin definitions

2024-12-30 Thread Richard Sandiford via Gcc-cvs
https://gcc.gnu.org/g:834939a82ea23daaf99c58ea1694079f22eca6f4 commit r15-6467-g834939a82ea23daaf99c58ea1694079f22eca6f4 Author: Richard Sandiford Date: Mon Dec 30 12:50:55 2024 + aarch64: Use mf8 instead of f8 in builtin definitions The intrinsic type suffix for modal
