Re: [PATCH 0/2] aarch64: Add -msimd-memops option controlling SIMD usage

2025-08-07 Thread Wilco Dijkstra
Hi Keith, Thanks for the explanation - however I'm afraid compilers don't have a concept of implicit vs explicit use of operations or registers. > I'm trying to find all cases where that happens for data types other > than SIMD/FP. Do you know of other places where the compiler implicitly > uses

Re: [PATCH] aarch64: Mark SME functions as .variant_pcs [PR121414]

2025-08-07 Thread Wilco Dijkstra
Hi Richard, > Unlike base PCS functions, __arm_streaming and __arm_streaming_compatible > functions allow/require PSTATE.SM to be 1 on entry, so they need to > be treated as STO_AARCH64_VARIANT_PCS. > > Similarly, functions that share ZA or ZT0 with their callers require > ZA to be active on entry

[PATCH] optab: Add optab for isnan

2025-08-07 Thread Wilco Dijkstra
Add an optab for isnan. This requires changes to the existing folding code to extend the interclass_mathfn infrastructure to support BUILT_IN_ISNAN. It now checks for a valid optab before emitting the generic expansion. There is no change if no optab is defined. Update documentation, including t

[PATCH 0/2] aarch64: Add -msimd-memops option controlling SIMD usage

2025-08-06 Thread Wilco Dijkstra
Hi Keith, > This option (enabled by default) preserves existing behavior by > allowing use of Advanced SIMD registers while expanding > memset/memcpy/memmove operations into inline instructions. > > Disabling this option prevents use of these registers for environments > where the FPU may be disab

[PATCH 8/8] aarch64: Use cc when CB/CBB/CBH is out-of-range

2025-08-06 Thread Wilco Dijkstra
Hi Richard, +++ b/gcc/config/aarch64/aarch64.md @@ -876,10 +876,16 @@ (clobber (reg:CC CC_REGNUM))] "TARGET_CMPBR && aarch64_cb_rhs (, operands[1])" { -return (get_attr_far_branch (insn) == FAR_BRANCH_NO) - ? "cb\\t%0, %1, %l2" - : aarch64_gen_far_branch (operands, 2, -

[PATCH 0/8] aarch64: CMPBR fixes

2025-08-05 Thread Wilco Dijkstra
Hi Richard, This is a really good improvement - I've built all of SPEC2017 without any issues. Overall it shows almost 3.0% codesize reduction with +cmpbr! I noticed that the patches slightly increase codesize even without +cmpbr - not quite sure why. So overall this looks OK for commit. Btw as

[PATCH 6/8] aarch64: Add cc clobber to compare-and-branch patterns

2025-08-05 Thread Wilco Dijkstra
>On 8/5/25 19:43, Richard Henderson wrote: >> That said, I'm a little confused >> why we'd want to use SUBS+B.{EQ,NE} instead of SUB+CB{Z,NZ}. > > The answer to that is that B.{EQ,NE} converts easily to CSEL/CSINC/CSINV. I did it originally because that works out best - CBZ has a shorter branch

[gcc r16-2682] AArch64: Use correct cost for shifted halfword load/stores

2025-07-31 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:731649066f0fd2e2b2fbfd8668e001c3e91290d6 commit r16-2682-g731649066f0fd2e2b2fbfd8668e001c3e91290d6 Author: Wilco Dijkstra Date: Thu Jun 26 15:41:06 2025 + AArch64: Use correct cost for shifted halfword load/stores Since all Armv9 cores support shifted

[gcc r16-2684] libgcc: Update FMV features to latest ACLE spec 2024Q4

2025-07-31 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:9996036205b5a71e7738f2daa29f4e6f79886a4e commit r16-2684-g9996036205b5a71e7738f2daa29f4e6f79886a4e Author: Wilco Dijkstra Date: Tue Mar 25 15:51:42 2025 + libgcc: Update FMV features to latest ACLE spec 2024Q4 Update FMV features to latest ACLE spec of

[gcc r16-2683] libgcc: Cleanup HWCAP defines in cpuinfo.c

2025-07-31 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:a6bb6934a491015c4d3f08763455d86ccfb3bcbe commit r16-2683-ga6bb6934a491015c4d3f08763455d86ccfb3bcbe Author: Wilco Dijkstra Date: Mon Apr 28 16:20:15 2025 + libgcc: Cleanup HWCAP defines in cpuinfo.c Cleanup HWCAP defines - rather than including hwcap.h

Re: [PATCH] aarch64: Fix endianness of DFmode vector constants

2025-07-09 Thread Wilco Dijkstra
Hi Richard,   > aarch64_simd_valid_imm tries to decompose a constant into a repeating > series of 64 bits, since most Advanced SIMD and SVE immediate forms > require that.  (The exceptions are handled first.)  It does this by > building up a byte-level register image, lsb first.  If the image does

[PATCH] AArch64: Use correct cost for shifted halfword load/stores

2025-07-01 Thread Wilco Dijkstra
Since all Armv9 cores support shifted LDRH/STRH, use the correct cost of zero for these. Passes regress, OK for commit? gcc: * config/aarch64/tuning_models/generic_armv9_a.h (generic_armv9_a_addrcost_table): Use zero cost for himode. --- diff --git a/gcc/config/aarch64/tuning_mo

Re: alx-0029r4 - Restore the traditional realloc(3) specification

2025-06-25 Thread Wilco Dijkstra
Hi Alejandro, > > > >  +XXX) > > > >  +While atypical, > > > >  +realloc may fail > > > >  +for a call that shrinks the block of memory. > > > > > > Is it worth wording this as "may fail or return a different pointer > > > for a call that shrinks the block of memory"? > Oh, the text is still ther

[PATCH] AArch64: Disable TARGET_CONST_ANCHOR

2025-06-20 Thread Wilco Dijkstra
TARGET_CONST_ANCHOR appears to trigger too often, even on simple immediates. It inserts extra ADD/SUB instructions even when a single MOV exists. Disable it to improve overall code quality: on SPEC2017 it removes 1850 ADD/SUB instructions and 630 spill instructions, and SPECINT is ~0.06% faster on

[PATCH] libgcc: Cleanup HWCAP defines in cpuinfo.c

2025-04-30 Thread Wilco Dijkstra
Cleanup HWCAP defines - rather than including hwcap.h and then repeating it using #ifndef, just define the HWCAPs we need exactly as in hwcap.h. libgcc: * config/aarch64/cpuinfo.c: Cleanup HWCAP defines. --- diff --git a/libgcc/config/aarch64/cpuinfo.c b/libgcc/config/aarch64/cpuinfo.c

[PATCH] libgcc: Update FMV features to latest ACLE spec 2024Q4

2025-04-30 Thread Wilco Dijkstra
Update FMV features to latest ACLE spec of 2024Q4 - several features have been removed or merged. Add FMV support for CSSC and MOPS. Preserve the ordering in enum CPUFeatures. gcc: * common/config/aarch64/cpuinfo.h: Remove unused features, add FEAT_CSSC and FEAT_MOPS.

Re: libatomic: use HWCAPs in AArch64 ifunc tests

2025-03-13 Thread Wilco Dijkstra
Hi Richard, > Could you give details?  I thought it was always known that trapped > system register accesses were slow.  In the previous versions, the > checks seemed to be presented as an up-front price worth paying for > faster atomic operations, on the systems that would use those paths. > Now

[gcc r15-8032] libgcc: Remove PREDRES and LS64 from AArch64 cpuinfo

2025-03-13 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:6e47e6d48844ee578fd384aaa4b8cd62d73b49db commit r15-8032-g6e47e6d48844ee578fd384aaa4b8cd62d73b49db Author: Wilco Dijkstra Date: Mon Feb 24 16:38:02 2025 + libgcc: Remove PREDRES and LS64 from AArch64 cpuinfo Change AArch64 cpuinfo to follow the latest

Re: AArch64: Turn off outline atomics with -mcmodel=large (PR112465)

2025-03-12 Thread Wilco Dijkstra
Hi Richard, > That was also what I was trying to say.  In the worst case, the linked > object has to meet the requirements of the lowest common denominator. > > And my supposition was that that isn't a property of static vs dynamic. But it is. Dynamic linking supports mixing different code models

Re: AArch64: Turn off outline atomics with -mcmodel=large (PR112465)

2025-03-07 Thread Wilco Dijkstra
Hi Richard, >> Basically the small and large model are fundamentally incompatible. The >> infamous >> "dumb linker" approach means it doesn't try to sort sections, so an ADRP >> relocation >> will be out of reach if its data is placed after a huge array. Static >> linking with GLIBC or >> enabl

[gcc r15-7871] AArch64: Enable early scheduling for -O3 and higher (PR118351)

2025-03-06 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:f870302515d5fcf7355f0108c3ead0038ff326fd commit r15-7871-gf870302515d5fcf7355f0108c3ead0038ff326fd Author: Wilco Dijkstra Date: Mon Mar 3 16:47:32 2025 + AArch64: Enable early scheduling for -O3 and higher (PR118351) Enable the early scheduler on AArch64

Re: AArch64: Turn off outline atomics with -mcmodel=large (PR112465)

2025-03-04 Thread Wilco Dijkstra
Hi Ramana, > -Generate code for the large code model.  This makes no assumptions about > -addresses and sizes of sections.  Programs can be statically linked only.  > The > +Generate code for the large code model.  This allows large .bss and .data > +sections, however .text and .rodata must still

Re: AArch64: Turn off outline atomics with -mcmodel=large (PR112465)

2025-03-04 Thread Wilco Dijkstra
Hi Kyrill, > This restriction should be documented in invoke.texi IMO. > I also think it would be more user friendly to warn them about the > incompatibility if an explicit -moutline-atomics option is passed. > It’s okay though to silently turn off the implicit default-on option though. I've upd

Re: AArch64: Enable early scheduling for -O3 and higher (PR118351)

2025-03-04 Thread Wilco Dijkstra
Hi Richard&Kyrill, >> I’m in favour of this. > > Yeah, seems ok to me too.  I suppose we ought to update the documentation too: I've added a note to the documentation. However it is impossible to be complete here since many targets switch off early scheduling under various circumstances. So I'v

libatomic: use HWCAPs in AArch64 ifunc tests

2025-03-03 Thread Wilco Dijkstra
Feedback from the kernel team suggests that it's best to only use HWCAPs rather than also use low-level checks as done by has_lse128() and has_rcpc3(). So change these to just use HWCAPs which simplifies the code and speeds up ifunc selection by avoiding expensive system register accesses. Passes

libgcc: Remove PREDRES and LS64 from AArch64 cpuinfo

2025-03-03 Thread Wilco Dijkstra
Change AArch64 cpuinfo to follow the latest updates to the FMV spec [1]: Remove FEAT_PREDRES and FEAT_LS64*. Preserve the ordering in enum CPUFeatures. Passes regress, OK for commit? [1] https://github.com/ARM-software/acle/pull/382 gcc: * common/config/aarch64/cpuinfo.h: Remove FEAT_PR

AArch64: Enable early scheduling for -O3 and higher (PR118351)

2025-03-03 Thread Wilco Dijkstra
Enable the early scheduler on AArch64 for O3/Ofast. This means GCC15 benefits from much faster build times with -O2, but avoids the regressions in lbm which is very sensitive to minor scheduling changes due to long FMA chains. We can then revisit this for GCC16. gcc: PR target/118351

AArch64: Turn off outline atomics with -mcmodel=large (PR112465)

2025-03-03 Thread Wilco Dijkstra
Outline atomics is not designed to be used with -mcmodel=large, so disable it automatically if the large code model is used. Passes regress, OK for commit? gcc: PR target/112465 * config/aarch64/aarch64.cc (aarch64_override_options_after_change_1): Turn off outline atomic
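
A minimal sketch of the override described above, not the actual code in aarch64_override_options_after_change_1. The struct, enum and function names are hypothetical. The warning for an explicit -moutline-atomics follows the suggestion made in the review thread; whether the committed patch behaves exactly this way is not visible in the truncated text.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the real option state; not GCC internals.  */
enum sketch_cmodel { SKETCH_CMODEL_SMALL, SKETCH_CMODEL_LARGE };

struct sketch_options
{
  enum sketch_cmodel cmodel;
  bool outline_atomics;           /* Effective setting (default-on or explicit).  */
  bool outline_atomics_explicit;  /* True if the user passed -moutline-atomics.  */
};

/* Turn off outline atomics when the large code model is selected.
   Returns true if a diagnostic would be appropriate, i.e. the user
   explicitly asked for outline atomics (assumed policy, see above).  */
bool
sketch_override_outline_atomics (struct sketch_options *opts)
{
  if (opts->cmodel != SKETCH_CMODEL_LARGE || !opts->outline_atomics)
    return false;

  bool warn = opts->outline_atomics_explicit;
  opts->outline_atomics = false;
  return warn;
}
```

With the implicit default the option is dropped silently; only an explicit request combined with the large code model would be reported.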

[gcc r15-6922] AArch64: Add FULLY_PIPELINED_FMA to tune baseline

2025-01-15 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:2713f6bb90765de81954275a30c62432d30e1d68 commit r15-6922-g2713f6bb90765de81954275a30c62432d30e1d68 Author: Wilco Dijkstra Date: Thu Nov 14 14:34:17 2024 + AArch64: Add FULLY_PIPELINED_FMA to tune baseline Add FULLY_PIPELINED_FMA to tune baseline - this

gcc-wwwdocs branch master updated. 3cf8c149005ecd153b1bb0a082920f95edb670b0

2025-01-15 Thread Wilco Dijkstra via Gcc-cvs-wwwdocs
--- commit 3cf8c149005ecd153b1bb0a082920f95edb670b0 Author: Wilco Dijkstra Date: Tue Jan 14 16:14:52 2025 + gcc-15: Add ILP32 depreciation on AArch64. diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html index 1c690c4a..d5037efb 100644 --- a/htdocs/gcc-15/changes.html +++ b/htdocs/gcc-15/changes.htm

[gcc r15-6923] AArch64: Update neoverse512tvb tuning

2025-01-15 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:4ce502f31f95ec19e7d347d43afcd015895f135d commit r15-6923-g4ce502f31f95ec19e7d347d43afcd015895f135d Author: Wilco Dijkstra Date: Fri Jan 10 19:48:02 2025 + AArch64: Update neoverse512tvb tuning Fix the neoverse512tvb tuning to be like Neoverse V1/V2 and

[gcc r15-6921] AArch64: Deprecate -mabi=ilp32

2025-01-15 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:625ea3c6ea1811388d030eddff57cd46c209d49a commit r15-6921-g625ea3c6ea1811388d030eddff57cd46c209d49a Author: Wilco Dijkstra Date: Thu Jan 9 19:41:14 2025 + AArch64: Deprecate -mabi=ilp32 ILP32 was originally intended to make porting to AArch64 easier

Re: [PATCH 3/3] AArch64: Add SVE vector cost to baseline tuning

2025-01-14 Thread Wilco Dijkstra
Hi Richard, > Sorry to be awkward, but I don't think we should put > AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT in base. > CHEAP_SHIFT_EXTEND is a good base flag because it means we can make full > use of a certain group of instructions.  FULLY_PIPELINED_FMA similarly > means that FMA chains beh

Re: [PATCH] AArch64: Deprecate -mabi=ilp32

2025-01-14 Thread Wilco Dijkstra
Hi Richard, >> +  if (TARGET_ILP32) >> +    warning (OPT_Wdeprecated, "%<-mabi=ilp32%> is deprecated."); > > There should be no "." at the end of the message. Right, fixed in v2 below. > Otherwise it looks good to me, although like Kyrill says, it'll also > need a release note. I've added one,

[wwwdocs] gcc-15: Deprecate ILP32 on AArch64

2025-01-14 Thread Wilco Dijkstra
As suggested in https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673558.html update the gcc-15 Changes page: Add ILP32 deprecation to Caveats section. --- diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html index 1c690c4a168f4d6297ad33dd5b798e9200792dc5..d5037efb34cc8e6

Re: [PATCH] AArch64: Deprecate -mabi=ilp32

2025-01-13 Thread Wilco Dijkstra
Hi all, > In that case, I'm coming round to the idea of deprecating ILP32. > I think it was already common ground that the GNU/Linux support is dead. > watchOS would use Mach objects rather than ELF.  As you say, it isn't > clear how much of the current ILP32 support would be relevant for it. > An

[gcc r15-6802] libatomic: Cleanup AArch64 ifunc selection

2025-01-10 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:81bcf412c1c221bc2557666a6ca8381dac1de097 commit r15-6802-g81bcf412c1c221bc2557666a6ca8381dac1de097 Author: Wilco Dijkstra Date: Fri Jan 10 18:01:58 2025 + libatomic: Cleanup AArch64 ifunc selection Simplify and cleanup ifunc selection logic. Since

Re: [PATCH] AArch64: Cleanup alignment macros

2025-01-10 Thread Wilco Dijkstra
Hi Richard, > It looks like you committed the original version instead, with no extra > explanation.  I suppose I should have asked for another review round > instead. Did you check the commit log? Change the AARCH64_EXPAND_ALIGNMENT macro into proper function calls to make future change

Re: [PATCH] libatomic: Cleanup AArch64 ifunc selection

2025-01-10 Thread Wilco Dijkstra
Hi Richard, > Yeah, somewhat.  But won't we go on to test has_lse2 anyway, due to: > > #  elif defined (LSE2_LRCPC3_ATOP) > #   define IFUNC_NCOND(N)   2 > #   define IFUNC_COND_1 (has_rcpc3 (hwcap, features)) > #   define IFUNC_COND_2 (has_lse2 (hwcap, features)) > > If we want to reduce the

Re: [PATCH] AArch64: Deprecate -mabi=ilp32

2025-01-10 Thread Wilco Dijkstra
Hi Andrew, > Personally I would like this deprecated even for bare-metal. Yes the > iwatch ABI is an ILP32 ABI but I don't see GCC implementing that any > time soon and I suspect it would not be hard to resurrect the code at > that point. My patch deprecates it in all cases currently. It will be

Re: [PATCH] libatomic: Cleanup AArch64 ifunc selection

2025-01-10 Thread Wilco Dijkstra
Hi Richard, >> +  /* LSE2 is a prerequisite for atomic LDIAPP/STILP.  */ >> +  if (!(hwcap & HWCAP_USCAT)) >> return false; > > Is there a reason for not using has_lse2 here?  It'd be good to have > a comment if so. Yes, the MRS instructions cause expensive traps, so we try to avoid them whe

Re: [PATCH 3/3] AArch64: Add SVE vector cost to baseline tuning

2025-01-10 Thread Wilco Dijkstra
Hi Kyrill, >> Add AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS and >> AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT >> to the baseline tuning since all modern cores use it.  Fix the >> neoverse512tvb tuning to be >> like Neoverse V1/V2. > > For neoversev512tvb this means adding AARCH64_EXTRA_TUNE_AVOI

Re: [PATCH 3/3] AArch64: Add SVE vector cost to baseline tuning

2025-01-10 Thread Wilco Dijkstra
ping   Add AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS and AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT to the baseline tuning since all modern cores use it.  Fix the neoverse512tvb tuning to be like Neoverse V1/V2. gcc/ChangeLog:     * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TU

Re: [PATCH 2/3] AArch64: Add FULLY_PIPELINED_FMA to tune baseline

2025-01-10 Thread Wilco Dijkstra
ping   Add FULLY_PIPELINED_FMA to tune baseline - this is a generic feature that is already enabled for some cores, but benchmarking it shows it is faster on all modern cores (SPECFP improves ~0.17% on Neoverse V1 and 0.04% on Neoverse N1). Passes regress & bootstrap, OK for commit? gcc/ChangeLo

Re: [PATCH] libatomic: Cleanup AArch64 ifunc selection

2025-01-10 Thread Wilco Dijkstra
ping   Simplify and cleanup ifunc selection logic.  Since LRCPC3 does not imply LSE2, has_rcpc3() should also check LSE2 is enabled. Passes regress and bootstrap, OK for commit? libatomic:     * config/linux/aarch64/host-config.h (has_lse2): Cleanup.     (has_lse128): Likewise.     (

[PATCH] AArch64: Deprecate -mabi=ilp32

2025-01-10 Thread Wilco Dijkstra
ILP32 was originally intended to make porting to AArch64 easier. Support was never merged in the Linux kernel or GLIBC, so it has been unsupported for many years. There isn't a benefit in keeping unsupported features forever, so deprecate it now (and it could be removed in a future release). Pa

[PATCH] AArch64: Remove Cortex-A57 FMA steering pass

2025-01-10 Thread Wilco Dijkstra
As a minor cleanup remove Cortex-A57 FMA steering pass. Since Cortex-A57 is pretty old, there isn't any benefit of keeping this. Passes regress & bootstrap, OK for commit? gcc: * config.gcc (extra_objs): Remove cortex-a57-fma-steering.o. * config/aarch64/aarch64-passes.def: Remo

Re: [PATCH v2] AArch64: Block combine_and_move from creating FP literal loads

2025-01-09 Thread Wilco Dijkstra
Hi Richard, > The patch below is what I meant.  It passes bootstrap & regression-test > on aarch64-linux-gnu (and so produces the same results for the tests > that you changed).  Do you see any problems with this version? > If not, I think we should go with it. Thanks for the detailed example - u

[gcc r15-6661] AArch64: Switch off early scheduling

2025-01-07 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:c5db3f50bdf34ea96fd193a2a66d686401053bd2 commit r15-6661-gc5db3f50bdf34ea96fd193a2a66d686401053bd2 Author: Wilco Dijkstra Date: Fri Nov 1 14:40:26 2024 + AArch64: Switch off early scheduling The early scheduler takes up ~33% of the total build time

[gcc r15-6660] AArch64: Block combine_and_move from creating FP literal loads

2025-01-07 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:45d306a835cb3f865a897dc7c04efbe1f9f46c28 commit r15-6660-g45d306a835cb3f865a897dc7c04efbe1f9f46c28 Author: Wilco Dijkstra Date: Fri Nov 1 14:44:56 2024 + AArch64: Block combine_and_move from creating FP literal loads The IRA combine_and_move pass runs if

[gcc r14-11101] arm: Fix LDRD register overlap [PR117675]

2024-12-18 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:9366c328518766d896155388726055624716c0af commit r14-11101-g9366c328518766d896155388726055624716c0af Author: Wilco Dijkstra Date: Tue Dec 10 14:22:48 2024 + arm: Fix LDRD register overlap [PR117675] The register indexed variants of LDRD have complex

[gcc r15-6087] AArch64: Add baseline tune

2024-12-10 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:132025a5fe6a9ba59d62126ecba21887f7ac0f98 commit r15-6087-g132025a5fe6a9ba59d62126ecba21887f7ac0f98 Author: Wilco Dijkstra Date: Thu Nov 14 14:28:10 2024 + AArch64: Add baseline tune Cleanup the extra tune defines by introducing AARCH64_EXTRA_TUNE_BASE as

[gcc r15-6088] arm: Fix LDRD register overlap [PR117675]

2024-12-10 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:21fbfae2e55e1a153820acc6fbd922e66f67e65b commit r15-6088-g21fbfae2e55e1a153820acc6fbd922e66f67e65b Author: Wilco Dijkstra Date: Tue Dec 10 14:22:48 2024 + arm: Fix LDRD register overlap [PR117675] The register indexed variants of LDRD have complex

[gcc r15-6086] AArch64: Cleanup alignment macros

2024-12-10 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:bf6efbbad14e46f97bcc36c531000d8d4740c863 commit r15-6086-gbf6efbbad14e46f97bcc36c531000d8d4740c863 Author: Wilco Dijkstra Date: Tue Oct 1 16:51:14 2024 + AArch64: Cleanup alignment macros Change the AARCH64_EXPAND_ALIGNMENT macro into proper function

[gcc r15-6085] AArch64: Use LDP/STP for large struct types

2024-12-10 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:27d9b6d312678c7b9b09104eb0f48dc46e0f8ca2 commit r15-6085-g27d9b6d312678c7b9b09104eb0f48dc46e0f8ca2 Author: Wilco Dijkstra Date: Fri May 10 17:13:40 2024 + AArch64: Use LDP/STP for large struct types Use LDP/STP for large struct types as they have useful

Re: [PATCH] AArch64: Cleanup alignment macros

2024-12-06 Thread Wilco Dijkstra
Hi Richard, >> A common case is a constant string which is compared against some >> argument. Most string functions work on 8 or 16-byte quantities. If we >> ensure the whole array fits in one aligned load, we save time in the >> string function. >> >> Runtime data collected for strlen calls shows

Re: [PATCH] AArch64: Cleanup alignment macros

2024-12-06 Thread Wilco Dijkstra
Hi Richard, > So just to be sure I understand: we still want to align (say) an array > of 4 chars to 32 bits so that the LDR & STR are aligned, and an array of > 3 chars to 32 bits so that the LDRH & STRH for the leading two bytes are > aligned?  Is that right?  We don't seem to take advantage of

[PATCH] arm: Fix LDRD register overlap [PR117675]

2024-12-03 Thread Wilco Dijkstra
The register indexed variants of LDRD have complex register overlap constraints which makes them hard to use without using output_move_double (which can't be used for atomics as it doesn't guarantee to emit atomic LDRD/STRD when required). Add a new predicate and constraint for plain LDRD/STRD wi

[PATCH] AArch64: Cleanup alignment macros

2024-12-03 Thread Wilco Dijkstra
Change the AARCH64_EXPAND_ALIGNMENT macro into proper function calls to make future changes easier. Use the existing alignment settings, however avoid overaligning small arrays or structs to 64 bits when there is no benefit. This gives a small reduction in data and stack size. Passes regress & b

[PATCH] libatomic: Cleanup AArch64 ifunc selection

2024-11-27 Thread Wilco Dijkstra
Simplify and cleanup ifunc selection logic. Since LRCPC3 does not imply LSE2, has_rcpc3() should also check LSE2 is enabled. Passes regress and bootstrap, OK for commit? libatomic: * config/linux/aarch64/host-config.h (has_lse2): Cleanup. (has_lse128): Likewise. (has_rcp
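
A minimal sketch of the point made above: because LRCPC3 does not imply LSE2, the RCPC3 check tests the LSE2 capability first. This is not the code from host-config.h. The `sketch_` names and the bit positions are assumptions for illustration; real code would take the HWCAP constants from the kernel headers and the values from getauxval.

```c
#include <stdbool.h>

typedef unsigned long long sketch_hwcap_t;

/* Assumed bit positions, for illustration only.  */
#define SKETCH_HWCAP_USCAT   (1ULL << 25)  /* Assumed to signal LSE2.  */
#define SKETCH_HWCAP2_LRCPC3 (1ULL << 46)  /* Assumed to signal LRCPC3.  */

/* LSE2 is reported through the USCAT capability bit.  */
bool
sketch_has_lse2 (sketch_hwcap_t hwcap)
{
  return (hwcap & SKETCH_HWCAP_USCAT) != 0;
}

/* LSE2 is a prerequisite for atomic LDIAPP/STILP, so check it
   before looking at the LRCPC3 bit.  */
bool
sketch_has_rcpc3 (sketch_hwcap_t hwcap, sketch_hwcap_t hwcap2)
{
  if (!sketch_has_lse2 (hwcap))
    return false;
  return (hwcap2 & SKETCH_HWCAP2_LRCPC3) != 0;
}
```

Both checks use only capability bitmasks, so no system register read (and hence no trap) is needed during ifunc selection.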

Re: [PATCH 3/3] AArch64: Add SVE vector cost to baseline tuning

2024-11-15 Thread Wilco Dijkstra
Hi Kyrill, > This would make USE_NEW_VECTOR_COSTS effectively the default. > Jennifer has been trying to do that as well and then to remove it (as it > would be always true) but there are some codegen regressions that still > > need to be addressed. Yes, that's the goal - we should use good tun

[PATCH 2/3] AArch64: Add FULLY_PIPELINED_FMA to tune baseline

2024-11-14 Thread Wilco Dijkstra
Add FULLY_PIPELINED_FMA to tune baseline - this is a generic feature that is already enabled for some cores, but benchmarking it shows it is faster on all modern cores (SPECFP improves ~0.17% on Neoverse V1 and 0.04% on Neoverse N1). Passes regress & bootstrap, OK for commit? gcc/ChangeLog:

[PATCH 1/3] AArch64: Add baseline tune

2024-11-14 Thread Wilco Dijkstra
Cleanup the extra tune defines by introducing AARCH64_EXTRA_TUNE_BASE as a common base supported by all modern cores. Initially set it to AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND. No change in generated code. Passes regress & bootstrap, OK for commit? gcc/ChangeLog: * config/aarch64/aarc

[PATCH 3/3] AArch64: Add SVE vector cost to baseline tuning

2024-11-14 Thread Wilco Dijkstra
Add AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS and AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT to the baseline tuning since all modern cores use it. Fix the neoverse512tvb tuning to be like Neoverse V1/V2. gcc/ChangeLog: * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_BASE

Re: [PATCH v2] AArch64: Block combine_and_move from creating FP literal loads

2024-11-13 Thread Wilco Dijkstra
Hi Richard, > ...I still think we should avoid testing can_create_pseudo_p. > Does it work with the last part replaced by: > >  if (!DECIMAL_FLOAT_MODE_P (mode)) >    { >  if (aarch64_can_const_movi_rtx_p (src, mode) >  || aarch64_float_const_representable_p (src) >  || aarch64

Re: [PATCH] AArch64: Switch off early scheduling

2024-11-12 Thread Wilco Dijkstra
Hi, >>> What do you think about disabling late scheduling as well? >> >> I think this would definitely need separate consideration and evaluation >> given the above. >> >> Another thing to consider is the macro fusion machinery. IIRC it works >> during scheduling so if we don’t run any schedulin

Re: [PATCH v2] AArch64: Block combine_and_move from creating FP literal loads

2024-11-12 Thread Wilco Dijkstra
Hi Richard, > The idea was that, if we did the split during expand, the movsf/df > define_insns would then only accept the immediates that their > constraints can handle. Right, always disallowing these immediates works fine too (it seems reload doesn't require all immediates to be valid), and th

[gcc r15-5173] AArch64: Cleanup fusion defines

2024-11-12 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:deb0e2f61908bdc57b481995fa9e7c5083839a25 commit r15-5173-gdeb0e2f61908bdc57b481995fa9e7c5083839a25 Author: Wilco Dijkstra Date: Wed Oct 2 16:34:41 2024 + AArch64: Cleanup fusion defines Cleanup the fusion defines by introducing AARCH64_FUSE_BASE as a

[gcc r15-5174] AArch64: Remove duplicated addr_cost tables

2024-11-12 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:95305c800b1b3263534fdf67b63609772ecbb78d commit r15-5174-g95305c800b1b3263534fdf67b63609772ecbb78d Author: Wilco Dijkstra Date: Mon Oct 7 15:42:49 2024 + AArch64: Remove duplicated addr_cost tables Remove duplicated addr_cost tables - use

[PATCH] AArch64: Cleanup fusion defines

2024-11-08 Thread Wilco Dijkstra
Cleanup the fusion defines by introducing AARCH64_FUSE_BASE as a common base level of fusion supported by almost all cores. Add AARCH64_FUSE_MOVK as a shortcut for all MOVK fusion. In most cases there is no change. It enables AARCH64_FUSE_CMP_BRANCH for a few older cores since it has no measura

[PATCH] AArch64: Remove duplicated addr_cost tables

2024-11-08 Thread Wilco Dijkstra
Remove duplicated addr_cost tables - use generic_armv9_a_addrcost_table for Armv9-a cores and generic_armv8_a_addrcost_table for recent Armv8-a cores. No changes in generated code. OK for commit? gcc/ChangeLog: * config/aarch64/tuning_models/cortexx925.h (cortexx925_addrcost_table): Re

Re: [PATCH] AArch64: Block combine_and_move from creating FP literal loads

2024-11-08 Thread Wilco Dijkstra
Hi Richard, > That's because, once an instruction matches, the instruction should > continue to match.  It should always be possible to set the INSN_CODE of > an existing instruction to -1, rerun recog, and get the same instruction > code back. > > Because of that, insn conditions shouldn't depend

Re: [PATCH] AArch64: Block combine_and_move from creating FP literal loads

2024-11-08 Thread Wilco Dijkstra
Hi Richard, > It's ok for instructions to require properties that are false during > early RTL passes and then transition to true.  But they can't require > properties that go from true to false, since that would mean that > existing instructions become unrecognisable at certain points during > th

[PATCH v2] AArch64: Switch off early scheduling

2024-11-01 Thread Wilco Dijkstra
v2: split off movsf/df pattern fixes, remove some guality xfails that now pass The early scheduler takes up ~33% of the total build time, however it doesn't provide a meaningful performance gain.  This is partly because modern OoO cores need far less scheduling, partly because the scheduler tends

[PATCH] AArch64: Block combine_and_move from creating FP literal loads

2024-11-01 Thread Wilco Dijkstra
The IRA combine_and_move pass runs if the scheduler is disabled and aggressively combines moves. The movsf/df patterns allow all FP immediates since they rely on a split pattern. However splits do not happen during IRA, so the result is extra literal loads. To avoid this, use a more accurate ch
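
For background, a sketch of why only some FP constants avoid a literal load: the AArch64 FMOV immediate form encodes values of the shape ±(n/16) × 2^e with n in 16..31 and e in -3..4. The function below models only that encoding. It is not GCC's check (which, per the review thread, also considers other ways of materialising a constant), and the name `sketch_fmov_imm_representable` is hypothetical.

```c
#include <stdbool.h>

/* Return true if X has the form +/-(n/16) * 2^e with 16 <= n <= 31
   and -3 <= e <= 4, i.e. it fits the 8-bit FMOV immediate encoding.
   Zero, NaN and infinities never match.  */
bool
sketch_fmov_imm_representable (double x)
{
  if (x < 0.0)
    x = -x;

  double scale = 0.125;  /* 2^-3, doubled each iteration up to 2^4.  */
  for (int e = -3; e <= 4; e++)
    {
      for (int n = 16; n <= 31; n++)
        if (x == (n / 16.0) * scale)  /* Exact in double arithmetic.  */
          return true;
      scale *= 2.0;
    }
  return false;
}
```

Under this model 1.0, 0.125 and 31.0 are encodable, while 0.0, 0.1 and 32.0 are not and would need another sequence or a literal load.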

Re: [PATCH] AArch64: Switch off early scheduling

2024-10-31 Thread Wilco Dijkstra
Hi Kyrill, > I think the approach that I’d like to try is using the TARGET_SCHED_DISPATCH > hooks like x86 does for bdver1-4. > That would try to exploit the dispatch constraints information in the SWOGs > rather than the instruction latency and throughput tables. > That would still require some

Re: [PATCH] AArch64: Switch off early scheduling

2024-10-31 Thread Wilco Dijkstra
Hi Andrew, > I suspect the following scheduling models could be removed due either > to hw never going to production or no longer being used by anyone: > thunderx3t110.md > falkor.md > saphira.md If you're planning to remove these, it would also be good to remove the falkor-tag-collision-avoidanc

[PATCH] AArch64: Switch off early scheduling

2024-10-31 Thread Wilco Dijkstra
The early scheduler takes up ~33% of the total build time, however it doesn't provide a meaningful performance gain. This is partly because modern OoO cores need far less scheduling, partly because the scheduler tends to create many unnecessary spills by increasing register pressure. Building ap

Re: [PATCH 1/4] sched1: hookize pressure scheduling spilling agressiveness

2024-10-29 Thread Wilco Dijkstra
Hi Vineet, > I agree the NARROW/WIDE stuff is obfuscating things in technicalities. Is there evidence this change would make things significantly worse for some targets? I did a few runs on Neoverse V2 with various options and it looks beneficial both for integer and FP. On the example and option

[gcc r15-4678] AArch64: Add more accurate constraint [PR117292]

2024-10-25 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:7c17058eac3834fb03ec9e518235e4192557b97d commit r15-4678-g7c17058eac3834fb03ec9e518235e4192557b97d Author: Wilco Dijkstra Date: Fri Oct 25 14:53:58 2024 + AArch64: Add more accurate constraint [PR117292] As shown in the PR, reload may only check the

[PATCH] AArch64: Add more accurate constraint [PR117292]

2024-10-25 Thread Wilco Dijkstra
As shown in the PR, reload may only check the constraint in some cases and not check the predicate is still valid for the resulting instruction. To fix the issue, add a new constraint which matches the predicate exactly. Passes regress & bootstrap, OK for commit? gcc/ChangeLog: PR ta

[gcc r15-4572] AArch64: Remove redundant check in aarch64_simd_mov

2024-10-23 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:2ac01a4efceacb9f2f9433db636545885296da0a commit r15-4572-g2ac01a4efceacb9f2f9433db636545885296da0a Author: Wilco Dijkstra Date: Thu Oct 17 14:33:44 2024 + AArch64: Remove redundant check in aarch64_simd_mov The split condition in aarch64_simd_mov uses

[gcc r15-4571] AArch64: Fix copysign patterns

2024-10-23 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:7c7c895c2f34d2a5c0cd2139c5e76c13c6c030c9 commit r15-4571-g7c7c895c2f34d2a5c0cd2139c5e76c13c6c030c9 Author: Wilco Dijkstra Date: Tue Oct 15 16:22:23 2024 + AArch64: Fix copysign patterns The current copysign pattern has a mismatch in the predicates and

[gcc r15-4568] AArch64: Improve SIMD immediate generation (2/3)

2024-10-23 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:756890d66cf4971fc11187ccdf5893681aa661a1 commit r15-4568-g756890d66cf4971fc11187ccdf5893681aa661a1 Author: Wilco Dijkstra Date: Tue Oct 8 15:55:25 2024 + AArch64: Improve SIMD immediate generation (2/3) Allow use of SVE immediates when generating AdvSIMD

[gcc r15-4569] AArch64: Add support for SIMD xor immediate (3/3)

2024-10-23 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:22a37534c640ca5ff2f0e947dfe60df59fb6bba1 commit r15-4569-g22a37534c640ca5ff2f0e947dfe60df59fb6bba1 Author: Wilco Dijkstra Date: Mon Oct 14 16:53:44 2024 + AArch64: Add support for SIMD xor immediate (3/3) Add support for SVE xor immediate when generating

[gcc r15-4567] AArch64: Improve SIMD immediate generation (1/3)

2024-10-23 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:bcbf4fa46ae2919cf281322bd39f4810b7c18c9d commit r15-4567-gbcbf4fa46ae2919cf281322bd39f4810b7c18c9d Author: Wilco Dijkstra Date: Tue Oct 8 13:32:09 2024 + AArch64: Improve SIMD immediate generation (1/3) Cleanup the various interfaces related to SIMD

[PATCH] AArch64: Remove redundant check in aarch64_simd_mov

2024-10-17 Thread Wilco Dijkstra
The split condition in aarch64_simd_mov uses aarch64_simd_special_constant_p. While doing the split, it checks the mode before calling aarch64_maybe_generate_simd_constant. This is risky since it may result in unexpectedly calling aarch64_split_simd_move instead of aarch64_maybe_generate_simd_con

[PATCH v3] AArch64: Fix copysign patterns

2024-10-17 Thread Wilco Dijkstra
The current copysign pattern has a mismatch in the predicates and constraints - operand[2] is a register_operand but also has an alternative X which allows any operand. Since it is a floating point operation, having an integer alternative makes no sense. Change the expander to always use vector i

Re: [PATCH 3/3] AArch64: Add support for SIMD xor immediate

2024-10-15 Thread Wilco Dijkstra
Add support for SVE xor immediate when generating AdvSIMD code and SVE is available. Passes bootstrap & regress, OK for commit? gcc/ChangeLog: * config/aarch64/aarch64.cc (enum simd_immediate_check): Add AARCH64_CHECK_XOR. (aarch64_simd_valid_xor_imm): New function. (a

Re: [PATCH 2/2] AArch64: Improve SIMD immediate generation

2024-10-14 Thread Wilco Dijkstra
Allow use of SVE immediates when generating AdvSIMD code and SVE is available. First check for a valid AdvSIMD immediate, and if SVE is available, try using an SVE move or bitmask immediate. Passes bootstrap & regress, OK for commit? gcc/ChangeLog: * config/aarch64/aarch64-simd.md (ior3

[PATCH 1/2] AArch64: Improve SIMD immediate generation

2024-10-14 Thread Wilco Dijkstra
Cleanup the various interfaces related to SIMD immediate generation. Introduce new functions that make it clear which operation (AND, OR, MOV) we are testing for rather than guessing the final instruction. Reduce the use of overly long names, unused and default parameters for clarity. No cha

Re: [PATCH] aarch64: Fix bug with max/min (PR116934)

2024-10-04 Thread Wilco Dijkstra
Hi Saurabh, This looks good, one little nit: > gcc/ChangeLog: > >     * config/aarch64/iterators.md: Move UNSPEC_COND_SMAX and >     UNSPEC_COND_SMIN to correct iterators. This should also have the PR target/116934 before it - it's fine to change it when you commit. Speaking of which,

[PATCH v2] AArch64: Fix copysign patterns

2024-09-18 Thread Wilco Dijkstra
v2: Add more testcase fixes. The current copysign pattern has a mismatch in the predicates and constraints - operand[2] is a register_operand but also has an alternative X which allows any operand. Since it is a floating point operation, having an integer alternative makes no sense. Change the e

[PATCH] AArch64: Fix copysign patterns

2024-09-17 Thread Wilco Dijkstra
The current copysign pattern has a mismatch in the predicates and constraints - operand[2] is a register_operand but also has an alternative X which allows any operand. Since it is a floating point operation, having an integer alternative makes no sense. Change the expander to always use the vec

[gcc r14-10399] Aarch64, bugfix: Fix NEON bigendian addp intrinsic [PR114890]

2024-07-09 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:72753ec82076d15443c32aac88a8c0fa0ab4bc2f commit r14-10399-g72753ec82076d15443c32aac88a8c0fa0ab4bc2f Author: Alfie Richards Date: Thu Jul 4 09:09:19 2024 +0200 Aarch64, bugfix: Fix NEON bigendian addp intrinsic [PR114890] This change removes code that switches

[gcc r14-10398] Arm: Fix ldrd offset range [PR115153]

2024-07-09 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:83332e3f808b146ca06dbc6a91d15bd3e5650658 commit r14-10398-g83332e3f808b146ca06dbc6a91d15bd3e5650658 Author: Wilco Dijkstra Date: Fri Jul 5 17:31:25 2024 +0100 Arm: Fix ldrd offset range [PR115153] The valid offset range of LDRD in arm_legitimate_index_p is

[gcc r15-1865] Arm: Fix ldrd offset range [PR115153]

2024-07-05 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:44e5ecfd261afe72aa04eba4bf1a9ec782579cab commit r15-1865-g44e5ecfd261afe72aa04eba4bf1a9ec782579cab Author: Wilco Dijkstra Date: Fri Jul 5 17:31:25 2024 +0100 Arm: Fix ldrd offset range [PR115153] The valid offset range of LDRD in arm_legitimate_index_p is

[gcc r12-10603] AArch64: Fix strict-align cpymem/setmem [PR103100]

2024-07-05 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:b9d16d8361a9e3a82a2f21e759e760d235d43322 commit r12-10603-gb9d16d8361a9e3a82a2f21e759e760d235d43322 Author: Wilco Dijkstra Date: Wed Oct 25 16:28:04 2023 +0100 AArch64: Fix strict-align cpymem/setmem [PR103100] The cpymemdi/setmemdi implementation doesn't

[gcc r14-10383] Arm: Fix disassembly error in Thumb-1 relaxed load/store [PR115188]

2024-07-05 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:100d353e545564931efaac90a089a4e8f3d42e6e commit r14-10383-g100d353e545564931efaac90a089a4e8f3d42e6e Author: Wilco Dijkstra Date: Tue Jul 2 17:37:04 2024 +0100 Arm: Fix disassembly error in Thumb-1 relaxed load/store [PR115188] A Thumb-1 memory operand allows

[gcc r15-1786] Arm: Fix disassembly error in Thumb-1 relaxed load/store [PR115188]

2024-07-02 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:d04c5537f5ae4a3acd3f5135347d7e2d8c218811 commit r15-1786-gd04c5537f5ae4a3acd3f5135347d7e2d8c218811 Author: Wilco Dijkstra Date: Tue Jul 2 17:37:04 2024 +0100 Arm: Fix disassembly error in Thumb-1 relaxed load/store [PR115188] A Thumb-1 memory operand allows
