Hi Keith,
Thanks for the explanation - however I'm afraid compilers don't have a concept
of implicit vs explicit use of operations or registers.
> I'm trying to find all cases where that happens for data types other
> than SIMD/FP. Do you know of other places where the compiler implicitly
> uses
Hi Richard,
> Unlike base PCS functions, __arm_streaming and __arm_streaming_compatible
> functions allow/require PSTATE.SM to be 1 on entry, so they need to
> be treated as STO_AARCH64_VARIANT_PCS.
>
> Similarly, functions that share ZA or ZT0 with their callers require
> ZA to be active on entry
Add an optab for isnan. This requires changes to the existing folding code
to extend the interclass_mathfn infrastructure to support BUILT_IN_ISNAN.
It now checks for a valid optab before emitting the generic expansion.
There is no change if no optab is defined. Update documentation, including
t
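As a minimal illustration (not from the patch itself), this is the kind of
check the new optab serves; targets without the optab keep the generic
expansion:

  #include <stdio.h>

  /* __builtin_isnan is the builtin affected: with a target isnan optab it
     can expand to a dedicated instruction sequence; without one, the
     existing generic expansion is emitted unchanged.  */
  static int is_nan (double x)
  {
    return __builtin_isnan (x);
  }

  int main (void)
  {
    double nan = __builtin_nan ("");
    printf ("%d %d\n", is_nan (nan), is_nan (1.0));  /* prints: 1 0 */
    return 0;
  }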
Hi Keith,
> This option (enabled by default) preserves existing behavior by
> allowing use of Advanced SIMD registers while expanding
> memset/memcpy/memmove operations into inline instructions.
>
> Disabling this option prevents use of these registers for environments
> where the FPU may be disabled
Hi Richard,
+++ b/gcc/config/aarch64/aarch64.md
@@ -876,10 +876,16 @@
(clobber (reg:CC CC_REGNUM))]
"TARGET_CMPBR && aarch64_cb_rhs (, operands[1])"
{
-return (get_attr_far_branch (insn) == FAR_BRANCH_NO)
- ? "cb\\t%0, %1, %l2"
- : aarch64_gen_far_branch (operands, 2,
-
Hi Richard,
This is a really good improvement - I've built all of SPEC2017 without any
issues.
Overall it shows almost 3.0% codesize reduction with +cmpbr! I noticed that the
patches slightly increase codesize even without +cmpbr - not quite sure why.
So overall this looks OK for commit.
Btw as
>On 8/5/25 19:43, Richard Henderson wrote:
>> That said, I'm a little confused
>> why we'd want to use SUBS+B.{EQ,NE} instead of SUB+CB{Z,NZ}.
>
> The answer to that is that B.{EQ,NE} converts easily to CSEL/CSINC/CSINV.
I did it originally because that works out best - CBZ has a shorter branch
https://gcc.gnu.org/g:731649066f0fd2e2b2fbfd8668e001c3e91290d6
commit r16-2682-g731649066f0fd2e2b2fbfd8668e001c3e91290d6
Author: Wilco Dijkstra
Date: Thu Jun 26 15:41:06 2025 +
AArch64: Use correct cost for shifted halfword load/stores
Since all Armv9 cores support shifted
https://gcc.gnu.org/g:9996036205b5a71e7738f2daa29f4e6f79886a4e
commit r16-2684-g9996036205b5a71e7738f2daa29f4e6f79886a4e
Author: Wilco Dijkstra
Date: Tue Mar 25 15:51:42 2025 +
libgcc: Update FMV features to latest ACLE spec 2024Q4
Update FMV features to latest ACLE spec of
https://gcc.gnu.org/g:a6bb6934a491015c4d3f08763455d86ccfb3bcbe
commit r16-2683-ga6bb6934a491015c4d3f08763455d86ccfb3bcbe
Author: Wilco Dijkstra
Date: Mon Apr 28 16:20:15 2025 +
libgcc: Cleanup HWCAP defines in cpuinfo.c
Cleanup HWCAP defines - rather than including hwcap.h
Hi Richard,
> aarch64_simd_valid_imm tries to decompose a constant into a repeating
> series of 64 bits, since most Advanced SIMD and SVE immediate forms
> require that. (The exceptions are handled first.) It does this by
> building up a byte-level register image, lsb first. If the image does
Since all Armv9 cores support shifted LDRH/STRH, use the correct cost of zero
for these.
Passes regress, OK for commit?
gcc:
* config/aarch64/tuning_models/generic_armv9_a.h
(generic_armv9_a_addrcost_table): Use zero cost for HImode.
---
diff --git a/gcc/config/aarch64/tuning_mo
Hi Alejandro,
> > > > +XXX)
> > > > +While atypical,
> > > > +realloc may fail
> > > > +for a call that shrinks the block of memory.
> > >
> > > Is it worth wording this as "may fail or return a different pointer
> > > for a call that shrinks the block of memory"?
> Oh, the text is still there
TARGET_CONST_ANCHOR appears to trigger too often, even on simple immediates.
It inserts extra ADD/SUB instructions even when a single MOV exists.
Disable it to improve overall code quality: on SPEC2017 it removes
1850 ADD/SUB instructions and 630 spill instructions, and SPECINT is ~0.06%
faster on
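A hypothetical example of the problem (constants chosen for illustration):
both values are single MOV immediates, yet with const anchoring the second may
be formed as an ADD relative to the first, adding an instruction and a
dependency:

  /* Both 100 and 104 fit a single MOV on AArch64; with TARGET_CONST_ANCHOR
     enabled, CSE may instead rewrite the second constant as first + 4,
     i.e. an extra ADD, which is what this patch avoids.  */
  void store_consts (int *p)
  {
    p[0] = 100;
    p[1] = 104;
  }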
Clean up the HWCAP defines - rather than including hwcap.h and then repeating
it using #ifndef, just define the HWCAPs we need exactly as in hwcap.h.
libgcc:
* config/aarch64/cpuinfo.c: Cleanup HWCAP defines.
---
diff --git a/libgcc/config/aarch64/cpuinfo.c b/libgcc/config/aarch64/cpuinfo.c
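A minimal sketch of the approach described above; the exact set of bits kept
in cpuinfo.c differs, but the values match the kernel's <asm/hwcap.h>:

  /* Instead of including <asm/hwcap.h> and then repeating each define under
     #ifndef, define only the bits actually used, with the kernel's values.  */
  #define HWCAP_FP       (1 << 0)
  #define HWCAP_ASIMD    (1 << 1)
  #define HWCAP_ATOMICS  (1 << 8)
  #define HWCAP_SVE      (1 << 22)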
Update FMV features to the latest ACLE spec (2024Q4) - several features have
been removed or merged. Add FMV support for CSSC and MOPS. Preserve the
ordering in enum CPUFeatures.
gcc:
* common/config/aarch64/cpuinfo.h: Remove unused features, add FEAT_CSSC
and FEAT_MOPS.
Hi Richard,
> Could you give details? I thought it was always known that trapped
> system register accesses were slow. In the previous versions, the
> checks seemed to be presented as an up-front price worth paying for
> faster atomic operations, on the systems that would use those paths.
> Now
https://gcc.gnu.org/g:6e47e6d48844ee578fd384aaa4b8cd62d73b49db
commit r15-8032-g6e47e6d48844ee578fd384aaa4b8cd62d73b49db
Author: Wilco Dijkstra
Date: Mon Feb 24 16:38:02 2025 +
libgcc: Remove PREDRES and LS64 from AArch64 cpuinfo
Change AArch64 cpuinfo to follow the latest
Hi Richard,
> That was also what I was trying to say. In the worst case, the linked
> object has to meet the requirements of the lowest common denominator.
>
> And my supposition was that that isn't a property of static vs dynamic.
But it is. Dynamic linking supports mixing different code models
Hi Richard,
>> Basically the small and large model are fundamentally incompatible. The
>> infamous
>> "dumb linker" approach means it doesn't try to sort sections, so an ADRP
>> relocation
>> will be out of reach if its data is placed after a huge array. Static
>> linking with GLIBC or
>> enabl
https://gcc.gnu.org/g:f870302515d5fcf7355f0108c3ead0038ff326fd
commit r15-7871-gf870302515d5fcf7355f0108c3ead0038ff326fd
Author: Wilco Dijkstra
Date: Mon Mar 3 16:47:32 2025 +
AArch64: Enable early scheduling for -O3 and higher (PR118351)
Enable the early scheduler on AArch64
Hi Ramana,
> -Generate code for the large code model. This makes no assumptions about
> -addresses and sizes of sections. Programs can be statically linked only.
> The
> +Generate code for the large code model. This allows large .bss and .data
> +sections, however .text and .rodata must still
Hi Kyrill,
> This restriction should be documented in invoke.texi IMO.
> I also think it would be more user friendly to warn them about the
> incompatibility if an explicit -moutline-atomics option is passed.
> It’s okay though to silently turn off the implicit default-on option.
I've upd
Hi Richard&Kyrill,
>> I’m in favour of this.
>
> Yeah, seems ok to me too. I suppose we ought to update the documentation too:
I've added a note to the documentation. However, it is impossible to be
complete here since many targets switch off early scheduling under various
circumstances. So I'v
Feedback from the kernel team suggests that it's best to use only HWCAPs
rather than also using low-level checks as done by has_lse128() and has_rcpc3().
So change these to use just HWCAPs, which simplifies the code and speeds up
ifunc selection by avoiding expensive system register accesses.
Passes
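A hedged sketch of the simplification; the function shape is reduced and the
HWCAP2_LSE128 bit position is a placeholder - the real definitions live in
libatomic's host-config.h and the kernel headers:

  #include <stdbool.h>

  #define HWCAP_ATOMICS  (1UL << 8)    /* value as in <asm/hwcap.h> */
  #define HWCAP2_LSE128  (1UL << 47)   /* placeholder bit, not the real value */

  /* Select purely on HWCAP bits: no MRS of ID registers, so ifunc selection
     avoids system register accesses that can trap and are slow.  */
  static inline bool
  has_lse128 (unsigned long hwcap, unsigned long hwcap2)
  {
    return (hwcap & HWCAP_ATOMICS) && (hwcap2 & HWCAP2_LSE128);
  }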
Change AArch64 cpuinfo to follow the latest updates to the FMV spec [1]:
Remove FEAT_PREDRES and FEAT_LS64*. Preserve the ordering in enum CPUFeatures.
Passes regress, OK for commit?
[1] https://github.com/ARM-software/acle/pull/382
gcc:
* common/config/aarch64/cpuinfo.h: Remove FEAT_PR
Enable the early scheduler on AArch64 for -O3/-Ofast. This means GCC15 benefits
from much faster build times with -O2, but avoids the regressions in lbm, which
is very sensitive to minor scheduling changes due to long FMA chains. We can
then revisit this for GCC16.
gcc:
PR target/118351
Outline atomics is not designed to be used with -mcmodel=large, so disable
it automatically if the large code model is used.
Passes regress, OK for commit?
gcc:
PR target/112465
* config/aarch64/aarch64.cc (aarch64_override_options_after_change_1):
Turn off outline atomic
https://gcc.gnu.org/g:2713f6bb90765de81954275a30c62432d30e1d68
commit r15-6922-g2713f6bb90765de81954275a30c62432d30e1d68
Author: Wilco Dijkstra
Date: Thu Nov 14 14:34:17 2024 +
AArch64: Add FULLY_PIPELINED_FMA to tune baseline
Add FULLY_PIPELINED_FMA to tune baseline - this
---
commit 3cf8c149005ecd153b1bb0a082920f95edb670b0
Author: Wilco Dijkstra
Date: Tue Jan 14 16:14:52 2025 +
gcc-15: Add ILP32 deprecation on AArch64.
diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index 1c690c4a..d5037efb 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.htm
https://gcc.gnu.org/g:4ce502f31f95ec19e7d347d43afcd015895f135d
commit r15-6923-g4ce502f31f95ec19e7d347d43afcd015895f135d
Author: Wilco Dijkstra
Date: Fri Jan 10 19:48:02 2025 +
AArch64: Update neoverse512tvb tuning
Fix the neoverse512tvb tuning to be like Neoverse V1/V2 and
https://gcc.gnu.org/g:625ea3c6ea1811388d030eddff57cd46c209d49a
commit r15-6921-g625ea3c6ea1811388d030eddff57cd46c209d49a
Author: Wilco Dijkstra
Date: Thu Jan 9 19:41:14 2025 +
AArch64: Deprecate -mabi=ilp32
ILP32 was originally intended to make porting to AArch64 easier
Hi Richard,
> Sorry to be awkward, but I don't think we should put
> AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT in base.
> CHEAP_SHIFT_EXTEND is a good base flag because it means we can make full
> use of a certain group of instructions. FULLY_PIPELINED_FMA similarly
> means that FMA chains beh
Hi Richard,
>> + if (TARGET_ILP32)
>> + warning (OPT_Wdeprecated, "%<-mabi=ilp32%> is deprecated.");
>
> There should be no "." at the end of the message.
Right, fixed in v2 below.
> Otherwise it looks good to me, although like Kyrill says, it'll also
> need a release note.
I've added one,
As suggested in
https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673558.html
update the gcc-15 Changes page:
Add ILP32 deprecation to the Caveats section.
---
diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index
1c690c4a168f4d6297ad33dd5b798e9200792dc5..d5037efb34cc8e6
Hi all,
> In that case, I'm coming round to the idea of deprecating ILP32.
> I think it was already common ground that the GNU/Linux support is dead.
> watchOS would use Mach objects rather than ELF. As you say, it isn't
> clear how much of the current ILP32 support would be relevant for it.
> An
https://gcc.gnu.org/g:81bcf412c1c221bc2557666a6ca8381dac1de097
commit r15-6802-g81bcf412c1c221bc2557666a6ca8381dac1de097
Author: Wilco Dijkstra
Date: Fri Jan 10 18:01:58 2025 +
libatomic: Cleanup AArch64 ifunc selection
Simplify and cleanup ifunc selection logic. Since
Hi Richard,
> It looks like you committed the original version instead, with no extra
> explanation. I suppose I should have asked for another review round
> instead.
Did you check the commit log?
Change the AARCH64_EXPAND_ALIGNMENT macro into proper function calls to make
future changes easier.
Hi Richard,
> Yeah, somewhat. But won't we go on to test has_lse2 anyway, due to:
>
> # elif defined (LSE2_LRCPC3_ATOP)
> # define IFUNC_NCOND(N) 2
> # define IFUNC_COND_1 (has_rcpc3 (hwcap, features))
> # define IFUNC_COND_2 (has_lse2 (hwcap, features))
>
> If we want to reduce the
Hi Andrew,
> Personally I would like this deprecated even for bare-metal. Yes the
> iwatch ABI is an ILP32 ABI but I don't see GCC implementing that any
> time soon and I suspect it would not be hard to resurrect the code at
> that point.
My patch deprecates it in all cases currently. It will be
Hi Richard,
>> + /* LSE2 is a prerequisite for atomic LDIAPP/STILP. */
>> + if (!(hwcap & HWCAP_USCAT))
>> return false;
>
> Is there a reason for not using has_lse2 here? It'd be good to have
> a comment if so.
Yes, the MRS instructions cause expensive traps, so we try to avoid them whe
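A simplified sketch of the check under discussion (the real has_lse2 takes the
libatomic feature arguments as well): FEAT_LSE2 is reported through
HWCAP_USCAT, so a plain HWCAP test avoids the MRS entirely:

  #include <stdbool.h>

  #define HWCAP_USCAT (1UL << 25)   /* value as in the kernel's <asm/hwcap.h> */

  /* LSE2 is a prerequisite for atomic LDIAPP/STILP; testing the HWCAP bit
     avoids reading ID_AA64MMFR2_EL1 via MRS, which can trap and is slow.  */
  static inline bool
  has_lse2 (unsigned long hwcap)
  {
    return (hwcap & HWCAP_USCAT) != 0;
  }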
Hi Kyrill,
>> Add AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS and
>> AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
>> to the baseline tuning since all modern cores use it. Fix the
>> neoverse512tvb tuning to be
>> like Neoverse V1/V2.
>
> For neoversev512tvb this means adding AARCH64_EXTRA_TUNE_AVOI
ping
Add AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS and
AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT to the baseline tuning since all
modern cores use them. Fix the neoverse512tvb tuning to be like Neoverse V1/V2.
gcc/ChangeLog:
* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TU
ping
Add FULLY_PIPELINED_FMA to tune baseline - this is a generic feature that is
already enabled for some cores, but benchmarking it shows it is faster on all
modern cores (SPECFP improves ~0.17% on Neoverse V1 and 0.04% on Neoverse N1).
Passes regress & bootstrap, OK for commit?
gcc/ChangeLo
ping
Simplify and clean up the ifunc selection logic. Since LRCPC3 does not imply
LSE2, has_rcpc3() should also check that LSE2 is enabled.
Passes regress and bootstrap, OK for commit?
libatomic:
* config/linux/aarch64/host-config.h (has_lse2): Cleanup.
(has_lse128): Likewise.
(
ILP32 was originally intended to make porting to AArch64 easier. Support was
never merged in the Linux kernel or GLIBC, so it has been unsupported for many
years. There isn't a benefit in keeping unsupported features forever, so
deprecate it now (and it could be removed in a future release).
Pa
As a minor cleanup, remove the Cortex-A57 FMA steering pass. Since Cortex-A57 is
pretty old, there isn't any benefit in keeping it.
Passes regress & bootstrap, OK for commit?
gcc:
* config.gcc (extra_objs): Remove cortex-a57-fma-steering.o.
* config/aarch64/aarch64-passes.def: Remo
Hi Richard,
> The patch below is what I meant. It passes bootstrap & regression-test
> on aarch64-linux-gnu (and so produces the same results for the tests
> that you changed). Do you see any problems with this version?
> If not, I think we should go with it.
Thanks for the detailed example - u
https://gcc.gnu.org/g:c5db3f50bdf34ea96fd193a2a66d686401053bd2
commit r15-6661-gc5db3f50bdf34ea96fd193a2a66d686401053bd2
Author: Wilco Dijkstra
Date: Fri Nov 1 14:40:26 2024 +
AArch64: Switch off early scheduling
The early scheduler takes up ~33% of the total build time
https://gcc.gnu.org/g:45d306a835cb3f865a897dc7c04efbe1f9f46c28
commit r15-6660-g45d306a835cb3f865a897dc7c04efbe1f9f46c28
Author: Wilco Dijkstra
Date: Fri Nov 1 14:44:56 2024 +
AArch64: Block combine_and_move from creating FP literal loads
The IRA combine_and_move pass runs if
https://gcc.gnu.org/g:9366c328518766d896155388726055624716c0af
commit r14-11101-g9366c328518766d896155388726055624716c0af
Author: Wilco Dijkstra
Date: Tue Dec 10 14:22:48 2024 +
arm: Fix LDRD register overlap [PR117675]
The register indexed variants of LDRD have complex
https://gcc.gnu.org/g:132025a5fe6a9ba59d62126ecba21887f7ac0f98
commit r15-6087-g132025a5fe6a9ba59d62126ecba21887f7ac0f98
Author: Wilco Dijkstra
Date: Thu Nov 14 14:28:10 2024 +
AArch64: Add baseline tune
Cleanup the extra tune defines by introducing AARCH64_EXTRA_TUNE_BASE as
https://gcc.gnu.org/g:21fbfae2e55e1a153820acc6fbd922e66f67e65b
commit r15-6088-g21fbfae2e55e1a153820acc6fbd922e66f67e65b
Author: Wilco Dijkstra
Date: Tue Dec 10 14:22:48 2024 +
arm: Fix LDRD register overlap [PR117675]
The register indexed variants of LDRD have complex
https://gcc.gnu.org/g:bf6efbbad14e46f97bcc36c531000d8d4740c863
commit r15-6086-gbf6efbbad14e46f97bcc36c531000d8d4740c863
Author: Wilco Dijkstra
Date: Tue Oct 1 16:51:14 2024 +
AArch64: Cleanup alignment macros
Change the AARCH64_EXPAND_ALIGNMENT macro into proper function
https://gcc.gnu.org/g:27d9b6d312678c7b9b09104eb0f48dc46e0f8ca2
commit r15-6085-g27d9b6d312678c7b9b09104eb0f48dc46e0f8ca2
Author: Wilco Dijkstra
Date: Fri May 10 17:13:40 2024 +
AArch64: Use LDP/STP for large struct types
Use LDP/STP for large struct types as they have useful
Hi Richard,
>> A common case is a constant string which is compared against some
>> argument. Most string functions work on 8 or 16-byte quantities. If we
>> ensure the whole array fits in one aligned load, we save time in the
>> string function.
>>
>> Runtime data collected for strlen calls shows
Hi Richard,
> So just to be sure I understand: we still want to align (say) an array
> of 4 chars to 32 bits so that the LDR & STR are aligned, and an array of
> 3 chars to 32 bits so that the LDRH & STRH for the leading two bytes are
> aligned? Is that right? We don't seem to take advantage of
The register-indexed variants of LDRD have complex register overlap
constraints, which make them hard to use without output_move_double (which
can't be used for atomics as it doesn't guarantee to emit atomic LDRD/STRD
when required).
Add a new predicate and constraint for plain LDRD/STRD wi
Change the AARCH64_EXPAND_ALIGNMENT macro into proper function calls to make
future changes easier. Use the existing alignment settings, but avoid
overaligning small arrays or structs to 64 bits when there is no benefit.
This gives a small reduction in data and stack size.
Passes regress & b
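An illustrative example of the effect (array names and sizes are hypothetical):
tiny character arrays no longer get bumped to 64-bit alignment, while larger
ones keep the raised alignment that helps string and memcpy code:

  #include <stdio.h>

  char tiny[3] = "ab";   /* no longer over-aligned to 64 bits - no benefit */
  char big[64];          /* still gets the higher alignment for fast string ops */

  int main (void)
  {
    printf ("tiny at %p, big at %p\n", (void *) tiny, (void *) big);
    return 0;
  }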
Simplify and clean up the ifunc selection logic. Since LRCPC3 does not imply
LSE2, has_rcpc3() should also check that LSE2 is enabled.
Passes regress and bootstrap, OK for commit?
libatomic:
* config/linux/aarch64/host-config.h (has_lse2): Cleanup.
(has_lse128): Likewise.
(has_rcp
Hi Kyrill,
> This would make USE_NEW_VECTOR_COSTS effectively the default.
> Jennifer has been trying to do that as well and then to remove it (as it
> would be always true) but there are some codegen regressions that still
> need to be addressed.
Yes, that's the goal - we should use good tun
Add FULLY_PIPELINED_FMA to tune baseline - this is a generic feature that is
already enabled for some cores, but benchmarking it shows it is faster on all
modern cores (SPECFP improves ~0.17% on Neoverse V1 and 0.04% on Neoverse N1).
Passes regress & bootstrap, OK for commit?
gcc/ChangeLog:
Clean up the extra tune defines by introducing AARCH64_EXTRA_TUNE_BASE as a
common base supported by all modern cores. Initially set it to
AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND. No change in generated code.
Passes regress & bootstrap, OK for commit?
gcc/ChangeLog:
* config/aarch64/aarc
Add AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS and
AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT to the baseline tuning since all
modern cores use them. Fix the neoverse512tvb tuning to be like Neoverse V1/V2.
gcc/ChangeLog:
* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_BASE
Hi Richard,
> ...I still think we should avoid testing can_create_pseudo_p.
> Does it work with the last part replaced by:
>
> if (!DECIMAL_FLOAT_MODE_P (mode))
> {
> if (aarch64_can_const_movi_rtx_p (src, mode)
> || aarch64_float_const_representable_p (src)
> || aarch64
Hi,
>>> What do you think about disabling late scheduling as well?
>>
>> I think this would definitely need separate consideration and evaluation
>> given the above.
>>
>> Another thing to consider is the macro fusion machinery. IIRC it works
>> during scheduling so if we don’t run any schedulin
Hi Richard,
> The idea was that, if we did the split during expand, the movsf/df
> define_insns would then only accept the immediates that their
> constraints can handle.
Right, always disallowing these immediates works fine too (it seems
reload doesn't require all immediates to be valid), and th
https://gcc.gnu.org/g:deb0e2f61908bdc57b481995fa9e7c5083839a25
commit r15-5173-gdeb0e2f61908bdc57b481995fa9e7c5083839a25
Author: Wilco Dijkstra
Date: Wed Oct 2 16:34:41 2024 +
AArch64: Cleanup fusion defines
Cleanup the fusion defines by introducing AARCH64_FUSE_BASE as a
https://gcc.gnu.org/g:95305c800b1b3263534fdf67b63609772ecbb78d
commit r15-5174-g95305c800b1b3263534fdf67b63609772ecbb78d
Author: Wilco Dijkstra
Date: Mon Oct 7 15:42:49 2024 +
AArch64: Remove duplicated addr_cost tables
Remove duplicated addr_cost tables - use
Clean up the fusion defines by introducing AARCH64_FUSE_BASE as a common base
level of fusion supported by almost all cores. Add AARCH64_FUSE_MOVK as a
shortcut for all MOVK fusion. In most cases there is no change. It enables
AARCH64_FUSE_CMP_BRANCH for a few older cores since it has no measura
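A rough sketch of the two defines mentioned above; the exact members of each
set are an assumption here and live in the aarch64 tuning headers:

  /* Illustrative composition only, not the committed values: AARCH64_FUSE_BASE
     collects fusions supported by almost all cores, and AARCH64_FUSE_MOVK
     groups the MOVK-related fusions.  */
  #define AARCH64_FUSE_MOVK  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK)
  #define AARCH64_FUSE_BASE  (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC)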
Remove duplicated addr_cost tables - use generic_armv9_a_addrcost_table for
Armv9-a cores and generic_armv8_a_addrcost_table for recent Armv8-a cores.
No changes in generated code.
OK for commit?
gcc/ChangeLog:
* config/aarch64/tuning_models/cortexx925.h
(cortexx925_addrcost_table): Re
Hi Richard,
> That's because, once an instruction matches, the instruction should
> continue to match. It should always be possible to set the INSN_CODE of
> an existing instruction to -1, rerun recog, and get the same instruction
> code back.
>
> Because of that, insn conditions shouldn't depend
Hi Richard,
> It's ok for instructions to require properties that are false during
> early RTL passes and then transition to true. But they can't require
> properties that go from true to false, since that would mean that
> existing instructions become unrecognisable at certain points during
> th
v2: split off movsf/df pattern fixes, remove some guality xfails that now pass
The early scheduler takes up ~33% of the total build time; however, it doesn't
provide a meaningful performance gain. This is partly because modern OoO cores
need far less scheduling, partly because the scheduler tends
The IRA combine_and_move pass runs if the scheduler is disabled and aggressively
combines moves. The movsf/df patterns allow all FP immediates since they rely
on a split pattern. However, splits do not happen during IRA, so the result is
extra literal loads. To avoid this, use a more accurate ch
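A small hypothetical example of the constants involved: 0.1f is not a valid
FMOV immediate, so it has to come from a literal pool, and the more accurate
check stops combine_and_move from introducing extra such loads:

  /* 0.1f cannot be encoded as an FMOV immediate, so it is loaded from a
     literal pool; the patch prevents IRA's combine_and_move from creating
     additional literal loads for constants like this.  */
  float scale (float x)
  {
    return x * 0.1f;
  }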
Hi Kyrill,
> I think the approach that I’d like to try is using the TARGET_SCHED_DISPATCH
> hooks like x86 does for bdver1-4.
> That would try to exploit the dispatch constraints information in the SWOGs
> rather than the instruction latency and throughput tables.
> That would still require some
Hi Andrew,
> I suspect the following scheduling models could be removed due either
> to hw never going to production or no longer being used by anyone:
> thunderx3t110.md
> falkor.md
> saphira.md
If you're planning to remove these, it would also be good to remove the
falkor-tag-collision-avoidanc
The early scheduler takes up ~33% of the total build time; however, it doesn't
provide a meaningful performance gain. This is partly because modern OoO cores
need far less scheduling, partly because the scheduler tends to create many
unnecessary spills by increasing register pressure. Building ap
Hi Vineet,
> I agree the NARROW/WIDE stuff is obfuscating things in technicalities.
Is there evidence this change would make things significantly worse for
some targets? I did a few runs on Neoverse V2 with various options and
it looks beneficial both for integer and FP. On the example and option
https://gcc.gnu.org/g:7c17058eac3834fb03ec9e518235e4192557b97d
commit r15-4678-g7c17058eac3834fb03ec9e518235e4192557b97d
Author: Wilco Dijkstra
Date: Fri Oct 25 14:53:58 2024 +
AArch64: Add more accurate constraint [PR117292]
As shown in the PR, reload may only check the
As shown in the PR, reload may only check the constraint in some cases and
not check that the predicate is still valid for the resulting instruction.
To fix the issue, add a new constraint which matches the predicate exactly.
Passes regress & bootstrap, OK for commit?
gcc/ChangeLog:
PR ta
https://gcc.gnu.org/g:2ac01a4efceacb9f2f9433db636545885296da0a
commit r15-4572-g2ac01a4efceacb9f2f9433db636545885296da0a
Author: Wilco Dijkstra
Date: Thu Oct 17 14:33:44 2024 +
AArch64: Remove redundant check in aarch64_simd_mov
The split condition in aarch64_simd_mov uses
https://gcc.gnu.org/g:7c7c895c2f34d2a5c0cd2139c5e76c13c6c030c9
commit r15-4571-g7c7c895c2f34d2a5c0cd2139c5e76c13c6c030c9
Author: Wilco Dijkstra
Date: Tue Oct 15 16:22:23 2024 +
AArch64: Fix copysign patterns
The current copysign pattern has a mismatch in the predicates and
https://gcc.gnu.org/g:756890d66cf4971fc11187ccdf5893681aa661a1
commit r15-4568-g756890d66cf4971fc11187ccdf5893681aa661a1
Author: Wilco Dijkstra
Date: Tue Oct 8 15:55:25 2024 +
AArch64: Improve SIMD immediate generation (2/3)
Allow use of SVE immediates when generating AdvSIMD
https://gcc.gnu.org/g:22a37534c640ca5ff2f0e947dfe60df59fb6bba1
commit r15-4569-g22a37534c640ca5ff2f0e947dfe60df59fb6bba1
Author: Wilco Dijkstra
Date: Mon Oct 14 16:53:44 2024 +
AArch64: Add support for SIMD xor immediate (3/3)
Add support for SVE xor immediate when generating
https://gcc.gnu.org/g:bcbf4fa46ae2919cf281322bd39f4810b7c18c9d
commit r15-4567-gbcbf4fa46ae2919cf281322bd39f4810b7c18c9d
Author: Wilco Dijkstra
Date: Tue Oct 8 13:32:09 2024 +
AArch64: Improve SIMD immediate generation (1/3)
Cleanup the various interfaces related to SIMD
The split condition in aarch64_simd_mov uses aarch64_simd_special_constant_p.
While doing the split, it checks the mode before calling
aarch64_maybe_generate_simd_constant. This is risky since it may result in
unexpectedly calling aarch64_split_simd_move instead of
aarch64_maybe_generate_simd_con
The current copysign pattern has a mismatch in the predicates and constraints -
operand[2] is a register_operand but also has an alternative X which allows any
operand. Since it is a floating point operation, having an integer alternative
makes no sense. Change the expander to always use vector i
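For reference, a small example of the operation (not from the patch); the
point is that the sign operand is a floating-point value, so the pattern
should not offer an integer alternative for it:

  #include <math.h>

  /* copysign (x, y) returns the magnitude of x with the sign of y; both
     operands are floating-point, so operand 2 belongs in FP/vector registers.  */
  double set_sign (double x, double y)
  {
    return copysign (x, y);
  }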
Add support for SVE xor immediate when generating AdvSIMD code and SVE is
available.
Passes bootstrap & regress, OK for commit?
gcc/ChangeLog:
* config/aarch64/aarch64.cc (enum simd_immediate_check): Add
AARCH64_CHECK_XOR.
(aarch64_simd_valid_xor_imm): New function.
(a
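A hypothetical example of what this enables: AdvSIMD has no EOR (immediate),
but when SVE is available a constant like the one below can be encoded as an
SVE bitmask immediate, so the XOR needs no separately materialised constant:

  #include <arm_neon.h>

  /* 0xff00ff00 per 32-bit lane is a valid bitmask immediate, so with SVE
     enabled the XOR can use EOR (immediate) instead of building the constant
     in a register first.  */
  uint32x4_t toggle_bytes (uint32x4_t x)
  {
    return veorq_u32 (x, vdupq_n_u32 (0xff00ff00));
  }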
Allow use of SVE immediates when generating AdvSIMD code and SVE is available.
First check for a valid AdvSIMD immediate, and if SVE is available, try using
an SVE move or bitmask immediate.
Passes bootstrap & regress, OK for commit?
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (ior3
Clean up the various interfaces related to SIMD immediate generation.
Introduce new functions that make it clear which operation (AND, OR, MOV) we
are testing for rather than guessing the final instruction. Reduce the use of
overly long names and of unused and default parameters, for clarity. No cha
Hi Saurabh,
This looks good, one little nit:
> gcc/ChangeLog:
>
> * config/aarch64/iterators.md: Move UNSPEC_COND_SMAX and
> UNSPEC_COND_SMIN to correct iterators.
This should also have the PR target/116934 before it - it's fine to change it
when you commit.
Speaking of which,
v2: Add more testcase fixes.
The current copysign pattern has a mismatch in the predicates and constraints -
operand[2] is a register_operand but also has an alternative X which allows any
operand. Since it is a floating point operation, having an integer alternative
makes no sense. Change the e
The current copysign pattern has a mismatch in the predicates and constraints -
operand[2] is a register_operand but also has an alternative X which allows any
operand. Since it is a floating point operation, having an integer alternative
makes no sense. Change the expander to always use the vec
https://gcc.gnu.org/g:72753ec82076d15443c32aac88a8c0fa0ab4bc2f
commit r14-10399-g72753ec82076d15443c32aac88a8c0fa0ab4bc2f
Author: Alfie Richards
Date: Thu Jul 4 09:09:19 2024 +0200
Aarch64, bugfix: Fix NEON bigendian addp intrinsic [PR114890]
This change removes code that switches
https://gcc.gnu.org/g:83332e3f808b146ca06dbc6a91d15bd3e5650658
commit r14-10398-g83332e3f808b146ca06dbc6a91d15bd3e5650658
Author: Wilco Dijkstra
Date: Fri Jul 5 17:31:25 2024 +0100
Arm: Fix ldrd offset range [PR115153]
The valid offset range of LDRD in arm_legitimate_index_p is
https://gcc.gnu.org/g:44e5ecfd261afe72aa04eba4bf1a9ec782579cab
commit r15-1865-g44e5ecfd261afe72aa04eba4bf1a9ec782579cab
Author: Wilco Dijkstra
Date: Fri Jul 5 17:31:25 2024 +0100
Arm: Fix ldrd offset range [PR115153]
The valid offset range of LDRD in arm_legitimate_index_p is
https://gcc.gnu.org/g:b9d16d8361a9e3a82a2f21e759e760d235d43322
commit r12-10603-gb9d16d8361a9e3a82a2f21e759e760d235d43322
Author: Wilco Dijkstra
Date: Wed Oct 25 16:28:04 2023 +0100
AArch64: Fix strict-align cpymem/setmem [PR103100]
The cpymemdi/setmemdi implementation doesn
https://gcc.gnu.org/g:100d353e545564931efaac90a089a4e8f3d42e6e
commit r14-10383-g100d353e545564931efaac90a089a4e8f3d42e6e
Author: Wilco Dijkstra
Date: Tue Jul 2 17:37:04 2024 +0100
Arm: Fix disassembly error in Thumb-1 relaxed load/store [PR115188]
A Thumb-1 memory operand allows
https://gcc.gnu.org/g:d04c5537f5ae4a3acd3f5135347d7e2d8c218811
commit r15-1786-gd04c5537f5ae4a3acd3f5135347d7e2d8c218811
Author: Wilco Dijkstra
Date: Tue Jul 2 17:37:04 2024 +0100
Arm: Fix disassembly error in Thumb-1 relaxed load/store [PR115188]
A Thumb-1 memory operand allows