Re: [PATCH] Document AArch64 changes for GCC 15

Andrew Pinski Tue, 22 Apr 2025 13:04:30 -0700

On Tue, Apr 22, 2025 at 5:32 AM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> The list is structured as:
>
> - new configurations
> - command-line changes
> - ACLE changes
> - everything else
>
> As usual, the list of new architectures, CPUs, and features is from a
> purely mechanical trawl of the associated .def files.  I've identified
> features by their architectural name to try to improve searchability.
> Similarly, the list of ACLE changes includes the associated ACLE
> feature macros, again to try to improve searchability.
>
> The list summarises some of the target-specific optimisations because
> it sounded like Tamar had received feedback that people found such
> information interesting.
>
> I've used the passive voice for most entries, to try to follow the
> style used elsewhere.
>
> We don't yet define __ARM_FEATURE_FAMINMAX, but I'll fix that
> separately.
>
> How does this look?  Anything I missed?


I don't see a mention that, although support for falkor and saphira still
exists, most of the tuning for them has been removed
(the scheduler and the tag collision pass were removed).

Maybe a mention that the pre-RA scheduler is disabled at -O2? (I am
not 100% sure this should be mentioned).

Those are the only 2 I saw missing.

Thanks,
Andrew Pinski

>
> I'll leave a few days for comments.
>
> Thanks,
> Richard
>
> ---
>  htdocs/gcc-15/changes.html | 241 ++++++++++++++++++++++++++++++++++++-
>  1 file changed, 240 insertions(+), 1 deletion(-)
>
> diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
> index f03e29c8..dee476c7 100644
> --- a/htdocs/gcc-15/changes.html
> +++ b/htdocs/gcc-15/changes.html
> @@ -681,7 +681,246 @@ asm (".text; %cc0: mov %cc2, %%r0; .previous;"
>  <!-- .................................................................. -->
>  <h2 id="targets">New Targets and Target Specific Improvements</h2>
>
> -<!-- <h3 id="aarch64">AArch64</h3> -->
> +<h3 id="aarch64">AArch64</h3>
> +
> +<ul>
> +  <li>Support has been added for the AArch64 MinGW target
> +    (<code>aarch64-w64-mingw32</code>).  At present, this target only
> +    supports C, but further work is planned.
> +  </li>
> +
> +  <li>The following architecture level is now supported by
> +    <code>-march</code> and related source-level constructs
> +    (GCC identifiers in parentheses):
> +    <ul>
> +      <li>Armv9.5-A (<code>armv9.5-a</code>)</li>
> +    </ul>
> +  </li>
> +  <li>The following CPUs are now supported by <code>-mcpu</code>,
> +    <code>-mtune</code>, and related source-level constructs
> +    (GCC identifiers in parentheses):
> +    <ul>
> +      <li>Apple A12 (<code>apple-a12</code>)</li>
> +      <li>Apple M1 (<code>apple-m1</code>)</li>
> +      <li>Apple M2 (<code>apple-m2</code>)</li>
> +      <li>Apple M3 (<code>apple-m3</code>)</li>
> +      <li>Arm Cortex-A520AE (<code>cortex-a520ae</code>)</li>
> +      <li>Arm Cortex-A720AE (<code>cortex-a720ae</code>)</li>
> +      <li>Arm Cortex-A725 (<code>cortex-a725</code>)</li>
> +      <li>Arm Cortex-R82AE (<code>cortex-r82ae</code>)</li>
> +      <li>Arm Cortex-X925 (<code>cortex-x925</code>)</li>
> +      <li>Arm Neoverse N3 (<code>neoverse-n3</code>)</li>
> +      <li>Arm Neoverse V3 (<code>neoverse-v3</code>)</li>
> +      <li>Arm Neoverse V3AE (<code>neoverse-v3ae</code>)</li>
> +      <li>FUJITSU-MONAKA (<code>fujitsu-monaka</code>)</li>
> +      <li>NVIDIA Grace (<code>grace</code>)</li>
> +      <li>NVIDIA Olympus (<code>olympus</code>)</li>
> +      <li>Qualcomm Oryon-1 (<code>oryon-1</code>)</li>
> +    </ul>
> +  </li>
> +  <li>The following features are now supported by <code>-march</code>,
> +    <code>-mcpu</code>, and related source-level constructs
> +    (GCC modifiers in parentheses):
> +    <ul>
> +      <li>FEAT_CPA (<code>+cpa</code>), enabled by default for
> +        Armv9.5-A and above
> +      </li>
> +      <li>FEAT_FAMINMAX (<code>+faminmax</code>), enabled by default for
> +        Armv9.5-A and above
> +      </li>
> +      <li>FEAT_FCMA (<code>+fcma</code>), enabled by default for Armv8.3-A
> +        and above
> +      </li>
> +      <li>FEAT_FLAGM2 (<code>+flagm2</code>), enabled by default for
> +        Armv8.5-A and above
> +      </li>
> +      <li>FEAT_FP8 (<code>+fp8</code>)</li>
> +      <li>FEAT_FP8DOT2 (<code>+fp8dot2</code>)</li>
> +      <li>FEAT_FP8DOT4 (<code>+fp8dot4</code>)</li>
> +      <li>FEAT_FP8FMA (<code>+fp8fma</code>)</li>
> +      <li>FEAT_FRINTTS (<code>+frintts</code>), enabled by default for
> +        Armv8.5-A and above
> +      </li>
> +      <li>FEAT_JSCVT (<code>+jscvt</code>), enabled by default for
> +        Armv8.3-A and above
> +      </li>
> +      <li>FEAT_LUT (<code>+lut</code>), enabled by default for
> +        Armv9.5-A and above
> +      </li>
> +      <li>FEAT_LRCPC2 (<code>+rcpc2</code>), enabled by default for
> +        Armv8.4-A and above
> +      </li>
> +      <li>FEAT_SME_B16B16 (<code>+sme-b16b16</code>)</li>
> +      <li>FEAT_SME_F16F16 (<code>+sme-f16f16</code>)</li>
> +      <li>FEAT_SME2p1 (<code>+sme2p1</code>)</li>
> +      <li>FEAT_SSVE_FP8DOT2 (<code>+ssve-fp8dot2</code>)</li>
> +      <li>FEAT_SSVE_FP8DOT4 (<code>+ssve-fp8dot4</code>)</li>
> +      <li>FEAT_SSVE_FP8FMA (<code>+ssve-fp8fma</code>)</li>
> +      <li>FEAT_SVE_B16B16 (<code>+sve-b16b16</code>)</li>
> +      <li>FEAT_SVE2p1 (<code>+sve2p1</code>), enabled by default for
> +        Armv9.4-A and above
> +      </li>
> +      <li>FEAT_WFXT (<code>+wfxt</code>), enabled by default for
> +        Armv8.7-A and above
> +      </li>
> +      <li>FEAT_XS (<code>+xs</code>), enabled by default for
> +        Armv8.7-A and above
> +      </li>
> +    </ul>
> +    The features listed as being enabled by default for Armv8.7-A or earlier
> +    were previously only selectable using the associated architecture level.
> +    For example, FEAT_FCMA was previously selected by
> +    <code>-march=armv8.3-a</code> and above (as it still is), but it wasn't
> +    previously selectable independently.
> +  </li>
> +  <li>The <code>-mbranch-protection</code> option has been extended to
> +    support the Guarded Control Stack (GCS) extension.  This support
> +    is included in <code>-mbranch-protection=standard</code> and can
> +    be enabled individually using <code>-mbranch-protection=gcs</code>.
> +  </li>
> +  <li>The following additional changes have been made to the
> +    command-line options:
> +    <ul>
> +      <li>In order to align with other tools, the SME feature modifier
> +        <code>+sme</code> no longer enables SVE.  However, GCC does not
> +        yet support using SME without SVE and instead rejects such
> +        combinations with a “not implemented” error.
> +      </li>
> +      <li>The options <code>-mfix-cortex-a53-835769</code> and
> +        <code>-mfix-cortex-a53-843419</code> are now silently ignored
> +        if the selected architecture is incompatible with Cortex-A53.
> +        This is particularly useful for toolchains that are configured
> +        to apply the Cortex-A53 workarounds by default.  For example,
> +        all other things being equal, a toolchain configured with
> +        <code>--enable-fix-cortex-a53-835769</code> now produces the
> +        same code for <code>-mcpu=neoverse-n2</code> as a toolchain
> +        configured without <code>--enable-fix-cortex-a53-835769</code>.
> +      </li>
> +      <li><code>-mcpu=native</code> now handles unrecognized heterogeneous
> +        systems by detecting which individual architecture features are
> +        supported by the CPUs.  This matches the preexisting behavior for
> +        unknown homogeneous systems.
> +      </li>
> +    </ul>
> +  </li>
> +  <li>Support has been added for the following features of the Arm C
> +    Language Extensions
> +    (<a href="https://github.com/ARM-software/acle">ACLE</a>):
> +    <ul>
> +      <li>Guarded control stacks</li>
> +      <li>Lookup table instructions with 2-bit and 4-bit indices
> +        (predefined macro
> +        <code>__ARM_FEATURE_LUT</code>, enabled by <code>+lut</code>)
> +      </li>
> +      <li>Floating-point absolute minimum and maximum instructions
> +        (predefined macro <code>__ARM_FEATURE_FAMINMAX</code>,
> +        enabled by <code>+faminmax</code>)
> +      </li>
> +      <li>FP8 conversions (predefined macro
> +        <code>__ARM_FEATURE_FP8</code>, enabled by <code>+fp8</code>)
> +      </li>
> +      <li>FP8 2-way dot product to half precision instructions
> +        (predefined macro <code>__ARM_FEATURE_FP8DOT2</code>,
> +        enabled by <code>+fp8dot2</code>)
> +      </li>
> +      <li>FP8 4-way dot product to single precision instructions
> +        (predefined macro <code>__ARM_FEATURE_FP8DOT4</code>,
> +        enabled by <code>+fp8dot4</code>)
> +      </li>
> +      <li>FP8 multiply-accumulate to half precision and single precision
> +        instructions (predefined macro <code>__ARM_FEATURE_FP8FMA</code>,
> +        enabled by <code>+fp8fma</code>)
> +      </li>
> +      <li>SVE FP8 2-way dot product to half precision instructions
> +        (predefined macro <code>__ARM_FEATURE_SSVE_FP8DOT2</code>,
> +        enabled by <code>+ssve-fp8dot2</code>)
> +      </li>
> +      <li>SVE FP8 4-way dot product to single precision instructions
> +        (predefined macro <code>__ARM_FEATURE_SSVE_FP8DOT4</code>,
> +        enabled by <code>+ssve-fp8dot4</code>)
> +      </li>
> +      <li>SVE FP8 multiply-accumulate to half precision and single precision
> +        instructions (predefined macro <code>__ARM_FEATURE_SSVE_FP8FMA</code>,
> +        enabled by <code>+ssve-fp8fma</code>)
> +      </li>
> +      <li>SVE2.1 instructions (predefined macro
> +        <code>__ARM_FEATURE_SVE2p1</code>, enabled by <code>+sve2p1</code>)
> +      </li>
> +      <li>SVE non-widening bfloat16 instructions
> +        (predefined macro <code>__ARM_FEATURE_SVE_B16B16</code>,
> +        enabled by <code>+sve-b16b16</code>)
> +      </li>
> +      <li>SME2.1 instructions (predefined macro
> +        <code>__ARM_FEATURE_SME2p1</code>, enabled by <code>+sme2p1</code>)
> +      </li>
> +      <li>SME non-widening bfloat16 instructions
> +        (predefined macro <code>__ARM_FEATURE_SME_B16B16</code>,
> +        enabled by <code>+sme-b16b16</code>)
> +      </li>
> +      <li>SME half-precision instructions
> +        (predefined macro <code>__ARM_FEATURE_SME_F16F16</code>,
> +        enabled by <code>+sme-f16f16</code>)
> +      </li>
> +      <li>Using C and C++ prefix operators, infix operators, and postfix
> +        operators with scalable SVE ACLE types
> +        (predefined macro <code>__ARM_FEATURE_SVE_VECTOR_OPERATORS==2</code>,
> +        enabled by <code>+sve</code>)
> +      </li>
> +      <li><code>__fma</code> (in <code>arm_acle.h</code>)</li>
> +      <li><code>__fmaf</code> (in <code>arm_acle.h</code>)</li>
> +      <li><code>__chkfeat</code> (in <code>arm_acle.h</code>)</li>
> +    </ul>
> +  </li>
> +  <li>In addition, the following changes have been made to preexisting
> +    ACLE features:
> +    <ul>
> +      <li>The macros <code>__ARM_FEATURE_BF16</code> and
> +        <code>__ARM_FEATURE_SVE_BF16</code> are now predefined when the
> +        associated support is available.  Previous versions of GCC provided
> +        the associated intrinsics but did not predefine the macros.
> +      </li>
> +      <li>OpenMP tasks can now share scalable SVE vectors and predicates.
> +        However, offloading of scalable vectors and predicates is not
> +        supported.
> +      </li>
> +      <li>ACLE system register functions (such as <code>__arm_rsr</code>
> +        and <code>__arm_wsr</code>) no longer try to enforce the minimum
> +        architectural requirement.
> +      </li>
> +      <li>A warning is reported if code attempts to use the Function
> +        Multi-Versioning feature.  GCC's current implementation of this
> +        feature is still experimental and it does not conform to the
> +        ACLE specification.
> +      </li>
> +    </ul>
> +  </li>
> +  <li>Support has been added for the <code>indirect_return</code>
> +    function-type attribute, which indicates that a function might return
> +    via an indirect branch instead of via a normal return instruction.
> +  </li>
> +  <li>128-bit atomic operations have been extended to make use of
> +    FEAT_LRCPC3 instructions, when support for the instructions is
> +    detected at runtime.
> +  </li>
> +  <li>There have been many code-generation improvements to the AArch64 port.
> +    Some examples are:
> +    <ul>
> +      <li>automatic use of AArch64 CRC instructions</li>
> +      <li>automatic use of AArch64 saturating vector arithmetic
> +        instructions
> +      </li>
> +      <li>better code generation of population counts</li>
> +      <li>improved handling of floating-point and vector immediates</li>
> +      <li>improved handling of vector permutations</li>
> +      <li>more use of SVE instructions to optimize Advanced SIMD code</li>
> +      <li>more folding and simplification of SVE ACLE intrinsics</li>
> +      <li>improved CPU-specific tuning</li>
> +      <li>improved register allocation, such as eliminating some
> +        vector moves
> +      </li>
> +    </ul>
> +  </li>
> +</ul>
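As a hedged illustration of the code-generation list above, the C sketch below shows portable source idioms of the kind those items refer to: a bitwise CRC-32C loop (using the Castagnoli polynomial that the AArch64 CRC32C instructions implement), an unsigned saturating add, and a population count through GCC's `__builtin_popcountll`. The function names are hypothetical, and the sketch makes no claim about what GCC 15 actually emits for them; that depends on the target options (for example `+crc`) and on the optimizer recognizing each pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bitwise, reflected CRC-32C (reversed polynomial
   0x82F63B78, init and final XOR 0xFFFFFFFF). A plain loop like this is
   the kind of code CRC-loop recognition targets; not guaranteed. */
uint32_t crc32c_bitwise(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            /* Shift right; XOR in the polynomial if the low bit was set. */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical helper: unsigned saturating add that clamps at 255
   instead of wrapping. A loop of this shape is a vectorization
   candidate for UQADD. */
void sat_add_u8(uint8_t *restrict out, const uint8_t *restrict a,
                const uint8_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)a[i] + b[i];
        out[i] = sum > UINT8_MAX ? UINT8_MAX : (uint8_t)sum;
    }
}

/* Hypothetical helper: number of set bits, via the GCC builtin. */
int popcount64(uint64_t x)
{
    return __builtin_popcountll(x);
}
```

Compiling a file like this with `-O2 -march=armv8-a+crc -S` and reading the assembly is one way to see which of the patterns are recognized.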
>
>  <h3 id="amdgcn">AMD GPU (GCN)</h3>
>
> --
> 2.43.0
>
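The cover letter notes that GCC does not yet predefine `__ARM_FEATURE_FAMINMAX`, and the patch lists each ACLE feature macro so that it can be searched for. A minimal C sketch of the usual consumer-side pattern, assuming nothing beyond standard C and the macro name from the patch: test the macro at compile time and keep a portable scalar reference for the floating-point absolute minimum and maximum operations. The helper names are hypothetical, and the reference does not model the instructions' NaN behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: 1 if the ACLE macro for FEAT_FAMINMAX is
   predefined for the current target options, else 0. Per the cover
   letter, GCC did not yet predefine it when the patch was posted. */
int faminmax_macro_defined(void)
{
#ifdef __ARM_FEATURE_FAMINMAX
    return 1;
#else
    return 0;
#endif
}

/* |x| without libm; maps -0.0f to +0.0f and returns NaN unchanged. */
static float abs_ref(float x)
{
    return x < 0.0f ? -x : (x == 0.0f ? 0.0f : x);
}

/* Hypothetical scalar references: max(|a|, |b|) and min(|a|, |b|).
   NaN inputs are not modeled faithfully. */
float absmax_ref(float a, float b)
{
    float x = abs_ref(a), y = abs_ref(b);
    return x > y ? x : y;
}

float absmin_ref(float a, float b)
{
    float x = abs_ref(a), y = abs_ref(b);
    return x < y ? x : y;
}

/* Hypothetical element-wise absolute maximum over arrays of length n. */
void absmax_array(float *restrict out, const float *restrict a,
                  const float *restrict b, size_t n)
{
    assert(n == 0 || (out != NULL && a != NULL && b != NULL));
    for (size_t i = 0; i < n; i++)
        out[i] = absmax_ref(a[i], b[i]);
}
```

Keeping the portable path as the default means the code stays correct whether or not the macro is predefined by a given compiler version.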
