[PATCH] Canonicalize vec_merge in simplify_ternary_operation
Similar to the canonicalization done in combine, we canonicalize vec_merge with
swap_commutative_operands_p in simplify_ternary_operation too.

gcc/ChangeLog:

    * config/aarch64/aarch64-protos.h (aarch64_exact_log2_inverse): New.
    * config/aarch64/aarch64-simd.md (aarch64_simd_vec_set_zero): Update
    pattern accordingly.
    * config/aarch64/aarch64.cc (aarch64_exact_log2_inverse): New.
    * simplify-rtx.cc (simplify_context::simplify_ternary_operation):
    Canonicalize vec_merge.

Signed-off-by: Pengxuan Zheng
---
 gcc/config/aarch64/aarch64-protos.h | 1 +
 gcc/config/aarch64/aarch64-simd.md | 10 ++
 gcc/config/aarch64/aarch64.cc | 10 ++
 gcc/simplify-rtx.cc | 7 +++
 4 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 4235f4a0ca5..2391b99cacd 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1051,6 +1051,7 @@ void aarch64_subvti_scratch_regs (rtx, rtx, rtx *, rtx *, rtx *, rtx *);
 void aarch64_expand_subvti (rtx, rtx, rtx, rtx, rtx, rtx, rtx, bool);
+int aarch64_exact_log2_inverse (unsigned int, rtx);
 /* Initialize builtins for SIMD intrinsics. */
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index e2afe87e513..1099e742cbf 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1193,12 +1193,14 @@ (define_insn "@aarch64_simd_vec_set"
 (define_insn "aarch64_simd_vec_set_zero"
 [(set (match_operand:VALL_F16 0 "register_operand" "=w")
 (vec_merge:VALL_F16
- (match_operand:VALL_F16 1 "aarch64_simd_imm_zero" "")
- (match_operand:VALL_F16 3 "register_operand" "0")
+ (match_operand:VALL_F16 1 "register_operand" "0")
+ (match_operand:VALL_F16 3 "aarch64_simd_imm_zero" "")
 (match_operand:SI 2 "immediate_operand" "i")))]
- "TARGET_SIMD && exact_log2 (INTVAL (operands[2])) >= 0"
+ "TARGET_SIMD && aarch64_exact_log2_inverse (, operands[2]) >= 0"
 {
-int elt = ENDIAN_LANE_N (, exact_log2 (INTVAL (operands[2])));
+int elt = ENDIAN_LANE_N (,
+aarch64_exact_log2_inverse (,
+operands[2]));
 operands[2] = GEN_INT ((HOST_WIDE_INT) 1 << elt);
 return "ins\\t%0.[%p2], zr";
 }
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f5f23f6ff4b..103a00915e5 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -23682,6 +23682,16 @@ aarch64_strided_registers_p (rtx *operands, unsigned int num_operands,
 return true;
 }

+/* Return the base 2 logarithm of the bit inverse of OP masked by the lowest
+ NELTS bits, if OP is a power of 2. Otherwise, returns -1. */
+
+int
+aarch64_exact_log2_inverse (unsigned int nelts, rtx op)
+{
+ return exact_log2 ((~INTVAL (op))
+& ((HOST_WIDE_INT_1U << nelts) - 1));
+}
+
 /* Bounds-check lanes. Ensure OPERAND lies between LOW (inclusive) and
 HIGH (exclusive). */
 void
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index c478bd060fc..22002d1e1ab 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -7307,6 +7307,13 @@ simplify_context::simplify_ternary_operation (rtx_code code, machine_mode mode,
 return gen_rtx_CONST_VECTOR (mode, v);
 }

+ if (swap_commutative_operands_p (op0, op1)
+ /* Two operands have same precedence, then first bit of mask
+select first operand. */
+ || (!swap_commutative_operands_p (op1, op0) && !(sel & 1)))
+ return simplify_gen_ternary (code, mode, mode, op1, op0,
+GEN_INT (~sel & mask));
+
 /* Replace (vec_merge (vec_merge a b m) c n) with (vec_merge b c n)
 if no element from a appears in the result. */
 if (GET_CODE (op0) == VEC_MERGE)
--
2.17.1
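For readers following the mask arithmetic in this patch: swapping the two vec_merge operands means inverting the selector within its low NELTS bits, and the aarch64 pattern then recovers the lane from that inverted mask. The following is a minimal standalone sketch in C, not GCC code. The names exact_log2_u64, swap_vec_merge_mask and exact_log2_inverse are made up for illustration, plain integers stand in for rtx values, and NELTS is assumed to be below 64.

```c
#include <assert.h>
#include <stdint.h>

/* Return log2 of X if X is an exact power of two, else -1.
   Illustrative stand-in for GCC's exact_log2, not its implementation.  */
static int
exact_log2_u64 (uint64_t x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int n = 0;
  while ((x >>= 1) != 0)
    n++;
  return n;
}

/* Selector SEL' such that (vec_merge b a SEL') picks the same elements
   as (vec_merge a b SEL) on a vector of NELTS elements.  */
static uint64_t
swap_vec_merge_mask (unsigned int nelts, uint64_t sel)
{
  assert (nelts < 64);	/* Assumption of this sketch.  */
  uint64_t mask = ((uint64_t) 1 << nelts) - 1;
  return ~sel & mask;
}

/* Index of the single clear bit in the low NELTS bits of SEL,
   or -1 if there is not exactly one such bit.  */
static int
exact_log2_inverse (unsigned int nelts, uint64_t sel)
{
  return exact_log2_u64 (swap_vec_merge_mask (nelts, sel));
}
```

With four lanes, a selector of 0x2 becomes 0xD once the operands are swapped, and exact_log2_inverse (4, 0xD) gives lane 1 again.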
[PATCH] aarch64: Fix testcase pr112105.c
This testcase started to fail with r15-268-g9dbff9c05520a7. When late_combine
was added, it was enabled only at -O2 and above, so the testcase still failed
at -O. This changes the option to -O2 instead of -O, and the testcase passes
again.

Tested on aarch64-linux-gnu.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/pr112105.c: Change to be -O2 rather than -O1.

Signed-off-by: Andrew Pinski
---
 gcc/testsuite/gcc.target/aarch64/pr112105.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/pr112105.c b/gcc/testsuite/gcc.target/aarch64/pr112105.c
index 1368ea3f784..5e60c6184b7 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr112105.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr112105.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O" } */
+/* { dg-options "-O2" } */
 #include
 typedef struct {
--
2.43.0
[committed] [PR middle-end/113525] Drop obsolete options from documentation
The sibling and unshare passes were dropped as distinct passes 10+ years ago.
Docs weren't ever updated. This just removes them; given their age I don't
think we need to keep them around any longer.

Pushing to the trunk.

Jeff

commit 3e93035fcc9247928b58443e37fbf844278b7ac7
Author: Jeff Law
Date: Tue Feb 18 19:45:29 2025 -0700

    [PR middle-end/113525] Drop obsolete options from documentation

    The sibling and unshare passes were dropped as distinct passes 10+ years
    ago. Docs weren't ever updated. This just removes them; given their age I
    don't think we need to keep them around any longer.

    PR middle-end/113525
    gcc/
    * doc/invoke.texi (dump-rtl-sibling): Drop documentation for pass
    removed long ago.
    (dump-rtl-unshare): Likewise.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d0e0ca80b0c..0c7adc039b5 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -20383,10 +20383,6 @@ Dump after common sequence discovery.
 @item -fdump-rtl-shorten
 Dump after shortening branches.

-@opindex fdump-rtl-sibling
-@item -fdump-rtl-sibling
-Dump after sibling call optimizations.
-
 @opindex fdump-rtl-split1
 @opindex fdump-rtl-split2
 @opindex fdump-rtl-split3
@@ -20417,10 +20413,6 @@ x87's stack-like registers. This pass is only run on x86 variants.
 @option{-fdump-rtl-subreg1} and @option{-fdump-rtl-subreg2} enable
 dumping after the two subreg expansion passes.

-@opindex fdump-rtl-unshare
-@item -fdump-rtl-unshare
-Dump after all rtl has been unshared.
-
 @opindex fdump-rtl-vartrack
 @item -fdump-rtl-vartrack
 Dump after variable tracking.
Re: [PATCH] LoongArch: Use normal RTL pattern instead of UNSPEC for {x,}vsr{a,l}ri instructions
LGTM! Thanks!

On 2025/2/14 at 9:37 PM, Xi Ruoyao wrote:

Allow (t + (1ul << imm >> 1)) >> imm to be recognized as a rounding shift
operation.

gcc/ChangeLog:

    * config/loongarch/lasx.md (UNSPEC_LASX_XVSRARI): Remove.
    (UNSPEC_LASX_XVSRLRI): Remove.
    (lasx_xvsrari_): Remove.
    (lasx_xvsrlri_): Remove.
    * config/loongarch/lsx.md (UNSPEC_LSX_VSRARI): Remove.
    (UNSPEC_LSX_VSRLRI): Remove.
    (lsx_vsrari_): Remove.
    (lsx_vsrlri_): Remove.
    * config/loongarch/simd.md (simd__imm_round_): New define_insn.
    (_vri_): New define_expand.

gcc/testsuite/ChangeLog:

    * gcc.target/loongarch/vect-shift-imm-round.c: New test.
---
 gcc/config/loongarch/lasx.md | 22 --
 gcc/config/loongarch/lsx.md | 22 --
 gcc/config/loongarch/simd.md | 29 +++
 .../loongarch/vect-shift-imm-round.c | 11 +++
 4 files changed, 40 insertions(+), 44 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-shift-imm-round.c

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 4ac85b7fcf9..e4505c1660d 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -43,9 +43,7 @@ (define_c_enum "unspec" [
 UNSPEC_LASX_XVSAT_U
 UNSPEC_LASX_XVREPL128VEI
 UNSPEC_LASX_XVSRAR
- UNSPEC_LASX_XVSRARI
 UNSPEC_LASX_XVSRLR
- UNSPEC_LASX_XVSRLRI
 UNSPEC_LASX_XVSHUF
 UNSPEC_LASX_XVSHUF_B
 UNSPEC_LASX_BRANCH
@@ -2035,16 +2033,6 @@ (define_insn "lasx_xvsrar_"
 [(set_attr "type" "simd_shift")
 (set_attr "mode" "")])

-(define_insn "lasx_xvsrari_"
- [(set (match_operand:ILASX 0 "register_operand" "=f")
- (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f")
- (match_operand 2 "const__operand" "")]
- UNSPEC_LASX_XVSRARI))]
- "ISA_HAS_LASX"
- "xvsrari.\t%u0,%u1,%2"
- [(set_attr "type" "simd_shift")
- (set_attr "mode" "")])
-
 (define_insn "lasx_xvsrlr_"
 [(set (match_operand:ILASX 0 "register_operand" "=f")
 (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f")
@@ -2055,16 +2043,6 @@ (define_insn "lasx_xvsrlr_"
 [(set_attr "type" "simd_shift")
 (set_attr "mode" "")])

-(define_insn "lasx_xvsrlri_"
- [(set (match_operand:ILASX 0 "register_operand" "=f") - (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") - (match_operand 2 "const__operand" "")] - UNSPEC_LASX_XVSRLRI))] - "ISA_HAS_LASX" - "xvsrlri.\t%u0,%u1,%2" - [(set_attr "type" "simd_shift") - (set_attr "mode" "")]) - (define_insn "lasx_xvssub_s_" [(set (match_operand:ILASX 0 "register_operand" "=f") (ss_minus:ILASX (match_operand:ILASX 1 "register_operand" "f") diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md index 9d7254768ae..c35826ffc0e 100644 --- a/gcc/config/loongarch/lsx.md +++ b/gcc/config/loongarch/lsx.md @@ -44,9 +44,7 @@ (define_c_enum "unspec" [ UNSPEC_LSX_VSAT_S UNSPEC_LSX_VSAT_U UNSPEC_LSX_VSRAR - UNSPEC_LSX_VSRARI UNSPEC_LSX_VSRLR - UNSPEC_LSX_VSRLRI UNSPEC_LSX_VSHUF UNSPEC_LSX_VEXTW_S UNSPEC_LSX_VEXTW_U @@ -1710,16 +1708,6 @@ (define_insn "lsx_vsrar_" [(set_attr "type" "simd_shift") (set_attr "mode" "")]) -(define_insn "lsx_vsrari_" - [(set (match_operand:ILSX 0 "register_operand" "=f") - (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") - (match_operand 2 "const__operand" "")] -UNSPEC_LSX_VSRARI))] - "ISA_HAS_LSX" - "vsrari.\t%w0,%w1,%2" - [(set_attr "type" "simd_shift") - (set_attr "mode" "")]) - (define_insn "lsx_vsrlr_" [(set (match_operand:ILSX 0 "register_operand" "=f") (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") @@ -1730,16 +1718,6 @@ (define_insn "lsx_vsrlr_" [(set_attr "type" "simd_shift") (set_attr "mode" "")]) -(define_insn "lsx_vsrlri_" - [(set (match_operand:ILSX 0 "register_operand" "=f") - (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") - (match_operand 2 "const__operand" "")] -UNSPEC_LSX_VSRLRI))] - "ISA_HAS_LSX" - "vsrlri.\t%w0,%w1,%2" - [(set_attr "type" "simd_shift") - (set_attr "mode" "")]) - (define_insn "lsx_vssub_s_" [(set (match_operand:ILSX 0 "register_operand" "=f") (ss_minus:ILSX (match_operand:ILSX 1 "register_operand" "f") diff --git a/gcc/config/loongarch/simd.md 
b/gcc/config/loongarch/simd.md index 45d2bcaec2e..5e7bd49eaa2 100644 --- a/gcc/config/loongarch/simd.md +++ b/gcc/config/loongarch/simd.md @@ -932,6 +932,35 @@ (define_expand "_maddw_q_du_d_punned" DONE; }) +;; Integer shift right with rounding. +(define_insn "simd__imm_round_" + [(set (match_operand:IVEC 0 "register_operand" "=f") + (any_shiftrt:IVEC + (plus:IVEC + (match_operand:IVEC 1 "re
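
As a side note on the expression quoted in the commit message, here is a minimal scalar sketch in C of a logical shift right with rounding, written exactly as (t + (1ul << imm >> 1)) >> imm. The function name round_shift_right_u64 is made up for illustration, and the sketch makes no claim about how the vector instructions treat a carry out of the element; it only shows the arithmetic the new pattern matches.

```c
#include <assert.h>
#include <stdint.h>

/* Shift T right by IMM bits, rounding to nearest (ties round up).
   The bias (1 << IMM >> 1) is half of the discarded range, and is 0
   when IMM is 0.  The addition wraps modulo 2^64, as unsigned
   arithmetic does in C.  IMM must be below 64.  */
static uint64_t
round_shift_right_u64 (uint64_t t, unsigned int imm)
{
  assert (imm < 64);
  return (t + ((uint64_t) 1 << imm >> 1)) >> imm;
}
```

For example, shifting 5 right by 1 gives 3 instead of the truncated 2, while shifting 5 right by 2 gives 1 because 5/4 is closer to 1 than to 2.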
[PATCH v2] aarch64: Recognize vector permute patterns suitable for FMOV [PR100165]
This patch optimizes certain vector permute expansion with the FMOV
instruction when one of the input vectors is a vector of all zeros and the
result of the vector permute is as if the upper lane of the non-zero input
vector is set to zero and the lower lane remains unchanged.

Note that the patch also propagates zero_op0_p and zero_op1_p during re-encode
now. They will be used by aarch64_evpc_fmov to check if the input vectors are
valid candidates.

    PR target/100165

gcc/ChangeLog:

    * config/aarch64/aarch64-protos.h (aarch64_lane0_mask_p): New.
    * config/aarch64/aarch64-simd.md (@aarch64_simd_vec_set_zero_fmov):
    New define_insn.
    * config/aarch64/aarch64.cc (aarch64_lane0_mask_p): New.
    (aarch64_evpc_reencode): Copy zero_op0_p and zero_op1_p.
    (aarch64_evpc_fmov): New.
    (aarch64_expand_vec_perm_const_1): Add call to aarch64_evpc_fmov.
    * config/aarch64/iterators.md (VALL_F16_NO_QI): New mode iterator.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/vec-set-zero.c: Update test accordingly.
    * gcc.target/aarch64/fmov-1.c: New test.
    * gcc.target/aarch64/fmov-2.c: New test.
    * gcc.target/aarch64/fmov-3.c: New test.
    * gcc.target/aarch64/fmov-be-1.c: New test.
    * gcc.target/aarch64/fmov-be-2.c: New test.
    * gcc.target/aarch64/fmov-be-3.c: New test.

Signed-off-by: Pengxuan Zheng --- gcc/config/aarch64/aarch64-protos.h | 2 +- gcc/config/aarch64/aarch64-simd.md| 13 ++ gcc/config/aarch64/aarch64.cc | 96 ++- gcc/config/aarch64/iterators.md | 9 + gcc/testsuite/gcc.target/aarch64/fmov-1.c | 158 ++ gcc/testsuite/gcc.target/aarch64/fmov-2.c | 52 ++ gcc/testsuite/gcc.target/aarch64/fmov-3.c | 144 gcc/testsuite/gcc.target/aarch64/fmov-be-1.c | 144 gcc/testsuite/gcc.target/aarch64/fmov-be-2.c | 52 ++ gcc/testsuite/gcc.target/aarch64/fmov-be-3.c | 144 .../gcc.target/aarch64/vec-set-zero.c | 6 +- 11 files changed, 816 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-3.c create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-be-1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-be-2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-be-3.c diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 4235f4a0ca5..cba94914903 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -1051,7 +1051,7 @@ void aarch64_subvti_scratch_regs (rtx, rtx, rtx *, rtx *, rtx *, rtx *); void aarch64_expand_subvti (rtx, rtx, rtx, rtx, rtx, rtx, rtx, bool); - +bool aarch64_lane0_mask_p (unsigned int, rtx); /* Initialize builtins for SIMD intrinsics. 
*/ void init_aarch64_simd_builtins (void); diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index e2afe87e513..6ddc27c223e 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -1190,6 +1190,19 @@ (define_insn "@aarch64_simd_vec_set" [(set_attr "type" "neon_ins, neon_from_gp, neon_load1_one_lane")] ) +(define_insn "@aarch64_simd_vec_set_zero_fmov" + [(set (match_operand:VALL_F16_NO_QI 0 "register_operand" "=w") + (vec_merge:VALL_F16_NO_QI + (match_operand:VALL_F16_NO_QI 1 "register_operand" "w") + (match_operand:VALL_F16_NO_QI 2 "aarch64_simd_imm_zero" "Dz") + (match_operand:SI 3 "immediate_operand" "i")))] + "TARGET_SIMD && aarch64_lane0_mask_p (, operands[3])" + { +return "fmov\\t%0, %1"; + } + [(set_attr "type" "fmov")] +) + (define_insn "aarch64_simd_vec_set_zero" [(set (match_operand:VALL_F16 0 "register_operand" "=w") (vec_merge:VALL_F16 diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index f5f23f6ff4b..41e2e5d76d8 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -23682,6 +23682,15 @@ aarch64_strided_registers_p (rtx *operands, unsigned int num_operands, return true; } +/* Return TRUE if OP is a valid vec_merge bit mask for lane 0. */ + +bool +aarch64_lane0_mask_p (unsigned int nelts, rtx op) +{ + return exact_log2 (INTVAL (op)) >= 0 +&& (ENDIAN_LANE_N (nelts, exact_log2 (INTVAL (op))) == 0); +} + /* Bounds-check lanes. Ensure OPERAND lies between LOW (inclusive) and HIGH (exclusive). */ void @@ -26058,6 +26067,8 @@ aarch64_evpc_reencode (struct expand_vec_perm_d *d) newd.target = d->target ? gen_lowpart (new_mode, d->target) : NULL; newd.op0 = d->op0 ? gen_lowpart (new_mode, d->op0) : NULL; newd.op1 = d->op1 ? gen_lowpart (new_mode, d->op1) : NULL; +
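
To make the aarch64_lane0_mask_p check above easier to follow, here is a rough standalone sketch in C. It assumes that ENDIAN_LANE_N maps lane N to NELTS - 1 - N on big-endian targets and leaves it unchanged on little-endian ones; the names endian_lane and lane0_mask_p are illustrative, endianness is passed as a parameter rather than taken from the target, and the range check on the bit position is an addition of this sketch, not something the patch does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed behaviour of ENDIAN_LANE_N: mirror the lane number on
   big-endian, keep it as-is on little-endian.  */
static unsigned int
endian_lane (bool big_endian, unsigned int nelts, unsigned int n)
{
  return big_endian ? nelts - 1 - n : n;
}

/* Return true if MASK has exactly one bit set and that bit corresponds
   to architectural lane 0 of a vector with NELTS elements.  */
static bool
lane0_mask_p (bool big_endian, unsigned int nelts, uint64_t mask)
{
  /* Exactly one bit set, i.e. exact_log2 would be >= 0.  */
  if (mask == 0 || (mask & (mask - 1)) != 0)
    return false;

  unsigned int bit = 0;
  while ((mask >>= 1) != 0)
    bit++;

  /* Guard added in this sketch so the subtraction cannot wrap.  */
  if (bit >= nelts)
    return false;

  return endian_lane (big_endian, nelts, bit) == 0;
}
```

For a two-element vector this accepts mask 1 on little-endian and mask 2 on big-endian, which is why the patch adds separate fmov-be-*.c tests.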
[PATCH v2] aarch64: Ignore target pragmas while defining intrinsics
Compared to v1, I've added a new function aarch64_get_required_features to
avoid having to pass a long list of explicit features. I also changed
aarch64_target_switcher to only disable TARGET_GENERAL_REGS_ONLY if the
requested flags include FP, to address Richard's comment.

Bootstrapped and regression tested on aarch64. Is this ok for master?

---

When initialising intrinsics with `#pragma GCC aarch64 "arm_*.h"`, we often
set an explicit target, but currently leave current_target_pragma unchanged.
This results in the target pragma being applied to each simulated intrinsic
on top of our explicit target, which is clearly undesirable. As far as I can
tell this doesn't cause any bugs at the moment, because none of the behaviour
for builtin functions depends upon the function-specific target. However, the
unintended target feature combinations led to unwanted behaviour in an
under-development patch.

This patch fixes the issue by extending aarch64_simd_switcher to explicitly
unset the current_target_pragma. It also simplifies constructor arguments by
automatically including any feature dependencies, which results in FCMA and
BF16 being added to the sets of features used when handling arm_sve.h and
arm_sme.h pragmas.

gcc/ChangeLog:

    * common/config/aarch64/aarch64-common.cc
    (struct aarch64_extension_info): Add field.
    (aarch64_get_required_features): New.
    * config/aarch64/aarch64-builtins.cc
    (aarch64_simd_switcher::aarch64_simd_switcher): Rename to...
    (aarch64_target_switcher::aarch64_target_switcher): ...this, remove
    default simd flags and save current_target_pragma.
    (aarch64_simd_switcher::~aarch64_simd_switcher): Rename to...
    (aarch64_target_switcher::~aarch64_target_switcher): ...this, and
    restore current_target_pragma.
    (handle_arm_acle_h): Use aarch64_target_switcher.
    (handle_arm_neon_h): Rename switcher and pass explicit flags.
    (aarch64_general_init_builtins): Ditto.
    * config/aarch64/aarch64-protos.h (class aarch64_simd_switcher):
    Rename to...
(class aarch64_target_switcher): ...this, and add pragma member. (aarch64_get_required_features): New prototype. * config/aarch64/aarch64-sve-builtins.cc (sve_switcher::sve_switcher): Rename to... (sve_target_switcher::sve_target_switcher): ...this. (sve_switcher::~sve_switcher): Rename to... (sve_target_switcher::~sve_target_switcher): ...this. (init_builtins): Rename switcher. (handle_arm_sve_h): Ditto. (handle_arm_neon_sve_bridge_h): Ditto. (handle_arm_sme_h): Ditto. * config/aarch64/aarch64-sve-builtins.h (class sve_switcher): Rename to... (class sve_target_switcher): ...this. (class sme_switcher): Rename to... (class sme_target_switcher): ...this. diff --git a/gcc/common/config/aarch64/aarch64-common.cc b/gcc/common/config/aarch64/aarch64-common.cc index ef4458fb69308d2bb6785e97be5be85226cf0ebb..500bf784983d851c54ea4ec59cf3cad29e5e309e 100644 --- a/gcc/common/config/aarch64/aarch64-common.cc +++ b/gcc/common/config/aarch64/aarch64-common.cc @@ -157,6 +157,8 @@ struct aarch64_extension_info aarch64_feature_flags flags_on; /* If this feature is turned off, these bits also need to be turned off. */ aarch64_feature_flags flags_off; + /* If this feature remains enabled, these bits must also remain enabled. */ + aarch64_feature_flags flags_required; }; /* ISA extensions in AArch64. 
*/ @@ -164,9 +166,10 @@ static constexpr aarch64_extension_info all_extensions[] = { #define AARCH64_OPT_EXTENSION(NAME, IDENT, C, D, E, FEATURE_STRING) \ {NAME, AARCH64_FL_##IDENT, feature_deps::IDENT ().explicit_on, \ - feature_deps::get_flags_off (feature_deps::root_off_##IDENT)}, + feature_deps::get_flags_off (feature_deps::root_off_##IDENT), \ + feature_deps::IDENT ().enable}, #include "config/aarch64/aarch64-option-extensions.def" - {NULL, 0, 0, 0} + {NULL, 0, 0, 0, 0} }; struct aarch64_arch_info @@ -204,6 +207,18 @@ static constexpr aarch64_processor_info all_cores[] = {NULL, aarch64_no_cpu, aarch64_no_arch, 0} }; +/* Return the set of feature flags that are required to be enabled when the + features in FLAGS are enabled. */ + +aarch64_feature_flags +aarch64_get_required_features (aarch64_feature_flags flags) +{ + const struct aarch64_extension_info *opt; + for (opt = all_extensions; opt->name != NULL; opt++) +if (flags & opt->flag_canonical) + flags |= opt->flags_required; + return flags; +} /* Print a list of CANDIDATES for an argument, and try to suggest a specific close match. */ diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc index 128cc365d3d585e01cb69668f285318ee56a36fc..5174fb1daefee2d73a5098e0de1cca73dc103416 100644 --- a/gcc/config/aarch64/aarch64-builtins.cc +++ b/gcc/config/aarch64/aarch64-builtins.cc @@
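
The loop in aarch64_get_required_features above can be tried out in isolation. The following is a small self-contained sketch in C under stated assumptions: the three-entry table and the FL_* bit values are hypothetical, not the real AArch64 extension list, and each flags_required entry is assumed to already contain the full transitive set of dependencies, which is what makes a single pass over the table sufficient.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t feature_flags;

/* Hypothetical feature bits, for illustration only.  */
enum { FL_FP = 1 << 0, FL_SIMD = 1 << 1, FL_SVE = 1 << 2 };

struct extension_info
{
  const char *name;
  feature_flags flag_canonical;	/* The bit for this feature itself.  */
  feature_flags flags_required;	/* Bits that must stay enabled with it.  */
};

/* Hypothetical dependency table: SVE needs SIMD, SIMD needs FP.  */
static const struct extension_info all_extensions[] = {
  {"fp", FL_FP, FL_FP},
  {"simd", FL_SIMD, FL_SIMD | FL_FP},
  {"sve", FL_SVE, FL_SVE | FL_SIMD | FL_FP},
  {NULL, 0, 0}
};

/* Return FLAGS plus everything the features in FLAGS require.  */
static feature_flags
get_required_features (feature_flags flags)
{
  const struct extension_info *opt;
  for (opt = all_extensions; opt->name != NULL; opt++)
    if (flags & opt->flag_canonical)
      flags |= opt->flags_required;
  return flags;
}
```

Asking for FL_SVE alone returns all three bits here, which mirrors how the patch lets callers name only the headline feature and have its dependencies filled in.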
[COMMITTED PATCH] Fix description of file-cache-lines/file-cache-files params
From: Andi Kleen

The file-cache-lines / file-cache-files tunables were documented in the wrong
section. Fix that.

Reported-by: Filip Kastl

Committed as obvious.

gcc/ChangeLog:

    * doc/invoke.texi:
---
 gcc/doc/invoke.texi | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ca8e468f3f2d..d0e0ca80b0c2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13010,16 +13010,6 @@ having large chains of nested wrapper functions.

 Enabled by default.

-@item -ffile-cache-files=
-Max number of files in the file cache.
-The file cache is used to print source lines in diagnostics and do some
-source checks like @option{-Wmisleading-indentation}.
-
-@item -ffile-cache-files=
-Max number of lines to index into file cache. When 0 this is automatically sized.
-The file cache is used to print source lines in diagnostics and do some
-source checks like @option{-Wmisleading-indentation}.
-
 @opindex fipa-sra
 @item -fipa-sra
 Perform interprocedural scalar replacement of aggregates, removal of
@@ -15792,6 +15782,16 @@ considered for if-conversion. The compiler will
 also use other heuristics to decide whether if-conversion is likely to be
 profitable.

+@item file-cache-files
+Max number of files in the file cache.
+The file cache is used to print source lines in diagnostics and do some
+source checks like @option{-Wmisleading-indentation}.
+
+@item file-cache-files
+Max number of lines to index into file cache. When 0 this is automatically sized.
+The file cache is used to print source lines in diagnostics and do some
+source checks like @option{-Wmisleading-indentation}.
+
 @item max-rtl-if-conversion-predictable-cost
 RTL if-conversion will try to remove conditional branches around a block and
 replace them with conditionally executed instructions. These parameters
--
2.48.1
[PATCH] COBOL v3: 3/14 80K bld: config and build machinery
>From f89a50238de62b73d9fc44ee7226461650ab119d Tue 18 Feb 2025 04:19:10 PM EST From: "James K. Lowden" Date: Tue 18 Feb 2025 04:19:10 PM EST Subject: [PATCH] COBOL 3/14 80K bld: config and build machinery ChangeLog * Makefile.def: Add libgcobol module and cobol language. * Makefile.in: Add libgcobol module and cobol language. * configure.ac: Add libgcobol module and cobol language. gcc/ChangeLog * common.opt: New file. * dwarf2out.cc: Add cobol language. gcc/cobol/ChangeLog * LICENSE: New file. * Make-lang.in: New file. * config-lang.in: New file. * lang.opt: New file. * lang.opt.urls: New file. libgcobol/ChangeLog * /Makefile.in: New file. * /acinclude.m4: New file. * /aclocal.m4: New file. * /configure.ac: New file. * /configure.tgt: New file. maintainer-scripts/ChangeLog * maintainer-scripts/update_web_docs_git: Add libgcobol module and cobol language. --- Makefile.def | +- Makefile.in | +++- configure.ac | - gcc/cobol/LICENSE | +- gcc/cobol/Make-lang.in | ++- gcc/cobol/config-lang.in | ++- gcc/cobol/lang.opt | - gcc/cobol/lang.opt.urls | +- gcc/common.opt | - gcc/dwarf2out.cc | +- libgcobol/Makefile.in | - libgcobol/acinclude.m4 | ++- libgcobol/aclocal.m4 | +- libgcobol/configure.ac | +++- libgcobol/configure.tgt | +++- maintainer-scripts/update_web_docs_git | + 16 files changed, 2005 insertions(+), 20 deletions(-) diff --git a/Makefile.def b/Makefile.def index 19954e7d731..d2a1cd55b6e 100644 --- a/Makefile.def +++ b/Makefile.def @@ -209,6 +209,7 @@ target_modules = { module= libgomp; bootstrap= true; lib_path=.libs; }; target_modules = { module= libitm; lib_path=.libs; }; target_modules = { module= libatomic; bootstrap=true; lib_path=.libs; }; target_modules = { module= libgrust; }; +target_modules = { module= libgcobol; }; // These are (some of) the make targets to be done in each subdirectory. // Not all; these are the ones which don't have special options. 
@@ -655,6 +656,7 @@ lang_env_dependencies = { module=libgcc; no_gcc=true; no_c=true; }; // built newlib on some targets (e.g. Cygwin). It still needs // a dependency on libgcc for native targets to configure. lang_env_dependencies = { module=libiberty; no_c=true; }; +lang_env_dependencies = { module=libgcobol; cxx=true; }; dependencies = { module=configure-target-fastjar; on=configure-target-zlib; }; dependencies = { module=all-target-fastjar; on=all-target-zlib; }; @@ -690,6 +692,7 @@ dependencies = { module=install-target-libvtv; on=install-target-libgcc; }; dependencies = { module=install-target-libitm; on=install-target-libgcc; }; dependencies = { module=install-target-libobjc; on=install-target-libgcc; }; dependencies = { module=install-target-libstdc++-v3; on=install-target-libgcc; }; +dependencies = { module=install-target-libgcobol; on=install-target-libstdc++-v3; }; // Target modules in the 'src' repository. lang_env_dep
[PATCH] COBOL v3: 2/14 8K pre: introduce ChangeLog files
>From f89a50238de62b73d9fc44ee7226461650ab119d Tue 18 Feb 2025 04:19:10 PM EST From: "James K. Lowden" Date: Tue 18 Feb 2025 04:19:10 PM EST Subject: [PATCH] COBOL 2/14 8.0K pre: introduce ChangeLog files gcc/cobol/ChangeLog * ChangeLog: New file. libgcobol/ChangeLog * /ChangeLog: New file. --- gcc/cobol/ChangeLog | +++- libgcobol/ChangeLog | +++ 2 files changed, 166 insertions(+), 2 deletions(-) diff --git a/gcc/cobol/ChangeLog b/gcc/cobol/ChangeLog new file mode 100644 index 000..620265df68e --- /dev/null +++ b/gcc/cobol/ChangeLog @@ -0,0 +1,147 @@ +2025-02-17 Robert Dubner + * Moved #include from genapi.cc to cobol-system.h as + #include + * Removed GCOBOL_FOR_TARGET from /Makefile.def + * Removed if $USER = "bob" stuff from cobol/Make-lang.in + * Backed -std=c++17 down to c++14 in cobol/Make-lang.in + * Removed the single c++17 dependency from show_parse.h ANALYZER + * Removed -Wno-cpp from cobol/Make-lang.in + * Removed Wno-missing-field-initializers from cobol/Make-lang.in + * Added some informative comments to placeholder functions in cobol1.cc + * Removed a call to build_tree_list() in cobol1.cc + * Use default for LANG_HOOKS_TYPE_FOR_SIZE in cobol1.cc + * Commented out, but saved, unused code in convert.cc + * Eliminated numerous "-Wmissing-field-initializers" warnings + +2025-02-16 Robert Dubner + * Added GTY(()) tags to gengen.h and structs.h. Put includes for them into + cobol1.cc + * Removed some fixed-length text buffers for handling mangled names + +2025-02-11 Robert Dubner + * libgcobol quietly is not built for -m32 systems in a multi-lib build + * configure.ac allows COBOL only for x86_64 and aarch64 architectures. + Other systems get a warning and the COBOL language is suppressed. + +2025-02-07 Robert Dubner + * Modified configure.ac and Makefile.in to notice that MULTISUBDIR=/32 to + suppress 32-builds. 
+ * Eliminate -Wunused-result warning in libgcobol.cc compilation + +2025-01-28 Robert Dubner + * Remove TRACE1 statements from parser_enter_file and parser_leave_file; + they are incompatible with COPY statements in the DATA DIVISION. + +2025-01-24 Robert Dubner + * Eliminated missing main() error message; we now rely on linker error + * Cleaned up valconv-dupe and charmaps-dupe processing in Make-lang.in + +2025-01-21 Robert Dubner + * Eliminated all "local" #includes from .h files; they are instead included, + in order, in the .cc files. + +2025-01-16 Robert Dubner + * Code 88 named-conditional comparisons for floating-point + +2025-01-06 Robert Dubner + * Updated warning in tests/check_88 and etests/check_88 + * Updated some UAT error messages. + +2025-01-03 Robert Dubner + * Eliminate old "#if 0" code + * Modify line directives to skip over paragraph/section labels: + * Unwrapped asprintf calls in assert(), because it was a stupid error. + +2025-01-01 Robert Dubner + * Eliminate proc->target_of_call variable; it was unused. + * Wrap asprintf calls in assert() to suppress compiler warnings. + +2024-12-27 Robert Dubner + * Use built_in version of realloc and free + * Use built_in version of strdup, memchr, and memset + * Use built_in version of abort + * Use built_in version of exit + * Use built_in version of strncmp + * Use built_in version of strcmp + * Use built_in version of strcpy + +2024-12-27 Robert Dubner + * Put called_by_main_counter in static memory, not the stack! 
+ +2024-12-26 Robert Dubner + * Use built_in version of memcpy + * Use built_in version of malloc; required initialization + during lang_hook_init + +2024-12-25 Robert Dubner + * Normalize #includes in util.cc + * Normalize #includes in symfind.cc + * Normalize #includes in cdf-copy.cc and copybook.h + * Normalize #includes in lexio.cc + * Normalize #includes in cdf.y + * Normalize #includes in scan.l + required the creation of fisspace and fisdigit in util.cc + * Normalize #includes in parse.y + required the creation of ftolower in util.cc. Jim uses things like + std::transform, which can't take TOLOWER because it is a macro. So I + wrapped those necessary macros into functions. + * Normalize #includes in symbols.h.cc + +2024-12-23 Robert Dubner + + * Created ChangeLog + * Eliminate vestigial ".global" code + * Create "cobol-system.h" file. + trimmed .h files in cobol1.cc + trimmed .h files in convert.cc + trimmed .h files in except.cc + trimmed .h files in gcobolspec.cc + trimmed .h files in
The COBOL front end, version 3, now in 14 easy pieces
The following 14 patches constitute 105,720 lines of code in 83 files to build
and document the COBOL front end. The messages are in a more or less logical
order. We have:

 1/14   4K dir: create gcc/cobol and libgcobol directories
 2/14   8K pre: introduce ChangeLog files
 3/14  80K bld: config and build machinery
 4/14 376K hdr: header files
 5/14 152K lex: lexer
 6/14 476K par: parser
 7/14 344K cbl: parser support
 8/14 516K api: GENERIC interface
 9/14 244K gen: GENERIC interface support
10/14  72K doc: man pages and GnuCOBOL emulation
11/14  84K lhd: libgcobol header files
12/14 320K lib: libgcobol support
13/14 372K lcc: libgcobol, main file
14/14 148K fun: libgcobol, intrinsic functions

To slide under the 400 KB limit, the intrinsic functions now have their own
patch. The configure files are removed, as is the Posix adapter framework.

They are still against the master branch as of

  commit 3e08a4ecea27c54fda90e8f58641b1986ad957e1
  Date: Wed Feb 5 14:22:33 2025 -0700

Our repository is

  https://gitlab.cobolworx.com/COBOLworx/gcc-cobol/

using branch

  cobol-stage

I tested these patches using "git apply" to an unpublished branch
"cobol-patched".

We have endeavored to address all must-fix issues raised in Round 2.

 1. Generated files use Autoconf 2.69
 2. Commit message matches mail Subject: line
 3. Various problems with Make-lang.in and cobol1.cc
 4. s/assert(false)/gcc_unreachable()/g
 5. Nixed range-based cases
 6. Removed Posix adapter files & generated configure scripts
 7. Explained memory-management engineering choice
 8. s/option_id/option_zero/g, for clarity
 9. GTY issues
10. Require only C++14 (not 17)
11. Moved #include
12. Check regex buffer bounds outside gcc_assert

Still to do (no particular order):

13. Try SARIF options
14. Do not compose messages (I18N).
15. Try valgrind for memory report
16. Review https://github.com/cooljeanius/legislation/blob/master/tech/21-R-mrg.htm.diff
17. Enumerated warnings in cobol/lang.opt.
18. texinfo update to describe gcobol
19. cross-compilation

There are a few places where gcc_unreachable() is now followed by truly
unreachable code. We will lop off those bits soon.

This patchset still excludes tests. I will supply tests separately. Simplest
I think is to use the NIST test suite, assuming the code and documentation
pass legal muster.

I have also prepared release notes for the www repository under separate
cover.

We remain hopeful the COBOL front end will be accepted into gcc-15.

Thank you for your kind consideration of our work.

--jkl
[PATCH] i386: Implement Thread Local Storage on Windows
Hi all,

This is a reimplementation of Windows Thread Local Storage, rewritten to
support native thread-local access on Windows, which had previously been using
emulated thread local storage mechanisms.

Note that due to issues on my end, I was unable to regenerate configure no
matter what I tried. I do not have write access to gcc, and will need help
with committing this once the green light is given (although approval was
already given by MINGW maintainers in the relevant bug at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80881 and in private
communication).

Best regards,
Julian

gcc/ChangeLog:

    * config/i386/i386.cc (ix86_legitimate_constant_p): Handle new UNSPEC.
    (legitimate_pic_operand_p): Handle new UNSPEC.
    (legitimate_pic_address_disp_p): Handle new UNSPEC.
    (ix86_legitimate_address_p): Handle new UNSPEC.
    (ix86_tls_index_symbol): New symbol.
    (ix86_tls_index): Handle creation of _tls_index symbol.
    (legitimize_tls_address): Create thread local access sequence.
    (output_pic_addr_const): Handle new UNSPEC.
    (i386_output_dwarf_dtprel): Handle new UNSPEC.
    (i386_asm_output_addr_const_extra): Handle new UNSPEC.
    * config/i386/i386.h (TARGET_WIN32_TLS): Define.
    * config/i386/i386.md: New UNSPEC.
    * config/i386/predicates.md: Handle new UNSPEC.
    * config/mingw/mingw32.h (TARGET_WIN32_TLS): Define.
    (TARGET_ASM_SELECT_SECTION): Define.
    (DEFAULT_TLS_SEG_REG): Define.
    * config/mingw/winnt.cc (mingw_pe_select_section): Handle TLS section.
    (mingw_pe_unique_section): Select TLS section.
    * config/mingw/winnt.h (mingw_pe_select_section): Declare.
    * configure.ac: New check for broken linker thread local support.

From 05d4491d862a16426f2a0986e7f3598714615f93 Mon Sep 17 00:00:00 2001
From: Julian Waters
Date: Tue, 15 Oct 2024 20:56:22 +0800
Subject: [PATCH] Implement Windows TLS

Signed-off-by: Julian Waters
---
 gcc/config/i386/i386.cc | 61 ++-
 gcc/config/i386/i386.h | 1 +
 gcc/config/i386/i386.md | 1 +
 gcc/config/i386/predicates.md | 1 +
 gcc/config/mingw/mingw32.h | 9 ++
 gcc/config/mingw/winnt.cc | 14
gcc/config/mingw/winnt.h | 1 + gcc/configure.ac | 29 + 8 files changed, 116 insertions(+), 1 deletion(-) diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 473e4cbf10e..304189bd947 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -11170,6 +11170,9 @@ ix86_legitimate_constant_p (machine_mode mode, rtx x) x = XVECEXP (x, 0, 0); return (GET_CODE (x) == SYMBOL_REF && SYMBOL_REF_TLS_MODEL (x) == TLS_MODEL_LOCAL_DYNAMIC); + case UNSPEC_SECREL32: + x = XVECEXP (x, 0, 0); + return GET_CODE (x) == SYMBOL_REF; default: return false; } @@ -11306,6 +11309,9 @@ legitimate_pic_operand_p (rtx x) x = XVECEXP (inner, 0, 0); return (GET_CODE (x) == SYMBOL_REF && SYMBOL_REF_TLS_MODEL (x) == TLS_MODEL_LOCAL_EXEC); + case UNSPEC_SECREL32: + x = XVECEXP (inner, 0, 0); + return GET_CODE (x) == SYMBOL_REF; case UNSPEC_MACHOPIC_OFFSET: return legitimate_pic_address_disp_p (x); default: @@ -11486,6 +11492,9 @@ legitimate_pic_address_disp_p (rtx disp) disp = XVECEXP (disp, 0, 0); return (GET_CODE (disp) == SYMBOL_REF && SYMBOL_REF_TLS_MODEL (disp) == TLS_MODEL_LOCAL_DYNAMIC); +case UNSPEC_SECREL32: + disp = XVECEXP (disp, 0, 0); + return GET_CODE (disp) == SYMBOL_REF; } return false; @@ -11763,6 +11772,7 @@ ix86_legitimate_address_p (machine_mode, rtx addr, bool strict, case UNSPEC_INDNTPOFF: case UNSPEC_NTPOFF: case UNSPEC_DTPOFF: + case UNSPEC_SECREL32: break; default: @@ -11788,7 +11798,8 @@ ix86_legitimate_address_p (machine_mode, rtx addr, bool strict, || GET_CODE (XEXP (XEXP (disp, 0), 0)) != UNSPEC || !CONST_INT_P (XEXP (XEXP (disp, 0), 1)) || (XINT (XEXP (XEXP (disp, 0), 0), 1) != UNSPEC_DTPOFF - && XINT (XEXP (XEXP (disp, 0), 0), 1) != UNSPEC_NTPOFF)) + && XINT (XEXP (XEXP (disp, 0), 0), 1) != UNSPEC_NTPOFF + && XINT (XEXP (XEXP (disp, 0), 0), 1) != UNSPEC_SECREL32)) /* Non-constant pic memory reference. 
*/ return false; } @@ -12112,6 +12123,22 @@ get_thread_pointer (machine_mode tp_mode, bool to_reg) return tp; } +/* Construct the SYMBOL_REF for the _tls_index symbol. */ + +static GTY(()) rtx ix86_tls_index_symbol; + +static rtx +ix86_tls_index (void) +{ + if (!ix86_tls_index_symbol) +ix86_tls_index_symbol = gen_rtx_SYMBOL_REF (SImode, "_tls_index"); + + if
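
Editor's note: for readers unfamiliar with native Windows TLS, the access sequence that the ChangeLog entry for legitimize_tls_address refers to can be modeled roughly as below. This is only a sketch in portable C++, under the assumption that the address is formed as "per-thread block for this module, selected by _tls_index, plus the section-relative offset of the variable". The names FakeThreadBlock and tls_address are hypothetical and appear nowhere in the patch; the real per-thread array lives in the thread's TEB and the offset is what UNSPEC_SECREL32 stands for.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-thread data that Windows keeps for
// each thread.  Not a real Windows structure.
struct FakeThreadBlock
{
  // Models the per-thread array of TLS block pointers, one slot per module.
  std::vector<unsigned char *> tls_slots;
};

// Models the native access sequence:
//   base = thread->tls_slots[_tls_index];
//   addr = base + <section-relative offset of the variable>;
inline unsigned char *
tls_address (const FakeThreadBlock &thread, std::uint32_t tls_index,
             std::uint32_t secrel_offset)
{
  assert (tls_index < thread.tls_slots.size ());
  return thread.tls_slots[tls_index] + secrel_offset;
}
```

The same (index, offset) pair resolves to different storage for different threads, which is the property the emulated TLS scheme previously had to provide through a runtime call.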
Re: [PATCH] COBOL v3: 8/14 516K api: GENERIC interface
On Tue, Feb 18, 2025 at 10:52 PM James K. Lowden wrote:
>
> From f89a50238de62b73d9fc44ee7226461650ab119d Tue 18 Feb 2025 04:19:11 PM EST
> From: "James K. Lowden"
> Date: Tue 18 Feb 2025 04:19:11 PM EST
> Subject: [PATCH] COBOL 8/14 516K api: GENERIC interface

A few comments about this:

> +static
> +void
> +treeplet_fill_source(TREEPLET &treeplet, cbl_refer_t &refer)
> +  {
> +  treeplet.pfield = gg_get_address_of(refer.field->var_decl_node);
> +  treeplet.offset = refer_offset_source(refer);
> +  treeplet.length = refer_size_source(refer);
> +  }

This function (and many others) is missing a comment at the front describing what it does with each argument.

> +_Float128 src = (_Float128)sourceref.field->data.value;

Is this in the front-end or in the target library? Either way, I see it is used unconditionally. For the front-end, you should use the real.h interface for floats. For the target, you need to use it only conditionally; otherwise it won't work on targets which don't have _Float128.

I noticed __int128 use in this file too. The same thing applies here, except that for the front-end you should use the wide-int.h interface, and only define it conditionally for target code. Also, you can't use a 128-bit integer as a tree type either unless you check that the target supports it. There is at least one 64-bit GCC target which does NOT support 128-bit integers (HPPA64).

I see strfromf128 is used here, but that was only added to glibc in 2017, and GCC still supports older glibc versions that don't have full _Float128 support. See above about using real.h.

Thanks,
Andrew Pinski
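
Editor's note: as a rough illustration of the "only conditionally for target code" point, the sketch below guards a use of __int128 with the __SIZEOF_INT128__ macro, which GCC predefines only when the type is available, and falls back to 64-bit arithmetic otherwise. The helper name mul_high_u64 is hypothetical and is not taken from the COBOL patches; this shows the guarding pattern only, and says nothing about how libgcobol should be restructured. It does not apply to front-end code, where the review asks for wide-int.h instead.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: high 64 bits of a 64x64 -> 128-bit product.
inline std::uint64_t
mul_high_u64 (std::uint64_t a, std::uint64_t b)
{
#if defined(__SIZEOF_INT128__)
  // Predefined by the compiler only when __int128 is supported.
  __extension__ typedef unsigned __int128 u128;
  return static_cast<std::uint64_t> ((static_cast<u128> (a) * b) >> 64);
#else
  // Portable fallback built from 32-bit halves.
  std::uint64_t a_lo = a & 0xffffffffu, a_hi = a >> 32;
  std::uint64_t b_lo = b & 0xffffffffu, b_hi = b >> 32;
  std::uint64_t lo_lo = a_lo * b_lo;
  std::uint64_t hi_lo = a_hi * b_lo;
  std::uint64_t lo_hi = a_lo * b_hi;
  std::uint64_t hi_hi = a_hi * b_hi;
  // Sum of the middle terms plus the carry out of the low word.
  std::uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xffffffffu) + lo_hi;
  return hi_hi + (hi_lo >> 32) + (cross >> 32);
#endif
}

// Small self-check; both code paths must agree on these values.
inline void
mul_high_self_check ()
{
  assert (mul_high_u64 (2, 3) == 0);
  assert (mul_high_u64 (std::uint64_t (1) << 32, std::uint64_t (1) << 32) == 1);
}
```

Either branch compiles on its own, so a target without 128-bit integers still builds.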
Re: Ping: [PATCH] testsuite: Fix up toplevel-asm-1.c for LoongArch
On 2025/2/19 at 3:27 PM, Xi Ruoyao wrote:
> On Wed, 2025-02-05 at 08:57 +0800, Xi Ruoyao wrote:
> > Like RISC-V, on LoongArch we don't really support %cN for SYMBOL_REFs
> > even with -fno-pic.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	* c-c++-common/toplevel-asm-1.c: Use %cc3 %cc4 instead of %c3
> > 	%c4 on LoongArch.
> > ---
> >
> > Ok for trunk?
>
> Ping.

LGTM! Thanks.

> >  gcc/testsuite/c-c++-common/toplevel-asm-1.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/testsuite/c-c++-common/toplevel-asm-1.c b/gcc/testsuite/c-c++-common/toplevel-asm-1.c
> > index d6766b00e72..e1687d28e0b 100644
> > --- a/gcc/testsuite/c-c++-common/toplevel-asm-1.c
> > +++ b/gcc/testsuite/c-c++-common/toplevel-asm-1.c
> > @@ -9,7 +9,7 @@ int v[42];
> >  void foo (void) {}
> >
> >  /* Not all targets can use %cN even in non-pic code.  */
> > -#if defined(__riscv)
> > +#if defined(__riscv) || defined(__loongarch__)
> >  asm ("# %0 %1 %2 %cc3 %cc4 %5 %% %="
> >  #else
> >  asm ("# %0 %1 %2 %c3 %c4 %5 %% %="
[PATCH] c++: Enhance -Wuninitialized to check private base class [PR80681]
The issue described in PR80681 is that g++'s -Wuninitialized option does not warn when a privately inherited base class contains public const data or reference members and the derived class does not have a user-provided constructor. The same issue occurs when the privately inherited base class contains protected const data or reference members. In both cases, the derived class is unable to initialize these members in the base class.

Private const data or reference members in privately inherited base classes are inherently inaccessible to the derived class and cannot be initialized by it. Therefore, they are not considered as a condition for issuing a warning.

In my proposed patch, under the condition that the current class does not have a user-provided constructor and the -Wuninitialized option is enabled, I traverse all directly privately inherited base classes of the current class. For each base class, I check whether it contains any non-private const data or reference members. If such members are found, a warning is issued at the declaration location of the current class. Additionally, supplementary information is provided to indicate the declaration location of the non-private const data or reference members in the base class.

Successfully bootstrapped and regtested on x86_64-pc-linux-gnu: adds 21 PASS results to g++.sum.

	PR c++/80681

gcc/cp/ChangeLog:

	* class.cc (check_bases_and_members): Enhanced -Wuninitialized to
	warn for classes without user-provided constructors that privately
	inherit base classes with non-private const data or reference
	members.

gcc/testsuite/ChangeLog:

	* g++.dg/warn/Wuninitialized-pr80681-1.C: New test.
--- gcc/cp/class.cc | 61 +++ .../g++.dg/warn/Wuninitialized-pr80681-1.C| 19 ++ 2 files changed, 80 insertions(+) create mode 100644 gcc/testsuite/g++.dg/warn/Wuninitialized-pr80681-1.C diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc index d5ae69b0fdf..49c4ef08f33 100644 --- a/gcc/cp/class.cc +++ b/gcc/cp/class.cc @@ -6500,6 +6500,67 @@ check_bases_and_members (tree t) OPT_Wuninitialized, "non-static const member %q#D " "in class without a constructor", field); } + /* If the class privately inherited from a class with public + or protected non-static const or reference data members, + these members can never be initialized. */ + + tree binfo = TYPE_BINFO (t); + vec *accesses = BINFO_BASE_ACCESSES (binfo); + tree base_binfo; + unsigned i; + + for (i = 0; BINFO_BASE_ITERATE (binfo, i, base_binfo); i++) + { + tree basetype = TREE_TYPE (base_binfo); + + if ((*accesses)[i] == access_private_node) + { + tree base_field; + + for (base_field = TYPE_FIELDS (basetype); base_field; + base_field = DECL_CHAIN (base_field)) + { + tree field_type; + + if (TREE_CODE (base_field) != FIELD_DECL + || DECL_INITIAL (base_field) != NULL_TREE) + continue; + + field_type = TREE_TYPE (base_field); + + if (!TREE_PRIVATE (base_field)) + { + if (TYPE_REF_P (field_type)) + { + warning (OPT_Wuninitialized, + "private inheritance of base class " + "%q#T with non-private " + "non-static reference in class " + "without a constructor", + basetype); + inform (DECL_SOURCE_LOCATION (base_field), + "non-static reference %q#D here:", + base_field); + } + else if (CP_TYPE_CONST_P (field_type) + && (!CLASS_TYPE_P (field_type) + || !TYPE_HAS_DEFAULT_CONSTRUCTOR ( +field_type))) + { + warning (OPT_Wuninitialized, + "private inheritance of base class " + "%q#T with non-private " + "non-static const member in class " + "without a constructor", + basetype); + inform (DECL_SOURCE_LOCATION (base_field), + "non-static const member %q#D here:", + base_field); + } + } + } + } + } } /* Synthesize any needed methods. 
*/ diff --git a/gcc/testsuite/g++.dg/warn/
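
Editor's note: the following is a reduced example of the kind of code the description above is about. It is not the test case from the patch (which is cut off here), and the class names are made up for illustration. It only demonstrates the language-level fact behind the warning: the derived class has no way to supply a value for the base's const member, so it cannot be default-constructed.

```cpp
#include <cassert>
#include <type_traits>

struct Base
{
  const int limit;   // public const member without a default initializer
};

// No user-provided constructor, and Base is inherited privately.
class Derived : private Base
{
};

// Base itself can still be aggregate-initialized directly.
inline int
base_limit ()
{
  Base b{42};
  return b.limit;
}

// Derived cannot initialize Base::limit: its implicit default constructor
// is deleted, and the private base keeps it from being an aggregate.
static_assert (!std::is_default_constructible<Derived>::value,
               "Derived has no usable default constructor");
```

With the proposed change, a declaration like Derived would presumably be the place where the new "private inheritance of base class ... with non-private non-static const member in class without a constructor" warning is reported, with a note pointing at Base::limit.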
Ping: [PATCH] testsuite: Fix up toplevel-asm-1.c for LoongArch
On Wed, 2025-02-05 at 08:57 +0800, Xi Ruoyao wrote:
> Like RISC-V, on LoongArch we don't really support %cN for SYMBOL_REFs
> even with -fno-pic.
>
> gcc/testsuite/ChangeLog:
>
> 	* c-c++-common/toplevel-asm-1.c: Use %cc3 %cc4 instead of %c3
> 	%c4 on LoongArch.
> ---
>
> Ok for trunk?

Ping.

>  gcc/testsuite/c-c++-common/toplevel-asm-1.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/c-c++-common/toplevel-asm-1.c
> b/gcc/testsuite/c-c++-common/toplevel-asm-1.c
> index d6766b00e72..e1687d28e0b 100644
> --- a/gcc/testsuite/c-c++-common/toplevel-asm-1.c
> +++ b/gcc/testsuite/c-c++-common/toplevel-asm-1.c
> @@ -9,7 +9,7 @@ int v[42];
>  void foo (void) {}
>
>  /* Not all targets can use %cN even in non-pic code.  */
> -#if defined(__riscv)
> +#if defined(__riscv) || defined(__loongarch__)
>  asm ("# %0 %1 %2 %cc3 %cc4 %5 %% %="
>  #else
>  asm ("# %0 %1 %2 %c3 %c4 %5 %% %="

--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Re: [RFC] RISC-V: The optimization ignored the side effects of the rounding mode, resulting in incorrect results.
On 2/18/25 7:30 PM, Jin Ma wrote: I apologize for not explaining things more clearly. I also discovered that the issue is caused by CSE. I think that during the substitution process, CSE recognized the syntax of if_then_else and concluded that the expressions in the "then" and "else" branches are equivalent, resulting in both yielding (reg/v:RVVMF2SF 140 [ vreg_memory ]): (minus:RVVMF2SF (reg/v:RVVMF2SF 140 [ vreg_memory ]) (float_extend:RVVMF2SF (vec_duplicate:RVVMF4HF (const_double:HF 0.0 [0x0.0p+0] is considered equivalent to: (reg/v:RVVMF2SF 140 [ vreg_memory ]) Clearly, there wasn’t a deeper consideration of the fact that float_extend requires a rounding mode(frm). Therefore, I attempted to use UNSPEC in the pattern to inform CSE that we have a rounding mode. Right. It worked, but there's a deeper issue here. As I mentioned before, this may not be a good solution, as it risks missing other optimization opportunities. As you pointed out, we need a more general approach to fix it. Unfortunately, while I’m still trying to find a solution, I currently don't have any other good ideas. Changing the rounding modes isn't common, but it's not unheard of. My suspicion is that we need to expose the rounding mode assignment earlier (at RTL generation time). That may not work well with the current optimization of FRM, but I think early exposure is the only viable path forward in my mind. Depending on the depth of the problems it may not be something we can fix in the gcc-15 space. You might experiment with emitting the FRM assignment in the insn_expander class in the risc-v backend. This code: /* Add rounding mode operand. 
*/ if (m_insn_flags & FRM_DYN_P) add_rounding_mode_operand (FRM_DYN); else if (m_insn_flags & FRM_RUP_P) add_rounding_mode_operand (FRM_RUP); else if (m_insn_flags & FRM_RDN_P) add_rounding_mode_operand (FRM_RDN); else if (m_insn_flags & FRM_RMM_P) add_rounding_mode_operand (FRM_RMM); else if (m_insn_flags & FRM_RNE_P) add_rounding_mode_operand (FRM_RNE); else if (m_insn_flags & VXRM_RNU_P) add_rounding_mode_operand (VXRM_RNU); else if (m_insn_flags & VXRM_RDN_P) add_rounding_mode_operand (VXRM_RDN); For anything other than FRM_DYN_P emit the appropriate insn to set FRM. This may generate poor code in the presence of explicit rounding modes, but I think something along these lines is ultimately going to be needed. jeff
RE: [PATCH] aarch64: Recognize vector permute patterns suitable for FMOV [PR100165]
> Pengxuan Zheng writes: > > This patch optimizes certain vector permute expansion with the FMOV > > instruction when one of the input vectors is a vector of all zeros and > > the result of the vector permute is as if the upper lane of the > > non-zero input vector is set to zero and the lower lane remains unchanged. > > > > Note that the patch also propagates zero_op0_p and zero_op1_p during > > re-encode now. They will be used by aarch64_evpc_fmov to check if the > > input vectors are valid candidates. > > > > PR target/100165 > > > > gcc/ChangeLog: > > > > * config/aarch64/aarch64-simd.md > (aarch64_simd_vec_set_zero_fmov): > > New define_insn. > > * config/aarch64/aarch64.cc (aarch64_evpc_reencode): Copy > zero_op0_p and > > zero_op1_p. > > (aarch64_evpc_fmov): New function. > > (aarch64_expand_vec_perm_const_1): Add call to > aarch64_evpc_fmov. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/aarch64/vec-set-zero.c: Update test accordingly. > > * gcc.target/aarch64/fmov.c: New test. > > * gcc.target/aarch64/fmov-be.c: New test. > > Nice! Thanks for doing this. Some comments on the patch below. 
> > > > Signed-off-by: Pengxuan Zheng > > --- > > gcc/config/aarch64/aarch64-simd.md| 14 +++ > > gcc/config/aarch64/aarch64.cc | 74 +++- > > gcc/testsuite/gcc.target/aarch64/fmov-be.c| 74 > > gcc/testsuite/gcc.target/aarch64/fmov.c | 110 ++ > > .../gcc.target/aarch64/vec-set-zero.c | 6 +- > > 5 files changed, 275 insertions(+), 3 deletions(-) create mode > > 100644 gcc/testsuite/gcc.target/aarch64/fmov-be.c > > create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov.c > > > > diff --git a/gcc/config/aarch64/aarch64-simd.md > > b/gcc/config/aarch64/aarch64-simd.md > > index e456f693d2f..543126948e7 100644 > > --- a/gcc/config/aarch64/aarch64-simd.md > > +++ b/gcc/config/aarch64/aarch64-simd.md > > @@ -1190,6 +1190,20 @@ (define_insn "aarch64_simd_vec_set" > >[(set_attr "type" "neon_ins, neon_from_gp, > > neon_load1_one_lane")] > > ) > > > > +(define_insn "aarch64_simd_vec_set_zero_fmov" > > + [(set (match_operand:VP_2E 0 "register_operand" "=w") > > + (vec_merge:VP_2E > > + (match_operand:VP_2E 1 "aarch64_simd_imm_zero" "Dz") > > + (match_operand:VP_2E 3 "register_operand" "w") > > + (match_operand:SI 2 "immediate_operand" "i")))] > > + "TARGET_SIMD > > + && (ENDIAN_LANE_N (, exact_log2 (INTVAL (operands[2]))) == > 1)" > > + { > > +return "fmov\\t%0, %3"; > > + } > > + [(set_attr "type" "fmov")] > > +) > > + > > I think this shows that target-independent code is missing some > canonicalisation of vec_merge. combine has: > > unsigned n_elts = 0; > if (GET_CODE (x) == VEC_MERGE > && CONST_INT_P (XEXP (x, 2)) > && GET_MODE_NUNITS (GET_MODE (x)).is_constant (&n_elts) > && (swap_commutative_operands_p (XEXP (x, 0), XEXP (x, 1)) > /* Two operands have same precedence, then >first bit of mask select first operand. 
*/ > || (!swap_commutative_operands_p (XEXP (x, 1), XEXP (x, 0)) > && !(UINTVAL (XEXP (x, 2)) & 1 > { > rtx temp = XEXP (x, 0); > unsigned HOST_WIDE_INT sel = UINTVAL (XEXP (x, 2)); > unsigned HOST_WIDE_INT mask = HOST_WIDE_INT_1U; > if (n_elts == HOST_BITS_PER_WIDE_INT) > mask = -1; > else > mask = (HOST_WIDE_INT_1U << n_elts) - 1; > SUBST (XEXP (x, 0), XEXP (x, 1)); > SUBST (XEXP (x, 1), temp); > SUBST (XEXP (x, 2), GEN_INT (~sel & mask)); > } > > which AFAICT would prefer to put the immediate second, not first. I think we > should be doing the same canonicalisation in simplify_ternary_operation, and > possibly elsewhere, so that the .md pattern only needs to match the canonical > form (i.e. register, immedate, mask). Thanks for the suggestion. I've added the canonicalization in a separate patch. https://gcc.gnu.org/pipermail/gcc-patches/2025-February/676105.html > > On: > > > + && (ENDIAN_LANE_N (, exact_log2 (INTVAL (operands[2]))) == > 1)" > > it seems dangerous to pass exact_log2 to ENDIAN_LANE_N when we haven't > checked whether it is a power of 2. (0b00 or 0b11 ought to get simplified, but > I don't think we can ignore the possibility.) > > Rather than restrict the pattern to pairs, could we instead handle > VALL_F16 minus the QI elements, with the 16-bit elements restricted to > TARGET_F16? E.g. we should be able to handle V4SI using an FMOV of S > registers if only the low element is nonzero. Good point! I've addressed these in the latest version. Please let me know if I missed anything. https://gcc.gnu.org/pipermail/gcc-patches/2025-February/676106.html Thanks, Pengxuan > > Part of me thinks that this should just be described as a plain old AND, but I > suppose that doesn't work well for FP modes. Still, handling ANDs might be > an interesting follow-up :) > > Thank
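
Editor's note: the operand swap in the quoted combine snippet relies on a simple identity, which the small model below tries to make concrete. It is a sketch on plain arrays, assuming the usual vec_merge meaning (a set selector bit takes the lane from the first vector, a clear bit from the second). The names vec_merge, swapped_selector and exact_log2_model are local to this example and are not GCC functions, although exact_log2_model is meant to follow the same contract as GCC's exact_log2.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr unsigned N = 4;                 // number of lanes in this model
static_assert (N < 64, "selector must fit below the top bit");
using Vec = std::array<int, N>;

// Model of (vec_merge A B SEL): lane i comes from A if bit i of SEL is
// set, otherwise from B.
inline Vec
vec_merge (const Vec &a, const Vec &b, std::uint64_t sel)
{
  Vec r{};
  for (unsigned i = 0; i < N; ++i)
    r[i] = ((sel >> i) & 1) ? a[i] : b[i];
  return r;
}

// Swapping the two vector operands means inverting the selector within
// the low N bits, i.e. the "~sel & mask" step in the combine snippet.
inline std::uint64_t
swapped_selector (std::uint64_t sel)
{
  const std::uint64_t mask = (std::uint64_t (1) << N) - 1;
  return ~sel & mask;
}

// log2 of X if X is a power of 2, otherwise -1.
inline int
exact_log2_model (std::uint64_t x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int n = 0;
  while ((x >>= 1) != 0)
    ++n;
  return n;
}
```

One consequence worth noting: once the register operand comes first, "insert zero into one lane" has a selector with a single clear bit rather than a single set bit, so any power-of-two check has to be applied to the inverted selector, and a result of -1 has to be handled before it is used as a lane number.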
Re: [RFC] RISC-V: The optimization ignored the side effects of the rounding mode, resulting in incorrect results.
On Tue, 18 Feb 2025 13:48:02 -0700, Jeff Law wrote: > > > On 2/18/25 4:12 AM, Jin Ma wrote: > > We overlooked the side effects of the rounding mode in the pattern, > > which can impact the result of float_extend and lead to incorrect > > optimizations in the final program. This issue likely affects nearly > > all similar patterns that involve rounding modes, and the tests in > > this patch only highlight one example. It seems challenging to address, > > and I only implemented a simple fix, which is not a good way to solve > > the problem. > > > > Any comments on this? > > > > gcc/ChangeLog: > > > > * config/riscv/vector-iterators.md (UNSPEC_VRM): New. > > * config/riscv/vector.md: Use UNSPEC for float_extend. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/riscv/rvv/base/bug-11.c: New test. > So as Kito note, the insn you changed already has a reference to the FRM > it needs -- kept in operands[9]. It seems like your patch, while fixing > the bug, more likely does so by accident rather than by design. > > What I see when I look at the dump files is a deeper issue. > > > In the .expand dump we have: > > > (insn 17 16 18 2 (set (reg:HF 147) > > (const_double:HF 0.0 [0x0.0p+0])) "j.c":14:24 -1 > > (nil)) > > (insn 18 17 19 2 (set (reg/v:RVVMF2SF 141 [ vreg ]) > > (if_then_else:RVVMF2SF (unspec:RVVMF64BI [ > > (reg/v:RVVMF64BI 138 [ vmask ]) > > (const_int 1 [0x1]) > > (const_int 0 [0]) > > (const_int 2 [0x2]) > > (const_int 0 [0]) > > (const_int 2 [0x2]) > > (reg:SI 66 vl) > > (reg:SI 67 vtype) > > (reg:SI 69 frm) > > ] UNSPEC_VPREDICATE) > > (minus:RVVMF2SF (reg/v:RVVMF2SF 140 [ vreg_memory ]) > > (float_extend:RVVMF2SF (vec_duplicate:RVVMF4HF (reg:HF > > 147 > > (reg/v:RVVMF2SF 140 [ vreg_memory ]))) "j.c":14:24 -1 > > (nil)) > > > > Insn 18 does the subtraction with the adjusted rounding mode. So far, > so good. Things look fine at the start of cse1. 
But if we look at the > end of cse1 we have: > > > (insn 17 16 18 2 (set (reg:HF 147) > > (const_double:HF 0.0 [0x0.0p+0])) "j.c":14:24 136 {*movhf_hardfloat} > > (nil)) > > (insn 18 17 19 2 (set (reg/v:RVVMF2SF 141 [ vreg ]) > > (reg/v:RVVMF2SF 140 [ vreg_memory ])) "j.c":14:24 2786 > > {*movrvvmf2sf_fract} > > (expr_list:REG_DEAD (reg:HF 147) > > (expr_list:REG_DEAD (reg/v:RVVMF2SF 140 [ vreg_memory ]) > > (expr_list:REG_DEAD (reg/v:RVVMF64BI 138 [ vmask ]) > > (expr_list:REG_DEAD (reg:SI 69 frm) > > (nil)) > > > Note how CSE replace the arithmetic with a simple copy. At this point > things are broken. > > I don't see how CSE can make the right decision here; we don't expose > rounding modes this early and thus CSE has no way to know it can't make > that kind of replacement. > > You patch kindof works, but it seems to me it's more accident than > design and that we need to fix this in a more general manner. > > The natural question is what do other targets do when the rounding mode > gets changed. I'm guessing its exposed as a unspec set before the RTL > optimizers run. I apologize for not explaining things more clearly. I also discovered that the issue is caused by CSE. I think that during the substitution process, CSE recognized the syntax of if_then_else and concluded that the expressions in the "then" and "else" branches are equivalent, resulting in both yielding (reg/v:RVVMF2SF 140 [ vreg_memory ]): (minus:RVVMF2SF (reg/v:RVVMF2SF 140 [ vreg_memory ]) (float_extend:RVVMF2SF (vec_duplicate:RVVMF4HF (const_double:HF 0.0 [0x0.0p+0] is considered equivalent to: (reg/v:RVVMF2SF 140 [ vreg_memory ]) Clearly, there wasn’t a deeper consideration of the fact that float_extend requires a rounding mode(frm). Therefore, I attempted to use UNSPEC in the pattern to inform CSE that we have a rounding mode. As I mentioned before, this may not be a good solution, as it risks missing other optimization opportunities. 
As you pointed out, we need a more general approach to fix it. Unfortunately, while I’m still trying to find a solution, I currently don't have any other good ideas. Best regards, Jin Ma > jeff
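
Editor's note: the scalar sketch below shows one concrete way in which "x - 0.0" and "x" can differ once the rounding mode is not the default. It is not the RVV test case from this thread and makes no claim about what bug-11.c checks; it assumes an IEEE 754 implementation where fesetround is honored at run time. The function name is hypothetical, and the volatile variables are there only to keep the compiler from folding or moving the subtraction.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Returns true if (+0.0) - (+0.0) evaluates to -0.0 under rounding mode
// MODE (one of the FE_* rounding macros).
inline bool
zero_minus_zero_is_negative (int mode)
{
  const int saved = std::fegetround ();
  std::fesetround (mode);
  volatile double x = 0.0;
  volatile double zero = 0.0;
  // Stored to a volatile so the subtraction happens before the mode is
  // restored.
  volatile double diff = x - zero;
  std::fesetround (saved);
  return std::signbit (diff);
}
```

Under IEEE 754, an exact zero difference is +0 in every rounding direction except round-toward-negative, where it is -0. Calling the function with FE_DOWNWARD and with FE_TONEAREST should therefore give different answers, which is why an optimizer that does not know the rounding mode cannot treat the subtraction as a plain copy.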
[PATCH v2] Vect: Fix ICE when vect_verify_loop_lens acts on relevant mode [PR116351]
From: Pan Li

This patch fixes an ICE like the one below. Assume we have the following sample code:

int a, b, c;
short d, e, f;
long g (long h) { return h; }

void i () {
  for (; b; ++b) {
    f = 5 >> a ? d : d << a;
    e &= c | g(f);
  }
}

It will ICE when compiled with -O3 -march=rv64gc_zve64f -mrvv-vector-bits=zvl:

during GIMPLE pass: vect
pr116351-1.c: In function ‘i’:
pr116351-1.c:8:6: internal compiler error: in get_len_load_store_mode, at optabs-tree.cc:655
    8 | void i () {
      |      ^
0x44d6b9d internal_error(char const*, ...)
	/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic-global-context.cc:517
0x44a26a6 fancy_abort(char const*, int, char const*)
	/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic.cc:1722
0x19e4309 get_len_load_store_mode(machine_mode, bool, internal_fn*, vec*)
	/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/optabs-tree.cc:655
0x1fada40 vect_verify_loop_lens
	/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:1566
0x1fb2b07 vect_analyze_loop_2
	/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3037
0x1fb4302 vect_analyze_loop_1
	/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3478
0x1fb4e9a vect_analyze_loop(loop*, gimple*, vec_info_shared*)
	/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3638
0x203c2dc try_vectorize_loop_1
	/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1095
0x203c839 try_vectorize_loop
	/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1212
0x203cb2c execute

During vectorization, the override_widen pattern is matched, and DImode is then recorded as the vector_mode in loop_info.
After that the loop_vinfo will step in vect_analyze_xx with below flow: vect_analyze_loop_2 |- vect_pattern_recog // over-widening and set loop_vinfo->vector_mode to DImode |- ... |- vect_analyze_loop_operations |- stmt_info->def_type == vect_reduction_def |- stmt_info->slp_type == pure_slp |- vectorizable_lc_phi // Not Hit |- vectorizable_induction // Not Hit |- vectorizable_reduction // Not Hit |- vectorizable_recurr // Not Hit |- vectorizable_live_operation // Not Hit |- vect_analyze_stmt |- stmt_info->relevant == vect_unused_in_scope |- stmt_info->live == false |- p pattern_stmt_info == (stmt_vec_info) 0x0 |- return opt_result::success (); OR |- PURE_SLP_STMT (stmt_info) && !node then dump "handled only by SLP analysis\n" |- Early return opt_result::success (); |- vectorizable_load/store/call_convert/... // Not Hit |- LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P && !LOOP_VINFO_MASKS(loop_vinfo).is_empty () |- vect_verify_loop_lens (loop_vinfo) |- assert (VECTOR_MODE_P (loop_vinfo->vector_mode); // Hit assert result in ICE Finally, the DImode in loop_vinfo will hit the assert (VECTOR_MODE_P (mode)) in vect_verify_loop_lens. This patch would like to return false directly if the loop_vinfo has relevant mode like DImode for the ICE fix, but still may have mis-optimization for similar cases. We will try to cover that in separated patches. The below test suites are passed for this patch. * The rv64gcv fully regression test. * The x86 bootstrap test. * The x86 fully regression test. PR middle-end/116351 gcc/ChangeLog: * tree-vect-loop.cc (vect_verify_loop_lens): Return false if the loop_vinfo has relevant mode such as DImode. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr116351-1.c: New test. * gcc.target/riscv/rvv/base/pr116351-2.c: New test. * gcc.target/riscv/rvv/base/pr116351.h: New test. 
Signed-off-by: Pan Li --- .../gcc.target/riscv/rvv/base/pr116351-1.c | 5 + .../gcc.target/riscv/rvv/base/pr116351-2.c | 5 + .../gcc.target/riscv/rvv/base/pr116351.h | 18 ++ gcc/tree-vect-loop.cc | 3 +++ 4 files changed, 31 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr116351-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr116351-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr116351.h diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr116351-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr116351-1.c new file mode 100644 index 000..f58fedfeaf1 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr116351-1.c @@ -0,0 +1,5 @@ +/* Test that we do not have ice when compile */ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gc_zve32x -mabi=lp64d -O3 -ftree-vectorize" } */ + +#include "pr116351.h" diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr116351-2.c b/gcc/t
[PATCH] COBOL v3: 10/14 72K doc: man pages and GnuCOBOL emulation
From f89a50238de62b73d9fc44ee7226461650ab119d Tue 18 Feb 2025 04:19:12 PM EST
From: "James K. Lowden"
Date: Tue 18 Feb 2025 04:19:12 PM EST
Subject: [PATCH] COBOL 10/14 72K doc: man pages and GnuCOBOL emulation

gcc/cobol/ChangeLog
	* gcobc: New file.
	* gcobol.1: New file.
	* gcobol.3: New file.
	* help.gen: New file.

gcc/cobol/udf/ChangeLog
	* udf/stored-char-length.cbl: New file.

---
 gcc/cobol/gcobc | +-
 gcc/cobol/gcobol.1 | -
 gcc/cobol/gcobol.3 | -
 gcc/cobol/help.gen | +++-
 gcc/cobol/udf/stored-char-length.cbl | +++
 5 files changed, 2451 insertions(+), 5 deletions(-)

diff --git a/gcc/cobol/gcobc b/gcc/cobol/gcobc
new file mode 100755
index 000..93e1bd302a6
--- /dev/null
+++ b/gcc/cobol/gcobc
@@ -0,0 +1,465 @@
+#! /bin/sh -e
+
+#
+# COPYRIGHT
+# The gcobc program is in public domain.
+# If it breaks then you get to keep both pieces.
+#
+# This file emulates the GnuCOBOL cobc compiler to a limited degree.
+# For options that can be "mapped" (see migration-guide.1), it accepts
+# cobc options, changing them to the gcobol equivalents.  Options not
+# recognized by the script are passed verbatim to gcobol, which will
+# reject them unless of course they are gcobol options.
+#
+# User-defined variables, and their defaults:
+#
+#     Variable  Default                  Effect
+#     echo      none                     If defined, echo the gcobol command
+#     gcobcx    none                     Produce verbose messages
+#     gcobol    ./gcobol                 Name of the gcobol binary
+#     GCOBCUDF  PREFIX/share/cobol/udf/  Location of UDFs to be prepended to input
+#
+# By default, this script includes all files in $GCOBCUDF.  To defeat
+# that behavior, use GCOBCUDF=none.
+#
+# A list of supported options is produced with "gcobc -HELP".
+#
+## Maintainer note. In modifying this file, the following may make
+## your life easier:
+##
+##  - To force the script to exit, either set exit_status to 1, or call
+##    the error function.
+##  - As handled options are added, add them to the HELP here-doc.
+##  - The compiler can produce only one kind of output.
In this +##script, that's known by $mode. Options that affect the type of +##output set the mode variable. Everything else is appended to the +##opts variable. +## + +if [ "$COBCPY" ] +then +copydir="-I$COBCPY" +fi + +if [ "$COB_COPY_DIR" ] +then +copydir="-I$COB_COPY_DIR" +fi + +# TODO: this file likely needs to query gcobol for its shared path instead +udf_default="${0%/*}/../share/gcobol/udf" +if [ ! -d "$udfdir" ] +then +
[PATCH] COBOL v3: 1/14 4K dir: create gcc/cobol and libgcobol directories
From f89a50238de62b73d9fc44ee7226461650ab119d Tue 18 Feb 2025 04:19:09 PM EST
From: "James K. Lowden"
Date: Tue 18 Feb 2025 04:19:09 PM EST
Subject: [PATCH] COBOL 1/14 4.0K dir: create gcc/cobol and libgcobol directories

contrib/gcc-changelog/ChangeLog
	* contrib/gcc-changelog/git_commit.py: Add libgcobol module and cobol language.

---
 contrib/gcc-changelog/git_commit.py | ++
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/gcc-changelog/git_commit.py b/contrib/gcc-changelog/git_commit.py
index 5c0596c2627..c2297d1051f 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -39,6 +39,7 @@ default_changelog_locations = {
     'gcc/c-family',
     'gcc',
     'gcc/cp',
+    'gcc/cobol',
     'gcc/d',
     'gcc/fortran',
     'gcc/go',
@@ -66,6 +67,7 @@ default_changelog_locations = {
     'libgcc',
     'libgcc/config/avr/libf7',
     'libgcc/config/libbid',
+    'libgcobol',
     'libgfortran',
     'libgm2',
     'libgomp',
[PATCH] COBOL v3: 11/14 84K lhd: libgcobol header files
From f89a50238de62b73d9fc44ee7226461650ab119d Tue 18 Feb 2025 04:19:13 PM EST
From: "James K. Lowden"
Date: Tue 18 Feb 2025 04:19:13 PM EST
Subject: [PATCH] COBOL 11/14 84K lhd: libgcobol header files

libgcobol/ChangeLog
	* /charmaps.h: New file.
	* /common-defs.h: New file.
	* /ec.h: New file.
	* /exceptl.h: New file.
	* /gcobolio.h: New file.
	* /gfileio.h: New file.
	* /gmath.h: New file.
	* /io.h: New file.
	* /libgcobol.h: New file.
	* /valconv.h: New file.

---
 libgcobol/charmaps.h | +-
 libgcobol/common-defs.h | -
 libgcobol/ec.h | +-
 libgcobol/exceptl.h | -
 libgcobol/gcobolio.h | ++-
 libgcobol/gfileio.h | +-
 libgcobol/gmath.h | ++-
 libgcobol/io.h | +-
 libgcobol/libgcobol.h | +-
 libgcobol/valconv.h |
 10 files changed, 2017 insertions(+), 10 deletions(-)

diff --git a/libgcobol/charmaps.h b/libgcobol/charmaps.h
new file mode 100644
index 000..64270c6f08c
--- /dev/null
+++ b/libgcobol/charmaps.h
@@ -0,0 +1,369 @@
+/*
+ * Copyright (c) 2021-2025 Symas Corporation
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following disclaimer
+ *   in the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of the Symas Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived from
+ *   this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef CHARMAPS_H +#define CHARMAPS_H + +#include + +/* There are four distinct codeset domains in the COBOL compiler. + * + * First is the codeset of the console. Established by looking at what + * setlocale() reports, this can be either UTF-8 or some ASCII based code + * page. (We assume CP1252). Data coming from the console or the system, + * ACCEPT statements;
New Chinese (simplified) PO file for 'cpplib' (version 15-b20250216)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'cpplib' has been submitted by the Chinese (simplified) team of translators. The file is available at: https://translationproject.org/latest/cpplib/zh_CN.po (This file, 'cpplib-15-b20250216.zh_CN.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: https://translationproject.org/latest/cpplib/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: https://translationproject.org/domain/cpplib.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator.
Contents of PO file 'cpplib-15-b20250216.zh_CN.po'
cpplib-15-b20250216.zh_CN.po.gz Description: Binary data The Translation Project robot, in the name of your translation coordinator.
RE: [PATCH] aarch64: Recognize vector permute patterns suitable for FMOV [PR100165]
> > Pengxuan Zheng writes: > > > This patch optimizes certain vector permute expansion with the FMOV > > > instruction when one of the input vectors is a vector of all zeros > > > and the result of the vector permute is as if the upper lane of the > > > non-zero input vector is set to zero and the lower lane remains > unchanged. > > > > > > Note that the patch also propagates zero_op0_p and zero_op1_p during > > > re-encode now. They will be used by aarch64_evpc_fmov to check if > > > the input vectors are valid candidates. > > > > > > PR target/100165 > > > > > > gcc/ChangeLog: > > > > > > * config/aarch64/aarch64-simd.md > > (aarch64_simd_vec_set_zero_fmov): > > > New define_insn. > > > * config/aarch64/aarch64.cc (aarch64_evpc_reencode): Copy > > zero_op0_p and > > > zero_op1_p. > > > (aarch64_evpc_fmov): New function. > > > (aarch64_expand_vec_perm_const_1): Add call to > > aarch64_evpc_fmov. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.target/aarch64/vec-set-zero.c: Update test accordingly. > > > * gcc.target/aarch64/fmov.c: New test. > > > * gcc.target/aarch64/fmov-be.c: New test. > > > > Nice! Thanks for doing this. Some comments on the patch below. 
> > > > > > Signed-off-by: Pengxuan Zheng > > > --- > > > gcc/config/aarch64/aarch64-simd.md| 14 +++ > > > gcc/config/aarch64/aarch64.cc | 74 +++- > > > gcc/testsuite/gcc.target/aarch64/fmov-be.c| 74 > > > gcc/testsuite/gcc.target/aarch64/fmov.c | 110 ++ > > > .../gcc.target/aarch64/vec-set-zero.c | 6 +- > > > 5 files changed, 275 insertions(+), 3 deletions(-) create mode > > > 100644 gcc/testsuite/gcc.target/aarch64/fmov-be.c > > > create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov.c > > > > > > diff --git a/gcc/config/aarch64/aarch64-simd.md > > > b/gcc/config/aarch64/aarch64-simd.md > > > index e456f693d2f..543126948e7 100644 > > > --- a/gcc/config/aarch64/aarch64-simd.md > > > +++ b/gcc/config/aarch64/aarch64-simd.md > > > @@ -1190,6 +1190,20 @@ (define_insn "aarch64_simd_vec_set" > > >[(set_attr "type" "neon_ins, neon_from_gp, > > > neon_load1_one_lane")] > > > ) > > > > > > +(define_insn "aarch64_simd_vec_set_zero_fmov" > > > + [(set (match_operand:VP_2E 0 "register_operand" "=w") > > > + (vec_merge:VP_2E > > > + (match_operand:VP_2E 1 "aarch64_simd_imm_zero" "Dz") > > > + (match_operand:VP_2E 3 "register_operand" "w") > > > + (match_operand:SI 2 "immediate_operand" "i")))] > > > + "TARGET_SIMD > > > + && (ENDIAN_LANE_N (, exact_log2 (INTVAL (operands[2]))) > > > +== > > 1)" > > > + { > > > +return "fmov\\t%0, %3"; > > > + } > > > + [(set_attr "type" "fmov")] > > > +) > > > + > > > > I think this shows that target-independent code is missing some > > canonicalisation of vec_merge. combine has: > > > > unsigned n_elts = 0; > > if (GET_CODE (x) == VEC_MERGE > > && CONST_INT_P (XEXP (x, 2)) > > && GET_MODE_NUNITS (GET_MODE (x)).is_constant (&n_elts) > > && (swap_commutative_operands_p (XEXP (x, 0), XEXP (x, 1)) > > /* Two operands have same precedence, then > > first bit of mask select first operand. 
*/ > > || (!swap_commutative_operands_p (XEXP (x, 1), XEXP (x, 0)) > > && !(UINTVAL (XEXP (x, 2)) & 1 > > { > > rtx temp = XEXP (x, 0); > > unsigned HOST_WIDE_INT sel = UINTVAL (XEXP (x, 2)); > > unsigned HOST_WIDE_INT mask = HOST_WIDE_INT_1U; > > if (n_elts == HOST_BITS_PER_WIDE_INT) > > mask = -1; > > else > > mask = (HOST_WIDE_INT_1U << n_elts) - 1; > > SUBST (XEXP (x, 0), XEXP (x, 1)); > > SUBST (XEXP (x, 1), temp); > > SUBST (XEXP (x, 2), GEN_INT (~sel & mask)); > > } > > > > which AFAICT would prefer to put the immediate second, not first. I > > think we should be doing the same canonicalisation in > > simplify_ternary_operation, and possibly elsewhere, so that the .md > > pattern only needs to match the canonical form (i.e. register, immedate, > mask). > > Thanks for the suggestion. I've added the canonicalization in a separate patch. > https://gcc.gnu.org/pipermail/gcc-patches/2025-February/676105.html > > > > > On: > > > > > + && (ENDIAN_LANE_N (, exact_log2 (INTVAL (operands[2]))) > > > + == > > 1)" > > > > it seems dangerous to pass exact_log2 to ENDIAN_LANE_N when we > haven't > > checked whether it is a power of 2. (0b00 or 0b11 ought to get > > simplified, but I don't think we can ignore the possibility.) > > > > Rather than restrict the pattern to pairs, could we instead handle > > VALL_F16 minus the QI elements, with the 16-bit elements restricted to > > TARGET_F16? E.g. we should be able to handle V4SI using an FMOV of S > > registers if only the low element is nonzero. > > Good point! I've addressed these in the latest version. Please let me know if I > missed anything. > https://gcc.gnu.org/pipermail/gcc-patches/2025-February/676106.html Missed
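For readers following the canonicalisation discussion, here is a minimal standalone sketch of why swapping the two vector operands of a vec_merge is safe as long as the selector is complemented within the low n_elts bits. It is a toy model only: `vec_merge`, `swapped_selector` and `self_check` below are illustrative names operating on `std::array`, not GCC's RTL API, and the 4-lane vectors are an arbitrary choice.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model of RTL vec_merge: lane i is taken from `a` when bit i of
// `sel` is set and from `b` otherwise.
template <std::size_t N>
std::array<int, N> vec_merge (const std::array<int, N> &a,
                              const std::array<int, N> &b,
                              std::uint64_t sel)
{
  std::array<int, N> result{};
  for (std::size_t i = 0; i < N; ++i)
    result[i] = ((sel >> i) & 1) ? a[i] : b[i];
  return result;
}

// Selector to use after swapping the two vector operands: complement the
// old selector and keep only the low n_elts bits, as the quoted combine
// code does with `~sel & mask`.
std::uint64_t swapped_selector (std::uint64_t sel, unsigned n_elts)
{
  // Shifting a 64-bit value by 64 is undefined, so handle that case apart.
  std::uint64_t mask = (n_elts == 64) ? ~std::uint64_t{0}
                                      : (std::uint64_t{1} << n_elts) - 1;
  return ~sel & mask;
}

// Check the identity for every selector of a 4-lane vector, with a zero
// vector standing in for the immediate operand.
void self_check ()
{
  const std::array<int, 4> reg = {10, 11, 12, 13};
  const std::array<int, 4> zero = {0, 0, 0, 0};
  for (std::uint64_t sel = 0; sel < 16; ++sel)
    {
      auto immediate_first = vec_merge (zero, reg, sel);
      auto register_first = vec_merge (reg, zero, swapped_selector (sel, 4));
      assert (immediate_first == register_first);
    }
}
```

In this model, "zero lane 0 of a 4-lane register" is selector 0x1 with the zero vector first, and selector 0xE once the register is placed first, which is presumably why a pattern matching the canonical form needs to look at the inverted mask.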
[PATCH v3] aarch64: Recognize vector permute patterns suitable for FMOV [PR100165]
This patch optimizes certain vector permute expansion with the FMOV instruction when one of the input vectors is a vector of all zeros and the result of the vector permute is as if the upper lane of the non-zero input vector is set to zero and the lower lane remains unchanged. Note that the patch also propagates zero_op0_p and zero_op1_p during re-encode now. They will be used by aarch64_evpc_fmov to check if the input vectors are valid candidates. PR target/100165 gcc/ChangeLog: * config/aarch64/aarch64-protos.h (aarch64_lane0_mask_p): New. * config/aarch64/aarch64-simd.md (@aarch64_simd_vec_set_zero_fmov): New define_insn. * config/aarch64/aarch64.cc (aarch64_lane0_mask_p): New. (aarch64_evpc_reencode): Copy zero_op0_p and zero_op1_p. (aarch64_evpc_fmov): New. (aarch64_expand_vec_perm_const_1): Add call to aarch64_evpc_fmov. * config/aarch64/iterators.md (VALL_F16_NO_QI): New mode iterator. gcc/testsuite/ChangeLog: * gcc.target/aarch64/vec-set-zero.c: Update test accordingly. * gcc.target/aarch64/fmov-1.c: New test. * gcc.target/aarch64/fmov-2.c: New test. * gcc.target/aarch64/fmov-3.c: New test. * gcc.target/aarch64/fmov-be-1.c: New test. * gcc.target/aarch64/fmov-be-2.c: New test. * gcc.target/aarch64/fmov-be-3.c: New test. 
Signed-off-by: Pengxuan Zheng --- gcc/config/aarch64/aarch64-protos.h | 2 +- gcc/config/aarch64/aarch64-simd.md| 13 ++ gcc/config/aarch64/aarch64.cc | 96 ++- gcc/config/aarch64/iterators.md | 9 + gcc/testsuite/gcc.target/aarch64/fmov-1.c | 158 ++ gcc/testsuite/gcc.target/aarch64/fmov-2.c | 52 ++ gcc/testsuite/gcc.target/aarch64/fmov-3.c | 144 gcc/testsuite/gcc.target/aarch64/fmov-be-1.c | 144 gcc/testsuite/gcc.target/aarch64/fmov-be-2.c | 52 ++ gcc/testsuite/gcc.target/aarch64/fmov-be-3.c | 144 .../gcc.target/aarch64/vec-set-zero.c | 6 +- 11 files changed, 816 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-3.c create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-be-1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-be-2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-be-3.c diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 4235f4a0ca5..cba94914903 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -1051,7 +1051,7 @@ void aarch64_subvti_scratch_regs (rtx, rtx, rtx *, rtx *, rtx *, rtx *); void aarch64_expand_subvti (rtx, rtx, rtx, rtx, rtx, rtx, rtx, bool); - +bool aarch64_lane0_mask_p (unsigned int, rtx); /* Initialize builtins for SIMD intrinsics. 
*/ void init_aarch64_simd_builtins (void); diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index e2afe87e513..6ddc27c223e 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -1190,6 +1190,19 @@ (define_insn "@aarch64_simd_vec_set" [(set_attr "type" "neon_ins, neon_from_gp, neon_load1_one_lane")] ) +(define_insn "@aarch64_simd_vec_set_zero_fmov" + [(set (match_operand:VALL_F16_NO_QI 0 "register_operand" "=w") + (vec_merge:VALL_F16_NO_QI + (match_operand:VALL_F16_NO_QI 1 "register_operand" "w") + (match_operand:VALL_F16_NO_QI 2 "aarch64_simd_imm_zero" "Dz") + (match_operand:SI 3 "immediate_operand" "i")))] + "TARGET_SIMD && aarch64_lane0_mask_p (, operands[3])" + { +return "fmov\\t%0, %1"; + } + [(set_attr "type" "fmov")] +) + (define_insn "aarch64_simd_vec_set_zero" [(set (match_operand:VALL_F16 0 "register_operand" "=w") (vec_merge:VALL_F16 diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index f5f23f6ff4b..c29a43f2553 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -23682,6 +23682,15 @@ aarch64_strided_registers_p (rtx *operands, unsigned int num_operands, return true; } +/* Return TRUE if OP is a valid vec_merge bit mask for lane 0. */ + +bool +aarch64_lane0_mask_p (unsigned int nelts, rtx op) +{ + return exact_log2 (INTVAL (op)) >= 0 +&& (ENDIAN_LANE_N (nelts, exact_log2 (INTVAL (op))) == 0); +} + /* Bounds-check lanes. Ensure OPERAND lies between LOW (inclusive) and HIGH (exclusive). */ void @@ -26058,6 +26067,8 @@ aarch64_evpc_reencode (struct expand_vec_perm_d *d) newd.target = d->target ? gen_lowpart (new_mode, d->target) : NULL; newd.op0 = d->op0 ? gen_lowpart (new_mode, d->op0) : NULL; newd.op1 = d->op1 ? gen_lowpart (new_mode, d->op1) : NULL; +
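As a rough illustration of what the aarch64_lane0_mask_p predicate above checks, the following standalone sketch models it outside of GCC. The helper names are hypothetical, `exact_log2_model` only imitates GCC's exact_log2, and `endian_lane_n` assumes that ENDIAN_LANE_N mirrors the lane number on big-endian targets and leaves it unchanged otherwise; the real function takes an rtx rather than a plain integer.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for exact_log2: log2 of x if x is a power of two, else -1.
int exact_log2_model (std::uint64_t x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int log = 0;
  while ((x >>= 1) != 0)
    ++log;
  return log;
}

// Assumed model of ENDIAN_LANE_N: mirror the lane number on big-endian.
int endian_lane_n (unsigned nelts, int lane, bool big_endian)
{
  return big_endian ? static_cast<int> (nelts) - 1 - lane : lane;
}

// Model of the predicate: the selector must be a single bit, and that bit
// must name architectural lane 0 once endianness is taken into account.
// The power-of-two test comes first, so endian_lane_n never sees -1.
bool lane0_mask_p (unsigned nelts, std::uint64_t sel, bool big_endian)
{
  int lane = exact_log2_model (sel);
  return lane >= 0 && endian_lane_n (nelts, lane, big_endian) == 0;
}

void self_check ()
{
  assert (lane0_mask_p (4, 0x1, false));   // little-endian: bit 0 is lane 0
  assert (lane0_mask_p (4, 0x8, true));    // big-endian: bit 3 is lane 0
  assert (!lane0_mask_p (4, 0x3, false));  // two bits set: not a power of 2
  assert (!lane0_mask_p (4, 0x0, false));  // no bits set
}
```

The ordering of the two tests is the point of the sketch: a selector such as 0b00 or 0b11 is rejected before any lane arithmetic is attempted.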
Re: [PATCH v2 15/16] Add error cases and tests for Aarch64 FMV.
Alfie Richards writes:
> This changes the ambiguation error for C++ to cover cases of differently
> annotated FMV function sets whose signatures only differ by their return
> type.
>
> It also adds tests covering many FMV errors for Aarch64, including
> redeclaration, and mixing target_clones and target_versions.

The tests look good.  Sorry for not applying the series to find out
for myself, but what's the full message for:

> diff --git a/gcc/testsuite/g++.target/aarch64/mvc-error2.C
> b/gcc/testsuite/g++.target/aarch64/mvc-error2.C
> new file mode 100644
> index 000..0e956e402d8
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/aarch64/mvc-error2.C
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-require-ifunc "" } */
> +/* { dg-options "-O0" } */
> +/* { dg-additional-options "-Wno-experimental-fmv-target" } */
> +
> +__attribute__ ((target_clones ("default, dotprod"))) float
> +foo () { return 3; } /* { dg-message "previously defined here" } */
> +
> +__attribute__ ((target_clones ("dotprod", "mve"))) float
> +foo () { return 3; } /* { dg-error "redefinition of" } */

...the redefinition error here?  Does it mention dotprod specifically?
If so, it might be worth capturing that in the test, so that we don't
regress later.

Thanks,
Richard
Re: 7/7 [Fortran, Patch, Coarray, PR107635] Remove deprecated coarray routines
On 18.02.25 at 16:00, Andre Vehreschild wrote:

> Hi Thomas,
>
>> This patch series (of necessity) introduces ABI changes.  What will
>> happen with user code compiled against the old interface?
>
> That depends on the library you are linking against. When using
> caf_single from gfortran, you will get link failures when you mix
> code compiled by gfortran < 15 and gfortran-15. But caf_single is anyhow
> only considered for testing. So why should one do this?

OK.

> If your question targets the users of this ABI, which to my knowledge is
> only OpenCoarrays at the moment, then the user will experience nothing.
> A mix of pre-gfortran-15 and gfortran-15 generated .o-files will link and
> work as expected, because OpenCoarrays provides all ABIs. We do not
> compile a gfortran-15 exclusive version of OpenCoarrays, i.e. all
> routines are present, fully functional and interoperable.

Very good, then.

>> I guess a link failure (plus an answer in stack exchange where the
>> explanation is given, so people can google it, and a mention in the
>> release notes) would be acceptable, but is there anything that can be
>> done in addition?
>
> I can provide an entry in release notes, if need be. Where do I have to
> do this? Never did.

It is a separate repository from the gcc source; it can be found
by cloning git+ssh://you...@gcc.gnu.org/git/gcc-wwwdocs.git .

Best regards (and a lot of thanks for the patch series!)

	Thomas
New template for 'gcc' made available
Hello, gentle maintainer. This is a message from the Translation Project robot. (If you have any questions, send them to .) A new POT file for textual domain 'gcc' has been made available to the language teams for translation. It is archived as: https://translationproject.org/POT-files/gcc-15-b20250216.pot Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. Below is the URL which has been provided to the translators of your package. Please inform the translation coordinator, at the address at the bottom, if this information is not current: https://gcc.gnu.org/pub/gcc/snapshots/15-20250216/gcc-15-20250216.tar.xz Translated PO files will later be automatically e-mailed to you. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator.
New template for 'cpplib' made available
Hello, gentle maintainer. This is a message from the Translation Project robot. (If you have any questions, send them to .) A new POT file for textual domain 'cpplib' has been made available to the language teams for translation. It is archived as: https://translationproject.org/POT-files/cpplib-15-b20250216.pot Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. Below is the URL which has been provided to the translators of your package. Please inform the translation coordinator, at the address at the bottom, if this information is not current: https://gcc.gnu.org/pub/gcc/snapshots/15-20250216/gcc-15-20250216.tar.xz Translated PO files will later be automatically e-mailed to you. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator.
Re: [PATCH] c, v2: do not warn about truncating NUL char when initializing nonstring arrays [PR117178]
On Fri, Feb 14, 2025 at 11:21:07AM +0100, Jakub Jelinek wrote:
> On Thu, Feb 13, 2025 at 02:10:25PM +0100, Jakub Jelinek wrote:
> > Kees, are you submitting this under assignment to FSF (maybe the Google one
> > if it has one) or DCO?  See https://gcc.gnu.org/contribute.html#legal
> > for details.  If DCO, can you add your Signed-off-by: tag for it?
> >
> > So far lightly tested, ok for trunk if it passes bootstrap/regtest?
>
> Bootstrapped/regtested on x86_64-linux and i686-linux successfully.

Thank you for getting this done! I really appreciate having this
available. I'll give it a spin. :)

-- 
Kees Cook
Contents of PO file 'cpplib-15-b20250216.uk.po'
cpplib-15-b20250216.uk.po.gz Description: Binary data The Translation Project robot, in the name of your translation coordinator.
Re: [PATCH] aarch64: Ignore target pragmas while defining intrinsics
Andrew Carlotti writes: > When initialising intrinsics with `#pragma GCC aarch64 "arm_*.h"`, we > often set an explicit target, but currently leave current_target_pragma > unchanged. This results in the target pragma being applied to each > simulated intrinsic on top of our explicit target, which is clearly > undesirable. > > As far as I can tell this doesn't cause any bugs at the moment, because > none of the behaviour for builtin functions depends upon the function > specific target. However, the unintended target feature combinations > led to unwanted behaviour in an under-developement patch. > > This patch resolves the issue by extending aarch64_simd_switcher to > explicitly unset the current_target_pragma, and adapting it for to > support handle_arm_acle_h as well. I've also renamed the switcher classes > and instances, because I think the new names a slightly clearer. > > The chosen sets of features for arm_sve.h and arm_sme.h are not normally > valid, because they exclude FCMA and BF16. However, I don't think that > matters for the usage here. Alternatively, aarch64_target_switcher > could be modified to enable all the dependent features as well. > > > Bootstrapped and regression tested on aarch64. Ok for master (to enable the > dependant WIP patch)? > > gcc/ChangeLog: > > * config/aarch64/aarch64-builtins.cc > (aarch64_simd_switcher::aarch64_simd_switcher): Rename to... > (aarch64_target_switcher::aarch64_target_switcher): ...this, > remove default simd flags and save current_target_pragma. > (aarch64_simd_switcher::~aarch64_simd_switcher): Rename to... > (aarch64_target_switcher::~aarch64_target_switcher): ...this, > and restore current_target_pragma. > (handle_arm_acle_h): Use aarch64_target_switcher. > (handle_arm_neon_h): Rename switcher and pass explicit flags. > (aarch64_general_init_builtins): Ditto. > * config/aarch64/aarch64-protos.h > (class aarch64_simd_switcher): Rename to... > (class aarch64_target_switcher): ...this, and add pragma member. 
> * config/aarch64/aarch64-sve-builtins.cc > (sve_switcher::sve_switcher): Rename to... > (sve_target_switcher::sve_target_switcher): ...this. > (sve_switcher::~sve_switcher): Rename to... > (sve_target_switcher::~sve_target_switcher): ...this. > (init_builtins): Rename switcher. > (handle_arm_sve_h): Ditto. > (handle_arm_neon_sve_bridge_h): Ditto. > (handle_arm_sme_h): Ditto. > * config/aarch64/aarch64-sve-builtins.h > (class sve_switcher): Rename to... > (class sve_target_switcher): ...this. > (class sme_switcher): Rename to... > (class sme_target_switcher): ...this. > > > diff --git a/gcc/config/aarch64/aarch64-builtins.cc > b/gcc/config/aarch64/aarch64-builtins.cc > index > 128cc365d3d585e01cb69668f285318ee56a36fc..c1cb6cdcc81c6b45c0132250589bba0be42f195d > 100644 > --- a/gcc/config/aarch64/aarch64-builtins.cc > +++ b/gcc/config/aarch64/aarch64-builtins.cc > @@ -1877,23 +1877,25 @@ aarch64_scalar_builtin_type_p (aarch64_simd_type t) >return (t == Poly8_t || t == Poly16_t || t == Poly64_t || t == Poly128_t); > } > > -/* Enable AARCH64_FL_* flags EXTRA_FLAGS on top of the base Advanced SIMD > - set. */ > -aarch64_simd_switcher::aarch64_simd_switcher (aarch64_feature_flags > extra_flags) > +/* Temporarily set FLAGS as the enabled target features. */ > +aarch64_target_switcher::aarch64_target_switcher (aarch64_feature_flags > flags) >: m_old_asm_isa_flags (aarch64_asm_isa_flags), > -m_old_general_regs_only (TARGET_GENERAL_REGS_ONLY) > +m_old_general_regs_only (TARGET_GENERAL_REGS_ONLY), > +m_old_target_pragma (current_target_pragma) > { >/* Changing the ISA flags should be enough here. We shouldn't need to > pay the compile-time cost of a full target switch. */ >global_options.x_target_flags &= ~MASK_GENERAL_REGS_ONLY; > - aarch64_set_asm_isa_flags (AARCH64_FL_FP | AARCH64_FL_SIMD | extra_flags); > + aarch64_set_asm_isa_flags (flags); This feels a bit inconsistent, in that it forces -mgeneral-regs off but doesn't force AARCH64_FL_FP on. 
I think it'd be better to keep this part of aarch64_simd_(target_)switcher (and continue to have sve_(target_)switcher derive from it) and make aarch64_target_switcher a new base class that just does the pragma bit. Thanks, Richard > + current_target_pragma = NULL_TREE; > } > > -aarch64_simd_switcher::~aarch64_simd_switcher () > +aarch64_target_switcher::~aarch64_target_switcher () > { >if (m_old_general_regs_only) > global_options.x_target_flags |= MASK_GENERAL_REGS_ONLY; >aarch64_set_asm_isa_flags (m_old_asm_isa_flags); > + current_target_pragma = m_old_target_pragma; > } > > /* Implement #pragma GCC aarch64 "arm_neon.h". > @@ -1903,7 +1905,7 @@ aarch64_simd_switcher::~aarch64_simd_switcher () > void > handle_arm_neon_h (void) > { > - aarch64_simd_switcher simd; > + a
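To make the suggested class split concrete, here is a minimal sketch of a RAII base class that only handles the pragma, with a derived class that keeps the existing flag handling (general-regs-only forced off, FP and SIMD forced on). Everything in it is a stand-in chosen for illustration: the globals, the `FL_*` constants, the class names and the "+sve2" pragma string are hypothetical and do not correspond to GCC's actual declarations.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the compiler-global state being switched.
static const char *current_pragma = nullptr;  // models current_target_pragma
static std::uint64_t isa_flags = 0;           // models the asm ISA flags
static bool general_regs_only = false;        // models -mgeneral-regs-only

constexpr std::uint64_t FL_FP = 1, FL_SIMD = 2, FL_SVE = 4;

// Base class: only clears the pragma for the lifetime of the object.
class target_switcher
{
public:
  target_switcher () : m_old_pragma (current_pragma)
  {
    current_pragma = nullptr;
  }
  ~target_switcher () { current_pragma = m_old_pragma; }
  target_switcher (const target_switcher &) = delete;
  target_switcher &operator= (const target_switcher &) = delete;

private:
  const char *m_old_pragma;
};

// Derived class: additionally forces general-regs-only off and enables
// FP and SIMD plus any extra flags, restoring both on destruction.
class simd_switcher : public target_switcher
{
public:
  explicit simd_switcher (std::uint64_t extra_flags = 0)
    : m_old_isa_flags (isa_flags),
      m_old_general_regs_only (general_regs_only)
  {
    general_regs_only = false;
    isa_flags = FL_FP | FL_SIMD | extra_flags;
  }
  ~simd_switcher ()
  {
    general_regs_only = m_old_general_regs_only;
    isa_flags = m_old_isa_flags;
  }

private:
  std::uint64_t m_old_isa_flags;
  bool m_old_general_regs_only;
};

// Exercise the pair: state changes inside the scope and is restored after.
void self_check ()
{
  current_pragma = "+sve2";
  isa_flags = FL_FP;
  general_regs_only = true;
  {
    simd_switcher simd (FL_SVE);
    assert (current_pragma == nullptr);
    assert (!general_regs_only);
    assert (isa_flags == (FL_FP | FL_SIMD | FL_SVE));
  }
  assert (current_pragma != nullptr);
  assert (general_regs_only);
  assert (isa_flags == FL_FP);
}
```

With this shape, a caller that only needs the pragma cleared could instantiate the base class directly, while the SIMD and SVE paths would keep deriving from the flag-setting class.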
[PATCH] avoid-store-forwarding: Handle REG_EH_REGION notes
From: kelefth

The pass rejects the transformation when the sequence contains
instructions that might throw an exception.  This restriction was added
because, in some cases, the load instruction carries a REG_EH_REGION note,
and moving it before the store instructions caused an error, as it was no
longer the last instruction in the basic block.

This patch handles those cases by moving a possible REG_EH_REGION note
from the load instruction of the store-load sequence to the last
instruction of the basic block.

gcc/ChangeLog:

	* avoid-store-forwarding.cc (process_store_forwarding):
	(store_forwarding_analyzer::avoid_store_forwarding): Move a possible
	REG_EH_REGION note from the load instruction to the last instruction
	of the basic block.
---
 gcc/avoid-store-forwarding.cc | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
index 34a7bba4043..05c91bb1a82 100644
--- a/gcc/avoid-store-forwarding.cc
+++ b/gcc/avoid-store-forwarding.cc
@@ -400,6 +400,17 @@ process_store_forwarding (vec &stores, rtx_insn *load_insn,
   if (load_elim)
     delete_insn (load_insn);
 
+  /* Find possible REG_EH_REGION note in the load instruction and move it
+     into the last instruction of the basic block. */
+  rtx reg_eh_region_note = find_reg_note (load_insn, REG_EH_REGION, NULL_RTX);
+  if (reg_eh_region_note != NULL_RTX)
+    {
+      remove_note (load_insn, reg_eh_region_note);
+      basic_block load_bb = BLOCK_FOR_INSN (load_insn);
+      add_reg_note (BB_END (load_bb), REG_EH_REGION,
+		    XEXP (reg_eh_region_note, 0));
+    }
+
   return true;
 }
 
@@ -425,7 +436,7 @@ store_forwarding_analyzer::avoid_store_forwarding (basic_block bb)
 
       rtx set = single_set (insn);
 
-      if (!set || insn_could_throw_p (insn))
+      if (!set)
 	{
 	  store_exprs.truncate (0);
 	  continue;
-- 
2.47.0
[committed] testsuite: Include stdint.h instead of stdint-gcc.h in some tests
Fixes PR testsuite/116986.

Tested on hppa-unknown-linux-gnu and hppa64-hp-hpux11.11.

Committed to trunk.

Dave
---

testsuite: Include stdint.h instead of stdint-gcc.h in some tests

When use_gcc_stdint=provide, the stdint-gcc.h header is not provided.

2025-02-18  John David Anglin

gcc/testsuite/ChangeLog:

	PR testsuite/116986
	* gcc.dg/crc-builtin-rev-target32.c: Include stdint.h instead of
	stdint-gcc.h.
	* gcc.dg/crc-builtin-rev-target64.c: Likewise.
	* gcc.dg/crc-builtin-target32.c: Likewise.
	* gcc.dg/crc-builtin-target64.c: Likewise.
	* gcc.dg/torture/pr115387-2.c: Likewise.

diff --git a/gcc/testsuite/gcc.dg/crc-builtin-rev-target32.c b/gcc/testsuite/gcc.dg/crc-builtin-rev-target32.c
index 4fc58e5f513..f2b63db7fd1 100644
--- a/gcc/testsuite/gcc.dg/crc-builtin-rev-target32.c
+++ b/gcc/testsuite/gcc.dg/crc-builtin-rev-target32.c
@@ -2,7 +2,7 @@
 /* { dg-require-effective-target int32plus } */
 /* { dg-additional-options "-fdump-rtl-expand-details" } */
 
-#include <stdint-gcc.h>
+#include <stdint.h>
 
 int8_t rev_crc8_data8 ()
 {
diff --git a/gcc/testsuite/gcc.dg/crc-builtin-rev-target64.c b/gcc/testsuite/gcc.dg/crc-builtin-rev-target64.c
index d63981e0101..97e80004d37 100644
--- a/gcc/testsuite/gcc.dg/crc-builtin-rev-target64.c
+++ b/gcc/testsuite/gcc.dg/crc-builtin-rev-target64.c
@@ -2,7 +2,7 @@
 /* { dg-require-effective-target int32plus } */
 /* { dg-additional-options "-fdump-rtl-expand-details" } */
 
-#include <stdint-gcc.h>
+#include <stdint.h>
 
 int8_t rev_crc8_data8 ()
 {
diff --git a/gcc/testsuite/gcc.dg/crc-builtin-target32.c b/gcc/testsuite/gcc.dg/crc-builtin-target32.c
index 13db531e93a..43db8c96e16 100644
--- a/gcc/testsuite/gcc.dg/crc-builtin-target32.c
+++ b/gcc/testsuite/gcc.dg/crc-builtin-target32.c
@@ -2,7 +2,7 @@
 /* { dg-require-effective-target int32plus } */
 /* { dg-additional-options "-fdump-rtl-expand-details" } */
 
-#include <stdint-gcc.h>
+#include <stdint.h>
 
 int8_t crc8_data8 ()
 {
diff --git a/gcc/testsuite/gcc.dg/crc-builtin-target64.c b/gcc/testsuite/gcc.dg/crc-builtin-target64.c
index 4b3d813995a..09aa39fcd86 100644
--- a/gcc/testsuite/gcc.dg/crc-builtin-target64.c
+++ b/gcc/testsuite/gcc.dg/crc-builtin-target64.c
@@ -2,7 +2,7 @@
 /* { dg-require-effective-target int32plus } */
 /* { dg-additional-options "-fdump-rtl-expand-details" } */
 
-#include <stdint-gcc.h>
+#include <stdint.h>
 
 int8_t crc8_data8 ()
 {
diff --git a/gcc/testsuite/gcc.dg/torture/pr115387-2.c b/gcc/testsuite/gcc.dg/torture/pr115387-2.c
index 9e93024b45c..190ad4b0977 100644
--- a/gcc/testsuite/gcc.dg/torture/pr115387-2.c
+++ b/gcc/testsuite/gcc.dg/torture/pr115387-2.c
@@ -2,7 +2,7 @@
 /* { dg-do compile } */
 
 #include
-#include <stdint-gcc.h>
+#include <stdint.h>
 
 char *
 test (char *string, size_t maxlen)
Re: [PATCH v2 06/16] Change function versions to be implicitly ordered.
Alfie Richards writes: > On 18/02/2025 12:11, Richard Sandiford wrote: >> Alfie Richards writes: >>> This changes function version structures to maintain the default version >>> as the first declaration in the linked data structures by giving priority >>> to the set containing the default when constructing the structure. >>> >>> This allows for removing logic for moving the default to the first >>> position which was duplicated across target specific code and enables >>> easier reasoning about function sets when checking for a default. >>> >>> gcc/ChangeLog: >>> >>> * cgraph.cc (cgraph_node::record_function_versions): Update to >>> implicitly keep default first. >>> * config/aarch64/aarch64.cc (aarch64_get_function_versions_dispatcher): >>> Remove reordering. >>> * config/i386/i386-features.cc (ix86_get_function_versions_dispatcher): >>> Remove reordering. >>> * config/riscv/riscv.cc (riscv_get_function_versions_dispatcher): >>> Remove reordering. >>> * config/rs6000/rs6000.cc (rs6000_get_function_versions_dispatcher): >>> Remove reordering. >>> --- >>> gcc/cgraph.cc| 27 - >>> gcc/config/aarch64/aarch64.cc| 37 +++- >>> gcc/config/i386/i386-features.cc | 33 - >>> gcc/config/riscv/riscv.cc| 41 +++- >>> gcc/config/rs6000/rs6000.cc | 35 +-- >>> 5 files changed, 49 insertions(+), 124 deletions(-) >>> >>> diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc >>> index d0b19ad850e..bf6b43d00db 100644 >>> --- a/gcc/cgraph.cc >>> +++ b/gcc/cgraph.cc >>> @@ -247,7 +247,9 @@ cgraph_node::record_function_versions (tree decl1, tree >>> decl2) >>> decl1_v = decl1_node->function_version (); >>> decl2_v = decl2_node->function_version (); >>> >>> - if (decl1_v != NULL && decl2_v != NULL) >>> + /* If the nodes are already linked, skip. 
*/ >>> + if ((decl1_v != NULL && (decl1_v->next || decl1_v->prev)) >>> + && (decl2_v != NULL && (decl2_v->next || decl2_v->prev))) >>> return; >>> >>> if (decl1_v == NULL) >>> @@ -256,18 +258,31 @@ cgraph_node::record_function_versions (tree decl1, >>> tree decl2) >>> if (decl2_v == NULL) >>> decl2_v = decl2_node->insert_new_function_version (); >>> >>> - /* Chain decl2_v and decl1_v. All semantically identical versions >>> - will be chained together. */ >>> + gcc_assert (decl1_v); >>> + gcc_assert (decl2_v); >>> >>> before = decl1_v; >>> after = decl2_v; >>> >>> + /* Go to first after node. */ >>> + while (after->prev != NULL) >>> +after = after->prev; >>> + >>> + while (before->prev != NULL) >>> +before = before->prev; >>> + >>> + /* Potentially swap the nodes to maintain the default always being in the >>> + first position. */ >>> + if (before->next >>> + ? !is_function_default_version (before->this_node->decl) >>> + : is_function_default_version (after->this_node->decl)) >>> +std::swap (before, after); >>> + >>> + /* Go to last node of before. */ >>> while (before->next != NULL) >>> before = before->next; >>> >>> - while (after->prev != NULL) >>> -after= after->prev; >>> - >>> + /* Chain decl2_v and decl1_v. */ >> I think this can be simplified to: >> >>before = decl1_v; >>after = decl2_v; >> >>/* Potentially swap the nodes to maintain the default always being in the >> first position. */ >>if (before->prev || before->next >>? is_function_default_version (after->this_node->decl) >>: !is_function_default_version (before->this_node->decl)) >> std::swap (before, after); >> >>while (before->next != NULL) >> before = before->next; >> >>while (after->prev != NULL) >> after = after->prev; >> >> That is, if one decl is linked (and so the other is not), we only want >> to put the other decl first if it is the default. > I see your point here, which I think relies on the assumption that > functions get > added to the structure one by one rather than in a fractal pattern. 
> This assumption is already used here subtly so that makes sense. > > I added this logic to at least try make this work in a slightly more > general case as > to tell if a structure contains the default we should check the first > element > of that structure, but it is unnecessary given that knowledge. > > I would prefer to change this to make that more explicit and change this > to be > "add_decl_to_version_into" taking a cgraph_function_version_info for the > existing structure and a decl for the version to add to make this > explicit. Would that change work for you? Yeah, sounds good to me. I agree that it would be better than having to maintain symmetry, and it should make the interface a bit simpler. Honza should have the final say though. Thanks, Richard
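For reference, the simplified ordering rule discussed above can be tried out on a toy doubly linked list. This is only a sketch under the assumption stated in the thread, namely that versions are added one at a time to an existing chain; `version_info`, `record_versions` and `head_of` are illustrative names, and the `is_default` field stands in for the call to is_function_default_version.

```cpp
#include <cassert>
#include <utility>

// Toy stand-in for cgraph_function_version_info.
struct version_info
{
  version_info (const char *n, bool d) : name (n), is_default (d) {}
  const char *name;
  bool is_default;
  version_info *prev = nullptr;
  version_info *next = nullptr;
};

// Chain v1 and v2 together, keeping a default version at the head.
void record_versions (version_info *v1, version_info *v2)
{
  // Both nodes already sit in a chain: nothing to do.
  if ((v1->prev || v1->next) && (v2->prev || v2->next))
    return;

  version_info *before = v1;
  version_info *after = v2;

  // If `before` is already linked, `after` is the newcomer and only goes
  // first when it is the default.  Otherwise `before` is the newcomer and
  // only stays first when it is the default.
  if ((before->prev || before->next) ? after->is_default
                                     : !before->is_default)
    std::swap (before, after);

  while (before->next != nullptr)
    before = before->next;
  while (after->prev != nullptr)
    after = after->prev;

  before->next = after;
  after->prev = before;
}

const version_info *head_of (const version_info *v)
{
  while (v->prev != nullptr)
    v = v->prev;
  return v;
}

void self_check ()
{
  // Default arrives last: it must still end up at the head.
  version_info sve ("sve", false), sve2 ("sve2", false), dflt ("default", true);
  record_versions (&sve, &sve2);
  record_versions (&sve, &dflt);
  assert (head_of (&sve) == &dflt);

  // Default arrives first: later versions are appended behind it.
  version_info a ("default", true), b ("dotprod", false), c ("sme", false);
  record_versions (&a, &b);
  record_versions (&c, &a);
  assert (head_of (&c) == &a);
}
```

An interface that takes the existing chain and a single new decl, as proposed at the end of the message, would make the one-at-a-time assumption explicit instead of relying on the symmetric two-argument form used here.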
Re: [PATCH] RISC-V: Fix some dynamic LMUL costing.
As just agreed in the patchwork meeting, let's defer that to stage 1
unless somebody really has a need for better dynamic LMUL right now.

-- 
Regards
 Robin
[PATCH] rx: avoid adding setpsw for rx_cmpstrn when len is const
We can avoid the setpsw instructions when len is a known constant.  When
len is zero, the insn result is zero.  When len is non-zero, the scmpu
instructions will set the flags correctly.

Signed-off-by: Keith Packard
---
 gcc/config/rx/rx.md | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rx/rx.md b/gcc/config/rx/rx.md
index edb2c96603f..8c7974d69a5 100644
--- a/gcc/config/rx/rx.md
+++ b/gcc/config/rx/rx.md
@@ -2545,6 +2545,16 @@ (define_expand "cmpstrnsi"
    (match_operand:SI 4 "immediate_operand")] ;; Known Align
   "rx_allow_string_insns"
   {
+    bool const_len = CONST_INT_P(operands[3]);
+    if (const_len)
+    {
+      if (INTVAL(operands[3]) == 0)
+      {
+        emit_move_insn (operands[0], operands[3]);
+        DONE;
+      }
+    }
+
     rtx str1 = gen_rtx_REG (SImode, 1);
     rtx str2 = gen_rtx_REG (SImode, 2);
     rtx len = gen_rtx_REG (SImode, 3);
@@ -2553,6 +2563,11 @@ (define_expand "cmpstrnsi"
     emit_move_insn (str2, force_operand (XEXP (operands[2], 0), NULL_RTX));
     emit_move_insn (len, operands[3]);
 
+    /* Set flags in case len is zero */
+    if (!const_len) {
+      emit_insn (gen_setpsw (GEN_INT('C')));
+      emit_insn (gen_setpsw (GEN_INT('Z')));
+    }
     emit_insn (gen_rx_cmpstrn (operands[0], operands[1], operands[2]));
     DONE;
   }
@@ -2590,9 +2605,7 @@ (define_insn "rx_cmpstrn"
    (clobber (reg:SI 3))
    (clobber (reg:CC CC_REG))]
   "rx_allow_string_insns"
-  "setpsw z ; Set flags in case len is zero
-  setpsw c
-  scmpu ; Perform the string comparison
+  "scmpu ; Perform the string comparison
   mov #-1, %0 ; Set up -1 result (which cannot be created
   ; by the SC insn)
   bnc?+ ; If Carry is not set skip over
-- 
2.47.2
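The three cases described in the commit message can be summarised in a small standalone sketch. It does not use any GCC interfaces: `cmpstrn_plan` and `plan_cmpstrn` are hypothetical names that only mirror the decisions the expander makes, and `std::strncmp` is used as the reference for the "zero length compares equal" behaviour that the constant-zero shortcut relies on.

```cpp
#include <cassert>
#include <cstring>

// What the expander would emit in each case (illustrative labels only).
enum class cmpstrn_plan
{
  constant_zero,      // len is the constant 0: result is 0, no scmpu needed
  scmpu_only,         // len is a non-zero constant: scmpu sets the flags
  setpsw_then_scmpu   // len unknown: preset C and Z in case len is zero
};

// Hypothetical helper mirroring the decisions made in the expander.
cmpstrn_plan plan_cmpstrn (bool len_is_constant, unsigned long len)
{
  if (len_is_constant && len == 0)
    return cmpstrn_plan::constant_zero;
  if (len_is_constant)
    return cmpstrn_plan::scmpu_only;
  return cmpstrn_plan::setpsw_then_scmpu;
}

void self_check ()
{
  // Reference semantics: comparing zero characters always yields equality.
  assert (std::strncmp ("abc", "xyz", 0) == 0);

  assert (plan_cmpstrn (true, 0) == cmpstrn_plan::constant_zero);
  assert (plan_cmpstrn (true, 16) == cmpstrn_plan::scmpu_only);
  assert (plan_cmpstrn (false, 0) == cmpstrn_plan::setpsw_then_scmpu);
}
```

Only the last case still pays for the two setpsw instructions, since that is the only one where the length might turn out to be zero at run time.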
Re: [PATCH] aarch64: Use generic_armv8_a_prefetch_tune in generic_armv8_a.h
> On 18 Feb 2025, at 2:27 PM, Kyrylo Tkachov wrote: > > > >> On 18 Feb 2025, at 09:48, Kyrylo Tkachov wrote: >> >> >> >>> On 18 Feb 2025, at 09:41, Richard Sandiford >>> wrote: >>> >>> Kyrylo Tkachov writes: Hi Soumya > On 18 Feb 2025, at 09:12, Soumya AR wrote: > > generic_armv8_a.h defines generic_armv8_a_prefetch_tune but still uses > generic_prefetch_tune in generic_armv8_a_tunings. > > This patch updates the pointer to generic_armv8_a_prefetch_tune. > > This patch was bootstrapped and regtested on aarch64-linux-gnu, no > regression. > > Ok for GCC 15 now? Yes, this looks like a simple oversight. Ok to push to master. >>> >>> I suppose the alternative would be to remove generic_armv8_a_prefetch_tune, >>> since it's (deliberately) identical to generic_prefetch_tune. >> >> Looks like we have one prefetch_tune structure for each of the generic >> tunings (generic, generic_armv8_a, generic_armv9_a). >> For the sake of symmetry it feels a bit better to have them independently >> tunable. >> But as the effects are the same, it may be better to remove it in the >> interest of less code. >> > > I see Soumya has already pushed her patch. I’m okay with either approach tbh, > but if Richard prefers we can remove generic_armv8_a_prefetch_tune in a > separate commit. Yeah, missed Richard’s mail. Let me know which is preferable, thanks. Best, Soumya > Thanks, > Kyrill > > >> Thanks, >> Kyrill >> >>> Thanks, Kyrill > > Signed-off-by: Soumya AR > > gcc/ChangeLog: > > * config/aarch64/tuning_models/generic_armv8_a.h: Updated prefetch > struct pointer. 
> > --- > gcc/config/aarch64/tuning_models/generic_armv8_a.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/config/aarch64/tuning_models/generic_armv8_a.h > b/gcc/config/aarch64/tuning_models/generic_armv8_a.h > index 35de3f03296..01080cade46 100644 > --- a/gcc/config/aarch64/tuning_models/generic_armv8_a.h > +++ b/gcc/config/aarch64/tuning_models/generic_armv8_a.h > @@ -184,7 +184,7 @@ static const struct tune_params > generic_armv8_a_tunings = > (AARCH64_EXTRA_TUNE_BASE > | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS > | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */ > - &generic_prefetch_tune, > + &generic_armv8_a_prefetch_tune, > AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ > AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ > }; > -- > 2.34.1
[committed] pair-fusion: Tweak wording in dump message [PR118320]
As discussed in
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675978.html
this tweaks the dump message added with the fix for PR118320 since it
doesn't just apply to load pairs.

Tested on aarch64-linux-gnu, pushed to trunk.

Alex

gcc/ChangeLog:

    PR rtl-optimization/118320
    * pair-fusion.cc (pair_fusion_bb_info::fuse_pair): Tweak wording in
    dump message when punting on invalid use arrays.

diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc
index 5708d0f3b67..72e64246534 100644
--- a/gcc/pair-fusion.cc
+++ b/gcc/pair-fusion.cc
@@ -1742,7 +1742,7 @@ pair_fusion_bb_info::fuse_pair (bool load_p,
     {
       if (dump_file)
         fprintf (dump_file,
-                 " load pair: i%d and i%d use different definiitions of"
+                 " rejecting pair: i%d and i%d use different definiitions of"
                  " the same register\n",
                  insns[0]->uid (), insns[1]->uid ());
       return false;
Re: [PATCH] pair-fusion: A couple of fixes for sp updates [PR118429]
Alex Coplan writes:
> On 17/02/2025 16:15, Richard Sandiford wrote:
>> Alex Coplan writes:
>> >> @@ -588,6 +590,10 @@ latest_hazard_before (insn_info *insn, rtx *ignore,
>> >>        && find_reg_note (insn->rtl (), REG_EH_REGION, NULL_RTX))
>> >>      return insn->prev_nondebug_insn ();
>> >>
>> >> +  if (!is_load_store
>> >> +      && accesses_include_memory (insn->defs ()))
>> >> +    return insn->prev_nondebug_insn ();
>> >
>> > This seems like it might be a little too restrictive. I agree that it's
>> > a nice and simple way of solving the problem, but wouldn't it be enough
>> > to prevent moving such accesses (stack deallocations) above the latest
>> > preceding def or use of mem? Certainly we don't want to start
>> > attempting alias analysis here, but is the above suggestion not a happy
>> > middle ground (between a simple solution and not overly restricting
>> > optimisation)?
>>
>> Would it help in practice though? Although it is possible to combine
>> a deallocation with preceding stores, that only happens for dead code,
>> in which case the better optimisation is to delete the stores.
>> If we're combining with loads, the loads would normally be restoring
>> registers for the caller, in which case the loads could be moved
>> forward to the deallocation (since nothing would use or clobber
>> the loaded values between the two points).
>
> I see. I must admit that I don't immediately see why this can only
> occur with dead stores, [...]

I was thinking of the post-increment case, but yeah, I suppose
technically there could be pre-increment cases. It seems very
unlikely in practice, given how we manage the frame, but I agree that
the case for not trying harder is weaker than I'd initially assumed.

Thanks,
Richard
Re: [PATCH v2 06/16] Change function versions to be implicitly ordered.
On 18/02/2025 12:11, Richard Sandiford wrote: Alfie Richards writes: This changes function version structures to maintain the default version as the first declaration in the linked data structures by giving priority to the set containing the default when constructing the structure. This allows for removing logic for moving the default to the first position which was duplicated across target specific code and enables easier reasoning about function sets when checking for a default. gcc/ChangeLog: * cgraph.cc (cgraph_node::record_function_versions): Update to implicitly keep default first. * config/aarch64/aarch64.cc (aarch64_get_function_versions_dispatcher): Remove reordering. * config/i386/i386-features.cc (ix86_get_function_versions_dispatcher): Remove reordering. * config/riscv/riscv.cc (riscv_get_function_versions_dispatcher): Remove reordering. * config/rs6000/rs6000.cc (rs6000_get_function_versions_dispatcher): Remove reordering. --- gcc/cgraph.cc| 27 - gcc/config/aarch64/aarch64.cc| 37 +++- gcc/config/i386/i386-features.cc | 33 - gcc/config/riscv/riscv.cc| 41 +++- gcc/config/rs6000/rs6000.cc | 35 +-- 5 files changed, 49 insertions(+), 124 deletions(-) diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc index d0b19ad850e..bf6b43d00db 100644 --- a/gcc/cgraph.cc +++ b/gcc/cgraph.cc @@ -247,7 +247,9 @@ cgraph_node::record_function_versions (tree decl1, tree decl2) decl1_v = decl1_node->function_version (); decl2_v = decl2_node->function_version (); - if (decl1_v != NULL && decl2_v != NULL) + /* If the nodes are already linked, skip. */ + if ((decl1_v != NULL && (decl1_v->next || decl1_v->prev)) + && (decl2_v != NULL && (decl2_v->next || decl2_v->prev))) return; if (decl1_v == NULL) @@ -256,18 +258,31 @@ cgraph_node::record_function_versions (tree decl1, tree decl2) if (decl2_v == NULL) decl2_v = decl2_node->insert_new_function_version (); - /* Chain decl2_v and decl1_v. All semantically identical versions - will be chained together. 
*/ + gcc_assert (decl1_v); + gcc_assert (decl2_v); before = decl1_v; after = decl2_v; + /* Go to first after node. */ + while (after->prev != NULL) +after = after->prev; + + while (before->prev != NULL) +before = before->prev; + + /* Potentially swap the nodes to maintain the default always being in the + first position. */ + if (before->next + ? !is_function_default_version (before->this_node->decl) + : is_function_default_version (after->this_node->decl)) +std::swap (before, after); + + /* Go to last node of before. */ while (before->next != NULL) before = before->next; - while (after->prev != NULL) -after= after->prev; - + /* Chain decl2_v and decl1_v. */ I think this can be simplified to: before = decl1_v; after = decl2_v; /* Potentially swap the nodes to maintain the default always being in the first position. */ if (before->prev || before->next ? is_function_default_version (after->this_node->decl) : !is_function_default_version (before->this_node->decl)) std::swap (before, after); while (before->next != NULL) before = before->next; while (after->prev != NULL) after = after->prev; That is, if one decl is linked (and so the other is not), we only want to put the other decl first if it is the default. I see your point here, which I think relies on the assumption that functions get added to the structure one by one rather than in a fractal pattern. This assumption is already used here subtly so that makes sense. I added this logic to at least try make this work in a slightly more general case as to tell if a structure contains the default we should check the first element of that structure, but it is unnecessary given that knowledge. I would prefer to change this to make that more explicit and change this to be "add_decl_to_version_into" taking a cgraph_function_version_info for the existing structure and a decl for the version to add to make this explicit. Would that change work for you? [...] 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 9bf7713139f..e5aa99a4965 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -13726,7 +13726,6 @@ riscv_get_function_versions_dispatcher (void *decl) struct cgraph_node *node = NULL; struct cgraph_node *default_node = NULL; struct cgraph_function_version_info *node_v = NULL; - struct cgraph_function_version_info *first_v = NULL; tree dispatch_decl = NULL; @@ -13743,41 +13742,19 @@ riscv_get_function_versions_dispatcher (void *decl) if (node_v->dispatcher_resolver != NULL) return node_v->dispatcher_resolver;
Re: [PATCH] builtins: Ensure sin and cos properly set errno when INFINITY is passed [PR80042]
On Tue, Feb 18, 2025 at 1:54 PM Peter0x44 wrote: > > 18 Feb 2025 8:51:16 am Richard Biener : > > > On Tue, Feb 18, 2025 at 1:21 AM Sam James wrote: > >> > >> Peter Damianov writes: > >> > >>> POSIX says that sin and cos should set errno to EDOM when infinity is > >>> passed to > >>> them. Make sure this is accounted for in builtins.def, and add tests. > >>> > >>> gcc/ > >>> PR middle-end/80042 > >>> * builtins.def: (sin|cos)(f|l) can set errno. > >>> gcc/testsuite/ > >>> * gcc.dg/pr80042.c: New testcase. > >>> --- > >>> gcc/builtins.def | 20 +- > >>> gcc/testsuite/gcc.dg/pr80042.c | 71 > >>> ++ > >>> 2 files changed, 82 insertions(+), 9 deletions(-) > >>> create mode 100644 gcc/testsuite/gcc.dg/pr80042.c > >>> > >>> [...] > >>> diff --git a/gcc/testsuite/gcc.dg/pr80042.c > >>> b/gcc/testsuite/gcc.dg/pr80042.c > >>> new file mode 100644 > >>> index 000..cc578ae67e2 > >>> --- /dev/null > >>> +++ b/gcc/testsuite/gcc.dg/pr80042.c > >>> @@ -0,0 +1,71 @@ > >>> +/* dg-do run */ > >>> +/* dg-options "-O2 -lm" */ > >> > >> These two lines are missing {}. Please double check the logs from your > >> testsuite run to make sure newly added/changed tests are executed (and > >> in the way you expect). > > > > This test will also FAIL on *BSD IIRC as that doesn't set errno for any > > math > > functions. > > So what do you suggest I do about it? Drop the test, or only enable it > for certain known good targets? > I don't use BSD so cannot test it. Good question. It's also that old glibc did not set errno here. > > > > I'll note GCC models sincos as cexpi which does not set errno, and will > > eventually expand that to sincos or cexp. It does that without any > > restriction on -fno-math-errno. > > Is this a problem? Would I need to disable expansion to cexp with > -fmath-errno make this work? I think that the code might assume sin()/cos() is always CONST/PURE and that for "POSIX-y correctness" we'd have to guard the transform with -fno-math-errno. 
> > I'll also note the C standard does not document any domain error on +- > > Inf arguments. > > Instead it documents a range error for sin(x) and nonzero x too close > > to zero. > > https://pubs.opengroup.org/onlinepubs/9699919799/functions/sin.html > POSIX does specify it should be a domain error, but C itself doesn't seem > to say anything regarding it other than basically "implementations are > allowed to invent errors for this case". So what's the point of your patch? That GCC does not assume sin/cos will not clobber errno? Maybe the testcase can be rewritten to consider that? Like check that we did not fold the != EDOM checks at compile-time instead of hard-requiring the library to set that error? Richard. > > > > Richard. > > > >> > >>> [...]
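A tolerant runtime check along the lines discussed above might look like the sketch below. It is an illustration only, written in C++ rather than as a `gcc.dg` testcase, and `checked_sin` and `errno_outcome_ok` are hypothetical helper names. Richard's actual suggestion concerns what the compiler folds at compile time; this sketch only shows the looser runtime shape, where the argument is `volatile` so the call is not constant-folded and the check accepts either an untouched `errno` or `EDOM`, since the thread notes that *BSD and older glibc do not set it.

```cpp
#include <cerrno>
#include <cmath>
#include <limits>

// Result of one sin() call together with the errno value it left.
struct SinResult {
  double value;
  int err;
};

SinResult checked_sin(double x) {
  volatile double arg = x;  // keep the call from being folded away
  errno = 0;
  double v = std::sin(arg);
  return {v, errno};
}

// Accept both behaviours mentioned in the thread: a library that
// reports EDOM and one that leaves errno alone.
bool errno_outcome_ok(const SinResult &r) {
  return r.err == 0 || r.err == EDOM;
}
```

A test built this way would not FAIL merely because the C library declines to set `errno`, which is the portability concern raised for *BSD.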
RE: [PATCH v1] Vect: Fix ICE when get DImode from get_related_vectype_for_scalar_type [PR116351]
Thanks Richard. > so the obvious fix would be to add > > if (!VECTOR_MODE_P (loop_vinfo->vector_mode)) >return false; I also think of it, but it is too "easy" and then dropped. > Ah, it needs -march=rv64imd_xsfvcp. It can also be reproduced by " -march=rv64imd_zve32x -mrvv-vector-bits=zvl", sorry forgot to mention this. > The error is probably that vect_verify_loop_lens does not do anything > to ensure the checks are done on a relevant mode. With the suggested > added check above this then becomes a missed optimization rather > than an ICE. But it might fall apart if there's not one load/store len mode > to consider? I see, it may fall apart I am afraid, consider RVVM1DImode when rv64gc_zve32x, the riscv_vector_mode_supported_any_target_p will always return true and we may have RVVM1DImode here but zve32x cannot support DI as element size. I will try to reproduce this after this ICE fix. Pan -Original Message- From: Richard Biener Sent: Tuesday, February 18, 2025 5:36 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] Vect: Fix ICE when get DImode from get_related_vectype_for_scalar_type [PR116351] On Tue, Feb 18, 2025 at 10:12 AM Richard Biener wrote: > > On Tue, Feb 18, 2025 at 9:40 AM Li, Pan2 wrote: > > > > Hi Richard, > > > > After some more investigation, the sample code never hit one vectorizable_* > > routines which may check the loop_vinfo->vector_mode, > > and then the loop_vinfo->vector_mode == DImode will hit the > > vect_verify_loop_lens and trigger the assert VECTOR_MODE_P, detail > > flow as below. > > > > vect_analyze_loop_2 > > |- vect_pattern_recog // Hit over-widening pattern and set > > loop_vinfo->vector_mode to DImode > > |- ... 
> > |- vect_analyze_loop_operations > >|- (gdb) p stmt_info->def_type > >|- $1 = vect_reduction_def > >|- (gdb) p stmt_info->slp_type > >|- $2 = pure_slp > >|- vectorizable_lc_phi // Not Hit > >|- vectorizable_induction // Not Hit > >|- vectorizable_reduction // Not Hit > >|- vectorizable_recurr // Not Hit > >|- vectorizable_live_operation // Not Hit > >|- vect_analyze_stmt > > |- (gdb) p stmt_info->relevant > > |- $3 = vect_unused_in_scope > > |- (gdb) p stmt_info->live > > |- $4 = false > > |- (gdb) p pattern_stmt_info > > |- $5 = (stmt_vec_info) 0x0 > > |- return opt_result::success (); > > OR > > |- PURE_SLP_STMT (stmt_info) && !node then dump "handled only by SLP > > analysis\n" > >|- Early return opt_result::success (); > > |- vectorizable_load/store/call_convert/... // Not Hit > >|- LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P && !LOOP_VINFO_MASKS > > (loop_vinfo).is_empty () > > |- vect_verify_loop_lens (loop_vinfo) > >|- assert (VECTOR_MODE_P (loop_vinfo->vector_mode); // Hit assert > > result in ICE > > > > I am a little hesitant by two options here. > > > > 1. shall we add some condition and dump log here to make the > > vect_analyze_loop_2 failure when loop_vinfo->vector_mode is not supported > > vector mode by target. > > 2. it should not be LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P here? Then we need > > to find out where set the partial vector to true. > > > > Is there any suggestion here? > > static bool > vect_verify_loop_lens (loop_vec_info loop_vinfo) > { > if (LOOP_VINFO_LENS (loop_vinfo).is_empty ()) > return false; > > machine_mode len_load_mode, len_store_mode; > if (!get_len_load_store_mode (loop_vinfo->vector_mode, true) > .exists (&len_load_mode)) > return false; > > so the obvious fix would be to add > > if (!VECTOR_MODE_P (loop_vinfo->vector_mode)) > return false; > > here? But then I wonder how we got to a DImode vector_mode and record > a loop len > in the first place. 
I could imagine we first end up with DImode but > other stmts using > a vector mode and we record a len for those. But then the above > get_len_load_store_mode > on ->vector_mode seems to assume that all modes we need a len for are > "compatible" with ->vector_mode so I assume recording a LEN would check that. > > I can't reproduce the ICE with a cross on trunk btw. Ah, it needs -march=rv64imd_xsfvcp. So we indeed call vect_record_loop_len with (gdb) p debug_tree (vectype) unit-size align:16 warn_if_not_align:0 symtab:0 alias-set 2 canonical-type 0x77017690 precision:16 min max pointer_to_this > RVVM2HI (gdb) p loop_vinfo->vector_mode $2 = E_DImode from vectorizable_operation and ->vector_mode is set via vect_recog_over_widening_pattern which commits to a DImode vector type ->vector_mode prematurely. The error is probably that vect_verify_loop_lens does not do anything to ensure the checks are done on a relevant mode. With the suggested added check above this then becomes a missed optimization rather than an ICE. But it might fall
Re: [PATCH v2 06/16] Change function versions to be implicitly ordered.
Alfie Richards writes: > This changes function version structures to maintain the default version > as the first declaration in the linked data structures by giving priority > to the set containing the default when constructing the structure. > > This allows for removing logic for moving the default to the first > position which was duplicated across target specific code and enables > easier reasoning about function sets when checking for a default. > > gcc/ChangeLog: > > * cgraph.cc (cgraph_node::record_function_versions): Update to > implicitly keep default first. > * config/aarch64/aarch64.cc (aarch64_get_function_versions_dispatcher): > Remove reordering. > * config/i386/i386-features.cc (ix86_get_function_versions_dispatcher): > Remove reordering. > * config/riscv/riscv.cc (riscv_get_function_versions_dispatcher): > Remove reordering. > * config/rs6000/rs6000.cc (rs6000_get_function_versions_dispatcher): > Remove reordering. > --- > gcc/cgraph.cc| 27 - > gcc/config/aarch64/aarch64.cc| 37 +++- > gcc/config/i386/i386-features.cc | 33 - > gcc/config/riscv/riscv.cc| 41 +++- > gcc/config/rs6000/rs6000.cc | 35 +-- > 5 files changed, 49 insertions(+), 124 deletions(-) > > diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc > index d0b19ad850e..bf6b43d00db 100644 > --- a/gcc/cgraph.cc > +++ b/gcc/cgraph.cc > @@ -247,7 +247,9 @@ cgraph_node::record_function_versions (tree decl1, tree > decl2) >decl1_v = decl1_node->function_version (); >decl2_v = decl2_node->function_version (); > > - if (decl1_v != NULL && decl2_v != NULL) > + /* If the nodes are already linked, skip. */ > + if ((decl1_v != NULL && (decl1_v->next || decl1_v->prev)) > + && (decl2_v != NULL && (decl2_v->next || decl2_v->prev))) > return; > >if (decl1_v == NULL) > @@ -256,18 +258,31 @@ cgraph_node::record_function_versions (tree decl1, tree > decl2) >if (decl2_v == NULL) > decl2_v = decl2_node->insert_new_function_version (); > > - /* Chain decl2_v and decl1_v. 
All semantically identical versions > - will be chained together. */ > + gcc_assert (decl1_v); > + gcc_assert (decl2_v); > >before = decl1_v; >after = decl2_v; > > + /* Go to first after node. */ > + while (after->prev != NULL) > +after = after->prev; > + > + while (before->prev != NULL) > +before = before->prev; > + > + /* Potentially swap the nodes to maintain the default always being in the > + first position. */ > + if (before->next > + ? !is_function_default_version (before->this_node->decl) > + : is_function_default_version (after->this_node->decl)) > +std::swap (before, after); > + > + /* Go to last node of before. */ >while (before->next != NULL) > before = before->next; > > - while (after->prev != NULL) > -after= after->prev; > - > + /* Chain decl2_v and decl1_v. */ I think this can be simplified to: before = decl1_v; after = decl2_v; /* Potentially swap the nodes to maintain the default always being in the first position. */ if (before->prev || before->next ? is_function_default_version (after->this_node->decl) : !is_function_default_version (before->this_node->decl)) std::swap (before, after); while (before->next != NULL) before = before->next; while (after->prev != NULL) after = after->prev; That is, if one decl is linked (and so the other is not), we only want to put the other decl first if it is the default. > [...] 
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc > index 9bf7713139f..e5aa99a4965 100644 > --- a/gcc/config/riscv/riscv.cc > +++ b/gcc/config/riscv/riscv.cc > @@ -13726,7 +13726,6 @@ riscv_get_function_versions_dispatcher (void *decl) >struct cgraph_node *node = NULL; >struct cgraph_node *default_node = NULL; >struct cgraph_function_version_info *node_v = NULL; > - struct cgraph_function_version_info *first_v = NULL; > >tree dispatch_decl = NULL; > > @@ -13743,41 +13742,19 @@ riscv_get_function_versions_dispatcher (void *decl) >if (node_v->dispatcher_resolver != NULL) > return node_v->dispatcher_resolver; > > - /* Find the default version and make it the first node. */ > - first_v = node_v; > - /* Go to the beginning of the chain. */ > - while (first_v->prev != NULL) > -first_v = first_v->prev; > - default_version_info = first_v; > - > - while (default_version_info != NULL) > -{ > - struct riscv_feature_bits res; > - int priority; /* Unused. */ > - parse_features_for_version (default_version_info->this_node->decl, > - res, priority); > - if (res.length == 0) > - break; > - default_version_info = default_version_info->next; > -} > + /* The default node is alw
[PATCH] c++: Fix checking assert upon invalid class definition [PR116740]
A checking assert triggers upon the following invalid code since
GCC 11:

=== cut here ===
class { a (struct b; } struct b
=== cut here ===

The problem is that during error recovery, we call
set_identifier_type_value_with_scope for B in the global namespace, and
the checking assert added via r11-7228-g8f93e1b892850b fails.

This patch relaxes that assert to not fail if we've seen a parser error
(it is a generalization of another fix done to that checking assert via
r11-7266-g24bf79f1798ad1).

Successfully tested on x86_64-pc-linux-gnu.

    PR c++/116740

gcc/cp/ChangeLog:

    * name-lookup.cc (set_identifier_type_value_with_scope): Don't
    fail assert with ill-formed input.

gcc/testsuite/ChangeLog:

    * g++.dg/parse/crash80.C: New test.
---
 gcc/cp/name-lookup.cc                | 6 ++
 gcc/testsuite/g++.dg/parse/crash80.C | 7 +++
 2 files changed, 9 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/parse/crash80.C

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index d1abb205bc7..742e5d289dc 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -5101,10 +5101,8 @@ set_identifier_type_value_with_scope (tree id, tree decl, cp_binding_level *b)
   if (b->kind == sk_namespace)
     /* At namespace scope we should not see an identifier type value. */
     gcc_checking_assert (!REAL_IDENTIFIER_TYPE_VALUE (id)
-                         /* We could be pushing a friend underneath a template
-                            parm (ill-formed). */
-                         || (TEMPLATE_PARM_P
-                             (TYPE_NAME (REAL_IDENTIFIER_TYPE_VALUE (id)))));
+                         /* But we might end up here with ill-formed input. */
+                         || seen_error ());
   else
     {
       /* Push the current type value, so we can restore it later */
diff --git a/gcc/testsuite/g++.dg/parse/crash80.C b/gcc/testsuite/g++.dg/parse/crash80.C
new file mode 100644
index 000..cd9216adf5c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/crash80.C
@@ -0,0 +1,7 @@
+// PR c++/116740
+// { dg-do "compile" }
+
+class K {
+  int a(struct b; // { dg-error "expected '\\)'" }
+};
+struct b {};
--
2.44.0
Re: [PATCH v2 08/16] Add get_clone_versions function.
Alfie Richards writes: > This is a reimplementation of get_target_clone_attr_len, > get_attr_str, and separate_attrs using string_slice and auto_vec to make > memory management and use simpler. > > gcc/c-family/ChangeLog: > > * c-attribs.cc (handle_target_clones_attribute): Change to use > get_clone_versions. > > gcc/ChangeLog: > > * tree.cc (get_clone_versions): New function. > (get_clone_attr_versions): New function. > * tree.h (get_clone_versions): New function. > (get_clone_attr_versions): New function. OK for GCC 16, thanks. Richard > --- > gcc/c-family/c-attribs.cc | 2 +- > gcc/tree.cc | 40 +++ > gcc/tree.h| 3 +++ > 3 files changed, 44 insertions(+), 1 deletion(-) > > diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc > index f3181e7b57c..642d724f6c6 100644 > --- a/gcc/c-family/c-attribs.cc > +++ b/gcc/c-family/c-attribs.cc > @@ -6129,7 +6129,7 @@ handle_target_clones_attribute (tree *node, tree name, > tree ARG_UNUSED (args), > } > } > > - if (get_target_clone_attr_len (args) == -1) > + if (get_clone_attr_versions (args).length () == 1) > { > warning (OPT_Wattributes, > "single % attribute is ignored"); > diff --git a/gcc/tree.cc b/gcc/tree.cc > index 0743ed71c78..83dc9f32f96 100644 > --- a/gcc/tree.cc > +++ b/gcc/tree.cc > @@ -15356,6 +15356,46 @@ get_target_clone_attr_len (tree arglist) >return str_len_sum; > } > > +/* Returns an auto_vec of string_slices containing the version strings from > + ARGLIST. DEFAULT_COUNT is incremented for each default version found. 
*/ > + > +auto_vec > +get_clone_attr_versions (const tree arglist, int *default_count) > +{ > + gcc_assert (TREE_CODE (arglist) == TREE_LIST); > + auto_vec versions; > + > + static const char separator_str[] = {TARGET_CLONES_ATTR_SEPARATOR, 0}; > + string_slice separators = string_slice (separator_str); > + > + for (tree arg = arglist; arg; arg = TREE_CHAIN (arg)) > +{ > + string_slice str = string_slice (TREE_STRING_POINTER (TREE_VALUE > (arg))); > + while (str.is_valid ()) > + { > + string_slice attr = string_slice::tokenize (&str, separators); > + attr = attr.strip (); > + if (attr == "default" && default_count) > + (*default_count)++; > + versions.safe_push (attr); > + } > +} > + return versions; > +} > + > +/* Returns an auto_vec of string_slices containing the version strings from > + the target_clone attribute from DECL. DEFAULT_COUNT is incremented for > each > + default version found. */ > +auto_vec > +get_clone_versions (const tree decl, int *default_count) > +{ > + tree attr = lookup_attribute ("target_clones", DECL_ATTRIBUTES (decl)); > + if (!attr) > +return auto_vec (); > + tree arglist = TREE_VALUE (attr); > + return get_clone_attr_versions (arglist, default_count); > +} > + > void > tree_cc_finalize (void) > { > diff --git a/gcc/tree.h b/gcc/tree.h > index 21f3cd5525c..70541070c40 100644 > --- a/gcc/tree.h > +++ b/gcc/tree.h > @@ -22,6 +22,7 @@ along with GCC; see the file COPYING3. If not see > > #include "tree-core.h" > #include "options.h" > +#include "vec.h" > > /* Convert a target-independent built-in function code to a combined_fn. */ > > @@ -7035,5 +7036,7 @@ extern unsigned fndecl_dealloc_argno (tree); > extern tree get_attr_nonstring_decl (tree, tree * = NULL); > > extern int get_target_clone_attr_len (tree); > +auto_vec get_clone_versions (const tree, int * = NULL); > +auto_vec get_clone_attr_versions (const tree, int * = NULL); > > #endif /* GCC_TREE_H */
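The tokenizing loop in `get_clone_attr_versions` can be approximated outside GCC with `std::string_view`. The sketch below is not the patch's code: `split_versions` and `strip` are hypothetical names, `std::string_view` stands in for `string_slice`, and the separator is assumed to be a comma where the patch uses `TARGET_CLONES_ATTR_SEPARATOR`.

```cpp
#include <string_view>
#include <vector>

// Trim leading and trailing spaces and tabs.
std::string_view strip(std::string_view s) {
  const char *ws = " \t";
  auto first = s.find_first_not_of(ws);
  if (first == std::string_view::npos)
    return {};
  auto last = s.find_last_not_of(ws);
  return s.substr(first, last - first + 1);
}

// Split ARG on SEP, strip each token, and bump *DEFAULT_COUNT for
// every "default" token.  The returned views point into ARG's
// storage, so ARG's underlying buffer must outlive the result.
std::vector<std::string_view> split_versions(std::string_view arg,
                                             int *default_count = nullptr,
                                             char sep = ',') {
  std::vector<std::string_view> versions;
  while (true) {
    auto pos = arg.find(sep);
    std::string_view tok = strip(arg.substr(0, pos));
    if (tok == "default" && default_count)
      ++*default_count;
    versions.push_back(tok);
    if (pos == std::string_view::npos)
      break;
    arg.remove_prefix(pos + 1);
  }
  return versions;
}
```

In this shape, the `length () == 1` check in `handle_target_clones_attribute` corresponds to a single token coming back from the split.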
Re: [PATCH] arm: Remove inner 'fix:HF/SF/DF' from fixed-point patterns (PR 117712)
On Tue, 18 Feb 2025 at 13:49, Richard Earnshaw (lists) wrote:
>
> On 18/02/2025 08:37, Christophe Lyon wrote:
> > As discussed in the PR, removing the inner 'fix:HF/SF/DF' fixes the
> > problem, like other targets do.
> >
>
> The double-'fix' idiom was introduced in
> https://gcc.gnu.org/pipermail/gcc-patches/2003-March/098380.html to address
> target/5985. Certainly at the time it seems that FIX had two meanings
> depending on the mode. If the target was a floating point mode it did a
> truncation operation with rounding. If it was an integer mode it did
> truncation with unspecified rounding. But the manual doesn't seem to mention
> FIX: (at least not now), so I'm wondering if something has been
> lost somewhere along the line.
>
> Anyway, I'm not sure this is right yet.
>

Well, this adopts the same approach as the fix for PR 117525 (same
problem, but on hppa). In that PR there's also a mention of a similar
problem on Sparc, and Konstantinos says he is working on a middle-end
fix (see comment #9 in PR117712).

Let's wait for that, then?

Thanks,

Christophe

> R.
>
> > gcc/ChangeLog:
> >
> > PR rtl-optimization/117712
> > * config/arm/arm.md (fix_trunchfsi2): Remove inner fix:HF.
> > (fix_trunchfdi2): Likewise.
> > (fix_truncsfsi2): Remove inner fix:SF.
> > (fix_truncdfsi2): Remove inner fix:DF.
> > * config/arm/vfp.md (truncsisf2_vfp): Remove inner fix:SF.
> > (truncsidf2_vfp): Remove inner fix:DF.
> > (fixuns_truncsfsi2): Remove inner fix:SF.
> > (fixuns_truncdfsi2): Remove inner fix:DF.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR rtl-optimization/117712
> > * gcc.target/arm/pr117712-df.c: New test.
> > * gcc.target/arm/pr117712-hf-di.c: New test.
> > * gcc.target/arm/pr117712-hf.c: New test.
> > * gcc.target/arm/pr117712-sf.c: New test.
> > --- > > gcc/config/arm/arm.md | 8 > > gcc/config/arm/vfp.md | 8 > > gcc/testsuite/gcc.target/arm/pr117712-df.c| 10 ++ > > gcc/testsuite/gcc.target/arm/pr117712-hf-di.c | 10 ++ > > gcc/testsuite/gcc.target/arm/pr117712-hf.c| 10 ++ > > gcc/testsuite/gcc.target/arm/pr117712-sf.c| 10 ++ > > 6 files changed, 48 insertions(+), 8 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/arm/pr117712-df.c > > create mode 100644 gcc/testsuite/gcc.target/arm/pr117712-hf-di.c > > create mode 100644 gcc/testsuite/gcc.target/arm/pr117712-hf.c > > create mode 100644 gcc/testsuite/gcc.target/arm/pr117712-sf.c > > > > diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md > > index 442d86b9329..ed0d0da2e63 100644 > > --- a/gcc/config/arm/arm.md > > +++ b/gcc/config/arm/arm.md > > @@ -5477,7 +5477,7 @@ (define_expand "floatsidf2" > > > > (define_expand "fix_trunchfsi2" > >[(set (match_operand:SI 0 "general_operand") > > - (fix:SI (fix:HF (match_operand:HF 1 "general_operand"] > > + (fix:SI (match_operand:HF 1 "general_operand")))] > >"TARGET_EITHER" > >" > >{ > > @@ -5489,7 +5489,7 @@ (define_expand "fix_trunchfsi2" > > > > (define_expand "fix_trunchfdi2" > >[(set (match_operand:DI 0 "general_operand") > > - (fix:DI (fix:HF (match_operand:HF 1 "general_operand"] > > + (fix:DI (match_operand:HF 1 "general_operand")))] > >"TARGET_EITHER" > >" > >{ > > @@ -5501,14 +5501,14 @@ (define_expand "fix_trunchfdi2" > > > > (define_expand "fix_truncsfsi2" > >[(set (match_operand:SI 0 "s_register_operand") > > - (fix:SI (fix:SF (match_operand:SF 1 "s_register_operand"] > > + (fix:SI (match_operand:SF 1 "s_register_operand")))] > >"TARGET_32BIT && TARGET_HARD_FLOAT" > >" > > ") > > > > (define_expand "fix_truncdfsi2" > >[(set (match_operand:SI 0 "s_register_operand") > > - (fix:SI (fix:DF (match_operand:DF 1 "s_register_operand"] > > + (fix:SI (match_operand:DF 1 "s_register_operand")))] > >"TARGET_32BIT && TARGET_HARD_FLOAT && !TARGET_VFP_SINGLE" > >" > > ") > > diff --git 
a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md > > index 379f5f7b3dc..0ef019b1727 100644 > > --- a/gcc/config/arm/vfp.md > > +++ b/gcc/config/arm/vfp.md > > @@ -1508,7 +1508,7 @@ (define_insn "truncsfhf2" > > > > (define_insn "*truncsisf2_vfp" > >[(set (match_operand:SI 0 "s_register_operand" "=t") > > - (fix:SI (fix:SF (match_operand:SF 1 "s_register_operand" "t"] > > + (fix:SI (match_operand:SF 1 "s_register_operand" "t")))] > >"TARGET_32BIT && TARGET_HARD_FLOAT" > >"vcvt%?.s32.f32\\t%0, %1" > >[(set_attr "predicable" "yes") > > @@ -1517,7 +1517,7 @@ (define_insn "*truncsisf2_vfp" > > > > (define_insn "*truncsidf2_vfp" > >[(set (match_operand:SI 0 "s_register_operand" "=t") > > - (fix:SI (fix:DF (match_operand:DF 1 "s_register_operand" "w"] > > + (fix:SI (match_operand:DF 1 "s_register_oper
Re: [PATCH v2 12/16] Refactor FMV name mangling.
Alfie Richards writes: > diff --git a/gcc/attribs.cc b/gcc/attribs.cc > index b00d9529a8d..d0f37d77098 100644 > --- a/gcc/attribs.cc > +++ b/gcc/attribs.cc > [...] > @@ -1287,6 +1282,33 @@ make_dispatcher_decl (const tree decl) >DECL_EXTERNAL (func_decl) = 1; >/* This will be of type IFUNCs have to be externally visible. */ >TREE_PUBLIC (func_decl) = 1; > + TREE_NOTHROW (func_decl) = TREE_NOTHROW (decl); > + > + /* Set the decl name to avoid graph_node re-mangling it. */ > + SET_DECL_ASSEMBLER_NAME (func_decl, DECL_ASSEMBLER_NAME (decl)); > + > + cgraph_node *node = cgraph_node::get (decl); > + gcc_assert (node); > + cgraph_function_version_info *node_v = node->function_version (); > + gcc_assert (node_v); Very minor suggestion, but: all callers already have the node to hand and pass the decl inside it, so perhaps it would make sense to change make_dispatcher_decl so that it takes the cgraph node instead. > [...] > @@ -19894,37 +19894,35 @@ static aarch64_fmv_feature_datum > aarch64_fmv_feature_data[] = { > the extension string is created and stored to INVALID_EXTENSION. */ > > static enum aarch_parse_opt_result > -aarch64_parse_fmv_features (const char *str, aarch64_feature_flags > *isa_flags, > +aarch64_parse_fmv_features (string_slice str, aarch64_feature_flags > *isa_flags, > aarch64_fmv_feature_mask *feature_mask, > std::string *invalid_extension) > { >if (feature_mask) > *feature_mask = 0ULL; > > - if (strcmp (str, "default") == 0) > + if (str == "default") > return AARCH_PARSE_OK; > > - while (str != NULL && *str != 0) > + string_slice str_parse = str; > + > + gcc_assert (str.is_valid ()); > + while (str_parse.is_valid ()) > { > - const char *ext; > - size_t len; > + string_slice ext; > > - ext = strchr (str, '+'); > + ext = string_slice::tokenize (&str_parse, string_slice ("+")); Following on from the comment about explicit constructors, it'd be nice not to need the explicit constructor here. 
> - if (ext != NULL) > - len = ext - str; > - else > - len = strlen (str); > + gcc_assert (ext.is_valid ()); > > - if (len == 0) > + if (!ext.is_valid () || ext.empty ()) The assert makes the !ext.is_valid () part redundant. > return AARCH_PARSE_MISSING_ARG; > >int num_features = ARRAY_SIZE (aarch64_fmv_feature_data); >int i; >for (i = 0; i < num_features; i++) > { > - if (strlen (aarch64_fmv_feature_data[i].name) == len > - && strncmp (aarch64_fmv_feature_data[i].name, str, len) == 0) > + if (aarch64_fmv_feature_data[i].name == ext) > { > if (isa_flags) > *isa_flags |= aarch64_fmv_feature_data[i].opt_flags; > [...] > @@ -19992,7 +19987,7 @@ aarch64_process_target_version_attr (tree args) >return false; > } > > - const char *str = TREE_STRING_POINTER (args); > + string_slice str = string_slice (TREE_STRING_POINTER (args)); Similarly here, I'd hope: string_slice str = TREE_STRING_POINTER (args); would be enough. > >enum aarch_parse_opt_result parse_res; >auto isa_flags = aarch64_asm_isa_flags; > @@ -20195,36 +20191,33 @@ tree > aarch64_mangle_decl_assembler_name (tree decl, tree id) > { >/* For function version, add the target suffix to the assembler name. */ > - if (TREE_CODE (decl) == FUNCTION_DECL > - && DECL_FUNCTION_VERSIONED (decl)) > + if (TREE_CODE (decl) == FUNCTION_DECL) > { > - aarch64_fmv_feature_mask feature_mask = get_feature_mask_for_version > (decl); > - > - std::string name = IDENTIFIER_POINTER (id); > - > - /* For the default version, append ".default". 
*/ > - if (feature_mask == 0ULL) > + cgraph_node *node = cgraph_node::get (decl); > + if (node && node->dispatcher_function) > + return id; > + else if (node && node->dispatcher_resolver_function) > + return clone_identifier (id, "resolver"); > + else if (DECL_FUNCTION_VERSIONED (decl)) > { > - name += ".default"; > - return get_identifier (name.c_str()); > - } > + aarch64_fmv_feature_mask feature_mask > + = get_feature_mask_for_version (decl); > > - name += "._"; > + if (feature_mask == 0ULL) > + return clone_identifier (id, "default"); > > - int num_features = ARRAY_SIZE (aarch64_fmv_feature_data); > - for (int i = 0; i < num_features; i++) > - { > - if (feature_mask & aarch64_fmv_feature_data[i].feature_mask) > - { > - name += "M"; > - name += aarch64_fmv_feature_data[i].name; > - } > - } > + std::string suffix = "_"; > > - if (DECL_ASSEMBLER_NAME_SET_P (decl)) > - SET_DECL_RTL (decl, NULL); > + int num_features = ARRAY_SIZE (aarch64_fmv_feature_data); > + for (int i = 0; i < num_features; i++) > + if (feature_mask
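As a rough illustration of the '+'-separated parsing discussed in the review above, here is a minimal sketch that uses std::string_view as a stand-in for the patch's string_slice. The helper name split_features and its handling of an empty input string are assumptions made for this example; it is not the code from the patch.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not from the patch): split STR on '+' in the way
// the review describes for aarch64_parse_fmv_features.  std::string_view
// stands in for string_slice.  Returns false when a component is empty
// (e.g. "sve++sve2" or "sve+"), which loosely mirrors the
// AARCH_PARSE_MISSING_ARG case.  Treating "" as an error is an assumption.
static bool
split_features (std::string_view str, std::vector<std::string> &out)
{
  out.clear ();
  if (str == "default")
    return true;

  std::string_view rest = str;
  bool more = true;
  while (more)
    {
      // Take everything up to the next '+', or the whole remainder.
      const std::size_t pos = rest.find ('+');
      const std::string_view ext = rest.substr (0, pos);
      more = (pos != std::string_view::npos);
      if (more)
        rest.remove_prefix (pos + 1);

      if (ext.empty ())
        return false;
      out.emplace_back (ext);
    }
  return true;
}
```

Called with "sve+sve2" this fills the vector with "sve" and "sve2". Because the parameter type is constructible from a plain string literal, callers need no explicit wrapper, which is the convenience the review asks for with string_slice.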
Re: [RFC] RISC-V: The optimization ignored the side effects of the rounding mode, resulting in incorrect results.
On 2/18/25 4:12 AM, Jin Ma wrote:
> We overlooked the side effects of the rounding mode in the pattern,
> which can impact the result of float_extend and lead to incorrect
> optimizations in the final program. This issue likely affects nearly
> all similar patterns that involve rounding modes, and the tests in
> this patch only highlight one example. It seems challenging to address,
> and I only implemented a simple fix, which is not a good way to solve
> the problem.
>
> Any comments on this?
>
> gcc/ChangeLog:
>
> * config/riscv/vector-iterators.md (UNSPEC_VRM): New.
> * config/riscv/vector.md: Use UNSPEC for float_extend.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/bug-11.c: New test.

So as Kito noted, the insn you changed already has a reference to the FRM it needs -- kept in operands[9]. It seems like your patch, while fixing the bug, more likely does so by accident rather than by design.

What I see when I look at the dump files is a deeper issue. In the .expand dump we have:

(insn 17 16 18 2 (set (reg:HF 147) (const_double:HF 0.0 [0x0.0p+0])) "j.c":14:24 -1 (nil))
(insn 18 17 19 2 (set (reg/v:RVVMF2SF 141 [ vreg ]) (if_then_else:RVVMF2SF (unspec:RVVMF64BI [ (reg/v:RVVMF64BI 138 [ vmask ]) (const_int 1 [0x1]) (const_int 0 [0]) (const_int 2 [0x2]) (const_int 0 [0]) (const_int 2 [0x2]) (reg:SI 66 vl) (reg:SI 67 vtype) (reg:SI 69 frm) ] UNSPEC_VPREDICATE) (minus:RVVMF2SF (reg/v:RVVMF2SF 140 [ vreg_memory ]) (float_extend:RVVMF2SF (vec_duplicate:RVVMF4HF (reg:HF 147 (reg/v:RVVMF2SF 140 [ vreg_memory ]))) "j.c":14:24 -1 (nil))

Insn 18 does the subtraction with the adjusted rounding mode. So far, so good. Things look fine at the start of cse1.
But if we look at the end of cse1 we have:

(insn 17 16 18 2 (set (reg:HF 147) (const_double:HF 0.0 [0x0.0p+0])) "j.c":14:24 136 {*movhf_hardfloat} (nil))
(insn 18 17 19 2 (set (reg/v:RVVMF2SF 141 [ vreg ]) (reg/v:RVVMF2SF 140 [ vreg_memory ])) "j.c":14:24 2786 {*movrvvmf2sf_fract} (expr_list:REG_DEAD (reg:HF 147) (expr_list:REG_DEAD (reg/v:RVVMF2SF 140 [ vreg_memory ]) (expr_list:REG_DEAD (reg/v:RVVMF64BI 138 [ vmask ]) (expr_list:REG_DEAD (reg:SI 69 frm) (nil))

Note how CSE replaced the arithmetic with a simple copy. At this point things are broken. I don't see how CSE can make the right decision here; we don't expose rounding modes this early and thus CSE has no way to know it can't make that kind of replacement.

Your patch kind of works, but it seems to me it's more accident than design and that we need to fix this in a more general manner. The natural question is what other targets do when the rounding mode gets changed. I'm guessing it's exposed as an unspec set before the RTL optimizers run.

jeff
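As a source-level analogy only (it does not reproduce the RVV pattern or the cse1 transformation above), the sketch below shows why arithmetic whose rounding mode is changed behind the optimizer's back needs that dependency made visible. The helper name div_with_rounding is hypothetical, and the example assumes a target whose <cfenv> defines FE_UPWARD and FE_DOWNWARD.

```cpp
#include <cfenv>

// Hypothetical helper: divide A by B under rounding mode MODE, then
// restore the previous mode.  The volatile accesses stop the compiler
// from folding the division at compile time or moving it across the
// fesetround calls; without some such barrier the optimizer is free to
// treat the division as independent of the dynamic rounding mode.
static float
div_with_rounding (float a, float b, int mode)
{
  const int saved = std::fegetround ();
  std::fesetround (mode);
  volatile float va = a;
  volatile float vb = b;
  volatile float result = va / vb;
  std::fesetround (saved);
  return result;
}
```

Since 1/3 is not exactly representable, div_with_rounding (1.0f, 3.0f, FE_UPWARD) should come out one ulp above the FE_DOWNWARD result on targets that honour the dynamic rounding mode. Same operands, different answer, and the only difference is state the expression itself never mentions.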
Re: [PATCH] libgcc: i386/linux-unwind.h: always rely on sys/ucontext.h
On Tue, Feb 18, 2025 at 8:26 PM Uros Bizjak wrote: > > On Tue, Feb 18, 2025 at 8:23 PM Richard Biener wrote: > > > > > > > > > Am 18.02.2025 um 20:07 schrieb Roman Kagan : > > > > > > On Tue, Feb 18, 2025 at 07:17:24PM +0100, Uros Bizjak wrote: > > >>> On Mon, Feb 17, 2025 at 6:19 PM Roman Kagan wrote: > > >>> > > >>> On Thu, Jan 02, 2025 at 04:32:17PM +0100, Roman Kagan wrote: > > When gcc is built for x86_64-linux-musl target, stack unwinding from > > within signal handler stops at the innermost signal frame. The reason > > for this behaviro is that the signal trampoline is not accompanied with > > appropiate CFI directives, and the fallback path in libgcc to recognize > > it by the code sequence is only enabled for glibc except 2.0. The > > latter is motivated by the lack of sys/ucontext.h in that glibc > > version. > > > > Given that all relevant libc-s ship sys/ucontext.h for over a decade, > > and that other arches aren't shy of unconditionally using it, follow > > suit and remove the preprocessor condition, too. > > >> > > >> "Relevant libc"-s for x86 linux are LIBC_GLIBC, LIBC_UCLIBC, > > >> LIBC_BIONIC and LIBC_MUSL. As far as glibc is concerned, the latest > > >> glibc 2.0.x version was released in 1997 [1], so I guess we can remove > > >> the condition for version 2.0. Based on your claim, the other > > >> mentioned libcs also provide the required header for a long time. > > > > > > Ah, good point, for completeness I should've supplied evidence from > > > their respective git repos, here you go: > > > > > > uclibc(-ng): > > > libc/sysdeps/linux/i386/sys/ucontext.h > > > > > >commit 9cee42f10dbc5b33866ff137b926a74abd7c1a5b > > >Author: Eric Andersen > > >Date: Fri Mar 1 20:46:26 2002 + > > > > > >Major rework of the include files to eliminate redundancy > > >and to better support each arch. This is a really big patch... 
> > > -Erik > > > > > > libc/sysdeps/linux/i386/sys/ucontext.h > > > > > >commit 1fef64b22811709b2e640d341237bce1c8081203 > > >Author: Mike Frysinger > > >Date: Tue Feb 15 01:27:10 2005 + > > > > > >headers for x86_64 > > > > > > bionic: > > > libc/include/sys/ucontext.h > > > > > >commit e61d106008f7d77fa1c0de43ac27311320225135 > > >Author: Pavel Chupin > > >Date: Mon Jan 27 17:56:43 2014 +0400 > > > > > >Add x86_64 ucontext.h for better compatibility > > > > > >As suggested here: > > > https://android-review.googlesource.com/#/c/71267/ > > >it may be used for x86_64 libunwind enabling. > > > > > >Change-Id: I21623261a48ea7099e030d33932556e294d226ff > > >Signed-off-by: Pavel Chupin > > > > > >commit 677a07cb9a3f5964e9ead4d37b9f775d971c61e0 > > >Author: Elliott Hughes > > >Date: Wed Jan 29 16:46:00 2014 -0800 > > > > > >Add x86 . > > > > > >Change-Id: I43e72604f7a932f134733b78094b577415a5edb7 > > > > > > musl: > > > arch/i386/bits/signal.h > > > arch/x86_64/bits/signal.h > > > include/ucontext.h > > > > > >commit ad2fe25041622b6cf426b0f98af0e52c2c9727f6 > > >Author: Rich Felker > > >Date: Fri Feb 18 22:03:03 2011 -0500 > > > > > >support the ugly and deprecated ucontext and sigcontext header > > > stuff... > > > > > >only the structures, not the functions from ucontext.h, are > > > supported > > >at this point. the main goal of this commit is to make modern gcc > > > with > > >dwarf2 unwinding build without errors. > > > > > >honestly, it probably doesn't matter how we define these as long as > > >they have members with the right names to prevent errors while > > >compiling libgcc. the only time they will be used is for > > > propagating > > >exceptions across signal-handler boundaries, which invokes > > > undefined > > >behavior anyway. but as-is, they're probably correct and may be > > > useful > > >to various low-level applications dealing with virtualization, jit > > >code generation, and so on... 
> > > > > >> I have no objection to the patch, but I think that this patch is a bit > > >> late for gcc-15 and should be committed early in the gcc-16 > > >> development cycle. But let's hear release managers (CC'd). > > > > It’s fine for 15, or rather I’m leaving it for you to decide. > > OK, based on the above research, I'll commit it to gcc-15. Committed as e129b8d7682c9a6c4d874f58de142543d3804169 with the following ChangeLog entry: libgcc/ChangeLog: * config/i386/linux-unwind.h: Remove preprocessor condition to enable fallback path for all libc-s. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Thanks, Uros.
Re: [PATCH] COBOL 12/15 24K pos: Posix adapter framework
On Tue, 18 Feb 2025 09:35:33 +0100 Richard Biener wrote: > > I'm sure you agree we don't want to let this tail wag the dog. > > With my exegesis in mind, what would you recommend? If it's > > limited to more judicious use of makefile variables, I could surely > > implement those suggestions. > > So to simplify things at this point can we postpone merging this bit > then? If you say it's more like a "contrib", wouldn't > putting it in the toplevel contrib/ directory be more appropriate? > Maybe in a contrib/cobol/ subdirectory? As you wish. I'll eliminate it from the next patchset, which I hope will be later today. --jkl
Re: The COBOL front end, version 2, in 15-part harmony
On Tue, 18 Feb 2025 09:37:57 +0100 Richard Biener wrote: > > Except for "lib", patches over 400 KB consist of just one big file. > > For a future possible version 3 of the patch set, you do not need to > send big generated files like 'configure' as part of the patch, but > just the sources/changes to their templates. IIUC, just send normal patches to configure.ac & friends, and ignore the fact that e.g. libgcobol/configure has changed. Will do. --jkl
Re: [PATCH] avoid-store-forwarding: Handle REG_EH_REGION notes
> Am 18.02.2025 um 17:04 schrieb Konstantinos Eleftheriou > : > > From: kelefth > > The pass rejects the transformation when there are instructions in the > sequence that might throw an exception. This was added due to having > cases that the load instruction contains a REG_EH_REGION note and > moving it before the store instructions caused an error, as it was > no longer the last instruction in the basic block. > > This patch handles those cases by moving a possible REG_EH_REGION > note from the load instruction of the store-load sequence to the > last instruction of the basic block. But that’s not a correct transform and will lead to bogus exception handling? You’d need to move the note and split the block, possibly updating the EH info on the side. Richard > gcc/ChangeLog: > >* avoid-store-forwarding.cc (process_store_forwarding): >(store_forwarding_analyzer::avoid_store_forwarding): >Move a possible REG_EH_REGION note from the load instruction >to the last instruction of the basic block. > --- > gcc/avoid-store-forwarding.cc | 13 - > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc > index 34a7bba4043..05c91bb1a82 100644 > --- a/gcc/avoid-store-forwarding.cc > +++ b/gcc/avoid-store-forwarding.cc > @@ -400,6 +400,17 @@ process_store_forwarding (vec &stores, > rtx_insn *load_insn, > if (load_elim) > delete_insn (load_insn); > > + /* Find possible REG_EH_REGION note in the load instruction and move it > + into the last instruction of the basic block. 
*/ > + rtx reg_eh_region_note = find_reg_note (load_insn, REG_EH_REGION, > NULL_RTX); > + if (reg_eh_region_note != NULL_RTX) > +{ > + remove_note (load_insn, reg_eh_region_note); > + basic_block load_bb = BLOCK_FOR_INSN (load_insn); > + add_reg_note (BB_END (load_bb), REG_EH_REGION, > +XEXP (reg_eh_region_note, 0)); > +} > + > return true; > } > > @@ -425,7 +436,7 @@ store_forwarding_analyzer::avoid_store_forwarding > (basic_block bb) > > rtx set = single_set (insn); > > - if (!set || insn_could_throw_p (insn)) > + if (!set) >{ > store_exprs.truncate (0); > continue; > -- > 2.47.0 >
Re: [PATCH] libgcc: i386/linux-unwind.h: always rely on sys/ucontext.h
> Am 18.02.2025 um 20:07 schrieb Roman Kagan : > > On Tue, Feb 18, 2025 at 07:17:24PM +0100, Uros Bizjak wrote: >>> On Mon, Feb 17, 2025 at 6:19 PM Roman Kagan wrote: >>> >>> On Thu, Jan 02, 2025 at 04:32:17PM +0100, Roman Kagan wrote: When gcc is built for x86_64-linux-musl target, stack unwinding from within signal handler stops at the innermost signal frame. The reason for this behaviro is that the signal trampoline is not accompanied with appropiate CFI directives, and the fallback path in libgcc to recognize it by the code sequence is only enabled for glibc except 2.0. The latter is motivated by the lack of sys/ucontext.h in that glibc version. Given that all relevant libc-s ship sys/ucontext.h for over a decade, and that other arches aren't shy of unconditionally using it, follow suit and remove the preprocessor condition, too. >> >> "Relevant libc"-s for x86 linux are LIBC_GLIBC, LIBC_UCLIBC, >> LIBC_BIONIC and LIBC_MUSL. As far as glibc is concerned, the latest >> glibc 2.0.x version was released in 1997 [1], so I guess we can remove >> the condition for version 2.0. Based on your claim, the other >> mentioned libcs also provide the required header for a long time. > > Ah, good point, for completeness I should've supplied evidence from > their respective git repos, here you go: > > uclibc(-ng): > libc/sysdeps/linux/i386/sys/ucontext.h > >commit 9cee42f10dbc5b33866ff137b926a74abd7c1a5b >Author: Eric Andersen >Date: Fri Mar 1 20:46:26 2002 + > >Major rework of the include files to eliminate redundancy >and to better support each arch. This is a really big patch... 
> -Erik > > libc/sysdeps/linux/i386/sys/ucontext.h > >commit 1fef64b22811709b2e640d341237bce1c8081203 >Author: Mike Frysinger >Date: Tue Feb 15 01:27:10 2005 + > >headers for x86_64 > > bionic: > libc/include/sys/ucontext.h > >commit e61d106008f7d77fa1c0de43ac27311320225135 >Author: Pavel Chupin >Date: Mon Jan 27 17:56:43 2014 +0400 > >Add x86_64 ucontext.h for better compatibility > >As suggested here: https://android-review.googlesource.com/#/c/71267/ >it may be used for x86_64 libunwind enabling. > >Change-Id: I21623261a48ea7099e030d33932556e294d226ff >Signed-off-by: Pavel Chupin > >commit 677a07cb9a3f5964e9ead4d37b9f775d971c61e0 >Author: Elliott Hughes >Date: Wed Jan 29 16:46:00 2014 -0800 > >Add x86 . > >Change-Id: I43e72604f7a932f134733b78094b577415a5edb7 > > musl: > arch/i386/bits/signal.h > arch/x86_64/bits/signal.h > include/ucontext.h > >commit ad2fe25041622b6cf426b0f98af0e52c2c9727f6 >Author: Rich Felker >Date: Fri Feb 18 22:03:03 2011 -0500 > >support the ugly and deprecated ucontext and sigcontext header stuff... > >only the structures, not the functions from ucontext.h, are supported >at this point. the main goal of this commit is to make modern gcc with >dwarf2 unwinding build without errors. > >honestly, it probably doesn't matter how we define these as long as >they have members with the right names to prevent errors while >compiling libgcc. the only time they will be used is for propagating >exceptions across signal-handler boundaries, which invokes undefined >behavior anyway. but as-is, they're probably correct and may be useful >to various low-level applications dealing with virtualization, jit >code generation, and so on... > >> I have no objection to the patch, but I think that this patch is a bit >> late for gcc-15 and should be committed early in the gcc-16 >> development cycle. But let's hear release managers (CC'd). It’s fine for 15, or rather I’m leaving it for you to decide. 
> I gather that GCC doesn't have "cc: stable" process similar to Linux,
> does it?

Patches can be backported to release branches if they fix regressions or important bugs. You should possibly look into adding the missing CFI directives on your system?

Richard

> Fine by me anyway. If it lands in GCC repo I'll at least be able to
> poke at some downstream maintainers with a link to cherry-pick.
>
> Thanks,
> Roman.
Re: [PATCH] libgcc: i386/linux-unwind.h: always rely on sys/ucontext.h
On Tue, Feb 18, 2025 at 8:23 PM Richard Biener wrote: > > > > > Am 18.02.2025 um 20:07 schrieb Roman Kagan : > > > > On Tue, Feb 18, 2025 at 07:17:24PM +0100, Uros Bizjak wrote: > >>> On Mon, Feb 17, 2025 at 6:19 PM Roman Kagan wrote: > >>> > >>> On Thu, Jan 02, 2025 at 04:32:17PM +0100, Roman Kagan wrote: > When gcc is built for x86_64-linux-musl target, stack unwinding from > within signal handler stops at the innermost signal frame. The reason > for this behaviro is that the signal trampoline is not accompanied with > appropiate CFI directives, and the fallback path in libgcc to recognize > it by the code sequence is only enabled for glibc except 2.0. The > latter is motivated by the lack of sys/ucontext.h in that glibc version. > > Given that all relevant libc-s ship sys/ucontext.h for over a decade, > and that other arches aren't shy of unconditionally using it, follow > suit and remove the preprocessor condition, too. > >> > >> "Relevant libc"-s for x86 linux are LIBC_GLIBC, LIBC_UCLIBC, > >> LIBC_BIONIC and LIBC_MUSL. As far as glibc is concerned, the latest > >> glibc 2.0.x version was released in 1997 [1], so I guess we can remove > >> the condition for version 2.0. Based on your claim, the other > >> mentioned libcs also provide the required header for a long time. > > > > Ah, good point, for completeness I should've supplied evidence from > > their respective git repos, here you go: > > > > uclibc(-ng): > > libc/sysdeps/linux/i386/sys/ucontext.h > > > >commit 9cee42f10dbc5b33866ff137b926a74abd7c1a5b > >Author: Eric Andersen > >Date: Fri Mar 1 20:46:26 2002 + > > > >Major rework of the include files to eliminate redundancy > >and to better support each arch. This is a really big patch... 
> > -Erik > > > > libc/sysdeps/linux/i386/sys/ucontext.h > > > >commit 1fef64b22811709b2e640d341237bce1c8081203 > >Author: Mike Frysinger > >Date: Tue Feb 15 01:27:10 2005 + > > > >headers for x86_64 > > > > bionic: > > libc/include/sys/ucontext.h > > > >commit e61d106008f7d77fa1c0de43ac27311320225135 > >Author: Pavel Chupin > >Date: Mon Jan 27 17:56:43 2014 +0400 > > > >Add x86_64 ucontext.h for better compatibility > > > >As suggested here: https://android-review.googlesource.com/#/c/71267/ > >it may be used for x86_64 libunwind enabling. > > > >Change-Id: I21623261a48ea7099e030d33932556e294d226ff > >Signed-off-by: Pavel Chupin > > > >commit 677a07cb9a3f5964e9ead4d37b9f775d971c61e0 > >Author: Elliott Hughes > >Date: Wed Jan 29 16:46:00 2014 -0800 > > > >Add x86 . > > > >Change-Id: I43e72604f7a932f134733b78094b577415a5edb7 > > > > musl: > > arch/i386/bits/signal.h > > arch/x86_64/bits/signal.h > > include/ucontext.h > > > >commit ad2fe25041622b6cf426b0f98af0e52c2c9727f6 > >Author: Rich Felker > >Date: Fri Feb 18 22:03:03 2011 -0500 > > > >support the ugly and deprecated ucontext and sigcontext header > > stuff... > > > >only the structures, not the functions from ucontext.h, are supported > >at this point. the main goal of this commit is to make modern gcc > > with > >dwarf2 unwinding build without errors. > > > >honestly, it probably doesn't matter how we define these as long as > >they have members with the right names to prevent errors while > >compiling libgcc. the only time they will be used is for propagating > >exceptions across signal-handler boundaries, which invokes undefined > >behavior anyway. but as-is, they're probably correct and may be > > useful > >to various low-level applications dealing with virtualization, jit > >code generation, and so on... > > > >> I have no objection to the patch, but I think that this patch is a bit > >> late for gcc-15 and should be committed early in the gcc-16 > >> development cycle. 
But let's hear release managers (CC'd). > > It’s fine for 15, or rather I’m leaving it for you to decide. OK, based on the above research, I'll commit it to gcc-15. Thanks, Uros.
New Ukrainian PO file for 'cpplib' (version 15-b20250216)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'cpplib' has been submitted by the Ukrainian team of translators. The file is available at: https://translationproject.org/latest/cpplib/uk.po (This file, 'cpplib-15-b20250216.uk.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: https://translationproject.org/latest/cpplib/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: https://translationproject.org/domain/cpplib.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator.
Re: [PATCH] libgcc: i386/linux-unwind.h: always rely on sys/ucontext.h
On Tue, Feb 18, 2025 at 07:17:24PM +0100, Uros Bizjak wrote: > On Mon, Feb 17, 2025 at 6:19 PM Roman Kagan wrote: > > > > On Thu, Jan 02, 2025 at 04:32:17PM +0100, Roman Kagan wrote: > > > When gcc is built for x86_64-linux-musl target, stack unwinding from > > > within signal handler stops at the innermost signal frame. The reason > > > for this behaviro is that the signal trampoline is not accompanied with > > > appropiate CFI directives, and the fallback path in libgcc to recognize > > > it by the code sequence is only enabled for glibc except 2.0. The > > > latter is motivated by the lack of sys/ucontext.h in that glibc version. > > > > > > Given that all relevant libc-s ship sys/ucontext.h for over a decade, > > > and that other arches aren't shy of unconditionally using it, follow > > > suit and remove the preprocessor condition, too. > > "Relevant libc"-s for x86 linux are LIBC_GLIBC, LIBC_UCLIBC, > LIBC_BIONIC and LIBC_MUSL. As far as glibc is concerned, the latest > glibc 2.0.x version was released in 1997 [1], so I guess we can remove > the condition for version 2.0. Based on your claim, the other > mentioned libcs also provide the required header for a long time. Ah, good point, for completeness I should've supplied evidence from their respective git repos, here you go: uclibc(-ng): libc/sysdeps/linux/i386/sys/ucontext.h commit 9cee42f10dbc5b33866ff137b926a74abd7c1a5b Author: Eric Andersen Date: Fri Mar 1 20:46:26 2002 + Major rework of the include files to eliminate redundancy and to better support each arch. This is a really big patch... 
-Erik libc/sysdeps/linux/i386/sys/ucontext.h commit 1fef64b22811709b2e640d341237bce1c8081203 Author: Mike Frysinger Date: Tue Feb 15 01:27:10 2005 + headers for x86_64 bionic: libc/include/sys/ucontext.h commit e61d106008f7d77fa1c0de43ac27311320225135 Author: Pavel Chupin Date: Mon Jan 27 17:56:43 2014 +0400 Add x86_64 ucontext.h for better compatibility As suggested here: https://android-review.googlesource.com/#/c/71267/ it may be used for x86_64 libunwind enabling. Change-Id: I21623261a48ea7099e030d33932556e294d226ff Signed-off-by: Pavel Chupin commit 677a07cb9a3f5964e9ead4d37b9f775d971c61e0 Author: Elliott Hughes Date: Wed Jan 29 16:46:00 2014 -0800 Add x86 . Change-Id: I43e72604f7a932f134733b78094b577415a5edb7 musl: arch/i386/bits/signal.h arch/x86_64/bits/signal.h include/ucontext.h commit ad2fe25041622b6cf426b0f98af0e52c2c9727f6 Author: Rich Felker Date: Fri Feb 18 22:03:03 2011 -0500 support the ugly and deprecated ucontext and sigcontext header stuff... only the structures, not the functions from ucontext.h, are supported at this point. the main goal of this commit is to make modern gcc with dwarf2 unwinding build without errors. honestly, it probably doesn't matter how we define these as long as they have members with the right names to prevent errors while compiling libgcc. the only time they will be used is for propagating exceptions across signal-handler boundaries, which invokes undefined behavior anyway. but as-is, they're probably correct and may be useful to various low-level applications dealing with virtualization, jit code generation, and so on... > I have no objection to the patch, but I think that this patch is a bit > late for gcc-15 and should be committed early in the gcc-16 > development cycle. But let's hear release managers (CC'd). I gather that GCC doesn't have "cc: stable" process similar to Linux, does it? Fine by me anyway. 
If it lands in GCC repo I'll at least be able to poke at some downstream maintainers with a link to cherry-pick. Thanks, Roman.
Re: [PATCH] libgcc: i386/linux-unwind.h: always rely on sys/ucontext.h
On Mon, Feb 17, 2025 at 6:19 PM Roman Kagan wrote: > > On Thu, Jan 02, 2025 at 04:32:17PM +0100, Roman Kagan wrote: > > When gcc is built for x86_64-linux-musl target, stack unwinding from > > within signal handler stops at the innermost signal frame. The reason > > for this behaviro is that the signal trampoline is not accompanied with > > appropiate CFI directives, and the fallback path in libgcc to recognize > > it by the code sequence is only enabled for glibc except 2.0. The > > latter is motivated by the lack of sys/ucontext.h in that glibc version. > > > > Given that all relevant libc-s ship sys/ucontext.h for over a decade, > > and that other arches aren't shy of unconditionally using it, follow > > suit and remove the preprocessor condition, too. "Relevant libc"-s for x86 linux are LIBC_GLIBC, LIBC_UCLIBC, LIBC_BIONIC and LIBC_MUSL. As far as glibc is concerned, the latest glibc 2.0.x version was released in 1997 [1], so I guess we can remove the condition for version 2.0. Based on your claim, the other mentioned libcs also provide the required header for a long time. I have no objection to the patch, but I think that this patch is a bit late for gcc-15 and should be committed early in the gcc-16 development cycle. But let's hear release managers (CC'd). [1] https://sourceware.org/glibc/wiki/Glibc%20Timeline Thanks, Uros. > > > > Signed-off-by: Roman Kagan > > --- > > libgcc/config/i386/linux-unwind.h | 7 --- > > 1 file changed, 7 deletions(-) > > > > diff --git a/libgcc/config/i386/linux-unwind.h > > b/libgcc/config/i386/linux-unwind.h > > index fe316ee02cf2..8f37642bbf55 100644 > > --- a/libgcc/config/i386/linux-unwind.h > > +++ b/libgcc/config/i386/linux-unwind.h > > @@ -33,12 +33,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively. > > If not, see > > > > #ifndef inhibit_libc > > > > -/* There's no sys/ucontext.h for glibc 2.0, so no > > - signal-turned-exceptions for them. 
There's also no configure-run for > > - the target, so we can't check on (e.g.) HAVE_SYS_UCONTEXT_H. Using the > > - target libc version macro should be enough. */ > > -#if defined __GLIBC__ && !(__GLIBC__ == 2 && __GLIBC_MINOR__ == 0) > > - > > #include > > #include > > > > @@ -199,5 +193,4 @@ x86_frob_update_context (struct _Unwind_Context > > *context, > > } > > > > #endif /* ifdef __x86_64__ */ > > -#endif /* not glibc 2.0 */ > > #endif /* ifdef inhibit_libc */ > > Ping? > > Roman.
[pushed: r15-7610] sarif output: fix alphabetization in sarif_scheme_handler::make_sink
No functional change intended. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to trunk as r15-7610-g196e8dbddc509c. Signed-off-by: David Malcolm gcc/ChangeLog: * opts-diagnostic.cc (sarif_scheme_handler::make_sink): Put properties in alphabetical order. Signed-off-by: David Malcolm --- gcc/opts-diagnostic.cc | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/gcc/opts-diagnostic.cc b/gcc/opts-diagnostic.cc index 6516e5aec7e..cab7925aa34 100644 --- a/gcc/opts-diagnostic.cc +++ b/gcc/opts-diagnostic.cc @@ -434,12 +434,17 @@ sarif_scheme_handler::make_sink (const context &ctxt, const char *unparsed_arg, const scheme_name_and_params &parsed_arg) const { - enum sarif_version version = sarif_version::v2_1_0; label_text filename; + enum sarif_version version = sarif_version::v2_1_0; for (auto& iter : parsed_arg.m_kvs) { const std::string &key = iter.first; const std::string &value = iter.second; + if (key == "file") + { + filename = label_text::take (xstrdup (value.c_str ())); + continue; + } if (key == "version") { static const std::array, @@ -454,11 +459,6 @@ sarif_scheme_handler::make_sink (const context &ctxt, return nullptr; continue; } - if (key == "file") - { - filename = label_text::take (xstrdup (value.c_str ())); - continue; - } /* Key not found. */ auto_vec known_keys; -- 2.26.3
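For readers unfamiliar with the shape of the loop being reordered, here is a simplified, self-contained sketch of key/value handling with the keys tested in alphabetical order. The types key_values and sink_options, the function parse_sink_options, and the accepted version strings are assumptions made for this example; the real code uses label_text, enum sarif_version and GCC's diagnostic reporting.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the parsed key=value parameters.
using key_values = std::vector<std::pair<std::string, std::string>>;

// Hypothetical result type; the members are listed alphabetically too.
struct sink_options
{
  std::string filename;
  std::string version = "2.1";
};

// Check each key in alphabetical order ("file" before "version").
// Returns false on an unknown key or an unrecognized version value.
static bool
parse_sink_options (const key_values &kvs, sink_options &opts)
{
  for (const auto &kv : kvs)
    {
      const std::string &key = kv.first;
      const std::string &value = kv.second;
      if (key == "file")
        {
          opts.filename = value;
          continue;
        }
      if (key == "version")
        {
          // Assumed set of accepted values, for illustration only.
          if (value != "2.1" && value != "2.2-prerelease")
            return false;
          opts.version = value;
          continue;
        }
      /* Key not found.  */
      return false;
    }
  return true;
}
```

The order of the tests does not change the result, since every recognised key ends in continue. This matches the "no functional change intended" note in the commit message.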
[pushed: r15-7611] analyzer: add more properties to sarif output
Add some more properties to the analyzer's sarif output, to help with debugging -fanalyzer. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Successful run of analyzer integration tests on x86_64-pc-linux-gnu. Pushed to trunk as r15-7611-gfcdcccdbf809f9. gcc/analyzer/ChangeLog: * diagnostic-manager.cc (saved_diagnostic::maybe_add_sarif_properties): Add various properties for debugging, for m_stmt, m_var, and m_duplicates. Remove stray 'if' statement. Capture the kind of the pending_diagnostic. * region-model.cc (poisoned_value_diagnostic::maybe_add_sarif_properties): New. Signed-off-by: David Malcolm --- gcc/analyzer/diagnostic-manager.cc | 26 +- gcc/analyzer/region-model.cc | 13 + 2 files changed, 38 insertions(+), 1 deletion(-) diff --git a/gcc/analyzer/diagnostic-manager.cc b/gcc/analyzer/diagnostic-manager.cc index 8db6a533e604..4bf1dce967de 100644 --- a/gcc/analyzer/diagnostic-manager.cc +++ b/gcc/analyzer/diagnostic-manager.cc @@ -1032,12 +1032,36 @@ saved_diagnostic::maybe_add_sarif_properties (sarif_object &result_obj) const props.set_string (PROPERTY_PREFIX "sm", m_sm->get_name ()); props.set_integer (PROPERTY_PREFIX "enode", m_enode->m_index); props.set_integer (PROPERTY_PREFIX "snode", m_snode->m_index); + if (m_stmt) +{ + pretty_printer pp; + pp_gimple_stmt_1 (&pp, m_stmt, 0, (dump_flags_t)0); + props.set_string (PROPERTY_PREFIX "stmt", pp_formatted_text (&pp)); +} + if (m_var) +props.set (PROPERTY_PREFIX "var", tree_to_json (m_var)); if (m_sval) props.set (PROPERTY_PREFIX "sval", m_sval->to_json ()); if (m_state) props.set (PROPERTY_PREFIX "state", m_state->to_json ()); - if (m_best_epath) + // TODO: m_best_epath props.set_integer (PROPERTY_PREFIX "idx", m_idx); + if (m_duplicates.length () > 0) +{ + auto duplicates_arr = ::make_unique (); + for (auto iter : m_duplicates) + { + auto sd_obj = ::make_unique (); + iter->maybe_add_sarif_properties (*sd_obj); + duplicates_arr->append (std::move (sd_obj)); + } + props.set (PROPERTY_PREFIX 
"duplicates", + std::move (duplicates_arr)); +} +#undef PROPERTY_PREFIX + +#define PROPERTY_PREFIX "gcc/analyzer/pending_diagnostic/" + props.set_string (PROPERTY_PREFIX "kind", m_d->get_kind ()); #undef PROPERTY_PREFIX /* Potentially add pending_diagnostic-specific properties. */ diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc index 78b086900b48..79378a9e6e5f 100644 --- a/gcc/analyzer/region-model.cc +++ b/gcc/analyzer/region-model.cc @@ -753,6 +753,19 @@ public: return true; } + void + maybe_add_sarif_properties (sarif_object &result_obj) const final override + { +sarif_property_bag &props = result_obj.get_or_create_properties (); +#define PROPERTY_PREFIX "gcc/analyzer/poisoned_value_diagnostic/" +props.set (PROPERTY_PREFIX "expr", tree_to_json (m_expr)); +props.set_string (PROPERTY_PREFIX "kind", poison_kind_to_str (m_pkind)); +if (m_src_region) + props.set (PROPERTY_PREFIX "src_region", m_src_region->to_json ()); +props.set (PROPERTY_PREFIX "check_expr", tree_to_json (m_check_expr)); +#undef PROPERTY_PREFIX + } + private: tree m_expr; enum poison_kind m_pkind; -- 2.26.3
RE: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]
I see, thanks Richard S for explaining. That makes sense to me, and we do similar things for frm.

It sounds like we need to revisit what the semantics of vxrm are; in the spec I can only find the words below. Does that indicate callee-save (the spec doesn't mention it, but it should if it is), or something different, like single-use and then discard? I may wait a while for the official explanation.

From the spec:
"The vxrm and vxsat fields of vcsr are not preserved across calls and their values are unspecified upon entry."

Pan

-Original Message-
From: Richard Sandiford
Sent: Monday, February 17, 2025 7:48 PM
To: Li, Pan2
Cc: Jeff Law ; Andrew Waterman ; gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]

Richard Sandiford writes:
> The problem seems to be that mode-switching overloads VXRM_MODE_NONE
> to mean both "no requirement" and "unknown state".  So we have:
>
>   static int
>   singleton_vxrm_need (void)
>   {
>     /* Only needed for vector code.  */
>     if (!TARGET_VECTOR)
>       return VXRM_MODE_NONE;

This was a bad example, sorry. What matters more is that non-vector instructions are also VXRM_MODE_NONE. Or more specifically:

>
> and:
>
>   if (vxrm_unknown_p (insn))
>     return VXRM_MODE_NONE;
>
> This means that VXRM is assumed to be transparent in an instruction
> that matches vxrm_unknown_p.

...the function:

static int
riscv_vxrm_mode_after (rtx_insn *insn, int mode)
{
  if (vxrm_unknown_p (insn))
    return VXRM_MODE_NONE;

  if (recog_memoized (insn) < 0)
    return mode;

  if (reg_mentioned_p (gen_rtx_REG (SImode, VXRM_REGNUM), PATTERN (insn)))
    return get_attr_vxrm_mode (insn);
  else
    return mode;
}

will return VXRM_MODE_NONE if:

(a) insn is something like a call

(b) insn is a normal instruction that does not mention VXRM at all and mode is already VXRM_MODE_NONE

(b) is the transparent case but (a) is a kill.
Since the block walk starts with VXRM_MODE_NONE as the initial mode, there needs to be another mode that (a) can use to indicate a kill. Thanks, Richard
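The difference between the transparent case and the kill case can be shown with a small standalone sketch. This is not GCC code: the enums `vxrm_mode` and `insn_kind`, the extra `MODE_UNKNOWN` value, and the functions `mode_after` and `walk_block` are all hypothetical names, and the sketch simply assumes that a separate mode is introduced to represent a kill, as suggested above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical modes for this sketch only.  MODE_NONE means "no
// requirement"; MODE_UNKNOWN means "the register may hold anything",
// for example after a call.
enum vxrm_mode { MODE_RNU, MODE_RNE, MODE_NONE, MODE_UNKNOWN };

// Hypothetical instruction kinds for this sketch only.
enum insn_kind { INSN_CALL, INSN_SCALAR, INSN_SET_RNU, INSN_SET_RNE };

// Return the mode in effect after INSN, given the mode BEFORE it.
vxrm_mode mode_after(insn_kind insn, vxrm_mode before)
{
  switch (insn) {
    case INSN_CALL:    return MODE_UNKNOWN;  // kill: previous mode is lost
    case INSN_SET_RNU: return MODE_RNU;      // instruction sets the mode
    case INSN_SET_RNE: return MODE_RNE;
    case INSN_SCALAR:  return before;        // transparent: mode passes through
  }
  return before;
}

// Walk a block, starting from MODE_NONE as the initial mode.
vxrm_mode walk_block(const std::vector<insn_kind> &insns)
{
  vxrm_mode mode = MODE_NONE;
  for (insn_kind insn : insns)
    mode = mode_after(insn, mode);
  return mode;
}
```

With a single overloaded value, a block containing only a call and a block containing only scalar code would both end in the same mode; with the extra value in this sketch they end in `MODE_UNKNOWN` and `MODE_NONE` respectively.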
Re: [PATCH 1/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]
Hi Kyrill, Thanks for your comments, and for answering my question RE your work. Happy to apply those changes in the next revision. Cheers, Spencer
[PATCH] aarch64: Ignore target pragmas while defining intrinsics
When initialising intrinsics with `#pragma GCC aarch64 "arm_*.h"`, we often set an explicit target, but currently leave current_target_pragma unchanged. This results in the target pragma being applied to each simulated intrinsic on top of our explicit target, which is clearly undesirable.

As far as I can tell this doesn't cause any bugs at the moment, because none of the behaviour for builtin functions depends upon the function specific target. However, the unintended target feature combinations led to unwanted behaviour in an under-development patch.

This patch resolves the issue by extending aarch64_simd_switcher to explicitly unset the current_target_pragma, and adapting it to support handle_arm_acle_h as well. I've also renamed the switcher classes and instances, because I think the new names are slightly clearer.

The chosen sets of features for arm_sve.h and arm_sme.h are not normally valid, because they exclude FCMA and BF16. However, I don't think that matters for the usage here. Alternatively, aarch64_target_switcher could be modified to enable all the dependent features as well.

Bootstrapped and regression tested on aarch64. Ok for master (to enable the dependent WIP patch)?

gcc/ChangeLog:

	* config/aarch64/aarch64-builtins.cc
	(aarch64_simd_switcher::aarch64_simd_switcher): Rename to...
	(aarch64_target_switcher::aarch64_target_switcher): ...this,
	remove default simd flags and save current_target_pragma.
	(aarch64_simd_switcher::~aarch64_simd_switcher): Rename to...
	(aarch64_target_switcher::~aarch64_target_switcher): ...this,
	and restore current_target_pragma.
	(handle_arm_acle_h): Use aarch64_target_switcher.
	(handle_arm_neon_h): Rename switcher and pass explicit flags.
	(aarch64_general_init_builtins): Ditto.
	* config/aarch64/aarch64-protos.h
	(class aarch64_simd_switcher): Rename to...
	(class aarch64_target_switcher): ...this, and add pragma member.
	* config/aarch64/aarch64-sve-builtins.cc
	(sve_switcher::sve_switcher): Rename to...
(sve_target_switcher::sve_target_switcher): ...this. (sve_switcher::~sve_switcher): Rename to... (sve_target_switcher::~sve_target_switcher): ...this. (init_builtins): Rename switcher. (handle_arm_sve_h): Ditto. (handle_arm_neon_sve_bridge_h): Ditto. (handle_arm_sme_h): Ditto. * config/aarch64/aarch64-sve-builtins.h (class sve_switcher): Rename to... (class sve_target_switcher): ...this. (class sme_switcher): Rename to... (class sme_target_switcher): ...this. diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc index 128cc365d3d585e01cb69668f285318ee56a36fc..c1cb6cdcc81c6b45c0132250589bba0be42f195d 100644 --- a/gcc/config/aarch64/aarch64-builtins.cc +++ b/gcc/config/aarch64/aarch64-builtins.cc @@ -1877,23 +1877,25 @@ aarch64_scalar_builtin_type_p (aarch64_simd_type t) return (t == Poly8_t || t == Poly16_t || t == Poly64_t || t == Poly128_t); } -/* Enable AARCH64_FL_* flags EXTRA_FLAGS on top of the base Advanced SIMD - set. */ -aarch64_simd_switcher::aarch64_simd_switcher (aarch64_feature_flags extra_flags) +/* Temporarily set FLAGS as the enabled target features. */ +aarch64_target_switcher::aarch64_target_switcher (aarch64_feature_flags flags) : m_old_asm_isa_flags (aarch64_asm_isa_flags), -m_old_general_regs_only (TARGET_GENERAL_REGS_ONLY) +m_old_general_regs_only (TARGET_GENERAL_REGS_ONLY), +m_old_target_pragma (current_target_pragma) { /* Changing the ISA flags should be enough here. We shouldn't need to pay the compile-time cost of a full target switch. 
*/ global_options.x_target_flags &= ~MASK_GENERAL_REGS_ONLY; - aarch64_set_asm_isa_flags (AARCH64_FL_FP | AARCH64_FL_SIMD | extra_flags); + aarch64_set_asm_isa_flags (flags); + current_target_pragma = NULL_TREE; } -aarch64_simd_switcher::~aarch64_simd_switcher () +aarch64_target_switcher::~aarch64_target_switcher () { if (m_old_general_regs_only) global_options.x_target_flags |= MASK_GENERAL_REGS_ONLY; aarch64_set_asm_isa_flags (m_old_asm_isa_flags); + current_target_pragma = m_old_target_pragma; } /* Implement #pragma GCC aarch64 "arm_neon.h". @@ -1903,7 +1905,7 @@ aarch64_simd_switcher::~aarch64_simd_switcher () void handle_arm_neon_h (void) { - aarch64_simd_switcher simd; + aarch64_target_switcher switcher (AARCH64_FL_FP | AARCH64_FL_SIMD); /* Register the AdvSIMD vector tuple types. */ for (unsigned int i = 0; i < ARM_NEON_H_TYPES_LAST; i++) @@ -2353,6 +2355,8 @@ aarch64_init_data_intrinsics (void) void handle_arm_acle_h (void) { + aarch64_target_switcher switcher; + aarch64_init_ls64_builtins (); aarch64_init_tme_builtins (); aarch64_init_memtag_builtins (); @@ -2446,7 +2450,7 @@ aarch64_general_init_builtins (void) aarch64_init_bf16_types ();
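The save-and-restore behaviour that this patch gives the switcher can be sketched outside GCC as a plain RAII guard. The sketch below is only an illustration under stated assumptions: `g_isa_flags` and `g_target_pragma` are hypothetical stand-ins for the compiler's global state, and `target_switcher` and `switcher_round_trip` are hypothetical names, not the real aarch64 classes.

```cpp
#include <cassert>

// Hypothetical stand-ins for the compiler's global state.
static unsigned long g_isa_flags = 0;
static const char *g_target_pragma = nullptr;

// Temporarily set FLAGS as the enabled features and clear the pragma;
// the destructor restores both saved values.
class target_switcher
{
public:
  explicit target_switcher(unsigned long flags)
    : m_old_flags(g_isa_flags), m_old_pragma(g_target_pragma)
  {
    g_isa_flags = flags;
    g_target_pragma = nullptr;  // ignore any user pragma while switched
  }

  ~target_switcher()
  {
    g_isa_flags = m_old_flags;
    g_target_pragma = m_old_pragma;
  }

  // Copying a guard would restore the state twice, so forbid it.
  target_switcher(const target_switcher &) = delete;
  target_switcher &operator=(const target_switcher &) = delete;

private:
  unsigned long m_old_flags;
  const char *m_old_pragma;
};

// Check that the state is overridden inside the scope and restored after it.
bool switcher_round_trip()
{
  g_isa_flags = 0x1;
  g_target_pragma = "+sve";
  const char *saved_pragma = g_target_pragma;

  bool inside_ok = false;
  {
    target_switcher switcher(0x6);
    inside_ok = (g_isa_flags == 0x6 && g_target_pragma == nullptr);
  }
  return inside_ok && g_isa_flags == 0x1 && g_target_pragma == saved_pragma;
}
```

Tying the restore to the destructor means every exit path from the handler puts the old state back, which is presumably why the patch extends the existing switcher rather than resetting the pragma by hand.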
[PATCH] c++: Use capture from outer lambda, if any, instead of erroring out [PR110584]
We've been rejecting this valid code since r8-4571: === cut here === void foo (float); int main () { constexpr float x = 0; (void) [&] () { foo (x); (void) [] () { foo (x); }; }; } === cut here === The problem is that when processing X in the inner lambda, process_outer_var_ref errors out even though it does find the capture from the enclosing lambda. This patch changes process_outer_var_ref to accept and return the outer proxy if it finds any. Successfully tested on x86_64-pc-linux-gnu. PR c++/110584 gcc/cp/ChangeLog: * semantics.cc (process_outer_var_ref): Use capture from enclosing lambda, if any. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/lambda/lambda-nested10.C: New test. --- gcc/cp/semantics.cc | 4 ++ .../g++.dg/cpp0x/lambda/lambda-nested10.C | 46 +++ 2 files changed, 50 insertions(+) create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested10.C diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc index 7c7d3e3c432..7bbc82f7dc1 100644 --- a/gcc/cp/semantics.cc +++ b/gcc/cp/semantics.cc @@ -4598,6 +4598,10 @@ process_outer_var_ref (tree decl, tsubst_flags_t complain, bool odr_use) if (!odr_use && context == containing_function) decl = add_default_capture (lambda_stack, /*id=*/DECL_NAME (decl), initializer); + /* When doing lambda capture, if we found a capture in an enclosing lambda, + we can use it. */ + else if (!odr_use && is_capture_proxy (decl)) +return decl; /* Only an odr-use of an outer automatic variable causes an error, and a constant variable can decay to a prvalue constant without odr-use. So don't complain yet. 
*/ diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested10.C b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested10.C new file mode 100644 index 000..2dd9dd4955e --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested10.C @@ -0,0 +1,46 @@ +// PR c++/110584 +// { dg-do "run" { target c++11 } } + +void foo (int i) { + if (i != 0) +__builtin_abort (); +} + +int main () { + const int x = 0; + + // We would error out on this. + (void) [&] () { +foo (x); +(void)[] () { + foo (x); +}; + } (); + // As well as those. + (void) [&] () { +(void) [] () { + foo (x); +}; + } (); + (void) [&x] () { +(void) [] () { + foo (x); +}; + } (); + // But those would work already. + (void) [] () { +(void) [&] () { + foo (x); +}; + } (); + (void) [&] () { +(void) [&] () { + foo (x); +}; + } (); + (void) [=] () { +(void) [] () { + foo (x); +}; + } (); +} -- 2.44.0
Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale
Hello,
I looked into updating the hook

> -/* Implement TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE.  */
> +/* Implement TARGET_CALLEE_SAVE_COST.  */
>
>  static int
> -ix86_ira_callee_saved_register_cost_scale (int)
> +ix86_callee_save_cost (spill_cost_type, unsigned int, machine_mode,
> +		       unsigned int, int mem_cost, const HARD_REG_SET &, bool)
>  {
> -  return 1;
> +  /* Account for the fact that push and pop are shorter and do their
> +     own allocation and deallocation.  */
> +  return mem_cost - 2;
>  }

I think this is fine for the usual performance metrics of push/pop. For size we now end up with a cost of 0, which is likely not right, so I added a special case and return 1. Size costs do not quite correspond to mov-mov sizes, so I will try to fix it and see if that results in better code size.

I also added a test that the regno in question is an integer register. While we do not callee-save XMM registers for the default ABI, the Microsoft version does. I am not sure how the push2 and pushp extensions come into play, but we can do that once we have hardware to test on.

Concerning x86 specifics, there is a cost for allocating the stack frame. So if the function has nothing on the stack frame, push/pop becomes a slightly better candidate than a spill. The hook you added does not seem to be able to test this, since it does not have the frame size as a parameter. I wonder if there is an easy way to get it in?

Also, for old CPUs with no stack prediction engine we split either one or two push instructions into an adjustment+move pair. I do not see how to take that into account, since the cost of 1 or 2 registers then differs from 3 or more, but I also think we do not need to care about this, since all reasonably current CPUs have stack prediction.

I am benchmarking the updated patch and will send it once that is done.

Honza
Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale
> Jan Hubicka writes:
> > Concerning x86 specifics, there is a cost for allocating the stack frame.  So
> > if the function has nothing on the stack frame, push/pop becomes a slightly better
> > candidate than a spill.  The hook you added does not seem to be able to
> > test this, since it does not have the frame size as a parameter.  I wonder
> > if there is an easy way to get it in?
>
> The main frame size is available globally as get_frame_size ().
> There's also the question of whether a frame needs to be created
> for other reasons, such as an alloca call, but I suppose setting
> up a frame for just alloca would also use push on x86?

Usually the frame is first created by push/pop instructions (which are callee saves and possibly the frame pointer) and the remaining capacity is allocated using add/sub of the ESP pointer. If these can be avoided we save about 8 bytes of code. Performance-wise, the stack engine will likely completely hide the overhead of the extra add/sub.

We need add/sub for caller saves, spilling and on-stack variables. We may be able to hide it in the red zone, but only for leaf functions. get_frame_size, I think, only tells me about the on-stack variables at the time ira-color is performed.

This is something that would be nice to model better, but it is also likely not critical. So I only mentioned it in case you or Vladimir can come up with a nice way to fit this in.

>
> > Also, for old CPUs with no stack prediction engine we split either one or
> > two push instructions into an adjustment+move pair.  I do not see how to
> > take that into account, since the cost of 1 or 2 registers then differs from
> > 3 or more, but I also think we do not need to care about this, since all
> > reasonably current CPUs have stack prediction.
>
> Yeah.  The hook does allow you to test how many registers have been pushed,
> and how many will be pushed after the change that is being costed.
> But giving a higher cost for the first two registers would probably
> tend to penalise using callee-saved registers for the first few allocnos
> that we colour, which are also likely to be the most important allocnos.
> Trying to cost the difference might therefore be counter-productive.

Actually, my memory got this backwards. While I experimented with avoiding only some push/pop instructions on CPUs without a stack engine (those were produced before 2003), it is not in mainline. All we do is the opposite conversion: sometimes we turn a sub/add of ESP into a shorter but more expensive push or pop. This may be accounted for in the frame allocation cost, but again, it only concerns very old CPUs.

Honza

>
> > I am benchmarking the updated patch and will send it once that is done.
>
> Thanks!
>
> Richard
[RFC] RISC-V: The optimization ignored the side effects of the rounding mode, resulting in incorrect results.
We overlooked the side effects of the rounding mode in the pattern, which can impact the result of float_extend and lead to incorrect optimizations in the final program. This issue likely affects nearly all similar patterns that involve rounding modes, and the tests in this patch only highlight one example. It seems challenging to address, and I only implemented a simple fix, which is not a good way to solve the problem. Any comments on this? gcc/ChangeLog: * config/riscv/vector-iterators.md (UNSPEC_VRM): New. * config/riscv/vector.md: Use UNSPEC for float_extend. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/bug-11.c: New test. Reported-by: CunJian Huang Signed-off-by: Jin Ma --- gcc/config/riscv/vector-iterators.md | 3 +++ gcc/config/riscv/vector.md| 6 +++-- .../gcc.target/riscv/rvv/base/bug-11.c| 24 +++ 3 files changed, 31 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md index c1bd7397441..bd592f736e2 100644 --- a/gcc/config/riscv/vector-iterators.md +++ b/gcc/config/riscv/vector-iterators.md @@ -120,6 +120,9 @@ (define_c_enum "unspec" [ UNSPEC_SF_VFNRCLIP UNSPEC_SF_VFNRCLIPU + + ;; Side effects of rounding mode + UNSPEC_VRM ]) (define_c_enum "unspecv" [ diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md index 8ee43cf0ce1..e971dcdc973 100644 --- a/gcc/config/riscv/vector.md +++ b/gcc/config/riscv/vector.md @@ -7135,8 +7135,10 @@ (define_insn "@pred_single_widen__scalar" (plus_minus:VWEXTF (match_operand:VWEXTF 3 "register_operand"" vr, vr, vr, vr") (float_extend:VWEXTF - (vec_duplicate: - (match_operand: 4 "register_operand" " f, f, f, f" + (unspec:VWEXTF + [(vec_duplicate: + (match_operand: 4 "register_operand" " f, f, f, f")) + (reg:SI FRM_REGNUM)] UNSPEC_VRM))) (match_operand:VWEXTF 2 "vector_merge_operand" " vu, 0, vu, 0")))] "TARGET_VECTOR" "vfw.wf\t%0,%3,%4%p1" diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c new file mode 100644 index 000..52d940cb57a --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c @@ -0,0 +1,24 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O2" } */ + +#include + +int main () +{ + float data_store = 0; + int8_t mask = 1; + size_t vl = 1; + float data_load = 0.0; + _Float16 data_sub = 0.0; + vint8mf8_t mask_value = __riscv_vle8_v_i8mf8 (&mask, vl); + vbool64_t vmask = __riscv_vmseq_vx_i8mf8_b64 (mask_value, 1, vl); + vfloat32mf2_t vd_load = __riscv_vfmv_v_f_f32mf2 (0, __riscv_vsetvlmax_e32mf2 ()); + vfloat32mf2_t vreg_memory = __riscv_vle32_v_f32mf2_tu (vd_load, &data_load, vl); + vfloat32mf2_t vreg = __riscv_vfwsub_wf_f32mf2_rm_tum (vmask, vreg_memory, vreg_memory, data_sub, __RISCV_FRM_RDN, vl); + __riscv_vse32_v_f32mf2 (&data_store, vreg, vl); + + __builtin_printf ("%f\n", data_store); + return 0; +} + +/* { dg-output "-0.00\\s+\n" } */ -- 2.25.1
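The reason the rounding mode is observable even for such simple operands can be shown with ordinary scalar code. The sketch below is not RVV code and does not use the intrinsics from the test; `subtract_with_rounding` is a hypothetical helper, and the sketch assumes a target that supports `FE_DOWNWARD`. Under IEEE 754, `0.0 - 0.0` yields `+0.0` in every rounding mode except round-toward-negative, where it yields `-0.0`, which is presumably why the new test expects a negative zero when `__RISCV_FRM_RDN` is requested.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Compute A - B with ROUNDING_MODE in effect, then restore the previous
// mode.  The volatile accesses keep the compiler from folding the
// subtraction at compile time or moving it across the fesetround calls.
float subtract_with_rounding(float a, float b, int rounding_mode)
{
  volatile float va = a;
  volatile float vb = b;
  volatile float result;

  const int old_mode = std::fegetround();
  std::fesetround(rounding_mode);
  result = va - vb;
  std::fesetround(old_mode);

  return result;
}
```

Comparing `std::signbit` of the result for `FE_TONEAREST` and `FE_DOWNWARD` shows the difference; an optimization that treats the operation as independent of the rounding mode would lose it.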
Re: [PATCH] builtins: Ensure sin and cos properly set errno when INFINITY is passed [PR80042]
On 2025-02-18 13:30, Richard Biener wrote: On Tue, Feb 18, 2025 at 1:54 PM Peter0x44 wrote: 18 Feb 2025 8:51:16 am Richard Biener : > On Tue, Feb 18, 2025 at 1:21 AM Sam James wrote: >> >> Peter Damianov writes: >> >>> POSIX says that sin and cos should set errno to EDOM when infinity is >>> passed to >>> them. Make sure this is accounted for in builtins.def, and add tests. >>> >>> gcc/ >>> PR middle-end/80042 >>> * builtins.def: (sin|cos)(f|l) can set errno. >>> gcc/testsuite/ >>> * gcc.dg/pr80042.c: New testcase. >>> --- >>> gcc/builtins.def | 20 +- >>> gcc/testsuite/gcc.dg/pr80042.c | 71 >>> ++ >>> 2 files changed, 82 insertions(+), 9 deletions(-) >>> create mode 100644 gcc/testsuite/gcc.dg/pr80042.c >>> >>> [...] >>> diff --git a/gcc/testsuite/gcc.dg/pr80042.c >>> b/gcc/testsuite/gcc.dg/pr80042.c >>> new file mode 100644 >>> index 000..cc578ae67e2 >>> --- /dev/null >>> +++ b/gcc/testsuite/gcc.dg/pr80042.c >>> @@ -0,0 +1,71 @@ >>> +/* dg-do run */ >>> +/* dg-options "-O2 -lm" */ >> >> These two lines are missing {}. Please double check the logs from your >> testsuite run to make sure newly added/changed tests are executed (and >> in the way you expect). > > This test will also FAIL on *BSD IIRC as that doesn't set errno for any > math > functions. So what do you suggest I do about it? Drop the test, or only enable it for certain known good targets? I don't use BSD so cannot test it. Good question. It's also that old glibc did not set errno here. > > I'll note GCC models sincos as cexpi which does not set errno, and will > eventually expand that to sincos or cexp. It does that without any > restriction on -fno-math-errno. Is this a problem? Would I need to disable expansion to cexp with -fmath-errno make this work? I think that the code might assume sin()/cos() is always CONST/PURE and that for "POSIX-y correctness" we'd have to guard the transform with -fno-math-errno. Okay. I will look at doing that. 
> I'll also note the C standard does not document any domain error on +- > Inf arguments. > Instead it documents a range error for sin(x) and nonzero x too close > to zero. https://pubs.opengroup.org/onlinepubs/9699919799/functions/sin.html POSIX does specify it should be a domain error, but C itself doesn't seem to say anything regarding it other than basically "implementations are allowed to invent errors for this case". So what's the point of your patch? That GCC does not assume sin/cos will not clobber errno? Maybe the testcase can be rewritten to consider that? Like check that we did not fold the != EDOM checks at compile-time instead of hard-requiring the library to set that error? Yes, that's the point. I'm not really sure how to check that specifically instead of executing the code, but I should figure it out. I think a test written in this way would also avoid the mentioned problems of the libraries which don't set errno. Richard. > > Richard. > >> >>> [...]
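The usage pattern under discussion (clear errno, call sin, then inspect errno) looks roughly like the sketch below. `sin_result` and `checked_sin` are hypothetical names introduced for illustration. As noted in the thread, whether the library sets EDOM for an infinite argument varies between implementations, so the sketch records errno without assuming a particular value; what matters for the compiler is that the read of errno after the call is not folded away.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>
#include <limits>

// Result of a sin call together with the errno value observed after it.
struct sin_result
{
  double value;
  int err;
};

// Call sin on X and report both the result and errno.  The volatile
// copy keeps the compiler from evaluating the call at compile time.
sin_result checked_sin(double x)
{
  volatile double vx = x;
  errno = 0;
  const double r = std::sin(vx);
  return { r, errno };
}
```

On a POSIX-conforming library `checked_sin` of infinity would report EDOM, while on the libraries mentioned above that do not set errno for math functions it would report 0; in both cases the returned value is a NaN on IEEE 754 systems.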
Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale
Jan Hubicka writes:
> Concerning x86 specifics, there is a cost for allocating the stack frame.  So
> if the function has nothing on the stack frame, push/pop becomes a slightly better
> candidate than a spill.  The hook you added does not seem to be able to
> test this, since it does not have the frame size as a parameter.  I wonder
> if there is an easy way to get it in?

The main frame size is available globally as get_frame_size (). There's also the question of whether a frame needs to be created for other reasons, such as an alloca call, but I suppose setting up a frame for just alloca would also use push on x86?

> Also, for old CPUs with no stack prediction engine we split either one or
> two push instructions into an adjustment+move pair.  I do not see how to
> take that into account, since the cost of 1 or 2 registers then differs from
> 3 or more, but I also think we do not need to care about this, since all
> reasonably current CPUs have stack prediction.

Yeah. The hook does allow you to test how many registers have been pushed, and how many will be pushed after the change that is being costed. But giving a higher cost for the first two registers would probably tend to penalise using callee-saved registers for the first few allocnos that we colour, which are also likely to be the most important allocnos. Trying to cost the difference might therefore be counter-productive.

> I am benchmarking the updated patch and will send it once that is done.

Thanks!

Richard
Re: [PATCH v2 03/16] Add string_slice class.
Alfie Richards writes: > The string_slice inherits from array_slice and is used to refer to a > substring of an array that is memory managed elsewhere without modifying > the underlying array. > > For example, this is useful in cases such as when needing to refer to a > substring of an attribute in the syntax tree. > > This commit also adds some minimal helper functions for string_slice, > such as a strtok alternative, equality operators, strcmp, and a function > to strip whitespace from the beginning and end of a string_slice. > > gcc/ChangeLog: > > * vec.cc (string_slice::strtok): New method. > (strcmp): Add implementation for string_slice. > (string_slice::strip): New method. > (test_string_slice_initializers): New test. > (test_string_slice_strtok): Ditto. > (test_string_slice_strcmp): Ditto. > (test_string_slice_equality): Ditto. > (test_string_slice_invalid): Ditto. > (test_string_slice_strip): Ditto. > (vec_cc_tests): Add new tests. > * vec.h (class string_slice): New class. > (strcmp): Add implementation for string_slice. Thanks, mostly LGTM. Some very minor things below, and a question: > diff --git a/gcc/vec.cc b/gcc/vec.cc > index 55f5f3dd447..189cb492c7e 100644 > --- a/gcc/vec.cc > +++ b/gcc/vec.cc > @@ -176,6 +176,61 @@ dump_vec_loc_statistics (void) >vec_mem_desc.dump (VEC_ORIGIN); > } > > +string_slice > +string_slice::tokenize (string_slice *str, string_slice delims) > +{ > + const char *ptr = str->begin (); > + > + gcc_assert (str->is_valid () && delims.is_valid ()); > + > + for (; ptr < str->end (); ptr++) > +for (char c : delims) > + if (*ptr == c) > + { > + /* Update the input string to be the remaining string. */ > + const char* str_begin = str->begin (); Formatting nit: const char *str_begin > + *str = string_slice (ptr + 1, str->end ()); > + return string_slice (str_begin, ptr); > + } > + > + /* If no deliminators between the start and end, return the whole string. 
> */ > + string_slice res = *str; > + *str = string_slice::invalid (); > + return res; > +} > + > +int > +strcmp (string_slice str1, string_slice str2) > +{ > + for (unsigned int i = 0; i < str1.size () && i < str2.size (); i++) > +{ > + if (str1[i] < str2[i]) > + return -1; > + if (str1[i] > str2[i]) > + return 1; > +} > + > + if (str1.size () < str2.size ()) > +return -1; > + if (str1.size () > str2.size ()) > +return 1; > + return 0; > +} > + > +string_slice > +string_slice::strip () > +{ > + const char *start = this->begin (); > + const char *end = this->end (); > + > + while (start < end && ISSPACE (*start)) > +start++; > + while (end > start && ISSPACE (*(end-1))) > +end--; > + > + return string_slice (start, end-start); Just string_slice (start, end) should be enough. > +} > + > #if CHECKING_P > /* Report qsort comparator CMP consistency check failure with P1, P2, P3 as > witness elements. */ > [...] > diff --git a/gcc/vec.h b/gcc/vec.h > index 915df06f03e..d709d339d40 100644 > --- a/gcc/vec.h > +++ b/gcc/vec.h > @@ -2484,4 +2484,69 @@ make_array_slice (T *base, unsigned int size) > # pragma GCC poison m_vec m_vecpfx m_vecdata > #endif > > +/* string_slice inherits from array_slice, specifically to refer to a > substring > + of a character array. > + It includes some string like helpers. */ > +class string_slice : public array_slice > +{ > +public: > + explicit string_slice () : array_slice () {} > + explicit string_slice (const char *str) : array_slice (str, strlen (str)) > {} > + explicit string_slice (const char *str, size_t len) : > +array_slice (str, len) {} > + explicit string_slice (const char *start, const char *end) : > +array_slice (start, end-start) {} Formatting nit: end - start. What was the reason for making the constructors explicit? It would be nice if string literals at least could be used implicitly. 
Thanks, Richard > + > + friend bool operator== (const string_slice &lhs, const string_slice &rhs) > + { > +if (!lhs.is_valid () || !rhs.is_valid ()) > + return false; > +if (lhs.size () != rhs.size ()) > + return false; > +return memcmp (lhs.begin (), rhs.begin (), lhs.size ()) == 0; > + } > + > + friend bool operator== (const char *lhs, const string_slice &rhs) > + { > +return string_slice (lhs) == rhs; > + } > + > + friend bool operator== (const string_slice &lhs, const char *rhs) > + { > +return lhs == string_slice (rhs); > + } > + > + friend bool operator!= (const string_slice &lhs, const string_slice &rhs) > + { > +return !(lhs == rhs); > + } > + > + friend bool operator!= (const char *lhs, const string_slice &rhs) > + { > +return !(string_slice (lhs) == rhs); > + } > + > + friend bool operator!= (const string_slice &lhs, const char *rhs) > + { > +return !(lhs == string_slice (rhs)); > + } > + > + /* Returns an inval
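The tokenize and strip semantics in the quoted patch can be sketched with the standard library for readers who want to experiment outside GCC. This is an analogue, not the proposed class: it uses `std::string_view` in place of `string_slice`, models the "invalid" slice with an empty `std::optional`, and the names `next_token`, `strip`, and `split_all` are hypothetical. It assumes C++17.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Return the text before the first delimiter in *REST and update REST to
// the remainder.  If no delimiter is found, return the whole string and
// reset REST, which plays the role of the invalid slice in the patch.
// REST must hold a value on entry.
std::string_view next_token(std::optional<std::string_view> &rest,
                            std::string_view delims)
{
  const std::string_view str = *rest;
  const std::size_t pos = str.find_first_of(delims);
  if (pos == std::string_view::npos) {
    rest.reset();
    return str;
  }
  rest = str.substr(pos + 1);
  return str.substr(0, pos);
}

// Remove whitespace from both ends of S without touching the underlying
// character array.
std::string_view strip(std::string_view s)
{
  while (!s.empty() && std::isspace(static_cast<unsigned char>(s.front())))
    s.remove_prefix(1);
  while (!s.empty() && std::isspace(static_cast<unsigned char>(s.back())))
    s.remove_suffix(1);
  return s;
}

// Split TEXT at any character in DELIMS and strip each piece.
std::vector<std::string_view> split_all(std::string_view text,
                                        std::string_view delims)
{
  std::vector<std::string_view> out;
  std::optional<std::string_view> rest = text;
  while (rest)
    out.push_back(strip(next_token(rest, delims)));
  return out;
}
```

As in the patch, an empty remainder is still a valid input, so a trailing delimiter produces a final empty token rather than ending the iteration early.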
[PATCH][stage1] middle-end/60779 - LTO vs. -fcx-fortran-rules and -fcx-limited-range
The following changes how flag_complex_method is managed, so that it can be recorded in the optimization set and streamed and restored per function. Currently -fcx-fortran-rules and -fcx-limited-range are separately recorded options, but saving/restoring does not restore flag_complex_method, which is later used in the middle-end.

The solution is to make -fcx-fortran-rules and -fcx-limited-range aliases of a new -fcx-method= switch that represents flag_complex_method directly, so we can save and restore it.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

How do we go about documenting aliased flags? I'm hoping for test coverage of language-specific defaults.

We allowed inlining of -fcx-limited-range into -fno-cx-limited-range (but failed to check -fcx-fortran-rules). Such inlining would pessimize complex multiplication/division, but I've preserved this behavior and properly based it on flag_complex_method.

OK for stage1?

Thanks,
Richard.

	PR middle-end/60779
	* common.opt (fcx-method=): New, map to flag_complex_method.
	(Enum complex_method): New.
	(fcx-limited-range): Alias to -fcx-method=limited-range.
	(fcx-fortran-rules): Alias to -fcx-method=fortran.
	* ipa-inline-transform.cc (inline_call): Check flag_complex_method.
	* ipa-inline.cc (can_inline_edge_by_limits_p): Likewise.
	* opts.cc (finish_options): Adjust.
	(set_fast_math_flags): Likewise.
	* doc/invoke.texi (fcx-method=): Document.

	* gcc.dg/lto/pr60779_0.c: New testcase.
	* gcc.dg/lto/pr60779_1.c: Likewise.
--- gcc/common.opt | 28 gcc/doc/invoke.texi | 14 ++ gcc/ipa-inline-transform.cc | 8 gcc/ipa-inline.cc| 2 +- gcc/opts.cc | 16 gcc/testsuite/gcc.dg/lto/pr60779_0.c | 21 + gcc/testsuite/gcc.dg/lto/pr60779_1.c | 6 ++ 7 files changed, 66 insertions(+), 29 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/lto/pr60779_0.c create mode 100644 gcc/testsuite/gcc.dg/lto/pr60779_1.c diff --git a/gcc/common.opt b/gcc/common.opt index 4c2560a0632..b5c1d41abe9 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -53,12 +53,6 @@ bool in_lto_p = false Variable enum incremental_link flag_incremental_link = INCREMENTAL_LINK_NONE -; 0 means straightforward implementation of complex divide acceptable. -; 1 means wide ranges of inputs must work for complex divide. -; 2 means C99-like requirements for complex multiply and divide. -Variable -int flag_complex_method = 1 - Variable int flag_default_complex_method = 1 @@ -1292,12 +1286,30 @@ fcse-skip-blocks Common Ignore Does nothing. Preserved for backward compatibility. +fcx-method= +Common Joined RejectNegative Enum(complex_method) Var(flag_complex_method) Optimization SetByCombined + +Enum +Name(complex_method) Type(int) + +; straightforward implementation of complex divide acceptable. +EnumValue +Enum(complex_method) String(limited-range) Value(0) + +; wide ranges of inputs must work for complex divide. +EnumValue +Enum(complex_method) String(fortran) Value(1) + +; C99-like requirements for complex multiply and divide. +EnumValue +Enum(complex_method) String(stdc) Value(2) + fcx-limited-range -Common Var(flag_cx_limited_range) Optimization SetByCombined +Common Alias(fcx-method=,limited-range,stdc) Omit range reduction step when performing complex division. fcx-fortran-rules -Common Var(flag_cx_fortran_rules) Optimization +Common Alias(fcx-method=,fortran,stdc) Complex multiplication and division follow Fortran rules. 
fdata-sections diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index d9b0278228f..8779488027b 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -574,7 +574,7 @@ Objective-C and Objective-C++ Dialects}. -ffold-mem-offsets -fcompare-elim -fcprop-registers -fcrossjumping -fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules --fcx-limited-range +-fcx-limited-range -fcx-method -fdata-sections -fdce -fdelayed-branch -fdelete-null-pointer-checks -fdevirtualize -fdevirtualize-speculatively -fdevirtualize-at-ltrans -fdse @@ -15482,8 +15482,7 @@ When enabled, this option states that a range reduction step is not needed when performing complex division. Also, there is no checking whether the result of a complex multiplication or division is @code{NaN + I*NaN}, with an attempt to rescue the situation in that case. The -default is @option{-fno-cx-limited-range}, but is enabled by -@option{-ffast-math}. +option is enabled by @option{-ffast-math}. This option controls the default setting of the ISO C99 @code{CX_LIMITED_RANGE} pragma. Nevertheless, the option applies to @@ -15496,7 +15495,14 @@ reduction is done as part of complex division, but there is no checking whether the result of a complex multiplication or division is @code{NaN +
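As a reading aid for the common.opt hunk in this patch, here is a minimal sketch in plain C of the mapping it sets up. The enum values (0, 1, 2) and the strings come from the EnumValue entries in the patch; the helper names `parse_cx_method` and `resolve_cx_alias` are hypothetical and exist only for this illustration. The sketch assumes that `Alias(fcx-method=,X,stdc)` means the positive flag selects X and the negated flag selects stdc, which is how the hunk reads but is not spelled out in the mail.

```c
#include <assert.h>
#include <string.h>

/* Values mirror the EnumValue entries in the patch's common.opt hunk.  */
enum complex_method
{
  CX_LIMITED_RANGE = 0, /* Straightforward complex divide is acceptable.  */
  CX_FORTRAN = 1,       /* Wide ranges of inputs must work.  */
  CX_STDC = 2           /* C99-like requirements for multiply and divide.  */
};

/* Hypothetical helper: map the argument of -fcx-method= to its value.
   Returns -1 for a string the enum does not know.  */
int
parse_cx_method (const char *arg)
{
  assert (arg != NULL);
  if (strcmp (arg, "limited-range") == 0)
    return CX_LIMITED_RANGE;
  if (strcmp (arg, "fortran") == 0)
    return CX_FORTRAN;
  if (strcmp (arg, "stdc") == 0)
    return CX_STDC;
  return -1;
}

/* Hypothetical helper: resolve one of the old flags (without the leading
   dash) to a method, assuming the negated form falls back to stdc.
   Returns -1 for anything that is not one of the aliased flags.  */
int
resolve_cx_alias (const char *flag)
{
  assert (flag != NULL);
  if (strcmp (flag, "fcx-limited-range") == 0)
    return parse_cx_method ("limited-range");
  if (strcmp (flag, "fcx-fortran-rules") == 0)
    return parse_cx_method ("fortran");
  if (strcmp (flag, "fno-cx-limited-range") == 0
      || strcmp (flag, "fno-cx-fortran-rules") == 0)
    return parse_cx_method ("stdc");
  return -1;
}
```

With a single variable carrying the method, saving and restoring the optimization set for a function only has to copy one value, which is the point of the patch.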
Re: [PATCH 1/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]
On Tue, Feb 18, 2025 at 10:27:46AM +, Richard Sandiford wrote: > Thanks, this generally looks really good. Some comments on top of > Kyrill's, and Christophe's comment internally about -save-temps. > > Spencer Abson writes: > > +/* Build and return a new VECTOR_CST that is the concatenation of > > + VEC_IN with itself. */ > > +static tree > > +aarch64_self_concat_vec_cst (tree vec_in) > > +{ > > + gcc_assert ((TREE_CODE (vec_in) == VECTOR_CST)); > > + unsigned HOST_WIDE_INT nelts > > += VECTOR_CST_NELTS (vec_in).to_constant (); > > + > > + tree out_type = build_vector_type (TREE_TYPE (TREE_TYPE (vec_in)), > > +nelts * 2); > > It would be good to pass in the type that the caller wants. > More about that below. Yeah, I can see the advantage of that. > > > + > > + /* Avoid decoding/encoding if the encoding won't change. */ > > + if (VECTOR_CST_DUPLICATE_P (vec_in)) > > +{ > > + tree vec_out = make_vector (exact_log2 > > +(VECTOR_CST_NPATTERNS (vec_in)), 1); > > + unsigned int encoded_size > > + = vector_cst_encoded_nelts (vec_in) * sizeof (tree); > > + > > + memcpy (VECTOR_CST_ENCODED_ELTS (vec_out), > > + VECTOR_CST_ENCODED_ELTS (vec_in), encoded_size); > > + > > + TREE_TYPE (vec_out) = out_type; > > + return vec_out; > > +} > > I'm not sure this is worth it. The approach below shouldn't be that > much less efficient, since all the temporaries are generally on the > stack. Also: > > > + > > + tree_vector_builder vec_out (out_type, nelts, 1); > > This call rightly describes a duplicated sequence of NELTS elements so... > > > + for (unsigned i = 0; i < nelts * 2; i++) > > +vec_out.quick_push (VECTOR_CST_ELT (vec_in, i % nelts)); > > ...it should only be necessary to push nelts elements here. Good point! > > > + > > + return vec_out.build (); > > +} > > + > > +/* If the SSA_NAME_DEF_STMT of ARG is an assignement to a > > + BIT_FIELD_REF with SIZE and OFFSET, return the object of the > > + BIT_FIELD_REF. Otherwise, return NULL_TREE. 
*/ > > +static tree > > +aarch64_object_of_bfr (tree arg, unsigned HOST_WIDE_INT size, > > + unsigned HOST_WIDE_INT offset) > > +{ > > + if (TREE_CODE (arg) != SSA_NAME) > > +return NULL_TREE; > > + > > + gassign *stmt = dyn_cast (SSA_NAME_DEF_STMT (arg)); > > + > > + if (!stmt) > > +return NULL_TREE; > > + > > + if (gimple_assign_rhs_code (stmt) != BIT_FIELD_REF) > > +return NULL_TREE; > > + > > + tree bf_ref = gimple_assign_rhs1 (stmt); > > + > > + if (bit_field_size (bf_ref).to_constant () != size > > + || bit_field_offset (bf_ref).to_constant () != offset) > > +return NULL_TREE; > > + > > + return TREE_OPERAND (bf_ref, 0); > > I think this also needs to check that operand 0 of the BIT_FIELD_REF > is a 128-bit vector. A 64-bit reference at offset 64 could instead > be into something else, such as a 256-bit vector. > > An example is: > > -- > #include > > typedef int16_t int16x16_t __attribute__((vector_size(32))); > > int32x4_t > f (int16x16_t foo) > { > return vmovl_s16 ((int16x4_t) { foo[4], foo[5], foo[6], foo[7] }); > } > -- > > which triggers an ICE. > > Even if the argument is a 128-bit vector, it could be a 128-bit > vector of a different type, such as in: > > -- > #include > > int32x4_t > f (int32x4_t foo) > { > return vmovl_s16 (vget_high_s16 (vreinterpretq_s16_s32 (foo))); > } > -- > > I think we should still accept this second case, but emit a VIEW_CONVERT_EXPR > before the call to convert the argument to the right type. > Thanks for raising these, serious tunnel vision on my part... > > +} > > + > > +/* Prefer to use the highpart builtin when: > > + > > +1) All lowpart arguments are references to the highparts of other > > +vectors. > > + > > +2) For calls with two lowpart arguments, if either refers to a > > +vector highpart and the other is a VECTOR_CST. We can copy the > > +VECTOR_CST to 128b in this case. 
*/ > > +static bool > > +aarch64_fold_lo_call_to_hi (tree arg_0, tree arg_1, tree *out_0, > > + tree *out_1) > > +{ > > + /* Punt until as late as possible: > > + > > + 1) By folding away BIT_FIELD_REFs we remove information about the > > + operands that may be useful to other optimizers. > > + > > + 2) For simplicity, we'd like the expression > > + > > + x = BIT_FIELD_REF > > + > > + to imply that A is not a VECTOR_CST. This assumption is unlikely > > + to hold before constant propagation/folding. */ > > + if (!(cfun->curr_properties & PROP_last_full_fold)) > > +return false; > > + > > + unsigned int offset = B
Re: [PATCH] arm: Remove inner 'fix:HF/SF/DF' from fixed-point patterns (PR 117712)
On 18/02/2025 08:37, Christophe Lyon wrote:
> As discussed in the PR, removing the inner 'fix:HF/SF/DF' fixes the
> problem, like other targets do.
>

The double-'fix' idiom was introduced in https://gcc.gnu.org/pipermail/gcc-patches/2003-March/098380.html to address target/5985. Certainly at the time it seems that FIX had two meanings depending on the mode. If the target was a floating point mode it did a truncation operation with rounding. If it was an integer mode it did truncation with unspecified rounding.

But the manual doesn't seem to mention FIX: (at least not now), so I'm wondering if something has been lost somewhere along the line.

Anyway, I'm not sure this is right yet.

R.

> gcc/ChangeLog:
>
> 	PR rtl-optimization/117712
> 	* config/arm/arm.md (fix_trunchfsi2): Remove inner fix:HF.
> 	(fix_trunchfdi2): Likewise.
> 	(fix_truncsfsi2): Remove inner fix:SF.
> 	(fix_truncdfsi2): Remove inner fix:DF.
> 	* config/arm/vfp.md (truncsisf2_vfp): Remove inner fix:SF.
> 	(truncsidf2_vfp): Remove inner fix:DF.
> 	(fixuns_truncsfsi2): Remove inner fix:SF.
> 	(fixuns_truncdfsi2): Remove inner fix:DF.
>
> gcc/testsuite/ChangeLog:
>
> 	PR rtl-optimization/117712
> 	* gcc.target/arm/pr117712-df.c: New test.
> 	* gcc.target/arm/pr117712-hf-di.c: New test.
> 	* gcc.target/arm/pr117712-hf.c: New test.
> 	* gcc.target/arm/pr117712-sf.c: New test.
> --- > gcc/config/arm/arm.md | 8 > gcc/config/arm/vfp.md | 8 > gcc/testsuite/gcc.target/arm/pr117712-df.c| 10 ++ > gcc/testsuite/gcc.target/arm/pr117712-hf-di.c | 10 ++ > gcc/testsuite/gcc.target/arm/pr117712-hf.c| 10 ++ > gcc/testsuite/gcc.target/arm/pr117712-sf.c| 10 ++ > 6 files changed, 48 insertions(+), 8 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/arm/pr117712-df.c > create mode 100644 gcc/testsuite/gcc.target/arm/pr117712-hf-di.c > create mode 100644 gcc/testsuite/gcc.target/arm/pr117712-hf.c > create mode 100644 gcc/testsuite/gcc.target/arm/pr117712-sf.c > > diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md > index 442d86b9329..ed0d0da2e63 100644 > --- a/gcc/config/arm/arm.md > +++ b/gcc/config/arm/arm.md > @@ -5477,7 +5477,7 @@ (define_expand "floatsidf2" > > (define_expand "fix_trunchfsi2" >[(set (match_operand:SI 0 "general_operand") > - (fix:SI (fix:HF (match_operand:HF 1 "general_operand"] > + (fix:SI (match_operand:HF 1 "general_operand")))] >"TARGET_EITHER" >" >{ > @@ -5489,7 +5489,7 @@ (define_expand "fix_trunchfsi2" > > (define_expand "fix_trunchfdi2" >[(set (match_operand:DI 0 "general_operand") > - (fix:DI (fix:HF (match_operand:HF 1 "general_operand"] > + (fix:DI (match_operand:HF 1 "general_operand")))] >"TARGET_EITHER" >" >{ > @@ -5501,14 +5501,14 @@ (define_expand "fix_trunchfdi2" > > (define_expand "fix_truncsfsi2" >[(set (match_operand:SI 0 "s_register_operand") > - (fix:SI (fix:SF (match_operand:SF 1 "s_register_operand"] > + (fix:SI (match_operand:SF 1 "s_register_operand")))] >"TARGET_32BIT && TARGET_HARD_FLOAT" >" > ") > > (define_expand "fix_truncdfsi2" >[(set (match_operand:SI 0 "s_register_operand") > - (fix:SI (fix:DF (match_operand:DF 1 "s_register_operand"] > + (fix:SI (match_operand:DF 1 "s_register_operand")))] >"TARGET_32BIT && TARGET_HARD_FLOAT && !TARGET_VFP_SINGLE" >" > ") > diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md > index 379f5f7b3dc..0ef019b1727 100644 > --- 
a/gcc/config/arm/vfp.md > +++ b/gcc/config/arm/vfp.md > @@ -1508,7 +1508,7 @@ (define_insn "truncsfhf2" > > (define_insn "*truncsisf2_vfp" >[(set (match_operand:SI 0 "s_register_operand" "=t") > - (fix:SI (fix:SF (match_operand:SF 1 "s_register_operand" "t"] > + (fix:SI (match_operand:SF 1 "s_register_operand" "t")))] >"TARGET_32BIT && TARGET_HARD_FLOAT" >"vcvt%?.s32.f32\\t%0, %1" >[(set_attr "predicable" "yes") > @@ -1517,7 +1517,7 @@ (define_insn "*truncsisf2_vfp" > > (define_insn "*truncsidf2_vfp" >[(set (match_operand:SI 0 "s_register_operand" "=t") > - (fix:SI (fix:DF (match_operand:DF 1 "s_register_operand" "w"] > + (fix:SI (match_operand:DF 1 "s_register_operand" "w")))] >"TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP_DOUBLE" >"vcvt%?.s32.f64\\t%0, %P1" >[(set_attr "predicable" "yes") > @@ -1527,7 +1527,7 @@ (define_insn "*truncsidf2_vfp" > > (define_insn "fixuns_truncsfsi2" >[(set (match_operand:SI 0 "s_register_operand" "=t") > - (unsigned_fix:SI (fix:SF (match_operand:SF 1 "s_register_operand" > "t"] > + (unsigned_fix:SI (match_operand:SF 1 "s_register_operand" "t")))] >"TARGET_32BIT && TARGET_HARD_FLOAT" >"vcvt%?.u32.f32\\t%0, %1" >[(set_a
Re: [PATCH] builtins: Ensure sin and cos properly set errno when INFINITY is passed [PR80042]
18 Feb 2025 8:51:16 am Richard Biener :

On Tue, Feb 18, 2025 at 1:21 AM Sam James wrote:

Peter Damianov writes:

POSIX says that sin and cos should set errno to EDOM when infinity is passed to them. Make sure this is accounted for in builtins.def, and add tests.

gcc/
	PR middle-end/80042
	* builtins.def: (sin|cos)(f|l) can set errno.

gcc/testsuite/
	* gcc.dg/pr80042.c: New testcase.
---
 gcc/builtins.def | 20 +-
 gcc/testsuite/gcc.dg/pr80042.c | 71 ++
 2 files changed, 82 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr80042.c

[...]

diff --git a/gcc/testsuite/gcc.dg/pr80042.c b/gcc/testsuite/gcc.dg/pr80042.c
new file mode 100644
index 000..cc578ae67e2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr80042.c
@@ -0,0 +1,71 @@
+/* dg-do run */
+/* dg-options "-O2 -lm" */

These two lines are missing {}. Please double check the logs from your testsuite run to make sure newly added/changed tests are executed (and in the way you expect).

This test will also FAIL on *BSD IIRC as that doesn't set errno for any math functions.

So what do you suggest I do about it? Drop the test, or only enable it for certain known good targets? I don't use BSD so cannot test it.

I'll note GCC models sincos as cexpi which does not set errno, and will eventually expand that to sincos or cexp. It does that without any restriction on -fno-math-errno.

Is this a problem? Would I need to disable expansion to cexp with -fmath-errno to make this work?

I'll also note the C standard does not document any domain error on +- Inf arguments. Instead it documents a range error for sin(x) and nonzero x too close to zero.

https://pubs.opengroup.org/onlinepubs/9699919799/functions/sin.html

POSIX does specify it should be a domain error, but C itself doesn't seem to say anything regarding it other than basically "implementations are allowed to invent errors for this case".

Richard.

[...]
Re: [PATCH 1/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]
Thanks, this generally looks really good. Some comments on top of Kyrill's, and Christophe's comment internally about -save-temps. Spencer Abson writes: > +/* Build and return a new VECTOR_CST that is the concatenation of > + VEC_IN with itself. */ > +static tree > +aarch64_self_concat_vec_cst (tree vec_in) > +{ > + gcc_assert ((TREE_CODE (vec_in) == VECTOR_CST)); > + unsigned HOST_WIDE_INT nelts > += VECTOR_CST_NELTS (vec_in).to_constant (); > + > + tree out_type = build_vector_type (TREE_TYPE (TREE_TYPE (vec_in)), > + nelts * 2); It would be good to pass in the type that the caller wants. More about that below. > + > + /* Avoid decoding/encoding if the encoding won't change. */ > + if (VECTOR_CST_DUPLICATE_P (vec_in)) > +{ > + tree vec_out = make_vector (exact_log2 > + (VECTOR_CST_NPATTERNS (vec_in)), 1); > + unsigned int encoded_size > + = vector_cst_encoded_nelts (vec_in) * sizeof (tree); > + > + memcpy (VECTOR_CST_ENCODED_ELTS (vec_out), > + VECTOR_CST_ENCODED_ELTS (vec_in), encoded_size); > + > + TREE_TYPE (vec_out) = out_type; > + return vec_out; > +} I'm not sure this is worth it. The approach below shouldn't be that much less efficient, since all the temporaries are generally on the stack. Also: > + > + tree_vector_builder vec_out (out_type, nelts, 1); This call rightly describes a duplicated sequence of NELTS elements so... > + for (unsigned i = 0; i < nelts * 2; i++) > +vec_out.quick_push (VECTOR_CST_ELT (vec_in, i % nelts)); ...it should only be necessary to push nelts elements here. > + > + return vec_out.build (); > +} > + > +/* If the SSA_NAME_DEF_STMT of ARG is an assignement to a > + BIT_FIELD_REF with SIZE and OFFSET, return the object of the > + BIT_FIELD_REF. Otherwise, return NULL_TREE. 
*/ > +static tree > +aarch64_object_of_bfr (tree arg, unsigned HOST_WIDE_INT size, > +unsigned HOST_WIDE_INT offset) > +{ > + if (TREE_CODE (arg) != SSA_NAME) > +return NULL_TREE; > + > + gassign *stmt = dyn_cast (SSA_NAME_DEF_STMT (arg)); > + > + if (!stmt) > +return NULL_TREE; > + > + if (gimple_assign_rhs_code (stmt) != BIT_FIELD_REF) > +return NULL_TREE; > + > + tree bf_ref = gimple_assign_rhs1 (stmt); > + > + if (bit_field_size (bf_ref).to_constant () != size > + || bit_field_offset (bf_ref).to_constant () != offset) > +return NULL_TREE; > + > + return TREE_OPERAND (bf_ref, 0); I think this also needs to check that operand 0 of the BIT_FIELD_REF is a 128-bit vector. A 64-bit reference at offset 64 could instead be into something else, such as a 256-bit vector. An example is: -- #include typedef int16_t int16x16_t __attribute__((vector_size(32))); int32x4_t f (int16x16_t foo) { return vmovl_s16 ((int16x4_t) { foo[4], foo[5], foo[6], foo[7] }); } -- which triggers an ICE. Even if the argument is a 128-bit vector, it could be a 128-bit vector of a different type, such as in: -- #include int32x4_t f (int32x4_t foo) { return vmovl_s16 (vget_high_s16 (vreinterpretq_s16_s32 (foo))); } -- I think we should still accept this second case, but emit a VIEW_CONVERT_EXPR before the call to convert the argument to the right type. > +} > + > +/* Prefer to use the highpart builtin when: > + > +1) All lowpart arguments are references to the highparts of other > +vectors. > + > +2) For calls with two lowpart arguments, if either refers to a > +vector highpart and the other is a VECTOR_CST. We can copy the > +VECTOR_CST to 128b in this case. */ > +static bool > +aarch64_fold_lo_call_to_hi (tree arg_0, tree arg_1, tree *out_0, > + tree *out_1) > +{ > + /* Punt until as late as possible: > + > + 1) By folding away BIT_FIELD_REFs we remove information about the > + operands that may be useful to other optimizers. 
> + > + 2) For simplicity, we'd like the expression > + > + x = BIT_FIELD_REF > + > + to imply that A is not a VECTOR_CST. This assumption is unlikely > + to hold before constant propagation/folding. */ > + if (!(cfun->curr_properties & PROP_last_full_fold)) > +return false; > + > + unsigned int offset = BYTES_BIG_ENDIAN ? 0 : 64; > + > + tree hi_arg_0 = aarch64_object_of_bfr (arg_0, 64, offset); > + tree hi_arg_1 = aarch64_object_of_bfr (arg_1, 64, offset); > + if (!hi_arg_0) > +{ > + if (!hi_arg_1 || TREE_CODE (arg_0) != VECTOR_CST) > + return false; > + hi_arg_0 = aarch64_self_concat_vec_cst (arg_0); > +} > + else if (!hi_arg_1) > +{ > + if (TREE_CODE (arg_1) != VECTOR_CST) > + return false; > + hi_arg_1 = aarc
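The review comment about aarch64_object_of_bfr can be illustrated outside of GCC. The sketch below is a simplified model, not GCC code: `struct bit_ref` and `refers_to_high_half` are hypothetical stand-ins for a BIT_FIELD_REF and for the check being requested. It shows why matching only on size 64 and offset 64 is not enough: the same size and offset also describe elements 4..7 of a 256-bit vector of 16-bit elements, as in the first example in the review.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for a BIT_FIELD_REF: the size and offset of the
   reference in bits, plus the width in bits of the object it reads from.
   This is not GCC's tree representation.  */
struct bit_ref
{
  unsigned size;
  unsigned offset;
  unsigned object_bits;
};

/* Return true if REF selects the high 64 bits of a 128-bit vector.
   The offset of the high half follows the patch: 0 for big-endian,
   64 otherwise.  The object_bits test is the extra check the review
   asks for.  */
bool
refers_to_high_half (struct bit_ref ref, bool big_endian)
{
  /* A reference must lie inside the object it refers to.  */
  assert (ref.offset + ref.size <= ref.object_bits);

  unsigned hi_offset = big_endian ? 0 : 64;
  return ref.size == 64
	 && ref.offset == hi_offset
	 && ref.object_bits == 128;
}
```

The second case in the review (a 128-bit vector of a different element type) passes this width check, which is why the suggestion there is to accept it and convert the argument with a VIEW_CONVERT_EXPR rather than reject it.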
Re: [PATCH] aarch64: Use generic_armv8_a_prefetch_tune in generic_armv8_a.h
Soumya AR writes: >> On 18 Feb 2025, at 2:27 PM, Kyrylo Tkachov wrote: >> >> >> >>> On 18 Feb 2025, at 09:48, Kyrylo Tkachov wrote: >>> >>> >>> On 18 Feb 2025, at 09:41, Richard Sandiford wrote: Kyrylo Tkachov writes: > Hi Soumya > >> On 18 Feb 2025, at 09:12, Soumya AR wrote: >> >> generic_armv8_a.h defines generic_armv8_a_prefetch_tune but still uses >> generic_prefetch_tune in generic_armv8_a_tunings. >> >> This patch updates the pointer to generic_armv8_a_prefetch_tune. >> >> This patch was bootstrapped and regtested on aarch64-linux-gnu, no >> regression. >> >> Ok for GCC 15 now? > > Yes, this looks like a simple oversight. > Ok to push to master. I suppose the alternative would be to remove generic_armv8_a_prefetch_tune, since it's (deliberately) identical to generic_prefetch_tune. >>> >>> Looks like we have one prefetch_tune structure for each of the generic >>> tunings (generic, generic_armv8_a, generic_armv9_a). >>> For the sake of symmetry it feels a bit better to have them independently >>> tunable. >>> But as the effects are the same, it may be better to remove it in the >>> interest of less code. >>> >> >> I see Soumya has already pushed her patch. I’m okay with either approach >> tbh, but if Richard prefers we can remove generic_armv8_a_prefetch_tune in a >> separate commit. > > Yeah, missed Richard’s mail. > > Let me know which is preferable, thanks. No, it's fine as is. My comment was just a suggestion. Thanks, Richard
Re: [PATCH v2 05/16] Update is_function_default_version to work with target_version.
Alfie Richards writes: > Notably this respects target_version semantics where an unannotated > function can be the default version. > > gcc/ChangeLog: > > * attribs.cc (is_function_default_version): Add target_version logic. OK for GCC 16, thanks. Richard > --- > gcc/attribs.cc | 27 --- > 1 file changed, 20 insertions(+), 7 deletions(-) > > diff --git a/gcc/attribs.cc b/gcc/attribs.cc > index 56dd18c2fa8..f6667839c01 100644 > --- a/gcc/attribs.cc > +++ b/gcc/attribs.cc > @@ -1279,18 +1279,31 @@ make_dispatcher_decl (const tree decl) >return func_decl; > } > > -/* Returns true if DECL is multi-versioned using the target attribute, and > this > - is the default version. This function can only be used for targets that > do > - not support the "target_version" attribute. */ > +/* Returns true if DECL a multiversioned default. > + With the target attribute semantics, returns true if the function is > marked > + as default with the target version. > + With the target_version attribute semantics, returns true if the function > + is either not annotated, or annotated as default. */ > > bool > is_function_default_version (const tree decl) > { > - if (TREE_CODE (decl) != FUNCTION_DECL > - || !DECL_FUNCTION_VERSIONED (decl)) > + tree attr; > + if (TREE_CODE (decl) != FUNCTION_DECL) > return false; > - tree attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl)); > - gcc_assert (attr); > + if (TARGET_HAS_FMV_TARGET_ATTRIBUTE) > +{ > + if (!DECL_FUNCTION_VERSIONED (decl)) > + return false; > + attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl)); > + gcc_assert (attr); > +} > + else > +{ > + attr = lookup_attribute ("target_version", DECL_ATTRIBUTES (decl)); > + if (!attr) > + return true; > +} >attr = TREE_VALUE (TREE_VALUE (attr)); >return (TREE_CODE (attr) == STRING_CST > && strcmp (TREE_STRING_POINTER (attr), "default") == 0);
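The decision logic in the patched is_function_default_version can be followed more easily in a stand-alone model. This is a sketch under stated assumptions, not GCC code: `struct fn_decl` and its fields are hypothetical replacements for the tree accessors used in the patch (DECL_FUNCTION_VERSIONED, lookup_attribute), and the boolean parameter stands in for TARGET_HAS_FMV_TARGET_ATTRIBUTE.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, flattened view of the bits of a decl the function reads.  */
struct fn_decl
{
  bool is_function;           /* TREE_CODE (decl) == FUNCTION_DECL.  */
  bool versioned;             /* Stands in for DECL_FUNCTION_VERSIONED.  */
  const char *target;         /* "target" attribute string, or NULL.  */
  const char *target_version; /* "target_version" string, or NULL.  */
};

/* Model of the patched logic.  HAS_FMV_TARGET_ATTRIBUTE selects between
   the "target" semantics and the "target_version" semantics.  */
bool
is_default_version (const struct fn_decl *decl, bool has_fmv_target_attribute)
{
  assert (decl != NULL);
  if (!decl->is_function)
    return false;

  const char *attr;
  if (has_fmv_target_attribute)
    {
      /* "target" semantics: only versioned functions qualify, and they
	 must carry the attribute (the patch uses gcc_assert here).  */
      if (!decl->versioned)
	return false;
      attr = decl->target;
      assert (attr != NULL);
    }
  else
    {
      /* "target_version" semantics: an unannotated function is the
	 default version.  */
      attr = decl->target_version;
      if (attr == NULL)
	return true;
    }
  return strcmp (attr, "default") == 0;
}
```

The visible difference between the two modes is the unannotated case: it is never a default under the "target" semantics, and always one under the "target_version" semantics.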
Re: [RFC] RISC-V: The optimization ignored the side effects of the rounding mode, resulting in incorrect results.
We already have a use of "(reg:SI FRM_REGNUM)" within the pattern, is it not enough? I believe the answer is not enough so you propose this patch, so could you explain a few more about what happened? (define_insn "@pred_single_widen__scalar" [(set (match_operand:VWEXTF 0 "register_operand""=vd, vd, vr, vr") (if_then_else:VWEXTF (unspec: [(match_operand: 1 "vector_mask_operand" " vm, vm,Wc1,Wc1") (match_operand 5 "vector_length_operand" "rvl,rvl,rvl,rvl") (match_operand 6 "const_int_operand" " i, i, i, i") (match_operand 7 "const_int_operand" " i, i, i, i") (match_operand 8 "const_int_operand" " i, i, i, i") (match_operand 9 "const_int_operand" " i, i, i, i") (reg:SI VL_REGNUM) (reg:SI VTYPE_REGNUM) (reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE) <-here (plus_minus:VWEXTF (match_operand:VWEXTF 3 "register_operand"" vr, vr, vr, vr") (float_extend:VWEXTF (vec_duplicate: (match_operand: 4 "register_operand" " f, f, f, f" (match_operand:VWEXTF 2 "vector_merge_operand" " vu, 0, vu, 0")))] On Tue, Feb 18, 2025 at 7:14 PM Jin Ma wrote: > > We overlooked the side effects of the rounding mode in the pattern, > which can impact the result of float_extend and lead to incorrect > optimizations in the final program. This issue likely affects nearly > all similar patterns that involve rounding modes, and the tests in > this patch only highlight one example. It seems challenging to address, > and I only implemented a simple fix, which is not a good way to solve > the problem. > > Any comments on this? > > gcc/ChangeLog: > > * config/riscv/vector-iterators.md (UNSPEC_VRM): New. > * config/riscv/vector.md: Use UNSPEC for float_extend. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/base/bug-11.c: New test. 
> > Reported-by: CunJian Huang > Signed-off-by: Jin Ma > --- > gcc/config/riscv/vector-iterators.md | 3 +++ > gcc/config/riscv/vector.md| 6 +++-- > .../gcc.target/riscv/rvv/base/bug-11.c| 24 +++ > 3 files changed, 31 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c > > diff --git a/gcc/config/riscv/vector-iterators.md > b/gcc/config/riscv/vector-iterators.md > index c1bd7397441..bd592f736e2 100644 > --- a/gcc/config/riscv/vector-iterators.md > +++ b/gcc/config/riscv/vector-iterators.md > @@ -120,6 +120,9 @@ (define_c_enum "unspec" [ > >UNSPEC_SF_VFNRCLIP >UNSPEC_SF_VFNRCLIPU > + > + ;; Side effects of rounding mode > + UNSPEC_VRM > ]) > > (define_c_enum "unspecv" [ > diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md > index 8ee43cf0ce1..e971dcdc973 100644 > --- a/gcc/config/riscv/vector.md > +++ b/gcc/config/riscv/vector.md > @@ -7135,8 +7135,10 @@ (define_insn > "@pred_single_widen__scalar" > (plus_minus:VWEXTF > (match_operand:VWEXTF 3 "register_operand"" vr, vr, > vr, vr") > (float_extend:VWEXTF > - (vec_duplicate: > - (match_operand: 4 "register_operand" " f, f, > f, f" > + (unspec:VWEXTF > + [(vec_duplicate: > + (match_operand: 4 "register_operand" " f, f, > f, f")) > + (reg:SI FRM_REGNUM)] UNSPEC_VRM))) > (match_operand:VWEXTF 2 "vector_merge_operand" " vu, 0, > vu, 0")))] >"TARGET_VECTOR" >"vfw.wf\t%0,%3,%4%p1" > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c > b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c > new file mode 100644 > index 000..52d940cb57a > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c > @@ -0,0 +1,24 @@ > +/* { dg-do run { target { riscv_v } } } */ > +/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O2" } */ > + > +#include > + > +int main () > +{ > + float data_store = 0; > + int8_t mask = 1; > + size_t vl = 1; > + float data_load = 0.0; > + _Float16 data_sub = 0.0; > + vint8mf8_t mask_value = __riscv_vle8_v_i8mf8 (&mask, vl); 
> + vbool64_t vmask = __riscv_vmseq_vx_i8mf8_b64 (mask_value, 1, vl); > + vfloat32mf2_t vd_load = __riscv_vfmv_v_f_f32mf2 (0, > __riscv_vsetvlmax_e32mf2 ()); > + vfloat32mf2_t vreg_memory = __riscv_vle32_v_f32mf2_tu (vd_load, > &data_load, vl); > + vfloat32mf2_t vreg = __riscv_vfwsub_wf_f32mf2_rm_tum (vmask, vreg_memory, > vreg_memory, data_sub, __RISCV_FRM_RDN, vl); > + __riscv_vse32_v_f32mf2 (&data_store, vreg, vl); > + > + __builtin_printf ("%f\n", data_store); > + return 0; > +} > + > +/* { dg-output "-0.00\\s+\n" } */ > -- > 2.25.1 >
[committed] gfortran.dg/gomp/metadirective-3.f90
With a compiler set up to compile (also) for nvptx offloading, the testcase triggered a bogus error, which in addition prevents the gimple scan.

Fixed by adding an xfail and an xfailed dg-bogus. The issue itself is the known https://gcc.gnu.org/PR118694

Committed as obvious as r15-7606-g8d922a80396b0c, cf. attachment.

Tobias

commit 8d922a80396b0cc9f5311d79aa760412dd018848
Author: Tobias Burnus
Date: Tue Feb 18 15:48:39 2025 +0100

    gfortran.dg/gomp/metadirective-3.f90: xfail on offload_nvptx

    Currently, 'target' with a nested metadirective creating a 'teams' will fail with a bogus error ("‘target’ construct with nested ‘teams’ construct contains directives outside of the ‘teams’ construct"). That's tracked at PR118694 - and, hence, expected.

    However, the testcase metadirective-3.f90 triggers this when compiling for 'target offload_nvptx' (otherwise, the code is optimized away).

    Use xfail to silence the error as it is known and there is a tracking PR.

    gcc/testsuite/ChangeLog:

    	* gfortran.dg/gomp/metadirective-3.f90: Add xfail when compiling for offload_nvptx.
---
 gcc/testsuite/gfortran.dg/gomp/metadirective-3.f90 | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gfortran.dg/gomp/metadirective-3.f90 b/gcc/testsuite/gfortran.dg/gomp/metadirective-3.f90
index c5e25e598eb..e2ebb0a39c1 100644
--- a/gcc/testsuite/gfortran.dg/gomp/metadirective-3.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/metadirective-3.f90
@@ -22,4 +22,7 @@ end module
 ! that alternative and not produce a metadirective at all. Otherwise this
 ! won't be resolved until late.
 ! { dg-final { scan-tree-dump-not "#pragma omp metadirective" "gimple" { target { ! offload_nvptx } } } }
-! { dg-final { scan-tree-dump "#pragma omp metadirective" "gimple" { target { offload_nvptx } } } }
+
+! The following two are xfail because the bogus error triggers and thus prevents the dump, cf. PR118694
+! { dg-final { scan-tree-dump "#pragma omp metadirective" "gimple" { target { offload_nvptx } xfail { offload_nvptx } } } }
+! { dg-bogus "'target' construct with nested 'teams' construct contains directives outside of the 'teams' construct" "PR118694" { xfail offload_nvptx } 10 }
Re: 7/7 [Fortran, Patch, Coarray, PR107635] Remove deprecated coarray routines
Hi Thomas,

> This patch series (of necessity) introduces ABI changes. What will
> happen with user code compiled against the old interface?

That depends on the library you are linking against. When using caf_single from gfortran, you will get link failures when you mix code compiled by gfortran < 15 and gfortran-15. But caf_single is only intended for testing anyway, so why would one do this?

If your question targets the users of this ABI, which to my knowledge is only OpenCoarrays at the moment, then the user will notice nothing. A mix of pre-gfortran-15 and gfortran-15 generated .o-files will link and work as expected, because OpenCoarrays provides all ABIs. We do not compile a gfortran-15 exclusive version of OpenCoarrays, i.e. all routines are present, fully functional and interoperable.

> I guess a link failure (plus an answer in stack exchange where the
> explanation is given, so people can google it, and a mention in the
> release notes) would be acceptable, but is there anything that
> can be done in addition?

I can provide an entry in the release notes, if need be. Where do I have to do this? I have never done it before.

Thanks again,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de
Re: 7/7 [Fortran, Patch, Coarray, PR107635] Remove deprecated coarray routines
Hi Jerry,

thank you very much for taking on the job of reviewing, and sorry for my late answer. In fact, I was having a hard time figuring out regressions in the OpenCoarrays library. This also answers your first question: Yes, OpenCoarrays will make use of the new interface. Most of the changes in the interface are required by OpenCoarrays.

Today I got all of OpenCoarrays' tests passing. The OpenCoarrays tests all run a little bit faster than with the old method. Please keep in mind that those tests keep starting and stopping tiny apps, i.e. the overhead of this sequential part is significant. Unfortunately the speedup is tiny (about 3 seconds for the whole suite, which now runs in 1:21.38 (m:ss.ms; Release-build, i.e. -O3; mpich and Intel's mpi)).

I will look for a better benchmark suite. I seem to remember that one was mentioned in some OpenCoarrays ticket. Nevertheless, all these tests are run on a single machine; I have no cluster at my disposal.

I will rebase, rename rewrite.cc to coarray.cc, retest and merge shortly, if no one objects. Then I unfortunately have to post a new small bugfix (about 10 lines).

Thanks again,
Andre

On Fri, 14 Feb 2025 10:19:28 -0800 Jerry D wrote:

> On 2/13/25 11:48 AM, Jerry D wrote:
> > On 2/10/25 2:25 AM, Andre Vehreschild wrote:
> >> [PATCH 7/7] Fortran: Remove deprecated coarray routines [PR107635]
> >>
> >
> > I have applied all patches. Regression tested OK here.
> >
> > From patch 5 there was one reject:
> >
> > patching file gcc/testsuite/gfortran.dg/coarray/send_char_array_1.f90
> > Hunk #1 FAILED at 39.
> > 1 out of 1 hunk FAILED -- saving rejects to file gcc/testsuite/
> > gfortran.dg/coarray/send_char_array_1.f90.rej
> >
> > I commented earlier about changing the name of rewrite.cc.
> this please.
> >
> > I am now going through the whole enchilada for editorial stuff.
> >
> > Regards,
> >
>
> I finished going through the last nine yards and it looks good. I have a
> couple of questions:
>
> Have you been able to test against the OpenCoarray tests?
>
> Have you been able to measure any performance improvements?
>
> I suspect that the latter question may relate only to multi-node large
> systems.
>
> I think this is good to commit. (all 7 parts)
>
> Does anyone else have any comments?
>
> Regards,
>
> Jerry
>

--
Andre Vehreschild * Email: vehre ad gmx dot de
Re: [PATCH v2 13/16] Change target_version semantics to follow ACLE specification.
Alfie Richards writes:
> This changes behavior of target_clones and target_version attributes
> to be in line with what is specified in the Arm C Language Extension.
>
> Notably this changes the scope and signature of multiversioned functions
> to that of the default version, and changes the resolver to be
> created at the implementation of the default version.
>
> This is achieved by changing the C++ front end to no longer resolve any
> non-default version decls in lookup, and by moving dispatching
> for default_target sets to reuse the dispatching logic for target_clones
> in multiple_target.cc.
>
> The dispatching in create_dispatcher_calls is changed for the case of
> a lone annotated default function to change the dispatched symbol to
> be an alias for the mangled default function.

Heh, nice trick. I agree that conceptually it's also a very clean solution, but I don't know the cgraph internals well enough to know whether there might be dragons.

The gcc/*.cc changes look good to me as far as I can review them.

Thanks,
Richard

>
> gcc/ChangeLog:
>
> 	* cgraphunit.cc (analyze_functions): Add logic for target version
> 	dependencies.
> 	* ipa.cc (symbol_table::remove_unreachable_nodes): Ditto.
> 	* multiple_target.cc (create_dispatcher_calls): Change to support
> 	target version semantics.
> 	(ipa_target_clone): Change to dispatch all function sets in
> 	target_version semantics.
>
> gcc/cp/ChangeLog:
>
> 	* call.cc (add_candidates): Change to not resolve non-default versions in
> 	target_version semantics.
> 	* class.cc (resolve_address_of_overloaded_function): Ditto.
> 	* cp-gimplify.cc (cp_genericize_r): Change logic to not apply for
> 	target_version semantics.
> 	* decl.cc (start_decl): Change to mark and therefore mangle all
> 	target_version decls.
> 	(start_preparsed_function): Ditto.
> 	* typeck.cc (cp_build_function_call_vec): Add error for calling unresolvable
> 	non-default node in target_version semantics.
> > gcc/testsuite/ChangeLog: > > * g++.target/aarch64/mv-1.C: Change for target_version semantics. > * g++.target/aarch64/mv-symbols2.C: Ditto. > * g++.target/aarch64/mv-symbols3.C: Ditto. > * g++.target/aarch64/mv-symbols4.C: Ditto. > * g++.target/aarch64/mv-symbols5.C: Ditto. > * g++.target/aarch64/mvc-symbols3.C: Ditto. > * g++.target/riscv/mv-symbols2.C: Ditto. > * g++.target/riscv/mv-symbols3.C: Ditto. > * g++.target/riscv/mv-symbols4.C: Ditto. > * g++.target/riscv/mv-symbols5.C: Ditto. > * g++.target/riscv/mvc-symbols3.C: Ditto. > * g++.target/aarch64/mv-symbols10.C: New test. > * g++.target/aarch64/mv-symbols11.C: New test. > * g++.target/aarch64/mv-symbols12.C: New test. > * g++.target/aarch64/mv-symbols13.C: New test. > * g++.target/aarch64/mv-symbols6.C: New test. > * g++.target/aarch64/mv-symbols7.C: New test. > * g++.target/aarch64/mv-symbols8.C: New test. > * g++.target/aarch64/mv-symbols9.C: New test. > --- > gcc/cgraphunit.cc | 9 +++ > gcc/cp/call.cc| 10 +++ > gcc/cp/class.cc | 13 +++- > gcc/cp/cp-gimplify.cc | 11 ++- > gcc/cp/decl.cc| 14 > gcc/cp/typeck.cc | 10 +++ > gcc/ipa.cc| 11 +++ > gcc/multiple_target.cc| 73 --- > gcc/testsuite/g++.target/aarch64/mv-1.C | 4 + > .../g++.target/aarch64/mv-symbols10.C | 27 +++ > .../g++.target/aarch64/mv-symbols11.C | 30 > .../g++.target/aarch64/mv-symbols12.C | 28 +++ > .../g++.target/aarch64/mv-symbols13.C | 28 +++ > .../g++.target/aarch64/mv-symbols2.C | 12 +-- > .../g++.target/aarch64/mv-symbols3.C | 6 +- > .../g++.target/aarch64/mv-symbols4.C | 6 +- > .../g++.target/aarch64/mv-symbols5.C | 6 +- > .../g++.target/aarch64/mv-symbols6.C | 25 +++ > .../g++.target/aarch64/mv-symbols7.C | 48 > .../g++.target/aarch64/mv-symbols8.C | 46 > .../g++.target/aarch64/mv-symbols9.C | 43 +++ > .../g++.target/aarch64/mvc-symbols3.C | 12 +-- > gcc/testsuite/g++.target/riscv/mv-symbols2.C | 12 +-- > gcc/testsuite/g++.target/riscv/mv-symbols3.C | 6 +- > gcc/testsuite/g++.target/riscv/mv-symbols4.C | 6 +- > 
gcc/testsuite/g++.target/riscv/mv-symbols5.C | 6 +- > gcc/testsuite/g++.target/riscv/mvc-symbols3.C | 12 +-- > 27 files changed, 456 insertions(+), 58 deletions(-) > create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols10.C > create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols11.C > create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols12.C > create
RE: [PATCH v1] Vect: Fix ICE when get DImode from get_related_vectype_for_scalar_type [PR116351]
Hi Richard,

After some more investigation: the sample code never hits any of the
vectorizable_* routines that might check loop_vinfo->vector_mode, so
loop_vinfo->vector_mode == DImode reaches vect_verify_loop_lens and
triggers the VECTOR_MODE_P assert.  The detailed flow is below.

vect_analyze_loop_2
  |- vect_pattern_recog // Hit over-widening pattern and set loop_vinfo->vector_mode to DImode
  |- ...
  |- vect_analyze_loop_operations
    |- (gdb) p stmt_info->def_type
    |- $1 = vect_reduction_def
    |- (gdb) p stmt_info->slp_type
    |- $2 = pure_slp
    |- vectorizable_lc_phi // Not Hit
    |- vectorizable_induction // Not Hit
    |- vectorizable_reduction // Not Hit
    |- vectorizable_recurr // Not Hit
    |- vectorizable_live_operation // Not Hit
    |- vect_analyze_stmt
      |- (gdb) p stmt_info->relevant
      |- $3 = vect_unused_in_scope
      |- (gdb) p stmt_info->live
      |- $4 = false
      |- (gdb) p pattern_stmt_info
      |- $5 = (stmt_vec_info) 0x0
      |- return opt_result::success ();
        OR
      |- PURE_SLP_STMT (stmt_info) && !node then dump "handled only by SLP analysis\n"
        |- Early return opt_result::success ();
      |- vectorizable_load/store/call_convert/... // Not Hit
    |- LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P && !LOOP_VINFO_MASKS (loop_vinfo).is_empty ()
      |- vect_verify_loop_lens (loop_vinfo)
        |- assert (VECTOR_MODE_P (loop_vinfo->vector_mode)); // Hit assert, resulting in ICE

I am hesitating between two options here.

1. Should we add a condition (with a dump message) that makes
   vect_analyze_loop_2 fail when loop_vinfo->vector_mode is not a vector
   mode supported by the target?
2. Or should LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P not be set here at all?
   Then we need to find out where partial vectors get set to true.

Do you have any suggestion?

Pan -Original Message- From: Li, Pan2 Sent: Monday, February 17, 2025 6:08 PM To: Richard Biener Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: RE: [PATCH v1] Vect: Fix ICE when get DImode from get_related_vectype_for_scalar_type [PR116351] > But that's wrong - read the comment before the code. We do support integer > mode > "generic" vectorization just fine. Iff there's anything to plug then > it's how we end > up thinking there's with_len support for DImode vectors. I see, then we need another place to fix this, let me have a try. Pan -Original Message- From: Richard Biener Sent: Monday, February 17, 2025 6:02 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] Vect: Fix ICE when get DImode from get_related_vectype_for_scalar_type [PR116351] On Mon, Feb 17, 2025 at 10:38 AM wrote: > > From: Pan Li > > This patch would like to fix the ICE similar as below, assump we have > sample code: > >1 │ int a, b, c; >2 │ short d, e, f; >3 │ long g (long h) { return h; } >4 │ >5 │ void i () { >6 │ for (; b; ++b) { >7 │ f = 5 >> a ? d : d << a; >8 │ e &= c | g(f); >9 │ } > 10 │ } > > It will ice when compile with -O3 -march=rv64gc_zve64f -mrvv-vector-bits=zvl > > during GIMPLE pass: vect > pr116351-1.c: In function ‘i’: > pr116351-1.c:8:6: internal compiler error: in get_len_load_store_mode, > at optabs-tree.cc:655 > 8 | void i () { > | ^ > 0x44d6b9d internal_error(char const*, ...) 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic-global-context.cc:517 > 0x44a26a6 fancy_abort(char const*, int, char const*) > > /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic.cc:1722 > 0x19e4309 get_len_load_store_mode(machine_mode, bool, internal_fn*, > vec*) > > /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/optabs-tree.cc:655 > 0x1fada40 vect_verify_loop_lens > > /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:1566 > 0x1fb2b07 vect_analyze_loop_2 > /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3037 > 0x1fb4302 vect_analyze_loop_1 > > /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3478 > 0x1fb4e9a vect_analyze_loop(loop*, gimple*, vec_info_shared*) > > /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3638 > 0x203c2dc try_vectorize_loop_1 > > /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1095 > 0x203c839 try_vectorize_loop > > /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1212 > 0x203cb2c execute > > The zve32x cannot have 64 elen, and then the > get_related_vectype_for_scalar_type > will get DImode as vector_mode in loop_info. After that the underlying > vect_analyze_xx will assert the mode is VECTOR
Re: [PATCH] aarch64: Use generic_armv8_a_prefetch_tune in generic_armv8_a.h
Kyrylo Tkachov writes:
> Hi Soumya
>
>> On 18 Feb 2025, at 09:12, Soumya AR wrote:
>>
>> generic_armv8_a.h defines generic_armv8_a_prefetch_tune but still uses
>> generic_prefetch_tune in generic_armv8_a_tunings.
>>
>> This patch updates the pointer to generic_armv8_a_prefetch_tune.
>>
>> This patch was bootstrapped and regtested on aarch64-linux-gnu, no
>> regression.
>>
>> Ok for GCC 15 now?
>
> Yes, this looks like a simple oversight.
> Ok to push to master.

I suppose the alternative would be to remove generic_armv8_a_prefetch_tune,
since it's (deliberately) identical to generic_prefetch_tune.

> Thanks,
> Kyrill
>
>>
>> Signed-off-by: Soumya AR
>>
>> gcc/ChangeLog:
>>
>> * config/aarch64/tuning_models/generic_armv8_a.h: Updated prefetch
>> struct pointer.
>>
>> ---
>> gcc/config/aarch64/tuning_models/generic_armv8_a.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/gcc/config/aarch64/tuning_models/generic_armv8_a.h
>> b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
>> index 35de3f03296..01080cade46 100644
>> --- a/gcc/config/aarch64/tuning_models/generic_armv8_a.h
>> +++ b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
>> @@ -184,7 +184,7 @@ static const struct tune_params generic_armv8_a_tunings =
>> (AARCH64_EXTRA_TUNE_BASE
>> | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
>> | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */
>> - &generic_prefetch_tune,
>> + &generic_armv8_a_prefetch_tune,
>> AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */
>> AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */
>> };
>> --
>> 2.34.1
Re: [PATCH v2] [testsuite] add x86 effective target
Alexandre Oliva writes: > On Feb 13, 2025, Alexandre Oliva wrote: > >> @@ -14108,10 +14113,9 @@ proc dg-require-python-h { args } { >> # Return 1 if the target supports heap-trampoline, 0 otherwise. >> proc check_effective_target_heap_trampoline {} { >> if { [istarget aarch64*-*-linux*] >> - || [istarget i?86-*-darwin*] >> - || [istarget x86_64-*-darwin*] >> - || [istarget i?86-*-linux*] >> - || [istarget x86_64-*-linux*] } { >> + || { [check_effective_target_x86] >> + && { [istarget *-*-darwin*] >> + || [istarget *-*-linux*] } } } { >> return 1 >> } >> return 0 > > I used the wrong kind of brackets here, and missed the error that it > caused. Here's a corrected patch, retested on x86_64-linux-gnu. > Ok to install? > > > I got tired of repeating the conditional that recognizes ia32 or > x86_64, and introduced 'x86' as a shorthand for that, adjusting all > occurrences in target-supports.exp, to set an example. I found some > patterns that recognized i?86* and x86_64*, but I took those as likely > cut&pastos instead of trying to preserve those weirdnesses. > > > for gcc/ChangeLog > > * doc/sourcebuild.texi: Add x86 effective target. > > for gcc/testsuite/ChangeLog > > * lib/target-supports.exp (check_effective_target_x86): New. > Replace all uses of i?86-*-* and x86_64-*-* in this file. Thanks for doing this. How about also replacing all uses of: ([check_effective_target_x86]) with: [check_effective_target_x86] OK with that change if there are no objections within 24 hours. Thanks, Richard > --- > gcc/doc/sourcebuild.texi |3 + > gcc/testsuite/lib/target-supports.exp | 188 > + > 2 files changed, 99 insertions(+), 92 deletions(-) > > diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi > index 28338324f0724..d44c2e8cbe6a1 100644 > --- a/gcc/doc/sourcebuild.texi > +++ b/gcc/doc/sourcebuild.texi > @@ -2798,6 +2798,9 @@ Target supports the execution of @code{user_msr} > instructions. 
> @item vect_cmdline_needed > Target requires a command line argument to enable a SIMD instruction set. > > +@item x86 > +Target is ia32 or x86_64. > + > @item xorsign > Target supports the xorsign optab expansion. > > diff --git a/gcc/testsuite/lib/target-supports.exp > b/gcc/testsuite/lib/target-supports.exp > index 9b5fbe5275613..fbeb2ad3dafa3 100644 > --- a/gcc/testsuite/lib/target-supports.exp > +++ b/gcc/testsuite/lib/target-supports.exp > @@ -740,7 +740,7 @@ proc check_profiling_available { test_what } { > } > > if { $test_what == "-fauto-profile" } { > - if { !([istarget i?86-*-linux*] || [istarget x86_64-*-linux*]) } { > + if { !([check_effective_target_x86] && [istarget *-*-linux*]) } { > verbose "autofdo only supported on linux" > return 0 > } > @@ -2616,17 +2616,23 @@ proc remove_options_for_riscv_zvbb { flags } { > return [add_options_for_riscv_z_ext zvbb $flags] > } > > +# Return 1 if the target is ia32 or x86_64. > + > +proc check_effective_target_x86 { } { > +if { ([istarget x86_64-*-*] || [istarget i?86-*-*]) } { > + return 1 > +} else { > +return 0 > +} > +} > + > # Return 1 if the target OS supports running SSE executables, 0 > # otherwise. Cache the result. > > proc check_sse_os_support_available { } { > return [check_cached_effective_target sse_os_support_available { > # If this is not the right target then we can skip the test. > - if { !([istarget i?86-*-*] || [istarget x86_64-*-*]) } { > - expr 0 > - } else { > - expr 1 > - } > + expr [check_effective_target_x86] > }] > } > > @@ -2636,7 +2642,7 @@ proc check_sse_os_support_available { } { > proc check_avx_os_support_available { } { > return [check_cached_effective_target avx_os_support_available { > # If this is not the right target then we can skip the test. > - if { !([istarget i?86-*-*] || [istarget x86_64-*-*]) } { > + if { !([check_effective_target_x86]) } { > expr 0 > } else { > # Check that OS has AVX and SSE saving enabled. 
> @@ -2659,7 +2665,7 @@ proc check_avx_os_support_available { } { > proc check_avx512_os_support_available { } { > return [check_cached_effective_target avx512_os_support_available { > # If this is not the right target then we can skip the test. > - if { !([istarget i?86-*-*] || [istarget x86_64-*-*]) } { > + if { !([check_effective_target_x86]) } { > expr 0 > } else { > # Check that OS has AVX512, AVX and SSE saving enabled. > @@ -2682,7 +2688,7 @@ proc check_avx512_os_support_available { } { > proc check_sse_hw_available { } { > return [check_cached_effective_target sse_hw_available { > # If this is not the right target then we can skip the test. > - if { !([istarget i?86-*-*] || [istarget x86_64-*-*]) } { > + if
Re: [PATCH] aarch64: Use generic_armv8_a_prefetch_tune in generic_armv8_a.h
> On 18 Feb 2025, at 09:41, Richard Sandiford wrote:
>
> Kyrylo Tkachov writes:
>> Hi Soumya
>>
>>> On 18 Feb 2025, at 09:12, Soumya AR wrote:
>>>
>>> generic_armv8_a.h defines generic_armv8_a_prefetch_tune but still uses
>>> generic_prefetch_tune in generic_armv8_a_tunings.
>>>
>>> This patch updates the pointer to generic_armv8_a_prefetch_tune.
>>>
>>> This patch was bootstrapped and regtested on aarch64-linux-gnu, no
>>> regression.
>>>
>>> Ok for GCC 15 now?
>>
>> Yes, this looks like a simple oversight.
>> Ok to push to master.
>
> I suppose the alternative would be to remove generic_armv8_a_prefetch_tune,
> since it's (deliberately) identical to generic_prefetch_tune.

Looks like we have one prefetch_tune structure for each of the generic
tunings (generic, generic_armv8_a, generic_armv9_a).
For the sake of symmetry it feels a bit better to have them independently
tunable.
But as the effects are the same, it may be better to remove it in the
interest of less code.

Thanks,
Kyrill

>
>> Thanks,
>> Kyrill
>>
>>>
>>> Signed-off-by: Soumya AR
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/aarch64/tuning_models/generic_armv8_a.h: Updated prefetch
>>> struct pointer.
>>>
>>> ---
>>> gcc/config/aarch64/tuning_models/generic_armv8_a.h | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/gcc/config/aarch64/tuning_models/generic_armv8_a.h
>>> b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
>>> index 35de3f03296..01080cade46 100644
>>> --- a/gcc/config/aarch64/tuning_models/generic_armv8_a.h
>>> +++ b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
>>> @@ -184,7 +184,7 @@ static const struct tune_params generic_armv8_a_tunings =
>>> (AARCH64_EXTRA_TUNE_BASE
>>> | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
>>> | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */
>>> - &generic_prefetch_tune,
>>> + &generic_armv8_a_prefetch_tune,
>>> AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */
>>> AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */
>>> };
>>> --
>>> 2.34.1
Re: [PATCH] builtins: Ensure sin and cos properly set errno when INFINITY is passed [PR80042]
On Tue, Feb 18, 2025 at 1:21 AM Sam James wrote:
>
> Peter Damianov writes:
>
> > POSIX says that sin and cos should set errno to EDOM when infinity is
> > passed to them. Make sure this is accounted for in builtins.def, and
> > add tests.
> >
> > gcc/
> > 	PR middle-end/80042
> > 	* builtins.def: (sin|cos)(f|l) can set errno.
> > gcc/testsuite/
> > 	* gcc.dg/pr80042.c: New testcase.
> > ---
> >  gcc/builtins.def | 20 +-
> >  gcc/testsuite/gcc.dg/pr80042.c | 71 ++
> >  2 files changed, 82 insertions(+), 9 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/pr80042.c
> >
> > [...]
> > diff --git a/gcc/testsuite/gcc.dg/pr80042.c b/gcc/testsuite/gcc.dg/pr80042.c
> > new file mode 100644
> > index 000..cc578ae67e2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/pr80042.c
> > @@ -0,0 +1,71 @@
> > +/* dg-do run */
> > +/* dg-options "-O2 -lm" */
>
> These two lines are missing {}. Please double check the logs from your
> testsuite run to make sure newly added/changed tests are executed (and
> in the way you expect).

This test will also FAIL on *BSD IIRC as that doesn't set errno for
any math functions.

I'll note GCC models sincos as cexpi which does not set errno, and will
eventually expand that to sincos or cexp.  It does that without any
restriction on -fno-math-errno.

I'll also note the C standard does not document any domain error on
+- Inf arguments.  Instead it documents a range error for sin(x) and
nonzero x too close to zero.

Richard.

>
> > [...]
Re: [PATCH] aarch64: Use generic_armv8_a_prefetch_tune in generic_armv8_a.h
> On 18 Feb 2025, at 09:48, Kyrylo Tkachov wrote:
>
>
>
>> On 18 Feb 2025, at 09:41, Richard Sandiford wrote:
>>
>> Kyrylo Tkachov writes:
>>> Hi Soumya
>>>
>>>> On 18 Feb 2025, at 09:12, Soumya AR wrote:
>>>>
>>>> generic_armv8_a.h defines generic_armv8_a_prefetch_tune but still uses
>>>> generic_prefetch_tune in generic_armv8_a_tunings.
>>>>
>>>> This patch updates the pointer to generic_armv8_a_prefetch_tune.
>>>>
>>>> This patch was bootstrapped and regtested on aarch64-linux-gnu, no
>>>> regression.
>>>>
>>>> Ok for GCC 15 now?
>>>
>>> Yes, this looks like a simple oversight.
>>> Ok to push to master.
>>
>> I suppose the alternative would be to remove generic_armv8_a_prefetch_tune,
>> since it's (deliberately) identical to generic_prefetch_tune.
>
> Looks like we have one prefetch_tune structure for each of the generic
> tunings (generic, generic_armv8_a, generic_armv9_a).
> For the sake of symmetry it feels a bit better to have them independently
> tunable.
> But as the effects are the same, it may be better to remove it in the
> interest of less code.
>

I see Soumya has already pushed her patch.
I’m okay with either approach tbh, but if Richard prefers we can remove
generic_armv8_a_prefetch_tune in a separate commit.

Thanks,
Kyrill

> Thanks,
> Kyrill
>
>>
>>> Thanks,
>>> Kyrill
>>>
>>>>
>>>> Signed-off-by: Soumya AR
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> * config/aarch64/tuning_models/generic_armv8_a.h: Updated prefetch
>>>> struct pointer.
>>>>
>>>> ---
>>>> gcc/config/aarch64/tuning_models/generic_armv8_a.h | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/gcc/config/aarch64/tuning_models/generic_armv8_a.h
>>>> b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
>>>> index 35de3f03296..01080cade46 100644
>>>> --- a/gcc/config/aarch64/tuning_models/generic_armv8_a.h
>>>> +++ b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
>>>> @@ -184,7 +184,7 @@ static const struct tune_params generic_armv8_a_tunings =
>>>> (AARCH64_EXTRA_TUNE_BASE
>>>> | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
>>>> | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */
>>>> - &generic_prefetch_tune,
>>>> + &generic_armv8_a_prefetch_tune,
>>>> AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */
>>>> AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */
>>>> };
>>>> --
>>>> 2.34.1
>
[wwwdocs][committed] projects/gomp/: Update OpenMP implementation status
For the result of the commit, see: https://gcc.gnu.org/projects/gomp/

The main change is syncing a couple of now fully or partially supported
items from libgomp.texi's implementation status table.

Otherwise, as Sandra found out: a comma between directive and clauses in
'#pragma' has already been supported for a while (since GCC 13; correct in
the .texi file), and having a link directly to the OpenMP section makes
sense, now that it is available. (Thanks!)

Tobias

commit 08114aefac17271a87eeaa6394f1874bf90604ab
Author: Tobias Burnus
Date: Tue Feb 18 10:27:27 2025 +0100

    projects/gomp/: Update OpenMP implementation status

    Sync implementation status from libgomp.texi; fix one omission;
    link to 'openmp' anchor for GCC 15.

    Co-authored-by: Sandra Loosemore
---
 htdocs/projects/gomp/index.html | 66 +++--
 1 file changed, 43 insertions(+), 23 deletions(-)

diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index a4fb4c98..97d14308 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -318,7 +318,7 @@ than listed, depending on resolved corner cases and optimizations.
 GCC 12 GCC 13 GCC 14
- GCC 15
+ GCC 15
 (atomic_default_mem_order)
@@ -352,8 +352,10 @@ than listed, depending on resolved corner cases and optimizations.
 declare variant directive
-GCC 10/GCC 11
-simd traits not handled correctly
+
+ GCC 10/GCC 11
+ GCC 15
+simd traits not handled correctly
 use_device_addr clause on target data
@@ -474,7 +476,7 @@ than listed, depending on resolved corner cases and optimizations.
 metadirective directive
-No
+GCC 15
@@ -486,7 +488,7 @@ than listed, depending on resolved corner cases and optimizations.
 allocate directive GCC 14
- GCC 15
+ GCC 15
 Only C for stack/automatic and Fortran for stack/automatic and allocatable/pointer variables
@@ -691,12 +693,12 @@ than listed, depending on resolved corner cases and optimizations.
target_device trait in OpenMP Context -No +GCC 15 target_device selector set in context selectors -No +GCC 15 @@ -706,17 +708,18 @@ than listed, depending on resolved corner cases and optimizations. declare variant: new clauses adjust_args and append_args -No - +GCC 15 +For append_args, all interop objects + must be specified in the interop clause of dispatch dispatch construct -No +GCC 15 Loop transformation constructs -GCC 15 +GCC 15 @@ -736,7 +739,7 @@ than listed, depending on resolved corner cases and optimizations. omp_interop_t object support in runtime routines -No +GCC 15 @@ -763,7 +766,7 @@ than listed, depending on resolved corner cases and optimizations. Optional comma between directive and clause in the #pragma form -No +GCC 13 @@ -781,6 +784,23 @@ than listed, depending on resolved corner cases and optimizations. GCC 14 + +Changed interaction between declare target and OpenMP context +GCC 15 + + + +Dynamic selector support in metadirective +GCC 15 + + + +Dynamic selector support in declare variant +GCC 15 +Fortran rejects non-constant expressions in dynamic selectors; C/C++ +reject expressions using argument variables. +(https://gcc.gnu.org/PR113904";>PR113904) + ompt_sync_region_t enum additions No @@ -893,7 +913,7 @@ than listed, depending on resolved corner cases and optimizations. Optional paired end directive with dispatch -No +GCC 15 @@ -908,7 +928,7 @@ than listed, depending on resolved corner cases and optimizations. New otherwise clause as alias for default on metadirectives -No +GCC 15 @@ -978,7 +998,7 @@ than listed, depending on resolved corner cases and optimizations. interop_types in any position of the modifier list for the init clause of the interop construct -No +GCC 15 @@ -1123,7 +1143,7 @@ error. Extension of interop operation of append_args, allowing all modifiers of the init clause -No +GCC 15 @@ -1295,7 +1315,7 @@ error. interop clause to dispatch -No +GCC 15 @@ -1311,7 +1331,7 @@ error. 
self_maps clause to requires directive -GCC 15 +GCC 15 @@ -1355,7 +1375,7 @@ error. -stipe loop-transformation construct +stripe loop-transformation construct No @@ -1447,7 +1467,7 @@ error. Extended prefer-type modifier to init clause -No +GCC 15 @@ -1507,13 +1527,13 @@ error. omp_targ
Re: [PATCH v1] Vect: Fix ICE when get DImode from get_related_vectype_for_scalar_type [PR116351]
On Tue, Feb 18, 2025 at 10:12 AM Richard Biener wrote: > > On Tue, Feb 18, 2025 at 9:40 AM Li, Pan2 wrote: > > > > Hi Richard, > > > > After some more investigation, the sample code never hit one vectorizable_* > > routines which may check the loop_vinfo->vector_mode, > > and then the loop_vinfo->vector_mode == DImode will hit the > > vect_verify_loop_lens and trigger the assert VECTOR_MODE_P, detail > > flow as below. > > > > vect_analyze_loop_2 > > |- vect_pattern_recog // Hit over-widening pattern and set > > loop_vinfo->vector_mode to DImode > > |- ... > > |- vect_analyze_loop_operations > >|- (gdb) p stmt_info->def_type > >|- $1 = vect_reduction_def > >|- (gdb) p stmt_info->slp_type > >|- $2 = pure_slp > >|- vectorizable_lc_phi // Not Hit > >|- vectorizable_induction // Not Hit > >|- vectorizable_reduction // Not Hit > >|- vectorizable_recurr // Not Hit > >|- vectorizable_live_operation // Not Hit > >|- vect_analyze_stmt > > |- (gdb) p stmt_info->relevant > > |- $3 = vect_unused_in_scope > > |- (gdb) p stmt_info->live > > |- $4 = false > > |- (gdb) p pattern_stmt_info > > |- $5 = (stmt_vec_info) 0x0 > > |- return opt_result::success (); > > OR > > |- PURE_SLP_STMT (stmt_info) && !node then dump "handled only by SLP > > analysis\n" > >|- Early return opt_result::success (); > > |- vectorizable_load/store/call_convert/... // Not Hit > >|- LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P && !LOOP_VINFO_MASKS > > (loop_vinfo).is_empty () > > |- vect_verify_loop_lens (loop_vinfo) > >|- assert (VECTOR_MODE_P (loop_vinfo->vector_mode); // Hit assert > > result in ICE > > > > I am a little hesitant by two options here. > > > > 1. shall we add some condition and dump log here to make the > > vect_analyze_loop_2 failure when loop_vinfo->vector_mode is not supported > > vector mode by target. > > 2. it should not be LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P here? Then we need > > to find out where set the partial vector to true. > > > > Is there any suggestion here? 
>
> static bool
> vect_verify_loop_lens (loop_vec_info loop_vinfo)
> {
>   if (LOOP_VINFO_LENS (loop_vinfo).is_empty ())
>     return false;
>
>   machine_mode len_load_mode, len_store_mode;
>   if (!get_len_load_store_mode (loop_vinfo->vector_mode, true)
>        .exists (&len_load_mode))
>     return false;
>
> so the obvious fix would be to add
>
>   if (!VECTOR_MODE_P (loop_vinfo->vector_mode))
>     return false;
>
> here?  But then I wonder how we got to a DImode vector_mode and record
> a loop len in the first place.  I could imagine we first end up with
> DImode but other stmts using a vector mode and we record a len for
> those.  But then the above get_len_load_store_mode on ->vector_mode
> seems to assume that all modes we need a len for are "compatible" with
> ->vector_mode so I assume recording a LEN would check that.
>
> I can't reproduce the ICE with a cross on trunk btw.

Ah, it needs -march=rv64imd_xsfvcp.

So we indeed call vect_record_loop_len with

(gdb) p debug_tree (vectype)
unit-size align:16 warn_if_not_align:0 symtab:0 alias-set 2 canonical-type 0x77017690 precision:16 min max pointer_to_this > RVVM2HI
(gdb) p loop_vinfo->vector_mode
$2 = E_DImode

from vectorizable_operation and ->vector_mode is set via
vect_recog_over_widening_pattern which commits to a DImode vector
type ->vector_mode prematurely.

The error is probably that vect_verify_loop_lens does not do anything
to ensure the checks are done on a relevant mode.  With the suggested
added check above this then becomes a missed optimization rather than
an ICE.  But it might fall apart if there's not one load/store len
mode to consider?

>
> Richard.
> > > > > Pan > > > > -Original Message- > > From: Li, Pan2 > > Sent: Monday, February 17, 2025 6:08 PM > > To: Richard Biener > > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; > > jeffreya...@gmail.com; rdapp@gmail.com > > Subject: RE: [PATCH v1] Vect: Fix ICE when get DImode from > > get_related_vectype_for_scalar_type [PR116351] > > > > > But that's wrong - read the comment before the code. We do support > > > integer mode > > > "generic" vectorization just fine. Iff there's anything to plug then > > > it's how we end > > > up thinking there's with_len support for DImode vectors. > > > > I see, then we need another place to fix this, let me have a try. > > > > Pan > > > > -Original Message- > > From: Richard Biener > > Sent: Monday, February 17, 2025 6:02 PM > > To: Li, Pan2 > > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; > > jeffreya...@gmail.com; rdapp@gmail.com > > Subject: Re: [PATCH v1] Vect: Fix ICE when get DImode from > > get_related_vectype_for_scalar_type [PR116351] > > > > On Mon, Feb 17, 2025 at 10:38 AM wrote: > > > > > > From: Pan Li > > > > > > This patch
Re: [PATCH v1] Vect: Fix ICE when get DImode from get_related_vectype_for_scalar_type [PR116351]
On Tue, Feb 18, 2025 at 9:40 AM Li, Pan2 wrote: > > Hi Richard, > > After some more investigation, the sample code never hit one vectorizable_* > routines which may check the loop_vinfo->vector_mode, > and then the loop_vinfo->vector_mode == DImode will hit the > vect_verify_loop_lens and trigger the assert VECTOR_MODE_P, detail > flow as below. > > vect_analyze_loop_2 > |- vect_pattern_recog // Hit over-widening pattern and set > loop_vinfo->vector_mode to DImode > |- ... > |- vect_analyze_loop_operations >|- (gdb) p stmt_info->def_type >|- $1 = vect_reduction_def >|- (gdb) p stmt_info->slp_type >|- $2 = pure_slp >|- vectorizable_lc_phi // Not Hit >|- vectorizable_induction // Not Hit >|- vectorizable_reduction // Not Hit >|- vectorizable_recurr // Not Hit >|- vectorizable_live_operation // Not Hit >|- vect_analyze_stmt > |- (gdb) p stmt_info->relevant > |- $3 = vect_unused_in_scope > |- (gdb) p stmt_info->live > |- $4 = false > |- (gdb) p pattern_stmt_info > |- $5 = (stmt_vec_info) 0x0 > |- return opt_result::success (); > OR > |- PURE_SLP_STMT (stmt_info) && !node then dump "handled only by SLP > analysis\n" >|- Early return opt_result::success (); > |- vectorizable_load/store/call_convert/... // Not Hit >|- LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P && !LOOP_VINFO_MASKS > (loop_vinfo).is_empty () > |- vect_verify_loop_lens (loop_vinfo) >|- assert (VECTOR_MODE_P (loop_vinfo->vector_mode); // Hit assert > result in ICE > > I am a little hesitant by two options here. > > 1. shall we add some condition and dump log here to make the > vect_analyze_loop_2 failure when loop_vinfo->vector_mode is not supported > vector mode by target. > 2. it should not be LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P here? Then we need > to find out where set the partial vector to true. > > Is there any suggestion here? 
static bool
vect_verify_loop_lens (loop_vec_info loop_vinfo)
{
  if (LOOP_VINFO_LENS (loop_vinfo).is_empty ())
    return false;

  machine_mode len_load_mode, len_store_mode;
  if (!get_len_load_store_mode (loop_vinfo->vector_mode, true)
       .exists (&len_load_mode))
    return false;

so the obvious fix would be to add

  if (!VECTOR_MODE_P (loop_vinfo->vector_mode))
    return false;

here?  But then I wonder how we got to a DImode vector_mode and record
a loop len in the first place.  I could imagine we first end up with
DImode but other stmts using a vector mode and we record a len for
those.  But then the above get_len_load_store_mode on ->vector_mode
seems to assume that all modes we need a len for are "compatible" with
->vector_mode so I assume recording a LEN would check that.

I can't reproduce the ICE with a cross on trunk btw.

Richard.

>
> Pan
>
> -Original Message-
> From: Li, Pan2
> Sent: Monday, February 17, 2025 6:08 PM
> To: Richard Biener
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com;
> jeffreya...@gmail.com; rdapp@gmail.com
> Subject: RE: [PATCH v1] Vect: Fix ICE when get DImode from
> get_related_vectype_for_scalar_type [PR116351]
>
> > But that's wrong - read the comment before the code. We do support
> > integer mode "generic" vectorization just fine. Iff there's anything
> > to plug then it's how we end up thinking there's with_len support
> > for DImode vectors.
>
> I see, then we need another place to fix this, let me have a try.
> > Pan > > -Original Message- > From: Richard Biener > Sent: Monday, February 17, 2025 6:02 PM > To: Li, Pan2 > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; > jeffreya...@gmail.com; rdapp@gmail.com > Subject: Re: [PATCH v1] Vect: Fix ICE when get DImode from > get_related_vectype_for_scalar_type [PR116351] > > On Mon, Feb 17, 2025 at 10:38 AM wrote: > > > > From: Pan Li > > > > This patch would like to fix the ICE similar as below, assump we have > > sample code: > > > >1 │ int a, b, c; > >2 │ short d, e, f; > >3 │ long g (long h) { return h; } > >4 │ > >5 │ void i () { > >6 │ for (; b; ++b) { > >7 │ f = 5 >> a ? d : d << a; > >8 │ e &= c | g(f); > >9 │ } > > 10 │ } > > > > It will ice when compile with -O3 -march=rv64gc_zve64f -mrvv-vector-bits=zvl > > > > during GIMPLE pass: vect > > pr116351-1.c: In function ‘i’: > > pr116351-1.c:8:6: internal compiler error: in get_len_load_store_mode, > > at optabs-tree.cc:655 > > 8 | void i () { > > | ^ > > 0x44d6b9d internal_error(char const*, ...) > > /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic-global-context.cc:517 > > 0x44a26a6 fancy_abort(char const*, int, char const*) > > > > /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic.cc:1722 > > 0x19e4309 get_len_load_store_mode(machine_mode, bool, internal_fn*, > > vec*) > > > > /home/pli
[PATCH] aarch64: Use generic_armv8_a_prefetch_tune in generic_armv8_a.h
generic_armv8_a.h defines generic_armv8_a_prefetch_tune but still uses generic_prefetch_tune in generic_armv8_a_tunings. This patch updates the pointer to generic_armv8_a_prefetch_tune. This patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. Ok for GCC 15 now? Signed-off-by: Soumya AR gcc/ChangeLog: * config/aarch64/tuning_models/generic_armv8_a.h: Updated prefetch struct pointer. --- gcc/config/aarch64/tuning_models/generic_armv8_a.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/aarch64/tuning_models/generic_armv8_a.h b/gcc/config/aarch64/tuning_models/generic_armv8_a.h index 35de3f03296..01080cade46 100644 --- a/gcc/config/aarch64/tuning_models/generic_armv8_a.h +++ b/gcc/config/aarch64/tuning_models/generic_armv8_a.h @@ -184,7 +184,7 @@ static const struct tune_params generic_armv8_a_tunings = (AARCH64_EXTRA_TUNE_BASE | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */ - &generic_prefetch_tune, + &generic_armv8_a_prefetch_tune, AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ }; -- 2.34.1
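The oversight described above is easy to picture with a reduced model. The following C sketch is illustrative only: the two structs are hypothetical, cut-down stand-ins for GCC's prefetch and `tune_params` structures, the names `shared_prefetch`, `model_prefetch` and `line_size` are invented, and the numbers are made up rather than taken from the real tuning tables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced version of a prefetch tuning table.  */
struct prefetch_tune
{
  int num_slots;
  int l1_cache_line_size;
};

/* Hypothetical, reduced version of struct tune_params.  */
struct tune_params
{
  const char *name;
  const struct prefetch_tune *prefetch;
};

/* Illustrative values only; they are not GCC's real tuning numbers.  */
static const struct prefetch_tune shared_prefetch = { 0, -1 };
static const struct prefetch_tune model_prefetch = { 8, 64 };

/* Before: the model silently reuses the shared table.  */
static const struct tune_params tunings_before = { "model", &shared_prefetch };

/* After: the model points at the table defined alongside it.  */
static const struct tune_params tunings_after = { "model", &model_prefetch };

/* Return the line size a consumer would read through the pointer.  */
static int
line_size (const struct tune_params *tune)
{
  assert (tune != NULL && tune->prefetch != NULL);
  return tune->prefetch->l1_cache_line_size;
}
```

Both variants compile and run, which is presumably why a stale pointer of this kind can go unnoticed: nothing breaks, the model-specific table is simply never consulted.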
Re: [PATCH] COBOL 12/15 24K pos: Posix adapter framework
On Mon, Feb 17, 2025 at 6:50 PM James K. Lowden wrote: > > On Sat, 15 Feb 2025 21:24:52 + > Sam James wrote: > > > > +prototypes.cpp: posix.txt > > > + awk -F'[/.]' '{ print $$6 }' $^ | \ > > > + while read F; do echo "/* $$F */" && man 2 $$F | \ > > > + ./scrape.awk -v funcname=$$6; done > $@~ > > > + @mv $@~ $@ > > > + > > > +posix.txt: > > > + zgrep -l 'POSIX[.]' /usr/share/man/man2/*z > $@~ > > > > This will need reworking. It assumes the location of the man pages on > > the system, assumes 'zgrep' exists, and assumes 'zgrep' can read the > > man pages (the man pages may be compressed with something else; I know > > such systems exist). > > > > I'm not sure this is really any less brittle or more robust than just > > listing the actual functions you scraped out from your system. > > You might be reading more into this than you want to. > > As you saw in gcc/cobol/posix/README.md, the files in that directory are not > part of the compiler. They are tools we provide that potentially make it > easier to generate user-defined COBOL functions that call functions in the C > standard library, in particular syscalls. IMO they don't need to be perfect; > it is enough that they are good. > > The user need never touch this part of the system. The compiler functions > without it. It's there as a convenience and demonstration. I hope to > encourage contributions from users to this directory in a "contrib/" kind of > way. > > There are dependencies beyond the ones you mention, not least (as documented) > the Python PLY module. Anyone sitting down with this tool will have to > wrestle with it a bit. I contend that, if the user needs more than a few > functions, it will be less trouble to engage the tool than to write them by > hand. > > I agree it could be improved. For example, > > > +posix.txt: > > + zgrep -l 'POSIX[.]' /usr/share/man/man2/*z > $@~ > > could be > > posix.txt: > $(ZGREP) -l 'POSIX[.]' $(MANDIR)/man/man2/*z > $@~ > > but that doesn't gain us much, does it? 
We could start over with autoconf & > automake, to ensure full portability. But that would defeat the purpose. > What I want to provide here is a prototype, not a robust foolproof tool. > > I think a simple example -- even a brittle one loaded with assumptions -- is > easier to understand and serves as a better illustration than a complicated > one. I want to provide such a tool as part of gcobol, to give the user a > facility not available from any other COBOL compiler. I think it's better > included in the gcc distribution than as an SO post or FAQ at > http://www.cobolworx.com. > > I'm sure you agree we don't want to let this tail wag the dog. With my > exegesis in mind, what would you recommend? If it's limited to more > judicious use of makefile variables, I could surely implement those > suggestions. So to simplify things at this point can we postpone merging this bit then? If you say it's more like a "contrib", wouldn't putting it in the toplevel contrib/ directory be more appropriate? Maybe in a contrib/cobol/ subdirectory? Richard. > > --jkl >
Re: [PATCH] aarch64: Use generic_armv8_a_prefetch_tune in generic_armv8_a.h
Hi Soumya > On 18 Feb 2025, at 09:12, Soumya AR wrote: > > generic_armv8_a.h defines generic_armv8_a_prefetch_tune but still uses > generic_prefetch_tune in generic_armv8_a_tunings. > > This patch updates the pointer to generic_armv8_a_prefetch_tune. > > This patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. > > Ok for GCC 15 now? Yes, this looks like a simple oversight. Ok to push to master. Thanks, Kyrill > > Signed-off-by: Soumya AR > > gcc/ChangeLog: > > * config/aarch64/tuning_models/generic_armv8_a.h: Updated prefetch > struct pointer. > > --- > gcc/config/aarch64/tuning_models/generic_armv8_a.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/config/aarch64/tuning_models/generic_armv8_a.h > b/gcc/config/aarch64/tuning_models/generic_armv8_a.h > index 35de3f03296..01080cade46 100644 > --- a/gcc/config/aarch64/tuning_models/generic_armv8_a.h > +++ b/gcc/config/aarch64/tuning_models/generic_armv8_a.h > @@ -184,7 +184,7 @@ static const struct tune_params generic_armv8_a_tunings = > (AARCH64_EXTRA_TUNE_BASE > | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS > | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */ > - &generic_prefetch_tune, > + &generic_armv8_a_prefetch_tune, > AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ > AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ > }; > -- > 2.34.1 > > > >
[PATCH] arm: Remove inner 'fix:HF/SF/DF' from fixed-point patterns (PR 117712)
As discussed in the PR, removing the inner 'fix:HF/SF/DF' fixes the problem, as other targets do. gcc/ChangeLog: PR rtl-optimization/117712 * config/arm/arm.md (fix_trunchfsi2): Remove inner fix:HF. (fix_trunchfdi2): Likewise. (fix_truncsfsi2): Remove inner fix:SF. (fix_truncdfsi2): Remove inner fix:DF. * config/arm/vfp.md (truncsisf2_vfp): Remove inner fix:SF. (truncsidf2_vfp): Remove inner fix:DF. (fixuns_truncsfsi2): Remove inner fix:SF. (fixuns_truncdfsi2): Remove inner fix:DF. gcc/testsuite/ChangeLog: PR rtl-optimization/117712 * gcc.target/arm/pr117712-df.c: New test. * gcc.target/arm/pr117712-hf-di.c: New test. * gcc.target/arm/pr117712-hf.c: New test. * gcc.target/arm/pr117712-sf.c: New test. --- gcc/config/arm/arm.md | 8 gcc/config/arm/vfp.md | 8 gcc/testsuite/gcc.target/arm/pr117712-df.c| 10 ++ gcc/testsuite/gcc.target/arm/pr117712-hf-di.c | 10 ++ gcc/testsuite/gcc.target/arm/pr117712-hf.c| 10 ++ gcc/testsuite/gcc.target/arm/pr117712-sf.c| 10 ++ 6 files changed, 48 insertions(+), 8 deletions(-) create mode 100644 gcc/testsuite/gcc.target/arm/pr117712-df.c create mode 100644 gcc/testsuite/gcc.target/arm/pr117712-hf-di.c create mode 100644 gcc/testsuite/gcc.target/arm/pr117712-hf.c create mode 100644 gcc/testsuite/gcc.target/arm/pr117712-sf.c diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 442d86b9329..ed0d0da2e63 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -5477,7 +5477,7 @@ (define_expand "floatsidf2" (define_expand "fix_trunchfsi2" [(set (match_operand:SI 0 "general_operand") - (fix:SI (fix:HF (match_operand:HF 1 "general_operand"))))] + (fix:SI (match_operand:HF 1 "general_operand")))] "TARGET_EITHER" " { @@ -5489,7 +5489,7 @@ (define_expand "fix_trunchfsi2" (define_expand "fix_trunchfdi2" [(set (match_operand:DI 0 "general_operand") - (fix:DI (fix:HF (match_operand:HF 1 "general_operand"))))] + (fix:DI (match_operand:HF 1 "general_operand")))] "TARGET_EITHER" " { @@ -5501,14 +5501,14 @@ (define_expand "fix_trunchfdi2" 
(define_expand "fix_truncsfsi2" [(set (match_operand:SI 0 "s_register_operand") - (fix:SI (fix:SF (match_operand:SF 1 "s_register_operand"))))] + (fix:SI (match_operand:SF 1 "s_register_operand")))] "TARGET_32BIT && TARGET_HARD_FLOAT" " ") (define_expand "fix_truncdfsi2" [(set (match_operand:SI 0 "s_register_operand") - (fix:SI (fix:DF (match_operand:DF 1 "s_register_operand"))))] + (fix:SI (match_operand:DF 1 "s_register_operand")))] "TARGET_32BIT && TARGET_HARD_FLOAT && !TARGET_VFP_SINGLE" " ") diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md index 379f5f7b3dc..0ef019b1727 100644 --- a/gcc/config/arm/vfp.md +++ b/gcc/config/arm/vfp.md @@ -1508,7 +1508,7 @@ (define_insn "truncsfhf2" (define_insn "*truncsisf2_vfp" [(set (match_operand:SI 0 "s_register_operand" "=t") - (fix:SI (fix:SF (match_operand:SF 1 "s_register_operand" "t"))))] + (fix:SI (match_operand:SF 1 "s_register_operand" "t")))] "TARGET_32BIT && TARGET_HARD_FLOAT" "vcvt%?.s32.f32\\t%0, %1" [(set_attr "predicable" "yes") @@ -1517,7 +1517,7 @@ (define_insn "*truncsisf2_vfp" (define_insn "*truncsidf2_vfp" [(set (match_operand:SI 0 "s_register_operand" "=t") - (fix:SI (fix:DF (match_operand:DF 1 "s_register_operand" "w"))))] + (fix:SI (match_operand:DF 1 "s_register_operand" "w")))] "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP_DOUBLE" "vcvt%?.s32.f64\\t%0, %P1" [(set_attr "predicable" "yes") @@ -1527,7 +1527,7 @@ (define_insn "*truncsidf2_vfp" (define_insn "fixuns_truncsfsi2" [(set (match_operand:SI 0 "s_register_operand" "=t") - (unsigned_fix:SI (fix:SF (match_operand:SF 1 "s_register_operand" "t"))))] + (unsigned_fix:SI (match_operand:SF 1 "s_register_operand" "t")))] "TARGET_32BIT && TARGET_HARD_FLOAT" "vcvt%?.u32.f32\\t%0, %1" [(set_attr "predicable" "yes") @@ -1536,7 +1536,7 @@ (define_insn "fixuns_truncsfsi2" (define_insn "fixuns_truncdfsi2" [(set (match_operand:SI 0 "s_register_operand" "=t") - (unsigned_fix:SI (fix:DF (match_operand:DF 1 "s_register_operand" "t"))))] + (unsigned_fix:SI (match_operand:DF 1 
"s_register_operand" "t")))] "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP_DOUBLE" "vcvt%?.u32.f64\\t%0, %P1" [(set_attr "predicable" "yes") diff --git a/gcc/testsuite/gcc.target/arm/pr117712-df.c b/gcc/testsuite/gcc.target/arm/pr117712-df.c new file mode 100644 index 000..534f2e4ed1d --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr117712-df.c @@ -0,0 +1,10 @@ +/* { dg-do assemble } */ +/* { dg-options "-O2 -
Re: The COBOL front end, version 2, in 15-part harmony
On Sat, Feb 15, 2025 at 10:01 PM James K. Lowden wrote: > > The following 15 patches constitute 134,033 lines of code in 97 files > to build and document the COBOL front end. The messages are > grouped by files in a more or less logical order. We have: > > 4K dir create gcc/cobol and libgcobol directories > 8K pre introduce ChangeLog files > 92K bld config and build machinery > 436K cfg libgcobol/configure > 380K hdr header files > 156K lex lexer > 492K par parser > 360K cbl parser support > 532K api GENERIC interface > 252K gen GENERIC interface support > 72K doc man pages and GnuCOBOL emulation > 24K pos Posix adapter framework > 84K lhd libgcobol header files > 480K lib libgcobol support > 384K lcc libgcobol, main file > > Except for "lib", patches over 400 KB consist of just one big file. For a future possible version 3 of the patch set, you do not need to send big generated files like 'configure' as part of the patch, but just the sources/changes to their templates. Thanks, Richard. > They are against the master branch as of > > commit 3e08a4ecea27c54fda90e8f58641b1986ad957e1 > Date: Wed Feb 5 14:22:33 2025 -0700 > > Our repository is > > https://gitlab.cobolworx.com/COBOLworx/gcc-cobol/ > > using branch > > cobol-stage > > I tested these patches using "git apply" to an unpublished branch > "cobol-patched". I will push it on request. There are some whitespace > warnings that I understand, and some I do not. There is no trailing > whitespace, and tabs occur only in lex/yacc files. > > I have endeavored to address all the issues raised in Round 1. In > particular: > > 1. The patches are against a recent commit. > 2. Generated files use Autoconf 2.69. > 3. Flex and Bison outputs respect --enable-generated-files-in-srcdir. > We use the gcc FLEX and BISON make variables. > 4. Documentation is generated as HTML and PDF. > 5. Python machinery has been patched to add 'cobol' > 6. ChangeLogs ! > 7. libgcobol builds independent of gcc/cobol. 
The library does not use > compiler header files. Shared information is maintained in library > headers. > 8. --enable-languages=all works. gcobol supports x86_64 and aarch64 > (so far, for now). For unsupported targets, configure reports > gcobol is not built. We have built with multilib enabled and > from bootstrap. > 9. Diagnostic messages go through the diagnostic framework, and report > the location, including the column. > 10. Use xasprintf & friends from libiberty. Removed PATH_MAX. > > Still to come: > > 11. Enumerated warnings in cobol/lang.opt. > 12. texinfo update to describe gcobol > 13. cross-compilation > > This patchset still excludes tests. I will supply tests separately. > Simplest I think is to use the NIST test suite, assuming the code and > documentation pass legal muster. > > I want to thank David and Matthias for their patches, which are > incorporated. My thanks too to the many people who contributed invaluable > advice and offered encouragement. > > I remain obdurately hopeful the COBOL front end will be deemed ready > for gcc-15. The von Clausewitz test of any compiler is the real world. > Users kicking the tires push us to improve the compiler in ways that > are practical to them. (Several features are now pending while we > strive to meet reviewers' concerns.) To that end, I have also prepared > release notes for the www repository under separate cover. > > Thank you for your kind consideration of our work. > > --jkl >