date:20250218

[PATCH] Canonicalize vec_merge in simplify_ternary_operation

2025-02-18 Thread Pengxuan Zheng

Similar to the canonicalization done in combine, we canonicalize vec_merge with
swap_commutative_operands_p in simplify_ternary_operation too.
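
As a rough illustration of why this canonicalization is safe, the sketch below models vec_merge on plain arrays and checks that swapping the two vector operands while complementing the selector (within the low N bits) gives the same result. The names vec_merge and vec_merge_swapped are illustrative stand-ins, not GCC functions, and the model assumes fewer than 64 lanes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model of RTL vec_merge (not GCC code): lane i is taken from a when
// bit i of sel is set, otherwise from b.
template <std::size_t N>
std::array<int, N> vec_merge(const std::array<int, N> &a,
                             const std::array<int, N> &b,
                             std::uint64_t sel)
{
  static_assert(N > 0 && N < 64, "model assumes fewer than 64 lanes");
  assert((sel >> N) == 0);  // selector must fit in N bits
  std::array<int, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = ((sel >> i) & 1) ? a[i] : b[i];
  return r;
}

// Swap the vector operands and complement the selector within the low
// N bits, mirroring the GEN_INT (~sel & mask) step in the patch.
template <std::size_t N>
std::array<int, N> vec_merge_swapped(const std::array<int, N> &a,
                                     const std::array<int, N> &b,
                                     std::uint64_t sel)
{
  const std::uint64_t mask = (std::uint64_t{1} << N) - 1;
  return vec_merge<N>(b, a, ~sel & mask);
}
```

Both forms select the same lanes, so picking one operand order as canonical only changes how the expression is written, not what it computes.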

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_exact_log2_inverse): New.
* config/aarch64/aarch64-simd.md (aarch64_simd_vec_set_zero):
Update pattern accordingly.
* config/aarch64/aarch64.cc (aarch64_exact_log2_inverse): New.
* simplify-rtx.cc (simplify_context::simplify_ternary_operation):
Canonicalize vec_merge.

Signed-off-by: Pengxuan Zheng 
---
 gcc/config/aarch64/aarch64-protos.h |  1 +
 gcc/config/aarch64/aarch64-simd.md  | 10 ++
 gcc/config/aarch64/aarch64.cc   | 10 ++
 gcc/simplify-rtx.cc |  7 +++
 4 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 4235f4a0ca5..2391b99cacd 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1051,6 +1051,7 @@ void aarch64_subvti_scratch_regs (rtx, rtx, rtx *,
  rtx *, rtx *, rtx *);
 void aarch64_expand_subvti (rtx, rtx, rtx,
rtx, rtx, rtx, rtx, bool);
+int aarch64_exact_log2_inverse (unsigned int, rtx);
 
 
 /* Initialize builtins for SIMD intrinsics.  */
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index e2afe87e513..1099e742cbf 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1193,12 +1193,14 @@ (define_insn "@aarch64_simd_vec_set"
 (define_insn "aarch64_simd_vec_set_zero"
   [(set (match_operand:VALL_F16 0 "register_operand" "=w")
(vec_merge:VALL_F16
-   (match_operand:VALL_F16 1 "aarch64_simd_imm_zero" "")
-   (match_operand:VALL_F16 3 "register_operand" "0")
+   (match_operand:VALL_F16 1 "register_operand" "0")
+   (match_operand:VALL_F16 3 "aarch64_simd_imm_zero" "")
(match_operand:SI 2 "immediate_operand" "i")))]
-  "TARGET_SIMD && exact_log2 (INTVAL (operands[2])) >= 0"
+  "TARGET_SIMD && aarch64_exact_log2_inverse (, operands[2]) >= 0"
   {
-int elt = ENDIAN_LANE_N (, exact_log2 (INTVAL (operands[2])));
+int elt = ENDIAN_LANE_N (,
+aarch64_exact_log2_inverse (,
+operands[2]));
 operands[2] = GEN_INT ((HOST_WIDE_INT) 1 << elt);
 return "ins\\t%0.[%p2], zr";
   }
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f5f23f6ff4b..103a00915e5 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -23682,6 +23682,16 @@ aarch64_strided_registers_p (rtx *operands, unsigned 
int num_operands,
   return true;
 }
 
+/* Return the base 2 logarithm of the bit inverse of OP masked by the lowest
+   NELTS bits, if OP is a power of 2.  Otherwise, returns -1.  */
+
+int
+aarch64_exact_log2_inverse (unsigned int nelts, rtx op)
+{
+  return exact_log2 ((~INTVAL (op))
+& ((HOST_WIDE_INT_1U << nelts) - 1));
+}
+
 /* Bounds-check lanes.  Ensure OPERAND lies between LOW (inclusive) and
HIGH (exclusive).  */
 void
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index c478bd060fc..22002d1e1ab 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -7307,6 +7307,13 @@ simplify_context::simplify_ternary_operation (rtx_code 
code, machine_mode mode,
  return gen_rtx_CONST_VECTOR (mode, v);
}
 
+ if (swap_commutative_operands_p (op0, op1)
+ /* Two operands have same precedence, then first bit of mask
+select first operand.  */
+ || (!swap_commutative_operands_p (op1, op0) && !(sel & 1)))
+   return simplify_gen_ternary (code, mode, mode, op1, op0,
+GEN_INT (~sel & mask));
+
  /* Replace (vec_merge (vec_merge a b m) c n) with (vec_merge b c n)
 if no element from a appears in the result.  */
  if (GET_CODE (op0) == VEC_MERGE)
-- 
2.17.1
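
A small standalone model of the aarch64_exact_log2_inverse helper from the patch above may help when reading the updated pattern. With the register operand now first, the mask has every lane bit set except the lane being zeroed, so the helper looks for a single clear bit among the low NELTS bits. The function names below are stand-ins written for this sketch (exact_log2_model imitates GCC's exact_log2), and plain integers replace rtx operands.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for GCC's exact_log2: the bit index if x is a power of two,
// otherwise -1.
int exact_log2_model(std::uint64_t x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int n = 0;
  while ((x >>= 1) != 0)
    ++n;
  return n;
}

// Model of aarch64_exact_log2_inverse on plain integers: the index of the
// only clear bit among the low nelts bits of sel, or -1 if there is not
// exactly one such bit.
int exact_log2_inverse_model(unsigned nelts, std::uint64_t sel)
{
  assert(nelts > 0 && nelts < 64);
  const std::uint64_t mask = (std::uint64_t{1} << nelts) - 1;
  return exact_log2_model(~sel & mask);
}
```

For a 4-lane vector, a selector of 0b1011 identifies lane 2 as the one to zero, while 0b1111 (nothing to zero) and 0b1001 (two lanes to zero) are both rejected.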

[PATCH] aarch64: Fix testcase pr112105.c

2025-02-18 Thread Andrew Pinski

This testcase started to fail with r15-268-g9dbff9c05520a7.
When late_combine was added, it was turned on for -O2+ only,
so this testcase still failed.
This changes the option to -O2 instead of -O, so the testcase
passes again.

Tested on aarch64-linux-gnu.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr112105.c: Change to be -O2 rather
than -O1.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.target/aarch64/pr112105.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/pr112105.c 
b/gcc/testsuite/gcc.target/aarch64/pr112105.c
index 1368ea3f784..5e60c6184b7 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr112105.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr112105.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O" } */
+/* { dg-options "-O2" } */
 
 #include 
 typedef struct {
-- 
2.43.0

[committed] [PR middle-end/113525] Drop obsolete options from documentation

2025-02-18 Thread Jeff Law



The sibling and unshare passes were dropped as distinct passes 10+ years 
ago.  Docs weren't ever updated.  This just removes them; given their 
age I don't think we need to keep them around any longer.


Pushing to the trunk.

Jeff

commit 3e93035fcc9247928b58443e37fbf844278b7ac7
Author: Jeff Law 
Date:   Tue Feb 18 19:45:29 2025 -0700

[PR middle-end/113525] Drop obsolete options from documentation

The sibling and unshare passes were dropped as distinct passes 10+ years 
ago.
Docs weren't ever updated.  This just removes them; given their age I don't
think we need to keep them around any longer.

PR middle-end/113525

gcc/
* doc/invoke.texi (dump-rtl-sibling): Drop documentation for pass
removed long ago.
(dump-rtl-unshare): Likewise.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d0e0ca80b0c..0c7adc039b5 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -20383,10 +20383,6 @@ Dump after common sequence discovery.
 @item -fdump-rtl-shorten
 Dump after shortening branches.
 
-@opindex fdump-rtl-sibling
-@item -fdump-rtl-sibling
-Dump after sibling call optimizations.
-
 @opindex fdump-rtl-split1
 @opindex fdump-rtl-split2
 @opindex fdump-rtl-split3
@@ -20417,10 +20413,6 @@ x87's stack-like registers.  This pass is only run on 
x86 variants.
 @option{-fdump-rtl-subreg1} and @option{-fdump-rtl-subreg2} enable dumping 
after
 the two subreg expansion passes.
 
-@opindex fdump-rtl-unshare
-@item -fdump-rtl-unshare
-Dump after all rtl has been unshared.
-
 @opindex fdump-rtl-vartrack
 @item -fdump-rtl-vartrack
 Dump after variable tracking.

Re: [PATCH] LoongArch: Use normal RTL pattern instead of UNSPEC for {x,}vsr{a,l}ri instructions

2025-02-18 Thread Lulu Cheng


LGTM!

Thanks!

On 2025/2/14 at 9:37 PM, Xi Ruoyao wrote:

Allowing (t + (1ul << imm >> 1)) >> imm to be recognized as a rounding
shift operation.
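
For readers unfamiliar with the idiom, here is a scalar sketch of the expression quoted above. It adds half of the shift granule before shifting, which rounds to nearest with ties going up. The function name round_shift_right is made up for this example, and the sketch only covers the unsigned (logical) case on a 64-bit scalar; it says nothing about how the vector instructions treat element overflow.

```cpp
#include <cassert>
#include <cstdint>

// Scalar form of (t + (1ul << imm >> 1)) >> imm for unsigned values.
// Note: the addition wraps for t close to UINT64_MAX; this sketch does
// not attempt to handle that case.
std::uint64_t round_shift_right(std::uint64_t t, unsigned imm)
{
  assert(imm < 64);
  const std::uint64_t half = (std::uint64_t{1} << imm) >> 1;  // 0 when imm == 0
  return (t + half) >> imm;
}
```

With imm equal to 0 the added term is 0, so the value is returned unchanged.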

gcc/ChangeLog:

* config/loongarch/lasx.md (UNSPEC_LASX_XVSRARI): Remove.
(UNSPEC_LASX_XVSRLRI): Remove.
(lasx_xvsrari_): Remove.
(lasx_xvsrlri_): Remove.
* config/loongarch/lsx.md (UNSPEC_LSX_VSRARI): Remove.
(UNSPEC_LSX_VSRLRI): Remove.
(lsx_vsrari_): Remove.
(lsx_vsrlri_): Remove.
* config/loongarch/simd.md (simd__imm_round_): New
define_insn.
(_vri_): New define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vect-shift-imm-round.c: New test.
---
  gcc/config/loongarch/lasx.md  | 22 --
  gcc/config/loongarch/lsx.md   | 22 --
  gcc/config/loongarch/simd.md  | 29 +++
  .../loongarch/vect-shift-imm-round.c  | 11 +++
  4 files changed, 40 insertions(+), 44 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-shift-imm-round.c

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 4ac85b7fcf9..e4505c1660d 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -43,9 +43,7 @@ (define_c_enum "unspec" [
UNSPEC_LASX_XVSAT_U
UNSPEC_LASX_XVREPL128VEI
UNSPEC_LASX_XVSRAR
-  UNSPEC_LASX_XVSRARI
UNSPEC_LASX_XVSRLR
-  UNSPEC_LASX_XVSRLRI
UNSPEC_LASX_XVSHUF
UNSPEC_LASX_XVSHUF_B
UNSPEC_LASX_BRANCH
@@ -2035,16 +2033,6 @@ (define_insn "lasx_xvsrar_"
[(set_attr "type" "simd_shift")
 (set_attr "mode" "")])
  
-(define_insn "lasx_xvsrari_"

-  [(set (match_operand:ILASX 0 "register_operand" "=f")
-   (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f")
-  (match_operand 2 "const__operand" "")]
- UNSPEC_LASX_XVSRARI))]
-  "ISA_HAS_LASX"
-  "xvsrari.\t%u0,%u1,%2"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "")])
-
  (define_insn "lasx_xvsrlr_"
[(set (match_operand:ILASX 0 "register_operand" "=f")
(unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f")
@@ -2055,16 +2043,6 @@ (define_insn "lasx_xvsrlr_"
[(set_attr "type" "simd_shift")
 (set_attr "mode" "")])
  
-(define_insn "lasx_xvsrlri_"

-  [(set (match_operand:ILASX 0 "register_operand" "=f")
-   (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f")
-  (match_operand 2 "const__operand" "")]
- UNSPEC_LASX_XVSRLRI))]
-  "ISA_HAS_LASX"
-  "xvsrlri.\t%u0,%u1,%2"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "")])
-
  (define_insn "lasx_xvssub_s_"
[(set (match_operand:ILASX 0 "register_operand" "=f")
(ss_minus:ILASX (match_operand:ILASX 1 "register_operand" "f")
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index 9d7254768ae..c35826ffc0e 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -44,9 +44,7 @@ (define_c_enum "unspec" [
UNSPEC_LSX_VSAT_S
UNSPEC_LSX_VSAT_U
UNSPEC_LSX_VSRAR
-  UNSPEC_LSX_VSRARI
UNSPEC_LSX_VSRLR
-  UNSPEC_LSX_VSRLRI
UNSPEC_LSX_VSHUF
UNSPEC_LSX_VEXTW_S
UNSPEC_LSX_VEXTW_U
@@ -1710,16 +1708,6 @@ (define_insn "lsx_vsrar_"
[(set_attr "type" "simd_shift")
 (set_attr "mode" "")])
  
-(define_insn "lsx_vsrari_"

-  [(set (match_operand:ILSX 0 "register_operand" "=f")
-   (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f")
- (match_operand 2 "const__operand" "")]
-UNSPEC_LSX_VSRARI))]
-  "ISA_HAS_LSX"
-  "vsrari.\t%w0,%w1,%2"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "")])
-
  (define_insn "lsx_vsrlr_"
[(set (match_operand:ILSX 0 "register_operand" "=f")
(unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f")
@@ -1730,16 +1718,6 @@ (define_insn "lsx_vsrlr_"
[(set_attr "type" "simd_shift")
 (set_attr "mode" "")])
  
-(define_insn "lsx_vsrlri_"

-  [(set (match_operand:ILSX 0 "register_operand" "=f")
-   (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f")
- (match_operand 2 "const__operand" "")]
-UNSPEC_LSX_VSRLRI))]
-  "ISA_HAS_LSX"
-  "vsrlri.\t%w0,%w1,%2"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "")])
-
  (define_insn "lsx_vssub_s_"
[(set (match_operand:ILSX 0 "register_operand" "=f")
(ss_minus:ILSX (match_operand:ILSX 1 "register_operand" "f")
diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index 45d2bcaec2e..5e7bd49eaa2 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -932,6 +932,35 @@ (define_expand "_maddw_q_du_d_punned"
DONE;
  })
  
+;; Integer shift right with rounding.

+(define_insn "simd__imm_round_"
+  [(set (match_operand:IVEC 0 "register_operand" "=f")
+   (any_shiftrt:IVEC
+ (plus:IVEC
+   (match_operand:IVEC 1 "re

[PATCH v2] aarch64: Recognize vector permute patterns suitable for FMOV [PR100165]

2025-02-18 Thread Pengxuan Zheng

This patch optimizes certain vector permute expansions with the FMOV instruction
when one of the input vectors is a vector of all zeros and the result of the
vector permute is as if the upper lane of the non-zero input vector is set to
zero and the lower lane remains unchanged.

Note that the patch also propagates zero_op0_p and zero_op1_p during re-encode
now.  They will be used by aarch64_evpc_fmov to check if the input vectors are
valid candidates.
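
To make the targeted shape concrete, the sketch below models a two-input permute on std::vector and tests whether a selector, applied with an all-zero second input, keeps lane 0 of the first input and zeroes every other lane. This is only an illustration of the idea: vec_perm and keeps_lane0_zeroes_rest are hypothetical names, little-endian lane numbering is assumed, and the real checks in aarch64_evpc_fmov may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two-input permute model: selector index i < n picks op0[i], and index
// i >= n picks op1[i - n], where n is the number of lanes.
std::vector<int> vec_perm(const std::vector<int> &op0,
                          const std::vector<int> &op1,
                          const std::vector<std::size_t> &sel)
{
  const std::size_t n = op0.size();
  assert(op1.size() == n && sel.size() == n);
  std::vector<int> r;
  r.reserve(n);
  for (std::size_t idx : sel)
    {
      assert(idx < 2 * n);
      r.push_back(idx < n ? op0[idx] : op1[idx - n]);
    }
  return r;
}

// Hypothetical candidate check, assuming op1 is all zeros: lane 0 must come
// from lane 0 of op0 and every other lane must come from the zero vector.
bool keeps_lane0_zeroes_rest(std::size_t n,
                             const std::vector<std::size_t> &sel)
{
  if (n == 0 || sel.size() != n || sel[0] != 0)
    return false;
  for (std::size_t i = 1; i < n; ++i)
    if (sel[i] < n)
      return false;
  return true;
}
```

A selector such as {0, 4, 5, 6} on four lanes passes the check, while {0, 1, 4, 5} does not at this element width, since lane 1 still comes from the non-zero input.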

PR target/100165

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_lane0_mask_p): New.
* config/aarch64/aarch64-simd.md 
(@aarch64_simd_vec_set_zero_fmov):
New define_insn.
* config/aarch64/aarch64.cc (aarch64_lane0_mask_p): New.
(aarch64_evpc_reencode): Copy zero_op0_p and zero_op1_p.
(aarch64_evpc_fmov): New.
(aarch64_expand_vec_perm_const_1): Add call to aarch64_evpc_fmov.
* config/aarch64/iterators.md (VALL_F16_NO_QI): New mode iterator.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vec-set-zero.c: Update test accordingly.
* gcc.target/aarch64/fmov-1.c: New test.
* gcc.target/aarch64/fmov-2.c: New test.
* gcc.target/aarch64/fmov-3.c: New test.
* gcc.target/aarch64/fmov-be-1.c: New test.
* gcc.target/aarch64/fmov-be-2.c: New test.
* gcc.target/aarch64/fmov-be-3.c: New test.

Signed-off-by: Pengxuan Zheng 
---
 gcc/config/aarch64/aarch64-protos.h   |   2 +-
 gcc/config/aarch64/aarch64-simd.md|  13 ++
 gcc/config/aarch64/aarch64.cc |  96 ++-
 gcc/config/aarch64/iterators.md   |   9 +
 gcc/testsuite/gcc.target/aarch64/fmov-1.c | 158 ++
 gcc/testsuite/gcc.target/aarch64/fmov-2.c |  52 ++
 gcc/testsuite/gcc.target/aarch64/fmov-3.c | 144 
 gcc/testsuite/gcc.target/aarch64/fmov-be-1.c  | 144 
 gcc/testsuite/gcc.target/aarch64/fmov-be-2.c  |  52 ++
 gcc/testsuite/gcc.target/aarch64/fmov-be-3.c  | 144 
 .../gcc.target/aarch64/vec-set-zero.c |   6 +-
 11 files changed, 816 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-be-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-be-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-be-3.c

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 4235f4a0ca5..cba94914903 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1051,7 +1051,7 @@ void aarch64_subvti_scratch_regs (rtx, rtx, rtx *,
  rtx *, rtx *, rtx *);
 void aarch64_expand_subvti (rtx, rtx, rtx,
rtx, rtx, rtx, rtx, bool);
-
+bool aarch64_lane0_mask_p (unsigned int, rtx);
 
 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index e2afe87e513..6ddc27c223e 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1190,6 +1190,19 @@ (define_insn "@aarch64_simd_vec_set"
   [(set_attr "type" "neon_ins, neon_from_gp, neon_load1_one_lane")]
 )
 
+(define_insn "@aarch64_simd_vec_set_zero_fmov"
+  [(set (match_operand:VALL_F16_NO_QI 0 "register_operand" "=w")
+   (vec_merge:VALL_F16_NO_QI
+   (match_operand:VALL_F16_NO_QI 1 "register_operand" "w")
+   (match_operand:VALL_F16_NO_QI 2 "aarch64_simd_imm_zero" "Dz")
+   (match_operand:SI 3 "immediate_operand" "i")))]
+  "TARGET_SIMD && aarch64_lane0_mask_p (, operands[3])"
+  {
+return "fmov\\t%0, %1";
+  }
+  [(set_attr "type" "fmov")]
+)
+
 (define_insn "aarch64_simd_vec_set_zero"
   [(set (match_operand:VALL_F16 0 "register_operand" "=w")
(vec_merge:VALL_F16
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f5f23f6ff4b..41e2e5d76d8 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -23682,6 +23682,15 @@ aarch64_strided_registers_p (rtx *operands, unsigned 
int num_operands,
   return true;
 }
 
+/* Return TRUE if OP is a valid vec_merge bit mask for lane 0.  */
+
+bool
+aarch64_lane0_mask_p (unsigned int nelts, rtx op)
+{
+  return exact_log2 (INTVAL (op)) >= 0
+&& (ENDIAN_LANE_N (nelts, exact_log2 (INTVAL (op))) == 0);
+}
+
 /* Bounds-check lanes.  Ensure OPERAND lies between LOW (inclusive) and
HIGH (exclusive).  */
 void
@@ -26058,6 +26067,8 @@ aarch64_evpc_reencode (struct expand_vec_perm_d *d)
   newd.target = d->target ? gen_lowpart (new_mode, d->target) : NULL;
   newd.op0 = d->op0 ? gen_lowpart (new_mode, d->op0) : NULL;
   newd.op1 = d->op1 ? gen_lowpart (new_mode, d->op1) : NULL;
+

[PATCH v2] aarch64: Ignore target pragmas while defining intrinsics

2025-02-18 Thread Andrew Carlotti

Compared to v1, I've added a new function aarch64_get_required_features to
avoid having to pass a long list of explicit features.  I also changed
aarch64_target_switcher to only disable TARGET_GENERAL_REGS_ONLY if the
requested flags include FP, to address Richard's comment.

Bootstrapped and regression tested on aarch64. Is this ok for master?

---

When initialising intrinsics with `#pragma GCC aarch64 "arm_*.h"`, we
often set an explicit target, but currently leave current_target_pragma
unchanged.  This results in the target pragma being applied to each
simulated intrinsic on top of our explicit target, which is clearly
undesirable.

As far as I can tell this doesn't cause any bugs at the moment, because
none of the behaviour for builtin functions depends upon the function
specific target.  However, the unintended target feature combinations
led to unwanted behaviour in an under-development patch.

This patch fixes the issue by extending aarch64_simd_switcher to
explicitly unset the current_target_pragma.  It also simplifies
constructor arguments by automatically including any feature
dependencies, which results in FCMA and BF16 being added to the sets of
features used when handling arm_sve.h and arm_sme.h pragmas.
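
The save, clear, and restore behaviour described above follows the usual RAII guard pattern. The sketch below shows that pattern in isolation; current_pragma and pragma_switcher_model are hypothetical stand-ins for current_target_pragma and the switcher class, and a plain string pointer replaces the tree node GCC actually stores.

```cpp
#include <cassert>

// Hypothetical stand-in for the global that records the active
// "#pragma GCC target" state.
static const char *current_pragma = nullptr;

// RAII guard: remembers the current pragma state and clears it on
// construction, then restores it on destruction.
class pragma_switcher_model
{
public:
  pragma_switcher_model() : m_saved(current_pragma)
  {
    current_pragma = nullptr;
  }

  ~pragma_switcher_model()
  {
    current_pragma = m_saved;
  }

  // Copying a guard would restore the saved state twice, so forbid it.
  pragma_switcher_model(const pragma_switcher_model &) = delete;
  pragma_switcher_model &operator=(const pragma_switcher_model &) = delete;

private:
  const char *m_saved;
};
```

Any code that runs while the guard is alive sees no active pragma, and the previous state comes back automatically when the guard leaves scope, including on early returns.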

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(struct aarch64_extension_info): Add field.
(aarch64_get_required_features): New.
* config/aarch64/aarch64-builtins.cc
(aarch64_simd_switcher::aarch64_simd_switcher): Rename to...
(aarch64_target_switcher::aarch64_target_switcher): ...this,
remove default simd flags and save current_target_pragma.
(aarch64_simd_switcher::~aarch64_simd_switcher): Rename to...
(aarch64_target_switcher::~aarch64_target_switcher): ...this,
and restore current_target_pragma.
(handle_arm_acle_h): Use aarch64_target_switcher.
(handle_arm_neon_h): Rename switcher and pass explicit flags.
(aarch64_general_init_builtins): Ditto.
* config/aarch64/aarch64-protos.h
(class aarch64_simd_switcher): Rename to...
(class aarch64_target_switcher): ...this, and add pragma member.
(aarch64_get_required_features): New prototype.
* config/aarch64/aarch64-sve-builtins.cc
(sve_switcher::sve_switcher): Rename to...
(sve_target_switcher::sve_target_switcher): ...this.
(sve_switcher::~sve_switcher): Rename to...
(sve_target_switcher::~sve_target_switcher): ...this.
(init_builtins): Rename switcher.
(handle_arm_sve_h): Ditto.
(handle_arm_neon_sve_bridge_h): Ditto.
(handle_arm_sme_h): Ditto.
* config/aarch64/aarch64-sve-builtins.h
(class sve_switcher): Rename to...
(class sve_target_switcher): ...this.
(class sme_switcher): Rename to...
(class sme_target_switcher): ...this.


diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
b/gcc/common/config/aarch64/aarch64-common.cc
index 
ef4458fb69308d2bb6785e97be5be85226cf0ebb..500bf784983d851c54ea4ec59cf3cad29e5e309e
 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -157,6 +157,8 @@ struct aarch64_extension_info
   aarch64_feature_flags flags_on;
   /* If this feature is turned off, these bits also need to be turned off.  */
   aarch64_feature_flags flags_off;
+  /* If this feature remains enabled, these bits must also remain enabled.  */
+  aarch64_feature_flags flags_required;
 };
 
 /* ISA extensions in AArch64.  */
@@ -164,9 +166,10 @@ static constexpr aarch64_extension_info all_extensions[] =
 {
 #define AARCH64_OPT_EXTENSION(NAME, IDENT, C, D, E, FEATURE_STRING) \
   {NAME, AARCH64_FL_##IDENT, feature_deps::IDENT ().explicit_on, \
-   feature_deps::get_flags_off (feature_deps::root_off_##IDENT)},
+   feature_deps::get_flags_off (feature_deps::root_off_##IDENT), \
+   feature_deps::IDENT ().enable},
 #include "config/aarch64/aarch64-option-extensions.def"
-  {NULL, 0, 0, 0}
+  {NULL, 0, 0, 0, 0}
 };
 
 struct aarch64_arch_info
@@ -204,6 +207,18 @@ static constexpr aarch64_processor_info all_cores[] =
   {NULL, aarch64_no_cpu, aarch64_no_arch, 0}
 };
 
+/* Return the set of feature flags that are required to be enabled when the
+   features in FLAGS are enabled.  */
+
+aarch64_feature_flags
+aarch64_get_required_features (aarch64_feature_flags flags)
+{
+  const struct aarch64_extension_info *opt;
+  for (opt = all_extensions; opt->name != NULL; opt++)
+if (flags & opt->flag_canonical)
+  flags |= opt->flags_required;
+  return flags;
+}
 
 /* Print a list of CANDIDATES for an argument, and try to suggest a specific
close match.  */
diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 
128cc365d3d585e01cb69668f285318ee56a36fc..5174fb1daefee2d73a5098e0de1cca73dc103416
 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@

[COMMITTED PATCH] Fix description of file-cache-lines/file-cache-files params

2025-02-18 Thread Andi Kleen

From: Andi Kleen 

The file-cache-lines / file-cache-files tunables were documented in the
wrong section. Fix that.

Reported-by: Filip Kastl

Committed as obvious.

gcc/ChangeLog:

* doc/invoke.texi:
---
 gcc/doc/invoke.texi | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ca8e468f3f2d..d0e0ca80b0c2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13010,16 +13010,6 @@ having large chains of nested wrapper functions.
 
 Enabled by default.
 
-@item -ffile-cache-files=
-Max number of files in the file cache.
-The file cache is used to print source lines in diagnostics and do some
-source checks like @option{-Wmisleading-indentation}.
-
-@item -ffile-cache-files=
-Max number of lines to index into file cache. When 0 this is automatically 
sized.
-The file cache is used to print source lines in diagnostics and do some
-source checks like @option{-Wmisleading-indentation}.
-
 @opindex fipa-sra
 @item -fipa-sra
 Perform interprocedural scalar replacement of aggregates, removal of
@@ -15792,6 +15782,16 @@ considered for if-conversion.  The compiler will
 also use other heuristics to decide whether if-conversion is likely to be
 profitable.
 
+@item file-cache-files
+Max number of files in the file cache.
+The file cache is used to print source lines in diagnostics and do some
+source checks like @option{-Wmisleading-indentation}.
+
+@item file-cache-files
+Max number of lines to index into file cache. When 0 this is automatically 
sized.
+The file cache is used to print source lines in diagnostics and do some
+source checks like @option{-Wmisleading-indentation}.
+
 @item max-rtl-if-conversion-predictable-cost
 RTL if-conversion will try to remove conditional branches around a block
 and replace them with conditionally executed instructions.  These parameters
-- 
2.48.1

[PATCH] COBOL v3: 3/14 80K bld: config and build machinery

2025-02-18 Thread James K. Lowden

From f89a50238de62b73d9fc44ee7226461650ab119d Tue 18 Feb 2025 04:19:10 PM EST
From: "James K. Lowden" 
Date: Tue 18 Feb 2025 04:19:10 PM EST
Subject: [PATCH] COBOL  3/14 80K bld: config and build machinery

ChangeLog
* Makefile.def: Add libgcobol module and cobol language.
* Makefile.in: Add libgcobol module and cobol language.
* configure.ac: Add libgcobol module and cobol language.

gcc/ChangeLog
* common.opt: New file.
* dwarf2out.cc: Add cobol language.

gcc/cobol/ChangeLog
* LICENSE: New file.
* Make-lang.in: New file.
* config-lang.in: New file.
* lang.opt: New file.
* lang.opt.urls: New file.

libgcobol/ChangeLog
* /Makefile.in: New file.
* /acinclude.m4: New file.
* /aclocal.m4: New file.
* /configure.ac: New file.
* /configure.tgt: New file.

maintainer-scripts/ChangeLog
* maintainer-scripts/update_web_docs_git: Add libgcobol module and 
cobol language.

---
Makefile.def | +-
Makefile.in | 
+++-
configure.ac | -
gcc/cobol/LICENSE | +-
gcc/cobol/Make-lang.in | 
++-
gcc/cobol/config-lang.in | ++-
gcc/cobol/lang.opt | 
-
gcc/cobol/lang.opt.urls | +-
gcc/common.opt | -
gcc/dwarf2out.cc | +-
libgcobol/Makefile.in | 
-
libgcobol/acinclude.m4 | ++-
libgcobol/aclocal.m4 | 
+-
libgcobol/configure.ac | 
+++-
libgcobol/configure.tgt | 
+++-
maintainer-scripts/update_web_docs_git | +
16 files changed, 2005 insertions(+), 20 deletions(-)
diff --git a/Makefile.def b/Makefile.def
index 19954e7d731..d2a1cd55b6e 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -209,6 +209,7 @@ target_modules = { module= libgomp; bootstrap= true; 
lib_path=.libs; };
 target_modules = { module= libitm; lib_path=.libs; };
 target_modules = { module= libatomic; bootstrap=true; lib_path=.libs; };
 target_modules = { module= libgrust; };
+target_modules = { module= libgcobol; };
 
 // These are (some of) the make targets to be done in each subdirectory.
 // Not all; these are the ones which don't have special options.
@@ -655,6 +656,7 @@ lang_env_dependencies = { module=libgcc; no_gcc=true; 
no_c=true; };
 // built newlib on some targets (e.g. Cygwin).  It still needs
 // a dependency on libgcc for native targets to configure.
 lang_env_dependencies = { module=libiberty; no_c=true; };
+lang_env_dependencies = { module=libgcobol; cxx=true; };
 
 dependencies = { module=configure-target-fastjar; on=configure-target-zlib; };
 dependencies = { module=all-target-fastjar; on=all-target-zlib; };
@@ -690,6 +692,7 @@ dependencies = { module=install-target-libvtv; 
on=install-target-libgcc; };
 dependencies = { module=install-target-libitm; on=install-target-libgcc; };
 dependencies = { module=install-target-libobjc; on=install-target-libgcc; };
 dependencies = { module=install-target-libstdc++-v3; on=install-target-libgcc; 
};
+dependencies = { module=install-target-libgcobol; 
on=install-target-libstdc++-v3; };
 
 // Target modules in the 'src' repository.
 lang_env_dep

[PATCH] COBOL v3: 2/14 8K pre: introduce ChangeLog files

2025-02-18 Thread James K. Lowden

From f89a50238de62b73d9fc44ee7226461650ab119d Tue 18 Feb 2025 04:19:10 PM EST
From: "James K. Lowden" 
Date: Tue 18 Feb 2025 04:19:10 PM EST
Subject: [PATCH] COBOL  2/14 8.0K pre: introduce ChangeLog files

gcc/cobol/ChangeLog
* ChangeLog: New file.

libgcobol/ChangeLog
* /ChangeLog: New file.

---
gcc/cobol/ChangeLog | 
+++-
libgcobol/ChangeLog | +++
2 files changed, 166 insertions(+), 2 deletions(-)
diff --git a/gcc/cobol/ChangeLog b/gcc/cobol/ChangeLog
new file mode 100644
index 000..620265df68e
--- /dev/null
+++ b/gcc/cobol/ChangeLog
@@ -0,0 +1,147 @@
+2025-02-17  Robert Dubner 
+   * Moved #include  from genapi.cc to cobol-system.h as
+   #include 
+   * Removed GCOBOL_FOR_TARGET from /Makefile.def
+   * Removed if $USER = "bob" stuff from cobol/Make-lang.in
+   * Backed -std=c++17 down to c++14 in cobol/Make-lang.in
+   * Removed the single c++17 dependency from show_parse.h ANALYZER
+   * Removed -Wno-cpp from cobol/Make-lang.in
+   * Removed Wno-missing-field-initializers from cobol/Make-lang.in
+   * Added some informative comments to placeholder functions in cobol1.cc
+   * Removed a call to build_tree_list() in cobol1.cc
+   * Use default for LANG_HOOKS_TYPE_FOR_SIZE in cobol1.cc
+   * Commented out, but saved, unused code in convert.cc
+   * Eliminated numerous "-Wmissing-field-initializers" warnings
+
+2025-02-16  Robert Dubner 
+   * Added GTY(()) tags to gengen.h and structs.h.  Put includes for them 
into
+   cobol1.cc
+   * Removed some fixed-length text buffers for handling mangled names
+
+2025-02-11  Robert Dubner 
+   * libgcobol quietly is not built for -m32 systems in a multi-lib build
+   * configure.ac allows COBOL only for x86_64 and aarch64 architectures.
+   Other systems get a warning and the COBOL language is suppressed.
+
+2025-02-07  Robert Dubner 
+   * Modified configure.ac and Makefile.in to notice that MULTISUBDIR=/32 
to
+   suppress 32-builds.
+   * Eliminate -Wunused-result warning in libgcobol.cc compilation
+
+2025-01-28  Robert Dubner 
+   * Remove TRACE1 statements from parser_enter_file and parser_leave_file;
+   they are incompatible with COPY statements in the DATA DIVISION.
+
+2025-01-24  Robert Dubner 
+   * Eliminated missing main() error message; we now rely on linker error
+   * Cleaned up valconv-dupe and charmaps-dupe processing in Make-lang.in
+
+2025-01-21  Robert Dubner 
+   * Eliminated all "local" #includes from .h files; they are instead 
included,
+   in order, in the .cc files.
+
+2025-01-16  Robert Dubner 
+   * Code 88 named-conditional comparisons for floating-point
+
+2025-01-06  Robert Dubner 
+   * Updated warning in tests/check_88 and etests/check_88
+   * Updated some UAT error messages.
+
+2025-01-03  Robert Dubner 
+   * Eliminate old "#if 0" code
+   * Modify line directives to skip over paragraph/section labels:
+   * Unwrapped asprintf calls in assert(), because it was a stupid error.
+
+2025-01-01  Robert Dubner 
+   * Eliminate proc->target_of_call variable; it was unused.
+   * Wrap asprintf calls in assert() to suppress compiler warnings.
+
+2024-12-27  Robert Dubner 
+   * Use built_in version of realloc and free
+   * Use built_in version of strdup, memchr, and memset
+   * Use built_in version of abort
+   * Use built_in version of exit
+   * Use built_in version of strncmp
+   * Use built_in version of strcmp
+   * Use built_in version of strcpy
+
+2024-12-27  Robert Dubner 
+   * Put called_by_main_counter in static memory, not the stack!
+
+2024-12-26  Robert Dubner 
+   * Use built_in version of memcpy
+   * Use built_in version of malloc; required initialization
+   during lang_hook_init
+
+2024-12-25  Robert Dubner 
+   * Normalize #includes in util.cc
+   * Normalize #includes in symfind.cc
+   * Normalize #includes in cdf-copy.cc and copybook.h
+   * Normalize #includes in lexio.cc
+   * Normalize #includes in cdf.y
+   * Normalize #includes in scan.l
+   required the creation of fisspace and fisdigit in util.cc
+   * Normalize #includes in parse.y
+   required the creation of ftolower in util.cc.  Jim uses things like
+   std::transform, which can't take TOLOWER because it is a macro.  So I
+   wrapped those necessary macros into functions.
+   * Normalize #includes in symbols.h.cc
+
+2024-12-23  Robert Dubner 
+
+   * Created ChangeLog
+   * Eliminate vestigial ".global" code
+   * Create "cobol-system.h" file.
+   trimmed .h files in cobol1.cc
+   trimmed .h files in convert.cc
+   trimmed .h files in except.cc
+   trimmed .h files in gcobolspec.cc
+   trimmed .h files in

The COBOL front end, version 3, now in 14 easy pieces

2025-02-18 Thread James K. Lowden

The following 14 patches constitute 105,720 lines of code in 83 files
to build and document the COBOL front end.  The messages are 
in a more or less logical order. We have:

 1/14   4K dir: create gcc/cobol and libgcobol directories
 2/14   8K pre: introduce ChangeLog files
 3/14  80K bld: config and build machinery
 4/14 376K hdr: header files
 5/14 152K lex: lexer
 6/14 476K par: parser
 7/14 344K cbl: parser support
 8/14 516K api: GENERIC interface
 9/14 244K gen: GENERIC interface support
10/14  72K doc: man pages and GnuCOBOL emulation
11/14  84K lhd: libgcobol header files
12/14 320K lib: libgcobol support
13/14 372K lcc: libgcobol, main file
14/14 148K fun: libgcobol, intrinsic functions

To slide under the 400 KB limit, the intrinsic functions now have
their own patch.  The configure files are removed, as is the Posix
adapter framework.

They are still against the master branch as of 

commit 3e08a4ecea27c54fda90e8f58641b1986ad957e1
Date:   Wed Feb 5 14:22:33 2025 -0700

Our repository is 

https://gitlab.cobolworx.com/COBOLworx/gcc-cobol/

using branch

cobol-stage

I tested these patches using "git apply" to an unpublished branch
"cobol-patched". 

We have endeavored to address all must-fix issues raised in Round 2.  

1.  Generated files use Autoconf 2.69 
2.  Commit message matches mail Subject: line
3.  Various problems with Make-lang.in and cobol1.cc
4.  s/assert(false)/gcc_unreachable()/g
5.  Nixed range-based cases
6.  Removed Posix adapter files & generated configure scripts
7.  Explained memory-management engineering choice
8.  s/option_id/option_zero/g, for clarity
9.  GTY issues
10. Require only C++14 (not 17)
11. Moved #include 
12. Check regex buffer bounds outside gcc_assert

Still to do (no particular order): 

13. Try SARIF options
14. Do not compose messages (I18N).
15. Try valgrind for memory report
16. Review 
https://github.com/cooljeanius/legislation/blob/master/tech/21-R-mrg.htm.diff
17. Enumerated warnings in cobol/lang.opt. 
18. texinfo update to describe gcobol
19. cross-compilation

There are a few places where gcc_unreachable() is now followed by truly 
unreachable code. We will lop off those bits soon.  

This patchset still excludes tests. I will supply tests separately.
Simplest I think is to use the NIST test suite, assuming the code and
documentation pass legal muster. 

I have also prepared release notes for the www repository under
separate cover.  

We remain hopeful the COBOL front end will be accepted into gcc-15.  

Thank you for your kind consideration of our work.

--jkl

[PATCH] i386: Implement Thread Local Storage on Windows

2025-02-18 Thread Julian Waters

Hi all,

This is a reimplementation of Windows Thread Local Storage, rewritten to 
support native thread local access on Windows, which had previously been using 
emulated thread local storage mechanisms. Note that due to issues on my end, I 
was unable to regenerate configure no matter what I tried. I do not have write 
access to gcc, and will need help with committing this once the green light is 
given (Although approval was already given by MINGW maintainers in the relevant 
bug at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80881 and in private 
communication)

best regards,
Julian

gcc/ChangeLog:

* config/i386/i386.cc (ix86_legitimate_constant_p): Handle new UNSPEC
(legitimate_pic_operand_p): Handle new UNSPEC
(legitimate_pic_address_disp_p): Handle new UNSPEC
(ix86_legitimate_address_p): Handle new UNSPEC
(ix86_tls_index_symbol): New symbol
(ix86_tls_index): Handle creation of _tls_index symbol
(legitimize_tls_address): Create thread local access sequence
(output_pic_addr_const): Handle new UNSPEC
(i386_output_dwarf_dtprel): Handle new UNSPEC
(i386_asm_output_addr_const_extra): Handle new UNSPEC
* config/i386/i386.h (TARGET_WIN32_TLS): Define
* config/i386/i386.md: New UNSPEC
* config/i386/predicates.md: Handle new UNSPEC
* config/mingw/mingw32.h (TARGET_WIN32_TLS): Define
(TARGET_ASM_SELECT_SECTION): Define
(DEFAULT_TLS_SEG_REG): Define
* config/mingw/winnt.cc (mingw_pe_select_section): Handle TLS section
(mingw_pe_unique_section): Select TLS section
* config/mingw/winnt.h (mingw_pe_select_section): Declare
* configure.ac: New check for broken linker thread local support
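
For readers unfamiliar with the native mechanism, the access sequence that legitimize_tls_address has to create can be modelled roughly as follows. This is only a sketch under stated assumptions: ThreadModel and tls_address are hypothetical names for illustration, not GCC or Windows identifiers. It assumes the usual PE layout, where each thread holds an array of per-module TLS block pointers, _tls_index selects the block for the current module, and the variable lives at a section-relative (SECREL32) offset inside that block.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one thread's TLS state; not GCC or Windows code.
struct ThreadModel {
  // One pointer per loaded module, indexed by that module's _tls_index.
  std::vector<unsigned char *> tls_array;
};

// Address of a thread-local variable: the base of this module's TLS block
// for the given thread, plus the variable's section-relative offset.
inline unsigned char *
tls_address (const ThreadModel &thread, std::uint32_t tls_index,
             std::uint32_t secrel_offset)
{
  unsigned char *block = thread.tls_array[tls_index];
  return block + secrel_offset;
}
```

Two threads using the same index and offset get different addresses because their block pointers differ, which is what makes the variable thread local.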

From 05d4491d862a16426f2a0986e7f3598714615f93 Mon Sep 17 00:00:00 2001
From: Julian Waters 
Date: Tue, 15 Oct 2024 20:56:22 +0800
Subject: [PATCH] Implement Windows TLS

Signed-off-by: Julian Waters 
---
 gcc/config/i386/i386.cc   | 61 ++-
 gcc/config/i386/i386.h|  1 +
 gcc/config/i386/i386.md   |  1 +
 gcc/config/i386/predicates.md |  1 +
 gcc/config/mingw/mingw32.h|  9 ++
 gcc/config/mingw/winnt.cc | 14 
 gcc/config/mingw/winnt.h  |  1 +
 gcc/configure.ac  | 29 +
 8 files changed, 116 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 473e4cbf10e..304189bd947 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -11170,6 +11170,9 @@ ix86_legitimate_constant_p (machine_mode mode, rtx x)
x = XVECEXP (x, 0, 0);
return (GET_CODE (x) == SYMBOL_REF
&& SYMBOL_REF_TLS_MODEL (x) == TLS_MODEL_LOCAL_DYNAMIC);
+ case UNSPEC_SECREL32:
+   x = XVECEXP (x, 0, 0);
+   return GET_CODE (x) == SYMBOL_REF;
  default:
return false;
  }
@@ -11306,6 +11309,9 @@ legitimate_pic_operand_p (rtx x)
x = XVECEXP (inner, 0, 0);
return (GET_CODE (x) == SYMBOL_REF
&& SYMBOL_REF_TLS_MODEL (x) == TLS_MODEL_LOCAL_EXEC);
+ case UNSPEC_SECREL32:
+   x = XVECEXP (inner, 0, 0);
+   return GET_CODE (x) == SYMBOL_REF;
  case UNSPEC_MACHOPIC_OFFSET:
return legitimate_pic_address_disp_p (x);
  default:
@@ -11486,6 +11492,9 @@ legitimate_pic_address_disp_p (rtx disp)
   disp = XVECEXP (disp, 0, 0);
   return (GET_CODE (disp) == SYMBOL_REF
  && SYMBOL_REF_TLS_MODEL (disp) == TLS_MODEL_LOCAL_DYNAMIC);
+case UNSPEC_SECREL32:
+  disp = XVECEXP (disp, 0, 0);
+  return GET_CODE (disp) == SYMBOL_REF;
 }
 
   return false;
@@ -11763,6 +11772,7 @@ ix86_legitimate_address_p (machine_mode, rtx addr, bool strict,
  case UNSPEC_INDNTPOFF:
  case UNSPEC_NTPOFF:
  case UNSPEC_DTPOFF:
+ case UNSPEC_SECREL32:
break;
 
  default:
@@ -11788,7 +11798,8 @@ ix86_legitimate_address_p (machine_mode, rtx addr, bool strict,
  || GET_CODE (XEXP (XEXP (disp, 0), 0)) != UNSPEC
  || !CONST_INT_P (XEXP (XEXP (disp, 0), 1))
  || (XINT (XEXP (XEXP (disp, 0), 0), 1) != UNSPEC_DTPOFF
- && XINT (XEXP (XEXP (disp, 0), 0), 1) != UNSPEC_NTPOFF))
+ && XINT (XEXP (XEXP (disp, 0), 0), 1) != UNSPEC_NTPOFF
+ && XINT (XEXP (XEXP (disp, 0), 0), 1) != UNSPEC_SECREL32))
/* Non-constant pic memory reference.  */
return false;
}
@@ -12112,6 +12123,22 @@ get_thread_pointer (machine_mode tp_mode, bool to_reg)
   return tp;
 }
 
+/* Construct the SYMBOL_REF for the _tls_index symbol.  */
+
+static GTY(()) rtx ix86_tls_index_symbol;
+
+static rtx
+ix86_tls_index (void)
+{
+  if (!ix86_tls_index_symbol)
+ix86_tls_index_symbol = gen_rtx_SYMBOL_REF (SImode, "_tls_index");
+
+  if

Re: [PATCH] COBOL v3: 8/14 516K api: GENERIC interface

2025-02-18 Thread Andrew Pinski

On Tue, Feb 18, 2025 at 10:52 PM James K. Lowden
 wrote:
>
> From f89a50238de62b73d9fc44ee7226461650ab119d Tue 18 Feb 2025 04:19:11 PM EST
> From: "James K. Lowden" 
> Date: Tue 18 Feb 2025 04:19:11 PM EST
> Subject: [PATCH] COBOL  8/14 516K api: GENERIC interface

A few comments about this:

> +static
> +void
> +treeplet_fill_source(TREEPLET &treeplet, cbl_refer_t &refer)
> +  {
> +  treeplet.pfield = gg_get_address_of(refer.field->var_decl_node);
> +  treeplet.offset = refer_offset_source(refer);
> +  treeplet.length = refer_size_source(refer);
> +  }

This function (and many others) is missing a leading comment
describing what it does with each argument.

> +_Float128 src = (_Float128)sourceref.field->data.value;

Is this in the front end or in the target library?  Either way,
I see it is used unconditionally.
For the front end, you should use the real.h interface for floats.  For
the target library, you need to use it only conditionally; otherwise it
won't work on targets which don't have _Float128.
I noticed __int128 use in this file too.  The same thing applies here,
except that for the front end you should use the wide-int.h interface.  And
only define it conditionally for target code.

Also, you can't use a 128-bit integer as a tree type unless you
check that the target supports it.  There is at least one 64-bit GCC target
which does NOT support 128-bit integers (HPPA64).

I see strfromf128 is used here, but that was only added to glibc in
2017, and GCC still supports older glibc versions that don't have full
_Float128 support.  See above about using real.h.
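
To illustrate what "only conditionally" could look like for target-side code, here is a minimal sketch. It is not taken from the COBOL patch: wide_uint, HAVE_WIDE_UINT and mul_high are hypothetical names. It relies on the predefined macro __SIZEOF_INT128__, which GCC defines only on targets that provide __int128, and falls back to 64-bit arithmetic elsewhere.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__SIZEOF_INT128__)
typedef unsigned __int128 wide_uint;   // hypothetical alias, for illustration
#define HAVE_WIDE_UINT 1
#else
#define HAVE_WIDE_UINT 0
#endif

// Return the high 64 bits of the 128-bit product a * b.
inline std::uint64_t
mul_high (std::uint64_t a, std::uint64_t b)
{
#if HAVE_WIDE_UINT
  // Fast path: only compiled where the target has a 128-bit integer.
  return (std::uint64_t) (((wide_uint) a * b) >> 64);
#else
  // Portable fallback: schoolbook multiplication on 32-bit halves.
  const std::uint64_t lo_mask = 0xffffffffu;
  std::uint64_t a_lo = a & lo_mask, a_hi = a >> 32;
  std::uint64_t b_lo = b & lo_mask, b_hi = b >> 32;
  std::uint64_t p0 = a_lo * b_lo, p1 = a_lo * b_hi;
  std::uint64_t p2 = a_hi * b_lo, p3 = a_hi * b_hi;
  std::uint64_t carry = ((p0 >> 32) + (p1 & lo_mask) + (p2 & lo_mask)) >> 32;
  return p3 + (p1 >> 32) + (p2 >> 32) + carry;
#endif
}
```

Code inside the compiler itself cannot take this shortcut, since it must describe target values independently of the host; that is where real.h and wide-int.h come in.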

Thanks,
Andrew Pinski

Re: Ping: [PATCH] testsuite: Fix up toplevel-asm-1.c for LoongArch

2025-02-18 Thread Lulu Cheng




On 2025/2/19 at 3:27 PM, Xi Ruoyao wrote:

On Wed, 2025-02-05 at 08:57 +0800, Xi Ruoyao wrote:

Like RISC-V, on LoongArch we don't really support %cN for SYMBOL_REFs
even with -fno-pic.

gcc/testsuite/ChangeLog:

* c-c++-common/toplevel-asm-1.c: Use %cc3 %cc4 instead of %c3
%c4 on LoongArch.
---

Ok for trunk?

Ping.


LGTM!

Thanks.




  gcc/testsuite/c-c++-common/toplevel-asm-1.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/c-c++-common/toplevel-asm-1.c
b/gcc/testsuite/c-c++-common/toplevel-asm-1.c
index d6766b00e72..e1687d28e0b 100644
--- a/gcc/testsuite/c-c++-common/toplevel-asm-1.c
+++ b/gcc/testsuite/c-c++-common/toplevel-asm-1.c
@@ -9,7 +9,7 @@ int v[42];
  void foo (void) {}
  
  /* Not all targets can use %cN even in non-pic code.  */

-#if defined(__riscv)
+#if defined(__riscv) || defined(__loongarch__)
  asm ("# %0 %1 %2 %cc3 %cc4 %5 %% %="
  #else
  asm ("# %0 %1 %2 %c3 %c4 %5 %% %="

[PATCH] c++: Enhance -Wuninitialized to check private base class [PR80681]

2025-02-18 Thread xxie-xd

PR80681 describes the following problem:
g++'s -Wuninitialized option does not warn when a privately inherited
base class contains public const data or reference members, and the
derived class does not have a user-provided constructor.

Similarly, the same issue occurs when the privately inherited base
class contains protected const data or reference members. In both
cases, the derived class is unable to initialize these members in the
base class.

For private const data or reference members in privately inherited
base classes, these members are inherently inaccessible to the derived
class and cannot be initialized. Therefore, they are not considered as
a condition for issuing a warning.

In my proposed patch, under the condition that the current class
does not have a user-provided constructor and the -Wuninitialized option
is enabled, I traverse all directly privately inherited base classes
of the current class. For each base class, I check whether it contains
any non-private const data or reference members. If such members are
found, a warning is issued at the declaration location of the
current class. Additionally, supplementary information is provided to
indicate the declaration location of the non-private const data or
reference members in the base class.
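
A minimal sketch of the situation described above, assuming hypothetical class names Base and Derived (they are not taken from the new test file): the base has a public const member without an initializer, the derived class inherits privately and declares no constructor, so there is no way left to initialize the member through Derived.

```cpp
#include <cassert>
#include <type_traits>

struct Base {
  const int value;   // public const member, no default member initializer
};

// Private inheritance, no user-provided constructor: Derived is not an
// aggregate (private base), and its implicit default constructor is deleted
// because Base cannot be default-initialized.
struct Derived : private Base {
};

// Reports whether a Derived object could be default-constructed.
inline bool
derived_is_default_constructible ()
{
  return std::is_default_constructible<Derived>::value;
}
```

Base on its own can still be created with aggregate initialization such as `Base b{1};`, which is why the problem only shows up once the private inheritance is added.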

Successfully bootstrapped and regtested on x86_64-pc-linux-gnu:
adds 21 PASS results to g++.sum.

PR c++/80681

gcc/cp/ChangeLog:

* class.cc (check_bases_and_members): Enhanced -Wuninitialized to
warn for classes without user-provided constructors that privately
inherit base classes with non-private const data or reference members.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wuninitialized-pr80681-1.C: New test.
---
 gcc/cp/class.cc   | 61 +++
 .../g++.dg/warn/Wuninitialized-pr80681-1.C| 19 ++
 2 files changed, 80 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wuninitialized-pr80681-1.C

diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
index d5ae69b0fdf..49c4ef08f33 100644
--- a/gcc/cp/class.cc
+++ b/gcc/cp/class.cc
@@ -6500,6 +6500,67 @@ check_bases_and_members (tree t)
OPT_Wuninitialized, "non-static const member %q#D "
"in class without a constructor", field);
}
+  /* If the class privately inherited from a class with public
+   or protected non-static const or reference data members,
+   these members can never be initialized.  */
+
+  tree binfo = TYPE_BINFO (t);
+  vec *accesses = BINFO_BASE_ACCESSES (binfo);
+  tree base_binfo;
+  unsigned i;
+
+  for (i = 0; BINFO_BASE_ITERATE (binfo, i, base_binfo); i++)
+   {
+ tree basetype = TREE_TYPE (base_binfo);
+
+ if ((*accesses)[i] == access_private_node)
+   {
+ tree base_field;
+
+ for (base_field = TYPE_FIELDS (basetype); base_field;
+  base_field = DECL_CHAIN (base_field))
+   {
+ tree field_type;
+
+ if (TREE_CODE (base_field) != FIELD_DECL
+ || DECL_INITIAL (base_field) != NULL_TREE)
+   continue;
+
+ field_type = TREE_TYPE (base_field);
+
+ if (!TREE_PRIVATE (base_field))
+   {
+ if (TYPE_REF_P (field_type))
+   {
+ warning (OPT_Wuninitialized,
+  "private inheritance of base class "
+  "%q#T with non-private "
+  "non-static reference in class "
+  "without a constructor",
+  basetype);
+ inform (DECL_SOURCE_LOCATION (base_field),
+ "non-static reference %q#D here:",
+ base_field);
+   }
+ else if (CP_TYPE_CONST_P (field_type)
+  && (!CLASS_TYPE_P (field_type)
+  || !TYPE_HAS_DEFAULT_CONSTRUCTOR (
+field_type)))
+   {
+ warning (OPT_Wuninitialized,
+  "private inheritance of base class "
+  "%q#T with non-private "
+  "non-static const member in class "
+  "without a constructor",
+  basetype);
+ inform (DECL_SOURCE_LOCATION (base_field),
+ "non-static const member %q#D here:",
+ base_field);
+   }
+   }
+   }
+   }
+   }
 }
 
   /* Synthesize any needed methods.  */
diff --git a/gcc/testsuite/g++.dg/warn/

Ping: [PATCH] testsuite: Fix up toplevel-asm-1.c for LoongArch

2025-02-18 Thread Xi Ruoyao

On Wed, 2025-02-05 at 08:57 +0800, Xi Ruoyao wrote:
> Like RISC-V, on LoongArch we don't really support %cN for SYMBOL_REFs
> even with -fno-pic.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/toplevel-asm-1.c: Use %cc3 %cc4 instead of %c3
>   %c4 on LoongArch.
> ---
> 
> Ok for trunk?

Ping.

>  gcc/testsuite/c-c++-common/toplevel-asm-1.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/c-c++-common/toplevel-asm-1.c
> b/gcc/testsuite/c-c++-common/toplevel-asm-1.c
> index d6766b00e72..e1687d28e0b 100644
> --- a/gcc/testsuite/c-c++-common/toplevel-asm-1.c
> +++ b/gcc/testsuite/c-c++-common/toplevel-asm-1.c
> @@ -9,7 +9,7 @@ int v[42];
>  void foo (void) {}
>  
>  /* Not all targets can use %cN even in non-pic code.  */
> -#if defined(__riscv)
> +#if defined(__riscv) || defined(__loongarch__)
>  asm ("# %0 %1 %2 %cc3 %cc4 %5 %% %="
>  #else
>  asm ("# %0 %1 %2 %c3 %c4 %5 %% %="

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [RFC] RISC-V: The optimization ignored the side effects of the rounding mode, resulting in incorrect results.

2025-02-18 Thread Jeff Law





On 2/18/25 7:30 PM, Jin Ma wrote:



> I apologize for not explaining things more clearly. I also discovered that
> the issue is caused by CSE. I think that during the substitution process,
> CSE recognized the syntax of if_then_else and concluded that the expressions
> in the "then" and "else" branches are equivalent, resulting in both yielding
> (reg/v:RVVMF2SF 140 [ vreg_memory ]):
>
> (minus:RVVMF2SF (reg/v:RVVMF2SF 140 [ vreg_memory ])
>  (float_extend:RVVMF2SF (vec_duplicate:RVVMF4HF (const_double:HF 0.0 [0x0.0p+0]))))
>
> is considered equivalent to:
>
> (reg/v:RVVMF2SF 140 [ vreg_memory ])
>
> Clearly, there wasn’t a deeper consideration of the fact that float_extend
> requires a rounding mode (frm). Therefore, I attempted to use UNSPEC in the
> pattern to inform CSE that we have a rounding mode.

Right.  It worked, but there's a deeper issue here.



> As I mentioned before, this may not be a good solution, as it risks missing
> other optimization opportunities. As you pointed out, we need a more general
> approach to fix it. Unfortunately, while I’m still trying to find a solution,
> I currently don't have any other good ideas.

Changing the rounding modes isn't common, but it's not unheard of.  My 
suspicion is that we need to expose the rounding mode assignment earlier 
(at RTL generation time).


That may not work well with the current optimization of FRM, but I think 
early exposure is the only viable path forward in my mind.  Depending on 
the depth of the problems it may not be something we can fix in the 
gcc-15 space.


You might experiment with emitting the FRM assignment in the 
insn_expander class in the risc-v backend.  This code:

/* Add rounding mode operand.  */
if (m_insn_flags & FRM_DYN_P)
  add_rounding_mode_operand (FRM_DYN);
else if (m_insn_flags & FRM_RUP_P)
  add_rounding_mode_operand (FRM_RUP);
else if (m_insn_flags & FRM_RDN_P)
  add_rounding_mode_operand (FRM_RDN);
else if (m_insn_flags & FRM_RMM_P)
  add_rounding_mode_operand (FRM_RMM);
else if (m_insn_flags & FRM_RNE_P)
  add_rounding_mode_operand (FRM_RNE);
else if (m_insn_flags & VXRM_RNU_P)
  add_rounding_mode_operand (VXRM_RNU);
else if (m_insn_flags & VXRM_RDN_P)
  add_rounding_mode_operand (VXRM_RDN);


For anything other than FRM_DYN_P emit the appropriate insn to set FRM. 
This may generate poor code in the presence of explicit rounding modes, 
but I think something along these lines is ultimately going to be needed.


jeff

RE: [PATCH] aarch64: Recognize vector permute patterns suitable for FMOV [PR100165]

2025-02-18 Thread quic_pzheng

> Pengxuan Zheng  writes:
> > This patch optimizes certain vector permute expansion with the FMOV
> > instruction when one of the input vectors is a vector of all zeros and
> > the result of the vector permute is as if the upper lane of the
> > non-zero input vector is set to zero and the lower lane remains
> > unchanged.
> >
> > Note that the patch also propagates zero_op0_p and zero_op1_p during
> > re-encode now.  They will be used by aarch64_evpc_fmov to check if the
> > input vectors are valid candidates.
> >
> > PR target/100165
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-simd.md
> > (aarch64_simd_vec_set_zero_fmov): New define_insn.
> > * config/aarch64/aarch64.cc (aarch64_evpc_reencode): Copy
> > zero_op0_p and zero_op1_p.
> > (aarch64_evpc_fmov): New function.
> > (aarch64_expand_vec_perm_const_1): Add call to
> > aarch64_evpc_fmov.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/vec-set-zero.c: Update test accordingly.
> > * gcc.target/aarch64/fmov.c: New test.
> > * gcc.target/aarch64/fmov-be.c: New test.
> 
> Nice!  Thanks for doing this.  Some comments on the patch below.
> >
> > Signed-off-by: Pengxuan Zheng 
> > ---
> >  gcc/config/aarch64/aarch64-simd.md|  14 +++
> >  gcc/config/aarch64/aarch64.cc |  74 +++-
> >  gcc/testsuite/gcc.target/aarch64/fmov-be.c|  74 
> >  gcc/testsuite/gcc.target/aarch64/fmov.c   | 110 ++
> >  .../gcc.target/aarch64/vec-set-zero.c |   6 +-
> >  5 files changed, 275 insertions(+), 3 deletions(-)  create mode
> > 100644 gcc/testsuite/gcc.target/aarch64/fmov-be.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-simd.md
> > b/gcc/config/aarch64/aarch64-simd.md
> > index e456f693d2f..543126948e7 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -1190,6 +1190,20 @@ (define_insn "aarch64_simd_vec_set"
> >[(set_attr "type" "neon_ins, neon_from_gp,
> > neon_load1_one_lane")]
> >  )
> >
> > +(define_insn "aarch64_simd_vec_set_zero_fmov"
> > +  [(set (match_operand:VP_2E 0 "register_operand" "=w")
> > +   (vec_merge:VP_2E
> > +   (match_operand:VP_2E 1 "aarch64_simd_imm_zero" "Dz")
> > +   (match_operand:VP_2E 3 "register_operand" "w")
> > +   (match_operand:SI 2 "immediate_operand" "i")))]
> > +  "TARGET_SIMD
> > +   && (ENDIAN_LANE_N (, exact_log2 (INTVAL (operands[2]))) == 1)"
> > +  {
> > +return "fmov\\t%0, %3";
> > +  }
> > +  [(set_attr "type" "fmov")]
> > +)
> > +
> 
> I think this shows that target-independent code is missing some
> canonicalisation of vec_merge.  combine has:
> 
>   unsigned n_elts = 0;
>   if (GET_CODE (x) == VEC_MERGE
>   && CONST_INT_P (XEXP (x, 2))
>   && GET_MODE_NUNITS (GET_MODE (x)).is_constant (&n_elts)
>   && (swap_commutative_operands_p (XEXP (x, 0), XEXP (x, 1))
> /* Two operands have same precedence, then
>first bit of mask select first operand.  */
> || (!swap_commutative_operands_p (XEXP (x, 1), XEXP (x, 0))
> && !(UINTVAL (XEXP (x, 2)) & 1))))
> {
>   rtx temp = XEXP (x, 0);
>   unsigned HOST_WIDE_INT sel = UINTVAL (XEXP (x, 2));
>   unsigned HOST_WIDE_INT mask = HOST_WIDE_INT_1U;
>   if (n_elts == HOST_BITS_PER_WIDE_INT)
>   mask = -1;
>   else
>   mask = (HOST_WIDE_INT_1U << n_elts) - 1;
>   SUBST (XEXP (x, 0), XEXP (x, 1));
>   SUBST (XEXP (x, 1), temp);
>   SUBST (XEXP (x, 2), GEN_INT (~sel & mask));
> }
> 
> which AFAICT would prefer to put the immediate second, not first.  I think
> we should be doing the same canonicalisation in simplify_ternary_operation,
> and possibly elsewhere, so that the .md pattern only needs to match the
> canonical form (i.e. register, immediate, mask).

Thanks for the suggestion. I've added the canonicalization in a separate
patch.
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/676105.html
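
As a small illustration of why the swap in the quoted combine code is safe, the following sketch models vec_merge on a four-lane integer vector. Vec, vec_merge and swapped_selector are hypothetical helpers written for this example, not GCC functions. Swapping the two vector operands while replacing the selector with `~sel & mask` selects exactly the same lanes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr unsigned N = 4;            // number of lanes in this model
using Vec = std::array<int, N>;

// Model of (vec_merge a b sel): lane i comes from a when bit i of sel is
// set, and from b otherwise.
inline Vec
vec_merge (const Vec &a, const Vec &b, std::uint64_t sel)
{
  Vec r{};
  for (unsigned i = 0; i < N; ++i)
    r[i] = ((sel >> i) & 1) ? a[i] : b[i];
  return r;
}

// Selector to use after swapping the two vector operands; mirrors the
// "~sel & mask" computation in the quoted code.
inline std::uint64_t
swapped_selector (std::uint64_t sel, unsigned n_elts)
{
  std::uint64_t mask = (n_elts == 64) ? ~std::uint64_t{0}
                                      : (std::uint64_t{1} << n_elts) - 1;
  return ~sel & mask;
}
```

For any selector, `vec_merge (a, b, sel)` and `vec_merge (b, a, swapped_selector (sel, N))` produce the same vector, so a pattern only has to match one operand order.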

> 
> On:
> 
> > +   && (ENDIAN_LANE_N (, exact_log2 (INTVAL (operands[2]))) == 1)"
> 
> it seems dangerous to pass exact_log2 to ENDIAN_LANE_N when we haven't
> checked whether it is a power of 2.  (0b00 or 0b11 ought to get
> simplified, but I don't think we can ignore the possibility.)
> 
> Rather than restrict the pattern to pairs, could we instead handle
> VALL_F16 minus the QI elements, with the 16-bit elements restricted to
> TARGET_F16?  E.g. we should be able to handle V4SI using an FMOV of S
> registers if only the low element is nonzero.

Good point! I've addressed these in the latest version. Please let me know
if I missed anything.
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/676106.html
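
The power-of-2 concern can be sketched as below. This is an illustration only: exact_log2_model is a stand-in that mimics the documented behaviour of GCC's exact_log2 (returning -1 when the argument is not a power of 2), and selects_single_lane is a hypothetical helper, not the code from the updated patch.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for exact_log2: log2 of x when x is a power of 2, else -1.
inline int
exact_log2_model (std::uint64_t x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int n = 0;
  while ((x >>= 1) != 0)
    ++n;
  return n;
}

// True only when the mask selects exactly one lane and that lane is `lane`.
// The result of the log2 is checked before it is used as a lane number.
inline bool
selects_single_lane (std::uint64_t mask, int lane)
{
  int bit = exact_log2_model (mask);
  return bit >= 0 && bit == lane;
}
```

With this shape, masks such as 0b00 and 0b11 are rejected up front instead of feeding -1 into a lane computation.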

Thanks,
Pengxuan

> 
> Part of me thinks that this should just be described as a plain old AND,
> but I suppose that doesn't work well for FP modes.  Still, handling ANDs
> might be an interesting follow-up :)
> 
> Thank

Re: [RFC] RISC-V: The optimization ignored the side effects of the rounding mode, resulting in incorrect results.

2025-02-18 Thread Jin Ma

On Tue, 18 Feb 2025 13:48:02 -0700, Jeff Law wrote:
> 
> 
> On 2/18/25 4:12 AM, Jin Ma wrote:
> > We overlooked the side effects of the rounding mode in the pattern,
> > which can impact the result of float_extend and lead to incorrect
> > optimizations in the final program. This issue likely affects nearly
> > all similar patterns that involve rounding modes, and the tests in
> > this patch only highlight one example. It seems challenging to address,
> > and I only implemented a simple fix, which is not a good way to solve
> > the problem.
> > 
> > Any comments on this?
> > 
> > gcc/ChangeLog:
> > 
> > * config/riscv/vector-iterators.md (UNSPEC_VRM): New.
> > * config/riscv/vector.md: Use UNSPEC for float_extend.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/riscv/rvv/base/bug-11.c: New test.
> So as Kito noted, the insn you changed already has a reference to the FRM 
> it needs -- kept in operands[9].  It seems like your patch, while fixing 
> the bug, more likely does so by accident rather than by design.
> 
> What I see when I look at the dump files is a deeper issue.
> 
> 
> In the .expand dump we have:
> 
> > (insn 17 16 18 2 (set (reg:HF 147)
> > (const_double:HF 0.0 [0x0.0p+0])) "j.c":14:24 -1
> >  (nil))
> > (insn 18 17 19 2 (set (reg/v:RVVMF2SF 141 [ vreg ])
> > (if_then_else:RVVMF2SF (unspec:RVVMF64BI [
> > (reg/v:RVVMF64BI 138 [ vmask ])
> > (const_int 1 [0x1])
> > (const_int 0 [0])
> > (const_int 2 [0x2])
> > (const_int 0 [0])
> > (const_int 2 [0x2])
> > (reg:SI 66 vl)
> > (reg:SI 67 vtype)
> > (reg:SI 69 frm)
> > ] UNSPEC_VPREDICATE)
> > (minus:RVVMF2SF (reg/v:RVVMF2SF 140 [ vreg_memory ])
> > (float_extend:RVVMF2SF (vec_duplicate:RVVMF4HF (reg:HF 
> > 147
> > (reg/v:RVVMF2SF 140 [ vreg_memory ]))) "j.c":14:24 -1
> >  (nil))   
> 
> 
> 
> Insn 18 does the subtraction with the adjusted rounding mode.  So far, 
> so good.  Things look fine at the start of cse1.  But if we look at the 
> end of cse1 we have:
> 
> > (insn 17 16 18 2 (set (reg:HF 147)
> > (const_double:HF 0.0 [0x0.0p+0])) "j.c":14:24 136 {*movhf_hardfloat}
> >  (nil))
> > (insn 18 17 19 2 (set (reg/v:RVVMF2SF 141 [ vreg ])
> > (reg/v:RVVMF2SF 140 [ vreg_memory ])) "j.c":14:24 2786 
> > {*movrvvmf2sf_fract}
> >  (expr_list:REG_DEAD (reg:HF 147)
> > (expr_list:REG_DEAD (reg/v:RVVMF2SF 140 [ vreg_memory ])
> > (expr_list:REG_DEAD (reg/v:RVVMF64BI 138 [ vmask ])
> > (expr_list:REG_DEAD (reg:SI 69 frm)
> > (nil))
> 
> 
> Note how CSE replaces the arithmetic with a simple copy.  At this point 
> things are broken.
> 
> I don't see how CSE can make the right decision here; we don't expose 
> rounding modes this early and thus CSE has no way to know it can't make 
> that kind of replacement.
> 
> Your patch kind of works, but it seems to me it's more accident than 
> design and that we need to fix this in a more general manner.
> 
> The natural question is what other targets do when the rounding mode 
> gets changed.  I'm guessing it's exposed as an unspec set before the RTL 
> optimizers run.

I apologize for not explaining things more clearly. I also discovered that
the issue is caused by CSE. I think that during the substitution process,
CSE recognized the syntax of if_then_else and concluded that the expressions
in the "then" and "else" branches are equivalent, resulting in both yielding
(reg/v:RVVMF2SF 140 [ vreg_memory ]):

(minus:RVVMF2SF (reg/v:RVVMF2SF 140 [ vreg_memory ]) 
(float_extend:RVVMF2SF (vec_duplicate:RVVMF4HF (const_double:HF 0.0 [0x0.0p+0]))))

is considered equivalent to:

(reg/v:RVVMF2SF 140 [ vreg_memory ])

Clearly, there wasn’t a deeper consideration of the fact that float_extend
requires a rounding mode (frm). Therefore, I attempted to use UNSPEC in the
pattern to inform CSE that we have a rounding mode.
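
As a scalar illustration of how folding "x - 0.0" into "x" can depend on the rounding mode (not necessarily the exact failure exercised by bug-11.c), consider the sketch below. subtract_in_mode is a hypothetical helper. Under IEEE 754, an exact zero difference of two like-signed operands is +0 in every rounding direction except round toward negative infinity, where it is -0.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Compute a - b with the given rounding mode active, then restore the
// previous mode.  The volatile objects keep the subtraction at run time so
// that it is really performed under the selected mode.
inline float
subtract_in_mode (float a, float b, int mode)
{
  const int saved = std::fegetround ();
  std::fesetround (mode);
  volatile float va = a, vb = b;
  volatile float r = va - vb;
  std::fesetround (saved);
  return r;
}
```

With x = +0.0, `subtract_in_mode (x, 0.0f, FE_TONEAREST)` keeps the sign of x, while `subtract_in_mode (x, 0.0f, FE_DOWNWARD)` yields a negative zero, so replacing the subtraction with a plain copy of x changes the result.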

As I mentioned before, this may not be a good solution, as it risks missing
other optimization opportunities. As you pointed out, we need a more general
approach to fix it. Unfortunately, while I’m still trying to find a solution,
I currently don't have any other good ideas.

Best regards,
Jin Ma

> jeff

[PATCH v2] Vect: Fix ICE when vect_verify_loop_lens acts on relevant mode [PR116351]

2025-02-18 Thread pan2 . li

From: Pan Li 

This patch fixes an ICE like the one below.  Assume we have the following
sample code:

  int a, b, c;
  short d, e, f;
  long g (long h) { return h; }

  void i () {
    for (; b; ++b) {
      f = 5 >> a ? d : d << a;
      e &= c | g(f);
    }
  }

It will ICE when compiled with -O3 -march=rv64gc_zve64f -mrvv-vector-bits=zvl

during GIMPLE pass: vect
pr116351-1.c: In function ‘i’:
pr116351-1.c:8:6: internal compiler error: in get_len_load_store_mode,
at optabs-tree.cc:655
8 | void i () {
  |  ^
0x44d6b9d internal_error(char const*, ...)
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic-global-context.cc:517
0x44a26a6 fancy_abort(char const*, int, char const*)
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic.cc:1722
0x19e4309 get_len_load_store_mode(machine_mode, bool, internal_fn*, vec*)
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/optabs-tree.cc:655
0x1fada40 vect_verify_loop_lens
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:1566
0x1fb2b07 vect_analyze_loop_2
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3037
0x1fb4302 vect_analyze_loop_1
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3478
0x1fb4e9a vect_analyze_loop(loop*, gimple*, vec_info_shared*)
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3638
0x203c2dc try_vectorize_loop_1
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1095
0x203c839 try_vectorize_loop
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1212
0x203cb2c execute

During vectorization the over-widening pattern is matched, which sets DImode
as the vector_mode in loop_vinfo.  After that the loop_vinfo goes through
vect_analyze_xx with the flow below:

vect_analyze_loop_2
 |- vect_pattern_recog // over-widening and set loop_vinfo->vector_mode to DImode
 |- ...
 |- vect_analyze_loop_operations
   |- stmt_info->def_type == vect_reduction_def
   |- stmt_info->slp_type == pure_slp
   |- vectorizable_lc_phi // Not Hit
   |- vectorizable_induction  // Not Hit
   |- vectorizable_reduction  // Not Hit
   |- vectorizable_recurr // Not Hit
   |- vectorizable_live_operation  // Not Hit
   |- vect_analyze_stmt
 |- stmt_info->relevant == vect_unused_in_scope
 |- stmt_info->live == false
 |- p pattern_stmt_info == (stmt_vec_info) 0x0
 |- return opt_result::success ();
 OR
 |- PURE_SLP_STMT (stmt_info) && !node then dump "handled only by SLP analysis\n"
   |- Early return opt_result::success ();
 |- vectorizable_load/store/call_convert/... // Not Hit
   |- LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P && !LOOP_VINFO_MASKS(loop_vinfo).is_empty ()
 |- vect_verify_loop_lens (loop_vinfo)
   |- assert (VECTOR_MODE_P (loop_vinfo->vector_mode)); // Hits the assert, resulting in the ICE

Finally, the DImode in loop_vinfo hits the assert (VECTOR_MODE_P (mode))
in vect_verify_loop_lens.  This patch fixes the ICE by returning false
directly when the loop_vinfo has a relevant mode such as DImode.  Similar
cases may still be mis-optimized; we will try to cover those in separate
patches.
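
The shape of the fix described above can be sketched as a guard that bails out before the asserting helper is reached. This is a model only: Mode, is_vector_mode, len_load_store_supported and verify_loop_lens_model are hypothetical stand-ins for this example and are not the GCC types or the actual three-line change in tree-vect-loop.cc.

```cpp
#include <cassert>

// Hypothetical stand-in for machine modes; not GCC's machine_mode.
enum class Mode { DImode, VectorMode };

inline bool
is_vector_mode (Mode m)
{
  return m == Mode::VectorMode;
}

// Stand-in for a helper that, like the one in the backtrace, asserts that
// it is only ever given a vector mode.
inline bool
len_load_store_supported (Mode m)
{
  assert (is_vector_mode (m));
  return true;
}

// Model of the guarded check: report "cannot use lengths" for a scalar mode
// instead of letting the helper's assertion fire.
inline bool
verify_loop_lens_model (Mode loop_vector_mode)
{
  if (!is_vector_mode (loop_vector_mode))
    return false;
  return len_load_store_supported (loop_vector_mode);
}
```

Returning false here only disables the length-based partial-vector path for that loop; it does not by itself address the remaining mis-optimization mentioned above.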

The following test suites pass with this patch.
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.

PR middle-end/116351

gcc/ChangeLog:

* tree-vect-loop.cc (vect_verify_loop_lens): Return false if the
loop_vinfo has relevant mode such as DImode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr116351-1.c: New test.
* gcc.target/riscv/rvv/base/pr116351-2.c: New test.
* gcc.target/riscv/rvv/base/pr116351.h: New test.

Signed-off-by: Pan Li 
---
 .../gcc.target/riscv/rvv/base/pr116351-1.c |  5 +
 .../gcc.target/riscv/rvv/base/pr116351-2.c |  5 +
 .../gcc.target/riscv/rvv/base/pr116351.h   | 18 ++
 gcc/tree-vect-loop.cc  |  3 +++
 4 files changed, 31 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr116351-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr116351-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr116351.h

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr116351-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr116351-1.c
new file mode 100644
index 000..f58fedfeaf1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr116351-1.c
@@ -0,0 +1,5 @@
+/* Test that we do not have ice when compile */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zve32x -mabi=lp64d -O3 -ftree-vectorize" } */
+
+#include "pr116351.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr116351-2.c 
b/gcc/t

[PATCH] COBOL v3: 10/14 72K doc: man pages and GnuCOBOL emulation

2025-02-18 Thread James K. Lowden

From f89a50238de62b73d9fc44ee7226461650ab119d Tue 18 Feb 2025 04:19:12 PM EST
From: "James K. Lowden" 
Date: Tue 18 Feb 2025 04:19:12 PM EST
Subject: [PATCH] COBOL 10/14 72K doc: man pages and GnuCOBOL emulation

gcc/cobol/ChangeLog
* gcobc: New file.
* gcobol.1: New file.
* gcobol.3: New file.
* help.gen: New file.

gcc/cobol/udf/ChangeLog
* udf/stored-char-length.cbl: New file.

---
gcc/cobol/gcobc | +-
gcc/cobol/gcobol.1 | -
gcc/cobol/gcobol.3 | -
gcc/cobol/help.gen | +++-
gcc/cobol/udf/stored-char-length.cbl | +++
5 files changed, 2451 insertions(+), 5 deletions(-)
diff --git a/gcc/cobol/gcobc b/gcc/cobol/gcobc
new file mode 100755
index 000..93e1bd302a6
--- /dev/null
+++ b/gcc/cobol/gcobc
@@ -0,0 +1,465 @@
+#! /bin/sh -e
+
+#
+# COPYRIGHT
+# The gcobc program is in public domain.
+# If it breaks then you get to keep both pieces.
+#
+# This file emulates the GnuCOBOL cobc compiler to a limited degree.
+# For options that can be "mapped" (see migration-guide.1), it accepts
+# cobc options, changing them to the gcobol equivalents.  Options not
+# recognized by the script are passed verbatim to gcobol, which will
+# reject them unless of course they are gcobol options.
+#
+# User-defined variables, and their defaults:
+#
+# Variable  Default                  Effect
+# echo      none                     If defined, echo the gcobol command
+# gcobcx    none                     Produce verbose messages
+# gcobol    ./gcobol                 Name of the gcobol binary
+# GCOBCUDF  PREFIX/share/cobol/udf/  Location of UDFs to be prepended to input
+#
+# By default, this script includes all files in $GCOBCUDF.  To defeat
+# that behavior, use GCOBCUDF=none.
+#
+# A list of supported options is produced with "gcobc -HELP".
+#
+## Maintainer note. In modifying this file, the following may make
+## your life easier:
+##
+##  - To force the script to exit, either set exit_status to 1, or call
+##the error function.
+##  - As handled options are added, add them to the HELP here-doc.
+##  - The compiler can produce only one kind of output.  In this
+##script, that's known by $mode.  Options that affect the type of
+##output set the mode variable.  Everything else is appended to the
+##opts variable.
+##
+
+if [ "$COBCPY" ]
+then
+copydir="-I$COBCPY"
+fi
+
+if [ "$COB_COPY_DIR" ]
+then
+copydir="-I$COB_COPY_DIR"
+fi
+
+# TODO: this file likely needs to query gcobol for its shared path instead
+udf_default="${0%/*}/../share/gcobol/udf"
+if [ ! -d "$udfdir" ]
+then
+

[PATCH] COBOL v3: 1/14 4K dir: create gcc/cobol and libgcobol directories

2025-02-18 Thread James K. Lowden

From f89a50238de62b73d9fc44ee7226461650ab119d Tue 18 Feb 2025 04:19:09 PM EST
From: "James K. Lowden" 
Date: Tue 18 Feb 2025 04:19:09 PM EST
Subject: [PATCH] COBOL  1/14 4.0K dir: create gcc/cobol and libgcobol directories

contrib/gcc-changelog/ChangeLog
* contrib/gcc-changelog/git_commit.py: Add libgcobol module and cobol 
language.

---
contrib/gcc-changelog/git_commit.py | ++
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/contrib/gcc-changelog/git_commit.py 
b/contrib/gcc-changelog/git_commit.py
index 5c0596c2627..c2297d1051f 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -39,6 +39,7 @@ default_changelog_locations = {
 'gcc/c-family',
 'gcc',
 'gcc/cp',
+'gcc/cobol',
 'gcc/d',
 'gcc/fortran',
 'gcc/go',
@@ -66,6 +67,7 @@ default_changelog_locations = {
 'libgcc',
 'libgcc/config/avr/libf7',
 'libgcc/config/libbid',
+'libgcobol',
 'libgfortran',
 'libgm2',
 'libgomp',

[PATCH] COBOL v3: 11/14 84K lhd: libgcobol header files

2025-02-18 Thread James K. Lowden

From f89a50238de62b73d9fc44ee7226461650ab119d Tue 18 Feb 2025 04:19:13 PM EST
From: "James K. Lowden" 
Date: Tue 18 Feb 2025 04:19:13 PM EST
Subject: [PATCH] COBOL 11/14 84K lhd: libgcobol header files

libgcobol/ChangeLog
* /charmaps.h: New file.
* /common-defs.h: New file.
* /ec.h: New file.
* /exceptl.h: New file.
* /gcobolio.h: New file.
* /gfileio.h: New file.
* /gmath.h: New file.
* /io.h: New file.
* /libgcobol.h: New file.
* /valconv.h: New file.

---
libgcobol/charmaps.h | +-
libgcobol/common-defs.h | -
libgcobol/ec.h | +-
libgcobol/exceptl.h | -
libgcobol/gcobolio.h | ++-
libgcobol/gfileio.h | +-
libgcobol/gmath.h | ++-
libgcobol/io.h | +-
libgcobol/libgcobol.h | +-
libgcobol/valconv.h |
10 files changed, 2017 insertions(+), 10 deletions(-)
diff --git a/libgcobol/charmaps.h b/libgcobol/charmaps.h
new file mode 100644
index 000..64270c6f08c
--- /dev/null
+++ b/libgcobol/charmaps.h
@@ -0,0 +1,369 @@
+/*
+ * Copyright (c) 2021-2025 Symas Corporation
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following disclaimer
+ *   in the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of the Symas Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived from
+ *   this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef CHARMAPS_H
+#define CHARMAPS_H
+
+#include 
+
+/*  There are four distinct codeset domains in the COBOL compiler.
+ *
+ *  First is the codeset of the console.  Established by looking at what
+ *  setlocale() reports, this can be either UTF-8 or some ASCII based code
+ *  page.  (We assume CP1252).  Data coming from the console or the system,
+ *  ACCEPT statements;

New Chinese (simplified) PO file for 'cpplib' (version 15-b20250216)

2025-02-18 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Chinese (simplified) team of translators.  The file is available at:

https://translationproject.org/latest/cpplib/zh_CN.po

(This file, 'cpplib-15-b20250216.zh_CN.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

Contents of PO file 'cpplib-15-b20250216.zh_CN.po'

2025-02-18 Thread Translation Project Robot



cpplib-15-b20250216.zh_CN.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.

RE: [PATCH] aarch64: Recognize vector permute patterns suitable for FMOV [PR100165]

2025-02-18 Thread quic_pzheng

> > Pengxuan Zheng  writes:
> > > This patch optimizes certain vector permute expansion with the FMOV
> > > instruction when one of the input vectors is a vector of all zeros
> > > and the result of the vector permute is as if the upper lane of the
> > > non-zero input vector is set to zero and the lower lane remains
> > > unchanged.
> > >
> > > Note that the patch also propagates zero_op0_p and zero_op1_p during
> > > re-encode now.  They will be used by aarch64_evpc_fmov to check if
> > > the input vectors are valid candidates.
> > >
> > >   PR target/100165
> > >
> > > gcc/ChangeLog:
> > >
> > >   * config/aarch64/aarch64-simd.md (aarch64_simd_vec_set_zero_fmov):
> > >   New define_insn.
> > >   * config/aarch64/aarch64.cc (aarch64_evpc_reencode): Copy zero_op0_p and
> > >   zero_op1_p.
> > >   (aarch64_evpc_fmov): New function.
> > >   (aarch64_expand_vec_perm_const_1): Add call to aarch64_evpc_fmov.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.target/aarch64/vec-set-zero.c: Update test accordingly.
> > >   * gcc.target/aarch64/fmov.c: New test.
> > >   * gcc.target/aarch64/fmov-be.c: New test.
> >
> > Nice!  Thanks for doing this.  Some comments on the patch below.
> > >
> > > Signed-off-by: Pengxuan Zheng 
> > > ---
> > >  gcc/config/aarch64/aarch64-simd.md|  14 +++
> > >  gcc/config/aarch64/aarch64.cc |  74 +++-
> > >  gcc/testsuite/gcc.target/aarch64/fmov-be.c|  74 
> > >  gcc/testsuite/gcc.target/aarch64/fmov.c   | 110 ++
> > >  .../gcc.target/aarch64/vec-set-zero.c |   6 +-
> > >  5 files changed, 275 insertions(+), 3 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-be.c
> > >  create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov.c
> > >
> > > diff --git a/gcc/config/aarch64/aarch64-simd.md
> > > b/gcc/config/aarch64/aarch64-simd.md
> > > index e456f693d2f..543126948e7 100644
> > > --- a/gcc/config/aarch64/aarch64-simd.md
> > > +++ b/gcc/config/aarch64/aarch64-simd.md
> > > @@ -1190,6 +1190,20 @@ (define_insn "aarch64_simd_vec_set"
> > >[(set_attr "type" "neon_ins, neon_from_gp,
> > > neon_load1_one_lane")]
> > >  )
> > >
> > > +(define_insn "aarch64_simd_vec_set_zero_fmov"
> > > +  [(set (match_operand:VP_2E 0 "register_operand" "=w")
> > > + (vec_merge:VP_2E
> > > + (match_operand:VP_2E 1 "aarch64_simd_imm_zero" "Dz")
> > > + (match_operand:VP_2E 3 "register_operand" "w")
> > > + (match_operand:SI 2 "immediate_operand" "i")))]
> > > +  "TARGET_SIMD
> > > +   && (ENDIAN_LANE_N (, exact_log2 (INTVAL (operands[2]))) == 1)"
> > > +  {
> > > +return "fmov\\t%0, %3";
> > > +  }
> > > +  [(set_attr "type" "fmov")]
> > > +)
> > > +
> >
> > I think this shows that target-independent code is missing some
> > canonicalisation of vec_merge.  combine has:
> >
> >   unsigned n_elts = 0;
> >   if (GET_CODE (x) == VEC_MERGE
> >   && CONST_INT_P (XEXP (x, 2))
> >   && GET_MODE_NUNITS (GET_MODE (x)).is_constant (&n_elts)
> >   && (swap_commutative_operands_p (XEXP (x, 0), XEXP (x, 1))
> >   /* Two operands have same precedence, then
> >  first bit of mask select first operand.  */
> >   || (!swap_commutative_operands_p (XEXP (x, 1), XEXP (x, 0))
> >   && !(UINTVAL (XEXP (x, 2)) & 1))))
> > {
> >   rtx temp = XEXP (x, 0);
> >   unsigned HOST_WIDE_INT sel = UINTVAL (XEXP (x, 2));
> >   unsigned HOST_WIDE_INT mask = HOST_WIDE_INT_1U;
> >   if (n_elts == HOST_BITS_PER_WIDE_INT)
> > mask = -1;
> >   else
> > mask = (HOST_WIDE_INT_1U << n_elts) - 1;
> >   SUBST (XEXP (x, 0), XEXP (x, 1));
> >   SUBST (XEXP (x, 1), temp);
> >   SUBST (XEXP (x, 2), GEN_INT (~sel & mask));
> > }
> >
> > which AFAICT would prefer to put the immediate second, not first.  I
> > think we should be doing the same canonicalisation in
> > simplify_ternary_operation, and possibly elsewhere, so that the .md
> > pattern only needs to match the canonical form (i.e. register, immediate,
> > mask).
> 
> Thanks for the suggestion. I've added the canonicalization in a separate
> patch.
> https://gcc.gnu.org/pipermail/gcc-patches/2025-February/676105.html
> 
> >
> > On:
> >
> > > +   && (ENDIAN_LANE_N (, exact_log2 (INTVAL (operands[2]))) == 1)"
> >
> > it seems dangerous to pass exact_log2 to ENDIAN_LANE_N when we haven't
> > checked whether it is a power of 2.  (0b00 or 0b11 ought to get
> > simplified, but I don't think we can ignore the possibility.)
> >
> > Rather than restrict the pattern to pairs, could we instead handle
> > VALL_F16 minus the QI elements, with the 16-bit elements restricted to
> > TARGET_F16?  E.g. we should be able to handle V4SI using an FMOV of S
> > registers if only the low element is nonzero.
> 
> Good point! I've addressed these in the latest version. Please let me know
> if I missed anything.
> https://gcc.gnu.org/pipermail/gcc-patches/2025-February/676106.html
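
For readers following the canonicalisation discussion above, the operand
swap in the quoted combine snippet can be modelled outside of GCC.  This is
only a minimal sketch under stated assumptions, not GCC code: the VecMerge
struct and swap_vec_merge_operands function are hypothetical, the operands
are plain integers rather than rtx values, and only the mask arithmetic
follows the quoted snippet.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical stand-in for a vec_merge rtx: element i of the result comes
// from op0 when bit i of sel is set, and from op1 otherwise.
struct VecMerge {
  int op0;
  int op1;
  std::uint64_t sel;
};

// Swap the two data operands and invert the selector so that the merge
// still describes the same result.  n_elts is the number of vector
// elements (1..64); selector bits at or above n_elts are discarded.
inline VecMerge swap_vec_merge_operands(VecMerge m, unsigned n_elts) {
  // Shifting a 64-bit value by 64 is undefined, hence the special case,
  // as in the combine snippet.
  std::uint64_t mask = (n_elts == 64) ? ~std::uint64_t{0}
                                      : (std::uint64_t{1} << n_elts) - 1;
  std::swap(m.op0, m.op1);
  m.sel = ~m.sel & mask;
  return m;
}
```

With four elements, a merge whose selector is 0x1 ends up with selector 0xE
after the swap, so a pattern written against the swapped operand order has
to test the inverted mask rather than the original one.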

Missed

[PATCH v3] aarch64: Recognize vector permute patterns suitable for FMOV [PR100165]

2025-02-18 Thread Pengxuan Zheng

This patch optimizes certain vector permute expansion with the FMOV instruction
when one of the input vectors is a vector of all zeros and the result of the
vector permute is as if the upper lane of the non-zero input vector is set to
zero and the lower lane remains unchanged.

Note that the patch also propagates zero_op0_p and zero_op1_p during re-encode
now.  They will be used by aarch64_evpc_fmov to check if the input vectors are
valid candidates.

PR target/100165

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_lane0_mask_p): New.
* config/aarch64/aarch64-simd.md (@aarch64_simd_vec_set_zero_fmov):
New define_insn.
* config/aarch64/aarch64.cc (aarch64_lane0_mask_p): New.
(aarch64_evpc_reencode): Copy zero_op0_p and zero_op1_p.
(aarch64_evpc_fmov): New.
(aarch64_expand_vec_perm_const_1): Add call to aarch64_evpc_fmov.
* config/aarch64/iterators.md (VALL_F16_NO_QI): New mode iterator.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vec-set-zero.c: Update test accordingly.
* gcc.target/aarch64/fmov-1.c: New test.
* gcc.target/aarch64/fmov-2.c: New test.
* gcc.target/aarch64/fmov-3.c: New test.
* gcc.target/aarch64/fmov-be-1.c: New test.
* gcc.target/aarch64/fmov-be-2.c: New test.
* gcc.target/aarch64/fmov-be-3.c: New test.

Signed-off-by: Pengxuan Zheng 
---
 gcc/config/aarch64/aarch64-protos.h   |   2 +-
 gcc/config/aarch64/aarch64-simd.md|  13 ++
 gcc/config/aarch64/aarch64.cc |  96 ++-
 gcc/config/aarch64/iterators.md   |   9 +
 gcc/testsuite/gcc.target/aarch64/fmov-1.c | 158 ++
 gcc/testsuite/gcc.target/aarch64/fmov-2.c |  52 ++
 gcc/testsuite/gcc.target/aarch64/fmov-3.c | 144 
 gcc/testsuite/gcc.target/aarch64/fmov-be-1.c  | 144 
 gcc/testsuite/gcc.target/aarch64/fmov-be-2.c  |  52 ++
 gcc/testsuite/gcc.target/aarch64/fmov-be-3.c  | 144 
 .../gcc.target/aarch64/vec-set-zero.c |   6 +-
 11 files changed, 816 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-be-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-be-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-be-3.c

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 4235f4a0ca5..cba94914903 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1051,7 +1051,7 @@ void aarch64_subvti_scratch_regs (rtx, rtx, rtx *,
  rtx *, rtx *, rtx *);
 void aarch64_expand_subvti (rtx, rtx, rtx,
rtx, rtx, rtx, rtx, bool);
-
+bool aarch64_lane0_mask_p (unsigned int, rtx);
 
 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index e2afe87e513..6ddc27c223e 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1190,6 +1190,19 @@ (define_insn "@aarch64_simd_vec_set"
   [(set_attr "type" "neon_ins, neon_from_gp, neon_load1_one_lane")]
 )
 
+(define_insn "@aarch64_simd_vec_set_zero_fmov"
+  [(set (match_operand:VALL_F16_NO_QI 0 "register_operand" "=w")
+   (vec_merge:VALL_F16_NO_QI
+   (match_operand:VALL_F16_NO_QI 1 "register_operand" "w")
+   (match_operand:VALL_F16_NO_QI 2 "aarch64_simd_imm_zero" "Dz")
+   (match_operand:SI 3 "immediate_operand" "i")))]
+  "TARGET_SIMD && aarch64_lane0_mask_p (, operands[3])"
+  {
+return "fmov\\t%0, %1";
+  }
+  [(set_attr "type" "fmov")]
+)
+
 (define_insn "aarch64_simd_vec_set_zero"
   [(set (match_operand:VALL_F16 0 "register_operand" "=w")
(vec_merge:VALL_F16
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f5f23f6ff4b..c29a43f2553 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -23682,6 +23682,15 @@ aarch64_strided_registers_p (rtx *operands, unsigned 
int num_operands,
   return true;
 }
 
+/* Return TRUE if OP is a valid vec_merge bit mask for lane 0.  */
+
+bool
+aarch64_lane0_mask_p (unsigned int nelts, rtx op)
+{
+  return exact_log2 (INTVAL (op)) >= 0
+&& (ENDIAN_LANE_N (nelts, exact_log2 (INTVAL (op))) == 0);
+}
+
 /* Bounds-check lanes.  Ensure OPERAND lies between LOW (inclusive) and
HIGH (exclusive).  */
 void
@@ -26058,6 +26067,8 @@ aarch64_evpc_reencode (struct expand_vec_perm_d *d)
   newd.target = d->target ? gen_lowpart (new_mode, d->target) : NULL;
   newd.op0 = d->op0 ? gen_lowpart (new_mode, d->op0) : NULL;
   newd.op1 = d->op1 ? gen_lowpart (new_mode, d->op1) : NULL;
+
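
The mask test added by aarch64_lane0_mask_p above can be illustrated with a
standalone sketch.  The names below are hypothetical and the code does not
use GCC's own helpers: exact_log2_sketch imitates exact_log2 (returning -1
when the value is not a power of two), and endian_lane_n_sketch assumes
that ENDIAN_LANE_N reverses lane numbers on big-endian targets and leaves
them unchanged otherwise.

```cpp
#include <cstdint>

// Return log2 (x) if x is an exact power of two, otherwise -1.
inline int exact_log2_sketch(std::uint64_t x) {
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int n = 0;
  while ((x >>= 1) != 0)
    ++n;
  return n;
}

// Assumed behaviour of ENDIAN_LANE_N: lane numbers are reversed on
// big-endian targets and unchanged on little-endian ones.
inline int endian_lane_n_sketch(unsigned nelts, int lane, bool big_endian) {
  return big_endian ? static_cast<int>(nelts) - 1 - lane : lane;
}

// True if mask selects exactly one element and, after the endian
// adjustment, that element is lane 0.
inline bool lane0_mask_p_sketch(unsigned nelts, std::uint64_t mask,
                                bool big_endian) {
  int lane = exact_log2_sketch(mask);
  return lane >= 0 && endian_lane_n_sketch(nelts, lane, big_endian) == 0;
}
```

Checking the power-of-two condition first means that masks such as 0x0 or
0x3 are rejected before any lane arithmetic happens, which addresses the
concern raised in review about feeding a failed exact_log2 result into
ENDIAN_LANE_N.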

Re: [PATCH v2 15/16] Add error cases and tests for Aarch64 FMV.

2025-02-18 Thread Richard Sandiford

Alfie Richards  writes:
> This changes the ambiguation error for C++ to cover cases of differently
> annotated FMV function sets whose signatures only differ by their return
> type.
>
> It also adds tests covering many FMV errors for Aarch64, including
> redeclaration, and mixing target_clones and target_versions.

The tests look good.  Sorry for not applying the series to find out
for myself, but what's the full message for:

> diff --git a/gcc/testsuite/g++.target/aarch64/mvc-error2.C 
> b/gcc/testsuite/g++.target/aarch64/mvc-error2.C
> new file mode 100644
> index 000..0e956e402d8
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/aarch64/mvc-error2.C
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-require-ifunc "" } */
> +/* { dg-options "-O0" } */
> +/* { dg-additional-options "-Wno-experimental-fmv-target" } */
> +
> +__attribute__ ((target_clones ("default, dotprod"))) float
> +foo () { return 3; } /* { dg-message "previously defined here" } */
> +
> +__attribute__ ((target_clones ("dotprod", "mve"))) float
> +foo () { return 3; } /* { dg-error "redefinition of" } */

...the redefinition error here?  Does it mention dotprod specifically?
If so, it might be worth capturing that in the test, so that we don't
regress later.

Thanks,
Richard

Re: 7/7 [Fortran, Patch, Coarray, PR107635] Remove deprecated coarray routines

2025-02-18 Thread Thomas Koenig


On 18.02.25 at 16:00, Andre Vehreschild wrote:

Hi Thomas,


This patch series (of necessity) introduces ABI changes.  What will
happen with user code compiled against the old interface?


That depends on the library you are linking against. When using caf_single from
gfortran, then you will get link failures when you mix code compiled by
gfortran < 15 and gfortran-15. But caf_single is anyhow only considered for
testing. So why should one do this?


OK.


If your question targets the users of this ABI, which to my knowledge is only
OpenCoarrays at the moment, then the user will experience nothing. A mix of
pre-gfortran-15 and gfortran-15 generated .o-files will link and work as
expected, because OpenCoarrays provides all ABIs. We do not compile a
gfortran-15 exclusive version of OpenCoarrays, i.e. all routines are present,
fully functional and interoperable.


Very good, then.


I guess a link failure (plus an answer in stack exchange where the
explanation is given, so people can google it, and a mention in the
release notes) would be acceptable, but is there anything that
can be done in addition?


I can provide an entry in release notes, if need be. Where do I have to do
this? Never did.


It is a separate repository from the gcc source, it can be found by
cloning  git+ssh://you...@gcc.gnu.org/git/gcc-wwwdocs.git .

Best regards (and a lot of thanks for the patch series!)

Thomas

New template for 'gcc' made available

2025-02-18 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.  (If you have
any questions, send them to .)

A new POT file for textual domain 'gcc' has been made available
to the language teams for translation.  It is archived as:

https://translationproject.org/POT-files/gcc-15-b20250216.pot

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

Below is the URL which has been provided to the translators of your
package.  Please inform the translation coordinator, at the address
at the bottom, if this information is not current:

https://gcc.gnu.org/pub/gcc/snapshots/15-20250216/gcc-15-20250216.tar.xz

Translated PO files will later be automatically e-mailed to you.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

New template for 'cpplib' made available

2025-02-18 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.  (If you have
any questions, send them to .)

A new POT file for textual domain 'cpplib' has been made available
to the language teams for translation.  It is archived as:

https://translationproject.org/POT-files/cpplib-15-b20250216.pot

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

Below is the URL which has been provided to the translators of your
package.  Please inform the translation coordinator, at the address
at the bottom, if this information is not current:

https://gcc.gnu.org/pub/gcc/snapshots/15-20250216/gcc-15-20250216.tar.xz

Translated PO files will later be automatically e-mailed to you.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

Re: [PATCH] c, v2: do not warn about truncating NUL char when initializing nonstring arrays [PR117178]

2025-02-18 Thread Kees Cook

On Fri, Feb 14, 2025 at 11:21:07AM +0100, Jakub Jelinek wrote:
> On Thu, Feb 13, 2025 at 02:10:25PM +0100, Jakub Jelinek wrote:
> > Kees, are you submitting this under assignment to FSF (maybe the Google one
> > if it has one) or DCO?  See https://gcc.gnu.org/contribute.html#legal
> > for details.  If DCO, can you add your Signed-off-by: tag for it?
> > 
> > So far lightly tested, ok for trunk if it passes bootstrap/regtest?
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux successfully.

Thank you for getting this done! I really appreciate having this
available. I'll give it a spin. :)

-- 
Kees Cook

Contents of PO file 'cpplib-15-b20250216.uk.po'

2025-02-18 Thread Translation Project Robot



cpplib-15-b20250216.uk.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.

Re: [PATCH] aarch64: Ignore target pragmas while defining intrinsics

2025-02-18 Thread Richard Sandiford

Andrew Carlotti  writes:
> When initialising intrinsics with `#pragma GCC aarch64 "arm_*.h"`, we
> often set an explicit target, but currently leave current_target_pragma
> unchanged.  This results in the target pragma being applied to each
> simulated intrinsic on top of our explicit target, which is clearly 
> undesirable.
>
> As far as I can tell this doesn't cause any bugs at the moment, because
> none of the behaviour for builtin functions depends upon the function
> specific target.  However, the unintended target feature combinations
> led to unwanted behaviour in an under-development patch.
>
> This patch resolves the issue by extending aarch64_simd_switcher to
> explicitly unset the current_target_pragma, and adapting it to
> support handle_arm_acle_h as well.  I've also renamed the switcher classes
> and instances, because I think the new names are slightly clearer.
>
> The chosen sets of features for arm_sve.h and arm_sme.h are not normally
> valid, because they exclude FCMA and BF16.  However, I don't think that
> matters for the usage here.  Alternatively, aarch64_target_switcher
> could be modified to enable all the dependent features as well.
>
>
> Bootstrapped and regression tested on aarch64. Ok for master (to enable the
> dependent WIP patch)?
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-builtins.cc
>   (aarch64_simd_switcher::aarch64_simd_switcher): Rename to...
>   (aarch64_target_switcher::aarch64_target_switcher): ...this,
>   remove default simd flags and save current_target_pragma.
>   (aarch64_simd_switcher::~aarch64_simd_switcher): Rename to...
>   (aarch64_target_switcher::~aarch64_target_switcher): ...this,
>   and restore current_target_pragma.
>   (handle_arm_acle_h): Use aarch64_target_switcher.
>   (handle_arm_neon_h): Rename switcher and pass explicit flags.
>   (aarch64_general_init_builtins): Ditto.
>   * config/aarch64/aarch64-protos.h
>   (class aarch64_simd_switcher): Rename to...
>   (class aarch64_target_switcher): ...this, and add pragma member.
>   * config/aarch64/aarch64-sve-builtins.cc
>   (sve_switcher::sve_switcher): Rename to...
>   (sve_target_switcher::sve_target_switcher): ...this.
>   (sve_switcher::~sve_switcher): Rename to...
>   (sve_target_switcher::~sve_target_switcher): ...this.
>   (init_builtins): Rename switcher.
>   (handle_arm_sve_h): Ditto.
>   (handle_arm_neon_sve_bridge_h): Ditto.
>   (handle_arm_sme_h): Ditto.
>   * config/aarch64/aarch64-sve-builtins.h
>   (class sve_switcher): Rename to...
>   (class sve_target_switcher): ...this.
>   (class sme_switcher): Rename to...
>   (class sme_target_switcher): ...this.
>
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> b/gcc/config/aarch64/aarch64-builtins.cc
> index 
> 128cc365d3d585e01cb69668f285318ee56a36fc..c1cb6cdcc81c6b45c0132250589bba0be42f195d
>  100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -1877,23 +1877,25 @@ aarch64_scalar_builtin_type_p (aarch64_simd_type t)
>return (t == Poly8_t || t == Poly16_t || t == Poly64_t || t == Poly128_t);
>  }
>  
> -/* Enable AARCH64_FL_* flags EXTRA_FLAGS on top of the base Advanced SIMD
> -   set.  */
> -aarch64_simd_switcher::aarch64_simd_switcher (aarch64_feature_flags 
> extra_flags)
> +/* Temporarily set FLAGS as the enabled target features.  */
> +aarch64_target_switcher::aarch64_target_switcher (aarch64_feature_flags 
> flags)
>: m_old_asm_isa_flags (aarch64_asm_isa_flags),
> -m_old_general_regs_only (TARGET_GENERAL_REGS_ONLY)
> +m_old_general_regs_only (TARGET_GENERAL_REGS_ONLY),
> +m_old_target_pragma (current_target_pragma)
>  {
>/* Changing the ISA flags should be enough here.  We shouldn't need to
>   pay the compile-time cost of a full target switch.  */
>global_options.x_target_flags &= ~MASK_GENERAL_REGS_ONLY;
> -  aarch64_set_asm_isa_flags (AARCH64_FL_FP | AARCH64_FL_SIMD | extra_flags);
> +  aarch64_set_asm_isa_flags (flags);

This feels a bit inconsistent, in that it forces -mgeneral-regs off
but doesn't force AARCH64_FL_FP on.  I think it'd be better to keep
this part of aarch64_simd_(target_)switcher (and continue to have
sve_(target_)switcher derive from it) and make aarch64_target_switcher
a new base class that just does the pragma bit.

Thanks,
Richard

> +  current_target_pragma = NULL_TREE;
>  }
>  
> -aarch64_simd_switcher::~aarch64_simd_switcher ()
> +aarch64_target_switcher::~aarch64_target_switcher ()
>  {
>if (m_old_general_regs_only)
>  global_options.x_target_flags |= MASK_GENERAL_REGS_ONLY;
>aarch64_set_asm_isa_flags (m_old_asm_isa_flags);
> +  current_target_pragma = m_old_target_pragma;
>  }
>  
>  /* Implement #pragma GCC aarch64 "arm_neon.h".
> @@ -1903,7 +1905,7 @@ aarch64_simd_switcher::~aarch64_simd_switcher ()
>  void
>  handle_arm_neon_h (void)
>  {
> -  aarch64_simd_switcher simd;
> +  a
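
The layering suggested in the review above (a base class that only handles
the pragma, with the SIMD switcher deriving from it and still forcing its
base flags on) can be sketched as follows.  Everything here is a
hypothetical model: the globals g_target_pragma and g_isa_flags, the flag
values, and the class names are placeholders rather than GCC's real
variables or classes.

```cpp
// Hypothetical globals standing in for current_target_pragma and the
// ISA flag state; they are not the real GCC variables.
static const char *g_target_pragma = nullptr;
static unsigned g_isa_flags = 0;

// Base class: clears the pragma for its lifetime and restores it after.
class target_switcher {
public:
  target_switcher() : m_old_pragma(g_target_pragma) {
    g_target_pragma = nullptr;
  }
  ~target_switcher() { g_target_pragma = m_old_pragma; }
  target_switcher(const target_switcher &) = delete;
  target_switcher &operator=(const target_switcher &) = delete;

private:
  const char *m_old_pragma;
};

// Derived class: additionally forces a base flag set on, plus any extras,
// and restores the previous flags on destruction.
class simd_switcher : public target_switcher {
public:
  static constexpr unsigned FL_FP = 1u << 0;
  static constexpr unsigned FL_SIMD = 1u << 1;

  explicit simd_switcher(unsigned extra_flags = 0)
      : m_old_flags(g_isa_flags) {
    g_isa_flags = FL_FP | FL_SIMD | extra_flags;
  }
  ~simd_switcher() { g_isa_flags = m_old_flags; }

private:
  unsigned m_old_flags;
};
```

Because destructors run in reverse order of construction, the flags are
restored before the pragma, and both are back to their old values when the
switcher goes out of scope.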

[PATCH] avoid-store-forwarding: Handle REG_EH_REGION notes

2025-02-18 Thread Konstantinos Eleftheriou

From: kelefth 

The pass rejects the transformation when there are instructions in the
sequence that might throw an exception.  This check was added because the
load instruction can carry a REG_EH_REGION note, and moving it before the
store instructions caused an error, as it was no longer the last
instruction in the basic block.

This patch handles those cases by moving a possible REG_EH_REGION
note from the load instruction of the store-load sequence to the
last instruction of the basic block.

gcc/ChangeLog:

* avoid-store-forwarding.cc (process_store_forwarding):
(store_forwarding_analyzer::avoid_store_forwarding):
Move a possible REG_EH_REGION note from the load instruction
to the last instruction of the basic block.
---
 gcc/avoid-store-forwarding.cc | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
index 34a7bba4043..05c91bb1a82 100644
--- a/gcc/avoid-store-forwarding.cc
+++ b/gcc/avoid-store-forwarding.cc
@@ -400,6 +400,17 @@ process_store_forwarding (vec &stores, 
rtx_insn *load_insn,
   if (load_elim)
 delete_insn (load_insn);
 
+  /* Find possible REG_EH_REGION note in the load instruction and move it
+ into the last instruction of the basic block.  */
+  rtx reg_eh_region_note = find_reg_note (load_insn, REG_EH_REGION, NULL_RTX);
+  if (reg_eh_region_note != NULL_RTX)
+{
+  remove_note (load_insn, reg_eh_region_note);
+  basic_block load_bb = BLOCK_FOR_INSN (load_insn);
+  add_reg_note (BB_END (load_bb), REG_EH_REGION,
+   XEXP (reg_eh_region_note, 0));
+}
+
   return true;
 }
 
@@ -425,7 +436,7 @@ store_forwarding_analyzer::avoid_store_forwarding 
(basic_block bb)
 
   rtx set = single_set (insn);
 
-  if (!set || insn_could_throw_p (insn))
+  if (!set)
{
  store_exprs.truncate (0);
  continue;
-- 
2.47.0
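
The note handling described in the commit message above can be modelled
with a small standalone sketch.  It is not GCC code: Insn and
move_eh_note_to_block_end are hypothetical, a basic block is reduced to a
vector of instructions, and the REG_EH_REGION note is reduced to an
optional region number.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Toy instruction: a uid plus an optional exception-region number that
// stands in for a REG_EH_REGION note.
struct Insn {
  int uid;
  std::optional<int> eh_region;
};

// Move the EH-region note, if any, from block[load_idx] to the last
// instruction of the block.  Returns true if a note was found and moved.
inline bool move_eh_note_to_block_end(std::vector<Insn> &block,
                                      std::size_t load_idx) {
  if (load_idx >= block.size() || !block[load_idx].eh_region)
    return false;
  int region = *block[load_idx].eh_region;
  block[load_idx].eh_region.reset();
  block.back().eh_region = region;
  return true;
}
```

In this model the note always ends up on the final instruction of the
block, which is the property the original check was protecting.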

[committed] testsuite: Include stdint.h instead of stdint-gcc.h in some tests

2025-02-18 Thread John David Anglin

Fixes PR testsuite/116986.  Tested on hppa-unknown-linux-gnu and
hppa64-hp-hpux11.11.

Committed to trunk.

Dave
---

testsuite: Include stdint.h instead of stdint-gcc.h in some tests

When use_gcc_stdint=provide, the stdint-gcc.h header is not provided.

2025-02-18  John David Anglin  

gcc/testsuite/ChangeLog:

PR testsuite/116986
* gcc.dg/crc-builtin-rev-target32.c: Include stdint.h
instead of stdint-gcc.h.
* gcc.dg/crc-builtin-rev-target64.c: Likewise.
* gcc.dg/crc-builtin-target32.c: Likewise.
* gcc.dg/crc-builtin-target64.c: Likewise.
* gcc.dg/torture/pr115387-2.c: Likewise.

diff --git a/gcc/testsuite/gcc.dg/crc-builtin-rev-target32.c 
b/gcc/testsuite/gcc.dg/crc-builtin-rev-target32.c
index 4fc58e5f513..f2b63db7fd1 100644
--- a/gcc/testsuite/gcc.dg/crc-builtin-rev-target32.c
+++ b/gcc/testsuite/gcc.dg/crc-builtin-rev-target32.c
@@ -2,7 +2,7 @@
 /* { dg-require-effective-target int32plus } */
 /* { dg-additional-options "-fdump-rtl-expand-details" } */
 
-#include <stdint-gcc.h>
+#include <stdint.h>
 
 int8_t rev_crc8_data8 ()
 {
diff --git a/gcc/testsuite/gcc.dg/crc-builtin-rev-target64.c 
b/gcc/testsuite/gcc.dg/crc-builtin-rev-target64.c
index d63981e0101..97e80004d37 100644
--- a/gcc/testsuite/gcc.dg/crc-builtin-rev-target64.c
+++ b/gcc/testsuite/gcc.dg/crc-builtin-rev-target64.c
@@ -2,7 +2,7 @@
 /* { dg-require-effective-target int32plus } */
 /* { dg-additional-options "-fdump-rtl-expand-details" } */
 
-#include <stdint-gcc.h>
+#include <stdint.h>
 
 int8_t rev_crc8_data8 ()
 {
diff --git a/gcc/testsuite/gcc.dg/crc-builtin-target32.c 
b/gcc/testsuite/gcc.dg/crc-builtin-target32.c
index 13db531e93a..43db8c96e16 100644
--- a/gcc/testsuite/gcc.dg/crc-builtin-target32.c
+++ b/gcc/testsuite/gcc.dg/crc-builtin-target32.c
@@ -2,7 +2,7 @@
 /* { dg-require-effective-target int32plus } */
 /* { dg-additional-options "-fdump-rtl-expand-details" } */
 
-#include <stdint-gcc.h>
+#include <stdint.h>
 
 int8_t crc8_data8 ()
 {
diff --git a/gcc/testsuite/gcc.dg/crc-builtin-target64.c 
b/gcc/testsuite/gcc.dg/crc-builtin-target64.c
index 4b3d813995a..09aa39fcd86 100644
--- a/gcc/testsuite/gcc.dg/crc-builtin-target64.c
+++ b/gcc/testsuite/gcc.dg/crc-builtin-target64.c
@@ -2,7 +2,7 @@
 /* { dg-require-effective-target int32plus } */
 /* { dg-additional-options "-fdump-rtl-expand-details" } */
 
-#include <stdint-gcc.h>
+#include <stdint.h>
 
 int8_t crc8_data8 ()
 {
diff --git a/gcc/testsuite/gcc.dg/torture/pr115387-2.c 
b/gcc/testsuite/gcc.dg/torture/pr115387-2.c
index 9e93024b45c..190ad4b0977 100644
--- a/gcc/testsuite/gcc.dg/torture/pr115387-2.c
+++ b/gcc/testsuite/gcc.dg/torture/pr115387-2.c
@@ -2,7 +2,7 @@
 /* { dg-do compile } */
 
 #include 
-#include <stdint-gcc.h>
+#include <stdint.h>
 
 char *
 test (char *string, size_t maxlen)



Re: [PATCH v2 06/16] Change function versions to be implicitly ordered.

2025-02-18 Thread Richard Sandiford

Alfie Richards  writes:
> On 18/02/2025 12:11, Richard Sandiford wrote:
>> Alfie Richards  writes:
>>> This changes function version structures to maintain the default version
>>> as the first declaration in the linked data structures by giving priority
>>> to the set containing the default when constructing the structure.
>>>
>>> This allows for removing logic for moving the default to the first
>>> position which was duplicated across target specific code and enables
>>> easier reasoning about function sets when checking for a default.
>>>
>>> gcc/ChangeLog:
>>>
>>> * cgraph.cc (cgraph_node::record_function_versions): Update to
>>> implicitly keep default first.
>>> * config/aarch64/aarch64.cc (aarch64_get_function_versions_dispatcher):
>>> Remove reordering.
>>> * config/i386/i386-features.cc (ix86_get_function_versions_dispatcher):
>>> Remove reordering.
>>> * config/riscv/riscv.cc (riscv_get_function_versions_dispatcher):
>>> Remove reordering.
>>> * config/rs6000/rs6000.cc (rs6000_get_function_versions_dispatcher):
>>> Remove reordering.
>>> ---
>>>   gcc/cgraph.cc| 27 -
>>>   gcc/config/aarch64/aarch64.cc| 37 +++-
>>>   gcc/config/i386/i386-features.cc | 33 -
>>>   gcc/config/riscv/riscv.cc| 41 +++-
>>>   gcc/config/rs6000/rs6000.cc  | 35 +--
>>>   5 files changed, 49 insertions(+), 124 deletions(-)
>>>
>>> diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
>>> index d0b19ad850e..bf6b43d00db 100644
>>> --- a/gcc/cgraph.cc
>>> +++ b/gcc/cgraph.cc
>>> @@ -247,7 +247,9 @@ cgraph_node::record_function_versions (tree decl1, tree 
>>> decl2)
>>> decl1_v = decl1_node->function_version ();
>>> decl2_v = decl2_node->function_version ();
>>>   
>>> -  if (decl1_v != NULL && decl2_v != NULL)
>>> +  /* If the nodes are already linked, skip.  */
>>> +  if ((decl1_v != NULL && (decl1_v->next || decl1_v->prev))
>>> +  && (decl2_v != NULL && (decl2_v->next || decl2_v->prev)))
>>>   return;
>>>   
>>> if (decl1_v == NULL)
>>> @@ -256,18 +258,31 @@ cgraph_node::record_function_versions (tree decl1, 
>>> tree decl2)
>>> if (decl2_v == NULL)
>>>   decl2_v = decl2_node->insert_new_function_version ();
>>>   
>>> -  /* Chain decl2_v and decl1_v.  All semantically identical versions
>>> - will be chained together.  */
>>> +  gcc_assert (decl1_v);
>>> +  gcc_assert (decl2_v);
>>>   
>>> before = decl1_v;
>>> after = decl2_v;
>>>   
>>> +  /* Go to first after node.  */
>>> +  while (after->prev != NULL)
>>> +after = after->prev;
>>> +
>>> +  while (before->prev != NULL)
>>> +before = before->prev;
>>> +
>>> +  /* Potentially swap the nodes to maintain the default always being in the
>>> + first position.  */
>>> +  if (before->next
>>> +  ? !is_function_default_version (before->this_node->decl)
>>> +  : is_function_default_version (after->this_node->decl))
>>> +std::swap (before, after);
>>> +
>>> +  /* Go to last node of before.  */
>>> while (before->next != NULL)
>>>   before = before->next;
>>>   
>>> -  while (after->prev != NULL)
>>> -after= after->prev;
>>> -
>>> +  /* Chain decl2_v and decl1_v.  */
>> I think this can be simplified to:
>>
>>before = decl1_v;
>>after = decl2_v;
>>
>>/* Potentially swap the nodes to maintain the default always being in the
>>   first position.  */
>>if (before->prev || before->next
>>? is_function_default_version (after->this_node->decl)
>>: !is_function_default_version (before->this_node->decl))
>>  std::swap (before, after);
>>
>>while (before->next != NULL)
>>  before = before->next;
>>   
>>while (after->prev != NULL)
>>  after = after->prev;
>>
>> That is, if one decl is linked (and so the other is not), we only want
>> to put the other decl first if it is the default.
> I see your point here, which I think relies on the assumption that functions
> get added to the structure one by one rather than in a fractal pattern.
> This assumption is already used here subtly so that makes sense.
>
> I added this logic to at least try to make this work in a slightly more
> general case, as to tell if a structure contains the default we should check
> the first element of that structure, but it is unnecessary given that
> knowledge.
>
> I would prefer to change this to make that more explicit and change this to
> be "add_decl_to_version_into" taking a cgraph_function_version_info for the
> existing structure and a decl for the version to add to make this explicit.
> Would that change work for you?

Yeah, sounds good to me.  I agree that it would be better than having
to maintain symmetry, and it should make the interface a bit simpler.
Honza should have the final say though.

Thanks,
Richard

Re: [PATCH] RISC-V: Fix some dynamic LMUL costing.

2025-02-18 Thread Robin Dapp

As just agreed in the patchwork meeting let's defer that to stage 1
unless somebody really has a need for better dynamic LMUL right now.

-- 
Regards
 Robin

[PATCH] rx: avoid adding setpsw for rx_cmpstrn when len is const

2025-02-18 Thread Keith Packard

We can avoid the setpsw instructions when len is a known constant.
When len is zero, the insn result is zero. When len is non-zero,
the scmpu instructions will set the flags correctly.

Signed-off-by: Keith Packard 
---
 gcc/config/rx/rx.md | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rx/rx.md b/gcc/config/rx/rx.md
index edb2c96603f..8c7974d69a5 100644
--- a/gcc/config/rx/rx.md
+++ b/gcc/config/rx/rx.md
@@ -2545,6 +2545,16 @@ (define_expand "cmpstrnsi"
(match_operand:SI4 "immediate_operand")] ;; 
Known Align
   "rx_allow_string_insns"
   {
+bool const_len = CONST_INT_P(operands[3]);
+if (const_len)
+{
+  if (INTVAL(operands[3]) == 0)
+  {
+emit_move_insn (operands[0], operands[3]);
+DONE;
+  }
+}
+
 rtx str1 = gen_rtx_REG (SImode, 1);
 rtx str2 = gen_rtx_REG (SImode, 2);
 rtx len  = gen_rtx_REG (SImode, 3);
@@ -2553,6 +2563,11 @@ (define_expand "cmpstrnsi"
 emit_move_insn (str2, force_operand (XEXP (operands[2], 0), NULL_RTX));
 emit_move_insn (len, operands[3]);
 
+/* Set flags in case len is zero */
+if (!const_len) {
+  emit_insn (gen_setpsw (GEN_INT('C')));
+  emit_insn (gen_setpsw (GEN_INT('Z')));
+}
 emit_insn (gen_rx_cmpstrn (operands[0], operands[1], operands[2]));
 DONE;
   }
@@ -2590,9 +2605,7 @@ (define_insn "rx_cmpstrn"
(clobber (reg:SI 3))
(clobber (reg:CC CC_REG))]
   "rx_allow_string_insns"
-  "setpsw  z   ; Set flags in case len is zero
-   setpsw  c
-   scmpu   ; Perform the string comparison
+  "scmpu   ; Perform the string comparison
mov #-1, %0  ; Set up -1 result (which cannot be created
 ; by the SC insn)
bnc?+   ; If Carry is not set skip over
-- 
2.47.2
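
For readers who want the three cases side by side, here is a rough model of
the choice the expander makes after this patch. It is only a sketch: the
function name plan_cmpstrn and the instruction labels are hypothetical
stand-ins, and nothing here is real RTL or part of the rx backend.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the cmpstrnsi expander after the patch.  The strings
// are labels for illustration only, not real RTL or assembler output.
std::vector<std::string> plan_cmpstrn(bool len_is_const, long len)
{
  // Comparing zero bytes always yields 0, so no string insn is needed.
  if (len_is_const && len == 0)
    return {"move 0 into result"};

  std::vector<std::string> insns;
  if (!len_is_const)
    {
      // The length might be zero at run time, in which case scmpu leaves
      // the flags untouched, so they are preset first.
      insns.push_back("setpsw c");
      insns.push_back("setpsw z");
    }
  // A known non-zero length means scmpu sets the flags itself.
  insns.push_back("scmpu");
  return insns;
}
```

With a constant non-zero length the plan is a single scmpu; only a length
that is unknown at compile time still pays for the two setpsw instructions.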

Re: [PATCH] aarch64: Use generic_armv8_a_prefetch_tune in generic_armv8_a.h

2025-02-18 Thread Soumya AR



> On 18 Feb 2025, at 2:27 PM, Kyrylo Tkachov  wrote:
> 
> 
> 
>> On 18 Feb 2025, at 09:48, Kyrylo Tkachov  wrote:
>> 
>> 
>> 
>>> On 18 Feb 2025, at 09:41, Richard Sandiford  
>>> wrote:
>>> 
>>> Kyrylo Tkachov  writes:
>>>> Hi Soumya
>>>> 
>>>>> On 18 Feb 2025, at 09:12, Soumya AR  wrote:
>>>>> 
>>>>> generic_armv8_a.h defines generic_armv8_a_prefetch_tune but still uses
>>>>> generic_prefetch_tune in generic_armv8_a_tunings.
>>>>> 
>>>>> This patch updates the pointer to generic_armv8_a_prefetch_tune.
>>>>> 
>>>>> This patch was bootstrapped and regtested on aarch64-linux-gnu, no
>>>>> regression.
>>>>> 
>>>>> Ok for GCC 15 now?
>>>> 
>>>> Yes, this looks like a simple oversight.
>>>> Ok to push to master.
>>> 
>>> I suppose the alternative would be to remove generic_armv8_a_prefetch_tune,
>>> since it's (deliberately) identical to generic_prefetch_tune.
>> 
>> Looks like we have one prefetch_tune structure for each of the generic 
>> tunings (generic, generic_armv8_a, generic_armv9_a).
>> For the sake of symmetry it feels a bit better to have them independently 
>> tunable.
>> But as the effects are the same, it may be better to remove it in the 
>> interest of less code.
>> 
> 
> I see Soumya has already pushed her patch. I’m okay with either approach tbh, 
> but if Richard prefers we can remove generic_armv8_a_prefetch_tune in a 
> separate commit.

Yeah, missed Richard’s mail.

Let me know which is preferable, thanks.

Best,
Soumya

> Thanks,
> Kyrill
> 
> 
>> Thanks,
>> Kyrill
>> 
>>> 
>>>> Thanks,
>>>> Kyrill
>>>> 
>>>>> 
>>>>> Signed-off-by: Soumya AR 
>>>>> 
>>>>> gcc/ChangeLog:
>>>>> 
>>>>> * config/aarch64/tuning_models/generic_armv8_a.h: Updated prefetch
>>>>> struct pointer.
>>>>> 
>>>>> ---
>>>>> gcc/config/aarch64/tuning_models/generic_armv8_a.h | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>> 
>>>>> diff --git a/gcc/config/aarch64/tuning_models/generic_armv8_a.h 
>>>>> b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
>>>>> index 35de3f03296..01080cade46 100644
>>>>> --- a/gcc/config/aarch64/tuning_models/generic_armv8_a.h
>>>>> +++ b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
>>>>> @@ -184,7 +184,7 @@ static const struct tune_params 
>>>>> generic_armv8_a_tunings =
>>>>> (AARCH64_EXTRA_TUNE_BASE
>>>>> | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
>>>>> | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */
>>>>> - &generic_prefetch_tune,
>>>>> + &generic_armv8_a_prefetch_tune,
>>>>> AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */
>>>>> AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */
>>>>> };
>>>>> -- 
>>>>> 2.34.1

[committed] pair-fusion: Tweak wording in dump message [PR118320]

2025-02-18 Thread Alex Coplan

As discussed in
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675978.html
this tweaks the dump message added with the fix for PR118320 since it doesn't
just apply to load pairs.

Tested on aarch64-linux-gnu, pushed to trunk.

Alex

gcc/ChangeLog:

PR rtl-optimization/118320
* pair-fusion.cc (pair_fusion_bb_info::fuse_pair): Tweak wording in dump
message when punting on invalid use arrays.
diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc
index 5708d0f3b67..72e64246534 100644
--- a/gcc/pair-fusion.cc
+++ b/gcc/pair-fusion.cc
@@ -1742,7 +1742,7 @@ pair_fusion_bb_info::fuse_pair (bool load_p,
 {
   if (dump_file)
fprintf (dump_file,
-"  load pair: i%d and i%d use different definiitions of"
+"  rejecting pair: i%d and i%d use different definiitions of"
 " the same register\n",
 insns[0]->uid (), insns[1]->uid ());
   return false;

Re: [PATCH] pair-fusion: A couple of fixes for sp updates [PR118429]

2025-02-18 Thread Richard Sandiford

Alex Coplan  writes:
> On 17/02/2025 16:15, Richard Sandiford wrote:
>> Alex Coplan  writes:
>> >> @@ -588,6 +590,10 @@ latest_hazard_before (insn_info *insn, rtx *ignore,
>> >>&& find_reg_note (insn->rtl (), REG_EH_REGION, NULL_RTX))
>> >>  return insn->prev_nondebug_insn ();
>> >>  
>> >> +  if (!is_load_store
>> >> +  && accesses_include_memory (insn->defs ()))
>> >> +return insn->prev_nondebug_insn ();
>> >
>> > This seems like it might be a little too restrictive.  I agree that it's
>> > a nice and simple way of solving the problem, but wouldn't it be enough
>> > to prevent moving such accesses (stack deallocations) above the latest
>> > preceding def or use of mem?  Certainly we don't want to start
>> > attempting alias analysis here, but is the above suggestion not a happy
>> > middle ground (between a simple solution and not overly restricting
>> > optimisation)?
>> 
>> Would it help in practice though?  Although it is possible to combine
>> a deallocation with preceding stores, that only happens for dead code,
>> in which case the better optimisation is to delete the stores.
>> If we're combining with loads, the loads would normally be restoring
>> registers for the caller, in which case the loads could be moved
>> forward to the deallocation (since nothing would use or clobber
>> the loaded values between the two points).
>
> I see.  I must admit that I don't immediately see why this can only
> occur with dead stores, [...]

I was thinking of the post-increment case, but yeah, I suppose technically
there could be pre-increment cases.  It seems very unlikely in practice,
given how we manage the frame, but I agree that the case for not trying
harder is weaker than I'd initially assumed.

Thanks,
Richard

Re: [PATCH v2 06/16] Change function versions to be implicitly ordered.

2025-02-18 Thread Alfie Richards


On 18/02/2025 12:11, Richard Sandiford wrote:

Alfie Richards  writes:

This changes function version structures to maintain the default version
as the first declaration in the linked data structures by giving priority
to the set containing the default when constructing the structure.

This allows for removing logic for moving the default to the first
position which was duplicated across target specific code and enables
easier reasoning about function sets when checking for a default.

gcc/ChangeLog:

* cgraph.cc (cgraph_node::record_function_versions): Update to
implicitly keep default first.
* config/aarch64/aarch64.cc (aarch64_get_function_versions_dispatcher):
Remove reordering.
* config/i386/i386-features.cc (ix86_get_function_versions_dispatcher):
Remove reordering.
* config/riscv/riscv.cc (riscv_get_function_versions_dispatcher):
Remove reordering.
* config/rs6000/rs6000.cc (rs6000_get_function_versions_dispatcher):
Remove reordering.
---
  gcc/cgraph.cc| 27 -
  gcc/config/aarch64/aarch64.cc| 37 +++-
  gcc/config/i386/i386-features.cc | 33 -
  gcc/config/riscv/riscv.cc| 41 +++-
  gcc/config/rs6000/rs6000.cc  | 35 +--
  5 files changed, 49 insertions(+), 124 deletions(-)

diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index d0b19ad850e..bf6b43d00db 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -247,7 +247,9 @@ cgraph_node::record_function_versions (tree decl1, tree 
decl2)
decl1_v = decl1_node->function_version ();
decl2_v = decl2_node->function_version ();
  
-  if (decl1_v != NULL && decl2_v != NULL)

+  /* If the nodes are already linked, skip.  */
+  if ((decl1_v != NULL && (decl1_v->next || decl1_v->prev))
+  && (decl2_v != NULL && (decl2_v->next || decl2_v->prev)))
  return;
  
if (decl1_v == NULL)

@@ -256,18 +258,31 @@ cgraph_node::record_function_versions (tree decl1, tree 
decl2)
if (decl2_v == NULL)
  decl2_v = decl2_node->insert_new_function_version ();
  
-  /* Chain decl2_v and decl1_v.  All semantically identical versions

- will be chained together.  */
+  gcc_assert (decl1_v);
+  gcc_assert (decl2_v);
  
before = decl1_v;

after = decl2_v;
  
+  /* Go to first after node.  */

+  while (after->prev != NULL)
+after = after->prev;
+
+  while (before->prev != NULL)
+before = before->prev;
+
+  /* Potentially swap the nodes to maintain the default always being in the
+ first position.  */
+  if (before->next
+  ? !is_function_default_version (before->this_node->decl)
+  : is_function_default_version (after->this_node->decl))
+std::swap (before, after);
+
+  /* Go to last node of before.  */
while (before->next != NULL)
  before = before->next;
  
-  while (after->prev != NULL)

-after= after->prev;
-
+  /* Chain decl2_v and decl1_v.  */

I think this can be simplified to:

   before = decl1_v;
   after = decl2_v;

   /* Potentially swap the nodes to maintain the default always being in the
  first position.  */
   if (before->prev || before->next
   ? is_function_default_version (after->this_node->decl)
   : !is_function_default_version (before->this_node->decl))
 std::swap (before, after);

   while (before->next != NULL)
 before = before->next;
  
   while (after->prev != NULL)

 after = after->prev;

That is, if one decl is linked (and so the other is not), we only want
to put the other decl first if it is the default.

I see your point here, which I think relies on the assumption that functions
get added to the structure one by one rather than in a fractal pattern.
This assumption is already used here subtly so that makes sense.

I added this logic to at least try to make this work in a slightly more
general case, as to tell if a structure contains the default we should check
the first element of that structure, but it is unnecessary given that
knowledge.

I would prefer to change this to make that more explicit and change this to
be "add_decl_to_version_into" taking a cgraph_function_version_info for the
existing structure and a decl for the version to add to make this explicit.
Would that change work for you?

[...]
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 9bf7713139f..e5aa99a4965 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -13726,7 +13726,6 @@ riscv_get_function_versions_dispatcher (void *decl)
struct cgraph_node *node = NULL;
struct cgraph_node *default_node = NULL;
struct cgraph_function_version_info *node_v = NULL;
-  struct cgraph_function_version_info *first_v = NULL;
  
tree dispatch_decl = NULL;
  
@@ -13743,41 +13742,19 @@ riscv_get_function_versions_dispatcher (void *decl)

if (node_v->dispatcher_resolver != NULL)
  return node_v->dispatcher_resolver;

Re: [PATCH] builtins: Ensure sin and cos properly set errno when INFINITY is passed [PR80042]

2025-02-18 Thread Richard Biener

On Tue, Feb 18, 2025 at 1:54 PM Peter0x44  wrote:
>
> 18 Feb 2025 8:51:16 am Richard Biener :
>
> > On Tue, Feb 18, 2025 at 1:21 AM Sam James  wrote:
> >>
> >> Peter Damianov  writes:
> >>
> >>> POSIX says that sin and cos should set errno to EDOM when infinity is
> >>> passed to them. Make sure this is accounted for in builtins.def, and add
> >>> tests.
> >>>
> >>> gcc/
> >>>   PR middle-end/80042
> >>>   * builtins.def: (sin|cos)(f|l) can set errno.
> >>> gcc/testsuite/
> >>>   * gcc.dg/pr80042.c: New testcase.
> >>> ---
> >>> gcc/builtins.def   | 20 +-
> >>> gcc/testsuite/gcc.dg/pr80042.c | 71
> >>> ++
> >>> 2 files changed, 82 insertions(+), 9 deletions(-)
> >>> create mode 100644 gcc/testsuite/gcc.dg/pr80042.c
> >>>
> >>> [...]
> >>> diff --git a/gcc/testsuite/gcc.dg/pr80042.c
> >>> b/gcc/testsuite/gcc.dg/pr80042.c
> >>> new file mode 100644
> >>> index 000..cc578ae67e2
> >>> --- /dev/null
> >>> +++ b/gcc/testsuite/gcc.dg/pr80042.c
> >>> @@ -0,0 +1,71 @@
> >>> +/* dg-do run */
> >>> +/* dg-options "-O2 -lm" */
> >>
> >> These two lines are missing {}. Please double check the logs from your
> >> testsuite run to make sure newly added/changed tests are executed (and
> >> in the way you expect).
> >
> > > This test will also FAIL on *BSD IIRC as that doesn't set errno for any
> > > math functions.
>
> So what do you suggest I do about it? Drop the test, or only enable it
> for certain known good targets?
> I don't use BSD so cannot test it.

Good question.  It's also that old glibc did not set errno here.

> >
> > I'll note GCC models sincos as cexpi which does not set errno, and will
> > eventually expand that to sincos or cexp.  It does that without any
> > restriction on -fno-math-errno.
>
> Is this a problem? Would I need to disable expansion to cexp with
> -fmath-errno to make this work?

I think that the code might assume sin()/cos() is always CONST/PURE
and that for "POSIX-y correctness" we'd have to guard the transform
with -fno-math-errno.

> > I'll also note the C standard does not document any domain error on
> > +-Inf arguments.
> > Instead it documents a range error for sin(x) and nonzero x too close
> > to zero.
>
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/sin.html
> POSIX does specify it should be a domain error, but C itself doesn't seem
> to say anything regarding it other than basically "implementations are
> allowed to invent errors for this case".

So what's the point of your patch?  That GCC does not assume sin/cos
will not clobber errno?  Maybe the testcase can be rewritten to consider
that?  Like check that we did not fold the != EDOM checks at compile-time
instead of hard-requiring the library to set that error?

Richard.
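
To make the portability concern concrete, here is a small run-time probe that
tolerates both library behaviours mentioned in this thread (errno set to EDOM,
or errno left alone). It is only a sketch under assumptions: the names
SinProbe, probe_sin_at_infinity and probe_is_acceptable are hypothetical, it
is not the testcase from the patch, and it does not perform the compile-time
"was the check folded" test suggested above, which would need a dump scan.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>
#include <limits>

// Result of one call to sin: the returned value and errno afterwards.
struct SinProbe
{
  double value;
  int err;
};

// Call sin on +Inf through a volatile so the call is not constant-folded.
SinProbe probe_sin_at_infinity()
{
  volatile double x = std::numeric_limits<double>::infinity();
  errno = 0;
  double r = std::sin(x);
  int e = errno;
  return {r, e};
}

// Accept a NaN result with errno either untouched or set to EDOM, so the
// check does not hard-require the library to report the domain error.
bool probe_is_acceptable(const SinProbe &p)
{
  return std::isnan(p.value) && (p.err == 0 || p.err == EDOM);
}
```

The point of the sketch is that errno is read back after the call rather than
assumed unchanged, while the pass condition stays loose enough for libraries
that never set errno for math functions.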

> >
> > Richard.
> >
> >>
> >>> [...]

RE: [PATCH v1] Vect: Fix ICE when get DImode from get_related_vectype_for_scalar_type [PR116351]

2025-02-18 Thread Li, Pan2

Thanks Richard.

> so the obvious fix would be to add
>
> if (!VECTOR_MODE_P (loop_vinfo->vector_mode))
>return false;

I also thought of that, but it seemed too "easy", so I dropped it.

> Ah, it needs -march=rv64imd_xsfvcp.  

It can also be reproduced with "-march=rv64imd_zve32x -mrvv-vector-bits=zvl";
sorry, I forgot to mention this.

> The error is probably that vect_verify_loop_lens does not do anything
> to ensure the checks are done on a relevant mode.  With the suggested
> added check above this then becomes a missed optimization rather
> than an ICE.  But it might fall apart if there's not one load/store len mode
> to consider?

I see. I am afraid it may fall apart: consider RVVM1DImode with rv64gc_zve32x.
riscv_vector_mode_supported_any_target_p always returns true, so we may end up
with RVVM1DImode here even though zve32x cannot support DI as the element size.

I will try to reproduce this after this ICE fix.

Pan
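
As a toy illustration of the guard quoted above, the following models how an
early return turns the failing assert into a rejected (and therefore merely
unoptimized) loop. Everything here is a hypothetical stand-in: ModeKind,
LoopInfo and verify_loop_lens only mimic the shape of machine_mode,
loop_vec_info and vect_verify_loop_lens, and the real function does more work
after this point.

```cpp
#include <cassert>

// Hypothetical stand-in for "is this machine_mode a vector mode?".
enum class ModeKind { ScalarInt, Vector };

// Hypothetical stand-in for the parts of loop_vec_info used here.
struct LoopInfo
{
  ModeKind vector_mode;  // models loop_vinfo->vector_mode
  int recorded_lens;     // models the number of entries in LOOP_VINFO_LENS
};

// Returns true only if length-controlled partial vectors look usable.
bool verify_loop_lens(const LoopInfo &loop)
{
  // Nothing recorded: nothing to verify.
  if (loop.recorded_lens == 0)
    return false;

  // The guard under discussion: a scalar mode such as DImode makes the
  // function give up instead of reaching code that asserts a vector mode.
  if (loop.vector_mode != ModeKind::Vector)
    return false;

  return true;
}
```

In this model a scalar vector_mode simply makes the check fail, which matches
the "missed optimization rather than an ICE" outcome described in the quoted
text; it says nothing about the multi-mode case raised afterwards.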


-----Original Message-----
From: Richard Biener  
Sent: Tuesday, February 18, 2025 5:36 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Vect: Fix ICE when get DImode from 
get_related_vectype_for_scalar_type [PR116351]

On Tue, Feb 18, 2025 at 10:12 AM Richard Biener
 wrote:
>
> On Tue, Feb 18, 2025 at 9:40 AM Li, Pan2  wrote:
> >
> > Hi Richard,
> >
> > After some more investigation, the sample code never hit one vectorizable_* 
> > routines which may check the loop_vinfo->vector_mode,
> > and then the loop_vinfo->vector_mode == DImode will hit the 
> > vect_verify_loop_lens and trigger the assert VECTOR_MODE_P, detail
> > flow as below.
> >
> > vect_analyze_loop_2
> >  |- vect_pattern_recog // Hit over-widening pattern and set 
> > loop_vinfo->vector_mode to DImode
> >  |- ...
> >  |- vect_analyze_loop_operations
> >|- (gdb) p stmt_info->def_type
> >|- $1 = vect_reduction_def
> >|- (gdb) p stmt_info->slp_type
> >|- $2 = pure_slp
> >|- vectorizable_lc_phi // Not Hit
> >|- vectorizable_induction  // Not Hit
> >|- vectorizable_reduction  // Not Hit
> >|- vectorizable_recurr // Not Hit
> >|- vectorizable_live_operation  // Not Hit
> >|- vect_analyze_stmt
> >  |- (gdb) p stmt_info->relevant
> >  |- $3 = vect_unused_in_scope
> >  |- (gdb) p stmt_info->live
> >  |- $4 = false
> >  |- (gdb) p pattern_stmt_info
> >  |- $5 = (stmt_vec_info) 0x0
> >  |- return opt_result::success ();
> >  OR
> >  |- PURE_SLP_STMT (stmt_info) && !node then dump "handled only by SLP 
> > analysis\n"
> >|- Early return opt_result::success ();
> >  |- vectorizable_load/store/call_convert/... // Not Hit
> >|- LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P && !LOOP_VINFO_MASKS 
> > (loop_vinfo).is_empty ()
> >  |- vect_verify_loop_lens (loop_vinfo)
> >|- assert (VECTOR_MODE_P (loop_vinfo->vector_mode); // Hit assert 
> > result in ICE
> >
> > I am a little hesitant by two options here.
> >
> > 1. shall we add some condition and dump log here to make the 
> > vect_analyze_loop_2 failure when loop_vinfo->vector_mode is not supported 
> > vector mode by target.
> > 2. it should not be LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P here? Then we need 
> > to find out where set the partial vector to true.
> >
> > Is there any suggestion here?
>
> static bool
> vect_verify_loop_lens (loop_vec_info loop_vinfo)
> {
>   if (LOOP_VINFO_LENS (loop_vinfo).is_empty ())
> return false;
>
>   machine_mode len_load_mode, len_store_mode;
>   if (!get_len_load_store_mode (loop_vinfo->vector_mode, true)
>  .exists (&len_load_mode))
> return false;
>
> so the obvious fix would be to add
>
>   if (!VECTOR_MODE_P (loop_vinfo->vector_mode))
> return false;
>
> here?  But then I wonder how we got to a DImode vector_mode and record
> a loop len
> in the first place.  I could imagine we first end up with DImode but
> other stmts using
> a vector mode and we record a len for those.  But then the above
> get_len_load_store_mode
> on ->vector_mode seems to assume that all modes we need a len for are
> "compatible" with ->vector_mode so I assume recording a LEN would check that.
>
> I can't reproduce the ICE with a cross on trunk btw.

Ah, it needs -march=rv64imd_xsfvcp.  So we indeed call vect_record_loop_len
with

(gdb) p debug_tree (vectype)
 
unit-size 
align:16 warn_if_not_align:0 symtab:0 alias-set 2
canonical-type 0x77017690 precision:16 min  max 
pointer_to_this >
RVVM2HI
(gdb) p loop_vinfo->vector_mode
$2 = E_DImode

from vectorizable_operation and ->vector_mode is set via
vect_recog_over_widening_pattern which commits to a DImode
vector type ->vector_mode prematurely.

The error is probably that vect_verify_loop_lens does not do anything
to ensure the checks are done on a relevant mode.  With the suggested
added check above this then becomes a missed optimization rather
than an ICE.  But it might fall

Re: [PATCH v2 06/16] Change function versions to be implicitly ordered.

2025-02-18 Thread Richard Sandiford

Alfie Richards  writes:
> This changes function version structures to maintain the default version
> as the first declaration in the linked data structures by giving priority
> to the set containing the default when constructing the structure.
>
> This allows for removing logic for moving the default to the first
> position which was duplicated across target specific code and enables
> easier reasoning about function sets when checking for a default.
>
> gcc/ChangeLog:
>
>   * cgraph.cc (cgraph_node::record_function_versions): Update to
>   implicitly keep default first.
>   * config/aarch64/aarch64.cc (aarch64_get_function_versions_dispatcher):
>   Remove reordering.
>   * config/i386/i386-features.cc (ix86_get_function_versions_dispatcher):
>   Remove reordering.
>   * config/riscv/riscv.cc (riscv_get_function_versions_dispatcher):
>   Remove reordering.
>   * config/rs6000/rs6000.cc (rs6000_get_function_versions_dispatcher):
>   Remove reordering.
> ---
>  gcc/cgraph.cc| 27 -
>  gcc/config/aarch64/aarch64.cc| 37 +++-
>  gcc/config/i386/i386-features.cc | 33 -
>  gcc/config/riscv/riscv.cc| 41 +++-
>  gcc/config/rs6000/rs6000.cc  | 35 +--
>  5 files changed, 49 insertions(+), 124 deletions(-)
>
> diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
> index d0b19ad850e..bf6b43d00db 100644
> --- a/gcc/cgraph.cc
> +++ b/gcc/cgraph.cc
> @@ -247,7 +247,9 @@ cgraph_node::record_function_versions (tree decl1, tree 
> decl2)
>decl1_v = decl1_node->function_version ();
>decl2_v = decl2_node->function_version ();
>  
> -  if (decl1_v != NULL && decl2_v != NULL)
> +  /* If the nodes are already linked, skip.  */
> +  if ((decl1_v != NULL && (decl1_v->next || decl1_v->prev))
> +  && (decl2_v != NULL && (decl2_v->next || decl2_v->prev)))
>  return;
>  
>if (decl1_v == NULL)
> @@ -256,18 +258,31 @@ cgraph_node::record_function_versions (tree decl1, tree 
> decl2)
>if (decl2_v == NULL)
>  decl2_v = decl2_node->insert_new_function_version ();
>  
> -  /* Chain decl2_v and decl1_v.  All semantically identical versions
> - will be chained together.  */
> +  gcc_assert (decl1_v);
> +  gcc_assert (decl2_v);
>  
>before = decl1_v;
>after = decl2_v;
>  
> +  /* Go to first after node.  */
> +  while (after->prev != NULL)
> +after = after->prev;
> +
> +  while (before->prev != NULL)
> +before = before->prev;
> +
> +  /* Potentially swap the nodes to maintain the default always being in the
> + first position.  */
> +  if (before->next
> +  ? !is_function_default_version (before->this_node->decl)
> +  : is_function_default_version (after->this_node->decl))
> +std::swap (before, after);
> +
> +  /* Go to last node of before.  */
>while (before->next != NULL)
>  before = before->next;
>  
> -  while (after->prev != NULL)
> -after= after->prev;
> -
> +  /* Chain decl2_v and decl1_v.  */

I think this can be simplified to:

  before = decl1_v;
  after = decl2_v;

  /* Potentially swap the nodes to maintain the default always being in the
 first position.  */
  if (before->prev || before->next
  ? is_function_default_version (after->this_node->decl)
  : !is_function_default_version (before->this_node->decl))
std::swap (before, after);

  while (before->next != NULL)
before = before->next;
 
  while (after->prev != NULL)
after = after->prev;

That is, if one decl is linked (and so the other is not), we only want
to put the other decl first if it is the default.
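
The simplified logic can be tried out on a toy doubly linked list. This is
only a sketch: the Version struct and the record_versions and head_of helpers
are hypothetical stand-ins for cgraph_function_version_info and the real
function, and the final two linking assignments are assumed, since that part
of the function is not shown in the hunk.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for cgraph_function_version_info.
struct Version
{
  bool is_default;
  Version *prev;
  Version *next;
};

// Link V1 and V2 into one chain, keeping a default version at the head.
// Assumes versions are added one at a time, so at most one side is linked.
void record_versions(Version *v1, Version *v2)
{
  // If both nodes are already linked, there is nothing to do.
  if ((v1->prev || v1->next) && (v2->prev || v2->next))
    return;

  Version *before = v1;
  Version *after = v2;

  // If BEFORE is linked, the unlinked AFTER goes first only when it is the
  // default.  If BEFORE is unlinked, it stays first only when it is the
  // default.
  if ((before->prev || before->next) ? after->is_default
                                     : !before->is_default)
    std::swap(before, after);

  while (before->next != nullptr)
    before = before->next;
  while (after->prev != nullptr)
    after = after->prev;

  // Assumed linking step: append AFTER's chain to BEFORE's chain.
  before->next = after;
  after->prev = before;
}

// Walk back to the first node of the chain containing V.
Version *head_of(Version *v)
{
  while (v->prev != nullptr)
    v = v->prev;
  return v;
}
```

Recording two non-default versions and then the default ends with the default
at the head, and so does recording the default first and appending the others
afterwards.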

> [...]
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 9bf7713139f..e5aa99a4965 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -13726,7 +13726,6 @@ riscv_get_function_versions_dispatcher (void *decl)
>struct cgraph_node *node = NULL;
>struct cgraph_node *default_node = NULL;
>struct cgraph_function_version_info *node_v = NULL;
> -  struct cgraph_function_version_info *first_v = NULL;
>  
>tree dispatch_decl = NULL;
>  
> @@ -13743,41 +13742,19 @@ riscv_get_function_versions_dispatcher (void *decl)
>if (node_v->dispatcher_resolver != NULL)
>  return node_v->dispatcher_resolver;
>  
> -  /* Find the default version and make it the first node.  */
> -  first_v = node_v;
> -  /* Go to the beginning of the chain.  */
> -  while (first_v->prev != NULL)
> -first_v = first_v->prev;
> -  default_version_info = first_v;
> -
> -  while (default_version_info != NULL)
> -{
> -  struct riscv_feature_bits res;
> -  int priority; /* Unused.  */
> -  parse_features_for_version (default_version_info->this_node->decl,
> -   res, priority);
> -  if (res.length == 0)
> - break;
> -  default_version_info = default_version_info->next;
> -}
> +  /* The default node is alw

[PATCH] c++: Fix checking assert upon invalid class definition [PR116740]

2025-02-18 Thread Simon Martin

A checking assert triggers upon the following invalid code since
GCC 11:

=== cut here ===
class { a (struct b;
} struct b
=== cut here ===

The problem is that during error recovery, we call
set_identifier_type_value_with_scope for B in the global namespace, and
the checking assert added via r11-7228-g8f93e1b892850b fails.

This patch relaxes that assert to not fail if we've seen a parser error
(it is a generalization of another fix done to that checking assert via
r11-7266-g24bf79f1798ad1).

Successfully tested on x86_64-pc-linux-gnu.

PR c++/116740

gcc/cp/ChangeLog:

* name-lookup.cc (set_identifier_type_value_with_scope): Don't
fail assert with ill-formed input.

gcc/testsuite/ChangeLog:

* g++.dg/parse/crash80.C: New test.

---
 gcc/cp/name-lookup.cc| 6 ++
 gcc/testsuite/g++.dg/parse/crash80.C | 7 +++
 2 files changed, 9 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/parse/crash80.C

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index d1abb205bc7..742e5d289dc 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -5101,10 +5101,8 @@ set_identifier_type_value_with_scope (tree id, tree 
decl, cp_binding_level *b)
   if (b->kind == sk_namespace)
 /* At namespace scope we should not see an identifier type value.  */
 gcc_checking_assert (!REAL_IDENTIFIER_TYPE_VALUE (id)
-/* We could be pushing a friend underneath a template
-   parm (ill-formed).  */
-|| (TEMPLATE_PARM_P
-(TYPE_NAME (REAL_IDENTIFIER_TYPE_VALUE (id)))));
+/* But we might end up here with ill-formed input.  */
+|| seen_error ());
   else
 {
   /* Push the current type value, so we can restore it later  */
diff --git a/gcc/testsuite/g++.dg/parse/crash80.C 
b/gcc/testsuite/g++.dg/parse/crash80.C
new file mode 100644
index 000..cd9216adf5c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/crash80.C
@@ -0,0 +1,7 @@
+// PR c++/116740
+// { dg-do "compile" }
+
+class K {
+  int a(struct b; // { dg-error "expected '\\)'" }
+};
+struct b {};
-- 
2.44.0
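
The shape of the relaxed condition can be shown with a toy predicate. This is
only a sketch: Diagnostics and namespace_binding_ok are hypothetical names
that mimic seen_error () and the asserted expression, and nothing here touches
the real name-lookup code.

```cpp
#include <cassert>

// Hypothetical stand-in for the compiler's global error state.
struct Diagnostics
{
  int errors;
  bool seen_error() const { return errors > 0; }
};

// Models the asserted condition after the patch: at namespace scope the
// identifier should have no type value, unless an error was already
// reported and we are merely recovering from ill-formed input.
bool namespace_binding_ok(bool id_has_type_value, const Diagnostics &diag)
{
  return !id_has_type_value || diag.seen_error();
}
```

The invariant is still enforced for well-formed input; it is only waived once
at least one error has been diagnosed.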

Re: [PATCH v2 08/16] Add get_clone_versions function.

2025-02-18 Thread Richard Sandiford

Alfie Richards  writes:
> This is a reimplementation of get_target_clone_attr_len,
> get_attr_str, and separate_attrs using string_slice and auto_vec to make
> memory management and use simpler.
>
> gcc/c-family/ChangeLog:
>
>   * c-attribs.cc (handle_target_clones_attribute): Change to use
>   get_clone_versions.
>
> gcc/ChangeLog:
>
>   * tree.cc (get_clone_versions): New function.
>   (get_clone_attr_versions): New function.
>   * tree.h (get_clone_versions): New function.
>   (get_clone_attr_versions): New function.

OK for GCC 16, thanks.

Richard
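
For readers without the string_slice patch at hand, the splitting that
get_clone_attr_versions performs can be approximated with std::string. This
is only a sketch under assumptions: strip_blanks and split_clone_versions are
hypothetical names, the separator is passed in rather than taken from
TARGET_CLONES_ATTR_SEPARATOR, and the exact whitespace handling of
string_slice::strip is not reproduced.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Remove leading and trailing spaces and tabs.
static std::string strip_blanks(const std::string &s)
{
  const char *blanks = " \t";
  std::string::size_type first = s.find_first_not_of(blanks);
  if (first == std::string::npos)
    return "";
  std::string::size_type last = s.find_last_not_of(blanks);
  return s.substr(first, last - first + 1);
}

// Split every attribute argument on SEP, strip each piece, and collect the
// pieces in order.  *DEFAULT_COUNT is incremented for each "default" piece
// when DEFAULT_COUNT is non-null.
std::vector<std::string>
split_clone_versions(const std::vector<std::string> &args, char sep,
                     int *default_count)
{
  std::vector<std::string> versions;
  for (const std::string &arg : args)
    {
      std::string::size_type start = 0;
      while (true)
        {
          std::string::size_type end = arg.find(sep, start);
          std::string::size_type len
            = (end == std::string::npos) ? std::string::npos : end - start;
          std::string version = strip_blanks(arg.substr(start, len));
          if (version == "default" && default_count)
            ++*default_count;
          versions.push_back(version);
          if (end == std::string::npos)
            break;
          start = end + 1;
        }
    }
  return versions;
}
```

Returning the container by value, as the patch does with auto_vec, lets
callers ask for the number of versions directly instead of juggling separate
length and buffer helpers.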

> ---
>  gcc/c-family/c-attribs.cc |  2 +-
>  gcc/tree.cc   | 40 +++
>  gcc/tree.h|  3 +++
>  3 files changed, 44 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
> index f3181e7b57c..642d724f6c6 100644
> --- a/gcc/c-family/c-attribs.cc
> +++ b/gcc/c-family/c-attribs.cc
> @@ -6129,7 +6129,7 @@ handle_target_clones_attribute (tree *node, tree name, 
> tree ARG_UNUSED (args),
>   }
>   }
>  
> -  if (get_target_clone_attr_len (args) == -1)
> +  if (get_clone_attr_versions (args).length () == 1)
>   {
> warning (OPT_Wattributes,
>  "single % attribute is ignored");
> diff --git a/gcc/tree.cc b/gcc/tree.cc
> index 0743ed71c78..83dc9f32f96 100644
> --- a/gcc/tree.cc
> +++ b/gcc/tree.cc
> @@ -15356,6 +15356,46 @@ get_target_clone_attr_len (tree arglist)
>return str_len_sum;
>  }
>  
> +/* Returns an auto_vec of string_slices containing the version strings from
> +   ARGLIST.  DEFAULT_COUNT is incremented for each default version found.  */
> +
> +auto_vec
> +get_clone_attr_versions (const tree arglist, int *default_count)
> +{
> +  gcc_assert (TREE_CODE (arglist) == TREE_LIST);
> +  auto_vec versions;
> +
> +  static const char separator_str[] = {TARGET_CLONES_ATTR_SEPARATOR, 0};
> +  string_slice separators = string_slice (separator_str);
> +
> +  for (tree arg = arglist; arg; arg = TREE_CHAIN (arg))
> +{
> +  string_slice str = string_slice (TREE_STRING_POINTER (TREE_VALUE 
> (arg)));
> +  while (str.is_valid ())
> + {
> +   string_slice attr = string_slice::tokenize (&str, separators);
> +   attr = attr.strip ();
> +   if (attr == "default" && default_count)
> + (*default_count)++;
> +   versions.safe_push (attr);
> + }
> +}
> +  return versions;
> +}
> +
> +/* Returns an auto_vec of string_slices containing the version strings from
> +   the target_clone attribute from DECL.  DEFAULT_COUNT is incremented for 
> each
> +   default version found.  */
> +auto_vec
> +get_clone_versions (const tree decl, int *default_count)
> +{
> +  tree attr = lookup_attribute ("target_clones", DECL_ATTRIBUTES (decl));
> +  if (!attr)
> +return auto_vec ();
> +  tree arglist = TREE_VALUE (attr);
> +  return get_clone_attr_versions (arglist, default_count);
> +}
> +
>  void
>  tree_cc_finalize (void)
>  {
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 21f3cd5525c..70541070c40 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -22,6 +22,7 @@ along with GCC; see the file COPYING3.  If not see
>  
>  #include "tree-core.h"
>  #include "options.h"
> +#include "vec.h"
>  
>  /* Convert a target-independent built-in function code to a combined_fn.  */
>  
> @@ -7035,5 +7036,7 @@ extern unsigned fndecl_dealloc_argno (tree);
>  extern tree get_attr_nonstring_decl (tree, tree * = NULL);
>  
>  extern int get_target_clone_attr_len (tree);
> +auto_vec get_clone_versions (const tree, int * = NULL);
> +auto_vec get_clone_attr_versions (const tree, int * = NULL);
>  
>  #endif  /* GCC_TREE_H  */
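
For readers without the string_slice patches to hand, the tokenize, strip and count loop in get_clone_attr_versions can be sketched in standalone C++17. This is only an illustration under stated assumptions: std::string_view stands in for string_slice, the separator is assumed to be ',' rather than the target's TARGET_CLONES_ATTR_SEPARATOR, and the names strip and split_versions are hypothetical, not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: drop leading and trailing spaces and tabs.
static std::string_view
strip (std::string_view s)
{
  const std::size_t first = s.find_first_not_of (" \t");
  if (first == std::string_view::npos)
    return std::string_view ();
  const std::size_t last = s.find_last_not_of (" \t");
  return s.substr (first, last - first + 1);
}

// Split every argument string on SEP, strip each token, and collect the
// results in order.  If DEFAULT_COUNT is non-null, it is incremented once
// per token equal to "default".  SEP = ',' is an assumption of this sketch.
std::vector<std::string>
split_versions (const std::vector<std::string> &args, char sep = ',',
                int *default_count = nullptr)
{
  std::vector<std::string> versions;
  for (const std::string &arg : args)
    {
      std::string_view rest = arg;
      for (;;)
        {
          const std::size_t pos = rest.find (sep);
          // substr (0, npos) yields the whole remainder for the last token.
          const std::string_view token = strip (rest.substr (0, pos));
          if (token == "default" && default_count)
            ++*default_count;
          versions.emplace_back (token);
          if (pos == std::string_view::npos)
            break;
          rest.remove_prefix (pos + 1);
        }
    }
  return versions;
}
```

The sketch returns owning std::string values so that the result cannot dangle; the patch instead returns slices that point into the attribute's string storage.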

Re: [PATCH] arm: Remove inner 'fix:HF/SF/DF' from fixed-point patterns (PR 117712)

2025-02-18 Thread Christophe Lyon

On Tue, 18 Feb 2025 at 13:49, Richard Earnshaw (lists)
 wrote:
>
> On 18/02/2025 08:37, Christophe Lyon wrote:
> > As discussed in the PR, removing the inner 'fix:HF/SF/DF' fixes the
> > problem, like other targets do.
> >
>
> The double-'fix' idiom was introduced in 
> https://gcc.gnu.org/pipermail/gcc-patches/2003-March/098380.html to address 
> target/5985.  Certainly at the time it seems that FIX had two meanings 
> depending on the mode.  If the target was a floating point mode it did a 
> truncation operation with rounding.  If it was an integer mode it did 
> truncation with unspecified rounding.  But the manual doesn't seem to mention 
> FIX: (at least not now), so I'm wondering if something has been 
> lost somewhere along the line.
>
> Anyway, I'm not sure this is right yet.
>

Well, this adopts the same approach as the fix for PR 117525 (same
problem, but on hppa).
In that PR there's also a mention of a similar problem on Sparc, and
Konstantinos says he is working on a middle-end fix (see comment #9 in
PR117712).

Let's wait for that, then?

Thanks,

Christophe

> R.
>
> > gcc/ChangeLog:
> >
> >   PR rtl-optimization/117712
> >   * config/arm/arm.md (fix_trunchfsi2): Remove inner fix:HF.
> >   (fix_trunchfdi2): Likewise.
> >   (fix_truncsfsi2): Remove inner fix:SF.
> >   (fix_truncdfsi2): Remove inner fix:DF.
> >   * config/arm/vfp.md (truncsisf2_vfp): remove inner fix:SF.
> >   (truncsidf2_vfp): Remove inner fix:DF.
> >   (fixuns_truncsfsi2): Remove inner fix:SF.
> >   (fixuns_truncdfsi2): Remove inner fix:DF.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   PR rtl-optimization/117712
> >   * gcc.target/arm/pr117712-df.c: New test.
> >   * gcc.target/arm/pr117712-hf-di.c: New test.
> >   * gcc.target/arm/pr117712-hf.c: New test.
> >   * gcc.target/arm/pr117712-sf.c: New test.
> > ---
> >  gcc/config/arm/arm.md |  8 
> >  gcc/config/arm/vfp.md |  8 
> >  gcc/testsuite/gcc.target/arm/pr117712-df.c| 10 ++
> >  gcc/testsuite/gcc.target/arm/pr117712-hf-di.c | 10 ++
> >  gcc/testsuite/gcc.target/arm/pr117712-hf.c| 10 ++
> >  gcc/testsuite/gcc.target/arm/pr117712-sf.c| 10 ++
> >  6 files changed, 48 insertions(+), 8 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/arm/pr117712-df.c
> >  create mode 100644 gcc/testsuite/gcc.target/arm/pr117712-hf-di.c
> >  create mode 100644 gcc/testsuite/gcc.target/arm/pr117712-hf.c
> >  create mode 100644 gcc/testsuite/gcc.target/arm/pr117712-sf.c
> >
> > diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> > index 442d86b9329..ed0d0da2e63 100644
> > --- a/gcc/config/arm/arm.md
> > +++ b/gcc/config/arm/arm.md
> > @@ -5477,7 +5477,7 @@ (define_expand "floatsidf2"
> >
> >  (define_expand "fix_trunchfsi2"
> >[(set (match_operand:SI 0 "general_operand")
> > - (fix:SI (fix:HF (match_operand:HF 1 "general_operand"]
> > + (fix:SI (match_operand:HF 1 "general_operand")))]
> >"TARGET_EITHER"
> >"
> >{
> > @@ -5489,7 +5489,7 @@ (define_expand "fix_trunchfsi2"
> >
> >  (define_expand "fix_trunchfdi2"
> >[(set (match_operand:DI 0 "general_operand")
> > - (fix:DI (fix:HF (match_operand:HF 1 "general_operand"]
> > + (fix:DI (match_operand:HF 1 "general_operand")))]
> >"TARGET_EITHER"
> >"
> >{
> > @@ -5501,14 +5501,14 @@ (define_expand "fix_trunchfdi2"
> >
> >  (define_expand "fix_truncsfsi2"
> >[(set (match_operand:SI 0 "s_register_operand")
> > - (fix:SI (fix:SF (match_operand:SF 1 "s_register_operand"]
> > + (fix:SI (match_operand:SF 1 "s_register_operand")))]
> >"TARGET_32BIT && TARGET_HARD_FLOAT"
> >"
> >  ")
> >
> >  (define_expand "fix_truncdfsi2"
> >[(set (match_operand:SI 0 "s_register_operand")
> > - (fix:SI (fix:DF (match_operand:DF 1 "s_register_operand"]
> > + (fix:SI (match_operand:DF 1 "s_register_operand")))]
> >"TARGET_32BIT && TARGET_HARD_FLOAT && !TARGET_VFP_SINGLE"
> >"
> >  ")
> > diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
> > index 379f5f7b3dc..0ef019b1727 100644
> > --- a/gcc/config/arm/vfp.md
> > +++ b/gcc/config/arm/vfp.md
> > @@ -1508,7 +1508,7 @@ (define_insn "truncsfhf2"
> >
> >  (define_insn "*truncsisf2_vfp"
> >[(set (match_operand:SI  0 "s_register_operand" "=t")
> > - (fix:SI (fix:SF (match_operand:SF 1 "s_register_operand" "t"]
> > + (fix:SI (match_operand:SF 1 "s_register_operand" "t")))]
> >"TARGET_32BIT && TARGET_HARD_FLOAT"
> >"vcvt%?.s32.f32\\t%0, %1"
> >[(set_attr "predicable" "yes")
> > @@ -1517,7 +1517,7 @@ (define_insn "*truncsisf2_vfp"
> >
> >  (define_insn "*truncsidf2_vfp"
> >[(set (match_operand:SI  0 "s_register_operand" "=t")
> > - (fix:SI (fix:DF (match_operand:DF 1 "s_register_operand" "w"]
> > + (fix:SI (match_operand:DF 1 "s_register_oper

Re: [PATCH v2 12/16] Refactor FMV name mangling.

2025-02-18 Thread Richard Sandiford

Alfie Richards  writes:
> diff --git a/gcc/attribs.cc b/gcc/attribs.cc
> index b00d9529a8d..d0f37d77098 100644
> --- a/gcc/attribs.cc
> +++ b/gcc/attribs.cc
> [...]
> @@ -1287,6 +1282,33 @@ make_dispatcher_decl (const tree decl)
>DECL_EXTERNAL (func_decl) = 1;
>/* This will be of type IFUNCs have to be externally visible.  */
>TREE_PUBLIC (func_decl) = 1;
> +  TREE_NOTHROW (func_decl) = TREE_NOTHROW (decl);
> +
> +  /* Set the decl name to avoid graph_node re-mangling it.  */
> +  SET_DECL_ASSEMBLER_NAME (func_decl, DECL_ASSEMBLER_NAME (decl));
> +
> +  cgraph_node *node = cgraph_node::get (decl);
> +  gcc_assert (node);
> +  cgraph_function_version_info *node_v = node->function_version ();
> +  gcc_assert (node_v);

Very minor suggestion, but: all callers already have the node to hand
and pass the decl inside it, so perhaps it would make sense to change
make_dispatcher_decl so that it takes the cgraph node instead.

> [...]
> @@ -19894,37 +19894,35 @@ static aarch64_fmv_feature_datum 
> aarch64_fmv_feature_data[] = {
> the extension string is created and stored to INVALID_EXTENSION.  */
>  
>  static enum aarch_parse_opt_result
> -aarch64_parse_fmv_features (const char *str, aarch64_feature_flags 
> *isa_flags,
> +aarch64_parse_fmv_features (string_slice str, aarch64_feature_flags 
> *isa_flags,
>   aarch64_fmv_feature_mask *feature_mask,
>   std::string *invalid_extension)
>  {
>if (feature_mask)
>  *feature_mask = 0ULL;
>  
> -  if (strcmp (str, "default") == 0)
> +  if (str == "default")
>  return AARCH_PARSE_OK;
>  
> -  while (str != NULL && *str != 0)
> +  string_slice str_parse = str;
> +
> +  gcc_assert (str.is_valid ());
> +  while (str_parse.is_valid ())
>  {
> -  const char *ext;
> -  size_t len;
> +  string_slice ext;
>  
> -  ext = strchr (str, '+');
> +  ext = string_slice::tokenize (&str_parse, string_slice ("+"));

Following on from the comment about explicit constructors, it'd be
nice not to need the explicit constructor here.

> -  if (ext != NULL)
> - len = ext - str;
> -  else
> - len = strlen (str);
> +  gcc_assert (ext.is_valid ());
>  
> -  if (len == 0)
> +  if (!ext.is_valid () || ext.empty ())

The assert makes the !ext.is_valid () part redundant.

>   return AARCH_PARSE_MISSING_ARG;
>  
>int num_features = ARRAY_SIZE (aarch64_fmv_feature_data);
>int i;
>for (i = 0; i < num_features; i++)
>   {
> -   if (strlen (aarch64_fmv_feature_data[i].name) == len
> -   && strncmp (aarch64_fmv_feature_data[i].name, str, len) == 0)
> +   if (aarch64_fmv_feature_data[i].name == ext)
>   {
> if (isa_flags)
>   *isa_flags |= aarch64_fmv_feature_data[i].opt_flags;
> [...]
> @@ -19992,7 +19987,7 @@ aarch64_process_target_version_attr (tree args)
>return false;
>  }
>  
> -  const char *str = TREE_STRING_POINTER (args);
> +  string_slice str = string_slice (TREE_STRING_POINTER (args));

Similarly here, I'd hope:

  string_slice str = TREE_STRING_POINTER (args);

would be enough.
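
As a rough illustration of the point (not the actual string_slice class, whose definition is not shown in this thread), a converting constructor that is not marked 'explicit' lets a C string be used wherever a slice is expected.  The class name slice and the function is_default below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a non-owning string slice type.
class slice
{
public:
  // Deliberately not 'explicit': a const char * converts implicitly.
  slice (const char *s) : m_ptr (s), m_len (0)
  {
    assert (s != nullptr);
    m_len = std::strlen (s);
  }

  std::size_t size () const { return m_len; }

  // Compare by length and contents.
  friend bool operator== (slice a, slice b)
  {
    return a.m_len == b.m_len && std::memcmp (a.m_ptr, b.m_ptr, a.m_len) == 0;
  }

private:
  const char *m_ptr;
  std::size_t m_len;
};

// Takes a slice by value; callers may pass a string literal directly.
bool
is_default (slice s)
{
  // The literal on the right converts through the same constructor.
  return s == "default";
}
```

With this shape, both 'slice str = some_c_string;' and 'is_default ("default")' compile without spelling out the constructor.  Marking the constructor 'explicit' would reject both forms.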
>  
>enum aarch_parse_opt_result parse_res;
>auto isa_flags = aarch64_asm_isa_flags;
> @@ -20195,36 +20191,33 @@ tree
>  aarch64_mangle_decl_assembler_name (tree decl, tree id)
>  {
>/* For function version, add the target suffix to the assembler name.  */
> -  if (TREE_CODE (decl) == FUNCTION_DECL
> -  && DECL_FUNCTION_VERSIONED (decl))
> +  if (TREE_CODE (decl) == FUNCTION_DECL)
>  {
> -  aarch64_fmv_feature_mask feature_mask = get_feature_mask_for_version 
> (decl);
> -
> -  std::string name = IDENTIFIER_POINTER (id);
> -
> -  /* For the default version, append ".default".  */
> -  if (feature_mask == 0ULL)
> +  cgraph_node *node = cgraph_node::get (decl);
> +  if (node && node->dispatcher_function)
> + return id;
> +  else if (node && node->dispatcher_resolver_function)
> + return clone_identifier (id, "resolver");
> +  else if (DECL_FUNCTION_VERSIONED (decl))
>   {
> -   name += ".default";
> -   return get_identifier (name.c_str());
> - }
> +   aarch64_fmv_feature_mask feature_mask
> + = get_feature_mask_for_version (decl);
>  
> -  name += "._";
> +   if (feature_mask == 0ULL)
> + return clone_identifier (id, "default");
>  
> -  int num_features = ARRAY_SIZE (aarch64_fmv_feature_data);
> -  for (int i = 0; i < num_features; i++)
> - {
> -   if (feature_mask & aarch64_fmv_feature_data[i].feature_mask)
> - {
> -   name += "M";
> -   name += aarch64_fmv_feature_data[i].name;
> - }
> - }
> +   std::string suffix = "_";
>  
> -  if (DECL_ASSEMBLER_NAME_SET_P (decl))
> - SET_DECL_RTL (decl, NULL);
> +   int num_features = ARRAY_SIZE (aarch64_fmv_feature_data);
> +   for (int i = 0; i < num_features; i++)
> + if (feature_mask

Re: [RFC] RISC-V: The optimization ignored the side effects of the rounding mode, resulting in incorrect results.

2025-02-18 Thread Jeff Law

On 2/18/25 4:12 AM, Jin Ma wrote:
> We overlooked the side effects of the rounding mode in the pattern,
> which can impact the result of float_extend and lead to incorrect
> optimizations in the final program. This issue likely affects nearly
> all similar patterns that involve rounding modes, and the tests in
> this patch only highlight one example. It seems challenging to address,
> and I only implemented a simple fix, which is not a good way to solve
> the problem.
>
> Any comments on this?
>
> gcc/ChangeLog:
>
>         * config/riscv/vector-iterators.md (UNSPEC_VRM): New.
>         * config/riscv/vector.md: Use UNSPEC for float_extend.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/rvv/base/bug-11.c: New test.

So as Kito noted, the insn you changed already has a reference to the FRM 
it needs -- kept in operands[9].  It seems like your patch, while fixing 
the bug, more likely does so by accident rather than by design.


What I see when I look at the dump files is a deeper issue.


In the .expand dump we have:


(insn 17 16 18 2 (set (reg:HF 147)
(const_double:HF 0.0 [0x0.0p+0])) "j.c":14:24 -1
 (nil))
(insn 18 17 19 2 (set (reg/v:RVVMF2SF 141 [ vreg ])
(if_then_else:RVVMF2SF (unspec:RVVMF64BI [
(reg/v:RVVMF64BI 138 [ vmask ])
(const_int 1 [0x1])
(const_int 0 [0])
(const_int 2 [0x2])
(const_int 0 [0])
(const_int 2 [0x2])
(reg:SI 66 vl)
(reg:SI 67 vtype)
(reg:SI 69 frm)
] UNSPEC_VPREDICATE)
(minus:RVVMF2SF (reg/v:RVVMF2SF 140 [ vreg_memory ])
(float_extend:RVVMF2SF (vec_duplicate:RVVMF4HF (reg:HF 147
(reg/v:RVVMF2SF 140 [ vreg_memory ]))) "j.c":14:24 -1
 (nil))   




Insn 18 does the subtraction with the adjusted rounding mode.  So far, 
so good.  Things look fine at the start of cse1.  But if we look at the 
end of cse1 we have:



(insn 17 16 18 2 (set (reg:HF 147)
(const_double:HF 0.0 [0x0.0p+0])) "j.c":14:24 136 {*movhf_hardfloat}
 (nil))
(insn 18 17 19 2 (set (reg/v:RVVMF2SF 141 [ vreg ])
(reg/v:RVVMF2SF 140 [ vreg_memory ])) "j.c":14:24 2786 
{*movrvvmf2sf_fract}
 (expr_list:REG_DEAD (reg:HF 147)
(expr_list:REG_DEAD (reg/v:RVVMF2SF 140 [ vreg_memory ])
(expr_list:REG_DEAD (reg/v:RVVMF64BI 138 [ vmask ])
(expr_list:REG_DEAD (reg:SI 69 frm)
(nil))



Note how CSE replaces the arithmetic with a simple copy.  At this point 
things are broken.


I don't see how CSE can make the right decision here; we don't expose 
rounding modes this early and thus CSE has no way to know it can't make 
that kind of replacement.
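
To make the hazard concrete, here is a small scalar sketch of why folding "x - 0.0" to "x" is not safe once the rounding mode can differ from round-to-nearest.  It assumes that the relevant case is the sign of a zero result, which the dumps above do not confirm; the function name is hypothetical, and the volatile accesses are there only to keep the compiler from folding the subtraction at compile time.  IEEE 754 specifies that (+0) - (+0) is +0 in every rounding mode except round-toward-negative, where it is -0.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Computes (+0.0f) - (+0.0f) under ROUNDING_MODE and reports whether the
// result has its sign bit set.  The volatile loads and store pin the
// subtraction between the two fesetround calls.
bool
zero_difference_is_negative (int rounding_mode)
{
  volatile float x = 0.0f;
  volatile float zero = 0.0f;
  volatile float result = 0.0f;

  const int saved = std::fegetround ();
  const int rc = std::fesetround (rounding_mode);
  assert (rc == 0);
  (void) rc;

  result = x - zero;

  // Restore the caller's rounding mode before returning.
  std::fesetround (saved);

  const float r = result;
  return std::signbit (r);
}
```

Called with FE_TONEAREST the difference should be +0.0, and with FE_DOWNWARD it should be -0.0, so replacing the subtraction with a plain copy of x changes the observable result in the second case.
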


Your patch kind of works, but it seems to me it's more accident than 
design and that we need to fix this in a more general manner.


The natural question is what do other targets do when the rounding mode 
gets changed.  I'm guessing it's exposed as an unspec set before the RTL 
optimizers run.


jeff

Re: [PATCH] libgcc: i386/linux-unwind.h: always rely on sys/ucontext.h

2025-02-18 Thread Uros Bizjak

On Tue, Feb 18, 2025 at 8:26 PM Uros Bizjak  wrote:
>
> On Tue, Feb 18, 2025 at 8:23 PM Richard Biener  wrote:
> >
> >
> >
> > > Am 18.02.2025 um 20:07 schrieb Roman Kagan :
> > >
> > > On Tue, Feb 18, 2025 at 07:17:24PM +0100, Uros Bizjak wrote:
> > >>> On Mon, Feb 17, 2025 at 6:19 PM Roman Kagan  wrote:
> > >>>
> > >>> On Thu, Jan 02, 2025 at 04:32:17PM +0100, Roman Kagan wrote:
> >  When gcc is built for x86_64-linux-musl target, stack unwinding from
> >  within signal handler stops at the innermost signal frame.  The reason
> >  for this behavior is that the signal trampoline is not accompanied with
> >  appropriate CFI directives, and the fallback path in libgcc to recognize
> >  it by the code sequence is only enabled for glibc except 2.0.  The
> >  latter is motivated by the lack of sys/ucontext.h in that glibc 
> >  version.
> > 
> >  Given that all relevant libc-s ship sys/ucontext.h for over a decade,
> >  and that other arches aren't shy of unconditionally using it, follow
> >  suit and remove the preprocessor condition, too.
> > >>
> > >> "Relevant libc"-s for x86 linux are LIBC_GLIBC, LIBC_UCLIBC,
> > >> LIBC_BIONIC and LIBC_MUSL. As far as glibc is concerned, the latest
> > >> glibc 2.0.x version was released in 1997 [1], so I guess we can remove
> > >> the condition for version 2.0. Based on your claim, the other
> > >> mentioned libcs also provide the required header for a long time.
> > >
> > > Ah, good point, for completeness I should've supplied evidence from
> > > their respective git repos, here you go:
> > >
> > > uclibc(-ng):
> > >  libc/sysdeps/linux/i386/sys/ucontext.h
> > >
> > >commit 9cee42f10dbc5b33866ff137b926a74abd7c1a5b
> > >Author: Eric Andersen 
> > >Date:   Fri Mar 1 20:46:26 2002 +
> > >
> > >Major rework of the include files to eliminate redundancy
> > >and to better support each arch.  This is a really big patch...
> > > -Erik
> > >
> > >  libc/sysdeps/linux/i386/sys/ucontext.h
> > >
> > >commit 1fef64b22811709b2e640d341237bce1c8081203
> > >Author: Mike Frysinger 
> > >Date:   Tue Feb 15 01:27:10 2005 +
> > >
> > >headers for x86_64
> > >
> > > bionic:
> > >  libc/include/sys/ucontext.h
> > >
> > >commit e61d106008f7d77fa1c0de43ac27311320225135
> > >Author: Pavel Chupin 
> > >Date:   Mon Jan 27 17:56:43 2014 +0400
> > >
> > >Add x86_64 ucontext.h for better compatibility
> > >
> > >As suggested here: 
> > > https://android-review.googlesource.com/#/c/71267/
> > >it may be used for x86_64 libunwind enabling.
> > >
> > >Change-Id: I21623261a48ea7099e030d33932556e294d226ff
> > >Signed-off-by: Pavel Chupin 
> > >
> > >commit 677a07cb9a3f5964e9ead4d37b9f775d971c61e0
> > >Author: Elliott Hughes 
> > >Date:   Wed Jan 29 16:46:00 2014 -0800
> > >
> > >Add x86 .
> > >
> > >Change-Id: I43e72604f7a932f134733b78094b577415a5edb7
> > >
> > > musl:
> > >  arch/i386/bits/signal.h
> > >  arch/x86_64/bits/signal.h
> > >  include/ucontext.h
> > >
> > >commit ad2fe25041622b6cf426b0f98af0e52c2c9727f6
> > >Author: Rich Felker 
> > >Date:   Fri Feb 18 22:03:03 2011 -0500
> > >
> > >support the ugly and deprecated ucontext and sigcontext header 
> > > stuff...
> > >
> > >only the structures, not the functions from ucontext.h, are 
> > > supported
> > >at this point. the main goal of this commit is to make modern gcc 
> > > with
> > >dwarf2 unwinding build without errors.
> > >
> > >honestly, it probably doesn't matter how we define these as long as
> > >they have members with the right names to prevent errors while
> > >compiling libgcc. the only time they will be used is for 
> > > propagating
> > >exceptions across signal-handler boundaries, which invokes 
> > > undefined
> > >behavior anyway. but as-is, they're probably correct and may be 
> > > useful
> > >to various low-level applications dealing with virtualization, jit
> > >code generation, and so on...
> > >
> > >> I have no objection to the patch, but I think that this patch is a bit
> > >> late for gcc-15 and should be committed early in the gcc-16
> > >> development cycle. But let's hear release managers (CC'd).
> >
> > It’s fine for 15, or rather I’m leaving it for you to decide.
>
> OK, based on the above research, I'll commit it to gcc-15.

Committed as e129b8d7682c9a6c4d874f58de142543d3804169
with the following ChangeLog entry:

libgcc/ChangeLog:

* config/i386/linux-unwind.h: Remove preprocessor
condition to enable fallback path for all libc-s.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Thanks,
Uros.

Re: [PATCH] COBOL 12/15 24K pos: Posix adapter framework

2025-02-18 Thread James K. Lowden

On Tue, 18 Feb 2025 09:35:33 +0100
Richard Biener  wrote:

> > I'm sure you agree we don't want to let this tail wag the dog.
> > With my exegesis in mind, what would you recommend?  If it's
> > limited to more judicious use of makefile variables, I could surely
> > implement those suggestions.
> 
> So to simplify things at this point can we postpone merging this bit
> then?  If you say it's more like a "contrib", wouldn't
> putting it in the toplevel contrib/ directory be more appropriate?
> Maybe in a contrib/cobol/ subdirectory?

As you wish.  I'll eliminate it from the next patchset, which I hope will be 
later today.  

--jkl

Re: The COBOL front end, version 2, in 15-part harmony

2025-02-18 Thread James K. Lowden

On Tue, 18 Feb 2025 09:37:57 +0100
Richard Biener  wrote:

> > Except for "lib", patches over 400 KB consist of just one big file.
> 
> For a future possible version 3 of the patch set, you do not need to
> send big generated files like 'configure' as part of the patch, but
> just the sources/changes to their templates.

IIUC, just send normal patches to configure.ac & friends, and ignore the fact 
that e.g. libgcobol/configure has changed.  

Will do. 

--jkl

Re: [PATCH] avoid-store-forwarding: Handle REG_EH_REGION notes

2025-02-18 Thread Richard Biener




> Am 18.02.2025 um 17:04 schrieb Konstantinos Eleftheriou 
> :
> 
> From: kelefth 
> 
> The pass rejects the transformation when there are instructions in the
> sequence that might throw an exception. This was added due to having
> cases that the load instruction contains a REG_EH_REGION note and
> moving it before the store instructions caused an error, as it was
> no longer the last instruction in the basic block.
> 
> This patch handles those cases by moving a possible REG_EH_REGION
> note from the load instruction of the store-load sequence to the
> last instruction of the basic block.

But that’s not a correct transform and will lead to bogus exception handling?  
You’d need to move the note and split the block, possibly updating the EH info 
on the side.

Richard 

> gcc/ChangeLog:
> 
>* avoid-store-forwarding.cc (process_store_forwarding):
>(store_forwarding_analyzer::avoid_store_forwarding):
>Move a possible REG_EH_REGION note from the load instruction
>to the last instruction of the basic block.
> ---
> gcc/avoid-store-forwarding.cc | 13 -
> 1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
> index 34a7bba4043..05c91bb1a82 100644
> --- a/gcc/avoid-store-forwarding.cc
> +++ b/gcc/avoid-store-forwarding.cc
> @@ -400,6 +400,17 @@ process_store_forwarding (vec &stores, 
> rtx_insn *load_insn,
>   if (load_elim)
> delete_insn (load_insn);
> 
> +  /* Find possible REG_EH_REGION note in the load instruction and move it
> + into the last instruction of the basic block.  */
> +  rtx reg_eh_region_note = find_reg_note (load_insn, REG_EH_REGION, 
> NULL_RTX);
> +  if (reg_eh_region_note != NULL_RTX)
> +{
> +  remove_note (load_insn, reg_eh_region_note);
> +  basic_block load_bb = BLOCK_FOR_INSN (load_insn);
> +  add_reg_note (BB_END (load_bb), REG_EH_REGION,
> +XEXP (reg_eh_region_note, 0));
> +}
> +
>   return true;
> }
> 
> @@ -425,7 +436,7 @@ store_forwarding_analyzer::avoid_store_forwarding 
> (basic_block bb)
> 
>   rtx set = single_set (insn);
> 
> -  if (!set || insn_could_throw_p (insn))
> +  if (!set)
>{
>  store_exprs.truncate (0);
>  continue;
> --
> 2.47.0
>

Re: [PATCH] libgcc: i386/linux-unwind.h: always rely on sys/ucontext.h

2025-02-18 Thread Richard Biener




> Am 18.02.2025 um 20:07 schrieb Roman Kagan :
> 
> On Tue, Feb 18, 2025 at 07:17:24PM +0100, Uros Bizjak wrote:
>>> On Mon, Feb 17, 2025 at 6:19 PM Roman Kagan  wrote:
>>> 
>>> On Thu, Jan 02, 2025 at 04:32:17PM +0100, Roman Kagan wrote:
 When gcc is built for x86_64-linux-musl target, stack unwinding from
 within signal handler stops at the innermost signal frame.  The reason
 for this behavior is that the signal trampoline is not accompanied with
 appropriate CFI directives, and the fallback path in libgcc to recognize
 it by the code sequence is only enabled for glibc except 2.0.  The
 latter is motivated by the lack of sys/ucontext.h in that glibc version.
 
 Given that all relevant libc-s ship sys/ucontext.h for over a decade,
 and that other arches aren't shy of unconditionally using it, follow
 suit and remove the preprocessor condition, too.
>> 
>> "Relevant libc"-s for x86 linux are LIBC_GLIBC, LIBC_UCLIBC,
>> LIBC_BIONIC and LIBC_MUSL. As far as glibc is concerned, the latest
>> glibc 2.0.x version was released in 1997 [1], so I guess we can remove
>> the condition for version 2.0. Based on your claim, the other
>> mentioned libcs also provide the required header for a long time.
> 
> Ah, good point, for completeness I should've supplied evidence from
> their respective git repos, here you go:
> 
> uclibc(-ng):
>  libc/sysdeps/linux/i386/sys/ucontext.h
> 
>commit 9cee42f10dbc5b33866ff137b926a74abd7c1a5b
>Author: Eric Andersen 
>Date:   Fri Mar 1 20:46:26 2002 +
> 
>Major rework of the include files to eliminate redundancy
>and to better support each arch.  This is a really big patch...
> -Erik
> 
>  libc/sysdeps/linux/i386/sys/ucontext.h
> 
>commit 1fef64b22811709b2e640d341237bce1c8081203
>Author: Mike Frysinger 
>Date:   Tue Feb 15 01:27:10 2005 +
> 
>headers for x86_64
> 
> bionic:
>  libc/include/sys/ucontext.h
> 
>commit e61d106008f7d77fa1c0de43ac27311320225135
>Author: Pavel Chupin 
>Date:   Mon Jan 27 17:56:43 2014 +0400
> 
>Add x86_64 ucontext.h for better compatibility
> 
>As suggested here: https://android-review.googlesource.com/#/c/71267/
>it may be used for x86_64 libunwind enabling.
> 
>Change-Id: I21623261a48ea7099e030d33932556e294d226ff
>Signed-off-by: Pavel Chupin 
> 
>commit 677a07cb9a3f5964e9ead4d37b9f775d971c61e0
>Author: Elliott Hughes 
>Date:   Wed Jan 29 16:46:00 2014 -0800
> 
>Add x86 .
> 
>Change-Id: I43e72604f7a932f134733b78094b577415a5edb7
> 
> musl:
>  arch/i386/bits/signal.h
>  arch/x86_64/bits/signal.h
>  include/ucontext.h
> 
>commit ad2fe25041622b6cf426b0f98af0e52c2c9727f6
>Author: Rich Felker 
>Date:   Fri Feb 18 22:03:03 2011 -0500
> 
>support the ugly and deprecated ucontext and sigcontext header stuff...
> 
>only the structures, not the functions from ucontext.h, are supported
>at this point. the main goal of this commit is to make modern gcc with
>dwarf2 unwinding build without errors.
> 
>honestly, it probably doesn't matter how we define these as long as
>they have members with the right names to prevent errors while
>compiling libgcc. the only time they will be used is for propagating
>exceptions across signal-handler boundaries, which invokes undefined
>behavior anyway. but as-is, they're probably correct and may be useful
>to various low-level applications dealing with virtualization, jit
>code generation, and so on...
> 
>> I have no objection to the patch, but I think that this patch is a bit
>> late for gcc-15 and should be committed early in the gcc-16
>> development cycle. But let's hear release managers (CC'd).

It’s fine for 15, or rather I’m leaving it for you to decide.

> I gather that GCC doesn't have "cc: stable" process similar to Linux,
> does it?

Patches can be back ported to release branches if they fix regressions or 
important bugs.  You should possibly see to add the missing CFI directives on 
your system?

Richard 

> 
> Fine by me anyway.  If it lands in GCC repo I'll at least be able to
> poke at some downstream maintainers with a link to cherry-pick.
> 
> Thanks,
> Roman.

Re: [PATCH] libgcc: i386/linux-unwind.h: always rely on sys/ucontext.h

2025-02-18 Thread Uros Bizjak

On Tue, Feb 18, 2025 at 8:23 PM Richard Biener  wrote:
>
>
>
> > Am 18.02.2025 um 20:07 schrieb Roman Kagan :
> >
> > On Tue, Feb 18, 2025 at 07:17:24PM +0100, Uros Bizjak wrote:
> >>> On Mon, Feb 17, 2025 at 6:19 PM Roman Kagan  wrote:
> >>>
> >>> On Thu, Jan 02, 2025 at 04:32:17PM +0100, Roman Kagan wrote:
>  When gcc is built for x86_64-linux-musl target, stack unwinding from
>  within signal handler stops at the innermost signal frame.  The reason
>  for this behavior is that the signal trampoline is not accompanied with
>  appropriate CFI directives, and the fallback path in libgcc to recognize
>  it by the code sequence is only enabled for glibc except 2.0.  The
>  latter is motivated by the lack of sys/ucontext.h in that glibc version.
> 
>  Given that all relevant libc-s ship sys/ucontext.h for over a decade,
>  and that other arches aren't shy of unconditionally using it, follow
>  suit and remove the preprocessor condition, too.
> >>
> >> "Relevant libc"-s for x86 linux are LIBC_GLIBC, LIBC_UCLIBC,
> >> LIBC_BIONIC and LIBC_MUSL. As far as glibc is concerned, the latest
> >> glibc 2.0.x version was released in 1997 [1], so I guess we can remove
> >> the condition for version 2.0. Based on your claim, the other
> >> mentioned libcs also provide the required header for a long time.
> >
> > Ah, good point, for completeness I should've supplied evidence from
> > their respective git repos, here you go:
> >
> > uclibc(-ng):
> >  libc/sysdeps/linux/i386/sys/ucontext.h
> >
> >commit 9cee42f10dbc5b33866ff137b926a74abd7c1a5b
> >Author: Eric Andersen 
> >Date:   Fri Mar 1 20:46:26 2002 +
> >
> >Major rework of the include files to eliminate redundancy
> >and to better support each arch.  This is a really big patch...
> > -Erik
> >
> >  libc/sysdeps/linux/i386/sys/ucontext.h
> >
> >commit 1fef64b22811709b2e640d341237bce1c8081203
> >Author: Mike Frysinger 
> >Date:   Tue Feb 15 01:27:10 2005 +
> >
> >headers for x86_64
> >
> > bionic:
> >  libc/include/sys/ucontext.h
> >
> >commit e61d106008f7d77fa1c0de43ac27311320225135
> >Author: Pavel Chupin 
> >Date:   Mon Jan 27 17:56:43 2014 +0400
> >
> >Add x86_64 ucontext.h for better compatibility
> >
> >As suggested here: https://android-review.googlesource.com/#/c/71267/
> >it may be used for x86_64 libunwind enabling.
> >
> >Change-Id: I21623261a48ea7099e030d33932556e294d226ff
> >Signed-off-by: Pavel Chupin 
> >
> >commit 677a07cb9a3f5964e9ead4d37b9f775d971c61e0
> >Author: Elliott Hughes 
> >Date:   Wed Jan 29 16:46:00 2014 -0800
> >
> >Add x86 .
> >
> >Change-Id: I43e72604f7a932f134733b78094b577415a5edb7
> >
> > musl:
> >  arch/i386/bits/signal.h
> >  arch/x86_64/bits/signal.h
> >  include/ucontext.h
> >
> >commit ad2fe25041622b6cf426b0f98af0e52c2c9727f6
> >Author: Rich Felker 
> >Date:   Fri Feb 18 22:03:03 2011 -0500
> >
> >support the ugly and deprecated ucontext and sigcontext header 
> > stuff...
> >
> >only the structures, not the functions from ucontext.h, are supported
> >at this point. the main goal of this commit is to make modern gcc 
> > with
> >dwarf2 unwinding build without errors.
> >
> >honestly, it probably doesn't matter how we define these as long as
> >they have members with the right names to prevent errors while
> >compiling libgcc. the only time they will be used is for propagating
> >exceptions across signal-handler boundaries, which invokes undefined
> >behavior anyway. but as-is, they're probably correct and may be 
> > useful
> >to various low-level applications dealing with virtualization, jit
> >code generation, and so on...
> >
> >> I have no objection to the patch, but I think that this patch is a bit
> >> late for gcc-15 and should be committed early in the gcc-16
> >> development cycle. But let's hear release managers (CC'd).
>
> It’s fine for 15, or rather I’m leaving it for you to decide.

OK, based on the above research, I'll commit it to gcc-15.

Thanks,
Uros.

New Ukrainian PO file for 'cpplib' (version 15-b20250216)

2025-02-18 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Ukrainian team of translators.  The file is available at:

https://translationproject.org/latest/cpplib/uk.po

(This file, 'cpplib-15-b20250216.uk.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

Re: [PATCH] libgcc: i386/linux-unwind.h: always rely on sys/ucontext.h

2025-02-18 Thread Roman Kagan

On Tue, Feb 18, 2025 at 07:17:24PM +0100, Uros Bizjak wrote:
> On Mon, Feb 17, 2025 at 6:19 PM Roman Kagan  wrote:
> >
> > On Thu, Jan 02, 2025 at 04:32:17PM +0100, Roman Kagan wrote:
> > > When gcc is built for x86_64-linux-musl target, stack unwinding from
> > > within signal handler stops at the innermost signal frame.  The reason
> > > for this behavior is that the signal trampoline is not accompanied by
> > > appropriate CFI directives, and the fallback path in libgcc to recognize
> > > it by the code sequence is only enabled for glibc except 2.0.  The
> > > latter is motivated by the lack of sys/ucontext.h in that glibc version.
> > >
> > > Given that all relevant libc-s ship sys/ucontext.h for over a decade,
> > > and that other arches aren't shy of unconditionally using it, follow
> > > suit and remove the preprocessor condition, too.
> 
> "Relevant libc"-s for x86 linux are LIBC_GLIBC, LIBC_UCLIBC,
> LIBC_BIONIC and LIBC_MUSL. As far as glibc is concerned, the latest
> glibc 2.0.x version was released in 1997 [1], so I guess we can remove
> the condition for version 2.0. Based on your claim, the other
> mentioned libcs have also provided the required header for a long time.

Ah, good point, for completeness I should've supplied evidence from
their respective git repos, here you go:

uclibc(-ng):
  libc/sysdeps/linux/i386/sys/ucontext.h

commit 9cee42f10dbc5b33866ff137b926a74abd7c1a5b
Author: Eric Andersen 
Date:   Fri Mar 1 20:46:26 2002 +

Major rework of the include files to eliminate redundancy
and to better support each arch.  This is a really big patch...
 -Erik

  libc/sysdeps/linux/i386/sys/ucontext.h

commit 1fef64b22811709b2e640d341237bce1c8081203
Author: Mike Frysinger 
Date:   Tue Feb 15 01:27:10 2005 +

headers for x86_64

bionic:
  libc/include/sys/ucontext.h

commit e61d106008f7d77fa1c0de43ac27311320225135
Author: Pavel Chupin 
Date:   Mon Jan 27 17:56:43 2014 +0400

Add x86_64 ucontext.h for better compatibility

As suggested here: https://android-review.googlesource.com/#/c/71267/
it may be used for x86_64 libunwind enabling.

Change-Id: I21623261a48ea7099e030d33932556e294d226ff
Signed-off-by: Pavel Chupin 

commit 677a07cb9a3f5964e9ead4d37b9f775d971c61e0
Author: Elliott Hughes 
Date:   Wed Jan 29 16:46:00 2014 -0800

Add x86 .

Change-Id: I43e72604f7a932f134733b78094b577415a5edb7

musl:
  arch/i386/bits/signal.h
  arch/x86_64/bits/signal.h
  include/ucontext.h

commit ad2fe25041622b6cf426b0f98af0e52c2c9727f6
Author: Rich Felker 
Date:   Fri Feb 18 22:03:03 2011 -0500

support the ugly and deprecated ucontext and sigcontext header stuff...

only the structures, not the functions from ucontext.h, are supported
at this point. the main goal of this commit is to make modern gcc with
dwarf2 unwinding build without errors.

honestly, it probably doesn't matter how we define these as long as
they have members with the right names to prevent errors while
compiling libgcc. the only time they will be used is for propagating
exceptions across signal-handler boundaries, which invokes undefined
behavior anyway. but as-is, they're probably correct and may be useful
to various low-level applications dealing with virtualization, jit
code generation, and so on...

> I have no objection to the patch, but I think that this patch is a bit
> late for gcc-15 and should be committed early in the gcc-16
> development cycle. But let's hear release managers (CC'd).

I gather that GCC doesn't have a "cc: stable" process similar to Linux,
does it?

Fine by me anyway.  If it lands in GCC repo I'll at least be able to
poke at some downstream maintainers with a link to cherry-pick.

Thanks,
Roman.

Re: [PATCH] libgcc: i386/linux-unwind.h: always rely on sys/ucontext.h

2025-02-18 Thread Uros Bizjak

On Mon, Feb 17, 2025 at 6:19 PM Roman Kagan  wrote:
>
> On Thu, Jan 02, 2025 at 04:32:17PM +0100, Roman Kagan wrote:
> > When gcc is built for x86_64-linux-musl target, stack unwinding from
> > within signal handler stops at the innermost signal frame.  The reason
> > for this behavior is that the signal trampoline is not accompanied by
> > appropriate CFI directives, and the fallback path in libgcc to recognize
> > it by the code sequence is only enabled for glibc except 2.0.  The
> > latter is motivated by the lack of sys/ucontext.h in that glibc version.
> >
> > Given that all relevant libc-s ship sys/ucontext.h for over a decade,
> > and that other arches aren't shy of unconditionally using it, follow
> > suit and remove the preprocessor condition, too.

"Relevant libc"-s for x86 linux are LIBC_GLIBC, LIBC_UCLIBC,
LIBC_BIONIC and LIBC_MUSL. As far as glibc is concerned, the latest
glibc 2.0.x version was released in 1997 [1], so I guess we can remove
the condition for version 2.0. Based on your claim, the other
mentioned libcs have also provided the required header for a long time.

I have no objection to the patch, but I think that this patch is a bit
late for gcc-15 and should be committed early in the gcc-16
development cycle. But let's hear release managers (CC'd).

[1] https://sourceware.org/glibc/wiki/Glibc%20Timeline

Thanks,
Uros.

> >

> > Signed-off-by: Roman Kagan 
> > ---
> >  libgcc/config/i386/linux-unwind.h | 7 ---
> >  1 file changed, 7 deletions(-)
> >
> > diff --git a/libgcc/config/i386/linux-unwind.h 
> > b/libgcc/config/i386/linux-unwind.h
> > index fe316ee02cf2..8f37642bbf55 100644
> > --- a/libgcc/config/i386/linux-unwind.h
> > +++ b/libgcc/config/i386/linux-unwind.h
> > @@ -33,12 +33,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively. 
> >  If not, see
> >
> >  #ifndef inhibit_libc
> >
> > -/* There's no sys/ucontext.h for glibc 2.0, so no
> > -   signal-turned-exceptions for them.  There's also no configure-run for
> > -   the target, so we can't check on (e.g.) HAVE_SYS_UCONTEXT_H.  Using the
> > -   target libc version macro should be enough.  */
> > -#if defined __GLIBC__ && !(__GLIBC__ == 2 && __GLIBC_MINOR__ == 0)
> > -
> >  #include 
> >  #include 
> >
> > @@ -199,5 +193,4 @@ x86_frob_update_context (struct _Unwind_Context 
> > *context,
> >  }
> >
> >  #endif /* ifdef __x86_64__  */
> > -#endif /* not glibc 2.0 */
> >  #endif /* ifdef inhibit_libc  */
>
> Ping?
>
> Roman.
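
For readers following the thread, the guard removed by the quoted patch can be
modelled as a plain predicate.  This is only a minimal sketch, not libgcc code:
libc_info and both function names are hypothetical, and the real check is a
preprocessor condition rather than a runtime test.

```cpp
// Hypothetical model of the preprocessor guard discussed in this thread.
// Nothing here is part of libgcc; the names are made up for illustration.
struct libc_info
{
  bool defines_glibc;  // whether the libc headers define __GLIBC__
  int major;           // value of __GLIBC__ (ignored when not defined)
  int minor;           // value of __GLIBC_MINOR__ (ignored when not defined)
};

// Mirrors the removed condition:
//   defined __GLIBC__ && !(__GLIBC__ == 2 && __GLIBC_MINOR__ == 0)
constexpr bool
old_guard_enables_fallback (const libc_info &libc)
{
  return libc.defines_glibc && !(libc.major == 2 && libc.minor == 0);
}

// With the condition gone, the fallback is compiled for every libc
// (still subject to the surrounding inhibit_libc check).
constexpr bool
new_guard_enables_fallback (const libc_info &)
{
  return true;
}
```

Under this model a libc that does not define __GLIBC__ (musl, for example)
never got the signal-frame fallback with the old guard, which matches the
symptom described in the commit message.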

[pushed: r15-7610] sarif output: fix alphabetization in sarif_scheme_handler::make_sink

2025-02-18 Thread David Malcolm

No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-7610-g196e8dbddc509c.

Signed-off-by: David Malcolm 

gcc/ChangeLog:
* opts-diagnostic.cc (sarif_scheme_handler::make_sink): Put
properties in alphabetical order.

Signed-off-by: David Malcolm 
---
 gcc/opts-diagnostic.cc | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/opts-diagnostic.cc b/gcc/opts-diagnostic.cc
index 6516e5aec7e..cab7925aa34 100644
--- a/gcc/opts-diagnostic.cc
+++ b/gcc/opts-diagnostic.cc
@@ -434,12 +434,17 @@ sarif_scheme_handler::make_sink (const context &ctxt,
 const char *unparsed_arg,
 const scheme_name_and_params &parsed_arg) const
 {
-  enum sarif_version version = sarif_version::v2_1_0;
   label_text filename;
+  enum sarif_version version = sarif_version::v2_1_0;
   for (auto& iter : parsed_arg.m_kvs)
 {
   const std::string &key = iter.first;
   const std::string &value = iter.second;
+  if (key == "file")
+   {
+ filename = label_text::take (xstrdup (value.c_str ()));
+ continue;
+   }
   if (key == "version")
{
  static const std::array,
@@ -454,11 +459,6 @@ sarif_scheme_handler::make_sink (const context &ctxt,
return nullptr;
  continue;
}
-  if (key == "file")
-   {
- filename = label_text::take (xstrdup (value.c_str ()));
- continue;
-   }
 
   /* Key not found.  */
   auto_vec known_keys;
-- 
2.26.3

[pushed: r15-7611] analyzer: add more properties to sarif output

2025-02-18 Thread David Malcolm

Add some more properties to the analyzer's sarif output, to
help with debugging -fanalyzer.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r15-7611-gfcdcccdbf809f9.

gcc/analyzer/ChangeLog:
* diagnostic-manager.cc
(saved_diagnostic::maybe_add_sarif_properties): Add various
properties for debugging, for m_stmt, m_var, and m_duplicates.
Remove stray 'if' statement.  Capture the kind of the
pending_diagnostic.
* region-model.cc
(poisoned_value_diagnostic::maybe_add_sarif_properties): New.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/diagnostic-manager.cc | 26 +-
 gcc/analyzer/region-model.cc   | 13 +
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/gcc/analyzer/diagnostic-manager.cc 
b/gcc/analyzer/diagnostic-manager.cc
index 8db6a533e604..4bf1dce967de 100644
--- a/gcc/analyzer/diagnostic-manager.cc
+++ b/gcc/analyzer/diagnostic-manager.cc
@@ -1032,12 +1032,36 @@ saved_diagnostic::maybe_add_sarif_properties 
(sarif_object &result_obj) const
 props.set_string (PROPERTY_PREFIX "sm", m_sm->get_name ());
   props.set_integer (PROPERTY_PREFIX "enode", m_enode->m_index);
   props.set_integer (PROPERTY_PREFIX "snode", m_snode->m_index);
+  if (m_stmt)
+{
+  pretty_printer pp;
+  pp_gimple_stmt_1 (&pp, m_stmt, 0, (dump_flags_t)0);
+  props.set_string (PROPERTY_PREFIX "stmt", pp_formatted_text (&pp));
+}
+  if (m_var)
+props.set (PROPERTY_PREFIX "var", tree_to_json (m_var));
   if (m_sval)
 props.set (PROPERTY_PREFIX "sval", m_sval->to_json ());
   if (m_state)
 props.set (PROPERTY_PREFIX "state", m_state->to_json ());
-  if (m_best_epath)
+  // TODO: m_best_epath
   props.set_integer (PROPERTY_PREFIX "idx", m_idx);
+  if (m_duplicates.length () > 0)
+{
+  auto duplicates_arr = ::make_unique ();
+  for (auto iter : m_duplicates)
+   {
+ auto sd_obj = ::make_unique ();
+ iter->maybe_add_sarif_properties (*sd_obj);
+ duplicates_arr->append (std::move (sd_obj));
+   }
+  props.set (PROPERTY_PREFIX "duplicates",
+ std::move (duplicates_arr));
+}
+#undef PROPERTY_PREFIX
+
+#define PROPERTY_PREFIX "gcc/analyzer/pending_diagnostic/"
+  props.set_string (PROPERTY_PREFIX "kind", m_d->get_kind ());
 #undef PROPERTY_PREFIX
 
   /* Potentially add pending_diagnostic-specific properties.  */
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 78b086900b48..79378a9e6e5f 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -753,6 +753,19 @@ public:
 return true;
   }
 
+  void
+  maybe_add_sarif_properties (sarif_object &result_obj) const final override
+  {
+sarif_property_bag &props = result_obj.get_or_create_properties ();
+#define PROPERTY_PREFIX "gcc/analyzer/poisoned_value_diagnostic/"
+props.set (PROPERTY_PREFIX "expr", tree_to_json (m_expr));
+props.set_string (PROPERTY_PREFIX "kind", poison_kind_to_str (m_pkind));
+if (m_src_region)
+  props.set (PROPERTY_PREFIX "src_region", m_src_region->to_json ());
+props.set (PROPERTY_PREFIX "check_expr", tree_to_json (m_check_expr));
+#undef PROPERTY_PREFIX
+  }
+
 private:
   tree m_expr;
   enum poison_kind m_pkind;
-- 
2.26.3

RE: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]

2025-02-18 Thread Li, Pan2

I see, thanks Richard S for explaining.  That makes sense to me, and we do
similar things for frm.

It sounds like we need to revisit what the semantics of vxrm are; in the spec
I can only find the words below.
Does that indicate callee-save (the spec doesn't mention it, but it should if
it is), or something different, like single-use and then discard?

I may wait a while for the official explanation.

From spec: "The vxrm and vxsat fields of vcsr are not preserved across calls
and their values are unspecified upon entry."

Pan

-Original Message-
From: Richard Sandiford  
Sent: Monday, February 17, 2025 7:48 PM
To: Li, Pan2 
Cc: Jeff Law ; Andrew Waterman ; 
gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
rdapp@gmail.com
Subject: Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]

Richard Sandiford  writes:
> The problem seems to be that mode-switching overloads VXRM_MODE_NONE
> to mean both "no requirement" and "unknown state".  So we have:
>
> static int
> singleton_vxrm_need (void)
> {
>   /* Only needed for vector code.  */
>   if (!TARGET_VECTOR)
> return VXRM_MODE_NONE;

This was a bad example, sorry.  What matters more is that non-vector
instructions are also VXRM_MODE_NONE.  Or more specifically:

>
> and:
>
>   if (vxrm_unknown_p (insn))
> return VXRM_MODE_NONE;
>
> This means that VXRM is assumed to be transparent in an instruction
> that matches vxrm_unknown_p.

...the function:

static int
riscv_vxrm_mode_after (rtx_insn *insn, int mode)
{
  if (vxrm_unknown_p (insn))
return VXRM_MODE_NONE;

  if (recog_memoized (insn) < 0)
return mode;

  if (reg_mentioned_p (gen_rtx_REG (SImode, VXRM_REGNUM), PATTERN (insn)))
return get_attr_vxrm_mode (insn);
  else
return mode;
}

will return VXRM_MODE_NONE if:

(a) insn is something like a call
(b) insn is a normal instruction that does not mention VXRM at all and
mode is already VXRM_MODE_NONE

(b) is the transparent case but (a) is a kill.  Since the block walk
starts with VXRM_MODE_NONE as the initial mode, there needs to be
another mode that (a) can use to indicate a kill.
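
To make the distinction concrete, here is a minimal standalone sketch of the
block walk.  It is not the RISC-V backend code: the enum names, MODE_UNKNOWN
and walk_block are all hypothetical, and the only point is that with a single
"none" value a kill and a transparent instruction become indistinguishable.

```cpp
#include <vector>

// Hypothetical modes and instruction kinds, for illustration only.
enum mode_sketch { MODE_NONE, MODE_RNU, MODE_UNKNOWN };
enum insn_kind { INSN_PLAIN, INSN_CALL, INSN_SETS_RNU };

// Overloaded scheme: a call (kill) reports the same value that a
// transparent instruction passes through when the walk starts at NONE.
mode_sketch
mode_after_overloaded (insn_kind insn, mode_sketch mode)
{
  if (insn == INSN_CALL)
    return MODE_NONE;
  if (insn == INSN_SETS_RNU)
    return MODE_RNU;
  return mode;  // transparent
}

// Separate scheme: a call reports a distinct "unknown" mode.
mode_sketch
mode_after_distinct (insn_kind insn, mode_sketch mode)
{
  if (insn == INSN_CALL)
    return MODE_UNKNOWN;
  if (insn == INSN_SETS_RNU)
    return MODE_RNU;
  return mode;  // transparent
}

// Walk a block starting from MODE_NONE, as described above.
mode_sketch
walk_block (const std::vector<insn_kind> &insns,
            mode_sketch (*mode_after) (insn_kind, mode_sketch))
{
  mode_sketch mode = MODE_NONE;
  for (insn_kind insn : insns)
    mode = mode_after (insn, mode);
  return mode;
}
```

In the overloaded scheme a block containing only a call and a block containing
only a plain instruction both end in MODE_NONE; in the second scheme the call
leaves MODE_UNKNOWN behind.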

Thanks,
Richard

Re: [PATCH 1/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-02-18 Thread Spencer Abson

Hi Kyrill,

Thanks for your comments, and for answering my question RE your work. Happy to
apply those changes in the next revision.

Cheers,
Spencer

[PATCH] aarch64: Ignore target pragmas while defining intrinsics

2025-02-18 Thread Andrew Carlotti

When initialising intrinsics with `#pragma GCC aarch64 "arm_*.h"`, we
often set an explicit target, but currently leave current_target_pragma
unchanged.  This results in the target pragma being applied to each
simulated intrinsic on top of our explicit target, which is clearly undesirable.

As far as I can tell this doesn't cause any bugs at the moment, because
none of the behaviour for builtin functions depends upon the
function-specific target.  However, the unintended target feature
combinations led to unwanted behaviour in an under-development patch.

This patch resolves the issue by extending aarch64_simd_switcher to
explicitly unset the current_target_pragma, and adapting it to
support handle_arm_acle_h as well.  I've also renamed the switcher classes
and instances, because I think the new names are slightly clearer.

The chosen sets of features for arm_sve.h and arm_sme.h are not normally
valid, because they exclude FCMA and BF16.  However, I don't think that
matters for the usage here.  Alternatively, aarch64_target_switcher
could be modified to enable all the dependent features as well.
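
The save/clear/restore idea can be sketched in isolation as a small RAII
guard.  This is an illustrative model only: g_isa_flags, g_target_pragma and
target_switcher_sketch are hypothetical stand-ins for the real GCC globals and
class, and the real switcher also handles the general-regs-only flag.

```cpp
// Hypothetical stand-ins for the compiler's global state.
unsigned long g_isa_flags = 0;
const char *g_target_pragma = nullptr;

// Temporarily install FLAGS and clear the active target pragma;
// both are put back when the object goes out of scope.
class target_switcher_sketch
{
public:
  explicit target_switcher_sketch (unsigned long flags)
    : m_old_isa_flags (g_isa_flags),
      m_old_target_pragma (g_target_pragma)
  {
    g_isa_flags = flags;
    g_target_pragma = nullptr;
  }

  ~target_switcher_sketch ()
  {
    g_isa_flags = m_old_isa_flags;
    g_target_pragma = m_old_target_pragma;
  }

  // The guard owns a single save/restore pair, so copying makes no sense.
  target_switcher_sketch (const target_switcher_sketch &) = delete;
  target_switcher_sketch &operator= (const target_switcher_sketch &) = delete;

private:
  unsigned long m_old_isa_flags;
  const char *m_old_target_pragma;
};
```

Anything registered while such a guard is alive sees only the explicitly
requested flags, and the user's pragma state is untouched afterwards.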


Bootstrapped and regression tested on aarch64. Ok for master (to enable the
dependent WIP patch)?

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc
(aarch64_simd_switcher::aarch64_simd_switcher): Rename to...
(aarch64_target_switcher::aarch64_target_switcher): ...this,
remove default simd flags and save current_target_pragma.
(aarch64_simd_switcher::~aarch64_simd_switcher): Rename to...
(aarch64_target_switcher::~aarch64_target_switcher): ...this,
and restore current_target_pragma.
(handle_arm_acle_h): Use aarch64_target_switcher.
(handle_arm_neon_h): Rename switcher and pass explicit flags.
(aarch64_general_init_builtins): Ditto.
* config/aarch64/aarch64-protos.h
(class aarch64_simd_switcher): Rename to...
(class aarch64_target_switcher): ...this, and add pragma member.
* config/aarch64/aarch64-sve-builtins.cc
(sve_switcher::sve_switcher): Rename to...
(sve_target_switcher::sve_target_switcher): ...this.
(sve_switcher::~sve_switcher): Rename to...
(sve_target_switcher::~sve_target_switcher): ...this.
(init_builtins): Rename switcher.
(handle_arm_sve_h): Ditto.
(handle_arm_neon_sve_bridge_h): Ditto.
(handle_arm_sme_h): Ditto.
* config/aarch64/aarch64-sve-builtins.h
(class sve_switcher): Rename to...
(class sve_target_switcher): ...this.
(class sme_switcher): Rename to...
(class sme_target_switcher): ...this.


diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 128cc365d3d585e01cb69668f285318ee56a36fc..c1cb6cdcc81c6b45c0132250589bba0be42f195d 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -1877,23 +1877,25 @@ aarch64_scalar_builtin_type_p (aarch64_simd_type t)
   return (t == Poly8_t || t == Poly16_t || t == Poly64_t || t == Poly128_t);
 }
 
-/* Enable AARCH64_FL_* flags EXTRA_FLAGS on top of the base Advanced SIMD
-   set.  */
-aarch64_simd_switcher::aarch64_simd_switcher (aarch64_feature_flags extra_flags)
+/* Temporarily set FLAGS as the enabled target features.  */
+aarch64_target_switcher::aarch64_target_switcher (aarch64_feature_flags flags)
   : m_old_asm_isa_flags (aarch64_asm_isa_flags),
-m_old_general_regs_only (TARGET_GENERAL_REGS_ONLY)
+m_old_general_regs_only (TARGET_GENERAL_REGS_ONLY),
+m_old_target_pragma (current_target_pragma)
 {
   /* Changing the ISA flags should be enough here.  We shouldn't need to
  pay the compile-time cost of a full target switch.  */
   global_options.x_target_flags &= ~MASK_GENERAL_REGS_ONLY;
-  aarch64_set_asm_isa_flags (AARCH64_FL_FP | AARCH64_FL_SIMD | extra_flags);
+  aarch64_set_asm_isa_flags (flags);
+  current_target_pragma = NULL_TREE;
 }
 
-aarch64_simd_switcher::~aarch64_simd_switcher ()
+aarch64_target_switcher::~aarch64_target_switcher ()
 {
   if (m_old_general_regs_only)
 global_options.x_target_flags |= MASK_GENERAL_REGS_ONLY;
   aarch64_set_asm_isa_flags (m_old_asm_isa_flags);
+  current_target_pragma = m_old_target_pragma;
 }
 
 /* Implement #pragma GCC aarch64 "arm_neon.h".
@@ -1903,7 +1905,7 @@ aarch64_simd_switcher::~aarch64_simd_switcher ()
 void
 handle_arm_neon_h (void)
 {
-  aarch64_simd_switcher simd;
+  aarch64_target_switcher switcher (AARCH64_FL_FP | AARCH64_FL_SIMD);
 
   /* Register the AdvSIMD vector tuple types.  */
   for (unsigned int i = 0; i < ARM_NEON_H_TYPES_LAST; i++)
@@ -2353,6 +2355,8 @@ aarch64_init_data_intrinsics (void)
 void
 handle_arm_acle_h (void)
 {
+  aarch64_target_switcher switcher;
+
   aarch64_init_ls64_builtins ();
   aarch64_init_tme_builtins ();
   aarch64_init_memtag_builtins ();
@@ -2446,7 +2450,7 @@ aarch64_general_init_builtins (void)
   aarch64_init_bf16_types ();

[PATCH] c++: Use capture from outer lambda, if any, instead of erroring out [PR110584]

2025-02-18 Thread Simon Martin

We've been rejecting this valid code since r8-4571:

=== cut here ===
void foo (float);
int main () {
  constexpr float x = 0;
  (void) [&] () {
    foo (x);
    (void) [] () {
      foo (x);
    };
  };
}
=== cut here ===

The problem is that when processing X in the inner lambda,
process_outer_var_ref errors out even though it does find the capture
from the enclosing lambda.

This patch changes process_outer_var_ref to accept and return the outer
proxy if it finds any.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/110584

gcc/cp/ChangeLog:

* semantics.cc (process_outer_var_ref): Use capture from
enclosing lambda, if any.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-nested10.C: New test.

---
 gcc/cp/semantics.cc   |  4 ++
 .../g++.dg/cpp0x/lambda/lambda-nested10.C | 46 +++
 2 files changed, 50 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested10.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 7c7d3e3c432..7bbc82f7dc1 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -4598,6 +4598,10 @@ process_outer_var_ref (tree decl, tsubst_flags_t 
complain, bool odr_use)
   if (!odr_use && context == containing_function)
 decl = add_default_capture (lambda_stack,
/*id=*/DECL_NAME (decl), initializer);
+  /* When doing lambda capture, if we found a capture in an enclosing lambda,
+ we can use it.  */
+  else if (!odr_use && is_capture_proxy (decl))
+return decl;
   /* Only an odr-use of an outer automatic variable causes an
  error, and a constant variable can decay to a prvalue
  constant without odr-use.  So don't complain yet.  */
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested10.C 
b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested10.C
new file mode 100644
index 000..2dd9dd4955e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested10.C
@@ -0,0 +1,46 @@
+// PR c++/110584
+// { dg-do "run" { target c++11 } }
+
+void foo (int i) {
+  if (i != 0)
+__builtin_abort ();
+}
+
+int main () {
+  const int x = 0;
+
+  // We would error out on this.
+  (void) [&] () {
+foo (x);
+(void)[] () {
+  foo (x);
+};
+  } ();
+  // As well as those.
+  (void) [&] () {
+(void) [] () {
+  foo (x);
+};
+  } ();
+  (void) [&x] () {
+(void) [] () {
+  foo (x);
+};
+  } ();
+  // But those would work already.
+  (void) [] () {
+(void) [&] () {
+  foo (x);
+};
+  } ();
+  (void) [&] () {
+(void) [&] () {
+  foo (x);
+};
+  } ();
+  (void) [=] () {
+(void) [] () {
+  foo (x);
+};
+  } ();
+}
-- 
2.44.0

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-18 Thread Jan Hubicka

Hello,
I looked into updating the hook
> -/* Implement TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE.  */
> +/* Implement TARGET_CALLEE_SAVE_COST.  */
>  
>  static int
> -ix86_ira_callee_saved_register_cost_scale (int)
> +ix86_callee_save_cost (spill_cost_type, unsigned int, machine_mode,
> +unsigned int, int mem_cost, const HARD_REG_SET &, bool)
>  {
> -  return 1;
> +  /* Account for the fact that push and pop are shorter and do their
> + own allocation and deallocation.  */
> +  return mem_cost - 2;
>  }

I think this is fine for usual performance metrics of push/pop.  For
size we now end up with cost of 0, which is likely not right, so I added
a special case and return 1.  Size costs do not quite correspond to
mov-mov sizes, so I will try to fix it and see if that results in better
code size.

I also added a test that the regno in question is an integer register.  While
we do not callee-save XMM for the default ABI, the Microsoft version does.
I am not sure how the push2 and pushp extensions come into play, but we can
handle that once we have hardware to test.
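
Roughly, the adjusted costing described above could look like the following
standalone sketch.  This is an assumption about the shape of the logic, not
the actual i386 hook: the function name, its parameters, and the choice to
return mem_cost unchanged for non-integer registers are all hypothetical.

```cpp
// Hypothetical model of the callee-save cost described above.
//   is_general_reg    - true for integer registers saved with push/pop
//   mem_cost          - cost of an ordinary spill store or reload
//   optimize_for_size - true when size costs are in use
int
callee_save_cost_sketch (bool is_general_reg, int mem_cost,
                         bool optimize_for_size)
{
  // Assumption: only integer registers get the push/pop discount;
  // anything else is costed like a normal memory access.
  if (!is_general_reg)
    return mem_cost;

  // Special case: never let the cost drop to 0.  With size costs,
  // mem_cost - 2 would otherwise end up as 0.
  if (optimize_for_size || mem_cost <= 2)
    return 1;

  // Push and pop are shorter and do their own allocation/deallocation.
  return mem_cost - 2;
}
```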

Concerning x86 specifics, there is a cost for allocating the stack frame.  So
if the function has nothing on the stack frame, push/pop becomes a slightly
better candidate than a spill.  The hook you added does not seem to be able to
test this, since it does not have the frame size as a parameter.  I wonder
if there is an easy way to get it in?

Also, for old CPUs with no stack prediction engine we split either one or
two push instructions into an adjustment+move pair.  I do not see how to
take that into account, since the cost of 1 or 2 registers then differs from
3 or more, but I also think we do not need to care about this, since all
reasonably current CPUs have stack prediction.

I am benchmarking the updated patch and will send it once it is done.

Honza

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-18 Thread Jan Hubicka

> Jan Hubicka  writes:
> > Concerning x86 specifics, there is a cost for allocating the stack frame.  So
> > if the function has nothing on the stack frame, push/pop becomes a slightly
> > better candidate than a spill.  The hook you added does not seem to be able to
> > test this, since it does not have the frame size as a parameter.  I wonder
> > if there is an easy way to get it in?
> 
> The main frame size is available globally as get_frame_size ().
> There's also the question of whether a frame needs to be created
> for other reasons, such as an alloca call, but I suppose setting
> up a frame for just alloca would also use push on x86?

Usually the frame is first created by push/pop instructions (which are
callee saves and possibly the frame pointer) and the remaining capacity is
allocated using add/sub of the ESP pointer. If these can be avoided we save
about 8 bytes of code. Performance-wise, the stack engine will likely
completely hide the overhead of the extra add/sub.

We need add/sub for caller saves, spilling and on-stack variables.
We may be able to hide it in the red zone, but only for leaf functions.
I think get_frame_size only tells me about the on-stack variables at the
time ira-color is performed.

This is something that would be nice to model better, but also is likely
not critical.  So I only mentioned it in case you or Vladimir can come
up with a nice way to fit this in.
> 
> > Also, for old CPUs with no stack prediction engine we split either one or
> > two push instructions into an adjustment+move pair.  I do not see how to
> > take that into account, since the cost of 1 or 2 registers then differs from
> > 3 or more, but I also think we do not need to care about this, since all
> > reasonably current CPUs have stack prediction.
> 
> Yeah.  The hook does allow you to test how many registers have been pushed,
> and how many will be pushed after the change that is being costed.
> But giving a higher cost for the first two registers would probably
> tend to penalise using callee-saved registers for the first few allocnos
> that we colour, which are also likely to be the most important allocnos.
> Trying to cost the difference might therefore be counter-productive.

Actually my memory got this backwards. While I experimented with avoiding
only some push/pop instructions on CPUs without a stack engine (those were
produced before 2003), that is not in mainline.  All we do is the opposite
conversion: sometimes we turn a sub/add of ESP into a shorter but more
expensive push or pop.  This may be accounted for in the frame allocation
cost, but again, it only concerns very old CPUs.

Honza
> 
> > I am benchmarking the updated patch and will send it once it is done.
> 
> Thanks!
> 
> Richard

[RFC] RISC-V: The optimization ignored the side effects of the rounding mode, resulting in incorrect results.

2025-02-18 Thread Jin Ma

We overlooked the side effects of the rounding mode in the pattern,
which can impact the result of float_extend and lead to incorrect
optimizations in the final program. This issue likely affects nearly
all similar patterns that involve rounding modes, and the tests in
this patch only highlight one example. It seems challenging to address,
and I only implemented a simple fix, which is not a good way to solve
the problem.

Any comments on this?
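
For context on the expected "-0.00" output in the test: IEEE 754 arithmetic
gives 0 - 0 = -0 when rounding toward negative infinity, and +0 in the other
rounding modes.  The host-side sketch below illustrates that with <cfenv>
rather than RVV intrinsics; the function name is hypothetical, and it assumes
an IEEE 754 target whose C++ library honours std::fesetround.

```cpp
#include <cfenv>
#include <cmath>

// Returns true if 0.0f - 0.0f, evaluated at run time while the rounding
// mode is round-toward-negative, produces a zero with the sign bit set.
bool
zero_difference_is_negative_when_rounding_down ()
{
  // volatile keeps the compiler from folding the subtraction at
  // compile time, where the dynamic rounding mode would be ignored.
  volatile float lhs = 0.0f;
  volatile float rhs = 0.0f;

  const int old_mode = std::fegetround ();
  std::fesetround (FE_DOWNWARD);
  volatile float diff = lhs - rhs;
  std::fesetround (old_mode);  // restore the caller's rounding mode

  const float result = diff;
  return std::signbit (result);
}
```

An optimization that treats the operation as independent of the rounding mode
would produce +0 here, which is the kind of wrong result the test is meant to
catch.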

gcc/ChangeLog:

* config/riscv/vector-iterators.md (UNSPEC_VRM): New.
* config/riscv/vector.md: Use UNSPEC for float_extend.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/bug-11.c: New test.

Reported-by: CunJian Huang 
Signed-off-by: Jin Ma 
---
 gcc/config/riscv/vector-iterators.md  |  3 +++
 gcc/config/riscv/vector.md|  6 +++--
 .../gcc.target/riscv/rvv/base/bug-11.c| 24 +++
 3 files changed, 31 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c

diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index c1bd7397441..bd592f736e2 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -120,6 +120,9 @@ (define_c_enum "unspec" [
 
   UNSPEC_SF_VFNRCLIP
   UNSPEC_SF_VFNRCLIPU
+
+  ;; Side effects of rounding mode
+  UNSPEC_VRM
 ])
 
 (define_c_enum "unspecv" [
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 8ee43cf0ce1..e971dcdc973 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7135,8 +7135,10 @@ (define_insn 
"@pred_single_widen__scalar"
  (plus_minus:VWEXTF
(match_operand:VWEXTF 3 "register_operand"" vr, vr, vr, 
vr")
(float_extend:VWEXTF
- (vec_duplicate:
-   (match_operand: 4 "register_operand"  "  f,  f,  f, 
 f"
+ (unspec:VWEXTF
+   [(vec_duplicate:
+ (match_operand: 4 "register_operand"  "  f,  f,  
f,  f"))
+ (reg:SI FRM_REGNUM)] UNSPEC_VRM)))
  (match_operand:VWEXTF 2 "vector_merge_operand"  " vu,  0, vu, 
 0")))]
   "TARGET_VECTOR"
   "vfw.wf\t%0,%3,%4%p1"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c
new file mode 100644
index 000..52d940cb57a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c
@@ -0,0 +1,24 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O2" } */
+
+#include 
+
+int main ()
+{
+  float data_store = 0;
+  int8_t mask = 1;
+  size_t vl = 1;
+  float data_load = 0.0;
+  _Float16 data_sub = 0.0;
+  vint8mf8_t mask_value = __riscv_vle8_v_i8mf8 (&mask, vl);
+  vbool64_t vmask = __riscv_vmseq_vx_i8mf8_b64 (mask_value, 1, vl);
+  vfloat32mf2_t vd_load = __riscv_vfmv_v_f_f32mf2 (0, __riscv_vsetvlmax_e32mf2 
());
+  vfloat32mf2_t vreg_memory = __riscv_vle32_v_f32mf2_tu (vd_load, &data_load, 
vl);
+  vfloat32mf2_t vreg = __riscv_vfwsub_wf_f32mf2_rm_tum (vmask, vreg_memory, 
vreg_memory, data_sub, __RISCV_FRM_RDN, vl);
+  __riscv_vse32_v_f32mf2 (&data_store, vreg, vl);
+
+  __builtin_printf ("%f\n", data_store);
+  return 0;
+}
+
+/* { dg-output "-0.00\\s+\n" } */
-- 
2.25.1

Re: [PATCH] builtins: Ensure sin and cos properly set errno when INFINITY is passed [PR80042]

2025-02-18 Thread Peter0x44

On 2025-02-18 13:30, Richard Biener wrote:
On Tue, Feb 18, 2025 at 1:54 PM Peter0x44 wrote:

18 Feb 2025 8:51:16 am Richard Biener :

> On Tue, Feb 18, 2025 at 1:21 AM Sam James  wrote:
>>
>> Peter Damianov  writes:
>>
>>> POSIX says that sin and cos should set errno to EDOM when infinity is
>>> passed to
>>> them. Make sure this is accounted for in builtins.def, and add tests.
>>>
>>> gcc/
>>>   PR middle-end/80042
>>>   * builtins.def: (sin|cos)(f|l) can set errno.
>>> gcc/testsuite/
>>>   * gcc.dg/pr80042.c: New testcase.
>>> ---
>>> gcc/builtins.def   | 20 +-
>>> gcc/testsuite/gcc.dg/pr80042.c | 71
>>> ++
>>> 2 files changed, 82 insertions(+), 9 deletions(-)
>>> create mode 100644 gcc/testsuite/gcc.dg/pr80042.c
>>>
>>> [...]
>>> diff --git a/gcc/testsuite/gcc.dg/pr80042.c
>>> b/gcc/testsuite/gcc.dg/pr80042.c
>>> new file mode 100644
>>> index 000..cc578ae67e2
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/pr80042.c
>>> @@ -0,0 +1,71 @@
>>> +/* dg-do run */
>>> +/* dg-options "-O2 -lm" */
>>
>> These two lines are missing {}. Please double check the logs from your
>> testsuite run to make sure newly added/changed tests are executed (and
>> in the way you expect).
>
> This test will also FAIL on *BSD IIRC as that doesn't set errno for any
> math
> functions.

So what do you suggest I do about it? Drop the test, or only enable it
for certain known good targets?
I don't use BSD so cannot test it.

Good question.  It's also that old glibc did not set errno here.

>
> I'll note GCC models sincos as cexpi which does not set errno, and will
> eventually expand that to sincos or cexp.  It does that without any
> restriction on -fno-math-errno.

Is this a problem? Would I need to disable expansion to cexp with
-fmath-errno to make this work?

I think that the code might assume sin()/cos() is always CONST/PURE
and that for "POSIX-y correctness" we'd have to guard the transform
with -fno-math-errno.

Okay. I will look at doing that.

> I'll also note the C standard does not document any domain error on +-
> Inf arguments.
> Instead it documents a range error for sin(x) and nonzero x too close
> to zero.

https://pubs.opengroup.org/onlinepubs/9699919799/functions/sin.html
POSIX does specify it should be a domain error, but C itself doesn't seem
to say anything regarding it other than basically "implementations are
allowed to invent errors for this case".

So what's the point of your patch?  That GCC does not assume sin/cos
will not clobber errno?  Maybe the testcase can be rewritten to consider
that?  Like check that we did not fold the != EDOM checks at compile-time
instead of hard-requiring the library to set that error?

Yes, that's the point. I'm not really sure how to check that
specifically instead of executing the code, but I should figure it out.

I think a test written in this way would also avoid the mentioned 
problems of the libraries which don't set errno.

Richard.

>
> Richard.
>
>>
>>> [...]
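
To illustrate the point about not hard-requiring the library to set errno, here is a minimal standalone C++ sketch. It is not the test from the patch, and the names sin_probe and probe_sin_at_infinity are hypothetical. It calls sin on +Inf through a volatile so the argument cannot be constant-folded, then reports the result and the errno value seen afterwards. A caller can accept both outcomes (errno left at 0, or set to EDOM), which is the tolerance suggested above for targets whose libm does not set errno. It does not by itself verify that the compiler kept an errno check; that would need something like a dump scan in the real testsuite.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>
#include <limits>

// Hypothetical result record: the value returned by sin and the errno
// observed immediately after the call.
struct sin_probe
{
  double value;
  int err;
};

// Call sin on +Inf at run time.  The volatile read keeps the compiler
// from folding the argument (and therefore the call) at compile time.
sin_probe
probe_sin_at_infinity ()
{
  volatile double inf = std::numeric_limits<double>::infinity ();
  errno = 0;
  double r = std::sin (inf);
  int e = errno;  // read errno before anything else can change it
  return { r, e };
}
```

A caller would treat `err == 0` and `err == EDOM` as equally acceptable, and only require that the returned value is a NaN.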

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-18 Thread Richard Sandiford

Jan Hubicka  writes:
> Concerning x86 specifics, there is cost for allocating stack frame.  So
> if the function has nothing on stack frame push/pop becomes a bit better
> candidate than a spill.  The hook you added does not seem to be able to
> test this, since it does not have frame size as a parameter.  I wonder
> if there is an easy way to get it in?

The main frame size is available globally as get_frame_size ().
There's also the question of whether a frame needs to be created
for other reasons, such as an alloca call, but I suppose setting
up a frame for just alloca would also use push on x86?

> Also for old CPUs with no stack prediction engine we split either one or
> two push instructions into adjustment+move pair.  I do not see how to
> put that into game, since the cost of 1 or 2 registers then differs from
> 3 or more, but also I think we do not need to care about this, since all
> reasonably current CPUs have stack prediction.

Yeah.  The hook does allow you to test how many registers have been pushed,
and how many will be pushed after the change that is being costed.
But giving a higher cost for the first two registers would probably
tend to penalise using callee-saved registers for the first few allocnos
that we colour, which are also likely to be the most important allocnos.
Trying to cost the difference might therefore be counter-productive.

> I am benchmarking updated patch and will send once it is done.

Thanks!

Richard

Re: [PATCH v2 03/16] Add string_slice class.

2025-02-18 Thread Richard Sandiford

Alfie Richards  writes:
> The string_slice inherits from array_slice and is used to refer to a
> substring of an array that is memory managed elsewhere without modifying
> the underlying array.
>
> For example, this is useful in cases such as when needing to refer to a
> substring of an attribute in the syntax tree.
>
> This commit also adds some minimal helper functions for string_slice,
> such as a strtok alternative, equality operators, strcmp, and a function
> to strip whitespace from the beginning and end of a string_slice.
>
> gcc/ChangeLog:
>
>   * vec.cc (string_slice::strtok): New method.
>   (strcmp): Add implementation for string_slice.
>   (string_slice::strip): New method.
>   (test_string_slice_initializers): New test.
>   (test_string_slice_strtok): Ditto.
>   (test_string_slice_strcmp): Ditto.
>   (test_string_slice_equality): Ditto.
>   (test_string_slice_invalid): Ditto.
>   (test_string_slice_strip): Ditto.
>   (vec_cc_tests): Add new tests.
>   * vec.h (class string_slice): New class.
>   (strcmp): Add implementation for string_slice.

Thanks, mostly LGTM.  Some very minor things below, and a question:

> diff --git a/gcc/vec.cc b/gcc/vec.cc
> index 55f5f3dd447..189cb492c7e 100644
> --- a/gcc/vec.cc
> +++ b/gcc/vec.cc
> @@ -176,6 +176,61 @@ dump_vec_loc_statistics (void)
>vec_mem_desc.dump (VEC_ORIGIN);
>  }
>  
> +string_slice
> +string_slice::tokenize (string_slice *str, string_slice delims)
> +{
> +  const char *ptr = str->begin ();
> +
> +  gcc_assert (str->is_valid () && delims.is_valid ());
> +
> +  for (; ptr < str->end (); ptr++)
> +for (char c : delims)
> +  if (*ptr == c)
> + {
> +   /* Update the input string to be the remaining string.  */
> +   const char* str_begin = str->begin ();

Formatting nit: const char *str_begin

> +   *str = string_slice (ptr  + 1, str->end ());
> +   return string_slice (str_begin, ptr);
> + }
> +
> +  /* If no delimiters between the start and end, return the whole string.  */
> +  string_slice res = *str;
> +  *str = string_slice::invalid ();
> +  return res;
> +}
> +
> +int
> +strcmp (string_slice str1, string_slice str2)
> +{
> +  for (unsigned int i = 0; i < str1.size () && i < str2.size (); i++)
> +{
> +  if (str1[i] < str2[i])
> + return -1;
> +  if (str1[i] > str2[i])
> + return 1;
> +}
> +
> +  if (str1.size () < str2.size ())
> +return -1;
> +  if (str1.size () > str2.size ())
> +return 1;
> +  return 0;
> +}
> +
> +string_slice
> +string_slice::strip ()
> +{
> +  const char *start = this->begin ();
> +  const char *end = this->end ();
> +
> +  while (start < end && ISSPACE (*start))
> +start++;
> +  while (end > start && ISSPACE (*(end-1)))
> +end--;
> +
> +  return string_slice (start, end-start);

Just string_slice (start, end) should be enough.

> +}
> +
>  #if CHECKING_P
>  /* Report qsort comparator CMP consistency check failure with P1, P2, P3 as
> witness elements.  */
> [...]
> diff --git a/gcc/vec.h b/gcc/vec.h
> index 915df06f03e..d709d339d40 100644
> --- a/gcc/vec.h
> +++ b/gcc/vec.h
> @@ -2484,4 +2484,69 @@ make_array_slice (T *base, unsigned int size)
>  # pragma GCC poison m_vec m_vecpfx m_vecdata
>  #endif
>  
> +/* string_slice inherits from array_slice, specifically to refer to a
> +   substring of a character array.
> +   It includes some string like helpers.  */
> +class string_slice : public array_slice
> +{
> +public:
> +  explicit string_slice () : array_slice () {}
> +  explicit string_slice (const char *str) : array_slice (str, strlen (str)) {}
> +  explicit string_slice (const char *str, size_t len) :
> +array_slice (str, len) {}
> +  explicit string_slice (const char *start, const char *end) :
> +array_slice (start, end-start) {}

Formatting nit: end - start.

What was the reason for making the constructors explicit?  It would be nice
if string literals at least could be used implicitly.

Thanks,
Richard
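
As a rough illustration of the helpers under review, the following sketch mimics the tokenize and strip behaviour using std::string_view instead of the patch's string_slice. The free functions tokenize and strip here are hypothetical stand-ins, not the patch's code. One difference is labelled in the comments: where the patch sets the remaining input to an "invalid" slice, this sketch uses an empty view. Because std::string_view converts implicitly from a string literal, the calls also show the kind of usage the question about explicit constructors is getting at.

```cpp
#include <cassert>
#include <cctype>
#include <string_view>

// Return the text before the first delimiter and advance *STR past it.
// If no delimiter is found, return the whole input and leave *STR empty
// (the patch uses an "invalid" slice at this point instead).
std::string_view
tokenize (std::string_view *str, std::string_view delims)
{
  std::size_t pos = str->find_first_of (delims);
  if (pos == std::string_view::npos)
    {
      std::string_view res = *str;
      *str = std::string_view ();
      return res;
    }
  std::string_view res = str->substr (0, pos);
  str->remove_prefix (pos + 1);
  return res;
}

// Return S with leading and trailing whitespace removed.  The underlying
// characters are not modified; only the view's bounds change.
std::string_view
strip (std::string_view s)
{
  while (!s.empty () && std::isspace ((unsigned char) s.front ()))
    s.remove_prefix (1);
  while (!s.empty () && std::isspace ((unsigned char) s.back ()))
    s.remove_suffix (1);
  return s;
}
```

Note that `tokenize (&s, ",")` works with a plain literal for the delimiter set only because the view type allows implicit construction.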

> +
> +  friend bool operator== (const string_slice &lhs, const string_slice &rhs)
> +  {
> +if (!lhs.is_valid () || !rhs.is_valid ())
> +  return false;
> +if (lhs.size () != rhs.size ())
> +  return false;
> +return memcmp (lhs.begin (), rhs.begin (), lhs.size ()) == 0;
> +  }
> +
> +  friend bool operator== (const char *lhs, const string_slice &rhs)
> +  {
> +return string_slice (lhs) == rhs;
> +  }
> +
> +  friend bool operator== (const string_slice &lhs, const char *rhs)
> +  {
> +return lhs == string_slice (rhs);
> +  }
> +
> +  friend bool operator!= (const string_slice &lhs, const string_slice &rhs)
> +  {
> +return !(lhs == rhs);
> +  }
> +
> +  friend bool operator!= (const char *lhs, const string_slice &rhs)
> +  {
> +return !(string_slice (lhs) == rhs);
> +  }
> +
> +  friend bool operator!= (const string_slice &lhs, const char *rhs)
> +  {
> +return !(lhs == string_slice (rhs));
> +  }
> +
> +  /* Returns an inval

[PATCH][stage1] middle-end/60779 - LTO vs. -fcx-fortran-rules and -fcx-limited-range

2025-02-18 Thread Richard Biener

The following changes how flag_complex_method is managed towards
being able to record that in the optimization set so we can stream
and restore it per function.  Currently -fcx-fortran-rules and
-fcx-limited-range are separate recorded options but saving/restoring
does not restore flag_complex_method which is later used in the
middle-end.

The solution is to make -fcx-fortran-rules and -fcx-limited-range
aliases of a new -fcx-method= switch that represents flag_complex_method
directly so we can save and restore it.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.  How do
we go about documenting Aliased flags?  I'm hoping for test coverage
of language-specific defaults.

We allowed inlining of -fcx-limited-range into -fno-cx-limited-range
(but failed to check -fcx-fortran-rules).  Such inlining would
pessimize complex multiplication/division, but I've preserved this
behavior and properly based it on flag_complex_method.

OK for stage1?

Thanks,
Richard.
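
For readers unfamiliar with why the complex method matters, here is a small standalone C++ sketch contrasting a straightforward complex division with a range-reduced one. It is only an illustration under stated assumptions: div_naive and div_smith are hypothetical names, and Smith's algorithm is used as a stand-in for "range reduction"; it is not claimed to match the code GCC expands for any of the -fcx-* settings.

```cpp
#include <cassert>
#include <cmath>

struct cplx
{
  double re, im;
};

// Straightforward textbook formula.  The denominator c*c + d*d can
// overflow even when the true quotient is small.
cplx
div_naive (cplx a, cplx b)
{
  double d = b.re * b.re + b.im * b.im;
  return { (a.re * b.re + a.im * b.im) / d,
	   (a.im * b.re - a.re * b.im) / d };
}

// Smith-style division: scale by the ratio of the smaller to the larger
// component of the divisor so that intermediate values stay in range.
cplx
div_smith (cplx a, cplx b)
{
  if (std::fabs (b.re) >= std::fabs (b.im))
    {
      double r = b.im / b.re;
      double d = b.re + b.im * r;
      return { (a.re + a.im * r) / d, (a.im - a.re * r) / d };
    }
  double r = b.re / b.im;
  double d = b.re * r + b.im;
  return { (a.re * r + a.im) / d, (a.im * r - a.re) / d };
}
```

With both operands equal to 1e200 + 1e200i, the naive version overflows in the denominator on targets that evaluate in IEEE double precision, while the range-reduced version returns 1 + 0i.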

PR middle-end/60779
* common.opt (fcx-method=): New, map to flag_complex_method.
(Enum complex_method): New.
(fcx-limited-range): Alias to -fcx-method=limited-range.
(fcx-fortran-rules): Alias to -fcx-method=fortran.
* ipa-inline-transform.cc (inline_call): Check flag_complex_method.
* ipa-inline.cc (can_inline_edge_by_limits_p): Likewise.
* opts.cc (finish_options): Adjust.
(set_fast_math_flags): Likewise.
* doc/invoke.texi (fcx-method=): Document.

* gcc.dg/lto/pr60779_0.c: New testcase.
* gcc.dg/lto/pr60779_1.c: Likewise.
---
 gcc/common.opt   | 28 
 gcc/doc/invoke.texi  | 14 ++
 gcc/ipa-inline-transform.cc  |  8 
 gcc/ipa-inline.cc|  2 +-
 gcc/opts.cc  | 16 
 gcc/testsuite/gcc.dg/lto/pr60779_0.c | 21 +
 gcc/testsuite/gcc.dg/lto/pr60779_1.c |  6 ++
 7 files changed, 66 insertions(+), 29 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr60779_0.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr60779_1.c

diff --git a/gcc/common.opt b/gcc/common.opt
index 4c2560a0632..b5c1d41abe9 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -53,12 +53,6 @@ bool in_lto_p = false
 Variable
 enum incremental_link flag_incremental_link = INCREMENTAL_LINK_NONE
 
-; 0 means straightforward implementation of complex divide acceptable.
-; 1 means wide ranges of inputs must work for complex divide.
-; 2 means C99-like requirements for complex multiply and divide.
-Variable
-int flag_complex_method = 1
-
 Variable
 int flag_default_complex_method = 1
 
@@ -1292,12 +1286,30 @@ fcse-skip-blocks
 Common Ignore
 Does nothing.  Preserved for backward compatibility.
 
+fcx-method=
+Common Joined RejectNegative Enum(complex_method) Var(flag_complex_method) Optimization SetByCombined
+
+Enum
+Name(complex_method) Type(int)
+
+; straightforward implementation of complex divide acceptable.
+EnumValue
+Enum(complex_method) String(limited-range) Value(0)
+
+; wide ranges of inputs must work for complex divide.
+EnumValue
+Enum(complex_method) String(fortran) Value(1)
+
+; C99-like requirements for complex multiply and divide.
+EnumValue
+Enum(complex_method) String(stdc) Value(2)
+
 fcx-limited-range
-Common Var(flag_cx_limited_range) Optimization SetByCombined
+Common Alias(fcx-method=,limited-range,stdc)
 Omit range reduction step when performing complex division.
 
 fcx-fortran-rules
-Common Var(flag_cx_fortran_rules) Optimization
+Common Alias(fcx-method=,fortran,stdc)
 Complex multiplication and division follow Fortran rules.
 
 fdata-sections
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d9b0278228f..8779488027b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -574,7 +574,7 @@ Objective-C and Objective-C++ Dialects}.
 -ffold-mem-offsets
 -fcompare-elim  -fcprop-registers  -fcrossjumping
 -fcse-follow-jumps  -fcse-skip-blocks  -fcx-fortran-rules
--fcx-limited-range
+-fcx-limited-range -fcx-method
 -fdata-sections  -fdce  -fdelayed-branch
 -fdelete-null-pointer-checks  -fdevirtualize  -fdevirtualize-speculatively
 -fdevirtualize-at-ltrans  -fdse
@@ -15482,8 +15482,7 @@ When enabled, this option states that a range reduction step is not
 needed when performing complex division.  Also, there is no checking
 whether the result of a complex multiplication or division is @code{NaN
 + I*NaN}, with an attempt to rescue the situation in that case.  The
-default is @option{-fno-cx-limited-range}, but is enabled by
-@option{-ffast-math}.
+option is enabled by @option{-ffast-math}.
 
 This option controls the default setting of the ISO C99
 @code{CX_LIMITED_RANGE} pragma.  Nevertheless, the option applies to
@@ -15496,7 +15495,14 @@ reduction is done as part of complex division, but there is no checking
 whether the result of a complex multiplication or division is @code{NaN
 +

Re: [PATCH 1/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-02-18 Thread Spencer Abson

On Tue, Feb 18, 2025 at 10:27:46AM +, Richard Sandiford wrote:
> Thanks, this generally looks really good.  Some comments on top of
> Kyrill's, and Christophe's comment internally about -save-temps.
> 
> Spencer Abson  writes:
> > +/* Build and return a new VECTOR_CST that is the concatenation of
> > +   VEC_IN with itself.  */
> > +static tree
> > +aarch64_self_concat_vec_cst (tree vec_in)
> > +{
> > +  gcc_assert ((TREE_CODE (vec_in) == VECTOR_CST));
> > +  unsigned HOST_WIDE_INT nelts
> > += VECTOR_CST_NELTS (vec_in).to_constant ();
> > +
> > +  tree out_type = build_vector_type (TREE_TYPE (TREE_TYPE (vec_in)),
> > +nelts * 2);
> 
> It would be good to pass in the type that the caller wants.
> More about that below.

Yeah, I can see the advantage of that.

> 
> > +
> > +  /* Avoid decoding/encoding if the encoding won't change.  */
> > +  if (VECTOR_CST_DUPLICATE_P (vec_in))
> > +{
> > +  tree vec_out = make_vector (exact_log2
> > +(VECTOR_CST_NPATTERNS (vec_in)), 1);
> > +  unsigned int encoded_size
> > +   = vector_cst_encoded_nelts (vec_in) * sizeof (tree);
> > +
> > +  memcpy (VECTOR_CST_ENCODED_ELTS (vec_out),
> > + VECTOR_CST_ENCODED_ELTS (vec_in), encoded_size);
> > +
> > +  TREE_TYPE (vec_out) = out_type;
> > +  return vec_out;
> > +}
> 
> I'm not sure this is worth it.  The approach below shouldn't be that
> much less efficient, since all the temporaries are generally on the
> stack.  Also:
> 
> > +
> > +  tree_vector_builder vec_out (out_type, nelts, 1);
> 
> This call rightly describes a duplicated sequence of NELTS elements so...
> 
> > +  for (unsigned i = 0; i < nelts * 2; i++)
> > +vec_out.quick_push (VECTOR_CST_ELT (vec_in, i % nelts));
> 
> ...it should only be necessary to push nelts elements here.

Good point!

> 
> > +
> > +  return vec_out.build ();
> > +}
> > +
> > +/* If the SSA_NAME_DEF_STMT of ARG is an assignment to a
> > +   BIT_FIELD_REF with SIZE and OFFSET, return the object of the
> > +   BIT_FIELD_REF.  Otherwise, return NULL_TREE.  */
> > +static tree
> > +aarch64_object_of_bfr (tree arg, unsigned HOST_WIDE_INT size,
> > +  unsigned HOST_WIDE_INT offset)
> > +{
> > +  if (TREE_CODE (arg) != SSA_NAME)
> > +return NULL_TREE;
> > +
> > +  gassign *stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (arg));
> > +
> > +  if (!stmt)
> > +return NULL_TREE;
> > +
> > +  if (gimple_assign_rhs_code (stmt) != BIT_FIELD_REF)
> > +return NULL_TREE;
> > +
> > +  tree bf_ref = gimple_assign_rhs1 (stmt);
> > +
> > +  if (bit_field_size (bf_ref).to_constant () != size
> > +  || bit_field_offset (bf_ref).to_constant () != offset)
> > +return NULL_TREE;
> > +
> > +  return TREE_OPERAND (bf_ref, 0);
> 
> I think this also needs to check that operand 0 of the BIT_FIELD_REF
> is a 128-bit vector.  A 64-bit reference at offset 64 could instead
> be into something else, such as a 256-bit vector.
> 
> An example is:
> 
> --
> #include <arm_neon.h>
> 
> typedef int16_t int16x16_t __attribute__((vector_size(32)));
> 
> int32x4_t
> f (int16x16_t foo)
> {
>   return vmovl_s16 ((int16x4_t) { foo[4], foo[5], foo[6], foo[7] });
> }
> --
> 
> which triggers an ICE.
> 
> Even if the argument is a 128-bit vector, it could be a 128-bit
> vector of a different type, such as in:
> 
> --
> #include <arm_neon.h>
> 
> int32x4_t
> f (int32x4_t foo)
> {
>   return vmovl_s16 (vget_high_s16 (vreinterpretq_s16_s32 (foo)));
> }
> --
> 
> I think we should still accept this second case, but emit a VIEW_CONVERT_EXPR
> before the call to convert the argument to the right type.
> 

Thanks for raising these, serious tunnel vision on my part...

> > +}
> > +
> > +/*  Prefer to use the highpart builtin when:
> > +
> > +1) All lowpart arguments are references to the highparts of other
> > +vectors.
> > +
> > +2) For calls with two lowpart arguments, if either refers to a
> > +vector highpart and the other is a VECTOR_CST.  We can copy the
> > +VECTOR_CST to 128b in this case.  */
> > +static bool
> > +aarch64_fold_lo_call_to_hi (tree arg_0, tree arg_1, tree *out_0,
> > +   tree *out_1)
> > +{
> > +  /* Punt until as late as possible:
> > +
> > + 1) By folding away BIT_FIELD_REFs we remove information about the
> > + operands that may be useful to other optimizers.
> > +
> > + 2) For simplicity, we'd like the expression
> > +
> > +   x = BIT_FIELD_REF
> > +
> > + to imply that A is not a VECTOR_CST.  This assumption is unlikely
> > + to hold before constant propagation/folding.  */
> > +  if (!(cfun->curr_properties & PROP_last_full_fold))
> > +return false;
> > +
> > +  unsigned int offset = B

Re: [PATCH] arm: Remove inner 'fix:HF/SF/DF' from fixed-point patterns (PR 117712)

2025-02-18 Thread Richard Earnshaw (lists)

On 18/02/2025 08:37, Christophe Lyon wrote:
> As discussed in the PR, removing the inner 'fix:HF/SF/DF' fixes the
> problem, like other targets do.
> 

The double-'fix' idiom was introduced in 
https://gcc.gnu.org/pipermail/gcc-patches/2003-March/098380.html to address 
target/5985.  Certainly at the time it seems that FIX had two meanings 
depending on the mode.  If the target was a floating point mode it did a 
truncation operation with rounding.  If it was an integer mode it did truncation 
with unspecified rounding.  But the manual doesn't seem to mention 
FIX: (at least not now), so I'm wondering if something has been lost 
somewhere along the line.

Anyway, I'm not sure this is right yet.

R.

> gcc/ChangeLog:
> 
>   PR rtl-optimization/117712
>   * config/arm/arm.md (fix_trunchfsi2): Remove inner fix:HF.
>   (fix_trunchfdi2): Likewise.
>   (fix_truncsfsi2): Remove inner fix:SF.
>   (fix_truncdfsi2): Remove inner fix:DF.
>   * config/arm/vfp.md (truncsisf2_vfp): Remove inner fix:SF.
>   (truncsidf2_vfp): Remove inner fix:DF.
>   (fixuns_truncsfsi2): Remove inner fix:SF.
>   (fixuns_truncdfsi2): Remove inner fix:DF.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR rtl-optimization/117712
>   * gcc.target/arm/pr117712-df.c: New test.
>   * gcc.target/arm/pr117712-hf-di.c: New test.
>   * gcc.target/arm/pr117712-hf.c: New test.
>   * gcc.target/arm/pr117712-sf.c: New test.
> ---
>  gcc/config/arm/arm.md |  8 
>  gcc/config/arm/vfp.md |  8 
>  gcc/testsuite/gcc.target/arm/pr117712-df.c| 10 ++
>  gcc/testsuite/gcc.target/arm/pr117712-hf-di.c | 10 ++
>  gcc/testsuite/gcc.target/arm/pr117712-hf.c| 10 ++
>  gcc/testsuite/gcc.target/arm/pr117712-sf.c| 10 ++
>  6 files changed, 48 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/pr117712-df.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/pr117712-hf-di.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/pr117712-hf.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/pr117712-sf.c
> 
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 442d86b9329..ed0d0da2e63 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -5477,7 +5477,7 @@ (define_expand "floatsidf2"
>  
>  (define_expand "fix_trunchfsi2"
>[(set (match_operand:SI 0 "general_operand")
> - (fix:SI (fix:HF (match_operand:HF 1 "general_operand"))))]
> + (fix:SI (match_operand:HF 1 "general_operand")))]
>"TARGET_EITHER"
>"
>{
> @@ -5489,7 +5489,7 @@ (define_expand "fix_trunchfsi2"
>  
>  (define_expand "fix_trunchfdi2"
>[(set (match_operand:DI 0 "general_operand")
> - (fix:DI (fix:HF (match_operand:HF 1 "general_operand"))))]
> + (fix:DI (match_operand:HF 1 "general_operand")))]
>"TARGET_EITHER"
>"
>{
> @@ -5501,14 +5501,14 @@ (define_expand "fix_trunchfdi2"
>  
>  (define_expand "fix_truncsfsi2"
>[(set (match_operand:SI 0 "s_register_operand")
> - (fix:SI (fix:SF (match_operand:SF 1 "s_register_operand"))))]
> + (fix:SI (match_operand:SF 1 "s_register_operand")))]
>"TARGET_32BIT && TARGET_HARD_FLOAT"
>"
>  ")
>  
>  (define_expand "fix_truncdfsi2"
>[(set (match_operand:SI 0 "s_register_operand")
> - (fix:SI (fix:DF (match_operand:DF 1 "s_register_operand"))))]
> + (fix:SI (match_operand:DF 1 "s_register_operand")))]
>"TARGET_32BIT && TARGET_HARD_FLOAT && !TARGET_VFP_SINGLE"
>"
>  ")
> diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
> index 379f5f7b3dc..0ef019b1727 100644
> --- a/gcc/config/arm/vfp.md
> +++ b/gcc/config/arm/vfp.md
> @@ -1508,7 +1508,7 @@ (define_insn "truncsfhf2"
>  
>  (define_insn "*truncsisf2_vfp"
>[(set (match_operand:SI  0 "s_register_operand" "=t")
> - (fix:SI (fix:SF (match_operand:SF 1 "s_register_operand" "t"))))]
> + (fix:SI (match_operand:SF 1 "s_register_operand" "t")))]
>"TARGET_32BIT && TARGET_HARD_FLOAT"
>"vcvt%?.s32.f32\\t%0, %1"
>[(set_attr "predicable" "yes")
> @@ -1517,7 +1517,7 @@ (define_insn "*truncsisf2_vfp"
>  
>  (define_insn "*truncsidf2_vfp"
>[(set (match_operand:SI  0 "s_register_operand" "=t")
> - (fix:SI (fix:DF (match_operand:DF 1 "s_register_operand" "w"))))]
> + (fix:SI (match_operand:DF 1 "s_register_operand" "w")))]
>"TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP_DOUBLE"
>"vcvt%?.s32.f64\\t%0, %P1"
>[(set_attr "predicable" "yes")
> @@ -1527,7 +1527,7 @@ (define_insn "*truncsidf2_vfp"
>  
>  (define_insn "fixuns_truncsfsi2"
>[(set (match_operand:SI  0 "s_register_operand" "=t")
> - (unsigned_fix:SI (fix:SF (match_operand:SF 1 "s_register_operand" "t"))))]
> + (unsigned_fix:SI (match_operand:SF 1 "s_register_operand" "t")))]
>"TARGET_32BIT && TARGET_HARD_FLOAT"
>"vcvt%?.u32.f32\\t%0, %1"
>[(set_a

Re: [PATCH] builtins: Ensure sin and cos properly set errno when INFINITY is passed [PR80042]

2025-02-18 Thread Peter0x44


18 Feb 2025 8:51:16 am Richard Biener :


On Tue, Feb 18, 2025 at 1:21 AM Sam James  wrote:


Peter Damianov  writes:

POSIX says that sin and cos should set errno to EDOM when infinity is passed to
them. Make sure this is accounted for in builtins.def, and add tests.

gcc/
  PR middle-end/80042
  * builtins.def: (sin|cos)(f|l) can set errno.
gcc/testsuite/
  * gcc.dg/pr80042.c: New testcase.
---
gcc/builtins.def   | 20 +-
gcc/testsuite/gcc.dg/pr80042.c | 71 ++
2 files changed, 82 insertions(+), 9 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/pr80042.c

[...]
diff --git a/gcc/testsuite/gcc.dg/pr80042.c b/gcc/testsuite/gcc.dg/pr80042.c
new file mode 100644
index 000..cc578ae67e2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr80042.c
@@ -0,0 +1,71 @@
+/* dg-do run */
+/* dg-options "-O2 -lm" */


These two lines are missing {}. Please double check the logs from your
testsuite run to make sure newly added/changed tests are executed (and
in the way you expect).


This test will also FAIL on *BSD IIRC as that doesn't set errno for any
math functions.


So what do you suggest I do about it? Drop the test, or only enable it 
for certain known good targets?

I don't use BSD so cannot test it.



I'll note GCC models sincos as cexpi which does not set errno, and will
eventually expand that to sincos or cexp.  It does that without any
restriction on -fno-math-errno.


Is this a problem? Would I need to disable expansion to cexp with
-fmath-errno to make this work?



I'll also note the C standard does not document any domain error on +-
Inf arguments.
Instead it documents a range error for sin(x) and nonzero x too close 
to zero.


https://pubs.opengroup.org/onlinepubs/9699919799/functions/sin.html
POSIX does specify it should be a domain error, but C itself doesn't seem 
to say anything regarding it other than basically "implementations are 
allowed to invent errors for this case".




Richard.




[...]

Re: [PATCH 1/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-02-18 Thread Richard Sandiford

Thanks, this generally looks really good.  Some comments on top of
Kyrill's, and Christophe's comment internally about -save-temps.

Spencer Abson  writes:
> +/* Build and return a new VECTOR_CST that is the concatenation of
> +   VEC_IN with itself.  */
> +static tree
> +aarch64_self_concat_vec_cst (tree vec_in)
> +{
> +  gcc_assert ((TREE_CODE (vec_in) == VECTOR_CST));
> +  unsigned HOST_WIDE_INT nelts
> += VECTOR_CST_NELTS (vec_in).to_constant ();
> +
> +  tree out_type = build_vector_type (TREE_TYPE (TREE_TYPE (vec_in)),
> +  nelts * 2);

It would be good to pass in the type that the caller wants.
More about that below.

> +
> +  /* Avoid decoding/encoding if the encoding won't change.  */
> +  if (VECTOR_CST_DUPLICATE_P (vec_in))
> +{
> +  tree vec_out = make_vector (exact_log2
> +  (VECTOR_CST_NPATTERNS (vec_in)), 1);
> +  unsigned int encoded_size
> + = vector_cst_encoded_nelts (vec_in) * sizeof (tree);
> +
> +  memcpy (VECTOR_CST_ENCODED_ELTS (vec_out),
> +   VECTOR_CST_ENCODED_ELTS (vec_in), encoded_size);
> +
> +  TREE_TYPE (vec_out) = out_type;
> +  return vec_out;
> +}

I'm not sure this is worth it.  The approach below shouldn't be that
much less efficient, since all the temporaries are generally on the
stack.  Also:

> +
> +  tree_vector_builder vec_out (out_type, nelts, 1);

This call rightly describes a duplicated sequence of NELTS elements so...

> +  for (unsigned i = 0; i < nelts * 2; i++)
> +vec_out.quick_push (VECTOR_CST_ELT (vec_in, i % nelts));

...it should only be necessary to push nelts elements here.
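
The point about only pushing nelts elements can be pictured with a toy model. This is a hypothetical sketch, not GCC's tree_vector_builder or its VECTOR_CST encoding: pattern_vec stores one period of a repeating sequence plus a logical length, and every element is looked up modulo the period. Under that assumption, concatenating a constant with itself needs only the original elements as the stored period.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a constant vector described by a repeating
// pattern: ENCODED holds one period, NELTS is the logical length.
struct pattern_vec
{
  std::vector<int> encoded;
  std::size_t nelts;

  // Element I of the logical vector, found by wrapping into the period.
  int elt (std::size_t i) const { return encoded[i % encoded.size ()]; }
};

// Concatenate IN with itself.  Only the original elements are stored;
// the doubled length is recorded in NELTS and the repetition is implied.
pattern_vec
self_concat (const std::vector<int> &in)
{
  return { in, in.size () * 2 };
}
```

Pushing all 2 * nelts elements would describe the same logical vector but store each element twice.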

> +
> +  return vec_out.build ();
> +}
> +
> +/* If the SSA_NAME_DEF_STMT of ARG is an assignment to a
> +   BIT_FIELD_REF with SIZE and OFFSET, return the object of the
> +   BIT_FIELD_REF.  Otherwise, return NULL_TREE.  */
> +static tree
> +aarch64_object_of_bfr (tree arg, unsigned HOST_WIDE_INT size,
> +unsigned HOST_WIDE_INT offset)
> +{
> +  if (TREE_CODE (arg) != SSA_NAME)
> +return NULL_TREE;
> +
> +  gassign *stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (arg));
> +
> +  if (!stmt)
> +return NULL_TREE;
> +
> +  if (gimple_assign_rhs_code (stmt) != BIT_FIELD_REF)
> +return NULL_TREE;
> +
> +  tree bf_ref = gimple_assign_rhs1 (stmt);
> +
> +  if (bit_field_size (bf_ref).to_constant () != size
> +  || bit_field_offset (bf_ref).to_constant () != offset)
> +return NULL_TREE;
> +
> +  return TREE_OPERAND (bf_ref, 0);

I think this also needs to check that operand 0 of the BIT_FIELD_REF
is a 128-bit vector.  A 64-bit reference at offset 64 could instead
be into something else, such as a 256-bit vector.

An example is:

--
#include <arm_neon.h>

typedef int16_t int16x16_t __attribute__((vector_size(32)));

int32x4_t
f (int16x16_t foo)
{
  return vmovl_s16 ((int16x4_t) { foo[4], foo[5], foo[6], foo[7] });
}
--

which triggers an ICE.

Even if the argument is a 128-bit vector, it could be a 128-bit
vector of a different type, such as in:

--
#include <arm_neon.h>

int32x4_t
f (int32x4_t foo)
{
  return vmovl_s16 (vget_high_s16 (vreinterpretq_s16_s32 (foo)));
}
--

I think we should still accept this second case, but emit a VIEW_CONVERT_EXPR
before the call to convert the argument to the right type.

> +}
> +
> +/*  Prefer to use the highpart builtin when:
> +
> +1) All lowpart arguments are references to the highparts of other
> +vectors.
> +
> +2) For calls with two lowpart arguments, if either refers to a
> +vector highpart and the other is a VECTOR_CST.  We can copy the
> +VECTOR_CST to 128b in this case.  */
> +static bool
> +aarch64_fold_lo_call_to_hi (tree arg_0, tree arg_1, tree *out_0,
> + tree *out_1)
> +{
> +  /* Punt until as late as possible:
> +
> + 1) By folding away BIT_FIELD_REFs we remove information about the
> + operands that may be useful to other optimizers.
> +
> + 2) For simplicity, we'd like the expression
> +
> + x = BIT_FIELD_REF
> +
> + to imply that A is not a VECTOR_CST.  This assumption is unlikely
> + to hold before constant propagation/folding.  */
> +  if (!(cfun->curr_properties & PROP_last_full_fold))
> +return false;
> +
> +  unsigned int offset = BYTES_BIG_ENDIAN ? 0 : 64;
> +
> +  tree hi_arg_0 = aarch64_object_of_bfr (arg_0, 64, offset);
> +  tree hi_arg_1 = aarch64_object_of_bfr (arg_1, 64, offset);
> +  if (!hi_arg_0)
> +{
> +  if (!hi_arg_1 || TREE_CODE (arg_0) != VECTOR_CST)
> + return false;
> +  hi_arg_0 = aarch64_self_concat_vec_cst (arg_0);
> +}
> +  else if (!hi_arg_1)
> +{
> +  if (TREE_CODE (arg_1) != VECTOR_CST)
> + return false;
> +  hi_arg_1 = aarc

Re: [PATCH] aarch64: Use generic_armv8_a_prefetch_tune in generic_armv8_a.h

2025-02-18 Thread Richard Sandiford

Soumya AR  writes:
>> On 18 Feb 2025, at 2:27 PM, Kyrylo Tkachov  wrote:
>> 
>> 
>> 
>>> On 18 Feb 2025, at 09:48, Kyrylo Tkachov  wrote:
>>> 
>>> 
>>> 
>>>> On 18 Feb 2025, at 09:41, Richard Sandiford  wrote:
>>>> 
>>>> Kyrylo Tkachov  writes:
>>>>> Hi Soumya
>>>>> 
>>>>>> On 18 Feb 2025, at 09:12, Soumya AR  wrote:
>>>>>> 
>>>>>> generic_armv8_a.h defines generic_armv8_a_prefetch_tune but still uses
>>>>>> generic_prefetch_tune in generic_armv8_a_tunings.
>>>>>> 
>>>>>> This patch updates the pointer to generic_armv8_a_prefetch_tune.
>>>>>> 
>>>>>> This patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>>>>>> regression.
>>>>>> 
>>>>>> Ok for GCC 15 now?
>>>>> 
>>>>> Yes, this looks like a simple oversight.
>>>>> Ok to push to master.
>>>> 
>>>> I suppose the alternative would be to remove generic_armv8_a_prefetch_tune,
>>>> since it's (deliberately) identical to generic_prefetch_tune.
>>> 
>>> Looks like we have one prefetch_tune structure for each of the generic 
>>> tunings (generic, generic_armv8_a, generic_armv9_a).
>>> For the sake of symmetry it feels a bit better to have them independently 
>>> tunable.
>>> But as the effects are the same, it may be better to remove it in the 
>>> interest of less code.
>>> 
>> 
>> I see Soumya has already pushed her patch. I’m okay with either approach 
>> tbh, but if Richard prefers we can remove generic_armv8_a_prefetch_tune in a 
>> separate commit.
>
> Yeah, missed Richard’s mail.
>
> Let me know which is preferable, thanks.

No, it's fine as is.  My comment was just a suggestion.

Thanks,
Richard

Re: [PATCH v2 05/16] Update is_function_default_version to work with target_version.

2025-02-18 Thread Richard Sandiford

Alfie Richards  writes:
> Notably this respects target_version semantics where an unannotated
> function can be the default version.
>
> gcc/ChangeLog:
>
>   * attribs.cc (is_function_default_version): Add target_version logic.

OK for GCC 16, thanks.

Richard
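
The two sets of semantics described in the new comment can be summarised with a small standalone model. This is a hypothetical sketch, not the GCC function: decl_model stands in for a FUNCTION_DECL and its attributes, and the fmv_uses_target_attr parameter stands in for TARGET_HAS_FMV_TARGET_ATTRIBUTE. Where the patch asserts that a versioned function carries a "target" attribute, the model simply returns false if it is missing.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model of the parts of a declaration the check looks at.
struct decl_model
{
  bool is_function;                           // TREE_CODE is FUNCTION_DECL
  bool versioned;                             // DECL_FUNCTION_VERSIONED
  std::optional<std::string> target;          // "target" attribute value
  std::optional<std::string> target_version;  // "target_version" value
};

// Return true if D is the default version under the chosen semantics.
bool
is_default_version (const decl_model &d, bool fmv_uses_target_attr)
{
  if (!d.is_function)
    return false;
  if (fmv_uses_target_attr)
    {
      // target semantics: only versioned functions explicitly marked
      // "default" count.
      if (!d.versioned)
	return false;
      return d.target && *d.target == "default";
    }
  // target_version semantics: an unannotated function is the default.
  if (!d.target_version)
    return true;
  return *d.target_version == "default";
}
```

The behavioural change is in the last branch: under target_version semantics a function with no annotation at all is now reported as the default.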

> ---
>  gcc/attribs.cc | 27 ---
>  1 file changed, 20 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/attribs.cc b/gcc/attribs.cc
> index 56dd18c2fa8..f6667839c01 100644
> --- a/gcc/attribs.cc
> +++ b/gcc/attribs.cc
> @@ -1279,18 +1279,31 @@ make_dispatcher_decl (const tree decl)
>return func_decl;
>  }
>  
> -/* Returns true if DECL is multi-versioned using the target attribute, and this
> -   is the default version.  This function can only be used for targets that do
> -   not support the "target_version" attribute.  */
> +/* Returns true if DECL is a multiversioned default.
> +   With the target attribute semantics, returns true if the function is marked
> +   as default with the target version.
> +   With the target_version attribute semantics, returns true if the function
> +   is either not annotated, or annotated as default.  */
>  
>  bool
>  is_function_default_version (const tree decl)
>  {
> -  if (TREE_CODE (decl) != FUNCTION_DECL
> -  || !DECL_FUNCTION_VERSIONED (decl))
> +  tree attr;
> +  if (TREE_CODE (decl) != FUNCTION_DECL)
>  return false;
> -  tree attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
> -  gcc_assert (attr);
> +  if (TARGET_HAS_FMV_TARGET_ATTRIBUTE)
> +{
> +  if (!DECL_FUNCTION_VERSIONED (decl))
> + return false;
> +  attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
> +  gcc_assert (attr);
> +}
> +  else
> +{
> +  attr = lookup_attribute ("target_version", DECL_ATTRIBUTES (decl));
> +  if (!attr)
> + return true;
> +}
>attr = TREE_VALUE (TREE_VALUE (attr));
>return (TREE_CODE (attr) == STRING_CST
> && strcmp (TREE_STRING_POINTER (attr), "default") == 0);

Re: [RFC] RISC-V: The optimization ignored the side effects of the rounding mode, resulting in incorrect results.

2025-02-18 Thread Kito Cheng

We already have a use of "(reg:SI FRM_REGNUM)" within the pattern; is
that not enough?
I assume the answer is no, since you are proposing this patch, so could
you explain a bit more about what happened?

(define_insn "@pred_single_widen__scalar"
 [(set (match_operand:VWEXTF 0 "register_operand"      "=vd, vd, vr, vr")
   (if_then_else:VWEXTF
     (unspec:
       [(match_operand: 1 "vector_mask_operand"        " vm, vm,Wc1,Wc1")
        (match_operand 5 "vector_length_operand"       "rvl,rvl,rvl,rvl")
        (match_operand 6 "const_int_operand"           "  i,  i,  i,  i")
        (match_operand 7 "const_int_operand"           "  i,  i,  i,  i")
        (match_operand 8 "const_int_operand"           "  i,  i,  i,  i")
        (match_operand 9 "const_int_operand"           "  i,  i,  i,  i")
        (reg:SI VL_REGNUM)
        (reg:SI VTYPE_REGNUM)
        (reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)  <-here
     (plus_minus:VWEXTF
       (match_operand:VWEXTF 3 "register_operand"      " vr, vr, vr, vr")
       (float_extend:VWEXTF
         (vec_duplicate:
           (match_operand: 4 "register_operand"        "  f,  f,  f,  f"))))
     (match_operand:VWEXTF 2 "vector_merge_operand"    " vu,  0, vu,  0")))]

On Tue, Feb 18, 2025 at 7:14 PM Jin Ma  wrote:
>
> We overlooked the side effects of the rounding mode in the pattern,
> which can impact the result of float_extend and lead to incorrect
> optimizations in the final program. This issue likely affects nearly
> all similar patterns that involve rounding modes, and the tests in
> this patch only highlight one example. It seems challenging to address,
> and I only implemented a simple fix, which is not a good way to solve
> the problem.
>
> Any comments on this?
>
> gcc/ChangeLog:
>
> * config/riscv/vector-iterators.md (UNSPEC_VRM): New.
> * config/riscv/vector.md: Use UNSPEC for float_extend.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/bug-11.c: New test.
>
> Reported-by: CunJian Huang 
> Signed-off-by: Jin Ma 
> ---
>  gcc/config/riscv/vector-iterators.md  |  3 +++
>  gcc/config/riscv/vector.md|  6 +++--
>  .../gcc.target/riscv/rvv/base/bug-11.c| 24 +++
>  3 files changed, 31 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c
>
> diff --git a/gcc/config/riscv/vector-iterators.md 
> b/gcc/config/riscv/vector-iterators.md
> index c1bd7397441..bd592f736e2 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -120,6 +120,9 @@ (define_c_enum "unspec" [
>
>UNSPEC_SF_VFNRCLIP
>UNSPEC_SF_VFNRCLIPU
> +
> +  ;; Side effects of rounding mode
> +  UNSPEC_VRM
>  ])
>
>  (define_c_enum "unspecv" [
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index 8ee43cf0ce1..e971dcdc973 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -7135,8 +7135,10 @@ (define_insn 
> "@pred_single_widen__scalar"
>   (plus_minus:VWEXTF
> (match_operand:VWEXTF 3 "register_operand"" vr, vr, 
> vr, vr")
> (float_extend:VWEXTF
> - (vec_duplicate:
> -   (match_operand: 4 "register_operand"  "  f,  f,  
> f,  f"
> + (unspec:VWEXTF
> +   [(vec_duplicate:
> + (match_operand: 4 "register_operand"  "  f,  f, 
>  f,  f"))
> + (reg:SI FRM_REGNUM)] UNSPEC_VRM)))
>   (match_operand:VWEXTF 2 "vector_merge_operand"  " vu,  0, 
> vu,  0")))]
>"TARGET_VECTOR"
>"vfw.wf\t%0,%3,%4%p1"
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c
> new file mode 100644
> index 000..52d940cb57a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c
> @@ -0,0 +1,24 @@
> +/* { dg-do run { target { riscv_v } } } */
> +/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O2" } */
> +
> +#include 
> +
> +int main ()
> +{
> +  float data_store = 0;
> +  int8_t mask = 1;
> +  size_t vl = 1;
> +  float data_load = 0.0;
> +  _Float16 data_sub = 0.0;
> +  vint8mf8_t mask_value = __riscv_vle8_v_i8mf8 (&mask, vl);
> +  vbool64_t vmask = __riscv_vmseq_vx_i8mf8_b64 (mask_value, 1, vl);
> +  vfloat32mf2_t vd_load = __riscv_vfmv_v_f_f32mf2 (0, 
> __riscv_vsetvlmax_e32mf2 ());
> +  vfloat32mf2_t vreg_memory = __riscv_vle32_v_f32mf2_tu (vd_load, 
> &data_load, vl);
> +  vfloat32mf2_t vreg = __riscv_vfwsub_wf_f32mf2_rm_tum (vmask, vreg_memory, 
> vreg_memory, data_sub, __RISCV_FRM_RDN, vl);
> +  __riscv_vse32_v_f32mf2 (&data_store, vreg, vl);
> +
> +  __builtin_printf ("%f\n", data_store);
> +  return 0;
> +}
> +
> +/* { dg-output "-0.00\\s+\n" } */
> --
> 2.25.1
>

[committed] gfortran.dg/gomp/metadirective-3.f90

2025-02-18 Thread Tobias Burnus


With a compiler set up to (also) compile for nvptx offloading,
the testcase triggered a bogus error, which in addition prevents
the gimple scan.

Fixed by adding an xfail and an xfailed dg-bogus.

The issue itself is the known https://gcc.gnu.org/PR118694

Committed as obvious as r15-7606-g8d922a80396b0c, cf. attachment.

Tobias
commit 8d922a80396b0cc9f5311d79aa760412dd018848
Author: Tobias Burnus 
Date:   Tue Feb 18 15:48:39 2025 +0100

gfortran.dg/gomp/metadirective-3.f90: xfail on offload_nvptx

Currently, 'target' with a nested metadirective creating a 'teams' will
fail with a bogus error ("‘target’ construct with nested ‘teams’ construct
contains directives outside of the ‘teams’ construct").
That's tracked at PR118694 - and, hence, expected.

However, the testcase metadirective-3.f90 triggers this when compiling for
'target offload_nvptx' (otherwise, the code is optimized away). Use xfail to
silence the error as it is known and there is a tracking PR.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/metadirective-3.f90: Add xfail when
compiling for offload_nvptx.
---
 gcc/testsuite/gfortran.dg/gomp/metadirective-3.f90 | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gfortran.dg/gomp/metadirective-3.f90 b/gcc/testsuite/gfortran.dg/gomp/metadirective-3.f90
index c5e25e598eb..e2ebb0a39c1 100644
--- a/gcc/testsuite/gfortran.dg/gomp/metadirective-3.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/metadirective-3.f90
@@ -22,4 +22,7 @@ end module
 !  that alternative and not produce a metadirective at all.  Otherwise this
 !  won't be resolved until late.
 ! { dg-final { scan-tree-dump-not "#pragma omp metadirective" "gimple" { target { ! offload_nvptx } } } }
-! { dg-final { scan-tree-dump "#pragma omp metadirective" "gimple" { target { offload_nvptx } } } }
+
+! The following two are xfail because the bogus error triggers and thus prevents the dump, cf. PR118694
+! { dg-final { scan-tree-dump "#pragma omp metadirective" "gimple" { target { offload_nvptx } xfail { offload_nvptx } } } }
+! { dg-bogus "'target' construct with nested 'teams' construct contains directives outside of the 'teams' construct" "PR118694" { xfail offload_nvptx } 10 }

Re: 7/7 [Fortran, Patch, Coarray, PR107635] Remove deprecated coarray routines

2025-02-18 Thread Andre Vehreschild

Hi Thomas,

> This patch series (of necessity) introduces ABI changes.  What will
> happen with user code compiled against the old interface?

That depends on the library you are linking against. When using caf_single from
gfortran, you will get link failures when you mix code compiled by
gfortran < 15 and gfortran-15. But caf_single is only intended for
testing anyway, so why would one do this?

If your question targets the users of this ABI, which to my knowledge is only
OpenCoarrays at the moment, then the user will notice nothing. A mix of
pre-gfortran-15 and gfortran-15 generated .o files will link and work as
expected, because OpenCoarrays provides all ABIs. We do not compile a
gfortran-15 exclusive version of OpenCoarrays, i.e. all routines are present,
fully functional and interoperable.

> I guess a link failure (plus an answer in stack exchange where the
> explanation is given, so people can google it, and a mention in the
> release notes) would be acceptable, but is there anything that
> can be done in addition?

I can provide an entry in the release notes, if need be. Where do I have to do
this? I have never done it before.

Thanks again,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de

Re: 7/7 [Fortran, Patch, Coarray, PR107635] Remove deprecated coarray routines

2025-02-18 Thread Andre Vehreschild

Hi Jerry,

thank you very much for taking on the job of reviewing, and sorry for my late
answer. In fact, I was having a hard time figuring out regressions in the
OpenCoarrays library.

This also answers your first question: Yes, OpenCoarrays will make use of the
new interface. Most of the changes in the interface are required by
OpenCoarrays. Today I got all of the OpenCoarrays tests passing.

The OpenCoarrays tests all run a little bit faster than with the old method.
Please keep in mind that those tests keep starting and stopping tiny apps,
i.e. the overhead of this sequential part is significant. Unfortunately the
speedup is tiny: about 3 seconds for the whole suite, which now runs in 1:21.38
(m:ss.ms; Release build, i.e. -O3; mpich and Intel's MPI).

I will look for a better benchmark suite. I seem to remember that one was
mentioned in some ticket on OpenCoarrays. Nevertheless, all these tests are run
on a single machine. I have no cluster at my command.

I will rebase, rename rewrite.cc to coarray.cc, retest and merge shortly, if no
one objects. Then I unfortunately have to post a new small bugfix (about 10
lines).

Thanks again,
Andre

On Fri, 14 Feb 2025 10:19:28 -0800
Jerry D  wrote:

> On 2/13/25 11:48 AM, Jerry D wrote:
> > On 2/10/25 2:25 AM, Andre Vehreschild wrote:
> >> [PATCH 7/7] Fortran: Remove deprecated coarray routines [PR107635]
> >>
> >
> > I have applied all patches. Regression tested OK here.
> >
> >  From patch 5 there was one reject:
> >
> > patching file gcc/testsuite/gfortran.dg/coarray/send_char_array_1.f90
> > Hunk #1 FAILED at 39.
> > 1 out of 1 hunk FAILED -- saving rejects to file gcc/testsuite/
> > gfortran.dg/coarray/send_char_array_1.f90.rej
> >
> 
> > I commented earlier about changing the name of rewrite.cc.
>  this please.
> >
> > I am now going through the whole enchilada for editorial stuff.
> >
> > Regards,
> >
>
> I finished going through the last nine yards and it looks good. I have a
> couple of questions:
>
> Have you been able to test against the OpenCoarray tests?
>
> Have you been able to measure any performance improvements?
>
> I suspect that the latter question may relate only to multi-node large
> systems.
>
> I think this is good to commit. (all 7 parts)
>
> Does anyone else have any comments?
>
> Regards,
>
> Jerry
>
>
>

--
Andre Vehreschild * Email: vehre ad gmx dot de

Re: [PATCH v2 13/16] Change target_version semantics to follow ACLE specification.

2025-02-18 Thread Richard Sandiford

Alfie Richards  writes:
> This changes the behavior of the target_clones and target_version attributes
> to be in line with what is specified in the Arm C Language Extensions.
>
> Notably this changes the scope and signature of multiversioned functions
> to that of the default version, and changes the resolver to be
> created at the implementation of the default version.
>
> This is achieved by changing the C++ front end to no longer resolve any
> non-default version decls in lookup, and by moving dispatching
> for default_target sets to reuse the dispatching logic for target_clones
> in multiple_target.cc.
>
> The dispatching in create_dispatcher_calls is changed for the case of
> a lone annotated default function to change the dispatched symbol to
> be an alias for the mangled default function.

Heh, nice trick.  I agree that conceptually it's also a very clean
solution, but I don't know the cgraph internals well enough to know
whether there might be dragons.

The gcc/*.cc changes look good to me as far as I can review them.

Thanks,
Richard

>
> gcc/ChangeLog:
>
>   * cgraphunit.cc (analyze_functions): Add logic for target version
>   dependencies.
>   * ipa.cc (symbol_table::remove_unreachable_nodes): Ditto.
>   * multiple_target.cc (create_dispatcher_calls): Change to support
>   target version semantics.
>   (ipa_target_clone): Change to dispatch all function sets in
>   target_version semantics.
>
> gcc/cp/ChangeLog:
>
>   * call.cc (add_candidates): Change to not resolve non-default versions 
> in
>   target_version semantics.
>   * class.cc (resolve_address_of_overloaded_function): Ditto.
>   * cp-gimplify.cc (cp_genericize_r): Change logic to not apply for
>   target_version semantics.
>   * decl.cc (start_decl): Change to mark and therefore mangle all
>   target_version decls.
>   (start_preparsed_function): Ditto.
>   * typeck.cc (cp_build_function_call_vec): Add error for calling 
> unresolvable
>   non-default node in target_version semantics.
>
> gcc/testsuite/ChangeLog:
>
>   * g++.target/aarch64/mv-1.C: Change for target_version semantics.
>   * g++.target/aarch64/mv-symbols2.C: Ditto.
>   * g++.target/aarch64/mv-symbols3.C: Ditto.
>   * g++.target/aarch64/mv-symbols4.C: Ditto.
>   * g++.target/aarch64/mv-symbols5.C: Ditto.
>   * g++.target/aarch64/mvc-symbols3.C: Ditto.
>   * g++.target/riscv/mv-symbols2.C: Ditto.
>   * g++.target/riscv/mv-symbols3.C: Ditto.
>   * g++.target/riscv/mv-symbols4.C: Ditto.
>   * g++.target/riscv/mv-symbols5.C: Ditto.
>   * g++.target/riscv/mvc-symbols3.C: Ditto.
>   * g++.target/aarch64/mv-symbols10.C: New test.
>   * g++.target/aarch64/mv-symbols11.C: New test.
>   * g++.target/aarch64/mv-symbols12.C: New test.
>   * g++.target/aarch64/mv-symbols13.C: New test.
>   * g++.target/aarch64/mv-symbols6.C: New test.
>   * g++.target/aarch64/mv-symbols7.C: New test.
>   * g++.target/aarch64/mv-symbols8.C: New test.
>   * g++.target/aarch64/mv-symbols9.C: New test.
> ---
>  gcc/cgraphunit.cc |  9 +++
>  gcc/cp/call.cc| 10 +++
>  gcc/cp/class.cc   | 13 +++-
>  gcc/cp/cp-gimplify.cc | 11 ++-
>  gcc/cp/decl.cc| 14 
>  gcc/cp/typeck.cc  | 10 +++
>  gcc/ipa.cc| 11 +++
>  gcc/multiple_target.cc| 73 ---
>  gcc/testsuite/g++.target/aarch64/mv-1.C   |  4 +
>  .../g++.target/aarch64/mv-symbols10.C | 27 +++
>  .../g++.target/aarch64/mv-symbols11.C | 30 
>  .../g++.target/aarch64/mv-symbols12.C | 28 +++
>  .../g++.target/aarch64/mv-symbols13.C | 28 +++
>  .../g++.target/aarch64/mv-symbols2.C  | 12 +--
>  .../g++.target/aarch64/mv-symbols3.C  |  6 +-
>  .../g++.target/aarch64/mv-symbols4.C  |  6 +-
>  .../g++.target/aarch64/mv-symbols5.C  |  6 +-
>  .../g++.target/aarch64/mv-symbols6.C  | 25 +++
>  .../g++.target/aarch64/mv-symbols7.C  | 48 
>  .../g++.target/aarch64/mv-symbols8.C  | 46 
>  .../g++.target/aarch64/mv-symbols9.C  | 43 +++
>  .../g++.target/aarch64/mvc-symbols3.C | 12 +--
>  gcc/testsuite/g++.target/riscv/mv-symbols2.C  | 12 +--
>  gcc/testsuite/g++.target/riscv/mv-symbols3.C  |  6 +-
>  gcc/testsuite/g++.target/riscv/mv-symbols4.C  |  6 +-
>  gcc/testsuite/g++.target/riscv/mv-symbols5.C  |  6 +-
>  gcc/testsuite/g++.target/riscv/mvc-symbols3.C | 12 +--
>  27 files changed, 456 insertions(+), 58 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols10.C
>  create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols11.C
>  create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols12.C
>  create

RE: [PATCH v1] Vect: Fix ICE when get DImode from get_related_vectype_for_scalar_type [PR116351]

2025-02-18 Thread Li, Pan2

Hi Richard,

After some more investigation: the sample code never hits any of the
vectorizable_* routines that would check loop_vinfo->vector_mode, so
loop_vinfo->vector_mode == DImode reaches vect_verify_loop_lens and
triggers the VECTOR_MODE_P assert. The detailed flow is as below.

vect_analyze_loop_2
 |- vect_pattern_recog // Hit over-widening pattern and set loop_vinfo->vector_mode to DImode
 |- ...
 |- vect_analyze_loop_operations
   |- (gdb) p stmt_info->def_type
   |- $1 = vect_reduction_def
   |- (gdb) p stmt_info->slp_type
   |- $2 = pure_slp
   |- vectorizable_lc_phi // Not Hit
   |- vectorizable_induction  // Not Hit
   |- vectorizable_reduction  // Not Hit
   |- vectorizable_recurr // Not Hit
   |- vectorizable_live_operation  // Not Hit
   |- vect_analyze_stmt
     |- (gdb) p stmt_info->relevant
     |- $3 = vect_unused_in_scope
     |- (gdb) p stmt_info->live
     |- $4 = false
     |- (gdb) p pattern_stmt_info
     |- $5 = (stmt_vec_info) 0x0
     |- return opt_result::success ();
     OR
     |- PURE_SLP_STMT (stmt_info) && !node then dump "handled only by SLP analysis\n"
       |- Early return opt_result::success ();
     |- vectorizable_load/store/call_convert/... // Not Hit
   |- LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P && !LOOP_VINFO_MASKS (loop_vinfo).is_empty ()
     |- vect_verify_loop_lens (loop_vinfo)
       |- assert (VECTOR_MODE_P (loop_vinfo->vector_mode)); // Hit assert, resulting in ICE

I am a little undecided between two options here.

1. Shall we add a condition and a dump message here to make
vect_analyze_loop_2 fail when loop_vinfo->vector_mode is not a vector mode
supported by the target?
2. Should LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P not be set here at all? Then we
need to find out where partial vectors get set to true.

Is there any suggestion here?

Pan

-Original Message-
From: Li, Pan2 
Sent: Monday, February 17, 2025 6:08 PM
To: Richard Biener 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: RE: [PATCH v1] Vect: Fix ICE when get DImode from 
get_related_vectype_for_scalar_type [PR116351]

> But that's wrong - read the comment before the code.  We do support integer 
> mode
> "generic" vectorization just fine.  Iff there's anything to plug then
> it's how we end
> up thinking there's with_len support for DImode vectors.

I see, then we need another place to fix this, let me have a try.

Pan

-Original Message-
From: Richard Biener  
Sent: Monday, February 17, 2025 6:02 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Vect: Fix ICE when get DImode from 
get_related_vectype_for_scalar_type [PR116351]

On Mon, Feb 17, 2025 at 10:38 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to fix an ICE similar to the one below. Assume we have
> the sample code:
>
> int a, b, c;
> short d, e, f;
> long g (long h) { return h; }
>
> void i () {
>   for (; b; ++b) {
>     f = 5 >> a ? d : d << a;
>     e &= c | g(f);
>   }
> }
>
> It will ICE when compiled with -O3 -march=rv64gc_zve64f -mrvv-vector-bits=zvl
>
> during GIMPLE pass: vect
> pr116351-1.c: In function ‘i’:
> pr116351-1.c:8:6: internal compiler error: in get_len_load_store_mode,
> at optabs-tree.cc:655
> 8 | void i () {
>   |  ^
> 0x44d6b9d internal_error(char const*, ...)
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic-global-context.cc:517
> 0x44a26a6 fancy_abort(char const*, int, char const*)
> 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic.cc:1722
> 0x19e4309 get_len_load_store_mode(machine_mode, bool, internal_fn*,
> vec*)
> 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/optabs-tree.cc:655
> 0x1fada40 vect_verify_loop_lens
> 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:1566
> 0x1fb2b07 vect_analyze_loop_2
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3037
> 0x1fb4302 vect_analyze_loop_1
> 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3478
> 0x1fb4e9a vect_analyze_loop(loop*, gimple*, vec_info_shared*)
> 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3638
> 0x203c2dc try_vectorize_loop_1
> 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1095
> 0x203c839 try_vectorize_loop
> 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1212
> 0x203cb2c execute
>
> The zve32x cannot have 64 elen, and then the 
> get_related_vectype_for_scalar_type
> will get DImode as vector_mode in loop_info.  After that the underlying
> vect_analyze_xx will assert the mode is VECTOR

Re: [PATCH] aarch64: Use generic_armv8_a_prefetch_tune in generic_armv8_a.h

2025-02-18 Thread Richard Sandiford

Kyrylo Tkachov  writes:
> Hi Soumya
>
>> On 18 Feb 2025, at 09:12, Soumya AR  wrote:
>> 
>> generic_armv8_a.h defines generic_armv8_a_prefetch_tune but still uses
>> generic_prefetch_tune in generic_armv8_a_tunings.
>> 
>> This patch updates the pointer to generic_armv8_a_prefetch_tune.
>> 
>> This patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>> regression.
>> 
>> Ok for GCC 15 now?
>
> Yes, this looks like a simple oversight.
> Ok to push to master.

I suppose the alternative would be to remove generic_armv8_a_prefetch_tune,
since it's (deliberately) identical to generic_prefetch_tune.

> Thanks,
> Kyrill
>
>> 
>> Signed-off-by: Soumya AR 
>> 
>> gcc/ChangeLog:
>> 
>> * config/aarch64/tuning_models/generic_armv8_a.h: Updated prefetch
>> struct pointer.
>> 
>> ---
>> gcc/config/aarch64/tuning_models/generic_armv8_a.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/gcc/config/aarch64/tuning_models/generic_armv8_a.h 
>> b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
>> index 35de3f03296..01080cade46 100644
>> --- a/gcc/config/aarch64/tuning_models/generic_armv8_a.h
>> +++ b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
>> @@ -184,7 +184,7 @@ static const struct tune_params generic_armv8_a_tunings =
>> (AARCH64_EXTRA_TUNE_BASE
>> | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
>> | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */
>> - &generic_prefetch_tune,
>> + &generic_armv8_a_prefetch_tune,
>> AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */
>> AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */
>> };
>> -- 
>> 2.34.1
>> 
>> 
>> 
>>

Re: [PATCH v2] [testsuite] add x86 effective target

2025-02-18 Thread Richard Sandiford

Alexandre Oliva  writes:
> On Feb 13, 2025, Alexandre Oliva  wrote:
>
>> @@ -14108,10 +14113,9 @@ proc dg-require-python-h { args } {
>>  # Return 1 if the target supports heap-trampoline, 0 otherwise.
>>  proc check_effective_target_heap_trampoline {} {
>>  if { [istarget aarch64*-*-linux*]
>> - || [istarget i?86-*-darwin*]
>> - || [istarget x86_64-*-darwin*]
>> - || [istarget i?86-*-linux*]
>> - || [istarget x86_64-*-linux*] } {
>> + || { [check_effective_target_x86]
>> +  && { [istarget *-*-darwin*]
>> +   || [istarget *-*-linux*] } } } {
>>  return 1
>>  }
>>  return 0
>
> I used the wrong kind of brackets here, and missed the error that it
> caused.  Here's a corrected patch, retested on x86_64-linux-gnu.
> Ok to install?
>
>
> I got tired of repeating the conditional that recognizes ia32 or
> x86_64, and introduced 'x86' as a shorthand for that, adjusting all
> occurrences in target-supports.exp, to set an example.  I found some
> patterns that recognized i?86* and x86_64*, but I took those as likely
> cut&pastos instead of trying to preserve those weirdnesses.
>
>
> for  gcc/ChangeLog
>
>   * doc/sourcebuild.texi: Add x86 effective target.
>
> for  gcc/testsuite/ChangeLog
>
>   * lib/target-supports.exp (check_effective_target_x86): New.
>   Replace all uses of i?86-*-* and x86_64-*-* in this file.

Thanks for doing this.  How about also replacing all uses of:

   ([check_effective_target_x86])

with:

   [check_effective_target_x86]

OK with that change if there are no objections within 24 hours.

Thanks,
Richard
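
As an illustration of what the new predicate and the rewritten heap_trampoline check compute, here is a minimal C sketch. It assumes that `istarget` performs glob-style matching against the target triplet and models that with POSIX `fnmatch`; the function names are hypothetical, and Tcl's matching rules may differ from `fnmatch` in corner cases.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Glob-style match of TRIPLET against PATTERN, standing in for
   DejaGnu's istarget (an assumption of this sketch).  */
static bool
triplet_matches (const char *pattern, const char *triplet)
{
  return fnmatch (pattern, triplet, 0) == 0;
}

/* Models check_effective_target_x86: ia32 or x86_64.  */
static bool
is_x86 (const char *triplet)
{
  return triplet_matches ("x86_64-*-*", triplet)
         || triplet_matches ("i?86-*-*", triplet);
}

/* Models the rewritten check_effective_target_heap_trampoline:
   aarch64 Linux, or x86 on Darwin or Linux.  */
static bool
has_heap_trampoline (const char *triplet)
{
  return triplet_matches ("aarch64*-*-linux*", triplet)
         || (is_x86 (triplet)
             && (triplet_matches ("*-*-darwin*", triplet)
                 || triplet_matches ("*-*-linux*", triplet)));
}
```

Splitting the architecture test from the OS test is what lets the four original istarget patterns collapse into one x86 check combined with two OS patterns.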

> ---
>  gcc/doc/sourcebuild.texi  |3 +
>  gcc/testsuite/lib/target-supports.exp |  188 
> +
>  2 files changed, 99 insertions(+), 92 deletions(-)
>
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index 28338324f0724..d44c2e8cbe6a1 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -2798,6 +2798,9 @@ Target supports the execution of @code{user_msr} 
> instructions.
>  @item vect_cmdline_needed
>  Target requires a command line argument to enable a SIMD instruction set.
>  
> +@item x86
> +Target is ia32 or x86_64.
> +
>  @item xorsign
>  Target supports the xorsign optab expansion.
>  
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index 9b5fbe5275613..fbeb2ad3dafa3 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -740,7 +740,7 @@ proc check_profiling_available { test_what } {
>  }
>  
>  if { $test_what == "-fauto-profile" } {
> - if { !([istarget i?86-*-linux*] || [istarget x86_64-*-linux*]) } {
> + if { !([check_effective_target_x86] && [istarget *-*-linux*]) } {
>   verbose "autofdo only supported on linux"
>   return 0
>   }
> @@ -2616,17 +2616,23 @@ proc remove_options_for_riscv_zvbb { flags } {
>  return [add_options_for_riscv_z_ext zvbb $flags]
>  }
>  
> +# Return 1 if the target is ia32 or x86_64.
> +
> +proc check_effective_target_x86 { } {
> +if { ([istarget x86_64-*-*] || [istarget i?86-*-*]) } {
> + return 1
> +} else {
> +return 0
> +}
> +}
> +
>  # Return 1 if the target OS supports running SSE executables, 0
>  # otherwise.  Cache the result.
>  
>  proc check_sse_os_support_available { } {
>  return [check_cached_effective_target sse_os_support_available {
>   # If this is not the right target then we can skip the test.
> - if { !([istarget i?86-*-*] || [istarget x86_64-*-*]) } {
> - expr 0
> - } else {
> - expr 1
> - }
> + expr [check_effective_target_x86]
>  }]
>  }
>  
> @@ -2636,7 +2642,7 @@ proc check_sse_os_support_available { } {
>  proc check_avx_os_support_available { } {
>  return [check_cached_effective_target avx_os_support_available {
>   # If this is not the right target then we can skip the test.
> - if { !([istarget i?86-*-*] || [istarget x86_64-*-*]) } {
> + if { !([check_effective_target_x86]) } {
>   expr 0
>   } else {
>   # Check that OS has AVX and SSE saving enabled.
> @@ -2659,7 +2665,7 @@ proc check_avx_os_support_available { } {
>  proc check_avx512_os_support_available { } {
>  return [check_cached_effective_target avx512_os_support_available {
>   # If this is not the right target then we can skip the test.
> - if { !([istarget i?86-*-*] || [istarget x86_64-*-*]) } {
> + if { !([check_effective_target_x86]) } {
>   expr 0
>   } else {
>   # Check that OS has AVX512, AVX and SSE saving enabled.
> @@ -2682,7 +2688,7 @@ proc check_avx512_os_support_available { } {
>  proc check_sse_hw_available { } {
>  return [check_cached_effective_target sse_hw_available {
>   # If this is not the right target then we can skip the test.
> - if { !([istarget i?86-*-*] || [istarget x86_64-*-*]) } {
> + if

Re: [PATCH] aarch64: Use generic_armv8_a_prefetch_tune in generic_armv8_a.h

2025-02-18 Thread Kyrylo Tkachov




> On 18 Feb 2025, at 09:41, Richard Sandiford  wrote:
> 
> Kyrylo Tkachov  writes:
>> Hi Soumya
>> 
>>> On 18 Feb 2025, at 09:12, Soumya AR  wrote:
>>> 
>>> generic_armv8_a.h defines generic_armv8_a_prefetch_tune but still uses
>>> generic_prefetch_tune in generic_armv8_a_tunings.
>>> 
>>> This patch updates the pointer to generic_armv8_a_prefetch_tune.
>>> 
>>> This patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>>> regression.
>>> 
>>> Ok for GCC 15 now?
>> 
>> Yes, this looks like a simple oversight.
>> Ok to push to master.
> 
> I suppose the alternative would be to remove generic_armv8_a_prefetch_tune,
> since it's (deliberately) identical to generic_prefetch_tune.

Looks like we have one prefetch_tune structure for each of the generic tunings 
(generic, generic_armv8_a, generic_armv9_a).
For the sake of symmetry it feels a bit better to have them independently 
tunable.
But as the effects are the same, it may be better to remove it in the interest 
of less code.

Thanks,
Kyrill
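
The trade-off can be sketched in a few lines of C. This is a simplified model, not the real aarch64 tuning structures: the struct layouts, field values and the helper `same_prefetch_behaviour` are hypothetical, and only the names generic_prefetch_tune, generic_armv8_a_prefetch_tune and generic_armv8_a_tunings come from the thread.

```c
#include <stdbool.h>

/* Hypothetical, reduced versions of the tuning structures.  */
struct prefetch_tune
{
  int num_slots;
  int l1_cache_size;
  int l1_cache_line_size;
  int l2_cache_size;
};

struct tune_params
{
  const char *name;
  const struct prefetch_tune *prefetch;
};

/* Placeholder values; the two structures are deliberately identical.  */
static const struct prefetch_tune generic_prefetch_tune = { 0, -1, -1, -1 };
static const struct prefetch_tune generic_armv8_a_prefetch_tune
  = { 0, -1, -1, -1 };

static const struct tune_params generic_tunings
  = { "generic", &generic_prefetch_tune };

/* After the patch: points at its own, independently tunable structure.  */
static const struct tune_params generic_armv8_a_tunings
  = { "generic_armv8_a", &generic_armv8_a_prefetch_tune };

/* True if both tunings currently yield the same prefetch parameters.  */
static bool
same_prefetch_behaviour (const struct tune_params *a,
                         const struct tune_params *b)
{
  return a->prefetch->num_slots == b->prefetch->num_slots
         && a->prefetch->l1_cache_size == b->prefetch->l1_cache_size
         && a->prefetch->l1_cache_line_size == b->prefetch->l1_cache_line_size
         && a->prefetch->l2_cache_size == b->prefetch->l2_cache_size;
}
```

In this model the pointers differ while the observable behaviour is the same, which is why either keeping the separate structure (symmetry) or deleting it (less code) is defensible.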

> 
>> Thanks,
>> Kyrill
>> 
>>> 
>>> Signed-off-by: Soumya AR 
>>> 
>>> gcc/ChangeLog:
>>> 
>>> * config/aarch64/tuning_models/generic_armv8_a.h: Updated prefetch
>>> struct pointer.
>>> 
>>> ---
>>> gcc/config/aarch64/tuning_models/generic_armv8_a.h | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>> 
>>> diff --git a/gcc/config/aarch64/tuning_models/generic_armv8_a.h 
>>> b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
>>> index 35de3f03296..01080cade46 100644
>>> --- a/gcc/config/aarch64/tuning_models/generic_armv8_a.h
>>> +++ b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
>>> @@ -184,7 +184,7 @@ static const struct tune_params generic_armv8_a_tunings 
>>> =
>>> (AARCH64_EXTRA_TUNE_BASE
>>> | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
>>> | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */
>>> - &generic_prefetch_tune,
>>> + &generic_armv8_a_prefetch_tune,
>>> AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */
>>> AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */
>>> };
>>> -- 
>>> 2.34.1
>>> 
>>> 
>>> 
>>>

Re: [PATCH] builtins: Ensure sin and cos properly set errno when INFINITY is passed [PR80042]

2025-02-18 Thread Richard Biener

On Tue, Feb 18, 2025 at 1:21 AM Sam James  wrote:
>
> Peter Damianov  writes:
>
> > POSIX says that sin and cos should set errno to EDOM when infinity is 
> > passed to
> > them. Make sure this is accounted for in builtins.def, and add tests.
> >
> > gcc/
> >   PR middle-end/80042
> >   * builtins.def: (sin|cos)(f|l) can set errno.
> > gcc/testsuite/
> >   * gcc.dg/pr80042.c: New testcase.
> > ---
> >  gcc/builtins.def   | 20 +-
> >  gcc/testsuite/gcc.dg/pr80042.c | 71 ++
> >  2 files changed, 82 insertions(+), 9 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/pr80042.c
> >
> > [...]
> > diff --git a/gcc/testsuite/gcc.dg/pr80042.c b/gcc/testsuite/gcc.dg/pr80042.c
> > new file mode 100644
> > index 000..cc578ae67e2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/pr80042.c
> > @@ -0,0 +1,71 @@
> > +/* dg-do run */
> > +/* dg-options "-O2 -lm" */
>
> These two lines are missing {}. Please double check the logs from your
> testsuite run to make sure newly added/changed tests are executed (and
> in the way you expect).

This test will also FAIL on *BSD IIRC as that doesn't set errno for any math
functions.

I'll note GCC models sincos as cexpi which does not set errno, and will
eventually expand that to sincos or cexp.  It does that without any
restriction on -fno-math-errno.

I'll also note the C standard does not document any domain error on
+-Inf arguments.
Instead it documents a range error for sin(x) and nonzero x too close to zero.

Richard.

>
> > [...]

Re: [PATCH] aarch64: Use generic_armv8_a_prefetch_tune in generic_armv8_a.h

2025-02-18 Thread Kyrylo Tkachov



> On 18 Feb 2025, at 09:48, Kyrylo Tkachov  wrote:
> 
> 
> 
>> On 18 Feb 2025, at 09:41, Richard Sandiford  
>> wrote:
>> 
>> Kyrylo Tkachov  writes:
>>> Hi Soumya
>>> 
>>>> On 18 Feb 2025, at 09:12, Soumya AR  wrote:
>>>> 
>>>> generic_armv8_a.h defines generic_armv8_a_prefetch_tune but still uses
>>>> generic_prefetch_tune in generic_armv8_a_tunings.
>>>> 
>>>> This patch updates the pointer to generic_armv8_a_prefetch_tune.
>>>> 
>>>> This patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>>>> regression.
>>>> 
>>>> Ok for GCC 15 now?
>>> 
>>> Yes, this looks like a simple oversight.
>>> Ok to push to master.
>> 
>> I suppose the alternative would be to remove generic_armv8_a_prefetch_tune,
>> since it's (deliberately) identical to generic_prefetch_tune.
> 
> Looks like we have one prefetch_tune structure for each of the generic 
> tunings (generic, generic_armv8_a, generic_armv9_a).
> For the sake of symmetry it feels a bit better to have them independently 
> tunable.
> But as the effects are the same, it may be better to remove it in the 
> interest of less code.
> 

I see Soumya has already pushed her patch. I’m okay with either approach tbh, 
but if Richard prefers we can remove generic_armv8_a_prefetch_tune in a 
separate commit.

Thanks,
Kyrill


> Thanks,
> Kyrill
> 
>> 
>>> Thanks,
>>> Kyrill
>>> 
 
>>>> Signed-off-by: Soumya AR 
>>>> 
>>>> gcc/ChangeLog:
>>>> 
>>>> * config/aarch64/tuning_models/generic_armv8_a.h: Updated prefetch
>>>> struct pointer.
>>>> 
>>>> ---
>>>> gcc/config/aarch64/tuning_models/generic_armv8_a.h | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>> 
>>>> diff --git a/gcc/config/aarch64/tuning_models/generic_armv8_a.h 
>>>> b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
>>>> index 35de3f03296..01080cade46 100644
>>>> --- a/gcc/config/aarch64/tuning_models/generic_armv8_a.h
>>>> +++ b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
>>>> @@ -184,7 +184,7 @@ static const struct tune_params 
>>>> generic_armv8_a_tunings =
>>>> (AARCH64_EXTRA_TUNE_BASE
>>>> | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
>>>> | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */
>>>> - &generic_prefetch_tune,
>>>> + &generic_armv8_a_prefetch_tune,
>>>> AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */
>>>> AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */
>>>> };
>>>> -- 
>>>> 2.34.1
 
 
 
 
>

[wwwdocs][committed] projects/gomp/: Update OpenMP implementation status

2025-02-18 Thread Tobias Burnus


Result of the commit, see: https://gcc.gnu.org/projects/gomp/

The main change is syncing a couple of now fully or partially supported items
from libgomp.texi's implementation status table.


Otherwise, as Sandra found out, a comma between directive and clauses in
'#pragma' has been supported for a while (since GCC 13; the .texi file
already has this right), and linking directly to the OpenMP section makes
sense now that it is available. (Thanks!)


Tobias
commit 08114aefac17271a87eeaa6394f1874bf90604ab
Author: Tobias Burnus 
Date:   Tue Feb 18 10:27:27 2025 +0100

projects/gomp/: Update OpenMP implementation status

Sync implementation status from libgomp.texi; fix one omission;
link to 'openmp' anchor for GCC 15.

Co-authored-by: Sandra Loosemore 
---
 htdocs/projects/gomp/index.html | 66 +++--
 1 file changed, 43 insertions(+), 23 deletions(-)

diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index a4fb4c98..97d14308 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -318,7 +318,7 @@ than listed, depending on resolved corner cases and optimizations.
   GCC 12
   GCC 13
   GCC 14
-  GCC 15
+  GCC 15
 
 
   (atomic_default_mem_order)
@@ -352,8 +352,10 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 declare variant directive
-GCC 10/GCC 11
-simd traits not handled correctly
+
+  GCC 10/GCC 11
+  GCC 15
+simd traits not handled correctly 
   
   
 use_device_addr clause on target data
@@ -474,7 +476,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 metadirective directive
-No
+GCC 15
 
   
   
@@ -486,7 +488,7 @@ than listed, depending on resolved corner cases and optimizations.
 allocate directive
 
   GCC 14
-  GCC 15
+  GCC 15
 
 
   Only C for stack/automatic and Fortran for stack/automatic and allocatable/pointer variables
@@ -691,12 +693,12 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 target_device trait in OpenMP Context
-No
+GCC 15
 
   
   
 target_device selector set in context selectors
-No
+GCC 15
 
   
   
@@ -706,17 +708,18 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 declare variant: new clauses adjust_args and append_args
-No
-
+GCC 15
+For append_args, all interop objects
+  must be specified in the interop clause of dispatch
   
   
 dispatch construct
-No
+GCC 15
 
   
   
 Loop transformation constructs
-GCC 15
+GCC 15
 
   
   
@@ -736,7 +739,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 omp_interop_t object support in runtime routines
-No
+GCC 15
 
   
   
@@ -763,7 +766,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 Optional comma between directive and clause in the #pragma form
-No
+GCC 13
 
   
   
@@ -781,6 +784,23 @@ than listed, depending on resolved corner cases and optimizations.
 GCC 14
 
   
+  
+Changed interaction between declare target and OpenMP context
+GCC 15
+
+  
+  
+Dynamic selector support in metadirective
+GCC 15
+
+  
+  
+Dynamic selector support in declare variant
+GCC 15
+Fortran rejects non-constant expressions in dynamic selectors; C/C++
+reject expressions using argument variables.
+(PR113904, https://gcc.gnu.org/PR113904)
+  
   
 ompt_sync_region_t enum additions
 No
@@ -893,7 +913,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 Optional paired end directive with dispatch
-No
+GCC 15
 
   
   
@@ -908,7 +928,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 New otherwise clause as alias for default on metadirectives
-No
+GCC 15
 
   
   
@@ -978,7 +998,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 interop_types in any position of the modifier list for the init clause of the interop construct
-No
+GCC 15
 
   
   
@@ -1123,7 +1143,7 @@ error.
   
 Extension of interop operation of append_args,
   allowing all modifiers of the init clause
-No
+GCC 15
 
   
   
@@ -1295,7 +1315,7 @@ error.
   
   
 interop clause to dispatch
-No
+GCC 15
 
   
   
@@ -1311,7 +1331,7 @@ error.
   
   
 self_maps clause to requires directive
-GCC 15
+GCC 15
 
   
   
@@ -1355,7 +1375,7 @@ error.
 
   
   
-stipe loop-transformation construct
+stripe loop-transformation construct
 No
 
   
@@ -1447,7 +1467,7 @@ error.
   
   
 Extended prefer-type modifier to init clause
-No
+GCC 15
 
   
   
@@ -1507,13 +1527,13 @@ error.
   
   
 omp_targ

Re: [PATCH v1] Vect: Fix ICE when get DImode from get_related_vectype_for_scalar_type [PR116351]

2025-02-18 Thread Richard Biener

On Tue, Feb 18, 2025 at 10:12 AM Richard Biener
 wrote:
>
> On Tue, Feb 18, 2025 at 9:40 AM Li, Pan2  wrote:
> >
> > Hi Richard,
> >
> > After some more investigation, the sample code never hits any of the
> > vectorizable_* routines that might check loop_vinfo->vector_mode,
> > so loop_vinfo->vector_mode == DImode reaches
> > vect_verify_loop_lens and triggers the VECTOR_MODE_P assert. The detailed
> > flow is below.
> >
> > vect_analyze_loop_2
> >  |- vect_pattern_recog // Hit over-widening pattern and set 
> > loop_vinfo->vector_mode to DImode
> >  |- ...
> >  |- vect_analyze_loop_operations
> >|- (gdb) p stmt_info->def_type
> >|- $1 = vect_reduction_def
> >|- (gdb) p stmt_info->slp_type
> >|- $2 = pure_slp
> >|- vectorizable_lc_phi // Not Hit
> >|- vectorizable_induction  // Not Hit
> >|- vectorizable_reduction  // Not Hit
> >|- vectorizable_recurr // Not Hit
> >|- vectorizable_live_operation  // Not Hit
> >|- vect_analyze_stmt
> >  |- (gdb) p stmt_info->relevant
> >  |- $3 = vect_unused_in_scope
> >  |- (gdb) p stmt_info->live
> >  |- $4 = false
> >  |- (gdb) p pattern_stmt_info
> >  |- $5 = (stmt_vec_info) 0x0
> >  |- return opt_result::success ();
> >  OR
> >  |- PURE_SLP_STMT (stmt_info) && !node then dump "handled only by SLP 
> > analysis\n"
> >|- Early return opt_result::success ();
> >  |- vectorizable_load/store/call_convert/... // Not Hit
> >|- LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P && !LOOP_VINFO_MASKS 
> > (loop_vinfo).is_empty ()
> >  |- vect_verify_loop_lens (loop_vinfo)
> >|- assert (VECTOR_MODE_P (loop_vinfo->vector_mode)); // Hit assert,
> > results in ICE
> >
> > I am hesitating between two options here.
> >
> > 1. Should we add a condition and a dump message here so that
> > vect_analyze_loop_2 fails when loop_vinfo->vector_mode is not a vector
> > mode supported by the target?
> > 2. Or should LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P not be set here? Then we
> > need to find out where the partial-vector flag is set to true.
> >
> > Do you have any suggestions?
>
> static bool
> vect_verify_loop_lens (loop_vec_info loop_vinfo)
> {
>   if (LOOP_VINFO_LENS (loop_vinfo).is_empty ())
>     return false;
>
>   machine_mode len_load_mode, len_store_mode;
>   if (!get_len_load_store_mode (loop_vinfo->vector_mode, true)
>        .exists (&len_load_mode))
>     return false;
>
> so the obvious fix would be to add
>
>   if (!VECTOR_MODE_P (loop_vinfo->vector_mode))
>     return false;
>
> here?  But then I wonder how we got to a DImode vector_mode and record
> a loop len in the first place.  I could imagine we first end up with DImode
> but other stmts using a vector mode and we record a len for those.  But then
> the above get_len_load_store_mode on ->vector_mode seems to assume that all
> modes we need a len for are "compatible" with ->vector_mode so I assume
> recording a LEN would check that.
>
> I can't reproduce the ICE with a cross on trunk btw.

Ah, it needs -march=rv64imd_xsfvcp.  So we indeed call vect_record_loop_len
with

(gdb) p debug_tree (vectype)
 
unit-size 
align:16 warn_if_not_align:0 symtab:0 alias-set 2
canonical-type 0x77017690 precision:16 min  max 
pointer_to_this >
RVVM2HI
(gdb) p loop_vinfo->vector_mode
$2 = E_DImode

from vectorizable_operation and ->vector_mode is set via
vect_recog_over_widening_pattern which commits to a DImode
vector type ->vector_mode prematurely.

The error is probably that vect_verify_loop_lens does not do anything
to ensure the checks are done on a relevant mode.  With the suggested
added check above this then becomes a missed optimization rather
than an ICE.  But it might fall apart if there's not one load/store len mode
to consider?
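
The effect of the suggested check can be sketched outside of GCC as follows. Everything here is a hypothetical stand-in: machine_mode_sketch, loop_info_sketch and len_load_store_mode_sketch only imitate the shape of machine_mode, loop_vec_info and get_len_load_store_mode, and the throwing helper models the assertion that the backtrace in this thread points at.

```cpp
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins; the real GCC types are far richer.
enum class mode_kind { scalar_int, vector };

struct machine_mode_sketch
{
  mode_kind kind;
  int id;
};

struct loop_info_sketch
{
  machine_mode_sketch vector_mode;
  std::vector<int> lens;   // stand-in for LOOP_VINFO_LENS
};

static bool
is_vector_mode (const machine_mode_sketch &m)
{
  return m.kind == mode_kind::vector;
}

// Models a query that is only valid for vector modes.  The exception plays
// the role of the failing assertion (assumption: the real function asserts).
static std::optional<machine_mode_sketch>
len_load_store_mode_sketch (const machine_mode_sketch &m)
{
  if (!is_vector_mode (m))
    throw std::logic_error ("queried with a non-vector mode");
  return m;
}

// Verification with the extra guard: a scalar mode such as DImode now makes
// the check fail cleanly instead of reaching the asserting query.
static bool
verify_loop_lens_sketch (const loop_info_sketch &info)
{
  if (info.lens.empty ())
    return false;
  if (!is_vector_mode (info.vector_mode))
    return false;
  return len_load_store_mode_sketch (info.vector_mode).has_value ();
}
```

In this model a loop whose vector_mode is a scalar integer mode is simply rejected for length-based partial vectors, which matches the "missed optimization rather than an ICE" outcome described above.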

>
> Richard.
>
> >
> > Pan
> >
> > -Original Message-
> > From: Li, Pan2
> > Sent: Monday, February 17, 2025 6:08 PM
> > To: Richard Biener 
> > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> > jeffreya...@gmail.com; rdapp@gmail.com
> > Subject: RE: [PATCH v1] Vect: Fix ICE when get DImode from 
> > get_related_vectype_for_scalar_type [PR116351]
> >
> > > But that's wrong - read the comment before the code.  We do support 
> > > integer mode
> > > "generic" vectorization just fine.  Iff there's anything to plug then
> > > it's how we end
> > > up thinking there's with_len support for DImode vectors.
> >
> > I see, then we need another place to fix this, let me have a try.
> >
> > Pan
> >
> > -Original Message-
> > From: Richard Biener 
> > Sent: Monday, February 17, 2025 6:02 PM
> > To: Li, Pan2 
> > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> > jeffreya...@gmail.com; rdapp@gmail.com
> > Subject: Re: [PATCH v1] Vect: Fix ICE when get DImode from 
> > get_related_vectype_for_scalar_type [PR116351]
> >
> > On Mon, Feb 17, 2025 at 10:38 AM  wrote:
> > >
> > > From: Pan Li 
> > >
> > > This patch

Re: [PATCH v1] Vect: Fix ICE when get DImode from get_related_vectype_for_scalar_type [PR116351]

2025-02-18 Thread Richard Biener

On Tue, Feb 18, 2025 at 9:40 AM Li, Pan2  wrote:
>
> Hi Richard,
>
> After some more investigation, the sample code never hits any of the
> vectorizable_* routines that might check loop_vinfo->vector_mode,
> so loop_vinfo->vector_mode == DImode reaches
> vect_verify_loop_lens and triggers the VECTOR_MODE_P assert. The detailed
> flow is below.
>
> vect_analyze_loop_2
>  |- vect_pattern_recog // Hit over-widening pattern and set 
> loop_vinfo->vector_mode to DImode
>  |- ...
>  |- vect_analyze_loop_operations
>|- (gdb) p stmt_info->def_type
>|- $1 = vect_reduction_def
>|- (gdb) p stmt_info->slp_type
>|- $2 = pure_slp
>|- vectorizable_lc_phi // Not Hit
>|- vectorizable_induction  // Not Hit
>|- vectorizable_reduction  // Not Hit
>|- vectorizable_recurr // Not Hit
>|- vectorizable_live_operation  // Not Hit
>|- vect_analyze_stmt
>  |- (gdb) p stmt_info->relevant
>  |- $3 = vect_unused_in_scope
>  |- (gdb) p stmt_info->live
>  |- $4 = false
>  |- (gdb) p pattern_stmt_info
>  |- $5 = (stmt_vec_info) 0x0
>  |- return opt_result::success ();
>  OR
>  |- PURE_SLP_STMT (stmt_info) && !node then dump "handled only by SLP 
> analysis\n"
>|- Early return opt_result::success ();
>  |- vectorizable_load/store/call_convert/... // Not Hit
>|- LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P && !LOOP_VINFO_MASKS 
> (loop_vinfo).is_empty ()
>  |- vect_verify_loop_lens (loop_vinfo)
>|- assert (VECTOR_MODE_P (loop_vinfo->vector_mode)); // Hit assert,
> results in ICE
>
> I am hesitating between two options here.
>
> 1. Should we add a condition and a dump message here so that
> vect_analyze_loop_2 fails when loop_vinfo->vector_mode is not a vector
> mode supported by the target?
> 2. Or should LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P not be set here? Then we
> need to find out where the partial-vector flag is set to true.
>
> Do you have any suggestions?

static bool
vect_verify_loop_lens (loop_vec_info loop_vinfo)
{
  if (LOOP_VINFO_LENS (loop_vinfo).is_empty ())
    return false;

  machine_mode len_load_mode, len_store_mode;
  if (!get_len_load_store_mode (loop_vinfo->vector_mode, true)
       .exists (&len_load_mode))
    return false;

so the obvious fix would be to add

  if (!VECTOR_MODE_P (loop_vinfo->vector_mode))
    return false;

here?  But then I wonder how we got to a DImode vector_mode and record
a loop len in the first place.  I could imagine we first end up with DImode
but other stmts using a vector mode and we record a len for those.  But then
the above get_len_load_store_mode on ->vector_mode seems to assume that all
modes we need a len for are "compatible" with ->vector_mode so I assume
recording a LEN would check that.

I can't reproduce the ICE with a cross on trunk btw.

Richard.

>
> Pan
>
> -Original Message-
> From: Li, Pan2
> Sent: Monday, February 17, 2025 6:08 PM
> To: Richard Biener 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> jeffreya...@gmail.com; rdapp@gmail.com
> Subject: RE: [PATCH v1] Vect: Fix ICE when get DImode from 
> get_related_vectype_for_scalar_type [PR116351]
>
> > But that's wrong - read the comment before the code.  We do support integer 
> > mode
> > "generic" vectorization just fine.  Iff there's anything to plug then
> > it's how we end
> > up thinking there's with_len support for DImode vectors.
>
> I see, then we need another place to fix this, let me have a try.
>
> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Monday, February 17, 2025 6:02 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v1] Vect: Fix ICE when get DImode from 
> get_related_vectype_for_scalar_type [PR116351]
>
> On Mon, Feb 17, 2025 at 10:38 AM  wrote:
> >
> > From: Pan Li 
> >
> > This patch fixes an ICE like the one below. Assume we have the following
> > sample code:
> >
> >1   │ int a, b, c;
> >2   │ short d, e, f;
> >3   │ long g (long h) { return h; }
> >4   │
> >5   │ void i () {
> >6   │   for (; b; ++b) {
> >7   │ f = 5 >> a ? d : d << a;
> >8   │ e &= c | g(f);
> >9   │   }
> >   10   │ }
> >
> > It will ICE when compiled with -O3 -march=rv64gc_zve64f -mrvv-vector-bits=zvl
> >
> > during GIMPLE pass: vect
> > pr116351-1.c: In function ‘i’:
> > pr116351-1.c:8:6: internal compiler error: in get_len_load_store_mode,
> > at optabs-tree.cc:655
> > 8 | void i () {
> >   |  ^
> > 0x44d6b9d internal_error(char const*, ...)
> > /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic-global-context.cc:517
> > 0x44a26a6 fancy_abort(char const*, int, char const*)
> > 
> > /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic.cc:1722
> > 0x19e4309 get_len_load_store_mode(machine_mode, bool, internal_fn*,
> > vec*)
> > 
> > /home/pli

[PATCH] aarch64: Use generic_armv8_a_prefetch_tune in generic_armv8_a.h

2025-02-18 Thread Soumya AR

generic_armv8_a.h defines generic_armv8_a_prefetch_tune but still uses
generic_prefetch_tune in generic_armv8_a_tunings.

This patch updates the pointer to generic_armv8_a_prefetch_tune.

This patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.

Ok for GCC 15 now?

Signed-off-by: Soumya AR 

gcc/ChangeLog:

* config/aarch64/tuning_models/generic_armv8_a.h: Updated prefetch
struct pointer.

---
gcc/config/aarch64/tuning_models/generic_armv8_a.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/tuning_models/generic_armv8_a.h b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
index 35de3f03296..01080cade46 100644
--- a/gcc/config/aarch64/tuning_models/generic_armv8_a.h
+++ b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
@@ -184,7 +184,7 @@ static const struct tune_params generic_armv8_a_tunings =
(AARCH64_EXTRA_TUNE_BASE
| AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
| AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */
- &generic_prefetch_tune,
+ &generic_armv8_a_prefetch_tune,
AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */
AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */
};
-- 
2.34.1

Re: [PATCH] COBOL 12/15 24K pos: Posix adapter framework

2025-02-18 Thread Richard Biener

On Mon, Feb 17, 2025 at 6:50 PM James K. Lowden
 wrote:
>
> On Sat, 15 Feb 2025 21:24:52 +
> Sam James  wrote:
>
> > > +prototypes.cpp: posix.txt
> > > +   awk -F'[/.]' '{ print $$6 }' $^ | \
> > > +   while read F; do echo "/* $$F */" && man 2 $$F | \
> > > +   ./scrape.awk -v funcname=$$6; done > $@~
> > > +   @mv $@~ $@
> > > +
> > > +posix.txt:
> > > +   zgrep -l 'POSIX[.]' /usr/share/man/man2/*z > $@~
> >
> > This will need reworking. It assumes the location of the man pages on
> > the system, assumes 'zgrep' exists, and assumes 'zgrep' can read the
> > man pages (the man pages may be compressed with something else; I know
> > such systems exist).
> >
> > I'm not sure this is really any less brittle or more robust than just
> > listing the actual functions you scraped out from your system.
>
> You might be reading more into this than you want to.
>
> As you saw in gcc/cobol/posix/README.md, the files in that directory are not 
> part of the compiler.  They are tools we provide that potentially make it 
> easier to generate user-defined COBOL functions that call functions in the C 
> standard library, in particular syscalls.  IMO they don't need to be perfect; 
> it is enough that they are good.
>
> The user need never touch this part of the system.  The compiler functions 
> without it.  It's there as a convenience and demonstration.  I hope to 
> encourage contributions from users to this directory in a "contrib/" kind of 
> way.
>
> There are dependencies beyond the ones you mention, not least (as documented) 
> the Python PLY module.  Anyone sitting down with this tool will have to 
> wrestle with it a bit.  I contend that, if the user needs more than a few 
> functions, it will be less trouble to engage the tool than to write them by 
> hand.
>
> I agree it could be improved.  For example,
>
> > +posix.txt:
> > + zgrep -l 'POSIX[.]' /usr/share/man/man2/*z > $@~
>
> could be
>
> posix.txt:
> $(ZGREP) -l 'POSIX[.]' $(MANDIR)/man/man2/*z > $@~
>
> but that doesn't gain us much, does it?  We could start over with autoconf & 
> automake, to ensure full portability.  But that would defeat the purpose.  
> What I want to provide here is a prototype, not a robust foolproof tool.
>
> I think a simple example -- even a brittle one loaded with assumptions -- is 
> easier to understand and serves as a better illustration than a complicated 
> one.  I want to provide such a tool as part of gcobol, to give the user a 
> facility not available from any other COBOL compiler.  I think it's better 
> included in the gcc distribution than as an SO post or FAQ at 
> http://www.cobolworx.com.
>
> I'm sure you agree we don't want to let this tail wag the dog.  With my 
> exegesis in mind, what would you recommend?  If it's limited to more 
> judicious use of makefile variables, I could surely implement those 
> suggestions.

So to simplify things at this point can we postpone merging this bit
then?  If you say it's more like a "contrib", wouldn't
putting it in the toplevel contrib/ directory be more appropriate?
Maybe in a contrib/cobol/ subdirectory?

Richard.

>
> --jkl
>

Re: [PATCH] aarch64: Use generic_armv8_a_prefetch_tune in generic_armv8_a.h

2025-02-18 Thread Kyrylo Tkachov

Hi Soumya

> On 18 Feb 2025, at 09:12, Soumya AR  wrote:
> 
> generic_armv8_a.h defines generic_armv8_a_prefetch_tune but still uses
> generic_prefetch_tune in generic_armv8_a_tunings.
> 
> This patch updates the pointer to generic_armv8_a_prefetch_tune.
> 
> This patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> 
> Ok for GCC 15 now?

Yes, this looks like a simple oversight.
Ok to push to master.
Thanks,
Kyrill

> 
> Signed-off-by: Soumya AR 
> 
> gcc/ChangeLog:
> 
> * config/aarch64/tuning_models/generic_armv8_a.h: Updated prefetch
> struct pointer.
> 
> ---
> gcc/config/aarch64/tuning_models/generic_armv8_a.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/aarch64/tuning_models/generic_armv8_a.h 
> b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
> index 35de3f03296..01080cade46 100644
> --- a/gcc/config/aarch64/tuning_models/generic_armv8_a.h
> +++ b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
> @@ -184,7 +184,7 @@ static const struct tune_params generic_armv8_a_tunings =
> (AARCH64_EXTRA_TUNE_BASE
> | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */
> - &generic_prefetch_tune,
> + &generic_armv8_a_prefetch_tune,
> AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */
> AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */
> };
> -- 
> 2.34.1
> 
> 
> 
>

[PATCH] arm: Remove inner 'fix:HF/SF/DF' from fixed-point patterns (PR 117712)

2025-02-18 Thread Christophe Lyon

As discussed in the PR, removing the inner 'fix:HF/SF/DF' fixes the
problem; other targets already define these patterns without it.
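
For context, fix_trunc<m>si2 and fixuns_trunc<m>si2 are the standard pattern names for signed and unsigned floating-point to integer conversion, which truncates toward zero. The sketch below shows the kind of source-level conversions that map to them; the function names are hypothetical and this is not the content of the new tests. The half-precision (HF) case is left out because it needs a target-specific type.

```cpp
// Hypothetical helpers illustrating float-to-integer conversions.
// C and C++ define these conversions as truncation toward zero.

// SFmode -> SImode, signed (corresponds to fix_truncsfsi2).
static int
trunc_sf_to_si (float x)
{
  return static_cast<int> (x);
}

// DFmode -> SImode, signed (corresponds to fix_truncdfsi2).
static int
trunc_df_to_si (double x)
{
  return static_cast<int> (x);
}

// SFmode -> SImode, unsigned (corresponds to fixuns_truncsfsi2).
static unsigned int
trunc_sf_to_usi (float x)
{
  return static_cast<unsigned int> (x);
}

// DFmode -> SImode, unsigned (corresponds to fixuns_truncdfsi2).
static unsigned int
trunc_df_to_usi (double x)
{
  return static_cast<unsigned int> (x);
}
```

The integer-mode 'fix' already expresses the truncating conversion on its own, which is why the patterns after this patch apply it directly to the floating-point operand.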

gcc/ChangeLog:

PR rtl-optimization/117712
* config/arm/arm.md (fix_trunchfsi2): Remove inner fix:HF.
(fix_trunchfdi2): Likewise.
(fix_truncsfsi2): Remove inner fix:SF.
(fix_truncdfsi2): Remove inner fix:DF.
* config/arm/vfp.md (truncsisf2_vfp): Remove inner fix:SF.
(truncsidf2_vfp): Remove inner fix:DF.
(fixuns_truncsfsi2): Remove inner fix:SF.
(fixuns_truncdfsi2): Remove inner fix:DF.

gcc/testsuite/ChangeLog:

PR rtl-optimization/117712
* gcc.target/arm/pr117712-df.c: New test.
* gcc.target/arm/pr117712-hf-di.c: New test.
* gcc.target/arm/pr117712-hf.c: New test.
* gcc.target/arm/pr117712-sf.c: New test.
---
 gcc/config/arm/arm.md |  8 
 gcc/config/arm/vfp.md |  8 
 gcc/testsuite/gcc.target/arm/pr117712-df.c| 10 ++
 gcc/testsuite/gcc.target/arm/pr117712-hf-di.c | 10 ++
 gcc/testsuite/gcc.target/arm/pr117712-hf.c| 10 ++
 gcc/testsuite/gcc.target/arm/pr117712-sf.c| 10 ++
 6 files changed, 48 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr117712-df.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pr117712-hf-di.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pr117712-hf.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pr117712-sf.c

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 442d86b9329..ed0d0da2e63 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5477,7 +5477,7 @@ (define_expand "floatsidf2"
 
 (define_expand "fix_trunchfsi2"
   [(set (match_operand:SI 0 "general_operand")
-   (fix:SI (fix:HF (match_operand:HF 1 "general_operand"))))]
+   (fix:SI (match_operand:HF 1 "general_operand")))]
   "TARGET_EITHER"
   "
   {
@@ -5489,7 +5489,7 @@ (define_expand "fix_trunchfsi2"
 
 (define_expand "fix_trunchfdi2"
   [(set (match_operand:DI 0 "general_operand")
-   (fix:DI (fix:HF (match_operand:HF 1 "general_operand"))))]
+   (fix:DI (match_operand:HF 1 "general_operand")))]
   "TARGET_EITHER"
   "
   {
@@ -5501,14 +5501,14 @@ (define_expand "fix_trunchfdi2"
 
 (define_expand "fix_truncsfsi2"
   [(set (match_operand:SI 0 "s_register_operand")
-   (fix:SI (fix:SF (match_operand:SF 1 "s_register_operand"))))]
+   (fix:SI (match_operand:SF 1 "s_register_operand")))]
   "TARGET_32BIT && TARGET_HARD_FLOAT"
   "
 ")
 
 (define_expand "fix_truncdfsi2"
   [(set (match_operand:SI 0 "s_register_operand")
-   (fix:SI (fix:DF (match_operand:DF 1 "s_register_operand"))))]
+   (fix:SI (match_operand:DF 1 "s_register_operand")))]
   "TARGET_32BIT && TARGET_HARD_FLOAT && !TARGET_VFP_SINGLE"
   "
 ")
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 379f5f7b3dc..0ef019b1727 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -1508,7 +1508,7 @@ (define_insn "truncsfhf2"
 
 (define_insn "*truncsisf2_vfp"
   [(set (match_operand:SI0 "s_register_operand" "=t")
-   (fix:SI (fix:SF (match_operand:SF 1 "s_register_operand" "t"))))]
+   (fix:SI (match_operand:SF 1 "s_register_operand" "t")))]
   "TARGET_32BIT && TARGET_HARD_FLOAT"
   "vcvt%?.s32.f32\\t%0, %1"
   [(set_attr "predicable" "yes")
@@ -1517,7 +1517,7 @@ (define_insn "*truncsisf2_vfp"
 
 (define_insn "*truncsidf2_vfp"
   [(set (match_operand:SI0 "s_register_operand" "=t")
-   (fix:SI (fix:DF (match_operand:DF 1 "s_register_operand" "w"))))]
+   (fix:SI (match_operand:DF 1 "s_register_operand" "w")))]
   "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP_DOUBLE"
   "vcvt%?.s32.f64\\t%0, %P1"
   [(set_attr "predicable" "yes")
@@ -1527,7 +1527,7 @@ (define_insn "*truncsidf2_vfp"
 
 (define_insn "fixuns_truncsfsi2"
   [(set (match_operand:SI0 "s_register_operand" "=t")
-   (unsigned_fix:SI (fix:SF (match_operand:SF 1 "s_register_operand" "t"))))]
+   (unsigned_fix:SI (match_operand:SF 1 "s_register_operand" "t")))]
   "TARGET_32BIT && TARGET_HARD_FLOAT"
   "vcvt%?.u32.f32\\t%0, %1"
   [(set_attr "predicable" "yes")
@@ -1536,7 +1536,7 @@ (define_insn "fixuns_truncsfsi2"
 
 (define_insn "fixuns_truncdfsi2"
   [(set (match_operand:SI0 "s_register_operand" "=t")
-   (unsigned_fix:SI (fix:DF (match_operand:DF 1 "s_register_operand" "t"))))]
+   (unsigned_fix:SI (match_operand:DF 1 "s_register_operand" "t")))]
   "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP_DOUBLE"
   "vcvt%?.u32.f64\\t%0, %P1"
   [(set_attr "predicable" "yes")
diff --git a/gcc/testsuite/gcc.target/arm/pr117712-df.c b/gcc/testsuite/gcc.target/arm/pr117712-df.c
new file mode 100644
index 000..534f2e4ed1d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr117712-df.c
@@ -0,0 +1,10 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -

Re: The COBOL front end, version 2, in 15-part harmony

2025-02-18 Thread Richard Biener

On Sat, Feb 15, 2025 at 10:01 PM James K. Lowden
 wrote:
>
> The following 15 patches constitute 134,033 lines of code in 97 files
> to build and document the COBOL front end.  The messages are
> grouped by files in a more or less logical order. We have:
>
>   4K dir  create gcc/cobol and libgcobol directories
>   8K pre  introduce ChangeLog files
>  92K bld  config and build machinery
> 436K cfg  libgcobol/configure
> 380K hdr  header files
> 156K lex  lexer
> 492K par  parser
> 360K cbl  parser support
> 532K api  GENERIC interface
> 252K gen  GENERIC interface support
>  72K doc  man pages and GnuCOBOL emulation
>  24K pos  Posix adapter framework
>  84K lhd  libgcobol header files
> 480K lib  libgcobol support
> 384K lcc  libgcobol, main file
>
> Except for "lib", patches over 400 KB consist of just one big file.

For a future possible version 3 of the patch set, you do not need to
send big generated files like 'configure' as part of the patch, but just
the sources/changes to their templates.

Thanks,
Richard.

> They are against the master branch as of
>
> commit 3e08a4ecea27c54fda90e8f58641b1986ad957e1
> Date:   Wed Feb 5 14:22:33 2025 -0700
>
> Our repository is
>
> https://gitlab.cobolworx.com/COBOLworx/gcc-cobol/
>
> using branch
>
> cobol-stage
>
> I tested these patches using "git apply" to an unpublished branch
> "cobol-patched". I will push it on request.  There are some whitespace
> warnings that I understand, and some I do not.  There is no trailing
> whitespace, and tabs occur only in lex/yacc files.
>
> I have endeavored to address all the issues raised in Round 1.  In
> particular:
>
> 1.  The patches are against a recent commit.
> 2.  Generated files use Autoconf 2.69.
> 3.  Flex and Bison outputs respect --enable-generated-files-in-srcdir.
> We use the gcc FLEX and BISON make variables.
> 4.  Documentation is generated as HTML and PDF.
> 5.  Python machinery has been patched to add 'cobol'
> 6.  ChangeLogs !
> 7.  libgcobol builds independent of gcc/cobol.  The library does not use
> compiler header files.  Shared information is maintained in library
> headers.
> 8.  --enable-languages=all works. gcobol supports x86_64 and aarch64
> (so far, for now). For unsupported targets, configure reports
> gcobol is not built.  We have built with multilib enabled and
> from bootstrap.
> 9.  Diagnostic messages go through the diagnostic framework, and report
> the location, including the column.
> 10. Use xasprintf & friends from libiberty. Removed PATH_MAX.
>
> Still to come:
>
> 11. Enumerated warnings in cobol/lang.opt.
> 12. texinfo update to describe gcobol
> 13. cross-compilation
>
> This patchset still excludes tests. I will supply tests separately.
> Simplest I think is to use the NIST test suite, assuming the code and
> documentation passes legal muster.
>
> I want to thank David and Matthias for their patches, which are
> incorporated.  My thanks too to the many people who contributed invaluable
> advice and offered encouragement.
>
> I remain obdurately hopeful the COBOL front end will be deemed ready
> for gcc-15. The von Clausewitz test of any compiler is the real world.
> Users kicking the tires push us to improve the compiler in ways that
> are practical to them. (Several features are now pending while we
> strive to meet reviewers' concerns.)  To that end, I have also prepared
> release notes for the www repository under separate cover.
>
> Thank you for your kind consideration of our work.
>
> --jkl
>
