[PATCH v10] RISC-V: Refactor requirement of ZVFH and ZVFHMIN.

2023-06-09 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch refactors the requirements of both ZVFH and ZVFHMIN. By
default, ZVFHMIN enables FP16 for all the RVV iterators, and ZVFH then
leverages one define_attr as the gate for whether FP16 is supported.

Please note that ZVFH covers the ZVFHMIN instructions. This patch
adds one test for this.
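
For reference, below is a minimal sketch of the kind of vle16 intrinsic
usage the new ZVFHMIN test exercises (illustrative only, written against
the standard RVV intrinsic names; the actual test is the one added to
zvfhmin-intrinsic.c).  Loads and stores of FP16 vectors only need
ZVFHMIN, while FP16 arithmetic such as vfadd additionally requires ZVFH.

#include <stddef.h>
#include <riscv_vector.h>

/* Copy FP16 data through RVV registers; needs only ZVFHMIN.  */
void
copy_f16 (const _Float16 *in, _Float16 *out, size_t n)
{
  while (n > 0)
    {
      size_t vl = __riscv_vsetvl_e16m1 (n);
      vfloat16m1_t v = __riscv_vle16_v_f16m1 (in, vl);
      __riscv_vse16_v_f16m1 (out, v, vl);
      in += vl;
      out += vl;
      n -= vl;
    }
}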

Signed-off-by: Pan Li 
Co-Authored by: Juzhe-Zhong 
Co-Authored by: Kito Cheng 

gcc/ChangeLog:

* config/riscv/riscv.md (enabled): Move to another place, and
add fp_vector_disabled to the cond.
(fp_vector_disabled): New attr defined for disabling fp.
* config/riscv/vector-iterators.md: Fix V_WHOLE and V_FRACT.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Add vle16 test
for ZVFHMIN.
---
 gcc/config/riscv/riscv.md | 39 ---
 gcc/config/riscv/vector-iterators.md  | 23 ++-
 .../riscv/rvv/base/zvfhmin-intrinsic.c| 15 ++-
 3 files changed, 59 insertions(+), 18 deletions(-)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 38b8fba2a53..d8e935cb934 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -239,12 +239,6 @@ (define_attr "ext_enabled" "no,yes"
]
(const_string "no")))
 
-;; Attribute to control enable or disable instructions.
-(define_attr "enabled" "no,yes"
-  (cond [(eq_attr "ext_enabled" "no")
-(const_string "no")]
-   (const_string "yes")))
-
 ;; Classification of each insn.
 ;; branch  conditional branch
 ;; jumpunconditional jump
@@ -434,6 +428,39 @@ (define_attr "type"
 (eq_attr "move_type" "rdvlenb") (const_string "rdvlenb")]
(const_string "unknown")))
 
+;; True if the float point vector is disabled.
+(define_attr "fp_vector_disabled" "no,yes"
+  (cond [
+(and (eq_attr "type" "vfmov,vfalu,vfmul,vfdiv,
+ vfwalu,vfwmul,vfmuladd,vfwmuladd,
+ vfsqrt,vfrecp,vfminmax,vfsgnj,vfcmp,
+ vfclass,vfmerge,
+ vfncvtitof,vfwcvtftoi,vfcvtftoi,vfcvtitof,
+ vfredo,vfredu,vfwredo,vfwredu,
+ vfslide1up,vfslide1down")
+(and (eq_attr "mode" 
"VNx1HF,VNx2HF,VNx4HF,VNx8HF,VNx16HF,VNx32HF,VNx64HF")
+ (match_test "!TARGET_ZVFH")))
+(const_string "yes")
+
+;; The mode records as QI for the FP16 <=> INT8 instruction.
+(and (eq_attr "type" "vfncvtftoi,vfwcvtitof")
+(and (eq_attr "mode" 
"VNx1QI,VNx2QI,VNx4QI,VNx8QI,VNx16QI,VNx32QI,VNx64QI")
+ (match_test "!TARGET_ZVFH")))
+(const_string "yes")
+  ]
+  (const_string "no")))
+
+;; Attribute to control enable or disable instructions.
+(define_attr "enabled" "no,yes"
+  (cond [
+(eq_attr "ext_enabled" "no")
+(const_string "no")
+
+(eq_attr "fp_vector_disabled" "yes")
+(const_string "no")
+  ]
+  (const_string "yes")))
+
 ;; Length of instruction in bytes.
 (define_attr "length" ""
(cond [
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index f4946d84449..234b712bc9d 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -453,9 +453,8 @@ (define_mode_iterator V_WHOLE [
   (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64")
   (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
"TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
 
-  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
-  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
-  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx2HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN == 32")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN == 64")
   (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
@@ -477,7 +476,11 @@ (define_mode_iterator V_WHOLE [
 (define_mode_iterator V_FRACT [
   (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI (VNx4QI "TARGET_MIN_VLEN > 32") 
(VNx8QI "TARGET_MIN_VLEN >= 128")
   (VNx1HI "TARGET_MIN_VLEN < 128") (VNx2HI "TARGET_MIN_VLEN > 32") (VNx4HI 
"TARGET_MIN_VLEN >= 128")
-  (VNx1HF "TARGET_MIN_VLEN < 128") (VNx2HF "TARGET_MIN_VLEN > 32") (VNx4HF 
"TARGET_MIN_VLEN >= 128")
+
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
+  (VNx2HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
+
   (VNx1SI "TARGET_MIN_VLEN > 32 && TARGET_MIN_VLEN < 128") (VNx2SI 
"TARGET_MIN_VLEN >= 128")
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32 && TARGET_MIN_VLEN 
< 128")
   (VNx2SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
@@ -497,12 +500,12 @@ (define_mode_iterator VWEXTI [
 ])
 
 (define_mode_iterator VWEXTF [
-  (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
-  (VNx2SF "

RE: RE: [PATCH v9] RISC-V: Refactor requirement of ZVFH and ZVFHMIN.

2023-06-09 Thread Li, Pan2 via Gcc-patches
Thanks Juzhe and Kito for reviewing; PATCH v10 is updated as below.

https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621104.html

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Li, Pan2 via Gcc-patches
Sent: Friday, June 9, 2023 2:41 PM
To: juzhe.zh...@rivai.ai; gcc-patches 
Cc: Robin Dapp ; jeffreyalaw ; 
Wang, Yanzhang ; kito.cheng 
Subject: RE: RE: [PATCH v9] RISC-V: Refactor requirement of ZVFH and ZVFHMIN.

Logically, yes, we should not change that, but here I would like to put all
the enable-related code together. I will remove this part as it may have no
relationship with this patch.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Friday, June 9, 2023 2:31 PM
To: Li, Pan2 ; gcc-patches 
Cc: Robin Dapp ; jeffreyalaw ; 
Wang, Yanzhang ; kito.cheng 
Subject: Re: RE: [PATCH v9] RISC-V: Refactor requirement of ZVFH and ZVFHMIN.

OK. But why change the place of these

-;; ISA attributes.
-(define_attr "ext" "base,f,d,vector"
-  (const_string "base"))
-
-;; True if the extension is enabled.
-(define_attr "ext_enabled" "no,yes"
-  (cond [(eq_attr "ext" "base")
- (const_string "yes")
-
- (and (eq_attr "ext" "f")
-   (match_test "TARGET_HARD_FLOAT"))
- (const_string "yes")
-
- (and (eq_attr "ext" "d")
-   (match_test "TARGET_DOUBLE_FLOAT"))
- (const_string "yes")
-
- (and (eq_attr "ext" "vector")
-   (match_test "TARGET_VECTOR"))
- (const_string "yes")
- ]
- (const_string "no")))
I think it should not be changed.



juzhe.zh...@rivai.ai

From: Li, Pan2
Date: 2023-06-09 14:23
To: juzhe.zh...@rivai.ai; 
gcc-patches
CC: Robin Dapp; 
jeffreyalaw; Wang, 
Yanzhang; 
kito.cheng
Subject: RE: [PATCH v9] RISC-V: Refactor requirement of ZVFH and ZVFHMIN.
-;; ISA attributes.
-(define_attr "ext" "base,f,d,vector"
-  (const_string "base"))
-
-;; True if the extension is enabled.
-(define_attr "ext_enabled" "no,yes"
-  (cond [(eq_attr "ext" "base")
- (const_string "yes")
-
- (and (eq_attr "ext" "f")
-   (match_test "TARGET_HARD_FLOAT"))
- (const_string "yes")
-
- (and (eq_attr "ext" "d")
-   (match_test "TARGET_DOUBLE_FLOAT"))
- (const_string "yes")
-
- (and (eq_attr "ext" "vector")
-   (match_test "TARGET_VECTOR"))
- (const_string "yes")
- ]
- (const_string "no")))
>> Why change this ?
As the fp attribute will reference the type attr, we should move this part
after the type attr definition.

-;; Attribute to control enable or disable instructions.
-(define_attr "enabled" "no,yes"
-  (cond [(eq_attr "ext_enabled" "no")
- (const_string "no")]
- (const_string "yes")))

>> I think it should only add fp16_vector_disable. However, it seems the whole 
>> thing is removed?
The same as above: it is moved to the place after the type attr definition,
and only fp_vector_disabled is added here.

>> This should be in vector.md instead of riscv.md
It will trigger “unknown attribute `fp_vector_disabled' in definition of
attribute `enabled'”, because riscv.md includes vector.md at the end of the file.

Pan

From: juzhe.zh...@rivai.ai 
mailto:juzhe.zh...@rivai.ai>>
Sent: Friday, June 9, 2023 2:14 PM
To: Li, Pan2 mailto:pan2...@intel.com>>; gcc-patches 
mailto:gcc-patches@gcc.gnu.org>>
Cc: Robin Dapp mailto:rdapp@gmail.com>>; jeffreyalaw 
mailto:jeffreya...@gmail.com>>; Li, Pan2 
mailto:pan2...@intel.com>>; Wang, Yanzhang 
mailto:yanzhang.w...@intel.com>>; kito.cheng 
mailto:kito.ch...@gmail.com>>
Subject: Re: [PATCH v9] RISC-V: Refactor requirement of ZVFH and ZVFHMIN.

-;; ISA attributes.
-(define_attr "ext" "base,f,d,vector"
-  (const_string "base"))
-
-;; True if the extension is enabled.
-(define_attr "ext_enabled" "no,yes"
-  (cond [(eq_attr "ext" "base")
- (const_string "yes")
-
- (and (eq_attr "ext" "f")
-   (match_test "TARGET_HARD_FLOAT"))
- (const_string "yes")
-
- (and (eq_attr "ext" "d")
-   (match_test "TARGET_DOUBLE_FLOAT"))
- (const_string "yes")
-
- (and (eq_attr "ext" "vector")
-   (match_test "TARGET_VECTOR"))
- (const_string "yes")
- ]
- (const_string "no")))


Why change this ?

-;; Attribute to control enable or disable instructions.
-(define_attr "enabled" "no,yes"
-  (cond [(eq_attr "ext_enabled" "no")
- (const_string "no")]
- (const_string "yes")))

I think it should only add fp16_vector_disable. However, it seems the whole 
thing is removed?

+;; True if the float point vector is disabled.
+(define_attr "fp_vector_disabled" "no,yes"
+  (cond [
+(and (eq_attr "type" "vfmov,vfalu,vfmul,vfdiv,
+   vfwalu,vfwmul,vfmuladd,vfwmuladd,
+   vfsqrt,vfrecp,vfminmax,vfsgnj,vfcmp,
+   vfclass,vfmerge,
+   vfncvtitof,vfwcvtftoi,vfcvtftoi,vfcvtitof,
+   vfredo,vfredu,vfwredo,vfwredu,
+   vfslide1up,vfslide1down")
+ (and (eq_attr "mode" "VNx1HF,VNx2HF,VNx4HF,VNx8HF,VNx16HF,VNx32HF,VNx64HF")
+   (match_te

Re: [PATCH v10] RISC-V: Refactor requirement of ZVFH and ZVFHMIN.

2023-06-09 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-09 15:07
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v10] RISC-V: Refactor requirement of ZVFH and ZVFHMIN.
From: Pan Li 
 
This patch refactors the requirements of both ZVFH and ZVFHMIN. By
default, ZVFHMIN enables FP16 for all the RVV iterators, and ZVFH then
leverages one define_attr as the gate for whether FP16 is supported.
 
Please note that ZVFH covers the ZVFHMIN instructions. This patch
adds one test for this.
 
Signed-off-by: Pan Li 
Co-Authored by: Juzhe-Zhong 
Co-Authored by: Kito Cheng 
 
gcc/ChangeLog:
 
* config/riscv/riscv.md (enabled): Move to another place, and
add fp_vector_disabled to the cond.
(fp_vector_disabled): New attr defined for disabling fp.
* config/riscv/vector-iterators.md: Fix V_WHOLE and V_FRACT.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Add vle16 test
for ZVFHMIN.
---
gcc/config/riscv/riscv.md | 39 ---
gcc/config/riscv/vector-iterators.md  | 23 ++-
.../riscv/rvv/base/zvfhmin-intrinsic.c| 15 ++-
3 files changed, 59 insertions(+), 18 deletions(-)
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 38b8fba2a53..d8e935cb934 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -239,12 +239,6 @@ (define_attr "ext_enabled" "no,yes"
]
(const_string "no")))
-;; Attribute to control enable or disable instructions.
-(define_attr "enabled" "no,yes"
-  (cond [(eq_attr "ext_enabled" "no")
- (const_string "no")]
- (const_string "yes")))
-
;; Classification of each insn.
;; branch conditional branch
;; jump unconditional jump
@@ -434,6 +428,39 @@ (define_attr "type"
(eq_attr "move_type" "rdvlenb") (const_string "rdvlenb")]
(const_string "unknown")))
+;; True if the float point vector is disabled.
+(define_attr "fp_vector_disabled" "no,yes"
+  (cond [
+(and (eq_attr "type" "vfmov,vfalu,vfmul,vfdiv,
+   vfwalu,vfwmul,vfmuladd,vfwmuladd,
+   vfsqrt,vfrecp,vfminmax,vfsgnj,vfcmp,
+   vfclass,vfmerge,
+   vfncvtitof,vfwcvtftoi,vfcvtftoi,vfcvtitof,
+   vfredo,vfredu,vfwredo,vfwredu,
+   vfslide1up,vfslide1down")
+ (and (eq_attr "mode" "VNx1HF,VNx2HF,VNx4HF,VNx8HF,VNx16HF,VNx32HF,VNx64HF")
+   (match_test "!TARGET_ZVFH")))
+(const_string "yes")
+
+;; The mode records as QI for the FP16 <=> INT8 instruction.
+(and (eq_attr "type" "vfncvtftoi,vfwcvtitof")
+ (and (eq_attr "mode" "VNx1QI,VNx2QI,VNx4QI,VNx8QI,VNx16QI,VNx32QI,VNx64QI")
+   (match_test "!TARGET_ZVFH")))
+(const_string "yes")
+  ]
+  (const_string "no")))
+
+;; Attribute to control enable or disable instructions.
+(define_attr "enabled" "no,yes"
+  (cond [
+(eq_attr "ext_enabled" "no")
+(const_string "no")
+
+(eq_attr "fp_vector_disabled" "yes")
+(const_string "no")
+  ]
+  (const_string "yes")))
+
;; Length of instruction in bytes.
(define_attr "length" ""
(cond [
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index f4946d84449..234b712bc9d 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -453,9 +453,8 @@ (define_mode_iterator V_WHOLE [
   (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64")
   (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
"TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
-  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
-  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
-  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx2HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN == 32")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN == 64")
   (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
@@ -477,7 +476,11 @@ (define_mode_iterator V_WHOLE [
(define_mode_iterator V_FRACT [
   (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI (VNx4QI "TARGET_MIN_VLEN > 32") 
(VNx8QI "TARGET_MIN_VLEN >= 128")
   (VNx1HI "TARGET_MIN_VLEN < 128") (VNx2HI "TARGET_MIN_VLEN > 32") (VNx4HI 
"TARGET_MIN_VLEN >= 128")
-  (VNx1HF "TARGET_MIN_VLEN < 128") (VNx2HF "TARGET_MIN_VLEN > 32") (VNx4HF 
"TARGET_MIN_VLEN >= 128")
+
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
+  (VNx2HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
+
   (VNx1SI "TARGET_MIN_VLEN > 32 && TARGET_MIN_VLEN < 128") (VNx2SI 
"TARGET_MIN_VLEN >= 128")
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32 && TARGET_MIN_VLEN 
< 128")
   (VNx2SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
@@ -497,12 +500,12 @@ (define_mode_iterator VWEXTI [
])
(define_mode_iterator VWEXTF [
-  (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
-  (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
-  (VNx4SF "TARGET_VECTOR_E

[PATCH] middle-end/110182 - TYPE_PRECISION on VECTOR_TYPE causes wrong-code

2023-06-09 Thread Richard Biener via Gcc-patches
When folding two conversions in a row we use TYPE_PRECISION but
that's invalid for VECTOR_TYPE.  The following fixes this by
using element_precision instead.
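
For context, element_precision conceptually reduces to something like
the sketch below (this assumes GCC's internal tree API; the real
function in tree.cc also copes with non-type operands):

/* Sketch only: for vectors (and complex types) the meaningful
   precision is that of the element, which is what this folding
   needs; TYPE_PRECISION of the vector itself is not meaningful.  */
static unsigned int
element_precision_sketch (const_tree type)
{
  if (VECTOR_TYPE_P (type) || TREE_CODE (type) == COMPLEX_TYPE)
    type = TREE_TYPE (type);
  return TYPE_PRECISION (type);
}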

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

* match.pd (two conversions in a row): Use element_precision
to DTRT for VECTOR_TYPE.
---
 gcc/match.pd | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 4ad037d641a..4072afb474a 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4147,19 +4147,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   int inside_ptr = POINTER_TYPE_P (inside_type);
   int inside_float = FLOAT_TYPE_P (inside_type);
   int inside_vec = VECTOR_TYPE_P (inside_type);
-  unsigned int inside_prec = TYPE_PRECISION (inside_type);
+  unsigned int inside_prec = element_precision (inside_type);
   int inside_unsignedp = TYPE_UNSIGNED (inside_type);
   int inter_int = INTEGRAL_TYPE_P (inter_type);
   int inter_ptr = POINTER_TYPE_P (inter_type);
   int inter_float = FLOAT_TYPE_P (inter_type);
   int inter_vec = VECTOR_TYPE_P (inter_type);
-  unsigned int inter_prec = TYPE_PRECISION (inter_type);
+  unsigned int inter_prec = element_precision (inter_type);
   int inter_unsignedp = TYPE_UNSIGNED (inter_type);
   int final_int = INTEGRAL_TYPE_P (type);
   int final_ptr = POINTER_TYPE_P (type);
   int final_float = FLOAT_TYPE_P (type);
   int final_vec = VECTOR_TYPE_P (type);
-  unsigned int final_prec = TYPE_PRECISION (type);
+  unsigned int final_prec = element_precision (type);
   int final_unsignedp = TYPE_UNSIGNED (type);
 }
(switch
-- 
2.35.3


[PATCH] Prevent TYPE_PRECISION on VECTOR_TYPEs

2023-06-09 Thread Richard Biener via Gcc-patches
The following makes sure that using TYPE_PRECISION on VECTOR_TYPE
ICEs when tree checking is enabled.  This should turn wrong-code
bugs like PR110182 into ICEs instead.

Bootstrap and regtest pending on x86_64-unknown-linux-gnu; I guess
there will be some fallout from such a change ...

* tree.h (TYPE_PRECISION): Check for non-VECTOR_TYPE.
---
 gcc/tree.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree.h b/gcc/tree.h
index 1854fe4a7d4..9c525d14474 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -2191,7 +2191,8 @@ class auto_suppress_location_wrappers
 #define TYPE_SIZE_UNIT(NODE) (TYPE_CHECK (NODE)->type_common.size_unit)
 #define TYPE_POINTER_TO(NODE) (TYPE_CHECK (NODE)->type_common.pointer_to)
 #define TYPE_REFERENCE_TO(NODE) (TYPE_CHECK (NODE)->type_common.reference_to)
-#define TYPE_PRECISION(NODE) (TYPE_CHECK (NODE)->type_common.precision)
+#define TYPE_PRECISION(NODE) \
+  (TREE_NOT_CHECK (TYPE_CHECK (NODE), VECTOR_TYPE)->type_common.precision)
 #define TYPE_NAME(NODE) (TYPE_CHECK (NODE)->type_common.name)
 #define TYPE_NEXT_VARIANT(NODE) (TYPE_CHECK (NODE)->type_common.next_variant)
 #define TYPE_MAIN_VARIANT(NODE) (TYPE_CHECK (NODE)->type_common.main_variant)
-- 
2.35.3


[PATCH v1] RISC-V: Fix one warning of frm enum.

2023-06-09 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch fixes one warning similar to the one below, and adds a
link to where the values come from.

./gcc/config/riscv/riscv-protos.h:260:13: warning: binary constants are
a C++14 feature or GCC extension
FRM_RNE = 0b000,
  ^

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum frm_field_enum): Adjust
literal to int.
---
 gcc/config/riscv/riscv-protos.h | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 38e4125424b..66c1f535d60 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -254,15 +254,18 @@ enum vxrm_field_enum
   VXRM_RDN,
   VXRM_ROD
 };
-/* Rounding mode bitfield for floating point FRM.  */
+/* Rounding mode bitfield for floating point FRM.  The value of enum comes
+   from the below link.
+   
https://github.com/riscv/riscv-isa-manual/blob/main/src/f-st-ext.adoc#floating-point-control-and-status-register
+ */
 enum frm_field_enum
 {
-  FRM_RNE = 0b000,
-  FRM_RTZ = 0b001,
-  FRM_RDN = 0b010,
-  FRM_RUP = 0b011,
-  FRM_RMM = 0b100,
-  FRM_DYN = 0b111
+  FRM_RNE = 0, /* Aka 0b000.  */
+  FRM_RTZ = 1, /* Aka 0b001.  */
+  FRM_RDN = 2, /* Aka 0b010.  */
+  FRM_RUP = 3, /* Aka 0b011.  */
+  FRM_RMM = 4, /* Aka 0b100.  */
+  FRM_DYN = 7, /* Aka 0b111.  */
 };
 
 opt_machine_mode vectorize_related_mode (machine_mode, scalar_mode,
-- 
2.34.1



Re: [PATCH] Make sure SCALAR_INT_MODE_P before invoke try_const_anchors

2023-06-09 Thread Richard Biener via Gcc-patches
On Fri, 9 Jun 2023, Jiufu Guo wrote:

> Hi,
> 
> As checking the code, there is a "gcc_assert (SCALAR_INT_MODE_P (mode))"
> in "try_const_anchors".
> This assert seems correct because the function try_const_anchors cares
> about integer values currently, and modes other than SCALAR_INT_MODE_P
> are not needed to support.
> 
> This patch makes sure SCALAR_INT_MODE_P when calling try_const_anchors.
> 
> This patch is raised when drafting below one.
> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html.
> With that patch, "{[%1:DI]=0;} stack_tie" with BLKmode runs into
> try_const_anchors, and hits the assert/ice.
> 
> Boostrap and regtest pass on ppc64{,le} and x86_64.
> Is this ok for trunk?

Iff the correct fix at all (how can a CONST_INT have BLKmode?) then
I suggest to instead fix try_const_anchors to change

  /* CONST_INT is used for CC modes, but we should leave those alone.  */
  if (GET_MODE_CLASS (mode) == MODE_CC)
return NULL_RTX;

  gcc_assert (SCALAR_INT_MODE_P (mode));

to

  /* CONST_INT is used for CC modes, leave any non-scalar-int mode alone.  */
  if (!SCALAR_INT_MODE_P (mode))
return NULL_RTX;

but as said I wonder how we arrive at a BLKmode CONST_INT and whether
we should have fended this off earlier.  Can you share more complete
RTL of that stack_tie?

> 
> BR,
> Jeff (Jiufu Guo)
> 
> gcc/ChangeLog:
> 
>   * cse.cc (cse_insn): Add SCALAR_INT_MODE_P condition.
> 
> ---
>  gcc/cse.cc | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/cse.cc b/gcc/cse.cc
> index 2bb63ac4105..f213fa0faf7 100644
> *** a/gcc/cse.cc
> --- b/gcc/cse.cc
> ***
> *** 5003,5009 
> if (targetm.const_anchor
> && !src_related
> && src_const
> !   && GET_CODE (src_const) == CONST_INT)
>   {
> src_related = try_const_anchors (src_const, mode);
> src_related_is_const_anchor = src_related != NULL_RTX;
> - - 
> --- 5003,5010 
> if (targetm.const_anchor
> && !src_related
> && src_const
> !   && GET_CODE (src_const) == CONST_INT
> !   && SCALAR_INT_MODE_P (mode))
>   {
> src_related = try_const_anchors (src_const, mode);
> src_related_is_const_anchor = src_related != NULL_RTX;
> 2.39.3
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


[committed] fortran: Fix ICE on pr96024.f90 on big-endian hosts [PR96024]

2023-06-09 Thread Jakub Jelinek via Gcc-patches
Hi!

The pr96024.f90 testcase ICEs on big-endian hosts.  The problem is
that length->val.integer is accessed after checking
length->expr_type == EXPR_CONSTANT, but it is a CHARACTER constant
which uses the length->val.character union member instead.  On big-endian
hosts we end up reading constant 0x1 rather than the small number
seen on little-endian, and if the target doesn't have enough memory for
4 times that (i.e. a 16GB allocation), it ICEs.
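
As an illustration of why host endianness matters here, consider this
small standalone sketch (hypothetical layout, not gfortran's actual
gfc_expr; the point is only the wrong-union-member read):

#include <stdint.h>
#include <stdio.h>

union expr_val
{
  int64_t integer;
  struct { int32_t length; const char *string; } character;
};

int
main (void)
{
  union expr_val v = { .character = { 4, "abcd" } };
  /* On a big-endian 64-bit host the 32-bit length lands in the
     high-order bytes of the 8-byte integer, so v.integer reads as a
     huge number; on little-endian it reads as a small one.  */
  printf ("%lld\n", (long long) v.integer);
  return 0;
}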

Fixed thusly, bootstrapped/regtested on
{x86_64,i686,powerpc64le,aarch64,s390x}-linux, preapproved in bugzilla
by Harald, committed to trunk and 13, 12, 11 and 10 release branches.

2023-06-09  Jakub Jelinek  

PR fortran/96024
* primary.cc (gfc_convert_to_structure_constructor): Only do
constant string ctor length verification and truncation/padding
if constant length has INTEGER type.

--- gcc/fortran/primary.cc.jj   2023-05-20 15:31:09.183661713 +0200
+++ gcc/fortran/primary.cc  2023-06-08 11:49:39.354875373 +0200
@@ -3188,10 +3188,11 @@ gfc_convert_to_structure_constructor (gf
goto cleanup;
 
   /* For a constant string constructor, make sure the length is
-correct; truncate of fill with blanks if needed.  */
+correct; truncate or fill with blanks if needed.  */
   if (this_comp->ts.type == BT_CHARACTER && !this_comp->attr.allocatable
  && this_comp->ts.u.cl && this_comp->ts.u.cl->length
  && this_comp->ts.u.cl->length->expr_type == EXPR_CONSTANT
+ && this_comp->ts.u.cl->length->ts.type == BT_INTEGER
  && actual->expr->ts.type == BT_CHARACTER
  && actual->expr->expr_type == EXPR_CONSTANT)
{

Jakub



Re: [PATCH v1] RISC-V: Fix one warning of frm enum.

2023-06-09 Thread juzhe.zh...@rivai.ai
Ok.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-09 15:53
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Fix one warning of frm enum.
From: Pan Li 
 
This patch fixes one warning similar to the one below, and adds a
link to where the values come from.
 
./gcc/config/riscv/riscv-protos.h:260:13: warning: binary constants are
a C++14 feature or GCC extension
FRM_RNE = 0b000,
  ^
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-protos.h (enum frm_field_enum): Adjust
literal to int.
---
gcc/config/riscv/riscv-protos.h | 17 ++---
1 file changed, 10 insertions(+), 7 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 38e4125424b..66c1f535d60 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -254,15 +254,18 @@ enum vxrm_field_enum
   VXRM_RDN,
   VXRM_ROD
};
-/* Rounding mode bitfield for floating point FRM.  */
+/* Rounding mode bitfield for floating point FRM.  The value of enum comes
+   from the below link.
+   
https://github.com/riscv/riscv-isa-manual/blob/main/src/f-st-ext.adoc#floating-point-control-and-status-register
+ */
enum frm_field_enum
{
-  FRM_RNE = 0b000,
-  FRM_RTZ = 0b001,
-  FRM_RDN = 0b010,
-  FRM_RUP = 0b011,
-  FRM_RMM = 0b100,
-  FRM_DYN = 0b111
+  FRM_RNE = 0, /* Aka 0b000.  */
+  FRM_RTZ = 1, /* Aka 0b001.  */
+  FRM_RDN = 2, /* Aka 0b010.  */
+  FRM_RUP = 3, /* Aka 0b011.  */
+  FRM_RMM = 4, /* Aka 0b100.  */
+  FRM_DYN = 7, /* Aka 0b111.  */
};
opt_machine_mode vectorize_related_mode (machine_mode, scalar_mode,
-- 
2.34.1
 
 


Re: [PATCH v1] RISC-V: Fix one warning of frm enum.

2023-06-09 Thread Kito Cheng via Gcc-patches
Lgtm

juzhe.zh...@rivai.ai wrote on Friday, June 9, 2023 at 16:08:

> Ok.
>
>
>
> juzhe.zh...@rivai.ai
>
> From: pan2.li
> Date: 2023-06-09 15:53
> To: gcc-patches
> CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang;
> kito.cheng
> Subject: [PATCH v1] RISC-V: Fix one warning of frm enum.
> From: Pan Li 
>
> This patch would like to fix one warning similar as below, and add the
> link for where the values comes from.
>
> ./gcc/config/riscv/riscv-protos.h:260:13: warning: binary constants are
> a C++14 feature or GCC extension
> FRM_RNE = 0b000,
>   ^
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-protos.h (enum frm_field_enum): Adjust
> literal to int.
> ---
> gcc/config/riscv/riscv-protos.h | 17 ++---
> 1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-protos.h
> b/gcc/config/riscv/riscv-protos.h
> index 38e4125424b..66c1f535d60 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -254,15 +254,18 @@ enum vxrm_field_enum
>VXRM_RDN,
>VXRM_ROD
> };
> -/* Rounding mode bitfield for floating point FRM.  */
> +/* Rounding mode bitfield for floating point FRM.  The value of enum comes
> +   from the below link.
> +
> https://github.com/riscv/riscv-isa-manual/blob/main/src/f-st-ext.adoc#floating-point-control-and-status-register
> + */
> enum frm_field_enum
> {
> -  FRM_RNE = 0b000,
> -  FRM_RTZ = 0b001,
> -  FRM_RDN = 0b010,
> -  FRM_RUP = 0b011,
> -  FRM_RMM = 0b100,
> -  FRM_DYN = 0b111
> +  FRM_RNE = 0, /* Aka 0b000.  */
> +  FRM_RTZ = 1, /* Aka 0b001.  */
> +  FRM_RDN = 2, /* Aka 0b010.  */
> +  FRM_RUP = 3, /* Aka 0b011.  */
> +  FRM_RMM = 4, /* Aka 0b100.  */
> +  FRM_DYN = 7, /* Aka 0b111.  */
> };
> opt_machine_mode vectorize_related_mode (machine_mode, scalar_mode,
> --
> 2.34.1
>
>
>


Re: [PATCH V5] VECT: Add SELECT_VL support

2023-06-09 Thread Richard Biener via Gcc-patches
On Thu, 8 Jun 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> Co-authored-by: Richard Sandiford
> Co-authored-by: Richard Biener 
> 
> This patch address comments from Richard && Richi and rebase to trunk.
> 
> This patch is adding SELECT_VL middle-end support
> allow target have target dependent optimization in case of
> length calculation.
> 
> This patch is inspired by RVV ISA and LLVM:
> https://reviews.llvm.org/D99750
> 
> The SELECT_VL is same behavior as LLVM "get_vector_length" with
> these following properties:
> 
> 1. Only apply on single-rgroup.
> 2. non SLP.
> 3. adjust loop control IV.
> 4. adjust data reference IV.
> 5. allow non-vf elements processing in non-final iteration
> 
> Code:
># void vvaddint32(size_t n, const int*x, const int*y, int*z)
> # { for (size_t i=0; i<n; i++) { z[i]=x[i]+y[i]; } }
> Take RVV codegen for example:
> 
> Before this patch:
> vvaddint32:
> ble a0,zero,.L6
> csrra4,vlenb
> srlia6,a4,2
> .L4:
> mv  a5,a0
> bleua0,a6,.L3
> mv  a5,a6
> .L3:
> vsetvli zero,a5,e32,m1,ta,ma
> vle32.v v2,0(a1)
> vle32.v v1,0(a2)
> vsetvli a7,zero,e32,m1,ta,ma
> sub a0,a0,a5
> vadd.vv v1,v1,v2
> vsetvli zero,a5,e32,m1,ta,ma
> vse32.v v1,0(a3)
> add a2,a2,a4
> add a3,a3,a4
> add a1,a1,a4
> bne a0,zero,.L4
> .L6:
> ret
> 
> After this patch:
> 
> vvaddint32:
> vsetvli t0, a0, e32, ta, ma  # Set vector length based on 32-bit vectors
> vle32.v v0, (a1) # Get first vector
>   sub a0, a0, t0 # Decrement number done
>   slli t0, t0, 2 # Multiply number done by 4 bytes
>   add a1, a1, t0 # Bump pointer
> vle32.v v1, (a2) # Get second vector
>   add a2, a2, t0 # Bump pointer
> vadd.vv v2, v0, v1   # Sum vectors
> vse32.v v2, (a3) # Store result
>   add a3, a3, t0 # Bump pointer
>   bnez a0, vvaddint32# Loop back
>   ret# Finished
> 
> gcc/ChangeLog:
> 
> * doc/md.texi: Add SELECT_VL support.
> * internal-fn.def (SELECT_VL): Ditto.
> * optabs.def (OPTAB_D): Ditto.
> * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Ditto.
> * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Ditto.
> * tree-vect-stmts.cc (get_select_vl_data_ref_ptr): Ditto.
> (vectorizable_store): Ditto.
> (vectorizable_load): Ditto.
> * tree-vectorizer.h (LOOP_VINFO_USING_SELECT_VL_P): Ditto.
>
> Co-authored-by: Richard Sandiford 
> Co-authored-by: Richard Biener 
> 
> ---
>  gcc/doc/md.texi | 22 ++
>  gcc/internal-fn.def |  1 +
>  gcc/optabs.def  |  1 +
>  gcc/tree-vect-loop-manip.cc | 32 ++
>  gcc/tree-vect-loop.cc   | 72 +++
>  gcc/tree-vect-stmts.cc  | 86 -
>  gcc/tree-vectorizer.h   |  6 +++
>  7 files changed, 201 insertions(+), 19 deletions(-)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 6a435eb4461..95f7fe1f802 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4974,6 +4974,28 @@ for (i = 1; i < operand3; i++)
>operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
>  @end smallexample
>  
> +@cindex @code{select_vl@var{m}} instruction pattern
> +@item @code{select_vl@var{m}}
> +Set operand 0 to the number of scalar iterations that should be handled
> +by one iteration of a vector loop.  Operand 1 is the total number of
> +scalar iterations that the loop needs to process and operand 2 is a
> +maximum bound on the result (also known as the maximum ``vectorization
> +factor'').
> +
> +The maximum value of operand 0 is given by:
> +@smallexample
> +operand0 = MIN (operand1, operand2)
> +@end smallexample
> +However, targets might choose a lower value than this, based on
> +target-specific criteria.  Each iteration of the vector loop might
> +therefore process a different number of scalar iterations, which in turn
> +means that induction variables will have a variable step.  Because of
> +this, it is generally not useful to define this instruction if it will
> +always calculate the maximum value.
> +
> +This optab is only useful on targets that implement @samp{len_load_@var{m}}
> +and/or @samp{len_store_@var{m}}.
> +
>  @cindex @code{check_raw_ptrs@var{m}} instruction pattern
>  @item @samp{check_raw_ptrs@var{m}}
>  Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 3ac9d82aace..5d638de6d06 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -177,6 +177,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_SET, 0, vec_set, vec_set)
>  DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, len_store, len_store)
>  
>  DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)

RE: [PATCH v1] RISC-V: Fix one warning of frm enum.

2023-06-09 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito and Juzhe.

Pan

From: Kito Cheng 
Sent: Friday, June 9, 2023 4:11 PM
To: juzhe.zh...@rivai.ai
Cc: Robin Dapp ; gcc-patches ; 
jeffreyalaw ; Li, Pan2 ; Wang, 
Yanzhang 
Subject: Re: [PATCH v1] RISC-V: Fix one warning of frm enum.

Lgtm

juzhe.zh...@rivai.ai 
mailto:juzhe.zh...@rivai.ai>> wrote on Friday, June 9, 2023 at 16:08:
Ok.



juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-06-09 15:53
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; 
yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Fix one warning of frm enum.
From: Pan Li mailto:pan2...@intel.com>>

This patch fixes one warning similar to the one below, and adds a
link to where the values come from.

./gcc/config/riscv/riscv-protos.h:260:13: warning: binary constants are
a C++14 feature or GCC extension
FRM_RNE = 0b000,
  ^

Signed-off-by: Pan Li mailto:pan2...@intel.com>>

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum frm_field_enum): Adjust
literal to int.
---
gcc/config/riscv/riscv-protos.h | 17 ++---
1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 38e4125424b..66c1f535d60 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -254,15 +254,18 @@ enum vxrm_field_enum
   VXRM_RDN,
   VXRM_ROD
};
-/* Rounding mode bitfield for floating point FRM.  */
+/* Rounding mode bitfield for floating point FRM.  The value of enum comes
+   from the below link.
+   
https://github.com/riscv/riscv-isa-manual/blob/main/src/f-st-ext.adoc#floating-point-control-and-status-register
+ */
enum frm_field_enum
{
-  FRM_RNE = 0b000,
-  FRM_RTZ = 0b001,
-  FRM_RDN = 0b010,
-  FRM_RUP = 0b011,
-  FRM_RMM = 0b100,
-  FRM_DYN = 0b111
+  FRM_RNE = 0, /* Aka 0b000.  */
+  FRM_RTZ = 1, /* Aka 0b001.  */
+  FRM_RDN = 2, /* Aka 0b010.  */
+  FRM_RUP = 3, /* Aka 0b011.  */
+  FRM_RMM = 4, /* Aka 0b100.  */
+  FRM_DYN = 7, /* Aka 0b111.  */
};
opt_machine_mode vectorize_related_mode (machine_mode, scalar_mode,
--
2.34.1



Re: Re: [PATCH V5] VECT: Add SELECT_VL support

2023-06-09 Thread juzhe.zh...@rivai.ai
Hi, Richi.

Thanks for the comments.

>>this is in a if (j == 0) branch, please put the assert into the
>>else {} block of it instead.  You still run the 'bump' computation
>>before the loop, so if you intend to never handle j != 0 you could
>>put it there, too.  But at least avoid calling
>>vect_get_data_ptr_increment in that code when you re-do it here.
Ok

>> btw, I wonder how vect_create_data_ref_ptr and
>>vect_get_data_ptr_increment handle LEN when not using .SELECT_VL?
>>Are they always using the constant VF here?
Yes, when we are not using SELECT_VL, we always use VF.

>>Can't this be done in vect_get_data_ptr_increment by instead
>>of using VF for LOOP_VINFO_USING_SELECT_VL_P use
>>vect_get_loop_len () and so only change one place?
OK, I will try that in the V6 patch.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-09 16:13
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH V5] VECT: Add SELECT_VL support
On Thu, 8 Jun 2023, juzhe.zh...@rivai.ai wrote:
 
> From: Ju-Zhe Zhong 
> 
> Co-authored-by: Richard Sandiford
> Co-authored-by: Richard Biener 
> 
> This patch address comments from Richard && Richi and rebase to trunk.
> 
> This patch is adding SELECT_VL middle-end support
> allow target have target dependent optimization in case of
> length calculation.
> 
> This patch is inspired by RVV ISA and LLVM:
> https://reviews.llvm.org/D99750
> 
> The SELECT_VL is same behavior as LLVM "get_vector_length" with
> these following properties:
> 
> 1. Only apply on single-rgroup.
> 2. non SLP.
> 3. adjust loop control IV.
> 4. adjust data reference IV.
> 5. allow non-vf elements processing in non-final iteration
> 
> Code:
># void vvaddint32(size_t n, const int*x, const int*y, int*z)
> # { for (size_t i=0; i 
> Take RVV codegen for example:
> 
> Before this patch:
> vvaddint32:
> ble a0,zero,.L6
> csrra4,vlenb
> srlia6,a4,2
> .L4:
> mv  a5,a0
> bleua0,a6,.L3
> mv  a5,a6
> .L3:
> vsetvli zero,a5,e32,m1,ta,ma
> vle32.v v2,0(a1)
> vle32.v v1,0(a2)
> vsetvli a7,zero,e32,m1,ta,ma
> sub a0,a0,a5
> vadd.vv v1,v1,v2
> vsetvli zero,a5,e32,m1,ta,ma
> vse32.v v1,0(a3)
> add a2,a2,a4
> add a3,a3,a4
> add a1,a1,a4
> bne a0,zero,.L4
> .L6:
> ret
> 
> After this patch:
> 
> vvaddint32:
> vsetvli t0, a0, e32, ta, ma  # Set vector length based on 32-bit vectors
> vle32.v v0, (a1) # Get first vector
>   sub a0, a0, t0 # Decrement number done
>   slli t0, t0, 2 # Multiply number done by 4 bytes
>   add a1, a1, t0 # Bump pointer
> vle32.v v1, (a2) # Get second vector
>   add a2, a2, t0 # Bump pointer
> vadd.vv v2, v0, v1   # Sum vectors
> vse32.v v2, (a3) # Store result
>   add a3, a3, t0 # Bump pointer
>   bnez a0, vvaddint32# Loop back
>   ret# Finished
> 
> gcc/ChangeLog:
> 
> * doc/md.texi: Add SELECT_VL support.
> * internal-fn.def (SELECT_VL): Ditto.
> * optabs.def (OPTAB_D): Ditto.
> * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Ditto.
> * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Ditto.
> * tree-vect-stmts.cc (get_select_vl_data_ref_ptr): Ditto.
> (vectorizable_store): Ditto.
> (vectorizable_load): Ditto.
> * tree-vectorizer.h (LOOP_VINFO_USING_SELECT_VL_P): Ditto.
>
> Co-authored-by: Richard Sandiford 
> Co-authored-by: Richard Biener 
> 
> ---
>  gcc/doc/md.texi | 22 ++
>  gcc/internal-fn.def |  1 +
>  gcc/optabs.def  |  1 +
>  gcc/tree-vect-loop-manip.cc | 32 ++
>  gcc/tree-vect-loop.cc   | 72 +++
>  gcc/tree-vect-stmts.cc  | 86 -
>  gcc/tree-vectorizer.h   |  6 +++
>  7 files changed, 201 insertions(+), 19 deletions(-)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 6a435eb4461..95f7fe1f802 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4974,6 +4974,28 @@ for (i = 1; i < operand3; i++)
>operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
>  @end smallexample
>  
> +@cindex @code{select_vl@var{m}} instruction pattern
> +@item @code{select_vl@var{m}}
> +Set operand 0 to the number of scalar iterations that should be handled
> +by one iteration of a vector loop.  Operand 1 is the total number of
> +scalar iterations that the loop needs to process and operand 2 is a
> +maximum bound on the result (also known as the maximum ``vectorization
> +factor'').
> +
> +The maximum value of operand 0 is given by:
> +@smallexample
> +operand0 = MIN (operand1, operand2)
> +@end smallexample
> +However, targets might choose a lower value than this, based on
> +target-specific criteria.  Each iteration o

Re: [PATCH] Make sure SCALAR_INT_MODE_P before invoke try_const_anchors

2023-06-09 Thread guojiufu via Gcc-patches

Hi,

On 2023-06-09 16:00, Richard Biener wrote:

On Fri, 9 Jun 2023, Jiufu Guo wrote:


Hi,

As checking the code, there is a "gcc_assert (SCALAR_INT_MODE_P 
(mode))"

in "try_const_anchors".
This assert seems correct because the function try_const_anchors cares
about integer values currently, and modes other than SCALAR_INT_MODE_P
are not needed to support.

This patch makes sure SCALAR_INT_MODE_P when calling 
try_const_anchors.


This patch is raised when drafting below one.
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html.
With that patch, "{[%1:DI]=0;} stack_tie" with BLKmode runs into
try_const_anchors, and hits the assert/ice.

Boostrap and regtest pass on ppc64{,le} and x86_64.
Is this ok for trunk?


Iff the correct fix at all (how can a CONST_INT have BLKmode?) then
I suggest to instead fix try_const_anchors to change

  /* CONST_INT is used for CC modes, but we should leave those alone.  
*/

  if (GET_MODE_CLASS (mode) == MODE_CC)
return NULL_RTX;

  gcc_assert (SCALAR_INT_MODE_P (mode));

to

  /* CONST_INT is used for CC modes, leave any non-scalar-int mode 
alone.  */

  if (!SCALAR_INT_MODE_P (mode))
return NULL_RTX;



This is also able to fix this issue.  There is a "Punt on CC modes" patch
to return NULL_RTX in try_const_anchors.


but as said I wonder how we arrive at a BLKmode CONST_INT and whether
we should have fended this off earlier.  Can you share more complete
RTL of that stack_tie?



(insn 15 14 16 3 (parallel [
(set (mem/c:BLK (reg/f:DI 1 1) [1  A8])
(const_int 0 [0]))
]) "/home/guojiufu/temp/gdb.c":13:3 922 {stack_tie}
 (nil))

It is "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])".

This is generated by:

rs6000.md
(define_expand "restore_stack_block"
  [(set (match_dup 2) (match_dup 3))
   (set (match_dup 4) (match_dup 2))
   (match_dup 5)
   (set (match_operand 0 "register_operand")
(match_operand 1 "register_operand"))]
  ""
{
  rtvec p;

  operands[1] = force_reg (Pmode, operands[1]);
  operands[2] = gen_reg_rtx (Pmode);
  operands[3] = gen_frame_mem (Pmode, operands[0]);
  operands[4] = gen_frame_mem (Pmode, operands[1]);
  p = rtvec_alloc (1);
  RTVEC_ELT (p, 0) = gen_rtx_SET (gen_frame_mem (BLKmode, operands[0]),
  const0_rtx);
  operands[5] = gen_rtx_PARALLEL (VOIDmode, p);
})

This kind of case (like BLK with const0) is rare, but this would be intended
RTL, and it seems not invalid.

Thanks so much for your quick and very helpful comments!!

BR,
Jeff (Jiufu Guo)






BR,
Jeff (Jiufu Guo)

gcc/ChangeLog:

* cse.cc (cse_insn): Add SCALAR_INT_MODE_P condition.

---
 gcc/cse.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/cse.cc b/gcc/cse.cc
index 2bb63ac4105..f213fa0faf7 100644
*** a/gcc/cse.cc
--- b/gcc/cse.cc
***
*** 5003,5009 
if (targetm.const_anchor
  && !src_related
  && src_const
! && GET_CODE (src_const) == CONST_INT)
{
  src_related = try_const_anchors (src_const, mode);
  src_related_is_const_anchor = src_related != NULL_RTX;
- -
--- 5003,5010 
if (targetm.const_anchor
  && !src_related
  && src_const
! && GET_CODE (src_const) == CONST_INT
! && SCALAR_INT_MODE_P (mode))
{
  src_related = try_const_anchors (src_const, mode);
  src_related_is_const_anchor = src_related != NULL_RTX;
2.39.3




Re: [PATCH v10] RISC-V: Refactor requirement of ZVFH and ZVFHMIN.

2023-06-09 Thread Kito Cheng via Gcc-patches
lgtm too, thanks :)

On Fri, Jun 9, 2023 at 3:15 PM juzhe.zh...@rivai.ai
 wrote:
>
> LGTM.
>
>
>
> juzhe.zh...@rivai.ai
>
> From: pan2.li
> Date: 2023-06-09 15:07
> To: gcc-patches
> CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
> Subject: [PATCH v10] RISC-V: Refactor requirement of ZVFH and ZVFHMIN.
> From: Pan Li 
>
> This patch would like to refactor the requirement of both the ZVFH
> and ZVFHMIN. By default, the ZVFHMIN will enable FP16 for all the
> iterators of RVV. And then the ZVFH will leverage one define attr as
> the gate for FP16 supported or not.
>
> Please note the ZVFH will cover the ZVFHMIN instructions. This patch
> add one test for this.
>
> Signed-off-by: Pan Li 
> Co-Authored by: Juzhe-Zhong 
> Co-Authored by: Kito Cheng 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.md (enabled): Move to another place, and
> add fp_vector_disabled to the cond.
> (fp_vector_disabled): New attr defined for disabling fp.
> * config/riscv/vector-iterators.md: Fix V_WHOLE and V_FRACT.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Add vle16 test
> for ZVFHMIN.
> ---
> gcc/config/riscv/riscv.md | 39 ---
> gcc/config/riscv/vector-iterators.md  | 23 ++-
> .../riscv/rvv/base/zvfhmin-intrinsic.c| 15 ++-
> 3 files changed, 59 insertions(+), 18 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index 38b8fba2a53..d8e935cb934 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -239,12 +239,6 @@ (define_attr "ext_enabled" "no,yes"
> ]
> (const_string "no")))
> -;; Attribute to control enable or disable instructions.
> -(define_attr "enabled" "no,yes"
> -  (cond [(eq_attr "ext_enabled" "no")
> - (const_string "no")]
> - (const_string "yes")))
> -
> ;; Classification of each insn.
> ;; branch conditional branch
> ;; jump unconditional jump
> @@ -434,6 +428,39 @@ (define_attr "type"
> (eq_attr "move_type" "rdvlenb") (const_string "rdvlenb")]
> (const_string "unknown")))
> +;; True if the float point vector is disabled.
> +(define_attr "fp_vector_disabled" "no,yes"
> +  (cond [
> +(and (eq_attr "type" "vfmov,vfalu,vfmul,vfdiv,
> +   vfwalu,vfwmul,vfmuladd,vfwmuladd,
> +   vfsqrt,vfrecp,vfminmax,vfsgnj,vfcmp,
> +   vfclass,vfmerge,
> +   vfncvtitof,vfwcvtftoi,vfcvtftoi,vfcvtitof,
> +   vfredo,vfredu,vfwredo,vfwredu,
> +   vfslide1up,vfslide1down")
> + (and (eq_attr "mode" "VNx1HF,VNx2HF,VNx4HF,VNx8HF,VNx16HF,VNx32HF,VNx64HF")
> +   (match_test "!TARGET_ZVFH")))
> +(const_string "yes")
> +
> +;; The mode records as QI for the FP16 <=> INT8 instruction.
> +(and (eq_attr "type" "vfncvtftoi,vfwcvtitof")
> + (and (eq_attr "mode" "VNx1QI,VNx2QI,VNx4QI,VNx8QI,VNx16QI,VNx32QI,VNx64QI")
> +   (match_test "!TARGET_ZVFH")))
> +(const_string "yes")
> +  ]
> +  (const_string "no")))
> +
> +;; Attribute to control enable or disable instructions.
> +(define_attr "enabled" "no,yes"
> +  (cond [
> +(eq_attr "ext_enabled" "no")
> +(const_string "no")
> +
> +(eq_attr "fp_vector_disabled" "yes")
> +(const_string "no")
> +  ]
> +  (const_string "yes")))
> +
> ;; Length of instruction in bytes.
> (define_attr "length" ""
> (cond [
> diff --git a/gcc/config/riscv/vector-iterators.md 
> b/gcc/config/riscv/vector-iterators.md
> index f4946d84449..234b712bc9d 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -453,9 +453,8 @@ (define_mode_iterator V_WHOLE [
>(VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
> "TARGET_VECTOR_ELEN_64")
>(VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
> "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
> -  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
> -  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
> -  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx2HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN == 32")
> +  (VNx4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN == 64")
>(VNx8HF "TARGET_VECTOR_ELEN_FP_16")
>(VNx16HF "TARGET_VECTOR_ELEN_FP_16")
>(VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
> @@ -477,7 +476,11 @@ (define_mode_iterator V_WHOLE [
> (define_mode_iterator V_FRACT [
>(VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI (VNx4QI "TARGET_MIN_VLEN > 32") 
> (VNx8QI "TARGET_MIN_VLEN >= 128")
>(VNx1HI "TARGET_MIN_VLEN < 128") (VNx2HI "TARGET_MIN_VLEN > 32") (VNx4HI 
> "TARGET_MIN_VLEN >= 128")
> -  (VNx1HF "TARGET_MIN_VLEN < 128") (VNx2HF "TARGET_MIN_VLEN > 32") (VNx4HF 
> "TARGET_MIN_VLEN >= 128")
> +
> +  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
> +  (VNx2HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
> +  (VNx4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
> +
>(VNx1SI "TARGET_MIN_VLEN > 32 && TARGET_MIN_VLEN < 128") (VNx2SI 
> "TARGET_MIN_VLEN >= 128")
>(VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_

RE: [PATCH v10] RISC-V: Refactor requirement of ZVFH and ZVFHMIN.

2023-06-09 Thread Li, Pan2 via Gcc-patches
Committed, thanks Juzhe and Kito.

Pan

-Original Message-
From: Kito Cheng  
Sent: Friday, June 9, 2023 4:28 PM
To: juzhe.zh...@rivai.ai
Cc: Li, Pan2 ; gcc-patches ; Robin 
Dapp ; jeffreyalaw ; Wang, Yanzhang 

Subject: Re: [PATCH v10] RISC-V: Refactor requirement of ZVFH and ZVFHMIN.

lgtm too, thanks :)

On Fri, Jun 9, 2023 at 3:15 PM juzhe.zh...@rivai.ai
 wrote:
>
> LGTM.
>
>
>
> juzhe.zh...@rivai.ai
>
> From: pan2.li
> Date: 2023-06-09 15:07
> To: gcc-patches
> CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
> Subject: [PATCH v10] RISC-V: Refactor requirement of ZVFH and ZVFHMIN.
> From: Pan Li 
>
> This patch would like to refactor the requirement of both the ZVFH
> and ZVFHMIN. By default, the ZVFHMIN will enable FP16 for all the
> iterators of RVV. And then the ZVFH will leverage one define attr as
> the gate for FP16 supported or not.
>
> Please note the ZVFH will cover the ZVFHMIN instructions. This patch
> add one test for this.
>
> Signed-off-by: Pan Li 
> Co-Authored by: Juzhe-Zhong 
> Co-Authored by: Kito Cheng 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.md (enabled): Move to another place, and
> add fp_vector_disabled to the cond.
> (fp_vector_disabled): New attr defined for disabling fp.
> * config/riscv/vector-iterators.md: Fix V_WHOLE and V_FRACT.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Add vle16 test
> for ZVFHMIN.
> ---
> gcc/config/riscv/riscv.md | 39 ---
> gcc/config/riscv/vector-iterators.md  | 23 ++-
> .../riscv/rvv/base/zvfhmin-intrinsic.c| 15 ++-
> 3 files changed, 59 insertions(+), 18 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index 38b8fba2a53..d8e935cb934 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -239,12 +239,6 @@ (define_attr "ext_enabled" "no,yes"
> ]
> (const_string "no")))
> -;; Attribute to control enable or disable instructions.
> -(define_attr "enabled" "no,yes"
> -  (cond [(eq_attr "ext_enabled" "no")
> - (const_string "no")]
> - (const_string "yes")))
> -
> ;; Classification of each insn.
> ;; branch conditional branch
> ;; jump unconditional jump
> @@ -434,6 +428,39 @@ (define_attr "type"
> (eq_attr "move_type" "rdvlenb") (const_string "rdvlenb")]
> (const_string "unknown")))
> +;; True if the float point vector is disabled.
> +(define_attr "fp_vector_disabled" "no,yes"
> +  (cond [
> +(and (eq_attr "type" "vfmov,vfalu,vfmul,vfdiv,
> +   vfwalu,vfwmul,vfmuladd,vfwmuladd,
> +   vfsqrt,vfrecp,vfminmax,vfsgnj,vfcmp,
> +   vfclass,vfmerge,
> +   vfncvtitof,vfwcvtftoi,vfcvtftoi,vfcvtitof,
> +   vfredo,vfredu,vfwredo,vfwredu,
> +   vfslide1up,vfslide1down")
> + (and (eq_attr "mode" "VNx1HF,VNx2HF,VNx4HF,VNx8HF,VNx16HF,VNx32HF,VNx64HF")
> +   (match_test "!TARGET_ZVFH")))
> +(const_string "yes")
> +
> +;; The mode records as QI for the FP16 <=> INT8 instruction.
> +(and (eq_attr "type" "vfncvtftoi,vfwcvtitof")
> + (and (eq_attr "mode" "VNx1QI,VNx2QI,VNx4QI,VNx8QI,VNx16QI,VNx32QI,VNx64QI")
> +   (match_test "!TARGET_ZVFH")))
> +(const_string "yes")
> +  ]
> +  (const_string "no")))
> +
> +;; Attribute to control enable or disable instructions.
> +(define_attr "enabled" "no,yes"
> +  (cond [
> +(eq_attr "ext_enabled" "no")
> +(const_string "no")
> +
> +(eq_attr "fp_vector_disabled" "yes")
> +(const_string "no")
> +  ]
> +  (const_string "yes")))
> +
> ;; Length of instruction in bytes.
> (define_attr "length" ""
> (cond [
> diff --git a/gcc/config/riscv/vector-iterators.md 
> b/gcc/config/riscv/vector-iterators.md
> index f4946d84449..234b712bc9d 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -453,9 +453,8 @@ (define_mode_iterator V_WHOLE [
>(VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
> "TARGET_VECTOR_ELEN_64")
>(VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
> "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
> -  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
> -  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
> -  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx2HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN == 32")
> +  (VNx4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN == 64")
>(VNx8HF "TARGET_VECTOR_ELEN_FP_16")
>(VNx16HF "TARGET_VECTOR_ELEN_FP_16")
>(VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
> @@ -477,7 +476,11 @@ (define_mode_iterator V_WHOLE [
> (define_mode_iterator V_FRACT [
>(VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI (VNx4QI "TARGET_MIN_VLEN > 32") 
> (VNx8QI "TARGET_MIN_VLEN >= 128")
>(VNx1HI "TARGET_MIN_VLEN < 128") (VNx2HI "TARGET_MIN_VLEN > 32") (VNx4HI 
> "TARGET_MIN_VLEN >= 128")
> -  (VNx1HF "TARGET_MIN_VLEN < 128") (VNx2HF "TARGET_MIN_VLEN > 32") (VNx4HF 
> "TARGET_MIN_VLEN >= 128")
> +
> +  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_V

Re: [PATCH] Make sure SCALAR_INT_MODE_P before invoke try_const_anchors

2023-06-09 Thread Richard Sandiford via Gcc-patches
guojiufu  writes:
> Hi,
>
> On 2023-06-09 16:00, Richard Biener wrote:
>> On Fri, 9 Jun 2023, Jiufu Guo wrote:
>> 
>>> Hi,
>>> 
>>> As checking the code, there is a "gcc_assert (SCALAR_INT_MODE_P 
>>> (mode))"
>>> in "try_const_anchors".
>>> This assert seems correct because the function try_const_anchors cares
>>> about integer values currently, and modes other than SCALAR_INT_MODE_P
>>> are not needed to support.
>>> 
>>> This patch makes sure SCALAR_INT_MODE_P when calling 
>>> try_const_anchors.
>>> 
>>> This patch is raised when drafting below one.
>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html.
>>> With that patch, "{[%1:DI]=0;} stack_tie" with BLKmode runs into
>>> try_const_anchors, and hits the assert/ice.
>>> 
>>> Boostrap and regtest pass on ppc64{,le} and x86_64.
>>> Is this ok for trunk?
>> 
>> Iff the correct fix at all (how can a CONST_INT have BLKmode?) then
>> I suggest to instead fix try_const_anchors to change
>> 
>>   /* CONST_INT is used for CC modes, but we should leave those alone.  
>> */
>>   if (GET_MODE_CLASS (mode) == MODE_CC)
>> return NULL_RTX;
>> 
>>   gcc_assert (SCALAR_INT_MODE_P (mode));
>> 
>> to
>> 
>>   /* CONST_INT is used for CC modes, leave any non-scalar-int mode 
>> alone.  */
>>   if (!SCALAR_INT_MODE_P (mode))
>> return NULL_RTX;
>> 
>
> This is also able to fix this issue.  there is a "Punt on CC modes" 
> patch
> to return NULL_RTX in try_const_anchors.
>
>> but as said I wonder how we arrive at a BLKmode CONST_INT and whether
>> we should have fended this off earlier.  Can you share more complete
>> RTL of that stack_tie?
>
>
> (insn 15 14 16 3 (parallel [
>  (set (mem/c:BLK (reg/f:DI 1 1) [1  A8])
>  (const_int 0 [0]))
>  ]) "/home/guojiufu/temp/gdb.c":13:3 922 {stack_tie}
>   (nil))
>
> It is "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])".

I'm not convinced this is correct RTL.  (unspec:BLK [(const_int 0)] ...)
would be though.  It's arguably more accurate too, since the effect
on the stack locations is unspecified rather than predictable.

Thanks,
Richard


Re: [COMMITTED 2/4] - Remove tree_code from range-operator.

2023-06-09 Thread Aldy Hernandez via Gcc-patches




On 6/8/23 20:57, Andrew MacLeod wrote:
Range_operator had a tree code added last release to facilitate bitmask 
operations.  This was intended to be a temporary change until we could 
figure out something more strategic going forward.


This patch removes the tree_code and replaces it with a virtual routine 
to perform the masking. Each of the affected tree codes operators now 
call the bitmask routine via a virtual function.  At some point we may 
want to consolidate the code that CCP is using so that it resides in the 
range_operator, but the extensive parameter list used by that CCP 
routine makes that prohibitive to do at the moment.


It's on my radar for this release.  I will be changing the 
nonzero_bitmask field in irange for a value/mask pair, as CCP does, and 
if all goes well, consolidating CCP as well as the on-the-side bitmask 
tracking IPA does.


Thanks for tidying this up.

Aldy



[PATCH V6] VECT: Add SELECT_VL support

2023-06-09 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Co-authored-by: Richard Sandiford
Co-authored-by: Richard Biener 

This patch addresses comments from Richard && Richi and rebases to trunk.

This patch adds SELECT_VL middle-end support to allow targets to
apply target-dependent optimizations to the length calculation.

This patch is inspired by RVV ISA and LLVM:
https://reviews.llvm.org/D99750

SELECT_VL has the same behavior as LLVM's "get_vector_length", with
the following properties:

1. Only applies to a single rgroup.
2. Non-SLP only.
3. Adjusts the loop control IV.
4. Adjusts the data reference IVs.
5. Allows processing fewer than VF elements in a non-final iteration.
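
As a rough scalar model of those properties (a sketch only, not the
optab itself; it mirrors the vvaddint32 example under "Code:" below):

#include <stddef.h>

/* Model of the SELECT_VL contract: the result is at most
   MIN (remaining, vf), but a target may return less in any iteration,
   so induction variables must step by the returned length.  */
static size_t
select_vl_model (size_t remaining, size_t vf)
{
  return remaining < vf ? remaining : vf;
}

void
vvaddint32_model (size_t n, const int *x, const int *y, int *z)
{
  const size_t vf = 4;  /* stand-in for the vectorization factor */
  for (size_t i = 0; i < n;)
    {
      size_t vl = select_vl_model (n - i, vf);
      for (size_t j = 0; j < vl; j++)  /* one "vector" iteration */
        z[i + j] = x[i + j] + y[i + j];
      i += vl;  /* variable step, as described above */
    }
}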

Code:
   # void vvaddint32(size_t n, const int*x, const int*y, int*z)
# { for (size_t i=0; i<n; i++) { z[i]=x[i]+y[i]; } }
-  _36 = MIN_EXPR ;
+  _36 = (MIN_EXPR | SELECT_VL) ;
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
@@ -549,15 +549,28 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
-&incr_gsi, insert_after, &index_before_incr,
-&index_after_incr);
-  gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
-   index_before_incr,
-   nitems_step));
+  if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
+   {
+ create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
+insert_after, &index_before_incr, &index_after_incr);
+ tree len = gimple_build (header_seq, IFN_SELECT_VL, iv_type,
+  index_before_incr, nitems_step);
+ gimple_seq_add_stmt (header_seq, gimple_build_assign (step, len));
+   }
+  else
+   {
+ create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
+&incr_gsi, insert_after, &index_before_incr,
+&index_after_incr);
+ gimple_seq_add_stmt (header_seq,
+  gimple_build_assign (step, MIN_EXPR,
+   index_before_incr,
+   nitems_step));
+   }
   *iv_step = step;
   *compare_step = nitems_step;
-  return index_before_incr;
+  return LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) ? index_after_incr
+  : index_before_incr;
 }
 
   /* Create increment IV.  */
@@ -888,7 +901,8 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
   /* Get a boolean result that tells us whether to iterate.  */
   edge exit_edge = single_exit (loop);
   gcond *cond_stmt;
-  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
+  && !LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
 {
   gcc_assert (compare_step);
   tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : 
GT_EXPR;
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 5b7a0da0034..ace9e759f5b 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -974,6 +974,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
vec_info_shared *shared)
 can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
 using_partial_vectors_p (false),
 using_decrementing_iv_p (false),
+using_select_vl_p (false),
 epil_using_partial_vectors_p (false),
 partial_load_store_bias (0),
 peeling_for_gaps (false),
@@ -2737,6 +2738,77 @@ start_over:
LOOP_VINFO_VECT_FACTOR (loop_vinfo
 LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
 
+  /* If a loop uses length controls and has a decrementing loop control IV,
+ we will normally pass that IV through a MIN_EXPR to calculate the
+ basis for the length controls.  E.g. in a loop that processes one
+ element per scalar iteration, the number of elements would be
+ MIN_EXPR , where N is the number of scalar iterations left.
+
+ This MIN_EXPR approach allows us to use pointer IVs with an invariant
+ step, since only the final iteration of the vector loop can have
+ inactive lanes.
+
+ However, some targets have a dedicated instruction for calculating the
+ preferred length, given the total number of elements that still need to
+ be processed.  This is encapsulated in the SELECT_VL internal function.
+
+ If the target supports SELECT_VL, we can use it instead of MIN_EXPR
+ to determine the basis for the length controls.  However, unlike the
+ MIN_EXPR calculation, the SELECT_VL calculation can decide to make
+ lanes inactive in any iteration of the vector loop, not just the 

Re: Re: [PATCH V5] VECT: Add SELECT_VL support

2023-06-09 Thread juzhe.zh...@rivai.ai
Hi, Richi. I have fixed by following your suggestions
Could you take a look at it?

V6 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621122.html 

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-09 16:13
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH V5] VECT: Add SELECT_VL support
On Thu, 8 Jun 2023, juzhe.zh...@rivai.ai wrote:
 
> From: Ju-Zhe Zhong 
> 
> Co-authored-by: Richard Sandiford
> Co-authored-by: Richard Biener 
> 
> This patch address comments from Richard && Richi and rebase to trunk.
> 
> This patch is adding SELECT_VL middle-end support
> allow target have target dependent optimization in case of
> length calculation.
> 
> This patch is inspired by RVV ISA and LLVM:
> https://reviews.llvm.org/D99750
> 
> The SELECT_VL is same behavior as LLVM "get_vector_length" with
> these following properties:
> 
> 1. Only apply on single-rgroup.
> 2. non SLP.
> 3. adjust loop control IV.
> 4. adjust data reference IV.
> 5. allow non-vf elements processing in non-final iteration
> 
> Code:
># void vvaddint32(size_t n, const int*x, const int*y, int*z)
> # { for (size_t i=0; i<n; i++) { z[i] = x[i] + y[i]; } }
> Take RVV codegen for example:
> 
> Before this patch:
> vvaddint32:
> ble a0,zero,.L6
> csrra4,vlenb
> srlia6,a4,2
> .L4:
> mv  a5,a0
> bleua0,a6,.L3
> mv  a5,a6
> .L3:
> vsetvli zero,a5,e32,m1,ta,ma
> vle32.v v2,0(a1)
> vle32.v v1,0(a2)
> vsetvli a7,zero,e32,m1,ta,ma
> sub a0,a0,a5
> vadd.vv v1,v1,v2
> vsetvli zero,a5,e32,m1,ta,ma
> vse32.v v1,0(a3)
> add a2,a2,a4
> add a3,a3,a4
> add a1,a1,a4
> bne a0,zero,.L4
> .L6:
> ret
> 
> After this patch:
> 
> vvaddint32:
> vsetvli t0, a0, e32, ta, ma  # Set vector length based on 32-bit vectors
> vle32.v v0, (a1) # Get first vector
>   sub a0, a0, t0 # Decrement number done
>   slli t0, t0, 2 # Multiply number done by 4 bytes
>   add a1, a1, t0 # Bump pointer
> vle32.v v1, (a2) # Get second vector
>   add a2, a2, t0 # Bump pointer
> vadd.vv v2, v0, v1   # Sum vectors
> vse32.v v2, (a3) # Store result
>   add a3, a3, t0 # Bump pointer
>   bnez a0, vvaddint32# Loop back
>   ret# Finished
> 
> gcc/ChangeLog:
> 
> * doc/md.texi: Add SELECT_VL support.
> * internal-fn.def (SELECT_VL): Ditto.
> * optabs.def (OPTAB_D): Ditto.
> * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Ditto.
> * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Ditto.
> * tree-vect-stmts.cc (get_select_vl_data_ref_ptr): Ditto.
> (vectorizable_store): Ditto.
> (vectorizable_load): Ditto.
> * tree-vectorizer.h (LOOP_VINFO_USING_SELECT_VL_P): Ditto.
>
> Co-authored-by: Richard Sandiford 
> Co-authored-by: Richard Biener 
> 
> ---
>  gcc/doc/md.texi | 22 ++
>  gcc/internal-fn.def |  1 +
>  gcc/optabs.def  |  1 +
>  gcc/tree-vect-loop-manip.cc | 32 ++
>  gcc/tree-vect-loop.cc   | 72 +++
>  gcc/tree-vect-stmts.cc  | 86 -
>  gcc/tree-vectorizer.h   |  6 +++
>  7 files changed, 201 insertions(+), 19 deletions(-)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 6a435eb4461..95f7fe1f802 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4974,6 +4974,28 @@ for (i = 1; i < operand3; i++)
>operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
>  @end smallexample
>  
> +@cindex @code{select_vl@var{m}} instruction pattern
> +@item @code{select_vl@var{m}}
> +Set operand 0 to the number of scalar iterations that should be handled
> +by one iteration of a vector loop.  Operand 1 is the total number of
> +scalar iterations that the loop needs to process and operand 2 is a
> +maximum bound on the result (also known as the maximum ``vectorization
> +factor'').
> +
> +The maximum value of operand 0 is given by:
> +@smallexample
> +operand0 = MIN (operand1, operand2)
> +@end smallexample
> +However, targets might choose a lower value than this, based on
> +target-specific criteria.  Each iteration of the vector loop might
> +therefore process a different number of scalar iterations, which in turn
> +means that induction variables will have a variable step.  Because of
> +this, it is generally not useful to define this instruction if it will
> +always calculate the maximum value.
> +
> +This optab is only useful on targets that implement @samp{len_load_@var{m}}
> +and/or @samp{len_store_@var{m}}.
> +
>  @cindex @code{check_raw_ptrs@var{m}} instruction pattern
>  @item @samp{check_raw_ptrs@var{m}}
>  Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
> diff --git a/gcc/internal-fn.d

Re: [PATCH] Add COMPLEX_VECTOR_INT modes

2023-06-09 Thread Andrew Stubbs

On 07/06/2023 20:42, Richard Sandiford wrote:

I don't know if this helps (probably not), but we have a similar
situation on AArch64: a 64-bit mode like V8QI can be doubled to a
128-bit vector or to a pair of 64-bit vectors.  We used V16QI for
the former and "V2x8QI" for the latter.  V2x8QI is forced to come
after V16QI in the mode list, and so it is only ever used through
explicit choice.  But both modes are functionally vectors of 16 QIs.


OK, that's interesting, but how do you map "complex int" vectors to that 
mode? I tried to figure it out, but there's no DIVMOD support so I 
couldn't just do a straight comparison.


Thanks

Andrew


Re: [PATCH V2] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-06-09 Thread Richard Biener via Gcc-patches
On Wed, 7 Jun 2023, Jiufu Guo wrote:

> Hi,
> 
> This patch tries to optimize "(X - N * M) / N" to "X / N - M".
> For C code, "/" towards zero (trunc_div), and "X - N * M" maybe
> wrap/overflow/underflow. So, it is valid that "X - N * M" does
> not cross zero and does not wrap/overflow/underflow.
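
A standalone C illustration of the cross-zero requirement (an editorial
sketch, not part of the quoted patch): with C's truncate-toward-zero
division, the two forms diverge as soon as "X - N * M" changes sign.

#include <stdio.h>

int
main (void)
{
  int n = 4, m = 1;

  int x = 1;                                        /* x - n*m crosses zero here...  */
  printf ("%d %d\n", (x - n * m) / n, x / n - m);   /* prints "0 -1": transform invalid.  */

  x = 9;                                            /* ...but stays non-negative here.  */
  printf ("%d %d\n", (x - n * m) / n, x / n - m);   /* prints "1 1": transform valid.  */

  return 0;
}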
> 
> Compare with previous version:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618796.html
> 
> This patch 1. adds the patterns for variable N or M,
> 2. uses simpler form "(X - N * M) / N" for patterns,
> 3. adds functions to gimle-fold.h/cc (not gimple-match-head.cc)
> 4. updates testcases
> 
> Bootstrap & regtest pass on ppc64{,le} and x86_64.
> Is this patch ok for trunk?

Comments below.

> 
> BR,
> Jeff (Jiufu Guo)
> 
>   PR tree-optimization/108757
> 
> gcc/ChangeLog:
> 
>   * gimple-fold.cc (maybe_mult_overflow): New function.
>   (maybe_plus_overflow): New function.
>   (maybe_minus_overflow): New function.
>   (plus_mult_no_ovf_and_keep_sign): New function.
>   (plus_no_ovf_and_keep_sign): New function.
>   * gimple-fold.h (maybe_mult_overflow): New declare.
>   (plus_mult_no_ovf_and_keep_sign): New declare.
>   (plus_no_ovf_and_keep_sign): New declare.
>   * match.pd ((X - N * M) / N): New pattern.
>   ((X + N * M) / N): New pattern.
>   ((X + C) / N): New pattern.
>   ((X + C) >> N): New pattern.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/pr108757-1.c: New test.
>   * gcc.dg/pr108757-2.c: New test.
>   * gcc.dg/pr108757.h: New test.
> 
> ---
>  gcc/gimple-fold.cc| 161 
>  gcc/gimple-fold.h |   3 +
>  gcc/match.pd  |  58 +++
>  gcc/testsuite/gcc.dg/pr108757-1.c |  18 +++
>  gcc/testsuite/gcc.dg/pr108757-2.c |  19 +++
>  gcc/testsuite/gcc.dg/pr108757.h   | 244 ++
>  6 files changed, 503 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr108757-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr108757-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr108757.h
> 
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index 581575b65ec..bb833ae17b3 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -9349,3 +9349,164 @@ gimple_stmt_integer_valued_real_p (gimple *stmt, int 
> depth)
>return false;
>  }
>  }
> +
> +/* Return true if "X * Y" may be overflow.  */
> +
> +bool
> +maybe_mult_overflow (value_range &x, value_range &y, signop sgn)

These functions look like some "basic" functionality that should
be (or maybe already is?  Andrew?) provided by the value-range
framework.  That means it should not reside in gimple-fold.{cc,h}
but elsehwere and possibly with an API close to the existing
value-range stuff.

Andrew?

> +{
> +  wide_int wmin0 = x.lower_bound ();
> +  wide_int wmax0 = x.upper_bound ();
> +  wide_int wmin1 = y.lower_bound ();
> +  wide_int wmax1 = y.upper_bound ();
> +
> +  wi::overflow_type min_ovf, max_ovf;
> +  wi::mul (wmin0, wmin1, sgn, &min_ovf);
> +  wi::mul (wmax0, wmax1, sgn, &max_ovf);
> +  if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
> +{
> +  wi::mul (wmin0, wmax1, sgn, &min_ovf);
> +  wi::mul (wmax0, wmin1, sgn, &max_ovf);
> +  if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
> + return false;
> +}
> +  return true;
> +}
> +
> +/* Return true if "X + Y" may be overflow.  */
> +
> +static bool
> +maybe_plus_overflow (value_range &x, value_range &y, signop sgn)
> +{
> +  wide_int wmin0 = x.lower_bound ();
> +  wide_int wmax0 = x.upper_bound ();
> +  wide_int wmin1 = y.lower_bound ();
> +  wide_int wmax1 = y.upper_bound ();
> +
> +  wi::overflow_type min_ovf, max_ovf;
> +  wi::add (wmax0, wmax1, sgn, &min_ovf);
> +  wi::add (wmin0, wmin1, sgn, &max_ovf);
> +  if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
> +return false;
> +
> +  return true;
> +}
> +
> +/* Return true if "X - Y" may be overflow.  */
> +
> +static bool
> +maybe_minus_overflow (value_range &x, value_range &y, signop sgn)
> +{
> +  wide_int wmin0 = x.lower_bound ();
> +  wide_int wmax0 = x.upper_bound ();
> +  wide_int wmin1 = y.lower_bound ();
> +  wide_int wmax1 = y.upper_bound ();
> +
> +  wi::overflow_type min_ovf, max_ovf;
> +  wi::sub (wmin0, wmax1, sgn, &min_ovf);
> +  wi::sub (wmax0, wmin1, sgn, &max_ovf);
> +  if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
> +return false;
> +
> +  return true;
> +}
> +
> +/* Return true if there is no overflow in the expression.
> +   And no sign change on the plus/minus for X.

What does the second sentence mean?  sign(X) == sign (X + N*M)?
I suppose zero has positive sign?

> +   CODE is PLUS_EXPR, if the expression is "X + N * M".
> +   CODE is MINUS_EXPR, if the expression is "X - N * M".
> +   TYPE is the integer type of the expressions.  */
> +
> +bool
> +plus_mult_no_ovf_and_keep_sign (tree x, tree m, tree n, tree_code code,
> + tree type)
> +{
> +  value_range vr

Re: [PATCH] Make sure SCALAR_INT_MODE_P before invoke try_const_anchors

2023-06-09 Thread Richard Biener via Gcc-patches
On Fri, 9 Jun 2023, Richard Sandiford wrote:

> guojiufu  writes:
> > Hi,
> >
> > On 2023-06-09 16:00, Richard Biener wrote:
> >> On Fri, 9 Jun 2023, Jiufu Guo wrote:
> >> 
> >>> Hi,
> >>> 
> >>> As checking the code, there is a "gcc_assert (SCALAR_INT_MODE_P 
> >>> (mode))"
> >>> in "try_const_anchors".
> >>> This assert seems correct because the function try_const_anchors cares
> >>> about integer values currently, and modes other than SCALAR_INT_MODE_P
> >>> are not needed to support.
> >>> 
> >>> This patch makes sure SCALAR_INT_MODE_P when calling 
> >>> try_const_anchors.
> >>> 
> >>> This patch is raised when drafting below one.
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html.
> >>> With that patch, "{[%1:DI]=0;} stack_tie" with BLKmode runs into
> >>> try_const_anchors, and hits the assert/ice.
> >>> 
> >>> Boostrap and regtest pass on ppc64{,le} and x86_64.
> >>> Is this ok for trunk?
> >> 
> >> Iff the correct fix at all (how can a CONST_INT have BLKmode?) then
> >> I suggest to instead fix try_const_anchors to change
> >> 
> >>   /* CONST_INT is used for CC modes, but we should leave those alone.  
> >> */
> >>   if (GET_MODE_CLASS (mode) == MODE_CC)
> >> return NULL_RTX;
> >> 
> >>   gcc_assert (SCALAR_INT_MODE_P (mode));
> >> 
> >> to
> >> 
> >>   /* CONST_INT is used for CC modes, leave any non-scalar-int mode 
> >> alone.  */
> >>   if (!SCALAR_INT_MODE_P (mode))
> >> return NULL_RTX;
> >> 
> >
> > This is also able to fix this issue.  there is a "Punt on CC modes" 
> > patch
> > to return NULL_RTX in try_const_anchors.
> >
> >> but as said I wonder how we arrive at a BLKmode CONST_INT and whether
> >> we should have fended this off earlier.  Can you share more complete
> >> RTL of that stack_tie?
> >
> >
> > (insn 15 14 16 3 (parallel [
> >  (set (mem/c:BLK (reg/f:DI 1 1) [1  A8])
> >  (const_int 0 [0]))
> >  ]) "/home/guojiufu/temp/gdb.c":13:3 922 {stack_tie}
> >   (nil))
> >
> > It is "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])".
> 
> I'm not convinced this is correct RTL.  (unspec:BLK [(const_int 0)] ...)
> would be though.  It's arguably more accurate too, since the effect
> on the stack locations is unspecified rather than predictable.

powerpc seems to be the only port with a stack_tie that's not
using an UNSPEC RHS.

> Thanks,
> Richard


[committed] libstdc++: Improve tests for emplace member of sequence containers

2023-06-09 Thread Jonathan Wakely via Gcc-patches
I'm fairly confident these emplace member functions work correctly, but
it's still nice to actually test them!

Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

Our existing tests for std::deque::emplace, std::list::emplace and
std::vector::emplace are poor. We only have compile tests for PR 52799
and the equivalent for a const_iterator as the insertion point. This
fails to check that the value is actually inserted correctly and the
right iterator is returned.

Add new tests that cover the existing 52799.cc and const_iterator.cc
compile-only tests, as well as verifying the effects are correct.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/deque/modifiers/emplace/52799.cc:
Removed.
* testsuite/23_containers/deque/modifiers/emplace/const_iterator.cc:
Removed.
* testsuite/23_containers/list/modifiers/emplace/52799.cc:
Removed.
* testsuite/23_containers/list/modifiers/emplace/const_iterator.cc:
Removed.
* testsuite/23_containers/vector/modifiers/emplace/52799.cc:
Removed.
* testsuite/23_containers/vector/modifiers/emplace/const_iterator.cc:
Removed.
* testsuite/23_containers/deque/modifiers/emplace/1.cc: New
test.
* testsuite/23_containers/list/modifiers/emplace/1.cc: New
test.
* testsuite/23_containers/vector/modifiers/emplace/1.cc: New
test.
---
 .../deque/modifiers/emplace/1.cc  | 70 ++
 .../deque/modifiers/emplace/52799.cc  | 27 ---
 .../deque/modifiers/emplace/const_iterator.cc | 26 ---
 .../23_containers/list/modifiers/emplace/1.cc | 71 +++
 .../list/modifiers/emplace/52799.cc   | 27 ---
 .../list/modifiers/emplace/const_iterator.cc  | 26 ---
 .../vector/modifiers/emplace/1.cc | 70 ++
 .../vector/modifiers/emplace/52799.cc | 27 ---
 .../modifiers/emplace/const_iterator.cc   | 26 ---
 9 files changed, 211 insertions(+), 159 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/23_containers/deque/modifiers/emplace/1.cc
 delete mode 100644 
libstdc++-v3/testsuite/23_containers/deque/modifiers/emplace/52799.cc
 delete mode 100644 
libstdc++-v3/testsuite/23_containers/deque/modifiers/emplace/const_iterator.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/list/modifiers/emplace/1.cc
 delete mode 100644 
libstdc++-v3/testsuite/23_containers/list/modifiers/emplace/52799.cc
 delete mode 100644 
libstdc++-v3/testsuite/23_containers/list/modifiers/emplace/const_iterator.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/vector/modifiers/emplace/1.cc
 delete mode 100644 
libstdc++-v3/testsuite/23_containers/vector/modifiers/emplace/52799.cc
 delete mode 100644 
libstdc++-v3/testsuite/23_containers/vector/modifiers/emplace/const_iterator.cc

diff --git a/libstdc++-v3/testsuite/23_containers/deque/modifiers/emplace/1.cc 
b/libstdc++-v3/testsuite/23_containers/deque/modifiers/emplace/1.cc
new file mode 100644
index 000..c6b0318e5ea
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/deque/modifiers/emplace/1.cc
@@ -0,0 +1,70 @@
+// { dg-do run { target c++11 } }
+
+#include <deque>
+#include <testsuite_hooks.h>
+
+void
+test01()
+{
+  std::deque<int> c;
+  std::deque<int>::iterator pos;
+
+  // libstdc++/52799
+  pos = c.emplace(c.begin());
+  VERIFY( c.size() == 1 );
+  VERIFY( c[0] == 0 );
+  VERIFY( pos == c.begin() );
+  pos = c.emplace(c.begin(), 2);
+  VERIFY( c.size() == 2 );
+  VERIFY( c[0] == 2 );
+  VERIFY( c[1] == 0 );
+  VERIFY( pos == c.begin() );
+  pos = c.emplace(c.end(), 3);
+  VERIFY( c.size() == 3 );
+  VERIFY( c[0] == 2 );
+  VERIFY( c[1] == 0 );
+  VERIFY( c[2] == 3 );
+  VERIFY( pos == --c.end() );
+
+  // const_iterator
+  pos = c.emplace(c.cbegin());
+  VERIFY( c.size() == 4 );
+  VERIFY( c[0] == 0 );
+  VERIFY( c[1] == 2 );
+  VERIFY( pos == c.cbegin() );
+  pos = c.emplace(c.cbegin() + 2, 22);
+  VERIFY( c.size() == 5 );
+  VERIFY( c[0] == 0 );
+  VERIFY( c[1] == 2 );
+  VERIFY( c[2] == 22 );
+  VERIFY( pos == c.cbegin() + 2 );
+}
+
+struct V
+{
+  explicit V(int a, int b = 0) : val(a+b) { }
+  int val;
+};
+
+void
+test02()
+{
+  std::deque<V> c;
+  std::deque<V>::iterator pos;
+
+  pos = c.emplace(c.end(), 1);
+  VERIFY( c.size() == 1 );
+  VERIFY( c[0].val == 1 );
+  VERIFY( pos == --c.end() );
+  pos = c.emplace(c.cend(), 2, 3);
+  VERIFY( c.size() == 2 );
+  VERIFY( c[0].val == 1 );
+  VERIFY( c[1].val == 5 );
+  VERIFY( pos == --c.cend() );
+}
+
+int main()
+{
+  test01();
+  test02();
+}
diff --git 
a/libstdc++-v3/testsuite/23_containers/deque/modifiers/emplace/52799.cc 
b/libstdc++-v3/testsuite/23_containers/deque/modifiers/emplace/52799.cc
deleted file mode 100644
index 0beebb58248..000
--- a/libstdc++-v3/testsuite/23_containers/deque/modifiers/emplace/52799.cc
+++ /dev/null
@@ -1,27 +0,0 @@
-// { dg-do compile { target c++11 } }
-
-// Copyright (C) 2012-2023 Free Software Foundation, Inc.
-//
-// This fil

Re: [PATCH] testsuite: fix the condition bug in tsvc s176

2023-06-09 Thread Richard Biener via Gcc-patches
On Thu, Jun 8, 2023 at 1:24 PM Lehua Ding  wrote:
>
> Hi,
>
> This patch fixes the problem that the loop in the tsvc s176 function is
> optimized and removed because `iterations/LEN_1D` is 0 (where iterations
> is set to 1, LEN_1D is set to 32000 in tsvc.h).
>
> This testcase passed on x86 and AArch64 system.

OK.

It's odd that the checksum doesn't depend on the number of iterations done ...

> Best,
> Lehua
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/tsvc/vect-tsvc-s176.c: adjust iterations
>
> ---
>  gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s176.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s176.c 
> b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s176.c
> index 79faf7fdb9e4..365e5205982b 100644
> --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s176.c
> +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s176.c
> @@ -14,7 +14,7 @@ real_t s176(struct args_t * func_args)
>  initialise_arrays(__func__);
>
>  int m = LEN_1D/2;
> -for (int nl = 0; nl < 4*(iterations/LEN_1D); nl++) {
> +for (int nl = 0; nl < 4*(10*iterations/LEN_1D); nl++) {
>  for (int j = 0; j < (LEN_1D/2); j++) {
>  for (int i = 0; i < m; i++) {
>  a[i] += b[i+m-j-1] * c[j];
> @@ -39,4 +39,4 @@ int main (int argc, char **argv)
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } 
> } */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
> --
> 2.36.1
>


Re: [committed] libstdc++: Fix code size regressions in std::vector [PR110060]

2023-06-09 Thread Richard Biener via Gcc-patches
On Thu, Jun 8, 2023 at 11:15 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Thu, Jun 08, 2023 at 10:05:43AM +0100, Jonathan Wakely via Gcc-patches 
> wrote:
> > > Looking at assembly, one of the differences I see is that the "after"
> > > version has calls to realloc_insert(), while "before" version seems to 
> > > have
> > > them inlined [2].
> > >
> > > [1]
> > > https://git.linaro.org/toolchain/ci/interesting-commits.git/tree/gcc/sha1/b7b255e77a271974479c34d1db3daafc04b920bc/tcwg_bmk-code_size-cpu2017fast/status.txt
> > >
> > >
> > I find it annoying that adding `if (n < sz) __builtin_unreachable()` seems
> > to affect the size estimates for the function, and so perturbs inlining
> > decisions. That code shouldn't add any actual instructions, so shouldn't
> > affect size estimates.
> >
> > I mentioned this in a meeting last week and Jason suggested checking
> > whether using __builtin_assume has the same undesirable consequences, so I
>
> We don't support __builtin_assume (intentionally), if you mean 
> [[assume(n>=sz)]],
> then because n >= sz doesn't have side-effects, it will be lowered to
> exactly that if (n < sz) __builtin_unreachable(); - you can look at
> -fdump-tree-all to confirm that.
>
> I agree that the inliner should ignore if (comparison) 
> __builtin_unreachable();
> from costs estimation.  And inliner should ignore what we emit for 
> [[assume()]]
> if there are side-effects.  CCing Honza.

Agreed, that would be nice.  Note that we have inliner limits in place to avoid
compile-time and memory-usage explosion as well so these kind of
"tricks" may be a way to
defeat them.
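
For reference, a hedged sketch of the pattern under discussion (identifiers
are illustrative, not the libstdc++ source):

#include <stddef.h>

/* The assumption "n >= sz", spelled the way described above (and the way
   [[assume (n >= sz)]] is lowered): the branch emits no instructions, yet
   its statements can still feed the inliner's size heuristics.  */
void
assume_ge (size_t n, size_t sz)
{
  if (n < sz)
    __builtin_unreachable ();
  /* ... code that benefits from knowing n >= sz ...  */
}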

Richard.

>
> Jakub
>


Re: [PATCH] Add COMPLEX_VECTOR_INT modes

2023-06-09 Thread Richard Sandiford via Gcc-patches
Andrew Stubbs  writes:
> On 07/06/2023 20:42, Richard Sandiford wrote:
>> I don't know if this helps (probably not), but we have a similar
>> situation on AArch64: a 64-bit mode like V8QI can be doubled to a
>> 128-bit vector or to a pair of 64-bit vectors.  We used V16QI for
>> the former and "V2x8QI" for the latter.  V2x8QI is forced to come
>> after V16QI in the mode list, and so it is only ever used through
>> explicit choice.  But both modes are functionally vectors of 16 QIs.
>
> OK, that's interesting, but how do you map "complex int" vectors to that 
> mode? I tried to figure it out, but there's no DIVMOD support so I 
> couldn't just do a straight comparison.

Yeah, we don't do that currently.  Instead we make TARGET_ARRAY_MODE
return V2x8QI for an array of 2 V8QIs (which is OK, since V2x8QI has
64-bit rather than 128-bit alignment).  So we should use it for a
complex-y type like:

  struct { res_type res[2]; };

In principle we should be able to do the same for:

  struct { res_type a, b; };

but that isn't supported yet.  I think it would need a new target hook
along the lines of TARGET_ARRAY_MODE, but for structs rather than arrays.
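
A hedged sketch of the two wrappers being contrasted (the element type is an
illustrative stand-in, not GCC source): the array form can already be given a
tuple mode via TARGET_ARRAY_MODE, while the plain two-field struct would need
the new hook.

typedef int res_type __attribute__ ((vector_size (8)));  /* some 64-bit vector type.  */

struct arr_pair  { res_type res[2]; };  /* array of two results: TARGET_ARRAY_MODE can apply.  */
struct two_field { res_type a, b; };    /* plain pair: would need a struct analogue of that hook.  */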

The advantage of this from AArch64's PoV is that it extends to 3x and 4x
tuples as well, whereas complex is obviously for pairs only.

I don't know if it would be acceptable to use that kind of struct wrapper
for the divmod code though (for the vector case only).

Thanks,
Richard


Re: [PATCH RFC] c++: use __cxa_call_terminate for MUST_NOT_THROW [PR97720]

2023-06-09 Thread Richard Biener via Gcc-patches
On Thu, Jun 8, 2023 at 3:14 PM Jonathan Wakely via Gcc-patches
 wrote:
>
> On Fri, 26 May 2023 at 10:58, Jonathan Wakely wrote:
>
> >
> >
> > On Wed, 24 May 2023 at 19:56, Jason Merrill via Libstdc++ <
> > libstd...@gcc.gnu.org> wrote:
> >
> >> Middle-end folks: any thoughts about how best to make the change
> >> described in
> >> the last paragraph below?
> >>
> >> Library folks: any thoughts on the changes to __cxa_call_terminate?
> >>
> >
> > I see no harm in exporting it (with the adjusted signature). The "looks
> > standard but isn't" name is a little unfortunate, but not a big deal.
> >
>
> Jason, do you have any objection to exporting __cxa_call_terminate for GCC
> 13.2 as well, even though the FE won't use it?
>
> Currently both gcc-13 and trunk are at the same library version,
> libstdc++.so.6.0.32
>
> But with this addition to trunk we need to bump that .32 to .33, meaning
> that gcc-13 and trunk diverge. If we want to backport any new symbols from
> trunk to gcc-13 that gets trickier once they've diverged.

But if you backport any new used symbol you have to bump the version
anyway.  So why not bump now (on trunk)?

> If we added __cxa_call_terminate to gcc-13, making it another new addition
> to libstdc++.so.6.0.32, then it would simplify a few things.
>
> In theory it could be a problem for distros already shipping gcc-13.1.1
> with that new libstdc++.so.6.0.32 version, but since the
> __cxa_call_terminate symbol won't actually be used by the gcc-13.1.1
> compilers, I don't think it will be a problem.


Re: [PATCH] fix frange_nextafter odr violation

2023-06-09 Thread Richard Biener via Gcc-patches
On Thu, Jun 8, 2023 at 4:38 PM Alexandre Oliva via Gcc-patches
 wrote:
>
>
> C++ requires inline functions to be declared inline and defined in
> every translation unit that uses them.  frange_nextafter is used in
> gimple-range-op.cc but it's only defined as inline in
> range-op-float.cc.  Drop the extraneous inline specifier.
>
> Other non-static inline functions in range-op-float.cc are not
> referenced elsewhere, so I'm making them static.
>
> Bootstrapping on x86_64-linux-gnu, along with other changes that exposed
> the problem; it's already into stage3, and it wouldn't get past stage2
> before.  Ok to install?

OK

>
> for  gcc/ChangeLog
>
> * range-op-float.cc (frange_nextafter): Drop inline.
> (frelop_early_resolve): Add static.
> (frange_float): Likewise.
> ---
>  gcc/range-op-float.cc |6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
> index a99a6b01ed835..d6da2aa701ee3 100644
> --- a/gcc/range-op-float.cc
> +++ b/gcc/range-op-float.cc
> @@ -255,7 +255,7 @@ maybe_isnan (const frange &op1, const frange &op2)
>  // Floating version of relop_early_resolve that takes into account NAN
>  // and -ffinite-math-only.
>
> -inline bool
> +static inline bool
>  frelop_early_resolve (irange &r, tree type,
>   const frange &op1, const frange &op2,
>   relation_trio rel, relation_kind my_rel)
> @@ -272,7 +272,7 @@ frelop_early_resolve (irange &r, tree type,
>
>  // Set VALUE to its next real value, or INF if the operation overflows.
>
> -inline void
> +void
>  frange_nextafter (enum machine_mode mode,
>   REAL_VALUE_TYPE &value,
>   const REAL_VALUE_TYPE &inf)
> @@ -2878,7 +2878,7 @@ namespace selftest
>
>  // Build an frange from string endpoints.
>
> -inline frange
> +static inline frange
>  frange_float (const char *lb, const char *ub, tree type = float_type_node)
>  {
>REAL_VALUE_TYPE min, max;
>
>
> --
> Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about 


Re: [PATCH] doc: Clarification for -Wmissing-field-initializers

2023-06-09 Thread Richard Biener via Gcc-patches
On Thu, Jun 8, 2023 at 7:57 PM Marek Polacek via Gcc-patches
 wrote:
>
> The manual is incorrect in saying that the option does not warn
> about designated initializers, which it does in C++.  Whether the
> divergence in behavior is desirable is another thing, but let's
> at least make the manual match the reality.
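
A small hedged example of the divergence being documented (compile once as C
and once as C++ with -Wextra; the comments paraphrase the behavior rather
than quote GCC's diagnostics):

struct s { int f, g, h; };

struct s a = { 3, 4 };            /* warns in both C and C++: h is left out.  */
struct s b = { .f = 3, .g = 4 };  /* designated form: silent in C, warned about in C++.  */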

OK.

> PR c/39589
> PR c++/96868
>
> gcc/ChangeLog:
>
> * doc/invoke.texi: Clarify that -Wmissing-field-initializers doesn't
> warn about designated initializers in C only.
> ---
>  gcc/doc/invoke.texi | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 6d08229ce40..0870f7aff93 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -9591,8 +9591,9 @@ struct s @{ int f, g, h; @};
>  struct s x = @{ 3, 4 @};
>  @end smallexample
>
> -This option does not warn about designated initializers, so the following
> -modification does not trigger a warning:
> +@c It's unclear if this behavior is desirable.  See PR39589 and PR96868.
> +In C this option does not warn about designated initializers, so the
> +following modification does not trigger a warning:
>
>  @smallexample
>  struct s @{ int f, g, h; @};
>
> base-commit: 1379ae33e05c28d705f3c69a3f6c774bf6e83136
> --
> 2.40.1
>


Re: [PATCH] MATCH: Fix zero_one_valued_p not to match signed 1 bit integers

2023-06-09 Thread Richard Biener via Gcc-patches
On Fri, Jun 9, 2023 at 3:48 AM Andrew Pinski via Gcc-patches
 wrote:
>
> So for the attached testcase, we assumed that zero_one_valued_p would
> be the value [0,1] but currently zero_one_valued_p matches also
> signed 1 bit integers.
> This changes that not to match that and fixes the 2 new testcases at
> all optimization levels.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

> Note the GCC 13 patch will be slightly different due to the changes
> made to zero_one_valued_p.
>
> PR tree-optimization/110165
> PR tree-optimization/110166
>
> gcc/ChangeLog:
>
> * match.pd (zero_one_valued_p): Don't accept
> signed 1-bit integers.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/execute/pr110165-1.c: New test.
> * gcc.c-torture/execute/pr110166-1.c: New test.
> ---
>  gcc/match.pd  | 13 ++--
>  .../gcc.c-torture/execute/pr110165-1.c| 28 
>  .../gcc.c-torture/execute/pr110166-1.c| 33 +++
>  3 files changed, 71 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr110165-1.c
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr110166-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 4ad037d641a..9a6bc2e9348 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1984,12 +1984,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>@0)
>
>  /* zero_one_valued_p will match when a value is known to be either
> -   0 or 1 including constants 0 or 1. */
> +   0 or 1 including constants 0 or 1.
> +   Signed 1-bits includes -1 so they cannot match here. */
>  (match zero_one_valued_p
>   @0
> - (if (INTEGRAL_TYPE_P (type) && wi::leu_p (tree_nonzero_bits (@0), 1
> + (if (INTEGRAL_TYPE_P (type)
> +  && (TYPE_UNSIGNED (type)
> + || TYPE_PRECISION (type) > 1)
> +  && wi::leu_p (tree_nonzero_bits (@0), 1
>  (match zero_one_valued_p
> - truth_valued_p@0)
> + truth_valued_p@0
> + (if (INTEGRAL_TYPE_P (type)
> +  && (TYPE_UNSIGNED (type)
> + || TYPE_PRECISION (type) > 1
>
>  /* Transform { 0 or 1 } * { 0 or 1 } into { 0 or 1 } & { 0 or 1 }.  */
>  (simplify
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr110165-1.c 
> b/gcc/testsuite/gcc.c-torture/execute/pr110165-1.c
> new file mode 100644
> index 000..9521a19428e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr110165-1.c
> @@ -0,0 +1,28 @@
> +struct s
> +{
> +  int t : 1;
> +};
> +
> +int f(struct s t, int a, int b) __attribute__((noinline));
> +int f(struct s t, int a, int b)
> +{
> +int bd = t.t;
> +if (bd) a|=b;
> +return a;
> +}
> +
> +int main(void)
> +{
> +struct s t;
> +for(int i = -1;i <= 1; i++)
> +{
> +int a = 0x10;
> +int b = 0x0f;
> +int c = a | b;
> +   struct s t = {i};
> +int r = f(t, a, b);
> +int exp = (i != 0) ? a | b : a;
> +if (exp != r)
> + __builtin_abort();
> +}
> +}
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr110166-1.c 
> b/gcc/testsuite/gcc.c-torture/execute/pr110166-1.c
> new file mode 100644
> index 000..f999d47fe69
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr110166-1.c
> @@ -0,0 +1,33 @@
> +struct s
> +{
> +  int t : 1;
> +  int t1 : 1;
> +};
> +
> +int f(struct s t) __attribute__((noinline));
> +int f(struct s t)
> +{
> +   int c = t.t;
> +   int d = t.t1;
> +   if (c > d)
> + t.t = d;
> +   else
> + t.t = c;
> +  return t.t;
> +}
> +
> +int main(void)
> +{
> +struct s t;
> +for(int i = -1;i <= 0; i++)
> +{
> +  for(int j = -1;j <= 0; j++)
> +  {
> +   struct s t = {i, j};
> +int r = f(t);
> +int exp = i < j ? i : j;
> +if (exp != r)
> + __builtin_abort();
> +  }
> +}
> +}
> --
> 2.31.1
>


Re: [PATCH RFC] c++: use __cxa_call_terminate for MUST_NOT_THROW [PR97720]

2023-06-09 Thread Jonathan Wakely via Gcc-patches
On Fri, 9 Jun 2023 at 10:03, Richard Biener 
wrote:

> On Thu, Jun 8, 2023 at 3:14 PM Jonathan Wakely via Gcc-patches
>  wrote:
> >
> > On Fri, 26 May 2023 at 10:58, Jonathan Wakely wrote:
> >
> > >
> > >
> > > On Wed, 24 May 2023 at 19:56, Jason Merrill via Libstdc++ <
> > > libstd...@gcc.gnu.org> wrote:
> > >
> > >> Middle-end folks: any thoughts about how best to make the change
> > >> described in
> > >> the last paragraph below?
> > >>
> > >> Library folks: any thoughts on the changes to __cxa_call_terminate?
> > >>
> > >
> > > I see no harm in exporting it (with the adjusted signature). The "looks
> > > standard but isn't" name is a little unfortunate, but not a big deal.
> > >
> >
> > Jason, do you have any objection to exporting __cxa_call_terminate for
> GCC
> > 13.2 as well, even though the FE won't use it?
> >
> > Currently both gcc-13 and trunk are at the same library version,
> > libstdc++.so.6.0.32
> >
> > But with this addition to trunk we need to bump that .32 to .33, meaning
> > that gcc-13 and trunk diverge. If we want to backport any new symbols
> from
> > trunk to gcc-13 that gets trickier once they've diverged.
>
> But if you backport any new used symbol you have to bump the version
> anyway.  So why not bump now (on trunk)?
>

We've already bumped it once since 13.1, and until 13.2 is released we
aren't committed to freezing the new version. I think we can add this
__cxa_call_terminate symbol to the version currently used by 13.1.1 without
problems. And if we want to backport another new symbol before 13.2, we can
do that too (unless it would be too difficult for distros already shipping
13.1.1, but I don't think that applies in this case).


Re: [PATCH RFC] c++: use __cxa_call_terminate for MUST_NOT_THROW [PR97720]

2023-06-09 Thread Jakub Jelinek via Gcc-patches
On Fri, Jun 09, 2023 at 11:02:48AM +0200, Richard Biener via Gcc-patches wrote:
> > Currently both gcc-13 and trunk are at the same library version,
> > libstdc++.so.6.0.32
> >
> > But with this addition to trunk we need to bump that .32 to .33, meaning
> > that gcc-13 and trunk diverge. If we want to backport any new symbols from
> > trunk to gcc-13 that gets trickier once they've diverged.
> 
> But if you backport any new used symbol you have to bump the version
> anyway.  So why not bump now (on trunk)?

We've already done that in 13.1.1.  So, before 13.2 is released, we can add
further symbols to the GLIBCXX_3.4.32 symbol version.
Though, I don't see a problem bumping libstdc++ to libstdc++.so.6.0.33
on the trunk now and put __cxa_call_terminate to GLIBCXX_3.4.33.
The ABI on the trunk is certainly not stable at this point.
If we come up with a need to introduce another symbol to 13.2, we can just
add it to GLIBCXX_3.4.32 on the trunk and then backport that change to the
branch.  If nothing in 13 will use the new symbol, seems like a waste to add
it to libstdc++.so.6.0.32.

> > If we added __cxa_call_terminate to gcc-13, making it another new addition
> > to libstdc++.so.6.0.32, then it would simplify a few things.
> >
> > In theory it could be a problem for distros already shipping gcc-13.1.1
> > with that new libstdc++.so.6.0.32 version, but since the
> > __cxa_call_terminate symbol won't actually be used by the gcc-13.1.1
> > compilers, I don't think it will be a problem.

Jakub



Re: [PATCH] Make sure SCALAR_INT_MODE_P before invoke try_const_anchors

2023-06-09 Thread Jiufu Guo via Gcc-patches


Hi,

Richard Biener  writes:

> On Fri, 9 Jun 2023, Richard Sandiford wrote:
>
>> guojiufu  writes:
>> > Hi,
>> >
>> > On 2023-06-09 16:00, Richard Biener wrote:
>> >> On Fri, 9 Jun 2023, Jiufu Guo wrote:
>> >> 
>> >>> Hi,
>> >>> 
>> >>> As checking the code, there is a "gcc_assert (SCALAR_INT_MODE_P 
>> >>> (mode))"
>> >>> in "try_const_anchors".
>> >>> This assert seems correct because the function try_const_anchors cares
>> >>> about integer values currently, and modes other than SCALAR_INT_MODE_P
>> >>> are not needed to support.
>> >>> 
>> >>> This patch makes sure SCALAR_INT_MODE_P when calling 
>> >>> try_const_anchors.
>> >>> 
>> >>> This patch is raised when drafting below one.
>> >>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html.
>> >>> With that patch, "{[%1:DI]=0;} stack_tie" with BLKmode runs into
>> >>> try_const_anchors, and hits the assert/ice.
>> >>> 
>> >>> Boostrap and regtest pass on ppc64{,le} and x86_64.
>> >>> Is this ok for trunk?
>> >> 
>> >> Iff the correct fix at all (how can a CONST_INT have BLKmode?) then
>> >> I suggest to instead fix try_const_anchors to change
>> >> 
>> >>   /* CONST_INT is used for CC modes, but we should leave those alone.  
>> >> */
>> >>   if (GET_MODE_CLASS (mode) == MODE_CC)
>> >> return NULL_RTX;
>> >> 
>> >>   gcc_assert (SCALAR_INT_MODE_P (mode));
>> >> 
>> >> to
>> >> 
>> >>   /* CONST_INT is used for CC modes, leave any non-scalar-int mode 
>> >> alone.  */
>> >>   if (!SCALAR_INT_MODE_P (mode))
>> >> return NULL_RTX;
>> >> 
>> >
>> > This is also able to fix this issue.  there is a "Punt on CC modes" 
>> > patch
>> > to return NULL_RTX in try_const_anchors.
>> >
>> >> but as said I wonder how we arrive at a BLKmode CONST_INT and whether
>> >> we should have fended this off earlier.  Can you share more complete
>> >> RTL of that stack_tie?
>> >
>> >
>> > (insn 15 14 16 3 (parallel [
>> >  (set (mem/c:BLK (reg/f:DI 1 1) [1  A8])
>> >  (const_int 0 [0]))
>> >  ]) "/home/guojiufu/temp/gdb.c":13:3 922 {stack_tie}
>> >   (nil))
>> >
>> > It is "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])".
>> 
>> I'm not convinced this is correct RTL.  (unspec:BLK [(const_int 0)] ...)
>> would be though.  It's arguably more accurate too, since the effect
>> on the stack locations is unspecified rather than predictable.
>
> powerpc seems to be the only port with a stack_tie that's not
> using an UNSPEC RHS.
In rs6000.md, it is

; This is to explain that changes to the stack pointer should
; not be moved over loads from or stores to stack memory.
(define_insn "stack_tie"
  [(match_parallel 0 "tie_operand"
   [(set (mem:BLK (reg 1)) (const_int 0))])]
  ""
  ""
  [(set_attr "length" "0")])

This would be just a placeholder insn, and acts as the comment describes.
An UNSPEC_ would work as on other targets.  Still, I'm wondering about
the concerns with "set (mem:BLK (reg 1)) (const_int 0)".
Is it the MODEs between SET_DEST and SET_SRC?

Thanks for comments!

BR,
Jeff (Jiufu Guo)
>
>> Thanks,
>> Richard


Re: [PATCH] MATCH: Fix zero_one_valued_p not to match signed 1 bit integers

2023-06-09 Thread Jakub Jelinek via Gcc-patches
On Fri, Jun 09, 2023 at 11:06:04AM +0200, Richard Biener via Gcc-patches wrote:
> On Fri, Jun 9, 2023 at 3:48 AM Andrew Pinski via Gcc-patches
>  wrote:
> >
> > So for the attached testcase, we assumed that zero_one_valued_p would
> > be the value [0,1] but currently zero_one_valued_p matches also
> > signed 1 bit integers.
> > This changes that not to match that and fixes the 2 new testcases at
> > all optimization levels.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> 
> OK.

Note, this means it won't return true for zero INTEGER_CSTs with
signed 1-bit precision type.  Such value is in the [0, 1] range.
Though, I guess signed 1-bit precision types are so rare and problematic
that it doesn't hurt not to optimize that.
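
A tiny illustration of why such types are excluded (a hedged sketch, assuming
the usual two's-complement representation): the value set of a signed 1-bit
integer is {-1, 0}, not {0, 1}.

#include <stdio.h>

struct s { signed int t : 1; };

int
main (void)
{
  struct s x = { -1 };   /* -1 is representable in the 1-bit field; +1 is not.  */
  printf ("%d\n", x.t);  /* prints -1.  */
  return 0;
}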

Jakub



Re: [PATCH V6] VECT: Add SELECT_VL support

2023-06-09 Thread juzhe.zh...@rivai.ai
Bootstrap on X86 has just finished and passed.
Ok for trunk?

Thanks.


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-06-09 16:39
To: gcc-patches
CC: richard.sandiford; rguenther; Ju-Zhe Zhong
Subject: [PATCH V6] VECT: Add SELECT_VL support
From: Ju-Zhe Zhong 
 
Co-authored-by: Richard Sandiford
Co-authored-by: Richard Biener 
 
This patch addresses comments from Richard && Richi and rebases to trunk.
 
This patch adds SELECT_VL middle-end support
to allow targets to have target-dependent optimization of the
length calculation.
 
This patch is inspired by RVV ISA and LLVM:
https://reviews.llvm.org/D99750
 
SELECT_VL has the same behavior as LLVM's "get_vector_length", with
the following properties:
 
1. Only apply on single-rgroup.
2. non SLP.
3. adjust loop control IV.
4. adjust data reference IV.
5. allow non-vf elements processing in non-final iteration
 
Code:
   # void vvaddint32(size_t n, const int*x, const int*y, int*z)
# { for (size_t i=0; i<n; i++) { z[i] = x[i] + y[i]; } }
-_36 = MIN_EXPR ;
+_36 = (MIN_EXPR | SELECT_VL) ;
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
@@ -549,15 +549,28 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
- &incr_gsi, insert_after, &index_before_incr,
- &index_after_incr);
-  gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
- index_before_incr,
- nitems_step));
+  if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
+ {
+   create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
+  insert_after, &index_before_incr, &index_after_incr);
+   tree len = gimple_build (header_seq, IFN_SELECT_VL, iv_type,
+index_before_incr, nitems_step);
+   gimple_seq_add_stmt (header_seq, gimple_build_assign (step, len));
+ }
+  else
+ {
+   create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
+  &incr_gsi, insert_after, &index_before_incr,
+  &index_after_incr);
+   gimple_seq_add_stmt (header_seq,
+gimple_build_assign (step, MIN_EXPR,
+ index_before_incr,
+ nitems_step));
+ }
   *iv_step = step;
   *compare_step = nitems_step;
-  return index_before_incr;
+  return LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) ? index_after_incr
+: index_before_incr;
 }
   /* Create increment IV.  */
@@ -888,7 +901,8 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
   /* Get a boolean result that tells us whether to iterate.  */
   edge exit_edge = single_exit (loop);
   gcond *cond_stmt;
-  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
+  && !LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
 {
   gcc_assert (compare_step);
   tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : 
GT_EXPR;
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 5b7a0da0034..ace9e759f5b 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -974,6 +974,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
vec_info_shared *shared)
 can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
 using_partial_vectors_p (false),
 using_decrementing_iv_p (false),
+using_select_vl_p (false),
 epil_using_partial_vectors_p (false),
 partial_load_store_bias (0),
 peeling_for_gaps (false),
@@ -2737,6 +2738,77 @@ start_over:
LOOP_VINFO_VECT_FACTOR (loop_vinfo
 LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
+  /* If a loop uses length controls and has a decrementing loop control IV,
+ we will normally pass that IV through a MIN_EXPR to calculate the
+ basis for the length controls.  E.g. in a loop that processes one
+ element per scalar iteration, the number of elements would be
+ MIN_EXPR , where N is the number of scalar iterations left.
+
+ This MIN_EXPR approach allows us to use pointer IVs with an invariant
+ step, since only the final iteration of the vector loop can have
+ inactive lanes.
+
+ However, some targets have a dedicated instruction for calculating the
+ preferred length, given the total number of elements that still need to
+ be processed.  This is encapsulated in the SELECT_VL internal function.
+
+ If the target supports SELECT_VL, we can use it instead of MIN_EXPR
+ to determine the basis for the length controls.  However, unlike the
+ MIN_EXPR calculation, the SELECT_VL calculation can decide to make
+ lanes inactive in any iteration of the vector loop, not just the last
+ iteration.  This SELECT_VL approach therefore requires us to use pointer
+ IVs with variable steps.
+
+ Once we've decided how many elements should be processed by one
+ iteration of the vector loop, we need to populate the rgroup controls.
+ If 

Re: [PATCH] Add COMPLEX_VECTOR_INT modes

2023-06-09 Thread Andrew Stubbs

On 09/06/2023 10:02, Richard Sandiford wrote:

Andrew Stubbs  writes:

On 07/06/2023 20:42, Richard Sandiford wrote:

I don't know if this helps (probably not), but we have a similar
situation on AArch64: a 64-bit mode like V8QI can be doubled to a
128-bit vector or to a pair of 64-bit vectors.  We used V16QI for
the former and "V2x8QI" for the latter.  V2x8QI is forced to come
after V16QI in the mode list, and so it is only ever used through
explicit choice.  But both modes are functionally vectors of 16 QIs.


OK, that's interesting, but how do you map "complex int" vectors to that
mode? I tried to figure it out, but there's no DIVMOD support so I
couldn't just do a straight comparison.


Yeah, we don't do that currently.  Instead we make TARGET_ARRAY_MODE
return V2x8QI for an array of 2 V8QIs (which is OK, since V2x8QI has
64-bit rather than 128-bit alignment).  So we should use it for a
complex-y type like:

   struct { res_type res[2]; };

In principle we should be able to do the same for:

   struct { res_type a, b; };

but that isn't supported yet.  I think it would need a new target hook
along the lines of TARGET_ARRAY_MODE, but for structs rather than arrays.

The advantage of this from AArch64's PoV is that it extends to 3x and 4x
tuples as well, whereas complex is obviously for pairs only.

I don't know if it would be acceptable to use that kind of struct wrapper
for the divmod code though (for the vector case only).


Looking again, I don't think this will help because GCN does not have an 
instruction that loads vectors that are back-to-back, hence there's 
little benefit in adding the tuple mode.


However, GCN does have instructions that effectively load 2, 3, or 4 
vectors that are *interleaved*, which would be the likely case for 
complex numbers (or pixel colour data!)


I need to figure out how to move forward with this patch, please; if the 
new complex modes are not acceptable then I think I need to reimplement 
DIVMOD (maybe the scalars can remain as-is), but it's not clear to me 
what that would look like.


Andrew


Re: [PATCH v4] RISC-V: Add vector psabi checking.

2023-06-09 Thread Kito Cheng via Gcc-patches
Hmmm, I still saw some fails in the testsuite after applying this patch;
most are because the testcase uses a vector type as an argument or
return value, but... vector-abi-1.c should not fail, I think?

For the other fails, I would suggest just adding -Wno-psabi to rvv.exp.
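
For reference, a minimal sketch of the kind of signature that now triggers
the psabi warning (intrinsic names as in riscv_vector.h; this is not taken
from the failing tests, and needs the vector extension enabled to compile):

#include <stddef.h>
#include <riscv_vector.h>

/* Both the argument and the return value are RVV vector types, so the new
   check warns for this function; the intrinsic it calls is itself exempt.  */
vint32m1_t
add_one (vint32m1_t v, size_t vl)
{
  return __riscv_vadd_vx_i32m1 (v, 1, vl);
}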

=== gcc: Unexpected fails for rv64imafdcv lp64d medlow ===
FAIL: gcc.target/riscv/vector-abi-1.c   -O0   (test for warnings, line 7)
FAIL: gcc.target/riscv/vector-abi-1.c   -O1   (test for warnings, line 7)
FAIL: gcc.target/riscv/vector-abi-1.c   -O2   (test for warnings, line 7)
FAIL: gcc.target/riscv/vector-abi-1.c   -O2 -flto
-fno-use-linker-plugin -flto-partition=none   (test for warnings, line
7)
FAIL: gcc.target/riscv/vector-abi-1.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects   (test for warnings, line 7)
FAIL: gcc.target/riscv/vector-abi-1.c   -O3 -g   (test for warnings, line 7)
FAIL: gcc.target/riscv/vector-abi-1.c   -Os   (test for warnings, line 7)
FAIL: gcc.target/riscv/vector-abi-1.c  -Og -g   (test for warnings, line 7)
FAIL: gcc.target/riscv/vector-abi-1.c  -Oz   (test for warnings, line 7)
FAIL: gcc.target/riscv/rvv/base/binop_vx_constraint-120.c (test for
excess errors)
FAIL: gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c (test
for excess errors)
FAIL: gcc.target/riscv/rvv/base/mask_insn_shortcut.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c (test
for excess errors)
FAIL: gcc.target/riscv/rvv/base/pr110109-2.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/base/scalar_move-9.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/base/vlmul_ext-1.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/base/zero_base_load_store_optimization.c
(test for excess errors)
FAIL: gcc.target/riscv/rvv/base/zvfh-intrinsic.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c (test for excess errors)

  = Summary of gcc testsuite =
   | # of unexpected case / # of unique unexpected case
   |  gcc |  g++ | gfortran |
rv32imafdc/ ilp32d/ medlow |   20 /12 |0 / 0 |0 / 0 |
rv32imafdcv/ ilp32d/ medlow |   25 /14 |   22 /22 |0 / 0 |
rv64imafdc/  lp64d/ medlow |   20 /12 |0 / 0 |0 / 0 |
rv64imafdcv/  lp64d/ medlow |   20 /12 |   21 /21 |0 / 0 |

On Fri, Jun 9, 2023 at 2:02 PM yanzhang.wang--- via Gcc-patches
 wrote:
>
> From: Yanzhang Wang 
>
> This patch adds support to check function's argument or return is vector type
> and throw warning if yes.
>
> There're two exceptions,
>   - The vector_size attribute.
>   - The intrinsic functions.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-protos.h (riscv_init_cumulative_args): Set
>   warning flag if func is not builtin
> * config/riscv/riscv.cc
> (riscv_scalable_vector_type_p): Determine whether the type is 
> scalable vector.
> (riscv_arg_has_vector): Determine whether the arg is vector type.
> (riscv_pass_in_vector_p): Check the vector type param is passed by 
> value.
> (riscv_init_cumulative_args): The same as header.
> (riscv_get_arg_info): Add the checking.
> (riscv_function_value): Check the func return and set warning flag
> * config/riscv/riscv.h (INIT_CUMULATIVE_ARGS): Add a flag to
>   determine whether warning psabi or not.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/vector-abi-1.c: New test.
> * gcc.target/riscv/vector-abi-2.c: New test.
> * gcc.target/riscv/vector-abi-3.c: New test.
> * gcc.target/riscv/vector-abi-4.c: New test.
> * gcc.target/riscv/vector-abi-5.c: New test.
> * gcc.target/riscv/vector-abi-6.c: New test.
>
> Signed-off-by: Yanzhang Wang 
> Co-authored-by: Kito Cheng 
> ---
>  gcc/config/riscv/riscv-protos.h   |   2 +
>  gcc/config/riscv/riscv.cc | 112 +-
>  gcc/config/riscv/riscv.h  |   5 +-
>  gcc/testsuite/gcc.target/riscv/vector-abi-1.c |  14 +++
>  gcc/testsuite/gcc.target/riscv/vector-abi-2.c |  15 +++
>  gcc/testsuite/gcc.target/riscv/vector-abi-3.c |  14 +++
>  gcc/testsuite/gcc.target/riscv/vector-abi-4.c |  16 +++
>  gcc/testsuite/gcc.target/riscv/vector-abi-5.c |  15 +++
>  gcc/testsuite/gcc.target/riscv/vector-abi-6.c |  20 
>  9 files changed, 211 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/vector-abi-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/vector-abi-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/vector-abi-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/vector-abi-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/vector-abi-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/vector-abi-6.c
>
> diff --git a/gcc/

Re: [PATCH] testsuite: fix the condition bug in tsvc s176

2023-06-09 Thread Lehua Ding
> It's odd that the checksum doesn't depend on the number of iterations done ...

This is because the difference between the calculated result (32063.902344)
and the expected result (32000.00) is small. The current check considers the
result correct as long as the `value/expected` ratio is between 0.99f and
1.01f. I'm not sure whether this check is strict enough, but I should also
update the expected result to 32063.902344 (the same value that is computed
without vectorization).
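
A hedged sketch of that tolerance check (the helper name is made up; the
harness code may differ):

/* A computed checksum is accepted when it is within 1% of the expected
   value; 32063.902344 / 32000.0 is about 1.002, so it passes either way.  */
int
close_enough (float value, float expected)
{
  float ratio = value / expected;
  return ratio > 0.99f && ratio < 1.01f;
}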

Best,
Lehua

gcc/testsuite/ChangeLog:

* gcc.dg/vect/tsvc/tsvc.h: Update the expected result of s176.
* gcc.dg/vect/tsvc/vect-tsvc-s176.c: Adjust the number of iterations and
  remove the xfail.

---
 gcc/testsuite/gcc.dg/vect/tsvc/tsvc.h   | 2 +-
 gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s176.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/tsvc.h 
b/gcc/testsuite/gcc.dg/vect/tsvc/tsvc.h
index cd39c041903d..d910c384fc83 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/tsvc.h
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/tsvc.h
@@ -1164,7 +1164,7 @@ real_t get_expected_result(const char * name)
 } else if (!strcmp(name, "s175")) {
return 32009.023438f;
 } else if (!strcmp(name, "s176")) {
-   return 32000.f;
+   return 32063.902344f;
 } else if (!strcmp(name, "s211")) {
return 63983.308594f;
 } else if (!strcmp(name, "s212")) {
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s176.c 
b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s176.c
index 79faf7fdb9e4..365e5205982b 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s176.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s176.c
@@ -14,7 +14,7 @@ real_t s176(struct args_t * func_args)
 initialise_arrays(__func__);
 
 int m = LEN_1D/2;
-for (int nl = 0; nl < 4*(iterations/LEN_1D); nl++) {
+for (int nl = 0; nl < 4*(10*iterations/LEN_1D); nl++) {
 for (int j = 0; j < (LEN_1D/2); j++) {
 for (int i = 0; i < m; i++) {
 a[i] += b[i+m-j-1] * c[j];
@@ -39,4 +39,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } 
*/
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
-- 
2.36.1



[PATCH] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS

2023-06-09 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch is to rework Phase 5 && Phase 6 of VSETVL PASS since Phase 5 && 
Phase 6
are quite messy and cause some bugs discovered by my downstream 
auto-vectorization
test-generator.

Before this patch.

Phase 5 is cleanup_insns, the function that removes the AVL operand dependency from
each RVV instruction.
E.g. vadd.vv (use a5), after Phase 5, ==> vadd.vv (use const_int 0). Since
"a5" is used in "vsetvl" instructions and the correct "vsetvl" instructions
have already been inserted, each RVV instruction
doesn't need the AVL operand "a5" anymore. Removing
this operand dependency helps the following scheduling PASS.

Phase 6 is propagate_avl, which does the following 2 things:
1. Local && Global user vsetvl instructions optimization.
   E.g.
  vsetvli a2, a2, e8, mf8   ==> Change it into vsetvli a2, a2, e32, mf2
  vsetvli zero,a2, e32, mf2  ==> eliminate
2. Optimize user vsetvl from "vsetvl a2,a2" into "vsetvl zero,a2" if "a2" is 
not used by any instructions.
Since Phase 1 ~ Phase 4 insert "vsetvli" instructions based on LCM,
which changes the CFG, I re-initialize a new
RTL_SSA framework (which is more expensive than just using DF) for Phase 6 and
optimize user vsetvli based on the new RTL_SSA.

There are 2 issues in Phase 5 && Phase 6:
1. local_eliminate_vsetvl_insn was introduced by @kito and can do
local user vsetvl optimizations better than
   Phase 6 does; such an approach doesn't need to re-initialize the RTL_SSA
framework. So the local user vsetvli instruction optimization
   in Phase 6 is redundant and should be removed.
2. A bug discovered by my downstream auto-vectorization test-generator (I can't
put the test in this patch since we are missing the autovec
   patterns for it, so we can't reproduce the issue directly with upstream GCC,
but I will remember to put it back after I support the
   necessary autovec patterns). Such a bug is caused by re-initializing the
RTL_SSA framework. The issue description is this:
   
Before Phase 6:
   ...
   insn1: vsetvli a3, 17 <== generated by SELECT_VL auto-vec pattern.
   slli a4,a3,3
   ...
   insn2: vsetvli zero, a3, ... 
   load (use const_int 0, before Phase 5, it's using a3, but the use of "a3" is 
removed in Phase 5)
   ...

In Phase 6, we iterate to insn2, then get the def of "a3", which is insn1.
insn2 is the vsetvli instruction inserted in Phase 4, which is not included in
the RTL_SSA framework even though we renew it (I didn't take a close look at it
and I don't think we need to now).
In this situation, the def_info of insn2 reports "set->single_nondebug_insn_use ()"
as true.  Obviously, this information is not correct, since insn1 has at least
2 uses:
1). slli a4,a3,3  2). insn2: vsetvli zero, a3, ...  As a result, the execution
test generated by my downstream test-generator failed.

Conclusion on the RTL_SSA framework:
Before this patch, we initialize RTL_SSA 2 times.  One is at the beginning of
the VSETVL PASS, which is absolutely correct; the other is the re-initialization
after Phase 4 (LCM), which has incorrect information that causes bugs.

Besides, we don't like to initialize RTL_SSA a second time; it seems to be a
waste since we just need to do a little optimization.

Based on all the circumstances described above, I rework and reorganize Phase 5 &&
Phase 6 as follows:
1. Phase 5 is called ssa_post_optimization, which does the optimization based on
   the RTL_SSA information (the RTL_SSA is initialized at the beginning of the
   VSETVL PASS, so there is no need to re-initialize it).  This phase includes
   3 optimizations:
   1). local_eliminate_vsetvl_insn we already have (no change).
   2). global_eliminate_vsetvl_insn ---> new optimization split from the
original Phase 6 but with a more powerful and reliable implementation.
  E.g. 
  void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) {
    size_t avl;
    if (m > 100)
      avl = __riscv_vsetvl_e16mf4(vl << 4);
    else {
      avl = __riscv_vsetvl_e8mf8(vl);
    }
    for (size_t i = 0; i < m; i++) {
      vint8mf8_t v0 = __riscv_vle8_v_i8mf8(base + i, avl);
      __riscv_vse8_v_i8mf8(out + i, v0, avl);
    }
  }
  Before this patch, global user vsetvl optimization failed on this example:
  f:
  ...
  vsetvli a2,a2,e16,mf4,ta,mu
  .L3:
  li  a5,0
  vsetvli zero,a2,e8,mf8,ta,ma
  .L5:
  ...
  vle8.v  v1,0(a6)
  addi    a5,a5,1
  vse8.v  v1,0(a4)
  bgtu    a3,a5,.L5
  .L10:
  ret
  .L2:
  beq a3,zero,.L10
  vsetvli a2,a2,e8,mf8,ta,mu
  j   .L3
  With this patch:
  f:
  ...
  vsetvli zero,a2,e16,mf4,ta,mu
  .L3:
  li  a5,0
  .L5:
  ...
  vle8.v  v1,0(a6)
  addi    a5,a5,1
  vse8.v  v1,0(a4)
  bgtu    a3,a5,.L5
  .L10:
   

[PATCH V2] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS

2023-06-09 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch reworks Phase 5 && Phase 6 of the VSETVL PASS, since Phase 5 && Phase 6
are quite messy and cause some bugs discovered by my downstream auto-vectorization
test-generator.

Before this patch:

Phase 5 (cleanup_insns) is the function that removes the AVL operand dependency
from each RVV instruction.
E.g. vadd.vv (use a5) becomes, after Phase 5, vadd.vv (use const_int 0).  Since
"a5" is used in the "vsetvl" instructions, and after the correct "vsetvl"
instructions are inserted each RVV instruction no longer needs the AVL operand
"a5", removing this operand dependency helps the following scheduling pass.

Phase 6 (propagate_avl) does the following 2 things:
1. Local && Global user vsetvl instruction optimization.
   E.g.
      vsetvli a2, a2, e8, mf8   ==> change it into vsetvli a2, a2, e32, mf2
      vsetvli zero,a2, e32, mf2  ==> eliminate
2. Optimize a user vsetvl from "vsetvl a2,a2" into "vsetvl zero,a2" if "a2" is
not used by any instruction.
Since Phase 1 ~ Phase 4 insert "vsetvli" instructions based on LCM, which changes
the CFG, I re-initialize a new RTL_SSA framework (which is more expensive than
just using DF) for Phase 6 and optimize user vsetvli based on the new RTL_SSA.

There are 2 issues in Phase 5 && Phase 6:
1. local_eliminate_vsetvl_insn was introduced by @kito and can do local user
   vsetvl optimizations better than Phase 6 does, and it doesn't need to
   re-initialize the RTL_SSA framework.  So the local user vsetvli instruction
   optimization in Phase 6 is redundant and should be removed.
2. A bug discovered by my downstream auto-vectorization test-generator (I can't
   put the test in this patch since we are missing the autovec patterns for it,
   so the issue can't be reproduced with upstream GCC directly, but I will
   remember to add it back once I support the necessary autovec patterns).  The
   bug is caused by re-initializing the RTL_SSA framework.  The issue
   description is this:
   
Before Phase 6:
   ...
   insn1: vsetvli a3, 17 <== generated by SELECT_VL auto-vec pattern.
   slli a4,a3,3
   ...
   insn2: vsetvli zero, a3, ... 
   load (use const_int 0, before Phase 5, it's using a3, but the use of "a3" is 
removed in Phase 5)
   ...

In Phase 6, we iterate to insn2, then get the def of "a3", which is insn1.
insn2 is the vsetvli instruction inserted in Phase 4, which is not included in
the RTL_SSA framework even though we renew it (I didn't take a close look at it
and I don't think we need to now).
In this situation, the def_info of insn2 reports "set->single_nondebug_insn_use ()"
as true.  Obviously, this information is not correct, since insn1 has at least
2 uses:
1). slli a4,a3,3  2). insn2: vsetvli zero, a3, ...  As a result, the execution
test generated by my downstream test-generator failed.

Conclusion on the RTL_SSA framework:
Before this patch, we initialize RTL_SSA 2 times.  One is at the beginning of
the VSETVL PASS, which is absolutely correct; the other is the re-initialization
after Phase 4 (LCM), which has incorrect information that causes bugs.

Besides, we don't like to initialize RTL_SSA a second time; it seems to be a
waste since we just need to do a little optimization.

Based on all the circumstances described above, I rework and reorganize Phase 5 &&
Phase 6 as follows:
1. Phase 5 is called ssa_post_optimization, which does the optimization based on
   the RTL_SSA information (the RTL_SSA is initialized at the beginning of the
   VSETVL PASS, so there is no need to re-initialize it).  This phase includes
   3 optimizations:
   1). local_eliminate_vsetvl_insn we already have (no change).
   2). global_eliminate_vsetvl_insn ---> new optimization split from the
original Phase 6 but with a more powerful and reliable implementation.
  E.g. 
  void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) {
    size_t avl;
    if (m > 100)
      avl = __riscv_vsetvl_e16mf4(vl << 4);
    else
      avl = __riscv_vsetvl_e32mf2(vl >> 8);
    for (size_t i = 0; i < m; i++) {
      vint8mf8_t v0 = __riscv_vle8_v_i8mf8(base + i, avl);
      v0 = __riscv_vadd_vv_i8mf8 (v0, v0, avl);
      __riscv_vse8_v_i8mf8(out + i, v0, avl);
    }
  }

  Before this patch, global user vsetvl optimization failed on this example:
  f:
  li  a5,100
  bleu    a3,a5,.L2
  slli    a2,a2,4
  vsetvli a4,a2,e16,mf4,ta,mu
  .L3:
  li  a5,0
  vsetvli zero,a4,e8,mf8,ta,ma
  .L5:
  add a6,a0,a5
  add a2,a1,a5
  vle8.v  v1,0(a6)
  addi    a5,a5,1
  vadd.vv v1,v1,v1
  vse8.v  v1,0(a2)
  bgtu    a3,a5,.L5
  .L10:
  ret
  .L2:
  beq a3,zero,.L10
  srli    a2,a2,8
  vsetvli a4,a2,e32,mf2,ta,mu
  j   .L3
  With this patch:
  f:
  li  a5,100
  bleu  

Re: [PATCH V2] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS

2023-06-09 Thread Kito Cheng via Gcc-patches
Thanks for sending this before the weekend, so I can run the fuzz testing
during this weekend :P

On Fri, Jun 9, 2023 at 6:41 PM  wrote:
>
> From: Juzhe-Zhong 
>
> This patch is to rework Phase 5 && Phase 6 of VSETVL PASS since Phase 5 && 
> Phase 6
> are quite messy and cause some bugs discovered by my downstream 
> auto-vectorization
> test-generator.
>
> Before this patch.
>
> Phase 5 is cleanup_insns is the function remove AVL operand dependency from 
> each RVV instruction.
> E.g. vadd.vv (use a5), after Phase 5, > vadd.vv (use const_int 0). Since 
> "a5" is used in "vsetvl" instructions and
> after the correct "vsetvl" instructions are inserted, each RVV instruction 
> doesn't need AVL operand "a5" anymore. Then,
> we remove this operand dependency helps for the following scheduling PASS.
>
> Phase 6 is propagate_avl do the following 2 things:
> 1. Local && Global user vsetvl instructions optimization.
>E.g.
>   vsetvli a2, a2, e8, mf8   ==> Change it into vsetvli a2, a2, e32, 
> mf2
>   vsetvli zero,a2, e32, mf2  ==> eliminate
> 2. Optimize user vsetvl from "vsetvl a2,a2" into "vsetvl zero,a2" if "a2" is 
> not used by any instructions.
> Since from Phase 1 ~ Phase 4 which inserts "vsetvli" instructions base on LCM 
> which change the CFG, I re-new a new
> RTL_SSA framework (which is more expensive than just using DF) for Phase 6 
> and optmize user vsetvli base on the new RTL_SSA.
>
> There are 2 issues in Phase 5 && Phase 6:
> 1. local_eliminate_vsetvl_insn was introduced by @kito which can do better 
> local user vsetvl optimizations better than
>Phase 6 do, such approach doesn't need to re-new the RTL_SSA framework. So 
> the local user vsetvli instructions optimizaiton
>in Phase 6 is redundant and should be removed.
> 2. A bug discovered by my downstream auto-vectorization test-generator (I 
> can't put the test in this patch since we are missing autovec
>patterns for it so we can't use the upstream GCC directly reproduce such 
> issue but I will remember put it back after I support the
>necessary autovec patterns). Such bug is causing by using RTL_SSA re-new 
> framework. The issue description is this:
>
> Before Phase 6:
>...
>insn1: vsetlvi a3, 17 <== generated by SELECT_VL auto-vec pattern.
>slli a4,a3,3
>...
>insn2: vsetvli zero, a3, ...
>load (use const_int 0, before Phase 5, it's using a3, but the use of "a3" 
> is removed in Phase 5)
>...
>
> In Phase 6, we iterate to insn2, then get the def of "a3" which is the insn1.
> insn2 is the vsetvli instruction inserted in Phase 4 which is not included in 
> the RLT_SSA framework
> even though we renew it (I didn't take a look at it and I don't think we need 
> to now).
> Base on this situation, the def_info of insn2 has the information 
> "set->single_nondebug_insn_use ()"
> which return true. Obviously, this information is not correct, since insn1 
> has aleast 2 uses:
> 1). slli a4,a3,3 2).insn2: vsetvli zero, a3, ... Then, the test generated by 
> my downstream test-generator
> execution test failed.
>
> Conclusion of RTL_SSA framework:
> Before this patch, we initialize RTL_SSA 2 times. One is at the beginning of 
> the VSETVL PASS which is absolutely correct, the other
> is re-new after Phase 4 (LCM) has incorrect information that causes bugs.
>
> Besides, we don't like to initialize RTL_SSA second time it seems to be a 
> waste since we just need to do a little optimization.
>
> Base on all circumstances I described above, I rework and reorganize Phase 5 
> && Phase 6 as follows:
> 1. Phase 5 is called ssa_post_optimization which is doing the optimization 
> base on the RTL_SSA information (The RTL_SSA is initialized
>at the beginning of the VSETVL PASS, no need to re-new it again). This 
> phase includes 3 optimizaitons:
>1). local_eliminate_vsetvl_insn we already have (no change).
>2). global_eliminate_vsetvl_insn ---> new optimizaiton splitted from 
> orignal Phase 6 but with more powerful and reliable implementation.
>   E.g.
>   void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) {
> size_t avl;
> if (m > 100)
>   avl = __riscv_vsetvl_e16mf4(vl << 4);
> else
>   avl = __riscv_vsetvl_e32mf2(vl >> 8);
> for (size_t i = 0; i < m; i++) {
>   vint8mf8_t v0 = __riscv_vle8_v_i8mf8(base + i, avl);
>   v0 = __riscv_vadd_vv_i8mf8 (v0, v0, avl);
>   __riscv_vse8_v_i8mf8(out + i, v0, avl);
> }
>   }
>
>   This example failed to global user vsetvl optimize before this patch:
>   f:
>   li  a5,100
>   bleua3,a5,.L2
>   sllia2,a2,4
>   vsetvli a4,a2,e16,mf4,ta,mu
>   .L3:
>   li  a5,0
>   vsetvli zero,a4,e8,mf8,ta,ma
>   .L5:
>   add a6,a0,a5
>   add a2,a1,a5
>   vle8.v  v1,0(a6)
>   addia5,a5,1
> 

Re: Re: [PATCH V2] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS

2023-06-09 Thread juzhe.zh...@rivai.ai
This patch removes the second-time initialization of RTL_SSA, which is the approach
we both hate.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-06-09 18:45
To: juzhe.zhong
CC: gcc-patches; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; pan2.li
Subject: Re: [PATCH V2] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS
Thankful you send this before weekend, I could run the fuzzy testing
during this weekend :P
 
On Fri, Jun 9, 2023 at 6:41 PM  wrote:
>
> From: Juzhe-Zhong 
>
> This patch is to rework Phase 5 && Phase 6 of VSETVL PASS since Phase 5 && 
> Phase 6
> are quite messy and cause some bugs discovered by my downstream 
> auto-vectorization
> test-generator.
>
> Before this patch.
>
> Phase 5 is cleanup_insns is the function remove AVL operand dependency from 
> each RVV instruction.
> E.g. vadd.vv (use a5), after Phase 5, > vadd.vv (use const_int 0). Since 
> "a5" is used in "vsetvl" instructions and
> after the correct "vsetvl" instructions are inserted, each RVV instruction 
> doesn't need AVL operand "a5" anymore. Then,
> we remove this operand dependency helps for the following scheduling PASS.
>
> Phase 6 is propagate_avl do the following 2 things:
> 1. Local && Global user vsetvl instructions optimization.
>E.g.
>   vsetvli a2, a2, e8, mf8   ==> Change it into vsetvli a2, a2, e32, 
> mf2
>   vsetvli zero,a2, e32, mf2  ==> eliminate
> 2. Optimize user vsetvl from "vsetvl a2,a2" into "vsetvl zero,a2" if "a2" is 
> not used by any instructions.
> Since from Phase 1 ~ Phase 4 which inserts "vsetvli" instructions base on LCM 
> which change the CFG, I re-new a new
> RTL_SSA framework (which is more expensive than just using DF) for Phase 6 
> and optmize user vsetvli base on the new RTL_SSA.
>
> There are 2 issues in Phase 5 && Phase 6:
> 1. local_eliminate_vsetvl_insn was introduced by @kito which can do better 
> local user vsetvl optimizations better than
>Phase 6 do, such approach doesn't need to re-new the RTL_SSA framework. So 
> the local user vsetvli instructions optimizaiton
>in Phase 6 is redundant and should be removed.
> 2. A bug discovered by my downstream auto-vectorization test-generator (I 
> can't put the test in this patch since we are missing autovec
>patterns for it so we can't use the upstream GCC directly reproduce such 
> issue but I will remember put it back after I support the
>necessary autovec patterns). Such bug is causing by using RTL_SSA re-new 
> framework. The issue description is this:
>
> Before Phase 6:
>...
>insn1: vsetlvi a3, 17 <== generated by SELECT_VL auto-vec pattern.
>slli a4,a3,3
>...
>insn2: vsetvli zero, a3, ...
>load (use const_int 0, before Phase 5, it's using a3, but the use of "a3" 
> is removed in Phase 5)
>...
>
> In Phase 6, we iterate to insn2, then get the def of "a3" which is the insn1.
> insn2 is the vsetvli instruction inserted in Phase 4 which is not included in 
> the RLT_SSA framework
> even though we renew it (I didn't take a look at it and I don't think we need 
> to now).
> Base on this situation, the def_info of insn2 has the information 
> "set->single_nondebug_insn_use ()"
> which return true. Obviously, this information is not correct, since insn1 
> has aleast 2 uses:
> 1). slli a4,a3,3 2).insn2: vsetvli zero, a3, ... Then, the test generated by 
> my downstream test-generator
> execution test failed.
>
> Conclusion of RTL_SSA framework:
> Before this patch, we initialize RTL_SSA 2 times. One is at the beginning of 
> the VSETVL PASS which is absolutely correct, the other
> is re-new after Phase 4 (LCM) has incorrect information that causes bugs.
>
> Besides, we don't like to initialize RTL_SSA second time it seems to be a 
> waste since we just need to do a little optimization.
>
> Base on all circumstances I described above, I rework and reorganize Phase 5 
> && Phase 6 as follows:
> 1. Phase 5 is called ssa_post_optimization which is doing the optimization 
> base on the RTL_SSA information (The RTL_SSA is initialized
>at the beginning of the VSETVL PASS, no need to re-new it again). This 
> phase includes 3 optimizaitons:
>1). local_eliminate_vsetvl_insn we already have (no change).
>2). global_eliminate_vsetvl_insn ---> new optimizaiton splitted from 
> orignal Phase 6 but with more powerful and reliable implementation.
>   E.g.
>   void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) {
> size_t avl;
> if (m > 100)
>   avl = __riscv_vsetvl_e16mf4(vl << 4);
> else
>   avl = __riscv_vsetvl_e32mf2(vl >> 8);
> for (size_t i = 0; i < m; i++) {
>   vint8mf8_t v0 = __riscv_vle8_v_i8mf8(base + i, avl);
>   v0 = __riscv_vadd_vv_i8mf8 (v0, v0, avl);
>   __riscv_vse8_v_i8mf8(out + i, v0, avl);
> }
>   }
>
>   This example failed to global user vsetvl optimize before this patch:
>   f:
>   li  a5,100
>   bleu   

Re: [PATCH] Make sure SCALAR_INT_MODE_P before invoke try_const_anchors

2023-06-09 Thread Richard Biener via Gcc-patches
On Fri, 9 Jun 2023, Jiufu Guo wrote:

> 
> Hi,
> 
> Richard Biener  writes:
> 
> > On Fri, 9 Jun 2023, Richard Sandiford wrote:
> >
> >> guojiufu  writes:
> >> > Hi,
> >> >
> >> > On 2023-06-09 16:00, Richard Biener wrote:
> >> >> On Fri, 9 Jun 2023, Jiufu Guo wrote:
> >> >> 
> >> >>> Hi,
> >> >>> 
> >> >>> As checking the code, there is a "gcc_assert (SCALAR_INT_MODE_P 
> >> >>> (mode))"
> >> >>> in "try_const_anchors".
> >> >>> This assert seems correct because the function try_const_anchors cares
> >> >>> about integer values currently, and modes other than SCALAR_INT_MODE_P
> >> >>> are not needed to support.
> >> >>> 
> >> >>> This patch makes sure SCALAR_INT_MODE_P when calling 
> >> >>> try_const_anchors.
> >> >>> 
> >> >>> This patch is raised when drafting below one.
> >> >>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html.
> >> >>> With that patch, "{[%1:DI]=0;} stack_tie" with BLKmode runs into
> >> >>> try_const_anchors, and hits the assert/ice.
> >> >>> 
> >> >>> Boostrap and regtest pass on ppc64{,le} and x86_64.
> >> >>> Is this ok for trunk?
> >> >> 
> >> >> Iff the correct fix at all (how can a CONST_INT have BLKmode?) then
> >> >> I suggest to instead fix try_const_anchors to change
> >> >> 
> >> >>   /* CONST_INT is used for CC modes, but we should leave those alone.  
> >> >> */
> >> >>   if (GET_MODE_CLASS (mode) == MODE_CC)
> >> >> return NULL_RTX;
> >> >> 
> >> >>   gcc_assert (SCALAR_INT_MODE_P (mode));
> >> >> 
> >> >> to
> >> >> 
> >> >>   /* CONST_INT is used for CC modes, leave any non-scalar-int mode 
> >> >> alone.  */
> >> >>   if (!SCALAR_INT_MODE_P (mode))
> >> >> return NULL_RTX;
> >> >> 
> >> >
> >> > This is also able to fix this issue.  there is a "Punt on CC modes" 
> >> > patch
> >> > to return NULL_RTX in try_const_anchors.
> >> >
> >> >> but as said I wonder how we arrive at a BLKmode CONST_INT and whether
> >> >> we should have fended this off earlier.  Can you share more complete
> >> >> RTL of that stack_tie?
> >> >
> >> >
> >> > (insn 15 14 16 3 (parallel [
> >> >  (set (mem/c:BLK (reg/f:DI 1 1) [1  A8])
> >> >  (const_int 0 [0]))
> >> >  ]) "/home/guojiufu/temp/gdb.c":13:3 922 {stack_tie}
> >> >   (nil))
> >> >
> >> > It is "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])".
> >> 
> >> I'm not convinced this is correct RTL.  (unspec:BLK [(const_int 0)] ...)
> >> would be though.  It's arguably more accurate too, since the effect
> >> on the stack locations is unspecified rather than predictable.
> >
> > powerpc seems to be the only port with a stack_tie that's not
> > using an UNSPEC RHS.
> In rs6000.md, it is
> 
> ; This is to explain that changes to the stack pointer should
> ; not be moved over loads from or stores to stack memory.
> (define_insn "stack_tie"
>   [(match_parallel 0 "tie_operand"
>  [(set (mem:BLK (reg 1)) (const_int 0))])]
>   ""
>   ""
>   [(set_attr "length" "0")])
> 
> This would be just an placeholder insn, and acts as the comments.
> UNSPEC_ would works like other targets.  While, I'm wondering
> the concerns on "set (mem:BLK (reg 1)) (const_int 0)".
> MODEs between SET_DEST and SET_SRC?

I don't think the issue is the mode; the issue is that
the pattern as-is says some memory is zeroed while that's not
actually true (not specifying a size means we can't really do
anything with this MEM, but still).  Using an UNSPEC avoids
implying anything about the stored value.

Of course I think a MEM SET_DEST without a specified size is bogus
as well, but there's larger precedent for this...

Richard.

> Thanks for comments!
> 
> BR,
> Jeff (Jiufu Guo)
> >
> >> Thanks,
> >> Richard
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [PATCH V6] VECT: Add SELECT_VL support

2023-06-09 Thread Richard Biener via Gcc-patches
On Fri, 9 Jun 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> Co-authored-by: Richard Sandiford
> Co-authored-by: Richard Biener 
> 
> This patch address comments from Richard && Richi and rebase to trunk.
> 
> This patch is adding SELECT_VL middle-end support
> allow target have target dependent optimization in case of
> length calculation.
> 
> This patch is inspired by RVV ISA and LLVM:
> https://reviews.llvm.org/D99750
> 
> The SELECT_VL is same behavior as LLVM "get_vector_length" with
> these following properties:
> 
> 1. Only apply on single-rgroup.
> 2. non SLP.
> 3. adjust loop control IV.
> 4. adjust data reference IV.
> 5. allow non-vf elements processing in non-final iteration
> 
> Code:
># void vvaddint32(size_t n, const int*x, const int*y, int*z)
> # { for (size_t i=0; i 
> Take RVV codegen for example:
> 
> Before this patch:
> vvaddint32:
> ble a0,zero,.L6
> csrra4,vlenb
> srlia6,a4,2
> .L4:
> mv  a5,a0
> bleua0,a6,.L3
> mv  a5,a6
> .L3:
> vsetvli zero,a5,e32,m1,ta,ma
> vle32.v v2,0(a1)
> vle32.v v1,0(a2)
> vsetvli a7,zero,e32,m1,ta,ma
> sub a0,a0,a5
> vadd.vv v1,v1,v2
> vsetvli zero,a5,e32,m1,ta,ma
> vse32.v v1,0(a3)
> add a2,a2,a4
> add a3,a3,a4
> add a1,a1,a4
> bne a0,zero,.L4
> .L6:
> ret
> 
> After this patch:
> 
> vvaddint32:
> vsetvli t0, a0, e32, ta, ma  # Set vector length based on 32-bit vectors
> vle32.v v0, (a1) # Get first vector
>   sub a0, a0, t0 # Decrement number done
>   slli t0, t0, 2 # Multiply number done by 4 bytes
>   add a1, a1, t0 # Bump pointer
> vle32.v v1, (a2) # Get second vector
>   add a2, a2, t0 # Bump pointer
> vadd.vv v2, v0, v1   # Sum vectors
> vse32.v v2, (a3) # Store result
>   add a3, a3, t0 # Bump pointer
>   bnez a0, vvaddint32# Loop back
>   ret# Finished

OK.

Thanks,
Richard.

> gcc/ChangeLog:
> 
> * doc/md.texi: Add SELECT_VL support.
> * internal-fn.def (SELECT_VL): Ditto.
> * optabs.def (OPTAB_D): Ditto.
> * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Ditto.
> * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Ditto.
> * tree-vect-stmts.cc (get_select_vl_data_ref_ptr): Ditto.
> (vectorizable_store): Ditto.
> (vectorizable_load): Ditto.
> * tree-vectorizer.h (LOOP_VINFO_USING_SELECT_VL_P): Ditto.
> 
> ---
>  gcc/doc/md.texi | 22 
>  gcc/internal-fn.def |  1 +
>  gcc/optabs.def  |  1 +
>  gcc/tree-vect-loop-manip.cc | 32 -
>  gcc/tree-vect-loop.cc   | 72 +
>  gcc/tree-vect-stmts.cc  | 69 +++
>  gcc/tree-vectorizer.h   |  6 
>  7 files changed, 187 insertions(+), 16 deletions(-)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 6a435eb4461..95f7fe1f802 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4974,6 +4974,28 @@ for (i = 1; i < operand3; i++)
>operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
>  @end smallexample
>  
> +@cindex @code{select_vl@var{m}} instruction pattern
> +@item @code{select_vl@var{m}}
> +Set operand 0 to the number of scalar iterations that should be handled
> +by one iteration of a vector loop.  Operand 1 is the total number of
> +scalar iterations that the loop needs to process and operand 2 is a
> +maximum bound on the result (also known as the maximum ``vectorization
> +factor'').
> +
> +The maximum value of operand 0 is given by:
> +@smallexample
> +operand0 = MIN (operand1, operand2)
> +@end smallexample
> +However, targets might choose a lower value than this, based on
> +target-specific criteria.  Each iteration of the vector loop might
> +therefore process a different number of scalar iterations, which in turn
> +means that induction variables will have a variable step.  Because of
> +this, it is generally not useful to define this instruction if it will
> +always calculate the maximum value.
> +
> +This optab is only useful on targets that implement @samp{len_load_@var{m}}
> +and/or @samp{len_store_@var{m}}.
> +
>  @cindex @code{check_raw_ptrs@var{m}} instruction pattern
>  @item @samp{check_raw_ptrs@var{m}}
>  Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 3ac9d82aace..5d638de6d06 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -177,6 +177,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_SET, 0, vec_set, vec_set)
>  DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, len_store, len_store)
>  
>  DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
> +DEF_INTERNAL_OPTAB_FN (SELECT_VL, ECF_C

Re: Re: [PATCH V6] VECT: Add SELECT_VL support

2023-06-09 Thread juzhe.zh...@rivai.ai
Thanks, Richi.

Should I wait for Richard's ACK again?
Since the last email on this patch, he just asked me to adjust a comment, with no
code changes.
I am not sure whether he is OK with it.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-09 19:02
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH V6] VECT: Add SELECT_VL support
On Fri, 9 Jun 2023, juzhe.zh...@rivai.ai wrote:
 
> From: Ju-Zhe Zhong 
> 
> Co-authored-by: Richard Sandiford
> Co-authored-by: Richard Biener 
> 
> This patch address comments from Richard && Richi and rebase to trunk.
> 
> This patch is adding SELECT_VL middle-end support
> allow target have target dependent optimization in case of
> length calculation.
> 
> This patch is inspired by RVV ISA and LLVM:
> https://reviews.llvm.org/D99750
> 
> The SELECT_VL is same behavior as LLVM "get_vector_length" with
> these following properties:
> 
> 1. Only apply on single-rgroup.
> 2. non SLP.
> 3. adjust loop control IV.
> 4. adjust data reference IV.
> 5. allow non-vf elements processing in non-final iteration
> 
> Code:
># void vvaddint32(size_t n, const int*x, const int*y, int*z)
> # { for (size_t i=0; i 
> Take RVV codegen for example:
> 
> Before this patch:
> vvaddint32:
> ble a0,zero,.L6
> csrra4,vlenb
> srlia6,a4,2
> .L4:
> mv  a5,a0
> bleua0,a6,.L3
> mv  a5,a6
> .L3:
> vsetvli zero,a5,e32,m1,ta,ma
> vle32.v v2,0(a1)
> vle32.v v1,0(a2)
> vsetvli a7,zero,e32,m1,ta,ma
> sub a0,a0,a5
> vadd.vv v1,v1,v2
> vsetvli zero,a5,e32,m1,ta,ma
> vse32.v v1,0(a3)
> add a2,a2,a4
> add a3,a3,a4
> add a1,a1,a4
> bne a0,zero,.L4
> .L6:
> ret
> 
> After this patch:
> 
> vvaddint32:
> vsetvli t0, a0, e32, ta, ma  # Set vector length based on 32-bit vectors
> vle32.v v0, (a1) # Get first vector
>   sub a0, a0, t0 # Decrement number done
>   slli t0, t0, 2 # Multiply number done by 4 bytes
>   add a1, a1, t0 # Bump pointer
> vle32.v v1, (a2) # Get second vector
>   add a2, a2, t0 # Bump pointer
> vadd.vv v2, v0, v1   # Sum vectors
> vse32.v v2, (a3) # Store result
>   add a3, a3, t0 # Bump pointer
>   bnez a0, vvaddint32# Loop back
>   ret# Finished
 
OK.
 
Thanks,
Richard.
 
> gcc/ChangeLog:
> 
> * doc/md.texi: Add SELECT_VL support.
> * internal-fn.def (SELECT_VL): Ditto.
> * optabs.def (OPTAB_D): Ditto.
> * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Ditto.
> * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Ditto.
> * tree-vect-stmts.cc (get_select_vl_data_ref_ptr): Ditto.
> (vectorizable_store): Ditto.
> (vectorizable_load): Ditto.
> * tree-vectorizer.h (LOOP_VINFO_USING_SELECT_VL_P): Ditto.
> 
> ---
>  gcc/doc/md.texi | 22 
>  gcc/internal-fn.def |  1 +
>  gcc/optabs.def  |  1 +
>  gcc/tree-vect-loop-manip.cc | 32 -
>  gcc/tree-vect-loop.cc   | 72 +
>  gcc/tree-vect-stmts.cc  | 69 +++
>  gcc/tree-vectorizer.h   |  6 
>  7 files changed, 187 insertions(+), 16 deletions(-)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 6a435eb4461..95f7fe1f802 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4974,6 +4974,28 @@ for (i = 1; i < operand3; i++)
>operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
>  @end smallexample
>  
> +@cindex @code{select_vl@var{m}} instruction pattern
> +@item @code{select_vl@var{m}}
> +Set operand 0 to the number of scalar iterations that should be handled
> +by one iteration of a vector loop.  Operand 1 is the total number of
> +scalar iterations that the loop needs to process and operand 2 is a
> +maximum bound on the result (also known as the maximum ``vectorization
> +factor'').
> +
> +The maximum value of operand 0 is given by:
> +@smallexample
> +operand0 = MIN (operand1, operand2)
> +@end smallexample
> +However, targets might choose a lower value than this, based on
> +target-specific criteria.  Each iteration of the vector loop might
> +therefore process a different number of scalar iterations, which in turn
> +means that induction variables will have a variable step.  Because of
> +this, it is generally not useful to define this instruction if it will
> +always calculate the maximum value.
> +
> +This optab is only useful on targets that implement @samp{len_load_@var{m}}
> +and/or @samp{len_store_@var{m}}.
> +
>  @cindex @code{check_raw_ptrs@var{m}} instruction pattern
>  @item @samp{check_raw_ptrs@var{m}}
>  Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.de

Re: [PATCH] Prevent TYPE_PRECISION on VECTOR_TYPEs

2023-06-09 Thread Richard Biener via Gcc-patches
On Fri, 9 Jun 2023, Richard Biener wrote:

> The following makes sure that using TYPE_PRECISION on VECTOR_TYPE
> ICEs when tree checking is enabled.  This should avoid wrong-code
> in cases like PR110182 and instead ICE.
> 
> Bootstrap and regtest pending on x86_64-unknown-linux-gnu, I guess
> there will be some fallout of such change ...

The following is what I need to get it to bootstrap on
x86_64-unknown-linux-gnu (with all languages enabled).

I think some cases warrant a TYPE_PRECISION_RAW, but most
are fixes of existing errors.  For some cases I didn't dig
deep enough to tell whether the code also needs to compare TYPE_VECTOR_SUBPARTS.

The testsuite is running and shows more issues ...

I put this on hold for the moment but hope to get back to it at
some point.  I'll followup with the testresults though.

Richard.


diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 9c8eed5442a..34566a342bd 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -1338,6 +1338,10 @@ shorten_binary_op (tree result_type, tree op0, tree op1, 
bool bitwise)
   int uns;
   tree type;
 
+  /* Do not shorten vector operations.  */
+  if (VECTOR_TYPE_P (result_type))
+return result_type;
+
   /* Cast OP0 and OP1 to RESULT_TYPE.  Doing so prevents
  excessive narrowing when we call get_narrower below.  For
  example, suppose that OP0 is of unsigned int extended
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 3f3c6685bb3..a8c033ba008 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -12574,10 +12574,10 @@ fold_binary_loc (location_t loc, enum tree_code code, 
tree type,
tree targ1 = strip_float_extensions (arg1);
tree newtype = TREE_TYPE (targ0);
 
-   if (TYPE_PRECISION (TREE_TYPE (targ1)) > TYPE_PRECISION (newtype))
+   if (element_precision (TREE_TYPE (targ1)) > element_precision (newtype))
  newtype = TREE_TYPE (targ1);
 
-   if (TYPE_PRECISION (newtype) < TYPE_PRECISION (TREE_TYPE (arg0)))
+   if (element_precision (newtype) < element_precision (TREE_TYPE (arg0)))
  return fold_build2_loc (loc, code, type,
  fold_convert_loc (loc, newtype, targ0),
  fold_convert_loc (loc, newtype, targ1));
@@ -14540,7 +14540,8 @@ tree_expr_maybe_real_minus_zero_p (const_tree x)
 static bool
 tree_simple_nonnegative_warnv_p (enum tree_code code, tree type)
 {
-  if ((TYPE_PRECISION (type) != 1 || TYPE_UNSIGNED (type))
+  if (!VECTOR_TYPE_P (type)
+  && (TYPE_PRECISION (type) != 1 || TYPE_UNSIGNED (type))
   && truth_value_p (code))
 /* Truth values evaluate to 0 or 1, which is nonnegative unless we
have a signed:1 type (where the value is -1 and 0).  */
diff --git a/gcc/tree-ssa-scopedtables.cc b/gcc/tree-ssa-scopedtables.cc
index 528ddf2a2ab..e698ef97343 100644
--- a/gcc/tree-ssa-scopedtables.cc
+++ b/gcc/tree-ssa-scopedtables.cc
@@ -574,7 +574,7 @@ hashable_expr_equal_p (const struct hashable_expr *expr0,
   && (TREE_CODE (type0) == ERROR_MARK
  || TREE_CODE (type1) == ERROR_MARK
  || TYPE_UNSIGNED (type0) != TYPE_UNSIGNED (type1)
- || TYPE_PRECISION (type0) != TYPE_PRECISION (type1)
+ || element_precision (type0) != element_precision (type1)
  || TYPE_MODE (type0) != TYPE_MODE (type1)))
 return false;
 
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 8e144bc090e..4b43e209c6e 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -13423,7 +13423,10 @@ verify_type_variant (const_tree t, tree tv)
}
   verify_variant_match (TYPE_NEEDS_CONSTRUCTING);
 }
-  verify_variant_match (TYPE_PRECISION);
+  /* ???  Need a TYPE_PRECISION_RAW here?  TYPE_VECTOR_SUBPARTS
+ is a poly-int.  */
+  if (!VECTOR_TYPE_P (t))
+verify_variant_match (TYPE_PRECISION);
   if (RECORD_OR_UNION_TYPE_P (t))
 verify_variant_match (TYPE_TRANSPARENT_AGGR);
   else if (TREE_CODE (t) == ARRAY_TYPE)
@@ -13701,8 +13704,12 @@ gimple_canonical_types_compatible_p (const_tree t1, 
const_tree t2,
   || TREE_CODE (t1) == OFFSET_TYPE
   || POINTER_TYPE_P (t1))
 {
-  /* Can't be the same type if they have different recision.  */
-  if (TYPE_PRECISION (t1) != TYPE_PRECISION (t2))
+  /* Can't be the same type if they have different precision.  */
+  /* ??? TYPE_PRECISION_RAW for speed.  */
+  if ((VECTOR_TYPE_P (t1)
+  && maybe_ne (TYPE_VECTOR_SUBPARTS (t1), TYPE_VECTOR_SUBPARTS (t2)))
+ || (!VECTOR_TYPE_P (t1)
+ && TYPE_PRECISION (t1) != TYPE_PRECISION (t2)))
return false;
 
   /* In some cases the signed and unsigned types are required to be


Re: Re: [PATCH V6] VECT: Add SELECT_VL support

2023-06-09 Thread juzhe.zh...@rivai.ai
Thanks a lot Richi.

Even though last time Richard told me there was no need to wait for a 2nd ACK,
I still want to wait for Richard's final approval, since I am not sure whether this
patch is OK with him.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-09 19:02
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH V6] VECT: Add SELECT_VL support
On Fri, 9 Jun 2023, juzhe.zh...@rivai.ai wrote:
 
> From: Ju-Zhe Zhong 
> 
> Co-authored-by: Richard Sandiford
> Co-authored-by: Richard Biener 
> 
> This patch address comments from Richard && Richi and rebase to trunk.
> 
> This patch is adding SELECT_VL middle-end support
> allow target have target dependent optimization in case of
> length calculation.
> 
> This patch is inspired by RVV ISA and LLVM:
> https://reviews.llvm.org/D99750
> 
> The SELECT_VL is same behavior as LLVM "get_vector_length" with
> these following properties:
> 
> 1. Only apply on single-rgroup.
> 2. non SLP.
> 3. adjust loop control IV.
> 4. adjust data reference IV.
> 5. allow non-vf elements processing in non-final iteration
> 
> Code:
># void vvaddint32(size_t n, const int*x, const int*y, int*z)
> # { for (size_t i=0; i 
> Take RVV codegen for example:
> 
> Before this patch:
> vvaddint32:
> ble a0,zero,.L6
> csrra4,vlenb
> srlia6,a4,2
> .L4:
> mv  a5,a0
> bleua0,a6,.L3
> mv  a5,a6
> .L3:
> vsetvli zero,a5,e32,m1,ta,ma
> vle32.v v2,0(a1)
> vle32.v v1,0(a2)
> vsetvli a7,zero,e32,m1,ta,ma
> sub a0,a0,a5
> vadd.vv v1,v1,v2
> vsetvli zero,a5,e32,m1,ta,ma
> vse32.v v1,0(a3)
> add a2,a2,a4
> add a3,a3,a4
> add a1,a1,a4
> bne a0,zero,.L4
> .L6:
> ret
> 
> After this patch:
> 
> vvaddint32:
> vsetvli t0, a0, e32, ta, ma  # Set vector length based on 32-bit vectors
> vle32.v v0, (a1) # Get first vector
>   sub a0, a0, t0 # Decrement number done
>   slli t0, t0, 2 # Multiply number done by 4 bytes
>   add a1, a1, t0 # Bump pointer
> vle32.v v1, (a2) # Get second vector
>   add a2, a2, t0 # Bump pointer
> vadd.vv v2, v0, v1   # Sum vectors
> vse32.v v2, (a3) # Store result
>   add a3, a3, t0 # Bump pointer
>   bnez a0, vvaddint32# Loop back
>   ret# Finished
 
OK.
 
Thanks,
Richard.
 
> gcc/ChangeLog:
> 
> * doc/md.texi: Add SELECT_VL support.
> * internal-fn.def (SELECT_VL): Ditto.
> * optabs.def (OPTAB_D): Ditto.
> * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Ditto.
> * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Ditto.
> * tree-vect-stmts.cc (get_select_vl_data_ref_ptr): Ditto.
> (vectorizable_store): Ditto.
> (vectorizable_load): Ditto.
> * tree-vectorizer.h (LOOP_VINFO_USING_SELECT_VL_P): Ditto.
> 
> ---
>  gcc/doc/md.texi | 22 
>  gcc/internal-fn.def |  1 +
>  gcc/optabs.def  |  1 +
>  gcc/tree-vect-loop-manip.cc | 32 -
>  gcc/tree-vect-loop.cc   | 72 +
>  gcc/tree-vect-stmts.cc  | 69 +++
>  gcc/tree-vectorizer.h   |  6 
>  7 files changed, 187 insertions(+), 16 deletions(-)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 6a435eb4461..95f7fe1f802 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4974,6 +4974,28 @@ for (i = 1; i < operand3; i++)
>operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
>  @end smallexample
>  
> +@cindex @code{select_vl@var{m}} instruction pattern
> +@item @code{select_vl@var{m}}
> +Set operand 0 to the number of scalar iterations that should be handled
> +by one iteration of a vector loop.  Operand 1 is the total number of
> +scalar iterations that the loop needs to process and operand 2 is a
> +maximum bound on the result (also known as the maximum ``vectorization
> +factor'').
> +
> +The maximum value of operand 0 is given by:
> +@smallexample
> +operand0 = MIN (operand1, operand2)
> +@end smallexample
> +However, targets might choose a lower value than this, based on
> +target-specific criteria.  Each iteration of the vector loop might
> +therefore process a different number of scalar iterations, which in turn
> +means that induction variables will have a variable step.  Because of
> +this, it is generally not useful to define this instruction if it will
> +always calculate the maximum value.
> +
> +This optab is only useful on targets that implement @samp{len_load_@var{m}}
> +and/or @samp{len_store_@var{m}}.
> +
>  @cindex @code{check_raw_ptrs@var{m}} instruction pattern
>  @item @samp{check_raw_ptrs@var{m}}
>  Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
> diff --git a/gcc/internal-fn.def b/gcc/in

Re: [PATCH] Add COMPLEX_VECTOR_INT modes

2023-06-09 Thread Richard Biener via Gcc-patches
On Fri, Jun 9, 2023 at 11:45 AM Andrew Stubbs  wrote:
>
> On 09/06/2023 10:02, Richard Sandiford wrote:
> > Andrew Stubbs  writes:
> >> On 07/06/2023 20:42, Richard Sandiford wrote:
> >>> I don't know if this helps (probably not), but we have a similar
> >>> situation on AArch64: a 64-bit mode like V8QI can be doubled to a
> >>> 128-bit vector or to a pair of 64-bit vectors.  We used V16QI for
> >>> the former and "V2x8QI" for the latter.  V2x8QI is forced to come
> >>> after V16QI in the mode list, and so it is only ever used through
> >>> explicit choice.  But both modes are functionally vectors of 16 QIs.
> >>
> >> OK, that's interesting, but how do you map "complex int" vectors to that
> >> mode? I tried to figure it out, but there's no DIVMOD support so I
> >> couldn't just do a straight comparison.
> >
> > Yeah, we don't do that currently.  Instead we make TARGET_ARRAY_MODE
> > return V2x8QI for an array of 2 V8QIs (which is OK, since V2x8QI has
> > 64-bit rather than 128-bit alignment).  So we should use it for a
> > complex-y type like:
> >
> >struct { res_type res[2]; };
> >
> > In principle we should be able to do the same for:
> >
> >struct { res_type a, b; };
> >
> > but that isn't supported yet.  I think it would need a new target hook
> > along the lines of TARGET_ARRAY_MODE, but for structs rather than arrays.

And the same should work for complex types, no?  In fact we could document
that TARGET_ARRAY_MODE also is used for _Complex?  Note the hook
is used for type layout and thus innocent array types (in aggregates) can end up
with a vector mode now.  Hopefully that's without bad effects (on the ABI).

That said, the hook _could_ be used just for divmod expansion without
actually creating a complex (or array) type of vectors.

> > The advantage of this from AArch64's PoV is that it extends to 3x and 4x
> > tuples as well, whereas complex is obviously for pairs only.
> >
> > I don't know if it would be acceptable to use that kind of struct wrapper
> > for the divmod code though (for the vector case only).
>
> Looking again, I don't think this will help because GCN does not have an
> instruction that loads vectors that are back-to-back, hence there's
> little benefit in adding the tuple mode.
>
> However, GCN does have instructions that effectively load 2, 3, or 4
> vectors that are *interleaved*, which would be the likely case for
> complex numbers (or pixel colour data!)

that's load_lanes and I think not related here but it probably also
needs the xN modes.

> I need to figure out how to move forward with this patch, please; if the
> new complex modes are not acceptable then I think I need to reimplement
> DIVMOD (maybe the scalars can remain as-is), but it's not clear to me
> what that would look like.
>
> Andrew


Re: [PATCH] Prevent TYPE_PRECISION on VECTOR_TYPEs

2023-06-09 Thread Richard Biener via Gcc-patches
On Fri, 9 Jun 2023, Richard Biener wrote:

> On Fri, 9 Jun 2023, Richard Biener wrote:
> 
> > The following makes sure that using TYPE_PRECISION on VECTOR_TYPE
> > ICEs when tree checking is enabled.  This should avoid wrong-code
> > in cases like PR110182 and instead ICE.
> > 
> > Bootstrap and regtest pending on x86_64-unknown-linux-gnu, I guess
> > there will be some fallout of such change ...
> 
> The following is what I need to get it to boostrap on 
> x86_64-unknown-linux-gnu (with all languages enabled).
> 
> I think some cases warrant a TYPE_PRECISION_RAW but most
> are fixing existing errors.  For some cases I didn't dig
> deep enough if the code also needs to compare TYPE_VECTOR_SUBPARTS.
> 
> The testsuite is running and shows more issues ...
> 
> I put this on hold for the moment but hope to get back to it at
> some point.  I'll followup with the testresults though.

Attached - it's not too much it seems, but things repeat of course.

Richard.

testresults.xz
Description: application/xz


Re: [PATCH] testsuite: fix the condition bug in tsvc s176

2023-06-09 Thread Richard Biener via Gcc-patches
On Fri, Jun 9, 2023 at 11:58 AM Lehua Ding  wrote:
>
> > It's odd that the checksum doesn't depend on the number of iterations done 
> > ...
>
> This is because the difference between the calculated result (32063.902344) 
> and
> the expected result (32000.00) is small. The current check is that the 
> result
> is considered correct as long as the `value/expected` ratio is between 0.99f 
> and
> 1.01f.
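
A minimal sketch of the tolerance check described above (the function name is
hypothetical; the actual check lives in the tsvc.h test harness):

    // Accept a computed checksum when it is within 1% of the expected value.
    bool close_enough (float value, float expected)
    {
      float ratio = value / expected;
      return ratio > 0.99f && ratio < 1.01f;
    }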

Oh, I see ...

> I'm not sure if this check is enough, but I should also update the expected
> result to 32063.902344 (the same without vectorized).

OK.

> Best,
> Lehua
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/tsvc/tsvc.h:
> * gcc.dg/vect/tsvc/vect-tsvc-s176.c:
>
> ---
>  gcc/testsuite/gcc.dg/vect/tsvc/tsvc.h   | 2 +-
>  gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s176.c | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/tsvc.h 
> b/gcc/testsuite/gcc.dg/vect/tsvc/tsvc.h
> index cd39c041903d..d910c384fc83 100644
> --- a/gcc/testsuite/gcc.dg/vect/tsvc/tsvc.h
> +++ b/gcc/testsuite/gcc.dg/vect/tsvc/tsvc.h
> @@ -1164,7 +1164,7 @@ real_t get_expected_result(const char * name)
>  } else if (!strcmp(name, "s175")) {
> return 32009.023438f;
>  } else if (!strcmp(name, "s176")) {
> -   return 32000.f;
> +   return 32063.902344f;
>  } else if (!strcmp(name, "s211")) {
> return 63983.308594f;
>  } else if (!strcmp(name, "s212")) {
> diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s176.c 
> b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s176.c
> index 79faf7fdb9e4..365e5205982b 100644
> --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s176.c
> +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s176.c
> @@ -14,7 +14,7 @@ real_t s176(struct args_t * func_args)
>  initialise_arrays(__func__);
>
>  int m = LEN_1D/2;
> -for (int nl = 0; nl < 4*(iterations/LEN_1D); nl++) {
> +for (int nl = 0; nl < 4*(10*iterations/LEN_1D); nl++) {
>  for (int j = 0; j < (LEN_1D/2); j++) {
>  for (int i = 0; i < m; i++) {
>  a[i] += b[i+m-j-1] * c[j];
> @@ -39,4 +39,4 @@ int main (int argc, char **argv)
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } 
> } */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
> --
> 2.36.1
>


Re: [PATCH] Add COMPLEX_VECTOR_INT modes

2023-06-09 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Fri, Jun 9, 2023 at 11:45 AM Andrew Stubbs  wrote:
>>
>> On 09/06/2023 10:02, Richard Sandiford wrote:
>> > Andrew Stubbs  writes:
>> >> On 07/06/2023 20:42, Richard Sandiford wrote:
>> >>> I don't know if this helps (probably not), but we have a similar
>> >>> situation on AArch64: a 64-bit mode like V8QI can be doubled to a
>> >>> 128-bit vector or to a pair of 64-bit vectors.  We used V16QI for
>> >>> the former and "V2x8QI" for the latter.  V2x8QI is forced to come
>> >>> after V16QI in the mode list, and so it is only ever used through
>> >>> explicit choice.  But both modes are functionally vectors of 16 QIs.
>> >>
>> >> OK, that's interesting, but how do you map "complex int" vectors to that
>> >> mode? I tried to figure it out, but there's no DIVMOD support so I
>> >> couldn't just do a straight comparison.
>> >
>> > Yeah, we don't do that currently.  Instead we make TARGET_ARRAY_MODE
>> > return V2x8QI for an array of 2 V8QIs (which is OK, since V2x8QI has
>> > 64-bit rather than 128-bit alignment).  So we should use it for a
>> > complex-y type like:
>> >
>> >struct { res_type res[2]; };
>> >
>> > In principle we should be able to do the same for:
>> >
>> >struct { res_type a, b; };
>> >
>> > but that isn't supported yet.  I think it would need a new target hook
>> > along the lines of TARGET_ARRAY_MODE, but for structs rather than arrays.
>
> And the same should work for complex types, no?  In fact we could document
> that TARGET_ARRAY_MODE also is used for _Complex?  Note the hook
> is used for type layout and thus innocent array types (in aggregates) can end 
> up
> with a vector mode now.

Yeah, that was deliberate.  Given that we have modes for pairs of vectors,
it seemed better to use them even without an explicit opt-in.

> Hopefully that's without bad effects (on the ABI).

Well, I won't make any guarantees :)  But we did check, and it seemed
to be handled correctly.  Most of the AArch64 ABI code is agnostic to
aggregate modes.

> That said, the hook _could_ be used just for divmod expansion without
> actually creating a complex (or array) type of vectors.
>
>> > The advantage of this from AArch64's PoV is that it extends to 3x and 4x
>> > tuples as well, whereas complex is obviously for pairs only.
>> >
>> > I don't know if it would be acceptable to use that kind of struct wrapper
>> > for the divmod code though (for the vector case only).
>>
>> Looking again, I don't think this will help because GCN does not have an
>> instruction that loads vectors that are back-to-back, hence there's
>> little benefit in adding the tuple mode.
>>
>> However, GCN does have instructions that effectively load 2, 3, or 4
>> vectors that are *interleaved*, which would be the likely case for
>> complex numbers (or pixel colour data!)
>
> that's load_lanes and I think not related here but it probably also
> needs the xN modes.

Yeah, we need the modes for that too.

I don't think the modes imply that the registers can be loaded and stored
in-order using a single instruction.  That isn't possible for big-endian
AArch64, for example.  It also isn't possible for the equivalent SVE types.
But the modes are still useful in those cases, because of their use in
interleaved loads and stores (and for the ABI).

Thanks,
Richard



[committed] libstdc++: Optimize std::to_array for trivial types [PR110167]

2023-06-09 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux. Pushed to trunk.

This makes sense to backport after some soak time on trunk.

-- >8 --

As reported in PR libstdc++/110167, std::to_array compiles extremely
slowly for very large arrays. It needs to instantiate a very large
specialization of std::index_sequence and then create a very large
aggregate initializer from the pack expansion. For trivial types we can
simply default-initialize the std::array and then use memcpy to copy the
values. For non-trivial types we need to use the existing
implementation, despite the compilation cost.

As also noted in the PR, using a generic lambda instead of the
__to_array helper compiles faster since gcc-13. It also produces
slightly smaller code at -O1, due to additional inlining. The code at
-Os, -O2 and -O3 seems to be the same. This new implementation requires
__cpp_generic_lambdas >= 201707L (i.e. P0428R2) but that is supported
since Clang 10 and since Intel icc 2021.5.0 (and since GCC 10.1).
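
As a rough illustration, the generic-lambda technique mentioned above looks
something like the sketch below (a simplified, hypothetical version; the actual
libstdc++ code in the diff also handles the trivial-type memcpy path and the
rvalue overload):

    #include <array>
    #include <cstddef>
    #include <type_traits>
    #include <utility>

    // Build a std::array from a built-in array by expanding an index
    // sequence inside an immediately-invoked generic lambda (C++20).
    template<typename T, std::size_t N>
    constexpr std::array<std::remove_cv_t<T>, N>
    to_array_sketch(T (&a)[N])
    {
      return [&a]<std::size_t... I>(std::index_sequence<I...>) {
        return std::array<std::remove_cv_t<T>, N>{{ a[I]... }};
      }(std::make_index_sequence<N>{});
    }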

libstdc++-v3/ChangeLog:

PR libstdc++/110167
* include/std/array (to_array): Initialize arrays of trivial
types using memcpy. For non-trivial types, use lambda
expressions instead of a separate helper function.
(__to_array): Remove.
* testsuite/23_containers/array/creation/110167.cc: New test.
---
 libstdc++-v3/include/std/array| 53 +--
 .../23_containers/array/creation/110167.cc| 14 +
 2 files changed, 51 insertions(+), 16 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/23_containers/array/creation/110167.cc

diff --git a/libstdc++-v3/include/std/array b/libstdc++-v3/include/std/array
index 70280c1beeb..b791d86ddb2 100644
--- a/libstdc++-v3/include/std/array
+++ b/libstdc++-v3/include/std/array
@@ -414,19 +414,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return std::move(std::get<_Int>(__arr));
 }
 
-#if __cplusplus > 201703L
+#if __cplusplus >= 202002L && __cpp_generic_lambdas >= 201707L
 #define __cpp_lib_to_array 201907L
-
-  template
-constexpr array, sizeof...(_Idx)>
-__to_array(_Tp (&__a)[sizeof...(_Idx)], index_sequence<_Idx...>)
-{
-  if constexpr (_Move)
-   return {{std::move(__a[_Idx])...}};
-  else
-   return {{__a[_Idx]...}};
-}
-
   template
 [[nodiscard]]
 constexpr array, _Nm>
@@ -436,8 +425,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   static_assert(!is_array_v<_Tp>);
   static_assert(is_constructible_v<_Tp, _Tp&>);
   if constexpr (is_constructible_v<_Tp, _Tp&>)
-   return __to_array(__a, make_index_sequence<_Nm>{});
-  __builtin_unreachable(); // FIXME: see PR c++/91388
+   {
+ if constexpr (is_trivial_v<_Tp> && _Nm != 0)
+   {
+ array, _Nm> __arr;
+ if (!__is_constant_evaluated() && _Nm != 0)
+   __builtin_memcpy(__arr.data(), __a, sizeof(__a));
+ else
+   for (size_t __i = 0; __i < _Nm; ++__i)
+ __arr._M_elems[__i] = __a[__i];
+ return __arr;
+   }
+ else
+   return [&__a](index_sequence<_Idx...>) {
+ return array, _Nm>{{ __a[_Idx]... }};
+   }(make_index_sequence<_Nm>{});
+   }
+  else
+   __builtin_unreachable(); // FIXME: see PR c++/91388
 }
 
   template
@@ -449,8 +454,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   static_assert(!is_array_v<_Tp>);
   static_assert(is_move_constructible_v<_Tp>);
   if constexpr (is_move_constructible_v<_Tp>)
-   return __to_array<1>(__a, make_index_sequence<_Nm>{});
-  __builtin_unreachable(); // FIXME: see PR c++/91388
+   {
+ if constexpr (is_trivial_v<_Tp>)
+   {
+ array, _Nm> __arr;
+ if (!__is_constant_evaluated() && _Nm != 0)
+   __builtin_memcpy(__arr.data(), __a, sizeof(__a));
+ else
+   for (size_t __i = 0; __i < _Nm; ++__i)
+ __arr._M_elems[__i] = std::move(__a[__i]);
+ return __arr;
+   }
+ else
+   return [&__a](index_sequence<_Idx...>) {
+ return array, _Nm>{{ std::move(__a[_Idx])... }};
+   }(make_index_sequence<_Nm>{});
+   }
+  else
+   __builtin_unreachable(); // FIXME: see PR c++/91388
 }
 #endif // C++20
 
diff --git a/libstdc++-v3/testsuite/23_containers/array/creation/110167.cc 
b/libstdc++-v3/testsuite/23_containers/array/creation/110167.cc
new file mode 100644
index 000..c2aecc911bd
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/array/creation/110167.cc
@@ -0,0 +1,14 @@
+// { dg-options "-std=gnu++20" }
+// { dg-do compile { target c++20 } }
+
+// PR libstdc++/110167 - excessive compile time when optimizing std::to_array
+
+#include 
+
+constexpr int N = 512 * 512;
+
+std::array
+make_std_array(int (&a)[N])
+{
+  return std::to_array(a);
+}
-- 
2.40.1



[committed] libstdc++: Fix P2510R3 "Formatting pointers" [PR110149]

2023-06-09 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux. Pushed to trunk.

I'll backport it to gcc-13 later.

-- >8 --

I had intended to support the P2510R3 proposal unconditionally in C++20
mode, but I left it half implemented. The parse function supported the
new extensions, but the format function didn't.

This adds the missing pieces, and enables the extensions only for C++26 and
non-strict modes.
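
For illustration, usage of the P2510R3 extensions looks roughly like this
(a sketch assuming C++26 or a non-strict mode; the pointer values in the
comments are made up):

    #include <format>
    #include <string>

    std::string demo(const void* p)
    {
      auto a = std::format("{}", p);     // "0x7ffc1234" - default, lowercase hex
      auto b = std::format("{:P}", p);   // "0X7FFC1234" - uppercase hex ('P' type)
      auto c = std::format("{:018}", p); // "0x000000007ffc1234" - zero-filled after the prefix
      return a + ' ' + b + ' ' + c;
    }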

libstdc++-v3/ChangeLog:

PR libstdc++/110149
* include/std/format (formatter::parse):
Only alow 0 and P for C++26 and non-strict modes.
(formatter::format): Use toupper for P
type, and insert zero-fill characters for 0 option.
* testsuite/std/format/functions/format.cc: Check pointer
formatting. Only check P2510R3 extensions conditionally.
* testsuite/std/format/parse_ctx.cc: Only check P2510R3
extensions conditionally.
---
 libstdc++-v3/include/std/format   | 56 ---
 .../testsuite/std/format/functions/format.cc  | 42 ++
 .../testsuite/std/format/parse_ctx.cc | 15 +++--
 3 files changed, 101 insertions(+), 12 deletions(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 6edc3208afa..96a1e62ccc8 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -830,7 +830,7 @@ namespace __format
{
  if (_M_spec._M_type == _Pres_esc)
{
- // TODO: C++20 escaped string presentation
+ // TODO: C++23 escaped string presentation
}
 
  if (_M_spec._M_width_kind == _WP_none
@@ -2081,19 +2081,31 @@ namespace __format
if (__finished())
  return __first;
 
-   // _GLIBCXX_RESOLVE_LIB_DEFECTS
-   // P2519R3 Formatting pointers
+// _GLIBCXX_RESOLVE_LIB_DEFECTS
+// P2510R3 Formatting pointers
+#define _GLIBCXX_P2518R3 (__cplusplus > 202302L || ! defined __STRICT_ANSI__)
+
+#if _GLIBCXX_P2518R3
__first = __spec._M_parse_zero_fill(__first, __last);
if (__finished())
  return __first;
+#endif
 
__first = __spec._M_parse_width(__first, __last, __pc);
 
-   if (__first != __last && (*__first == 'p' || *__first == 'P'))
+   if (__first != __last)
  {
-   if (*__first == 'P')
+   if (*__first == 'p')
+ ++__first;
+#if _GLIBCXX_P2518R3
+   else if (*__first == 'P')
+   {
+ // _GLIBCXX_RESOLVE_LIB_DEFECTS
+ // P2510R3 Formatting pointers
  __spec._M_type = __format::_Pres_P;
-   ++__first;
+ ++__first;
+   }
+#endif
  }
 
if (__finished())
@@ -2110,9 +2122,21 @@ namespace __format
  char __buf[2 + sizeof(__v) * 2];
  auto [__ptr, __ec] = std::to_chars(__buf + 2, std::end(__buf),
 __u, 16);
- const int __n = __ptr - __buf;
+ int __n = __ptr - __buf;
  __buf[0] = '0';
  __buf[1] = 'x';
+#if _GLIBCXX_P2518R3
+ if (_M_spec._M_type == __format::_Pres_P)
+   {
+ __buf[1] = 'X';
+ for (auto __p = __buf + 2; __p != __ptr; ++__p)
+#if __has_builtin(__builtin_toupper)
+   *__p = __builtin_toupper(*__p);
+#else
+   *__p = std::toupper(*__p);
+#endif
+   }
+#endif
 
  basic_string_view<_CharT> __str;
  if constexpr (is_same_v<_CharT, char>)
@@ -2126,6 +2150,24 @@ namespace __format
  __str = wstring_view(__p, __n);
}
 
+#if _GLIBCXX_P2518R3
+ if (_M_spec._M_zero_fill)
+   {
+ size_t __width = _M_spec._M_get_width(__fc);
+ if (__width <= __str.size())
+   return __format::__write(__fc.out(), __str);
+
+ auto __out = __fc.out();
+ // Write "0x" or "0X" prefix before zero-filling.
+ __out = __format::__write(std::move(__out), __str.substr(0, 2));
+ __str.remove_prefix(2);
+ size_t __nfill = __width - __n;
+ return __format::__write_padded(std::move(__out), __str,
+ __format::_Align_right,
+ __nfill, _CharT('0'));
+   }
+#endif
+
  return __format::__write_padded_as_spec(__str, __n, __fc, _M_spec,
  __format::_Align_right);
}
diff --git a/libstdc++-v3/testsuite/std/format/functions/format.cc 
b/libstdc++-v3/testsuite/std/format/functions/format.cc
index 2a1b1560394..3485535e3cb 100644
--- a/libstdc++-v3/testsuite/std/format/functions/format.cc
+++ b/libstdc++-v3/testsuite/std/format/functions/format.cc
@@ -206,6 +206,8 @@ test_width()
   VERIFY( s == "" );
   s = std::format("{:{}}", "", 3);
   VERIFY( s == "   " );
+  s = std::format("{:{}}|{:{}}", 1, 2, 3, 4);
+  VERIFY( s == " 1|   3" );
   s = std::format("{1:{0}}", 2, "");
   VERIFY( s == "  " );
   s = std::

[committed] libstdc++: Bump library version to libstdc++.so.6.0.33

2023-06-09 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux, powerpc64le-linux, sparcv9-solaris. Pushed to
trunk.

There's no new GLIBCXX_3.4.33 symbol version yet, because we have
nothing to put in it. The bump is because of the new CXXABI_1.3.15
version.

-- >8 --

The addition of __cxa_call_terminate@@CXXABI_1.3.15 on trunk means we
need a new version.

libstdc++-v3/ChangeLog:

* acinclude.m4 (libtool_VERSION): Update to 6.0.33.
* configure: Regenerate.
* doc/xml/manual/abi.xml: Add libstdc++.so.6.0.33.
* doc/html/manual/abi.html: Regenerate.
---
 libstdc++-v3/acinclude.m4 | 2 +-
 libstdc++-v3/configure| 2 +-
 libstdc++-v3/doc/html/manual/abi.html | 2 +-
 libstdc++-v3/doc/xml/manual/abi.xml   | 1 +
 4 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 13c3966b317..efc27aa493e 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -4214,7 +4214,7 @@ changequote([,])dnl
 fi
 
 # For libtool versioning info, format is CURRENT:REVISION:AGE
-libtool_VERSION=6:32:0
+libtool_VERSION=6:33:0
 
 # Everything parsed; figure out what files and settings to use.
 case $enable_symvers in
diff --git a/libstdc++-v3/doc/xml/manual/abi.xml 
b/libstdc++-v3/doc/xml/manual/abi.xml
index 4b4930bef4c..44063831779 100644
--- a/libstdc++-v3/doc/xml/manual/abi.xml
+++ b/libstdc++-v3/doc/xml/manual/abi.xml
@@ -281,6 +281,7 @@ compatible.
 GCC 12.1.0: libstdc++.so.6.0.30
 GCC 13.1.0: libstdc++.so.6.0.31
 GCC 13.2.0: libstdc++.so.6.0.32
+GCC : libstdc++.so.6.0.33
 
 
   Note 1: Error should be libstdc++.so.3.0.3.
-- 
2.40.1



Re: [PATCH RFC] c++: use __cxa_call_terminate for MUST_NOT_THROW [PR97720]

2023-06-09 Thread Jonathan Wakely via Gcc-patches
On Fri, 9 Jun 2023 at 10:09, Jakub Jelinek  wrote:

> On Fri, Jun 09, 2023 at 11:02:48AM +0200, Richard Biener via Gcc-patches
> wrote:
> > > Currently both gcc-13 and trunk are at the same library version,
> > > libstdc++.so.6.0.32
> > >
> > > But with this addition to trunk we need to bump that .32 to .33,
> meaning
> > > that gcc-13 and trunk diverge. If we want to backport any new symbols
> from
> > > trunk to gcc-13 that gets trickier once they've diverged.
> >
> > But if you backport any new used symbol you have to bump the version
> > anyway.  So why not bump now (on trunk)?
>
> We've already done that in 13.1.1.  So, before 13.2 is released, we can add
> further symbols to the GLIBCXX_3.4.32 symbol version.
> Though, I don't see a problem bumping libstdc++ to libstdc++.so.6.0.33
> on the trunk now


OK, done at r14-1649-g9a3558cf1fb40b


> and put __cxa_call_terminate to GLIBCXX_3.4.33.
>

Well it's already in CXXABI_1.3.15, we just didn't bump the library version
when adding that.


[committed] libstdc++: Add preprocessor checks to <experimental/internet> [PR100285]

2023-06-09 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

We can't define endpoints and resolvers without the relevant OS support.
If IPPROTO_TCP and IPPROTO_UDP are both undefined then we won't need
basic_endpoint and basic_resolver anyway, so make them depend on those
macros.
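
For context, a minimal sketch of user code that needs the guarded
declarations (assuming a platform that does define IPPROTO_TCP; the names
come from the Networking TS as shipped in <experimental/internet>):

  #include <experimental/internet>
  namespace net = std::experimental::net;

  int main()
  {
    // basic_endpoint and friends are only declared when the OS provides
    // IPPROTO_TCP or IPPROTO_UDP.
    net::ip::tcp::endpoint ep(net::ip::address_v4::loopback(), 8080);
    return ep.port() == 8080 ? 0 : 1;
  }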

libstdc++-v3/ChangeLog:

PR libstdc++/100285
* include/experimental/internet [IPPROTO_TCP || IPPROTO_UDP]
(basic_endpoint, basic_resolver_entry, resolver_base)
(basic_resolver_results, basic_resolver): Only define if the tcp
or udp protocols will be defined.
---
 libstdc++-v3/include/experimental/internet | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libstdc++-v3/include/experimental/internet 
b/libstdc++-v3/include/experimental/internet
index 1f63c61ce85..bd9a05f12aa 100644
--- a/libstdc++-v3/include/experimental/internet
+++ b/libstdc++-v3/include/experimental/internet
@@ -1502,6 +1502,7 @@ namespace ip
 operator<<(basic_ostream<_CharT, _Traits>& __os, const network_v6& __net)
 { return __os << __net.to_string(); }
 
+#if defined IPPROTO_TCP || defined  IPPROTO_UDP
   /// An IP endpoint.
   template
 class basic_endpoint
@@ -2187,6 +2188,7 @@ namespace ip
   __ec = std::make_error_code(errc::operation_not_supported);
 #endif
 }
+#endif // IPPROTO_TCP || IPPROTO_UDP
 
   /** The name of the local host.
* @{
-- 
2.40.1



[committed] libstdc++: Remove duplicate definition of _Float128 std::from_chars [PR110077]

2023-06-09 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, sparc-solaris. Pushed to trunk.

-- >8 --

When long double uses IEEE binary128 representation we define the
_Float128 overload of std::from_chars inline in <charconv>. My changes
in r14-1431-g7037e7b6e4ac41 cause it to also be defined non-inline in
the library, leading to an abi-check failure for (at least) sparc and
aarch64.

Suppress the definition in the library if long double and _Float128
are both IEEE binary128.
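
For reference, a minimal usage sketch of the overload in question (assuming a
target where GCC provides _Float128 and a libstdc++ new enough to declare this
overload):

  #include <charconv>

  int main()
  {
    _Float128 v = 0;
    const char s[] = "1.5";
    auto [ptr, ec] = std::from_chars(s, s + sizeof(s) - 1, v);
    return ec == std::errc() && ptr == s + 3 ? 0 : 1;
  }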

libstdc++-v3/ChangeLog:

PR libstdc++/110077
* src/c++17/floating_from_chars.cc (from_chars) <_Float128>:
Only define if _Float128 and long double have different
representations.
---
 libstdc++-v3/src/c++17/floating_from_chars.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/src/c++17/floating_from_chars.cc 
b/libstdc++-v3/src/c++17/floating_from_chars.cc
index f1dd1037bf3..3152d64c67c 100644
--- a/libstdc++-v3/src/c++17/floating_from_chars.cc
+++ b/libstdc++-v3/src/c++17/floating_from_chars.cc
@@ -1325,7 +1325,8 @@ _ZSt10from_charsPKcS0_RDF128_St12chars_format(const char* 
first,
  __ieee128& value,
  chars_format fmt) noexcept
 __attribute__((alias ("_ZSt10from_charsPKcS0_Ru9__ieee128St12chars_format")));
-#elif defined(__FLT128_MANT_DIG__)
+#elif __FLT128_MANT_DIG__ == 113 && __LDBL_MANT_DIG__ != 113
+// Overload for _Float128 is not defined inline in <charconv>, define it here.
 from_chars_result
 from_chars(const char* first, const char* last, _Float128& value,
   chars_format fmt) noexcept
-- 
2.40.1



Re: [PATCH] fix frange_nextafter odr violation

2023-06-09 Thread Alexandre Oliva via Gcc-patches
On Jun  9, 2023, Richard Biener  wrote:

> On Thu, Jun 8, 2023 at 4:38 PM Alexandre Oliva via Gcc-patches
>  wrote:

>> C++ requires inline functions to be declared inline and defined in
>> every translation unit that uses them.  frange_nextafter is used in
>> gimple-range-op.cc but it's only defined as inline in
>> range-op-float.cc.  Drop the extraneous inline specifier.

> OK

>> for  gcc/ChangeLog
>> 
>> * range-op-float.cc (frange_nextafter): Drop inline.
>> (frelop_early_resolve): Add static.
>> (frange_float): Likewise.

The problem is also present in gcc-13.  Ok there as well?  Regstrapped
on x86_64-linux-gnu.

-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 
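
As a side note, a minimal sketch of the ODR point above, with hypothetical
file and function names rather than the actual GCC sources:

  // a.cc -- the only definition, and it is inline:
  inline int twice (int x) { return x * 2; }

  // b.cc -- uses twice() but has no inline definition of it in this TU,
  // which violates the rule described above (ill-formed, no diagnostic
  // required; in practice it can also fail to link):
  int twice (int);
  int use () { return twice (1); }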


Ping: [PATCH] libatomic: x86_64: Always try ifunc

2023-06-09 Thread Xi Ruoyao via Gcc-patches
Ping (in hopes that someone can review before the weekend).

On Sat, 2023-06-03 at 19:25 +0800, Xi Ruoyao wrote:
> We used to skip ifunc check when CX16 is available.  But now we use
> CX16+AVX+Intel/AMD for the "perfect" 16b load implementation, so CX16
> alone is not a sufficient reason not to use ifunc (see PR104688).
> 
> This causes a subtle and annoying issue: when GCC is built with a
> higher -march= setting in CFLAGS_FOR_TARGET, ifunc is disabled and
> the worst (locked) implementation of __atomic_load_16 is always used.
> 
> There seems no good way to check if the CPU is Intel or AMD from
> the built-in macros (maybe we can check every known model like
> __skylake,
> __bdver2, ..., but it will be very error-prone and require an update
> whenever we add the support for a new x86 model).  The best thing we
> can
> do seems "always try ifunc" here.
> 
> Bootstrapped and tested on x86_64-linux-gnu.  Ok for trunk?
> 
> libatomic/ChangeLog:
> 
> * configure.tgt: For x86_64, always set try_ifunc=yes.
> ---
>  libatomic/configure.tgt | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
> index a92ae9e8309..39dd5686f2e 100644
> --- a/libatomic/configure.tgt
> +++ b/libatomic/configure.tgt
> @@ -100,9 +100,7 @@ EOF
> fi
> cat > conftestx.c <<EOF
> #ifdef __x86_64__
> -#ifndef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
> -#error need -mcx16
> -#endif
> +#error ifunc is always wanted for 16B atomic load
>  #else
>  #ifndef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
>  #error need -march=i686

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University
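
For background, a minimal sketch of the ifunc dispatch being enabled here
(hypothetical function names; this is not how libatomic actually wires it up
internally):

  /* Two candidate implementations of a 16-byte atomic load helper.  */
  static int load_16_locked (void) { return 0; }
  static int load_16_cx16_avx (void) { return 1; }

  /* The resolver runs once at program load time, so the choice depends on
     the CPU we actually run on, not on the -march used to build the
     library.  */
  static int (*resolve_load_16 (void)) (void)
  {
    __builtin_cpu_init ();
    if (__builtin_cpu_supports ("avx"))
      return load_16_cx16_avx;
    return load_16_locked;
  }

  int load_16 (void) __attribute__ ((ifunc ("resolve_load_16")));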


Re: [PATCH] Make sure SCALAR_INT_MODE_P before invoke try_const_anchors

2023-06-09 Thread Jiufu Guo via Gcc-patches


Hi,

Richard Biener  writes:

> On Fri, 9 Jun 2023, Jiufu Guo wrote:
>
>> 
>> Hi,
>> 
>> Richard Biener  writes:
>> 
>> > On Fri, 9 Jun 2023, Richard Sandiford wrote:
>> >
>> >> guojiufu  writes:
>> >> > Hi,
>> >> >
>> >> > On 2023-06-09 16:00, Richard Biener wrote:
>> >> >> On Fri, 9 Jun 2023, Jiufu Guo wrote:
>> >> >> 
>> >> >>> Hi,
>> >> >>> 
...
>> >> >>> 
>> >> >>> This patch is raised when drafting below one.
>> >> >>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html.
>> >> >>> With that patch, "{[%1:DI]=0;} stack_tie" with BLKmode runs into
>> >> >>> try_const_anchors, and hits the assert/ice.
>> >> >>> 
>> >> >>> Boostrap and regtest pass on ppc64{,le} and x86_64.
>> >> >>> Is this ok for trunk?
>> >> >> 
>> >> >> Iff the correct fix at all (how can a CONST_INT have BLKmode?) then
>> >> >> I suggest to instead fix try_const_anchors to change
>> >> >> 
>> >> >>   /* CONST_INT is used for CC modes, but we should leave those alone.  
>> >> >> */
>> >> >>   if (GET_MODE_CLASS (mode) == MODE_CC)
>> >> >> return NULL_RTX;
>> >> >> 
>> >> >>   gcc_assert (SCALAR_INT_MODE_P (mode));
>> >> >> 
>> >> >> to
>> >> >> 
>> >> >>   /* CONST_INT is used for CC modes, leave any non-scalar-int mode 
>> >> >> alone.  */
>> >> >>   if (!SCALAR_INT_MODE_P (mode))
>> >> >> return NULL_RTX;
>> >> >> 
>> >> >
>> >> > This is also able to fix this issue.  there is a "Punt on CC modes" 
>> >> > patch
>> >> > to return NULL_RTX in try_const_anchors.
>> >> >
>> >> >> but as said I wonder how we arrive at a BLKmode CONST_INT and whether
>> >> >> we should have fended this off earlier.  Can you share more complete
>> >> >> RTL of that stack_tie?
>> >> >
>> >> >
>> >> > (insn 15 14 16 3 (parallel [
>> >> >  (set (mem/c:BLK (reg/f:DI 1 1) [1  A8])
>> >> >  (const_int 0 [0]))
>> >> >  ]) "/home/guojiufu/temp/gdb.c":13:3 922 {stack_tie}
>> >> >   (nil))
>> >> >
>> >> > It is "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])".
>> >> 
>> >> I'm not convinced this is correct RTL.  (unspec:BLK [(const_int 0)] ...)
>> >> would be though.  It's arguably more accurate too, since the effect
>> >> on the stack locations is unspecified rather than predictable.
>> >
>> > powerpc seems to be the only port with a stack_tie that's not
>> > using an UNSPEC RHS.
>> In rs6000.md, it is
>> 
>> ; This is to explain that changes to the stack pointer should
>> ; not be moved over loads from or stores to stack memory.
>> (define_insn "stack_tie"
>>   [(match_parallel 0 "tie_operand"
>> [(set (mem:BLK (reg 1)) (const_int 0))])]
>>   ""
>>   ""
>>   [(set_attr "length" "0")])
>> 
>> This would be just a placeholder insn that acts as the comment describes.
>> UNSPEC_ would work like on other targets.  Still, I'm wondering about
>> the concerns with "set (mem:BLK (reg 1)) (const_int 0)":
>> the mismatched MODEs between SET_DEST and SET_SRC?
>
> I don't think the issue is the mode but the issue is that
> the patter as-is says some memory is zeroed while that's not
> actually true (not specifying a size means we can't really do
> anything with this MEM, but still).  Using an UNSPEC avoids
> implying anything for the stored value.
>
> Of course I think a MEM SET_DEST without a specified size is bougs
> as well, but there's larger precedent for this...

Thanks for your kind comments!
"(set (mem:BLK (reg 1)) (const_int 0))" is used here maybe because this
insn does not generate anything real (not a real store and no asm code),
much like a barrier.

That said, I agree that using UNSPEC would be clearer and avoid misreading.
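
For illustration, an UNSPEC-based form could look roughly like the sketch
below (UNSPEC_TIE is a hypothetical constant name, and the operands are
simplified compared to what other ports actually use):

  ;; Placeholder insn: keeps stack-pointer updates ordered against stack
  ;; memory accesses without claiming any particular value is stored.
  (define_insn "stack_tie"
    [(set (mem:BLK (reg 1))
          (unspec:BLK [(const_int 0)] UNSPEC_TIE))]
    ""
    ""
    [(set_attr "length" "0")])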

BR,
Jeff (Jiufu Guo)

>
> Richard.
>
>> Thanks for comments!
>> 
>> BR,
>> Jeff (Jiufu Guo)
>> >
>> >> Thanks,
>> >> Richard
>> 


Re: Ping: [PATCH] libatomic: x86_64: Always try ifunc

2023-06-09 Thread Jakub Jelinek via Gcc-patches
On Fri, Jun 09, 2023 at 08:37:20PM +0800, Xi Ruoyao wrote:
> Ping (in hopes that someone can review before the weekend).
> 
> On Sat, 2023-06-03 at 19:25 +0800, Xi Ruoyao wrote:
> > We used to skip ifunc check when CX16 is available.  But now we use
> > CX16+AVX+Intel/AMD for the "perfect" 16b load implementation, so CX16
> > alone is not a sufficient reason not to use ifunc (see PR104688).
> > 
> > This causes a subtle and annoying issue: when GCC is built with a
> > higher -march= setting in CFLAGS_FOR_TARGET, ifunc is disabled and
> > the worst (locked) implementation of __atomic_load_16 is always used.
> > 
> > There seems no good way to check if the CPU is Intel or AMD from
> > the built-in macros (maybe we can check every known model like
> > __skylake,
> > __bdver2, ..., but it will be very error-prone and require an update
> > whenever we add the support for a new x86 model).  The best thing we
> > can
> > do seems "always try ifunc" here.
> > 
> > Bootstrapped and tested on x86_64-linux-gnu.  Ok for trunk?
> > 
> > libatomic/ChangeLog:
> > 
> > * configure.tgt: For x86_64, always set try_ifunc=yes.

Ok, thanks.

Jakub



Re: [PATCH] fix frange_nextafter odr violation

2023-06-09 Thread Richard Biener via Gcc-patches
On Fri, Jun 9, 2023 at 2:26 PM Alexandre Oliva  wrote:
>
> On Jun  9, 2023, Richard Biener  wrote:
>
> > On Thu, Jun 8, 2023 at 4:38 PM Alexandre Oliva via Gcc-patches
> >  wrote:
>
> >> C++ requires inline functions to be declared inline and defined in
> >> every translation unit that uses them.  frange_nextafter is used in
> >> gimple-range-op.cc but it's only defined as inline in
> >> range-op-float.cc.  Drop the extraneous inline specifier.
>
> > OK
>
> >> for  gcc/ChangeLog
> >>
> >> * range-op-float.cc (frange_nextafter): Drop inline.
> >> (frelop_early_resolve): Add static.
> >> (frange_float): Likewise.
>
> The problem is also present in gcc-13.  Ok there as well?  Regstrapped
> on x86_64-linux-gnu.

Yes.

Richard.

> --
> Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about 


Re: [PATCH] Make sure SCALAR_INT_MODE_P before invoke try_const_anchors

2023-06-09 Thread Richard Biener via Gcc-patches
On Fri, 9 Jun 2023, Jiufu Guo wrote:

> 
> Hi,
> 
> Richard Biener  writes:
> 
> > On Fri, 9 Jun 2023, Jiufu Guo wrote:
> >
> >> 
> >> Hi,
> >> 
> >> Richard Biener  writes:
> >> 
> >> > On Fri, 9 Jun 2023, Richard Sandiford wrote:
> >> >
> >> >> guojiufu  writes:
> >> >> > Hi,
> >> >> >
> >> >> > On 2023-06-09 16:00, Richard Biener wrote:
> >> >> >> On Fri, 9 Jun 2023, Jiufu Guo wrote:
> >> >> >> 
> >> >> >>> Hi,
> >> >> >>> 
> ...
> >> >> >>> 
> >> >> >>> This patch is raised when drafting below one.
> >> >> >>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html.
> >> >> >>> With that patch, "{[%1:DI]=0;} stack_tie" with BLKmode runs into
> >> >> >>> try_const_anchors, and hits the assert/ice.
> >> >> >>> 
> >> >> >>> Boostrap and regtest pass on ppc64{,le} and x86_64.
> >> >> >>> Is this ok for trunk?
> >> >> >> 
> >> >> >> Iff the correct fix at all (how can a CONST_INT have BLKmode?) then
> >> >> >> I suggest to instead fix try_const_anchors to change
> >> >> >> 
> >> >> >>   /* CONST_INT is used for CC modes, but we should leave those 
> >> >> >> alone.  
> >> >> >> */
> >> >> >>   if (GET_MODE_CLASS (mode) == MODE_CC)
> >> >> >> return NULL_RTX;
> >> >> >> 
> >> >> >>   gcc_assert (SCALAR_INT_MODE_P (mode));
> >> >> >> 
> >> >> >> to
> >> >> >> 
> >> >> >>   /* CONST_INT is used for CC modes, leave any non-scalar-int mode 
> >> >> >> alone.  */
> >> >> >>   if (!SCALAR_INT_MODE_P (mode))
> >> >> >> return NULL_RTX;
> >> >> >> 
> >> >> >
> >> >> > This is also able to fix this issue.  there is a "Punt on CC modes" 
> >> >> > patch
> >> >> > to return NULL_RTX in try_const_anchors.
> >> >> >
> >> >> >> but as said I wonder how we arrive at a BLKmode CONST_INT and whether
> >> >> >> we should have fended this off earlier.  Can you share more complete
> >> >> >> RTL of that stack_tie?
> >> >> >
> >> >> >
> >> >> > (insn 15 14 16 3 (parallel [
> >> >> >  (set (mem/c:BLK (reg/f:DI 1 1) [1  A8])
> >> >> >  (const_int 0 [0]))
> >> >> >  ]) "/home/guojiufu/temp/gdb.c":13:3 922 {stack_tie}
> >> >> >   (nil))
> >> >> >
> >> >> > It is "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])".
> >> >> 
> >> >> I'm not convinced this is correct RTL.  (unspec:BLK [(const_int 0)] ...)
> >> >> would be though.  It's arguably more accurate too, since the effect
> >> >> on the stack locations is unspecified rather than predictable.
> >> >
> >> > powerpc seems to be the only port with a stack_tie that's not
> >> > using an UNSPEC RHS.
> >> In rs6000.md, it is
> >> 
> >> ; This is to explain that changes to the stack pointer should
> >> ; not be moved over loads from or stores to stack memory.
> >> (define_insn "stack_tie"
> >>   [(match_parallel 0 "tie_operand"
> >>   [(set (mem:BLK (reg 1)) (const_int 0))])]
> >>   ""
> >>   ""
> >>   [(set_attr "length" "0")])
> >> 
> >> This would be just a placeholder insn that acts as the comment describes.
> >> UNSPEC_ would work like on other targets.  Still, I'm wondering about
> >> the concerns with "set (mem:BLK (reg 1)) (const_int 0)":
> >> the mismatched MODEs between SET_DEST and SET_SRC?
> >
> > I don't think the issue is the mode but the issue is that
> > the patter as-is says some memory is zeroed while that's not
> > actually true (not specifying a size means we can't really do
> > anything with this MEM, but still).  Using an UNSPEC avoids
> > implying anything for the stored value.
> >
> > Of course I think a MEM SET_DEST without a specified size is bougs
> > as well, but there's larger precedent for this...
> 
> Thanks for your kind comments!
> "(set (mem:BLK (reg 1)) (const_int 0))" is used here maybe because this
> insn does not generate anything real (not a real store and no asm code),
> much like a barrier.
> 
> That said, I agree that using UNSPEC would be clearer and avoid misreading.

Btw, another way to avoid the issue in CSE is to make it not process
(i.e. not record anything for optimization) SETs of MEMs with
!MEM_SIZE_KNOWN_P.

Richard.


Re: [PATCH 2/2] ipa-cp: Feed results of IPA-CP into value numbering

2023-06-09 Thread Martin Jambor
Hi,

thanks for looking at this.

On Fri, Jun 02 2023, Richard Biener wrote:
> On Mon, 29 May 2023, Martin Jambor wrote:
>

[...]

>> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
>> index 27c84e78fcf..33215b5fc82 100644
>> --- a/gcc/tree-ssa-sccvn.cc
>> +++ b/gcc/tree-ssa-sccvn.cc
>> @@ -74,6 +74,9 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "ipa-modref-tree.h"
>>  #include "ipa-modref.h"
>>  #include "tree-ssa-sccvn.h"
>> +#include "alloc-pool.h"
>> +#include "symbol-summary.h"
>> +#include "ipa-prop.h"
>>  
>>  /* This algorithm is based on the SCC algorithm presented by Keith
>> Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
>> @@ -2327,7 +2330,7 @@ vn_walk_cb_data::push_partial_def (pd_data pd,
>> with the current VUSE and performs the expression lookup.  */
>>  
>>  static void *
>> -vn_reference_lookup_2 (ao_ref *op ATTRIBUTE_UNUSED, tree vuse, void *data_)
>> +vn_reference_lookup_2 (ao_ref *op, tree vuse, void *data_)
>>  {
>>vn_walk_cb_data *data = (vn_walk_cb_data *)data_;
>>vn_reference_t vr = data->vr;
>> @@ -2361,6 +2364,37 @@ vn_reference_lookup_2 (ao_ref *op ATTRIBUTE_UNUSED, 
>> tree vuse, void *data_)
>>return *slot;
>>  }
>>  
>> +  if (SSA_NAME_IS_DEFAULT_DEF (vuse))
>> +{
>> +  HOST_WIDE_INT offset, size;
>> +  tree v = NULL_TREE;
>> +  if (op->base && TREE_CODE (op->base) == PARM_DECL
>> +  && op->offset.is_constant (&offset)
>> +  && op->size.is_constant (&size)
>> +  && op->max_size_known_p ()
>> +  && known_eq (op->size, op->max_size))
>> +v = ipcp_get_aggregate_const (cfun, op->base, false, offset, size);
>
> We've talked about partial definition support, this does not
> have this implemented AFAICS.  But that means you cannot simply
> do ->finish () without verifying data->partial_defs.is_empty ().
>

You are right, partial definitions are not implemented.  I have added
the is_empty check to the patch.  I'll continue looking into adding the
support as a follow-up.

>> +  else if (op->ref)
>> +{
>
> does this ever happen to improve things?

Yes, this branch is necessary for propagation of all known constants
passed in memory pointed to by a POINTER_TYPE_P parameter.  It handles
the second testcase added by the patch.

> There's the remote
> possibility op->base isn't initialized yet, for this reason
> above you should use ao_ref_base (op) instead of accessing
> op->base directly.

OK

>
>> +  HOST_WIDE_INT offset, size;
>> +  bool reverse;
>> +  tree base = get_ref_base_and_extent_hwi (op->ref, &offset,
>> +   &size, &reverse);
>> +  if (base
>> +  && TREE_CODE (base) == MEM_REF
>> +  && integer_zerop (TREE_OPERAND (base, 1))
>> +  && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME
>
> And this then should be done within the above branch as well,
> just keyed off base == MEM_REF.

I am sorry but I don't understand this comment, can you please try to
re-phrase it?  The previous branch handles direct accesses to
PARM_DECLs, MEM_REFs don't need to be there at all.

Updated (bootstrap and testing passing) patch is below for reference,
but I obviously expect to incorporate the above comment as well before
proposing to push it.

Thanks,

Martin


Subject: [PATCH 2/2] ipa-cp: Feed results of IPA-CP into value numbering

PRs 68930 and 92497 show that when IPA-CP figures out constants in
aggregate parameters, or in memory passed by reference, but the loads happen
in an inlined function, the information is lost.  This happens even
when the inlined function itself was known to have - or even cloned to
have - such constants in incoming parameters because the transform
phase of IPA passes is not run on them.  See discussion in the bugs
for reasons why.

Honza suggested that we can plug the results of IPA-CP analysis into
value numbering, so that FRE can figure out that some loads fetch
known constants.  This is what this patch attempts to do.

This version of the patch uses the new way we represent aggregate
constants discovered by IPA-CP and so avoids a linear scan to find them.
Similarly, it depends on the previous patch, which avoids potentially
slow linear look-ups of indices of PARM_DECLs when there are many of
them.

gcc/ChangeLog:

2023-06-07  Martin Jambor  

PR ipa/68930
PR ipa/92497
* ipa-prop.h (ipcp_get_aggregate_const): Declare.
* ipa-prop.cc (ipcp_get_aggregate_const): New function.
(ipcp_transform_function): Do not deallocate transformation info.
* tree-ssa-sccvn.cc: Include alloc-pool.h, symbol-summary.h and
ipa-prop.h.
(vn_reference_lookup_2): When hitting default-def vuse, query
IPA-CP transformation info for any known constants.

gcc/testsuite/ChangeLog:

2023-06-07  Martin Jambor  

PR ipa/68930
PR ipa/92497
* gcc.dg/ipa/pr92497-1.c: New test.
* gcc.dg/ipa/pr92497-2.c: Likewise.
---
 gc

Re: [PATCH] testsuite: fix the condition bug in tsvc s176

2023-06-09 Thread Jeff Law via Gcc-patches




On 6/9/23 05:56, Richard Biener via Gcc-patches wrote:

On Fri, Jun 9, 2023 at 11:58 AM Lehua Ding  wrote:



It's odd that the checksum doesn't depend on the number of iterations done ...


This is because the difference between the calculated result (32063.902344) and
the expected result (32000.00) is small. The current check is that the 
result
is considered correct as long as the `value/expected` ratio is between 0.99f and
1.01f.


Oh, I see ...


I'm not sure if this check is enough, but I should also update the expected
result to 32063.902344 (the same result as without vectorization).


OK.


Best,
Lehua

gcc/testsuite/ChangeLog:

 * gcc.dg/vect/tsvc/tsvc.h:
 * gcc.dg/vect/tsvc/vect-tsvc-s176.c:
I stitched together appropriate ChangeLog entries and pushed this to the 
trunk (I don't think Lehua has write access).


jeff


Re: [PATCH V2] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-06-09 Thread Jiufu Guo via Gcc-patches


Hi,

Richard Biener  writes:

> On Wed, 7 Jun 2023, Jiufu Guo wrote:
>
>> Hi,
>> 
>> This patch tries to optimize "(X - N * M) / N" to "X / N - M".
>> For C code, "/" truncates towards zero (trunc_div), and "X - N * M" may
>> wrap/overflow/underflow.  So the transform is only valid when "X - N * M"
>> does not cross zero and does not wrap/overflow/underflow.
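
A concrete instance of the cross-zero restriction above: with X = 1, N = 2,
M = 1, trunc_div gives (1 - 2*1) / 2 = -1 / 2 = 0, while 1 / 2 - 1 = -1, so
the rewrite is only safe when "X - N * M" is known not to cross zero (and not
to wrap).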
>> 
>> Compare with previous version:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618796.html
>> 
>> This patch 1. adds the patterns for variable N or M,
>> 2. uses simpler form "(X - N * M) / N" for patterns,
>> 3. adds functions to gimle-fold.h/cc (not gimple-match-head.cc)
>> 4. updates testcases
>> 
>> Bootstrap & regtest pass on ppc64{,le} and x86_64.
>> Is this patch ok for trunk?
>
> Comments below.
>
>> 
>> BR,
>> Jeff (Jiufu Guo)
>> 
>>  PR tree-optimization/108757
>> 
>> gcc/ChangeLog:
>> 
>>  * gimple-fold.cc (maybe_mult_overflow): New function.
>>  (maybe_plus_overflow): New function.
>>  (maybe_minus_overflow): New function.
>>  (plus_mult_no_ovf_and_keep_sign): New function.
>>  (plus_no_ovf_and_keep_sign): New function.
>>  * gimple-fold.h (maybe_mult_overflow): New declare.
>>  (plus_mult_no_ovf_and_keep_sign): New declare.
>>  (plus_no_ovf_and_keep_sign): New declare.
>>  * match.pd ((X - N * M) / N): New pattern.
>>  ((X + N * M) / N): New pattern.
>>  ((X + C) / N): New pattern.
>>  ((X + C) >> N): New pattern.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  * gcc.dg/pr108757-1.c: New test.
>>  * gcc.dg/pr108757-2.c: New test.
>>  * gcc.dg/pr108757.h: New test.
>> 
>> ---
>>  gcc/gimple-fold.cc| 161 
>>  gcc/gimple-fold.h |   3 +
>>  gcc/match.pd  |  58 +++
>>  gcc/testsuite/gcc.dg/pr108757-1.c |  18 +++
>>  gcc/testsuite/gcc.dg/pr108757-2.c |  19 +++
>>  gcc/testsuite/gcc.dg/pr108757.h   | 244 ++
>>  6 files changed, 503 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.dg/pr108757-1.c
>>  create mode 100644 gcc/testsuite/gcc.dg/pr108757-2.c
>>  create mode 100644 gcc/testsuite/gcc.dg/pr108757.h
>> 
>> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
>> index 581575b65ec..bb833ae17b3 100644
>> --- a/gcc/gimple-fold.cc
>> +++ b/gcc/gimple-fold.cc
>> @@ -9349,3 +9349,164 @@ gimple_stmt_integer_valued_real_p (gimple *stmt, int 
>> depth)
>>return false;
>>  }
>>  }
>> +
>> +/* Return true if "X * Y" may be overflow.  */
>> +
>> +bool
>> +maybe_mult_overflow (value_range &x, value_range &y, signop sgn)
>
> These functions look like some "basic" functionality that should
> be (or maybe already is?  Andrew?) provided by the value-range
> framework.  That means it should not reside in gimple-fold.{cc,h}
> but elsehwere and possibly with an API close to the existing
> value-range stuff.
>
> Andrew?

It would be great to get the overflow info directly from VR :)
Right now, in range-op.cc, there are already value_range_with_overflow and
value_range_from_overflowed_bounds, which check OVFs, but this information
does not seem to be recorded.  Maybe it would help to add a field in VR and
an API to query it.

>
>> +{
>> +  wide_int wmin0 = x.lower_bound ();
>> +  wide_int wmax0 = x.upper_bound ();
>> +  wide_int wmin1 = y.lower_bound ();
>> +  wide_int wmax1 = y.upper_bound ();
>> +
>> +  wi::overflow_type min_ovf, max_ovf;
>> +  wi::mul (wmin0, wmin1, sgn, &min_ovf);
>> +  wi::mul (wmax0, wmax1, sgn, &max_ovf);
>> +  if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
>> +{
>> +  wi::mul (wmin0, wmax1, sgn, &min_ovf);
>> +  wi::mul (wmax0, wmin1, sgn, &max_ovf);
>> +  if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
>> +return false;
>> +}
>> +  return true;
>> +}
>> +
>> +/* Return true if "X + Y" may be overflow.  */
>> +
>> +static bool
>> +maybe_plus_overflow (value_range &x, value_range &y, signop sgn)
>> +{
>> +  wide_int wmin0 = x.lower_bound ();
>> +  wide_int wmax0 = x.upper_bound ();
>> +  wide_int wmin1 = y.lower_bound ();
>> +  wide_int wmax1 = y.upper_bound ();
>> +
>> +  wi::overflow_type min_ovf, max_ovf;
>> +  wi::add (wmax0, wmax1, sgn, &min_ovf);
>> +  wi::add (wmin0, wmin1, sgn, &max_ovf);
>> +  if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
>> +return false;
>> +
>> +  return true;
>> +}
>> +
>> +/* Return true if "X - Y" may be overflow.  */
>> +
>> +static bool
>> +maybe_minus_overflow (value_range &x, value_range &y, signop sgn)
>> +{
>> +  wide_int wmin0 = x.lower_bound ();
>> +  wide_int wmax0 = x.upper_bound ();
>> +  wide_int wmin1 = y.lower_bound ();
>> +  wide_int wmax1 = y.upper_bound ();
>> +
>> +  wi::overflow_type min_ovf, max_ovf;
>> +  wi::sub (wmin0, wmax1, sgn, &min_ovf);
>> +  wi::sub (wmax0, wmin1, sgn, &max_ovf);
>> +  if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
>> +return false;
>> +
>> +  return true;
>> +}
>> +
>> +/* Return true if there is no overflow in the expression.
>> +   And no

Re: [PATCH] testsuite: fix the condition bug in tsvc s176

2023-06-09 Thread Lehua Ding
> I stitched together appropriate ChangeLog entries and pushed this to the
> trunk (I don't think Lehua has write access).

Thank you!


Best,
Lehua

[PATCH] RISC-V: Fix V_WHOLE && V_FRACT iterator requirement

2023-06-09 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch fixes the requirements of V_WHOLE and V_FRACT.
E.g. VNx8QI in V_WHOLE has no requirement, which is incorrect.
 Actually, VNx8QI should be a whole (full) mode only when TARGET_MIN_VLEN < 128,
 since when TARGET_MIN_VLEN == 128, VNx8QI is e8mf2, which is a fractional
 vector.
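
(As a size check: VNx8QI holds at least 8 QI elements, i.e. 64 bits, so with
TARGET_MIN_VLEN == 64 it fills exactly one vector register and is a whole
mode, while with TARGET_MIN_VLEN == 128 it fills only half a register, which
is the fractional e8mf2 case mentioned above.)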

gcc/ChangeLog:

* config/riscv/vector-iterators.md: Fix requirement.

---
 gcc/config/riscv/vector-iterators.md | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 234b712bc9d..8c71c9e22cc 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -447,21 +447,24 @@
 ])
 
 (define_mode_iterator V_WHOLE [
-  (VNx4QI "TARGET_MIN_VLEN == 32") VNx8QI VNx16QI VNx32QI (VNx64QI 
"TARGET_MIN_VLEN > 32") (VNx128QI "TARGET_MIN_VLEN >= 128")
-  (VNx2HI "TARGET_MIN_VLEN == 32") VNx4HI VNx8HI VNx16HI (VNx32HI 
"TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
-  (VNx1SI "TARGET_MIN_VLEN == 32") VNx2SI VNx4SI VNx8SI (VNx16SI 
"TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
+  (VNx4QI "TARGET_MIN_VLEN == 32") (VNx8QI "TARGET_MIN_VLEN < 128") VNx16QI 
VNx32QI
+  (VNx64QI "TARGET_MIN_VLEN > 32") (VNx128QI "TARGET_MIN_VLEN >= 128")
+  (VNx2HI "TARGET_MIN_VLEN == 32") (VNx4HI "TARGET_MIN_VLEN < 128") VNx8HI 
VNx16HI
+  (VNx32HI "TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
+  (VNx1SI "TARGET_MIN_VLEN == 32") (VNx2SI "TARGET_MIN_VLEN < 128") VNx4SI 
VNx8SI
+  (VNx16SI "TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
   (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64")
   (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
"TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
 
   (VNx2HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN == 32")
-  (VNx4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN == 64")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
   (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
   (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
 
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN == 32")
-  (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
+  (VNx2SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
   (VNx4SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx8SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx16SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32")
@@ -481,8 +484,8 @@
   (VNx2HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
   (VNx4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
 
-  (VNx1SI "TARGET_MIN_VLEN > 32 && TARGET_MIN_VLEN < 128") (VNx2SI 
"TARGET_MIN_VLEN >= 128")
-  (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32 && TARGET_MIN_VLEN 
< 128")
+  (VNx1SI "TARGET_MIN_VLEN == 64") (VNx2SI "TARGET_MIN_VLEN >= 128")
+  (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN == 64")
   (VNx2SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
 ])
 
-- 
2.36.1



Re: [PATCH V2] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS

2023-06-09 Thread Jeff Law via Gcc-patches




On 6/9/23 04:41, juzhe.zh...@rivai.ai wrote:

From: Juzhe-Zhong 

This patch is to rework Phase 5 && Phase 6 of VSETVL PASS since Phase 5 && 
Phase 6
are quite messy and cause some bugs discovered by my downstream 
auto-vectorization
test-generator.

Before this patch.

Phase 5 is cleanup_insns, the function that removes the AVL operand dependency
from each RVV instruction.
E.g. vadd.vv (use a5) becomes, after Phase 5, vadd.vv (use const_int 0).  "a5"
is used in "vsetvl" instructions, and after the correct "vsetvl" instructions
are inserted, each RVV instruction doesn't need AVL operand "a5" anymore.
Removing this operand dependency then helps the following scheduling PASS.
Right.  Removal of the unused operand gives the scheduler more freedom. 
It's not clear yet how much gain there is for scheduling vector on RV, 
but there's no good reason to handcuff it with unnecessary dependencies.





Phase 6 is propagate_avl, which does the following 2 things:
1. Local && Global user vsetvl instructions optimization.
E.g.
   vsetvli a2, a2, e8, mf8   ==> Change it into vsetvli a2, a2, e32, mf2
   vsetvli zero,a2, e32, mf2  ==> eliminate
Always good to eliminate more instructions.   So while vsetvl is 
designed to be minimal overhead and it's fully expected that we'll see a 
lot of them, there's no good reason to have unnnecessary ones in the stream.




2. Optimize user vsetvl from "vsetvl a2,a2" into "vsetvl zero,a2" if "a2" is
not used by any instructions.
Since Phase 1 ~ Phase 4, which insert "vsetvli" instructions based on LCM,
change the CFG, I re-initialize a new RTL_SSA framework (which is more
expensive than just using DF) for Phase 6 and optimize user vsetvli based on
the new RTL_SSA.
This one isn't as clear cut, but I still think it's the right thing to 
do.  The first form explicitly kills the value in a2 while the second 
does not.  Though if the value is dead it's going to be discoverable by 
DF and we should also end up with REG_DEAD note as well.   It does have 
the advantage that it does not open a new live range.




There are 2 issues in Phase 5 && Phase 6:
1. local_eliminate_vsetvl_insn was introduced by @kito which can do better 
local user vsetvl optimizations better than
Phase 6 do, such approach doesn't need to re-new the RTL_SSA framework. So 
the local user vsetvli instructions optimizaiton
in Phase 6 is redundant and should be removed.
2. A bug discovered by my downstream auto-vectorization test-generator (I can't 
put the test in this patch since we are missing autovec
patterns for it so we can't use the upstream GCC directly reproduce such 
issue but I will remember put it back after I support the
necessary autovec patterns). Such bug is causing by using RTL_SSA re-new 
framework. The issue description is this:
Note that you could potentially go ahead and submit that test and just 
xfail it.  Not a requirement, but a possibility that I sometimes use if 
I know I've got a fix coming shortly.




Before Phase 6:

...
insn1: vsetvli a3, 17 <== generated by SELECT_VL auto-vec pattern.
slli a4,a3,3
...
insn2: vsetvli zero, a3, ...
load (use const_int 0, before Phase 5, it's using a3, but the use of "a3" 
is removed in Phase 5)
...

In Phase 6, we iterate to insn2, then get the def of "a3" which is the insn1.
insn2 is the vsetvli instruction inserted in Phase 4 which is not included in 
the RLT_SSA framework
even though we renew it (I didn't take a look at it and I don't think we need 
to now).
Based on this situation, the def_info of insn2 has the information
"set->single_nondebug_insn_use ()",
which returns true.  Obviously, this information is not correct, since insn1
has at least 2 uses:
1). slli a4,a3,3  2). insn2: vsetvli zero, a3, ...  Then, the execution test
generated by my downstream test-generator failed.

Understood.



Conclusion of RTL_SSA framework:
Before this patch, we initialize RTL_SSA 2 times.  One is at the beginning of
the VSETVL PASS, which is absolutely correct; the other
is re-initialized after Phase 4 (LCM) and has incorrect information that causes bugs.

Besides, we don't like to initialize RTL_SSA second time it seems to be a waste 
since we just need to do a little optimization.

Base on all circumstances I described above, I rework and reorganize Phase 5 && 
Phase 6 as follows:
1. Phase 5 is called ssa_post_optimization which is doing the optimization base 
on the RTL_SSA information (The RTL_SSA is initialized
at the beginning of the VSETVL PASS, no need to re-new it again). This 
phase includes 3 optimizaitons:
1). local_eliminate_vsetvl_insn we already have (no change).
2). global_eliminate_vsetvl_insn ---> new optimizaiton splitted from 
orignal Phase 6 but with more powerful and reliable implementation.
   E.g.
   void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) {
 size_t avl;
 if (m > 100)
   avl = __riscv_vsetvl_e16mf4(vl << 4);
  

Re: [PATCH] RISC-V: Fix V_WHOLE && V_FRACT iterator requirement

2023-06-09 Thread Robin Dapp via Gcc-patches
On 6/9/23 16:32, juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong 
> 
> This patch fixes the requirement of V_WHOLE and V_FRACT.
> E.g. VNx8QI in V_WHOLE has no requirement which is incorrect.
>  Actually, VNx8QI should be whole(full) mode when TARGET_MIN_VLEN < 128
>  since when TARGET_MIN_VLEN == 128, VNx8QI is e8mf2 which is fractional
>  vector.
> 
> gcc/ChangeLog:
> 
> * config/riscv/vector-iterators.md: Fix requirement.

I actually have the attached already on my local tree (as well as a test),
and wanted to post it with the vec_set patch.  I think the alignment helps
a bit with readability.

>From 147a459dfbf1fe9d5dd93148f475f42dee3bd94b Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Tue, 6 Jun 2023 17:29:26 +0200
Subject: [PATCH] RISC-V: Change V_WHOLE iterator to properly match
 instruction.

Currently we emit e.g. an vl1r.v even when loading a mode whose size is
smaller than the hardware vector size.  This can happen when reload
decides to switch to another alternative.

This patch fixes the iterator and adds a testcase for the problem.

gcc/ChangeLog:

* config/riscv/vector-iterators.md: Add guards for modes smaller
than the hardware vector size.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/full-vec-move1.c: New test.
---
 gcc/config/riscv/vector-iterators.md  | 65 ++-
 .../rvv/autovec/vls-vlmax/full-vec-move1.c| 23 +++
 2 files changed, 72 insertions(+), 16 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/full-vec-move1.c

diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 90743ed76c5..0587325e82c 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -430,32 +430,65 @@ (define_mode_iterator VNX64_QHI [
   VNx64QI (VNx64HI "TARGET_MIN_VLEN >= 128")
 ])
 
+;; This iterator describes which modes can be moved/loaded/stored by
+;; full-register move instructions (e.g. vl1r.v).
+;; For now we support a maximum vector length of 1024 that can
+;; also be reached by combining multiple hardware registers (mf1, mf2, ...).
+;; This means that e.g. VNx64HI (with a size of 128 bytes) requires
+;; at least a minimum vector length of 128 bits = 16 bytes in order
+;; to be loadable by vl8r.v (mf8).
+;; Apart from that we must make sure that modes smaller than the
+;; vector size are properly guarded so that e.g. VNx4HI is not loaded
+;; by vl1r.v when VL == 128.
 (define_mode_iterator V_WHOLE [
-  (VNx4QI "TARGET_MIN_VLEN == 32") VNx8QI VNx16QI VNx32QI (VNx64QI 
"TARGET_MIN_VLEN > 32") (VNx128QI "TARGET_MIN_VLEN >= 128")
-  (VNx2HI "TARGET_MIN_VLEN == 32") VNx4HI VNx8HI VNx16HI (VNx32HI 
"TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
-  (VNx1SI "TARGET_MIN_VLEN == 32") VNx2SI VNx4SI VNx8SI (VNx16SI 
"TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
-  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64")
-  (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
"TARGET_MIN_VLEN >= 128")
+  (VNx4QI "TARGET_MIN_VLEN == 32")
+  (VNx8QI "TARGET_MIN_VLEN <= 64")
+  (VNx16QI "TARGET_MIN_VLEN <= 128")
+  (VNx32QI "TARGET_MIN_VLEN <= 256")
+  (VNx64QI "TARGET_MIN_VLEN >= 64 && TARGET_MIN_VLEN <= 512")
+  (VNx128QI "TARGET_MIN_VLEN >= 128 && TARGET_MIN_VLEN <= 1024")
+  (VNx2HI "TARGET_MIN_VLEN == 32")
+  (VNx4HI "TARGET_MIN_VLEN <= 64")
+  (VNx8HI "TARGET_MIN_VLEN <= 128")
+  (VNx16HI "TARGET_MIN_VLEN <= 256")
+  (VNx32HI "TARGET_MIN_VLEN >= 64 && TARGET_MIN_VLEN <= 512")
+  (VNx64HI "TARGET_MIN_VLEN >= 128 && TARGET_MIN_VLEN <= 1024")
+  (VNx1SI "TARGET_MIN_VLEN == 32")
+  (VNx2SI "TARGET_MIN_VLEN <= 64")
+  (VNx4SI "TARGET_MIN_VLEN <= 128")
+  (VNx8SI "TARGET_MIN_VLEN <= 256")
+  (VNx16SI "TARGET_MIN_VLEN >= 64 && TARGET_MIN_VLEN <= 512")
+  (VNx32SI "TARGET_MIN_VLEN >= 128 && TARGET_MIN_VLEN <= 1024")
+  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN == 64")
+  (VNx2DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN <= 128")
+  (VNx4DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN <= 256")
+  (VNx8DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN <= 512")
+  (VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128
+&& TARGET_MIN_VLEN <= 1024")
 
   (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
   (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
-  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 64")
   (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
 
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN == 32")
-  (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
-  (VNx4SF "TARGET_VECTOR_ELEN_FP_32")
-  (VNx8SF "TARGET_VECTOR_ELEN_FP_32")
-  (VNx16SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32")
-  (VNx32SF "TARGET_

Re: Re: [PATCH] RISC-V: Fix V_WHOLE && V_FRACT iterator requirement

2023-06-09 Thread 钟居哲
OK.  If you have already done this, please go ahead.
I think it shouldn't be part of the vec_set patch.
Instead, it should obviously be a separate patch.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-09 22:37
To: juzhe.zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; palmer; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Fix V_WHOLE && V_FRACT iterator requirement
On 6/9/23 16:32, juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong 
> 
> This patch fixes the requirement of V_WHOLE and V_FRACT.
> E.g. VNx8QI in V_WHOLE has no requirement which is incorrect.
>  Actually, VNx8QI should be whole(full) mode when TARGET_MIN_VLEN < 128
>  since when TARGET_MIN_VLEN == 128, VNx8QI is e8mf2 which is fractional
>  vector.
> 
> gcc/ChangeLog:
> 
> * config/riscv/vector-iterators.md: Fix requirement.
 
I actually have the attached already on my local tree (as well as a test),
and wanted to post it with the vec_set patch.  I think the alignment helps
a bit with readability.
 
From 147a459dfbf1fe9d5dd93148f475f42dee3bd94b Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Tue, 6 Jun 2023 17:29:26 +0200
Subject: [PATCH] RISC-V: Change V_WHOLE iterator to properly match
instruction.
 
Currently we emit e.g. an vl1r.v even when loading a mode whose size is
smaller than the hardware vector size.  This can happen when reload
decides to switch to another alternative.
 
This patch fixes the iterator and adds a testcase for the problem.
 
gcc/ChangeLog:
 
* config/riscv/vector-iterators.md: Add guards for modes smaller
than the hardware vector size.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/vls-vlmax/full-vec-move1.c: New test.
---
gcc/config/riscv/vector-iterators.md  | 65 ++-
.../rvv/autovec/vls-vlmax/full-vec-move1.c| 23 +++
2 files changed, 72 insertions(+), 16 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/full-vec-move1.c
 
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 90743ed76c5..0587325e82c 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -430,32 +430,65 @@ (define_mode_iterator VNX64_QHI [
   VNx64QI (VNx64HI "TARGET_MIN_VLEN >= 128")
])
+;; This iterator describes which modes can be moved/loaded/stored by
+;; full-register move instructions (e.g. vl1r.v).
+;; For now we support a maximum vector length of 1024 that can
+;; also be reached by combining multiple hardware registers (mf1, mf2, ...).
+;; This means that e.g. VNx64HI (with a size of 128 bytes) requires
+;; at least a minimum vector length of 128 bits = 16 bytes in order
+;; to be loadable by vl8r.v (mf8).
+;; Apart from that we must make sure that modes smaller than the
+;; vector size are properly guarded so that e.g. VNx4HI is not loaded
+;; by vl1r.v when VL == 128.
(define_mode_iterator V_WHOLE [
-  (VNx4QI "TARGET_MIN_VLEN == 32") VNx8QI VNx16QI VNx32QI (VNx64QI 
"TARGET_MIN_VLEN > 32") (VNx128QI "TARGET_MIN_VLEN >= 128")
-  (VNx2HI "TARGET_MIN_VLEN == 32") VNx4HI VNx8HI VNx16HI (VNx32HI 
"TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
-  (VNx1SI "TARGET_MIN_VLEN == 32") VNx2SI VNx4SI VNx8SI (VNx16SI 
"TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
-  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64")
-  (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
"TARGET_MIN_VLEN >= 128")
+  (VNx4QI "TARGET_MIN_VLEN == 32")
+  (VNx8QI "TARGET_MIN_VLEN <= 64")
+  (VNx16QI "TARGET_MIN_VLEN <= 128")
+  (VNx32QI "TARGET_MIN_VLEN <= 256")
+  (VNx64QI "TARGET_MIN_VLEN >= 64 && TARGET_MIN_VLEN <= 512")
+  (VNx128QI "TARGET_MIN_VLEN >= 128 && TARGET_MIN_VLEN <= 1024")
+  (VNx2HI "TARGET_MIN_VLEN == 32")
+  (VNx4HI "TARGET_MIN_VLEN <= 64")
+  (VNx8HI "TARGET_MIN_VLEN <= 128")
+  (VNx16HI "TARGET_MIN_VLEN <= 256")
+  (VNx32HI "TARGET_MIN_VLEN >= 64 && TARGET_MIN_VLEN <= 512")
+  (VNx64HI "TARGET_MIN_VLEN >= 128 && TARGET_MIN_VLEN <= 1024")
+  (VNx1SI "TARGET_MIN_VLEN == 32")
+  (VNx2SI "TARGET_MIN_VLEN <= 64")
+  (VNx4SI "TARGET_MIN_VLEN <= 128")
+  (VNx8SI "TARGET_MIN_VLEN <= 256")
+  (VNx16SI "TARGET_MIN_VLEN >= 64 && TARGET_MIN_VLEN <= 512")
+  (VNx32SI "TARGET_MIN_VLEN >= 128 && TARGET_MIN_VLEN <= 1024")
+  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN == 64")
+  (VNx2DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN <= 128")
+  (VNx4DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN <= 256")
+  (VNx8DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN <= 512")
+  (VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128
+&& TARGET_MIN_VLEN <= 1024")
   (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
   (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
-  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 64")
   (VNx64

Re: [PATCH V6] VECT: Add SELECT_VL support

2023-06-09 Thread Jeff Law via Gcc-patches




On 6/9/23 05:32, juzhe.zh...@rivai.ai wrote:

Thanks a lot Richi.

Even though last time Richard said there was no need to wait for a 2nd ACK,
I still want to wait for Richard's final approval since I am not sure this
patch is OK with him.
If Richard had asked you to wait for Richi and you've done updates based 
on Richi's feedback, then it becomes a judgment call -- if the changes 
are significant, then we might want Richard to take another look.  If 
the changes are minor, then getting another ACK isn't necessary.


It's not always clear what the best path forward ought to be and in 
cases where it isn't clear, a bit of caution is appreciated.


So to give some clear guidance.  Based on my understanding both Richard 
and Richi are basically on board with what you've implemented.  So how 
about this, if Richard hasn't chimed in by the start of your day on 
Tuesday, go ahead with the patch.  That gives Richard the rest of today 
and his Monday if there's something he wants to comment on.


jeff


Re: [PATCH] RISC-V: Fix V_WHOLE && V_FRACT iterator requirement

2023-06-09 Thread Robin Dapp via Gcc-patches
> I think it shouldn't be with vec_set patch.
> Instead, it obviously should be the separate patch.

Yes, I didn't mean in the actual same patch.

Regards
 Robin



Re: Re: [PATCH V6] VECT: Add SELECT_VL support

2023-06-09 Thread 钟居哲
Ok. Thanks Jeff.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-09 22:42
To: juzhe.zh...@rivai.ai; rguenther
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH V6] VECT: Add SELECT_VL support
 
 
On 6/9/23 05:32, juzhe.zh...@rivai.ai wrote:
> Thanks a lot Richi.
> 
> Even though last time Richard asked me no need to wait for 2nd ACK,
> I am still want to wait for Richard final approval since I am not sure this 
> patch is ok for him.
If Richard had asked you to wait for Richi and you've done updates based 
on Richi's feedback, then it becomes a judgment call -- if the changes 
are significant, then we might want Richard to take another look.  If 
the changes are minor, then getting another ACK isn't necessary.
 
It's not always clear what the best path forward ought to be and in 
cases where it isn't clear, a bit of caution is appreciated.
 
So to give some clear guidance.  Based on my understanding both Richard 
and Richi are basically on board with what you've implemented.  So how 
about this, if Richard hasn't chimed in by the start of your day on 
Tuesday, go ahead with the patch.  That gives Richard the rest of today 
and his Monday if there's something he wants to comment on.
 
jeff
 


Re: [PATCH] RISC-V: Fix V_WHOLE && V_FRACT iterator requirement

2023-06-09 Thread Jeff Law via Gcc-patches




On 6/9/23 08:37, Robin Dapp wrote:

On 6/9/23 16:32, juzhe.zh...@rivai.ai wrote:

From: Juzhe-Zhong 

This patch fixes the requirement of V_WHOLE and V_FRACT.
E.g. VNx8QI in V_WHOLE has no requirement which is incorrect.
  Actually, VNx8QI should be whole(full) mode when TARGET_MIN_VLEN < 128
  since when TARGET_MIN_VLEN == 128, VNx8QI is e8mf2 which is fractional
  vector.

gcc/ChangeLog:

 * config/riscv/vector-iterators.md: Fix requirement.


I actually have the attached already on my local tree (as well as a test),
and wanted to post it with the vec_set patch.  I think the alignment helps
a bit with readability.

 From 147a459dfbf1fe9d5dd93148f475f42dee3bd94b Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Tue, 6 Jun 2023 17:29:26 +0200
Subject: [PATCH] RISC-V: Change V_WHOLE iterator to properly match
  instruction.

Currently we emit e.g. an vl1r.v even when loading a mode whose size is
smaller than the hardware vector size.  This can happen when reload
decides to switch to another alternative.

This patch fixes the iterator and adds a testcase for the problem.

gcc/ChangeLog:

* config/riscv/vector-iterators.md: Add guards for modes smaller
than the hardware vector size.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/full-vec-move1.c: New test.
Sounds like Juzhe is OK with this moving independently.  So I'll rubber 
stamp it. :-)


jeff


Re: Re: [PATCH V2] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS

2023-06-09 Thread 钟居哲
Thanks Jeff.
Actually, the RTL_SSA framework is a very useful tool, very similar to the
SDNode framework of LLVM, which is the framework I am familiar with.  I just
realized that the 2nd build of RTL_SSA causes bugs; that's why I changed it
to data-flow.

I will address all comments and send V3 soon.

Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-09 22:33
To: juzhe.zhong; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc; pan2.li
Subject: Re: [PATCH V2] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS
 
 
On 6/9/23 04:41, juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong 
> 
> This patch is to rework Phase 5 && Phase 6 of VSETVL PASS since Phase 5 && 
> Phase 6
> are quite messy and cause some bugs discovered by my downstream 
> auto-vectorization
> test-generator.
> 
> Before this patch.
> 
> Phase 5 is cleanup_insns is the function remove AVL operand dependency from 
> each RVV instruction.
> E.g. vadd.vv (use a5), after Phase 5, > vadd.vv (use const_int 0). Since 
> "a5" is used in "vsetvl" instructions and
> after the correct "vsetvl" instructions are inserted, each RVV instruction 
> doesn't need AVL operand "a5" anymore. Then,
> we remove this operand dependency helps for the following scheduling PASS.
Right.  Removal of the unused operand gives the scheduler more freedom. 
It's not clear yet how much gain there is for scheduling vector on RV, 
but there's no good reason to handcuff it with unnecessary dependencies.
 
 
> 
> Phase 6 is propagate_avl do the following 2 things:
> 1. Local && Global user vsetvl instructions optimization.
> E.g.
>vsetvli a2, a2, e8, mf8   ==> Change it into vsetvli a2, a2, e32, 
> mf2
>vsetvli zero,a2, e32, mf2  ==> eliminate
Always good to eliminate more instructions.   So while vsetvl is 
designed to be minimal overhead and it's fully expected that we'll see a 
lot of them, there's no good reason to have unnnecessary ones in the stream.
 
 
> 2. Optimize user vsetvl from "vsetvl a2,a2" into "vsetvl zero,a2" if "a2" is 
> not used by any instructions.
> Since from Phase 1 ~ Phase 4 which inserts "vsetvli" instructions base on LCM 
> which change the CFG, I re-new a new
> RTL_SSA framework (which is more expensive than just using DF) for Phase 6 
> and optmize user vsetvli base on the new RTL_SSA.
This one isn't as clear cut, but I still think it's the right thing to 
do.  The first form explicitly kills the value in a2 while the second 
does not.  Though if the value is dead it's going to be discoverable by 
DF and we should also end up with REG_DEAD note as well.   It does have 
the advantage that it does not open a new live range.
 
> 
> There are 2 issues in Phase 5 && Phase 6:
> 1. local_eliminate_vsetvl_insn was introduced by @kito which can do better 
> local user vsetvl optimizations better than
> Phase 6 do, such approach doesn't need to re-new the RTL_SSA framework. 
> So the local user vsetvli instructions optimizaiton
> in Phase 6 is redundant and should be removed.
> 2. A bug discovered by my downstream auto-vectorization test-generator (I 
> can't put the test in this patch since we are missing autovec
> patterns for it so we can't use the upstream GCC directly reproduce such 
> issue but I will remember put it back after I support the
> necessary autovec patterns). Such bug is causing by using RTL_SSA re-new 
> framework. The issue description is this:
Note that you could potentially go ahead and submit that test and just 
xfail it.  Not a requirement, but a possibility that I sometimes use if 
I know I've got a fix coming shortly.
 
 
> 
> Before Phase 6:
> ...
> insn1: vsetlvi a3, 17 <== generated by SELECT_VL auto-vec pattern.
> slli a4,a3,3
> ...
> insn2: vsetvli zero, a3, ...
> load (use const_int 0, before Phase 5, it's using a3, but the use of "a3" 
> is removed in Phase 5)
> ...
> 
> In Phase 6, we iterate to insn2, then get the def of "a3" which is the insn1.
> insn2 is the vsetvli instruction inserted in Phase 4 which is not included in 
> the RLT_SSA framework
> even though we renew it (I didn't take a look at it and I don't think we need 
> to now).
> Base on this situation, the def_info of insn2 has the information 
> "set->single_nondebug_insn_use ()"
> which return true. Obviously, this information is not correct, since insn1 
> has aleast 2 uses:
> 1). slli a4,a3,3 2).insn2: vsetvli zero, a3, ... Then, the test generated by 
> my downstream test-generator
> execution test failed.
Understood.
 
> 
> Conclusion of RTL_SSA framework:
> Before this patch, we initialize RTL_SSA 2 times. One is at the beginning of 
> the VSETVL PASS, which is absolutely correct; the other
> is re-initialized after Phase 4 (LCM) and has incorrect information that causes bugs.
> 
> Besides, we don't like to initialize RTL_SSA a second time; it seems to be a 
> waste since we just need to do a little optimization.
> 
> Base on all circumstances I described above, I rework and r

Re: Re: [PATCH V2] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS

2023-06-09 Thread 钟居哲
>> I'd probably adjust the name as well.  There's an important exception to 
>> returning the first vsetvl -- you stop the search if you encounter a 
>> user RVV instruction.
Could you give me a function name for this?
Like:
get_first_vsetvl_prior_all_rvv_insns
Is it ok? But I think the name is too long.


juzhe.zh...@rivai.ai
 

Re: [PATCH V2] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS

2023-06-09 Thread Jeff Law via Gcc-patches




On 6/9/23 08:58, 钟居哲 wrote:
I'd probably adjust the name as well.  There's an important exception to 
returning the first vsetvl -- you stop the search if you encounter a
user RVV instruction.

Could you give me a function name for this?
Like:
get_first_vsetvl_prior_all_rvv_insns
Is it ok? But I think the name is too long.
get_first_vsetvl_before_rvv_insns?  It's a bit smaller and I think 
captures the key exception -- does that work for you?


Jeff


Re: [PATCH] simplify-rtx: Implement constant folding of SS_TRUNCATE, US_TRUNCATE

2023-06-09 Thread Jeff Law via Gcc-patches




On 6/8/23 08:56, Kyrylo Tkachov via Gcc-patches wrote:

Hi all,

This patch implements RTL constant-folding for the SS_TRUNCATE and US_TRUNCATE 
codes.
The semantics are a clamping operation on the argument with the min and max of 
the narrow mode,
followed by a truncation. The signedness of the clamp and the min/max extrema 
is derived from
the signedness of the saturating operation.
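
For instance, SS_TRUNCATE from SImode to QImode clamps to [-128, 127] and
US_TRUNCATE clamps to [0, 255] before narrowing; as a rough C sketch of the
signed case (illustrative only, not the patch's implementation):

  signed char ss_truncate_qi (int x)
  {
    if (x > 127) return 127;     /* clamp to the QImode maximum */
    if (x < -128) return -128;   /* clamp to the QImode minimum */
    return (signed char) x;      /* then truncate */
  }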

We have a number of instructions in aarch64 that use SS_TRUNCATE and 
US_TRUNCATE to represent
their operations and we have pretty thorough runtime tests in 
gcc.target/aarch64/advsimd-intrinsics/vqmovn*.c.
With this patch the instructions are folded away at optimisation levels and the 
correctness checks still
pass.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Ok for trunk?

Thanks,
Kyrill

gcc/ChangeLog:

* simplify-rtx.cc (simplify_const_unary_operation):
Handle US_TRUNCATE, SS_TRUNCATE.

OK.
jeff


[PATCH] c++: diagnostic ICE b/c of empty TPARMS_PRIMARY_TEMPLATE [PR109655]

2023-06-09 Thread Patrick Palka via Gcc-patches
When defining a previously declared class template, we neglect to set
TPARMS_PRIMARY_TEMPLATE for the in-scope template parameters, which the
class members go on to inherit, and so the members' DECL_TEMPLATE_PARMS
will have empty TPARMS_PRIMARY_TEMPLATE at those levels as well.  This
causes us to crash when diagnosing a constraint mismatch for an
out-of-line declaration of a member of a constrained class template.

This patch fixes this by walking the context to get at the corresponding
primary template instead.  I spent a while trying to get us to set
TPARMS_PRIMARY_TEMPLATE for templated class definitions that are
redeclarations, but it proved to be hairy in particular for partial
specializations and nested templates.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/13?

PR c++/109655

gcc/cp/ChangeLog:

* pt.cc (push_template_decl): Handle TPARMS_PRIMARY_TEMPLATE
being empty when diagnosing a constraint mismatch for an
enclosing template scope.  Don't bother checking constraints
if DECL_PARMS and SCOPE_PARMS are the same.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-class6.C: New test.
* g++.dg/cpp2a/concepts-class6a.C: New test.
---
 gcc/cp/pt.cc  | 19 +++--
 gcc/testsuite/g++.dg/cpp2a/concepts-class6.C  | 30 ++
 gcc/testsuite/g++.dg/cpp2a/concepts-class6a.C | 40 +++
 3 files changed, 86 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-class6.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-class6a.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 17bf4d24151..f913b248345 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -6155,12 +6155,25 @@ push_template_decl (tree decl, bool is_friend)
  decl_parms = TREE_CHAIN (decl_parms);
  scope_parms = TREE_CHAIN (scope_parms);
}
- while (decl_parms)
+ while (decl_parms && decl_parms != scope_parms)
{
  if (!template_requirements_equivalent_p (decl_parms, scope_parms))
{
- error ("redeclaration of %qD with different constraints",
-TPARMS_PRIMARY_TEMPLATE (TREE_VALUE (decl_parms)));
+ tree td = TPARMS_PRIMARY_TEMPLATE (TREE_VALUE (decl_parms));
+ if (!td)
+   {
+ /* FIXME: TPARMS_PRIMARY_TEMPLATE doesn't always get
+set for enclosing template scopes.  Work around
+this by walking the context to obtain the relevant
+(primary) template whose constraints we mismatch.  */
+ int level = TMPL_PARMS_DEPTH (decl_parms);
+ td = TYPE_TI_TEMPLATE (ctx);
+ while (!PRIMARY_TEMPLATE_P (td)
+|| (TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (td))
+!= level))
+   td = TYPE_TI_TEMPLATE (DECL_CONTEXT (td));
+   }
+ error ("redeclaration of %qD with different constraints", td);
  break;
}
  decl_parms = TREE_CHAIN (decl_parms);
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-class6.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-class6.C
new file mode 100644
index 000..dcef6a2c9d4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-class6.C
@@ -0,0 +1,30 @@
+// PR c++/109655
+// { dg-do compile { target c++20 } }
+
+class C {
+  template
+  requires true
+  friend class D;
+
+  template
+  requires true
+  class E;
+};
+
+template
+requires true
+class D {
+  void f();
+};
+
+template
+void D::f() { } // { dg-error "class D' with different constraints" }
+
+template
+requires true
+class C::E {
+  void f();
+};
+
+template
+void C::E::f() { } // { dg-error "class C::E' with different constraints" }
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-class6a.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-class6a.C
new file mode 100644
index 000..751d13cdf6c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-class6a.C
@@ -0,0 +1,40 @@
+// PR c++/109655
+// { dg-do compile { target c++20 } }
+
+template
+requires true
+class C {
+  class D;
+
+  template
+  requires (!!true)
+  class E;
+};
+
+template
+requires true
+class C::D {
+  void f();
+};
+
+template  // missing "requires true"
+void C::D::f() { } // { dg-error "class C' with different constraints" }
+
+template
+requires true
+template
+requires (!!true)
+class C::E {
+  void f();
+  void g();
+};
+
+template
+requires true
+template
+void C::E::f() { } // { dg-error "class C::E' with different 
constraints" }
+
+template
+template
+requires (!!true)
+void C::E::g() { } // { dg-error "class C' with different constraints" }
-- 
2.41.0.rc1.10.g9e49351c30



[pushed] c++: init-list of uncopyable type [PR110102]

2023-06-09 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

The maybe_init_list_as_range optimization is a form of copy elision, but we
can only elide well-formed copies.
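
Concretely, the problem case is an init-list whose element type cannot be
copied, as in the new test below.  A sketch, with the template argument that
the archive drops restored as std::list<A> (deliberately ill-formed code,
which must still be diagnosed rather than miscompiled):

  #include <list>

  struct A {
    A(int) {}
    A(const A&) = delete;  // copying from the initializer_list is ill-formed,
    A(A&&) {}              // so the copy-elision optimization must not fire
  };

  int main() { std::list<A> v = {1, 2, 3}; }  // expected to be rejected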

PR c++/110102

gcc/cp/ChangeLog:

* call.cc (maybe_init_list_as_array): Check that the element type is
copyable.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist-opt1.C: New test.
---
 gcc/cp/call.cc |  8 
 gcc/testsuite/g++.dg/cpp0x/initlist-opt1.C | 15 +++
 2 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/initlist-opt1.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index d6154f1a319..354773f00c6 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -4272,6 +4272,14 @@ maybe_init_list_as_array (tree elttype, tree init)
   if (has_non_trivial_temporaries (first))
 return NULL_TREE;
 
+  /* We can't do this if copying from the initializer_list would be
+ ill-formed.  */
+  tree copy_argtypes = make_tree_vec (1);
+  TREE_VEC_ELT (copy_argtypes, 0)
+= cp_build_qualified_type (elttype, TYPE_QUAL_CONST);
+  if (!is_xible (INIT_EXPR, elttype, copy_argtypes))
+return NULL_TREE;
+
   init_elttype = cp_build_qualified_type (init_elttype, TYPE_QUAL_CONST);
   tree arr = build_array_of_n_type (init_elttype, CONSTRUCTOR_NELTS (init));
   arr = finish_compound_literal (arr, init, tf_none);
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist-opt1.C 
b/gcc/testsuite/g++.dg/cpp0x/initlist-opt1.C
new file mode 100644
index 000..56de4bc0092
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist-opt1.C
@@ -0,0 +1,15 @@
+// PR c++/110102
+// { dg-do compile { target c++11 } }
+
+// { dg-error "deleted|construct_at" "" { target *-*-* } 0 }
+
+#include 
+
+struct A {
+  A(int) {}
+  A(const A&) = delete;// { dg-message "declared here" }
+  A(A&&) {}
+};
+int main() {
+  std::list v = {1,2,3};
+}

base-commit: 3e12669a0eb968cfcbe9242b382fd8020935edf8
-- 
2.31.1



[pushed] c++: diagnose auto in template arg

2023-06-09 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

We were failing to diagnose this Concepts TS feature that didn't make it
into C++20 because the 'auto' was getting converted to a template parameter
before we checked for it.  So also check in cp_parser_simple_type_specifier.
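
The construct in question looks like this (a sketch, with the template
argument that the archive strips restored for clarity):

  template <class T> struct A { };
  void f (A<auto> a);   // 'auto' as a template argument: accepted only with
                        // -fconcepts-ts, pedwarned otherwise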

The code in cp_parser_template_type_arg that I initially expected to
diagnose this seems unreachable because cp_parser_type_id_1 already checks
auto.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_simple_type_specifier): Check for auto
in template argument.
(cp_parser_template_type_arg): Remove auto checking.

gcc/testsuite/ChangeLog:

* g++.dg/concepts/auto7.C: New test.
* g++.dg/concepts/auto7a.C: New test.
---
 gcc/cp/parser.cc   | 17 -
 gcc/testsuite/g++.dg/concepts/auto7.C  |  9 +
 gcc/testsuite/g++.dg/concepts/auto7a.C |  8 
 3 files changed, 25 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/concepts/auto7.C
 create mode 100644 gcc/testsuite/g++.dg/concepts/auto7a.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index d77fbd20e56..09cba713437 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -19823,15 +19823,19 @@ cp_parser_simple_type_specifier (cp_parser* parser,
 "only available with "
 "%<-std=c++14%> or %<-std=gnu++14%>");
}
+ else if (!flag_concepts_ts && parser->in_template_argument_list_p)
+   pedwarn (token->location, 0,
+"use of % in template argument "
+"only available with %<-fconcepts-ts%>");
+ else if (!flag_concepts)
+   pedwarn (token->location, 0,
+"use of % in parameter declaration "
+"only available with %<-std=c++20%> or %<-fconcepts%>");
  else if (cxx_dialect < cxx14)
error_at (token->location,
 "use of % in parameter declaration "
 "only available with "
 "%<-std=c++14%> or %<-std=gnu++14%>");
- else if (!flag_concepts)
-   pedwarn (token->location, 0,
-"use of % in parameter declaration "
-"only available with %<-std=c++20%> or %<-fconcepts%>");
}
   else
type = make_auto ();
@@ -24522,11 +24526,6 @@ cp_parser_template_type_arg (cp_parser *parser)
 = G_("types may not be defined in template arguments");
   r = cp_parser_type_id_1 (parser, CP_PARSER_FLAGS_NONE, true, false, NULL);
   parser->type_definition_forbidden_message = saved_message;
-  if (cxx_dialect >= cxx14 && !flag_concepts && type_uses_auto (r))
-{
-  error ("invalid use of % in template argument");
-  r = error_mark_node;
-}
   return r;
 }
 
diff --git a/gcc/testsuite/g++.dg/concepts/auto7.C 
b/gcc/testsuite/g++.dg/concepts/auto7.C
new file mode 100644
index 000..3cbf5dd8dfc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/concepts/auto7.C
@@ -0,0 +1,9 @@
+// { dg-do compile { target c++14 } }
+// { dg-additional-options -fconcepts-ts }
+
+template  struct A { };
+void f(A a) { }
+int main()
+{
+  f(A());
+}
diff --git a/gcc/testsuite/g++.dg/concepts/auto7a.C 
b/gcc/testsuite/g++.dg/concepts/auto7a.C
new file mode 100644
index 000..88868f45d1c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/concepts/auto7a.C
@@ -0,0 +1,8 @@
+// { dg-do compile { target c++14 } }
+
+template  struct A { };
+void f(A a) { }  // { dg-error "auto. in template argument" }
+int main()
+{
+  f(A());
+}

base-commit: 3e12669a0eb968cfcbe9242b382fd8020935edf8
-- 
2.31.1



[pushed] c++: fix 32-bit spaceship failures [PR110185]

2023-06-09 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Various spaceship tests failed after r14-1624.  This turned out to be
because the comparison category classes return in memory on 32-bit targets,
and the synthesized operator<=> looks something like

if (auto v = a.x <=> b.x, v == 0); else return v;
if (auto v = a.y <=> b.y, v == 0); else return v;
etc.

so check_return_expr was trying to do NRVO for all the 'v' variables, and
now on subsequent returns we check to see if the previous NRV is still in
scope.  But the NRVs didn't have names, so looking up name bindings crashed.
Fixed both by giving 'v' a name so we can NRVO the first one, and fixing the
test to give up if the old NRV has no name.
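
For reference, a minimal class of the kind involved (illustrative, not the
testcase from the PR):

  #include <compare>

  struct P
  {
    int x, y;
    // The synthesized body chains member <=> comparisons as sketched above,
    // and std::strong_ordering is returned in memory on 32-bit targets.
    std::strong_ordering operator<=> (const P&) const = default;
  };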

PR c++/110185
PR c++/58487

gcc/cp/ChangeLog:

* method.cc (build_comparison_op): Give retval a name.
* typeck.cc (check_return_expr): Fix for nameless variables.
---
 gcc/cp/method.cc | 1 +
 gcc/cp/typeck.cc | 1 +
 2 files changed, 2 insertions(+)

diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc
index 0c2ca9e4f41..91cf943f110 100644
--- a/gcc/cp/method.cc
+++ b/gcc/cp/method.cc
@@ -1679,6 +1679,7 @@ build_comparison_op (tree fndecl, bool defining, 
tsubst_flags_t complain)
  if (defining)
{
  tree var = create_temporary_var (rettype);
+ DECL_NAME (var) = get_identifier ("retval");
  pushdecl (var);
  cp_finish_decl (var, comp, false, NULL_TREE, flags);
  comp = retval = var;
diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 11927cbdf83..da591dafc8f 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -11174,6 +11174,7 @@ check_return_expr (tree retval, bool *no_warning)
current_function_return_value = bare_retval;
   else if (current_function_return_value
   && VAR_P (current_function_return_value)
+  && DECL_NAME (current_function_return_value)
   && !decl_in_scope_p (current_function_return_value))
{
  /* The earlier NRV is out of scope at this point, so it's safe to

base-commit: 3e12669a0eb968cfcbe9242b382fd8020935edf8
prerequisite-patch-id: c69d648e8ed235e699b6a36b3cfbc031dc37fca0
prerequisite-patch-id: f01eb83e6402e182a0c9bd6202e4ed6db7529733
-- 
2.31.1



Splitting up 27_io/basic_istream/ignore/wchar_t/94749.cc (takes too long)

2023-06-09 Thread Hans-Peter Nilsson via Gcc-patches
Hi!

The test 27_io/basic_istream/ignore/wchar_t/94749.cc takes
about 10 minutes to run for cris-elf in the "gdb simulator"
here on my arguably way-past-retirement machine (and it
looks like it gained a minute with LRA).  I've seen it
timing out every now and then on busy days with load >
`nproc`.  Usually it happens some time after I've forgot
about why. :)

It has had some performance surgery before (pruning for
simulators, doubling timeout for ilp32).  I'd probably just
try cutting along the function boundaries and keep those
parts separate that have >1 min execution time.

Anyway, your thoughts on the matter would be appreciated.

brgds, H-P


[COMMITTED] Relocate range_cast to header, and add a generic version.

2023-06-09 Thread Andrew MacLeod via Gcc-patches

This patch moves range_cast into the header file and makes it inlinable.

 I also added a trap so that if you try to cast into an unsupported 
type, it traps.  It can't return a value of the correct type, so the 
caller needs to be doing something else...


Such as using the new variant of range_cast provided here which uses a 
Value_Range.  This is the malleable range type and it first sets the 
type appropriately.   This will also work for unsupported types, and 
will assist with things like  float to int casts and vice versa.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From de03afe3168db7e2eb2a594293c846188a1b5be8 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 31 May 2023 17:02:00 -0400
Subject: [PATCH 1/2] Relocate range_cast to header, and add a generic version.

Make range_cast inlinable by moving it to the header file.
Also trap if the destination is not capable of representing the cast type.
Add a generic version which can change range classes.. ie float to int.

	* range-op.cc (range_cast): Move to...
	* range-op.h (range_cast): Here and add generic a version.
---
 gcc/range-op.cc | 18 --
 gcc/range-op.h  | 44 +++-
 2 files changed, 43 insertions(+), 19 deletions(-)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 4d122de3026..44a95b20ffa 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -4929,24 +4929,6 @@ pointer_table::pointer_table ()
   set (BIT_XOR_EXPR, op_bitwise_xor);
 }
 
-// Cast the range in R to TYPE.
-
-bool
-range_cast (vrange &r, tree type)
-{
-  Value_Range tmp (r);
-  Value_Range varying (type);
-  varying.set_varying (type);
-  range_op_handler op (CONVERT_EXPR, type);
-  // Call op_convert, if it fails, the result is varying.
-  if (!op || !op.fold_range (r, type, tmp, varying))
-{
-  r.set_varying (type);
-  return false;
-}
-  return true;
-}
-
 #if CHECKING_P
 #include "selftest.h"
 
diff --git a/gcc/range-op.h b/gcc/range-op.h
index 7af58736c3f..2abec3299ef 100644
--- a/gcc/range-op.h
+++ b/gcc/range-op.h
@@ -216,7 +216,49 @@ protected:
   range_operator *m_operator;
 };
 
-extern bool range_cast (vrange &, tree type);
+// Cast the range in R to TYPE if R supports TYPE.
+
+inline bool
+range_cast (vrange &r, tree type)
+{
+  gcc_checking_assert (r.supports_type_p (type));
+  Value_Range tmp (r);
+  Value_Range varying (type);
+  varying.set_varying (type);
+  range_op_handler op (CONVERT_EXPR, type);
+  // Call op_convert, if it fails, the result is varying.
+  if (!op || !op.fold_range (r, type, tmp, varying))
+{
+  r.set_varying (type);
+  return false;
+}
+  return true;
+}
+
+// Range cast which is capable of switching range kinds.
+// ie for float to int.
+
+inline bool
+range_cast (Value_Range &r, tree type)
+{
+  Value_Range tmp (r);
+  Value_Range varying (type);
+  varying.set_varying (type);
+
+  // Ensure we are in the correct mode for the call to fold.
+  r.set_type (type);
+
+  range_op_handler op (CONVERT_EXPR, type);
+  // Call op_convert, if it fails, the result is varying.
+  if (!op || !op.fold_range (r, type, tmp, varying))
+{
+  r.set_varying (type);
+  return false;
+}
+  return true;
+}
+
+
 extern void wi_set_zero_nonzero_bits (tree type,
   const wide_int &, const wide_int &,
   wide_int &maybe_nonzero,
-- 
2.40.1



[COMMITTED] PR ipa/109886 - Also check type being cast to

2023-06-09 Thread Andrew MacLeod via Gcc-patches
before casting into an irange, make sure the type being cast into is 
also supported by irange.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From 6314d76cf87df92a0f7d0fdd48240283e667998a Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Fri, 9 Jun 2023 10:17:59 -0400
Subject: [PATCH 2/2] Also check type being cast to

before casting into an irange, make sure the type being cast into
is also supported.

	PR ipa/109886
	* ipa-prop.cc (ipa_compute_jump_functions_for_edge): Check param
	type as well.
---
 gcc/ipa-prop.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index ab6de9f10da..4e9a307ad4d 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -2405,6 +2405,7 @@ ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi,
 		 of this file uses value_range's, which only hold
 		 integers and pointers.  */
 	  && irange::supports_p (TREE_TYPE (arg))
+	  && irange::supports_p (param_type)
 	  && get_range_query (cfun)->range_of_expr (vr, arg)
 	  && !vr.undefined_p ())
 	{
-- 
2.40.1



Re: Splitting up 27_io/basic_istream/ignore/wchar_t/94749.cc (takes too long)

2023-06-09 Thread Mike Stump via Gcc-patches
On Jun 9, 2023, at 9:20 AM, Hans-Peter Nilsson via Gcc-patches 
 wrote:
> 
> The test 27_io/basic_istream/ignore/wchar_t/94749.cc takes
> about 10 minutes to run for cris-elf in the "gdb simulator"

I'd let the libstdc++ people comment on specific things.  I'll comment on 
general things.  We could let line count (or word count or character count) 
scale the timeout in part, we could record times in a db and put an expected 
run time into test cases or in a db kept alongside. We could have factors for slow 
systems, slow simulators. A 5 GHz x86_64 will likely be faster than a 40 year 
old pdp11. We can have these scale factors trigger off OS, cpu statically, 
and/or we can do a quick bogomips calculation and let that scale it and record 
that scaling factor in the build tree.

A wealth of possibilities. Solutions that require maintenance or test case 
modification are annoying. Solutions that need port work are annoying. I'd be 
tempted to say put bogomips into the build (test) tree.  There are two parts: time 
to compile test cases and time to run them.  I'd be fine with a half solution 
that only does what you need.  The other part can be done by someone that has a 
need.

I'd invite comments by others on other solutions or commentary on downsides.  
For example, having a 208-thread machine that takes 2-3 minutes to run 
the full testsuite is nice. A problem arises when 4-10 test cases suddenly 
start timing out.  You then go to around 10-53 minutes to test, which is 
annoying. Anything that boosts the timeouts can hinder early port bring-up, 
which we'd like to avoid. I mention it without much of a solution other than a db 
approach in the test tree that records each test case and can identify test 
cases that time out and trim the timeout for them to something nicer like base + 
50% once they time out with a larger allotment of time.

We could entertain wild thoughts. For example, make a run cache that caches run 
results given an object. If you see an object in the future, just look it up in a hash 
cache for the object and return those results instead of running it.  This can 
give you a large speedup in testing and would simultaneously advantage all slow 
simulation ports.  Maybe a 20-100x speedup? If you want to go this way I'd say 
do it in python at the bottom as it would be nice to switch over to python in 
the next 5-20 years and away from tcl.

An object cache in python should be fairly small whether it is used for 
remembering run times from previous runs and setting a timeout based upon it, 
or as a does-it-run-and-pass or run-and-fail cache.  The caches are likely 
only part of the problem; one still needs to have a timeout when no cache entry 
is present.  They can speed testing for the day-to-day grind of people that run 
1-200 times a week.



[PATCH] MATCH: Fix zero_one_valued_p not to match signed 1 bit integers

2023-06-09 Thread Andrew Pinski via Gcc-patches
So for the attached testcase, we assumed that zero_one_valued_p would
have the value range [0,1], but currently zero_one_valued_p also matches
signed 1-bit integers.
This changes it not to match those and fixes the 2 new testcases at
all optimization levels.
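
The underlying issue in two lines (illustrative):

  struct s { signed int t : 1; };  /* only representable values: 0 and -1 */
  /* tree_nonzero_bits on s.t is 1, yet its range is [-1, 0], not [0, 1].  */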

OK for GCC 13? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/110165
PR tree-optimization/110166

gcc/ChangeLog:

* match.pd (zero_one_valued_p): Don't accept
signed 1-bit integers.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr110165-1.c: New test.
* gcc.c-torture/execute/pr110166-1.c: New test.

(cherry picked from commit 72e652f3425079259faa4edefe1dc571f72f91e0)
---
 gcc/match.pd  | 10 --
 .../gcc.c-torture/execute/pr110165-1.c| 28 
 .../gcc.c-torture/execute/pr110166-1.c| 33 +++
 3 files changed, 69 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr110165-1.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr110166-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 995ad98d823..91182448250 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1922,9 +1922,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 (match zero_one_valued_p
  @0
- (if (INTEGRAL_TYPE_P (type) && tree_nonzero_bits (@0) == 1)))
+ (if (INTEGRAL_TYPE_P (type)
+  && (TYPE_UNSIGNED (type)
+ || TYPE_PRECISION (type) > 1)
+  && tree_nonzero_bits (@0) == 1)))
 (match zero_one_valued_p
- truth_valued_p@0)
+ truth_valued_p@0
+ (if (INTEGRAL_TYPE_P (type)
+  && (TYPE_UNSIGNED (type)
+ || TYPE_PRECISION (type) > 1
 
 /* Transform { 0 or 1 } * { 0 or 1 } into { 0 or 1 } & { 0 or 1 }.  */
 (simplify
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr110165-1.c 
b/gcc/testsuite/gcc.c-torture/execute/pr110165-1.c
new file mode 100644
index 000..9521a19428e
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr110165-1.c
@@ -0,0 +1,28 @@
+struct s
+{
+  int t : 1;
+};
+
+int f(struct s t, int a, int b) __attribute__((noinline));
+int f(struct s t, int a, int b)
+{
+int bd = t.t;
+if (bd) a|=b;
+return a;
+}
+
+int main(void)
+{
+struct s t;
+for(int i = -1;i <= 1; i++)
+{
+int a = 0x10;
+int b = 0x0f;
+int c = a | b;
+   struct s t = {i};
+int r = f(t, a, b);
+int exp = (i != 0) ? a | b : a;
+if (exp != r)
+ __builtin_abort();
+}
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr110166-1.c 
b/gcc/testsuite/gcc.c-torture/execute/pr110166-1.c
new file mode 100644
index 000..f999d47fe69
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr110166-1.c
@@ -0,0 +1,33 @@
+struct s
+{
+  int t : 1;
+  int t1 : 1;
+};
+
+int f(struct s t) __attribute__((noinline));
+int f(struct s t)
+{
+   int c = t.t;
+   int d = t.t1;
+   if (c > d)
+ t.t = d;
+   else
+ t.t = c;
+  return t.t;
+}
+
+int main(void)
+{
+struct s t;
+for(int i = -1;i <= 0; i++)
+{
+  for(int j = -1;j <= 0; j++)
+  {
+   struct s t = {i, j};
+int r = f(t);
+int exp = i < j ? i : j;
+if (exp != r)
+ __builtin_abort();
+  }
+}
+}
-- 
2.31.1



Re: [PATCH V6] VECT: Add SELECT_VL support

2023-06-09 Thread Richard Sandiford via Gcc-patches
"juzhe.zh...@rivai.ai"  writes:
> Thanks, Richi.
>
> Should I wait for Richard's ACK again?
> Since the last email on this patch, he just asked me to adjust a comment, no 
> code changes.
> I am not sure whether he is ok with it.

Yeah, OK from my POV too, thanks.

Richard


Re: Splitting up 27_io/basic_istream/ignore/wchar_t/94749.cc (takes too long)

2023-06-09 Thread Hans-Peter Nilsson via Gcc-patches
> From: Mike Stump 
> Date: Fri, 9 Jun 2023 10:18:45 -0700

> On Jun 9, 2023, at 9:20 AM, Hans-Peter Nilsson via Gcc-patches 
>  wrote:
> > 
> > The test 27_io/basic_istream/ignore/wchar_t/94749.cc takes
> > about 10 minutes to run for cris-elf in the "gdb simulator"
> 
> I'd let the libstdc++ people comment on specific things.
> I'll comment on general things.  We could let line count
> (or word count or character count) scale the timeout in
> part, we could record times in a db and put an expected
> run time into test cases or in an along side db. We could
> have factors for slow systems, slow simulators. A 5 GHz
> x86_64 will likely be faster that a 40 year old pdp11. We
> can have these scale factors trigger off OS, cpu
> statically, and/or we can do a quick bogomips calculation
> and let that scale it and record that scaling factor in
> the build tree.

Wild plans, but with some points.

Beware that uniform testing IMO weighs in much heavier than
uniform test-time.  Like, arm-eabi, rv32-elf and cris-elf,
having common main factors, should test the same code and
the same number of iterations, preferably regardless of
simulator quality.  (FWIW, I consider the cris-elf gdb
simulator is *fast* - before 2021 or when built
--disable-sim-hardware.)

> A wealth of possibilities.

And complexity!

> Solutions that require maintenance or test case
> modification are annoying.

Yeah, but that maintenance annoyance unfortunately includes
initial setup.  You propose quite a major shift there.  It
sounds good, but sorry, I must settle for just editing
the test-case some way.

brgds, H-P


Re: [PATCH v3 4/6] libstdc++: use new built-in trait __is_function for std::is_function

2023-06-09 Thread Patrick Palka via Gcc-patches
On Sun, 2 Apr 2023, Ken Matsui via Gcc-patches wrote:

> This patch gets std::is_function to dispatch to new built-in trait
> __is_function.

For std::is_function and other predicate-like type traits, I think we also
want to make the corresponding variable template is_function_v directly
use the built-in too.
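
Something along these lines, next to the is_function specializations quoted
below (a sketch of what that could look like, not a committed change):

  #if __has_builtin(__is_function)
  template<typename _Tp>
    inline constexpr bool is_function_v = __is_function(_Tp);
  #endif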

> 
> libstdc++-v3/ChangeLog:
> 
>   * include/std/type_traits (is_function): Use __is_function built-in
>   trait.
> 
> Signed-off-by: Ken Matsui 
> ---
>  libstdc++-v3/include/std/type_traits | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/libstdc++-v3/include/std/type_traits 
> b/libstdc++-v3/include/std/type_traits
> index 58a732735c8..9eafd6b16f2 100644
> --- a/libstdc++-v3/include/std/type_traits
> +++ b/libstdc++-v3/include/std/type_traits
> @@ -594,6 +594,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  { };
>  
>/// is_function
> +#if __has_builtin(__is_function)
> +  template
> +struct is_function
> +: public __bool_constant<__is_function(_Tp)>
> +{ };
> +#else
>template
>  struct is_function
>  : public __bool_constant::value> { };
> @@ -605,6 +611,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>template
>  struct is_function<_Tp&&>
>  : public false_type { };
> +#endif
>  
>  #define __cpp_lib_is_null_pointer 201309L
>  
> -- 
> 2.40.0
> 
> 



Re: [PATCH] simplify-rtx: Implement constant folding of SS_TRUNCATE, US_TRUNCATE

2023-06-09 Thread Richard Sandiford via Gcc-patches
Kyrylo Tkachov via Gcc-patches  writes:
> Hi all,
>
> This patch implements RTL constant-folding for the SS_TRUNCATE and 
> US_TRUNCATE codes.
> The semantics are a clamping operation on the argument with the min and max 
> of the narrow mode,
> followed by a truncation. The signedness of the clamp and the min/max extrema 
> is derived from
> the signedness of the saturating operation.
>
> We have a number of instructions in aarch64 that use SS_TRUNCATE and 
> US_TRUNCATE to represent
> their operations and we have pretty thorough runtime tests in 
> gcc.target/aarch64/advsimd-intrinsics/vqmovn*.c.
> With this patch the instructions are folded away at optimisation levels and 
> the correctness checks still
> pass.
>
> Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> gcc/ChangeLog:
>
>   * simplify-rtx.cc (simplify_const_unary_operation):
>   Handle US_TRUNCATE, SS_TRUNCATE.
>
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index 
> 276be67aa67247dd46361ab9badc46ab089d6df0..5983a06e5a8ca89c717e8648be410024147b16e6
>  100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -2131,6 +2131,22 @@ simplify_const_unary_operation (enum rtx_code code, 
> machine_mode mode,
> result = wide_int::from (op0, width, UNSIGNED);
> break;
>  
> + case US_TRUNCATE:
> + case SS_TRUNCATE:
> +   {
> + signop sgn = code == US_TRUNCATE ? UNSIGNED : SIGNED;
> + wide_int nmax
> +   = wide_int::from (wi::max_value (width, sgn),
> + GET_MODE_PRECISION (imode), sgn);
> + wide_int nmin
> +   = wide_int::from (wi::min_value (width, sgn),
> + GET_MODE_PRECISION (imode), sgn);
> + result
> +   = wide_int::from (op0, GET_MODE_PRECISION (imode), sgn);
> + result = wi::min (wi::max (result, nmin, sgn), nmax, sgn);

FWIW, it looks like this could be:

result = wi::min (wi::max (op0, nmin, sgn), nmax, sgn);

without the first assignment to result.  That feels more natural IMO,
since no conversion is being done on op0.

Thanks,
Richard

> + result = wide_int::from (result, width, sgn);
> + break;
> +   }
>   case SIGN_EXTEND:
> result = wide_int::from (op0, width, SIGNED);
> break;


[PATCH 1/2] analyzer: Fix allocation size false positive on conjured svalue [PR109577]

2023-06-09 Thread Tim Lange
Currently, the analyzer tries to prove that the allocation size is a
multiple of the pointee's type size.  This patch reverses the behavior
to try to prove that the expression is not a multiple of the pointee's
type size.  With this change, each unhandled case should be gracefully
considered as correct.  This fixes the bug reported in PR 109577 by
Paul Eggert.
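
The kind of allocation this affects looks like the PR 110014 reproducers in
the follow-up patch, e.g. (sketch; the function name here is made up):

  #include <stdlib.h>

  long *
  trim_to_longs (long *buffer, unsigned long file_size)
  {
    /* The analyzer cannot prove this size is a multiple of sizeof (long);
       after this patch it only warns when it can prove it is *not* one.  */
    return (long *) realloc (buffer, file_size - file_size % sizeof (long));
  }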

Regression-tested on Linux x86-64 with -m32 and -m64.

2023-06-09  Tim Lange  

PR analyzer/109577

gcc/analyzer/ChangeLog:

* constraint-manager.cc (class sval_finder): Visitor to find
children in svalue trees.
(constraint_manager::sval_constrained_p): Add new function to
check whether a sval might be part of a constraint.
* constraint-manager.h: Add sval_constrained_p function.
* region-model.cc (class size_visitor): Reverse behavior to not
emit a warning on not explicitly considered cases.
(region_model::check_region_size):
Adapt to size_visitor changes.

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/allocation-size-2.c: Change expected output
and add new test case.
* gcc.dg/analyzer/pr109577.c: New test.

---
 gcc/analyzer/constraint-manager.cc| 131 ++
 gcc/analyzer/constraint-manager.h |   1 +
 gcc/analyzer/region-model.cc  |  80 ---
 .../gcc.dg/analyzer/allocation-size-2.c   |  24 ++--
 gcc/testsuite/gcc.dg/analyzer/pr109577.c  |  16 +++
 5 files changed, 194 insertions(+), 58 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr109577.c

diff --git a/gcc/analyzer/constraint-manager.cc 
b/gcc/analyzer/constraint-manager.cc
index 2c9c435527e..24cd8960098 100644
--- a/gcc/analyzer/constraint-manager.cc
+++ b/gcc/analyzer/constraint-manager.cc
@@ -2218,6 +2218,137 @@ constraint_manager::get_equiv_class_by_svalue (const 
svalue *sval,
   return false;
 }
 
+/* Tries to find a svalue inside another svalue.  */
+
+class sval_finder : public visitor
+{
+public:
+  sval_finder (const svalue *query) : m_query (query)
+  {
+  }
+
+  bool found_query_p ()
+  {
+return m_found;
+  }
+
+  void visit_region_svalue (const region_svalue *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_constant_svalue (const constant_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_unknown_svalue (const unknown_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_poisoned_svalue (const poisoned_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_setjmp_svalue (const setjmp_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_initial_svalue (const initial_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_unaryop_svalue (const unaryop_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_binop_svalue (const binop_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_sub_svalue (const sub_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_repeated_svalue (const repeated_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_bits_within_svalue (const bits_within_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_unmergeable_svalue (const unmergeable_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_placeholder_svalue (const placeholder_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_widening_svalue (const widening_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_compound_svalue (const compound_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_conjured_svalue (const conjured_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_asm_output_svalue (const asm_output_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_const_fn_result_svalue (const const_fn_result_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+private:
+  const svalue *m_query;
+  bool m_found;
+};
+
+/* Returns true if SVAL is constrained.  */
+
+bool
+constraint_manager::sval_constrained_p (const svalue *sval) const
+{
+  int i;
+  equiv_class *ec;
+  sval_finder finder (sval);
+  FOR_EACH_VEC_ELT (m_equiv_classes, i, ec)
+{
+  int j;
+  const svalue *iv;
+  FOR_EACH_VEC_ELT (ec->m_vars, j, iv)
+   {
+ iv->accept (&finder);
+ if (finder.found_query_p ())
+   return true;
+   }
+}
+  return false;
+}
+
 /* Ensure that SVAL has an equivalence class within this constraint_manager;
return the ID of the class.  */
 
diff --git a/gcc/analyzer/constraint-manager.h 
b/gcc/analyzer/constraint-manager.h
index 3afbc7f848e..72753e43c96 100644
--- a/gcc/analyzer/constraint-manager.h
+++ b/gcc/analyzer/constraint-manager.h
@@ -459,6 +459,7 @@ public:
 
   bool get_equiv_class_by_svalue (const svalue *sval,
  

[PATCH 2/2] testsuite: Add more allocation size tests for conjured svalues [PR110014]

2023-06-09 Thread Tim Lange
This patch adds the reproducers reported in PR 110014 as test cases. The
false positives in those cases are already fixed with PR 109577.

2023-06-09  Tim Lange  

PR analyzer/110014

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/pr110014.c: New tests.

---
 gcc/testsuite/gcc.dg/analyzer/pr110014.c | 25 
 1 file changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr110014.c

diff --git a/gcc/testsuite/gcc.dg/analyzer/pr110014.c 
b/gcc/testsuite/gcc.dg/analyzer/pr110014.c
new file mode 100644
index 000..d76b8781413
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr110014.c
@@ -0,0 +1,25 @@
+void *realloc (void *, unsigned long)
+  __attribute__((__nothrow__, __leaf__))
+  __attribute__((__warn_unused_result__)) __attribute__((__alloc_size__ (2)));
+
+long *
+slurp (long *buffer, unsigned long file_size)
+{
+  unsigned long cc;
+  if (!__builtin_add_overflow (file_size - file_size % sizeof (long),
+  2 * sizeof (long), &cc))
+buffer = realloc (buffer, cc);
+  return buffer;
+}
+
+long *
+slurp1 (long *buffer, unsigned long file_size)
+{
+  return realloc (buffer, file_size - file_size % sizeof (long));
+}
+
+long *
+slurp2 (long *buffer, unsigned long file_size)
+{
+  return realloc (buffer, (file_size / sizeof (long)) * sizeof (long));
+}
-- 
2.40.1



Re: [PATCH 1/2] analyzer: Fix allocation size false positive on conjured svalue [PR109577]

2023-06-09 Thread David Malcolm via Gcc-patches
On Fri, 2023-06-09 at 20:28 +0200, Tim Lange wrote:


[...snip...]

Thanks for the patch.

> diff --git a/gcc/analyzer/constraint-manager.cc 
> b/gcc/analyzer/constraint-manager.cc
> index 2c9c435527e..24cd8960098 100644
> --- a/gcc/analyzer/constraint-manager.cc
> +++ b/gcc/analyzer/constraint-manager.cc
> @@ -2218,6 +2218,137 @@ constraint_manager::get_equiv_class_by_svalue (const 
> svalue *sval,
>    return false;
>  }
>  
> +/* Tries to find a svalue inside another svalue.  */
> +
> +class sval_finder : public visitor
> +{
> +public:
> +  sval_finder (const svalue *query) : m_query (query)
> +  {
> +  }

It looks like this ctor is missing an initialization of the new field
"m_found" to false.

[...snip...]

> +private:
> +  const svalue *m_query;
> +  bool m_found;
> +};
> +

[...snip...]



Other than that, looks good to me.


Dave



Re: [PATCH 2/2] testsuite: Add more allocation size tests for conjured svalues [PR110014]

2023-06-09 Thread David Malcolm via Gcc-patches
On Fri, 2023-06-09 at 20:28 +0200, Tim Lange wrote:
> This patch adds the reproducers reported in PR 110014 as test cases.
> The
> false positives in those cases are already fixed with PR 109577.
> 
> 2023-06-09  Tim Lange  
> 
> PR analyzer/110014
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/analyzer/pr110014.c: New tests.

Please can you rename the new test case to 
"realloc-pr110014.c" (since having too many test cases named simply
prXX.c gets overwhelming).

Approved for trunk once the fix for PR 109577 goes in

Thanks!
Dave



Re: [PATCH V2] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-06-09 Thread Segher Boessenkool
Hi!

On Wed, Jun 07, 2023 at 04:21:11PM +0800, Jiufu Guo wrote:
> This patch tries to optimize "(X - N * M) / N" to "X / N - M".
> For C code, "/" towards zero (trunc_div), and "X - N * M" maybe
> wrap/overflow/underflow. So, it is valid that "X - N * M" does
> not cross zero and does not wrap/overflow/underflow.

Is it ever valid semi-generally when N does not divide X?

Say X=5, N=2, M=3.  Then the original expression evaluates to 0, but the
new one to -1.  Whenever one of the divisions rounds up and the other
rounds down you have this problem.
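
A quick way to see the mismatch (illustrative only):

  #include <stdio.h>

  int main (void)
  {
    int X = 5, N = 2, M = 3;
    /* trunc_div: (5 - 6) / 2 == 0, but 5 / 2 - 3 == -1.  */
    printf ("%d %d\n", (X - N * M) / N, X / N - M);
    return 0;
  }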


Segher


Re: Splitting up 27_io/basic_istream/ignore/wchar_t/94749.cc (takes too long)

2023-06-09 Thread Jonathan Wakely via Gcc-patches
On Fri, 9 Jun 2023 at 17:20, Hans-Peter Nilsson  wrote:

> Hi!
>
> The test 27_io/basic_istream/ignore/wchar_t/94749.cc takes
> about 10 minutes to run for cris-elf in the "gdb simulator"
> here on my arguably way-past-retirement machine (and it
> looks like it gained a minute with LRA).  I've seen it
> timing out every now and then on busy days with load >
> `nproc`.  Usually it happens some time after I've forgot
> about why. :)
>
> It has had some performance surgery before (pruning for
> simulators, doubling timeout for ilp32).  I'd probably just
> try cutting along the function boundaries and keep those
> parts separate that have >1 min execution time.
>

test01, test02, test03 and test04 should run almost instantly. On my system
they take about 5 microseconds each. So I don't think splitting those up
will help.

test05 extracts INT_MAX characters from a stream, which is a LOT of work.
It doesn't actually read those from a file; the "stream" is a custom
streambuf that contains a buffer of millions of wchar_t, and "reading" from
the stream just increments a counter into that buffer. But we do have to
allocate memory for that buffer and then zero-init that buffer. That's a
lot of cycles. Then once we've done that, we need to keep looping until we
overflow a 32-bit counter (we don't increment by 1 every loop, so it
overflows pretty quickly).

Then we do it again and again and again! Each time takes about half a
second for me.

I thought it would help to avoid re-allocating the buffer and zeroing it
again. If we reuse the same buffer, then we just have to loop until we
overflow the 32-bit counter. That would make the whole test run much
faster, which would reduce the total time for a testsuite run. Splitting
the file up into smaller files would not decrease the total time, only
decrease the time for that single test so it doesn't time out.

I've attached a patch that does that. It makes very little difference for
me, probably because allocating zero-filled pages isn't actually expensive
on linux. Maybe it will make a difference for your simulator though?

You could also try reducing the size of the buffer:
+#ifdef SIMULATOR_TEST
+  static const streamsize bufsz = 16 << limits::digits10;
+#else
  static const streamsize bufsz = 2048 << limits::digits10;
+#endif

test06 is the really slow part, that takes 10+ seconds for me. But that
entire function should already be skipped for simulators.

We can probably skip test05 for simulators too, none of the code it tests
is platform-specific, so as long as it's being tested on x86 we don't
really need to test it on cris-elf too.
diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/94749.cc 
b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/94749.cc
index 65e0a326c10..040e94aa4d6 100644
--- a/libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/94749.cc
+++ b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/94749.cc
@@ -89,7 +89,7 @@ struct buff : std::basic_streambuf
   typedef std::streamsizestreamsize;
   typedef std::numeric_limits limits;
 
-  buff() : count(0), buf() { }
+  buff() : count(0), nonzero_chars(), buf() { }
 
   int_type underflow()
   {
@@ -112,12 +112,23 @@ struct buff : std::basic_streambuf
   buf[headroom+1] = L'3';
   this->setg(buf, buf, buf + headroom + 2);
   count = limits::max();
+  nonzero_chars = headroom - 1;
 }
 
 return buf[0];
   }
 
+  void reset()
+  {
+buf[nonzero_chars] = char_type();
+buf[nonzero_chars+1] = char_type();
+buf[nonzero_chars+2] = char_type();
+nonzero_chars = 0;
+count = 0;
+  }
+
   streamsize count;
+  streamsize nonzero_chars;
 
   static const streamsize bufsz = 2048 << limits::digits10;
   char_type buf[bufsz + 2];
@@ -132,7 +143,8 @@ test05()
 
   typedef std::char_traits T;
 
-  std::basic_istream in(new buff);
+  buff* pbuf = new buff;
+  std::basic_istream in(pbuf);
 
   in.ignore(std::numeric_limits::max(), L'1');
   VERIFY(in.good());
@@ -141,7 +153,9 @@ test05()
   VERIFY(in.get() == L'3');
   VERIFY(in.get() == T::eof());
 
-  delete in.rdbuf(new buff);
+  pbuf->reset();
+  in.clear();
+  VERIFY(in.gcount() == 0);
 
   in.ignore(std::numeric_limits::max(), L'2');
   VERIFY(in.good());
@@ -150,7 +164,9 @@ test05()
   VERIFY(in.get() == L'3');
   VERIFY(in.get() == T::eof());
 
-  delete in.rdbuf(new buff);
+  pbuf->reset();
+  in.clear();
+  VERIFY(in.gcount() == 0);
 
   in.ignore(std::numeric_limits::max(), L'3');
   VERIFY(in.good());
@@ -158,7 +174,9 @@ test05()
   VERIFY(in.gcount() == std::numeric_limits::max());
   VERIFY(in.get() == T::eof());
 
-  delete in.rdbuf(new buff);
+  pbuf->reset();
+  in.clear();
+  VERIFY(in.gcount() == 0);
 
   in.ignore(std::numeric_limits::max(), L'4');
   VERIFY(in.eof());
@@ -166,7 +184,8 @@ test05()
   VERIFY(in.gcount() == std::numeric_limits::max());
   VERIFY(in.get() == T::eof());
 
-  delete in.rdbuf(0);
+  in.rdbuf(0);
+  delete pbuf;
 }
 
 void
@@ -177,7 

Re: [PATCH] MATCH: Fix zero_one_valued_p not to match signed 1 bit integers

2023-06-09 Thread Jeff Law via Gcc-patches




On 6/9/23 11:27, Andrew Pinski via Gcc-patches wrote:

So for the attached testcase, we assumed that zero_one_valued_p would
have the value range [0,1], but currently zero_one_valued_p also matches
signed 1-bit integers.
This changes it not to match those and fixes the 2 new testcases at
all optimization levels.

OK for GCC 13? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/110165
PR tree-optimization/110166

gcc/ChangeLog:

* match.pd (zero_one_valued_p): Don't accept
signed 1-bit integers.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr110165-1.c: New test.
* gcc.c-torture/execute/pr110166-1.c: New test.

OK.
Jeff

