date:20250518

Re: [PATCH] gcc: add trigonometric pi-based functions as gcc builtins

2025-05-18 Thread Yuao Ma

Hi Jakub,
Thank you for your suggestion. I actually learned from your earlier patch 
(https://gcc.gnu.org/cgit/gcc/commit?id=7f940822) and had already planned to 
update tree-call-cdce.cc when handling these builtins. Your guidance is much 
appreciated!
Best regards,
Yuao

From: Jakub Jelinek 
Sent: Saturday, May 17, 2025 22:11
To: Yuao Ma 
Cc: gcc-patches@gcc.gnu.org ; fort...@gcc.gnu.org 
; tbur...@baylibre.com ; 
j...@polyomino.org.uk 
Subject: Re: [PATCH] gcc: add trigonometric pi-based functions as gcc builtins

On Wed, May 14, 2025 at 02:22:23PM +, Yuao Ma wrote:
> If approved, I suggest committing this foundational change first. Constant
> folding for these builtins will be addressed in subsequent patches.

Note, not just constant folding is needed, but I think the builtins should
be handled in
tree-call-cdce.cc (can_test_argument_range, edom_only_function,
get_no_error_domain).

Jakub
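
To illustrate what handling in tree-call-cdce.cc buys, here is a minimal sketch of the guarded-call shape for a function with a bounded no-error domain. It is not GCC code: `acospi_sketch`, `call_guarded` and the [-1, 1] domain check are assumptions chosen for illustration, and whether errno is actually set depends on the C library.

```cpp
#include <cmath>

// Hypothetical stand-in for a pi-based builtin: acospi(x) = acos(x) / pi.
// This is an illustration only, not GCC's or glibc's implementation.
double acospi_sketch (double x)
{
  const double pi = 3.14159265358979323846;
  return std::acos (x) / pi;   // may report a domain error when |x| > 1
}

// Shape of a guarded call whose result is unused: the call is kept only
// for arguments outside the no-error domain, where it could set errno.
// Returns true if the call was actually executed.
bool call_guarded (double x)
{
  if (!(x >= -1.0 && x <= 1.0))   // also true for NaN
    {
      acospi_sketch (x);
      return true;
    }
  return false;
}
```

For in-range arguments the call is skipped entirely, which is the point of the range test.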

Re: [to-be-committed][RISC-V] Avoid setting output object more than once in IOR/XOR synthesis

2025-05-18 Thread Jeff Law

On 5/18/25 8:53 AM, Mark Wielaard wrote:
> Hi Jeff,
> 
> On Thu, May 15, 2025 at 10:11:19PM -0600, Jeff Law wrote:
> > This has been tested in my tester and is currently bootstrapping on
> > my BPI.  Waiting on data from the pre-commit tester before moving
> > forward...
> 
> It looks like the Sourceware p550 and spacemit-x60 builders do flag a
> bootstrap issue with this:
> 
> https://builder.sourceware.org/buildbot/#/builders/337/builds/255
> https://builder.sourceware.org/buildbot/#/builders/338/builds/228
> 
> ../../gcc/gcc/config/riscv/riscv.cc: In function ‘bool synthesize_ior_xor(rtx_code, rtx_def**)’:
> ../../gcc/gcc/config/riscv/riscv.cc:14422:18: error: ‘output’ may be used uninitialized [-Werror=maybe-uninitialized]
> 14422 |   emit_move_insn (operands[0], output);
>    |   ~~~^
> ../../gcc/gcc/config/riscv/riscv.cc:14393:7: note: ‘output’ was declared here
> 14393 |   rtx output;
>    |   ^~
> cc1plus: all warnings being treated as errors
> make[3]: *** [Makefile:2728: riscv.o] Error 1
> make[3]: *** Waiting for unfinished jobs

Arggh.  Mine failed too (big surprise).  I'll fix it up.  The 24hr cycle
time is quite annoying...


jeff
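
For readers unfamiliar with this diagnostic, the following is a minimal sketch of the general shape that avoids -Wmaybe-uninitialized: the variable gets a value on every path before it is read. The function name, the integer types and the operations are hypothetical stand-ins; this is not the RISC-V backend code and not necessarily the fix that was applied.

```cpp
// Hypothetical stand-in for a synthesis routine; not riscv.cc.
// If 'output' were declared without an initializer and assigned only in
// the two branches below, the final read could see an indeterminate value
// on the fall-through path, which is what the warning reports.
int synthesize_sketch (int code, int input)
{
  int output = input;   // initialized on every path
  if (code == 0)
    output = input | 1;
  else if (code == 1)
    output = input ^ 1;
  return output;
}
```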

Re: [PATCH] Partially lift restriction from loc_list_from_tree_1

2025-05-18 Thread Eric Botcazou

> OK.

Thanks.

> Btw, can we try to add a "guality" for gnat.dg?  Or are you making sure to
> add coverage to the gdb testsuite?

Yes, the GDB testsuite will get a testcase.

-- 
Eric Botcazou

Re: [PATCH] phiopt: Use mark_lhs_in_seq_for_dce instead of doing it inline

2025-05-18 Thread Richard Biener

> On 18.05.2025 at 08:26, Andrew Pinski wrote:
> 
> Right now phiopt has the same code as mark_lhs_in_seq_for_dce
> inlined into match_simplify_replacement. Instead let's use the
> function in gimple-fold that does the same thing.
> 
> Bootstrapped and tested on x86_64-linux-gnu.

Ok

Richard 

> gcc/ChangeLog:
> 
>* gimple-fold.cc (mark_lhs_in_seq_for_dce): Make
>non-static.
>* gimple-fold.h (mark_lhs_in_seq_for_dce): Declare.
>* tree-ssa-phiopt.cc (match_simplify_replacement): Use
>mark_lhs_in_seq_for_dce instead of manually looping.
> 
> Signed-off-by: Andrew Pinski 
> ---
> gcc/gimple-fold.cc |  2 +-
> gcc/gimple-fold.h  |  1 +
> gcc/tree-ssa-phiopt.cc | 13 +++--
> 3 files changed, 5 insertions(+), 11 deletions(-)
> 
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index b74fb8bb50c..0f437616d77 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -6020,7 +6020,7 @@ has_use_on_stmt (tree name, gimple *stmt)
> 
> /* Add the lhs of each statement of SEQ to DCE_WORKLIST. */
> 
> -static void
> +void
> mark_lhs_in_seq_for_dce (bitmap dce_worklist, gimple_seq seq)
> {
>   if (!dce_worklist)
> diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
> index afecbb8ceef..8b1e246b0c0 100644
> --- a/gcc/gimple-fold.h
> +++ b/gcc/gimple-fold.h
> @@ -264,6 +264,7 @@ gimple_build_round_up (gimple_seq *seq, tree type, tree 
> old_size,
> 
> extern bool gimple_stmt_nonnegative_warnv_p (gimple *, bool *, int = 0);
> extern bool gimple_stmt_integer_valued_real_p (gimple *, int = 0);
> +extern void mark_lhs_in_seq_for_dce (bitmap, gimple_seq);
> 
> /* In gimple-match.cc.  */
> extern tree gimple_simplify (enum tree_code, tree, tree,
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index 9724040fc3d..8c5908e5bff 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -1001,16 +1001,9 @@ match_simplify_replacement (basic_block cond_bb, 
> basic_block middle_bb,
>   if (seq)
> {
>   // Mark the lhs of the new statements maybe for dce
> -  gimple_stmt_iterator gsi1 = gsi_start (seq);
> -  for (; !gsi_end_p (gsi1); gsi_next (&gsi1))
> -{
> -  gimple *stmt = gsi_stmt (gsi1);
> -  tree name = gimple_get_lhs (stmt);
> -  if (name && TREE_CODE (name) == SSA_NAME)
> -bitmap_set_bit (exprs_maybe_dce, SSA_NAME_VERSION (name));
> -}
> -gsi_insert_seq_before (&gsi, seq, GSI_CONTINUE_LINKING);
> -  }
> +  mark_lhs_in_seq_for_dce (exprs_maybe_dce, seq);
> +  gsi_insert_seq_before (&gsi, seq, GSI_CONTINUE_LINKING);
> +}
> 
>   /* If there was a statement to move, move it to right before
>  the original conditional.  */
> --
> 2.43.0
>

[PATCH] match: Remove valueize_condition argument from gimple_extract template

2025-05-18 Thread Andrew Pinski

After r15-4791-gb60031e8f9f8fe, the valueize_condition argument became
unused. I didn't notice because gimple-match-exports.cc was being compiled
with -Wno-unused. This patch removes that option too, since there are no
longer any unused warnings.

gcc/ChangeLog:

* Makefile.in (gimple-match-exports.o-warn): Remove.
* gimple-match-exports.cc (gimple_extract): Remove valueize_condition
argument.
(gimple_extract_op): Update call to gimple_extract.
(gimple_simplify): Likewise. Also remove valueize_condition lambda.

Signed-off-by: Andrew Pinski 
---
 gcc/Makefile.in |  1 -
 gcc/gimple-match-exports.cc | 44 +
 2 files changed, 6 insertions(+), 39 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 72d132207c0..366364a23de 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -253,7 +253,6 @@ gengtype-lex.o-warn = -Wno-error
 libgcov-util.o-warn = -Wno-error
 libgcov-driver-tool.o-warn = -Wno-error
 libgcov-merge-tool.o-warn = -Wno-error
-gimple-match-exports.o-warn = -Wno-unused
 dfp.o-warn = -Wno-strict-aliasing
 
 # All warnings have to be shut off in stage1 if the compiler used then
diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
index b3acae21fa5..06f155427b3 100644
--- a/gcc/gimple-match-exports.cc
+++ b/gcc/gimple-match-exports.cc
@@ -720,16 +720,14 @@ gimple_simplify (combined_fn fn, tree type,
describe STMT in RES_OP, returning true on success.  Before recording
an operand, call:
 
-   - VALUEIZE_CONDITION for a COND_EXPR condition
-   - VALUEIZE_OP for every other top-level operand
+   - VALUEIZE_OP for all top-level operands
 
-   Both routines take a tree argument and returns a tree.  */
+   This routine takes a tree argument and returns a tree.  */
 
-template<typename ValueizeOp, typename ValueizeCondition>
+template<typename ValueizeOp>
 inline bool
 gimple_extract (gimple *stmt, gimple_match_op *res_op,
-   ValueizeOp valueize_op,
-   ValueizeCondition valueize_condition)
+   ValueizeOp valueize_op)
 {
   switch (gimple_code (stmt))
 {
@@ -858,7 +856,7 @@ bool
 gimple_extract_op (gimple *stmt, gimple_match_op *res_op)
 {
   auto nop = [](tree op) { return op; };
-  return gimple_extract (stmt, res_op, nop, nop);
+  return gimple_extract (stmt, res_op, nop);
 }
 
 /* In some cases, the resulting RES_OP might contain just a
@@ -895,38 +893,8 @@ gimple_simplify (gimple *stmt, gimple_match_op *res_op, 
gimple_seq *seq,
 {
   return do_valueize (op, top_valueize, valueized);
 };
-  auto valueize_condition = [&](tree op) -> tree
-{
-  bool cond_valueized = false;
-  tree lhs = do_valueize (TREE_OPERAND (op, 0), top_valueize,
- cond_valueized);
-  tree rhs = do_valueize (TREE_OPERAND (op, 1), top_valueize,
- cond_valueized);
-  gimple_match_op res_op2 (res_op->cond, TREE_CODE (op),
-  TREE_TYPE (op), lhs, rhs);
-  if ((gimple_resimplify2 (seq, &res_op2, valueize)
-  || cond_valueized)
- && res_op2.code.is_tree_code ())
-   {
- auto code = tree_code (res_op2.code);
- if (TREE_CODE_CLASS (code) == tcc_comparison)
-   {
- valueized = true;
- return build2 (code, TREE_TYPE (op),
-res_op2.ops[0], res_op2.ops[1]);
-   }
- else if (code == SSA_NAME
-  || code == INTEGER_CST
-  || code == VECTOR_CST)
-   {
- valueized = true;
- return res_op2.ops[0];
-   }
-   }
-  return valueize_op (op);
-};
 
-  if (!gimple_extract (stmt, res_op, valueize_op, valueize_condition))
+  if (!gimple_extract (stmt, res_op, valueize_op))
 return false;
 
   if (res_op->code.is_internal_fn ())
-- 
2.43.0

[PATCH] match: Undo maybe_push_res_to_seq in some cases [PR120331]

2025-05-18 Thread Andrew Pinski

While working on improving forwprop and removal of
forward_propagate_into_gimple_cond/forward_propagate_into_comparison,
I came across a case where we end up with SSA_NAME in the resulting
gimple_match_op and one statement in the sequence.  This was the result
of simplification of:
```
_3 = MIN_EXPR <maxlen_2(D), 264> > 16
if (_3 > 16) ...
```

Which simplifies down to:
(maxlen_2(D) > 16) & (264 > 16)
and then into
(maxlen_2(D) > 16) & 1

Here `maxlen_2(D) > 16` gets pushed onto the sequence,
and then the `& 1` is removed via the match pattern:
```
/* x & ~0 -> x  */
(simplify
 (bit_and @0 integer_all_onesp)
  (non_lvalue @0))
```

So this patch undoes the push: it extracts the new op from the pushed
statement and removes the sequence, as it is no longer used.
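
The idea can be sketched with a toy model. Everything below is a hypothetical stand-in (plain structs instead of gimple_match_op and gimple_seq, strings instead of trees); it only mirrors the control flow described above, not the real data structures.

```cpp
#include <string>
#include <vector>

// Toy stand-ins; all names here are assumptions for illustration.
struct Stmt { std::string lhs; std::string rhs_expr; };
struct MatchOp { bool is_ssa_name; std::string value; };

// If the result is just the name defined by the single pushed statement,
// replace the result with that statement's expression and drop the sequence.
void maybe_undo_push_sketch (std::vector<Stmt> &seq, MatchOp &res)
{
  if (seq.size () != 1 || !res.is_ssa_name)
    return;
  if (seq.front ().lhs != res.value)
    return;
  res = MatchOp{false, seq.front ().rhs_expr};
  seq.clear ();
}
```

In this model a result of `_5` with the single pushed statement `_5 = maxlen_2(D) > 16` becomes the comparison itself, and the sequence ends up empty.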

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

PR tree-optimization/120331
* gimple-match-exports.cc (maybe_undo_push): New function.
(gimple_simplify): Call maybe_undo_push if resimplify was successful.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-match-exports.cc | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
index ccba046a1d4..b3acae21fa5 100644
--- a/gcc/gimple-match-exports.cc
+++ b/gcc/gimple-match-exports.cc
@@ -861,6 +861,28 @@ gimple_extract_op (gimple *stmt, gimple_match_op *res_op)
   return gimple_extract (stmt, res_op, nop, nop);
 }
 
+/* In some cases, the resulting RES_OP might contain just a
+   SSA_NAME and the sequence SEQ contains one statement, we can
+   possibly undo the push and change the match RES_OP into
+   what the statement is.  */
+static void
+maybe_undo_push (gimple_seq *seq, gimple_match_op *res_op)
+{
+  if (!seq || !gimple_seq_singleton_p (*seq))
+return;
+  if (res_op->code != SSA_NAME)
+return;
+  gimple *stmt = gimple_seq_first_stmt (*seq);
+  if (gimple_get_lhs (stmt) != res_op->ops[0])
+return;
+  gimple_match_op new_op;
+  if (!gimple_extract_op (stmt, &new_op))
+return;
+  gimple_seq_discard (*seq);
+  *seq = NULL;
+  *res_op = new_op;
+}
+
 /* The main STMT based simplification entry.  It is used by the fold_stmt
and the fold_stmt_to_constant APIs.  */
 
@@ -917,7 +939,10 @@ gimple_simplify (gimple *stmt, gimple_match_op *res_op, 
gimple_seq *seq,
   if (!res_op->reverse
   && res_op->num_ops
   && res_op->resimplify (seq, valueize))
-return true;
+{
+  maybe_undo_push (seq, res_op);
+  return true;
+}
 
   return valueized;
 }
-- 
2.43.0

Re: Subject: [PATCH] cobol: gcobc wrapper fixes and additions

2025-05-18 Thread James K. Lowden

On Sat, 5 Apr 2025 00:36:48 +0200
Simon Sobisch  wrote:

> * defaults to dialect GNU (gnucobol)
> * more ibm and strict dialects supported
> * Implemented -A, -Q, -E
> * support known alias "-debug" for "--debug"
> * fix -P, -T and -W consuming source files
> * deduce output file name, as done by cobc

I applied this patch manually to our local parser branch.  Hopefully
the result is as intended. 

https://gitlab.cobolworx.com/COBOLworx/gcc-cobol/-/commit/244688a870175aa4cc23298fc19c7631ded71b34

The original did not apply cleanly because the gcobc script had been
modified meanwhile.  It should appear in gcc/master in a few days.  

--jkl

Re: cobol.1 fix for not using underscores in intrinsic function names

2025-05-18 Thread James K. Lowden

On Wed, 9 Apr 2025 23:12:39 +0200
Simon Sobisch  wrote:

> just stumbled over this and only have a mail client running, so...
> patch as text. The change is in all those cases: change _ (likely
> parsed from the parser or similar) to -.
> 
> Kind regards,
> Simon
> 
> 
> 
> -BASECONVERT BIT_OF BIT_TO_CHAR BOOLEAN_OF_INTEGER BYTE_LENGTH
> +BASECONVERT BIT-OF BIT-TO-CHAR BOOLEAN-OF-INTEGER BYTE-LENGTH
[snip]

Thank you for noticing the mistake.  I did my own search & replace with,
I hope, the same effect.  

https://gitlab.cobolworx.com/COBOLworx/gcc-cobol/-/commit/244688a870175aa4cc23298fc19c7631ded71b34

It will make its way to gcc/master in time.  

--jkl

[COMMITTED] Regenerate cobol/lang.opt.urls

2025-05-18 Thread Mark Wielaard

The Cobol frontend lang.opt got -M added, but lang.opt.urls wasn't
regenerated.

Fixes: 92b6485a75ca ("cobol: Eliminate exception "blob"; streamline some code 
generation.")

gcc/cobol/ChangeLog:

* lang.opt.urls: Regenerated.
---
 gcc/cobol/lang.opt.urls | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/cobol/lang.opt.urls b/gcc/cobol/lang.opt.urls
index 69f52973c025..78fc491fa67f 100644
--- a/gcc/cobol/lang.opt.urls
+++ b/gcc/cobol/lang.opt.urls
@@ -10,6 +10,9 @@ UrlSuffix(gcc/Preprocessor-Options.html#index-D-1)
 I
 UrlSuffix(gcc/Directory-Options.html#index-I) 
LangUrlSuffix_D(gdc/Directory-Options.html#index-I)
 
+M
+UrlSuffix(gcc/Preprocessor-Options.html#index-M) 
LangUrlSuffix_D(gdc/Code-Generation.html#index-M)
+
 ffixed-form
 LangUrlSuffix_Fortran(gfortran/Fortran-Dialect-Options.html#index-ffixed-form)
 
-- 
2.49.0

Re: [PATCH v2 1/2] emit-rtl: Allow extra checks for paradoxical subregs [PR119966]

2025-05-18 Thread Dimitar Dimitrov

On Fri, May 16, 2025 at 06:01:43PM +0100, Richard Sandiford wrote:
> Dimitar Dimitrov  writes:
> > When a paradoxical subreg is detected, validate_subreg exits early, thus
> > skipping the important checks later in the function.
> >
> > Fix by continuing with the checks instead of declaring early that the
> > paradoxical subreg is valid.
> >
> > One of the newly allowed subsequent checks needed to be disabled for
> > paradoxical subregs.  It turned out that combine attempts to create
> > a paradoxical subreg of mem even for strict-alignment targets.
> > That is invalid and should eventually be rejected, but is
> > temporarily left allowed to prevent regressions for
> > armv8l-unknown-linux-gnueabihf.
> >
> > Tests I did:
> >  - No regressions were found for C and C++ for the following targets:
> >- native x86_64-pc-linux-gnu
> >- cross riscv64-unknown-linux-gnu
> >- cross riscv32-none-elf
> >  - Sanity checked armv8l-unknown-linux-gnueabihf by cross-building
> >up to including libgcc. I'll monitor Linaro CI bot for the
> >full regression test results.
> >  - Sanity checked powerpc64-unknown-linux-gnu by building native
> >toolchain, but could not setup qemu-user for DejaGnu testing.
> >
> > PR target/119966
> >
> > gcc/ChangeLog:
> >
> > * emit-rtl.cc (validate_subreg): Do not exit immediately for
> > paradoxical subregs.  Filter subsequent tests which are
> > not valid for paradoxical subregs.
> >
> > Co-authored-by: Richard Sandiford 
> > Signed-off-by: Dimitar Dimitrov 
> > ---
> >  gcc/emit-rtl.cc | 25 ++---
> >  1 file changed, 18 insertions(+), 7 deletions(-)
> >
> > diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
> > index 3e2c4309dee..e46b0f9eac4 100644
> > --- a/gcc/emit-rtl.cc
> > +++ b/gcc/emit-rtl.cc
> > @@ -969,10 +969,10 @@ validate_subreg (machine_mode omode, machine_mode 
> > imode,
> >  }
> >  
> >/* Paradoxical subregs must have offset zero.  */
> > -  if (maybe_gt (osize, isize))
> > -return known_eq (offset, 0U);
> > +  if (maybe_gt (osize, isize) && !known_eq (offset, 0U))
> > +return false;
> >  
> > -  /* This is a normal subreg.  Verify that the offset is representable.  */
> > +  /* Verify that the offset is representable.  */
> >  
> >/* For hard registers, we already have most of these rules collected in
> >   subreg_offset_representable_p.  */
> > @@ -988,9 +988,13 @@ validate_subreg (machine_mode omode, machine_mode 
> > imode,
> >  
> >return subreg_offset_representable_p (regno, imode, offset, omode);
> >  }
> > -  /* Do not allow SUBREG with stricter alignment than the inner MEM.  */
> > +  /* Do not allow normal SUBREG with stricter alignment than the inner MEM.
> > +
> > + FIXME: Combine can create paradoxical mem subregs even for
> > + strict-alignment targets.  Allow it until combine is fixed.  */
> 
> Are the details captured in bugzilla somewhere?  If not, could you file
> a PR and explain when this happens, or add a comment to PR119966?
> 
> I think this should have a reference to a particular bugzilla comment
> that describes the problem, otherwise it would be hard to tell later
> whether the problem has been fixed.
> 
> OK with that change, thanks.

I created a separate PR120329 with all the details on how to reproduce.

Pushed the patch with added reference as r16-718-geb2ea476db2182.

Thank you,
Dimitar

> 
> Richard
> 
> >else if (reg && MEM_P (reg) && STRICT_ALIGNMENT
> > -  && MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (omode))
> > +  && MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (omode)
> > +  && known_le (osize, isize))
> >  return false;
> >  
> >/* The outer size must be ordered wrt the register size, otherwise
> > @@ -999,7 +1003,7 @@ validate_subreg (machine_mode omode, machine_mode 
> > imode,
> >if (!ordered_p (osize, regsize))
> >  return false;
> >  
> > -  /* For pseudo registers, we want most of the same checks.  Namely:
> > +  /* For normal pseudo registers, we want most of the same checks.  Namely:
> >  
> >   Assume that the pseudo register will be allocated to hard registers
> >   that can hold REGSIZE bytes each.  If OSIZE is not a multiple of 
> > REGSIZE,
> > @@ -1008,8 +1012,15 @@ validate_subreg (machine_mode omode, machine_mode 
> > imode,
> >   otherwise it is at the lowest offset.
> >  
> >   Given that we've already checked the mode and offset alignment,
> > - we only have to check subblock subregs here.  */
> > + we only have to check subblock subregs here.
> > +
> > + For paradoxical little-endian registers, this check is redundant.  The
> > + offset has already been validated to be zero.
> > +
> > + For paradoxical big-endian registers, this check is not valid
> > + because the offset is zero.  */
> >if (maybe_lt (osize, regsize)
> > +  && known_le (osize, isize)
> >&& ! (lra_in_progress && (FLOAT_MODE_P (imode) || FLOAT_MODE_P 
> > > (omode))))
> >  {
> >

Re: [PATCH v2 0/7] Remove -mavx10.1-256/512 and -mno-evex512

2025-05-18 Thread Hongtao Liu

On Wed, May 14, 2025 at 3:29 PM Haochen Jiang  wrote:
>
> Hi all,
>
> This is the v2 patch to remove -mavx10.1-256/512 and -mno-evex512. I suppose
> this time all the patches will not be held due to size.
>
> As mentioned in GCC 15, we will remove -mavx10.1-256/512 and -mno-evex512
> options in GCC 16. Also we will do some clean up in code for all the size
> happening all together.
>
> The first patch will remove -mavx10.1-256/512. The second patch will remove
> those OPTION_MASK_ISA2_EVEX512 pushed into builtins, and the third patch will
> remove -mevex512. The following four are refactoring and cleaning up for the
> machine description and AVX10.2.
>
> Ok for trunk?
Ok.
>
> Thx,
> Haochen
>
>


-- 
BR,
Hongtao

[PATCH v3] Extend vect_recog_cond_expr_convert_pattern to handle REAL_CST

2025-05-18 Thread liuhongt

Changed, here's the updated patch I'm going to check in.

REAL_CST is handled if it can be represented in different floating
point types without loss of precision or under fast math.
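
As a rough illustration of the "without loss of precision" condition, the sketch below checks whether a double constant survives a round trip through float. The helper name is hypothetical, and this host-side check only approximates the question that exact_real_truncate answers for the target mode inside the compiler.

```cpp
// True if the double constant C can be represented exactly as a float.
// Hypothetical helper for illustration; NaN inputs yield false.
bool fits_in_float_exactly (double c)
{
  return static_cast<double> (static_cast<float> (c)) == c;
}
```

Under this check 0.0 (the constant used in pr103771-5.c) is exactly representable, while 1.001 (the constant used in pr103771-6.c) is not.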

gcc/ChangeLog:

PR tree-optimization/103771
* match.pd (cond_expr_convert_p): Extend the match to handle
REAL_CST.
* tree-vect-patterns.cc
(vect_recog_cond_expr_convert_pattern): Handle REAL_CST.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr103771-5.c: New test.
* gcc.target/i386/pr103771-6.c: New test.
---
 gcc/match.pd   | 31 +
 gcc/testsuite/gcc.target/i386/pr103771-5.c | 54 ++
 gcc/testsuite/gcc.target/i386/pr103771-6.c | 16 +++
 gcc/tree-vect-patterns.cc  | 41 
 4 files changed, 132 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103771-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103771-6.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 789e3d33326..7dba30311e3 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -11346,6 +11346,37 @@ and,
&& single_use (@4)
&& single_use (@5))))
 
+/* Floating point or integer comparison and floating point conversion
+   with REAL_CST.  */
+(match (cond_expr_convert_p @0 @2 @3 @6)
+ (cond (simple_comparison@6 @0 @1) (REAL_CST@2) (convert@5 @3))
+  (if (!flag_trapping_math
+   && SCALAR_FLOAT_TYPE_P (type)
+   && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@3))
+   && !operand_equal_p (TYPE_SIZE (type),
+   TYPE_SIZE (TREE_TYPE (@0)))
+   && operand_equal_p (TYPE_SIZE (TREE_TYPE (@0)),
+  TYPE_SIZE (TREE_TYPE (@3)))
+   && single_use (@5)
+   && (flag_unsafe_math_optimizations
+  || exact_real_truncate (TYPE_MODE (TREE_TYPE (@3)),
+  &TREE_REAL_CST (@2))))))
+
+/* Floating point or integer comparison and floating point conversion
+   with REAL_CST.  */
+(match (cond_expr_convert_p @0 @2 @3 @6)
+ (cond (simple_comparison@6 @0 @1) (convert@4 @2) (REAL_CST@3))
+  (if (!flag_trapping_math
+   && SCALAR_FLOAT_TYPE_P (type)
+   && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@2))
+   && !operand_equal_p (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (@0)))
+   && operand_equal_p (TYPE_SIZE (TREE_TYPE (@0)),
+  TYPE_SIZE (TREE_TYPE (@2)))
+   && single_use (@4)
+   && (flag_unsafe_math_optimizations
+  || exact_real_truncate (TYPE_MODE (TREE_TYPE (@2)),
+  &TREE_REAL_CST (@3))))))
+
 (for bit_op (bit_and bit_ior bit_xor)
  (match (bitwise_induction_p @0 @2 @3)
   (bit_op:c
diff --git a/gcc/testsuite/gcc.target/i386/pr103771-5.c 
b/gcc/testsuite/gcc.target/i386/pr103771-5.c
new file mode 100644
index 000..bf94f53b88c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103771-5.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-options "-march=x86-64-v4 -O3 -fno-trapping-math 
-fdump-tree-vect-details" } */
+/* { dg-final { scan-assembler-not "kshift" { target { ! ia32 } } } } */
+/* { dg-final { scan-tree-dump-times "loop vectorized using 64 byte vectors" 4 
"vect" { target { ! ia32 } } } } */
+
+void
+foo (float* a, float* b, float* c, float* d, double* __restrict e, int n)
+{
+  for (int i = 0 ; i != n; i++)
+{
+  float tmp = c[i] + d[i];
+  if (a[i] < b[i])
+   tmp = 0.0;
+  e[i] = tmp;
+}
+}
+
+void
+foo1 (int* a, int* b, float* c, float* d, double* __restrict e, int n)
+{
+  for (int i = 0 ; i != n; i++)
+{
+  float tmp = c[i] + d[i];
+  if (a[i] < b[i])
+   tmp = 0.0;
+  e[i] = tmp;
+}
+}
+
+
+void
+foo2 (double* a, double* b, double* c, double* d, float* __restrict e, int n)
+{
+  for (int i = 0 ; i != n; i++)
+{
+  float tmp = c[i] + d[i];
+  if (a[i] < b[i])
+   tmp = 0.0;
+  e[i] = tmp;
+}
+}
+
+void
+foo3 (long long* a, long long* b, double* c, double* d, float* __restrict e, 
int n)
+{
+  for (int i = 0 ; i != n; i++)
+{
+  float tmp = c[i] + d[i];
+  if (a[i] < b[i])
+   tmp = 0.0;
+  e[i] = tmp;
+}
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/pr103771-6.c 
b/gcc/testsuite/gcc.target/i386/pr103771-6.c
new file mode 100644
index 000..92de6f6249d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103771-6.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-march=x86-64-v4 -O3 -fno-trapping-math 
-fdump-tree-vect-details" } */
+/* { dg-final { scan-tree-dump-not "vect_recog_cond_expr_convert_pattern" 
"vect" } } */
+/* { dg-final { scan-tree-dump-times "loop vectorized using 64 byte vectors" 1 
"vect" { target { ! ia32 } } } } */
+
+void
+foo (float* a, float* b, float* c, float* d, double* __restrict e, int n)
+{
+for (int i = 0 ; i != n; i++)
+{
+double tmp = c[i] + d[i];
+if (a[i] < b[i])
+  tmp = 1.001;
+e[i] = tmp;
+}
+}
diff

[PATCH] [AUTOFDO] Don't scale bb_count with ipa_count when ipa_count is zero but count_max is not

2025-05-18 Thread liuhongt

From: "hongtao.liu" 

An AutoFDO profile is a scaled profile; as a result, 0 samples does not
mean never executed, especially when there is profile data from the
function body. Prevent combine_with_ipa_count (ipa_count) from zeroing
all bb->count.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
OK for trunk?

gcc/ChangeLog:

PR gcov-profile/118551
* predict.cc (estimate_bb_frequencies): Prevent
combine_with_ipa_count (ipa_count) from zeroing all bb->count
when function body is known to be hot.
---
 gcc/predict.cc | 73 +-
 1 file changed, 42 insertions(+), 31 deletions(-)

diff --git a/gcc/predict.cc b/gcc/predict.cc
index ef31c48bfe2..7d4bf5261ad 100644
--- a/gcc/predict.cc
+++ b/gcc/predict.cc
@@ -4105,40 +4105,51 @@ estimate_bb_frequencies ()
   if (freq_max < 16)
 freq_max = 16;
   profile_count ipa_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count.ipa ();
-  cfun->cfg->count_max = profile_count::uninitialized ();
-  FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb)
+
+  /* An AutoFDO profile is a scaled profile; as a result, 0 samples does
+ not mean never executed, especially when there is profile data from
+ the function body.  Prevent combine_with_ipa_count (ipa_count) from
+ zeroing all bb->count.  */
+  if (!(ipa_count.quality () == AFDO
+   && cfun->cfg->count_max.quality () == AFDO
+   && !ipa_count.nonzero_p ()
+   && cfun->cfg->count_max.nonzero_p ()))
 {
-  sreal tmp = BLOCK_INFO (bb)->frequency;
-  if (tmp >= 1)
+  cfun->cfg->count_max = profile_count::uninitialized ();
+  FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb)
{
- gimple_stmt_iterator gsi;
- tree decl;
-
- /* Self recursive calls can not have frequency greater than 1
-or program will never terminate.  This will result in an
-inconsistent bb profile but it is better than greatly confusing
-IPA cost metrics.  */
- for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-   if (is_gimple_call (gsi_stmt (gsi))
-   && (decl = gimple_call_fndecl (gsi_stmt (gsi))) != NULL
-   && recursive_call_p (current_function_decl, decl))
- {
-   if (dump_file)
- fprintf (dump_file, "Dropping frequency of recursive call"
-  " in bb %i from %f\n", bb->index,
-  tmp.to_double ());
-   tmp = (sreal)9 / (sreal)10;
-   break;
- }
+ sreal tmp = BLOCK_INFO (bb)->frequency;
+ if (tmp >= 1)
+   {
+ gimple_stmt_iterator gsi;
+ tree decl;
+
+ /* Self recursive calls can not have frequency greater than 1
+or program will never terminate.  This will result in an
+inconsistent bb profile but it is better than greatly confusing
+IPA cost metrics.  */
+ for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+   if (is_gimple_call (gsi_stmt (gsi))
+   && (decl = gimple_call_fndecl (gsi_stmt (gsi))) != NULL
+   && recursive_call_p (current_function_decl, decl))
+ {
+   if (dump_file)
+ fprintf (dump_file, "Dropping frequency of recursive call"
+  " in bb %i from %f\n", bb->index,
+  tmp.to_double ());
+   tmp = (sreal)9 / (sreal)10;
+   break;
+ }
+   }
+ tmp = tmp * freq_max;
+ profile_count count = profile_count::from_gcov_type 
(tmp.to_nearest_int ());
+
+ /* If we have profile feedback in which this function was never
+executed, then preserve this info.  */
+ if (!(bb->count == profile_count::zero ()))
+   bb->count = count.guessed_local ().combine_with_ipa_count 
(ipa_count);
+ cfun->cfg->count_max = cfun->cfg->count_max.max (bb->count);
}
-  tmp = tmp * freq_max;
-  profile_count count = profile_count::from_gcov_type (tmp.to_nearest_int 
());
-
-  /* If we have profile feedback in which this function was never
-executed, then preserve this info.  */
-  if (!(bb->count == profile_count::zero ()))
-   bb->count = count.guessed_local ().combine_with_ipa_count (ipa_count);
-  cfun->cfg->count_max = cfun->cfg->count_max.max (bb->count);
 }
 
   free_aux_for_blocks ();
-- 
2.34.1

[PATCH] RISC-V: Rename conflicting variables in gen-riscv-ext-texi.cc

2025-05-18 Thread Songhe Zhu

From: zhusonghe 

The variables `major` and `minor` in `gen-riscv-ext-texi.cc`
conflict with the macros of the same name defined in ``,
which are exposed when building with newer versions of GCC on older
Linux distributions (e.g., Ubuntu 18.04). To resolve this, we rename them
to `major_version` and `minor_version` respectively. This aligns with the
GCC community's recommended practice [1] and improves code clarity.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2025-May/683881.html
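
The conflict can be reproduced in miniature. In the sketch below the two `#define` lines are stand-ins for the function-like macros that a system header may expose; their bodies are assumptions made only so the example is self-contained.

```cpp
// Stand-ins for function-like macros leaked by a system header.
// The bodies are hypothetical; only the macro names matter here.
#define major(dev) ((dev) >> 8)
#define minor(dev) ((dev) & 0xff)

struct version_t
{
  int major_version;
  int minor_version;

  // The parameter names are never followed by '(', so the macros do not
  // expand here.  A member initializer spelled 'major (major)' would be
  // macro-expanded and fail to compile, which is why the members are
  // renamed.
  version_t (int major, int minor)
    : major_version (major), minor_version (minor)
  {}

  bool operator< (const version_t &other) const
  {
    if (major_version != other.major_version)
      return major_version < other.major_version;
    return minor_version < other.minor_version;
  }
};
```

Renaming only the members keeps the constructor's parameter names unchanged, as in the patch.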

gcc/ChangeLog:

* config/riscv/gen-riscv-ext-texi.cc (struct version_t): Rename
major/minor to major_version/minor_version.

Signed-off-by: Songhe Zhu 
---
 gcc/config/riscv/gen-riscv-ext-texi.cc | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/config/riscv/gen-riscv-ext-texi.cc 
b/gcc/config/riscv/gen-riscv-ext-texi.cc
index e15fdbf36f6..6fec179e6c5 100644
--- a/gcc/config/riscv/gen-riscv-ext-texi.cc
+++ b/gcc/config/riscv/gen-riscv-ext-texi.cc
@@ -6,22 +6,22 @@
 
 struct version_t
 {
-  int major;
-  int minor;
+  int major_version;
+  int minor_version;
   version_t (int major, int minor,
 enum riscv_isa_spec_class spec = ISA_SPEC_CLASS_NONE)
-: major (major), minor (minor)
+: major_version (major), minor_version (minor)
   {}
   bool operator<(const version_t &other) const
   {
-if (major != other.major)
-  return major < other.major;
-return minor < other.minor;
+if (major_version != other.major_version)
+  return major_version < other.major_version;
+return minor_version < other.minor_version;
   }
 
   bool operator== (const version_t &other) const
   {
-return major == other.major && minor == other.minor;
+return major_version == other.major_version && minor_version == 
other.minor_version;
   }
 };
 
-- 
2.17.1

Re: [PATCH] RISC-V: Add new operand constraint: cR

2025-05-18 Thread Kito Cheng

Committed :)

On Sat, May 17, 2025 at 9:36 PM Jeff Law  wrote:
>
>
>
> On 5/14/25 9:20 PM, Kito Cheng wrote:
> > This commit introduces a new operand constraint `cR` for the RISC-V
> > architecture, which allows the use of an even-odd RVC general purpose 
> > register
> > (x8-x15) in inline asm.
> >
> > Ref: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/102
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/constraints.md (cR): New constraint.
> >   * doc/md.texi (Machine Constraints::RISC-V): Document the new cR
> >   constraint.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/constraint-cR.c: New test case.
> OK
> jeff
>

[PATCH v1 0/8] RISC-V: Combine vec_duplicate + vrsub.vv to vrsub.vx on GR2VR cost

2025-05-18 Thread pan2 . li

From: Pan Li 

This patch series introduces the combine of vec_dup + vsub.vv into
vsub.vx, depending on the cost value of GR2VR.  The late-combine pass
performs the combine if the cost of GR2VR is zero, and rejects it if the
cost is non-zero (such as 1 or 15 in the tests).  There are two cases
for the combine:

Case 0:
 |   ...
 |   vmv.v.x
 | L1:
 |   vrsub.vv
 |   J L1
 |   ...

Case 1:
 |   ...
 | L1:
 |   vmv.v.x
 |   vrsub.vv
 |   J L1
 |   ...

Both will be combined to below if the cost of GR2VR is zero.
 |   ...
 | L1:
 |   vrsub.vx
 |   J L1
 |   ...
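
For orientation, a source loop of roughly the following shape subtracts each element from a loop-invariant scalar, which is the kind of code that can give rise to a broadcast followed by a vector subtract. The function name and types are assumptions and are not taken from the series' test files; whether the compiler actually emits vrsub.vx depends on the target options and the GR2VR cost.

```cpp
#include <cstddef>
#include <cstdint>

// Reverse subtract with a loop-invariant scalar: out[i] = x - in[i].
// Hypothetical example; not one of the vx_vf test cases.
void rsub_scalar (int32_t *out, const int32_t *in, int32_t x, std::size_t n)
{
  for (std::size_t i = 0; i < n; i++)
    out[i] = x - in[i];
}
```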

The following test suites passed for this patch series.
* The full rv64gcv regression test.

Pan Li (8):
  RISC-V: Combine vec_duplicate + vrsub.vv to vrsub.vx on GR2VR cost
  RISC-V: Add test for vec_duplicate + vrsub.vv combine case 0 with GR2VR cost 0
  RISC-V: Add test for vec_duplicate + vrsub.vv combine case 0 with GR2VR cost 1
  RISC-V: Add test for vec_duplicate + vrsub.vv combine case 0 with GR2VR cost 15
  RISC-V: Add test for vec_duplicate + vrsub.vv combine case 1 with GR2VR cost 0
  RISC-V: Add test for vec_duplicate + vrsub.vv combine case 1 with GR2VR cost 1
  RISC-V: Add test for vec_duplicate + vrsub.vv combine case 1 with GR2VR cost 2
  RISC-V: Tweak the asm check test of vx combine on GR2VR cost [NFC]

 gcc/config/riscv/autovec-opt.md   |  16 +-
 gcc/config/riscv/riscv-protos.h   |   2 +
 gcc/config/riscv/riscv-v.cc   |  49 +++
 .../riscv/rvv/autovec/vx_vf/vx-1-i16.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-1-i32.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-1-i64.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-1-i8.c |   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-1-u16.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-1-u32.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-1-u64.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-1-u8.c |   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-2-i16.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-2-i32.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-2-i64.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-2-i8.c |   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-2-u16.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-2-u32.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-2-u64.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-2-u8.c |   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-3-i16.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-3-i32.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-3-i64.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-3-i8.c |   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-3-u16.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-3-u32.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-3-u64.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-3-u8.c |   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-4-i16.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-4-i32.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-4-i64.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-4-i8.c |   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-4-u16.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-4-u32.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-4-u64.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-4-u8.c |   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-5-i16.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-5-i32.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-5-i64.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-5-i8.c |   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-5-u16.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-5-u32.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-5-u64.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-5-u8.c |   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-6-i16.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-6-i32.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-6-i64.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-6-i8.c |   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-6-u16.c|   9 +-
 .../riscv/rvv/autovec/vx_vf/vx-6-u32.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-6-u64.c|   8 +-
 .../riscv/rvv/autovec/vx_vf/vx-6-u8.c |   8 +-
 .../riscv/rvv/autovec/vx_vf/vx_binary.h   |  63 +++
 .../riscv/rvv/autovec/vx_vf/vx_binary_data.h  | 392 ++
 .../rvv/autovec/vx_vf/vx_vrsub-run-1-i16.c|  15 +
 .../rvv/autovec/vx_vf/vx_vrsub-run-1-i32.c|  15 +
 .../rvv/autovec/vx_vf/vx_vrsub-run-1-i64.c|  15 +
 .../rvv/autovec/vx_vf/vx_vrsub-run-1-i8.c |  15 +
 .../rvv/autovec/vx_vf/vx_vrsub-run-1-u16.c|  15 +
 .../rvv/autovec/vx_vf/vx_vrsub-run-1-u32.c|  15 +
 .../rvv/autovec/vx_vf/vx_vrsub-run-1-u64.c|  15 +
 .../rvv/autovec/vx_vf/vx_vrsub-run-1-u8.c |  15 +
 61 files changed, 922 insertions(+), 105 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx

[PATCH v1 1/8] RISC-V: Combine vec_duplicate + vrsub.vv to vrsub.vx on GR2VR cost

2025-05-18 Thread pan2 . li

From: Pan Li 

This patch combines vec_duplicate + vrsub.vv into vrsub.vx, as shown in
the example code below.  The related pattern depends on the cost of
vec_duplicate from GR2VR.  The late-combine pass performs the combination
if the GR2VR cost is zero, and rejects it if the GR2VR cost is greater
than zero.

Assume we have the example code below and the GR2VR cost is 0.

  #define DEF_VX_BINARY_REVERSE_CASE_0(T, OP, NAME)                     \
  void                                                                  \
  test_vx_binary_reverse_##NAME##_##T##_case_0 (T * restrict out,       \
                                                T * restrict in, T x,   \
                                                unsigned n)             \
  {                                                                     \
    for (unsigned i = 0; i < n; i++)                                    \
      out[i] = x OP in[i];                                              \
  }

  DEF_VX_BINARY_REVERSE_CASE_0(int32_t, -, rsub)
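
For reference, the macro can be exercised on its own.  The sketch below
instantiates it with rsub as the NAME argument, which is what the
assembly label test_vx_binary_reverse_rsub_int32_t_case_0 implies.  The
driver demo_reverse_sub and its input values are hypothetical and not
part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEF_VX_BINARY_REVERSE_CASE_0(T, OP, NAME)                     \
  void                                                                \
  test_vx_binary_reverse_##NAME##_##T##_case_0 (T * restrict out,     \
                                                T * restrict in, T x, \
                                                unsigned n)           \
  {                                                                   \
    for (unsigned i = 0; i < n; i++)                                  \
      out[i] = x OP in[i];                                            \
  }

/* Expands to test_vx_binary_reverse_rsub_int32_t_case_0.  */
DEF_VX_BINARY_REVERSE_CASE_0 (int32_t, -, rsub)

/* Hypothetical driver: the scalar is the left operand of the
   subtraction, which is the shape a reverse subtract computes.  */
void
demo_reverse_sub (void)
{
  int32_t in[4] = { 1, 2, 3, 4 };
  int32_t out[4] = { 0 };

  test_vx_binary_reverse_rsub_int32_t_case_0 (out, in, 10, 4);

  for (unsigned i = 0; i < 4; i++)
    assert (out[i] == 10 - in[i]);
}
```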

Before this patch:
  test_vx_binary_reverse_rsub_int32_t_case_0:
    beq a3,zero,.L27
    vsetvli a5,zero,e32,m1,ta,ma
    vmv.v.x v2,a2
    slli a3,a3,32
    srli a3,a3,32
  .L22:
    vsetvli a5,a3,e32,m1,ta,ma
    vle32.v v1,0(a1)
    slli a4,a5,2
    sub a3,a3,a5
    add a1,a1,a4
    vsub.vv v1,v2,v1
    vse32.v v1,0(a0)
    add a0,a0,a4
    bne a3,zero,.L22

After this patch:
  test_vx_binary_reverse_rsub_int32_t_case_0:
    beq a3,zero,.L27
    slli a3,a3,32
    srli a3,a3,32
  .L22:
    vsetvli a5,a3,e32,m1,ta,ma
    vle32.v v1,0(a1)
    slli a4,a5,2
    sub a3,a3,a5
    add a1,a1,a4
    vrsub.vx v1,v1,a2
    vse32.v v1,0(a0)
    add a0,a0,a4
    bne a3,zero,.L22

The following test suite passed for this patch:
* The full rv64gcv regression test.

gcc/ChangeLog:

* config/riscv/autovec-opt.md: Leverage the newly added funcs to
expand the vx insn.
* config/riscv/riscv-protos.h (expand_vx_binary_vec_dup_vec): Add
new func decl to expand format v = vop(vec_dup(x), v).
(expand_vx_binary_vec_vec_dup): Ditto but for format
v = vop(v, vec_dup(x)).
* config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add new
func impl to expand vx for v = vop(vec_dup(x), v).
(expand_vx_binary_vec_vec_dup): Ditto but for another format
v = vop(v, vec_dup(x)).

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec-opt.md | 16 +--
 gcc/config/riscv/riscv-protos.h |  2 ++
 gcc/config/riscv/riscv-v.cc | 49 +
 3 files changed, 59 insertions(+), 8 deletions(-)

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 9c6bf06c3a9..a972eda8de4 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1691,25 +1691,25 @@ (define_insn_and_split "*_vx_"
   "&& 1"
   [(const_int 0)]
   {
-rtx ops[] = {operands[0], operands[2], operands[1]};
-riscv_vector::emit_vlmax_insn (code_for_pred_scalar (, mode),
-  riscv_vector::BINARY_OP, ops);
+riscv_vector::expand_vx_binary_vec_dup_vec (operands[0], operands[2],
+   operands[1], ,
+   mode);
   }
   [(set_attr "type" "vialu")])
 
 (define_insn_and_split "*_vx_"
  [(set (match_operand:V_VLSI0 "register_operand")
(any_int_binop_no_shift_vx:V_VLSI
-(match_operand:V_VLSI  2 "")
+(match_operand:V_VLSI  1 "")
 (vec_duplicate:V_VLSI
-  (match_operand: 1 "register_operand"]
+  (match_operand: 2 "register_operand"]
   "TARGET_VECTOR && can_create_pseudo_p ()"
   "#"
   "&& 1"
   [(const_int 0)]
   {
-rtx ops[] = {operands[0], operands[2], operands[1]};
-riscv_vector::emit_vlmax_insn (code_for_pred_scalar (, mode),
-  riscv_vector::BINARY_OP, ops);
+riscv_vector::expand_vx_binary_vec_vec_dup (operands[0], operands[1],
+   operands[2], ,
+   mode);
   }
   [(set_attr "type" "vialu")])
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 271a9a3228d..b39b858acac 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -667,6 +667,8 @@ void expand_vec_oct_ustrunc (rtx, rtx, machine_mode, machine_mode,
 machine_mode);
 void expand_vec_oct_sstrunc (rtx, rtx, machine_mode, machine_mode,
 machine_mode);
+void expand_vx_binary_vec_dup_vec (rtx, rtx, rtx, rtx_code, machine_mode);
+void expand_vx_binary_vec_vec_dup (rtx, rtx, r

[PATCH v1 4/8] RISC-V: Add test for vec_duplicate + vrsub.vv combine case 0 with GR2VR cost 15

2025-05-18 Thread pan2 . li

From: Pan Li 

Add asm dump check test for vec_duplicate + vrsub.vv combine to vrsub.vx.

The following test suite passed for this patch:
* The full rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Add asm check
for vrsub with GR2VR cost 15.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c  | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c  | 2 ++
 8 files changed, 16 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c
index aa21e10130b..b5f36ff3a44 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_0(int16_t, +, add)
 DEF_VX_BINARY_CASE_0(int16_t, -, sub)
+DEF_VX_BINARY_REVERSE_CASE_0(int16_t, -, rsub);
 
 /* { dg-final { scan-assembler-not {vadd.vx} } } */
 /* { dg-final { scan-assembler-not {vsub.vx} } } */
+/* { dg-final { scan-assembler-not {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c
index 7c374694321..93ba98d57e9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_0(int32_t, +, add)
 DEF_VX_BINARY_CASE_0(int32_t, -, sub)
+DEF_VX_BINARY_REVERSE_CASE_0(int32_t, -, rsub);
 
 /* { dg-final { scan-assembler-not {vadd.vx} } } */
 /* { dg-final { scan-assembler-not {vsub.vx} } } */
+/* { dg-final { scan-assembler-not {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c
index 3efb0d7e92e..e73fbce0106 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_0(int64_t, +, add)
 DEF_VX_BINARY_CASE_0(int64_t, -, sub)
+DEF_VX_BINARY_REVERSE_CASE_0(int64_t, -, rsub);
 
 /* { dg-final { scan-assembler-not {vadd.vx} } } */
 /* { dg-final { scan-assembler-not {vsub.vx} } } */
+/* { dg-final { scan-assembler-not {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c
index d823ed9cc9a..2a3a6f1884b 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_0(int8_t, +, add)
 DEF_VX_BINARY_CASE_0(int8_t, -, sub)
+DEF_VX_BINARY_REVERSE_CASE_0(int8_t, -, rsub);
 
 /* { dg-final { scan-assembler-not {vadd.vx} } } */
 /* { dg-final { scan-assembler-not {vsub.vx} } } */
+/* { dg-final { scan-assembler-not {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c
index 1ab09c8d78e..63358cd3354 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_0(uint16_t, +, add)
 DEF_VX_BINARY_CASE_0(uint16_t, -, sub)
+DEF_VX_BINARY_REVERSE_CASE_0(uint16_t, -, rsub);
 
 /* { dg-final { scan-assembler-not {vadd.vx} } } */
 /* { dg-final { scan-assembler-not {vsub.vx} } } */
+/* { dg-final { scan-assembler-not {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c
index 9247db70154..6ed098773c7 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_0(uint32_t, +, add)
 DEF_VX_BINARY_CASE_0(uint32_t, -, sub)
+DEF_VX_BINARY_REVERSE_CASE_0(uint32_t, -, rsub);
 
 /* { dg-final { scan-assembler-not {vadd.vx} } } */
 /* { dg-final { scan-assembler-not {vsub.vx} } } */
+/* { d

[PATCH v1 2/8] RISC-V: Add test for vec_duplicate + vrsub.vv combine case 0 with GR2VR cost 0

2025-05-18 Thread pan2 . li

From: Pan Li 

Add asm dump check and run test for vec_duplicate + vrsub.vv combine to vrsub.vx.
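
To illustrate what a run test of this kind checks, the sketch below pairs
a small table of inputs and expected results with the reverse-subtract
loop and asserts that they match.  The struct layout, the names
rsub_case, rsub_cases, rsub_i16 and run_rsub_cases, and all data values
are assumptions for illustration; the actual contents of vx_binary.h and
vx_binary_data.h are not shown in this message.

```c
#include <assert.h>
#include <stdint.h>

#define N_ELEMS 4 /* illustrative element count */

/* Hypothetical layout: scalar operand, input vector, expected output.  */
struct rsub_case
{
  int16_t x;
  int16_t in[N_ELEMS];
  int16_t expect[N_ELEMS];
};

static const struct rsub_case rsub_cases[] = {
  { 0,  { 1, 2, 3, 4 },   { -1, -2, -3, -4 } },
  { 10, { 1, 2, 3, 4 },   { 9, 8, 7, 6 } },
  { -5, { -5, 0, 5, 10 }, { 0, -5, -10, -15 } },
};

/* Function under test: out[i] = x - in[i], the loop shape that the
   combine is expected to turn into vrsub.vx.  */
static void
rsub_i16 (int16_t *restrict out, const int16_t *restrict in, int16_t x,
          unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = (int16_t) (x - in[i]);
}

/* Run every case and assert that each element matches.  */
void
run_rsub_cases (void)
{
  for (unsigned c = 0; c < sizeof rsub_cases / sizeof rsub_cases[0]; c++)
    {
      int16_t out[N_ELEMS];

      rsub_i16 (out, rsub_cases[c].in, rsub_cases[c].x, N_ELEMS);

      for (unsigned i = 0; i < N_ELEMS; i++)
        assert (out[i] == rsub_cases[c].expect[i]);
    }
}
```

The asm dump check and the run test are complementary: the first confirms
that vrsub.vx is emitted, the second that the emitted code still computes
x - in[i].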

The following test suite passed for this patch:
* The full rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add vrsub asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test helper
macros for vx binary reversed.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test
data for vrsub.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-u8.c: New test.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/vx_vf/vx-1-i16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-i32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-i64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-i8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx_binary.h   |  63 +++
 .../riscv/rvv/autovec/vx_vf/vx_binary_data.h  | 392 ++
 .../rvv/autovec/vx_vf/vx_vrsub-run-1-i16.c|  15 +
 .../rvv/autovec/vx_vf/vx_vrsub-run-1-i32.c|  15 +
 .../rvv/autovec/vx_vf/vx_vrsub-run-1-i64.c|  15 +
 .../rvv/autovec/vx_vf/vx_vrsub-run-1-i8.c |  15 +
 .../rvv/autovec/vx_vf/vx_vrsub-run-1-u16.c|  15 +
 .../rvv/autovec/vx_vf/vx_vrsub-run-1-u32.c|  15 +
 .../rvv/autovec/vx_vf/vx_vrsub-run-1-u64.c|  15 +
 .../rvv/autovec/vx_vf/vx_vrsub-run-1-u8.c |  15 +
 18 files changed, 591 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-i8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-u16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-u32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-u64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-u8.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c
index c6b25f1b857..015b088 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_0(int16_t, +, add)
 DEF_VX_BINARY_CASE_0(int16_t, -, sub)
+DEF_VX_BINARY_REVERSE_CASE_0(int16_t, -, rsub);
 
 /* { dg-final { scan-assembler-times {vadd.vx} 1 } } */
 /* { dg-final { scan-assembler-times {vsub.vx} 1 } } */
+/* { dg-final { scan-assembler-times {vrsub.vx} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c
index cb4ccfa1790..f0a88e8da87 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_0(int32_t, +, add)
 DEF_VX_BINARY_CASE_0(int32_t, -, sub)
+DEF_VX_BINARY_REVERSE_CASE_0(int32_t, -, rsub);
 
 /* { dg-final { scan-assembler-times {vadd.vx} 1 } } */
 /* { dg-final { scan-assembler-times {vsub.vx} 1 } } */
+/* { dg-final { scan-assembler-times {vrsub.vx} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c
index bf249846452..fbf9f6a930f 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_0(int64_t, +, add)
 DEF_VX_BINARY_CASE_0(int6

[PATCH v1 5/8] RISC-V: Add test for vec_duplicate + vrsub.vv combine case 1 with GR2VR cost 0

2025-05-18 Thread pan2 . li

From: Pan Li 

Add asm dump check test for vec_duplicate + vrsub.vv combine to vrsub.vx.

The following test suite passed for this patch:
* The full rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm check
for vrsub case 1 with GR2VR cost 0.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c: Ditto.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c  | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c  | 2 ++
 8 files changed, 16 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c
index 0ae0566fcfb..4d108569313 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_1(int16_t, +, add, VX_BINARY_BODY_X16)
 DEF_VX_BINARY_CASE_1(int16_t, -, sub, VX_BINARY_BODY_X16)
+DEF_VX_BINARY_REVERSE_CASE_1(int16_t, -, rsub, VX_BINARY_REVERSE_BODY_X16);
 
 /* { dg-final { scan-assembler {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
+/* { dg-final { scan-assembler {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c
index 86085d12cf7..410d9ffcfea 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_1(int32_t, +, add, VX_BINARY_BODY_X4)
 DEF_VX_BINARY_CASE_1(int32_t, -, sub, VX_BINARY_BODY_X4)
+DEF_VX_BINARY_REVERSE_CASE_1(int32_t, -, rsub, VX_BINARY_REVERSE_BODY_X4);
 
 /* { dg-final { scan-assembler {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
+/* { dg-final { scan-assembler {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c
index 9d89db3d489..51b207055bd 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_1(int64_t, +, add, VX_BINARY_BODY)
 DEF_VX_BINARY_CASE_1(int64_t, -, sub, VX_BINARY_BODY)
+DEF_VX_BINARY_REVERSE_CASE_1(int64_t, -, rsub, VX_BINARY_REVERSE_BODY);
 
 /* { dg-final { scan-assembler {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
+/* { dg-final { scan-assembler {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c
index 40b02db8a01..ff7773daee3 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_1(int8_t, +, add, VX_BINARY_BODY_X16)
 DEF_VX_BINARY_CASE_1(int8_t, -, sub, VX_BINARY_BODY_X16)
+DEF_VX_BINARY_REVERSE_CASE_1(int8_t, -, rsub, VX_BINARY_REVERSE_BODY_X16);
 
 /* { dg-final { scan-assembler {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
+/* { dg-final { scan-assembler {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c
index ca2010685d8..00110752964 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_1(uint16_t, +, add, VX_BINARY_BODY_X16)
 DEF_VX_BINARY_CASE_1(uint16_t, -, sub, VX_BINARY_BODY_X16)
+DEF_VX_BINARY_REVERSE_CASE_1(uint16_t, -, rsub, VX_BINARY_REVERSE_BODY_X16);
 
 /* { dg-final { scan-assembler {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
+/* { dg-final { scan-assembler {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c
index 6e2456c41e4..ecd405a3574 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u

[PATCH v1 8/8] RISC-V: Tweak the asm check test of vx combine on GR2VR cost [NFC]

2025-05-18 Thread pan2 . li

From: Pan Li 

Tweak the asm check tests to define the element type as a macro T (for
example, uint8_t).  This makes it easier to add more vx tests and reduces
the chance of mistakes.
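
One detail worth keeping in mind with this approach: when a type macro
such as T is passed to a macro that token-pastes its argument, an extra
level of macro indirection is needed so that T is expanded before the ##
operator sees it.  The sketch below shows the idea.  The wrapper name
DEF_VX_BINARY_CASE_0_WRAP, the loop body in[i] OP x, and the driver are
assumptions for illustration and may differ from the actual test headers.

```c
#include <assert.h>
#include <stdint.h>

#define DEF_VX_BINARY_CASE_0(T, OP, NAME)                       \
  void                                                          \
  test_vx_binary_##NAME##_##T##_case_0 (T *restrict out,        \
                                        T *restrict in, T x,    \
                                        unsigned n)             \
  {                                                             \
    for (unsigned i = 0; i < n; i++)                            \
      out[i] = in[i] OP x;                                      \
  }

/* Hypothetical wrapper: its arguments are macro-expanded before they
   reach the ## operators above, so T becomes uint8_t in the name.  */
#define DEF_VX_BINARY_CASE_0_WRAP(T, OP, NAME) \
  DEF_VX_BINARY_CASE_0 (T, OP, NAME)

/* Changing the element type of the whole file is a one-line edit.  */
#define T uint8_t

/* Defines test_vx_binary_add_uint8_t_case_0.  */
DEF_VX_BINARY_CASE_0_WRAP (T, +, add)

/* Hypothetical driver, not part of the patch.  */
void
demo_type_macro (void)
{
  T in[3] = { 1, 2, 250 };
  T out[3];

  test_vx_binary_add_uint8_t_case_0 (out, in, 10, 3);

  assert (out[0] == 11);
  assert (out[1] == 12);
  assert (out[2] == 4); /* 260 wraps modulo 256 for uint8_t */
}
```

Calling DEF_VX_BINARY_CASE_0 (T, +, add) directly would instead paste the
literal token T into the function name.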

The following test suite passed for this patch:
* The full rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Extract
define T as type for testing.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c: Ditto.

Signed-off-by: Pan Li 
---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c| 8 +---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c| 8 +---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c| 8 +---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c | 8 +---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c| 8 +---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c| 8 +---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c| 8 +---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c | 8 +---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c| 8 +---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c| 8 +---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c| 8 +---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c | 8 +---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c| 8 +---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c| 8 +---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c| 8 +---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c | 8 +---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c| 8 +---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c| 8 +---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c| 8 +---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c | 8 +---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c| 8 +---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c| 8 +---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c| 8

[PATCH v1 7/8] RISC-V: Add test for vec_duplicate + vrsub.vv combine case 1 with GR2VR cost 2

2025-05-18 Thread pan2 . li

From: Pan Li 

Add asm dump check test for vec_duplicate + vrsub.vv combine to vrsub.vx.

The following test suite passed for this patch:
* The full rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c: Add asm check
for vrsub with GR2VR cost 2.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c: Ditto.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c  | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c  | 2 ++
 8 files changed, 16 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c
index 0e5ad322aa5..ce1b40fd174 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_1(int16_t, +, add, VX_BINARY_BODY_X8)
 DEF_VX_BINARY_CASE_1(int16_t, -, sub, VX_BINARY_BODY_X8)
+DEF_VX_BINARY_REVERSE_CASE_1(int16_t, -, rsub, VX_BINARY_REVERSE_BODY_X8);
 
 /* { dg-final { scan-assembler-not {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
+/* { dg-final { scan-assembler {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c
index b46b74a0887..7326ded06f0 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_1(int32_t, +, add, VX_BINARY_BODY_X4)
 DEF_VX_BINARY_CASE_1(int32_t, -, sub, VX_BINARY_BODY_X4)
+DEF_VX_BINARY_REVERSE_CASE_1(int32_t, -, rsub, VX_BINARY_REVERSE_BODY_X4);
 
 /* { dg-final { scan-assembler {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
+/* { dg-final { scan-assembler {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c
index 13e64d7752b..7b8b63dd3ce 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_1(int64_t, +, add, VX_BINARY_BODY)
 DEF_VX_BINARY_CASE_1(int64_t, -, sub, VX_BINARY_BODY)
+DEF_VX_BINARY_REVERSE_CASE_1(int64_t, -, rsub, VX_BINARY_REVERSE_BODY);
 
 /* { dg-final { scan-assembler-not {vadd.vx} } } */
 /* { dg-final { scan-assembler-not {vsub.vx} } } */
+/* { dg-final { scan-assembler-not {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c
index 1f58daaad38..f440b7075dc 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_1(int8_t, +, add, VX_BINARY_BODY_X16)
 DEF_VX_BINARY_CASE_1(int8_t, -, sub, VX_BINARY_BODY_X16)
+DEF_VX_BINARY_REVERSE_CASE_1(int8_t, -, rsub, VX_BINARY_REVERSE_BODY_X16);
 
 /* { dg-final { scan-assembler-not {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
+/* { dg-final { scan-assembler {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c
index 2249cb242fe..c36c5cb6416 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c
@@ -6,6 +6,8 @@
 
 DEF_VX_BINARY_CASE_1(uint16_t, +, add, VX_BINARY_BODY_X8)
 DEF_VX_BINARY_CASE_1(uint16_t, -, sub, VX_BINARY_BODY_X8)
+DEF_VX_BINARY_REVERSE_CASE_1(uint16_t, -, rsub, VX_BINARY_REVERSE_BODY_X8);
 
 /* { dg-final { scan-assembler {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
+/* { dg-final { scan-assembler {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c
index d768fc72141..cfbcd9e5772 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf

[PATCH v1 6/8] RISC-V: Add test for vec_duplicate + vrsub.vv combine case 1 with GR2VR cost 1

2025-05-18 Thread pan2 . li

From: Pan Li 

Add asm dump check test for vec_duplicate + vrsub.vv combine to vrsub.vx.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c: Add asm check
for vrsub with GR2VR cost 1.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c: Ditto.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c  | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c  | 2 ++
 8 files changed, 16 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c
index 05742671003..3f33c45fafb 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_1(int16_t, +, add, VX_BINARY_BODY_X8)
 DEF_VX_BINARY_CASE_1(int16_t, -, sub, VX_BINARY_BODY_X8)
+DEF_VX_BINARY_REVERSE_CASE_1(int16_t, -, rsub, VX_BINARY_REVERSE_BODY_X8);
 
 /* { dg-final { scan-assembler-not {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
+/* { dg-final { scan-assembler {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c
index f990e34355e..059cf0b1d2e 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_1(int32_t, +, add, VX_BINARY_BODY_X4)
 DEF_VX_BINARY_CASE_1(int32_t, -, sub, VX_BINARY_BODY_X4)
+DEF_VX_BINARY_REVERSE_CASE_1(int32_t, -, rsub, VX_BINARY_REVERSE_BODY_X4);
 
 /* { dg-final { scan-assembler {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
+/* { dg-final { scan-assembler {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c
index 3b189e31c6f..9ac1dd06714 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_1(int64_t, +, add, VX_BINARY_BODY)
 DEF_VX_BINARY_CASE_1(int64_t, -, sub, VX_BINARY_BODY)
+DEF_VX_BINARY_REVERSE_CASE_1(int64_t, -, rsub, VX_BINARY_REVERSE_BODY);
 
 /* { dg-final { scan-assembler {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
+/* { dg-final { scan-assembler {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c
index 3590b88d761..63d0a820aa9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_1(int8_t, +, add, VX_BINARY_BODY_X16)
 DEF_VX_BINARY_CASE_1(int8_t, -, sub, VX_BINARY_BODY_X16)
+DEF_VX_BINARY_REVERSE_CASE_1(int8_t, -, rsub, VX_BINARY_REVERSE_BODY_X16);
 
 /* { dg-final { scan-assembler-not {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
+/* { dg-final { scan-assembler {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c
index 994c7f24652..fe0ab0ea081 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_1(uint16_t, +, add, VX_BINARY_BODY_X8)
 DEF_VX_BINARY_CASE_1(uint16_t, -, sub, VX_BINARY_BODY_X8)
+DEF_VX_BINARY_REVERSE_CASE_1(uint16_t, -, rsub, VX_BINARY_REVERSE_BODY_X8);
 
 /* { dg-final { scan-assembler {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
+/* { dg-final { scan-assembler {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c
index 2aceab5ff51..305f3564bb5 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c

[PATCH v1 3/8] RISC-V: Add test for vec_duplicate + vrsub.vv combine case 0 with GR2VR cost 1

2025-05-18 Thread pan2 . li

From: Pan Li 

Add asm dump check test for vec_duplicate + vrsub.vv combine to vrsub.vx.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Add vrsub asm
dump check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c  | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c  | 2 ++
 8 files changed, 16 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c
index 49e9957cf15..c55eaaac278 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_0(int16_t, +, add)
 DEF_VX_BINARY_CASE_0(int16_t, -, sub)
+DEF_VX_BINARY_REVERSE_CASE_0(int16_t, -, rsub);
 
 /* { dg-final { scan-assembler-not {vadd.vx} } } */
 /* { dg-final { scan-assembler-not {vsub.vx} } } */
+/* { dg-final { scan-assembler-not {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c
index 869f9fd7e24..0a0258ccfee 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_0(int32_t, +, add)
 DEF_VX_BINARY_CASE_0(int32_t, -, sub)
+DEF_VX_BINARY_REVERSE_CASE_0(int32_t, -, rsub);
 
 /* { dg-final { scan-assembler-not {vadd.vx} } } */
 /* { dg-final { scan-assembler-not {vsub.vx} } } */
+/* { dg-final { scan-assembler-not {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c
index 6ba71431997..4956315ee14 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_0(int64_t, +, add)
 DEF_VX_BINARY_CASE_0(int64_t, -, sub)
+DEF_VX_BINARY_REVERSE_CASE_0(int64_t, -, rsub);
 
 /* { dg-final { scan-assembler-not {vadd.vx} } } */
 /* { dg-final { scan-assembler-not {vsub.vx} } } */
+/* { dg-final { scan-assembler-not {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c
index 128a279dbb2..c1fa3b605d7 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_0(int8_t, +, add)
 DEF_VX_BINARY_CASE_0(int8_t, -, sub)
+DEF_VX_BINARY_REVERSE_CASE_0(int8_t, -, rsub);
 
 /* { dg-final { scan-assembler-not {vadd.vx} } } */
 /* { dg-final { scan-assembler-not {vsub.vx} } } */
+/* { dg-final { scan-assembler-not {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c
index a2a35ccd8f1..5dca3850240 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_0(uint16_t, +, add)
 DEF_VX_BINARY_CASE_0(uint16_t, -, sub)
+DEF_VX_BINARY_REVERSE_CASE_0(uint16_t, -, rsub);
 
 /* { dg-final { scan-assembler-not {vadd.vx} } } */
 /* { dg-final { scan-assembler-not {vsub.vx} } } */
+/* { dg-final { scan-assembler-not {vrsub.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c
index bd89bfa6fd0..4460fc06d00 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c
@@ -5,6 +5,8 @@
 
 DEF_VX_BINARY_CASE_0(uint32_t, +, add)
 DEF_VX_BINARY_CASE_0(uint32_t, -, sub)
+DEF_VX_BINARY_REVERSE_CASE_0(uint32_t, -, rsub);
 
 /* { dg-final { scan-assembler-not {vadd.vx} } } */
 /* { dg-final { scan-assembler-not {vsub.vx} } } */
+/* { dg-final { scan-assembl

Re: [PATCH] RISC-V: Support Zilsd code gen

2025-05-18 Thread Kito Cheng

On Sat, May 17, 2025 at 9:34 PM Jeff Law  wrote:
>
>
>
> On 5/14/25 9:14 PM, Kito Cheng wrote:
> > This commit adds the code gen support for Zilsd, which is a
> > newly added extension for RISC-V. The Zilsd extension allows
> > for loading and storing 64-bit values using even-odd register
> > pairs.
> >
> > We only try to do minimal code gen support for now, which means we only
> > use the new instructions when the load/store is 64 bits of data. We could
> > also use them to optimize the code gen of memcpy/memset/memmove and the
> > prologue and epilogue of functions, but I think that should probably be
> > done in a follow-up patch.
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv.cc (riscv_legitimize_move): Handle
> >   load/store with odd-even reg pair.
> >   (riscv_split_64bit_move_p): Don't split load/store if zilsd enabled.
> >   (riscv_hard_regno_mode_ok): Only allow even reg can be used for
> >   64 bits mode for zilsd.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/zilsd-code-gen.c: New test.
> > ---
> >   gcc/config/riscv/riscv.cc | 38 +++
> >   .../gcc.target/riscv/zilsd-code-gen.c | 18 +
> >   2 files changed, 56 insertions(+)
> >   create mode 100644 gcc/testsuite/gcc.target/riscv/zilsd-code-gen.c
> >
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index d28aee4b439..f5ee3ce9034 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -3742,6 +3742,25 @@ riscv_legitimize_move (machine_mode mode, rtx dest, 
> > rtx src)
> > return true;
> >   }
> >
> > +  if (TARGET_ZILSD
> > +  && (GET_MODE_UNIT_SIZE (mode) == (UNITS_PER_WORD * 2))
> > +  && ((REG_P (dest) && MEM_P (src))
> > +   || (MEM_P (dest) && REG_P (src))))
> > +{
> > +  rtx reg = REG_P (dest) ? dest : src;
> > +  unsigned regno = REGNO (reg);
> > +  /* ZILSD require even-odd register pair, let RA to
> > +  fix the constraint if the reg is hard reg and not even reg.  */
> > +  if ((regno < FIRST_PSEUDO_REGISTER)
> > +   && (regno % 2) != 0)
> > + {
> > +   rtx tmp = gen_reg_rtx (GET_MODE (reg));
> > +   emit_move_insn (tmp, src);
> > +   emit_move_insn (dest, tmp);
> > +   return true;
> > + }
> AFAICT this will only ever be called by the various movXX expanders, but
> those can be called during IRA, so we probably should be a bit safer here.
>
> We could either add can_create_pseudo_p to the guard or we could assert
> it's true.  The former would be most appropriate if the rest of the code
> will still do the right thing, the latter if not.

Good suggestion, I guess just adding can_create_pseudo_p in the
if-condition would be fine;
we already have a splitter later to handle the cases we can't handle here.

>
>
> > @@ -9799,6 +9831,12 @@ riscv_hard_regno_mode_ok (unsigned int regno, 
> > machine_mode mode)
> > if (riscv_v_ext_mode_p (mode))
> >   return false;
> >
> > +  /* Zilsd require load/store with even-odd reg pair.  */
> > +  if (TARGET_ZILSD
> > +   && (GET_MODE_UNIT_SIZE (mode) == (UNITS_PER_WORD * 2))
> > +   && ((regno % 2) != 0))
> > + return false;
> Do you need to check that you're working with a GPR here?

We have checked that a few lines before, so we are safe here, but it's
not easy to see from the diff since it shows only 3 lines of context;
hope one day we can migrate to something like github...
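
To make the two checks discussed above concrete, here is a simplified standalone model. It is not GCC code: the constants and function names are hypothetical, only GPRs are modelled, and the can_create_pseudo flag stands in for the can_create_pseudo_p guard suggested in the review.

```cpp
// Simplified model of the Zilsd even-odd register pair rule.
// Assumptions (not from the patch): rv32 with 4-byte GPRs, and a register
// file that contains only the 32 GPRs, so hard registers are 0..31.
constexpr unsigned kUnitsPerWord = 4;
constexpr unsigned kFirstPseudoRegister = 32;

// Models the riscv_hard_regno_mode_ok addition: a double-word value may
// only start in an even-numbered register when Zilsd is enabled.
bool zilsd_regno_ok_sketch(bool zilsd, unsigned regno, unsigned mode_size)
{
  if (zilsd && mode_size == kUnitsPerWord * 2 && regno % 2 != 0)
    return false;
  return true;
}

// Models the riscv_legitimize_move addition, with the extra guard from the
// review: only go through a fresh pseudo when one can still be created.
bool needs_pseudo_copy_sketch(bool zilsd, unsigned regno, unsigned mode_size,
                              bool can_create_pseudo)
{
  return zilsd
         && can_create_pseudo
         && mode_size == kUnitsPerWord * 2
         && regno < kFirstPseudoRegister  // hard register, not a pseudo
         && regno % 2 != 0;               // odd register cannot start a pair
}
```

In this model, when no pseudo can be created the move is left alone, which matches the remark above that a later splitter handles the remaining cases.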

>
> At a higher level, my understanding is zilsd is only for rv32.  Do we
> want to be extra safe and check TARGET_32BIT alongside TARGET_ZILSD?

We have checked that during arch string parsing, so I'm inclined not
to check that again in other places :)

>
> Take the action you feel is appropriate on the above issues and the
> result is pre-approved for the trunk.

Thanks for reviewing, I will commit after applying those small changes
and testing :)

>
> Thanks,
> jeff
>

Re: [PATCH] RISC-V: Add zvfbfa and zvfofp8min intrinsic.

2025-05-18 Thread Kito Cheng

Seems like you don't really add new intrinsics for those two new
extensions? Also, our policy is to add extensions only once they are
ratified.

I am happy to review the patch anyway, but just a reminder that we won't
accept it until it is ratified :)

On Mon, Apr 14, 2025 at 4:25 PM Dongyan Chen
 wrote:
>
> This patch adds the zvfbfa and zvfofp8min intrinsics[1], so that GCC can
> recognize and process the zvfbfa and zvfofp8min extensions correctly at
> compile time.
>
> [1]https://github.com/aswaterman/riscv-misc/blob/e515758c24504cf3c16145bc763a76c59425ed1b/isa/zvfbfa.adoc
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc: New intrinsic.
> * config/riscv/riscv-vector-builtins.cc 
> (validate_instance_type_required_extensions): Add required_ext checking for 
> 'zvfbfa' and 'zvfofp8min'.
> * config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_ELEN_OFP_8): New 
> bit value for OFP8.
> (enum required_ext): Add required_ext declaration for 'zvfbfa' and 
> 'zvfofp8min'.
> (required_ext_to_isa_name): Ditto.
> (required_extensions_specified): Ditto.
> (struct function_group_info): Add match case for 'zvfbfa' and 
> 'zvfofp8min'.
> * config/riscv/riscv.opt: New mask for 'zvfbfa' and 'zvfofp8min'.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/arch-45.c: New test.
> * gcc.target/riscv/arch-46.c: New test.
>
> ---
>  gcc/common/config/riscv/riscv-common.cc   |  9 +
>  gcc/config/riscv/riscv-vector-builtins.cc | 20 
>  gcc/config/riscv/riscv-vector-builtins.h  | 15 +++
>  gcc/config/riscv/riscv.opt|  6 ++
>  gcc/testsuite/gcc.target/riscv/arch-45.c  |  5 +
>  gcc/testsuite/gcc.target/riscv/arch-46.c  |  5 +
>  6 files changed, 60 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/arch-45.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/arch-46.c
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index b34409adf39c..7aaa9d92455b 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -193,12 +193,15 @@ static const riscv_implied_info_t riscv_implied_info[] =
>
>{"zfa", "f"},
>
> +  {"zvfbfa", "zve32f"},
> +  {"zvfbfa", "zfbfmin"},
>{"zvfbfmin", "zve32f"},
>{"zvfbfwma", "zvfbfmin"},
>{"zvfbfwma", "zfbfmin"},
>{"zvfhmin", "zve32f"},
>{"zvfh", "zve32f"},
>{"zvfh", "zfhmin"},
> +  {"zvfofp8min", "zve32f"},
>
>{"zhinx", "zhinxmin"},
>{"zhinxmin", "zfinx"},
> @@ -383,10 +386,12 @@ static const struct riscv_ext_version 
> riscv_ext_version_table[] =
>{"zfbfmin",   ISA_SPEC_CLASS_NONE, 1, 0},
>{"zfh",   ISA_SPEC_CLASS_NONE, 1, 0},
>{"zfhmin",ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zvfbfa",ISA_SPEC_CLASS_NONE, 0, 1},
>{"zvfbfmin",  ISA_SPEC_CLASS_NONE, 1, 0},
>{"zvfbfwma",  ISA_SPEC_CLASS_NONE, 1, 0},
>{"zvfhmin",   ISA_SPEC_CLASS_NONE, 1, 0},
>{"zvfh",  ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zvfofp8min",ISA_SPEC_CLASS_NONE, 0, 2},
>
>{"zfa", ISA_SPEC_CLASS_NONE, 1, 0},
>
> @@ -1676,10 +1681,12 @@ static const riscv_ext_flag_table_t 
> riscv_ext_flag_table[] =
>RISCV_EXT_FLAG_ENTRY ("zve64x",   x_riscv_vector_elen_flags, 
> MASK_VECTOR_ELEN_64),
>RISCV_EXT_FLAG_ENTRY ("zve64f",   x_riscv_vector_elen_flags, 
> MASK_VECTOR_ELEN_FP_32),
>RISCV_EXT_FLAG_ENTRY ("zve64d",   x_riscv_vector_elen_flags, 
> MASK_VECTOR_ELEN_FP_64),
> +  RISCV_EXT_FLAG_ENTRY ("zvfbfa",   x_riscv_vector_elen_flags, 
> MASK_VECTOR_ELEN_BF_16),
>RISCV_EXT_FLAG_ENTRY ("zvfbfmin", x_riscv_vector_elen_flags, 
> MASK_VECTOR_ELEN_BF_16),
>RISCV_EXT_FLAG_ENTRY ("zvfbfwma", x_riscv_vector_elen_flags, 
> MASK_VECTOR_ELEN_BF_16),
>RISCV_EXT_FLAG_ENTRY ("zvfhmin",  x_riscv_vector_elen_flags, 
> MASK_VECTOR_ELEN_FP_16),
>RISCV_EXT_FLAG_ENTRY ("zvfh", x_riscv_vector_elen_flags, 
> MASK_VECTOR_ELEN_FP_16),
> +  RISCV_EXT_FLAG_ENTRY ("zvfofp8min",   x_riscv_vector_elen_flags, 
> MASK_VECTOR_ELEN_OFP_8),
>
>RISCV_EXT_FLAG_ENTRY ("zvbb",   x_riscv_zvb_subext, MASK_ZVBB),
>RISCV_EXT_FLAG_ENTRY ("zvbc",   x_riscv_zvb_subext, MASK_ZVBC),
> @@ -1714,10 +1721,12 @@ static const riscv_ext_flag_table_t 
> riscv_ext_flag_table[] =
>RISCV_EXT_FLAG_ENTRY ("zfbfmin",  x_riscv_zf_subext, MASK_ZFBFMIN),
>RISCV_EXT_FLAG_ENTRY ("zfhmin",   x_riscv_zf_subext, MASK_ZFHMIN),
>RISCV_EXT_FLAG_ENTRY ("zfh",  x_riscv_zf_subext, MASK_ZFH),
> +  RISCV_EXT_FLAG_ENTRY ("zvfbfa",   x_riscv_zf_subext, MASK_ZVFBFA),
>RISCV_EXT_FLAG_ENTRY ("zvfbfmin", x_riscv_zf_subext, MASK_ZVFBFMIN),
>RISCV_EXT_FLAG_ENTRY ("zvfbfwma", x_riscv_zf_subext, MASK_ZVFBFWMA),
>RISCV_EXT_FLAG_ENTRY ("zvfhmin",  x_riscv_zf_subext, MASK_ZVFHMIN),
>RISCV_EXT_FLAG_ENTRY ("zvfh", x_riscv_zf_subext, MASK_ZVFH),
> +  RISCV_EXT_FLAG_ENTRY ("zvfofp8min",   x_riscv_zf_subext, MASK_ZVFOFP8MIN),
>
>RISCV_EX

[PATCH] libstdc++: Implement C++23 P1659R3 starts_with and ends_with

2025-05-18 Thread Patrick Palka

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

-- >8 --

libstdc++-v3/ChangeLog:

* include/bits/ranges_algo.h (__starts_with_fn, starts_with):
Define.
(__ends_with_fn, ends_with): Define.
* include/bits/version.def (ranges_starts_ends_with): Define.
* include/bits/version.h: Regenerate.
* include/std/algorithm: Provide __cpp_lib_ranges_starts_ends_with.
* src/c++23/std.cc.in (ranges::starts_with): Export.
(ranges::ends_with): Export.
* testsuite/25_algorithms/ends_with/1.cc: New test.
* testsuite/25_algorithms/starts_with/1.cc: New test.
---
 libstdc++-v3/include/bits/ranges_algo.h   | 232 ++
 libstdc++-v3/include/bits/version.def |   8 +
 libstdc++-v3/include/bits/version.h   |  10 +
 libstdc++-v3/include/std/algorithm|   1 +
 libstdc++-v3/src/c++23/std.cc.in  |   4 +
 .../testsuite/25_algorithms/ends_with/1.cc| 129 ++
 .../testsuite/25_algorithms/starts_with/1.cc  | 128 ++
 7 files changed, 512 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/ends_with/1.cc
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/starts_with/1.cc

diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
b/libstdc++-v3/include/bits/ranges_algo.h
index f36e7dd59911..c59a555f528a 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -438,6 +438,238 @@ namespace ranges
 
   inline constexpr __search_n_fn search_n{};
 
+#if __glibcxx_ranges_starts_ends_with // C++ >= 23
+  struct __starts_with_fn
+  {
+template<input_iterator _Iter1, sentinel_for<_Iter1> _Sent1,
+input_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
+typename _Pred = ranges::equal_to,
+typename _Proj1 = identity, typename _Proj2 = identity>
+  requires indirectly_comparable<_Iter1, _Iter2, _Pred, _Proj1, _Proj2>
+  constexpr bool
+  operator()(_Iter1 __first1, _Sent1 __last1,
+_Iter2 __first2, _Sent2 __last2, _Pred __pred = {},
+_Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
+  {
+   iter_difference_t<_Iter1> __n1 = -1;
+   iter_difference_t<_Iter2> __n2 = -1;
+   if constexpr (sized_sentinel_for<_Sent1, _Iter1>)
+ __n1 = __last1 - __first1;
+   if constexpr (sized_sentinel_for<_Sent2, _Iter2>)
+ __n2 = __last2 - __first2;
+   return _S_impl(std::move(__first1), __last1,
+  std::move(__first2), __last2,
+  std::move(__pred),
+  std::move(__proj1), std::move(__proj2),
+  __n1, __n2);
+  }
+
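+template<input_range _Range1, input_range _Range2,
+typename _Pred = ranges::equal_to,
+typename _Proj1 = identity, typename _Proj2 = identity>
+  requires indirectly_comparable<iterator_t<_Range1>, iterator_t<_Range2>,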
+_Pred, _Proj1, _Proj2>
+  constexpr bool
+  operator()(_Range1&& __r1, _Range2&& __r2, _Pred __pred = {},
+_Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
+  {
+   range_difference_t<_Range1> __n1 = -1;
+   range_difference_t<_Range1> __n2 = -1;
+   if constexpr (sized_range<_Range1>)
+ __n1 = ranges::size(__r1);
+   if constexpr (sized_range<_Range2>)
+ __n2 = ranges::size(__r2);
+   return _S_impl(ranges::begin(__r1), ranges::end(__r1),
+  ranges::begin(__r2), ranges::end(__r2),
+  std::move(__pred),
+  std::move(__proj1), std::move(__proj2),
+  __n1, __n2);
+  }
+
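+template<typename _Iter1, typename _Sent1, typename _Iter2, typename _Sent2,
+typename _Pred, typename _Proj1, typename _Proj2>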
+  static constexpr bool
+  _S_impl(_Iter1 __first1, _Sent1 __last1,
+ _Iter2 __first2, _Sent2 __last2,
+ _Pred __pred,
+ _Proj1 __proj1, _Proj2 __proj2,
+ iter_difference_t<_Iter1> __n1,
+ iter_difference_t<_Iter2> __n2)
+  {
+   if (__n1 != -1 && __n2 != -1)
+ {
+   if (__n1 < __n2)
+ return false;
+   if constexpr (random_access_iterator<_Iter1>)
+ return ranges::equal(__first1, __first1 + __n2,
+  std::move(__first2), __last2,
+  std::move(__pred),
+  std::move(__proj1), std::move(__proj2));
+   else
+ return ranges::equal(counted_iterator(std::move(__first1), __n2),
+  default_sentinel,
+  std::move(__first2), __last2,
+  std::move(__pred),
+  std::move(__proj1), std::move(__proj2));
+ }
+   else
+ return ranges::mismatch(std::move(__first1), __last1,
+ std::move(__first2), __last2,
+ std::move(__pred),
+ std::move(__proj1), std::move(__proj2)).in2 
== __last2;
+  }
+  };
+
+  inline constexpr __starts_with_fn starts_with{};
+
+  struct __ends_with_fn
+  {
+t

[PATCH] RISC-V: Rename conflicting variables in gen-riscv-ext-texi.cc

2025-05-18 Thread Songhe Zhu

From: zhusonghe 

The variables `major` and `minor` in `gen-riscv-ext-texi.cc`
conflict with the macros of the same name defined in ``,
which are exposed when building with newer versions of GCC on older
Linux distributions (e.g., Ubuntu 18.04). To resolve this, we rename them
to `major_version` and `minor_version` respectively. This aligns with the
GCC community's recommended practice [1] and improves code clarity.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2025-May/683881.html
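
A minimal sketch of the kind of clash being avoided, assuming the conflicting names are function-like macros. The two #define lines below are hypothetical stand-ins written for illustration, not the real system definitions, and version_sketch is not the struct from the patch. A function-like macro only expands when the name is followed by an opening parenthesis, so a mem-initializer such as `major (major)` would be rewritten by the preprocessor, while the renamed members are unaffected.

```cpp
// Hypothetical stand-ins for function-like macros named major/minor.
#define major(dev) ((dev) >> 8)
#define minor(dev) ((dev) & 0xff)

struct version_sketch
{
  int major_version;
  int minor_version;

  // The parameter names are still fine: "major" is never followed by '('
  // here, so the macro does not expand.
  version_sketch (int major, int minor)
    : major_version (major), minor_version (minor)
  {}

  bool operator< (const version_sketch &other) const
  {
    if (major_version != other.major_version)
      return major_version < other.major_version;
    return minor_version < other.minor_version;
  }

  bool operator== (const version_sketch &other) const
  {
    return major_version == other.major_version
           && minor_version == other.minor_version;
  }
};
```

With members named major and minor, the initializer list would read `major (major), minor (minor)`, and under these macros it would no longer parse.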

gcc/ChangeLog:

* config/riscv/gen-riscv-ext-texi.cc (struct version_t): Rename
major/minor to major_version/minor_version.

Signed-off-by: Songhe Zhu 
---
 gcc/config/riscv/gen-riscv-ext-texi.cc | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/riscv/gen-riscv-ext-texi.cc 
b/gcc/config/riscv/gen-riscv-ext-texi.cc
index e15fdbf36f6..c29a375d56c 100644
--- a/gcc/config/riscv/gen-riscv-ext-texi.cc
+++ b/gcc/config/riscv/gen-riscv-ext-texi.cc
@@ -6,22 +6,22 @@
 
 struct version_t
 {
-  int major;
-  int minor;
+  int major_version;
+  int minor_version;
   version_t (int major, int minor,
 enum riscv_isa_spec_class spec = ISA_SPEC_CLASS_NONE)
-: major (major), minor (minor)
+: major_version (major), minor_version (minor)
   {}
   bool operator<(const version_t &other) const
   {
-if (major != other.major)
-  return major < other.major;
-return minor < other.minor;
+if (major_version != other.major_version)
+  return major_version < other.major_version;
+return minor_version < other.minor_version;
   }
 
   bool operator== (const version_t &other) const
   {
-return major == other.major && minor == other.minor;
+return major_version == other.major_version && minor_version == 
other.minor_version;
   }
 };
 
@@ -39,7 +39,7 @@ print_ext_doc_entry (const std::string &ext_name, const 
std::string &full_name,
   printf ("@tab");
   for (const auto &version : unique_versions)
 {
-  printf (" %d.%d", version.major, version.minor);
+  printf (" %d.%d", version.major_version, version.minor_version);
 }
   printf ("\n");
   printf ("@tab %s", full_name.c_str ());
-- 
2.17.1

RE: [PATCH][RFC] Allow the target to request a masked vector epilogue

2025-05-18 Thread Tamar Christina

> -Original Message-
> From: Richard Biener 
> Sent: Friday, May 16, 2025 11:35 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford ; Tamar Christina
> 
> Subject: [PATCH][RFC] Allow the target to request a masked vector epilogue
> 
> Targets recently got the ability to request the vector mode to be
> used for a vector epilogue (or the epilogue of a vector epilogue).  The
> following adds the ability for it to indicate the epilogue should use
> loop masking, irrespective of the --param vect-partial-vector-usage
> setting.
> 
> The simple prototype below uses a separate flag from the epilogue
> mode, but I wonder how we want to handle, more generally,
> whether to use masking or not when iterating over modes.  Currently
> we mostly rely on --param vect-partial-vector-usage.  aarch64
> and riscv have both variable-length modes but also fixed-size modes
> where for the latter, like on x86, the target couldn't request
> a mode specifically with or without masking.  It seems both
> aarch64 and riscv fully rely on cost comparison and fully
> exploiting the mode iteration space (but not masked vs. non-masked?!)
> here?
> 
> I was thinking of adding a vectorization_mode class that would
> encapsulate the mode and whether to allow masking or alternatively
> to make the vector_modes array (and the m_suggested_epilogue_mode)
> a std::pair of mode and mask flag?
> 

I personally like the class approach as it seems more easily extensible in
the future.  I was recently wondering about what would be useful for
epilogues, and with the change to unroll in the vectorizer it would be
useful to be able to require an epilogue with a particular unroll factor.

Or some other way to convey your requested VF?

Thanks,
Tamar
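
To illustrate the class approach, here is one possible shape, purely as a sketch. Every name in it is hypothetical, the int mode field is a stand-in for machine_mode, and the unroll_factor member is only there to show how the extension mentioned above could fit. The tristate mirrors the masked_p convention in the quoted prototype, where -1 means the target expressed no preference.

```cpp
// Tristate matching the prototype's masked_p values (-1, 0, 1).
enum class mask_pref { no_preference = -1, unmasked = 0, masked = 1 };

// Hypothetical bundle of everything the target might request for a loop
// or epilogue: the mode, whether to mask, and an unroll factor.
struct vectorization_mode_sketch
{
  int mode;                                      // stand-in for machine_mode
  mask_pref masking = mask_pref::no_preference;
  unsigned unroll_factor = 1;
};

// Decide whether partial vectors may be used: an explicit request wins,
// otherwise fall back to the default (for example the --param setting).
bool resolve_partial_vectors_sketch (const vectorization_mode_sketch &m,
                                     bool param_default)
{
  if (m.masking == mask_pref::no_preference)
    return param_default;
  return m.masking == mask_pref::masked;
}
```

Compared with a std::pair of mode and flag, a struct like this leaves room for further fields without changing every user, which is the extensibility argument made above.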

> For the x86 case going the prototype way would be sufficient, we
> wouldn't want to say use a masked AVX epilogue for a AVX512 loop,
> so any further iteration on epilogue modes if the requested mode
> would fail to vectorize is OK to be unmasked.
> 
> Any comments on this?  You are not yet using m_suggested_epilogue_mode
> to get more than one vector epilogue, this might be a way to add
> heuristics when to use a masked epilogue.
> 
> Thanks,
> Richard.
> 
>   * tree-vectorizer.h (vector_costs::suggested_epilogue_mode):
>   Add masked output parameter and return m_masked_epilogue.
>   (vector_costs::m_masked_epilogue): New tristate flag.
>   (vector_costs::vector_costs): Initialize m_masked_epilogue.
>   * tree-vect-loop.cc (vect_analyze_loop_1): Pass in masked
>   flag to optionally initialize can_use_partial_vectors_p.
>   (vect_analyze_loop): For epilogues also get whether to use
>   a masked epilogue for this loop from the target and use
>   that for the first epilogue mode we try.
> ---
>  gcc/tree-vect-loop.cc | 29 +
>  gcc/tree-vectorizer.h | 12 +---
>  2 files changed, 30 insertions(+), 11 deletions(-)
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 2d1a6883e6b..4af510ff20c 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3407,6 +3407,7 @@ vect_analyze_loop_1 (class loop *loop,
> vec_info_shared *shared,
>const vect_loop_form_info *loop_form_info,
>loop_vec_info orig_loop_vinfo,
>const vector_modes &vector_modes, unsigned &mode_i,
> +  int masked_p,
>machine_mode &autodetected_vector_mode,
>bool &fatal)
>  {
> @@ -3415,6 +3416,8 @@ vect_analyze_loop_1 (class loop *loop,
> vec_info_shared *shared,
> 
>machine_mode vector_mode = vector_modes[mode_i];
>loop_vinfo->vector_mode = vector_mode;
> +  if (masked_p != -1)
> +loop_vinfo->can_use_partial_vectors_p = masked_p;
>unsigned int suggested_unroll_factor = 1;
>unsigned slp_done_for_suggested_uf = 0;
> 
> @@ -3580,7 +3583,7 @@ vect_analyze_loop (class loop *loop, gimple
> *loop_vectorized_call,
>cached_vf_per_mode[last_mode_i] = -1;
>opt_loop_vec_info loop_vinfo
>   = vect_analyze_loop_1 (loop, shared, &loop_form_info,
> -NULL, vector_modes, mode_i,
> +NULL, vector_modes, mode_i, -1,
>  autodetected_vector_mode, fatal);
>if (fatal)
>   break;
> @@ -3665,19 +3668,24 @@ vect_analyze_loop (class loop *loop, gimple
> *loop_vectorized_call,
>   array may contain length-agnostic and length-specific modes.  Their
>   ordering is not guaranteed, so we could end up picking a mode for the 
> main
>   loop that is after the epilogue's optimal mode.  */
> +  int masked_p = -1;
>if (!unlimited_cost_model (loop)
> -  && first_loop_vinfo->vector_costs->suggested_epilogue_mode () !=
> VOIDmode)
> +  && (first_loop_vinfo->vector_costs->suggested_epilogue_mode (masked_p)
> +   != VOIDmode))
>  {
>vector_modes[0]
> - = first_loop_vinfo->vector_co

[r16-372 Regression] FAIL: gfortran.dg/specifics_1.f90 -O3 -g execution test on Linux/x86_64

2025-05-18 Thread haochen.jiang

On Linux/x86_64,

064cac730f88dc71c6da578f9ae5b8e092ab6cd4 is the first bad commit
commit 064cac730f88dc71c6da578f9ae5b8e092ab6cd4
Author: Jan Hubicka 
Date:   Sun May 4 10:52:35 2025 +0200

Improve maybe_hot handling in inliner heuristics

caused

FAIL: gcc.dg/tree-ssa/gen-vect-28.c scan-tree-dump-times vect "Alignment of 
access forced using peeling" 1
FAIL: gcc.dg/tree-ssa/gen-vect-28.c scan-tree-dump-times vect "vectorized 1 
loops" 1
FAIL: gcc.dg/tree-ssa/pr81627.c scan-tree-dump-times pcom "Store-stores chain" 1
FAIL: gcc.target/i386/avx512vl-vpmovuswb-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512vl-vpmovwb-2.c (test for excess errors)
FAIL: g++.dg/coroutines/torture/exceptions-test-01-n4849-a.C   -O2  execution 
test
FAIL: g++.dg/coroutines/torture/exceptions-test-01-n4849-a.C   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  execution test
FAIL: g++.dg/coroutines/torture/exceptions-test-01-n4849-a.C   -O3 -g  
execution test
FAIL: g++.dg/coroutines/torture/func-params-07.C   -O2  execution test
FAIL: g++.dg/coroutines/torture/func-params-07.C   -O3 -g  execution test
FAIL: g++.dg/coroutines/torture/pr103953.C   -O2  execution test
FAIL: g++.dg/coroutines/torture/pr103953.C   -O3 -g  execution test
FAIL: g++.dg/vect/pr64410.cc  -std=c++17  scan-tree-dump vect "vectorized 1 
loops in function"
FAIL: g++.dg/vect/pr64410.cc  -std=c++26  scan-tree-dump vect "vectorized 1 
loops in function"
FAIL: gfortran.dg/guality/arg1.f90   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  line 14 a(10) == 10
FAIL: gfortran.dg/specifics_1.f90   -O2  execution test
FAIL: gfortran.dg/specifics_1.f90   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/specifics_1.f90   -O3 -g  execution test

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r16-372/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/gen-vect-28.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/gen-vect-28.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr81627.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr81627.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr81627.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr81627.c --target_board='unix{-m64\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovuswb-2.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovuswb-2.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovuswb-2.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovuswb-2.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovwb-2.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovwb-2.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovwb-2.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovwb-2.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="coro-torture.exp=g++.dg/coroutines/torture/exceptions-test-01-n4849-a.C
 --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="coro-torture.exp=g++.dg/coroutines/torture/exceptions-test-01-n4849-a.C
 --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="coro-torture.exp=g++.dg/coroutines/torture/exceptions-test-01-n4849-a.C
 --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="coro-torture.exp=g++.dg/coroutines/torture/exceptions-test-01-n4849-a.C
 --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="coro-torture.exp=g++.dg/coroutines/torture/func-params-07.C 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="coro-torture.exp=g++.dg/coroutines/torture/func-params-07.C 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_d

[r16-385 Regression] FAIL: gcc.dg/vect/vect-simd-clone-18d.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2 on Linux/x86_64

2025-05-18 Thread haochen.jiang

On Linux/x86_64,

c9982eec2d3edc5306291d4628f08825ba46d483 is the first bad commit
commit c9982eec2d3edc5306291d4628f08825ba46d483
Author: Thomas Schwinge 
Date:   Mon May 5 10:21:35 2025 +0200

vect-simd-clone-1[6-8][cd].c: Expect in-branch clones for x86: Fix target 
selector syntax

caused

FAIL: gcc.dg/vect/vect-simd-clone-16c.c scan-tree-dump-times vect "[\\n\\r] 
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-16d.c scan-tree-dump-times vect "[\\n\\r] 
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-17c.c scan-tree-dump-times vect "[\\n\\r] 
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-17d.c scan-tree-dump-times vect "[\\n\\r] 
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-18c.c scan-tree-dump-times vect "[\\n\\r] 
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-18d.c scan-tree-dump-times vect "[\\n\\r] 
[^\\n]* = foo\\.simdclone" 2

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r16-385/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16c.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16c.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16d.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16d.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17c.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17c.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17d.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17d.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18c.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18c.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18d.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18d.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me 
at haochen dot jiang at intel.com.)
(If you hit cascadelake-related problems, disabling AVX512F on the command 
line might help.)
(However, please make sure that there are no potential problems with AVX512.)

[r16-645 Regression] FAIL: gcc.target/i386/vect-epilogues-5.c scan-tree-dump-times vect "loop vectorized using 64 byte vectors" 2 on Linux/x86_64

2025-05-18 Thread haochen.jiang

On Linux/x86_64,

af7b84d0d02ffa23e4843e9555a888c9e80bd9b5 is the first bad commit
commit af7b84d0d02ffa23e4843e9555a888c9e80bd9b5
Author: Richard Biener 
Date:   Wed May 14 16:45:08 2025 +0200

Enhance -fopt-info-vec vectorized loop diagnostic

caused

FAIL: gcc.target/i386/vect-epilogues-4.c scan-tree-dump-times vect "loop 
vectorized using 64 byte vectors" 2
FAIL: gcc.target/i386/vect-epilogues-5.c scan-tree-dump-times vect "loop 
vectorized using 64 byte vectors" 2

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r16-645/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/vect-epilogues-4.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/vect-epilogues-4.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/vect-epilogues-4.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/vect-epilogues-4.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/vect-epilogues-5.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/vect-epilogues-5.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/vect-epilogues-5.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/vect-epilogues-5.c 
--target_board='unix{-m64\ -march=cascadelake}'"

[r16-517 Regression] FAIL: gcc.target/i386/pr78794.c scan-assembler pandn on Linux/x86_64

2025-05-18 Thread haochen.jiang

On Linux/x86_64,

993aa0bd28722c7f01fb8310f1c79814aef217ed is the first bad commit
commit 993aa0bd28722c7f01fb8310f1c79814aef217ed
Author: Jan Hubicka 
Date:   Sat May 10 22:23:48 2025 +0200

i386: Fix some problems in stv cost model

caused

FAIL: gcc.target/i386/avx512vl-stv-rotatedi-1.c scan-assembler-times vpro[lr]q 
29
FAIL: gcc.target/i386/pr78794.c scan-assembler pandn

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r16-517/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-stv-rotatedi-1.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr78794.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr78794.c --target_board='unix{-m32\ 
-march=cascadelake}'"

[r16-531 Regression] FAIL: gcc.target/i386/vect-shiftv8qi.c scan-assembler-times psllw 2 on Linux/x86_64

2025-05-18 Thread haochen.jiang

On Linux/x86_64,

37e61c793c1b22bdcfbf142cd6086da2745be596 is the first bad commit
commit 37e61c793c1b22bdcfbf142cd6086da2745be596
Author: Jan Hubicka 
Date:   Sun May 11 23:49:11 2025 +0200

i386: Fix move costs in vectorizer cost model.

caused

FAIL: gcc.target/i386/pr108938-3.c scan-assembler-times bswap[\t ]+ 3
FAIL: gcc.target/i386/pr111023-2.c scan-assembler (?:pcmpgtd|psrad)
FAIL: gcc.target/i386/pr111023-2.c scan-assembler (?:pcmpgtw|psraw)
FAIL: gcc.target/i386/pr111023-2.c scan-assembler punpckldq
FAIL: gcc.target/i386/pr111023-2.c scan-assembler punpcklwd
FAIL: gcc.target/i386/pr111023.c scan-assembler punpckldq
FAIL: gcc.target/i386/pr111023.c scan-assembler punpcklwd
FAIL: gcc.target/i386/vect-shiftv4qi.c scan-assembler-times psllw 2
FAIL: gcc.target/i386/vect-shiftv4qi.c scan-assembler-times psrlw 5
FAIL: gcc.target/i386/vect-shiftv8qi.c scan-assembler-times psllw 2
FAIL: gcc.target/i386/vect-shiftv8qi.c scan-assembler-times psrlw 5

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r16-531/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr108938-3.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr108938-3.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr111023-2.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr111023-2.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr111023.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr111023.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/vect-shiftv4qi.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/vect-shiftv4qi.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/vect-shiftv8qi.c 
--target_board='unix{-m64\ -march=cascadelake}'"

Re: [r16-372 Regression] FAIL: gfortran.dg/specifics_1.f90 -O3 -g execution test on Linux/x86_64

2025-05-18 Thread Andrew Pinski

On Sun, May 18, 2025 at 11:19 PM haochen.jiang
 wrote:
>
> On Linux/x86_64,
>
> 064cac730f88dc71c6da578f9ae5b8e092ab6cd4 is the first bad commit
> commit 064cac730f88dc71c6da578f9ae5b8e092ab6cd4
> Author: Jan Hubicka 
> Date:   Sun May 4 10:52:35 2025 +0200
>
> Improve maybe_hot handling in inliner heuristics
>
> caused
>
> FAIL: gcc.dg/tree-ssa/gen-vect-28.c scan-tree-dump-times vect "Alignment of 
> access forced using peeling" 1
> FAIL: gcc.dg/tree-ssa/gen-vect-28.c scan-tree-dump-times vect "vectorized 1 
> loops" 1
> FAIL: gcc.dg/tree-ssa/pr81627.c scan-tree-dump-times pcom "Store-stores 
> chain" 1

This is just an extra inlining which causes pcom to happen twice now.
Will look into fixing that this coming week.


> FAIL: gcc.target/i386/avx512vl-vpmovuswb-2.c (test for excess errors)
> FAIL: gcc.target/i386/avx512vl-vpmovwb-2.c (test for excess errors)
> FAIL: g++.dg/coroutines/torture/exceptions-test-01-n4849-a.C   -O2  execution 
> test
> FAIL: g++.dg/coroutines/torture/exceptions-test-01-n4849-a.C   -O2 -flto 
> -fno-use-linker-plugin -flto-partition=none  execution test
> FAIL: g++.dg/coroutines/torture/exceptions-test-01-n4849-a.C   -O3 -g  
> execution test
> FAIL: g++.dg/coroutines/torture/func-params-07.C   -O2  execution test
> FAIL: g++.dg/coroutines/torture/func-params-07.C   -O3 -g  execution test
> FAIL: g++.dg/coroutines/torture/pr103953.C   -O2  execution test
> FAIL: g++.dg/coroutines/torture/pr103953.C   -O3 -g  execution test

The coroutines failures might be a front-end issue or a testcase issue
dealing with operator new.

> FAIL: g++.dg/vect/pr64410.cc  -std=c++17  scan-tree-dump vect "vectorized 1 
> loops in function"
> FAIL: g++.dg/vect/pr64410.cc  -std=c++26  scan-tree-dump vect "vectorized 1 
> loops in function"
> FAIL: gfortran.dg/guality/arg1.f90   -O3 -fomit-frame-pointer -funroll-loops 
> -fpeel-loops -ftracer -finline-functions  line 14 a(10) == 10



> FAIL: gfortran.dg/specifics_1.f90   -O2  execution test
> FAIL: gfortran.dg/specifics_1.f90   -O3 -fomit-frame-pointer -funroll-loops 
> -fpeel-loops -ftracer -finline-functions  execution test
> FAIL: gfortran.dg/specifics_1.f90   -O3 -g  execution test

The specifics_1 failure is a front-end bug which is being fixed:
https://gcc.gnu.org/pipermail/gcc-patches/2025-May/684016.html .

>
> with GCC configured with
>
> ../../gcc/configure 
> --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r16-372/usr 
> --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
> --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
> --enable-libmpx x86_64-linux --disable-bootstrap
>
> To reproduce:
>
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/gen-vect-28.c 
> --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/gen-vect-28.c 
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr81627.c 
> --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr81627.c 
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr81627.c 
> --target_board='unix{-m64}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr81627.c 
> --target_board='unix{-m64\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovuswb-2.c 
> --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovuswb-2.c 
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovuswb-2.c 
> --target_board='unix{-m64}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovuswb-2.c 
> --target_board='unix{-m64\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovwb-2.c 
> --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovwb-2.c 
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovwb-2.c 
> --target_board='unix{-m64}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovwb-2.c 
> --target_board='unix{-m64\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="coro-torture.exp=g++.dg/coroutines/torture/exceptions-test-01-n4849-a.C
>  --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="coro-torture.exp=g++.dg/coroutines/torture/exceptions-test-01-n4849-a.C
>  --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check 
> R

Re: [PATCH] [PR120276] regcprop: Replace partial_subreg_p by ordered_p && maybe_lt

2025-05-18 Thread Jennifer Schmitz



> On 16 May 2025, at 18:54, Richard Sandiford  wrote:
> 
> Jennifer Schmitz  writes:
>> [PATCH] [PR120276] regcprop: Return from copy_value for unordered modes
>> 
>> The ICE in PR120276 resulted from a comparison of VNx4QI and V8QI using
>> partial_subreg_p in the function copy_value during the RTL pass
>> regcprop, failing the assertion in
>> 
>> inline bool
>> partial_subreg_p (machine_mode outermode, machine_mode innermode)
>> {
>>  /* Modes involved in a subreg must be ordered.  In particular, we must
>> always know at compile time whether the subreg is paradoxical.  */
>>  poly_int64 outer_prec = GET_MODE_PRECISION (outermode);
>>  poly_int64 inner_prec = GET_MODE_PRECISION (innermode);
>>  gcc_checking_assert (ordered_p (outer_prec, inner_prec));
>>  return maybe_lt (outer_prec, inner_prec);
>> }
>> 
>> Returning from the function if the modes are not ordered before reaching
>> the call to partial_subreg_p resolves the ICE and passes bootstrap and
>> testing without regression.
>> OK for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz 
>> 
>> gcc/
>>  PR middle-end/120276
>>  * regcprop.cc (copy_value): Return in case of unordered modes.
>> 
>> gcc/testsuite/
>>  PR middle-end/120276
>>  * gcc.dg/torture/pr120276.c: New test.
> 
> OK, thanks.
Thanks, committed to trunk: 2ec5082dd24cef5149ba645ee88a9acd8b4c290a
Jennifer
> 
> Richard
> 
>> ---
>> gcc/regcprop.cc |  4 
>> gcc/testsuite/gcc.dg/torture/pr120276.c | 20 
>> 2 files changed, 24 insertions(+)
>> create mode 100644 gcc/testsuite/gcc.dg/torture/pr120276.c
>> 
>> diff --git a/gcc/regcprop.cc b/gcc/regcprop.cc
>> index 4fa1305526c..98ab3f77e83 100644
>> --- a/gcc/regcprop.cc
>> +++ b/gcc/regcprop.cc
>> @@ -332,6 +332,10 @@ copy_value (rtx dest, rtx src, struct value_data *vd)
>>   if (vd->e[sr].mode == VOIDmode)
>> set_value_regno (sr, vd->e[dr].mode, vd);
>> 
>> +  else if (!ordered_p (GET_MODE_PRECISION (vd->e[sr].mode),
>> +GET_MODE_PRECISION (GET_MODE (src))))
>> +return;
>> +
>>   /* If we are narrowing the input to a smaller number of hard regs,
>>  and it is in big endian, we are really extracting a high part.
>>  Since we generally associate a low part of a value with the value 
>> itself,
>> diff --git a/gcc/testsuite/gcc.dg/torture/pr120276.c 
>> b/gcc/testsuite/gcc.dg/torture/pr120276.c
>> new file mode 100644
>> index 000..9717a7103e5
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/torture/pr120276.c
>> @@ -0,0 +1,20 @@
>> +/* { dg-do compile } */
>> +/* { dg-additional-options "-march=armv8.2-a+sve" { target aarch64*-*-* } } 
>> */
>> +
>> +int a;
>> +char b[1];
>> +int c[18];
>> +void d(char *);
>> +void e() {
>> +  int f;
>> +  char *g;
>> +  a = 0;
>> +  for (; a < 18; a++) {
>> +int h = f = 0;
>> +for (; f < 4; f++) {
>> +  g[a * 4 + f] = c[a] >> h;
>> +  h += 8;
>> +}
>> +  }
>> +  d(b);
>> +}
>> \ No newline at end of file
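
For readers following the PR120276 discussion, here is a minimal sketch of why the two precisions cannot be ordered. This is not GCC's poly_int implementation. `Poly2` is a hypothetical stand-in that models a value as c0 + c1 * x for an unknown runtime x >= 0, and the coefficients are assumptions for illustration: VNx4QI is taken to be 32 + 32x bits and V8QI a fixed 64 bits.

```cpp
#include <cstdint>

// Hypothetical two-coefficient stand-in for a poly_int64:
// the value is c0 + c1 * x, where x >= 0 is only known at run time.
struct Poly2 {
  std::int64_t c0;
  std::int64_t c1;
};

// True when a <= b holds for every x >= 0.
constexpr bool known_le(Poly2 a, Poly2 b) {
  return a.c0 <= b.c0 && a.c1 <= b.c1;
}

// True when a < b holds for at least one x >= 0.
constexpr bool maybe_lt(Poly2 a, Poly2 b) {
  return !known_le(b, a);
}

// True when the comparison has the same answer for every x >= 0.
constexpr bool ordered_p(Poly2 a, Poly2 b) {
  return known_le(a, b) || known_le(b, a);
}

// Assumed precisions in bits, for illustration only.
constexpr Poly2 kVNx4QIPrecision{32, 32};  // scalable: 32 + 32x
constexpr Poly2 kV8QIPrecision{64, 0};     // fixed: 64
constexpr Poly2 kV4QIPrecision{32, 0};     // fixed: 32

// 32 + 32x is below 64 at x = 0 and above it at x = 2, so no ordering exists.
static_assert(!ordered_p(kVNx4QIPrecision, kV8QIPrecision),
              "scalable vs fixed precision is unordered in this model");
// Two fixed precisions are always ordered.
static_assert(ordered_p(kV4QIPrecision, kV8QIPrecision),
              "fixed precisions are ordered");
static_assert(maybe_lt(kV4QIPrecision, kV8QIPrecision),
              "32 bits is narrower than 64 bits");
```

In this model, checking `ordered_p` before asking `maybe_lt` is what keeps the question well defined, which is the shape of the early return added to copy_value.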




Re: [PATCH v2] driver: Fix multilib_os_dir and multiarch_dir for those target use TARGET_COMPUTE_MULTILIB

2025-05-18 Thread Kito Cheng

Hi Jin:

Thanks for the heads-up :)

Hi Jeff:

I've rebased that on the trunk and everything seems right, do you think
it's OK for the trunk?


On Mon, May 19, 2025 at 2:35 PM Jin Ma  wrote:

> On Sun, 16 Mar 2025 11:23:07 -0600, Jeff Law wrote:
> >
> >
> > On 3/10/25 2:26 AM, Kito Cheng wrote:
> > > This patch fixes the multilib_os_dir and multiarch_dir for those
> > > targets that use TARGET_COMPUTE_MULTILIB. The TARGET_COMPUTE_MULTILIB
> > > hook only updates the multilib_dir but not the multilib_os_dir and
> > > multiarch_dir, so those two are not set correctly for such targets.
> > Thankfully only RISC-V defines TARGET_COMPUTE_MULTILIB.  Though that may
> > be an argument we should look to avoid whatever magic we're doing in
> there.
> >
> >
> > >
> > > Use RISC-V linux target (riscv64-unknown-linux-gnu) as an example:
> > >
> > > ```
> > > $ riscv64-unknown-linux-gnu-gcc -print-multi-lib
> > > .;
> > > lib32/ilp32;@march=rv32imac@mabi=ilp32
> > > lib32/ilp32d;@march=rv32imafdc@mabi=ilp32d
> > > lib64/lp64;@march=rv64imac@mabi=lp64
> > > lib64/lp64d;@march=rv64imafdc@mabi=lp64d
> > > ```
> > >
> > > If we use the exactly same -march and -mabi options to compile a
> source file,
> > > the multilib_os_dir and multiarch_dir are set correctly:
> > >
> > > ```
> > > $ riscv64-unknown-linux-gnu-gcc -print-multi-os-directory
> -march=rv64imafdc -mabi=lp64d
> > > ../lib64/lp64d
> > > $ riscv64-unknown-linux-gnu-gcc -print-multi-directory
> -march=rv64imafdc -mabi=lp64d
> > > lib64/lp64d
> > > ```
> > >
> > > However if we use the -march=rv64imafdcv -mabi=lp64d option to compile
> a source
> > > file, the multilib_os_dir and multiarch_dir are not set correctly:
> > > ```
> > > $ riscv64-unknown-linux-gnu-gcc -print-multi-os-directory
> -march=rv64imafdcv -mabi=lp64d
> > > lib64/lp64d
> > > $ riscv64-unknown-linux-gnu-gcc -print-multi-directory
> -march=rv64imafdcv -mabi=lp64d
> > > lib64/lp64d
> > > ```
> > >
> > > That's because the TARGET_COMPUTE_MULTILIB hook only updates the
> > > multilib_dir but not the multilib_os_dir, so the multilib_os_dir is
> > > blank and falls back to the same value as multilib_dir, which is not
> > > correct.
> > >
> > > So we introduce a second chance to fix the multilib_os_dir if it's not
> > > set. We also try to fix the multiarch_dir, because it may also not be
> > > set correctly if multilib_os_dir is not set.
> > >
> > > Changes since v1:
> > > - Fix non-multilib build.
> > > - Fix indentation.
> > >
> > > gcc/ChangeLog:
> > >
> > > * gcc.c (find_multilib_os_dir_by_multilib_dir): New.
> > > (set_multilib_dir): Fix multilib_os_dir and multiarch_dir
> > > if multilib_os_dir is not set.
> > Given the fact this code is shared and I don't have a good handle on its
> > behavior and how the change potentially affects other targets, I'm
> > inclined to ask for this to wait for gcc-16 development to open and
> > backport into gcc-15.2 after soak time on the trunk.
> >
> > Jeff
>
> Hi, I think this patch is essential. Can we proceed to push it to the trunk
> now?
>
> Best regards,
> Jin Ma
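
For anyone skimming the thread, the "second chance" idea can be sketched roughly as below. This is only an illustration, not the patch itself: the table type, `find_os_dir_by_multilib_dir`, and `resolve_os_dir` are hypothetical names, and the single table entry is taken from the -print-multi-os-directory output quoted above.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical table of (multilib_dir, multilib_os_dir) pairs.
using MultilibTable = std::vector<std::pair<std::string, std::string>>;

// Look up the OS directory that belongs to an already-chosen multilib_dir.
std::optional<std::string> find_os_dir_by_multilib_dir(
    const MultilibTable& table, const std::string& multilib_dir) {
  for (const auto& [dir, os_dir] : table) {
    if (dir == multilib_dir) {
      return os_dir;
    }
  }
  return std::nullopt;
}

// Second chance: only consult the table when the OS directory was left
// blank. If the table has no entry either, fall back to multilib_dir,
// which is the behaviour the thread describes as the current result.
std::string resolve_os_dir(const MultilibTable& table,
                           const std::string& multilib_dir,
                           const std::string& multilib_os_dir) {
  if (!multilib_os_dir.empty()) {
    return multilib_os_dir;
  }
  return find_os_dir_by_multilib_dir(table, multilib_dir)
      .value_or(multilib_dir);
}

// Example entry, matching the lp64d output quoted in the patch description.
const MultilibTable kExampleTable = {{"lib64/lp64d", "../lib64/lp64d"}};
```

With `kExampleTable`, a blank OS directory for `lib64/lp64d` resolves to `../lib64/lp64d`, while a directory that was already set is left alone.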

Re: [PATCH] RISC-V: Support Zilsd code gen

2025-05-18 Thread Kito Cheng

committed to trunk :)

On Mon, May 19, 2025 at 11:49 AM Kito Cheng  wrote:

> On Sat, May 17, 2025 at 9:34 PM Jeff Law  wrote:
> >
> >
> >
> > On 5/14/25 9:14 PM, Kito Cheng wrote:
> > > This commit adds the code gen support for Zilsd, which is a
> > > newly added extension for RISC-V. The Zilsd extension allows
> > > for loading and storing 64-bit values using even-odd register
> > > pairs.
> > >
> > > We only try to do minimal code gen support for that, which means only
> > > use the new instructions when the load store is 64 bits data, we can
> use
> > > that to optimize the code gen of memcpy/memset/memmove and also the
> > > prologue and epilogue of functions, but I think that probably should be
> > > done in a follow up patch.
> > >
> > > gcc/ChangeLog:
> > >
> > >   * config/riscv/riscv.cc (riscv_legitimize_move): Handle
> > >   load/store with odd-even reg pair.
> > >   (riscv_split_64bit_move_p): Don't split load/store if zilsd
> enabled.
> > >   (riscv_hard_regno_mode_ok): Only allow even reg can be used for
> > >   64 bits mode for zilsd.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.target/riscv/zilsd-code-gen.c: New test.
> > > ---
> > >   gcc/config/riscv/riscv.cc | 38
> +++
> > >   .../gcc.target/riscv/zilsd-code-gen.c | 18 +
> > >   2 files changed, 56 insertions(+)
> > >   create mode 100644 gcc/testsuite/gcc.target/riscv/zilsd-code-gen.c
> > >
> > > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > > index d28aee4b439..f5ee3ce9034 100644
> > > --- a/gcc/config/riscv/riscv.cc
> > > +++ b/gcc/config/riscv/riscv.cc
> > > @@ -3742,6 +3742,25 @@ riscv_legitimize_move (machine_mode mode, rtx
> dest, rtx src)
> > > return true;
> > >   }
> > >
> > > +  if (TARGET_ZILSD
> > > +  && (GET_MODE_UNIT_SIZE (mode) == (UNITS_PER_WORD * 2))
> > > +  && ((REG_P (dest) && MEM_P (src))
> > > +   || (MEM_P (dest) && REG_P (src))))
> > > +{
> > > +  rtx reg = REG_P (dest) ? dest : src;
> > > +  unsigned regno = REGNO (reg);
> > > +  /* ZILSD require even-odd register pair, let RA to
> > > +  fix the constraint if the reg is hard reg and not even reg.  */
> > > +  if ((regno < FIRST_PSEUDO_REGISTER)
> > > +   && (regno % 2) != 0)
> > > + {
> > > +   rtx tmp = gen_reg_rtx (GET_MODE (reg));
> > > +   emit_move_insn (tmp, src);
> > > +   emit_move_insn (dest, tmp);
> > > +   return true;
> > > + }
> > AFAICT this will only ever be called by the various movXX expanders, but
> > those can be called during IRA, so we probably should be a bit safer here.
> >
> > We could either add can_create_pseudo_p to the guard or we could assert
> > it's true.  The former would be most appropriate if the rest of the code
> > will still do the right thing, the latter if not.
>
> Good suggestion, I guess just adding can_create_pseudo_p in the
> if-condition would be fine,
> we already have a splitter later to handle those cases we can't handle
> here.
>
> >
> >
> > > @@ -9799,6 +9831,12 @@ riscv_hard_regno_mode_ok (unsigned int regno,
> machine_mode mode)
> > > if (riscv_v_ext_mode_p (mode))
> > >   return false;
> > >
> > > +  /* Zilsd require load/store with even-odd reg pair.  */
> > > +  if (TARGET_ZILSD
> > > +   && (GET_MODE_UNIT_SIZE (mode) == (UNITS_PER_WORD * 2))
> > > +   && ((regno % 2) != 0))
> > > + return false;
> > Do you need to check that you're working with a GPR here?
>
> We have checked that few lines before, so we are safe here, but it's
> not easy to observe from the diff since we have only 3 lines before,
> hope one day we can migrate to something like github...
>
> >
> > At a higher level, my understanding is zilsd is only for rv32.  Do we
> > want to be extra safe and check TARGET_32BIT alongside TARGET_ZILSD?
>
> We have checked that during arch string parsing, so I'm inclined not
> to check that again in other places :)
>
> >
> > Take the action you feel is appropriate on the above issues and the
> > result is pre-approved for the trunk.
>
> Thanks for reviewing, I will commit after applying those small changes
> and testing :)
>
> >
> > Thanks,
> > jeff
> >
>
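
As a rough model of the two checks discussed in the review, the sketch below restates the even-register rule and the can_create_pseudo_p guard as plain predicates. It is an assumption-laden illustration rather than the committed code: the function names and `kFirstPseudoRegister` value are hypothetical, `kUnitsPerWord` assumes rv32, and the GPR check that the real riscv_hard_regno_mode_ok performs earlier is left out.

```cpp
// Assumed constants for illustration only.
constexpr unsigned kUnitsPerWord = 4;           // rv32: 4 bytes per word
constexpr unsigned kFirstPseudoRegister = 128;  // hypothetical placeholder

// A mode is a Zilsd pair candidate when it is exactly two words wide.
constexpr bool is_zilsd_pair_mode(unsigned mode_size) {
  return mode_size == 2 * kUnitsPerWord;
}

// Model of the hard_regno_mode_ok rule: with Zilsd, a two-word value may
// only start in an even-numbered register.
constexpr bool zilsd_regno_mode_ok(bool target_zilsd, unsigned regno,
                                   unsigned mode_size) {
  return !(target_zilsd && is_zilsd_pair_mode(mode_size) &&
           (regno % 2) != 0);
}

// Model of the legitimize_move guard with the reviewer's suggestion applied:
// go through a fresh pseudo only for an odd hard register, and only while
// new pseudos can still be created.
constexpr bool should_copy_via_pseudo(bool target_zilsd,
                                      bool can_create_pseudo, unsigned regno,
                                      unsigned mode_size) {
  return target_zilsd && can_create_pseudo && is_zilsd_pair_mode(mode_size) &&
         regno < kFirstPseudoRegister && (regno % 2) != 0;
}

static_assert(!zilsd_regno_mode_ok(true, 11, 8), "odd register is rejected");
static_assert(zilsd_regno_mode_ok(true, 10, 8), "even register is accepted");
static_assert(!should_copy_via_pseudo(true, false, 11, 8),
              "no pseudo copy once pseudos cannot be created");
```

Putting the can_create_pseudo flag in the condition, instead of asserting it, lets the later splitter mentioned in the reply handle the remaining cases.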

[to-be-committed][RISC-V] Avoid multiple assignments to output object

2025-05-18 Thread Jeff Law

This is the next batch of changes to reduce multiple assignments to an 
output object.  This time I'm focused on splitters in bitmanip.md.


This doesn't convert every case.  For example there is one case that is 
very clearly dependent on eliminating mvconst_internal and adjustment of 
a splitter for andn and until those things happen it would clearly be a 
QOI implementation regression.


There are cases where we set a scratch register more than once.  It may 
be possible to use an additional scratch.  I haven't tried that yet.


I've seen one failure to if-convert a sequence after this patch, but it 
should be resolved once the logical AND changes are merged.  Otherwise 
I'm primarily seeing slight differences in register allocation and 
scheduling.  Nothing concerning to me.


This has run through my tester, but I obviously want to see how it 
behaves in the upstream CI system as that tests slightly different 
multilibs than mine (on purpose).


Jeff

gcc/

* config/riscv/bitmanip.md (various splits): Avoid writing the output
more than once when trivially possible.

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index c226c39f580..400fea30f91 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -68,23 +68,25 @@ (define_split
   [(set (match_operand:DI 0 "register_operand")
(zero_extend:DI (plus:SI (ashift:SI (subreg:SI (match_operand:DI 1 
"register_operand") 0)
   (match_operand:QI 2 
"imm123_operand"))
-(subreg:SI (match_operand:DI 3 
"register_operand") 0]
+(subreg:SI (match_operand:DI 3 
"register_operand") 0
+   (clobber (match_operand:DI 4 "register_operand"))]
   "TARGET_64BIT && TARGET_ZBA"
-  [(set (match_dup 0) (plus:DI (ashift:DI (match_dup 1) (match_dup 2)) 
(match_dup 3)))
-   (set (match_dup 0) (zero_extend:DI (subreg:SI (match_dup 0) 0)))])
+  [(set (match_dup 4) (plus:DI (ashift:DI (match_dup 1) (match_dup 2)) 
(match_dup 3)))
+   (set (match_dup 0) (zero_extend:DI (subreg:SI (match_dup 4) 0)))])
 
 (define_split
   [(set (match_operand:DI 0 "register_operand")
(zero_extend:DI (plus:SI (subreg:SI (and:DI (ashift:DI 
(match_operand:DI 1 "register_operand")
   
(match_operand:QI 2 "imm123_operand"))
(match_operand:DI 3 
"consecutive_bits_operand")) 0)
-(subreg:SI (match_operand:DI 4 
"register_operand") 0]
+(subreg:SI (match_operand:DI 4 
"register_operand") 0
+   (clobber (match_operand:DI 5 "register_operand"))]
   "TARGET_64BIT && TARGET_ZBA
&& riscv_shamt_matches_mask_p (INTVAL (operands[2]), INTVAL (operands[3]))
/* Ensure the mask includes all the bits in SImode.  */
&& ((INTVAL (operands[3]) & (HOST_WIDE_INT_1U << 31)) != 0)"
-  [(set (match_dup 0) (plus:DI (ashift:DI (match_dup 1) (match_dup 2)) 
(match_dup 4)))
-   (set (match_dup 0) (zero_extend:DI (subreg:SI (match_dup 0) 0)))])
+  [(set (match_dup 5) (plus:DI (ashift:DI (match_dup 1) (match_dup 2)) 
(match_dup 4)))
+   (set (match_dup 0) (zero_extend:DI (subreg:SI (match_dup 5) 0)))])
 
 ; Make sure that an andi followed by a sh[123]add remains a two instruction
 ; sequence--and is not torn apart into slli, slri, add.
@@ -195,13 +197,14 @@ (define_split
 (match_operand:QI 2 
"imm123_operand"))
  (match_operand 3 
"consecutive_bits32_operand"))
  (match_operand:DI 4 "register_operand"))
-(match_operand 5 "immediate_operand")))]
+(match_operand 5 "immediate_operand")))
+   (clobber (match_operand:DI 6 "register_operand"))]
   "TARGET_64BIT && TARGET_ZBA"
-  [(set (match_dup 0)
+  [(set (match_dup 6)
(plus:DI (and:DI (ashift:DI (match_dup 1) (match_dup 2))
 (match_dup 3))
 (match_dup 4)))
-   (set (match_dup 0) (plus:DI (match_dup 0) (match_dup 5)))])
+   (set (match_dup 0) (plus:DI (match_dup 6) (match_dup 5)))])
 
 ;; ZBB extension.
 
@@ -846,18 +849,19 @@ (define_insn "*bclri"
 (define_insn_and_split "*bclri_nottwobits"
   [(set (match_operand:X 0 "register_operand" "=r")
(and:X (match_operand:X 1 "register_operand" "r")
-  (match_operand:X 2 "const_nottwobits_not_arith_operand" "i")))]
+  (match_operand:X 2 "const_nottwobits_not_arith_operand" "i")))
+   (clobber (match_scratch:X 3 "=&r"))]
   "TARGET_ZBS && !paradoxical_subreg_p (operands[1])"
   "#"
   "&& reload_completed"
-  [(set (match_dup 0) (and:X (match_dup 1) (match_dup 3)))
-   (set (match_dup 0) (and:X (match_dup 0) (match_dup 4)))]
+  [(set (match_dup 3) (and:X (match_dup 1) (match_dup 4)))
+   (set (match_dup 0) (and:X (match_dup 3) (match_dup 5)))]
 {
-

Re: [to-be-committed][RISC-V] Avoid setting output object more than once in IOR/XOR synthesis

2025-05-18 Thread Mark Wielaard

Hi Jeff,

On Thu, May 15, 2025 at 10:11:19PM -0600, Jeff Law wrote:
> This has been tested in my tester and is currently bootstrapping on
> my BPI.  Waiting on data from the pre-commit tester before moving
> forward...

It looks like the Sourceware p550 and spacemit-x60 builders do flag a
bootstrap issue with this:

https://builder.sourceware.org/buildbot/#/builders/337/builds/255
https://builder.sourceware.org/buildbot/#/builders/338/builds/228

../../gcc/gcc/config/riscv/riscv.cc: In function ‘bool 
synthesize_ior_xor(rtx_code, rtx_def**)’:
../../gcc/gcc/config/riscv/riscv.cc:14422:18: error: ‘output’ may be used 
uninitialized [-Werror=maybe-uninitialized]
14422 |   emit_move_insn (operands[0], output);
  |   ~~~^
../../gcc/gcc/config/riscv/riscv.cc:14393:7: note: ‘output’ was declared here
14393 |   rtx output;
  |   ^~
cc1plus: all warnings being treated as errors
make[3]: *** [Makefile:2728: riscv.o] Error 1
make[3]: *** Waiting for unfinished jobs

>   * config/riscv/riscv.cc (synthesize_ior_xor): Avoid writing
>   operands[0] more than once, use new pseudos instead.
> 
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index d996965d095..b908c4684ac 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> [...]
> @@ -14378,6 +14394,7 @@ synthesize_ior_xor (rtx_code code, rtx operands[3])
>/* Synthesis is better than loading the constant.  */
>ival = INTVAL (operands[2]);
>rtx input = operands[1];
> +  rtx output;
>  
>/* Emit the [x]ori insn that sets the low 11 bits into
>   the proper state.  */

Should output here be initialized to NULL_RTX?

Thanks,

Mark
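
For illustration, a minimal sketch of the general pattern behind a
-Wmaybe-uninitialized diagnostic and the kind of fix suggested above.
This is not the code from riscv.cc: the function name, parameters and
constants are hypothetical, and plain integers stand in for rtx values.

```cpp
// Generic illustration only; all names here are hypothetical.
int
synthesize (int input, bool set_low_bits, bool set_high_bits)
{
  // Initialising at the declaration gives OUTPUT a defined value on
  // every path, including the one where neither branch below runs.
  int output = input;

  if (set_low_bits)
    {
      output = input | 0x7ff;
      input = output;
    }

  if (set_high_bits)
    output = input | 0x1000;

  return output;
}
```

Without the initialiser, a compiler that cannot prove that one of the
branches always runs may report the variable as possibly uninitialised
at the final use.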

[PATCH] cobol: Minor grammatical correction as the first issue.

2025-05-18 Thread Hugo Marrassé

Hi everyone,

I recently started studying GCC and the new COBOL front end, and I
noticed something that looked like a typo. I thought it would make a
good first issue to report, so here is my patch.

From 4f7fd1e08151df26b37a5a1f2cbce2623f214361 Mon Sep 17 00:00:00 2001
From: pulk66-s 
Date: Sun, 18 May 2025 16:19:04 +0200
Subject: [PATCH] cobol: fix minor grammar in comments

---
 gcc/cobol/lexio.cc | 2 +-
 gcc/cobol/parse.y  | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/cobol/lexio.cc b/gcc/cobol/lexio.cc
index 2db1af273e9..4f68bf65887 100644
--- a/gcc/cobol/lexio.cc
+++ b/gcc/cobol/lexio.cc
@@ -1455,7 +1455,7 @@ cdftext::lex_open( const char filename[] ) {
 
   int output = open_output();
 
-  // Process any files supplied by the -include comamnd-line option.
+  // Process any files supplied by the -include command-line option.
   for( auto name : included_files ) {
 int input;
 if( -1 == (input = open(name, O_RDONLY)) ) {
diff --git a/gcc/cobol/parse.y b/gcc/cobol/parse.y
index cb96c907361..1df9a9a63e9 100644
--- a/gcc/cobol/parse.y
+++ b/gcc/cobol/parse.y
@@ -5038,7 +5038,7 @@ accept: accept_body end_accept {
 		  switch( $accept_body.func ) {
 		  case accept_done_e:
 		error_msg(@ec, "ON EXCEPTION valid only "
-			"with ENVIRONMENT or COMAMND-LINE(n)");
+			"with ENVIRONMENT or COMMAND-LINE(n)");
 		break;
 		  case accept_command_line_e:
 		if( $1.from->field == NULL ) { // take next command-line arg
@@ -5050,7 +5050,7 @@ accept: accept_body end_accept {
 		  parser_move(*$1.into, *$1.from);
 		  if( $ec.on_error || $ec.not_error ) {
 			error_msg(@ec, "ON EXCEPTION valid only "
-"with ENVIRONMENT or COMAMND-LINE(n)");
+"with ENVIRONMENT or COMMAND-LINE(n)");
 		  }
 		} else {
 		  parser_accept_command_line(*$1.into, *$1.from,
-- 
2.45.2

[PATCH v1 4/6] libstdc++: Add tests for layout_right.

2025-05-18 Thread Luc Grosheintz

Adds tests for layout_right and for the parts of layout_left that depend
on layout_right.
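
As a rough guide to the stride values the new tests expect, the
following sketch computes row-major (layout_right-style) and
column-major (layout_left-style) strides by hand. It deliberately
avoids <mdspan>; the helper names strides_right and strides_left are
hypothetical and not part of the patch.

```cpp
#include <array>
#include <cstddef>

// Row-major: the last index is contiguous, so walk the extents from
// right to left and accumulate the running product.
template<std::size_t Rank>
  std::array<std::size_t, Rank>
  strides_right(const std::array<std::size_t, Rank>& exts)
  {
    std::array<std::size_t, Rank> s{};
    std::size_t prod = 1;
    for (std::size_t i = Rank; i-- > 0; )
      {
	s[i] = prod;
	prod *= exts[i];
      }
    return s;
  }

// Column-major: the first index is contiguous, so walk left to right.
template<std::size_t Rank>
  std::array<std::size_t, Rank>
  strides_left(const std::array<std::size_t, Rank>& exts)
  {
    std::array<std::size_t, Rank> s{};
    std::size_t prod = 1;
    for (std::size_t i = 0; i < Rank; ++i)
      {
	s[i] = prod;
	prod *= exts[i];
      }
    return s;
  }
```

For extents (3, 5, 7) this gives strides (35, 7, 1) for the row-major
case and (1, 3, 15) for the column-major case, matching the values
checked in test_stride_3d.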

libstdc++-v3/ChangeLog:

* testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc: Add
tests for layout_right.
* testsuite/23_containers/mdspan/layouts/ctors.cc: Add tests for
layout_right and the interaction with layout_left.
* testsuite/23_containers/mdspan/layouts/mapping.cc: Ditto.

Signed-off-by: Luc Grosheintz 
---
 .../mdspan/layouts/class_mandate_neg.cc   |  1 +
 .../23_containers/mdspan/layouts/ctors.cc | 64 +++
 .../23_containers/mdspan/layouts/mapping.cc   | 78 ---
 3 files changed, 133 insertions(+), 10 deletions(-)

diff --git 
a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
index f122541b3e8..137cf8f06a9 100644
--- a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
@@ -18,5 +18,6 @@ template
   };
 
 A a_left; // { dg-error "required from" }
+A a_right;   // { dg-error "required from" }
 
 // { dg-prune-output "must be representable as index_type" }
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
index 4592a05dec8..e3e25528f33 100644
--- a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
@@ -242,6 +242,66 @@ namespace from_same_layout
 }
 }
 
+// ctor: mapping(layout_{right,left}::mapping)
+namespace from_left_or_right
+{
+  template
+constexpr void
+verify_ctor(OExtents oexts)
+{
+  using SMapping = typename SLayout::mapping;
+  using OMapping = typename OLayout::mapping;
+
+  constexpr bool expected = std::is_convertible_v;
+  if constexpr (expected)
+   verify_nothrow_convertible(OMapping(oexts));
+  else
+   verify_nothrow_constructible(OMapping(oexts));
+}
+
+  template
+constexpr bool
+test_ctor()
+{
+  assert_not_constructible<
+   typename SLayout::mapping>,
+   typename OLayout::mapping>>();
+
+  verify_ctor>(
+   std::extents{});
+
+  verify_ctor>(
+   std::extents{});
+
+  assert_not_constructible<
+   typename SLayout::mapping>,
+   typename OLayout::mapping>>();
+
+  verify_ctor>(
+   std::extents{});
+
+  verify_ctor>(
+   std::extents{});
+
+  verify_ctor>(
+   std::extents{});
+
+  assert_not_constructible<
+   typename SLayout::mapping>,
+   typename OLayout::mapping>>();
+  return true;
+}
+
+  template
+constexpr void
+test_all()
+{
+  test_ctor();
+  static_assert(test_ctor());
+}
+}
+
 template
   constexpr void
   test_all()
@@ -254,5 +314,9 @@ int
 main()
 {
   test_all();
+  test_all();
+
+  from_left_or_right::test_all();
+  from_left_or_right::test_all();
   return 0;
 }
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
index 18f3548df67..7cbb284492c 100644
--- a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
@@ -301,6 +301,15 @@ template<>
 VERIFY(m.stride(1) == 3);
   }
 
+template<>
+  constexpr void
+  test_stride_2d()
+  {
+std::layout_right::mapping> m;
+VERIFY(m.stride(0) == 5);
+VERIFY(m.stride(1) == 1);
+  }
+
 template
   constexpr void
   test_stride_3d();
@@ -315,6 +324,16 @@ template<>
 VERIFY(m.stride(2) == 3*5);
   }
 
+template<>
+  constexpr void
+  test_stride_3d()
+  {
+std::layout_right::mapping m(std::dextents(3, 5, 7));
+VERIFY(m.stride(0) == 35);
+VERIFY(m.stride(1) == 7);
+VERIFY(m.stride(2) == 1);
+  }
+
 template
   constexpr bool
   test_stride_all()
@@ -389,24 +408,59 @@ template
 { m2 != m1 } -> std::same_as;
   };
 
-template
-  constexpr bool
+template
+  constexpr void
   test_has_op_eq()
   {
+static_assert(has_op_eq<
+   typename SLayout::mapping>,
+   typename OLayout::mapping>> == Expected);
+
+static_assert(!has_op_eq<
+   typename SLayout::mapping>,
+   typename OLayout::mapping>>);
+
+static_assert(has_op_eq<
+   typename SLayout::mapping>,
+   typename OLayout::mapping>> == Expected);
+
+static_assert(has_op_eq<
+   typename SLayout::mapping>,
+   typename OLayout::mapping>> == Expected);
+
 static_assert(!has_op_eq<
-   typename Layout::mapping>,
-   typename Layout::mapping>>);
+   typename SLayout::mapping>,
+   typename OLayout::mapping>>);
 
 static_assert(has_op_eq<
-   typename Layout::mapping>,
-   typename Layout::mapping>>);
+   typename SLayout::mapping>,
+   typename OLayout::map

[PATCH v1 2/6] libstdc++: Add tests for layout_left.

2025-05-18 Thread Luc Grosheintz

Implements a suite of tests for the currently implemented parts of
layout_left. The individual tests are templated over the layout type, to
allow reuse as more layouts are added.
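
One of the new tests, class_mandate_neg.cc, exercises the mandate that
the size of the extents "must be representable as index_type". A
simplified sketch of such a check is shown below. It is written against
std::array rather than std::extents, and the name
product_is_representable is hypothetical; it only illustrates the
divide-instead-of-multiply idea for avoiding overflow.

```cpp
#include <array>
#include <cstddef>
#include <limits>

// Returns true if the product of all FACTORS fits into IndexType.
// Dividing the maximum value by each factor in turn avoids ever
// forming a product that could overflow.
template<typename IndexType, std::size_t N>
  bool
  product_is_representable(const std::array<std::size_t, N>& factors)
  {
    std::size_t rest = std::numeric_limits<IndexType>::max();
    for (std::size_t f : factors)
      {
	if (f == 0)
	  return true;	// The product is zero, which always fits.
	rest /= f;
      }
    // REST is floor(max / product); it is nonzero iff product <= max.
    return rest > 0;
  }
```

With a signed 8-bit index type, a single extent of 127 is accepted by
this check and an extent of 128 is rejected.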

libstdc++-v3/ChangeLog:

* testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc: New test.
* testsuite/23_containers/mdspan/layouts/ctors.cc: New test.
* testsuite/23_containers/mdspan/layouts/mapping.cc: New test.

Signed-off-by: Luc Grosheintz 
---
 .../mdspan/layouts/class_mandate_neg.cc   |  22 +
 .../23_containers/mdspan/layouts/ctors.cc | 258 ++
 .../23_containers/mdspan/layouts/mapping.cc   | 445 ++
 3 files changed, 725 insertions(+)
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc

diff --git 
a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
new file mode 100644
index 000..f122541b3e8
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
@@ -0,0 +1,22 @@
+// { dg-do compile { target c++23 } }
+#include
+
+#include 
+
+constexpr size_t dyn = std::dynamic_extent;
+static constexpr size_t n = (size_t(1) << 7) - 1;
+
+template
+  struct A
+  {
+typename Layout::mapping> m0;
+typename Layout::mapping> m1;
+typename Layout::mapping> m2;
+
+using extents_type = std::extents;
+typename Layout::mapping m3; // { dg-error "required from" }
+  };
+
+A a_left; // { dg-error "required from" }
+
+// { dg-prune-output "must be representable as index_type" }
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
new file mode 100644
index 000..4592a05dec8
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
@@ -0,0 +1,258 @@
+// { dg-do run { target c++23 } }
+#include 
+
+#include 
+
+constexpr size_t dyn = std::dynamic_extent;
+
+template
+  constexpr void
+  verify_from_exts(OExtents exts)
+  {
+auto m = Mapping(exts);
+VERIFY(m.extents() == exts);
+  }
+
+
+template
+  constexpr void
+  verify_from_mapping(OMapping other)
+  {
+auto m = SMapping(other);
+VERIFY(m.extents() == other.extents());
+  }
+
+template
+  requires (std::__mdspan::__is_extents)
+  constexpr void
+  verify(OExtents oexts)
+  {
+auto m = Mapping(oexts);
+VERIFY(m.extents() == oexts);
+  }
+
+template
+  requires (std::__mdspan::__standardized_mapping)
+  constexpr void
+  verify(OMapping other)
+  {
+constexpr auto rank = Mapping::extents_type::rank();
+auto m = Mapping(other);
+VERIFY(m.extents() == other.extents());
+if constexpr (rank > 0)
+  for(size_t i = 0; i < rank; ++i)
+   VERIFY(std::cmp_equal(m.stride(i), other.stride(i)));
+  }
+
+
+template
+  constexpr void
+  verify_nothrow_convertible(From from)
+  {
+static_assert(std::is_nothrow_constructible_v);
+static_assert(std::is_convertible_v);
+verify(from);
+  }
+
+template
+  constexpr void
+  verify_convertible(From from)
+  {
+static_assert(std::is_convertible_v);
+verify(from);
+  }
+
+template
+  constexpr void
+  verify_constructible(From from)
+  {
+static_assert(!std::is_convertible_v);
+static_assert(!std::is_nothrow_constructible_v);
+static_assert(std::is_constructible_v);
+verify(from);
+  }
+
+template
+  constexpr void
+  verify_nothrow_constructible(From from)
+  {
+static_assert(!std::is_convertible_v);
+static_assert(std::is_nothrow_constructible_v);
+verify(from);
+  }
+
+template
+  constexpr void
+  assert_not_constructible()
+  {
+static_assert(!std::is_constructible_v);
+  }
+
+// ctor: mapping(const extents&)
+namespace from_extents
+{
+  template
+constexpr void
+verify_nothrow_convertible(OExtents oexts)
+{
+  using Mapping = typename Layout::mapping;
+  ::verify_nothrow_convertible(oexts);
+}
+
+  template
+constexpr void
+verify_nothrow_constructible(OExtents oexts)
+{
+  using Mapping = typename Layout::mapping;
+  ::verify_nothrow_constructible(oexts);
+}
+
+  template
+constexpr void
+assert_not_constructible()
+{
+  using Mapping = typename Layout::mapping;
+  ::assert_not_constructible();
+}
+
+  template
+constexpr bool
+test_ctor()
+{
+  verify_nothrow_convertible>(
+   std::extents{});
+
+  verify_nothrow_convertible>(
+   std::extents{});
+
+  verify_nothrow_convertible>(
+   std::extents{2});
+
+  verify_nothrow_constructible>(
+   std::extents{});
+
+  verify_nothrow_constructible>(
+   std::extents{});
+
+  verify_nothrow_constructible>(
+   std::extents{});
+
+

[PATCH v1 0/6] Implement layouts from mdspan.

2025-05-18 Thread Luc Grosheintz

Technically, this is the second iteration of these patches. Previous
discussion can be found here:

https://gcc.gnu.org/pipermail/libstdc++/2025-May/061350.html

The implementation of `layout_stride::mapping::is_exhaustive` needs
to be discussed: for empty extents, the formula that the standard
seems to require does not return true in all cases.
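
To make the question concrete, here is a brute-force sketch of one
reading of the standard's wording for is_exhaustive: true for rank 0,
otherwise true if some permutation P of the dimensions has
stride(P[0]) == 1 and stride(P[i]) == stride(P[i-1]) * extent(P[i-1]).
That reading is an assumption, the function name is hypothetical, and
this is not the implementation from the series.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <numeric>

// Brute-force check of the permutation formula described above.
template<std::size_t Rank>
  bool
  is_exhaustive_by_formula(const std::array<std::size_t, Rank>& exts,
			   const std::array<std::size_t, Rank>& strides)
  {
    if constexpr (Rank == 0)
      return true;
    else
      {
	// Start from the identity permutation, which is the
	// lexicographically smallest, so the loop visits all of them.
	std::array<std::size_t, Rank> p{};
	std::iota(p.begin(), p.end(), std::size_t(0));
	do
	  {
	    bool ok = strides[p[0]] == 1;
	    for (std::size_t i = 1; ok && i < Rank; ++i)
	      ok = strides[p[i]] == strides[p[i - 1]] * exts[p[i - 1]];
	    if (ok)
	      return true;
	  }
	while (std::next_permutation(p.begin(), p.end()));
	return false;
      }
  }
```

With extents (3, 5) and strides (5, 1) the formula holds. With extents
(0, 3) and strides (5, 1) it does not, even though the index space is
empty, which is the kind of case the paragraph above refers to.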

Luc Grosheintz (6):
  libstdc++: Implement layout_left from mdspan.
  libstdc++: Add tests for layout_left.
  libstdc++: Implement layout_right from mdspan.
  libstdc++: Add tests for layout_right.
  libstdc++: Implement layout_stride from mdspan.
  libstdc++: Add tests for layout_stride.

 libstdc++-v3/include/std/mdspan   | 604 ++
 .../mdspan/layouts/class_mandate_neg.cc   |  42 ++
 .../23_containers/mdspan/layouts/ctors.cc | 421 
 .../23_containers/mdspan/layouts/mapping.cc   | 573 +
 .../23_containers/mdspan/layouts/stride.cc| 494 ++
 5 files changed, 2134 insertions(+)
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/layouts/stride.cc

-- 
2.49.0

[PATCH v1 3/6] libstdc++: Implement layout_right from mdspan.

2025-05-18 Thread Luc Grosheintz

Implement the parts of layout_left that depend on layout_right, and the
parts of layout_right that don't depend on layout_stride.
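
The index computation for a row-major layout can be sketched as
follows. This is a simplified stand-alone version that takes the
extents and indices as std::array values; linear_index_right is a
hypothetical name, and the patch's own helper works on a parameter pack
instead.

```cpp
#include <array>
#include <cstddef>

// Row-major linear index: the last index varies fastest, so walk the
// dimensions from right to left while tracking the running stride.
template<std::size_t Rank>
  std::size_t
  linear_index_right(const std::array<std::size_t, Rank>& exts,
		     const std::array<std::size_t, Rank>& idx)
  {
    std::size_t res = 0;
    std::size_t mult = 1;
    for (std::size_t pos = Rank; pos-- > 0; )
      {
	res += idx[pos] * mult;
	mult *= exts[pos];
      }
    return res;
  }
```

For extents (3, 5, 7) and indices (2, 1, 4) this yields
2*35 + 1*7 + 4 = 81.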

libstdc++-v3/ChangeLog:

* include/std/mdspan (layout_right): New class.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/include/std/mdspan | 153 +++-
 1 file changed, 152 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index 3c1c33d9e9a..b1984eb2a33 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -360,6 +360,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   class mapping;
   };
 
+  struct layout_right
+  {
+template
+  class mapping;
+  };
+
   namespace __mdspan
   {
 template
@@ -427,7 +433,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  _Mapping>;
 
 template
-  concept __standardized_mapping = __mapping_of;
+  concept __standardized_mapping = __mapping_of
+  || __mapping_of;
 
 template
   concept __mapping_like = requires
@@ -488,6 +495,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
: mapping(__other.extents(), __mdspan::__internal_ctor{})
{ }
 
+  template
+   requires (_Extents::rank() <= 1
+ && is_constructible_v<_Extents, _OExtents>)
+   constexpr explicit(!is_convertible_v<_OExtents, _Extents>)
+   mapping(const layout_right::mapping<_OExtents>& __other) noexcept
+   : mapping(__other.extents(), __mdspan::__internal_ctor{})
+   { }
+
   constexpr mapping&
   operator=(const mapping&) noexcept = default;
 
@@ -544,6 +559,142 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
[[no_unique_address]] _Extents _M_extents;
 };
 
+  namespace __mdspan
+  {
+template
+  constexpr typename _Extents::index_type
+  __linear_index_right(const _Extents& __exts, _Indices... __indices)
+  {
+   using _IndexType = typename _Extents::index_type;
+   array<_IndexType, sizeof...(__indices)> __ind_arr{__indices...};
+   _IndexType __res = 0;
+   if constexpr (sizeof...(__indices) > 0)
+ {
+   _IndexType __mult = 1;
+   auto __update = [&, __pos = __exts.rank()](_IndexType) mutable
+ {
+   --__pos;
+   __res += __ind_arr[__pos] * __mult;
+   __mult *= __exts.extent(__pos);
+ };
+   (__update(__indices), ...);
+ }
+   return __res;
+  }
+  }
+
+  template
+class layout_right::mapping
+{
+  static_assert(__mdspan::__layout_extent<_Extents>,
+   "The size of extents_type must be representable as index_type");
+
+public:
+  using extents_type = _Extents;
+  using index_type = typename extents_type::index_type;
+  using size_type = typename extents_type::size_type;
+  using rank_type = typename extents_type::rank_type;
+  using layout_type = layout_right;
+
+  constexpr
+  mapping() noexcept = default;
+
+  constexpr
+  mapping(const mapping&) noexcept = default;
+
+  constexpr
+  mapping(const _Extents& __extents) noexcept
+  : _M_extents(__extents)
+  { __glibcxx_assert(__mdspan::__is_representable_extents(_M_extents)); }
+
+private:
+  template
+   constexpr explicit
+   mapping(const _OExtents& __oexts, __mdspan::__internal_ctor) noexcept
+   : mapping(extents_type(__oexts))
+   {
+ static_assert(__mdspan::__representable_size<_OExtents, index_type>,
+   "The size of OtherExtents must be representable as index_type");
+   }
+
+public:
+  template
+   requires (is_constructible_v)
+   constexpr explicit(!is_convertible_v<_OExtents, extents_type>)
+   mapping(const mapping<_OExtents>& __other) noexcept
+   : mapping(__other.extents(), __mdspan::__internal_ctor{})
+   { }
+
+  template
+   requires (extents_type::rank() <= 1
+   && is_constructible_v)
+   constexpr explicit(!is_convertible_v<_OExtents, extents_type>)
+   mapping(const layout_left::mapping<_OExtents>& __other) noexcept
+   : mapping(__other.extents(), __mdspan::__internal_ctor{})
+   { }
+
+  constexpr mapping&
+  operator=(const mapping&) noexcept = default;
+
+  constexpr const _Extents&
+  extents() const noexcept { return _M_extents; }
+
+  constexpr index_type
+  required_span_size() const noexcept
+  { return __mdspan::__fwd_prod(_M_extents, extents_type::rank()); }
+
+  template<__mdspan::__valid_index_type... _Indices>
+   requires (sizeof...(_Indices) == extents_type::rank())
+   constexpr index_type
+   operator()(_Indices... __indices) const noexcept
+   {
+ return __mdspan::__linear_index_right(
+   _M_extents, static_cast(__indices)...);
+   }
+
+  static constexpr bool
+  is_always_unique() noexcept
+  { return true; }
+
+  static constexpr bool
+  is_always_

Re: [PATCH 1/3] genemit: Remove support for string operands

2025-05-18 Thread Richard Sandiford

Jeff Law  writes:
> On 5/16/25 11:32 AM, Richard Sandiford wrote:
>> gen_exp currently supports the 's' (string) operand type.  It would
>> certainly be possible to make the upcoming bytecode patch support
>> that too.  However, the rtx codes that have string operands should
>> be very rarely used in hard-coded define_insn/expand/split/peephole2
>> rtx templates (as opposed to things like attribute expressions,
>> where const_string is commonplace).  And AFAICT, no current target
>> does use them like that.
>> 
>> This patch therefore reports an error for these rtx codes,
>> rather than adding code that would be unused and untested.
>> 
>> gcc/
>>  * genemit.cc (generator::gen_exp): Report an error for 's' operands.
> OK.  And we'll get a pretty good sense if a port is doing something 
> really weird in this space from my tester once this patch goes in.

Yeah.  I'm hoping my config-list.mk testing would have caught that
though, since it should show up as a build-time failure.

Thanks for the reviews.

Richard

Re: [PATCH 1/9] nds32: Avoid accessing beyond the operands[] array

2025-05-18 Thread Richard Sandiford

Jeff Law  writes:
> On 5/16/25 11:32 AM, Jeff Law wrote:
>> 
>> 
>> On 5/16/25 11:21 AM, Richard Sandiford wrote:
>>> This pattern used operands[2] to hold the shift amount, even though
>>> the pattern doesn't have an operand 2 (not even as a match_dup).
>>> This caused a build failure with -Werror:
>>>
>>>    array subscript 2 is above array bounds of ‘rtx_def* [2]’
>>>
>>> gcc/
>>> * config/nds32/nds32-intrinsic.md (unspec_get_pending_int): Use
>>> a local variable instead of operands[2].
>> Obviously OK.  IMHO you should just commit this kind of fix.
> You might consider looking at pr100837 which looks like it'd be fixed by 
> this change.

Ah yeah, good spot.  I'll add it to the commit message.

Richard

[PATCH v1 6/6] libstdc++: Add tests for layout_stride.

2025-05-18 Thread Luc Grosheintz

Implements the tests for layout_stride and for the features of the other
two layouts that depend on layout_stride.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc: Add
tests for layout_stride.
* testsuite/23_containers/mdspan/layouts/ctors.cc: Add test for
layout_stride and the interaction with other layouts.
* testsuite/23_containers/mdspan/layouts/mapping.cc: Ditto.
* testsuite/23_containers/mdspan/layouts/stride.cc: New test.

Signed-off-by: Luc Grosheintz 
---
 .../mdspan/layouts/class_mandate_neg.cc   |  19 +
 .../23_containers/mdspan/layouts/ctors.cc |  99 
 .../23_containers/mdspan/layouts/mapping.cc   |  72 ++-
 .../23_containers/mdspan/layouts/stride.cc| 494 ++
 4 files changed, 683 insertions(+), 1 deletion(-)
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/layouts/stride.cc

diff --git 
a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
index 137cf8f06a9..d1998f4eae3 100644
--- a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
@@ -17,7 +17,26 @@ template
 typename Layout::mapping m3; // { dg-error "required from" }
   };
 
+template
+  struct B // { dg-error "expansion of" }
+  {
+using Extents = std::extents;
+using OExtents = std::extents;
+
+using Mapping = typename Layout::mapping;
+using OMapping = typename Layout::mapping;
+
+Mapping m{OMapping{}};
+  };
+
 A a_left; // { dg-error "required from" }
 A a_right;   // { dg-error "required from" }
+A a_stride; // { dg-error "required from" }
+
+B<1, std::layout_left, std::layout_right> blr; // { dg-error "required 
here" }
+B<2, std::layout_left, std::layout_stride> bls;// { dg-error "required 
here" }
+
+B<3, std::layout_right, std::layout_left> brl; // { dg-error "required 
here" }
+B<4, std::layout_right, std::layout_stride> brs;   // { dg-error "required 
here" }
 
 // { dg-prune-output "must be representable as index_type" }
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
index e3e25528f33..19a6c8853e9 100644
--- a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
@@ -302,12 +302,111 @@ namespace from_left_or_right
 }
 }
 
+// ctor: mapping(layout_stride::mapping)
+namespace from_stride
+{
+  template
+constexpr auto
+strides(Mapping m)
+{
+  constexpr auto rank = Mapping::extents_type::rank();
+  std::array s;
+
+  if constexpr (rank > 0)
+   for(size_t i = 0; i < rank; ++i)
+ s[i] = m.stride(i);
+  return s;
+}
+
+  template
+constexpr void
+verify_convertible(OExtents oexts)
+{
+  using Mapping = typename Layout::mapping;
+  using OMapping = std::layout_stride::mapping;
+
+  constexpr auto other = OMapping(oexts, strides(Mapping(Extents(oexts;
+  if constexpr (std::is_same_v)
+   ::verify_nothrow_convertible(other);
+  else
+   ::verify_convertible(other);
+}
+
+  template
+constexpr void
+verify_constructible(OExtents oexts)
+{
+  using Mapping = typename Layout::mapping;
+  using OMapping = std::layout_stride::mapping;
+
+  constexpr auto other = OMapping(oexts, strides(Mapping(Extents(oexts;
+  if constexpr (std::is_same_v)
+   ::verify_nothrow_constructible(other);
+  else
+   ::verify_constructible(other);
+}
+
+  template
+constexpr bool
+test_ctor()
+{
+  assert_not_constructible<
+   typename Layout::mapping>,
+   std::layout_stride::mapping>>();
+
+  assert_not_constructible<
+   typename Layout::mapping>,
+   std::layout_stride::mapping>>();
+
+  assert_not_constructible<
+   typename Layout::mapping>,
+   std::layout_stride::mapping>>();
+
+  verify_convertible>(std::extents{});
+
+  verify_convertible>(
+   std::extents{});
+
+  // Rank ==  0 doesn't check IndexType for convertibility.
+  verify_convertible>(
+   std::extents{});
+
+  verify_constructible>(
+   std::extents{});
+
+  verify_constructible>(
+   std::extents{});
+
+  verify_constructible>(
+   std::extents{});
+
+  verify_constructible>(
+   std::extents{});
+
+  verify_constructible>(
+   std::extents{});
+
+  verify_constructible>(
+   std::extents{});
+  return true;
+}
+
+  template
+constexpr void
+test_all()
+{
+  test_ctor();
+  static_assert(test_ctor());
+}
+}
+
 template
   constexpr void
   test_all()
   {
 from_extents::test_all();
 from_same_la

[PATCH v1 1/6] libstdc++: Implement layout_left from mdspan.

2025-05-18 Thread Luc Grosheintz

Implements the parts of layout_left that don't depend on any of the
other layouts.
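
As a rough sketch of what a column-major mapping computes, the
stand-alone function below turns a multi-dimensional index into a
linear offset. It uses std::array arguments and the hypothetical name
linear_index_left; the helper in the patch operates on a parameter pack
of indices.

```cpp
#include <array>
#include <cstddef>

// Column-major linear index: the first index varies fastest, so walk
// the dimensions from left to right while tracking the running stride.
template<std::size_t Rank>
  std::size_t
  linear_index_left(const std::array<std::size_t, Rank>& exts,
		    const std::array<std::size_t, Rank>& idx)
  {
    std::size_t res = 0;
    std::size_t mult = 1;
    for (std::size_t pos = 0; pos < Rank; ++pos)
      {
	res += idx[pos] * mult;
	mult *= exts[pos];
      }
    return res;
  }
```

For extents (3, 5, 7) and indices (2, 1, 4) this yields
2 + 1*3 + 4*15 = 65.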

libstdc++-v3/ChangeLog:

* include/std/mdspan (layout_left): New class.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/include/std/mdspan | 240 
 1 file changed, 240 insertions(+)

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index 47cfa405e44..3c1c33d9e9a 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -144,6 +144,38 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  { return __exts[__i]; });
  }
 
+   static constexpr size_t
+   _M_static_extents_prod(size_t __begin, size_t __end) noexcept
+   {
+ size_t __ret = 1;
+ if constexpr (_S_rank > 0)
+   for(size_t __i = __begin; __i < __end; ++__i)
+ __ret *= (_Extents[__i] == dynamic_extent ? 1 : _Extents[__i]);
+ return __ret;
+   }
+
+   constexpr _IndexType
+   _M_dynamic_extents_prod(size_t __begin, size_t __end) const noexcept
+   {
+ _IndexType __ret = 1;
+ if constexpr (_S_rank_dynamic > 0)
+   {
+ size_t __dyn_begin = _S_dynamic_index[__begin];
+ size_t __dyn_end = _S_dynamic_index[__end];
+
+ for(size_t __i = __dyn_begin; __i < __dyn_end; ++__i)
+   __ret *= _M_dynamic_extents[__i];
+   }
+ return __ret;
+   }
+
+   constexpr _IndexType
+   _M_extents_prod(size_t __begin, size_t __end) const noexcept
+   {
+ return _IndexType(_M_static_extents_prod(__begin, __end))
+* _M_dynamic_extents_prod(__begin, __end);
+   }
+
   private:
using _S_storage = __array_traits<_IndexType, _S_rank_dynamic>::_Type;
[[no_unique_address]] _S_storage _M_dynamic_extents;
@@ -190,6 +222,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  return _S_storage::_S_static_extent(__r);
   }
 
+  constexpr index_type
+  _M_fwd_prod(rank_type __r) const noexcept
+  { return _M_dynamic_extents._M_extents_prod(0, __r); }
+
+  constexpr index_type
+  _M_rev_prod(rank_type __r) const noexcept
+  { return _M_dynamic_extents._M_extents_prod(__r + 1, rank()); }
+
   constexpr index_type
   extent(rank_type __r) const noexcept
   {
@@ -286,6 +326,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   namespace __mdspan
   {
+template
+  constexpr typename _Extents::index_type
+  __fwd_prod(const _Extents& __exts, size_t __r) noexcept
+  { return __exts._M_fwd_prod(__r); }
+
+template
+  constexpr typename _Extents::index_type
+  __rev_prod(const _Extents& __exts, size_t __r) noexcept
+  { return __exts._M_rev_prod(__r); }
+
 template
   auto __build_dextents_type(integer_sequence)
-> extents<_IndexType, ((void) _Counts, dynamic_extent)...>;
@@ -304,6 +354,196 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 explicit extents(_Integrals...) ->
   extents()...>;
 
+  struct layout_left
+  {
+template
+  class mapping;
+  };
+
+  namespace __mdspan
+  {
+template
+  constexpr bool __is_extents = false;
+
+template
+  constexpr bool __is_extents> = true;
+
+template
+  constexpr typename _Extents::index_type
+  __linear_index_left(const _Extents& __exts, _Indices... __indices)
+  {
+   using _IndexType = typename _Extents::index_type;
+   _IndexType __res = 0;
+   if constexpr (sizeof...(__indices) > 0)
+ {
+   _IndexType __mult = 1;
+   auto __update = [&, __pos = 0u](_IndexType __idx) mutable
+ {
+   __res += __idx * __mult;
+   __mult *= __exts.extent(__pos);
+   ++__pos;
+ };
+   (__update(__indices), ...);
+ }
+   return __res;
+  }
+
+template
+  constexpr bool
+  __is_representable_product(_GetFactor __get_factor)
+  {
+   size_t __rest = numeric_limits<_IndexType>::max();
+   for(size_t __i = 0; __i < _Nm; ++__i)
+ {
+   auto __factor = _IndexType(__get_factor(__i));
+   if (__factor == 0)
+ return true;
+   __rest /= __factor;
+ }
+   return __rest > 0;
+  }
+
+template
+  constexpr bool
+  __is_representable_extents(const _Extents& __exts)
+  {
+   using _IndexType = typename _Extents::index_type;
+   return __is_representable_product<_IndexType, _Extents::rank()>(
+   [&](size_t __i) { return __exts.extent(__i); });
+  }
+
+template
+  concept __representable_size = _Extents::rank_dynamic() != 0
+   || __is_representable_product<_IndexType, _Extents::rank()>(
+[](size_t __i) { return _Extents::static_extent(__i); });
+
+template
+  concept __layout_extent =
+   __representable_size<_Extents, typename _Extents::index_type>;
+
+template
+  concept __mapping_of =
+   is_same

Re: [PATCH 3/3] genemit: Use a byte encoding to generate insns

2025-05-18 Thread Richard Sandiford

Richard Biener  writes:
>> On 16.05.2025 at 19:37, Richard Sandiford wrote:
>> 
>> genemit has traditionally used open-coded gen_rtx_FOO sequences
>> to build up the instruction pattern.  This is now the source of
>> quite a bit of bloat in the binary, and also a source of slow
>> compile times.
>> 
>> Two obvious ways of trying to deal with this are:
>> 
>> (1) Try to identify rtxes that have a similar form and use shared
>>routines to generate rtxes of that form.
>> 
>> (2) Use a static table to encode the rtx and call a common routine
>>to expand it.
>> 
>> I did briefly look at (1).  However, it's more complex than (2),
>> and I think suffers from being the worst of both worlds, for reasons
>> that I'll explain below.  This patch therefore does (2).
>> 
>> In theory, one of the advantages of open-coding the calls to
>> gen_rtx_FOO is that the rtx can be populated using stores of known
>> constants (for the rtx code, mode, unspec number, etc).  However,
>> the time spent constructing an rtx is likely to be dominated by
>> the call to rtx_alloc, rather than by the stores to the fields.
>> 
>> Option (1) above loses this advantage of storing constants.
>> The shared routines would parameterise an rtx according to things
>> like the modes on the rtx and its suboperands, so the code would
>> need to fetch the parameters.  In a sense, the rtx structure would
>> be open-coded but the parameters would be table-encoded (albeit
>> in a simple way).
>> 
>> The expansion code also shouldn't be particularly hot.  Anything that
>> treats expand/discard cycles as very cheap would be misconceived,
>> since each discarded expansion generates garbage memory that needs
>> to be cleaned up later.
>> 
>> Option (2) turns out to be pretty simple -- certainly simpler
>> than (1) -- and seems to give a reasonable saving.  Some numbers,
>> all for --enable-checking=yes,rtl,extra:
>> 
>> [A] size of the @progbits sections in insn-emit-*.o, new / old
>> [B] size of the load segments in cc1, new / old
>> [C] time to compile a typical insn-emit*.cc, new / old
>> 
>> Target              [A]      [B]      [C]
>> 
>> native aarch64      0.5627   0.9585   0.5677
>> native x86_64       0.5925   0.9467   0.6377
>> aarch64-x-riscv64   0.       0.9066   0.2762
>
> Nice.  So how large are the tables, i.e. what’s the effect on .rodata of cc1?
>
> One nice thing about the old way is that you can set breakpoints on the gen_* 
> routines.  Can the tables be annotated with comments so it’s easy to look up 
> the part for a particular .md entry and is there a place to break on 
> conditional on some table index to simulate the old way?

Yeah, the number of gen_* routines is unchanged, and it's still possible
to set breakpoints on them.

I did wonder about removing the out-of-line gen_* functions where possible
and adding the encoding to the recog_data array instead.  But it would
still be necessary to define the gen_* routines as at least macros or
inline functions, since the target code can expect gen_* routines to exist
for any non-* named pattern, even optab ones.  Doing that did sound like it
would hurt debuggability and might be counterproductive in size terms too.
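
As a rough illustration of option (2) from the quoted cover letter, the idea
of encoding a pattern as bytes and expanding it with one shared routine can
be sketched as below.  This is a toy model and not genemit's actual encoding:
the Node type, the opcode values and the names expand_one and demo_expand are
all invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an rtx: a code name plus child nodes.
struct Node {
  std::string code;
  std::vector<std::shared_ptr<Node>> kids;
};
using NodePtr = std::shared_ptr<Node>;

// Made-up opcodes; the real encoding in the patch is different.
enum : uint8_t { OP_OPERAND = 0x00, OP_PLUS = 0x01, OP_SET = 0x02 };

static NodePtr leaf(const std::string &name) {
  auto n = std::make_shared<Node>();
  n->code = name;
  return n;
}

// Decode one node starting at *p, advancing p past the bytes consumed.
// OP_OPERAND is followed by an operand index; the other two opcodes are
// binary and are followed by the encodings of their two children.
static NodePtr expand_one(const uint8_t *&p,
                          const std::vector<NodePtr> &operands) {
  uint8_t op = *p++;
  if (op == OP_OPERAND)
    return operands[*p++];
  auto n = std::make_shared<Node>();
  n->code = (op == OP_PLUS) ? "plus" : "set";
  n->kids.push_back(expand_one(p, operands));
  n->kids.push_back(expand_one(p, operands));
  return n;
}

// Print a node in a Lisp-like form for inspection.
static std::string to_string(const NodePtr &n) {
  if (n->kids.empty())
    return n->code;
  std::string s = "(" + n->code;
  for (const auto &k : n->kids)
    s += " " + to_string(k);
  return s + ")";
}

// Each "generator" only carries a small constant table; the tree-building
// logic lives once, in expand_one.
std::string demo_expand() {
  std::vector<NodePtr> operands = { leaf("r0"), leaf("r1"), leaf("r2") };
  static const uint8_t encoding[] = {
    OP_SET, OP_OPERAND, 0, OP_PLUS, OP_OPERAND, 1, OP_OPERAND, 2
  };
  const uint8_t *p = encoding;
  return to_string(expand_one(p, operands));
}
```

In this model demo_expand() yields "(set r0 (plus r1 r2))".  The per-pattern
cost is the byte table rather than a sequence of open-coded constructor
calls, which is the trade-off the size numbers above are measuring.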

A typical function looks like:

-
/* .../gcc/config/aarch64/aarch64.md:1162 */
rtx
gen_aarch64_tbnehidi (rtx operand0, rtx operand1, rtx operand2)
{
  rtx operands[3] ATTRIBUTE_UNUSED = { operand0, operand1, operand2 };
  static const uint8_t expand_encoding[] = {
 0x17, 0x00, 0x02, 0x1f, 0x2f, 0x39, 0x00, 0x5c,
 0x00, 0x81, 0x06, 0x11, 0x01, 0x00, 0x27, 0x01,
 0x01, 0x01, 0x27, 0x00, 0x37, 0x00, 0x01, 0x02,
 0x2f, 0x05, 0x02, 0x42
  };
  return expand_rtx (expand_encoding, operands);
}
-

or, for embedded C++ code:

-
/* .../gcc/config/aarch64/aarch64.md:3117 */
rtx
gen_addsi3_carryinC (rtx operand0, rtx operand1, rtx operand2)
{
  rtx operands[7] ATTRIBUTE_UNUSED = { operand0, operand1, operand2 };
  start_sequence ();
  {
#define FAIL return (end_sequence (), nullptr)
#define DONE return end_sequence ()
#line 3134 "/home/ricsan01/gnu/src/gcc/gcc/config/aarch64/aarch64.md"
{
  operands[3] = gen_rtx_REG (CC_ADCmode, CC_REGNUM);
  rtx ccin = gen_rtx_REG (CC_Cmode, CC_REGNUM);
  operands[4] = gen_rtx_LTU (DImode, ccin, const0_rtx);
  operands[5] = gen_rtx_LTU (SImode, ccin, const0_rtx);
  operands[6] = immed_wide_int_const (wi::shwi (1, DImode)
  << GET_MODE_BITSIZE (SImode),
  TImode);
}
#undef DONE
#undef FAIL
  }
  static const uint8_t expand_encoding[] = {
 0x01, 0x17, 0x00, 0x02, 0x1f, 0x01, 0x03, 0x3a,
 0x0b, 0x3b, 0x11, 0x3b, 0x11, 0x01, 0x04, 0x6f,
 0x11, 0x01, 0x01, 0x6f, 0x11, 0x01, 0x02, 0x01,

[PATCH] Fortran: fix FAIL of gfortran.dg/specifics_1.f90 after r16-372 [PR120099]

2025-05-18 Thread Harald Anlauf


Dear all,

the attached proposed patch fixes PR120099 by modifying
gfc_return_by_reference so that it returns true with -ff2c
also for intrinsics returning complex numbers, as these are
not pure in the GCC IR sense, and wrapper functions for the
intrinsics were optimized out by DCE.

The change only affects compilation with -ff2c, so I guess
there will be only a few people able to test any performance
impact on real-world code...

Regtested on x86_64-pc-linux-gnu with testcases only that
use the -ff2c flag explicitly.

OK for mainline?

Thanks,
Harald

From 65d7c6efe51371ba4d0681fc2fa0e732b70b70d7 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Sun, 18 May 2025 22:42:26 +0200
Subject: [PATCH] Fortran: fix FAIL of gfortran.dg/specifics_1.f90 after
 r16-372 [PR120099]

After commit r16-372, testcase gfortran.dg/specifics_1.f90 started to
FAIL at -O2 and higher, as DCE led to elimination of evaluations of
Fortran specific intrinsics returning complex results and with -ff2c.
As the Fortran runtime library is compiled with -fno-f2c, the frontend
generates calls to wrapper subroutines _gfortran_f2c_specific_* that
return their result by reference via their first argument when this is
needed.  This is e.g. the case when specific names of the intrinsics are
used for passing as actual argument to procedures.  These wrappers are
not pure in the GCC IR sense, even if the Fortran intrinsics are.
Therefore gfc_return_by_reference must return true for these.

	PR fortran/120099

gcc/fortran/ChangeLog:

	* trans-types.cc (gfc_return_by_reference): Intrinsic functions
	returning complex numbers may return their result by reference
	with -ff2c.
---
 gcc/fortran/trans-types.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/trans-types.cc b/gcc/fortran/trans-types.cc
index f8980754685..e15b1bb89f0 100644
--- a/gcc/fortran/trans-types.cc
+++ b/gcc/fortran/trans-types.cc
@@ -3231,13 +3231,14 @@ gfc_return_by_reference (gfc_symbol * sym)
 
   /* Possibly return complex numbers by reference for g77 compatibility.
  We don't do this for calls to intrinsics (as the library uses the
- -fno-f2c calling convention), nor for calls to functions which always
+ -fno-f2c calling convention) except for calls to specific wrappers
+ (_gfortran_f2c_specific_*), nor for calls to functions which always
  require an explicit interface, as no compatibility problems can
  arise there.  */
   if (flag_f2c && sym->ts.type == BT_COMPLEX
   && !sym->attr.pointer
   && !sym->attr.allocatable
-  && !sym->attr.intrinsic && !sym->attr.always_explicit)
+  && !sym->attr.always_explicit)
 return 1;
 
   return 0;
-- 
2.43.0
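
The difference between the two calling conventions mentioned in the commit
message can be sketched in C++ as follows.  This is only an analogy, assuming
a by-value convention for the library routine and a result-by-reference
convention for the wrapper; the function names are hypothetical and are not
the real libgfortran symbols.

```cpp
#include <cassert>
#include <complex>

// By-value style (analogous to -fno-f2c): the complex result is the
// function's return value.
std::complex<float> conj_by_value(const std::complex<float> *z) {
  return std::conj(*z);
}

// f2c style: the result is stored through a hidden first argument and the
// function itself returns nothing.  The only observable effect of a call is
// the store to *result, so a caller that treated this as a pure function
// with an unused return value could wrongly discard the call.
void conj_f2c_style(std::complex<float> *result,
                    const std::complex<float> *z) {
  *result = conj_by_value(z);
}
```

A caller using the second form has to keep the call alive for the sake of the
store, which is why the wrapper must not be considered pure in the IR sense.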

[PATCH] gimple-fold: Implement simple copy propagation for aggregates [PR14295]

2025-05-18 Thread Andrew Pinski

This implements simple copy propagation for aggregates, in a similar
fashion to what we already do for copy propagation of zeroing.

Right now this only looks at the previous vdef statement, but that already
lets us catch a lot of cases that show up in C++ code.

This also deletes aggregate copies to the same location (PR57361).  That was
already done in DSE, but it is simple to do here too, and a copy to a
temporary and back to the original location should be deleted as well.
So we need one variant of the test for DSE and one for forwprop.

Also adds a variant of pr22237.c which was found while working on this patch.
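
As a minimal sketch of the kind of source pattern this targets (assuming
distinct, non-overlapping objects; the struct and function names are made up
for illustration), a copy through a temporary is equivalent to a direct copy:

```cpp
#include <cassert>

struct S { unsigned char a[256]; };

// Source form: the aggregate goes through a temporary.
void copy_via_temp(S *dst, const S *src) {
  S tmp = *src;   // first aggregate copy
  *dst = tmp;     // second aggregate copy, reading what was just stored
}

// What aggregate copy propagation would ideally reduce it to.
void copy_direct(S *dst, const S *src) {
  *dst = *src;
}

// Helper for comparing results byte by byte.
bool same_bytes(const S &x, const S &y) {
  for (int i = 0; i < 256; i++)
    if (x.a[i] != y.a[i])
      return false;
  return true;
}
```

The pr22237 variant in the patch exercises partially overlapping objects,
where such a rewrite needs extra care; this sketch deliberately avoids that
case.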

PR tree-optimization/14295
PR tree-optimization/108358
PR tree-optimization/114169

gcc/ChangeLog:

* tree-ssa-forwprop.cc (optimize_agr_copyprop): New function.
(pass_forwprop::execute): Call optimize_agr_copyprop for load/store 
statements.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/20031106-6.c: Un-xfail. Add scan for forwprop1.
* g++.dg/opt/pr66119.C: Disable forwprop since that does
the copy prop now.
* gcc.dg/tree-ssa/pr108358-a.c: New test.
* gcc.dg/tree-ssa/pr114169-1.c: New test.
* gcc.c-torture/execute/builtins/pr22237-1-lib.c: New test.
* gcc.c-torture/execute/builtins/pr22237-1.c: New test.
* gcc.dg/tree-ssa/pr57361.c: Disable forwprop1.
* gcc.dg/tree-ssa/pr57361-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/g++.dg/opt/pr66119.C|   2 +-
 .../execute/builtins/pr22237-1-lib.c  |  27 +
 .../execute/builtins/pr22237-1.c  |  57 ++
 gcc/testsuite/gcc.dg/tree-ssa/20031106-6.c|   8 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr108358-a.c|  33 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr114169-1.c|  39 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr57361-1.c |   9 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr57361.c   |   2 +-
 gcc/tree-ssa-forwprop.cc  | 103 ++
 9 files changed, 276 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1-lib.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr108358-a.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr114169-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr57361-1.c

diff --git a/gcc/testsuite/g++.dg/opt/pr66119.C 
b/gcc/testsuite/g++.dg/opt/pr66119.C
index d1b1845a258..52362e44434 100644
--- a/gcc/testsuite/g++.dg/opt/pr66119.C
+++ b/gcc/testsuite/g++.dg/opt/pr66119.C
@@ -3,7 +3,7 @@
the value of MOVE_RATIO now is.  */
 
 /* { dg-do compile  { target { { i?86-*-* x86_64-*-* } && c++11 } }  }  */
-/* { dg-options "-O3 -mavx -fdump-tree-sra -march=slm -mtune=slm 
-fno-early-inlining" } */
+/* { dg-options "-O3 -mavx -fdump-tree-sra -fno-tree-forwprop -march=slm 
-mtune=slm -fno-early-inlining" } */
 // { dg-skip-if "requires hosted libstdc++ for cstdlib malloc" { ! hostedlib } 
}
 
 #include 
diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1-lib.c 
b/gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1-lib.c
new file mode 100644
index 000..44032357405
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1-lib.c
@@ -0,0 +1,27 @@
+extern void abort (void);
+
+void *
+memcpy (void *dst, const void *src, __SIZE_TYPE__ n)
+{
+  const char *srcp;
+  char *dstp;
+
+  srcp = src;
+  dstp = dst;
+
+  if (dst < src)
+{
+  if (dst + n > src)
+   abort ();
+}
+  else
+{
+  if (src + n > dst)
+   abort ();
+}
+
+  while (n-- != 0)
+*dstp++ = *srcp++;
+
+  return dst;
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1.c 
b/gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1.c
new file mode 100644
index 000..0a12b0fc9a1
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1.c
@@ -0,0 +1,57 @@
+extern void abort (void);
+extern void exit (int);
+struct s { unsigned char a[256]; };
+union u { struct { struct s b; int c; } d; struct { int c; struct s b; } e; };
+static union u v;
+static union u v0;
+static struct s *p = &v.d.b;
+static struct s *q = &v.e.b;
+
+struct outers
+{
+  struct s inner;
+};
+
+static inline struct s rp (void) { return *p; }
+static inline struct s rq (void) { return *q; }
+static void pq (void)
+{
+  struct outers o = {rq () };
+  *p = o.inner;
+}
+static void qp (void)
+{
+  struct outers o = {rp () };
+  *q  = o.inner;
+}
+
+static void
+init (struct s *sp)
+{
+  int i;
+  for (i = 0; i < 256; i++)
+sp->a[i] = i;
+}
+
+static void
+check (struct s *sp)
+{
+  int i;
+  for (i = 0; i < 256; i++)
+if (sp->a[i] != i)
+  abort ();
+}
+
+void
+main_test (void)
+{
+  v = v0;
+  init (p);
+  qp ();
+  check (q);
+  v = v0;
+  init (q);
+  pq ();
+  check (p);
+  exit (0);
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20031106-6.c 
b/gcc/testsuite/gcc.dg/t

[PATCH v1 5/6] libstdc++: Implement layout_stride from mdspan.

2025-05-18 Thread Luc Grosheintz

Implements the remaining parts of layout_left and layout_right; and all
of layout_stride.
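
For readers less familiar with layout_stride: its mapping turns a
multi-dimensional index into a linear one by summing index times stride over
all dimensions.  A minimal standalone sketch, using std::array in place of
the extents and mapping types (the helper names here are invented and are not
part of the patch):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Linear index for a strided layout: sum over k of idx[k] * strides[k].
template <std::size_t N>
std::size_t strided_index(const std::array<std::size_t, N> &strides,
                          const std::array<std::size_t, N> &idx) {
  std::size_t res = 0;
  for (std::size_t k = 0; k < N; ++k)
    res += idx[k] * strides[k];
  return res;
}

// Row-major (layout_right-like) strides for the given extents: the last
// dimension has stride 1 and each earlier stride is the product of all
// later extents.
template <std::size_t N>
std::array<std::size_t, N>
row_major_strides(const std::array<std::size_t, N> &extents) {
  std::array<std::size_t, N> strides{};
  std::size_t stride = 1;
  for (std::size_t k = N; k > 0; --k) {
    strides[k - 1] = stride;
    stride *= extents[k - 1];
  }
  return strides;
}
```

For extents {2, 3, 4} this gives strides {12, 4, 1}, so index (1, 2, 3) maps
to 1*12 + 2*4 + 3*1 = 23.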

libstdc++-v3/ChangeLog:

* include/std/mdspan (layout_stride): New class.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/include/std/mdspan | 219 +++-
 1 file changed, 216 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index b1984eb2a33..31a38c736c2 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -366,6 +366,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   class mapping;
   };
 
+  struct layout_stride
+  {
+template
+  class mapping;
+  };
+
   namespace __mdspan
   {
 template
@@ -434,7 +440,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 template
   concept __standardized_mapping = __mapping_of
-  || __mapping_of;
+  || __mapping_of
+  || __mapping_of;
 
 template
   concept __mapping_like = requires
@@ -503,6 +510,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
: mapping(__other.extents(), __mdspan::__internal_ctor{})
{ }
 
+  template
+   requires (is_constructible_v)
+   constexpr explicit(extents_type::rank() > 0)
+   mapping(const layout_stride::mapping<_OExtents>& __other)
+   : mapping(__other.extents(), __mdspan::__internal_ctor{})
+   {
+ __glibcxx_assert(
+   layout_left::mapping<_OExtents>(__other.extents()) == __other);
+   }
+
   constexpr mapping&
   operator=(const mapping&) noexcept = default;
 
@@ -518,8 +535,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
constexpr index_type
operator()(_Indices... __indices) const noexcept
{
- return __mdspan::__linear_index_left(
-   this->extents(), static_cast(__indices)...);
+ return __mdspan::__linear_index_left(_M_extents,
+   static_cast(__indices)...);
}
 
   static constexpr bool
@@ -633,6 +650,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
: mapping(__other.extents(), __mdspan::__internal_ctor{})
{ }
 
+  template
+   requires (is_constructible_v)
+   constexpr explicit(extents_type::rank() > 0)
+   mapping(const layout_stride::mapping<_OExtents>& __other) noexcept
+   : mapping(__other.extents(), __mdspan::__internal_ctor{})
+   {
+ __glibcxx_assert(
+   layout_right::mapping<_OExtents>(__other.extents()) == __other);
+   }
+
   constexpr mapping&
   operator=(const mapping&) noexcept = default;
 
@@ -695,6 +722,192 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
[[no_unique_address]] _Extents _M_extents;
 };
 
+  namespace __mdspan
+  {
+template
+  constexpr typename _Mapping::index_type
+  __offset_impl(const _Mapping& __m, index_sequence<_Counts...>) noexcept
+  { return __m(((void) _Counts, 0)...); }
+
+template
+  constexpr typename _Mapping::index_type
+  __offset(const _Mapping& __m) noexcept
+  {
+   return __offset_impl(__m,
+   make_index_sequence<_Mapping::extents_type::rank()>());
+  }
+
+template
+  constexpr typename _Mapping::index_type
+  __linear_index_strides(const _Mapping& __m,
+_Indices... __indices)
+  {
+   using _IndexType = typename _Mapping::index_type;
+   _IndexType __res = 0;
+   if constexpr (sizeof...(__indices) > 0)
+ {
+   auto __update = [&, __pos = 0u](_IndexType __idx) mutable
+ {
+   __res += __idx * __m.stride(__pos++);
+ };
+   (__update(__indices), ...);
+ }
+   return __res;
+  }
+  }
+
+  template
+class layout_stride::mapping
+{
+  static_assert(__mdspan::__layout_extent<_Extents>,
+   "The size of extents_type must be representable as index_type");
+
+public:
+  using extents_type = _Extents;
+  using index_type = typename extents_type::index_type;
+  using size_type = typename extents_type::size_type;
+  using rank_type = typename extents_type::rank_type;
+  using layout_type = layout_stride;
+
+  constexpr
+  mapping() noexcept
+  {
+   auto __stride = index_type(1);
+   for(size_t __i = extents_type::rank(); __i > 0; --__i)
+ {
+   _M_strides[__i - 1] = __stride;
+   __stride *= _M_extents.extent(__i - 1);
+ }
+  }
+
+  constexpr
+  mapping(const mapping&) noexcept = default;
+
+  template<__mdspan::__valid_index_type _OIndexType>
+   constexpr
+   mapping(const extents_type& __exts,
+   span<_OIndexType, extents_type::rank()> __strides) noexcept
+   : _M_extents(__exts)
+   {
+ for(size_t __i = 0; __i < extents_type::rank(); ++__i)
+   _M_strides[__i] = index_type(as_const(__strides[__i]));
+   }
+
+  template<__mdspan::__valid_index_ty

Re: [PATCH 6/9] genemit: Consistently use operand arrays in gen_* functions

2025-05-18 Thread Richard Sandiford

Jeff Law  writes:
> On 5/16/25 11:21 AM, Richard Sandiford wrote:
>> One slightly awkward part about emitting the generator function
>> bodies is that:
>> 
>> * define_insn and define_expand routines have a separate argument for
>>each operand, named "operand0" upwards.
>> 
>> * define_split and define_peephole2 routines take a pointer to an array,
>>named "operands".
>> 
>> * the C++ preparation code for expands, splits and peephole2s uses an
>>array called "operands" to refer to the operands.
>> 
>> * the automatically-generated code uses individual "operand"
>>variables to refer to the operands.
>> 
>> So define_expands have to store the incoming arguments into an operands
>> array before the md file's C++ code, then copy the operands array back
>> to the individual variables before the automatically-generated code.
>> splits and peephole2s have to copy the incoming operands array to
>> individual variables after the md file's C++ code, creating more
>> local variables that are live across calls to rtx_alloc.
>> 
>> This patch tries to simplify things by making the whole function
>> body use the operands array in preference to individual variables.
>> define_insns and define_expands store their arguments to the array
>> on entry.
>> 
>> This would have pros and cons on its own, but having a single array
>> helps with future efforts to reduce the duplication between gen_*
>> functions.
>> 
>> Doing this tripped a warning in stormy16.md about writing beyond
>> the end of the array.  The negsi2 C++ code writes to operands[2]
>> even though the pattern has no operand 2.
>> 
>> gcc/
>>  * genemit.cc (gen_rtx_scratch, gen_exp): Use operands[%d] rather than
>>  operand%d.
>>  (start_gen_insn): Store the incoming arguments to an operands array.
>>  (gen_expand, gen_split): Remove copies into and out of the operands
>>  array.
>>  * config/stormy16/stormy16.md (negsi): Remove redundant assignment.
> So two questions.  Is there any meaningful performance impact expected 
> here using the array form rather than locals?   And does this impact how 
> folks write their C/C++ fragments in the expanders and such?

I don't think there should be any compile-time impact, and I can't
measure one when compiling fold-const.ii -O0 (my go-to test for this).

The md interface remains the same, in that all interaction is via the
operands[] array.  Any writes to the individual operandN variables
(where present) are ignored both before and after the patch.

However, I suppose this does make it possible to turn the operandN
arguments into constants, to prevent accidents.  I'll try that.
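
A rough sketch of what that could look like, with stand-in types rather than
the real rtx definitions and a made-up generator name:

```cpp
#include <cassert>

// Stand-ins for the real types, for illustration only.
struct fake_rtx_def { int code; };
using fake_rtx = fake_rtx_def *;

// Hypothetical generator: the incoming arguments are const, so the only
// writable view of the operands is the array.
fake_rtx gen_example(fake_rtx const operand0, fake_rtx const operand1) {
  fake_rtx operands[2] = { operand0, operand1 };

  // operand1 = operand0;      // would be rejected by the compiler
  operands[1] = operands[0];   // preparation code writes to the array

  // The generated part of the body then reads from the array only.
  return operands[1];
}
```

With the arguments declared const, an accidental write to operandN becomes a
compile-time error instead of a silently ignored store.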

Thanks,
Richard

Re: [patch, fortran] PR120049 - ICE when using IS_C_ASSOCIATED ()

2025-05-18 Thread Harald Anlauf


Hi Jerry,

I found 2 corner invalid cases which are silently accepted with
your patch when iso_c_binding is used indirectly:

  print *, c_associated(c_loc(val), C_NULL_FUNPTR)
  print *, c_associated(C_NULL_FUNPTR, c_loc(val))

These should get rejected, too.  Can you see how to catch these, too?

Thanks,
Harald

On 5/17/25 19:22, Jerry D wrote:

Hello all,

The attached patch revises the logic of the checks in 
gfc_check_c_associated to handle previous cases that ICE'ed as seen in 
the PR. There are multiple gotchas in these cases, particularly with the 
optional c_ptr_2 argument.


I factored the logic into two new helper functions. This helps to see 
what is happening and allows the c_ptr_1 checks to be performed 
separately in the event the c_ptr_2 checks succeed.


In  gfc_typename we did not handle the BT_VOID case which occurs in some 
of the error conditions.  I thought to possibly let it fall through to 
"UNKNOWN".  As it is with the patch I return "VOID".


I added a new test case.

I want to add Steve as Co-author as soon as I figure out how to do that 
with the git machinery.


Regression tested on x86_64.  OK for trunk and eventual backport to 15?

Regards,

Jerry

Author: Jerry DeLisle 
Date:   Sat May 17 09:45:14 2025 -0700

     Fortran: Fix c_associated argument checks.

   PR fortran/120049

     gcc/fortran/ChangeLog:

   * check.cc (gfc_check_c_associated): Use new helper functions.
     Only call check_c_ptr_1 if optional c_ptr_2 tests succeed.
     (check_c_ptr_1):  Handle only c_ptr_1 checks.
     (check_c_ptr_2): Expand checks for c_ptr_2 and handles cases
     where there is no derived pointer in the gfc_expr and check
     the inmod_sym_id only if it exists.
     * misc.cc (gfc_typename): Handle the case for BT_VOID rather
     than throw an internal error.

     gcc/testsuite/ChangeLog:

     * gfortran.dg/pr120049_2.f90: New test.

Re: AArch64: Enable early scheduling for -O3 and higher (PR118351)

2025-05-18 Thread Andrew Pinski

On Sun, May 18, 2025 at 2:09 PM Gerald Pfeifer  wrote:
>
> On Mon, 3 Mar 2025, Wilco Dijkstra wrote:
> > Enable the early scheduler on AArch64 for O3/Ofast.  This means GCC15
> > benefits from much faster build times with -O2, but avoids the
> > regressions in lbm which is very sensitive to minor scheduling changes
> > due to long FMA chains.  We can then revisit this for GCC16.
> >
> > gcc:
> > PR target/118351
> > * common/config/aarch64/aarch64-common.cc: Enable early scheduling 
> > with
> > -O3 and higher.
>
> Is this something you may want to add to the release notes?

It is there already:
The first scheduling pass (-fschedule-insns) is no longer enabled by
default at -O2 for AArch64 targets. The pass is still enabled by
default at -O3 and -Ofast.

Thanks,
Andrew Pinski

>
> Gerald

Re: [patch, fortran] PR120049 - ICE when using IS_C_ASSOCIATED ()

2025-05-18 Thread Jerry D


On 5/18/25 2:10 PM, Harald Anlauf wrote:

Hi Jerry,

I found 2 corner invalid cases which are silently accepted with
your patch when iso_c_binding is used indirectly:

   print *, c_associated(c_loc(val), C_NULL_FUNPTR)
   print *, c_associated(C_NULL_FUNPTR, c_loc(val))

These should get rejected, too.  Can you see how to catch these, too?

Thanks,
Harald


Yes, will do! I try to think of cases to run through on. This helps.

Thanks,

Jerry
--- snip ---

Re: [patch, fortran] PR120049 - ICE when using IS_C_ASSOCIATED ()

2025-05-18 Thread Jerry D


On 5/18/25 2:34 PM, Jerry D wrote:

On 5/18/25 2:10 PM, Harald Anlauf wrote:

Hi Jerry,

I found 2 corner invalid cases which are silently accepted with
your patch when iso_c_binding is used indirectly:

   print *, c_associated(c_loc(val), C_NULL_FUNPTR)
   print *, c_associated(C_NULL_FUNPTR, c_loc(val))

These should get rejected, too.  Can you see how to catch these, too?

Thanks,
Harald


Yes, will do! I try to think of cases to run through on. This helps.

Thanks,

Jerry
--- snip ---


Well, this was easy.  I added those two lines to my current test2.f90 and 
they are rejected.  I will update the testcase in the ready-to-commit copy.


OK to push then?

$ gfc test2.f90
test2.f90:46:36:

   46 |   print *, c_associated(c_loc(val), C_NULL_FUNPTR)
  |1
Error: Argument C_PTR_2 at (1) to C_ASSOCIATED shall have the same type 
as C_PTR_1: TYPE(c_ptr) instead of TYPE(c_funptr)

test2.f90:47:39:

   47 |   print *, c_associated(C_NULL_FUNPTR, c_loc(val))
  |   1
Error: Argument C_PTR_2 at (1) to C_ASSOCIATED shall have the same type 
as C_PTR_1: TYPE(c_funptr) instead of TYPE(c_ptr)
