[PATCH v2] LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP.

2023-10-12 Thread Lulu Cheng
There are two reasons for removing this macro definition:
1. The assembler's default is to pad with the nop instruction.
2. For the assembly directive .align [abs-expr[, abs-expr[, abs-expr]]],
   the third expression is the maximum number of bytes that may be
   skipped by this alignment directive.
   Specifying it therefore can prevent the requested alignment from
   actually taking effect, and can hurt run-time efficiency.

This modification relies on binutils commit
1fb3cdd87ec61715a5684925fb6d6a6cf53bb97c.
(Since the assembler inserts nops based on the .align information when
relaxing, a conditional branch could go out of range during assembly.
That binutils commit solves this problem.)

gcc/ChangeLog:

* config/loongarch/loongarch.h (ASM_OUTPUT_ALIGN_WITH_NOP):
Delete.

Co-authored-by: Chenghua Xu 
---
v1 -> v2:
   Modify description information

---
 gcc/config/loongarch/loongarch.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index d357e32e414..f700b3cb939 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -1061,11 +1061,6 @@ typedef struct {
 
 #define ASM_OUTPUT_ALIGN(STREAM, LOG) fprintf (STREAM, "\t.align\t%d\n", (LOG))
 
-/* "nop" instruction 54525952 (andi $r0,$r0,0) is
-   used for padding.  */
-#define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \
-  fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG))
-
 /* This is how to output an assembler line to advance the location
counter by SIZE bytes.  */
 
-- 
2.31.1



Re: [pushed][PATCH v2] LoongArch: Adjust makefile dependency for loongarch headers.

2023-10-12 Thread chenglulu

Pushed to r14-4584.

On 2023/10/11 5:59 PM, Yang Yujie wrote:

gcc/ChangeLog:

* config.gcc: Add loongarch-driver.h to tm_files.
* config/loongarch/loongarch.h: Do not include loongarch-driver.h.
* config/loongarch/t-loongarch: Append loongarch-multilib.h to $(GTM_H)
instead of $(TM_H) for building generator programs.
---
  gcc/config.gcc   | 4 ++--
  gcc/config/loongarch/loongarch.h | 3 ---
  gcc/config/loongarch/t-loongarch | 3 ++-
  3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index ee46d96bf62..60f63b6c7d4 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2524,7 +2524,7 @@ riscv*-*-freebsd*)
  
  loongarch*-*-linux*)

	tm_file="elfos.h gnu-user.h linux.h linux-android.h glibc-stdint.h ${tm_file}"
-	tm_file="${tm_file} loongarch/gnu-user.h loongarch/linux.h"
+	tm_file="${tm_file} loongarch/gnu-user.h loongarch/linux.h loongarch/loongarch-driver.h"
extra_options="${extra_options} linux-android.opt"
tmake_file="${tmake_file} loongarch/t-multilib loongarch/t-linux"
gnu_ld=yes
@@ -2537,7 +2537,7 @@ loongarch*-*-linux*)
  
  loongarch*-*-elf*)

tm_file="elfos.h newlib-stdint.h ${tm_file}"
-	tm_file="${tm_file} loongarch/elf.h loongarch/linux.h"
+	tm_file="${tm_file} loongarch/elf.h loongarch/linux.h loongarch/loongarch-driver.h"
tmake_file="${tmake_file} loongarch/t-multilib loongarch/t-linux"
gnu_ld=yes
gas=yes
diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index d357e32e414..19a18fb5f1b 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -49,9 +49,6 @@ along with GCC; see the file COPYING3.  If not see
  
  #define TARGET_LIBGCC_SDATA_SECTION ".sdata"
  
-/* Driver native functions for SPEC processing in the GCC driver.  */
-#include "loongarch-driver.h"
-
  /* This definition replaces the formerly used 'm' constraint with a
 different constraint letter in order to avoid changing semantics of
 the 'm' constraint when accepting new address formats in
diff --git a/gcc/config/loongarch/t-loongarch b/gcc/config/loongarch/t-loongarch
index 9b06fa84bcc..667a6bb3b50 100644
--- a/gcc/config/loongarch/t-loongarch
+++ b/gcc/config/loongarch/t-loongarch
@@ -16,7 +16,8 @@
  # along with GCC; see the file COPYING3.  If not see
  # .
  
-TM_H += loongarch-multilib.h $(srcdir)/config/loongarch/loongarch-driver.h
+
+GTM_H += loongarch-multilib.h
  OPTIONS_H_EXTRA += $(srcdir)/config/loongarch/loongarch-def.h \
   $(srcdir)/config/loongarch/loongarch-tune.h
  




Re: [pushed][PATCH v3 0/2] LoongArch: Update target-supports.exp for LoongArch SX/ASX.

2023-10-12 Thread chenglulu

Pushed to r14-4585.

On 2023/9/28 6:05 PM, Chenghui Pan wrote:

This is the update of:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631379.html

This version does not include code changes; it only fixes the commit title
format and appends the missing PR info.

Chenghui Pan (2):
   LoongArch: Enable vect.exp for LoongArch. [PR111424]
   LoongArch: Modify check_effective_target_vect_int_mod according to
 SX/ASX capabilities.

  gcc/testsuite/lib/target-supports.exp | 49 +++
  1 file changed, 49 insertions(+)





[PATCH] libstdc++: Fix tr1/8_c_compatibility/cstdio/functions.cc regression with recent glibc

2023-10-12 Thread Jakub Jelinek
Hi!

The following testcase started FAILing recently after the
https://sourceware.org/git/?p=glibc.git;a=commit;h=64b1a44183a3094672ed304532bedb9acc707554
glibc change which marked vfscanf with nonnull (1) attribute.
While vfwscanf hasn't been marked similarly (strangely), the patch changes
that too.  By using va_arg one hides the value of it from the compiler
(volatile keyword would do too, or making the FILE* stream a function
argument, but then it might need to be guarded by #if or something).

Tested on x86_64-linux, ok for trunk?

2023-10-12  Jakub Jelinek  

* testsuite/tr1/8_c_compatibility/cstdio/functions.cc (test01):
Initialize stream to va_arg(ap, FILE*) rather than 0.
* testsuite/tr1/8_c_compatibility/cwchar/functions.cc (test01):
Likewise.

--- libstdc++-v3/testsuite/tr1/8_c_compatibility/cstdio/functions.cc.jj	2023-01-16 23:19:06.651711546 +0100
+++ libstdc++-v3/testsuite/tr1/8_c_compatibility/cstdio/functions.cc	2023-10-12 09:46:28.695011763 +0200
@@ -35,7 +35,7 @@ void test01(int dummy, ...)
   char* s = 0;
   const char* cs = 0;
   const char* format = "%i";
-  FILE* stream = 0;
+  FILE* stream = va_arg(ap, FILE*);
   std::size_t n = 0;
 
   int ret;
--- libstdc++-v3/testsuite/tr1/8_c_compatibility/cwchar/functions.cc.jj	2023-01-16 23:19:06.651711546 +0100
+++ libstdc++-v3/testsuite/tr1/8_c_compatibility/cwchar/functions.cc	2023-10-12 09:46:19.236141897 +0200
@@ -42,7 +42,7 @@ void test01(int dummy, ...)
 #endif
 
 #if _GLIBCXX_HAVE_VFWSCANF
-  FILE* stream = 0;
+  FILE* stream = va_arg(arg, FILE*);
   const wchar_t* format1 = 0;
   int ret1;
   ret1 = std::tr1::vfwscanf(stream, format1, arg);

Jakub



Re: [PATCH 02/11] Handle epilogues that contain jumps

2023-10-12 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Aug 22, 2023 at 12:42 PM Szabolcs Nagy via Gcc-patches
>  wrote:
>>
>> From: Richard Sandiford 
>>
>> The prologue/epilogue pass allows the prologue sequence
>> to contain jumps.  The sequence is then partitioned into
>> basic blocks using find_many_sub_basic_blocks.
>>
>> This patch treats epilogues in the same way.  It's needed for
>> a follow-on aarch64 patch that adds conditional code to both
>> the prologue and the epilogue.
>>
>> Tested on aarch64-linux-gnu (including with a follow-on patch)
>> and x86_64-linux-gnu.  OK to install?
>>
>> Richard
>>
>> gcc/
>> * function.cc (thread_prologue_and_epilogue_insns): Handle
>> epilogues that contain jumps.
>> ---
>>
>> This is a previously approved patch that was not committed
>> because it was not needed at the time, but i'd like to commit
>> it as it is needed for the followup aarch64 eh_return changes:
>>
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605769.html
>>
>> ---
>>  gcc/function.cc | 10 ++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/gcc/function.cc b/gcc/function.cc
>> index dd2c1136e07..70d1cd65303 100644
>> --- a/gcc/function.cc
>> +++ b/gcc/function.cc
>> @@ -6120,6 +6120,11 @@ thread_prologue_and_epilogue_insns (void)
>>   && returnjump_p (BB_END (e->src)))
>> e->flags &= ~EDGE_FALLTHRU;
>> }
>> +
>> + auto_sbitmap blocks (last_basic_block_for_fn (cfun));
>> + bitmap_clear (blocks);
>> +   bitmap_set_bit (blocks, BLOCK_FOR_INSN (epilogue_seq)->index);
>> + find_many_sub_basic_blocks (blocks);
>> }
>>else if (next_active_insn (BB_END (exit_fallthru_edge->src)))
>> {
>> @@ -6218,6 +6223,11 @@ thread_prologue_and_epilogue_insns (void)
>>   set_insn_locations (seq, epilogue_location);
>>
>>   emit_insn_before (seq, insn);
>> +
>> + auto_sbitmap blocks (last_basic_block_for_fn (cfun));
>> + bitmap_clear (blocks);
>> + bitmap_set_bit (blocks, BLOCK_FOR_INSN (insn)->index);
>> + find_many_sub_basic_blocks (blocks);
>
> I'll note that clearing a full sbitmap to pass down a single basic block
> to find_many_sub_basic_blocks is a quite expensive operation.  May I suggest
> to add an overload operating on a single basic block?  It's only
>
>   FOR_EACH_BB_FN (bb, cfun)
> SET_STATE (bb,
>bitmap_bit_p (blocks, bb->index) ? BLOCK_TO_SPLIT :
> BLOCK_ORIGINAL);
>
> using the bitmap, so factoring the rest of the function and customizing this
> walk would do the trick.  Note that the whole function could be refactored to
> handle single blocks more efficiently.

Sorry for the late reply, but does this look OK?  Tested on
aarch64-linux-gnu and x86_64-linux-gnu.

Thanks,
Richard

---

The prologue/epilogue pass allows the prologue sequence to contain
jumps.  The sequence is then partitioned into basic blocks using
find_many_sub_basic_blocks.

This patch treats epilogues in a similar way.  Since only one block
might need to be split, the patch (re)introduces a find_sub_basic_blocks
routine to handle a single block.

The new routine hard-codes the assumption that split_block will chain
the new block immediately after the original block.  The routine doesn't
try to replicate the fix for PR81030, since that was specific to
gimple->rtl expansion.

The patch is needed for follow-on aarch64 patches that add conditional
code to the epilogue.  The tests are part of those patches.

gcc/
* cfgbuild.h (find_sub_basic_blocks): Declare.
* cfgbuild.cc (update_profile_for_new_sub_basic_block): New function,
split out from...
(find_many_sub_basic_blocks): ...here.
(find_sub_basic_blocks): New function.
* function.cc (thread_prologue_and_epilogue_insns): Handle
epilogues that contain jumps.
---
 gcc/cfgbuild.cc | 95 +
 gcc/cfgbuild.h  |  1 +
 gcc/function.cc |  4 +++
 3 files changed, 70 insertions(+), 30 deletions(-)

diff --git a/gcc/cfgbuild.cc b/gcc/cfgbuild.cc
index 15ed4deb5f7..9a6b34fb4b1 100644
--- a/gcc/cfgbuild.cc
+++ b/gcc/cfgbuild.cc
@@ -693,6 +693,43 @@ compute_outgoing_frequencies (basic_block b)
 }
 }
 
+/* Update the profile information for BB, which was created by splitting
+   an RTL block that had a non-final jump.  */
+
+static void
+update_profile_for_new_sub_basic_block (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  bool initialized_src = false, uninitialized_src = false;
+  bb->count = profile_count::zero ();
+  FOR_EACH_EDGE (e, ei, bb->preds)
+{
+  if (e->count ().initialized_p ())
+   {
+ bb->count += e->count ();
+ initialized_src = true;
+   }
+  else
+   uninitialized_src = true;
+}
+  /* When some edges are missing with read profile, this is
+ most likely because RTL expansion introduced loop.
+ When profile is guessed we may have BB t

[PATCH] tree-optimization/111764 - wrong reduction vectorization

2023-10-12 Thread Richard Biener
The following removes a misguided attempt to allow x + x in a reduction
path, which also allowed x * x, and that isn't valid.  x + x actually
never arrives this way; it is instead canonicalized to 2 * x.  This makes
reduction path handling consistent with how we handle the single-stmt
reduction case.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/111764
* tree-vect-loop.cc (check_reduction_path): Remove the attempt
to allow x + x via special-casing of assigns.

* gcc.dg/vect/pr111764.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr111764.c | 16 
 gcc/tree-vect-loop.cc| 15 +++
 2 files changed, 19 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr111764.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr111764.c b/gcc/testsuite/gcc.dg/vect/pr111764.c
new file mode 100644
index 000..f4e110f3bbf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr111764.c
@@ -0,0 +1,16 @@
+#include "tree-vect.h"
+
+short b = 2;
+
+int main()
+{
+  check_vect ();
+
+  for (int a = 1; a <= 9; a++)
+b = b * b;
+  if (b != 0)
+__builtin_abort ();
+
+  return 0;
+}
+
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 23c6e8259e7..82b793db74b 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3986,24 +3986,15 @@ pop:
 ???  We could relax this and handle arbitrary live stmts by
 forcing a scalar epilogue for example.  */
   imm_use_iterator imm_iter;
+  use_operand_p use_p;
   gimple *op_use_stmt;
   unsigned cnt = 0;
   FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op.ops[opi])
if (!is_gimple_debug (op_use_stmt)
&& (*code != ERROR_MARK
|| flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt
- {
-   /* We want to allow x + x but not x < 1 ? x : 2.  */
-   if (is_gimple_assign (op_use_stmt)
-   && gimple_assign_rhs_code (op_use_stmt) == COND_EXPR)
- {
-   use_operand_p use_p;
-   FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
- cnt++;
- }
-   else
- cnt++;
- }
+ FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
+   cnt++;
   if (cnt != 1)
{
  fail = true;
-- 
2.35.3


[PATCH] PR target/111778 - Fix undefined shifts in PowerPC compiler

2023-10-12 Thread Michael Meissner
I was building a cross compiler to PowerPC on my x86_64 workstation with the
latest version of GCC on October 11th.  I could not build the compiler on the
x86_64 system, as it died building libgcc.  I looked into it and discovered
the compiler was recursing until it ran out of stack space.  If I build a
native compiler with the same sources on a PowerPC system, it builds fine.

I traced this down to a change made around October 10th:

| commit 8f1a70a4fbcc6441c70da60d4ef6db1e5635e18a (HEAD)
| Author: Jiufu Guo 
| Date:   Tue Jan 10 20:52:33 2023 +0800
|
|   rs6000: build constant via li/lis;rldicl/rldicr
|
|   If a constant is possible left/right cleaned on a rotated value from
|   a negative value of "li/lis".  Then, using "li/lis ; rldicl/rldicr"
|   to build the constant.

The code was doing -1 << 64, which is undefined behavior because different
machines produce different results.  On the x86_64 system, (-1 << 64)
produces -1, while on a 64-bit PowerPC system it produces 0.  The x86_64
build then recurses until the stack runs out of space.

If I apply this patch, the compiler builds fine both on x86_64 as a PowerPC
cross compiler and on a native PowerPC system.

Can I check this into the master branch to fix the problem?

2023-10-12  Michael Meissner  

gcc/

PR target/111778
* config/rs6000/rs6000.cc (can_be_built_by_li_lis_and_rldicl): Protect
code from shifts that are undefined.
(can_be_built_by_li_lis_and_rldicr): Likewise.
(can_be_built_by_li_and_rldic): Protect code from shifts that
undefined.  Also replace uses of 1ULL with HOST_WIDE_INT_1U.

---
 gcc/config/rs6000/rs6000.cc | 29 ++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 2828f01413c..cc24dd5301e 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10370,6 +10370,11 @@ can_be_built_by_li_lis_and_rldicl (HOST_WIDE_INT c, int *shift,
   /* Leading zeros may be cleaned by rldicl with a mask.  Change leading zeros
  to ones and then recheck it.  */
   int lz = clz_hwi (c);
+
+  /* If lz == 0, the left shift is undefined.  */
+  if (!lz)
+return false;
+
   HOST_WIDE_INT unmask_c
 = c | (HOST_WIDE_INT_M1U << (HOST_BITS_PER_WIDE_INT - lz));
   int n;
@@ -10398,6 +10403,11 @@ can_be_built_by_li_lis_and_rldicr (HOST_WIDE_INT c, int *shift,
   /* Tailing zeros may be cleaned by rldicr with a mask.  Change tailing zeros
  to ones and then recheck it.  */
   int tz = ctz_hwi (c);
+
+  /* If tz == HOST_BITS_PER_WIDE_INT, the left shift is undefined.  */
+  if (tz >= HOST_BITS_PER_WIDE_INT)
+return false;
+
   HOST_WIDE_INT unmask_c = c | ((HOST_WIDE_INT_1U << tz) - 1);
   int n;
   if (can_be_rotated_to_lowbits (~unmask_c, 15, &n)
@@ -10428,8 +10438,15 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int *shift, HOST_WIDE_INT *mask)
  right bits are shifted as 0's, and left 1's(and x's) are cleaned.  */
   int tz = ctz_hwi (c);
   int lz = clz_hwi (c);
+
+  /* If lz == HOST_BITS_PER_WIDE_INT, the left shift is undefined.  */
+  if (lz >= HOST_BITS_PER_WIDE_INT)
+return false;
+
   int middle_ones = clz_hwi (~(c << lz));
-  if (tz + lz + middle_ones >= ones)
+  if (tz + lz + middle_ones >= ones
+  && (tz - lz) < HOST_BITS_PER_WIDE_INT
+  && tz < HOST_BITS_PER_WIDE_INT)
 {
   *mask = ((1LL << (HOST_BITS_PER_WIDE_INT - tz - lz)) - 1LL) << tz;
   *shift = tz;
@@ -10440,7 +10457,8 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int *shift, HOST_WIDE_INT *mask)
   int leading_ones = clz_hwi (~c);
   int tailing_ones = ctz_hwi (~c);
   int middle_zeros = ctz_hwi (c >> tailing_ones);
-  if (leading_ones + tailing_ones + middle_zeros >= ones)
+  if (leading_ones + tailing_ones + middle_zeros >= ones
+  && middle_zeros < HOST_BITS_PER_WIDE_INT)
 {
   *mask = ~(((1ULL << middle_zeros) - 1ULL) << tailing_ones);
   *shift = tailing_ones + middle_zeros;
@@ -10450,10 +10468,15 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int *shift, HOST_WIDE_INT *mask)
   /* xx1..1xx: --> xx0..01..1xx: some 1's(following x's) are cleaned. */
   /* Get the position for the first bit of successive 1.
  The 24th bit would be in successive 0 or 1.  */
-  HOST_WIDE_INT low_mask = (1LL << 24) - 1LL;
+  HOST_WIDE_INT low_mask = (HOST_WIDE_INT_1U << 24) - HOST_WIDE_INT_1U;
   int pos_first_1 = ((c & (low_mask + 1)) == 0)
  ? clz_hwi (c & low_mask)
  : HOST_BITS_PER_WIDE_INT - ctz_hwi (~(c | low_mask));
+
+  /* Make sure the left and right shifts are defined.  */
+  if (!IN_RANGE (pos_first_1, 1, HOST_BITS_PER_WIDE_INT-1))
+return false;
+
   middle_ones = clz_hwi (~c << pos_first_1);
   middle_zeros = ctz_hwi (c >> (HOST_BITS_PER_WIDE_INT - pos_first_1));
   if (pos_first_1 < HOST_BITS_PER_WIDE_INT
-- 
2.41.0


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432

[PATCH v8] tree-ssa-sink: Improve code sinking pass

2023-10-12 Thread Ajit Agarwal
This patch improves the code sinking pass to sink statements before a call
to reduce register pressure.  Review comments are incorporated.  Synced
with and modified against the latest trunk sources.

For example :

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;
  l = a + b + c + d +e + f;
  if (a != 5)
{
  bar();
  j = l;
}
}

Code Sinking does the following:

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;

  if (a != 5)
{
  l = a + b + c + d +e + f;
  bar();
  j = l;
}
}

Bootstrapped regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit

tree-ssa-sink: Improve code sinking pass

Currently, code sinking will sink code after function calls.  This increases
register pressure for callee-saved registers.  The following patch improves
code sinking by placing the sunk code before calls in the use block or in
the immediate dominator of the use blocks.

2023-10-12  Ajit Kumar Agarwal  

gcc/ChangeLog:

PR tree-optimization/81953
* tree-ssa-sink.cc (statement_sink_location): Move statements before
calls.
(select_best_block): Add heuristics to select the best blocks in the
immediate post dominator.

gcc/testsuite/ChangeLog:

PR tree-optimization/81953
* gcc.dg/tree-ssa/ssa-sink-20.c: New test.
* gcc.dg/tree-ssa/ssa-sink-21.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 ++
 gcc/tree-ssa-sink.cc| 39 -
 3 files changed, 56 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
new file mode 100644
index 000..d3b79ca5803
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  j = l;
+}
+}
/* { dg-final { scan-tree-dump {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
new file mode 100644
index 000..84e7938c54f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j, x;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  if (b != 3)
+x = 3;
+  else
+x = 5;
+  j = l;
+}
+}
/* { dg-final { scan-tree-dump {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index a360c5cdd6e..95298bc8402 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -174,7 +174,8 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
 
 /* Given EARLY_BB and LATE_BB, two blocks in a path through the dominator
tree, return the best basic block between them (inclusive) to place
-   statements.
+   statements. The best basic block should be an immediate dominator of
+   best basic block if the use stmt is after the call.
 
We want the most control dependent block in the shallowest loop nest.
 
@@ -196,6 +197,16 @@ select_best_block (basic_block early_bb,
   basic_block best_bb = late_bb;
   basic_block temp_bb = late_bb;
   int threshold;
+  /* Get the sinking threshold.  If the statement to be moved has memory
+ operands, then increase the threshold by 7% as those are even more
+ profitable to avoid, clamping at 100%.  */
+  threshold = param_sink_frequency_threshold;
+  if (gimple_vuse (stmt) || gimple_vdef (stmt))
+{
+  threshold += 7;
+  if (threshold > 100)
+   threshold = 100;
+}
 
   while (temp_bb != early_bb)
 {
@@ -204,6 +215,14 @@ select_best_block (basic_block early_bb,
   if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
best_bb = temp_bb;
 
+  /* if we have temp_bb post dominated by use block block then immediate
+   * dominator would be our best block.  */
+  if (!gimple_vuse (stmt)
+ && bb_loop_depth (temp_bb) == bb_loop_depth (early_bb)
+ && !(temp_bb->count * 100 >= early_bb->count * threshold)
+ && dominated_by_p (CDI_DOMINATORS, late_bb, temp_bb))
+   best_bb = temp_bb;
+
   /* Walk up the dominator tree, hopefully we'll find a shallower
 loop nest.  */
   temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
@@ -233,17 +252,6 @@ select_best_block (basic_block early_bb,
   && !dominated_by_p (CDI_DOMINATORS, b

[PATCH v6] c++: Check for indirect change of active union member in constexpr [PR101631,PR102286]

2023-10-12 Thread Nathaniel Shead
On Wed, Oct 11, 2023 at 12:48:12AM +1100, Nathaniel Shead wrote:
> On Mon, Oct 09, 2023 at 04:46:46PM -0400, Jason Merrill wrote:
> > On 10/8/23 21:03, Nathaniel Shead wrote:
> > > Ping for 
> > > https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631203.html
> > > 
> > > +   && (TREE_CODE (t) == MODIFY_EXPR
> > > +   /* Also check if initializations have implicit change of active
> > > +  member earlier up the access chain.  */
> > > +   || !refs->is_empty())
> > 
> > I'm not sure what the cumulative point of these two tests is.  TREE_CODE (t)
> > will be either MODIFY_EXPR or INIT_EXPR, and either should be OK.
> > 
> > As I understand it, the problematic case is something like
> > constexpr-union2.C, where we're also looking at a MODIFY_EXPR.  So what is
> > this check doing?
> 
> The reasoning was to correctly handle cases like the following (in
> constexpr-union6.C):
> 
>   constexpr int test1() {
> U u {};
> std::construct_at(&u.s, S{ 1, 2 });
> return u.s.b;
>   }
>   static_assert(test1() == 2);
> 
> The initialisation of &u.s here is not a member access expression within
> the call to std::construct_at, since it's just a pointer, but this code
> is still legal; in general, an INIT_EXPR to initialise a union member
> should always be OK (I believe?), hence constraining to just
> MODIFY_EXPR.
> 
> However, just that would then (incorrectly) allow all the following
> cases in that test to compile, such as
> 
>   constexpr int test2() {
> U u {};
> int* p = &u.s.b;
> std::construct_at(p, 5);
> return u.s.b;
>   }
>   constexpr int x2 = test2();
> 
> since the INIT_EXPR is really only initialising 'b', but the implicit
> "modification" of active member to 'u.s' is illegal.
> 
> Maybe a better way of expressing this condition would be something like
> this?
> 
>   /* An INIT_EXPR of the last member in an access chain is always OK,
>  but still check implicit change of members earlier on; see 
>  cpp2a/constexpr-union6.C.  */
>   && !(TREE_CODE (t) == INIT_EXPR && refs->is_empty ())
> 
> Otherwise I'll see if I can rework some of the other conditions instead.
> 
> > Incidentally, I think constexpr-union6.C could use a test where we pass &u.s
> > to a function other than construct_at, and then try (and fail) to assign to
> > the b member from that function.
> > 
> > Jason
> > 
> 
> Sounds good; I've added the following test:
> 
>   constexpr void foo(S* s) {
> s->b = 10;  // { dg-error "accessing .U::s. member instead of initialized 
> .U::k." }
>   }
>   constexpr int test3() {
> U u {};
> foo(&u.s);  // { dg-message "in .constexpr. expansion" }
> return u.s.b;
>   }
>   constexpr int x3 = test3();  // { dg-message "in .constexpr. expansion" }
> 
> Incidentally I found this particular example caused a very unhelpful
> error + ICE due to reporting that S could not be value-initialized in
> the current version of the patch. The updated version below fixes that
> by using 'build_zero_init' instead -- is this an appropriate choice
> here?
> 
> A similar (but unrelated) issue is with e.g.
>   
>   struct S { const int a; int b; };
>   union U { int k; S s; };
> 
>   constexpr int test() {
> U u {};
> return u.s.b;
>   }
>   constexpr int x = test();
> 
> giving me this pretty unhelpful error message:
> 
> /home/ns/main.cpp:8:23:   in ‘constexpr’ expansion of ‘test()’
> /home/ns/main.cpp:6:12: error: use of deleted function ‘S::S()’
> 6 |   return u.s.b;
>   |  ~~^
> /home/ns/main.cpp:1:8: note: ‘S::S()’ is implicitly deleted because the 
> default definition would be ill-formed:
> 1 | struct S { const int a; int b; };
>   |^
> /home/ns/main.cpp:1:8: error: uninitialised const member in ‘struct S’
> /home/ns/main.cpp:1:22: note: ‘const int S::a’ should be initialised
> 1 | struct S { const int a; int b; };
>   |  ^
> /home/ns/main.cpp:8:23:   in ‘constexpr’ expansion of ‘test()’
> /home/ns/main.cpp:6:12: error: use of deleted function ‘S::S()’
> 6 |   return u.s.b;
>   |  ~~^
> /home/ns/main.cpp:8:23:   in ‘constexpr’ expansion of ‘test()’
> /home/ns/main.cpp:6:12: error: use of deleted function ‘S::S()’
> 
> but I'll try and fix this separately (it exists on current trunk without
> this patch as well).

While attempting to fix this I found a much better way of handling
value-initialised unions. Here's a new version of the patch which also
includes the fix for accessing the wrong member of a value-initialised
union as well.

Additionally includes an `auto_diagnostic_group` for the `inform`
diagnostics as Marek helpfully informed me about in my other patch.

Bootstrapped and regtested on x86_64-pc-linux-gnu.

-- >8 --

This patch adds checks for attempting to change the active member of a
union by methods other than a member access expression.

To be able to properly distinguish `*(&u.a) = ` from `u.a = `, this
patch redoes the solution for c++/59950 t

[PATCH v1] RISC-V: Support FP lround/lroundf auto vectorization

2023-10-12 Thread pan2 . li
From: Pan Li 

This patch would like to support the FP lround/lroundf auto vectorization.

* long lround (double) for rv64
* long lroundf (float) for rv32

Due to the limitation that only data types of the same size are allowed
in the vectorizer, the standard name lroundmn2 only acts on DF => DI for
rv64 and SF => SI for rv32.

Given we have code like:

void
test_lround (long *out, double *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
out[i] = __builtin_lround (in[i]);
}

Before this patch:
.L3:
  ...
  fld  fa5,0(a1)
  fcvt.l.d a5,fa5,rmm
  sd   a5,-8(a0)
  ...
  bne  a1,a4,.L3

After this patch:
  frrm a6
  ...
  fsrmi4 // RMM
.L3:
  ...
  vsetvli a3,zero,e64,m1,ta,ma
  vfcvt.x.f.v v1,v1
  vsetvli zero,a2,e64,m1,ta,ma
  vse32.v v1,0(a0)
  ...
  bne a2,zero,.L3
  ...
  fsrm a6

The rest part like SF => DI/HF => DI/DF => SI/HF => SI will be covered
by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.

gcc/ChangeLog:

* config/riscv/autovec.md (lround2): New
pattern for lround/lroundf.
* config/riscv/riscv-protos.h (enum insn_type): New enum value.
(expand_vec_lround): New func decl for expanding lround.
* config/riscv/riscv-v.cc (expand_vec_lround): New func impl
for expanding lround.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-lround-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lround-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lround-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lround-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lround-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lround-1.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec.md   | 10 +++
 gcc/config/riscv/riscv-protos.h   |  2 +
 gcc/config/riscv/riscv-v.cc   | 10 +++
 .../riscv/rvv/autovec/unop/math-lround-0.c| 19 +
 .../riscv/rvv/autovec/unop/math-lround-1.c| 19 +
 .../rvv/autovec/unop/math-lround-run-0.c  | 72 +++
 .../rvv/autovec/unop/math-lround-run-1.c  | 72 +++
 .../riscv/rvv/autovec/vls/math-lround-0.c | 30 
 .../riscv/rvv/autovec/vls/math-lround-1.c | 30 
 9 files changed, 264 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lround-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lround-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lround-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lround-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lround-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lround-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index ebc51ea69fd..33b11723c21 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2321,3 +2321,13 @@ (define_expand "lrint2"
 DONE;
   }
 )
+
+(define_expand "lround2"
+  [(match_operand:0 "register_operand")
+   (match_operand:V_VLS_FCONVERT_I_L_LL 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  {
    riscv_vector::expand_vec_lround (operands[0], operands[1], mode, mode);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 8c9f7e0ab11..b7eeeb8f55d 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -302,6 +302,7 @@ enum insn_type : unsigned int
   UNARY_OP_TAMA = __MASK_OP_TAMA | UNARY_OP_P,
   UNARY_OP_TAMU = __MASK_OP_TAMU | UNARY_OP_P,
   UNARY_OP_FRM_DYN = UNARY_OP | FRM_DYN_P,
+  UNARY_OP_FRM_RMM = UNARY_OP | FRM_RMM_P,
   UNARY_OP_TAMU_FRM_DYN = UNARY_OP_TAMU | FRM_DYN_P,
   UNARY_OP_TAMU_FRM_RUP = UNARY_OP_TAMU | FRM_RUP_P,
   UNARY_OP_TAMU_FRM_RDN = UNARY_OP_TAMU | FRM_RDN_P,
@@ -475,6 +476,7 @@ void expand_vec_round (rtx, rtx, machine_mode, machine_mode);
 void expand_vec_trunc (rtx, rtx, machine_mode, machine_mode);
 void expand_vec_roundeven (rtx, rtx, machine_mode, machine_mode);
 void expand_vec_lrint (rtx, rtx, machine_mode, machine_mode);
+void expand_vec_lround (rtx, rtx, machine_mode, machine_mode);
 #endif
 bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
  bool, void (*)(rtx *, rtx));
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index a75eb59eb43..b61c745678b 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4122,4 +4122,14 @@ expand_vec_lrint (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
   emit_vec_cvt_x_f (op_0, op_1, UNARY_OP_FRM_DYN, vec_fp_mode);
 }
 
+void
+expand_vec_lround (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
+  machine_mode vec_long_mode)
+{
+  gcc_assert (known_eq (GET_MODE_SIZE (vec_fp

[PATCH] tree-optimization/111773 - avoid CD-DCE of noreturn special calls

2023-10-12 Thread Richard Biener
The support to elide calls to allocation functions in DCE runs into
the issue that when implementations are discovered noreturn we end
up DCEing the calls anyway, leaving blocks without termination and
without outgoing edges which is both invalid IL and wrong-code when
as in the example the noreturn call would throw.  The following
avoids taking advantage of both noreturn and the ability to elide
allocation at the same time.

For the testcase it's valid to throw or return 10 by eliding the
allocation.  But we have to do either where currently we'd run
off the function.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Honza, any objections here?

Thanks,
Richard.

PR tree-optimization/111773
* tree-ssa-dce.cc (mark_stmt_if_obviously_necessary): Do
not elide noreturn calls that are reflected to the IL.

* g++.dg/torture/pr111773.C: New testcase.
---
 gcc/testsuite/g++.dg/torture/pr111773.C | 31 +
 gcc/tree-ssa-dce.cc |  8 +++
 2 files changed, 39 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr111773.C

diff --git a/gcc/testsuite/g++.dg/torture/pr111773.C 
b/gcc/testsuite/g++.dg/torture/pr111773.C
new file mode 100644
index 000..af8c687252c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr111773.C
@@ -0,0 +1,31 @@
+// { dg-do run }
+
+#include <new>
+
+void* operator new(std::size_t sz)
+{
+  throw std::bad_alloc{};
+}
+
+int __attribute__((noipa)) foo ()
+{
+  int* p1 = static_cast<int*>(::operator new(sizeof(int)));
+  return 10;
+}
+
+int main()
+{
+  int res;
+  try
+{
+  res = foo ();
+}
+  catch (...)
+{
+  return 0;
+}
+
+  if (res != 10)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
index f0b02456132..bbdf9312c9f 100644
--- a/gcc/tree-ssa-dce.cc
+++ b/gcc/tree-ssa-dce.cc
@@ -221,6 +221,14 @@ mark_stmt_if_obviously_necessary (gimple *stmt, bool aggressive)
 
 case GIMPLE_CALL:
   {
+   /* Never elide a noreturn call we pruned control-flow for.  */
+   if ((gimple_call_flags (stmt) & ECF_NORETURN)
+   && gimple_call_ctrl_altering_p (stmt))
+ {
+   mark_stmt_necessary (stmt, true);
+   return;
+ }
+
tree callee = gimple_call_fndecl (stmt);
if (callee != NULL_TREE
&& fndecl_built_in_p (callee, BUILT_IN_NORMAL))
-- 
2.35.3


Re: [PATCH v1] RISC-V: Support FP lround/lroundf auto vectorization

2023-10-12 Thread juzhe.zh...@rivai.ai
OK




juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-12 16:59
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP lround/lroundf auto vectorization
From: Pan Li 
 
This patch would like to support the FP lround/lroundf auto vectorization.
 
* long lround (double) for rv64
* long lroundf (float) for rv32
 
Due to the limitation that only data types of the same size are allowed
in the vectorizer, the standard name lroundmn2 only acts on DF => DI for
rv64 and SF => SI for rv32.
 
Given we have code like:
 
void
test_lround (long *out, double *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
out[i] = __builtin_lround (in[i]);
}
 
Before this patch:
.L3:
  ...
  fld  fa5,0(a1)
  fcvt.l.d a5,fa5,rmm
  sd   a5,-8(a0)
  ...
  bne  a1,a4,.L3
 
After this patch:
  frrm a6
  ...
  fsrmi4 // RMM
.L3:
  ...
  vsetvli a3,zero,e64,m1,ta,ma
  vfcvt.x.f.v v1,v1
  vsetvli zero,a2,e64,m1,ta,ma
  vse32.v v1,0(a0)
  ...
  bne a2,zero,.L3
  ...
  fsrm a6
 
The remaining cases, like SF => DI, HF => DI, DF => SI and HF => SI, will be
covered by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (lround2): New
pattern for lround/lroundf.
* config/riscv/riscv-protos.h (enum insn_type): New enum value.
(expand_vec_lround): New func decl for expanding lround.
* config/riscv/riscv-v.cc (expand_vec_lround): New func impl
for expanding lround.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-lround-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lround-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lround-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lround-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lround-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lround-1.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   | 10 +++
gcc/config/riscv/riscv-protos.h   |  2 +
gcc/config/riscv/riscv-v.cc   | 10 +++
.../riscv/rvv/autovec/unop/math-lround-0.c| 19 +
.../riscv/rvv/autovec/unop/math-lround-1.c| 19 +
.../rvv/autovec/unop/math-lround-run-0.c  | 72 +++
.../rvv/autovec/unop/math-lround-run-1.c  | 72 +++
.../riscv/rvv/autovec/vls/math-lround-0.c | 30 
.../riscv/rvv/autovec/vls/math-lround-1.c | 30 
9 files changed, 264 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lround-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lround-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lround-run-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lround-run-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lround-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lround-1.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index ebc51ea69fd..33b11723c21 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2321,3 +2321,13 @@ (define_expand "lrint2"
 DONE;
   }
)
+
+(define_expand "lround2"
+  [(match_operand:0 "register_operand")
+   (match_operand:V_VLS_FCONVERT_I_L_LL 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  {
+    riscv_vector::expand_vec_lround (operands[0], operands[1], mode, mode);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 8c9f7e0ab11..b7eeeb8f55d 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -302,6 +302,7 @@ enum insn_type : unsigned int
   UNARY_OP_TAMA = __MASK_OP_TAMA | UNARY_OP_P,
   UNARY_OP_TAMU = __MASK_OP_TAMU | UNARY_OP_P,
   UNARY_OP_FRM_DYN = UNARY_OP | FRM_DYN_P,
+  UNARY_OP_FRM_RMM = UNARY_OP | FRM_RMM_P,
   UNARY_OP_TAMU_FRM_DYN = UNARY_OP_TAMU | FRM_DYN_P,
   UNARY_OP_TAMU_FRM_RUP = UNARY_OP_TAMU | FRM_RUP_P,
   UNARY_OP_TAMU_FRM_RDN = UNARY_OP_TAMU | FRM_RDN_P,
@@ -475,6 +476,7 @@ void expand_vec_round (rtx, rtx, machine_mode, machine_mode);
void expand_vec_trunc (rtx, rtx, machine_mode, machine_mode);
void expand_vec_roundeven (rtx, rtx, machine_mode, machine_mode);
void expand_vec_lrint (rtx, rtx, machine_mode, machine_mode);
+void expand_vec_lround (rtx, rtx, machine_mode, machine_mode);
#endif
bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
  bool, void (*)(rtx *, rtx));
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index a75eb59eb43..b61c745678b 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4122,4 +4122,14 @@ expand_vec_lrint (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
   emit_vec_cvt_x_f (op_0, op_1, UNARY_OP_FRM_DYN, vec_fp_mode);
}
+void
+expand_vec_lround (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
+machine_mode v

[PATCH] reg-notes.def: Fix up description of REG_NOALIAS

2023-10-12 Thread Alex Coplan
Hi,

The description of the REG_NOALIAS note in reg-notes.def isn't quite
right. It describes it as being attached to call insns, but it is
instead attached to a move insn receiving the return value from a call.

This can be seen by looking at the code in calls.cc:expand_call which
attaches the note:

  emit_move_insn (temp, valreg);

  /* The return value from a malloc-like function cannot alias
 anything else.  */
  last = get_last_insn ();
  add_reg_note (last, REG_NOALIAS, temp);

Bootstrapped on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

* reg-notes.def (NOALIAS): Correct comment.
diff --git a/gcc/reg-notes.def b/gcc/reg-notes.def
index 1f74a605b3e..5cbe35dfe36 100644
--- a/gcc/reg-notes.def
+++ b/gcc/reg-notes.def
@@ -96,8 +96,9 @@ REG_NOTE (DEP_CONTROL)
to extract the actual value.  */
 REG_NOTE (BR_PROB)
 
-/* Attached to a call insn; indicates that the call is malloc-like and
-   that the pointer returned cannot alias anything else.  */
+/* Attached to a move insn which receives the result of a call; indicates that
+   the call is malloc-like and that the pointer returned cannot alias anything
+   else.  */
 REG_NOTE (NOALIAS)
 
 /* REG_BR_PRED is attached to JUMP_INSNs.  It contains


Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread Richard Biener
On Thu, 12 Oct 2023, 钟居哲 wrote:

> Thanks Richi for pointing it out.
> 
> I found this patch can't make conditional gather load succeed on SLP.
> 
> I am considering changing MASK_LEN_GATHER_LOAD in pattern recognition:
> 
> If no condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD 
> (ptr, offset, scale, 0) -> 4 arguments same as GATHER_LOAD.
> In this situation, MASK_LEN_GATHER_LOAD can reuse the GATHER_LOAD SLP flow 
> naturally.
> 
> If has condition mask, in tree-vect-patterns.cc,  I build 
> MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments same 
> as MASK_GATHER_LOAD.
> In this situation, MASK_LEN_GATHER_LOAD can reuse the MASK_GATHER_LOAD SLP 
> flow naturally.
> 
> Is it reasonable ?

What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
even when the mask is -1?

> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-11 20:50
> To: Juzhe-Zhong
> CC: gcc-patches; richard.sandiford
> Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
>  
> > This patch fixes this following FAILs in RISC-V regression:
> > 
> > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump 
> > vect "Loop contains only SLP stmts"
> > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only 
> > SLP stmts"
> > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump 
> > vect "Loop contains only SLP stmts"
> > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only 
> > SLP stmts"
> > 
> > The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
> > 
> > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in 
> > tree-vect-patterns.cc if it is same
> > situation as GATHER_LOAD (no conditional mask).
> > 
> > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask 
> > argument is a dummy mask.
> > 
> > gcc/ChangeLog:
> > 
> > * tree-vect-slp.cc (vect_get_operand_map):
> > (vect_build_slp_tree_1):
> > (vect_build_slp_tree_2):
> > * tree-vect-stmts.cc (vectorizable_load):
> > 
> > ---
> >  gcc/tree-vect-slp.cc   | 18 --
> >  gcc/tree-vect-stmts.cc |  4 ++--
> >  2 files changed, 18 insertions(+), 4 deletions(-)
> > 
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index fa098f9ff4e..712c04ec278 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
> >case IFN_MASK_GATHER_LOAD:
> >  return arg1_arg4_map;
> >  
> > +   case IFN_MASK_LEN_GATHER_LOAD:
> > + /* In tree-vect-patterns.cc, we will have these 2 situations:
> > +
> > + - Unconditional gather load transforms
> > +   into MASK_LEN_GATHER_LOAD with dummy mask which is -1.
> > +
> > + - Conditional gather load transforms
> > +   into MASK_LEN_GATHER_LOAD with real conditional mask.*/
> > + return integer_minus_onep (gimple_call_arg (call, 4)) ? arg1_map
> > +   : nullptr;
> > +
> >case IFN_MASK_STORE:
> >  return arg3_arg2_map;
> >  
> > @@ -1077,7 +1088,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> >  
> >if (cfn == CFN_MASK_LOAD
> >|| cfn == CFN_GATHER_LOAD
> > -   || cfn == CFN_MASK_GATHER_LOAD)
> > +   || cfn == CFN_MASK_GATHER_LOAD
> > +   || cfn == CFN_MASK_LEN_GATHER_LOAD)
> >  ldst_p = true;
> >else if (cfn == CFN_MASK_STORE)
> >  {
> > @@ -1337,6 +1349,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> >if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
> >&& rhs_code != CFN_GATHER_LOAD
> >&& rhs_code != CFN_MASK_GATHER_LOAD
> > +   && rhs_code != CFN_MASK_LEN_GATHER_LOAD
> >/* Not grouped loads are handled as externals for BB
> >  vectorization.  For loop vectorization we can handle
> >  splats the same we handle single element interleaving.  */
> > @@ -1837,7 +1850,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> >if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
> >  gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
> >  || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
> > - || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
> > + || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
> > + || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
> >else
> >  {
> >*max_nunits = this_max_nunits;
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index cd7c1090d88..263acf5d3cd 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -9575,9 +9575,9 @@ vectorizable_load (vec_info *vinfo,
> >  return false;
> >  
> >mask_index = internal_fn_mask_index (ifn);
> > -  if (mask_index >= 0 && slp_node)
> > +  if (mask_index >= 0 && slp_node && internal_fn_len_index (ifn) < 0)
> >  mask_index = vect_slp_child_index_for_operand (call, m

Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread juzhe.zh...@rivai.ai
In tree-vect-slp.cc:
vect_get_and_check_slp_defs
711: 

  tree type = TREE_TYPE (oprnd);
  dt = dts[i];
  if ((dt == vect_constant_def
   || dt == vect_external_def)
  && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
  && (TREE_CODE (type) == BOOLEAN_TYPE
  || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
  type)))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "Build SLP failed: invalid type of def "
 "for variable-length SLP %T\n", oprnd);
  return -1;
}

Here the mask = -1 has BOOLEAN type from tree-vect-patterns.cc; it reaches this
condition, so SLP fails:
Build SLP failed: invalid type of def




juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-10-12 17:44
To: 钟居哲
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
On Thu, 12 Oct 2023, 钟居哲 wrote:
 
> Thanks Richi for pointing it out.
> 
> I found this patch can't make conditional gather load succeed on SLP.
> 
> I am considering changing MASK_LEN_GATHER_LOAD in pattern recognition:
> 
> If no condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD 
> (ptr, offset, scale, 0) -> 4 arguments same as GATHER_LOAD.
> In this situation, MASK_LEN_GATHER_LOAD can reuse the GATHER_LOAD SLP flow 
> naturally.
> 
> If has condition mask, in tree-vect-patterns.cc,  I build 
> MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments same 
> as MASK_GATHER_LOAD.
> In this situation, MASK_LEN_GATHER_LOAD can reuse the MASK_GATHER_LOAD SLP 
> flow naturally.
> 
> Is it reasonable ?
 
What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
even when the mask is -1?
 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-11 20:50
> To: Juzhe-Zhong
> CC: gcc-patches; richard.sandiford
> Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
>  
> > This patch fixes this following FAILs in RISC-V regression:
> > 
> > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump 
> > vect "Loop contains only SLP stmts"
> > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only 
> > SLP stmts"
> > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump 
> > vect "Loop contains only SLP stmts"
> > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only 
> > SLP stmts"
> > 
> > The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
> > 
> > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in 
> > tree-vect-patterns.cc if it is same
> > situation as GATHER_LOAD (no conditional mask).
> > 
> > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask 
> > argument is a dummy mask.
> > 
> > gcc/ChangeLog:
> > 
> > * tree-vect-slp.cc (vect_get_operand_map):
> > (vect_build_slp_tree_1):
> > (vect_build_slp_tree_2):
> > * tree-vect-stmts.cc (vectorizable_load):
> > 
> > ---
> >  gcc/tree-vect-slp.cc   | 18 --
> >  gcc/tree-vect-stmts.cc |  4 ++--
> >  2 files changed, 18 insertions(+), 4 deletions(-)
> > 
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index fa098f9ff4e..712c04ec278 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
> >case IFN_MASK_GATHER_LOAD:
> >  return arg1_arg4_map;
> >  
> > +   case IFN_MASK_LEN_GATHER_LOAD:
> > + /* In tree-vect-patterns.cc, we will have these 2 situations:
> > +
> > + - Unconditional gather load transforms
> > +   into MASK_LEN_GATHER_LOAD with dummy mask which is -1.
> > +
> > + - Conditional gather load transforms
> > +   into MASK_LEN_GATHER_LOAD with real conditional mask.*/
> > + return integer_minus_onep (gimple_call_arg (call, 4)) ? arg1_map
> > +   : nullptr;
> > +
> >case IFN_MASK_STORE:
> >  return arg3_arg2_map;
> >  
> > @@ -1077,7 +1088,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> >  
> >if (cfn == CFN_MASK_LOAD
> >|| cfn == CFN_GATHER_LOAD
> > -   || cfn == CFN_MASK_GATHER_LOAD)
> > +   || cfn == CFN_MASK_GATHER_LOAD
> > +   || cfn == CFN_MASK_LEN_GATHER_LOAD)
> >  ldst_p = true;
> >else if (cfn == CFN_MASK_STORE)
> >  {
> > @@ -1337,6 +1349,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> >if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
> >&& rhs_code != CFN_GATHER_LOAD
> >&& rhs_code != CFN_MASK_GATHER_LOAD
> > +   && rhs_code != CFN_MASK_LEN_GATHER_LOAD
> >/* Not grouped loads are handled as externals for BB
> >  vectorization.  For loop vectorizat

Re: PR111648: Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding

2023-10-12 Thread Prathamesh Kulkarni
On Wed, 11 Oct 2023 at 16:57, Prathamesh Kulkarni
 wrote:
>
> On Wed, 11 Oct 2023 at 16:42, Prathamesh Kulkarni
>  wrote:
> >
> > On Mon, 9 Oct 2023 at 17:05, Richard Sandiford
> >  wrote:
> > >
> > > Prathamesh Kulkarni  writes:
> > > > Hi,
> > > > The attached patch attempts to fix PR111648.
> > > > As mentioned in PR, the issue is when a1 is a multiple of vector
> > > > length, we end up creating following encoding in result: { base_elem,
> > > > arg[0], arg[1], ... } (assuming S = 1),
> > > > where arg is chosen input vector, which is incorrect, since the
> > > > encoding originally in arg would be: { arg[0], arg[1], arg[2], ... }
> > > >
> > > > For the test-case mentioned in PR, vectorizer pass creates
> > > > VEC_PERM_EXPR where:
> > > > arg0: { -16, -9, -10, -11 }
> > > > arg1: { -12, -5, -6, -7 }
> > > > sel = { 3, 4, 5, 6 }
> > > >
> > > > arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern 
> > > > = 3.
> > > > Since a1 = 4 and arg_len = 4, it ended up creating the result with
> > > > following encoding:
> > > > res = { arg0[3], arg1[0], arg1[1] } // npatterns = 1, nelts_per_pattern 
> > > > = 3
> > > >   = { -11, -12, -5 }
> > > >
> > > > So for res[3], it used S = (-5) - (-12) = 7
> > > > And hence computed it as -5 + 7 = 2.
> > > > instead of selecting arg1[2], ie, -6.
> > > >
> > > > The patch tweaks valid_mask_for_fold_vec_perm_cst_p to punt if a1 is a 
> > > > multiple
> > > > of vector length, so a1 ... ae select elements only from stepped part
> > > > of the pattern
> > > > from input vector and return false for this case.
> > > >
> > > > Since the vectors are VLS, fold_vec_perm_cst then sets:
> > > > res_npatterns = res_nelts
> > > > res_nelts_per_pattern  = 1
> > > > which seems to fix the issue by encoding all the elements.
> > > >
> > > > The patch resulted in Case 4 and Case 5 failing from test_nunits_min_2 
> > > > because
> > > > they used sel = { 0, 0, 1, ... } and {len, 0, 1, ... } respectively,
> > > > which used a1 = 0, and thus selected arg1[0].
> > > >
> > > > I removed Case 4 because it was already covered in test_nunits_min_4,
> > > > and moved Case 5 to test_nunits_min_4, with sel = { len, 1, 2, ... }
> > > > and added a new Case 9 to test for this issue.
> > > >
> > > > Passes bootstrap+test on aarch64-linux-gnu with and without SVE,
> > > > and on x86_64-linux-gnu.
> > > > Does the patch look OK ?
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > >
> > > > [PR111648] Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding.
> > > >
> > > > gcc/ChangeLog:
> > > >   PR tree-optimization/111648
> > > >   * fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Punt if a1
> > > >   is a multiple of vector length.
> > > >   (test_nunits_min_2): Remove Case 4 and move Case 5 to ...
> > > >   (test_nunits_min_4): ... here and rename case numbers. Also add
> > > >   Case 9.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >   PR tree-optimization/111648
> > > >   * gcc.dg/vect/pr111648.c: New test.
> > > >
> > > >
> > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > > > index 4f8561509ff..c5f421d6b76 100644
> > > > --- a/gcc/fold-const.cc
> > > > +++ b/gcc/fold-const.cc
> > > > @@ -10682,8 +10682,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, 
> > > > tree arg1,
> > > > return false;
> > > >   }
> > > >
> > > > -  /* Ensure that the stepped sequence always selects from the same
> > > > -  input pattern.  */
> > > > +  /* Ensure that the stepped sequence always selects from the 
> > > > stepped
> > > > +  part of same input pattern.  */
> > > >unsigned arg_npatterns
> > > >   = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
> > > > : VECTOR_CST_NPATTERNS (arg1);
> > > > @@ -10694,6 +10694,20 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, 
> > > > tree arg1,
> > > >   *reason = "step is not multiple of npatterns";
> > > > return false;
> > > >   }
> > > > +
> > > > +  /* If a1 is a multiple of len, it will select base element of 
> > > > input
> > > > +  vector resulting in following encoding:
> > > > +  { base_elem, arg[0], arg[1], ... } where arg is the chosen input
> > > > +  vector. This encoding is not originally present in arg, since 
> > > > it's
> > > > +  defined as:
> > > > +  { arg[0], arg[1], arg[2], ... }.  */
> > > > +
> > > > +  if (multiple_p (a1, arg_len))
> > > > + {
> > > > +   if (reason)
> > > > + *reason = "selecting base element of input vector";
> > > > +   return false;
> > > > + }
> > >
> > > That wouldn't catch (for example) cases where a1 == arg_len + 1 and the
> > > second argument has 2 stepped patterns.
> > Ah right, thanks for pointing out. In the attached patch I extended the 
> > check
> > so that r1 < arg_npatterns which should check if we are choosing base
> > elements from any of the patterns in arg (and not just first).
> > Does that look OK 

Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread Richard Biener
On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:

> In tree-vect-slp.cc:
> vect_get_and_check_slp_defs
> 711: 
> 
>   tree type = TREE_TYPE (oprnd);
>   dt = dts[i];
>   if ((dt == vect_constant_def
>|| dt == vect_external_def)
>   && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
>   && (TREE_CODE (type) == BOOLEAN_TYPE
>   || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
>   type)))
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "Build SLP failed: invalid type of def "
>  "for variable-length SLP %T\n", oprnd);
>   return -1;
> }
> 
> Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this 
> condition, then SLP failed:
> Build SLP failed: invalid type of def

I think this can be restricted to vect_external_def, but some history
might reveal the cases we put this code in for (we should be able to
materialize all constants?).  At least uniform boolean constants
should be fine.
 
>
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:44
> To: 钟居哲
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, 钟居哲 wrote:
>  
> > Thanks Richi for pointing it out.
> > 
> > I found this patch can't make conditional gather load succeed on SLP.
> > 
> > I am considering changing MASK_LEN_GATHER_LOAD in pattern recognition:
> > 
> > If no condition mask, in tree-vect-patterns.cc,  I build 
> > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as 
> > GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can reuse the GATHER_LOAD SLP flow 
> > naturally.
> > 
> > If has condition mask, in tree-vect-patterns.cc,  I build 
> > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments same 
> > as MASK_GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can reuse the MASK_GATHER_LOAD SLP 
> > flow naturally.
> > 
> > Is it reasonable ?
>  
> What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> even when the mask is -1?
>  
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-11 20:50
> > To: Juzhe-Zhong
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> >  
> > > This patch fixes this following FAILs in RISC-V regression:
> > > 
> > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump 
> > > vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only 
> > > SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump 
> > > vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only 
> > > SLP stmts"
> > > 
> > > The root cause of these FAIL is that GCC SLP failed on 
> > > MASK_LEN_GATHER_LOAD.
> > > 
> > > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in 
> > > tree-vect-patterns.cc if it is same
> > > situation as GATHER_LOAD (no conditional mask).
> > > 
> > > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask 
> > > argument is a dummy mask.
> > > 
> > > gcc/ChangeLog:
> > > 
> > > * tree-vect-slp.cc (vect_get_operand_map):
> > > (vect_build_slp_tree_1):
> > > (vect_build_slp_tree_2):
> > > * tree-vect-stmts.cc (vectorizable_load):
> > > 
> > > ---
> > >  gcc/tree-vect-slp.cc   | 18 --
> > >  gcc/tree-vect-stmts.cc |  4 ++--
> > >  2 files changed, 18 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > index fa098f9ff4e..712c04ec278 100644
> > > --- a/gcc/tree-vect-slp.cc
> > > +++ b/gcc/tree-vect-slp.cc
> > > @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
> > >case IFN_MASK_GATHER_LOAD:
> > >  return arg1_arg4_map;
> > >  
> > > +   case IFN_MASK_LEN_GATHER_LOAD:
> > > + /* In tree-vect-patterns.cc, we will have these 2 situations:
> > > +
> > > + - Unconditional gather load transforms
> > > +   into MASK_LEN_GATHER_LOAD with dummy mask which is -1.
> > > +
> > > + - Conditional gather load transforms
> > > +   into MASK_LEN_GATHER_LOAD with real conditional mask.*/
> > > + return integer_minus_onep (gimple_call_arg (call, 4)) ? arg1_map
> > > +   : nullptr;
> > > +
> > >case IFN_MASK_STORE:
> > >  return arg3_arg2_map;
> > >  
> > > @@ -1077,7 +1088,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> > >  
> > >if (cfn == CFN_MASK_LOAD
> > >|| cfn == CFN_GATHER_LOAD
> > > -   || cfn == CFN_MASK_GATHER_LOAD)
> > > +   || cf

[PATCH] RISCV: Bugfix for incorrect documentation heading nesting

2023-10-12 Thread Mary Bennett
gcc/ChangeLog:
* doc/extend.texi: Change subsubsection to subsection for
  CORE-V built-ins.
---
 gcc/doc/extend.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ffe8532ad91..e8180945ab4 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21719,7 +21719,7 @@ vector intrinsic specification, which is available at 
the following link:
 All of these functions are declared in the include file @file{riscv_vector.h}.
 
 @node CORE-V Built-in Functions
-@subsubsection CORE-V Built-in Functions
+@subsection CORE-V Built-in Functions
 
 These built-in functions are available for the CORE-V MAC machine
 architecture. For more information on CORE-V built-ins, please see
-- 
2.34.1



Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread juzhe.zh...@rivai.ai
Hi, Richi.

I restricted it to vect_external_def as you said.

Then this condition made SLP failed:

-  if (mask_index >= 0
+  if (mask_index >= 0 && internal_fn_len_index (ifn) < 0
  && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
  &mask, NULL, &mask_dt, &mask_vectype))
return false;

So I added 'internal_fn_len_index (ifn) < 0' so that MASK_LEN_GATHER_LOAD does
not check the scalar mask.

Then ICE here:

vect_slp_analyze_node_operations
if (child
  && (SLP_TREE_DEF_TYPE (child) == vect_constant_def
  || SLP_TREE_DEF_TYPE (child) == vect_external_def)
  /* Perform usual caching, note code-generation still
 code-gens these nodes multiple times but we expect
 to CSE them later.  */
  && !visited_set.add (child))
{
  visited_vec.safe_push (child);
  /* ???  After auditing more code paths make a "default"
 and push the vector type from NODE to all children
 if it is not already set.  */
  /* Compute the number of vectors to be generated.  */
  tree vector_type = SLP_TREE_VECTYPE (child);
  if (!vector_type)
{
  /* For shifts with a scalar argument we don't need
 to cost or code-generate anything.
 ???  Represent this more explicitely.  */
  gcc_assert ((STMT_VINFO_TYPE (SLP_TREE_REPRESENTATIVE (node))
   == shift_vec_info_type)   <- this gcc_assert FAILed
  && j == 1);
  continue;
}

Could you help me with that?


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-10-12 17:55
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:
 
> In tree-vect-slp.cc:
> vect_get_and_check_slp_defs
> 711: 
> 
>   tree type = TREE_TYPE (oprnd);
>   dt = dts[i];
>   if ((dt == vect_constant_def
>|| dt == vect_external_def)
>   && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
>   && (TREE_CODE (type) == BOOLEAN_TYPE
>   || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
>   type)))
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "Build SLP failed: invalid type of def "
>  "for variable-length SLP %T\n", oprnd);
>   return -1;
> }
> 
> Here mask = -1 has BOOLEAN type in tree-vect-patterns.cc and reaches this
> condition, so SLP failed:
> Build SLP failed: invalid type of def
 
I think this can be restricted to vect_external_def, but some history
might reveal the cases we put this code in for (we should be able to
materialize all constants?).  At least uniform boolean constants
should be fine.
>
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:44
> To: ???
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, ??? wrote:
>  
> > Thanks, Richi, for pointing that out.
> > 
> > I found this patch can't make conditional gather load succeed on SLP.
> > 
> > I am considering changing MASK_LEN_GATHER_LOAD in pattern recognition:
> > 
> > If there is no condition mask, in tree-vect-patterns.cc I build
> > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments, the same as
> > GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can reuse the GATHER_LOAD SLP flow
> > naturally.
> > 
> > If there is a condition mask, in tree-vect-patterns.cc I build
> > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments, the
> > same as MASK_GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can reuse the MASK_GATHER_LOAD SLP
> > flow naturally.
> > 
> > Is it reasonable?
>  
> What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> even when the mask is -1?
>  
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-11 20:50
> > To: Juzhe-Zhong
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> >  
> > > This patch fixes the following FAILs in the RISC-V regression:
> > > 
> > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump 
> > > vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only 
> > > SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump 
> > > vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only 
> > > SLP stmts"
> > > 
> > > Th

Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread juzhe.zh...@rivai.ai
Oh. I see.

Here is what makes vect_constant_def fail to SLP:

tree-vect-slp.cc:
vect_build_slp_tree_2
line 2354:

  if (oprnd_info->first_dt == vect_external_def
  || oprnd_info->first_dt == vect_constant_def)
{
  slp_tree invnode = vect_create_new_slp_node (oprnd_info->ops);
  SLP_TREE_DEF_TYPE (invnode) = oprnd_info->first_dt;
  oprnd_info->ops = vNULL;
  children.safe_push (invnode);
  continue;
}

It seems that we handle vect_constant_def the same as vect_external_def.
So SLP fails?



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-10-12 17:55
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:
 
> In tree-vect-slp.cc:
> vect_get_and_check_slp_defs
> 711: 
> 
>   tree type = TREE_TYPE (oprnd);
>   dt = dts[i];
>   if ((dt == vect_constant_def
>|| dt == vect_external_def)
>   && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
>   && (TREE_CODE (type) == BOOLEAN_TYPE
>   || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
>   type)))
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "Build SLP failed: invalid type of def "
>  "for variable-length SLP %T\n", oprnd);
>   return -1;
> }
> 
> Here mask = -1 has BOOLEAN type in tree-vect-patterns.cc and reaches this
> condition, so SLP failed:
> Build SLP failed: invalid type of def
 
I think this can be restricted to vect_external_def, but some history
might reveal the cases we put this code in for (we should be able to
materialize all constants?).  At least uniform boolean constants
should be fine.
>
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:44
> To: ???
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, ??? wrote:
>  
> > Thanks, Richi, for pointing that out.
> > 
> > I found this patch can't make conditional gather load succeed on SLP.
> > 
> > I am considering changing MASK_LEN_GATHER_LOAD in pattern recognition:
> > 
> > If there is no condition mask, in tree-vect-patterns.cc I build
> > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments, the same as
> > GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can reuse the GATHER_LOAD SLP flow
> > naturally.
> > 
> > If there is a condition mask, in tree-vect-patterns.cc I build
> > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments, the
> > same as MASK_GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can reuse the MASK_GATHER_LOAD SLP
> > flow naturally.
> > 
> > Is it reasonable?
>  
> What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> even when the mask is -1?
>  
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-11 20:50
> > To: Juzhe-Zhong
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> >  
> > > This patch fixes the following FAILs in the RISC-V regression:
> > > 
> > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump 
> > > vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only 
> > > SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump 
> > > vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only 
> > > SLP stmts"
> > > 
> > > The root cause of these FAILs is that GCC SLP failed on
> > > MASK_LEN_GATHER_LOAD.
> > > 
> > > Since for RVV, we build MASK_LEN_GATHER_LOAD with a dummy mask (-1) in
> > > tree-vect-patterns.cc if it is the same situation as GATHER_LOAD (no
> > > conditional mask).
> > > 
> > > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if the
> > > mask argument is a dummy mask.
> > > 
> > > gcc/ChangeLog:
> > > 
> > > * tree-vect-slp.cc (vect_get_operand_map):
> > > (vect_build_slp_tree_1):
> > > (vect_build_slp_tree_2):
> > > * tree-vect-stmts.cc (vectorizable_load):
> > > 
> > > ---
> > >  gcc/tree-vect-slp.cc   | 18 --
> > >  gcc/tree-vect-stmts.cc |  4 ++--
> > >  2 files changed, 18 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > index fa098f9ff4e..712c04ec278 100644
> > > --- a/gcc/tree-vect-slp.cc
> > > +++ b/gcc/tree-vect-slp.cc
> > > @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned 
> > > char swap = 0)
> > >case IFN_MASK_GATHER_LOAD:
> > >  return arg1_arg4_map;
> > >  
> > > +   c

Re: [PATCH 6/6] aarch64: Add front-end argument type checking for target builtins

2023-10-12 Thread Richard Sandiford
"Richard Earnshaw (lists)"  writes:
> On 09/10/2023 14:12, Victor Do Nascimento wrote:
>> 
>> 
>> On 10/7/23 12:53, Richard Sandiford wrote:
>>> Richard Earnshaw  writes:
 On 03/10/2023 16:18, Victor Do Nascimento wrote:
> In implementing the ACLE read/write system register builtins it was
> observed that leaving argument type checking to be done at expand-time
> meant that poorly-formed function calls were being "fixed" by certain
> optimization passes, meaning bad code wasn't being properly picked up
> in checking.
>
> Example:
>
>     const char *regname = "amcgcr_el0";
>     long long a = __builtin_aarch64_rsr64 (regname);
>
> is reduced by the ccp1 pass to
>
>     long long a = __builtin_aarch64_rsr64 ("amcgcr_el0");
>
> As these functions require an argument of STRING_CST type, there needs
> to be a check carried out by the front-end capable of picking this up.
>
> The introduced `check_general_builtin_call' function will be called by
> the TARGET_CHECK_BUILTIN_CALL hook whenever a call to a builtin
> belonging to the AARCH64_BUILTIN_GENERAL category is encountered,
> carrying out any appropriate checks associated with a particular
> builtin function code.

 Doesn't this prevent reasonable wrapping of the __builtin... names with
 something more palatable?  Eg:

 static inline __attribute__((always_inline)) long long get_sysreg_ll
 (const char *regname)
 {
     return __builtin_aarch64_rsr64 (regname);
 }

 ...
     long long x = get_sysreg_ll("amcgcr_el0");
 ...
>>>
>>> I think it's case of picking your poison.  If we didn't do this,
>>> and only checked later, then it's unlikely that GCC and Clang would
>>> be consistent about when a constant gets folded soon enough.
>>>
>>> But yeah, it means that the above would need to be a macro in C.
>>> Enlightened souls using C++ could instead do:
>>>
>>>    template<const char *regname>
>>>    long long get_sysreg_ll()
>>>    {
>>>      return __builtin_aarch64_rsr64(regname);
>>>    }
>>>
>>>    ... get_sysreg_ll<"amcgcr_el0">() ...
>>>
>>> Or at least I hope so.  Might be nice to have a test for this.
>>>
>>> Thanks,
>>> Richard
>> 
>> As Richard Earnshaw mentioned, this does break the use of `static inline
>> __attribute__((always_inline))', something I had found out in my testing.
>> My chosen implementation was indeed, to quote Richard Sandiford, a case of 
>> "picking your poison" to have things line up with Clang and behaving 
>> consistently across optimization levels.
>> 
>> Relaxing the use of `TARGET_CHECK_BUILTIN_CALL' meant optimizations were
>> letting too many things through.  Example:
>> 
>> const char *regname = "amcgcr_el0";
>> long long a = __builtin_aarch64_rsr64 (regname);
>> 
>> gets folded to
>> 
>> long long a = __builtin_aarch64_rsr64 ("amcgcr_el0");
>> 
>> and compilation passes at -01 even though it fails at -O0.
>> 
>> I had, however, not given any thought to the use of a template as a valid 
>> C++ alternative.
>> 
>> I will evaluate the use of templates and add tests accordingly.
>
> This just seems inconsistent with all the builtins we already have that 
> require literal constants for parameters.  For example (to pick just one of 
> many), vshr_n_q8(), where the second parameter must be a literal value.  In 
> practice we accept anything that resolves to a compile-time constant integer 
> expression and rely on that to avoid having to have hundreds of macros 
> binding the ACLE names to the underlying builtin equivalents.

That's true for the way that GCC handles things like Advanced SIMD.
But Clang behaves differently.  So does GCC's SVE ACLE implementation.
Both of those (try to) follow the language rules about what is a constant
expression.

Thanks,
Richard
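For the C side of this discussion, the usual workaround is a macro wrapper, which keeps the string literal visible at the call site where the front end checks it. A minimal sketch (the real builtin is AArch64-only, so a hypothetical placeholder function stands in for __builtin_aarch64_rsr64 here):

```c
#include <assert.h>
#include <string.h>

/* Placeholder standing in for __builtin_aarch64_rsr64; on AArch64 the
   real builtin requires its argument to be a string literal at the call
   site once TARGET_CHECK_BUILTIN_CALL enforces STRING_CST arguments.  */
static long long
fake_rsr64 (const char *regname)
{
  return (long long) strlen (regname);
}

/* A macro wrapper forwards the literal unchanged, so front-end checking
   still sees a STRING_CST -- unlike a non-constexpr inline function,
   whose parameter is an ordinary variable by the time it reaches the
   builtin.  */
#define GET_SYSREG_LL(name) fake_rsr64 (name)
```

This is only an illustration of why the macro form survives the stricter checking while the `always_inline` wrapper does not.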


[PATCH] tree-optimization/111779 - Handle some BIT_FIELD_REFs in SRA

2023-10-12 Thread Richard Biener
The following handles byte-aligned, power-of-two and byte-multiple
sized BIT_FIELD_REF reads in SRA.  In particular this should cover
BIT_FIELD_REFs created by optimize_bit_field_compare.
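In isolation, the conditions the new predicate checks (byte-aligned offset, power-of-two and byte-multiple size) can be sketched like this (a plain-integer model, not GCC's poly_int-based sra_handled_bf_read_p):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BITS_PER_UNIT 8

/* Model of the predicate's arithmetic: a BIT_FIELD_REF read is handled
   when its size and offset are byte-aligned and the size is a power of
   two.  (GCC's version additionally requires both to be compile-time
   constants.)  */
static bool
bf_read_handled_p (uint64_t size, uint64_t offset)
{
  return size != 0
	 && size % BITS_PER_UNIT == 0
	 && offset % BITS_PER_UNIT == 0
	 && (size & (size - 1)) == 0;	/* power of two */
}
```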

For gcc.dg/tree-ssa/ssa-dse-26.c we now SRA the BIT_FIELD_REF
appearing there leading to more DSE, fully eliding the aggregates.

This results in the same false positive -Wuninitialized as the
older attempt to remove the folding from optimize_bit_field_compare,
fixed by initializing part of the aggregate unconditionally.

Bootstrapped and tested on x86_64-unknown-linux-gnu for all languages.

Martin is on leave so I'll push this tomorrow unless the Fortran
folks have objections.

Thanks,
Richard.

PR tree-optimization/111779
gcc/
* tree-sra.cc (sra_handled_bf_read_p): New function.
(build_access_from_expr_1): Handle some BIT_FIELD_REFs.
(sra_modify_expr): Likewise.
(make_fancy_name_1): Skip over BIT_FIELD_REF.

gcc/fortran/
* trans-expr.cc (gfc_trans_assignment_1): Initialize
lhs_caf_attr and rhs_caf_attr codimension flag to avoid
false positive -Wuninitialized.

gcc/testsuite/
* gcc.dg/tree-ssa/ssa-dse-26.c: Adjust for more DSE.
* gcc.dg/vect/vect-pr111779.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c |  4 +-
 gcc/testsuite/gcc.dg/vect/vect-pr111779.c  | 56 ++
 gcc/tree-sra.cc| 24 --
 3 files changed, 79 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-pr111779.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
index e3c33f49ef6..43152de5616 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
@@ -31,5 +31,5 @@ constraint_equal (struct constraint a, struct constraint b)
 && constraint_expr_equal (a.rhs, b.rhs);
 }
 
-/* { dg-final { scan-tree-dump-times "Deleted dead store: x = " 1 "dse1" } } */
-/* { dg-final { scan-tree-dump-times "Deleted dead store: y = " 1 "dse1" } } */
+/* { dg-final { scan-tree-dump-times "Deleted dead store: x = " 2 "dse1" } } */
+/* { dg-final { scan-tree-dump-times "Deleted dead store: y = " 2 "dse1" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-pr111779.c 
b/gcc/testsuite/gcc.dg/vect/vect-pr111779.c
new file mode 100644
index 000..79b72aebc78
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-pr111779.c
@@ -0,0 +1,56 @@
+#include 
+#include "tree-vect.h"
+
+struct C
+{
+int c;
+int d;
+bool f :1;
+float e;
+};
+
+struct A
+{
+  unsigned int a;
+  unsigned char c1, c2;
+  bool b1 : 1;
+  bool b2 : 1;
+  bool b3 : 1;
+  struct C b4;
+};
+
+void __attribute__((noipa))
+foo (const struct A * __restrict x, int y)
+{
+  int s = 0, i = 0;
+  for (i = 0; i < y; ++i)
+{
+  const struct A a = x[i];
+  s += a.b4.f ? 1 : 0;
+}
+  if (s != 0)
+__builtin_abort ();
+}
+
+int
+main ()
+{
+  struct A x[100];
+  int i;
+
+  check_vect ();
+
+  __builtin_memset (x, -1, sizeof (x));
+#pragma GCC novect
+  for (i = 0; i < 100; i++)
+{
+  x[i].b1 = false;
+  x[i].b2 = false;
+  x[i].b3 = false;
+  x[i].b4.f = false;
+}
+  foo (x, 100);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_int } } 
} */
diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index 56a8ba26135..24d0c20da6a 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -1113,6 +1113,21 @@ disqualify_base_of_expr (tree t, const char *reason)
 disqualify_candidate (t, reason);
 }
 
+/* Return true if the BIT_FIELD_REF read EXPR is handled by SRA.  */
+
+static bool
+sra_handled_bf_read_p (tree expr)
+{
+  uint64_t size, offset;
+  if (bit_field_size (expr).is_constant (&size)
+  && bit_field_offset (expr).is_constant (&offset)
+  && size % BITS_PER_UNIT == 0
+  && offset % BITS_PER_UNIT == 0
+  && pow2p_hwi (size))
+return true;
+  return false;
+}
+
 /* Scan expression EXPR and create access structures for all accesses to
candidates for scalarization.  Return the created access or NULL if none is
created.  */
@@ -1123,7 +1138,8 @@ build_access_from_expr_1 (tree expr, gimple *stmt, bool 
write)
   struct access *ret = NULL;
   bool partial_ref;
 
-  if (TREE_CODE (expr) == BIT_FIELD_REF
+  if ((TREE_CODE (expr) == BIT_FIELD_REF
+   && (write || !sra_handled_bf_read_p (expr)))
   || TREE_CODE (expr) == IMAGPART_EXPR
   || TREE_CODE (expr) == REALPART_EXPR)
 {
@@ -1170,6 +1186,7 @@ build_access_from_expr_1 (tree expr, gimple *stmt, bool 
write)
 case COMPONENT_REF:
 case ARRAY_REF:
 case ARRAY_RANGE_REF:
+case BIT_FIELD_REF:
   ret = create_access (expr, stmt, write);
   break;
 
@@ -1549,6 +1566,7 @@ make_fancy_name_1 (tree expr)
   obstack_grow (&name_obstack, buffer, strlen (buffer));
   break;
 
+case BIT_FIELD_REF:
 case ADDR_EXPR:
   make_fancy_name_1 (TREE_OPER

Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-12 Thread Richard Sandiford
Robin Dapp  writes:
>> It wasn't very clear, sorry, but it was the last sentence I was asking
>> for clarification on, not the other bits.  Why do we want to avoid
>> generating a COND_ADD when the operand is a vectorisable call?
>
> Ah, I see, apologies.  Upon thinking about it a bit more (thanks)
> I figured this hunk is not necessary.  I added it early in the process
> in order to keep the current behavior for situations like the following:
>
>  before:
>  _1 = .FMA (...)
>  _2 = COND (cond, .FMA, 0.0)
>  _3 = COND_ADD (true, result, _2, result)
>
>  This we would simplify to:
>  _2 = COND_FMA (cond, ...)
>  _3 = COND_ADD (true, result, _2, result)
>
>  with the patch we have:
>  _1 = .FMA (...)
>  _2 = .COND_ADD (cond, arg1, _1, arg1)
>
> Due to differences in expansion we'd end up with a masked
> vfmacc ("a += a + b * c") before and now emit an unmasked
> vfmadd ("a += a * b + c") and a masked result add.  This shouldn't
> be worse from a vector spec point of view, so I just changed the
> test expectation for now.

Thanks, sounds good.

> The attached v4 also includes Richi's suggestion for the HONOR...
> stuff.
>
> Bootstrap and regtest unchanged on aarch64, x86 and power10.

I'm reluctant to comment on the signed zeros/MINUS_EXPR parts,
but FWIW, the rest looks good to me.

Thanks,
Richard

>
> Regards
>  Robin
>
>
> From 1752507ce22c22b50b96f889dc0a9c2fc8e50859 Mon Sep 17 00:00:00 2001
> From: Robin Dapp 
> Date: Wed, 13 Sep 2023 22:19:35 +0200
> Subject: [PATCH v4] ifcvt/vect: Emit COND_ADD for conditional scalar
>  reduction.
>
> As described in PR111401 we currently emit a COND and a PLUS expression
> for conditional reductions.  This makes it difficult to combine both
> into a masked reduction statement later.
> This patch improves that by directly emitting a COND_ADD during ifcvt and
> adjusting some vectorizer code to handle it.
>
> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
> is true.
>
> gcc/ChangeLog:
>
>   PR middle-end/111401
>   * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD
>   if supported.
>   (predicate_scalar_phi): Add whitespace.
>   * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD.
>   (neutral_op_for_reduction): Return -0 for PLUS.
>   (vect_is_simple_reduction): Don't count else operand in
>   COND_ADD.
>   (vect_create_epilog_for_reduction): Fix whitespace.
>   (vectorize_fold_left_reduction): Add COND_ADD handling.
>   (vectorizable_reduction): Don't count else operand in COND_ADD.
>   (vect_transform_reduction): Add COND_ADD handling.
>   * tree-vectorizer.h (neutral_op_for_reduction): Add default
>   parameter.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
>   * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
>   * gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust.
>   * gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto.
> ---
>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 
>  .../riscv/rvv/autovec/cond/pr111401.c | 139 
>  .../riscv/rvv/autovec/reduc/reduc_call-2.c|   4 +-
>  .../riscv/rvv/autovec/reduc/reduc_call-4.c|   4 +-
>  gcc/tree-if-conv.cc   |  49 --
>  gcc/tree-vect-loop.cc | 156 ++
>  gcc/tree-vectorizer.h |   2 +-
>  7 files changed, 446 insertions(+), 49 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
>
> diff --git 
> a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c 
> b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> new file mode 100644
> index 000..7b46e7d8a2a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> @@ -0,0 +1,141 @@
> +/* Make sure a -0 stays -0 when we perform a conditional reduction.  */
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-add-options ieee } */
> +/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */
> +
> +#include "tree-vect.h"
> +
> +#include 
> +
> +#define N (VECTOR_BITS * 17)
> +
> +double __attribute__ ((noinline, noclone))
> +reduc_plus_double (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone, optimize ("0")))
> +reduc_plus_double_ref (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone))
> +reduc_minus_double (double *restrict a, double init, int *cond, int n)

[Patch] libgomp.texi: Note to 'Memory allocation' sect and missing mem-memory routines

2023-10-12 Thread Tobias Burnus

This patch improves the documentation by completing the description of
the remaining so far undocumented OpenMP Memory-Management Routines
(except for the two functions added in TR11, which are also unimplemented).
Current online version:

https://gcc.gnu.org/onlinedocs/libgomp/Memory-Management-Routines.html

And the patch makes it clearer when the OpenMP allocators are actually used;
besides the obvious use via the routines above, it happens when using the
'allocate' directive/clause and the 'allocators' clause. The new note
is added to the beginning of the section

https://gcc.gnu.org/onlinedocs/libgomp/Memory-allocation.html

The new note mostly applies, except that only C supports 'omp allocate';
Fortran has a patch pending a comment by Jakub, and for C++ I still need
to complete my draft patch.

I also fixed some typos (albeit 'behaviour' is not really a typo) and removed
some 'kind=' for local consistency, in particular to make the 'info' output
cleaner.

Comments, remarks, suggestions - before (or after) I apply it?

Tobias

PS: General comments and suggestions on the documentation are also welcome.
The wording (in the current patch and the current documentation) can surely be
improved.

That some documentation is missing (e.g. for several routines) is a known
problem.

If someone feels bored, besides reviewing libgomp.texi (6282 LoC),
https://gcc.gnu.org/projects/gomp/ (1281 LoC) and 
https://gcc.gnu.org/wiki/Offloading
(+ openmp + OpenACC) can surely be improved :-)
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
libgomp.texi: Note to 'Memory allocation' sect and missing mem-memory routines

This commit completes the documentation of the OpenMP memory-management
routines, except for the unimplemented TR11 additions.  It also makes clear
in the 'Memory allocation' section of the 'OpenMP-Implementation Specifics'
chapter under which condition OpenMP managed memory/allocators are used.

libgomp/ChangeLog:

	* libgomp.texi: Fix some typos.
	(Memory Management Routines): Document remaining 5.x routines.
	(Memory allocation): Make clear when the section applies.

 libgomp/libgomp.texi | 382 +--
 1 file changed, 367 insertions(+), 15 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 0d965f96d48..3fc9c7dea23 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -1917,7 +1917,7 @@ is not supported.
 @item @emph{Description}:
 If the device number is refers to the initial device or to a device with
 memory accessible from the host (shared memory), the @code{omp_get_mapped_ptr}
-routines returnes the value of the passed @var{ptr}.  Otherwise, if associated
+routines returns the value of the passed @var{ptr}.  Otherwise, if associated
 storage to the passed host pointer @var{ptr} exists on device associated with
 @var{device_num}, it returns that pointer. In all other cases and in cases of
 an error, a null pointer is returned.
@@ -2397,12 +2397,12 @@ They have C linkage and do not throw exceptions.
 * omp_destroy_allocator:: Destroy an allocator
 * omp_set_default_allocator:: Set the default allocator
 * omp_get_default_allocator:: Get the default allocator
-@c * omp_alloc:: 
-@c * omp_aligned_alloc:: 
-@c * omp_free:: 
-@c * omp_calloc:: 
-@c * omp_aligned_calloc:: 
-@c * omp_realloc:: 
+* omp_alloc:: Memory allocation with an allocator
+* omp_aligned_alloc:: Memory allocation with an allocator and alignment
+* omp_free:: Freeing memory allocated with OpenMP routines
+* omp_calloc:: Allocate nullified memory with an allocator
+* omp_aligned_calloc:: Allocate nullified aligned memory with an allocator
+* omp_realloc:: Reallocate memory allocated with OpenMP routines
 @c * omp_get_memspace_num_resources:: /TR11
 @c * omp_get_submemspace:: /TR11
 @end menu
@@ -2434,8 +2434,8 @@ may be used as trait value to specify that the default value should be used.
 @item @emph{Fortran}:
 @multitable @columnfractions .20 .80
 @item @emph{Interface}: @tab @code{function omp_init_allocator(memspace, ntraits, traits)}
-@item   @tab @code{integer (kind=omp_allocator_handle_kind) :: omp_init_allocator}
-@item   @tab @code{integer (kind=omp_memspace_handle_kind), intent(in) :: memspace}
+@item   @tab @code{integer (omp_allocator_handle_kind) :: omp_init_allocator}
+@item   @tab @code{integer (omp_memspace_handle_kind), intent(in) :: memspace}
 @item   @tab @code{integer, intent(in) :: ntraits}
 @item   @tab @code{type (omp_alloctrait), intent(in) :: traits(*)}
 @end multitable
@@ -2467,7 +2467,7 @@ routine is permitted but will have no effect.
 @item @emph{Fortran}:
 @multitable @columnfractions .20 .80
 @item @emph{Interface}: @tab @code{subroutine omp_destroy

Re: [PATCH] wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989]

2023-10-12 Thread Richard Sandiford
Jakub Jelinek  writes:
> @@ -2036,11 +2075,20 @@ wi::lrshift_large (HOST_WIDE_INT *val, c
>  unsigned int xlen, unsigned int xprecision,
>  unsigned int precision, unsigned int shift)
>  {
> -  unsigned int len = rshift_large_common (val, xval, xlen, xprecision, 
> shift);
> +  /* Work out how many blocks are needed to store the significant bits
> + (excluding the upper zeros or signs).  */
> +  unsigned int blocks_needed = BLOCKS_NEEDED (xprecision - shift);
> +  unsigned int len = blocks_needed;
> +  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)
> +  && len > xlen
> +  && xval[xlen - 1] >= 0)
> +len = xlen;

I think here too it would be worth dropping the:

  UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)

part of the condition, since presumably the change should be safe
regardless of that.

> +
> +  rshift_large_common (val, xval, xlen, shift, len);
>  
>/* The value we just created has precision XPRECISION - SHIFT.
>   Zero-extend it to wider precisions.  */
> -  if (precision > xprecision - shift)
> +  if (precision > xprecision - shift && len == blocks_needed)
>  {
>unsigned int small_prec = (xprecision - shift) % 
> HOST_BITS_PER_WIDE_INT;
>if (small_prec)
> @@ -2063,11 +2111,18 @@ wi::arshift_large (HOST_WIDE_INT *val, c
>  unsigned int xlen, unsigned int xprecision,
>  unsigned int precision, unsigned int shift)
>  {
> -  unsigned int len = rshift_large_common (val, xval, xlen, xprecision, 
> shift);
> +  /* Work out how many blocks are needed to store the significant bits
> + (excluding the upper zeros or signs).  */
> +  unsigned int blocks_needed = BLOCKS_NEEDED (xprecision - shift);
> +  unsigned int len = blocks_needed;
> +  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS) && len > xlen)
> +len = xlen;
> +

Same here.

OK for thw wide-int parts with those changes.

Thanks,
Richard
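As background for the block arithmetic being adjusted here, a toy multi-word logical right shift over 64-bit blocks (a simplified stand-in for the wi::lrshift_large / rshift_large_common pair, not GCC's actual code):

```c
#include <assert.h>
#include <stdint.h>

/* Shift the XLEN-block little-endian number XVAL right by SHIFT bits,
   zero-filling, and write LEN result blocks to VAL.  Mirrors the shape
   of the real code: a whole-block skip plus a sub-block bit shift that
   combines adjacent input blocks.  */
static void
lrshift_blocks (uint64_t *val, const uint64_t *xval,
		unsigned xlen, unsigned len, unsigned shift)
{
  unsigned skip = shift / 64;
  unsigned small = shift % 64;
  for (unsigned i = 0; i < len; i++)
    {
      uint64_t lo = i + skip < xlen ? xval[i + skip] : 0;
      uint64_t hi = i + skip + 1 < xlen ? xval[i + skip + 1] : 0;
      /* Guard small == 0: shifting a 64-bit value by 64 is undefined.  */
      val[i] = small ? (lo >> small) | (hi << (64 - small)) : lo;
    }
}
```

The LEN parameter plays the role of the `len`/`blocks_needed` computation being discussed: the caller decides how many significant result blocks to materialize.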


Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread juzhe.zh...@rivai.ai
In tree-vect-stmts.cc

vect_check_scalar_mask

Failed here:

  /* If the caller is not prepared for adjusting an external/constant
 SLP mask vector type fail.  */
  if (slp_node
  && !mask_node
  && SLP_TREE_DEF_TYPE (mask_node_1) != vect_internal_def)
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "SLP mask argument is not vectorized.\n");
  return false;
}

If we allow vect_constant_def, should we adjust the constant SLP mask in the
caller, "vectorizable_load"?

But I don't know how to adjust that.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-10-12 17:55
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:
 
> In tree-vect-slp.cc:
> vect_get_and_check_slp_defs
> 711: 
> 
>   tree type = TREE_TYPE (oprnd);
>   dt = dts[i];
>   if ((dt == vect_constant_def
>|| dt == vect_external_def)
>   && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
>   && (TREE_CODE (type) == BOOLEAN_TYPE
>   || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
>   type)))
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "Build SLP failed: invalid type of def "
>  "for variable-length SLP %T\n", oprnd);
>   return -1;
> }
> 
> Here mask = -1 has BOOLEAN type in tree-vect-patterns.cc and reaches this
> condition, so SLP failed:
> Build SLP failed: invalid type of def
 
I think this can be restricted to vect_external_def, but some history
might reveal the cases we put this code in for (we should be able to
materialize all constants?).  At least uniform boolean constants
should be fine.
>
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:44
> To: ???
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, ??? wrote:
>  
> > Thanks, Richi, for pointing that out.
> > 
> > I found this patch can't make conditional gather load succeed on SLP.
> > 
> > I am considering changing MASK_LEN_GATHER_LOAD in pattern recognition:
> > 
> > If there is no condition mask, in tree-vect-patterns.cc I build
> > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments, the same as
> > GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can reuse the GATHER_LOAD SLP flow
> > naturally.
> > 
> > If there is a condition mask, in tree-vect-patterns.cc I build
> > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments, the
> > same as MASK_GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can reuse the MASK_GATHER_LOAD SLP
> > flow naturally.
> > 
> > Is it reasonable?
>  
> What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> even when the mask is -1?
>  
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-11 20:50
> > To: Juzhe-Zhong
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> >  
> > > This patch fixes the following FAILs in the RISC-V regression:
> > > 
> > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump 
> > > vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only 
> > > SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump 
> > > vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only 
> > > SLP stmts"
> > > 
> > > The root cause of these FAILs is that GCC SLP failed on
> > > MASK_LEN_GATHER_LOAD.
> > > 
> > > Since for RVV, we build MASK_LEN_GATHER_LOAD with a dummy mask (-1) in
> > > tree-vect-patterns.cc if it is the same situation as GATHER_LOAD (no
> > > conditional mask).
> > > 
> > > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if the
> > > mask argument is a dummy mask.
> > > 
> > > gcc/ChangeLog:
> > > 
> > > * tree-vect-slp.cc (vect_get_operand_map):
> > > (vect_build_slp_tree_1):
> > > (vect_build_slp_tree_2):
> > > * tree-vect-stmts.cc (vectorizable_load):
> > > 
> > > ---
> > >  gcc/tree-vect-slp.cc   | 18 --
> > >  gcc/tree-vect-stmts.cc |  4 ++--
> > >  2 files changed, 18 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > index fa098f9ff4e..712c04ec278 100644
> > > --- a/gcc/tree-vect-slp.cc
> > > +++ b/gcc/tree-vect-slp.cc
> > > @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned 
> > > char swap = 0)
> > >case IFN_MASK_GATHER_LOAD:
> > 

Re: [PATCH] wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989]

2023-10-12 Thread Jakub Jelinek
On Thu, Oct 12, 2023 at 11:54:14AM +0100, Richard Sandiford wrote:
> Jakub Jelinek  writes:
> > @@ -2036,11 +2075,20 @@ wi::lrshift_large (HOST_WIDE_INT *val, c
> >unsigned int xlen, unsigned int xprecision,
> >unsigned int precision, unsigned int shift)
> >  {
> > -  unsigned int len = rshift_large_common (val, xval, xlen, xprecision, 
> > shift);
> > +  /* Work out how many blocks are needed to store the significant bits
> > + (excluding the upper zeros or signs).  */
> > +  unsigned int blocks_needed = BLOCKS_NEEDED (xprecision - shift);
> > +  unsigned int len = blocks_needed;
> > +  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)
> > +  && len > xlen
> > +  && xval[xlen - 1] >= 0)
> > +len = xlen;
> 
> I think here too it would be worth dropping the:
> 
>   UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)
> 
> part of the condition, since presumably the change should be safe
> regardless of that.

If so, there is also one spot in lshift_large as well.  So incrementally:

--- gcc/wide-int.cc 2023-10-11 14:41:23.719132402 +0200
+++ gcc/wide-int.cc 2023-10-11 14:41:23.719132402 +0200
@@ -2013,8 +2013,7 @@
 
   /* The whole-block shift fills with zeros.  */
   unsigned int len = BLOCKS_NEEDED (precision);
-  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS))
-len = xlen + skip + 1;
+  len = MIN (xlen + skip + 1, len);
   for (unsigned int i = 0; i < skip; ++i)
 val[i] = 0;
 
@@ -2079,9 +2078,7 @@
  (excluding the upper zeros or signs).  */
   unsigned int blocks_needed = BLOCKS_NEEDED (xprecision - shift);
   unsigned int len = blocks_needed;
-  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)
-  && len > xlen
-  && xval[xlen - 1] >= 0)
+  if (len > xlen && xval[xlen - 1] >= 0)
 len = xlen;
 
   rshift_large_common (val, xval, xlen, shift, len);
@@ -2114,9 +2111,7 @@
   /* Work out how many blocks are needed to store the significant bits
  (excluding the upper zeros or signs).  */
   unsigned int blocks_needed = BLOCKS_NEEDED (xprecision - shift);
-  unsigned int len = blocks_needed;
-  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS) && len > xlen)
-len = xlen;
+  unsigned int len = MIN (xlen, blocks_needed);
 
   rshift_large_common (val, xval, xlen, shift, len);
 
which I'll test soon.

> OK for the wide-int parts with those changes.

Thanks.  What do you think about that
--- gcc/wide-int.h.jj   2023-10-11 12:05:47.718059477 +0200
+++ gcc/wide-int.h  2023-10-11 13:51:56.081552500 +0200
@@ -1635,6 +1635,8 @@ widest_int_storage ::write_val (unsig
   u.valp = XNEWVEC (HOST_WIDE_INT, l);
   return u.valp;
 }
+  else if (CHECKING_P && l < WIDE_INT_MAX_INL_ELTS)
+u.val[l] = HOST_WIDE_INT_UC (0xbaaddeadbeef);
   return u.val;
 }
 
@@ -1650,6 +1652,9 @@ widest_int_storage ::set_len (unsigne
   memcpy (u.val, valp, l * sizeof (u.val[0]));
   XDELETEVEC (valp);
 }
+  else if (len && len < WIDE_INT_MAX_INL_ELTS)
+gcc_checking_assert ((unsigned HOST_WIDE_INT) u.val[len]
+== HOST_WIDE_INT_UC (0xbaaddeadbeef));
   len = l;
   /* There are no excess bits in val[len - 1].  */
   STATIC_ASSERT (N % HOST_BITS_PER_WIDE_INT == 0);

part, shall that go into trunk as well or is that too much slowdown
for checking builds?

Jakub



Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-12 Thread Richard Biener
On Wed, 11 Oct 2023, Robin Dapp wrote:

> > It wasn't very clear, sorry, but it was the last sentence I was asking
> > for clarification on, not the other bits.  Why do we want to avoid
> > generating a COND_ADD when the operand is a vectorisable call?
> 
> Ah, I see, apologies.  Upon thinking about it a bit more (thanks)
> I figured this hunk is not necessary.  I added it early in the process
> in order to keep the current behavior for situations like the following:
> 
>  before:
>  _1 = .FMA (...)
>  _2 = COND (cond, .FMA, 0.0)
>  _3 = COND_ADD (true, result, _2, result)
> 
>  This we would simplify to:
>  _2 = COND_FMA (cond, ...)
>  _3 = COND_ADD (true, result, _2, result)
> 
>  with the patch we have:
>  _1 = .FMA (...)
>  _2 = .COND_ADD (cond, arg1, _1, arg1)
> 
> Due to differences in expansion we'd end up with a masked
> vfmacc ("a += a + b * c") before and now emit an unmasked
> vfmadd ("a += a * b + c") and a masked result add.  This shouldn't
> be worse from a vector spec point of view, so I just changed the
> test expectation for now.
> 
> The attached v4 also includes Richi's suggestion for the HONOR...
> stuff.
> 
> Bootstrap and regtest unchanged on aarch64, x86 and power10.

OK

Thanks,
Richard.

> Regards
>  Robin
> 
> 
> From 1752507ce22c22b50b96f889dc0a9c2fc8e50859 Mon Sep 17 00:00:00 2001
> From: Robin Dapp 
> Date: Wed, 13 Sep 2023 22:19:35 +0200
> Subject: [PATCH v4] ifcvt/vect: Emit COND_ADD for conditional scalar
>  reduction.
> 
> As described in PR111401 we currently emit a COND and a PLUS expression
> for conditional reductions.  This makes it difficult to combine both
> into a masked reduction statement later.
> This patch improves that by directly emitting a COND_ADD during ifcvt and
> adjusting some vectorizer code to handle it.
> 
> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
> is true.
> 
> gcc/ChangeLog:
> 
>   PR middle-end/111401
>   * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD
>   if supported.
>   (predicate_scalar_phi): Add whitespace.
>   * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD.
>   (neutral_op_for_reduction): Return -0 for PLUS.
>   (vect_is_simple_reduction): Don't count else operand in
>   COND_ADD.
>   (vect_create_epilog_for_reduction): Fix whitespace.
>   (vectorize_fold_left_reduction): Add COND_ADD handling.
>   (vectorizable_reduction): Don't count else operand in COND_ADD.
>   (vect_transform_reduction): Add COND_ADD handling.
>   * tree-vectorizer.h (neutral_op_for_reduction): Add default
>   parameter.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
>   * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
>   * gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust.
>   * gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto.
> ---
>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 
>  .../riscv/rvv/autovec/cond/pr111401.c | 139 
>  .../riscv/rvv/autovec/reduc/reduc_call-2.c|   4 +-
>  .../riscv/rvv/autovec/reduc/reduc_call-4.c|   4 +-
>  gcc/tree-if-conv.cc   |  49 --
>  gcc/tree-vect-loop.cc | 156 ++
>  gcc/tree-vectorizer.h |   2 +-
>  7 files changed, 446 insertions(+), 49 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
> 
> diff --git 
> a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c 
> b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> new file mode 100644
> index 000..7b46e7d8a2a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> @@ -0,0 +1,141 @@
> +/* Make sure a -0 stays -0 when we perform a conditional reduction.  */
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-add-options ieee } */
> +/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */
> +
> +#include "tree-vect.h"
> +
> +#include 
> +
> +#define N (VECTOR_BITS * 17)
> +
> +double __attribute__ ((noinline, noclone))
> +reduc_plus_double (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone, optimize ("0")))
> +reduc_plus_double_ref (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone))
> +reduc_minus_double (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i

Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread Richard Biener
On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi.
> 
> I restricted it to vect_external_def as you said.
> 
> Then this condition made SLP fail:
> 
> -  if (mask_index >= 0
> +  if (mask_index >= 0 && internal_fn_len_index (ifn) < 0
>   && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
>   &mask, NULL, &mask_dt, &mask_vectype))
> return false;
>
> So I added 'internal_fn_len_index (ifn) < 0' so that MASK_LEN_GATHER_LOAD
> does not check the scalar mask.

Rather, figure out why.
 
> Then ICE here:
> 
> vect_slp_analyze_node_operations
> if (child
>   && (SLP_TREE_DEF_TYPE (child) == vect_constant_def
>   || SLP_TREE_DEF_TYPE (child) == vect_external_def)
>   /* Perform usual caching, note code-generation still
>  code-gens these nodes multiple times but we expect
>  to CSE them later.  */
>   && !visited_set.add (child))
> {
>   visited_vec.safe_push (child);
>   /* ???  After auditing more code paths make a "default"
>  and push the vector type from NODE to all children
>  if it is not already set.  */
>   /* Compute the number of vectors to be generated.  */
>   tree vector_type = SLP_TREE_VECTYPE (child);
>   if (!vector_type)
> {
>   /* For shifts with a scalar argument we don't need
>  to cost or code-generate anything.
>  ???  Represent this more explicitely.  */
>   gcc_assert ((STMT_VINFO_TYPE (SLP_TREE_REPRESENTATIVE (node))
>== shift_vec_info_type)   <-- this assert FAILed.
>   && j == 1);
>   continue;
> }
> 
> Could you help me with that?
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:55
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:
>  
> > In tree-vect-slp.cc:
> > vect_get_and_check_slp_defs
> > 711: 
> > 
> >   tree type = TREE_TYPE (oprnd);
> >   dt = dts[i];
> >   if ((dt == vect_constant_def
> >|| dt == vect_external_def)
> >   && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> >   && (TREE_CODE (type) == BOOLEAN_TYPE
> >   || !can_duplicate_and_interleave_p (vinfo, stmts.length 
> > (),
> >   type)))
> > {
> >   if (dump_enabled_p ())
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >  "Build SLP failed: invalid type of def "
> >  "for variable-length SLP %T\n", oprnd);
> >   return -1;
> > }
> > 
> > Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this 
> > condition, then SLP failed:
> > Build SLP failed: invalid type of def
>  
> I think this can be restricted to vect_external_def, but some history
> might reveal the cases we put this code in for (we should be able to
> materialize all constants?).  At least uniform boolean constants
> should be fine.
> >
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-12 17:44
> > To: ???
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Thu, 12 Oct 2023, ??? wrote:
> >  
> > > Thanks, Richi, for pointing it out.
> > > 
> > > I found this patch can't make a conditional gather load succeed on SLP.
> > > 
> > > I am considering changing MASK_LEN_GATHER_LOAD in pattern recognition:
> > > 
> > > If there is no condition mask, in tree-vect-patterns.cc I build
> > > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments, same as
> > > GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can reuse the GATHER_LOAD SLP
> > > flow naturally.
> > > 
> > > If there is a condition mask, in tree-vect-patterns.cc I build
> > > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments,
> > > same as MASK_GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can reuse the MASK_GATHER_LOAD
> > > SLP flow naturally.
> > > 
> > > Is it reasonable?
> >  
> > What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> > even when the mask is -1?
> >  
> > > 
> > > juzhe.zh...@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2023-10-11 20:50
> > > To: Juzhe-Zhong
> > > CC: gcc-patches; richard.sandiford
> > > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> > >  
> > > > This patch fixes this following FAILs in RISC-V regression:
> > > > 
> > > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  
> > > > scan-tree-dump vect "Loop contains only SLP st

Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread Richard Biener
On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:

> Oh. I see.
> 
> Here is where vect_constant_def fails to SLP:
> 
> tree-vect-slp.cc:
> vect_build_slp_tree_2
> line 2354:
> 
>   if (oprnd_info->first_dt == vect_external_def
>   || oprnd_info->first_dt == vect_constant_def)
> {
>   slp_tree invnode = vect_create_new_slp_node (oprnd_info->ops);
>   SLP_TREE_DEF_TYPE (invnode) = oprnd_info->first_dt;
>   oprnd_info->ops = vNULL;
>   children.safe_push (invnode);
>   continue;
> }
> 
> It seems that we handle vect_constant_def the same as vect_external_def.
> So it fails to SLP?

Why?  We _should_ see a SLP node for the all-true mask operand.

> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:55
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:
>  
> > In tree-vect-slp.cc:
> > vect_get_and_check_slp_defs
> > 711: 
> > 
> >   tree type = TREE_TYPE (oprnd);
> >   dt = dts[i];
> >   if ((dt == vect_constant_def
> >|| dt == vect_external_def)
> >   && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> >   && (TREE_CODE (type) == BOOLEAN_TYPE
> >   || !can_duplicate_and_interleave_p (vinfo, stmts.length 
> > (),
> >   type)))
> > {
> >   if (dump_enabled_p ())
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >  "Build SLP failed: invalid type of def "
> >  "for variable-length SLP %T\n", oprnd);
> >   return -1;
> > }
> > 
> > Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this 
> > condition, then SLP failed:
> > Build SLP failed: invalid type of def
>  
> I think this can be restricted to vect_external_def, but some history
> might reveal the cases we put this code in for (we should be able to
> materialize all constants?).  At least uniform boolean constants
> should be fine.
> >
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-12 17:44
> > To: ???
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Thu, 12 Oct 2023, ??? wrote:
> >  
> > > Thanks, Richi, for pointing it out.
> > > 
> > > I found this patch can't make a conditional gather load succeed on SLP.
> > > 
> > > I am considering changing MASK_LEN_GATHER_LOAD in pattern recognition:
> > > 
> > > If there is no condition mask, in tree-vect-patterns.cc I build
> > > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments, same as
> > > GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can reuse the GATHER_LOAD SLP
> > > flow naturally.
> > > 
> > > If there is a condition mask, in tree-vect-patterns.cc I build
> > > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments,
> > > same as MASK_GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can reuse the MASK_GATHER_LOAD
> > > SLP flow naturally.
> > > 
> > > Is it reasonable?
> >  
> > What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> > even when the mask is -1?
> >  
> > > 
> > > juzhe.zh...@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2023-10-11 20:50
> > > To: Juzhe-Zhong
> > > CC: gcc-patches; richard.sandiford
> > > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> > >  
> > > > This patch fixes this following FAILs in RISC-V regression:
> > > > 
> > > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  
> > > > scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains 
> > > > only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  
> > > > scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains 
> > > > only SLP stmts"
> > > > 
> > > > The root cause of these FAIL is that GCC SLP failed on 
> > > > MASK_LEN_GATHER_LOAD.
> > > > 
> > > > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in 
> > > > tree-vect-patterns.cc if it is same
> > > > situation as GATHER_LOAD (no conditional mask).
> > > > 
> > > > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if 
> > > > mask argument is a dummy mask.
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > > * tree-vect-slp.cc (vect_get_operand_map):
> > > > (vect_build_slp_tree_1):
> > > > (vect_build_slp_tree_2):
> > > > * tree-vect-stmts.cc (vectorizable_load):
> > > > 
> > > > ---
> > > >  gcc/tree-vect-slp.cc   | 18 --
> > > >  gcc/tree-vect-stmts.cc |  4 ++--
> > > >  2 files changed, 18 ins

Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread Richard Biener
On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:

> In tree-vect-stmts.cc
> 
> vect_check_scalar_mask
> 
> Failed here:
> 
>   /* If the caller is not prepared for adjusting an external/constant
>  SLP mask vector type fail.  */
>   if (slp_node
>   && !mask_node

^^^

where's the mask_node?

>   && SLP_TREE_DEF_TYPE (mask_node_1) != vect_internal_def)
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "SLP mask argument is not vectorized.\n");
>   return false;
> }
> 
> If we allow vect_constant_def, should we adjust the constant SLP mask in the
> caller "vectorizable_load"?
> 
> But I don't know how to adjust that.
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:55
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:
>  
> > In tree-vect-slp.cc:
> > vect_get_and_check_slp_defs
> > 711: 
> > 
> >   tree type = TREE_TYPE (oprnd);
> >   dt = dts[i];
> >   if ((dt == vect_constant_def
> >|| dt == vect_external_def)
> >   && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> >   && (TREE_CODE (type) == BOOLEAN_TYPE
> >   || !can_duplicate_and_interleave_p (vinfo, stmts.length 
> > (),
> >   type)))
> > {
> >   if (dump_enabled_p ())
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >  "Build SLP failed: invalid type of def "
> >  "for variable-length SLP %T\n", oprnd);
> >   return -1;
> > }
> > 
> > Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this 
> > condition, then SLP failed:
> > Build SLP failed: invalid type of def
>  
> I think this can be restricted to vect_external_def, but some history
> might reveal the cases we put this code in for (we should be able to
> materialize all constants?).  At least uniform boolean constants
> should be fine.
> >
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-12 17:44
> > To: ???
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Thu, 12 Oct 2023, ??? wrote:
> >  
> > > Thanks, Richi, for pointing it out.
> > > 
> > > I found this patch can't make a conditional gather load succeed on SLP.
> > > 
> > > I am considering changing MASK_LEN_GATHER_LOAD in pattern recognition:
> > > 
> > > If there is no condition mask, in tree-vect-patterns.cc I build
> > > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments, same as
> > > GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can reuse the GATHER_LOAD SLP
> > > flow naturally.
> > > 
> > > If there is a condition mask, in tree-vect-patterns.cc I build
> > > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments,
> > > same as MASK_GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can reuse the MASK_GATHER_LOAD
> > > SLP flow naturally.
> > > 
> > > Is it reasonable?
> >  
> > What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> > even when the mask is -1?
> >  
> > > 
> > > juzhe.zh...@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2023-10-11 20:50
> > > To: Juzhe-Zhong
> > > CC: gcc-patches; richard.sandiford
> > > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> > >  
> > > > This patch fixes this following FAILs in RISC-V regression:
> > > > 
> > > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  
> > > > scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains 
> > > > only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  
> > > > scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains 
> > > > only SLP stmts"
> > > > 
> > > > The root cause of these FAIL is that GCC SLP failed on 
> > > > MASK_LEN_GATHER_LOAD.
> > > > 
> > > > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in 
> > > > tree-vect-patterns.cc if it is same
> > > > situation as GATHER_LOAD (no conditional mask).
> > > > 
> > > > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if 
> > > > mask argument is a dummy mask.
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > > * tree-vect-slp.cc (vect_get_operand_map):
> > > > (vect_build_slp_tree_1):
> > > > (vect_build_slp_tree_2):
> > > > * tree-vect-stmts.cc (vectorizable_load):
> > > > 
> > > > ---
> > > >  gcc/tree-vect-slp.cc   | 18 --
> > > >  gcc/tree-vect-stmts.cc |  4 ++--
> > > >  2 files

Re: [PATCH] PR target/111778 - Fix undefined shifts in PowerPC compiler

2023-10-12 Thread Jiufu Guo


Hi,

Thanks for your quick fix!

Michael Meissner  writes:

> I was building a cross compiler to PowerPC on my x86_64 workstation with the
> latest version of GCC on October 11th.  I could not build the compiler on the
> x86_64 system as it died in building libgcc.  I looked into it, and I
> discovered the compiler was recursing until it ran out of stack space.  If I
> build a native compiler with the same sources on a PowerPC system, it builds
> fine.
>
> I traced this down to a change made around October 10th:
>
> | commit 8f1a70a4fbcc6441c70da60d4ef6db1e5635e18a (HEAD)
> | Author: Jiufu Guo 
> | Date:   Tue Jan 10 20:52:33 2023 +0800
> |
> |   rs6000: build constant via li/lis;rldicl/rldicr
> |
> |   If a constant is possible left/right cleaned on a rotated value from
> |   a negative value of "li/lis".  Then, using "li/lis ; rldicl/rldicr"
> |   to build the constant.
>
> The code was doing a -1 << 64 which is undefined behavior because different
> machines produce different results.  On the x86_64 system, (-1 << 64) produces
> -1 while on a PowerPC 64-bit system, (-1 << 64) produces 0.  The x86_64 then
> recurses until the stack runs out of space.
>
> If I apply this patch, the compiler builds fine both on x86_64 as a PowerPC
> cross compiler and on a native PowerPC system.
>
> Can I check this into the master branch to fix the problem?
>
> 2023-10-12  Michael Meissner  
>
> gcc/
>
>   PR target/111778
>   * config/rs6000/rs6000.cc (can_be_built_by_li_lis_and_rldicl): Protect
>   code from shifts that are undefined.
>   (can_be_built_by_li_lis_and_rldicr): Likewise.
>   (can_be_built_by_li_and_rldic): Protect code from shifts that
>   undefined.  Also replace uses of 1ULL with HOST_WIDE_INT_1U.
>
> ---
>  gcc/config/rs6000/rs6000.cc | 29 ++---
>  1 file changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 2828f01413c..cc24dd5301e 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -10370,6 +10370,11 @@ can_be_built_by_li_lis_and_rldicl (HOST_WIDE_INT c, 
> int *shift,
>/* Leading zeros may be cleaned by rldicl with a mask.  Change leading 
> zeros
>   to ones and then recheck it.  */
>int lz = clz_hwi (c);
> +
> +  /* If lz == 0, the left shift is undefined.  */
> +  if (!lz)
> +return false;
> +
Thanks! This should be checked.
If "lz" is zero, it means that for the input "C" there are no leading
zeros to be cleaned by "rldicl", so no further analysis is needed.

>HOST_WIDE_INT unmask_c
>  = c | (HOST_WIDE_INT_M1U << (HOST_BITS_PER_WIDE_INT - lz));
>int n;
> @@ -10398,6 +10403,11 @@ can_be_built_by_li_lis_and_rldicr (HOST_WIDE_INT c, 
> int *shift,
>/* Tailing zeros may be cleaned by rldicr with a mask.  Change tailing 
> zeros
>   to ones and then recheck it.  */
>int tz = ctz_hwi (c);
> +
> +  /* If tz == HOST_BITS_PER_WIDE_INT, the left shift is undefined.  */
> +  if (tz >= HOST_BITS_PER_WIDE_INT)
> +return false;
> +
This is correct in theory and makes sure "tz" is in range.
Just one minor thing:
"ctz_hwi" would never return a value greater than HOST_BITS_PER_WIDE_INT,
and returns exactly that only for input 0, right?

>HOST_WIDE_INT unmask_c = c | ((HOST_WIDE_INT_1U << tz) - 1);
>int n;
>if (can_be_rotated_to_lowbits (~unmask_c, 15, &n)
> @@ -10428,8 +10438,15 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int 
> *shift, HOST_WIDE_INT *mask)
>   right bits are shifted as 0's, and left 1's(and x's) are cleaned.  */
>int tz = ctz_hwi (c);
>int lz = clz_hwi (c);
> +
> +  /* If lz == HOST_BITS_PER_WIDE_INT, the left shift is undefined.  */
> +  if (lz >= HOST_BITS_PER_WIDE_INT)
> +return false;
> +
This may be similar.

>int middle_ones = clz_hwi (~(c << lz));
> -  if (tz + lz + middle_ones >= ones)
> +  if (tz + lz + middle_ones >= ones
> +  && (tz - lz) < HOST_BITS_PER_WIDE_INT
> +  && tz < HOST_BITS_PER_WIDE_INT)
>  {
>*mask = ((1LL << (HOST_BITS_PER_WIDE_INT - tz - lz)) - 1LL) << tz;
>*shift = tz;
> @@ -10440,7 +10457,8 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int 
> *shift, HOST_WIDE_INT *mask)
>int leading_ones = clz_hwi (~c);
>int tailing_ones = ctz_hwi (~c);
>int middle_zeros = ctz_hwi (c >> tailing_ones);
> -  if (leading_ones + tailing_ones + middle_zeros >= ones)
> +  if (leading_ones + tailing_ones + middle_zeros >= ones
> +  && middle_zeros < HOST_BITS_PER_WIDE_INT)
Thanks.
>  {
>*mask = ~(((1ULL << middle_zeros) - 1ULL) << tailing_ones);
>*shift = tailing_ones + middle_zeros;
> @@ -10450,10 +10468,15 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int 
> *shift, HOST_WIDE_INT *mask)
>/* xx1..1xx: --> xx0..01..1xx: some 1's(following x's) are cleaned. */
>/* Get the position for the first bit of successive 1.
>   The 24th bit would be in successive 0 or 1.  */
> -  HOST_WIDE_INT low_mask = (1LL << 24) - 

Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread juzhe.zh...@rivai.ai
The mask node is NULL because the caller:

  if (mask_index >= 0
  && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
  &mask, NULL, &mask_dt, &mask_vectype))
return false;

passes NULL for the mask_node.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-10-12 19:14
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:
 
> In tree-vect-stmts.cc
> 
> vect_check_scalar_mask
> 
> Failed here:
> 
>   /* If the caller is not prepared for adjusting an external/constant
>  SLP mask vector type fail.  */
>   if (slp_node
>   && !mask_node
 
^^^
 
where's the mask_node?
 
>   && SLP_TREE_DEF_TYPE (mask_node_1) != vect_internal_def)
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "SLP mask argument is not vectorized.\n");
>   return false;
> }
> 
> If we allow vect_constant_def, should we adjust the constant SLP mask in the
> caller "vectorizable_load"?
> 
> But I don't know how to adjust that.
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:55
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:
>  
> > In tree-vect-slp.cc:
> > vect_get_and_check_slp_defs
> > 711: 
> > 
> >   tree type = TREE_TYPE (oprnd);
> >   dt = dts[i];
> >   if ((dt == vect_constant_def
> >|| dt == vect_external_def)
> >   && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> >   && (TREE_CODE (type) == BOOLEAN_TYPE
> >   || !can_duplicate_and_interleave_p (vinfo, stmts.length 
> > (),
> >   type)))
> > {
> >   if (dump_enabled_p ())
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >  "Build SLP failed: invalid type of def "
> >  "for variable-length SLP %T\n", oprnd);
> >   return -1;
> > }
> > 
> > Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this 
> > condition, then SLP failed:
> > Build SLP failed: invalid type of def
>  
> I think this can be restricted to vect_external_def, but some history
> might reveal the cases we put this code in for (we should be able to
> materialize all constants?).  At least uniform boolean constants
> should be fine.
> >
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-12 17:44
> > To: ???
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Thu, 12 Oct 2023, ??? wrote:
> >  
> > > Thanks, Richi, for pointing it out.
> > > 
> > > I found this patch can't make a conditional gather load succeed on SLP.
> > > 
> > > I am considering changing MASK_LEN_GATHER_LOAD in pattern recognition:
> > > 
> > > If there is no condition mask, in tree-vect-patterns.cc I build
> > > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments, same as
> > > GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can reuse the GATHER_LOAD SLP
> > > flow naturally.
> > > 
> > > If there is a condition mask, in tree-vect-patterns.cc I build
> > > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments,
> > > same as MASK_GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can reuse the MASK_GATHER_LOAD
> > > SLP flow naturally.
> > > 
> > > Is it reasonable?
> >  
> > What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> > even when the mask is -1?
> >  
> > > 
> > > juzhe.zh...@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2023-10-11 20:50
> > > To: Juzhe-Zhong
> > > CC: gcc-patches; richard.sandiford
> > > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> > >  
> > > > This patch fixes this following FAILs in RISC-V regression:
> > > > 
> > > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  
> > > > scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains 
> > > > only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  
> > > > scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains 
> > > > only SLP stmts"
> > > > 
> > > > The root cause of these FAIL is that GCC SLP failed on 
> > > > MASK_LEN_GATHER_LOAD.
> > > > 
> > > > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in 
> > > > tree-vect-patterns.cc if it is the same
> > > > situation as GATHER_LOAD (no conditi

Re: [PATCH] PR target/111778 - Fix undefined shifts in PowerPC compiler

2023-10-12 Thread David Edelsohn
On Thu, Oct 12, 2023 at 4:24 AM Michael Meissner 
wrote:

> I was building a cross compiler to PowerPC on my x86_86 workstation with
> the
> latest version of GCC on October 11th.  I could not build the compiler on
> the
> x86_64 system as it died in building libgcc.  I looked into it, and I
> discovered the compiler was recursing until it ran out of stack space.  If
> I
> build a native compiler with the same sources on a PowerPC system, it
> builds
> fine.
>
> I traced this down to a change made around October 10th:
>
> | commit 8f1a70a4fbcc6441c70da60d4ef6db1e5635e18a (HEAD)
> | Author: Jiufu Guo 
> | Date:   Tue Jan 10 20:52:33 2023 +0800
> |
> |   rs6000: build constant via li/lis;rldicl/rldicr
> |
> |   If a constant is possible left/right cleaned on a rotated value from
> |   a negative value of "li/lis".  Then, using "li/lis ; rldicl/rldicr"
> |   to build the constant.
>
> The code was doing a -1 << 64 which is undefined behavior because different
> machines produce different results.  On the x86_64 system, (-1 << 64)
> produces
> -1 while on a PowerPC 64-bit system, (-1 << 64) produces 0.  The x86_64
> then
> recurses until the stack runs out of space.
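The failure mode described above is easy to reproduce in plain C: shifting by a count equal to (or larger than) the operand width is undefined, and real machines disagree — x86-64 masks the shift count to 6 bits, PowerPC does not. The sketch below shows the guarded-shift pattern the patch adopts; it is a minimal illustration, not the actual rs6000 code:

```c
#include <stdint.h>

/* Width of the host-wide-int type in the discussion (64 bits here).  */
#define HWI_BITS 64

/* Guarded left shift of an all-ones value: C leaves "x << n" undefined
   for n >= width, and x86-64 (count masked mod 64) and PowerPC (count
   not masked) really do compute different answers, -1 vs. 0.  Returning
   a well-defined value for the out-of-range case mirrors the early
   returns the patch adds before each shift.  */
static int64_t
safe_shl_m1 (unsigned n)
{
  if (n >= HWI_BITS)
    return 0;                       /* a full-width shift clears every bit */
  return (int64_t) ((uint64_t) -1 << n);
}
```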
>
> If I apply this patch, the compiler builds fine on both x86_64 as a PowerPC
> cross compiler and on a native PowerPC system.
>
> Can I check this into the master branch to fix the problem?
>

Thanks for finding and debugging this problem.  This is okay.

Thanks, David


>
> 2023-10-12  Michael Meissner  
>
> gcc/
>
> PR target/111778
> * config/rs6000/rs6000.cc (can_be_built_by_li_lis_and_rldicl):
> Protect
> code from shifts that are undefined.
> (can_be_built_by_li_lis_and_rldicr): Likewise.
> (can_be_built_by_li_and_rldic): Protect code from shifts that are
> undefined.  Also replace uses of 1ULL with HOST_WIDE_INT_1U.
>
> ---
>  gcc/config/rs6000/rs6000.cc | 29 ++---
>  1 file changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 2828f01413c..cc24dd5301e 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -10370,6 +10370,11 @@ can_be_built_by_li_lis_and_rldicl (HOST_WIDE_INT
> c, int *shift,
>/* Leading zeros may be cleaned by rldicl with a mask.  Change leading
> zeros
>   to ones and then recheck it.  */
>int lz = clz_hwi (c);
> +
> +  /* If lz == 0, the left shift is undefined.  */
> +  if (!lz)
> +return false;
> +
>HOST_WIDE_INT unmask_c
>  = c | (HOST_WIDE_INT_M1U << (HOST_BITS_PER_WIDE_INT - lz));
>int n;
> @@ -10398,6 +10403,11 @@ can_be_built_by_li_lis_and_rldicr (HOST_WIDE_INT
> c, int *shift,
>/* Tailing zeros may be cleaned by rldicr with a mask.  Change tailing
> zeros
>   to ones and then recheck it.  */
>int tz = ctz_hwi (c);
> +
> +  /* If tz == HOST_BITS_PER_WIDE_INT, the left shift is undefined.  */
> +  if (tz >= HOST_BITS_PER_WIDE_INT)
> +return false;
> +
>HOST_WIDE_INT unmask_c = c | ((HOST_WIDE_INT_1U << tz) - 1);
>int n;
>if (can_be_rotated_to_lowbits (~unmask_c, 15, &n)
> @@ -10428,8 +10438,15 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c,
> int *shift, HOST_WIDE_INT *mask)
>   right bits are shifted as 0's, and left 1's(and x's) are cleaned.  */
>int tz = ctz_hwi (c);
>int lz = clz_hwi (c);
> +
> +  /* If lz == HOST_BITS_PER_WIDE_INT, the left shift is undefined.  */
> +  if (lz >= HOST_BITS_PER_WIDE_INT)
> +return false;
> +
>int middle_ones = clz_hwi (~(c << lz));
> -  if (tz + lz + middle_ones >= ones)
> +  if (tz + lz + middle_ones >= ones
> +  && (tz - lz) < HOST_BITS_PER_WIDE_INT
> +  && tz < HOST_BITS_PER_WIDE_INT)
>  {
>*mask = ((1LL << (HOST_BITS_PER_WIDE_INT - tz - lz)) - 1LL) << tz;
>*shift = tz;
> @@ -10440,7 +10457,8 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int
> *shift, HOST_WIDE_INT *mask)
>int leading_ones = clz_hwi (~c);
>int tailing_ones = ctz_hwi (~c);
>int middle_zeros = ctz_hwi (c >> tailing_ones);
> -  if (leading_ones + tailing_ones + middle_zeros >= ones)
> +  if (leading_ones + tailing_ones + middle_zeros >= ones
> +  && middle_zeros < HOST_BITS_PER_WIDE_INT)
>  {
>*mask = ~(((1ULL << middle_zeros) - 1ULL) << tailing_ones);
>*shift = tailing_ones + middle_zeros;
> @@ -10450,10 +10468,15 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c,
> int *shift, HOST_WIDE_INT *mask)
>/* xx1..1xx: --> xx0..01..1xx: some 1's(following x's) are cleaned. */
>/* Get the position for the first bit of successive 1.
>   The 24th bit would be in successive 0 or 1.  */
> -  HOST_WIDE_INT low_mask = (1LL << 24) - 1LL;
> +  HOST_WIDE_INT low_mask = (HOST_WIDE_INT_1U << 24) - HOST_WIDE_INT_1U;
>int pos_first_1 = ((c & (low_mask + 1)) == 0)
>   ? clz_hwi (c & low_mask)
>   : HOST_BITS_PER_WIDE_INT - ctz_hwi (~(c | low_mask));
> +
> +  /* Make s

Re: [PATCH-1v2, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-12 Thread Richard Sandiford
HAO CHEN GUI  writes:
> Hi,
>   Vector mode instructions are efficient on some targets (e.g. ppc64).
> This patch enables vector mode for compare_by_pieces. The non-member
> function widest_fixed_size_mode_for_size takes by_pieces_operation
> as the second argument and decides whether vector mode is enabled or
> not by the type of operation. Currently only set and compare enable
> vector mode and do the optab checking correspondingly.
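The selection logic being discussed — walk the candidate piece sizes and keep the widest one that still fits and passes the per-operation capability check — can be sketched abstractly like this. This is a hypothetical model with plain byte counts standing in for machine modes and stub predicates standing in for the optab queries; it is not the expr.cc code itself:

```c
#include <stdbool.h>

enum pieces_op { MOVE_PIECES, SET_PIECES, COMPARE_PIECES };

/* Stand-ins for the optab checks: pretend vector chunks up to 16 bytes
   can be duplicated (for memset) and moved + compared for equality.  */
static bool can_vec_duplicate (unsigned bytes) { return bytes <= 16; }
static bool can_vec_compare_eq (unsigned bytes) { return bytes <= 16; }

/* Pick the widest power-of-two piece narrower than SIZE bytes.  Widths
   above the integer-register size (8 here) count as "vector" and are
   allowed only for the operations that can actually use them, which is
   the distinction the patch introduces for COMPARE_BY_PIECES.  */
static unsigned
widest_piece_for_size (unsigned size, enum pieces_op op)
{
  unsigned best = 1;
  for (unsigned cand = 2; cand < size; cand *= 2)
    {
      bool ok = cand <= 8;          /* integer modes always work */
      if (!ok && op == SET_PIECES)
        ok = can_vec_duplicate (cand);
      else if (!ok && op == COMPARE_PIECES)
        ok = can_vec_compare_eq (cand);
      if (ok)
        best = cand;
    }
  return best;
}
```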
>
>   The test case is in the second patch which is rs6000 specific.
>
>   Compared to last version, the main change is to enable vector mode
> for compare_by_pieces in smallest_fixed_size_mode_for_size which
> is used for overlapping compare.
>
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions.
>
> Thanks
> Gui Haochen
>
> ChangeLog
> Expand: Enable vector mode for pieces compares
>
> Vector mode compare instructions are efficient for equality compare on
> rs6000. This patch refactors the code of the pieces operations to enable
> vector mode for compare.
>
> gcc/
>   PR target/111449
>   * expr.cc (widest_fixed_size_mode_for_size): Enable vector mode
>   for compare.  Replace the second argument with the type of pieces
>   operation.  Add optab checks for vector mode used in compare.
>   (by_pieces_ninsns): Pass the type of pieces operation to
>   widest_fixed_size_mode_for_size.
>   (class op_by_pieces_d): Define virtual function
>   widest_fixed_size_mode_for_size and optab_checking.
>   (op_by_pieces_d::op_by_pieces_d): Call outer function
>   widest_fixed_size_mode_for_size.
>   (op_by_pieces_d::get_usable_mode): Call class function
>   widest_fixed_size_mode_for_size.
>   (op_by_pieces_d::smallest_fixed_size_mode_for_size): Call
>   optab_checking for different types of operations.
>   (op_by_pieces_d::run): Call class function
>   widest_fixed_size_mode_for_size.
>   (class move_by_pieces_d): Declare function
>   widest_fixed_size_mode_for_size.
>   (move_by_pieces_d::widest_fixed_size_mode_for_size): Implement.
>   (class store_by_pieces_d): Declare function
>   widest_fixed_size_mode_for_size and optab_checking.
>   (store_by_pieces_d::optab_checking): Implement.
>   (store_by_pieces_d::widest_fixed_size_mode_for_size): Implement.
>   (can_store_by_pieces): Pass the type of pieces operation to
>   widest_fixed_size_mode_for_size.
>   (class compare_by_pieces_d): Declare function
>   widest_fixed_size_mode_for_size and optab_checking.
>   (compare_by_pieces_d::compare_by_pieces_d): Set m_qi_vector_mode
>   to true to enable vector mode.
>   (compare_by_pieces_d::widest_fixed_size_mode_for_size): Implement.
>   (compare_by_pieces_d::optab_checking): Implement.
>
> patch.diff
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 9a37bff1fdd..e83c0a378ed 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -992,8 +992,9 @@ alignment_for_piecewise_move (unsigned int max_pieces, 
> unsigned int align)
> that is narrower than SIZE bytes.  */
>
>  static fixed_size_mode
> -widest_fixed_size_mode_for_size (unsigned int size, bool qi_vector)
> +widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)

The comment above the function needs to be updated to describe the
new parameter.

>  {
> +  bool qi_vector = ((op == COMPARE_BY_PIECES) || op == SET_BY_PIECES);

Nit: redundant brackets around the first comparison.

>fixed_size_mode result = NARROWEST_INT_MODE;
>
>gcc_checking_assert (size > 1);
> @@ -1009,8 +1010,13 @@ widest_fixed_size_mode_for_size (unsigned int size, 
> bool qi_vector)
> {
>   if (GET_MODE_SIZE (candidate) >= size)
> break;
> - if (optab_handler (vec_duplicate_optab, candidate)
> - != CODE_FOR_nothing)
> + if ((op == SET_BY_PIECES
> +  && optab_handler (vec_duplicate_optab, candidate)
> +!= CODE_FOR_nothing)
> +  || (op == COMPARE_BY_PIECES
> +  && optab_handler (mov_optab, mode)
> + != CODE_FOR_nothing
> +  && can_compare_p (EQ, mode, ccp_jump)))
> result = candidate;
> }
>
> @@ -1061,8 +1067,7 @@ by_pieces_ninsns (unsigned HOST_WIDE_INT l, unsigned 
> int align,
>  {
>/* NB: Round up L and ALIGN to the widest integer mode for
>MAX_SIZE.  */
> -  mode = widest_fixed_size_mode_for_size (max_size,
> -   op == SET_BY_PIECES);
> +  mode = widest_fixed_size_mode_for_size (max_size, op);
>if (optab_handler (mov_optab, mode) != CODE_FOR_nothing)
>   {
> unsigned HOST_WIDE_INT up = ROUND_UP (l, GET_MODE_SIZE (mode));
> @@ -1076,8 +1081,7 @@ by_pieces_ninsns (unsigned HOST_WIDE_INT l, unsigned 
> int align,
>
>while (max_size > 1 && l > 0)
>  {
> -  mode = widest_fixed_size_mode_for_size (max_size,
> -  

Re: [PATCH] wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989]

2023-10-12 Thread Richard Sandiford
Jakub Jelinek  writes:
> On Thu, Oct 12, 2023 at 11:54:14AM +0100, Richard Sandiford wrote:
>> Jakub Jelinek  writes:
>> > @@ -2036,11 +2075,20 @@ wi::lrshift_large (HOST_WIDE_INT *val, c
>> >   unsigned int xlen, unsigned int xprecision,
>> >   unsigned int precision, unsigned int shift)
>> >  {
>> > -  unsigned int len = rshift_large_common (val, xval, xlen, xprecision, 
>> > shift);
>> > +  /* Work out how many blocks are needed to store the significant bits
>> > + (excluding the upper zeros or signs).  */
>> > +  unsigned int blocks_needed = BLOCKS_NEEDED (xprecision - shift);
>> > +  unsigned int len = blocks_needed;
>> > +  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)
>> > +  && len > xlen
>> > +  && xval[xlen - 1] >= 0)
>> > +len = xlen;
>> 
>> I think here too it would be worth dropping the:
>> 
>>   UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)
>> 
>> part of the condition, since presumably the change should be safe
>> regardless of that.
>
> If so, there is also one spot in lshift_large as well.  So incrementally:
>
> --- gcc/wide-int.cc   2023-10-11 14:41:23.719132402 +0200
> +++ gcc/wide-int.cc   2023-10-11 14:41:23.719132402 +0200
> @@ -2013,8 +2013,7 @@
>  
>/* The whole-block shift fills with zeros.  */
>unsigned int len = BLOCKS_NEEDED (precision);
> -  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS))
> -len = xlen + skip + 1;
> +  len = MIN (xlen + skip + 1, len);
>for (unsigned int i = 0; i < skip; ++i)
>  val[i] = 0;
>  
> @@ -2079,9 +2078,7 @@
>   (excluding the upper zeros or signs).  */
>unsigned int blocks_needed = BLOCKS_NEEDED (xprecision - shift);
>unsigned int len = blocks_needed;
> -  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)
> -  && len > xlen
> -  && xval[xlen - 1] >= 0)
> +  if (len > xlen && xval[xlen - 1] >= 0)
>  len = xlen;
>  
>rshift_large_common (val, xval, xlen, shift, len);
> @@ -2114,9 +2111,7 @@
>/* Work out how many blocks are needed to store the significant bits
>   (excluding the upper zeros or signs).  */
>unsigned int blocks_needed = BLOCKS_NEEDED (xprecision - shift);
> -  unsigned int len = blocks_needed;
> -  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS) && len > xlen)
> -len = xlen;
> +  unsigned int len = MIN (xlen, blocks_needed);
>  
>rshift_large_common (val, xval, xlen, shift, len);
>  
> which I'll test soon.

LGTM.

>> OK for thw wide-int parts with those changes.
>
> Thanks.  What do you think about that
> --- gcc/wide-int.h.jj 2023-10-11 12:05:47.718059477 +0200
> +++ gcc/wide-int.h2023-10-11 13:51:56.081552500 +0200
> @@ -1635,6 +1635,8 @@ widest_int_storage ::write_val (unsig
>u.valp = XNEWVEC (HOST_WIDE_INT, l);
>return u.valp;
>  }
> +  else if (CHECKING_P && l < WIDE_INT_MAX_INL_ELTS)
> +u.val[l] = HOST_WIDE_INT_UC (0xbaaddeadbeef);
>return u.val;
>  }
>  
> @@ -1650,6 +1652,9 @@ widest_int_storage ::set_len (unsigne
>memcpy (u.val, valp, l * sizeof (u.val[0]));
>XDELETEVEC (valp);
>  }
> +  else if (len && len < WIDE_INT_MAX_INL_ELTS)
> +gcc_checking_assert ((unsigned HOST_WIDE_INT) u.val[len]
> +  == HOST_WIDE_INT_UC (0xbaaddeadbeef));
>len = l;
>/* There are no excess bits in val[len - 1].  */
>STATIC_ASSERT (N % HOST_BITS_PER_WIDE_INT == 0);
>
> part, shall that go into trunk as well or is that too much slowdown
> for checking builds?

I don't have a good intuition about how big the slowdown will be,
but FWIW I agree with Richi that it'd be better to include the change.
We can always take it out again if it proves to be unexpectedly expensive.
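The 0xbaaddeadbeef check above is a classic poison-value technique: the slot just past the requested length is filled with a recognizable sentinel when the storage is handed out, and asserted untouched when the final length is committed, so an overrun is caught deterministically in checking builds. A stripped-down, generic C illustration (not the wide-int code; the caller is assumed to pass the same length to both calls):

```c
#include <assert.h>
#include <stdint.h>

#define POISON UINT64_C(0x0000baaddeadbeef)
#define NELTS  8

struct buf { uint64_t val[NELTS]; unsigned len; };

/* Hand out the storage with the slot just past the requested length
   poisoned, mirroring write_val in the patch.  */
static uint64_t *
buf_write (struct buf *b, unsigned l)
{
  if (l < NELTS)
    b->val[l] = POISON;
  return b->val;
}

/* When the final length is committed, verify the sentinel survived;
   a writer that overran its declared length trips this assert.  */
static void
buf_set_len (struct buf *b, unsigned l)
{
  if (l && l < NELTS)
    assert (b->val[l] == POISON);
  b->len = l;
}
```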

Thanks,
Richard


Re: [PATCH V2] Emit funcall external declarations only if actually used.

2023-10-12 Thread Jose E. Marchesi


Hi Richard.
Thanks for looking at this! :)


> "Jose E. Marchesi"  writes:
>> ping
>
> I don't know this code very well, and AFAIR haven't worked
> with an assembler that requires external declarations, but since
> it's at a second ping :)
>
>>
>>> ping
>>>
 [Differences from V1:
 - Prototype for call_from_call_insn moved before comment block.
 - Reuse the `call' flag for SYMBOL_REF_LIBCALL.
 - Fallback to check REG_CALL_DECL in non-direct calls.
 - New test to check correct behavior for non-direct calls.]

 There are many places in GCC where alternative local sequences are
 tried in order to determine what is the cheapest or best alternative
 to use in the current target.  When any of these sequences involves a
 libcall, the current implementation of emit_library_call_value_1
 introduces a side effect consisting of emitting an external declaration
 for the funcall (such as __divdi3) which is thus emitted even if the
 sequence that does the libcall is not retained.

 This is problematic in targets such as BPF, because the kernel loader
 chokes on the spurious symbol __divdi3 and makes the resulting BPF
 object unloadable.  Note that BPF objects are not linked before being
 loaded.

 This patch changes emit_library_call_value_1 to mark the target
 SYMBOL_REF as a libcall.  Then, the emission of the external
 declaration is done in the first loop of final.cc:shorten_branches.
 This happens only if the corresponding sequence has been kept.

 Regtested in x86_64-linux-gnu.
 Tested with host x86_64-linux-gnu with target bpf-unknown-none.
>
> I'm not sure that shorten_branches is a natural place to do this.
> It isn't something that would normally emit asm text.

Well, that was the approach suggested by another reviewer (Jakub) once
my initial approach (in the V1) got rejected.  He explicitly suggested
to use shorten_branches.

> Would it be OK to emit the declaration at the same point as for decls,
> which IIUC is process_pending_assemble_externals?  If so, how about
> making assemble_external_libcall add the symbol to a list when
> !SYMBOL_REF_USED, instead of calling targetm.asm_out.external_libcall
> directly?  assemble_external_libcall could then also call get_identifier
> on the name (perhaps after calling strip_name_encoding -- can't
> remember whether assemble_external_libcall sees the encoded or
> unencoded name).
>
> All being well, the call to get_identifier should cause
> assemble_name_resolve to record when the name is used, via
> TREE_SYMBOL_REFERENCED.  Then process_pending_assemble_externals could
> go through the list of libcalls recorded by assemble_external_libcall
> and check whether TREE_SYMBOL_REFERENCED is set on the get_identifier.
>
> Not super elegant, but it seems to fit within the existing scheme.
> And I don't think there should be any problem with using get_identifier
> for libcalls, since it isn't valid to use libcall names for other
> types of symbol.
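In outline, the scheme suggested here — record each libcall symbol when first seen, mark it referenced when its name is actually written out, and emit declarations only for the referenced ones at the end — looks roughly like this. This is an abstract C sketch with strings standing in for SYMBOL_REFs and a flag playing the role of TREE_SYMBOL_REFERENCED; it is not actual GCC code:

```c
#include <stdbool.h>
#include <string.h>

#define MAX_LIBCALLS 32

/* Stand-in for the identifier table: a name plus a "referenced" bit
   (the TREE_SYMBOL_REFERENCED analogue).  No overflow handling; this
   is a sketch, not production code.  */
static struct { const char *name; bool referenced; } pending[MAX_LIBCALLS];
static unsigned n_pending;

static unsigned
intern (const char *name)
{
  for (unsigned i = 0; i < n_pending; i++)
    if (strcmp (pending[i].name, name) == 0)
      return i;
  pending[n_pending].name = name;
  pending[n_pending].referenced = false;
  return n_pending++;
}

/* Called where assemble_external_libcall runs today: just record.  */
static void
note_libcall (const char *name) { intern (name); }

/* Called when a symbol's name is actually written to the asm file.  */
static void
note_use (const char *name) { pending[intern (name)].referenced = true; }

/* Called from process_pending_assemble_externals: emit (here, count)
   declarations only for libcalls whose name was really used.  */
static unsigned
emit_pending (void)
{
  unsigned emitted = 0;
  for (unsigned i = 0; i < n_pending; i++)
    if (pending[i].referenced)
      emitted++;
  return emitted;
}
```

The kept-sequence case calls both note_libcall and note_use; a discarded sequence calls only note_libcall, so no spurious declaration is emitted.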

This sounds way more complicated to me than the approach in V2, which
seems to work and is thus a clear improvement compared to the current
situation in the trunk.  The approach in V2 may be ugly, but it is
simple and easy to understand.  Is the proposed more convoluted
alternative really worth the extra complexity, given it is "not super
elegant"?

I am willing to give it a try if you insist on it, but I wouldn't want a
V3 series based on that approach to be deflected again on the basis of
yet another potentially more elegant solution... that won't converge,
and I need this fixed for the BPF backend.

>
> Thanks,
> Richard
>

 gcc/ChangeLog

* rtl.h (SYMBOL_REF_LIBCALL): Define.
* calls.cc (emit_library_call_value_1): Do not emit external
libcall declaration here.
* final.cc (shorten_branches): Do it here.

 gcc/testsuite/ChangeLog

* gcc.target/bpf/divmod-libcall-1.c: New test.
* gcc.target/bpf/divmod-libcall-2.c: Likewise.
* gcc.c-torture/compile/libcall-2.c: Likewise.
 ---
  gcc/calls.cc  |  9 +++---
  gcc/final.cc  | 30 +++
  gcc/rtl.h |  5 
  .../gcc.c-torture/compile/libcall-2.c |  8 +
  .../gcc.target/bpf/divmod-libcall-1.c | 19 
  .../gcc.target/bpf/divmod-libcall-2.c | 16 ++
  6 files changed, 83 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/gcc.c-torture/compile/libcall-2.c
  create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
  create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c

 diff --git a/gcc/calls.cc b/gcc/calls.cc
 index 1f3a6d5c450..219ea599b16 100644
 --- a/gcc/calls.cc
 +++ b/gcc/calls.cc
 @@ -4388,9 +4388,10 @@ emit_library_call_value_1 (int retval

Re: [PATCH] tree-optimization/111779 - Handle some BIT_FIELD_REFs in SRA

2023-10-12 Thread Richard Biener
On Thu, 12 Oct 2023, Richard Biener wrote:

> The following handles byte-aligned, power-of-two and byte-multiple
> sized BIT_FIELD_REF reads in SRA.  In particular this should cover
> BIT_FIELD_REFs created by optimize_bit_field_compare.
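The "byte-aligned, power-of-two and byte-multiple sized" condition reduces to a small predicate on the read's bit offset and bit size — something an ordinary 1/2/4/8-byte scalar access could express. A standalone rendering of that condition (illustrative only; the real sra_handled_bf_read_p operates on poly_uint64 values):

```c
#include <stdbool.h>
#include <stdint.h>

#define BITS_PER_UNIT 8

/* Accept a BIT_FIELD_REF read when it starts on a byte boundary and
   covers a power-of-two number of whole bytes.  */
static bool
handled_bf_read_p (uint64_t size_bits, uint64_t offset_bits)
{
  if (size_bits == 0
      || size_bits % BITS_PER_UNIT != 0
      || offset_bits % BITS_PER_UNIT != 0)
    return false;
  uint64_t bytes = size_bits / BITS_PER_UNIT;
  return (bytes & (bytes - 1)) == 0;   /* power of two */
}
```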
> 
> For gcc.dg/tree-ssa/ssa-dse-26.c we now SRA the BIT_FIELD_REF
> appearing there leading to more DSE, fully eliding the aggregates.
> 
> This results in the same false positive -Wuninitialized as the
> older attempt to remove the folding from optimize_bit_field_compare,
> fixed by initializing part of the aggregate unconditionally.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu for all languages.
> 
> Martin is on leave so I'll push this tomorrow unless the Fortran
> folks have objections.

Err, and I forgot that hunk.  It's

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 7beefa2e69c..1b8be081a17 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -12015,7 +12015,10 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * 
expr2, bool init_flag,
 && !is_runtime_conformable (expr1, expr2);
 
   /* Only analyze the expressions for coarray properties, when in coarray-lib
- mode.  */
+ mode.  Avoid false-positive uninitialized diagnostics with initializing
+ the codimension flag unconditionally.  */
+  lhs_caf_attr.codimension = false;
+  rhs_caf_attr.codimension = false;
   if (flag_coarray == GFC_FCOARRAY_LIB)
 {
   lhs_caf_attr = gfc_caf_attr (expr1, false, &lhs_refs_comp);


> Thanks,
> Richard.
> 
>   PR tree-optimization/111779
> gcc/
>   * tree-sra.cc (sra_handled_bf_read_p): New function.
>   (build_access_from_expr_1): Handle some BIT_FIELD_REFs.
>   (sra_modify_expr): Likewise.
>   (make_fancy_name_1): Skip over BIT_FIELD_REF.
> 
> gcc/fortran/
>   * trans-expr.cc (gfc_trans_assignment_1): Initialize
>   lhs_caf_attr and rhs_caf_attr codimension flag to avoid
>   false positive -Wuninitialized.
> 
> gcc/testsuite/
>   * gcc.dg/tree-ssa/ssa-dse-26.c: Adjust for more DSE.
>   * gcc.dg/vect/vect-pr111779.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c |  4 +-
>  gcc/testsuite/gcc.dg/vect/vect-pr111779.c  | 56 ++
>  gcc/tree-sra.cc| 24 --
>  3 files changed, 79 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-pr111779.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
> index e3c33f49ef6..43152de5616 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
> @@ -31,5 +31,5 @@ constraint_equal (struct constraint a, struct constraint b)
>  && constraint_expr_equal (a.rhs, b.rhs);
>  }
>  
> -/* { dg-final { scan-tree-dump-times "Deleted dead store: x = " 1 "dse1" } } 
> */
> -/* { dg-final { scan-tree-dump-times "Deleted dead store: y = " 1 "dse1" } } 
> */
> +/* { dg-final { scan-tree-dump-times "Deleted dead store: x = " 2 "dse1" } } 
> */
> +/* { dg-final { scan-tree-dump-times "Deleted dead store: y = " 2 "dse1" } } 
> */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-pr111779.c 
> b/gcc/testsuite/gcc.dg/vect/vect-pr111779.c
> new file mode 100644
> index 000..79b72aebc78
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-pr111779.c
> @@ -0,0 +1,56 @@
> +#include 
> +#include "tree-vect.h"
> +
> +struct C
> +{
> +int c;
> +int d;
> +bool f :1;
> +float e;
> +};
> +
> +struct A
> +{
> +  unsigned int a;
> +  unsigned char c1, c2;
> +  bool b1 : 1;
> +  bool b2 : 1;
> +  bool b3 : 1;
> +  struct C b4;
> +};
> +
> +void __attribute__((noipa))
> +foo (const struct A * __restrict x, int y)
> +{
> +  int s = 0, i = 0;
> +  for (i = 0; i < y; ++i)
> +{
> +  const struct A a = x[i];
> +  s += a.b4.f ? 1 : 0;
> +}
> +  if (s != 0)
> +__builtin_abort ();
> +}
> +
> +int
> +main ()
> +{
> +  struct A x[100];
> +  int i;
> +
> +  check_vect ();
> +
> +  __builtin_memset (x, -1, sizeof (x));
> +#pragma GCC novect
> +  for (i = 0; i < 100; i++)
> +{
> +  x[i].b1 = false;
> +  x[i].b2 = false;
> +  x[i].b3 = false;
> +  x[i].b4.f = false;
> +}
> +  foo (x, 100);
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_int } 
> } } */
> diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
> index 56a8ba26135..24d0c20da6a 100644
> --- a/gcc/tree-sra.cc
> +++ b/gcc/tree-sra.cc
> @@ -1113,6 +1113,21 @@ disqualify_base_of_expr (tree t, const char *reason)
>  disqualify_candidate (t, reason);
>  }
>  
> +/* Return true if the BIT_FIELD_REF read EXPR is handled by SRA.  */
> +
> +static bool
> +sra_handled_bf_read_p (tree expr)
> +{
> +  uint64_t size, offset;
> +  if (bit_field_size (expr).is_constant (&size)
> +  && bit_field_offset (expr).is_constant (&offset)
> +  && size % BITS_PER_UNIT == 0
> +  && offset % B

Re: [PATCH] tree-optimization/111779 - Handle some BIT_FIELD_REFs in SRA

2023-10-12 Thread Andre Vehreschild
Hi Richard,

being the one who wrote the surrounding code:
The Fortran part looks good to me.

Ok for merge from the fortran side.

- Andre

On Thu, 12 Oct 2023 11:44:01 + (UTC)
Richard Biener  wrote:

> On Thu, 12 Oct 2023, Richard Biener wrote:
>
> > The following handles byte-aligned, power-of-two and byte-multiple
> > sized BIT_FIELD_REF reads in SRA.  In particular this should cover
> > BIT_FIELD_REFs created by optimize_bit_field_compare.
> >
> > For gcc.dg/tree-ssa/ssa-dse-26.c we now SRA the BIT_FIELD_REF
> > appearing there leading to more DSE, fully eliding the aggregates.
> >
> > This results in the same false positive -Wuninitialized as the
> > older attempt to remove the folding from optimize_bit_field_compare,
> > fixed by initializing part of the aggregate unconditionally.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu for all languages.
> >
> > Martin is on leave so I'll push this tomorrow unless the Fortran
> > folks have objections.
>
> Err, and I forgot that hunk.  It's
>
> diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
> index 7beefa2e69c..1b8be081a17 100644
> --- a/gcc/fortran/trans-expr.cc
> +++ b/gcc/fortran/trans-expr.cc
> @@ -12015,7 +12015,10 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr *
> expr2, bool init_flag, && !is_runtime_conformable (expr1, expr2);
>
>/* Only analyze the expressions for coarray properties, when in coarray-lib
> - mode.  */
> + mode.  Avoid false-positive uninitialized diagnostics with initializing
> + the codimension flag unconditionally.  */
> +  lhs_caf_attr.codimension = false;
> +  rhs_caf_attr.codimension = false;
>if (flag_coarray == GFC_FCOARRAY_LIB)
>  {
>lhs_caf_attr = gfc_caf_attr (expr1, false, &lhs_refs_comp);
>
>
> > Thanks,
> > Richard.
> >
> > PR tree-optimization/111779
> > gcc/
> > * tree-sra.cc (sra_handled_bf_read_p): New function.
> > (build_access_from_expr_1): Handle some BIT_FIELD_REFs.
> > (sra_modify_expr): Likewise.
> > (make_fancy_name_1): Skip over BIT_FIELD_REF.
> >
> > gcc/fortran/
> > * trans-expr.cc (gfc_trans_assignment_1): Initialize
> > lhs_caf_attr and rhs_caf_attr codimension flag to avoid
> > false positive -Wuninitialized.
> >
> > gcc/testsuite/
> > * gcc.dg/tree-ssa/ssa-dse-26.c: Adjust for more DSE.
> > * gcc.dg/vect/vect-pr111779.c: New testcase.
> > ---
> >  gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c |  4 +-
> >  gcc/testsuite/gcc.dg/vect/vect-pr111779.c  | 56 ++
> >  gcc/tree-sra.cc| 24 --
> >  3 files changed, 79 insertions(+), 5 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-pr111779.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c index e3c33f49ef6..43152de5616
> > 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
> > @@ -31,5 +31,5 @@ constraint_equal (struct constraint a, struct constraint
> > b) && constraint_expr_equal (a.rhs, b.rhs);
> >  }
> >
> > -/* { dg-final { scan-tree-dump-times "Deleted dead store: x = " 1 "dse1" }
> > } */ -/* { dg-final { scan-tree-dump-times "Deleted dead store: y = " 1
> > "dse1" } } */ +/* { dg-final { scan-tree-dump-times "Deleted dead store: x
> > = " 2 "dse1" } } */ +/* { dg-final { scan-tree-dump-times "Deleted dead
> > store: y = " 2 "dse1" } } */ diff --git
> > a/gcc/testsuite/gcc.dg/vect/vect-pr111779.c
> > b/gcc/testsuite/gcc.dg/vect/vect-pr111779.c new file mode 100644 index
> > 000..79b72aebc78 --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-pr111779.c
> > @@ -0,0 +1,56 @@
> > +#include 
> > +#include "tree-vect.h"
> > +
> > +struct C
> > +{
> > +int c;
> > +int d;
> > +bool f :1;
> > +float e;
> > +};
> > +
> > +struct A
> > +{
> > +  unsigned int a;
> > +  unsigned char c1, c2;
> > +  bool b1 : 1;
> > +  bool b2 : 1;
> > +  bool b3 : 1;
> > +  struct C b4;
> > +};
> > +
> > +void __attribute__((noipa))
> > +foo (const struct A * __restrict x, int y)
> > +{
> > +  int s = 0, i = 0;
> > +  for (i = 0; i < y; ++i)
> > +{
> > +  const struct A a = x[i];
> > +  s += a.b4.f ? 1 : 0;
> > +}
> > +  if (s != 0)
> > +__builtin_abort ();
> > +}
> > +
> > +int
> > +main ()
> > +{
> > +  struct A x[100];
> > +  int i;
> > +
> > +  check_vect ();
> > +
> > +  __builtin_memset (x, -1, sizeof (x));
> > +#pragma GCC novect
> > +  for (i = 0; i < 100; i++)
> > +{
> > +  x[i].b1 = false;
> > +  x[i].b2 = false;
> > +  x[i].b3 = false;
> > +  x[i].b4.f = false;
> > +}
> > +  foo (x, 100);
> > +  return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_int
> > } } } */ diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
> > index 56a8ba26135..24d0c20da6a 100644
> > --- a/gcc/tree-sra.cc
> > +++ b/gcc/tree-sra.cc
> > @@ -1113,6 +1113,21 @@ disqualify_base_of_expr 

Re: [PATCH] libstdc++: Fix tr1/8_c_compatibility/cstdio/functions.cc regression with recent glibc

2023-10-12 Thread Jonathan Wakely
On Thursday, 12 October 2023, Jakub Jelinek  wrote:
> Hi!
>
> The following testcase started FAILing recently after the
>
https://sourceware.org/git/?p=glibc.git;a=commit;h=64b1a44183a3094672ed304532bedb9acc707554
> glibc change which marked vfscanf with nonnull (1) attribute.
> While vfwscanf hasn't been marked similarly (strangely), the patch changes
> that too.  By using va_arg one hides the value of it from the compiler
> (volatile keyword would do too, or making the FILE* stream a function
> argument, but then it might need to be guarded by #if or something).
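The trick being applied — routing a constant null through something the compiler cannot see through, so a nonnull-attributed call site stops triggering a diagnostic — can be shown in miniature. This is a generic C illustration of the technique, not the testsuite code; `take` and `launder` are made-up names:

```c
#include <stddef.h>

/* A function whose first argument is declared nonnull, like glibc's
   vfscanf after the change discussed above.  */
__attribute__ ((nonnull (1))) static int
take (const char *p) { return p != NULL; }

/* Passing a literal NULL straight to take() would trip -Wnonnull at
   compile time.  Copying the pointer through a volatile object hides
   its value from the compiler, much as reading it via va_arg does in
   the fixed testcase.  */
static const char *
launder (const char *p)
{
  volatile const char *v = p;
  return (const char *) v;
}
```

With `take (launder (0))` the diagnostic disappears, because the argument is no longer a visible compile-time constant; of course, actually calling with null remains a contract violation at run time.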
>
> Tested on x86_64-linux, ok for trunk?

OK, thanks.

>
> 2023-10-12  Jakub Jelinek  
>
> * testsuite/tr1/8_c_compatibility/cstdio/functions.cc (test01):
> Initialize stream to va_arg(ap, FILE*) rather than 0.
> * testsuite/tr1/8_c_compatibility/cwchar/functions.cc (test01):
> Likewise.
>
> --- libstdc++-v3/testsuite/tr1/8_c_compatibility/cstdio/functions.cc.jj
2023-01-16 23:19:06.651711546 +0100
> +++ libstdc++-v3/testsuite/tr1/8_c_compatibility/cstdio/functions.cc
2023-10-12 09:46:28.695011763 +0200
> @@ -35,7 +35,7 @@ void test01(int dummy, ...)
>char* s = 0;
>const char* cs = 0;
>const char* format = "%i";
> -  FILE* stream = 0;
> +  FILE* stream = va_arg(ap, FILE*);
>std::size_t n = 0;
>
>int ret;
> --- libstdc++-v3/testsuite/tr1/8_c_compatibility/cwchar/functions.cc.jj
2023-01-16 23:19:06.651711546 +0100
> +++ libstdc++-v3/testsuite/tr1/8_c_compatibility/cwchar/functions.cc
2023-10-12 09:46:19.236141897 +0200
> @@ -42,7 +42,7 @@ void test01(int dummy, ...)
>  #endif
>
>  #if _GLIBCXX_HAVE_VFWSCANF
> -  FILE* stream = 0;
> +  FILE* stream = va_arg(arg, FILE*);
>const wchar_t* format1 = 0;
>int ret1;
>ret1 = std::tr1::vfwscanf(stream, format1, arg);
>
> Jakub
>
>


[PATCH v9] Improve code sinking pass

2023-10-12 Thread Ajit Agarwal
This patch improves the code sinking pass to sink statements before calls to
reduce register pressure.
Review comments are incorporated. The patch is synced with the latest sources
and the code changes are modified accordingly.

For example :

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;
  l = a + b + c + d +e + f;
  if (a != 5)
{
  bar();
  j = l;
}
}

Code Sinking does the following:

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;

  if (a != 5)
{
  l = a + b + c + d +e + f;
  bar();
  j = l;
}
}

Bootstrapped regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit

tree-ssa-sink: Improve code sinking pass

Currently, code sinking will sink code after function calls.  This increases
register pressure for callee-saved registers.  The following patch improves
code sinking by placing the sunk code before calls in the use block or in
the immediate dominator of the use blocks.
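One detail of the patch below is how the sink-frequency threshold is chosen: statements with memory operands get a 7-point bump, saturated at 100. As a standalone function (an illustrative extraction of the hunk; the GCC parameter it reads is param_sink_frequency_threshold):

```c
/* Compute the percentage threshold used when deciding whether sinking
   a statement to a less frequently executed block is worthwhile.
   Per the patch, statements with memory operands get their threshold
   raised by 7 points, clamped at 100.  */
static int
sink_threshold (int param_threshold, int has_memory_ops)
{
  int threshold = param_threshold;
  if (has_memory_ops)
    {
      threshold += 7;
      if (threshold > 100)
        threshold = 100;
    }
  return threshold;
}
```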

2023-10-12  Ajit Kumar Agarwal  

gcc/ChangeLog:

PR tree-optimization/81953
* tree-ssa-sink.cc (statement_sink_location): Move statements before
calls.
(select_best_block): Add heuristics to select the best blocks in the
immediate post dominator.

gcc/testsuite/ChangeLog:

PR tree-optimization/81953
* gcc.dg/tree-ssa/ssa-sink-20.c: New test.
* gcc.dg/tree-ssa/ssa-sink-21.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 ++
 gcc/tree-ssa-sink.cc| 39 -
 3 files changed, 56 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
new file mode 100644
index 000..d3b79ca5803
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump 
{l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
new file mode 100644
index 000..84e7938c54f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j, x;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d + e + f;
+  if (a != 5)
+{
+  bar();
+  if (b != 3)
+x = 3;
+  else
+x = 5;
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index a360c5cdd6e..95298bc8402 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -174,7 +174,8 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
 
 /* Given EARLY_BB and LATE_BB, two blocks in a path through the dominator
tree, return the best basic block between them (inclusive) to place
-   statements.
+   statements.  The best basic block should be an immediate dominator of
+   the use block if the use stmt appears after a call.
 
We want the most control dependent block in the shallowest loop nest.
 
@@ -196,6 +197,16 @@ select_best_block (basic_block early_bb,
   basic_block best_bb = late_bb;
   basic_block temp_bb = late_bb;
   int threshold;
+  /* Get the sinking threshold.  If the statement to be moved has memory
+ operands, then increase the threshold by 7% as those are even more
+ profitable to avoid, clamping at 100%.  */
+  threshold = param_sink_frequency_threshold;
+  if (gimple_vuse (stmt) || gimple_vdef (stmt))
+{
+  threshold += 7;
+  if (threshold > 100)
+   threshold = 100;
+}
 
   while (temp_bb != early_bb)
 {
@@ -204,6 +215,14 @@ select_best_block (basic_block early_bb,
   if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
best_bb = temp_bb;
 
+  /* If temp_bb is post-dominated by the use block, the immediate
+     dominator would be our best block.  */
+  if (!gimple_vuse (stmt)
+ && bb_loop_depth (temp_bb) == bb_loop_depth (early_bb)
+ && !(temp_bb->count * 100 >= early_bb->count * threshold)
+ && dominated_by_p (CDI_DOMINATORS, late_bb, temp_bb))
+   best_bb = temp_bb;
+
   /* Walk up the dominator tree, hopefully we'll find a shallower
 loop nest.  */
   temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
@@ -233,17 +252,6 @@ select_best_block (basic_block early_bb,
   && !dominated_b

[PATCH v9] tree-ssa-sink: Improve code sinking pass

2023-10-12 Thread Ajit Agarwal
This patch improves the code sinking pass to sink statements before calls
to reduce register pressure.
Review comments are incorporated.
Synced with latest trunk sources and modified the sinking pass accordingly.

For example :

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;
  l = a + b + c + d + e + f;
  if (a != 5)
{
  bar();
  j = l;
}
}

Code Sinking does the following:

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;

  if (a != 5)
{
      l = a + b + c + d + e + f;
  bar();
  j = l;
}
}

Bootstrapped regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit

tree-ssa-sink: Improve code sinking pass

Currently, code sinking will sink code after function calls.  This increases
register pressure for callee-saved registers.  The following patch improves
code sinking by placing the sunk code before calls in the use block or in
the immediate dominator of the use blocks.

2023-10-12  Ajit Kumar Agarwal  

gcc/ChangeLog:

PR tree-optimization/81953
* tree-ssa-sink.cc (statement_sink_location): Move statements before
calls.
(select_best_block): Add heuristics to select the best blocks in the
immediate post dominator.

gcc/testsuite/ChangeLog:

PR tree-optimization/81953
* gcc.dg/tree-ssa/ssa-sink-21.c: New test.
* gcc.dg/tree-ssa/ssa-sink-22.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 ++
 gcc/tree-ssa-sink.cc| 39 -
 3 files changed, 56 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
new file mode 100644
index 000..d3b79ca5803
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d + e + f;
+  if (a != 5)
+{
+  bar();
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
new file mode 100644
index 000..84e7938c54f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j, x;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d + e + f;
+  if (a != 5)
+{
+  bar();
+  if (b != 3)
+x = 3;
+  else
+x = 5;
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index a360c5cdd6e..95298bc8402 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -174,7 +174,8 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
 
 /* Given EARLY_BB and LATE_BB, two blocks in a path through the dominator
tree, return the best basic block between them (inclusive) to place
-   statements.
+   statements.  The best basic block should be an immediate dominator of
+   the use block if the use stmt appears after a call.
 
We want the most control dependent block in the shallowest loop nest.
 
@@ -196,6 +197,16 @@ select_best_block (basic_block early_bb,
   basic_block best_bb = late_bb;
   basic_block temp_bb = late_bb;
   int threshold;
+  /* Get the sinking threshold.  If the statement to be moved has memory
+ operands, then increase the threshold by 7% as those are even more
+ profitable to avoid, clamping at 100%.  */
+  threshold = param_sink_frequency_threshold;
+  if (gimple_vuse (stmt) || gimple_vdef (stmt))
+{
+  threshold += 7;
+  if (threshold > 100)
+   threshold = 100;
+}
 
   while (temp_bb != early_bb)
 {
@@ -204,6 +215,14 @@ select_best_block (basic_block early_bb,
   if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
best_bb = temp_bb;
 
+  /* If temp_bb is post-dominated by the use block, the immediate
+     dominator would be our best block.  */
+  if (!gimple_vuse (stmt)
+ && bb_loop_depth (temp_bb) == bb_loop_depth (early_bb)
+ && !(temp_bb->count * 100 >= early_bb->count * threshold)
+ && dominated_by_p (CDI_DOMINATORS, late_bb, temp_bb))
+   best_bb = temp_bb;
+
   /* Walk up the dominator tree, hopefully we'll find a shallower
 loop nest.  */
   temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
@@ -233,17 +252,6 @@ select_best_block (basic_block early_bb,
   && !domina

Re: [PATCH V2] Emit funcall external declarations only if actually used.

2023-10-12 Thread Richard Sandiford
"Jose E. Marchesi"  writes:
> Hi Richard.
> Thanks for looking at this! :)
>
>
>> "Jose E. Marchesi"  writes:
>>> ping
>>
>> I don't know this code very well, and AFAIR haven't worked
>> with an assembler that requires external declarations, but since
>> it's at a second ping :)
>>
>>>
 ping

> [Differences from V1:
> - Prototype for call_from_call_insn moved before comment block.
> - Reuse the `call' flag for SYMBOL_REF_LIBCALL.
> - Fallback to check REG_CALL_DECL in non-direct calls.
> - New test to check correct behavior for non-direct calls.]
>
> There are many places in GCC where alternative local sequences are
> tried in order to determine what is the cheapest or best alternative
> to use in the current target.  When any of these sequences involve a
> libcall, the current implementation of emit_library_call_value_1
> introduces a side-effect consisting of emitting an external declaration
> for the funcall (such as __divdi3) which is thus emitted even if the
> sequence that does the libcall is not retained.
>
> This is problematic in targets such as BPF, because the kernel loader
> chokes on the spurious symbol __divdi3 and makes the resulting BPF
> object unloadable.  Note that BPF objects are not linked before being
> loaded.
>
> This patch changes emit_library_call_value_1 to mark the target
> SYMBOL_REF as a libcall.  Then, the emission of the external
> declaration is done in the first loop of final.cc:shorten_branches.
> This happens only if the corresponding sequence has been kept.
>
> Regtested in x86_64-linux-gnu.
> Tested with host x86_64-linux-gnu with target bpf-unknown-none.
>>
>> I'm not sure that shorten_branches is a natural place to do this.
>> It isn't something that would normally emit asm text.
>
> Well, that was the approach suggested by another reviewer (Jakub) once
> my initial approach (in the V1) got rejected.  He explicitly suggested
> to use shorten_branches.
>
>> Would it be OK to emit the declaration at the same point as for decls,
>> which IIUC is process_pending_assemble_externals?  If so, how about
>> making assemble_external_libcall add the symbol to a list when
>> !SYMBOL_REF_USED, instead of calling targetm.asm_out.external_libcall
>> directly?  assemble_external_libcall could then also call get_identifier
>> on the name (perhaps after calling strip_name_encoding -- can't
>> remember whether assemble_external_libcall sees the encoded or
>> unencoded name).
>>
>> All being well, the call to get_identifier should cause
>> assemble_name_resolve to record when the name is used, via
>> TREE_SYMBOL_REFERENCED.  Then process_pending_assemble_externals could
>> go through the list of libcalls recorded by assemble_external_libcall
>> and check whether TREE_SYMBOL_REFERENCED is set on the get_identifier.
>>
>> Not super elegant, but it seems to fit within the existing scheme.
>> And I don't think there should be any problem with using get_identifier
>> for libcalls, since it isn't valid to use libcall names for other
>> types of symbol.
>
> This sounds way more complicated to me than the approach in V2, which
> seems to work and is thus a clear improvement compared to the current
> situation in the trunk.  The approach in V2 may be ugly, but it is
> simple and easy to understand.  Is the proposed more convoluted
> alternative really worth the extra complexity, given it is "not super
> elegant"?

Is it really that much more convoluted?  I was thinking of something
like the attached, which seems a bit shorter than V2, and does seem
to fix the bpf tests.

I think most (all?) libcalls already have an associated decl due to
optabs-libfuncs.cc, so an alternative to get_identifier would be to
set the SYMBOL_REF_DECL.  Using get_identifier seems a bit more
lightweight though.

Richard


diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index b0eff17b8b5..073e3eb2579 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -2461,6 +2461,10 @@ contains_pointers_p (tree type)
it all the way to final.  See PR 17982 for further discussion.  */
 static GTY(()) tree pending_assemble_externals;
 
+/* A similar list of pending libcall symbols.  We only want to declare
+   symbols that are actually used in the final assembly.  */
+static GTY(()) rtx pending_libcall_symbols;
+
 #ifdef ASM_OUTPUT_EXTERNAL
 /* Some targets delay some output to final using TARGET_ASM_FILE_END.
As a result, assemble_external can be called after the list of externals
@@ -2516,12 +2520,20 @@ void
 process_pending_assemble_externals (void)
 {
 #ifdef ASM_OUTPUT_EXTERNAL
-  tree list;
-  for (list = pending_assemble_externals; list; list = TREE_CHAIN (list))
+  for (tree list = pending_assemble_externals; list; list = TREE_CHAIN (list))
 assemble_external_real (TREE_VALUE (list));
 
+  for (rtx list = pending_libcall_symbols; list; list = XEXP (list, 1))
+{
+  rtx symbol = XEXP (list, 0);
+  tree id = 

[PATCH v2] RISC-V: Make xtheadcondmov-indirect tests robust against instruction reordering

2023-10-12 Thread Christoph Muellner
From: Christoph Müllner 

Fixes: c1bc7513b1d7 ("RISC-V: const: hide mvconst splitter from IRA")

A recent change broke the xtheadcondmov-indirect tests, because the order of
emitted instructions changed. Since the test is too strict when testing for
a fixed instruction order, let's change the tests to simply count instruction,
like it is done for similar tests.

Reported-by: Patrick O'Neill 
Signed-off-by: Christoph Müllner 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadcondmov-indirect.c: Make robust against
instruction reordering.

Signed-off-by: Christoph Müllner 
---
 .../gcc.target/riscv/xtheadcondmov-indirect.c | 89 ++-
 1 file changed, 29 insertions(+), 60 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect.c b/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect.c
index c3253ba5239..427c9c1a41e 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect.c
@@ -1,16 +1,11 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv32gc_xtheadcondmov -fno-sched-pressure" { target { rv32 } } } */
-/* { dg-options "-march=rv64gc_xtheadcondmov -fno-sched-pressure" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_xtheadcondmov" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_xtheadcondmov" { target { rv64 } } } */
 /* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
-/* { dg-final { check-function-bodies "**" "" } } */
 
-/*
-** ConEmv_imm_imm_reg:
-** addi a[0-9]+,a[0-9]+,-1000
-** li  a[0-9]+,10
-** th\.mvnez   a[0-9]+,a[0-9]+,a[0-9]+
-** ret
-*/
+/* addi aX, aX, -1000
+   li aX, 10
+   th.mvnez aX, aX, aX  */
 int ConEmv_imm_imm_reg(int x, int y)
 {
   if (x == 1000)
@@ -18,13 +13,8 @@ int ConEmv_imm_imm_reg(int x, int y)
   return y;
 }
 
-/*
-** ConEmv_imm_reg_reg:
-** addi a[0-9]+,a[0-9]+,-1000
-** th.mveqz a[0-9]+,a[0-9]+,a[0-9]+
-** mv  a[0-9]+,a[0-9]+
-** ret
-*/
+/* addiaX, aX, -1000
+   th.mveqz aX, aX, aX  */
 int ConEmv_imm_reg_reg(int x, int y, int z)
 {
   if (x == 1000)
@@ -32,13 +22,9 @@ int ConEmv_imm_reg_reg(int x, int y, int z)
   return z;
 }
 
-/*
-** ConEmv_reg_imm_reg:
-** sub a[0-9]+,a[0-9]+,a[0-9]+
-** li  a[0-9]+,10
-** th.mvnez a[0-9]+,a[0-9]+,a[0-9]+
-** ret
-*/
+/* sub aX, aX, aX
+   li aX, 10
+   th.mvnez aX, aX, aX  */
 int ConEmv_reg_imm_reg(int x, int y, int z)
 {
   if (x == y)
@@ -46,13 +32,8 @@ int ConEmv_reg_imm_reg(int x, int y, int z)
   return z;
 }
 
-/*
-** ConEmv_reg_reg_reg:
-** sub a[0-9]+,a[0-9]+,a[0-9]+
-** th.mveqz a[0-9]+,a[0-9]+,a[0-9]+
-** mv  a[0-9]+,a[0-9]+
-** ret
-*/
+/* sub aX, aX, aX
+   th.mveqz aX, aX, aX  */
 int ConEmv_reg_reg_reg(int x, int y, int z, int n)
 {
   if (x == y)
@@ -60,14 +41,10 @@ int ConEmv_reg_reg_reg(int x, int y, int z, int n)
   return n;
 }
 
-/*
-** ConNmv_imm_imm_reg:
-** addi a[0-9]+,a[0-9]+,-1000+
-** li  a[0-9]+,9998336+
-** addi a[0-9]+,a[0-9]+,1664+
-** th.mveqz a[0-9]+,a[0-9]+,a[0-9]+
-** ret
-*/
+/* addi aX, aX, -1000
+   li aX, 9998336
+   addi aX, aX, 1664
+   th.mveqz aX, aX, aX  */
 int ConNmv_imm_imm_reg(int x, int y)
 {
   if (x != 1000)
@@ -75,13 +52,8 @@ int ConNmv_imm_imm_reg(int x, int y)
   return y;
 }
 
-/*
-**ConNmv_imm_reg_reg:
-** addi a[0-9]+,a[0-9]+,-1000+
-** th.mvnez a[0-9]+,a[0-9]+,a[0-9]+
-** mv  a[0-9]+,a[0-9]+
-** ret
-*/
+/* addi aX, aX, -1000
+   th.mvnez aX, aX, aX  */
 int ConNmv_imm_reg_reg(int x, int y, int z)
 {
   if (x != 1000)
@@ -89,13 +61,9 @@ int ConNmv_imm_reg_reg(int x, int y, int z)
   return z;
 }
 
-/*
-**ConNmv_reg_imm_reg:
-** sub a[0-9]+,a[0-9]+,a[0-9]+
-** li  a[0-9]+,10+
-** th.mveqz a[0-9]+,a[0-9]+,a[0-9]+
-** ret
-*/
+/* sub aX, aX, aX
+   li aX, 10
+   th.mveqz aX, aX, aX  */
 int ConNmv_reg_imm_reg(int x, int y, int z)
 {
   if (x != y)
@@ -103,16 +71,17 @@ int ConNmv_reg_imm_reg(int x, int y, int z)
   return z;
 }
 
-/*
-**ConNmv_reg_reg_reg:
-** sub a[0-9]+,a[0-9]+,a[0-9]+
-** th.mvnez a[0-9]+,a[0-9]+,a[0-9]+
-** mv  a[0-9]+,a[0-9]+
-** ret
-*/
+/* sub aX, aX, aX
+   th.mvnez aX, aX, aX  */
 int ConNmv_reg_reg_reg(int x, int y, int z, int n)
 {
   if (x != y)
 return z;
   return n;
 }
+
+/* { dg-final { scan-assembler-times "addi\t" 5 } } */
+/* { dg-final { scan-assembler-times "li\t" 4 } } */
+/* { dg-final { scan-assembler-times "sub\t" 4 } } */
+/* { dg-final { scan-assembler-times "th.mveqz\t" 4 } } */
+/* { dg-final { scan-assembler-times "th.mvnez\t" 4 } } */
-- 
2.41.0



[avr,committed] Implement atan2

2023-10-12 Thread Georg-Johann Lay

This implements atan2 which was missing from LibF7.

Johann

--

LibF7: Implement atan2.

libgcc/config/avr/libf7/
* libf7.c (F7MOD_atan2_, f7_atan2): New module and function.
* libf7.h: Adjust comments.
* libf7-common.mk (CALL_PROLOGUES): Add atan2.


diff --git a/libgcc/config/avr/libf7/libf7-common.mk b/libgcc/config/avr/libf7/libf7-common.mk
index 28663b52e6c..e417715a7e5 100644
--- a/libgcc/config/avr/libf7/libf7-common.mk
+++ b/libgcc/config/avr/libf7/libf7-common.mk
@@ -43,7 +43,7 @@ m_xd += lrint lround
 # -mcall-prologues
 CALL_PROLOGUES += divx sqrt cbrt get_double set_double logx exp exp10 pow10

 CALL_PROLOGUES += put_C truncx round minmax sincos tan cotan pow powi fmod
-CALL_PROLOGUES += atan asinacos madd_msub hypot init horner sinhcosh tanh
+CALL_PROLOGUES += atan atan2 asinacos madd_msub hypot init horner sinhcosh tanh


 # -mstrict-X
 STRICT_X += log addsub truncx ldexp exp
diff --git a/libgcc/config/avr/libf7/libf7.c b/libgcc/config/avr/libf7/libf7.c
index 0d9e4c325b2..49baac73e6d 100644
--- a/libgcc/config/avr/libf7/libf7.c
+++ b/libgcc/config/avr/libf7/libf7.c
@@ -1099,7 +1099,7 @@ f7_t* f7_ldexp (f7_t *cc, const f7_t *aa, int delta)

   F7_CONST_ADDR ( CST, f7_t* PTMP)

-  Return an LD address to for some f7_const_X[_P] constant.
+  Return an LD address to some f7_const_X[_P] constant.
   *PTMP might be needed to hold a copy of f7_const_X_P in RAM.

   f7_t*   F7_U16_ADDR (uint16_t X, f7_t* PTMP)   // USE_LPM
@@ -2189,6 +2189,64 @@ void f7_atan (f7_t *cc, const f7_t *aa)
 #endif // F7MOD_atan_


+#ifdef F7MOD_atan2_
+F7_WEAK
+void f7_atan2 (f7_t *cc, const f7_t *yy, const f7_t *xx)
+{
+  uint8_t y_class = f7_classify (yy);
+  uint8_t x_class = f7_classify (xx);
+
+  // (NaN, *) -> NaN
+  // (*, NaN) -> NaN
+  if (f7_class_nan (y_class | x_class))
+return f7_set_nan (cc);
+
+  // (0, 0) -> 0
+  if (f7_class_zero (y_class & x_class))
+return f7_clr (cc);
+
+  f7_t pi7, *pi = &pi7;
+  f7_const (pi, pi);
+
+  // (Inf, +Inf) -> +pi/4;(-Inf, +Inf) -> +3pi/4
+  // (Inf, -Inf) -> -pi/4;(-Inf, -Inf) -> -3pi/4
+  if (f7_class_inf (y_class & x_class))
+{
+  f7_copy (cc, pi);
+  if (! f7_class_sign (x_class))
+   cc->expo = F7_(const_pi_expo) - 1; // pi / 2
+  pi->expo = F7_(const_pi_expo) - 2;   // pi / 4
+  f7_Isub (cc, pi);
+  cc->flags = y_class & F7_FLAG_sign;
+  return;
+}
+
+  // sign(pi) := sign(y)
+  pi->flags = y_class & F7_FLAG_sign;
+
+  // Only use atan(*) with |*| <= 1.
+
+  if (f7_cmp_abs (yy, xx) > 0)
+{
+  // |y| > |x|:  atan2 = sgn(y) * pi/2 - atan (x / y);
+  pi->expo = F7_(const_pi_expo) - 1;  // +- pi / 2
+  f7_div (cc, xx, yy);
+  f7_atan (cc, cc);
+  f7_IRsub (cc, pi);
+}
+  else
+{
+  // x >  |y|:  atan2 = atan (y / x)
+  // x < -|y|:  atan2 = atan (y / x) +- pi
+  f7_div (cc, yy, xx);
+  f7_atan (cc, cc);
+  if (f7_class_sign (x_class))
+   f7_Iadd (cc, pi);
+}
+}
+#endif // F7MOD_atan2_
+
+
 #ifdef F7MOD_asinacos_

 #define ARRAY_NAME coeff_func_a_zahler
diff --git a/libgcc/config/avr/libf7/libf7.h b/libgcc/config/avr/libf7/libf7.h
index f692854dced..b50e6e218ba 100644
--- a/libgcc/config/avr/libf7/libf7.h
+++ b/libgcc/config/avr/libf7/libf7.h
@@ -606,6 +606,7 @@ extern void f7_sin (f7_t*, const f7_t*);
 extern void f7_cos (f7_t*, const f7_t*);
 extern void f7_tan (f7_t*, const f7_t*);
 extern void f7_atan (f7_t*, const f7_t*);
+extern void f7_atan2 (f7_t*, const f7_t*, const f7_t*);
 extern void f7_asin (f7_t*, const f7_t*);
 extern void f7_acos (f7_t*, const f7_t*);
 extern void f7_tanh (f7_t*, const f7_t*);
@@ -617,7 +618,6 @@ extern void f7_exp10 (f7_t*, const f7_t*);
 extern void f7_pow10 (f7_t*, const f7_t*);

 // Just prototypes, not implemented yet.
-extern void f7_atan2 (f7_t*, const f7_t*, const f7_t*);
 extern long f7_lrint (const f7_t*);
 extern long f7_lround (const f7_t*);



Re: [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.

2023-10-12 Thread Richard Sandiford
Robin Dapp via Gcc-patches  writes:
> Hi,
>
> as Juzhe noticed in gcc.dg/pr92301.c there was still something missing in
> the last patch.  The attached v2 makes sure we always have a COND_LEN operation
> before returning true and initializes len and bias even if they are unused.
>
> Bootstrapped and regtested on aarch64 and x86.

Sorry for the slow review.  I was hoping Richi would take it,
but I see he was hoping the same from me.

> Regards
>  Robin
>
> Subject: [PATCH v2] gimple-match: Do not try UNCOND optimization with
>  COND_LEN.
>
> On riscv we mis-optimize conditional (length) operations into
> unconditional operations e.g. in slp-reduc-7.c and
> gcc.dg/pr92301.c.
>
> This patch prevents optimizing e.g.
>  COND_LEN_ADD ({-1, ... }, a, 0, c, len, bias)
> unconditionally into just "a".
>
> Currently, we assume that COND_LEN operations can be optimized similarly
> to COND operations.  As the length is part of the mask (and usually not
> compile-time constant), we must not perform any optimization that relies
> on just the mask being "true".  This patch ensures that we still have a
> COND_LEN pattern after optimization.
>
> gcc/ChangeLog:
>
>   PR target/111311
>   * gimple-match-exports.cc (maybe_resimplify_conditional_op):
>   Check for length masking.
>   (try_conditional_simplification): Check that the result is still
>   length masked.
> ---
>  gcc/gimple-match-exports.cc | 38 ++---
>  gcc/gimple-match.h  |  3 ++-
>  2 files changed, 33 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
> index b36027b0bad..d41de98a3d3 100644
> --- a/gcc/gimple-match-exports.cc
> +++ b/gcc/gimple-match-exports.cc
> @@ -262,7 +262,8 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
>if (!res_op->cond.cond)
>  return false;
>  
> -  if (!res_op->cond.else_value
> +  if (!res_op->cond.len
> +  && !res_op->cond.else_value
>&& res_op->code.is_tree_code ())
>  {
>/* The "else" value doesn't matter.  If the "then" value is a

Why are the contents of this if statement wrong for COND_LEN?
If the "else" value doesn't matter, then the masked form can use
the "then" value for all elements.  I would have expected the same
thing to be true of COND_LEN.

> @@ -301,9 +302,12 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
>  
>/* If the "then" value is a gimple value and the "else" value matters,
>   create a VEC_COND_EXPR between them, then see if it can be further
> - simplified.  */
> + simplified.
> + Don't do this if we have a COND_LEN_ as that would make us lose the
> + length masking.  */
>gimple_match_op new_op;
> -  if (res_op->cond.else_value
> +  if (!res_op->cond.len
> +  && res_op->cond.else_value
>&& VECTOR_TYPE_P (res_op->type)
>&& gimple_simplified_result_is_gimple_val (res_op))
>  {

The change LGTM, but it would be nice to phrase the comment to avoid
the "Do A.  Don't do A if B" pattern.  Maybe:

  /* If the condition represents MASK ? THEN : ELSE, where THEN is a gimple
 value and ELSE matters, create a VEC_COND_EXPR between them, then see
 if it can be further simplified.  */

> @@ -314,7 +318,7 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
>return gimple_resimplify3 (seq, res_op, valueize);
>  }
>  
> -  /* Otherwise try rewriting the operation as an IFN_COND_* call.
> +  /* Otherwise try rewriting the operation as an IFN_COND_(LEN_)* call.
>   Again, this isn't a simplification in itself, since it's what
>   RES_OP already described.  */
>if (convert_conditional_op (res_op, &new_op))
> @@ -386,9 +390,29 @@ try_conditional_simplification (internal_fn ifn, gimple_match_op *res_op,
>  default:
>gcc_unreachable ();
>  }
> -  *res_op = cond_op;
> -  maybe_resimplify_conditional_op (seq, res_op, valueize);
> -  return true;
> +
> +  if (len)
> +{
> +  /* If we had a COND_LEN before, we need to ensure that it stays that
> +  way.  */
> +  gimple_match_op old_op = *res_op;
> +  *res_op = cond_op;
> +  maybe_resimplify_conditional_op (seq, res_op, valueize);
> +
> +  auto cfn = combined_fn (res_op->code);
> +  if (internal_fn_p (cfn)
> +   && internal_fn_len_index (as_internal_fn (cfn)) != -1)
> + return true;

Why isn't it enough to check the result of maybe_resimplify_conditional_op?

Thanks,
Richard

> +
> +  *res_op = old_op;
> +  return false;
> +}
> +  else
> +{
> +  *res_op = cond_op;
> +  maybe_resimplify_conditional_op (seq, res_op, valueize);
> +  return true;
> +}
>  }
>  
>  /* Helper for the autogenerated code, valueize OP.  */
> diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
> index bec3ff42e3e..d192b7dae3e 100644
> --- a/gcc/gimple-match.h
> +++ b/gcc/gimple-match.h
> @@ -56,7 +56,

Re: [PATCH V2] Emit funcall external declarations only if actually used.

2023-10-12 Thread Jose E. Marchesi


> "Jose E. Marchesi"  writes:
>> Hi Richard.
>> Thanks for looking at this! :)
>>
>>
>>> "Jose E. Marchesi"  writes:
 ping
>>>
>>> I don't know this code very well, and AFAIR haven't worked
>>> with an assembler that requires external declarations, but since
>>> it's at a second ping :)
>>>

> ping
>
>> [Differences from V1:
>> - Prototype for call_from_call_insn moved before comment block.
>> - Reuse the `call' flag for SYMBOL_REF_LIBCALL.
>> - Fallback to check REG_CALL_DECL in non-direct calls.
>> - New test to check correct behavior for non-direct calls.]
>>
>> There are many places in GCC where alternative local sequences are
>> tried in order to determine what is the cheapest or best alternative
>> to use in the current target.  When any of these sequences involve a
>> libcall, the current implementation of emit_library_call_value_1
>> introduce a side-effect consisting on emitting an external declaration
>> for the funcall (such as __divdi3) which is thus emitted even if the
>> sequence that does the libcall is not retained.
>>
>> This is problematic in targets such as BPF, because the kernel loader
>> chokes on the spurious symbol __divdi3 and makes the resulting BPF
>> object unloadable.  Note that BPF objects are not linked before being
>> loaded.
>>
>> This patch changes emit_library_call_value_1 to mark the target
>> SYMBOL_REF as a libcall.  Then, the emission of the external
>> declaration is done in the first loop of final.cc:shorten_branches.
>> This happens only if the corresponding sequence has been kept.
>>
>> Regtested in x86_64-linux-gnu.
>> Tested with host x86_64-linux-gnu with target bpf-unknown-none.
>>>
>>> I'm not sure that shorten_branches is a natural place to do this.
>>> It isn't something that would normally emit asm text.
>>
>> Well, that was the approach suggested by another reviewer (Jakub) once
>> my initial approach (in the V1) got rejected.  He explicitly suggested
>> to use shorten_branches.
>>
>>> Would it be OK to emit the declaration at the same point as for decls,
>>> which IIUC is process_pending_assemble_externals?  If so, how about
>>> making assemble_external_libcall add the symbol to a list when
>>> !SYMBOL_REF_USED, instead of calling targetm.asm_out.external_libcall
>>> directly?  assemble_external_libcall could then also call get_identifier
>>> on the name (perhaps after calling strip_name_encoding -- can't
>>> remember whether assemble_external_libcall sees the encoded or
>>> unencoded name).
>>>
>>> All being well, the call to get_identifier should cause
>>> assemble_name_resolve to record when the name is used, via
>>> TREE_SYMBOL_REFERENCED.  Then process_pending_assemble_externals could
>>> go through the list of libcalls recorded by assemble_external_libcall
>>> and check whether TREE_SYMBOL_REFERENCED is set on the get_identifier.
>>>
>>> Not super elegant, but it seems to fit within the existing scheme.
>>> And I don't think there should be any problem with using get_identifier
>>> for libcalls, since it isn't valid to use libcall names for other
>>> types of symbol.
>>
>> This sounds way more complicated to me than the approach in V2, which
>> seems to work and is thus a clear improvement compared to the current
>> situation in the trunk.  The approach in V2 may be ugly, but it is
>> simple and easy to understand.  Is the proposed more convoluted
>> alternative really worth the extra complexity, given it is "not super
>> elegant"?
>
> Is it really that much more convoluted?  I was thinking of something
> like the attached, which seems a bit shorter than V2, and does seem
> to fix the bpf tests.

o_O
Ok, I clearly misunderstood what you were proposing.  This is way simpler!

How does the magic of TREE_SYMBOL_REFERENCED work?  How is it set to
`true' only if the RTL containing the call is retained in the final
chain?

> I think most (all?) libcalls already have an associated decl due to
> optabs-libfuncs.cc, so an alternative to get_identifier would be to
> set the SYMBOL_REF_DECL.  Using get_identifier seems a bit more
> lightweight though.
>
> Richard
>
> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
> index b0eff17b8b5..073e3eb2579 100644
> --- a/gcc/varasm.cc
> +++ b/gcc/varasm.cc
> @@ -2461,6 +2461,10 @@ contains_pointers_p (tree type)
> it all the way to final.  See PR 17982 for further discussion.  */
>  static GTY(()) tree pending_assemble_externals;
>  
> +/* A similar list of pending libcall symbols.  We only want to declare
> +   symbols that are actually used in the final assembly.  */
> +static GTY(()) rtx pending_libcall_symbols;
> +
>  #ifdef ASM_OUTPUT_EXTERNAL
>  /* Some targets delay some output to final using TARGET_ASM_FILE_END.
> As a result, assemble_external can be called after the list of externals
> @@ -2516,12 +2520,20 @@ void
>  process_pending_assemble_externals (void)
>  {
>  #ifdef ASM_OUTPUT_EXTER

Re: [PATCH] RISCV: Bugfix for incorrect documentation heading nesting

2023-10-12 Thread Jeff Law




On 10/12/23 04:05, Mary Bennett wrote:

gcc/ChangeLog:
 * doc/extend.texi: Change subsubsection to subsection for
   CORE-V built-ins.

This is OK.  I'll commit it shortly.

jeff


Re: [PATCH V2] Emit funcall external declarations only if actually used.

2023-10-12 Thread Richard Sandiford
"Jose E. Marchesi"  writes:
>> "Jose E. Marchesi"  writes:
>>> Hi Richard.
>>> Thanks for looking at this! :)
>>>
>>>
 "Jose E. Marchesi"  writes:
> ping

 I don't know this code very well, and AFAIR haven't worked
 with an assembler that requires external declarations, but since
 it's at a second ping :)

>
>> ping
>>
>>> [Differences from V1:
>>> - Prototype for call_from_call_insn moved before comment block.
>>> - Reuse the `call' flag for SYMBOL_REF_LIBCALL.
>>> - Fallback to check REG_CALL_DECL in non-direct calls.
>>> - New test to check correct behavior for non-direct calls.]
>>>
>>> There are many places in GCC where alternative local sequences are
>>> tried in order to determine what is the cheapest or best alternative
>>> to use in the current target.  When any of these sequences involve a
>>> libcall, the current implementation of emit_library_call_value_1
>>> introduces a side-effect consisting of emitting an external declaration
>>> for the funcall (such as __divdi3) which is thus emitted even if the
>>> sequence that does the libcall is not retained.
>>>
>>> This is problematic in targets such as BPF, because the kernel loader
>>> chokes on the spurious symbol __divdi3 and makes the resulting BPF
>>> object unloadable.  Note that BPF objects are not linked before being
>>> loaded.
>>>
>>> This patch changes emit_library_call_value_1 to mark the target
>>> SYMBOL_REF as a libcall.  Then, the emission of the external
>>> declaration is done in the first loop of final.cc:shorten_branches.
>>> This happens only if the corresponding sequence has been kept.
>>>
>>> Regtested in x86_64-linux-gnu.
>>> Tested with host x86_64-linux-gnu with target bpf-unknown-none.
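As an illustration of the kind of source that triggers such a libcall (an assumption about typical lowering, not part of the patch): on 32-bit targets without a native 64-bit divide, GCC expands the division below into a call to the libgcc helper __divdi3.

```c
/* 64-bit signed division.  On targets lacking a hardware 64-bit divide
   (e.g. many 32-bit ones), GCC typically expands this as a call to the
   libgcc support routine __divdi3.  */
long long
sdiv64 (long long a, long long b)
{
  return a / b;
}
```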

 I'm not sure that shorten_branches is a natural place to do this.
 It isn't something that would normally emit asm text.
>>>
>>> Well, that was the approach suggested by another reviewer (Jakub) once
>>> my initial approach (in the V1) got rejected.  He explicitly suggested
>>> to use shorten_branches.
>>>
 Would it be OK to emit the declaration at the same point as for decls,
 which IIUC is process_pending_assemble_externals?  If so, how about
 making assemble_external_libcall add the symbol to a list when
 !SYMBOL_REF_USED, instead of calling targetm.asm_out.external_libcall
 directly?  assemble_external_libcall could then also call get_identifier
 on the name (perhaps after calling strip_name_encoding -- can't
 remember whether assemble_external_libcall sees the encoded or
 unencoded name).

 All being well, the call to get_identifier should cause
 assemble_name_resolve to record when the name is used, via
 TREE_SYMBOL_REFERENCED.  Then process_pending_assemble_externals could
 go through the list of libcalls recorded by assemble_external_libcall
 and check whether TREE_SYMBOL_REFERENCED is set on the get_identifier.

 Not super elegant, but it seems to fit within the existing scheme.
 And I don't there should be any problem with using get_identifier
 for libcalls, since it isn't valid to use libcall names for other
 types of symbol.
>>>
>>> This sounds way more complicated to me than the approach in V2, which
>>> seems to work and is thus a clear improvement compared to the current
>>> situation in the trunk.  The approach in V2 may be ugly, but it is
>>> simple and easy to understand.  Is the proposed more convoluted
>>> alternative really worth the extra complexity, given it is "not super
>>> elegant"?
>>
>> Is it really that much more convoluted?  I was thinking of something
>> like the attached, which seems a bit shorter than V2, and does seem
>> to fix the bpf tests.
>
> o_O
> Ok I clearly misunderstood what you was proposing.  This is way simpler!
>
> How does the magic of TREE_SYMBOL_REFERENCED work?  How is it set to
> `true' only if the RTL containing the call is retained in the final
> chain?

It happens in assemble_name, via assemble_name_resolve.  The system
relies on code using that rather than assemble_name_raw for symbols
that might need to be declared, or that might need visibility
information attached.  (It relies on that in general, I mean,
not just for this patch.)
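A toy model of that bookkeeping, outside GCC (all names here are hypothetical analogues; the real mechanism goes through get_identifier and TREE_SYMBOL_REFERENCED): record candidate libcall symbols up front, mark a name only when it is actually assembled, and declare just the marked ones at the end.

```c
#include <string.h>

#define MAX_SYMS 16

/* Hypothetical stand-ins for get_identifier / TREE_SYMBOL_REFERENCED:
   a tiny symbol table with a "referenced" mark.  */
static struct { const char *name; int referenced; } syms[MAX_SYMS];
static int nsyms;

/* assemble_external_libcall analogue: record a symbol that *might*
   need an external declaration, without emitting anything yet.  */
static void
record_libcall (const char *name)
{
  syms[nsyms].name = name;
  syms[nsyms].referenced = 0;
  nsyms++;
}

/* assemble_name analogue: called when a name really reaches the
   assembly output.  */
static void
mark_referenced (const char *name)
{
  for (int i = 0; i < nsyms; i++)
    if (strcmp (syms[i].name, name) == 0)
      syms[i].referenced = 1;
}

/* process_pending_assemble_externals analogue: count the declarations
   that would actually be emitted -- only names that were used.  */
static int
emit_pending_externals (void)
{
  int emitted = 0;
  for (int i = 0; i < nsyms; i++)
    if (syms[i].referenced)
      emitted++;
  return emitted;
}
```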

Thanks,
Richard



[PATCH v1] RISC-V: Support FP lceil/lceilf auto vectorization

2023-10-12 Thread pan2 . li
From: Pan Li 

This patch would like to support the FP lceil/lceilf auto vectorization.

* long lceil (double) for rv64
* long lceilf (float) for rv32

Due to the limitation that only data types of the same size are allowed
in the vectorizer, the standard name lceilmn2 only acts on DF => DI for
rv64, and SF => SI for rv32.

Given we have code like:

void
test_lceil (long *out, double *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
out[i] = __builtin_lceil (in[i]);
}

Before this patch:
.L3:
  ...
  fld fa5,0(a1)
  fcvt.l.d  a5,fa5,rup
  sd  a5,-8(a0)
  ...
  bne a1,a4,.L3

After this patch:
  frrm    a6
  ...
  fsrmi   3 // RUP
.L3:
  ...
  vsetvli a3,zero,e64,m1,ta,ma
  vfcvt.x.f.v v1,v1
  vsetvli zero,a2,e64,m1,ta,ma
  vse32.v v1,0(a0)
  ...
  bne a2,zero,.L3
  ...
  fsrm    a6
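As a scalar reference for the semantics being vectorized above (a sketch of lceil's behavior, not code from the patch): round toward positive infinity, then convert to long.

```c
/* Scalar model of __builtin_lceil: round toward +inf, then convert to
   long.  Ignores overflow and rounding-mode subtleties; purely
   illustrative.  */
long
scalar_lceil (double x)
{
  long t = (long) x;	/* truncation, i.e. rounding toward zero */
  return (x > (double) t) ? t + 1 : t;
}
```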

The rest part like SF => DI/HF => DI/DF => SI/HF => SI will be covered
by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.

gcc/ChangeLog:

* config/riscv/autovec.md (lceil2): New
pattern for lceil/lceilf.
* config/riscv/riscv-protos.h (enum insn_type): New enum value.
(expand_vec_lceil): New func decl for expanding lceil.
* config/riscv/riscv-v.cc (expand_vec_lceil): New func impl
for expanding lceil.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-lceil-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lceil-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lceil-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lceil-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lceil-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lceil-1.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec.md   | 11 +++
 gcc/config/riscv/riscv-protos.h   |  2 +
 gcc/config/riscv/riscv-v.cc   | 10 +++
 .../riscv/rvv/autovec/unop/math-lceil-0.c | 19 +
 .../riscv/rvv/autovec/unop/math-lceil-1.c | 19 +
 .../riscv/rvv/autovec/unop/math-lceil-run-0.c | 69 +++
 .../riscv/rvv/autovec/unop/math-lceil-run-1.c | 69 +++
 .../riscv/rvv/autovec/vls/math-lceil-0.c  | 30 
 .../riscv/rvv/autovec/vls/math-lceil-1.c  | 30 
 9 files changed, 259 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lceil-0.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lceil-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lceil-run-0.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lceil-run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lceil-0.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lceil-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 33b11723c21..267691a0095 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2241,6 +2241,7 @@ (define_expand "avg3_ceil"
 ;; - roundeven/roundevenf
 ;; - lrint/lrintf
 ;; - irintf
+;; - lceil/lceilf
 ;; -
 (define_expand "ceil2"
   [(match_operand:V_VLSF 0 "register_operand")
@@ -2331,3 +2332,13 @@ (define_expand "lround2"
 DONE;
   }
 )
+
+(define_expand "lceil2"
+  [(match_operand:0 "register_operand")
+   (match_operand:V_VLS_FCONVERT_I_L_LL 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  {
+riscv_vector::expand_vec_lceil (operands[0], operands[1], mode, 
mode);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index b7eeeb8f55d..ab65ab19524 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -303,6 +303,7 @@ enum insn_type : unsigned int
   UNARY_OP_TAMU = __MASK_OP_TAMU | UNARY_OP_P,
   UNARY_OP_FRM_DYN = UNARY_OP | FRM_DYN_P,
   UNARY_OP_FRM_RMM = UNARY_OP | FRM_RMM_P,
+  UNARY_OP_FRM_RUP = UNARY_OP | FRM_RUP_P,
   UNARY_OP_TAMU_FRM_DYN = UNARY_OP_TAMU | FRM_DYN_P,
   UNARY_OP_TAMU_FRM_RUP = UNARY_OP_TAMU | FRM_RUP_P,
   UNARY_OP_TAMU_FRM_RDN = UNARY_OP_TAMU | FRM_RDN_P,
@@ -477,6 +478,7 @@ void expand_vec_trunc (rtx, rtx, machine_mode, 
machine_mode);
 void expand_vec_roundeven (rtx, rtx, machine_mode, machine_mode);
 void expand_vec_lrint (rtx, rtx, machine_mode, machine_mode);
 void expand_vec_lround (rtx, rtx, machine_mode, machine_mode);
+void expand_vec_lceil (rtx, rtx, machine_mode, machine_mode);
 #endif
 bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
  bool, void (*)(rtx *, rtx));
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index b61c745678b..b03213dd8ed 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4132,4 +4132,14 @@ expand_vec_lround (rtx op_0, rtx op

Re: [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.

2023-10-12 Thread Richard Sandiford
Richard Sandiford  writes:
> Robin Dapp via Gcc-patches  writes:
>> [...]
>> @@ -386,9 +390,29 @@ try_conditional_simplification (internal_fn ifn, 
>> gimple_match_op *res_op,
>>  default:
>>gcc_unreachable ();
>>  }
>> -  *res_op = cond_op;
>> -  maybe_resimplify_conditional_op (seq, res_op, valueize);
>> -  return true;
>> +
>> +  if (len)
>> +{
>> +  /* If we had a COND_LEN before we need to ensure that it stays that
>> + way.  */
>> +  gimple_match_op old_op = *res_op;
>> +  *res_op = cond_op;
>> +  maybe_resimplify_conditional_op (seq, res_op, valueize);
>> +
>> +  auto cfn = combined_fn (res_op->code);
>> +  if (internal_fn_p (cfn)
>> +  && internal_fn_len_index (as_internal_fn (cfn)) != -1)
>> +return true;
>
> Why isn't it enough to check the result of maybe_resimplify_conditional_op?

Sorry, ignore that part.  I get it now.

But isn't the test whether res_op->code itself is an internal_function?
In other words, shouldn't it just be:

  if (internal_fn_p (res_op->code)
  && internal_fn_len_index (as_internal_fn (res_op->code)) != -1)
return true;

maybe_resimplify_conditional_op should already have converted to an
internal function where possible, and if combined_fn (res_op->code)
does any extra conversion on the fly, that conversion won't be reflected
in res_op.

Thanks,
Richard


[PATCH 5/6]AArch64: Fix Armv9-a warnings that get emitted whenever an ACLE header is used.

2023-10-12 Thread Tamar Christina
Hi All,

At the moment, trying to use -march=armv9-a with any ACLE header such as
arm_neon.h results in rows and rows of warnings saying:

: warning: "__ARM_ARCH" redefined
: note: this is the location of the previous definition

This is obviously not useful and happens because the header was defined at
__ARM_ARCH == 8 and the command line changes it.

The Arm port solves this by undefining the macro during argument processing, and
we do the same on AArch64 for the majority of macros.  However, we define this
macro using a different helper, which requires the manual undef.
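The warning and its fix can be reproduced with plain preprocessor directives (a hedged sketch; DEMO_ARCH is a hypothetical stand-in for __ARM_ARCH, and the real fix issues the undef through libcpp's cpp_undef): redefining a macro to a different value without an intervening #undef draws a redefinition warning, while undef-then-define is silent.

```c
/* DEMO_ARCH stands in for __ARM_ARCH.  Redefining it to 9 without the
   #undef below would produce a "macro redefined" warning, which is the
   noise the patch silences by calling cpp_undef first.  */
#define DEMO_ARCH 8

#undef DEMO_ARCH
#define DEMO_ARCH 9

int
demo_arch (void)
{
  return DEMO_ARCH;
}
```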

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Add undef.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/armv9_warning.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index 
578ec6f45b06347d90f951b37064006786baf10f..ab8844f6049dc95b97648b651bfcd3a4ccd3ca0b
 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -82,6 +82,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
 {
   aarch64_def_or_undef (flag_unsafe_math_optimizations, "__ARM_FP_FAST", 
pfile);
 
+  cpp_undef (pfile, "__ARM_ARCH");
   builtin_define_with_int_value ("__ARM_ARCH", AARCH64_ISA_V9A ? 9 : 8);
 
   builtin_define_with_int_value ("__ARM_SIZEOF_MINIMAL_ENUM",
diff --git a/gcc/testsuite/gcc.target/aarch64/armv9_warning.c 
b/gcc/testsuite/gcc.target/aarch64/armv9_warning.c
new file mode 100644
index 
..35690d5bce790e11331788aacef00f3f35cdf216
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/armv9_warning.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=armv9-a -Wpedantic -Werror" } */
+
+#include 
+









Re: [PATCH 5/6]AArch64: Fix Armv9-a warnings that get emitted whenever an ACLE header is used.

2023-10-12 Thread Richard Sandiford
Tamar Christina  writes:
> Hi All,
>
> At the moment, trying to use -march=armv9-a with any ACLE header such as
> arm_neon.h results in rows and rows of warnings saying:
>
> : warning: "__ARM_ARCH" redefined
> : note: this is the location of the previous definition
>
> This is obviously not useful and happens because the header was defined at
> __ARM_ARCH == 8 and the commandline changes it.
>
> The Arm port solves this by undef the macro during argument processing and we 
> do
> the same on AArch64 for the majority of macros.  However we define this macro
> using a different helper which requires the manual undef.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Add undef.

OK!  Thanks for fixing this.

Richard.

>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/armv9_warning.c: New test.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
> index 
> 578ec6f45b06347d90f951b37064006786baf10f..ab8844f6049dc95b97648b651bfcd3a4ccd3ca0b
>  100644
> --- a/gcc/config/aarch64/aarch64-c.cc
> +++ b/gcc/config/aarch64/aarch64-c.cc
> @@ -82,6 +82,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
>  {
>aarch64_def_or_undef (flag_unsafe_math_optimizations, "__ARM_FP_FAST", 
> pfile);
>  
> +  cpp_undef (pfile, "__ARM_ARCH");
>builtin_define_with_int_value ("__ARM_ARCH", AARCH64_ISA_V9A ? 9 : 8);
>  
>builtin_define_with_int_value ("__ARM_SIZEOF_MINIMAL_ENUM",
> diff --git a/gcc/testsuite/gcc.target/aarch64/armv9_warning.c 
> b/gcc/testsuite/gcc.target/aarch64/armv9_warning.c
> new file mode 100644
> index 
> ..35690d5bce790e11331788aacef00f3f35cdf216
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/armv9_warning.c
> @@ -0,0 +1,5 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=armv9-a -Wpedantic -Werror" } */
> +
> +#include 
> +


Re: Ping: [PATCH v2 1/2] testsuite: Add dg-require-atomic-cmpxchg-word

2023-10-12 Thread Christophe Lyon
LGTM but I'm not a maintainer ;-)

On Thu, 12 Oct 2023 at 04:21, Hans-Peter Nilsson  wrote:
>
> Ping.
>
> > From: Hans-Peter Nilsson 
> > Date: Wed, 4 Oct 2023 19:04:55 +0200
> >
> > > From: Hans-Peter Nilsson 
> > > Date: Wed, 4 Oct 2023 17:15:28 +0200
> >
> > > New version coming up.
> >
> > Using pointer-sized int instead of int,
> > __atomic_compare_exchange instead of __atomic_exchange,
> > renamed to atomic-cmpxchg-word from atomic-exchange, and
> > updating a comment that already seemed reasonably well
> > placed.
> >
> > Tested as with v1 1/2.
> >
> > Ok to commit?
> >
> > -- >8 --
> > Some targets (armv6-m) support inline atomic load and store,
> > i.e. dg-require-thread-fence matches, but not atomic operations like
> > compare and exchange.
> >
> > This directive can be used to replace uses of dg-require-thread-fence
> > where an atomic operation is actually used.
> >
> >   * testsuite/lib/dg-options.exp (dg-require-atomic-cmpxchg-word):
> >   New proc.
> >   * testsuite/lib/libstdc++.exp (check_v3_target_atomic_cmpxchg_word):
> >   Ditto.
> > ---
> >  libstdc++-v3/testsuite/lib/dg-options.exp |  9 ++
> >  libstdc++-v3/testsuite/lib/libstdc++.exp  | 37 +++
> >  2 files changed, 46 insertions(+)
> >
> > diff --git a/libstdc++-v3/testsuite/lib/dg-options.exp 
> > b/libstdc++-v3/testsuite/lib/dg-options.exp
> > index 84ad0c65330b..850442b6b7c1 100644
> > --- a/libstdc++-v3/testsuite/lib/dg-options.exp
> > +++ b/libstdc++-v3/testsuite/lib/dg-options.exp
> > @@ -133,6 +133,15 @@ proc dg-require-thread-fence { args } {
> >  return
> >  }
> >
> > +proc dg-require-atomic-cmpxchg-word { args } {
> > +if { ![ check_v3_target_atomic_cmpxchg_word ] } {
> > + upvar dg-do-what dg-do-what
> > + set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
> > + return
> > +}
> > +return
> > +}
> > +
> >  proc dg-require-atomic-builtins { args } {
> >  if { ![ check_v3_target_atomic_builtins ] } {
> >   upvar dg-do-what dg-do-what
> > diff --git a/libstdc++-v3/testsuite/lib/libstdc++.exp 
> > b/libstdc++-v3/testsuite/lib/libstdc++.exp
> > index 608056e5068e..4bedb36dc6f9 100644
> > --- a/libstdc++-v3/testsuite/lib/libstdc++.exp
> > +++ b/libstdc++-v3/testsuite/lib/libstdc++.exp
> > @@ -1221,6 +1221,43 @@ proc check_v3_target_thread_fence { } {
> >  }]
> >  }
> >
> > +proc check_v3_target_atomic_cmpxchg_word { } {
> > +return [check_v3_target_prop_cached et_atomic_cmpxchg_word {
> > + global cxxflags
> > + global DEFAULT_CXXFLAGS
> > +
> > + # Set up and link a C++11 test program that depends on
> > + # atomic-compare-exchange being available for a pointer-sized
> > + # integer.  It should be sufficient as gcc can derive all
> > + # other operations when a target implements this operation.
> > + set src atomic_cmpxchg_word[pid].cc
> > +
> > + set f [open $src "w"]
> > + puts $f "
> > + __UINTPTR_TYPE__ i, j, k;
> > + int main() {
> > + __atomic_compare_exchange (&i, &j, &k, 1, __ATOMIC_SEQ_CST, 
> > __ATOMIC_SEQ_CST);
> > + return 0;
> > + }"
> > + close $f
> > +
> > + set cxxflags_saved $cxxflags
> > + set cxxflags "$cxxflags $DEFAULT_CXXFLAGS -Werror -std=gnu++11"
> > +
> > + set lines [v3_target_compile $src /dev/null executable ""]
> > + set cxxflags $cxxflags_saved
> > + file delete $src
> > +
> > + if [string match "" $lines] {
> > + # No error message, linking succeeded.
> > + return 1
> > + } else {
> > + verbose "check_v3_target_atomic_cmpxchg_word: compilation failed" 
> > 2
> > + return 0
> > + }
> > +}]
> > +}
> > +
> >  # Return 1 if atomics_bool and atomic_int are always lock-free, 0 
> > otherwise.
> >  proc check_v3_target_atomic_builtins { } {
> >  return [check_v3_target_prop_cached et_atomic_builtins {
> > --
> > 2.30.2
> >
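The operation that check_v3_target_atomic_cmpxchg_word probes for can be exercised directly; the sketch below is illustrative and not part of the patch (it uses a strong rather than weak compare-exchange so the result is deterministic).

```c
#include <stdint.h>

/* Word-sized compare-and-exchange, the operation the new
   dg-require-atomic-cmpxchg-word directive probes for.  Returns 1 and
   stores *desired into *ptr iff *ptr equaled *expected; on failure it
   copies the current value of *ptr back into *expected.  */
int
cmpxchg_word (uintptr_t *ptr, uintptr_t *expected, uintptr_t *desired)
{
  /* weak == 0: a strong exchange cannot fail spuriously.  */
  return __atomic_compare_exchange (ptr, expected, desired, 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```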


Re: Ping: [PATCH v2 2/2] testsuite: Replace many dg-require-thread-fence with dg-require-atomic-cmpxchg-word

2023-10-12 Thread Christophe Lyon
LGTM but I'm not a maintainer ;-)

On Thu, 12 Oct 2023 at 04:22, Hans-Peter Nilsson  wrote:
>
> Ping.
>
> > From: Hans-Peter Nilsson 
> > Date: Wed, 4 Oct 2023 19:08:16 +0200
> >
> > s/atomic-exchange/atomic-cmpxchg-word/g.
> > Tested as v1.
> >
> > Ok to commit?
> > -- >8 --
> > These tests actually use a form of atomic compare and exchange
> > operation, not just atomic loading and storing.  Some targets (not
> > supported by e.g. libatomic) have atomic loading and storing, but not
> > compare and exchange, yielding linker errors for missing library
> > functions.
> >
> > This change is just for existing uses of
> > dg-require-thread-fence.  It does not fix any other tests
> > that should also be gated on dg-require-atomic-cmpxchg-word.
> >
> >   * testsuite/29_atomics/atomic/compare_exchange_padding.cc,
> >   testsuite/29_atomics/atomic_flag/clear/1.cc,
> >   testsuite/29_atomics/atomic_flag/cons/value_init.cc,
> >   testsuite/29_atomics/atomic_flag/test_and_set/explicit.cc,
> >   testsuite/29_atomics/atomic_flag/test_and_set/implicit.cc,
> >   testsuite/29_atomics/atomic_ref/compare_exchange_padding.cc,
> >   testsuite/29_atomics/atomic_ref/generic.cc,
> >   testsuite/29_atomics/atomic_ref/integral.cc,
> >   testsuite/29_atomics/atomic_ref/pointer.cc: Replace
> >   dg-require-thread-fence with dg-require-atomic-cmpxchg-word.
> > ---
> >  .../testsuite/29_atomics/atomic/compare_exchange_padding.cc | 2 +-
> >  libstdc++-v3/testsuite/29_atomics/atomic_flag/clear/1.cc| 2 +-
> >  .../testsuite/29_atomics/atomic_flag/cons/value_init.cc | 2 +-
> >  .../testsuite/29_atomics/atomic_flag/test_and_set/explicit.cc   | 2 +-
> >  .../testsuite/29_atomics/atomic_flag/test_and_set/implicit.cc   | 2 +-
> >  .../testsuite/29_atomics/atomic_ref/compare_exchange_padding.cc | 2 +-
> >  libstdc++-v3/testsuite/29_atomics/atomic_ref/generic.cc | 2 +-
> >  libstdc++-v3/testsuite/29_atomics/atomic_ref/integral.cc| 2 +-
> >  libstdc++-v3/testsuite/29_atomics/atomic_ref/pointer.cc | 2 +-
> >  9 files changed, 9 insertions(+), 9 deletions(-)
> >
> > diff --git 
> > a/libstdc++-v3/testsuite/29_atomics/atomic/compare_exchange_padding.cc 
> > b/libstdc++-v3/testsuite/29_atomics/atomic/compare_exchange_padding.cc
> > index 01f7475631e6..859629e625f8 100644
> > --- a/libstdc++-v3/testsuite/29_atomics/atomic/compare_exchange_padding.cc
> > +++ b/libstdc++-v3/testsuite/29_atomics/atomic/compare_exchange_padding.cc
> > @@ -1,5 +1,5 @@
> >  // { dg-do run { target c++20 } }
> > -// { dg-require-thread-fence "" }
> > +// { dg-require-atomic-cmpxchg-word "" }
> >  // { dg-add-options libatomic }
> >
> >  #include 
> > diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_flag/clear/1.cc 
> > b/libstdc++-v3/testsuite/29_atomics/atomic_flag/clear/1.cc
> > index 89ed381fe057..2e154178dbd7 100644
> > --- a/libstdc++-v3/testsuite/29_atomics/atomic_flag/clear/1.cc
> > +++ b/libstdc++-v3/testsuite/29_atomics/atomic_flag/clear/1.cc
> > @@ -1,5 +1,5 @@
> >  // { dg-do run { target c++11 } }
> > -// { dg-require-thread-fence "" }
> > +// { dg-require-atomic-cmpxchg-word "" }
> >
> >  // Copyright (C) 2009-2023 Free Software Foundation, Inc.
> >  //
> > diff --git 
> > a/libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/value_init.cc 
> > b/libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/value_init.cc
> > index f3f38b54dbcd..6439873be133 100644
> > --- a/libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/value_init.cc
> > +++ b/libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/value_init.cc
> > @@ -16,7 +16,7 @@
> >  // .
> >
> >  // { dg-do run { target c++20 } }
> > -// { dg-require-thread-fence "" }
> > +// { dg-require-atomic-cmpxchg-word "" }
> >
> >  #include 
> >  #include 
> > diff --git 
> > a/libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/explicit.cc 
> > b/libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/explicit.cc
> > index 6f723eb5f4e7..6cb1ae2b6dda 100644
> > --- a/libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/explicit.cc
> > +++ b/libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/explicit.cc
> > @@ -1,5 +1,5 @@
> >  // { dg-do run { target c++11 } }
> > -// { dg-require-thread-fence "" }
> > +// { dg-require-atomic-cmpxchg-word "" }
> >
> >  // Copyright (C) 2008-2023 Free Software Foundation, Inc.
> >  //
> > diff --git 
> > a/libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/implicit.cc 
> > b/libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/implicit.cc
> > index 6f723eb5f4e7..6cb1ae2b6dda 100644
> > --- a/libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/implicit.cc
> > +++ b/libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/implicit.cc
> > @@ -1,5 +1,5 @@
> >  // { dg-do run { target c++11 } }
> > -// { dg-require-thread-fence "" }
> > +// { dg-require-atomic-cmpxchg-word "" }
> >
> >  // Copyright (C) 2008-2023 Free Software Foundation

Re: [PATCH v1] RISC-V: Support FP llrint auto vectorization

2023-10-12 Thread Kito Cheng
I would prefer the first approach since it has no changes other than adding
a testcase; the alternative might confuse other people.


Li, Pan2  wrote on Wed, Oct 11, 2023 at 23:12:

> Sorry for misleading here.
>
> When implementing llrint after lrint, I realized llrint (DF => SF) is
> supported by lrint already in the previous patch(es),
> because they share the same standard name as well as the mode iterator.
>
> Thus, I may have 2 options here for the patch naming.
>
> 1. Only mentioned test cases for llrint.
> 2. Named as support similar to lrint.
>
> After some consideration of situations like searching the git logs,
> I chose option 2 here and added some description
> as well.
>
> Finally, are there any best practices for this case? Thanks again for the
> comments.
>
> Pan
>
> -Original Message-
> From: Kito Cheng 
> Sent: Thursday, October 12, 2023 1:05 PM
> To: Li, Pan2 
> Cc: juzhe.zh...@rivai.ai; gcc-patches ; Wang,
> Yanzhang 
> Subject: Re: [PATCH v1] RISC-V: Support FP llrint auto vectorization
>
> Did I miss something? The title says support, but it seems to be only a testcase??
>
> On Wed, Oct 11, 2023 at 8:38 PM Li, Pan2  wrote:
> >
> > Committed, thanks Juzhe.
> >
> >
> >
> > Pan
> >
> >
> >
> > From: juzhe.zh...@rivai.ai 
> > Sent: Thursday, October 12, 2023 11:34 AM
> > To: Li, Pan2 ; gcc-patches 
> > Cc: Li, Pan2 ; Wang, Yanzhang <
> yanzhang.w...@intel.com>; kito.cheng 
> > Subject: Re: [PATCH v1] RISC-V: Support FP llrint auto vectorization
> >
> >
> >
> > LGTM
> >
> >
> >
> > 
> >
> > juzhe.zh...@rivai.ai
> >
> >
> >
> > From: pan2.li
> >
> > Date: 2023-10-12 11:28
> >
> > To: gcc-patches
> >
> > CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
> >
> > Subject: [PATCH v1] RISC-V: Support FP llrint auto vectorization
> >
> > From: Pan Li 
> >
> >
> >
> > This patch would like to support the FP llrint auto vectorization.
> >
> >
> >
> > * long long llrint (double)
> >
> >
> >
> > This will be the CVT from DF => DI from the standard name's perspective,
> >
> > which has been covered in previous PATCH(es). Thus, this patch only adds
> >
> > some test cases.
> >
> >
> >
> > gcc/testsuite/ChangeLog:
> >
> >
> >
> > * gcc.target/riscv/rvv/autovec/unop/test-math.h: Add type int64_t.
> >
> > * gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c: New test.
> >
> > * gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c: New test.
> >
> > * gcc.target/riscv/rvv/autovec/vls/math-llrint-0.c: New test.
> >
> >
> >
> > Signed-off-by: Pan Li 
> >
> > ---
> >
> > .../riscv/rvv/autovec/unop/math-llrint-0.c| 14 +
> >
> > .../rvv/autovec/unop/math-llrint-run-0.c  | 63 +++
> >
> > .../riscv/rvv/autovec/unop/test-math.h|  2 +
> >
> > .../riscv/rvv/autovec/vls/math-llrint-0.c | 30 +
> >
> > 4 files changed, 109 insertions(+)
> >
> > create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c
> >
> > create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c
> >
> > create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-llrint-0.c
> >
> >
> >
> > diff --git
> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c
> >
> > new file mode 100644
> >
> > index 000..2d90d232ba1
> >
> > --- /dev/null
> >
> > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c
> >
> > @@ -0,0 +1,14 @@
> >
> > +/* { dg-do compile } */
> >
> > +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize
> -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2"
> } */
> >
> > +/* { dg-final { check-function-bodies "**" "" } } */
> >
> > +
> >
> > +#include "test-math.h"
> >
> > +
> >
> > +/*
> >
> > +** test_double_int64_t___builtin_llrint:
> >
> > +**   ...
> >
> > +**   vsetvli\s+[atx][0-9]+,\s*zero,\s*e64,\s*m1,\s*ta,\s*ma
> >
> > +**   vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+
> >
> > +**   ...
> >
> > +*/
> >
> > +TEST_UNARY_CALL_CVT (double, int64_t, __builtin_llrint)
> >
> > diff --git
> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c
> >
> > new file mode 100644
> >
> > index 000..6b69f5568e9
> >
> > --- /dev/null
> >
> > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c
> >
> > @@ -0,0 +1,63 @@
> >
> > +/* { dg-do run { target { riscv_v && rv64 } } } */
> >
> > +/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize
> -fno-vect-cost-model -ffast-math" } */
> >
> > +
> >
> > +#include "test-math.h"
> >
> > +
> >
> > +#define ARRAY_SIZE 128
> >
> > +
> >
> > +double in[ARRAY_SIZE];
> >
> > +int64_t out[ARRAY_SIZE];
> >
> > +int64_t ref[ARRAY_SIZE];
> >
> > +
> >
> > +TEST_UNARY_CALL_CVT (double, int64_t, __builtin_llrint)
> >
> > +TEST_ASSERT (int64_t)
> >
> > +
> >
> > +TEST_INIT_CVT (double, 1.2, int64_t, __builtin_llrint (1.2), 1)
> >
> > +TEST_INIT

Re: [PATCH] RISCV: Bugfix for incorrect documentation heading nesting

2023-10-12 Thread Jeff Law




On 10/12/23 04:05, Mary Bennett wrote:

gcc/ChangeLog:
 * doc/extend.texi: Change subsubsection to subsection for
   CORE-V built-ins.
Thanks for jumping on it quickly.  I added the PR marker to the 
ChangeLog entry (bugzilla integration) and pushed this to the trunk.


jeff


[committed] wide-int: Fix build with gcc < 12 or clang++ [PR111787]

2023-10-12 Thread Jakub Jelinek
Hi!

While my wide_int patch bootstrapped/regtested fine when I used GCC 12
as system gcc, apparently it doesn't with GCC 11 and older or clang++.
For GCC versions before the PR96555 implementation of C++ DR1315, the compiler
complains about a template argument involving template parameters; clang++ does
the same, and additionally complains about a missing needs_write_val_arg static
data member in some wi::int_traits specializations.
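The DR1315 issue can be shown in isolation (a simplified sketch with hypothetical names, not the GCC sources): a partial specialization whose non-type argument is a dependent expression such as int_traits<wide<N>>::ptype is rejected by compilers lacking the DR1315 implementation, so the patch instead spells out one partial specialization per enum value.

```cpp
// Simplified model of the workaround.  Pre-DR1315 compilers reject a
// partial specialization keyed on the dependent expression
// int_traits<wide<N>>::ptype, so one specialization is written out per
// enum value instead.  All names here are hypothetical.
enum ptype_kind { INL_CONST_PRECISION, CONST_PRECISION };

template <int N> struct wide {};

template <typename T> struct int_traits;
template <int N> struct int_traits<wide<N>>
{
  static const ptype_kind ptype
    = N == 64 ? INL_CONST_PRECISION : CONST_PRECISION;
};

// One partial specialization per enum value, as in the fix.
template <typename T, ptype_kind P> struct ints_for;

template <int N> struct ints_for<wide<N>, INL_CONST_PRECISION>
{
  static int tag () { return 0; }
};

template <int N> struct ints_for<wide<N>, CONST_PRECISION>
{
  static int tag () { return 1; }
};
```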

I've so far rebuilt just the stage3 gcc subdirectory with this patch, and Tobias
and William made it through stage1 with it.  Committed to unbreak the build
for others.

2023-10-12  Jakub Jelinek  

PR bootstrap/111787
* tree.h (wi::int_traits ::needs_write_val_arg): New
static data member.
(int_traits >::needs_write_val_arg): Likewise.
(wi::ints_for): Provide separate partial specializations for
generic_wide_int > and INL_CONST_PRECISION or that
and CONST_PRECISION, rather than using
int_traits  >::precision_type as the second template
argument.
* rtl.h (wi::int_traits ::needs_write_val_arg): New
static data member.
* double-int.h (wi::int_traits ::needs_write_val_arg):
Likewise.

--- gcc/tree.h.jj   2023-10-12 16:01:04.0 +0200
+++ gcc/tree.h  2023-10-12 16:52:51.977954615 +0200
@@ -6237,6 +6237,7 @@ namespace wi
 static const enum precision_type precision_type = VAR_PRECISION;
 static const bool host_dependent_precision = false;
 static const bool is_sign_extended = false;
+static const bool needs_write_val_arg = false;
   };
 
   template 
@@ -6262,6 +6263,7 @@ namespace wi
   = N == ADDR_MAX_PRECISION ? INL_CONST_PRECISION : CONST_PRECISION;
 static const bool host_dependent_precision = false;
 static const bool is_sign_extended = true;
+static const bool needs_write_val_arg = false;
 static const unsigned int precision = N;
   };
 
@@ -6293,8 +6295,14 @@ namespace wi
   tree_to_poly_wide_ref to_poly_wide (const_tree);
 
   template 
-  struct ints_for  >,
-  int_traits  >::precision_type>
+  struct ints_for  >, INL_CONST_PRECISION>
+  {
+typedef generic_wide_int  > extended;
+static extended zero (const extended &);
+  };
+
+  template 
+  struct ints_for  >, CONST_PRECISION>
   {
 typedef generic_wide_int  > extended;
 static extended zero (const extended &);
@@ -6532,8 +6540,15 @@ wi::to_poly_wide (const_tree t)
 template 
 inline generic_wide_int  >
 wi::ints_for  >,
- wi::int_traits  >::precision_type
->::zero (const extended &x)
+ wi::INL_CONST_PRECISION>::zero (const extended &x)
+{
+  return build_zero_cst (TREE_TYPE (x.get_tree ()));
+}
+
+template 
+inline generic_wide_int  >
+wi::ints_for  >,
+ wi::CONST_PRECISION>::zero (const extended &x)
 {
   return build_zero_cst (TREE_TYPE (x.get_tree ()));
 }
--- gcc/rtl.h.jj2023-09-29 22:04:44.463012421 +0200
+++ gcc/rtl.h   2023-10-12 16:54:59.915240074 +0200
@@ -2270,6 +2270,7 @@ namespace wi
 /* This ought to be true, except for the special case that BImode
is canonicalized to STORE_FLAG_VALUE, which might be 1.  */
 static const bool is_sign_extended = false;
+static const bool needs_write_val_arg = false;
 static unsigned int get_precision (const rtx_mode_t &);
 static wi::storage_ref decompose (HOST_WIDE_INT *, unsigned int,
  const rtx_mode_t &);
--- gcc/double-int.h.jj 2023-10-12 16:01:04.260164202 +0200
+++ gcc/double-int.h2023-10-12 16:53:41.401292272 +0200
@@ -442,6 +442,7 @@ namespace wi
   {
 static const enum precision_type precision_type = INL_CONST_PRECISION;
 static const bool host_dependent_precision = true;
+static const bool needs_write_val_arg = false;
 static const unsigned int precision = HOST_BITS_PER_DOUBLE_INT;
 static unsigned int get_precision (const double_int &);
 static wi::storage_ref decompose (HOST_WIDE_INT *, unsigned int,

Jakub



Re: [PATCH] reg-notes.def: Fix up description of REG_NOALIAS

2023-10-12 Thread Jeff Law




On 10/12/23 03:41, Alex Coplan wrote:

Hi,

The description of the REG_NOALIAS note in reg-notes.def isn't quite
right. It describes it as being attached to call insns, but it is
instead attached to a move insn receiving the return value from a call.

This can be seen by looking at the code in calls.cc:expand_call which
attaches the note:

   emit_move_insn (temp, valreg);

   /* The return value from a malloc-like function cannot alias
  anything else.  */
   last = get_last_insn ();
   add_reg_note (last, REG_NOALIAS, temp);

Bootstrapped on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

 * reg-notes.def (NOALIAS): Correct comment.

OK
jeff


Re: Ping: [PATCH v2 1/2] testsuite: Add dg-require-atomic-cmpxchg-word

2023-10-12 Thread Jeff Law




On 10/12/23 08:38, Christophe Lyon wrote:

LGTM but I'm not a maintainer ;-)
LGTM too -- I usually try to stay out of libstdc++, but this 
looks simple enough.  Both patches in this series are OK.


jeff


Re: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-10-12 Thread Vineet Gupta




On 10/11/23 19:37, Hans-Peter Nilsson wrote:

```
foo2:
sext.w  a6,a1 <-- this goes away
beq a1,zero,.L4
li  a5,0
li  a0,0
.L3:
addwa4,a2,a5
addwa5,a3,a5
addwa0,a4,a0
bltua5,a6,.L3
ret
.L4:
li  a0,0
ret
```

...if your patch gets rid of that sign-extension above...


diff --git a/gcc/testsuite/gcc.target/riscv/pr111466.c 
b/gcc/testsuite/gcc.target/riscv/pr111466.c
new file mode 100644
index ..007792466a51
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr111466.c
@@ -0,0 +1,15 @@
+/* Simplified variant of gcc.target/riscv/zba-adduw.c.  */
+
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+int foo2(int unused, int n, unsigned y, unsigned delta){
+  int s = 0;
+  unsigned int x = 0;
+  for (;x<n; x += delta)
+    s += x+y;
+  return s;
+}
+
+/* { dg-final { scan-assembler "sext\.w" } } */
...then why test for the presence of a sign-extension
instruction in the test-case?

IOW, shouldn't that be a scan-assember-not?


Yes indeed.


(What am I missing?)


Nothing deep really, just a snafu on my side. I'll fix it in v2.

Thx,
-Vineet


brgds, H-P
PS. sorry I missed the Cauldron this year.  Hope to see you all next year!


Looking fwd to.

Thx,
-Vineet


Re: [PATCH v2] RISC-V: Make xtheadcondmov-indirect tests robust against instruction reordering

2023-10-12 Thread Jeff Law




On 10/12/23 07:06, Christoph Muellner wrote:

From: Christoph Müllner 

Fixes: c1bc7513b1d7 ("RISC-V: const: hide mvconst splitter from IRA")

A recent change broke the xtheadcondmov-indirect tests, because the order of
emitted instructions changed. Since the test is too strict when testing for
a fixed instruction order, let's change the tests to simply count instructions,
like it is done for similar tests.

Reported-by: Patrick O'Neill 
Signed-off-by: Christoph Müllner 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadcondmov-indirect.c: Make robust against
instruction reordering.

OK for the trunk.

jeff


Re: [PATCH] RISC-V: Fix the riscv_legitimize_poly_move issue on targets where the minimal VLEN exceeds 512.

2023-10-12 Thread Jeff Law




On 10/11/23 17:17, Kito Cheng wrote:

Yeah, I'll take your suggestion and go ahead, Robin's suggestion is
great but it's just a little too magic :P
So there'll be a V2 of this patch, right?  Just want to make sure state 
is correct in patchwork.


jeff


Re: [PATCH] RISC-V: Fix the riscv_legitimize_poly_move issue on targets where the minimal VLEN exceeds 512.

2023-10-12 Thread Kito Cheng
Yeah, will send v2 today

Jeff Law  wrote on Thursday, October 12, 2023 at 09:15:

>
>
> On 10/11/23 17:17, Kito Cheng wrote:
> > Yeah, I'll take your suggestion and go ahead, Robin's suggestion is
> > great but it's just a little too magic :P
> So there'll be a V2 of this patch, right?  Just want to make sure state
> is correct in patchwork.
>
> jeff
>


Re: [PATCH] C99 test suite readiness: Mark some C89 tests

2023-10-12 Thread Jeff Law




On 10/11/23 10:42, Florian Weimer wrote:

Add -std=gnu89 to some tests which evidently target C89-only language
features.

gcc/testsuite/

* gcc.c-torture/compile/920501-11.c: Compile with -std=gnu89.
* gcc.c-torture/compile/920501-23.c: Likewise.
* gcc.c-torture/compile/920501-8.c: Likewise.
* gcc.c-torture/compile/920701-1.c: Likewise.
* gcc.c-torture/compile/930529-1.c: Likewise.

OK
jeff


Re: [PATCH] C99 test suite conversation: Some unverified test case adjustments

2023-10-12 Thread Jeff Law




On 10/11/23 10:53, Florian Weimer wrote:

These changes are assumed not to interfere with the test objective,
but it was not possible to reproduce the historic test case failures
(with or without the modification here).

gcc/testsuite/

* gcc.c-torture/compile/2105-1.c: Add missing int return type.
Call __builtin_exit instead of exit.
* gcc.c-torture/compile/2105-2.c: Add missing void types.
* gcc.c-torture/compile/2211-1.c (Lstream_fputc, Lstream_write)
(Lstream_flush_out, parse_doprnt_spec): Add missing function
declaration.
* gcc.c-torture/compile/2224-1.c (call_critical_lisp_code):
Declare.
* gcc.c-torture/compile/2314-2.c: Add missing void types.
* gcc.c-torture/compile/20090917-1.c (foo): Likewise.
* gcc.c-torture/compile/980816-1.c (XtVaCreateManagedWidget)
(XtAddCallback):Likewise.
* gcc.c-torture/compile/pr49474.c: Use struct
gfc_formal_arglist * instead of (implied) int type.
* gcc.c-torture/execute/2000-1.c (foo): Add cast to
char *.
(main): Call __builtin_abort and __builtin_exit.

OK
jeff


Re: [PATCH] C99 testsuite readiness: Some verified test case adjustments

2023-10-12 Thread Jeff Law




On 10/11/23 10:55, Florian Weimer wrote:

The updated test cases still reproduce the bugs with old compilers.

gcc/testsuite/

* gcc.c-torture/compile/pc44485.c (func_21): Add missing cast.
* gcc.c-torture/compile/pr106101.c: Use builtins to avoid
calls to undeclared functions.  Change type of yyvsp to
char ** and introduce yyvsp1 to avoid type errors.
* gcc.c-torture/execute/pr111331-1.c: Add missing int.
* gcc.dg/pr100512.c: Unreduce test case and suppress only
-Wpointer-to-int-cast.
* gcc.dg/pr103003.c: Likewise.
* gcc.dg/pr103451.c: Add cast to long and suppress
-Wdiv-by-zero only.
* gcc.dg/pr68435.c: Avoid implicit int and missing
static function implementation warning.

OK
jeff


[Patch] libgomp.texi: Clarify OMP_TARGET_OFFLOAD=mandatory

2023-10-12 Thread Tobias Burnus

I noticed that while OMP_DEFAULT_DEVICE was updated a ref to
OMP_TARGET_OFFLOAD (→ mandatory case) was missing. And
OMP_TARGET_OFFLOAD wasn't updated at all for those changes.

I hope the new version is clearer.

Current versions:
https://gcc.gnu.org/onlinedocs/libgomp/OMP_005fDEFAULT_005fDEVICE.html
https://gcc.gnu.org/onlinedocs/libgomp/OMP_005fTARGET_005fOFFLOAD.html

I have changed the reference to OpenMP 5.2 (from 4.5 and 5.0) as only
the latter fully describes the current behavior; I first left in the
current specification references and only added the v5.2 one, but I
then did not see an advantage of doing so.

Any comments or suggestions before I commit the attached patch?

Tobias
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 
Munich; limited liability company; Managing Directors: Thomas Heurung, 
Frank Thürauf; Registered office: Munich; Register court: Munich, HRB 106955
libgomp.texi: Clarify OMP_TARGET_OFFLOAD=mandatory

In OpenMP 5.0/5.1, the semantic of OMP_TARGET_OFFLOAD=mandatory was
insufficiently specified; 5.2 clarified this with extensions/clarifications
(omp_initial_device, omp_invalid_device, "conforming device number").
GCC's implementation matches OpenMP 5.2.

libgomp/ChangeLog:

	* libgomp.texi (OMP_DEFAULT_DEVICE): Update spec ref; add @ref to
	OMP_TARGET_OFFLOAD.
	(OMP_TARGET_OFFLOAD): Update spec ref; add @ref to OMP_DEFAULT_DEVICE;
	clarify MANDATORY behavior.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index ba8e9013814..46c4dcf90f1 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -2831,9 +2831,10 @@ device number 0 will be used.
 
 @item @emph{See also}:
 @ref{omp_get_default_device}, @ref{omp_set_default_device},
+@ref{OMP_TARGET_OFFLOAD}
 
 @item @emph{Reference}:
-@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.13
+@uref{https://www.openmp.org, OpenMP specification v5.2}, Section 21.2.7
 @end table
 
 
@@ -3133,15 +3134,25 @@ variable can be set to one of three values - @code{MANDATORY}, @code{DISABLED}
 or @code{DEFAULT}.
 
 If set to @code{MANDATORY}, the program will terminate with an error if
-the offload device is not present or is not supported.  If set to
-@code{DISABLED}, then offloading is disabled and all code will run on the
-host. If set to @code{DEFAULT}, the program will try offloading to the
+any device construct or device memory routine uses a device that is unavailable
+or not supported by the implementation, or uses a non-conforming device number.
+If set to @code{DISABLED}, then offloading is disabled and all code will run on
+the host. If set to @code{DEFAULT}, the program will try offloading to the
 device first, then fall back to running code on the host if it cannot.
 
 If undefined, then the program will behave as if @code{DEFAULT} was set.
 
+Note: Even with @code{MANDATORY}, there will be no run-time termination when
+the device number in a @code{device} clause or argument to a device memory
+routine is for host, which includes using the device number in the
+@var{default-device-var} ICV.  However, the initial value of
+the @var{default-device-var} ICV is affected by @code{MANDATORY}.
+
+@item @emph{See also}:
+@ref{OMP_DEFAULT_DEVICE}
+
 @item @emph{Reference}:
-@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.17
+@uref{https://www.openmp.org, OpenMP specification v5.2}, Section 21.2.8
 @end table
 
 


Re: [PATCH] TEST: Add vectorization check

2023-10-12 Thread Jeff Law




On 10/9/23 08:59, Juzhe-Zhong wrote:

These cases won't check SLP for load_lanes support target.

Add vectorization check for situations.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr97832-2.c: Add vectorization check.
* gcc.dg/vect/pr97832-3.c: Ditto.
* gcc.dg/vect/pr97832-4.c: Ditto.
So has this been checked on anything other than riscv?  It would be good 
to know that these aren't likely to introduce new FAILs.


I would think you could build an x86 compiler and just run the vect.exp 
tests with and without this patch (so you don't need to do a full 
regression run).


Assuming it doesn't cause any new x86 FAILs, this is fine for the trunk.

jeff



Re: [PATCH][_Hashtable] Avoid redundant usage of rehash policy

2023-10-12 Thread François Dumont

Now that abi breakage is fixed and hoping that Friday is review day :-)

Ping !

On 17/09/2023 22:41, François Dumont wrote:

libstdc++: [_Hashtable] Avoid redundant usage of rehash policy

Bypass usage of __detail::__distance_fwd and check for need to rehash 
when assigning an initializer_list to

an unordered_multimap or unordered_multiset.

libstdc++-v3/ChangeLog:

    * include/bits/hashtable_policy.h
    (_Insert_base<>::_M_insert_range(_InputIte, _InputIte, 
_NodeGen&)): New.
    (_Insert_base<>::_M_insert_range(_InputIte, _InputIte, 
true_type)): Use latter.
    (_Insert_base<>::_M_insert_range(_InputIte, _InputIte, 
false_type)): Likewise.

    * include/bits/hashtable.h
(_Hashtable<>::operator=(initializer_list)): Likewise.
    (_Hashtable<>::_Hashtable(_InputIte, _InputIte, size_type, const 
_Hash&, const _Equal&,

    const allocator_type&, false_type)): Likewise.

Ok to commit ?

François



Re: [PATCH v17 02/39] c-family, c++: Look up built-in traits through gperf

2023-10-12 Thread Patrick Palka
On Wed, 11 Oct 2023, Ken Matsui wrote:

> Since RID_MAX soon reaches 255 and all traits are used approximately once in
> a C++ translation unit, this patch instead uses only RID_TRAIT_EXPR and
> RID_TRAIT_TYPE for all traits and uses gperf to look up the specific trait.
> 
> gcc/c-family/ChangeLog:
> 
>   * c-common.cc (c_common_reswords): Map all traits to RID_TRAIT_EXPR
>   and RID_TRAIT_TYPE instead.
>   * c-common.h (enum rid): Remove all existing RID values for traits.
>   Use RID_TRAIT_EXPR and RID_TRAIT_TYPE instead.
> 
> gcc/cp/ChangeLog:
> 
>   * Make-lang.in: Add targets to generate cp-trait.gperf and
>   cp-trait.h.
>   * cp-objcp-common.cc (names_builtin_p): Remove all existing RID values
>   for traits.  Use RID_TRAIT_EXPR and RID_TRAIT_TYPE instead.
>   * parser.cc (cp_keyword_starts_decl_specifier_p): Likewise, for
>   type-yielding traits.  Use RID_TRAIT_TYPE instead.
>   (cp_parser_simple_type_specifier): Likewise.
>   (cp_parser_primary_expression): Likewise, for expression-yielding
>   traits.  Use RID_TRAIT_EXPR instead.
>   (cp_parser_trait): Look up traits through gperf instead of enum rid.
>   * lex.cc (init_reswords): Make ridpointers for RID_TRAIT_EXPR and
>   RID_TRAIT_TYPE empty, which do not have corresponding unique
>   canonical spellings.
>   * cp-trait-head.in: New file.
>   * cp-trait.gperf: New file.
>   * cp-trait.h: New file.
> 
> Signed-off-by: Ken Matsui 
> ---
>  gcc/c-family/c-common.cc  |  12 +-
>  gcc/c-family/c-common.h   |   7 +-
>  gcc/cp/Make-lang.in   |  26 
>  gcc/cp/cp-objcp-common.cc |   6 +-
>  gcc/cp/cp-trait-head.in   |  30 +
>  gcc/cp/cp-trait.gperf |  74 
>  gcc/cp/cp-trait.h | 247 ++
>  gcc/cp/lex.cc |   5 +
>  gcc/cp/parser.cc  |  70 ---
>  9 files changed, 419 insertions(+), 58 deletions(-)
>  create mode 100644 gcc/cp/cp-trait-head.in
>  create mode 100644 gcc/cp/cp-trait.gperf
>  create mode 100644 gcc/cp/cp-trait.h
> 
> diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
> index f044db5b797..f219ccd29e5 100644
> --- a/gcc/c-family/c-common.cc
> +++ b/gcc/c-family/c-common.cc
> @@ -508,12 +508,16 @@ const struct c_common_resword c_common_reswords[] =
>{ "wchar_t",   RID_WCHAR,  D_CXXONLY },
>{ "while", RID_WHILE,  0 },
>  
> -#define DEFTRAIT(TCC, CODE, NAME, ARITY) \
> -  { NAME,RID_##CODE, D_CXXONLY },
> +#define DEFTRAIT_EXPR(CODE, NAME, ARITY) \
> +  { NAME,RID_TRAIT_EXPR, D_CXXONLY },
>  #include "cp/cp-trait.def"
> -#undef DEFTRAIT
> +#undef DEFTRAIT_EXPR
>/* An alias for __is_same.  */
> -  { "__is_same_as",  RID_IS_SAME,D_CXXONLY },
> +  { "__is_same_as",  RID_TRAIT_EXPR, D_CXXONLY },
> +#define DEFTRAIT_TYPE(CODE, NAME, ARITY) \
> +  { NAME,RID_TRAIT_TYPE, D_CXXONLY },
> +#include "cp/cp-trait.def"
> +#undef DEFTRAIT_TYPE
>  
>/* C++ transactional memory.  */
>{ "synchronized",  RID_SYNCHRONIZED, D_CXX_OBJC | D_TRANSMEM },
> diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
> index 1fdba7ef3ea..a1a641f4175 100644
> --- a/gcc/c-family/c-common.h
> +++ b/gcc/c-family/c-common.h
> @@ -168,10 +168,9 @@ enum rid
>RID_BUILTIN_LAUNDER,
>RID_BUILTIN_BIT_CAST,
>  
> -#define DEFTRAIT(TCC, CODE, NAME, ARITY) \
> -  RID_##CODE,
> -#include "cp/cp-trait.def"
> -#undef DEFTRAIT
> +  /* C++ traits, defined in cp-trait.def.  */
> +  RID_TRAIT_EXPR,
> +  RID_TRAIT_TYPE,
>  
>/* C++11 */
>RID_CONSTEXPR, RID_DECLTYPE, RID_NOEXCEPT, RID_NULLPTR, RID_STATIC_ASSERT,
> diff --git a/gcc/cp/Make-lang.in b/gcc/cp/Make-lang.in
> index 2727fb7f8cc..a67d1c3e9f3 100644
> --- a/gcc/cp/Make-lang.in
> +++ b/gcc/cp/Make-lang.in
> @@ -34,6 +34,8 @@
>  # - the compiler proper (eg: cc1plus)
>  # - define the names for selecting the language in LANGUAGES.
>  
> +AWK = @AWK@
> +
>  # Actual names to use when installing a native compiler.
>  CXX_INSTALL_NAME := $(shell echo c++|sed '$(program_transform_name)')
>  GXX_INSTALL_NAME := $(shell echo g++|sed '$(program_transform_name)')
> @@ -186,6 +188,30 @@ endif
>  # This is the file that depends on the generated header file.
>  cp/name-lookup.o: $(srcdir)/cp/std-name-hint.h
>  
> +# We always need the dependency on the .gperf file
> +# because it itself is generated.
> +ifeq ($(ENABLE_MAINTAINER_RULES), true)
> +$(srcdir)/cp/cp-trait.h: $(srcdir)/cp/cp-trait.gperf
> +else
> +$(srcdir)/cp/cp-trait.h: | $(srcdir)/cp/cp-trait.gperf
> +endif
> + gperf -o -C -E -k '8' -D -N 'find' -L C++ \
> + $(srcdir)/cp/cp-trait.gperf --output-file 
> $(srcdir)/cp/cp-trait.h
> +
> +# The cp-trait.gperf file itself is generated from
> +# cp-trait-head.in and cp-trait.def files.
> +$(srcdir)/cp/cp-trait.gperf: $(srcdir)/cp/cp-trait-head.in 
> $(srcdir)/cp/cp-trait.def
> + cat $< > $@
> + $(AWK) -F', *' '/^DEFTRAIT_/

Re: [PATCH] tree-optimization/111773 - avoid CD-DCE of noreturn special calls

2023-10-12 Thread Jan Hubicka
> The support to elide calls to allocation functions in DCE runs into
> the issue that when implementations are discovered noreturn we end
> up DCEing the calls anyway, leaving blocks without termination and
> without outgoing edges which is both invalid IL and wrong-code when
> as in the example the noreturn call would throw.  The following
> avoids taking advantage of both noreturn and the ability to elide
> allocation at the same time.
> 
> For the testcase it's valid to throw or return 10 by eliding the
> allocation.  But we have to do either where currently we'd run
> off the function.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> Honza, any objections here?

Looks good to me.  Optimizing out noreturn new seems like odd
optimization anyway.

Honza
> 
> Thanks,
> Richard.
> 
>   PR tree-optimization/111773
>   * tree-ssa-dce.cc (mark_stmt_if_obviously_necessary): Do
>   not elide noreturn calls that are reflected to the IL.
> 
>   * g++.dg/torture/pr111773.C: New testcase.


Re: [PATCH v2] RISC-V: Make xtheadcondmov-indirect tests robust against instruction reordering

2023-10-12 Thread Kito Cheng
Sorry for the late comment after Jeff say ok, but I guess we may
consider add "-fno-schedule-insns -fno-schedule-insns2" to avoid
disturbing from schedule like some of our test case in
gcc/testsuite/gcc.target/riscv/rvv?

On Thu, Oct 12, 2023 at 9:12 AM Jeff Law  wrote:
>
>
>
> On 10/12/23 07:06, Christoph Muellner wrote:
> > From: Christoph Müllner 
> >
> > Fixes: c1bc7513b1d7 ("RISC-V: const: hide mvconst splitter from IRA")
> >
> > A recent change broke the xtheadcondmov-indirect tests, because the order of
> > emitted instructions changed. Since the test is too strict when testing for
> > a fixed instruction order, let's change the tests to simply count 
> > instructions,
> > like it is done for similar tests.
> >
> > Reported-by: Patrick O'Neill 
> > Signed-off-by: Christoph Müllner 
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/xtheadcondmov-indirect.c: Make robust against
> >   instruction reordering.
> OK for the trunk.
>
> jeff


Re: [PATCH v2] RISC-V: Make xtheadcondmov-indirect tests robust against instruction reordering

2023-10-12 Thread Kito Cheng
but anyway, I don't have a strong opinion for either way, just go
ahead no matter which one you choose.

On Thu, Oct 12, 2023 at 11:28 AM Kito Cheng  wrote:
>
> Sorry for the late comment after Jeff say ok, but I guess we may
> consider add "-fno-schedule-insns -fno-schedule-insns2" to avoid
> disturbing from schedule like some of our test case in
> gcc/testsuite/gcc.target/riscv/rvv?
>
> On Thu, Oct 12, 2023 at 9:12 AM Jeff Law  wrote:
> >
> >
> >
> > On 10/12/23 07:06, Christoph Muellner wrote:
> > > From: Christoph Müllner 
> > >
> > > Fixes: c1bc7513b1d7 ("RISC-V: const: hide mvconst splitter from IRA")
> > >
> > > A recent change broke the xtheadcondmov-indirect tests, because the order 
> > > of
> > > emitted instructions changed. Since the test is too strict when testing 
> > > for
> > > a fixed instruction order, let's change the tests to simply count 
> > > instructions,
> > > like it is done for similar tests.
> > >
> > > Reported-by: Patrick O'Neill 
> > > Signed-off-by: Christoph Müllner 
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.target/riscv/xtheadcondmov-indirect.c: Make robust against
> > >   instruction reordering.
> > OK for the trunk.
> >
> > jeff


[PATCH v2] RISC-V: Fix the riscv_legitimize_poly_move issue on targets where the minimal VLEN exceeds 512.

2023-10-12 Thread Kito Cheng
riscv_legitimize_poly_move was expected to ensure the poly value is at most 32
times smaller than the minimal VLEN (32 being derived from '4096 / 128').
This assumption held when our mode modeling was not so precisely defined.
However, now that we have modeled the mode size according to the correct minimal
VLEN info, the size difference between different RVV modes can be up to 64
times. For instance, comparing RVVMF64BI and RVVMF1BI, the sizes are [1, 1]
versus [64, 64] respectively.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_poly_move): Bump
max_power to 64.

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/autovec/bug-01.C: New.
* g++.target/riscv/rvv/rvv.exp: Add autovec folder.
---
 gcc/config/riscv/riscv.cc |  5 ++-
 gcc/config/riscv/riscv.h  |  5 +++
 .../g++.target/riscv/rvv/autovec/bug-01.C | 33 +++
 gcc/testsuite/g++.target/riscv/rvv/rvv.exp|  3 ++
 4 files changed, 43 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/autovec/bug-01.C

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 739fc77e785..d43bc765ce7 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2411,9 +2411,8 @@ riscv_legitimize_poly_move (machine_mode mode, rtx dest, 
rtx tmp, rtx src)
 }
   else
 {
-  /* FIXME: We currently DON'T support TARGET_MIN_VLEN > 4096.  */
-  int max_power = exact_log2 (4096 / 128);
-  for (int i = 0; i < max_power; i++)
+  int max_power = exact_log2 (MAX_POLY_VARIANT);
+  for (int i = 0; i <= max_power; i++)
{
  int possible_div_factor = 1 << i;
  if (factor % (vlenb / possible_div_factor) == 0)
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 4b8d57509fb..3d2723f5339 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -1197,4 +1197,9 @@ extern void riscv_remove_unneeded_save_restore_calls 
(void);
 #define OPTIMIZE_MODE_SWITCHING(ENTITY) (TARGET_VECTOR)
 #define NUM_MODES_FOR_MODE_SWITCHING {VXRM_MODE_NONE, riscv_vector::FRM_NONE}
 
+
+/* The size difference between different RVV modes can be up to 64 times.
+   e.g. RVVMF64BI vs RVVMF1BI on zvl512b, which is [1, 1] vs [64, 64].  */
+#define MAX_POLY_VARIANT 64
+
 #endif /* ! GCC_RISCV_H */
diff --git a/gcc/testsuite/g++.target/riscv/rvv/autovec/bug-01.C 
b/gcc/testsuite/g++.target/riscv/rvv/autovec/bug-01.C
new file mode 100644
index 000..fd10009ddbe
--- /dev/null
+++ b/gcc/testsuite/g++.target/riscv/rvv/autovec/bug-01.C
@@ -0,0 +1,33 @@
+/* { dg-options "-march=rv64gcv_zvl512b -mabi=lp64d -O3" } */
+
+class c {
+public:
+  int e();
+  void j();
+};
+float *d;
+class k {
+  int f;
+
+public:
+  k(int m) : f(m) {}
+  float g;
+  float h;
+  void n(int m) {
+for (int i; i < m; i++) {
+  d[0] = d[1] = d[2] = g;
+  d[3] = h;
+  d += f;
+}
+  }
+};
+c l;
+void o() {
+  int b = l.e();
+  k a(b);
+  for (;;)
+if (b == 4) {
+  l.j();
+  a.n(2);
+}
+}
diff --git a/gcc/testsuite/g++.target/riscv/rvv/rvv.exp 
b/gcc/testsuite/g++.target/riscv/rvv/rvv.exp
index 249530580d7..c30d6e93144 100644
--- a/gcc/testsuite/g++.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/g++.target/riscv/rvv/rvv.exp
@@ -40,5 +40,8 @@ set CFLAGS "-march=$gcc_march -O3"
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/base/*.C]] \
"" $CFLAGS
 
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/*.\[C\]]] \
+"" $CFLAGS
+
 # All done.
 dg-finish
-- 
2.34.1



Re: [Patch] libgomp.texi: Clarify OMP_TARGET_OFFLOAD=mandatory

2023-10-12 Thread Jakub Jelinek
On Thu, Oct 12, 2023 at 06:37:00PM +0200, Tobias Burnus wrote:
> libgomp.texi: Clarify OMP_TARGET_OFFLOAD=mandatory
> 
> In OpenMP 5.0/5.1, the semantic of OMP_TARGET_OFFLOAD=mandatory was
> insufficiently specified; 5.2 clarified this with extensions/clarifications
> (omp_initial_device, omp_invalid_device, "conforming device number").
> GCC's implementation matches OpenMP 5.2.
> 
> libgomp/ChangeLog:
> 
>   * libgomp.texi (OMP_DEFAULT_DEVICE): Update spec ref; add @ref to
>   OMP_TARGET_OFFLOAD.
>   (OMP_TARGET_OFFLOAD): Update spec ref; add @ref to OMP_DEFAULT_DEVICE;
>   clarify MANDATORY behavior.

LGTM.

Jakub



Re: [PATCH 2/1] c++: more non-static memfn call dependence cleanup [PR106086]

2023-10-12 Thread Patrick Palka
On Tue, 26 Sep 2023, Patrick Palka wrote:

> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
> for trunk?
> 
> -- >8 --
> 
> This follow-up patch removes some more repetition of the type-dependent

On second thought there's no good reason to split these patches into a two
part series, so here's a single squashed patch:

-- >8 --

Subject: [PATCH] c++: non-static memfn call dependence cleanup [PR106086]

In cp_parser_postfix_expression and in the CALL_EXPR case of
tsubst_copy_and_build, we essentially repeat the type-dependent and
COMPONENT_REF callee cases of finish_call_expr.  This patch deduplicates
this logic by making both spots consistently go through finish_call_expr.

This allows us to easily fix PR106086 -- which is about us neglecting to
capture 'this' when we resolve a use of a non-static member function of
the current instantiation only at lambda regeneration time -- by moving
the call to maybe_generic_this_capture from the parser to finish_call_expr
so that we consider capturing 'this' at regeneration time as well.

PR c++/106086

gcc/cp/ChangeLog:

* parser.cc (cp_parser_postfix_expression): Consolidate three
calls to finish_call_expr, one to build_new_method_call and
one to build_min_nt_call_vec into one call to finish_call_expr.
Don't call maybe_generic_this_capture here.
* pt.cc (tsubst_copy_and_build) : Remove
COMPONENT_REF callee handling.
(type_dependent_expression_p): Use t_d_object_e_p instead of
t_d_e_p for COMPONENT_REF and OFFSET_REF.
* semantics.cc (finish_call_expr): In the type-dependent case,
call maybe_generic_this_capture here instead.

gcc/testsuite/ChangeLog:

* g++.dg/template/crash127.C: Expect additional error due to
being able to check the member access expression ahead of time.
Strengthen the test by not instantiating the class template.
* g++.dg/cpp1y/lambda-generic-this5.C: New test.
---
 gcc/cp/parser.cc  | 52 +++
 gcc/cp/pt.cc  | 27 +-
 gcc/cp/semantics.cc   | 12 +++--
 .../g++.dg/cpp1y/lambda-generic-this5.C   | 22 
 gcc/testsuite/g++.dg/template/crash127.C  |  3 +-
 5 files changed, 38 insertions(+), 78 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/lambda-generic-this5.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index f3abae716fe..b00ef36b831 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -8047,54 +8047,12 @@ cp_parser_postfix_expression (cp_parser *parser, bool 
address_p, bool cast_p,
close_paren_loc);
iloc_sentinel ils (combined_loc);
 
-   if (TREE_CODE (postfix_expression) == COMPONENT_REF)
- {
-   tree instance = TREE_OPERAND (postfix_expression, 0);
-   tree fn = TREE_OPERAND (postfix_expression, 1);
-
-   if (processing_template_decl
-   && (type_dependent_object_expression_p (instance)
-   || (!BASELINK_P (fn)
-   && TREE_CODE (fn) != FIELD_DECL)
-   || type_dependent_expression_p (fn)
-   || any_type_dependent_arguments_p (args)))
- {
-   maybe_generic_this_capture (instance, fn);
-   postfix_expression
- = build_min_nt_call_vec (postfix_expression, args);
- }
-   else if (BASELINK_P (fn))
- {
- postfix_expression
-   = (build_new_method_call
-  (instance, fn, &args, NULL_TREE,
-   (idk == CP_ID_KIND_QUALIFIED
-? LOOKUP_NORMAL|LOOKUP_NONVIRTUAL
-: LOOKUP_NORMAL),
-   /*fn_p=*/NULL,
-   complain));
- }
-   else
- postfix_expression
-   = finish_call_expr (postfix_expression, &args,
-   /*disallow_virtual=*/false,
-   /*koenig_p=*/false,
-   complain);
- }
-   else if (TREE_CODE (postfix_expression) == OFFSET_REF
-|| TREE_CODE (postfix_expression) == MEMBER_REF
-|| TREE_CODE (postfix_expression) == DOTSTAR_EXPR)
+   if (TREE_CODE (postfix_expression) == OFFSET_REF
+   || TREE_CODE (postfix_expression) == MEMBER_REF
+   || TREE_CODE (postfix_expression) == DOTSTAR_EXPR)
  postfix_expression = (build_offset_ref_call_from_tree
(postfix_expression, &args,
 complain));
-   else if (idk == CP_ID_KIND_QUALIFIED)
- /* A call to a static class

Re: [PATCH v6] c++: Check for indirect change of active union member in constexpr [PR101631,PR102286]

2023-10-12 Thread Jason Merrill

On 10/12/23 04:53, Nathaniel Shead wrote:

On Wed, Oct 11, 2023 at 12:48:12AM +1100, Nathaniel Shead wrote:

On Mon, Oct 09, 2023 at 04:46:46PM -0400, Jason Merrill wrote:

On 10/8/23 21:03, Nathaniel Shead wrote:

Ping for https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631203.html

+ && (TREE_CODE (t) == MODIFY_EXPR
+ /* Also check if initializations have implicit change of active
+member earlier up the access chain.  */
+ || !refs->is_empty())


I'm not sure what the cumulative point of these two tests is.  TREE_CODE (t)
will be either MODIFY_EXPR or INIT_EXPR, and either should be OK.

As I understand it, the problematic case is something like
constexpr-union2.C, where we're also looking at a MODIFY_EXPR.  So what is
this check doing?


The reasoning was to correctly handle cases like the following (in
constexpr-union6.C):

   constexpr int test1() {
 U u {};
 std::construct_at(&u.s, S{ 1, 2 });
 return u.s.b;
   }
   static_assert(test1() == 2);

The initialisation of &u.s here is not a member access expression within
the call to std::construct_at, since it's just a pointer, but this code
is still legal; in general, an INIT_EXPR to initialise a union member
should always be OK (I believe?), hence constraining to just
MODIFY_EXPR.

However, just that would then (incorrectly) allow all the following
cases in that test to compile, such as

   constexpr int test2() {
 U u {};
 int* p = &u.s.b;
 std::construct_at(p, 5);
 return u.s.b;
   }
   constexpr int x2 = test2();

since the INIT_EXPR is really only initialising 'b', but the implicit
"modification" of active member to 'u.s' is illegal.

Maybe a better way of expressing this condition would be something like
this?

   /* An INIT_EXPR of the last member in an access chain is always OK,
  but still check implicit change of members earlier on; see
  cpp2a/constexpr-union6.C.  */
   && !(TREE_CODE (t) == INIT_EXPR && refs->is_empty ())

Otherwise I'll see if I can rework some of the other conditions instead.


Incidentally, I think constexpr-union6.C could use a test where we pass &u.s
to a function other than construct_at, and then try (and fail) to assign to
the b member from that function.

Jason



Sounds good; I've added the following test:

   constexpr void foo(S* s) {
 s->b = 10;  // { dg-error "accessing .U::s. member instead of initialized 
.U::k." }
   }
   constexpr int test3() {
 U u {};
 foo(&u.s);  // { dg-message "in .constexpr. expansion" }
 return u.s.b;
   }
   constexpr int x3 = test3();  // { dg-message "in .constexpr. expansion" }

Incidentally I found this particular example caused a very unhelpful
error + ICE due to reporting that S could not be value-initialized in
the current version of the patch. The updated version below fixes that
by using 'build_zero_init' instead -- is this an appropriate choice
here?

A similar (but unrelated) issue is with e.g.
   
   struct S { const int a; int b; };

   union U { int k; S s; };

   constexpr int test() {
 U u {};
 return u.s.b;
   }
   constexpr int x = test();

giving me this pretty unhelpful error message:

/home/ns/main.cpp:8:23:   in ‘constexpr’ expansion of ‘test()’
/home/ns/main.cpp:6:12: error: use of deleted function ‘S::S()’
 6 |   return u.s.b;
   |  ~~^
/home/ns/main.cpp:1:8: note: ‘S::S()’ is implicitly deleted because the default 
definition would be ill-formed:
 1 | struct S { const int a; int b; };
   |^
/home/ns/main.cpp:1:8: error: uninitialised const member in ‘struct S’
/home/ns/main.cpp:1:22: note: ‘const int S::a’ should be initialised
 1 | struct S { const int a; int b; };
   |  ^
/home/ns/main.cpp:8:23:   in ‘constexpr’ expansion of ‘test()’
/home/ns/main.cpp:6:12: error: use of deleted function ‘S::S()’
 6 |   return u.s.b;
   |  ~~^
/home/ns/main.cpp:8:23:   in ‘constexpr’ expansion of ‘test()’
/home/ns/main.cpp:6:12: error: use of deleted function ‘S::S()’

but I'll try and fix this separately (it exists on current trunk without
this patch as well).


While attempting to fix this I found a much better way of handling
value-initialised unions. Here's a new version of the patch which also
includes the fix for accessing the wrong member of a value-initialised
union as well.

Additionally includes an `auto_diagnostic_group` for the `inform`
diagnostics as Marek helpfully informed me about in my other patch.

Bootstrapped and regtested on x86_64-pc-linux-gnu.


@@ -4496,21 +4491,36 @@ cxx_eval_component_reference (const constexpr_ctx *ctx, 
tree t,

break;
}
  }
-  if (TREE_CODE (TREE_TYPE (whole)) == UNION_TYPE
-  && CONSTRUCTOR_NELTS (whole) > 0)
+  if (TREE_CODE (TREE_TYPE (whole)) == UNION_TYPE)
  {
-  /* DR 1188 says we don't have to deal with this.  */
-  if (!ctx->quiet)
+  if (CONSTRUCTOR_NELTS (whole) > 0)
{
-   

[PATCH] genemit: Split insn-emit.cc into ten files.

2023-10-12 Thread Robin Dapp
Hi,

on riscv insn-emit.cc has grown to over 1.2 million lines of code and
compiling it takes considerable time.
Therefore, this patch adjusts genemit to create ten files, insn-emit-1.cc
to insn-emit-10.cc.  In order to do so it first counts the number of
available patterns, calculates the number of patterns per file and
starts a new file whenever that number is reached.  Most of the changes
are mechanical - genemit used to output to stdout and I changed it to
write to a FILE.

Similar to match.pd, a configure option --with-insnemit-partitions=num
is introduced that makes the number of partitions configurable.

This survived some bootstraps on aarch64, x86 and power10 as well as
regular cross builds on riscv.  I didn't do extensive timing on targets
but on my machine the compilation of all 10 insn-emit-...cc files for
riscv takes about 40 seconds now while the full file took roughly
10 minutes.

Testsuite is unchanged on all but x86 where, strangely, I saw several
illegal instructions in the pch tests.  Those were not reproducible
in a second manual test suite run.  I'm just running another full
bootstrap and testsuite cycle with the latest trunk.

Still figured I'd send the current state in order to get some feedback
about the general approach.

Regards
 Robin

gcc/ChangeLog:

PR bootstrap/84402
PR target/111600

* Makefile.in: Handle split insn-emit.cc.
* configure: Regenerate.
* configure.ac: Add --with-insnemit-partitions.
* genemit.cc (output_peephole2_scratches): Print to file instead
of stdout.
(print_code): Ditto.
(gen_rtx_scratch): Ditto.
(gen_exp): Ditto.
(gen_emit_seq): Ditto.
(emit_c_code): Ditto.
(gen_insn): Ditto.
(gen_expand): Ditto.
(gen_split): Ditto.
(output_add_clobbers): Ditto.
(output_added_clobbers_hard_reg_p): Ditto.
(print_overload_arguments): Ditto.
(print_overload_test): Ditto.
(handle_overloaded_code_for): Ditto.
(handle_overloaded_gen): Ditto.
(print_header): New function.
(handle_arg): New function.
(main): Split output into 10 files.
* gensupport.cc (count_patterns): New function.
* gensupport.h (count_patterns): Define.
* read-md.cc (md_reader::print_md_ptr_loc): Add file argument.
* read-md.h (class md_reader): Change definition.
---
 gcc/Makefile.in   |  37 +++-
 gcc/configure |  24 +-
 gcc/configure.ac  |  13 ++
 gcc/genemit.cc| 542 +-
 gcc/gensupport.cc |  36 +++
 gcc/gensupport.h  |   1 +
 gcc/read-md.cc|   4 +-
 gcc/read-md.h |   2 +-
 8 files changed, 404 insertions(+), 255 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 9cc16268abf..1988327f311 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -236,6 +236,12 @@ GIMPLE_MATCH_PD_SEQ_O = $(patsubst %, gimple-match-%.o, 
$(MATCH_SPLITS_SEQ))
 GENERIC_MATCH_PD_SEQ_SRC = $(patsubst %, generic-match-%.cc, 
$(MATCH_SPLITS_SEQ))
 GENERIC_MATCH_PD_SEQ_O = $(patsubst %, generic-match-%.o, $(MATCH_SPLITS_SEQ))
 
+# The number of splits to be made for the insn-emit files.
+NUM_INSNEMIT_SPLITS = @DEFAULT_INSNEMIT_PARTITIONS@
+INSNEMIT_SPLITS_SEQ = $(wordlist 1,$(NUM_INSNEMIT_SPLITS),$(one_to_))
+INSNEMIT_SEQ_SRC = $(patsubst %, insn-emit-%.cc, $(INSNEMIT_SPLITS_SEQ))
+INSNEMIT_SEQ_O = $(patsubst %, insn-emit-%.o, $(INSNEMIT_SPLITS_SEQ))
+
 # These files are to have specific diagnostics suppressed, or are not to
 # be subject to -Werror:
 # flex output may yield harmless "no previous prototype" warnings
@@ -1354,7 +1360,7 @@ OBJS = \
insn-attrtab.o \
insn-automata.o \
insn-dfatab.o \
-   insn-emit.o \
+   $(INSNEMIT_SEQ_O) \
insn-extract.o \
insn-latencytab.o \
insn-modes.o \
@@ -1852,7 +1858,8 @@ TREECHECKING = @TREECHECKING@
 FULL_DRIVER_NAME=$(target_noncanonical)-gcc-$(version)$(exeext)
 
 MOSTLYCLEANFILES = insn-flags.h insn-config.h insn-codes.h \
- insn-output.cc insn-recog.cc insn-emit.cc insn-extract.cc insn-peep.cc \
+ insn-output.cc insn-recog.cc $(INSNEMIT_SEQ_SRC) \
+ insn-extract.cc insn-peep.cc \
  insn-attr.h insn-attr-common.h insn-attrtab.cc insn-dfatab.cc \
  insn-latencytab.cc insn-opinit.cc insn-opinit.h insn-preds.cc 
insn-constants.h \
  tm-preds.h tm-constrs.h checksum-options $(GIMPLE_MATCH_PD_SEQ_SRC) \
@@ -2481,11 +2488,11 @@ $(common_out_object_file): $(common_out_file)
 # and compile them.
 
 .PRECIOUS: insn-config.h insn-flags.h insn-codes.h insn-constants.h \
-  insn-emit.cc insn-recog.cc insn-extract.cc insn-output.cc insn-peep.cc \
-  insn-attr.h insn-attr-common.h insn-attrtab.cc insn-dfatab.cc \
-  insn-latencytab.cc insn-preds.cc $(GIMPLE_MATCH_PD_SEQ_SRC) \
-  $(GENERIC_MATCH_PD_SEQ_SRC) gimple-match-auto.h generic-match-auto.h \
-  insn-target-def.h
+  $(INSNEMIT_SEQ_SRC) insn-recog.cc insn-extract.cc insn-output.cc \
+  insn-peep.cc insn-attr.h

[PATCH] c++: Fix compile-time-hog in cp_fold_immediate_r [PR111660]

2023-10-12 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
My recent patch introducing cp_fold_immediate_r caused exponential
compile time with nested COND_EXPRs.  The problem is that the COND_EXPR
case recursively walks the arms of a COND_EXPR, but after processing
both arms it doesn't end the walk; it proceeds to walk the
sub-expressions of the outermost COND_EXPR, triggering again walking
the arms of the nested COND_EXPR, and so on.  This patch brings the
compile time down to about 0m0.033s.

I've added some debug prints to make sure that the rest of cp_fold_r
is still performed as before.

PR c++/111660

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold_immediate_r) : Return
integer_zero_node instead of break;.
(cp_fold_immediate): Return true if cp_fold_immediate_r returned
error_mark_node.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/hog1.C: New test.
---
 gcc/cp/cp-gimplify.cc |  9 ++--
 gcc/testsuite/g++.dg/cpp0x/hog1.C | 77 +++
 2 files changed, 82 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/hog1.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index bdf6e5f98ff..ca622ca169a 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1063,16 +1063,16 @@ cp_fold_immediate_r (tree *stmt_p, int *walk_subtrees, 
void *data_)
break;
   if (TREE_OPERAND (stmt, 1)
  && cp_walk_tree (&TREE_OPERAND (stmt, 1), cp_fold_immediate_r, data,
-  nullptr))
+  nullptr) == error_mark_node)
return error_mark_node;
   if (TREE_OPERAND (stmt, 2)
  && cp_walk_tree (&TREE_OPERAND (stmt, 2), cp_fold_immediate_r, data,
-  nullptr))
+  nullptr) == error_mark_node)
return error_mark_node;
   /* We're done here.  Don't clear *walk_subtrees here though: we're called
 from cp_fold_r and we must let it recurse on the expression with
 cp_fold.  */
-  break;
+  return integer_zero_node;
 case PTRMEM_CST:
   if (TREE_CODE (PTRMEM_CST_MEMBER (stmt)) == FUNCTION_DECL
  && DECL_IMMEDIATE_FUNCTION_P (PTRMEM_CST_MEMBER (stmt)))
@@ -1145,7 +1145,8 @@ cp_fold_immediate (tree *tp, mce_value 
manifestly_const_eval)
 flags |= ff_mce_false;
 
   cp_fold_data data (flags);
-  return !!cp_walk_tree_without_duplicates (tp, cp_fold_immediate_r, &data);
+  tree r = cp_walk_tree_without_duplicates (tp, cp_fold_immediate_r, &data);
+  return r == error_mark_node;
 }
 
 /* Perform any pre-gimplification folding of C++ front end trees to
diff --git a/gcc/testsuite/g++.dg/cpp0x/hog1.C 
b/gcc/testsuite/g++.dg/cpp0x/hog1.C
new file mode 100644
index 000..105a2e912c4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/hog1.C
@@ -0,0 +1,77 @@
+// PR c++/111660
+// { dg-do compile { target c++11 } }
+
+enum Value {
+  LPAREN,
+  RPAREN,
+  LBRACE,
+  RBRACE,
+  LBRACK,
+  RBRACK,
+  CONDITIONAL,
+  COLON,
+  SEMICOLON,
+  COMMA,
+  PERIOD,
+  BIT_OR,
+  BIT_AND,
+  BIT_XOR,
+  BIT_NOT,
+  NOT,
+  LT,
+  GT,
+  MOD,
+  ASSIGN,
+  ADD,
+  SUB,
+  MUL,
+  DIV,
+  PRIVATE_NAME,
+  STRING,
+  TEMPLATE_SPAN,
+  IDENTIFIER,
+  WHITESPACE,
+  ILLEGAL,
+};
+
+constexpr Value GetOneCharToken(char c) {
+  return
+  c == '(' ? LPAREN :
+  c == ')' ? RPAREN :
+  c == '{' ? LBRACE :
+  c == '}' ? RBRACE :
+  c == '[' ? LBRACK :
+  c == ']' ? RBRACK :
+  c == '?' ? CONDITIONAL :
+  c == ':' ? COLON :
+  c == ';' ? SEMICOLON :
+  c == ',' ? COMMA :
+  c == '.' ? PERIOD :
+  c == '|' ? BIT_OR :
+  c == '&' ? BIT_AND :
+  c == '^' ? BIT_XOR :
+  c == '~' ? BIT_NOT :
+  c == '!' ? NOT :
+  c == '<' ? LT :
+  c == '>' ? GT :
+  c == '%' ? MOD :
+  c == '=' ? ASSIGN :
+  c == '+' ? ADD :
+  c == '-' ? SUB :
+  c == '*' ? MUL :
+  c == '/' ? DIV :
+  c == '#' ? PRIVATE_NAME :
+  c == '"' ? STRING :
+  c == '\'' ? STRING :
+  c == '`' ? TEMPLATE_SPAN :
+  c == '\\' ? IDENTIFIER :
+  c == ' ' ? WHITESPACE :
+  c == '\t' ? WHITESPACE :
+  c == '\v' ? WHITESPACE :
+  c == '\f' ? WHITESPACE :
+  c == '\r' ? WHITESPACE :
+  c == '\n' ? WHITESPACE :
+  ILLEGAL;
+}
+
+int main() {}

base-commit: 8bd11fa4ffcf8bceb6511a9d6918c90a34b705b5
-- 
2.41.0



Re: [PATCH v6] c++: Check for indirect change of active union member in constexpr [PR101631,PR102286]

2023-10-12 Thread Nathaniel Shead
On Thu, Oct 12, 2023 at 04:24:00PM -0400, Jason Merrill wrote:
> On 10/12/23 04:53, Nathaniel Shead wrote:
> > On Wed, Oct 11, 2023 at 12:48:12AM +1100, Nathaniel Shead wrote:
> > > On Mon, Oct 09, 2023 at 04:46:46PM -0400, Jason Merrill wrote:
> > > > On 10/8/23 21:03, Nathaniel Shead wrote:
> > > > > Ping for 
> > > > > https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631203.html
> > > > > 
> > > > > +   && (TREE_CODE (t) == MODIFY_EXPR
> > > > > +   /* Also check if initializations have implicit change of 
> > > > > active
> > > > > +  member earlier up the access chain.  */
> > > > > +   || !refs->is_empty())
> > > > 
> > > > I'm not sure what the cumulative point of these two tests is.  
> > > > TREE_CODE (t)
> > > > will be either MODIFY_EXPR or INIT_EXPR, and either should be OK.
> > > > 
> > > > As I understand it, the problematic case is something like
> > > > constexpr-union2.C, where we're also looking at a MODIFY_EXPR.  So what 
> > > > is
> > > > this check doing?
> > > 
> > > The reasoning was to correctly handle cases like the the following (in
> > > constexpr-union6.C):
> > > 
> > >constexpr int test1() {
> > >  U u {};
> > >  std::construct_at(&u.s, S{ 1, 2 });
> > >  return u.s.b;
> > >}
> > >static_assert(test1() == 2);
> > > 
> > > The initialisation of &u.s here is not a member access expression within
> > > the call to std::construct_at, since it's just a pointer, but this code
> > > is still legal; in general, an INIT_EXPR to initialise a union member
> > > should always be OK (I believe?), hence constraining to just
> > > MODIFY_EXPR.
> > > 
> > > However, just that would then (incorrectly) allow all the following
> > > cases in that test to compile, such as
> > > 
> > >constexpr int test2() {
> > >  U u {};
> > >  int* p = &u.s.b;
> > >  std::construct_at(p, 5);
> > >  return u.s.b;
> > >}
> > >constexpr int x2 = test2();
> > > 
> > > since the INIT_EXPR is really only initialising 'b', but the implicit
> > > "modification" of active member to 'u.s' is illegal.
> > > 
> > > Maybe a better way of expressing this condition would be something like
> > > this?
> > > 
> > >/* An INIT_EXPR of the last member in an access chain is always OK,
> > >   but still check implicit change of members earlier on; see
> > >   cpp2a/constexpr-union6.C.  */
> > >&& !(TREE_CODE (t) == INIT_EXPR && refs->is_empty ())
> > > 
> > > Otherwise I'll see if I can rework some of the other conditions instead.
> > > 
> > > > Incidentally, I think constexpr-union6.C could use a test where we pass 
> > > > &u.s
> > > > to a function other than construct_at, and then try (and fail) to 
> > > > assign to
> > > > the b member from that function.
> > > > 
> > > > Jason
> > > > 
> > > 
> > > Sounds good; I've added the following test:
> > > 
> > >constexpr void foo(S* s) {
> > >  s->b = 10;  // { dg-error "accessing .U::s. member instead of 
> > > initialized .U::k." }
> > >}
> > >constexpr int test3() {
> > >  U u {};
> > >  foo(&u.s);  // { dg-message "in .constexpr. expansion" }
> > >  return u.s.b;
> > >}
> > >constexpr int x3 = test3();  // { dg-message "in .constexpr. 
> > > expansion" }
> > > 
> > > Incidentally I found this particular example caused a very unhelpful
> > > error + ICE due to reporting that S could not be value-initialized in
> > > the current version of the patch. The updated version below fixes that
> > > by using 'build_zero_init' instead -- is this an appropriate choice
> > > here?
> > > 
> > > A similar (but unrelated) issue is with e.g.
> > >struct S { const int a; int b; };
> > >union U { int k; S s; };
> > > 
> > >constexpr int test() {
> > >  U u {};
> > >  return u.s.b;
> > >}
> > >constexpr int x = test();
> > > 
> > > giving me this pretty unhelpful error message:
> > > 
> > > /home/ns/main.cpp:8:23:   in ‘constexpr’ expansion of ‘test()’
> > > /home/ns/main.cpp:6:12: error: use of deleted function ‘S::S()’
> > >  6 |   return u.s.b;
> > >|  ~~^
> > > /home/ns/main.cpp:1:8: note: ‘S::S()’ is implicitly deleted because the 
> > > default definition would be ill-formed:
> > >  1 | struct S { const int a; int b; };
> > >|^
> > > /home/ns/main.cpp:1:8: error: uninitialised const member in ‘struct S’
> > > /home/ns/main.cpp:1:22: note: ‘const int S::a’ should be initialised
> > >  1 | struct S { const int a; int b; };
> > >|  ^
> > > /home/ns/main.cpp:8:23:   in ‘constexpr’ expansion of ‘test()’
> > > /home/ns/main.cpp:6:12: error: use of deleted function ‘S::S()’
> > >  6 |   return u.s.b;
> > >|  ~~^
> > > /home/ns/main.cpp:8:23:   in ‘constexpr’ expansion of ‘test()’
> > > /home/ns/main.cpp:6:12: error: use of deleted function ‘S::S()’
> > > 
> > > but I'll try and fix this separately (it exists on current trunk without
> > > t

Re: Ping: [PATCH v2 1/2] testsuite: Add dg-require-atomic-cmpxchg-word

2023-10-12 Thread Jonathan Wakely
On Thu, 12 Oct 2023, 17:11 Jeff Law,  wrote:

>
>
> On 10/12/23 08:38, Christophe Lyon wrote:
> > LGTM but I'm not a maintainer ;-)
> LGTM to as well -- I usually try to stay out of libstdc++, but this
> looks simple enough.  Both patches in this series are OK.
>

Thanks for stepping in, Jeff. The patches are indeed fine, I'm just offline
due to circumstances beyond my control. I hope normal service will resume
soon.


Re: [PATCH v1] RISC-V: Support FP lceil/lceilf auto vectorization

2023-10-12 Thread 钟居哲
LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-12 22:17
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP lceil/lceilf auto vectorization
From: Pan Li 
 
This patch would like to support the FP lceil/lceilf auto vectorization.
 
* long lceil (double) for rv64
* long lceilf (float) for rv32
 
Due to the limitation that only data types of the same size are allowed
in the vectorizer, the standard name lceilmn2 only acts on DF => DI for
rv64, and SF => SI for rv32.
 
Given we have code like:
 
void
test_lceil (long *out, double *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
out[i] = __builtin_lceil (in[i]);
}
 
Before this patch:
.L3:
  ...
  fld fa5,0(a1)
  fcvt.l.da5,fa5,rup
  sd  a5,-8(a0)
  ...
  bne a1,a4,.L3
 
After this patch:
  frrma6
  ...
  fsrmi   3 // RUP
.L3:
  ...
  vsetvli a3,zero,e64,m1,ta,ma
  vfcvt.x.f.v v1,v1
  vsetvli zero,a2,e64,m1,ta,ma
  vse32.v v1,0(a0)
  ...
  bne a2,zero,.L3
  ...
  fsrma6
 
The rest part like SF => DI/HF => DI/DF => SI/HF => SI will be covered
by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (lceil2): New
pattern for lceil/lceilf.
* config/riscv/riscv-protos.h (enum insn_type): New enum value.
(expand_vec_lceil): New func decl for expanding lceil.
* config/riscv/riscv-v.cc (expand_vec_lceil): New func impl
for expanding lceil.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-lceil-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lceil-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lceil-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lceil-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lceil-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lceil-1.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   | 11 +++
gcc/config/riscv/riscv-protos.h   |  2 +
gcc/config/riscv/riscv-v.cc   | 10 +++
.../riscv/rvv/autovec/unop/math-lceil-0.c | 19 +
.../riscv/rvv/autovec/unop/math-lceil-1.c | 19 +
.../riscv/rvv/autovec/unop/math-lceil-run-0.c | 69 +++
.../riscv/rvv/autovec/unop/math-lceil-run-1.c | 69 +++
.../riscv/rvv/autovec/vls/math-lceil-0.c  | 30 
.../riscv/rvv/autovec/vls/math-lceil-1.c  | 30 
9 files changed, 259 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lceil-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lceil-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lceil-run-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lceil-run-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lceil-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lceil-1.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 33b11723c21..267691a0095 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2241,6 +2241,7 @@ (define_expand "avg3_ceil"
;; - roundeven/roundevenf
;; - lrint/lrintf
;; - irintf
+;; - lceil/lceilf
;; -
(define_expand "ceil2"
   [(match_operand:V_VLSF 0 "register_operand")
@@ -2331,3 +2332,13 @@ (define_expand "lround2"
 DONE;
   }
)
+
+(define_expand "lceil2"
+  [(match_operand:0 "register_operand")
+   (match_operand:V_VLS_FCONVERT_I_L_LL 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  {
+riscv_vector::expand_vec_lceil (operands[0], operands[1], mode, 
mode);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index b7eeeb8f55d..ab65ab19524 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -303,6 +303,7 @@ enum insn_type : unsigned int
   UNARY_OP_TAMU = __MASK_OP_TAMU | UNARY_OP_P,
   UNARY_OP_FRM_DYN = UNARY_OP | FRM_DYN_P,
   UNARY_OP_FRM_RMM = UNARY_OP | FRM_RMM_P,
+  UNARY_OP_FRM_RUP = UNARY_OP | FRM_RUP_P,
   UNARY_OP_TAMU_FRM_DYN = UNARY_OP_TAMU | FRM_DYN_P,
   UNARY_OP_TAMU_FRM_RUP = UNARY_OP_TAMU | FRM_RUP_P,
   UNARY_OP_TAMU_FRM_RDN = UNARY_OP_TAMU | FRM_RDN_P,
@@ -477,6 +478,7 @@ void expand_vec_trunc (rtx, rtx, machine_mode, 
machine_mode);
void expand_vec_roundeven (rtx, rtx, machine_mode, machine_mode);
void expand_vec_lrint (rtx, rtx, machine_mode, machine_mode);
void expand_vec_lround (rtx, rtx, machine_mode, machine_mode);
+void expand_vec_lceil (rtx, rtx, machine_mode, machine_mode);
#endif
bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
  bool, void (*)(rtx *, rtx));
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index b61c745678b..b03213dd8ed 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/ri

Re: [PATCH v2] RISC-V: Fix the riscv_legitimize_poly_move issue on targets where the minimal VLEN exceeds 512.

2023-10-12 Thread 钟居哲
LGTM



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-10-13 02:40
To: gcc-patches; kito.cheng; palmer; jeffreyalaw; rdapp; juzhe.zhong
CC: Kito Cheng
Subject: [PATCH v2] RISC-V: Fix the riscv_legitimize_poly_move issue on targets 
where the minimal VLEN exceeds 512.
riscv_legitimize_poly_move was expected to ensure the poly value is at most 32
times smaller than the minimal VLEN (32 being derived from '4096 / 128').
This assumption held when our mode modeling was not so precisely defined.
However, now that we have modeled the mode size according to the correct minimal
VLEN info, the size difference between different RVV modes can be up to 64
times. For instance, comparing RVVMF64BI and RVVMF1BI, the sizes are [1, 1]
versus [64, 64] respectively.
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (riscv_legitimize_poly_move): Bump
max_power to 64.
 
gcc/testsuite/ChangeLog:
 
* g++.target/riscv/rvv/autovec/bug-01.C: New.
* g++.target/riscv/rvv/rvv.exp: Add autovec folder.
---
gcc/config/riscv/riscv.cc |  5 ++-
gcc/config/riscv/riscv.h  |  5 +++
.../g++.target/riscv/rvv/autovec/bug-01.C | 33 +++
gcc/testsuite/g++.target/riscv/rvv/rvv.exp|  3 ++
4 files changed, 43 insertions(+), 3 deletions(-)
create mode 100644 gcc/testsuite/g++.target/riscv/rvv/autovec/bug-01.C
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 739fc77e785..d43bc765ce7 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2411,9 +2411,8 @@ riscv_legitimize_poly_move (machine_mode mode, rtx dest, 
rtx tmp, rtx src)
 }
   else
 {
-  /* FIXME: We currently DON'T support TARGET_MIN_VLEN > 4096.  */
-  int max_power = exact_log2 (4096 / 128);
-  for (int i = 0; i < max_power; i++)
+  int max_power = exact_log2 (MAX_POLY_VARIANT);
+  for (int i = 0; i <= max_power; i++)
{
  int possible_div_factor = 1 << i;
  if (factor % (vlenb / possible_div_factor) == 0)
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 4b8d57509fb..3d2723f5339 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -1197,4 +1197,9 @@ extern void riscv_remove_unneeded_save_restore_calls 
(void);
#define OPTIMIZE_MODE_SWITCHING(ENTITY) (TARGET_VECTOR)
#define NUM_MODES_FOR_MODE_SWITCHING {VXRM_MODE_NONE, riscv_vector::FRM_NONE}
+
+/* The size difference between different RVV modes can be up to 64 times.
+   e.g. RVVMF64BI vs RVVMF1BI on zvl512b, which is [1, 1] vs [64, 64].  */
+#define MAX_POLY_VARIANT 64
+
#endif /* ! GCC_RISCV_H */
diff --git a/gcc/testsuite/g++.target/riscv/rvv/autovec/bug-01.C 
b/gcc/testsuite/g++.target/riscv/rvv/autovec/bug-01.C
new file mode 100644
index 000..fd10009ddbe
--- /dev/null
+++ b/gcc/testsuite/g++.target/riscv/rvv/autovec/bug-01.C
@@ -0,0 +1,33 @@
+/* { dg-options "-march=rv64gcv_zvl512b -mabi=lp64d -O3" } */
+
+class c {
+public:
+  int e();
+  void j();
+};
+float *d;
+class k {
+  int f;
+
+public:
+  k(int m) : f(m) {}
+  float g;
+  float h;
+  void n(int m) {
+for (int i; i < m; i++) {
+  d[0] = d[1] = d[2] = g;
+  d[3] = h;
+  d += f;
+}
+  }
+};
+c l;
+void o() {
+  int b = l.e();
+  k a(b);
+  for (;;)
+if (b == 4) {
+  l.j();
+  a.n(2);
+}
+}
diff --git a/gcc/testsuite/g++.target/riscv/rvv/rvv.exp 
b/gcc/testsuite/g++.target/riscv/rvv/rvv.exp
index 249530580d7..c30d6e93144 100644
--- a/gcc/testsuite/g++.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/g++.target/riscv/rvv/rvv.exp
@@ -40,5 +40,8 @@ set CFLAGS "-march=$gcc_march -O3"
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/base/*.C]] \
"" $CFLAGS
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/*.\[C\]]] \
+"" $CFLAGS
+
# All done.
dg-finish
-- 
2.34.1
 
 


RE: [PATCH v1] RISC-V: Support FP llrint auto vectorization

2023-10-12 Thread Li, Pan2
Sure thing,  thanks a lot and will follow the guidance.

Pan

From: Kito Cheng 
Sent: Thursday, October 12, 2023 10:42 PM
To: Li, Pan2 
Cc: 钟居哲 ; gcc-patches ; Wang, 
Yanzhang 
Subject: Re: [PATCH v1] RISC-V: Support FP llrint auto vectorization

I would prefer the first approach, since the patch has no changes other than
adding testcases; the other naming might confuse people.


On Wed, 2023-10-11 at 23:12, Li, Pan2 mailto:pan2...@intel.com>> wrote:
Sorry for the misleading title here.

When implementing llrint after lrint, I realized llrint (DF => DI) is already
supported by lrint in the previous patch(es),
because they share the same standard name as well as the mode iterator.

Thus, I may have 2 options here for the patch naming.

1. Only mention the test cases for llrint.
2. Name it as support, similar to lrint.

After some consideration (e.g. how the change will look when searching the git
logs), I chose option 2 here and added some description as well.

Finally, are there any best practices for this case? Thanks again for the comments.

Pan

-Original Message-
From: Kito Cheng mailto:kito.ch...@gmail.com>>
Sent: Thursday, October 12, 2023 1:05 PM
To: Li, Pan2 mailto:pan2...@intel.com>>
Cc: juzhe.zh...@rivai.ai; gcc-patches 
mailto:gcc-patches@gcc.gnu.org>>; Wang, Yanzhang 
mailto:yanzhang.w...@intel.com>>
Subject: Re: [PATCH v1] RISC-V: Support FP llrint auto vectorization

Did I miss something? The title says support, but it seems to contain only testcases??

On Wed, Oct 11, 2023 at 8:38 PM Li, Pan2 
mailto:pan2...@intel.com>> wrote:
>
> Committed, thanks Juzhe.
>
>
>
> Pan
>
>
>
> From: juzhe.zh...@rivai.ai 
> mailto:juzhe.zh...@rivai.ai>>
> Sent: Thursday, October 12, 2023 11:34 AM
> To: Li, Pan2 mailto:pan2...@intel.com>>; gcc-patches 
> mailto:gcc-patches@gcc.gnu.org>>
> Cc: Li, Pan2 mailto:pan2...@intel.com>>; Wang, Yanzhang 
> mailto:yanzhang.w...@intel.com>>; kito.cheng 
> mailto:kito.ch...@gmail.com>>
> Subject: Re: [PATCH v1] RISC-V: Support FP llrint auto vectorization
>
>
>
> LGTM
>
>
>
> 
>
> juzhe.zh...@rivai.ai
>
>
>
> From: pan2.li
>
> Date: 2023-10-12 11:28
>
> To: gcc-patches
>
> CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
>
> Subject: [PATCH v1] RISC-V: Support FP llrint auto vectorization
>
> From: Pan Li mailto:pan2...@intel.com>>
>
>
>
> This patch would like to support the FP llrint auto vectorization.
>
>
>
> * long long llrint (double)
>
>
>
> This will be the CVT from DF => DI from the standard name's perspective,
>
> which has been covered in previous PATCH(es). Thus, this patch only add
>
> some test cases.
>
>
>
> gcc/testsuite/ChangeLog:
>
>
>
> * gcc.target/riscv/rvv/autovec/unop/test-math.h: Add type int64_t.
>
> * gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c: New test.
>
> * gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c: New test.
>
> * gcc.target/riscv/rvv/autovec/vls/math-llrint-0.c: New test.
>
>
>
> Signed-off-by: Pan Li mailto:pan2...@intel.com>>
>
> ---
>
> .../riscv/rvv/autovec/unop/math-llrint-0.c| 14 +
>
> .../rvv/autovec/unop/math-llrint-run-0.c  | 63 +++
>
> .../riscv/rvv/autovec/unop/test-math.h|  2 +
>
> .../riscv/rvv/autovec/vls/math-llrint-0.c | 30 +
>
> 4 files changed, 109 insertions(+)
>
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c
>
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c
>
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-llrint-0.c
>
>
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c
>
> new file mode 100644
>
> index 000..2d90d232ba1
>
> --- /dev/null
>
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c
>
> @@ -0,0 +1,14 @@
>
> +/* { dg-do compile } */
>
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize 
> -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } 
> */
>
> +/* { dg-final { check-function-bodies "**" "" } } */
>
> +
>
> +#include "test-math.h"
>
> +
>
> +/*
>
> +** test_double_int64_t___builtin_llrint:
>
> +**   ...
>
> +**   vsetvli\s+[atx][0-9]+,\s*zero,\s*e64,\s*m1,\s*ta,\s*ma
>
> +**   vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+
>
> +**   ...
>
> +*/
>
> +TEST_UNARY_CALL_CVT (double, int64_t, __builtin_llrint)
>
> diff --git 
> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c
>
> new file mode 100644
>
> index 000..6b69f5568e9
>
> --- /dev/null
>
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c
>
> @@ -0,0 +1,63 @@
>
> +/* { dg-do run { target { riscv_v && rv64 } } } */
>
> +/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize 
> -fno-vect-cost-model -ffast-m

[PATCH v1] RISC-V: Support FP lfloor/lfloorf auto vectorization

2023-10-12 Thread pan2 . li
From: Pan Li 

This patch would like to support the FP lfloor/lfloorf auto vectorization.

* long lfloor (double) for rv64
* long lfloorf (float) for rv32

Due to the limitation that only data types of the same size are allowed
in the vectorizer, the standard name lfloormn2 only acts on DF => DI for
rv64, and SF => SI for rv32.

Given we have code like:

void
test_lfloor (long *out, double *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
out[i] = __builtin_lfloor (in[i]);
}

Before this patch:
.L3:
  ...
  fld fa5,0(a1)
  fcvt.l.da5,fa5,rdn
  sd  a5,-8(a0)
  ...
  bne a1,a4,.L3

After this patch:
  frrma6
  ...
  fsrmi   2 // RDN
.L3:
  ...
  vsetvli a3,zero,e64,m1,ta,ma
  vfcvt.x.f.v v1,v1
  vsetvli zero,a2,e64,m1,ta,ma
  vse32.v v1,0(a0)
  ...
  bne a2,zero,.L3
  ...
  fsrma6

The rest part like SF => DI/HF => DI/DF => SI/HF => SI will be covered
by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.

gcc/ChangeLog:

* config/riscv/autovec.md (lfloor2): New
pattern for lfloor/lfloorf.
* config/riscv/riscv-protos.h (enum insn_type): New enum value.
(expand_vec_lfloor): New func decl for expanding lfloor.
* config/riscv/riscv-v.cc (expand_vec_lfloor): New func impl
for expanding lfloor.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-lfloor-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lfloor-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lfloor-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lfloor-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lfloor-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lfloor-1.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec.md   | 11 +++
 gcc/config/riscv/riscv-protos.h   |  2 +
 gcc/config/riscv/riscv-v.cc   | 10 +++
 .../riscv/rvv/autovec/unop/math-lfloor-0.c| 19 +
 .../riscv/rvv/autovec/unop/math-lfloor-1.c| 19 +
 .../rvv/autovec/unop/math-lfloor-run-0.c  | 69 +++
 .../rvv/autovec/unop/math-lfloor-run-1.c  | 69 +++
 .../riscv/rvv/autovec/vls/math-lfloor-0.c | 30 
 .../riscv/rvv/autovec/vls/math-lfloor-1.c | 30 
 9 files changed, 259 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lfloor-0.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lfloor-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lfloor-run-0.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lfloor-run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lfloor-0.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lfloor-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 267691a0095..c5b1e52cbf9 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2242,6 +2242,7 @@ (define_expand "avg3_ceil"
 ;; - lrint/lrintf
 ;; - irintf
 ;; - lceil/lceilf
+;; - lfloor/lfloorf
 ;; -
 (define_expand "ceil2"
   [(match_operand:V_VLSF 0 "register_operand")
@@ -2342,3 +2343,13 @@ (define_expand "lceil2"
 DONE;
   }
 )
+
+(define_expand "lfloor2"
+  [(match_operand:0 "register_operand")
+   (match_operand:V_VLS_FCONVERT_I_L_LL 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  {
+riscv_vector::expand_vec_lfloor (operands[0], operands[1], mode, 
mode);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index ab65ab19524..49bdcdf2f93 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -304,6 +304,7 @@ enum insn_type : unsigned int
   UNARY_OP_FRM_DYN = UNARY_OP | FRM_DYN_P,
   UNARY_OP_FRM_RMM = UNARY_OP | FRM_RMM_P,
   UNARY_OP_FRM_RUP = UNARY_OP | FRM_RUP_P,
+  UNARY_OP_FRM_RDN = UNARY_OP | FRM_RDN_P,
   UNARY_OP_TAMU_FRM_DYN = UNARY_OP_TAMU | FRM_DYN_P,
   UNARY_OP_TAMU_FRM_RUP = UNARY_OP_TAMU | FRM_RUP_P,
   UNARY_OP_TAMU_FRM_RDN = UNARY_OP_TAMU | FRM_RDN_P,
@@ -479,6 +480,7 @@ void expand_vec_roundeven (rtx, rtx, machine_mode, 
machine_mode);
 void expand_vec_lrint (rtx, rtx, machine_mode, machine_mode);
 void expand_vec_lround (rtx, rtx, machine_mode, machine_mode);
 void expand_vec_lceil (rtx, rtx, machine_mode, machine_mode);
+void expand_vec_lfloor (rtx, rtx, machine_mode, machine_mode);
 #endif
 bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
  bool, void (*)(rtx *, rtx));
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index b03213dd8ed..21d86c3f917 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4142,4 +4142,14 @@ expand_vec_lceil (

Re: [PATCH] c++: Fix compile-time-hog in cp_fold_immediate_r [PR111660]

2023-10-12 Thread Jason Merrill

On 10/12/23 17:04, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
My recent patch introducing cp_fold_immediate_r caused exponential
compile time with nested COND_EXPRs.  The problem is that the COND_EXPR
case recursively walks the arms of a COND_EXPR, but after processing
both arms it doesn't end the walk; it proceeds to walk the
sub-expressions of the outermost COND_EXPR, triggering again walking
the arms of the nested COND_EXPR, and so on.  This patch brings the
compile time down to about 0m0.033s.

I've added some debug prints to make sure that the rest of cp_fold_r
is still performed as before.
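The blowup can be reproduced outside GCC with a toy walker. This is a hypothetical miniature, not GCC code: if the callback pre-walks the COND_EXPR arms itself and then also lets the generic walk descend into them again, each nesting level doubles the work, while stopping after handling the arms keeps the walk linear.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a COND_EXPR chain: kids[0] = condition,
   kids[1] = then-arm, kids[2] = else-arm.  Not GCC's tree type.  */
typedef struct node { struct node *kids[3]; } node;

static long visits;

/* Mimics the bug: the callback walks the arms itself, and afterwards the
   generic walk descends into all operands again, re-visiting the arms.  */
static void walk_buggy (node *n)
{
  if (!n) return;
  visits++;
  walk_buggy (n->kids[1]);           /* manual pre-walk of the arms */
  walk_buggy (n->kids[2]);
  for (int i = 0; i < 3; i++)        /* generic walk repeats them */
    walk_buggy (n->kids[i]);
}

/* Mimics the fix: after handling the arms, end the walk of this node,
   so every node is visited exactly once.  */
static void walk_fixed (node *n)
{
  if (!n) return;
  visits++;
  for (int i = 0; i < 3; i++)
    walk_fixed (n->kids[i]);
}

/* Build DEPTH nested conditionals chained through the else-arm.  */
static node *make_chain (int depth)
{
  if (depth == 0) return NULL;
  node *n = calloc (1, sizeof *n);
  n->kids[2] = make_chain (depth - 1);
  return n;
}
```

For a chain of depth d the buggy walker performs 2^d - 1 visits versus d for the fixed one, which is the exponential compile time described above.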

 PR c++/111660

gcc/cp/ChangeLog:

 * cp-gimplify.cc (cp_fold_immediate_r) : Return
 integer_zero_node instead of break;.
 (cp_fold_immediate): Return true if cp_fold_immediate_r returned
 error_mark_node.

gcc/testsuite/ChangeLog:

 * g++.dg/cpp0x/hog1.C: New test.
---
  gcc/cp/cp-gimplify.cc |  9 ++--
  gcc/testsuite/g++.dg/cpp0x/hog1.C | 77 +++
  2 files changed, 82 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/hog1.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index bdf6e5f98ff..ca622ca169a 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1063,16 +1063,16 @@ cp_fold_immediate_r (tree *stmt_p, int *walk_subtrees, 
void *data_)
break;
if (TREE_OPERAND (stmt, 1)
  && cp_walk_tree (&TREE_OPERAND (stmt, 1), cp_fold_immediate_r, data,
-  nullptr))
+  nullptr) == error_mark_node)
return error_mark_node;
if (TREE_OPERAND (stmt, 2)
  && cp_walk_tree (&TREE_OPERAND (stmt, 2), cp_fold_immediate_r, data,
-  nullptr))
+  nullptr) == error_mark_node)
return error_mark_node;
/* We're done here.  Don't clear *walk_subtrees here though: we're 
called
 from cp_fold_r and we must let it recurse on the expression with
 cp_fold.  */
-  break;
+  return integer_zero_node;


I'm concerned this will end up missing something like

1 ? 1 : ((1 ? 1 : 1), immediate())

as the integer_zero_node from the inner ?: will prevent walk_tree from 
looking any farther.
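The concern can be illustrated with a toy walker, since in GCC any non-NULL return from a walk_tree callback aborts the entire traversal. In this hypothetical sketch (not GCC types), a "done" value returned for an inner conditional hides a sibling that comes after it:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: is_cond marks an inner ?:, is_target marks the
   immediate() call we must not miss.  */
typedef struct tnode { int is_cond; int is_target; struct tnode *kids[2]; } tnode;

static int found;

static int callback (tnode *n)
{
  if (n->is_target) { found = 1; return 0; }
  /* The COND_EXPR handler returns a non-zero "we're done" value.  */
  return n->is_cond;
}

/* Toy walk_tree: a non-zero callback result aborts the whole walk.  */
static int walk (tnode *n)
{
  if (!n) return 0;
  int r = callback (n);
  if (r) return r;
  for (int i = 0; i < 2; i++)
    if ((r = walk (n->kids[i])))
      return r;
  return 0;
}
```

With a compound node whose first child is the inner conditional and whose second child is the target, the early "done" return stops the walk before the target is ever seen.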


Maybe we want to handle COND_EXPR in cp_fold_r instead of here?

Jason



Re: [PATCH v1] RISC-V: Support FP lfloor/lfloorf auto vectorization

2023-10-12 Thread juzhe.zh...@rivai.ai
OK.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-13 09:38
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP lfloor/lfloorf auto vectorization
From: Pan Li 
 
This patch would like to support the FP lfloor/lfloorf auto vectorization.
 
* long lfloor (double) for rv64
* long lfloorf (float) for rv32
 
Due to the limitation that only data types of the same size are allowed
in the vectorizer, the standard name lfloormn2 acts only on DF => DI for
rv64, and SF => SI for rv32.
 
Given we have code like:
 
void
test_lfloor (long *out, double *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
out[i] = __builtin_lfloor (in[i]);
}
 
Before this patch:
.L3:
  ...
  fld fa5,0(a1)
  fcvt.l.da5,fa5,rdn
  sd  a5,-8(a0)
  ...
  bne a1,a4,.L3
 
After this patch:
  frrma6
  ...
  fsrmi   2 // RDN
.L3:
  ...
  vsetvli a3,zero,e64,m1,ta,ma
  vfcvt.x.f.v v1,v1
  vsetvli zero,a2,e64,m1,ta,ma
  vse32.v v1,0(a0)
  ...
  bne a2,zero,.L3
  ...
  fsrma6
 
The remaining conversions, like SF => DI / HF => DI / DF => SI / HF => SI, will
be covered by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (lfloor2): New
pattern for lfloor/lfloorf.
* config/riscv/riscv-protos.h (enum insn_type): New enum value.
(expand_vec_lfloor): New func decl for expanding lfloor.
* config/riscv/riscv-v.cc (expand_vec_lfloor): New func impl
for expanding lfloor.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-lfloor-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lfloor-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lfloor-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lfloor-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lfloor-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lfloor-1.c: New test.
 
Signed-off-by: Pan Li 
[PATCH v1] RISC-V: Leverage stdint-gcc.h for RVV test cases

2023-10-12 Thread pan2 . li
From: Pan Li 

Leverage stdint-gcc.h for the int64_t type instead of a local typedef.
Otherwise we may conflict with stdint-gcc.h somewhere else.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c: Include
stdint-gcc.h for int types.
* gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/test-math.h: Remove int64_t
typedef.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c | 1 +
 .../gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c   | 1 +
 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h | 2 --
 3 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c
index 2d90d232ba1..4bf125f8cc8 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c
@@ -2,6 +2,7 @@
 /* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize 
-fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
+#include 
 #include "test-math.h"
 
 /*
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c
index 6b69f5568e9..409175a8dff 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c
@@ -1,6 +1,7 @@
 /* { dg-do run { target { riscv_v && rv64 } } } */
 /* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model 
-ffast-math" } */
 
+#include 
 #include "test-math.h"
 
 #define ARRAY_SIZE 128
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h
index 3867bc50a14..a1c9d55bd48 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h
@@ -68,8 +68,6 @@
 #define FRM_RMM 4
 #define FRM_DYN 7
 
-typedef long long int64_t;
-
 static inline void
 set_rm (unsigned rm)
 {
-- 
2.34.1



Re: [PATCH v1] RISC-V: Leverage stdint-gcc.h for RVV test cases

2023-10-12 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-13 10:22
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Leverage stdint-gcc.h for RVV test cases
From: Pan Li 
 
Leverage stdint-gcc.h for the int64_t types instead of typedef.
Or we may have conflict with stdint-gcc.h in somewhere else.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c: Include
stdint-gcc.h for int types.
* gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/test-math.h: Remove int64_t
typedef.
 
Signed-off-by: Pan Li 
 
 


RE: [PATCH v1] RISC-V: Leverage stdint-gcc.h for RVV test cases

2023-10-12 Thread Li, Pan2
Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Friday, October 13, 2023 10:26 AM
To: Li, Pan2 ; gcc-patches 
Cc: Li, Pan2 ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v1] RISC-V: Leverage stdint-gcc.h for RVV test cases

LGTM.


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-10-13 10:22
To: gcc-patches
CC: juzhe.zhong; 
pan2.li; 
yanzhang.wang; 
kito.cheng
Subject: [PATCH v1] RISC-V: Leverage stdint-gcc.h for RVV test cases
From: Pan Li

Leverage stdint-gcc.h for the int64_t types instead of typedef.
Or we may have conflict with stdint-gcc.h in somewhere else.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c: Include
stdint-gcc.h for int types.
* gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/test-math.h: Remove int64_t
typedef.

Signed-off-by: Pan Li




Re: [PATCH] Disparage slightly for the alternative which move DFmode between SSE_REGS and GENERAL_REGS.

2023-10-12 Thread Hongtao Liu
On Thu, Jul 6, 2023 at 1:53 PM Uros Bizjak via Gcc-patches
 wrote:
>
> On Thu, Jul 6, 2023 at 3:14 AM liuhongt  wrote:
> >
> > For testcase
> >
> > void __cond_swap(double* __x, double* __y) {
> >   bool __r = (*__x < *__y);
> >   auto __tmp = __r ? *__x : *__y;
> >   *__y = __r ? *__y : *__x;
> >   *__x = __tmp;
> > }
> >
> > GCC-14 with -O2 and -march=x86-64 options generates the following code:
> >
> > __cond_swap(double*, double*):
> > movsd   xmm1, QWORD PTR [rdi]
> > movsd   xmm0, QWORD PTR [rsi]
> > comisd  xmm0, xmm1
> > jbe .L2
> > movqrax, xmm1
> > movapd  xmm1, xmm0
> > movqxmm0, rax
> > .L2:
> > movsd   QWORD PTR [rsi], xmm1
> > movsd   QWORD PTR [rdi], xmm0
> > ret
> >
> > rax is used to save and restore the DFmode value. In RA both GENERAL_REGS
> > and SSE_REGS cost zero since we didn't disparage the
> > alternatives in the movdf_internal pattern, so, following the register
> > allocation order, GENERAL_REGS is allocated. The patch adds '?' to the
> > alternatives (r,v) and (v,r), just as we did for the movsf/hf/bf_internal
> > patterns; after that we get optimal RA.
> >
> > __cond_swap:
> > .LFB0:
> > .cfi_startproc
> > movsd   (%rdi), %xmm1
> > movsd   (%rsi), %xmm0
> > comisd  %xmm1, %xmm0
> > jbe .L2
> > movapd  %xmm1, %xmm2
> > movapd  %xmm0, %xmm1
> > movapd  %xmm2, %xmm0
> > .L2:
> > movsd   %xmm1, (%rsi)
> > movsd   %xmm0, (%rdi)
> > ret
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> > Ok for trunk?
> >
> >
> > gcc/ChangeLog:
> >
> > PR target/110170
> > * config/i386/i386.md (movdf_internal): Disparage slightly for
> > 2 alternatives (r,v) and (v,r) by adding constraint modifier
> > '?'.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pr110170-3.c: New test.
>
> OK.
Some users report the same issue in UnixBench; it looks like a common
issue when swapping two double variables.
So I'd like to backport this patch to GCC 13/12/11; the fix
should be generally good and low risk.
Any comments?

>
> Thanks,
> Uros.
>
> > ---
> >  gcc/config/i386/i386.md|  4 ++--
> >  gcc/testsuite/gcc.target/i386/pr110170-3.c | 11 +++
> >  2 files changed, 13 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr110170-3.c
> >
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index a82cc353cfd..e47ced1bb70 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -3915,9 +3915,9 @@ (define_split
> >  ;; Possible store forwarding (partial memory) stall in alternatives 4, 6 
> > and 7.
> >  (define_insn "*movdf_internal"
> >[(set (match_operand:DF 0 "nonimmediate_operand"
> > -"=Yf*f,m   ,Yf*f,?r ,!o,?*r ,!o,!o,?r,?m,?r,?r,v,v,v,m,*x,*x,*x,m ,r 
> > ,v,r  ,o ,r  ,m")
> > +"=Yf*f,m   ,Yf*f,?r ,!o,?*r ,!o,!o,?r,?m,?r,?r,v,v,v,m,*x,*x,*x,m 
> > ,?r,?v,r  ,o ,r  ,m")
> > (match_operand:DF 1 "general_operand"
> > -"Yf*fm,Yf*f,G   ,roF,r ,*roF,*r,F ,rm,rC,C ,F ,C,v,m,v,C ,*x,m ,*x,v,r 
> > ,roF,rF,rmF,rC"))]
> > +"Yf*fm,Yf*f,G   ,roF,r ,*roF,*r,F ,rm,rC,C ,F ,C,v,m,v,C ,*x,m ,*x, v, 
> > r,roF,rF,rmF,rC"))]
> >"!(MEM_P (operands[0]) && MEM_P (operands[1]))
> > && (lra_in_progress || reload_completed
> > || !CONST_DOUBLE_P (operands[1])
> > diff --git a/gcc/testsuite/gcc.target/i386/pr110170-3.c 
> > b/gcc/testsuite/gcc.target/i386/pr110170-3.c
> > new file mode 100644
> > index 000..70daa89e9aa
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr110170-3.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do compile { target { ! ia32 } } } */
> > +/* { dg-options "-O2 -fno-if-conversion -fno-if-conversion2" } */
> > +/* { dg-final { scan-assembler-not {(?n)movq.*r} } } */
> > +
> > +void __cond_swap(double* __x, double* __y) {
> > +  _Bool __r = (*__x < *__y);
> > +  double __tmp = __r ? *__x : *__y;
> > +  *__y = __r ? *__y : *__x;
> > +  *__x = __tmp;
> > +}
> > +
> > --
> > 2.39.1.388.g2fc9e9ca3c
> >



--
BR,
Hongtao


Re: [PATCH v2] RISC-V: Fix the riscv_legitimize_poly_move issue on targets where the minimal VLEN exceeds 512.

2023-10-12 Thread Kito Cheng
Committed with a few changelog tweaks :P

On Thu, Oct 12, 2023 at 3:37 PM 钟居哲  wrote:
>
> LGTM
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: Kito Cheng
> Date: 2023-10-13 02:40
> To: gcc-patches; kito.cheng; palmer; jeffreyalaw; rdapp; juzhe.zhong
> CC: Kito Cheng
> Subject: [PATCH v2] RISC-V: Fix the riscv_legitimize_poly_move issue on 
> targets where the minimal VLEN exceeds 512.
> riscv_legitimize_poly_move was expected to ensure the poly value is at most 32
> times smaller than the minimal VLEN (32 being derived from '4096 / 128').
> This assumption held when our mode modeling was not so precisely defined.
> However, now that we have modeled the mode size according to the correct 
> minimal
> VLEN info, the size difference between different RVV modes can be up to 64
> times. For instance, comparing RVVMF64BI and RVVMF1BI, the sizes are [1, 1]
> versus [64, 64] respectively.
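The widened search can be sketched with a hypothetical helper mirroring the loop in riscv_legitimize_poly_move (names and the standalone form are illustrative; vlenb is the vector length in bytes):

```c
#include <assert.h>

/* Find the smallest power-of-two divisor d of vlenb (d <= max_variant)
   such that factor is a multiple of vlenb / d; return -1 if none.
   Before the fix the search never tried d = 64, so with zvl512b
   (vlenb = 64 bytes) a poly factor of 1 -- e.g. RVVMF64BI's [1, 1] --
   had no usable divisor.  */
static int find_div_factor (int vlenb, int factor, int max_variant)
{
  for (int d = 1; d <= max_variant; d <<= 1)
    if (factor % (vlenb / d) == 0)
      return d;
  return -1;
}
```

Raising the bound to MAX_POLY_VARIANT = 64 makes the [1, 1] versus [64, 64] case representable.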
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_legitimize_poly_move): Bump
> max_power to 64.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/riscv/rvv/autovec/bug-01.C: New.
> * g++.target/riscv/rvv/rvv.exp: Add autovec folder.
> ---
> gcc/config/riscv/riscv.cc |  5 ++-
> gcc/config/riscv/riscv.h  |  5 +++
> .../g++.target/riscv/rvv/autovec/bug-01.C | 33 +++
> gcc/testsuite/g++.target/riscv/rvv/rvv.exp|  3 ++
> 4 files changed, 43 insertions(+), 3 deletions(-)
> create mode 100644 gcc/testsuite/g++.target/riscv/rvv/autovec/bug-01.C
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 739fc77e785..d43bc765ce7 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -2411,9 +2411,8 @@ riscv_legitimize_poly_move (machine_mode mode, rtx 
> dest, rtx tmp, rtx src)
>  }
>else
>  {
> -  /* FIXME: We currently DON'T support TARGET_MIN_VLEN > 4096.  */
> -  int max_power = exact_log2 (4096 / 128);
> -  for (int i = 0; i < max_power; i++)
> +  int max_power = exact_log2 (MAX_POLY_VARIANT);
> +  for (int i = 0; i <= max_power; i++)
> {
>   int possible_div_factor = 1 << i;
>   if (factor % (vlenb / possible_div_factor) == 0)
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index 4b8d57509fb..3d2723f5339 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -1197,4 +1197,9 @@ extern void riscv_remove_unneeded_save_restore_calls 
> (void);
> #define OPTIMIZE_MODE_SWITCHING(ENTITY) (TARGET_VECTOR)
> #define NUM_MODES_FOR_MODE_SWITCHING {VXRM_MODE_NONE, riscv_vector::FRM_NONE}
> +
> +/* The size difference between different RVV modes can be up to 64 times.
> +   e.g. RVVMF64BI vs RVVMF1BI on zvl512b, which is [1, 1] vs [64, 64].  */
> +#define MAX_POLY_VARIANT 64
> +
> #endif /* ! GCC_RISCV_H */
> diff --git a/gcc/testsuite/g++.target/riscv/rvv/autovec/bug-01.C 
> b/gcc/testsuite/g++.target/riscv/rvv/autovec/bug-01.C
> new file mode 100644
> index 000..fd10009ddbe
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/riscv/rvv/autovec/bug-01.C
> @@ -0,0 +1,33 @@
> +/* { dg-options "-march=rv64gcv_zvl512b -mabi=lp64d -O3" } */
> +
> +class c {
> +public:
> +  int e();
> +  void j();
> +};
> +float *d;
> +class k {
> +  int f;
> +
> +public:
> +  k(int m) : f(m) {}
> +  float g;
> +  float h;
> +  void n(int m) {
> +for (int i; i < m; i++) {
> +  d[0] = d[1] = d[2] = g;
> +  d[3] = h;
> +  d += f;
> +}
> +  }
> +};
> +c l;
> +void o() {
> +  int b = l.e();
> +  k a(b);
> +  for (;;)
> +if (b == 4) {
> +  l.j();
> +  a.n(2);
> +}
> +}
> diff --git a/gcc/testsuite/g++.target/riscv/rvv/rvv.exp 
> b/gcc/testsuite/g++.target/riscv/rvv/rvv.exp
> index 249530580d7..c30d6e93144 100644
> --- a/gcc/testsuite/g++.target/riscv/rvv/rvv.exp
> +++ b/gcc/testsuite/g++.target/riscv/rvv/rvv.exp
> @@ -40,5 +40,8 @@ set CFLAGS "-march=$gcc_march -O3"
> dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/base/*.C]] \
> "" $CFLAGS
> +dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/*.\[C\]]] \
> +"" $CFLAGS
> +
> # All done.
> dg-finish
> --
> 2.34.1
>
>


Re: [committed] RISC-V: Fix INSN costing and more zicond tests

2023-10-12 Thread Hans-Peter Nilsson
> Date: Fri, 29 Sep 2023 16:37:21 -0600
> From: Jeff Law 

> So this ends up looking a lot like the bits that I had to revert several 
> weeks ago :-)
> 
> The core issue we have is given an INSN the generic code will cost the 
> SET_SRC and SET_DEST and sum them.  But that's far from ideal on a RISC 
> target.
> 
> For a register destination, the cost can be determined be looking at 
> just the SET_SRC.  Which is precisely what this patch does.  When the 
> outer code is an INSN and we're presented with a SET we take one of two 
> paths.
> 
> If the destination is a register, then we recurse just on the SET_SRC 
> and we're done.  Otherwise we fall back to the existing code which sums 
> the cost of the SET_SRC and SET_DEST.

Ackchyually...  that "otherwise" happens for calls to
set_rtx_cost (et al), but not calls to insn_cost.

IOW, with that patch, it seems you're mimicking insn_cost
behavior also for set_rtx_cost (et al).  You're likely aware
of this, but when seeing these target cost functions tweaked
for reasons that appear somewhat empirical, I felt compelled
to point out the related rabbit-hole.
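For reference, the costing rule in question — charge only the SET_SRC when the SET_DEST is a register, otherwise sum both sides — can be sketched with a toy cost model. The kinds and unit costs here are made up, not GCC's rtx_cost values:

```c
#include <assert.h>

/* Toy RTX kinds with invented unit costs.  */
enum kind { K_REG, K_MEM, K_CONST };

static int rtx_cost (enum kind k)
{
  switch (k)
    {
    case K_REG: return 1;
    case K_MEM: return 3;
    default:    return 2;   /* K_CONST */
    }
}

/* Costing a SET: a register destination contributes nothing beyond the
   source; any other destination falls back to summing both sides.  */
static int set_cost (enum kind dest, enum kind src)
{
  if (dest == K_REG)
    return rtx_cost (src);
  return rtx_cost (src) + rtx_cost (dest);
}
```

The point of contention above is which of GCC's entry points (insn_cost via pattern_cost, versus set_rtx_cost et al) actually applies which of these two branches.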

While I'm ranting, these slightly different cost api:s,
somewhat arbitrarily, (or empirically) picked by callers, is
a problem by itself.  Not to mention that the default use of
set_rtx_cost means you get hit by another problem; the
default cost of 0 for registers is also a magic number to
pattern_cost to set the cost to INSN_COSTS (1).

The default insn_cost implementation, which RISC-V uses as
opposed to implementing the TARGET_INSN_COST hook, only
looks at the SET_SRC for calls to insn_cost for single-sets.
See pattern_cost.  I believe that's a bug.  Fixing that was
attempted in 2016 (by Bernd S.), a patch which was later
reverted: cf. commits r7-4866-g334442f282a9d6 and
r7-4930-g03612f25277590.  Hence rabbit-hole.  (And no,
implementing TARGET_INSN_COST doesn't automatically fix
things.  Too much of the gcc middle-end appears tuned to the
default behavior.)

Sorry for the rant; have a nice day and a better weekend.

>  That fallback path isn't great 
> and probably could be further improved (just costing SET_DEST in that 
> case is probably quite reasonable).
> 
> The difference between this version and the bits that slipped through by 
> accident several weeks ago is that old version mis-used the API due to a 
> thinko on my part.
> 
> This tightens up various zicond tests to avoid undesirable matching.
> 
> This has been tested on rv64gc -- the only difference it makes on the 
> testsuite is the new tests (included in this patch) flip from failing to 
> passing.
> 
> Pushed to the trunk.
> 
> Jeff

brgds, H-P


[PATCH V3] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread Juzhe-Zhong
This patch fixes the following FAILs in the RISC-V regression:

FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect 
"Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP 
stmts"
FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect 
"Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP 
stmts"

The root cause of these FAILs is that GCC SLP fails on MASK_LEN_GATHER_LOAD.

We have the following two situations for a scalar-recognized MASK_LEN_GATHER_LOAD:

1. conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, conditional mask).
   
   In this situation we just need to leverage the current MASK_GATHER_LOAD, which can achieve SLP of MASK_LEN_GATHER_LOAD.

2. un-conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, -1)
   
   The current SLP check fails on the dummy mask -1, so we relax the check in tree-vect-slp.cc and allow it to be materialized.
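The equivalence being exploited can be sketched in scalar C (a hypothetical illustration, not the vectorizer's internal form): a masked gather with an all-ones mask degenerates to a plain gather, which is why the dummy -1 mask can be treated as "always load":

```c
#include <assert.h>

/* Plain (unconditional) gather: y[i] = x[indices[i]].  */
static void gather (int *y, const int *x, const int *idx, int n)
{
  for (int i = 0; i < n; i++)
    y[i] = x[idx[i]];
}

/* Masked gather: only lanes whose mask bit is true are loaded.
   With every mask bit set this is exactly the plain gather above.  */
static void gather_masked (int *y, const int *x, const int *idx,
                           const unsigned char *mask, int n)
{
  for (int i = 0; i < n; i++)
    if (mask[i])
      y[i] = x[idx[i]];
}
```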

Consider this following case:

void __attribute__((noipa))
f (int *restrict y, int *restrict x, int *restrict indices, int n)
{
  for (int i = 0; i < n; ++i)
{
  y[i * 2] = x[indices[i * 2]] + 1;
  y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
}
}

https://godbolt.org/z/WG3M3n7Mo

GCC is unable to SLP and falls back to VEC_LOAD_LANES/VEC_STORE_LANES:

f:
ble a3,zero,.L5
.L3:
vsetvli a5,a3,e8,mf4,ta,ma
vsetvli zero,a5,e32,m1,ta,ma
vlseg2e32.v v6,(a2)
vsetvli a4,zero,e64,m2,ta,ma
vsext.vf2   v2,v6
vsll.vi v2,v2,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei64.v  v1,(a1),v2
vsetvli a4,zero,e64,m2,ta,ma
vsext.vf2   v2,v7
vsetvli zero,zero,e32,m1,ta,ma
vadd.vi v4,v1,1
vsetvli zero,zero,e64,m2,ta,ma
vsll.vi v2,v2,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei64.v  v2,(a1),v2
vsetvli a4,zero,e32,m1,ta,ma
sllia6,a5,3
vadd.vi v5,v2,2
sub a3,a3,a5
vsetvli zero,a5,e32,m1,ta,ma
vsseg2e32.v v4,(a0)
add a2,a2,a6
add a0,a0,a6
bne a3,zero,.L3
.L5:
ret

After this patch:

f:
ble a3,zero,.L5
li  a5,1
csrrt1,vlenb
sllia5,a5,33
srlia7,t1,2
addia5,a5,1
sllia3,a3,1
neg t3,a7
vsetvli a4,zero,e64,m1,ta,ma
vmv.v.x v4,a5
.L3:
minua5,a3,a7
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v1,0(a2)
vsetvli a4,zero,e64,m2,ta,ma
vsext.vf2   v2,v1
vsll.vi v2,v2,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei64.v  v2,(a1),v2
vsetvli a4,zero,e32,m1,ta,ma
mv  a6,a3
vadd.vv v2,v2,v4
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v2,0(a0)
add a2,a2,t1
add a0,a0,t1
add a3,a3,t3
bgtua6,a7,.L3
.L5:
ret

Note that I found we are missing a conditional mask gather_load SLP test; this
patch appends one.

Tested on RISC-V; bootstrap && regression on x86 passed.

Ok for trunk?

gcc/ChangeLog:

* tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD.
(vect_get_and_check_slp_defs): Ditto.
(vect_build_slp_tree_1): Ditto.
(vect_build_slp_tree_2): Ditto.
* tree-vect-stmts.cc (vectorizable_load): Ditto.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-gather-6.c: New test.

---
 gcc/testsuite/gcc.dg/vect/vect-gather-6.c | 15 +++
 gcc/tree-vect-slp.cc  | 22 ++
 gcc/tree-vect-stmts.cc| 10 +-
 3 files changed, 42 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-6.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-6.c 
b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
new file mode 100644
index 000..ff55f321854
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+void
+f (int *restrict y, int *restrict x, int *restrict indices, int *restrict 
cond, int n)
+{
+  for (int i = 0; i < n; ++i)
+{
+  if (cond[i * 2])
+   y[i * 2] = x[indices[i * 2]] + 1;
+  if (cond[i * 2 + 1])
+   y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
+}
+}
+
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target 
vect_gather_load_ifn } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index fa098f9ff4e..38fe6ba6296 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -542,6 +542,7 @@ vect_get_operand_map (const gimple *stmt, unsigned char 
swap = 0)
return arg1_map;
 
  case IFN_MASK_GATHER_LOAD:
+ case IFN_MASK_LEN_GATHER_LOAD:
return arg1_arg4_map;
 
  case IFN_MASK_STORE:
@@ -700,8 +701,7 @@ vect_get_and_check_sl

Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread juzhe.zh...@rivai.ai
Hi, Richi. 

As you suggested, I keep the MASK_LEN_GATHER_LOAD (..., -1) format and support SLP for
it in V3:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632846.html 

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-10-12 19:14
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:
 
> In tree-vect-stmts.cc
> 
> vect_check_scalar_mask
> 
> Failed here:
> 
>   /* If the caller is not prepared for adjusting an external/constant
>  SLP mask vector type fail.  */
>   if (slp_node
>   && !mask_node
 
^^^
 
where's the mask_node?
 
>   && SLP_TREE_DEF_TYPE (mask_node_1) != vect_internal_def)
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "SLP mask argument is not vectorized.\n");
>   return false;
> }
> 
> If we allow vect_constant_def, should we adjust the constant SLP mask in
> the caller "vectorizable_load"?
> 
> But I don't know how to adjust that.
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:55
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:
>  
> > In tree-vect-slp.cc:
> > vect_get_and_check_slp_defs
> > 711: 
> > 
> >   tree type = TREE_TYPE (oprnd);
> >   dt = dts[i];
> >   if ((dt == vect_constant_def
> >|| dt == vect_external_def)
> >   && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> >   && (TREE_CODE (type) == BOOLEAN_TYPE
> >   || !can_duplicate_and_interleave_p (vinfo, stmts.length 
> > (),
> >   type)))
> > {
> >   if (dump_enabled_p ())
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >  "Build SLP failed: invalid type of def "
> >  "for variable-length SLP %T\n", oprnd);
> >   return -1;
> > }
> > 
> > Here mask = -1 has BOOLEAN type in tree-vect-patterns.cc and reaches this
> > condition, so SLP fails:
> > Build SLP failed: invalid type of def
>  
> I think this can be restricted to vect_external_def, but some history
> might reveal the cases we put this code in for (we should be able to
> materialize all constants?).  At least uniform boolean constants
> should be fine.
> >
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-12 17:44
> > To: juzhe.zh...@rivai.ai
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:
> >  
> > > Thanks, Richi, for pointing it out.
> > > 
> > > I found this patch can't make conditional gather load succeed on SLP.
> > > 
> > > I am considering changing MASK_LEN_GATHER_LOAD in pattern recognition:
> > > 
> > > If there is no condition mask, in tree-vect-patterns.cc I build
> > > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments, same as
> > > GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can reuse the GATHER_LOAD SLP
> > > flow naturally.
> > > 
> > > If there is a condition mask, in tree-vect-patterns.cc I build
> > > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments,
> > > same as MASK_GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can reuse the MASK_GATHER_LOAD
> > > SLP flow naturally.
> > > 
> > > Is it reasonable?
> >  
> > What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> > even when the mask is -1?
> >  
> > > 
> > > juzhe.zh...@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2023-10-11 20:50
> > > To: Juzhe-Zhong
> > > CC: gcc-patches; richard.sandiford
> > > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> > >  
> > > > This patch fixes the following FAILs in the RISC-V regression:
> > > > 
> > > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  
> > > > scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains 
> > > > only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  
> > > > scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains 
> > > > only SLP stmts"
> > > > 
> > > > The root cause of these FAILs is that GCC SLP fails on
> > > > MASK_LEN_GATHER_LOAD.
> > > > 
> > > > Since for RVV, we build MASK_LEN_GATHER_LOAD with a dummy mask (-1) in
> > > > tree-vect-patterns.cc if it is the same situation as GATHER_LOAD (no
> > > > conditional mask).
> > > > 
> > > > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER

[PATCH v1] RISC-V: Add test for FP iroundf auto vectorization

2023-10-12 Thread pan2 . li
From: Pan Li 

The FP API below is already supported, sharing the same standard
name as well as the machine mode.

int iroundf (float);

This patch adds test cases to ensure correctness.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-iround-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-iround-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-iround-0.c: New test.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/unop/math-iround-0.c| 19 ++
 .../rvv/autovec/unop/math-iround-run-0.c  | 63 +++
 .../riscv/rvv/autovec/vls/math-iround-0.c | 30 +
 3 files changed, 112 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-iround-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-iround-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-iround-0.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-iround-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-iround-0.c
new file mode 100644
index 000..f32515d1403
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-iround-0.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_float_int___builtin_iroundf:
+**   frrm\s+[atx][0-9]+
+**   ...
+**   fsrmi\s+4
+**   ...
+**   vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*m1,\s*ta,\s*ma
+**   vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+
+**   ...
+**   fsrm\s+[atx][0-9]+
+**   ret
+*/
+TEST_UNARY_CALL_CVT (float, int, __builtin_iroundf)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-iround-run-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-iround-run-0.c
new file mode 100644
index 000..2e05e443afe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-iround-run-0.c
@@ -0,0 +1,63 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */
+
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+float in[ARRAY_SIZE];
+int out[ARRAY_SIZE];
+int ref[ARRAY_SIZE];
+
+TEST_UNARY_CALL_CVT (float, int, __builtin_iroundf)
+TEST_ASSERT (int)
+
+TEST_INIT_CVT (float, 1.2, int, __builtin_iroundf (1.2), 1)
+TEST_INIT_CVT (float, -1.2, int, __builtin_iroundf (-1.2), 2)
+TEST_INIT_CVT (float, 0.5, int, __builtin_iroundf (0.5), 3)
+TEST_INIT_CVT (float, -0.5, int, __builtin_iroundf (-0.5), 4)
+TEST_INIT_CVT (float, 0.1, int, __builtin_iroundf (0.1), 5)
+TEST_INIT_CVT (float, -0.1, int, __builtin_iroundf (-0.1), 6)
+TEST_INIT_CVT (float, 3.0, int, __builtin_iroundf (3.0), 7)
+TEST_INIT_CVT (float, -3.0, int, __builtin_iroundf (-3.0), 8)
+TEST_INIT_CVT (float, 8388607.5, int, __builtin_iroundf (8388607.5), 9)
+TEST_INIT_CVT (float, 8388609.0, int, __builtin_iroundf (8388609.0), 10)
+TEST_INIT_CVT (float, -8388607.5, int, __builtin_iroundf (-8388607.5), 11)
+TEST_INIT_CVT (float, -8388609.0, int, __builtin_iroundf (-8388609.0), 12)
+TEST_INIT_CVT (float, 0.0, int, __builtin_iroundf (-0.0), 13)
+TEST_INIT_CVT (float, -0.0, int, __builtin_iroundf (-0.0), 14)
+TEST_INIT_CVT (float, 2147483520.0, int, __builtin_iroundf (2147483520.0), 15)
+TEST_INIT_CVT (float, 2147483648.0, int, 0x7fff, 16)
+TEST_INIT_CVT (float, -2147483648.0, int, __builtin_iroundf (-2147483648.0), 17)
+TEST_INIT_CVT (float, -2147483904.0, int, 0x8000, 18)
+TEST_INIT_CVT (float, __builtin_inf (), int, __builtin_iroundf (__builtin_inff ()), 19)
+TEST_INIT_CVT (float, -__builtin_inf (), int, __builtin_iroundf (-__builtin_inff ()), 20)
+TEST_INIT_CVT (float, __builtin_nanf (""), int, 0x7fff, 21)
+
+int
+main ()
+{
+  RUN_TEST_CVT (float, int, 1, __builtin_iroundf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 2, __builtin_iroundf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 3, __builtin_iroundf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 4, __builtin_iroundf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 5, __builtin_iroundf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 6, __builtin_iroundf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 7, __builtin_iroundf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 8, __builtin_iroundf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 9, __builtin_iroundf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 10, __builtin_iroundf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 11, __builtin_iroundf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 12, __builtin_iroundf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 13, __builtin_iroundf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int, 1

Re: [PATCH v1] RISC-V: Add test for FP iroundf auto vectorization

2023-10-12 Thread juzhe.zhong
lgtm

 Replied Message 
From: pan2...@intel.com
Date: 10/13/2023 13:33
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai, pan2...@intel.com, yanzhang.w...@intel.com, kito.ch...@gmail.com
Subject: [PATCH v1] RISC-V: Add test for FP iroundf auto vectorization


RE: [PATCH v1] RISC-V: Add test for FP iroundf auto vectorization

2023-10-12 Thread Li, Pan2
Committed, thanks Juzhe.

Pan

From: juzhe.zhong 
Sent: Friday, October 13, 2023 1:39 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; Li, Pan2 ; Wang, Yanzhang 
; kito.ch...@gmail.com
Subject: Re: [PATCH v1] RISC-V: Add test for FP iroundf auto vectorization

lgtm
 Replied Message 
From: pan2...@intel.com
Date: 10/13/2023 13:33
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai, pan2...@intel.com, yanzhang.w...@intel.com, kito.ch...@gmail.com
Subject: [PATCH v1] RISC-V: Add test for FP iroundf auto vectorization



[PATCH] RISC-V Regression: Fix FAIL of bb-slp-pr69907.c for RVV

2023-10-12 Thread Juzhe-Zhong
Like ARM SVE and AMD GCN, add RVV to the list of excluded targets.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/bb-slp-pr69907.c: Add RVV.

---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
index b348526b62f..f63b42a271a 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
@@ -22,5 +22,5 @@ void foo(unsigned *p1, unsigned short *p2)
 /* Disable for SVE because for long or variable-length vectors we don't
get an unrolled epilogue loop.  Also disable for AArch64 Advanced SIMD,
because there we can vectorize the epilogue using mixed vector sizes.
-   Likewise for AMD GCN.  */
-/* { dg-final { scan-tree-dump "BB vectorization with gaps at the end of a load is not supported" "slp1" { target { { ! aarch64*-*-* } && { ! amdgcn*-*-* } } } } } */
+   Likewise for AMD GCN and RVV.  */
+/* { dg-final { scan-tree-dump "BB vectorization with gaps at the end of a load is not supported" "slp1" { target { { ! aarch64*-*-* } && { { ! amdgcn*-*-* } && { ! riscv_v } } } } } } */
-- 
2.36.3


