[PATCH] Fix Xcode 16 build break with NULL != nullptr

2024-07-10 Thread dani
From: Daniel Bertalan 

As of Xcode 16 beta 2 with the macOS 15 SDK, each re-inclusion of the
stddef.h header causes the NULL macro in C++ to be re-defined to an
integral constant (__null). This makes the workaround in d59a576b8
("Redefine NULL to nullptr") ineffective, as other headers that are
typically included after system.h (such as obstack.h) do include
stddef.h too.

This can be seen by running the sample below through `clang++ -E`:

#include <stddef.h>
#define NULL nullptr
#include <stddef.h>
NULL

The relevant libc++ change is here:
https://github.com/llvm/llvm-project/commit/2950283dddab03c183c1be2d7de9d4999cc86131

This commit fixes the instances where NULL being an integral constant
instead of a null pointer literal caused breakage (such as NULL no longer
implicitly converting to a pointer when used as a template function's
argument).

gcc/value-pointer-equiv.cc:65:43: error: no viable conversion from 
`pair::type, typename 
__unwrap_ref_decay::type>' to 'const pair'

65 |   const std::pair  m_marker = std::make_pair (NULL, NULL);
   |   ^~~

As noted in the previous commit though, the proper solution would be to
phase out the usages of NULL in GCC's C++ source code.

gcc/analyzer/ChangeLog:

* diagnostic-manager.cc (saved_diagnostic::saved_diagnostic):
Change NULL to nullptr.
(struct null_assignment_sm_context): Likewise.
* infinite-loop.cc: Likewise.
* infinite-recursion.cc: Likewise.
* varargs.cc (va_list_state_machine::on_leak): Likewise.

gcc/ChangeLog:

* value-pointer-equiv.cc: Change NULL to nullptr.
---
 gcc/analyzer/diagnostic-manager.cc | 18 +-
 gcc/analyzer/infinite-loop.cc  |  2 +-
 gcc/analyzer/infinite-recursion.cc |  2 +-
 gcc/analyzer/varargs.cc|  2 +-
 gcc/value-pointer-equiv.cc |  2 +-
 5 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/gcc/analyzer/diagnostic-manager.cc 
b/gcc/analyzer/diagnostic-manager.cc
index fe943ac61c9e..51304b0795b6 100644
--- a/gcc/analyzer/diagnostic-manager.cc
+++ b/gcc/analyzer/diagnostic-manager.cc
@@ -679,12 +679,12 @@ saved_diagnostic::saved_diagnostic (const state_machine 
*sm,
   m_stmt (ploc.m_stmt),
   /* stmt_finder could be on-stack; we want our own copy that can
  outlive that.  */
-  m_stmt_finder (ploc.m_finder ? ploc.m_finder->clone () : NULL),
+  m_stmt_finder (ploc.m_finder ? ploc.m_finder->clone () : nullptr),
   m_loc (ploc.m_loc),
   m_var (var), m_sval (sval), m_state (state),
-  m_d (std::move (d)), m_trailing_eedge (NULL),
+  m_d (std::move (d)), m_trailing_eedge (nullptr),
   m_idx (idx),
-  m_best_epath (NULL), m_problem (NULL),
+  m_best_epath (nullptr), m_problem (nullptr),
   m_notes ()
 {
   /* We must have an enode in order to be able to look for paths
@@ -1800,10 +1800,10 @@ public:
stmt,
stack_depth,
sm,
-   NULL,
+   nullptr,
src_sm_val,
dst_sm_val,
-   NULL,
+   nullptr,
dst_state,
src_node));
 return false;
@@ -1993,9 +1993,9 @@ struct null_assignment_sm_context : public sm_context
m_sm,
var_new_sval,
from, to,
-   NULL,
+   nullptr,
*m_new_state,
-   NULL));
+   nullptr));
   }
 
   void set_next_state (const gimple *stmt,
@@ -2019,9 +2019,9 @@ struct null_assignment_sm_context : public sm_context
m_sm,
sval,
from, to,
-   NULL,
+   nullptr,
*m_new_state,
-   NULL));
+   nullptr));
   }
 
   void warn (const supernode *, const gimple *,
diff --git a/gcc/analyzer/infinite-loop.cc b/gcc/analyzer/infinite-loop.cc
index 8ba8e70acffc..6ac0a5b373d8 100644
--- a/gcc/analyzer/infinite-loop.cc
+++ b/gcc/analyzer/infinite-loop.cc
@@ -240,7 +240,7 @@ public:
enode->get_function ()->decl,
enode->get_stack_depth ()),
enode,
-   NULL, NULL, NULL));
+   nullptr, nullptr, nullptr));
 
 logger *logger = emission_path->get_logger ();
 
diff --git a/gcc/analyzer/infinite-recursion.cc 
b/gc

Re: [PATCH] Fix Xcode 16 build break with NULL != nullptr

2024-07-10 Thread Xi Ruoyao
On Wed, 2024-07-10 at 06:59 +, d...@danielbertalan.dev wrote:
> diff --git a/gcc/value-pointer-equiv.cc b/gcc/value-pointer-equiv.cc
> index bfc940ec9915..f8564536c308 100644
> --- a/gcc/value-pointer-equiv.cc
> +++ b/gcc/value-pointer-equiv.cc
> @@ -62,7 +62,7 @@ public:
>  private:
>    auto_vec<std::pair <tree, tree>> m_stack;
>    auto_vec<tree> m_replacements;
> -  const std::pair <tree, tree> m_marker = std::make_pair (NULL, NULL);
> +  const std::pair <tree, tree> m_marker = std::make_pair (nullptr, nullptr);
>  };

AFAIK we prefer NULL_TREE for this.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] [alpha] adjust MEM alignment for block move [PR115459] (was: Re: [PATCH v2] [PR100106] Reject unaligned subregs when strict alignment is required)

2024-07-10 Thread Uros Bizjak
On Thu, Jun 13, 2024 at 9:37 AM Alexandre Oliva  wrote:
>
> Hello, Maciej,
>
> On Jun 12, 2024, "Maciej W. Rozycki"  wrote:
>
> >  This has regressed building the `alpha-linux-gnu' target, in libada, as
> > from commit d6b756447cd5 including GCC 14 and up to current GCC 15 trunk:
>
> > | Error detected around g-debpoo.adb:1896:8|
>
> > I have filed PR #115459.
>
> Thanks!
>
> This was tricky to duplicate without access to an alpha-linux-gnu
> machine.  I ended up building an uberbaum tree with --disable-shared
> --disable-threads --enable-languages=ada up to all-target-libgcc, then I
> replaced gcc/collect2 with a wrapper script that dropped crt[1in].o and
> -lc, so that link tests in libada/configure would succeed without glibc
> for the target.  libada still wouldn't build, because of the missing
> glibc headers, but I could compile g-depboo.adb with -I pointing at a
> x86_64-linux-gnu's gcc/ada/rts build tree, and with that, at -O2, I
> could trigger the problem and investigate it.  And with the following
> patch, the problem seems to be gone.
>
> Maciej, would you be so kind as to give it a spin with a native
> regstrap?  TIA,
>
> Richard, is this ok to install if regstrapping succeeds?
>
>
> Before issuing loads or stores for a block move, adjust the MEM
> alignments if analysis of the addresses enabled the inference of
> stricter alignment.  This ensures that the MEMs are sufficiently
> aligned for the corresponding insns, which avoids trouble in case of
> e.g. substitutions into SUBREGs.
>
>
> for  gcc/ChangeLog
>
> PR target/115459
> * config/alpha/alpha.cc (alpha_expand_block_move): Adjust
> MEMs to match inferred alignment.

LGTM, based on a successful bootstrap/regtest report down the reply thread.

Thanks,
Uros.

> ---
>  gcc/config/alpha/alpha.cc |   12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/gcc/config/alpha/alpha.cc b/gcc/config/alpha/alpha.cc
> index 1126cea1f7ba2..e090e74b9d073 100644
> --- a/gcc/config/alpha/alpha.cc
> +++ b/gcc/config/alpha/alpha.cc
> @@ -3820,6 +3820,12 @@ alpha_expand_block_move (rtx operands[])
>else if (a >= 16 && c % 2 == 0)
> src_align = 16;
> }
> +
> +  if (MEM_P (orig_src) && MEM_ALIGN (orig_src) < src_align)
> +   {
> + orig_src = shallow_copy_rtx (orig_src);
> + set_mem_align (orig_src, src_align);
> +   }
>  }
>
>tmp = XEXP (orig_dst, 0);
> @@ -3841,6 +3847,12 @@ alpha_expand_block_move (rtx operands[])
>else if (a >= 16 && c % 2 == 0)
> dst_align = 16;
> }
> +
> +  if (MEM_P (orig_dst) && MEM_ALIGN (orig_dst) < dst_align)
> +   {
> + orig_dst = shallow_copy_rtx (orig_dst);
> + set_mem_align (orig_dst, dst_align);
> +   }
>  }
>
>ofs = 0;
>
>
> --
> Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving ""normal"" is *not* inclusive


Re: [PATCH] testsuite: Tests the pattern folding x/sqrt(x) to sqrt(x) for Float16

2024-07-10 Thread Kyrylo Tkachov
Hi Jennifer,

> On 9 Jul 2024, at 14:07, Jennifer Schmitz  wrote:
> 
> As a follow-up to adding a pattern that folds x/sqrt(x) to sqrt(x) in 
> match.pd, this patch adds a test case for type Float16 for armv8.2-a+fp16.
> 
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> Ok for mainline?
> 

Ok. I’ve pushed the patch for you to trunk.
Thanks,
Kyrill


> Signed-off-by: Jennifer Schmitz 
> 
> gcc/testsuite/
> 
> * gcc.target/aarch64/sqrt_div_float16.c: New test.
> 



Re: [PATCH] testsuite: Tests the pattern folding x/sqrt(x) to sqrt(x) for Float16

2024-07-10 Thread Richard Sandiford
Jennifer Schmitz  writes:
> As a follow-up to adding a pattern that folds x/sqrt(x) to sqrt(x) in 
> match.pd, this patch adds a test case for type Float16 for armv8.2-a+fp16.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> Ok for mainline?
>
> Signed-off-by: Jennifer Schmitz 
>
> gcc/testsuite/
>
> * gcc.target/aarch64/sqrt_div_float16.c: New test.
>
> commit f909f882dda56e33fde2a06f4c1318d7e691e5c9
> Author: Jennifer Schmitz 
> Date:   Mon Jul 8 18:54:54 2024 +0530
>
> [PATCH] testsuite: Test the pattern folding x/sqrt(x) to sqrt(x) for 
> Float16
> 
> As a follow-up to adding a pattern that folds x/sqrt(x) to sqrt(x) in 
> match.pd,
> this patch adds a test case for type Float16 for armv8.2-a+fp16.
> 
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
> regression.
> Ok for mainline?
> 
> Signed-off-by: Jennifer Schmitz 
> 
> gcc/testsuite/
> 
> * gcc.target/aarch64/sqrt_div_float16.c: New test.
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sqrt_div_float16.c 
> b/gcc/testsuite/gcc.target/aarch64/sqrt_div_float16.c
> new file mode 100644
> index 000..c4f297ef17a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sqrt_div_float16.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ffast-math -fdump-tree-forwprop-details" } */
> +/* { dg-require-effective-target c99_runtime } */
> +
> +#pragma GCC target ("arch=armv8.2-a+fp16")
> +
> +_Float16 f (_Float16 x) 
> +{
> +  _Float16 t1 = __builtin_sqrt (x);
> +  _Float16 t2 = x / t1;
> +  return t2;
> +}
> +
> +/* { dg-final { scan-tree-dump "gimple_simplified to t2_\[0-9\]+ = .SQRT 
> .x_\[0-9\]*.D.." "forwprop1" } } */

I'm a bit nervous about tying a match.pd test to a specific dump file,
since match.pd is indirectly used by many passes.  How about instead
matching the end result, with:

/* { dg-options "-O2 -ffast-math" } */
/* { dg-final { check-function-bodies "**" "" } } */

#pragma ...

/*
** f:
**  fsqrt   h0, h0
**  ret
*/
_Float16 f (_Float16 x) 
...

(tabs rather than spaces in the asm quote).

OK with that change if you agree.

Thanks,
Richard


Re: [PATCH] testsuite: Tests the pattern folding x/sqrt(x) to sqrt(x) for Float16

2024-07-10 Thread Richard Sandiford
Richard Sandiford  writes:
> Jennifer Schmitz  writes:
>> As a follow-up to adding a pattern that folds x/sqrt(x) to sqrt(x) in 
>> match.pd, this patch adds a test case for type Float16 for armv8.2-a+fp16.
>>
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> Ok for mainline?
>>
>> Signed-off-by: Jennifer Schmitz 
>>
>> gcc/testsuite/
>>
>> * gcc.target/aarch64/sqrt_div_float16.c: New test.
>>
>> commit f909f882dda56e33fde2a06f4c1318d7e691e5c9
>> Author: Jennifer Schmitz 
>> Date:   Mon Jul 8 18:54:54 2024 +0530
>>
>> [PATCH] testsuite: Test the pattern folding x/sqrt(x) to sqrt(x) for 
>> Float16
>> 
>> As a follow-up to adding a pattern that folds x/sqrt(x) to sqrt(x) in 
>> match.pd,
>> this patch adds a test case for type Float16 for armv8.2-a+fp16.
>> 
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>> regression.
>> Ok for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz 
>> 
>> gcc/testsuite/
>> 
>> * gcc.target/aarch64/sqrt_div_float16.c: New test.
>>
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sqrt_div_float16.c 
>> b/gcc/testsuite/gcc.target/aarch64/sqrt_div_float16.c
>> new file mode 100644
>> index 000..c4f297ef17a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sqrt_div_float16.c
>> @@ -0,0 +1,14 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -ffast-math -fdump-tree-forwprop-details" } */
>> +/* { dg-require-effective-target c99_runtime } */
>> +
>> +#pragma GCC target ("arch=armv8.2-a+fp16")
>> +
>> +_Float16 f (_Float16 x) 
>> +{
>> +  _Float16 t1 = __builtin_sqrt (x);
>> +  _Float16 t2 = x / t1;
>> +  return t2;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "gimple_simplified to t2_\[0-9\]+ = .SQRT 
>> .x_\[0-9\]*.D.." "forwprop1" } } */
>
> I'm a bit nervous about tying a match.pd test to a specific dump file,
> since match.pd is indirectly used by many passes.  How about instead
> matching the end result, with:
>
> /* { dg-options "-O2 -ffast-math" } */
> /* { dg-final { check-function-bodies "**" "" } } */
>
> #pragma ...
>
> /*
> ** f:
> **fsqrt   h0, h0
> **ret
> */
> _Float16 f (_Float16 x) 
> ...
>
> (tabs rather than spaces in the asm quote).
>
> OK with that change if you agree.

Sorry, the above collided with Kyrill's review, so please stick with
the committed version.

Richard


[PATCH] middle-end: Fix stalled swapped condition code value [PR115836]

2024-07-10 Thread Uros Bizjak
emit_store_flag_1 calculates scode (the swapped condition code) at the
beginning of the function from the value of the code variable.  However,
the code variable may change before the scode usage site, resulting in
an invalid, stale scode value.

Move the calculation of scode to just before its only usage site to
avoid using a stale value.

PR middle-end/115836

gcc/ChangeLog:

* expmed.cc (emit_store_flag_1): Move calculation of
scode just before its only usage site.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Also tested with original and minimized preprocessed source.
Unfortunately, even with the minimized source, the compilation takes
~5 minutes, and IMO such a trivial fix does not warrant that high
resource consumption.

OK for master and release branches?

Uros.
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 8bbbc94a98c..154964bd068 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -5632,11 +5632,9 @@ emit_store_flag_1 (rtx target, enum rtx_code code, rtx 
op0, rtx op1,
   enum insn_code icode;
   machine_mode compare_mode;
   enum mode_class mclass;
-  enum rtx_code scode;
 
   if (unsignedp)
 code = unsigned_condition (code);
-  scode = swap_condition (code);
 
   /* If one operand is constant, make it the second one.  Only do this
  if the other operand is not constant as well.  */
@@ -5751,6 +5749,8 @@ emit_store_flag_1 (rtx target, enum rtx_code code, rtx 
op0, rtx op1,
 
  if (GET_MODE_CLASS (mode) == MODE_FLOAT)
{
+ enum rtx_code scode = swap_condition (code);
+
  tem = emit_cstore (target, icode, scode, mode, compare_mode,
 unsignedp, op1, op0, normalizep, target_mode);
  if (tem)


Re: Re: [PATCH] [RISC-V] c implies zca, and conditionally zcf & zcd

2024-07-10 Thread Fei Gao
On 2024-07-09 23:28  Jeff Law  wrote:
>
>
>
>On 7/9/24 1:10 AM, Fei Gao wrote:
>> According to Zc-1.0.4-3.pdf from
>> https://github.com/riscvarchive/riscv-code-size-reduction/releases/tag/v1.0.4-3
>> The rule is that:
>> - C always implies Zca
>> - C+F implies Zcf (RV32 only)
>> - C+D implies Zcd
>>
>> Signed-off-by: Fei Gao 
>>
>> gcc/ChangeLog:
>>
>> * common/config/riscv/riscv-common.cc:
>> c implies zca, and conditionally zcf & zcd.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/attribute-15.c: adapt TC.
>> * gcc.target/riscv/attribute-18.c: likewise.
>> * gcc.target/riscv/pr110696.c: likewise.
>> * gcc.target/riscv/rvv/base/abi-callee-saved-1-zcmp.c: likewise.
>> * gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c: likewise.
>> * gcc.target/riscv/rvv/base/pr114352-1.c: likewise.
>> * gcc.target/riscv/rvv/base/pr114352-3.c: likewise.
>> * gcc.target/riscv/arch-39.c: New test.
>> * gcc.target/riscv/arch-40.c: New test.
>It looks like this is failing the pre-commit testing: 

I ran the GCC regression tests locally and compared the delta before
submitting the patches, but surprisingly the pre-commit CI reported
failures. I dug further and found my binutils was out of date. Zaamo and
Zalrsc were introduced recently in binutils; GCC's configure checks
whether AS supports these new extensions and, if so, prints them into the
RISC-V attributes. My old binutils does not support them, so some Zaamo
and Zalrsc test cases also failed in the reference run, leaving no
difference in the comparison.

I will send V2.

BR
Fei
>
>> New Failures Across All Affected Targets (8 targets / 8 total targets)
>> FAIL: gcc.target/riscv/attribute-16.c   -O0   scan-assembler .attribute 
>> arch, 
>> "rv32i2p1_m2p0_a2p0_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0"
>> FAIL: gcc.target/riscv/attribute-16.c   -O1   scan-assembler .attribute 
>> arch, 
>> "rv32i2p1_m2p0_a2p0_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0"
>> FAIL: gcc.target/riscv/attribute-16.c   -O2   scan-assembler .attribute 
>> arch, 
>> "rv32i2p1_m2p0_a2p0_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0"
>> FAIL: gcc.target/riscv/attribute-16.c   -O2 -flto -fno-use-linker-plugin 
>> -flto-partition=none   scan-assembler .attribute arch, 
>> "rv32i2p1_m2p0_a2p0_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0"
>> FAIL: gcc.target/riscv/attribute-16.c   -O2 -flto -fuse-linker-plugin 
>> -fno-fat-lto-objects   scan-assembler .attribute arch, 
>> "rv32i2p1_m2p0_a2p0_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0"
>> FAIL: gcc.target/riscv/attribute-16.c   -O3 -g   scan-assembler .attribute 
>> arch, 
>> "rv32i2p1_m2p0_a2p0_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0"
>> FAIL: gcc.target/riscv/attribute-16.c   -Os   scan-assembler .attribute 
>> arch, 
>> "rv32i2p1_m2p0_a2p0_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0"
>> FAIL: gcc.target/riscv/attribute-17.c   -O0   scan-assembler .attribute 
>> arch, 
>> "rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0"
>> FAIL: gcc.target/riscv/attribute-17.c   -O1   scan-assembler .attribute 
>> arch, 
>> "rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0"
>> FAIL: gcc.target/riscv/attribute-17.c   -O2   scan-assembler .attribute 
>> arch, 
>> "rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0"
>> FAIL: gcc.target/riscv/attribute-17.c   -O2 -flto -fno-use-linker-plugin 
>> -flto-partition=none   scan-assembler .attribute arch, 
>> "rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0"
>> FAIL: gcc.target/riscv/attribute-17.c   -O2 -flto -fuse-linker-plugin 
>> -fno-fat-lto-objects   scan-assembler .attribute arch, 
>> "rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0"
>> FAIL: gcc.target/riscv/attribute-17.c   -O3 -g   scan-assembler .attribute 
>> arch, 
>> "rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0"
>> FAIL: gcc.target/riscv/attribute-17.c   -Os   scan-assembler .attribute 
>> arch, 
>> "rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0"
>> FAIL: gcc.target/riscv/pr110696.c   -O0   scan-assembler .attribute arch, 
>> "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1_zca1p0_zcd1p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl1024b1p0_zvl128b1p0_zvl2048b1p0_zvl256b1p0_zvl32b1p0_zvl4096b1p0_zvl512b1p0_zvl64b1p0"
>> FAIL: gcc.target/riscv/pr110696.c   -O1   scan-assembler .attribute arch, 
>> "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1_zca1p0_zcd1p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl1024b1p0_zvl128b1p0_zvl2048b1p0_zvl256b1p0_zvl32b1p0_zvl4096b1p0_zvl512b1p0_zvl64b1p0"
>> FAIL: gcc.target/riscv/pr110696.c   -O2   scan-assembler .attribute arch, 
>> "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1_zca1p0_zcd1p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl1024b1p0_zvl128b1p0_zvl2048b1p0_zvl256b

Re: [PATCH V4] report message for operator %a on unaddressible operand

2024-07-10 Thread Kewen.Lin
Hi Jeff,

on 2024/6/5 16:30, Jiufu Guo wrote:
> Hi,
> 
> For PR96866, when printing asm code for the modifier "%a", an addressable
> operand is required, while the constraint "X" allows any kind of
> operand, even ones whose address is hard to get directly, e.g. an extern
> symbol whose address is in the TOC.
> An error message is now reported to indicate the invalid asm operand.
> 
> Compared with the previous version, the changelog and emitted message are updated.
> 
> Bootstrap & regtest pass on ppc64{,le}.
> Is this ok for trunk?
> 
> BR,
> Jeff(Jiufu Guo)
> 
>   PR target/96866
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (print_operand_address): Emit message for
>   Unsupported operand.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/pr96866-1.c: New test.
>   * gcc.target/powerpc/pr96866-2.c: New test.
> 
> ---
>  gcc/config/rs6000/rs6000.cc  |  7 ++-
>  gcc/testsuite/gcc.target/powerpc/pr96866-1.c | 18 ++
>  gcc/testsuite/gcc.target/powerpc/pr96866-2.c | 13 +
>  3 files changed, 37 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-2.c
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 117999613d8..7e7c36a1bad 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -14664,7 +14664,12 @@ print_operand_address (FILE *file, rtx x)
>   fprintf (file, "@%s(%s)", SMALL_DATA_RELOC,
>reg_names[SMALL_DATA_REG]);
>else
> - gcc_assert (!TARGET_TOC);
> + {
> +   /* Do not support getting address directly from TOC, emit error.
> +  No more work is needed for !TARGET_TOC. */
> +   if (TARGET_TOC)
> + output_operand_lossage ("%%a requires an address of memory");
> + }
>  }
>else if (GET_CODE (x) == PLUS && REG_P (XEXP (x, 0))
>  && REG_P (XEXP (x, 1)))
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-1.c 
> b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
> new file mode 100644
> index 000..bcebbd6e310
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
> @@ -0,0 +1,18 @@
> +/* The "%a" modifier can't get the address of extern symbol directly from TOC
> +   with -fPIC, even if the symbol is propagated for the "X" constraint under -O2.  */
> +/* { dg-options "-fPIC -O2" } */
> +
> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.  */
> +/* { dg-excess-errors "pr96866-1.c" } */

This seems to XPASS on Power10 with pcrel?  If so, this needs a
! powerpc_pcrel guard.

> +
> +int x[2];
> +
> +int __attribute__ ((noipa))
> +f1 (void)
> +{
> +  int n;
> +  int *p = x;
> +  *p++;
> +  __asm__ volatile("ld %0, %a1" : "=r"(n) : "X"(p));
> +  return n;
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-2.c 
> b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
> new file mode 100644
> index 000..0577fd6d588
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
> @@ -0,0 +1,13 @@
> +/* The "%a" modifier can't get the address of extern symbol directly from TOC
> +   with -fPIC. */
> +/* { dg-options "-fPIC -O2" } */
> +
> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.  */
> +/* { dg-excess-errors "pr96866-2.c" } */

Ditto.

The others look good to me.

BR,
Kewen

> +
> +void
> +f (void)
> +{
> +  extern int x;
> +  __asm__ volatile("#%a0" ::"X"(&x));
> +}



Re: [PATCH] Fix Xcode 16 build break with NULL != nullptr

2024-07-10 Thread Richard Biener
On Wed, Jul 10, 2024 at 9:00 AM  wrote:
>
> From: Daniel Bertalan 
>
> As of Xcode 16 beta 2 with the macOS 15 SDK, each re-inclusion of the
> stddef.h header causes the NULL macro in C++ to be re-defined to an
> integral constant (__null). This makes the workaround in d59a576b8
> ("Redefine NULL to nullptr") ineffective, as other headers that are
> typically included after system.h (such as obstack.h) do include
> stddef.h too.

Hmm, that's arguably a bug.  I think for submodules we do not control
like libiberty we have to go the C++ standard library include way and
amend the INCLUDE_* macros in system.h or include those unconditionally.

I also see that libcpp line-map.h includes  though that's
unconditionally
included via system.h as well.

I also wonder, since the macOS 15 SDK is still in beta, whether it's
possible to fix its stddef.h behavior.

> This can be seen by running the sample below through `clang++ -E`:
>
> #include <stddef.h>
> #define NULL nullptr
> #include <stddef.h>
> NULL
>
> The relevant libc++ change is here:
> https://github.com/llvm/llvm-project/commit/2950283dddab03c183c1be2d7de9d4999cc86131
>
> This commit fixes the instances where NULL being an integral constant
> instead of a null pointer literal caused breakage (such as NULL no longer
> implicitly converting to a pointer when used as a template function's
> argument).
>
> gcc/value-pointer-equiv.cc:65:43: error: no viable conversion from 
> `pair::type, typename 
> __unwrap_ref_decay::type>' to 'const pair'
>
> 65 |   const std::pair  m_marker = std::make_pair (NULL, 
> NULL);
>|   ^~~
>
> As noted in the previous commit though, the proper solution would be to
> phase out the usages of NULL in GCC's C++ source code.
>
> gcc/analyzer/ChangeLog:
>
> * diagnostic-manager.cc (saved_diagnostic::saved_diagnostic):
> Change NULL to nullptr.
> (struct null_assignment_sm_context): Likewise.
> * infinite-loop.cc: Likewise.
> * infinite-recursion.cc: Likewise.
> * varargs.cc (va_list_state_machine::on_leak): Likewise.
>
> gcc/ChangeLog:
>
> * value-pointer-equiv.cc: Change NULL to nullptr.
> ---
>  gcc/analyzer/diagnostic-manager.cc | 18 +-
>  gcc/analyzer/infinite-loop.cc  |  2 +-
>  gcc/analyzer/infinite-recursion.cc |  2 +-
>  gcc/analyzer/varargs.cc|  2 +-
>  gcc/value-pointer-equiv.cc |  2 +-
>  5 files changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/analyzer/diagnostic-manager.cc 
> b/gcc/analyzer/diagnostic-manager.cc
> index fe943ac61c9e..51304b0795b6 100644
> --- a/gcc/analyzer/diagnostic-manager.cc
> +++ b/gcc/analyzer/diagnostic-manager.cc
> @@ -679,12 +679,12 @@ saved_diagnostic::saved_diagnostic (const state_machine 
> *sm,
>m_stmt (ploc.m_stmt),
>/* stmt_finder could be on-stack; we want our own copy that can
>   outlive that.  */
> -  m_stmt_finder (ploc.m_finder ? ploc.m_finder->clone () : NULL),
> +  m_stmt_finder (ploc.m_finder ? ploc.m_finder->clone () : nullptr),
>m_loc (ploc.m_loc),
>m_var (var), m_sval (sval), m_state (state),
> -  m_d (std::move (d)), m_trailing_eedge (NULL),
> +  m_d (std::move (d)), m_trailing_eedge (nullptr),
>m_idx (idx),
> -  m_best_epath (NULL), m_problem (NULL),
> +  m_best_epath (nullptr), m_problem (nullptr),
>m_notes ()
>  {
>/* We must have an enode in order to be able to look for paths
> @@ -1800,10 +1800,10 @@ public:
> stmt,
> stack_depth,
> sm,
> -   NULL,
> +   nullptr,
> src_sm_val,
> dst_sm_val,
> -   NULL,
> +   nullptr,
> dst_state,
> src_node));
>  return false;
> @@ -1993,9 +1993,9 @@ struct null_assignment_sm_context : public sm_context
> m_sm,
> var_new_sval,
> from, to,
> -   NULL,
> +   nullptr,
> *m_new_state,
> -   NULL));
> +   nullptr));
>}
>
>void set_next_state (const gimple *stmt,
> @@ -2019,9 +2019,9 @@ struct null_assignment_sm_context : public sm_context
> m_sm,
> sval,
> from, to,
> -   NULL,
> +   nullptr,
> *m_new_state,
> -

Re: [PATCH] testsuite: Tests the pattern folding x/sqrt(x) to sqrt(x) for Float16

2024-07-10 Thread Richard Biener
On Wed, 10 Jul 2024, Richard Sandiford wrote:

> Richard Sandiford  writes:
> > Jennifer Schmitz  writes:
> >> As a follow-up to adding a pattern that folds x/sqrt(x) to sqrt(x) in 
> >> match.pd, this patch adds a test case for type Float16 for armv8.2-a+fp16.
> >>
> >> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
> >> regression.
> >> Ok for mainline?
> >>
> >> Signed-off-by: Jennifer Schmitz 
> >>
> >> gcc/testsuite/
> >>
> >> * gcc.target/aarch64/sqrt_div_float16.c: New test.
> >>
> >> commit f909f882dda56e33fde2a06f4c1318d7e691e5c9
> >> Author: Jennifer Schmitz 
> >> Date:   Mon Jul 8 18:54:54 2024 +0530
> >>
> >> [PATCH] testsuite: Test the pattern folding x/sqrt(x) to sqrt(x) for 
> >> Float16
> >> 
> >> As a follow-up to adding a pattern that folds x/sqrt(x) to sqrt(x) in 
> >> match.pd,
> >> this patch adds a test case for type Float16 for armv8.2-a+fp16.
> >> 
> >> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
> >> regression.
> >> Ok for mainline?
> >> 
> >> Signed-off-by: Jennifer Schmitz 
> >> 
> >> gcc/testsuite/
> >> 
> >> * gcc.target/aarch64/sqrt_div_float16.c: New test.
> >>
> >> diff --git a/gcc/testsuite/gcc.target/aarch64/sqrt_div_float16.c 
> >> b/gcc/testsuite/gcc.target/aarch64/sqrt_div_float16.c
> >> new file mode 100644
> >> index 000..c4f297ef17a
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/aarch64/sqrt_div_float16.c
> >> @@ -0,0 +1,14 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2 -ffast-math -fdump-tree-forwprop-details" } */
> >> +/* { dg-require-effective-target c99_runtime } */
> >> +
> >> +#pragma GCC target ("arch=armv8.2-a+fp16")
> >> +
> >> +_Float16 f (_Float16 x) 
> >> +{
> >> +  _Float16 t1 = __builtin_sqrt (x);
> >> +  _Float16 t2 = x / t1;
> >> +  return t2;
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump "gimple_simplified to t2_\[0-9\]+ = .SQRT 
> >> .x_\[0-9\]*.D.." "forwprop1" } } */
> >
> > I'm a bit nervous about tying a match.pd test to a specific dump file,
> > since match.pd is indirectly used by many passes.  How about instead
> > matching the end result, with:
> >
> > /* { dg-options "-O2 -ffast-math" } */
> > /* { dg-final { check-function-bodies "**" "" } } */
> >
> > #pragma ...
> >
> > /*
> > ** f:
> > **  fsqrt   h0, h0
> > **  ret
> > */
> > _Float16 f (_Float16 x) 
> > ...
> >
> > (tabs rather than spaces in the asm quote).
> >
> > OK with that change if you agree.
> 
> Sorry, the above collided with Kyrill's review, so please stick with
> the committed version.

Just to add we're generally using the first forwprop dump for
match.pd tests that require SSA use-def chains, so I think the
test is fine.

Richard.


Re: [PATCH] middle-end: Fix stalled swapped condition code value [PR115836]

2024-07-10 Thread Richard Biener
On Wed, 10 Jul 2024, Uros Bizjak wrote:

> emit_store_flag_1 calculates scode (the swapped condition code) at the
> beginning of the function from the value of the code variable.  However,
> the code variable may change before the scode usage site, resulting in
> an invalid, stale scode value.
> 
> Move the calculation of scode to just before its only usage site to
> avoid using a stale value.
> 
> PR middle-end/115836
> 
> gcc/ChangeLog:
> 
> * expmed.cc (emit_store_flag_1): Move calculation of
> scode just before its only usage site.
> 
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> 
> Also tested with original and minimized preprocessed source.
> Unfortunately, even with the minimized source, the compilation takes
> ~5 minutes, and IMO such a trivial fix does not warrant that high
> resource consumption.
> 
> OK for master and release branches?

OK.

Thanks,
Richard.


[PATCH] rs6000: Escalate warning to error for VSX with explicit no-altivec etc.

2024-07-10 Thread Kewen.Lin
Hi,

As discussed in PR115688, currently when users specify
-mvsx and -mno-altivec explicitly, the compiler emits a warning
rather than an error; but since both options are given
explicitly, emitting a hard error is better.

So this patch escalates the related warnings to errors
when the options are incompatible.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this next week if no objections.

BR,
Kewen
-

PR target/115713

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_option_override_internal): Emit error
messages when explicit VSX encounters explicit soft-float, no-altivec
or avoid-indexed-addresses.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/warn-1.c: Move to ...
* gcc.target/powerpc/error-1.c: ... here.  Adjust dg-warning with
dg-error and remove ineffective scan.
---
 gcc/config/rs6000/rs6000.cc   | 41 +++
 .../powerpc/{warn-1.c => error-1.c}   |  3 +-
 2 files changed, 24 insertions(+), 20 deletions(-)
 rename gcc/testsuite/gcc.target/powerpc/{warn-1.c => error-1.c} (70%)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 76bbb3a28ea..3b1ee3a262a 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -3822,32 +3822,37 @@ rs6000_option_override_internal (bool global_init_p)
   /* Add some warnings for VSX.  */
   if (TARGET_VSX)
 {
-  const char *msg = NULL;
+  bool explicit_vsx_p = rs6000_isa_flags_explicit & OPTION_MASK_VSX;
   if (!TARGET_HARD_FLOAT)
{
- if (rs6000_isa_flags_explicit & OPTION_MASK_VSX)
-   msg = N_("%<-mvsx%> requires hardware floating point");
- else
+ if (explicit_vsx_p)
{
- rs6000_isa_flags &= ~ OPTION_MASK_VSX;
- rs6000_isa_flags_explicit |= OPTION_MASK_VSX;
+ if (rs6000_isa_flags_explicit & OPTION_MASK_SOFT_FLOAT)
+   error ("%<-mvsx%> and %<-msoft-float%> are incompatible");
+ else
+   warning (0, N_("%<-mvsx%> requires hardware floating-point"));
}
+ rs6000_isa_flags &= ~OPTION_MASK_VSX;
+ rs6000_isa_flags_explicit |= OPTION_MASK_VSX;
}
   else if (TARGET_AVOID_XFORM > 0)
-   msg = N_("%<-mvsx%> needs indexed addressing");
-  else if (!TARGET_ALTIVEC && (rs6000_isa_flags_explicit
-  & OPTION_MASK_ALTIVEC))
-{
- if (rs6000_isa_flags_explicit & OPTION_MASK_VSX)
-   msg = N_("%<-mvsx%> and %<-mno-altivec%> are incompatible");
+   {
+ if (explicit_vsx_p && OPTION_SET_P (TARGET_AVOID_XFORM))
+   error ("%<-mvsx%> and %<-mavoid-indexed-addresses%>"
+  " are incompatible");
  else
-   msg = N_("%<-mno-altivec%> disables vsx");
-}
-
-  if (msg)
+   warning (0, N_("%<-mvsx%> needs indexed addressing"));
+ rs6000_isa_flags &= ~OPTION_MASK_VSX;
+ rs6000_isa_flags_explicit |= OPTION_MASK_VSX;
+   }
+  else if (!TARGET_ALTIVEC
+  && (rs6000_isa_flags_explicit & OPTION_MASK_ALTIVEC))
{
- warning (0, msg);
- rs6000_isa_flags &= ~ OPTION_MASK_VSX;
+ if (explicit_vsx_p)
+   error ("%<-mvsx%> and %<-mno-altivec%> are incompatible");
+ else
+   warning (0, N_("%<-mno-altivec%> disables vsx"));
+ rs6000_isa_flags &= ~OPTION_MASK_VSX;
  rs6000_isa_flags_explicit |= OPTION_MASK_VSX;
}
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/warn-1.c b/gcc/testsuite/gcc.target/powerpc/error-1.c
similarity index 70%
rename from gcc/testsuite/gcc.target/powerpc/warn-1.c
rename to gcc/testsuite/gcc.target/powerpc/error-1.c
index 76ac0c4e26e..d38eba8bb8a 100644
--- a/gcc/testsuite/gcc.target/powerpc/warn-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/error-1.c
@@ -3,7 +3,7 @@
 /* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-options "-O -mvsx -mno-altivec" } */

-/* { dg-warning "'-mvsx' and '-mno-altivec' are incompatible" "" { target *-*-* } 0 } */
+/* { dg-error "'-mvsx' and '-mno-altivec' are incompatible" "" { target *-*-* } 0 } */

 double
 foo (double *x, double *y)
@@ -16,4 +16,3 @@ foo (double *x, double *y)
   return z[0] * z[1];
 }

-/* { dg-final { scan-assembler-not "xsadddp" } } */
--
2.45.2


[PATCH] rs6000: Consider explicitly set options in target option parsing [PR115713]

2024-07-10 Thread Kewen.Lin
Hi,

In rs6000_inner_target_options, when enabling VSX we implicitly
enable altivec and disable -mavoid-indexed-addresses, but this
doesn't consider that the options altivec and
avoid-indexed-addresses may have been explicitly disabled.  As
the test case in PR115713#c1 shows, with target attribute
"no-altivec,vsx", VSX unexpectedly sets the altivec flag and the
expected error is not emitted.

This patch avoids the automatic enablement when the options are
explicitly specified.  With this change, the existing test case
ppc-target-4.c also requires an adjustment to specify altivec
explicitly in the target attribute (since it requires the
altivec feature while the command line specifies no-altivec).

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this next week if no objections.

BR,
Kewen
-

PR target/115713

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_inner_target_options): Avoid to
enable altivec or disable avoid-indexed-addresses automatically
when they get specified explicitly.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr115713-1.c: New test.
* gcc.target/powerpc/ppc-target-4.c: Adjust by specifying altivec
in target attribute.
---
 gcc/config/rs6000/rs6000.cc   |  7 +--
 .../gcc.target/powerpc/ppc-target-4.c |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr115713-1.c | 20 +++
 3 files changed, 26 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr115713-1.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 3b1ee3a262a..ed7a9fdeb58 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -24643,8 +24643,11 @@ rs6000_inner_target_options (tree args, bool attr_p)
  {
if (mask == OPTION_MASK_VSX)
  {
-   mask |= OPTION_MASK_ALTIVEC;
-   TARGET_AVOID_XFORM = 0;
+   if (!(rs6000_isa_flags_explicit
+ & OPTION_MASK_ALTIVEC))
+ mask |= OPTION_MASK_ALTIVEC;
+   if (!OPTION_SET_P (TARGET_AVOID_XFORM))
+ TARGET_AVOID_XFORM = 0;
  }
  }

diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-target-4.c b/gcc/testsuite/gcc.target/powerpc/ppc-target-4.c
index 43a98b353cf..db9ba500e0e 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc-target-4.c
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-target-4.c
@@ -18,7 +18,7 @@
 #error "__VSX__ should not be defined."
 #endif

-#pragma GCC target("vsx")
+#pragma GCC target("altivec,vsx")
 #include 
 #pragma GCC reset_options

diff --git a/gcc/testsuite/gcc.target/powerpc/pr115713-1.c b/gcc/testsuite/gcc.target/powerpc/pr115713-1.c
new file mode 100644
index 000..1b93a78682a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr115713-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* Force power7 to avoid possible error message on AltiVec ABI change.  */
+/* { dg-options "-mdejagnu-cpu=power7" } */
+
+/* Verify there is an error message for incompatible -maltivec and -mvsx
+   even when they are specified by target attributes.  */
+
+int __attribute__ ((target ("no-altivec,vsx")))
+test1 (void)
+{
+  /* { dg-error "'-mvsx' and '-mno-altivec' are incompatible" "" { target *-*-* } .-1 } */
+  return 0;
+}
+
+int __attribute__ ((target ("vsx,no-altivec")))
+test2 (void)
+{
+  /* { dg-error "'-mvsx' and '-mno-altivec' are incompatible" "" { target *-*-* } .-1 } */
+  return 0;
+}
--
2.45.2


[PATCH] rs6000: Update option set in rs6000_inner_target_options [PR115713]

2024-07-10 Thread Kewen.Lin
Hi,

When function rs6000_inner_target_options parses target options,
it updates the explicit option set information for
rs6000_opt_masks via rs6000_isa_flags_explicit, but it misses
updating that information for rs6000_opt_vars, which can result
in some unexpected consequences as the associated test case
shows.  This patch fixes rs6000_inner_target_options to update
the option set information for rs6000_opt_vars as well.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this next week if no objections.

BR,
Kewen
-

PR target/115713

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_inner_target_options): Update option
set information for rs6000_opt_vars.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr115713-2.c: New test.
---
 gcc/config/rs6000/rs6000.cc   |  3 ++-
 gcc/testsuite/gcc.target/powerpc/pr115713-2.c | 22 +++
 2 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr115713-2.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index ed7a9fdeb58..8647aa92fe9 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -24668,7 +24668,8 @@ rs6000_inner_target_options (tree args, bool attr_p)
if (strcmp (r, rs6000_opt_vars[i].name) == 0)
  {
size_t j = rs6000_opt_vars[i].global_offset;
-   *((int *) ((char *)&global_options + j)) = !invert;
+   *((int *) ((char *) &global_options + j)) = !invert;
+   *((int *) ((char *) &global_options_set + j)) = 1;
error_p = false;
not_valid_p = false;
break;
diff --git a/gcc/testsuite/gcc.target/powerpc/pr115713-2.c b/gcc/testsuite/gcc.target/powerpc/pr115713-2.c
new file mode 100644
index 000..47b39c0faba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr115713-2.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* Force power7 to avoid possible error message on AltiVec ABI change.  */
+/* { dg-options "-mdejagnu-cpu=power7" } */
+
+/* Verify there is an error message for -mvsx incompatible with
+   -mavoid-indexed-addresses even when they are specified by
+   target attributes.  */
+
+int __attribute__ ((target ("avoid-indexed-addresses,vsx")))
+test1 (void)
+{
+  /* { dg-error "'-mvsx' and '-mavoid-indexed-addresses' are incompatible" "" { target *-*-* } .-1 } */
+  return 0;
+}
+
+int __attribute__ ((target ("vsx,avoid-indexed-addresses")))
+test2 (void)
+{
+  /* { dg-error "'-mvsx' and '-mavoid-indexed-addresses' are incompatible" "" { target *-*-* } .-1 } */
+  return 0;
+}
+
--
2.45.2


Ping^5 [PATCH] add rlwinm pattern for DImode for constant building

2024-07-10 Thread Jiufu Guo


Hi,

Gentle ping...

BR,
Jeff(Jiufu) Guo

Jiufu Guo  writes:

> Hi,
>
> Gentle ping.
>
> BR,
> Jeff(Jiufu) Guo
>
> Jiufu Guo  writes:
>
>> Hi,
>>
>> Gentle ping ...
>>
>> Jiufu Guo  writes:
>>
>>> Hi,
>>>
>>> Gentle ping ...
>>>
>>> BR,
>>> Jeff(Jiufu) Guo
>>>
>>> Jiufu Guo  writes:
>>>
 Hi,

 The 'rlwinm' pattern is already well used for SImode.  As this instruction
 can touch the whole 64-bit register, some constants in 64-bit (DImode)
 can be built via 'lis/li + rlwinm'.  To achieve this, a new pattern for
 'rlwinm' is added, and 'rs6000_emit_set_long_const' is updated to check
 whether a constant can be built by 'lis/li; rlwinm'.

 Bootstrap and regtest pass on ppc64{,le}.

 Is this patch ok for trunk (when stage1 is open)?
>>
>> Is this patch ok for trunk?
>>
>> BR,
>> Jeff(Jiufu) Guo
>>

 Jeff (Jiufu Guo).

 gcc/ChangeLog:

* config/rs6000/rs6000-protos.h (can_be_rotated_to_lowbits): Add new
parameter.
* config/rs6000/rs6000.cc (can_be_built_by_li_lis_and_rlwinm): New 
 function.
(rs6000_emit_set_long_const): Generate 'lis/li+rlwinm'.
(can_be_rotated_to_lowbits): Add new parameter.
* config/rs6000/rs6000.md (rlwinm_di_mask): New pattern.

 gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr93012.c: Update to match 'rlwinm'.
* gcc.target/powerpc/rlwinm4di-1.c: New test.
* gcc.target/powerpc/rlwinm4di-2.c: New test.
* gcc.target/powerpc/rlwinm4di.c: New test.
* gcc.target/powerpc/rlwinm4di.h: New test.

 ---
  gcc/config/rs6000/rs6000-protos.h |  2 +-
  gcc/config/rs6000/rs6000.cc   | 65 ++-
  gcc/config/rs6000/rs6000.md   | 18 +
  gcc/testsuite/gcc.target/powerpc/pr93012.c|  2 +-
  .../gcc.target/powerpc/rlwinm4di-1.c  | 25 +++
  .../gcc.target/powerpc/rlwinm4di-2.c  | 19 ++
  gcc/testsuite/gcc.target/powerpc/rlwinm4di.c  |  6 ++
  gcc/testsuite/gcc.target/powerpc/rlwinm4di.h  | 25 +++
  8 files changed, 158 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm4di-1.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm4di-2.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm4di.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm4di.h

 diff --git a/gcc/config/rs6000/rs6000-protos.h 
 b/gcc/config/rs6000/rs6000-protos.h
 index 09a57a806fa..10505a8061a 100644
 --- a/gcc/config/rs6000/rs6000-protos.h
 +++ b/gcc/config/rs6000/rs6000-protos.h
 @@ -36,7 +36,7 @@ extern bool vspltisw_vupkhsw_constant_p (rtx, 
 machine_mode, int * = nullptr);
  extern int vspltis_shifted (rtx);
  extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
  extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
 -extern bool can_be_rotated_to_lowbits (unsigned HOST_WIDE_INT, int, int 
 *);
 +extern bool can_be_rotated_to_lowbits (unsigned HOST_WIDE_INT, int, int 
 *, bool = false);
  extern bool can_be_rotated_to_positive_16bits (HOST_WIDE_INT);
  extern bool can_be_rotated_to_negative_15bits (HOST_WIDE_INT);
  extern int num_insns_constant (rtx, machine_mode);
 diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
 index 6ba9df4f02e..853eaede673 100644
 --- a/gcc/config/rs6000/rs6000.cc
 +++ b/gcc/config/rs6000/rs6000.cc
 @@ -10454,6 +10454,51 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, 
 int *shift, HOST_WIDE_INT *mask)
return false;
  }
  
 +/* Check if value C can be generated by 2 instructions, one instruction
 +   is li/lis, another instruction is rlwinm.  */
 +
 +static bool
 +can_be_built_by_li_lis_and_rlwinm (HOST_WIDE_INT c, HOST_WIDE_INT *val,
 + int *shift, HOST_WIDE_INT *mask)
 +{
 +  unsigned HOST_WIDE_INT low = c & 0xULL;
 +  unsigned HOST_WIDE_INT high = (c >> 32) & 0xULL;
 +  unsigned HOST_WIDE_INT v;
 +
 +  /* diff of high and low (high ^ low) should be the mask position.  */
 +  unsigned HOST_WIDE_INT m = low ^ high;
 +  int tz = ctz_hwi (m);
 +  int lz = clz_hwi (m);
 +  if (m != 0)
 +m = ((HOST_WIDE_INT_M1U >> (lz + tz)) << tz);
 +  if (high != 0)
 +m = ~m;
 +  v = high != 0 ? high : ((low | ~m) & 0x);
 +
 +  if ((high != 0) && ((v & m) != low || lz < 33 || tz < 1))
 +return false;
 +
 +  /* rotl32 on positive/negative value of 'li' 15/16bits.  */
 +  int n;
 +  if (!can_be_rotated_to_lowbits (v, 15, &n, true)
 +  && !can_be_rotated_to_lowbits ((~v) & 0xULL, 15, &n, true))
 +{
 +  /* rotate32 from a negative value of 'lis'.  */
 +  if (!can_be_rotated_to_lowbits (v & 0xFFF

Re: [Fortran, Patch, PR 96992, V4] Fix Class arrays of different ranks are rejected as storage association argument

2024-07-10 Thread Andre Vehreschild
Hi Harald,

thanks for the review.  I totally agree that this patch has gotten bigger
than I expected (and wanted).  But things are as they are.

About the coding style: I have worked in so many projects that I consider
a consistent coding style a luxury.  I especially do not have my own one
anymore.  The formatting you are seeing in my patches is the result of
clang-format with the parameter file provided in contrib/clang-format.  I
was happy to have a tool to do the formatting that I could integrate into
my IDE, because previously it was hard to mimic the GNU style.  I try to
get as close to the GNU style as possible in the places where I consider
clang-format's output garbage.

I see that clang-format has a "very specific opinion" on how to format
the lines you mentioned, and it will "correct" them any time I change and
touch them later.  I have now forbidden clang-format from touching those
code lines, but this means adding formatter-specific comments.  Is this ok?

About the assumed-size arrays: that was a small change and is added now.

Note, the runtime part of the patch (pr96992_3p1.patch) did not change and is
therefore not updated.

Regtests ok on x86_64-pc-linux-gnu/Fedora 39. Ok for mainline?

Regards,
Andre

On Fri, 5 Jul 2024 22:10:16 +0200
Harald Anlauf  wrote:

> Hi Andre,
>
> Am 03.07.24 um 12:58 schrieb Andre Vehreschild:
> > Hi Harald,
> >
> > I am sorry for the long delay, but fixing the negative stride lead from one
> > issue to the next. I finally got a version that does not regress. Please
> > have a look.
> >
> > This patch has two parts:
> > 1. The runtime library part in pr96992_3p1.patch and
> > 2. the compiler changes in pr96992_3p2.patch.
> >
> > In my branch also the two patches from Paul for pr59104 and pr102689 are
> > living, which might lead to small shifts during application of the patches.
> >
> > NOTE, this patch adds internal packing and unpacking of class arrays
> > similar to the regular pack and unpack. I think this is necessary, because
> > the regular un-/pack does not use the vptr's _copy routine for moving data
> > and therefore may produce bugs.
> >
> > The un-/pack_class routines are yet only used for converting a derived type
> > array to a class array. Extending their use when a UN-/PACK() is applied on
> > a class array is still to be done (as part of another PR).
> >
> > Regtests fine on x86_64-pc-linux-gnu/ Fedora 39.
>
> this is a really huge patch to review, and I am not sure that I can do
> this without help from others.  Paul?  Anybody else?
>
> As far as I can tell for now:
>
> - pr96992_3p1.patch (the libgfortran part) looks good to me.
>
> - git had some whitespace issues with pr96992_3p2.patch as attached,
>but I could fix that locally and do some testing parallel to reading.
>
> A few advance comments on the latter patch:
>
> - my understanding is that the PR at the end of a summary line should be
>like in:
>
> Fortran: Fix rejecting class arrays of different ranks as storage
> association argument [PR96992]
>
>I was told that this helps people explicitly scanning for the PR
>number in that place.
>
> - some rewrites of logical conditions change the coding style from
>the recommended GNU coding style, and I find the more compact
>way used in some places harder to grok (but that may be just me).
>Example:
>
> @@ -8850,20 +8857,24 @@ gfc_conv_array_parameter (gfc_se * se, gfc_expr
> * expr, bool g77,
> /* There is no need to pack and unpack the array, if it is contiguous
>and not a deferred- or assumed-shape array, or if it is simply
>contiguous.  */
> -  no_pack = ((sym && sym->as
> -   && !sym->attr.pointer
> -   && sym->as->type != AS_DEFERRED
> -   && sym->as->type != AS_ASSUMED_RANK
> -   && sym->as->type != AS_ASSUMED_SHAPE)
> -   ||
> -  (ref && ref->u.ar.as
> -   && ref->u.ar.as->type != AS_DEFERRED
> +  no_pack = false;
> +  gfc_array_spec *as;
> +  if (sym)
> +{
> +  symbol_attribute *attr
> + = &(IS_CLASS_ARRAY (sym) ? CLASS_DATA (sym)->attr : sym->attr);
> +  as = IS_CLASS_ARRAY (sym) ? CLASS_DATA (sym)->as : sym->as;
> +  no_pack
> + = (as && !attr->pointer && as->type != AS_DEFERRED
> +&& as->type != AS_ASSUMED_RANK && as->type != AS_ASSUMED_SHAPE);
> +}
> +  if (ref && ref->u.ar.as)
> +no_pack = no_pack
> +   || (ref->u.ar.as->type != AS_DEFERRED
> && ref->u.ar.as->type != AS_ASSUMED_RANK
> -   && ref->u.ar.as->type != AS_ASSUMED_SHAPE)
> -   ||
> -  gfc_is_simply_contiguous (expr, false, true));
> -
> -  no_pack = contiguous && no_pack;
> +   && ref->u.ar.as->type != AS_ASSUMED_SHAPE);
> +  no_pack
> += contiguous && (no_pack || gfc_is_simply_contiguous (expr, false,
> true));
>
> /* If we have an EXPR_OP or a function returning an explicit-shaped
>or allocatable array, an array temporary will be g

Re: [PATCH V4] report message for operator %a on unaddressible operand

2024-07-10 Thread Jiufu Guo


Hi,

"Kewen.Lin"  writes:

> Hi Jeff,
>
> on 2024/6/5 16:30, Jiufu Guo wrote:
>> Hi,
>> 
>> For PR96866, when printing asm code for modifier "%a", an addressable
>> operand is required.  However, the constraint "X" allows any kind of
>> operand, even ones whose address is hard to get directly, e.g. an
>> extern symbol whose address is in the TOC.
>> An error message would be reported to indicate the invalid asm operand.
>> 
>> Compared with the previous version, the changelog and emitted message are updated.
>> 
>> Bootstrap®test pass on ppc64{,le}.
>> Is this ok for trunk?
>> 
>> BR,
>> Jeff(Jiufu Guo)
>> 
>>  PR target/96866
>> 
>> gcc/ChangeLog:
>> 
>>  * config/rs6000/rs6000.cc (print_operand_address): Emit message for
>>  Unsupported operand.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  * gcc.target/powerpc/pr96866-1.c: New test.
>>  * gcc.target/powerpc/pr96866-2.c: New test.
>> 
>> ---
>>  gcc/config/rs6000/rs6000.cc  |  7 ++-
>>  gcc/testsuite/gcc.target/powerpc/pr96866-1.c | 18 ++
>>  gcc/testsuite/gcc.target/powerpc/pr96866-2.c | 13 +
>>  3 files changed, 37 insertions(+), 1 deletion(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-2.c
>> 
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index 117999613d8..7e7c36a1bad 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -14664,7 +14664,12 @@ print_operand_address (FILE *file, rtx x)
>>  fprintf (file, "@%s(%s)", SMALL_DATA_RELOC,
>>   reg_names[SMALL_DATA_REG]);
>>else
>> -gcc_assert (!TARGET_TOC);
>> +{
>> +  /* Do not support getting address directly from TOC, emit error.
>> + No more work is needed for !TARGET_TOC. */
>> +  if (TARGET_TOC)
>> +output_operand_lossage ("%%a requires an address of memory");
>> +}
>>  }
>>else if (GET_CODE (x) == PLUS && REG_P (XEXP (x, 0))
>> && REG_P (XEXP (x, 1)))
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-1.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>> new file mode 100644
>> index 000..bcebbd6e310
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>> @@ -0,0 +1,18 @@
>> +/* The "%a" modifier can't get the address of extern symbol directly from TOC
>> +   with -fPIC, even if the symbol is propagated for "X" constraint under -O2. */
>> +/* { dg-options "-fPIC -O2" } */
>> +
>> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.  */
>> +/* { dg-excess-errors "pr96866-1.c" } */
>
> This seems to XPASS on Power10 with pcrel?  This needs ! powerpc_pcrel
> guard if so.

Oh, yeap. Thanks for point out this! %a would accept "X"(&x) with pcrel.

>
>> +
>> +int x[2];
>> +
>> +int __attribute__ ((noipa))
>> +f1 (void)
>> +{
>> +  int n;
>> +  int *p = x;
>> +  *p++;
>> +  __asm__ volatile("ld %0, %a1" : "=r"(n) : "X"(p));
>> +  return n;
>> +}
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-2.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
>> new file mode 100644
>> index 000..0577fd6d588
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
>> @@ -0,0 +1,13 @@
>> +/* The "%a" modifier can't get the address of extern symbol directly from TOC
>> +   with -fPIC. */
>> +/* { dg-options "-fPIC -O2" } */
>> +
>> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.  
>> */
>> +/* { dg-excess-errors "pr96866-2.c" } */
>
> Ditto.
Thanks again.

BR,
Jeff(Jiufu) Guo.

>
> The others look good to me.
>
> BR,
> Kewen
>
>> +
>> +void
>> +f (void)
>> +{
>> +  extern int x;
>> +  __asm__ volatile("#%a0" ::"X"(&x));
>> +}


Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-10 Thread Tejas Belagod

On 7/9/24 4:22 PM, Richard Biener wrote:

On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod  wrote:


On 7/8/24 4:45 PM, Richard Biener wrote:

On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod  wrote:


Hi,

Sorry to have dropped the ball on
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
here I've tried to pick it up again and write up a strawman proposal for
elevating __attribute__((vector_mask)) to the FE from GIMPLE.


Thanks,
Tejas.

Motivation
--

The idea of packed boolean vectors came about when we wanted to support
C/C++ operators on SVE ACLE types. The current vector boolean type that
ACLE specifies does not adequately disambiguate vector lane sizes which
they were derived off of. Consider this simple, albeit unrealistic, example:

 bool foo (svint32_t a, svint32_t b)
 {
   svbool_t p = a > b;

   // Here p[2] is not the same as a[2] > b[2].
   return p[2];
 }

In the above example, because svbool_t has a fixed one-lane-per-byte
layout, p[i] does not return the bool value corresponding to a[i] > b[i].
This necessitates a 'typed' vector boolean value that unambiguously
represents results of operations of the same type.

__attribute__((vector_mask))
-

Note: If interested in historical discussions refer to:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html

We define this new attribute which when applied to a base data vector
produces a new boolean vector type that represents a boolean type that
is produced as a result of operations on the corresponding base vector
type. The following is the syntax.

 typedef int v8si __attribute__((vector_size (8 * sizeof (int)));
 typedef v8si v8sib __attribute__((vector_mask));

Here the 'base' data vector type is v8si or a vector of 8 integers.

Rules

• The layout/size of the boolean vector type is implementation-defined
for its base data vector type.

• Two boolean vector types whose base data vector types have the same
number of elements and lane width have the same layout and size.

• Consequently, two boolean vectors whose base data vector types have
different numbers of elements or different lane sizes have different layouts.

This aligns with gnu vector extensions that generate integer vectors as
a result of comparisons - "The result of the comparison is a vector of
the same width and number of elements as the comparison operands with a
signed integral element type." according to
  https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.


Without having the time to re-review this all in detail I think the GNU
vector extension does not expose the result of the comparison as the
machine would produce it but instead a comparison "decays" to
a conditional:

typedef int v4si __attribute__((vector_size(16)));

v4si a;
v4si b;

void foo()
{
auto r = a < b;
}

produces, with C23:

vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0,
0, 0, 0 } > ;

In fact on x86_64 with AVX and AVX512 you have two different "machine
produced" mask types and the above could either produce a AVX mask with
32bit elements or a AVX512 mask with 1bit elements.

Not exposing "native" mask types requires the compiler optimizing subsequent
uses and makes generic vectors difficult to combine with for example AVX512
intrinsics (where masks are just 'int').  Across an ABI boundary it's also
even more difficult to optimize mask transitions.

But it at least allows portable code and it does not suffer from users trying to
expose machine representations of masks as input to generic vector code
with all the problems of constant folding not only requiring self-consistent
code within the compiler but compatibility with user produced constant masks.

That said, I somewhat question the need to expose the target mask layout
to users for GCCs generic vector extension.



Thanks for your feedback.

IIUC, I can imagine how having a GNU vector extension expose the
target vector mask layout can pose a challenge - maybe making it a
generic GNU vector extension was too ambitious. I wonder if there's
value in pursuing these alternative paths?

1. Could implementing this extension in a 'generic' way, i.e. possibly
not implementing it with a target mask but with just a generic int
vector, still maintain the consistency of GNU predicate vectors within
the compiler? I know it may not seem very different from how boolean
vectors are currently implemented (as in your above example), but
having __attribute__((vector_mask)) as a 'property' of the object makes
it useful to optimize its uses into target predicates in subsequent
stages of the compiler.

2. Restricting __attribute__((vector_mask)) to apply only to target
intrinsic types? Eg.

On SVE something like:
typedef svint16_t svpred16_t __attribute__((vector_mask)); // OK.

On AVX, something like:
typedef __m256i __mask32 __attribute__((vector_mask)); // OK - though
this would require more fine-grained defn of lane-size to mask-bits mapping.


I think the ta

[PATCH 1/3] lower SLP load permutation to interleaving

2024-07-10 Thread Richard Biener
The following emulates classical interleaving for SLP load permutes
that we are unlikely to handle natively.  This is to handle cases
where interleaving (or load/store-lanes) is the optimal choice for
vectorizing even when we are doing that within SLP.  An example
would be

void foo (int * __restrict a, int * b)
{
  for (int i = 0; i < 16; ++i)
{
  a[4*i + 0] = b[4*i + 0] * 3;
  a[4*i + 1] = b[4*i + 1] + 3;
  a[4*i + 2] = (b[4*i + 2] * 3 + 3);
  a[4*i + 3] = b[4*i + 3] * 3;
}
}

where currently the SLP store is merging four single-lane SLP
sub-graphs but none of the loads in it can be code-generated
with V4SImode vectors and a VF of four as the permutes would need
three vectors.

The patch introduces a lowering phase after SLP discovery but
before SLP pattern recognition or permute optimization that
analyzes all loads from the same dataref group and creates an
interleaving scheme starting from an unpermuted load.

What can be handled is a power-of-two group size; group size of
three is handled in a followup, as is the possibility of
doing the interleaving with a load-lanes-like instruction.

The patch has a fallback for when there are multi-lane groups
and the resulting permutes do not fit interleaving.  Code
generation is not optimal when this triggers and might be
worse than doing single-lane group interleaving.

The patch handles gaps by representing them with NULL
entries in SLP_TREE_SCALAR_STMTS for the unpermuted load node.
The SLP discovery changes could be elided if we manually build the
load node instead.

SLP load nodes covering enough lanes to not need intermediate
permutes are retained as having a load-permutation and do not
use the single SLP load node for each dataref group.  That's
something we might want to change, making load-permutation
something purely local to SLP discovery (but then SLP discovery
could do part of the lowering).

The patch misses CSEing intermediate generated permutes and
registering them with the bst_map which is possibly required
for SLP pattern detection in some cases.

* tree-vect-slp.cc (vect_build_slp_tree_1): Handle NULL stmt.
(vect_build_slp_tree_2): Likewise.  Release load permutation
when there's a NULL in SLP_TREE_SCALAR_STMTS and assert there's
no actual permutation in that case.
(vllp_cmp): New function.
(vect_lower_load_permutations): Likewise.
(vect_analyze_slp): Call it.

* gcc.dg/vect/slp-11a.c: Expect SLP.
* gcc.dg/vect/slp-12a.c: Likewise.
* gcc.dg/vect/slp-51.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/slp-11a.c |   2 +-
 gcc/testsuite/gcc.dg/vect/slp-12a.c |   2 +-
 gcc/testsuite/gcc.dg/vect/slp-51.c  |  17 ++
 gcc/tree-vect-slp.cc| 343 +++-
 4 files changed, 360 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/slp-51.c

diff --git a/gcc/testsuite/gcc.dg/vect/slp-11a.c b/gcc/testsuite/gcc.dg/vect/slp-11a.c
index fcb7cf6c7a2..2efa1796757 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-11a.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-11a.c
@@ -72,4 +72,4 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-12a.c b/gcc/testsuite/gcc.dg/vect/slp-12a.c
index 2f98dc9da0b..fedf27b69d2 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-12a.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-12a.c
@@ -80,5 +80,5 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { { vect_strided8 && {! vect_load_lanes } } && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { { vect_strided8 && {! vect_load_lanes } } && vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-51.c b/gcc/testsuite/gcc.dg/vect/slp-51.c
new file mode 100644
index 000..91ae763be30
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/slp-51.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+
+void foo (int * __restrict x, int *y)
+{
+  x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__);
+  y = __builtin_assume_aligned (y, __BIGGEST_ALIGNMENT__);
+  for (int i = 0; i < 1024; ++i)
+{
+  x[4*i+0] = y[4*i+0];
+  x[4*i+1

[PATCH 2/3] Support group-size of three in SLP load permutation lowering

2024-07-10 Thread Richard Biener
The following adds support for group-size three in SLP load permutation
lowering to match the non-SLP capabilities.  This is done by using
the non-interleaving fallback code, which at VF == 4 creates from
{ { a0, b0, c0 }, { a1, b1, c1 }, { a2, b2, c2 }, { a3, b3, c3 } }
the intermediate vectors { c0, c0, c1, c1 } and { c2, c2, c3, c3 }
to produce { c0, c1, c2, c3 }.

This turns out to be more effective than the scheme implemented
for non-SLP for SSE, only slightly worse for AVX512, and somewhat
worse for AVX2.  It seems to me that this would extend to
other non-power-of-two group-sizes as well (though the patch does not do so).
Optimal schemes are likely difficult to lay out in VF agnostic form.

I'll note that while the lowering assumes even/odd extract is
generally available for all vector element sizes (which is probably
a good assumption), it doesn't in any way constrain the other
permutes it generates based on target availability.  Again difficult
to do in a VF agnostic way (but at least currently the vector type
is fixed).

I'll also note that the SLP store side merges lanes in a way
producing three-vector permutes for store group-size of three, so
the testcase uses a store group-size of four.

* tree-vect-slp.cc (vect_lower_load_permutations): Support
group-size of three.

* gcc.dg/vect/slp-52.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/slp-52.c | 14 
 gcc/tree-vect-slp.cc   | 35 +-
 2 files changed, 34 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/slp-52.c

diff --git a/gcc/testsuite/gcc.dg/vect/slp-52.c 
b/gcc/testsuite/gcc.dg/vect/slp-52.c
new file mode 100644
index 000..ba49f0046e2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/slp-52.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+
+void foo (int * __restrict x, int *y)
+{
+  for (int i = 0; i < 1024; ++i)
+{
+  x[4*i+0] = y[3*i+0];
+  x[4*i+1] = y[3*i+1] * 2;
+  x[4*i+2] = y[3*i+2] + 3;
+  x[4*i+3] = y[3*i+2] * 2 - 5;
+}
+}
+
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { 
vect_int && vect_int_mult } } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 0f830c1ad9c..2dc6d365303 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3710,7 +3710,8 @@ vect_build_slp_instance (vec_info *vinfo,
 with the least number of lanes to one and then repeat until
 we end up with two inputs.  That scheme makes sure we end
 up with permutes satisfying the restriction of requiring at
-most two vector inputs to produce a single vector output.  */
+most two vector inputs to produce a single vector output
+when the number of lanes is even.  */
  while (SLP_TREE_CHILDREN (perm).length () > 2)
{
  /* When we have three equal sized groups left the pairwise
@@ -4050,11 +4051,10 @@ vect_lower_load_permutations (loop_vec_info loop_vinfo,
 = DR_GROUP_FIRST_ELEMENT (SLP_TREE_SCALAR_STMTS (loads[0])[0]);
 
   /* Only a power-of-two number of lanes matches interleaving with N levels.
- The non-SLP path also supports DR_GROUP_SIZE == 3.
  ???  An even number of lanes could be reduced to 1<= group_lanes / 2)
+  if (SLP_TREE_LANES (load) >= (group_lanes + 1) / 2)
continue;
 
   /* First build (and possibly re-use) a load node for the
@@ -4107,7 +4107,7 @@ vect_lower_load_permutations (loop_vec_info loop_vinfo,
   while (1)
{
  unsigned group_lanes = SLP_TREE_LANES (l0);
- if (SLP_TREE_LANES (load) >= group_lanes / 2)
+ if (SLP_TREE_LANES (load) >= (group_lanes + 1) / 2)
break;
 
  /* Try to lower by reducing the group to half its size using an
@@ -4117,19 +4117,24 @@ vect_lower_load_permutations (loop_vec_info loop_vinfo,
 Thus { e, e, o, o, e, e, o, o } woud be an even/odd decomposition
 with N == 2.  */
  /* ???  Only an even number of lanes can be handed this way, but the
-fallback below could work for any number.  */
- gcc_assert ((group_lanes & 1) == 0);
- unsigned even = (1 << ceil_log2 (group_lanes)) - 1;
- unsigned odd = even;
- for (auto l : final_perm)
+fallback below could work for any number.  We have to make sure
+to round up in that case.  */
+ gcc_assert ((group_lanes & 1) == 0 || group_lanes == 3);
+ unsigned even = 0, odd = 0;
+ if ((group_lanes & 1) == 0)
{
- even &= ~l.second;
- odd &= l.second;
+ even = (1 << ceil_log2 (group_lanes)) - 1;
+ odd = even;
+ for (auto l : final_perm)
+   {
+ even &= ~l.second;
+ odd &= l.second;
+   }
}
 
  /* Now build an even or odd extractio

[PATCH 3/3] RISC-V: load and store-lanes with SLP

2024-07-10 Thread Richard Biener
The following is a prototype for how to represent load/store-lanes
within SLP.  I've for now settled with having a single load node
with multiple permute nodes acting as selection, one for each loaded lane
and a single store node fed from all stored lanes.  For

  for (int i = 0; i < 1024; ++i)
{
  a[2*i] = b[2*i] + 7;
  a[2*i+1] = b[2*i+1] * 3;
}

you have the following SLP graph where I explain how things are set
up and code-generated:

t.c:23:21: note:   SLP graph after lowering permutations:
t.c:23:21: note:   node 0x50dc8b0 (max_nunits=1, refcnt=1) vector(4) int
t.c:23:21: note:   op template: *_6 = _7;
t.c:23:21: note:stmt 0 *_6 = _7;
t.c:23:21: note:stmt 1 *_12 = _13;
t.c:23:21: note:children 0x50dc488 0x50dc6e8

This is the store node, it's marked with ldst_lanes = true during
SLP discovery.  This node code-generates

  vect_array.65[0] = vect__7.61_29;
  vect_array.65[1] = vect__13.62_28;
  MEM  [(int *)vectp_a.63_27] = .STORE_LANES (vect_array.65);

...
t.c:23:21: note:   node 0x50dc520 (max_nunits=4, refcnt=2) vector(4) int
t.c:23:21: note:   op: VEC_PERM_EXPR
t.c:23:21: note:stmt 0 _5 = *_4;
t.c:23:21: note:lane permutation { 0[0] }
t.c:23:21: note:children 0x50dc948
t.c:23:21: note:   node 0x50dc780 (max_nunits=4, refcnt=2) vector(4) int
t.c:23:21: note:   op: VEC_PERM_EXPR
t.c:23:21: note:stmt 0 _11 = *_10;
t.c:23:21: note:lane permutation { 0[1] }
t.c:23:21: note:children 0x50dc948

These are the selection nodes, marked with ldst_lanes = true.
They code generate nothing.

t.c:23:21: note:   node 0x50dc948 (max_nunits=4, refcnt=3) vector(4) int
t.c:23:21: note:   op template: _5 = *_4;
t.c:23:21: note:stmt 0 _5 = *_4;
t.c:23:21: note:stmt 1 _11 = *_10;
t.c:23:21: note:load permutation { 0 1 }

This is the load node, marked with ldst_lanes = true (the load
permutation is only accurate when taking into account the lane permute
in the selection nodes).  It code generates

  vect_array.58 = .LOAD_LANES (MEM  [(int *)vectp_b.56_33]);
  vect__5.59_31 = vect_array.58[0];
  vect__5.60_30 = vect_array.58[1];

This scheme allows code generation in vectorizable_load/store to be
left mostly as-is.

While this should support both load-lanes and (masked) store-lanes,
the decision to do either is made at SLP discovery time and
cannot be reversed without altering the SLP tree - as-is, the SLP
tree is not usable for non-store-lanes on the store side.  The
load side is OK representation-wise, but will very likely fail
permute handling since the lowering to deal with the two-input-vector
restriction isn't done - and since the permute node is
marked as to-be-ignored, that doesn't work out.  So I've put
restrictions in place that fail vectorization if a load/store-lanes
SLP tree is later classified differently by get_load_store_type.

With this I've disabled the code scrapping SLP as it will no longer
fire.  I'll note that, for example, gcc.target/aarch64/sve/mask_struct_store_3.c
will not get SLP store-lanes used because the full store SLPs just
fine, though we then fail to handle the "splat" load permutation

t2.c:5:21: note:   node 0x4db2630 (max_nunits=4, refcnt=2) vector([4,4]) int
t2.c:5:21: note:   op template: _6 = *_5;
t2.c:5:21: note:stmt 0 _6 = *_5;
t2.c:5:21: note:stmt 1 _6 = *_5;
t2.c:5:21: note:stmt 2 _6 = *_5;
t2.c:5:21: note:stmt 3 _6 = *_5;
t2.c:5:21: note:load permutation { 0 0 0 0 }

the load permute lowering code currently doesn't consider it worth
lowering single loads from a group (or in this case not grouped loads).
The expectation is the target can handle this by two interleaves with
itself.

So what we see here is that while the explicit SLP representation is
helpful in some cases, in cases like this it would require changing
it when we make decisions how to vectorize.  My idea is that this
all will change a lot when we re-do SLP discovery (for loops) and
when we get rid of non-SLP as I think vectorizable_* should be
allowed to alter the SLP graph during analysis.

I'm not sure what's the best way forward - if we can decide to
live with (temporary) regressions in this area?  There is the possibility
to do the "non-SLP" mode by forcing single-lane discovery everywhere(?)
as a temporary measure.  Unfortunately this will alter the VF and thus
cannot be done on-the-fly per SLP instance I think (much like we cannot
currently cancel only one SLP instance without a full re-analysis).

* tree-vectorizer.h (_slp_tree::ldst_lanes): New flag to mark
load, store and permute nodes.
* tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize ldst_lanes.
(vect_build_slp_instance): For stores iff the target prefers
store-lanes discover single-lane sub-groups, do not perform
interleaving lowering but mark the node with ldst_lanes.
(vect_lower_load_permutations): When the target supports
load lanes a

Re: [patch,avr] PR115830: Improve code by using more condition code

2024-07-10 Thread Georg-Johann Lay

Am 10.07.24 um 01:17 schrieb Jeff Law:

On 7/9/24 4:03 AM, Georg-Johann Lay wrote:

Hi Jeff,

This patch adds peephole2s and insns to make better use of
instructions that set condition code (SREG) as a byproduct.

Of course with cc0 all this was *much* simpler... so here we go;
adding CCNmode and CCZNmode, and extra insns that do arith + CC.

No new regressions.

Ok for master?

Johann

--

AVR: target/115830 - Make better use of SREG.N and SREG.Z.

This patch adds new CC modes CCN and CCZN for operations that
set SREG.N, resp. SREG.Z and SREG.N.  Add a bunch of peephole2
patterns to generate new compute + branch insns that make use
of the Z and N flags.  Most of these patterns need their own
asm output routines that don't do all the micro-optimizations
that the ordinary outputs may perform, as the latter have no
requirement to set CC in a usable way.  Pass peephole2 is run
a second time so all patterns get a chance to match.

 PR target/115830
gcc/
 * config/avr/avr-modes.def (CCN, CCZN): New CC_MODEs.
 * config/avr/avr-protos.h (ret_cond_branch): Adjust.
 (avr_out_plus_set_N, avr_op8_ZN_operator,
 avr_out_op8_set_ZN, avr_len_op8_set_ZN): New protos.
 * config/avr/avr.cc (ret_cond_branch): Remove "reverse"
 argument (was always false) and respective code.
 Pass cc_overflow_unusable as an argument.
  (cond_string): Add bool cc_overflow_unusable argument.
 (avr_print_operand) ['L']: Like 'j' but overflow unusable.
 ['K']: Like 'k' but overflow unusable.
 (avr_out_plus_set_ZN): Also support adding -2 and +2.
 (avr_out_plus_set_N, avr_op8_ZN_operator): New functions.
 (avr_out_op8_set_ZN, avr_len_op8_set_ZN): New functions.
 (avr_adjust_insn_length) [ADJUST_LEN_ADD_SET_N]: Handle case.
 (avr_class_max_nregs): All MODE_CCs occupy one hard reg.
 (avr_hard_regno_nregs): Same.
 (avr_hard_regno_mode_ok) [REG_CC]: Allow all MODE_CC.
 (pass_manager.h): Include it.
 (avr_option_override): Run peephole2 a second time.
 * config/avr/avr.md (adjust_len) [add_set_N]: New.
 (ALLCC, CCN_CCZN): New mode iterators.
 (CCname): New mode attribute.
 (eqnegtle, cmp_signed, op8_ZN): New code iterators.
 (swap, SWAP, tstMSB): New code attributes.
 (branch): Handle CCNmode and CCZNmode.  Assimilate...
 (difficult_branch): ...this insn.
 (p1m1): Turn into p2m2.
 (gen_add_for__): Adjust to CCNmode and CCZNmode.
 Extend peephole2s that produce them.
 (*add.for.eqne.): Extend to 
*add.for...

 (*ashift.for.ccn.): New insns and peephole2s to make them.
 (*op8.for.cczn.): New insns and peephole2s to make them.
 * config/avr/predicates.md (const_1_to_3_operand)
 (abs1_abs2_operand, signed_comparison_operator)
 (op8_ZN_operator): New predicates.
gcc/testsuite/
 * gcc.target/avr/pr115830-add-c.c: New test.
 * gcc.target/avr/pr115830-add-i.c: New test.
 * gcc.target/avr/pr115830-and.c: New test.
 * gcc.target/avr/pr115830-asl.c: New test.
 * gcc.target/avr/pr115830-asr.c: New test.
 * gcc.target/avr/pr115830-ior.c: New test.
 * gcc.target/avr/pr115830-lsr.c: New test.
 * gcc.target/avr/pr115830-asl32.c: New test.
I was going to throw this into my tester, but the avr.md part of the 
patch failed.  I'm guessing the patch needs minor updates due to some 
kind of changes on the trunk.


Hi Jeff,

The previous change to avr.md was several days ago, and should not
interfere with this one.  Anyway, I rebased the patch against
master and attached it below.  The patch is atop the ref in the patch
file name: https://gcc.gnu.org/r15-1935

It looks like avr exposes the CC register early, creating references to 
it during expansion to RTL.  Presumably this means you've got a 
reasonable way to reload values, particularly address arithmetic without 
impacting the CC state?


No.  CC only comes into existence after reload, in .split2 and in some
cases in .avr-ifelse.  Reloads may clobber CC, so there is no way
to split cbranch insns prior to reload.

It looks like you're relying heavily on peep2 patterns.  Did you explore 
using cmpelim?


Some question I have:

- compare-elim.cc states that the comparisons must look like
  [(set (reg:CC) (compare:CC (reg) (reg_or_immediate)))]
which is not always the case; in particular the patterns
may have clobbers.  compare-elim uses single_set, which
is true for a pattern with a clobber / scratch.  However,
the presence / absence of a clobber reg has to be taken into
account when deciding whether a transformation into a CC mode
other than CCmode is possible.  compare-elim only supplies
the SET_SRC, but not the scratch_operand, to SELECT_CC_MODE.

- The internals documentation says that some optimizations / transforms
happen prior to reload (e.g. .combine).  This is not
possible here since CCmode only exists from .split2 on.

- Insn combine may have transformed some comparisons, e.g.
sign tests are represented as a zero_extract + skip like
in the sbrx_branch insns.  This mean

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-10 Thread Richard Biener
On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod  wrote:
>
> On 7/9/24 4:22 PM, Richard Biener wrote:
> > On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod  wrote:
> >>
> >> On 7/8/24 4:45 PM, Richard Biener wrote:
> >>> On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod  
> >>> wrote:
> 
>  Hi,
> 
>  Sorry to have dropped the ball on
>  https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
>  here I've tried to pick it up again and write up a strawman proposal for
>  elevating __attribute__((vector_mask)) to the FE from GIMPLE.
> 
> 
>  Thanks,
>  Tejas.
> 
>  Motivation
>  --
> 
>  The idea of packed boolean vectors came about when we wanted to support
>  C/C++ operators on SVE ACLE types. The current vector boolean type that
>  ACLE specifies does not adequately disambiguate vector lane sizes which
>  they were derived off of. Consider this simple, albeit unrealistic, 
>  example:
> 
>   bool foo (svint32_t a, svint32_t b)
>   {
> svbool_t p = a > b;
> 
> // Here p[2] is not the same as a[2] > b[2].
> return p[2];
>   }
> 
>  In the above example, because svbool_t has a fixed 1-lane-per-byte, p[i]
>  does not return the bool value corresponding to a[i] > b[i]. This
>  necessitates a 'typed' vector boolean value that unambiguously
>  represents results of operations
>  of the same type.
> 
>  __attribute__((vector_mask))
>  -
> 
>  Note: If interested in historical discussions refer to:
>  https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html
> 
>  We define this new attribute which when applied to a base data vector
>  produces a new boolean vector type that represents a boolean type that
>  is produced as a result of operations on the corresponding base vector
>  type. The following is the syntax.
> 
>   typedef int v8si __attribute__((vector_size (8 * sizeof (int)));
>   typedef v8si v8sib __attribute__((vector_mask));
> 
>  Here the 'base' data vector type is v8si or a vector of 8 integers.
> 
>  Rules
> 
>  • The layout/size of the boolean vector type is implementation-defined
>  for its base data vector type.
> 
>  • Two boolean vector types whose base data vector types have the same
>  number of elements and lane-width have the same layout and size.
> 
>  • Consequently, two boolean vectors whose base data vector types have
>  different numbers of elements or different lane-sizes have different
>  layouts.
> 
>  This aligns with gnu vector extensions that generate integer vectors as
>  a result of comparisons - "The result of the comparison is a vector of
>  the same width and number of elements as the comparison operands with a
>  signed integral element type." according to
>    https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.
> >>>
> >>> Without having the time to re-review this all in detail I think the GNU
> >>> vector extension does not expose the result of the comparison as the
> >>> machine would produce it but instead a comparison "decays" to
> >>> a conditional:
> >>>
> >>> typedef int v4si __attribute__((vector_size(16)));
> >>>
> >>> v4si a;
> >>> v4si b;
> >>>
> >>> void foo()
> >>> {
> >>> auto r = a < b;
> >>> }
> >>>
> >>> produces, with C23:
> >>>
> >>> vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0,
> >>> 0, 0, 0 } > ;
> >>>
> >>> In fact on x86_64 with AVX and AVX512 you have two different "machine
> >>> produced" mask types and the above could either produce a AVX mask with
> >>> 32bit elements or a AVX512 mask with 1bit elements.
> >>>
> >>> Not exposing "native" mask types requires the compiler optimizing 
> >>> subsequent
> >>> uses and makes generic vectors difficult to combine with for example 
> >>> AVX512
> >>> intrinsics (where masks are just 'int').  Across an ABI boundary it's also
> >>> even more difficult to optimize mask transitions.
> >>>
> >>> But it at least allows portable code and it does not suffer from users 
> >>> trying to
> >>> expose machine representations of masks as input to generic vector code
> >>> with all the problems of constant folding not only requiring 
> >>> self-consistent
> >>> code within the compiler but compatibility with user produced constant 
> >>> masks.
> >>>
> >>> That said, I somewhat question the need to expose the target mask layout
> >>> to users for GCCs generic vector extension.
> >>>
> >>
> >> Thanks for your feedback.
> >>
> >> IIUC, I can imagine how having a GNU vector extension exposing the
> >> target vector mask layout can pose a challenge - maybe making it a
> >> generic GNU vector extension was too ambitious. I wonder if there's
> >> value in pursuing these alternate paths?
> >>
> >> 1. Can implementing this extension in 

PR115394: Remove streamer_debugging and its uses

2024-07-10 Thread Prathamesh Kulkarni
Hi Richard,
As per your suggestion in the PR, the attached patch removes streamer_debugging
and its uses.
Bootstrapped on aarch64-linux-gnu.
OK to commit ?

Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh
[PR115394] Remove streamer_debugging and its uses.

gcc/ChangeLog:
PR lto/115394
* lto-streamer.h: Remove streamer_debugging definition.
* lto-streamer-out.cc (stream_write_tree_ref): Remove use of 
streamer_debugging.
(lto_output_tree): Likewise.
* tree-streamer-in.cc (streamer_read_tree_bitfields): Likewise.
(streamer_get_pickled_tree): Likewise.
* tree-streamer-out.cc (pack_ts_base_value_fields): Likewise.

Signed-off-by: Prathamesh Kulkarni 

diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index d4f728094ed..8b4bf9659cb 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -487,8 +487,6 @@ stream_write_tree_ref (struct output_block *ob, tree t)
gcc_checking_assert (tag == LTO_global_stream_ref);
  streamer_write_hwi (ob, -(int)(ix * 2 + id + 1));
}
-  if (streamer_debugging)
-   streamer_write_uhwi (ob, TREE_CODE (t));
 }
 }
 
@@ -1839,9 +1837,6 @@ lto_output_tree (struct output_block *ob, tree expr,
 will instantiate two different nodes for the same object.  */
   streamer_write_record_start (ob, LTO_tree_pickle_reference);
   streamer_write_uhwi (ob, ix);
-  if (streamer_debugging)
-   streamer_write_enum (ob->main_stream, LTO_tags, LTO_NUM_TAGS,
-lto_tree_code_to_tag (TREE_CODE (expr)));
   lto_stats.num_pickle_refs_output++;
 }
   else
@@ -1882,9 +1877,6 @@ lto_output_tree (struct output_block *ob, tree expr,
}
  streamer_write_record_start (ob, LTO_tree_pickle_reference);
  streamer_write_uhwi (ob, ix);
- if (streamer_debugging)
-   streamer_write_enum (ob->main_stream, LTO_tags, LTO_NUM_TAGS,
-lto_tree_code_to_tag (TREE_CODE (expr)));
}
   in_dfs_walk = false;
   lto_stats.num_pickle_refs_output++;
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index e8dbba471ed..79c44d2cae7 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -126,10 +126,6 @@ along with GCC; see the file COPYING3.  If not see
 
 typedef unsigned char  lto_decl_flags_t;
 
-/* Stream additional data to LTO object files to make it easier to debug
-   streaming code.  This changes object files.  */
-static const bool streamer_debugging = false;
-
 /* Tags representing the various IL objects written to the bytecode file
(GIMPLE statements, basic blocks, EH regions, tree nodes, etc).
 
diff --git a/gcc/tree-streamer-in.cc b/gcc/tree-streamer-in.cc
index 35341a2b2b6..c248a74f7a1 100644
--- a/gcc/tree-streamer-in.cc
+++ b/gcc/tree-streamer-in.cc
@@ -485,15 +485,6 @@ streamer_read_tree_bitfields (class lto_input_block *ib,
 
   /* Read the bitpack of non-pointer values from IB.  */
   bp = streamer_read_bitpack (ib);
-
-  /* The first word in BP contains the code of the tree that we
- are about to read.  */
-  if (streamer_debugging)
-{
-  code = (enum tree_code) bp_unpack_value (&bp, 16);
-  lto_tag_check (lto_tree_code_to_tag (code),
-lto_tree_code_to_tag (TREE_CODE (expr)));
-}
   code = TREE_CODE (expr);
 
   /* Note that all these functions are highly sensitive to changes in
@@ -1110,17 +1101,8 @@ streamer_get_pickled_tree (class lto_input_block *ib, 
class data_in *data_in)
 {
   unsigned HOST_WIDE_INT ix;
   tree result;
-  enum LTO_tags expected_tag;
 
   ix = streamer_read_uhwi (ib);
   result = streamer_tree_cache_get_tree (data_in->reader_cache, ix);
-
-  if (streamer_debugging)
-{
-  expected_tag = streamer_read_enum (ib, LTO_tags, LTO_NUM_TAGS);
-  gcc_assert (result
- && TREE_CODE (result) == lto_tag_to_tree_code (expected_tag));
-}
-
   return result;
 }
diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-out.cc
index c30ab62a585..b7205287ffb 100644
--- a/gcc/tree-streamer-out.cc
+++ b/gcc/tree-streamer-out.cc
@@ -71,8 +71,6 @@ write_identifier (struct output_block *ob,
 static inline void
 pack_ts_base_value_fields (struct bitpack_d *bp, tree expr)
 {
-  if (streamer_debugging)
-bp_pack_value (bp, TREE_CODE (expr), 16);
   if (!TYPE_P (expr))
 {
   bp_pack_value (bp, TREE_SIDE_EFFECTS (expr), 1);


[Fortran, Patch, PR78466, coarray, v1] Fix Explicit cobounds of a procedures parameter not respected

2024-07-10 Thread Andre Vehreschild
Hi all,

the attached patch fixes explicit cobounds of procedure parameters not
being respected.  The central issue is that class (array) types store their
attributes and `as` in the first component of the derived type.  This made
comparing existing types harder, and gfortran confused the generated trees
for different cobounds.  The attached patch fixes this.

Note, the patch is based
on https://gcc.gnu.org/pipermail/fortran/2024-July/060645.html . Without it the
test poly_run_2 fails.

Regtests ok on x86_64-pc-linux-gnu/Fedora 39. Ok for mainline?

This patch also fixes PR fortran/80774.

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de
From 32d8a8da4e1e6120c515932878994514e04c909d Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Thu, 31 Dec 2020 10:40:30 +0100
Subject: [PATCH] Fortran: Fix Explicit cobounds of a procedures parameter not
 respected [PR78466]

Explicit cobounds of class array procedure parameters were not taken
into account.  Furthermore, different cobounds in distinct
procedure parameter lists were mixed up, i.e. the last definition was
taken for all.  The bounds are now regenerated when the tree's and the
expr's bounds do not match.

	PR fortran/78466
	PR fortran/80774

gcc/fortran/ChangeLog:

	* array.cc (gfc_compare_array_spec): Take cotype into account.
	* class.cc (gfc_build_class_symbol): Coarrays are also arrays.
	* gfortran.h (IS_CLASS_COARRAY_OR_ARRAY): New macro to detect
	regular and coarray class arrays.
	* interface.cc (compare_components): Take codimension into
	account.
	* resolve.cc (resolve_symbol): Improve error message.
	* simplify.cc (simplify_bound_dim): Remove duplicate.
	* trans-array.cc (gfc_trans_array_cobounds): Coarrays are also
	arrays.
	(gfc_trans_array_bounds): Same.
	(gfc_trans_dummy_array_bias): Same.
	(get_coarray_as): Get the as having a non-zero codim.
	(is_explicit_coarray): Detect explicit coarrays.
	(gfc_conv_expr_descriptor): Create a new descriptor for explicit
	coarrays.
	* trans-decl.cc (gfc_build_qualified_array): Coarrays are also
	arrays.
	(gfc_build_dummy_array_decl): Same.
	(gfc_get_symbol_decl): Same.
	(gfc_trans_deferred_vars): Same.
	* trans-expr.cc (class_scalar_coarray_to_class): Get the
	descriptor from the correct location.
	(gfc_conv_variable): Pick up the descriptor when needed.
	* trans-types.cc (gfc_is_nodesc_array): Coarrays are also
	arrays.
	(gfc_get_nodesc_array_type): Indentation fix only.
	(cobounds_match_decl): Match a tree's bounds to the expr's
	bounds and return true, when they match.
	(gfc_get_derived_type): Create a new type tree/descriptor, when
	the cobounds of the existing declaration and expr do not
	match.  This happens for class arrays in parameter lists, when
	there are different cobound declarations.

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray/poly_run_1.f90: Activate old test code.
	* gfortran.dg/coarray/poly_run_2.f90: Activate test.  It was
	stopping before and passing without an error.
---
 gcc/fortran/array.cc  |  3 +
 gcc/fortran/class.cc  |  8 +-
 gcc/fortran/gfortran.h|  5 ++
 gcc/fortran/interface.cc  |  7 ++
 gcc/fortran/resolve.cc|  3 +-
 gcc/fortran/simplify.cc   |  2 -
 gcc/fortran/trans-array.cc| 53 -
 gcc/fortran/trans-decl.cc | 20 ++---
 gcc/fortran/trans-expr.cc | 34 ++---
 gcc/fortran/trans-types.cc| 74 ---
 .../gfortran.dg/coarray/poly_run_1.f90| 33 -
 .../gfortran.dg/coarray/poly_run_2.f90| 28 ---
 12 files changed, 207 insertions(+), 63 deletions(-)

diff --git a/gcc/fortran/array.cc b/gcc/fortran/array.cc
index e9934f1491b..79c774d59a0 100644
--- a/gcc/fortran/array.cc
+++ b/gcc/fortran/array.cc
@@ -1017,6 +1017,9 @@ gfc_compare_array_spec (gfc_array_spec *as1, gfc_array_spec *as2)
   if (as1->type != as2->type)
 return 0;

+  if (as1->cotype != as2->cotype)
+return 0;
+
   if (as1->type == AS_EXPLICIT)
 for (i = 0; i < as1->rank + as1->corank; i++)
   {
diff --git a/gcc/fortran/class.cc b/gcc/fortran/class.cc
index abe89630be3..b9dcc0a3d98 100644
--- a/gcc/fortran/class.cc
+++ b/gcc/fortran/class.cc
@@ -709,8 +709,12 @@ gfc_build_class_symbol (gfc_typespec *ts, symbol_attribute *attr,
  work on the declared type. All array type other than deferred shape or
  assumed rank are added to the function namespace to ensure that they
  are properly distinguished.  */
-  if (attr->dummy && !attr->codimension && (*as)
-  && !((*as)->type == AS_DEFERRED || (*as)->type == AS_ASSUMED_RANK))
+  if (attr->dummy && (*as)
+  && ((!attr->codimension
+	   && !((*as)->type == AS_DEFERRED || (*as)->type == AS_ASSUMED_RANK))
+	  || (attr->codimension
+	  && !((*as)->cotype == AS_DEFERRED
+		   || (*as)->cotype == AS_ASSUMED_RANK
 {
   char *sname;
   ns = gfc_current

Re: PR115394: Remove streamer_debugging and its uses

2024-07-10 Thread Richard Biener
On Wed, 10 Jul 2024, Prathamesh Kulkarni wrote:

> Hi Richard,
> As per your suggestion in the PR, the attached patch removes streamer_debugging
> and its uses.
> Bootstrapped on aarch64-linux-gnu.
> OK to commit ?

OK.

Thanks,
Richard.

> Signed-off-by: Prathamesh Kulkarni 
> 
> Thanks,
> Prathamesh
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH v1] Match: Support form 2 for the .SAT_TRUNC

2024-07-10 Thread Richard Biener
On Fri, Jul 5, 2024 at 2:48 PM  wrote:
>
> From: Pan Li 
>
> This patch would like to add form 2 support for the .SAT_TRUNC.  Aka:
>
> Form 2:
>   #define DEF_SAT_U_TRUC_FMT_2(NT, WT) \
>   NT __attribute__((noinline)) \
>   sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \
>   {\
> bool overflow = x > (WT)(NT)(-1);  \
> return overflow ? (NT)-1 : (NT)x;  \
>   }
>
> DEF_SAT_U_TRUC_FMT_2(uint32, uint64)
>
> Before this patch:
>3   │
>4   │ __attribute__((noinline))
>5   │ uint32_t sat_u_truc_uint64_t_to_uint32_t_fmt_2 (uint64_t x)
>6   │ {
>7   │   uint32_t _1;
>8   │   long unsigned int _3;
>9   │
>   10   │ ;;   basic block 2, loop depth 0
>   11   │ ;;pred:   ENTRY
>   12   │   _3 = MIN_EXPR ;
>   13   │   _1 = (uint32_t) _3;
>   14   │   return _1;
>   15   │ ;;succ:   EXIT
>   16   │
>   17   │ }
>
> After this patch:
>3   │
>4   │ __attribute__((noinline))
>5   │ uint32_t sat_u_truc_uint64_t_to_uint32_t_fmt_2 (uint64_t x)
>6   │ {
>7   │   uint32_t _1;
>8   │
>9   │ ;;   basic block 2, loop depth 0
>   10   │ ;;pred:   ENTRY
>   11   │   _1 = .SAT_TRUNC (x_2(D)); [tail call]
>   12   │   return _1;
>   13   │ ;;succ:   EXIT
>   14   │
>   15   │ }
>
> The below test suites are passed for this patch:
> 1. The x86 bootstrap test.
> 2. The x86 fully regression test.
> 3. The rv64gcv fully regression test.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * match.pd: Add form 2 for .SAT_TRUNC.
> * tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children):
> Add new case NOP_EXPR,  and try to match SAT_TRUNC.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  | 17 -
>  gcc/tree-ssa-math-opts.cc |  4 
>  2 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 4edfa2ae2c9..3759c64d461 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3234,7 +3234,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1
>
> -/* Unsigned saturation truncate, case 1 (), sizeof (WT) > sizeof (NT).
> +/* Unsigned saturation truncate, case 1, sizeof (WT) > sizeof (NT).
> SAT_U_TRUNC = (NT)x | (NT)(-(X > (WT)(NT)(-1))).  */
>  (match (unsigned_integer_sat_trunc @0)
>   (bit_ior:c (negate (convert (gt @0 INTEGER_CST@1)))
> @@ -3250,6 +3250,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>}
>(if (otype_precision < itype_precision && wi::eq_p (trunc_max, 
> int_cst))
>
> +/* Unsigned saturation truncate, case 2, sizeof (WT) > sizeof (NT).
> +   SAT_U_TRUNC = (NT)(MIN_EXPR (X, 255)).  */
> +(match (unsigned_integer_sat_trunc @0)
> + (convert (min @0 INTEGER_CST@1))
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && TYPE_UNSIGNED (TREE_TYPE (@0)))
> + (with
> +  {
> +   unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
> +   unsigned otype_precision = TYPE_PRECISION (type);
> +   wide_int trunc_max = wi::mask (otype_precision, false, itype_precision);
> +   wide_int int_cst = wi::to_wide (@1, itype_precision);
> +  }
> +  (if (otype_precision < itype_precision && wi::eq_p (trunc_max, 
> int_cst))
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index a35caf5f058..ac86be8eb94 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -6170,6 +6170,10 @@ math_opts_dom_walker::after_dom_children (basic_block 
> bb)
>   match_unsigned_saturation_sub (&gsi, as_a (stmt));
>   break;
>
> +   case NOP_EXPR:
> + match_unsigned_saturation_trunc (&gsi, as_a (stmt));
> + break;
> +
> default:;
> }
> }
> --
> 2.34.1
>


[PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call

2024-07-10 Thread pan2 . li
From: Pan Li 

The .SAT_ADD has two operands, and one of them may be an INTEGER_CST.
For example, _1 = .SAT_ADD (_2, 9) comes from the sample code below.

Form 3:
  #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM)  \
  T __attribute__((noinline))  \
  vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
  {\
unsigned i;\
T ret; \
for (i = 0; i < limit; i++)\
  {\
out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \
  }\
  }

DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint64_t, 9)

It will fail to vectorize because vectorizable_call checks that the
operands are type-compatible, but the imm is treated as unsigned
SImode from the perspective of the tree.  Aka

uint64_t _1;
uint64_t _2;

_1 = .SAT_ADD (_2, 9);

The _1 and _2 are unsigned DImode, which differs from the imm 9 of
unsigned SImode, and so vectorizable_call fails.  This patch promotes
the imm operand to the operand type mode of _2 if and only if there is
no precision/data loss.  Aka convert the imm 9 to DImode for the above
example.

The below test suites are passed for this patch:
1. The rv64gcv fully regression tests.
2. The rv64gcv build with glibc.
3. The x86 bootstrap tests.
4. The x86 fully regression tests.

gcc/ChangeLog:

* tree-vect-patterns.cc (vect_recog_promote_cst_to_unsigned): Add
new function to promote the imm tree to the target type.
(vect_recog_sat_add_pattern): Perform the type promotion before
generating the .SAT_ADD call.

Signed-off-by: Pan Li 
---
 gcc/tree-vect-patterns.cc | 17 +
 1 file changed, 17 insertions(+)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 86e893a1c43..e1013222b12 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -4527,6 +4527,20 @@ vect_recog_build_binary_gimple_stmt (vec_info *vinfo, 
stmt_vec_info stmt_info,
   return NULL;
 }
 
+static void
+vect_recog_promote_cst_to_unsigned (tree *op, tree type)
+{
+  if (TREE_CODE (*op) != INTEGER_CST || !TYPE_UNSIGNED (type))
+return;
+
+  unsigned precision = TYPE_PRECISION (type);
+  wide_int type_max = wi::mask (precision, false, precision);
+  wide_int op_cst_val = wi::to_wide (*op, precision);
+
+  if (wi::leu_p (op_cst_val, type_max))
+*op = wide_int_to_tree (type, op_cst_val);
+}
+
 /*
  * Try to detect saturation add pattern (SAT_ADD), aka below gimple:
  *   _7 = _4 + _6;
@@ -4553,6 +4567,9 @@ vect_recog_sat_add_pattern (vec_info *vinfo, 
stmt_vec_info stmt_vinfo,
 
   if (gimple_unsigned_integer_sat_add (lhs, ops, NULL))
 {
+  vect_recog_promote_cst_to_unsigned (&ops[0], TREE_TYPE (ops[1]));
+  vect_recog_promote_cst_to_unsigned (&ops[1], TREE_TYPE (ops[0]));
+
   gimple *stmt = vect_recog_build_binary_gimple_stmt (vinfo, stmt_vinfo,
  IFN_SAT_ADD, type_out,
  lhs, ops[0], ops[1]);
-- 
2.34.1



Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-10 Thread Tejas Belagod

On 7/10/24 2:38 PM, Richard Biener wrote:

On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod  wrote:


On 7/9/24 4:22 PM, Richard Biener wrote:

On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod  wrote:


On 7/8/24 4:45 PM, Richard Biener wrote:

On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod  wrote:


Hi,

Sorry to have dropped the ball on
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
here I've tried to pick it up again and write up a strawman proposal for
elevating __attribute__((vector_mask)) to the FE from GIMPLE.


Thanks,
Tejas.

Motivation
--

The idea of packed boolean vectors came about when we wanted to support
C/C++ operators on SVE ACLE types. The current vector boolean type that
ACLE specifies does not adequately disambiguate the vector lane sizes
from which it was derived. Consider this simple, albeit unrealistic, example:

  bool foo (svint32_t a, svint32_t b)
  {
svbool_t p = a > b;

// Here p[2] is not the same as a[2] > b[2].
return p[2];
  }

In the above example, because svbool_t has a fixed 1-lane-per-byte, p[i]
does not return the bool value corresponding to a[i] > b[i]. This
necessitates a 'typed' vector boolean value that unambiguously
represents results of operations
of the same type.

__attribute__((vector_mask))
-

Note: If interested in historical discussions refer to:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html

We define this new attribute which when applied to a base data vector
produces a new boolean vector type that represents a boolean type that
is produced as a result of operations on the corresponding base vector
type. The following is the syntax.

  typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
  typedef v8si v8sib __attribute__((vector_mask));

Here the 'base' data vector type is v8si or a vector of 8 integers.

Rules

• The layout/size of the boolean vector type is implementation-defined
for its base data vector type.

• Two boolean vector types whose base data vector types have the same
number of elements and lane width have the same layout and size.

• Consequently, two boolean vectors whose base data vector types have a
different number of elements or a different lane size have different layouts.

This aligns with gnu vector extensions that generate integer vectors as
a result of comparisons - "The result of the comparison is a vector of
the same width and number of elements as the comparison operands with a
signed integral element type." according to
   https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.


Without having the time to re-review this all in detail I think the GNU
vector extension does not expose the result of the comparison as the
machine would produce it but instead a comparison "decays" to
a conditional:

typedef int v4si __attribute__((vector_size(16)));

v4si a;
v4si b;

void foo()
{
 auto r = a < b;
}

produces, with C23:

 vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0,
0, 0, 0 } > ;

In fact on x86_64 with AVX and AVX512 you have two different "machine
produced" mask types and the above could either produce a AVX mask with
32bit elements or a AVX512 mask with 1bit elements.

Not exposing "native" mask types requires the compiler optimizing subsequent
uses and makes generic vectors difficult to combine with for example AVX512
intrinsics (where masks are just 'int').  Across an ABI boundary it's also
even more difficult to optimize mask transitions.

But it at least allows portable code and it does not suffer from users trying to
expose machine representations of masks as input to generic vector code
with all the problems of constant folding not only requiring self-consistent
code within the compiler but compatibility with user produced constant masks.

That said, I somewhat question the need to expose the target mask layout
to users for GCCs generic vector extension.



Thanks for your feedback.

IIUC, I can imagine how having a GNU vector extension exposing the
target vector mask layout can pose a challenge - maybe making it a
generic GNU vector extension was too ambitious. I wonder if there's
value in pursuing these alternate paths?

1. Can implementing this extension in a 'generic' way i.e. possibly not
implement it with a target mask, but just a generic int vector, still
maintain the consistency of GNU predicate vectors within the compiler? I
know it may not seem very different from how boolean vectors are
currently implemented (as in your above example), but, having the
__attribute__((vector_mask)) as a 'property' of the object makes it
useful to optimize its uses to target predicates in subsequent stages of
the compiler.

2. Restricting __attribute__((vector_mask)) to apply only to target
intrinsic types? Eg.

On SVE something like:
typedef svint16_t svpred16_t __attribute__((vector_mask)); // OK.

On AVX, something like:
typedef __m256i __mask32 __attribute__((vector_mask));

Re: [PATCH] gimple ssa: Teach switch conversion to optimize powers of 2 switches

2024-07-10 Thread Richard Biener
On Mon, 8 Jul 2024, Filip Kastl wrote:

> Hi,
> 
> I'm replying to Richard and keeping Andrew in cc since your suggestions
> overlap.
> 
> 
> On Tue 2024-06-11 14:48:06, Richard Biener wrote:
> > On Thu, 30 May 2024, Filip Kastl wrote:
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-tree-switchconv -march=znver3" } */
> > 
> > I think it's better to enable -mpopcnt and -mbmi (or what remains
> > as minimal requirement).
> 
> Will do.  Currently the testcases are in the i386 directory.  After I exchange
> the -march for -mpopcnt -mbmi can I put these testcases into gcc.dg/tree-ssa?
> Will the -mpopcnt -mbmi options work with all target architectures?

No, those are i386 specific flags.  At least for popcount there's
dejagnu effective targets popcount, popcountl and popcountll so you
could do

/* { dg-additional-options "-mpopcnt" { target { x86_64-*-* i?86-*-* } } } */

and guard the tree dump scan with { target popcount } to cover other
archs that have popcount (without adding extra flags).

> > > +/* Check that the "exponential index transform" can be applied to this 
> > > switch.
> > > +
> > > +   See comment of the exp_index_transform function for details about this
> > > +   transformation.
> > > +
> > > +   We want:
> > > +   - This form of the switch is more efficient
> > > +   - Cases are powers of 2
> > > +
> > > +   Expects that SWTCH has at least one case.  */
> > > +
> > > +bool
> > > +switch_conversion::is_exp_index_transform_viable (gswitch *swtch)
> > > +{
> > > +  tree index = gimple_switch_index (swtch);
> > > +  tree index_type = TREE_TYPE (index);
> > > +  basic_block swtch_bb = gimple_bb (swtch);
> > > +  unsigned num_labels = gimple_switch_num_labels (swtch);
> > > +
> > > +  /* Check that we can efficiently compute logarithm of 2^k (using FFS) 
> > > and
> > > + test that a given number is 2^k for some k (using POPCOUNT).  */
> > > +  optimization_type opt_type = bb_optimization_type (swtch_bb);
> > > +  if (!direct_internal_fn_supported_p (IFN_FFS, index_type, opt_type)
> > > +|| !direct_internal_fn_supported_p (IFN_POPCOUNT, index_type, 
> > > opt_type))
> > > +return false;
> > > +
> > 
> > See above, I think this can be improved.  Would be nice to split out
> > a can_pow2p (type) and can_log2 (type) and a corresponding
> > gen_pow2p (op) and gen_log2 (op) function so this could be re-used
> > and alternate variants added when required.
> > 
> 
> Just to check that I understand this correctly:  You'd like me to create
> functions can_pow2p, can_log2.  Those functions will check that there are
> optabs for the target machine which allow us to efficiently test
> power-of-2-ness of a number and which allow us to efficiently compute the
> base-2 log of a power-of-2 number.  You'd also like me to create functions
> gen_pow2p and gen_log2 which generate this code.  For now these functions will
> just use POPCOUNT and FFS but they can be later extended to also consider
> different instructions.  Is that right?

Right.

> Into which file should I put these functions?

Just in this file for now.
 
> Is can_pow2p and gen_pow2p necessary?  As you noted one can always use
> (x & -x) == x so testing pow2p can always be done efficiently.

If you add this fallback then can_pow2p / gen_pow2p wouldn't be
necessary indeed.

> > > +  /* Insert a statement that takes the logarithm of the index variable.  
> > > */
> > > +  tree tmp2 = make_ssa_name (index_type);
> > > +  gsi = gsi_start_bb (swtch_bb);
> > 
> > Please use gsi_after_labels (swtch_bb) even though you know there's no
> > labels there.
> > 
> > > +  gcall *stmt_ffs = gimple_build_call_internal (IFN_FFS, 1, index);
> > > +  gimple_call_set_lhs (stmt_ffs, tmp2);
> > > +  gsi_insert_before (&gsi, stmt_ffs, GSI_SAME_STMT);
> > > +
> > > +  tree tmp3 = make_ssa_name (index_type);
> > > +  gassign *stmt_minus_one = gimple_build_assign (tmp3, MINUS_EXPR, tmp2, 
> > > one);
> > > +  gsi_insert_before (&gsi, stmt_minus_one, GSI_SAME_STMT);
> > 
> > You could also use
> > 
> >  tree tmp2 = gimple_build (gsi, true, GSI_SAME_STMT,
> >UNKNOWN_LOCATION, IFN_FFS, index);
> >  tree tmp3 = gimple_build (gsi, true, GSI_SAME_STMT,
> >UNKNOWN_LOCATION, MINUS_EXPR, tmp2, one);
> > 
> > which does the stmt building, temporary SSA name creation and insertion
> > plus eventually folding things with a definition.  There isn't a
> > gimple_build_cond with this API, but it would still work for the
> > popcount build above.
> 
> I've tried using the gimple_build API and it is indeed nicer.  However, I
> wasn't able to get it working with the FFS internal function.  IFN_FFS is of a
> different type than what gimple_build accepts.  I've tried this
> 
>   tree tmp2 = gimple_build (&gsi, true, GSI_SAME_STMT, UNKNOWN_LOCATION,
> as_combined_fn (IFN_FFS), index);
>
> but that only caused an ICE.  I tried looking for an example of using
> gimple_bui

Re: [PATCH] c++, contracts: Fix ICE in create_tmp_var [PR113968]

2024-07-10 Thread Nina Dinka Ranns
On Tue, 9 Jul 2024 at 22:50, Jason Merrill  wrote:

> On 7/9/24 6:41 AM, Nina Dinka Ranns wrote:
> > On Mon, 8 Jul 2024 at 16:01, Jason Merrill  > > wrote:
> >
> > On 7/8/24 7:47 AM, Nina Dinka Ranns wrote:
> >  > HI Jason,
> >  >
> >  > On Fri, 5 Jul 2024 at 17:31, Jason Merrill  > > wrote:
> >  >>
> >  >> On 7/5/24 10:25 AM, Nina Dinka Ranns wrote:
> >  >>> Certain places in contract parsing currently do not check for
> > errors.
> >  >>> This results in contracts
> >  >>> with embedded errors which eventually confuse gimplify. Checks
> for
> >  >>> errors added in
> >  >>> grok_contract() and cp_parser_contract_attribute_spec() to exit
> > early
> >  >>> if an error is encountered.
> >  >>
> >  >> Thanks for the patch!
> >  >>
> >  >>> Tested on x86_64-pc-linux-gnu
> >  >>> ---
> >  >>>
> >  >>>   PR c++/113968
> >  >>>
> >  >>> gcc/cp/ChangeLog:
> >  >>>
> >  >>>   * contracts.cc (grok_contract): Check for
> > error_mark_node early
> >  >>> exit
> >  >>
> >  >> These hunks are OK.
> >  >>
> >  >>>   * parser.cc (cp_parser_contract_attribute_spec):
> > Check for
> >  >>> error_mark_node early exit
> >  >>
> >  >> This seems redundant, since finish_contract_attribute already
> > checks for
> >  >> error_mark_node and we're returning its result unchanged.
> >  >
> >  > good catch, removed.
> >  >
> >  >>
> >  >> Also, the convention is for wrapped lines in ChangeLog entries
> > to line
> >  >> up with the *, and to finish sentences with a period.
> >  >
> >  > done.
> >  >
> >  > Tests re-run on x86_64-pc-linux-gnu , no change.
> >
> > This looks good, but the patch doesn't apply due to word wrap.  To
> > avoid
> > that, I tend to use git send-email; sending the patch as an
> attachment
> > is also OK.  Or see
> >
> > https://www.kernel.org/doc/html/latest/process/email-clients.html
> > 
> >
> > for tips on getting various email clients to leave patches alone.
> >
> >
> > ack, thank you for your patience.
> > This time, patch attached to the email.
>
> It looks like the attached patch reverted to older ChangeLog entries,
> without the periods, and with the dropped parser.cc change?
>
> git gcc-verify also complains
>
> > ERR: line should start with a tab: "* contracts.cc
> (grok_contract): Check for error_mark_node early"
> > ERR: line should start with a tab: "  exit"
> > ERR: line should start with a tab: "* parser.cc
> (cp_parser_contract_attribute_spec): Check for"
> > ERR: line should start with a tab: "  error_mark_node early exit"
> > ERR: line should start with a tab: "*
> g++.dg/contracts/pr113968.C: New test."
> > ERR: PR 113968 in subject but not in changelog: "c++, contracts: Fix ICE
> in create_tmp_var [PR113968]"
>
> Jason
>
>
Apologies. I must have copy pasted something wrong. I've setup gcc-verify
and that passes.
Let's try again. Patch attached.

Thank you,
Nina
From f529ffd1c3abb2229ff715e5de123a0bfed9ff2a Mon Sep 17 00:00:00 2001
From: Nina Ranns 
Date: Thu, 4 Jul 2024 17:08:58 +0100
Subject: [PATCH] c++, contracts: Fix ICE in create_tmp_var [PR113968]

During contract parsing, in grok_contract(), we proceed even if the
condition contains errors. This results in contracts with embedded errors
which eventually confuse gimplify. Checks for errors have been added in
grok_contract() to exit early if an error is encountered.

	PR c++/113968

gcc/cp/ChangeLog:

	* contracts.cc (grok_contract): Check for error_mark_node early
	exit.

gcc/testsuite/ChangeLog:

	* g++.dg/contracts/pr113968.C: New test.

Signed-off-by: Nina Ranns 
---
 gcc/cp/contracts.cc   |  7 ++
 gcc/testsuite/g++.dg/contracts/pr113968.C | 29 +++
 2 files changed, 36 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/contracts/pr113968.C

diff --git a/gcc/cp/contracts.cc b/gcc/cp/contracts.cc
index 634e3cf4fa9..a7d0fdacf6e 100644
--- a/gcc/cp/contracts.cc
+++ b/gcc/cp/contracts.cc
@@ -750,6 +750,9 @@ tree
 grok_contract (tree attribute, tree mode, tree result, cp_expr condition,
 	   location_t loc)
 {
+  if (condition == error_mark_node)
+return error_mark_node;
+
   tree_code code;
   if (is_attribute_p ("assert", attribute))
 code = ASSERTION_STMT;
@@ -785,6 +788,10 @@ grok_contract (tree attribute, tree mode, tree result, cp_expr condition,
 
   /* The condition is converted to bool.  */
   condition = finish_contract_condition (condition);
+
+  if (condition == error_mark_node)
+return error_mark_node;
+
   CONTRACT_CONDITION (contract) = condition;
 
   return contract;
diff --git a/gcc/testsuite/g++.dg/contracts/pr113968.C b/gcc/testsui

[PATCH v2] Fix Xcode 16 build break with NULL != nullptr

2024-07-10 Thread Daniel Bertalan
As of Xcode 16 beta 2 with the macOS 15 SDK, each re-inclusion of the
stddef.h header causes the NULL macro in C++ to be re-defined to an
integral constant (__null). This makes the workaround in d59a576b8
("Redefine NULL to nullptr") ineffective, as other headers that are
typically included after system.h (such as obstack.h) do include
stddef.h too.

This can be seen by running the sample below through `clang++ -E`

#include 
#define NULL nullptr
#include 
NULL

The relevant libc++ change is here:
https://github.com/llvm/llvm-project/commit/2950283dddab03c183c1be2d7de9d4999cc86131

Filed as FB14261859 to Apple and added a comment about it on LLVM PR
86843.

This fixes the cases in --enable-languages=c,c++,objc,obj-c++,rust build
where NULL being an integral constant instead of a null pointer literal
(therefore no longer implicitly converting to a pointer when used as a
template function's argument) caused issues.

gcc/value-pointer-equiv.cc:65:43: error: no viable conversion from 
`pair::type, typename 
__unwrap_ref_decay::type>' to 'const pair'

65 |   const std::pair  m_marker = std::make_pair (NULL, NULL);
   |   ^~~

As noted in the previous commit though, the proper solution would be to
phase out the usages of NULL in GCC's C++ source code.

gcc/analyzer/ChangeLog:

* diagnostic-manager.cc (saved_diagnostic::saved_diagnostic):
Change NULL to nullptr.
(struct null_assignment_sm_context): Likewise.
* infinite-loop.cc: Likewise.
* infinite-recursion.cc: Likewise.
* varargs.cc (va_list_state_machine::on_leak): Likewise.

gcc/rust/ChangeLog:

* metadata/rust-imports.cc (Import::try_package_in_directory):
Change NULL to nullptr.

gcc/ChangeLog:

* value-pointer-equiv.cc: Change NULL to nullptr.
---
 gcc/analyzer/diagnostic-manager.cc | 18 +-
 gcc/analyzer/infinite-loop.cc  |  2 +-
 gcc/analyzer/infinite-recursion.cc |  2 +-
 gcc/analyzer/varargs.cc|  2 +-
 gcc/rust/metadata/rust-imports.cc  |  2 +-
 gcc/value-pointer-equiv.cc |  2 +-
 6 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/gcc/analyzer/diagnostic-manager.cc 
b/gcc/analyzer/diagnostic-manager.cc
index fe943ac61c9e..51304b0795b6 100644
--- a/gcc/analyzer/diagnostic-manager.cc
+++ b/gcc/analyzer/diagnostic-manager.cc
@@ -679,12 +679,12 @@ saved_diagnostic::saved_diagnostic (const state_machine 
*sm,
   m_stmt (ploc.m_stmt),
   /* stmt_finder could be on-stack; we want our own copy that can
  outlive that.  */
-  m_stmt_finder (ploc.m_finder ? ploc.m_finder->clone () : NULL),
+  m_stmt_finder (ploc.m_finder ? ploc.m_finder->clone () : nullptr),
   m_loc (ploc.m_loc),
   m_var (var), m_sval (sval), m_state (state),
-  m_d (std::move (d)), m_trailing_eedge (NULL),
+  m_d (std::move (d)), m_trailing_eedge (nullptr),
   m_idx (idx),
-  m_best_epath (NULL), m_problem (NULL),
+  m_best_epath (nullptr), m_problem (nullptr),
   m_notes ()
 {
   /* We must have an enode in order to be able to look for paths
@@ -1800,10 +1800,10 @@ public:
stmt,
stack_depth,
sm,
-   NULL,
+   nullptr,
src_sm_val,
dst_sm_val,
-   NULL,
+   nullptr,
dst_state,
src_node));
 return false;
@@ -1993,9 +1993,9 @@ struct null_assignment_sm_context : public sm_context
m_sm,
var_new_sval,
from, to,
-   NULL,
+   nullptr,
*m_new_state,
-   NULL));
+   nullptr));
   }
 
   void set_next_state (const gimple *stmt,
@@ -2019,9 +2019,9 @@ struct null_assignment_sm_context : public sm_context
m_sm,
sval,
from, to,
-   NULL,
+   nullptr,
*m_new_state,
-   NULL));
+   nullptr));
   }
 
   void warn (const supernode *, const gimple *,
diff --git a/gcc/analyzer/infinite-loop.cc b/gcc/analyzer/infinite-loop.cc
index 8ba8e70acffc..6ac0a5b373d8 100644
--- a/gcc/analyzer/infinite-loop.cc
+++ b/gcc/analyzer/infinite-loop.cc
@@ -240,7 +240,7 @@ public:

[PATCH] RISC-V: c implies zca, and conditionally zcf & zcd

2024-07-10 Thread Fei Gao
According to Zc-1.0.4-3.pdf from
https://github.com/riscvarchive/riscv-code-size-reduction/releases/tag/v1.0.4-3
The rule is that:
- C always implies Zca
- C+F implies Zcf (RV32 only)
- C+D implies Zcd

Signed-off-by: Fei Gao 
gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: C implies zca, and
conditionally zcf and zcd.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute-15.c: Adapt test case.
* gcc.target/riscv/attribute-16.c: Likewise.
* gcc.target/riscv/attribute-17.c: Likewise.
* gcc.target/riscv/attribute-18.c: Likewise.
* gcc.target/riscv/pr110696.c: Likewise.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-zcmp.c: Likewise.
* gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c: Likewise.
* gcc.target/riscv/rvv/base/pr114352-1.c: Likewise.
* gcc.target/riscv/rvv/base/pr114352-3.c: Likewise.
* gcc.target/riscv/arch-39.c: New test.
* gcc.target/riscv/arch-40.c: New test.

---
 gcc/common/config/riscv/riscv-common.cc  | 12 
 gcc/testsuite/gcc.target/riscv/arch-39.c |  7 +++
 gcc/testsuite/gcc.target/riscv/arch-40.c |  7 +++
 gcc/testsuite/gcc.target/riscv/attribute-15.c|  2 +-
 gcc/testsuite/gcc.target/riscv/attribute-16.c|  2 +-
 gcc/testsuite/gcc.target/riscv/attribute-17.c|  2 +-
 gcc/testsuite/gcc.target/riscv/attribute-18.c|  2 +-
 gcc/testsuite/gcc.target/riscv/pr110696.c|  2 +-
 .../riscv/rvv/base/abi-callee-saved-1-zcmp.c |  2 +-
 .../riscv/rvv/base/abi-callee-saved-2-zcmp.c |  2 +-
 gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-1.c |  4 ++--
 gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-3.c |  8 
 12 files changed, 39 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-39.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-40.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index b9bda3e110a..8622f7f3c79 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -82,6 +82,18 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"a", "zaamo"},
   {"a", "zalrsc"},
 
+  {"c", "zca"},
+  {"c", "zcf",
+   [] (const riscv_subset_list *subset_list) -> bool
+   {
+ return subset_list->xlen () == 32 && subset_list->lookup ("f");
+   }},
+  {"c", "zcd",
+   [] (const riscv_subset_list *subset_list) -> bool
+   {
+ return subset_list->lookup ("d");
+   }},
+
   {"zabha", "zaamo"},
 
   {"zdinx", "zfinx"},
diff --git a/gcc/testsuite/gcc.target/riscv/arch-39.c 
b/gcc/testsuite/gcc.target/riscv/arch-39.c
new file mode 100644
index 000..beeb81e44c5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-39.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64idc_zcmt -mabi=lp64d" } */
+int
+foo ()
+{}
+
+/* { dg-error "zcd conflicts with zcmt" "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/riscv/arch-40.c 
b/gcc/testsuite/gcc.target/riscv/arch-40.c
new file mode 100644
index 000..eaefaf1d0d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-40.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64idc_zcmp -mabi=lp64d" } */
+int
+foo ()
+{}
+
+/* { dg-error "zcd conflicts with zcmp" "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-15.c 
b/gcc/testsuite/gcc.target/riscv/attribute-15.c
index a2e394b6489..ac6caaecd4f 100644
--- a/gcc/testsuite/gcc.target/riscv/attribute-15.c
+++ b/gcc/testsuite/gcc.target/riscv/attribute-15.c
@@ -3,4 +3,4 @@
 int foo()
 {
 }
-/* { dg-final { scan-assembler ".attribute arch, 
\"rv32i2p0_m2p0_a2p0_f2p0_d2p0_c2p0_zaamo1p0_zalrsc1p0\"" } } */
+/* { dg-final { scan-assembler ".attribute arch, 
\"rv32i2p0_m2p0_a2p0_f2p0_d2p0_c2p0_zaamo1p0_zalrsc1p0_zca1p0_zcd1p0_zcf1p0\"" 
} } */
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-16.c 
b/gcc/testsuite/gcc.target/riscv/attribute-16.c
index d2b18160cb5..539e426ca97 100644
--- a/gcc/testsuite/gcc.target/riscv/attribute-16.c
+++ b/gcc/testsuite/gcc.target/riscv/attribute-16.c
@@ -3,4 +3,4 @@
 int foo()
 {
 }
-/* { dg-final { scan-assembler ".attribute arch, 
\"rv32i2p1_m2p0_a2p0_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0\"" 
} } */
+/* { dg-final { scan-assembler ".attribute arch, 
\"rv32i2p1_m2p0_a2p0_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0_zca1p0_zcd1p0_zcf1p0\""
 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-17.c 
b/gcc/testsuite/gcc.target/riscv/attribute-17.c
index fc2f488a3ac..30928cb5b68 100644
--- a/gcc/testsuite/gcc.target/riscv/attribute-17.c
+++ b/gcc/testsuite/gcc.target/riscv/attribute-17.c
@@ -3,4 +3,4 @@
 int foo()
 {
 }
-/* { dg-final { scan-assembler ".attribute arch, 
\"rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0\"" 
} } */
+/* { dg-final { scan-assembler ".attr

RE: [PATCH][ivopts]: perform affine fold on unsigned addressing modes known not to overflow. [PR114932]

2024-07-10 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Thursday, June 20, 2024 8:55 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; bin.ch...@linux.alibaba.com
> Subject: RE: [PATCH][ivopts]: perform affine fold on unsigned addressing modes
> known not to overflow. [PR114932]
> 
> On Wed, 19 Jun 2024, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Wednesday, June 19, 2024 1:14 PM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd ;
> bin.ch...@linux.alibaba.com
> > > Subject: Re: [PATCH][ivopts]: perform affine fold on unsigned addressing
> modes
> > > known not to overflow. [PR114932]
> > >
> > > On Fri, 14 Jun 2024, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > When the patch for PR114074 was applied we saw a good boost in
> exchange2.
> > > >
> > > > This boost was partially caused by a simplification of the addressing 
> > > > modes.
> > > > With the patch applied IV opts saw the following form for the base
> addressing;
> > > >
> > > >   Base: (integer(kind=4) *) &block + ((sizetype) ((unsigned long) 
> > > > l0_19(D) *
> > > > 324) + 36)
> > > >
> > > > vs what we normally get:
> > > >
> > > >   Base: (integer(kind=4) *) &block + ((sizetype) ((integer(kind=8)) 
> > > > l0_19(D)
> > > > * 81) + 9) * 4
> > > >
> > > > This is because the patch promoted multiplies where one operand is a
> constant
> > > > from a signed multiply to an unsigned one, to attempt to fold away the
> constant.
> > > >
> > > > This patch attempts the same but due to the various problems with SCEV 
> > > > and
> > > > niters not being able to analyze the resulting forms (i.e. PR114322) we 
> > > > can't
> > > > do it during SCEV or in the general form like in fold-const like 
> > > > extract_muldiv
> > > > attempts.
> > > >
> > > > Instead this applies the simplification during IVopts initialization 
> > > > when we
> > > > create the IV.  Essentially when we know the IV won't overflow with 
> > > > regards
> to
> > > > niters then we perform an affine fold which gets it to simplify the 
> > > > internal
> > > > computation, even if this is signed because we know that for IVOPTs 
> > > > uses the
> > > > IV won't ever overflow.  This allows IV opts to see the simplified form
> > > > without influencing the rest of the compiler.
> > > >
> > > > as mentioned in PR114074 it would be good to fix the missed 
> > > > optimization in
> the
> > > > other passes so we can perform this in general.
> > > >
> > > > The reason this has a big impact on fortran code is that fortran 
> > > > doesn't seem
> to
> > > > have unsigned integer types.  As such all its addressing is created with
> > > > signed types and folding does not happen on them due to the possible
> overflow.
> > > >
> > > > concretely on AArch64 this changes the results from generation:
> > > >
> > > > mov x27, -108
> > > > mov x24, -72
> > > > mov x23, -36
> > > > add x21, x1, x0, lsl 2
> > > > add x19, x20, x22
> > > > .L5:
> > > > add x0, x22, x19
> > > > add x19, x19, 324
> > > > ldr d1, [x0, x27]
> > > > add v1.2s, v1.2s, v15.2s
> > > > str d1, [x20, 216]
> > > > ldr d0, [x0, x24]
> > > > add v0.2s, v0.2s, v15.2s
> > > > str d0, [x20, 252]
> > > > ldr d31, [x0, x23]
> > > > add v31.2s, v31.2s, v15.2s
> > > > str d31, [x20, 288]
> > > > bl  digits_20_
> > > > cmp x21, x19
> > > > bne .L5
> > > >
> > > > into:
> > > >
> > > > .L5:
> > > > ldr d1, [x19, -108]
> > > > add v1.2s, v1.2s, v15.2s
> > > > str d1, [x20, 216]
> > > > ldr d0, [x19, -72]
> > > > add v0.2s, v0.2s, v15.2s
> > > > str d0, [x20, 252]
> > > > ldr d31, [x19, -36]
> > > > add x19, x19, 324
> > > > add v31.2s, v31.2s, v15.2s
> > > > str d31, [x20, 288]
> > > > bl  digits_20_
> > > > cmp x21, x19
> > > > bne .L5
> > > >
> > > > The two patches together results in a 10% performance increase in 
> > > > exchange2
> in
> > > > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in
> > > compile
> > > > time. There's also a 5% performance improvement in fotonik3d and similar
> > > > reduction in binary size.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > PR tree-optimization/114932
> > > > * tree-scalar-evolution.cc (alloc_iv): Perform affine unsigned 
> > > > fold.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > PR tree-optimization/114932
> > > > * gfortran.dg/addressing-modes_1.f90: New test.
> > > >
> > > > ---
> > > > di

RE: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate selection [PR114932]

2024-07-10 Thread Tamar Christina
> > > I might also point back to the idea I threw in somewhere, adding
> > > OEP_VALUE (or a better name) to the set of flags accepted by
> > > operand_equal_p.  You mentioned hashing IIRC but I don't see the patches
> > > touching hashing?
> > >
> >
> > Yes, that can indeed be done with this approach.  The hashing was from an
> > earlier attempt to prevent the "duplicate" IV expressions from being
> > created in the first place by modifying get_loop_invariant_expr.
> >
> > This function looks up if we have already seen a particular IV expression 
> > and if
> > we have it just returns that expression.  However after reading more of the 
> > code
> > I realized this wasn't the right approach, as without also dealing with the
> candidates
> > we'd end up creating IV expression that can't be handled by any candidate.
> >
> > IVopts would just give up then.  Reading the code it seems that
> > get_loop_invariant_expr is just there to prevent blatant duplicates,
> > i.e. it treats `(signed) a` and `a` as the same.
> >
> > This is also why I think that everywhere else *has* to continue stripping
> > the expression.
> >
> > A note from Richard S, that he thought IVopts already had some code to
> > deal with expressions that differ only in sign, led me to take a different
> > approach.
> >
> > The goal wasn't to remove the differently signed/unsigned IV expressions,
> > but instead to get them to be served by the same candidates, i.e. we want
> > them in the same candidate groups, and candidate optimization will then
> > just do its thing.
> >
> > That seemed a more natural fit to how it worked.
> 
> Yeah, I agree that sounds like the better strategy.
> 
> > Do you want me to try the operand_equal_p approach?  Though in this case
> > the issue is we not only need to know they're equal, but also need to know
> > the scale factor.
> 
> For this case yes, but if you'd keep the code as-is, the equal with scale
> factor one case would be fixed.  Not a case with different scale factors
> though - but conversions "elsewhere" should be handled via the stripping.
> So it would work to simply adjust the operand_equal_p check here?
> 
> > get_computation_aff_1 scales the common type IV by the scale we determined,
> > so I think operand_equal_p would not be very useful here.  But it does look
> > like constant_multiple_of can just be implemented with
> > aff_combination_constant_multiple_p.
> >
> > Should I try?
> 
> You've had the other places where you replace operand_equal_p with
> affine-compute and compare.  As said that has some associated cost
> as well as a limit on the number of elements after which it resorts
> back to operand_equal_p.  So for strict equality tests implementing
> a weaker operand_equal_p might be a better solution.
> 

The structural comparison is implemented as a new mode for operand_equal_p
which compares two expressions ignoring NOP conversions (unless their bit
sizes differ) and ignoring constant values, while still requiring that both
operands be constants.

There is one downside compared to the affine comparison: this approach does
not deal well with commutative operations, i.e. it does not see a + (b + c)
as equivalent to c + (b + a).

This means we lose out on some of the more complicated addressing modes, but
with so many operations the address will likely be split anyway and we'll deal
with it then.

Bootstrapped Regtested on aarch64-none-linux-gnu,
x86_64-pc-linux-gnu -m32, -m64 and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/114932
* fold-const.cc (operand_compare::operand_equal_p): Use it.
(operand_compare::verify_hash_value): Likewise.
* tree-core.h (enum operand_equal_flag): Add OEP_STRUCTURAL_EQ.
* tree-ssa-loop-ivopts.cc (record_group_use): Check for structural eq.

gcc/testsuite/ChangeLog:

PR tree-optimization/114932
* gfortran.dg/addressing-modes_2.f90: New test.

-- inline copy of --

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 
710d697c0217c784b34f9f9f7b00b1945369076a..3d43020541c082c094164724da9d17fbb5793237
 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -3191,6 +3191,9 @@ operand_compare::operand_equal_p (const_tree arg0, 
const_tree arg1,
  precision differences.  */
   if (TREE_CODE (arg0) == INTEGER_CST && TREE_CODE (arg1) == INTEGER_CST)
 {
+  if (flags & OEP_STRUCTURAL_EQ)
+   return true;
+
   /* Address of INTEGER_CST is not defined; check that we did not forget
 to drop the OEP_ADDRESS_OF flags.  */
   gcc_checking_assert (!(flags & OEP_ADDRESS_OF));
@@ -3204,7 +3207,8 @@ operand_compare::operand_equal_p (const_tree arg0, 
const_tree arg1,
 because they may change the signedness of the arguments.  As pointers
 strictly don't have a signedness, require either two pointers or
 two non-pointers as well.  */
-  if (TYPE_UNSIGNED (TREE_TYPE (arg0)) != TYPE_UNSIGNED (TREE_TYPE (arg1))
+

Re: [PATCH v2] Fix Xcode 16 build break with NULL != nullptr

2024-07-10 Thread Iain Sandoe
Hello Daniel,

Thanks for the patch!

> On 10 Jul 2024, at 10:43, Daniel Bertalan  wrote:
> 
> As of Xcode 16 beta 2 with the macOS 15 SDK, each re-inclusion of the
> stddef.h header causes the NULL macro in C++ to be re-defined to an
> integral constant (__null). This makes the workaround in d59a576b8
> ("Redefine NULL to nullptr") ineffective, as other headers that are
> typically included after system.h (such as obstack.h) do include
> stddef.h too.
> 
> This can be seen by running the sample below through `clang++ -E`
> 
> #include <stddef.h>
> #define NULL nullptr
> #include <stddef.h>
> NULL
> 
> The relevant libc++ change is here:
> https://github.com/llvm/llvm-project/commit/2950283dddab03c183c1be2d7de9d4999cc86131
> 
> Filed as FB14261859 to Apple and added a comment about it on LLVM PR
> 86843.
> 
> This fixes the cases in --enable-languages=c,c++,objc,obj-c++,rust build
> where NULL being an integral constant instead of a null pointer literal
> (therefore no longer implicitly converting to a pointer when used as a
> template function's argument) caused issues.
> 
>gcc/value-pointer-equiv.cc:65:43: error: no viable conversion from 
> `pair::type, typename 
> __unwrap_ref_decay::type>' to 'const pair'
> 
>65 |   const std::pair  m_marker = std::make_pair (NULL, NULL);
>   |   ^~~
> 
> As noted in the previous commit though, the proper solution would be to
> phase out the usages of NULL in GCC's C++ source code.
> 
> gcc/analyzer/ChangeLog:
> 
>   * diagnostic-manager.cc (saved_diagnostic::saved_diagnostic):
>   Change NULL to nullptr.
>   (struct null_assignment_sm_context): Likewise.
>   * infinite-loop.cc: Likewise.
>   * infinite-recursion.cc: Likewise.
>   * varargs.cc (va_list_state_machine::on_leak): Likewise.
> 
> gcc/rust/ChangeLog:
> 
>   * metadata/rust-imports.cc (Import::try_package_in_directory):
>   Change NULL to nullptr.
> 
> gcc/ChangeLog:
> 
>   * value-pointer-equiv.cc: Change NULL to nullptr.

This is fine from a Darwin/macOS perspective - and I’d say the changes are
generally in the ‘obvious’ category - however, let’s give other maintainers
some time to weigh in.

NOTE: if you do not have a current copyright assignment to the FSF for GCC, then
you need to post the patch under DCO - see 
https://gcc.gnu.org/contribute.html#legal
for more information.

thanks
Iain

> ---
> gcc/analyzer/diagnostic-manager.cc | 18 +-
> gcc/analyzer/infinite-loop.cc  |  2 +-
> gcc/analyzer/infinite-recursion.cc |  2 +-
> gcc/analyzer/varargs.cc|  2 +-
> gcc/rust/metadata/rust-imports.cc  |  2 +-
> gcc/value-pointer-equiv.cc |  2 +-
> 6 files changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/gcc/analyzer/diagnostic-manager.cc 
> b/gcc/analyzer/diagnostic-manager.cc
> index fe943ac61c9e..51304b0795b6 100644
> --- a/gcc/analyzer/diagnostic-manager.cc
> +++ b/gcc/analyzer/diagnostic-manager.cc
> @@ -679,12 +679,12 @@ saved_diagnostic::saved_diagnostic (const state_machine 
> *sm,
>   m_stmt (ploc.m_stmt),
>   /* stmt_finder could be on-stack; we want our own copy that can
>  outlive that.  */
> -  m_stmt_finder (ploc.m_finder ? ploc.m_finder->clone () : NULL),
> +  m_stmt_finder (ploc.m_finder ? ploc.m_finder->clone () : nullptr),
>   m_loc (ploc.m_loc),
>   m_var (var), m_sval (sval), m_state (state),
> -  m_d (std::move (d)), m_trailing_eedge (NULL),
> +  m_d (std::move (d)), m_trailing_eedge (nullptr),
>   m_idx (idx),
> -  m_best_epath (NULL), m_problem (NULL),
> +  m_best_epath (nullptr), m_problem (nullptr),
>   m_notes ()
> {
>   /* We must have an enode in order to be able to look for paths
> @@ -1800,10 +1800,10 @@ public:
>   stmt,
>   stack_depth,
>   sm,
> - NULL,
> + nullptr,
>   src_sm_val,
>   dst_sm_val,
> - NULL,
> + nullptr,
>   dst_state,
>   src_node));
> return false;
> @@ -1993,9 +1993,9 @@ struct null_assignment_sm_context : public sm_context
>   m_sm,
>   var_new_sval,
>   from, to,
> - NULL,
> + nullptr,
>   *m_new_state,
> - NULL));
> + nullptr));
>   }
> 
>   void set_next_state (const gimple *stmt,
> @@ -2019,9 +2019,9 @@ struct null_assignment_sm_context : public sm_context
>   m_sm,
> 

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-10 Thread Richard Sandiford
Tejas Belagod  writes:
> On 7/10/24 2:38 PM, Richard Biener wrote:
>> On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod  wrote:
>>>
>>> On 7/9/24 4:22 PM, Richard Biener wrote:
 On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod  
 wrote:
>
> On 7/8/24 4:45 PM, Richard Biener wrote:
>> On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod  
>> wrote:
>>>
>>> Hi,
>>>
>>> Sorry to have dropped the ball on
>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
>>> here I've tried to pick it up again and write up a strawman proposal for
>>> elevating __attribute__((vector_mask)) to the FE from GIMPLE.
>>>
>>>
>>> Thanks,
>>> Tejas.
>>>
>>> Motivation
>>> --
>>>
>>> The idea of packed boolean vectors came about when we wanted to support
>>> C/C++ operators on SVE ACLE types. The current vector boolean type that
>>> ACLE specifies does not adequately disambiguate vector lane sizes which
>>> they were derived off of. Consider this simple, albeit unrealistic, 
>>> example:
>>>
>>>   bool foo (svint32_t a, svint32_t b)
>>>   {
>>> svbool_t p = a > b;
>>>
>>> // Here p[2] is not the same as a[2] > b[2].
>>> return p[2];
>>>   }
>>>
>>> In the above example, because svbool_t has a fixed 1-lane-per-byte, p[i]
>>> does not return the bool value corresponding to a[i] > b[i]. This
>>> necessitates a 'typed' vector boolean value that unambiguously
>>> represents results of operations
>>> of the same type.
>>>
>>> __attribute__((vector_mask))
>>> -
>>>
>>> Note: If interested in historical discussions refer to:
>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html
>>>
>>> We define this new attribute which when applied to a base data vector
>>> produces a new boolean vector type that represents a boolean type that
>>> is produced as a result of operations on the corresponding base vector
>>> type. The following is the syntax.
>>>
>>>   typedef int v8si __attribute__((vector_size (8 * sizeof (int)));
>>>   typedef v8si v8sib __attribute__((vector_mask));
>>>
>>> Here the 'base' data vector type is v8si or a vector of 8 integers.
>>>
>>> Rules
>>>
>>> • The layout/size of the boolean vector type is implementation-defined
>>> for its base data vector type.
>>>
>>> • Two boolean vector types whose base data vector types have the same
>>> number of elements and lane width have the same layout and size.
>>>
>>> • Consequently, two boolean vectors whose base data vector types have a
>>> different number of elements or a different lane size have different
>>> layouts.
>>>
>>> This aligns with gnu vector extensions that generate integer vectors as
>>> a result of comparisons - "The result of the comparison is a vector of
>>> the same width and number of elements as the comparison operands with a
>>> signed integral element type." according to
>>>https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.
>>
>> Without having the time to re-review this all in detail I think the GNU
>> vector extension does not expose the result of the comparison as the
>> machine would produce it but instead a comparison "decays" to
>> a conditional:
>>
>> typedef int v4si __attribute__((vector_size(16)));
>>
>> v4si a;
>> v4si b;
>>
>> void foo()
>> {
>>  auto r = a < b;
>> }
>>
>> produces, with C23:
>>
>>  vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0,
>> 0, 0, 0 } > ;
>>
>> In fact on x86_64 with AVX and AVX512 you have two different "machine
>> produced" mask types and the above could either produce a AVX mask with
>> 32bit elements or a AVX512 mask with 1bit elements.
>>
>> Not exposing "native" mask types requires the compiler optimizing 
>> subsequent
>> uses and makes generic vectors difficult to combine with for example 
>> AVX512
>> intrinsics (where masks are just 'int').  Across an ABI boundary it's 
>> also
>> even more difficult to optimize mask transitions.
>>
>> But it at least allows portable code and it does not suffer from users 
>> trying to
>> expose machine representations of masks as input to generic vector code
>> with all the problems of constant folding not only requiring 
>> self-consistent
>> code within the compiler but compatibility with user produced constant 
>> masks.
>>
>> That said, I somewhat question the need to expose the target mask layout
>> to users for GCCs generic vector extension.
>>
>
> Thanks for your feedback.
>
> IIUC, I can imagine how having a GNU vector extension exposing the
> target vector 

RE: [PATCH v1] Match: Support form 2 for the .SAT_TRUNC

2024-07-10 Thread Li, Pan2
> OK.

Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, July 10, 2024 5:24 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, 
Hongtao 
Subject: Re: [PATCH v1] Match: Support form 2 for the .SAT_TRUNC

On Fri, Jul 5, 2024 at 2:48 PM  wrote:
>
> From: Pan Li 
>
> This patch would like to add form 2 support for the .SAT_TRUNC.  Aka:
>
> Form 2:
>   #define DEF_SAT_U_TRUC_FMT_2(NT, WT) \
>   NT __attribute__((noinline)) \
>   sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \
>   {\
> bool overflow = x > (WT)(NT)(-1);  \
> return overflow ? (NT)-1 : (NT)x;  \
>   }
>
> DEF_SAT_U_TRUC_FMT_2(uint32, uint64)
>
> Before this patch:
>3   │
>4   │ __attribute__((noinline))
>5   │ uint32_t sat_u_truc_uint64_t_to_uint32_t_fmt_2 (uint64_t x)
>6   │ {
>7   │   uint32_t _1;
>8   │   long unsigned int _3;
>9   │
>   10   │ ;;   basic block 2, loop depth 0
>   11   │ ;;pred:   ENTRY
>   12   │   _3 = MIN_EXPR ;
>   13   │   _1 = (uint32_t) _3;
>   14   │   return _1;
>   15   │ ;;succ:   EXIT
>   16   │
>   17   │ }
>
> After this patch:
>3   │
>4   │ __attribute__((noinline))
>5   │ uint32_t sat_u_truc_uint64_t_to_uint32_t_fmt_2 (uint64_t x)
>6   │ {
>7   │   uint32_t _1;
>8   │
>9   │ ;;   basic block 2, loop depth 0
>   10   │ ;;pred:   ENTRY
>   11   │   _1 = .SAT_TRUNC (x_2(D)); [tail call]
>   12   │   return _1;
>   13   │ ;;succ:   EXIT
>   14   │
>   15   │ }
>
> The below test suites are passed for this patch:
> 1. The x86 bootstrap test.
> 2. The x86 fully regression test.
> 3. The rv64gcv fully regression test.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * match.pd: Add form 2 for .SAT_TRUNC.
> * tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children):
> Add new case NOP_EXPR,  and try to match SAT_TRUNC.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  | 17 -
>  gcc/tree-ssa-math-opts.cc |  4 
>  2 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 4edfa2ae2c9..3759c64d461 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3234,7 +3234,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1
>
> -/* Unsigned saturation truncate, case 1 (), sizeof (WT) > sizeof (NT).
> +/* Unsigned saturation truncate, case 1, sizeof (WT) > sizeof (NT).
> SAT_U_TRUNC = (NT)x | (NT)(-(X > (WT)(NT)(-1))).  */
>  (match (unsigned_integer_sat_trunc @0)
>   (bit_ior:c (negate (convert (gt @0 INTEGER_CST@1)))
> @@ -3250,6 +3250,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>}
>(if (otype_precision < itype_precision && wi::eq_p (trunc_max, 
> int_cst))
>
> +/* Unsigned saturation truncate, case 2, sizeof (WT) > sizeof (NT).
> +   SAT_U_TRUNC = (NT)(MIN_EXPR (X, 255)).  */
> +(match (unsigned_integer_sat_trunc @0)
> + (convert (min @0 INTEGER_CST@1))
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && TYPE_UNSIGNED (TREE_TYPE (@0)))
> + (with
> +  {
> +   unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
> +   unsigned otype_precision = TYPE_PRECISION (type);
> +   wide_int trunc_max = wi::mask (otype_precision, false, itype_precision);
> +   wide_int int_cst = wi::to_wide (@1, itype_precision);
> +  }
> +  (if (otype_precision < itype_precision && wi::eq_p (trunc_max, 
> int_cst))
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index a35caf5f058..ac86be8eb94 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -6170,6 +6170,10 @@ math_opts_dom_walker::after_dom_children (basic_block 
> bb)
>   match_unsigned_saturation_sub (&gsi, as_a (stmt));
>   break;
>
> +   case NOP_EXPR:
> + match_unsigned_saturation_trunc (&gsi, as_a (stmt));
> + break;
> +
> default:;
> }
> }
> --
> 2.34.1
>


RE: [PATCH]middle-end: Implement conditional store vectorizer pattern [PR115531]

2024-07-10 Thread Tamar Christina
> > >
> > > > +   }
> > > > +
> > > > +  if (new_code == ERROR_MARK)
> > > > +   {
> > > > + /* We couldn't flip the condition, so invert the mask 
> > > > instead.  */
> > > > + itype = TREE_TYPE (cmp_ls);
> > > > + conv = gimple_build_assign (var, BIT_XOR_EXPR, cmp_ls,
> > > > + build_int_cst (itype, 1));
> > > > +   }
> > > > +
> > > > +  mask_vec_type = get_mask_type_for_scalar_type (loop_vinfo, 
> > > > itype);
> > > > +  append_pattern_def_seq (vinfo, stmt_vinfo, conv, mask_vec_type,
> itype);
> > > > +  /* Then prepare the boolean mask as the mask conversion pattern
> > > > +won't hit on the pattern statement.  */
> > > > +  cmp_ls = build_mask_conversion (vinfo, var, gs_vectype, 
> > > > stmt_vinfo);
> > >
> > > Isn't this somewhat redundant with the below call?
> > >
> > > I fear of bad [non-]interactions with bool pattern recognition btw.
> >
> > So this is again the issue that patterns don't apply to newly produced
> > pattern statements, and so they can't serve as the root for new patterns.
> > This is why the scatter/gather pattern addition refactored part of the
> > work into these helper functions.
> >
> > I did actually try to just add a secondary loop that iterates over newly
> > produced patterns, but you later run into problems where a new pattern
> > completely cancels out an old pattern rather than just extending it.
> >
> > So at the moment, unless the code ends up being hybrid, whatever the bool
> > recog pattern does is just ignored as irrelevant.
> >
> > But if we don't invert the compare then it should be simpler, as the
> > original compare is never in a pattern.
> >
> > I'll respin with these changes.
> 

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/115531
* tree-vect-patterns.cc (vect_cond_store_pattern_same_ref): New.
(vect_recog_cond_store_pattern): New.
(vect_vect_recog_func_ptrs): Use it.
* target.def (conditional_operation_is_expensive): New.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Document it.
* targhooks.cc (default_conditional_operation_is_expensive): New.
* targhooks.h (default_conditional_operation_is_expensive): New.
* tree-vectorizer.h (may_be_nonaddressable_p): New.

-- inline copy of patch --

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 
f10d9a59c6673a02823fc05132235af3a1ad7c65..c7535d07f4ddd16d55e0ab9b609a2bf95931a2f4
 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6449,6 +6449,13 @@ The default implementation returns a 
@code{MODE_VECTOR_INT} with the
 same size and number of elements as @var{mode}, if such a mode exists.
 @end deftypefn
 
+@deftypefn {Target Hook} bool 
TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE (unsigned @var{ifn})
+This hook returns true if masked operation @var{ifn} (really of
+type @code{internal_fn}) should be considered more expensive to use than
+implementing the same operation without masking.  GCC can then try to use
+unconditional operations instead with extra selects.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE 
(unsigned @var{ifn})
 This hook returns true if masked internal function @var{ifn} (really of
 type @code{internal_fn}) should be considered expensive when the mask is
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 
24596eb2f6b4e9ea3ea3464fda171d99155f4c0f..64cea3b1edaf8ec818c0e8095ab50b00ae0cb857
 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4290,6 +4290,8 @@ address;  but often a machine-dependent strategy can 
generate better code.
 
 @hook TARGET_VECTORIZE_GET_MASK_MODE
 
+@hook TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE
+
 @hook TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE
 
 @hook TARGET_VECTORIZE_CREATE_COSTS
diff --git a/gcc/target.def b/gcc/target.def
index 
ce4d1ecd58be0a1c8110c6993556a52a2c69168e..3de1aad4c84d3df0b171a411f97e1ce70b6f63b5
 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2033,6 +2033,18 @@ same size and number of elements as @var{mode}, if such 
a mode exists.",
  (machine_mode mode),
  default_get_mask_mode)
 
+/* Function to say whether a conditional operation is expensive when
+   compared to non-masked operations.  */
+DEFHOOK
+(conditional_operation_is_expensive,
+ "This hook returns true if masked operation @var{ifn} (really of\n\
+type @code{internal_fn}) should be considered more expensive to use than\n\
+implementing the same operation without masking.  GCC can then try to use\n\
+unconditional operations instead with extra selects.",
+ bool,
+ (unsigned ifn),
+ default_conditional_operation_is_expensive)
+
 /* Function to say whether a masked operation is expensive when the
mask is all zeros.  */
 DEFHOOK
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 
3cbca0f13a5e5de893630c45a6bbe0616b

[PATCH 2/2]AArch64: implement TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE [PR115531].

2024-07-10 Thread Tamar Christina
Hi All,

This implements the new target hook, indicating that on AArch64 we prefer
masked operations, when available, for any type over doing LOAD + SELECT or
SELECT + STORE.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/115531
* config/aarch64/aarch64.cc
(aarch64_conditional_operation_is_expensive): New.
(TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE): New.

gcc/testsuite/ChangeLog:

PR tree-optimization/115531
* gcc.dg/vect/vect-conditional_store_1.c: New test.
* gcc.dg/vect/vect-conditional_store_2.c: New test.
* gcc.dg/vect/vect-conditional_store_3.c: New test.
* gcc.dg/vect/vect-conditional_store_4.c: New test.

---
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
2816124076383c1c458e2cfa21cbbafb0773b05a..dc1bc0958ca6172bc2d4753efe491457ab9bcc74
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -28222,6 +28222,15 @@ aarch64_gen_adjusted_ldpstp (rtx *operands, bool load,
   return true;
 }
 
+/* Implement TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE.  Assume that
+   predicated operations when available are beneficial.  */
+
+static bool
+aarch64_conditional_operation_is_expensive (unsigned)
+{
+  return false;
+}
+
 /* Implement TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE.  Assume for now that
it isn't worth branching around empty masked ops (including masked
stores).  */
@@ -30909,6 +30918,9 @@ aarch64_libgcc_floating_mode_supported_p
 #define TARGET_VECTORIZE_RELATED_MODE aarch64_vectorize_related_mode
 #undef TARGET_VECTORIZE_GET_MASK_MODE
 #define TARGET_VECTORIZE_GET_MASK_MODE aarch64_get_mask_mode
+#undef TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE
+#define TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE \
+  aarch64_conditional_operation_is_expensive
 #undef TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE
 #define TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE \
   aarch64_empty_mask_is_expensive
diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_1.c 
b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_1.c
new file mode 100644
index 
..563ac63bdab01e33b7a3edd9ec1545633ee1b86e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_1.c
@@ -0,0 +1,24 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_masked_store } */
+
+/* { dg-additional-options "-mavx2" { target avx2 } } */
+/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
+
+void foo1 (char *restrict a, int *restrict b, int *restrict c, int n, int 
stride)
+{
+  if (stride <= 1)
+return;
+
+  for (int i = 0; i < n; i++)
+{
+  int res = c[i];
+  int t = b[i+stride];
+  if (a[i] != 0)
+res = t;
+  c[i] = res;
+}
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "vect" { aarch64-*-* } } } 
*/
diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_2.c 
b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_2.c
new file mode 100644
index 
..c45cdc30a6278de7f04b8a04cfc7a508c853279b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_2.c
@@ -0,0 +1,24 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_masked_store } */
+
+/* { dg-additional-options "-mavx2" { target avx2 } } */
+/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
+
+void foo2 (char *restrict a, int *restrict b, int *restrict c, int n, int 
stride)
+{
+  if (stride <= 1)
+return;
+
+  for (int i = 0; i < n; i++)
+{
+  int res = c[i];
+  int t = b[i+stride];
+  if (a[i] != 0)
+t = res;
+  c[i] = t;
+}
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "vect" { aarch64-*-* } } } 
*/
diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_3.c 
b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_3.c
new file mode 100644
index 
..da9e675dbb97add70d47fc8d714a02256fb1387a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_3.c
@@ -0,0 +1,24 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_masked_store } */
+
+/* { dg-additional-options "-mavx2" { target avx2 } } */
+/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
+
+void foo3 (float *restrict a, int *restrict b, int *restrict c, int n, int 
stride)
+{
+  if (stride <= 1)
+return;
+
+  for (int i = 0; i < n; i++)
+{
+  int res = c[i];
+  int t = b[i+stride];
+  if (a[i] >= 0)
+t = res;
+  c[i] = t;
+}
+}
+
+/* { dg-final 

Re: [PATCH v2] Fix Xcode 16 build break with NULL != nullptr

2024-07-10 Thread Richard Biener
On Wed, Jul 10, 2024 at 12:23 PM Iain Sandoe  wrote:
>
> Hello Daniel,
>
> Thanks for the patch!
>
> > On 10 Jul 2024, at 10:43, Daniel Bertalan  wrote:
> >
> > As of Xcode 16 beta 2 with the macOS 15 SDK, each re-inclusion of the
> > stddef.h header causes the NULL macro in C++ to be re-defined to an
> > integral constant (__null). This makes the workaround in d59a576b8
> > ("Redefine NULL to nullptr") ineffective, as other headers that are
> > typically included after system.h (such as obstack.h) do include
> > stddef.h too.
> >
> > This can be seen by running the sample below through `clang++ -E`
> >
> > #include <stddef.h>
> > #define NULL nullptr
> > #include <stddef.h>
> > NULL
> >
> > The relevant libc++ change is here:
> > https://github.com/llvm/llvm-project/commit/2950283dddab03c183c1be2d7de9d4999cc86131
> >
> > Filed as FB14261859 to Apple and added a comment about it on LLVM PR
> > 86843.
> >
> > This fixes the cases in --enable-languages=c,c++,objc,obj-c++,rust build
> > where NULL being an integral constant instead of a null pointer literal
> > (therefore no longer implicitly converting to a pointer when used as a
> > template function's argument) caused issues.
> >
> >gcc/value-pointer-equiv.cc:65:43: error: no viable conversion from 
> > `pair::type, typename 
> > __unwrap_ref_decay::type>' to 'const pair'
> >
> >65 |   const std::pair  m_marker = std::make_pair (NULL, 
> > NULL);
> >   |   
> > ^~~
> >
> > As noted in the previous commit though, the proper solution would be to
> > phase out the usages of NULL in GCC's C++ source code.
> >
> > gcc/analyzer/ChangeLog:
> >
> >   * diagnostic-manager.cc (saved_diagnostic::saved_diagnostic):
> >   Change NULL to nullptr.
> >   (struct null_assignment_sm_context): Likewise.
> >   * infinite-loop.cc: Likewise.
> >   * infinite-recursion.cc: Likewise.
> >   * varargs.cc (va_list_state_machine::on_leak): Likewise.
> >
> > gcc/rust/ChangeLog:
> >
> >   * metadata/rust-imports.cc (Import::try_package_in_directory):
> >   Change NULL to nullptr.
> >
> > gcc/ChangeLog:
> >
> >   * value-pointer-equiv.cc: Change NULL to nullptr.
>
> This is fine from a Darwin/macOS perspective - and I’d say the changes are 
> generally in
> the ‘obvious’ category - however, let’s give other maintainers some time to 
> weigh in.

Yes, I think the patch is OK.

> NOTE: if you do not have a current copyright assignment to the FSF for GCC, 
> then
> you need to post the patch under DCO - see 
> https://gcc.gnu.org/contribute.html#legal
> for more information.
>
> thanks
> Iain
>
> > ---
> > gcc/analyzer/diagnostic-manager.cc | 18 +-
> > gcc/analyzer/infinite-loop.cc  |  2 +-
> > gcc/analyzer/infinite-recursion.cc |  2 +-
> > gcc/analyzer/varargs.cc|  2 +-
> > gcc/rust/metadata/rust-imports.cc  |  2 +-
> > gcc/value-pointer-equiv.cc |  2 +-
> > 6 files changed, 14 insertions(+), 14 deletions(-)
> >
> > diff --git a/gcc/analyzer/diagnostic-manager.cc 
> > b/gcc/analyzer/diagnostic-manager.cc
> > index fe943ac61c9e..51304b0795b6 100644
> > --- a/gcc/analyzer/diagnostic-manager.cc
> > +++ b/gcc/analyzer/diagnostic-manager.cc
> > @@ -679,12 +679,12 @@ saved_diagnostic::saved_diagnostic (const 
> > state_machine *sm,
> >   m_stmt (ploc.m_stmt),
> >   /* stmt_finder could be on-stack; we want our own copy that can
> >  outlive that.  */
> > -  m_stmt_finder (ploc.m_finder ? ploc.m_finder->clone () : NULL),
> > +  m_stmt_finder (ploc.m_finder ? ploc.m_finder->clone () : nullptr),
> >   m_loc (ploc.m_loc),
> >   m_var (var), m_sval (sval), m_state (state),
> > -  m_d (std::move (d)), m_trailing_eedge (NULL),
> > +  m_d (std::move (d)), m_trailing_eedge (nullptr),
> >   m_idx (idx),
> > -  m_best_epath (NULL), m_problem (NULL),
> > +  m_best_epath (nullptr), m_problem (nullptr),
> >   m_notes ()
> > {
> >   /* We must have an enode in order to be able to look for paths
> > @@ -1800,10 +1800,10 @@ public:
> >   stmt,
> >   stack_depth,
> >   sm,
> > - NULL,
> > + nullptr,
> >   src_sm_val,
> >   dst_sm_val,
> > - NULL,
> > + nullptr,
> >   dst_state,
> >   src_node));
> > return false;
> > @@ -1993,9 +1993,9 @@ struct null_assignment_sm_context : public sm_context
> >   m_sm,
> >   var_new_sval,
> >   from, to,
> > - NULL,
> > + nullptr,
> >  

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-10 Thread Richard Biener
On Wed, Jul 10, 2024 at 12:44 PM Richard Sandiford
 wrote:
>
> Tejas Belagod  writes:
> > On 7/10/24 2:38 PM, Richard Biener wrote:
> >> On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod  
> >> wrote:
> >>>
> >>> On 7/9/24 4:22 PM, Richard Biener wrote:
>  On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod  
>  wrote:
> >
> > On 7/8/24 4:45 PM, Richard Biener wrote:
> >> On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod  
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> Sorry to have dropped the ball on
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
> >>> here I've tried to pick it up again and write up a strawman proposal 
> >>> for
> >>> elevating __attribute__((vector_mask)) to the FE from GIMPLE.
> >>>
> >>>
> >>> Thanks,
> >>> Tejas.
> >>>
> >>> Motivation
> >>> --
> >>>
> >>> The idea of packed boolean vectors came about when we wanted to
> >>> support C/C++ operators on SVE ACLE types. The current vector boolean
> >>> type that ACLE specifies does not adequately disambiguate the vector
> >>> lane sizes from which they were derived. Consider this simple, albeit
> >>> unrealistic, example:
> >>>
> >>>   bool foo (svint32_t a, svint32_t b)
> >>>   {
> >>> svbool_t p = a > b;
> >>>
> >>> // Here p[2] is not the same as a[2] > b[2].
> >>> return p[2];
> >>>   }
> >>>
> >>> In the above example, because svbool_t has a fixed 1-lane-per-byte
> >>> layout, p[i] does not return the bool value corresponding to
> >>> a[i] > b[i]. This necessitates a 'typed' vector boolean value that
> >>> unambiguously represents results of operations of the same type.
> >>>
> >>> __attribute__((vector_mask))
> >>> -
> >>>
> >>> Note: If interested in historical discussions refer to:
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html
> >>>
> >>> We define this new attribute which, when applied to a base data vector
> >>> type, produces a new boolean vector type that represents the result of
> >>> operations on the corresponding base vector type. The following is the
> >>> syntax.
> >>>
> >>>   typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
> >>>   typedef v8si v8sib __attribute__((vector_mask));
> >>>
> >>> Here the 'base' data vector type is v8si or a vector of 8 integers.
> >>>
> >>> Rules
> >>>
> >>> • The layout/size of the boolean vector type is implementation-defined
> >>> for its base data vector type.
> >>>
> >>> • Two boolean vector types whose base data vector types have the same
> >>> number of elements and lane width have the same layout and size.
> >>>
> >>> • Consequently, two boolean vectors whose base data vector types have
> >>> a different number of elements or a different lane size have different
> >>> layouts.
> >>>
> >>> This aligns with GNU vector extensions that generate integer vectors
> >>> as a result of comparisons - "The result of the comparison is a vector
> >>> of the same width and number of elements as the comparison operands
> >>> with a signed integral element type." according to
> >>>https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.
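The quoted behaviour is easy to check with a small self-contained sketch (type and lane values here are illustrative, not part of the proposal):

```cpp
// Sketch of the GNU vector extension semantics referenced above: a comparison
// yields an integer vector of the same width and element count, with each
// true lane set to -1 (all bits) and each false lane set to 0.
typedef int v4si __attribute__((vector_size(16)));

v4si compare_lt(v4si a, v4si b)
{
  return a < b;  // lane i is -1 where a[i] < b[i], else 0
}
```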
> >>
> >> Without having the time to re-review this all in detail I think the GNU
> >> vector extension does not expose the result of the comparison as the
> >> machine would produce it but instead a comparison "decays" to
> >> a conditional:
> >>
> >> typedef int v4si __attribute__((vector_size(16)));
> >>
> >> v4si a;
> >> v4si b;
> >>
> >> void foo()
> >> {
> >>  auto r = a < b;
> >> }
> >>
> >> produces, with C23:
> >>
> >>  vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 
> >> 0,
> >> 0, 0, 0 } > ;
> >>
> >> In fact on x86_64 with AVX and AVX512 you have two different "machine
> >> produced" mask types and the above could either produce a AVX mask with
> >> 32bit elements or a AVX512 mask with 1bit elements.
> >>
> >> Not exposing "native" mask types requires the compiler optimizing
> >> subsequent uses and makes generic vectors difficult to combine with,
> >> for example, AVX512 intrinsics (where masks are just 'int').  Across
> >> an ABI boundary it's also even more difficult to optimize mask
> >> transitions.
> >>
> >> But it at least allows portable code and it does not suffer from users 
> >> trying to
> >> expose machine representations of masks as input to generic vector code
> >> with all the problems of constant folding not only requiring 
>

Lower zeroing array assignment to memset for allocatable arrays

2024-07-10 Thread Prathamesh Kulkarni
Hi,
The attached patch lowers zeroing array assignment to memset for allocatable 
arrays.

For example:
subroutine test(z, n)
implicit none
integer :: n
real(4), allocatable :: z(:,:,:)

allocate(z(n, 8192, 2048))
z = 0
end subroutine

results in the following call to memset instead of 3 nested loops for z = 0:
(void) __builtin_memset ((void *) z->data, 0, (unsigned long) ((((MAX_EXPR
<z->dim[0].ubound - z->dim[0].lbound, -1> + 1) * (MAX_EXPR <z->dim[1].ubound
- z->dim[1].lbound, -1> + 1)) * (MAX_EXPR <z->dim[2].ubound - z->dim[2].lbound,
-1> + 1)) * 4));
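For illustration, the size computation can be sketched in plain C++: each extent is MAX (ubound - lbound, -1) + 1, which clamps degenerate (empty) dimensions to zero (helper names here are illustrative, not GCC internals):

```cpp
// Sketch of the byte count the generated memset uses for a rank-3 descriptor.
// Per-dimension extent is MAX(ubound - lbound, -1) + 1, so an empty dimension
// (ubound < lbound) yields extent 0 and hence a zero-length memset.
#include <algorithm>
#include <cstddef>

long extent(long lbound, long ubound)
{
  return std::max(ubound - lbound, -1L) + 1;
}

std::size_t zero_fill_bytes(const long lb[3], const long ub[3],
                            std::size_t elem_size)
{
  long n = extent(lb[0], ub[0]) * extent(lb[1], ub[1]) * extent(lb[2], ub[2]);
  return static_cast<std::size_t>(n) * elem_size;
}
```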

The patch gives a significant speedup for an internal Fortran application on
AArch64 -mcpu=grace (and potentially on other AArch64 cores too).
Bootstrapped+tested on aarch64-linux-gnu.
Does the patch look OK to commit?

Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh
Lower zeroing array assignment to memset for allocatable arrays.

gcc/fortran/ChangeLog:
* trans-expr.cc (gfc_trans_zero_assign): Handle allocatable arrays.

gcc/testsuite/ChangeLog:
* gfortran.dg/array_memset_3.f90: New test.

Signed-off-by: Prathamesh Kulkarni 

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 605434f4ddb..7773a24f9d4 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -11421,18 +11421,23 @@ gfc_trans_zero_assign (gfc_expr * expr)
   type = TREE_TYPE (dest);
   if (POINTER_TYPE_P (type))
 type = TREE_TYPE (type);
-  if (!GFC_ARRAY_TYPE_P (type))
-return NULL_TREE;
-
-  /* Determine the length of the array.  */
-  len = GFC_TYPE_ARRAY_SIZE (type);
-  if (!len || TREE_CODE (len) != INTEGER_CST)
+  if (GFC_ARRAY_TYPE_P (type))
+{
+  /* Determine the length of the array.  */
+  len = GFC_TYPE_ARRAY_SIZE (type);
+  if (!len || TREE_CODE (len) != INTEGER_CST)
+   return NULL_TREE;
+}
+  else if (GFC_DESCRIPTOR_TYPE_P (type))
+{
+  if (POINTER_TYPE_P (TREE_TYPE (dest)))
+   dest = build_fold_indirect_ref_loc (input_location, dest);
+  len = gfc_conv_descriptor_size (dest, GFC_TYPE_ARRAY_RANK (type));
+  dest = gfc_conv_descriptor_data_get (dest);
+}
+  else
 return NULL_TREE;
 
-  tmp = TYPE_SIZE_UNIT (gfc_get_element_type (type));
-  len = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type, len,
-fold_convert (gfc_array_index_type, tmp));
-
   /* If we are zeroing a local array avoid taking its address by emitting
  a = {} instead.  */
   if (!POINTER_TYPE_P (TREE_TYPE (dest)))
@@ -11440,6 +11445,11 @@ gfc_trans_zero_assign (gfc_expr * expr)
   dest, build_constructor (TREE_TYPE (dest),
  NULL));
 
+  /* Multiply len by element size.  */
+  tmp = TYPE_SIZE_UNIT (gfc_get_element_type (type));
+  len = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type,
+len, fold_convert (gfc_array_index_type, tmp));
+
   /* Convert arguments to the correct types.  */
   dest = fold_convert (pvoid_type_node, dest);
   len = fold_convert (size_type_node, len);
diff --git a/gcc/testsuite/gfortran.dg/array_memset_3.f90 
b/gcc/testsuite/gfortran.dg/array_memset_3.f90
new file mode 100644
index 000..b750c8de67d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/array_memset_3.f90
@@ -0,0 +1,31 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-original" }
+
+subroutine test1(n)
+  implicit none
+integer(8) :: n
+real(4), allocatable :: z(:,:,:)
+
+allocate(z(n, 100, 200))
+z = 0
+end subroutine
+
+subroutine test2(n)
+  implicit none
+integer(8) :: n
+integer, allocatable :: z(:,:,:)
+
+allocate(z(n, 100, 200))
+z = 0
+end subroutine
+
+subroutine test3(n)
+  implicit none
+integer(8) :: n
+logical, allocatable :: z(:,:,:)
+
+allocate(z(n, 100, 200))
+z = .false. 
+end subroutine
+
+! { dg-final { scan-tree-dump-times "__builtin_memset" 3 "original" } }


Re: [PATCH v3] Vect: Optimize truncation for .SAT_SUB operands

2024-07-10 Thread Richard Biener
On Tue, Jul 9, 2024 at 6:03 AM  wrote:
>
> From: Pan Li 
>
> To get better vectorized code for .SAT_SUB,  we would like to avoid the
> truncation operation on the assignment.  For example, as below.
>
> unsigned int _1;
> unsigned int _2;
> unsigned short int _4;
> _9 = (unsigned short int).SAT_SUB (_1, _2);
>
> If we can make sure that _1 is in the range of unsigned short int, such
> as via a def similar to:
>
> _1 = (unsigned short int)_4;
>
> Then we can distribute the truncation operation to:
>
> _3 = (unsigned short int) MIN (65535, _2); // aka _3 = .SAT_TRUNC (_2);
> _9 = .SAT_SUB (_4, _3);
>
> Then we can generate better vectorized code and avoid the unnecessary
> narrowing stmt during vectorization with the below stmt(s).
>
> _3 = .SAT_TRUNC(_2); // SI => HI
> _9 = .SAT_SUB (_4, _3);
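A scalar sketch of why the rewrite is safe when _1 fits in the narrow type; .SAT_SUB and .SAT_TRUNC are modeled with plain C++ here, and the helper names are illustrative, not the GCC internal-fn implementations:

```cpp
// Scalar models of the internal functions involved.
#include <cstdint>

// .SAT_SUB on the wide (32-bit) and narrow (16-bit) types.
uint32_t sat_sub_wide (uint32_t a, uint32_t b) { return a >= b ? a - b : 0; }
uint16_t sat_sub_narrow (uint16_t a, uint16_t b) { return a >= b ? a - b : 0; }

// .SAT_TRUNC: saturating truncation SI -> HI.
uint16_t sat_trunc_narrow (uint32_t v)
{
  return v > 0xffff ? 0xffff : (uint16_t) v;
}

// The claimed identity: for x <= 0xffff,
//   (uint16_t) sat_sub_wide (x, y) == sat_sub_narrow ((uint16_t) x,
//                                                     sat_trunc_narrow (y)).
```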
>
> Let's take RISC-V vector as example to tell the changes.  For below
> sample code:
>
> __attribute__((noinline))
> void test (uint16_t *x, unsigned b, unsigned n)
> {
>   unsigned a = 0;
>   uint16_t *p = x;
>
>   do {
> a = *--p;
> *p = (uint16_t)(a >= b ? a - b : 0);
>   } while (--n);
> }
>
> Before this patch:
>   ...
>   .L3:
>   vle16.v   v1,0(a3)
>   vrsub.vx  v5,v2,t1
>   mvt3,a4
>   addw  a4,a4,t5
>   vrgather.vv   v3,v1,v5
>   vsetvli   zero,zero,e32,m1,ta,ma
>   vzext.vf2 v1,v3
>   vssubu.vx v1,v1,a1
>   vsetvli   zero,zero,e16,mf2,ta,ma
>   vncvt.x.x.w   v1,v1
>   vrgather.vv   v3,v1,v5
>   vse16.v   v3,0(a3)
>   sub   a3,a3,t4
>   bgtu  t6,a4,.L3
>   ...
>
> After this patch:
> test:
>   ...
>   .L3:
>   vle16.v v3,0(a3)
>   vrsub.vxv5,v2,a6
>   mv  a7,a4
>   addwa4,a4,t3
>   vrgather.vv v1,v3,v5
>   vssubu.vv   v1,v1,v6
>   vrgather.vv v3,v1,v5
>   vse16.v v3,0(a3)
>   sub a3,a3,t1
>   bgtut4,a4,.L3
>   ...
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * tree-vect-patterns.cc (vect_recog_sat_sub_pattern_transform):
> Add new func impl to perform the truncation distribution.
> (vect_recog_sat_sub_pattern): Perform above optimize before
> generate .SAT_SUB call.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/tree-vect-patterns.cc | 65 +++
>  1 file changed, 65 insertions(+)
>
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 86e893a1c43..4570c25b664 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4566,6 +4566,70 @@ vect_recog_sat_add_pattern (vec_info *vinfo, 
> stmt_vec_info stmt_vinfo,
>return NULL;
>  }
>
> +/*
> + * Try to transform the truncation for .SAT_SUB pattern,  mostly occurs in
> + * the benchmark zip.  Aka:
> + *
> + *   unsigned int _1;
> + *   unsigned int _2;
> + *   unsigned short int _4;
> + *   _9 = (unsigned short int).SAT_SUB (_1, _2);
> + *
> + *   if _1 is known to be in the range of unsigned short int.  For example
> + *   there is a def _1 = (unsigned short int)_4.  Then we can transform the
> + *   truncation to:
> + *
> + *   _3 = (unsigned short int) MIN (65535, _2); // aka _3 = .SAT_TRUNC (_2);
> + *   _9 = .SAT_SUB (_4, _3);
> + *
> + *   Then,  we can better vectorized code and avoid the unnecessary narrowing
> + *   stmt during vectorization with below stmt(s).
> + *
> + *   _3 = .SAT_TRUNC(_2); // SI => HI
> + *   _9 = .SAT_SUB (_4, _3);
> + */
> +static void
> +vect_recog_sat_sub_pattern_transform (vec_info *vinfo,
> + stmt_vec_info stmt_vinfo,
> + tree lhs, tree *ops)
> +{
> +  tree otype = TREE_TYPE (lhs);
> +  tree itype = TREE_TYPE (ops[0]);
> +  unsigned itype_prec = TYPE_PRECISION (itype);
> +  unsigned otype_prec = TYPE_PRECISION (otype);
> +
> +  if (types_compatible_p (otype, itype) || otype_prec >= itype_prec)
> +return;
> +
> +  tree v_otype = get_vectype_for_scalar_type (vinfo, otype);
> +  tree v_itype = get_vectype_for_scalar_type (vinfo, itype);
> +  tree_pair v_pair = tree_pair (v_otype, v_itype);
> +
> +  if (v_otype == NULL_TREE || v_itype == NULL_TREE
> +|| !direct_internal_fn_supported_p (IFN_SAT_TRUNC, v_pair,
> +   OPTIMIZE_FOR_BOTH))
> +return;
> +
> +  /* 1. Find the _4 and update ops[0] as above example.  */
> +  vect_unpromoted_value unprom;
> +  tree tmp = vect_look_through_possible_promotion (vinfo, ops[0], &unprom);
> +
> +  if (tmp == NULL_TREE || TYPE_PRECISION (unprom.type) != otype_prec)
> +return;
> +
> +  ops[0] = tmp;
> +
> +  /* 2. Generate _3 = .SAT_TRUNC (_2) and update ops[1] as above example.  */
> +  tree trunc_lhs_ssa = vect_recog_temp_ssa_var (otype, NULL);
> +  gcall *call = gimple_build_call_internal (IFN_SAT_TRUNC, 1, ops[1]);
> +
> +  gimple_call_set_lhs (call, trunc_lhs_ssa);
> +  gimple_call_set_nothrow (call, /* noth

Re: [match.pd PATCH] PR tree-optimization/114661: Generalize MULT_EXPR recognition.

2024-07-10 Thread Richard Biener
On Wed, Jul 10, 2024 at 12:28 AM Roger Sayle  wrote:
>
>
> This patch resolves PR tree-optimization/114661, by generalizing the set
> of expressions that we canonicalize to multiplication.  This extends the
> optimization(s) contributed (by me) back in July 2021.
> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575999.html
>
> The existing transformation folds (X*C1)^(X<<C2) into X*C3 when
> allowed.  A subtlety is that for non-wrapping integer types, we
> actually fold this into (int)((unsigned)X*C3) so that we don't
> introduce an undefined overflow that wasn't in the original.
> Unfortunately, this transformation confuses itself, as the type-safe
> multiplication isn't recognized when further combining bit operations.
> Fixed here by adding transforms to turn (int)((unsigned)X*C1)^(X<<C2)
> into (int)((unsigned)X*C3) so that match.pd and EVRP can continue
> to construct multiplications.
>
> For the example given in the PR:
>
> unsigned mul(unsigned char c) {
> if (c > 3) __builtin_unreachable();
> return c << 18 | c << 15 |
>c << 12 | c << 9 |
>c << 6 | c << 3 | c;
> }
>
> GCC on x86_64 with -O2 previously generated:
>
> mul:movzbl  %dil, %edi
> leal(%rdi,%rdi,8), %edx
> leal0(,%rdx,8), %eax
> movl%edx, %ecx
> sall$15, %edx
> orl %edi, %eax
> sall$9, %ecx
> orl %ecx, %eax
> orl %edx, %eax
> ret
>
> with this patch we now generate:
>
> mul:movzbl  %dil, %eax
> imull   $299593, %eax, %eax
> ret
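The multiplier works out because 299593 has a bit set at each shift amount, so for inputs small enough that the shifted copies never overlap (c < 8, which the __builtin_unreachable constraint guarantees), OR behaves like addition. A quick sketch with standalone copies of the two forms:

```cpp
// The two forms from the PR (illustrative standalone copies).
unsigned mul_shifts (unsigned char c)
{
  return c << 18 | c << 15 | c << 12 | c << 9 | c << 6 | c << 3 | c;
}

unsigned mul_const (unsigned char c)
{
  // 299593 == (1<<18)|(1<<15)|(1<<12)|(1<<9)|(1<<6)|(1<<3)|1
  return c * 299593u;
}
```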
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?

I'm looking at the difference between the existing

 (simplify
  (op:c (mult:s@0 @1 INTEGER_CST@2)
(lshift:s@3 @1 INTEGER_CST@4))
  (if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type)
   && tree_int_cst_sgn (@4) > 0
   && (tree_nonzero_bits (@0) & tree_nonzero_bits (@3)) == 0)
   (with { wide_int wone = wi::one (TYPE_PRECISION (type));
   wide_int c = wi::add (wi::to_wide (@2),
 wi::lshift (wone, wi::to_wide (@4))); }
(mult @1 { wide_int_to_tree (type, c); }

and

+ (simplify
+  (op:c (convert:s@0 (mult:s@1 (convert @2) INTEGER_CST@3))
+   (lshift:s@4 @2 INTEGER_CST@5))
+  (if (INTEGRAL_TYPE_P (type)
+   && INTEGRAL_TYPE_P (TREE_TYPE (@1))
+   && TREE_TYPE (@2) == type
+   && TYPE_UNSIGNED (TREE_TYPE (@1))
+   && TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (@1))
+   && tree_int_cst_sgn (@5) > 0
+   && (tree_nonzero_bits (@0) & tree_nonzero_bits (@4)) == 0)
+   (with { tree t = TREE_TYPE (@1);
+  wide_int wone = wi::one (TYPE_PRECISION (t));
+  wide_int c = wi::add (wi::to_wide (@3),
+wi::lshift (wone, wi::to_wide (@5))); }
+(convert (mult:t (convert:t @2) { wide_int_to_tree (t, c); })

and wonder whether wrapping of the multiplication is required for correctness,
specifically the former seems to allow signed types with -fwrapv while the
latter won't.  It also looks the patterns could be merged doing

 (simplify
  (op:c (nop_convert:s? (mult:s@0 (nop_convert? @1) INTEGER_CST@2)
(lshift:s@3 @1 INTEGER_CST@4))

and by using nop_convert instead of convert simplify the condition?

Richard.

>
> 2024-07-09  Roger Sayle  
>
> gcc/ChangeLog
> PR tree-optimization/114661
> * match.pd ((X*C1)|(X*C2) to X*(C1+C2)): Additionally recognize
> multiplications surrounded by casts to an unsigned type and back
> such as those generated by these transformations.
>
> gcc/testsuite/ChangeLog
> PR tree-optimization/114661
> * gcc.dg/pr114661.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>


Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call

2024-07-10 Thread Richard Biener
On Wed, Jul 10, 2024 at 11:28 AM  wrote:
>
> From: Pan Li 
>
> The .SAT_ADD has 2 operand and one of the operand may be INTEGER_CST.
> For example _1 = .SAT_ADD (_2, 9) comes from below sample code.
>
> Form 3:
>   #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM)  \
>   T __attribute__((noinline))  \
>   vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
>   {\
> unsigned i;\
> T ret; \
> for (i = 0; i < limit; i++)\
>   {\
> out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \
>   }\
>   }
>
> DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint64_t, 9)
>
> It will fail to vectorize, as vectorizable_call checks that the
> operands are type-compatible, but the imm will be treated as unsigned
> SImode from the perspective of the tree.

I think that's a bug.  Do you say __builtin_add_overflow fails to promote
(constant) arguments?

>  Aka
>
> uint64_t _1;
> uint64_t _2;
>
> _1 = .SAT_ADD (_2, 9);
>
> The _1 and _2 are unsigned DImode, which is different from the imm 9 in
> unsigned SImode, and thus vectorizable_call fails.  This patch would
> like to promote the imm operand to the operand type mode of _2 if and
> only if there is no precision/data loss.  Aka convert the imm 9 to the
> DImode for above example.
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.
>
> gcc/ChangeLog:
>
> * tree-vect-patterns.cc (vect_recog_promote_cst_to_unsigned): Add
> new func impl to promote the imm tree to target type.
> (vect_recog_sat_add_pattern): Peform the type promotion before
> generate .SAT_ADD call.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/tree-vect-patterns.cc | 17 +
>  1 file changed, 17 insertions(+)
>
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 86e893a1c43..e1013222b12 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4527,6 +4527,20 @@ vect_recog_build_binary_gimple_stmt (vec_info *vinfo, 
> stmt_vec_info stmt_info,
>return NULL;
>  }
>
> +static void
> +vect_recog_promote_cst_to_unsigned (tree *op, tree type)
> +{
> +  if (TREE_CODE (*op) != INTEGER_CST || !TYPE_UNSIGNED (type))
> +return;
> +
> +  unsigned precision = TYPE_PRECISION (type);
> +  wide_int type_max = wi::mask (precision, false, precision);
> +  wide_int op_cst_val = wi::to_wide (*op, precision);
> +
> +  if (wi::leu_p (op_cst_val, type_max))
> +*op = wide_int_to_tree (type, op_cst_val);
> +}
> +
>  /*
>   * Try to detect saturation add pattern (SAT_ADD), aka below gimple:
>   *   _7 = _4 + _6;
> @@ -4553,6 +4567,9 @@ vect_recog_sat_add_pattern (vec_info *vinfo, 
> stmt_vec_info stmt_vinfo,
>
>if (gimple_unsigned_integer_sat_add (lhs, ops, NULL))
>  {
> +  vect_recog_promote_cst_to_unsigned (&ops[0], TREE_TYPE (ops[1]));
> +  vect_recog_promote_cst_to_unsigned (&ops[1], TREE_TYPE (ops[0]));
> +
>gimple *stmt = vect_recog_build_binary_gimple_stmt (vinfo, stmt_vinfo,
>   IFN_SAT_ADD, 
> type_out,
>   lhs, ops[0], 
> ops[1]);
> --
> 2.34.1
>


[PATCH] aarch64: Avoid alloca in target attribute parsing

2024-07-10 Thread Richard Sandiford
The handling of the target attribute used alloca to allocate
a copy of unverified user input, which could exhaust the stack
if the input is too long.  This patch converts it to auto_vecs
instead.

I wondered about converting it to use std::string, which we
already use elsewhere, but that would be more invasive and
controversial.

I'll push tomorrow evening UK time if there are no comments
in the meantime.
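Outside GCC, the same idea can be sketched with a standard container (names here are illustrative; GCC itself uses auto_vec<char>, not std::vector):

```cpp
// Heap-backed copy of a NUL-terminated string of untrusted length, so that
// arbitrarily long input cannot exhaust the stack the way alloca can.
#include <cstring>
#include <vector>

std::vector<char> copy_for_parsing (const char *arg_str)
{
  std::size_t len = std::strlen (arg_str);
  std::vector<char> buffer (len + 1);
  std::memcpy (buffer.data (), arg_str, len + 1);  // include terminating NUL
  return buffer;
}
```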

Richard


gcc/
* config/aarch64/aarch64.cc (aarch64_process_one_target_attr)
(aarch64_process_target_attr): Avoid alloca.
---
 gcc/config/aarch64/aarch64.cc | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 7f0cc47d0f0..0d41a193ec1 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -19405,8 +19405,10 @@ aarch64_process_one_target_attr (char *arg_str)
   return false;
 }
 
-  char *str_to_check = (char *) alloca (len + 1);
-  strcpy (str_to_check, arg_str);
+  auto_vec buffer;
+  buffer.safe_grow (len + 1);
+  char *str_to_check = buffer.address ();
+  memcpy (str_to_check, arg_str, len + 1);
 
   /* We have something like __attribute__ ((target ("+fp+nosimd"))).
  It is easier to detect and handle it explicitly here rather than going
@@ -19569,8 +19571,10 @@ aarch64_process_target_attr (tree args)
 }
 
   size_t len = strlen (TREE_STRING_POINTER (args));
-  char *str_to_check = (char *) alloca (len + 1);
-  strcpy (str_to_check, TREE_STRING_POINTER (args));
+  auto_vec buffer;
+  buffer.safe_grow (len + 1);
+  char *str_to_check = buffer.address ();
+  memcpy (str_to_check, TREE_STRING_POINTER (args), len + 1);
 
   if (len == 0)
 {
-- 
2.25.1



RE: [PATCH 2/3] Support group-size of three in SLP load permutation lowering

2024-07-10 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, July 10, 2024 10:04 AM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH 2/3] Support group-size of three in SLP load permutation 
> lowering
> 
> The following adds support for group-size three in SLP load permutation
> lowering to match the non-SLP capabilities.  This is done by using
> the non-interleaving fallback code which then creates at VF == 4 from
> { { a0, b0, c0 }, { a1, b1, c1 }, { a2, b2, c2 }, { a3, b3, c3 } }
> the intermediate vectors { c0, c0, c1, c1 } and { c2, c2, c3, c3 }
> to produce { c0, c1, c2, c3 }.
> 

Just curious, is this only for the 3rd component then? I'm assuming the first
two get handled by
{a0, b0, a1, b1} {a2, b2, a3, b3} still?
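For readers following along, the scheme for the third component can be sketched with plain arrays standing in for vectors (names and layout here are illustrative of the VF == 4 case described above):

```cpp
// Gather the third component of four {a, b, c} groups using the intermediate
// vectors described above, then an even-lane extract across the two inputs.
#include <array>

std::array<int, 4> gather_third (const int y[12])
{
  // Intermediate vectors { c0, c0, c1, c1 } and { c2, c2, c3, c3 }.
  std::array<int, 4> lo = { y[2], y[2], y[5], y[5] };
  std::array<int, 4> hi = { y[8], y[8], y[11], y[11] };
  // Even-lane extract yields { c0, c1, c2, c3 }.
  return { lo[0], lo[2], hi[0], hi[2] };
}
```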

Regards,
Tamar

> This turns out to be more effective than the scheme implemented
> for non-SLP for SSE, only slightly worse for AVX512, and a bit
> worse still for AVX2.  It seems to me that this would extend to
> other non-power-of-two group-sizes though (but the patch does not).
> Optimal schemes are likely difficult to lay out in VF agnostic form.
> 
> I'll note that while the lowering assumes even/odd extract is
> generally available for all vector element sizes (which is probably
> a good assumption), it doesn't in any way constrain the other
> permutes it generates based on target availability.  Again difficult
> to do in a VF agnostic way (but at least currently the vector type
> is fixed).
> 
> I'll also note that the SLP store side merges lanes in a way
> producing three-vector permutes for store group-size of three, so
> the testcase uses a store group-size of four.
> 
>   * tree-vect-slp.cc (vect_lower_load_permutations): Support
>   group-size of three.
> 
>   * gcc.dg/vect/slp-52.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/vect/slp-52.c | 14 
>  gcc/tree-vect-slp.cc   | 35 +-
>  2 files changed, 34 insertions(+), 15 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/slp-52.c
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-52.c 
> b/gcc/testsuite/gcc.dg/vect/slp-52.c
> new file mode 100644
> index 000..ba49f0046e2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/slp-52.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +
> +void foo (int * __restrict x, int *y)
> +{
> +  for (int i = 0; i < 1024; ++i)
> +{
> +  x[4*i+0] = y[3*i+0];
> +  x[4*i+1] = y[3*i+1] * 2;
> +  x[4*i+2] = y[3*i+2] + 3;
> +  x[4*i+3] = y[3*i+2] * 2 - 5;
> +}
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { vect_int && vect_int_mult } } } } */
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 0f830c1ad9c..2dc6d365303 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -3710,7 +3710,8 @@ vect_build_slp_instance (vec_info *vinfo,
>with the least number of lanes to one and then repeat until
>we end up with two inputs.  That scheme makes sure we end
>up with permutes satisfying the restriction of requiring at
> -  most two vector inputs to produce a single vector output.  */
> +  most two vector inputs to produce a single vector output
> +  when the number of lanes is even.  */
> while (SLP_TREE_CHILDREN (perm).length () > 2)
>   {
> /* When we have three equal sized groups left the pairwise
> @@ -4050,11 +4051,10 @@ vect_lower_load_permutations (loop_vec_info
> loop_vinfo,
>  = DR_GROUP_FIRST_ELEMENT (SLP_TREE_SCALAR_STMTS (loads[0])[0]);
> 
>/* Only a power-of-two number of lanes matches interleaving with N levels.
> - The non-SLP path also supports DR_GROUP_SIZE == 3.
>   ???  An even number of lanes could be reduced to 1<   at each step.  */
>unsigned group_lanes = DR_GROUP_SIZE (first);
> -  if (exact_log2 (group_lanes) == -1)
> +  if (exact_log2 (group_lanes) == -1 && group_lanes != 3)
>  return;
> 
>for (slp_tree load : loads)
> @@ -4071,7 +4071,7 @@ vect_lower_load_permutations (loop_vec_info
> loop_vinfo,
>with a non-1:1 load permutation around instead of canonicalizing
>those into a load and a permute node.  Removing this early
>check would do such canonicalization.  */
> -  if (SLP_TREE_LANES (load) >= group_lanes / 2)
> +  if (SLP_TREE_LANES (load) >= (group_lanes + 1) / 2)
>   continue;
> 
>/* First build (and possibly re-use) a load node for the
> @@ -4107,7 +4107,7 @@ vect_lower_load_permutations (loop_vec_info
> loop_vinfo,
>while (1)
>   {
> unsigned group_lanes = SLP_TREE_LANES (l0);
> -   if (SLP_TREE_LANES (load) >= group_lanes / 2)
> +   if (SLP_TREE_LANES (load) >= (group_lanes + 1) / 2)
>   break;
> 
> /* Try to lower by reducing the group to half its size using an
> @@ -4117,19 +4117,24 @@ vect_lower_load_permutations (loop_vec_info
> loop_vinfo,

Re: [PATCH 2/2]AArch64: implement TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE [PR115531].

2024-07-10 Thread Richard Sandiford
Tamar Christina  writes:
> Hi All,
>
> This implements the new target hook indicating that for AArch64 when possible
> we prefer masked operations for any type vs doing LOAD + SELECT or
> SELECT + STORE.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   PR tree-optimization/115531
>   * config/aarch64/aarch64.cc
>   (aarch64_conditional_operation_is_expensive): New.
>   (TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE): New.
>
> gcc/testsuite/ChangeLog:
>
>   PR tree-optimization/115531
>   * gcc.dg/vect/vect-conditional_store_1.c: New test.
>   * gcc.dg/vect/vect-conditional_store_2.c: New test.
>   * gcc.dg/vect/vect-conditional_store_3.c: New test.
>   * gcc.dg/vect/vect-conditional_store_4.c: New test.

OK for the aarch64 part if 1/2 is OK.  The tests look good to me too,
so OK for those unless someone objects.

Thanks,
Richard

>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> 2816124076383c1c458e2cfa21cbbafb0773b05a..dc1bc0958ca6172bc2d4753efe491457ab9bcc74
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -28222,6 +28222,15 @@ aarch64_gen_adjusted_ldpstp (rtx *operands, bool 
> load,
>return true;
>  }
>  
> +/* Implement TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE.  Assume 
> that
> +   predicated operations when available are beneficial.  */
> +
> +static bool
> +aarch64_conditional_operation_is_expensive (unsigned)
> +{
> +  return false;
> +}
> +
>  /* Implement TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE.  Assume for now that
> it isn't worth branching around empty masked ops (including masked
> stores).  */
> @@ -30909,6 +30918,9 @@ aarch64_libgcc_floating_mode_supported_p
>  #define TARGET_VECTORIZE_RELATED_MODE aarch64_vectorize_related_mode
>  #undef TARGET_VECTORIZE_GET_MASK_MODE
>  #define TARGET_VECTORIZE_GET_MASK_MODE aarch64_get_mask_mode
> +#undef TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE
> +#define TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE \
> +  aarch64_conditional_operation_is_expensive
>  #undef TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE
>  #define TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE \
>aarch64_empty_mask_is_expensive
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_1.c 
> b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_1.c
> new file mode 100644
> index 
> ..563ac63bdab01e33b7a3edd9ec1545633ee1b86e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_1.c
> @@ -0,0 +1,24 @@
> +/* { dg-do assemble } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target vect_masked_store } */
> +
> +/* { dg-additional-options "-mavx2" { target avx2 } } */
> +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
> +
> +void foo1 (char *restrict a, int *restrict b, int *restrict c, int n, int 
> stride)
> +{
> +  if (stride <= 1)
> +return;
> +
> +  for (int i = 0; i < n; i++)
> +{
> +  int res = c[i];
> +  int t = b[i+stride];
> +  if (a[i] != 0)
> +res = t;
> +  c[i] = res;
> +}
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "vect" { aarch64-*-* } } 
> } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_2.c 
> b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_2.c
> new file mode 100644
> index 
> ..c45cdc30a6278de7f04b8a04cfc7a508c853279b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_2.c
> @@ -0,0 +1,24 @@
> +/* { dg-do assemble } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target vect_masked_store } */
> +
> +/* { dg-additional-options "-mavx2" { target avx2 } } */
> +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
> +
> +void foo2 (char *restrict a, int *restrict b, int *restrict c, int n, int 
> stride)
> +{
> +  if (stride <= 1)
> +return;
> +
> +  for (int i = 0; i < n; i++)
> +{
> +  int res = c[i];
> +  int t = b[i+stride];
> +  if (a[i] != 0)
> +t = res;
> +  c[i] = t;
> +}
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "vect" { aarch64-*-* } } 
> } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_3.c 
> b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_3.c
> new file mode 100644
> index 
> ..da9e675dbb97add70d47fc8d714a02256fb1387a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_3.c
> @@ -0,0 +1,24 @@
> +/* { dg-do assemble } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target vect_masked_store } */
> +
> +/* { dg-a

[committed] arm: cleanup legacy ARM_PE code

2024-07-10 Thread Richard Earnshaw
The arm 'pe' target was removed back in 2012 when the FPA support was
removed, but in a small number of places some conditional code was
accidentally left behind.  It's no longer needed, so remove it.

gcc/ChangeLog:

* config/arm/arm-protos.h (arm_dllexport_name_p): Remove prototype.
(arm_dllimport_name_p): Likewise.
(arm_pe_unique_section): Likewise.
(arm_pe_encode_section_info): Likewise.
(arm_dllexport_p): Likewise.
(arm_dllimport_p): Likewise.
(arm_mark_dllexport): Likewise.
(arm_mark_dllimport): Likewise.
(arm_change_mode_p): Likewise.
* config/arm/arm.cc (arm_gnu_attributes): Remove attributes for ARM_PE.
(TARGET_ENCODE_SECTION_INFO): Remove setting for ARM_PE.
(is_called_in_ARM_mode): Remove ARM_PE conditional code.
(thumb1_output_interwork): Remove obsolete ARM_PE code.
(arm_encode_section_info): Remove surrounding #ifndef.
---
 gcc/config/arm/arm-protos.h | 12 
 gcc/config/arm/arm.cc   | 32 +---
 2 files changed, 1 insertion(+), 43 deletions(-)

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 34d6be76e94..50cae2b513a 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -266,19 +266,7 @@ extern const char *thumb1_output_casesi (rtx *);
 extern const char *thumb2_output_casesi (rtx *);
 #endif
 
-/* Defined in pe.c.  */
-extern int arm_dllexport_name_p (const char *);
-extern int arm_dllimport_name_p (const char *);
-
-#ifdef TREE_CODE
-extern void arm_pe_unique_section (tree, int);
-extern void arm_pe_encode_section_info (tree, rtx, int);
-extern int arm_dllexport_p (tree);
-extern int arm_dllimport_p (tree);
-extern void arm_mark_dllexport (tree);
-extern void arm_mark_dllimport (tree);
 extern bool arm_change_mode_p (tree);
-#endif
 
 extern tree arm_valid_target_attribute_tree (tree, struct gcc_options *,
 struct gcc_options *);
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 93993d95eb9..92cd168e659 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -208,9 +208,7 @@ static int aapcs_select_return_coproc (const_tree, 
const_tree);
 static void arm_elf_asm_constructor (rtx, int) ATTRIBUTE_UNUSED;
 static void arm_elf_asm_destructor (rtx, int) ATTRIBUTE_UNUSED;
 #endif
-#ifndef ARM_PE
 static void arm_encode_section_info (tree, rtx, int);
-#endif
 
 static void arm_file_end (void);
 static void arm_file_start (void);
@@ -352,21 +350,7 @@ static const attribute_spec arm_gnu_attributes[] =
 NULL },
   { "naked",0, 0, true,  false, false, false,
 arm_handle_fndecl_attribute, NULL },
-#ifdef ARM_PE
-  /* ARM/PE has three new attributes:
- interfacearm - ?
- dllexport - for exporting a function/variable that will live in a dll
- dllimport - for importing a function/variable from a dll
-
- Microsoft allows multiple declspecs in one __declspec, separating
- them with spaces.  We do NOT support this.  Instead, use __declspec
- multiple times.
-  */
-  { "dllimport",0, 0, true,  false, false, false, NULL, NULL },
-  { "dllexport",0, 0, true,  false, false, false, NULL, NULL },
-  { "interfacearm", 0, 0, true,  false, false, false,
-arm_handle_fndecl_attribute, NULL },
-#elif TARGET_DLLIMPORT_DECL_ATTRIBUTES
+#if TARGET_DLLIMPORT_DECL_ATTRIBUTES
   { "dllimport",0, 0, false, false, false, false, handle_dll_attribute,
 NULL },
   { "dllexport",0, 0, false, false, false, false, handle_dll_attribute,
@@ -488,11 +472,7 @@ static const scoped_attribute_specs *const 
arm_attribute_table[] =
 #define TARGET_MEMORY_MOVE_COST arm_memory_move_cost
 
 #undef TARGET_ENCODE_SECTION_INFO
-#ifdef ARM_PE
-#define TARGET_ENCODE_SECTION_INFO  arm_pe_encode_section_info
-#else
 #define TARGET_ENCODE_SECTION_INFO  arm_encode_section_info
-#endif
 
 #undef  TARGET_STRIP_NAME_ENCODING
 #define TARGET_STRIP_NAME_ENCODING arm_strip_name_encoding
@@ -26821,11 +26801,7 @@ is_called_in_ARM_mode (tree func)
   if (TARGET_CALLEE_INTERWORKING && TREE_PUBLIC (func))
 return true;
 
-#ifdef ARM_PE
-  return lookup_attribute ("interfacearm", DECL_ATTRIBUTES (func)) != 
NULL_TREE;
-#else
   return false;
-#endif
 }
 
 /* Given the stack offsets and register mask in OFFSETS, decide how
@@ -28301,10 +28277,6 @@ thumb1_output_interwork (void)
 #define STUB_NAME ".real_start_of"
 
   fprintf (f, "\t.code\t16\n");
-#ifdef ARM_PE
-  if (arm_dllexport_name_p (name))
-name = arm_strip_name_encoding (name);
-#endif
   asm_fprintf (f, "\t.globl %s%U%s\n", STUB_NAME, name);
   fprintf (f, "\t.thumb_func\n");
   asm_fprintf (f, "%s%U%s:\n", STUB_NAME, name);
@@ -28893,7 +28865,6 @@ arm_file_end (void)
 }
 }
 
-#ifndef ARM_PE
 /* Symbols in the text segment can be accessed without indirecting via the
constant pool; it may take an extra binary operation, but this is still
faster than indire

[PING^3][PATCH v2] docs: Update function multiversioning documentation

2024-07-10 Thread Andrew Carlotti


On Mon, Jun 10, 2024 at 05:08:21PM +0100, Andrew Carlotti wrote:
> 
> On Tue, Apr 30, 2024 at 05:10:45PM +0100, Andrew Carlotti wrote:
> > Add target_version attribute to Common Function Attributes and update
> > target and target_clones documentation.  Move shared detail and examples
> > to the Function Multiversioning page.  Add target-specific details to
> > target-specific pages.
> > 
> > ---
> > 
> > Changes since v1:
> > - Various typo fixes.
> > - Reordered content in 'Function multiversioning' section to put 
> > implementation
> >   details at the end (as suggested in review).
> > - Dropped links to outdated wiki page, and a couple of other unhelpful
> >   sentences that the previous version preserved.
> > 
> > I've built and rechecked the info output.  Ok for master?  And is this ok 
> > for
> > the GCC-14 branch too?
> > 
> > gcc/ChangeLog:
> > 
> > * doc/extend.texi (Common Function Attributes): Update target
> > and target_clones documentation, and add target_version.
> > (AArch64 Function Attributes): Add ACLE reference and list
> > supported features.
> > (PowerPC Function Attributes): List supported features.
> > (x86 Function Attributes): Mention function multiversioning.
> > (Function Multiversioning): Update, and move shared detail here.
> > 
> > 
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > index 
> > e290265d68d33f86a7e7ee9882cc0fd6bed00143..fefac70b5fffc350bf23db74a8fc88fa3bb99bd5
> >  100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -4178,17 +4178,16 @@ and @option{-Wanalyzer-tainted-size}.
> >  Multiple target back ends implement the @code{target} attribute
> >  to specify that a function is to
> >  be compiled with different target options than specified on the
> > -command line.  The original target command-line options are ignored.
> > -One or more strings can be provided as arguments.
> > -Each string consists of one or more comma-separated suffixes to
> > -the @code{-m} prefix jointly forming the name of a machine-dependent
> > -option.  @xref{Submodel Options,,Machine-Dependent Options}.
> > -
> > +command line.  One or more strings can be provided as arguments.
> >  The @code{target} attribute can be used for instance to have a function
> >  compiled with a different ISA (instruction set architecture) than the
> > -default.  @samp{#pragma GCC target} can be used to specify target-specific
> > -options for more than one function.  @xref{Function Specific Option 
> > Pragmas},
> > -for details about the pragma.
> > +default.
> > +
> > +The options supported by the @code{target} attribute are specific to each
> > +target; refer to @ref{x86 Function Attributes}, @ref{PowerPC Function
> > +Attributes}, @ref{ARM Function Attributes}, @ref{AArch64 Function 
> > Attributes},
> > +@ref{Nios II Function Attributes}, and @ref{S/390 Function Attributes}
> > +for details.
> >  
> >  For instance, on an x86, you could declare one function with the
> >  @code{target("sse4.1,arch=core2")} attribute and another with
> > @@ -4211,39 +4210,26 @@ multiple options is equivalent to separating the 
> > option suffixes with
> >  a comma (@samp{,}) within a single string.  Spaces are not permitted
> >  within the strings.
> >  
> > -The options supported are specific to each target; refer to @ref{x86
> > -Function Attributes}, @ref{PowerPC Function Attributes},
> > -@ref{ARM Function Attributes}, @ref{AArch64 Function Attributes},
> > -@ref{Nios II Function Attributes}, and @ref{S/390 Function Attributes}
> > -for details.
> > +@samp{#pragma GCC target} can be used to specify target-specific
> > +options for more than one function.  @xref{Function Specific Option 
> > Pragmas},
> > +for details about the pragma.
> > +
> > +On x86, the @code{target} attribute can also be used to create multiple
> > +versions of a function, compiled with different target-specific options.
> > +@xref{Function Multiversioning} for more details.
> >  
> >  @cindex @code{target_clones} function attribute
> >  @item target_clones (@var{options})
> >  The @code{target_clones} attribute is used to specify that a function
> > -be cloned into multiple versions compiled with different target options
> > -than specified on the command line.  The supported options and restrictions
> > -are the same as for @code{target} attribute.
> > -
> > -For instance, on an x86, you could compile a function with
> > -@code{target_clones("sse4.1,avx")}.  GCC creates two function clones,
> > -one compiled with @option{-msse4.1} and another with @option{-mavx}.
> > -
> > -On a PowerPC, you can compile a function with
> > -@code{target_clones("cpu=power9,default")}.  GCC will create two
> > -function clones, one compiled with @option{-mcpu=power9} and another
> > -with the default options.  GCC must be configured to use GLIBC 2.23 or
> > -newer in order to use the @code{target_clones} attribute.
> > -
> > -It also creates a resolver function (see
> > -the @code{ifunc} attribut

testsuite: Remove no_fsanitize_address install directory dependency

2024-07-10 Thread Matthew Malcomson
The current no_fsanitize_address effective target check (implemented in
target-supports.exp rather than in asan.exp) has some problems with the
link path.

Because it is not called from in between asan_init and asan_finish the
link paths of the compiler are not changed to point at the build
directories.

That means that they point at the install directory that the current
build is configured for.  Hence this test passes if the current compiler
has ASAN support *and* if there are ASAN libraries in the directory that
this build is configured to install into.

That is an unnecessary requirement.  On looking through each of the
tests that currently use no_fsanitize_address it seems all are `compile`
tests.  Hence we can change the logical test of the effective target
from "can we link an ASAN executable" to "can we compile for ASAN" and
avoid the need to set up link paths correctly for this test.

N.b. one alternative would be to remove this effective target and try to
move all tests which currently use this into directories which run their
tests between calls to `asan_finish` and `asan_init`.  This seems like
it might ensure a clearer division of "asan tests must be run in X
directories" and avoid problems similar to the one here in the future.
I'm suggesting this change as it appears the easiest to make and I
didn't think the above too bad a risk to take -- especially if the name
of the test is clear.

In doing this I also inverted the meaning of this check.  Rather than
the function indicating whether we *do not* support something and
tests using `dg-skip-if` to avoid running a test if this returns true,
this function indicates whether we *do* support something and tests
use `dg-require-effective-target` to run a test if it returns true.

Testing done by checking that each of the affected testcases changes
from UNSUPPORTED to PASS when run individually without any install
directory available.

gcc/testsuite/ChangeLog:

* g++.dg/warn/uninit-pr93100.C: Convert no_fsanitize_address use
to fsanitize_address_compilation.
* gcc.dg/pr91441.c: Likewise.
* gcc.dg/pr96260.c: Likewise.
* gcc.dg/pr96307.c: Likewise.
* gcc.dg/uninit-pr93100.c: Likewise.
* gnat.dg/asan1.adb: Likewise.
* gcc.target/aarch64/sve/pr97696.c: Likewise.
* lib/target-supports.exp (no_fsanitize_address): Rename to ...
(fsanitize_address_compilation): Make this a compile-only test
and avoid the need to set linker paths and invert semantics.



### Attachment also inlined for ease of reply###


diff --git a/gcc/testsuite/g++.dg/warn/uninit-pr93100.C 
b/gcc/testsuite/g++.dg/warn/uninit-pr93100.C
index 
e08a36d68a91ba9620ab44d5772017d598f50826..2f6056cc3c3e8fd4aa8e88236a31b8f4344b89ca
 100644
--- a/gcc/testsuite/g++.dg/warn/uninit-pr93100.C
+++ b/gcc/testsuite/g++.dg/warn/uninit-pr93100.C
@@ -1,7 +1,7 @@
 /* PR tree-optimization/98508 - Sanitizer disable -Wall and -Wextra
{ dg-do compile }
{ dg-options "-O0 -Wall -fsanitize=address" }
-   { dg-skip-if "no address sanitizer" { no_fsanitize_address } } */
+   { dg-require-effective-target fsanitize_address_compilation } */
 
 struct S
 {
diff --git a/gcc/testsuite/gcc.dg/pr91441.c b/gcc/testsuite/gcc.dg/pr91441.c
index 
4c785f61e597533202f9d3a42ce5a94aa3fd758f..4bd4c295913262463e2029f4a0466719474cc77a
 100644
--- a/gcc/testsuite/gcc.dg/pr91441.c
+++ b/gcc/testsuite/gcc.dg/pr91441.c
@@ -1,7 +1,7 @@
 /* PR target/91441 */
 /* { dg-do compile  } */
 /* { dg-options "--param asan-stack=1 -fsanitize=kernel-address" } */
-/* { dg-skip-if "no address sanitizer" { no_fsanitize_address } } */
+/* { dg-require-effective-target fsanitize_address_compilation } */
 
 int *bar(int *);
 int *f( int a)
diff --git a/gcc/testsuite/gcc.dg/pr96260.c b/gcc/testsuite/gcc.dg/pr96260.c
index 
587afb76116c5759751d5d6a2ceb1b4a392bc38a..a2b326c8e567972494e6e64d33a3e7f6c514f91d
 100644
--- a/gcc/testsuite/gcc.dg/pr96260.c
+++ b/gcc/testsuite/gcc.dg/pr96260.c
@@ -1,7 +1,7 @@
 /* PR target/96260 */
 /* { dg-do compile } */
 /* { dg-options "--param asan-stack=1 -fsanitize=kernel-address 
-fasan-shadow-offset=0x10" } */
-/* { dg-skip-if "no address sanitizer" { no_fsanitize_address } } */
+/* { dg-require-effective-target fsanitize_address_compilation } */
 
 int *bar(int *);
 int *f( int a)
diff --git a/gcc/testsuite/gcc.dg/pr96307.c b/gcc/testsuite/gcc.dg/pr96307.c
index 
89002b85c8ea6829e6b78679eedde653bb16753e..f49a9f642d35ffdfb16ac4f715dd8a25793a4817
 100644
--- a/gcc/testsuite/gcc.dg/pr96307.c
+++ b/gcc/testsuite/gcc.dg/pr96307.c
@@ -1,7 +1,7 @@
 /* PR target/96307 */
 /* { dg-do compile } */
 /* { dg-additional-options "-fsanitize=kernel-address 
--param=asan-instrumentation-with-call-threshold=8" } */
-/* { dg-skip-if "no address sanitizer" { no_fsanitize_address } } */
+/* { dg-require-effective-target fsanitize_address_compilation } */
 
 #include 
 enum a {test1, test2, test3=INT_MAX};
diff -

[PATCH] tree-optimization/115825 - improve unroll estimates for volatile accesses

2024-07-10 Thread Richard Biener
The loop unrolling code assumes that one third of all volatile accesses
can be possibly optimized away which is of course not true.  This leads
to excessive unrolling in some cases.  The following tracks the number
of stmts with side-effects as those are not eliminatable later and
only assumes one third of the other stmts can be further optimized.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

There's quite some testsuite fallout, mostly because of different rounding
and a size of 8 now no longer is optimistically optimized to 5 but only 6.
I can fix that by writing

  *est_eliminated = (unr_insns - not_elim) / 3;

as

  *est_eliminated = unr_insns - not_elim - (unr_insns - not_elim) * 2 / 3;

to preserve the old rounding behavior.  But for example

FAIL: g++.dg/warn/Warray-bounds-20.C  -std=gnu++14 LP64 note (test for 
warnings, line 56)

shows

  size:   3 C::C (_25, &MEM  [(void *)&_ZTT2D1 + 48B]);

which we now consider not being optimizable (correctly I think) and thus
the optimistic size reduction isn't enough to get the loop unrolled.
Previously the computed size of 20 was reduced to 13, exactly the size
of the not unrolled body.
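The difference between the two rounding variants can be checked in isolation (a quick sketch; the insn counts below are hypothetical and the helper names are mine, not the patch's):

```python
def est_eliminated_new(unr_insns, not_elim):
    # Rounding as written in the patch.
    return (unr_insns - not_elim) // 3

def est_eliminated_old(unr_insns, not_elim):
    # Variant preserving the old rounding behavior.
    return unr_insns - not_elim - (unr_insns - not_elim) * 2 // 3

# A body of size 8 with no side-effect stmts:
print(8 - est_eliminated_new(8, 0))  # 6
print(8 - est_eliminated_old(8, 0))  # 5
```

With the patch's rounding a size of 8 is only optimistically reduced to 6, while the old-rounding variant reduces it to 5, matching the testsuite fallout described above.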

So the remaining fallout will be

+FAIL: g++.dg/warn/Warray-bounds-20.C  -std=gnu++14 LP64 note (test for 
warnings
, line 56)
+FAIL: g++.dg/warn/Warray-bounds-20.C  -std=gnu++14 note (test for 
warnings, lin
e 66)
...
+FAIL: c-c++-common/ubsan/unreachable-3.c  -std=gnu++14  scan-tree-dump 
optimized "__builtin___ubsan_handle_builtin_unreachable"
...
+FAIL: c-c++-common/ubsan/unreachable-3.c   -O0   scan-tree-dump optimized 
"__builtin___ubsan_handle_builtin_unreachable"

for the latter the issue is __builtin___sanitizer_cov_trace_pc ()

Does this seem feasible overall?  I can fixup the testcases above
with #pragma unroll ...

Thanks,
Richard.

PR tree-optimization/115825
* tree-ssa-loop-ivcanon.cc (loop_size::not_eliminatable_after_peeling):
New.
(loop_size::last_iteration_not_eliminatable_after_peeling): Likewise.
(tree_estimate_loop_size): Count stmts with side-effects as
not optimistically eliminatable.
(estimated_unrolled_size): Compute the number of stmts that can
be optimistically eliminated by followup transforms.
(try_unroll_loop_completely): Adjust.

* gcc.dg/tree-ssa/cunroll-17.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c | 11 +++
 gcc/tree-ssa-loop-ivcanon.cc   | 35 +-
 2 files changed, 38 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c 
b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c
new file mode 100644
index 000..282db99c883
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -fdump-tree-optimized" } */
+
+char volatile v;
+void for16 (void)
+{
+  for (char i = 16; i > 0; i -= 2)
+v = i;
+}
+
+/* { dg-final { scan-tree-dump-times " ={v} " 1 "optimized" } } */
diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc
index 5ef24a91917..dd941c31648 100644
--- a/gcc/tree-ssa-loop-ivcanon.cc
+++ b/gcc/tree-ssa-loop-ivcanon.cc
@@ -139,11 +139,16 @@ struct loop_size
  variable where induction variable starts at known constant.)  */
   int eliminated_by_peeling;
 
+  /* Number of instructions that cannot be further optimized in the
+ peeled loop, for example volatile accesses.  */
+  int not_eliminatable_after_peeling;
+
   /* Same statistics for last iteration of loop: it is smaller because
  instructions after exit are not executed.  */
   int last_iteration;
   int last_iteration_eliminated_by_peeling;
-  
+  int last_iteration_not_eliminatable_after_peeling;
+
   /* If some IV computation will become constant.  */
   bool constant_iv;
 
@@ -267,8 +272,10 @@ tree_estimate_loop_size (class loop *loop, edge exit, edge 
edge_to_cancel,
 
   size->overall = 0;
   size->eliminated_by_peeling = 0;
+  size->not_eliminatable_after_peeling = 0;
   size->last_iteration = 0;
   size->last_iteration_eliminated_by_peeling = 0;
+  size->last_iteration_not_eliminatable_after_peeling = 0;
   size->num_pure_calls_on_hot_path = 0;
   size->num_non_pure_calls_on_hot_path = 0;
   size->non_call_stmts_on_hot_path = 0;
@@ -292,6 +299,7 @@ tree_estimate_loop_size (class loop *loop, edge exit, edge 
edge_to_cancel,
{
  gimple *stmt = gsi_stmt (gsi);
  int num = estimate_num_insns (stmt, &eni_size_weights);
+ bool not_eliminatable_after_peeling = false;
  bool likely_eliminated = false;
  bool likely_eliminated_last = false;
  bool likely_eliminated_peeled = false;
@@ -304,7 +312,9 @@ tree_estimate_loop_size (class loop *loop, edge exit, edge 
edge_to_cancel,
 
  /* Look for reasons why we might optimize this stmt away. */
 
- if (!gimple_has_side_effects (stmt))
+ 

Re: [PATCH] Add gcc.gnu.org account names to MAINTAINERS

2024-07-10 Thread Jakub Jelinek
On Wed, Jul 10, 2024 at 12:36:15PM +0100, Richard Sandiford wrote:
> The account names in the file were taken from a trawl of the
> gcc-cvs archives, with a very small number of manual edits for
> ambiguities.  There are a handful of names that I couldn't find;
> the new column has "-" for those.

I think we should list the account names (rather than -) only for users
which actually have acco...@gcc.gnu.org bugzilla accounts, otherwise this
doesn't serve its purpose.  E.g. I think Andrew MacLeod has an amacleod account
on sourceware but not in bugzilla, so trying to CC him will not work; one
needs to know that the email address listed in MAINTAINERS should be used
in that case instead.  I think there are a few similar exceptions.

Jakub



Re: testsuite: Remove no_fsanitize_address install directory dependency

2024-07-10 Thread Matthew Malcomson

... Oops, just after sending I noticed that
`check_effective_target_fsanitize_address_compilation` is caching its 
result under the same name as the original 
`check_effective_target_fsanitize_address` in `asan-dg.exp`.


Attaching an updated patch (with updated cover letter) that adjusts the
check to cache its result under a different name, given that it is
testing something different than the original function.
Have again tested that the problematic tests run without an install 
directory.



On 7/10/24 13:19, Matthew Malcomson wrote:


Re: testsuite: Remove no_fsanitize_address install directory dependency

2024-07-10 Thread Rainer Orth
Hi Matthew,

> The current no_fsanitize_address effective target check (implemented in
> target-supports.exp rather than in asan.exp) has some problems with the
> link path.
>
> Because it is not called from in between asan_init and asan_finish the
> link paths of the compiler are not changed to point at the build
> directories.
>
> That means that they point at the install directory that the current
> build is configured for.  Hence this test passes if the current compiler
> has ASAN support *and* if there are ASAN libraries in the directory that
> this build is configured to install into.
>
> That is an unnecessary requirement.  On looking through each of the
> tests that currently use no_fsanitize_address it seems all are `compile`
> tests.  Hence we can change the logical test of the effective target
> from "can we link an ASAN executable" to "can we compile for ASAN" and
> avoid the need to set up link paths correctly for this test.
>
> N.b. one alternative would be to remove this effective target and try to
> move all tests which currently use this into directories which run their
> tests between calls to `asan_finish` and `asan_init`.  This seems like
> it might ensure a clearer division of "asan tests must be run in X
> directories" and avoid problems similar to the one here in the future.
> I'm suggesting this change as it appears the easiest to make and I
> didn't think the above too bad a risk to take -- especially if the name
> of the test is clear.

moving the tests would be clearer IMO, otherwise we have two separate
mechanisms for the same issue.  Especially since we're talking about 7
tests only.  The only complication would be the aarch64 test where
there's currently no asan subdir.

> In doing this I also inverted the meaning of this check.  Rather than
> the function indicating whether we *do not* support something and
> tests using `dg-skip-if` to avoid running a test if this returns true,
> this function indicates whether we *do* support something and tests
> use `dg-require-effective-target` to run a test if it returns true.
>
> Testing done by checking that each of the affected testcases changes
> from UNSUPPORTED to PASS when run individually without any install
> directory available.

Please remember to state on which target you've run the tests.

> gcc/testsuite/ChangeLog:
>
>   * g++.dg/warn/uninit-pr93100.C: Convert no_fsanitize_address use
>   to fsanitize_address_compilation.

*If* we go this route (and as I said I'd prefer not to; maybe Mike
differs), please rename the new keyword to fsanitize_address_compile in
line with e.g. "dg-do compile".  You also would need to adjust
sourcebuild.texi for the change.

> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index 
> d3edc7d839ec2d9460501c157c1129176542d870..d9ea048d01212c4d28e68860a38c523cea8f2393
>  100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -13166,16 +13166,16 @@ proc check_effective_target_movdir { } {
>  } "-mmovdiri -mmovdir64b" ]
>  }
>  
> -# Return 1 if the target does not support address sanitizer, 0 otherwise
> +# Return 1 if the target supports address sanitizer, 0 otherwise

Comment needs adjustment, too.

Rainer

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University


RE: [PATCH 2/3] Support group-size of three in SLP load permutation lowering

2024-07-10 Thread Richard Biener
On Wed, 10 Jul 2024, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Wednesday, July 10, 2024 10:04 AM
> > To: gcc-patches@gcc.gnu.org
> > Subject: [PATCH 2/3] Support group-size of three in SLP load permutation 
> > lowering
> > 
> > The following adds support for group-size three in SLP load permutation
> > lowering to match the non-SLP capabilities.  This is done by using
> > the non-interleaving fallback code which then creates at VF == 4 from
> > { { a0, b0, c0 }, { a1, b1, c1 }, { a2, b2, c2 }, { a3, b3, c3 } }
> > the intermediate vectors { c0, c0, c1, c1 } and { c2, c2, c3, c3 }
> > to produce { c0, c1, c2, c3 }.
> > 
> 
> Just curious, is this only for the 3rd component then? I'm assuming the first
> two get handled by
> {a0, b0, a1, b1} {a2, b2, a3, b3} still?

Yes, the example is only for the third component, the first and second
get handled by a similar two-stage approach so in total you have
6 permutes to reduce the four vectors down to three.  When you have
ld3 that's obviously going to be better to use, see the load/store-lane
followup in [3/3].
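The two-stage scheme for the third component can be sketched with plain lists (the lane labels and permute indices here are illustrative, not taken from the patch):

```python
# Interleaved group-size-3 input at VF == 4, split into three 4-lane vectors.
flat = [f"{c}{i}" for i in range(4) for c in "abc"]
vecs = [flat[0:4], flat[4:8], flat[8:12]]
# vecs == [['a0','b0','c0','a1'], ['b1','c1','a2','b2'], ['c2','a3','b3','c3']]

def permute2(v1, v2, idx):
    """Two-input permute: pick lanes from the concatenation of v1 and v2."""
    src = v1 + v2
    return [src[i] for i in idx]

# Stage 1: duplicate the c lanes out of adjacent input pairs.
t1 = permute2(vecs[0], vecs[1], [2, 2, 5, 5])  # { c0, c0, c1, c1 }
t2 = permute2(vecs[1], vecs[2], [4, 4, 7, 7])  # { c2, c2, c3, c3 }

# Stage 2: an even-lane extract merges the two intermediates.
c = permute2(t1, t2, [0, 2, 4, 6])
print(c)  # ['c0', 'c1', 'c2', 'c3']
```

Each permute reads from at most two input vectors, satisfying the restriction mentioned in the patch; the a and b components need a similar pair of stages each, giving the six permutes in total.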

Richard.

> Regards,
> Tamar
> 
> > This turns out to be more effective than the scheme implemented
> > for non-SLP for SSE and only slightly worse for AVX512 and a bit
> > more worse for AVX2.  It seems to me that this would extend to
> > other non-power-of-two group-sizes though (but the patch does not).
> > Optimal schemes are likely difficult to lay out in VF agnostic form.
> > 
> > I'll note that while the lowering assumes even/odd extract is
> > generally available for all vector element sizes (which is probably
> > a good assumption), it doesn't in any way constrain the other
> > permutes it generates based on target availability.  Again difficult
> > to do in a VF agnostic way (but at least currently the vector type
> > is fixed).
> > 
> > I'll also note that the SLP store side merges lanes in a way
> > producing three-vector permutes for store group-size of three, so
> > the testcase uses a store group-size of four.
> > 
> > * tree-vect-slp.cc (vect_lower_load_permutations): Support
> > group-size of three.
> > 
> > * gcc.dg/vect/slp-52.c: New testcase.
> > ---
> >  gcc/testsuite/gcc.dg/vect/slp-52.c | 14 
> >  gcc/tree-vect-slp.cc   | 35 +-
> >  2 files changed, 34 insertions(+), 15 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/slp-52.c
> > 
> > diff --git a/gcc/testsuite/gcc.dg/vect/slp-52.c 
> > b/gcc/testsuite/gcc.dg/vect/slp-52.c
> > new file mode 100644
> > index 000..ba49f0046e2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/slp-52.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile } */
> > +
> > +void foo (int * __restrict x, int *y)
> > +{
> > +  for (int i = 0; i < 1024; ++i)
> > +{
> > +  x[4*i+0] = y[3*i+0];
> > +  x[4*i+1] = y[3*i+1] * 2;
> > +  x[4*i+2] = y[3*i+2] + 3;
> > +  x[4*i+3] = y[3*i+2] * 2 - 5;
> > +}
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { 
> > target {
> > vect_int && vect_int_mult } } } } */
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index 0f830c1ad9c..2dc6d365303 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -3710,7 +3710,8 @@ vect_build_slp_instance (vec_info *vinfo,
> >  with the least number of lanes to one and then repeat until
> >  we end up with two inputs.  That scheme makes sure we end
> >  up with permutes satisfying the restriction of requiring at
> > -most two vector inputs to produce a single vector output.  */
> > +most two vector inputs to produce a single vector output
> > +when the number of lanes is even.  */
> >   while (SLP_TREE_CHILDREN (perm).length () > 2)
> > {
> >   /* When we have three equal sized groups left the pairwise
> > @@ -4050,11 +4051,10 @@ vect_lower_load_permutations (loop_vec_info
> > loop_vinfo,
> >  = DR_GROUP_FIRST_ELEMENT (SLP_TREE_SCALAR_STMTS (loads[0])[0]);
> > 
> >/* Only a power-of-two number of lanes matches interleaving with N 
> > levels.
> > - The non-SLP path also supports DR_GROUP_SIZE == 3.
> >   ???  An even number of lanes could be reduced to 1< > lanes
> >   at each step.  */
> >unsigned group_lanes = DR_GROUP_SIZE (first);
> > -  if (exact_log2 (group_lanes) == -1)
> > +  if (exact_log2 (group_lanes) == -1 && group_lanes != 3)
> >  return;
> > 
> >for (slp_tree load : loads)
> > @@ -4071,7 +4071,7 @@ vect_lower_load_permutations (loop_vec_info
> > loop_vinfo,
> >  with a non-1:1 load permutation around instead of canonicalizing
> >  those into a load and a permute node.  Removing this early
> >  check would do such canonicalization.  */
> > -  if (SLP_TREE_LANES (load) >= group_lanes / 2)
> > +  if (SLP_TREE_LANES (load) >= (group_lanes + 1) / 2)
> 

[Fortran, Patch, PR82904] Fix [11/12/13/14/15 Regression][Coarray] ICE in make_ssa_name_fn, at tree-ssanames.c:261

2024-07-10 Thread Andre Vehreschild
Hi all,

the attached patch fixes the use of an uninitialized variable as the string
length in the declaration of the char[1:_len] type (the _len!). The type for
SAVE'd deferred-length character arrays is now char*, so the length is no
longer needed in the type declaration. The length is of course still
provided and used later on.

I hope this fixes the ICE in the IPA: inline phase; I was never able to
reproduce it myself. Is that what you had in mind, @Richard?

Regtests ok on x86_64-pc-linux-gnu/Fedora 39. Ok for mainline?

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de
From 48f0a67b4ea241567f660052302f6f021778b232 Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Wed, 10 Jul 2024 14:37:37 +0200
Subject: [PATCH] Fortran: Use char* for deferred length character arrays
 [PR82904]

The IPA: inline pass would randomly ICE during compilation.  This was
caused by a saved deferred-length string: the length variable was not
set, but was used in the array's declaration.  A character pointer is
now used to prevent this.

gcc/fortran/ChangeLog:

	* trans-types.cc (gfc_sym_type): Use type `char*` for saved
	deferred length char arrays.
	* trans.cc (get_array_span): Get `.span` also for `char*` typed
	arrays, i.e. for those that have INTEGER_TYPE instead of
	ARRAY_TYPE.

gcc/testsuite/ChangeLog:

	* gfortran.dg/deferred_character_38.f90: New test.
---
 gcc/fortran/trans-types.cc|  6 --
 gcc/fortran/trans.cc  |  4 +++-
 .../gfortran.dg/deferred_character_38.f90 | 20 +++
 3 files changed, 27 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/deferred_character_38.f90

diff --git a/gcc/fortran/trans-types.cc b/gcc/fortran/trans-types.cc
index 42a7934db9d..c76cdca4eae 100644
--- a/gcc/fortran/trans-types.cc
+++ b/gcc/fortran/trans-types.cc
@@ -2320,8 +2320,10 @@ gfc_sym_type (gfc_symbol * sym, bool is_bind_c)
 	  || ((sym->attr.result || sym->attr.value)
 	  && sym->ns->proc_name
 	  && sym->ns->proc_name->attr.is_bind_c)
-	  || (sym->ts.deferred && (!sym->ts.u.cl
-   || !sym->ts.u.cl->backend_decl))
+	  || (sym->ts.deferred
+	  && (!sym->ts.u.cl
+		  || !sym->ts.u.cl->backend_decl
+		  || sym->attr.save))
 	  || (sym->attr.dummy
 	  && sym->attr.value
 	  && gfc_length_one_character_type_p (&sym->ts
diff --git a/gcc/fortran/trans.cc b/gcc/fortran/trans.cc
index 1067e032621..d4c54093cbc 100644
--- a/gcc/fortran/trans.cc
+++ b/gcc/fortran/trans.cc
@@ -398,7 +398,9 @@ get_array_span (tree type, tree decl)
 return gfc_conv_descriptor_span_get (decl);

   /* Return the span for deferred character length array references.  */
-  if (type && TREE_CODE (type) == ARRAY_TYPE && TYPE_STRING_FLAG (type))
+  if (type
+  && (TREE_CODE (type) == ARRAY_TYPE || TREE_CODE (type) == INTEGER_TYPE)
+  && TYPE_STRING_FLAG (type))
 {
   if (TREE_CODE (decl) == PARM_DECL)
 	decl = build_fold_indirect_ref_loc (input_location, decl);
diff --git a/gcc/testsuite/gfortran.dg/deferred_character_38.f90 b/gcc/testsuite/gfortran.dg/deferred_character_38.f90
new file mode 100644
index 000..d5a6c0e5013
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/deferred_character_38.f90
@@ -0,0 +1,20 @@
+! { dg-do run }
+
+! Check for PR fortran/82904
+! Contributed by G.Steinmetz  
+
+! This test checks that 'IPA pass: inline' passes.
+! The initial version of the testcase contained coarrays, which do not work
+! yet.
+
+program p
+   save
+   character(:), allocatable :: x
+   character(:), allocatable :: y
+   allocate (character(3) :: y)
+   allocate (x, source='abc')
+   y = x
+
+   if (y /= 'abc') stop 1
+end
+
--
2.45.2



Re: [Fortran, Patch, PR82904] Fix [11/12/13/14/15 Regression][Coarray] ICE in make_ssa_name_fn, at tree-ssanames.c:261

2024-07-10 Thread Richard Biener
On Wed, 10 Jul 2024, Andre Vehreschild wrote:

> Hi all,
> 
> the patch attached fixes the use of an uninitialized variable for the string
> length in the declaration of the char[1:_len] type (the _len!). The type for
> save'd deferred length char arrays is now char*, so that there is no need for
> the length in the type declaration anymore. The length is of course still
> provided and needed later on.
> 
> I hope this fixes the ICE in the IPA: inline phase, because I never saw it. Is
> that what you had in mind @Richard?

I think this will fix the issue by side-stepping the use of a 
variable-length typed variable.
 
> Regtests ok on x86_64-pc-linux-gnu/Fedora 39. Ok for mainline?
> 
> Regards,
>   Andre
> --
> Andre Vehreschild * Email: vehre ad gmx dot de
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] fixincludes: add bypass to darwin_objc_runtime_1

2024-07-10 Thread FX Coudert
The header that this fix applies to has been fixed in the macOS 15 beta SDK.
Therefore, we can add a bypass.
Tested on aarch64-apple-darwin24. OK to push?

FX



0001-fixincludes-add-bypass-to-darwin_objc_runtime_1.patch
Description: Binary data


[PATCH 0/1] AArch64: LUTI2/LUTI4 ACLE for SVE2

2024-07-10 Thread vladimir.miloserdov
From: Vladimir Miloserdov 

Hi All,

This patch introduces support for LUTI2/LUTI4 ACLE for SVE2.

LUTI instructions are used for efficient table lookups with 2-bit or 4-bit
indices. LUTI2 reads indexed 8-bit or 16-bit elements from the low 128 bits of
the table vector using packed 2-bit indices, while LUTI4 can read from the low
128 or 256 bits of the table vector or from two table vectors using packed 
4-bit indices. These instructions fill the destination vector by copying 
elements indexed by segments of the source vector, selected by the vector 
segment index.
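
To make the packed-index idea concrete, here is a small scalar model in C. It is my illustration, not part of the patch: it unpacks 2-bit indices from a byte stream and selects among four table entries, and it deliberately does not model the architectural segment selection, element widths, or register layout of the real LUTI2 instruction.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative scalar model of a packed 2-bit-index table lookup.
   Each byte of PACKED holds four 2-bit indices (least significant
   pair first); each index selects one of four table entries.  */
static void
lut2_model (uint8_t *dst, const uint8_t table[4],
	    const uint8_t *packed, size_t n_idx)
{
  for (size_t i = 0; i < n_idx; i++)
    {
      unsigned idx = (packed[i / 4] >> (2 * (i % 4))) & 0x3;
      dst[i] = table[idx];
    }
}
```

For example, with the table {10, 20, 30, 40} and the packed byte 0xE4 (which encodes the indices 0, 1, 2, 3), the model writes 10, 20, 30, 40 to the destination.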

The changes include the addition of a new AArch64 option extension "lut",
__ARM_FEATURE_LUT preprocessor macro, definitions for the new LUTI instruction
shapes, and implementations of the svluti2 and svluti4 builtins.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

This depends on the "Extend aarch64_feature_flags to 128 bits" work, which
will soon be submitted upstream, as we have run out of 64-bit flags.

The patch needs to be committed for me as I don't have commit rights.

Ok for master when the pre-requisites get committed? 

BR,
- Vladimir

gcc/ChangeLog:

* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): 
Add support for __ARM_FEATURE_LUT preprocessor macro.
* config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION): 
Add "lut" option extension.
* config/aarch64/aarch64-sve-builtins-shapes.cc (struct luti_base): 
Define new LUTI ACLE shapes.
(SHAPE): Define shapes for luti2 and luti4.
* config/aarch64/aarch64-sve-builtins-shapes.h: Add declarations 
for luti2 and luti4.
* config/aarch64/aarch64-sve-builtins-sve2.cc (class svluti_lane_impl): 
Implement support for LUTI instructions.
(FUNCTION): Register svluti2 and svluti4 functions.
* config/aarch64/aarch64-sve-builtins-sve2.def (svluti2): 
Define svluti2 function.
(svluti4): Define svluti4 function.
* config/aarch64/aarch64-sve-builtins-sve2.h: Add declarations 
for svluti2 and svluti4.
* config/aarch64/aarch64-sve2.md (@aarch64_sve_luti): 
Define machine description patterns for LUTI.
* config/aarch64/aarch64.h (AARCH64_ISA_LUT): Define macro for LUTI.
(TARGET_LUT): Likewise.
* config/aarch64/iterators.md: Define mode iterators 
for LUTI MD patterns.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Add macro for 
SVE ACLE to enable LUTI tests.
* lib/target-supports.exp: Update to include check for the LUT feature.
* gcc.target/aarch64/sve2/acle/asm/luti2_bf16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_f16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_s16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_s8.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_u16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_u8.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_bf16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_bf16_vg1x2.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_f16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_f16_vg1x2.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_s16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_s16_vg1x2.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_s8.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_u16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_u16_vg1x2.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_u8.c: New test.



[PATCH 1/1] AArch64: Add LUTI ACLE for SVE2

2024-07-10 Thread vladimir.miloserdov

This patch introduces support for LUTI2/LUTI4 ACLE for SVE2.

LUTI instructions are used for efficient table lookups with 2-bit
or 4-bit indices. LUTI2 reads indexed 8-bit or 16-bit elements from
the low 128 bits of the table vector using packed 2-bit indices,
while LUTI4 can read from the low 128 or 256 bits of the table
vector or from two table vectors using packed 4-bit indices.
These instructions fill the destination vector by copying elements
indexed by segments of the source vector, selected by the vector
segment index.

The changes include the addition of a new AArch64 option
extension "lut", __ARM_FEATURE_LUT preprocessor macro, definitions
for the new LUTI instruction shapes, and implementations of the
svluti2 and svluti4 builtins.

New tests are added as well.
---
 gcc/config/aarch64/aarch64-c.cc   |  1 +
 .../aarch64/aarch64-option-extensions.def |  2 +
 .../aarch64/aarch64-sve-builtins-shapes.cc| 41 +
 .../aarch64/aarch64-sve-builtins-shapes.h |  2 +
 .../aarch64/aarch64-sve-builtins-sve2.cc  | 17 +++
 .../aarch64/aarch64-sve-builtins-sve2.def |  4 ++
 .../aarch64/aarch64-sve-builtins-sve2.h   |  2 +
 gcc/config/aarch64/aarch64-sve2.md| 45 +++
 gcc/config/aarch64/aarch64.h  |  5 +++
 gcc/config/aarch64/iterators.md   | 10 +
 .../aarch64/sve/acle/asm/test_sve_acle.h  | 16 ++-
 .../aarch64/sve2/acle/asm/luti2_bf16.c| 35 +++
 .../aarch64/sve2/acle/asm/luti2_f16.c | 35 +++
 .../aarch64/sve2/acle/asm/luti2_s16.c | 35 +++
 .../aarch64/sve2/acle/asm/luti2_s8.c  | 35 +++
 .../aarch64/sve2/acle/asm/luti2_u16.c | 35 +++
 .../aarch64/sve2/acle/asm/luti2_u8.c  | 35 +++
 .../aarch64/sve2/acle/asm/luti4_bf16.c| 35 +++
 .../aarch64/sve2/acle/asm/luti4_bf16_x2.c | 15 +++
 .../aarch64/sve2/acle/asm/luti4_f16.c | 35 +++
 .../aarch64/sve2/acle/asm/luti4_f16_x2.c  | 15 +++
 .../aarch64/sve2/acle/asm/luti4_s16.c | 35 +++
 .../aarch64/sve2/acle/asm/luti4_s16_x2.c  | 15 +++
 .../aarch64/sve2/acle/asm/luti4_s8.c  | 25 +++
 .../aarch64/sve2/acle/asm/luti4_u16.c | 35 +++
 .../aarch64/sve2/acle/asm/luti4_u16_x2.c  | 15 +++
 .../aarch64/sve2/acle/asm/luti4_u8.c  | 25 +++
 gcc/testsuite/lib/target-supports.exp | 12 +
 28 files changed, 616 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_bf16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_f16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_s16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_s8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_u16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_u8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_bf16.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_bf16_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_f16.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_f16_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_s16.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_s16_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_s8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_u16.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_u16_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_u8.c

diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index 6f2111434b3..099d9be8080 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -267,6 +267,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   aarch64_def_or_undef (TARGET_SME_I16I64, "__ARM_FEATURE_SME_I16I64", pfile);
   aarch64_def_or_undef (TARGET_SME_F64F64, "__ARM_FEATURE_SME_F64F64", pfile);
   aarch64_def_or_undef (TARGET_SME2, "__ARM_FEATURE_SME2", pfile);
+  aarch64_def_or_undef (TARGET_LUT, "__ARM_FEATURE_LUT", pfile);
 
   /* Not for ACLE, but required to keep "float.h" correct if we switch
  target between implementations that do or do not support ARMv8.2-A
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 42ec0eec31e..840f52e08ed 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -232,6 +232,8 @@ AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the")
 
 AARCH64_OPT_EXTENSION("gcs", GCS, (), (), (), "gcs")
 
+AARCH64_OPT_EXTENSION("lut", LUT, 

[r15-1936 Regression] FAIL: gcc.target/i386/avx512vl-vpmovuswb-2.c execution test on Linux/x86_64

2024-07-10 Thread haochen.jiang
On Linux/x86_64,

80e446e829d818dc19daa6e671b9626e93ee4949 is the first bad commit
commit 80e446e829d818dc19daa6e671b9626e93ee4949
Author: Pan Li 
Date:   Fri Jul 5 20:36:35 2024 +0800

Match: Support form 2 for the .SAT_TRUNC

caused

FAIL: gcc.target/i386/avx512f-vpmovusqb-2.c execution test
FAIL: gcc.target/i386/avx512vl-vpmovusdb-2.c execution test
FAIL: gcc.target/i386/avx512vl-vpmovusdw-2.c execution test
FAIL: gcc.target/i386/avx512vl-vpmovusqb-2.c execution test
FAIL: gcc.target/i386/avx512vl-vpmovusqd-2.c execution test
FAIL: gcc.target/i386/avx512vl-vpmovusqw-2.c execution test
FAIL: gcc.target/i386/avx512vl-vpmovuswb-2.c execution test

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-1936/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512f-vpmovusqb-2.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512f-vpmovusqb-2.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovusdb-2.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovusdb-2.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovusdb-2.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovusdb-2.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovusdw-2.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovusdw-2.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovusdw-2.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovusdw-2.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovusqb-2.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovusqb-2.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovusqd-2.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovusqd-2.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovusqw-2.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovusqw-2.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovuswb-2.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovuswb-2.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovuswb-2.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vpmovuswb-2.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com.)
(If you met problems with cascadelake related, disabling AVX512F in command 
line might save that.)
(However, please make sure that there is no potential problems with AVX512.)


Re: testsuite: Remove no_fsanitize_address install directory dependency

2024-07-10 Thread Matthew Malcomson

On 7/10/24 13:42, Rainer Orth wrote:

N.b. one alternative would be to remove this effective target and try to
move all tests which currently use this into directories which run their
tests between calls to `asan_finish` and `asan_init`.  This seems like
it might ensure a clearer division of "asan tests must be run in X
directories" and avoid problems similar to the one here in the future.
I'm suggesting this change as it appears the easiest to make and I
didn't think the above too bad a risk to take -- especially if the name
of the test is clear.


moving the tests would be clearer IMO, otherwise we have two separate
mechanisms for the same issue.  Especially since we're talking about 7
tests only.  The only complication would be the aarch64 test where
there's currently no asan subdir.


I'm in the middle of doing this.

Just wanted to mention the other complications (in case they change what
looks like the best option): the compilation options the tests are run
under change (because they are run by a different test runner), and there
is no `asan` subdir for gnat.dg (the Ada test).


I think the change in options is not that bad (it will eventually filter
out the options that are not valid for the test, and will likely run with
slightly fewer variations than before), but it is still worth mentioning.





In doing this I also inverted the meaning of this check.  Rather than
the function indicating whether we *do not* support something and
tests using `dg-skip-if` to avoid running a test if this returns true,
this function indicates whether we *do* support something and tests
use `dg-require-effective-target` to run a test if it returns true.

Testing done by checking that each of the affected testcases changes
from UNSUPPORTED to PASS when run individually without any install
directory available.


Please remember to state on which target you've run the tests.


Good point ;-)
I ran the SVE test on AArch64, the Ada test on x86 (because that's where
I could easily install a host GNAT compiler to build Ada), and all the
others on both of these targets.


Re: Ping^3 [PATCH-1v3] Value Range: Add range op for builtin isinf

2024-07-10 Thread Xi Ruoyao
On Mon, 2024-07-01 at 09:11 +0800, HAO CHEN GUI wrote:
> Hi,
>   Gently ping it.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html

I guess you can add PR114678 into the subject and the ChangeLog, and
also mention the patch in the bugzilla.


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


RE: [PATCH]middle-end: Implement conditonal store vectorizer pattern [PR115531]

2024-07-10 Thread Tamar Christina
Sorry, I missed a review comment asking to change !DR_IS_WRITE into DR_IS_READ.

Updated patch:

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/115531
* tree-vect-patterns.cc (vect_cond_store_pattern_same_ref): New.
(vect_recog_cond_store_pattern): New.
(vect_vect_recog_func_ptrs): Use it.
* target.def (conditional_operation_is_expensive): New.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Document it.
* targhooks.cc (default_conditional_operation_is_expensive): New.
* targhooks.h (default_conditional_operation_is_expensive): New.
* tree-vectorizer.h (may_be_nonaddressable_p): New.
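
For context, here is a sketch (mine, not taken from the patch) of the kind of loop this pattern targets: a store guarded by a condition can be vectorized with a masked store, or, where the target reports masked stores as expensive, rewritten to an unconditional store of a selected value.

```c
#include <stddef.h>

/* A conditional store of the shape the new pattern recognizes.  */
void
cond_store (int *restrict a, const int *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (b[i] > 0)
      a[i] = b[i];
}

/* Equivalent rewrite: an unconditional store of a selected value,
   trading the masked store for an extra select (plus an unconditional
   load of the old value).  */
void
cond_store_select (int *restrict a, const int *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] = b[i] > 0 ? b[i] : a[i];
}
```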

-- inline copy of patch --

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f10d9a59c6673a02823fc05132235af3a1ad7c65..c7535d07f4ddd16d55e0ab9b609a2bf95931a2f4 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6449,6 +6449,13 @@ The default implementation returns a @code{MODE_VECTOR_INT} with the
 same size and number of elements as @var{mode}, if such a mode exists.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE (unsigned @var{ifn})
+This hook returns true if masked operation @var{ifn} (really of
+type @code{internal_fn}) should be considered more expensive to use than
+implementing the same operation without masking.  GCC can then try to use
+unconditional operations instead with extra selects.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE (unsigned @var{ifn})
 This hook returns true if masked internal function @var{ifn} (really of
 type @code{internal_fn}) should be considered expensive when the mask is
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 24596eb2f6b4e9ea3ea3464fda171d99155f4c0f..64cea3b1edaf8ec818c0e8095ab50b00ae0cb857 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4290,6 +4290,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_GET_MASK_MODE
 
+@hook TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE
+
 @hook TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE
 
 @hook TARGET_VECTORIZE_CREATE_COSTS
diff --git a/gcc/target.def b/gcc/target.def
index ce4d1ecd58be0a1c8110c6993556a52a2c69168e..3de1aad4c84d3df0b171a411f97e1ce70b6f63b5 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2033,6 +2033,18 @@ same size and number of elements as @var{mode}, if such a mode exists.",
  (machine_mode mode),
  default_get_mask_mode)
 
+/* Function to say whether a conditional operation is expensive when
+   compared to non-masked operations.  */
+DEFHOOK
+(conditional_operation_is_expensive,
+ "This hook returns true if masked operation @var{ifn} (really of\n\
+type @code{internal_fn}) should be considered more expensive to use than\n\
+implementing the same operation without masking.  GCC can then try to use\n\
+unconditional operations instead with extra selects.",
+ bool,
+ (unsigned ifn),
+ default_conditional_operation_is_expensive)
+
 /* Function to say whether a masked operation is expensive when the
mask is all zeros.  */
 DEFHOOK
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 3cbca0f13a5e5de893630c45a6bbe0616b725e86..2704d6008f14d2aa65671f002af886d3b802effa 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -123,6 +123,7 @@ extern opt_machine_mode default_vectorize_related_mode (machine_mode,
poly_uint64);
 extern opt_machine_mode default_get_mask_mode (machine_mode);
 extern bool default_empty_mask_is_expensive (unsigned);
+extern bool default_conditional_operation_is_expensive (unsigned);
 extern vector_costs *default_vectorize_create_costs (vec_info *, bool);
 
 /* OpenACC hooks.  */
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index b10104c363bf8432082d51c0ecb7e2a6811c2cc2..793932a77c60b0cd4bb670de50b7f7fdf2de2159 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -1608,6 +1608,14 @@ default_get_mask_mode (machine_mode mode)
 
 /* By default consider masked stores to be expensive.  */
 
+bool
+default_conditional_operation_is_expensive (unsigned ifn)
+{
+  return ifn == IFN_MASK_STORE;
+}
+
+/* By default consider masked stores to be expensive.  */
+
 bool
 default_empty_mask_is_expensive (unsigned ifn)
 {
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 86e893a1c4330ae6e8d1a54438c2977da623c4b5..36eec1a46fb653f0a43956425b496ecf58ad10bc 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vector-builder.h"
 #include "vec-perm-indices.h"
 #include "gimple-range.h"
+#include "alias.h"
 
 
 /* TODO:  Note the vectorizer still builds COND_EXPRs with GENERIC compares
@@ -6461,6 +6462,157 @@ vect_recog_gather_scatter_pattern (vec_info *vinfo,
   return pattern_stmt;
 }
 
+/* 

Re: Ping^3 [PATCH-1v3] Value Range: Add range op for builtin isinf

2024-07-10 Thread Xi Ruoyao
On Wed, 2024-07-10 at 21:54 +0800, Xi Ruoyao wrote:
> On Mon, 2024-07-01 at 09:11 +0800, HAO CHEN GUI wrote:
> > Hi,
> >   Gently ping it.
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html
> 
> I guess you can add PR114678 into the subject and the ChangeLog, and
> also mention the patch in the bugzilla.

And, remove xfail in vrp-float-abs-1.c and range-sincos.c (if this patch
works as intended they should no longer fail).

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH 04/10] arm: Fix arm backend-use of (u|s|us)dot_prod patterns.

2024-07-10 Thread Victor Do Nascimento
gcc/ChangeLog:

* config/arm/arm-builtins.cc (enum arm_builtins): Add new
ARM_BUILTIN_* enum values: SDOTV8QI, SDOTV16QI, UDOTV8QI,
UDOTV16QI, USDOTV8QI, USDOTV16QI.
(arm_init_dotprod_builtins): New.
(arm_init_builtins): Add call to `arm_init_dotprod_builtins'.
(arm_general_gimple_fold_builtin): New.
* config/arm/arm-protos.h (arm_general_gimple_fold_builtin):
New prototype.
* config/arm/arm.cc (arm_gimple_fold_builtin): Add call to
`arm_general_gimple_fold_builtin'.
* config/arm/neon.md (dot_prod): Deleted.
(dot_prod): New.
(neon_usdot): Deleted.
(neon_usdot): New.
---
 gcc/config/arm/arm-builtins.cc   | 95 
 gcc/config/arm/arm-protos.h  |  3 +
 gcc/config/arm/arm.cc|  1 +
 gcc/config/arm/arm_neon_builtins.def |  3 -
 gcc/config/arm/neon.md   |  4 +-
 5 files changed, 101 insertions(+), 5 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index c9d50bf8fbb..b23b6caa063 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -45,6 +45,8 @@
 #include "arm-builtins.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "basic-block.h"
+#include "gimple.h"
 
 #define SIMD_MAX_BUILTIN_ARGS 7
 
@@ -1298,6 +1300,13 @@ enum arm_builtins
 #define VAR1(T, N, X) \
   ARM_BUILTIN_##N,
 
+  ARM_BUILTIN_NEON_SDOTV8QI,
+  ARM_BUILTIN_NEON_SDOTV16QI,
+  ARM_BUILTIN_NEON_UDOTV8QI,
+  ARM_BUILTIN_NEON_UDOTV16QI,
+  ARM_BUILTIN_NEON_USDOTV8QI,
+  ARM_BUILTIN_NEON_USDOTV16QI,
+
   ARM_BUILTIN_ACLE_BASE,
   ARM_BUILTIN_SAT_IMM_CHECK = ARM_BUILTIN_ACLE_BASE,
 
@@ -2648,6 +2657,60 @@ arm_init_fp16_builtins (void)
   "__fp16");
 }
 
+static void
+arm_init_dotprod_builtins (void)
+{
+  tree fndecl = NULL;
+  tree ftype = NULL;
+
+  tree uv8qi = arm_simd_builtin_type (V8QImode, qualifier_unsigned);
+  tree sv8qi = arm_simd_builtin_type (V8QImode, qualifier_none);
+  tree uv16qi = arm_simd_builtin_type (V16QImode, qualifier_unsigned);
+  tree sv16qi = arm_simd_builtin_type (V16QImode, qualifier_none);
+  tree uv2si = arm_simd_builtin_type (V2SImode, qualifier_unsigned);
+  tree sv2si = arm_simd_builtin_type (V2SImode, qualifier_none);
+  tree uv4si = arm_simd_builtin_type (V4SImode, qualifier_unsigned);
+  tree sv4si = arm_simd_builtin_type (V4SImode, qualifier_none);
+
+  struct builtin_decls_data
+  {
+tree out_type_node;
+tree in_type1_node;
+tree in_type2_node;
+const char *builtin_name;
+int function_code;
+  };
+
+#define NAME(A) "__builtin_neon_" #A
+#define ENUM(B) ARM_BUILTIN_NEON_##B
+
+  builtin_decls_data bdda[] =
+  {
+{ sv2si, sv8qi,  sv8qi,  NAME (sdotv8qi),  ENUM (SDOTV8QI)   },
+{ uv2si, uv8qi,  uv8qi,  NAME (udotv8qi_),  ENUM (UDOTV8QI)   },
+{ sv2si, uv8qi,  sv8qi,  NAME (usdotv8qi_ssus), ENUM (USDOTV8QI)  },
+{ sv4si, sv16qi, sv16qi, NAME (sdotv16qi), ENUM (SDOTV16QI)  },
+{ uv4si, uv16qi, uv16qi, NAME (udotv16qi_),  ENUM (UDOTV16QI)  },
+{ sv4si, uv16qi, sv16qi, NAME (usdotv16qi_ssus), ENUM (USDOTV16QI) },
+  };
+
+#undef NAME
+#undef ENUM
+
+  builtin_decls_data *bdd = bdda;
+  builtin_decls_data *bdd_end = bdd + (ARRAY_SIZE (bdda));
+
+  for (; bdd < bdd_end; bdd++)
+  {
+ftype = build_function_type_list (bdd->out_type_node, bdd->out_type_node,
+ bdd->in_type1_node, bdd->in_type2_node,
+ NULL_TREE);
+fndecl = arm_general_add_builtin_function (bdd->builtin_name,
+  ftype, bdd->function_code);
+arm_builtin_decls[bdd->function_code] = fndecl;
+  }
+}
+
 void
 arm_init_builtins (void)
 {
@@ -2676,6 +2739,7 @@ arm_init_builtins (void)
arm_init_neon_builtins ();
   arm_init_vfp_builtins ();
   arm_init_crypto_builtins ();
+  arm_init_dotprod_builtins ();
 }
 
   if (TARGET_CDE)
@@ -2738,6 +2802,37 @@ arm_builtin_decl (unsigned code, bool initialize_p 
ATTRIBUTE_UNUSED)
 }
 }
 
+/* Try to fold STMT, given that it's a call to the built-in function with
+   subcode FCODE.  Return the new statement on success and null on
+   failure.  */
+gimple *
+arm_general_gimple_fold_builtin (unsigned int fcode, gcall *stmt,
+gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED)
+{
+  gimple *new_stmt = NULL;
+  unsigned nargs = gimple_call_num_args (stmt);
+  tree *args = (nargs > 0
+   ? gimple_call_arg_ptr (stmt, 0)
+   : &error_mark_node);
+
+  switch (fcode)
+{
+case ARM_BUILTIN_NEON_SDOTV8QI:
+case ARM_BUILTIN_NEON_SDOTV16QI:
+case ARM_BUILTIN_NEON_UDOTV8QI:
+case ARM_BUILTIN_NEON_UDOTV16QI:
+case ARM_BUILTIN_NEON_USDOTV8QI:
+case ARM_BUILTIN_NEON_USDOTV16QI:
+  new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
+   

[PATCH 01/10] optabs: Make all `*dot_prod_optab's modeled as conversions

2024-07-10 Thread Victor Do Nascimento
Given that the specification in the GCC internals manual defines the
{u|s}dot_prod standard name as taking "two signed elements of the
same mode, adding them to a third operand of wider mode", there is
currently ambiguity in the relationship between the mode of the first
two arguments and that of the third.

This vagueness means that, in theory, different modes may be
supportable in the third argument.  This flexibility would allow for a
given backend to add to the accumulator a different number of
vectorized products, e.g. A backend may provide instructions for both:

  accum += a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

and

  accum += a[0] * b[0] + a[1] * b[1],

as is now seen in the SVE2.1 extension to AArch64.  In spite of the
aforementioned flexibility, modeling the dot-product operation as a
direct optab means that we have no way to encode both input and the
accumulator data modes into the backend pattern name, which prevents
us from harnessing this flexibility.

We therefore make all dot_prod optabs conversions, allowing, for
example, for the encoding of both 2-way and 4-way dot product backend
patterns.
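
As a concrete illustration (mine, not part of the patch): both loops below compute a widening dot product into a 32-bit accumulator, but the first reduces four 8-bit products per 32-bit lane (a 4-way dot product) while the second reduces two 16-bit products (2-way). With a direct optab only the element mode can appear in the pattern name; as a conversion optab, both the accumulator and element modes are encoded, so a backend can provide patterns for both forms.

```c
#include <stdint.h>
#include <stddef.h>

/* 4-way: four 8-bit products per 32-bit accumulator lane.  */
int32_t
dot4 (const int8_t *a, const int8_t *b, size_t n)
{
  int32_t acc = 0;
  for (size_t i = 0; i < n; i++)
    acc += (int32_t) a[i] * (int32_t) b[i];
  return acc;
}

/* 2-way: two 16-bit products per 32-bit accumulator lane.  */
int32_t
dot2 (const int16_t *a, const int16_t *b, size_t n)
{
  int32_t acc = 0;
  for (size_t i = 0; i < n; i++)
    acc += (int32_t) a[i] * (int32_t) b[i];
  return acc;
}
```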

gcc/ChangeLog:

* optabs.def (sdot_prod_optab): Convert from OPTAB_D to
OPTAB_CD.
(udot_prod_optab): Likewise.
(usdot_prod_optab): Likewise.
* doc/md.texi (Standard Names): update entries for u,s and us
dot_prod names.
---
 gcc/doc/md.texi | 18 +-
 gcc/optabs.def  |  6 +++---
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 7f4335e0aac..2a74e473f05 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5748,15 +5748,15 @@ for (i = 0; i < LEN + BIAS; i++)
 operand0 += operand2[i];
 @end smallexample
 
-@cindex @code{sdot_prod@var{m}} instruction pattern
-@item @samp{sdot_prod@var{m}}
+@cindex @code{sdot_prod@var{m}@var{n}} instruction pattern
+@item @samp{sdot_prod@var{m}@var{n}}
 
 Compute the sum of the products of two signed elements.
 Operand 1 and operand 2 are of the same mode. Their
 product, which is of a wider mode, is computed and added to operand 3.
 Operand 3 is of a mode equal or wider than the mode of the product. The
 result is placed in operand 0, which is of the same mode as operand 3.
-@var{m} is the mode of operand 1 and operand 2.
+@var{m} is the mode of operands 0 and 3 and @var{n} the mode of operands 1 and 2.
 
 Semantically the expressions perform the multiplication in the following signs
 
@@ -5766,15 +5766,15 @@ sdot ==
 @dots{}
 @end smallexample
 
-@cindex @code{udot_prod@var{m}} instruction pattern
-@item @samp{udot_prod@var{m}}
+@cindex @code{udot_prod@var{m}@var{n}} instruction pattern
+@item @samp{udot_prod@var{m}@var{n}}
 
 Compute the sum of the products of two unsigned elements.
 Operand 1 and operand 2 are of the same mode. Their
 product, which is of a wider mode, is computed and added to operand 3.
 Operand 3 is of a mode equal or wider than the mode of the product. The
 result is placed in operand 0, which is of the same mode as operand 3.
-@var{m} is the mode of operand 1 and operand 2.
+@var{m} is the mode of operands 0 and 3 and @var{n} the mode of operands 1 and 2.
 
 Semantically the expressions perform the multiplication in the following signs
 
@@ -5784,14 +5784,14 @@ udot ==
 @dots{}
 @end smallexample
 
-@cindex @code{usdot_prod@var{m}} instruction pattern
-@item @samp{usdot_prod@var{m}}
+@cindex @code{usdot_prod@var{m}@var{n}} instruction pattern
+@item @samp{usdot_prod@var{m}@var{n}}
 Compute the sum of the products of elements of different signs.
 Operand 1 must be unsigned and operand 2 signed. Their
 product, which is of a wider mode, is computed and added to operand 3.
 Operand 3 is of a mode equal or wider than the mode of the product. The
 result is placed in operand 0, which is of the same mode as operand 3.
-@var{m} is the mode of operand 1 and operand 2.
+@var{m} is the mode of operands 0 and 3 and @var{n} the mode of operands 1 and 2.
 
 Semantically the expressions perform the multiplication in the following signs
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 45e117a7f50..fce4b2d5b08 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -106,6 +106,9 @@ OPTAB_CD(mask_scatter_store_optab, "mask_scatter_store$a$b")
 OPTAB_CD(mask_len_scatter_store_optab, "mask_len_scatter_store$a$b")
 OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
 OPTAB_CD(vec_init_optab, "vec_init$a$b")
+OPTAB_CD (sdot_prod_optab, "sdot_prod$I$a$b")
+OPTAB_CD (udot_prod_optab, "udot_prod$I$a$b")
+OPTAB_CD (usdot_prod_optab, "usdot_prod$I$a$b")
 
 OPTAB_CD (while_ult_optab, "while_ult$a$b")
 
@@ -409,10 +412,7 @@ OPTAB_D (savg_floor_optab, "avg$a3_floor")
 OPTAB_D (uavg_floor_optab, "uavg$a3_floor")
 OPTAB_D (savg_ceil_optab, "avg$a3_ceil")
 OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
-OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
-OPTAB_D (udot_prod_optab, "udot_prod$I$a")
-OPTAB_D (usdot_prod

[PATCH 10/10] autovectorizer: Test autovectorization of different dot-prod modes.

2024-07-10 Thread Victor Do Nascimento
From: Victor Do Nascimento 

Given the novel treatment of the dot product optab as a conversion, we
are now able to target, for a given architecture, different
relationships between output modes and input modes.

This is made clearer by way of example. Previously, on AArch64, the
following loop was vectorizable:

uint32_t udot4(int n, uint8_t* data) {
  uint32_t sum = 0;
  for (int i=0; i<n; i++)
+
+uint32_t udot4(int n, uint8_t* data) {
+  uint32_t sum = 0;
+  for (int i=0; i<n; i++)

[PATCH 07/10] mips: Adjust dot-product backend patterns

2024-07-10 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a
conversion-type optab, ensure all back-end patterns incorporate the
second machine mode into pattern names.

gcc/ChangeLog:

* config/mips/loongson-mmi.md (sdot_prodv4hi): Deleted.
(sdot_prodv2siv4hi): New.
---
 gcc/config/mips/loongson-mmi.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/mips/loongson-mmi.md b/gcc/config/mips/loongson-mmi.md
index dd166bfa4c9..4d958730139 100644
--- a/gcc/config/mips/loongson-mmi.md
+++ b/gcc/config/mips/loongson-mmi.md
@@ -394,7 +394,7 @@ (define_insn "loongson_pmaddhw"
   "pmaddhw\t%0,%1,%2"
   [(set_attr "type" "fmul")])
 
-(define_expand "sdot_prodv4hi"
+(define_expand "sdot_prodv2siv4hi"
   [(match_operand:V2SI 0 "register_operand" "")
(match_operand:V4HI 1 "register_operand" "")
(match_operand:V4HI 2 "register_operand" "")
-- 
2.34.1



[PATCH 09/10] c6x: Adjust dot-product backend patterns

2024-07-10 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a
conversion-type optab, ensure all back-end patterns incorporate the
second machine mode into pattern names.

gcc/ChangeLog:

* config/c6x/c6x.md (sdot_prodv2hi): Deleted.
(sdot_prodsiv2hi): New.
---
 gcc/config/c6x/c6x.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/c6x/c6x.md b/gcc/config/c6x/c6x.md
index 5964dd69d0d..ea9ffe8b4e1 100644
--- a/gcc/config/c6x/c6x.md
+++ b/gcc/config/c6x/c6x.md
@@ -3082,7 +3082,7 @@ (define_insn "v2hi3"
 ;; Widening vector multiply and dot product.
 ;; See c6x-mult.md.in for the define_insn patterns
 
-(define_expand "sdot_prodv2hi"
+(define_expand "sdot_prodsiv2hi"
   [(match_operand:SI 0 "register_operand" "")
(match_operand:V2HI 1 "register_operand" "")
(match_operand:V2HI 2 "register_operand" "")
-- 
2.34.1



[PATCH 08/10] altivec: Adjust dot-product backend patterns

2024-07-10 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a
conversion-type optab, ensure all back-end patterns incorporate the
second machine mode into pattern names.

gcc/ChangeLog:

* config/rs6000/altivec.md (udot_prod): Deleted.
(udot_prodv4si): New.
(sdot_prodv8hi): Deleted.
(sdot_prodv4siv8hi): New.
---
 gcc/config/rs6000/altivec.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 5af9bf920a2..0682c8eb184 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -3699,7 +3699,7 @@ (define_expand "neg2"
 }
 })
 
-(define_expand "udot_prod"
+(define_expand "udot_prodv4si"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
 (plus:V4SI (match_operand:V4SI 3 "register_operand" "v")
(unspec:V4SI [(match_operand:VIshort 1 "register_operand" "v")
@@ -3711,7 +3711,7 @@ (define_expand "udot_prod"
   DONE;
 })
 
-(define_expand "sdot_prodv8hi"
+(define_expand "sdot_prodv4siv8hi"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
 (plus:V4SI (match_operand:V4SI 3 "register_operand" "v")
(unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
-- 
2.34.1



[PATCH 02/10] autovectorizer: Add basic support for convert optabs

2024-07-10 Thread Victor Do Nascimento
Given the shift from modeling dot products as direct optabs to
treating them as conversion optabs, we make the necessary changes to
the autovectorizer code to ensure that, given the relevant tree code
together with the input and output data modes, we can retrieve the
relevant optab and subsequently the insn_code for it.

gcc/ChangeLog:

* gimple-match-exports.cc (directly_supported_p): Add overload
for conversion-type optabs.
* gimple-match.h (directly_supported_p): Add new function
prototype.
* optabs.cc (expand_widen_pattern_expr): Make the
DOT_PROD_EXPR tree code use `find_widening_optab_handler' to
retrieve icode.
* tree-vect-loop.cc (vect_is_emulated_mixed_dot_prod): Make it
call conversion-type overloaded `directly_supported_p'.
* tree-vect-patterns.cc (vect_supportable_conv_optab_p): New.
(vect_recog_dot_prod_pattern): s/direct/conv/ in call to
`vect_supportable_direct_optab_p'.
---
 gcc/gimple-match-exports.cc | 18 
 gcc/gimple-match.h  |  2 ++
 gcc/optabs.cc   |  3 ++-
 gcc/tree-vect-loop.cc   |  1 +
 gcc/tree-vect-patterns.cc   | 43 +++--
 5 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
index aacf3ff0414..c079fa1fb19 100644
--- a/gcc/gimple-match-exports.cc
+++ b/gcc/gimple-match-exports.cc
@@ -1381,6 +1381,24 @@ directly_supported_p (code_helper code, tree type, optab_subtype query_type)
  && direct_internal_fn_supported_p (ifn, type, OPTIMIZE_FOR_SPEED));
 }
 
+/* As above, overloading the function for conversion-type optabs.  */
+bool
+directly_supported_p (code_helper code, tree type_out, tree type_in,
+ optab_subtype query_type)
+{
+
+  if (code.is_tree_code ())
+{
+  convert_optab optab = optab_for_tree_code (tree_code (code), type_in,
+   query_type);
+  return (optab != unknown_optab
+ && convert_optab_handler (optab, TYPE_MODE (type_out),
+   TYPE_MODE (type_in)) != CODE_FOR_nothing);
+}
+  gcc_unreachable ();
+}
+
+
 /* A wrapper around the internal-fn.cc versions of get_conditional_internal_fn
for a code_helper CODE operating on type TYPE.  */
 
diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
index d710fcbace2..0333a5db00a 100644
--- a/gcc/gimple-match.h
+++ b/gcc/gimple-match.h
@@ -419,6 +419,8 @@ code_helper canonicalize_code (code_helper, tree);
 
 #ifdef GCC_OPTABS_TREE_H
 bool directly_supported_p (code_helper, tree, optab_subtype = optab_default);
+bool directly_supported_p (code_helper, tree, tree,
+  optab_subtype = optab_default);
 #endif
 
 internal_fn get_conditional_internal_fn (code_helper, tree);
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index 185c5b1a705..32737fb80e8 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -317,7 +317,8 @@ expand_widen_pattern_expr (const_sepops ops, rtx op0, rtx op1, rtx wide_op,
 widen_pattern_optab
   = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
   if (ops->code == WIDEN_MULT_PLUS_EXPR
-  || ops->code == WIDEN_MULT_MINUS_EXPR)
+  || ops->code == WIDEN_MULT_MINUS_EXPR
+  || ops->code == DOT_PROD_EXPR)
 icode = find_widening_optab_handler (widen_pattern_optab,
 TYPE_MODE (TREE_TYPE (ops->op2)),
 tmode0);
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index a64b5082bd1..7e4c1e0f52e 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5289,6 +5289,7 @@ vect_is_emulated_mixed_dot_prod (stmt_vec_info stmt_info)
 
   gcc_assert (STMT_VINFO_REDUC_VECTYPE_IN (stmt_info));
   return !directly_supported_p (DOT_PROD_EXPR,
+   STMT_VINFO_VECTYPE (stmt_info),
STMT_VINFO_REDUC_VECTYPE_IN (stmt_info),
optab_vector_mixed_sign);
 }
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 86e893a1c43..c4dd627aa90 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -248,6 +248,45 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
   return true;
 }
 
+/* Return true if the target supports a vector version of CODE,
+   where CODE is known to map to a conversion optab with the given SUBTYPE.
+   ITYPE specifies the type of (some of) the scalar inputs and OTYPE
+   specifies the type of the scalar result.
+
+   When returning true, set *VECOTYPE_OUT to the vector version of OTYPE.
+   Also set *VECITYPE_OUT to the vector version of ITYPE if VECITYPE_OUT
+   is nonnull.  */
+
+static bool
+vect_supportable_conv_optab_p (vec_info *vinfo, tree otype, tree_code code,
+tree itype, tree *vecotype_out,
+   

[PATCH 06/10] arc: Adjust dot-product backend patterns

2024-07-10 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a
conversion-type optab, ensure all back-end patterns incorporate the
second machine mode into pattern names.

gcc/ChangeLog:

* config/arc/simdext.md (sdot_prodv2hi): Deleted.
(sdot_prodsiv2hi): New.
(udot_prodv2hi): Deleted.
(udot_prodsiv2hi): New.
(sdot_prodv4hi): Deleted.
(sdot_prodv2siv4hi): New.
(udot_prodv4hi): Deleted.
(udot_prodv2siv4hi): New.
---
 gcc/config/arc/simdext.md | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/arc/simdext.md b/gcc/config/arc/simdext.md
index 4e51a237c3a..0696f0abb70 100644
--- a/gcc/config/arc/simdext.md
+++ b/gcc/config/arc/simdext.md
@@ -1643,7 +1643,7 @@ (define_insn "dmpyh"
 
 ;; We can use dmac as well here.  To be investigated which version
 ;; brings more.
-(define_expand "sdot_prodv2hi"
+(define_expand "sdot_prodsiv2hi"
   [(match_operand:SI 0 "register_operand" "")
(match_operand:V2HI 1 "register_operand" "")
(match_operand:V2HI 2 "register_operand" "")
@@ -1656,7 +1656,7 @@ (define_expand "sdot_prodv2hi"
  DONE;
 })
 
-(define_expand "udot_prodv2hi"
+(define_expand "udot_prodsiv2hi"
   [(match_operand:SI 0 "register_operand" "")
(match_operand:V2HI 1 "register_operand" "")
(match_operand:V2HI 2 "register_operand" "")
@@ -1669,7 +1669,7 @@ (define_expand "udot_prodv2hi"
  DONE;
 })
 
-(define_expand "sdot_prodv4hi"
+(define_expand "sdot_prodv2siv4hi"
   [(match_operand:V2SI 0 "register_operand" "")
(match_operand:V4HI 1 "register_operand" "")
(match_operand:V4HI 2 "register_operand" "")
@@ -1688,7 +1688,7 @@ (define_expand "sdot_prodv4hi"
  DONE;
 })
 
-(define_expand "udot_prodv4hi"
+(define_expand "udot_prodv2siv4hi"
   [(match_operand:V2SI 0 "register_operand" "")
(match_operand:V4HI 1 "register_operand" "")
(match_operand:V4HI 2 "register_operand" "")
-- 
2.34.1



[PATCH 05/10] i386: Fix dot_prod backend patterns for mmx and sse targets

2024-07-10 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a
conversion-type optab, ensure all back-end patterns incorporate the
second machine mode into pattern names.

gcc/ChangeLog:

* config/i386/mmx.md (usdot_prodv8qi): Deleted.
(usdot_prodv2siv8qi): New.
(sdot_prodv8qi): Deleted.
(sdot_prodv2siv8qi): New.
(udot_prodv8qi): Deleted.
(udot_prodv2siv8qi): New.
(usdot_prodv4hi): Deleted.
(usdot_prodv2siv4hi): New.
(udot_prodv4hi): Deleted.
(udot_prodv2siv4hi): New.
(sdot_prodv4hi): Deleted.
(sdot_prodv2siv4hi): New.
* config/i386/sse.md (fourwayacc): New.
(twowayacc): New.
(sdot_prod): Deleted.
(sdot_prod): New.
(sdot_prodv4si): Deleted.
(sdot_prodv2div4si): New.
(usdot_prod): Deleted.
(usdot_prod): New.
(sdot_prod): Deleted.
(sdot_prod): New.
(sdot_prodv64qi): Deleted.
(sdot_prodv16siv64qi): New.
(udot_prod): Deleted.
(udot_prod): New.
(udot_prodv64qi): Deleted.
(udot_prodv16qiv64qi): New.
(usdot_prod): Deleted.
(usdot_prod): New.
(udot_prod): Deleted.
(udot_prod): New.
---
 gcc/config/i386/mmx.md | 30 +--
 gcc/config/i386/sse.md | 47 +-
 2 files changed, 43 insertions(+), 34 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 94d3a6e5692..d78739b033d 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -6344,7 +6344,7 @@ (define_expand "usadv8qi"
   DONE;
 })
 
-(define_expand "usdot_prodv8qi"
+(define_expand "usdot_prodv2siv8qi"
   [(match_operand:V2SI 0 "register_operand")
(match_operand:V8QI 1 "register_operand")
(match_operand:V8QI 2 "register_operand")
@@ -6363,7 +6363,7 @@ (define_expand "usdot_prodv8qi"
   rtx op3 = lowpart_subreg (V4SImode, operands[3], V2SImode);
   rtx op0 = gen_reg_rtx (V4SImode);
 
-  emit_insn (gen_usdot_prodv16qi (op0, op1, op2, op3));
+  emit_insn (gen_usdot_prodv4siv16qi (op0, op1, op2, op3));
   emit_move_insn (operands[0], lowpart_subreg (V2SImode, op0, V4SImode));
  }
else
@@ -6377,7 +6377,7 @@ (define_expand "usdot_prodv8qi"
   emit_move_insn (op3, CONST0_RTX (V4SImode));
   emit_insn (gen_zero_extendv8qiv8hi2 (op1, operands[1]));
   emit_insn (gen_extendv8qiv8hi2 (op2, operands[2]));
-  emit_insn (gen_sdot_prodv8hi (op0, op1, op2, op3));
+  emit_insn (gen_sdot_prodv4siv8hi (op0, op1, op2, op3));
 
   /* vec_perm (op0, 2, 3, 0, 1);  */
   emit_insn (gen_sse2_pshufd (op0_1, op0, GEN_INT (78)));
@@ -6388,7 +6388,7 @@ (define_expand "usdot_prodv8qi"
 DONE;
 })
 
-(define_expand "sdot_prodv8qi"
+(define_expand "sdot_prodv2siv8qi"
   [(match_operand:V2SI 0 "register_operand")
(match_operand:V8QI 1 "register_operand")
(match_operand:V8QI 2 "register_operand")
@@ -6406,7 +6406,7 @@ (define_expand "sdot_prodv8qi"
   rtx op3 = lowpart_subreg (V4SImode, operands[3], V2SImode);
   rtx op0 = gen_reg_rtx (V4SImode);
 
-  emit_insn (gen_sdot_prodv16qi (op0, op1, op2, op3));
+  emit_insn (gen_sdot_prodv4siv16qi (op0, op1, op2, op3));
   emit_move_insn (operands[0], lowpart_subreg (V2SImode, op0, V4SImode));
 }
   else
@@ -6420,7 +6420,7 @@ (define_expand "sdot_prodv8qi"
   emit_move_insn (op3, CONST0_RTX (V4SImode));
   emit_insn (gen_extendv8qiv8hi2 (op1, operands[1]));
   emit_insn (gen_extendv8qiv8hi2 (op2, operands[2]));
-  emit_insn (gen_sdot_prodv8hi (op0, op1, op2, op3));
+  emit_insn (gen_sdot_prodv4siv8hi (op0, op1, op2, op3));
 
   /* vec_perm (op0, 2, 3, 0, 1);  */
   emit_insn (gen_sse2_pshufd (op0_1, op0, GEN_INT (78)));
@@ -6432,7 +6432,7 @@ (define_expand "sdot_prodv8qi"
 
 })
 
-(define_expand "udot_prodv8qi"
+(define_expand "udot_prodv2siv8qi"
   [(match_operand:V2SI 0 "register_operand")
(match_operand:V8QI 1 "register_operand")
(match_operand:V8QI 2 "register_operand")
@@ -6450,7 +6450,7 @@ (define_expand "udot_prodv8qi"
   rtx op3 = lowpart_subreg (V4SImode, operands[3], V2SImode);
   rtx op0 = gen_reg_rtx (V4SImode);
 
-  emit_insn (gen_udot_prodv16qi (op0, op1, op2, op3));
+  emit_insn (gen_udot_prodv4siv16qi (op0, op1, op2, op3));
   emit_move_insn (operands[0], lowpart_subreg (V2SImode, op0, V4SImode));
 }
   else
@@ -6464,7 +6464,7 @@ (define_expand "udot_prodv8qi"
   emit_move_insn (op3, CONST0_RTX (V4SImode));
   emit_insn (gen_zero_extendv8qiv8hi2 (op1, operands[1]));
   emit_insn (gen_zero_extendv8qiv8hi2 (op2, operands[2]));
-  emit_insn (gen_sdot_prodv8hi (op0, op1, op2, op3));
+  emit_insn (gen_sdot_prodv4siv8hi (op0, op1, op2, op3));
 
   /* vec_perm (op0, 2, 3, 0, 1);  */
   emit_insn (gen_sse2_pshufd (op0_1, op0, GEN_INT (78)));
@@ -6476,7 +6476,7 @@ (define_expand "udot_prodv8qi"
 
 })
 
-(define_expa

[PATCH 00/10] Make `dot_prod' a convert-type optab

2024-07-10 Thread Victor Do Nascimento
Given that the specification in the GCC internals manual defines the
{u|s}dot_prod standard name as taking "two signed elements of the
same mode, adding them to a third operand of wider mode", there is
currently ambiguity in the relationship between the mode of the first
two arguments and that of the third.

This vagueness means that, in theory, different modes may be
supportable in the third argument.  This flexibility would allow a
given backend to add a different number of vectorized products to the
accumulator, e.g. a backend may provide instructions for both:

  accum += a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

and

  accum += a[0] * b[0] + a[1] * b[1],

as is now seen in the SVE2.1 extension to AArch64.  In spite of the
aforementioned flexibility, modeling the dot-product operation as a
direct optab means that we have no way to encode both the input and
the accumulator data modes into the backend pattern name, which
prevents us from harnessing this flexibility.

The purpose of this patch-series is therefore to remedy this current
shortcoming, moving the `dot_prod' from its current implementation as
a direct optab to an implementation where, as a conversion optab, we
are able to differentiate between dot products taking the same input
mode but resulting in a different output mode.

Regression-tested on x86_64, aarch64 and armhf.  I'd appreciate help
running relevant tests on the remaining architectures, i.e. arc, mips,
altivec and c6x to ensure I've not inadvertently broken anything for
those backends.

Victor Do Nascimento (10):
  optabs: Make all `*dot_prod_optab's modeled as conversions
  autovectorizer: Add basic support for convert optabs
  aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns.
  arm: Fix arm backend-use of (u|s|us)dot_prod patterns.
  i386: Fix dot_prod backend patterns for mmx and sse targets
  arc: Adjust dot-product backend patterns
  mips: Adjust dot-product backend patterns
  altivec: Adjust dot-product backend patterns
  c6x: Adjust dot-product backend patterns
  autovectorizer: Test autovectorization of different dot-prod modes.

 gcc/config/aarch64/aarch64-builtins.cc| 71 ++
 gcc/config/aarch64/aarch64-simd-builtins.def  |  4 -
 gcc/config/aarch64/aarch64-simd.md|  9 +-
 .../aarch64/aarch64-sve-builtins-base.cc  | 13 +--
 gcc/config/aarch64/aarch64-sve-builtins.cc| 17 
 gcc/config/aarch64/aarch64-sve-builtins.h |  3 +
 gcc/config/aarch64/aarch64-sve.md |  6 +-
 gcc/config/aarch64/aarch64-sve2.md|  2 +-
 gcc/config/aarch64/iterators.md   |  1 +
 gcc/config/arc/simdext.md |  8 +-
 gcc/config/arm/arm-builtins.cc| 95 +++
 gcc/config/arm/arm-protos.h   |  3 +
 gcc/config/arm/arm.cc |  1 +
 gcc/config/arm/arm_neon_builtins.def  |  3 -
 gcc/config/arm/neon.md|  4 +-
 gcc/config/c6x/c6x.md |  2 +-
 gcc/config/i386/mmx.md| 30 +++---
 gcc/config/i386/sse.md| 47 +
 gcc/config/mips/loongson-mmi.md   |  2 +-
 gcc/config/rs6000/altivec.md  |  4 +-
 gcc/doc/md.texi   | 18 ++--
 gcc/gimple-match-exports.cc   | 18 
 gcc/gimple-match.h|  2 +
 gcc/optabs.cc |  3 +-
 gcc/optabs.def|  6 +-
 .../gcc.dg/vect/vect-dotprod-twoway.c | 38 
 .../aarch64/sme/vect-dotprod-twoway.c | 25 +
 gcc/tree-vect-loop.cc |  1 +
 gcc/tree-vect-patterns.cc | 43 -
 29 files changed, 399 insertions(+), 80 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c

-- 
2.34.1



[PATCH 03/10] aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns.

2024-07-10 Thread Victor Do Nascimento
Given recent changes to the dot_prod standard pattern name, this patch
fixes the aarch64 back-end by implementing the following changes:

1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files.
2. Rewrite initialization and function expansion mechanism for simd
builtins.
3. Fix all direct calls to back-end `dot_prod' patterns in SVE
builtins.

Finally, given that it is now possible for the compiler to
differentiate between the two- and four-way dot product, we add a test
to ensure that autovectorization picks up on dot-product patterns
where the result is twice the width of the operands.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (enum aarch64_builtins):
New AARCH64_BUILTIN_* enum values: SDOTV8QI, SDOTV16QI,
UDOTV8QI, UDOTV16QI, USDOTV8QI, USDOTV16QI.
(aarch64_init_builtin_dotprod_functions): New.
(aarch64_init_simd_builtins): Add call to
`aarch64_init_builtin_dotprod_functions'.
(aarch64_general_gimple_fold_builtin): Add DOT_PROD_EXPR
handling.
* config/aarch64/aarch64-simd-builtins.def: Remove macro
expansion-based initialization and expansion
of (u|s|us)dot_prod builtins.
* config/aarch64/aarch64-simd.md
(dot_prod): Deleted.
(dot_prod): New.
(usdot_prod): Deleted.
(usdot_prod): New.
(sadv16qi): Adjust call to gen_udot_prod to take a second mode.
(popcount): Fix use of `udot_prod_optab'.
* config/aarch64/aarch64-sve-builtins-base.cc
(svdot_impl::expand): s/direct/convert/ in
`convert_optab_handler_for_sign' function call.
(svusdot_impl::expand): Add second mode argument in call to
`code_for_dot_prod'.
* config/aarch64/aarch64-sve-builtins.cc
(function_expander::convert_optab_handler_for_sign): New class
method.
* config/aarch64/aarch64-sve-builtins.h
(class function_expander): Add prototype for new
`convert_optab_handler_for_sign' method.
* gcc/config/aarch64/aarch64-sve.md
(dot_prod): Deleted.
(dot_prod): New.
(@dot_prod): Deleted.
(@dot_prod): New.
(sad): Adjust call to gen_udot_prod to take a second mode.
* gcc/config/aarch64/aarch64-sve2.md
(@aarch64_sve_dotvnx4sivnx8hi): Deleted.
(dot_prodvnx4sivnx8hi): New.

gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sme/vect-dotprod-twoway.c (udot2): New.
---
 gcc/config/aarch64/aarch64-builtins.cc| 71 +++
 gcc/config/aarch64/aarch64-simd-builtins.def  |  4 --
 gcc/config/aarch64/aarch64-simd.md|  9 +--
 .../aarch64/aarch64-sve-builtins-base.cc  | 13 ++--
 gcc/config/aarch64/aarch64-sve-builtins.cc| 17 +
 gcc/config/aarch64/aarch64-sve-builtins.h |  3 +
 gcc/config/aarch64/aarch64-sve.md |  6 +-
 gcc/config/aarch64/aarch64-sve2.md|  2 +-
 gcc/config/aarch64/iterators.md   |  1 +
 .../aarch64/sme/vect-dotprod-twoway.c | 25 +++
 10 files changed, 133 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index 30669f8aa18..6c7c86d0e6e 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -783,6 +783,12 @@ enum aarch64_builtins
   AARCH64_SIMD_PATTERN_START = AARCH64_SIMD_BUILTIN_LANE_CHECK + 1,
   AARCH64_SIMD_BUILTIN_MAX = AARCH64_SIMD_PATTERN_START
  + ARRAY_SIZE (aarch64_simd_builtin_data) - 1,
+  AARCH64_BUILTIN_SDOTV8QI,
+  AARCH64_BUILTIN_SDOTV16QI,
+  AARCH64_BUILTIN_UDOTV8QI,
+  AARCH64_BUILTIN_UDOTV16QI,
+  AARCH64_BUILTIN_USDOTV8QI,
+  AARCH64_BUILTIN_USDOTV16QI,
   AARCH64_CRC32_BUILTIN_BASE,
   AARCH64_CRC32_BUILTINS
   AARCH64_CRC32_BUILTIN_MAX,
@@ -1642,6 +1648,60 @@ handle_arm_neon_h (void)
   aarch64_init_simd_intrinsics ();
 }
 
+void
+aarch64_init_builtin_dotprod_functions (void)
+{
+  tree fndecl = NULL;
+  tree ftype = NULL;
+
+  tree uv8qi = aarch64_simd_builtin_type (V8QImode, qualifier_unsigned);
+  tree sv8qi = aarch64_simd_builtin_type (V8QImode, qualifier_none);
+  tree uv16qi = aarch64_simd_builtin_type (V16QImode, qualifier_unsigned);
+  tree sv16qi = aarch64_simd_builtin_type (V16QImode, qualifier_none);
+  tree uv2si = aarch64_simd_builtin_type (V2SImode, qualifier_unsigned);
+  tree sv2si = aarch64_simd_builtin_type (V2SImode, qualifier_none);
+  tree uv4si = aarch64_simd_builtin_type (V4SImode, qualifier_unsigned);
+  tree sv4si = aarch64_simd_builtin_type (V4SImode, qualifier_none);
+
+  struct builtin_decls_data
+  {
+tree out_type_node;
+tree in_type1_node;
+tree in_type2_node;
+const char *builtin_name;
+int function_code;
+  };
+
+#define NAME(A) "__builtin_aarch64_" #A
+#define ENUM(B) AARCH64_BUILTIN_##B
+
+  builtin_decls_data bdda[] =
+  {
+{ sv2si, sv8qi,  sv8qi,  N

Re: [PATCH] fixincludes: add bypass to darwin_objc_runtime_1

2024-07-10 Thread Iain Sandoe



> On 10 Jul 2024, at 14:09, FX Coudert  wrote:
> 
> The  header that this fix applies to has been fixed in macOS 
> 15 beta SDK. Therefore, we can include a bypass.

shame it’s not fixed earlier :( 

> Tested on aarch64-apple-darwin24. OK to push?

yes OK for trunk (and backports perhaps once macOS 15 / Xcode 16 settle down)
thanks for the patch
Iain

> 
> FX
> 
> <0001-fixincludes-add-bypass-to-darwin_objc_runtime_1.patch>



Re: mve: Fix vsetq_lane for 64-bit elements with lane 1 [PR 115611]

2024-07-10 Thread Richard Earnshaw (lists)
On 26/06/2024 13:20, Andre Vieira (lists) wrote:
> This patch fixes the backend pattern that was printing the wrong input
> scalar register pair when inserting into lane 1.
> 
> Added a new test to force float-abi=hard so we can use scan-assembler to check
> correct codegen.
> 
> Regression tested arm-none-eabi with 
> -march=armv8.1-m.main+mve/-mfloat-abi=hard/-mfpu=auto
> 
> gcc/ChangeLog:
> 
> PR target/115611
> * config/arm/mve.md (mve_vec_setv2di_internal): Fix printing of input
>     scalar register pair when lane = 1.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/arm/mve/intrinsics/vsetq_lane_su64.c: New test.

OK.

R.


Re: [PATCH] fixincludes: add bypass to darwin_objc_runtime_1

2024-07-10 Thread FX Coudert
Thanks, pushed as 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=8326956159053b215b5cfe6cd41bfceff413491e

FX

Re: [PATCH] c++, contracts: Fix ICE in create_tmp_var [PR113968]

2024-07-10 Thread Jason Merrill

On 7/10/24 5:37 AM, Nina Dinka Ranns wrote:



On Tue, 9 Jul 2024 at 22:50, Jason Merrill wrote:


On 7/9/24 6:41 AM, Nina Dinka Ranns wrote:
 > On Mon, 8 Jul 2024 at 16:01, Jason Merrill wrote:
 >
 >     On 7/8/24 7:47 AM, Nina Dinka Ranns wrote:
 >      > HI Jason,
 >      >
 >      >> On Fri, 5 Jul 2024 at 17:31, Jason Merrill wrote:
 >      >>
 >      >> On 7/5/24 10:25 AM, Nina Dinka Ranns wrote:
 >      >>> Certain places in contract parsing currently do not
check for
 >     errors.
 >      >>> This results in contracts
 >      >>> with embedded errors which eventually confuse gimplify.
Checks for
 >      >>> errors added in
 >      >>> grok_contract() and cp_parser_contract_attribute_spec()
to exit
 >     early
 >      >>> if an error is encountered.
 >      >>
 >      >> Thanks for the patch!
 >      >>
 >      >>> Tested on x86_64-pc-linux-gnu
 >      >>> ---
 >      >>>
 >      >>>           PR c++/113968
 >      >>>
 >      >>> gcc/cp/ChangeLog:
 >      >>>
 >      >>>           * contracts.cc (grok_contract): Check for
 >     error_mark_node early
 >      >>>             exit
 >      >>
 >      >> These hunks are OK.
 >      >>
 >      >>>           * parser.cc (cp_parser_contract_attribute_spec):
 >     Check for
 >      >>>             error_mark_node early exit
 >      >>
 >      >> This seems redundant, since finish_contract_attribute already
 >     checks for
 >      >> error_mark_node and we're returning its result unchanged.
 >      >
 >      > good catch, removed.
 >      >
 >      >>
 >      >> Also, the convention is for wrapped lines in ChangeLog
entries
 >     to line
 >      >> up with the *, and to finish sentences with a period.
 >      >
 >      > done.
 >      >
 >      > Tests re-run on x86_64-pc-linux-gnu , no change.
 >
 >     This looks good, but the patch doesn't apply due to word
wrap.  To
 >     avoid
 >     that, I tend to use git send-email; sending the patch as an
attachment
 >     is also OK.  Or see
 >
 > https://www.kernel.org/doc/html/latest/process/email-clients.html

 >     for tips on getting various email clients to leave patches alone.
 >
 >
 > ack, thank you for your patience.
 > This time, patch attached to the email.

It looks like the attached patch reverted to older ChangeLog entries,
without the periods, and with the dropped parser.cc change?

git gcc-verify also complains

 > ERR: line should start with a tab: "        * contracts.cc
(grok_contract): Check for error_mark_node early"
 > ERR: line should start with a tab: "          exit"
 > ERR: line should start with a tab: "        * parser.cc
(cp_parser_contract_attribute_spec): Check for"
 > ERR: line should start with a tab: "          error_mark_node
early exit"
 > ERR: line should start with a tab: "        *
g++.dg/contracts/pr113968.C: New test."
 > ERR: PR 113968 in subject but not in changelog: "c++, contracts:
Fix ICE in create_tmp_var [PR113968]"

Jason


Apologies. I must have copy pasted something wrong. I've setup 
gcc-verify and that passes.

Let's try again. Patch attached.


Pushed, thanks.

Jason



[PATCH] fixincludes: skip stdio_stdarg_h on darwin

2024-07-10 Thread FX Coudert
I found another useless fixincludes on darwin, but this one was a bit harder to 
diagnose. GCC trunk applies a fix to <stdio.h> on modern Darwin: it is
stdio_stdarg_h. That fix is actually part of a pair, along with stdio_va_list, 
and they appear to work around issues with some old Unix (or BSD?) headers and 
the definition of va_list. It is not entirely clear to me what they fix, but 
they have been here forever.

They use various bypass mechanisms, but those are fragile. I have no idea if 
the fix is actually needed on any still-supported system, and maybe some global 
reviewer might want to remove it. But for now, I only want to bypass the fix on 
Darwin: it is useless there, and applying it makes our builds more fragile (and 
sensitive to the SDK version). Solaris has already opted out years ago, and now 
we do the same.

To show the madness of this fix, the macOS headers actually contain a comment 
that is supposed to trigger the bypass:

/* DO NOT REMOVE THIS COMMENT: fixincludes needs to see:
 * __gnuc_va_list and include  */

This kludge was added to the Apple headers in Libc-391 released around 2004. 
But it recently became ineffective, due to the majority of the content of 
 being moved into <_stdio.h> (which is not covered by fixincludes).
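A fixincludes "bypass" is in essence just a pattern match against the header body: if the marker text is seen, the fix is not applied. The real machinery is the bypass regex in fixincludes/inclhack.def; this is only a minimal C sketch of the idea, using the marker string from the comment quoted above (the `<stdarg.h>` spelling in the sample text is an assumption):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative sketch only, not the fixincludes implementation: a fix
   is skipped ("bypassed") when the header body contains the marker
   pattern.  Moving the marker into <_stdio.h> means the scanned
   <stdio.h> body no longer matches, so the bypass stops firing.  */
static bool
fix_is_bypassed (const char *header_text)
{
  return strstr (header_text, "__gnuc_va_list") != NULL;
}
```

This is why relocating most of the content out of the scanned header silently re-armed the fix.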

Anyway, the only sane thing to do is to disarm this fix on darwin, as the 
attached patch does.
Tested on aarch64-apple-darwin24, OK to push?

FX



PS: With that patch, only two fixincludes remain active for latest darwin:
- handling of __FLT_EVAL_METHOD__ == 16 in math.h (I have reported this as a 
bug)
- handling of Apple’s “deprecated” functions: gets, sprintf, tmpnam, vsprintf, 
tempnam



0001-fixincludes-skip-stdio_stdarg_h-on-darwin.patch
Description: Binary data


[PATCH] recog: Handle some mode-changing hardreg propagations

2024-07-10 Thread Richard Sandiford
insn_propagation would previously only replace (reg:M H) with X
for some hard register H if the uses of H were also in mode M.
This patch extends it to handle simple mode punning too.

The original motivation was to try to get rid of the execution
frequency test in aarch64_split_simd_shift_p, but doing that is
follow-up work.

I tried this on at least one target per CPU directory (as for
the late-combine patches) and it seems to be a small win for
all of them.

The patch includes a couple of updates to the ia32 results.
In pr105033.c, foo3 replaced:

   vmovq   8(%esp), %xmm1
   vpunpcklqdq %xmm1, %xmm0, %xmm0

with:

   vmovhps 8(%esp), %xmm0, %xmm0

In vect-bfloat16-2b.c, 5 of the vec_extract_v32bf_* routines
(specifically the ones with nonzero even indices) replaced
things like:

   movl28(%esp), %eax
   vmovd   %eax, %xmm0

with:

   vpinsrw $0, 28(%esp), %xmm0, %xmm0

(These functions return a bf16, and so only the low 16 bits matter.)
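That "only the low 16 bits matter" property is exactly what a lowpart subreg captures. As a rough C analogy (not GCC internals): a value set in a wide mode can satisfy a use in a narrower mode by taking its low-order part, which value-wise is a truncation:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative analogy for the new mode-punning propagation: a hard
   register set as a 32-bit value (reg:SI H) is used in a 16-bit mode
   (reg:HI H).  The lowpart the propagation substitutes corresponds,
   value-wise, to truncating to the narrow mode.  */
static uint16_t
lowpart_hi (uint32_t reg_si)
{
  return (uint16_t) reg_si;
}
```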

Bootstrapped & regression-tested on aarch64-linux-gnu and
x86_64-linux-gnu.  OK to install?

Richard


gcc/
* recog.cc (insn_propagation::apply_to_rvalue_1): Handle simple
cases of hardreg propagation in which the register is set and
used in different modes.

gcc/testsuite/
* gcc.target/i386/pr105033.c: Expect vmovhps for the ia32 version
of foo.
* gcc.target/i386/vect-bfloat16-2b.c: Expect more vpinsrws.
---
 gcc/recog.cc  | 31 +++
 gcc/testsuite/gcc.target/i386/pr105033.c  |  4 ++-
 .../gcc.target/i386/vect-bfloat16-2b.c|  2 +-
 3 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/gcc/recog.cc b/gcc/recog.cc
index 56370e40e01..36507f3f57c 100644
--- a/gcc/recog.cc
+++ b/gcc/recog.cc
@@ -1055,7 +1055,11 @@ insn_propagation::apply_to_rvalue_1 (rtx *loc)
   machine_mode mode = GET_MODE (x);
 
   auto old_num_changes = num_validated_changes ();
-  if (from && GET_CODE (x) == GET_CODE (from) && rtx_equal_p (x, from))
+  if (from
+  && GET_CODE (x) == GET_CODE (from)
+  && (REG_P (x)
+ ? REGNO (x) == REGNO (from)
+ : rtx_equal_p (x, from)))
 {
   /* Don't replace register asms in asm statements; we mustn't
 change the user's register allocation.  */
@@ -1065,11 +1069,26 @@ insn_propagation::apply_to_rvalue_1 (rtx *loc)
  && asm_noperands (PATTERN (insn)) > 0)
return false;
 
+  rtx newval = to;
+  if (GET_MODE (x) != GET_MODE (from))
+   {
+ gcc_assert (REG_P (x) && HARD_REGISTER_P (x));
+ if (REG_NREGS (x) != REG_NREGS (from)
+ || !REG_CAN_CHANGE_MODE_P (REGNO (x), GET_MODE (from),
+GET_MODE (x)))
+   return false;
+ newval = simplify_subreg (GET_MODE (x), to, GET_MODE (from),
+   subreg_lowpart_offset (GET_MODE (x),
+  GET_MODE (from)));
+ if (!newval)
+   return false;
+   }
+
   if (should_unshare)
-   validate_unshare_change (insn, loc, to, 1);
+   validate_unshare_change (insn, loc, newval, 1);
   else
-   validate_change (insn, loc, to, 1);
-  if (mem_depth && !REG_P (to) && !CONSTANT_P (to))
+   validate_change (insn, loc, newval, 1);
+  if (mem_depth && !REG_P (newval) && !CONSTANT_P (newval))
{
  /* We're substituting into an address, but TO will have the
 form expected outside an address.  Canonicalize it if
@@ -1083,9 +1102,9 @@ insn_propagation::apply_to_rvalue_1 (rtx *loc)
{
  /* TO is owned by someone else, so create a copy and
 return TO to its original form.  */
- rtx to = copy_rtx (*loc);
+ newval = copy_rtx (*loc);
  cancel_changes (old_num_changes);
- validate_change (insn, loc, to, 1);
+ validate_change (insn, loc, newval, 1);
}
}
   num_replacements += 1;
diff --git a/gcc/testsuite/gcc.target/i386/pr105033.c 
b/gcc/testsuite/gcc.target/i386/pr105033.c
index ab05e3b3bc8..10e39783464 100644
--- a/gcc/testsuite/gcc.target/i386/pr105033.c
+++ b/gcc/testsuite/gcc.target/i386/pr105033.c
@@ -1,6 +1,8 @@
 /* { dg-do compile } */
 /* { dg-options "-march=sapphirerapids -O2" } */
-/* { dg-final { scan-assembler-times {vpunpcklqdq[ \t]+} 3 } } */
+/* { dg-final { scan-assembler-times {vpunpcklqdq[ \t]+} 3 { target { ! ia32 } 
} } } */
+/* { dg-final { scan-assembler-times {vpunpcklqdq[ \t]+} 2 { target ia32 } } } 
*/
+/* { dg-final { scan-assembler-times {vmovhps[ \t]+} 1 { target ia32 } } } */
 /* { dg-final { scan-assembler-not {vpermi2[wb][ \t]+} } } */
 
 typedef _Float16 v8hf __attribute__((vector_size (16)));
diff --git a/gcc/testsuite/gcc.target/i386/vect-bfloat16-2b.c 
b/gcc/testsuite/gcc.target/i386/vect-bfloat16-2b.c
index 29bf601d537..0d1e14d6eb6 100644
--- a/gcc/

Re: [PATCH] recog: Handle some mode-changing hardreg propagations

2024-07-10 Thread Jeff Law




On 7/10/24 9:32 AM, Richard Sandiford wrote:

insn_propagation would previously only replace (reg:M H) with X
for some hard register H if the uses of H were also in mode M.
This patch extends it to handle simple mode punning too.

The original motivation was to try to get rid of the execution
frequency test in aarch64_split_simd_shift_p, but doing that is
follow-up work.

I tried this on at least one target per CPU directory (as for
the late-combine patches) and it seems to be a small win for
all of them.

The patch includes a couple of updates to the ia32 results.
In pr105033.c, foo3 replaced:

vmovq   8(%esp), %xmm1
vpunpcklqdq %xmm1, %xmm0, %xmm0

with:

vmovhps 8(%esp), %xmm0, %xmm0

In vect-bfloat16-2b.c, 5 of the vec_extract_v32bf_* routines
(specifically the ones with nonzero even indices) replaced
things like:

movl28(%esp), %eax
vmovd   %eax, %xmm0

with:

vpinsrw $0, 28(%esp), %xmm0, %xmm0

(These functions return a bf16, and so only the low 16 bits matter.)

Bootstrapped & regression-tested on aarch64-linux-gnu and
x86_64-linux-gnu.  OK to install?

Richard


gcc/
* recog.cc (insn_propagation::apply_to_rvalue_1): Handle simple
cases of hardreg propagation in which the register is set and
used in different modes.

gcc/testsuite/
* gcc.target/i386/pr105033.c: Expect vmovhps for the ia32 version
of foo.
* gcc.target/i386/vect-bfloat16-2b.c: Expect more vpinsrws.

OK
jeff



[PATCH] internal-fn: Reuse SUBREG_PROMOTED_VAR_P handling

2024-07-10 Thread Richard Sandiford
expand_fn_using_insn has code to handle SUBREG_PROMOTED_VAR_P
destinations.  Specifically, for:

  (subreg/v:M1 (reg:M2 R) ...)

it creates a new temporary register T, uses it for the output
operand, then sign- or zero-extends the M1 lowpart of T to M2,
storing the result in R.
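In scalar terms, the extend step looks like the following C sketch (illustrative only, not the rtl: here M1 is 16 bits, M2 is 32 bits, and the promotion is signed):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the SUBREG_PROMOTED_VAR_P handling for
   (subreg/v:M1 (reg:M2 R)): the result is computed into a temporary T,
   then the M1 lowpart of T is sign-extended into the wider register R.  */
static int32_t
assign_promoted_signed (int32_t t)
{
  int16_t lowpart = (int16_t) t;  /* M1 lowpart of the temporary T */
  return (int32_t) lowpart;       /* sign-extend into R (mode M2) */
}
```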

This patch splits this handling out into helper routines and
uses them for other instances of:

  if (!rtx_equal_p (target, ops[0].value))
emit_move_insn (target, ops[0].value);

It's quite probable that this doesn't help any of the other cases;
in particular, it shouldn't affect vectors.  But I think it could
be useful for the CRC work.

Bootstrapped & regression-tested on aarch64-linux-gnu.  OK to install?

Richard


gcc/
* internal-fn.cc (create_call_lhs_operand, assign_call_lhs): New
functions, split out from...
(expand_fn_using_insn): ...here.
(expand_load_lanes_optab_fn): Use them.
(expand_GOMP_SIMT_ENTER_ALLOC): Likewise.
(expand_GOMP_SIMT_LAST_LANE): Likewise.
(expand_GOMP_SIMT_ORDERED_PRED): Likewise.
(expand_GOMP_SIMT_VOTE_ANY): Likewise.
(expand_GOMP_SIMT_XCHG_BFLY): Likewise.
(expand_GOMP_SIMT_XCHG_IDX): Likewise.
(expand_partial_load_optab_fn): Likewise.
(expand_vec_cond_optab_fn): Likewise.
(expand_vec_cond_mask_optab_fn): Likewise.
(expand_RAWMEMCHR): Likewise.
(expand_gather_load_optab_fn): Likewise.
(expand_while_optab_fn): Likewise.
(expand_SPACESHIP): Likewise.
---
 gcc/internal-fn.cc | 162 +++--
 1 file changed, 84 insertions(+), 78 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 4948b48bde8..95946bfd683 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -199,6 +199,58 @@ const direct_internal_fn_info 
direct_internal_fn_array[IFN_LAST + 1] = {
   not_direct
 };
 
+/* Like create_output_operand, but for callers that will use
+   assign_call_lhs afterwards.  */
+
+static void
+create_call_lhs_operand (expand_operand *op, rtx lhs_rtx, machine_mode mode)
+{
+  /* Do not assign directly to a promoted subreg, since there is no
+ guarantee that the instruction will leave the upper bits of the
+ register in the state required by SUBREG_PROMOTED_SIGN.  */
+  rtx dest = lhs_rtx;
+  if (dest && GET_CODE (dest) == SUBREG && SUBREG_PROMOTED_VAR_P (dest))
+dest = NULL_RTX;
+  create_output_operand (op, dest, mode);
+}
+
+/* Move the result of an expanded instruction into the lhs of a gimple call.
+   LHS is the lhs of the call, LHS_RTX is its expanded form, and OP is the
+   result of the expanded instruction.  OP should have been set up by
+   create_call_lhs_operand.  */
+
+static void
+assign_call_lhs (tree lhs, rtx lhs_rtx, expand_operand *op)
+{
+  if (rtx_equal_p (lhs_rtx, op->value))
+return;
+
+  /* If the return value has an integral type, convert the instruction
+ result to that type.  This is useful for things that return an
+ int regardless of the size of the input.  If the instruction result
+ is smaller than required, assume that it is signed.
+
+ If the return value has a nonintegral type, its mode must match
+ the instruction result.  */
+  if (GET_CODE (lhs_rtx) == SUBREG && SUBREG_PROMOTED_VAR_P (lhs_rtx))
+{
+  /* If this is a scalar in a register that is stored in a wider
+mode than the declared mode, compute the result into its
+declared mode and then convert to the wider mode.  */
+  gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)));
+  rtx tmp = convert_to_mode (GET_MODE (lhs_rtx), op->value, 0);
+  convert_move (SUBREG_REG (lhs_rtx), tmp,
+   SUBREG_PROMOTED_SIGN (lhs_rtx));
+}
+  else if (GET_MODE (lhs_rtx) == GET_MODE (op->value))
+emit_move_insn (lhs_rtx, op->value);
+  else
+{
+  gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)));
+  convert_move (lhs_rtx, op->value, 0);
+}
+}
+
 /* Expand STMT using instruction ICODE.  The instruction has NOUTPUTS
output operands and NINPUTS input operands, where NOUTPUTS is either
0 or 1.  The output operand (if any) comes first, followed by the
@@ -220,15 +272,8 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, 
unsigned int noutputs,
   gcc_assert (noutputs == 1);
   if (lhs)
lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
-
-  /* Do not assign directly to a promoted subreg, since there is no
-guarantee that the instruction will leave the upper bits of the
-register in the state required by SUBREG_PROMOTED_SIGN.  */
-  rtx dest = lhs_rtx;
-  if (dest && GET_CODE (dest) == SUBREG && SUBREG_PROMOTED_VAR_P (dest))
-   dest = NULL_RTX;
-  create_output_operand (&ops[opno], dest,
-insn_data[icode].operand[opno].mode);
+  create_call_lhs_operand (&ops[opno], lhs_rtx,
+  insn_data[icode].operand[opn

Re: [PATCH] fixincludes: skip stdio_stdarg_h on darwin

2024-07-10 Thread Iain Sandoe
Hi FX,

> On 10 Jul 2024, at 16:25, FX Coudert  wrote:
> 
> I found another useless fixincludes on darwin, but this one was a bit harder 
> to diagnose. GCC trunk applies a fix to  on modern Darwin: it is 
> stdio_stdarg_h. That fix is actually part of a pair, along with 
> stdio_va_list, and they appear to work around issues with some old Unix (or 
> BSD?) headers and the definition of va_list. It is not entirely clear to me 
> what they fix, but they have been here forever.
> 
> They use various bypass mechanisms, but those are fragile. I have no idea if 
> the fix is actually needed on any still-supported system, and maybe some 
> global reviewer might want to remove it. But for now, I only want to bypass 
> the fix on Darwin: it is useless there, and applying it makes our builds more 
> fragile (and sensitive to the SDK version). Solaris has already opted out 
> years ago, and now we do the same.
> 
> To show the madness of this fix, the macOS headers actually contain a comment 
> that is supposed to trigger the bypass:
> 
> /* DO NOT REMOVE THIS COMMENT: fixincludes needs to see:
> * __gnuc_va_list and include  */
> 
> This kludge was added to the Apple headers in Libc-391 released around 2004. 
> But it recently became ineffective, due to the majority of the content of 
>  being moved into <_stdio.h> (which is not covered by fixincludes).
> 
> Anyway, the only sane thing to do is to disarm this fix on darwin, as the 
> attached patch does.

Right, if the comment was added in 2004, we have no still-supported OS versions 
that are relevant,

> Tested on aarch64-apple-darwin24, OK to push?

Yes, OK for trunk, and backports after some bake time,
thanks for the patch,
Iain

> 
> FX
> 
> 
> 
> PS: With that patch, only two fixincludes remain active for latest darwin:
> - handling of __FLT_EVAL_METHOD__ == 16 in math.h (I have reported this as a 
> bug)
> - handling of Apple’s “deprecated” functions: gets, sprintf, tmpnam, 
> vsprintf, tempnam
> 
> <0001-fixincludes-skip-stdio_stdarg_h-on-darwin.patch>



[PATCH] opts: allow -gctf, -gbtf, -gdwarf simultaneously

2024-07-10 Thread David Faust
[This is a resend of a patch previously sent as:
   PATCH v4 6/6 opts: allow any combination of DWARF,CTF,BTF
   https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654253.html]

Previously it was not supported to generate both CTF and BTF debug info
in the same compiler run, as both formats made incompatible changes to
the same internal data structures.

With the structural changes to CTF and BTF generation made in:

  d3f586ec50d3 ctf, btf: restructure CTF/BTF emission

in particular, with the guarantee that CTF will always be fully emitted
before any BTF translation occurs, there is no longer anything
preventing generation of both CTF and BTF at the same time.

This patch changes option parsing to lift the restriction on specifying
both -gbtf and -gctf at the same time, allowing for any combination of
-gdwarf, -gctf, and -gbtf to be active in the same compiler invocation.
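The relaxed check in the patch is a subset test on a bitmask: a new format may be OR'd in as long as every already-selected format is one of DWARF, CTF, or BTF. A minimal sketch of that condition (flag values are illustrative, not GCC's actual enum):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; GCC's debug_info_type enum differs.  */
enum
{
  DWARF2_DEBUG = 1,
  CTF_DEBUG    = 2,
  BTF_DEBUG    = 4,
  VMS_DEBUG    = 8   /* stands in for any non-combinable format */
};

/* True iff the formats already selected in WRITE_SYMBOLS are a subset
   of {DWARF, CTF, BTF}, i.e. another of those formats may be OR'd in.  */
static bool
combinable (unsigned write_symbols)
{
  unsigned mask = DWARF2_DEBUG | CTF_DEBUG | BTF_DEBUG;
  return (write_symbols | mask) == mask;
}
```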

Bootstrapped and tested on x86_64-linux-gnu.
Also tested on x86_64-linux-gnu for bpf-unknown-none.

gcc/
* opts.cc (set_debug_level): Allow any combination of -gdwarf,
-gctf and -gbtf to be enabled at the same time.

gcc/testsuite/
* gcc.dg/debug/btf/btf-3.c: New test.
* gcc.dg/debug/btf/btf-4.c: Likewise.
* gcc.dg/debug/btf/btf-5.c: Likewise.
---
 gcc/opts.cc| 20 +---
 gcc/testsuite/gcc.dg/debug/btf/btf-3.c |  8 
 gcc/testsuite/gcc.dg/debug/btf/btf-4.c |  8 
 gcc/testsuite/gcc.dg/debug/btf/btf-5.c |  9 +
 4 files changed, 30 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-3.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-4.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-5.c

diff --git a/gcc/opts.cc b/gcc/opts.cc
index d7e0126e11f8..735d0dd8accf 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -3508,21 +3508,11 @@ set_debug_level (uint32_t dinfo, int extended, const 
char *arg,
 }
   else
 {
-  /* Make and retain the choice if both CTF and DWARF debug info are to
-be generated.  */
-  if (((dinfo == DWARF2_DEBUG) || (dinfo == CTF_DEBUG))
- && ((opts->x_write_symbols == (DWARF2_DEBUG|CTF_DEBUG))
- || (opts->x_write_symbols == DWARF2_DEBUG)
- || (opts->x_write_symbols == CTF_DEBUG)))
-   {
- opts->x_write_symbols |= dinfo;
- opts_set->x_write_symbols |= dinfo;
-   }
-  /* However, CTF and BTF are not allowed together at this time.  */
-  else if (((dinfo == DWARF2_DEBUG) || (dinfo == BTF_DEBUG))
-  && ((opts->x_write_symbols == (DWARF2_DEBUG|BTF_DEBUG))
-  || (opts->x_write_symbols == DWARF2_DEBUG)
-  || (opts->x_write_symbols == BTF_DEBUG)))
+  /* Any combination of DWARF, CTF and BTF is allowed.  */
+  if (((dinfo == DWARF2_DEBUG) || (dinfo == CTF_DEBUG)
+  || (dinfo == BTF_DEBUG))
+ && ((opts->x_write_symbols | (DWARF2_DEBUG | CTF_DEBUG | BTF_DEBUG))
+  == (DWARF2_DEBUG | CTF_DEBUG | BTF_DEBUG)))
{
  opts->x_write_symbols |= dinfo;
  opts_set->x_write_symbols |= dinfo;
diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-3.c 
b/gcc/testsuite/gcc.dg/debug/btf/btf-3.c
new file mode 100644
index ..93c8164a2a54
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-3.c
@@ -0,0 +1,8 @@
+/* Verify that BTF debug info can co-exist with DWARF.  */
+/* { dg-do compile } */
+/* { dg-options "-gdwarf -gbtf -dA" } */
+/* { dg-final { scan-assembler "0xeb9f.*btf_magic" } } */
+/* { dg-final { scan-assembler "DWARF version number" } } */
+
+void func (void)
+{ }
diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-4.c 
b/gcc/testsuite/gcc.dg/debug/btf/btf-4.c
new file mode 100644
index ..b087917188bb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-4.c
@@ -0,0 +1,8 @@
+/* Verify that BTF debug info can co-exist with CTF.  */
+/* { dg-do compile } */
+/* { dg-options "-gctf -gbtf -dA" } */
+/* { dg-final { scan-assembler "0xeb9f.*btf_magic" } } */
+/* { dg-final { scan-assembler "0xdff2.*CTF preamble magic number" } } */
+
+void func (void)
+{ }
diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-5.c 
b/gcc/testsuite/gcc.dg/debug/btf/btf-5.c
new file mode 100644
index ..45267b5fc422
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-5.c
@@ -0,0 +1,9 @@
+/* Verify that BTF, CTF and DWARF can all co-exist happily.  */
+/* { dg-do compile } */
+/* { dg-options "-gctf -gbtf -gdwarf -dA" } */
+/* { dg-final { scan-assembler "0xeb9f.*btf_magic" } } */
+/* { dg-final { scan-assembler "0xdff2.*CTF preamble magic number" } } */
+/* { dg-final { scan-assembler "DWARF version number" } } */
+
+void func (void)
+{ }
-- 
2.43.0



Re: [PATCH] RISC-V: c implies zca, and conditionally zcf & zcd

2024-07-10 Thread Jeff Law




On 7/10/24 4:12 AM, Fei Gao wrote:

According to Zc-1.0.4-3.pdf from
https://github.com/riscvarchive/riscv-code-size-reduction/releases/tag/v1.0.4-3
The rule is that:
- C always implies Zca
- C+F implies Zcf (RV32 only)
- C+D implies Zcd
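The rule set above can be sketched as follows (a hedged illustration only; in GCC the logic lives in the riscv_implied_info tables in riscv-common.cc, and the struct here is invented for the example):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ISA-state struct for illustration.  */
struct isa
{
  bool c, f, d, rv32;      /* selected extensions / base */
  bool zca, zcf, zcd;      /* implied extensions */
};

/* Apply the Zc implication rules from Zc-1.0.4-3.  */
static void
apply_zc_implications (struct isa *s)
{
  if (s->c)
    {
      s->zca = true;             /* C always implies Zca.  */
      if (s->f && s->rv32)
        s->zcf = true;           /* C+F implies Zcf (RV32 only).  */
      if (s->d)
        s->zcd = true;           /* C+D implies Zcd.  */
    }
}
```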

Signed-off-by: Fei Gao 
gcc/ChangeLog:

* common/config/riscv/riscv-common.cc:
c implies zca, and conditionally zcf & zcd.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute-15.c: adapt TC.
* gcc.target/riscv/attribute-16.c: likewise.
* gcc.target/riscv/attribute-17.c: likewise.
* gcc.target/riscv/attribute-18.c: likewise.
* gcc.target/riscv/pr110696.c: likewise.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-zcmp.c: likewise.
* gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c: likewise.
* gcc.target/riscv/rvv/base/pr114352-1.c: likewise.
* gcc.target/riscv/rvv/base/pr114352-3.c: likewise.
* gcc.target/riscv/arch-39.c: New test.
* gcc.target/riscv/arch-40.c: New test.

OK.

jeff



Re: [PATCH] internal-fn: Reuse SUBREG_PROMOTED_VAR_P handling

2024-07-10 Thread Jeff Law




On 7/10/24 9:44 AM, Richard Sandiford wrote:

expand_fn_using_insn has code to handle SUBREG_PROMOTED_VAR_P
destinations.  Specifically, for:

   (subreg/v:M1 (reg:M2 R) ...)

it creates a new temporary register T, uses it for the output
operand, then sign- or zero-extends the M1 lowpart of T to M2,
storing the result in R.

This patch splits this handling out into helper routines and
uses them for other instances of:

   if (!rtx_equal_p (target, ops[0].value))
 emit_move_insn (target, ops[0].value);

It's quite probable that this doesn't help any of the other cases;
in particular, it shouldn't affect vectors.  But I think it could
be useful for the CRC work.

Bootstrapped & regression-tested on aarch64-linux-gnu.  OK to install?

Richard


gcc/
* internal-fn.cc (create_call_lhs_operand, assign_call_lhs): New
functions, split out from...
(expand_fn_using_insn): ...here.
(expand_load_lanes_optab_fn): Use them.
(expand_GOMP_SIMT_ENTER_ALLOC): Likewise.
(expand_GOMP_SIMT_LAST_LANE): Likewise.
(expand_GOMP_SIMT_ORDERED_PRED): Likewise.
(expand_GOMP_SIMT_VOTE_ANY): Likewise.
(expand_GOMP_SIMT_XCHG_BFLY): Likewise.
(expand_GOMP_SIMT_XCHG_IDX): Likewise.
(expand_partial_load_optab_fn): Likewise.
(expand_vec_cond_optab_fn): Likewise.
(expand_vec_cond_mask_optab_fn): Likewise.
(expand_RAWMEMCHR): Likewise.
(expand_gather_load_optab_fn): Likewise.
(expand_while_optab_fn): Likewise.
(expand_SPACESHIP): Likewise.

OK.

FWIW, we did some testing for cases where we didn't utilize the 
SUBREG_PROMOTED* bits to eliminate extensions coming out of the 
expansion phase.  For the most part we're doing a good job.  IIRC our 
instrumentation showed they tended to sneak in mostly through expansion 
of builtins (particularly the arithmetic overflow builtins).


Jeff


[PATCH] testsuite: Align testcase with implementation [PR105090]

2024-07-10 Thread Torbjörn SVENSSON
Is this ok for the following branches?
- trunk
- releases/gcc-14
- releases/gcc-13

--

Since r13-1006-g2005b9b888eeac, the test case copysign_softfloat_1.c
no longer contains any lsr instruction, so drop the check as per
comment 9 in PR105090.

gcc/testsuite/ChangeLog:

PR target/105090
* gcc.target/arm/copysign_softfloat_1.c: Drop check for lsr

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c 
b/gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c
index a14922f1c12..50317b7abe5 100644
--- a/gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c
+++ b/gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c
@@ -42,7 +42,6 @@ main (int argc, char **argv)
   int index = 0;
 
 /* { dg-final { scan-assembler-times "bfi" 2 { target arm_softfloat } } } */
-/* { dg-final { scan-assembler-times "lsr" 1 { target arm_softfloat } } } */
   for (index; index < N; index++)
 {
   if (__builtin_copysignf (a_f[index], b_f[index]) != c_f[index])
-- 
2.25.1



Re: [RFC/RFA] [PATCH 06/12] aarch64: Implement new expander for efficient CRC computation

2024-07-10 Thread Richard Sandiford
Mariam Arutunian  writes:
> On Sat, Jun 8, 2024 at 3:41 PM Richard Sandiford 
> wrote:
>
>> Mariam Arutunian  writes:
>> > This patch introduces two new expanders for the aarch64 backend,
>> > dedicated to generate optimized code for CRC computations.
>> > The new expanders are designed to leverage specific hardware capabilities
>> > to achieve faster CRC calculations,
>> > particularly using the pmul or crc32 instructions when supported by the
>> > target architecture.
>>
>> Thanks for porting this to aarch64!
>>
>> > Expander 1: Bit-Forward CRC (crc4)
>> > For targets that support pmul instruction (TARGET_AES),
>> > the expander will generate code that uses the pmul (crypto_pmulldi)
>> > instruction for CRC computation.
>> >
>> > Expander 2: Bit-Reversed CRC (crc_rev4)
>> > The expander first checks if the target supports the CRC32 instruction
>> set
>> > (TARGET_CRC32)
>> > and the polynomial in use is 0x1EDC6F41 (iSCSI). If the conditions are
>> met,
>> > it emits calls to the corresponding crc32 instruction (crc32b, crc32h,
>> > crc32w, or crc32x depending on the data size).
>> > If the target does not support crc32 but supports pmul, it then uses the
>> > pmul (crypto_pmulldi) instruction for bit-reversed CRC computation.
>> >
>> > Otherwise table-based CRC is generated.
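[For reference, a bitwise sketch of the bit-reversed (reflected) CRC being discussed, assuming the standard CRC-32C parameters (initial value and final xor 0xFFFFFFFF); 0x82F63B78 is the reflected form of the iSCSI polynomial 0x1EDC6F41. Any table-based or carry-less-multiply expansion must agree with this loop:]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time reference for reflected CRC-32C (iSCSI polynomial).
   Illustrative reference model only, not the expander's code.  */
static uint32_t
crc32c_ref (const uint8_t *buf, size_t len)
{
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++)
    {
      crc ^= buf[i];
      for (int k = 0; k < 8; k++)
        crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1));
    }
  return ~crc;
}
```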
>> >
>> >   gcc/config/aarch64/
>> >
>> > * aarch64-protos.h (aarch64_expand_crc_using_clmul): New extern
>> > function declaration.
>> > (aarch64_expand_reversed_crc_using_clmul):  Likewise.
>> > * aarch64.cc (aarch64_expand_crc_using_clmul): New function.
>> > (aarch64_expand_reversed_crc_using_clmul):  Likewise.
>> > * aarch64.md (UNSPEC_CRC, UNSPEC_CRC_REV):  New unspecs.
>> > (crc_rev4): New expander for reversed CRC.
>> > (crc4): New expander for reversed CRC.
>> > * iterators.md (crc_data_type): New mode attribute.
>> >
>> >   gcc/testsuite/gcc.target/aarch64/
>> >
>> > * crc-1-pmul.c: Likewise.
>> > * crc-10-pmul.c: Likewise.
>> > * crc-12-pmul.c: Likewise.
>> > * crc-13-pmul.c: Likewise.
>> > * crc-14-pmul.c: Likewise.
>> > * crc-17-pmul.c: Likewise.
>> > * crc-18-pmul.c: Likewise.
>> > * crc-21-pmul.c: Likewise.
>> > * crc-22-pmul.c: Likewise.
>> > * crc-23-pmul.c: Likewise.
>> > * crc-4-pmul.c: Likewise.
>> > * crc-5-pmul.c: Likewise.
>> > * crc-6-pmul.c: Likewise.
>> > * crc-7-pmul.c: Likewise.
>> > * crc-8-pmul.c: Likewise.
>> > * crc-9-pmul.c: Likewise.
>> > * crc-CCIT-data16-pmul.c: Likewise.
>> > * crc-CCIT-data8-pmul.c: Likewise.
>> > * crc-coremark-16bitdata-pmul.c: Likewise.
>> > * crc-crc32-data16.c: New test.
>> > * crc-crc32-data32.c: Likewise.
>> > * crc-crc32-data8.c: Likewise.
>> >
>> > Signed-off-by: Mariam Arutunian
>> >
>> > diff --git a/gcc/config/aarch64/aarch64-protos.h
>> b/gcc/config/aarch64/aarch64-protos.h
>> > index 1d3f94c813e..167e1140f0d 100644
>> > --- a/gcc/config/aarch64/aarch64-protos.h
>> > +++ b/gcc/config/aarch64/aarch64-protos.h
>> > @@ -1117,5 +1117,8 @@ extern void mingw_pe_encode_section_info (tree,
>> rtx, int);
>> >
>> >  bool aarch64_optimize_mode_switching (aarch64_mode_entity);
>> >  void aarch64_restore_za (rtx);
>> > +void aarch64_expand_crc_using_clmul (rtx *);
>> > +void aarch64_expand_reversed_crc_using_clmul (rtx *);
>> > +
>> >
>> >  #endif /* GCC_AARCH64_PROTOS_H */
>> > diff --git a/gcc/config/aarch64/aarch64.cc
>> b/gcc/config/aarch64/aarch64.cc
>> > index ee12d8897a8..05cd0296d38 100644
>> > --- a/gcc/config/aarch64/aarch64.cc
>> > +++ b/gcc/config/aarch64/aarch64.cc
>> > @@ -30265,6 +30265,135 @@ aarch64_retrieve_sysreg (const char *regname,
>> bool write_p, bool is128op)
>> >return sysreg->encoding;
>> >  }
>> >
>> > +/* Generate assembly to calculate CRC
>> > +   using carry-less multiplication instruction.
>> > +   OPERANDS[1] is input CRC,
>> > +   OPERANDS[2] is data (message),
>> > +   OPERANDS[3] is the polynomial without the leading 1.  */
>> > +
>> > +void
>> > +aarch64_expand_crc_using_clmul (rtx *operands)
>>
>> This should probably be pmul rather than clmul.
>>
>> +{
>> > +  /* Check and keep arguments.  */
>> > +  gcc_assert (!CONST_INT_P (operands[0]));
>> > +  gcc_assert (CONST_INT_P (operands[3]));
>> > +  rtx crc = operands[1];
>> > +  rtx data = operands[2];
>> > +  rtx polynomial = operands[3];
>> > +
>> > +  unsigned HOST_WIDE_INT
>> > +  crc_size = GET_MODE_BITSIZE (GET_MODE (operands[0])).to_constant
>> ();
>> > +  gcc_assert (crc_size <= 32);
>> > +  unsigned HOST_WIDE_INT
>> > +  data_size = GET_MODE_BITSIZE (GET_MODE (data)).to_constant ();
>>
>> We could instead make the interface:
>>
>> void
>> aarch64_expand_crc_using_pmul (scalar_mode crc_mode, scalar_mode data_mode,
>>rtx *operands)
>>
>> so that the lines above don't need the to_constant.  This should "just
>> work" on the .md file side, since the modes being passed are naturally
>> scalar_mode.
>>
>> I think it'd be worth asserting

Re: [PATCH] testsuite: Align testcase with implementation [PR105090]

2024-07-10 Thread Richard Earnshaw (lists)
On 10/07/2024 17:26, Torbjörn SVENSSON wrote:
> Is this ok for the following branches?
> - trunk
> - releases/gcc-14
> - releases/gcc-13
> 
> --
> 
> Since r13-1006-g2005b9b888eeac, the test case copysign_softfloat_1.c
> no longer contains any lsr instruction, so drop the check as per
> comment 9 in PR105090.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/105090
>   * gcc.target/arm/copysign_softfloat_1.c: Drop check for lsr
> 
> Signed-off-by: Torbjörn SVENSSON 
> ---
>  gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c 
> b/gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c
> index a14922f1c12..50317b7abe5 100644
> --- a/gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c
> +++ b/gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c
> @@ -42,7 +42,6 @@ main (int argc, char **argv)
>int index = 0;
>  
>  /* { dg-final { scan-assembler-times "bfi" 2 { target arm_softfloat } } } */
> -/* { dg-final { scan-assembler-times "lsr" 1 { target arm_softfloat } } } */
>for (index; index < N; index++)
>  {
>if (__builtin_copysignf (a_f[index], b_f[index]) != c_f[index])

OK.
R.


Re: [PATCH] internal-fn: Reuse SUBREG_PROMOTED_VAR_P handling

2024-07-10 Thread Richard Sandiford
Thanks for the review.

Jeff Law  writes:
> On 7/10/24 9:44 AM, Richard Sandiford wrote:
>> expand_fn_using_insn has code to handle SUBREG_PROMOTED_VAR_P
>> destinations.  Specifically, for:
>> 
>>(subreg/v:M1 (reg:M2 R) ...)
>> 
>> it creates a new temporary register T, uses it for the output
>> operand, then sign- or zero-extends the M1 lowpart of T to M2,
>> storing the result in R.
>> 
>> This patch splits this handling out into helper routines and
>> uses them for other instances of:
>> 
>>if (!rtx_equal_p (target, ops[0].value))
>>  emit_move_insn (target, ops[0].value);
>> 
>> It's quite probable that this doesn't help any of the other cases;
>> in particular, it shouldn't affect vectors.  But I think it could
>> be useful for the CRC work.
>> 
>> Bootstrapped & regression-tested on aarch64-linux-gnu.  OK to install?
>> 
>> Richard
>> 
>> 
>> gcc/
>>  * internal-fn.cc (create_call_lhs_operand, assign_call_lhs): New
>>  functions, split out from...
>>  (expand_fn_using_insn): ...here.
>>  (expand_load_lanes_optab_fn): Use them.
>>  (expand_GOMP_SIMT_ENTER_ALLOC): Likewise.
>>  (expand_GOMP_SIMT_LAST_LANE): Likewise.
>>  (expand_GOMP_SIMT_ORDERED_PRED): Likewise.
>>  (expand_GOMP_SIMT_VOTE_ANY): Likewise.
>>  (expand_GOMP_SIMT_XCHG_BFLY): Likewise.
>>  (expand_GOMP_SIMT_XCHG_IDX): Likewise.
>>  (expand_partial_load_optab_fn): Likewise.
>>  (expand_vec_cond_optab_fn): Likewise.
>>  (expand_vec_cond_mask_optab_fn): Likewise.
>>  (expand_RAWMEMCHR): Likewise.
>>  (expand_gather_load_optab_fn): Likewise.
>>  (expand_while_optab_fn): Likewise.
>>  (expand_SPACESHIP): Likewise.
> OK.
>
> FWIW, we did some testing for cases where we didn't utilize the 
> SUBREG_PROMOTED* bits to eliminate extensions coming out of the 
> expansion phase.  For the most part we're doing a good job.  IIRC our 
> instrumentation showed they tended to sneak in mostly through expansion 
> of builtins (particularly the arithmetic overflow builtins).

That sounds great!  I can see why SUBREG_PROMOTED* was a nice hack,
but it's also a source of subtle bugs, and can sometimes mean that
we generate extensions that aren't really needed.  It would be good
if the optimisers are getting to the state where we could remove
it and express things in "natural rtl".

(Admittedly I'm using "natural rtl" to mean "rtl that seems
sensible to me".  It's not very objective.)

On a similar theme, have you ever tried getting rid of
WORD_REGISTER_OPERATIONS for riscv?  Kyrill did that in 2016
for aarch64 (56c9ef5f2fa5787ddd7b2c83804a46554fa1ffc9) and I've
never seen it cause a missed optimisation.  There too, we seem
to get good results using natural rtl.

Richard


Re: [Committed V2 1/2] RISC-V: Add support for B standard extension

2024-07-10 Thread Edwin Lu

Committed!

Edwin

On 7/9/2024 12:07 PM, Jeff Law wrote:



On 7/9/24 11:44 AM, Edwin Lu wrote:
This patch adds support for recognizing the B standard extension to be the
collection of Zba, Zbb, Zbs extensions for consistency and conciseness across
toolchains.

* https://github.com/riscv/riscv-b/tags

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add imply rules for B
  extension
* config/riscv/arch-canonicalize: Ditto
Both of these patches are fine.  And a good reminder, I'll change my 
most recently submitted patch to use "gcb" since it fits the pattern 
of zba_zbb_zbs.


jeff



Re: [PATCH] testsuite: Align testcase with implementation [PR105090]

2024-07-10 Thread Torbjorn SVENSSON



On 2024-07-10 18:41, Richard Earnshaw (lists) wrote:

On 10/07/2024 17:26, Torbjörn SVENSSON wrote:

Is this ok for the following branches?
- trunk
- releases/gcc-14
- releases/gcc-13

--

Since r13-1006-g2005b9b888eeac, the test case copysign_softfloat_1.c
no longer contains any lsr instruction, so drop the check as per
comment 9 in PR105090.

gcc/testsuite/ChangeLog:

PR target/105090
* gcc.target/arm/copysign_softfloat_1.c: Drop check for lsr

Signed-off-by: Torbjörn SVENSSON 
---
  gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c b/gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c
index a14922f1c12..50317b7abe5 100644
--- a/gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c
+++ b/gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c
@@ -42,7 +42,6 @@ main (int argc, char **argv)
int index = 0;
  
  /* { dg-final { scan-assembler-times "bfi" 2 { target arm_softfloat } } } */

-/* { dg-final { scan-assembler-times "lsr" 1 { target arm_softfloat } } } */
for (index; index < N; index++)
  {
if (__builtin_copysignf (a_f[index], b_f[index]) != c_f[index])


OK.
R.


Pushed as:

basepoints/gcc-15-1950-g4865a92b350
releases/gcc-14.1.0-232-ge7d81cf551b
releases/gcc-13.3.0-121-g4f6f63f2cfc

Kind regards,
Torbjörn


Re: [PATCH] internal-fn: Reuse SUBREG_PROMOTED_VAR_P handling

2024-07-10 Thread Jeff Law




On 7/10/24 10:48 AM, Richard Sandiford wrote:

Thanks for the review.

Jeff Law  writes:

On 7/10/24 9:44 AM, Richard Sandiford wrote:

expand_fn_using_insn has code to handle SUBREG_PROMOTED_VAR_P
destinations.  Specifically, for:

(subreg/v:M1 (reg:M2 R) ...)

it creates a new temporary register T, uses it for the output
operand, then sign- or zero-extends the M1 lowpart of T to M2,
storing the result in R.

This patch splits this handling out into helper routines and
uses them for other instances of:

if (!rtx_equal_p (target, ops[0].value))
  emit_move_insn (target, ops[0].value);

It's quite probable that this doesn't help any of the other cases;
in particular, it shouldn't affect vectors.  But I think it could
be useful for the CRC work.

Bootstrapped & regression-tested on aarch64-linux-gnu.  OK to install?

Richard


gcc/
* internal-fn.cc (create_call_lhs_operand, assign_call_lhs): New
functions, split out from...
(expand_fn_using_insn): ...here.
(expand_load_lanes_optab_fn): Use them.
(expand_GOMP_SIMT_ENTER_ALLOC): Likewise.
(expand_GOMP_SIMT_LAST_LANE): Likewise.
(expand_GOMP_SIMT_ORDERED_PRED): Likewise.
(expand_GOMP_SIMT_VOTE_ANY): Likewise.
(expand_GOMP_SIMT_XCHG_BFLY): Likewise.
(expand_GOMP_SIMT_XCHG_IDX): Likewise.
(expand_partial_load_optab_fn): Likewise.
(expand_vec_cond_optab_fn): Likewise.
(expand_vec_cond_mask_optab_fn): Likewise.
(expand_RAWMEMCHR): Likewise.
(expand_gather_load_optab_fn): Likewise.
(expand_while_optab_fn): Likewise.
(expand_SPACESHIP): Likewise.

OK.

FWIW, we did some testing for cases where we didn't utilize the
SUBREG_PROMOTED* bits to eliminate extensions coming out of the
expansion phase.  For the most part we're doing a good job.  IIRC our
instrumentation showed they tended to sneak in mostly through expansion
of builtins (particularly the arithmetic overflow builtins).


That sounds great!  I can see why SUBREG_PROMOTED* was a nice hack,
but it's also a source of subtle bugs, and can sometimes mean that
we generate extensions that aren't really needed.  It would be good
if the optimisers are getting to the state where we could remove
it and express things in "natural rtl".
They're getting better :-)  But we're not there yet.  I sketched out 
exposing the ABI extension requirements in RTL for REE.  I never got it 
working well enough to claim a successful POC.





(Admittedly I'm using "natural rtl" to mean "rtl that seems
sensible to me".  It's not very objective.)

On a similar theme, have you ever tried getting rid of
WORD_REGISTER_OPERATIONS for riscv?  Kyrill did that in 2016
for aarch64 (56c9ef5f2fa5787ddd7b2c83804a46554fa1ffc9) and I've
never seen it cause a missed optimisation.  There too, we seem
to get good results using natural rtl.

We haven't tried, but it's on the list of things to explore.

jeff



[PATCH] PR 115800: Allow builds of little endian powerpc using --with-cpu=power5

2024-07-10 Thread Michael Meissner
The following two patches allow GCC to be built with a little endian
target where the default CPU is power5.  In particular, both the libstdc++-v3
and libgfortran libraries assumed that any little endian powerpc system would
support IEEE 128-bit floating point.  However, supporting IEEE 128-bit requires
the VSX register set, so the IEEE 128-bit support would give errors when compiled.

I have built GCC with these options on both little endian and big endian
systems, and there were no regressions.  I have also built a little endian
compiler using the --with-cpu=power5 option, and it built correctly.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH 1/2] PR 115800: Fix libgfortran build using --with-cpu=power5

2024-07-10 Thread Michael Meissner
If you build a little endian compiler and select a default CPU of power5
(i.e. --with-cpu=power5), GCC cannot be built.  The reason is that both the
libgfortran and libstdc++-v3 libraries assume that all little endian powerpc
builds support IEEE 128-bit floating point.

However, if the default CPU does not support the VSX instruction set, then we
cannot build the IEEE 128-bit libraries.  This patch fixes the libgfortran
library so that if the GCC compiler does not support IEEE 128-bit floating point,
the IEEE 128-bit floating point libraries are not built.  A companion patch will
fix the libstdc++-v3 library.

I have built these patches on a little endian system, doing both normal builds,
and making a build with a power5 default.  There was no regression in the normal
builds.  I have also built a big endian GCC compiler and there was no regression
there.  Can I check this patch into the trunk?

2024-07-10  Michael Meissner  

libgfortran/

PR target/115800
* configure.ac (powerpc64le*-linux*): Check to see that the compiler
uses VSX before enabling IEEE 128-bit support.
* configure: Regenerate.
* kinds-override.h (GFC_REAL_17): Add check for __VSX__.
* libgfortran.h (POWER_IEEE128): Likewise.

---
 libgfortran/configure| 7 +--
 libgfortran/configure.ac | 3 +++
 libgfortran/kinds-override.h | 2 +-
 libgfortran/libgfortran.h| 2 +-
 4 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/libgfortran/configure b/libgfortran/configure
index 11a1bc5f070..2708e5c7eca 100755
--- a/libgfortran/configure
+++ b/libgfortran/configure
@@ -5981,6 +5981,9 @@ if test "x$GCC" = "xyes"; then
 #if __SIZEOF_LONG_DOUBLE__ != 16
   #error long double is double
   #endif
+  #if !defined(__VSX__)
+  #error VSX is not available
+  #endif
 int
 main ()
 {
@@ -12847,7 +12850,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12850 "configure"
+#line 12853 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -12953,7 +12956,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12956 "configure"
+#line 12959 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
diff --git a/libgfortran/configure.ac b/libgfortran/configure.ac
index cca1ea0ea97..cfaeb9717ab 100644
--- a/libgfortran/configure.ac
+++ b/libgfortran/configure.ac
@@ -148,6 +148,9 @@ if test "x$GCC" = "xyes"; then
   AC_PREPROC_IFELSE(
 [AC_LANG_PROGRAM([[#if __SIZEOF_LONG_DOUBLE__ != 16
   #error long double is double
+  #endif
+  #if !defined(__VSX__)
+  #error VSX is not available
   #endif]],
  [[(void) 0;]])],
 [AM_FCFLAGS="$AM_FCFLAGS -mabi=ibmlongdouble -mno-gnu-attribute";
diff --git a/libgfortran/kinds-override.h b/libgfortran/kinds-override.h
index f6b4956c5ca..51f440e5323 100644
--- a/libgfortran/kinds-override.h
+++ b/libgfortran/kinds-override.h
@@ -30,7 +30,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #endif
 
 /* Keep these conditions on one line so grep can filter it out.  */
-#if defined(__powerpc64__)  && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__  && __SIZEOF_LONG_DOUBLE__ == 16
+#if defined(__powerpc64__)  && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__  && __SIZEOF_LONG_DOUBLE__ == 16 && defined(__VSX__)
 typedef _Float128 GFC_REAL_17;
 typedef _Complex _Float128 GFC_COMPLEX_17;
 #define HAVE_GFC_REAL_17
diff --git a/libgfortran/libgfortran.h b/libgfortran/libgfortran.h
index 5c59ec26e16..23660335243 100644
--- a/libgfortran/libgfortran.h
+++ b/libgfortran/libgfortran.h
@@ -104,7 +104,7 @@ typedef off_t gfc_offset;
 #endif
 
 #if defined(__powerpc64__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ \
-&& defined __GLIBC_PREREQ
+&& defined __GLIBC_PREREQ && defined(__VSX__)
 #if __GLIBC_PREREQ (2, 32)
 #define POWER_IEEE128 1
 #endif
-- 
2.45.2


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

