Re: [Patch, v2] OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR11555]

2024-07-29 Thread Andre Vehreschild
Hi Tobias,

I am wondering why the testcase has no `!{ dg-do ... }` line. What will DejaGnu
do then? Sorry for the maybe stupid question, but I have never encountered a
testcase without a dg-do line. It was the minimum for me.

Besides that the patch looks ok to me. 

- Andre

On Fri, 26 Jul 2024 20:34:18 +0200
Tobias Burnus  wrote:

> Updated patch - only change is to the testcase:
> 
> * With the just-posted patch for PR116107, array sections with an offset
> work for 'link'; hence, I updated the testcase.
> 
> * For 'arr2', I added a reference to the associated PR.
> 
> I intend to commit it once PR116107 has been committed.
> 
> Tobias
> 
> Tobias Burnus wrote:
> > Hi all,
> >
> > it turned out that 'declare target' with 'link' clause was broken in
> > multiple ways.
> >
> > The main fix is the attached patch, namely pushing the variables to
> > the offload-vars list already in the FE.
> >
> > When implementing it, I noticed:
> > * C has a similar issue when using nested functions, which are
> >   a GNU extension → https://gcc.gnu.org/115574
> >
> > * When doing partial mapping of arrays (which is one of the reasons for
> >   'link'), offsets are mishandled in Fortran (not tested in C); see FIXME
> >   in the patch. There, arr2(10) should print 10, but with map(arr2(10:))
> >   it prints 19. (I will file a PR about this.)
> >
> > * It might happen that linked variables do not get linked. I have not
> >   investigated why, but 'arr2' gives link errors while 'arr' works.
> >   See FIXME in the patch. (I will file a PR about this.)
> >
> > * For COMMON blocks, map(/common/) is rejected, https://gcc.gnu.org/PR115577
> >
> > * When instead using map(a,b,c), which is equivalent for 'common /mycom/
> >   a,b,c', linking fails on the device side as the 'mycom_' symbol cannot
> >   be found there. (I will file a PR about this.)
> >
> > As COMMON has issues, an alternative would be to defer the trans-common.cc
> > changes to a later patch.
> >
> > Comments, questions, concerns?
> >
> > Tobias
> >
> > PS: Tested with nvptx offloading on a system supporting page migration,
> > with nvptx and GCN offloading configured; no new failures observed.
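For readers less familiar with 'link', here is a minimal C sketch of the partial-mapping use case (not the Fortran testcase from the patch; the array name, size and section bounds are made up). It assumes compilation with -fopenmp; without an offload device the target region falls back to the host, where it should return 10, matching what the FIXME says arr2(10) should print. Whether a given GCC version gets this right on an actual device is what the PRs above are about.

```c
#define N 100

/* File-scope array; with 'link', the device copy is only created when
   arr (or a section of it) is explicitly mapped.  */
int arr[N];
#pragma omp declare target link(arr)

/* Fill arr[i] = i on the host, map only the section arr[10:N-10], and
   read element 10 inside the target region.  The expected result is 10,
   i.e. the offset of the mapped section must not shift the index.  */
int
read_elem_10 (void)
{
  int val = -1;

  for (int i = 0; i < N; i++)
    arr[i] = i;

#pragma omp target map(to: arr[10:N-10]) map(from: val)
  val = arr[10];

  return val;
}
```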


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [Patch, v2] OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR11555]

2024-07-29 Thread Tobias Burnus

Hi Andre,

Andre Vehreschild wrote:

> I am wondering why the testcase has no `!{ dg-do ... }` line. What will DejaGnu
> do then? Sorry for the maybe stupid question, but I have never encountered a
> testcase without a dg-do line. It was the minimum for me.


Well, then you need to look harder ;-)

In gcc/testsuite/, the default is '{ dg-do compile }', i.e. you can
specify or leave out that line without any additional effect. Having it
might be a tad clearer, albeit it makes the test a tad longer.

But if you want to 'run' or 'link', you need to specify the dg-do line.
There are several files without a "dg-do compile" line, also
under gcc/testsuite/gfortran.dg.

In the case of libgomp, it becomes interesting: the default is running
the code, i.e. you need 'dg-do compile' or 'dg-do link' if it shouldn't be run.

However, at least for Fortran (libgomp.{oacc-}fortran), there is a
difference between specifying nothing and specifying 'dg-do run': In
case of the default, it is compiled and run. But if you specify 'dg-do
run', it is compiled multiple times with different optimization options
and then run.

(Actually, also under gcc/testsuite/gfortran.dg, you get multiple
compilations + runs with 'dg-do run'. If you use dg-additional-options,
you can also add options. I think with dg-options, you set it to a
single run [not confirmed].)

The downside of compiling + running it multiple times is a longer test
time without any real benefit. However, especially with Fortran,
compiling with different optimization levels did expose issues in the
past, both in the Fortran front end and in the middle end. Thus, there
is some benefit in using it.

In any case, the more complex the code that the front end and middle end
have to process, the more useful "dg-do run" is. The more work is
done by the run-time library, be it libgfortran or libgomp, the less
useful it becomes, as the heavy lifting is done in the run-time library.
As running the libgomp testsuite already takes quite some time (albeit it
can now run in parallel), some prefer few 'dg-do run' tests while
others prefer that all Fortran testcases there use 'dg-do run' …
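As an illustration, a hypothetical testcase header; the file location and the option passed via dg-additional-options are assumptions, not taken from the patch:

```c
/* Hypothetical header of a gcc/testsuite/gcc.dg testcase (sketch only;
   "-fno-inline" is an arbitrary example option).  */
/* { dg-do run } */
/* { dg-additional-options "-fno-inline" } */

/* Without the dg-do line, the default under gcc/testsuite/ is 'compile':
   the test would be compiled but never executed.
   Under libgomp/, lib/libgomp.exp sets dg-do-what-default to 'run', so
   'dg-do compile' or 'dg-do link' is needed there if the test must not
   run; for libgomp.fortran, an explicit 'dg-do run' additionally
   compiles and runs the test with several optimization options.  */
```

Note that misspellings such as 'dg-run' or '{dg-do run }' (no space after the brace) do not make the test run; the default action silently applies instead.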

I hope it helps,

Tobias



Re: [PATCH v3 1/3] aarch64: Add march flags for +fp8 arch extensions

2024-07-29 Thread Kyrylo Tkachov
Hi Claudio,

> On 26 Jul 2024, at 18:32, Claudio Bantaloukas  
> wrote:
> 
> This introduces the relevant flags to enable access to the fpmr register and 
> fp8 intrinsics, which will be added subsequently.
> 
> gcc/ChangeLog:
> 
>	* config/aarch64/aarch64-option-extensions.def (fp8): New.
>	* config/aarch64/aarch64.h (TARGET_FP8): Likewise.
>	* doc/invoke.texi (AArch64 Options): Document new -march flags
>	and extensions.
> 
> gcc/testsuite/ChangeLog:
> 
>	* gcc.target/aarch64/acle/fp8.c: New test.

Thanks, this looks ok to me now.
One question about the command-line flag.
FP8 defines instructions for Advanced SIMD, SVE and SME.
Is the “+fp8” option in this patch intended to combine with the +sve and +sme 
options to indicate the presence of these ISA-specific subsets? That is, you’re 
not planning to introduce something like +sve-fp8, +sme-fp8?
Kyrill


> ---
> .../aarch64/aarch64-option-extensions.def |  2 ++
> gcc/config/aarch64/aarch64.h  |  3 +++
> gcc/doc/invoke.texi   |  2 ++
> gcc/testsuite/gcc.target/aarch64/acle/fp8.c   | 20 +++
> 4 files changed, 27 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8.c
> 
> diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
> index 42ec0eec31e..6998627f377 100644
> --- a/gcc/config/aarch64/aarch64-option-extensions.def
> +++ b/gcc/config/aarch64/aarch64-option-extensions.def
> @@ -232,6 +232,8 @@ AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the")
> 
> AARCH64_OPT_EXTENSION("gcs", GCS, (), (), (), "gcs")
> 
> +AARCH64_OPT_EXTENSION("fp8", FP8, (SIMD), (), (), "fp8")
> +
> #undef AARCH64_OPT_FMV_EXTENSION
> #undef AARCH64_OPT_EXTENSION
> #undef AARCH64_FMV_FEATURE
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index b7e330438d9..2e75c6b81e2 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -463,6 +463,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
> && (aarch64_tune_params.extra_tuning_flags \
> & AARCH64_EXTRA_TUNE_AVOID_PRED_RMW))
> 
> +/* fp8 instructions are enabled through +fp8.  */
> +#define TARGET_FP8 AARCH64_HAVE_ISA (FP8)
> +
> /* Standard register usage.  */
> 
> /* 31 64-bit general purpose registers R0-R30:
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 9fb0925ed29..7cbcd8ad1b4 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -21848,6 +21848,8 @@ Enable support for Armv9.4-a Guarded Control Stack extension.
> Enable support for Armv8.9-a/9.4-a translation hardening extension.
> @item rcpc3
> Enable the RCpc3 (Release Consistency) extension.
> +@item fp8
> +Enable the fp8 (8-bit floating point) extension.
> 
> @end table
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/fp8.c b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
> new file mode 100644
> index 000..459442be155
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
> @@ -0,0 +1,20 @@
> +/* Test the fp8 ACLE intrinsics family.  */
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -march=armv8-a" } */
> +
> +#include 
> +
> +#ifdef __ARM_FEATURE_FP8
> +#error "__ARM_FEATURE_FP8 feature macro defined."
> +#endif
> +
> +#pragma GCC push_options
> +#pragma GCC target("arch=armv9.4-a+fp8")
> +
> +/* We do not define __ARM_FEATURE_FP8 until all
> +   relevant features have been added. */
> +#ifdef __ARM_FEATURE_FP8
> +#error "__ARM_FEATURE_FP8 feature macro defined."
> +#endif
> +
> +#pragma GCC pop_options



Re: [PATCH v3 3/3] aarch64: Add fpm register helper functions.

2024-07-29 Thread Kyrylo Tkachov
Hi Claudio,

> On 26 Jul 2024, at 18:32, Claudio Bantaloukas  
> wrote:
> 
> The ACLE declares several helper types and functions to facilitate construction
> of `fpm` arguments. These are available when one of the arm_neon.h, arm_sve.h,
> or arm_sme.h headers is included. These helpers don't map to specific FP8
> instructions and there's no expectation that they will produce a given code
> sequence; they're just an abstraction and an aid to the programmer. Thus they
> are implemented in a new header file, arm_private_fp8.h.
> Users are not expected to include this file, as it is a mere implementation
> detail, subject to change. A check is included to guard against direct inclusion.
> 
> gcc/ChangeLog:
> 
>	* config.gcc (extra_headers): Install arm_private_fp8.h.
>	* config/aarch64/arm_neon.h: Include arm_private_fp8.h.
>	* config/aarch64/arm_sve.h: Likewise.
>	* config/aarch64/arm_private_fp8.h: New file.
>	(fpm_t): New type representing fpmr values.
>	(enum __ARM_FPM_FORMAT): New enum representing valid fp8 formats.
>	(enum __ARM_FPM_OVERFLOW): New enum representing how some fp8
>	calculations work.
>	(__arm_fpm_init): New.
>	(__arm_set_fpm_src1_format): Likewise.
>	(__arm_set_fpm_src2_format): Likewise.
>	(__arm_set_fpm_dst_format): Likewise.
>	(__arm_set_fpm_overflow_cvt): Likewise.
>	(__arm_set_fpm_overflow_mul): Likewise.
>	(__arm_set_fpm_lscale): Likewise.
>	(__arm_set_fpm_lscale2): Likewise.
>	(__arm_set_fpm_nscale): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>	* gcc.target/aarch64/acle/fp8-helpers-neon.c: New test of fpmr helper
>	functions.
>	* gcc.target/aarch64/acle/fp8-helpers-sve.c: New test of fpmr helper
>	functions presence.
>	* gcc.target/aarch64/acle/fp8-helpers-sme.c: New test of fpmr helper
>	functions presence.
> ---
> gcc/config.gcc|  2 +-
> gcc/config/aarch64/arm_neon.h |  1 +
> gcc/config/aarch64/arm_private_fp8.h  | 80 +++
> gcc/config/aarch64/arm_sve.h  |  1 +
> .../aarch64/acle/fp8-helpers-neon.c   | 53 
> .../gcc.target/aarch64/acle/fp8-helpers-sme.c | 12 +++
> .../gcc.target/aarch64/acle/fp8-helpers-sve.c | 12 +++
> 7 files changed, 160 insertions(+), 1 deletion(-)
> create mode 100644 gcc/config/aarch64/arm_private_fp8.h
> create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-neon.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sme.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sve.c
> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 7453ade0782..a36dd1bcbc6 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -347,7 +347,7 @@ m32c*-*-*)
> ;;
> aarch64*-*-*)
> cpu_type=aarch64
> - extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h arm_sme.h arm_neon_sve_bridge.h"
> + extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h arm_sme.h arm_neon_sve_bridge.h arm_private_fp8.h"
> c_target_objs="aarch64-c.o"
> cxx_target_objs="aarch64-c.o"
> d_target_objs="aarch64-d.o"
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index c4a09528ffd..e376685489d 100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -30,6 +30,7 @@
> #pragma GCC push_options
> #pragma GCC target ("+nothing+simd")
> 
> +#include 
> #pragma GCC aarch64 "arm_neon.h"
> 
> #include 
> diff --git a/gcc/config/aarch64/arm_private_fp8.h b/gcc/config/aarch64/arm_private_fp8.h
> new file mode 100644
> index 000..ba93bc526c1
> --- /dev/null
> +++ b/gcc/config/aarch64/arm_private_fp8.h
> @@ -0,0 +1,80 @@
> +/* AArch64 FP8 helper functions.
> +   Do not include this file directly. Use one of arm_neon.h
> +   arm_sme.h arm_sve.h instead.
> +
> +   Copyright (C) 2024 Free Software Foundation, Inc.
> +   Contributed by ARM Ltd.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published
> +   by the Free Software Foundation; either version 3, or (at your
> +   option) any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but WITHOUT
> +   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +   License for more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> 

[committed] testsuite: Fix up consteval-prop21.C for 32-bit targets [PR115986]

2024-07-29 Thread Jakub Jelinek
On Sat, Jul 27, 2024 at 04:26:07PM -0400, Jason Merrill wrote:
>   * g++.dg/cpp2a/consteval-prop21.C: New test.

The test fails on 32-bit targets (which don't support the __int128 type).
Using unsigned long long instead still ICEs before the fix and passes
after it on those targets.

Tested on x86_64-linux with
GXX_TESTSUITE_STDS=98,11,14,17,20,23,26 make check-g++ RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} dg.exp=consteval-prop21.C'
and committed to trunk as obvious.
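The commit keeps the test portable by switching to unsigned long long. Where code should still use __int128 when the target provides it, one common alternative is a guard on GCC's predefined __SIZEOF_INT128__ macro; a minimal C sketch (wide_uint and wide_uint_bits are hypothetical names):

```c
/* __SIZEOF_INT128__ is only predefined on targets that support __int128,
   which excludes typical 32-bit targets such as -m32 on x86.  */
#ifdef __SIZEOF_INT128__
__extension__ typedef unsigned __int128 wide_uint;
#else
typedef unsigned long long wide_uint;
#endif

/* Width of wide_uint in bits: 128 where __int128 exists, otherwise 64
   (assuming 8-bit bytes and a 64-bit unsigned long long).  */
static unsigned
wide_uint_bits (void)
{
  return (unsigned) (sizeof (wide_uint) * 8);
}
```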

Jakub



Re: [Patch, v2] OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR11555]

2024-07-29 Thread Andre Vehreschild
Thanks a lot Tobias,

yes, I could have looked harder :-)

Is this by any chance documented somewhere on the GCC developer website?
It would be sad if that knowledge were not publicly available for the future.

Thanks again for the explanation and keep up the good work.

Regards,
Andre

On Mon, 29 Jul 2024 09:29:28 +0200
Tobias Burnus  wrote:

> Hi Andre,
> 
> Andre Vehreschild wrote:
> > I am wondering why the testcase has no `!{ dg-do ... }` line. What will
> > DejaGnu do then? Sorry for the maybe stupid question, but I have never
> > encountered a testcase without a dg-do line. It was the minimum for me.  
> 
> Well, then you need to look harder ;-)
> 
> In gcc/testsuite/, the default is '{ dg-do compile }', i.e. you can
> specify or leave out that line without any additional effect. Having it
> might be a tad clearer, albeit it makes the test a tad longer.
> 
> But if you want to 'run' or 'link', you need to specify the dg-do line.
> There are several files without a "dg-do compile" line, also
> under gcc/testsuite/gfortran.dg.
> 
> In the case of libgomp, it becomes interesting: the default is running
> the code, i.e. you need 'dg-do compile' or 'dg-do link' if it shouldn't be run.
> 
> However, at least for Fortran (libgomp.{oacc-}fortran), there is a
> difference between specifying nothing and specifying 'dg-do run': In
> case of the default, it is compiled and run. But if you specify 'dg-do
> run', it is compiled multiple times with different optimization options
> and then run.
> 
> (Actually, also under gcc/testsuite/gfortran.dg, you get multiple
> compilations + runs with 'dg-do run'. If you use dg-additional-options,
> you can also add options. I think with dg-options, you set it to a
> single run [not confirmed].)
> 
> The downside of compiling + running it multiple times is a longer test
> time without any real benefit. However, especially with Fortran,
> compiling with different optimization levels did expose issues in the
> past, both in the Fortran front end and in the middle end. Thus, there
> is some benefit in using it.
> 
> In any case, the more complex the code that the front end and middle end
> have to process, the more useful "dg-do run" is. The more work is
> done by the run-time library, be it libgfortran or libgomp, the less
> useful it becomes, as the heavy lifting is done in the run-time library.
> As running the libgomp testsuite already takes quite some time (albeit it
> can now run in parallel), some prefer few 'dg-do run' tests while
> others prefer that all Fortran testcases there use 'dg-do run' …
> 
> I hope it helps,
> 
> Tobias
> 


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


[PATCH] c++: Fix up error recovery of invalid structured bindings used in conditions [PR116113]

2024-07-29 Thread Jakub Jelinek
Hi!

The following testcase ICEs because, for structured binding error recovery,
DECL_DECOMP_BASE is kept NULL, while the newly added code to pick up the saved
value from the base assumes that on structured binding bases the
TARGET_EXPR will always be there (which is the case only if there are no errors).

The following patch fixes it by testing DECL_DECOMP_BASE before
dereferencing it, another option would be not to do that if
error_operand_p (cond).
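The shape of the fix, a non-NULL test placed before the dereference so that && short-circuits, can be sketched in plain C. The struct, enum and function below are hypothetical stand-ins for GCC trees and the DECL_DECOMP_BASE / TARGET_EXPR_SLOT accessors, not GCC's actual data structures:

```c
#include <stddef.h>

/* Hypothetical stand-ins for tree codes.  */
enum node_code { OTHER_CODE, TARGET_EXPR_CODE };

struct node
{
  enum node_code code;
  struct node *base;  /* plays the role of DECL_DECOMP_BASE; NULL after an error */
  struct node *slot;  /* plays the role of TARGET_EXPR_SLOT */
};

/* If COND's base is a TARGET_EXPR-like node, return its slot; otherwise
   return COND unchanged.  The base != NULL test comes first, so the
   base->code access is skipped when error recovery left base NULL.  */
static struct node *
pick_saved_value (struct node *cond)
{
  if (cond->base != NULL
      && cond->base->code == TARGET_EXPR_CODE)
    return cond->base->slot;
  return cond;
}
```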

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-07-29  Jakub Jelinek  

PR c++/116113
* semantics.cc (maybe_convert_cond): Check DECL_DECOMP_BASE
is non-NULL before dereferencing it.
(finish_switch_cond): Likewise.

* g++.dg/cpp26/decomp11.C: New test.

--- gcc/cp/semantics.cc.jj  2024-07-24 15:47:15.238477295 +0200
+++ gcc/cp/semantics.cc 2024-07-27 09:44:44.537658588 +0200
@@ -972,6 +972,7 @@ maybe_convert_cond (tree cond)
  result in a TARGET_EXPR, pick it up from there.  */
   if (DECL_DECOMPOSITION_P (cond)
   && DECL_DECOMP_IS_BASE (cond)
+  && DECL_DECOMP_BASE (cond)
   && TREE_CODE (DECL_DECOMP_BASE (cond)) == TARGET_EXPR)
 cond = TARGET_EXPR_SLOT (DECL_DECOMP_BASE (cond));
 
@@ -1714,6 +1715,7 @@ finish_switch_cond (tree cond, tree swit
 conversion result in a TARGET_EXPR, pick it up from there.  */
   if (DECL_DECOMPOSITION_P (cond)
  && DECL_DECOMP_IS_BASE (cond)
+ && DECL_DECOMP_BASE (cond)
  && TREE_CODE (DECL_DECOMP_BASE (cond)) == TARGET_EXPR)
cond = TARGET_EXPR_SLOT (DECL_DECOMP_BASE (cond));
   cond = build_expr_type_conversion (WANT_INT | WANT_ENUM, cond, true);
--- gcc/testsuite/g++.dg/cpp26/decomp11.C.jj	2024-07-27 09:49:54.931612663 +0200
+++ gcc/testsuite/g++.dg/cpp26/decomp11.C	2024-07-27 09:52:09.411859739 +0200
@@ -0,0 +1,19 @@
+// PR c++/116113
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+extern int b[];
+
+void
+foo ()
+{
+  auto [a] = b;	// { dg-error "is incomplete" }
+		// { dg-warning "structured bindings only available with" "" { target c++14_down } .-1 }
+  if (a)
+    ;
+  switch (a)
+    {
+    default:
+      break;
+    }
+}

Jakub



[PING^0][Patch, rs6000, middle-end] v7: Add implementation for different targets for pair mem fusion

2024-07-29 Thread Ajit Agarwal


Hello Richard:

Did you get a chance to look at the changes? Ok to install?

Thanks & Regards
Ajit

 Forwarded Message 
Subject: [Patch, rs6000, middle-end] v7: Add implementation for different targets for pair mem fusion
Date: Fri, 19 Jul 2024 14:46:13 +0530
From: Ajit Agarwal 
To: Alex Coplan , Richard Sandiford 
, Kewen.Lin , Segher 
Boessenkool , Michael Meissner 
, Peter Bergner , David Edelsohn 
, gcc-patches 

Hello Richard:

All comments are addressed.

Common infrastructure using generic code for pair mem fusion of different
targets.

The rs6000 target-specific code implements the virtual functions defined by the generic code.

Target-specific code is added in rs6000-mem-fusion.cc.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit


rs6000, middle-end: Add implementation for different targets for pair mem fusion

Common infrastructure using generic code for pair mem fusion of different
targets.

The rs6000 target-specific code implements the virtual functions defined by the generic code.

Target-specific code is added in rs6000-mem-fusion.cc.

2024-07-19  Ajit Kumar Agarwal  

gcc/ChangeLog:

* config/rs6000/rs6000-passes.def: New mem fusion pass
before pass_early_remat.
* pair-fusion.h: Add additional pure virtual function
required for rs6000 target implementation.
* pair-fusion.cc: Use the additional virtual functions
added for the rs6000 target.
* config/rs6000/rs6000-mem-fusion.cc: Add new pass.
Add target specific implementation for generic pure virtual
functions.
* config/rs6000/mma.md: Modify movoo machine description.
Add new machine description movoo1.
* config/rs6000/rs6000.cc: Modify rs6000_split_multireg_move
to expand movoo machine description for all constraints.
* config.gcc: Add new object file.
* config/rs6000/rs6000-protos.h: Add new prototype for mem
fusion pass.
* config/rs6000/t-rs6000: Add new rule.
* rtl-ssa/functions.h: Move out allocate function from private
to public and add get_m_temp_defs function.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/mem-fusion.C: New test.
* g++.target/powerpc/mem-fusion-1.C: New test.
* gcc.target/powerpc/mma-builtin-1.c: Modify test.
---
 gcc/config.gcc|   2 +
 gcc/config/rs6000/mma.md  |  26 +-
 gcc/config/rs6000/rs6000-mem-fusion.cc| 746 ++
 gcc/config/rs6000/rs6000-passes.def   |   4 +-
 gcc/config/rs6000/rs6000-protos.h |   1 +
 gcc/config/rs6000/rs6000.cc   |  58 +-
 gcc/config/rs6000/rs6000.md   |   1 +
 gcc/config/rs6000/t-rs6000|   5 +
 gcc/pair-fusion.cc|  32 +-
 gcc/pair-fusion.h |  48 ++
 gcc/rtl-ssa/functions.h   |  11 +-
 .../g++.target/powerpc/mem-fusion-1.C |  22 +
 gcc/testsuite/g++.target/powerpc/mem-fusion.C |  15 +
 .../gcc.target/powerpc/mma-builtin-1.c|   4 +-
 14 files changed, 946 insertions(+), 29 deletions(-)
 create mode 100644 gcc/config/rs6000/rs6000-mem-fusion.cc
 create mode 100644 gcc/testsuite/g++.target/powerpc/mem-fusion-1.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/mem-fusion.C

diff --git a/gcc/config.gcc b/gcc/config.gcc
index bc45615741b..12f79a78177 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -524,6 +524,7 @@ powerpc*-*-*)
extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
+   extra_objs="${extra_objs} rs6000-mem-fusion.o"
extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h"
extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
@@ -560,6 +561,7 @@ rs6000*-*-*)
extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
+   extra_objs="${extra_objs} rs6000-mem-fusion.o"
target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.cc \$(srcdir)/config/rs6000/rs6000-call.cc"
target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
;;
diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 04e2d0066df..88413926a02 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -294,7 +294,31 @@
 
 (define_insn_and_split "*movoo"
   [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,ZwO,wa")
-   (match_operand:OO 1 "input_operand" "ZwO,wa,wa"))]
+(match_operand:OO 1 "input_operand" "ZwO,wa,wa"))]
+  "TARGET_MMA
+   && (gpc_r

Re: [Patch, v2] OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR11555]

2024-07-29 Thread Tobias Burnus

Hi Andre, hi all,

Andre Vehreschild wrote:

> yes, I could have looked harder 🙂


I wrote ;-) on purpose as this feature is somewhat hidden and writing
'dg-do compile' doesn't hurt.


For gcc/testsuite, the 'run' is also needed and was often missed, or
invalid variants such as 'dg-run' (should be: 'dg-do run') or
'{dg-do run }' (missing space after '{') prevented the code from being run.
Sam fixed some of those (and some other dg-* issues) recently, e.g. in
r15-2349-ga75c6295252d0d (→ https://gcc.gnu.org/r15-2349-ga75c6295252d0d ).



> Is this by any chance documented somewhere on the GCC developer website?
> It would be sad if that knowledge were not publicly available for the future.


https://gcc.gnu.org/onlinedocs/gccint/Directives.html#Specify-how-to-build-the-test 
documents it.


And libgomp has: lib/libgomp.exp:set dg-do-what-default run

Using all (torture) optimization options vs. only -O2 is controlled in libgomp via:

libgomp.c++/c++.exp:    set DEFAULT_CFLAGS "-O2"

libgomp.c/c.exp:    set DEFAULT_CFLAGS "-O2"

and for libgomp.*fortran/fortran.exp, the difference between 'dg-do run'
and the default is *not documented*, but seems to be the result of the
following:


# For Fortran we're doing torture testing, as Fortran has far more tests
# with arrays etc. that testing just -O0 or -O2 is insufficient, that is
# typically not the case for C/C++.
gfortran-dg-runtest $tests "" ""


Tobias


Re: [PATCH] MATCH: add abs support for half float

2024-07-29 Thread Kugan Vivekanandarajah
On Thu, Jul 25, 2024 at 10:19 PM Richard Biener
 wrote:
>
> On Thu, Jul 25, 2024 at 4:42 AM Kugan Vivekanandarajah
>  wrote:
> >
> > On Tue, Jul 23, 2024 at 11:56 PM Richard Biener
> >  wrote:
> > >
> > > On Tue, Jul 23, 2024 at 10:27 AM Kugan Vivekanandarajah
> > >  wrote:
> > > >
> > > > On Tue, Jul 23, 2024 at 10:35 AM Andrew Pinski  
> > > > wrote:
> > > > >
> > > > > On Mon, Jul 22, 2024 at 5:26 PM Kugan Vivekanandarajah
> > > > >  wrote:
> > > > > >
> > > > > > Revised based on the comment and moved it into existing patterns as.
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > > * match.pd: Extend A CMP 0 ? A : -A into (type)A CMP 0 ? A : -A.
> > > > > > Extend A CMP 0 ? A : -A into (type) A CMP 0 ? A : -A.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > > * gcc.dg/tree-ssa/absfloat16.c: New test.
> > > > >
> > > > > The testcase needs to make sure it runs only for targets that support
> > > > > float16 so like:
> > > > >
> > > > > /* { dg-require-effective-target float16 } */
> > > > > /* { dg-add-options float16 } */
> > > > Added in the attached version.
> > >
> > > + /* (type)A >=/> 0 ? A : -A    same as abs (A) */
> > >   (for cmp (ge gt)
> > >(simplify
> > > -   (cnd (cmp @0 zerop) @1 (negate @1))
> > > -(if (!HONOR_SIGNED_ZEROS (TREE_TYPE(@0))
> > > -&& !TYPE_UNSIGNED (TREE_TYPE(@0))
> > > -&& bitwise_equal_p (@0, @1))
> > > +   (cnd (cmp (convert?@0 @1) zerop) @2 (negate @2))
> > > +(if (!HONOR_SIGNED_ZEROS (TREE_TYPE (@1))
> > > +&& !TYPE_UNSIGNED (TREE_TYPE (@1))
> > > +&& ((VECTOR_TYPE_P (type)
> > > + && tree_nop_conversion_p (TREE_TYPE (@0), TREE_TYPE (@1)))
> > > +   || (!VECTOR_TYPE_P (type)
> > > +   && (TYPE_PRECISION (TREE_TYPE (@1))
> > > +   <= TYPE_PRECISION (TREE_TYPE (@0)
> > > +&& bitwise_equal_p (@1, @2))
> > >
> > > I wonder about the bitwise_equal_p which tests @1 against @2 now
> > > with the convert still applied to @1 - that looks odd.  You are allowing
> > > sign-changing conversions but doesn't that change ge/gt behavior?
> > > Also why are sign/zero-extensions not OK for vector types?
> > Thanks for the review.
> > My main motivation here is for _Float16  as below.
> >
> > _Float16 absfloat16 (_Float16 x)
> > {
> >   float _1;
> >   _Float16 _2;
> >   _Float16 _4;
> >[local count: 1073741824]:
> >   _1 = (float) x_3(D);
> >   if (_1 < 0.0)
> > goto ; [41.00%]
> >   else
> > goto ; [59.00%]
> >    [local count: 440234144]:
> >   _4 = -x_3(D);
> >[local count: 1073741824]:
> >   # _2 = PHI <_4(3), x_3(D)(2)>
> >   return _2;
> > }
> >
> > This is why I added  bitwise_equal_p test of @1 against @2 with
> > TYPE_PRECISION checks.
> > I agree that I will have to check for sign-changing conversions.
> >
> > Just to keep it simple, I disallowed vector types. I am not sure if
> > this would  hit vec types. I am happy to handle this if that is
> > needed.
>
> I think with __builtin_convertvector you should be able to construct
> a testcase that does
Thanks.

For the pattern,
```
 /* A >=/> 0 ? A : -A    same as abs (A) */
 (for cmp (ge gt)
  (simplify
   (cnd (cmp @0 zerop) @1 (negate @1))
(if (!HONOR_SIGNED_ZEROS (TREE_TYPE(@0))
 && !TYPE_UNSIGNED (TREE_TYPE(@0))
 && bitwise_equal_p (@0, @1))
 (if (TYPE_UNSIGNED (type))
  (absu:type @0)
  (abs @0)
```
the vector type doesn't seem right. For example, if we have a 4-element
vector with some negative and some positive elements, I don't think it
makes sense. Also, we don't seem to generate (cmp @0 zerop). Am I
missing it completely?

Thanks,
Kugan
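A plain C sketch (not match.pd) of why the pattern needs both !HONOR_SIGNED_ZEROS and a check that the conversion does not narrow. It mirrors the GIMPLE above, where the comparison is done on a converted copy of x and the selection on x itself, but uses float/double in place of _Float16/float (an assumption, so it does not depend on _Float16 support) and assumes default IEEE semantics without -ffast-math:

```c
#include <math.h>  /* signbit, used when checking the results */

/* Widening conversion: (double) x has the same sign as x for every
   non-NaN x, so this matches fabsf (x) except for x == -0.0f, where it
   returns -0.0f instead of +0.0f.  Hence !HONOR_SIGNED_ZEROS.  */
static float
abs_widened_cmp (float x)
{
  return (double) x >= 0.0 ? x : -x;
}

/* Narrowing conversion: a tiny negative double can round to -0.0f,
   which compares >= 0, so the still-negative x is returned.  This is
   why conversions that reduce precision must be rejected.  */
static double
abs_narrowed_cmp (double x)
{
  return (float) x >= 0.0f ? x : -x;
}
```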

>
> >
> > >
> > > +  (absu:type @1)
> > > +  (abs @1)
> > >
> > > I think this should use @2 now.
> > I will change this.
> >
> > Thanks,
> > Kugan
> >
> > >
> > > > Thanks.
> > > > Kugan
> > > > >
> > > > > (like what is in gcc.dg/c11-floatn-3.c and others).
> > > > >
> > > > > Other than that it looks good but I can't approve it.
> > > > >
> > > > > Thanks,
> > > > > Andrew Pinski
> > > > >
> > > > > >
> > > > > > Signed-off-by: Kugan Vivekanandarajah 
> > > > > >
> > > > > > Bootstrapped and regression test on aarch64-linux-gnu. Is this OK 
> > > > > > for trunk?
> > > > > > Thanks,
> > > > > > Kugan
> > > > > >
> > > > > > 
> > > > > > From: Andrew Pinski 
> > > > > > Sent: Monday, 15 July 2024 5:30 AM
> > > > > > To: Kugan Vivekanandarajah 
> > > > > > Cc: gcc-patches@gcc.gnu.org ; 
> > > > > > richard.guent...@gmail.com 
> > > > > > Subject: Re: [PATCH] MATCH: add abs support for half float
> > > > > >
> > > > > >
> > > > > > On Sun, Jul 14, 2024 at 1:12 AM Kugan Vivekanandarajah
> > > > > >  wrote:
> > > > > > >
> > > > > > > This patch extends abs detection in matched for half float.
> > > > > > >
> > > > > > > Bootstrapped and regression test on aarch64-linux-gnu. Is this OK 
> 

[PATCH v1] Internal-fn: Handle vector bool type for type strict match mode [PR116103]

2024-07-29 Thread pan2 . li
From: Pan Li 

For some targets like amdgcn-amdhsa, we need to take care of
vector bool types before general vector mode types. Otherwise we
get the asm check failures below.

gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 80
gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 80
gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 56
gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 56
gcc.dg/tree-ssa/loop-bound-2.c scan-tree-dump-not ivopts "zero if "

The following test suites pass with this patch:
1. The rv64gcv fully regression tests.
2. The x86 bootstrap tests.
3. The x86 fully regression tests.
4. The amdgcn test case as above.

gcc/ChangeLog:

* internal-fn.cc (type_strictly_matches_mode_p): Add handling
for vector bool type.

Signed-off-by: Pan Li 
---
 gcc/internal-fn.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 8a2e07f2f96..086c8be398a 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4171,6 +4171,12 @@ direct_internal_fn_optab (internal_fn fn)
 static bool
 type_strictly_matches_mode_p (const_tree type)
 {
+  /* For target=amdgcn-amdhsa, we need to take care of vector bool types.
+     For more details see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116103.  */
+  if (VECTOR_BOOLEAN_TYPE_P (type) && SCALAR_INT_MODE_P (TYPE_MODE (type))
+      && TYPE_PRECISION (TREE_TYPE (type)) == 1)
+    return true;
+
   if (VECTOR_TYPE_P (type))
 return VECTOR_MODE_P (TYPE_MODE (type));
 
-- 
2.34.1



[PATCH] LoongArch: Rework bswap{hi,si,di}2 definition

2024-07-29 Thread Xi Ruoyao
Per a gcc-help thread we are generating sub-optimal code for
__builtin_bswap{32,64}.  To fix it:

- Use a single revb.d instruction for bswapdi2.
- Use a single revb.2w instruction for bswapsi2 for TARGET_64BIT,
  revb.2h + rotri.w for !TARGET_64BIT.
- Use a single revb.2h instruction for bswapsi2 (x) r>> 16, and a single
  revb.2w instruction for bswapdi2 (x) r>> 32.

Unfortunately I cannot figure out a way to make the compiler generate
revb.4h or revh.{2w,d} instructions.
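The identities these patterns encode can be checked on the host in plain C with GCC's `__builtin_bswap32`/`__builtin_bswap64`: revb.2h is (rotatert (bswap x) 16), revb.2w is (rotatert (bswap x) 32) on a 64-bit value, and the !TARGET_64BIT bswapsi2 expansion is revb.2h followed by a rotate by 16. This is a sketch with hypothetical helper names, not LoongArch code.

```c
#include <stdint.h>

static uint32_t
rotr32 (uint32_t x, unsigned n)
{
  n &= 31;
  return n ? (x >> n) | (x << (32 - n)) : x;
}

static uint64_t
rotr64 (uint64_t x, unsigned n)
{
  n &= 63;
  return n ? (x >> n) | (x << (64 - n)) : x;
}

/* revb.2h: swap the two bytes within each 16-bit half,
   i.e. (rotatert:SI (bswap:SI x) 16).  */
static uint32_t
model_revb_2h (uint32_t x)
{
  return rotr32 (__builtin_bswap32 (x), 16);
}

/* revb.2w: byte-swap each 32-bit word of a 64-bit value,
   i.e. (rotatert:DI (bswap:DI x) 32).  */
static uint64_t
model_revb_2w (uint64_t x)
{
  return rotr64 (__builtin_bswap64 (x), 32);
}

/* The !TARGET_64BIT bswapsi2 expansion: revb.2h, then rotri.w by 16.  */
static uint32_t
model_bswapsi2_32bit (uint32_t x)
{
  return rotr32 (model_revb_2h (x), 16);
}
```

For example, model_revb_2h (0x11223344) gives 0x22114433, and rotating that by 16 yields 0x44332211, the full 32-bit byte swap.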

gcc/ChangeLog:

* config/loongarch/loongarch.md (UNSPEC_REVB_2H, UNSPEC_REVB_4H,
UNSPEC_REVH_D): Remove UNSPECs.
(revb_4h, revh_d): Remove define_insn.
(revb_2h): Define as (rotatert:SI (bswap:SI x) 16) instead of
an UNSPEC.
(revb_2h_extend, revb_2w, *bswapsi2, bswapdi2): New define_insn.
(bswapsi2): Change to define_expand.  Only expand to revb.2h +
rotri.w if !TARGET_64BIT.
(bswapdi2): Change to define_insn of which the output is just a
revb.d instruction.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/revb.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.md | 79 ---
 gcc/testsuite/gcc.target/loongarch/revb.c | 61 +
 2 files changed, 104 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/revb.c

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index ac94a22eafc..f166e834c56 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -20,11 +20,6 @@
 ;; .
 
 (define_c_enum "unspec" [
-  ;; Integer operations that are too cumbersome to describe directly.
-  UNSPEC_REVB_2H
-  UNSPEC_REVB_4H
-  UNSPEC_REVH_D
-
   ;; Floating-point moves.
   UNSPEC_LOAD_LOW
   UNSPEC_LOAD_HIGH
@@ -3155,55 +3150,67 @@ (define_insn "alslsi3_extend"
 
 ;; Reverse the order of bytes of operand 1 and store the result in operand 0.
 
-(define_insn "bswaphi2"
-  [(set (match_operand:HI 0 "register_operand" "=r")
-   (bswap:HI (match_operand:HI 1 "register_operand" "r")))]
+(define_insn "revb_2h"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (rotatert:SI (bswap:SI (match_operand:SI 1 "register_operand" "r"))
+                (const_int 16)))]
   ""
   "revb.2h\t%0,%1"
   [(set_attr "type" "shift")])
 
-(define_insn_and_split "bswapsi2"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-   (bswap:SI (match_operand:SI 1 "register_operand" "r")))]
-  ""
-  "#"
-  ""
-  [(set (match_dup 0) (unspec:SI [(match_dup 1)] UNSPEC_REVB_2H))
-   (set (match_dup 0) (rotatert:SI (match_dup 0) (const_int 16)))]
-  ""
-  [(set_attr "insn_count" "2")])
-
-(define_insn_and_split "bswapdi2"
+(define_insn "revb_2h_extend"
   [(set (match_operand:DI 0 "register_operand" "=r")
-   (bswap:DI (match_operand:DI 1 "register_operand" "r")))]
+   (sign_extend:DI
+     (rotatert:SI
+       (bswap:SI (match_operand:SI 1 "register_operand" "r"))
+       (const_int 16))))]
   "TARGET_64BIT"
-  "#"
-  ""
-  [(set (match_dup 0) (unspec:DI [(match_dup 1)] UNSPEC_REVB_4H))
-   (set (match_dup 0) (unspec:DI [(match_dup 0)] UNSPEC_REVH_D))]
-  ""
-  [(set_attr "insn_count" "2")])
+  "revb.2h\t%0,%1"
+  [(set_attr "type" "shift")])
 
-(define_insn "revb_2h"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-   (unspec:SI [(match_operand:SI 1 "register_operand" "r")] UNSPEC_REVB_2H))]
+(define_insn "bswaphi2"
+  [(set (match_operand:HI 0 "register_operand" "=r")
+   (bswap:HI (match_operand:HI 1 "register_operand" "r")))]
   ""
   "revb.2h\t%0,%1"
   [(set_attr "type" "shift")])
 
-(define_insn "revb_4h"
+(define_insn "revb_2w"
   [(set (match_operand:DI 0 "register_operand" "=r")
-   (unspec:DI [(match_operand:DI 1 "register_operand" "r")] UNSPEC_REVB_4H))]
+   (rotatert:DI (bswap:DI (match_operand:DI 1 "register_operand" "r"))
+                (const_int 32)))]
   "TARGET_64BIT"
-  "revb.4h\t%0,%1"
+  "revb.2w\t%0,%1"
   [(set_attr "type" "shift")])
 
-(define_insn "revh_d"
+(define_insn "*bswapsi2"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (bswap:SI (match_operand:SI 1 "register_operand" "r")))]
+  "TARGET_64BIT"
+  "revb.2w\t%0,%1"
+  [(set_attr "type" "shift")])
+
+(define_expand "bswapsi2"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (bswap:SI (match_operand:SI 1 "register_operand" "r")))]
+  ""
+{
+  if (!TARGET_64BIT)
+    {
+      rtx t = gen_reg_rtx (SImode);
+      emit_insn (gen_revb_2h (t, operands[1]));
+      emit_insn (gen_rotrsi3 (operands[0], t, GEN_INT (16)));
+      DONE;
+    }
+})
+
+(define_insn "bswapdi2"
   [(set (match_operand:DI 0 "register_operand" "=r")
-   (unspec:DI [(match_operand:DI 1 "register_operand" "r")] UNSPEC_REVH_D))]
+   (bswap:DI (match_operand:DI 1 "register_operand" "r")))]
   "TARGET_64BIT"
-  "revh.d\t%0,%1

[PATCH] LoongArch: Relax ins_zero_bitmask_operand and remove and3_align

2024-07-29 Thread Xi Ruoyao
In r15-1207 I was too stupid to realize we just need to relax
ins_zero_bitmask_operand to allow using bstrins for aligning, instead of
adding a new split.  And, "> 12" in ins_zero_bitmask_operand also makes
no sense: it rejects bstrins for things like "x & ~4l" with no good
reason.

So fix my errors now.
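As background: with $r0 as the source, bstrins.d clears the bit field msb..lsb of the destination, so `x & MASK` maps to a single bstrins exactly when ~MASK is one contiguous run of ones. For example, `a & ~4` clears bit 2, matching the `bstrins.d $r4,$r0,2,2` expected by the new test. The C sketch below models that contiguity check and the instruction's effect. The names are hypothetical, it is not the predicate's actual code, and it ignores the predicate's extra exclusion of constants already accepted by const_uns_arith_operand.

```c
#include <stdint.h>

/* Effect of "bstrins.d rd, $r0, msb, lsb": clear bits msb..lsb of X.  */
static uint64_t
bstrins_zero (uint64_t x, unsigned msb, unsigned lsb)
{
  unsigned width = msb - lsb + 1;
  uint64_t field = (width == 64 ? ~(uint64_t) 0
                    : (((uint64_t) 1 << width) - 1)) << lsb;
  return x & ~field;
}

/* Return nonzero if ~MASK is a single nonempty run of contiguous ones,
   i.e. if "x & MASK" can be done by one bstrins with $r0, and report
   the field bounds in *MSB and *LSB.  */
static int
clear_field_p (uint64_t mask, unsigned *msb, unsigned *lsb)
{
  uint64_t c = ~mask;
  if (c == 0)
    return 0;
  unsigned lo = (unsigned) __builtin_ctzll (c);
  uint64_t run = c >> lo;
  if ((run & (run + 1)) != 0)   /* not of the form 0...01...1 */
    return 0;
  *lsb = lo;
  *msb = lo + (unsigned) __builtin_popcountll (c) - 1;
  return 1;
}
```

With mask ~4 this reports msb = lsb = 2, while ~5 (two separate cleared bits) is rejected.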

gcc/ChangeLog:

* config/loongarch/predicates.md (ins_zero_bitmask_operand):
Cover more cases that bstrins can benefit.
(high_bitmask_operand): Remove.
* config/loongarch/constraints.md (Yy): Remove.
* config/loongarch/loongarch.md (and3_align): Remove.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/bstrins-4.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/constraints.md|  4 
 gcc/config/loongarch/loongarch.md  | 17 -
 gcc/config/loongarch/predicates.md |  9 ++---
 gcc/testsuite/gcc.target/loongarch/bstrins-4.c |  9 +
 4 files changed, 11 insertions(+), 28 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/bstrins-4.c

diff --git a/gcc/config/loongarch/constraints.md b/gcc/config/loongarch/constraints.md
index 12cf5e2924a..18da8b31f49 100644
--- a/gcc/config/loongarch/constraints.md
+++ b/gcc/config/loongarch/constraints.md
@@ -292,10 +292,6 @@ (define_constraint "Yx"
"@internal"
(match_operand 0 "low_bitmask_operand"))
 
-(define_constraint "Yy"
-   "@internal"
-   (match_operand 0 "high_bitmask_operand"))
-
 (define_constraint "YI"
   "@internal
A replicated vector const in which the replicated value is in the range
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index e1629c5a339..ac94a22eafc 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1588,23 +1588,6 @@ (define_insn "and3_extended"
   [(set_attr "move_type" "pick_ins")
(set_attr "mode" "")])
 
-(define_insn_and_split "and3_align"
-  [(set (match_operand:GPR 0 "register_operand" "=r")
-   (and:GPR (match_operand:GPR 1 "register_operand" "r")
-            (match_operand:GPR 2 "high_bitmask_operand" "Yy")))]
-  ""
-  "#"
-  ""
-  [(set (match_dup 0) (match_dup 1))
-   (set (zero_extract:GPR (match_dup 0) (match_dup 2) (const_int 0))
-   (const_int 0))]
-{
-  int len;
-
-  len = low_bitmask_len (mode, ~INTVAL (operands[2]));
-  operands[2] = GEN_INT (len);
-})
-
 (define_insn_and_split "*bstrins__for_mask"
   [(set (match_operand:GPR 0 "register_operand" "=r")
(and:GPR (match_operand:GPR 1 "register_operand" "r")
diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md
index 58e406ea522..95c2544cc2f 100644
--- a/gcc/config/loongarch/predicates.md
+++ b/gcc/config/loongarch/predicates.md
@@ -293,10 +293,6 @@ (define_predicate "low_bitmask_operand"
   (and (match_code "const_int")
(match_test "low_bitmask_len (mode, INTVAL (op)) > 12")))
 
-(define_predicate "high_bitmask_operand"
-  (and (match_code "const_int")
-   (match_test "low_bitmask_len (mode, ~INTVAL (op)) > 0")))
-
 (define_predicate "d_operand"
   (and (match_code "reg")
(match_test "GP_REG_P (REGNO (op))")))
@@ -406,11 +402,10 @@ (define_predicate "muldiv_target_operand"
 
 (define_predicate "ins_zero_bitmask_operand"
   (and (match_code "const_int")
-   (match_test "INTVAL (op) != -1")
-   (match_test "INTVAL (op) & 1")
(match_test "low_bitmask_len (mode, \
 ~UINTVAL (op) | (~UINTVAL(op) - 1)) \
-   > 12")))
+   > 0")
+   (not (match_operand 0 "const_uns_arith_operand"))))
 
 (define_predicate "const_call_insn_operand"
   (match_code "const,symbol_ref,label_ref")
diff --git a/gcc/testsuite/gcc.target/loongarch/bstrins-4.c b/gcc/testsuite/gcc.target/loongarch/bstrins-4.c
new file mode 100644
index 000..0823cfc386e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/bstrins-4.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mabi=lp64d" } */
+/* { dg-final { scan-assembler "bstrins\\.d\t\\\$r4,\\\$r0,2,2" } } */
+
+long
+x (long a)
+{
+  return a & ~4;
+}
-- 
2.45.2



Re: [PATCH] MATCH: add abs support for half float

2024-07-29 Thread Andrew Pinski
On Mon, Jul 29, 2024 at 12:57 AM Kugan Vivekanandarajah
 wrote:
>
> On Thu, Jul 25, 2024 at 10:19 PM Richard Biener
>  wrote:
> >
> > On Thu, Jul 25, 2024 at 4:42 AM Kugan Vivekanandarajah
> >  wrote:
> > >
> > > On Tue, Jul 23, 2024 at 11:56 PM Richard Biener
> > >  wrote:
> > > >
> > > > On Tue, Jul 23, 2024 at 10:27 AM Kugan Vivekanandarajah
> > > >  wrote:
> > > > >
> > > > > On Tue, Jul 23, 2024 at 10:35 AM Andrew Pinski  wrote:
> > > > > >
> > > > > > On Mon, Jul 22, 2024 at 5:26 PM Kugan Vivekanandarajah
> > > > > >  wrote:
> > > > > > >
> > > > > > > Revised based on the comment and moved it into existing patterns as.
> > > > > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > > * match.pd: Extend A CMP 0 ? A : -A into (type)A CMP 0 ? A : -A.
> > > > > > > Extend A CMP 0 ? A : -A into (type) A CMP 0 ? A : -A.
> > > > > > >
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > >
> > > > > > > * gcc.dg/tree-ssa/absfloat16.c: New test.
> > > > > >
> > > > > > The testcase needs to make sure it runs only for targets that support
> > > > > > float16 so like:
> > > > > >
> > > > > > /* { dg-require-effective-target float16 } */
> > > > > > /* { dg-add-options float16 } */
> > > > > Added in the attached version.
> > > >
> > > > + /* (type)A >=/> 0 ? A : -A    same as abs (A) */
> > > >   (for cmp (ge gt)
> > > >(simplify
> > > > -   (cnd (cmp @0 zerop) @1 (negate @1))
> > > > -(if (!HONOR_SIGNED_ZEROS (TREE_TYPE(@0))
> > > > -&& !TYPE_UNSIGNED (TREE_TYPE(@0))
> > > > -&& bitwise_equal_p (@0, @1))
> > > > +   (cnd (cmp (convert?@0 @1) zerop) @2 (negate @2))
> > > > +(if (!HONOR_SIGNED_ZEROS (TREE_TYPE (@1))
> > > > +&& !TYPE_UNSIGNED (TREE_TYPE (@1))
> > > > +&& ((VECTOR_TYPE_P (type)
> > > > + && tree_nop_conversion_p (TREE_TYPE (@0), TREE_TYPE (@1)))
> > > > +   || (!VECTOR_TYPE_P (type)
> > > > +   && (TYPE_PRECISION (TREE_TYPE (@1))
> > > > +   <= TYPE_PRECISION (TREE_TYPE (@0)))))
> > > > +&& bitwise_equal_p (@1, @2))
> > > >
> > > > I wonder about the bitwise_equal_p which tests @1 against @2 now
> > > > with the convert still applied to @1 - that looks odd.  You are allowing
> > > > sign-changing conversions but doesn't that change ge/gt behavior?
> > > > Also why are sign/zero-extensions not OK for vector types?
> > > Thanks for the review.
> > > My main motivation here is for _Float16  as below.
> > >
> > > _Float16 absfloat16 (_Float16 x)
> > > {
> > >   float _1;
> > >   _Float16 _2;
> > >   _Float16 _4;
> > >[local count: 1073741824]:
> > >   _1 = (float) x_3(D);
> > >   if (_1 < 0.0)
> > > goto ; [41.00%]
> > >   else
> > > goto ; [59.00%]
> > >[local count: 440234144]:
> > >   _4 = -x_3(D);
> > >[local count: 1073741824]:
> > >   # _2 = PHI <_4(3), x_3(D)(2)>
> > >   return _2;
> > > }
> > >
> > > This is why I added  bitwise_equal_p test of @1 against @2 with
> > > TYPE_PRECISION checks.
> > > I agree that I will have to check for sign-changing conversions.
> > >
> > > Just to keep it simple, I disallowed vector types. I am not sure if
> > > this would  hit vec types. I am happy to handle this if that is
> > > needed.
> >
> > I think with __builtin_convertvector you should be able to construct
> > a testcase that does
> Thanks.
>
> For the pattern,
> ```
>  /* A >=/> 0 ? A : -A    same as abs (A) */
>  (for cmp (ge gt)
>   (simplify
>(cnd (cmp @0 zerop) @1 (negate @1))
> (if (!HONOR_SIGNED_ZEROS (TREE_TYPE(@0))
>  && !TYPE_UNSIGNED (TREE_TYPE(@0))
>  && bitwise_equal_p (@0, @1))
>  (if (TYPE_UNSIGNED (type))
>   (absu:type @0)
>   (abs @0)))))
> ```
> the vector type doesn't seem right. For example, if we have a 4
> element vector with some negative and some positive elements, I don't
> think it makes sense. Also, we don't seem to generate (cmp @0 zerop). Am I
> missing it completely?

It looks like I missed adding some vector testcases. Anyway, here is one
that triggers this; note it is C++ because the C front end does not
support `?:` for vectors yet (there is a patch).
```
#define vect8 __attribute__((vector_size(8)))
vect8 int f(vect8 int a)
{
  vect8 int na = -a;
  return (a > 0) ? a : na;
}
```
At -O2 before forwprop1, we have:
```
  vector(2) intD.9 a_2(D) = aD.2796;
  vector(2) intD.9 naD.2799;
  vector(2)  _1;
  vector(2) intD.9 _4;

  na_3 = -a_2(D);
  _1 = a_2(D) > { 0, 0 };
  _4 = VEC_COND_EXPR <_1, a_2(D), na_3>;
```
And forwprop using match is able to do:
```
Applying pattern match.pd:6306, gimple-match-10.cc:19843
gimple_simplified to _4 = ABS_EXPR ;
Removing dead stmt:_1 = a_2(D) > { 0, 0 };
Removing dead stmt:na_3 = -a_2(D);
```
(If you replace int with float and add -fno-signed-zeros, you get the ABS as well.)
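Why the float case needs -fno-signed-zeros (the !HONOR_SIGNED_ZEROS guard in the pattern) can be seen in plain C: with signed zeros honoured, `a > 0 ? a : -a` maps +0.0 to -0.0, and `a >= 0 ? a : -a` keeps -0.0, whereas fabs returns +0.0 in both cases. A minimal check, assuming default IEEE semantics (compiled without -ffast-math or -fno-signed-zeros); the function names are illustrative.

```c
#include <math.h>

/* Source-level form of "A > 0 ? A : -A".  */
static double
cond_abs_gt (double a)
{
  return a > 0 ? a : -a;
}

/* Source-level form of "A >= 0 ? A : -A".  */
static double
cond_abs_ge (double a)
{
  return a >= 0 ? a : -a;
}
```

Both agree with fabs on every nonzero, non-NaN input, so the only observable difference is the sign of a zero result, which is exactly what -fno-signed-zeros allows the compiler to ignore.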

Note that comparisons of vector types always produce a vector boolean
type, so cond_expr will never show up with a vector comparison, only
vec_cond.
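The same semantics are visible directly in the GNU C vector extensions: comparing two integer vectors yields a vector of 0/-1 lane masks, and a VEC_COND_EXPR-style select can be spelled with bitwise operations, since plain C cannot yet use `?:` on vectors. A minimal sketch, assuming GCC or Clang `vector_size` support; `v2si`, `select_lanes` and `vabs` are illustrative names.

```c
/* Two 32-bit ints in an 8-byte vector, as in the testcase above.  */
typedef int v2si __attribute__ ((vector_size (8)));

/* Lane-wise select: each lane of M is all-ones (-1) or all-zeros,
   so this picks A where M is set and B elsewhere.  */
static v2si
select_lanes (v2si m, v2si a, v2si b)
{
  return (m & a) | (~m & b);
}

/* (a > 0) ? a : -a, spelled with an explicit mask.  */
static v2si
vabs (v2si a)
{
  v2si zero = { 0, 0 };
  v2si m = a > zero;   /* -1 in lanes where a > 0, else 0 */
  return select_lanes (m, a, -a);
}
```

For { -3, 7 } the mask is { 0, -1 } and the result is { 3, 7 }.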

Thanks,
Andrew Pinski

>
> Thanks,
> Kugan
>
> >
> > >

[patch,wwwdocs,avr,applied] Mention recent additions to the avr backend

2024-07-29 Thread Georg-Johann Lay

Applied the patch below

Johann

--

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index 3b3a6c0b..aa8d7609 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -99,7 +99,27 @@ a work-in-progress.

 

-
+AVR
+
+
+  Support has been added for the signal and interrupt
+href="https://gcc.gnu.org/onlinedocs/gcc/AVR-Function-Attributes.html#index-signal_0028num_0029-function-attribute_002c-AVR";
+   >function attributes
+that allow to specify the interrupt vector number as an argument.
+It allows to use static functions as interrupt handlers, and also
+functions defined in a C++ namespace.
+  Support has been added for the noblock function attribute.
+It can be specified together with the signal attribute to
+indicate that the interrupt service routine should start with a SEI
+instruction to globally re-enable interrupts.  The difference to the
+interrupt attribute is that the noblock
+attribute just acts like a flag and does not impose a specific function
+name.
+  Support has been added for the __builtin_avr_mask1
+href="https://gcc.gnu.org/onlinedocs/gcc/AVR-Built-in-Functions.html#index-_005f_005fbuiltin_005favr_005fmask1";
+   >built-in function.  It can be used to compute some bit masks when
+code like 1 << offset is not fast enough.
+

 



Re: [Patch] libgomp: Fix declare target link with offset array-section mapping [PR116107]

2024-07-29 Thread Jakub Jelinek
On Fri, Jul 26, 2024 at 08:05:43PM +0200, Tobias Burnus wrote:
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -1820,8 +1820,11 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
>   if (k->aux && k->aux->link_key)
> {
>   /* Set link pointer on target to the device address of the
> -mapped object.  */
> - void *tgt_addr = (void *) (tgt->tgt_start + k->tgt_offset);
> +mapped object. Also deal with offsets due to
> +array-section mapping. */

Formatting.  Two spaces after . in both spots.

> + void *tgt_addr = (void *) (tgt->tgt_start + k->tgt_offset
> +- (k->host_start
> +   - k->aux->link_key->host_start));

Otherwise LGTM.

Jakub
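The arithmetic in the fixed line can be checked in isolation: the device copy of the mapped section starts at tgt_start + tgt_offset, so the link pointer, which must designate the start of the whole host object, is moved back by the host-side distance between the object start and the section start. The C sketch below models only that computation; `struct mapping` and `link_target_address` are hypothetical stand-ins, not libgomp's splay-tree types.

```c
#include <stdint.h>

/* Hypothetical stand-in for the fields of a mapping key used by the
   patch; only the address arithmetic is modelled.  */
struct mapping
{
  uintptr_t host_start;   /* host address where the mapped section begins */
  uintptr_t tgt_offset;   /* offset of that section in the device block */
};

/* Value for the device-side link pointer: the device copy of the section
   starts at TGT_START + TGT_OFFSET, so step back by the host distance
   from the start of the linked object (LINK_HOST_START) to the section.  */
static uintptr_t
link_target_address (uintptr_t tgt_start, const struct mapping *k,
                     uintptr_t link_host_start)
{
  return tgt_start + k->tgt_offset - (k->host_start - link_host_start);
}
```

For instance, with a hypothetical array of 4-byte integers at host address 0x1000 of which only elements 10 onward are mapped, the section starts 36 bytes in; if its device copy begins at 0x8000, the link pointer becomes 0x8000 - 36, so element 10 is found at 0x8000. Without the correction every index would be off by nine, consistent with the arr2(10)-prints-19 symptom described earlier in the thread.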



Re: [Patch, v2] OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR11555]

2024-07-29 Thread Jakub Jelinek
On Mon, Jul 29, 2024 at 09:53:47AM +0200, Tobias Burnus wrote:
> Hi Andre, hi all,
> 
> Andre Vehreschild wrote:
> > yes, I could have looked harder 🙂
> 
> I wrote ;-) on purpose as this feature is somewhat hidden and writing 'dg-do
> compile' doesn't harm.

I think an explicit dg-do is better, otherwise one has to just guess
for some tests what has been actually intentional (see the recent
torture tests which were just compile time but written most likely to be
runtime; I've changed a few, Sam changed more).

Also, the subject line has too few digits in the PR number I think (9
missing?).

Otherwise LGTM.

Jakub



Re: [RFC v1 1/2] Merge definitions of array_type_nelts_top()

2024-07-29 Thread Richard Biener
On Sun, Jul 28, 2024 at 4:16 PM Alejandro Colomar  wrote:
>
> There were two identical definitions, and neither of them is available
> where it is needed for implementing _Lengthof().  Merge them, and
> provide the single definition in gcc/tree.{h,cc}, where it's available
> for _Lengthof().
>
> Signed-off-by: Alejandro Colomar 
> ---
>  gcc/cp/cp-tree.h  |  1 -
>  gcc/cp/tree.cc| 13 -
>  gcc/rust/backend/rust-tree.cc | 13 -
>  gcc/rust/backend/rust-tree.h  |  2 --
>  gcc/tree.cc   | 13 +
>  gcc/tree.h|  1 +
>  6 files changed, 14 insertions(+), 29 deletions(-)
>
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index c1a371bc721..e6c1c63f872 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -8099,7 +8099,6 @@ extern tree build_exception_variant   (tree, tree);
>  extern void fixup_deferred_exception_variants   (tree, tree);
>  extern tree bind_template_template_parm(tree, tree);
>  extern tree array_type_nelts_total (tree);
> -extern tree array_type_nelts_top   (tree);
>  extern bool array_of_unknown_bound_p   (const_tree);
>  extern tree break_out_target_exprs (tree, bool = false);
>  extern tree build_ctor_subob_ref   (tree, tree, tree);
> diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
> index dfd4a3a948b..1f3ecff1a21 100644
> --- a/gcc/cp/tree.cc
> +++ b/gcc/cp/tree.cc
> @@ -3071,19 +3071,6 @@ cxx_print_statistics (void)
>  depth_reached);
>  }
>
> -/* Return, as an INTEGER_CST node, the number of elements for TYPE
> -   (which is an ARRAY_TYPE).  This counts only elements of the top
> -   array.  */
> -
> -tree
> -array_type_nelts_top (tree type)
> -{
> -  return fold_build2_loc (input_location,
> -                          PLUS_EXPR, sizetype,
> -                          array_type_nelts (type),
> -                          size_one_node);
> -}
> -
>  /* Return, as an INTEGER_CST node, the number of elements for TYPE
> (which is an ARRAY_TYPE).  This one is a recursive count of all
> ARRAY_TYPEs that are clumped together.  */
> diff --git a/gcc/rust/backend/rust-tree.cc b/gcc/rust/backend/rust-tree.cc
> index 2a5ffcbf895..dd8eda84f9b 100644
> --- a/gcc/rust/backend/rust-tree.cc
> +++ b/gcc/rust/backend/rust-tree.cc
> @@ -859,19 +859,6 @@ is_empty_class (tree type)
>return CLASSTYPE_EMPTY_P (type);
>  }
>
> -// forked from gcc/cp/tree.cc array_type_nelts_top
> -
> -/* Return, as an INTEGER_CST node, the number of elements for TYPE
> -   (which is an ARRAY_TYPE).  This counts only elements of the top
> -   array.  */
> -
> -tree
> -array_type_nelts_top (tree type)
> -{
> -  return fold_build2_loc (input_location, PLUS_EXPR, sizetype,
> -                          array_type_nelts (type), size_one_node);
> -}
> -
>  // forked from gcc/cp/tree.cc builtin_valid_in_constant_expr_p
>
>  /* Test whether DECL is a builtin that may appear in a
> diff --git a/gcc/rust/backend/rust-tree.h b/gcc/rust/backend/rust-tree.h
> index 26c8b653ac6..e597c3ab81d 100644
> --- a/gcc/rust/backend/rust-tree.h
> +++ b/gcc/rust/backend/rust-tree.h
> @@ -2993,8 +2993,6 @@ extern location_t rs_expr_location (const_tree);
>  extern int
>  is_empty_class (tree type);
>
> -extern tree array_type_nelts_top (tree);
> -
>  extern bool
>  is_really_empty_class (tree, bool);
>
> diff --git a/gcc/tree.cc b/gcc/tree.cc
> index 2d2d5b6db6e..3b0adb4cd9f 100644
> --- a/gcc/tree.cc
> +++ b/gcc/tree.cc
> @@ -3729,6 +3729,19 @@ array_type_nelts (const_tree type)
>   ? max
>   : fold_build2 (MINUS_EXPR, TREE_TYPE (max), max, min));
>  }
> +
> +/* Return, as an INTEGER_CST node, the number of elements for TYPE
> +   (which is an ARRAY_TYPE).  This counts only elements of the top
> +   array.  */
> +
> +tree
> +array_type_nelts_top (tree type)
> +{
> +  return fold_build2_loc (input_location,
> +                          PLUS_EXPR, sizetype,
> +                          array_type_nelts (type),
> +                          size_one_node);
> +}

But this is now an extremely confusing API, with array_type_nelts above it
saying

/* Return, as a tree node, the number of elements for TYPE (which is an
   ARRAY_TYPE) minus one.  This counts only elements of the top array.  */

so both are "_top".  And there's build_array_type_nelts that's taking
the number of elements.

Can you please rename the existing array_type_nelts to
array_type_nelts_minus_one?  Then _top could be dropped as well from
the alternate API you add.

I'll also note that since array_type_nelts_top calls the other function, and that one has

  /* If they did it with unspecified bounds, then we should have already
     given an error about it before we got here.  */
  if (! TYPE_DOMAIN (type))
    return error_mark_node;

the function should handle error_mark_node (and pass that down).
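To make the suggested split concrete: for a domain [min, max] the element count is max - min + 1, the existing helper returns the "minus one" value, and an array without a domain yields an error that a caller adding one must pass down rather than fold into. A hypothetical C model, with a bool return standing in for error_mark_node (a numeric sentinel such as -1 would collide with the legitimate max - min of a zero-length array, e.g. a domain of [0, -1]); the names mirror the proposed API but are illustrative only.

```c
#include <stdbool.h>

/* Hypothetical model of an array domain [min, max]; HAS_DOMAIN false
   plays the role of a missing TYPE_DOMAIN (unknown bound).  */
struct domain
{
  bool has_domain;
  long min, max;
};

/* Analogue of the existing array_type_nelts, i.e. the suggested
   array_type_nelts_minus_one: number of elements minus one.  Returns
   false where GCC would return error_mark_node.  */
static bool
nelts_minus_one (const struct domain *d, long *out)
{
  if (!d->has_domain)
    return false;
  *out = d->max - d->min;
  return true;
}

/* Analogue of array_type_nelts_top: the element count, passing the
   error down instead of adding one to it.  */
static bool
nelts (const struct domain *d, long *out)
{
  long n;
  if (!nelts_minus_one (d, &n))
    return false;
  *out = n + 1;
  return true;
}
```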

Note array_type_nelts returns nelts - 1 because that avoids building
a new tree node for arrays with lower bo

Re: [PATCH-1v4] Value Range: Add range op for builtin isinf

2024-07-29 Thread HAO CHEN GUI
Hi Jeff,
  Do you have further questions?

Thanks
Gui Haochen

On 2024/7/24 6:39, Andrew MacLeod wrote:
> 
> On 7/23/24 15:18, Jeff Law wrote:
>>
>>
>> On 7/11/24 9:17 PM, HAO CHEN GUI wrote:
>>
 So why the test for real_isinf on the upper/lower bound?  If op1 is known 
 to be a NaN, then why test the bounds at all?  If a bounds test is needed, 
 why only test the upper bound?

>>> IMHO, the logic is: if op1 is a NaN, it's not an infinite number. If the upper
>>> and lower bounds are both finite numbers, op1 is not an infinite number.
>>> In both situations, the result should be set to 0, which means op1 isn't an
>>> infinite number.
>> Understood, but that's not what the code actually implements:
>>
> +    if (op1.known_isnan ()
> +    || (!real_isinf (&op1.lower_bound ())
> +    && !real_isinf (&op1.upper_bound ())))
> +  {
> +    r.set_zero (type);
> +    return true;
> +  }
>> If op1 is a NaN, then it cannot be Inf.  Similarly if both of the bounds 
>> are known not to be Inf, then op1 is not Inf and thus we should be returning 
>> false instead of true.  Or am I misunderstanding this API?
>>
>>
> The range is in r, and is set to [0,0].  This is the "false" part of what is
> being returned for the range.
> 
> The "return true" indicates we determined a range, so use what is in r.
> 
> Returning false means we did not find a range to return, so r is garbage.
> 
> 
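The contract described above (a true return means the range stored in r is valid; false means r must be ignored) can be modelled with a simplified interval type. This C sketch covers only the case shown in the quoted patch, where isinf is known to be 0. `struct frange_model` and `isinf_result_range` are hypothetical and much coarser than GCC's frange.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical, simplified float range: [lo, hi], plus whether the
   value is known to be a NaN.  */
struct frange_model
{
  double lo, hi;
  bool known_nan;
};

/* Compute the integer range of isinf (x).  Returns true and sets
   *RLO/*RHI when a range was determined; returns false when nothing
   is known, in which case *RLO/*RHI are left untouched.  */
static bool
isinf_result_range (const struct frange_model *x, int *rlo, int *rhi)
{
  if (x->known_nan || (!isinf (x->lo) && !isinf (x->hi)))
    {
      *rlo = *rhi = 0;   /* [0, 0]: isinf (x) is known to be false.  */
      return true;
    }
  return false;          /* No range determined.  */
}
```

Here `true` with [0, 0] is the "isinf is false" answer, which is why the patch returns true rather than false in that branch.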


Performance improvement for std::to_chars(char* first, char* last, /* integer-type */ value, int base = 10 );

2024-07-29 Thread Ehrnsperger, Markus

Hi,


I'm attaching two files:

1.: *to_chars10.h*:

This is intended to be included in libstdc++ / gcc to achieve 
performance improvements. It is an implementation of


to_chars10(char* first, char* last,  /* integer-type */ value);

The parameters are identical to those of std::to_chars(char* first, char* last, /* 
integer-type */ value, int base = 10). It only works for base == 10.


If it is included in libstdc++, to_chars10(...) could be renamed to
std::to_chars(char* first, char* last, /* integer-type */ value) to
provide an overload for the default base = 10.
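For context, one common way to speed up base-10 conversion, found among the approaches in the itoa-benchmark code credited below, is to emit two digits per division using a 200-byte table of "00".."99". The C sketch below shows that technique for unsigned 64-bit values only. It is not the attached to_chars10.h, and `u64_to_chars10` is a hypothetical name. Like std::to_chars it writes no terminating NUL, and it fails (here by returning NULL, where std::to_chars reports std::errc::value_too_large) when [first, last) is too small.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "00", "01", ..., "99" packed back to back.  */
static const char digit_pairs[201] =
  "00010203040506070809"
  "10111213141516171819"
  "20212223242526272829"
  "30313233343536373839"
  "40414243444546474849"
  "50515253545556575859"
  "60616263646566676869"
  "70717273747576777879"
  "80818283848586878889"
  "90919293949596979899";

static char *
u64_to_chars10 (char *first, char *last, uint64_t value)
{
  char tmp[20];                 /* UINT64_MAX has 20 decimal digits */
  char *p = tmp + sizeof tmp;   /* digits are produced right to left */

  while (value >= 100)
    {
      unsigned r = (unsigned) (value % 100);
      value /= 100;
      p -= 2;
      memcpy (p, digit_pairs + 2 * r, 2);
    }
  if (value >= 10)
    {
      p -= 2;
      memcpy (p, digit_pairs + 2 * value, 2);
    }
  else
    *--p = (char) ('0' + value);

  size_t len = (size_t) (tmp + sizeof tmp - p);
  if ((size_t) (last - first) < len)
    return NULL;
  memcpy (first, p, len);
  return first + len;
}
```

Halving the number of divisions is where most of the gain comes from; a signed variant would emit '-' and then convert the magnitude as an unsigned value.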



2.: *to_chars10.cpp*:

This is a test program for to_chars10 verifying the correctness of the 
results, and measuring the performance. The actual performance 
improvement is system-dependent, so please test on your own system.


On my system the performance improvement is about a factor of two; my results are:



Test   int8_t verifying to_chars10 = std::to_chars ... OK
Test  uint8_t verifying to_chars10 = std::to_chars ... OK
Test  int16_t verifying to_chars10 = std::to_chars ... OK
Test uint16_t verifying to_chars10 = std::to_chars ... OK
Test  int32_t verifying to_chars10 = std::to_chars ... OK
Test uint32_t verifying to_chars10 = std::to_chars ... OK
Test  int64_t verifying to_chars10 = std::to_chars ... OK
Test uint64_t verifying to_chars10 = std::to_chars ... OK

Benchmarking test case   tested method  ...  time (lower is better)

Benchmarking random unsigned 64 bit  to_chars10 ...  0.00957
Benchmarking random unsigned 64 bit  std::to_chars  ...  0.01854
Benchmarking random   signed 64 bit  to_chars10 ...  0.01018
Benchmarking random   signed 64 bit  std::to_chars  ...  0.02297
Benchmarking random unsigned 32 bit  to_chars10 ...  0.00620
Benchmarking random unsigned 32 bit  std::to_chars  ...  0.01275
Benchmarking random   signed 32 bit  to_chars10 ...  0.00783
Benchmarking random   signed 32 bit  std::to_chars  ...  0.01606
Benchmarking random unsigned 16 bit  to_chars10 ...  0.00536
Benchmarking random unsigned 16 bit  std::to_chars  ...  0.00871
Benchmarking random   signed 16 bit  to_chars10 ...  0.00664
Benchmarking random   signed 16 bit  std::to_chars  ...  0.01154
Benchmarking random unsigned 08 bit  to_chars10 ...  0.00393
Benchmarking random unsigned 08 bit  std::to_chars  ...  0.00626
Benchmarking random   signed 08 bit  to_chars10 ...  0.00465
Benchmarking random   signed 08 bit  std::to_chars  ...  0.01089


Thanks, Markus


// g++ -std=c++17 -O3 -g to_chars10.cpp
/*
  Copyright (C) Markus Ehrnsperger. All rights reserved.
  Licence: GNU General Public License version 3

  This program does:
- check correctness of to_chars10
- compare performance of to_chars10 with std::to_chars

  Note: Part of the code is copied / modified from https://github.com/miloyip/itoa-benchmark
*/
#include "to_chars10.h"

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
 
const unsigned kIterationPerDigit = 10;
const unsigned kIterationForRandom = 100;
const unsigned kTrial = 10;

template 
struct Traits { };

template <>
struct Traits {
enum { kBufferSize = 3 };
enum { kMaxDigit = 3 };
static uint8_t Negate(uint8_t x) { return x; };
};

template <>
struct Traits {
enum { kBufferSize = 4 };
enum { kMaxDigit = 3 };
static int8_t Negate(int8_t x) { return -x; };
};

template <>
struct Traits {
enum { kBufferSize = 5 };
enum { kMaxDigit = 5 };
static uint16_t Negate(uint16_t x) { return x; };
};

template <>
struct Traits {
enum { kBufferSize = 6 };
enum { kMaxDigit = 5 };
static int16_t Negate(int16_t x) { return -x; };
};

template <>
struct Traits {
enum { kBufferSize = 10 };
enum { kMaxDigit = 10 };
static uint32_t Negate(uint32_t x) { return x; };
};

template <>
struct Traits {
enum { kBufferSize = 11 };
enum { kMaxDigit = 10 };
static int32_t Negate(int32_t x) { return -x; };
};

template <>
struct Traits {
enum { kBufferSize = 20 };
enum { kMaxDigit = 20 };
static uint64_t Negate(uint64_t x) { return x; };
};

template <>
struct Traits {
enum { kBufferSize = 20 };
enum { kMaxDigit = 19 };
static int64_t Negate(int64_t x) { return -x; };
};

template 
void VerifyValue(T value, std::to_chars_result(*f)(char*, char*, T, int),
                 std::to_chars_result(*g)(char*, char*, T, int),
                 const char* test, const char* fname, const char* gname) {
char buffer_f[Traits::kBufferSize];
char buffer_g[Traits::kBufferSize];

std::to_chars_result r_f = f(buffer_f, buffer_f+sizeof(buffer_f), value, 10);
std::to_chars_result r_g = g(buffer_g, buffer_g+sizeof(buffer_g), value, 10);

if (r_f.ec != r_g.ec) {
std::cout << "\nError: " << fname << " -> " << std::make_error_code(r_f.ec).message();
std::cout <<", " << gname << " -> " << std::make_error_code(r_g.ec).message()  << "\n";
std::cout << "Value " << +value << ", sizeof

Re: [PATCH v2] i386: Fix AVX512 intrin macro typo

2024-07-29 Thread Jakub Jelinek
On Mon, Jul 29, 2024 at 02:07:24AM +, Jiang, Haochen wrote:
> > LGTM with the above ChangeLog nit fixed, for trunk/release branches, even for
> > 14.2 if committed RSN.
> 
> Ok. I will commit them and backport them to GCC13 and GCC12 now. For GCC14,
> we could wait for GCC14.3 since a weekend has passed and it is not that RSN.
> But if it could be in GCC14.2, I will also be happy with that.

Please commit it to 14.2 ASAP.

Jakub



RE: [PATCH v2] i386: Fix AVX512 intrin macro typo

2024-07-29 Thread Jiang, Haochen



> -Original Message-
> From: Jakub Jelinek 
> Sent: Monday, July 29, 2024 4:41 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ubiz...@gmail.com
> Subject: Re: [PATCH v2] i386: Fix AVX512 intrin macro typo
> 
> On Mon, Jul 29, 2024 at 02:07:24AM +, Jiang, Haochen wrote:
> > > LGTM with the above ChangeLog nit fixed, for trunk/release branches, even for
> > > 14.2 if committed RSN.
> >
> > Ok. I will commit them and backport them to GCC13 and GCC12 now. For
> > GCC14, we could wait for GCC14.3 since a weekend has passed and it is
> > not that RSN.
> > But if it could be in GCC14.2, I will also be happy with that.
> 
> Please commit it to 14.2 ASAP.

Pushed to GCC14.2

Thx,
Haochen

> 
>   Jakub



Re: [PATCH v1] Internal-fn: Handle vector bool type for type strict match mode [PR116103]

2024-07-29 Thread Richard Biener
On Mon, Jul 29, 2024 at 9:57 AM  wrote:
>
> From: Pan Li 
>
> For some target like target=amdgcn-amdhsa,  we need to take care of
> vector bool types prior to general vector mode types.  Or we may have
> the asm check failure as below.
>
> gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 80
> gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 80
> gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 56
> gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 56
> gcc.dg/tree-ssa/loop-bound-2.c scan-tree-dump-not ivopts "zero if "
>
> The following test suites pass with this patch:
> 1. The full rv64gcv regression tests.
> 2. The x86 bootstrap tests.
> 3. The full x86 regression tests.
> 4. The amdgcn test cases above.

OK.

Richard.

> gcc/ChangeLog:
>
> * internal-fn.cc (type_strictly_matches_mode_p): Add handling
> for vector bool type.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/internal-fn.cc | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 8a2e07f2f96..086c8be398a 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4171,6 +4171,12 @@ direct_internal_fn_optab (internal_fn fn)
>  static bool
>  type_strictly_matches_mode_p (const_tree type)
>  {
> +  /* For target=amdgcn-amdhsa,  we need to take care of vector bool types.
> +     More details see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116103.  */
> +  if (VECTOR_BOOLEAN_TYPE_P (type) && SCALAR_INT_MODE_P (TYPE_MODE (type))
> +      && TYPE_PRECISION (TREE_TYPE (type)) == 1)
> +    return true;
> +
>    if (VECTOR_TYPE_P (type))
>      return VECTOR_MODE_P (TYPE_MODE (type));
>
> --
> 2.34.1
>


Re: [PATCH] MATCH: add abs support for half float

2024-07-29 Thread Richard Biener
On Mon, Jul 29, 2024 at 10:11 AM Andrew Pinski  wrote:
>
> On Mon, Jul 29, 2024 at 12:57 AM Kugan Vivekanandarajah
>  wrote:
> >
> > On Thu, Jul 25, 2024 at 10:19 PM Richard Biener
> >  wrote:
> > >
> > > On Thu, Jul 25, 2024 at 4:42 AM Kugan Vivekanandarajah
> > >  wrote:
> > > >
> > > > On Tue, Jul 23, 2024 at 11:56 PM Richard Biener
> > > >  wrote:
> > > > >
> > > > > On Tue, Jul 23, 2024 at 10:27 AM Kugan Vivekanandarajah
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, Jul 23, 2024 at 10:35 AM Andrew Pinski  wrote:
> > > > > > >
> > > > > > > On Mon, Jul 22, 2024 at 5:26 PM Kugan Vivekanandarajah
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Revised based on the comment and moved it into existing patterns as.
> > > > > > > >
> > > > > > > > gcc/ChangeLog:
> > > > > > > >
> > > > > > > > * match.pd: Extend A CMP 0 ? A : -A into (type)A CMP 0 ? A : -A.
> > > > > > > > Extend A CMP 0 ? A : -A into (type) A CMP 0 ? A : -A.
> > > > > > > >
> > > > > > > > gcc/testsuite/ChangeLog:
> > > > > > > >
> > > > > > > > * gcc.dg/tree-ssa/absfloat16.c: New test.
> > > > > > >
> > > > > > > The testcase needs to make sure it runs only for targets that support
> > > > > > > float16 so like:
> > > > > > >
> > > > > > > /* { dg-require-effective-target float16 } */
> > > > > > > /* { dg-add-options float16 } */
> > > > > > Added in the attached version.
> > > > >
> > > > > + /* (type)A >=/> 0 ? A : -A    same as abs (A) */
> > > > >   (for cmp (ge gt)
> > > > >(simplify
> > > > > -   (cnd (cmp @0 zerop) @1 (negate @1))
> > > > > -(if (!HONOR_SIGNED_ZEROS (TREE_TYPE(@0))
> > > > > -&& !TYPE_UNSIGNED (TREE_TYPE(@0))
> > > > > -&& bitwise_equal_p (@0, @1))
> > > > > +   (cnd (cmp (convert?@0 @1) zerop) @2 (negate @2))
> > > > > +(if (!HONOR_SIGNED_ZEROS (TREE_TYPE (@1))
> > > > > +&& !TYPE_UNSIGNED (TREE_TYPE (@1))
> > > > > +&& ((VECTOR_TYPE_P (type)
> > > > > + && tree_nop_conversion_p (TREE_TYPE (@0), TREE_TYPE 
> > > > > (@1)))
> > > > > +   || (!VECTOR_TYPE_P (type)
> > > > > +   && (TYPE_PRECISION (TREE_TYPE (@1))
> > > > > +   <= TYPE_PRECISION (TREE_TYPE (@0)
> > > > > +&& bitwise_equal_p (@1, @2))
> > > > >
> > > > > I wonder about the bitwise_equal_p which tests @1 against @2 now
> > > > > with the convert still applied to @1 - that looks odd.  You are 
> > > > > allowing
> > > > > sign-changing conversions but doesn't that change ge/gt behavior?
> > > > > Also why are sign/zero-extensions not OK for vector types?
> > > > Thanks for the review.
> > > > My main motivation here is for _Float16  as below.
> > > >
> > > > _Float16 absfloat16 (_Float16 x)
> > > > {
> > > >   float _1;
> > > >   _Float16 _2;
> > > >   _Float16 _4;
> > > >[local count: 1073741824]:
> > > >   _1 = (float) x_3(D);
> > > >   if (_1 < 0.0)
> > > > goto ; [41.00%]
> > > >   else
> > > > goto ; [59.00%]
> > > >[local count: 440234144]:
> > > >   _4 = -x_3(D);
> > > >[local count: 1073741824]:
> > > >   # _2 = PHI <_4(3), x_3(D)(2)>
> > > >   return _2;
> > > > }
> > > >
> > > > This is why I added  bitwise_equal_p test of @1 against @2 with
> > > > TYPE_PRECISION checks.
> > > > I agree that I will have to check for sign-changing conversions.
> > > >
> > > > Just to keep it simple, I disallowed vector types. I am not sure if
> > > > this would  hit vec types. I am happy to handle this if that is
> > > > needed.
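As a rough illustration of the _Float16 case, here is a minimal C++ sketch that uses float -> double as a stand-in for _Float16 -> float (an assumption so the example builds on any target; the helper name is hypothetical). Because the widening conversion is exact, comparing the converted value against zero has the same outcome as comparing x itself, which is what the TYPE_PRECISION check is meant to guarantee. The -0.0 case also shows why the pattern is limited to !HONOR_SIGNED_ZEROS: the select keeps -0.0, whereas abs would produce +0.0.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the _Float16 GIMPLE in the dump above: the
// comparison is done on the widened value, the negation on the narrow one.
// float -> double plays the role of _Float16 -> float here.
static float select_abs_widened (float x)
{
  // Exact widening: static_cast<double> (x) < 0.0 exactly when x < 0.0f.
  return (static_cast<double> (x) < 0.0) ? -x : x;
}
```

For ordinary values this behaves like abs; only the sign of zero (and NaN handling) tells the two apart.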
> > >
> > > I think with __builtin_convertvector you should be able to construct
> > > a testcase that does
> > Thanks.
> >
> > For the pattern,
> > ```
> >  /* A >=/> 0 ? A : -A    same as abs (A) */
> >  (for cmp (ge gt)
> >   (simplify
> >(cnd (cmp @0 zerop) @1 (negate @1))
> > (if (!HONOR_SIGNED_ZEROS (TREE_TYPE(@0))
> >  && !TYPE_UNSIGNED (TREE_TYPE(@0))
> >  && bitwise_equal_p (@0, @1))
> >  (if (TYPE_UNSIGNED (type))
> >   (absu:type @0)
> >   (abs @0)
> > ```
> > the vector type doesn't seem right. For example, if we have a 4-element
> > vector with some negative and some positive elements, I don't think it
> > makes sense. Also, we don't seem to generate (cmp @0 zerop). Am I
> > missing it completely?
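Regarding the question above about vectors with mixed-sign lanes: with GCC's vector extension the conditional selects lane by lane, so every lane still ends up holding its own absolute value. A minimal C++ sketch, assuming g++ (the C front end does not accept `?:` on vectors); the typedef and function names are illustrative only, and the shape matches the testcase Andrew posts in reply.

```cpp
#include <cassert>

// Two lanes of int, using the GCC vector_size extension (8 bytes).
typedef int v2si __attribute__ ((vector_size (8)));

// Each lane picks a or -a on its own, so lanes with different signs
// are handled independently: the result is |a| in every lane.
v2si select_abs (v2si a)
{
  v2si na = -a;
  return (a > 0) ? a : na;
}
```

Because the selection is per lane, folding this VEC_COND_EXPR into ABS_EXPR is valid for vectors just as for scalars.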
>
> Looks like I missed adding some vector testcases. Anyway, here is one
> that triggers this; note it is C++ because the C front-end does not support `?:`
> for vectors yet (there is a patch).
> ```
> #define vect8 __attribute__((vector_size(8)))
> vect8 int f(vect8 int a)
> {
>   vect8 int na = -a;
>   return (a > 0) ? a : na;
> }
> ```
> At -O2 before forwprop1, we have:
> ```
>   vector(2) intD.9 a_2(D) = aD.2796;
>   vector(2) intD.9 naD.2799;
>   vector(2)  _1;
>   vector(2) intD.9 _4;
>
>   na_3 = -a_2(D);
>   _1 = a_2(D) > { 0, 0 };
>   _4 = VEC_COND_EXPR <_1, a_2(D), na_3>;
> ```
> And forwprop using match is able to do:
> ```
> Applying pattern match.pd:6306, gimple-match-10.cc:19843
> gimple_simplified to _4 = ABS_EXPR ;
> Removi

Re: [RFC v1 1/2] Merge definitions of array_type_nelts_top()

2024-07-29 Thread Alejandro Colomar
Hi Richard,

On Mon, Jul 29, 2024 at 10:27:35AM GMT, Richard Biener wrote:
> On Sun, Jul 28, 2024 at 4:16 PM Alejandro Colomar  wrote:
> >
> > There were two identical definitions, and none of them are available
> > where they are needed for implementing _Lengthof().  Merge them, and
> > provide the single definition in gcc/tree.{h,cc}, where it's available
> > for _Lengthof().
> >
> > Signed-off-by: Alejandro Colomar 
> > ---
> >  gcc/cp/cp-tree.h  |  1 -
> >  gcc/cp/tree.cc| 13 -
> >  gcc/rust/backend/rust-tree.cc | 13 -
> >  gcc/rust/backend/rust-tree.h  |  2 --
> >  gcc/tree.cc   | 13 +
> >  gcc/tree.h|  1 +
> >  6 files changed, 14 insertions(+), 29 deletions(-)
> >

[...]

> > diff --git a/gcc/tree.cc b/gcc/tree.cc
> > index 2d2d5b6db6e..3b0adb4cd9f 100644
> > --- a/gcc/tree.cc
> > +++ b/gcc/tree.cc
> > @@ -3729,6 +3729,19 @@ array_type_nelts (const_tree type)
> >   ? max
> >   : fold_build2 (MINUS_EXPR, TREE_TYPE (max), max, min));
> >  }
> > +
> > +/* Return, as an INTEGER_CST node, the number of elements for TYPE
> > +   (which is an ARRAY_TYPE).  This counts only elements of the top
> > +   array.  */
> > +
> > +tree
> > +array_type_nelts_top (tree type)
> > +{
> > +  return fold_build2_loc (input_location,
> > + PLUS_EXPR, sizetype,
> > + array_type_nelts (type),
> > + size_one_node);
> > +}
> 
> But this is now extremely confusing API with array_type_nelts above this
> saying
> 
> /* Return, as a tree node, the number of elements for TYPE (which is an
>ARRAY_TYPE) minus one.  This counts only elements of the top array.  */
> 
> so both are "_top".  And there's build_array_type_nelts that's taking
> the number of elements.
> 
> Can you please rename the existing array_type_nelts to
> array_type_nelts_minus_one?  Then _top could be dropped as well from
> the alternate API  you add.

I wanted to do that, but then I found other functions that are named
similarly, such as build_array_type_nelts(), and thought that I wasn't
sure if all of them should be renamed to _minus_one, or just some.  So
I decided to start without renaming.

But yeah, I think I should rename.  I'll prepare a patch for renaming it
independently of this patch set, and send it to be merged before this
patch set.

> I'll also note since array_type_nelts_top calls the other function and that has
> 
>   /* If they did it with unspecified bounds, then we should have already
>  given an error about it before we got here.  */
>   if (! TYPE_DOMAIN (type))
> return error_mark_node;
> 
> the function should handle error_mark_node (and pass that down).

H, now I understand that (! TYPE_DOMAIN (type))

$ grep -rn return.array_type_nelts gcc
gcc/cp/call.cc:12111:return array_type_nelts_top (c->type);
gcc/c-family/c-common.cc:4090:  return array_type_nelts_top (type);

$ sed -n 12102,12119p gcc/cp/call.cc
/* Return a tree representing the number of elements initialized by the
   list-initialization C.  The caller must check that C converts to an
   array type.  */

static tree
nelts_initialized_by_list_init (conversion *c)
{
  /* If the array we're converting to has a dimension, we'll use that.  */
  if (TYPE_DOMAIN (c->type))
    return array_type_nelts_top (c->type);
  else
    {
      /* Otherwise, we look at how many elements the constructor we're
         initializing from has.  */
      tree ctor = conv_get_original_expr (c);
      return size_int (CONSTRUCTOR_NELTS (ctor));
    }
}

It seems that would fail when measuring for example

#define memberof(T, member)  ((T){}.member)

struct s {
int x;
int a[];
};

__lengthof__(memberof(struct s, a));

I guess?

$ cat len.c 
#include 

#define memberof(T, member)  ((T){}.member)

struct s {
int x;
int y[];
};

int
main(int argc, char *argv[argc + 1])
{
int a[42];
size_t  n;

(void) argv;

//n = __lengthof__(argv);
//printf("__lengthof__(argv) == %zu\n", n);

n = __lengthof__(a);
printf("lengthof(a):\t %zu\n", n);

n = __lengthof__(long [99]);
printf("lengthof(long [99]):\t %zu\n", n);

n = __lengthof__(short [n - 10]);
printf("lengthof(short [n - 10]):\t %zu\n", n);

int  b[n / 2];
n = __lengthof__(b);
printf("lengthof(b):\t %zu\n", n);

n = __lengthof__(memberof(struct s, y));
printf("lengthof(memberof(struct s, y)):\

Fix ICE with -fdump-tree-moref

2024-07-29 Thread Jan Hubicka
Hi,
this patch fixes a sanity check in the modref dump code that is no longer
correct now that we give up on parameters being read-only after store
merging.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

PR ipa/116055
* ipa-modref.cc (analyze_function): Do not ICE when flags regress.

diff --git a/gcc/ipa-modref.cc b/gcc/ipa-modref.cc
index f6a758b5f42..59cfe91f987 100644
--- a/gcc/ipa-modref.cc
+++ b/gcc/ipa-modref.cc
@@ -3297,7 +3297,8 @@ analyze_function (bool ipa)
fprintf (dump_file, "  Flags for param %i improved:",
 (int)i);
  else
-   gcc_unreachable ();
+   fprintf (dump_file, "  Flags for param %i changed:",
+(int)i);
  dump_eaf_flags (dump_file, old_flags, false);
  fprintf (dump_file, " -> ");
  dump_eaf_flags (dump_file, new_flags, true);
@@ -3313,7 +3314,7 @@ analyze_function (bool ipa)
  || (summary->retslot_flags & EAF_UNUSED))
fprintf (dump_file, "  Flags for retslot improved:");
  else
-   gcc_unreachable ();
+   fprintf (dump_file, "  Flags for retslot changed:");
  dump_eaf_flags (dump_file, past_retslot_flags, false);
  fprintf (dump_file, " -> ");
  dump_eaf_flags (dump_file, summary->retslot_flags, true);
@@ -3328,7 +3329,7 @@ analyze_function (bool ipa)
  || (summary->static_chain_flags & EAF_UNUSED))
fprintf (dump_file, "  Flags for static chain improved:");
  else
-   gcc_unreachable ();
+   fprintf (dump_file, "  Flags for static chain changed:");
  dump_eaf_flags (dump_file, past_static_chain_flags, false);
  fprintf (dump_file, " -> ");
  dump_eaf_flags (dump_file, summary->static_chain_flags, true);


Re: [PATCH v1] Widening-Mul: Try .SAT_SUB for PLUS_EXPR when one op is IMM

2024-07-29 Thread Richard Biener
On Sun, Jul 28, 2024 at 5:25 AM  wrote:
>
> From: Pan Li 
>
> After adding the matching for .SAT_SUB when one op is IMM, there
> will be a new root PLUS_EXPR for the .SAT_SUB pattern.  For example,
>
> Form 3:
>   #define DEF_SAT_U_SUB_IMM_FMT_3(T, IMM) \
>   T __attribute__((noinline)) \
>   sat_u_sub_imm##IMM##_##T##_fmt_3 (T x)  \
>   {   \
> return x >= IMM ? x - IMM : 0;\
>   }
>
> DEF_SAT_U_SUB_IMM_FMT_3(uint64_t, 11)
>
> We then have the gimple below before widening-mul.  Thus, also try
> .SAT_SUB for the PLUS_EXPR.
>
> __attribute__((noinline))
> uint64_t sat_u_sub_imm11_uint64_t_fmt_3 (uint64_t x)
> {
>   long unsigned int _1;
>   uint64_t _3;
>
>   [local count: 1073741824]:
>   _1 = MAX_EXPR ;
>   _3 = _1 + 18446744073709551605;
>   return _3;
>
> }
>
> The following test suites pass with this patch.
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.

OK

> gcc/ChangeLog:
>
> * tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children):
> Try .SAT_SUB for PLUS_EXPR case.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/tree-ssa-math-opts.cc | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index ac86be8eb94..8d96a4c964b 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -6129,6 +6129,7 @@ math_opts_dom_walker::after_dom_children (basic_block 
> bb)
>
> case PLUS_EXPR:
>   match_unsigned_saturation_add (&gsi, as_a (stmt));
> + match_unsigned_saturation_sub (&gsi, as_a (stmt));
>   /* fall-through  */
> case MINUS_EXPR:
>   if (!convert_plusminus_to_widen (&gsi, stmt, code))
> --
> 2.34.1
>
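To see why the root of the pattern ends up being a PLUS_EXPR, a small check may help: for uint64_t, `x >= 11 ? x - 11 : 0` equals `MAX (x, 11) + 18446744073709551605`, because 18446744073709551605 is 2^64 - 11 and unsigned addition wraps modulo 2^64. This is only a sketch of the arithmetic identity (the function names are hypothetical), not of the matcher in tree-ssa-math-opts.cc.

```cpp
#include <cassert>
#include <cstdint>

// Form 3 from the patch, instantiated for uint64_t and IMM = 11.
static std::uint64_t sat_sub_form3 (std::uint64_t x)
{
  return x >= 11 ? x - 11 : 0;
}

// The shape the GIMPLE takes before widening-mul: MAX_EXPR <x, 11>
// followed by a PLUS_EXPR of 2^64 - 11, relying on unsigned wrap-around.
static std::uint64_t sat_sub_max_plus (std::uint64_t x)
{
  std::uint64_t m = x > 11 ? x : 11;      // MAX_EXPR <x, 11>
  return m + 18446744073709551605u;       // same as m - 11 modulo 2^64
}
```

Since m is never below 11, the wrapped addition never underflows past zero, which is exactly the saturation the source expresses.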


Re: [C++ coroutines 6/6] Testsuite.

2024-07-29 Thread Thomas Schwinge
Hi Iain!

On 2019-11-17T10:28:26+, Iain Sandoe  wrote:
> There are two categories of test:
>
> 1. Checks for correctly formed source code and the error reporting.
> 2. Checks for transformation and code-gen.
>
> The second set are run as 'torture' tests for the standard options
> set, including LTO.  These are also intentionally run with no options
> provided (from the coroutines.exp script).

I recently was confused why I'm seeing the same test case first without
and then again with torture testing options; non-standard in the GCC test
suite, per my experience at least?  Should we therefore add a short
rationale comment to the 'find' in 'g++.dg/coroutines/coroutines.exp',
why 'g++.dg/coroutines/torture/' test cases are not being filtered out
there, despite more specific 'g++.dg/coroutines/torture/coro-torture.exp'
testing these, too?


Regards
 Thomas


> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/coroutines/coroutines.exp
> @@ -0,0 +1,50 @@

> +foreach test [lsort [find $srcdir/$subdir {*.[CH]}]] {
> +if [runtest_file_p $runtests $test] {
> +set nshort [file tail [file dirname $test]]/[file tail $test]
> +verbose "Testing $nshort $DEFAULT_COROFLAGS" 1
> +dg-test $test "" $DEFAULT_COROFLAGS
> +set testcase [string range $test [string length "$srcdir/"] end]
> +}

> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/coroutines/torture/coro-torture.exp

> +gcc-dg-runtest [lsort [glob $srcdir/$subdir/*.C]] "" $DEFAULT_COROFLAGS


Re: [RFC v1 1/2] Merge definitions of array_type_nelts_top()

2024-07-29 Thread Richard Biener
On Mon, Jul 29, 2024 at 10:55 AM Alejandro Colomar  wrote:
>
> Hi Richard,
>
> On Mon, Jul 29, 2024 at 10:27:35AM GMT, Richard Biener wrote:
> > On Sun, Jul 28, 2024 at 4:16 PM Alejandro Colomar  wrote:
> > >
> > > There were two identical definitions, and none of them are available
> > > where they are needed for implementing _Lengthof().  Merge them, and
> > > provide the single definition in gcc/tree.{h,cc}, where it's available
> > > for _Lengthof().
> > >
> > > Signed-off-by: Alejandro Colomar 
> > > ---
> > >  gcc/cp/cp-tree.h  |  1 -
> > >  gcc/cp/tree.cc| 13 -
> > >  gcc/rust/backend/rust-tree.cc | 13 -
> > >  gcc/rust/backend/rust-tree.h  |  2 --
> > >  gcc/tree.cc   | 13 +
> > >  gcc/tree.h|  1 +
> > >  6 files changed, 14 insertions(+), 29 deletions(-)
> > >
>
> [...]
>
> > > diff --git a/gcc/tree.cc b/gcc/tree.cc
> > > index 2d2d5b6db6e..3b0adb4cd9f 100644
> > > --- a/gcc/tree.cc
> > > +++ b/gcc/tree.cc
> > > @@ -3729,6 +3729,19 @@ array_type_nelts (const_tree type)
> > >   ? max
> > >   : fold_build2 (MINUS_EXPR, TREE_TYPE (max), max, min));
> > >  }
> > > +
> > > +/* Return, as an INTEGER_CST node, the number of elements for TYPE
> > > +   (which is an ARRAY_TYPE).  This counts only elements of the top
> > > +   array.  */
> > > +
> > > +tree
> > > +array_type_nelts_top (tree type)
> > > +{
> > > +  return fold_build2_loc (input_location,
> > > + PLUS_EXPR, sizetype,
> > > + array_type_nelts (type),
> > > + size_one_node);
> > > +}
> >
> > But this is now extremely confusing API with array_type_nelts above this
> > saying
> >
> > /* Return, as a tree node, the number of elements for TYPE (which is an
> >ARRAY_TYPE) minus one.  This counts only elements of the top array.  */
> >
> > so both are "_top".  And there's build_array_type_nelts that's taking
> > the number of elements.
> >
> > Can you please rename the existing array_type_nelts to
> > array_type_nelts_minus_one?  Then _top could be dropped as well from
> > the alternate API  you add.
>
> I wanted to do that, but then I found other functions that are named
> similarly, such as build_array_type_nelts(), and thought that I wasn't
> sure if all of them should be renamed to _minus_one, or just some.  So
> I decided to start without renaming.

Just array_type_nelts needs renaming, build_array_type_nelts is fine.

> But yeah, I think I should rename.  I'll prepare a patch for renaming it
> independently of this patch set, and send it to be merged before this
> patch set.

Thanks.

> > I'll also note since array_type_nelts_top calls the other function and that has
> >
> >   /* If they did it with unspecified bounds, then we should have already
> >  given an error about it before we got here.  */
> >   if (! TYPE_DOMAIN (type))
> > return error_mark_node;
> >
> > the function should handle error_mark_node (and pass that down).
>
> H, now I understand that (! TYPE_DOMAIN (type))
>
> $ grep -rn return.array_type_nelts gcc
> gcc/cp/call.cc:12111:return array_type_nelts_top (c->type);
> gcc/c-family/c-common.cc:4090:  return array_type_nelts_top (type);
>
> $ sed -n 12102,12119p gcc/cp/call.cc
> /* Return a tree representing the number of elements initialized by the
>    list-initialization C.  The caller must check that C converts to an
>    array type.  */
>
> static tree
> nelts_initialized_by_list_init (conversion *c)
> {
>   /* If the array we're converting to has a dimension, we'll use that.  */
>   if (TYPE_DOMAIN (c->type))
>     return array_type_nelts_top (c->type);
>   else
>     {
>       /* Otherwise, we look at how many elements the constructor we're
>          initializing from has.  */
>       tree ctor = conv_get_original_expr (c);
>       return size_int (CONSTRUCTOR_NELTS (ctor));
>     }
> }

The point is that if you make this a general API it should be safe to be used,
not depending on constraints that are apparently checked right now.
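To make the naming and error-propagation points concrete, here is a self-contained C++ model. It is not GCC's tree API: `domain`, `nelts_result`, `nelts_minus_one` and `nelts` are hypothetical stand-ins for TYPE_DOMAIN, error_mark_node, array_type_nelts and array_type_nelts_top. The outer function checks for the error result and passes it through unchanged instead of adding one to it.

```cpp
#include <cassert>

// Hypothetical stand-in for an array domain [min, max]; has_domain == false
// models an array type without TYPE_DOMAIN.
struct domain
{
  bool has_domain;
  long min, max;
};

// ok == false plays the role of error_mark_node.
struct nelts_result
{
  bool ok;
  long value;
};

// Counterpart of array_type_nelts: max - min, i.e. the count minus one.
static nelts_result nelts_minus_one (const domain &d)
{
  if (!d.has_domain)
    return { false, 0 };
  return { true, d.max - d.min };
}

// Counterpart of array_type_nelts_top: adds one, but propagates an error
// unchanged rather than computing "error + 1".
static nelts_result nelts (const domain &d)
{
  nelts_result r = nelts_minus_one (d);
  if (!r.ok)
    return r;
  r.value += 1;
  return r;
}
```

With this split, the "_minus_one" name states the off-by-one explicitly, and the plain name can return the element count.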

> It seems that would fail when measuring for example
>
> #define memberof(T, member)  ((T){}.member)
>
> struct s {
> int x;
> int a[];
> };
>
> __lengthof__(memberof(struct s, a));
>
> I guess?
>
> $ cat len.c
> #include 
>
> #define memberof(T, member)  ((T){}.member)
>
> struct s {
> int x;
> int y[];
> };
>
> int
> main(int argc, char *argv[argc + 1])
> {
> int a[42];
> size_t  n;
>
> (void) argv;
>
> //n = __lengthof__(argv);
> //printf("__lengthof__(argv) == %zu\n", n);
>
>  

Re: [C++ coroutines 6/6] Testsuite.

2024-07-29 Thread Iain Sandoe
Hi Thomas,

> On 29 Jul 2024, at 10:06, Thomas Schwinge  wrote:
> On 2019-11-17T10:28:26+, Iain Sandoe  wrote:
>> There are two categories of test:
>> 
>> 1. Checks for correctly formed source code and the error reporting.
>> 2. Checks for transformation and code-gen.
>> 
>> The second set are run as 'torture' tests for the standard options
>> set, including LTO.  These are also intentionally run with no options
>> provided (from the coroutines.exp script).
> 
> I recently was confused why I'm seeing the same test case first without
> and then again with torture testing options; non-standard in the GCC test
> suite, per my experience at least?  Should we therefore add a short
> rationale comment to the 'find' in 'g++.dg/coroutines/coroutines.exp',
> why 'g++.dg/coroutines/torture/' test cases are not being filtered out
> there, despite more specific 'g++.dg/coroutines/torture/coro-torture.exp'
> testing these, too?

Well, I’d say not “more specific” so much as “additional cases” - the
torture tests do not usually** include the default (i.e. no options case).  I
wanted the default to be included - and this was a reasonable way to
do it (as would be adding another case to the torture list).  It’s quite
convenient to get at least the default case run for every test with:
` make check-gcc-c++ RUNTESTFLAGS=coroutines.exp `

Having said that, if the consensus is to add an additional case to
the torture list (I think that this can be done per directory), then I’d also
be OK with that.

cheers
Iain

** last I checked anyway - if that has changed then we should adjust the
current setup.

> 
> 
> Regards
> Thomas
> 
> 
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/coroutines/coroutines.exp
>> @@ -0,0 +1,50 @@
> 
>> +foreach test [lsort [find $srcdir/$subdir {*.[CH]}]] {
>> +if [runtest_file_p $runtests $test] {
>> +set nshort [file tail [file dirname $test]]/[file tail $test]
>> +verbose "Testing $nshort $DEFAULT_COROFLAGS" 1
>> +dg-test $test "" $DEFAULT_COROFLAGS
>> +set testcase [string range $test [string length "$srcdir/"] end]
>> +}
> 
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/coroutines/torture/coro-torture.exp
> 
>> +gcc-dg-runtest [lsort [glob $srcdir/$subdir/*.C]] "" $DEFAULT_COROFLAGS



Re: Performance improvement for std::to_chars(char* first, char* last, /* integer-type */ value, int base = 10 );

2024-07-29 Thread Jonathan Wakely
On Mon, 29 Jul 2024 at 09:42, Ehrnsperger, Markus
 wrote:
>
> Hi,
>
>
> I'm attaching two files:
>
> 1.:   to_chars10.h:
>
> This is intended to be included in libstdc++ / gcc to achieve performance 
> improvements. It is an implementation of
>
> to_chars10(char* first, char* last,  /* integer-type */ value);
>
> Parameters are identical to std::to_chars(char* first, char* last,  /* 
> integer-type */ value, int base = 10 ); . It only works for base == 10.
>
> If it is included in libstdc++, to_chars10(...) could be renamed to 
> std::to_chars(char* first, char* last,  /* integer-type */ value) to provide 
> an overload for the default base = 10

Thanks for the email. This isn't in the form of a patch that we can
accept as-is, although I see that the license is compatible with
libstdc++, so if you are looking to contribute it then that could be
done either by assigning copyright to the FSF or under the DCO terms.
See https://gcc.gnu.org/contribute.html#legal for more details.

I haven't looked at the code in detail, but is it a similar approach
to https://jk-jeon.github.io/posts/2022/02/jeaiii-algorithm/ ?
How does it compare to the performance of that algorithm?

I have an incomplete implementation of that algorithm for libstdc++
somewhere, but I haven't looked at it for a while.


>
> 2.:  to_chars10.cpp:
>
> This is a test program for to_chars10 verifying the correctness of the 
> results, and measuring the performance. The actual performance improvement is 
> system dependent, so please test on your own system.
>
> On my system the performance improvement is about factor two, my results are:
>
>
> Test   int8_t verifying to_chars10 = std::to_chars ... OK
> Test  uint8_t verifying to_chars10 = std::to_chars ... OK
> Test  int16_t verifying to_chars10 = std::to_chars ... OK
> Test uint16_t verifying to_chars10 = std::to_chars ... OK
> Test  int32_t verifying to_chars10 = std::to_chars ... OK
> Test uint32_t verifying to_chars10 = std::to_chars ... OK
> Test  int64_t verifying to_chars10 = std::to_chars ... OK
> Test uint64_t verifying to_chars10 = std::to_chars ... OK
>
> Benchmarking test case   tested method  ...  time (lower is better)
> Benchmarking random unsigned 64 bit  to_chars10 ...  0.00957
> Benchmarking random unsigned 64 bit  std::to_chars  ...  0.01854
> Benchmarking random   signed 64 bit  to_chars10 ...  0.01018
> Benchmarking random   signed 64 bit  std::to_chars  ...  0.02297
> Benchmarking random unsigned 32 bit  to_chars10 ...  0.00620
> Benchmarking random unsigned 32 bit  std::to_chars  ...  0.01275
> Benchmarking random   signed 32 bit  to_chars10 ...  0.00783
> Benchmarking random   signed 32 bit  std::to_chars  ...  0.01606
> Benchmarking random unsigned 16 bit  to_chars10 ...  0.00536
> Benchmarking random unsigned 16 bit  std::to_chars  ...  0.00871
> Benchmarking random   signed 16 bit  to_chars10 ...  0.00664
> Benchmarking random   signed 16 bit  std::to_chars  ...  0.01154
> Benchmarking random unsigned 08 bit  to_chars10 ...  0.00393
> Benchmarking random unsigned 08 bit  std::to_chars  ...  0.00626
> Benchmarking random   signed 08 bit  to_chars10 ...  0.00465
> Benchmarking random   signed 08 bit  std::to_chars  ...  0.01089
>
>
> Thanks, Markus
>
>
>
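For comparison purposes, one common baseline technique for base-10 conversion emits two digits per division by 100, using a table of the 100 two-digit pairs. The C++17 sketch below shows that technique only; it is neither Markus's to_chars10 nor the jeaiii algorithm linked above, and `u64_to_dec` is a hypothetical name that returns nullptr on overflow instead of a std::to_chars_result.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <string>
#include <system_error>

// Writes the decimal form of VALUE into [first, last) and returns the end
// pointer, or nullptr if the buffer is too small.
char* u64_to_dec (char* first, char* last, std::uint64_t value)
{
  static const char pairs[201] =
    "00010203040506070809" "10111213141516171819"
    "20212223242526272829" "30313233343536373839"
    "40414243444546474849" "50515253545556575859"
    "60616263646566676869" "70717273747576777879"
    "80818283848586878889" "90919293949596979899";

  // Count digits first (plain loop, for clarity) so we can fill right-to-left.
  unsigned len = 1;
  for (std::uint64_t v = value; v >= 10; v /= 10)
    ++len;
  if (last - first < static_cast<std::ptrdiff_t> (len))
    return nullptr;

  char* p = first + len;
  while (value >= 100)          // two digits per iteration
    {
      unsigned idx = static_cast<unsigned> (value % 100) * 2;
      value /= 100;
      *--p = pairs[idx + 1];
      *--p = pairs[idx];
    }
  if (value >= 10)              // last two digits
    {
      unsigned idx = static_cast<unsigned> (value) * 2;
      *--p = pairs[idx + 1];
      *--p = pairs[idx];
    }
  else                          // last single digit
    *--p = static_cast<char> ('0' + value);
  return first + len;
}
```

Benchmarks against this kind of baseline (and against the jeaiii approach) would help show where the factor-of-two improvement comes from.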


Re: [PATCH v1 1/3] aarch64: store signing key and signing method in DWARF _Unwind_FrameState

2024-07-29 Thread Matthieu Longo

On 2024-07-19 15:54, Matthieu Longo wrote:

This patch is only a refactoring of the existing implementation
of PAuth and returned-address signing. The existing behavior is
preserved.

_Unwind_FrameState already contains several pieces of CIE and FDE
information (see the attributes below the comment "The information we
care about from the CIE/FDE" in libgcc/unwind-dw2.h).
The patch moves the information from the DWARF CIE (the signing key,
stored in the augmentation string) and from the FDE (the signing method
used) into _Unwind_FrameState, alongside the CIE and FDE information
that is already stored there.
Note: this information has to be saved in frame_state_reg_info
instead of _Unwind_FrameState, as it needs to be saved by
DW_CFA_remember_state and restored by DW_CFA_restore_state, which
both rely on the attribute "prev".

This new information in _Unwind_FrameState simplifies the look-up
of the signing key when the return address is demangled. It also
allows future signing methods to be added easily.

_Unwind_FrameState is not a part of the public API of libunwind,
so the change is backward compatible.

A new architecture-specific handler MD_ARCH_EXTENSION_FRAME_INIT
allows resetting values (if needed) in the frame state and unwind
context before changing the frame state to the caller context.

A new architecture-specific handler MD_ARCH_EXTENSION_CIE_AUG_HANDLER
isolates the architecture-specific augmentation strings in the AArch64
backend, and allows other architectures to reuse augmentation
strings that would have clashed with AArch64 DWARF extensions.

aarch64_demangle_return_addr, DW_CFA_AARCH64_negate_ra_state and
DW_CFA_val_expression cases in libgcc/unwind-dw2-execute_cfa.h
were documented to clarify where the value of the RA state register
is stored (FS and CONTEXT respectively).
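The remember/restore constraint in the note above can be modelled in a few lines. This is a simplified C++ sketch, not libgcc's code: only AARCH64_DWARF_RA_STATE_MASK and the AARCH64_RA_* values come from the patch, while `reg_info`, `ra_state_toggle`, `remember_state` and `restore_state` are hypothetical. It shows that state kept in the per-register record is snapshotted by DW_CFA_remember_state and brought back by DW_CFA_restore_state, so a DW_CFA_AARCH64_negate_ra_state toggle between the two is undone.

```cpp
#include <cassert>

// Values taken from the patch.
#define AARCH64_DWARF_RA_STATE_MASK 0x1
enum { AARCH64_RA_no_signing = 0x0, AARCH64_RA_signing_SP = 0x1 };

// Hypothetical, heavily simplified frame_state_reg_info: just the RA state
// and the "prev" link used by remember/restore.
struct reg_info
{
  unsigned ra_state;
  reg_info *prev;
};

// Models DW_CFA_AARCH64_negate_ra_state: flip the RA state bit.
static void ra_state_toggle (reg_info *rs)
{
  rs->ra_state ^= AARCH64_DWARF_RA_STATE_MASK;
}

// Models DW_CFA_remember_state: push a copy of the current record.
static void remember_state (reg_info *rs)
{
  reg_info *saved = new reg_info (*rs);
  rs->prev = saved;
}

// Models DW_CFA_restore_state: pop the most recently remembered record.
// The caller must have remembered a state first (rs->prev != nullptr).
static void restore_state (reg_info *rs)
{
  reg_info *saved = rs->prev;
  *rs = *saved;   // also restores the older prev link
  delete saved;
}
```

Had the RA state lived outside the remembered record, restore_state would leave the toggled value in place.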

libgcc/ChangeLog:

   * config/aarch64/aarch64-unwind.h
   (AARCH64_DWARF_RA_STATE_MASK): The mask for RA state register.
   (aarch64_RA_signing_method_t): The diversifiers used to sign a
   function's return address.
   (aarch64_pointer_auth_key): The key used to sign a function's
   return address.
   (aarch64_cie_signed_with_b_key): Deleted as the signing key is
   available now in _Unwind_FrameState.
   (MD_ARCH_EXTENSION_CIE_AUG_HANDLER): New CIE augmentation string
   handler for architecture extensions.
   (MD_ARCH_EXTENSION_FRAME_INIT): New architecture-extension
   initialization routine for DWARF frame state and context before
   execution of DWARF instructions.
   (aarch64_context_RA_state_get): Read RA state register from CONTEXT.
   (aarch64_RA_state_get): Read RA state register from FS.
   (aarch64_RA_state_set): Write RA state register into FS.
   (aarch64_RA_state_toggle): Toggle RA state register in FS.
   (aarch64_cie_aug_handler): Handle AArch64 augmentation strings.
   (aarch64_arch_extension_frame_init): Initialize defaults for the
   signing key (PAUTH_KEY_A), and RA state register (RA_no_signing).
   (aarch64_demangle_return_addr): Rely on the frame registers and
   the signing_key attribute in _Unwind_FrameState.
   * unwind-dw2-execute_cfa.h:
   Use the right alias DW_CFA_AARCH64_negate_ra_state for __aarch64__
   instead of DW_CFA_GNU_window_save.
   (DW_CFA_AARCH64_negate_ra_state): Save the signing method in RA
   state register. Toggle RA state register without resetting 'how'
   to REG_UNSAVED.
   * unwind-dw2.c:
   (extract_cie_info): Save the signing key in the current
   _Unwind_FrameState while parsing the augmentation data.
   (uw_frame_state_for): Reset some attributes related to architecture
   extensions in _Unwind_FrameState.
   (uw_update_context): Move authentication code to AArch64 unwinding.
   * unwind-dw2.h (enum register_rule): Give a name to the existing
   enum for the register rules, and replace 'unsigned char' by 'enum
   register_rule' to facilitate debugging in GDB.
   (_Unwind_FrameState): Add a new architecture-extension attribute
   to store the signing key.
---
  libgcc/config/aarch64/aarch64-unwind.h | 154 -
  libgcc/unwind-dw2-execute_cfa.h|  34 --
  libgcc/unwind-dw2.c|  19 ++-
  libgcc/unwind-dw2.h|  17 ++-
  4 files changed, 175 insertions(+), 49 deletions(-)

diff --git a/libgcc/config/aarch64/aarch64-unwind.h b/libgcc/config/aarch64/aarch64-unwind.h
index daf96624b5e..cc225a7e207 100644
--- a/libgcc/config/aarch64/aarch64-unwind.h
+++ b/libgcc/config/aarch64/aarch64-unwind.h
@@ -25,55 +25,155 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
  #if !defined (AARCH64_UNWIND_H) && !defined (__ILP32__)
  #define AARCH64_UNWIND_H
  
-#define DWARF_REGNUM_AARCH64_RA_STATE 34

+#include "ansidecl.h"
+#include 
+
+#define AARCH64_DWARF_REGNUM_RA_STATE 34
+#define AARCH64_DWARF_RA_STATE_MASK   0x1
+
+/* The diversifiers used to sign a function's return address. */
+typedef enum
+{
+  AARCH64_RA_no_signing = 0x0,
+  AARCH64_RA_signing_SP = 0x1,
+} __attribute__((packed)) aarch64_RA_signing_method_t;
+

Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS

2024-07-29 Thread Prathamesh Kulkarni
Hi Richard,
Thanks for your suggestions on the RFC email. The attached patch adds support for
streaming of poly_int when its degree <= accel's NUM_POLY_INT_COEFFS.
The patch changes streaming of poly_int as follows:

Streaming out poly_int:

degree = poly_int.degree();
stream out degree;
for (i = 0; i < degree; i++)
  stream out poly_int.coeffs[i];

Streaming in poly_int:

stream in degree;
if (degree > NUM_POLY_INT_COEFFS)
  fatal_error();
for (i = 0; i < degree; i++)
  stream in poly_int.coeffs[i];
// Set remaining coeffs to zero in case degree < accel's NUM_POLY_INT_COEFFS
for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
  poly_int.coeffs[i] = 0;
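The scheme above can be sketched in self-contained C++ as follows. Everything here is hypothetical: a std::vector stands in for the LTO stream, `poly<N>` for poly_int, and "degree" is assumed to mean the number of coefficients up to and including the last nonzero one (at least 1); the patch's own poly_int::degree may be defined differently. Writing with N = 2 and reading with M = 1 succeeds when the second coefficient is zero and fails otherwise, matching the partial support described here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a poly_int with N coefficients.
template <unsigned N>
struct poly { std::int64_t coeffs[N]; };

// Assumed definition: coefficients up to and including the last nonzero one.
template <unsigned N>
unsigned degree (const poly<N> &p)
{
  unsigned d = N;
  while (d > 1 && p.coeffs[d - 1] == 0)
    --d;
  return d;
}

// Stream out: degree first, then only that many coefficients.
template <unsigned N>
void write_poly (std::vector<std::int64_t> &out, const poly<N> &p)
{
  unsigned d = degree (p);
  out.push_back (d);
  for (unsigned i = 0; i < d; ++i)
    out.push_back (p.coeffs[i]);
}

// Stream in with the reader's own coefficient count M (the "accel" side).
template <unsigned M>
poly<M> read_poly (const std::vector<std::int64_t> &in, std::size_t &pos)
{
  unsigned d = static_cast<unsigned> (in.at (pos++));
  if (d > M)   // plays the role of fatal_error ()
    throw std::runtime_error ("poly_int degree exceeds NUM_POLY_INT_COEFFS");
  poly<M> p;
  for (unsigned i = 0; i < d; ++i)
    p.coeffs[i] = in.at (pos++);
  for (unsigned i = d; i < M; ++i)   // zero-fill the remaining coefficients
    p.coeffs[i] = 0;
  return p;
}
```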

Patch passes bootstrap+test and LTO bootstrap+test on aarch64-linux-gnu.
LTO bootstrap+test on x86_64-linux-gnu in progress.

I am not quite sure how to test it for offloading, since offloading is currently
(entirely) broken for aarch64->nvptx.
I can give x86_64->nvptx offloading a try if required (although I guess LTO
bootstrap should already exercise the streaming changes?)

Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh
Partially support streaming of poly_int for offloading.

Support streaming of poly_int for offloading when its degree doesn't exceed
accel's NUM_POLY_INT_COEFFS.

The patch changes streaming of poly_int as follows:

Streaming out poly_int:

degree = poly_int.degree();
stream out degree;
for (i = 0; i < degree; i++)
  stream out poly_int.coeffs[i];

Streaming in poly_int (for accelerator):

stream in degree;
if (degree > NUM_POLY_INT_COEFFS)
  fatal_error();
for (i = 0; i < degree; i++)
  stream in poly_int.coeffs[i];
// Set remaining coeffs to zero in case degree < accel's NUM_POLY_INT_COEFFS
for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
  poly_int.coeffs[i] = 0;

gcc/ChangeLog:

* data-streamer-in.cc (streamer_read_poly_uint64): Stream in poly_int
degree and call poly_int_read_common. 
(streamer_read_poly_int64): Likewise.
* data-streamer-out.cc (streamer_write_poly_uint64): Stream out poly_int
degree.
(streamer_write_poly_int64): Likewise.
* data-streamer.h (bp_pack_poly_value): Likewise.
(poly_int_read_common): New function template.
(bp_unpack_poly_value): Stream in poly_int degree and call
poly_int_read_common.
* poly-int.h (C>::degree): New method.
* tree-streamer-in.cc (lto_input_ts_poly_tree_pointers): Stream in
POLY_INT_CST degree, issue a fatal_error if degree is greater than
NUM_POLY_INT_COEFFS, and zero out remaining coeffs. 
* tree-streamer-out.cc (write_ts_poly_tree_pointers): Calculate and
stream out POLY_INT_CST degree.

Signed-off-by: Prathamesh Kulkarni 

diff --git a/gcc/data-streamer-in.cc b/gcc/data-streamer-in.cc
index 7dce2928ef0..91cece39b05 100644
--- a/gcc/data-streamer-in.cc
+++ b/gcc/data-streamer-in.cc
@@ -182,10 +182,11 @@ streamer_read_hwi (class lto_input_block *ib)
 poly_uint64
 streamer_read_poly_uint64 (class lto_input_block *ib)
 {
+  unsigned degree = streamer_read_uhwi (ib);
   poly_uint64 res;
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  for (unsigned int i = 0; i < degree; ++i)
 res.coeffs[i] = streamer_read_uhwi (ib);
-  return res;
+  return poly_int_read_common (res, degree);
 }
 
 /* Read a poly_int64 from IB.  */
@@ -193,10 +194,11 @@ streamer_read_poly_uint64 (class lto_input_block *ib)
 poly_int64
 streamer_read_poly_int64 (class lto_input_block *ib)
 {
+  unsigned degree = streamer_read_uhwi (ib);
   poly_int64 res;
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  for (unsigned int i = 0; i < degree; ++i)
 res.coeffs[i] = streamer_read_hwi (ib);
-  return res;
+  return poly_int_read_common (res, degree);
 }
 
 /* Read gcov_type value from IB.  */
diff --git a/gcc/data-streamer-out.cc b/gcc/data-streamer-out.cc
index c237e30f704..b0fb9dedb24 100644
--- a/gcc/data-streamer-out.cc
+++ b/gcc/data-streamer-out.cc
@@ -227,7 +227,9 @@ streamer_write_hwi (struct output_block *ob, HOST_WIDE_INT work)
 void
 streamer_write_poly_uint64 (struct output_block *ob, poly_uint64 work)
 {
-  for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  unsigned degree = work.degree ();
+  streamer_write_uhwi_stream (ob->main_stream, degree);
+  for (unsigned i = 0; i < degree; ++i)
     streamer_write_uhwi_stream (ob->main_stream, work.coeffs[i]);
 }
 
@@ -236,7 +238,9 @@ streamer_write_poly_uint64 (struct output_block *ob, poly_uint64 work)
 void
 streamer_write_poly_int64 (struct output_block *ob, poly_int64 work)
 {
-  for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  unsigned degree = work.degree ();
+  streamer_write_uhwi_stream (ob->main_stream, degree);
+  for (unsigned i = 0; i < degree; ++i)
     streamer_write_hwi_stream (ob->main_stream, work.coeffs[i]);
 }
 
diff --git a/gcc/data-streamer.h b/gcc/data-streamer.h
index 6a2596134ce..b154c439b8c 100644
--- a/gcc/data-streamer.h
+++ b/gcc/data-streamer.h
@@ -142,7 +142,9 @@ bp_pack_poly_value (struct bitpack_d *bp,
                     const poly_int<N, C> &val,
                     unsigned nbits)
 {
-  for (

Re: Performance improvement for std::to_chars(char* first, char* last, /* integer-type */ value, int base = 10 );

2024-07-29 Thread Jonathan Wakely
On Mon, 29 Jul 2024 at 10:45, Jonathan Wakely  wrote:
>
> On Mon, 29 Jul 2024 at 09:42, Ehrnsperger, Markus
>  wrote:
> >
> > Hi,
> >
> >
> > I'm attaching two files:
> >
> > 1.:   to_chars10.h:
> >
> > This is intended to be included in libstdc++ / gcc to achieve performance 
> > improvements. It is an implementation of
> >
> > to_chars10(char* first, char* last,  /* integer-type */ value);
> >
> > Parameters are identical to std::to_chars(char* first, char* last,  /* 
> > integer-type */ value, int base = 10 ); . It only works for base == 10.
> >
> > If it is included in libstdc++, to_chars10(...) could be renamed to 
> > std::to_chars(char* first, char* last,  /* integer-type */ value) to 
> > provide an overload for the default base = 10
>
> Thanks for the email. This isn't in the form of a patch that we can
> accept as-is, although I see that the license is compatible with
> libstdc++, so if you are looking to contribute it then that could be
> done either by assigning copyright to the FSF or under the DCO terms.
> See https://gcc.gnu.org/contribute.html#legal for more details.
>
> I haven't looked at the code in detail, but is it a similar approach
> to https://jk-jeon.github.io/posts/2022/02/jeaiii-algorithm/ ?
> How does it compare to the performance of that algorithm?
>
> I have an incomplete implementation of that algorithm for libstdc++
> somewhere, but I haven't looked at it for a while.

I took a closer look and the reinterpret_casts worried me, so I tried
your test code with UBsan. There are a number of errors that would
need to be fixed before we would consider using this code.
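As an illustration only, here is a minimal sketch of a base-10 conversion that emits two digits per step from a lookup table and copies them with std::memcpy, rather than storing through a pointer obtained with reinterpret_cast. Such casts are the kind of access UBSan can flag (for example, a misaligned 16-bit store); the actual UBSan reports are not shown in this thread. The name to_chars10_sketch is hypothetical, it handles only std::uint64_t, and it is neither the attached to_chars10.h nor libstdc++'s implementation.

```cpp
#include <cassert>
#include <charconv>      // std::to_chars_result
#include <cstddef>       // std::ptrdiff_t
#include <cstdint>
#include <cstring>       // std::memcpy
#include <system_error>  // std::errc

// Hypothetical helper: decimal conversion of an unsigned 64-bit value,
// writing two digits per loop iteration from a lookup table.
inline std::to_chars_result
to_chars10_sketch(char* first, char* last, std::uint64_t value)
{
  assert(first <= last);

  // "00" "01" ... "99": the two characters for n start at digits[2 * n].
  static constexpr char digits[201] =
    "00010203040506070809"
    "10111213141516171819"
    "20212223242526272829"
    "30313233343536373839"
    "40414243444546474849"
    "50515253545556575859"
    "60616263646566676869"
    "70717273747576777879"
    "80818283848586878889"
    "90919293949596979899";

  // Count the output length first so digits can be written right to left.
  unsigned len = 1;
  for (std::uint64_t v = value; v >= 10; v /= 10)
    ++len;
  if (last - first < static_cast<std::ptrdiff_t>(len))
    return {last, std::errc::value_too_large};

  char* pos = first + len;
  while (value >= 100)
    {
      const unsigned idx = static_cast<unsigned>(value % 100) * 2;
      value /= 100;
      pos -= 2;
      // Copy two chars; no type-punned or possibly misaligned 16-bit store.
      std::memcpy(pos, digits + idx, 2);
    }
  if (value >= 10)
    {
      pos -= 2;
      std::memcpy(pos, digits + value * 2, 2);
    }
  else
    *--pos = static_cast<char>('0' + value);

  return {first + len, std::errc{}};
}
```

A signed overload could emit '-' and then convert the magnitude as an unsigned value; comparing against std::to_chars on random inputs, as the attached test program does, is a simple way to check such a routine.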


>
>
> >
> > 2.:  to_chars10.cpp:
> >
> > This is a test program for to_chars10 verifying the correctness of the 
> > results, and measuring the performance. The actual performance improvement 
> > is system dependent, so please test on your own system.
> >
> > On my system the performance improvement is about factor two, my results 
> > are:
> >
> >
> > Test   int8_t verifying to_chars10 = std::to_chars ... OK
> > Test  uint8_t verifying to_chars10 = std::to_chars ... OK
> > Test  int16_t verifying to_chars10 = std::to_chars ... OK
> > Test uint16_t verifying to_chars10 = std::to_chars ... OK
> > Test  int32_t verifying to_chars10 = std::to_chars ... OK
> > Test uint32_t verifying to_chars10 = std::to_chars ... OK
> > Test  int64_t verifying to_chars10 = std::to_chars ... OK
> > Test uint64_t verifying to_chars10 = std::to_chars ... OK
> >
> > Benchmarking test case   tested method  ...  time (lower is better)
> > Benchmarking random unsigned 64 bit  to_chars10 ...  0.00957
> > Benchmarking random unsigned 64 bit  std::to_chars  ...  0.01854
> > Benchmarking random   signed 64 bit  to_chars10 ...  0.01018
> > Benchmarking random   signed 64 bit  std::to_chars  ...  0.02297
> > Benchmarking random unsigned 32 bit  to_chars10 ...  0.00620
> > Benchmarking random unsigned 32 bit  std::to_chars  ...  0.01275
> > Benchmarking random   signed 32 bit  to_chars10 ...  0.00783
> > Benchmarking random   signed 32 bit  std::to_chars  ...  0.01606
> > Benchmarking random unsigned 16 bit  to_chars10 ...  0.00536
> > Benchmarking random unsigned 16 bit  std::to_chars  ...  0.00871
> > Benchmarking random   signed 16 bit  to_chars10 ...  0.00664
> > Benchmarking random   signed 16 bit  std::to_chars  ...  0.01154
> > Benchmarking random unsigned 08 bit  to_chars10 ...  0.00393
> > Benchmarking random unsigned 08 bit  std::to_chars  ...  0.00626
> > Benchmarking random   signed 08 bit  to_chars10 ...  0.00465
> > Benchmarking random   signed 08 bit  std::to_chars  ...  0.01089
> >
> >
> > Thanks, Markus
> >
> >
> >


Re: [PATCH ver 2] rs6000, Add new overloaded vector shift builtin int128, varients

2024-07-29 Thread Kewen.Lin
Hi Carl,

on 2024/7/27 06:37, Carl Love wrote:
> GCC developers:
> 
> Version 2, updated rs6000-overload.def to remove adding additional internal
> names and to change XXSLDWI_Q to XXSLDWI_1TI per comments from Kewen.  Move 
> new documentation statement for the PIVPR built-ins per comments from Kewen.  
> Updated dg-do-run directive and added comment about the save-temps  in 
> testcase per feedback from Segher.  Retested the patch on Power 10 with no 
> regressions.
> 
> The following patch adds the int128 variants to the existing overloaded
> built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, vec_srdb, vec_srl,
> vec_sro.  These variants were requested by Steve Munroe.
> 
> The patch has been tested on a Power 10 system with no regressions.
> 
> Please let me know if the patch is acceptable for mainline.
> 
>    Carl
> 
> 
> ---
> rs6000, Add new overloaded vector shift builtin int128 variants
> 
> Add the signed __int128 and unsigned __int128 argument types for the
> overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
> vec_srdb, vec_srl, vec_sro.  For each of the new argument types add a
> testcase and update the documentation for the built-in.
> 
> gcc/ChangeLog:
>     * config/rs6000/altivec.md (vsdb_): Change
>     define_insn iterator to VEC_IC.
>     * config/rs6000/rs6000-builtins.def (__builtin_altivec_vsldoi_v1ti,
>     __builtin_vsx_xxsldwi_v1ti, __builtin_altivec_vsldb_v1ti,
>     __builtin_altivec_vsrdb_v1ti): New builtin definitions.
>     * config/rs6000/rs6000-overload.def (vec_sld, vec_sldb, vec_sldw,
>     vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro): New overloaded
>     definitions.
>     * doc/extend.texi (vec_sld, vec_sldb, vec_sldw,    vec_sll, vec_slo,

Nit: s// /

>     vec_srdb, vec_srl, vec_sro): Add documentation for new overloaded
>     built-ins.
> 
> gcc/testsuite/ChangeLog:
>     * gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test file.
> ---
>  gcc/config/rs6000/altivec.md  |   6 +-
>  gcc/config/rs6000/rs6000-builtins.def |  12 +
>  gcc/config/rs6000/rs6000-overload.def |  40 ++
>  gcc/doc/extend.texi   |  43 +++
>  .../vec-shift-double-runnable-int128.c    | 358 ++
>  5 files changed, 456 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
> 

snip...

> 
>  [VEC_SRV, vec_srv, __builtin_vec_vsrv]
>    vuc __builtin_vec_vsrv (vuc, vuc);
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 0b572afca72..83ff168faf6 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -23504,6 +23504,10 @@ const unsigned int);
>  vector signed long long, const unsigned int);
>  @exdent vector unsigned long long vec_sldb (vector unsigned long long,
>  vector unsigned long long, const unsigned int);
> +@exdent vector signed __int128 vec_sldb (vector signed __int128,
> +vector signed __int128, const unsigned int);
> +@exdent vector unsigned __int128 vec_sldb (vector unsigned __int128,
> +vector unsigned __int128, const unsigned int);
>  @end smallexample
> 
>  Shift the combined input vectors left by the amount specified by the low-order
> @@ -23531,12 +23535,51 @@ const unsigned int);
>  vector signed long long, const unsigned int);
>  @exdent vector unsigned long long vec_srdb (vector unsigned long long,
>  vector unsigned long long, const unsigned int);
> +@exdent vector signed __int128 vec_srdb (vector signed __int128,
> +vector signed __int128, const unsigned int);
> +@exdent vector unsigned __int128 vec_srdb (vector unsigned __int128,
> +vector unsigned __int128, const unsigned int);
>  @end smallexample
> 
>  Shift the combined input vectors right by the amount specified by the low-order
>  three bits of the third argument, and return the remaining 128 bits.  Code
>  using this built-in must be endian-aware.
> 
> +@smallexample
> +@exdent vector signed __int128 vec_sld (vector signed __int128,
> +vector signed __int128, const unsigned int);
> +@exdent vector unsigned __int128 vec_sld (vector unsigned __int128,
> +vector unsigned __int128, const unsigned int);
> +@exdent vector signed __int128 vec_sldw (vector signed __int128,
> +vector signed __int128, const unsigned int);
> +@exdent vector unsigned __int128 vec_sldw (vector unsigned __int128,
> +vector unsigned __int128, const unsigned int);
> +@exdent vector signed __int128 vec_slo (vector signed __int128,
> +vector signed char);
> +@exdent vector signed __int128 vec_slo (vector signed __int128,
> +vector unsigned char);
> +@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
> +vector signed char);
> +@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
> +vector unsigned char);
> +@exdent vector signed __int128 vec_sro (vector signed __int128,
> +vector signed char);
> +@exdent vector signed __in
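To make the vec_sldb and vec_srdb wording above concrete, the following sketch models the documented semantics on scalars: the two 128-bit operands form one 256-bit value with the first operand in the high half, the value is shifted by the low-order three bits of the third argument, and 128 bits are kept (the high half for vec_sldb and the low half for vec_srdb, which is our reading of "the remaining 128 bits"). It relies on GCC's unsigned __int128 extension and treats each operand as a single number, so it deliberately sidesteps the vector element-order questions that make real code endian-aware. sldb_model and srdb_model are hypothetical names; this is not how the built-ins are implemented.

```cpp
#include <cassert>

using u128 = unsigned __int128;  // GCC/Clang extension, not standard C++

// Model of vec_sldb: (hi:lo) << (sh & 7), keeping the high 128 bits.
inline u128 sldb_model(u128 hi, u128 lo, unsigned sh)
{
  sh &= 7;          // only the low-order three bits of the shift are used
  if (sh == 0)
    return hi;      // avoids lo >> 128, which would be undefined
  return (hi << sh) | (lo >> (128 - sh));
}

// Model of vec_srdb: (hi:lo) >> (sh & 7), keeping the low 128 bits.
inline u128 srdb_model(u128 hi, u128 lo, unsigned sh)
{
  sh &= 7;
  if (sh == 0)
    return lo;      // avoids hi << 128, which would be undefined
  return (lo >> sh) | (hi << (128 - sh));
}
```

For example, shifting (0 : all-ones) left by 3 pulls the top three bits of the second operand into the bottom of the result, giving 7.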

Re: [PATCH] rs6000, add comment to VEC_IC definition

2024-07-29 Thread Kewen.Lin
Hi Carl,

on 2024/7/27 07:31, Carl Love wrote:
> GCC maintainers:
> 
> This patch adds a comment to the VEC_IC definitions to clarify the V1TI
> "TARGET_POWER10" mode per the request by Segher in the feedback to patch
> "https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658156.html".
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658156.html
> 
> Please let me know if this patch is acceptable for mainline.
> 
> Thanks.
> 
>   Carl
> 
> rs6000, add comment to VEC_IC definition
> 
> This patch adds a comment to the VEC_IC definition to clarify
> the V1TI "TARGET_POWER10" mode that was added.
> 
> gcc/ChangeLog:
>     * config/rs6000/vector.md: Add comment for the VEC_IC
>     define_mode_iterator.
> ---
>  gcc/config/rs6000/vector.md | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
> index 0d3e0a24e11..75d95ccfb47 100644
> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -26,7 +26,8 @@
>  ;; Vector int modes
>  (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
> 
> -;; Vector int modes for comparison, shift and rotation
> +;; Vector int modes for comparison, shift and rotation.  ISA 3.1 adds the V1TI mode
> +;; for the int128 type.

Maybe s/int128/vector int128/, OK with/without this nit tweaked, thanks!

BR,
Kewen

>  (define_mode_iterator VEC_IC [V16QI V8HI V4SI V2DI (V1TI "TARGET_POWER10")])
> 
>  ;; 128-bit int modes



Re: [PATCH v1 1/2] PR116080: Fix tail call dejagnu checks

2024-07-29 Thread Thomas Schwinge
Hi Andi!

I'm lacking all possible context here, but I noticed:

On 2024-07-25T15:55:01-0700, Andi Kleen  wrote:
> - Run the target_effective tail_call checks without optimization to
> match the actual test cases.

> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -12741,7 +12741,15 @@ proc check_effective_target_tail_call { } {
>  return [check_no_messages_and_pattern tail_call ",SIBCALL" rtl-expand {
>   __attribute__((__noipa__)) void foo (void) { }
>   __attribute__((__noipa__)) void bar (void) { foo(); }
> -} {-O2 -fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed dump.
> +} {-fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed dump.
> +}

> +proc check_effective_target_external_tail_call { } {
> +[...]
> +} {-fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed dump.
>  }

> @@ -12751,9 +12759,9 @@ proc check_effective_target_struct_tail_call { } {
> [...]
> -} {-O2 -fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed dump.
> +} {-fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed dump.
>  }

..., that means that a number of the new test cases are UNSUPPORTED, for
example, x86_64 GNU/Linux:

+UNSUPPORTED: c-c++-common/musttail1.c  -Wc++-compat 
+UNSUPPORTED: c-c++-common/musttail12.c  -Wc++-compat 
+PASS: c-c++-common/musttail13.c  -Wc++-compat   (test for errors, line 4)
+PASS: c-c++-common/musttail13.c  -Wc++-compat  (test for excess errors)
+UNSUPPORTED: c-c++-common/musttail2.c  -Wc++-compat 
+UNSUPPORTED: c-c++-common/musttail3.c  -Wc++-compat 
+UNSUPPORTED: c-c++-common/musttail4.c  -Wc++-compat 
+PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for errors, line 17)
+PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 10)
+PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 11)
+PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 12)
+PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 24)
+PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 25)
+PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 26)
+PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 5)
+PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 6)
+PASS: c-c++-common/musttail5.c  -Wc++-compat  (test for excess errors)
+UNSUPPORTED: c-c++-common/musttail7.c  -Wc++-compat 
+UNSUPPORTED: c-c++-common/musttail8.c  -Wc++-compat 

(Similarly for their C++ testing.)

+UNSUPPORTED: g++.dg/musttail10.C  
+UNSUPPORTED: g++.dg/musttail11.C  
+UNSUPPORTED: g++.dg/musttail6.C  
+UNSUPPORTED: g++.dg/musttail9.C  

..., and even a few existing test cases "regress" from PASS to
UNSUPPORTED:

[-PASS:-]{+UNSUPPORTED:+} gcc.dg/plugin/must-tail-call-1.c -fplugin=./must_tail_call_plugin.so[-(test for excess errors)-]
[-PASS:-]{+UNSUPPORTED:+} gcc.dg/plugin/must-tail-call-2.c -fplugin=./must_tail_call_plugin.so[-(test for errors, line 18)-]
[-PASS: gcc.dg/plugin/must-tail-call-2.c -fplugin=./must_tail_call_plugin.so  (test for errors, line 33)-]
[-PASS: gcc.dg/plugin/must-tail-call-2.c -fplugin=./must_tail_call_plugin.so  (test for errors, line 40)-]
[-PASS: gcc.dg/plugin/must-tail-call-2.c -fplugin=./must_tail_call_plugin.so  (test for errors, line 49)-]
[-PASS: gcc.dg/plugin/must-tail-call-2.c -fplugin=./must_tail_call_plugin.so  (test for errors, line 58)-]
[-PASS: gcc.dg/plugin/must-tail-call-2.c -fplugin=./must_tail_call_plugin.so (test for excess errors)-]

Similarly for ppc64le GNU/Linux.

Is that intentional?


Grüße
 Thomas


Re: [PATCH v1 0/3][libgcc] store signing key and signing method in DWARF _Unwind_FrameState

2024-07-29 Thread Matthieu Longo

On 2024-07-19 15:54, Matthieu Longo wrote:

This patch series is only a refactoring of the existing implementation of PAuth
and return-address signing. The existing behavior is preserved.

1. aarch64: store signing key and signing method in DWARF _Unwind_FrameState

_Unwind_FrameState already contains several pieces of CIE and FDE information
(see the attributes below the comment "The information we care about from the
CIE/FDE" in libgcc/unwind-dw2.h).
This patch moves the information from the DWARF CIE (the signing key, stored in
the augmentation string) and the FDE (the signing method used) into
_Unwind_FrameState, alongside the CIE and FDE information already stored there.
Note: this information has to be saved in frame_state_reg_info rather than
directly in _Unwind_FrameState, because it must be saved by
DW_CFA_remember_state and restored by DW_CFA_restore_state, both of which rely
on the "prev" attribute.
Storing this information in _Unwind_FrameState simplifies the lookup of the
signing key when the return address is demangled. It also makes it easy to add
future signing methods.
_Unwind_FrameState is not a part of the public API of libunwind, so the change
is backward compatible.

A new architecture-specific handler, MD_ARCH_EXTENSION_FRAME_INIT, allows an
architecture extension to reset values in the frame state and unwind context,
if needed, before the frame state is changed to the caller context.
A new architecture-specific handler, MD_ARCH_EXTENSION_CIE_AUG_HANDLER, isolates
the architecture-specific augmentation strings in the AArch64 backend, and
allows other architectures to reuse augmentation strings that would otherwise
clash with AArch64 DWARF extensions.
aarch64_demangle_return_addr and the DW_CFA_AARCH64_negate_ra_state and
DW_CFA_val_expression cases in libgcc/unwind-dw2-execute_cfa.h are now
documented to clarify where the value of the RA state register is stored (FS
and CONTEXT respectively).

2. libgcc: hide CIE and FDE data for DWARF architecture extensions behind a handler.

This patch provides a new handler MD_ARCH_FRAME_STATE_T to hide an 
architecture-specific structure containing CIE and FDE data related to DWARF 
architecture extensions.
Hiding the architecture-specific attributes behind a handler has the following
benefits:
 1. isolating this data from the generic data in _Unwind_FrameState;
 2. avoiding casts to custom types;
 3. preserving type information when debugging with GDB, which makes the data
    easier to print.

This approach required adding a new header, md-unwind-def.h, included at the
top of libgcc/unwind-dw2.h and redirected to the corresponding architecture
header via a symbolic link.
An obvious drawback is the added complexity from macros and headers. It also
splits the architecture definitions between md-unwind-def.h (type definitions
used in unwind-dw2.h) and md-unwind.h (local type definitions and handler
implementations).
The name md-unwind.h, with its .h extension, is a bit misleading, as the file
is only included in the middle of unwind-dw2.c. Changing this name would
require modifying other backends, which I preferred to avoid.
Overall the benefits are worth the added complexity from my perspective.
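A heavily simplified sketch of the layout described in points 1 and 2 may help when weighing that trade-off: an architecture header defines a typed struct and exposes it through MD_ARCH_FRAME_STATE_T, the generic per-row register state embeds it, and so the row copies made for DW_CFA_remember_state and DW_CFA_restore_state carry the architecture fields along automatically. Apart from MD_ARCH_FRAME_STATE_T, every name below is hypothetical; the real frame_state_reg_info and _Unwind_FrameState have many more fields and do not use new/delete.

```cpp
#include <cassert>

// --- what a hypothetical architecture md-unwind-def.h could provide ---
// A real struct type, so a debugger sees named, typed fields.
struct arch_frame_state_sketch
{
  unsigned char signing_key;     // e.g. derived from a CIE augmentation letter
  unsigned char signing_method;  // e.g. toggled by a vendor CFA opcode
};
#define MD_ARCH_FRAME_STATE_T struct arch_frame_state_sketch

// --- generic per-row state, loosely modelled on frame_state_reg_info ---
struct reg_info_sketch
{
  long cfa_offset;                 // stands in for the generic fields
#ifdef MD_ARCH_FRAME_STATE_T
  MD_ARCH_FRAME_STATE_T arch;      // hidden behind the handler macro
#endif
  reg_info_sketch* prev;           // link used by remember/restore
};

// DW_CFA_remember_state: push a copy of the current row (arch fields included).
inline reg_info_sketch* remember_state(reg_info_sketch& cur)
{
  reg_info_sketch* saved = new reg_info_sketch(cur);
  cur.prev = saved;
  return saved;
}

// DW_CFA_restore_state: pop the saved row back into the current one.
inline void restore_state(reg_info_sketch& cur)
{
  assert(cur.prev != nullptr);
  reg_info_sketch* saved = cur.prev;
  cur = *saved;                    // restores arch fields with everything else
  delete saved;
}
```

Because the embedded member has a real struct type, a debugger can print its fields by name, which is benefit 3 in the list above.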

3. libgcc: update configure (regenerated by autoreconf)

Regenerate the build files.


## Testing

These changes were tested by covering the following three cases:
- backtracing.
- exception handling in a C++ program.
- gcc/testsuite/gcc.target/aarch64/pr104689.c: pac-ret with unusual DWARF [1]

Regression tested on aarch64-unknown-linux-gnu, and no regression found.

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594414.html


Ok for master? I don't have commit access so I need someone to commit on my 
behalf.

Regards,
Matthieu.


Matthieu Longo (3):
   aarch64: store signing key and signing method in DWARF
 _Unwind_FrameState
   libgcc: hide CIE and FDE data for DWARF architecture extensions behind
 a handler.
   libgcc: update configure (regenerated by autoreconf)

  libgcc/Makefile.in |   6 +-
  libgcc/config.host |  13 +-
  libgcc/config/aarch64/aarch64-unwind-def.h |  41 ++
  libgcc/config/aarch64/aarch64-unwind.h | 150 +
  libgcc/configure   |   2 +
  libgcc/configure.ac|   1 +
  libgcc/unwind-dw2-execute_cfa.h|  34 +++--
  libgcc/unwind-dw2.c|  19 ++-
  libgcc/unwind-dw2.h|  19 ++-
  9 files changed, 233 insertions(+), 52 deletions(-)
  create mode 100644 libgcc/config/aarch64/aarch64-unwind-def.h



Adding Ian in CC, as he is listed as the maintainer of libgcc in the
MAINTAINERS file.


Ping: [PATCH] recog: Disallow subregs in mode-punned value [PR115881]

2024-07-29 Thread Richard Sandiford
Ping

Richard Sandiford  writes:
> In g:9d20529d94b23275885f380d155fe8671ab5353a, I'd extended
> insn_propagation to handle simple cases of hard-reg mode punning.
> The punned "to" value was created using simplify_subreg rather
> than simplify_gen_subreg, on the basis that hard-coded subregs
> aren't generally useful after RA (where hard-reg propagation is
> expected to happen).
>
> This PR is about a case where the subreg gets pushed into the
> operands of a plus, but the subreg on one of the operands
> cannot be simplified.  Specifically, we have to generate
> (subreg:SI (reg:DI sp) 0) rather than (reg:SI sp), since all
> references to the stack pointer must be via stack_pointer_rtx.
>
> However, code in x86 (reasonably) expects no subregs of registers
> to appear after RA, except for special cases like strict_low_part.
> This leads to an awkward situation where we can't ban subregs of sp
> (because of the strict_low_part use), can't allow direct references
> to sp in other modes (because of the stack_pointer_rtx requirement),
> and can't allow rvalue uses of the subreg (because of the "no subregs
> after RA" assumption).  It all seems a bit of a mess...
>
> I sat on this for a while in the hope that a clean solution might
> become apparent, but in the end, I think we'll just have to check
> manually for nested subregs and punt on them.
>
> Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?
>
> Richard

gcc/
PR rtl-optimization/115881
* recog.cc: Include rtl-iter.h.
(insn_propagation::apply_to_rvalue_1): Check that the result
of simplify_subreg does not include nested subregs.

gcc/testsuite/
PR rtl-optimization/115881
* gcc.c-torture/compile/pr115881.c: New test.
---
 gcc/recog.cc  | 21 +++
 .../gcc.c-torture/compile/pr115881.c  | 16 ++
 2 files changed, 37 insertions(+)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr115881.c

diff --git a/gcc/recog.cc b/gcc/recog.cc
index 54b317126c2..23e4820180f 100644
--- a/gcc/recog.cc
+++ b/gcc/recog.cc
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "reload.h"
 #include "tree-pass.h"
 #include "function-abi.h"
+#include "rtl-iter.h"
 
 #ifndef STACK_POP_CODE
 #if STACK_GROWS_DOWNWARD
@@ -1082,6 +1083,7 @@ insn_propagation::apply_to_rvalue_1 (rtx *loc)
  || !REG_CAN_CHANGE_MODE_P (REGNO (x), GET_MODE (from),
 GET_MODE (x)))
return false;
+
  /* If the reference is paradoxical and the replacement
 value contains registers, we would need to check that the
 simplification below does not increase REG_NREGS for those
@@ -1090,11 +1092,30 @@ insn_propagation::apply_to_rvalue_1 (rtx *loc)
  if (paradoxical_subreg_p (GET_MODE (x), GET_MODE (from))
  && !CONSTANT_P (to))
return false;
+
  newval = simplify_subreg (GET_MODE (x), to, GET_MODE (from),
subreg_lowpart_offset (GET_MODE (x),
   GET_MODE (from)));
  if (!newval)
return false;
+
+ /* Check that the simplification didn't just push an explicit
+subreg down into subexpressions.  In particular, for a register
+R that has a fixed mode, such as the stack pointer, a subreg of:
+
+  (plus:M (reg:M R) (const_int C))
+
+would be:
+
+  (plus:N (subreg:N (reg:M R) ...) (const_int C'))
+
+But targets can legitimately assume that subregs of hard registers
+will not be created after RA (except in special circumstances,
+such as strict_low_part).  */
+ subrtx_iterator::array_type array;
+ FOR_EACH_SUBRTX (iter, array, newval, NONCONST)
+   if (GET_CODE (*iter) == SUBREG)
+ return false;
}
 
   if (should_unshare)
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115881.c 
b/gcc/testsuite/gcc.c-torture/compile/pr115881.c
new file mode 100644
index 000..8379704c4c8
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr115881.c
@@ -0,0 +1,16 @@
+typedef unsigned u32;
+int list_is_head();
+void tu102_acr_wpr_build_acr_0_0_0(int, long, u32);
+void tu102_acr_wpr_build() {
+  u32 offset = 0;
+  for (; list_is_head();) {
+    int hdr;
+    u32 _addr = offset, _size = sizeof(hdr), *_data = &hdr;
+    while (_size--) {
+      tu102_acr_wpr_build_acr_0_0_0(0, _addr, *_data++);
+      _addr += 4;
+    }
+    offset += sizeof(hdr);
+  }
+  tu102_acr_wpr_build_acr_0_0_0(0, offset, 0);
+}
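The core of the recog.cc change is a full walk over the simplified value that gives up as soon as a SUBREG appears anywhere in it, including at the top level. The toy model below shows that check with hypothetical types; it is not GCC's rtx representation, and GCC's FOR_EACH_SUBRTX iterator is more elaborate than this explicit worklist.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for an RTL code; GCC's rtx_code has far more values.
enum class code { reg, subreg, plus, const_int };

// Toy expression node: a code plus its operands (empty for leaves).
struct expr
{
  code c;
  std::vector<const expr*> ops;
};

// Visit X and all of its subexpressions; return true if any is a SUBREG.
inline bool contains_subreg(const expr* x)
{
  assert(x != nullptr);
  std::vector<const expr*> worklist{x};
  while (!worklist.empty())
    {
      const expr* e = worklist.back();
      worklist.pop_back();
      if (e->c == code::subreg)
        return true;
      for (const expr* op : e->ops)
        worklist.push_back(op);
    }
  return false;
}
```

For the stack-pointer case described in the comment, (plus (reg sp) (const_int C)) passes, while (plus (subreg (reg sp)) (const_int C')) is rejected, so the propagation is abandoned.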


Re: [PATCHv2 2/2] libiberty/buildargv: handle input consisting of only white space

2024-07-29 Thread Thomas Schwinge
Hi!

On 2024-02-10T17:26:01+, Andrew Burgess  wrote:
> --- a/libiberty/argv.c
> +++ b/libiberty/argv.c

> @@ -439,17 +442,8 @@ expandargv (int *argcp, char ***argvp)
>   }
>/* Add a NUL terminator.  */
>buffer[len] = '\0';
> -  /* If the file is empty or contains only whitespace, buildargv would
> -  return a single empty argument.  In this context we want no arguments,
> -  instead.  */
> -  if (only_whitespace (buffer))
> - {
> -   file_argv = (char **) xmalloc (sizeof (char *));
> -   file_argv[0] = NULL;
> - }
> -  else
> - /* Parse the string.  */
> - file_argv = buildargv (buffer);
> +  /* Parse the string.  */
> +  file_argv = buildargv (buffer);
>/* If *ARGVP is not already dynamically allocated, copy it.  */
>if (*argvp == original_argv)
>   *argvp = dupargv (*argvp);

With that (single) use of 'only_whitespace' now gone:

[...]/source-gcc/libiberty/argv.c:128:1: warning: ‘only_whitespace’ defined 
but not used [-Wunused-function]
  128 | only_whitespace (const char* input)
  | ^~~


Grüße
 Thomas


Re: [RFC v1 0/2] c: Add _Lengthof operator

2024-07-29 Thread Joseph Myers
On Sun, 28 Jul 2024, Alejandro Colomar wrote:

>  gcc/Makefile.in   |  1 +
>  gcc/c-family/c-common.cc  | 20 +
>  gcc/c-family/c-common.def |  4 ++
>  gcc/c-family/c-common.h   |  2 +
>  gcc/c/c-parser.cc | 35 +++
>  gcc/c/c-tree.h|  4 ++
>  gcc/c/c-typeck.cc | 84 +++
>  gcc/cp/cp-tree.h  |  1 -
>  gcc/cp/operators.def  |  1 +
>  gcc/cp/tree.cc| 13 --
>  gcc/ginclude/stdlength.h  | 35 +++
>  gcc/rust/backend/rust-tree.cc | 13 --
>  gcc/rust/backend/rust-tree.h  |  2 -
>  gcc/target.h  |  3 ++
>  gcc/tree.cc   | 13 ++
>  gcc/tree.h|  1 +

Please start with documentation and testcases, neither of which are 
included here - making sure that both documentation and testcases cover 
all the error cases and questions of e.g. evaluation of VLA operands.  
Documentation and testcases are the most important pieces for reviewing a 
proposed addition of a new language feature, before the actual 
implementation.

A relevant semantic question to answer here: sizeof evaluates all VLA 
operands, should this operator do likewise, or should it only evaluate 
when the toplevel array is of variable length (but not for a 
constant-length array of variable-size elements)?

-- 
Joseph S. Myers
josmy...@redhat.com



Re: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS

2024-07-29 Thread Richard Biener
On Mon, 29 Jul 2024, Prathamesh Kulkarni wrote:

> Hi Richard,
> Thanks for your suggestions on RFC email, the attached patch adds support for
> streaming of poly_int when its degree <= accel's NUM_POLY_INT_COEFFS.
> The patch changes streaming of poly_int as follows:
> 
> Streaming out poly_int:
> 
> degree = poly_int.degree();
> stream out degree;
> for (i = 0; i < degree; i++)
>   stream out poly_int.coeffs[i];
> 
> Streaming in poly_int:
> 
> stream in degree;
> if (degree > NUM_POLY_INT_COEFFS)
>   fatal_error();
> stream in coeffs;
> // Set remaining coeffs to zero in case degree < accel's NUM_POLY_INT_COEFFS
> for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
>   poly_int.coeffs[i] = 0;
> 
> Patch passes bootstrap+test and LTO bootstrap+test on aarch64-linux-gnu.
> LTO bootstrap+test on x86_64-linux-gnu in progress.
> 
> I am not quite sure how to test it for offloading since currently it's 
> (entirely) broken for aarch64->nvptx.
> I can give a try with x86_64->nvptx offloading if required (altho I guess LTO 
> bootstrap should test streaming changes ?)
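The quoted scheme can be modelled in a few lines. In this sketch a std::vector stands in for the LTO stream and std::runtime_error for fatal_error, and degree() is assumed to be one plus the index of the last non-zero coefficient (at least 1); that definition is an assumption, since poly_int::degree itself is not shown in this message. With it, a host with two coefficients can stream a value whose second coefficient is zero to an accelerator built with NUM_POLY_INT_COEFFS == 1, a genuinely variable value is rejected, and a reader with more coefficients zero-fills the rest.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Coefficients of a toy poly_int with N terms.
template <std::size_t N>
using coeffs_t = std::array<std::int64_t, N>;

// Assumed meaning of "degree": 1 + index of the last non-zero coefficient,
// with a minimum of 1 so that constants still stream one coefficient.
template <std::size_t N>
unsigned degree(const coeffs_t<N>& c)
{
  unsigned d = static_cast<unsigned>(N);
  while (d > 1 && c[d - 1] == 0)
    --d;
  return d;
}

// Stream out: the degree first, then only that many coefficients.
template <std::size_t N>
void write_poly(std::vector<std::int64_t>& out, const coeffs_t<N>& c)
{
  const unsigned d = degree(c);
  assert(d >= 1 && d <= N);
  out.push_back(d);
  for (unsigned i = 0; i < d; ++i)
    out.push_back(c[i]);
}

// Stream in on a compiler with N coefficients (the accelerator side).
template <std::size_t N>
coeffs_t<N> read_poly(const std::vector<std::int64_t>& in, std::size_t& pos)
{
  const auto d = static_cast<std::size_t>(in.at(pos++));
  if (d > N)  // stands in for fatal_error
    throw std::runtime_error("poly_int degree exceeds NUM_POLY_INT_COEFFS");
  coeffs_t<N> c{};  // value-initialized: coefficients not streamed stay zero
  for (std::size_t i = 0; i < d; ++i)
    c[i] = in.at(pos++);
  return c;
}
```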

+  unsigned degree
+    = bp_unpack_value (bp, BITS_PER_UNIT * sizeof (unsigned HOST_WIDE_INT));

The NUM_POLY_INT_COEFFS target define doesn't seem to be constrained
to any type it needs to fit into, using HOST_WIDE_INT is arbitrary.
I'd say we should constrain it to a reasonable upper bound,
like 2?  Maybe even have MAX_NUM_POLY_INT_COEFFS or 
NUM_POLY_INT_COEFFS_BITS in poly-int.h and constrain NUM_POLY_INT_COEFFS.
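One way to read this suggestion: give the degree a small, fixed bit width and check at build time that every legal degree fits, instead of spending BITS_PER_UNIT * sizeof (unsigned HOST_WIDE_INT) bits on it. The sketch below uses hypothetical constants, and its packer only mimics the role of bitpack_d; none of these names exist in GCC.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants in the spirit of the suggestion; not GCC macros.
constexpr unsigned MAX_NUM_POLY_INT_COEFFS_SKETCH = 2;
constexpr unsigned NUM_POLY_INT_COEFFS_BITS_SKETCH = 2;

// Every legal degree (1 .. MAX) must fit in the fixed-width field.
static_assert(MAX_NUM_POLY_INT_COEFFS_SKETCH
              < (1u << NUM_POLY_INT_COEFFS_BITS_SKETCH),
              "degree field too narrow for the maximum degree");

// Minimal 64-bit bit packer standing in for bitpack_d; values are read back
// in the order they were packed.
struct bitpack_sketch
{
  std::uint64_t word = 0;
  unsigned wpos = 0;   // bits written so far
  unsigned rpos = 0;   // bits read so far

  void pack(std::uint64_t val, unsigned nbits)
  {
    assert(nbits > 0 && nbits < 64 && wpos + nbits <= 64);
    assert(val < (std::uint64_t{1} << nbits));  // value must fit the field
    word |= val << wpos;
    wpos += nbits;
  }

  std::uint64_t unpack(unsigned nbits)
  {
    assert(nbits > 0 && nbits < 64 && rpos + nbits <= wpos);
    std::uint64_t val = (word >> rpos) & ((std::uint64_t{1} << nbits) - 1);
    rpos += nbits;
    return val;
  }
};
```

With a two-bit field, the degree costs two bits per poly_int, and raising the maximum beyond three would fail the static_assert at build time rather than silently truncating.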

The patch looks reasonable over all, but Richard S. should have a say
about the abstraction you chose and the poly-int adjustment.

Thanks,
Richard.


> Signed-off-by: Prathamesh Kulkarni 
> 
> Thanks,
> Prathamesh
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Looking for someone to do paid work on SH backend

2024-07-29 Thread John Paul Adrian Glaubitz
Hello,

the SH backend in GCC is currently in rather poor shape and could use an
overhaul. For that reason, I'm looking for an experienced GCC developer
who might be willing to fix some bugs in the backend for money.

I have started similar efforts in the past for the avr, m68k and vax backends,
back then using the platform Bountysource in order to convert those backends
from cc0 to MODE_CC with great success. Thanks to this work, all of these
backends were saved from removal from GCC.

Unfortunately, Bountysource went bankrupt, so we can't just put up bounties
there again to work on GCC. However, there are plenty of alternatives such as
OpenCollective, Patreon and Algora.io, and probably more.

There is also a larger homebrew community around the SuperH architecture, mainly
due to the Sega Dreamcast console. Popular projects are KallistiOS [1] and
various game ports to the Dreamcast such as GTA3, Counterstrike and many more [2].

Chances are therefore not bad that we would be able to collect some funds to
motivate an experienced GCC developer to work on the SH backend. There is already
an ongoing discussion in one of the Dreamcast forums on this topic, and several
people have already said they'd be willing to support the effort [3].

Thus, I'm wondering now whether there is any GCC developer out there who would be
willing to fix some of the bugs in the SH backend for money. There are currently
194 bugs that can be associated with the SH backend [4].

While I don't expect it to be realistic to get all of these bugs fixed, the most
important bug to be fixed would be PR55212 [5], which concerns switching the
backend to the new register allocator LRA. Switching the backend to LRA would
already be a huge step forward, as register allocation failures, such as the
one reported in PR81426 [6], are among the most common failures we're seeing
downstream in distributions.

Would there be anyone willing to work on switching the SH backend to LRA while
being paid for it? And what would be their payment platform of choice?

Thanks,
Adrian

> [1] https://github.com/KallistiOS/KallistiOS
> [2] https://en.wikipedia.org/wiki/List_of_Dreamcast_homebrew_games
> [3] https://dcemulation.org/phpBB/viewtopic.php?t=106838
> [4] https://gcc.gnu.org/bugzilla/buglist.cgi?bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=SUSPENDED&bug_status=WAITING&bug_status=REOPENED&bug_status=VERIFIED&cf_known_to_fail_type=allwords&cf_known_to_work_type=allwords&f1=cf_gcctarget&f2=commenter&f3=reporter&j_top=OR&list_id=436831&o1=regexp&o2=allwordssubstr&o3=allwordssubstr&order=Last%20Changed&query_format=advanced&v1=sh.%2A
> [5] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212
> [6] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81426

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Re: [PATCH] doc: Improve punctuation and grammar in -fdiagnostics-format docs

2024-07-29 Thread Jonathan Wakely

On 15/03/24 13:02 +, Jonathan Wakely wrote:

OK for trunk?


Ping


-- >8 --

The hyphen can be misunderstood to mean "emitted to -" i.e. stdout.
Refer to both forms by name, rather than using "the former" for one and
referring to the other by name.

gcc/ChangeLog:

* doc/invoke.texi (Diagnostic Message Formatting Options):
Replace hyphen with a new sentence. Replace "the former" with
the actual value.
---
gcc/doc/invoke.texi | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 85c938d4a14..d850b5fcdcc 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -5737,8 +5737,9 @@ named @file{@var{source}.sarif}, respectively.

The @samp{json} format is a synonym for @samp{json-stderr}.
The @samp{json-stderr} and @samp{json-file} formats are identical, apart from
-where the JSON is emitted to - with the former, the JSON is emitted to stderr,
-whereas with @samp{json-file} it is written to @file{@var{source}.gcc.json}.
+where the JSON is emitted to.  With @samp{json-stderr}, the JSON is emitted
+to stderr, whereas with @samp{json-file} it is written to
+@file{@var{source}.gcc.json}.

The emitted JSON consists of a top-level JSON array containing JSON objects
representing the diagnostics.




[Patch] libgomp.texi: Update 'Device Information Routines' section

2024-07-29 Thread Tobias Burnus

I recently stumbled over omp_get_default_device returning -1 (= omp_initial_device)
vs. returning omp_get_num_devices(). Thus, it makes sense to document this properly.
I also updated some wording and took a tiny step toward documenting the missing
functions by adding a title to the commented-out @menu items.

→ https://gcc.gnu.org/onlinedocs/libgomp/#toc-OpenMP-Runtime-Library-Routines
for the current wording.

Comments or suggestions before I commit it?

Tobias
libgomp.texi: Update 'Device Information Routines' section

Update 'OpenMP Runtime Library Routines' by adding a note that invoking
some of these routines inside a target region has unspecified effects.
Additionally, update omp_{get,set}_default_device for the
omp_{initial,invalid}_device named constants.
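Because the host can be denoted by either value, a caller that only wants to know whether the default device is the host has to accept both. The following is a minimal sketch, not part of libgomp: `default_device_is_host` and `SKETCH_INITIAL_DEVICE` are hypothetical names, and the -1 value of omp_initial_device is taken from the text above. The wrapper that calls the real omp_get_default_device and omp_get_initial_device routines is only compiled when OpenMP is enabled (`_OPENMP` defined).

```c
#include <stdbool.h>

#ifdef _OPENMP
#include <omp.h>
#endif

/* Value of the omp_initial_device named constant (-1); defined locally so
   the sketch also builds without <omp.h>.  Hypothetical name.  */
#define SKETCH_INITIAL_DEVICE (-1)

/* Hypothetical helper: return true if DEV, a value as returned by
   omp_get_default_device, denotes the host.  INITIAL_DEV is the value
   returned by omp_get_initial_device, i.e. omp_get_num_devices ().  */
bool
default_device_is_host (int dev, int initial_dev)
{
  return dev == SKETCH_INITIAL_DEVICE || dev == initial_dev;
}

#ifdef _OPENMP
/* Query the actual default-device-var ICV.  Call this on the host only:
   calling omp_get_default_device in a target region is unspecified.  */
bool
current_default_device_is_host (void)
{
  return default_device_is_host (omp_get_default_device (),
				 omp_get_initial_device ());
}
#endif
```

With two non-host devices, omp_get_initial_device () returns 2, so both -1 and 2 are treated as the host while 0 is not.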

libgomp/ChangeLog:

	* libgomp.texi (OpenMP Runtime Library Routines): Add missing
	titles to some commented-out, still undocumented items.
	(Device Information Routines): Update.

 libgomp/libgomp.texi | 48 +---
 1 file changed, 33 insertions(+), 15 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 50da248b74d..8fe74d58562 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -1208,11 +1208,11 @@ They have C linkage and do not throw exceptions.
 
 @menu
 * omp_get_proc_bind::   Whether threads may be moved between CPUs
-@c * omp_get_num_places:: 
-@c * omp_get_place_num_procs:: 
-@c * omp_get_place_proc_ids:: 
-@c * omp_get_place_num:: 
-@c * omp_get_partition_num_places:: 
+@c * omp_get_num_places::   Get the number of places available
+@c * omp_get_place_num_procs::  Get the number of processors associated with a place
+@c * omp_get_place_proc_ids::   Get the IDs of the processors associated with a place
+@c * omp_get_place_num::Get place number of the associated task
+@c * omp_get_partition_num_places:: Get number of places of innermost task
 @c * omp_get_partition_place_nums:: 
 @c * omp_set_affinity_format:: 
 @c * omp_get_affinity_format:: 
@@ -1627,8 +1627,12 @@ Returns the number of processors online on that device.
 @subsection @code{omp_set_default_device} -- Set the default device for target regions
 @table @asis
 @item @emph{Description}:
-Set the default device for target regions without device clause.  The argument
-shall be a nonnegative device number.
+Set the value of the @emph{default-device-var} ICV, which is used
+for target regions without device clause.  The argument
+shall be a nonnegative device number, @code{omp_initial_device},
+or @code{omp_invalid_device}.
+
+The effect of running this routine in a @code{target} region is unspecified.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
@@ -1654,7 +1658,15 @@ shall be a nonnegative device number.
 @subsection @code{omp_get_default_device} -- Get the default device for target regions
 @table @asis
 @item @emph{Description}:
-Get the default device for target regions without device clause.
+Get the value of the @emph{default-device-var} ICV, which is used
+for target regions without device clause. The value is either a
+nonnegative device number, @code{omp_initial_device} or
+@code{omp_invalid_device}. Note that for the host, the ICV can have two values
+and, hence, this routine might return either the value of the named constant
+@code{omp_initial_device} or the value returned by the
+@code{omp_get_initial_device} routine.
+
+The effect of running this routine in a @code{target} region is unspecified.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
@@ -1667,7 +1679,8 @@ Get the default device for target regions without device clause.
 @end multitable
 
 @item @emph{See also}:
-@ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device}
+@ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device},
+@ref{omp_get_initial_device}
 
 @item @emph{Reference}:
 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.30.
@@ -1681,6 +1694,8 @@ Get the default device for target regions without device clause.
 @item @emph{Description}:
 Returns the number of target devices.
 
+The effect of running this routine in a @code{target} region is unspecified.
+
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
 @item @emph{Prototype}: @tab @code{int omp_get_num_devices(void);}
@@ -1702,9 +1717,9 @@ Returns the number of target devices.
 @table @asis
 @item @emph{Description}:
 This function returns a device number that represents the device that the
-current thread is executing on. For OpenMP 5.0, this must be equal to the
-value returned by the @code{omp_get_initial_device} function when called
-from the host.
+current thread is executing on. When called on the host, it returns
+the same value as returned by the @code{omp_get_initial_device} function
+as required since OpenMP 5.0.
 
 @item @emph{C/C++}
 @multitable @columnfractions .20 .80
@@ -1754,9 +1769,11 @@ their language-specific counterparts.
 @table @asis
 @item @emph{Description}:
 This function returns a device number that rep

Re: [PATCH v3 2/3] aarch64: Add support for moving fpm system register

2024-07-29 Thread Richard Sandiford
Claudio Bantaloukas  writes:
> Unlike most system registers, fpmr can be heavily written to in code that
> exercises the fp8 functionality. That is because every fp8 intrinsic call
> can potentially change the value of fpmr.
> Rather than just use a an unspec, we treat the fpmr system register like

Typo: s/a an/an/

> all other registers and use a move operation to read and write to it.
>
> We introduce a new class of moveable system registers that, currently,
> only accepts fpmr and a new constraint, Umv, that allows us to
> selectively use mrs and msr instructions when expanding rtl for them.
> Given that there is code that depends on "real" registers coming before
> "fake" ones, we introduce a new constant FPM_REGNUM that uses an
> existing value and renumber registers below that.
> This requires us to update the bitmaps that describe which registers
> belong to each register class.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc (aarch64_hard_regno_nregs): Add
>   support for MOVEABLE_SYSREGS class.
>   (aarch64_hard_regno_mode_ok): Allow reads and writes to fpmr.
>   (aarch64_regno_regclass): Support MOVEABLE_SYSREGS class.
>   (aarch64_class_max_nregs): Likewise.
>   * config/aarch64/aarch64.h (FIXED_REGISTERS): add fpmr.
>   (CALL_REALLY_USED_REGISTERS): Likewise.
>   (REGISTER_NAMES): Likewise.
>   (enum reg_class): Add MOVEABLE_SYSREGS class.
>   (REG_CLASS_NAMES): Likewise.
>   (REG_CLASS_CONTENTS): Update class bitmaps to deal with fpmr,
>   the new MOVEABLE_REGS class and renumbering of registers.
>   * config/aarch64/aarch64.md: (FPM_REGNUM): added new register
>   number, reusing old value.
>   (FFR_REGNUM): Renumber.
>   (FFRT_REGNUM): Likewise.
>   (LOWERING_REGNUM): Likewise.
>   (TPIDR2_BLOCK_REGNUM): Likewise.
>   (SME_STATE_REGNUM): Likewise.
>   (TPIDR2_SETUP_REGNUM): Likewise.
>   (ZA_FREE_REGNUM): Likewise.
>   (ZA_SAVED_REGNUM): Likewise.
>   (ZA_REGNUM): Likewise.
>   (ZT0_REGNUM): Likewise.
>   (*mov_aarch64): Add support for moveable sysregs.
>   (*movsi_aarch64): Likewise.
>   (*movdi_aarch64): Likewise.
>   * config/aarch64/constraints.md (MOVEABLE_SYSREGS): New constraint.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/acle/fp8.c: New tests.
> [...]
> @@ -1405,6 +1409,8 @@ (define_insn "*mov_aarch64"
>   [w, r Z  ; neon_from_gp, nosimd ] fmov\t%s0, %w1
>   [w, w; neon_dup   , simd   ] dup\t%0, %1.[0]
>   [w, w; neon_dup   , nosimd ] fmov\t%s0, %s1
> + [Umv, r  ; mrs, *  ] msr\t%0, %x1
> + [r, Umv  ; mrs, *  ] mrs\t%x0, %1
>}
>  )
>  
> @@ -1467,6 +1473,8 @@ (define_insn_and_split "*movsi_aarch64"
>   [r  , w  ; f_mrc, fp  , 4] fmov\t%w0, %s1
>   [w  , w  ; fmov , fp  , 4] fmov\t%s0, %s1
>   [w  , Ds ; neon_move, simd, 4] << aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);
> + [Umv, r  ; mrs  , *   , 8] msr\t%0, %x1
> + [r, Umv  ; mrs  , *   , 8] mrs\t%x0, %1

The lengths should be 4 rather than 8.

>}
>"CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), SImode)
>  && REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
> @@ -1505,6 +1513,8 @@ (define_insn_and_split "*movdi_aarch64"
>   [w, w  ; fmov , fp  , 4] fmov\t%d0, %d1
>   [w, Dd ; neon_move, simd, 4] << aarch64_output_scalar_simd_mov_immediate (operands[1], DImode);
>   [w, Dx ; neon_move, simd, 8] #
> + [Umv, r; mrs  , *   , 8] msr\t%0, %1
> + [r, Umv; mrs  , *   , 8] mrs\t%0, %1

Similarly here.

>}
>"CONST_INT_P (operands[1])
> && REG_P (operands[0])
> [...]
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/fp8.c b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
> index 459442be155..1a5c3d7e8fd 100644
> --- a/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
> @@ -1,6 +1,7 @@
>  /* Test the fp8 ACLE intrinsics family.  */
>  /* { dg-do compile } */
>  /* { dg-options "-O1 -march=armv8-a" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
>  
>  #include 
>  
> @@ -17,4 +18,107 @@
>  #error "__ARM_FEATURE_FP8 feature macro defined."
>  #endif
>  
> +/*
> +**test_write_fpmr_sysreg_asm_64:
> +**   msr fpmr, x0
> +**   ret
> +*/
> +void
> +test_write_fpmr_sysreg_asm_64 (uint64_t val)
> +{
> +  register uint64_t fpmr asm ("fpmr") = val;
> +  asm volatile ("" ::"Umv"(fpmr));
> +}
> +
> +/*
> +**test_write_fpmr_sysreg_asm_32:
> +**   uxtwx0, w0
> +**   msr fpmr, x0
> +**   ret
> +*/
> +void
> +test_write_fpmr_sysreg_asm_32 (uint32_t val)
> +{
> +  register uint64_t fpmr asm ("fpmr") = val;

By using uint64_t rather than uint32_t, these tests are testing movdi
rather than the smaller move patterns.  I think it should be uint32_t
instead.  We should then have just an MSR, without an extension.

Si

[PATCH 1/3] Add TARGET_MODE_CAN_TRANSFER_BITS

2024-07-29 Thread Richard Biener
The following adds a target hook to specify whether registers of MODE can be
used to transfer bits.  The hook is intended to be used by value numbering,
to decide whether a value loaded in such a mode can be punned to another
mode instead of re-loading the value in the other mode, and by SRA, to
decide whether MODE is suitable as a container holding a value to be
used in different modes.
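The default behaviour of the proposed mode_can_transfer_bits wrapper can be modelled outside GCC. This is only an illustrative sketch. `toy_mode`, `toy_hook`, `toy_mode_can_transfer_bits` and `toy_x87_like_hook` are hypothetical stand-ins for machine_mode, the target hook, the wrapper and an x87-style hook that rejects float modes. The 128-bit size and 80-bit precision used for an extended-precision mode in the test are an assumption chosen to show the padding case.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GCC machine mode.  */
struct toy_mode
{
  bool is_blk;         /* models mode == BLKmode */
  bool is_float;       /* models SCALAR_FLOAT_MODE_P (mode) */
  unsigned bitsize;    /* models GET_MODE_BITSIZE (mode) */
  unsigned precision;  /* models GET_MODE_PRECISION (mode) */
};

/* Hypothetical stand-in for the optional target hook; NULL if undefined.  */
typedef bool (*toy_hook) (const struct toy_mode *);

/* Same order of checks as the wrapper in the patch: BLKmode is always fine,
   modes with padding (bitsize != precision) never are, and otherwise the
   target hook decides, defaulting to true.  */
bool
toy_mode_can_transfer_bits (const struct toy_mode *m, toy_hook hook)
{
  if (m->is_blk)
    return true;
  if (m->bitsize != m->precision)
    return false;
  if (hook)
    return hook (m);
  return true;
}

/* Hypothetical hook for a target doing FP math on x87: float loads may
   alter the bits (e.g. quiet a signaling NaN), so reject float modes.  */
bool
toy_x87_like_hook (const struct toy_mode *m)
{
  return !m->is_float;
}
```

Without a hook, only the padded extended mode is rejected; with the x87-like hook, 32-bit float modes are rejected too while integer modes remain usable.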

* target.def (mode_can_transfer_bits): New target hook.
* target.h (mode_can_transfer_bits): New function wrapping the
hook and providing default behavior.
* doc/tm.texi.in: Update.
* doc/tm.texi: Re-generate.
---
 gcc/doc/tm.texi|  6 ++
 gcc/doc/tm.texi.in |  2 ++
 gcc/target.def |  8 
 gcc/target.h   | 15 +++
 4 files changed, 31 insertions(+)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c7535d07f4d..fa53c23f1de 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -4545,6 +4545,12 @@ is either a declaration of type int or accessed by dereferencing
 a pointer to int.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_MODE_CAN_TRANSFER_BITS (machine_mode @var{mode})
+Define this to return false if the mode @var{mode} cannot be used
+for memory copying.  The default is to assume modes with the same
+precision as size are fine to be used.
+@end deftypefn
+
 @deftypefn {Target Hook} machine_mode TARGET_TRANSLATE_MODE_ATTRIBUTE (machine_mode @var{mode})
 Define this hook if during mode attribute processing, the port should
 translate machine_mode @var{mode} to another mode.  For example, rs6000's
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 64cea3b1eda..8af3f414505 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3455,6 +3455,8 @@ stack.
 
 @hook TARGET_REF_MAY_ALIAS_ERRNO
 
+@hook TARGET_MODE_CAN_TRANSFER_BITS
+
 @hook TARGET_TRANSLATE_MODE_ATTRIBUTE
 
 @hook TARGET_SCALAR_MODE_SUPPORTED_P
diff --git a/gcc/target.def b/gcc/target.def
index 3de1aad4c84..4356ef2f974 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3363,6 +3363,14 @@ a pointer to int.",
  bool, (ao_ref *ref),
  default_ref_may_alias_errno)
 
+DEFHOOK
+(mode_can_transfer_bits,
+ "Define this to return false if the mode @var{mode} cannot be used\n\
+for memory copying.  The default is to assume modes with the same\n\
+precision as size are fine to be used.",
+ bool, (machine_mode mode),
+ NULL)
+
 /* Support for named address spaces.  */
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_ADDR_SPACE_"
diff --git a/gcc/target.h b/gcc/target.h
index c1f99b97b86..c888ad39897 100644
--- a/gcc/target.h
+++ b/gcc/target.h
@@ -312,6 +312,21 @@ estimated_poly_value (poly_int64 x,
 return targetm.estimated_poly_value (x, kind);
 }
 
+/* Return true when MODE can be used to copy GET_MODE_BITSIZE bits
+   unchanged.  */
+
+inline bool
+mode_can_transfer_bits (machine_mode mode)
+{
+  if (mode == BLKmode)
+return true;
+  if (maybe_ne (GET_MODE_BITSIZE (mode), GET_MODE_PRECISION (mode)))
+return false;
+  if (targetm.mode_can_transfer_bits)
+return targetm.mode_can_transfer_bits (mode);
+  return true;
+}
+
 #ifdef GCC_TM_H
 
 #ifndef CUMULATIVE_ARGS_MAGIC
-- 
2.35.3



[PATCH 2/3] [x86] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-29 Thread Richard Biener
The following implements the hook, excluding x87 modes.

* i386.cc (TARGET_MODE_CAN_TRANSFER_BITS): Define.
(ix86_mode_can_transfer_bits): New function.
---
 gcc/config/i386/i386.cc | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 12d15feb5e9..584417992a0 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -26113,6 +26113,14 @@ ix86_have_ccmp ()
   return (bool) TARGET_APX_CCMP;
 }
 
+/* Implement TARGET_MODE_CAN_TRANSFER_BITS.  */
+static bool
+ix86_mode_can_transfer_bits (machine_mode mode)
+{
+  return (!SCALAR_FLOAT_MODE_P (mode)
+ || (TARGET_SSE_MATH && !TARGET_MIX_SSE_I387 && mode != XFmode));
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -26959,6 +26967,9 @@ ix86_libgcc_floating_mode_supported_p
 #undef TARGET_HAVE_CCMP
 #define TARGET_HAVE_CCMP ix86_have_ccmp
 
+#undef TARGET_MODE_CAN_TRANSFER_BITS
+#define TARGET_MODE_CAN_TRANSFER_BITS ix86_mode_can_transfer_bits
+
 static bool
 ix86_libc_has_fast_function (int fcode ATTRIBUTE_UNUSED)
 {
-- 
2.35.3



[PATCH 3/3] tree-optimization/114659 - VN and FP to int punning

2024-07-29 Thread Richard Biener
The following addresses another case where x87 FP loads mangle the
bit representation and thus are not suitable as a representative
in other types.  VN was value-numbering a later integer load of 'x'
as the same as an earlier float load of 'x'.

We can use the new TARGET_MODE_CAN_TRANSFER_BITS hook to identify
problematic modes and enforce strict compatibility for those in
the reference comparison, improving the handling of modes with
padding in visit_reference_op_load.
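The problematic shape is a float load followed by an integer load of the same object, as in my_totalorderf in the testcase below. The reduced sketch here uses a hypothetical function name, `float_then_int_load`, and assumes a 32-bit IEEE float. Because memcpy copies the object representation, the integer result must reproduce the stored bits exactly, including those of a signaling NaN. Reusing an x87 float load for the integer load could break exactly that. The check is only a sanity test of the expected semantics, not a reproducer: the bug needs x87 FP math (e.g. a 32-bit x86 target).

```c
#include <stdint.h>
#include <string.h>

/* Load *P as a float (stored to *COPY so the load is not dead), then read
   the same four bytes as an integer.  Value numbering must not reuse the
   float load for the integer read if the float load can change bits.  */
uint32_t
float_then_int_load (const float *p, float *copy)
{
  *copy = *p;                       /* float load of *p */
  uint32_t bits;
  memcpy (&bits, p, sizeof bits);   /* integer load of the same bytes */
  return bits;
}
```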

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

PR tree-optimization/114659
* tree-ssa-sccvn.cc (visit_reference_op_load): Do not
prevent punning from modes with padding here, but ...
(vn_reference_eq): ... ensure this here, also honoring
types with modes that cannot act as bit container.

* gcc.target/i386/pr114659.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/pr114659.c | 62 
 gcc/tree-ssa-sccvn.cc| 11 ++---
 2 files changed, 66 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr114659.c

diff --git a/gcc/testsuite/gcc.target/i386/pr114659.c b/gcc/testsuite/gcc.target/i386/pr114659.c
new file mode 100644
index 000..e1e24d55687
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr114659.c
@@ -0,0 +1,62 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+int
+my_totalorderf (float const *x, float const *y)
+{
+  int xs = __builtin_signbit (*x);
+  int ys = __builtin_signbit (*y);
+  if (!xs != !ys)
+return xs;
+
+  int xn = __builtin_isnan (*x);
+  int yn = __builtin_isnan (*y);
+  if (!xn != !yn)
+return !xn == !xs;
+  if (!xn)
+return *x <= *y;
+
+  unsigned int extended_sign = -!!xs;
+  union { unsigned int i; float f; } xu = {0}, yu = {0};
+  __builtin_memcpy (&xu.f, x, sizeof (float));
+  __builtin_memcpy (&yu.f, y, sizeof (float));
+  return (xu.i ^ extended_sign) <= (yu.i ^ extended_sign);
+}
+
+static float
+positive_NaNf ()
+{
+  float volatile nan = 0.0f / 0.0f;
+  return (__builtin_signbit (nan) ? - nan : nan);
+}
+
+typedef union { float value; unsigned int word[1]; } memory_float;
+
+static memory_float
+construct_memory_SNaNf (float quiet_value)
+{
+  memory_float m;
+  m.value = quiet_value;
+  m.word[0] ^= (unsigned int) 1 << 22;
+  m.word[0] |= (unsigned int) 1;
+  return m;
+}
+
+memory_float x[7] =
+  {
+{ 0 },
+{ 1e-5 },
+{ 1 },
+{ 1e37 },
+{ 1.0f / 0.0f },
+  };
+
+int
+main ()
+{
+  x[5] = construct_memory_SNaNf (positive_NaNf ());
+  x[6] = (memory_float) { positive_NaNf () };
+  if (! my_totalorderf (&x[5].value, &x[6].value))
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index dc377fa16ce..0639ba426ff 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -837,6 +837,9 @@ vn_reference_eq (const_vn_reference_t const vr1, const_vn_reference_t const vr2)
TYPE_VECTOR_SUBPARTS (vr2->type)))
return false;
 }
+  else if (TYPE_MODE (vr1->type) != TYPE_MODE (vr2->type)
+  && !mode_can_transfer_bits (TYPE_MODE (vr1->type)))
+return false;
 
   i = 0;
   j = 0;
@@ -5814,13 +5817,7 @@ visit_reference_op_load (tree lhs, tree op, gimple *stmt)
   if (result
   && !useless_type_conversion_p (TREE_TYPE (result), TREE_TYPE (op)))
 {
-  /* Avoid the type punning in case the result mode has padding where
-the op we lookup has not.  */
-  if (TYPE_MODE (TREE_TYPE (result)) != BLKmode
- && maybe_lt (GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (result))),
-  GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op)
-   result = NULL_TREE;
-  else if (CONSTANT_CLASS_P (result))
+  if (CONSTANT_CLASS_P (result))
result = const_unop (VIEW_CONVERT_EXPR, TREE_TYPE (op), result);
   else
{
-- 
2.35.3


Re: [PATCH 4/5] RISC-V: Add support to vector stack-clash protection

2024-07-29 Thread Raphael Zinsly
On Fri, Jul 26, 2024 at 6:48 PM Jeff Law  wrote:
>
>
>
> On 7/24/24 12:00 PM, Raphael Moreira Zinsly wrote:
> > Adds basic support to vector stack-clash protection using a loop to do
> > the probing and stack adjustments.
> >
> > gcc/ChangeLog:
> >   * config/riscv/riscv.cc
> >   (riscv_allocate_and_probe_stack_loop): New function.
> >   (riscv_v_adjust_scalable_frame): Add stack-clash protection
> >   support.
> >   (riscv_allocate_and_probe_stack_space): Move the probe loop
> >   implementation to riscv_allocate_and_probe_stack_loop.
> >   * config/riscv/riscv.h: Define RISCV_STACK_CLASH_VECTOR_CFA_REGNUM.
> >
> > gcc/testsuite/ChangeLog:
> >   * gcc.target/riscv/stack-check-cfa-3.c: New test.
> >   * gcc.target/riscv/stack-check-prologue-16.c: New test.
> >   * gcc.target/riscv/struct_vect_24.c: New test.
> So my only worry here is using another scratch register in the prologue
> code instead of using one of the preexisting prologue scratch registers.
>   Is there a reasonable way to use  PROLOGUE_TEMP or PROLOGUE_TEMP2 here?

These are the preexisting prologue scratch registers: PROLOGUE_TEMP is
t0 and PROLOGUE_TEMP2 is t1.

> Otherwise this looks good as well.  So let's get closure on that
> question and we can move forward after that.
>
> jeff



-- 
Raphael Moreira Zinsly


RE: [PATCH v1] Widening-Mul: Try .SAT_SUB for PLUS_EXPR when one op is IMM

2024-07-29 Thread Li, Pan2
> OK

Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Monday, July 29, 2024 5:03 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Widening-Mul: Try .SAT_SUB for PLUS_EXPR when one op is IMM

On Sun, Jul 28, 2024 at 5:25 AM  wrote:
>
> From: Pan Li 
>
> After adding the matching for .SAT_SUB when one op is IMM, there
> will be a new root PLUS_EXPR for the .SAT_SUB pattern.  For example,
>
> Form 3:
>   #define DEF_SAT_U_SUB_IMM_FMT_3(T, IMM) \
>   T __attribute__((noinline)) \
>   sat_u_sub_imm##IMM##_##T##_fmt_3 (T x)  \
>   {   \
> return x >= IMM ? x - IMM : 0;\
>   }
>
> DEF_SAT_U_SUB_IMM_FMT_3(uint64_t, 11)
>
> We then get the gimple below before widening-mul.  Thus, try
> .SAT_SUB for the PLUS_EXPR too.
>
>   __attribute__((noinline))
>   uint64_t sat_u_sub_imm11_uint64_t_fmt_3 (uint64_t x)
>   {
>     long unsigned int _1;
>     uint64_t _3;
>
>      [local count: 1073741824]:
>     _1 = MAX_EXPR ;
>     _3 = _1 + 18446744073709551605;
>     return _3;
>
>   }
>
> The below test suites are passed for this patch.
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.

OK

> gcc/ChangeLog:
>
> * tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children):
> Try .SAT_SUB for PLUS_EXPR case.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/tree-ssa-math-opts.cc | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index ac86be8eb94..8d96a4c964b 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -6129,6 +6129,7 @@ math_opts_dom_walker::after_dom_children (basic_block bb)
>
> case PLUS_EXPR:
>   match_unsigned_saturation_add (&gsi, as_a (stmt));
> + match_unsigned_saturation_sub (&gsi, as_a (stmt));
>   /* fall-through  */
> case MINUS_EXPR:
>   if (!convert_plusminus_to_widen (&gsi, stmt, code))
> --
> 2.34.1
>
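The PLUS_EXPR can be the root of this pattern for the following reason. For unsigned x, `x >= IMM ? x - IMM : 0` equals `MAX (x, IMM) - IMM`. Subtracting 11 from a 64-bit unsigned value is the same as adding 2^64 - 11 = 18446744073709551605, which is the constant in the gimple dump quoted above. The sketch below checks that equivalence for IMM = 11. The function names are hypothetical and not taken from the testsuite.

```c
#include <stdint.h>

/* Form 3 from the quoted patch, specialised to uint64_t and IMM = 11.  */
uint64_t
sat_sub_branch (uint64_t x)
{
  return x >= 11 ? x - 11 : 0;
}

/* The shape seen before widening-mul: the MAX of x and 11, followed by a
   PLUS_EXPR with 2^64 - 11, which wraps around to a subtraction of 11.  */
uint64_t
sat_sub_max_plus (uint64_t x)
{
  uint64_t m = x > 11 ? x : 11;          /* MAX (x, 11) */
  return m + 18446744073709551605ULL;    /* == m - 11 modulo 2^64 */
}
```

Since MAX (x, 11) is never below 11, the wrapping addition never underflows past zero, which is why matching .SAT_SUB on the PLUS_EXPR is valid here.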


[PATCH 1/2] c++: fix ICE on FUNCTION_DECLs inside coroutines [PR115906]

2024-07-29 Thread Arsen Arsenović
When register_local_var_uses iterates over a BIND_EXPR's BIND_EXPR_VARS, it
fails to account for the fact that FUNCTION_DECLs might be present, and
later passes such a decl to DECL_HAS_VALUE_EXPR_P.  This leads to a tree check
failure in DECL_HAS_VALUE_EXPR_P:

  tree check: expected var_decl or parm_decl or result_decl, have
  function_decl in register_local_var_uses

Much like types and namespaces, we don't need to check FUNCTION_DECLs.
Simply skip them.

PR c++/115906 - [coroutines] missing diagnostic and ICE when co_await used as default argument in function declaration

gcc/cp/ChangeLog:

PR c++/115906
* coroutines.cc (register_local_var_uses): Skip FUNCTION_DECLs.

gcc/testsuite/ChangeLog:

PR c++/115906
* g++.dg/coroutines/coro-function-decl.C: New test.
---
Tested on x86_64-pc-linux-gnu.

OK for trunk?

TIA, have a lovely day.

 gcc/cp/coroutines.cc  |  1 +
 .../g++.dg/coroutines/coro-function-decl.C| 19 +++
 2 files changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/coro-function-decl.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 2b16b4814d10..8cd619d7eaed 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -3928,6 +3928,7 @@ register_local_var_uses (tree *stmt, int *do_subtree, void *d)
 
  /* Make sure that we only present vars to the tests below.  */
  if (TREE_CODE (lvar) == TYPE_DECL
+ || TREE_CODE (lvar) == FUNCTION_DECL
  || TREE_CODE (lvar) == NAMESPACE_DECL)
continue;
 
diff --git a/gcc/testsuite/g++.dg/coroutines/coro-function-decl.C b/gcc/testsuite/g++.dg/coroutines/coro-function-decl.C
new file mode 100644
index ..86140569a76e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/coro-function-decl.C
@@ -0,0 +1,19 @@
+#include 
+
+struct task
+{
+  struct promise_type
+  {
+std::suspend_always initial_suspend () { return {}; }
+std::suspend_always final_suspend () noexcept { return {}; }
+void unhandled_exception () {}
+task get_return_object () noexcept { return {}; }
+void return_void () {}
+  };
+};
+
+task foo ()
+{
+  void bar ();
+  co_return;
+}
-- 
2.45.2



[PATCH 2/2] c++: diagnose usage of co_await and co_yield in default args [PR115906]

2024-07-29 Thread Arsen Arsenović
This is a partial fix for PR115906.  Per [expr.await] 2s3, "An
await-expression shall not appear in a default argument
([dcl.fct.default])".  This patch introduces the diagnostic in that
case, and in the case of a co_yield (as co_yield is defined in terms of
co_await, so prerequisites of co_await hold).

PR c++/115906 - [coroutines] missing diagnostic and ICE when co_await used as default argument in function declaration

gcc/cp/ChangeLog:

PR c++/115906
* parser.cc (cp_parser_unary_expression): Reject await
expressions if use of local variables is currently forbidden.
(cp_parser_yield_expression): Reject yield expressions if use of
local variables is currently forbidden.

gcc/testsuite/ChangeLog:

PR c++/115906
* g++.dg/coroutines/pr115906-yield.C: New test.
* g++.dg/coroutines/pr115906.C: New test.
* g++.dg/coroutines/co-await-syntax-02-outside-fn.C: Don't rely
on default arguments.
* g++.dg/coroutines/co-yield-syntax-01-outside-fn.C: Ditto.
---
 gcc/cp/parser.cc  | 17 ++
 .../co-await-syntax-02-outside-fn.C   |  2 +-
 .../co-yield-syntax-01-outside-fn.C   |  3 +-
 .../g++.dg/coroutines/pr115906-yield.C| 29 +
 gcc/testsuite/g++.dg/coroutines/pr115906.C| 32 +++
 5 files changed, 80 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr115906-yield.C
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr115906.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index f79736c17ac6..5cba35eff1c1 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -9242,6 +9242,14 @@ cp_parser_unary_expression (cp_parser *parser, cp_id_kind * pidk,
if (expr == error_mark_node)
  return error_mark_node;
 
+   /* ... but, we cannot use co_await in default arguments.  */
+   if (parser->local_variables_forbidden_p & LOCAL_VARS_FORBIDDEN)
+ {
+   error_at (kw_loc,
+ "% cannot be used in default arguments");
+   return error_mark_node;
+ }
+
/* Handle [expr.await].  */
return cp_expr (finish_co_await_expr (kw_loc, expr));
  }
@@ -29646,6 +29654,15 @@ cp_parser_yield_expression (cp_parser* parser)
   else
 expr = cp_parser_assignment_expression (parser);
 
+  /* Similar to co_await, we cannot use co_yield in default arguments (as
+ co_awaits underlie co_yield).  */
+  if (parser->local_variables_forbidden_p & LOCAL_VARS_FORBIDDEN)
+{
+  error_at (kw_loc,
+   "% cannot be used in default arguments");
+  return error_mark_node;
+}
+
   if (expr == error_mark_node)
 return expr;
 
diff --git a/gcc/testsuite/g++.dg/coroutines/co-await-syntax-02-outside-fn.C b/gcc/testsuite/g++.dg/coroutines/co-await-syntax-02-outside-fn.C
index 4ce5c2e04a0a..132128f27192 100644
--- a/gcc/testsuite/g++.dg/coroutines/co-await-syntax-02-outside-fn.C
+++ b/gcc/testsuite/g++.dg/coroutines/co-await-syntax-02-outside-fn.C
@@ -2,4 +2,4 @@
 
 #include "coro.h"
 
-auto f (int x = co_await coro::suspend_always{}); // { dg-error {'co_await' cannot be used outside a function} }
+auto x = co_await coro::suspend_always{}; // { dg-error {'co_await' cannot be used outside a function} }
diff --git a/gcc/testsuite/g++.dg/coroutines/co-yield-syntax-01-outside-fn.C b/gcc/testsuite/g++.dg/coroutines/co-yield-syntax-01-outside-fn.C
index 30db0e963b09..51c304625278 100644
--- a/gcc/testsuite/g++.dg/coroutines/co-yield-syntax-01-outside-fn.C
+++ b/gcc/testsuite/g++.dg/coroutines/co-yield-syntax-01-outside-fn.C
@@ -2,5 +2,4 @@
 
 #include "coro.h"
 
-auto f (int x = co_yield 5); // { dg-error {'co_yield' cannot be used outside a function} }
-
+auto x = co_yield 5; // { dg-error {'co_yield' cannot be used outside a function} }
diff --git a/gcc/testsuite/g++.dg/coroutines/pr115906-yield.C b/gcc/testsuite/g++.dg/coroutines/pr115906-yield.C
new file mode 100644
index ..f8b6ded5001c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/pr115906-yield.C
@@ -0,0 +1,29 @@
+#include 
+
+struct Promise;
+
+struct Handle : std::coroutine_handle {
+using promise_type = Promise;
+};
+
+struct Promise {
+Handle get_return_object() noexcept {
+return {Handle::from_promise(*this)};
+}
+std::suspend_never initial_suspend() const noexcept { return {}; }
+std::suspend_never final_suspend() const noexcept { return {}; }
+void return_void() const noexcept {}
+void unhandled_exception() const noexcept {}
+std::suspend_never yield_value(int) { return {}; }
+};
+
+Handle Coro() {
+[] (int x = co_yield 1){}; // { dg-error ".co_yield. cannot be used in default arguments" }
+co_return;
+}
+
+int main() {
+Coro();
+
+return 0;
+}
diff --git a/gcc/testsuite/g++.dg/coroutines/pr115906.C b/gcc/testsuite/g++.dg/coroutines/pr115906.C
new fi

[PATCH] c++: make BUILTIN_SOURCE_LOCATION follow DECL_RAMP_FN

2024-07-29 Thread Arsen Arsenović
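This fixes the function name that __builtin_source_location reports (as seen
via std::source_location::function_name) in compiler-generated coroutine
code: it now names the user's coroutine rather than the generated actor.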

PR c++/110855 - std::source_location doesn't work with C++20 coroutine

gcc/cp/ChangeLog:

PR c++/110855
* cp-gimplify.cc (fold_builtin_source_location): Use the name of
the DECL_RAMP_FN of the current function if present.

gcc/testsuite/ChangeLog:

PR c++/110855
* g++.dg/coroutines/pr110855.C: New test.
---
Tested on x86_64-pc-linux-gnu.

OK for trunk?

TIA, have a lovely day.

 gcc/cp/cp-gimplify.cc  |  9 +++-
 gcc/testsuite/g++.dg/coroutines/pr110855.C | 61 ++
 2 files changed, 69 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr110855.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index e6629dea5fdc..651751312fbe 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -3929,7 +3929,14 @@ fold_builtin_source_location (const_tree t)
  const char *name = "";
 
  if (current_function_decl)
-   name = cxx_printable_name (current_function_decl, 2);
+   {
+ /* If this is a coroutine, we should get the name of the user
+function rather than the actor we generate.  */
+ if (tree ramp = DECL_RAMP_FN (current_function_decl))
+   name = cxx_printable_name (ramp, 2);
+ else
+   name = cxx_printable_name (current_function_decl, 2);
+   }
 
  val = build_string_literal (name);
}
diff --git a/gcc/testsuite/g++.dg/coroutines/pr110855.C b/gcc/testsuite/g++.dg/coroutines/pr110855.C
new file mode 100644
index ..6b5c0147ec83
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/pr110855.C
@@ -0,0 +1,61 @@
+// { dg-do run }
+// { dg-output {^} }
+// { dg-output {ReturnObject bar\(int, char, bool\)(\n|\r\n|\r)} }
+// { dg-output {ReturnObject bar\(int, char, bool\)(\n|\r\n|\r)} }
+// { dg-output {ReturnObject bar\(int, char, bool\)(\n|\r\n|\r)} }
+// { dg-output {ReturnObject bar\(int, char, bool\)(\n|\r\n|\r)} }
+// { dg-output {ReturnObject bar\(int, char, bool\)(\n|\r\n|\r)} }
+// { dg-output {$} }
+// https://gcc.gnu.org/PR110855
+#include 
+#include 
+
+struct ReturnObject {
+  struct promise_type {
+auto
+initial_suspend(const std::source_location location =
+std::source_location::current()) {
+  __builtin_puts (location.function_name ());
+  return std::suspend_never{};
+}
+auto
+final_suspend(const std::source_location location =
+  std::source_location::current()) noexcept {
+  __builtin_puts (location.function_name ());
+  return std::suspend_never{};
+}
+auto
+get_return_object(const std::source_location location =
+  std::source_location::current()) {
+  __builtin_puts (location.function_name ());
+  return ReturnObject{std::coroutine_handle::from_promise(*this)};
+}
+auto
+unhandled_exception() { }
+auto return_void(const std::source_location location =
+ std::source_location::current()) {
+  __builtin_puts (location.function_name ());
+}
+  };
+  std::coroutine_handle<> handle;
+};
+
+struct awaitable : std::suspend_never
+{
+  void await_resume(const std::source_location location =
+ std::source_location::current())
+  {
+  __builtin_puts (location.function_name ());
+  }
+};
+
+ReturnObject
+bar(int, char, bool) {
+  co_await awaitable{};
+  co_return;
+}
+
+int
+main() {
+  bar(1, 'a', false);
+}
-- 
2.45.2



RE: [PATCH v1] Internal-fn: Handle vector bool type for type strict match mode [PR116103]

2024-07-29 Thread Li, Pan2
> OK.

Thanks Richard, I will wait for confirmation from Thomas in case I missed any
other failing cases.

Pan

-----Original Message-----
From: Richard Biener  
Sent: Monday, July 29, 2024 4:44 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Internal-fn: Handle vector bool type for type strict 
match mode [PR116103]

On Mon, Jul 29, 2024 at 9:57 AM  wrote:
>
> From: Pan Li 
>
> For some targets, like amdgcn-amdhsa, we need to handle vector bool
> types before general vector mode types.  Otherwise we get asm check
> failures like the ones below.
>
> gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 80
> gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 80
> gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 56
> gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 56
> gcc.dg/tree-ssa/loop-bound-2.c scan-tree-dump-not ivopts "zero if "
>
> The following test suites pass with this patch:
> 1. The full rv64gcv regression tests.
> 2. The x86 bootstrap tests.
> 3. The full x86 regression tests.
> 4. The amdgcn test cases above.

OK.

Richard.

> gcc/ChangeLog:
>
> * internal-fn.cc (type_strictly_matches_mode_p): Add handling
> for vector bool type.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/internal-fn.cc | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 8a2e07f2f96..086c8be398a 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4171,6 +4171,12 @@ direct_internal_fn_optab (internal_fn fn)
>  static bool
>  type_strictly_matches_mode_p (const_tree type)
>  {
> +  /* For target=amdgcn-amdhsa,  we need to take care of vector bool types.
> +     More details see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116103.  */
> +  if (VECTOR_BOOLEAN_TYPE_P (type) && SCALAR_INT_MODE_P (TYPE_MODE (type))
> +      && TYPE_PRECISION (TREE_TYPE (type)) == 1)
> +    return true;
> +
>if (VECTOR_TYPE_P (type))
>  return VECTOR_MODE_P (TYPE_MODE (type));
>
> --
> 2.34.1
>
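Below is a simplified standalone model of the early return added to type_strictly_matches_mode_p. The struct type_info is hypothetical, and its boolean fields stand in for the tree predicates named in the patch. The model covers only the vector part of the function. The remaining checks of the real function are left out and reduced to "return true".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified view of a tree type.  */
struct type_info
{
  bool is_vector;            /* VECTOR_TYPE_P (type) */
  bool is_vector_boolean;    /* VECTOR_BOOLEAN_TYPE_P (type) */
  bool mode_is_vector;       /* VECTOR_MODE_P (TYPE_MODE (type)) */
  bool mode_is_scalar_int;   /* SCALAR_INT_MODE_P (TYPE_MODE (type)) */
  unsigned elt_precision;    /* TYPE_PRECISION (TREE_TYPE (type)) */
};

static bool
strictly_matches_vector_part (const struct type_info *t)
{
  /* A boolean vector is always a vector.  */
  assert (t->is_vector || !t->is_vector_boolean);

  /* New check: a mask vector of 1-bit elements held in a scalar
     integer mode counts as a strict match.  */
  if (t->is_vector_boolean && t->mode_is_scalar_int && t->elt_precision == 1)
    return true;

  /* Pre-existing check for all other vector types.  */
  if (t->is_vector)
    return t->mode_is_vector;

  return true;  /* non-vector checks omitted in this sketch */
}
```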


Re: [PATCH 1/3] Add TARGET_MODE_CAN_TRANSFER_BITS

2024-07-29 Thread Jakub Jelinek
On Mon, Jul 29, 2024 at 02:14:40PM +0200, Richard Biener wrote:
> The following adds a target hook to specify whether regs of MODE can be
> used to transfer bits.  The hook is supposed to be used for value-numbering
> to decide whether a value loaded in such mode can be punned to another
> mode instead of re-loading the value in the other mode and for SRA to
> decide whether MODE is suitable as container holding a value to be
> used in different modes.
> 
>   * target.def (mode_can_transfer_bits): New target hook.
>   * target.h (mode_can_transfer_bits): New function wrapping the
>   hook and providing default behavior.
>   * doc/tm.texi.in: Update.
>   * doc/tm.texi: Re-generate.


> --- a/gcc/target.h
> +++ b/gcc/target.h
> @@ -312,6 +312,21 @@ estimated_poly_value (poly_int64 x,
>  return targetm.estimated_poly_value (x, kind);
>  }
>  
> +/* Return true when MODE can be used to copy GET_MODE_BITSIZE bits
> +   unchanged.  */
> +
> +inline bool
> +mode_can_transfer_bits (machine_mode mode)
> +{

Shouldn't this start with
  mode = GET_MODE_INNER (mode);
?
I mean say XCmode has similar problems as XFmode, or
V4SFmode as SFmode if i?86 -mno-sse.
Though, admittedly, with i?86 -msse2 -mfpmath=387 perhaps some vector modes
could work, which would argue for passing even vector modes to the hook.
Though the GET_MODE_BITSIZE != GET_MODE_PRECISION check then wants the inner
modes maybe.

> +  if (mode == BLKmode)
> +return true;
> +  if (maybe_ne (GET_MODE_BITSIZE (mode), GET_MODE_PRECISION (mode)))
> +return false;
> +  if (targetm.mode_can_transfer_bits)
> +return targetm.mode_can_transfer_bits (mode);
> +  return true;
> +}
> +
>  #ifdef GCC_TM_H
>  
>  #ifndef CUMULATIVE_ARGS_MAGIC
> -- 
> 2.35.3

Jakub
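Below is a standalone sketch of the order of decisions in the proposed mode_can_transfer_bits wrapper. The names struct mode_desc and mode_can_transfer_bits_model are hypothetical. Sizes here are plain integers, so maybe_ne reduces to !=. Two sets of figures appear in the usage checks:
- The 96/80 bits for XFmode assume ia32.
- The 192/160 bits for XCmode are the numbers quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for machine_mode: only what the wrapper reads.  */
struct mode_desc
{
  const char *name;
  bool is_blk;          /* mode == BLKmode */
  unsigned bitsize;     /* GET_MODE_BITSIZE (mode) */
  unsigned precision;   /* GET_MODE_PRECISION (mode) */
};

/* Optional target hook; NULL when the target does not define one.  */
typedef bool (*transfer_hook) (const struct mode_desc *);

static bool
mode_can_transfer_bits_model (const struct mode_desc *m, transfer_hook hook)
{
  assert (m != NULL);
  if (m->is_blk)
    return true;                  /* BLKmode: plain byte copy */
  if (m->bitsize != m->precision)
    return false;                 /* padding bits need not survive */
  if (hook != NULL)
    return hook (m);              /* the target has the final say */
  return true;                    /* default: bits are preserved */
}
```

The padding check alone already rejects XFmode and XCmode before any target hook runs. This matches Richard's point below that the size/precision check also covers XCmode.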



Re: [PATCH] RISC-V: Expand subreg move via slide if necessary [PR116086].

2024-07-29 Thread Richard Sandiford
Jeff Law  writes:
> On 7/26/24 2:42 PM, Robin Dapp wrote:
>> Hi,
>> 
>> when the source mode is potentially larger than one vector (e.g. an
>> LMUL2 mode for VLEN=128) we don't know which vector the subreg actually
>> refers to.  For zvl128b and LMUL=2 the subreg in (subreg:V2DI (reg:V4DI))
>> could actually be a full (high) vector register of a two-register
>> group (at VLEN=128) or the higher part of a single register (at VLEN>128).
>> 
>> In that case we need to use a slidedown instead of moving a register
>> directly.
>> 
>> Regtested on rv64gcv_zvfh_zvbb -mrvv-max-lmul=m2 at vlen 128 and vlen 256.
>> This also fixes
>>gcc.dg/vect/bb-slp-cond-1.c
>>gcc.dg/vect/bb-slp-pr101668.c
>>gcc.dg/vect/pr66251.c
>> and others from the vector test suite when run with vlen 256.
>> 
>> Regtested on rv64gcv_zvfh_zvbb -mrvv-max-lmul=m2 and vlen 128 as well as vlen
>> 256.  Still curious what the CI says.
>> 
>> Regards
>>   Robin
>> 
>> gcc/ChangeLog:
>> 
>>  PR target/116086
>> 
>>  * config/riscv/riscv-v.cc (legitimize_move): Slide down instead
>>  of moving register directly.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  * gcc.target/riscv/rvv/autovec/pr116086-2-run.c: New test.
>>  * gcc.target/riscv/rvv/autovec/pr116086-2.c: New test.
>>  * gcc.target/riscv/rvv/autovec/pr116086.c: New test.
> So the representational issues LMUL > 1 brings with GCC have been in the 
> back of my mind, but never bubbled up enough for me to say anything.
>
> This seems to start and touch on those concerns.  While your patch fixes 
> the immediate issue in the RISC-V backend, I won't be at all surprised 
> if we find other cases in the generic code that assume they know how to 
> interpret a SUBREG and get it wrong.

Yeah, I agree.  It seems like this is papering over an issue elsewhere,
and that we're losing our chance to see what it is.

A somewhat similar situation can happen for SVE with subregs like:

  (subreg:V4SI (reg:VNx8SI R) 16)

IMO, what ought to happen here is that the RA should spill
the inner register to memory and load the V4SI back from there.
(Or vice versa, for an lvalue.)  Obviously that's not very efficient,
and so a patch like the above might be useful as an optimisation.[*]
But it shouldn't be needed for correctness.  The target-independent
code should already have the information it needs to realise that
it can't predict the register index at compile time (at least for SVE).

Richard

[*] Big-endian SVE also checks for tricky subregs in the move patterns,
and tries to optimise them to something that doesn't involve a subreg.
But it's only an optimisation.

>
> And we may currently be hiding these issues by having GCC and QEMU be 
> consistent in their handling in the default configurations.
>
> Anyway, the patch is sensible.  Essentially using a slidedown is a good 
> way to avoid a subclass of the potential problems.
>
> Jeff
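A small hypothetical helper can illustrate the ambiguity Robin describes. locate_subreg is not a GCC function. It maps a subreg byte offset to a register of the group and a byte within it, for a given VLEN. The sketch assumes the subreg in (subreg:V2DI (reg:V4DI)) has byte offset 16, i.e. the high half of the 32-byte value.

```c
#include <assert.h>

/* Position of a subreg inside a register group.  */
struct subreg_pos
{
  unsigned reg_index;     /* which register of the group */
  unsigned byte_in_reg;   /* byte offset inside that register */
};

/* Hypothetical: locate OFFSET_BYTES given a run-time VLEN in bits.  */
static struct subreg_pos
locate_subreg (unsigned offset_bytes, unsigned vlen_bits)
{
  unsigned reg_bytes = vlen_bits / 8;
  assert (reg_bytes > 0);
  struct subreg_pos p = { offset_bytes / reg_bytes, offset_bytes % reg_bytes };
  return p;
}
```

At VLEN=128 the high half is the whole second register of the group. At VLEN=256 it is the upper half of a single register. A plain register move cannot express both cases, whereas a slidedown can.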


Re: [PATCH] RISC-V: Expand subreg move via slide if necessary [PR116086].

2024-07-29 Thread Richard Sandiford
Richard Sandiford  writes:
> A somewhat similar situation can happen for SVE with subregs like:
>
>   (subreg:V4SI (reg:VNx8SI R) 16)
>
> IMO, what ought to happen here is that the RA should spill
> the inner register to memory and load the V4SI back from there.
> (Or vice versa, for an lvalue.)  Obviously that's not very efficient,
> and so a patch like the above might be useful as an optimisation.[*]
> But it shouldn't be needed for correctness.  The target-independent
> code should already have the information it needs to realise that
> it can't predict the register index at compile time (at least for SVE).

Or actually, for that case:

  /* For pseudo registers, we want most of the same checks.  Namely:

 Assume that the pseudo register will be allocated to hard registers
 that can hold REGSIZE bytes each.  If OSIZE is not a multiple of REGSIZE,
 the remainder must correspond to the lowpart of the containing hard
 register.  If BYTES_BIG_ENDIAN, the lowpart is at the highest offset,
 otherwise it is at the lowest offset.

 Given that we've already checked the mode and offset alignment,
 we only have to check subblock subregs here.  */
  if (maybe_lt (osize, regsize)
      && ! (lra_in_progress && (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))))
{
  /* It is invalid for the target to pick a register size for a mode
 that isn't ordered wrt to the size of that mode.  */
  poly_uint64 block_size = ordered_min (isize, regsize);
  unsigned int start_reg;
  poly_uint64 offset_within_reg;
  if (!can_div_trunc_p (offset, block_size, &start_reg, &offset_within_reg)
  ...

in validate_subreg should reject the offset.
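The check above can be sketched in standalone C, assuming the SVE register size is the poly_int 16 + 16x bytes with x >= 0. For (subreg:V4SI (reg:VNx8SI R) 16), the block size is min (32 + 32x, 16 + 16x) = 16 + 16x, and the offset is the constant 16. quotient_is_constant is a brute-force stand-in for can_div_trunc_p that only tries x in [0, 64]. GCC reasons symbolically, so this illustrates the property being checked, not GCC's algorithm.

```c
#include <assert.h>
#include <stdbool.h>

/* Degree-1 poly_int: value = c0 + c1 * x, x >= 0 known only at run time.  */
struct poly1 { long c0, c1; };

static long
poly_eval (struct poly1 p, long x)
{
  return p.c0 + p.c1 * x;
}

#define MAX_X 64  /* arbitrary bound for the brute-force search */

/* Succeed only if A / B (truncating) is the same for every x tried.  */
static bool
quotient_is_constant (struct poly1 a, struct poly1 b, long *quotient)
{
  long q0 = 0;
  for (long x = 0; x <= MAX_X; x++)
    {
      long den = poly_eval (b, x);
      assert (den > 0);
      long q = poly_eval (a, x) / den;
      if (x == 0)
        q0 = q;
      else if (q != q0)
        return false;     /* register index depends on the vector length */
    }
  *quotient = q0;
  return true;
}
```

Offset 16 gives quotient 1 at x = 0 but 0 for x >= 1, so the subreg is rejected. Offsets 0 and 16 + 16x give a constant register index.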

Richard


Re: [PATCHv2 2/2] libiberty/buildargv: handle input consisting of only white space

2024-07-29 Thread Andrew Burgess
Thomas Schwinge  writes:

> Hi!
>
> On 2024-02-10T17:26:01+, Andrew Burgess  wrote:
>> --- a/libiberty/argv.c
>> +++ b/libiberty/argv.c
>
>> @@ -439,17 +442,8 @@ expandargv (int *argcp, char ***argvp)
>>  }
>>/* Add a NUL terminator.  */
>>buffer[len] = '\0';
>> -  /* If the file is empty or contains only whitespace, buildargv would
>> - return a single empty argument.  In this context we want no arguments,
>> - instead.  */
>> -  if (only_whitespace (buffer))
>> -{
>> -  file_argv = (char **) xmalloc (sizeof (char *));
>> -  file_argv[0] = NULL;
>> -}
>> -  else
>> -/* Parse the string.  */
>> -file_argv = buildargv (buffer);
>> +  /* Parse the string.  */
>> +  file_argv = buildargv (buffer);
>>/* If *ARGVP is not already dynamically allocated, copy it.  */
>>if (*argvp == original_argv)
>>  *argvp = dupargv (*argvp);
>
> With that (single) use of 'only_whitespace' now gone:
>
> [...]/source-gcc/libiberty/argv.c:128:1: warning: ‘only_whitespace’ defined but not used [-Wunused-function]
>   128 | only_whitespace (const char* input)
>   | ^~~
>

Sorry about that.

The patch below is the obvious fix.  OK to apply?

Thanks,
Andrew

---

commit c4533957b8424a3780180b47834350897674c776
Author: Andrew Burgess 
Date:   Mon Jul 29 13:47:32 2024 +0100

libiberty/argv.c: remove only_whitespace

After the commit:

  commit 5e1d530da87a6d2aa7e719744cb278e7e54a6623 (gcc-buildargv)
  Date:   Sat Feb 10 11:22:13 2024 +

  libiberty/buildargv: handle input consisting of only white space

The function only_whitespace (in argv.c) was no longer being called.
Let's delete it.

There should be no user visible changes after this commit.

2024-07-29  Andrew Burgess  

libiberty/

* argv.c (only_whitespace): Delete.

diff --git a/libiberty/argv.c b/libiberty/argv.c
index 675336273f3..f889432a868 100644
--- a/libiberty/argv.c
+++ b/libiberty/argv.c
@@ -124,15 +124,6 @@ consume_whitespace (const char **input)
 }
 }
 
-static int
-only_whitespace (const char* input)
-{
-  while (*input != EOS && ISSPACE (*input))
-input++;
-
-  return (*input == EOS);
-}
-
 /*
 
 @deftypefn Extension char** buildargv (char *@var{sp})




Re: [PATCH 1/3] Add TARGET_MODE_CAN_TRANSFER_BITS

2024-07-29 Thread Richard Biener
On Mon, 29 Jul 2024, Jakub Jelinek wrote:

> On Mon, Jul 29, 2024 at 02:14:40PM +0200, Richard Biener wrote:
> > The following adds a target hook to specify whether regs of MODE can be
> > used to transfer bits.  The hook is supposed to be used for value-numbering
> > to decide whether a value loaded in such mode can be punned to another
> > mode instead of re-loading the value in the other mode and for SRA to
> > decide whether MODE is suitable as container holding a value to be
> > used in different modes.
> > 
> > * target.def (mode_can_transfer_bits): New target hook.
> > * target.h (mode_can_transfer_bits): New function wrapping the
> > hook and providing default behavior.
> > * doc/tm.texi.in: Update.
> > * doc/tm.texi: Re-generate.
> 
> 
> > --- a/gcc/target.h
> > +++ b/gcc/target.h
> > @@ -312,6 +312,21 @@ estimated_poly_value (poly_int64 x,
> >  return targetm.estimated_poly_value (x, kind);
> >  }
> >  
> > +/* Return true when MODE can be used to copy GET_MODE_BITSIZE bits
> > +   unchanged.  */
> > +
> > +inline bool
> > +mode_can_transfer_bits (machine_mode mode)
> > +{
> 
> Shouldn't this start with
>   mode = GET_MODE_INNER (mode);
> ?

I specifically wanted to avoid this (at least for the purpose of the
hook).

> I mean say XCmode has similar problems as XFmode, or
> V4SFmode as SFmode if i?86 -mno-sse.
> Though, admittedly, with i?86 -msse2 -mfpmath=387 perhaps some vector modes
> could work, which would argue for passing even vector modes to the hook.
> Though the GET_MODE_BITSIZE != GET_MODE_PRECISION check then wants the inner
> modes maybe.

We do not support vector inner modes with padding.  I didn't think of
XCmode - though precision is 160 here and size 192, so the padding
check should work there as well.

For vector I think the x86 backend ensures we never get x87 modes as
components.  The middle-end will also not allow vector(1) float
with SFmode like it allows vector(1) int with SImode.

That said, the i386 implementation needs to handle XCmode, will
adjust.

Richard.

> 
> > +  if (mode == BLKmode)
> > +return true;
> > +  if (maybe_ne (GET_MODE_BITSIZE (mode), GET_MODE_PRECISION (mode)))
> > +return false;
> > +  if (targetm.mode_can_transfer_bits)
> > +return targetm.mode_can_transfer_bits (mode);
> > +  return true;
> > +}
> > +
> >  #ifdef GCC_TM_H
> >  
> >  #ifndef CUMULATIVE_ARGS_MAGIC
> > -- 
> > 2.35.3
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH 1/3] Add TARGET_MODE_CAN_TRANSFER_BITS

2024-07-29 Thread Jakub Jelinek
On Mon, Jul 29, 2024 at 02:52:24PM +0200, Richard Biener wrote:
> >   mode = GET_MODE_INNER (mode);
> > ?
> 
> I specifically wanted to avoid this (at least for the purpose of the
> hook).
> 
> > I mean say XCmode has similar problems as XFmode, or
> > V4SFmode as SFmode if i?86 -mno-sse.
> > Though, admittedly, with i?86 -msse2 -mfpmath=387 perhaps some vector modes
> > could work, which would argue for passing even vector modes to the hook.
> > Though the GET_MODE_BITSIZE != GET_MODE_PRECISION check then wants the inner
> > modes maybe.
> 
> We do not support vector inner modes with padding.  I didn't think of
> XCmode - though precision is 160 here and size 192, so the padding
> check should work there as well.

One thing is XCmode, another one is SCmode/DCmode/HCmode/BCmode without
-mfpmath=sse, there the target hook should say that it can't transfer bits.

For the vector V*[SDHB]Fmode it really depends on if it will be lowered to
scalar or vector moves.

And, for the GET_MODE_INNER, I also meant it for Aarch64/RISC-V VL vectors,
I think those should be considered as true by the hook, not false
because maybe_ne.

> For vector I think the x86 backend ensures we never get x87 modes as
> components.  The middle-end will also not allow vector(1) float

It ensures there are no V*XFmode vectors.  But it is unclear whether, say,
V*SFmode vectors will result in vector moves, which move everything safely,
or in scalar moves, which would use x87 and be unsafe.

> with SFmode like it allows vector(1) int with SImode.
> 
> That said, the i386 implementation needs to handle XCmode, will
> adjust.

Jakub



Re: [PATCH 2/3] [x86] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-29 Thread Richard Biener
On Mon, 29 Jul 2024, Richard Biener wrote:

> The following implements the hook, excluding x87 modes.

Jakub correctly pointed out complex modes, so I've adjusted the hook to
the following which might be easier to parse (and handles decimal
FP modes as returning true).  Re-testing in progress.

/* Implement TARGET_MODE_CAN_TRANSFER_BITS.  */
static bool
ix86_mode_can_transfer_bits (machine_mode mode)
{
  if (GET_MODE_CLASS (mode) == MODE_FLOAT
      || GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT)
    switch (GET_MODE_INNER (mode))
      {
      case SFmode:
      case DFmode:
        return TARGET_SSE_MATH && !TARGET_MIX_SSE_I387;
      default:
        return false;
      }

  return true;
}
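Below is a standalone model of the revised hook. It uses hypothetical enums in place of GCC's mode classes and machine modes, and a struct in place of the TARGET_SSE_MATH and TARGET_MIX_SSE_I387 flags. It follows the control flow of the function above, including decimal float and vector modes returning true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reductions of GET_MODE_CLASS and GET_MODE_INNER.  */
enum mode_class { CLASS_INT, CLASS_FLOAT, CLASS_COMPLEX_FLOAT,
                  CLASS_DECIMAL_FLOAT, CLASS_VECTOR_FLOAT };
enum inner_mode { INNER_SF, INNER_DF, INNER_XF, INNER_HF, INNER_OTHER };

struct x86_mode { enum mode_class cls; enum inner_mode inner; };

/* Stand-ins for TARGET_SSE_MATH and TARGET_MIX_SSE_I387.  */
struct x86_flags { bool sse_math; bool mix_sse_i387; };

static bool
ix86_transfer_bits_model (struct x86_mode m, struct x86_flags f)
{
  if (m.cls == CLASS_FLOAT || m.cls == CLASS_COMPLEX_FLOAT)
    switch (m.inner)
      {
      case INNER_SF:
      case INNER_DF:
        /* Safe only if SF/DF values never go through the x87 stack.  */
        return f.sse_math && !f.mix_sse_i387;
      default:
        /* XF (x87 extended) and other scalar floats: refuse.  */
        return false;
      }
  /* Integer, decimal float and vector modes.  */
  return true;
}
```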


>   * i386.cc (TARGET_MODE_CAN_TRANSFER_BITS): Define.
>   (ix86_mode_can_transfer_bits): New function.
> ---
>  gcc/config/i386/i386.cc | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 12d15feb5e9..584417992a0 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -26113,6 +26113,14 @@ ix86_have_ccmp ()
>return (bool) TARGET_APX_CCMP;
>  }
>  
> +/* Implement TARGET_MODE_CAN_TRANSFER_BITS.  */
> +static bool
> +ix86_mode_can_transfer_bits (machine_mode mode)
> +{
> +  return (!SCALAR_FLOAT_MODE_P (mode)
> +   || (TARGET_SSE_MATH && !TARGET_MIX_SSE_I387 && mode != XFmode));
> +}
> +
>  /* Target-specific selftests.  */
>  
>  #if CHECKING_P
> @@ -26959,6 +26967,9 @@ ix86_libgcc_floating_mode_supported_p
>  #undef TARGET_HAVE_CCMP
>  #define TARGET_HAVE_CCMP ix86_have_ccmp
>  
> +#undef TARGET_MODE_CAN_TRANSFER_BITS
> +#define TARGET_MODE_CAN_TRANSFER_BITS ix86_mode_can_transfer_bits
> +
>  static bool
>  ix86_libc_has_fast_function (int fcode ATTRIBUTE_UNUSED)
>  {
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [patch,avr] PR115830: Improve code by using more condition code

2024-07-29 Thread Georg-Johann Lay

On 10.07.24 at 01:17, Jeff Law wrote:

On 7/9/24 4:03 AM, Georg-Johann Lay wrote:

Hi Jeff,

This patch adds peephole2s and insns to make better use of
instructions that set condition code (SREG) as a byproduct.

Of course with cc0 all this was *much* simpler... so here we go;
adding CCNmode and CCZNmode, and extra insns that do arith + CC.

No new regressions.

Ok for master?

Johann

--

AVR: target/115830 - Make better use of SREG.N and SREG.Z.

This patch adds new CC modes CCN and CCZN for operations that
set SREG.N, resp. SREG.Z and SREG.N.  Add a bunch of peephole2
patterns to generate new compute + branch insns that make use
of the Z and N flags.  Most of these patterns need their own
asm output routines that don't do all the micro-optimizations
that the ordinary outputs may perform, as the latter have no
requirement to set CC in a usable way.  Pass peephole2 is run
a second time so all patterns get a chance to match.
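Below is a rough C model of why Z/N-only CC modes are useful. It assumes that an 8-bit AVR arithmetic instruction sets SREG.Z when the result is zero and SREG.N from bit 7 of the result. Comparing the result against zero then needs only Z or N, which is one reading of the "overflow unusable" variants in the ChangeLog. On AVR, BRLT tests S = N xor V. After 0x7f + 1 the add sets V, so S would wrongly say the result is non-negative, while N correctly reports bit 7. All names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SREG.Z and SREG.N as left by an 8-bit add.  */
struct sreg_zn { bool z, n; };

static struct sreg_zn
flags_after_add8 (uint8_t a, uint8_t b, uint8_t *result)
{
  uint8_t r = (uint8_t) (a + b);
  struct sreg_zn f = { r == 0, (r & 0x80) != 0 };
  *result = r;
  return f;
}

/* Comparisons of the result against zero that need no extra
   compare/test instruction: == 0 uses Z, < 0 uses N.  */
static bool result_eq_zero (struct sreg_zn f) { return f.z; }
static bool result_lt_zero (struct sreg_zn f) { return f.n; }
```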

 PR target/115830
gcc/
 * config/avr/avr-modes.def (CCN, CCZN): New CC_MODEs.
 * config/avr/avr-protos.h (ret_cond_branch): Adjust.
 (avr_out_plus_set_N, avr_op8_ZN_operator,
 avr_out_op8_set_ZN, avr_len_op8_set_ZN): New protos.
 * config/avr/avr.cc (ret_cond_branch): Remove "reverse"
 argument (was always false) and respective code.
 Pass cc_overflow_unusable as an argument.
  (cond_string): Add bool cc_overflow_unusable argument.
 (avr_print_operand) ['L']: Like 'j' but overflow unusable.
 ['K']: Like 'k' but overflow unusable.
 (avr_out_plus_set_ZN): Also support adding -2 and +2.
 (avr_out_plus_set_N, avr_op8_ZN_operator): New functions.
 (avr_out_op8_set_ZN, avr_len_op8_set_ZN): New functions.
 (avr_adjust_insn_length) [ADJUST_LEN_ADD_SET_N]: Handle case.
 (avr_class_max_nregs): All MODE_CCs occupy one hard reg.
 (avr_hard_regno_nregs): Same.
 (avr_hard_regno_mode_ok) [REG_CC]: Allow all MODE_CC.
 (pass_manager.h): Include it.
 (avr_option_override): Run peephole2 a second time.
 * config/avr/avr.md (adjust_len) [add_set_N]: New.
 (ALLCC, CCN_CCZN): New mode iterators.
 (CCname): New mode attribute.
 (eqnegtle, cmp_signed, op8_ZN): New code iterators.
 (swap, SWAP, tstMSB): New code attributes.
 (branch): Handle CCNmode and CCZNmode.  Assimilate...
 (difficult_branch): ...this insn.
 (p1m1): Turn into p2m2.
 (gen_add_for__): Adjust to CCNmode and CCZNmode.
 Extend peephole2s that produce them.
 (*add.for.eqne.): Extend to 
*add.for...

 (*ashift.for.ccn.): New insns and peephole2s to make them.
 (*op8.for.cczn.): New insns and peephole2s to make them.
 * config/avr/predicates.md (const_1_to_3_operand)
 (abs1_abs2_operand, signed_comparison_operator)
 (op8_ZN_operator): New predicates.
gcc/testsuite/
 * gcc.target/avr/pr115830-add-c.c: New test.
 * gcc.target/avr/pr115830-add-i.c: New test.
 * gcc.target/avr/pr115830-and.c: New test.
 * gcc.target/avr/pr115830-asl.c: New test.
 * gcc.target/avr/pr115830-asr.c: New test.
 * gcc.target/avr/pr115830-ior.c: New test.
 * gcc.target/avr/pr115830-lsr.c: New test.
 * gcc.target/avr/pr115830-asl32.c: New test.
I was going to throw this into my tester, but the avr.md part of the 
patch failed.  I'm guessing the patch needs minor updates due to some 
kind of changes on the trunk.



It looks like avr exposes the CC register early, creating references to 
it during expansion to RTL.  Presumably this means you've got a 
reasonable way to reload values, particularly address arithmetic, without 
impacting the CC state?


It looks like you're relying heavily on peep2 patterns.  Did you explore 
using cmpelim?


jeff


Hi Jeff,

have you made any progress on improving cmpelim and the many points
that make it a bad choice for avr? For example:

* cmpelim requires the plain insn and the insn that also sets CC
(insn + ccmode) to have the same form (same clobbers, constraints, etc.),
which is not the case for avr.  Turned the other way round: forcing them
into the same form would reduce code quality in the non-compare cases.

* cmpelim cannot provide a scratch reg while peep2 can.

Sadly, all these shortcomings of MODE_CC / cmpelim were well
known prior to the cc0 removal...

Johann



Re: [PATCH 1/3] Add TARGET_MODE_CAN_TRANSFER_BITS

2024-07-29 Thread Jakub Jelinek
On Mon, Jul 29, 2024 at 02:59:58PM +0200, Jakub Jelinek wrote:
> On Mon, Jul 29, 2024 at 02:52:24PM +0200, Richard Biener wrote:
> > >   mode = GET_MODE_INNER (mode);
> > > ?
> > 
> > I specifically wanted to avoid this (at least for the purpose of the
> > hook).
> > 
> > > I mean say XCmode has similar problems as XFmode, or
> > > V4SFmode as SFmode if i?86 -mno-sse.
> > > Though, admittedly, with i?86 -msse2 -mfpmath=387 perhaps some vector modes
> > > could work, which would argue for passing even vector modes to the hook.
> > > Though the GET_MODE_BITSIZE != GET_MODE_PRECISION check then wants the inner
> > > modes maybe.
> > 
> > We do not support vector inner modes with padding.  I didn't think of
> > XCmode - though precision is 160 here and size 192, so the padding
> > check should work there as well.
> 
> One thing is XCmode, another one is SCmode/DCmode/HCmode/BCmode without
> -mfpmath=sse, there the target hook should say that it can't transfer bits.
> 
> For the vector V*[SDHB]Fmode it really depends on if it will be lowered to
> scalar or vector moves.
> 
> And, for the GET_MODE_INNER, I also meant it for Aarch64/RISC-V VL vectors,
> I think those should be considered as true by the hook, not false
> because maybe_ne.

Maybe the vector modes are ok on ia32, given -O2 -m32 -mno-sse
struct S { _Complex _Float16 a; __attribute__((vector_size (8 * sizeof (_Float16)))) _Float16 b; };

void
foo (struct S *p, struct S *q)
{
  p->a = q->a;
  q->b = p->b;
}

struct T { _Complex _Float32 a; __attribute__((vector_size (8 * sizeof (_Float32)))) _Float32 b; };

void
bar (struct T *p, struct T *q)
{
  p->a = q->a;
  q->b = p->b;
}
But SCmode/DCmode are not.

Jakub



Re: [PATCH 1/3] Add TARGET_MODE_CAN_TRANSFER_BITS

2024-07-29 Thread Richard Biener
On Mon, 29 Jul 2024, Jakub Jelinek wrote:

> On Mon, Jul 29, 2024 at 02:52:24PM +0200, Richard Biener wrote:
> > >   mode = GET_MODE_INNER (mode);
> > > ?
> > 
> > I specifically wanted to avoid this (at least for the purpose of the
> > hook).
> > 
> > > I mean say XCmode has similar problems as XFmode, or
> > > V4SFmode as SFmode if i?86 -mno-sse.
> > > Though, admittedly, with i?86 -msse2 -mfpmath=387 perhaps some vector modes
> > > could work, which would argue for passing even vector modes to the hook.
> > > Though the GET_MODE_BITSIZE != GET_MODE_PRECISION check then wants the inner
> > > modes maybe.
> > 
> > We do not support vector inner modes with padding.  I didn't think of
> > XCmode - though precision is 160 here and size 192, so the padding
> > check should work there as well.
> 
> One thing is XCmode, another one is SCmode/DCmode/HCmode/BCmode without
> -mfpmath=sse, there the target hook should say that it can't transfer bits.

I guess the adjusted hook doing

  if (GET_MODE_CLASS (mode) == MODE_FLOAT
  || GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT)
switch (GET_MODE_INNER (mode))
  {
  case SFmode:
  case DFmode:
return TARGET_SSE_MATH && !TARGET_MIX_SSE_I387;
  default:
return false;
  }

would cover that.

> For the vector V*[SDHB]Fmode it really depends on if it will be lowered to
> scalar or vector moves.

Hmm, indeed vector(4) float can get V4SFmode even without SSE enabled
since we use targetm.vector_mode_supported_any_target_p to decide
whether that mode is usable.  So that might later get lowered to
x87 SFmode though the problematic load/store stmts are _not_ lowered
by vector lowering.

Indeed

typedef float v2sf __attribute__((vector_size(8)));
typedef int v2si __attribute__((vector_size(8)));

v2si v, v3;
v2sf v2;

void foo ()
{
  v2sf x = *(v2sf *)&v;
  v2si i = v;
  v2 = x;
  v3 = i;
}

gets optimized to

  x_3 = MEM[(v2sf *)&v];
  _7 = VIEW_CONVERT_EXPR(x_3);
  v2 = x_3;
  v3 = _7;

with -mno-sse even, but in the end the v2sf load prevails and gets
expanded via

movl    v, %edx
movl    v+4, %eax

same with double/long long.

So I _think_ this should not be a concern either.  Actual float
operations remain float.

> And, for the GET_MODE_INNER, I also meant it for Aarch64/RISC-V VL vectors,
> I think those should be considered as true by the hook, not false
> because maybe_ne.

I don't think relevant modes will have size/precision mismatches
and maybe_ne should work here.  Richard?

> > For vector I think the x86 backend ensures we never get x87 modes as
> > components.  The middle-end will also not allow vector(1) float
> 
> It ensures there are no V*XFmode vectors.  But whether say V*SFmode vectors
> will result in vector moves which move everything safely or scalar which
> would use x87 and be unsafe is unsure.

The experiment above shows it "works".  I'm not sure to what extent
the x86 backend makes sure that SFmode moves never end up on the FP stack
on x86-64 - this is why it's up to the target hook to say what's safe
and what's not.

Maybe the hook documentation needs clarifying with RTL-specific
wording I am not aware of - it basically says whether a move through MODE
preserves the bit pattern (so mem <- reg, reg <- mem but also reg <- reg).

Richard.

> > with SFmode like it allows vector(1) int with SImode.
> > 
> > That said, the i386 implementation needs to handle XCmode, will
> > adjust.
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] RISC-V: Add configure check for B extention support

2024-07-29 Thread Kito Cheng
LGTM.  Although I said no binutils check was needed for zacas and zabha,
B is a different situation, since GCC will add it if zba, zbb and zbs
are all present.



On Thu, Jul 25, 2024 at 7:51 AM Edwin Lu  wrote:
>
> Binutils 2.42 and earlier don't recognize the B extension in march
> strings, even though they support zba_zbb_zbs.  Add a configure check
> and drop the B from the march string if the assembler doesn't accept it.
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc (riscv_subset_list::to_string):
> Skip b in march string
> * config.in: Regenerate.
> * configure: Regenerate.
> * configure.ac: Add B assembler check
>
> Signed-off-by: Edwin Lu 
> ---
>  gcc/common/config/riscv/riscv-common.cc |  8 +++
>  gcc/config.in   |  6 +
>  gcc/configure   | 31 +
>  gcc/configure.ac|  5 
>  4 files changed, 50 insertions(+)
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
> index 682826c0e34..200a57e1bc8 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -857,6 +857,7 @@ riscv_subset_list::to_string (bool version_p) const
>bool skip_zaamo_zalrsc = false;
>bool skip_zabha = false;
>bool skip_zicsr = false;
> +  bool skip_b = false;
>bool i2p0 = false;
>
>/* For RISC-V ISA version 2.2 or earlier version, zicsr and zifencei is
> @@ -891,6 +892,10 @@ riscv_subset_list::to_string (bool version_p) const
>/* Skip since binutils 2.42 and earlier don't recognize zabha.  */
>skip_zabha = true;
>  #endif
> +#ifndef HAVE_AS_MARCH_B
> +  /* Skip since binutils 2.42 and earlier don't recognize b.  */
> +  skip_b = true;
> +#endif
>
>for (subset = m_head; subset != NULL; subset = subset->next)
>  {
> @@ -911,6 +916,9 @@ riscv_subset_list::to_string (bool version_p) const
>if (skip_zabha && subset->name == "zabha")
> continue;
>
> +  if (skip_b && subset->name == "b")
> +   continue;
> +
>/* For !version_p, we only separate extension with underline for
>  multi-letter extension.  */
>if (!first &&
> diff --git a/gcc/config.in b/gcc/config.in
> index bc819005bd6..96e829b9c93 100644
> --- a/gcc/config.in
> +++ b/gcc/config.in
> @@ -629,6 +629,12 @@
>  #endif
>
>
> +/* Define if the assembler understands -march=rv*_b. */
> +#ifndef USED_FOR_TARGET
> +#undef HAVE_AS_MARCH_B
> +#endif
> +
> +
>  /* Define if the assembler understands -march=rv*_zaamo_zalrsc. */
>  #ifndef USED_FOR_TARGET
>  #undef HAVE_AS_MARCH_ZAAMO_ZALRSC
> diff --git a/gcc/configure b/gcc/configure
> index 01acca7fb5c..c5725c4cd44 100755
> --- a/gcc/configure
> +++ b/gcc/configure
> @@ -30913,6 +30913,37 @@ if test $gcc_cv_as_riscv_march_zabha = yes; then
>
>  $as_echo "#define HAVE_AS_MARCH_ZABHA 1" >>confdefs.h
>
> +fi
> +
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for -march=rv32i_b support" >&5
> +$as_echo_n "checking assembler for -march=rv32i_b support... " >&6; }
> +if ${gcc_cv_as_riscv_march_b+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  gcc_cv_as_riscv_march_b=no
> +  if test x$gcc_cv_as != x; then
> +$as_echo '' > conftest.s
> +if { ac_try='$gcc_cv_as $gcc_cv_as_flags -march=rv32i_b -o conftest.o conftest.s >&5'
> +  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
> +  (eval $ac_try) 2>&5
> +  ac_status=$?
> +  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
> +  test $ac_status = 0; }; }
> +then
> +   gcc_cv_as_riscv_march_b=yes
> +else
> +  echo "configure: failed program was" >&5
> +  cat conftest.s >&5
> +fi
> +rm -f conftest.o conftest.s
> +  fi
> +fi
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_riscv_march_b" >&5
> +$as_echo "$gcc_cv_as_riscv_march_b" >&6; }
> +if test $gcc_cv_as_riscv_march_b = yes; then
> +
> +$as_echo "#define HAVE_AS_MARCH_B 1" >>confdefs.h
> +
>  fi
>
>  ;;
> diff --git a/gcc/configure.ac b/gcc/configure.ac
> index 3f20c107b6a..93d9236ff36 100644
> --- a/gcc/configure.ac
> +++ b/gcc/configure.ac
> @@ -5466,6 +5466,11 @@ configured with --enable-newlib-nano-formatted-io.])
>[-march=rv32i_zabha],,,
>[AC_DEFINE(HAVE_AS_MARCH_ZABHA, 1,
>  [Define if the assembler understands -march=rv*_zabha.])])
> +gcc_GAS_CHECK_FEATURE([-march=rv32i_b support],
> +  gcc_cv_as_riscv_march_b,
> +  [-march=rv32i_b],,,
> +  [AC_DEFINE(HAVE_AS_MARCH_B, 1,
> +[Define if the assembler understands -march=rv*_b.])])
>  ;;
>  loongarch*-*-*)
>  gcc_GAS_CHECK_FEATURE([.dtprelword support],
> --
> 2.34.1
>


Re: [RFH PATCH] c++: Implement C++26 P2963R3 - Ordering of constraints involving fold expressions [PR115746]

2024-07-29 Thread Jakub Jelinek
On Fri, Jul 26, 2024 at 06:00:12PM -0400, Patrick Palka wrote:
> On Fri, 26 Jul 2024, Jakub Jelinek wrote:
> 
> > On Fri, Jul 26, 2024 at 04:42:36PM -0400, Patrick Palka wrote:
> > > > // P2963R3 - Ordering of constraints involving fold expressions
> > > > // { dg-do compile { target c++20 } }
> > > > 
> > > > template  concept C = (__is_same (T, int) && ...);
> > > > template 
> > > > struct S {
> > > >   template  requires (C)
> > > >   static constexpr bool foo () { return true; }
> > > > };
> > > > 
> > > > static_assert (S::foo  ());
> > > > 
> > > > somehow the template parameter mapping needs to be remembered even for the
> > > > fold expanded constraint, right now the patch will see the pack is T,
> > > > which is level 1 index 0, but args aren't arguments of the C concept,
> > > > but of the foo function template.
> > > > One can also use requires (C) etc., no?
> > > 
> > > It seems the problem is FOLD_EXPR_PACKS is currently set to the
> > > parameter packs used inside the non-normalized constraints, but I think
> > > what we really need are the packs used in the normalized constraints,
> > > specifically the packs used in the target of each parameter mapping of
> > > each atomic constraint?
> > 
> > But in that case there might be no packs at all.
> > 
> > template  C = true;
> > template  requires (C && ...)
> > constexpr bool foo () { return true; }
> > 
> > If normalized C is just true, it doesn't use any packs.
> > But the [temp.constr.fold] wording assumes it is a pack expansion and that
> > there is at least one pack expansion parameter, otherwise N wouldn't be
> > defined.
> 
> Hmm yeah, I see what you mean.  That seems to be an edge case that's not
> fully accounted for by the wording.
> 
> One thing that's unclear to me in that wording is what the pack
> expansion parameters of a fold expanded constraint are.
> 
> In
> 
>   template concept C = (__is_same (T, int) && ...);
>   template
>   void f() requires C;
> 
> is the pack expansion parameter T or V?  In
> 
>   template concept C = (__is_same (T, int) && ...);
>   template
>   void g() requires C;
> 
> it must be T.  So I guess in both cases it must be T.  But then I reckon
> when [temp.constr.fold] mentions "pack expansion parameter(s)" what it
> really means is "target of each pack expansion parameter within the
> parameter mapping"...

So, shall we file a https://github.com/cplusplus/CWG/ issue about this?
The question is whether the packs [temp.constr.fold] talks about are only the
normalized ones (in which case, what happens if there are no packs?), or all
packs mentioned (in which case, shouldn't the fold expanded constraints also
carry template parameter mappings, as the atomic constraints do, for the
unexpanded packs only)?

Interesting testcases could be also:
struct A  {};
template  C = true;
template  D = __is_same (T, int);
template  requires ((C && D) && ...)
constexpr bool foo (A, A) { return true; }
static_assert (foo (A, A));
// Is this valid because only V unexpanded pack from the normalized
// constraint is considered, or invalid because there are 2 packs
// with different lengths?

Anyway, I'm afraid that on the implementation side ARGUMENT_PACK_SELECT
barely helped.  The problem, e.g. on the fold-constr7.C testcase, is that
the ARGUMENT_PACK_SELECT is optimized away before it can be used.
tsubst_parameter_mapping (where I could remove the
  if (cxx_dialect >= cxx26 && ARGUMENT_PACK_P (arg))
hack without any behavior change) just tsubsts it into int type.
With the hack removed, it will go through
  if (ARGUMENT_PACK_P (arg))
    new_arg = tsubst_argument_pack (arg, args, complain, in_decl);
but that still sets new_arg to the int INTEGER_TYPE, whereas if a pack is used
both in some nested pack expansion and outside of it, we'd need to arrange for
the ARGUMENT_PACK_SELECT to be reconstructed in what tsubst_parameter_mapping
sets up.

Jakub



Re: [PATCH v3] RISC-V: Implement __init_riscv_feature_bits, __riscv_feature_bits, and __riscv_vendor_feature_bits

2024-07-29 Thread Kito Cheng
> > This API is intended to provide the capability to query minimal common 
> > available extensions on the system.
> >
> > Proposal in riscv-c-api-doc: 
> > https://github.com/riscv-non-isa/riscv-c-api-doc/pull/74
>
> That's not merged, but I'm not sure what the rules are on stability for
> the C API doc.

The general rule is to wait until consensus is reached between the GNU and
LLVM communities.  As you may know, we (SiFive folks) are still discussing
this with Philip Reames, so that's why it isn't merged yet.

> > +static void __init_riscv_features_bits_linux ()
> > +{
> > +  struct riscv_hwprobe hwprobes[] = {
> > +    {RISCV_HWPROBE_KEY_BASE_BEHAVIOR, 0},
> > +    {RISCV_HWPROBE_KEY_IMA_EXT_0, 0},
> > +    {RISCV_HWPROBE_KEY_MVENDORID, 0},
> > +  };
> > +
> > +  long rv = syscall_5_args (__NR_riscv_hwprobe, (long)&hwprobes,
> > +                            sizeof (hwprobes) / sizeof (hwprobes[0]), 0,
> > +                            0, 0);
>
> We were talking about this on the patchwork call, but went on to
> something else.  I was still kind of curious as to how this worked, and
> it's because this is just calling the syscall directly.  I think that's
> OK, as we're not resolving the hwprobe libc function.  It means we lose
> the caching from the VDSO, but we're caching again here so maybe that
> doesn't really matter -- we're just caching twice, but it's not like
> the performance is going to be worse than Arm/Intel (just a bit clunky).
>
> We did come back to it in the patchwork call, though, and were a bit
> worried about those symbol lookups.  So the conclusion was to put
> together a test to make sure we can actually look up these symbols from
> IFUNCs.

This function may also be used by __builtin_cpu_init, so IFUNC's parameter
is not available for that situation.

>
> > +
> > +  if (rv)
> > +    return;
>
> Don't we need to also zero out the extension list when the syscalls
> fails?

We don't really need that, since global variables are zero-initialized
by default :)

The following zero-out logic is only used for the local variable copy.

> > +void __init_riscv_feature_bits ()
> > +{
> > +  if (__init)
> > +    return;
> > +
> > +#ifdef __linux
> > +  __init_riscv_features_bits_linux ();
>
> Just thinking a bit here: if we have an ABI where
> __init_riscv_feature_bits() takes an argument that's either 0 (ie, "do
> the syscall") or the pre-resolved VDSO function then we can avoid going
> into the kernel

Yeah, that sounds like a reasonable approach; we could call that a
platform-specific argument.


Re: [RFC/RFA][PATCH 0/2] SVE intrinsics: Add strength reduction for division by constant.

2024-07-29 Thread Jennifer Schmitz
On 17 Jul 2024, at 09:29, Richard Sandiford  wrote:
> 
> 
> Jennifer Schmitz  writes:
>> This patch series is part of an ongoing effort to replace the SVE intrinsic 
>> svdiv
>> by lower-strength instructions for division by constant. To that end, we
>> implemented svdiv_impl::fold to perform the following transformation in 
>> gimple:
>> - Division where all divisors are the same power of 2 --> svasrd
> 
> Sounds good.
> 
>> - Division where all divisors are powers of 2 --> svasr
> 
> I don't think this is correct for negative dividends (which is why
> ASRD exists).  E.g. -1 / 4 is 0 as computed by svdiv (round towards zero),
> but -1 as computed by svasr (round towards -Inf).
You’re right, I dropped the second patch.
> 
>> We chose svdiv_impl::fold as location for the implementation to have the
>> transform applied as early as possible, such that other (existing or future)
>> gimple optimizations can be applied on the result.
>> Currently, the transform is only applied for signed integers, because
>> there is no unsigned svasrd or svasr. The transform has not (yet)
>> been implemented for svdivr.
> 
> FWIW, using svlsr for unsigned divisions should be OK.
Thanks for pointing that out, I adapted the patch to transform unsigned 
division by a power of 2 to svlsr.
> 
>> Please also comment/advise on the following:
>> In a next patch, we would like to replace SVE division by constants (other
>> than powers of 2) by multiplies and shifts, similar to scalar division.
>> This is planned to be implemented in the gimple_folder as well. Thoughts?
> 
> I'm a bit uneasy about going that far.  I suppose it comes down to a
> question about what intrinsics are for.  Are they for describing an
> algorithm, or for hand-optimising a specific implementation of the
> algorithm?  IMO it's the latter.
> 
> If people want to write out a calculation in natural arithmetic, it
> would be better to write the algorithm in scalar code and let the
> vectoriser handle it.  That gives the opportunity for many more
> optimisations than just this one.
> 
> Intrinsics are about giving programmers direct, architecture-level
> control over how something is implemented.  I've seen Arm's library
> teams go to great lengths to work out which out of a choice of
> instruction sequences is the best one, even though the sequences in
> question would look functionally equivalent to a smart-enough compiler.
> 
> So part of the work of using intrinsics is to figure out what the best
> sequence is.  And IMO, part of the contract is that the compiler
> shouldn't interfere with the programmer's choices too much.  If the
> compiler makes a change, it must be very confident that it is a win for
> the function as a whole.
> 
> Replacing one division with one shift is fine, as an aid to the programmer.
> It removes the need for (say) templated functions to check for that case
> manually.  Constant folding is fine too, for similar reasons.  In these
> cases, there's not really a cost/benefit choice to be made between
> different expansions.  One choice is objectively better in all
> realistic situations.
> 
> But when it comes to general constants, there are many different choices
> that could be made when deciding which constants should be open-coded
> and which shouldn't.  IMO we should leave the choice to the programmer
> in those cases.  If the compiler gets it wrong, there will be no way
> for the programmer to force the compiler's hand ("no, when I say svdiv,
> I really do mean svdiv").
Makes sense; we will not pursue this further and will keep only the current
optimization.
Best, Jennifer
> 
> Thanks,
> Richard





Re: [PATCH 1/2] SVE intrinsics: Add strength reduction for division by constant.

2024-07-29 Thread Jennifer Schmitz
Dear Richard,
I revised the patch according to your comments and also implemented the 
transform for unsigned division; more comments inline below.
The new patch was bootstrapped and tested again.
Looking forward to your feedback.
Thanks,
Jennifer



0001-SVE-intrinsics-Add-strength-reduction-for-division-b.patch
Description: Binary data

> On 17 Jul 2024, at 09:57, Richard Sandiford  wrote:
> 
> 
> Jennifer Schmitz  writes:
>> This patch folds signed SVE division where all divisor elements are the same
>> power of 2 to svasrd. Tests were added to check 1) whether the transform is
>> applied, i.e. asrd is used, and 2) correctness for all possible input types
>> for svdiv, predication, and a variety of values. As the transform is applied
>> only to signed integers, correctness for predication and values was only
>> tested for svint32_t and svint64_t.
>> Existing svdiv tests were adjusted such that the divisor is no longer a
>> power of 2.
>> 
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz 
>> 
>> gcc/
>> 
>>  * config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl): Implement
>>  fold and expand.
>> 
>> gcc/testsuite/
>> 
>>  * gcc.target/aarch64/sve/div_const_1.c: New test.
>>  * gcc.target/aarch64/sve/div_const_1_run.c: Likewise.
>>  * gcc.target/aarch64/sve/acle/asm/div_s32.c: Adjust expected output.
>>  * gcc.target/aarch64/sve/acle/asm/div_s64.c: Likewise.
>> 
>> From e8ffbab52ad7b9307cbfc9dbca4ef4d20e08804b Mon Sep 17 00:00:00 2001
>> From: Jennifer Schmitz 
>> Date: Tue, 16 Jul 2024 01:59:50 -0700
>> Subject: [PATCH 1/2] SVE intrinsics: Add strength reduction for division by
>> constant.
>> 
>> This patch folds signed SVE division where all divisor elements are the same
>> power of 2 to svasrd. Tests were added to check 1) whether the transform is
>> applied, i.e. asrd is used, and 2) correctness for all possible input types
>> for svdiv, predication, and a variety of values. As the transform is applied
>> only to signed integers, correctness for predication and values was only
>> tested for svint32_t and svint64_t.
>> Existing svdiv tests were adjusted such that the divisor is no longer a
>> power of 2.
>> 
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz 
>> 
>> gcc/
>> 
>>  * config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl): Implement
>>  fold and expand.
>> 
>> gcc/testsuite/
>> 
>>  * gcc.target/aarch64/sve/div_const_1.c: New test.
>>  * gcc.target/aarch64/sve/div_const_1_run.c: Likewise.
>>  * gcc.target/aarch64/sve/acle/asm/div_s32.c: Adjust expected output.
>>  * gcc.target/aarch64/sve/acle/asm/div_s64.c: Likewise.
>> ---
>> .../aarch64/aarch64-sve-builtins-base.cc  | 44 -
>> .../gcc.target/aarch64/sve/acle/asm/div_s32.c | 60 ++--
>> .../gcc.target/aarch64/sve/acle/asm/div_s64.c | 60 ++--
>> .../gcc.target/aarch64/sve/div_const_1.c  | 34 +++
>> .../gcc.target/aarch64/sve/div_const_1_run.c  | 91 +++
>> 5 files changed, 228 insertions(+), 61 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/div_const_1.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/div_const_1_run.c
>> 
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> index aa26370d397..d821cc96588 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> @@ -746,6 +746,48 @@ public:
>>   }
>> };
>> 
>> +class svdiv_impl : public unspec_based_function
>> +{
>> +public:
>> +  CONSTEXPR svdiv_impl ()
>> +    : unspec_based_function (DIV, UDIV, UNSPEC_COND_FDIV) {}
>> +
>> +  gimple *
>> +  fold (gimple_folder &f) const override
>> +  {
>> +    tree divisor = gimple_call_arg (f.call, 2);
>> +    tree divisor_cst = uniform_integer_cst_p (divisor);
>> +
>> +    if (f.type_suffix (0).unsigned_p)
>> +      {
>> +        return NULL;
>> +      }
> 
> We might as well test this first, since it doesn't depend on the
> divisor_cst result.
> 
> Formatting nit: should be no braces for single statements, so:
> 
>    if (f.type_suffix (0).unsigned_p)
>      return NULL;
> 
> Same for the others.
Done.
> 
>> +
>> +    if (!divisor_cst)
>> +      {
>> +        return NULL;
>> +      }
>> +
>> +    if (!integer_pow2p (divisor_cst))
>> +      {
>> +        return NULL;
>> +      }
>> +
>> +    function_instance instance ("svasrd", functions::svasrd, shapes::shift_right_imm, MODE_n, f.type_suffix_ids, GROUP_none, f.pred);
> 
> This line is above the 80 character limit.  Maybe:
> 
>    function_instance instance ("svasrd", functions::svasrd,
>                                shapes::shift_right_imm, MODE_n,
>                                f.type_suffi

Re: [Patch] libgomp.texi: Update 'Device Information Routines' section

2024-07-29 Thread Sandra Loosemore

On 7/29/24 06:12, Tobias Burnus wrote:
I recently stumbled over omp_get_default_device returning -1 (= omp_initial_device)
vs. returning omp_get_num_devices(). Thus, it makes sense to document this properly.
I also updated some wording and made a tiny step towards documenting the missing
functions by adding a title to the commented @menu items.

→ https://gcc.gnu.org/onlinedocs/libgomp/#toc-OpenMP-Runtime-Library-Routines
for the current wording.

Comments or suggestions before I commit it?


Looks OK to me, although I'd suggest s/without device clause/without a 
device clause/g.


-Sandra



Re: [PATCH 4/5] RISC-V: Add support to vector stack-clash protection

2024-07-29 Thread Jeff Law




On 7/29/24 6:18 AM, Raphael Zinsly wrote:

On Fri, Jul 26, 2024 at 6:48 PM Jeff Law  wrote:




On 7/24/24 12:00 PM, Raphael Moreira Zinsly wrote:

Adds basic support to vector stack-clash protection using a loop to do
the probing and stack adjustments.

gcc/ChangeLog:
   * config/riscv/riscv.cc
   (riscv_allocate_and_probe_stack_loop): New function.
   (riscv_v_adjust_scalable_frame): Add stack-clash protection
   support.
   (riscv_allocate_and_probe_stack_space): Move the probe loop
   implementation to riscv_allocate_and_probe_stack_loop.
   * config/riscv/riscv.h: Define RISCV_STACK_CLASH_VECTOR_CFA_REGNUM.

gcc/testsuite/ChangeLog:
   * gcc.target/riscv/stack-check-cfa-3.c: New test.
   * gcc.target/riscv/stack-check-prologue-16.c: New test.
   * gcc.target/riscv/struct_vect_24.c: New test.

So my only worry here is using another scratch register in the prologue
code instead of using one of the preexisting prologue scratch registers.
   Is there a reasonable way to use  PROLOGUE_TEMP or PROLOGUE_TEMP2 here?


These are the preexisting prologue scratch registers: PROLOGUE_TEMP is
t0 and PROLOGUE_TEMP2 is t1.


Otherwise this looks good as well.  So let's get closure on that
question and we can move forward after that.
Right.  And so my question is can we use PROLOGUE_TEMP or PROLOGUE_TEMP2 
rather than defining another temporary for the prologue?


It may not seem all that important, but the more distinct hardware 
registers we use this way, the more likely we are to run into problems 
with -fcall-saved- options.  Right now I suspect both the risc-v 
and aarch64 ports are broken WRT the -fcall-saved- option.  We 
shouldn't make it worse if we can avoid it.


jeff


[r15-2378 Regression] FAIL: gfortran.dg/compiler-directive_2.f -O (test for excess errors) on Linux/x86_64

2024-07-29 Thread haochen.jiang
On Linux/x86_64,

29b1587e7d34667a1fd63071c1e4f5475cd71026 is the first bad commit
commit 29b1587e7d34667a1fd63071c1e4f5475cd71026
Author: Tobias Burnus 
Date:   Mon Jul 29 11:46:57 2024 +0200

OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR115559]

caused

FAIL: gfortran.dg/compiler-directive_2.f   -O   (test for errors, line 8)
FAIL: gfortran.dg/compiler-directive_2.f   -O  (test for excess errors)

with GCC configured with

../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-2378/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/compiler-directive_2.f --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/compiler-directive_2.f --target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


[PATCH] btf: Protect BTF_KIND_INFO against invalid kind

2024-07-29 Thread Will Hawkins
If the user provides a kind value that is wider than 5 bits, the
BTF_TYPE_INFO macro would emit incorrect values for info (by clobbering
the kind flag).

Tested on x86_64-redhat-linux.

include/ChangeLog:

* btf.h (BTF_TYPE_INFO): Protect against user providing invalid
  kind.

Signed-off-by: Will Hawkins 
---

Notes:
 I have a small out-of-tree test but was not sure whether a) it should
 be included and/or b) where it should be included. If you would
 like me to include it, please just let me know where it should go!

 include/btf.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/btf.h b/include/btf.h
index 3f45ffb0b6b..0c3e1a1cf51 100644
--- a/include/btf.h
+++ b/include/btf.h
@@ -82,7 +82,7 @@ struct btf_type
   };
 };
 
-/* The folloing macros access the information encoded in btf_type.info.  */
+/* The following macros access the information encoded in btf_type.info.  */
 /* Type kind. See below.  */
 #define BTF_INFO_KIND(info)(((info) >> 24) & 0x1f)
 /* Number of entries of variable length data following certain type kinds.
@@ -95,7 +95,7 @@ struct btf_type
 
 /* Encoding for struct btf_type.info.  */
 #define BTF_TYPE_INFO(kind, kflag, vlen) \
-  kflag) ? 1 : 0 ) << 31) | ((kind) << 24) | ((vlen) & 0x))
+  kflag) ? 1 : 0 ) << 31) | ((kind & 0x1f) << 24) | ((vlen) & 0x))
 
 #define BTF_KIND_UNKN  0   /* Unknown or invalid.  */
 #define BTF_KIND_INT   1   /* Integer.  */
-- 
2.45.2



Re: [PATCH 4/5] RISC-V: Add support to vector stack-clash protection

2024-07-29 Thread Raphael Zinsly
On Mon, Jul 29, 2024 at 11:20 AM Jeff Law  wrote:
>
>
>
> On 7/29/24 6:18 AM, Raphael Zinsly wrote:
> > On Fri, Jul 26, 2024 at 6:48 PM Jeff Law  wrote:
> >>
> >>
> >>
> >> On 7/24/24 12:00 PM, Raphael Moreira Zinsly wrote:
> >>> Adds basic support to vector stack-clash protection using a loop to do
> >>> the probing and stack adjustments.
> >>>
> >>> gcc/ChangeLog:
> >>>* config/riscv/riscv.cc
> >>>(riscv_allocate_and_probe_stack_loop): New function.
> >>>(riscv_v_adjust_scalable_frame): Add stack-clash protection
> >>>support.
> >>>(riscv_allocate_and_probe_stack_space): Move the probe loop
> >>>implementation to riscv_allocate_and_probe_stack_loop.
> >>>* config/riscv/riscv.h: Define RISCV_STACK_CLASH_VECTOR_CFA_REGNUM.
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>* gcc.target/riscv/stack-check-cfa-3.c: New test.
> >>>* gcc.target/riscv/stack-check-prologue-16.c: New test.
> >>>* gcc.target/riscv/struct_vect_24.c: New test.
> >> So my only worry here is using another scratch register in the prologue
> >> code instead of using one of the preexisting prologue scratch registers.
> >>Is there a reasonable way to use  PROLOGUE_TEMP or PROLOGUE_TEMP2 here?
> >
> > These are the preexisting prologue scratch registers: PROLOGUE_TEMP is
> > t0 and PROLOGUE_TEMP2 is t1.
> >
> >> Otherwise this looks good as well.  So let's get closure on that
> >> question and we can move forward after that.
> Right.  And so my question is can we use PROLOGUE_TEMP or PROLOGUE_TEMP2
> rather than defining another temporary for the prologue?

We are only using these two and we do not need to use another temporary.
Do you mean stop using riscv_force_temporary?
If so, yes, we can change it to riscv_emit_move.

> It may not seem all that important, but the more distinct hardware
> registers we use this way, the more likely we are to run into problems
> with -fcall-saved- options.  Right now I suspect both the risc-v
> and aarch64 ports are broken WRT the -fcall-saved- option.  We
> shouldn't make it worse if we can avoid it.
>
> jeff


Thanks,
-- 
Raphael Moreira Zinsly


[PATCH v1] gcc/: Rename array_type_nelts() => array_type_nelts_minus_one()

2024-07-29 Thread Alejandro Colomar
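The old name was misleading: the function returns the number of elements
minus one (the upper bound of the index domain for a zero-based array), not
the number of elements.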

While at it, also rename some temporary variables that are used with
this function, for consistency.

Link: 
https://inbox.sourceware.org/gcc-patches/9fffd80-dca-2c7e-14b-6c9b509a7...@redhat.com/T/#m2f661c67c8f7b2c405c8c7fc3152dd85dc729120
Cc: Gabriel Ravier 
Cc: Martin Uecker 
Cc: Joseph Myers 
Cc: Xavier Del Campo Romero 

gcc/ChangeLog:

* tree.cc (array_type_nelts): Rename function ...
(array_type_nelts_minus_one): ... to this name.  The old name
was misleading.
* tree.h: Likewise.
* c/c-decl.cc: Likewise.
* c/c-fold.cc: Likewise.
* config/aarch64/aarch64.cc: Likewise.
* config/i386/i386.cc: Likewise.
* cp/decl.cc: Likewise.
* cp/init.cc: Likewise.
* cp/lambda.cc: Likewise.
* cp/tree.cc: Likewise.
* expr.cc: Likewise.
* fortran/trans-array.cc: Likewise.
* fortran/trans-openmp.cc: Likewise.
* rust/backend/rust-tree.cc: Likewise.

Suggested-by: Richard Biener 
Signed-off-by: Alejandro Colomar 
---
Range-diff against v0:
-:  --- > 1:  82efbc3c540 gcc/: Rename array_type_nelts() => array_type_nelts_minus_one()

 gcc/c/c-decl.cc   | 10 +-
 gcc/c/c-fold.cc   |  7 ---
 gcc/config/aarch64/aarch64.cc |  2 +-
 gcc/config/i386/i386.cc   |  2 +-
 gcc/cp/decl.cc|  2 +-
 gcc/cp/init.cc|  8 
 gcc/cp/lambda.cc  |  3 ++-
 gcc/cp/tree.cc|  2 +-
 gcc/expr.cc   |  8 
 gcc/fortran/trans-array.cc|  2 +-
 gcc/fortran/trans-openmp.cc   |  4 ++--
 gcc/rust/backend/rust-tree.cc |  2 +-
 gcc/tree.cc   |  4 ++--
 gcc/tree.h|  2 +-
 14 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 97f1d346835..4dced430d1f 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -5309,7 +5309,7 @@ one_element_array_type_p (const_tree type)
 {
   if (TREE_CODE (type) != ARRAY_TYPE)
 return false;
-  return integer_zerop (array_type_nelts (type));
+  return integer_zerop (array_type_nelts_minus_one (type));
 }
 
 /* Determine whether TYPE is a zero-length array type "[0]".  */
@@ -6257,15 +6257,15 @@ get_parm_array_spec (const struct c_parm *parm, tree attrs)
  for (tree type = parm->specs->type; TREE_CODE (type) == ARRAY_TYPE;
   type = TREE_TYPE (type))
{
- tree nelts = array_type_nelts (type);
- if (error_operand_p (nelts))
+ tree nelts_minus_one = array_type_nelts_minus_one (type);
+ if (error_operand_p (nelts_minus_one))
return attrs;
- if (TREE_CODE (nelts) != INTEGER_CST)
+ if (TREE_CODE (nelts_minus_one) != INTEGER_CST)
{
  /* Each variable VLA bound is represented by the dollar
 sign.  */
  spec += "$";
- tpbnds = tree_cons (NULL_TREE, nelts, tpbnds);
+ tpbnds = tree_cons (NULL_TREE, nelts_minus_one, tpbnds);
}
}
  tpbnds = nreverse (tpbnds);
diff --git a/gcc/c/c-fold.cc b/gcc/c/c-fold.cc
index 57b67c74bd8..9ea174f79c4 100644
--- a/gcc/c/c-fold.cc
+++ b/gcc/c/c-fold.cc
@@ -73,11 +73,12 @@ c_fold_array_ref (tree type, tree ary, tree index)
   unsigned elem_nchars = (TYPE_PRECISION (elem_type)
  / TYPE_PRECISION (char_type_node));
   unsigned len = (unsigned) TREE_STRING_LENGTH (ary) / elem_nchars;
-  tree nelts = array_type_nelts (TREE_TYPE (ary));
+  tree nelts_minus_one = array_type_nelts_minus_one (TREE_TYPE (ary));
   bool dummy1 = true, dummy2 = true;
-  nelts = c_fully_fold_internal (nelts, true, &dummy1, &dummy2, false, false);
+  nelts_minus_one = c_fully_fold_internal (nelts_minus_one, true, &dummy1,
+  &dummy2, false, false);
   unsigned HOST_WIDE_INT i = tree_to_uhwi (index);
-  if (!tree_int_cst_le (index, nelts)
+  if (!tree_int_cst_le (index, nelts_minus_one)
   || i >= len
   || i + elem_nchars > len)
 return NULL_TREE;
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 0d41a193ec1..eaef2a0e985 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -1082,7 +1082,7 @@ pure_scalable_type_info::analyze_array (const_tree type)
 
   /* An array of unknown, flexible or variable length will be passed and
  returned by reference whatever we do.  */
-  tree nelts_minus_one = array_type_nelts (type);
+  tree nelts_minus_one = array_type_nelts_minus_one (type);
   if (!tree_fits_uhwi_p (nelts_minus_one))
 return DOESNT_MATTER;
 
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 9c2ebe74fc9..298d8c9131a 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -24519,7 +24519,7 @@ ix86_canonical_va_list_type (tree type)
r

Re: [PATCH] rs6000, add comment to VEC_IC definition

2024-07-29 Thread Carl Love

Kewen:

On 7/29/24 3:21 AM, Kewen.Lin wrote:

index 0d3e0a24e11..75d95ccfb47 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -26,7 +26,8 @@
  ;; Vector int modes
  (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])

-;; Vector int modes for comparison, shift and rotation
+;; Vector int modes for comparison, shift and rotation.  ISA 3.1 adds the V1TI mode
+;; for the int128 type.

Maybe s/int128/vector int128/, OK with/without this nit tweaked, thanks!

OK, made the change and committed the patch.

Thanks.

   Carl


Re: [PATCH v2] gimple ssa: Teach switch conversion to optimize powers of 2 switches

2024-07-29 Thread Filip Kastl
Hi Richard,

> > Sorry, I'm not sure if I understand.  Are you suggesting something like 
> > this?
> > 
> > if (idom(default bb) == cond bb)
> > {
> >   if (exists a path from default bb to final bb)
> >   {
> >     idom(final bb) = cond bb;
> >   }
> >   else
> >   {
> >     idom(final bb) = switch bb;
> >   }
> > }
> > else
> > {
> >   // idom(final bb) doesn't change
> > }

Sidenote: I've just noticed that this code wouldn't work, since the original
idom of final_bb may be some block outside of the switch.  In that case, the idom
of final_bb should remain the same after the transformation regardless of whether
idom(default bb) == cond bb.  So the code would look like this:
if (original idom(final bb) == switch bb and idom(default bb) == cond bb)
{
  if (exists a path from default bb to final bb)
  {
    idom(final bb) = cond bb;
  }
  else
  {
    idom(final bb) = switch bb;
  }
}
else
{
  // idom(final bb) doesn't change
}

> > 
> > If so, how do I implement testing existence of a path from default bb to 
> > final
> > bb?  I guess I could do BFS but that seems like a pretty complicated 
> > solution.
> > > 
> > > That said, even above if there's a merge of the default BB and final BB
> > > downstream in the CFG, inserting cond BB requires adjustment of the
> > > immediate dominator of that merge block and you are missing that?
> > 
> > I think this isn't a problem because I do
> > 
> > redirect_immediate_dominators (CDI_DOMINATORS, swtch_bb, cond_bb);
> 
> Hmm, I'm probably just confused.  So the problem is that
> redirect_immediate_dominators makes the dominator of final_bb incorrect
> (but also all case_bb immediate dominator?!)?

Yes, the problem is what the idom of final_bb should be after the
transformation.  However, redirect_immediate_dominators doesn't *make* the idom
of final_bb incorrect.  It may have been already incorrect before the call (the
call may also possibly make the idom correct btw).

This has probably already been clear to you.  I'm just making sure we're on the
same page.

> 
> Ah, I see you fix those up.  Then 2.) is left - the final block.  Iff
> the final block needs adjustment you know there was a path from
> the default case to it which means one of its predecessors is dominated
> by the default case?  In that case, adjust the dominator to cond_bb,
> otherwise leave it at switch_bb?

Yes, what I'm saying is that if I want to know idom of final_bb after the
transformation, I have to know if there is a path between default_bb and
final_bb.  It is because of these two cases:

1.

cond BB -+
   | |
switch BB ---+   |
/  |  \   \  |
case BBsdefault BB
\  |  /   /
final BB <---+  <- this may be an edge or a path
   |

2.

cond BB -+
   | |
switch BB ---+   |
/  |  \   \  |
case BBsdefault BB
\  |  /   /
final BB / <- this may be an edge or a path
   |/

In the first case, there is a path between default_bb and final_bb and in the
second there isn't.  Otherwise the cases are the same.  In the first case idom
of final_bb should be cond_bb.  In the second case idom of final_bb should be
switch_bb.  An algorithm deciding what the idom of final_bb should be therefore
has to know whether there is a path between default_bb and final_bb.

You said that if there is a path between default_bb and final_bb, one of the
predecessors of final_bb is dominated by default_bb.  That would indeed give a
nice way to check existence of a path between default_bb and final_bb.  But
does it hold?  Consider this situation:

   | |
cond BB --+
   | ||
switch BB +   |
/  |  \  | \  |
case BBs |default BB
\  |  /  |/
final BB <- pred BB -+
   |

Here no predecessor of final_bb is dominated by default_bb, yet there does
exist a path from default_bb to final_bb.  Or is this CFG impossible for some
reason?
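One way to sanity-check the counterexample is to compute dominators on a concrete CFG. The sketch below uses the classic iterative set-intersection algorithm on bitmasks, so it is limited to 32 blocks. It encodes one possible reading of the diagram above, and that reading is an assumption. In it, pred BB has two predecessors, switch BB and default BB, so it is not dominated by default BB. Yet default BB still reaches final BB through it. The block numbering and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy CFG: succs[b] lists the successors of block b; block 0 is the entry.
using Cfg = std::vector<std::vector<int>>;

// Iterative dominator computation.  dom[b] is a bitmask of the blocks
// that dominate b, so the sketch is limited to at most 32 blocks.
std::vector<std::uint32_t>
compute_dominators (const Cfg &succs)
{
  const int n = static_cast<int> (succs.size ());
  assert (n > 0 && n <= 32);

  // Derive predecessor lists from the successor lists.
  std::vector<std::vector<int>> preds (n);
  for (int b = 0; b < n; ++b)
    for (int s : succs[b])
      preds[s].push_back (b);

  const std::uint32_t all = (n == 32) ? ~0u : ((1u << n) - 1);
  std::vector<std::uint32_t> dom (n, all);
  dom[0] = 1u;  // The entry block dominates only itself.

  // Dom(b) = {b} | intersection of Dom(p) over all predecessors p,
  // repeated until nothing changes.
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (int b = 1; b < n; ++b)
        {
          std::uint32_t meet = all;
          for (int p : preds[b])
            meet &= dom[p];
          std::uint32_t updated = meet | (1u << b);
          if (updated != dom[b])
            {
              dom[b] = updated;
              changed = true;
            }
        }
    }
  return dom;
}

// True if block A dominates block B.
bool
dominates (const std::vector<std::uint32_t> &dom, int a, int b)
{
  return (dom[b] >> a) & 1u;
}
```

Assume these edges: cond -> switch, cond -> default, switch -> {case, default, pred}, case -> final, default -> pred, pred -> final. With them, neither predecessor of final_bb is dominated by default_bb, and the idom of final_bb comes out as cond_bb.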

Btw, to further check that we're on the same page: right now we're only trying
to figure out if there is a way to update the idom of final_bb after the
transformation without using iterate_fix_dominators, right?  Does the rest of
my dominator fixing code make sense / look OK?

Cheers,
Filip Kastl


Ping * 2: [PATCH v2] Provide more contexts for -Warray-bounds warning messages

2024-07-29 Thread Qing Zhao
The 2nd ping for the following patch:

https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657150.html

thanks.

Qing

> On Jul 22, 2024, at 09:01, Qing Zhao  wrote:
> 
> Hi, Richard,
> 
> Could you please take a look at the patch and let me know any comments you
> have (especially on the middle-end part)?
> 
> David, let me know if you have further comments and suggestions.
> 
> Thanks a lot.
> 
> Qing
> 
>> On Jul 12, 2024, at 10:03, Qing Zhao  wrote:
>> 
>> due to code duplication from jump threading [PR109071]
>> Control this with a new option -fdiagnostic-explain-harder.
>> 
>> Compared to V1, the major differences are (addressing David's comments):
>> 
>> 1. Change the name of the option from:
>> 
>> -fdiagnostic-try-to-explain-harder 
>> To
>> -fdiagnostic-explain-harder 
>> 
>> 2. Sync the commit comments with the real output of the compilation message.
>> 
>> 3. Add one more event at the end of the path to repeat the out-of-bounds
>>  issue.
>> 
>> 4. Fix the unnecessary changes in Makefile.in.
>> 
>> 5. Add new copy_history_diagnostic_path.[cc|h] to implement a new
>> class copy_history_diagnostic_path : public diagnostic_path
>> 
>> for copy_history_t. 
>> 
>> 6. Only build the rich location and populate the path when warning_at
>> is called.
>> 
>> There are two comments from David that I didn't address in this version:
>> 
>> 1. Run make regenerate-opt-urls.
>> Will do this in a later version.
>> 
>> 2. Add a ⚠️ emoji for the last event.
>> I didn't add this yet since I think the current message is clear enough.
>> It might not be worth the effort to add this emoji (it's not that
>> straightforward to add).
>> 
>> With this new version, GCC emits the following message:
>> 
>> $gcc -O2 -Wall -fdiagnostics-explain-harder -c -o t.o t.c
>> t.c: In function ‘sparx5_set’:
>> t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ 
>> [-Warray-bounds=]
>>  12 |   int *val = &sg->vals[index];
>> |   ^~~
>> ‘sparx5_set’: events 1-2
>>   4 |   if (*index >= 4)
>> |  ^
>> |  |
>> |  (1) when the condition is evaluated to true
>> ..
>>  12 |   int *val = &sg->vals[index];
>> |   ~~~
>> |   |
>> |   (2) out of array bounds here
>> t.c:8:18: note: while referencing ‘vals’
>>   8 | struct nums {int vals[4];};
>> |  ^~~~
>> 
>> Bootstrapped and regression tested on both aarch64 and x86.  No issues.
>> 
>> Let me know any further comments and suggestions.
>> 
>> thanks.
>> 
>> Qing
>> 
>> ==
>> $ cat t.c
>> extern void warn(void);
>> static inline void assign(int val, int *regs, int *index)
>> {
>> if (*index >= 4)
>>   warn();
>> *regs = val;
>> }
>> struct nums {int vals[4];};
>> 
>> void sparx5_set (int *ptr, struct nums *sg, int index)
>> {
>> int *val = &sg->vals[index];
>> 
>> assign(0,ptr, &index);
>> assign(*val, ptr, &index);
>> }
>> 
>> $ gcc -Wall -O2  -c -o t.o t.c
>> t.c: In function ‘sparx5_set’:
>> t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ 
>> [-Warray-bounds=]
>>  12 |   int *val = &sg->vals[index];
>> |   ^~~
>> t.c:8:18: note: while referencing ‘vals’
>>   8 | struct nums {int vals[4];};
>> |  ^~~~
>> 
>> In the above, although the warning is correct in theory, the warning message
>> itself is confusing to the end user, since it contains information that
>> cannot be connected to the source code directly.
>> 
>> It would be a nice improvement to add more information to the warning
>> message to report where such an index value comes from.
>> 
>> In order to achieve this, we add a new data structure copy_history to record
>> the condition and the transformation that triggered the code duplication.
>> Whenever code is duplicated by specific transformations, such as jump
>> threading, loop switching, etc., a copy_history structure is created and
>> attached to the duplicated gimple statement.
>> 
>> During array out-of-bounds checking or other warning checking, the
>> copy_history attached to the gimple statement is used to form a sequence
>> of diagnostic events that are added to the corresponding rich location and
>> used to report the warning message.
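The following hypothetical C++ model makes the mechanism concrete. It shows a chain of copy records and how that chain becomes numbered events like the ones in the output above. The names `copy_history_model` and `history_to_events` are invented for illustration. The patch's real `copy_history_t` and `copy_history_diagnostic_path` are not shown in this message and will differ.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical model of one code duplication: the condition that became
// known on the copied path, the transformation that made the copy, and
// the history of the statement the copy was made from.
struct copy_history_model
{
  int condition_line;
  bool condition_value;
  std::string transformation;
  std::shared_ptr<const copy_history_model> prev;
};

// Walk the chain (newest first), then emit numbered events oldest first,
// followed by a final event at the location of the warning itself.
std::vector<std::string>
history_to_events (const std::shared_ptr<const copy_history_model> &h,
                   int warning_line)
{
  std::vector<const copy_history_model *> chain;
  for (const copy_history_model *p = h.get (); p; p = p->prev.get ())
    chain.push_back (p);

  std::vector<std::string> events;
  int n = 1;
  for (auto it = chain.rbegin (); it != chain.rend (); ++it)
    events.push_back ("(" + std::to_string (n++) + ") line "
                      + std::to_string ((*it)->condition_line)
                      + ": when the condition is evaluated to "
                      + ((*it)->condition_value ? "true" : "false")
                      + " [" + (*it)->transformation + "]");
  events.push_back ("(" + std::to_string (n) + ") line "
                    + std::to_string (warning_line)
                    + ": out of array bounds here");
  return events;
}
```

Chaining records through `prev` lets repeated duplications (a copy of a copy) contribute one event each, in the order the transformations happened.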
>> 
>> This behavior is controlled by the new option -fdiagnostic-explain-harder,
>> which is off by default.
>> 
>> With this change, by adding -fdiagnostic-explain-harder,
>> the warning message for the above test case is now:
>> 
>> $ gcc -Wall -O2 -fdiagnostics-explain-harder -c -o t.o t.c
>> t.c: In function ‘sparx5_set’:
>> t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ 
>> [-Warray-bounds=]
>>  12 |   int *val = &sg->vals[index];
>> |   ^~~
>> ‘sparx5_set’: events 1-2
>>   4 |   if (*index >= 4)
>> |  ^
>> |  |
>> |  (1) when the condition is evalua

[PATCH] c++: generic lambda as default template argument [PR88313]

2024-07-29 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this
look OK for trunk and perhaps 14.3?  It should only make a difference
for C++20 code where lambdas are permitted as template arguments.

-- >8 --

Here we're rejecting the generic lambda inside the default template
argument ultimately because auto_is_implicit_function_template_parm_p
doesn't get set during parsing of the lambda's parameter list, due
to the !processing_template_parmlist restriction.  But when parsing a
lambda parameter list we should always set that flag regardless of where
the lambda appears.  This patch makes sure this happens by way of a
local lambda_p flag.
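The change to the condition can be sketched as two boolean functions over a simplified model of the parser state. The `parse_state` struct and its fields are hypothetical stand-ins for the globals tested in the diff below. `in_function_body` approximates `current_function_decl != NULL`, and `in_lambda_class` approximates `LAMBDA_TYPE_P (current_class_type)`. The sketch only illustrates that `processing_template_parmlist` no longer blocks the flag for a lambda's parameter list.

```cpp
#include <cassert>

// Hypothetical snapshot of the parser state consulted by
// cp_parser_parameter_declaration_clause.  Field names follow the
// globals in the patch; the two "in_*" fields are approximations.
struct parse_state
{
  bool processing_specialization;
  bool processing_template_parmlist;
  bool processing_explicit_instantiation;
  bool default_arg_ok_p;
  bool in_function_body;
  bool in_lambda_class;
};

// Condition before the patch.
bool
old_sets_auto_flag (const parse_state &s)
{
  return (!s.processing_specialization
          && !s.processing_template_parmlist
          && !s.processing_explicit_instantiation
          && s.default_arg_ok_p
          && (!s.in_function_body || s.in_lambda_class));
}

// Condition after the patch: a lambda parameter list always qualifies.
bool
new_sets_auto_flag (const parse_state &s, bool lambda_p)
{
  return (lambda_p
          || (!s.processing_specialization
              && !s.processing_template_parmlist
              && !s.processing_explicit_instantiation
              && s.default_arg_ok_p
              && !s.in_function_body));
}
```

Take a lambda parameter list parsed inside a default template argument. The old condition is false there because processing_template_parmlist is set, while the new one is true via lambda_p.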

PR c++/88313

gcc/cp/ChangeLog:

* parser.cc (cp_parser_lambda_declarator_opt): Pass
lambda_p=true to cp_parser_parameter_declaration_clause.
(cp_parser_direct_declarator): Pass lambda_p=false to
cp_parser_parameter_declaration_clause.
(cp_parser_parameter_declaration_clause): Add bool lambda_p
parameter.  Consider lambda_p instead of current_class_type
when setting parser->auto_is_implicit_function_template_parm_p.
Don't consider processing_template_parmlist.
(cp_parser_requirement_parameter_list): Pass lambda_p=false
to cp_parser_parameter_declaration_clause.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-targ6.C: New test.
---
 gcc/cp/parser.cc  | 34 +--
 gcc/testsuite/g++.dg/cpp2a/lambda-targ6.C | 11 
 2 files changed, 31 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ6.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index f79736c17ac..f5336eae74a 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -2600,7 +2600,7 @@ static tree cp_parser_type_id_1
 static void cp_parser_type_specifier_seq
   (cp_parser *, cp_parser_flags, bool, bool, cp_decl_specifier_seq *);
 static tree cp_parser_parameter_declaration_clause
-  (cp_parser *, cp_parser_flags);
+  (cp_parser *, cp_parser_flags, bool);
 static tree cp_parser_parameter_declaration_list
   (cp_parser *, cp_parser_flags, auto_vec *);
 static cp_parameter_declarator *cp_parser_parameter_declaration
@@ -11889,7 +11889,7 @@ cp_parser_lambda_declarator_opt (cp_parser* parser, tree lambda_expr)
   /* Parse parameters.  */
   param_list
= cp_parser_parameter_declaration_clause
-   (parser, CP_PARSER_FLAGS_TYPENAME_OPTIONAL);
+   (parser, CP_PARSER_FLAGS_TYPENAME_OPTIONAL, /*lambda_p=*/true);
 
   /* Default arguments shall not be specified in the
 parameter-declaration-clause of a lambda-declarator.  */
@@ -24097,7 +24097,8 @@ cp_parser_direct_declarator (cp_parser* parser,
 
  /* Parse the parameter-declaration-clause.  */
  params
-   = cp_parser_parameter_declaration_clause (parser, flags);
+   = cp_parser_parameter_declaration_clause (parser, flags,
+ /*lambda_p=*/false);
  const location_t parens_end
= cp_lexer_peek_token (parser->lexer)->location;
 
@@ -25444,13 +25445,17 @@ function_being_declared_is_template_p (cp_parser* parser)
 
The parser flags FLAGS is used to control type-specifier parsing.
 
+   LAMBDA_P is true if this is the parameter-declaration-clause of
+   a lambda-declarator.
+
Returns a representation for the parameter declarations.  A return
value of NULL indicates a parameter-declaration-clause consisting
only of an ellipsis.  */
 
 static tree
 cp_parser_parameter_declaration_clause (cp_parser* parser,
-   cp_parser_flags flags)
+   cp_parser_flags flags,
+   bool lambda_p)
 {
   tree parameters;
   cp_token *token;
@@ -25459,15 +25464,15 @@ cp_parser_parameter_declaration_clause (cp_parser* parser,
   auto cleanup = make_temp_override
 (parser->auto_is_implicit_function_template_parm_p);
 
-  if (!processing_specialization
-  && !processing_template_parmlist
-  && !processing_explicit_instantiation
-  /* default_arg_ok_p tracks whether this is a parameter-clause for an
- actual function or a random abstract declarator.  */
-  && parser->default_arg_ok_p)
-if (!current_function_decl
-   || (current_class_type && LAMBDA_TYPE_P (current_class_type)))
-  parser->auto_is_implicit_function_template_parm_p = true;
+  if (lambda_p
+  || (!processing_specialization
+ && !processing_template_parmlist
+ && !processing_explicit_instantiation
+ /* default_arg_ok_p tracks whether this is a parameter-clause for an
+actual function or a random abstract declarator.  */
+ && parser->default_arg_ok_p
+ && !current_function_decl))
+parser->auto_is_implicit_function_template_parm_p = true;
 
   /* Peek at the next token.  */
   to

Re: [PATCH ver 2] rs6000, Add new overloaded vector shift builtin int128, varients

2024-07-29 Thread Peter Bergner
On 7/29/24 5:21 AM, Kewen.Lin wrote:
> on 2024/7/27 06:37, Carl Love wrote:
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
>> @@ -0,0 +1,358 @@
>> +/* { dg-do run  { target power10_hw } } */
>> +/* { dg-do link { target { ! power10_hw } } } */
>> +/* { dg-require-effective-target power10_ok } */
> 
> As Peter pointed out in another thread, you need an int128 effective target
> check as well, otherwise it will fail with power10 -m32.
> 
> Another nit: power10_hw should already guarantee power10_ok, so power10_ok
> is only required for dg-do link.

I really dislike those *_ok tests.  The power10_ok test doesn't verify that
the options being used to compile the test case enable Power10.  It only
verifies that the assembler you're using is Power10-enabled.  I agree that the
power10_hw test includes the same (useless) assembler check that power10_ok
includes, so power10_ok isn't needed.


Those *_ok tests really should be verifying that the compiler options used to
compile the test case enable the features the test case is attempting to
use.



Maybe the following will work?

+/* { dg-do run  { target power10_hw } } */
+/* { dg-do link { target { ! power10_hw } } } */
+/* { dg-require-effective-target int128 } */
...

Carl, can you try testing the above change on ltcd97-lp7 and run the test
in both 32-bit and 64-bit modes?

Peter



Polish libstdc++ 'dg-final' action 'file-io-diff' (was: [PATCH 4/8] libstdc++: Add file-io-diff to replace @diff@ markup in I/O tests)

2024-07-29 Thread Thomas Schwinge
Hi Jonathan!

On 2024-07-22T17:28:38+0100, Jonathan Wakely  wrote:
> This adds a new dg-final action to compare two files after a test has
> run, [...]

Nice!

> --- a/libstdc++-v3/testsuite/lib/libstdc++.exp
> +++ b/libstdc++-v3/testsuite/lib/libstdc++.exp

> +# Compare output file written by test to expected result.
> +# With two arguments the comparison is done via 'diff arg1 arg2'.
> +# With one argument the comparison is done via 'diff arg1.tst arg1.txt'.
> +proc file-io-diff { args } {
> +set nargs [llength $args]
> +if { $nargs < 1 } {
> + error "too few arguments to file-io-diff"
> +}
> +if { $nargs > 2 } {
> + error "too many arguments to file-io-diff"
> +}
> +if { $nargs == 1 } {
> + set file1 [lindex $args 0]
> + set file2 "${file1}.txt"
> + append file1 ".tst"
> +} else {
> + set file1 [lindex $args 0]
> + set file2 [lindex $args 1]
> +}
> +
> +spawn -noecho diff -u $file1 $file2
> +expect {
> +  -re ".+" {
> + set msg "files differ\n"
> + append msg $expect_out(0,string)
> + fail $msg
> + exp_continue
> +  }
> +}
> +return
> +}
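For readers less familiar with Tcl, the argument handling of the proc above corresponds roughly to the following C++ sketch. It is a transliteration for illustration only. The name `file_io_diff_operands` is hypothetical, and the actual comparison is still done by spawning `diff -u`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Mirrors the argument handling of the Tcl proc: one argument NAME means
// "diff NAME.tst NAME.txt"; two arguments are used as given.
std::pair<std::string, std::string>
file_io_diff_operands (const std::vector<std::string> &args)
{
  if (args.empty ())
    throw std::invalid_argument ("too few arguments to file-io-diff");
  if (args.size () > 2)
    throw std::invalid_argument ("too many arguments to file-io-diff");
  if (args.size () == 1)
    return {args[0] + ".tst", args[0] + ".txt"};
  return {args[0], args[1]};
}
```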

Via "deficient" GCN and nvptx target testing of my WIP C++ enablement
trees, I ran into a minor nuisance here; OK to push the attached
"Polish libstdc++ 'dg-final' action 'file-io-diff'"?

On a proper target, powerpc64le GNU/Linux, with:

$ echo yo >> libstdc++-v3/testsuite/data/istream_extractor_other-2.tst

... injected just for demonstration purposes (triggering for
'27_io/basic_istream/extractors_other/char/2.cc' one instance of
'FAIL: files differ'), my patch changes 'libstdc++.sum' as follows:

--- build-gcc/powerpc64le-unknown-linux-gnu/libstdc++-v3/testsuite/libstdc++.sum  2024-07-29 17:11:42.333879749 +0200
+++ build-gcc/powerpc64le-unknown-linux-gnu/libstdc++-v3/testsuite/libstdc++.sum  2024-07-29 15:46:27.528214632 +0200
@@ -1,2 +1,2 @@
-Test run by tschwinge on Mon Jul 29 16:53:34 2024
+Test run by tschwinge on Mon Jul 29 15:28:20 2024
 Native configuration is powerpc64le-unknown-linux-gnu
@@ -11075,2 +11075,3 @@
 PASS: 27_io/basic_filebuf/close/12790-1.cc  -std=gnu++17 execution test
+PASS: 27_io/basic_filebuf/close/char/1.cc  -std=gnu++17  file-io-diff filebuf_members-1
 PASS: 27_io/basic_filebuf/close/char/1.cc  -std=gnu++17 (test for excess errors)
@@ -11722,2 +11723,4 @@
 PASS: 27_io/basic_istream/extractors_other/char/1.cc  -std=gnu++17 execution test
+PASS: 27_io/basic_istream/extractors_other/char/2.cc  -std=gnu++17  file-io-diff istream_extractor_other-1
+FAIL: 27_io/basic_istream/extractors_other/char/2.cc  -std=gnu++17  file-io-diff istream_extractor_other-2
 PASS: 27_io/basic_istream/extractors_other/char/2.cc  -std=gnu++17 (test for excess errors)
@@ -11748,2 +11751,4 @@
 PASS: 27_io/basic_istream/extractors_other/wchar_t/1.cc  -std=gnu++17 execution test
+PASS: 27_io/basic_istream/extractors_other/wchar_t/2.cc  -std=gnu++17  file-io-diff wistream_extractor_other-1
+PASS: 27_io/basic_istream/extractors_other/wchar_t/2.cc  -std=gnu++17  file-io-diff wistream_extractor_other-2
 PASS: 27_io/basic_istream/extractors_other/wchar_t/2.cc  -std=gnu++17 (test for excess errors)
@@ -11772,2 +11777,3 @@
 PASS: 27_io/basic_istream/get/char/1.cc  -std=gnu++17 execution test
+PASS: 27_io/basic_istream/get/char/2.cc  -std=gnu++17  file-io-diff istream_unformatted-1
 PASS: 27_io/basic_istream/get/char/2.cc  -std=gnu++17 (test for excess errors)
@@ -11779,2 +11785,3 @@
 PASS: 27_io/basic_istream/get/wchar_t/1.cc  -std=gnu++17 execution test
+PASS: 27_io/basic_istream/get/wchar_t/2.cc  -std=gnu++17  file-io-diff istream_unformatted-1
 PASS: 27_io/basic_istream/get/wchar_t/2.cc  -std=gnu++17 (test for excess errors)
@@ -11812,2 +11819,3 @@
 PASS: 27_io/basic_istream/ignore/char/2.cc  -std=gnu++17 execution test
+PASS: 27_io/basic_istream/ignore/char/3.cc  -std=gnu++17  file-io-diff istream_unformatted-1
 PASS: 27_io/basic_istream/ignore/char/3.cc  -std=gnu++17 (test for excess errors)
@@ -11828,2 +11836,3 @@
 PASS: 27_io/basic_istream/ignore/wchar_t/2.cc  -std=gnu++17 execution test
+PASS: 27_io/basic_istream/ignore/wchar_t/3.cc  -std=gnu++17  file-io-diff istream_unformatted-1
 PASS: 27_io/basic_istream/ignore/wchar_t/3.cc  -std=gnu++17 (test for excess errors)
@@ -11844,2 +11853,3 @@
 PASS: 27_io/basic_istream/peek/char/12296.cc  -std=gnu++17 execution test
+PASS: 27_io/basic_istream/peek/char/6414.cc  -std=gnu++17  file-io-diff istream_seeks-1
 PASS: 27_io/basic_istream/peek/char/6414.cc  -std=gnu++17 (test for excess errors)
@@ -11850,2 +11860,3 @@
 PASS: 27_io/basic_istream/peek/wchar_t/12296.cc  -std=gnu++17 execution test
+PASS: 27_io/basic_istream/peek/wchar_t/6414.cc  -std=gnu++17  file-io-diff wistream_seeks-1
 PASS: 27_io/basic_istream/peek/wchar_t

Re: [PATCH v1 1/2] PR116080: Fix tail call dejagnu checks

2024-07-29 Thread Andi Kleen
> ..., that means that a number of the new test cases are UNSUPPORTED, for
> example, x86_64 GNU/Linux:
> 
> +UNSUPPORTED: c-c++-common/musttail1.c  -Wc++-compat 
> +UNSUPPORTED: c-c++-common/musttail12.c  -Wc++-compat 
> +PASS: c-c++-common/musttail13.c  -Wc++-compat   (test for errors, line 4)
> +PASS: c-c++-common/musttail13.c  -Wc++-compat  (test for excess errors)
> +UNSUPPORTED: c-c++-common/musttail2.c  -Wc++-compat 
> +UNSUPPORTED: c-c++-common/musttail3.c  -Wc++-compat 
> +UNSUPPORTED: c-c++-common/musttail4.c  -Wc++-compat 
> +PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for errors, line 17)
> +PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 10)
> +PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 11)
> +PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 12)
> +PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 24)
> +PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 25)
> +PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 26)
> +PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 5)
> +PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 6)
> +PASS: c-c++-common/musttail5.c  -Wc++-compat  (test for excess errors)
> +UNSUPPORTED: c-c++-common/musttail7.c  -Wc++-compat 
> +UNSUPPORTED: c-c++-common/musttail8.c  -Wc++-compat 
> 
> (Similarly for their C++ testing.)
> 
> +UNSUPPORTED: g++.dg/musttail10.C  
> +UNSUPPORTED: g++.dg/musttail11.C  
> +UNSUPPORTED: g++.dg/musttail6.C  
> +UNSUPPORTED: g++.dg/musttail9.C  
> 
> ..., and even a few existing test cases "regress" from PASS to
> UNSUPPORTED:
> 
> [-PASS:-]{+UNSUPPORTED:+} gcc.dg/plugin/must-tail-call-1.c -fplugin=./must_tail_call_plugin.so[-(test for excess errors)-]
> [-PASS:-]{+UNSUPPORTED:+} gcc.dg/plugin/must-tail-call-2.c -fplugin=./must_tail_call_plugin.so[-(test for errors, line 18)-]
> [-PASS: gcc.dg/plugin/must-tail-call-2.c -fplugin=./must_tail_call_plugin.so  (test for errors, line 33)-]
> [-PASS: gcc.dg/plugin/must-tail-call-2.c -fplugin=./must_tail_call_plugin.so  (test for errors, line 40)-]
> [-PASS: gcc.dg/plugin/must-tail-call-2.c -fplugin=./must_tail_call_plugin.so  (test for errors, line 49)-]
> [-PASS: gcc.dg/plugin/must-tail-call-2.c -fplugin=./must_tail_call_plugin.so  (test for errors, line 58)-]
> [-PASS: gcc.dg/plugin/must-tail-call-2.c -fplugin=./must_tail_call_plugin.so (test for excess errors)-]
> 
> Similarly for ppc64le GNU/Linux.
> 
> Is that intentional?

Thanks.  I will take a look.  At least on x86_64-linux everything should
be supported.  On powerpc and ARM I expect some to be unsupported.

But the previous test cases shouldn't have changed. Maybe we need
more tail_call dejagnu tests that also enable -O2. 

The whole area is unfortunately somewhat of a minefield because of
lots of varying restrictions on tail calls, both in frontends
and targets.

-Andi


Re: [PATCH v1] Internal-fn: Handle vector bool type for type strict match mode [PR116103]

2024-07-29 Thread Richard Sandiford
pan2...@intel.com writes:
> From: Pan Li 
>
> For some targets, like amdgcn-amdhsa, we need to handle vector bool
> types before general vector mode types; otherwise we get asm check
> failures like the ones below.
>
> gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 80
> gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 80
> gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 56
> gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 56
> gcc.dg/tree-ssa/loop-bound-2.c scan-tree-dump-not ivopts "zero if "
>
> The following test suites pass with this patch.
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.
> 4. The amdgcn test case as above.
>
> gcc/ChangeLog:
>
>   * internal-fn.cc (type_strictly_matches_mode_p): Add handling
>   for vector bool type.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/internal-fn.cc | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 8a2e07f2f96..086c8be398a 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4171,6 +4171,12 @@ direct_internal_fn_optab (internal_fn fn)
>  static bool
>  type_strictly_matches_mode_p (const_tree type)
>  {
> +  /* For target=amdgcn-amdhsa,  we need to take care of vector bool types.
> + More details see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116103.  */
> +  if (VECTOR_BOOLEAN_TYPE_P (type) && SCALAR_INT_MODE_P (TYPE_MODE (type))
> +&& TYPE_PRECISION (TREE_TYPE (type)) == 1)

Sorry for the formatting nits, but I think this should be:

  if (VECTOR_BOOLEAN_TYPE_P (type)
      && SCALAR_INT_MODE_P (TYPE_MODE (type))
      && TYPE_PRECISION (TREE_TYPE (type)) == 1)

(one condition per line, indented below "VECTOR").

But I think the comment should give the underlying reason, rather than
treat it as a target oddity.  Maybe something like:

  /* Masked vector operations have both vector data operands and
     vector boolean operands.  The vector data operands are expected
     to have a vector mode, but the vector boolean operands can be
     an integer mode rather than a vector mode, depending on how
     TARGET_VECTORIZE_GET_MASK_MODE is defined.  */

Thanks,
Richard

> +return true;
> +
>if (VECTOR_TYPE_P (type))
>  return VECTOR_MODE_P (TYPE_MODE (type));


Re: Polish libstdc++ 'dg-final' action 'file-io-diff' (was: [PATCH 4/8] libstdc++: Add file-io-diff to replace @diff@ markup in I/O tests)

2024-07-29 Thread Jonathan Wakely
On Mon, 29 Jul 2024 at 17:02, Thomas Schwinge  wrote:
>
> Hi Jonathan!
>
> On 2024-07-22T17:28:38+0100, Jonathan Wakely  wrote:
> > This adds a new dg-final action to compare two files after a test has
> > run, [...]
>
> Nice!
>
> > --- a/libstdc++-v3/testsuite/lib/libstdc++.exp
> > +++ b/libstdc++-v3/testsuite/lib/libstdc++.exp
>
> > +# Compare output file written by test to expected result.
> > +# With two arguments the comparison is done via 'diff arg1 arg2'.
> > +# With one argument the comparison is done via 'diff arg1.tst arg1.txt'.
> > +proc file-io-diff { args } {
> > +set nargs [llength $args]
> > +if { $nargs < 1 } {
> > + error "too few arguments to file-io-diff"
> > +}
> > +if { $nargs > 2 } {
> > + error "too many arguments to file-io-diff"
> > +}
> > +if { $nargs == 1 } {
> > + set file1 [lindex $args 0]
> > + set file2 "${file1}.txt"
> > + append file1 ".tst"
> > +} else {
> > + set file1 [lindex $args 0]
> > + set file2 [lindex $args 1]
> > +}
> > +
> > +spawn -noecho diff -u $file1 $file2
> > +expect {
> > +  -re ".+" {
> > + set msg "files differ\n"
> > + append msg $expect_out(0,string)
> > + fail $msg
> > + exp_continue
> > +  }
> > +}
> > +return
> > +}
>
> Via "deficient" GCN and nvptx target testing of my WIP C++ enablement
> trees, I ran into a minor nuisance here; OK to push the attached
> "Polish libstdc++ 'dg-final' action 'file-io-diff'"?

This is an excellent improvement to my janky tcl code, making the
output more conventional.

OK for trunk, thanks.


>
> On a proper target, powerpc64le GNU/Linux, with:
>
> $ echo yo >> libstdc++-v3/testsuite/data/istream_extractor_other-2.tst
>
> ... injected just for demonstration purposes (triggering for
> '27_io/basic_istream/extractors_other/char/2.cc' one instance of
> 'FAIL: files differ'), my patch changes 'libstdc++.sum' as follows:
>
> --- build-gcc/powerpc64le-unknown-linux-gnu/libstdc++-v3/testsuite/libstdc++.sum  2024-07-29 17:11:42.333879749 +0200
> +++ build-gcc/powerpc64le-unknown-linux-gnu/libstdc++-v3/testsuite/libstdc++.sum  2024-07-29 15:46:27.528214632 +0200
> @@ -1,2 +1,2 @@
> -Test run by tschwinge on Mon Jul 29 16:53:34 2024
> +Test run by tschwinge on Mon Jul 29 15:28:20 2024
>  Native configuration is powerpc64le-unknown-linux-gnu
> @@ -11075,2 +11075,3 @@
>  PASS: 27_io/basic_filebuf/close/12790-1.cc  -std=gnu++17 execution test
> +PASS: 27_io/basic_filebuf/close/char/1.cc  -std=gnu++17  file-io-diff filebuf_members-1
>  PASS: 27_io/basic_filebuf/close/char/1.cc  -std=gnu++17 (test for excess errors)
> @@ -11722,2 +11723,4 @@
>  PASS: 27_io/basic_istream/extractors_other/char/1.cc  -std=gnu++17 execution test
> +PASS: 27_io/basic_istream/extractors_other/char/2.cc  -std=gnu++17  file-io-diff istream_extractor_other-1
> +FAIL: 27_io/basic_istream/extractors_other/char/2.cc  -std=gnu++17  file-io-diff istream_extractor_other-2
>  PASS: 27_io/basic_istream/extractors_other/char/2.cc  -std=gnu++17 (test for excess errors)
> @@ -11748,2 +11751,4 @@
>  PASS: 27_io/basic_istream/extractors_other/wchar_t/1.cc  -std=gnu++17 execution test
> +PASS: 27_io/basic_istream/extractors_other/wchar_t/2.cc  -std=gnu++17  file-io-diff wistream_extractor_other-1
> +PASS: 27_io/basic_istream/extractors_other/wchar_t/2.cc  -std=gnu++17  file-io-diff wistream_extractor_other-2
>  PASS: 27_io/basic_istream/extractors_other/wchar_t/2.cc  -std=gnu++17 (test for excess errors)
> @@ -11772,2 +11777,3 @@
>  PASS: 27_io/basic_istream/get/char/1.cc  -std=gnu++17 execution test
> +PASS: 27_io/basic_istream/get/char/2.cc  -std=gnu++17  file-io-diff istream_unformatted-1
>  PASS: 27_io/basic_istream/get/char/2.cc  -std=gnu++17 (test for excess errors)
> @@ -11779,2 +11785,3 @@
>  PASS: 27_io/basic_istream/get/wchar_t/1.cc  -std=gnu++17 execution test
> +PASS: 27_io/basic_istream/get/wchar_t/2.cc  -std=gnu++17  file-io-diff istream_unformatted-1
>  PASS: 27_io/basic_istream/get/wchar_t/2.cc  -std=gnu++17 (test for excess errors)
> @@ -11812,2 +11819,3 @@
>  PASS: 27_io/basic_istream/ignore/char/2.cc  -std=gnu++17 execution test
> +PASS: 27_io/basic_istream/ignore/char/3.cc  -std=gnu++17  file-io-diff istream_unformatted-1
>  PASS: 27_io/basic_istream/ignore/char/3.cc  -std=gnu++17 (test for excess errors)
> @@ -11828,2 +11836,3 @@
>  PASS: 27_io/basic_istream/ignore/wchar_t/2.cc  -std=gnu++17 execution test
> +PASS: 27_io/basic_istream/ignore/wchar_t/3.cc  -std=gnu++17  file-io-diff istream_unformatted-1
>  PASS: 27_io/basic_istream/ignore/wchar_t/3.cc  -std=gnu++17 (test for excess errors)
> @@ -11844,2 +11853,3 @@
>  PASS: 27_io/basic_istream/peek/char/12296.cc  -std=gnu++17 execution test
> +PASS: 27_io/basic_istream/peek

Re: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS

2024-07-29 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, 29 Jul 2024, Prathamesh Kulkarni wrote:
>
>> Hi Richard,
>> Thanks for your suggestions on the RFC email; the attached patch adds support
>> for streaming of poly_int when its degree <= accel's NUM_POLY_INT_COEFFS.
>> The patch changes streaming of poly_int as follows:
>> 
>> Streaming out poly_int:
>> 
>> degree = poly_int.degree();
>> stream out degree;
>> for (i = 0; i < degree; i++)
>>   stream out poly_int.coeffs[i];
>> 
>> Streaming in poly_int:
>> 
>> stream in degree;
>> if (degree > NUM_POLY_INT_COEFFS)
>>   fatal_error();
>> for (i = 0; i < degree; i++)
>>   stream in poly_int.coeffs[i];
>> // Set remaining coeffs to zero in case degree < accel's NUM_POLY_INT_COEFFS
>> for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
>>   poly_int.coeffs[i] = 0;
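Below is a minimal C++ sketch of the scheme above, under a few assumptions. It uses a toy stream of 64-bit words instead of GCC's bitpack API. It takes a poly_int's degree to be the number of coefficients up to and including the last nonzero one; the patch's exact definition of `degree ()` is not shown here. `toy_poly_int`, `stream_out` and `stream_in` are hypothetical names, and `fatal_error` is replaced by an exception so that the sketch is testable.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Toy poly_int with N coefficients; N plays the role of NUM_POLY_INT_COEFFS.
template <unsigned N>
struct toy_poly_int
{
  std::array<std::int64_t, N> coeffs{};

  // Assumed definition: number of coefficients up to the last nonzero one.
  unsigned
  degree () const
  {
    unsigned d = N;
    while (d > 0 && coeffs[d - 1] == 0)
      --d;
    return d;
  }
};

// Toy stream of 64-bit words (stand-in for the LTO bitpack).
using toy_stream = std::vector<std::int64_t>;

// Write the degree, then only that many coefficients.
template <unsigned N>
void
stream_out (toy_stream &out, const toy_poly_int<N> &p)
{
  unsigned d = p.degree ();
  out.push_back (d);
  for (unsigned i = 0; i < d; ++i)
    out.push_back (p.coeffs[i]);
}

// Read a value written by stream_out into a poly_int with N coefficients.
template <unsigned N>
toy_poly_int<N>
stream_in (const toy_stream &in, std::size_t &pos)
{
  unsigned d = static_cast<unsigned> (in.at (pos++));
  if (d > N)
    throw std::runtime_error ("poly_int degree exceeds NUM_POLY_INT_COEFFS");
  toy_poly_int<N> p;  // all coefficients start out as zero
  for (unsigned i = 0; i < d; ++i)
    p.coeffs[i] = in.at (pos++);
  return p;
}
```

Under these assumptions, a host value with coefficients {16, 0} has degree 1 and can be read by an accelerator with a single coefficient, while {16, 16} cannot.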
>> 
>> Patch passes bootstrap+test and LTO bootstrap+test on aarch64-linux-gnu.
>> LTO bootstrap+test on x86_64-linux-gnu in progress.
>> 
>> I am not quite sure how to test it for offloading, since it is currently
>> (entirely) broken for aarch64->nvptx.
>> I can give it a try with x86_64->nvptx offloading if required (although I
>> guess LTO bootstrap should test the streaming changes?)
>
> +  unsigned degree
> +    = bp_unpack_value (bp, BITS_PER_UNIT * sizeof (unsigned HOST_WIDE_INT));
>
> The NUM_POLY_INT_COEFFS target define doesn't seem to be constrained
> to any type it needs to fit into; using HOST_WIDE_INT is arbitrary.
> I'd say we should constrain it to a reasonable upper bound,
> like 2?  Maybe even have MAX_NUM_POLY_INT_COEFFS or 
> NUM_POLY_INT_COEFFS_BITS in poly-int.h and constrain NUM_POLY_INT_COEFFS.
>
> The patch looks reasonable over all, but Richard S. should have a say
> about the abstraction you chose and the poly-int adjustment.

Sorry if this has been discussed already, but could we instead stream
NUM_POLY_INT_COEFFS once per file, rather than once per poly_int?
It's a target invariant, and poly_int has wormed its way into lots
of things by now :)
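Here is one possible reading of the once-per-file suggestion, sketched in C++ with a toy stream of 64-bit words; `write_file` and `read_file` are hypothetical names. The writer emits its coefficient count once as a header, then writes every value with exactly that many coefficients. The reader checks each value against its own count. It zero-fills missing coefficients and rejects a value whose extra coefficients are nonzero.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using toy_stream = std::vector<std::int64_t>;

// Writer: emit the host's coefficient count once, then every value with
// exactly HOST_N coefficients (no per-value degree).
template <unsigned HOST_N>
void
write_file (toy_stream &out,
            const std::vector<std::array<std::int64_t, HOST_N>> &values)
{
  out.push_back (HOST_N);
  for (const auto &v : values)
    for (std::int64_t c : v)
      out.push_back (c);
}

// Reader: read the header once, then convert each of COUNT values to
// ACCEL_N coefficients.  Missing coefficients are zero; extra host
// coefficients must be zero or the value cannot be represented.
template <unsigned ACCEL_N>
std::vector<std::array<std::int64_t, ACCEL_N>>
read_file (const toy_stream &in, std::size_t count)
{
  std::size_t pos = 0;
  const unsigned host_n = static_cast<unsigned> (in.at (pos++));
  std::vector<std::array<std::int64_t, ACCEL_N>> result;
  for (std::size_t k = 0; k < count; ++k)
    {
      std::array<std::int64_t, ACCEL_N> v{};
      for (unsigned i = 0; i < host_n; ++i)
        {
          std::int64_t c = in.at (pos++);
          if (i < ACCEL_N)
            v[i] = c;
          else if (c != 0)
            throw std::runtime_error ("value not representable on the accelerator");
        }
      result.push_back (v);
    }
  return result;
}
```

The design trade-off is that the per-value degree disappears from the stream, at the cost of always writing the host's full coefficient count for each value.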

Thanks,
Richard


[COMMITTED] [PATCH] testsuite: make PR115277 test an execute one

2024-07-29 Thread Sam James
PR middle-end/115277
* gcc.c-torture/compile/pr115277.c: Rename to...
* gcc.c-torture/execute/pr115277.c: ...this.
---
ACKed on IRC by honza. Pushed.

 gcc/testsuite/gcc.c-torture/{compile => execute}/pr115277.c | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename gcc/testsuite/gcc.c-torture/{compile => execute}/pr115277.c (100%)

diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115277.c 
b/gcc/testsuite/gcc.c-torture/execute/pr115277.c
similarity index 100%
rename from gcc/testsuite/gcc.c-torture/compile/pr115277.c
rename to gcc/testsuite/gcc.c-torture/execute/pr115277.c

-- 
2.45.2



Re: [PATCH 1/3] Add TARGET_MODE_CAN_TRANSFER_BITS

2024-07-29 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, 29 Jul 2024, Jakub Jelinek wrote:
>> And, for the GET_MODE_INNER, I also meant it for Aarch64/RISC-V VL vectors,
>> I think those should be considered as true by the hook, not false
>> because maybe_ne.
>
> I don't think relevant modes will have size/precision mismatches
> and maybe_ne should work here.  Richard?

Yeah, I think that's true for AArch64 at least (not sure about RVV).

One wrinkle is that VNx16BI (every bit of a predicate) is technically
suitable for memcpy, even though it would be a bad choice performance-wise.
But VNx8BI (every even bit of a predicate) wouldn't, since the odd bits
are undefined on read.

Arguably, this means that VNx8BI has the wrong precision, but like you
say, we don't (AFAIK) support bitsize != precision for vector modes.
Instead, the information that there is only one meaningful bit per
boolean is represented by having an inner mode of BI.  Both VNx16BI
and VNx8BI have an inner mode of BI, which means that VNx8BI's
precision is not equal to its nunits * its unit precision.

So I suppose:

  maybe_ne (GET_MODE_BITSIZE (mode),
            GET_MODE_UNIT_PRECISION (mode) * GET_MODE_NUNITS (mode))

would capture this.
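Below is a toy C++ check of that condition. It assumes a single 128-bit SVE granule, so that the poly_int sizes become plain integers. Under that assumption a predicate is 16 bits wide, VNx16BI has 16 one-bit elements, and VNx8BI has 8. The struct, the function name and the numbers are illustrative and are not GCC's machine-mode API.

```cpp
#include <cassert>

// Toy description of a vector mode for a single 128-bit SVE granule
// (VQ == 1), so that the poly_int sizes become plain integers.
struct toy_vector_mode
{
  const char *name;
  unsigned bitsize;         // GET_MODE_BITSIZE
  unsigned nunits;          // GET_MODE_NUNITS
  unsigned unit_precision;  // GET_MODE_UNIT_PRECISION
};

// The suggested test: true when the mode's bitsize differs from the bits
// covered by its elements, i.e. some bits may be undefined on read.
bool
bitsize_mismatch (const toy_vector_mode &m)
{
  return m.bitsize != m.unit_precision * m.nunits;
}
```

VNx16BI passes (16 == 1 * 16), and VNx8BI is caught (16 != 1 * 8). A hypothetical mode with a 2-bit boolean inner mode and 8 elements would pass again, matching the last paragraph above.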

Targets that want a vector bool mode with 2 meaningful bits per boolean
are expected to define a 2-bit scalar boolean mode and use that as the
inner mode.  So I think the condition above would (correctly) continue
to allow those.

Thanks,
Richard


Re: [PATCH 6/4] libbacktrace: Add loaded dlls after initialize

2024-07-29 Thread Ian Lance Taylor
On Fri, Mar 15, 2024 at 1:41 PM Björn Schäpers  wrote:
>
> Am 10.01.2024 um 13:34 schrieb Eli Zaretskii:
> >> Date: Tue, 9 Jan 2024 21:02:44 +0100
> >> Cc: i...@google.com, gcc-patches@gcc.gnu.org, g...@gcc.gnu.org
> >> From: Björn Schäpers 
> >>
> >> Am 07.01.2024 um 18:03 schrieb Eli Zaretskii:
> >>> In that case, you can call either GetModuleHandleExA or
> >>> GetModuleHandleExW; the difference is minor.
> >>
> >> Here is an updated version without relying on TEXT or TCHAR, directly
> >> calling GetModuleHandleExW.
> >
> > Thanks, this LGTM (but I couldn't test it, I just looked at the
> > source code).
>
> Here is an updated version.  It is rebased on the combined approach of
> getting the loaded DLLs and two minor changes to suppress warnings.

This bug report was filed about this patch:

https://github.com/ianlancetaylor/libbacktrace/issues/131

> src\pecoff.c(86): error C2059: syntax error: '('
> src\pecoff.c(89): error C2059: syntax error: '('
>
> It works fine if CALLBACK and NTAPI are deleted.

Any ideas?

Thanks.

Ian


Re: [PATCH v1 1/2] PR116080: Fix tail call dejagnu checks

2024-07-29 Thread Andi Kleen



I'm going to revert the patch for now. There are two problems:

- The new tests don't have unique names, so the caching confuses
the results.
- To test with -O2 we need explicit musttail checks, because tail calls
don't run at -O0 without musttail.



Re: [PATCH 6/4] libbacktrace: Add loaded dlls after initialize

2024-07-29 Thread Eli Zaretskii
> From: Ian Lance Taylor 
> Date: Mon, 29 Jul 2024 09:46:46 -0700
> Cc: Eli Zaretskii , gcc-patches@gcc.gnu.org, g...@gcc.gnu.org
> 
> On Fri, Mar 15, 2024 at 1:41 PM Björn Schäpers  wrote:
> >
> > Am 10.01.2024 um 13:34 schrieb Eli Zaretskii:
> > >> Date: Tue, 9 Jan 2024 21:02:44 +0100
> > >> Cc: i...@google.com, gcc-patches@gcc.gnu.org, g...@gcc.gnu.org
> > >> From: Björn Schäpers 
> > >>
> > >> Am 07.01.2024 um 18:03 schrieb Eli Zaretskii:
> > >>> In that case, you can call either GetModuleHandleExA or
> > >>> GetModuleHandleExW, the difference is minor.
> > >>
> > >> Here is an updated version without relying on TEXT or TCHAR, directly calling
> > >> GetModuleHandleExW.
> > >
> > > Thanks, this LGTM (but I couldn't test it, I just looked at the
> > > source code).
> >
> > Here is an updated version. It is rebased on the combined approach of getting the
> > loaded DLLs and two minor changes to suppress warnings.
> 
> This bug report was filed about this patch:
> 
> https://github.com/ianlancetaylor/libbacktrace/issues/131
> 
> > src\pecoff.c(86): error C2059: syntax error: '('
> > src\pecoff.c(89): error C2059: syntax error: '('
> >
> > It works fine if CALLBACK and NTAPI are deleted.
> 
> Any ideas?

Instead of deleting those, move them inside the parentheses:

typedef VOID (CALLBACK *LDR_DLL_NOTIFICATION)(ULONG,
                                              struct dll_notification_data*,
                                              PVOID);
typedef NTSTATUS (NTAPI *LDR_REGISTER_FUNCTION)(ULONG,
                                                LDR_DLL_NOTIFICATION, PVOID,
                                                PVOID*);

and also I think you need to include , for the definition
of the NTSTATUS type.

Caveat: I don't have MSVC, so I couldn't verify that these measures
fix the problem, sorry.
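The declarator placement Eli describes can be shown in isolation. This is a sketch only: `DEMO_CALLBACK` is a hypothetical stand-in for CALLBACK/NTAPI that expands to `__stdcall` on Windows and to nothing elsewhere, so the syntax compiles on any host. Like the suggestion above, it has not been checked with MSVC.

```cpp
// Hypothetical stand-in for the Windows CALLBACK / NTAPI macros.
#if defined(_WIN32)
#  define DEMO_CALLBACK __stdcall
#else
#  define DEMO_CALLBACK
#endif

// Calling convention inside the parentheses, directly before the '*'.
typedef int (DEMO_CALLBACK *demo_notification_fn) (unsigned long reason,
                                                    void *context);

// A callback with the matching signature and calling convention.
constexpr int DEMO_CALLBACK
demo_on_notify (unsigned long reason, void *)
{
  return static_cast<int> (reason);
}

// The typedef'd pointer type accepts the callback.
constexpr demo_notification_fn demo_handler = demo_on_notify;
```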


[PATCH 1/2] libstdc++: Fix std::format output for std::chrono::zoned_time

2024-07-29 Thread Jonathan Wakely
My first attempt to fix this was an overly complex kluge, but there was
a nice simple solution staring me in the face. I'm pretty happy with
this now.

Tested x86_64-linux.

-- >8 --

When formatting a chrono::zoned_time with an empty chrono-specs, we were
only formatting its _M_time member, but the ostream insertion operator
uses the format "{:L%F %T %Z}" which includes the time zone
abbreviation. The %Z should also be used when formatting with an empty
chrono-specs.

This commit makes _M_format_to_ostream handle __local_time_fmt
specializations directly, rather than calling itself recursively to
format the _M_time member. We need to be able to customize the output of
_M_format_to_ostream for __local_time_fmt, because we use that type for
gps_time and tai_time as well as for zoned_time and __local_time_fmt.
When formatting gps_time and tai_time we don't want to include the time
zone abbreviation in the "{}" output, but for zoned_time we do want to.
We can reuse the __is_neg flag passed to _M_format_to_ostream (via
_M_format) to say that we want the time zone abbreviation.  Currently
the __is_neg flag is only used for duration specializations, so it's
available for __local_time_fmt to use.

In addition to fixing the zoned_time output to use %Z, this commit also
changes the __local_time_fmt output to use %Z. Previously it didn't use
it, just like zoned_time.  The standard doesn't actually say how to
format local-time-format-t for an empty chrono-specs, but this behaviour
seems sensible and is what I'm proposing as part of LWG 4124.

While testing this I noticed that some chrono types were not being
tested with empty chrono-specs, so this adds more tests. I also noticed
that std/time/clock/local/io.cc was testing tai_time instead of
local_time, which was completely wrong. That's fixed now too.
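The empty chrono-specs behaviour described above amounts to a small decision table. The following C++17 sketch models only that table; `chrono_kind` and `empty_spec_equivalent` are hypothetical names, not libstdc++ internals, and the real code reaches the same result through the `__is_neg` flag.

```cpp
#include <string_view>

// Hypothetical labels for the types discussed above.
enum class chrono_kind { zoned_time, local_time_format, gps_time, tai_time };

// All four are formatted via __local_time_fmt, but only zoned_time and
// local-time-format-t append the time zone abbreviation (%Z) for "{}".
constexpr std::string_view
empty_spec_equivalent (chrono_kind k)
{
  return (k == chrono_kind::zoned_time || k == chrono_kind::local_time_format)
         ? "{:L%F %T %Z}" : "{:L%F %T}";
}
```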

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__local_fmt_t): Remove unused
declaration.
(__formatter_chrono::_M_format_to_ostream): Add explicit
handling for specializations of __local_time_fmt, including the
time zone abbreviation in the output if __is_neg is true.
(formatter>::format): Add comment.
(formatter>::format): Likewise.
(formatter
 struct __local_time_fmt
 {
@@ -163,8 +164,6 @@ namespace __detail
   const string* _M_abbrev;
   const seconds* _M_offset_sec;
 };
-
-  struct __local_fmt_t;
 }
 /// @endcond
 
@@ -695,13 +694,34 @@ namespace __format
  using ::std::chrono::__detail::__utc_leap_second;
  using ::std::chrono::__detail::__local_time_fmt;
 
+ basic_ostringstream<_CharT> __os;
+ __os.imbue(_M_locale(__fc));
+
  if constexpr (__is_specialization_of<_Tp, __local_time_fmt>)
-   return _M_format_to_ostream(__t._M_time, __fc, false);
+   {
+ // Format as "{:L%F %T}"
+ auto __days = chrono::floor(__t._M_time);
+ __os << chrono::year_month_day(__days) << ' '
+  << chrono::hh_mm_ss(__t._M_time - __days);
+
+ // For __local_time_fmt the __is_neg flags says whether to
+ // append " %Z" to the result.
+ if (__is_neg)
+   {
+ if (!__t._M_abbrev) [[unlikely]]
+   __format::__no_timezone_available();
+ else if constexpr (is_same_v<_CharT, char>)
+   __os << ' ' << *__t._M_abbrev;
+ else
+   {
+ __os << L' ';
+ for (char __c : *__t._M_abbrev)
+   __os << __c;
+   }
+   }
+   }
  else
{
- basic_ostringstream<_CharT> __os;
- __os.imbue(_M_locale(__fc));
-
  if constexpr (__is_specialization_of<_Tp, __utc_leap_second>)
__os << __t._M_date << ' ' << __t._M_time;
  else if constexpr (chrono::__is_time_point_v<_Tp>)
@@ -727,11 +747,11 @@ namespace __format
  __os << _S_plus_minus[1];
  __os << __t;
}
-
- auto __str = std::move(__os).str();
- return __format::__write_padded_as_spec(__str, __str.size(),
- __fc, _M_spec);
}
+
+ auto __str = std::move(__os).str();
+ return __format::__write_padded_as_spec(__str, __str.size(),
+ __fc, _M_spec);
}
 
   static constexpr const _CharT* _S_chars
@@ -2008,6 +2028,8 @@ namespace __format
   _FormatContext& __fc) const
{
  // Convert to __local_time_fmt with abbrev "TAI" and offset 0s.
+ // We use __local_time_fmt and not sys_time (as the standard implies)
+ // because %Z for sys_time would print "UTC" and we want "TAI" here.
 
  // Offset is 1970y/January/1 - 1958y/January/1
  constexpr chrono::da
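The comment in the truncated hunk above derives the TAI offset from the difference between 1970-01-01 and 1958-01-01. As an independent check (not the patch's code), Howard Hinnant's days_from_civil algorithm gives that day count in plain C++14:

```cpp
// Days since 1970-01-01 for a proleptic Gregorian date
// (Howard Hinnant's days_from_civil algorithm).
constexpr long
days_from_civil (long y, unsigned m, unsigned d)
{
  y -= m <= 2;  // Jan/Feb count as months 11/12 of the previous year
  const long era = (y >= 0 ? y : y - 399) / 400;
  const unsigned yoe = static_cast<unsigned> (y - era * 400);           // [0, 399]
  const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; // [0, 365]
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // [0, 146096]
  return era * 146097 + static_cast<long> (doe) - 719468;
}

// 1970y/January/1 - 1958y/January/1, in days and in seconds.
constexpr long tai_epoch_offset_days
  = days_from_civil (1970, 1, 1) - days_from_civil (1958, 1, 1);
constexpr long tai_epoch_offset_secs = tai_epoch_offset_days * 86400L;
```

The 4383 days are twelve 365-day years plus the leap days of 1960, 1964 and 1968.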

[PATCH 2/2] libstdc++: Fix formatter for low-resolution chrono::zoned_time (LWG 4124)

2024-07-29 Thread Jonathan Wakely
Tested x86_64-linux.

-- >8 --

This implements the proposed resolution of LWG 4124, so that
low-resolution chrono::zoned_time objects can be formatted. The
formatter for zoned_time needs to account for get_local_time
returning local_time> not local_time.
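The mismatch comes from the duration type: for zoned_time<Duration>, get_local_time() returns local_time<common_type_t<Duration, seconds>>. A minimal C++14 sketch of that mapping follows; `zoned_duration_for` is a hypothetical alias mirroring the idea behind `__local_time_fmt_for`, not its definition.

```cpp
#include <chrono>
#include <type_traits>

// Hypothetical alias: the duration a zoned_time<Duration> actually exposes.
template<typename Duration>
using zoned_duration_for = std::common_type_t<Duration, std::chrono::seconds>;
```

So a zoned_time built from minutes hands the formatter a seconds-based local_time, while finer durations such as milliseconds pass through unchanged.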

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__local_time_fmt_for): New alias
template.
(formatter>): Use __local_time_fmt_for.
* testsuite/std/time/zoned_time/io.cc: Check zoned_time
can be formatted.
---
 libstdc++-v3/include/bits/chrono_io.h| 12 +---
 libstdc++-v3/testsuite/std/time/zoned_time/io.cc |  4 
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h b/libstdc++-v3/include/bits/chrono_io.h
index e7e7deb2cde..d8a4a121113 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -164,6 +164,12 @@ namespace __detail
   const string* _M_abbrev;
   const seconds* _M_offset_sec;
 };
+
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 4124. Cannot format zoned_time with resolution coarser than seconds
+  template
+using __local_time_fmt_for
+  = __local_time_fmt>;
 }
 /// @endcond
 
@@ -2137,15 +2143,15 @@ namespace __format
 #if _GLIBCXX_USE_CXX11_ABI || ! _GLIBCXX_USE_DUAL_ABI
   template
 struct formatter, _CharT>
-: formatter, _CharT>
+: formatter, _CharT>
 {
   template
typename _FormatContext::iterator
format(const chrono::zoned_time<_Duration, _TimeZonePtr>& __tp,
   _FormatContext& __ctx) const
{
- using chrono::__detail::__local_time_fmt;
- using _Base = formatter<__local_time_fmt<_Duration>, _CharT>;
+ using _Ltf = chrono::__detail::__local_time_fmt_for<_Duration>;
+ using _Base = formatter<_Ltf, _CharT>;
  const chrono::sys_info __info = __tp.get_info();
  const auto __lf = chrono::local_time_format(__tp.get_local_time(),
  &__info.abbrev,
diff --git a/libstdc++-v3/testsuite/std/time/zoned_time/io.cc b/libstdc++-v3/testsuite/std/time/zoned_time/io.cc
index ee3b9edba81..c113eea6d3f 100644
--- a/libstdc++-v3/testsuite/std/time/zoned_time/io.cc
+++ b/libstdc++-v3/testsuite/std/time/zoned_time/io.cc
@@ -66,6 +66,10 @@ test_format()
   ws = std::format(L"{:+^34}", zoned_time(zone, t));
   VERIFY( ws == L"++2022-12-19 12:26:25.708000 EST++" );
 #endif
+
+  // LWG 4124. Cannot format zoned_time with resolution coarser than seconds
+  s = std::format("{}", zoned_time(zone, time_point_cast(t)));
+  VERIFY( s == "2022-12-19 12:26:00 EST" );
 }
 
 int main()
-- 
2.45.2



[PATCH] PR116080: Fix test suite checks for musttail

2024-07-29 Thread Andi Kleen
From: Andi Kleen 

This is a new attempt to fix PR116080. The previous try was reverted
because it just broke a bunch of tests, hiding the problem.

- musttail behaves differently than tailcall at -O0. Some of the tests
run at -O0, so add separate effective target tests for musttail.
- New effective target tests need to use unique file names
to make dejagnu caching work.
- Change the tests to use the new targets.
- Add an external_musttail test to check for the target's ability
to do tail calls between translation units. This covers some powerpc
ABIs.

gcc/testsuite/ChangeLog:

PR testsuite/116080
* c-c++-common/musttail1.c: Use musttail target.
* c-c++-common/musttail12.c: Use struct_musttail target.
* c-c++-common/musttail2.c: Use musttail target.
* c-c++-common/musttail3.c: Likewise.
* c-c++-common/musttail4.c: Likewise.
* c-c++-common/musttail7.c: Likewise.
* c-c++-common/musttail8.c: Likewise.
* g++.dg/musttail10.C: Likewise. Replace powerpc checks with
external_musttail.
* g++.dg/musttail11.C: Use musttail target.
* g++.dg/musttail6.C: Use musttail target. Replace powerpc
checks with external_musttail.
* g++.dg/musttail9.C: Use musttail target.
* lib/target-supports.exp: Add musttail, struct_musttail,
external_musttail targets. Remove optimization for musttail.
Use unique file names for musttail.
---
 gcc/testsuite/c-c++-common/musttail1.c  |  2 +-
 gcc/testsuite/c-c++-common/musttail12.c |  2 +-
 gcc/testsuite/c-c++-common/musttail2.c  |  2 +-
 gcc/testsuite/c-c++-common/musttail3.c  |  2 +-
 gcc/testsuite/c-c++-common/musttail4.c  |  2 +-
 gcc/testsuite/c-c++-common/musttail7.c  |  2 +-
 gcc/testsuite/c-c++-common/musttail8.c  |  2 +-
 gcc/testsuite/g++.dg/musttail10.C   |  4 ++--
 gcc/testsuite/g++.dg/musttail11.C   |  2 +-
 gcc/testsuite/g++.dg/musttail6.C|  4 ++--
 gcc/testsuite/g++.dg/musttail9.C|  2 +-
 gcc/testsuite/lib/target-supports.exp   | 30 -
 12 files changed, 37 insertions(+), 19 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/musttail1.c b/gcc/testsuite/c-c++-common/musttail1.c
index 74efcc2a0bc6..51549672e02a 100644
--- a/gcc/testsuite/c-c++-common/musttail1.c
+++ b/gcc/testsuite/c-c++-common/musttail1.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+/* { dg-do compile { target { musttail && { c || c++11 } } } } */
 /* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
 
 int __attribute__((noinline,noclone,noipa))
diff --git a/gcc/testsuite/c-c++-common/musttail12.c b/gcc/testsuite/c-c++-common/musttail12.c
index 4140bcd00950..475afc5af3f3 100644
--- a/gcc/testsuite/c-c++-common/musttail12.c
+++ b/gcc/testsuite/c-c++-common/musttail12.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { struct_tail_call && { c || c++11 } } } } */
+/* { dg-do compile { target { struct_musttail && { c || c++11 } } } } */
 /* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
 
 struct str
diff --git a/gcc/testsuite/c-c++-common/musttail2.c b/gcc/testsuite/c-c++-common/musttail2.c
index 86f2c3d77404..1970c4edd670 100644
--- a/gcc/testsuite/c-c++-common/musttail2.c
+++ b/gcc/testsuite/c-c++-common/musttail2.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+/* { dg-do compile { target { musttail && { c || c++11 } } } } */
 
 struct box { char field[256]; int i; };
 
diff --git a/gcc/testsuite/c-c++-common/musttail3.c b/gcc/testsuite/c-c++-common/musttail3.c
index ea9589c59ef2..7499fd6460b4 100644
--- a/gcc/testsuite/c-c++-common/musttail3.c
+++ b/gcc/testsuite/c-c++-common/musttail3.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+/* { dg-do compile { target { struct_musttail && { c || c++11 } } } } */
 
 extern int foo2 (int x, ...);
 
diff --git a/gcc/testsuite/c-c++-common/musttail4.c b/gcc/testsuite/c-c++-common/musttail4.c
index 23f4b5e1cd68..bd6effa4b931 100644
--- a/gcc/testsuite/c-c++-common/musttail4.c
+++ b/gcc/testsuite/c-c++-common/musttail4.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+/* { dg-do compile { target { musttail && { c || c++11 } } } } */
 
 struct box { char field[64]; int i; };
 
diff --git a/gcc/testsuite/c-c++-common/musttail7.c b/gcc/testsuite/c-c++-common/musttail7.c
index c753a3fe9b2a..d17cb71256d7 100644
--- a/gcc/testsuite/c-c++-common/musttail7.c
+++ b/gcc/testsuite/c-c++-common/musttail7.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+/* { dg-do compile { target { musttail && { c || c++11 } } } } */
 /* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
 
 void __attribute__((noipa)) f() {}
diff --git a/gcc/testsuite/c-c++-common/musttail8.c b/gcc/testsuite/c-c++-common/musttail8.c
index 9fa10e0b54c4..50ca1ac0dd48 100644
--- a/gcc/testsuite/c-c+

Re: [PATCH] btf: Protect BTF_KIND_INFO against invalid kind

2024-07-29 Thread David Faust


On 7/29/24 07:42, Will Hawkins wrote:
> If the user provides a kind value that is wider than 5 bits, the
> BTF_TYPE_INFO macro would emit an incorrect info value (by clobbering
> the kind flag).
> 
> Tested on x86_64-redhat-linux.

OK, thanks.

> 
> include/ChangeLog:
> 
>   * btf.h (BTF_TYPE_INFO): Protect against user providing invalid
> kind.
> 
> Signed-off-by: Will Hawkins 
> ---
> 
> Notes:
>  I have a small out-of-tree test but was not sure a) whether it should
>  be included and b) where it should be included. If you would
>  like me to include it, please just let me know where it should go!
> 
>  include/btf.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/btf.h b/include/btf.h
> index 3f45ffb0b6b..0c3e1a1cf51 100644
> --- a/include/btf.h
> +++ b/include/btf.h
> @@ -82,7 +82,7 @@ struct btf_type
>};
>  };
>  
> -/* The folloing macros access the information encoded in btf_type.info.  */
> +/* The following macros access the information encoded in btf_type.info.  */
>  /* Type kind. See below.  */
>  #define BTF_INFO_KIND(info)  (((info) >> 24) & 0x1f)
>  /* Number of entries of variable length data following certain type kinds.
> @@ -95,7 +95,7 @@ struct btf_type
>  
>  /* Encoding for struct btf_type.info.  */
>  #define BTF_TYPE_INFO(kind, kflag, vlen) \
> -  kflag) ? 1 : 0 ) << 31) | ((kind) << 24) | ((vlen) & 0x))
> +  kflag) ? 1 : 0 ) << 31) | ((kind & 0x1f) << 24) | ((vlen) & 0x))
>  
>  #define BTF_KIND_UNKN0   /* Unknown or invalid.  */
>  #define BTF_KIND_INT 1   /* Integer.  */
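A standalone model of the encoding shows what the mask buys. This is a sketch, not the btf.h macros: the helper names are hypothetical, and the layout assumed is the one documented for BTF (vlen in bits 0-15, kind in bits 24-28, kind_flag in bit 31). Note that an oversized kind only clobbers kind_flag when its bit 7 is set; bits 5 and 6 land in the unused bits 29-30.

```cpp
#include <cstdint>

// Field accessors for btf_type.info.
constexpr std::uint32_t info_kind (std::uint32_t info) { return (info >> 24) & 0x1f; }
constexpr bool info_kflag (std::uint32_t info) { return (info >> 31) != 0; }

// Modelled on BTF_TYPE_INFO without the kind mask.
constexpr std::uint32_t
type_info_unmasked (std::uint32_t kind, bool kflag, std::uint32_t vlen)
{
  return (std::uint32_t (kflag ? 1 : 0) << 31) | (kind << 24) | (vlen & 0xffff);
}

// Modelled on BTF_TYPE_INFO with kind masked to its 5-bit field.
constexpr std::uint32_t
type_info_masked (std::uint32_t kind, bool kflag, std::uint32_t vlen)
{
  return (std::uint32_t (kflag ? 1 : 0) << 31) | ((kind & 0x1f) << 24)
         | (vlen & 0xffff);
}
```

In the macro form, writing the mask as `((kind) & 0x1f)` would additionally protect against arguments such as `a | b`, which would otherwise bind as `a | (b & 0x1f)`.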


[COMMITTED] gcc: xtensa: disable late-combine by default

2024-07-29 Thread Max Filippov
gcc/
* config/xtensa/xtensa.cc (xtensa_option_override_after_change):
New function.
(TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE): Define as
xtensa_option_override_after_change.
(xtensa_option_override): Call
xtensa_option_override_after_change.
---
 gcc/config/xtensa/xtensa.cc | 13 +
 1 file changed, 13 insertions(+)

diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index d49d224466ad..10d964b51a96 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -114,6 +114,7 @@ struct GTY(()) machine_function
 };
 
 static void xtensa_option_override (void);
+static void xtensa_option_override_after_change (void);
 static enum internal_test map_test_to_internal_test (enum rtx_code);
 static rtx gen_int_relational (enum rtx_code, rtx, rtx);
 static rtx gen_float_relational (enum rtx_code, rtx, rtx);
@@ -303,6 +304,9 @@ static rtx xtensa_delegitimize_address (rtx);
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE xtensa_option_override
 
+#undef TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE
+#define TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE xtensa_option_override_after_change
+
 #undef TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA
 #define TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA xtensa_output_addr_const_extra
 
@@ -2988,6 +2992,15 @@ xtensa_option_override (void)
  the define_insn_and_splits are fixed.  */
   if (!OPTION_SET_P (flag_late_combine_instructions))
 flag_late_combine_instructions = 0;
+
+  xtensa_option_override_after_change ();
+}
+
+static void
+xtensa_option_override_after_change (void)
+{
+  if (!OPTION_SET_P (flag_late_combine_instructions))
+flag_late_combine_instructions = 0;
 }
 
 /* Implement TARGET_HARD_REGNO_NREGS.  */
-- 
2.39.2



Re: [PATCH] doc: Improve punctuation and grammar in -fdiagnostics-format docs

2024-07-29 Thread David Malcolm
On Fri, 2024-03-15 at 13:02 +, Jonathan Wakely wrote:
> OK for trunk?

LGTM, thanks

Dave

> 
> -- >8 --
> 
> The hyphen can be misunderstood to mean "emitted to -" i.e. stdout.
> Refer to both forms by name, rather than using "the former" for one and
> referring to the other by name.
> 
> gcc/ChangeLog:
> 
> * doc/invoke.texi (Diagnostic Message Formatting Options):
> Replace hyphen with a new sentence. Replace "the former" with
> the actual value.
> ---
>  gcc/doc/invoke.texi | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 85c938d4a14..d850b5fcdcc 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -5737,8 +5737,9 @@ named @file{@var{source}.sarif}, respectively.
>  
>  The @samp{json} format is a synonym for @samp{json-stderr}.
>  The @samp{json-stderr} and @samp{json-file} formats are identical, apart from
> -where the JSON is emitted to - with the former, the JSON is emitted to stderr,
> -whereas with @samp{json-file} it is written to @file{@var{source}.gcc.json}.
> +where the JSON is emitted to.  With @samp{json-stderr}, the JSON is emitted
> +to stderr, whereas with @samp{json-file} it is written to
> +@file{@var{source}.gcc.json}.
>  
>  The emitted JSON consists of a top-level JSON array containing JSON objects
>  representing the diagnostics.





[Patch] gimplify.cc: Handle VALUE_EXPR of MEM_REF's ADDR_EXPR argument [PR115637]

2024-07-29 Thread Tobias Burnus

The problem is code like:

  MEM  [(c_char * {ref-all})&arr2]

where arr2 is the value expr '*arr2$13$linkptr'
(i.e. indirect ref + decl name).

Inside pass_omp_target_link::execute, there is a call to
gimple_regimplify_operands, but the value expression is not
expanded. There are two problems: ADDR_EXPR does not handle this, and
while MEM_REF has some code for it, it doesn't handle this case either.
The attached patch fixes this.

Tested on x86_64-gnu-linux with nvidia offloading.

Comments, remarks, OK? Better suggestions?

* * *

In gimplify_expr for MEM_REF, there is a call to is_gimple_mem_ref_addr,
which checks for ADDR_EXPR but not for value expressions. The attached
patch handles the case explicitly, but, alternatively, we might want to
move it to is_gimple_mem_ref_addr (not checked whether it makes sense
or not).

Where is_gimple_mem_ref_addr is defined as:

/* Return true if T is a valid address operand of a MEM_REF.  */

bool
is_gimple_mem_ref_addr (tree t)
{
  return (is_gimple_reg (t)
          || TREE_CODE (t) == INTEGER_CST
          || (TREE_CODE (t) == ADDR_EXPR
              && (CONSTANT_CLASS_P (TREE_OPERAND (t, 0))
                  || decl_address_invariant_p (TREE_OPERAND (t, 0)))));
}
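Why `&arr2` slipped through can be modelled in a few lines. This is a hypothetical C++ sketch, not GCC code: `node`, `mem_ref_addr_ok` and `must_regimplify` are invented names, is_gimple_reg is reduced to "is an SSA name", and the CONSTANT_CLASS_P operand case is omitted. It assumes, as seems to be the case here, that arr2's address is invariant, so the existing predicate accepts `&arr2` even though arr2 still carries a value expression.

```cpp
// Heavily simplified, hypothetical stand-ins for GCC trees.
enum class code { var_decl, parm_decl, ssa_name, integer_cst, addr_expr };

struct node
{
  code kind;
  bool has_value_expr;     // DECL_HAS_VALUE_EXPR_P, for decls
  bool address_invariant;  // decl_address_invariant_p, for decls
  const node *operand;     // TREE_OPERAND (t, 0), for addr_expr
};

constexpr bool
is_decl (const node &t)
{
  return t.kind == code::var_decl || t.kind == code::parm_decl;
}

// Rough analogue of is_gimple_mem_ref_addr.
constexpr bool
mem_ref_addr_ok (const node &t)
{
  return (t.kind == code::ssa_name
          || t.kind == code::integer_cst
          || (t.kind == code::addr_expr
              && t.operand != nullptr
              && is_decl (*t.operand)
              && t.operand->address_invariant));
}

// The condition the patch uses in SSA form: regimplify invalid addresses,
// and also &decl whose decl still has a value expression to substitute.
constexpr bool
must_regimplify (const node &t)
{
  return (!mem_ref_addr_ok (t)
          || (t.kind == code::addr_expr
              && t.operand != nullptr
              && is_decl (*t.operand)
              && t.operand->has_value_expr));
}
```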

Tobias
gimplify.cc: Handle VALUE_EXPR of MEM_REF's ADDR_EXPR argument [PR115637]

As the PR and the included testcase show, replacing 'arr2' by its value expression
'*arr2$13$linkptr' failed for
  MEM  [(c_char * {ref-all})&arr2]
which left 'arr2' in the code as an unknown symbol.

	PR middle-end/115637

gcc/ChangeLog:

	* gimplify.cc (gimplify_addr_expr): Handle value-expr arg.
	(gimplify_expr): For MEM_REF and an ADDR_EXPR, also check
	for value-expr arguments.
	(gimplify_body): Fix macro name in the comment.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/declare-target-link.f90: Uncomment
	now working code.

 gcc/gimplify.cc  | 16 ++--
 .../testsuite/libgomp.fortran/declare-target-link.f90| 15 ++-
 2 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index ab323d764e8..d548dc2cdf6 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -6888,6 +6888,13 @@ gimplify_addr_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
   enum gimplify_status ret;
   location_t loc = EXPR_LOCATION (*expr_p);
 
+  if (VAR_P (op0) || TREE_CODE (op0) == PARM_DECL)
+{
+  ret = gimplify_var_or_parm_decl (&TREE_OPERAND (expr, 0));
+  if (ret == GS_ERROR)
+	return ret;
+  op0 = TREE_OPERAND (expr, 0);
+}
   switch (TREE_CODE (op0))
 {
 case INDIRECT_REF:
@@ -18251,8 +18258,13 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 	 in suitable form.  Re-gimplifying would mark the address
 	 operand addressable.  Always gimplify when not in SSA form
 	 as we still may have to gimplify decls with value-exprs.  */
+	  tmp = TREE_OPERAND (*expr_p, 0);
 	  if (!gimplify_ctxp || !gimple_in_ssa_p (cfun)
-	  || !is_gimple_mem_ref_addr (TREE_OPERAND (*expr_p, 0)))
+	  || (!is_gimple_mem_ref_addr (tmp)
+		  || (TREE_CODE (tmp) == ADDR_EXPR
+		  && (VAR_P (TREE_OPERAND (tmp, 0))
+			  || TREE_CODE (TREE_OPERAND (tmp, 0)) == PARM_DECL)
+		  && DECL_HAS_VALUE_EXPR_P (TREE_OPERAND (tmp, 0)
 	{
 	  ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p,
    is_gimple_mem_ref_addr, fb_rvalue);
@@ -19422,7 +19434,7 @@ gimplify_body (tree fndecl, bool do_parms)
   DECL_SAVED_TREE (fndecl) = NULL_TREE;
 
   /* If we had callee-copies statements, insert them at the beginning
- of the function and clear DECL_VALUE_EXPR_P on the parameters.  */
+ of the function and clear DECL_HAS_VALUE_EXPR_P on the parameters.  */
   if (!gimple_seq_empty_p (parm_stmts))
 {
   tree parm;
diff --git a/libgomp/testsuite/libgomp.fortran/declare-target-link.f90 b/libgomp/testsuite/libgomp.fortran/declare-target-link.f90
index 2ce212d114f..44c67f925bd 100644
--- a/libgomp/testsuite/libgomp.fortran/declare-target-link.f90
+++ b/libgomp/testsuite/libgomp.fortran/declare-target-link.f90
@@ -1,5 +1,7 @@
 ! { dg-additional-options "-Wall" }
+
 ! PR fortran/115559
+! PR middle-end/115637
 
 module m
integer :: A
@@ -73,24 +75,19 @@ contains
 !$omp target map(from:res)
   res = run_device1()
 !$omp end target
-print *, res
-! FIXME: arr2 not link mapped -> PR115637
-! if (res /= -11436) stop 5
-if (res /= -11546) stop 5 ! FIXME
+! print *, res
+if (res /= -11436) stop 5
   end
   integer function run_device1()
 !$omp declare target
 integer :: i
 run_device1 = -99
-! FIXME: arr2 not link mapped -> PR115637
-!   arr2 = [11,22,33,44]
+arr2 = [11,22,33,44]
 if (any (arr(10:50) /= [(i, i=10,50)])) then
   run_device1 = arr(11)
   return
 end if
-! FIXME: -> PR115637
-! run_device1 = sum(arr(10:13) + arr2)
-run_device1 = sum(arr(10:13) ) ! FIXME
+run_device1 = sum(arr

Re: [PATCH] c++: generic lambda as default template argument [PR88313]

2024-07-29 Thread Jason Merrill

On 7/29/24 11:38 AM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this
look OK for trunk and perhaps 14.3?  It should only make a difference
for C++20 code where lambdas are permitted as template arguments.


OK for both.


-- >8 --

Here we're rejecting the generic lambda inside the default template
argument ultimately because auto_is_implicit_function_template_parm_p
doesn't get set during parsing of the lambda's parameter list, due
to the !processing_template_parmlist restriction.  But when parsing a
lambda parameter list we should always set that flag regardless of where
the lambda appears.  This patch makes sure this happens by way of a
local lambda_p flag.

PR c++/88313

gcc/cp/ChangeLog:

* parser.cc (cp_parser_lambda_declarator_opt): Pass
lambda_p=true to cp_parser_parameter_declaration_clause.
(cp_parser_direct_declarator): Pass lambda_p=false to
cp_parser_parameter_declaration_clause.
(cp_parser_parameter_declaration_clause): Add bool lambda_p
parameter.  Consider lambda_p instead of current_class_type
when setting parser->auto_is_implicit_function_template_parm_p.
Don't consider processing_template_parmlist.
(cp_parser_requirement_parameter_list): Pass lambda_p=false
to cp_parser_parameter_declaration_clause.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-targ6.C: New test.
---
  gcc/cp/parser.cc  | 34 +--
  gcc/testsuite/g++.dg/cpp2a/lambda-targ6.C | 11 
  2 files changed, 31 insertions(+), 14 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ6.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index f79736c17ac..f5336eae74a 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -2600,7 +2600,7 @@ static tree cp_parser_type_id_1
  static void cp_parser_type_specifier_seq
(cp_parser *, cp_parser_flags, bool, bool, cp_decl_specifier_seq *);
  static tree cp_parser_parameter_declaration_clause
-  (cp_parser *, cp_parser_flags);
+  (cp_parser *, cp_parser_flags, bool);
  static tree cp_parser_parameter_declaration_list
(cp_parser *, cp_parser_flags, auto_vec *);
  static cp_parameter_declarator *cp_parser_parameter_declaration
@@ -11889,7 +11889,7 @@ cp_parser_lambda_declarator_opt (cp_parser* parser, tree lambda_expr)
/* Parse parameters.  */
param_list
= cp_parser_parameter_declaration_clause
-   (parser, CP_PARSER_FLAGS_TYPENAME_OPTIONAL);
+   (parser, CP_PARSER_FLAGS_TYPENAME_OPTIONAL, /*lambda_p=*/true);
  
/* Default arguments shall not be specified in the
 parameter-declaration-clause of a lambda-declarator.  */
@@ -24097,7 +24097,8 @@ cp_parser_direct_declarator (cp_parser* parser,
  
  	  /* Parse the parameter-declaration-clause.  */
  params
-   = cp_parser_parameter_declaration_clause (parser, flags);
+   = cp_parser_parameter_declaration_clause (parser, flags,
+ /*lambda=*/false);
  const location_t parens_end
= cp_lexer_peek_token (parser->lexer)->location;
  
@@ -25444,13 +25445,17 @@ function_being_declared_is_template_p (cp_parser* parser)
  
 The parser flags FLAGS is used to control type-specifier parsing.
  
+   LAMBDA_P is true if this is the parameter-declaration-clause of
+   a lambda-declarator.
+
 Returns a representation for the parameter declarations.  A return
 value of NULL indicates a parameter-declaration-clause consisting
 only of an ellipsis.  */
  
  static tree
  cp_parser_parameter_declaration_clause (cp_parser* parser,
-   cp_parser_flags flags)
+   cp_parser_flags flags,
+   bool lambda_p)
  {
tree parameters;
cp_token *token;
@@ -25459,15 +25464,15 @@ cp_parser_parameter_declaration_clause (cp_parser* parser,
auto cleanup = make_temp_override
  (parser->auto_is_implicit_function_template_parm_p);
  
-  if (!processing_specialization
-  && !processing_template_parmlist
-  && !processing_explicit_instantiation
-  /* default_arg_ok_p tracks whether this is a parameter-clause for an
- actual function or a random abstract declarator.  */
-  && parser->default_arg_ok_p)
-if (!current_function_decl
-   || (current_class_type && LAMBDA_TYPE_P (current_class_type)))
-  parser->auto_is_implicit_function_template_parm_p = true;
+  if (lambda_p
+  || (!processing_specialization
+ && !processing_template_parmlist
+ && !processing_explicit_instantiation
+ /* default_arg_ok_p tracks whether this is a parameter-clause for an
+actual function or a random abstract declarator.  */
+ && parser->default_arg_ok_p
+ && !current_function_decl))
+parser->a

Re: [PATCH] c++: make BUILTIN_SOURCE_LOCATION follow DECL_RAMP_FN

2024-07-29 Thread Jason Merrill
I don't know what all-caps BUILTIN_SOURCE_LOCATION refers to 
specifically (it doesn't match CP_BUILT_IN_SOURCE_LOCATION, for 
instance); let's just refer to source_location.


On 7/29/24 8:20 AM, Arsen Arsenović wrote:

This fixes the value of current_function in compiler-generated coroutine
code.

PR c++/110855 - std::source_location doesn't work with C++20 coroutine

gcc/cp/ChangeLog:

PR c++/110855
* cp-gimplify.cc (fold_builtin_source_location): Use the name of
the DECL_RAMP_FN of the current function if present.

gcc/testsuite/ChangeLog:

PR c++/110855
* g++.dg/coroutines/pr110855.C: New test.
---
Tested on x86_64-pc-linux-gnu.

OK for trunk?

TIA, have a lovely day.


Again, best to put stuff like this that doesn't go in the commit message 
first, followed by scissors (-- 8< --), then the commit message.


The patch is OK.


  gcc/cp/cp-gimplify.cc  |  9 +++-
  gcc/testsuite/g++.dg/coroutines/pr110855.C | 61 ++
  2 files changed, 69 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/coroutines/pr110855.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index e6629dea5fdc..651751312fbe 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -3929,7 +3929,14 @@ fold_builtin_source_location (const_tree t)
  const char *name = "";
  
  	  if (current_function_decl)
-   name = cxx_printable_name (current_function_decl, 2);
+   {
+ /* If this is a coroutine, we should get the name of the user
+function rather than the actor we generate.  */
+ if (tree ramp = DECL_RAMP_FN (current_function_decl))
+   name = cxx_printable_name (ramp, 2);
+ else
+   name = cxx_printable_name (current_function_decl, 2);
+   }
  
  	  val = build_string_literal (name);
}
diff --git a/gcc/testsuite/g++.dg/coroutines/pr110855.C b/gcc/testsuite/g++.dg/coroutines/pr110855.C
new file mode 100644
index ..6b5c0147ec83
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/pr110855.C
@@ -0,0 +1,61 @@
+// { dg-do run }
+// { dg-output {^} }
+// { dg-output {ReturnObject bar\(int, char, bool\)(\n|\r\n|\r)} }
+// { dg-output {ReturnObject bar\(int, char, bool\)(\n|\r\n|\r)} }
+// { dg-output {ReturnObject bar\(int, char, bool\)(\n|\r\n|\r)} }
+// { dg-output {ReturnObject bar\(int, char, bool\)(\n|\r\n|\r)} }
+// { dg-output {ReturnObject bar\(int, char, bool\)(\n|\r\n|\r)} }
+// { dg-output {$} }
+// https://gcc.gnu.org/PR110855
+#include 
+#include 
+
+struct ReturnObject {
+  struct promise_type {
+auto
+initial_suspend(const std::source_location location =
+std::source_location::current()) {
+  __builtin_puts (location.function_name ());
+  return std::suspend_never{};
+}
+auto
+final_suspend(const std::source_location location =
+  std::source_location::current()) noexcept {
+  __builtin_puts (location.function_name ());
+  return std::suspend_never{};
+}
+auto
+get_return_object(const std::source_location location =
+  std::source_location::current()) {
+  __builtin_puts (location.function_name ());
+  return ReturnObject{std::coroutine_handle::from_promise(*this)};
+}
+auto
+unhandled_exception() { }
+auto return_void(const std::source_location location =
+ std::source_location::current()) {
+  __builtin_puts (location.function_name ());
+}
+  };
+  std::coroutine_handle<> handle;
+};
+
+struct awaitable : std::suspend_never
+{
+  void await_resume(const std::source_location location =
+ std::source_location::current())
+  {
+  __builtin_puts (location.function_name ());
+  }
+};
+
+ReturnObject
+bar(int, char, bool) {
+  co_await awaitable{};
+  co_return;
+}
+
+int
+main() {
+  bar(1, 'a', false);
+}



