date:20240923

Re: [PATCH] Fortran: Added support for locality specs in DO CONCURRENT (Fortran 2018/23)

2024-09-23 Thread Andre Vehreschild

Hi Anuj,

please check the code style of your patch using:

contrib/check_GNU_style.py 

It reports several errors with line length and formatting.

Could you also please specify the commit SHA your patch is supposed to apply
to? At current mainline's HEAD it has several rejects which makes reviewing
harder.

And please attach the patch as plain text. It is html-encoded with several
html-codes, for example a '>' is encoded as '&gt;'. This makes it nearly
impossible to apply.

Therefore not good for mainline yet.
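
For illustration only: regenerating the patch as plain text (for example with git format-patch) is the cleaner fix, but the encoding can also be undone with a small script. The sketch below uses Python's standard html module; the helper name and the sample line are made up and are not taken from the patch.

```python
import html


def decode_html_entities(patch_text: str) -> str:
    """Return patch_text with HTML character references such as &gt; decoded."""
    return html.unescape(patch_text)


# Hypothetical sample line, not copied from the actual patch.
encoded_line = "+  if (c-&gt;ext.concur.forall_iterator &amp;&amp; n &gt; 0)"
print(decode_html_entities(encoded_line))
```

A patch cleaned this way should still be checked with git apply --check before posting, since whitespace damage is not repaired by decoding entities.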

- Andre



On Sun, 22 Sep 2024 11:49:28 +0530
Anuj Mohite  wrote:

> gcc/fortran/ChangeLog:
>
>   * dump-parse-tree.cc (show_code_node): Updated to use
>   c->ext.concur.forall_iterator instead of c->ext.forall_iterator.
>   Added support for dumping DO CONCURRENT locality specifiers.
>   * frontend-passes.cc (index_interchange, gfc_code_walker): Updated to
>   use c->ext.concur.forall_iterator instead of c->ext.forall_iterator.
>   * gfortran.h (enum locality_type): Added new enum for locality types
>   in DO CONCURRENT constructs.
>   * match.cc (match_simple_forall, gfc_match_forall): Updated to use
>   new_st.ext.concur.forall_iterator instead of
> new_st.ext.forall_iterator. (gfc_match_do): Implemented support for matching
> DO CONCURRENT locality specifiers (LOCAL, LOCAL_INIT, SHARED, DEFAULT(NONE),
> and REDUCE).
>   * parse.cc (parse_do_block): Updated to use
>   new_st.ext.concur.forall_iterator instead of
> new_st.ext.forall_iterator.
>   * resolve.cc: Added struct check_default_none_data.
>   (do_concur_locality_specs_f2023): New function to check compliance
>   with F2023's C1133 constraint for DO CONCURRENT.
>   (check_default_none_expr): New function to check DEFAULT(NONE)
>   compliance.
>   (resolve_locality_spec): New function to resolve locality specs.
>   (gfc_count_forall_iterators): Updated to use
>   code->ext.concur.forall_iterator.
>   (gfc_resolve_forall): Updated to use code->ext.concur.forall_iterator.
>   * st.cc (gfc_free_statement): Updated to free locality specifications
>   and use p->ext.concur.forall_iterator.
>   * trans-stmt.cc (gfc_trans_forall_1): Updated to use
>   code->ext.concur.forall_iterator.
>
> gcc/testsuite/ChangeLog:
>
>   * gfortran.dg/do_concurrent_10.f90: New test for parsing DO CONCURRENT
>   with 'concurrent' as a variable name.
>   * gfortran.dg/do_concurrent_8_f2018.f90: New test for F2018 DO
>   CONCURRENT with nested loops and REDUCE clauses.
>   * gfortran.dg/do_concurrent_8_f2023.f90: New test for F2023 DO
>   CONCURRENT with nested loops and REDUCE clauses.
>   * gfortran.dg/do_concurrent_9.f90: New test for DO CONCURRENT with
>   DEFAULT(NONE) and locality specs.
>   * gfortran.dg/do_concurrent_all_clauses.f90: New test covering all DO
>   CONCURRENT clauses and their interactions.
>   * gfortran.dg/do_concurrent_basic.f90: New basic test for DO
> CONCURRENT functionality.
>   * gfortran.dg/do_concurrent_constraints.f90: New test for constraints
>   on DO CONCURRENT locality specs.
>   * gfortran.dg/do_concurrent_local_init.f90: New test for LOCAL_INIT
>   clause in DO CONCURRENT.
>   * gfortran.dg/do_concurrent_locality_specs.f90: New test for DO
>   CONCURRENT with locality specs.
>   * gfortran.dg/do_concurrent_multiple_reduce.f90: New test for multiple
>   REDUCE clauses in DO CONCURRENT.
>   * gfortran.dg/do_concurrent_nested.f90: New test for nested DO
>   CONCURRENT loops.
>   * gfortran.dg/do_concurrent_parser.f90: New test for DO CONCURRENT
>   parser error handling.
>   * gfortran.dg/do_concurrent_reduce_max.f90: New test for REDUCE with
>   MAX operation in DO CONCURRENT.
>   * gfortran.dg/do_concurrent_reduce_sum.f90: New test for REDUCE with
>   sum operation in DO CONCURRENT.
>   * gfortran.dg/do_concurrent_shared.f90: New test for SHARED clause in
>   DO CONCURRENT.
>
> Signed-off-by: Anuj 
> ---
>  gcc/fortran/dump-parse-tree.cc| 113 +-
>  gcc/fortran/frontend-passes.cc|   8 +-
>  gcc/fortran/gfortran.h|  20 +-
>  gcc/fortran/match.cc  | 286 +-
>  gcc/fortran/parse.cc  |   2 +-
>  gcc/fortran/resolve.cc| 354 +-
>  gcc/fortran/st.cc |   5 +-
>  gcc/fortran/trans-stmt.cc |   6 +-
>  .../gfortran.dg/do_concurrent_10.f90  |  11 +
>  .../gfortran.dg/do_concurrent_8_f2018.f90 |  19 +
>  .../gfortran.dg/do_concurrent_8_f2023.f90 |  23 ++
>  gcc/testsuite/gfortran.dg/do_concurrent_9.f90 |  15 +
>  .../gfortran.dg/do_concurrent_all_clauses.f90 |  26 ++
>  .../gfortran.dg/do_concurrent_basic.f90   |  11 +
>  .../gfortran.dg/do_concurrent_constraints.f90 | 126 +++
>  .../gfortran.dg/do_concurrent_lo

[PING]: [PATCH 1/1] config: Handle dash in library name for AC_LIB_LINKAGEFLAGS_BODY

2024-09-23 Thread Ijaz, Abdul B

https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656541.html

Best Regards
Abdul Basit

-Original Message-
From: Ijaz, Abdul B  
Sent: Sunday, July 7, 2024 9:45 PM
To: Ijaz, Abdul B 
Subject: [PATCH 1/1] config: Handle dash in library name for 
AC_LIB_LINKAGEFLAGS_BODY

From: "Ijaz, Abdul B" 

For a library with a dash in its name, such as yaml-cpp, the
AC_LIB_LINKFLAGS_BODY macro generates a with_libname_type variable name
containing a dash, which results in a configure error because dashes are
not allowed in shell variable names.

This change handles such cases: if the library name passed to
AC_LIB_HAVE_LINKFLAGS contains a dash, it is replaced with an underscore ("_").

Example of an error for yaml-cpp library before the change using gcc config 
scripts in gdb:
gdb/gdb/configure: line 22868: with_libyaml-cpp_type=auto: command not found

After using an underscore in this variable name:

checking whether to use yaml-cpp... yes
checking for libyaml-cpp... yes
checking how to link with libyaml-cpp... -lyaml-cpp
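
The name mapping can be modelled outside of m4. The following is a minimal Python sketch that assumes the same character set as the translit call in the patch ('.', '/' and '-' all become '_'); the function names are hypothetical and only illustrate how the shell variable name is derived.

```python
def sanitize_lib_name(name: str) -> str:
    """Map '.', '/' and '-' to '_', mirroring translit([$1],[./-],[___])."""
    return name.translate(str.maketrans("./-", "___"))


def with_type_variable(name: str) -> str:
    """Build the shell variable name that configure assigns to."""
    return f"with_lib{sanitize_lib_name(name)}_type"


print(with_type_variable("yaml-cpp"))
```

Only the variable name is sanitized; the option name (--with-libyaml-cpp-type) and the linker flag (-lyaml-cpp) keep the dash.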

config/ChangeLog:

* lib-link.m4: Handle dash in the library name for
AC_LIB_LINKFLAGS_BODY.

2024-07-03 Ijaz, Abdul B 
---
 config/lib-link.m4 | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/config/lib-link.m4 b/config/lib-link.m4
index 20e281fd323..a60a8069453 100644
--- a/config/lib-link.m4
+++ b/config/lib-link.m4
@@ -126,6 +126,7 @@ AC_DEFUN([AC_LIB_LINKFLAGS_BODY],  [
   define([NAME],[translit([$1],[abcdefghijklmnopqrstuvwxyz./-],
[ABCDEFGHIJKLMNOPQRSTUVWXYZ___])])
+  define([Name],[translit([$1],[./-], [___])])
   dnl By default, look in $includedir and $libdir.
   use_additional=yes
   AC_LIB_WITH_FINAL_PREFIX([
@@ -152,8 +153,8 @@ AC_DEFUN([AC_LIB_LINKFLAGS_BODY],
 ])
   AC_LIB_ARG_WITH([lib$1-type],
 [  --with-lib$1-type=TYPE type of library to search for 
(auto/static/shared) ],
-  [ with_lib$1_type=$withval ], [ with_lib$1_type=auto ])
-  lib_type=`eval echo \$with_lib$1_type`
+  [ with_lib[]Name[]_type=$withval ], [ with_lib[]Name[]_type=auto ])
+  lib_type=`eval echo \$with_lib[]Name[]_type`
 
   dnl Search the library and its dependencies in $additional_libdir and
   dnl $LDFLAGS. Using breadth-first-seach.
--
2.34.1

Intel Deutschland GmbH
Registered Address: Am Campeon 10, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Sean Fennelly, Jeffrey Schneiderman, Tiffany Doon Silva
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928

RE: [nvptx] Fix code-gen for alias attribute

2024-09-23 Thread Prathamesh Kulkarni


> -Original Message-
> From: Thomas Schwinge 
> Sent: Wednesday, September 4, 2024 3:15 PM
> To: Prathamesh Kulkarni ; Jan Hubicka
> ; gcc-patches@gcc.gnu.org
> Subject: Re: [nvptx] Fix code-gen for alias attribute
> 
> External email: Use caution opening links or attachments
> 
> 
> Hi!
> 
> Honza (or others, of course), there's a question about
> 'ultimate_alias_target'.
> 
> On 2024-08-26T10:50:36+, Prathamesh Kulkarni
>  wrote:
> > For the following test (adapted from pr96390.c):
> >
> > __attribute__((noipa)) int foo () { return 42; } int bar ()
> > __attribute__((alias ("foo"))); int baz () __attribute__((alias
> > ("bar")));
> 
> > Compiling [for nvptx] results in:
> >
> > ptxas fatal   : Internal error: alias to unknown symbol
> > nvptx-as: ptxas returned 255 exit status
> 
> Prathamesh: thanks for looking into this, and ACK: one of the many
> limitations of PTX '.alias'.  :-|
> 
> > This happens because ptx code-gen shows:
> >
> > // BEGIN GLOBAL FUNCTION DEF: foo
> > .visible .func (.param.u32 %value_out) foo {
> >   [...]
> > }
> > .visible .func (.param.u32 %value_out) bar; .alias bar,foo; .visible
> > .func (.param.u32 %value_out) baz; .alias baz,bar;
> 
> > .alias baz, bar is invalid since PTX requires aliasee to be a defined
> function:
> > https://sw-docs-dgx-station.nvidia.com/cuda-latest/parallel-thread-exe
> > cution/latest-internal/#kernel-and-function-directives-alias
> 
> (Us ordinary mortals need to look at
>  function-directives-alias>;
> please update the Git commit log.)
> 
> > The patch uses cgraph_node::get(name)->ultimate_alias_target ()
> instead of the provided value in nvptx_asm_output_def_from_decls.
> 
> I confirm that resolving to 'ultimate_alias_target' does work for this
> case:
> 
> > For the above case, it now generates the following ptx:
> >
> > .alias baz,foo;
> > instead of:
> > .alias baz,bar;
> >
> > which fixes the issue.
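
The effect of resolving to the final aliasee can be sketched with a plain dictionary. This is a toy model in Python and not GCC's ultimate_alias_target: it ignores weak aliases and visibility, and the function name is hypothetical.

```python
def ultimate_target(aliases: dict, name: str) -> str:
    """Follow alias -> aliasee links until a name that is not an alias is reached."""
    seen = set()
    while name in aliases:
        if name in seen:
            raise ValueError(f"alias cycle involving {name!r}")
        seen.add(name)
        name = aliases[name]
    return name


# The alias chain from the test case: baz -> bar -> foo.
aliases = {"bar": "foo", "baz": "bar"}
for alias in ("bar", "baz"):
    print(f".alias {alias},{ultimate_target(aliases, alias)};")
```

In this model both directives name foo, the only symbol that has a definition, which is the property PTX requires of an aliasee.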
> 
> ..., but I'm not sure if that's conceptually correct; I'm not familiar
> with 'ultimate_alias_target' semantics.  (Honza?)
> 
> Also, I wonder whether 'gcc/varasm.cc:do_assemble_alias' is prepared for
> 'ASM_OUTPUT_DEF_FROM_DECLS' to disregard the specified 'target'/'value'
> and instead do its own thing (here, the proposed resolving to
> 'ultimate_alias_target')?  (No other GCC back end appears to be doing
> such a thing; from a quick look, all appear to faithfully use the
> specified 'target'/'value'.)
> 
> Now, consider the case that the source code is changed as follows:
> 
>  __attribute__((noipa)) int foo () { return 42; }
> -int bar () __attribute__((alias ("foo")));
> +int bar () __attribute__((weak, alias ("foo")));
>  int baz () __attribute__((alias ("bar")));
> 
> With 'ultimate_alias_target', I've checked, you'd then still emit
> '.alias baz,foo;', losing the ability to override the weak alias with a
> strong 'bar' definition in another compilation unit?
> 
> Now, that said: GCC/nvptx for such code currently diagnoses
> "error: weak alias definitions not supported [...]" ;-| -- so we may be
> safe, after all?  ..., or is there any other way that the resolving to
> 'ultimate_alias_target' might cause issues?  If not, then at least your
> proposed patch shouldn't be causing any harm (doesn't affect '--
> target=nvptx-none' test results at all...), and does address one user-
> visible issue ('libgomp.c-c++-common/pr96390.c'), and thus makes sense
> to install.
> 
> > [nvptx] Fix code-gen for alias attribute.
> 
> I'd rather suggest something like:
> "[nvptx] (Some) support for aliases to aliases" (or similar).
> 
> Also, please add "PR target/104957" to the Git commit log, as your
> change directly alters this one aspect of PR104957 "[nvptx] Use .alias
> directive (available starting ptx isa version 6.3)"'s commit r12-7766-
> gf8b15e177155960017ac0c5daef8780d1127f91c
> "[nvptx] Use .alias directive for mptx >= 6.3":
> 
> | Aliases to aliases are not supported (see libgomp.c-c++-
> common/pr96390.c).
> | This is currently not prohibited by the compiler, but with the driver
> | link we run into:  "Internal error: alias to unknown symbol" .
> 
> ... which we then have (some) support for with the proposed code
> changes:
> 
> > --- a/gcc/config/nvptx/nvptx.cc
> > +++ b/gcc/config/nvptx/nvptx.cc
> > @@ -7583,7 +7583,8 @@ nvptx_mem_local_p (rtx mem)
> >while (0)
> >
> >  void
> > -nvptx_asm_output_def_from_decls (FILE *stream, tree name, tree value)
> > +nvptx_asm_output_def_from_decls (FILE *stream, tree name,
> > +  tree value ATTRIBUTE_UNUSED)
> >  {
> >if (nvptx_alias == 0 || !TARGET_PTX_6_3)
> >  {
> > @@ -7618,7 +7619,8 @@ nvptx_asm_output_def_from_decls (FILE *stream,
> tree name, tree value)
> >return;
> >  }
> >
> > -  if (!cgraph_node::get (name)->referred_to_p ())
> > +  cgraph_node *cnode = cgraph_node::get (name);
> > +  if (!cnode->referred_to_p ())
> >  /* Prevent "Inte

[PATCH] tree-optimization/116791 - Elementwise SLP vectorization

2024-09-23 Thread Richard Biener

The following restricts the elementwise SLP vectorization to the
single-lane case, which is the reason I enabled it: to avoid regressions
with non-SLP.  The PR shows that multi-lane SLP loads with elementwise
accesses require work; I'll open a new bug to track this for the
future.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

Richard.

PR tree-optimization/116791
* tree-vect-stmts.cc (get_group_load_store_type): Only
fall back to elementwise access for single-lane SLP, restore
hard failure mode for other cases.

* gcc.dg/vect/pr116791.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr116791.c | 20 
 gcc/tree-vect-stmts.cc   | 23 +--
 2 files changed, 37 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr116791.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr116791.c b/gcc/testsuite/gcc.dg/vect/pr116791.c
new file mode 100644
index 00000000000..d9700a88fcc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr116791.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-mavx2" { target avx2 } } */
+
+struct nine_context {
+  unsigned tex_stage[8][33];
+};
+struct fvec4 {
+  float x[2];
+};
+void f(struct fvec4 *dst, struct nine_context *context)
+{
+  unsigned s;
+  for (s = 0; s < 8; ++s)
+{
+  float *rgba = &dst[s].x[0];
+  unsigned color = context->tex_stage[s][0];
+  rgba[0] = (float)((color >> 16) & 0xFF) / 0xFF;
+  rgba[1] = (float)((color >> 8) & 0xFF) / 0xFF;
+}
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a7032c21d66..2e85b683789 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2192,12 +2192,23 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
  && single_element_p
  && maybe_gt (group_size, TYPE_VECTOR_SUBPARTS (vectype)))
{
- *memory_access_type = VMAT_ELEMENTWISE;
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"single-element interleaving not supported "
-"for not adjacent vector loads, using "
-"elementwise access\n");
+ if (SLP_TREE_LANES (slp_node) == 1)
+   {
+ *memory_access_type = VMAT_ELEMENTWISE;
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"single-element interleaving not supported "
+"for not adjacent vector loads, using "
+"elementwise access\n");
+   }
+ else
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"single-element interleaving not supported "
+"for not adjacent vector loads\n");
+ return false;
+   }
}
}
 }
-- 
2.43.0

Re: [COMMITTED] testsuite: debug: fix dejagnu directive syntax

2024-09-23 Thread Sam James

Thomas Schwinge  writes:

> Hi Andrew, Sam!
>
> On 2024-09-20T14:21:33-0700, Andrew Pinski  wrote:
>> On Fri, Sep 20, 2024 at 1:53 AM Thomas Schwinge  
>> wrote:
>>> On 2024-09-20T05:12:19+0100, Sam James  wrote:
>>> > In this case, they were all harmless in reality (no diff in test logs).
>>>
>>> > -/* { dg-do compile )  */
>>> > +/* { dg-do compile } */
>>>
>>> DejaGnu directives are matched by '{ dg-[...] }' (simplified; see
>>> '/usr/share/dejagnu/dg.exp:dg-get-options' for the details), so your
>>> changes did not "fix dejagnu directive syntax", but rather fix whitespace
>>> around DejaGnu directives.  ;-P (Thanks either way!)
>>
>> I think you missed that in these cases it was `)` vs `}` . yes some
>> fonts it is sometimes hard to tell especially it is on different lines
>> but they are different characters.
>
> Indeed, you're absolutely right, thanks for pointing that out, Andrew!
> Sam, I apologize -- you did "fix dejagnu directive syntax" after all!
> I'll blame it 1/3 on the font display (..., but now I do see it, of
> course...), 1/3 on me not paying careful attention, and 1/3 on me not
> paying careful attention due to having a cold after return from the
> GNU Tools Cauldron 2024 yet trying to be funny.
>
> ..., and now let me please crawl back under my stone, and hide in shame.

No, I appreciate people checking carefully! Just keep this in mind for
when I inevitably make a mistake next and go easy on me ;)

Cheers!
sam
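
To illustrate why the ')' matters: a directive is only recognized when the braces pair up. The Python sketch below uses a simplified stand-in for the pattern in dg.exp (an assumption; the real Tcl expression is in dg-get-options and has more detail), and the helper name is hypothetical.

```python
import re

# Simplified stand-in for the DejaGnu directive pattern (assumption):
# "{", whitespace, "dg-<name>", whitespace, arguments, whitespace, "}".
DIRECTIVE = re.compile(r"\{[ \t]+(dg-[-a-z]+)[ \t]+(.*?)[ \t]+\}")


def find_directive(line: str):
    """Return (name, arguments) for a directive on the line, or None."""
    match = DIRECTIVE.search(line)
    return (match.group(1), match.group(2)) if match else None


print(find_directive("/* { dg-do compile } */"))
print(find_directive("/* { dg-do compile )  */"))
```

With this simplified pattern the second line yields no directive at all, because it has no closing brace.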

Re: [PATCH] gdbhooks: Handle references to vec* in VecPrinter

2024-09-23 Thread Alex Coplan

On 30/08/2024 18:11, Alex Coplan wrote:
> Hi,
> 
> vec.h has this method:
> 
>   template
>   inline T *
>   vec_safe_push (vec *&v, const T &obj CXX_MEM_STAT_INFO)
> 
> where v is a reference to a pointer to vec.  This matches the regex for
> VecPrinter, so gdbhooks.py attempts to print it but chokes on the reference.
> I see the following:
> 
>   #1  0x02b84b7b in vec_safe_push (v=Traceback (most
>   recent call last):
> File "$SRC/gcc/gcc/gdbhooks.py", line 486, in to_string
>   return '0x%x' % intptr(self.gdbval)
> File "$SRC/gcc/gcc/gdbhooks.py", line 168, in intptr
>   return long(gdbval) if sys.version_info.major == 2 else int(gdbval)
>   gdb.error: Cannot convert value to long.
> 
> This patch makes VecPrinter handle such references by stripping them
> (dereferencing) at the top of the relevant functions.
> 
> I thought about trying to make VecPrinter.{to_string,children} robust
> against non-pointer values (i.e. actual vec structs) as the current
> calls to intptr will fail on those.  However, I then realised that the
> current regex only matches pointer types:
> 
>   pp.add_printer_for_regex(r'vec<(\S+), (\S+), (\S+)> \*',
>'vec',
>VecPrinter)
> 
> That is somewhat at odds with the (pre-existing) code in
> VecPrinter.children which appears to attempt to handle non-pointer
> types.  ISTM either we should drop the handling for non-pointer types
> (since the regex requires a pointer) or (perhaps more usefully) relax
> the regex to allow matching a plain vec<...> struct and fix the member
> functions to handle those properly.
> 
> Any thoughts on that, Dave?  Is the current patch OK as an intermediate
> step (manually tested by verifying both a vec*& and vec* print OK)?

Gentle ping on this.

> 
> Thanks,
> Alex
> 
> gcc/ChangeLog:
> 
>   * gdbhooks.py (strip_ref): New. Use it ...
>   (VecPrinter.to_string): ... here,
>   (VecPrinter.children): ... and here.

> diff --git a/gcc/gdbhooks.py b/gcc/gdbhooks.py
> index 904ee28423a..a91e5fd2a83 100644
> --- a/gcc/gdbhooks.py
> +++ b/gcc/gdbhooks.py
> @@ -472,6 +472,11 @@ def get_vec_kind(val):
>  else:
>  assert False, f"unexpected vec kind {kind}"
>  
> +def strip_ref(gdbval):
> +if gdbval.type.code == gdb.TYPE_CODE_REF:
> +return gdbval.referenced_value ()
> +return gdbval
> +
>  class VecPrinter:
>  #-ex "up" -ex "p bb->preds"
>  def __init__(self, gdbval):
> @@ -483,10 +488,10 @@ class VecPrinter:
>  def to_string (self):
>  # A trivial implementation; prettyprinting the contents is done
>  # by gdb calling the "children" method below.
> -return '0x%x' % intptr(self.gdbval)
> +return '0x%x' % intptr(strip_ref(self.gdbval))
>  
>  def children (self):
> -val = self.gdbval
> +val = strip_ref(self.gdbval)
>  if intptr(val) != 0 and get_vec_kind(val) == VEC_KIND_PTR:
>  val = val['m_vec']
>
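
The behaviour of strip_ref can be tried outside of gdb with stand-in objects. The sketch below is Python with a fake value class; FakeValue, the type-code constants and the address are all made up for the demonstration, and only the shape of strip_ref follows the patch.

```python
from types import SimpleNamespace

# Stand-ins for gdb.TYPE_CODE_PTR / gdb.TYPE_CODE_REF; the values are arbitrary.
TYPE_CODE_PTR = 1
TYPE_CODE_REF = 2


class FakeValue:
    """Tiny imitation of the parts of gdb.Value used here (hypothetical)."""

    def __init__(self, code, target=None, address=0):
        self.type = SimpleNamespace(code=code)
        self._target = target
        self._address = address

    def referenced_value(self):
        return self._target

    def __int__(self):
        # Imitate the failure from the traceback for reference values.
        if self.type.code == TYPE_CODE_REF:
            raise RuntimeError("Cannot convert value to long.")
        return self._address


def strip_ref(val):
    """Return the referenced value for a reference, else the value itself."""
    if val.type.code == TYPE_CODE_REF:
        return val.referenced_value()
    return val


ptr = FakeValue(TYPE_CODE_PTR, address=0x1234)
ref = FakeValue(TYPE_CODE_REF, target=ptr)
print("0x%x" % int(strip_ref(ref)))
print("0x%x" % int(strip_ref(ptr)))
```

Both calls print the same pointer value, which matches the manual check described above (a vec*& and a vec* print alike).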

pair-fusion: Assume alias conflict if common address reg changes [PR116783]

2024-09-23 Thread Alex Coplan

Hi,

As the PR shows, pair-fusion was tricking memory_modified_in_insn_p into
returning false when a common base register (in this case, x1) was
modified between the mem and the store insn.  This led to wrong code as
the accesses really did alias.

To avoid this sort of problem, this patch avoids invoking RTL alias
analysis altogether (and assumes an alias conflict) if the two insns to
be compared share a common address register R, and the insns see different
definitions of R (i.e. it was modified in between).
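
As a rough model of that check, the Python sketch below walks two use lists that are sorted by register number and reports the first shared address register whose uses see different definitions. It is a simplification under stated assumptions: the Use tuple and its fields are hypothetical, the candidate list is assumed to hold only address uses, uses that occur only in notes are not modelled, and both cursors are advanced on a harmless match.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for a register use record.
Use = namedtuple("Use", ["regno", "def_id", "in_address"])


def addr_reg_conflict(cand_uses, insn_uses):
    """Return the regno of a shared address register with differing defs, else None.

    Both lists must be sorted by regno.
    """
    i = j = 0
    while i < len(cand_uses) and j < len(insn_uses):
        cand, cur = cand_uses[i], insn_uses[j]
        if cur.regno > cand.regno:
            i += 1
        elif cur.regno < cand.regno:
            j += 1
        else:
            # Same register: conflict if it is used in an address and the
            # two insns see different definitions of it.
            if cur.in_address and cur.def_id != cand.def_id:
                return cur.regno
            i += 1
            j += 1
    return None


cand = [Use(regno=1, def_id=10, in_address=True)]
walked = [Use(regno=0, def_id=5, in_address=False),
          Use(regno=1, def_id=11, in_address=True)]
print(addr_reg_conflict(cand, walked))
```

When a register number is returned, the caller would assume an alias conflict instead of asking the alias oracle.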

Bootstrapped/regtested on aarch64-linux-gnu (all languages, both regular
bootstrap and LTO+PGO bootstrap).  OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

PR rtl-optimization/116783
* pair-fusion.cc (def_walker::cand_addr_uses): New.
(def_walker::def_walker): Add parameter for candidate address
uses.
(def_walker::alias_conflict_p): Declare.
(def_walker::addr_reg_conflict_p): New.
(def_walker::conflict_p): New.
(store_walker::store_walker): Add parameter for candidate
address uses and pass to base ctor.
(store_walker::conflict_p): Rename to ...
(store_walker::alias_conflict_p): ... this.
(load_walker::load_walker): Add parameter for candidate
address uses and pass to base ctor.
(load_walker::conflict_p): Rename to ...
(load_walker::alias_conflict_p): ... this.
(pair_fusion_bb_info::try_fuse_pair): Collect address register
uses for candidate insns and pass down to alias walkers.

gcc/testsuite/ChangeLog:

PR rtl-optimization/116783
* g++.dg/torture/pr116783.C: New test.
diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc
index cb0374f426b..b1ea611bacd 100644
--- a/gcc/pair-fusion.cc
+++ b/gcc/pair-fusion.cc
@@ -2089,11 +2089,80 @@ protected:
 
   def_iter_t def_iter;
   insn_info *limit;
-  def_walker (def_info *def, insn_info *limit) :
-def_iter (def), limit (limit) {}
+
+  // Array of register uses from the candidate insn which occur in MEMs.
+  use_array cand_addr_uses;
+
+  def_walker (def_info *def, insn_info *limit, use_array addr_uses) :
+def_iter (def), limit (limit), cand_addr_uses (addr_uses) {}
 
   virtual bool iter_valid () const { return *def_iter; }
 
+  // Implemented in {load,store}_walker.
+  virtual bool alias_conflict_p (int &budget) const = 0;
+
+  // Return true if the current (walking) INSN () uses a register R inside a
+  // MEM, where R is also used inside a MEM by the (static) candidate insn, and
+  // those uses see different definitions of that register.  In this case we
+  // can't rely on RTL alias analysis, and for now we conservatively assume that
+  // there is an alias conflict.  See PR116783.
+  bool addr_reg_conflict_p () const
+  {
+use_array curr_insn_uses = insn ()->uses ();
+auto cand_use_iter = cand_addr_uses.begin ();
+auto insn_use_iter = curr_insn_uses.begin ();
+while (cand_use_iter != cand_addr_uses.end ()
+  && insn_use_iter != curr_insn_uses.end ())
+  {
+   auto insn_use = *insn_use_iter;
+   auto cand_use = *cand_use_iter;
+   if (insn_use->regno () > cand_use->regno ())
+ cand_use_iter++;
+   else if (insn_use->regno () < cand_use->regno ())
+ insn_use_iter++;
+   else
+ {
+   // As it stands I believe the alias code (memory_modified_in_insn_p)
+   // doesn't look at insn notes such as REG_EQU{IV,AL}, so it should
+   // be safe to skip over uses that only occur in notes.
+   if (insn_use->includes_address_uses ()
+   && !insn_use->only_occurs_in_notes ()
+   && insn_use->def () != cand_use->def ())
+ {
+   if (dump_file)
+ {
+   fprintf (dump_file,
+"assuming aliasing of cand i%d and i%d:\n"
+"-> insns see different defs of common addr reg r%u\n"
+"-> ",
+cand_use->insn ()->uid (), insn_use->insn ()->uid (),
+insn_use->regno ());
+
+   // Note that while the following sequence could be made more
+   // concise by eliding pp_string calls into the pp_printf
+   // calls, doing so triggers -Wformat-diag.
+   pretty_printer pp;
+   pp_string (&pp, "[");
+   pp_access (&pp, cand_use, 0);
+   pp_string (&pp, "] in ");
+   pp_printf (&pp, "i%d", cand_use->insn ()->uid ());
+   pp_string (&pp, " vs [");
+   pp_access (&pp, insn_use, 0);
+   pp_string (&pp, "] in ");
+   pp_printf (&pp, "i%d", insn_use->insn ()->uid ());
+   fprintf (dump_file, "%s\n", pp_formatted_text (&pp));
+ }
+   return true;
+ }
+
+   cand_use_iter++;
+

[PATCH v2] c++: Don't ICE due to artificial constructor parameters [PR116722]

2024-09-23 Thread Simon Martin

Hi Jason,

On 20 Sep 2024, at 18:01, Jason Merrill wrote:

> On 9/20/24 5:21 PM, Simon Martin wrote:
>> The following code triggers an ICE
>>
>> === cut here ===
>> class base {};
>> class derived : virtual public base {
>> public:
>>template constexpr derived(Arg) {}
>> };
>> int main() {
>>derived obj(1.);
>> }
>> === cut here ===
>>
>> The problem is that cxx_bind_parameters_in_call ends up attempting to
>> convert a REAL_CST (the first non artificial parameter) to INTEGER_TYPE
>> (the type of the __in_chrg parameter), which ICEs.
>>
>> This patch teaches cxx_bind_parameters_in_call to handle the __in_chrg
>> and __vtt_parm parameters that {con,de}structors might have.
>>
>> Note that in the test case, the constructor is not constexpr-suitable,
>> however it's OK since it's a template according to my read of paragraph
>> (3) of [dcl.constexpr].
>
> Agreed.
>
> It looks like your patch doesn't correct the mismatching of arguments 
> to parameters that you describe, but at least for now it should be 
> enough to set *non_constant_p and return if we see a VTT or in-charge 
> parameter.
>
Thanks, it’s true that my initial patch was wrong in that we’d leave 
cxx_bind_parameters_in_call thinking the expression was actually a 
constant expression :-/

The attached revised patch follows your suggestion (thanks!). 
Successfully tested on x86_64-pc-linux-gnu. OK for trunk?

Thanks,
   Simon

From 12d818220d4addc76f0d8e1fdf8feba336fb9b04 Mon Sep 17 00:00:00 2001
From: Simon Martin 
Date: Wed, 18 Sep 2024 12:35:27 +0200
Subject: [PATCH] c++: Don't ICE due to artificial constructor parameters 
[PR116722]

The following code triggers an ICE

=== cut here ===
class base {};
class derived : virtual public base {
public:
  template constexpr derived(Arg) {}
};
int main() {
  derived obj(1.);
}
=== cut here ===

The problem is that cxx_bind_parameters_in_call ends up attempting to
convert a REAL_CST (the first non artificial parameter) to INTEGER_TYPE
(the type of the __in_chrg parameter), which ICEs.

This patch changes cxx_bind_parameters_in_call to return early if it's
called with a *structor that has an __in_chrg or __vtt_parm parameter
since the expression won't be a constant expression.

Note that in the test case, the constructor is not constexpr-suitable,
however it's OK since it's a template according to my read of paragraph
(3) of [dcl.constexpr].

Successfully tested on x86_64-pc-linux-gnu.

PR c++/116722

gcc/cp/ChangeLog:

* constexpr.cc (cxx_bind_parameters_in_call): Leave early for
{con,de}structors of classes with virtual bases.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-ctor22.C: New test.

---
 gcc/cp/constexpr.cc   | 11 ++-
 gcc/testsuite/g++.dg/cpp0x/constexpr-ctor22.C | 15 +++
 2 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-ctor22.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index f6fd059be46..5c6696740fc 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -1862,6 +1862,15 @@ cxx_bind_parameters_in_call (const constexpr_ctx *ctx, tree t, tree fun,
   int nparms = list_length (parms);
   int nbinds = nargs < nparms ? nargs : nparms;
   tree binds = make_tree_vec (nbinds);
+
+  /* The call is not a constant expression if it involves the cdtor for a type
+ with virtual bases.  */
+  if (DECL_HAS_IN_CHARGE_PARM_P (fun) || DECL_HAS_VTT_PARM_P (fun))
+{
+  *non_constant_p = true;
+  return binds;
+}
+
   for (i = 0; i < nargs; ++i)
 {
   tree x, arg;
@@ -1871,7 +1880,7 @@ cxx_bind_parameters_in_call (const constexpr_ctx *ctx, tree t, tree fun,
   x = get_nth_callarg (t, i);
   /* For member function, the first argument is a pointer to the implied
  object.  For a constructor, it might still be a dummy object, in
- which case we get the real argument from ctx. */
+which case we get the real argument from ctx.  */
   if (i == 0 && DECL_CONSTRUCTOR_P (fun)
  && is_dummy_object (x))
{
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-ctor22.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-ctor22.C
new file mode 100644
index 00000000000..279f6ec4454
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-ctor22.C
@@ -0,0 +1,15 @@
+// PR c++/116722
+// We're now accepting this in spite of the virtual base class. This is OK
+// according to [dcl.constexpr] 3: "Except for instantiated constexpr functions
+// non-templated constexpr functions shall be constexpr-suitable".
+// { dg-do compile { target c++11 } }
+
+class base {};
+class derived : virtual public base {
+public:
+  template
+  constexpr derived(Arg) {}
+};
+int main() {
+  derived obj(1.);
+}
-- 
2.44.0

[PATCH] Update email in MAINTAINERS file.

2024-09-23 Thread Aldy Hernandez

From: Aldy Hernandez 

ChangeLog:

* MAINTAINERS: Update email and add myself to DCO.
---
 MAINTAINERS | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index cfd96c9f33e..e9fafaf45a7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -116,7 +116,7 @@ riscv port  Jim Wilson  

 rs6000/powerpc port David Edelsohn  
 rs6000/powerpc port Segher Boessenkool  
 rs6000/powerpc port Kewen Lin   
-rs6000 vector extns Aldy Hernandez  
+rs6000 vector extns Aldy Hernandez  
 rx port Nick Clifton
 s390 port   Ulrich Weigand  
 s390 port   Andreas Krebbel 
@@ -213,7 +213,7 @@ c++ runtime libsJonathan Wakely 

 c++ runtime libs special modes  François Dumont 
 fixincludes Bruce Korb  
 *gimpl* Jakub Jelinek   
-*gimpl* Aldy Hernandez  
+*gimpl* Aldy Hernandez  
 *gimpl* Jason Merrill   
 gcse.cc Jeff Law
 global opt frameworkJeff Law
@@ -240,7 +240,7 @@ option handling Joseph Myers

 middle-end  Jeff Law
 middle-end  Ian Lance Taylor
 middle-end  Richard Biener  
-*vrp, rangerAldy Hernandez  
+*vrp, rangerAldy Hernandez  
 *vrp, rangerAndrew MacLeod  
 tree-ssaAndrew MacLeod  
 tree browser/unparser   Sebastian Pop   
@@ -518,7 +518,7 @@ Daniel Hellstromdanielh 

 Fergus Henderson-   
 Richard Henderson   rth 
 Stuart Hendersonshenders
-Aldy Hernandez  aldyh   
+Aldy Hernandez  aldyh   
 Philip Herron   redbrain
 Marius Hillenbrand  -   
 Matthew Hiller  -   
@@ -948,3 +948,4 @@ Jonathan Wakely 

 Alexander Westbrooks
 Chung-Ju Wu 
 Pengxuan Zheng  
+Aldy Hernandez  
-- 
2.43.0

[PATCH v3 2/4] tree-optimization/116024 - simplify C1-X cmp C2 for unsigned types

2024-09-23 Thread Artemiy Volkov

Implement a match.pd transformation inverting the sign of X in
C1 - X cmp C2, where C1 and C2 are integer constants and X is
of an unsigned type, by observing that:

(a) If cmp is == or !=, simply move X and C2 to opposite sides of the
comparison to arrive at X cmp C1 - C2.

(b) If cmp is <:
- C1 - X < C2 means that C1 - X spans the range of 0, 1, ..., C2 - 1;
- This means that X spans the range of C1 - (C2 - 1),
  C1 - (C2 - 2), ..., C1;
- Subtracting C1 - (C2 - 1), X - (C1 - (C2 - 1)) is one of 0, 1,
  ..., C1 - (C1 - (C2 - 1));
- Simplifying the above, X - (C1 - C2 + 1) is one of 0, 1, ...,
 C2 - 1;
- Summarizing, the expression C1 - X < C2 can be transformed
  into X - (C1 - C2 + 1) < C2.

(c) Similarly, if cmp is <=:
- C1 - X <= C2 means that C1 - X is one of 0, 1, ..., C2;
- It follows that X is one of C1 - C2, C1 - (C2 - 1), ..., C1;
- Subtracting C1 - C2, X - (C1 - C2) has range 0, 1, ..., C2;
- Thus, the expression C1 - X <= C2 can be transformed into
  X - (C1 - C2) <= C2.

(d) The >= and > cases are negations of (b) and (c), respectively.
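
The equivalences in (a) to (c) can be checked exhaustively at a small bit width. The following Python sketch models wrapping unsigned arithmetic by masking; the 5-bit width and the function names are choices made for this illustration and are not part of the patch. The >= and > forms follow as negations, so they are not tested separately.

```python
BITS = 5                      # small width so the exhaustive check stays fast
MASK = (1 << BITS) - 1


def original_lt(c1, x, c2, mask=MASK):
    """C1 - X < C2 in wrapping unsigned arithmetic."""
    return ((c1 - x) & mask) < c2


def transformed_lt(c1, x, c2, mask=MASK):
    """X - (C1 - C2 + 1) < C2 in wrapping unsigned arithmetic."""
    return ((x - (c1 - c2 + 1)) & mask) < c2


def check_all():
    """Compare original and transformed forms for every C1, C2, X."""
    for c1 in range(MASK + 1):
        for c2 in range(MASK + 1):
            for x in range(MASK + 1):
                lhs = (c1 - x) & MASK
                # (a) equality: X == C1 - C2
                assert (lhs == c2) == (x == ((c1 - c2) & MASK))
                # (b) less-than: X - (C1 - C2 + 1) < C2
                assert original_lt(c1, x, c2) == transformed_lt(c1, x, c2)
                # (c) less-or-equal: X - (C1 - C2) <= C2
                assert (lhs <= c2) == (((x - (c1 - c2)) & MASK) <= c2)
    return True


print(check_all())
```

Passing a 32-bit mask (0xFFFFFFFF) to the two helper functions lets individual cases be tried at full width, for example C1 = 300 and C2 = 100.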

This transformation occasionally saves load-immediate /
subtraction instructions, e.g. the following statement:

300 - (unsigned int)f() < 100;

now compiles to

addi    a0,a0,-201
sltiu   a0,a0,100

instead of

li  a5,300
sub a0,a5,a0
sltiu   a0,a0,100

on 32-bit RISC-V.

Additional examples can be found in the newly added test file.  This
patch has been bootstrapped and regtested on aarch64, x86_64, and i386,
and additionally regtested on riscv32.

gcc/ChangeLog:

PR tree-optimization/116024
* match.pd: New transformation around integer comparison.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr116024-1.c: New test.

Signed-off-by: Artemiy Volkov 
---
 gcc/match.pd   | 23 ++-
 gcc/testsuite/gcc.dg/tree-ssa/pr116024-1.c | 73 ++
 2 files changed, 95 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 81be0a21462..d0489789527 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8949,7 +8949,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 TYPE_SIGN (TREE_TYPE (@0)));
   constant_boolean_node (less == ovf_high, type);
 })
-  (rcmp @1 { res; }))
+  (rcmp @1 { res; })))
+/* For unsigned types, transform like so (using < as example):
+C1 - X < C2
+  ==>  C1 - X = { 0, 1, ..., C2 - 1 }
+  ==>  X = { C1 - (C2 - 1), ..., C1 - 1, C1 }
+  ==>  X - (C1 - (C2 - 1)) = { 0, 1, ..., C1 - (C1 - (C2 - 1)) }
+  ==>  X - (C1 - C2 + 1) = { 0, 1, ..., C2 - 1 }
+  ==>  X - (C1 - C2 + 1) < C2.
+
+  Similarly,
+C1 - X <= C2 ==> X - (C1 - C2) <= C2;
+C1 - X >= C2 ==> X - (C1 - C2 + 1) >= C2;
+C1 - X > C2 ==> X - (C1 - C2) > C2.  */
+   (if (TYPE_UNSIGNED (TREE_TYPE (@1)))
+ (switch
+   (if (cmp == EQ_EXPR || cmp == NE_EXPR)
+(cmp @1 (minus @0 @2)))
+   (if (cmp == LE_EXPR || cmp == GT_EXPR)
+(cmp (plus @1 (minus @2 @0)) @2))
+   (if (cmp == LT_EXPR || cmp == GE_EXPR)
+(cmp (plus @1 (minus @2
+  (plus @0 { build_one_cst (TREE_TYPE (@1)); }))) @2)))
 
 /* Canonicalizations of BIT_FIELD_REFs.  */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr116024-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr116024-1.c
new file mode 100644
index 000..48e647dc0c6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr116024-1.c
@@ -0,0 +1,73 @@
+/* PR tree-optimization/116024 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-forwprop1-details" } */
+
+#include <stdint.h>
+
+uint32_t f(void);
+
+int32_t i2(void)
+{
+  uint32_t l = 2;
+  l = 10 - (uint32_t)f();
+  return l <= 20; // f() + 10 <= 20 
+}
+
+int32_t i2a(void)
+{
+  uint32_t l = 2;
+  l = 10 - (uint32_t)f();
+  return l < 30; // f() + 19 < 30 
+}
+
+int32_t i2b(void)
+{
+  uint32_t l = 2;
+  l = 200 - (uint32_t)f();
+  return l <= 100; // f() - 100 <= 100 
+}
+
+int32_t i2c(void)
+{
+  uint32_t l = 2;
+  l = 300 - (uint32_t)f();
+  return l < 100; // f() - 201 < 100
+}
+
+int32_t i2d(void)
+{
+  uint32_t l = 2;
+  l = 1000 - (uint32_t)f();
+  return l >= 2000; // f() + 999 >= 2000
+}
+
+int32_t i2e(void)
+{
+  uint32_t l = 2;
+  l = 1000 - (uint32_t)f();
+  return l > 3000; // f() + 2000 > 3000
+}
+
+int32_t i2f(void)
+{
+  uint32_t l = 2;
+  l = 2 - (uint32_t)f();
+  return l >= 1; // f() - 2 >= 1
+}
+
+int32_t i2g(void)
+{
+  uint32_t l = 2;
+  l = 3 - (uint32_t)f();
+  return l > 1; // f() - 2 > 1
+}
+
+/* { dg-final { scan-tree-dump-times "Removing dead stmt:.*?- _" 8 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ 10.*\n.*<= 20" 1 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\

[PATCH v3 0/4] tree-optimization/116024 - match.pd: add 4

2024-09-23 Thread Artemiy Volkov

Hi,

sending a v3 of
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661066.html with
the following changes since v2:

- Split one big patch into 4 smaller ones, each corresponding to a
  specific new match.pd (sub-)pattern.
- Clarified the logic of each transformation in inline comments as well
  as in commit messages.
- Simplified new testcases by removing unnecessary type limit constants
  ({U,}INT32_{MIN,MAX}).
- Removed use of fold_overflow_warning () in patch #1.
- Fixed an implementation error in patch #3 (use bit_not (C2) instead of
  negate (C2)).
- Fixed an implementation error in patch #4 (switch sign of INF in
  computation of c1_cst and use c2 instead of max/min).
- Multiple smaller cosmetic improvements.

Could someone please help out with review and/or pushing this to
trunk/14?

Many thanks in advance,
Artemiy

Artemiy Volkov (4):
  tree-optimization/116024 - simplify C1-X cmp C2 for UB-on-overflow
    types
  tree-optimization/116024 - simplify C1-X cmp C2 for unsigned types
  tree-optimization/116024 - simplify C1-X cmp C2 for wrapping signed
    types
  tree-optimization/116024 - simplify some cases of X +- C1 cmp C2

 gcc/match.pd  | 109 +-
 gcc/testsuite/gcc.dg/pr67089-6.c  |   4 +-
 .../gcc.dg/tree-ssa/pr116024-1-fwrapv.c   |  73 
 gcc/testsuite/gcc.dg/tree-ssa/pr116024-1.c|  73 
 .../gcc.dg/tree-ssa/pr116024-2-fwrapv.c   |  38 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr116024-2.c|  38 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr116024.c  |  74 
 .../gcc.target/aarch64/gtu_to_ltu_cmp_1.c |   2 +-
 8 files changed, 407 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-1-fwrapv.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-2-fwrapv.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024.c

-- 
2.44.2

[PATCH v3 4/4] tree-optimization/116024 - simplify some cases of X +- C1 cmp C2

2024-09-23 Thread Artemiy Volkov

Whenever C1 and C2 are integer constants, X is of a wrapping type, and
cmp is a relational operator, the expression X +- C1 cmp C2 can be
simplified in the following cases:

(a) If cmp is <= and C2 -+ C1 == +INF(1), we can transform the initial
comparison in the following way:
   X +- C1 <= C2
   -INF <= X +- C1 <= C2 (add left hand side which holds for any X, C1)
   -INF -+ C1 <= X <= C2 -+ C1 (add -+C1 to all 3 expressions)
   -INF -+ C1 <= X <= +INF (due to (1))
   -INF -+ C1 <= X (eliminate the right hand side since it holds for any X)

(b) By analogy, if cmp is >= and C2 -+ C1 == -INF(1), use the following
sequence of transformations:

   X +- C1 >= C2
   +INF >= X +- C1 >= C2 (add left hand side which holds for any X, C1)
   +INF -+ C1 >= X >= C2 -+ C1 (add -+C1 to all 3 expressions)
   +INF -+ C1 >= X >= -INF (due to (1))
   +INF -+ C1 >= X (eliminate the right hand side since it holds for any X)

(c) The > and < cases are negations of (a) and (b), respectively.
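
Cases (a) and (b) can likewise be brute-forced on a narrow wrapping
type.  This is a minimal sketch under stated assumptions, not code from
the patch: uint8_t stands in for the wrapping type, so -INF is 0 and
+INF is UINT8_MAX, and the function name count_mismatches_wrapping is
made up for illustration.  Signed -fwrapv types are not covered here.

```c
#include <assert.h>
#include <stdint.h>

/* For every C1, derive the C2 that satisfies precondition (1) and
   compare X +- C1 cmp C2 against the simplified one-sided test.  */
unsigned long count_mismatches_wrapping (void)
{
  unsigned long bad = 0;
  for (unsigned c1 = 0; c1 < 256; c1++)
    for (unsigned x = 0; x < 256; x++)
      {
        uint8_t c2;

        /* (a), op is +: C2 - C1 == +INF, so X + C1 <= C2
           becomes X >= -INF - C1.  */
        c2 = (uint8_t) (UINT8_MAX + c1);
        bad += ((uint8_t) (x + c1) <= c2) != (x >= (uint8_t) (0 - c1));

        /* (a), op is -: C2 + C1 == +INF, so X - C1 <= C2
           becomes X >= -INF + C1.  */
        c2 = (uint8_t) (UINT8_MAX - c1);
        bad += ((uint8_t) (x - c1) <= c2) != (x >= (uint8_t) (0 + c1));

        /* (b), op is +: C2 - C1 == -INF, so X + C1 >= C2
           becomes X <= +INF - C1.  */
        c2 = (uint8_t) (0 + c1);
        bad += ((uint8_t) (x + c1) >= c2) != (x <= (uint8_t) (UINT8_MAX - c1));

        /* (b), op is -: C2 + C1 == -INF, so X - C1 >= C2
           becomes X <= +INF + C1.  */
        c2 = (uint8_t) (0 - c1);
        bad += ((uint8_t) (x - c1) >= c2) != (x <= (uint8_t) (UINT8_MAX + c1));
      }
  return bad;
}

/* Convenience wrapper: aborts if any pair disagrees.  */
void self_check_wrapping (void)
{
  assert (count_mismatches_wrapping () == 0);
}
```

The sketch only exercises the commit-message derivation for (a) and
(b); the > and < forms in (c) follow by negating both sides.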

This transformation occasionally saves add / sub instructions;
for instance, the expression

3 + (uint32_t)f() < 2

compiles to

cmn w0, #4
cset    w0, ls

instead of

add w0, w0, 3
cmp w0, 2
cset    w0, ls

on aarch64.

Testcases that go together with this patch have been split into two
separate files, one containing testcases for unsigned variables and the
other for wrapping signed ones (and thus compiled with -fwrapv).
Additionally, one aarch64 test has been adjusted since the patch has
caused the generated code to change from

cmn w0, #2
csinc   w0, w1, wzr, cc   (x < -2)

to

cmn w0, #3
csinc   w0, w1, wzr, cs   (x <= -3)

This patch has been bootstrapped and regtested on aarch64, x86_64, and
i386, and additionally regtested on riscv32.

gcc/ChangeLog:

PR tree-optimization/116024
* match.pd: New transformation around integer comparison.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr116024-2.c: New test.
* gcc.dg/tree-ssa/pr116024-2-fwrapv.c: Ditto.
* gcc.target/aarch64/gtu_to_ltu_cmp_1.c: Adjust.

Signed-off-by: Artemiy Volkov 
---
 gcc/match.pd  | 44 ++-
 .../gcc.dg/tree-ssa/pr116024-2-fwrapv.c   | 38 
 gcc/testsuite/gcc.dg/tree-ssa/pr116024-2.c| 38 
 .../gcc.target/aarch64/gtu_to_ltu_cmp_1.c |  2 +-
 4 files changed, 120 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-2-fwrapv.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-2.c

diff --git a/gcc/match.pd b/gcc/match.pd
index bf3b4a2e3fe..3275a69252f 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8896,6 +8896,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(cmp @0 { TREE_OVERFLOW (res)
 ? drop_tree_overflow (res) : res; }
 (for cmp (lt le gt ge)
+ rcmp (gt ge lt le)
  (for op (plus minus)
   rop (minus plus)
   (simplify
@@ -8923,7 +8924,48 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  "X cmp C2 -+ C1"),
 WARN_STRICT_OVERFLOW_COMPARISON);
}
-   (cmp @0 { res; })
+   (cmp @0 { res; })
+/* For wrapping types, simplify the following cases of X +- C1 CMP C2:
+
+   (a) If CMP is <= and C2 -+ C1 == +INF (1), simplify to X >= -INF -+ C1
+   by observing the following:
+
+   X +- C1 <= C2
+  ==>  -INF <= X +- C1 <= C2 (add left hand side which holds for any X, C1)
+  ==>  -INF -+ C1 <= X <= C2 -+ C1 (add -+C1 to all 3 expressions)
+  ==>  -INF -+ C1 <= X <= +INF (due to (1))
+  ==>  -INF -+ C1 <= X (eliminate the right hand side since it holds for any X)
+
+(b) Similarly, if CMP is >= and C2 -+ C1 == -INF (1):
+
+   X +- C1 >= C2
+  ==>  +INF >= X +- C1 >= C2 (add left hand side which holds for any X, C1)
+  ==>  +INF -+ C1 >= X >= C2 -+ C1 (add -+C1 to all 3 expressions)
+  ==>  +INF -+ C1 >= X >= -INF (due to (1))
+  ==>  +INF -+ C1 >= X (eliminate the right hand side since it holds for any X)
+
+(c) The > and < cases are negations of (a) and (b), respectively.  */
+   (if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
+ (with
+   {
+   wide_int max = wi::max_value (TREE_TYPE (@0));
+   wide_int min = wi::min_value (TREE_TYPE (@0));
+
+   wide_int c2 = rop == PLUS_EXPR
+ ? wi::add (wi::to_wide (@2), wi::to_wide (@1))
+ : wi::sub (wi::to_wide (@2), wi::to_wide (@1));
+   }
+   (if (((cmp == LE_EXPR || cmp == GT_EXPR) && wi::eq_p (c2, max))
+   || ((cmp == LT_EXPR || cmp == GE_EXPR) && wi::eq_p (c2, min)))
+ (with
+  {
+wide_int c1 = rop == PLUS_EXPR
+  ? wi::add (wi::bit_not (c2), wi::to_wide (@1))
+  : wi::sub (wi::bit_not (c2), wi::to_wide (@1));
+tree c1_cst = build_uniform_cst (TREE_TYPE (@0),
+   wide_int_to_tree (TREE_TYPE (@0), c1));
+

[PATCH] [MAINTAINERS] Add myself to write after approval

2024-09-23 Thread saurabh.jha


ChangeLog:

* MAINTAINERS: Add myself to write after approval.
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e9fafaf45a7..0ea4db20f88 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -557,6 +557,7 @@ Andrew Jenner   andrewjenner
 Haochen Jiang   -   
 Qian Jianhua-   
 Michal Jiresmjires  
+Saurabh Jha -   
 Janis Johnson   janis   
 Teresa Johnson  tejohnson   
 Kean Johnston   -

Re: [PATCH] c++/contracts: ICE in build_contract_condition_function [PR116490]

2024-09-23 Thread Nina Dinka Ranns

Hi all,

just pinging this thread in case it got lost in the shuffle.

Best,
Nina

On Fri, 30 Aug 2024 at 13:49, Nina Dinka Ranns 
wrote:

> We currently do not expect comdat group of the guarded function to
> be set at the time of generating pre and post check function.
> However, in the case of an explicit instantiation, the guarded
> function has been added to a comdat group before generating contract
> check functions, which causes the observed ICE. Current assert
> removed and an additional check for comdat group of the guarded
> function added. With this change, the pre and post check functions
> get added to the same comdat group of the guarded function if the
> guarded function is already placed in a comdat group.
>
> Tested on x86_64-pc-linux-gnu.
>
> Patch attached to the email.
>
> OK for trunk?
>
> Best,
> Nina
>

Re: [PATCH v2] c++: Don't ICE due to artificial constructor parameters [PR116722]

2024-09-23 Thread Jason Merrill


On 9/23/24 10:44 AM, Simon Martin wrote:

Hi Jason,

On 20 Sep 2024, at 18:01, Jason Merrill wrote:


On 9/20/24 5:21 PM, Simon Martin wrote:

The following code triggers an ICE

=== cut here ===
class base {};
class derived : virtual public base {
public:
template <typename Arg> constexpr derived(Arg) {}
};
int main() {
derived obj(1.);
}
=== cut here ===

The problem is that cxx_bind_parameters_in_call ends up attempting to
convert a REAL_CST (the first non artificial parameter) to INTEGER_TYPE
(the type of the __in_chrg parameter), which ICEs.

This patch teaches cxx_bind_parameters_in_call to handle the __in_chrg
and __vtt_parm parameters that {con,de}structors might have.

Note that in the test case, the constructor is not constexpr-suitable,
however it's OK since it's a template according to my read of paragraph
(3) of [dcl.constexpr].


Agreed.

It looks like your patch doesn't correct the mismatching of arguments
to parameters that you describe, but at least for now it should be
enough to set *non_constant_p and return if we see a VTT or in-charge
parameter.


Thanks, it’s true that my initial patch was wrong in that we’d leave
cxx_bind_parameters_in_call thinking the expression was actually a
constant expression :-/

The attached revised patch follows your suggestion (thanks!).
Successfully tested on x86_64-pc-linux-gnu. OK for trunk?


OK.

Re: [PATCH] doc: Remove @code wrapping of fortran option names [PR116801]

2024-09-23 Thread Mikael Morin


Le 23/09/2024 à 00:01, Andreas Schwab a écrit :

On Sep 22 2024, Arsen Arsenović wrote:


Andreas Schwab  writes:


On Sep 22 2024, Jakub Jelinek wrote:


On Sun, Sep 22, 2024 at 10:52:37PM +0200, Andreas Schwab wrote:

On Sep 22 2024, Mikael Morin wrote:


@@ -370,7 +370,7 @@ Set the default accessibility of module entities to @code{PRIVATE}.
  Use-associated entities will not be accessible unless they are explicitly
  declared as @code{PUBLIC}.
  
-@opindex @code{ffixed-line-length-}@var{n}

+@opindex ffixed-line-length-@var{n}


Shouldn't all the @var{...} parts be dropped as well, throughout?


We have it all over the other manuals:


But it causes them to not show up in the urls files.


That seems like a defect of the regen script rather than of the manuals.
They're there for a reason (signifying that something is not a fixed
string).


It's only about the @opindex.  The vast majority have those variable
parts removed from the index entry.

For options where the variable is not a separate argument, I think it's
preferable to keep the variable.

For example, -ffree-line-length-@var{n} looks better on the index page
as "-ffree-line-length-n" (with the n having a different formatting)
than as "-ffree-line-length-".  It makes it clear that there is a suffix
to the option.

[Patch, fortran] PR116733: Generic processing of assumed rank objects (f202y)

2024-09-23 Thread Paul Richard Thomas

Hi All,

The moment I saw the DIN4 proposal for "Generic processing of assumed rank
objects", I thought that this was a highly intuitive and implementable
proposal. I implemented a test version in June and had some correspondence
with Reinhold Bader about it shortly before he passed away.

Malcolm Cohen wrote J3/24-136r1 in response to this and I have posted a
comment in PR116733 addressing the extent to which the attached patch
addresses his remarks.

Before this patch goes through the approval process, we have to consider
how experimental F202y features can be carried forward. I was badly bitten
by failing to synchronise the array descriptor reform branch to the extent
that I gave up on it and adopted the simplified reform that is now in
place. Given the likely timescale before the full adoption of the F202y
standard and the variability of active maintainers, this is a
considerable risk for experimental features.

What I propose is the following:
(i) For audit purposes, I have opened PR116732, which should be blocked by
PRs for each experimental F202y feature;
(ii) These PRs should represent a complete audit trail for each feature; and
(iii) All such experimental features should be enabled on mainline by
-std=f202y, which is equivalent to -std=f2023+f202y.

The attached patch enables pointer assignment and associate, both with rank
remapping, plus the reshape intrinsic, which was not part of the DIN4
proposal.

The ChangeLog entries do a pretty complete job of describing the patch.

Regtests correctly. OK for mainline?

Paul
diff --git a/gcc/fortran/array.cc b/gcc/fortran/array.cc
index 1fa61ebfe2a..3f724852db9 100644
--- a/gcc/fortran/array.cc
+++ b/gcc/fortran/array.cc
@@ -866,7 +866,7 @@ gfc_set_array_spec (gfc_symbol *sym, gfc_array_spec *as, locus *error_loc)
 {
   int i;
   symbol_attribute *attr;
-  
+
   if (as == NULL)
 return true;
 
@@ -875,7 +875,7 @@ gfc_set_array_spec (gfc_symbol *sym, gfc_array_spec *as, locus *error_loc)
   attr = &sym->attr;
   if (gfc_submodule_procedure(attr))
 return true;
-  
+
   if (as->rank
   && !gfc_add_dimension (&sym->attr, sym->name, error_loc))
 return false;
@@ -2454,7 +2454,7 @@ gfc_ref_dimen_size (gfc_array_ref *ar, int dimen, mpz_t *result, mpz_t *end)
 	mpz_set_ui (stride, 1);
   else
 	{
-	  stride_expr = gfc_copy_expr(ar->stride[dimen]); 
+	  stride_expr = gfc_copy_expr(ar->stride[dimen]);
 
 	  if (!gfc_simplify_expr (stride_expr, 1)
 	 || stride_expr->expr_type != EXPR_CONSTANT
diff --git a/gcc/fortran/expr.cc b/gcc/fortran/expr.cc
index 81c641e2322..9e5b141518c 100644
--- a/gcc/fortran/expr.cc
+++ b/gcc/fortran/expr.cc
@@ -4357,9 +4357,18 @@ gfc_check_pointer_assign (gfc_expr *lvalue, gfc_expr *rvalue,
 	  return false;
 	}
 
+  /* An assumed rank target is an experimental F202y feature.  */
+  if (rvalue->rank == -1 && !(gfc_option.allow_std & GFC_STD_F202Y))
+	{
+	  gfc_error ("The assumed rank target at %L is an experimental F202y "
+		 "feature. Use option -std=f202y to enable",
+		 &rvalue->where);
+	  return false;
+	}
+
   /* The target must be either rank one or it must be simply contiguous
 	 and F2008 must be allowed.  */
-  if (rvalue->rank != 1)
+  if (rvalue->rank != 1 && rvalue->rank != -1)
 	{
 	  if (!gfc_is_simply_contiguous (rvalue, true, false))
 	{
@@ -4372,6 +4381,21 @@ gfc_check_pointer_assign (gfc_expr *lvalue, gfc_expr *rvalue,
 	return false;
 	}
 }
+  else if (rvalue->rank == -1)
+{
+  gfc_error ("The data-target at %L is an assumed rank object and so the "
+		 "data-pointer-object %s must have a bounds remapping list "
+		 "(list of lbound:ubound for each dimension)",
+		  &rvalue->where, lvalue->symtree->name);
+  return false;
+}
+
+  if (rvalue->rank == -1 && !gfc_is_simply_contiguous (rvalue, true, false))
+{
+  gfc_error ("The assumed rank data-target at %L must be contiguous",
+		 &rvalue->where);
+  return false;
+}
 
   /* Now punt if we are dealing with a NULLIFY(X) or X = NULL(X).  */
   if (rvalue->expr_type == EXPR_NULL)
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 37c28691f41..57890472d04 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3020,6 +3020,8 @@ typedef struct gfc_association_list
 
   gfc_expr *target;
 
+  gfc_array_ref *ar;
+
   /* Used for inferring the derived type of an associate name, whose selector
  is a sibling derived type function that has not yet been parsed.  */
   gfc_symbol *derived_types;
diff --git a/gcc/fortran/interface.cc b/gcc/fortran/interface.cc
index b592fe4f6c7..dbcbed8bf30 100644
--- a/gcc/fortran/interface.cc
+++ b/gcc/fortran/interface.cc
@@ -3337,6 +3337,16 @@ gfc_compare_actual_formal (gfc_actual_arglist **ap, gfc_formal_arglist *formal,
 	  goto match;
 	}
 
+  if (warn_surprising
+	  && a->expr->expr_type == EXPR_VARIABLE
+	  && a->expr->symtree->n.sym->as
+	  && a->expr->symtree->n.sym->as->type == AS_ASSUMED_SIZE

Re: [PATCH] Fortran: Added support for locality specs in DO CONCURRENT (Fortran 2018/23)

2024-09-23 Thread Tobias Burnus


Hi Paul,

Am 23.09.24 um 10:26 schrieb Paul Richard Thomas:

In addition to Andre's remarks, could you please tell us, when you
resubmit, if this is a complete F2023 implementation of do concurrent.
If not, what is missing?


Regarding missing parts: still to do is actually privatizing (with or
without initialization) for variables that are listed with 'local' and
'local_init'. Hence, code doing that currently fails, after all the
required diagnostics, with a 'sorry, not yet implemented' error. [My
feeling is that doing it in trans*.cc might make most sense, but it
could be also done by adding at Fortran AST level (inserting a BLOCK +
adding the variable there).]

Otherwise, all parsing + diagnostic should work; 'default(none)' is
diagnostics only and 'shared' doesn't do anything, except affecting
'default(none)' diagnostic. — 'reduce' will have a code gen effect, but
only when going to real concurrency/parallel execution.

* * *

If you talk about unimplemented 'do concurrent' features in general,
gfortran does not handle the forall/do-concurrent header with typespec
(i.e. 'do concurrent (integer :: i = 1, 4)', cf.
https://gcc.gnu.org/PR96255) [F2018 feature].

* * *

In terms of true parallelization:

* I have been thinking for a while about having a
-fdo-concurrent=
compile-time flag to handle this.

* OpenMP 6.0 (added I think in Technical Report (TR) 13, which was
released Aug 1, 2024) now supports '!$omp loop' on 'do concurrent'

Either variant would then use the new locality spec (F2018/F2023 and new
in gfortran) and hook into the existing OpenMP/OpenACC handling. –
'!$omp loop' and -fdo-concurrent=omp-parallel are in any case easier
than 'omp-target-parallel' as the latter will run into issues related to
data mapping or (potentially) atomic updates now having to be in sync
with host atomic access.


BTW Thanks for doing this. It was on my long term TODO list and is now
struck off :-)


Yes – and I have heard from others that do-concurrent actually being
concurrent – or at least having the new locality specs even if
not run concurrently – is a much missed feature. That might be from a
small bubble, but still those users want to have it. And also Damian
mentioned that he has a project that will use it.

Also thanks from my side!

Tobias

[PATCH] tree-optimization/116796 - virtual LC SSA broken after unrolling

2024-09-23 Thread Richard Biener

When the unroller unloops loops it tracks whether it changes any
nesting relationship of remaining loops but when scanning a loop's
preheader it fails to pass down the LC-SSA-invalidated bitmap, losing
the fact that an unrolled formerly inner loop can now be placed on
an exit of its outer loop.  The following fixes that.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/116796
* cfgloopmanip.cc (fix_loop_placements): Get LC-SSA-invalidated
bitmap and pass it on.
(remove_path): Pass LC-SSA-invalidated to fix_loop_placements.
---
 gcc/cfgloopmanip.cc | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
index 3707db2fdb3..d37d351fdf3 100644
--- a/gcc/cfgloopmanip.cc
+++ b/gcc/cfgloopmanip.cc
@@ -39,7 +39,7 @@ static void loop_redirect_edge (edge, basic_block);
 static void remove_bbs (basic_block *, int);
 static bool rpe_enum_p (const_basic_block, const void *);
 static int find_path (edge, basic_block **);
-static void fix_loop_placements (class loop *, bool *);
+static void fix_loop_placements (class loop *, bool *, bitmap);
 static bool fix_bb_placement (basic_block);
 static void fix_bb_placements (basic_block, bool *, bitmap);
 
@@ -415,7 +415,8 @@ remove_path (edge e, bool *irred_invalidated,
   /* Fix placements of basic blocks inside loops and the placement of
  loops in the loop tree.  */
   fix_bb_placements (from, irred_invalidated, loop_closed_ssa_invalidated);
-  fix_loop_placements (from->loop_father, irred_invalidated);
+  fix_loop_placements (from->loop_father, irred_invalidated,
+  loop_closed_ssa_invalidated);
 
   if (local_irred_invalidated
   && loops_state_satisfies_p (LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS))
@@ -1048,7 +1049,8 @@ unloop (class loop *loop, bool *irred_invalidated,
invalidate the information about irreducible regions.  */
 
 static void
-fix_loop_placements (class loop *loop, bool *irred_invalidated)
+fix_loop_placements (class loop *loop, bool *irred_invalidated,
+bitmap loop_closed_ssa_invalidated)
 {
   class loop *outer;
 
@@ -1064,7 +1066,7 @@ fix_loop_placements (class loop *loop, bool *irred_invalidated)
 to the loop.  So call fix_bb_placements to fix up the placement
 of the preheader and (possibly) of its predecessors.  */
   fix_bb_placements (loop_preheader_edge (loop)->src,
-irred_invalidated, NULL);
+irred_invalidated, loop_closed_ssa_invalidated);
   loop = outer;
 }
 }
-- 
2.43.0

[PATCH] tree-optimization/116810 - out-of-bound access to matches[]

2024-09-23 Thread Richard Biener

The following makes sure to apply forced splitting of groups for
forced single-lane SLP only when the group being analyzed has more
than one lane.  This avoids an out-of-bound access to matches[].
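
As a rough model of why the guard matters, here is a simplified
standalone sketch.  It is not the GCC function: mark_forced_split is a
hypothetical helper, and it assumes matches[] holds exactly one entry
per lane, so that writing matches[1] is only valid when group_size > 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the guarded code path.  Assumes MATCHES
   points to GROUP_SIZE elements.  Returns true if a split was forced.  */
bool mark_forced_split (bool *matches, size_t group_size,
                        bool force_single_lane)
{
  /* Without the group_size check, a single-lane group would write
     matches[1], one element past the end of the array.  */
  if (group_size > 1 && force_single_lane)
    {
      matches[0] = true;
      matches[1] = false;
      return true;
    }
  return false;
}
```

With a one-element array the helper leaves matches[] untouched and
reports that no split was forced.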

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/116810
* tree-vect-slp.cc (vect_build_slp_instance): Only force
splitting for group_size > 1.
---
 gcc/tree-vect-slp.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 600987dd6e5..c24376688f5 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3715,7 +3715,7 @@ vect_build_slp_instance (vec_info *vinfo,
   unsigned i;
 
   slp_tree node = NULL;
-  if (force_single_lane)
+  if (group_size > 1 && force_single_lane)
 {
   matches[0] = true;
   matches[1] = false;
-- 
2.43.0

[PATCH v4] c++: Don't crash when mangling member with anonymous union or template types [PR100632, PR109790]

2024-09-23 Thread Simon Martin

Hi Jason,

On 20 Sep 2024, at 18:06, Jason Merrill wrote:

> On 9/16/24 4:07 PM, Simon Martin wrote:
>> Hi Jason,
>>
>> On 14 Sep 2024, at 18:44, Simon Martin wrote:
>>
>>> Hi Jason,
>>>
>>> On 14 Sep 2024, at 18:11, Jason Merrill wrote:
>>>
>>>> On 9/13/24 11:06 AM, Simon Martin wrote:
>>>>> Hi Jason,
>>>>>
>>>>> On 12 Sep 2024, at 16:48, Jason Merrill wrote:
>>>>>
>>>>>> On 9/12/24 7:23 AM, Simon Martin wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> While looking at more open PRs, I have discovered that the problem
>>>>>>> reported in PR109790 is very similar to that in PR100632, so I’m
>>>>>>> combining both in a single patch attached here. The fix is similar to
>>>>>>> the one I initially submitted, only more general and I believe better.
>>>>>>>
>>>>>>> We currently crash upon mangling members that have an anonymous union
>>>>>>> or a template type.
>>>>>>>
>>>>>>> The problem is that before calling write_unqualified_name,
>>>>>>> write_member_name has an assert that assumes that it has an
>>>>>>> IDENTIFIER_NODE in its hand. However it's incorrect: it has an
>>>>>>> anonymous union in PR100632, and a template in PR109790.
>>>>>>
>>>>>> The assert does not assume it has an IDENTIFIER_NODE; it assumes it
>>>>>> has a _DECL, and expects its DECL_NAME to be an IDENTIFIER_NODE.
>>>>>>
>>>>>> !identifier_p will always be true for a _DECL, making the assert
>>>>>> useless.
>>>>> Indeed, my bad. Thanks for catching and explaining this!
>>>>>>
>>>>>> How about checking !DECL_NAME (member) instead of !identifier_p?
>>>>> Unfortunately it does not fix PR100632, that actually involves
>>>>> legitimate operators.
>>>>>
>>>>> I checked why the assert was added in the first place (via r11-6301),
>>>>> and the idea was to catch any case where we’d be missing the “on”
>>>>> marker - PR100632 contains such cases.
>>>>
>>>> I assume you mean 109790?
>>> Yes :-/
>>>>
>>>>> So I took the approach to refactor write_member_name a bit to first
>>>>> write the marker in all the cases required, and then actually write
>>>>> the member name; and the assert is not needed anymore there.
>>>>
>>>> Refactoring code in mangle.cc is tricky given the intent to retain
>>>> backward bug-compatibility.
>>>>
>>>> Specifically, adding the "on" in ABI v11 is wrong since GCC 10 (ABI
>>>> v14-15) didn't emit it for the 109790 testcase; we can add it for v16,
>>>> since GCC 11 ICEd on the testcase.
>>>>
>>>> I would prefer to fix the bug locally rather than refactor.
>>> Understood, that makes sense.
>>>
>>> I’ll work on a more local patch and resubmit (by the way you can also
>>> ignore
>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662496.html,
>>> that is also “too wide”).
>> I’m attaching a revised version of the patch, that lets members
>> with an anonymous union type go through, and for operators, introduces
>> a new ABI version under which it adds the missing "on".
>
> We can add the missing "on" in v16, since previous releases of v16-19
> ICEd on that case.
Duh you’re right; amended in the latest revision of the patch.
>
>> @@ -3255,7 +3255,15 @@ write_member_name (tree member)
>>  }
>>else if (DECL_P (member))
>>  {
>> -  gcc_assert (!DECL_OVERLOADED_OPERATOR_P (member));
>> +  if (ANON_AGGR_TYPE_P (TREE_TYPE (member)))
>> +;
>> +  else if (DECL_OVERLOADED_OPERATOR_P (member))
>> +{
>> +  if (abi_check (20))
>> +write_string ("on");
>> +}
>> +  else
>> +gcc_assert (identifier_p (DECL_NAME (member)));
>
> This last else is redundant; checking DECL_OVERLOADED_OPERATOR_P 
> already asserts that it's an identifier.
Thanks for pointing this out, fixed in the attached revision, 
successfully tested on x86_64-pc-linux-gnu. OK for trunk?

Thanks, Simon

From 79dfd8c8fc9e6c33549cc041c13c49acaf6b4994 Mon Sep 17 00:00:00 2001
From: Simon Martin 
Date: Mon, 16 Sep 2024 13:45:32 +0200
Subject: [PATCH] c++: Don't crash when mangling member with anonymous union or 
template type [PR100632, PR109790]

We currently crash upon mangling members that have an anonymous union or
a template operator type.

The problem is that before calling write_unqualified_name,
write_member_name asserts that it has a declaration whose DECL_NAME is
an identifier node that is not that of an operator. This is wrong:
 - In PR100632, it's an anonymous union declaration, hence a 0 DECL_NAME
 - In PR109790, it's a legitimate template declaration for an operator
   (this was accepted up to GCC 10)

This assert was added via r11-6301, to be sure that we do write the "on"
marker for operator members.

This patch removes that assert and instead
 - Lets members with an anonymous union type go through
 - For operators, adds the missing "on" marker for ABI versions greater
   than the highest usable with GCC 10

Successfully tested on x86_64-pc-linux-gnu.

PR c++/109790
PR c++/100632

gcc/

Re: [PATCH] Fortran: Added support for locality specs in DO CONCURRENT (Fortran 2018/23)

2024-09-23 Thread Tobias Burnus


Hi Andre,

Andre Vehreschild wrote:

Could you also please specify the commit SHA your patch is supposed to apply
to? At current mainline's HEAD it has several rejects which makes reviewing
harder.


I just tried and here it applies cleanly on mainline, except that I get 
a bunch of:


Hunk #1 succeeded at 2904 (offset 74 lines).

style of warning, but those hunks still seem to end up at the proper place.


And please attach the patch as plain text. It is html-encoded with several
html-codes, for example a '>' is encoded as '&gt;'. This makes it nearly
impossible to apply.


I don't see this in my email program – and also when looking at
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663534.html – I
don't see any '&gt;' – also not when looking at the HTML attachment.



please check the code style of your patch using:
contrib/check_GNU_style.py 
It reports several errors with line length and formatting.


Hmm, I only see errors related to tree dump, which seem to be okay:

=== ERROR type #1: there should be exactly one space between function name and parenthesis (7 error(s)) ===

gcc/fortran/dump-parse-tree.cc:2915:17:   fputs (" LOCAL(", dumpfile);

And the following is in the parser – and the spaces are mandatory here:

=== ERROR type #2: there should be no space before closing parenthesis (1 error(s)) ===
gcc/fortran/match.cc:2758:41:   else if (gfc_match ("default ( none )") == MATCH_YES)


I wonder what's the difference between our email readers. – Can you try
the version from the mailing list archive?

Cheers,

Tobias

[PATCH] tree-optimization/116818 - try VMAT_GATHER_SCATTER also for SLP

2024-09-23 Thread Richard Biener

When not doing SLP and we end up with VMAT_ELEMENTWISE we consider
using strided loads, aka VMAT_GATHER_SCATTER.  The following moves
this logic down to also apply to SLP where we now can end up
using VMAT_ELEMENTWISE as well.
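
For readers less familiar with the two access kinds, the relationship
can be modelled in scalar code.  This is only an illustrative sketch,
not vectorizer code: LANES, load_elementwise and load_gather are made-up
names, and the "gather" is modelled as a loop over a precomputed offset
vector where a real target would issue a single gather instruction.

```c
#include <assert.h>
#include <stddef.h>

enum { LANES = 4 };

/* Element-wise access: one scalar load per lane at base[lane * stride].  */
void load_elementwise (const int *base, size_t stride, int out[LANES])
{
  for (size_t lane = 0; lane < LANES; lane++)
    out[lane] = base[lane * stride];
}

/* Strided access expressed as a gather: build the offset vector
   { 0, stride, 2 * stride, ... } first, then load through it.  */
void load_gather (const int *base, size_t stride, int out[LANES])
{
  size_t offsets[LANES];
  for (size_t lane = 0; lane < LANES; lane++)
    offsets[lane] = lane * stride;
  for (size_t lane = 0; lane < LANES; lane++)
    out[lane] = base[offsets[lane]];
}
```

Both helpers read the same elements; the difference is only in how the
addresses are produced, which is what makes a strided gather a possible
replacement for element-wise loads.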

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

PR tree-optimization/116818
* tree-vect-stmts.cc (get_group_load_store_type): Consider
VMAT_GATHER_SCATTER instead of VMAT_ELEMENTWISE also for SLP.
---
 gcc/tree-vect-stmts.cc | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index ad08fbe5511..d74497822c4 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2264,21 +2264,21 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
}
}
}
+}
 
-  /* As a last resort, trying using a gather load or scatter store.
+  /* As a last resort, trying using a gather load or scatter store.
 
-??? Although the code can handle all group sizes correctly,
-it probably isn't a win to use separate strided accesses based
-on nearby locations.  Or, even if it's a win over scalar code,
-it might not be a win over vectorizing at a lower VF, if that
-allows us to use contiguous accesses.  */
-  if (*memory_access_type == VMAT_ELEMENTWISE
- && single_element_p
- && loop_vinfo
- && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo,
-masked_p, gs_info))
-   *memory_access_type = VMAT_GATHER_SCATTER;
-}
+ ??? Although the code can handle all group sizes correctly,
+ it probably isn't a win to use separate strided accesses based
+ on nearby locations.  Or, even if it's a win over scalar code,
+ it might not be a win over vectorizing at a lower VF, if that
+ allows us to use contiguous accesses.  */
+  if (*memory_access_type == VMAT_ELEMENTWISE
+  && single_element_p
+  && loop_vinfo
+  && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo,
+masked_p, gs_info))
+*memory_access_type = VMAT_GATHER_SCATTER;
 
   if (*memory_access_type == VMAT_GATHER_SCATTER
   || *memory_access_type == VMAT_ELEMENTWISE)
-- 
2.43.0

Re: [PATCH RFC] build: enable C++11 narrowing warnings

2024-09-23 Thread Jason Merrill


On 9/23/24 9:05 AM, Richard Biener wrote:

On Sat, Sep 21, 2024 at 2:49 AM Jason Merrill  wrote:


Tested x86_64-pc-linux-gnu.  OK for trunk?

-- 8< --

We've been using -Wno-narrowing since gcc 4.7, but at this point narrowing
diagnostics seem like a stable part of C++ and we should adjust.

This patch changes -Wno-narrowing to -Wno-error=narrowing so that narrowing
issues will still not break bootstrap, but we can see them.

The rest of the patch fixes the narrowing warnings I see in an
x86_64-pc-linux-gnu bootstrap.  In most of the cases, by adjusting the types
of various declarations so that we store the values in the same types we
compute them in, which seems worthwhile anyway.  This also allowed us to
remove a few -Wsign-compare casts.

The one place I didn't see how best to do this was in
vect_prologue_cost_for_slp: changing const_nunits to unsigned int broke the
call to TYPE_VECTOR_SUBPARTS (vectype).is_constant (&const_nunits), since
poly_uint64.is_constant wants a pointer to unsigned HOST_WIDE_INT.  So I
added casts in that one place.  Not too bad, I think.

+   unsigned HOST_WIDE_INT foff = bitpos_of_field (field);


Can you make bitpos_of_field return unsigned HOST_WIDE_INT then and adjust it
accordingly - it looks for shwi fitting but negative DECL_FIELD_OFFSET
or BIT_OFFSET are not a thing.


So, like the attached?


@@ -7471,7 +7471,8 @@ vect_prologue_cost_for_slp (slp_tree node,
nelt_limit = const_nunits;
hash_set vector_ops;
for (unsigned int i = 0; i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); ++i)
-   if (!vector_ops.add ({ ops, i * const_nunits, const_nunits }))


So why do we diagnose this case (unsigned int member) but not ...


+   if (!vector_ops.add
+   ({ ops, i * (unsigned)const_nunits, (unsigned)const_nunits }))
   starts.quick_push (i * const_nunits);


... this one - unsigned int function argument?


Because the former is in { }, and the latter isn't; narrowing 
conversions are only ill-formed within { }.
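
To illustrate the distinction drawn above, here is a minimal sketch, not taken from the patch. The names `key`, `take_unsigned` and `make_key` are hypothetical stand-ins for the hash-set element and for `quick_push`; `unsigned long long` stands in for unsigned HOST_WIDE_INT.

```cpp
#include <cassert>

// Hypothetical aggregate standing in for the element added to vector_ops.
struct key
{
  const void *ops;
  unsigned int start;
  unsigned int count;
};

// Hypothetical function with an unsigned int parameter, like quick_push.
inline unsigned int take_unsigned (unsigned int v) { return v; }

inline key make_key (unsigned long long nunits, unsigned int i)
{
  // key bad = { nullptr, i * nunits, nunits };
  //   ^ narrowing inside { }: unsigned long long -> unsigned int.

  // Converting once, outside the braces, keeps the initializer well-formed.
  unsigned int n = static_cast<unsigned int> (nunits);
  key k = { nullptr, i * n, n };

  // The same implicit conversion is permitted for a function argument,
  // which is why the quick_push call was not diagnosed.
  unsigned int implicit = take_unsigned (i * nunits);
  assert (implicit == k.start);
  return k;
}
```

The single conversion into `n` plays the same role as reusing `nelt_limit` in the follow-up discussion: both uses see one value of one type.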



I think it would be slightly better to do

 {
   unsigned start = (unsigned) const_nunits * i;
   if (!vector_ops.add ({ ops, start, const_nunits }))
     starts.quick_push (start);
 }

to avoid the non-obvious difference between both.


We'd still need the cast for the third element, but now I notice we can 
use nelt_limit instead since it just got the same value.


So, OK with this supplemental patch?
From 8c4b43c76d07b5e1638123588bda7196740c0211 Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Mon, 23 Sep 2024 08:49:13 -0400
Subject: [PATCH] gcc: narrowing tweaks
To: gcc-patches@gcc.gnu.org

gcc/ChangeLog:

	* tree-ssa-structalias.cc (bitpos_of_field): Return unsigned.
	* tree-vect-slp.cc (vect_prologue_cost_for_slp): Use nelt_limit
	instead of casts.
---
 gcc/tree-ssa-structalias.cc | 10 +-
 gcc/tree-vect-slp.cc|  5 ++---
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-ssa-structalias.cc b/gcc/tree-ssa-structalias.cc
index 7adb50a896d..d6a53f801f0 100644
--- a/gcc/tree-ssa-structalias.cc
+++ b/gcc/tree-ssa-structalias.cc
@@ -3220,15 +3220,15 @@ process_constraint (constraint_t t)
 /* Return the position, in bits, of FIELD_DECL from the beginning of its
structure.  */
 
-static HOST_WIDE_INT
+static unsigned HOST_WIDE_INT
 bitpos_of_field (const tree fdecl)
 {
-  if (!tree_fits_shwi_p (DECL_FIELD_OFFSET (fdecl))
-  || !tree_fits_shwi_p (DECL_FIELD_BIT_OFFSET (fdecl)))
+  if (!tree_fits_uhwi_p (DECL_FIELD_OFFSET (fdecl))
+  || !tree_fits_uhwi_p (DECL_FIELD_BIT_OFFSET (fdecl)))
 return -1;
 
-  return (tree_to_shwi (DECL_FIELD_OFFSET (fdecl)) * BITS_PER_UNIT
-	  + tree_to_shwi (DECL_FIELD_BIT_OFFSET (fdecl)));
+  return (tree_to_uhwi (DECL_FIELD_OFFSET (fdecl)) * BITS_PER_UNIT
+	  + tree_to_uhwi (DECL_FIELD_BIT_OFFSET (fdecl)));
 }
 
 
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index bfdc4f7dec9..af137577333 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -7471,9 +7471,8 @@ vect_prologue_cost_for_slp (slp_tree node,
   nelt_limit = const_nunits;
   hash_set vector_ops;
   for (unsigned int i = 0; i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); ++i)
-	if (!vector_ops.add
-	({ ops, i * (unsigned)const_nunits, (unsigned)const_nunits }))
-	  starts.quick_push (i * const_nunits);
+	if (!vector_ops.add ({ ops, i * nelt_limit, nelt_limit }))
+	  starts.quick_push (i * nelt_limit);
 }
   else
 {
-- 
2.46.0
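
One detail of the patch above worth spelling out: bitpos_of_field keeps its `return -1` while the return type becomes unsigned. The following is only a sketch of how such a sentinel behaves; `uhwi`, `bit_position` and `is_unknown` are hypothetical names, and 8 bits per unit is an assumption standing in for BITS_PER_UNIT.

```cpp
#include <cassert>
#include <limits>

// Stand-in for unsigned HOST_WIDE_INT (assumption: 64-bit unsigned).
typedef unsigned long long uhwi;

// Hypothetical analogue of bitpos_of_field: FITS models the
// tree_fits_uhwi_p checks, the offsets model the two field offsets.
inline uhwi bit_position (bool fits, uhwi byte_offset, uhwi bit_offset)
{
  if (!fits)
    return -1;   // Converts to the largest uhwi value.
  return byte_offset * 8 + bit_offset;
}

// Callers have to compare against the all-ones value, not test "< 0".
inline bool is_unknown (uhwi pos)
{
  return pos == std::numeric_limits<uhwi>::max ();
}
```

With an unsigned return type, a check such as `pos < 0` would always be false, so any caller relying on the sentinel needs the comparison shown in `is_unknown`.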

Re: [PATCH] c++: Implement C++23 P2718R0 - Wording for P2644R1 Fix for Range-based for Loop [PR107637]

2024-09-23 Thread Jason Merrill


On 8/9/24 9:06 PM, Jakub Jelinek wrote:

Hi!

The following patch implements the C++23 P2718R0 paper
- Wording for P2644R1 Fix for Range-based for Loop.
As all the temporaries from __for_range initialization should have life
extended until the end of __for_range scope, this patch disables (for C++23
and later only and if !processing_template_decl) CLEANUP_POINT_EXPR wrapping
of the __for_range declaration, also disables -Wdangling-reference warning
as well as the rest of extend_ref_init_temps (we know the __for_range temporary
is not TREE_STATIC and as all the temporaries from the initializer will be life
extended, we shouldn't try to handle temporaries referenced by references any
differently) and adds an extra push_stmt_list/pop_stmt_list before
cp_finish_decl of __for_range and after end of the for body and wraps all
that into CLEANUP_POINT_EXPR.
I had to repeat that also for OpenMP range loops because those are handled
differently.


Let's add a flag for this, not just control it with cxx_dialect.  We 
might want to consider enabling it by default in earlier modes when not 
being strictly conforming?
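
For readers unfamiliar with the problem P2718R0 addresses, here is a minimal sketch. `Holder`, `make_holder` and `sum_items` are hypothetical names, not part of the patch. Before C++23, only the temporary bound directly to the range reference is lifetime-extended, so the commented-out loop would iterate over a destroyed object; the code below uses the portable workaround of naming the temporary first.

```cpp
#include <cassert>
#include <vector>

// Hypothetical type whose accessor returns a reference into itself.
struct Holder
{
  std::vector<int> data{1, 2, 3};
  const std::vector<int> &items () const { return data; }
};

inline Holder make_holder () { return Holder{}; }

inline int sum_items ()
{
  int sum = 0;
  // Before C++23 this would dangle: the Holder temporary dies before
  // the loop body runs, since only the result of items () is bound.
  //   for (int v : make_holder ().items ()) sum += v;

  // Portable in every mode: keep the Holder alive in a named variable.
  Holder h = make_holder ();
  for (int v : h.items ())
    sum += v;
  return sum;
}
```

Under the C++23 rule described in the patch, the commented-out form becomes valid as well, because all temporaries in the range initializer live until the end of the loop.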



@@ -44600,11 +44609,14 @@ cp_convert_omp_range_for (tree &this_pre
else
{
  range_temp = build_range_temp (init);
+ tree name = DECL_NAME (range_temp);
  DECL_NAME (range_temp) = NULL_TREE;
  pushdecl (range_temp);
+ DECL_NAME (range_temp) = name;
  cp_finish_decl (range_temp, init,
  /*is_constant_init*/false, NULL_TREE,
  LOOKUP_ONLYCONVERTING);
+ DECL_NAME (range_temp) = NULL_TREE;


This messing with the name needs a rationale.  What wants it to be null?

Jason

[PATCH] [MAINTAINERS] Fix myself in order and add username

2024-09-23 Thread saurabh.jha


ChangeLog:

* MAINTAINERS: Fix sort order and add username.
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 0ea4db20f88..3b4cf9d20d8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -554,10 +554,10 @@ Sam James   sjames  
 Surya Kumari Jangalajskumari
 Jakub Jelinek   jakub   
 Andrew Jenner   andrewjenner
+Saurabh Jha saurabhjha  
 Haochen Jiang   -   
 Qian Jianhua-   
 Michal Jiresmjires  
-Saurabh Jha -   
 Janis Johnson   janis   
 Teresa Johnson  tejohnson   
 Kean Johnston   -

Re: [PATCH v10 0/2] Add support for AdvSIMD faminmax

2024-09-23 Thread Saurabh Jha





On 9/18/2024 4:28 PM, saurabh@arm.com wrote:

From: Saurabh Jha 

This is a revised version of this patch series:
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663204.html

The only change in both patches of this series is a fix to the directives
in the test cases: /* { dg-do assemble } */ is replaced with
/* { dg-do compile } */. We need compile here to make the tests work.
Sorry for missing this review comment in my previous version.

No changes in code.

Neither patch requires further review, as pointed out by Richard
Sandiford in replies to the two patches:
* https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663229.html
* https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663230.html

I will request commit access to gcc after this patch is accepted.
Because I already have commit access to binutils, I will email the
overseers.


Thanks for the reviews. I now have the commit access and have committed 
this patch series.


Saurabh Jha (2):
   aarch64: Add AdvSIMD faminmax intrinsics
   aarch64: Add codegen support for AdvSIMD faminmax

  gcc/config/aarch64/aarch64-builtins.cc| 119 
  .../aarch64/aarch64-option-extensions.def |   2 +
  .../aarch64/aarch64-simd-pragma-builtins.def  |  23 ++
  gcc/config/aarch64/aarch64-simd.md|  19 ++
  gcc/config/aarch64/aarch64.h  |   4 +
  gcc/config/aarch64/iterators.md   |  12 +
  gcc/doc/invoke.texi   |   2 +
  .../aarch64/simd/faminmax-builtins-no-flag.c  |  10 +
  .../aarch64/simd/faminmax-builtins.c  | 115 
  .../aarch64/simd/faminmax-codegen-no-flag.c   | 217 ++
  .../aarch64/simd/faminmax-codegen.c   | 197 +
  .../aarch64/simd/faminmax-no-codegen.c| 267 ++
  12 files changed, 987 insertions(+)
  create mode 100644 gcc/config/aarch64/aarch64-simd-pragma-builtins.def
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins-no-flag.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins.c
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen-no-flag.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-no-codegen.c

[PATCH] hosthooks.h: Fix GCC_HOST_HOOKS_H typo

2024-09-23 Thread Yangyu Chen

The comment on the final #endif in hosthooks.h is wrong; it should be
GCC_HOST_HOOKS_H instead of GCC_LANG_HOOKS_H.

gcc/ChangeLog:

* hosthooks.h (struct host_hooks): Fix GCC_HOST_HOOKS_H typo.

Signed-off-by: Yangyu Chen 
---
 gcc/hosthooks.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/hosthooks.h b/gcc/hosthooks.h
index 53363801330..8178c9c692a 100644
--- a/gcc/hosthooks.h
+++ b/gcc/hosthooks.h
@@ -47,4 +47,4 @@ struct host_hooks
 /* Each host provides its own.  */
 extern const struct host_hooks host_hooks;
 
-#endif /* GCC_LANG_HOOKS_H */
+#endif /* GCC_HOST_HOOKS_H */
-- 
2.45.2

Re: [PATCH] doc: Remove @code wrapping of fortran option names [PR116801]

2024-09-23 Thread Andreas Schwab

On Sep 23 2024, Mikael Morin wrote:

> For options where the variable is not a separate argument, I think it's
> preferable to keep the variable.
>
> For example, -ffree-line-length-@var{n} looks better on the index page as
> "-ffree-line-length-n" (with the n having a different formatting), than as
> "-free-line-length-".  It makes it clear that there is a suffix to the
> option.

Whichever solution you feel is right, please make it consistent
throughout the manual.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

Re: [PATCH] doc: Remove @code wrapping of fortran option names [PR116801]

2024-09-23 Thread Arsen Arsenović

Andreas Schwab  writes:

> It's only about the @opindex.  The vast majority have those variable
> parts removed from the index entry.

We can probably do both at the same time, either via makeinfo's -D option
and some special macro, or by emitting a machine-generated index, maybe.

How would a flag which has a variable part in the name (like
ffree-line-length-...) get matched?  i.e. what would we need to emit for
it to be linked correctly?  This impacts what the right solution is.

TIA, have a lovely day.
-- 
Arsen Arsenović


Re: [PATCH RFC] build: enable C++11 narrowing warnings

2024-09-23 Thread Richard Biener

On Sat, Sep 21, 2024 at 2:49 AM Jason Merrill  wrote:
>
> Tested x86_64-pc-linux-gnu.  OK for trunk?
>
> -- 8< --
>
> We've been using -Wno-narrowing since gcc 4.7, but at this point narrowing
> diagnostics seem like a stable part of C++ and we should adjust.
>
> This patch changes -Wno-narrowing to -Wno-error=narrowing so that narrowing
> issues will still not break bootstrap, but we can see them.
>
> The rest of the patch fixes the narrowing warnings I see in an
> x86_64-pc-linux-gnu bootstrap.  In most of the cases, by adjusting the types
> of various declarations so that we store the values in the same types we
> compute them in, which seems worthwhile anyway.  This also allowed us to
> remove a few -Wsign-compare casts.
>
> The one place I didn't see how best to do this was in
> vect_prologue_cost_for_slp: changing const_nunits to unsigned int broke the
> call to TYPE_VECTOR_SUBPARTS (vectype).is_constant (&const_nunits), since
> poly_uint64.is_constant wants a pointer to unsigned HOST_WIDE_INT.  So I
> added casts in that one place.  Not too bad, I think.
>
> gcc/ChangeLog:
>
> * configure.ac (CXX_WARNING_OPTS): Change -Wno-narrowing
> to -Wno-error=narrowing.
> * configure: Regenerate.
> * config/i386/i386.h (debugger_register_map)
> (debugger64_register_map)
> (svr4_debugger_register_map): Make unsigned.
> * config/i386/i386.cc: Likewise.
> * diagnostic-event-id.h (diagnostic_thread_id_t): Make int.
> * vec.h (vec::size): Make unsigned int.
> * ipa-modref.cc (escape_point::arg): Make unsigned.
> (modref_lattice::add_escape_point): Use eaf_flags_t.
> (update_escape_summary_1): Use eaf_flags_t, && for bool.
> * pair-fusion.cc (pair_fusion_bb_info::track_access):
> Make mem_size unsigned int.
> * pretty-print.cc (format_phase_2): Cast va_arg to char.
> * tree-ssa-loop-ch.cc (ch_base::copy_headers): Make nheaders
> unsigned, remove cast.
> * tree-ssa-structalias.cc (push_fields_onto_fieldstack):
> Make offset unsigned, remove cast.
> * tree-vect-slp.cc (vect_prologue_cost_for_slp): Add cast.
> * tree-vect-stmts.cc (vect_truncate_gather_scatter_offset):
> Make scale unsigned.
> (vectorizable_operation): Make ncopies unsigned.
> * rtl-ssa/member-fns.inl: Make num_accesses unsigned int.
> ---
>  gcc/config/i386/i386.h  |  6 +++---
>  gcc/diagnostic-event-id.h   |  2 +-
>  gcc/vec.h   |  2 +-
>  gcc/config/i386/i386.cc |  6 +++---
>  gcc/ipa-modref.cc   | 13 +++--
>  gcc/pair-fusion.cc  |  2 +-
>  gcc/pretty-print.cc |  2 +-
>  gcc/tree-ssa-loop-ch.cc |  6 +++---
>  gcc/tree-ssa-structalias.cc |  6 +++---
>  gcc/tree-vect-slp.cc|  3 ++-
>  gcc/tree-vect-stmts.cc  |  7 ---
>  gcc/configure.ac|  3 +--
>  gcc/rtl-ssa/member-fns.inl  |  3 ++-
>  gcc/configure   |  7 +++
>  14 files changed, 35 insertions(+), 33 deletions(-)
>
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index c1ec92ffb15..751c250ddb3 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -2091,9 +2091,9 @@ do {
>   \
>  #define DEBUGGER_REGNO(N) \
>(TARGET_64BIT ? debugger64_register_map[(N)] : debugger_register_map[(N)])
>
> -extern int const debugger_register_map[FIRST_PSEUDO_REGISTER];
> -extern int const debugger64_register_map[FIRST_PSEUDO_REGISTER];
> -extern int const svr4_debugger_register_map[FIRST_PSEUDO_REGISTER];
> +extern unsigned int const debugger_register_map[FIRST_PSEUDO_REGISTER];
> +extern unsigned int const debugger64_register_map[FIRST_PSEUDO_REGISTER];
> +extern unsigned int const svr4_debugger_register_map[FIRST_PSEUDO_REGISTER];
>
>  /* Before the prologue, RA is at 0(%esp).  */
>  #define INCOMING_RETURN_ADDR_RTX \
> diff --git a/gcc/diagnostic-event-id.h b/gcc/diagnostic-event-id.h
> index 8237ba34df3..06985d23c12 100644
> --- a/gcc/diagnostic-event-id.h
> +++ b/gcc/diagnostic-event-id.h
> @@ -67,6 +67,6 @@ typedef diagnostic_event_id_t *diagnostic_event_id_ptr;
>  /* A type for compactly referring to a particular thread within a
> diagnostic_path.  Typically there is just one thread per path,
> with id 0.  */
> -typedef unsigned diagnostic_thread_id_t;
> +typedef int diagnostic_thread_id_t;
>
>  #endif /* ! GCC_DIAGNOSTIC_EVENT_ID_H */
> diff --git a/gcc/vec.h b/gcc/vec.h
> index bc83827f644..b13c4716428 100644
> --- a/gcc/vec.h
> +++ b/gcc/vec.h
> @@ -2409,7 +2409,7 @@ public:
>const value_type &back () const;
>const value_type &operator[] (unsigned int i) const;
>
> -  size_t size () const { return m_size; }
> +  unsigned size () const { return m_size; }
>size_t size_bytes () const { return m_size * sizeof (T); }
>bool empty () const { return m_size == 0; }
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/conf

Re: [PATCH 1/3] Remove commented out PHI_ARG_DEF macro defition

2024-09-23 Thread Richard Biener

On Mon, Sep 23, 2024 at 5:48 AM Andrew Pinski  wrote:
>
> This has been commented out since r0-125500-g80560f9521f81a and a new
> definition was added at the same time. Let's remove the commented
> out version.

OK

> gcc/ChangeLog:
>
> * tree-ssa-operands.h (PHI_ARG_DEF): Remove definition.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/tree-ssa-operands.h | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/gcc/tree-ssa-operands.h b/gcc/tree-ssa-operands.h
> index b6534f18c66..f368d5b59f8 100644
> --- a/gcc/tree-ssa-operands.h
> +++ b/gcc/tree-ssa-operands.h
> @@ -74,9 +74,6 @@ struct GTY(()) ssa_operands {
>
>  #define PHI_RESULT(PHI)gimple_phi_result (PHI)
>  #define SET_PHI_RESULT(PHI, V) SET_DEF (gimple_phi_result_ptr (PHI), (V))
> -/*
> -#define PHI_ARG_DEF(PHI, I)USE_FROM_PTR (PHI_ARG_DEF_PTR ((PHI), (I)))
> -*/
>  #define PHI_ARG_DEF_PTR(PHI, I)gimple_phi_arg_imm_use_ptr ((PHI), 
> (I))
>  #define PHI_ARG_DEF(PHI, I)gimple_phi_arg_def ((PHI), (I))
>  #define SET_PHI_ARG_DEF(PHI, I, V) \
> --
> 2.34.1
>

Re: [PATCH 2/3] gimple: Remove custom remove_pointer

2024-09-23 Thread Richard Biener

On Mon, Sep 23, 2024 at 5:48 AM Andrew Pinski  wrote:
>
> Since r11-2700-g22dc89f8073cd0, type_traits has been included via system.h so
> we don't need a custom version for gimple.h.
>
> Note a small C++14 cleanup would be to use remove_pointer_t directly here
> instead of remove_pointer<T>::type.
>
> bootstrapped and tested on x86_64-linux-gnu

OK
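
As a small illustration of the standard trait that replaces the custom one, here is a sketch assuming C++14 or later. `fake_assign` and `code_of` are hypothetical names; only `std::remove_pointer` and `std::remove_pointer_t` come from `<type_traits>`.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical statement type carrying a code_ constant, in the spirit
// of the types GIMPLE_CHECK2 is instantiated with.
struct fake_assign { static constexpr int code_ = 6; };

// T is expected to be a pointer type such as fake_assign *.
template <typename T>
constexpr int code_of ()
{
  // C++14 alias; equivalent to std::remove_pointer<T>::type::code_.
  return std::remove_pointer_t<T>::code_;
}

static_assert (std::is_same<std::remove_pointer<int *>::type, int>::value,
	       "remove_pointer strips one level of pointer");
static_assert (std::is_same<std::remove_pointer_t<const char *>,
			    const char>::value,
	       "the pointee's qualifiers are preserved");
```

Since `code_of` is a template, it needs neither `typename` before the alias form nor a custom trait in the header.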

> gcc/ChangeLog:
>
> * gimple.h (remove_pointer): Remove.
> (GIMPLE_CHECK2): Use std::remove_pointer instead of custom one.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/gimple.h | 8 ++--
>  1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/gimple.h b/gcc/gimple.h
> index ee986eaf153..4a6e0e97d1e 100644
> --- a/gcc/gimple.h
> +++ b/gcc/gimple.h
> @@ -37,10 +37,6 @@ enum gimple_code {
>  extern const char *const gimple_code_name[];
>  extern const unsigned char gimple_rhs_class_table[];
>
> -/* Strip the outermost pointer, from tr1/type_traits.  */
> -template<typename T> struct remove_pointer { typedef T type; };
> -template<typename T> struct remove_pointer<T *> { typedef T type; };
> -
>  /* Error out if a gimple tuple is addressed incorrectly.  */
>  #if defined ENABLE_GIMPLE_CHECKING
>  #define gcc_gimple_checking_assert(EXPR) gcc_assert (EXPR)
> @@ -72,7 +68,7 @@ GIMPLE_CHECK2(const gimple *gs,
>T ret = dyn_cast <T> (gs);
>if (!ret)
>  gimple_check_failed (gs, file, line, fun,
> -remove_pointer<T>::type::code_, ERROR_MARK);
> +std::remove_pointer<T>::type::code_, ERROR_MARK);
>return ret;
>  }
>  template <typename T>
> @@ -91,7 +87,7 @@ GIMPLE_CHECK2(gimple *gs,
>T ret = dyn_cast <T> (gs);
>if (!ret)
>  gimple_check_failed (gs, file, line, fun,
> -remove_pointer<T>::type::code_, ERROR_MARK);
> +std::remove_pointer<T>::type::code_, ERROR_MARK);
>return ret;
>  }
>  #else  /* not ENABLE_GIMPLE_CHECKING  */
> --
> 2.34.1
>

Re: [PATCH 3/3] gimple: Simplify gimple_seq_nondebug_singleton_p

2024-09-23 Thread Richard Biener

On Mon, Sep 23, 2024 at 5:48 AM Andrew Pinski  wrote:
>
> The implementation of gimple_seq_nondebug_singleton_p
> was convoluted in how it determined whether the sequence
> was a singleton (which could contain debug statements).
>
> This simplifies the function into two calls: one to get the start
> after all of the debug statements, and one to check whether it
> is at the one before the end (or there are only debug statements
> afterwards).
>
> Bootstrapped and tested on x86_64-linux-gnu (including ada).

OK
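
The logic being simplified can be sketched independently of GCC's iterators. This is an illustration only: `stmt`, `next_nondebug` and `nondebug_singleton_p` are hypothetical names over a plain `std::vector`, not the gimple_stmt_iterator API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a gimple statement.
struct stmt { bool is_debug; };

// Index of the first non-debug statement at or after FROM,
// or seq.size () when there is none.
inline std::size_t
next_nondebug (const std::vector<stmt> &seq, std::size_t from)
{
  while (from < seq.size () && seq[from].is_debug)
    ++from;
  return from;
}

// True when SEQ contains exactly one non-debug statement.
inline bool
nondebug_singleton_p (const std::vector<stmt> &seq)
{
  std::size_t first = next_nondebug (seq, 0);
  if (first == seq.size ())
    return false;   // Empty, or only debug statements.
  return next_nondebug (seq, first + 1) == seq.size ();
}
```

The two helper calls mirror the "find the start, then check nothing non-debug follows" structure that the patch expresses with gsi_start_nondebug and gsi_one_nondebug_before_end_p.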

> gcc/ChangeLog:
>
> * gimple-iterator.h (gimple_seq_nondebug_singleton_p):
> Rewrite to simply use
> gsi_start_nondebug/gsi_one_nondebug_before_end_p.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/gimple-iterator.h | 23 ++-
>  1 file changed, 2 insertions(+), 21 deletions(-)
>
> diff --git a/gcc/gimple-iterator.h b/gcc/gimple-iterator.h
> index 501f0549d92..97176d639d9 100644
> --- a/gcc/gimple-iterator.h
> +++ b/gcc/gimple-iterator.h
> @@ -430,28 +430,9 @@ gsi_seq (gimple_stmt_iterator i)
>  inline bool
>  gimple_seq_nondebug_singleton_p (gimple_seq seq)
>  {
> -  gimple_stmt_iterator gsi;
> -
> -  /* Find a nondebug gimple.  */
> -  gsi.ptr = gimple_seq_first (seq);
> -  gsi.seq = &seq;
> -  gsi.bb = NULL;
> -  while (!gsi_end_p (gsi)
> -&& is_gimple_debug (gsi_stmt (gsi)))
> -gsi_next (&gsi);
> -
> -  /* No nondebug gimple found, not a singleton.  */
> -  if (gsi_end_p (gsi))
> -return false;
> -
> -  /* Find a next nondebug gimple.  */
> -  gsi_next (&gsi);
> -  while (!gsi_end_p (gsi)
> -&& is_gimple_debug (gsi_stmt (gsi)))
> -gsi_next (&gsi);
> +  gimple_stmt_iterator gsi = gsi_start_nondebug (seq);
>
> -  /* Only a singleton if there's no next nondebug gimple.  */
> -  return gsi_end_p (gsi);
> +  return gsi_one_nondebug_before_end_p (gsi);
>  }
>
>  #endif /* GCC_GIMPLE_ITERATOR_H */
> --
> 2.34.1
>

Re: [Patch] OpenMP: Add support for 'self_maps' to the 'require' directive

2024-09-23 Thread Andre Vehreschild

Hi Tobias,

to my eye this looks fine. I would appreciate it if you could add some tests
for errors on the Fortran side, especially where modules are involved. But
that is not a must.

Ok for mainline. Thanks for the patch.

- Andre

On Sat, 21 Sep 2024 23:37:33 +0200
Tobias Burnus  wrote:

> Add support of the 'self_maps' clause in 'omp requires',
> an OpenMP 6 feature but added here mostly as part of the
> on-going improvement of the unified-shared memory (USM) handling.
> 
> Comments, remarks concerns before I commit it?
> 
> * * *
> 
> Regarding USM, there is on one hand the hardware:
> 
> - some hardware cannot access the host memory at all
> - other hardware can access it, but either only through
>an interconnect or via page migration on page fault
> - on the third type of hardware, a host and device share
>the same memory controller
> 
> For the latter, a 'map' never makes sense, but for
> the second case, it depends on the details whether it is
> better to do mapping or to access the memory directly
> (i.e. via interconnect or page migration).
> 
> On the compile-time side, the user can demand:
> - no requirement
> - 'requires unified_shared_memory' (= memory has to be accessible
>but the implementation can still do mapping for explicit maps)
> - 'requires self_maps' - mapping is strictly not permitted.
> - other hints using compiler flags
> 
> And at runtime, what is done depends on the actual hardware,
> the compile-time wishes, and environment variables.
> 
> * * *
> 
> Currently, the runtime never maps with USM, i.e. both act the same.
> At least using an environment variable, I would consider enabling
> mapping - one could also consider to have it always do mappings,
> except for self_maps.
> 
> On the compile side, we need to handle implicit 'declare target'
> better - as it currently leads to separate memory. Using 'link',
> we could point to the host memory (at least for 'self_maps').
> 
> And before we can enable USM by default for integrated/APU devices,
> we need to solve some issues with 'link' (→ posted link) and for
> those, 'map' has to be honored.
> 
> Those are 5.x follow up tasks, but having 'self_maps' available,
> completes the what-does-the-user-want part.
> 
> Tobias
> 
> PS: There is also the 'self' modifier to the map clause, working
> on a per-variable granularity. However, this, like several other
> 6.0 items, is completely out of scope of the current USM work.
> 
> PPS: See
> also https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663209.html and
> the associated patch set, posted
> at https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655946.html


-- 
Andre Vehreschild * Email: vehre ad gmx dot de

Re: [PATCH v3 08/12] OpenMP: Reject other properties with kind(any)

2024-09-23 Thread Jakub Jelinek

On Sun, Sep 22, 2024 at 08:45:40AM -0600, Sandra Loosemore wrote:
> On 9/21/24 22:52, Jakub Jelinek wrote:
> > On Sat, Sep 21, 2024 at 08:08:29PM -0600, Sandra Loosemore wrote:
> > > On 9/20/24 01:41, Jakub Jelinek wrote:
> > > > > +
> > > > > +   /* Check for unknown properties.  */
> > > > > if (omp_ts_map[ts_code].valid_properties == NULL)
> > > > >   continue;
> > > > > -
> > > > 
> > > > Why?
> > > 
> > > Why what?  I made this change because when I added another check in this
> > > loop I was temporarily confused about the control flow and I thought this
> > > would help.  If you're asking why it's doing the null check here, it's
> > > because it doesn't make sense to check properties on all selectors, like
> > > properties that can have implementation-defined values for other 
> > > compilers.
> > 
> > I meant why the empty line was removed?
> 
> To make it more clear that the null check is part of the code chunk to
> diagnose unknown properties -- each of which is now formatted the same way,
> with a blank line, comment to explain what it's doing, and no blank lines
> within the chunk.

Ok.

Jakub

Re: [PATCH v1] Genmatch: Fix ICE for binary phi cfg mismatching [PR116795]

2024-09-23 Thread Richard Biener

On Sun, Sep 22, 2024 at 12:50 PM  wrote:
>
> From: Pan Li 
>
> This patch fixes an ICE when trying to match the binary
> phi for the cfg below.  We checked that the first edge of the Phi block
> comes from b0, instead of checking that the only edge into b1 comes
> from b0.  Thus, some code was recognized as .SAT_SUB
> when it is not, which finally resulted in the verify_ssa failure.
>
> +--+
> | b0:  |
> | def  |   +-+
> | ...  |   | b1: |
> | cond |-->| def |
> +--+   | ... |
>|   +-+
>|  |
>|  |
>v  |
> +-+   |
> | b2: |   |
> | Phi |<--+
> +-+
>
> The test suites below pass with this patch.
> * The rv64gcv full regression test.
> * The x86 bootstrap test.
> * The x86 full regression test.

OK.

Thanks,
Richard.
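
The corrected condition can be sketched on a toy CFG. This is an illustration under stated assumptions, not the code in gimple-match-head.cc: `block`, `add_edge` and `triangle_p` are hypothetical names, and predecessor/successor lists are plain vectors.

```cpp
#include <cassert>
#include <vector>

// Hypothetical basic block with explicit predecessor/successor lists.
struct block
{
  std::vector<block *> preds;
  std::vector<block *> succs;
};

inline void add_edge (block &from, block &to)
{
  from.succs.push_back (&to);
  to.preds.push_back (&from);
}

// True when B0 has two successors and B1 has a single successor and a
// single predecessor, that predecessor being B0.
inline bool triangle_p (const block &b0, const block &b1)
{
  return b0.succs.size () == 2
	 && b1.succs.size () == 1
	 && b1.preds.size () == 1
	 && b1.preds[0] == &b0;   // The check the patch corrects.
}
```

Checking the predecessor of b1, rather than the first predecessor of the phi block, is what rules out shapes where b1 is reached from some unrelated block.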

> PR target/116795
>
> gcc/ChangeLog:
>
> * gimple-match-head.cc (match_cond_with_binary_phi): Fix the
> incorrect cfg check as b0->b1 in above example.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/torture/pr116795-1.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/gimple-match-head.cc  |  2 +-
>  gcc/testsuite/gcc.dg/torture/pr116795-1.c | 14 ++
>  2 files changed, 15 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr116795-1.c
>
> diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
> index b63b66e9485..b5d4a71ddc5 100644
> --- a/gcc/gimple-match-head.cc
> +++ b/gcc/gimple-match-head.cc
> @@ -402,7 +402,7 @@ match_cond_with_binary_phi (gphi *phi, tree *true_arg, 
> tree *false_arg)
>if (EDGE_COUNT (pred_b0->succs) == 2
>&& EDGE_COUNT (pred_b1->succs) == 1
>&& EDGE_COUNT (pred_b1->preds) == 1
> -  && pred_b0 == EDGE_PRED (gimple_bb (phi), 0)->src)
> +  && pred_b0 == EDGE_PRED (pred_b1, 0)->src)
>  /*
>   * +--+
>   * | b0:  |
> diff --git a/gcc/testsuite/gcc.dg/torture/pr116795-1.c 
> b/gcc/testsuite/gcc.dg/torture/pr116795-1.c
> new file mode 100644
> index 000..629bdf4bacd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr116795-1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +volatile int a, b;
> +int c;
> +int main() {
> +  unsigned e = 0;
> +  for (; e < 2; e++) {
> +a && b;
> +if (c)
> +  e = -(c ^ e);
> +  }
> +  return 0;
> +}
> --
> 2.43.0
>

Re: [PATCH v1] RISC-V: RISC-V: Add testcases for form 4 of signed vector SAT_ADD

2024-09-23 Thread 钟居哲

LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-09-23 13:43
To: gcc-patches
CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li
Subject: [PATCH v1] RISC-V: RISC-V: Add testcases for form 4 of signed vector 
SAT_ADD
From: Pan Li 
 
Form 4:
  #define DEF_VEC_SAT_S_ADD_FMT_4(T, UT, MIN, MAX) \
  void __attribute__((noinline))   \
  vec_sat_s_add_##T##_fmt_4 (T *out, T *op_1, T *op_2, unsigned limit) \
  {\
unsigned i;\
for (i = 0; i < limit; i++)\
  {\
T x = op_1[i]; \
T y = op_2[i]; \
T sum; \
bool overflow = __builtin_add_overflow (x, y, &sum);   \
out[i] = !overflow ? sum : x < 0 ? MIN : MAX;  \
  }\
  }
 
DEF_VEC_SAT_S_ADD_FMT_4 (int8_t, uint8_t, INT8_MIN, INT8_MAX)
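
For reference, the per-element computation of form 4 can be written as a scalar helper. This is only a sketch: `sat_s_add` is a hypothetical name, and it relies on the GCC/Clang builtin `__builtin_add_overflow` that the macro above also uses.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Scalar version of the loop body in form 4: add with overflow
// detection, then clamp towards the sign of the first operand.
template <typename T>
inline T sat_s_add (T x, T y)
{
  T sum;
  bool overflow = __builtin_add_overflow (x, y, &sum);
  if (!overflow)
    return sum;
  // Signed overflow only happens when both operands share a sign,
  // so the sign of x selects the bound.
  return x < 0 ? std::numeric_limits<T>::min ()
	       : std::numeric_limits<T>::max ();
}
```

The vectorizer is expected to recognize this shape per element and emit a saturating add, which is what the new tests scan for.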
 
The test below passes with this patch.
* The rv64gcv full regression test.
 
This is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48 hours.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-13.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-14.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-15.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-16.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-13.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-14.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-15.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-16.c: New test.
 
Signed-off-by: Pan Li 
---
.../rvv/autovec/binop/vec_sat_s_add-13.c  |  9 
.../rvv/autovec/binop/vec_sat_s_add-14.c  |  9 
.../rvv/autovec/binop/vec_sat_s_add-15.c  |  9 
.../rvv/autovec/binop/vec_sat_s_add-16.c  |  9 
.../rvv/autovec/binop/vec_sat_s_add-run-13.c  | 17 ++
.../rvv/autovec/binop/vec_sat_s_add-run-14.c  | 17 ++
.../rvv/autovec/binop/vec_sat_s_add-run-15.c  | 17 ++
.../rvv/autovec/binop/vec_sat_s_add-run-16.c  | 17 ++
.../riscv/rvv/autovec/vec_sat_arith.h | 22 +++
9 files changed, 126 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-13.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-14.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-15.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-16.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-13.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-14.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-15.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-16.c
 
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-13.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-13.c
new file mode 100644
index 000..ec3f8aee434
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-13.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize 
-fdump-rtl-expand-details" } */
+
+#include "../vec_sat_arith.h"
+
+DEF_VEC_SAT_S_ADD_FMT_4(int8_t, uint8_t, INT8_MIN, INT8_MAX)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
+/* { dg-final { scan-assembler-times {vsadd\.vv} 1 } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-14.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-14.c
new file mode 100644
index 000..5542616c90a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-14.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize 
-fdump-rtl-expand-details" } */
+
+#include "../vec_sat_arith.h"
+
+DEF_VEC_SAT_S_ADD_FMT_4(int16_t, uint16_t, INT16_MIN, INT16_MAX)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
+/* { dg-final { scan-assembler-times {vsadd\.vv} 1 } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-15.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-15.c
new file mode 100644
index 000..091bfd15edf
---

Re: [PATCH]middle-end: Insert invariant instructions before the gsi [PR116812]

2024-09-23 Thread Richard Biener

On Mon, 23 Sep 2024, Tamar Christina wrote:

> Hi All,
> 
> The new invariant statements should be inserted before the current
> statement and not after.  This goes fine 99% of the time but when the
> current statement is a gcond the control flow gets corrupted.
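
To illustrate the failure mode described above, here is a minimal standalone sketch. It is a toy model, not GCC's gimple iterator API; the block layout and all function names are hypothetical. It only shows why inserting after a conditional that ends a block breaks the "branch comes last" invariant, while inserting before it does not.

```c
#include <assert.h>
#include <string.h>

enum stmt_kind { STMT_ASSIGN, STMT_COND };

/* Hypothetical basic block: a short statement array in which a
   conditional branch, if present, must be the last statement.  */
struct block
{
  enum stmt_kind stmts[8];
  int n;
};

/* Insert KIND at index POS, shifting later statements up by one.  */
static void
insert_at (struct block *bb, int pos, enum stmt_kind kind)
{
  assert (bb->n < 8 && pos >= 0 && pos <= bb->n);
  memmove (&bb->stmts[pos + 1], &bb->stmts[pos],
           (size_t) (bb->n - pos) * sizeof bb->stmts[0]);
  bb->stmts[pos] = kind;
  bb->n++;
}

/* Insert KIND before, respectively after, the statement at index CUR.  */
static void
insert_before (struct block *bb, int cur, enum stmt_kind kind)
{
  insert_at (bb, cur, kind);
}

static void
insert_after (struct block *bb, int cur, enum stmt_kind kind)
{
  insert_at (bb, cur + 1, kind);
}

/* A block is well formed if no statement follows a conditional.  */
static int
well_formed (const struct block *bb)
{
  for (int i = 0; i + 1 < bb->n; i++)
    if (bb->stmts[i] == STMT_COND)
      return 0;
  return 1;
}
```

With a block whose only statement is a STMT_COND, insert_before (&bb, 0, STMT_ASSIGN) keeps well_formed true, whereas insert_after (&bb, 0, STMT_ASSIGN) leaves an assignment behind the branch.
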
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/116812
>   * tree-vect-slp.cc (vect_slp_region): Fix insertion.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/116812
>   * gcc.dg/vect/pr116812.c: New test.
> 
> ---
> diff --git a/gcc/testsuite/gcc.dg/vect/pr116812.c 
> b/gcc/testsuite/gcc.dg/vect/pr116812.c
> new file mode 100644
> index 
> ..3e83c13d94bdb475828971efb5b6f2e5101eaebe
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr116812.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2 -fno-tree-dce -fno-tree-dse" } */
> +
> +int a, b, c, d, e, f[2], g, h;
> +int k(int j) { return 2 >> a ? 2 >> a : a; }
> +int main() {
> +  int i;
> +  for (; g; g = k(d = 0))
> +;
> +  if (a)
> +b && h;
> +  for (e = 0; e < 2; e++)
> +c = d & 1 ? d : 0;
> +  for (i = 0; i < 2; i++)
> +f[i] = 0;
> +  return 0;
> +}
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 
> 600987dd6e5d506aa5fbb02350f9dab77793d382..d08d2f84a00ed307e03fc1c027681d428e12fdd0
>  100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -9170,8 +9170,8 @@ vect_slp_region (vec bbs, 
> vec datarefs,
>  
> gimple_stmt_iterator gsi;
> gsi = gsi_after_labels (bb_vinfo->bbs[0]);
> -   gsi_insert_seq_after (&gsi, bb_vinfo->inv_pattern_def_seq,
> - GSI_CONTINUE_LINKING);
> +   gsi_insert_seq_before (&gsi, bb_vinfo->inv_pattern_def_seq,
> +  GSI_CONTINUE_LINKING);

Please use bb_vinfo->insert_seq_on_entry

OK with that change.
Richard.

>   }
>   }
>else
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

RE: [PATCH 2/2]middle-end: use two's complement equality when comparing IVs during candidate selection [PR114932]

2024-09-23 Thread Tamar Christina

ping

> -Original Message-
> From: Tamar Christina
> Sent: Tuesday, September 10, 2024 8:57 PM
> To: Tamar Christina ; gcc-patches@gcc.gnu.org
> Cc: nd ; rguent...@suse.de; j...@ventanamicro.com
> Subject: RE: [PATCH 2/2]middle-end: use two's complement equality when
> comparing IVs during candidate selection [PR114932]
> 
> ping
> 
> > -Original Message-
> > From: Tamar Christina 
> > Sent: Tuesday, August 20, 2024 2:06 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd ; rguent...@suse.de; j...@ventanamicro.com
> > Subject: [PATCH 2/2]middle-end: use two's complement equality when
> comparing
> > IVs during candidate selection [PR114932]
> >
> > Hi All,
> >
> > IVOPTS normally uses affine trees to perform comparisons between different
> > IVs, but these seem to have been missing in two key spots, where normal tree
> > equivalences were used instead.
> >
> > In some cases where we have a two's complement equivalence but not a strict
> > signedness equivalence, we end up generating both a signed and an unsigned
> > IV for the same candidate.
> >
> > This patch implements a new OEP flag called OEP_STRUCTURAL_EQ.  This flag
> > will check if the operands would produce the same bit values after the
> > computations even if the final sign is different.
> >
> > This happens quite a lot with Fortran but can also happen in C because this
> > same code is unable to figure out when one expression is a multiple of
> > another.
> >
> > As an example in the attached testcase we get:
> >
> > Initial set of candidates:
> >   cost: 24 (complexity 3)
> >   reg_cost: 9
> >   cand_cost: 15
> >   cand_group_cost: 0 (complexity 3)
> >   candidates: 1, 6, 8
> >group:0 --> iv_cand:6, cost=(0,1)
> >group:1 --> iv_cand:1, cost=(0,0)
> >group:2 --> iv_cand:8, cost=(0,1)
> >group:3 --> iv_cand:8, cost=(0,1)
> >   invariant variables: 6
> >   invariant expressions: 1, 2
> >
> > :
> > inv_expr 1: stride.3_27 * 4
> > inv_expr 2: (unsigned long) stride.3_27 * 4
> >
> > These end up being used in the same group:
> >
> > Group 1:
> > cand  costcompl.  inv.expr.   inv.vars
> > 1 0   0   NIL;6
> > 2 0   0   NIL;6
> > 3 0   0   NIL;6
> >
> > which ends up with IV opts picking the signed and unsigned IVs:
> >
> > Improved to:
> >   cost: 24 (complexity 3)
> >   reg_cost: 9
> >   cand_cost: 15
> >   cand_group_cost: 0 (complexity 3)
> >   candidates: 1, 6, 8
> >group:0 --> iv_cand:6, cost=(0,1)
> >group:1 --> iv_cand:1, cost=(0,0)
> >group:2 --> iv_cand:8, cost=(0,1)
> >group:3 --> iv_cand:8, cost=(0,1)
> >   invariant variables: 6
> >   invariant expressions: 1, 2
> >
> > and so generates the same IV as both signed and unsigned:
> >
> > ;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq
> > 58.2545), maybe hot
> > ;;prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
> > ;;pred:   28 [always]  count:23622320 (estimated locally, freq 
> > 6.4080)
> > (FALLTHRU,EXECUTABLE)
> > ;;25 [always]  count:191126046 (estimated locally, freq 
> > 51.8465)
> > (FALLTHRU,DFS_BACK,EXECUTABLE)
> >   # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
> >   # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
> >   # ivtmp.26_51 = PHI 
> >   # ivtmp.28_90 = PHI 
> >
> > ...
> >
> > ;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq
> > 58.2545), maybe hot
> > ;;prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
> > ;;pred:   22 [always]  count:95443719 (estimated locally, freq 
> > 25.8909)
> > (FALLTHRU)
> > ;;21 [33.3% (guessed)]  count:71582790 (estimated locally, 
> > freq
> 19.4182)
> > (TRUE_VALUE,EXECUTABLE)
> > ;;31 [33.3% (guessed)]  count:47721860 (estimated locally, 
> > freq
> 12.9455)
> > (TRUE_VALUE,EXECUTABLE)
> > # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>
> > ivtmp.22_82 = ivtmp.22_41 + 1;
> > ivtmp.26_72 = ivtmp.26_51 + _80;
> > ivtmp.28_98 = ivtmp.28_90 + _39;
> >
> > These two IVs are always used as unsigned, so IV ops generates:
> >
> >   _73 = stride.3_27 * 4;
> >   _80 = (unsigned long) _73;
> >   _54 = (unsigned long) stride.3_27;
> >   _39 = _54 * 4;
> >
> > Which means that in e.g. exchange2 we generate a lot of duplicate code.
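
As a small standalone illustration of why the two step computations quoted above are duplicates, the following sketch compares them directly. The function names are made up for this example and the check assumes the signed multiplication does not overflow; it is not code from the patch.

```c
#include <assert.h>

/* Hypothetical stand-in for "_73 = stride * 4; _80 = (unsigned long) _73;":
   multiply as signed, then convert.  Assumes stride * 4 does not overflow.  */
static unsigned long
signed_then_convert (long stride)
{
  return (unsigned long) (stride * 4);
}

/* Hypothetical stand-in for "_54 = (unsigned long) stride; _39 = _54 * 4;":
   convert first, then multiply as unsigned (wraps modulo 2^N).  */
static unsigned long
convert_then_unsigned (long stride)
{
  return (unsigned long) stride * 4;
}

/* Return 1 if both computations yield the same bit pattern for STRIDE.  */
static int
same_bits (long stride)
{
  return signed_then_convert (stride) == convert_then_unsigned (stride);
}
```

For any stride in the non-overflowing range, including negative ones such as -3, same_bits returns 1: the values differ only in how the sign is interpreted, which is the kind of equivalence the description says the new flag is meant to recognise.
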
> >
> > This is because candidates 6 and 8 are equivalent under two's complement but
> > have different signs.
> >
> > This patch changes it so that if two IVs are affine equivalent, we just
> > pick one over the other.  IVOPTS already has code for this, so the patch
> > just uses affine trees instead of trees for the check.
> >
> > With it we get:
> >
> > :
> > inv_expr 1: stride.3_27 * 4
> >
> > :
> > Group 0:
> >   cand  costcompl.  inv.expr.   inv.vars
> >   5 0   2   NIL;NIL;
> >   6 0   3   NIL;NIL;
> >
> > Group 1:
> >   cand  costcompl.  inv.expr.   inv.vars
> >   1 0

RE: [PATCH 1/2]middle-end: refactor type to be explicit in operand_equal_p [PR114932]

2024-09-23 Thread Tamar Christina

ping

> -Original Message-
> From: Tamar Christina
> Sent: Tuesday, September 10, 2024 8:57 PM
> To: Tamar Christina ; gcc-patches@gcc.gnu.org
> Cc: nd ; rguent...@suse.de; j...@ventanamicro.com
> Subject: RE: [PATCH 1/2]middle-end: refactor type to be explicit in
> operand_equal_p [PR114932]
> 
> ping
> 
> > -Original Message-
> > From: Tamar Christina 
> > Sent: Tuesday, August 20, 2024 2:06 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd ; rguent...@suse.de; j...@ventanamicro.com
> > Subject: [PATCH 1/2]middle-end: refactor type to be explicit in 
> > operand_equal_p
> > [PR114932]
> >
> > Hi All,
> >
> > This is a refactoring with no expected behavioral change.
> > The goal with this is to make the type of the expressions being used
> > explicit.
> >
> > I did not change all the recursive calls to operand_equal_p () to recurse
> > directly to the new function but instead this goes through the top level
> > call which re-extracts the types.
> >
> > This was done because in most of the cases where we recurse type == arg.
> > The second patch makes use of this new flexibility to implement an overload
> > of operand_equal_p which checks for equality under two's complement.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > -m32, -m64 and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR tree-optimization/114932
> > * fold-const.cc (operand_compare::operand_equal_p): Split into one that
> > takes explicit type parameters and use that in public one.
> > * fold-const.h (class operand_compare): Add operand_equal_p private
> > overload.
> >
> > ---
> > diff --git a/gcc/fold-const.h b/gcc/fold-const.h
> > index
> >
> b82ef137e2f2096f86c20df3c7749747e604177e..878545b1148b839e8a8e866f
> > 38e31161f0d116c8 100644
> > --- a/gcc/fold-const.h
> > +++ b/gcc/fold-const.h
> > @@ -273,6 +273,12 @@ protected:
> >   true is returned.  Then RET is set to corresponding comparsion 
> > result.  */
> >bool verify_hash_value (const_tree arg0, const_tree arg1, unsigned int 
> > flags,
> >   bool *ret);
> > +
> > +private:
> > +  /* Return true if two operands are equal.  The flags fields can be used
> > + to specify OEP flags described in tree-core.h.  */
> > +  bool operand_equal_p (tree, const_tree, tree, const_tree,
> > +   unsigned int flags);
> >  };
> >
> >  #endif // GCC_FOLD_CONST_H
> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > index
> >
> 8908e7381e72cbbf4a8fd96f18cbf4436aba8441..71e82b1d76d4106c7c23c54af
> > 8b35905a1af9f1c 100644
> > --- a/gcc/fold-const.cc
> > +++ b/gcc/fold-const.cc
> > @@ -3156,6 +3156,17 @@ combine_comparisons (location_t loc,
> >  bool
> >  operand_compare::operand_equal_p (const_tree arg0, const_tree arg1,
> >   unsigned int flags)
> > +{
> > +  return operand_equal_p (TREE_TYPE (arg0), arg0, TREE_TYPE (arg1), arg1,
> > flags);
> > +}
> > +
> > +/* The same as operand_equal_p however the type of ARG0 and ARG1 are
> > assumed to be
> > +   the TYPE0 and TYPE1 respectively.  */
> > +
> > +bool
> > +operand_compare::operand_equal_p (tree type0, const_tree arg0,
> > + tree type1, const_tree arg1,
> > + unsigned int flags)
> >  {
> >bool r;
> >if (verify_hash_value (arg0, arg1, flags, &r))
> > @@ -3166,25 +3177,25 @@ operand_compare::operand_equal_p (const_tree
> > arg0, const_tree arg1,
> >
> >/* If either is ERROR_MARK, they aren't equal.  */
> >if (TREE_CODE (arg0) == ERROR_MARK || TREE_CODE (arg1) == ERROR_MARK
> > -  || TREE_TYPE (arg0) == error_mark_node
> > -  || TREE_TYPE (arg1) == error_mark_node)
> > +  || type0 == error_mark_node
> > +  || type1 == error_mark_node)
> >  return false;
> >
> >/* Similar, if either does not have a type (like a template id),
> >   they aren't equal.  */
> > -  if (!TREE_TYPE (arg0) || !TREE_TYPE (arg1))
> > +  if (!type0 || !type1)
> >  return false;
> >
> >/* Bitwise identity makes no sense if the values have different layouts. 
> >  */
> >if ((flags & OEP_BITWISE)
> > -  && !tree_nop_conversion_p (TREE_TYPE (arg0), TREE_TYPE (arg1)))
> > +  && !tree_nop_conversion_p (type0, type1))
> >  return false;
> >
> >/* We cannot consider pointers to different address space equal.  */
> > -  if (POINTER_TYPE_P (TREE_TYPE (arg0))
> > -  && POINTER_TYPE_P (TREE_TYPE (arg1))
> > -  && (TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (arg0)))
> > - != TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (arg1)))))
> > +  if (POINTER_TYPE_P (type0)
> > +  && POINTER_TYPE_P (type1)
> > +  && (TYPE_ADDR_SPACE (TREE_TYPE (type0))
> > + != TYPE_ADDR_SPACE (TREE_TYPE (type1))))
> >  return false;
> >
> >/* Check equality of integer constants before bailing out due to
> > @@ -3211,12 +3222,15 @@ operand_compare::opera

Re: [PATCH] Fortran: Added support for locality specs in DO CONCURRENT (Fortran 2018/23)

2024-09-23 Thread Tobias Burnus


Hi all,

as background: Anuj did this as part of his Google Summer of Code
project (thanks!).

As I looked at various drafts, I would be happy if someone else could
have a look as well, as I probably start skipping over things and,
hence, miss potential issues …

A bit hidden in the patch is a bug fix to allow 'concurrent' as loop
variable name of a normal 'do' loop …

Thanks,

Tobias

Anuj Mohite wrote:

gcc/fortran/ChangeLog:

* dump-parse-tree.cc (show_code_node): Updated to use
c->ext.concur.forall_iterator instead of c->ext.forall_iterator.
Added support for dumping DO CONCURRENT locality specifiers.
* frontend-passes.cc (index_interchange, gfc_code_walker): Updated to
use c->ext.concur.forall_iterator instead of c->ext.forall_iterator.
* gfortran.h (enum locality_type): Added new enum for locality types
in DO CONCURRENT constructs.
* match.cc (match_simple_forall, gfc_match_forall): Updated to use
new_st.ext.concur.forall_iterator instead of new_st.ext.forall_iterator.
(gfc_match_do): Implemented support for matching DO CONCURRENT locality
specifiers (LOCAL, LOCAL_INIT, SHARED, DEFAULT(NONE), and REDUCE).
* parse.cc (parse_do_block): Updated to use
new_st.ext.concur.forall_iterator instead of new_st.ext.forall_iterator.
* resolve.cc: Added struct check_default_none_data.
(do_concur_locality_specs_f2023): New function to check compliance
with F2023's C1133 constraint for DO CONCURRENT.
(check_default_none_expr): New function to check DEFAULT(NONE)
compliance.
(resolve_locality_spec): New function to resolve locality specs.
(gfc_count_forall_iterators): Updated to use
code->ext.concur.forall_iterator.
(gfc_resolve_forall): Updated to use code->ext.concur.forall_iterator.
* st.cc (gfc_free_statement): Updated to free locality specifications
and use p->ext.concur.forall_iterator.
* trans-stmt.cc (gfc_trans_forall_1): Updated to use
code->ext.concur.forall_iterator.

gcc/testsuite/ChangeLog:

* gfortran.dg/do_concurrent_10.f90: New test for parsing DO CONCURRENT
with 'concurrent' as a variable name.
* gfortran.dg/do_concurrent_8_f2018.f90: New test for F2018 DO
CONCURRENT with nested loops and REDUCE clauses.
* gfortran.dg/do_concurrent_8_f2023.f90: New test for F2023 DO
CONCURRENT with nested loops and REDUCE clauses.
* gfortran.dg/do_concurrent_9.f90: New test for DO CONCURRENT with
DEFAULT(NONE) and locality specs.
* gfortran.dg/do_concurrent_all_clauses.f90: New test covering all DO
CONCURRENT clauses and their interactions.
* gfortran.dg/do_concurrent_basic.f90: New basic test for DO CONCURRENT
functionality.
* gfortran.dg/do_concurrent_constraints.f90: New test for constraints
on DO CONCURRENT locality specs.
* gfortran.dg/do_concurrent_local_init.f90: New test for LOCAL_INIT
clause in DO CONCURRENT.
* gfortran.dg/do_concurrent_locality_specs.f90: New test for DO
CONCURRENT with locality specs.
* gfortran.dg/do_concurrent_multiple_reduce.f90: New test for multiple
REDUCE clauses in DO CONCURRENT.
* gfortran.dg/do_concurrent_nested.f90: New test for nested DO
CONCURRENT loops.
* gfortran.dg/do_concurrent_parser.f90: New test for DO CONCURRENT
parser error handling.
* gfortran.dg/do_concurrent_reduce_max.f90: New test for REDUCE with
MAX operation in DO CONCURRENT.
* gfortran.dg/do_concurrent_reduce_sum.f90: New test for REDUCE with
sum operation in DO CONCURRENT.
* gfortran.dg/do_concurrent_shared.f90: New test for SHARED clause in
DO CONCURRENT.

Signed-off-by: Anuj 
---
  gcc/fortran/dump-parse-tree.cc| 113 +-
  gcc/fortran/frontend-passes.cc|   8 +-
  gcc/fortran/gfortran.h|  20 +-
  gcc/fortran/match.cc  | 286 +-
  gcc/fortran/parse.cc  |   2 +-
  gcc/fortran/resolve.cc| 354 +-
  gcc/fortran/st.cc |   5 +-
  gcc/fortran/trans-stmt.cc |   6 +-
  .../gfortran.dg/do_concurrent_10.f90  |  11 +
  .../gfortran.dg/do_concurrent_8_f2018.f90 |  19 +
  .../gfortran.dg/do_concurrent_8_f2023.f90 |  23 ++
  gcc/testsuite/gfortran.dg/do_concurrent_9.f90 |  15 +
  .../gfortran.dg/do_concurrent_all_clauses.f90 |  26 ++
  .../gfortran.dg/do_concurrent_basic.f90   |  11 +
  .../gfortran.dg/do_concurrent_constraints.f90 | 126 +++
  .../gfortran.dg/do_concurrent_local_init.f90  |  11 +
  .../do_concurrent_locality_specs.f90  |  14 +
  .../do_concurrent_multiple_reduce.f90 |  17 +
  .../gfortran.dg/do_concurrent_nested.f90

Re: [Patch] gcn/mkoffload.cc: Re-add fprintf for #include of stdlib.h/stdbool.h

2024-09-23 Thread Tobias Burnus

Now committed as r15-3797-ga030fcad4f9f49 / 
https://gcc.gnu.org/r15-3797-ga030fcad4f9f49 as obvious.


Tobias

On 21.09.24 at 00:52, Tobias Burnus wrote:

See attached patch for adding the include lines:

+  if (gcn_stack_size)
+    {
+  fprintf (cfile, "#include <stdlib.h>\n");
+  fprintf (cfile, "#include <stdbool.h>\n\n");

but contrary to previously there is no 'stdint.h'
and they are also not unconditionally included.
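
A minimal standalone sketch of the idea, for illustration only: a generator that writes the two #include lines into the generated C file only when a stack size was requested. The function name and its return value are hypothetical, and 'stack_size' merely stands in for mkoffload's gcn_stack_size; this is not the actual mkoffload.cc code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical miniature of a source generator writing to CFILE.
   When STACK_SIZE is nonzero, the generated code is assumed to call
   getenv and to use 'true', so the matching headers are emitted first.
   Returns the number of #include lines written.  */
static int
emit_prologue (FILE *cfile, int stack_size)
{
  assert (cfile != NULL);
  if (!stack_size)
    return 0;
  fprintf (cfile, "#include <stdlib.h>\n");     /* declares getenv */
  fprintf (cfile, "#include <stdbool.h>\n\n");  /* defines true */
  return 2;
}
```

Calling emit_prologue (stdout, 0) prints nothing, while emit_prologue (stdout, 4096) prints both lines; without the first one, a later getenv call in the generated file would be an implicit declaration.
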

(The 'stdbool.h' is only used for a single 'true', but on the other hand
it is only #included under this condition and 'stdbool.h' is a very
simple file.)


I intend to apply this patch as obvious, unless there are further
comments.


* * *

Thomas Schwinge wrote:


I've not verified, but I very much suspect that this change: […]

 gcn/mkoffload.cc: Use #embed for including the generated ELF file
... is responsible for: […]
 /tmp/ccHVeRbm.c:80:21: error: implicit declaration of function 
'getenv' [-Wimplicit-function-declaration]

[…] Did you not see that happen in your testing?


I vaguely remember some fails in this area — but after digging and 
re-testing, it did not show up, for whatever reason. As it only 
triggers with -mstack-size, it somehow must have fallen through the 
cracks. :-/

Re: OpenMP: Fix omp_get_device_from_uid, minor cleanup

2024-09-23 Thread Tobias Burnus

Now committed as r15-3799-gcdb9aa0f623ec7 / 
https://gcc.gnu.org/r15-3799-gcdb9aa0f623ec7


Tobias

On 21.09.24 at 01:33, Tobias Burnus wrote:

Hi Thomas, hello all,

the attached follow-up patch does:

* It fixes an issue (thinko) related to Fortran and \0-terminated strings:
  the previous approach fails at least for substrings.

* Includes some minor fixes, e.g. ensuring the device is initialized
  in omp_get_uid_from_device, the superfluous 'omp_', or adding some
  inits to oacc-host.c.

* Now the plugins return NULL instead of failing when the UID cannot
  be obtained; in that case, the fallback UID "OMP_DEV_%d" is used.

Comments or remarks before I commit it?

* * *

Regarding the topic of caching in the plugin instead of in
libgomp: If we want to change it, we have to remove the fallback
and require the existence and success of GOMP_OFFLOAD_get_uid.
Otherwise, with host fallback support, we have to cache it at both
locations, which is not really sensible, either.

Thoughts on this topic?

* * *

Longer reply to Thomas' comments:

Thomas Schwinge wrote:


+  "omp_get_uid_from_device",

..., but here without 'omp_' prefix: 'get_uid_from_device' (and properly
sorted).


Oops! It should of course be without (as the 'omp_' prefix is checked before).


Do we apparently not have test suite coverage for these things?


We do *not* test all API routines. The check is, e.g., used in

  gfc_error ("%s cannot contain OpenMP API call in intervening code "

or
  "OpenMP runtime API call %qD in a region with "
  "%<order(concurrent)%> clause", fndecl);

And we have a few tests for each of them, but not a full set of all 
API routines.


* * *


+  const char *uid;

Caching this here, instead of acquiring via 'GOMP_OFFLOAD_get_uid' for
each call, is a minor performance optimization?  (Similar to other items
cached here, I guess.)


Yes, but it goes a bit beyond: As the pointer is returned to the user, it
has to be allocated at some point - and cached to avoid allocating more
memory when called repeatedly. As the fallback and host handling is
also done in target.c, the caching is done here.

(Besides the API routines, two env vars and one context selector for
'target_device' support the UID.)
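
As a rough standalone sketch of that caching and fallback scheme (an illustration, not libgomp's code): the struct, the fake plugin hook and the "GPU-%04d" format are hypothetical; only the "OMP_DEV_%d" fallback form is taken from the description in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-device record; only the cached UID matters here.  */
struct device
{
  int num;
  const char *uid;   /* NULL until first queried */
};

/* Hypothetical plugin hook: returns a malloc'ed UID, or NULL when the
   plugin cannot provide one.  Here only even-numbered devices have one.  */
static char *
plugin_get_uid (int num)
{
  if (num % 2 != 0)
    return NULL;
  char *s = malloc (32);
  assert (s != NULL);
  snprintf (s, 32, "GPU-%04d", num);
  return s;
}

/* Return the UID for DEV.  The string is allocated at most once and then
   cached, so repeated calls hand out the same pointer.  */
static const char *
get_uid (struct device *dev)
{
  if (dev->uid)
    return dev->uid;
  char *s = plugin_get_uid (dev->num);
  if (!s)
    {
      /* Plugin gave up: use the fallback form.  */
      s = malloc (32);
      assert (s != NULL);
      snprintf (s, 32, "OMP_DEV_%d", dev->num);
    }
  dev->uid = s;
  return dev->uid;
}
```

Because the pointer escapes to the caller, doing the allocation once per device and keeping it in the device record avoids growing memory on every query, whichever side ends up producing the string.
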

* * *


Please also update 'libgomp/oacc-host.c:host_dispatch'.

Done.

+  ! Note: In gfortran, strings are \0 termined
+  integer(c_int) function omp_get_device_from_uid(uid) bind(C)

For my understanding: in general, Fortran strings are *not*
NUL-terminated, right?  So this is a specific property of 'gfortran'
and/or this GCC/OpenMP interface,


The Fortran standard leaves this to the implementation, but by construction,
there is a length (however it is handled internally, e.g. via the
declaration) and the actual data. - To aid debugging, gfortran NUL-terminates
them.

However, when thinking a bit more about it, taking a substring of a
null-terminated string will not magically be \0 at the boundary of the
substring. - Thus, the simplified approach failed + a Fortran specific
function had to be added (→ fortran.c).
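
A minimal standalone sketch of the substring problem, for illustration only. Both functions are hypothetical (they are not the routines added to fortran.c); the sketch assumes a C routine that parses the fallback form "OMP_DEV_<n>" and a wrapper that receives data plus an explicit length, as for a Fortran character argument.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C-side lookup that expects a NUL-terminated UID and
   returns the device number for the form "OMP_DEV_<n>", or -1 if the
   string has a different shape.  */
static int
device_from_uid (const char *uid)
{
  static const char prefix[] = "OMP_DEV_";
  size_t plen = sizeof prefix - 1;
  if (strncmp (uid, prefix, plen) != 0 || uid[plen] == '\0')
    return -1;
  char *end;
  long n = strtol (uid + plen, &end, 10);
  return *end == '\0' ? (int) n : -1;
}

/* Hypothetical Fortran-facing wrapper: copy exactly LEN characters and
   terminate the copy, instead of trusting a NUL behind the data.  */
static int
device_from_uid_len (const char *uid, size_t len)
{
  char *copy = malloc (len + 1);
  assert (copy != NULL);
  memcpy (copy, uid, len);
  copy[len] = '\0';
  int dev = device_from_uid (copy);
  free (copy);
  return dev;
}
```

Passing the first nine characters of "OMP_DEV_12" with an explicit length yields device 1, whereas handing the same pointer to the NUL-based routine would read on to the terminator and yield 12.
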

* * *


+    interface omp_get_uid_from_device
+  ! Deviation from OpenMP 6.0: VALUE added.

(..., which I suppose you've reported to OpenMP...)


No - it is not really a bug in the standard. The OpenMP
specification tries to provide a consistent API - but it
is difficult to create an API without touching the ABI.

For the caller side, the usage is the same independent
whether there is an 'intent(in)' or VALUE attribute,
a Bind(C) with or without binding name. Or also a generic
interface with multiple specific ones - which we do to handle
-fdefault-integer-8.

Obviously, the compiler needs to know those details, but users
do not, unless they code the interface themselves instead of
using omp.h / omp_lib.h / the omp_lib module.

Thus, that's one of the few deviations from the OpenMP
specification which does affect the ABI but not the API.

* * *


+GOMP_OFFLOAD_get_uid (int ord)
+{

I guess I'd have just put this code into 'init_hsa_context', filling a
new statically-sized 'uuid' field in 'hsa_context_info' (like
'driver_version_s'; and assuming that 'hsa_context_info' is the right
abstraction for this), and then just return that 'uuid' from
'GOMP_OFFLOAD_get_uid'.


That would be one option. Still, we have to decide whether we either
want to have strictly everything handled in the device code - including
fallback handling (which could be an UID replacement or a fatal error).

If we do part of the handling elsewhere, e.g. by permitting that the
plugin can fail or does not provide the functions, we can handle it
in target.c (as currently done) - but then we need to cache it there
as well (or at least the fallbacks).

* * *


That way, you'd avoid the unclear semantics of
who gets to 'free' the buffer returned from 'GOMP_OFFLOAD_get_uid' upon
'GOMP_OFFLOAD_fini_device' -- currently the memory is lost?


Well, depends what you mean by lost. The 'devices' data structure in 
target.c is allocated early during device init

Re: [committed] arc: Remove mlra option [PR113954]

2024-09-23 Thread Andreas Schwab

On Sep 23 2024, Claudiu Zissulescu wrote:

> diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
> index c800226b179..a225adeff57 100644
> --- a/gcc/config/arc/arc.cc
> +++ b/gcc/config/arc/arc.cc
> @@ -721,7 +721,7 @@ static rtx arc_legitimize_address_0 (rtx, rtx, 
> machine_mode mode);
>arc_no_speculation_in_delay_slots_p
>  
>  #undef TARGET_LRA_P
> -#define TARGET_LRA_P arc_lra_p
> +#define TARGET_LRA_P hook_bool_void_true

This is the default for lra_p, so you can remove the override.
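
To illustrate the point with a toy model (GCC's real target-hook machinery is more elaborate; the struct and the use_lra helper are hypothetical, and hook_bool_void_true is defined locally here just to mirror the name in the patch): when a hook macro falls back to a default, overriding it with that same default changes nothing.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the generic "always true" hook.  */
static bool
hook_bool_void_true (void)
{
  return true;
}

/* A target file could define TARGET_LRA_P before this point.  If it
   does not, the default below is used, so defining it to the same
   function is redundant.  */
#ifndef TARGET_LRA_P
#define TARGET_LRA_P hook_bool_void_true
#endif

/* Hypothetical one-entry hook table.  */
struct target_hooks
{
  bool (*lra_p) (void);
};

static const struct target_hooks targetm = { TARGET_LRA_P };

/* Query the hook the way a pass would.  */
static bool
use_lra (void)
{
  return targetm.lra_p ();
}
```

In this model, removing an explicit "#define TARGET_LRA_P hook_bool_void_true" from the target file leaves targetm.lra_p pointing at the same function.
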

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Re: [Patch, fortran] PR116733: Generic processing of assumed rank objects (f202y)

2024-09-23 Thread Thomas Koenig


On 23.09.24 at 11:02, Paul Richard Thomas wrote:

Hi All,

The moment I saw the DIN4 proposal for "Generic processing of assumed 
rank objects", I thought that this was a highly intuitive and 
implementable proposal. I implemented a test version in June and had 
some correspondence with Reinhold Bader about it shortly before he 
passed away.


Malcolm Cohen wrote J3/24-136r1 in response to this and I have posted a 
comment in PR116733 addressing the extent to which the attached 
patch addresses his remarks.


I think your approaches are sound.

Before this patch goes through the approval process, we have to consider 
how experimental F202y features can be carried forward. I was badly 
bitten by failing to synchronise the array descriptor reform branch to 
the extent that I gave up on it and adopted the simplified reform that 
is now in place. Given the likely timescale before the full adoption of 
the F202y standard, this is a considerable risk for 
experimental features, given the variability of active maintainers:




That is correct. We (well, Nicolas) also saw the bit rot on the
native coarray branch.


What I propose is the following:
(i) For audit purposes, I have opened PR116732, which should be blocked 
by PRs for each experimental F202y feature;


I have added PR 116025 to this.


(ii) These PRs should represent a complete audit trail for each feature; and
(iii) All such experimental features should be enabled on mainline by 
--std=f202y, which is equivalent to -std=f2023+f202y.


As far as the -funsigned patch goes: I would like to keep the option
itself, but also enable it with -std=f202y.

The attached patch enables pointer assignment and associate, both with 
rank remapping, plus the reshape intrinsic, which was not part of the 
DIN4 proposal.


The ChangeLog entries do a pretty complete job of describing the patch.

Regtests correctly. OK for mainline?


As a general remark: What we currently have are extensions, and we
should try to describe them so a user who is not familiar with the
J3 documents or our PRs and commit messages should be able to use
the features.  I am certainly not there yet with the work on UNSIGNED,
but we should work on that so the documentation is fairly complete
when gcc15 is released.

As for the patch itself: It looks good to me in principle. It still
needs some test cases (or did git omit these from your patch?
I've been bitten by that :-).  One typo:

+  gfc_error ("The data-target at %L ia an assumed rank object and 
so the "


s/ia/is/

So, OK in principle with reasonable test coverage, but if you could hold
for a few days before committing so others can also comment, that would
be good.

Best regards

Thomas

[committed] arc: Remove mlra option [PR113954]

2024-09-23 Thread Claudiu Zissulescu

The target dependent mlra option was designed to be able to quickly
switch between LRA and reload.  The reload register allocator step is
scheduled for retirement, thus, remove the functionality of mlra,
keeping it for backward compatibility.

PR target/113954

gcc/ChangeLog:

* config/arc/arc.cc (TARGET_LRA_P): Always return true.
(arc_lra_p): Remove.
* config/arc/arc.h (TARGET_LRA): Remove.
* config/arc/arc.opt (mlra): Change it to do nothing.
* doc/invoke.texi (mlra): Update option description.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/config/arc/arc.cc  | 10 +-
 gcc/config/arc/arc.h   |  4 
 gcc/config/arc/arc.opt |  4 ++--
 gcc/doc/invoke.texi|  4 +---
 4 files changed, 4 insertions(+), 18 deletions(-)

diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
index c800226b179..a225adeff57 100644
--- a/gcc/config/arc/arc.cc
+++ b/gcc/config/arc/arc.cc
@@ -721,7 +721,7 @@ static rtx arc_legitimize_address_0 (rtx, rtx, machine_mode 
mode);
   arc_no_speculation_in_delay_slots_p
 
 #undef TARGET_LRA_P
-#define TARGET_LRA_P arc_lra_p
+#define TARGET_LRA_P hook_bool_void_true
 #define TARGET_REGISTER_PRIORITY arc_register_priority
 /* Stores with scaled offsets have different displacement ranges.  */
 #define TARGET_DIFFERENT_ADDR_DISPLACEMENT_P hook_bool_void_true
@@ -10156,14 +10156,6 @@ arc_eh_uses (int regno)
   return false;
 }
 
-/* Return true if we use LRA instead of reload pass.  */
-
-bool
-arc_lra_p (void)
-{
-  return arc_lra_flag;
-}
-
 /* ??? Should we define TARGET_REGISTER_PRIORITY?  We might perfer to
use q registers, because some insn are shorter with them.  OTOH we
already have separate alternatives for this purpose, and other
diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index 0a1ecb71d89..4cadef7a2b2 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -1660,8 +1660,4 @@ enum
 /* The default option for BI/BIH instructions.  */
 #define DEFAULT_BRANCH_INDEX 0
 
-#ifndef TARGET_LRA
-#define TARGET_LRA arc_lra_p()
-#endif
-
 #endif /* GCC_ARC_H */
diff --git a/gcc/config/arc/arc.opt b/gcc/config/arc/arc.opt
index 5abb2977626..7b9318335be 100644
--- a/gcc/config/arc/arc.opt
+++ b/gcc/config/arc/arc.opt
@@ -401,8 +401,8 @@ Pass -marclinux_prof option through to linker.
 
 ;; lra is still unproven for ARC, so allow to fall back to reload with 
-mno-lra.
 mlra
-Target Var(arc_lra_flag) Init(1) Save
-Use LRA instead of reload.
+Target Ignore
+Does nothing.  Preserved for backward compatibility.
 
 mlra-priority-none
 Target RejectNegative Var(arc_lra_priority_tag, ARC_LRA_PRIORITY_NONE)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 032adfff5fc..7e4f0ca7a62 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -22716,9 +22716,7 @@ the case.
 
 @opindex mlra
 @item -mlra
-Enable Local Register Allocation.  This is still experimental for ARC,
-so by default the compiler uses standard reload
-(i.e.@: @option{-mno-lra}).
+Does nothing.  Preserved for backward compatibility.
 
 @opindex mlra-priority-none
 @item -mlra-priority-none
-- 
2.30.2

[PING] [PATCH v2] gimple ssa: Don't use __builtin_popcount in switch exp transform

2024-09-23 Thread Filip Kastl

Hi,

I'd like to ping my patch.  You can find it here

https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662744.html

Btw I forgot to include [PR116616] in the subject.  Hope I didn't confuse
people.  I will take care to include the tag in the git commit message.

Thanks,
Filip Kastl

RE: [nvptx] Fix code-gen for alias attribute

2024-09-23 Thread Thomas Schwinge

Hi Prathamesh!

On 2024-09-23T08:24:36+, Prathamesh Kulkarni  wrote:
> Thanks for the review and sorry for late reply.

No worries.  My replies often are way more delayed...  ;'-|

> The attached patch addresses the above suggestions.
> Does it look OK ?

ACK, thanks!

> (Also, could you please test it at your end as well?)

As expected:

 PASS: gcc.target/nvptx/alias-to-alias-1.c (test for excess errors)
+PASS: gcc.target/nvptx/alias-to-alias-1.c execution test
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)\\tcall bar;$ 0
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)\\tcall baz;$ 1
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)\\tcall foo;$ 0
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^// BEGIN GLOBAL FUNCTION DECL: bar$ 1
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^// BEGIN GLOBAL FUNCTION DECL: baz$ 1
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^// BEGIN GLOBAL FUNCTION DECL: foo$ 1
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^// BEGIN GLOBAL FUNCTION DEF: bar$ 1
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^// BEGIN GLOBAL FUNCTION DEF: baz$ 1
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^// BEGIN GLOBAL FUNCTION DEF: foo$ 1
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^\\.alias bar,foo;$ 1
-PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^\\.alias baz,bar;$ 1
+PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^\\.alias baz,foo;$ 1
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^\\.visible \\.func bar;$ 1
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^\\.visible \\.func baz;$ 1
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^\\.visible \\.func foo$ 1
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^\\.visible \\.func foo;$ 1


Grüße
 Thomas
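
For readers following the thread: the alias-to-alias resolution that the quoted commit message describes can be modelled outside GCC.  The following is only a toy sketch; struct sym and ultimate_target are hypothetical names chosen for illustration and are not GCC's cgraph_node API.  It assumes the alias chain is acyclic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a symbol-table entry; not GCC's cgraph_node.  */
struct sym
{
  const char *name;
  const struct sym *alias_of;	/* NULL for a defined function.  */
};

/* Follow the alias chain until a defined (non-alias) symbol is reached.
   Assumes the chain has no cycles.  */
static const struct sym *
ultimate_target (const struct sym *s)
{
  assert (s != NULL);
  while (s->alias_of != NULL)
    s = s->alias_of;
  return s;
}

/* Mirrors the test case: baz aliases bar, which aliases foo.  */
static const struct sym foo = { "foo", NULL };
static const struct sym bar = { "bar", &foo };
static const struct sym baz = { "baz", &bar };
```

With this model, resolving baz yields foo directly, which is the single-level form that the PTX .alias directive is said to require.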


> nvptx: Partial support for aliases to aliases.
>
> For the following test (adapted from pr96390.c):
>
> __attribute__((noipa)) int foo () { return 42; }
> int bar () __attribute__((alias ("foo")));
> int baz () __attribute__((alias ("bar")));
>
> int main ()
> {
>   int n;
>   #pragma omp target map(from:n)
> n = baz ();
>   return n;
> }
>
> gcc emits following ptx for baz:
> .visible .func (.param.u32 %value_out) bar;
> .alias bar,foo;
> .visible .func (.param.u32 %value_out) baz;
> .alias baz,bar;
>
> which is incorrect since PTX requires aliasee to be a defined function.
> The patch instead uses cgraph_node::get(name)->ultimate_alias_target,
> which generates the following PTX:
>
> .visible .func (.param.u32 %value_out) baz;
> .alias baz,foo;
>
> gcc/ChangeLog:
>   PR target/104957
> * config/nvptx/nvptx.cc (nvptx_asm_output_def_from_decls): Use
> cgraph_node::get(name)->ultimate_alias_target instead of value.
>
> gcc/testsuite/ChangeLog:
>   PR target/104957
>   * gcc.target/nvptx/alias-to-alias-1.c: Adjust.
>
> Signed-off-by: Prathamesh Kulkarni 
> Co-authored-by: Thomas Schwinge 
>
> diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
> index 4a7c64f05eb..96a1134220e 100644
> --- a/gcc/config/nvptx/nvptx.cc
> +++ b/gcc/config/nvptx/nvptx.cc
> @@ -7582,7 +7582,8 @@ nvptx_mem_local_p (rtx mem)
>while (0)
>  
>  void
> -nvptx_asm_output_def_from_decls (FILE *stream, tree name, tree value)
> +nvptx_asm_output_def_from_decls (FILE *stream, tree name,
> +  tree value ATTRIBUTE_UNUSED)
>  {
>if (nvptx_alias == 0 || !TARGET_PTX_6_3)
>  {
> @@ -7617,7 +7618,8 @@ nvptx_asm_output_def_from_decls (FILE *stream, tree 
> name, tree value)
>return;
>  }
>  
> -  if (!cgraph_node::get (name)->referred_to_p ())
> +  cgraph_node *cnode = cgraph_node::get (name);
> +  if (!cnode->referred_to_p ())
>  /* Prevent "Internal error: reference to deleted section".  */
>  return;
>  
> @@ -7626,11 +7628,27 @@ nvptx_asm_output_def_from_decls (FILE *stream, tree 
> name, tree value)
>fputs (s.str ().c_str (), stream);
>  
>tree id = DECL_ASSEMBLER_NAME (name);
> +
> +  /* Walk alias chain to get reference callgraph node.
> + The rationale of using ultimate_alias_target here is that
> + PTX's .alias directive only supports 1-level aliasing where
> + aliasee is function defined in same module.
> +
> + So for the following case:
> + int foo() { return 42; }
> + int bar () __attribute__((alias ("foo")));
> + int baz () __attribute__((alias ("bar")));
> +
> + should resolve baz to foo:
> + .visible .func (.param.u32 %value_out) baz;
> + .alias baz,foo;  */
> +  symtab_node *alias_target_node = cnode->ultimate_alias_target ();
> +  tree alias_target_id = DECL_ASSEMBLER_NAME (alias_target_node->decl)

[PATCH]middle-end: check explicitly for external or constants when checking for loop invariant [PR116817]

2024-09-23 Thread Tamar Christina

Hi All,

The previous check for whether a value was external tested
!vect_get_internal_def (vinfo, var), but this of course isn't completely right,
as the value could also be a reduction etc.

This changes the check to just explicitly look at externals and constants.
Note that reductions remain unhandled here, but we don't support codegen of
boolean reductions today anyway.

So once we do support them, this will have to be handled in lowering as well.
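
To illustrate the difference between the two checks, here is a small sketch.  The enum and both predicates are hypothetical stand-ins made up for this note (they are not the vectorizer's own types); the point is only that "not internal" also accepts reductions and inductions, while the explicit test does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kinds of definitions a value can have.  */
enum def_kind
{
  DEF_INTERNAL,
  DEF_EXTERNAL,
  DEF_CONSTANT,
  DEF_REDUCTION,
  DEF_INDUCTION
};

/* Old-style test: anything that is not an internal def is accepted.  */
static bool
old_check (enum def_kind k)
{
  return k != DEF_INTERNAL;
}

/* New-style test: only explicit externals and constants are accepted.  */
static bool
new_check (enum def_kind k)
{
  return k == DEF_EXTERNAL || k == DEF_CONSTANT;
}
```

The two predicates agree on internal, external and constant defs and differ exactly on the remaining kinds.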

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/116817
* tree-vect-patterns.cc (vect_recog_bool_pattern): Check for const or
externals.

gcc/testsuite/ChangeLog:

PR tree-optimization/116817
* g++.dg/vect/pr116817.cc: New test.

---
diff --git a/gcc/testsuite/g++.dg/vect/pr116817.cc b/gcc/testsuite/g++.dg/vect/pr116817.cc
new file mode 100644
index ..7e28982fb138c24f956aedb03fa454d9d858
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr116817.cc
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+
+int main_ulData0;
+unsigned *main_pSrcBuffer;
+int main(void) {
+  int iSrc = 0;
+  bool bData0;
+  for (; iSrc < 4; iSrc++) {
+if (bData0)
+  main_pSrcBuffer[iSrc] = main_ulData0;
+else
+  main_pSrcBuffer[iSrc] = 0;
+bData0 = !bData0;
+  }
+}
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index e7e877dd2adb55262822f1660f8d92b42d44e6d0..b913d6de003e8fd95e16c7892d52f23bde823647 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -6062,12 +6062,15 @@ vect_recog_bool_pattern (vec_info *vinfo,
   if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
return NULL;
 
+  stmt_vec_info var_def_info = vinfo->lookup_def (var);
   if (check_bool_pattern (var, vinfo, bool_stmts))
var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
   else if (integer_type_for_mask (var, vinfo))
return NULL;
   else if (TREE_CODE (TREE_TYPE (var)) == BOOLEAN_TYPE
-  && !vect_get_internal_def (vinfo, var))
+  && var_def_info
+  && (STMT_VINFO_DEF_TYPE (var_def_info) == vect_external_def
+  || STMT_VINFO_DEF_TYPE (var_def_info) == vect_constant_def))
{
  /* If the condition is already a boolean then manually convert it to a
 mask of the given integer type but don't set a vectype.  */





[PATCH v1] Widening-Mul: Fix one ICE for SAT_SUB matching operand promotion

2024-09-23 Thread pan2 . li

From: Pan Li 

This patch fixes the following ICE for -O2 -m32 on x86_64.

during RTL pass: expand
JackMidiAsyncWaitQueue.cpp.cpp: In function 'void DequeueEvent(unsigned
int)':
JackMidiAsyncWaitQueue.cpp.cpp:3:6: internal compiler error: in
expand_fn_using_insn, at internal-fn.cc:263
3 | void DequeueEvent(unsigned frame) {
  |  ^~~~
0x27b580d diagnostic_context::diagnostic_impl(rich_location*,
diagnostic_metadata const*, diagnostic_option_id, char const*,
__va_list_tag (*) [1], diagnostic_t)
???:0
0x27c4a3f internal_error(char const*, ...)
???:0
0x27b3994 fancy_abort(char const*, int, char const*)
???:0
0xf25ae5 expand_fn_using_insn(gcall*, insn_code, unsigned int, unsigned int)
???:0
0xf2a124 expand_direct_optab_fn(internal_fn, gcall*, optab_tag, unsigned int)
???:0
0xf2c87c expand_SAT_SUB(internal_fn, gcall*)
???:0

We allow operand conversion when matching SAT_SUB in match.pd, to support
the zip benchmark SAT_SUB pattern.  That is,

(convert? (minus (convert1? @0) (convert1? @1))) for the sample code below.

void test (uint16_t *x, unsigned b, unsigned n)
{
  unsigned a = 0;
  register uint16_t *p = x;

  do {
a = *--p;
*p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
  } while (--n);
}

The SAT_SUB pattern match may also apply to the scalar sample code
below.

unsigned long long GetTimeFromFrames(int);
unsigned long long GetMicroSeconds();

void DequeueEvent(unsigned frame) {
  long long frame_time = GetTimeFromFrames(frame);
  unsigned long long current_time = GetMicroSeconds();
  DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time);
}

Aka:

uint32_t a = (uint32_t)SAT_SUB(uint64_t, uint64_t);

This causes a problem when compiling for ia32 or with -m32, because we
only check that the lhs type (uint32_t) is supported by the ifn and miss
the operand type (uint64_t).  DImode is mostly disabled for 32-bit
targets like ia32 or rv32gcv, which then triggers an ICE when expanding.
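
As a rough illustration of the shape being matched, the scalar semantics can be written as below.  The helper names are made up for this note; the sketch only shows that the saturating subtraction itself runs on 64-bit operands and that the narrowing to 32 bits happens afterwards, which is why the operand type matters as well as the lhs type.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned saturating subtraction: clamps at 0 instead of wrapping.  */
static uint64_t
sat_sub_u64 (uint64_t a, uint64_t b)
{
  return a >= b ? a - b : 0;
}

/* The 64-bit operation comes first; the result is truncated to 32 bits
   afterwards, as in uint32_t a = (uint32_t) SAT_SUB (uint64_t, uint64_t).  */
static uint32_t
sat_sub_then_truncate (uint64_t a, uint64_t b)
{
  return (uint32_t) sat_sub_u64 (a, b);
}
```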

The following test suites pass with this patch.
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.

PR target/116814

gcc/ChangeLog:

* tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Add
ifn is_supported check for operand TREE type.

gcc/testsuite/ChangeLog:

* g++.dg/torture/pr116814-1.C: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/g++.dg/torture/pr116814-1.C | 12 
 gcc/tree-ssa-math-opts.cc | 23 +++
 2 files changed, 27 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr116814-1.C

diff --git a/gcc/testsuite/g++.dg/torture/pr116814-1.C b/gcc/testsuite/g++.dg/torture/pr116814-1.C
new file mode 100644
index 000..8db5b020cfd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr116814-1.C
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-O2 -m32" } */
+
+unsigned long long GetTimeFromFrames(int);
+unsigned long long GetMicroSeconds();
+
+void DequeueEvent(unsigned frame) {
+  long long frame_time = GetTimeFromFrames(frame);
+  unsigned long long current_time = GetMicroSeconds();
+
+  DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time);
+}
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index d61668aacfc..361761cedef 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -4042,15 +4042,22 @@ build_saturation_binary_arith_call (gimple_stmt_iterator *gsi, gphi *phi,
internal_fn fn, tree lhs, tree op_0,
tree op_1)
 {
-  if (direct_internal_fn_supported_p (fn, TREE_TYPE (lhs), OPTIMIZE_FOR_BOTH))
-{
-  gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1);
-  gimple_call_set_lhs (call, lhs);
-  gsi_insert_before (gsi, call, GSI_SAME_STMT);
+  tree lhs_type = TREE_TYPE (lhs);
+  tree op_type = TREE_TYPE (op_0);
 
-  gimple_stmt_iterator psi = gsi_for_stmt (phi);
-  remove_phi_node (&psi, /* release_lhs_p */ false);
-}
+  if (!direct_internal_fn_supported_p (fn, lhs_type, OPTIMIZE_FOR_BOTH))
+return;
+
+  if (lhs_type != op_type
+  && !direct_internal_fn_supported_p (fn, op_type, OPTIMIZE_FOR_BOTH))
+return;
+
+  gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1);
+  gimple_call_set_lhs (call, lhs);
+  gsi_insert_before (gsi, call, GSI_SAME_STMT);
+
+  gimple_stmt_iterator psi = gsi_for_stmt (phi);
+  remove_phi_node (&psi, /* release_lhs_p */ false);
 }
 
 /*
-- 
2.43.0

[pushed] libstdc++: guard c++config pragmas

2024-09-23 Thread Jason Merrill

Tested x86_64-pc-linux-gnu, applying to trunk as obvious.

-- 8< --

c++config needs to be compilable as C, in which mode we complain about the
-Wc++ pragmas.
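
A minimal sketch of the guard idea, for illustration only: the C++-specific diagnostic pragma is wrapped in a preprocessor conditional so a C compilation never sees it.  The function compiled_as_cxx is a hypothetical helper added here just to make the effect observable.

```c
#include <assert.h>

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wvariadic-macros"
#if __cplusplus
/* Only the C++ front end is asked to handle this option.  */
#pragma GCC diagnostic ignored "-Wc++11-extensions"
#endif

/* Reports which mode this translation unit was compiled in.  */
static int
compiled_as_cxx (void)
{
#ifdef __cplusplus
  return 1;
#else
  return 0;
#endif
}

#pragma GCC diagnostic pop
```

In a C translation unit __cplusplus is not defined, so the guarded pragma is skipped entirely.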

libstdc++-v3/ChangeLog:

* include/bits/c++config: Don't try to disable -Wc++??-extensions
when compiling as C.
---
 libstdc++-v3/include/bits/c++config | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 66d03cfd037..16c67b80e76 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -34,8 +34,10 @@
 
 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored "-Wvariadic-macros"
+#if __cplusplus
 #pragma GCC diagnostic ignored "-Wc++11-extensions"
 #pragma GCC diagnostic ignored "-Wc++23-extensions" // bf16
+#endif
 
 // The major release number for the GCC release the C++ library belongs to.
 #define _GLIBCXX_RELEASE

base-commit: 2620e3727d9559ec03f9f967ecb68ed2e076a342
-- 
2.46.0

Re: [PATCH v2] AArch64: Fix copysign patterns

2024-09-23 Thread Saurabh Jha


On 9/20/2024 10:51 AM, Christophe Lyon wrote:

Hi Saurabh,

On 9/18/24 21:53, Saurabh Jha wrote:

Hi Wilco,

Thanks for the patch. This mostly looks good. Just added a couple of
clarifications.


On 9/18/2024 8:17 PM, Wilco Dijkstra wrote:

v2: Add more testcase fixes.

The current copysign pattern has a mismatch in the predicates and constraints -
operand[2] is a register_operand but also has an alternative X which allows any
operand.  Since it is a floating point operation, having an integer alternative
makes no sense.  Change the expander to always use the vector variant of copysign
which results in better code.  Add an SVE bitmask move immediate alternative to
the aarch64_simd_mov patterns so we emit a single move when SVE is available.


Passes bootstrap and regress, OK for commit?

gcc:
 * config/aarch64/aarch64.md (copysign3): Defer to 
AdvSIMD copysign.


Should the things after "(copysign..") be on a newline? I have mostly
seen gcc ChangeLogs have file name and individual elements separated
by newlines.


AFAIK, ChangeLog entries follow the 80-column limit and are indented by
a tab.  In emacs + change-log-mode, alt-Q formats the paragraph adequately.


Makes sense, thanks Christophe.


Thanks,

Christophe





 (copysign3_insn): Remove pattern.
 * config/aarch64/aarch64-simd.md 
(aarch64_simd_mov): Add SVE movimm

 alternative.


Similar comment about file name and the instruction pattern being on
separate lines.


 (aarch64_simd_mov): Likewise.  Remove redundant 
V2DI check.

 (copysign3): Make global.
 (ior3): Move Neon immediate alternative 
before the SVE one.


testsuite:
 * gcc.target/aarch64/copysign_3.c: New test.
 * gcc.target/aarch64/copysign_4.c: New test.
 * gcc.target/aarch64/fneg-abs_2.c: Allow .2s and .4s.
 * gcc.target/aarch64/sve/fneg-abs_1.c: Fixup test.
 * gcc.target/aarch64/sve/fneg-abs_2.c: Likewise.

---

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/ 
aarch64-simd.md
index 
e70d59380ed295577721f15277c28829d42a0189..3077e920ce623c92d21193124747ff7ad010d006 100644

--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -161,6 +161,7 @@ (define_insn_and_split 
"*aarch64_simd_mov"

   [?w, r ; f_mcr  , *    , *] fmov\t%d0, %1
   [?r, r ; mov_reg    , *    , *] mov\t%0, %1
   [w , Dn; neon_move   , simd , *] << 
aarch64_output_simd_mov_immediate (operands[1], 64);

+ [w , vsl; * , sve  , *] mov\t%Z0., %1
   [w , Dz; f_mcr  , *    , *] fmov\t%d0, xzr
   [w , Dx; neon_move  , simd , 8] #
    }
@@ -190,6 +191,7 @@ (define_insn_and_split 
"*aarch64_simd_mov"

   [?w , r ; multiple   , *   , 8] #
   [?r , r ; multiple   , *   , 8] #
   [w  , Dn; neon_move   , simd, 4] << 
aarch64_output_simd_mov_immediate (operands[1], 128);

+ [w  , vsl; * , sve,  4] mov\t%Z0., %1
   [w  , Dz; fmov   , *   , 4] fmov\t%d0, xzr
   [w  , Dx; neon_move  , simd, 8] #
    }
@@ -208,7 +210,6 @@ (define_insn_and_split 
"*aarch64_simd_mov"

  else
    {
  if (FP_REGNUM_P (REGNO (operands[0]))
-    && mode == V2DImode
  && aarch64_maybe_generate_simd_constant (operands[0], 
operands[1],

   mode))
    ;
@@ -648,7 +649,7 @@ (define_insn 
"aarch64_dot_lane<

    [(set_attr "type" "neon_dot")]
  )
-(define_expand "copysign3"
+(define_expand "@copysign3"
    [(match_operand:VHSDF 0 "register_operand")
 (match_operand:VHSDF 1 "register_operand")
 (match_operand:VHSDF 2 "nonmemory_operand")]
@@ -1138,10 +1139,8 @@ (define_insn "ior3"
    "TARGET_SIMD"
    {@ [ cons: =0 , 1 , 2; attrs: arch ]
   [ w    , w , w  ; simd  ] orr\t%0., 
%1., %2.
- [ w    , 0 , vsl; sve   ] orr\t%Z0., 
%Z0., #%2

- [ w    , 0 , Do ; simd  ] \
-   << aarch64_output_simd_mov_immediate (operands[2], , \
- AARCH64_CHECK_ORR);
+ [ w    , 0 , Do ; simd  ] << 
aarch64_output_simd_mov_immediate (operands[2], , 
AARCH64_CHECK_ORR);
+ [ w    , 0 , vsl; sve   ] orr\t%Z0., 
%Z0., %2

    }
    [(set_attr "type" "neon_logic")]
  )
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/ 
aarch64.md
index 
c54b29cd64b9e0dc6c6d12735049386ccedc5408..e9b148e59abf81cee53cb0dd846af9a62bbad294 100644

--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7218,20 +7218,11 @@ (define_expand "lrint2"
  }
  )
-;; For copysign (x, y), we want to generate:
+;; For copysignf (x, y), we want to generate:
  ;;
-;;   LDR d2, #(1 << 63)
-;;   BSL v2.8b, [y], [x]
+;;    movi    v31.4s, 0x80, lsl 24
+;;    bit v0.16b, v1.16b, v31.16b
  ;;
-;; or another, equivalent, sequence using one of BSL/BIT/BIF.  Because
-;; we expect these operations to nearly a
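
For illustration, the movi/bit sequence shown in the comment above amounts to a bitwise select on the sign bit.  A scalar sketch in C follows; copysign_bits is a hypothetical name, and the sketch assumes float is a 32-bit IEEE-754 type.  It is not the code GCC generates.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Take the magnitude bits from X and the sign bit from Y.  */
static float
copysign_bits (float x, float y)
{
  /* Same constant as "movi ..., 0x80, lsl 24", i.e. 0x80000000.  */
  const uint32_t sign_mask = UINT32_C (0x80) << 24;
  uint32_t xb, yb, rb;
  float r;

  assert (sizeof (float) == sizeof (uint32_t));
  memcpy (&xb, &x, sizeof xb);
  memcpy (&yb, &y, sizeof yb);
  /* Bitwise select: keep X where the mask is clear, insert Y where set.  */
  rb = (xb & ~sign_mask) | (yb & sign_mask);
  memcpy (&r, &rb, sizeof r);
  return r;
}
```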

Re: [PATCH v4] c++: Don't crash when mangling member with anonymous union or template types [PR100632, PR109790]

2024-09-23 Thread Jason Merrill


On 9/23/24 12:58 PM, Simon Martin wrote:

Hi Jason,

On 20 Sep 2024, at 18:06, Jason Merrill wrote:


On 9/16/24 4:07 PM, Simon Martin wrote:

Hi Jason,

On 14 Sep 2024, at 18:44, Simon Martin wrote:


Hi Jason,

On 14 Sep 2024, at 18:11, Jason Merrill wrote:


On 9/13/24 11:06 AM, Simon Martin wrote:

Hi Jason,

On 12 Sep 2024, at 16:48, Jason Merrill wrote:


On 9/12/24 7:23 AM, Simon Martin wrote:

Hi,

While looking at more open PRs, I have discovered that the problem
reported in PR109790 is very similar to that in PR100632, so I’m
combining both in a single patch attached here. The fix is similar to
the one I initially submitted, only more general and I believe better.


We currently crash upon mangling members that have an anonymous union
or a template type.

The problem is that before calling write_unqualified_name,
write_member_name has an assert that assumes that it has an
IDENTIFIER_NODE in its hand. However it's incorrect: it has an
anonymous union in PR100632, and a template in PR109790.


The assert does not assume it has an IDENTIFIER_NODE; it assumes it
has a _DECL, and expects its DECL_NAME to be an IDENTIFIER_NODE.

!identifier_p will always be true for a _DECL, making the assert
useless.

Indeed, my bad. Thanks for catching and explaining this!


How about checking !DECL_NAME (member) instead of !identifier_p?

Unfortunately it does not fix PR100632, that actually involves
legitimate operators.

I checked why the assert was added in the first place (via r11-6301),
and the idea was to catch any case where we’d be missing the “on”
marker - PR100632 contains such cases.


I assume you mean 109790?

Yes :-/



So I took the approach to refactor write_member_name a bit to first
write the marker in all the cases required, and then actually write the
member name; and the assert is not needed anymore there.


Refactoring code in mangle.cc is tricky given the intent to retain
backward bug-compatibility.

Specifically, adding the "on" in ABI v11 is wrong since GCC 10 (ABI
v14-15) didn't emit it for the 109790 testcase; we can add it for v16,
since GCC 11 ICEd on the testcase.

I would prefer to fix the bug locally rather than refactor.

Understood, that makes sense.

I’ll work on a more local patch and resubmit (by the way you can also
ignore
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662496.html,
that is also “too wide”).

I’m attaching a revised version of the patch, that lets members
with an anonymous union type go through, and for operators, introduces
a new ABI version under which it adds the missing "on”.


We can add the missing "on" in v16, since previous releases of v16-19
ICEd on that case.

Duh you’re right; amended in the latest revision of the patch.



@@ -3255,7 +3255,15 @@ write_member_name (tree member)
  }
else if (DECL_P (member))
  {
-  gcc_assert (!DECL_OVERLOADED_OPERATOR_P (member));
+  if (ANON_AGGR_TYPE_P (TREE_TYPE (member)))
+   ;
+  else if (DECL_OVERLOADED_OPERATOR_P (member))
+   {
+ if (abi_check (20))
+   write_string ("on");
+   }
+  else
+   gcc_assert (identifier_p (DECL_NAME (member)));


This last else is redundant; checking DECL_OVERLOADED_OPERATOR_P
already asserts that it's an identifier.

Thanks for pointing this out, fixed in the attached revision,
successfully tested on x86_64-pc-linux-gnu. OK for trunk?


OK.

Jason

Re: [patch, fortran] Matmul and dot_product for unsigned

2024-09-23 Thread Thomas Koenig


Hello Andre and everybody else,

Any more comments on the matmul patch? The other ones depend on
it, so I would like to commit (unless there are further
questions, of course).

Best regards

Thomas

[PATCH v3 1/4] tree-optimization/116024 - simplify C1-X cmp C2 for UB-on-overflow types

2024-09-23 Thread Artemiy Volkov

Implement a match.pd pattern for C1 - X cmp C2, where C1 and C2 are
integer constants and X is of a UB-on-overflow type.  The pattern is
simplified to X rcmp C1 - C2 by moving X and C2 to the other side of the
comparison (with opposite signs).  If C1 - C2 happens to overflow,
replace the whole expression with either a constant 0 or a constant 1
node, depending on the comparison operator and the sign of the overflow.

This transformation occasionally saves load-immediate /
subtraction instructions, e.g. the following statement:

10 - (int) x <= 9;

now compiles to

sgt a0,a0,zero

instead of

li      a5,10
sub     a0,a5,a0
slti    a0,a0,10

on 32-bit RISC-V.
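
As a sanity check of the example above, the original and rewritten comparisons can be compared directly.  This is a sketch with made-up helper names; it restricts x to a range where 10 - x cannot overflow, which is the situation the UB-on-overflow reasoning relies on.

```c
#include <assert.h>
#include <stdint.h>

/* Original form: C1 - X <= C2 with C1 = 10 and C2 = 9.  */
static int
original_form (int32_t x)
{
  return 10 - x <= 9;
}

/* Rewritten form: X >= C1 - C2, i.e. x >= 1, which is the same as x > 0.  */
static int
rewritten_form (int32_t x)
{
  return x >= 10 - 9;
}

/* Count inputs in [lo, hi] on which the two forms disagree.  */
static int
count_mismatches (int32_t lo, int32_t hi)
{
  int bad = 0;

  /* Keep 10 - x well inside the int32_t range (no signed overflow).  */
  assert (lo >= -1000000 && hi <= 1000000 && lo <= hi);
  for (int32_t x = lo; x <= hi; x++)
    if (original_form (x) != rewritten_form (x))
      bad++;
  return bad;
}
```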

Additional examples can be found in the newly added test file. This
patch has been bootstrapped and regtested on aarch64, x86_64, and
i386, and additionally regtested on riscv32.  Existing tests were
adjusted where necessary.

gcc/ChangeLog:

PR tree-optimization/116024
* match.pd: New transformation around integer comparison.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr116024.c: New test.
* gcc.dg/pr67089-6.c: Adjust.

Signed-off-by: Artemiy Volkov 
---
 gcc/match.pd | 26 +
 gcc/testsuite/gcc.dg/pr67089-6.c |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr116024.c | 74 
 3 files changed, 102 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 940292d0d49..81be0a21462 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8925,6 +8925,32 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
}
(cmp @0 { res; })
 
+/* Invert sign of X in comparisons of the form C1 - X CMP C2.  */
+
+(for cmp (lt le gt ge eq ne)
+ rcmp (gt ge lt le eq ne)
+  (simplify
+   (cmp (minus INTEGER_CST@0 @1) INTEGER_CST@2)
+/* For UB-on-overflow types, simply switch sides for X and C2
+   to arrive at X RCMP C1 - C2, handling the case when the latter
+   expression overflows.  */
+   (if (!TREE_OVERFLOW (@0) && !TREE_OVERFLOW (@2)
+   && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@1)))
+ (with { tree res = int_const_binop (MINUS_EXPR, @0, @2); }
+  (if (TREE_OVERFLOW (res))
+   (switch
+(if (cmp == NE_EXPR)
+ { constant_boolean_node (true, type); })
+(if (cmp == EQ_EXPR)
+ { constant_boolean_node (false, type); })
+{
+  bool less = cmp == LE_EXPR || cmp == LT_EXPR;
+  bool ovf_high = wi::lt_p (wi::to_wide (@0), 0,
+TYPE_SIGN (TREE_TYPE (@0)));
+  constant_boolean_node (less == ovf_high, type);
+})
+  (rcmp @1 { res; }))
+
 /* Canonicalizations of BIT_FIELD_REFs.  */
 
 (simplify
diff --git a/gcc/testsuite/gcc.dg/pr67089-6.c b/gcc/testsuite/gcc.dg/pr67089-6.c
index b59d75b2318..80a33c3f3e2 100644
--- a/gcc/testsuite/gcc.dg/pr67089-6.c
+++ b/gcc/testsuite/gcc.dg/pr67089-6.c
@@ -57,5 +57,5 @@ T (25, unsigned short, 2U - x, if (r > 2U) foo (0))
 T (26, unsigned char, 2U - x, if (r <= 2U) foo (0))
 
 /* { dg-final { scan-tree-dump-times "ADD_OVERFLOW" 16 "widening_mul" { target { i?86-*-* x86_64-*-* } } } } */
-/* { dg-final { scan-tree-dump-times "SUB_OVERFLOW" 11 "widening_mul" { target { { i?86-*-* x86_64-*-* } && { ! ia32 } } } } } */
-/* { dg-final { scan-tree-dump-times "SUB_OVERFLOW" 9 "widening_mul" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+/* { dg-final { scan-tree-dump-times "SUB_OVERFLOW" 9 "widening_mul" { target { { i?86-*-* x86_64-*-* } && { ! ia32 } } } } } */
+/* { dg-final { scan-tree-dump-times "SUB_OVERFLOW" 7 "widening_mul" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr116024.c b/gcc/testsuite/gcc.dg/tree-ssa/pr116024.c
new file mode 100644
index 000..0dde9abbf89
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr116024.c
@@ -0,0 +1,74 @@
+/* PR tree-optimization/116024 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-forwprop1-details" } */
+
+#include 
+#include 
+
+uint32_t f(void);
+
+int32_t i1(void)
+{
+  int32_t l = 2;
+  l = 10 - (int32_t)f();
+  return l <= 9; // f() > 0
+}
+
+int32_t i1a(void)
+{
+  int32_t l = 2;
+  l = 20 - (int32_t)f();
+  return l <= INT32_MIN; // return 0
+}
+
+int32_t i1b(void)
+{
+  int32_t l = 2;
+  l = 30 - (int32_t)f();
+  return l <= INT32_MIN + 31; // f() == INT32_MAX
+}
+
+int32_t i1c(void)
+{
+  int32_t l = 2;
+  l = INT32_MAX - 40 - (int32_t)f();
+  return l <= -38; // f() > INT32_MAX - 3
+}
+
+int32_t i1d(void)
+{
+  int32_t l = 2;
+  l = INT32_MAX - 50 - (int32_t)f();
+  return l <= INT32_MAX - 1; // f() != -50
+}
+
+int32_t i1e(void)
+{
+  int32_t l = 2;
+  l = INT32_MAX - 60 - (int32_t)f();
+  return l != INT32_MAX - 90; // f() != 30
+}
+
+int32_t i1f(void)
+{
+  int32_t l = 2;
+  l = INT32_MIN + 70 - (int32_t)f();
+  return l <= INT32_MAX - 2; // return 0
+}
+
+int32_t i1g(void)
+{
+  int32_t l = 2;
+  l

[PATCH v3 3/4] tree-optimization/116024 - simplify C1-X cmp C2 for wrapping signed types

2024-09-23 Thread Artemiy Volkov

Implement a match.pd transformation inverting the sign of X in
C1 - X cmp C2, where C1 and C2 are integer constants and X is
of a wrapping signed type, by observing that:

(a) If cmp is == or !=, simply move X and C2 to opposite sides of
the comparison to arrive at X cmp C1 - C2.

(b) If cmp is <:
- C1 - X < C2 means that C1 - X spans the values of -INF,
  -INF + 1, ..., C2 - 1;
- Therefore, X is one of C1 - -INF, C1 - (-INF + 1), ...,
  C1 - C2 + 1;
- Subtracting (C1 + 1), X - (C1 + 1) is one of - (-INF) - 1,
  - (-INF) - 2, ..., -C2;
- Using the fact that - (-INF) - 1 is +INF, derive that
  X - (C1 + 1) spans the values +INF, +INF - 1, ..., -C2;
- Thus, the original expression can be simplified to
  X - (C1 + 1) > -C2 - 1.

(c) Similarly, C1 - X <= C2 is equivalent to X - (C1 + 1) >= -C2 - 1.

(d) The >= and > cases are negations of (b) and (c), respectively.

(e) In all cases, the expression -C2 - 1 can be shortened to
bit_not (C2).

This transformation occasionally saves load-immediate /
subtraction instructions, e.g. the following statement:

10 - (int)f() >= 20;

now compiles to

addi    a0,a0,-11
slti    a0,a0,-20

instead of

li      a5,10
sub     a0,a5,a0
slti    t0,a0,20
xori    a0,t0,1

on 32-bit RISC-V when compiled with -fwrapv.
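
The identities (b) through (e) can be checked with a small program that emulates wrapping signed arithmetic through unsigned operations.  This is only a sketch; wrap32, wsub, wadd and count_mismatches are hypothetical helpers written for this note, and the set of probe values for X is an arbitrary choice of edge cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reinterpret a 32-bit pattern as a two's-complement value without
   relying on implementation-defined conversions.  */
static int32_t
wrap32 (uint32_t u)
{
  if (u <= (uint32_t) INT32_MAX)
    return (int32_t) u;
  return -(int32_t) (UINT32_MAX - u) - 1;
}

/* Wrapping subtraction and addition, as with -fwrapv.  */
static int32_t
wsub (int32_t a, int32_t b)
{
  return wrap32 ((uint32_t) a - (uint32_t) b);
}

static int32_t
wadd (int32_t a, int32_t b)
{
  return wrap32 ((uint32_t) a + (uint32_t) b);
}

/* Count disagreements between C1 - X cmp C2 and
   X - (C1 + 1) rcmp bit_not (C2) over a few edge values of X.  */
static int
count_mismatches (int32_t c1, int32_t c2)
{
  static const int32_t xs[] = { INT32_MIN, INT32_MIN + 1, -21, -11, -1, 0,
				1, 10, 11, 31, INT32_MAX - 1, INT32_MAX };
  const int32_t not_c2 = -1 - c2;	/* Equals bit_not (c2); never overflows.  */
  int bad = 0;

  for (size_t i = 0; i < sizeof xs / sizeof xs[0]; i++)
    {
      int32_t lhs = wsub (c1, xs[i]);
      int32_t rhs = wsub (xs[i], wadd (c1, 1));

      bad += (lhs < c2) != (rhs > not_c2);
      bad += (lhs <= c2) != (rhs >= not_c2);
      bad += (lhs >= c2) != (rhs <= not_c2);
      bad += (lhs > c2) != (rhs < not_c2);
    }
  return bad;
}
```

The check works because X - (C1 + 1) is, modulo 2^32, the bitwise complement of C1 - X, and complementing reverses the signed order.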

Additional examples can be found in the newly added test file.  This
patch has been bootstrapped and regtested on aarch64, x86_64, and i386,
and additionally regtested on riscv32.

gcc/ChangeLog:

PR tree-optimization/116024
* match.pd: New transformation around integer comparison.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr116024-1-fwrapv.c: New test.

Signed-off-by: Artemiy Volkov 
---
 gcc/match.pd  | 20 -
 .../gcc.dg/tree-ssa/pr116024-1-fwrapv.c   | 73 +++
 2 files changed, 92 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-1-fwrapv.c

diff --git a/gcc/match.pd b/gcc/match.pd
index d0489789527..bf3b4a2e3fe 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8970,7 +8970,25 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (cmp (plus @1 (minus @2 @0)) @2))
(if (cmp == LT_EXPR || cmp == GE_EXPR)
 (cmp (plus @1 (minus @2
-  (plus @0 { build_one_cst (TREE_TYPE (@1)); }))) @2)))
+  (plus @0 { build_one_cst (TREE_TYPE (@1)); }))) @2)))
+/* For wrapping signed types (-fwrapv), transform like so (using < as example):
+C1 - X < C2
+  ==>  C1 - X = { -INF, -INF + 1, ..., C2 - 1 }
+  ==>  X = { C1 - (-INF), C1 - (-INF + 1), ..., C1 - C2 + 1 }
+  ==>  X - (C1 + 1) = { - (-INF) - 1, - (-INF) - 2, ..., -C2 }
+  ==>  X - (C1 + 1) = { +INF, +INF - 1, ..., -C2 }
+  ==>  X - (C1 + 1) > -C2 - 1
+  ==>  X - (C1 + 1) > bit_not (C2)
+
+  Similarly,
+C1 - X <= C2 ==> X - (C1 + 1) >= bit_not (C2);
+C1 - X >= C2 ==> X - (C1 + 1) <= bit_not (C2);
+C1 - X > C2 ==> X - (C1 + 1) < bit_not (C2).  */
+   (if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@1)))
+ (if (cmp == EQ_EXPR || cmp == NE_EXPR)
+   (cmp @1 (minus @0 @2))
+ (rcmp (minus @1 (plus @0 { build_one_cst (TREE_TYPE (@1)); }))
+(bit_not @2
 
 /* Canonicalizations of BIT_FIELD_REFs.  */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr116024-1-fwrapv.c b/gcc/testsuite/gcc.dg/tree-ssa/pr116024-1-fwrapv.c
new file mode 100644
index 000..c2bf1d17234
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr116024-1-fwrapv.c
@@ -0,0 +1,73 @@
+/* PR tree-optimization/116024 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-forwprop1-details -fwrapv" } */
+
+#include 
+
+uint32_t f(void);
+
+int32_t i2(void)
+{
+  int32_t l = 2;
+  l = 10 - (int32_t)f();
+  return l <= 20; // f() - 11 >= -21
+}
+
+int32_t i2a(void)
+{
+  int32_t l = 2;
+  l = 10 - (int32_t)f();
+  return l < 30; // f() - 11 > -31
+}
+
+int32_t i2b(void)
+{
+  int32_t l = 2;
+  l = 200 - (int32_t)f();
+  return l <= 100; // f() - 201 >= -101
+}
+
+int32_t i2c(void)
+{
+  int32_t l = 2;
+  l = 300 - (int32_t)f();
+  return l < 100; // f() - 301 > -101
+}
+
+int32_t i2d(void)
+{
+  int32_t l = 2;
+  l = 1000 - (int32_t)f();
+  return l >= 2000; // f() - 1001 <= -2001
+}
+
+int32_t i2e(void)
+{
+  int32_t l = 2;
+  l = 1000 - (int32_t)f();
+  return l > 3000; // f() - 1001 < -3001
+}
+
+int32_t i2f(void)
+{
+  int32_t l = 2;
+  l = 2 - (int32_t)f();
+  return l >= 1; // f() - 20001 <= -10001
+}
+
+int32_t i2g(void)
+{
+  int32_t l = 2;
+  l = 3 - (int32_t)f();
+  return l > 1; // f() - 30001 < -10001
+}
+
+/* { dg-final { scan-tree-dump-times "Removing dead stmt:.*?- _" 8 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ -11.*\n.*>= -21" 1 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ -11.*\n.*>= -30" 1 "forwprop1" } } */
+/* { dg-final {

[PATCH][v2] tree-optimization/115372 - failed store-lanes in some cases

2024-09-23 Thread Richard Biener

The gcc.target/riscv/rvv/autovec/struct/struct_vect-4.c testcase shows
that we sometimes fail to use store-lanes even though it should be
profitable.  We're currently relying on vect_slp_prefer_store_lanes_p
at the point we run into the first SLP discovery mismatch with obviously
limited information.  For the case at hand we have 3, 5 or 7 lanes
of VnDImode [2, 2] vectors with the first mismatch at lane 2 so the
new group size is 1.  The heuristic says that might be an OK split
given the rest is a multiple of the vector lanes.  Now we continue
discovery but in the end mismatches result in uniformly single-lane
SLP instances which we can handle via interleaving but of course are
prime candidates for store-lanes.  The following patch re-assesses
with the extra knowledge, now relying only on whether the target
supports store-lanes for the given group size.
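
The re-assessment can be pictured with a small standalone sketch.  This
is not the vectorizer code: common_lane_count and
want_store_lanes_after_discovery are hypothetical names, a plain
std::vector of lane counts stands in for the RHS SLP nodes, and the
boolean target_supports stands in for the target query.  The
gather/scatter, strided and step checks of the real condition are left
out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the loop over RHS sub-graphs: lanes[i] is
// the lane count of the i-th sub-graph.  Returns the lane count shared
// by all of them, or 0 if they differ or there are none.
unsigned
common_lane_count (const std::vector<unsigned> &lanes)
{
  unsigned common = 0;
  for (std::size_t i = 0; i < lanes.size (); ++i)
    {
      if (i == 0)
        common = lanes[i];
      else if (common != lanes[i])
        common = 0;
    }
  return common;
}

// Simplified decision: ask for store-lanes only when every RHS
// sub-graph ended up single-lane and the (assumed) target query
// succeeded.
bool
want_store_lanes_after_discovery (const std::vector<unsigned> &lanes,
                                  bool target_supports)
{
  return common_lane_count (lanes) == 1 && target_supports;
}

// A few illustrative inputs.
void
run_examples ()
{
  assert (common_lane_count ({1, 1, 1}) == 1);
  assert (common_lane_count ({1, 2, 1}) == 0);
  assert (want_store_lanes_after_discovery ({1, 1, 1}, true));
}
```

A mismatch anywhere resets the count to 0, so mixed results such as
{1, 2, 1} never trigger the late switch to store-lanes.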

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

PR tree-optimization/115372
* tree-vect-slp.cc (vect_build_slp_instance): Compute the
uniform, if any, number of lanes of the RHS sub-graphs feeding
the store and if uniformly one, use store-lanes if the target
supports that.
---
 gcc/tree-vect-slp.cc | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 600987dd6e5..a4589e45481 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3957,6 +3957,7 @@ vect_build_slp_instance (vec_info *vinfo,
  /* Calculate the unrolling factor based on the smallest type.  */
  poly_uint64 unrolling_factor = 1;
 
+ unsigned int rhs_common_nlanes = 0;
  unsigned int start = 0, end = i;
  while (start < group_size)
{
@@ -3978,6 +3979,10 @@ vect_build_slp_instance (vec_info *vinfo,
 calculate_unrolling_factor
   (max_nunits, end - start));
  rhs_nodes.safe_push (node);
+ if (start == 0)
+   rhs_common_nlanes = SLP_TREE_LANES (node);
+ else if (rhs_common_nlanes != SLP_TREE_LANES (node))
+   rhs_common_nlanes = 0;
  start = end;
  if (want_store_lanes || force_single_lane)
end = start + 1;
@@ -4015,6 +4020,19 @@ vect_build_slp_instance (vec_info *vinfo,
}
}
 
+ /* Now re-assess whether we want store lanes in case the
+discovery ended up producing all single-lane RHSs.  */
+ if (rhs_common_nlanes == 1
+ && ! STMT_VINFO_GATHER_SCATTER_P (stmt_info)
+ && ! STMT_VINFO_STRIDED_P (stmt_info)
+ && compare_step_with_zero (vinfo, stmt_info) > 0
+ && (vect_store_lanes_supported (SLP_TREE_VECTYPE (rhs_nodes[0]),
+ group_size,
+ SLP_TREE_CHILDREN
+   (rhs_nodes[0]).length () != 1)
+ != IFN_LAST))
+   want_store_lanes = true;
+
  /* Now we assume we can build the root SLP node from all stores.  */
  if (want_store_lanes)
{
-- 
2.43.0

[PATCH v2] match: Fix `a != 0 ? a * b : 0` patterns for things that trap [PR116772]

2024-09-23 Thread Andrew Pinski

For generic, `a != 0 ? a * b : 0` would match where `b` would be an expression
that traps (in the case of the testcase it was an integer division, but it
could be any trapping expression).

This adds a new helper function, expr_no_side_effects_p, which tests that the
expression has no side effects and does not trap; it might be useful in other
locations.
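
To make the problem concrete, here is a small sketch modelled on the new
testcase.  mult0 is the guarded source form; mult0_unconditional is a
hypothetical hand-written equivalent of what the simplification would
produce if the guard were dropped.  The second function must not be
called with b == 0, since the division is then undefined and typically
traps.

```cpp
#include <cassert>

// Guarded form: the division is evaluated only when b != 0.
int
mult0 (int a, int b)
{
  return b != 0 ? (a / b) * b : 0;
}

// Hypothetical result of dropping the guard: the multiplication, and
// with it the division, is now evaluated unconditionally.  Only valid
// for b != 0.
int
mult0_unconditional (int a, int b)
{
  return (a / b) * b;
}

// Both forms agree whenever b != 0; only the guarded form is safe
// for b == 0.
void
run_examples ()
{
  assert (mult0 (7, 2) == mult0_unconditional (7, 2));
  assert (mult0 (3, 0) == 0);
}
```

The two forms are interchangeable only when the operand that becomes
unconditional cannot trap, which is what the new predicate is meant to
guarantee.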

Changes since v1:
* v2: Move check to helper function instead of inlining it.

PR middle-end/116772

gcc/ChangeLog:

* generic-match-head.cc (expr_no_side_effects_p): New function.
* gimple-match-head.cc (expr_no_side_effects_p): New function.
* match.pd (`a != 0 ? a / b : 0`): Check expr_no_side_effects_p.
(`a != 0 ? a * b : 0`, `a != 0 ? a & b : 0`): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr116772-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/generic-match-head.cc | 12 
 gcc/gimple-match-head.cc  | 10 ++
 gcc/match.pd  | 10 --
 gcc/testsuite/gcc.dg/torture/pr116772-1.c | 24 +++
 4 files changed, 54 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr116772-1.c

diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
index 641d8e9b2de..42dee626613 100644
--- a/gcc/generic-match-head.cc
+++ b/gcc/generic-match-head.cc
@@ -115,6 +115,18 @@ optimize_successive_divisions_p (tree, tree)
   return false;
 }
 
+/* Returns true if the expression T has no side effects
+   including not trapping. */
+static inline bool
+expr_no_side_effects_p (tree t)
+{
+  if (TREE_SIDE_EFFECTS (t))
+return false;
+  if (generic_expr_could_trap_p (t))
+return false;
+  return true;
+}
+
 /* Return true if EXPR1 and EXPR2 have the same value, but not necessarily
same type.  The types can differ through nop conversions.  */
 
diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index b5d4a71ddc5..4147a0eb38a 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -145,6 +145,16 @@ optimize_vectors_before_lowering_p ()
   return !cfun || (cfun->curr_properties & PROP_gimple_lvec) == 0;
 }
 
+/* Returns true if the expression T has no side effects
+   including not trapping. */
+static inline bool
+expr_no_side_effects_p (tree t)
+{
+  /* For gimple, there should only be gimple val's here. */
+  gcc_assert (is_gimple_val (t));
+  return true;
+}
+
 /* Return true if pow(cst, x) should be optimized into exp(log(cst) * x).
As a workaround for SPEC CPU2017 628.pop2_s, don't do it if arg0
is an exact integer, arg1 = phi_res +/- cst1 and phi_res = PHI 
diff --git a/gcc/match.pd b/gcc/match.pd
index 940292d0d49..1b88168532d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4679,7 +4679,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (simplify
   (cond (ne @0 integer_zerop) (op@2 @3 @1) integer_zerop )
(if (bitwise_equal_p (@0, @3)
-&& tree_expr_nonzero_p (@1))
+&& tree_expr_nonzero_p (@1)
+   /* Cannot make an expression with side effects
+  unconditional. */
+   && expr_no_side_effects_p (@3))
 @2)))
 
 /* Note we prefer the != case here
@@ -4689,7 +4692,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (for op (mult bit_and)
  (simplify
   (cond (ne @0 integer_zerop) (op:c@2 @1 @3) integer_zerop)
-  (if (bitwise_equal_p (@0, @3))
+  (if (bitwise_equal_p (@0, @3)
+   /* Cannot make an expression with side effects
+  unconditional. */
+   && expr_no_side_effects_p (@1))
@2)))
 
 /* Simplifications of shift and rotates.  */
diff --git a/gcc/testsuite/gcc.dg/torture/pr116772-1.c b/gcc/testsuite/gcc.dg/torture/pr116772-1.c
new file mode 100644
index 000..eedd0398af1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116772-1.c
@@ -0,0 +1,24 @@
+/* { dg-do run } */
+/* PR middle-end/116772  */
+/* The division by `/b` should not
+   be made unconditional. */
+
+int mult0(int a,int b) __attribute__((noipa));
+
+int mult0(int a,int b){
+  return (b!=0 ? (a/b)*b : 0);
+}
+
+int bit_and0(int a,int b) __attribute__((noipa));
+
+int bit_and0(int a,int b){
+  return (b!=0 ? (a/b)&b : 0);
+}
+
+int main() {
+  if (mult0(3, 0) != 0)
+__builtin_abort();
+  if (bit_and0(3, 0) != 0)
+__builtin_abort();
+  return 0;
+}
-- 
2.43.0

[PATCH 00/10] c++/modules: Implement P1815 "Translation-unit-local entities"

2024-09-23 Thread Nathaniel Shead

This patch series implements most of the changes made by P1815.  It also
cleans up a few bugs found along the way that impacted tests I wrote.

The whole patch series was bootstrapped on x86_64-pc-linux-gnu with no
regressions.

Nathaniel Shead (10):
  libstdc++: Remove unnecessary 'static' from __is_specialization_of
  c++: Update decl_linkage for C++11
  c++/modules: Use decl_linkage in maybe_record_mergeable_decl
  c++/modules: Fix linkage checks for exported using-decls
  c++/modules: Allow imported references in constant expressions
  c++/modules: Detect exposures of TU-local entities
  c++/modules: Implement ignored TU-local exposures
  c++/modules: Support anonymous namespaces in header units
  c++/modules: Check linkage for exported declarations
  c++/modules: Validate external linkage definitions in header units
[PR116401]

 gcc/c-family/c.opt  |   4 +
 gcc/cp/cp-objcp-common.cc   |   1 +
 gcc/cp/cp-tree.def  |   6 +
 gcc/cp/cp-tree.h|  29 +-
 gcc/cp/decl.cc  |   1 +
 gcc/cp/decl2.cc |   1 +
 gcc/cp/module.cc| 764 +---
 gcc/cp/name-lookup.cc   |  88 +--
 gcc/cp/name-lookup.h|   2 +-
 gcc/cp/parser.cc|  25 +-
 gcc/cp/parser.h |   3 +
 gcc/cp/pt.cc| 100 ++-
 gcc/cp/tree.cc  |  92 ++-
 gcc/doc/invoke.texi |  19 +-
 gcc/testsuite/g++.dg/modules/block-decl-2.C |   2 +-
 gcc/testsuite/g++.dg/modules/cexpr-5_a.C|  13 +
 gcc/testsuite/g++.dg/modules/cexpr-5_b.C|   9 +
 gcc/testsuite/g++.dg/modules/export-3.C |   2 +-
 gcc/testsuite/g++.dg/modules/export-6.C |  35 +
 gcc/testsuite/g++.dg/modules/hdr-2.H| 164 +
 gcc/testsuite/g++.dg/modules/internal-1.C   |  15 +-
 gcc/testsuite/g++.dg/modules/internal-3.C   |  18 +
 gcc/testsuite/g++.dg/modules/internal-4.C   | 112 +++
 gcc/testsuite/g++.dg/modules/internal-5_a.C | 104 +++
 gcc/testsuite/g++.dg/modules/internal-5_b.C |  29 +
 gcc/testsuite/g++.dg/modules/internal-6.C   |  24 +
 gcc/testsuite/g++.dg/modules/internal-7_a.C |  75 ++
 gcc/testsuite/g++.dg/modules/internal-7_b.C |  21 +
 gcc/testsuite/g++.dg/modules/internal-8_a.H |  28 +
 gcc/testsuite/g++.dg/modules/internal-8_b.C |  29 +
 gcc/testsuite/g++.dg/modules/linkage-2.C|   5 +-
 gcc/testsuite/g++.dg/modules/macro-4_c.H|   2 +-
 gcc/testsuite/g++.dg/modules/mod-sym-4.C|   4 +-
 gcc/testsuite/g++.dg/modules/pr106761.h |   2 +-
 gcc/testsuite/g++.dg/modules/pr98843_b.H|   2 +-
 gcc/testsuite/g++.dg/modules/pr99468.H  |   2 +-
 gcc/testsuite/g++.dg/modules/pragma-1_a.H   |   2 +-
 gcc/testsuite/g++.dg/modules/tpl-ary-1.h|   2 +-
 gcc/testsuite/g++.dg/modules/using-10.C |  56 +-
 gcc/testsuite/g++.dg/modules/using-12.C |  42 +-
 gcc/testsuite/g++.dg/modules/using-27.C |  14 +
 gcc/testsuite/g++.dg/modules/using-28_a.C   |  12 +
 gcc/testsuite/g++.dg/modules/using-28_b.C   |   8 +
 libcc1/libcp1plugin.cc  |   2 +-
 libstdc++-v3/include/std/format |   5 +-
 45 files changed, 1738 insertions(+), 237 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/cexpr-5_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/cexpr-5_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/export-6.C
 create mode 100644 gcc/testsuite/g++.dg/modules/hdr-2.H
 create mode 100644 gcc/testsuite/g++.dg/modules/internal-3.C
 create mode 100644 gcc/testsuite/g++.dg/modules/internal-4.C
 create mode 100644 gcc/testsuite/g++.dg/modules/internal-5_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/internal-5_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/internal-6.C
 create mode 100644 gcc/testsuite/g++.dg/modules/internal-7_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/internal-7_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/internal-8_a.H
 create mode 100644 gcc/testsuite/g++.dg/modules/internal-8_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/using-27.C
 create mode 100644 gcc/testsuite/g++.dg/modules/using-28_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/using-28_b.C

-- 
2.46.0

[PATCH 04/10] c++/modules: Fix linkage checks for exported using-decls

2024-09-23 Thread Nathaniel Shead

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

This fixes some inconsistencies with what kinds of linkage various
entities are assumed to have.  This also fixes handling of exported
using-decls binding to GM entities and type aliases to better align with
the standard's requirements.

gcc/cp/ChangeLog:

* name-lookup.cc (check_can_export_using_decl): Handle internal
linkage GM entities, and use linkage of entity ultimately
referred to by aliases.

gcc/testsuite/ChangeLog:

* g++.dg/modules/using-10.C: Add tests for no-linkage, fix
expected linkage of aliases.
* g++.dg/modules/using-12.C: Likewise.
* g++.dg/modules/using-27.C: New test.
* g++.dg/modules/using-28_a.C: New test.
* g++.dg/modules/using-28_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/name-lookup.cc | 55 --
 gcc/testsuite/g++.dg/modules/using-10.C   | 56 ++-
 gcc/testsuite/g++.dg/modules/using-12.C   | 42 +++--
 gcc/testsuite/g++.dg/modules/using-27.C   | 14 ++
 gcc/testsuite/g++.dg/modules/using-28_a.C | 12 +
 gcc/testsuite/g++.dg/modules/using-28_b.C |  8 
 6 files changed, 145 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/using-27.C
 create mode 100644 gcc/testsuite/g++.dg/modules/using-28_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/using-28_b.C

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index c0f89f98d87..cbb2827808f 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -5206,38 +5206,43 @@ pushdecl_outermost_localscope (tree x)
 static bool
 check_can_export_using_decl (tree binding)
 {
-  tree decl = STRIP_TEMPLATE (binding);
-
-  /* Linkage is determined by the owner of an enumerator.  */
-  if (TREE_CODE (decl) == CONST_DECL)
-decl = TYPE_NAME (DECL_CONTEXT (decl));
-
-  /* If the using decl is exported, the things it refers
- to must also be exported (or not have module attachment).  */
-  if (!DECL_MODULE_EXPORT_P (decl)
-  && (DECL_LANG_SPECIFIC (decl)
- && DECL_MODULE_ATTACH_P (decl)))
+  /* We want the linkage of the underlying entity, so strip typedefs.
+ If the underlying entity is a builtin type then we're OK.  */
+  tree entity = binding;
+  if (TREE_CODE (entity) == TYPE_DECL)
 {
-  bool internal_p = !TREE_PUBLIC (decl);
+  entity = TYPE_MAIN_DECL (TREE_TYPE (entity));
+  if (!entity)
+   return true;
+}
+
+  linkage_kind linkage = decl_linkage (entity);
+  tree not_tmpl = STRIP_TEMPLATE (entity);
 
-  /* A template in an anonymous namespace doesn't constrain TREE_PUBLIC
-until it's instantiated, so double-check its context.  */
-  if (!internal_p && TREE_CODE (binding) == TEMPLATE_DECL)
-   internal_p = decl_internal_context_p (decl);
+  /* Attachment is determined by the owner of an enumerator.  */
+  if (TREE_CODE (not_tmpl) == CONST_DECL)
+not_tmpl = TYPE_NAME (DECL_CONTEXT (not_tmpl));
 
+  /* If the using decl is exported, the things it refers to must
+ have external linkage.  decl_linkage returns lk_external for
+ module linkage so also check for attachment.  */
+  if (linkage != lk_external
+  || (DECL_LANG_SPECIFIC (not_tmpl)
+ && DECL_MODULE_ATTACH_P (not_tmpl)
+ && !DECL_MODULE_EXPORT_P (not_tmpl)))
+{
   auto_diagnostic_group d;
   error ("exporting %q#D that does not have external linkage",
 binding);
-  if (TREE_CODE (decl) == TYPE_DECL && !DECL_IMPLICIT_TYPEDEF_P (decl))
-   /* An un-exported explicit type alias has no linkage.  */
-   inform (DECL_SOURCE_LOCATION (binding),
-   "%q#D declared here with no linkage", binding);
-  else if (internal_p)
-   inform (DECL_SOURCE_LOCATION (binding),
-   "%q#D declared here with internal linkage", binding);
+  if (linkage == lk_none)
+   inform (DECL_SOURCE_LOCATION (entity),
+   "%q#D declared here with no linkage", entity);
+  else if (linkage == lk_internal)
+   inform (DECL_SOURCE_LOCATION (entity),
+   "%q#D declared here with internal linkage", entity);
   else
-   inform (DECL_SOURCE_LOCATION (binding),
-   "%q#D declared here with module linkage", binding);
+   inform (DECL_SOURCE_LOCATION (entity),
+   "%q#D declared here with module linkage", entity);
   return false;
 }
 
diff --git a/gcc/testsuite/g++.dg/modules/using-10.C b/gcc/testsuite/g++.dg/modules/using-10.C
index d468a36f5d8..6f82b5dd147 100644
--- a/gcc/testsuite/g++.dg/modules/using-10.C
+++ b/gcc/testsuite/g++.dg/modules/using-10.C
@@ -23,6 +23,13 @@ namespace s {
   }
 }
 
+export using s::a1;  // { dg-error "does not have external linkage" }
+export using s::b1;  // { dg-error "does not have external linkage" }
+export using s::x1;  // { dg-error "does not have external linkage" }
+e

[PATCH 05/10] c++/modules: Allow imported references in constant expressions

2024-09-23 Thread Nathaniel Shead

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Currently the streaming code uses TREE_CONSTANT to determine whether an
entity will have a definition that is interesting to stream out.  This
is not sufficient, however; we also need to write the definition of
references, since although not TREE_CONSTANT they can still be usable in
constant expressions.
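
As a single-TU illustration of the language rule involved (without
modules, so it does not exercise the streaming code), the sketch below
mirrors the declarations in the new tests.  The helper functions
read_through_reference and run_examples are hypothetical additions for
demonstration only.

```cpp
#include <cassert>

int x = 123;
void f () {}

// Neither reference is declared constexpr, but both are initialized
// with constant expressions: they bind to entities with static storage
// duration.
int &xr = x;
auto &fr = f;

// Hence they can be named in later constant expressions.
constexpr int &cxr = xr;
constexpr auto &cfr = fr;

static_assert (&cxr == &x, "cxr refers to x");
static_assert (&cfr == &f, "cfr refers to f");

// Reading through the reference is an ordinary run-time access.
int
read_through_reference ()
{
  return cxr;
}

void
run_examples ()
{
  assert (&xr == &x);
  assert (read_through_reference () == x);
}
```

For an importer to accept the equivalent of cxr and cfr, it needs to
know what xr and fr were bound to, which is why their definitions have
to be written to the CMI.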

gcc/cp/ChangeLog:

* module.cc (has_definition): Also write definition of
references initialized with a constant expression.

gcc/testsuite/ChangeLog:

* g++.dg/modules/cexpr-5_a.C: New test.
* g++.dg/modules/cexpr-5_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc |  6 +-
 gcc/testsuite/g++.dg/modules/cexpr-5_a.C | 13 +
 gcc/testsuite/g++.dg/modules/cexpr-5_b.C |  9 +
 3 files changed, 27 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/cexpr-5_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/cexpr-5_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index f5df9e875d3..7589de2348d 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -11829,7 +11829,11 @@ has_definition (tree decl)
   since there's no TU to emit them in otherwise.  */
return true;
 
- if (!TREE_CONSTANT (decl))
+ if (!TREE_CONSTANT (decl)
+ /* A reference is never TREE_CONSTANT, but still stream its
+definition if it's usable in constant expressions.  */
+ && !(TYPE_REF_P (TREE_TYPE (decl))
+  && DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (decl)))
return false;
 
  return true;
diff --git a/gcc/testsuite/g++.dg/modules/cexpr-5_a.C b/gcc/testsuite/g++.dg/modules/cexpr-5_a.C
new file mode 100644
index 000..3a9f00523f6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/cexpr-5_a.C
@@ -0,0 +1,13 @@
+// { dg-additional-options "-fmodules-ts" }
+// { dg-module-cmi M }
+
+export module M;
+
+int x = 123;
+void f() {}
+
+int& xr = x;
+auto& fr = f;
+
+constexpr int& cxr = xr;
+constexpr auto& cfr = fr;
diff --git a/gcc/testsuite/g++.dg/modules/cexpr-5_b.C b/gcc/testsuite/g++.dg/modules/cexpr-5_b.C
new file mode 100644
index 000..4b1b901104b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/cexpr-5_b.C
@@ -0,0 +1,9 @@
+// { dg-additional-options "-fmodules-ts" }
+
+module M;
+
+constexpr auto& use_xr = xr;
+constexpr auto& use_fr = fr;
+
+static_assert(&cxr == &use_xr);
+static_assert(&cfr == &use_fr);
-- 
2.46.0

[PATCH 07/10] c++/modules: Implement ignored TU-local exposures

2024-09-23 Thread Nathaniel Shead

Currently I just stream DECL_NAME in TU_LOCAL_ENTITYs for use in diagnostics,
but this feels perhaps insufficient.  Are there any better approaches here?
Otherwise I don't think it matters too much, as which entity it is will
hopefully also be clear from the 'declared here' notes.

I've put the new warning in Wextra, but maybe it would be better to just
leave it out of any of the normal warning groups since there's currently
no good way to work around the warnings it produces?

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

[basic.link] p14 lists a number of circumstances where a declaration
naming a TU-local entity is not an exposure, notably the bodies of
non-inline templates and friend declarations in classes.  This patch
ensures that these references do not error when exporting the module.

We do need to still error on instantiation from a different module,
however, in case this refers to a TU-local entity.  As such this patch
adds a new tree TU_LOCAL_ENTITY which is used purely as a placeholder to
poison any attempted template instantiations that refer to it.

This is also streamed for friend decls so that merging (based on the
index of an entity into the friend decl list) doesn't break and to
prevent complicating the logic; I imagine this shouldn't ever come up
though.

We also add a new warning, '-Wignored-exposures', to handle the case
where someone accidentally refers to a TU-local value from within a
non-inline function template.  This will compile without errors as-is,
but any attempt to instantiate the decl will fail; this warning can be
used to ensure that this doesn't happen.  Unfortunately the warning has
quite a few false positives; for instance, a user could deliberately only
call explicit instantiations of the decl, or use 'if constexpr' to avoid
instantiating the TU-local entity from other TUs, neither of which is
currently detected.
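
The shape of code in question looks roughly like the sketch below.  It
is written as an ordinary single TU so that it compiles on its own; the
names tu_local_value, scaled and run_examples are hypothetical, and the
comments describe what would happen if the same declarations appeared
in a module interface.

```cpp
#include <cassert>

// Internal linkage, so this is a TU-local entity.
static int
tu_local_value ()
{
  return 42;
}

// Non-inline function template whose body names the TU-local function.
// In a module interface this reference would not count as an exposure,
// but an instantiation requested from another TU could not use it.
template <typename T>
int
scaled (T factor)
{
  return static_cast<int> (factor) * tu_local_value ();
}

// Explicit instantiation in the same TU: one of the deliberate patterns
// that the warning cannot currently tell apart from a mistake.
template int scaled<int> (int);

void
run_examples ()
{
  assert (scaled (2) == 84);
}
```

Within one TU everything is fine; the concern is only about
instantiations that would have to happen elsewhere.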

The main piece that this patch doesn't yet attempt to solve is ADL: as
specified, if ADL adds an overload set that includes a translation-unit
local entity when instantiating a template, that overload set is now
poisoned and counts as an exposure.  Unfortunately, we don't currently
differentiate between decls that are hidden due to not being exported,
or decls that are hidden due to being hidden friends, so this patch
instead just keeps the current (wrong) behaviour of non-exported
entities not being visible to ADL at all.

gcc/c-family/ChangeLog:

* c.opt: New warning '-Wignored-exposures'.

gcc/cp/ChangeLog:

* cp-objcp-common.cc (cp_tree_size): Add TU_LOCAL_ENTITY.
* cp-tree.def (TU_LOCAL_ENTITY): New tree code.
* cp-tree.h (struct tree_tu_local_entity): New type.
(TU_LOCAL_ENTITY_NAME): New accessor.
(TU_LOCAL_ENTITY_LOCATION): New accessor.
(enum cp_tree_node_structure_enum): Add TS_CP_TU_LOCAL_ENTITY.
(union GTY): Add tu_local_entity field.
* module.cc (enum tree_tag): New flag DB_IGNORED_EXPOSURE_BIT.
(depset::is_ignored_exposure): New accessor.
(depset::has_defn): Override for TU-local entities.
(depset::hash::ignore_exposure): New field.
(depset::hash::hash): Initialize it.
(trees_out::tree_tag::tt_tu_local): New flag.
(trees_out::writing_local_entities): New field.
(trees_out::is_initial_scan): New function.
(trees_out::tu_local_count): New counter.
(trees_out::trees_out): Initialize writing_local_entities.
(dumper::impl::nested_name): Handle TU_LOCAL_ENTITY.
(trees_out::instrument): Report TU-local entity counts.
(trees_out::decl_value): Early exit for TU-local entities.
(trees_in::decl_value): Handle typedefs of TU-local entities.
(trees_out::decl_node): Adjust assertion to cope with early exit
of TU-local deps.  Always write TU-local entities by value.
(trees_out::type_node): Handle TU-local types.
(trees_out::has_tu_local_dep): New function.
(trees_out::find_tu_local_decl): New function.
(trees_out::tree_node): Intercept TU-local entities and write
placeholder values for them instead of normal streaming.
(trees_in::tree_node): Handle TU-local template results.
(trees_out::write_function_def): Ignore exposures in non-inline
function bodies.
(trees_out::write_var_def): Ignore exposures in initializers.
(trees_out::write_class_def): Ignore exposures in friend decls.
(trees_in::read_class_def): Skip TU-local friends.
(trees_out::write_definition): Record whether we're writing a
decl which refers to TU-local entities.
(depset::hash::add_dependency): Handle ignored exposures.
(depset::hash::find_dependencies): Use depset's own is_key_order
function rather than delegating via walker.  Pass whether the
decl has ignored TU-local entities in its definition.
(depset::hash::finalize_dependencies):

[PATCH 06/10] c++/modules: Detect exposures of TU-local entities

2024-09-23 Thread Nathaniel Shead

I feel like there should be a way to make use of LAMBDA_TYPE_EXTRA_SCOPE to
avoid the need for the new TYPE_DEFINED_IN_INITIALIZER_P flag, perhaps once
something like my patch here[1] is accepted (but with further embellishments
for concepts, probably), but I wasn't able to work it out.  As far as I'm
aware, only lambdas can currently be an unnamed type defined in an
'initializer', so this does seem a little overkill, but I've applied it to
all class types just in case.

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662393.html

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Currently, the modules streaming code implements some checks for
declarations in the CMI that reference (some kinds of) internal-linkage
entities, and errors if so.  This patch expands on that support to
implement the logic for exposures of TU-local entities as defined in
[basic.link] since P1815.

This will cause some code that previously errored in modules to start
compiling; for instance, template specialisations of internal linkage
functions.

However, some code that previously appeared valid will with this patch
no longer compile, notably some kinds of usages of internal linkage
functions included from the GMF.  This appears to be related to P2808
and FR-025; however, as yet there doesn't appear to be consensus for
changing these rules, so I've implemented them as-is.

This patch leaves a couple of things out.  In particular, a couple of
the rules for what is a TU-local entity currently seem to me to be
redundant; I've left them as FIXMEs to be handled once I can find
testcases that aren't adequately supported by the other logic here.

Additionally, there are some exceptions for when naming a TU-local
entity is not always an exposure; I've left support for this to a
follow-up patch for easier review, as it has broader implications for
streaming.

Finally, this patch makes a couple of small adjustments to the modules
streaming logic to prune any leftover TU-local deps (that aren't
erroneous exposures).  This is required for this patch to ensure that
later stages don't get confused by any leftover TU-local entities
floating around.

gcc/cp/ChangeLog:

* cp-tree.h (TYPE_DEPENDENT_P_VALID): Fix whitespace.
(TYPE_DEFINED_IN_INITIALIZER_P): New accessor.
* module.cc (DB_IS_INTERNAL_BIT): Rename to...
(DB_TU_LOCAL_BIT): ...this.
(DB_REFS_INTERNAL_BIT): Rename to...
(DB_EXPOSURE_BIT): ...this.
(depset::hash::is_internal): Rename to...
(depset::hash::is_tu_local): ...this.
(depset::hash::refs_internal): Rename to...
(depset::hash::is_exposure): ...this.
(depset::hash::is_tu_local_entity): New function.
(depset::hash::has_tu_local_tmpl_arg): New function.
(depset::hash::is_tu_local_value): New function.
(depset::hash::make_dependency): Check for TU-local entities.
(depset::hash::add_dependency): Make current an exposure
whenever it references a TU-local entity.
(depset::hash::add_binding_entity): Don't create bindings for
any TU-local entity.
(depset::hash::finalize_dependencies): Rename flags and adjust
diagnostic messages to report exposures of TU-local entities.
(depset::tarjan::connect): Don't include any TU-local depsets.
(depset::hash::connect): Likewise.
* parser.h (struct cp_parser::in_initializer_p): New flag.
* parser.cc (cp_debug_parser): Print the new flag.
(cp_parser_new): Set the new flag to false.
(cp_parser_lambda_expression): Mark whether the lambda was
defined in an initializer.
(cp_parser_initializer): Set the new flag to true while parsing.
(cp_parser_class_head): Mark whether the class was defined in an
initializer.
(cp_parser_concept_definition): Set the new flag to true while
parsing.

gcc/testsuite/ChangeLog:

* g++.dg/modules/block-decl-2.C: Adjust messages.
* g++.dg/modules/internal-1.C: Adjust messages, remove XFAILs.
* g++.dg/modules/linkage-2.C: Adjust messages, remove XFAILs.
* g++.dg/modules/internal-3.C: New test.
* g++.dg/modules/internal-4.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/cp-tree.h|   7 +-
 gcc/cp/module.cc| 388 +---
 gcc/cp/parser.cc|  16 +
 gcc/cp/parser.h |   3 +
 gcc/testsuite/g++.dg/modules/block-decl-2.C |   2 +-
 gcc/testsuite/g++.dg/modules/internal-1.C   |  15 +-
 gcc/testsuite/g++.dg/modules/internal-3.C   |  18 +
 gcc/testsuite/g++.dg/modules/internal-4.C   | 112 ++
 gcc/testsuite/g++.dg/modules/linkage-2.C|   5 +-
 9 files changed, 493 insertions(+), 73 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/internal-3.C
 create mode 100644 gcc/testsuite/g++.dg/modules/internal-4.C

di

[PATCH 09/10] c++/modules: Check linkage for exported declarations

2024-09-23 Thread Nathaniel Shead

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

By [module.interface] p3, if an exported declaration is not within a
header unit, it shall not declare a name with internal linkage.

Unfortunately we cannot just do this within set_originating_module,
since at the locations it's called the linkage of declarations is not
always fully determined yet.  We could move the calls, but this causes
the checking assertion to fail as the originating module declaration may
have moved, and in general for some kinds of declarations it's not
always obvious where the call should be moved to.

This patch instead introduces a new function to check that the linkage
of a declaration within a module is correct, to be called for all
declarations once their linkage is fully determined.

As a drive-by fix this patch also improves the source location of
namespace aliases to point at the identifier rather than the terminating
semicolon.

gcc/cp/ChangeLog:

* cp-tree.h (check_module_decl_linkage): Declare.
* decl2.cc (finish_static_data_member_decl): Check linkage.
* module.cc (set_originating_module): Adjust comment.
(check_module_decl_linkage): New function.
* name-lookup.cc (do_namespace_alias): Build alias with
specified location, check linkage.
(pushtag): Check linkage.
(push_namespace): Slightly clarify error message.
* name-lookup.h (do_namespace_alias): Add location parameter.
* parser.cc (cp_parser_namespace_alias_definition): Pass
identifier location to do_namespace_alias.
(cp_parser_alias_declaration): Check linkage.
(cp_parser_init_declarator): Check linkage.
(cp_parser_function_definition_after_declarator): Check linkage.
(cp_parser_save_member_function_body): Check linkage.
* pt.cc (finish_concept_definition): Mark as public, check
linkage.

libcc1/ChangeLog:

* libcp1plugin.cc (plugin_add_namespace_alias): Call
do_namespace_alias with input_location.

gcc/testsuite/ChangeLog:

* g++.dg/modules/export-3.C: Adjust error message.
* g++.dg/modules/export-6.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/cp-tree.h|  1 +
 gcc/cp/decl2.cc |  1 +
 gcc/cp/module.cc| 29 +---
 gcc/cp/name-lookup.cc   | 15 ---
 gcc/cp/name-lookup.h|  2 +-
 gcc/cp/parser.cc|  9 ++-
 gcc/cp/pt.cc|  2 ++
 gcc/testsuite/g++.dg/modules/export-3.C |  2 +-
 gcc/testsuite/g++.dg/modules/export-6.C | 35 +
 libcc1/libcp1plugin.cc  |  2 +-
 10 files changed, 87 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/export-6.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 85c1d23c240..05731a66df3 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7454,6 +7454,7 @@ extern void set_originating_module (tree, bool friend_p = false);
 extern tree get_originating_module_decl (tree) ATTRIBUTE_PURE;
 extern int get_originating_module (tree, bool for_mangle = false) ATTRIBUTE_PURE;
 extern unsigned get_importing_module (tree, bool = false) ATTRIBUTE_PURE;
+extern void check_module_decl_linkage (tree);
 
 /* Where current instance of the decl got declared/defined/instantiated.  */
 extern void set_instantiating_module (tree);
diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 0279372488c..97ce4473b1c 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -1019,6 +1019,7 @@ finish_static_data_member_decl (tree decl,
 }
 
   cp_finish_decl (decl, init, init_const_expr_p, asmspec_tree, flags);
+  check_module_decl_linkage (decl);
 }
 
 /* DECLARATOR and DECLSPECS correspond to a class member.  The other
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index f114c2ec980..d7e6fe2c54f 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -19920,11 +19920,34 @@ set_originating_module (tree decl, bool friend_p ATTRIBUTE_UNUSED)
   DECL_MODULE_ATTACH_P (decl) = true;
 }
 
-  if (!module_exporting_p ())
+  /* It is illegal to export a declaration with internal linkage.  However, at
+ the point this function is called we don't always know yet whether this
+ declaration has internal linkage; instead we defer this check for callers
+ to do once visibility has been determined.  */
+  if (module_exporting_p ())
+DECL_MODULE_EXPORT_P (decl) = true;
+}
+
+/* Checks whether DECL within a module unit has valid linkage for its kind.
+   Must be called after visibility for DECL has been finalised.  */
+
+void
+check_module_decl_linkage (tree decl)
+{
+  if (!module_has_cmi_p ())
 return;
 
-  // FIXME: Check ill-formed linkage
-  DECL_MODULE_EXPORT_P (decl) = true;
+  /* An internal-linkage declaration cannot generally be exported.
+ But it's OK to export any declaration from a header u

[PATCH 02/10] c++: Update decl_linkage for C++11

2024-09-23 Thread Nathaniel Shead

This patch intends no change in functionality apart from the mangling
difference noted; more tests are in patch 4 of this series, which adds a
way to check more directly what linkage decl_linkage reports.

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Currently modules code uses a variety of ad-hoc methods to attempt to
determine whether an entity has internal linkage, which leads to
inconsistencies and some correctness issues as different edge cases are
neglected.  While investigating this I discovered 'decl_linkage', but it
doesn't seem to have been updated to account for the C++11 clarification
that all entities declared in an anonymous namespace are internal.

I'm not convinced that even in C++98 it was intended that e.g. types in
anonymous namespaces should be external, but some tests in the testsuite
rely on this, so for compatibility I restricted those modifications to
C++11 and later.

This should have relatively minimal impact as not much seems to actually
rely on decl_linkage, but does change the mangling of symbols in
anonymous namespaces slightly.  Previously, we had

  namespace {
int x;  // mangled as '_ZN12_GLOBAL__N_11xE'
static int y;  // mangled as '_ZN12_GLOBAL__N_1L1yE'
  }

but with this patch the x is now mangled like y (with the extra 'L').
For contrast, Clang currently mangles neither x nor y with the 'L'.
Since this only affects internal-linkage entities I don't believe this
should break ABI in any observable fashion.
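
As a small illustration of the rule discussed above (not part of the patch),
the sketch below declares one plain and one 'static' variable inside an
unnamed namespace.  Since C++11 both have internal linkage; the helper name
'sum_of_internal' is made up for this example.

```cpp
#include <cassert>

namespace {
  int x = 1;         // no 'static', but internal linkage since C++11
  static int y = 2;  // explicitly 'static', internal linkage as well
}

// Hypothetical helper: both variables are usable inside this translation
// unit, but neither name is visible to any other translation unit.
int sum_of_internal ()
{
  return x + y;
}
```

Compiling this to an object file and inspecting the symbol table (for example
with 'nm') is one way to compare how the two variables are mangled.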

gcc/cp/ChangeLog:

* name-lookup.cc (do_namespace_alias): Propagate TREE_PUBLIC for
namespace aliases.
* tree.cc (decl_linkage): Update rules for C++11.

gcc/testsuite/ChangeLog:

* g++.dg/modules/mod-sym-4.C: Update test to account for
non-static internal-linkage variables new mangling.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/name-lookup.cc|  1 +
 gcc/cp/tree.cc   | 92 +++-
 gcc/testsuite/g++.dg/modules/mod-sym-4.C |  4 +-
 3 files changed, 60 insertions(+), 37 deletions(-)

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index c7a693e02d5..50e169eca43 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -6610,6 +6610,7 @@ do_namespace_alias (tree alias, tree name_space)
   DECL_NAMESPACE_ALIAS (alias) = name_space;
   DECL_EXTERNAL (alias) = 1;
   DECL_CONTEXT (alias) = FROB_CONTEXT (current_scope ());
+  TREE_PUBLIC (alias) = TREE_PUBLIC (DECL_CONTEXT (alias));
   set_originating_module (alias);
 
   pushdecl (alias);
diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index f43febed124..28e14295de4 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -5840,7 +5840,7 @@ char_type_p (tree type)
  || same_type_p (type, wchar_type_node));
 }
 
-/* Returns the kind of linkage associated with the indicated DECL.  Th
+/* Returns the kind of linkage associated with the indicated DECL.  The
value returned is as specified by the language standard; it is
independent of implementation details regarding template
instantiation, etc.  For example, it is possible that a declaration
@@ -5857,53 +5857,75 @@ decl_linkage (tree decl)
  linkage first, and then transform that into a concrete
  implementation.  */
 
-  /* Things that don't have names have no linkage.  */
-  if (!DECL_NAME (decl))
-return lk_none;
+  /* An explicit type alias has no linkage.  */
+  if (TREE_CODE (decl) == TYPE_DECL
+  && !DECL_IMPLICIT_TYPEDEF_P (decl)
+  && !DECL_SELF_REFERENCE_P (decl))
+{
+  /* But this could be a typedef name for linkage purposes, in which
+case we're interested in the linkage of the main decl.  */
+  if (decl == TYPE_NAME (TYPE_MAIN_VARIANT (TREE_TYPE (decl))))
+   decl = TYPE_MAIN_DECL (TREE_TYPE (decl));
+  else
+   return lk_none;
+}
 
-  /* Fields have no linkage.  */
-  if (TREE_CODE (decl) == FIELD_DECL)
+  /* Namespace-scope entities with no name usually have no linkage.  */
+  if (NAMESPACE_SCOPE_P (decl)
+  && (!DECL_NAME (decl) || IDENTIFIER_ANON_P (DECL_NAME (decl))))
+{
+  if (TREE_CODE (decl) == TYPE_DECL && !TYPE_ANON_P (TREE_TYPE (decl)))
+   /* This entity has a typedef name for linkage purposes.  */;
+  else if (TREE_CODE (decl) == NAMESPACE_DECL && cxx_dialect >= cxx11)
+   /* An anonymous namespace has internal linkage since C++11.  */
+   return lk_internal;
+  else
+   return lk_none;
+}
+
+  /* Fields and parameters have no linkage.  */
+  if (TREE_CODE (decl) == FIELD_DECL || TREE_CODE (decl) == PARM_DECL)
 return lk_none;
 
-  /* Things in local scope do not have linkage.  */
+  /* Things in block scope do not have linkage.  */
   if (decl_function_context (decl))
 return lk_none;
 
+  /* Things in class scope have the linkage of their owning class.  */
+  if (tree ctype = DECL_CLASS_CONTEXT (decl))
+return decl_linkage (TYPE_NAME (ctype));
+
+  /* Anonymo

[PATCH 01/10] libstdc++: Remove unnecessary 'static' from __is_specialization_of

2024-09-23 Thread Nathaniel Shead

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

The 'static' gives these declarations internal linkage, which is an ODR
issue, and causes a future modules patch to fail regtest as it now detects
attempted uses of TU-local entities in module CMIs.

libstdc++-v3/ChangeLog:

* include/std/format: Remove unnecessary 'static'.

Signed-off-by: Nathaniel Shead 
---
 libstdc++-v3/include/std/format | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index e963d7f79b3..d9014d111b1 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -361,10 +361,9 @@ namespace __format
 
 /// @cond undocumented
   template<typename _Tp, template<typename...> class _Class>
-static constexpr bool __is_specialization_of = false;
+constexpr bool __is_specialization_of = false;
   template<template<typename...> class _Class, typename... _Args>
-static constexpr bool __is_specialization_of<_Class<_Args...>, _Class>
-  = true;
+constexpr bool __is_specialization_of<_Class<_Args...>, _Class> = true;
 
 namespace __format
 {
-- 
2.46.0

[PATCH 03/10] c++/modules: Use decl_linkage in maybe_record_mergeable_decl

2024-09-23 Thread Nathaniel Shead

I don't currently have any testcases where this changes something, but I felt
it to be a valuable cleanup.

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

This avoids any possible inconsistencies (current or future) about
whether a declaration is internal or not.

gcc/cp/ChangeLog:

* name-lookup.cc (maybe_record_mergeable_decl): Use decl_linkage
instead of ad-hoc checks.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/name-lookup.cc | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index 50e169eca43..c0f89f98d87 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -3725,17 +3725,10 @@ maybe_record_mergeable_decl (tree *slot, tree name, 
tree decl)
   if (TREE_CODE (*slot) != BINDING_VECTOR)
 return;
 
-  if (!TREE_PUBLIC (CP_DECL_CONTEXT (decl)))
-/* Member of internal namespace.  */
+  if (decl_linkage (decl) == lk_internal)
 return;
 
   tree not_tmpl = STRIP_TEMPLATE (decl);
-  if ((TREE_CODE (not_tmpl) == FUNCTION_DECL
-   || VAR_P (not_tmpl))
-  && DECL_THIS_STATIC (not_tmpl))
-/* Internal linkage.  */
-return;
-
   bool is_attached = (DECL_LANG_SPECIFIC (not_tmpl)
  && DECL_MODULE_ATTACH_P (not_tmpl));
   tree *gslot = get_fixed_binding_slot
-- 
2.46.0

[PATCH 10/10] c++/modules: Validate external linkage definitions in header units [PR116401]

2024-09-23 Thread Nathaniel Shead

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

[module.import] p6 says "A header unit shall not contain a definition of
a non-inline function or variable whose name has external linkage."

This patch implements this requirement, and cleans up some issues in the
testsuite where this was already violated.  To handle deduction guides
we mark them as inline: although we give them a definition for
implementation reasons, by the standard they have no definition, and so
they should not error in this case.

One remaining question is the behaviour of code like:

  struct S { static const int x = 123; };

'S::x' is not 'inline' here, but this is legal code as long as there is
exactly one definition elsewhere if 'x' is ever odr-used, as specified
by [class.static.data] p4.  However, since it's not 'inline' then the
exemption for [module.import] does not apply, and so it appears uses of
this in header units should error.

Unfortunately the standard library headers do this, and there doesn't
appear to be an easy C++98-compatible way to adjust this.  Additionally
I'm not entirely certain that this wasn't an oversight; as such this
patch reduces this specific case to merely a pedwarn.
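
To make the 'S::x' case concrete, here is a minimal sketch (not taken from
the patch) contrasting the C++98-style in-class initializer with a C++17
inline static member.  The type 'T' and the helper functions are made up for
illustration, and the example assumes a C++17 or later compiler.

```cpp
#include <cassert>

// C++98 style: a declaration with an in-class initializer, not 'inline'.
// An out-of-class definition is needed only if 'x' is odr-used.
struct S { static const int x = 123; };

// C++17 style: an inline variable, so this is itself a definition.
struct T { inline static const int y = 456; };

// Reading the value is not an odr-use, so no separate definition of
// S::x is required for this function.
int read_s ()
{
  return S::x;
}

// Taking the address is an odr-use; that is fine for T::y because the
// inline variable is already defined in the class.
const int *address_of_t ()
{
  return &T::y;
}
```

Taking the address of 'S::x' instead would require exactly one out-of-class
definition somewhere in the program.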

PR c++/116401

gcc/cp/ChangeLog:

* decl.cc (grokfndecl): Mark deduction guides as 'inline'.
* module.cc (check_module_decl_linkage): Implement checks for
non-inline external linkage definitions in headers.

gcc/testsuite/ChangeLog:

* g++.dg/modules/macro-4_c.H: Add missing 'inline'.
* g++.dg/modules/pr106761.h: Likewise.
* g++.dg/modules/pr98843_b.H: Likewise.
* g++.dg/modules/pr99468.H: Likewise.
* g++.dg/modules/pragma-1_a.H: Likewise.
* g++.dg/modules/tpl-ary-1.h: Likewise.
* g++.dg/modules/hdr-2.H: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/decl.cc|   1 +
 gcc/cp/module.cc  |  33 +
 gcc/testsuite/g++.dg/modules/hdr-2.H  | 164 ++
 gcc/testsuite/g++.dg/modules/macro-4_c.H  |   2 +-
 gcc/testsuite/g++.dg/modules/pr106761.h   |   2 +-
 gcc/testsuite/g++.dg/modules/pr98843_b.H  |   2 +-
 gcc/testsuite/g++.dg/modules/pr99468.H|   2 +-
 gcc/testsuite/g++.dg/modules/pragma-1_a.H |   2 +-
 gcc/testsuite/g++.dg/modules/tpl-ary-1.h  |   2 +-
 9 files changed, 204 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/hdr-2.H

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 2190ede745b..0203bbb682b 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -10818,6 +10818,7 @@ grokfndecl (tree ctype,
 have one: the restriction that you can't repeat a deduction guide
 makes them more like a definition anyway.  */
   DECL_INITIAL (decl) = void_node;
+  DECL_DECLARED_INLINE_P (decl) = true;
   break;
 default:
   break;
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index d7e6fe2c54f..143eb676ce9 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -19937,6 +19937,39 @@ check_module_decl_linkage (tree decl)
   if (!module_has_cmi_p ())
 return;
 
+  /* A header unit shall not contain a definition of a non-inline function
+ or variable (not template) whose name has external linkage.  */
+  if (header_module_p ()
+  && !processing_template_decl
+  && ((TREE_CODE (decl) == FUNCTION_DECL
+  && !DECL_DECLARED_INLINE_P (decl)
+  && DECL_INITIAL (decl))
+ || (TREE_CODE (decl) == VAR_DECL
+ && !DECL_INLINE_VAR_P (decl)
+ && DECL_INITIALIZED_P (decl)))
+  && !(DECL_LANG_SPECIFIC (decl)
+  && DECL_TEMPLATE_INSTANTIATION (decl))
+  && decl_linkage (decl) == lk_external)
+{
+  /* Strictly speaking,
+
+  struct S { static const int x = 123; };
+
+is not valid in a header unit as currently specified.  But this is
+done within the standard library, and there doesn't seem to be a
+C++98-compatible alternative, so we support this with a pedwarn.  */
+  if (VAR_P (decl)
+ && DECL_CLASS_SCOPE_P (decl)
+ && DECL_INITIALIZED_IN_CLASS_P (decl))
+   pedwarn (DECL_SOURCE_LOCATION (decl), OPT_Wpedantic,
+"external linkage definition of %qD in header module must "
+"be declared %<inline%>", decl);
+  else
+   error_at (DECL_SOURCE_LOCATION (decl),
+ "external linkage definition of %qD in header module must "
+ "be declared %<inline%>", decl);
+}
+
   /* An internal-linkage declaration cannot be generally be exported.
  But it's OK to export any declaration from a header unit, including
  internal linkage declarations.  */
diff --git a/gcc/testsuite/g++.dg/modules/hdr-2.H 
b/gcc/testsuite/g++.dg/modules/hdr-2.H
new file mode 100644
index 000..bb3d7d70123
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/hdr-2.H
@@ -0,0 +1,164 @@
+// { dg-additional-options "-fmodule-header -Wno-error=pedantic" }
+// {

[PATCH 08/10] c++/modules: Support anonymous namespaces in header units

2024-09-23 Thread Nathaniel Shead

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

A header unit may contain anonymous namespaces, and those declarations
are exported (as with any declaration in a header unit).  This patch
ensures that such declarations are correctly handled.

The change to 'make_namespace_finish' is required so that if an
anonymous namespace is first seen by an import it is correctly handled
within 'add_imported_namespace'.  I don't see any particular reason why
handling of anonymous namespaces here had to be handled separately
outside that function since these are the only two callers.
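
For readers unfamiliar with the "anon using-directive" mentioned above: an
unnamed namespace behaves as if it were followed by an implicit
using-directive in its enclosing scope, which is what makes its members
nameable there.  A minimal sketch at the language level (the names 'outer',
'hidden' and 'get_hidden' are made up for this example):

```cpp
#include <cassert>

namespace outer {
  namespace {
    int hidden = 7;  // member of the unnamed namespace
  }

  // 'hidden' is found here through the implicit using-directive that
  // accompanies the unnamed namespace.
  int get_hidden ()
  {
    return hidden;
  }
}
```

If the compiler did not register that directive for a namespace first seen
via an import, an unqualified lookup like the one in 'get_hidden' would fail.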

gcc/cp/ChangeLog:

* module.cc (depset::hash::add_binding_entity): Also walk
anonymous namespaces.
(module_state::write_namespaces): Adjust assertion.
* name-lookup.cc (push_namespace): Move anon using-directive
handling to...
(make_namespace_finish): ...here.

gcc/testsuite/ChangeLog:

* g++.dg/modules/internal-8_a.H: New test.
* g++.dg/modules/internal-8_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc|  7 +++--
 gcc/cp/name-lookup.cc   |  8 +++---
 gcc/testsuite/g++.dg/modules/internal-8_a.H | 28 
 gcc/testsuite/g++.dg/modules/internal-8_b.C | 29 +
 4 files changed, 63 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/internal-8_a.H
 create mode 100644 gcc/testsuite/g++.dg/modules/internal-8_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 7b1e69cb4c0..f114c2ec980 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -13717,15 +13717,15 @@ depset::hash::add_binding_entity (tree decl, 
WMB_Flags flags, void *data_)
   return (flags & WMB_Using
  ? flags & WMB_Export : DECL_MODULE_EXPORT_P (decl));
 }
-  else if (DECL_NAME (decl) && !data->met_namespace)
+  else if (!data->met_namespace)
 {
   /* Namespace, walk exactly once.  */
-  gcc_checking_assert (TREE_PUBLIC (decl));
   data->met_namespace = true;
   if (data->hash->add_namespace_entities (decl, data->partitions))
{
  /* It contains an exported thing, so it is exported.  */
  gcc_checking_assert (DECL_MODULE_PURVIEW_P (decl));
+ gcc_checking_assert (TREE_PUBLIC (decl) || header_module_p ());
  DECL_MODULE_EXPORT_P (decl) = true;
}
 
@@ -16120,8 +16120,7 @@ module_state::write_namespaces (elf_out *to, vec spaces,
   tree ns = b->get_entity ();
 
   gcc_checking_assert (TREE_CODE (ns) == NAMESPACE_DECL);
-  /* P1815 may have something to say about this.  */
-  gcc_checking_assert (TREE_PUBLIC (ns));
+  gcc_checking_assert (TREE_PUBLIC (ns) || header_module_p ());
 
   unsigned flags = 0;
   if (TREE_PUBLIC (ns))
diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index cbb2827808f..5f0b056f272 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -9094,6 +9094,9 @@ make_namespace_finish (tree ns, tree *slot, bool 
from_import = false)
 
   if (DECL_NAMESPACE_INLINE_P (ns) || !DECL_NAME (ns))
 emit_debug_info_using_namespace (ctx, ns, true);
+
+  if (!DECL_NAMESPACE_INLINE_P (ns) && !DECL_NAME (ns))
+add_using_namespace (NAMESPACE_LEVEL (ctx)->using_directives, ns);
 }
 
 /* Push into the scope of the NAME namespace.  If NAME is NULL_TREE,
@@ -9230,11 +9233,6 @@ push_namespace (tree name, bool make_inline)
  gcc_checking_assert (slot);
}
  make_namespace_finish (ns, slot);
-
- /* Add the anon using-directive here, we don't do it in
-make_namespace_finish.  */
- if (!DECL_NAMESPACE_INLINE_P (ns) && !name)
-   add_using_namespace (current_binding_level->using_directives, ns);
}
 }
 
diff --git a/gcc/testsuite/g++.dg/modules/internal-8_a.H 
b/gcc/testsuite/g++.dg/modules/internal-8_a.H
new file mode 100644
index 000..57fe60bb3c0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/internal-8_a.H
@@ -0,0 +1,28 @@
+// { dg-additional-options "-fmodule-header" }
+// { dg-module-cmi {} }
+
+static int x = 123;
+static void f() {}
+template  static void t() {}
+
+namespace {
+  int y = 456;
+  void g() {};
+  template  void u() {}
+
+  namespace ns { int in_ns = 456; }
+
+  struct A {};
+  template  struct B {};
+
+  enum E { X };
+  enum class F { Y };
+
+  template  using U = int;
+
+#if __cplusplus >= 202002L
+  template  concept C = true;
+#endif
+}
+
+namespace ns2 = ns;
diff --git a/gcc/testsuite/g++.dg/modules/internal-8_b.C 
b/gcc/testsuite/g++.dg/modules/internal-8_b.C
new file mode 100644
index 000..a2d74a87473
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/internal-8_b.C
@@ -0,0 +1,29 @@
+// { dg-additional-options "-fmodules-ts" }
+
+import "internal-8_a.H";
+
+int main() {
+  auto x2 = x;
+  f();
+  t();
+
+  auto y2 = y;
+  g();
+  u();
+
+  int val1 = ns::in_ns;
+
+  A a;
+  B b;
+
+  E e = X;
+  F f = F::Y

Re: [PATCH] libobjc: Fix typos

2024-09-23 Thread Andrew Kreimer

On Mon, Sep 23, 2024 at 01:59:04PM -0700, Andrew Pinski wrote:
> On Mon, Sep 23, 2024 at 12:57 PM Andrew Kreimer  wrote:
> >
> > On Mon, Sep 23, 2024 at 12:47:28PM -0700, Andrew Pinski wrote:
> > > On Fri, Sep 20, 2024 at 1:41 AM Andrew Kreimer  wrote:
> > > >
> > > > Fix typos in comments.
> > >
> > > OK. Do you have push access or need someone to push it for you?
> > >
> >
> > Hi, I need someone to push it for me.
> 
> Ok, pushed as r15-3813-g0121b852c85db9  except for the configure part
> since the top-level libtool.m4 still contains the typo and we should
> only be regenerating configure.
> 

Thank you.

> 
> Also next time please also add a changelog to commit message as
> mentioned on https://gcc.gnu.org/contribute.html .
> 

Noted.

RE: [PATCH]middle-end: check explicitly for external or constants when checking for loop invariant [PR116817]

2024-09-23 Thread Tamar Christina

I had made the condition too strict before, here's an updated patch:

Hi All,

The previous check for whether a value was external was testing
!vect_get_internal_def (vinfo, var), but this of course isn't completely right
as they could be reductions etc.

This changes the check to just explicitly look at externals and constants.
Note that reductions remain unhandled here, but we don't support codegen of
boolean reductions today anyway.

So when we do support them, this would have to be handled in lowering as well.
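
The difference between the two checks can be sketched with a toy model.  This
is only an illustration based on the description above, not GCC code: the
enum 'DefKind' and both functions are hypothetical stand-ins, and
'has_def_info' models whether looking up the definition found anything.

```cpp
#include <cassert>

// Hypothetical stand-in for the vectorizer's definition kinds.
enum class DefKind { internal, external, constant, reduction };

// Old test: "not an internal def".  This is also true for reductions.
bool old_is_invariant (bool has_def_info, DefKind kind)
{
  return !(has_def_info && kind == DefKind::internal);
}

// New test: no def info at all, or explicitly external or constant.
bool new_is_invariant (bool has_def_info, DefKind kind)
{
  return !has_def_info
	 || kind == DefKind::external
	 || kind == DefKind::constant;
}
```

In this model a reduction passes the old test but not the new one, which is
the case the updated condition is meant to exclude.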

Bootstrapped and regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
x86_64-pc-linux-gnu -m32, -m64 and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/116817
* tree-vect-patterns.cc (vect_recog_bool_pattern): Check for const or
externals.

gcc/testsuite/ChangeLog:

PR tree-optimization/116817
* g++.dg/vect/pr116817.cc: New test.

-- inline copy of patch --

diff --git a/gcc/testsuite/g++.dg/vect/pr116817.cc 
b/gcc/testsuite/g++.dg/vect/pr116817.cc
new file mode 100644
index 
..7e28982fb138c24f956aedb03fa454d9d858
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr116817.cc
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+
+int main_ulData0;
+unsigned *main_pSrcBuffer;
+int main(void) {
+  int iSrc = 0;
+  bool bData0;
+  for (; iSrc < 4; iSrc++) {
+if (bData0)
+  main_pSrcBuffer[iSrc] = main_ulData0;
+else
+  main_pSrcBuffer[iSrc] = 0;
+bData0 = !bData0;
+  }
+}
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 
e7e877dd2adb55262822f1660f8d92b42d44e6d0..f0298b2ab97a1e7dd0d943340e1389c3c0fa796e
 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -6062,12 +6062,15 @@ vect_recog_bool_pattern (vec_info *vinfo,
   if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
return NULL;
 
+  stmt_vec_info var_def_info = vinfo->lookup_def (var);
   if (check_bool_pattern (var, vinfo, bool_stmts))
var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
   else if (integer_type_for_mask (var, vinfo))
return NULL;
   else if (TREE_CODE (TREE_TYPE (var)) == BOOLEAN_TYPE
-  && !vect_get_internal_def (vinfo, var))
+  && (!var_def_info
+  || STMT_VINFO_DEF_TYPE (var_def_info) == vect_external_def
+  || STMT_VINFO_DEF_TYPE (var_def_info) == vect_constant_def))
{
  /* If the condition is already a boolean then manually convert it to a
 mask of the given integer type but don't set a vectype.  */


rb18806.patch
Description: rb18806.patch

Re: [PATCH] c++: Implement C++23 P2718R0 - Wording for P2644R1 Fix for Range-based for Loop [PR107637]

2024-09-23 Thread Jason Merrill


On 9/23/24 9:24 PM, Jakub Jelinek wrote:

On Mon, Sep 23, 2024 at 11:32:59AM -0400, Jason Merrill wrote:

On 8/9/24 9:06 PM, Jakub Jelinek wrote:

Hi!

The following patch implements the C++23 P2718R0 paper
- Wording for P2644R1 Fix for Range-based for Loop.
As all the temporaries from __for_range initialization should have life
extended until the end of __for_range scope, this patch disables (for C++23
and later only and if !processing_template_decl) CLEANUP_POINT_EXPR wrapping
of the __for_range declaration, also disables -Wdangling-reference warning
as well as the rest of extend_ref_init_temps (we know the __for_range temporary
is not TREE_STATIC and as all the temporaries from the initializer will be life
extended, we shouldn't try to handle temporaries referenced by references any
differently) and adds an extra push_stmt_list/pop_stmt_list before
cp_finish_decl of __for_range and after end of the for body and wraps all
that into CLEANUP_POINT_EXPR.
I had to repeat that also for OpenMP range loops because those are handled
differently.
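
A minimal sketch of the behaviour change, for illustration only (the names
'Probe', 'values' and 'temporary_alive_in_body' are made up and are not from
the patch).  The loop iterates over storage that outlives the temporary, so
the code is well-defined under both the old and the new rules; only the point
at which the temporary is destroyed differs.

```cpp
#include <cassert>
#include <vector>

// Storage that outlives the temporary, so iterating it is always safe.
static const std::vector<int> values{1, 2, 3};
static bool temp_destroyed = false;

struct Probe
{
  ~Probe () { temp_destroyed = true; }
  const std::vector<int> &items () const { return values; }
};

// Reports whether the Probe temporary created in the range initializer
// was still alive while the loop body ran.
bool temporary_alive_in_body ()
{
  temp_destroyed = false;
  bool alive = true;
  for (int v : Probe{}.items ())
    {
      (void) v;
      if (temp_destroyed)
	alive = false;
    }
  return alive;
}
```

Before P2718R0 the temporary is destroyed at the end of the range
initializer, so the function is expected to return false; with the lifetime
extension described above it is expected to return true.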


Let's add a flag for this, not just control it with cxx_dialect.  We might
want to consider enabling it by default in earlier modes when not being
strictly conforming?


-frange-based-for-ext-temps
or do you have better suggestion?


I'd probably drop "based", "range-for" seems enough.


Shall we allow also disabling it in C++23 or later modes, or override
user choice unconditionally for C++23+ and only allow users to
enable/disable it in C++11-C++20?


Hmm, I think the latter.


What about the __cpp_range_based_for predefined macro?
Shall it be defined to the C++23 202211L value if the switch is on?
While that could be done in theory for C++17 and later code, for C++11/14
__cpp_range_based_for is 200907L and doesn't include the C++17
201603L step.  Or keep the macro only for C++23 and later?


I think update the macro for 17 and later.


@@ -44600,11 +44609,14 @@ cp_convert_omp_range_for (tree &this_pre
 else
{
  range_temp = build_range_temp (init);
+ tree name = DECL_NAME (range_temp);
  DECL_NAME (range_temp) = NULL_TREE;
  pushdecl (range_temp);
+ DECL_NAME (range_temp) = name;
  cp_finish_decl (range_temp, init,
  /*is_constant_init*/false, NULL_TREE,
  LOOKUP_ONLYCONVERTING);
+ DECL_NAME (range_temp) = NULL_TREE;


This messing with the name needs a rationale.  What wants it to be null?


I'll add comments.  The first = NULL_TREE; is needed so that pushdecl
doesn't register the temporary for name lookup, the = name now is so that
cp_finish_decl recognizes the temporary as range based for temporary
for the lifetime extension, and the last one is just to preserve previous
behavior, not have it visible in debug info etc.


But cp_convert_range_for doesn't ever set the name to NULL_TREE, why 
should the OMP variant be different?


Having it visible to name lookup in the debugger seems beneficial. 
Having it visible to the code seems less useful, but not important to 
prevent.


Jason

Re: [PATCH] libobjc: Fix typos

2024-09-23 Thread Andrew Pinski

On Fri, Sep 20, 2024 at 1:41 AM Andrew Kreimer  wrote:
>
> Fix typos in comments.

OK. Do you have push access or need someone to push it for you?

Thanks,
Andrew

>
> Signed-off-by: Andrew Kreimer 
> ---
>  libobjc/Makefile.in  | 2 +-
>  libobjc/configure| 2 +-
>  libobjc/encoding.c   | 4 ++--
>  libobjc/exception.c  | 4 ++--
>  libobjc/hash.c   | 2 +-
>  libobjc/init.c   | 2 +-
>  libobjc/objc-private/objc-list.h | 2 +-
>  libobjc/sendmsg.c| 2 +-
>  libobjc/thr.c| 2 +-
>  9 files changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/libobjc/Makefile.in b/libobjc/Makefile.in
> index 58d0638f72e..3d856eb8d5f 100644
> --- a/libobjc/Makefile.in
> +++ b/libobjc/Makefile.in
> @@ -59,7 +59,7 @@ MULTIDO = true
>  MULTICLEAN = true
>
>  # Not configured per top-level version, since that doesn't get passed
> -# down at configure time, but overrridden by the top-level install
> +# down at configure time, but overridden by the top-level install
>  # target.
>  INSTALL = @INSTALL@
>  INSTALL_PROGRAM = @INSTALL_PROGRAM@
> diff --git a/libobjc/configure b/libobjc/configure
> index 68172549137..5f4c928d2bf 100755
> --- a/libobjc/configure
> +++ b/libobjc/configure
> @@ -13935,7 +13935,7 @@ func_basename ()
>  # to NONDIR_REPLACEMENT.
>  # value returned in "$func_dirname_result"
>  #   basename: Compute filename of FILE.
> -# value retuned in "$func_basename_result"
> +# value returned in "$func_basename_result"
>  # Implementation must be kept synchronized with func_dirname
>  # and func_basename. For efficiency, we do not delegate to
>  # those functions but instead duplicate the functionality here.
> diff --git a/libobjc/encoding.c b/libobjc/encoding.c
> index 7a2d2abe6d1..f4fc4f452e4 100644
> --- a/libobjc/encoding.c
> +++ b/libobjc/encoding.c
> @@ -151,7 +151,7 @@ static int __attribute__ ((__unused__)) not_target_flags 
> = 0;
>  #  undef TARGET_ALIGN_NATURAL
>  #  define TARGET_ALIGN_NATURAL 1
>  # endif
> -/* On Darwin32, we need to recurse until we find the starting stuct type.  */
> +/* On Darwin32, we need to recurse until we find the starting struct type.  
> */
>  static int
>  _darwin_rs6000_special_round_type_align (const char *struc, int comp, int 
> spec)
>  {
> @@ -186,7 +186,7 @@ _darwin_rs6000_special_round_type_align (const char 
> *struc, int comp, int spec)
>
>  /*  FIXME: while this file has no business including tm.h, this
>  definitely has no business defining this macro but it
> -is only way around without really rewritting this file,
> +is only way around without really rewriting this file,
>  should look after the branch of 3.4 to fix this.   */
>  #define rs6000_special_round_type_align(STRUCT, COMPUTED, SPECIFIED)   \
>({ const char *_fields = TYPE_FIELDS (STRUCT);   \
> diff --git a/libobjc/exception.c b/libobjc/exception.c
> index f051c5f9524..dce576e8559 100644
> --- a/libobjc/exception.c
> +++ b/libobjc/exception.c
> @@ -42,7 +42,7 @@ is_kind_of_exception_matcher (Class catch_class, id 
> exception)
>  return 1;
>
>/* If exception is nil (eg, @throw nil;), then it can only be
> - catched by a catch-all (eg, @catch (id object)).  */
> + caught by a catch-all (eg, @catch (id object)).  */
>if (exception != nil)
>  {
>Class c;
> @@ -384,7 +384,7 @@ PERSONALITY_FUNCTION (int version,
>  #endif /* __USING_SJLJ_EXCEPTIONS__  */
>
>/* If ip is not present in the table, C++ would call terminate.  */
> -  /* ??? As with Java, it's perhaps better to tweek the LSDA to that
> +  /* ??? As with Java, it's perhaps better to tweak the LSDA to that
>   no-action is mapped to no-entry.  */
>CONTINUE_UNWINDING;
>
> diff --git a/libobjc/hash.c b/libobjc/hash.c
> index e216c8cdf3b..e0ecf30e024 100644
> --- a/libobjc/hash.c
> +++ b/libobjc/hash.c
> @@ -222,7 +222,7 @@ objc_hash_next (cache_ptr cache, node_ptr node)
>if (node->next)
> {
>   /* There is a node which follows the last node returned.
> -Step to that node and retun it.  */
> +Step to that node and return it.  */
>   return node->next;
> }
>else
> diff --git a/libobjc/init.c b/libobjc/init.c
> index 6216546084b..9f8bafb8ee3 100644
> --- a/libobjc/init.c
> +++ b/libobjc/init.c
> @@ -851,7 +851,7 @@ __objc_create_classes_tree (struct objc_module *module)
>
>/* Now iterate over "claimed" categories too (ie, categories that
>   extend a class that has already been loaded by the runtime), and
> - insert them in the classes tree hiearchy too.  Otherwise, if you
> + insert them in the classes tree hierarchy too.  Otherwise, if you
>   add a category, its +load method would not be called if the class
>   is already loaded in the runtime.  It the category is
>   "unclaimed", ie, we haven't loaded the mai

Re: [PATCH v2 2/2] c++: simplify handling implicit INDIRECT_REF and co_await in convert_to_void

2024-09-23 Thread Jason Merrill


On 9/21/24 1:54 PM, Arsen Arsenović wrote:

convert_to_void has, so far, when converting a co_await expression to
void altered the await_resume expression of a co_await so that it is
also converted to void.  This meant that the type of the await_resume
expression, which is also supposed to be the type of the whole co_await
expression, was not the same as the type of the CO_AWAIT_EXPR tree.

While this has not caused problems so far, it is unexpected, I think.

Also, convert_to_void had a special case when an INDIRECT_REF wrapped a
CALL_EXPR.  In this case, we also diagnosed maybe_warn_nodiscard.  This
was a duplication of logic related to converting call expressions to
void.

Instead, we can generalize a bit, and rather discard the expression that
was implicitly dereferenced instead.

This patch changes the diagnostic of:

   void f(struct S* x) { static_cast(*x); }

... from:

   warning: indirection will not access object of incomplete type
'volatile S' in statement

... to:

   warning: implicit dereference will not access object of type
‘volatile S’ in statement

... but should have no impact in other cases.


OK.


gcc/cp/ChangeLog:

* coroutines.cc (co_await_get_resume_call): Return a tree
directly, rather than a tree pointer.
* cp-tree.h (co_await_get_resume_call): Adjust signature
accordingly.
* cvt.cc (convert_to_void): Do not alter CO_AWAIT_EXPRs when
discarding them.  Simplify handling implicit INDIRECT_REFs.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/nodiscard-1.C: New test.
---
  gcc/cp/coroutines.cc  |   4 +-
  gcc/cp/cp-tree.h  |   2 +-
  gcc/cp/cvt.cc | 112 +-
  gcc/testsuite/g++.dg/coroutines/nodiscard-1.C |  77 
  4 files changed, 137 insertions(+), 58 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/coroutines/nodiscard-1.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index f9129b5f988b..33878b7a876b 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -867,14 +867,14 @@ coro_get_destroy_function (tree decl)
  
  /* Given a CO_AWAIT_EXPR AWAIT_EXPR, return its resume call.  */
  
-tree*

+tree
  co_await_get_resume_call (tree await_expr)
  {
gcc_checking_assert (TREE_CODE (await_expr) == CO_AWAIT_EXPR);
tree vec = TREE_OPERAND (await_expr, 3);
if (!vec)
  return nullptr;
-  return &TREE_VEC_ELT (vec, 2);
+  return TREE_VEC_ELT (vec, 2);
  }
  
  
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h

index 87e3da49ea97..2aad1e497338 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -8780,7 +8780,7 @@ extern tree coro_get_actor_function   (tree);
  extern tree coro_get_destroy_function (tree);
  extern tree coro_get_ramp_function(tree);
  
-extern tree* co_await_get_resume_call		(tree await_expr);

+extern tree co_await_get_resume_call   (tree await_expr);
  
  
  /* contracts.cc */

diff --git a/gcc/cp/cvt.cc b/gcc/cp/cvt.cc
index df02b8faaf51..526937d36181 100644
--- a/gcc/cp/cvt.cc
+++ b/gcc/cp/cvt.cc
@@ -1272,54 +1272,15 @@ convert_to_void (tree expr, impl_conv_void implicit, 
tsubst_flags_t complain)
  complete_type (type);
int is_complete = COMPLETE_TYPE_P (type);
  
-	/* Can't load the value if we don't know the type.  */

-   if (is_volatile && !is_complete)
-  {
-if (complain & tf_warning)
- switch (implicit)
-   {
- case ICV_CAST:
-   warning_at (loc, 0, "conversion to void will not access "
-   "object of incomplete type %qT", type);
-   break;
- case ICV_SECOND_OF_COND:
-   warning_at (loc, 0, "indirection will not access object of "
-   "incomplete type %qT in second operand "
-   "of conditional expression", type);
-   break;
- case ICV_THIRD_OF_COND:
-   warning_at (loc, 0, "indirection will not access object of "
-   "incomplete type %qT in third operand "
-   "of conditional expression", type);
-   break;
- case ICV_RIGHT_OF_COMMA:
-   warning_at (loc, 0, "indirection will not access object of "
-   "incomplete type %qT in right operand of "
-   "comma operator", type);
-   break;
- case ICV_LEFT_OF_COMMA:
-   warning_at (loc, 0, "indirection will not access object of "
-   "incomplete type %qT in left operand of "
-   "comma operator", type);
-   break;
- case ICV_STATEMENT:
-   warning_at (loc, 0, "indirection will

Re: [PATCH] c++: diagnose this specifier in requires expr [PR116798]

2024-09-23 Thread Jason Merrill


On 9/23/24 9:08 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
We don't detect an explicit object parameter in a requires expression.
We can get there by way of requires-expression -> requirement-parameter-list
-> parameter-declaration-clause -> ... -> parameter-declaration with
this[opt].  But [dcl.fct]/5 doesn't allow an explicit object parameter
in this context.  So let's fix it like r14-9033 and not like r14-8832.

PR c++/116798

gcc/cp/ChangeLog:

* parser.cc (cp_parser_parameter_declaration): Detect an explicit
object parameter in a requires expression.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/explicit-obj-diagnostics12.C: New test.
---
  gcc/cp/parser.cc  | 11 ---
  .../g++.dg/cpp23/explicit-obj-diagnostics12.C | 10 ++
  2 files changed, 18 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics12.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 4dd9474cf60..7b54586fce6 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -25982,10 +25982,15 @@ cp_parser_parameter_declaration (cp_parser *parser,
  
bool xobj_param_p

  = decl_spec_seq_has_spec_p (&decl_specifiers, ds_this);
-  if (xobj_param_p && template_parm_p)
+  if (xobj_param_p
+  && (template_parm_p || current_binding_level->requires_expression))
  {
-  error_at (decl_specifiers.locations[ds_this],
-   "%<this%> specifier in template parameter declaration");
+  if (template_parm_p)
+   error_at (decl_specifiers.locations[ds_this],
+ "%<this%> specifier in template parameter declaration");
+  else
+   error_at (decl_specifiers.locations[ds_this],
+ "%<this%> specifier in a requires expression");


Let's include the word "parameter" in this diagnostic, either 
"requirement parameter" or "requires-expression parameter".


OK with that change.

Jason

Re: [Fortran, Patch, PR84870, v1] Fix ICE and allocated memory not assigned correctly.

2024-09-23 Thread Harald Anlauf


Hi Andre,

Am 19.09.24 um 16:01 schrieb Andre Vehreschild:

Hi all,

in PR84870 an ICE was reported that has since been fixed by another patch.
Nevertheless, a testcase revealed that the memory handling was still not
correct: the test case in the patch returned 2 for both x.b.a and y.b.a,
which is wrong.

For a coarray all memory is allocated using an array descriptor. For scalars
just a temporary descriptor is created and handed to the caf-register routine.
The error here was that the memory handed back in the temporary descriptor
was not used for the component, so the pointer in the component was not
updated. The patch fixes this.

Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?


this looks good to me.

Thanks for the patch!

Harald


Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de

Re: [PATCH] libobjc: Fix typos

2024-09-23 Thread Andrew Kreimer

On Mon, Sep 23, 2024 at 12:47:28PM -0700, Andrew Pinski wrote:
> On Fri, Sep 20, 2024 at 1:41 AM Andrew Kreimer  wrote:
> >
> > Fix typos in comments.
> 
> OK. Do you have push access or need someone to push it for you?
> 

Hi, I need someone to push it for me.

[PATCH] c++: diagnose this specifier in requires expr [PR116798]

2024-09-23 Thread Marek Polacek

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
We don't detect an explicit object parameter in a requires expression.
We can get there by way of requires-expression -> requirement-parameter-list
-> parameter-declaration-clause -> ... -> parameter-declaration with
this[opt].  But [dcl.fct]/5 doesn't allow an explicit object parameter
in this context.  So let's fix it like r14-9033 and not like r14-8832.

PR c++/116798

gcc/cp/ChangeLog:

* parser.cc (cp_parser_parameter_declaration): Detect an explicit
object parameter in a requires expression.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/explicit-obj-diagnostics12.C: New test.
---
 gcc/cp/parser.cc  | 11 ---
 .../g++.dg/cpp23/explicit-obj-diagnostics12.C | 10 ++
 2 files changed, 18 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics12.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 4dd9474cf60..7b54586fce6 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -25982,10 +25982,15 @@ cp_parser_parameter_declaration (cp_parser *parser,
 
   bool xobj_param_p
 = decl_spec_seq_has_spec_p (&decl_specifiers, ds_this);
-  if (xobj_param_p && template_parm_p)
+  if (xobj_param_p
+  && (template_parm_p || current_binding_level->requires_expression))
 {
-  error_at (decl_specifiers.locations[ds_this],
-   "%<this%> specifier in template parameter declaration");
+  if (template_parm_p)
+   error_at (decl_specifiers.locations[ds_this],
+ "%<this%> specifier in template parameter declaration");
+  else
+   error_at (decl_specifiers.locations[ds_this],
+ "%<this%> specifier in a requires expression");
   xobj_param_p = false;
   decl_specifiers.locations[ds_this] = 0;
 }
diff --git a/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics12.C b/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics12.C
new file mode 100644
index 000..84db2166cc2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics12.C
@@ -0,0 +1,10 @@
+// PR c++/116798
+// { dg-do compile { target c++23 } }
+
+template
+concept C = requires(this T u,   // { dg-error "'this' specifier in a requires expression" }
+this T v) {  // { dg-error "'this' specifier in a requires expression" }
+u + v;
+};
+
+static_assert(C);

base-commit: c1fb78fb03caede01b02a1ebb3275ac98343d468
-- 
2.46.1

Re: [PATCH v3 1/4] tree-optimization/116024 - simplify C1-X cmp C2 for UB-on-overflow types

2024-09-23 Thread Jeff Law





On 9/23/24 2:32 AM, Artemiy Volkov wrote:

Implement a match.pd pattern for C1 - X cmp C2, where C1 and C2 are
integer constants and X is of a UB-on-overflow type.  The pattern is
simplified to X rcmp C1 - C2 by moving X and C2 to the other side of the
comparison (with opposite signs).  If C1 - C2 happens to overflow,
replace the whole expression with either a constant 0 or a constant 1
node, depending on the comparison operator and the sign of the overflow.

This transformation occasionally saves load-immediate /
subtraction instructions, e.g. the following statement:

10 - (int) x <= 9;

now compiles to

sgt a0,a0,zero

instead of

li  a5,10
sub a0,a5,a0
slti a0,a0,10

on 32-bit RISC-V.

Additional examples can be found in the newly added test file. This
patch has been bootstrapped and regtested on aarch64, x86_64, and
i386, and additionally regtested on riscv32.  Existing tests were
adjusted where necessary.
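
As a minimal sketch of the equivalence described above (not taken from the patch or its test file), the following C++ snippet compares the original form 10 - x <= 9 with the simplified form x > 0 over a small range where 10 - x cannot overflow. The function names are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Original form: C1 - X cmp C2 with C1 = 10, C2 = 9, cmp = "<=".
bool
original_form (int32_t x)
{
  return 10 - x <= 9;
}

// Simplified form: X rcmp C1 - C2, i.e. x >= 1, which is x > 0.
bool
simplified_form (int32_t x)
{
  return x > 0;
}

// Hypothetical checker: true if both forms agree for every x in [lo, hi].
// The caller must keep the range small enough that 10 - x cannot overflow.
bool
forms_agree (int32_t lo, int32_t hi)
{
  for (int32_t x = lo; x <= hi; ++x)
    if (original_form (x) != simplified_form (x))
      return false;
  return true;
}
```

Calling forms_agree (-1000, 1000) returns true; the range is deliberately small so that the subtraction stays well inside int32_t and no UB-on-overflow case is exercised.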

gcc/ChangeLog:

PR tree-optimization/116024
 * match.pd: New transformation around integer comparison.

gcc/testsuite/ChangeLog:

 * gcc.dg/tree-ssa/pr116024.c: New test.
 * gcc.dg/pr67089-6.c: Adjust.
I think Richi is already engaged on the review side, so I'll let him own 
it, especially since he knows more about match.pd patterns than I do.




+int32_t i1(void)
+{
+  int32_t l = 2;
+  l = 10 - (int32_t)f();
+  return l <= 9; // f() > 0
+}
Why the initialization of l = 2?  It's trivially dead and I expect it to 
be cleaned up early in the optimization pipeline.  It looks like most of 
the tests in the series have this trivially dead initialization code.


Jeff

Re: [PATCH] Fortran: Added support for locality specs in DO CONCURRENT (Fortran 2018/23)

2024-09-23 Thread Tobias Burnus


Hi all,

I have now downloaded the file at 
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663534.html (by 
copying it from the browser, not the source code, to avoid HTML entities such as '&gt;').


In this file I had to fix spurious line breaks like:

 @@ -5171,7 +5171,7 @@ index_interchange (gfc_code **c, int
*walk_subtrees ATTRIBUTE_UNUSED,

where the *... belongs to the previous line.

The result of this conversion is the attached file.

* * *

Harald Anlauf wrote:

Generally speaking, runtime tests should verify that they work as
expected.


There are currently only compile-time tests.

[One might argue that some should be run-time tests, albeit the really 
interesting part only happens with local/local_init (currently not 
supported) – and with true concurrency in particular with 'reduce'.]


[For the interesting cases of 'local'/'local_init' there is currently a 
'sorry', while 'reduce' only becomes truly interesting if one goes 
parallel …]


Tobias
gcc/fortran/ChangeLog:

	* dump-parse-tree.cc (show_code_node): Updated to use
	c->ext.concur.forall_iterator instead of c->ext.forall_iterator.
	Added support for dumping DO CONCURRENT locality specifiers.
	* frontend-passes.cc (index_interchange, gfc_code_walker): Updated to
	use c->ext.concur.forall_iterator instead of c->ext.forall_iterator.
	* gfortran.h (enum locality_type): Added new enum for locality types
	in DO CONCURRENT constructs.
	* match.cc (match_simple_forall, gfc_match_forall): Updated to use
	new_st.ext.concur.forall_iterator instead of new_st.ext.forall_iterator.
	(gfc_match_do): Implemented support for matching DO CONCURRENT locality
	specifiers (LOCAL, LOCAL_INIT, SHARED, DEFAULT(NONE), and REDUCE).
	* parse.cc (parse_do_block): Updated to use
	new_st.ext.concur.forall_iterator instead of new_st.ext.forall_iterator.
	* resolve.cc: Added struct check_default_none_data.
	(do_concur_locality_specs_f2023): New function to check compliance
	with F2023's C1133 constraint for DO CONCURRENT.
	(check_default_none_expr): New function to check DEFAULT(NONE)
	compliance.
	(resolve_locality_spec): New function to resolve locality specs.
	(gfc_count_forall_iterators): Updated to use
	code->ext.concur.forall_iterator.
	(gfc_resolve_forall): Updated to use code->ext.concur.forall_iterator.
	* st.cc (gfc_free_statement): Updated to free locality specifications
	and use p->ext.concur.forall_iterator.
	* trans-stmt.cc (gfc_trans_forall_1): Updated to use
	code->ext.concur.forall_iterator.

gcc/testsuite/ChangeLog:

	* gfortran.dg/do_concurrent_10.f90: New test for parsing DO CONCURRENT
	with 'concurrent' as a variable name.
	* gfortran.dg/do_concurrent_8_f2018.f90: New test for F2018 DO
	CONCURRENT with nested loops and REDUCE clauses.
	* gfortran.dg/do_concurrent_8_f2023.f90: New test for F2023 DO
	CONCURRENT with nested loops and REDUCE clauses.
	* gfortran.dg/do_concurrent_9.f90: New test for DO CONCURRENT with
	DEFAULT(NONE) and locality specs.
	* gfortran.dg/do_concurrent_all_clauses.f90: New test covering all DO
	CONCURRENT clauses and their interactions.
	* gfortran.dg/do_concurrent_basic.f90: New basic test for DO CONCURRENT
	functionality.
	* gfortran.dg/do_concurrent_constraints.f90: New test for constraints
	on DO CONCURRENT locality specs.
	* gfortran.dg/do_concurrent_local_init.f90: New test for LOCAL_INIT
	clause in DO CONCURRENT.
	* gfortran.dg/do_concurrent_locality_specs.f90: New test for DO
	CONCURRENT with locality specs.
	* gfortran.dg/do_concurrent_multiple_reduce.f90: New test for multiple
	REDUCE clauses in DO CONCURRENT.
	* gfortran.dg/do_concurrent_nested.f90: New test for nested DO
	CONCURRENT loops.
	* gfortran.dg/do_concurrent_parser.f90: New test for DO CONCURRENT
	parser error handling.
	* gfortran.dg/do_concurrent_reduce_max.f90: New test for REDUCE with
	MAX operation in DO CONCURRENT.
	* gfortran.dg/do_concurrent_reduce_sum.f90: New test for REDUCE with
	sum operation in DO CONCURRENT.
	* gfortran.dg/do_concurrent_shared.f90: New test for SHARED clause in
	DO CONCURRENT.

Signed-off-by: Anuj 
---
 gcc/fortran/dump-parse-tree.cc| 113 +-
 gcc/fortran/frontend-passes.cc|   8 +-
 gcc/fortran/gfortran.h|  20 +-
 gcc/fortran/match.cc  | 286 +-
 gcc/fortran/parse.cc  |   2 +-
 gcc/fortran/resolve.cc| 354 +-
 gcc/fortran/st.cc |   5 +-
 gcc/fortran/trans-stmt.cc |   6 +-
 .../gfortran.dg/do_concurrent_10.f90  |  11 +
 .../gfortran.dg/do_concurrent_8_f2018.f90 |  19 +
 .../gfortran.dg/do_concurrent_8_f2023.f90 |  23 ++
 gcc/testsuite/gfortran.dg/do_concurrent_9.f90 |  15 +
 .../gfortran.dg/do_concurrent_all_clauses.f90 |  26 ++
 .../gfortran.dg/do_concurrent_basic.f90   |  11 +
 .../gfortran.dg/do_concurrent_constraints.f90 | 126 +++
 .../gfortran.dg/do_concurrent_local_init.f90  |  11 +
 .../do_concurrent_locality_spec

Re: [PATCH] libobjc: Fix typos

2024-09-23 Thread Andrew Pinski

On Mon, Sep 23, 2024 at 12:57 PM Andrew Kreimer  wrote:
>
> On Mon, Sep 23, 2024 at 12:47:28PM -0700, Andrew Pinski wrote:
> > On Fri, Sep 20, 2024 at 1:41 AM Andrew Kreimer  wrote:
> > >
> > > Fix typos in comments.
> >
> > OK. Do you have push access or need someone to push it for you?
> >
>
> Hi, I need someone to push it for me.

Ok, pushed as r15-3813-g0121b852c85db9  except for the configure part
since the top-level libtool.m4 still contains the typo and we should
only be regenerating configure.

ChangeLog entry used:
libobjc/ChangeLog:

* Makefile.in: s/overrridden/overridden.
* encoding.c (_darwin_rs6000_special_round_type_align): Fix typo
in comment.
(rs6000_special_round_type_align): Likewise.
* exception.c (is_kind_of_exception_matcher): Likewise.
(PERSONALITY_FUNCTION): Likewise.
* hash.c (objc_hash_next): Likewise.
* init.c (__objc_create_classes_tree): Likewise.
* objc-private/objc-list.h (list_remove_head): Likewise.
* sendmsg.c (__objc_install_dtable_for_class): Likewise.
* thr.c (objc_thread_yield): Likewise.

Also, next time please add a ChangeLog to the commit message as
mentioned on https://gcc.gnu.org/contribute.html .

Thanks,
Andrew Pinski

Re: [PATCH] Fortran: Added support for locality specs in DO CONCURRENT (Fortran 2018/23)

2024-09-23 Thread Harald Anlauf


Hi Anuj,

thanks for your work!

I am unable to apply the patch, so I only looked at the testcases.

Generally speaking, runtime tests should verify that they work as
expected.  Just printing a result does not.  Use a comparison
against an expected result and do e.g. STOP 123 on failure.

Also, never use -std=gnu in the options; -std=gnu is the default,
and its behavior may change any time.  If you want to test something
that is enabled at F2023, please use -std=f2023.  Also, -std=gnu is
meant to enable a GNU extension, but DO CONCURRENT is not an extension
but defined in the Fortran standard.

For details on my comments see below.

Thanks,
Harald

Am 22.09.24 um 08:19 schrieb Anuj Mohite:


diff --git a/gcc/testsuite/gfortran.dg/do_concurrent_10.f90
b/gcc/testsuite/gfortran.dg/do_concurrent_10.f90
new file mode 100644
index 000..6bbeb3bc990
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/do_concurrent_10.f90
@@ -0,0 +1,11 @@
+! { dg-do compile }
+! { dg-options "-std=f2018" }
+
+program do_concurrent_parsing
+  implicit none
+  integer :: concurrent, do
+  do concurrent = 1, 5
+  end do
+  do concurrent = 1, 5

^^^ should this be 'do' instead of 'concurrent'?


+  end do
+end program do_concurrent_parsing



diff --git a/gcc/testsuite/gfortran.dg/do_concurrent_8_f2023.f90
b/gcc/testsuite/gfortran.dg/do_concurrent_8_f2023.f90
new file mode 100644
index 000..a99d81e4a5c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/do_concurrent_8_f2023.f90
@@ -0,0 +1,23 @@
+! { dg-do compile }
+! { dg-options "-std=gnu" }

   ^^^ here you want -std=f2023


+program do_concurrent_complex
+  implicit none
+  integer :: i, j, k, sum, product
+  integer, dimension(10,10,10) :: array
+  sum = 0
+  product = 1
+  do concurrent (i = 1:10) local(j) shared(sum) reduce(+:sum)
+! { dg-error "Variable .sum. at .1. has already been specified in
a locality-spec" "" { target *-*-* } .-1 }
+! { dg-error "Sorry, LOCAL and LOCAL_INIT are not yet supported
for 'do concurrent' constructs" "" { target *-*-* } .-2 }
+do concurrent (j = 1:10) local(k) shared(product) reduce(*:product)
+  ! { dg-error "Variable .product. at .1. has already been
specified in a locality-spec" "" { target *-*-* } .-1 }
+  ! { dg-error "Sorry, LOCAL and LOCAL_INIT are not yet supported
for 'do concurrent' constructs" "" { target *-*-* } .-2 }
+  do concurrent (k = 1:10)
+array(i,j,k) = i * j * k
+sum = sum + array(i,j,k)
+product = product * array(i,j,k)
+  end do
+end do
+  end do
+  print *, sum, product
+end program do_concurrent_complex
\ No newline at end of file




diff --git a/gcc/testsuite/gfortran.dg/do_concurrent_basic.f90
b/gcc/testsuite/gfortran.dg/do_concurrent_basic.f90
new file mode 100644
index 000..fe8723d48b4
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/do_concurrent_basic.f90
@@ -0,0 +1,11 @@
+! { dg-do run }
+program basic_do_concurrent
+  implicit none
+  integer :: i, arr(10)
+
+  do concurrent (i = 1:10)
+arr(i) = i
+  end do
+
+  print *, arr
+end program basic_do_concurrent
\ No newline at end of file

^^^ this testcase neither tests the result nor provides
anything beyond existing tests.  Consider dropping it.



diff --git a/gcc/testsuite/gfortran.dg/do_concurrent_multiple_reduce.f90
b/gcc/testsuite/gfortran.dg/do_concurrent_multiple_reduce.f90
new file mode 100644
index 000..47c71492107
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/do_concurrent_multiple_reduce.f90
@@ -0,0 +1,17 @@
+! { dg-do compile }
+program do_concurrent_multiple_reduce
+  implicit none
+  integer :: i, arr(10), sum, product
+  sum = 0
+  product = 1
+
+  do concurrent (i = 1:10) reduce(+:sum) reduce(*:product)
+arr(i) = i
+sum = sum + i
+product = product * i
+  end do
+
+  print *, arr
+  print *, "Sum:", sum
+  print *, "Product:", product

  ^^^ please verify results!


+end program do_concurrent_multiple_reduce
\ No newline at end of file



diff --git a/gcc/testsuite/gfortran.dg/do_concurrent_nested.f90
b/gcc/testsuite/gfortran.dg/do_concurrent_nested.f90
new file mode 100644
index 000..83b9cdbc04f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/do_concurrent_nested.f90
@@ -0,0 +1,26 @@
+! { dg-do compile }
+program nested_do_concurrent
+  implicit none
+  integer :: i, j, x(10, 10)
+  integer :: total_sum
+
+  total_sum = 0
+
+  ! Outer loop remains DO CONCURRENT
+  do concurrent (i = 1:10)
+! Inner loop changed to regular DO loop
+do j = 1, 10
+  x(i, j) = i * j
+end do
+  end do
+
+  ! Separate loops for summation
+  do i = 1, 10
+do j = 1, 10
+  total_sum = total_sum + x(i, j)
+end do
+  end do
+
+  print *, "Total sum:", total_sum
+  print *, "Array:", x

  ^^^ please verify results!

+end program nested_do_concurrent
\ No newline at end of file




diff --git a/gcc/testsuite/gfortran.dg/do_concurrent_reduce_max.f90
b/gcc/testsuite/gfortran.dg/do_concurrent_reduc

Re: [PATCH] c++: Implement C++23 P2718R0 - Wording for P2644R1 Fix for Range-based for Loop [PR107637]

2024-09-23 Thread Jakub Jelinek

On Mon, Sep 23, 2024 at 11:32:59AM -0400, Jason Merrill wrote:
> On 8/9/24 9:06 PM, Jakub Jelinek wrote:
> > Hi!
> > 
> > The following patch implements the C++23 P2718R0 paper
> > - Wording for P2644R1 Fix for Range-based for Loop.
> > As all the temporaries from __for_range initialization should have life
> > extended until the end of __for_range scope, this patch disables (for C++23
> > and later only and if !processing_template_decl) CLEANUP_POINT_EXPR wrapping
> > of the __for_range declaration, also disables -Wdangling-reference warning
> > as well as the rest of extend_ref_init_temps (we know the __for_range
> > temporary is not TREE_STATIC and as all the temporaries from the
> > initializer will be life extended, we shouldn't try to handle temporaries
> > referenced by references any differently) and adds an extra
> > push_stmt_list/pop_stmt_list before
> > cp_finish_decl of __for_range and after end of the for body and wraps all
> > that into CLEANUP_POINT_EXPR.
> > I had to repeat that also for OpenMP range loops because those are handled
> > differently.
> 
> Let's add a flag for this, not just control it with cxx_dialect.  We might
> want to consider enabling it by default in earlier modes when not being
> strictly conforming?

-frange-based-for-ext-temps
or do you have a better suggestion?

Shall we also allow disabling it in C++23 or later modes, or override
the user's choice unconditionally for C++23+ and only allow users to
enable/disable it in C++11-C++20?

What about the __cpp_range_based_for predefined macro?
Shall it be defined to the C++23 202211L value if the switch is on?
While that could be done in theory for C++17 and later code, for C++11/14
__cpp_range_based_for is 200907L and doesn't include the C++17
201603L step.  Or keep the macro only for C++23 and later?
And if one can override -std=c++23 -fno-range-based-for-ext-temps,
shall it use the C++17 version?
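
To make the macro question concrete, here is a rough sketch that assumes nothing about the final option name or semantics. It checks the feature-test macro at compile time and, separately, observes at run time whether a temporary in the range initializer outlives the loop body. Holder, make_holder and event_log are made-up names; items () deliberately returns a reference to a static vector so that the loop has no dangling reference under either set of rules.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: global log recording the order of events.
std::vector<std::string> &
event_log ()
{
  static std::vector<std::string> log;
  return log;
}

// Hypothetical type whose destructor records when the temporary dies.
struct Holder
{
  ~Holder () { event_log ().push_back ("~Holder"); }

  // Returns a reference to static storage, so iterating over it stays valid
  // even if the Holder temporary has already been destroyed.
  const std::vector<int> &
  items () const
  {
    static const std::vector<int> v = {1, 2, 3};
    return v;
  }
};

Holder
make_holder ()
{
  return Holder ();
}

// True if the compiler advertises the C++23 value of the macro.
bool
macro_says_extended ()
{
#if defined (__cpp_range_based_for) && __cpp_range_based_for >= 202211L
  return true;
#else
  return false;
#endif
}

// True if the Holder temporary was destroyed only after the loop body ran.
bool
observed_extended ()
{
  event_log ().clear ();
  for (int i : make_holder ().items ())
    if (i == 1)
      event_log ().push_back ("body");
  return event_log ().size () == 2 && event_log ()[0] == "body";
}
```

On a compiler that implements P2718R0 in C++23 mode one would expect the two functions to agree; whether they should still agree when the option is toggled independently of -std= is exactly the open question above.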

> > @@ -44600,11 +44609,14 @@ cp_convert_omp_range_for (tree &this_pre
> > else
> > {
> >   range_temp = build_range_temp (init);
> > + tree name = DECL_NAME (range_temp);
> >   DECL_NAME (range_temp) = NULL_TREE;
> >   pushdecl (range_temp);
> > + DECL_NAME (range_temp) = name;
> >   cp_finish_decl (range_temp, init,
> >   /*is_constant_init*/false, NULL_TREE,
> >   LOOKUP_ONLYCONVERTING);
> > + DECL_NAME (range_temp) = NULL_TREE;
> 
> This messing with the name needs a rationale.  What wants it to be null?

I'll add comments.  The first = NULL_TREE; is needed so that pushdecl
doesn't register the temporary for name lookup, the = name now is so that
cp_finish_decl recognizes the temporary as range based for temporary
for the lifetime extension, and the last one is just to preserve previous
behavior, not have it visible in debug info etc.

Jakub

Re: [Fortran, Patch, PR101100, v1] Fix ICE when compiling with caf-lib and using proc_pointer component.

2024-09-23 Thread Harald Anlauf


Hi Andre,

Am 19.09.24 um 14:19 schrieb Andre Vehreschild:

Hi all,

the attached patch fixes an ICE when compiling with -fcoarray=lib and using
(proc_-)pointer component in a coarray. The code was looking at the wrong
location for the caf-token.

Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?


this looks good to me.

Thanks for the patch!

Harald


Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de

Re: [PATCH] hosthooks.h: Fix GCC_HOST_HOOKS_H typo

2024-09-23 Thread Andrew Pinski

On Mon, Sep 23, 2024 at 10:12 AM Yangyu Chen
 wrote:
>
> The comment of the final endif in hosthooks.h is wrong, it should be
> GCC_HOST_HOOKS_H instead of GCC_LANG_HOOKS_H.

This looks obvious to me. Do you have push access or do you need
someone to push this change for you?

Thanks,
Andrew

>
> gcc/ChangeLog:
>
> * hosthooks.h (struct host_hooks): Fix GCC_HOST_HOOKS_H typo.
>
> Signed-off-by: Yangyu Chen 
> ---
>  gcc/hosthooks.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/hosthooks.h b/gcc/hosthooks.h
> index 53363801330..8178c9c692a 100644
> --- a/gcc/hosthooks.h
> +++ b/gcc/hosthooks.h
> @@ -47,4 +47,4 @@ struct host_hooks
>  /* Each host provides its own.  */
>  extern const struct host_hooks host_hooks;
>
> -#endif /* GCC_LANG_HOOKS_H */
> +#endif /* GCC_HOST_HOOKS_H */
> --
> 2.45.2
>

Re: [PATCH v2 1/2] c++/coro: prevent ICV_STATEMENT diagnostics in temp promotion [PR116502]

2024-09-23 Thread Jason Merrill


On 9/21/24 1:54 PM, Arsen Arsenović wrote:

Okay, these patches should work correctly in all cases, at least all I
could think of.  The first patch is unchanged; the second one is simpler
than it was before, I think.
-- >8 --
If such a diagnostic is necessary, it has already been emitted;
otherwise, it is incorrect, and emitting it here is bogus and not
actionable by the user.


OK.


PR c++/116502

gcc/cp/ChangeLog:

* coroutines.cc (maybe_promote_temps): Convert temporary
initializers to void without complaining.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/maybe-unused-1.C: New test.
* g++.dg/coroutines/pr116502.C: New test.
---
  gcc/cp/coroutines.cc  | 12 +--
  .../g++.dg/coroutines/maybe-unused-1.C| 33 +++
  gcc/testsuite/g++.dg/coroutines/pr116502.C| 33 +++
  3 files changed, 75 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/coroutines/maybe-unused-1.C
  create mode 100644 gcc/testsuite/g++.dg/coroutines/pr116502.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 75e8dbbeab1b..f9129b5f988b 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -3344,7 +3344,13 @@ maybe_promote_temps (tree *stmt, void *d)
 to run the initializer.
 If the initializer is a conditional expression, we need to collect
 and declare any promoted variables nested within it.  DTORs for such
-variables must be run conditionally too.  */
+variables must be run conditionally too.
+
+Since here we're synthetically processing code here, we've already
+emitted any Wunused-result warnings.  Below, however, we call
+finish_expr_stmt, which will convert its operand to void, and could
+result in such a diagnostic being emitted.  To avoid that, convert to
+void ahead of time.  */
if (t->var)
{
  tree var = t->var;
@@ -3354,7 +3360,7 @@ maybe_promote_temps (tree *stmt, void *d)
  if (TREE_CODE (t->init) == COND_EXPR)
process_conditional (t, vlist);
  else
-   finish_expr_stmt (t->init);
+   finish_expr_stmt (convert_to_void (t->init, ICV_STATEMENT, tf_none));
  if (tree cleanup = cxx_maybe_build_cleanup (var, tf_warning_or_error))
{
  tree cl = build_stmt (sloc, CLEANUP_STMT, expr_list, cleanup, var);
@@ -3373,7 +3379,7 @@ maybe_promote_temps (tree *stmt, void *d)
  if (TREE_CODE (t->init) == COND_EXPR)
process_conditional (t, vlist);
  else
-   finish_expr_stmt (t->init);
+   finish_expr_stmt (convert_to_void (t->init, ICV_STATEMENT, tf_none));
  if (expr_list)
{
  if (TREE_CODE (expr_list) != STATEMENT_LIST)
diff --git a/gcc/testsuite/g++.dg/coroutines/maybe-unused-1.C b/gcc/testsuite/g++.dg/coroutines/maybe-unused-1.C
new file mode 100644
index ..68d59d83e8eb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/maybe-unused-1.C
@@ -0,0 +1,33 @@
+// https://gcc.gnu.org/PR116502
+#include <coroutine>
+
+struct SuspendNever {
+  bool await_ready() noexcept;
+  void await_suspend(std::coroutine_handle<>) noexcept;
+  void await_resume() noexcept;
+};
+
+struct Coroutine;
+
+struct PromiseType {
+  Coroutine get_return_object();
+  SuspendNever initial_suspend();
+  SuspendNever final_suspend() noexcept;
+  void return_void();
+  void unhandled_exception();
+};
+
+struct Coroutine {
+  using promise_type = PromiseType;
+};
+
+struct Awaiter {
+  bool await_ready();
+  void await_suspend(std::coroutine_handle<>);
+  [[nodiscard]] int& await_resume();
+};
+
+Coroutine foo()
+{
+  co_await Awaiter {}; // { dg-warning "Wunused-result" }
+}
diff --git a/gcc/testsuite/g++.dg/coroutines/pr116502.C b/gcc/testsuite/g++.dg/coroutines/pr116502.C
new file mode 100644
index ..95cc0bc8a983
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/pr116502.C
@@ -0,0 +1,33 @@
+// https://gcc.gnu.org/PR116502
+#include <coroutine>
+
+struct SuspendNever {
+  bool await_ready() noexcept;
+  void await_suspend(std::coroutine_handle<>) noexcept;
+  void await_resume() noexcept;
+};
+
+struct Coroutine;
+
+struct PromiseType {
+  Coroutine get_return_object();
+  SuspendNever initial_suspend();
+  SuspendNever final_suspend() noexcept;
+  void return_void();
+  void unhandled_exception();
+};
+
+struct Coroutine {
+  using promise_type = PromiseType;
+};
+
+struct Awaiter {
+  bool await_ready();
+  void await_suspend(std::coroutine_handle<>);
+  [[nodiscard]] int& await_resume();
+};
+
+Coroutine foo()
+{
+  (void)co_await Awaiter {};
+}

Re: RFC PATCH: contrib/test_summary mode for submitting testsuite results to bunsen

2024-09-23 Thread Frank Ch. Eigler

Hi, HP -

> I'd love for (something like) gcc-testresults@ to be usefully 
> searchable (it can be done but... lacks), so please allow me:

Certainly!

> > +: ${bunsengit=ssh://sourceware.org/git/bunsendb.git/};
> > +: ${bunsentag=`whoami`/gcc/`uname -m`-`date +%Y%m%d-%H%M`};
> 
> That uname -m looks like it's an assumption that the report is 
> for a 1) native build that is 2) the same machine as where the 
> git push should happen and 3) all run the same OS.  Also, my 
> local account-name may be completely different than what's 
> needed in the tag.  Looks like there's a side-question for 
> account names for the bunsendb when you don't have a sourceware 
> account (are rules needed)? Anyway, please parametrize. [...]

OK, the shell script is parametrized more.
wdyt about this version?

commit 23c3100e992029994f33eb4a1465570b476c1df4 (HEAD -> master)
Author: Frank Ch. Eigler 
Date:   Mon Sep 23 18:03:31 2024 -0400

contrib/test_summary: Add bunsen uploading mode

This makes it easy for someone to push gcc dejagnu/autoconf test
results to a bunsen [1] system, as an alternative or supplement to
sending a subset by email to .  Bunsen
allows minimum-infrastructure archiving, indexing, and analysis of
test results.

% contrib/test_summary -b
echo 'master' > .bunsen.source.gitbranch &&
echo 'basepoints/gcc-15-3524-ga523c2ba5862' > .bunsen.source.gitdescribe &&
echo 'a523c2ba58621c3630a1cd890d6db82879f92c90' > .bunsen.source.gitname &&
echo 'git://gcc.gnu.org/git/gcc.git' > .bunsen.source.gitrepo &&
(find . -name '*.log' -o -name '*.sum' -o -name '.bunsen.*' | 
t-upload-git-push 'ssh://sourceware.org/git/bunsendb.git/' 
'fche/gcc/x86_64-pc-linux-gnu/x86_64-pc-linux-gnu/20240923-1817')

Commit access to the sourceware bunsen database [2] is available on
request [3], so uploads automatically show up in the web interface
[4], but one may also operate a private copy of the system to use it
entirely locally.  A unique tag name is generated from one's userid,
the gcc host/target triplets, and a timestamp, but these defaults may
be overridden with contrib/test_summary options.  The git
commit/tag/push machinery is wrapped into a tiny "t-upload-git-push"
shell script, which may be downloaded from bunsen.git into your $PATH.

[1] https://sourceware.org/bunsen/
[2] https://sourceware.org/git/bunsendb.git
[3] 
[4] https://builder.sourceware.org/testruns/

https://inbox.sourceware.org/bunsen/20240913201848.gc25...@redhat.com/

ChangeLog:

   * Makefile.tpl, Makefile.in: Add bunsen-report.log target.

contrib/ChangeLog:

   * test_summary: Add -b (bunsen) mode to report all test results
 into a https://sourceware.org/bunsen/-like system instead of
 emailing extracts.

Signed-Off-By: Frank Ch. Eigler 

diff --git a/Makefile.in b/Makefile.in
index 966d60454960..8c352f7a2956 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -2852,6 +2852,11 @@ mail-report-with-warnings.log: warning.log
chmod +x $@
echo If you really want to send e-mail, run ./$@ now
 
+bunsen-report.log:
+   $(srcdir)/contrib/test_summary -b >$@
+   chmod +x $@
+   echo If you really want to send to bunsen, run ./$@ now
+
 # Local Vim config
 
 $(srcdir)/.local.vimrc:
diff --git a/Makefile.tpl b/Makefile.tpl
index da38dca697ad..9816fcd6f5b2 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -1034,6 +1034,11 @@ mail-report-with-warnings.log: warning.log
chmod +x $@
echo If you really want to send e-mail, run ./$@ now
 
+bunsen-report.log:
+   $(srcdir)/contrib/test_summary -b >$@
+   chmod +x $@
+   echo If you really want to send to bunsen, run ./$@ now
+
 # Local Vim config
 
 $(srcdir)/.local.vimrc:
diff --git a/contrib/test_summary b/contrib/test_summary
index 5760b053ec27..b4a9c92b753e 100755
--- a/contrib/test_summary
+++ b/contrib/test_summary
@@ -39,6 +39,10 @@ if test x"$1" = "x-h"; then
  should be selected from the log files.
  -f: force reports to be mailed; if omitted, only reports that differ
  from the sent.* version are sent.
+ -b: instead of emailing, push test logs into a bunsen git repo
+ -bg REPO: specify the bunsen git repo to override default
+ -bi TAG1: specify the bunsen tag prefix (user name)
+ -bt TAG2: specify the bunsen tag suffix (build name)
 _EOF
   exit 0
 fi
@@ -57,6 +61,10 @@ fi
 : ${filesuffix=}; export filesuffix
 : ${move=true}; export move
 : ${forcemail=false}; export forcemail
+: ${bunsen=false};
+: ${bunsengit=ssh://sourceware.org/git/bunsendb.git/};
+: ${bunsentag1=`whoami`};
+: ${bunsentag2=gcc/`grep ^host= config.log | tr -

Re: [PATCH v1] Widening-Mul: Fix one ICE for SAT_SUB matching operand promotion

2024-09-23 Thread Uros Bizjak

On Mon, Sep 23, 2024 at 4:58 PM  wrote:
>
> From: Pan Li 
>
> This patch would like to fix the following ICE for -O2 -m32 of x86_64.
>
> during RTL pass: expand
> JackMidiAsyncWaitQueue.cpp.cpp: In function 'void DequeueEvent(unsigned
> int)':
> JackMidiAsyncWaitQueue.cpp.cpp:3:6: internal compiler error: in
> expand_fn_using_insn, at internal-fn.cc:263
> 3 | void DequeueEvent(unsigned frame) {
>   |  ^~~~
> 0x27b580d diagnostic_context::diagnostic_impl(rich_location*,
> diagnostic_metadata const*, diagnostic_option_id, char const*,
> __va_list_tag (*) [1], diagnostic_t)
> ???:0
> 0x27c4a3f internal_error(char const*, ...)
> ???:0
> 0x27b3994 fancy_abort(char const*, int, char const*)
> ???:0
> 0xf25ae5 expand_fn_using_insn(gcall*, insn_code, unsigned int, unsigned int)
> ???:0
> 0xf2a124 expand_direct_optab_fn(internal_fn, gcall*, optab_tag, unsigned int)
> ???:0
> 0xf2c87c expand_SAT_SUB(internal_fn, gcall*)
> ???:0
>
> We allowed the operand convert when matching SAT_SUB in match.pd, to support
> the zip benchmark SAT_SUB pattern.  Aka,
>
> (convert? (minus (convert1? @0) (convert1? @1))) for below sample code.
>
> void test (uint16_t *x, unsigned b, unsigned n)
> {
>   unsigned a = 0;
>   register uint16_t *p = x;
>
>   do {
>     a = *--p;
>     *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
>   } while (--n);
> }
>
> The pattern match for SAT_SUB itself may also act on below scalar sample
> code too.
>
> unsigned long long GetTimeFromFrames(int);
> unsigned long long GetMicroSeconds();
>
> void DequeueEvent(unsigned frame) {
>   long long frame_time = GetTimeFromFrames(frame);
>   unsigned long long current_time = GetMicroSeconds();
>   DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time);
> }
>
> Aka:
>
> uint32_t a = (uint32_t)SAT_SUB(uint64_t, uint64_t);
>
> Then there will be a problem when ia32 or -m32 is given when compiling,
> because we only check that the lhs (aka uint32_t) type is supported by the
> ifn and missed the operand (aka uint64_t).  Mostly DImode is disabled for
> 32-bit targets like ia32 or rv32gcv, which then triggers an ICE when expanding.
>
> The following test suites pass with this patch.
> * The rv64gcv full regression test.
> * The x86 bootstrap test.
> * The x86 full regression test.
>
> PR target/116814

This is not "target", but "middle-end" component. Even though the bug
is exposed on x86_64 target, the fix is in the middle-end code, not in
the target code.
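
For reference, the shape that trips the expander can be sketched in plain C++.
This is only an illustration of the semantics under discussion, not GCC code;
the function names are hypothetical. The saturating subtraction is computed on
64-bit operands and only its result is truncated to 32 bits, so checking the
32-bit result type alone says nothing about whether the 64-bit operation is
supported.

```cpp
#include <cstdint>

// Unsigned saturating subtraction: a - b, clamped at zero instead of wrapping.
// (Hypothetical helper; it mirrors the "a >= b ? a - b : 0" idiom quoted above.)
std::uint64_t sat_sub_u64 (std::uint64_t a, std::uint64_t b)
{
  return a >= b ? a - b : 0;
}

// The problematic shape: 64-bit operands, 32-bit result.  The subtraction
// itself still has to be done in 64 bits before the truncation.
std::uint32_t sat_sub_then_truncate (std::uint64_t a, std::uint64_t b)
{
  return static_cast<std::uint32_t> (sat_sub_u64 (a, b));
}
```

Because the operation happens in the wider type, a support check presumably
has to cover the operand type as well as the result type, which is what the
quoted ChangeLog entry describes.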

> gcc/ChangeLog:
>
> * tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Add
> ifn is_supported check for operand TREE type.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/torture/pr116814-1.C: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/testsuite/g++.dg/torture/pr116814-1.C | 12 
>  gcc/tree-ssa-math-opts.cc | 23 +++
>  2 files changed, 27 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/torture/pr116814-1.C
>
> diff --git a/gcc/testsuite/g++.dg/torture/pr116814-1.C 
> b/gcc/testsuite/g++.dg/torture/pr116814-1.C
> new file mode 100644
> index 000..8db5b020cfd
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/pr116814-1.C
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
> +/* { dg-options "-O2 -m32" } */

Please remove -m32 and use "{ dg-do compile { target ia32 } }" instead.

Uros.

> +
> +unsigned long long GetTimeFromFrames(int);
> +unsigned long long GetMicroSeconds();
> +
> +void DequeueEvent(unsigned frame) {
> +  long long frame_time = GetTimeFromFrames(frame);
> +  unsigned long long current_time = GetMicroSeconds();
> +
> +  DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time);
> +}
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index d61668aacfc..361761cedef 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -4042,15 +4042,22 @@ build_saturation_binary_arith_call 
> (gimple_stmt_iterator *gsi, gphi *phi,
> internal_fn fn, tree lhs, tree op_0,
> tree op_1)
>  {
> -  if (direct_internal_fn_supported_p (fn, TREE_TYPE (lhs), 
> OPTIMIZE_FOR_BOTH))
> -{
> -  gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1);
> -  gimple_call_set_lhs (call, lhs);
> -  gsi_insert_before (gsi, call, GSI_SAME_STMT);
> +  tree lhs_type = TREE_TYPE (lhs);
> +  tree op_type = TREE_TYPE (op_0);
>
> -  gimple_stmt_iterator psi = gsi_for_stmt (phi);
> -  remove_phi_node (&psi, /* release_lhs_p */ false);
> -}
> +  if (!direct_internal_fn_supported_p (fn, lhs_type, OPTIMIZE_FOR_BOTH))
> +return;
> +
> +  if (lhs_type != op_type
> +  && !direct_internal_fn_supported_p (fn, op_type, OPTIMIZE_FOR_BOTH))
> +return;
> +
> +  gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1);
> +  gimple_call_set_

Re: [PATCH] hosthooks.h: Fix GCC_HOST_HOOKS_H typo

2024-09-23 Thread Jiawei


On Mon, Sep 23, 2024 at 10:12 AM Yangyu Chen
 wrote:
>
> The comment of the final endif in hosthooks.h is wrong, it should be
> GCC_HOST_HOOKS_H instead of GCC_LANG_HOOKS_H.
This looks obvious to me. Do you have push access or do you need
someone to push this change for you?

Thanks,
Andrew

>
> gcc/ChangeLog:
>
> * hosthooks.h (struct host_hooks): Fix GCC_HOST_HOOKS_H typo.
>
> Signed-off-by: Yangyu Chen
> ---
> gcc/hosthooks.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/hosthooks.h b/gcc/hosthooks.h
> index 53363801330..8178c9c692a 100644
> --- a/gcc/hosthooks.h
> +++ b/gcc/hosthooks.h
> @@ -47,4 +47,4 @@ struct host_hooks
> /* Each host provides its own. */
> extern const struct host_hooks host_hooks;
>
> -#endif /* GCC_LANG_HOOKS_H */
> +#endif /* GCC_HOST_HOOKS_H */
> --
> 2.45.2
>



Thanks, committed into trunk.

Jiawei

RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator

2024-09-23 Thread Prathamesh Kulkarni



> -Original Message-
> From: Richard Biener 
> Sent: Monday, September 9, 2024 7:24 PM
> To: Prathamesh Kulkarni 
> Cc: Richard Sandiford ; Thomas Schwinge
> ; gcc-patches@gcc.gnu.org
> Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for
> accelerator
> 
> External email: Use caution opening links or attachments
> 
> 
> On Tue, 3 Sep 2024, Prathamesh Kulkarni wrote:
> 
> >
> >
> > > -Original Message-
> > > From: Prathamesh Kulkarni 
> > > Sent: Thursday, August 22, 2024 7:41 PM
> > > To: Richard Biener 
> > > Cc: Richard Sandiford ; Thomas Schwinge
> > > ; gcc-patches@gcc.gnu.org
> > > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in
> > > for accelerator
> > >
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Wednesday, August 21, 2024 5:09 PM
> > > > To: Prathamesh Kulkarni 
> > > > Cc: Richard Sandiford ; Thomas Schwinge
> > > > ; gcc-patches@gcc.gnu.org
> > > > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in
> > > for
> > > > accelerator
> > > >
> > > > External email: Use caution opening links or attachments
> > > >
> > > >
> > > > On Wed, 21 Aug 2024, Prathamesh Kulkarni wrote:
> > > >
> > > > >
> > > > >
> > > > > > -Original Message-
> > > > > > From: Richard Biener 
> > > > > > Sent: Tuesday, August 20, 2024 10:36 AM
> > > > > > To: Richard Sandiford 
> > > > > > Cc: Prathamesh Kulkarni ; Thomas
> > > Schwinge
> > > > > > ; gcc-patches@gcc.gnu.org
> > > > > > Subject: Re: Re-compute TYPE_MODE and DECL_MODE while
> > > > > > streaming
> > > in
> > > > > > for accelerator
> > > > > >
> > > > > > External email: Use caution opening links or attachments
> > > > > >
> > > > > >
> > > > > > > Am 19.08.2024 um 20:56 schrieb Richard Sandiford
> > > > > > :
> > > > > > >
> > > > > > > Prathamesh Kulkarni  writes:
> > > > > > >> diff --git a/gcc/lto-streamer-in.cc
> > > > > > >> b/gcc/lto-streamer-in.cc index
> > > > > > >> cbf6041fd68..0420183faf8 100644
> > > > > > >> --- a/gcc/lto-streamer-in.cc
> > > > > > >> +++ b/gcc/lto-streamer-in.cc
> > > > > > >> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.
> > > > > > >> If
> > > > not
> > > > > > see
> > > > > > >> #include "debug.h"
> > > > > > >> #include "alloc-pool.h"
> > > > > > >> #include "toplev.h"
> > > > > > >> +#include "stor-layout.h"
> > > > > > >>
> > > > > > >> /* Allocator used to hold string slot entries for line map
> > > > > > streaming.
> > > > > > >> */ static struct object_allocator
> > > > > > >> *string_slot_allocator; @@ -1752,6 +1753,17 @@
> > > lto_read_tree_1
> > > > > > (class lto_input_block *ib, class data_in *data_in, tree expr)
> > > > > > >> with -g1, see for example PR113488.  */
> > > > > > >>   else if (DECL_P (expr) && DECL_ABSTRACT_ORIGIN (expr)
> > > ==
> > > > > > expr)
> > > > > > >>DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE;
> > > > > > >> +
> > > > > > >> +#ifdef ACCEL_COMPILER
> > > > > > >> +  /* For decl with aggregate type, host streams out
> > > > VOIDmode.
> > > > > > >> + Compute the correct DECL_MODE by calling
> relayout_decl.
> > > > */
> > > > > > >> +  if ((VAR_P (expr)
> > > > > > >> +   || TREE_CODE (expr) == PARM_DECL
> > > > > > >> +   || TREE_CODE (expr) == FIELD_DECL)
> > > > > > >> +  && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> > > > > > >> +  && DECL_MODE (expr) == VOIDmode)
> > > > > > >> +relayout_decl (expr);
> > > > > > >> +#endif
> > > > > > >
> > > > > > > Genuine question, but: is relayout_decl safe in this
> context?
> > > > It
> > > > > > does
> > > > > > > a lot more than just reset the mode.  It also applies the
> > > target
> > > > > > ABI's
> > > > > > > preferences wrt alignment, padding, and so on, rather than
> > > > > > preserving
> > > > > > > those of the host's.
> > > > > >
> > > > > > It would be better to just recompute the mode here.
> > > > > Hi,
> > > > > The attached patch sets DECL_MODE (expr) to TYPE_MODE (TREE_TYPE
> > > > (expr)) in lto_read_tree_1 instead of calling relayout_decl
> (expr).
> > > > > I checked layout_decl_type does the same thing for setting decl
> > > > mode,
> > > > > except for bit fields. Since bit-fields cannot have aggregate
> > > type,
> > > > I am assuming setting DECL_MODE (expr) to TYPE_MODE (TREE_TYPE
> > > (expr))
> > > > would be OK in this case ?
> > > >
> > > > Yep, that should work.
> > > Thanks, I have committed the patch in:
> > > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=792adb8d222d0d1d16b182
> > > 87
> > > 1e105f47823b8e72
> > Hi,
> > This also results in same failure (using OImode) for vector of 256-bit
> > type, which was triggered for firstprivate-mappings-1.c.
> > Can be reproduced with following simple test-case:
> >
> > typedef long v4di __attribute__((vector_size (sizeof (long) * 4)));
> > int main() {
> >   v4di x;
> >   #pragma acc parallel copy(x)
> > x;
> >   return 0;
> > }
> >
>

Re: [PATCH RFC] build: enable C++11 narrowing warnings

2024-09-23 Thread Richard Biener

On Mon, Sep 23, 2024 at 3:41 PM Jason Merrill  wrote:
>
> On 9/23/24 9:05 AM, Richard Biener wrote:
> > On Sat, Sep 21, 2024 at 2:49 AM Jason Merrill  wrote:
> >>
> >> Tested x86_64-pc-linux-gnu.  OK for trunk?
> >>
> >> -- 8< --
> >>
> >> We've been using -Wno-narrowing since gcc 4.7, but at this point narrowing
> >> diagnostics seem like a stable part of C++ and we should adjust.
> >>
> >> This patch changes -Wno-narrowing to -Wno-error=narrowing so that narrowing
> >> issues will still not break bootstrap, but we can see them.
> >>
> >> The rest of the patch fixes the narrowing warnings I see in an
> >> x86_64-pc-linux-gnu bootstrap.  In most of the cases, by adjusting the 
> >> types
> >> of various declarations so that we store the values in the same types we
> >> compute them in, which seems worthwhile anyway.  This also allowed us to
> >> remove a few -Wsign-compare casts.
> >>
> >> The one place I didn't see how best to do this was in
> >> vect_prologue_cost_for_slp: changing const_nunits to unsigned int broke the
> >> call to TYPE_VECTOR_SUBPARTS (vectype).is_constant (&const_nunits), since
> >> poly_uint64.is_constant wants a pointer to unsigned HOST_WIDE_INT.  So I
> >> added casts in that one place.  Not too bad, I think.
> >>
> >> +   unsigned HOST_WIDE_INT foff = bitpos_of_field (field);
> >
> > Can you make bitpos_of_field return unsigned HOST_WIDE_INT then and adjust 
> > it
> > accordingly - it looks for shwi fitting but negative DECL_FIELD_OFFSET
> > or BIT_OFFSET are not a thing.
>
> So, like the attached?
>
> >> @@ -7471,7 +7471,8 @@ vect_prologue_cost_for_slp (slp_tree node,
> >> nelt_limit = const_nunits;
> >> hash_set vector_ops;
> >> for (unsigned int i = 0; i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); 
> >> ++i)
> >> -   if (!vector_ops.add ({ ops, i * const_nunits, const_nunits }))
> >
> > So why do we diagnose this case (unsigned int member) but not ...
> >
> >> +   if (!vector_ops.add
> >> +   ({ ops, i * (unsigned)const_nunits, (unsigned)const_nunits }))
> >>starts.quick_push (i * const_nunits);
> >
> > ... this one - unsigned int function argument?
>
> Because the former is in { }, and the latter isn't; narrowing
> conversions are only ill-formed within { }.
>
> > I think it would be slightly better to do
> >
> >  {
> > unsigned start = (unsigned) const_nunits * i;
> > if (!vector_ops.add ({ ops, start, const_nunits }))
> >   starts.quick_push (start);
> >  }
> >
> > to avoid the non-obvious difference between both.
>
> We'd still need the cast for the third element, but now I notice we can
> use nelt_limit instead since it just got the same value.
>
> So, OK with this supplemental patch?

OK.

Thanks,
Richard.
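
A minimal sketch of the distinction discussed above, assuming hypothetical
stand-ins (chunk, push_start, collect_starts) rather than the real vectorizer
types: a 64-bit value may be converted implicitly when passed as a function
argument, but the same conversion inside a braced initializer is a narrowing
conversion and needs an explicit cast.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for the element type stored in vector_ops.
struct chunk
{
  unsigned start;
  unsigned count;
};

// Hypothetical stand-in for starts.quick_push: an ordinary unsigned parameter.
void push_start (std::vector<unsigned> &starts, unsigned start)
{
  starts.push_back (start);
}

std::vector<unsigned> collect_starts (std::uint64_t nunits, unsigned n)
{
  // The explicit casts below are only safe if the value fits.
  assert (nunits <= std::numeric_limits<unsigned>::max ());

  std::vector<unsigned> starts;
  for (unsigned i = 0; i < n; ++i)
    {
      // chunk bad = { i * nunits, nunits };  // narrowing inside { }: ill-formed
      unsigned start = static_cast<unsigned> (nunits) * i;
      chunk c = { start, static_cast<unsigned> (nunits) };

      if (c.count != 0)
        // Same 64-bit expression, but as a function argument the implicit
        // conversion to unsigned is permitted.
        push_start (starts, i * nunits);
    }
  return starts;
}
```

Hoisting the start value into a local of the narrower type, as suggested in
the quoted review, keeps both uses visibly consistent.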

Re: [PATCH]middle-end: check explicitly for external or constants when checking for loop invariant [PR116817]

2024-09-23 Thread Tamar Christina

> Can you explain how you get to see constant/external defs with a
> stmt_vec_info?  That's somehow a violation of some inherent
> invariant in the vectorizer.

I'm not sure I actually get any. It could be the condition is never hit with a 
stmt_vec_info. I had assumed however since the condition is part of a 
gimple_cond and if one of the arguments of the gimple_cond is loop bound, that 
the condition would be analyzed too.

So if you're saying you never get a stmt_vec_info for invariants at this point 
(I assume you could see you see them in the corresponding slp tree) then maybe 
checking for the stmt_vec_info is enough.

However, when I was looking around for how to check for externals I noticed 
other patterns also check for externals and constants. So I assumed that you 
could indeed get them.

Kind regards,
Tamar



From: Richard Biener 
Sent: Tuesday, September 24, 2024 7:45 AM
To: Tamar Christina 
Cc: gcc-patches@gcc.gnu.org ; nd ; 
j...@ventanamicro.com 
Subject: RE: [PATCH]middle-end: check explicitly for external or constants when 
checking for loop invariant [PR116817]

On Mon, 23 Sep 2024, Tamar Christina wrote:

> I had made the condition too strict before, here's an updated patch:
>
> Hi All,
>
> The previous check if a value was external was checking
> !vect_get_internal_def (vinfo, var) but this of course isn't completely right
> as they could be reductions etc.
>
> This changes the check to just explicitly look at externals and constants.
> Note that reductions remain unhandled here, but we don't support codegen of
> boolean reductions today anyway.

Can you explain how you get to see constant/external defs with a
stmt_vec_info?  That's somehow a violation of some inherent
invariant in the vectorizer.

Richard.

> So at the time we do, this would have to be handled as well in lowering.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
> x86_64-pc-linux-gnu -m32, -m64 and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>PR tree-optimization/116817
>* tree-vect-patterns.cc (vect_recog_bool_pattern): Check for const or
>externals.
>
> gcc/testsuite/ChangeLog:
>
>PR tree-optimization/116817
>* g++.dg/vect/pr116817.cc: New test.
>
> -- inline copy of patch --
>
> diff --git a/gcc/testsuite/g++.dg/vect/pr116817.cc 
> b/gcc/testsuite/g++.dg/vect/pr116817.cc
> new file mode 100644
> index 
> ..7e28982fb138c24f956aedb03fa454d9d858
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/vect/pr116817.cc
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O3" } */
> +
> +int main_ulData0;
> +unsigned *main_pSrcBuffer;
> +int main(void) {
> +  int iSrc = 0;
> +  bool bData0;
> +  for (; iSrc < 4; iSrc++) {
> +if (bData0)
> +  main_pSrcBuffer[iSrc] = main_ulData0;
> +else
> +  main_pSrcBuffer[iSrc] = 0;
> +bData0 = !bData0;
> +  }
> +}
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 
> e7e877dd2adb55262822f1660f8d92b42d44e6d0..f0298b2ab97a1e7dd0d943340e1389c3c0fa796e
>  100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -6062,12 +6062,15 @@ vect_recog_bool_pattern (vec_info *vinfo,
>if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
>return NULL;
>
> +  stmt_vec_info var_def_info = vinfo->lookup_def (var);
>if (check_bool_pattern (var, vinfo, bool_stmts))
>var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
>else if (integer_type_for_mask (var, vinfo))
>return NULL;
>else if (TREE_CODE (TREE_TYPE (var)) == BOOLEAN_TYPE
> -&& !vect_get_internal_def (vinfo, var))
> +&& (!var_def_info
> +|| STMT_VINFO_DEF_TYPE (var_def_info) == vect_external_def
> +|| STMT_VINFO_DEF_TYPE (var_def_info) == vect_constant_def))
>{
>  /* If the condition is already a boolean then manually convert it to 
> a
> mask of the given integer type but don't set a vectype.  */
>

--
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH 04/10] c++/modules: Fix linkage checks for exported using-decls

2024-09-23 Thread Nathaniel Shead

On Tue, Sep 24, 2024 at 09:44:48AM +1000, Nathaniel Shead wrote:
> Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> 
> -- >8 --
> 
> This fixes some inconsistencies with what kinds of linkage various
> entities are assumed to have.  This also fixes handling of exported
> using-decls binding to GM entities and type aliases to better align with
> the standard's requirements.
> 

Linaro picked up a regression on ARM from this patch where the standard
library headers are exporting using-decls of internal linkage entities,
which is supposed to be valid (and I'd missed in this version of the
patch).

Here's an updated version of the patch which handles this case and adds
a dedicated testcase for it.

Bootstrapped and regtested on x86_64-pc-linux-gnu and
aarch64-unknown-linux-gnu.

-- >8 --

This fixes some inconsistencies with what kinds of linkage various
entities are assumed to have.  This also fixes handling of exported
using-decls binding to GM entities and type aliases to better align with
the standard's requirements.

gcc/cp/ChangeLog:

* name-lookup.cc (check_can_export_using_decl): Handle internal
linkage GM entities (but ignore in header units); use linkage
of entity ultimately referred to by aliases.

gcc/testsuite/ChangeLog:

* g++.dg/modules/using-10.C: Add tests for no-linkage, fix
expected linkage of aliases.
* g++.dg/modules/using-12.C: Likewise.
* g++.dg/modules/using-27.C: New test.
* g++.dg/modules/using-28_a.C: New test.
* g++.dg/modules/using-28_b.C: New test.
* g++.dg/modules/using-29.H: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/name-lookup.cc | 57 +--
 gcc/testsuite/g++.dg/modules/using-10.C   | 56 +-
 gcc/testsuite/g++.dg/modules/using-12.C   | 42 +++--
 gcc/testsuite/g++.dg/modules/using-27.C   | 14 ++
 gcc/testsuite/g++.dg/modules/using-28_a.C | 12 +
 gcc/testsuite/g++.dg/modules/using-28_b.C |  8 
 gcc/testsuite/g++.dg/modules/using-29.H   |  6 +++
 7 files changed, 154 insertions(+), 41 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/using-27.C
 create mode 100644 gcc/testsuite/g++.dg/modules/using-28_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/using-28_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/using-29.H

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index c0f89f98d87..eb365b259d9 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -5206,38 +5206,47 @@ pushdecl_outermost_localscope (tree x)
 static bool
 check_can_export_using_decl (tree binding)
 {
-  tree decl = STRIP_TEMPLATE (binding);
-
-  /* Linkage is determined by the owner of an enumerator.  */
-  if (TREE_CODE (decl) == CONST_DECL)
-decl = TYPE_NAME (DECL_CONTEXT (decl));
+  /* Declarations in header units are always OK.  */
+  if (header_module_p ())
+return true;
 
-  /* If the using decl is exported, the things it refers
- to must also be exported (or not have module attachment).  */
-  if (!DECL_MODULE_EXPORT_P (decl)
-  && (DECL_LANG_SPECIFIC (decl)
- && DECL_MODULE_ATTACH_P (decl)))
+  /* We want the linkage of the underlying entity, so strip typedefs.
+ If the underlying entity is a builtin type then we're OK.  */
+  tree entity = binding;
+  if (TREE_CODE (entity) == TYPE_DECL)
 {
-  bool internal_p = !TREE_PUBLIC (decl);
+  entity = TYPE_MAIN_DECL (TREE_TYPE (entity));
+  if (!entity)
+   return true;
+}
 
-  /* A template in an anonymous namespace doesn't constrain TREE_PUBLIC
-until it's instantiated, so double-check its context.  */
-  if (!internal_p && TREE_CODE (binding) == TEMPLATE_DECL)
-   internal_p = decl_internal_context_p (decl);
+  linkage_kind linkage = decl_linkage (entity);
+  tree not_tmpl = STRIP_TEMPLATE (entity);
 
+  /* Attachment is determined by the owner of an enumerator.  */
+  if (TREE_CODE (not_tmpl) == CONST_DECL)
+not_tmpl = TYPE_NAME (DECL_CONTEXT (not_tmpl));
+
+  /* If the using decl is exported, the things it refers to must
+ have external linkage.  decl_linkage returns lk_external for
+ module linkage so also check for attachment.  */
+  if (linkage != lk_external
+  || (DECL_LANG_SPECIFIC (not_tmpl)
+ && DECL_MODULE_ATTACH_P (not_tmpl)
+ && !DECL_MODULE_EXPORT_P (not_tmpl)))
+{
   auto_diagnostic_group d;
   error ("exporting %q#D that does not have external linkage",
 binding);
-  if (TREE_CODE (decl) == TYPE_DECL && !DECL_IMPLICIT_TYPEDEF_P (decl))
-   /* An un-exported explicit type alias has no linkage.  */
-   inform (DECL_SOURCE_LOCATION (binding),
-   "%q#D declared here with no linkage", binding);
-  else if (internal_p)
-   inform (DECL_SOURCE_LOCATION (binding),
-   "%q#D declared here with internal linkage", binding);
+  if (linkage == lk_none)
+

RE: [PATCH v1] Widening-Mul: Fix one ICE for SAT_SUB matching operand promotion

2024-09-23 Thread Li, Pan2

Got it and thanks, let me rerun to make sure it works well as expected.

Pan

-Original Message-
From: Uros Bizjak  
Sent: Tuesday, September 24, 2024 2:33 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; richard.guent...@gmail.com; 
tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Widening-Mul: Fix one ICE for SAT_SUB matching operand 
promotion

On Tue, Sep 24, 2024 at 8:24 AM Li, Pan2  wrote:
>
> Thanks Uros for comments.
>
> > This is not "target", but "middle-end" component. Even though the bug
> > is exposed on x86_64 target, the fix is in the middle-end code, not in
> > the target code.
>
> Sure, will rename to middle-end.
>
> > Please remove -m32 and use "{ dg-do compile { target ia32 } }" instead.
>
> Is there any suggestion for how to run the "ia32" test when configuring the
> gcc build?  I first tried ia32 but it reported UNSUPPORTED for this case.

You can add the following to your testsuite run:

RUNTESTFLAGS="--target-board=unix\{,-m32\}"

e.g:

make -j N -k check RUNTESTFLAGS=...

(where N is the number of make threads)

You can also add "dg.exp" or "dg.exp=pr12345.c" (or any other exp file
or testcase name) to RUNTESTFLAGS to run only one exp file or a single
test.

Uros.

> Pan
>
> -Original Message-
> From: Uros Bizjak 
> Sent: Tuesday, September 24, 2024 2:17 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; richard.guent...@gmail.com; 
> tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v1] Widening-Mul: Fix one ICE for SAT_SUB matching 
> operand promotion
>
> On Mon, Sep 23, 2024 at 4:58 PM  wrote:
> >
> > From: Pan Li 
> >
> > This patch would like to fix the following ICE for -O2 -m32 of x86_64.
> >
> > during RTL pass: expand
> > JackMidiAsyncWaitQueue.cpp.cpp: In function 'void DequeueEvent(unsigned
> > int)':
> > JackMidiAsyncWaitQueue.cpp.cpp:3:6: internal compiler error: in
> > expand_fn_using_insn, at internal-fn.cc:263
> > 3 | void DequeueEvent(unsigned frame) {
> >   |  ^~~~
> > 0x27b580d diagnostic_context::diagnostic_impl(rich_location*,
> > diagnostic_metadata const*, diagnostic_option_id, char const*,
> > __va_list_tag (*) [1], diagnostic_t)
> > ???:0
> > 0x27c4a3f internal_error(char const*, ...)
> > ???:0
> > 0x27b3994 fancy_abort(char const*, int, char const*)
> > ???:0
> > 0xf25ae5 expand_fn_using_insn(gcall*, insn_code, unsigned int, unsigned int)
> > ???:0
> > 0xf2a124 expand_direct_optab_fn(internal_fn, gcall*, optab_tag, unsigned 
> > int)
> > ???:0
> > 0xf2c87c expand_SAT_SUB(internal_fn, gcall*)
> > ???:0
> >
> > We allowed the operand convert when matching SAT_SUB in match.pd, to support
> > the zip benchmark SAT_SUB pattern.  Aka,
> >
> > (convert? (minus (convert1? @0) (convert1? @1))) for below sample code.
> >
> > void test (uint16_t *x, unsigned b, unsigned n)
> > {
> >   unsigned a = 0;
> >   register uint16_t *p = x;
> >
> >   do {
> >     a = *--p;
> >     *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
> >   } while (--n);
> > }
> >
> > The pattern match for SAT_SUB itself may also act on below scalar sample
> > code too.
> >
> > unsigned long long GetTimeFromFrames(int);
> > unsigned long long GetMicroSeconds();
> >
> > void DequeueEvent(unsigned frame) {
> >   long long frame_time = GetTimeFromFrames(frame);
> >   unsigned long long current_time = GetMicroSeconds();
> >   DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time);
> > }
> >
> > Aka:
> >
> > uint32_t a = (uint32_t)SAT_SUB(uint64_t, uint64_t);
> >
> > Then there will be a problem when ia32 or -m32 is given when compiling,
> > because we only check that the lhs (aka uint32_t) type is supported by the
> > ifn and missed the operand (aka uint64_t).  Mostly DImode is disabled for
> > 32-bit targets like ia32 or rv32gcv, which then triggers an ICE when expanding.
> >
> > The following test suites pass with this patch.
> > * The rv64gcv full regression test.
> > * The x86 bootstrap test.
> > * The x86 full regression test.
> >
> > PR target/116814
>
> This is not "target", but "middle-end" component. Even though the bug
> is exposed on x86_64 target, the fix is in the middle-end code, not in
> the target code.
>
> > gcc/ChangeLog:
> >
> > * tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Add
> > ifn is_supported check for operand TREE type.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * g++.dg/torture/pr116814-1.C: New test.
> >
> > Signed-off-by: Pan Li 
> > ---
> >  gcc/testsuite/g++.dg/torture/pr116814-1.C | 12 
> >  gcc/tree-ssa-math-opts.cc | 23 +++
> >  2 files changed, 27 insertions(+), 8 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/torture/pr116814-1.C
> >
> > diff --git a/gcc/testsuite/g++.dg/torture/pr116814-1.C 
> > b/gcc/tests

RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator

2024-09-23 Thread Richard Biener

On Tue, 24 Sep 2024, Prathamesh Kulkarni wrote:

> 
> 
> > -Original Message-
> > From: Richard Biener 
> > Sent: Monday, September 9, 2024 7:24 PM
> > To: Prathamesh Kulkarni 
> > Cc: Richard Sandiford ; Thomas Schwinge
> > ; gcc-patches@gcc.gnu.org
> > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for
> > accelerator
> > 
> > External email: Use caution opening links or attachments
> > 
> > 
> > On Tue, 3 Sep 2024, Prathamesh Kulkarni wrote:
> > 
> > >
> > >
> > > > -Original Message-
> > > > From: Prathamesh Kulkarni 
> > > > Sent: Thursday, August 22, 2024 7:41 PM
> > > > To: Richard Biener 
> > > > Cc: Richard Sandiford ; Thomas Schwinge
> > > > ; gcc-patches@gcc.gnu.org
> > > > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in
> > > > for accelerator
> > > >
> > > > External email: Use caution opening links or attachments
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Richard Biener 
> > > > > Sent: Wednesday, August 21, 2024 5:09 PM
> > > > > To: Prathamesh Kulkarni 
> > > > > Cc: Richard Sandiford ; Thomas Schwinge
> > > > > ; gcc-patches@gcc.gnu.org
> > > > > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in
> > > > for
> > > > > accelerator
> > > > >
> > > > > External email: Use caution opening links or attachments
> > > > >
> > > > >
> > > > > On Wed, 21 Aug 2024, Prathamesh Kulkarni wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > > > -Original Message-
> > > > > > > From: Richard Biener 
> > > > > > > Sent: Tuesday, August 20, 2024 10:36 AM
> > > > > > > To: Richard Sandiford 
> > > > > > > Cc: Prathamesh Kulkarni ; Thomas
> > > > Schwinge
> > > > > > > ; gcc-patches@gcc.gnu.org
> > > > > > > Subject: Re: Re-compute TYPE_MODE and DECL_MODE while
> > > > > > > streaming
> > > > in
> > > > > > > for accelerator
> > > > > > >
> > > > > > > External email: Use caution opening links or attachments
> > > > > > >
> > > > > > >
> > > > > > > > Am 19.08.2024 um 20:56 schrieb Richard Sandiford
> > > > > > > :
> > > > > > > >
> > > > > > > > Prathamesh Kulkarni  writes:
> > > > > > > >> diff --git a/gcc/lto-streamer-in.cc
> > > > > > > >> b/gcc/lto-streamer-in.cc index
> > > > > > > >> cbf6041fd68..0420183faf8 100644
> > > > > > > >> --- a/gcc/lto-streamer-in.cc
> > > > > > > >> +++ b/gcc/lto-streamer-in.cc
> > > > > > > >> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.
> > > > > > > >> If
> > > > > not
> > > > > > > see
> > > > > > > >> #include "debug.h"
> > > > > > > >> #include "alloc-pool.h"
> > > > > > > >> #include "toplev.h"
> > > > > > > >> +#include "stor-layout.h"
> > > > > > > >>
> > > > > > > >> /* Allocator used to hold string slot entries for line map
> > > > > > > streaming.
> > > > > > > >> */ static struct object_allocator
> > > > > > > >> *string_slot_allocator; @@ -1752,6 +1753,17 @@
> > > > lto_read_tree_1
> > > > > > > (class lto_input_block *ib, class data_in *data_in, tree expr)
> > > > > > > >> with -g1, see for example PR113488.  */
> > > > > > > >>   else if (DECL_P (expr) && DECL_ABSTRACT_ORIGIN (expr)
> > > > ==
> > > > > > > expr)
> > > > > > > >>DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE;
> > > > > > > >> +
> > > > > > > >> +#ifdef ACCEL_COMPILER
> > > > > > > >> +  /* For decl with aggregate type, host streams out
> > > > > VOIDmode.
> > > > > > > >> + Compute the correct DECL_MODE by calling
> > relayout_decl.
> > > > > */
> > > > > > > >> +  if ((VAR_P (expr)
> > > > > > > >> +   || TREE_CODE (expr) == PARM_DECL
> > > > > > > >> +   || TREE_CODE (expr) == FIELD_DECL)
> > > > > > > >> +  && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> > > > > > > >> +  && DECL_MODE (expr) == VOIDmode)
> > > > > > > >> +relayout_decl (expr);
> > > > > > > >> +#endif
> > > > > > > >
> > > > > > > > Genuine question, but: is relayout_decl safe in this
> > context?
> > > > > It
> > > > > > > does
> > > > > > > > a lot more than just reset the mode.  It also applies the
> > > > target
> > > > > > > ABI's
> > > > > > > > preferences wrt alignment, padding, and so on, rather than
> > > > > > > preserving
> > > > > > > > those of the host's.
> > > > > > >
> > > > > > > It would be better to just recompute the mode here.
> > > > > > > Hi,
> > > > > > > The attached patch sets DECL_MODE (expr) to TYPE_MODE (TREE_TYPE (expr))
> > > > > > > in lto_read_tree_1 instead of calling relayout_decl (expr).
> > > > > > > I checked layout_decl_type does the same thing for setting decl mode,
> > > > > > > except for bit fields. Since bit-fields cannot have aggregate type,
> > > > > > > I am assuming setting DECL_MODE (expr) to TYPE_MODE (TREE_TYPE (expr))
> > > > > > > would be OK in this case?
> > > > > >
> > > > > > Yep, that should work.
> > > > > Thanks, I have committed the patch in:
> > > > > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=792adb8d222d0d1d16b182871e105f47823b8e72
> > > Hi,
> > > This also results in same failure (using OIm

Re: [RFC PATCH] Enable vectorization for unknown tripcount in very cheap cost model but disable epilog vectorization.

2024-09-23 Thread Hongtao Liu

On Thu, Sep 19, 2024 at 2:08 PM Richard Biener wrote:
>
> On Wed, Sep 18, 2024 at 7:55 PM Richard Sandiford
>  wrote:
> >
> > Richard Biener  writes:
> > > On Thu, Sep 12, 2024 at 4:50 PM Hongtao Liu  wrote:
> > >>
> > >> On Wed, Sep 11, 2024 at 4:21 PM Hongtao Liu  wrote:
> > >> >
> > >> > On Wed, Sep 11, 2024 at 4:04 PM Richard Biener
> > >> >  wrote:
> > >> > >
> > >> > > On Wed, Sep 11, 2024 at 4:17 AM liuhongt  
> > >> > > wrote:
> > >> > > >
> > >> > > > GCC12 enables vectorization for O2 with very cheap cost model 
> > >> > > > which is restricted
> > >> > > > to constant tripcount. The vectorization capacity is very limited 
> > >> > > > w/ consideration
> > >> > > > of codesize impact.
> > >> > > >
> > >> > > > The patch extends the very cheap cost model a little bit to 
> > >> > > > support variable tripcount.
> > >> > > > But still disable peeling for gaps/alignment, runtime aliasing 
> > >> > > > checking and epilogue
> > >> > > > vectorization with the consideration of codesize.
> > >> > > >
> > > >> > > > So there are at most 2 versions of the loop for O2 vectorization: one
> > > >> > > > vectorized main loop and one scalar/remainder loop.
> > > >> > > >
> > > >> > > > i.e.
> > >> > > >
> > >> > > > void
> > >> > > > foo1 (int* __restrict a, int* b, int* c, int n)
> > >> > > > {
> > >> > > >  for (int i = 0; i != n; i++)
> > >> > > >   a[i] = b[i] + c[i];
> > >> > > > }
> > >> > > >
> > >> > > > with -O2 -march=x86-64-v3, will be vectorized to
> > >> > > >
> > > >> > > > .L10:
> > > >> > > >         vmovdqu (%r8,%rax), %ymm0
> > > >> > > >         vpaddd  (%rsi,%rax), %ymm0, %ymm0
> > > >> > > >         vmovdqu %ymm0, (%rdi,%rax)
> > > >> > > >         addq    $32, %rax
> > > >> > > >         cmpq    %rdx, %rax
> > > >> > > >         jne     .L10
> > > >> > > >         movl    %ecx, %eax
> > > >> > > >         andl    $-8, %eax
> > > >> > > >         cmpl    %eax, %ecx
> > > >> > > >         je      .L21
> > > >> > > >         vzeroupper
> > > >> > > > .L12:
> > > >> > > >         movl    (%r8,%rax,4), %edx
> > > >> > > >         addl    (%rsi,%rax,4), %edx
> > > >> > > >         movl    %edx, (%rdi,%rax,4)
> > > >> > > >         addq    $1, %rax
> > > >> > > >         cmpl    %eax, %ecx
> > > >> > > >         jne     .L12
> > >> > > >
> > > >> > > > As measured with SPEC2017 on EMR, the patch (N-Iter) improves
> > > >> > > > performance by 4.11% with an extra 2.8% codesize, and the cheap cost
> > > >> > > > model improves performance by 5.74% with an extra 8.88% codesize.
> > > >> > > > The details are as below.
> > >> > >
> > > >> > > I'm confused by this, are the N-Iter numbers on top of the cheap cost
> > > >> > > model numbers?
> > >> > No, it's N-iter vs base(very cheap cost model), and cheap vs base.
> > >> > >
> > >> > > > Performance measured with -march=x86-64-v3 -O2 on EMR
> > >> > > >
> > > >> > > >                 N-Iter   cheap cost model
> > > >> > > > 500.perlbench_r -0.12%   -0.12%
> > > >> > > > 502.gcc_r       0.44%    -0.11%
> > > >> > > > 505.mcf_r       0.17%    4.46%
> > > >> > > > 520.omnetpp_r   0.28%    -0.27%
> > > >> > > > 523.xalancbmk_r 0.00%    5.93%
> > > >> > > > 525.x264_r      -0.09%   23.53%
> > > >> > > > 531.deepsjeng_r 0.19%    0.00%
> > > >> > > > 541.leela_r     0.22%    0.00%
> > > >> > > > 548.exchange2_r -11.54%  -22.34%
> > > >> > > > 557.xz_r        0.74%    0.49%
> > > >> > > > GEOMEAN INT     -1.04%   0.60%
> > >> > > >
> > > >> > > > 503.bwaves_r    3.13%    4.72%
> > > >> > > > 507.cactuBSSN_r 1.17%    0.29%
> > > >> > > > 508.namd_r      0.39%    6.87%
> > > >> > > > 510.parest_r    3.14%    8.52%
> > > >> > > > 511.povray_r    0.10%    -0.20%
> > > >> > > > 519.lbm_r       -0.68%   10.14%
> > > >> > > > 521.wrf_r       68.20%   76.73%
> > >> > >
> > >> > > So this seems to regress as well?
> > >> > Niter increases performance less than the cheap cost model, that's
> > >> > expected, it is not a regression.
> > >> > >
> > > >> > > > 526.blender_r   0.12%    0.12%
> > > >> > > > 527.cam4_r      19.67%   23.21%
> > > >> > > > 538.imagick_r   0.12%    0.24%
> > > >> > > > 544.nab_r       0.63%    0.53%
> > > >> > > > 549.fotonik3d_r 14.44%   9.43%
> > > >> > > > 554.roms_r      12.39%   0.00%
> > > >> > > > GEOMEAN FP      8.26%    9.41%
> > > >> > > > GEOMEAN ALL     4.11%    5.74%
> > >>
> > > >> I've tested the patch on aarch64; it shows a similar improvement with
> > > >> little codesize increase.
> > > >> I haven't tested it on other backends, but I think it would show
> > > >> similarly good improvements.
> > >
> > > > I think overall this is expected since a constant niter divisible by
> > > the VF isn't a common situation.  So the question is mostly whether
> > > we want to pay the size penalty or not.
> > >
> > > Looking only at docs the proposed change would make the very-cheap
> > > cost model nearly(?) equivalent to the cheap one so maybe the answer
> > > is
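
The two-loop shape described in the quoted commit message (one vectorized main loop plus one scalar remainder loop, with no epilogue vectorization, peeling, or runtime alias checks) can be modelled by hand in plain C. This is only a sketch: the vectorization factor of 8 is an assumption matching 32-bit ints in a 256-bit register, and the names add_scalar and add_two_loops are hypothetical, not part of the patch.

```c
#include <assert.h>

enum { VF = 8 };  /* assumed vectorization factor: 8 ints per 256-bit vector */

/* The loop as written in the commit message (foo1).  */
void
add_scalar (int *restrict a, const int *b, const int *c, int n)
{
  for (int i = 0; i != n; i++)
    a[i] = b[i] + c[i];
}

/* Hand-written model of the code shape: a main loop that handles VF
   elements per iteration, then a scalar loop for the n % VF leftovers.  */
void
add_two_loops (int *restrict a, const int *b, const int *c, int n)
{
  assert (n >= 0);          /* the model assumes a non-negative trip count */

  int i = 0;
  int main_end = n & -VF;   /* n rounded down to a multiple of VF,
                               compare "andl $-8, %eax" in the assembly */

  /* Main loop: the inner lane loop stands in for one vector add.  */
  for (; i != main_end; i += VF)
    for (int lane = 0; lane < VF; lane++)
      a[i + lane] = b[i + lane] + c[i + lane];

  /* Scalar remainder loop: at most VF - 1 iterations.  */
  for (; i != n; i++)
    a[i] = b[i] + c[i];
}
```

With n = 19, for example, the main loop covers elements 0..15 and the remainder loop covers 16..18; when n is a compile-time constant divisible by VF the remainder loop never runs, which is the only case the unmodified very cheap cost model accepts.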

[PATCH] x86: Extend AVX512 Vectorization for Popcount in Various Modes

2024-09-23 Thread Levy Hsu

This patch enables vectorization of the popcount operation for V2QI, V4QI,
V8QI, V2HI, V4HI, and V2SI modes.

gcc/ChangeLog:

* config/i386/mmx.md:
(VI1_16_32_64): New mode iterator for 8-byte, 4-byte, and 2-byte QImode.
(popcount<mode>2): New pattern for popcount of V2QI/V4QI/V8QI mode.
(popcount<mode>2): New pattern for popcount of V2HI/V4HI mode.
(popcountv2si2): New pattern for popcount of V2SI mode.

gcc/testsuite/ChangeLog:

* gcc.target/i386/part-vect-popcount-1.c: New test.
---
 gcc/config/i386/mmx.md| 24 +
 .../gcc.target/i386/part-vect-popcount-1.c| 49 +++
 2 files changed, 73 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/part-vect-popcount-1.c

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 4bc191b874b..147ae150bf3 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -70,6 +70,9 @@
 ;; 8-byte and 4-byte HImode vector modes
 (define_mode_iterator VI2_32_64 [(V4HI "TARGET_MMX_WITH_SSE") V2HI])
 
+;; 8-byte, 4-byte and 2-byte QImode vector modes
+(define_mode_iterator VI1_16_32_64 [(V8QI "TARGET_MMX_WITH_SSE") V4QI V2QI])
+
 ;; 4-byte and 2-byte integer vector modes
 (define_mode_iterator VI_16_32 [V4QI V2QI V2HI])
 
@@ -6786,3 +6789,24 @@
   [(set_attr "type" "mmx")
(set_attr "modrm" "0")
(set_attr "memory" "none")])
+
+(define_insn "popcount<mode>2"
+  [(set (match_operand:VI1_16_32_64 0 "register_operand" "=v")
+   (popcount:VI1_16_32_64
+ (match_operand:VI1_16_32_64 1 "register_operand" "v")))]
+  "TARGET_AVX512VL && TARGET_AVX512BITALG"
+  "vpopcntb\t{%1, %0|%0, %1}")
+
+(define_insn "popcount<mode>2"
+  [(set (match_operand:VI2_32_64 0 "register_operand" "=v")
+   (popcount:VI2_32_64
+ (match_operand:VI2_32_64 1 "register_operand" "v")))]
+  "TARGET_AVX512VL && TARGET_AVX512BITALG"
+  "vpopcntw\t{%1, %0|%0, %1}")
+
+(define_insn "popcountv2si2"
+  [(set (match_operand:V2SI 0 "register_operand" "=v")
+   (popcount:V2SI
+ (match_operand:V2SI 1 "register_operand" "v")))]
+  "TARGET_AVX512VPOPCNTDQ && TARGET_AVX512VL && TARGET_MMX_WITH_SSE"
+  "vpopcntd\t{%1, %0|%0, %1}")
diff --git a/gcc/testsuite/gcc.target/i386/part-vect-popcount-1.c b/gcc/testsuite/gcc.target/i386/part-vect-popcount-1.c
new file mode 100644
index 000..a30f6ec4726
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/part-vect-popcount-1.c
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512vpopcntdq -mavx512bitalg -mavx512vl" } */
+/* { dg-final { scan-assembler-times "vpopcntd\[^\n\r\]*xmm\[0-9\]" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpopcntw\[^\n\r\]*xmm\[0-9\]" 3 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "vpopcntw\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpopcntb\[^\n\r\]*xmm\[0-9\]" 4 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "vpopcntb\[^\n\r\]*xmm\[0-9\]" 3 { target { ! ia32 } } } } */
+
+void
+foo1 (int* a, int* __restrict b)
+{
+  for (int i = 0; i != 2; i++)
+    a[i] = __builtin_popcount (b[i]);
+}
+
+void
+foo2 (unsigned short* a, unsigned short* __restrict b)
+{
+  for (int i = 0; i != 4; i++)
+    a[i] = __builtin_popcount (b[i]);
+}
+
+void
+foo3 (unsigned short* a, unsigned short* __restrict b)
+{
+  for (int i = 0; i != 2; i++)
+    a[i] = __builtin_popcount (b[i]);
+}
+
+void
+foo4 (unsigned char* a, unsigned char* __restrict b)
+{
+  for (int i = 0; i != 8; i++)
+    a[i] = __builtin_popcount (b[i]);
+}
+
+void
+foo5 (unsigned char* a, unsigned char* __restrict b)
+{
+  for (int i = 0; i != 4; i++)
+    a[i] = __builtin_popcount (b[i]);
+}
+
+void
+foo6 (unsigned char* a, unsigned char* __restrict b)
+{
+  for (int i = 0; i != 2; i++)
+    a[i] = __builtin_popcount (b[i]);
+}
-- 
2.31.1
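
What each new pattern has to compute is an independent bit count per vector element. As a rough illustration for the V4QI case (four unsigned chars, the same shape as foo5 in the test), the sketch below compares the loop against a plain bit-clearing reference. The names popcount_ref, popcount_v4qi and check_popcount_v4qi are hypothetical and not part of the patch; whether the loop is actually emitted as a single vpopcntb depends on the target flags used in the test.

```c
#include <assert.h>

/* Reference bit count: clear the lowest set bit until none remain.  */
static unsigned
popcount_ref (unsigned x)
{
  unsigned count = 0;
  while (x)
    {
      x &= x - 1;
      count++;
    }
  return count;
}

/* Same shape as foo5 in the test: four byte elements, a candidate for
   the V4QI popcount pattern when the required ISA flags are enabled.  */
void
popcount_v4qi (unsigned char *a, const unsigned char *restrict b)
{
  for (int i = 0; i != 4; i++)
    a[i] = __builtin_popcount (b[i]);
}

/* Check every lane of the loop result against the reference.  */
void
check_popcount_v4qi (const unsigned char *b)
{
  unsigned char out[4];
  popcount_v4qi (out, b);
  for (int i = 0; i != 4; i++)
    assert (out[i] == popcount_ref (b[i]));
}
```

Calling check_popcount_v4qi on any four-byte input should pass regardless of whether the compiler vectorizes the loop, since vectorization must not change the per-element result.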

[PATCH v1] RISC-V: Fix incorrect test macro for signed scalar SAT_ADD form 2 run test

2024-09-23 Thread pan2 . li

From: Pan Li 

This patch fixes an incorrect test macro usage in the run tests for
form 2 of signed scalar SAT_ADD.  They should use the _FMT_2 macros
instead of _FMT_1 for form 2.

The tests below passed for this patch.
* The rv64gcv full regression test.

It is a test-only patch and fairly obvious, so I will commit it
directly if there are no comments in the next 48H.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macro.
* gcc.target/riscv/sat_s_add-run-5.c: Take form 2 for run test.
* gcc.target/riscv/sat_s_add-run-6.c: Ditto.
* gcc.target/riscv/sat_s_add-run-7.c: Ditto.
* gcc.target/riscv/sat_s_add-run-8.c: Ditto.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h   | 2 ++
 gcc/testsuite/gcc.target/riscv/sat_s_add-run-5.c | 4 ++--
 gcc/testsuite/gcc.target/riscv/sat_s_add-run-6.c | 4 ++--
 gcc/testsuite/gcc.target/riscv/sat_s_add-run-7.c | 4 ++--
 gcc/testsuite/gcc.target/riscv/sat_s_add-run-8.c | 4 ++--
 5 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index a2617b6db70..77b5ef1807b 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -141,6 +141,8 @@ sat_s_add_##T##_fmt_2 (T x, T y) \
 return sum;  \
   return x < 0 ? MIN : MAX;  \
 }
+#define DEF_SAT_S_ADD_FMT_2_WRAP(T, UT, MIN, MAX) \
+  DEF_SAT_S_ADD_FMT_2(T, UT, MIN, MAX)
 
 #define DEF_SAT_S_ADD_FMT_3(T, UT, MIN, MAX)   \
 T __attribute__((noinline))\
diff --git a/gcc/testsuite/gcc.target/riscv/sat_s_add-run-5.c b/gcc/testsuite/gcc.target/riscv/sat_s_add-run-5.c
index 9a4ce338d0c..d57e0a0d195 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_s_add-run-5.c
+++ b/gcc/testsuite/gcc.target/riscv/sat_s_add-run-5.c
@@ -7,10 +7,10 @@
 #define T1 int8_t
 #define T2 uint8_t
 
-DEF_SAT_S_ADD_FMT_1_WRAP(T1, T2, INT8_MIN, INT8_MAX)
+DEF_SAT_S_ADD_FMT_2_WRAP(T1, T2, INT8_MIN, INT8_MAX)
 
 #define DATA TEST_BINARY_DATA_WRAP(T1, ssadd)
 #define T    TEST_BINARY_STRUCT_DECL(T1, ssadd)
-#define RUN_BINARY(x, y) RUN_SAT_S_ADD_FMT_1_WRAP(T1, x, y)
+#define RUN_BINARY(x, y) RUN_SAT_S_ADD_FMT_2_WRAP(T1, x, y)
 
 #include "scalar_sat_binary_run_xxx.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_s_add-run-6.c b/gcc/testsuite/gcc.target/riscv/sat_s_add-run-6.c
index 34459b85e2b..cdac5bdb883 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_s_add-run-6.c
+++ b/gcc/testsuite/gcc.target/riscv/sat_s_add-run-6.c
@@ -7,10 +7,10 @@
 #define T1 int16_t
 #define T2 uint16_t
 
-DEF_SAT_S_ADD_FMT_1_WRAP(T1, T2, INT16_MIN, INT16_MAX)
+DEF_SAT_S_ADD_FMT_2_WRAP(T1, T2, INT16_MIN, INT16_MAX)
 
 #define DATA TEST_BINARY_DATA_WRAP(T1, ssadd)
 #define T    TEST_BINARY_STRUCT_DECL(T1, ssadd)
-#define RUN_BINARY(x, y) RUN_SAT_S_ADD_FMT_1_WRAP(T1, x, y)
+#define RUN_BINARY(x, y) RUN_SAT_S_ADD_FMT_2_WRAP(T1, x, y)
 
 #include "scalar_sat_binary_run_xxx.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_s_add-run-7.c b/gcc/testsuite/gcc.target/riscv/sat_s_add-run-7.c
index 4d4841f4066..4ac952e27fa 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_s_add-run-7.c
+++ b/gcc/testsuite/gcc.target/riscv/sat_s_add-run-7.c
@@ -7,10 +7,10 @@
 #define T1 int32_t
 #define T2 uint32_t
 
-DEF_SAT_S_ADD_FMT_1_WRAP(T1, T2, INT32_MIN, INT32_MAX)
+DEF_SAT_S_ADD_FMT_2_WRAP(T1, T2, INT32_MIN, INT32_MAX)
 
 #define DATA TEST_BINARY_DATA_WRAP(T1, ssadd)
 #define T    TEST_BINARY_STRUCT_DECL(T1, ssadd)
-#define RUN_BINARY(x, y) RUN_SAT_S_ADD_FMT_1_WRAP(T1, x, y)
+#define RUN_BINARY(x, y) RUN_SAT_S_ADD_FMT_2_WRAP(T1, x, y)
 
 #include "scalar_sat_binary_run_xxx.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_s_add-run-8.c b/gcc/testsuite/gcc.target/riscv/sat_s_add-run-8.c
index df818879628..4d25e7f171d 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_s_add-run-8.c
+++ b/gcc/testsuite/gcc.target/riscv/sat_s_add-run-8.c
@@ -7,10 +7,10 @@
 #define T1 int64_t
 #define T2 uint64_t
 
-DEF_SAT_S_ADD_FMT_1_WRAP(T1, T2, INT64_MIN, INT64_MAX)
+DEF_SAT_S_ADD_FMT_2_WRAP(T1, T2, INT64_MIN, INT64_MAX)
 
 #define DATA TEST_BINARY_DATA_WRAP(T1, ssadd)
 #define T    TEST_BINARY_STRUCT_DECL(T1, ssadd)
-#define RUN_BINARY(x, y) RUN_SAT_S_ADD_FMT_1_WRAP(T1, x, y)
+#define RUN_BINARY(x, y) RUN_SAT_S_ADD_FMT_2_WRAP(T1, x, y)
 
 #include "scalar_sat_binary_run_xxx.h"
-- 
2.43.0
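
The reason the run tests go through a _WRAP macro is that T1 is itself a macro: the extra expansion level turns T1 into int8_t before the ## paste builds the function name. The sketch below shows this for int8_t. Only the final two lines of the form 2 body are visible in the diff context above; the rest of the body, and the definitions of RUN_SAT_S_ADD_FMT_2 and RUN_SAT_S_ADD_FMT_2_WRAP, are assumptions made for illustration and may differ from sat_arith.h. check_sat_s_add_int8 is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shape of form 2: return the wrapped sum unless both operands
   share a sign and the sum's sign differs (signed overflow).  */
#define DEF_SAT_S_ADD_FMT_2(T, UT, MIN, MAX) \
T __attribute__((noinline))                  \
sat_s_add_##T##_fmt_2 (T x, T y)             \
{                                            \
  T sum = (UT)x + (UT)y;                     \
  if ((x ^ y) < 0 || (sum ^ x) >= 0)         \
    return sum;                              \
  return x < 0 ? MIN : MAX;                  \
}
/* The wrapper expands its arguments first, so T1 becomes int8_t
   before DEF_SAT_S_ADD_FMT_2 pastes it into the function name.  */
#define DEF_SAT_S_ADD_FMT_2_WRAP(T, UT, MIN, MAX) \
  DEF_SAT_S_ADD_FMT_2(T, UT, MIN, MAX)

/* Assumed definitions of the matching run macros.  */
#define RUN_SAT_S_ADD_FMT_2(T, x, y) sat_s_add_##T##_fmt_2 (x, y)
#define RUN_SAT_S_ADD_FMT_2_WRAP(T, x, y) RUN_SAT_S_ADD_FMT_2(T, x, y)

#define T1 int8_t
#define T2 uint8_t

/* Defines sat_s_add_int8_t_fmt_2, not sat_s_add_T1_fmt_2.  */
DEF_SAT_S_ADD_FMT_2_WRAP(T1, T2, INT8_MIN, INT8_MAX)

/* A few saturation checks on the generated function.  */
void
check_sat_s_add_int8 (void)
{
  assert (RUN_SAT_S_ADD_FMT_2_WRAP (T1, 127, 1) == INT8_MAX);
  assert (RUN_SAT_S_ADD_FMT_2_WRAP (T1, -128, -1) == INT8_MIN);
  assert (RUN_SAT_S_ADD_FMT_2_WRAP (T1, -5, 10) == 5);
}
```

Because DEF and RUN must agree on the pasted suffix, defining the function with _FMT_2 while running it with _FMT_1 (or the reverse) would not exercise the intended form, which is the mismatch the patch corrects.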

Re: [PATCH 06/10] c++/modules: Detect exposures of TU-local entities

2024-09-23 Thread Nathaniel Shead

On Tue, Sep 24, 2024 at 09:45:37AM +1000, Nathaniel Shead wrote:
> I feel like there should be a way to make use of LAMBDA_TYPE_EXTRA_SCOPE to
> avoid the need for the new TYPE_DEFINED_IN_INITIALIZER_P flag, perhaps once
> something like my patch here[1] is accepted (but with further embellishments
> for concepts, probably), but I wasn't able to work it out. Since currently as
> far as I'm aware only lambdas can satisfy being a type with no name defined in
> an 'initializer' this does seem a little overkill but I've applied it to all
> class types just in case.
> 
> [1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662393.html
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> 
> -- >8 --
> 
> Currently, the modules streaming code implements some checks for
> declarations in the CMI that reference (some kinds of) internal-linkage
> entities, and errors if so.  This patch expands on that support to
> implement the logic for exposures of TU-local entities as defined in
> [basic.link] since P1815.
> 
> This will cause some code that previously errored in modules to start
> compiling; for instance, template specialisations of internal linkage
> functions.
> 
> However, some code that previously appeared valid will with this patch
> no longer compile, notably some kinds of usages of internal linkage
> functions included from the GMF.  This appears to be related to P2808
> and FR-025, however as yet there doesn't appear to be consensus for
> changing these rules so I've implemented them as-is.
> 
> This patch leaves a couple of things out.  In particular, a couple of
> the rules for what is a TU-local entity currently seem to me to be
> redundant; I've left them as FIXMEs to be handled once I can find
> testcases that aren't adequately supported by the other logic here.
> 
> Additionally, there are some exceptions for when naming a TU-local
> entity is not always an exposure; I've left support for this to a
> follow-up patch for easier review, as it has broader implications for
> streaming.
> 
> Finally, this patch makes a couple of small adjustments to the modules
> streaming logic to prune any leftover TU-local deps (that aren't
> erroneous exposures).  This is required for this patch to ensure that
> later stages don't get confused by any leftover TU-local entities
> floating around.
> 
> gcc/cp/ChangeLog:
> 
>   * cp-tree.h (TYPE_DEPENDENT_P_VALID): Fix whitespace.
>   (TYPE_DEFINED_IN_INITIALIZER_P): New accessor.
>   * module.cc (DB_IS_INTERNAL_BIT): Rename to...
>   (DB_TU_LOCAL_BIT): ...this.
>   (DB_REFS_INTERNAL_BIT): Rename to...
>   (DB_EXPOSURE_BIT): ...this.
>   (depset::hash::is_internal): Rename to...
>   (depset::hash::is_tu_local): ...this.
>   (depset::hash::refs_internal): Rename to...
>   (depset::hash::is_exposure): ...this.
>   (depset::hash::is_tu_local_entity): New function.
>   (depset::hash::has_tu_local_tmpl_arg): New function.
>   (depset::hash::is_tu_local_value): New function.
>   (depset::hash::make_dependency): Check for TU-local entities.
>   (depset::hash::add_dependency): Make current an exposure
>   whenever it references a TU-local entity.
>   (depset::hash::add_binding_entity): Don't create bindings for
>   any TU-local entity.
>   (depset::hash::finalize_dependencies): Rename flags and adjust
>   diagnostic messages to report exposures of TU-local entities.
>   (depset::tarjan::connect): Don't include any TU-local depsets.
>   (depset::hash::connect): Likewise.
>   * parser.h (struct cp_parser::in_initializer_p): New flag.
>   * parser.cc (cp_debug_parser): Print the new flag.
>   (cp_parser_new): Set the new flag to false.
>   (cp_parser_lambda_expression): Mark whether the lambda was
>   defined in an initializer.
>   (cp_parser_initializer): Set the new flag to true while parsing.
>   (cp_parser_class_head): Mark whether the class was defined in an
>   initializer.
>   (cp_parser_concept_definition): Set the new flag to true while
>   parsing.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/modules/block-decl-2.C: Adjust messages.
>   * g++.dg/modules/internal-1.C: Adjust messages, remove XFAILs.
>   * g++.dg/modules/linkage-2.C: Adjust messages, remove XFAILs.
>   * g++.dg/modules/internal-3.C: New test.
>   * g++.dg/modules/internal-4.C: New test.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/cp-tree.h|   7 +-
>  gcc/cp/module.cc| 388 +---
>  gcc/cp/parser.cc|  16 +
>  gcc/cp/parser.h |   3 +
>  gcc/testsuite/g++.dg/modules/block-decl-2.C |   2 +-
>  gcc/testsuite/g++.dg/modules/internal-1.C   |  15 +-
>  gcc/testsuite/g++.dg/modules/internal-3.C   |  18 +
>  gcc/testsuite/g++.dg/modules/internal-4.C   | 112 ++
>  gcc/testsuite/g++.dg/modules/linkage-2.C
