Re: [PATCH] Don't remove /usr/lib and /lib from when passing to the linker [PR97304/104707]

2024-08-23 Thread Gerald Pfeifer
On Thu, 22 Aug 2024, Andrew Pinski wrote:
> With newer ld, the default search library path does not include /usr/lib 
> nor /lib but the driver decides to not pass -L down to the link for 
> these and then in some/most cases libc is not found.
> This code dates from at least 1992 and it is done in a way which is not 
> safe and does not make sense. So let's remove it.
> 
> Bootstrapped and tested on x86_64-linux-gnu (which defaults to being a 
> multilib).

Also bootstrapped on x86_64-unknown-freebsd13.3 (this was originally 
reported against the earlier x86_64-unknown-freebsd12.1) on a system 
where I also ran into this in April.

> gcc/ChangeLog:
> 
>   PR driver/104707
>   PR driver/97304
> 
>   * gcc.cc (is_directory): Don't not include /usr/lib and /lib
>   for library directory pathes. Remove library argument.
>   (add_to_obstack): Update call to is_directory.
>   (driver_handle_option): Likewise.
>   (spec_path): Likewise.

For the ChangeLog, maybe use "Don't remove /usr/lib and /lib from library 
directory paths" similar to the subject?

(My brain originally "autocorrected" and contracted "Don't not"...)


Thank you for tackling this long-standing issue, which has been rearing 
its head again and again!

Gerald


Ping [PATCH 0/4] Prime path coverage in gcc/gcov

2024-08-23 Thread Jørgen Kvalsvik

Ping.

On 8/15/24 10:15, Jørgen Kvalsvik wrote:

Ping. Since the last patch I have fixed a few bugs in the path count
limit aborting, and a few minor rephrases in docs.

Jørgen Kvalsvik (4):
   testsuite: Use dg-compile, not gcc -c
   gcov: Cache source files
   gcov: branch, conds, calls in function summaries
   Add prime path coverage to gcc/gcov

  gcc/Makefile.in|6 +-
  gcc/builtins.cc|2 +-
  gcc/collect2.cc|5 +-
  gcc/common.opt |   16 +
  gcc/doc/gcov.texi  |  155 ++
  gcc/doc/invoke.texi|   36 +
  gcc/gcc.cc |4 +-
  gcc/gcov-counter.def   |3 +
  gcc/gcov-io.h  |3 +
  gcc/gcov.cc|  537 ++-
  gcc/ipa-inline.cc  |2 +-
  gcc/passes.cc  |4 +-
  gcc/path-coverage.cc   |  782 +
  gcc/prime-paths.cc | 2031 
  gcc/profile.cc |6 +-
  gcc/selftest-run-tests.cc  |1 +
  gcc/selftest.h |1 +
  gcc/testsuite/g++.dg/gcov/gcov-22.C|  170 ++
  gcc/testsuite/gcc.misc-tests/gcov-23.c |3 +-
  gcc/testsuite/gcc.misc-tests/gcov-29.c |  869 ++
  gcc/testsuite/gcc.misc-tests/gcov-30.c |  869 ++
  gcc/testsuite/gcc.misc-tests/gcov-31.c |   35 +
  gcc/testsuite/gcc.misc-tests/gcov-32.c |   24 +
  gcc/testsuite/lib/gcov.exp |   92 +-
  gcc/tree-profile.cc|   11 +-
  25 files changed, 5627 insertions(+), 40 deletions(-)
  create mode 100644 gcc/path-coverage.cc
  create mode 100644 gcc/prime-paths.cc
  create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-22.C
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-29.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-30.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-31.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-32.c





Re: [PATCH 1/3] gcov: Cache source files

2024-08-23 Thread Jan Hubicka
Hi,
> 1:4:int notmain(const char *entity)
> -: == inlined from hello.h ==
> 1:6:  if (s)
> branch  0 taken 0 (fallthrough)
> branch  1 taken 1
> #:7:printf ("hello, %s!\n", s);
> %:7-block 3
> call0 never executed
> -:8:  else
> 1:9:printf ("hello, world!\n");
> 1:9-block 4
> call0 returned 1
> 1:   10:  return 0;
> 1:   10-block 5
> -: == inlined from hello.h (end) ==
> -:5:{
> 1:6:  return hello (entity);
> 1:6-block 7
> -:7:}

This indeed looks like a reasonable goal.
> gcc/ChangeLog:
> 
>   * gcov.cc (release_structures): Release source_lines.
>   (slurp): New function.
>   (output_lines): Read sources with slurp.
> ---
>  gcc/gcov.cc | 70 -
>  1 file changed, 53 insertions(+), 17 deletions(-)
> 
> diff --git a/gcc/gcov.cc b/gcc/gcov.cc
> index e76a314041c..19019f404ee 100644
> --- a/gcc/gcov.cc
> +++ b/gcc/gcov.cc
> @@ -550,6 +550,11 @@ static vector names;
> a file being read multiple times.  */
>  static vector processed_files;
>  
> +/* The contents of a source file.  The nth SOURCE_LINES entry is the
> +   contents of the nth SOURCES, or empty if it has not or could not be
> +   read.  */
> +static vector<vector<const char *>*> source_lines;
> +
>  /* This holds data summary information.  */
>  
>  static unsigned object_runs;
> @@ -762,6 +767,8 @@ static string make_gcov_file_name (const char *, const 
> char *);
>  static char *mangle_name (const char *);
>  static void release_structures (void);
>  extern int main (int, char **);
> +static const vector<const char *>&
> +slurp (const source_info &src, FILE *gcov_file, const char *line_start);
>  
>  function_info::function_info (): m_name (NULL), m_demangled_name (NULL),
>ident (0), lineno_checksum (0), cfg_checksum (0), has_catch (0),
> @@ -1804,6 +1811,15 @@ release_structures (void)
> it != functions.end (); it++)
>  delete (*it);
>  
> +  for (vector<const char *> *lines : source_lines)
> +{
> +  if (lines)
> + for (const char *line : *lines)
> +   free (const_cast <char *> (line));
> +  delete (lines);
> +}
> +  source_lines.resize (0);
> +
>for (fnfilter &filter : filters)
>  regfree (&filter.regex);
>  
> @@ -3246,6 +3262,41 @@ read_line (FILE *file)
>return pos ? string : NULL;
>  }
>  
> +/* Get the vector with the contents SRC, possibly from a cache.  If
> +   the reading fails, a message prefixed with LINE_START is written to
> +   GCOV_FILE.  */
> +static const vector<const char *>&
> +slurp (const source_info &src, FILE *gcov_file,
> +   const char *line_start)
> +{
> +  if (source_lines.size () <= src.index)
> +source_lines.resize (src.index + 1);
> +
> +  /* Store vector pointers so that the returned references remain
> + stable and won't be broken by successive calls to slurp.  */
> +  if (!source_lines[src.index])
> +source_lines[src.index] = new vector<const char *> ();
> +
> +  if (!source_lines[src.index]->empty ())
> +return *source_lines[src.index];
> +
> +  FILE *source_file = fopen (src.name, "r");
> +  if (!source_file)
> +fnotice (stderr, "Cannot open source file %s\n", src.name);
> +  else if (src.file_time == 0)
> +fprintf (gcov_file, "%sSource is newer than graph\n", line_start);
> +
> +  const char *retval;
> +  vector<const char *> &lines = *source_lines[src.index];
> +  if (source_file)
> +while ((retval = read_line (source_file)))
> +  lines.push_back (xstrdup (retval));
> +
> +  if (source_file)
> +fclose (source_file);
> +  return lines;
So this is basically going to read all sources needed for a single
compilation unit (.cc file) into memory at once.  I guess that is OK.
I wonder if we don't want to do something like mmap the source and then
set up the source_lines array instead of reading every line into a separate
xstrdup'ed chunk?
But that could be done incrementally, so the patch is OK.

Honza
> +}
> +
>  /* Pad string S with spaces from left to have total width equal to 9.  */
>  
>  static void
> @@ -3435,9 +3486,6 @@ output_lines (FILE *gcov_file, const source_info *src)
>  #define  DEFAULT_LINE_START "-:0:"
>  #define FN_SEPARATOR "--\n"
>  
> -  FILE *source_file;
> -  const char *retval;
> -
>/* Print colorization legend.  */
>if (flag_use_colors)
>  fprintf (gcov_file, "%s",
> @@ -3464,17 +3512,8 @@ output_lines (FILE *gcov_file, const source_info *src)
>fprintf (gcov_file, DEFAULT_LINE_START "Runs:%u\n", object_runs);
>  }
>  
> -  source_file = fopen (src->name, "r");
> -  if (!source_file)
> -fnotice (stderr, "Cannot open source file %s\n", src->name);
> -  else if (src->file_time == 0)
> -fprintf (gcov_file, DEFAULT_LINE_START "Source is newer than graph\n");
> -
> -  vector source_lines;
> -  if (source_file)
> -while ((retval = read_line (source_file)) != NULL)
> -  source_lines.push_back (xstrdup (

Re: [PATCH 1/4] testsuite: Use dg-compile, not gcc -c

2024-08-23 Thread Jan Hubicka
> Since this is a pure compile test it makes sense to inform dejagnu of
> it.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.misc-tests/gcov-23.c: Use dg-compile, not gcc -c

OK,
Honza
> ---
>  gcc/testsuite/gcc.misc-tests/gcov-23.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.misc-tests/gcov-23.c 
> b/gcc/testsuite/gcc.misc-tests/gcov-23.c
> index 72849d80e3a..72ba0aa1389 100644
> --- a/gcc/testsuite/gcc.misc-tests/gcov-23.c
> +++ b/gcc/testsuite/gcc.misc-tests/gcov-23.c
> @@ -1,4 +1,5 @@
> -/* { dg-options "-fcondition-coverage -ftest-coverage -O2 -c" } */
> +/* { dg-options "-fcondition-coverage -ftest-coverage -O2" } */
> +/* { dg-do compile } */
>  
>  #include 
>  #include 
> -- 
> 2.39.2
> 


[wwwdocs,PATCH] readings: Add ANSI C89 (was: C89 question: Do we need to accept -Wint-conversion warnings)

2024-08-23 Thread Gerald Pfeifer
[ gcc@ -> gcc-patches@ ]

On Tue, 10 Oct 2023, Joseph Myers wrote:
> Implicit conversions between pointers and integers are not valid C89.
> 
> ANSI C89, as adopted as FIPS PUB 160, is available from NIST: 
> https://nvlpubs.nist.gov/nistpubs/Legacy/FIPS/fipspub160.pdf

Thanks, Joseph. I added this link to our readings page; patch below.

Gerald


commit 16dcbb39b29818cad658879292a855cb481359ce
Author: Gerald Pfeifer 
Date:   Fri Aug 23 09:41:20 2024 +0200

readings: Add ANSI C89

diff --git a/htdocs/readings.html b/htdocs/readings.html
index 1f1dbae3..423bc4c1 100644
--- a/htdocs/readings.html
+++ b/htdocs/readings.html
@@ -338,6 +338,9 @@ names.
 https://www.open-std.org/jtc1/sc22/wg14/www/docs/summary.htm";>C99
 Defect Reports
 
+https://nvlpubs.nist.gov/nistpubs/Legacy/FIPS/fipspub160.pdf";>ANSI C89,
+as adopted as FIPS PUB 160
+
 http://www.lysator.liu.se/c/rat/title.html";>C89
 Rationale (HTML)
 


Re: [Fortran, Patch, PR86468, v1] Follow up: Remove obsolete VIEW_CONVERT

2024-08-23 Thread Andre Vehreschild
Hi Steve,

thanks for the ok. Committed as gcc-15-3099-g0636de8c520

Thanks again,
Andre

On Wed, 21 Aug 2024 09:31:55 -0700
Steve Kargl  wrote:

> On Wed, Aug 21, 2024 at 12:17:46PM +0200, Andre Vehreschild wrote:
> >
> > attached small patch removes a VIEW_CONVERT that I erroneously inserted
> > during patching pr110033. PR86468 fixes the (co-)rank computation and
> > therefore this VIEW_CONVERT is IMO obsolete. I think it may cause hard to
> > find runtime bugs in the future and therefore like to remove it.
> >
> > Regtests ok on x86_64-pc-linux-gnu. Ok for mainline?
> >
>
> Yes.
>


--
Andre Vehreschild * Email: vehre ad gmx dot de


Re: [PATCH 1/3] gcov: Cache source files

2024-08-23 Thread Jørgen Kvalsvik

On 8/23/24 09:39, Jan Hubicka wrote:

Hi,

 1:4:int notmain(const char *entity)
 -: == inlined from hello.h ==
 1:6:  if (s)
branch  0 taken 0 (fallthrough)
branch  1 taken 1
 #:7:printf ("hello, %s!\n", s);
 %:7-block 3
call0 never executed
 -:8:  else
 1:9:printf ("hello, world!\n");
 1:9-block 4
call0 returned 1
 1:   10:  return 0;
 1:   10-block 5
 -: == inlined from hello.h (end) ==
 -:5:{
 1:6:  return hello (entity);
 1:6-block 7
 -:7:}


This indeed looks like a reasonable goal.

gcc/ChangeLog:

* gcov.cc (release_structures): Release source_lines.
(slurp): New function.
(output_lines): Read sources with slurp.
---
  gcc/gcov.cc | 70 -
  1 file changed, 53 insertions(+), 17 deletions(-)

diff --git a/gcc/gcov.cc b/gcc/gcov.cc
index e76a314041c..19019f404ee 100644
--- a/gcc/gcov.cc
+++ b/gcc/gcov.cc
@@ -550,6 +550,11 @@ static vector names;
 a file being read multiple times.  */
  static vector processed_files;
  
+/* The contents of a source file.  The nth SOURCE_LINES entry is the

+   contents of the nth SOURCES, or empty if it has not or could not be
+   read.  */
+static vector<vector<const char *>*> source_lines;
+
  /* This holds data summary information.  */
  
  static unsigned object_runs;

@@ -762,6 +767,8 @@ static string make_gcov_file_name (const char *, const char 
*);
  static char *mangle_name (const char *);
  static void release_structures (void);
  extern int main (int, char **);
+static const vector<const char *>&
+slurp (const source_info &src, FILE *gcov_file, const char *line_start);
  
  function_info::function_info (): m_name (NULL), m_demangled_name (NULL),

ident (0), lineno_checksum (0), cfg_checksum (0), has_catch (0),
@@ -1804,6 +1811,15 @@ release_structures (void)
 it != functions.end (); it++)
  delete (*it);
  
+  for (vector<const char *> *lines : source_lines)

+{
+  if (lines)
+   for (const char *line : *lines)
+ free (const_cast <char *> (line));
+  delete (lines);
+}
+  source_lines.resize (0);
+
for (fnfilter &filter : filters)
  regfree (&filter.regex);
  
@@ -3246,6 +3262,41 @@ read_line (FILE *file)

return pos ? string : NULL;
  }
  
+/* Get the vector with the contents SRC, possibly from a cache.  If

+   the reading fails, a message prefixed with LINE_START is written to
+   GCOV_FILE.  */
+static const vector<const char *>&
+slurp (const source_info &src, FILE *gcov_file,
+   const char *line_start)
+{
+  if (source_lines.size () <= src.index)
+source_lines.resize (src.index + 1);
+
+  /* Store vector pointers so that the returned references remain
+ stable and won't be broken by successive calls to slurp.  */
+  if (!source_lines[src.index])
+source_lines[src.index] = new vector<const char *> ();
+
+  if (!source_lines[src.index]->empty ())
+return *source_lines[src.index];
+
+  FILE *source_file = fopen (src.name, "r");
+  if (!source_file)
+fnotice (stderr, "Cannot open source file %s\n", src.name);
+  else if (src.file_time == 0)
+fprintf (gcov_file, "%sSource is newer than graph\n", line_start);
+
+  const char *retval;
+  vector<const char *> &lines = *source_lines[src.index];
+  if (source_file)
+while ((retval = read_line (source_file)))
+  lines.push_back (xstrdup (retval));
+
+  if (source_file)
+fclose (source_file);
+  return lines;

> So this is basically going to read all sources needed for a single
> compilation unit (.cc file) into memory at once.  I guess that is OK.
> I wonder if we don't want to do something like mmap the source and then
> set up the source_lines array instead of reading every line into a separate
> xstrdup'ed chunk?
> But that could be done incrementally, so the patch is OK.



Yes, which is what it did anyway (if not on purpose). Moving to mmap 
might also be a nice change, and maybe use a separate string_view like 
index to easily move between the lines. But as you say, that is an 
incremental change.


Thanks,
Jørgen



Honza

+}
+
  /* Pad string S with spaces from left to have total width equal to 9.  */
  
  static void

@@ -3435,9 +3486,6 @@ output_lines (FILE *gcov_file, const source_info *src)
  #define  DEFAULT_LINE_START "-:0:"
  #define FN_SEPARATOR "--\n"
  
-  FILE *source_file;

-  const char *retval;
-
/* Print colorization legend.  */
if (flag_use_colors)
  fprintf (gcov_file, "%s",
@@ -3464,17 +3512,8 @@ output_lines (FILE *gcov_file, const source_info *src)
fprintf (gcov_file, DEFAULT_LINE_START "Runs:%u\n", object_runs);
  }
  
-  source_file = fopen (src->name, "r");

-  if (!source_file)
-fnotice (stderr, "Cannot open source file %s\n", src->name);
-  else if (src->file_time == 0)
-fprintf (gcov_file, DEFAULT_LINE_START "Source is newer than graph\n");
-
-  vector source_lines;
-  if (source_file)
-  

Re: [RFC/RFA][PATCH v4 06/12] aarch64: Implement new expander for efficient CRC computation

2024-08-23 Thread Mariam Arutunian
On Wed, Aug 21, 2024 at 5:56 PM Richard Sandiford 
wrote:

> Mariam Arutunian  writes:
> > This patch introduces two new expanders for the aarch64 backend,
> > dedicated to generate optimized code for CRC computations.
> > The new expanders are designed to leverage specific hardware capabilities
> > to achieve faster CRC calculations,
> > particularly using the crc32, crc32c and pmull instructions when supported
> > by the target architecture.
> >
> > Expander 1: Bit-Forward CRC (crc4)
> > For targets that support pmul instruction (TARGET_AES),
> > the expander will generate code that uses the pmull (crypto_pmulldi)
> > instruction for CRC computation.
> >
> > Expander 2: Bit-Reversed CRC (crc_rev4)
> > The expander first checks if the target supports the CRC32* instruction set
> > (TARGET_CRC32)
> > and the polynomial in use is 0x1EDC6F41 (iSCSI) or 0x04C11DB7 (HDLC). If
> > the conditions are met,
> > it emits calls to the corresponding crc32* instruction (depending on the
> > data size and the polynomial).
> > If the target does not support crc32* but supports pmull, it then uses the
> > pmull (crypto_pmulldi) instruction for bit-reversed CRC computation.
> > Otherwise table-based CRC is generated.
> >
> >   gcc/config/aarch64/
> >
> > * aarch64-protos.h (aarch64_expand_crc_using_pmull): New extern
> > function declaration.
> > (aarch64_expand_reversed_crc_using_pmull):  Likewise.
> > * aarch64.cc (aarch64_expand_crc_using_pmull): New function.
> > (aarch64_expand_reversed_crc_using_pmull):  Likewise.
> > * aarch64.md (crc_rev4): New expander for
> > reversed CRC.
> > (crc4): New expander for bit-forward CRC.
> > * iterators.md (crc_data_type): New mode attribute.
> >
> >   gcc/testsuite/gcc.target/aarch64/
> >
> > * crc-1-pmul.c: New test.
> > * crc-10-pmul.c: Likewise.
> > * crc-12-pmul.c: Likewise.
> > * crc-13-pmul.c: Likewise.
> > * crc-14-pmul.c: Likewise.
> > * crc-17-pmul.c: Likewise.
> > * crc-18-pmul.c: Likewise.
> > * crc-21-pmul.c: Likewise.
> > * crc-22-pmul.c: Likewise.
> > * crc-23-pmul.c: Likewise.
> > * crc-4-pmul.c: Likewise.
> > * crc-5-pmul.c: Likewise.
> > * crc-6-pmul.c: Likewise.
> > * crc-7-pmul.c: Likewise.
> > * crc-8-pmul.c: Likewise.
> > * crc-9-pmul.c: Likewise.
> > * crc-CCIT-data16-pmul.c: Likewise.
> > * crc-CCIT-data8-pmul.c: Likewise.
> > * crc-coremark-16bitdata-pmul.c: Likewise.
> > * crc-crc32-data16.c: Likewise.
> > * crc-crc32-data32.c: Likewise.
> > * crc-crc32-data8.c: Likewise.
> > * crc-crc32c-data16.c: Likewise.
> > * crc-crc32c-data32.c: Likewise.
> > * crc-crc32c-data8.c: Likewise.
>
> OK for trunk once the prerequisites are approved.  Thanks for all your
> work on this.
>
> Which other parts of the series still need review?  I can try to help
> out with the target-independent bits.  (That said, I'm not sure I'm the
> best person to review the tree recognition pass, but I can have a go.)
>
>
Thank you very much for everything.
Right now, I'm not sure which parts would be best to be reviewed since
Richard Biener is currently reviewing them.
Maybe I can ask for your help later?

Thanks,
Mariam
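
For readers following the thread, the two CRC forms named in the quoted
patch description can be sketched in plain C++. This is only a bitwise
reference computation, not the crc32*/pmull code the expanders emit; the
function names are hypothetical, and the 0xFFFFFFFF initial value and final
XOR mentioned afterwards are the common CRC-32 conventions, assumed here
rather than taken from the patch:

```cpp
#include <cstddef>
#include <cstdint>

// Bit-forward (MSB-first) CRC-32 update.  POLY is the polynomial without
// its leading 1, for example 0x04C11DB7.
inline std::uint32_t
crc32_forward (std::uint32_t crc, const unsigned char *data,
               std::size_t len, std::uint32_t poly)
{
  for (std::size_t i = 0; i < len; i++)
    {
      crc ^= static_cast<std::uint32_t> (data[i]) << 24;
      for (int bit = 0; bit < 8; bit++)
        crc = (crc & 0x80000000u) ? (crc << 1) ^ poly : crc << 1;
    }
  return crc;
}

// Bit-reversed (LSB-first) CRC-32 update.  REV_POLY is the same
// polynomial with its 32 bits mirrored.
inline std::uint32_t
crc32_reversed (std::uint32_t crc, const unsigned char *data,
                std::size_t len, std::uint32_t rev_poly)
{
  for (std::size_t i = 0; i < len; i++)
    {
      crc ^= data[i];
      for (int bit = 0; bit < 8; bit++)
        crc = (crc & 1u) ? (crc >> 1) ^ rev_poly : crc >> 1;
    }
  return crc;
}

// Mirror the 32 bits of V, converting a forward polynomial to its
// reversed form and back.
inline std::uint32_t
reflect32 (std::uint32_t v)
{
  std::uint32_t r = 0;
  for (int i = 0; i < 32; i++, v >>= 1)
    r = (r << 1) | (v & 1u);
  return r;
}
```

With those conventions, crc32_reversed over the ASCII string "123456789"
with reflect32 (0x04C11DB7), initial value 0xFFFFFFFF and a final XOR with
0xFFFFFFFF gives the familiar CRC-32 check value 0xCBF43926.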

> Richard
>
> >
> > Signed-off-by: Mariam Arutunian 
> > Co-authored-by: Richard Sandiford 
> > diff --git a/gcc/config/aarch64/aarch64-protos.h
> b/gcc/config/aarch64/aarch64-protos.h
> > index 42639e9efcf..469111e3b17 100644
> > --- a/gcc/config/aarch64/aarch64-protos.h
> > +++ b/gcc/config/aarch64/aarch64-protos.h
> > @@ -1112,5 +1112,8 @@ extern void aarch64_adjust_reg_alloc_order ();
> >
> >  bool aarch64_optimize_mode_switching (aarch64_mode_entity);
> >  void aarch64_restore_za (rtx);
> > +void aarch64_expand_crc_using_pmull (scalar_mode, scalar_mode, rtx *);
> > +void aarch64_expand_reversed_crc_using_pmull (scalar_mode, scalar_mode,
> rtx *);
> > +
> >
> >  #endif /* GCC_AARCH64_PROTOS_H */
> > diff --git a/gcc/config/aarch64/aarch64.cc
> b/gcc/config/aarch64/aarch64.cc
> > index 7f0cc47d0f0..0cb8f3e8090 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -30314,6 +30314,137 @@ aarch64_retrieve_sysreg (const char *regname,
> bool write_p, bool is128op)
> >return sysreg->encoding;
> >  }
> >
> > +/* Generate assembly to calculate CRC
> > +   using carry-less multiplication instruction.
> > +   OPERANDS[1] is input CRC,
> > +   OPERANDS[2] is data (message),
> > +   OPERANDS[3] is the polynomial without the leading 1.  */
> > +
> > +void
> > +aarch64_expand_crc_using_pmull (scalar_mode crc_mode,
> > + scalar_mode data_mode,
> > + rtx *operands)
> > +{
> > +  /* Check and keep arguments.  */
> > +  gcc_assert (!CONST_INT_P (operands[0]));
> > +  gcc_assert (CONST_INT_P (operands[3]));
> > +  rtx crc = operands[1];
> > +  rtx data = operands[2];
> > +  rtx polynomial = operands[3];
> > +
> > +  unsigned HOST_WIDE_INT crc_size = GET_MOD

[pushed] fortran: Minor fix to -ffrontend-optimize description (was: typo on homepage)

2024-08-23 Thread Gerald Pfeifer
On Mon, 8 Apr 2024, Johannes Nendwich via Gcc wrote:
> on https://gcc.gnu.org/onlinedocs/gfortran/Code-Gen-Options.html
> there is at the end the part
> 
>-ffrontend-optimize
> 
>This option performs front-end optimization, based on manipulating
> parts the Fortran parse tree.
> 
> Might it be that it should say "... manipulating parts _of_ the Fortran 
> parse tree."?

Yes, I believe you're right, so I went ahead and pushed the following 
change.

Thank you,
Gerald


commit a071fcda136d00f8321d0adc773007f4f45020ea
Author: Gerald Pfeifer 
Date:   Fri Aug 23 10:02:15 2024 +0200

fortran: Minor fix to -ffrontend-optimize description

gcc/fortran:
* invoke.texi (Code Gen Options): Add a missing word.

diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
index 6bc42afe2c4..8b3f8118848 100644
--- a/gcc/fortran/invoke.texi
+++ b/gcc/fortran/invoke.texi
@@ -2117,7 +2117,7 @@ if @option{-ffrontend-optimize} is in effect.
 @cindex Front-end optimization
 @item -ffrontend-optimize
 This option performs front-end optimization, based on manipulating
-parts the Fortran parse tree.  Enabled by default by any @option{-O} option
+parts of the Fortran parse tree.  Enabled by default by any @option{-O} option
 except @option{-O0} and @option{-Og}.  Optimizations enabled by this option
 include:
 @itemize @bullet


Re: [patch][v2a] libgomp: Add interop types and routines to OpenMP's headers and module

2024-08-23 Thread Andre Vehreschild
Hi Tobias,

I just had a short look at your PR. Besides that it did not git-am for me (see
below), I have only one question (see also below). Please note, that I have
only user-level experience in OpenMP and can say nothing about completeness or
soundness of your PR. I hope that a first check on overall style motivates a
"pro" to have a more in-depth look.

So my "ok" is just for style and overall applicability.

Regards,
Andre

On Thu, 22 Aug 2024 09:14:58 +0200
Tobias Burnus  wrote:

> This is nearly identical to v2, except that I presumably used 'git add 
> testsuite' when intending to use 'git add -u testsuite' in a last-minute 
> change as it contained a bunch of unrelated test files …
> 
> The only other change besides removing unrelated files is that for the 
> generic part of omp_get_interop_type_desc, the data types ('int' for 
> fr_id, vendor, device_num; 'const char*' for fr_name, vendor_name) are 
> now returned in target.c while the specific types (for device, 
> device_context, targetsync platform) will eventually be handled by the 
> plugin function.
> 
> Tobias
> 
> Am 21.08.24 um 20:27 schrieb Tobias Burnus:
> > Nearly identical to v1, except that I realized that OpenMP permits to 
> > call those functions also from target regions.
> >
> > Hence, those also got those functions, including a use of 
> > omp_irc_other to make clear why it will fail …
> >
> > In addition, two (nonhost) target-region test files were added.
> >
> > Comments, remarks, suggestions before I commit it?


Attachment: 

> libgomp: Add interop types and routines to OpenMP's headers and module

git am did not work for me (output translated from German):
$ git am interop-1v2a.diff
Applying: This commit adds OpenMP 5.1+'s interop enumeration, type and routine
/mnt/work_store/gcc/gcc/.git/worktrees/gcc.test/rebase-apply/patch:839: indent with spaces.
 "const char*", /* fr_name */
/mnt/work_store/gcc/gcc/.git/worktrees/gcc.test/rebase-apply/patch:840: indent with spaces.
 "int", /* vendor */
/mnt/work_store/gcc/gcc/.git/worktrees/gcc.test/rebase-apply/patch:841: indent with spaces.
 "const char *",/* vendor_name */
/mnt/work_store/gcc/gcc/.git/worktrees/gcc.test/rebase-apply/patch:842: indent with spaces.
 "int"};/* device_num */
/mnt/work_store/gcc/gcc/.git/worktrees/gcc.test/rebase-apply/patch:1332: space before tab in indent.
"omp_interop_none"));  /* GCC implementation choice.  */
warning: 5 lines add whitespace errors.
fatal: empty ident name (for <>) not allowed

> diff --git a/libgomp/config/gcn/target.c b/libgomp/config/gcn/target.c
> index 9cafea4e2cc..e9141f20ef3 100644
> --- a/libgomp/config/gcn/target.c
> +++ b/libgomp/config/gcn/target.c
> @@ -185,3 +185,102 @@ GOMP_target_enter_exit_data (int device, size_t mapnum,



> +omp_intptr_t
> +omp_get_interop_int (const omp_interop_t interop,
> +  omp_interop_property_t property_id,
> +  omp_interop_rc_t *ret_code)
> +{
> +  if (property_id < omp_ipr_first || property_id >= 0)
> +*ret_code = omp_irc_out_of_range;
> +  else if (interop == omp_interop_none)
> +*ret_code = omp_irc_empty;
> +  else
> +*ret_code = omp_irc_other;
> +  return 0;

Do I get this correct, that omp_intptr_t is a pointer to an integer? Would it
not be more intuitive to return nullptr here?

> +}
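
To make the question concrete, here is a small sketch of the quoted stub.
It rests on the assumption that omp_intptr_t, like intptr_t, is a signed
integer type wide enough to hold a pointer rather than a pointer type; that
is how I read the OpenMP specification, but it should be checked against
omp.h. All names in namespace sketch are hypothetical stand-ins, the
enumerator values are assumptions, and the null check on ret_code is my
addition (the quoted code dereferences it unconditionally):

```cpp
#include <cstdint>
#include <type_traits>

namespace sketch {

// Assumed stand-in for omp_intptr_t: an integer type, not a pointer.
using intptr_like = std::intptr_t;
static_assert (std::is_integral<intptr_like>::value,
               "an intptr-style type is an integer type");

// Assumed values: property ids are negative, ipr_first is the lowest.
enum property : int { ipr_fr_id = -1, ipr_first = -9 };

// Assumed return-code values, for illustration only.
enum rc : int
{
  irc_success = 0,
  irc_empty = -1,
  irc_out_of_range = -2,
  irc_other = -6
};

using interop = std::intptr_t;
constexpr interop interop_none = 0;

// Mirrors the quoted device-side stub: it never yields a value, it only
// reports why the query failed.
inline intptr_like
get_interop_int (interop obj, int property_id, rc *ret_code)
{
  rc code;
  if (property_id < ipr_first || property_id >= 0)
    code = irc_out_of_range;
  else if (obj == interop_none)
    code = irc_empty;
  else
    code = irc_other;
  if (ret_code)
    *ret_code = code;
  return 0;   // Integer zero; nullptr would not convert to an integer type.
}

} // namespace sketch
```

Under that assumption "return 0" is the natural spelling, since the result
is an integer that merely has pointer width.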




-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


[PATCH v3 00/10] fortran: Inline MINLOC/MAXLOC without DIM argument [PR90608]

2024-08-23 Thread Mikael Morin
From: Mikael Morin 

Hello,

this is the third version of the inline MINLOC/MAXLOC without DIM patchset
whose second version was posted before at:
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660599.html

Compared to the previous version, it contains a change of wording of the
documentation in patch 10.  The rest is rebased on a recent master
without any other change (as announced by Harald, there was a minor
conflict for patch 4/10 as a result of Andre's corank patch).

I need an ack for the newly added option in patch 10.

This series of patches enables the generation of inline code for the MINLOC
and MAXLOC intrinsics, when the DIM argument is not present.  The
generated code is based on the inline implementation already generated in
the scalar case, that is when ARRAY has rank 1 and DIM is present.  The
code is extended by using several variables (one for each dimension) where
the scalar code used just one, and collecting the variables to an array
before returning.
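
As a rough illustration of the shape of that code (written in C++ rather
than as the trees the compiler actually builds), a rank-2 MINLOC without
DIM or MASK could look like the sketch below.  The function name and the
column-major std::vector layout are assumptions made for the example:

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Sketch of MINLOC (ARRAY) for a rank-2 integer array stored in
// column-major order with ROWS x COLS elements.  Returns the 1-based
// position of the first minimum in array element order, or {0, 0} if
// the array has no elements.
inline std::array<int, 2>
minloc2 (const std::vector<int> &a, int rows, int cols)
{
  int pos0 = 0, pos1 = 0;   // One position variable per dimension.
  bool found = false;
  int limit = 0;

  for (int j = 0; j < cols; j++)
    for (int i = 0; i < rows; i++)
      {
        int v = a[static_cast<std::size_t> (j) * rows + i];
        if (!found || v < limit)
          {
            limit = v;
            pos0 = i + 1;
            pos1 = j + 1;
            found = true;
          }
      }

  // Collect the per-dimension variables into the array result.
  return {pos0, pos1};
}
```

The rank-1 scalar case corresponds to keeping only one of the position
variables and returning it directly.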

The patches are split in a way that allows inlining in more and more cases
as controlled by the gfc_inline_intrinsic_p predicate which evolves with
the patches.  The last patch (10/10) adds a flag to control inlining
from the command line.

v2 -> v3 changes:

 - In patch 10/10, rework the documentation:
   Add a summary statement as first sentence of the new flag documentation.
   Reword the documentation of the flag limitations to omit the ambiguous
   "scalar" qualification of intrinsic function results.
   Incorporate the sentence suggested during review.

v1 -> v2 changes:

 - In patch 1/10, use intrinsic ieee_arithmetic module to get NAN values in
   tests.  This required to split the tests using ieee_arithmetic to
   a separate file in the ieee/ subdirectory.

 - Add patch 4/10 removing the frontend minmaxloc pass.

 - Add patch 10/10 adding -finline-intrinsics flag to control MINLOC/MAXLOC
   inlining from the command line.

Mikael Morin (10):
  fortran: Add tests covering inline MINLOC/MAXLOC without DIM [PR90608]
  fortran: Disable frontend passes for inlinable MINLOC/MAXLOC [PR90608]
  fortran: Inline MINLOC/MAXLOC with no DIM and ARRAY of rank 1
[PR90608]
  fortran: Remove MINLOC/MAXLOC frontend optimization
  fortran: Outline array bound check generation code
  fortran: Inline integral MINLOC/MAXLOC with no DIM and no MASK
[PR90608]
  fortran: Inline integral MINLOC/MAXLOC with no DIM and scalar MASK
[PR90608]
  fortran: Inline non-character MINLOC/MAXLOC with no DIM [PR90608]
  fortran: Continue MINLOC/MAXLOC second loop where the first stopped
[PR90608]
  fortran: Add -finline-intrinsics flag for MINLOC/MAXLOC [PR90608]

 gcc/flag-types.h  |  30 +
 gcc/fortran/frontend-passes.cc|  57 --
 gcc/fortran/invoke.texi   |  31 +
 gcc/fortran/lang.opt  |  27 +
 gcc/fortran/options.cc|  21 +-
 gcc/fortran/trans-array.cc| 382 +
 gcc/fortran/trans-intrinsic.cc| 489 ---
 .../gfortran.dg/ieee/maxloc_nan_1.f90 |  44 +
 .../gfortran.dg/ieee/minloc_nan_1.f90 |  44 +
 gcc/testsuite/gfortran.dg/maxloc_7.f90| 208 +
 gcc/testsuite/gfortran.dg/maxloc_bounds_4.f90 |   4 +-
 gcc/testsuite/gfortran.dg/maxloc_bounds_5.f90 |   4 +-
 gcc/testsuite/gfortran.dg/maxloc_bounds_6.f90 |   4 +-
 gcc/testsuite/gfortran.dg/maxloc_bounds_7.f90 |   4 +-
 .../gfortran.dg/maxloc_with_mask_1.f90| 373 +
 gcc/testsuite/gfortran.dg/minloc_8.f90| 208 +
 .../gfortran.dg/minloc_with_mask_1.f90| 372 +
 gcc/testsuite/gfortran.dg/minmaxloc_18.f90| 772 ++
 gcc/testsuite/gfortran.dg/minmaxloc_18a.f90   |  10 +
 gcc/testsuite/gfortran.dg/minmaxloc_18b.f90   |  10 +
 gcc/testsuite/gfortran.dg/minmaxloc_18c.f90   |  10 +
 gcc/testsuite/gfortran.dg/minmaxloc_18d.f90   |  10 +
 22 files changed, 2767 insertions(+), 347 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/ieee/maxloc_nan_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/ieee/minloc_nan_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/maxloc_7.f90
 create mode 100644 gcc/testsuite/gfortran.dg/maxloc_with_mask_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minloc_8.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minloc_with_mask_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18a.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18b.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18c.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18d.f90

-- 
2.43.0



[PATCH v3 04/10] fortran: Remove MINLOC/MAXLOC frontend optimization

2024-08-23 Thread Mikael Morin
From: Mikael Morin 

Remove the frontend pass rewriting calls of MINLOC/MAXLOC without DIM to
calls with one-valued DIM enclosed in an array constructor.  This
transformation was circumventing the limitation of inline MINLOC/MAXLOC code
generation to scalar cases only, allowing inline code to be generated if
ARRAY had rank 1 and DIM was absent.  As MINLOC/MAXLOC has gained support of
inline code generation in that case, the limitation is no longer effective,
and the transformation no longer necessary.

gcc/fortran/ChangeLog:

* frontend-passes.cc (optimize_minmaxloc): Remove.
(optimize_expr): Remove dispatch to optimize_minmaxloc.
---
 gcc/fortran/frontend-passes.cc | 58 --
 1 file changed, 58 deletions(-)

diff --git a/gcc/fortran/frontend-passes.cc b/gcc/fortran/frontend-passes.cc
index f7f49eea617..c7cb9d2a389 100644
--- a/gcc/fortran/frontend-passes.cc
+++ b/gcc/fortran/frontend-passes.cc
@@ -36,7 +36,6 @@ static bool optimize_op (gfc_expr *);
 static bool optimize_comparison (gfc_expr *, gfc_intrinsic_op);
 static bool optimize_trim (gfc_expr *);
 static bool optimize_lexical_comparison (gfc_expr *);
-static void optimize_minmaxloc (gfc_expr **);
 static bool is_empty_string (gfc_expr *e);
 static void doloop_warn (gfc_namespace *);
 static int do_intent (gfc_expr **);
@@ -356,17 +355,6 @@ optimize_expr (gfc_expr **e, int *walk_subtrees 
ATTRIBUTE_UNUSED,
   if ((*e)->expr_type == EXPR_OP && optimize_op (*e))
 gfc_simplify_expr (*e, 0);
 
-  if ((*e)->expr_type == EXPR_FUNCTION && (*e)->value.function.isym)
-switch ((*e)->value.function.isym->id)
-  {
-  case GFC_ISYM_MINLOC:
-  case GFC_ISYM_MAXLOC:
-   optimize_minmaxloc (e);
-   break;
-  default:
-   break;
-  }
-
   if (function_expr)
 count_arglist --;
 
@@ -2266,52 +2254,6 @@ optimize_trim (gfc_expr *e)
   return true;
 }
 
-/* Optimize minloc(b), where b is rank 1 array, into
-   (/ minloc(b, dim=1) /), and similarly for maxloc,
-   as the latter forms are expanded inline.  */
-
-static void
-optimize_minmaxloc (gfc_expr **e)
-{
-  gfc_expr *fn = *e;
-  gfc_actual_arglist *a;
-  char *name, *p;
-
-  if (fn->rank != 1
-  || fn->value.function.actual == NULL
-  || fn->value.function.actual->expr == NULL
-  || fn->value.function.actual->expr->ts.type == BT_CHARACTER
-  || fn->value.function.actual->expr->rank != 1
-  || gfc_inline_intrinsic_function_p (fn))
-return;
-
-  *e = gfc_get_array_expr (fn->ts.type, fn->ts.kind, &fn->where);
-  (*e)->shape = fn->shape;
-  fn->rank = 0;
-  fn->corank = 0;
-  fn->shape = NULL;
-  gfc_constructor_append_expr (&(*e)->value.constructor, fn, &fn->where);
-
-  name = XALLOCAVEC (char, strlen (fn->value.function.name) + 1);
-  strcpy (name, fn->value.function.name);
-  p = strstr (name, "loc0");
-  p[3] = '1';
-  fn->value.function.name = gfc_get_string ("%s", name);
-  if (fn->value.function.actual->next)
-{
-  a = fn->value.function.actual->next;
-  gcc_assert (a->expr == NULL);
-}
-  else
-{
-  a = gfc_get_actual_arglist ();
-  fn->value.function.actual->next = a;
-}
-  a->expr = gfc_get_constant_expr (BT_INTEGER, gfc_default_integer_kind,
-  &fn->where);
-  mpz_set_ui (a->expr->value.integer, 1);
-}
-
 /* Data package to hand down for DO loop checks in a contained
procedure.  */
 typedef struct contained_info
-- 
2.43.0



[PATCH v3 07/10] fortran: Inline integral MINLOC/MAXLOC with no DIM and scalar MASK [PR90608]

2024-08-23 Thread Mikael Morin
From: Mikael Morin 

Enable the generation of inline code for MINLOC/MAXLOC when argument ARRAY
is of integral type, DIM is not present, and MASK is present and is scalar
(only absent MASK or rank 1 ARRAY were inlined before).

Scalar masks are implemented with a wrapping condition around the code one
would generate if MASK wasn't present, so they are easy to support once
inline code without MASK is working.
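
As an illustration only, the wrapping condition can be sketched in plain C++
as below.  This is not the code GCC generates; the names Matrix,
minloc_no_mask and minloc_scalar_mask are hypothetical, the array is
0-based, and the loop order is an arbitrary choice for the sketch.  The else
branch zeroes one position variable per dimension, which corresponds to the
loop over pos[i] added by this patch.

```cpp
#include <array>
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Hypothetical helper (not GCC code): 1-based position of the first minimum
// of a rank-2 integer array in loop order, or {0, 0} if the array is empty.
std::array<int, 2> minloc_no_mask(const Matrix &a)
{
  std::array<int, 2> pos = {0, 0};
  int limit = INT_MAX;
  for (std::size_t i = 0; i < a.size(); i++)
    for (std::size_t j = 0; j < a[i].size(); j++)
      if (pos[0] == 0 || a[i][j] < limit)
        {
          limit = a[i][j];
          pos[0] = static_cast<int>(i) + 1;
          pos[1] = static_cast<int>(j) + 1;
        }
  return pos;
}

// Scalar MASK: the unmasked code is wrapped in a single condition.  When the
// mask is false, every per-dimension position is set to zero.
std::array<int, 2> minloc_scalar_mask(const Matrix &a, bool mask)
{
  std::array<int, 2> pos;
  if (mask)
    pos = minloc_no_mask(a);
  else
    for (std::size_t n = 0; n < pos.size(); n++)
      pos[n] = 0;
  return pos;
}
```

With a false mask the array is never read, so the cost of the masked variant
over the unmasked one is a single test.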

PR fortran/90608

gcc/fortran/ChangeLog:

* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Generate
variable initialization for each dimension in the else branch of
the toplevel condition.
(gfc_inline_intrinsic_function_p): Return TRUE for scalar MASK.

gcc/testsuite/ChangeLog:

* gfortran.dg/maxloc_bounds_7.f90: Additionally accept the error message
reported by the scalarizer.
---
 gcc/fortran/trans-intrinsic.cc| 13 -
 gcc/testsuite/gfortran.dg/maxloc_bounds_7.f90 |  4 ++--
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index 1215cd89630..45f5a7b6977 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -5917,7 +5917,6 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, enum tree_code op)
   /* For a scalar mask, enclose the loop in an if statement.  */
   if (maskexpr && maskss == NULL)
 {
-  gcc_assert (loop.dimen == 1);
   tree ifmask;
 
   gfc_init_se (&maskse, NULL);
@@ -5932,7 +5931,8 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, enum tree_code op)
 the pos variable the same way as above.  */
 
   gfc_init_block (&elseblock);
-  gfc_add_modify (&elseblock, pos[0], gfc_index_zero_node);
+  for (int i = 0; i < loop.dimen; i++)
+   gfc_add_modify (&elseblock, pos[i], gfc_index_zero_node);
   elsetmp = gfc_finish_block (&elseblock);
   ifmask = conv_mask_condition (&maskse, maskexpr, optional_mask);
   tmp = build3_v (COND_EXPR, ifmask, tmp, elsetmp);
@@ -11833,9 +11833,12 @@ gfc_inline_intrinsic_function_p (gfc_expr *expr)
if (array->rank == 1)
  return true;
 
-   if (array->ts.type == BT_INTEGER
-   && dim == nullptr
-   && mask == nullptr)
+   if (array->ts.type != BT_INTEGER
+   || dim != nullptr)
+ return false;
+
+   if (mask == nullptr
+   || mask->rank == 0)
  return true;
 
return false;
diff --git a/gcc/testsuite/gfortran.dg/maxloc_bounds_7.f90 b/gcc/testsuite/gfortran.dg/maxloc_bounds_7.f90
index 206a29b149d..3aa9d3dcebe 100644
--- a/gcc/testsuite/gfortran.dg/maxloc_bounds_7.f90
+++ b/gcc/testsuite/gfortran.dg/maxloc_bounds_7.f90
@@ -1,6 +1,6 @@
 ! { dg-do run }
 ! { dg-options "-fbounds-check" }
-! { dg-shouldfail "Incorrect extent in return value of MAXLOC intrinsic: is 3, should be 2" }
+! { dg-shouldfail "Incorrect extent in return value of MAXLOC intrinsic: is 3, should be 2|Array bound mismatch for dimension 1 of array 'res' .3/2." }
 module tst
 contains
   subroutine foo(res)
@@ -18,4 +18,4 @@ program main
   integer :: res(3)
   call foo(res)
 end program main
-! { dg-output "Fortran runtime error: Incorrect extent in return value of MAXLOC intrinsic: is 3, should be 2" }
+! { dg-output "Fortran runtime error: Incorrect extent in return value of MAXLOC intrinsic: is 3, should be 2|Array bound mismatch for dimension 1 of array 'res' .3/2." }
-- 
2.43.0



[PATCH v3 05/10] fortran: Outline array bound check generation code

2024-08-23 Thread Mikael Morin
From: Mikael Morin 

The next patch will need reindenting of the array bound check generation
code.  This outlines it to its own function beforehand, reducing the churn
in the next patch.

-- >8 --

gcc/fortran/ChangeLog:

* trans-array.cc (gfc_conv_ss_startstride): Move array bound check
generation code...
(add_check_section_in_array_bounds): ... here as a new function.
---
 gcc/fortran/trans-array.cc | 297 ++---
 1 file changed, 143 insertions(+), 154 deletions(-)

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 3c4831b6089..bc5f5900c6a 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -4816,6 +4816,146 @@ gfc_conv_section_startstride (stmtblock_t * block, gfc_ss * ss, int dim)
 }
 
 
+/* Generate in INNER the bounds checking code along the dimension DIM for
+   the array associated with SS_INFO.  */
+
+static void
+add_check_section_in_array_bounds (stmtblock_t *inner, gfc_ss_info *ss_info,
+  int dim)
+{
+  gfc_expr *expr = ss_info->expr;
+  locus *expr_loc = &expr->where;
+  const char *expr_name = expr->symtree->name;
+
+  gfc_array_info *info = &ss_info->data.array;
+
+  bool check_upper;
+  if (dim == info->ref->u.ar.dimen - 1
+  && info->ref->u.ar.as->type == AS_ASSUMED_SIZE)
+check_upper = false;
+  else
+check_upper = true;
+
+  /* Zero stride is not allowed.  */
+  tree tmp = fold_build2_loc (input_location, EQ_EXPR, logical_type_node,
+ info->stride[dim], gfc_index_zero_node);
+  char * msg = xasprintf ("Zero stride is not allowed, for dimension %d "
+ "of array '%s'", dim + 1, expr_name);
+  gfc_trans_runtime_check (true, false, tmp, inner, expr_loc, msg);
+  free (msg);
+
+  tree desc = info->descriptor;
+
+  /* This is the run-time equivalent of resolve.cc's
+ check_dimension.  The logical is more readable there
+ than it is here, with all the trees.  */
+  tree lbound = gfc_conv_array_lbound (desc, dim);
+  tree end = info->end[dim];
+  tree ubound = check_upper ? gfc_conv_array_ubound (desc, dim) : NULL_TREE;
+
+  /* non_zerosized is true when the selected range is not
+ empty.  */
+  tree stride_pos = fold_build2_loc (input_location, GT_EXPR, logical_type_node,
+info->stride[dim], gfc_index_zero_node);
+  tmp = fold_build2_loc (input_location, LE_EXPR, logical_type_node,
+info->start[dim], end);
+  stride_pos = fold_build2_loc (input_location, TRUTH_AND_EXPR,
+   logical_type_node, stride_pos, tmp);
+
+  tree stride_neg = fold_build2_loc (input_location, LT_EXPR, logical_type_node,
+info->stride[dim], gfc_index_zero_node);
+  tmp = fold_build2_loc (input_location, GE_EXPR, logical_type_node,
+info->start[dim], end);
+  stride_neg = fold_build2_loc (input_location, TRUTH_AND_EXPR,
+   logical_type_node, stride_neg, tmp);
+  tree non_zerosized = fold_build2_loc (input_location, TRUTH_OR_EXPR,
+   logical_type_node, stride_pos,
+   stride_neg);
+
+  /* Check the start of the range against the lower and upper
+ bounds of the array, if the range is not empty.
+ If upper bound is present, include both bounds in the
+ error message.  */
+  if (check_upper)
+{
+  tmp = fold_build2_loc (input_location, LT_EXPR, logical_type_node,
+info->start[dim], lbound);
+  tmp = fold_build2_loc (input_location, TRUTH_AND_EXPR, logical_type_node,
+non_zerosized, tmp);
+  tree tmp2 = fold_build2_loc (input_location, GT_EXPR, logical_type_node,
+  info->start[dim], ubound);
+  tmp2 = fold_build2_loc (input_location, TRUTH_AND_EXPR, logical_type_node,
+ non_zerosized, tmp2);
+  msg = xasprintf ("Index '%%ld' of dimension %d of array '%s' outside of "
+  "expected range (%%ld:%%ld)", dim + 1, expr_name);
+  gfc_trans_runtime_check (true, false, tmp, inner, expr_loc, msg,
+ fold_convert (long_integer_type_node, info->start[dim]),
+ fold_convert (long_integer_type_node, lbound),
+ fold_convert (long_integer_type_node, ubound));
+  gfc_trans_runtime_check (true, false, tmp2, inner, expr_loc, msg,
+ fold_convert (long_integer_type_node, info->start[dim]),
+ fold_convert (long_integer_type_node, lbound),
+ fold_convert (long_integer_type_node, ubound));
+  free (msg);
+}
+  else
+{
+  tmp = fold_build2_loc (input_location, LT_EXPR, logical_type_node,
+info->start[dim], lbound);
+  tmp = fold_build2_loc (input_location, TRUTH_AND_EXPR, logical_type_node,
+non_ze

[PATCH v3 02/10] fortran: Disable frontend passes for inlinable MINLOC/MAXLOC [PR90608]

2024-08-23 Thread Mikael Morin
From: Mikael Morin 

Disable rewriting of MINLOC/MAXLOC expressions for which inline code
generation is supported.  Update the existing
gfc_inline_intrinsic_function_p predicate to reflect the current state of
MINLOC/MAXLOC inlining support, which for now covers only the case of a
scalar result and a non-CHARACTER argument.

This change has no effect currently, as the MINLOC/MAXLOC front-end passes
only change expressions of rank 1, but the inlining control predicate
gfc_inline_intrinsic_function_p returns false for those.  However, later
changes will extend MINLOC/MAXLOC inline expansion support to array
expressions and update the inlining control predicate, and this will become
effective.

PR fortran/90608

gcc/fortran/ChangeLog:

* frontend-passes.cc (optimize_minmaxloc): Skip if we can generate
inline code for the unmodified expression.
* trans-intrinsic.cc (gfc_inline_intrinsic_function_p): Add
MINLOC and MAXLOC cases.
---
 gcc/fortran/frontend-passes.cc |  3 ++-
 gcc/fortran/trans-intrinsic.cc | 23 +++
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/frontend-passes.cc b/gcc/fortran/frontend-passes.cc
index 104ccb1a4c1..f7f49eea617 100644
--- a/gcc/fortran/frontend-passes.cc
+++ b/gcc/fortran/frontend-passes.cc
@@ -2281,7 +2281,8 @@ optimize_minmaxloc (gfc_expr **e)
   || fn->value.function.actual == NULL
   || fn->value.function.actual->expr == NULL
   || fn->value.function.actual->expr->ts.type == BT_CHARACTER
-  || fn->value.function.actual->expr->rank != 1)
+  || fn->value.function.actual->expr->rank != 1
+  || gfc_inline_intrinsic_function_p (fn))
 return;
 
   *e = gfc_get_array_expr (fn->ts.type, fn->ts.kind, &fn->where);
diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index 0632e3e4d2f..cf5c8e63a9f 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -11662,6 +11662,29 @@ gfc_inline_intrinsic_function_p (gfc_expr *expr)
 case GFC_ISYM_TRANSPOSE:
   return true;
 
+case GFC_ISYM_MINLOC:
+case GFC_ISYM_MAXLOC:
+  {
+   /* Disable inline expansion if code size matters.  */
+   if (optimize_size)
+ return false;
+
+   gfc_actual_arglist *array_arg = expr->value.function.actual;
+   gfc_actual_arglist *dim_arg = array_arg->next;
+
+   gfc_expr *array = array_arg->expr;
+   gfc_expr *dim = dim_arg->expr;
+
+   if (!(array->ts.type == BT_INTEGER
+ || array->ts.type == BT_REAL))
+ return false;
+
+   if (array->rank == 1 && dim != nullptr)
+ return true;
+
+   return false;
+  }
+
 default:
   return false;
 }
-- 
2.43.0



[PATCH v3 09/10] fortran: Continue MINLOC/MAXLOC second loop where the first stopped [PR90608]

2024-08-23 Thread Mikael Morin
From: Mikael Morin 

Continue the second set of loops where the first one stopped in the
generated inline MINLOC/MAXLOC code in the cases where the generated code
contains two sets of loops.  This fixes a regression that was introduced
when enabling the generation of inline MINLOC/MAXLOC code with ARRAY of rank
greater than 1, no DIM argument, and either non-scalar MASK or floating-
point ARRAY.

In the cases where two sets of loops are generated as inline MINLOC/MAXLOC
code, we previously generated code such as (for rank 2 ARRAY, so with two
levels of nesting):

for (idx11 in lower1..upper1)
  {
for (idx12 in lower2..upper2)
  {
...
if (...)
  {
...
goto second_loop;
  }
  }
  }
second_loop:
for (idx21 in lower1..upper1)
  {
for (idx22 in lower2..upper2)
  {
...
  }
  }

which means we process the first elements twice, once in the first set
of loops and once in the second one.  This change avoids this duplicate
processing by using a conditional as lower bound for the second set of
loops, generating code like:

second_loop_entry = false;
for (idx11 in lower1..upper1)
  {
for (idx12 in lower2..upper2)
  {
...
if (...)
  {
...
second_loop_entry = true;
goto second_loop;
  }
  }
  }
second_loop:
for (idx21 in (second_loop_entry ? idx11 : lower1)..upper1)
  {
for (idx22 in (second_loop_entry ? idx12 : lower2)..upper2)
  {
...
second_loop_entry = false;
  }
  }

It was expected that the compiler optimizations would be able to remove the
state variable second_loop_entry.  This is the case if ARRAY has rank 1 (so
without loop nesting): the variable is removed and the loop bounds become
unconditional, which restores the previously generated code, fully fixing
the regression.  For larger ranks, unfortunately, the state variable and
conditional loop bounds remain, but those cases were previously using
library calls, so it's not a regression.
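
For illustration, the scheme can be sketched in plain C++ for a rank 2 REAL
array without MASK.  This is a hand-written sketch, not the generated code:
the names RealMatrix, MinlocResult, minloc_two_loop_sets and the
second_set_visits counter are hypothetical, the goto is replaced by a break
plus a loop condition, and the array is assumed rectangular and 0-based.

```cpp
#include <cassert>
#include <limits>
#include <vector>

using RealMatrix = std::vector<std::vector<double>>;

const double kNaN = std::numeric_limits<double>::quiet_NaN();

struct MinlocResult
{
  int pos0, pos1;        // 1-based position, {0, 0} for an empty array
  int second_set_visits; // elements examined by the second set of loops
};

// a[s1][s0]: s0 is the inner (fastest varying) loop index.
MinlocResult minloc_two_loop_sets(const RealMatrix &a)
{
  const int n1 = static_cast<int>(a.size());
  const int n0 = n1 > 0 ? static_cast<int>(a[0].size()) : 0;
  const bool nonempty = n0 > 0 && n1 > 0;

  double limit = std::numeric_limits<double>::infinity();
  MinlocResult r = {nonempty ? 1 : 0, nonempty ? 1 : 0, 0};
  int idx0 = 0, idx1 = 0;
  bool second_loop_entry = false;

  // First set of loops: skip NaNs until the first ordinary value is found.
  for (int s1 = 0; s1 < n1 && !second_loop_entry; s1++)
    for (int s0 = 0; s0 < n0; s0++)
      if (a[s1][s0] <= limit)
        {
          limit = a[s1][s0];
          r.pos0 = s0 + 1;
          r.pos1 = s1 + 1;
          idx0 = s0;
          idx1 = s1;
          second_loop_entry = true;
          break;
        }

  if (!second_loop_entry)
    return r; // empty array, or nothing but NaNs

  // Second set of loops: the lower bounds are conditional, so the first pass
  // resumes where the first set stopped and later passes start from zero.
  for (int s1 = second_loop_entry ? idx1 : 0; s1 < n1; s1++)
    for (int s0 = second_loop_entry ? idx0 : 0; s0 < n0; s0++)
      {
        r.second_set_visits++;
        if (a[s1][s0] < limit)
          {
            limit = a[s1][s0];
            r.pos0 = s0 + 1;
            r.pos1 = s1 + 1;
          }
        second_loop_entry = false;
      }
  return r;
}
```

In this sketch the inner lower bound is re-evaluated on every outer
iteration, which is why the flag has to be cleared inside the loop body.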

PR fortran/90608

gcc/fortran/ChangeLog:

* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Generate a set
of index variables.  Set them using the loop indexes before leaving
the first set of loops.  Generate a new loop entry predicate.
Initialize it.  Set it before leaving the first set of loops.  Clear
it in the body of the second set of loops.  For the second set of
loops, update each loop lower bound to use the corresponding index
variable if the predicate variable is set.
---
 gcc/fortran/trans-intrinsic.cc | 33 +++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index 3d29bcaf590..f490e795c02 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -5371,6 +5371,7 @@ strip_kind_from_actual (gfc_actual_arglist * actual)
 pos0 = 0;
 pos1 = 0;
 S1 = from1;
+second_loop_entry = false;
 while (S1 <= to1) {
   S0 = from0;
   while (s0 <= to0 {
@@ -5383,6 +5384,7 @@ strip_kind_from_actual (gfc_actual_arglist * actual)
 limit = a[S1][S0];
 pos0 = S0 + (1 - from0);
 pos1 = S1 + (1 - from1);
+second_loop_entry = true;
 goto lab1;
   }
 }
@@ -5392,9 +5394,9 @@ strip_kind_from_actual (gfc_actual_arglist * actual)
 }
 goto lab2;
 lab1:;
-S1 = from1;
+S1 = second_loop_entry ? S1 : from1;
 while (S1 <= to1) {
-  S0 = from0;
+  S0 = second_loop_entry ? S0 : from0;
   while (S0 <= to0) {
 if (mask[S1][S0])
   if (a[S1][S0] < limit) {
@@ -5402,6 +5404,7 @@ strip_kind_from_actual (gfc_actual_arglist * actual)
 pos0 = S + (1 - from0);
 pos1 = S + (1 - from1);
   }
+second_loop_entry = false;
 S0++;
   }
   S1++;
@@ -5473,6 +5476,7 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, enum tree_code op)
   gfc_expr *backexpr;
   gfc_se backse;
   tree pos[GFC_MAX_DIMENSIONS];
+  tree idx[GFC_MAX_DIMENSIONS];
   tree result_var = NULL_TREE;
   int n;
   bool optional_mask;
@@ -5554,6 +5558,8 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, enum tree_code op)
   gfc_get_string ("pos%d", i));
   offset[i] = gfc_create_var (gfc_array_index_type,
  gfc_g

[PATCH v3 06/10] fortran: Inline integral MINLOC/MAXLOC with no DIM and no MASK [PR90608]

2024-08-23 Thread Mikael Morin
From: Mikael Morin 

Enable generation of inline code for the MINLOC and MAXLOC intrinsics,
if the ARRAY argument is of integral type and of any rank (only the rank 1
case was previously inlined), and neither DIM nor MASK arguments are
present.

This needs a few adjustments in gfc_conv_intrinsic_minmaxloc,
mainly to replace the single variables POS and OFFSET, with collections
of variables, one variable per dimension each.

The restriction to integral ARRAY and absent MASK limits the scope of
the change to the cases where we generate single loop inline code.  The
code generation for the second loop is only accessible with ARRAY of rank
1, so it can continue using a single variable.  A later change will extend
inlining to the double loop cases.
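
A rough C++ picture of the single loop case with one position and one offset
variable per dimension is given below, for rank 2.  It is only a sketch under
stated assumptions: Section2 and minloc_int_rank2 are hypothetical names, the
data is stored in column-major order, and the initialization follows the
"NANs aren't supported, no MASK" case described in the trans-intrinsic.cc
comment.

```cpp
#include <array>
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical stand-in for a rank-2 integer array with arbitrary lower
// bounds, stored in column-major order (dimension 0 varies fastest).
struct Section2
{
  int from0, to0;
  int from1, to1;
  std::vector<int> data;

  int at(int s0, int s1) const
  {
    return data[(s1 - from1) * (to0 - from0 + 1) + (s0 - from0)];
  }
};

std::array<int, 2> minloc_int_rank2(const Section2 &a)
{
  // NONEMPTY depends on the extent of every dimension.
  const bool nonempty = a.from0 <= a.to0 && a.from1 <= a.to1;

  // One position and one offset per dimension instead of single variables.
  std::array<int, 2> pos = {nonempty ? 1 : 0, nonempty ? 1 : 0};
  const std::array<int, 2> offset = {1 - a.from0, 1 - a.from1};

  int limit = INT_MAX;
  for (int s1 = a.from1; s1 <= a.to1; s1++)
    for (int s0 = a.from0; s0 <= a.to0; s0++)
      if (a.at(s0, s1) < limit)
        {
          limit = a.at(s0, s1);
          pos[0] = s0 + offset[0];
          pos[1] = s1 + offset[1];
        }
  return pos;
}
```

The offsets turn loop indexes, which follow the array's own bounds, into the
1-based positions that MINLOC returns.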

Some bounds checking code was previously handled by the library, and the
scalarizer needed some changes to avoid regressing.  The scalarizer already
supported bounds check code generation, but it only applied to array
reference sections, checking both for array bound violation and for shape
conformability between all the involved arrays.  With this change, for
MINLOC or MAXLOC, enable the conformability check between all the
scalarized arrays, and disable the array bound violation check.

PR fortran/90608

gcc/fortran/ChangeLog:

* trans-array.cc (gfc_conv_ss_startstride): Set the MINLOC/MAXLOC
result upper bound using the rank of the ARRAY argument.  Adjust
the error message for intrinsic result arrays.  Only check array
bounds for array references.  Move bound check decision code...
(bounds_check_needed): ... here as a new predicate.  Allow bound
check for MINLOC/MAXLOC intrinsic results.
* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Change the
result array upper bound to the rank of ARRAY.  Update the NONEMPTY
variable to depend on the non-empty extent of every dimension.  Use
one variable per dimension instead of a single variable for the
position and the offset.  Update their declaration, initialization,
and update to affect the variable of each dimension.  Use the first
variable only in areas only accessed with rank 1 ARRAY argument.
Set every element of the result using its corresponding variable.
(gfc_inline_intrinsic_function_p): Return true for integral ARRAY
and absent DIM and MASK.

gcc/testsuite/ChangeLog:

* gfortran.dg/maxloc_bounds_4.f90: Additionally accept the error
message emitted by the scalarizer.
---
 gcc/fortran/trans-array.cc|  70 ++--
 gcc/fortran/trans-intrinsic.cc| 150 +-
 gcc/testsuite/gfortran.dg/maxloc_bounds_4.f90 |   4 +-
 3 files changed, 166 insertions(+), 58 deletions(-)

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index bc5f5900c6a..bb694371b47 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -4956,6 +4956,35 @@ add_check_section_in_array_bounds (stmtblock_t *inner, gfc_ss_info *ss_info,
 }
 
 
+/* Tells whether we need to generate bounds checking code for the array
+   associated with SS.  */
+
+bool
+bounds_check_needed (gfc_ss *ss)
+{
+  /* Catch allocatable lhs in f2003.  */
+  if (flag_realloc_lhs && ss->no_bounds_check)
+return false;
+
+  gfc_ss_info *ss_info = ss->info;
+  if (ss_info->type == GFC_SS_SECTION)
+return true;
+
+  if (!(ss_info->type == GFC_SS_INTRINSIC
+   && ss_info->expr
+   && ss_info->expr->expr_type == EXPR_FUNCTION))
+return false;
+
+  gfc_intrinsic_sym *isym = ss_info->expr->value.function.isym;
+  if (!(isym
+   && (isym->id == GFC_ISYM_MAXLOC
+   || isym->id == GFC_ISYM_MINLOC)))
+return false;
+
+  return gfc_inline_intrinsic_function_p (ss_info->expr);
+}
+
+
 /* Calculates the range start and stride for a SS chain.  Also gets the
descriptor and data pointer.  The range of vector subscripts is the size
of the vector.  Array bounds are also checked.  */
@@ -5057,10 +5086,17 @@ done:
info->data = gfc_conv_array_data (info->descriptor);
info->data = gfc_evaluate_now (info->data, &outer_loop->pre);
 
-   info->offset = gfc_index_zero_node;
+   gfc_expr *array = expr->value.function.actual->expr;
+   tree rank = build_int_cst (gfc_array_index_type, array->rank);
+
+   tree tmp = fold_build2_loc (input_location, MINUS_EXPR,
+   gfc_array_index_type, rank,
+   gfc_index_one_node);
+
+   info->end[0] = gfc_evaluate_now (tmp, &outer_loop->pre);
info->start[0] = gfc_index_zero_node;
-   info->end[0] = gfc_index_zero_node;
info->stride[0] = gfc_index_one_node;
+   info->offset = gfc_index_zero_node;
continue;
  }
 

[PATCH v3 03/10] fortran: Inline MINLOC/MAXLOC with no DIM and ARRAY of rank 1 [PR90608]

2024-08-23 Thread Mikael Morin
From: Mikael Morin 

Enable inline code generation for the MINLOC and MAXLOC intrinsics, if the
DIM argument is not present and ARRAY has rank 1.  This case is similar to
the case where the result is scalar (DIM present and rank 1 ARRAY), which
already supports inline expansion of the intrinsic.  Both cases return
the same value, with the difference that the result is an array of size 1 if
DIM is absent, whereas it's a scalar if DIM is present.  So all there is
to do for the new case to work is hook the inline expansion into the
scalarizer.
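
The relationship between the two cases can be pictured with the following
C++ sketch, which is not GCC code.  minloc_scalar stands for the existing
scalar expansion and minloc_array for the new case; both names are
hypothetical, and the array is a 0-based integer vector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar result, as for MINLOC (a, DIM=1) with a rank 1 ARRAY:
// 1-based position of the first minimum, or 0 for an empty array.
int minloc_scalar(const std::vector<int> &a)
{
  int pos = 0;
  int limit = 0;
  for (std::size_t s = 0; s < a.size(); s++)
    if (pos == 0 || a[s] < limit)
      {
        limit = a[s];
        pos = static_cast<int>(s) + 1;
      }
  return pos;
}

// Array result, as for MINLOC (a) with a rank 1 ARRAY: the same value,
// stored as the single element of an array of size 1.
std::array<int, 1> minloc_array(const std::vector<int> &a)
{
  return {minloc_scalar(a)};
}
```

The search itself is shared; only the shape of the result differs.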

PR fortran/90608

gcc/fortran/ChangeLog:

* trans-array.cc (gfc_conv_ss_startstride): Set the scalarization
rank based on the MINLOC/MAXLOC rank if needed.  Call the inline
code generation and set up the scalarizer array descriptor info
in the MINLOC and MAXLOC cases.
* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Return the
result array element if the scalarizer is set up and we are inside
the loops.  Restrict library function call dispatch to the case
where inline expansion is not supported.  Declare an array result
if the expression isn't scalar.  Initialize the array result single
element and return the result variable if the expression isn't
scalar.
(walk_inline_intrinsic_minmaxloc): New function.
(walk_inline_intrinsic_function): Add MINLOC and MAXLOC cases,
dispatching to walk_inline_intrinsic_minmaxloc.
(gfc_add_intrinsic_ss_code): Add MINLOC and MAXLOC cases.
(gfc_inline_intrinsic_function_p): Return true if ARRAY has rank 1,
regardless of DIM.
---
 gcc/fortran/trans-array.cc |  25 
 gcc/fortran/trans-intrinsic.cc | 224 +++--
 2 files changed, 181 insertions(+), 68 deletions(-)

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index ea5fff2e0c2..3c4831b6089 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -4851,6 +4851,8 @@ gfc_conv_ss_startstride (gfc_loopinfo * loop)
case GFC_ISYM_UBOUND:
case GFC_ISYM_LCOBOUND:
case GFC_ISYM_UCOBOUND:
+   case GFC_ISYM_MAXLOC:
+   case GFC_ISYM_MINLOC:
case GFC_ISYM_SHAPE:
case GFC_ISYM_THIS_IMAGE:
  loop->dimen = ss->dimen;
@@ -4900,6 +4902,29 @@ done:
case GFC_SS_INTRINSIC:
  switch (expr->value.function.isym->id)
{
+   case GFC_ISYM_MINLOC:
+   case GFC_ISYM_MAXLOC:
+ {
+   gfc_se se;
+   gfc_init_se (&se, nullptr);
+   se.loop = loop;
+   se.ss = ss;
+   gfc_conv_intrinsic_function (&se, expr);
+   gfc_add_block_to_block (&outer_loop->pre, &se.pre);
+   gfc_add_block_to_block (&outer_loop->post, &se.post);
+
+   info->descriptor = se.expr;
+
+   info->data = gfc_conv_array_data (info->descriptor);
+   info->data = gfc_evaluate_now (info->data, &outer_loop->pre);
+
+   info->offset = gfc_index_zero_node;
+   info->start[0] = gfc_index_zero_node;
+   info->end[0] = gfc_index_zero_node;
+   info->stride[0] = gfc_index_one_node;
+   continue;
+ }
+
/* Fall through to supply start and stride.  */
case GFC_ISYM_LBOUND:
case GFC_ISYM_UBOUND:
diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index cf5c8e63a9f..695c3591837 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -5276,66 +5276,95 @@ strip_kind_from_actual (gfc_actual_arglist * actual)
we need to handle.  For performance reasons we sometimes create two
loops instead of one, where the second one is much simpler.
Examples for minloc intrinsic:
-   1) Result is an array, a call is generated
-   2) Array mask is used and NaNs need to be supported:
-  limit = Infinity;
-  pos = 0;
-  S = from;
-  while (S <= to) {
-   if (mask[S]) {
- if (pos == 0) pos = S + (1 - from);
- if (a[S] <= limit) { limit = a[S]; pos = S + (1 - from); goto lab1; }
-   }
-   S++;
-  }
-  goto lab2;
-  lab1:;
-  while (S <= to) {
-   if (mask[S]) if (a[S] < limit) { limit = a[S]; pos = S + (1 - from); }
-   S++;
-  }
-  lab2:;
-   3) NaNs need to be supported, but it is known at compile time or cheaply
-  at runtime whether array is nonempty or not:
-  limit = Infinity;
-  pos = 0;
-  S = from;
-  while (S <= to) {
-   if (a[S] <= limit) { limit = a[S]; pos = S + (1 - from); goto lab1; }
-   S++;
-  }
-  if (from <= to) pos = 1;
-  goto lab2;
-  lab1:;
-  while (S <= to) {
-   if (a[S] < limit) { limit = a[S]; pos = S + (1 - from); }
-   S++;
-  }
-  lab2:;
-   4) NaNs aren't supported, array mask is

[PATCH v3 08/10] fortran: Inline non-character MINLOC/MAXLOC with no DIM [PR90608]

2024-08-23 Thread Mikael Morin
From: Mikael Morin 

Enable generation of inline MINLOC/MAXLOC code in the case where DIM
is not present, and either ARRAY is of floating point type or MASK is an
array.  Those cases are the remaining bits to fully support inlining of
non-CHARACTER MINLOC/MAXLOC without DIM.  They are treated together because
they generate similar code, the NANs for REAL types being handled a bit like
a second level of masking.  These are the cases for which we generate two
sets of loops.

This change affects the code generating the second loop, which was
previously reachable only when ARRAY has rank 1.  The single variable
initialization and update are changed to apply to multiple variables, one
per dimension.

The code generated is as follows (if ARRAY has rank 2):

for (idx11 in lower1..upper1)
  {
for (idx12 in lower2..upper2)
  {
...
if (...)
  {
...
goto second_loop;
  }
  }
  }
second_loop:
for (idx21 in lower1..upper1)
  {
for (idx22 in lower2..upper2)
  {
...
  }
  }

This code leads to processing the first elements redundantly, both in the
first set of loops and in the second one.  The loop over idx22 could
start from idx12 the first time it is run, but as it has to start from
lower2 for the rest of the runs, this change uses the same bounds for both
sets of loops for simplicity.  In the rank 1 case, this makes the generated
code worse compared to the inline code that was generated before.  A later
change will introduce conditionals to avoid the duplicate processing and
restore the generated code in that case.
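
A hand-written C++ sketch of this variant, with an array MASK and NaN-aware
comparisons for rank 2, might look as follows.  It is not the generated
code: RealMatrix, MaskMatrix and minloc_masked are hypothetical names, the
goto is replaced by a flag, and the arrays are assumed rectangular, 0-based
and of identical shape.

```cpp
#include <array>
#include <cassert>
#include <limits>
#include <vector>

using RealMatrix = std::vector<std::vector<double>>;
using MaskMatrix = std::vector<std::vector<bool>>;

const double kNaN = std::numeric_limits<double>::quiet_NaN();

// a[s1][s0] and mask[s1][s0]: s0 is the inner (fastest varying) loop index.
std::array<int, 2> minloc_masked(const RealMatrix &a, const MaskMatrix &mask)
{
  const int n1 = static_cast<int>(a.size());
  const int n0 = n1 > 0 ? static_cast<int>(a[0].size()) : 0;

  double limit = std::numeric_limits<double>::infinity();
  std::array<int, 2> pos = {0, 0};
  bool found = false;

  // First set of loops: remember the first unmasked element (used if all
  // unmasked values are NaN), and stop at the first unmasked ordinary value.
  for (int s1 = 0; s1 < n1 && !found; s1++)
    for (int s0 = 0; s0 < n0; s0++)
      if (mask[s1][s0])
        {
          if (pos[0] == 0)
            {
              pos[0] = s0 + 1;
              pos[1] = s1 + 1;
            }
          if (a[s1][s0] <= limit) // false for NaN
            {
              limit = a[s1][s0];
              pos[0] = s0 + 1;
              pos[1] = s1 + 1;
              found = true;
              break;
            }
        }

  if (!found)
    return pos;

  // Second set of loops: restarts from the lower bounds, so the leading
  // elements are examined a second time.
  for (int s1 = 0; s1 < n1; s1++)
    for (int s0 = 0; s0 < n0; s0++)
      if (mask[s1][s0] && a[s1][s0] < limit)
        {
          limit = a[s1][s0];
          pos[0] = s0 + 1;
          pos[1] = s1 + 1;
        }
  return pos;
}
```

The NaN test acts as a second filter behind the mask: an element has to pass
both before it can become the running minimum.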

PR fortran/90608

gcc/fortran/ChangeLog:

* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Initialize
and update all the variables.  Put the label and goto in the
outermost scalarizer loop.  Don't start the second loop where the
first stopped.
(gfc_inline_intrinsic_function_p): Also return TRUE for array MASK
or for any REAL type.

gcc/testsuite/ChangeLog:

* gfortran.dg/maxloc_bounds_5.f90: Additionally accept error
messages reported by the scalarizer.
* gfortran.dg/maxloc_bounds_6.f90: Ditto.
---
 gcc/fortran/trans-intrinsic.cc| 127 --
 gcc/testsuite/gfortran.dg/maxloc_bounds_5.f90 |   4 +-
 gcc/testsuite/gfortran.dg/maxloc_bounds_6.f90 |   4 +-
 3 files changed, 87 insertions(+), 48 deletions(-)

diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index 45f5a7b6977..3d29bcaf590 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -5361,12 +5361,55 @@ strip_kind_from_actual (gfc_actual_arglist * actual)
   }
   S++;
 }
-   B: ARRAY has rank 1, and DIM is absent.  Use the same code as the scalar
-  case and wrap the result in an array.
-   C: ARRAY has rank > 1, NANs are not supported, and DIM and MASK are absent.
-  Generate code similar to the single loop scalar case, but using one
-  variable per dimension, for example if ARRAY has rank 2:
-  4) NAN's aren't supported, no MASK:
+   B: Array result, non-CHARACTER type, DIM absent
+  Generate similar code as in the scalar case, using a collection of
+  variables (one per dimension) instead of a single variable as result.
+  Picking only cases 1) and 4) with ARRAY of rank 2, the generated code
+  becomes:
+  1) Array mask is used and NaNs need to be supported:
+limit = Infinity;
+pos0 = 0;
+pos1 = 0;
+S1 = from1;
+while (S1 <= to1) {
+  S0 = from0;
+  while (s0 <= to0 {
+if (mask[S1][S0]) {
+  if (pos0 == 0) {
+pos0 = S0 + (1 - from0);
+pos1 = S1 + (1 - from1);
+  }
+  if (a[S1][S0] <= limit) {
+limit = a[S1][S0];
+pos0 = S0 + (1 - from0);
+pos1 = S1 + (1 - from1);
+goto lab1;
+  }
+}
+S0++;
+  }
+  S1++;
+}
+goto lab2;
+lab1:;
+S1 = from1;
+while (S1 <= to1) {
+  S0 = from0;
+  while (S0 <= to0) {
+if (mask[S1][S0])
+  if (a[S1][S0] < limit) {
+limit = a[S1][S0];
+pos0 = S + (1 - from0);
+pos1 = S + (1 - from1);
+  }
+S0++;
+  }
+  S1++;
+}
+lab2:;
+result = { pos0, pos1 };
+  ...
+  4) NANs aren't supported, no array mask.
 limit = infinities_supported ? Infinity : huge (limit);
 pos0 = (from0 <= to0 && from1 <= to1) ? 1 : 0;
 pos1 = (from0 <= to0 && from1 <= to1) ? 1 : 0;
@@ -5384,7 +5427,7 @@ strip_kin

[PATCH v3 01/10] fortran: Add tests covering inline MINLOC/MAXLOC without DIM [PR90608]

2024-08-23 Thread Mikael Morin
From: Mikael Morin 

Add the tests covering the various cases for which we are about to implement
inline expansion of MINLOC and MAXLOC.  Those are cases where the DIM
argument is not present.

PR fortran/90608

gcc/testsuite/ChangeLog:

* gfortran.dg/ieee/maxloc_nan_1.f90: New test.
* gfortran.dg/ieee/minloc_nan_1.f90: New test.
* gfortran.dg/maxloc_7.f90: New test.
* gfortran.dg/maxloc_with_mask_1.f90: New test.
* gfortran.dg/minloc_8.f90: New test.
* gfortran.dg/minloc_with_mask_1.f90: New test.
---
 .../gfortran.dg/ieee/maxloc_nan_1.f90 |  44 +++
 .../gfortran.dg/ieee/minloc_nan_1.f90 |  44 +++
 gcc/testsuite/gfortran.dg/maxloc_7.f90| 208 ++
 .../gfortran.dg/maxloc_with_mask_1.f90| 373 ++
 gcc/testsuite/gfortran.dg/minloc_8.f90| 208 ++
 .../gfortran.dg/minloc_with_mask_1.f90| 372 +
 6 files changed, 1249 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/ieee/maxloc_nan_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/ieee/minloc_nan_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/maxloc_7.f90
 create mode 100644 gcc/testsuite/gfortran.dg/maxloc_with_mask_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minloc_8.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minloc_with_mask_1.f90

diff --git a/gcc/testsuite/gfortran.dg/ieee/maxloc_nan_1.f90 b/gcc/testsuite/gfortran.dg/ieee/maxloc_nan_1.f90
new file mode 100644
index 000..329b54e8e1f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/ieee/maxloc_nan_1.f90
@@ -0,0 +1,44 @@
+! { dg-do run }
+!
+! PR fortran/90608
+! Check the correct behaviour of the inline MAXLOC implementation,
+! when ARRAY is filled with NANs.
+
+program p
+  implicit none
+  call check_without_mask
+  call check_with_mask
+contains
+  subroutine check_without_mask()
+use, intrinsic :: ieee_arithmetic
+real, allocatable :: a(:,:,:)
+real :: nan
+integer, allocatable :: m(:)
+if (.not. ieee_support_nan(nan)) return
+nan = ieee_value(nan, ieee_quiet_nan)
+allocate(a(3,3,3), source = nan)
+m = maxloc(a)
+if (size(m, dim=1) /= 3) stop 32
+if (any(m /= (/ 1, 1, 1 /))) stop 35
+  end subroutine
+  subroutine check_with_mask()
+use, intrinsic :: ieee_arithmetic
+real, allocatable :: a(:,:,:)
+logical, allocatable :: m(:,:,:)
+real :: nan
+integer, allocatable :: r(:)
+if (.not. ieee_support_nan(nan)) return
+nan = ieee_value(nan, ieee_quiet_nan)
+allocate(a(3,3,3), source = nan)
+allocate(m(3,3,3))
+m(:,:,:) = reshape((/ .false., .false., .true. , .true. , .false., &
+  .true. , .false., .false., .false., .true. , &
+  .true. , .false., .true. , .true. , .true. , &
+  .false., .false., .true. , .true. , .false., &
+  .false., .true. , .false., .false., .true. , &
+  .true. , .true. /), shape(m))
+r = maxloc(a, mask = m)
+if (size(r, dim = 1) /= 3) stop 62
+if (any(r /= (/ 3, 1, 1 /))) stop 65
+  end subroutine
+end program p
diff --git a/gcc/testsuite/gfortran.dg/ieee/minloc_nan_1.f90 b/gcc/testsuite/gfortran.dg/ieee/minloc_nan_1.f90
new file mode 100644
index 000..8f71b4c4398
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/ieee/minloc_nan_1.f90
@@ -0,0 +1,44 @@
+! { dg-do run }
+!
+! PR fortran/90608
+! Check the correct behaviour of the inline MINLOC implementation,
+! when ARRAY is filled with NANs.
+
+program p
+  implicit none
+  call check_without_mask
+  call check_with_mask
+contains
+  subroutine check_without_mask()
+use, intrinsic :: ieee_arithmetic
+real, allocatable :: a(:,:,:)
+real :: nan
+integer, allocatable :: m(:)
+if (.not. ieee_support_nan(nan)) return
+nan = ieee_value(nan, ieee_quiet_nan)
+allocate(a(3,3,3), source = nan)
+m = minloc(a)
+if (size(m, dim=1) /= 3) stop 32
+if (any(m /= (/ 1, 1, 1 /))) stop 35
+  end subroutine
+  subroutine check_with_mask()
+use, intrinsic :: ieee_arithmetic
+real, allocatable :: a(:,:,:)
+logical, allocatable :: m(:,:,:)
+real :: nan
+integer, allocatable :: r(:)
+if (.not. ieee_support_nan(nan)) return
+nan = ieee_value(nan, ieee_quiet_nan)
+allocate(a(3,3,3), source = nan)
+allocate(m(3,3,3))
+m(:,:,:) = reshape((/ .false., .false., .true. , .true. , .false., &
+  .true. , .false., .false., .false., .true. , &
+  .true. , .false., .true. , .true. , .true. , &
+  .false., .false., .true. , .true. , .false., &
+  .false., .true. , .false., .false., .true. , &
+  .true. , .true. /), shape(m))
+r = minloc(a, mask = m)
+if (size(r, dim = 1) /= 3) stop 62
+if (any(r /= (/ 3, 1, 1 /))) stop 65
+  end subroutine
+end program p
diff --git a/gcc

[PATCH v3 10/10] fortran: Add -finline-intrinsics flag for MINLOC/MAXLOC [PR90608]

2024-08-23 Thread Mikael Morin
From: Mikael Morin 

The documentation in this patch was partly reworded, compared
to the previous version posted at:
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660607.html
The rest of the patch is unchanged, just rebased to a more recent
master.

Joseph is in CC as I need an ack for the new option.

Regression-tested on x86_64-pc-linux-gnu.
OK for master?

-- >8 --

Introduce the -finline-intrinsics flag to control from the command line
whether to generate either inline code or calls to the functions from the
library, for the MINLOC and MAXLOC intrinsics.

The flag allows specifying inlining either independently for each intrinsic
(either MINLOC or MAXLOC), or all together.  For each intrinsic, a default
value is set if none was set.  The default value depends on the optimization
setting: inlining is avoided if not optimizing or if optimizing for size;
otherwise inlining is preferred.

There is no direct support for this behaviour provided by the .opt options
framework.  It is obtained by defining three different variants of the flag
(finline-intrinsics, fno-inline-intrinsics, finline-intrinsics=) all using
the same underlying option variable.  Each enum value (corresponding to an
intrinsic function) uses two identical bits, and the variable is initialized
with alternated bits, so that we can tell whether the value was set or not
by checking whether the two bits have different values.
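
The encoding described above can be modelled in isolation.  The following is
only a sketch under assumptions, not the code from the patch: the names
INL_VAL, UNSET_MASK, is_set, is_enabled and resolve_default are hypothetical,
and the 0x55555555 mask is an assumed "alternated bits" pattern.  It shows
that a pair of equal bits means the user chose a value, a pair of different
bits means nothing was chosen yet, and how a default could then be derived
from the optimization settings.

```c
#include <stdbool.h>

/* Hypothetical names throughout; only the encoding idea comes from the
   patch description.  Each intrinsic owns a pair of adjacent bits.  */
#define INL_VAL(idx) (3u << (2 * (idx)))   /* both bits of pair IDX */
#define UNSET_MASK   0x55555555u           /* assumed alternated-bits pattern */

enum { IDX_MAXLOC = 0, IDX_MINLOC = 1 };

/* Initial value: every pair holds 01, i.e. two different bits.  */
static const unsigned initial_flags = UNSET_MASK;

/* A pair was explicitly set (11) or explicitly cleared (00) exactly when
   its two bits are equal.  */
static bool
is_set (unsigned flags, int idx)
{
  unsigned pair = (flags >> (2 * idx)) & 3u;
  return pair == 0u || pair == 3u;
}

/* True if both bits of the pair are 1, i.e. inlining is requested.  */
static bool
is_enabled (unsigned flags, int idx)
{
  return (flags & INL_VAL (idx)) == INL_VAL (idx);
}

/* If the user left the pair untouched, pick the default: inline only when
   optimizing and not optimizing for size.  Explicit choices are kept.  */
static unsigned
resolve_default (unsigned flags, int idx, bool optimize, bool optimize_size)
{
  if (is_set (flags, idx))
    return flags;
  if (optimize && !optimize_size)
    return flags | INL_VAL (idx);
  return flags & ~INL_VAL (idx);
}
```

In this model, a positive option ORs INL_VAL (idx) into the variable and a
negative option masks it out; either way the pair ends up with equal bits, so
a later call to resolve_default leaves the explicit choice alone.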

PR fortran/90608

gcc/ChangeLog:

* flag-types.h (enum gfc_inlineable_intrinsics): New type.

gcc/fortran/ChangeLog:

* invoke.texi (finline-intrinsics): Document new flag.
* lang.opt (finline-intrinsics, finline-intrinsics=,
fno-inline-intrinsics): New flags.
* options.cc (gfc_post_options): If the option variable controlling
the inlining of MAXLOC (respectively MINLOC) has not been set, set
it or clear it depending on the optimization option variables.
* trans-intrinsic.cc (gfc_inline_intrinsic_function_p): Return false
if inlining for the intrinsic is disabled according to the option
variable.

gcc/testsuite/ChangeLog:

* gfortran.dg/minmaxloc_18.f90: New test.
* gfortran.dg/minmaxloc_18a.f90: New test.
* gfortran.dg/minmaxloc_18b.f90: New test.
* gfortran.dg/minmaxloc_18c.f90: New test.
* gfortran.dg/minmaxloc_18d.f90: New test.
---
 gcc/flag-types.h|  30 +
 gcc/fortran/invoke.texi |  31 +
 gcc/fortran/lang.opt|  27 +
 gcc/fortran/options.cc  |  21 +-
 gcc/fortran/trans-intrinsic.cc  |  13 +-
 gcc/testsuite/gfortran.dg/minmaxloc_18.f90  | 772 
 gcc/testsuite/gfortran.dg/minmaxloc_18a.f90 |  10 +
 gcc/testsuite/gfortran.dg/minmaxloc_18b.f90 |  10 +
 gcc/testsuite/gfortran.dg/minmaxloc_18c.f90 |  10 +
 gcc/testsuite/gfortran.dg/minmaxloc_18d.f90 |  10 +
 10 files changed, 929 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18a.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18b.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18c.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18d.f90

diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 1e497f0bb91..df56337f7e8 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -451,6 +451,36 @@ enum gfc_convert
 };
 
 
+/* gfortran -finline-intrinsics= values;
+   We use two identical bits for each value, and initialize with alternated
+   bits, so that we can check whether a value has been set by checking whether
+   the two bits have identical value.  */
+
+#define GFC_INL_INTR_VAL(idx) (3 << (2 * idx))
+#define GFC_INL_INTR_UNSET_VAL(val) (0x & (val))
+
+enum gfc_inlineable_intrinsics
+{
+  GFC_FLAG_INLINE_INTRINSIC_NONE = 0,
+  GFC_FLAG_INLINE_INTRINSIC_MAXLOC = GFC_INL_INTR_VAL (0),
+  GFC_FLAG_INLINE_INTRINSIC_MINLOC = GFC_INL_INTR_VAL (1),
+  GFC_FLAG_INLINE_INTRINSIC_ALL = GFC_FLAG_INLINE_INTRINSIC_MAXLOC
+ | GFC_FLAG_INLINE_INTRINSIC_MINLOC,
+
+  GFC_FLAG_INLINE_INTRINSIC_NONE_UNSET
+ = GFC_INL_INTR_UNSET_VAL (GFC_FLAG_INLINE_INTRINSIC_NONE),
+  GFC_FLAG_INLINE_INTRINSIC_MAXLOC_UNSET
+ = GFC_INL_INTR_UNSET_VAL (GFC_FLAG_INLINE_INTRINSIC_MAXLOC),
+  GFC_FLAG_INLINE_INTRINSIC_MINLOC_UNSET
+ = GFC_INL_INTR_UNSET_VAL (GFC_FLAG_INLINE_INTRINSIC_MINLOC),
+  GFC_FLAG_INLINE_INTRINSIC_ALL_UNSET
+ = GFC_INL_INTR_UNSET_VAL (GFC_FLAG_INLINE_INTRINSIC_ALL)
+};
+
+#undef GFC_INL_INTR_UNSET_VAL
+#undef GFC_INL_INTR_VAL
+
+
 /* Inline String Operations functions.  */
 enum ilsop_fn
 {
diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
index 6bc42afe2c4..3d59728f433 100644
--- a/gcc/fortran/invoke.texi
+++ b/gcc/fortran/invoke.texi
@@ -194,6 +194,7 @@ and warnings}.
 -finit-character=

Re: [PATCH v5] RISC-V: Enable -gvariable-location-views by default

2024-08-23 Thread Rainer Orth
Richard Biener  writes:

> On Thu, 22 Aug 2024, Bernd Edlinger wrote:
>
>> This affects only the RISC-V targets, where the compiler options
>> -gvariable-location-views and consequently also -ginline-points
>> are disabled by default, which is unexpected and disables some
>> useful features of the generated debug info.
[...]
>> gcc/ChangeLog:
>> 
>> * dwarf2out.cc (dwarf2out_maybe_output_loclist_view_pair,
>> output_loc_list): Correct handling of -gno-as-loc-support,
>> use ZERO_VIEW_P to output view number as zero value.
>> * toplev.cc (process_options): Do not automatically disable
>> -gvariable-location-views when -gno-as-loc-support or
>> -gno-as-locview-support is used, instead do automatically
>> disable -gas-locview-support if -gno-as-loc-support is used.

Unfortunately, this patch broke Solaris/x86 bootstrap with the native
as: PR debug/116470.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[COMMITTED 02/16] ada: Error missing when 'access is applied to an interface type object

2024-08-23 Thread Marc Poulhiès
From: Javier Miranda 

The compiler does not report an error when 'access is applied to
a non-aliased class-wide interface type object.

gcc/ada/

* exp_util.ads (Is_Expanded_Class_Wide_Interface_Object_Decl): New
subprogram.
* exp_util.adb (Is_Expanded_Class_Wide_Interface_Object_Decl):
ditto.
* sem_util.adb (Is_Aliased_View): Handle expanded class-wide type
object declaration.
* checks.adb (Is_Aliased_Unconstrained_Component): Protect the
frontend against calling Is_Aliased_View with Empty. Found working
on this issue.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/checks.adb   |  2 +-
 gcc/ada/exp_util.adb | 15 +++
 gcc/ada/exp_util.ads |  5 +
 gcc/ada/sem_util.adb |  4 
 4 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/checks.adb b/gcc/ada/checks.adb
index 38fe687bc7a..77043ca07c2 100644
--- a/gcc/ada/checks.adb
+++ b/gcc/ada/checks.adb
@@ -1549,7 +1549,7 @@ package body Checks is
   then
  if (Etype (N) = Typ
   or else (Do_Access and then Designated_Type (Typ) = S_Typ))
-   and then not Is_Aliased_View (Lhs)
+   and then (No (Lhs) or else not Is_Aliased_View (Lhs))
  then
 return;
  end if;
diff --git a/gcc/ada/exp_util.adb b/gcc/ada/exp_util.adb
index ef8c91dfe94..392bf3a511e 100644
--- a/gcc/ada/exp_util.adb
+++ b/gcc/ada/exp_util.adb
@@ -8574,6 +8574,21 @@ package body Exp_Util is
   and then Is_Formal (Entity (N)));
end Is_Conversion_Or_Reference_To_Formal;
 
+   --------------------------------------------------
+   -- Is_Expanded_Class_Wide_Interface_Object_Decl --
+   --------------------------------------------------
+
+   function Is_Expanded_Class_Wide_Interface_Object_Decl
+  (N : Node_Id) return Boolean is
+   begin
+  return not Comes_From_Source (N)
+and then Nkind (Original_Node (N)) = N_Object_Declaration
+and then Nkind (N) = N_Object_Renaming_Declaration
+and then Is_Class_Wide_Type (Etype (Defining_Identifier (N)))
+and then Is_Interface (Etype (Defining_Identifier (N)))
+and then Nkind (Name (N)) = N_Explicit_Dereference;
+   end Is_Expanded_Class_Wide_Interface_Object_Decl;
+
------------------------------
-- Is_Finalizable_Transient --
------------------------------
diff --git a/gcc/ada/exp_util.ads b/gcc/ada/exp_util.ads
index 14d9e345b53..279feb2e6fe 100644
--- a/gcc/ada/exp_util.ads
+++ b/gcc/ada/exp_util.ads
@@ -773,6 +773,11 @@ package Exp_Util is
--  Return True if N is a type conversion, or a dereference thereof, or a
--  reference to a formal parameter.
 
+   function Is_Expanded_Class_Wide_Interface_Object_Decl
+  (N : Node_Id) return Boolean;
+   --  Determine if N is the expanded code for a class-wide interface type
+   --  object declaration.
+
function Is_Finalizable_Transient
  (Decl : Node_Id;
   N: Node_Id) return Boolean;
diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index 3f956098c6d..ab7fcf8dfd1 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -15223,6 +15223,10 @@ package body Sem_Util is
   then
  return Is_Aliased_View (Expression (Obj));
 
+  elsif Is_Expanded_Class_Wide_Interface_Object_Decl (Parent (Obj)) then
+ return Is_Aliased
+  (Defining_Identifier (Original_Node (Parent (Obj))));
+
   --  The dereference of an access-to-object value denotes an aliased view,
   --  but this routine uses the rules of the language so we need to exclude
   --  rewritten constructs that introduce artificial dereferences.
-- 
2.45.2



[COMMITTED 03/16] ada: First controlling parameter aspect

2024-08-23 Thread Marc Poulhiès
From: Javier Miranda 

gcc/ada/

* sem_ch13.adb (Analyze_One_Aspect): Temporarily remove reporting
an error when the new aspect is set to True and the extensions are
not enabled.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch13.adb | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
index f4ff3a28273..3487931bf4d 100644
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -4524,9 +4524,6 @@ package body Sem_Ch13 is
 if (No (Expr) or else Entity (Expr) = Standard_True)
   and then not Core_Extensions_Allowed
 then
-   Error_Msg_GNAT_Extension
- ("'First_'Controlling_'Parameter", Sloc (Aspect),
-  Is_Core_Extension => True);
goto Continue;
 end if;
 
-- 
2.45.2



[COMMITTED 04/16] ada: Fix validity checks for named parameter associations

2024-08-23 Thread Marc Poulhiès
From: Piotr Trojanek 

When iterating over actual and formal parameters, we should use
First_Actual/Next_Actual and not simply First/Next, because the
order of actual parameters might be different than the order of
formal parameters obtained with First_Formal/Next_Formal.

This patch fixes a glitch in validity checks for actual parameters
and applies the same fix to other misuses of First/Next as well.
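
To illustrate why iterating in source order mispairs actuals and formals, here
is a small model.  It is a sketch under assumptions and not how GNAT
implements First_Actual/Next_Actual: the struct, the function
actual_for_formal and the example call are all hypothetical.  It only shows
that with named associations the N-th actual written in the call need not
correspond to the N-th formal.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one actual in a call: positional actuals have
   formal_name == NULL, named associations carry the formal's name.  */
struct actual
{
  const char *formal_name;
  int value;
};

/* Return the actual matching formal number F_IDX (named F_NAME), or NULL.
   Positional actuals always precede named ones, so if entry F_IDX is
   positional it pairs by position; otherwise look the formal up by name.  */
static const struct actual *
actual_for_formal (const struct actual *actuals, size_t n_actuals,
                   size_t f_idx, const char *f_name)
{
  if (f_idx < n_actuals && actuals[f_idx].formal_name == NULL)
    return &actuals[f_idx];

  for (size_t i = 0; i < n_actuals; i++)
    if (actuals[i].formal_name != NULL
        && strcmp (actuals[i].formal_name, f_name) == 0)
      return &actuals[i];

  return NULL;
}

/* Example: procedure P (A, B : Integer) called as P (B => 2, A => 1).  */
static const char *const formals[] = { "A", "B" };
static const struct actual call_actuals[] = { { "B", 2 }, { "A", 1 } };
```

Walking call_actuals and formals in lockstep would pair formal A with the
value 2; looking each formal up, as the model does, pairs A with 1 and B
with 2.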

gcc/ada/

* checks.adb (Ensure_Valid): Use First_Actual/Next_Actual.
* exp_ch6.adb (Is_Direct_Deep_Call): Likewise.
* exp_util.adb (Type_Of_Formal): Likewise.
* sem_util.adb (Is_Container_Element): Likewise; cleanup
membership test by using a subtype.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/checks.adb   |  4 ++--
 gcc/ada/exp_ch6.adb  |  4 ++--
 gcc/ada/exp_util.adb |  4 ++--
 gcc/ada/sem_util.adb | 10 --
 4 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/gcc/ada/checks.adb b/gcc/ada/checks.adb
index 77043ca07c2..343f027608b 100644
--- a/gcc/ada/checks.adb
+++ b/gcc/ada/checks.adb
@@ -6840,7 +6840,7 @@ package body Checks is
  --  OUT parameter for which we are the argument.
 
  F := First_Formal (E);
- A := First (L);
+ A := First_Actual (P);
  while Present (F) loop
 if A = N
   and then (Ekind (F) = E_Out_Parameter
@@ -6850,7 +6850,7 @@ package body Checks is
 end if;
 
 Next_Formal (F);
-Next (A);
+Next_Actual (A);
  end loop;
   end if;
end if;
diff --git a/gcc/ada/exp_ch6.adb b/gcc/ada/exp_ch6.adb
index 24b754731d2..420d5f44a69 100644
--- a/gcc/ada/exp_ch6.adb
+++ b/gcc/ada/exp_ch6.adb
@@ -3879,7 +3879,7 @@ package body Exp_Ch6 is
Formal : Entity_Id;
 
 begin
-   Actual := First (Parameter_Associations (Call_Node));
+   Actual := First_Actual (Call_Node);
Formal := First_Formal (Subp);
while Present (Actual)
  and then Present (Formal)
@@ -3891,7 +3891,7 @@ package body Exp_Ch6 is
  return True;
   end if;
 
-  Next (Actual);
+  Next_Actual (Actual);
   Next_Formal (Formal);
end loop;
 end;
diff --git a/gcc/ada/exp_util.adb b/gcc/ada/exp_util.adb
index 392bf3a511e..756638c52c2 100644
--- a/gcc/ada/exp_util.adb
+++ b/gcc/ada/exp_util.adb
@@ -13070,14 +13070,14 @@ package body Exp_Util is
   begin
  --  Examine the list of actual and formal parameters in parallel
 
- A := First (Parameter_Associations (Call));
+ A := First_Actual (Call);
  F := First_Formal (Entity (Name (Call)));
  while Present (A) and then Present (F) loop
 if A = Actual then
return Etype (F);
 end if;
 
-Next (A);
+Next_Actual (A);
 Next_Formal (F);
  end loop;
 
diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index ab7fcf8dfd1..688d9232b44 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -15918,10 +15918,8 @@ package body Sem_Util is
elsif Nkind (Parent (Par)) = N_Object_Renaming_Declaration then
   return False;
 
-   elsif Nkind (Parent (Par)) in
-   N_Function_Call|
-   N_Procedure_Call_Statement |
-   N_Entry_Call_Statement
+   elsif Nkind (Parent (Par)) in N_Entry_Call_Statement
+   | N_Subprogram_Call
then
   --  Check that the element is not part of an actual for an
   --  in-out parameter.
@@ -15932,14 +15930,14 @@ package body Sem_Util is
 
   begin
  F := First_Formal (Entity (Name (Parent (Par))));
- A := First (Parameter_Associations (Parent (Par)));
+ A := First_Actual (Parent (Par));
  while Present (F) loop
 if A = Par and then Ekind (F) /= E_In_Parameter then
return False;
 end if;
 
 Next_Formal (F);
-Next (A);
+Next_Actual (A);
  end loop;
   end;
 
-- 
2.45.2



[COMMITTED 06/16] ada: Cleanup validity of boolean operators

2024-08-23 Thread Marc Poulhiès
From: Piotr Trojanek 

Move detection of always valid expressions from routine
Ensure_Valid (which inserts validity checks) to Expr_Known_Valid
(which decides their validity). In particular, this patch removes
duplicated detection of boolean operators, which were recognized
in both these routines.

Code cleanup; behavior is unaffected.

gcc/ada/

* checks.adb (Ensure_Valid): Remove detection of boolean and
short-circuit operators.
(Expr_Known_Valid): Detect short-circuit operators; detection of
boolean operators was already done in this routine.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/checks.adb | 16 +++-
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/gcc/ada/checks.adb b/gcc/ada/checks.adb
index d13e7bb3269..3650c070b7a 100644
--- a/gcc/ada/checks.adb
+++ b/gcc/ada/checks.adb
@@ -6816,17 +6816,6 @@ package body Checks is
  end if;
   end if;
 
-  --  If this is a boolean expression, only its elementary operands need
-  --  checking: if they are valid, a boolean or short-circuit operation
-  --  with them will be valid as well.
-
-  if Base_Type (Typ) = Standard_Boolean
-and then
- (Nkind (Expr) in N_Op or else Nkind (Expr) in N_Short_Circuit)
-  then
- return;
-  end if;
-
   --  If we fall through, a validity check is required
 
   Insert_Valid_Check (Expr, Related_Id, Is_Low_Bound, Is_High_Bound);
@@ -6947,9 +6936,10 @@ package body Checks is
  return True;
 
   --  The result of a membership test is always valid, since it is true or
-  --  false, there are no other possibilities.
+  --  false, there are no other possibilities; same for short-circuit
+  --  operators.
 
-  elsif Nkind (Expr) in N_Membership_Test then
+  elsif Nkind (Expr) in N_Membership_Test | N_Short_Circuit then
  return True;
 
   --  For all other cases, we do not know the expression is valid
-- 
2.45.2



[COMMITTED 15/16] ada: String interpolation: report error without Extensions allowed

2024-08-23 Thread Marc Poulhiès
From: Javier Miranda 

The compiler does not report the correct error in occurrences
of interpolated strings, when the sources are compiled without
language extensions allowed.
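
The shape of the fix can be sketched as follows.  This is a simplified model
under assumptions, not the scanner code: scan_letter, error_count and the
enum are hypothetical names.  The point is that the prefix test is done
first and the extensions test second, so that an f" prefix seen without
extensions is diagnosed and then scanned as an ordinary identifier.

```c
#include <stdbool.h>

enum scan_result { SCAN_INTERPOLATED_STRING, SCAN_IDENTIFIER };

/* Stands in for the diagnostics machinery in this model.  */
static int error_count = 0;

/* Classify the token starting at SRC, which begins with a lower case
   letter.  */
static enum scan_result
scan_letter (const char *src, bool extensions_allowed)
{
  if (src[0] == 'f' && src[1] == '"')
    {
      if (extensions_allowed)
        return SCAN_INTERPOLATED_STRING;

      /* Extension not enabled: report it, then fall through so the 'f'
         is scanned as a normal identifier.  */
      error_count++;
    }

  return SCAN_IDENTIFIER;
}
```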

gcc/ada/

* scng.adb (Scan): Call Error_Msg_GNAT_Extension() to report an
error, when the sources are compiled without
Core_Extensions_Allowed, and the scanner detects the beginning of
an interpolated string.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/scng.adb | 36 +++-
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/gcc/ada/scng.adb b/gcc/ada/scng.adb
index 08ce2ab5ad1..658970fbab2 100644
--- a/gcc/ada/scng.adb
+++ b/gcc/ada/scng.adb
@@ -2135,14 +2135,19 @@ package body Scng is
  --  Lower case letters
 
  when 'a' .. 'z' =>
-if Core_Extensions_Allowed
-  and then Source (Scan_Ptr) = 'f'
+if Source (Scan_Ptr) = 'f'
   and then Source (Scan_Ptr + 1) = '"'
 then
-   Scan_Ptr := Scan_Ptr + 1;
-   Accumulate_Checksum (Source (Scan_Ptr));
-   Token := Tok_Left_Interpolated_String;
-   return;
+   if Core_Extensions_Allowed then
+  Scan_Ptr := Scan_Ptr + 1;
+  Accumulate_Checksum (Source (Scan_Ptr));
+  Token := Tok_Left_Interpolated_String;
+  return;
+   else
+  Error_Msg_GNAT_Extension
+("interpolated string", Scan_Ptr,
+ Is_Core_Extension => True);
+   end if;
 end if;
 
 Name_Len := 1;
@@ -2155,15 +2160,20 @@ package body Scng is
  --  Upper case letters
 
  when 'A' .. 'Z' =>
-if Core_Extensions_Allowed
-  and then Source (Scan_Ptr) = 'F'
+if Source (Scan_Ptr) = 'F'
   and then Source (Scan_Ptr + 1) = '"'
 then
-   Error_Msg_S
- ("delimiter of interpolated string must be in lowercase");
-   Scan_Ptr := Scan_Ptr + 1;
-   Token := Tok_Left_Interpolated_String;
-   return;
+   if Core_Extensions_Allowed then
+  Error_Msg_S
+("delimiter of interpolated string must be in lowercase");
+  Scan_Ptr := Scan_Ptr + 1;
+  Token := Tok_Left_Interpolated_String;
+  return;
+   else
+  Error_Msg_GNAT_Extension
+("interpolated string", Scan_Ptr,
+ Is_Core_Extension => True);
+   end if;
 end if;
 
 Token_Contains_Uppercase := True;
-- 
2.45.2



[COMMITTED 01/16] ada: First controlling parameter aspect

2024-08-23 Thread Marc Poulhiès
From: Javier Miranda 

This patch adds support for a new GNAT aspect/pragma that modifies
the semantics of dispatching primitives. When a tagged type has
this aspect/pragma, only subprograms that have the first parameter
of this type will be considered dispatching primitives; this new
pragma/aspect is inherited by all descendant types.

gcc/ada/

* aspects.ads (Aspect_First_Controlling_Parameter): New aspect.
Defined as implementation defined aspect that has a static boolean
value and it is converted to pragma when the value is True.
* einfo.ads (Has_First_Controlling_Parameter): New attribute.
* exp_ch9.adb (Build_Corresponding_Record): Propagate the aspect
to the corresponding record type.
(Expand_N_Protected_Type_Declaration): Analyze the inherited
aspect to add the pragma.
(Expand_N_Task_Type_Declaration): ditto.
* freeze.adb (Warn_If_Implicitly_Inherited_Aspects): New
subprogram.
(Has_First_Ctrl_Param_Aspect): New subprogram.
(Freeze_Record_Type): Call Warn_If_Implicitly_Inherited_Aspects.
(Freeze_Subprogram): Check illegal subprograms of tagged types and
interface types that have this new aspect.
* gen_il-fields.ads (Has_First_Controlling_Parameter): New entity
field.
* gen_il-gen-gen_entities.adb (Has_First_Controlling_Parameter):
The new field is a semantic flag.
* gen_il-internals.adb (Image): Add
Has_First_Controlling_Parameter.
* par-prag.adb (Prag): No action for
Pragma_First_Controlling_Parameter since processing is handled
entirely in Sem_Prag.
* sem_ch12.adb (Validate_Private_Type_Instance): When the generic
formal has this new aspect, check that the actual type also has
this aspect.
* sem_ch13.adb (Analyze_One_Aspect): Check that the aspect is
applied to a tagged type or a concurrent type.
* sem_ch3.adb (Analyze_Full_Type_Declaration): Derived tagged
types inherit this new aspect, and also from their implemented
interface types.
(Process_Full_View): Propagate the aspect to the full view.
* sem_ch6.adb (Is_A_Primitive): New subprogram; used to factor
code and also clarify detection of primitives.
* sem_ch9.adb (Check_Interfaces): Propagate this new aspect to the
type implementing interface types.
* sem_disp.adb (Check_Controlling_Formals): Handle tagged type
that has the aspect and has subprograms overriding primitives of
tagged types that lack this aspect.
(Check_Dispatching_Operation): Warn on dispatching primitives
disallowed by this new aspect.
(Has_Predefined_Dispatching_Operation_Name): New subprogram.
(Find_Dispatching_Type): Handle dispatching functions of tagged
types that have the new aspect.
(Find_Primitive_Covering_Interface): For primitives of tagged
types that have the aspect and override a primitive of a parent
type that does not have the aspect, we must temporarily unset
attribute First_Controlling_Parameter to properly check
conformance.
* sem_prag.ads (Aspect_Specifying_Pragma): Add new pragma.
* sem_prag.adb (Pragma_First_Controlling_Parameter): Handle new
pragma.
* snames.ads-tmpl (Name_First_Controlling_Parameter): New name.
* warnsw.ads (Warn_On_Non_Dispatching_Primitives): New warning.
* warnsw.adb (Warn_On_Non_Dispatching_Primitives): New warning;
not set by default when GNAT_Mode warnings are enabled, nor when
all warnings are enabled (-gnatwa).

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/aspects.ads |   5 +
 gcc/ada/einfo.ads   |   9 +
 gcc/ada/exp_ch9.adb |  73 
 gcc/ada/freeze.adb  | 279 
 gcc/ada/gen_il-fields.ads   |   1 +
 gcc/ada/gen_il-gen-gen_entities.adb |   3 +
 gcc/ada/gen_il-internals.adb|   2 +
 gcc/ada/par-prag.adb|   1 +
 gcc/ada/sem_ch12.adb|  12 ++
 gcc/ada/sem_ch13.adb|  52 ++
 gcc/ada/sem_ch3.adb |  48 +
 gcc/ada/sem_ch6.adb |  83 +++--
 gcc/ada/sem_ch9.adb |   8 +
 gcc/ada/sem_disp.adb| 207 -
 gcc/ada/sem_prag.adb|  86 +
 gcc/ada/sem_prag.ads|   1 +
 gcc/ada/snames.ads-tmpl |   2 +
 gcc/ada/warnsw.adb  |   4 +-
 gcc/ada/warnsw.ads  |   7 +
 19 files changed, 860 insertions(+), 23 deletions(-)

diff --git a/gcc/ada/aspects.ads b/gcc/ada/aspects.ads
index 9d0a9eb0110..adaa11f8a93 100644
--- a/gcc/ada/aspects.ads
+++ b/gcc/ada/aspects.ads
@@ -198,6 +198,7 @@ package Aspects is
   Aspect_Export,
   Aspect_Extensions_Visible,  

[COMMITTED 05/16] ada: Simplify validity checks for scalar parameters

2024-08-23 Thread Marc Poulhiès
From: Piotr Trojanek 

Replace low-level iteration over formal and actual parameters with a
call to high-level Find_Actual routine. Code cleanup; behavior is
unaffected.

gcc/ada/

* checks.adb (Ensure_Valid): Use Find_Actual.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/checks.adb | 58 +++---
 1 file changed, 8 insertions(+), 50 deletions(-)

diff --git a/gcc/ada/checks.adb b/gcc/ada/checks.adb
index 343f027608b..d13e7bb3269 100644
--- a/gcc/ada/checks.adb
+++ b/gcc/ada/checks.adb
@@ -6799,60 +6799,18 @@ package body Checks is
 
  if Is_Scalar_Type (Typ) then
 declare
-   P : Node_Id;
-   N : Node_Id;
-   E : Entity_Id;
-   F : Entity_Id;
-   A : Node_Id;
-   L : List_Id;
+   Formal : Entity_Id;
+   Call   : Node_Id;
 
 begin
-   --  Find actual argument (which may be a parameter association)
-   --  and the parent of the actual argument (the call statement)
+   Find_Actual (Expr, Formal, Call);
 
-   N := Expr;
-   P := Parent (Expr);
-
-   if Nkind (P) = N_Parameter_Association then
-  N := P;
-  P := Parent (N);
-   end if;
-
-   --  If this is an indirect or dispatching call, get signature
-   --  from the subprogram type.
-
-   if Nkind (P) in N_Entry_Call_Statement
- | N_Function_Call
- | N_Procedure_Call_Statement
+   if Present (Formal)
+ and then
+   (Ekind (Formal) = E_Out_Parameter
+  or else Mechanism (Formal) = By_Reference)
then
-  E := Get_Called_Entity (P);
-  L := Parameter_Associations (P);
-
-  --  Only need to worry if there are indeed actuals, and if
-  --  this could be a subprogram call, otherwise we cannot get
-  --  a match (either we are not an argument, or the mode of
-  --  the formal is not OUT). This test also filters out the
-  --  generic case.
-
-  if Is_Non_Empty_List (L) and then Is_Subprogram (E) then
-
- --  This is the loop through parameters, looking for an
- --  OUT parameter for which we are the argument.
-
- F := First_Formal (E);
- A := First_Actual (P);
- while Present (F) loop
-if A = N
-  and then (Ekind (F) = E_Out_Parameter
- or else Mechanism (F) = By_Reference)
-then
-   return;
-end if;
-
-Next_Formal (F);
-Next_Actual (A);
- end loop;
-  end if;
+  return;
end if;
 end;
  end if;
-- 
2.45.2



[COMMITTED 08/16] ada: First controlling parameter aspect

2024-08-23 Thread Marc Poulhiès
From: Javier Miranda 

gcc/ada/

* sem_ch6.adb (Check_Private_Overriding): Improve code detecting
error on private function with controlling result. Fixes the
regression of ACATS bde0003.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch6.adb | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/sem_ch6.adb b/gcc/ada/sem_ch6.adb
index 008c3a7ba13..461bdfcbe4b 100644
--- a/gcc/ada/sem_ch6.adb
+++ b/gcc/ada/sem_ch6.adb
@@ -11535,8 +11535,16 @@ package body Sem_Ch6 is
 --  operation. That's illegal in the tagged case
 --  (but not if the private type is untagged).
 
+--  Do not report this error when the tagged type has
+--  the First_Controlling_Parameter aspect, unless the
+--  function has a controlling result (which is only
+--  possible if the function overrides an inherited
+--  primitive).
+
 if T = Base_Type (Etype (S))
-  and then Has_Controlling_Result (S)
+  and then
+(not Has_First_Controlling_Parameter_Aspect (T)
+   or else Has_Controlling_Result (S))
 then
Error_Msg_N
  ("private function with controlling result must"
@@ -11550,7 +11558,9 @@ package body Sem_Ch6 is
 
 elsif Ekind (Etype (S)) = E_Anonymous_Access_Type
   and then T = Base_Type (Designated_Type (Etype (S)))
-  and then Has_Controlling_Result (S)
+  and then
+(not Has_First_Controlling_Parameter_Aspect (T)
+   or else Has_Controlling_Result (S))
   and then Ada_Version >= Ada_2012
 then
Error_Msg_N
-- 
2.45.2



[COMMITTED 09/16] ada: Emit a warning on inheritly limited types

2024-08-23 Thread Marc Poulhiès
From: Viljar Indus 

Record types that do not have a limited keyword but have a
member with a limited type are also considered to be limited types.
This can be confusing to understand for newer Ada users. It is
better to emit a warning in this scenario and suggest that the
type should be marked with a limited keyword. This diagnostic will
be activated when the -gnatw_l switch is used.

gcc/ada/

* sem_ch3.adb: Add method Check_Inherited_Limted_Record for
emitting the warning for an inherited limited type.
* warnsw.adb: Add processing for the -gnatw_l switch that
triggers the inherently limited type warning.
* warnsw.ads: same as above.
* doc/gnat_ugn/building_executable_programs_with_gnat.rst: Add
entry for -gnatw_l switch.
* gnat_ugn.texi: Regenerate.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 ...building_executable_programs_with_gnat.rst | 17 ++
 gcc/ada/gnat_ugn.texi | 27 ++-
 gcc/ada/sem_ch3.adb   | 34 +++
 gcc/ada/warnsw.adb|  3 +-
 gcc/ada/warnsw.ads|  7 
 5 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/doc/gnat_ugn/building_executable_programs_with_gnat.rst 
b/gcc/ada/doc/gnat_ugn/building_executable_programs_with_gnat.rst
index ce3ed0cc65a..07ca2ea22c3 100644
--- a/gcc/ada/doc/gnat_ugn/building_executable_programs_with_gnat.rst
+++ b/gcc/ada/doc/gnat_ugn/building_executable_programs_with_gnat.rst
@@ -3430,6 +3430,23 @@ of the pragma in the :title:`GNAT_Reference_manual`).
   This switch suppresses listing of inherited aspects.
 
 
+.. index:: -gnatw_l  (gcc)
+
+:switch:`-gnatw_l`
+  *Activate warnings on inheritely limited types.*
+
+  This switch causes the compiler trigger warnings on record types that do not
+  have a limited keyword but contain a component that is a limited type.
+
+
+.. index:: -gnatw_L  (gcc)
+
+:switch:`-gnatw_L`
+  *Suppress warnings on inheritely limited types.*
+
+  This switch suppresses warnings on inheritely limited types.
+
+
 .. index:: -gnatwm  (gcc)
 
 :switch:`-gnatwm`
diff --git a/gcc/ada/gnat_ugn.texi b/gcc/ada/gnat_ugn.texi
index 0e3ee935552..dcde9ea705b 100644
--- a/gcc/ada/gnat_ugn.texi
+++ b/gcc/ada/gnat_ugn.texi
@@ -19,7 +19,7 @@
 
 @copying
 @quotation
-GNAT User's Guide for Native Platforms , Jul 29, 2024
+GNAT User's Guide for Native Platforms , Aug 19, 2024
 
 AdaCore
 
@@ -11671,6 +11671,31 @@ Pre’Class, and Post’Class aspects. Also list inherited 
subtype predicates.
 This switch suppresses listing of inherited aspects.
 @end table
 
+@geindex -gnatw_l (gcc)
+
+
+@table @asis
+
+@item @code{-gnatw_l}
+
+`Activate warnings on inheritely limited types.'
+
+This switch causes the compiler trigger warnings on record types that do not
+have a limited keyword but contain a component that is a limited type.
+@end table
+
+@geindex -gnatw_L (gcc)
+
+
+@table @asis
+
+@item @code{-gnatw_L}
+
+`Suppress warnings on inheritely limited types.'
+
+This switch suppresses warnings on inheritely limited types.
+@end table
+
 @geindex -gnatwm (gcc)
 
 
diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb
index 3b44f0a5100..4dac4eec108 100644
--- a/gcc/ada/sem_ch3.adb
+++ b/gcc/ada/sem_ch3.adb
@@ -741,6 +741,11 @@ package body Sem_Ch3 is
--  Check that an entity in a list of progenitors is an interface,
--  emit error otherwise.
 
+   procedure Warn_On_Inherently_Limited_Type (E : Entity_Id);
+   --  Emit a warning if a record type that does not have a limited keyword in
+   --  its definition has any components that are limited (which implicitly
+   --  make the type limited).
+
-----------------------
-- Access_Definition --
-----------------------
@@ -22924,6 +22929,8 @@ package body Sem_Ch3 is
  Derive_Progenitor_Subprograms (T, T);
   end if;
 
+  Warn_On_Inherently_Limited_Type (T);
+
   Check_Function_Writable_Actuals (N);
end Record_Type_Declaration;
 
@@ -23396,4 +23403,31 @@ package body Sem_Ch3 is
   Set_Is_Constrained (T);
end Signed_Integer_Type_Declaration;
 
+   -------------------------------------
+   -- Warn_On_Inherently_Limited_Type --
+   -------------------------------------
+
+   procedure Warn_On_Inherently_Limited_Type (E : Entity_Id) is
+  C : Entity_Id;
+   begin
+  if Warnsw.Warn_On_Inherently_Limited_Type
+and then not Is_Limited_Record (E)
+  then
+ C := First_Component (Base_Type (E));
+ while Present (C) loop
+if Is_Inherently_Limited_Type (Etype (C)) then
+   Error_Msg_Node_2 := E;
+   Error_Msg_NE
+ ("?_l?limited component & makes & limited", E, C);
+   Error_Msg_N
+ ("\\?_l?consider annotating the record type "
+  & "with a LIMITED keyword", E);
+   exit;
+end if;
+
+Ne

[COMMITTED 12/16] ada: Implicit_Dereference aspect specification for subtype incorrectly accepted

2024-08-23 Thread Marc Poulhiès
From: Steve Baird 

Implicit_Dereference is a type-specific aspect and therefore cannot be
legally specified as part of a subtype declaration.

gcc/ada/

* sem_ch13.adb (Analyze_Aspect_Implicit_Dereference): Generate
error if an aspect specification specifies the
Implicit_Dereference aspect of a non-first subtype.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch13.adb | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
index 0546aa37de7..a55ba3c7bd9 100644
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -1982,6 +1982,11 @@ package body Sem_Ch13 is
   Error_Msg_N
 ("aspect must apply to a type with discriminants", Expr);
 
+   elsif not Is_First_Subtype (E) then
+  Error_Msg_N
+("aspect not specifiable in a subtype declaration",
+ Aspect);
+
elsif not Is_Entity_Name (Expr) then
   Error_Msg_N
 ("aspect must name a discriminant of current type", Expr);
-- 
2.45.2



[COMMITTED 10/16] ada: Update libraries with the limited flag

2024-08-23 Thread Marc Poulhiès
From: Viljar Indus 

Records without a limited keyword now emit a warning if
they contain a member that has an inherently limited type.

gcc/ada/

* libgnat/a-coinho__shared.ads: add limited keyword.
* libgnat/g-awk.adb: add limited keyword.
* libgnat/g-comlin.ads: add limited keyword.
* libgnat/s-excmac__arm.ads: add limited keyword.
* libgnat/s-excmac__gcc.ads: add limited keyword.
* libgnat/s-soflin.ads: add limited keyword.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/a-coinho__shared.ads | 2 +-
 gcc/ada/libgnat/g-awk.adb| 2 +-
 gcc/ada/libgnat/g-comlin.ads | 4 ++--
 gcc/ada/libgnat/s-excmac__arm.ads| 2 +-
 gcc/ada/libgnat/s-excmac__gcc.ads| 2 +-
 gcc/ada/libgnat/s-soflin.ads | 2 +-
 6 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/ada/libgnat/a-coinho__shared.ads 
b/gcc/ada/libgnat/a-coinho__shared.ads
index ddab1fd7d93..57abd1bafe3 100644
--- a/gcc/ada/libgnat/a-coinho__shared.ads
+++ b/gcc/ada/libgnat/a-coinho__shared.ads
@@ -109,7 +109,7 @@ private
 
type Holder_Access is access all Holder;
 
-   type Shared_Holder is record
+   type Shared_Holder is limited record
   Counter : System.Atomic_Counters.Atomic_Counter;
   Element : Element_Access;
end record;
diff --git a/gcc/ada/libgnat/g-awk.adb b/gcc/ada/libgnat/g-awk.adb
index 62856d9204a..c9284944dd5 100644
--- a/gcc/ada/libgnat/g-awk.adb
+++ b/gcc/ada/libgnat/g-awk.adb
@@ -261,7 +261,7 @@ package body GNAT.AWK is
-- Session Data --
------------------
 
-   type Session_Data is record
+   type Session_Data is limited record
   Current_File : Text_IO.File_Type;
   Current_Line : Unbounded_String;
   Separators   : Split.Mode_Access;
diff --git a/gcc/ada/libgnat/g-comlin.ads b/gcc/ada/libgnat/g-comlin.ads
index c20cd5eb31a..2a131e5d78c 100644
--- a/gcc/ada/libgnat/g-comlin.ads
+++ b/gcc/ada/libgnat/g-comlin.ads
@@ -1045,7 +1045,7 @@ private
 
type Depth is range 1 .. Max_Depth;
 
-   type Level is record
+   type Level is limited record
   Name_Last : Natural := 0;
   Dir   : GNAT.Directory_Operations.Dir_Type;
end record;
@@ -1087,7 +1087,7 @@ private
   --  separators in the pattern.
end record;
 
-   type Opt_Parser_Data (Arg_Count : Natural) is record
+   type Opt_Parser_Data (Arg_Count : Natural) is limited record
   Arguments : GNAT.OS_Lib.Argument_List_Access;
   --  null if reading from the command line
 
diff --git a/gcc/ada/libgnat/s-excmac__arm.ads 
b/gcc/ada/libgnat/s-excmac__arm.ads
index 23d02f85ff9..463191d6b42 100644
--- a/gcc/ada/libgnat/s-excmac__arm.ads
+++ b/gcc/ada/libgnat/s-excmac__arm.ads
@@ -154,7 +154,7 @@ package System.Exceptions.Machine is
--  A GNAT exception object to be dealt with by the personality routine
--  called by the GCC unwinding runtime.
 
-   type GNAT_GCC_Exception is record
+   type GNAT_GCC_Exception is limited record
   Header : Unwind_Control_Block;
   --  ABI Exception header first
 
diff --git a/gcc/ada/libgnat/s-excmac__gcc.ads 
b/gcc/ada/libgnat/s-excmac__gcc.ads
index 24899055506..6cbc92654ec 100644
--- a/gcc/ada/libgnat/s-excmac__gcc.ads
+++ b/gcc/ada/libgnat/s-excmac__gcc.ads
@@ -142,7 +142,7 @@ package System.Exceptions.Machine is
--  A GNAT exception object to be dealt with by the personality routine
--  called by the GCC unwinding runtime.
 
-   type GNAT_GCC_Exception is record
+   type GNAT_GCC_Exception is limited record
   Header : Unwind_Exception;
   --  ABI Exception header first
 
diff --git a/gcc/ada/libgnat/s-soflin.ads b/gcc/ada/libgnat/s-soflin.ads
index c2d947535d9..61025e5961d 100644
--- a/gcc/ada/libgnat/s-soflin.ads
+++ b/gcc/ada/libgnat/s-soflin.ads
@@ -339,7 +339,7 @@ package System.Soft_Links is
--  specific data. This type is used to store the necessary data into the
--  Task_Control_Block or into a global variable in the non tasking case.
 
-   type TSD is record
+   type TSD is limited record
   Pri_Stack_Info : aliased Stack_Checking.Stack_Info;
   --  Information on stack (Base/Limit/Size) used by System.Stack_Checking.
   --  If this TSD does not belong to the environment task, the Size field
-- 
2.45.2



[COMMITTED 07/16] ada: Fix style in lines starting with assignment operator

2024-08-23 Thread Marc Poulhiès
From: Piotr Trojanek 

Style cleanup; semantics is unaffected. Offending occurrences found with
grep "^ *:=" and fixed manually.

gcc/ada/

* checks.ads, cstand.adb, exp_aggr.adb, exp_ch4.adb, exp_ch5.adb,
exp_dbug.adb, exp_util.adb, gnatlink.adb, lib-util.adb,
libgnat/a-except.adb, libgnat/a-exexpr.adb, libgnat/a-ngcoar.adb,
libgnat/s-rannum.adb, libgnat/s-trasym__dwarf.adb, osint.adb,
rtsfind.adb, sem_case.adb, sem_ch12.adb, sem_ch13.adb,
sem_ch3.adb, sem_ch6.adb, sem_eval.adb, sem_prag.adb,
sem_util.adb: Fix style.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/checks.ads  |  4 +-
 gcc/ada/cstand.adb  |  4 +-
 gcc/ada/exp_aggr.adb| 14 +++---
 gcc/ada/exp_ch4.adb | 16 +++
 gcc/ada/exp_ch5.adb | 10 ++---
 gcc/ada/exp_dbug.adb|  4 +-
 gcc/ada/exp_util.adb| 28 ++--
 gcc/ada/gnatlink.adb|  6 ++-
 gcc/ada/lib-util.adb|  3 +-
 gcc/ada/libgnat/a-except.adb|  4 +-
 gcc/ada/libgnat/a-exexpr.adb|  4 +-
 gcc/ada/libgnat/a-ngcoar.adb|  4 +-
 gcc/ada/libgnat/s-rannum.adb| 14 +++---
 gcc/ada/libgnat/s-trasym__dwarf.adb | 16 +++
 gcc/ada/osint.adb   |  8 ++--
 gcc/ada/rtsfind.adb |  4 +-
 gcc/ada/sem_case.adb| 16 +++
 gcc/ada/sem_ch12.adb| 18 
 gcc/ada/sem_ch13.adb| 12 ++---
 gcc/ada/sem_ch3.adb |  8 ++--
 gcc/ada/sem_ch6.adb |  4 +-
 gcc/ada/sem_eval.adb| 10 +++--
 gcc/ada/sem_prag.adb| 20 -
 gcc/ada/sem_util.adb| 68 ++---
 24 files changed, 151 insertions(+), 148 deletions(-)

diff --git a/gcc/ada/checks.ads b/gcc/ada/checks.ads
index 322629a3c1f..83d3fdb8329 100644
--- a/gcc/ada/checks.ads
+++ b/gcc/ada/checks.ads
@@ -49,8 +49,8 @@ package Checks is
   record
  Elements : Bit_Vector (1 .. Dimensions);
   end record;
-   Empty_Dimension_Set : constant Dimension_Set
- := (Dimensions => 0, Elements => (others => <>));
+   Empty_Dimension_Set : constant Dimension_Set :=
+ (Dimensions => 0, Elements => (others => <>));
 
procedure Initialize;
--  Called for each new main source program, to initialize internal
diff --git a/gcc/ada/cstand.adb b/gcc/ada/cstand.adb
index 6b45d252163..d2e4a6b0c82 100644
--- a/gcc/ada/cstand.adb
+++ b/gcc/ada/cstand.adb
@@ -1334,8 +1334,8 @@ package body CStand is
   --  used internally. They are unsigned types with the same length as
   --  the correspondingly named signed integer types.
 
-  Standard_Short_Short_Unsigned
-:= New_Standard_Entity ("short_short_unsigned");
+  Standard_Short_Short_Unsigned :=
+New_Standard_Entity ("short_short_unsigned");
   Build_Unsigned_Integer_Type
 (Standard_Short_Short_Unsigned, Standard_Short_Short_Integer_Size);
 
diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index aa6079d82b5..83b88e7cf73 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -7126,9 +7126,9 @@ package body Exp_Aggr is
 
   --  Determine whether this is an indexed aggregate (see RM 4.3.5(25/5))
 
-  Is_Indexed_Aggregate
-:= Sem_Aggr.Is_Indexed_Aggregate
- (N, Add_Unnamed_Subp, New_Indexed_Subp);
+  Is_Indexed_Aggregate :=
+Sem_Aggr.Is_Indexed_Aggregate
+  (N, Add_Unnamed_Subp, New_Indexed_Subp);
 
   --  The constructor for bounded containers is a function with
   --  a parameter that sets the size of the container. If the
@@ -7140,8 +7140,8 @@ package body Exp_Aggr is
   declare
  Count_Type : Entity_Id := Standard_Natural;
  Default: Node_Id   := Empty;
- Empty_First_Formal : constant Entity_Id
-:= First_Formal (Entity (Empty_Subp));
+ Empty_First_Formal : constant Entity_Id :=
+   First_Formal (Entity (Empty_Subp));
  Param_List : List_Id;
 
   begin
@@ -7636,8 +7636,8 @@ package body Exp_Aggr is
 
declare
   --  recursively get name for prefix
-  LHS_Prefix : constant Node_Id
-:= Make_Delta_Choice_LHS (Prefix (Choice), Deep_Choice);
+  LHS_Prefix : constant Node_Id :=
+Make_Delta_Choice_LHS (Prefix (Choice), Deep_Choice);
begin
   if Nkind (Choice) = N_Indexed_Component then
  return Make_Indexed_Component (Sloc (Choice),
diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb
index 106305f4636..9024c1aebb2 100644
--- a/gcc/ada/exp_ch4.adb
+++ b/gcc/ada/exp_ch4.adb
@@ -2799,9 +2799,9 @@ package body Exp_Ch4 is
 
if Is_Constrained (Opnd_Typ) then
   declare
-

[COMMITTED 14/16] ada: Fix incorrect tracebacks on Windows

2024-08-23 Thread Marc Poulhiès
From: Sebastian Poeplau 

PECOFF symbols don't have a size attached to them. The symbol size that
System.Object_Reader.Read_Symbol guesses to make up for the lack of
information can be wrong when the symbol table doesn't match the
algorithm's expectations; in particular that's the case when function
symbols aren't sorted by address.

To avoid incorrect tracebacks caused by wrong symbol size guesses, don't
use the symbol size for PECOFF files when producing a traceback and
instead pick the symbol with the highest address lower than the target
address.
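
The lookup described above can be sketched in C as follows. This is only an illustration of the "closest preceding symbol" idea, not the Ada code in s-dwalin.adb: `struct symbol` and `closest_symbol` are hypothetical names, and the table is assumed to be unsorted, as the commit message says it may be. Unlike the loop in the diff, the sketch never accepts a candidate that lies above the target address, not even as the first candidate.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol record; not the System.Object_Reader type.  */
struct symbol
{
  const char *name;
  uint64_t value; /* start address of the symbol */
};

/* Return the symbol with the highest start address that is still
   <= ADDR, or NULL if every symbol starts above ADDR.  The table does
   not have to be sorted, so the whole table is always scanned.  */
static const struct symbol *
closest_symbol (const struct symbol *syms, size_t n, uint64_t addr)
{
  const struct symbol *best = NULL;

  for (size_t i = 0; i < n; i++)
    {
      if (syms[i].value > addr)
        continue; /* starts after the target, cannot contain it */

      if (best == NULL || best->value < syms[i].value)
        best = &syms[i];
    }

  return best;
}
```

Because no symbol size is consulted, an address past the end of the last function still resolves to that function; that appears to be the trade-off the patch accepts for PECOFF.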

gcc/ada/

* libgnat/s-dwalin.adb (Symbolic_Address): Ignore symbol size in
address-to-symbol translation for PECOFF files.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/s-dwalin.adb | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/libgnat/s-dwalin.adb b/gcc/ada/libgnat/s-dwalin.adb
index 46a7d61e78d..028a55d1f20 100644
--- a/gcc/ada/libgnat/s-dwalin.adb
+++ b/gcc/ada/libgnat/s-dwalin.adb
@@ -1753,6 +1753,7 @@ package body System.Dwarf_Lines is
   Success  : Boolean;
   Done : Boolean;
   S: Object_Symbol;
+  Closest_S: Object_Symbol := Null_Symbol;
 
begin
   --  Initialize result
@@ -1801,7 +1802,22 @@ package body System.Dwarf_Lines is
   else
  S := First_Symbol (C.Obj.all);
  while S /= Null_Symbol loop
-if Spans (S, Addr_Int) then
+if Format (C.Obj.all) = PECOFF
+  or else Format (C.Obj.all) = PECOFF_PLUS
+then
+   --  Don't use the size of symbols from PECOFF files; it's
+   --  just a guess and can be unreliable. Instead, iterate
+   --  over the entire symbol table and use the symbol with the
+   --  highest address lower than Addr_Int.
+
+   if Closest_S = Null_Symbol
+ or else (Closest_S.Value < S.Value
+   and then S.Value <= Addr_Int)
+   then
+  Closest_S := S;
+   end if;
+
+elsif Spans (S, Addr_Int) then
Subprg_Name := Object_Reader.Name (C.Obj.all, S);
exit;
 end if;
@@ -1809,6 +1825,14 @@ package body System.Dwarf_Lines is
 S := Next_Symbol (C.Obj.all, S);
  end loop;
 
+ if (Format (C.Obj.all) = PECOFF
+ or else Format (C.Obj.all) = PECOFF_PLUS)
+   and then Closest_S /= Null_Symbol
+ then
+S := Closest_S; --  for consistency with non-PECOFF
+Subprg_Name := Object_Reader.Name (C.Obj.all, S);
+ end if;
+
  --  Search address in aranges table
 
  Aranges_Lookup (C, Addr, Info_Offset, Success);
-- 
2.45.2



[COMMITTED 11/16] ada: Eliminated-mode overflow check not eliminated

2024-08-23 Thread Marc Poulhiès
From: Steve Baird 

If the Overflow_Mode in effect is Eliminated, then evaluating an arithmetic
op such as addition or subtraction should not fail an overflow check.
Fix a bug which resulted in such an overflow check failure.
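
As a rough illustration of what "eliminated" means for a comparison, here is a minimal C sketch. It assumes 32-bit operands and simply widens them to 64 bits so that the intermediate sum cannot overflow; the function name is hypothetical, and the mechanism GNAT actually uses for ELIMINATED mode is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Evaluate (a + b) > c for 32-bit operands without an intermediate
   overflow: the sum of two 32-bit values always fits in 64 bits, so
   the comparison is done on the mathematically exact result.  */
static bool
sum_greater_than (int32_t a, int32_t b, int32_t c)
{
  int64_t sum = (int64_t) a + (int64_t) b;
  return sum > (int64_t) c;
}
```

The result of the comparison is a plain boolean, so no wide value escapes the expression; only the operands of the relational operator need the special treatment.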

gcc/ada/

* checks.adb (Is_Signed_Integer_Arithmetic_Op): Return True in the
case of relational operator whose operands are of a signed integer
type.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/checks.adb | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/ada/checks.adb b/gcc/ada/checks.adb
index 3650c070b7a..83879a519f7 100644
--- a/gcc/ada/checks.adb
+++ b/gcc/ada/checks.adb
@@ -330,10 +330,11 @@ package body Checks is
 
function Is_Signed_Integer_Arithmetic_Op (N : Node_Id) return Boolean;
--  Returns True if node N is for an arithmetic operation with signed
-   --  integer operands. This includes unary and binary operators, and also
-   --  if and case expression nodes where the dependent expressions are of
-   --  a signed integer type. These are the kinds of nodes for which special
-   --  handling applies in MINIMIZED or ELIMINATED overflow checking mode.
+   --  integer operands. This includes unary and binary operators (including
+   --  comparison operators), and also if and case expression nodes which
+   --  yield a value of a signed integer type.
+   --  These are the kinds of nodes for which special handling applies in
+   --  MINIMIZED or ELIMINATED overflow checking mode.
 
function Range_Or_Validity_Checks_Suppressed
  (Expr : Node_Id) return Boolean;
@@ -8337,6 +8338,9 @@ package body Checks is
  =>
 return Is_Signed_Integer_Type (Etype (N));
 
+ when N_Op_Compare =>
+return Is_Signed_Integer_Type (Etype (Left_Opnd (N)));
+
  when N_Case_Expression
 | N_If_Expression
  =>
-- 
2.45.2



[COMMITTED 13/16] ada: Crash on string interpolation with custom string types

2024-08-23 Thread Marc Poulhiès
From: Javier Miranda 

The compiler crashes when processing an object declaration
of a custom string type initialized with an interpolated
string.

gcc/ada/

* exp_attr.adb (Expand_N_Attribute_Reference: [Put_Image]): Add
support for custom string types.
* exp_ch2.adb (Expand_N_Interpolated_String_Literal): Add a type
conversion to the result object declaration of custom string
types.
* exp_put_image.adb (Build_String_Put_Image_Call): Handle custom
string types.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_attr.adb  | 28 +++-
 gcc/ada/exp_ch2.adb   | 14 ++
 gcc/ada/exp_put_image.adb | 36 +++-
 3 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/exp_attr.adb b/gcc/ada/exp_attr.adb
index 6475308f71b..84c7a4bbdee 100644
--- a/gcc/ada/exp_attr.adb
+++ b/gcc/ada/exp_attr.adb
@@ -6006,6 +6006,7 @@ package body Exp_Attr is
   when Attribute_Put_Image => Put_Image : declare
  use Exp_Put_Image;
  U_Type : constant Entity_Id := Underlying_Type (Entity (Pref));
+ C_Type : Entity_Id;
  Pname  : Entity_Id;
  Decl   : Node_Id;
 
@@ -6031,6 +6032,21 @@ package body Exp_Attr is
  end if;
 
  if No (Pname) then
+if Is_String_Type (U_Type) then
+   declare
+  R : constant Entity_Id := Root_Type (U_Type);
+
+   begin
+  if Is_Private_Type (R) then
+ C_Type := Component_Type (Full_View (R));
+  else
+ C_Type := Component_Type (R);
+  end if;
+
+  C_Type := Root_Type (Underlying_Type (C_Type));
+   end;
+end if;
+
 --  If Put_Image is disabled, call the "unknown" version
 
 if not Put_Image_Enabled (U_Type) then
@@ -6046,7 +6062,17 @@ package body Exp_Attr is
Analyze (N);
return;
 
-elsif Is_Standard_String_Type (U_Type) then
+--  String type objects, including custom string types, and
+--  excluding C arrays.
+
+elsif Is_String_Type (U_Type)
+  and then C_Type in Standard_Character
+   | Standard_Wide_Character
+   | Standard_Wide_Wide_Character
+  and then (not RTU_Loaded (Interfaces_C)
+  or else Enclosing_Lib_Unit_Entity (U_Type)
+/= RTU_Entity (Interfaces_C))
+then
Rewrite (N, Build_String_Put_Image_Call (N));
Analyze (N);
return;
diff --git a/gcc/ada/exp_ch2.adb b/gcc/ada/exp_ch2.adb
index 958f4299b73..99a16947525 100644
--- a/gcc/ada/exp_ch2.adb
+++ b/gcc/ada/exp_ch2.adb
@@ -768,6 +768,7 @@ package body Exp_Ch2 is
New_Occurrence_Of (Sink_Entity, Loc;
 
  Actions  : constant List_Id := New_List;
+ U_Type   : constant Entity_Id := Underlying_Type (Etype (N));
  Elem_Typ : Entity_Id;
  Str_Elem : Node_Id;
 
@@ -810,6 +811,19 @@ package body Exp_Ch2 is
 Next (Str_Elem);
  end loop;
 
+ --  Add a type conversion to the result object declaration of custom
+ --  string types.
+
+ if not Is_Standard_String_Type (U_Type)
+   and then (not RTU_Loaded (Interfaces_C)
+   or else Enclosing_Lib_Unit_Entity (U_Type)
+ /= RTU_Entity (Interfaces_C))
+ then
+Set_Expression (Result_Decl,
+  Convert_To (Etype (N),
+Relocate_Node (Expression (Result_Decl))));
+ end if;
+
  Append_To (Actions, Result_Decl);
 
  return Make_Expression_With_Actions (Loc,
diff --git a/gcc/ada/exp_put_image.adb b/gcc/ada/exp_put_image.adb
index 217c38a30e7..190ac99b565 100644
--- a/gcc/ada/exp_put_image.adb
+++ b/gcc/ada/exp_put_image.adb
@@ -417,14 +417,48 @@ package body Exp_Put_Image is
   Lib_RE  : RE_Id;
   use Stand;
begin
+  pragma Assert (Is_String_Type (U_Type));
+  pragma Assert (not RTU_Loaded (Interfaces_C)
+or else Enclosing_Lib_Unit_Entity (U_Type)
+  /= RTU_Entity (Interfaces_C));
+
   if R = Standard_String then
  Lib_RE := RE_Put_Image_String;
   elsif R = Standard_Wide_String then
  Lib_RE := RE_Put_Image_Wide_String;
   elsif R = Standard_Wide_Wide_String then
  Lib_RE := RE_Put_Image_Wide_Wide_String;
+
   else
- raise Program_Error;
+ --  Handle custom string types. For example:
+
+ -- type T is array (1 .. 10) of Character;
+ -- Obj : T := (others => 'A');
+ -- ...
+ -- Put (Obj'Image);
+
+ declare
+C_Type : Entity_Id;
+
+ 

[COMMITTED 16/16] ada: Fix crash on aliased variable with packed array type and -g switch

2024-08-23 Thread Marc Poulhiès
From: Eric Botcazou 

This comes from a loophole in gnat_get_array_descr_info for record types
containing a template, which represent an aliased array, when this array
type is bit-packed and implemented as a modular integer.

gcc/ada/

* gcc-interface/misc.cc (gnat_get_array_descr_info): Test the
BIT_PACKED_ARRAY_TYPE_P flag only once on the final debug type. In
the case of records containing a template, replay the entire
processing for the array type contained therein.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/misc.cc | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/gcc/ada/gcc-interface/misc.cc b/gcc/ada/gcc-interface/misc.cc
index f77629ce70b..13cb39e91cb 100644
--- a/gcc/ada/gcc-interface/misc.cc
+++ b/gcc/ada/gcc-interface/misc.cc
@@ -784,7 +784,7 @@ gnat_get_array_descr_info (const_tree const_type,
 {
   tree type = const_cast<tree> (const_type);
   tree first_dimen, dimen;
-  bool is_bit_packed_array, is_array;
+  bool is_array;
   int i;
 
   /* Temporaries created in the first pass and used in the second one for thin
@@ -797,12 +797,7 @@ gnat_get_array_descr_info (const_tree const_type,
   /* If we have an implementation type for a packed array, get the original
  array type.  */
   if (TYPE_IMPL_PACKED_ARRAY_P (type) && TYPE_ORIGINAL_PACKED_ARRAY (type))
-{
-  is_bit_packed_array = BIT_PACKED_ARRAY_TYPE_P (type);
-  type = TYPE_ORIGINAL_PACKED_ARRAY (type);
-}
-  else
-is_bit_packed_array = false;
+type = TYPE_ORIGINAL_PACKED_ARRAY (type);
 
   /* First pass: gather all information about this array except everything
  related to dimensions.  */
@@ -833,6 +828,14 @@ gnat_get_array_descr_info (const_tree const_type,
   tree array_field = DECL_CHAIN (bounds_field);
   tree array_type = TREE_TYPE (array_field);
 
+  /* Replay the entire processing for array types.  */
+  if (TYPE_CAN_HAVE_DEBUG_TYPE_P (array_type)
+  && TYPE_DEBUG_TYPE (array_type))
+array_type = TYPE_DEBUG_TYPE (array_type);
+  if (TYPE_IMPL_PACKED_ARRAY_P (array_type)
+  && TYPE_ORIGINAL_PACKED_ARRAY (array_type))
+array_type = TYPE_ORIGINAL_PACKED_ARRAY (array_type);
+
   /* Shift back the address to get the address of the template.  */
   tree shift_amount
= fold_build1 (NEGATE_EXPR, sizetype, byte_position (array_field));
@@ -859,9 +862,7 @@ gnat_get_array_descr_info (const_tree const_type,
   /* If this array has fortran convention, it's arranged in column-major
  order, so our view here has reversed dimensions.  */
   const bool convention_fortran_p = TYPE_CONVENTION_FORTRAN_P (first_dimen);
-
-  if (BIT_PACKED_ARRAY_TYPE_P (first_dimen))
-is_bit_packed_array = true;
+  const bool is_bit_packed_array = BIT_PACKED_ARRAY_TYPE_P (first_dimen);
 
   /* ??? For row major ordering, we probably want to emit nothing and
  instead specify it as the default in Dw_TAG_compile_unit.  */
-- 
2.45.2



[PATCH] c++: Add testcase for (now fixed) regression [PR113746]

2024-08-23 Thread Simon Martin
The case in PR113746 used to ICE until commit f04dc89a991. This patch
simply adds the case to the testsuite.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/113746

gcc/testsuite/ChangeLog:

* g++.dg/parse/crash76.C: New test.

---
 gcc/testsuite/g++.dg/parse/crash76.C | 6 ++
 1 file changed, 6 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/parse/crash76.C

diff --git a/gcc/testsuite/g++.dg/parse/crash76.C 
b/gcc/testsuite/g++.dg/parse/crash76.C
new file mode 100644
index 000..6fbd1fa9f7e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/crash76.C
@@ -0,0 +1,6 @@
+// PR c++/113746
+// { dg-do compile }
+
+template struct S { // { dg-error "not been declared" }
+  enum { e0 = 0, e00 = e0 };
+};
-- 
2.44.0




[Fortran, Patch, PR85002, v1] Fix deep-copy of alloc. comps. in coarrays ICEing and crashing w/ lib.

2024-08-23 Thread Andre Vehreschild
Hi all,

the attached patch fixes an ICE during the trans phase that occurred when
allocatable components were nested in coarrays of derived type. I am nearly
convinced that the original ICE was mostly fixed by the fix for PR86468,
because I now get a slightly different ICE. Nevertheless, the executable still
crashed with -fcoarray=lib because the deep copy was not inserted in the
coarray case; this patch fixes that. I also corrected a comment that described
the inverse of the code that follows it.

Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de
From db6ca3808956b6e7c6950cbf572f2bba0995e03e Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Fri, 23 Aug 2024 09:07:09 +0200
Subject: [PATCH] [Fortran] Fix deep copy allocatable components in coarrays.
 [PR85002]

The code for the deep copy of allocatable components in nested
derived-type structures was generated, but not inserted, when the copy
had to be done in a coarray.  Fix that.  Additionally fix a comment.

gcc/fortran/ChangeLog:

	PR fortran/85002
	* trans-array.cc (duplicate_allocatable_coarray): Allow adding
	of deep copy code in the when-allocated case.  Add bounds
	computation before condition, because coarrays need the bounds
	also when not allocated.
	(structure_alloc_comps): Duplication in the coarray case is done
	already, omit it.  Add the deep-copy code when duplicating a coarray.
	* trans-expr.cc (gfc_trans_structure_assign): Fix comment.

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray/alloc_comp_9.f90: New test.
---
 gcc/fortran/trans-array.cc| 16 ++---
 gcc/fortran/trans-expr.cc |  2 +-
 .../gfortran.dg/coarray/alloc_comp_9.f90  | 23 +++
 3 files changed, 32 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/coarray/alloc_comp_9.f90

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index ea5fff2e0c2..3472fc5b636 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -9505,10 +9505,9 @@ gfc_duplicate_allocatable_nocopy (tree dest, tree src, tree type, int rank)
 NULL_TREE, NULL_TREE);
 }

-
 static tree
-duplicate_allocatable_coarray (tree dest, tree dest_tok, tree src,
-			   tree type, int rank)
+duplicate_allocatable_coarray (tree dest, tree dest_tok, tree src, tree type,
+			   int rank, tree add_when_allocated)
 {
   tree tmp;
   tree size;
@@ -9562,7 +9561,7 @@ duplicate_allocatable_coarray (tree dest, tree dest_tok, tree src,
   gfc_add_modify (&globalblock, tmp, build_int_cst (TREE_TYPE (tmp), rank));

   if (rank)
-	nelems = gfc_full_array_size (&block, src, rank);
+	nelems = gfc_full_array_size (&globalblock, src, rank);
   else
 	nelems = integer_one_node;

@@ -9593,7 +9592,7 @@ duplicate_allocatable_coarray (tree dest, tree dest_tok, tree src,
  fold_convert (size_type_node, size));
   gfc_add_expr_to_block (&block, tmp);
 }
-
+  gfc_add_expr_to_block (&block, add_when_allocated);
   tmp = gfc_finish_block (&block);

   /* Null the destination if the source is null; otherwise do
@@ -9772,7 +9771,7 @@ structure_alloc_comps (gfc_symbol * der_type, tree decl, tree dest,
 	 gfc_duplicate_allocatable (), where the deep copy code is just added
 	 into the if's body, by adding tmp (the deep copy code) as last
 	 argument to gfc_duplicate_allocatable ().  */
-  if (purpose == COPY_ALLOC_COMP
+  if (purpose == COPY_ALLOC_COMP && caf_mode == 0
 	  && GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (dest)))
 	tmp = gfc_duplicate_allocatable (dest, decl, decl_type, rank,
 	 tmp);
@@ -10502,8 +10501,9 @@ structure_alloc_comps (gfc_symbol * der_type, tree decl, tree dest,
 		 c->caf_token,
 		 NULL_TREE);
 		}
-		  tmp = duplicate_allocatable_coarray (dcmp, dst_tok, comp,
-		   ctype, rank);
+		  tmp
+		= duplicate_allocatable_coarray (dcmp, dst_tok, comp, ctype,
+		 rank, add_when_allocated);
 		}
 	  else
 		tmp = gfc_duplicate_allocatable (dcmp, comp, ctype, rank,
diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 4681a131139..4489490bb7d 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -9692,7 +9692,7 @@ gfc_trans_structure_assign (tree dest, gfc_expr * expr, bool init, bool coarray)

   /* Register the component with the caf-lib before it is initialized.
 	 Register only allocatable components, that are not coarray'ed
-	 components (%comp[*]).  Only register when the constructor is not the
+	 components (%comp[*]).  Only register when the constructor is the
 	 null-expression.  */
   if (coarray && !cm->attr.codimension
 	  && (cm->attr.allocatable || cm->attr.pointer)
diff --git a/gcc/testsuite/gfortran.dg/coarray/alloc_comp_9.f90 b/gcc/testsuite/gfortran.dg/coarray/alloc_comp_9.f90
new file mode 100644
index 000..d8e739a07d8
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/coarray/alloc_comp_9.f90
@@ -0,0 +1,23 @@
+!{ dg-do run }
+
+! C

Re: [PATCH] Fix test failure on powerpc targets

2024-08-23 Thread Richard Biener



> On 23.08.2024 at 06:42, Bernd Edlinger wrote:
> 
> Apparently, due to slightly different optimization levels,
> both subroutines do not always have multiple subranges,
> but having at least one such subroutine, and no lexical blocks,
> is sufficient to prove that the fix worked.  Q.E.D.
> So reduce the test expectations to at least one
> inlined subroutine with multiple subranges.

Ok

Richard 

> gcc/testsuite/ChangeLog:
> 
>PR 116462
>* gcc.dg/debug/dwarf2/inline7.c: Reduce test expectations.
> ---
> gcc/testsuite/gcc.dg/debug/dwarf2/inline7.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
> 
> fix was confirmed by reporter, OK for trunk?
> 
> diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/inline7.c 
> b/gcc/testsuite/gcc.dg/debug/dwarf2/inline7.c
> index 48d457216b1..083df5b586c 100644
> --- a/gcc/testsuite/gcc.dg/debug/dwarf2/inline7.c
> +++ b/gcc/testsuite/gcc.dg/debug/dwarf2/inline7.c
> @@ -1,9 +1,9 @@
> -/* Verify that both inline instances have a DW_AT_ranges but
> -   no extra DW_TAG_lexical_block.  */
> +/* Verify that at least one of both inline instances have
> +   a DW_AT_ranges but no extra DW_TAG_lexical_block.  */
> /* { dg-options "-O -gdwarf -dA" } */
> /* { dg-do compile } */
> /* { dg-final { scan-assembler-times "\\(DIE \\(\[^\n\]*\\) 
> DW_TAG_inlined_subroutine" 2 } } */
> -/* { dg-final { scan-assembler-times " DW_AT_ranges" 2 } } */
> +/* { dg-final { scan-assembler " DW_AT_ranges" } } */
> /* { dg-final { scan-assembler-times "\\(DIE \\(\[^\n\]*\\) 
> DW_TAG_lexical_block" 0 } } */
> 
> static int foo (int i)
> --
> 2.39.2
> 


Re: [patch][v2a] libgomp: Add interop types and routines to OpenMP's headers and module

2024-08-23 Thread Andre Vehreschild
Oh, and I get compile errors:

/mnt/work_store/gcc/gcc.test/libgomp/target.c: In function 
'omp_get_interop_type_desc':
/mnt/work_store/gcc/gcc.test/libgomp/target.c:5179:30: error: excess elements 
in scalar initializer [-Werror]
 5179 |  "const char*", /* fr_name */
  |  ^
/mnt/work_store/gcc/gcc.test/libgomp/target.c:5179:30: note: (near 
initialization for 'desc')
/mnt/work_store/gcc/gcc.test/libgomp/target.c:5180:30: error: excess elements 
in scalar initializer [-Werror]
 5180 |  "int", /* vendor */
  |  ^
/mnt/work_store/gcc/gcc.test/libgomp/target.c:5180:30: note: (near 
initialization for 'desc')
/mnt/work_store/gcc/gcc.test/libgomp/target.c:5181:30: error: excess elements 
in scalar initializer [-Werror]
 5181 |  "const char *",/* vendor_name */
  |  ^~
/mnt/work_store/gcc/gcc.test/libgomp/target.c:5181:30: note: (near 
initialization for 'desc')
/mnt/work_store/gcc/gcc.test/libgomp/target.c:5182:30: error: excess elements 
in scalar initializer [-Werror]
 5182 |  "int"};/* device_num */
  |  ^
/mnt/work_store/gcc/gcc.test/libgomp/target.c:5182:30: note: (near 
initialization for 'desc')
/mnt/work_store/gcc/gcc.test/libgomp/target.c:5185:21: error: 'fr_id' 
undeclared (first use in this function)
 5185 |   if (property_id > fr_id || property_id < omp_ipr_first)
  | ^
/mnt/work_store/gcc/gcc.test/libgomp/target.c:5185:21: note: each undeclared 
identifier is reported only once for each function it appears in
/mnt/work_store/gcc/gcc.test/libgomp/target.c:5188:16: error: returning 'char' 
from a function with return type 'const char *' makes pointer from integer 
without a cast [-Wint-conversion]
 5188 | return desc[omp_ipr_fr_id - property_id];
  |^

- Andre

On Fri, 23 Aug 2024 10:28:56 +0200
Andre Vehreschild  wrote:

> Hi Tobias,
> 
> I just had a short look at your PR. Besides that it did not git-am for me (see
> below), I have only one question (see also below). Please note, that I have
> only user-level experience in OpenMP and can say nothing about completeness or
> soundness of your PR. I hope that a first check on overall style motivates a
> "pro" to have a more in-depth look.
> 
> So my "ok" is just for style and overall applicability.
> 
> Regards,
>   Andre
> 
> On Thu, 22 Aug 2024 09:14:58 +0200
> Tobias Burnus  wrote:
> 
> > This is nearly identical to v2, except that I presumably used 'git add 
> > testsuite' when intending to use 'git add -u testsuite' in a last-minute 
> > change as it contained a bunch of unrelated test files …
> > 
> > The only other change besides removing unrelated files  is that for the 
> > generic part of omp_get_interop_type_desc, the data types ('int' for 
> > fr_id, vendor, device_num; const char*' for fr_name, vendor_name) are 
> > now returned in target.c while the specific types (for device, 
> > device_context, targetsync platform) will eventually be handled by the 
> > plugin function.
> > 
> > Tobias
> > 
> > On 21.08.24 at 20:27, Tobias Burnus wrote:
> > > Nearly identical to v1, except that I realized that OpenMP permits to 
> > > call those functions also from target regions.
> > >
> > > Hence, those also got those functions, including a use of 
> > > omp_irc_other to make clear why it will fail …
> > >
> > > In addition, two (nonhost) target-region test files were added.
> > >
> > > Comments, remarks, suggestions before I commit it?  
> 
> 
> Attachment: 
> 
> > libgomp: Add interop types and routines to OpenMP's headers and module  
> 
> git am did not work for me:
> $ git am interop-1v2a.diff
> Applying: This commit adds OpenMP 5.1+'s interop enumeration, type and routine
> /mnt/work_store/gcc/gcc/.git/worktrees/gcc.test/rebase-apply/patch:839:
> indent with spaces. "const char*", /* fr_name */
> /mnt/work_store/gcc/gcc/.git/worktrees/gcc.test/rebase-apply/patch:840:
> indent with spaces. "int", /* vendor */
> /mnt/work_store/gcc/gcc/.git/worktrees/gcc.test/rebase-apply/patch:841:
> indent with spaces. "const char *",/* vendor_name */
> /mnt/work_store/gcc/gcc/.git/worktrees/gcc.test/rebase-apply/patch:842:
> indent with spaces. "int"};/* device_num */
> /mnt/work_store/gcc/gcc/.git/worktrees/gcc.test/rebase-apply/patch:1332:
> space before tab in indent. "omp_interop_none"));  /* GCC implementation
> choice.  */
> warning: 5 lines add whitespace errors.
> fatal: empty ident name (for <>) not allowed
> 
> > diff --git a/libgomp/config/gcn/target.c b/libgomp/config/gcn/target.c
> > index 9cafea4e2cc..e9141f20ef3 100644
> > --- a/libgomp/config/gcn/target.c
> > +++ b/libgomp/co

Re: [PATCHv4, expand] Add const0 move checking for CLEAR_BY_PIECES optabs

2024-08-23 Thread HAO CHEN GUI
Hi Hongtao,

On 2024/8/23 11:47, Hongtao Liu wrote:
> On Fri, Aug 23, 2024 at 11:03 AM HAO CHEN GUI  wrote:
>>
>> Hi Hongtao,
>>
>> On 2024/8/23 9:47, Hongtao Liu wrote:
>>> On Thu, Aug 22, 2024 at 4:06 PM HAO CHEN GUI  wrote:

 Hi Hongtao,

 On 2024/8/21 11:21, Hongtao Liu wrote:
> r15-3058-gbb42c551905024 supports a const0 operand for movv16qi; please
> rebase your patch and see if the regressions are still there.

 There are still regressions. The patch enables V16QI const0 store, but
 it also enables V8QI const0 store. Vector mode is preferred over
 scalar mode, so V8QI is used for an 8-byte memory clear instead of
 DI. That is sub-optimal.
>>> Could we check if mode_size is greater than HOST_BITS_PER_WIDE_INT?
>> Not sure if all targets prefer it. Richard & Jeff, what's your opinion?
>>
>> IMHO, could we disable it in the predicate or convert it to a DI mode
>> store if V8QI const0 store is sub-optimal on i386?
>>
>>

 Another issue is that it sometimes takes a lot of subregs to generate
 an all-zero V16QI register. As PR92080 has been fixed, it can't reuse
 an existing all-zero V16QI register.
> Backend rtx_cost needs to be adjusted to prevent const0 propagation.
> The current rtx_cost for const0 for i386 is 0, which will enable
> propagation of const0.
> 
>/* If MODE2 is appropriate for an MMX register, then tie
> @@ -21588,10 +21590,12 @@ ix86_rtx_costs (rtx x, machine_mode mode,
> int outer_code_i, int opno,
> case 0:
>   break;
> case 1:  /* 0: xor eliminates false dependency */
> - *total = 0;
> + /* Add extra cost 1 to prevent propagation of CONST_VECTOR
> +for SET, which will enable more CSE optimization.  */
> + *total = 0 + (outer_code == SET);
>   return true;
> default: /* -1: cmp contains false dependency */
> - *total = 1;
> + *total = 1 + (outer_code == SET);
>   return true;
> }
> 
> the upper hunk should help for that.
Sorry, I didn't get your point. Which problem will it fix? I tested the
code above and nothing changed. Which kind of const0 propagation do you
want to prevent?

Thanks
Gui Haochen


 (insn 16 15 17 (set (reg:V4SI 118)
 (const_vector:V4SI [
 (const_int 0 [0]) repeated x4
 ])) "auto-init-7.c":25:12 -1
  (nil))

 (insn 17 16 18 (set (reg:V8HI 117)
 (subreg:V8HI (reg:V4SI 118) 0)) "auto-init-7.c":25:12 -1
  (nil))

 (insn 18 17 19 (set (reg:V16QI 116)
 (subreg:V16QI (reg:V8HI 117) 0)) "auto-init-7.c":25:12 -1
  (nil))

 (insn 19 18 0 (set (mem/c:V16QI (plus:DI (reg:DI 114)
 (const_int 12 [0xc])) [0 MEM  [(void 
 *)&temp3]+12 S16 A32])
 (reg:V16QI 116)) "auto-init-7.c":25:12 -1
  (nil))
>>> I think those subregs can be simplified by later rtl passes?
>>
>> Here is the final dump. There are two all-zero 16-byte vector
>> registers. It can't figure out V4SI could be a subreg of V16QI.
>>
>> (insn 14 56 15 2 (set (reg:V16QI 20 xmm0 [115])
>> (const_vector:V16QI [
>> (const_int 0 [0]) repeated x16
>> ])) "auto-init-7.c":25:12 2154 {movv16qi_internal}
>>  (nil))
>> (insn 15 14 16 2 (set (mem/c:V16QI (reg:DI 0 ax [114]) [0 MEM  
>> [(void *)&temp3]+0 S16 A128])
>> (reg:V16QI 20 xmm0 [115])) "auto-init-7.c":25:12 2154 
>> {movv16qi_internal}
>>  (nil))
>> (insn 16 15 19 2 (set (reg:V4SI 20 xmm0 [118])
>> (const_vector:V4SI [
>> (const_int 0 [0]) repeated x4
>> ])) "auto-init-7.c":25:12 2160 {movv4si_internal}
>>  (nil))
>> (insn 19 16 57 2 (set (mem/c:V16QI (plus:DI (reg:DI 0 ax [114])
>> (const_int 12 [0xc])) [0 MEM  [(void 
>> *)&temp3]+12 S16 A32])
>> (reg:V16QI 20 xmm0 [116])) "auto-init-7.c":25:12 2154 
>> {movv16qi_internal}
>>
>> Thanks
>> Gui Haochen
>>

 Thanks
 Gui Haochen
>>>
>>>
>>>
> 
> 
> 


Re: [PATCH] Use add_name_and_src_coords_attributes in modified_type_die

2024-08-23 Thread Richard Biener
On Thu, Aug 22, 2024 at 5:04 PM Tom Tromey  wrote:
>
> > "Richard" == Richard Biener  writes:
>
> >> While working on a patch to the Ada compiler, I found a spot in
> >> dwarf2out.cc that calls add_name_attribute where a call to
> >> add_name_and_src_coords_attributes would be better, because the latter
> >> respects DECL_NAMELESS.
>
> Richard> If the point is DECL_NAMELESS shouldn't we omit the typedef DIE
> Richard> instead?
>
> At least for Ada this doesn't really seem to change typedefs.  Instead
> it touches types like this:
>
>  <1><26a>: Abbrev Number: 25 (DW_TAG_pointer_type)
> <26b>   DW_AT_byte_size   : 8
> <26c>   DW_AT_name: (indirect string, offset: 0x8f): 
> foo__TTdmSCFD__B12b__P11b
> <270>   DW_AT_type: <0x1e7>
> <274>   DW_AT_artificial  : 1
>
> Richard> A less controversial patch might be to use
> Richard> dwarf2_name (name, 0) instead of IDENTIFIER_POINTER
>
> I can try that if you still think it's desirable; I just went with the
> minimal change that made sense.

Well, in addition to honoring DECL_NAMELESS you'd get SRC coord
attributes set - that ignores the possibility that not doing that was
on purpose here?  Honoring DECL_NAMELESS is obvious enough to me that
I'd approve such a change - for the rest I'd have to dig more into the
code to understand why it's using add_name_attribute in the first
place ...

Richard.

>
> Tom


Re: [PATCH] libcpp: bump padding size in _cpp_convert_input [PR116458]

2024-08-23 Thread Richard Biener
On Thu, Aug 22, 2024 at 8:25 PM Alexander Monakov  wrote:
>
> The recently introduced search_line_fast_ssse3 raised padding
> requirement from 16 to 64, which was adjusted in read_file_guts,
> but the corresponding ' + 16' in _cpp_convert_input was overlooked.

OK

> libcpp/ChangeLog:
>
> PR preprocessor/116458
> * charset.cc (_cpp_convert_input): Bump padding to 64 if
> HAVE_SSSE3.
> ---
>  libcpp/charset.cc | 21 ++++++++++++---------
>  1 file changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/libcpp/charset.cc b/libcpp/charset.cc
> index d58319a500..79072877cb 100644
> --- a/libcpp/charset.cc
> +++ b/libcpp/charset.cc
> @@ -3093,6 +3093,7 @@ _cpp_convert_input (cpp_reader *pfile, const char 
> *input_charset,
>struct cset_converter input_cset;
>struct _cpp_strbuf to;
>unsigned char *buffer;
> +  size_t pad;
>
>input_cset = init_iconv_desc (pfile, SOURCE_CHARSET, input_charset);
>if (input_cset.func == convert_no_conversion)
> @@ -3129,16 +3130,18 @@ _cpp_convert_input (cpp_reader *pfile, const char 
> *input_charset,
> }
>  }
>
> +#ifdef HAVE_SSSE3
> +  pad = 64;
> +#else
> +  pad = 16;
> +#endif
>/* Resize buffer if we allocated substantially too much, or if we
> - haven't enough space for the \n-terminator or following
> - 15 bytes of padding (used to quiet warnings from valgrind or
> - Address Sanitizer, when the optimized lexer accesses aligned
> - 16-byte memory chunks, including the bytes after the malloced,
> - area, and stops lexing on '\n').  */
> -  if (to.len + 4096 < to.asize || to.len + 16 > to.asize)
> -to.text = XRESIZEVEC (uchar, to.text, to.len + 16);
> -
> -  memset (to.text + to.len, '\0', 16);
> + don't have enough space for the following padding, which allows
> + search_line_fast to use (possibly misaligned) vector loads.  */
> +  if (to.len + 4096 < to.asize || to.len + pad > to.asize)
> +to.text = XRESIZEVEC (uchar, to.text, to.len + pad);
> +
> +  memset (to.text + to.len, '\0', pad);
>
>/* If the file is using old-school Mac line endings (\r only),
>   terminate with another \r, not an \n, so that we do not mistake
> --
> 2.46.0
>
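
The padding rule in the patch above can be illustrated in isolation.  What
follows is only a minimal sketch, not libcpp code: pad_tail and its
parameters are hypothetical names.  It assumes a scanner that may load up
to PAD bytes starting anywhere inside the text, so the allocation must
extend PAD zero bytes past the last text byte.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of libcpp: make sure the LEN bytes of
   text in *BUF are followed by PAD zero bytes, growing the allocation
   when it is too small.  *ASIZE is the current allocation size and is
   updated on growth.  Returns 0 on success, or -1 if realloc fails, in
   which case the buffer is left untouched.  */
int
pad_tail (unsigned char **buf, size_t len, size_t *asize, size_t pad)
{
  if (len + pad > *asize)
    {
      unsigned char *p = realloc (*buf, len + pad);
      if (!p)
        return -1;
      *buf = p;
      *asize = len + pad;
    }
  /* Zero the tail so a wide load past the text reads defined bytes.  */
  memset (*buf + len, 0, pad);
  return 0;
}
```

With pad set to 64 instead of 16, the same call covers a scanner that uses
64-byte loads, which is the adjustment the patch makes when HAVE_SSSE3 is
defined.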


Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call

2024-08-23 Thread Richard Biener
On Thu, Aug 22, 2024 at 8:36 PM Jakub Jelinek  wrote:
>
> On Tue, Aug 20, 2024 at 01:52:35PM +0200, Richard Biener wrote:
> > On Sat, Aug 17, 2024 at 11:18 PM Jakub Jelinek  wrote:
> > >
> > > On Sat, Aug 17, 2024 at 05:03:14AM +, Li, Pan2 wrote:
> > > > Please feel free to let me know if there is anything I can do to fix 
> > > > this issue. Thanks a lot.
> > >
> > > There is no bug.  The operands of .{ADD,SUB,MUL}_OVERFLOW don't have to 
> > > have the same type, as described in the 
> > > __builtin_{add,sub,mul}_overflow{,_p} documentation, each argument can 
> > > have different type and result yet another one, the behavior is then (as 
> > > if) to perform the operation in infinite precision and if that result 
> > > fits into the result type, there is no overflow, otherwise there is.
> > > So, there is no need to promote anything.
> >
> > Hmm, it's a bit awkward to have this state in the IL.
>
> Why?  These aren't the only internal functions which have different types
> of arguments, from the various widening ifns, conditional ifns,
> scatter/gather, ...  Even the WIDEN_*EXPR trees do have type differences
> among arguments.
> And it matches what the user builtin does.
>
> Furthermore, at least without _BitInt (but even with _BitInt at the maximum
> precision too) this might not be even possible.
> E.g. if there is __builtin_add_overflow with unsigned __int128 and __int128
> arguments and there are no wider types there is simply no type to use for both
> arguments, it would need to be a signed type with at least 129 bits...
>
> > I see that
> > expand_arith_overflow eventually applies
> > promotion, namely to the type of the LHS.
>
> The LHS doesn't have to be wider than the operand types, so it can't promote
> always.  Yes, in some cases it applies promotion if it is desirable for
> codegen purposes.  But without the promotions explicitly in the IL it
> doesn't need to rely on VRP to figure out how to expand it exactly.
>
> > Exposing this earlier could
> > enable optimization even
>
> Which optimizations?

I was thinking of merging conversions with that implied promotion.

>  We already try to fold the .{ADD,SUB,MUL}_OVERFLOW
> builtins to constants or non-overflowing arithmetics etc. as soon as we
> can e.g. using ranges prove the operation will never overflow or will always
> overflow.  Doing unnecessary promotion (see above that it might not be
> always possible at all) would just make the IL larger and risk we during
> expansion actually perform the promotions even when we don't have to.
> We on the other side already have match.pd rules to undo such promotions
> in the operands.  See
> /* Demote operands of IFN_{ADD,SUB,MUL}_OVERFLOW.  */
> And the result (well, TREE_TYPE of the lhs type) can be yet another type,
> not related to either of those in any way.

OK, fair enough.  I think this also shows again the lack of documentation
of internal function signatures (hits me all the time with the more complex
ones like MASK_LEN_GATHER_LOAD where I always wonder which
argument is what) as well as IL type checking (which can also serve as
documentation about argument constraints).

IMO comments in internal-fn.def would suffice for the former (like effectively
tree.h/def provide authority for tree codes);  for IL verification a function
in internal-fn.cc would be suitable, we can call that from call verification.

For the saturating matching this means that strictly matching types
(I think that's what was proposed) is probably the least complicated
variant to reason about.

Thanks,
Richard.

> > apart from IL hygiene which is violated with different typed operands
> > of an ADD, SUB or MUL.
>
> Jakub
>
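
The mixed-type behaviour Jakub describes for the user-level builtin can be
tried directly.  This is a small sketch, assuming GCC or a compiler that
provides __builtin_add_overflow; add_mixed is a hypothetical name and the
three types are chosen to differ on purpose.

```c
#include <stdbool.h>
#include <stdint.h>

/* Add an unsigned 32-bit value and a signed 64-bit value and store the
   sum in a signed 8-bit result.  The addition is done as if in infinite
   precision; the return value is true when that exact sum does not fit
   in int8_t, in which case *RES receives the wrapped value.  */
bool
add_mixed (uint32_t a, int64_t b, int8_t *res)
{
  return __builtin_add_overflow (a, b, res);
}
```

For example, add_mixed (100, -50, &r) reports no overflow and stores 50,
while add_mixed (100, 100, &r) reports overflow because 200 does not fit
in int8_t.  Neither operand has to be converted to a common type first.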


Re: [PATCH] Don't remove /usr/lib and /lib from when passing to the linker [PR97304/104707]

2024-08-23 Thread Richard Biener
On Fri, Aug 23, 2024 at 9:18 AM Gerald Pfeifer  wrote:
>
> On Thu, 22 Aug 2024, Andrew Pinski wrote:
> > With newer ld, the default search library path does not include /usr/lib
> > nor /lib but the driver decides to not pass -L down to the link for
> > these and then in some/most cases libc is not found.
> > This code dates from at least 1992 and it is done in a way which is not
> > safe and does not make sense. So let's remove it.
> >
> > Bootstrapped and tested on x86_64-linux-gnu (which defaults to being a
> > multilib).
>
> Also bootstrapped on x86_64-unknown-freebsd13.3 (this was originally
> reported against the earlier x86_64-unknown-freebsd12.1) on a system
> where I also ran into this in April.

OK if there are no objections until next week.

Thanks,
Richard.

> > gcc/ChangeLog:
> >
> >   PR driver/104707
> >   PR driver/97304
> >
> >   * gcc.cc (is_directory): Don't not include /usr/lib and /lib
> >   for library directory pathes. Remove library argument.
> >   (add_to_obstack): Update call to is_directory.
> >   (driver_handle_option): Likewise.
> >   (spec_path): Likewise.
>
> For the ChangeLog, maybe use "Don't remove /usr/lib and /lib from library
> directory paths" similar to the subject?
>
> (My brain originally "autocorrected" and contracted "Don't not"...)
>
>
> Thank you for tackling this longer standing issue which has been rearing
> its head again and again!
>
> Gerald


RE: [PATCH v3] Update LDPT_REGISTER_CLAIM_FILE_HOOK_V2 linker plugin hook

2024-08-23 Thread Prathamesh Kulkarni


> -Original Message-
> From: Richard Biener 
> Sent: Thursday, August 22, 2024 2:16 PM
> To: H.J. Lu 
> Cc: gcc-patches@gcc.gnu.org; josmy...@redhat.com
> Subject: Re: [PATCH v3] Update LDPT_REGISTER_CLAIM_FILE_HOOK_V2 linker
> plugin hook
> 
> 
> On Wed, Aug 21, 2024 at 4:25 PM H.J. Lu  wrote:
> >
> > This hook allows the BFD linker plugin to distinguish calls to
> > claim_file_handler that know the object is being used by the linker
> > (from ldmain.c:add_archive_element), from calls that don't know it's
> > being used by the linker (from elf_link_is_defined_archive_symbol);
> in
> > the latter case, the plugin should avoid including the unused LTO
> > archive members in link output.  To get the proper support for
> > archives with LTO common symbols, the linker fix
> 
> OK.
Hi,
After this commit:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a98dd536b1017c2b814a3465206c6c01b2890998
I am no longer able to see mkoffload (and accel compiler) being invoked for 
nvptx (-save-temps also doesn't show accel dumps).
I have attached -v output before and after the commit for x86_64->nvptx 
offloading for the following simple test (host doesn't really matter, can also 
reproduce with aarch64 host):

int main()
{
  int x = 1;
  #pragma omp target map(x)
x = 5;
  return x;
}

Thanks,
Prathamesh
> 
> Thanks,
> Richard.
> 
> > commit a6f8fe0a9e9cbe871652e46ba7c22d5e9fb86208
> > Author: H.J. Lu 
> > Date:   Wed Aug 14 20:50:02 2024 -0700
> >
> > lto: Don't include unused LTO archive members in output
> >
> > is required.
> >
> > PR lto/116361
> > * lto-plugin.c (claim_file_handler_v2): Rename claimed to
> > can_be_claimed.  Include the LTO object only if it is known
> to
> > be included in link output.
> >
> > Signed-off-by: H.J. Lu 
> > ---
> >  lto-plugin/lto-plugin.c | 53
> > -
> >  1 file changed, 31 insertions(+), 22 deletions(-)
> >
> > diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c index
> > 152648338b9..61b0de62f52 100644
> > --- a/lto-plugin/lto-plugin.c
> > +++ b/lto-plugin/lto-plugin.c
> > @@ -1191,16 +1191,19 @@ process_offload_section (void *data, const
> char *name, off_t offset, off_t len)
> >return 1;
> >  }
> >
> > -/* Callback used by a linker to check if the plugin will claim
> FILE. Writes
> > -   the result in CLAIMED.  If KNOWN_USED, the object is known by
> the linker
> > -   to be used, or an older API version is in use that does not
> provide that
> > -   information; otherwise, the linker is only determining whether
> this is
> > -   a plugin object and it should not be registered as having
> offload data if
> > -   not claimed by the plugin.  */
> > +/* Callback used by a linker to check if the plugin can claim FILE.
> > +   Writes the result in CAN_BE_CLAIMED.  If KNOWN_USED != 0, the
> object
> > +   is known by the linker to be included in link output, or an
> older API
> > +   version is in use that does not provide that information.
> Otherwise,
> > +   the linker is only determining whether this is a plugin object
> and
> > +   only the symbol table is needed by the linker.  In this case,
> the
> > +   object should not be included in link output and this function
> will
> > +   be called by the linker again with KNOWN_USED != 0 after the
> linker
> > +   decides the object should be included in link output. */
> >
> >  static enum ld_plugin_status
> > -claim_file_handler_v2 (const struct ld_plugin_input_file *file, int
> *claimed,
> > -  int known_used)
> > +claim_file_handler_v2 (const struct ld_plugin_input_file *file,
> > +  int *can_be_claimed, int known_used)
> >  {
> >enum ld_plugin_status status;
> >struct plugin_objfile obj;
> > @@ -1229,7 +1232,7 @@ claim_file_handler_v2 (const struct
> ld_plugin_input_file *file, int *claimed,
> >  }
> >lto_file.handle = file->handle;
> >
> > -  *claimed = 0;
> > +  *can_be_claimed = 0;
> >obj.file = file;
> >obj.found = 0;
> >obj.offload = false;
> > @@ -1286,15 +1289,19 @@ claim_file_handler_v2 (const struct
> ld_plugin_input_file *file, int *claimed,
> >   lto_file.symtab.syms);
> >check (status == LDPS_OK, LDPL_FATAL, "could not add
> symbols");
> >
> > -  LOCK_SECTION;
> > -  num_claimed_files++;
> > -  claimed_files =
> > -   xrealloc (claimed_files,
> > - num_claimed_files * sizeof (struct
> plugin_file_info));
> > -  claimed_files[num_claimed_files - 1] = lto_file;
> > -  UNLOCK_SECTION;
> > +  /* Include it only if it is known to be used for link output.
> */
> > +  if (known_used)
> > +   {
> > + LOCK_SECTION;
> > + num_claimed_files++;
> > + claimed_files =
> > +   xrealloc (claimed_files,
> > + num_claimed_files * sizeof (struct
> plugin_file_info));
> > +   

Re: [RFC/RFA][PATCH v4 06/12] aarch64: Implement new expander for efficient CRC computation

2024-08-23 Thread Richard Biener
On Fri, Aug 23, 2024 at 9:55 AM Mariam Arutunian
 wrote:
>
>
> On Wed, Aug 21, 2024 at 5:56 PM Richard Sandiford  
> wrote:
>>
>> Mariam Arutunian  writes:
>> > This patch introduces two new expanders for the aarch64 backend,
>> > dedicated to generate optimized code for CRC computations.
>> > The new expanders are designed to leverage specific hardware capabilities
>> > to achieve faster CRC calculations,
>> > particularly using the crc32, crc32c and pmull instructions when supported
>> > by the target architecture.
>> >
>> > Expander 1: Bit-Forward CRC (crc4)
>> > For targets that support the pmull instruction (TARGET_AES),
>> > the expander will generate code that uses the pmull (crypto_pmulldi)
>> > instruction for CRC computation.
>> >
>> > Expander 2: Bit-Reversed CRC (crc_rev4)
>> > The expander first checks if the target supports the CRC32* instruction set
>> > (TARGET_CRC32)
>> > and the polynomial in use is 0x1EDC6F41 (iSCSI) or 0x04C11DB7 (HDLC). If
>> > the conditions are met,
>> > it emits calls to the corresponding crc32* instruction (depending on the
>> > data size and the polynomial).
>> > If the target does not support crc32* but supports pmull, it then uses the
>> > pmull (crypto_pmulldi) instruction for bit-reversed CRC computation.
>> > Otherwise table-based CRC is generated.
>> >
>> >   gcc/config/aarch64/
>> >
>> > * aarch64-protos.h (aarch64_expand_crc_using_pmull): New extern
>> > function declaration.
>> > (aarch64_expand_reversed_crc_using_pmull):  Likewise.
>> > * aarch64.cc (aarch64_expand_crc_using_pmull): New function.
>> > (aarch64_expand_reversed_crc_using_pmull):  Likewise.
>> > * aarch64.md (crc_rev4): New expander for
>> > reversed CRC.
>> > (crc4): New expander for bit-forward CRC.
>> > * iterators.md (crc_data_type): New mode attribute.
>> >
>> >   gcc/testsuite/gcc.target/aarch64/
>> >
>> > * crc-1-pmul.c: New test.
>> > * crc-10-pmul.c: Likewise.
>> > * crc-12-pmul.c: Likewise.
>> > * crc-13-pmul.c: Likewise.
>> > * crc-14-pmul.c: Likewise.
>> > * crc-17-pmul.c: Likewise.
>> > * crc-18-pmul.c: Likewise.
>> > * crc-21-pmul.c: Likewise.
>> > * crc-22-pmul.c: Likewise.
>> > * crc-23-pmul.c: Likewise.
>> > * crc-4-pmul.c: Likewise.
>> > * crc-5-pmul.c: Likewise.
>> > * crc-6-pmul.c: Likewise.
>> > * crc-7-pmul.c: Likewise.
>> > * crc-8-pmul.c: Likewise.
>> > * crc-9-pmul.c: Likewise.
>> > * crc-CCIT-data16-pmul.c: Likewise.
>> > * crc-CCIT-data8-pmul.c: Likewise.
>> > * crc-coremark-16bitdata-pmul.c: Likewise.
>> > * crc-crc32-data16.c: Likewise.
>> > * crc-crc32-data32.c: Likewise.
>> > * crc-crc32-data8.c: Likewise.
>> > * crc-crc32c-data16.c: Likewise.
>> > * crc-crc32c-data32.c: Likewise.
>> > * crc-crc32c-data8.c: Likewise.
>>
>> OK for trunk once the prerequisites are approved.  Thanks for all your
>> work on this.
>>
>> Which other parts of the series still need review?  I can try to help
>> out with the target-independent bits.  (That said, I'm not sure I'm the
>> best person to review the tree recognition pass, but I can have a go.)
>>
>
> Thank you very much for everything.
> Right now, I'm not sure which parts would be best to be reviewed since 
> Richard Biener is currently reviewing them.
> Maybe I can ask for your help later?

I'm done with the parts I reserved for reviewing.  Btw, it seems the vN
series are not complete, that is, you didn't re-post the entire series
but only the changed parts?  I was somewhat confused by that.

Richard.

> Thanks,
> Mariam
>
>> Richard
>>
>> >
>> > Signed-off-by: Mariam Arutunian 
>> > Co-authored-by: Richard Sandiford 
>> > diff --git a/gcc/config/aarch64/aarch64-protos.h 
>> > b/gcc/config/aarch64/aarch64-protos.h
>> > index 42639e9efcf..469111e3b17 100644
>> > --- a/gcc/config/aarch64/aarch64-protos.h
>> > +++ b/gcc/config/aarch64/aarch64-protos.h
>> > @@ -1112,5 +1112,8 @@ extern void aarch64_adjust_reg_alloc_order ();
>> >
>> >  bool aarch64_optimize_mode_switching (aarch64_mode_entity);
>> >  void aarch64_restore_za (rtx);
>> > +void aarch64_expand_crc_using_pmull (scalar_mode, scalar_mode, rtx *);
>> > +void aarch64_expand_reversed_crc_using_pmull (scalar_mode, scalar_mode, 
>> > rtx *);
>> > +
>> >
>> >  #endif /* GCC_AARCH64_PROTOS_H */
>> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> > index 7f0cc47d0f0..0cb8f3e8090 100644
>> > --- a/gcc/config/aarch64/aarch64.cc
>> > +++ b/gcc/config/aarch64/aarch64.cc
>> > @@ -30314,6 +30314,137 @@ aarch64_retrieve_sysreg (const char *regname, 
>> > bool write_p, bool is128op)
>> >return sysreg->encoding;
>> >  }
>> >
>> > +/* Generate assembly to calculate CRC
>> > +   using carry-less multiplication instruction.
>> > +   OPERANDS[1] is input CRC,
>> > +   OPERANDS[2] is data (message),
>> > +   OPERANDS[3] is the polynomial without the leading 1.  */
>> > +
>> > +void
>> > +aarch64_expand_crc_using_pmull (scalar_mode crc_

Re: [PATCH v3] Update LDPT_REGISTER_CLAIM_FILE_HOOK_V2 linker plugin hook

2024-08-23 Thread H.J. Lu
On Fri, Aug 23, 2024, 4:02 AM Prathamesh Kulkarni 
wrote:

>
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: Thursday, August 22, 2024 2:16 PM
> > To: H.J. Lu 
> > Cc: gcc-patches@gcc.gnu.org; josmy...@redhat.com
> > Subject: Re: [PATCH v3] Update LDPT_REGISTER_CLAIM_FILE_HOOK_V2 linker
> > plugin hook
> >
> >
> > On Wed, Aug 21, 2024 at 4:25 PM H.J. Lu  wrote:
> > >
> > > This hook allows the BFD linker plugin to distinguish calls to
> > > claim_file_handler that know the object is being used by the linker
> > > (from ldmain.c:add_archive_element), from calls that don't know it's
> > > being used by the linker (from elf_link_is_defined_archive_symbol);
> > in
> > > the latter case, the plugin should avoid including the unused LTO
> > > archive members in link output.  To get the proper support for
> > > archives with LTO common symbols, the linker fix
> >
> > OK.
> Hi,
> After this commit:
>
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a98dd536b1017c2b814a3465206c6c01b2890998
> I am no longer able to see mkoffload (and accel compiler) being invoked
> for nvptx (-save-temps also doesn't show accel dumps).
> I have attached -v output before and after the commit for x86_64->nvptx
> offloading for the following simple test (host doesn't really matter, can
> also reproduce with aarch64 host):
>

Please open a bug report with exact steps
to reproduce the issue.


> int main()
> {
>   int x = 1;
>   #pragma omp target map(x)
> x = 5;
>   return x;
> }
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Richard.
> >
> > > commit a6f8fe0a9e9cbe871652e46ba7c22d5e9fb86208
> > > Author: H.J. Lu 
> > > Date:   Wed Aug 14 20:50:02 2024 -0700
> > >
> > > lto: Don't include unused LTO archive members in output
> > >
> > > is required.
> > >
> > > PR lto/116361
> > > * lto-plugin.c (claim_file_handler_v2): Rename claimed to
> > > can_be_claimed.  Include the LTO object only if it is known
> > to
> > > be included in link output.
> > >
> > > Signed-off-by: H.J. Lu 
> > > ---
> > >  lto-plugin/lto-plugin.c | 53
> > > -
> > >  1 file changed, 31 insertions(+), 22 deletions(-)
> > >
> > > diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c index
> > > 152648338b9..61b0de62f52 100644
> > > --- a/lto-plugin/lto-plugin.c
> > > +++ b/lto-plugin/lto-plugin.c
> > > @@ -1191,16 +1191,19 @@ process_offload_section (void *data, const
> > char *name, off_t offset, off_t len)
> > >return 1;
> > >  }
> > >
> > > -/* Callback used by a linker to check if the plugin will claim
> > FILE. Writes
> > > -   the result in CLAIMED.  If KNOWN_USED, the object is known by
> > the linker
> > > -   to be used, or an older API version is in use that does not
> > provide that
> > > -   information; otherwise, the linker is only determining whether
> > this is
> > > -   a plugin object and it should not be registered as having
> > offload data if
> > > -   not claimed by the plugin.  */
> > > +/* Callback used by a linker to check if the plugin can claim FILE.
> > > +   Writes the result in CAN_BE_CLAIMED.  If KNOWN_USED != 0, the
> > object
> > > +   is known by the linker to be included in link output, or an
> > older API
> > > +   version is in use that does not provide that information.
> > Otherwise,
> > > +   the linker is only determining whether this is a plugin object
> > and
> > > +   only the symbol table is needed by the linker.  In this case,
> > the
> > > +   object should not be included in link output and this function
> > will
> > > +   be called by the linker again with KNOWN_USED != 0 after the
> > linker
> > > +   decides the object should be included in link output. */
> > >
> > >  static enum ld_plugin_status
> > > -claim_file_handler_v2 (const struct ld_plugin_input_file *file, int
> > *claimed,
> > > -  int known_used)
> > > +claim_file_handler_v2 (const struct ld_plugin_input_file *file,
> > > +  int *can_be_claimed, int known_used)
> > >  {
> > >enum ld_plugin_status status;
> > >struct plugin_objfile obj;
> > > @@ -1229,7 +1232,7 @@ claim_file_handler_v2 (const struct
> > ld_plugin_input_file *file, int *claimed,
> > >  }
> > >lto_file.handle = file->handle;
> > >
> > > -  *claimed = 0;
> > > +  *can_be_claimed = 0;
> > >obj.file = file;
> > >obj.found = 0;
> > >obj.offload = false;
> > > @@ -1286,15 +1289,19 @@ claim_file_handler_v2 (const struct
> > ld_plugin_input_file *file, int *claimed,
> > >   lto_file.symtab.syms);
> > >check (status == LDPS_OK, LDPL_FATAL, "could not add
> > symbols");
> > >
> > > -  LOCK_SECTION;
> > > -  num_claimed_files++;
> > > -  claimed_files =
> > > -   xrealloc (claimed_files,
> > > - num_claimed_files * sizeof (struct
> > plugin_file_info));
> > > -  claimed_files[num_claimed_fil

Re: [PATCH] late-combine: Preserve INSN_CODE when modifying notes [PR116343]

2024-08-23 Thread Georg-Johann Lay

On 15.08.24 at 10:45, Richard Sandiford wrote:

When it removes a definition, late-combine tries to update all
uses in notes.  It does this using the same insn_propagation class
that it uses for patterns.

However, insn_propagation uses validate_change, which in turn
resets the INSN_CODE.  This is inefficient in the best case,
since it forces the pattern to be rerecognised even though
changing a note can't affect the INSN_CODE.  But in the PR
it's a correctness problem: resetting INSN_CODE means we lose
the NOOP_INSN_MOVE_CODE, which in turn means that rtl-ssa doesn't
queue it for deletion.

This patch adds a routine specifically for propagating into notes.
A belt-and-braces fix would be to rerecognise noop moves in
function_info::change_insns, but I can't think of a good reason
why that would be necessary, and it could paper over latent bugs.

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Richard


gcc/
PR testsuite/116343
* recog.h (insn_propagation::apply_to_note): Declare.
* recog.cc (insn_propagation::apply_to_note): New function.
* late-combine.cc (insn_combination::substitute_note): Use
apply_to_note instead of apply_to_rvalue.
* rtl-ssa/changes.cc (rtl_ssa::changes_are_worthwhile): Improve
dumping of costs for noop moves.

gcc/testsuite/
PR testsuite/116343
* gcc.dg/torture/pr116343.c: New test.
---
  gcc/late-combine.cc |  2 +-
  gcc/recog.cc| 13 +
  gcc/recog.h |  1 +
  gcc/rtl-ssa/changes.cc  |  5 -
  gcc/testsuite/gcc.dg/torture/pr116343.c | 18 ++
  5 files changed, 37 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/torture/pr116343.c

diff --git a/gcc/late-combine.cc b/gcc/late-combine.cc
index 2b62e2956ed..1d81b386c3d 100644
--- a/gcc/late-combine.cc
+++ b/gcc/late-combine.cc
@@ -338,7 +338,7 @@ insn_combination::substitute_note (insn_info *use_insn, rtx 
note,
|| REG_NOTE_KIND (note) == REG_EQUIV)
  {
insn_propagation prop (use_insn->rtl (), m_dest, m_src);
-  return (prop.apply_to_rvalue (&XEXP (note, 0))
+  return (prop.apply_to_note (&XEXP (note, 0))
  && (can_propagate || prop.num_replacements == 0));
  }
return true;
diff --git a/gcc/recog.cc b/gcc/recog.cc
index 23e4820180f..615aaabc551 100644
--- a/gcc/recog.cc
+++ b/gcc/recog.cc
@@ -1469,6 +1469,19 @@ insn_propagation::apply_to_rvalue (rtx *loc)
return res;
  }
  
+/* Like apply_to_rvalue, but specifically for the case where *LOC is in
+   a note.  This never changes the INSN_CODE.  */
+
+bool
+insn_propagation::apply_to_note (rtx *loc)
+{
+  auto old_code = INSN_CODE (insn);
+  bool res = apply_to_rvalue (loc);
+  if (INSN_CODE (insn) != old_code)
+INSN_CODE (insn) = old_code;
+  return res;
+}
+
  /* Check whether INSN matches a specific alternative of an .md pattern.  */
  
  bool

diff --git a/gcc/recog.h b/gcc/recog.h
index 87a5803dec0..1dccce78ba4 100644
--- a/gcc/recog.h
+++ b/gcc/recog.h
@@ -121,6 +121,7 @@ public:
insn_propagation (rtx_insn *, rtx, rtx, bool = true);
bool apply_to_pattern (rtx *);
bool apply_to_rvalue (rtx *);
+  bool apply_to_note (rtx *);
  
    /* Return true if we should accept a substitution into the address of
       memory expression MEM.  Undoing changes OLD_NUM_CHANGES and up restores
diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
index a30f000191e..0476296607b 100644
--- a/gcc/rtl-ssa/changes.cc
+++ b/gcc/rtl-ssa/changes.cc
@@ -228,7 +228,10 @@ rtl_ssa::changes_are_worthwhile (array_slice changes,
for (const insn_change *change : changes)
if (!change->is_deletion ())
  {
-   fprintf (dump_file, " %c %d", sep, change->new_cost);
+   if (INSN_CODE (change->rtl ()) == NOOP_MOVE_INSN_CODE)
+ fprintf (dump_file, " %c nop", sep);
+   else
+ fprintf (dump_file, " %c %d", sep, change->new_cost);
sep = '+';
  }
if (weighted_new_cost != 0)
diff --git a/gcc/testsuite/gcc.dg/torture/pr116343.c 
b/gcc/testsuite/gcc.dg/torture/pr116343.c
new file mode 100644
index 000..ad13f0fc21c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116343.c
@@ -0,0 +1,18 @@
+// { dg-additional-options "-fschedule-insns -fno-thread-jumps -fno-dce" }


Hi, this fails on machines that don't support scheduling:

cc1: warning: instruction scheduling not supported on this target machine

FAIL: gcc.dg/torture/pr116343.c   -O0  (test for excess errors)
Excess errors:
cc1: warning: instruction scheduling not supported on this target machine

Johann


+
+int a, b, c;
+volatile int d;
+int e(int f, int g) { return g > 1 ? 1 : f >> g; }
+int main() {
+  int *i = &a;
+  long j[1];
+  if (a)
+while (1) {
+  a ^= 1;
+  if (*i)
+while (1)
+  ;
+  b = c && e((d, 1) >= 1, j[0]);
+}
+  re

Re: [pushed] rtl-ssa: Fix removal of order_nodes [PR115929]

2024-08-23 Thread Georg-Johann Lay




Am 16.07.24 um 16:34 schrieb Richard Sandiford:

order_nodes are used to implement ordered comparisons between
two insns with the same program point number.  remove_insn would
remove an order_node from its splay tree, but didn't remove it
from the insn.  This caused confusion if the insn was later
reinserted somewhere else that also needed an order_node.

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  Pushed as obvious.

Richard


gcc/
PR rtl-optimization/115929
* rtl-ssa/insns.cc (function_info::remove_insn): Remove an
order_node from the instruction as well as from the splay tree.

gcc/testsuite/
PR rtl-optimization/115929
* gcc.dg/torture/pr115929-1.c: New test.
---
  gcc/rtl-ssa/insns.cc  |  5 ++-
  gcc/testsuite/gcc.dg/torture/pr115929-1.c | 45 +++
  2 files changed, 49 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.dg/torture/pr115929-1.c

diff --git a/gcc/rtl-ssa/insns.cc b/gcc/rtl-ssa/insns.cc
index 7e26bfd978f..bc30734df89 100644
--- a/gcc/rtl-ssa/insns.cc
+++ b/gcc/rtl-ssa/insns.cc
@@ -393,7 +393,10 @@ void
  function_info::remove_insn (insn_info *insn)
  {
if (insn_info::order_node *order = insn->get_order_node ())
-insn_info::order_splay_tree::remove_node (order);
+{
+  insn_info::order_splay_tree::remove_node (order);
+  insn->remove_note (order);
+}
  
if (auto *note = insn->find_note ())
  {
diff --git a/gcc/testsuite/gcc.dg/torture/pr115929-1.c b/gcc/testsuite/gcc.dg/torture/pr115929-1.c
new file mode 100644
index 000..19b831ab99e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr115929-1.c
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-fno-gcse -fschedule-insns -fno-guess-branch-probability -fno-tree-fre -fno-tree-ch" } */


Hi, this fails on machines that don't support scheduling:

cc1: warning: instruction scheduling not supported on this target machine

FAIL: gcc.dg/torture/pr115929-2.c   -O0  (test for excess errors)
Excess errors:
cc1: warning: instruction scheduling not supported on this target machine

Better do

/* { dg-require-effective-target scheduling } */

Johann


+
+int printf(const char *, ...);
+int a[6], b, c;
+char d, l;
+struct {
+  char e;
+  int f;
+  int : 8;
+  long g;
+  long h;
+} i[1][9] = {0};
+unsigned j;
+void n(char p) { b = b >> 8 ^ a[b ^ p]; }
+int main() {
+  int k, o;
+  while (b) {
+k = 0;
+for (; k < 9; k++) {
+  b = b ^ a[l];
+  n(j);
+  if (o)
+printf(&d);
+  long m = i[c][k].f;
+  b = b >> 8 ^ a[l];
+  n(m >> 32);
+  n(m);
+  if (o)
+printf("%d", d);
+  b = b >> 8 ^ l;
+  n(2);
+  n(0);
+  if (o)
+printf(&d);
+  b = b ^ a[l];
+  n(i[c][k].g >> 2);
+  n(i[c][k].g);
+  if (o)
+printf(&d);
+  printf("%d", i[c][k].f);
+}
+  }
+  return 0;
+}
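
The bug described in this message (a node removed from the tree but still referenced by the insn) can be sketched outside of GCC.  The following is only an illustration under stated assumptions: Node, Insn and Function are hypothetical stand-ins, not the rtl-ssa classes, and std::set stands in for the order splay tree.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-ins; these are not the rtl-ssa classes.
struct Node { int key; };

struct Insn
{
  Node *order = nullptr;   // back-pointer from the insn to its ordering node
};

struct Function
{
  std::set<Node *> tree;   // stands in for the order splay tree

  // Give INSN an ordering node and register it in the tree.
  void attach (Insn &insn, int key)
  {
    assert (!insn.order);  // a stale back-pointer would trip here
    insn.order = new Node{key};
    tree.insert (insn.order);
  }

  // Remove INSN's ordering node from the tree and from the insn itself.
  void remove (Insn &insn)
  {
    if (Node *n = insn.order)
      {
        tree.erase (n);
        delete n;
        insn.order = nullptr;  // the step that was missing before the fix
      }
  }
};
```

With the back-pointer cleared in remove, an insn can be detached and later given a fresh node at a different position without any reference to the old one surviving.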


[RFC][PATCH] AArch64: Remove AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS

2024-08-23 Thread Jennifer Schmitz
This patch removes the AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS tunable and
use_new_vector_costs entry in aarch64-tuning-flags.def and makes the
AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS paths in the backend the
default.
To that end, the function aarch64_use_new_vector_costs_p and its uses were
removed. Additionally, guards were added to prevent null pointer dereferences
of fields in cpu_vector_cost.

The patch was bootstrapped and regtested on aarch64-linux-gnu.  There were
no problems bootstrapping, but several test files in aarch64-sve.exp
(gather_load_extend_X.c where X is 1 to 4, strided_load_2.c, and
strided_store_2.c) fail because small differences in codegen make some of
the scan-assembler-times tests fail.

Kyrill suggested adding a -fvect-cost-model=unlimited flag to these tests
and adding some logic to aarch64_vector_costs::add_stmt_cost to disable the
changes in vector instructions when
flag_vect_cost_model == VECT_COST_MODEL_UNLIMITED.  If you agree with that
suggestion, I propose preceding the current patch with one that implements
this logic and adds -fvect-cost-model=unlimited to the failing tests.
Please advise.

Signed-off-by: Jennifer Schmitz 

gcc/
* config/aarch64/aarch64-tuning-flags.def: Remove
use_new_vector_costs as tuning option.
* config/aarch64/aarch64.cc (aarch64_use_new_vector_costs_p):
Remove.
(aarch64_in_loop_reduction_latency): Add nullpointer dereference
guard.
(aarch64_detect_vector_stmt_subtype): Likewise.
(aarch64_vector_costs::add_stmt_cost): Remove use of
aarch64_use_new_vector_costs_p and add nullpointer dereference
guards.
(aarch64_vector_costs::finish_cost): Remove use of
aarch64_use_new_vector_costs_p.
* config/aarch64/tuning_models/cortexx925.h: Remove
AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS.
* config/aarch64/tuning_models/generic_armv8_a.h: Likewise.
* config/aarch64/tuning_models/generic_armv9_a.h: Likewise.
* config/aarch64/tuning_models/neoverse512tvb.h: Likewise.
* config/aarch64/tuning_models/neoversen2.h: Likewise.
* config/aarch64/tuning_models/neoversen3.h: Likewise.
* config/aarch64/tuning_models/neoversev1.h: Likewise.
* config/aarch64/tuning_models/neoversev2.h: Likewise.
* config/aarch64/tuning_models/neoversev3.h: Likewise.


0001-AArch64-Remove-AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COS.patch
Description: Binary data




[patch,avr] Overhaul avr-ifelse RTL optimization pass

2024-08-23 Thread Georg-Johann Lay

This patch overhauls the avr-ifelse mini-pass that optimizes
two cbranch insns to one comparison and two branches.

More optimization opportunities are realized, and the code
has been refactored.

No new regressions.  Ok for trunk?

There is currently no avr maintainer, so some global reviewer
might please have a look at this.

And one question I have:  avr_optimize_2ifelse() is rewiring
basic blocks using redirect_edge_and_branch().  Does this
require extra pass flags or actions?  Currently the RTL_PASS
data reads:

static const pass_data avr_pass_data_ifelse =
{
  RTL_PASS,  // type
  "",// name (will be patched)
  OPTGROUP_NONE, // optinfo_flags
  TV_DF_SCAN,// tv_id
  0, // properties_required
  0, // properties_provided
  0, // properties_destroyed
  0, // todo_flags_start
  TODO_df_finish | TODO_df_verify // todo_flags_finish
};


Johann

p.s. The additional notes on compare-elim / PR115830 can be found
here (pending review):

https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660743.html

--

AVR: Overhaul the avr-ifelse RTL optimization pass.

Mini-pass avr-ifelse realizes optimizations that replace two cbranch
insns with one comparison and two branches.  This patch adds the
following improvements:

- The right operand of the comparisons may also be REGs.
  Formerly only CONST_INT was handled.

- The code of the first comparison is no longer restricted
  to (effectively) EQ.

- When the second cbranch is located in the fallthrough path
  of the first cbranch, then difficult (expensive) comparisons
  can always be avoided.  This may require swapping the branch
  targets.  (When the second cbranch is located after the target
  label of the first one, then getting rid of difficult branches
  would require reordering blocks.)

- The code has been cleaned up:  avr_rest_of_handle_ifelse() now
  just scans the insn stream for optimization candidates.  The code
  that actually performs the transformation has been outsourced to
  the new function avr_optimize_2ifelse().

- The code to find a better representation for reg-const_int comparisons
  has been split into two parts:  First try to find codes such that the
  right-hand sides of the comparisons are the same (avr_2comparisons_rhs).
  When this succeeds then one comparison can serve two branches, and
  avr_redundant_compare() tries to get rid of difficult branches that
  may have been introduced by avr_2comparisons_rhs().  This is always
  possible when the second cbranch is located in the fallthrough path
  of the first one, or when the first code is EQ.
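
As a source-level illustration of the "same right-hand side" idea in the last item, the sketch below checks one integer identity that such a rewrite can rely on: for integers, x >= c + 1 and x > c are the same test as long as c + 1 is representable.  This is an assumption-laden sketch, not the code in avr.cc; the helper names, the 8-bit type and the constant 42 are all hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper, not the GCC implementation: checks that
// "x >= c + 1" and "x > c" agree for every 8-bit signed x.
// The arithmetic is done in int, so c + 1 cannot overflow here.
bool ge_succ_equals_gt (std::int8_t c)
{
  for (int x = INT8_MIN; x <= INT8_MAX; ++x)
    if ((x >= c + 1) != (x > c))
      return false;
  return true;
}

// Two branches with different right-hand sides (42 and 43).
// Returns 1 if the first branch is taken, 2 for the second, 0 for neither.
int classify_original (std::int8_t x)
{
  if (x == 42)
    return 1;
  if (x >= 43)
    return 2;
  return 0;
}

// The same decisions with both tests comparing against 42, so that a
// single comparison could in principle feed both branches.
int classify_shared_rhs (std::int8_t x)
{
  if (x == 42)
    return 1;
  if (x > 42)
    return 2;
  return 0;
}
```

Both classify functions return the same value for every input, which is the property a rewrite of this kind has to preserve.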

Some final notes on why we don't use compare-elim:  1) The two cbranch
insns may come with different scratch operands depending on the chosen
constraint alternatives.  There are cases where the outgoing comparison
requires a scratch but only one incoming cbranch has one.  2) Avoiding
difficult branches can be achieved by rewiring basic blocks.
compare-elim doesn't do that; it doesn't even know the costs of the
branch codes.  3)  avr_2comparisons_rhs() may de-canonicalize a
comparison to achieve its goal.  compare-elim doesn't know how to do
that.  4) There are more reasons, see for example the commit
message and discussion for PR115830.

gcc/
* config/avr/avr.cc (cfganal.h): Include it.
(avr_2comparisons_rhs, avr_redundant_compare_regs)
(avr_strict_signed_p, avr_strict_unsigned_p): New static functions.
(avr_redundant_compare): Overhaul: Allow more cases.
(avr_optimize_2ifelse): New static function, outsourced from...
(avr_rest_of_handle_ifelse): ...this method.
gcc/testsuite/
* gcc.target/avr/torture/ifelse-c.h: New file.
* gcc.target/avr/torture/ifelse-d.h: New file.
* gcc.target/avr/torture/ifelse-q.h: New file.
* gcc.target/avr/torture/ifelse-r.h: New file.
* gcc.target/avr/torture/ifelse-c-i8.c: New test.
* gcc.target/avr/torture/ifelse-d-i8.c: New test.
* gcc.target/avr/torture/ifelse-q-i8.c: New test.
* gcc.target/avr/torture/ifelse-r-i8.c: New test.
* gcc.target/avr/torture/ifelse-c-i16.c: New test.
* gcc.target/avr/torture/ifelse-d-i16.c: New test.
* gcc.target/avr/torture/ifelse-q-i16.c: New test.
* gcc.target/avr/torture/ifelse-r-i16.c: New test.
* gcc.target/avr/torture/ifelse-c-u16.c: New test.
* gcc.target/avr/torture/ifelse-d-u16.c: New test.
* gcc.target/avr/torture/ifelse-q-u16.c: New test.
* gcc.target/avr/torture/ifelse-r-u16.c: New test.

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index c520b98a178..90606b73114 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -33,6 +33,7 @@
 #include "cgraph.h"
 #include "c-family/c-common.h"
 #include "cfghooks.h"
+#include "cfganal.h"
 #include "df.h"
 #include "memmodel.h"
 #include "tm_p.h"
@@ -337,6 +338,174 @@ ra_in_progress ()
 }
 
 
+/* Return TRUE iff comparison code C

[committed] libstdc++: Make std::vector<bool>::reference constructor private [PR115098]

2024-08-23 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

The standard says this constructor should be private.  LWG 4141 proposes
to remove it entirely. We still need it, but it doesn't need to be
public.

For std::bitset the default constructor is already private (and never
even defined) but there's a non-standard constructor that's public, but
doesn't need to be.

libstdc++-v3/ChangeLog:

PR libstdc++/115098
* include/bits/stl_bvector.h (_Bit_reference): Make default
constructor private. Declare vector and bit iterators as
friends.
* include/std/bitset (bitset::reference): Make constructor and
data members private.
* testsuite/20_util/bitset/115098.cc: New test.
* testsuite/23_containers/vector/bool/115098.cc: New test.
---
 libstdc++-v3/include/bits/stl_bvector.h  | 12 +---
 libstdc++-v3/include/std/bitset  |  5 +
 libstdc++-v3/testsuite/20_util/bitset/115098.cc  | 11 +++
 .../testsuite/23_containers/vector/bool/115098.cc|  8 
 4 files changed, 29 insertions(+), 7 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/bitset/115098.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/vector/bool/115098.cc

diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h
index c45b7ff3320..42261ac5915 100644
--- a/libstdc++-v3/include/bits/stl_bvector.h
+++ b/libstdc++-v3/include/bits/stl_bvector.h
@@ -81,6 +81,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
   struct _Bit_reference
   {
+  private:
+template<typename, typename> friend class vector;
+friend struct _Bit_iterator;
+friend struct _Bit_const_iterator;
+
+_GLIBCXX20_CONSTEXPR
+_Bit_reference() _GLIBCXX_NOEXCEPT : _M_p(0), _M_mask(0) { }
+
 _Bit_type * _M_p;
 _Bit_type _M_mask;
 
@@ -88,9 +96,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 _Bit_reference(_Bit_type * __x, _Bit_type __y)
 : _M_p(__x), _M_mask(__y) { }
 
-_GLIBCXX20_CONSTEXPR
-_Bit_reference() _GLIBCXX_NOEXCEPT : _M_p(0), _M_mask(0) { }
-
+  public:
 #if __cplusplus >= 201103L
 _Bit_reference(const _Bit_reference&) = default;
 #endif
diff --git a/libstdc++-v3/include/std/bitset b/libstdc++-v3/include/std/bitset
index e5d677ff059..2e82a0e289d 100644
--- a/libstdc++-v3/include/std/bitset
+++ b/libstdc++-v3/include/std/bitset
@@ -870,10 +870,6 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
_WordT* _M_wp;
size_t  _M_bpos;
 
-   // left undefined
-   reference();
-
-  public:
_GLIBCXX23_CONSTEXPR
reference(bitset& __b, size_t __pos) _GLIBCXX_NOEXCEPT
{
@@ -881,6 +877,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
  _M_bpos = _Base::_S_whichbit(__pos);
}
 
+  public:
 #if __cplusplus >= 201103L
reference(const reference&) = default;
 #endif
diff --git a/libstdc++-v3/testsuite/20_util/bitset/115098.cc b/libstdc++-v3/testsuite/20_util/bitset/115098.cc
new file mode 100644
index 000..52d6a0ec378
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/bitset/115098.cc
@@ -0,0 +1,11 @@
+// { dg-do compile { target c++11 } }
+
+#include <bitset>
+
+using namespace std;
+
+static_assert( ! is_default_constructible<bitset<10>::reference>::value,
+"std::bitset::reference is not default constructible");
+
+static_assert( ! is_constructible<bitset<10>::reference, bitset<10>&, size_t>::value,
+"std::bitset::reference is not default constructible");
diff --git a/libstdc++-v3/testsuite/23_containers/vector/bool/115098.cc b/libstdc++-v3/testsuite/23_containers/vector/bool/115098.cc
new file mode 100644
index 000..3df8b801795
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/vector/bool/115098.cc
@@ -0,0 +1,8 @@
+// { dg-do compile { target c++11 } }
+
+#include <vector>
+
+static_assert(
+!std::is_default_constructible<std::vector<bool>::reference>::value,
+"std::vector::reference is not default constructible"
+);
-- 
2.46.0



Re: [PATCH] libstdc++: Simplify C++20 implementation of std::variant

2024-08-23 Thread Jonathan Wakely
On Wed, 21 Aug 2024 at 10:03, Jonathan Wakely  wrote:
>
> Tested x86_64-linux.
>
> This should improve compile times for C++20 and up.
>
> I need to test this with Clang, but then I plan to push it if all goes
> well.

It seems to work OK with Clang, so I've pushed it.



Re: [patch,avr] Overhaul avr-ifelse RTL optimization pass

2024-08-23 Thread Richard Biener
On Fri, Aug 23, 2024 at 2:16 PM Georg-Johann Lay  wrote:
>
> This patch overhauls the avr-ifelse mini-pass that optimizes
> two cbranch insns to one comparison and two branches.
>
> More optimization opportunities are realized, and the code
> has been refactored.
>
> No new regressions.  Ok for trunk?
>
> There is currently no avr maintainer, so some global reviewer
> might please have a look at this.

I see Denis still listed?  Possibly Jeff can have a look though.

> And one question I have:  avr_optimize_2ifelse() is rewiring
> basic blocks using redirect_edge_and_branch().  Does this
> require extra pass flags or actions?  Currently the RTL_PASS
> data reads:

It probably depends where the pass is inserted, but if it didn't
blow up spectacularly in your testing it should be fine as-is.

> static const pass_data avr_pass_data_ifelse =
> {
>RTL_PASS,  // type
>"",// name (will be patched)
>OPTGROUP_NONE, // optinfo_flags
>TV_DF_SCAN,// tv_id
>0, // properties_required
>0, // properties_provided
>0, // properties_destroyed
>0, // todo_flags_start
>TODO_df_finish | TODO_df_verify // todo_flags_finish
> };
>
>
> Johann
>
> p.s. The additional notes on compare-elim / PR115830 can be found
> here (pending review):
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660743.html
>
> --
>
> AVR: Overhaul the avr-ifelse RTL optimization pass.
>
> Mini-pass avr-ifelse realizes optimizations that replace two cbranch
> insns with one comparison and two branches.  This patch adds the
> following improvements:
>
> - The right operand of the comparisons may also be REGs.
>Formerly only CONST_INT was handled.
>
> - The code of the first comparison is no longer restricted
>to (effectively) EQ.
>
> - When the second cbranch is located in the fallthrough path
>of the first cbranch, then difficult (expensive) comparisons
>can always be avoided.  This may require swapping the branch
>targets.  (When the second cbranch is located after the target
>label of the first one, then getting rid of difficult branches
>would require reordering blocks.)
>
> - The code has been cleaned up:  avr_rest_of_handle_ifelse() now
>just scans the insn stream for optimization candidates.  The code
>that actually performs the transformation has been outsourced to
>the new function avr_optimize_2ifelse().
>
> - The code to find a better representation for reg-const_int comparisons
>has been split into two parts:  First try to find codes such that the
>right-hand sides of the comparisons are the same (avr_2comparisons_rhs).
>When this succeeds then one comparison can serve two branches, and
>avr_redundant_compare() tries to get rid of difficult branches that
>may have been introduced by avr_2comparisons_rhs().  This is always
>possible when the second cbranch is located in the fallthrough path
>of the first one, or when the first code is EQ.
>
> Some final notes on why we don't use compare-elim:  1) The two cbranch
> insns may come with different scratch operands depending on the chosen
> constraint alternatives.  There are cases where the outgoing comparison
> requires a scratch but only one incoming cbranch has one.  2) Avoiding
> difficult branches can be achieved by rewiring basic blocks.
> compare-elim doesn't do that; it doesn't even know the costs of the
> branch codes.  3)  avr_2comparisons_rhs() may de-canonicalize a
> comparison to achieve its goal.  compare-elim doesn't know how to do
> that.  4) There are more reasons, see for example the commit
> message and discussion for PR115830.
>
> gcc/
> * config/avr/avr.cc (cfganal.h): Include it.
> (avr_2comparisons_rhs, avr_redundant_compare_regs)
> (avr_strict_signed_p, avr_strict_unsigned_p): New static functions.
> (avr_redundant_compare): Overhaul: Allow more cases.
> (avr_optimize_2ifelse): New static function, outsourced from...
> (avr_rest_of_handle_ifelse): ...this method.
> gcc/testsuite/
> * gcc.target/avr/torture/ifelse-c.h: New file.
> * gcc.target/avr/torture/ifelse-d.h: New file.
> * gcc.target/avr/torture/ifelse-q.h: New file.
> * gcc.target/avr/torture/ifelse-r.h: New file.
> * gcc.target/avr/torture/ifelse-c-i8.c: New test.
> * gcc.target/avr/torture/ifelse-d-i8.c: New test.
> * gcc.target/avr/torture/ifelse-q-i8.c: New test.
> * gcc.target/avr/torture/ifelse-r-i8.c: New test.
> * gcc.target/avr/torture/ifelse-c-i16.c: New test.
> * gcc.target/avr/torture/ifelse-d-i16.c: New test.
> * gcc.target/avr/torture/ifelse-q-i16.c: New test.
> * gcc.target/avr/torture/ifelse-r-i16.c: New test.
> * gcc.target/avr/torture/ifelse-c-u16.c: New test.
> * gcc.target/avr/torture/ifelse-d-u16.c: New test.
> * gcc.target/avr/torture/ifelse-q-u16.c: New test.

[committed] libstdc++: Use noexcept instead of throw() in src/c++11/debug.cc

2024-08-23 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* src/c++11/debug.cc: Replace throw() with noexcept.
---
 libstdc++-v3/src/c++11/debug.cc | 32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/libstdc++-v3/src/c++11/debug.cc b/libstdc++-v3/src/c++11/debug.cc
index 5d6bb5b7547..e3880318e5c 100644
--- a/libstdc++-v3/src/c++11/debug.cc
+++ b/libstdc++-v3/src/c++11/debug.cc
@@ -380,7 +380,7 @@ namespace __gnu_debug
 
   __gnu_cxx::__mutex&
   _Safe_sequence_base::
-  _M_get_mutex() throw ()
+  _M_get_mutex() noexcept
   { return get_safe_base_mutex(this); }
 
   void
@@ -393,7 +393,7 @@ namespace __gnu_debug
 
   void
   _Safe_sequence_base::
-  _M_attach_single(_Safe_iterator_base* __it, bool __constant) throw ()
+  _M_attach_single(_Safe_iterator_base* __it, bool __constant) noexcept
   {
 _Safe_iterator_base*& __its =
   __constant ? _M_const_iterators : _M_iterators;
@@ -414,7 +414,7 @@ namespace __gnu_debug
 
   void
   _Safe_sequence_base::
-  _M_detach_single(_Safe_iterator_base* __it) throw ()
+  _M_detach_single(_Safe_iterator_base* __it) noexcept
   {
 // Remove __it from this sequence's list
 __it->_M_unlink();
@@ -443,7 +443,7 @@ namespace __gnu_debug
 
   void
   _Safe_iterator_base::
-  _M_attach_single(_Safe_sequence_base* __seq, bool __constant) throw ()
+  _M_attach_single(_Safe_sequence_base* __seq, bool __constant) noexcept
   {
 _M_detach_single();
 
@@ -478,7 +478,7 @@ namespace __gnu_debug
 
   void
   _Safe_iterator_base::
-  _M_detach_single() throw ()
+  _M_detach_single() noexcept
   {
 if (_M_sequence)
   {
@@ -489,7 +489,7 @@ namespace __gnu_debug
 
   void
   _Safe_iterator_base::
-  _M_reset() throw ()
+  _M_reset() noexcept
   {
 reset_sequence_ptr(_M_sequence);
 // Do not reset version, so that a detached iterator does not look like a
@@ -501,17 +501,17 @@ namespace __gnu_debug
 
   bool
   _Safe_iterator_base::
-  _M_singular() const throw ()
+  _M_singular() const noexcept
   { return !_M_sequence || _M_version != _M_sequence->_M_version; }
 
   bool
   _Safe_iterator_base::
-  _M_can_compare(const _Safe_iterator_base& __x) const throw ()
+  _M_can_compare(const _Safe_iterator_base& __x) const noexcept
   { return _M_sequence == __x._M_sequence; }
 
   __gnu_cxx::__mutex&
   _Safe_iterator_base::
-  _M_get_mutex() throw ()
+  _M_get_mutex() noexcept
   { return _M_sequence->_M_get_mutex(); }
 
   _Safe_unordered_container_base*
@@ -538,7 +538,7 @@ namespace __gnu_debug
 
   void
   _Safe_local_iterator_base::
-  _M_attach_single(_Safe_sequence_base* __cont, bool __constant) throw ()
+  _M_attach_single(_Safe_sequence_base* __cont, bool __constant) noexcept
   {
 _M_detach_single();
 
@@ -566,7 +566,7 @@ namespace __gnu_debug
 
   void
   _Safe_local_iterator_base::
-  _M_detach_single() throw ()
+  _M_detach_single() noexcept
   {
 if (_M_sequence)
   {
@@ -608,7 +608,7 @@ namespace __gnu_debug
 
   void
   _Safe_unordered_container_base::
-  _M_attach_local_single(_Safe_iterator_base* __it, bool __constant) throw ()
+  _M_attach_local_single(_Safe_iterator_base* __it, bool __constant) noexcept
   {
 _Safe_iterator_base*& __its =
   __constant ? _M_const_local_iterators : _M_local_iterators;
@@ -629,7 +629,7 @@ namespace __gnu_debug
 
   void
   _Safe_unordered_container_base::
-  _M_detach_local_single(_Safe_iterator_base* __it) throw ()
+  _M_detach_local_single(_Safe_iterator_base* __it) noexcept
   {
 // Remove __it from this container's list
 __it->_M_unlink();
@@ -1233,7 +1233,7 @@ namespace
 namespace __gnu_debug
 {
   _Error_formatter&
-  _Error_formatter::_M_message(_Debug_msg_id __id) const throw ()
+  _Error_formatter::_M_message(_Debug_msg_id __id) const noexcept
   {
 return const_cast<_Error_formatter*>(this)
   ->_M_message(_S_debug_messages[__id]);
@@ -1334,7 +1334,7 @@ namespace __gnu_debug
   template<typename _Tp>
 void
 _Error_formatter::_M_format_word(char*, int, const char*, _Tp)
-const throw ()
+const noexcept
 { }
 
   void
@@ -1346,7 +1346,7 @@ namespace __gnu_debug
   { }
 
   void
-  _Error_formatter::_M_get_max_length() const throw ()
+  _Error_formatter::_M_get_max_length() const noexcept
   { }
 
   // Instantiations.
-- 
2.46.0
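
For readers unfamiliar with the two spellings: throw() is the older dynamic exception specification, deprecated since C++11 and removed in C++20, while noexcept is its replacement.  The sketch below uses a hypothetical Seq type (not a libstdc++ class) and guards the old spelling so that it is only compiled where the language still accepts it.

```cpp
// Hypothetical type, only to show the spelling used after the patch.
struct Seq
{
  bool singular () const noexcept;
};

bool Seq::singular () const noexcept { return true; }

// The noexcept operator sees the guarantee at compile time.
static_assert (noexcept (Seq ().singular ()),
               "singular() is declared non-throwing");

#if __cplusplus < 202002L
// Before C++20 the old spelling is still accepted and gives the same
// answer.  The function is only used in an unevaluated operand, so it
// needs no definition.
void old_style () throw ();
static_assert (noexcept (old_style ()),
               "throw() also means non-throwing");
#endif
```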



[committed] libstdc++: Make debug sequence members mutable [PR116369]

2024-08-23 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

We need to be able to attach debug mode iterators to const containers,
so the safe iterator constructor uses const_cast to get a modifiable
pointer to the container. If the container was defined as const, that
const_cast to access its members results in undefined behaviour.  PR
116369 shows a case where it results in a segfault because the container
is in a rodata section (which shouldn't have happened, but the undefined
behaviour in the library still exists in any case).

This makes the _M_iterators and _M_const_iterators data members mutable,
so that it's safe to modify them even if the declared type of the
container is a const type.

Ideally we would not need the const_cast at all. Instead, the _M_attach
member (and everything it calls) should be const-qualified. That would
work fine now, because the members that it ends up modifying are
mutable. Making that change would require a number of new exports from
the shared library, and would require retaining the old non-const member
functions (maybe as symbol aliases) for backwards compatibility. That
might be worth changing at some point, but isn't done here.

libstdc++-v3/ChangeLog:

PR c++/116369
* include/debug/safe_base.h (_Safe_sequence_base::_M_iterators):
Add mutable specifier.
(_Safe_sequence_base::_M_const_iterators): Likewise.
---
 libstdc++-v3/include/debug/safe_base.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/debug/safe_base.h b/libstdc++-v3/include/debug/safe_base.h
index d5fbe4b1320..88d7f0b05c8 100644
--- a/libstdc++-v3/include/debug/safe_base.h
+++ b/libstdc++-v3/include/debug/safe_base.h
@@ -205,10 +205,10 @@ namespace __gnu_debug
 
   public:
 /// The list of mutable iterators that reference this container
-_Safe_iterator_base* _M_iterators;
+mutable _Safe_iterator_base* _M_iterators;
 
 /// The list of constant iterators that reference this container
-_Safe_iterator_base* _M_const_iterators;
+mutable _Safe_iterator_base* _M_const_iterators;
 
 /// The container version number. This number may never be 0.
 mutable unsigned int _M_version;
-- 
2.46.0
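
The language rule this patch relies on is that a mutable data member may be modified even when the complete object was defined const.  The sketch below shows that rule using a const-qualified attach function, which is the variant the message calls ideal rather than the const_cast that the library currently uses.  Seq and Iter are hypothetical names, not the debug-mode classes.

```cpp
// Hypothetical miniature of a container that tracks attached iterators.
struct Iter
{
  Iter *next = nullptr;
};

struct Seq
{
  // mutable: may be written even if the Seq object itself is const.
  mutable Iter *iterators = nullptr;

  // const-qualified; this compiles only because `iterators` is mutable.
  void attach (Iter *it) const
  {
    it->next = iterators;
    iterators = it;
  }

  // Number of iterators currently attached.
  int count () const
  {
    int n = 0;
    for (Iter *p = iterators; p; p = p->next)
      ++n;
    return n;
  }
};
```

Without the mutable specifier, writing to iterators in an object defined const would be undefined behaviour regardless of how the non-const access path was obtained.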



[PATCH] lto: Don't check obj.found for offload section

2024-08-23 Thread H.J. Lu
obj.found is the number of LTO symbols.  We should include the offload
section when it is used by linker even if there are no LTO symbols.

PR lto/116361
* lto-plugin.c (claim_file_handler_v2): Don't check obj.found
for the offload section.

Signed-off-by: H.J. Lu 
---
 lto-plugin/lto-plugin.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
index 61b0de62f52..c564b36eb92 100644
--- a/lto-plugin/lto-plugin.c
+++ b/lto-plugin/lto-plugin.c
@@ -1320,7 +1320,7 @@ claim_file_handler_v2 (const struct ld_plugin_input_file *file,
   if (*can_be_claimed && !obj.offload && offload_files_last_lto == NULL)
 offload_files_last_lto = offload_files_last;
 
-  if (obj.offload && known_used && obj.found > 0)
+  if (obj.offload && known_used)
 {
   /* Add file to the list.  The order must be exactly the same as the final
 order after recompilation and linking, otherwise host and target tables
-- 
2.46.0
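
The effect of the one-line change can be seen by comparing the two conditions side by side.  This is only a sketch: ObjInfo and the two function names are hypothetical, and just the offload and found fields from the diff are modelled.

```cpp
// Hypothetical reduction of the plugin's per-object state.
struct ObjInfo
{
  bool offload;   // the object has an offload section
  int found;      // number of LTO symbols in the object
};

// Condition before the patch: also requires at least one LTO symbol.
bool add_offload_file_old (const ObjInfo &obj, bool known_used)
{
  return obj.offload && known_used && obj.found > 0;
}

// Condition after the patch: the LTO symbol count is no longer consulted.
bool add_offload_file_new (const ObjInfo &obj, bool known_used)
{
  return obj.offload && known_used;
}
```

An object with an offload section but no LTO symbols (found == 0) is skipped by the old condition and added by the new one, provided the linker knows the object is used.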



[PATCH] tree-optimization/116463 - complex lowering leaves around dead stmts

2024-08-23 Thread Richard Biener
Complex lowering generally replaces existing complex defs with
COMPLEX_EXPRs, but those might be dead when it can always refer to
components from the lattice.  This in turn can pessimize follow-up
transforms like forwprop and reassoc.  The following gets rid of the
dead COMPLEX_EXPRs that lowering generates by running
simple_dce_from_worklist.

Bootstrapped and tested on x86_64-unknown-linux-gnu.  This will cause
the following fallout, which is similar to the aarch64 fallout in
PR116463, complex SLP recognition being somewhat fragile.  I'll track
this there.  Pushed.

FAIL: gcc.target/i386/avx512fp16-vector-complex-float.c scan-assembler-not vfmadd[123]*ph[ t]
FAIL: gcc.target/i386/avx512fp16-vector-complex-float.c scan-assembler-times vfmaddcph[ t] 1
FAIL: gcc.target/i386/part-vect-complexhf.c scan-assembler-times vfmaddcph[ t] 1


PR tree-optimization/116463
* tree-complex.cc: Include tree-ssa-dce.h.
(dce_worklist): New global.
(update_complex_assignment): Add SSA def to the DCE worklist.
(tree_lower_complex): Perform DCE.
---
 gcc/tree-complex.cc | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/tree-complex.cc b/gcc/tree-complex.cc
index dfb45b9d91c..7480c07640e 100644
--- a/gcc/tree-complex.cc
+++ b/gcc/tree-complex.cc
@@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "case-cfn-macros.h"
 #include "builtins.h"
 #include "optabs-tree.h"
+#include "tree-ssa-dce.h"
 
 /* For each complex ssa name, a lattice value.  We're interested in finding
out whether a complex number is degenerate in some way, having only real
@@ -88,6 +89,9 @@ static vec phis_to_revisit;
 /* BBs that need EH cleanup.  */
 static bitmap need_eh_cleanup;
 
+/* SSA defs we should try to DCE.  */
+static bitmap dce_worklist;
+
 /* Lookup UID in the complex_variable_components hashtable and return the
associated tree.  */
 static tree
@@ -731,6 +735,7 @@ update_complex_assignment (gimple_stmt_iterator *gsi, tree r, tree i)
   update_stmt (stmt);
   if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt))
 bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);
+  bitmap_set_bit (dce_worklist, SSA_NAME_VERSION (gimple_assign_lhs (stmt)));
 
   update_complex_components (gsi, gsi_stmt (*gsi), r, i);
 }
@@ -1962,6 +1967,7 @@ tree_lower_complex (void)
   complex_propagate.ssa_propagate ();
 
   need_eh_cleanup = BITMAP_ALLOC (NULL);
+  dce_worklist = BITMAP_ALLOC (NULL);
 
   complex_variable_components = new int_tree_htab_type (10);
 
@@ -2008,6 +2014,9 @@ tree_lower_complex (void)
 
   gsi_commit_edge_inserts ();
 
+  simple_dce_from_worklist (dce_worklist, need_eh_cleanup);
+  BITMAP_FREE (dce_worklist);
+
   unsigned todo
 = gimple_purge_all_dead_eh_edges (need_eh_cleanup) ? TODO_cleanup_cfg : 0;
   BITMAP_FREE (need_eh_cleanup);
-- 
2.43.0
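
The general shape of a worklist-driven dead code removal, as used by this patch, can be sketched on a toy representation.  Everything below is an assumption made for illustration: Def, toy_dce_from_worklist and the integer names are hypothetical, uses are found by a linear scan, and none of it reflects how GCC's simple_dce_from_worklist is implemented beyond the idea of starting from a set of candidate names.

```cpp
#include <map>
#include <set>
#include <vector>

// Toy SSA definition: the names it reads, and whether it must be kept
// regardless of uses (for example a store).
struct Def
{
  std::vector<int> operands;
  bool has_side_effects = false;
};

// Remove definitions reachable from WORKLIST that have no remaining uses.
// When a definition is removed its operands are queued, because they may
// have just lost their last use.  Returns the set of removed names.
std::set<int> toy_dce_from_worklist (std::map<int, Def> &defs,
                                     std::set<int> worklist)
{
  std::set<int> removed;
  while (!worklist.empty ())
    {
      int name = *worklist.begin ();
      worklist.erase (worklist.begin ());

      auto it = defs.find (name);
      if (it == defs.end () || it->second.has_side_effects)
        continue;

      // Look for any other definition that still reads NAME.
      bool used = false;
      for (const auto &d : defs)
        for (int op : d.second.operands)
          if (op == name && d.first != name)
            used = true;
      if (used)
        continue;

      for (int op : it->second.operands)
        worklist.insert (op);
      defs.erase (it);
      removed.insert (name);
    }
  return removed;
}
```

Seeding the worklist with only the freshly created definitions keeps the cleanup cheap, since nothing else in the function has to be examined unless one of those definitions turns out to be dead.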


Re: [PATCH v3] Update LDPT_REGISTER_CLAIM_FILE_HOOK_V2 linker plugin hook

2024-08-23 Thread H.J. Lu
On Fri, Aug 23, 2024 at 4:02 AM Prathamesh Kulkarni
 wrote:
>
>
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: Thursday, August 22, 2024 2:16 PM
> > To: H.J. Lu 
> > Cc: gcc-patches@gcc.gnu.org; josmy...@redhat.com
> > Subject: Re: [PATCH v3] Update LDPT_REGISTER_CLAIM_FILE_HOOK_V2 linker
> > plugin hook
> >
> > On Wed, Aug 21, 2024 at 4:25 PM H.J. Lu  wrote:
> > >
> > > This hook allows the BFD linker plugin to distinguish calls to
> > > claim_file_handler that know the object is being used by the linker
> > > (from ldmain.c:add_archive_element), from calls that don't know it's
> > > being used by the linker (from elf_link_is_defined_archive_symbol);
> > in
> > > the latter case, the plugin should avoid including the unused LTO
> > > archive members in link output.  To get the proper support for
> > > archives with LTO common symbols, the linker fix
> >
> > OK.
> Hi,
> After this commit:
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a98dd536b1017c2b814a3465206c6c01b2890998
> I am no longer able to see mkoffload (and accel compiler) being invoked for 
> nvptx (-save-temps also doesn't show accel dumps).
> I have attached -v output before and after the commit for x86_64->nvptx 
> offloading for the following simple test (host doesn't really matter, can 
> also reproduce with aarch64 host):

Please try

https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661306.html

> int main()
> {
>   int x = 1;
>   #pragma omp target map(x)
> x = 5;
>   return x;
> }
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Richard.
> >
> > > commit a6f8fe0a9e9cbe871652e46ba7c22d5e9fb86208
> > > Author: H.J. Lu 
> > > Date:   Wed Aug 14 20:50:02 2024 -0700
> > >
> > > lto: Don't include unused LTO archive members in output
> > >
> > > is required.
> > >
> > > PR lto/116361
> > > * lto-plugin.c (claim_file_handler_v2): Rename claimed to
> > > can_be_claimed.  Include the LTO object only if it is known
> > to
> > > be included in link output.
> > >
> > > Signed-off-by: H.J. Lu 
> > > ---
> > >  lto-plugin/lto-plugin.c | 53
> > > -
> > >  1 file changed, 31 insertions(+), 22 deletions(-)
> > >
> > > diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c index
> > > 152648338b9..61b0de62f52 100644
> > > --- a/lto-plugin/lto-plugin.c
> > > +++ b/lto-plugin/lto-plugin.c
> > > @@ -1191,16 +1191,19 @@ process_offload_section (void *data, const
> > char *name, off_t offset, off_t len)
> > >return 1;
> > >  }
> > >
> > > -/* Callback used by a linker to check if the plugin will claim
> > FILE. Writes
> > > -   the result in CLAIMED.  If KNOWN_USED, the object is known by
> > the linker
> > > -   to be used, or an older API version is in use that does not
> > provide that
> > > -   information; otherwise, the linker is only determining whether
> > this is
> > > -   a plugin object and it should not be registered as having
> > offload data if
> > > -   not claimed by the plugin.  */
> > > +/* Callback used by a linker to check if the plugin can claim FILE.
> > > +   Writes the result in CAN_BE_CLAIMED.  If KNOWN_USED != 0, the
> > object
> > > +   is known by the linker to be included in link output, or an
> > older API
> > > +   version is in use that does not provide that information.
> > Otherwise,
> > > +   the linker is only determining whether this is a plugin object
> > and
> > > +   only the symbol table is needed by the linker.  In this case,
> > the
> > > +   object should not be included in link output and this function
> > will
> > > +   be called by the linker again with KNOWN_USED != 0 after the
> > linker
> > > +   decides the object should be included in link output. */
> > >
> > >  static enum ld_plugin_status
> > > -claim_file_handler_v2 (const struct ld_plugin_input_file *file, int
> > *claimed,
> > > -  int known_used)
> > > +claim_file_handler_v2 (const struct ld_plugin_input_file *file,
> > > +  int *can_be_claimed, int known_used)
> > >  {
> > >enum ld_plugin_status status;
> > >struct plugin_objfile obj;
> > > @@ -1229,7 +1232,7 @@ claim_file_handler_v2 (const struct
> > ld_plugin_input_file *file, int *claimed,
> > >  }
> > >lto_file.handle = file->handle;
> > >
> > > -  *claimed = 0;
> > > +  *can_be_claimed = 0;
> > >obj.file = file;
> > >obj.found = 0;
> > >obj.offload = false;
> > > @@ -1286,15 +1289,19 @@ claim_file_handler_v2 (const struct
> > ld_plugin_input_file *file, int *claimed,
> > >   lto_file.symtab.syms);
> > >check (status == LDPS_OK, LDPL_FATAL, "could not add
> > symbols");
> > >
> > > -  LOCK_SECTION;
> > > -  num_claimed_files++;
> > > -  claimed_files =
> > > -   xrealloc (claimed_files,
> > > - num_claimed_files * sizeof (struct
> > plugin_file_info));
> > > -  claimed_files[nu

[committed,v2] libstdc++: Optimize __try_use_facet for const types

2024-08-23 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

LWG 436 confirmed that const-qualified types are valid arguments for
Facet template parameters (but volatile-qualified types are not). Use the
fast path in std::use_facet and std::has_facet for const T as well as T.
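
For readers who have not seen a const-qualified Facet argument in practice, here is a minimal sketch of what the commit message refers to. The helper name upper_via_const_facet is made up for illustration and is not part of the patch; only std::use_facet, std::has_facet and std::ctype are real library names.

```cpp
#include <cassert>
#include <locale>

// Hypothetical helper (not from the patch): upper-case one character
// through a const-qualified facet type.  LWG 436 confirms that
// `const std::ctype<char>` is a valid Facet template argument.
char upper_via_const_facet(char c,
                           const std::locale& loc = std::locale::classic())
{
  // has_facet and use_facet are both instantiated with the const type.
  if (!std::has_facet<const std::ctype<char>>(loc))
    return c;
  const auto& ct = std::use_facet<const std::ctype<char>>(loc);
  return ct.toupper(c);
}
```

Calling upper_via_const_facet('a') with the classic locale should behave exactly like the non-const spelling; the patch only changes which internal path is taken, not the result.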

libstdc++-v3/ChangeLog:

* include/bits/locale_classes.tcc (__try_use_facet): Also avoid
dynamic_cast for const-qualified facet types.
---
 libstdc++-v3/include/bits/locale_classes.tcc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/locale_classes.tcc 
b/libstdc++-v3/include/bits/locale_classes.tcc
index c79574e58de..d5ef1911057 100644
--- a/libstdc++-v3/include/bits/locale_classes.tcc
+++ b/libstdc++-v3/include/bits/locale_classes.tcc
@@ -110,7 +110,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // We know these standard facets are always installed in every locale
   // so dynamic_cast always succeeds, just use static_cast instead.
 #define _GLIBCXX_STD_FACET(...) \
-  if _GLIBCXX_CONSTEXPR (__is_same(_Facet, __VA_ARGS__)) \
+  if _GLIBCXX_CONSTEXPR (__is_same(const _Facet, const __VA_ARGS__)) \
return static_cast(__facets[__i])
 
   _GLIBCXX_STD_FACET(ctype);
-- 
2.46.0



[PATCH] c++: Add most missing C++20 and C++23 names to cxxapi-data.csv

2024-08-23 Thread Jonathan Wakely
Tested x86_64-linux. OK for trunk?

-- >8 --

This includes uncommenting the atomic_flag non-member functions, which
were added by PR libstdc++/103934.

Also generate a hint for std::ignore, which was recently tweaked to be
more generally useful by P2968R2, which r15-2324 implemented.

gcc/cp/ChangeLog:

* cxxapi-data.csv: Add C++20 and C++23 names from ,
, , , , and .
Set cxx11 dialect for std::ignore in . Uncomment
atomic_flag functions from .
* std-name-hint.gperf: Regenerate.
* std-name-hint.h: Regenerate.
---
 gcc/cp/cxxapi-data.csv |   96 +-
 gcc/cp/std-name-hint.gperf |   84 +-
 gcc/cp/std-name-hint.h | 2103 
 3 files changed, 1333 insertions(+), 950 deletions(-)

diff --git a/gcc/cp/cxxapi-data.csv b/gcc/cp/cxxapi-data.csv
index 1cbf774acd7..bd397fb2acb 100644
--- a/gcc/cp/cxxapi-data.csv
+++ b/gcc/cp/cxxapi-data.csv
@@ -197,16 +197,16 @@
 ,atomic_uintmax_t,1,cxx20
 
,atomic_signed_lock_free,1,cxx11,__cpp_lib_atomic_lock_free_type_aliases
 
,atomic_unsigned_lock_free,1,cxx11,__cpp_lib_atomic_lock_free_type_aliases
-# libstdc++/103934 ,atomic_flag_test,1,no
-# libstdc++/103934 ,atomic_flag_test_explicit,1,no
+,atomic_flag_test,1,no
+,atomic_flag_test_explicit,1,no
 ,atomic_flag_test_and_set,1,no
 ,atomic_flag_test_and_set_explicit,1,no
 ,atomic_flag_clear,1,no
 ,atomic_flag_clear_explicit,1,no
-# libstdc++/103934 ,atomic_flag_wait,1,no
-# libstdc++/103934 ,atomic_flag_wait_explicit,1,no
-# libstdc++/103934 ,atomic_flag_notify_one,1,no
-# libstdc++/103934 ,atomic_flag_notify_all,1,no
+,atomic_flag_wait,1,no
+,atomic_flag_wait_explicit,1,no
+,atomic_flag_notify_one,1,no
+,atomic_flag_notify_all,1,no
 ,atomic_thread_fence,1,no
 ,atomic_signal_fence,1,no
 ,barrier,1,no
@@ -238,7 +238,48 @@
 ,to_chars,1,no
 ,from_chars_result,1,no
 ,from_chars,1,no
-#  TODO
+,chrono::duration,1,cxx11
+,chrono::nanoseconds,1,cxx11
+,chrono::microseconds,1,cxx11
+,chrono::milliseconds,1,cxx11
+,chrono::seconds,1,cxx11
+,chrono::minutes,1,cxx11
+,chrono::hours,1,cxx11
+,chrono::days,1,cxx20
+,chrono::weeks,1,cxx20
+,chrono::months,1,cxx20
+,chrono::years,1,cxx20
+,chrono::duration_cast,1,cxx11
+,chrono::time_point,1,cxx11
+,chrono::time_point_cast,1,cxx11
+,chrono::system_clock,1,cxx11
+,chrono::steady_clock,1,cxx11
+,chrono::high_resolution_clock,1,cxx11
+,chrono::utc_clock,1,cxx20
+,chrono::tai_clock,1,cxx20
+,chrono::gps_clock,1,cxx20
+,chrono::file_clock,1,cxx20
+,chrono::local_t,1,cxx20
+,chrono::clock_cast,1,cxx20
+,chrono::time_zone,1,cxx20
+,chrono::zoned_time,1,cxx20
+,chrono::tzdb,1,cxx20
+,chrono::tzdb_list,1,cxx20
+,chrono::get_tzdb,1,cxx20
+,chrono::get_tzdb_list,1,cxx20
+,chrono::reload_tzdb,1,cxx20
+,chrono::remote_version,1,cxx20
+,chrono::locate_zone,1,cxx20
+,chrono::leap_second,1,cxx20
+,chrono::leap_second_info,1,cxx20
+,chrono::get_leap_second_info,1,cxx20
+# c++/106851 ,chrono::abs,1,cxx17
+# c++/106851 ,chrono::floor,1,cxx17
+# c++/106851 ,chrono::ceil,1,cxx17
+,chrono::round,1,cxx17
+,chrono::from_stream,1,cxx20
+,chrono::parse,1,cxx20
+#  TODO the rest
 #  TODO
 ,weak_equality,1,cxx20
 ,strong_equality,1,cxx20
@@ -311,6 +352,33 @@
 # ,sorted_equivalent,1,no
 # ,erase_if,1,no
 # ,uses_allocator,1,no
+,format,1,cxx20
+,format_to,1,cxx20
+,format_to_n,1,cxx20
+,formatted_size,1,cxx20
+,vformat,1,cxx20
+,vformat_to,1,cxx20
+,formatter,1,cxx20
+,range_formatter,1,cxx23
+,range_format,1,cxx23
+,formattable,1,cxx23
+,format_error,1,cxx20
+,basic_format_parse_context,1,cxx20
+,format_parse_context,1,cxx20
+,wformat_parse_context,1,cxx20
+,basic_format_context,1,cxx20
+,format_context,1,cxx20
+,wformat_context,1,cxx20
+,basic_format_string,1,cxx20
+,format_string,1,cxx20
+,wformat_string,1,cxx20
+,basic_format_arg,1,cxx20
+,basic_format_args,1,cxx20
+,format_args,1,cxx20
+,wformat_args,1,cxx20
+,make_format_args,1,cxx20
+,make_wformat_args,1,cxx20
+,runtime_format,1,cxx26
 ,forward_list,1,cxx11
 ,operator==,1,no
 ,operator<=>,1,no
@@ -361,6 +429,8 @@
 ,shared_future,1,no
 ,packaged_task,1,cxx11
 ,async,1,cxx11
+,generator,1,cxx23
+# c++/106851 ,pmr::generator,1,no
 ,initializer_list,1,no
 ,begin,1,no
 ,end,1,no
@@ -433,6 +503,9 @@
 ,ostreambuf_iterator,1,cxx98
 ,prev,1,cxx11
 ,reverse_iterator,1,cxx98
+,common_iterator,1,cxx20
+,counted_iterator,1,cxx20
+,const_iterator,1,cxx23
 #  TODO the rest
 ,latch,1,no
 ,list,1,cxx98
@@ -590,6 +663,8 @@
 ,flush_emit,1,cxx20
 ,operator<<,1,no
 #  TODO
+,print,1,cxx23
+,println,1,cxx23
 ,queue,1,cxx98
 ,operator==,1,no
 ,operator!=,1,no
@@ -720,6 +795,11 @@
 ,range_error,1,cxx98
 ,overflow_error,1,cxx98
 ,underflow_error,1,cxx98
+,float16_t,1,cxx23
+,float32_t,1,cxx23
+,float64_t,1,cxx23
+,float128_t,1,cxx23
+,bfloat16_t,1,cxx23
 ,stop_token,1,cxx20
 ,stop_source,1,cxx20
 ,nostopstate_t,1,no
@@ -801,7 +881,7 @@
 ,tuple,1,cxx11
 ,basic_common_reference,1,no
 ,common_type,1,no
-,ignore,1,no
+,ignore,1,cxx11
 ,make_tuple,1,cxx11
 ,forward_as_tuple,1,cxx11
 ,tie,1,cxx11
diff --git a/g

[committed] libstdc++: Implement LWG 3746 for std::optional

2024-08-23 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

This avoids constraint recursion in operator<=> for std::optional.
The resolution was approved in Kona 2022.
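
As a rough illustration of what "derived from optional" means here, the sketch below detects types that are, or publicly derive from, some std::optional specialization. Note that this is not the patch's implementation: the patch uses a C++20 concept, whereas this sketch uses pointer overload resolution so that it also builds as C++17. All names in namespace demo, and the Wrapped type, are invented for the example.

```cpp
#include <cassert>
#include <optional>
#include <type_traits>
#include <utility>

namespace demo {
  // Chosen when T* converts to a pointer to some optional<U>, which
  // includes classes publicly derived from optional<U>.
  template<typename U>
  std::true_type derives_from_optional(const std::optional<U>*);

  // Fallback for every other object pointer.
  std::false_type derives_from_optional(const void*);

  // Only declarations are needed: the call sits in an unevaluated operand.
  template<typename T>
  constexpr bool is_derived_from_optional_v
    = decltype(derives_from_optional(std::declval<T*>()))::value;
}

// Hypothetical user type, similar in spirit to the struct in the new test.
struct Wrapped : std::optional<int> { };

static_assert(demo::is_derived_from_optional_v<std::optional<int>>, "");
static_assert(demo::is_derived_from_optional_v<Wrapped>, "");
static_assert(!demo::is_derived_from_optional_v<int>, "");
```

A plain "is this exactly an optional specialization" check would answer false for Wrapped, which is the case the LWG issue is about.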

libstdc++-v3/ChangeLog:

* include/std/optional (__is_derived_from_optional): New
concept.
(operator<=>): Use __is_derived_from_optional.
* testsuite/20_util/optional/relops/lwg3746.cc: New test.
---
 libstdc++-v3/include/std/optional | 12 +--
 .../20_util/optional/relops/lwg3746.cc| 20 +++
 2 files changed, 30 insertions(+), 2 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/optional/relops/lwg3746.cc

diff --git a/libstdc++-v3/include/std/optional 
b/libstdc++-v3/include/std/optional
index 6651686cd1d..933a5b15e56 100644
--- a/libstdc++-v3/include/std/optional
+++ b/libstdc++-v3/include/std/optional
@@ -1581,9 +1581,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return !__rhs; }
 #endif // three-way-comparison
 
+#if __cpp_lib_concepts
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 4072. std::optional comparisons: constrain harder
-#if __cpp_lib_concepts
 # define _REQUIRES_NOT_OPTIONAL(T) requires (!__is_optional_v)
 #else
 # define _REQUIRES_NOT_OPTIONAL(T)
@@ -1675,8 +1675,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return !__rhs || __lhs >= *__rhs; }
 
 #ifdef __cpp_lib_three_way_comparison
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 3746. optional's spaceship with U with a type derived from optional
+  // causes infinite constraint meta-recursion
+  template
+concept __is_derived_from_optional = requires (const _Tp& __t) {
+  [](const optional<_Up>&){ }(__t);
+};
+
   template
-requires (!__is_optional_v<_Up>)
+requires (!__is_derived_from_optional<_Up>)
   && three_way_comparable_with<_Up, _Tp>
 constexpr compare_three_way_result_t<_Tp, _Up>
 operator<=> [[nodiscard]] (const optional<_Tp>& __x, const _Up& __v)
diff --git a/libstdc++-v3/testsuite/20_util/optional/relops/lwg3746.cc 
b/libstdc++-v3/testsuite/20_util/optional/relops/lwg3746.cc
new file mode 100644
index 000..46065f8e901
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/optional/relops/lwg3746.cc
@@ -0,0 +1,20 @@
+// { dg-do compile { target c++20 } }
+
+// LWG 3746. optional's spaceship with U with a type derived from optional
+// causes infinite constraint meta-recursion
+
+#include 
+
+struct S : std::optional
+{
+bool operator==(const S&) const;
+bool operator<(const S&) const;
+bool operator>(const S&) const;
+bool operator<=(const S&) const;
+bool operator>=(const S&) const;
+};
+
+auto cmp(const S& s, const std::optional& o)
+{
+  return s <=> o;
+}
-- 
2.46.0



Re: [PATCH] lto: Don't check obj.found for offload section

2024-08-23 Thread Richard Biener
On Fri, Aug 23, 2024 at 2:36 PM H.J. Lu  wrote:
>
> obj.found is the number of LTO symbols.  We should include the offload
> section when it is used by linker even if there are no LTO symbols.

OK.

> PR lto/116361
> * lto-plugin.c (claim_file_handler_v2): Don't check obj.found
> for the offload section.
>
> Signed-off-by: H.J. Lu 
> ---
>  lto-plugin/lto-plugin.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
> index 61b0de62f52..c564b36eb92 100644
> --- a/lto-plugin/lto-plugin.c
> +++ b/lto-plugin/lto-plugin.c
> @@ -1320,7 +1320,7 @@ claim_file_handler_v2 (const struct 
> ld_plugin_input_file *file,
>if (*can_be_claimed && !obj.offload && offload_files_last_lto == NULL)
>  offload_files_last_lto = offload_files_last;
>
> -  if (obj.offload && known_used && obj.found > 0)
> +  if (obj.offload && known_used)
>  {
>/* Add file to the list.  The order must be exactly the same as the 
> final
>  order after recompilation and linking, otherwise host and target 
> tables
> --
> 2.46.0
>
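
To make the effect of the one-line change concrete, here is a small model of the condition before and after. ObjectState and both function names are hypothetical stand-ins for illustration; they only mirror the three values tested in the hunk (obj.offload, known_used, obj.found) and are not lto-plugin code.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-object state tested in the hunk.
struct ObjectState {
  bool offload;  // the object has an offload section
  int  found;    // number of LTO symbols found in the object
};

// Condition before the patch: offload objects without LTO symbols
// were not added to the offload file list.
bool registers_offload_before(const ObjectState& obj, bool known_used)
{
  return obj.offload && known_used && obj.found > 0;
}

// Condition after the patch: being used by the linker is enough.
bool registers_offload_after(const ObjectState& obj, bool known_used)
{
  return obj.offload && known_used;
}
```

The two conditions differ only for an object with an offload section and zero LTO symbols that the linker is known to use, which matches the description in the patch.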


Re: [patch][v3] libgomp: Add interop types and routines to OpenMP's headers and module

2024-08-23 Thread Tobias Burnus

v3:

Changes:

(A) The 'ret_code' arguments of omp_get_interop_{int,ptr,str} are 
actually 'optional'.


That's something that got lost at some point between OpenMP 5.2 and 
TR13 (I filed OpenMP spec Issue #4165 for it). When adding it, I noticed 
that two '…_async' functions lacked the '= NULL' for C++ that permits 
omitting the argument. For my C and Fortran testcases, I added a test with 
NULL for C and omitted the argument for Fortran. I also changed the C 
code such that it also compiles with C++ and added a check that the 
omitted argument is handled correctly.
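
For illustration, the usual way to get an argument that is omittable in C++ but still a plain parameter in C is a macro that expands to a default argument only under __cplusplus. The sketch below shows that pattern with made-up names (DEMO_DEFAULT_NULL, demo_rc, demo_get_int); it is an assumption about the general shape and not the actual omp.h.in text.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro: expands to a default argument in C++ and to
// nothing in C, so the same prototype serves both languages.
#ifdef __cplusplus
# define DEMO_DEFAULT_NULL = NULL
#else
# define DEMO_DEFAULT_NULL
#endif

enum demo_rc { demo_rc_success = 0, demo_rc_empty = -1 };

// Hypothetical routine: returns the value and reports a status only
// when the caller passed somewhere to store it.
inline long demo_get_int(const long* source, int* ret_code DEMO_DEFAULT_NULL)
{
  const int rc = source ? demo_rc_success : demo_rc_empty;
  if (ret_code)
    *ret_code = rc;  // the optional argument was supplied
  return source ? *source : 0;
}
```

In C++ a caller can then write demo_get_int(&value) or demo_get_int(&value, &rc); a C caller has to pass NULL explicitly.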


(B) Fixed a few libgomp/target.c issues that sneaked in from the WIP 
libgomp plugin patch posted at 
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661207.html (among 
other things, it also contained some spurious spaces).


Built and regtested on x86-64-gnu-linux (w/o offloading configured).

Any additional comments, suggestions, remarks?

Andre Vehreschild wrote:
[…]
First, Thanks for your comments. However, regarding:

+omp_intptr_t

Do I get this correct, that omp_intptr_t is a pointer to an integer?


No, 'intptr_t' is a (signed) integer type which has (at least) the 
size of a pointer; in Fortran, that's 'integer(c_intptr_t)'. And 
'omp_intptr_t' is just a typedef for 'intptr_t'. [BTW: I don't know why 
'intptr_t' was used and not, e.g., int64_t or just 'int'.]
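
A short sketch of that distinction, using only the standard std::intptr_t (the helper names to_integer and to_pointer are invented for the example): the type holds a pointer's value as an integer, it does not point at an integer.

```cpp
#include <cassert>
#include <cstdint>

// intptr_t is an integer type wide enough to hold a pointer value.
static_assert(sizeof(std::intptr_t) >= sizeof(void*),
              "intptr_t must be able to hold a pointer value");

// Hypothetical helpers: store a pointer as an integer and recover it.
std::intptr_t to_integer(const void* p)
{
  return reinterpret_cast<std::intptr_t>(p);
}

const void* to_pointer(std::intptr_t v)
{
  return reinterpret_cast<const void*>(v);
}
```

Round-tripping a valid object pointer through to_integer and to_pointer yields a pointer that compares equal to the original.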


Tobias

libgomp: Add interop types and routines to OpenMP's headers and module

This commit adds OpenMP 5.1+'s interop enumeration, type and routine
declarations to the C/C++ header file and, new in OpenMP TR13, also to
the Fortran module and omp_lib.h header file.

A stub implementation is provided, but this only becomes really useful
with foreign runtime support in the libgomp GPU plugins and with the
'interop' directive.

libgomp/ChangeLog:

	* fortran.c (omp_get_interop_str_, omp_get_interop_name_,
	omp_get_interop_type_desc_, omp_get_interop_rc_desc_): Add.
	* libgomp.map (GOMP_5.1.3): New; add interop routines.
	* omp.h.in: Add interop typedefs, enum and prototypes.
	(__GOMP_DEFAULT_NULL): Define.
	(omp_target_memcpy_async, omp_target_memcpy_rect_async):
	Use it for the optional depend argument.
	* omp_lib.f90.in: Add parameters and interfaces for interop.
	* omp_lib.h.in: Likewise; move F90 '&' to column 81 for
	-ffree-length-80.
	* target.c (omp_get_num_interop_properties, omp_get_interop_int,
	omp_get_interop_ptr, omp_get_interop_str, omp_get_interop_name,
	omp_get_interop_type_desc, omp_get_interop_rc_desc): Add.
	* config/gcn/target.c (omp_get_num_interop_properties,
	omp_get_interop_int, omp_get_interop_ptr, omp_get_interop_str,
	omp_get_interop_name, omp_get_interop_type_desc,
	omp_get_interop_rc_desc): Add.
	* config/nvptx/target.c (omp_get_num_interop_properties,
	omp_get_interop_int, omp_get_interop_ptr, omp_get_interop_str,
	omp_get_interop_name, omp_get_interop_type_desc,
	omp_get_interop_rc_desc): Add.
	* testsuite/libgomp.c-c++-common/interop-routines-1.c: New test.
	* testsuite/libgomp.c-c++-common/interop-routines-2.c: New test.
	* testsuite/libgomp.fortran/interop-routines-1.F90: New test.
	* testsuite/libgomp.fortran/interop-routines-2.F90: New test.
	* testsuite/libgomp.fortran/interop-routines-3.F: New test.
	* testsuite/libgomp.fortran/interop-routines-4.F: New test.
	* testsuite/libgomp.fortran/interop-routines-5.F: New test.
	* testsuite/libgomp.fortran/interop-routines-6.F: New test.
	* testsuite/libgomp.fortran/interop-routines-7.F90: New test.

 libgomp/config/gcn/target.c| 105 ++
 libgomp/config/nvptx/target.c  | 105 ++
 libgomp/fortran.c  |  41 +++
 libgomp/libgomp.map|  15 +
 libgomp/omp.h.in   |  78 -
 libgomp/omp_lib.f90.in |  99 ++
 libgomp/omp_lib.h.in   | 170 --
 libgomp/target.c   | 110 +++
 .../libgomp.c-c++-common/interop-routines-1.c  | 287 +
 .../libgomp.c-c++-common/interop-routines-2.c  | 354 +
 .../libgomp.fortran/interop-routines-1.F90 | 236 ++
 .../libgomp.fortran/interop-routines-2.F90 |   3 +
 .../testsuite/libgomp.fortran/interop-routines-3.F |   2 +
 .../testsuite/libgomp.fortran/interop-routines-4.F |   4 +
 .../testsuite/libgomp.fortran/interop-routines-5.F |   4 +
 .../testsuite/libgomp.fortran/interop-routines-6.F |   4 +
 .../libgomp.fortran/interop-routines-7.F90 | 290 +
 17 files changed, 1883 insertions(+), 24 deletions(-)

diff --git a/libgomp/config/gcn/target.c b/libgomp/config/gcn/target.c
index 9cafea4e2cc..f7fa6aa6396 100644
--- a/libgomp/config/gcn/target.c
+++ b/libgomp/config/gcn/target.c
@@ -185,3 +185,108 @@ GOMP_target_enter_exit_data (int device, size_t mapnum, void **hostaddrs,
   (void) depend;
   __builtin_unreachable ();
 }
+
+

[patch][v2] libgomp.texi: Document OpenMP's Interoperability Routines

2024-08-23 Thread Tobias Burnus
Minor update, mainly because of the 'optional' changes in v3 of the 
patch https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661313.html


The 'optional' affects the omp_get_interop_{int,ptr,str} but also 
omp_target_memcpy_async, omp_target_memcpy_rect_async got a few words.


Additionally, the returned string of omp_get_interop_type_desc is now 
better described (in GCC it is the C/C++ type decl as string or "N/A" or 
NULL). And a couple of notes about calling the routines from inside a 
non-host target region were added.


Tobias Burnus:

Add documentation for OpenMP's interoperability routines.

This obviously depends on the actual implementation patch, posted at: 
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661035.html 
(albeit I will post a v2 in a moment).


I am sure there will be comments, suggestions and remarks :-)

Tobias

PS: I am not 100% sure whether adding the implementation detail makes 
sense or not.

Tobias

libgomp.texi: Document OpenMP's Interoperability Routines

libgomp/ChangeLog:

	* libgomp.texi (Interoperability Routines): Add.
	(omp_target_memcpy_async, omp_target_memcpy_rect_async):
	Document that depobj_list may be omitted in C++ and Fortran.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index fe25d879788..b36b58b6d10 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -656,7 +656,7 @@ specification in version 5.2.
 * Lock Routines::
 * Timing Routines::
 * Event Routine::
-@c * Interoperability Routines::
+* Interoperability Routines::
 * Memory Management Routines::
 @c * Tool Control Routine::
 * Environment Display Routine::
@@ -2134,8 +2134,9 @@ to the destination device's @var{dst} address shifted by @var{dst_offset}.
 Task dependence is expressed by passing an array of depend objects to
 @var{depobj_list}, where the number of array elements is passed as
 @var{depobj_count}; if the count is zero, the @var{depobj_list} argument is
-ignored.  The routine returns zero if the copying process has successfully
-been started and non-zero otherwise.
+ignored.  In C++ and Fortran, the @var{depobj_list} argument can also be
+omitted in that case.   The routine returns zero if the copying process has
+successfully been started and non-zero otherwise.
 
 Running this routine in a @code{target} region except on the initial device
 is not supported.
@@ -2255,7 +2256,8 @@ respectively.  The offset per dimension to the first element to be copied is
 given by the @var{dst_offset} and @var{src_offset} arguments.  Task dependence
 is expressed by passing an array of depend objects to @var{depobj_list}, where
 the number of array elements is passed as @var{depobj_count}; if the count is
-zero, the @var{depobj_list} argument is ignored.  The routine
+zero, the @var{depobj_list} argument is ignored.  In C++ and Fortran, the
+@var{depobj_list} argument can also be omitted in that case.  The routine
 returns zero on success and non-zero otherwise.
 
 The OpenMP specification only requires that @var{num_dims} up to three is
@@ -2884,21 +2886,315 @@ event handle that has already been fulfilled is also undefined.
 
 
 
-@c @node Interoperability Routines
-@c @section Interoperability Routines
-@c
-@c Routines to obtain properties from an @code{omp_interop_t} object.
-@c They have C linkage and do not throw exceptions.
-@c
-@c @menu
-@c * omp_get_num_interop_properties:: 
-@c * omp_get_interop_int:: 
-@c * omp_get_interop_ptr:: 
-@c * omp_get_interop_str:: 
-@c * omp_get_interop_name:: 
-@c * omp_get_interop_type_desc:: 
-@c * omp_get_interop_rc_desc:: 
-@c @end menu
+@node Interoperability Routines
+@section Interoperability Routines
+
+Routines to obtain properties from an object of OpenMP interop type.
+They have C linkage and do not throw exceptions.
+
+@menu
+* omp_get_num_interop_properties:: Get the number of implementation-specific properties
+* omp_get_interop_int:: Obtain integer-valued interoperability property
+* omp_get_interop_ptr:: Obtain pointer-valued interoperability property
+* omp_get_interop_str:: Obtain string-valued interoperability property
+* omp_get_interop_name:: Obtain the name of an interop_property value as string
+* omp_get_interop_type_desc:: Obtain type and description to an interop_property
+* omp_get_interop_rc_desc:: Obtain error string to an interop_rc error code
+@end menu
+
+
+
+@node omp_get_num_interop_properties
+@subsection @code{omp_get_num_interop_properties} -- Get the number of implementation-specific properties
+@table @asis
+@item @emph{Description}:
+The @code{omp_get_num_interop_properties} function returns the number of
+implementation-defined interoperability properties available for the passed
+@var{interop}, extending the OpenMP-defined properties.  The available OpenMP
+interop_property-type values range from @code{omp_ipr_first} to the value
+returned by @code{omp_get_num_interop_properties} minus one.
+
+No implementation-defined properties are currently defined in GCC.
+
+Implementation remark: In GCC, th

Re: [PATCHv4, expand] Add const0 move checking for CLEAR_BY_PIECES optabs

2024-08-23 Thread Jeff Law




On 8/22/24 9:02 PM, HAO CHEN GUI wrote:

Hi Hongtao,

在 2024/8/23 9:47, Hongtao Liu 写道:

On Thu, Aug 22, 2024 at 4:06 PM HAO CHEN GUI  wrote:


Hi Hongtao,

在 2024/8/21 11:21, Hongtao Liu 写道:

r15-3058-gbb42c551905024 supports a const0 operand for movv16qi; please
rebase your patch and see if the regressions are still there.


There are still regressions. The patch enables V16QI const0 store, but
it also enables V8QI const0 store. The vector mode is preferred over the
scalar mode, so V8QI is used for 8-byte memory clear instead of
DI. That's sub-optimal.

Could we check if mode_size is greater than HOST_BITS_PER_WIDE_INT?

Not sure if all targets prefer it. Richard & Jeff, what's your opinion?

Sorry, I haven't been following.  That doesn't seem like a good test on 
the surface (why would HOST_BITS_PER_WIDE_INT matter here? That's a 
property of the host, not the target).


Additionally, selection of the "optimal" mode may be impossible as 
there's just not going to be enough context.  For a given target there 
may be cases where something like V16QI is good and for the same target 
cases where doing a series of DI accesses would be better.


So we have to pick sensible modes and give the targets ways to turn the 
knobs to hopefully get better code depending on the desired behavior of 
each (sub)target.


So how's that for a non-answer?  :-)




IMHO, could we disable it in the predicate or convert it to a DI mode store
if V8QI const0 store is sub-optimal on i386?

I'd look for ways to allow the x86 port to control behavior.  Presumably 
the problem is the move-by-pieces code is emitting stores directly 
rather than going through an expander?



Jeff


[PATCH 3/9 v3] c++, coroutines: Separate allocator work from the ramp body build.

2024-08-23 Thread Iain Sandoe
>>>http://eel.is/c++draft/dcl.fct.def.coroutine#12 (sentence 2) says " If both 
>>>a usual deallocation function with only a pointer parameter and a usual 
>>>deallocation function with both a pointer parameter and a size parameter are 
>>>found, then the selected deallocation function shall be the one with two 
>>>parameters.”
>>>however, if my promise provides both - the one with a single param is always 
>>>chosen.
>>>It is not that the other overload is invalid - if I do not include the 
>>>single param version, the two param one is happily selected.

>>Ah, that's backwards from https://eel.is/c++draft/expr.delete#9.4 "If the 
>>deallocation functions belong to a class scope, the one without a parameter 
>>of type std​::​size_t is selected."
>>
>>This is implemented as
>>
>>  /* -- If the deallocation functions have class scope, the one 
>> without a parameter of type std::size_t is selected.  */
>>  bool want_size;
>>  if (DECL_CLASS_SCOPE_P (fn))
>>want_size = false;
>>
>>I guess we need some way for build_op_delete_call to know that we want the 
>>other preference in this case.

>Adding a defaulted param to the existing call seems to be messy since it would 
>interrupt the complain being last parm theme..

>… I suppose we could add an overload with an additional bool specifying 
>priority to the two argument case?

>if that seems reasonable, I can take that on - as part of this patch (or 
>separately).
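
For anyone following along, the [expr.delete] behaviour quoted above can be seen with an ordinary delete-expression. This is only a sketch of the existing rule for class-scope deallocation functions; it does not involve coroutines, and the names Frame, destroy_one and the two counters are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Counters recording which class-scope operator delete was called.
int one_arg_calls = 0;
int two_arg_calls = 0;

// Hypothetical class providing both usual deallocation functions.
struct Frame {
  int payload = 0;

  static void operator delete(void* p) noexcept
  {
    ++one_arg_calls;
    ::operator delete(p);
  }

  static void operator delete(void* p, std::size_t) noexcept
  {
    ++two_arg_calls;
    ::operator delete(p);
  }
};

// A plain delete-expression: for class-scope deallocation functions the
// overload without the std::size_t parameter is the one selected.
void destroy_one()
{
  delete new Frame;
}
```

After one call to destroy_one, only one_arg_calls has been incremented. The coroutine frame wording asks for the opposite preference, which is why build_op_delete_call needs a way to be told.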

That patch might look as below.

I have addressed the other comments here;

If this does not look like a good direction, then perhaps we could proceed
with the original version and address improvements as a follow-on along with
the other changes we need to make to support over-aligned frame objects?

thanks,
Iain

--- 8< ---

This splits out the building of the allocation and deallocation expressions
and runs them early in the ramp build, so that we can exit if they are not
usable, before we start building the ramp body.

Likewise move checks for other required resources to the beginning of the
ramp builder.

This is preparation for work needed to update the allocation/destruction
in cases where we have excess alignment of the promise or other saved frame
state.

gcc/cp/ChangeLog:

* call.cc (build_op_delete_call_1): Renamed and added a param
to allow the caller to prioritize two argument usual deleters.
(build_op_delete_call): Add an overload to expose the option
to prioritize two argument deleters.
* coroutines.cc (coro_get_frame_dtor): Rename...
(build_coroutine_frame_delete_expr):... to this; simplify to
use build_op_delete_call for all cases.
(build_actor_fn): Use revised frame delete function.
(build_coroutine_frame_alloc_expr): New.
(cp_coroutine_transform::complete_ramp_function): Rename...
(cp_coroutine_transform::build_ramp_function): ... to this.
Reorder code to carry out checks for prerequisites before the
codegen. Split out the allocation/delete code.
(cp_coroutine_transform::apply_transforms): Use revised name.
* coroutines.h: Rename function.
* cp-tree.h (build_op_delete_call): Add an overload for the version
that allows two operand deleters.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/coro-bad-alloc-01-bad-op-del.C: Use revised
diagnostics.
* g++.dg/coroutines/coro-bad-gro-00-class-gro-scalar-return.C:
Likewise.
* g++.dg/coroutines/coro-bad-gro-01-void-gro-non-class-coro.C:
Likewise.
* g++.dg/coroutines/coro-bad-grooaf-00-static.C: Likewise.
* g++.dg/coroutines/ramp-return-b.C: Likewise.

Signed-off-by: Iain Sandoe 
---
 gcc/cp/call.cc|  36 +-
 gcc/cp/coroutines.cc  | 462 ++
 gcc/cp/coroutines.h   |   2 +-
 gcc/cp/cp-tree.h  |   3 +
 .../coroutines/coro-bad-alloc-01-bad-op-del.C |   2 +-
 .../coro-bad-gro-00-class-gro-scalar-return.C |   4 +-
 .../coro-bad-gro-01-void-gro-non-class-coro.C |   4 +-
 .../coroutines/coro-bad-grooaf-00-static.C|   6 +-
 .../g++.dg/coroutines/ramp-return-b.C |   8 +-
 9 files changed, 292 insertions(+), 235 deletions(-)

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 0fe679aae9f..377882a1c61 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -7851,6 +7851,8 @@ usual_deallocation_fn_p (tree fn)
SIZE is the size of the memory block to be deleted.
GLOBAL_P is true if the delete-expression should not consider
class-specific delete operators.
+   TWO_ARGS_PRIORITY_P is true if a two argument usual deallocation should be
+   chosen in preference to the single argument version in a class context.
PLACEMENT is the corresponding placement new call, or NULL_TREE.
 
If this call to "operator delete" is being generated as part to
@@ -7859,10 +7861,10 @@ usual_deallocation_fn_p

RE: [PATCH] lto: Don't check obj.found for offload section

2024-08-23 Thread Prathamesh Kulkarni
> -Original Message-
> From: H.J. Lu 
> Sent: Friday, August 23, 2024 6:07 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Prathamesh Kulkarni ;
> richard.guent...@gmail.com
> Subject: [PATCH] lto: Don't check obj.found for offload section
> 
> 
> obj.found is the number of LTO symbols.  We should include the offload
> section when it is used by linker even if there are no LTO symbols.
> 
> PR lto/116361
> * lto-plugin.c (claim_file_handler_v2): Don't check obj.found
> for the offload section.
Hi,
I applied your patch locally, and can confirm this fixes the issue with 
offloading, thanks!

Thanks,
Prathamesh
> 
> Signed-off-by: H.J. Lu 
> ---
>  lto-plugin/lto-plugin.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c index
> 61b0de62f52..c564b36eb92 100644
> --- a/lto-plugin/lto-plugin.c
> +++ b/lto-plugin/lto-plugin.c
> @@ -1320,7 +1320,7 @@ claim_file_handler_v2 (const struct
> ld_plugin_input_file *file,
>if (*can_be_claimed && !obj.offload && offload_files_last_lto ==
> NULL)
>  offload_files_last_lto = offload_files_last;
> 
> -  if (obj.offload && known_used && obj.found > 0)
> +  if (obj.offload && known_used)
>  {
>/* Add file to the list.  The order must be exactly the same as
> the final
>  order after recompilation and linking, otherwise host and
> target tables
> --
> 2.46.0



Re: [PATCH v2] tree-optimization/116024 - match.pd: add 4 int-compare simplifications

2024-08-23 Thread Artemiy Volkov
On 8/22/2024 3:15 PM, Richard Biener wrote:
 > On Wed, 21 Aug 2024, Artemiy Volkov wrote:
 >
 >> Hi,
 >>
 >> sending a v2 of
 >> https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659851.html after
 >> changing variable types in all new testcases from standard to
 >> fixed-width.
 >>
 >> Could anyone please assist with reviewing and/or pushing to trunk/14
 >> since I don't have commit access?
 >>
 >> Many thanks,
 >> Artemiy
 >>
 >> -- 8< 
 >>
 >> This patch implements match.pd patterns for the following
 >> transformations:
 >>
 >> (1) (UB-on-overflow types) C1 - X cmp C2 -> X cmp C1 - C2
 >>
 >> (2) (unsigned types) C1 - X cmp C2 ->
 >>  (a) X cmp C1 - C2, when cmp is !=, ==
 >>  (b) X - (C1 - C2) cmp C2, when cmp is <=, >
 >>  (c) X - (C1 - C2 + 1) cmp C2, when cmp is <, >=,
 >>
 >> (3) (signed wrapping types) C1 - X cmp C2 ->
 >>  (a) X cmp C1 - C2, when cmp is !=, ==
 >>  (b) X - (C1 + 1) rcmp -(C2 + 1), otherwise
 >>
 >> (4) (all wrapping types) X + C1 cmp C2 ->
 >>  (a) X cmp C2 - C1, when cmp is !=, ==
 >>  (b) X cmp -C1, when cmp is <=, > and C2 - C1 == max
 >>  (c) X cmp -C1, when cmp is <, >= and C2 - C1 == min
 >
 > It would be easier to review if separating those, it wasn't all
 > clear what of (1) .. (4) corresponds to which hunk below.

OK, I will split this patch into 4 in v3.

 >
 >> Included along are testcases for all the aforementioned changes.  This
 >> patch has been bootstrapped and regtested on aarch64, x86_64, and i386,
 >> and additionally regtested on riscv32.  Existing tests were adjusted
 >> where necessary.
 >>
 >> gcc/ChangeLog:
 >>
 >> PR tree-optimization/116024
 >>  * match.pd: New transformations around integer comparison.
 >>
 >> gcc/testsuite/ChangeLog:
 >>
 >>  * gcc.dg/tree-ssa/pr116024.c: New test.
 >>  * gcc.dg/tree-ssa/pr116024-1.c: Ditto.
 >>  * gcc.dg/tree-ssa/pr116024-1-fwrapv.c: Ditto.
 >>  * gcc.dg/tree-ssa/pr116024-2.c: Ditto.
 >>  * gcc.dg/tree-ssa/pr116024-2-fwrapv.c: Ditto.
 >>  * gcc.dg/pr67089-6.c: Adjust.
 >>  * gcc.target/aarch64/gtu_to_ltu_cmp_1.c: Ditto.
 >>
 >> Signed-off-by: Artemiy Volkov 
 >> ---
 >>   gcc/match.pd  | 75 ++-
 >>   gcc/testsuite/gcc.dg/pr67089-6.c  |  4 +-
 >>   .../gcc.dg/tree-ssa/pr116024-1-fwrapv.c   | 74 ++
 >>   gcc/testsuite/gcc.dg/tree-ssa/pr116024-1.c| 74 ++
 >>   .../gcc.dg/tree-ssa/pr116024-2-fwrapv.c   | 38 ++
 >>   gcc/testsuite/gcc.dg/tree-ssa/pr116024-2.c| 39 ++
 >>   gcc/testsuite/gcc.dg/tree-ssa/pr116024.c  | 74 ++
 >>   .../gcc.target/aarch64/gtu_to_ltu_cmp_1.c |  2 +-
 >>   8 files changed, 376 insertions(+), 4 deletions(-)
 >>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-1-fwrapv.c
 >>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-1.c
 >>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-2-fwrapv.c
 >>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-2.c
 >>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024.c
 >>
 >> diff --git a/gcc/match.pd b/gcc/match.pd
 >> index 65a3aae2243..bf3ccef7437 100644
 >> --- a/gcc/match.pd
 >> +++ b/gcc/match.pd
 >> @@ -8800,6 +8800,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 >>  (cmp @0 { TREE_OVERFLOW (res)
 >>  ? drop_tree_overflow (res) : res; }
 >>   (for cmp (lt le gt ge)
 >> + rcmp (gt ge lt le)
 >>(for op (plus minus)
 >> rop (minus plus)
 >> (simplify
 >> @@ -8827,7 +8828,79 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 >>   "X cmp C2 -+ C1"),
 >>  WARN_STRICT_OVERFLOW_COMPARISON);
 >> }
 >> -   (cmp @0 { res; })
 >> +   (cmp @0 { res; })
 >> +/* For wrapping types, simplify X + C1 CMP C2 to X CMP -C1 when possible.  */
 >
 > it's more like X +- C1 CMP C2 to X CMP' C2 -+ C1 when C2 -+ C1 == +-INF?
 >
 > I have to think hard whether this requires an unsigned type
 > (naturally TYPE_OVERFLOW_WRAPS) or if it also works for signed
 > types (with -fwrapv).
 >
 > Can you explain?

If CMP is <= and C2 -+ C1 == +INF, we can add -+C1 to both sides of the 
comparison (and use the fact that -INF <= X+-C1) to obtain (-INF-+C1 <=) 
X <= C2-+C1. The right part always holds, so we are left with -INF-+C1 
<= X. (> is the negation of this, so -INF-+C1 > X).

Similarly, for >= (and <), using the fact that X+-C1 <= +INF, we can 
eliminate C2 and simplify to X <= +INF-+C1.
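
As a sanity check of the two identities above, here is a small brute-force
sketch (not part of the patch).  It assumes an 8-bit unsigned type as a
stand-in for any wrapping type, so -INF is 0 and +INF is 255, and it only
covers the "X + C1" form; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Case "<=" with C2 - C1 == max:  X + C1 <= C2  is equivalent to  X >= min - C1.
// Returns the number of (C1, X) pairs for which the two forms disagree.
int mismatches_le_max ()
{
  int bad = 0;
  for (unsigned c1 = 0; c1 < 256; ++c1)
    {
      const std::uint8_t c2 = static_cast<std::uint8_t> (c1 + 0xffu); // C2 - C1 == max
      const std::uint8_t bound = static_cast<std::uint8_t> (0u - c1); // min - C1, wrapped
      for (unsigned x = 0; x < 256; ++x)
        {
          const std::uint8_t sum = static_cast<std::uint8_t> (x + c1); // wrapping X + C1
          bad += ((sum <= c2) != (x >= bound));
        }
    }
  return bad;
}

// Case ">=" with C2 - C1 == min:  X + C1 >= C2  is equivalent to  X <= max - C1.
int mismatches_ge_min ()
{
  int bad = 0;
  for (unsigned c1 = 0; c1 < 256; ++c1)
    {
      const std::uint8_t c2 = static_cast<std::uint8_t> (c1);          // C2 - C1 == min
      const std::uint8_t bound = static_cast<std::uint8_t> (0xffu - c1); // max - C1
      for (unsigned x = 0; x < 256; ++x)
        {
          const std::uint8_t sum = static_cast<std::uint8_t> (x + c1); // wrapping X + C1
          bad += ((sum >= c2) != (x <= bound));
        }
    }
  return bad;
}

// Both counts are expected to be zero if the identities hold.
void check_wrapping_identities ()
{
  assert (mismatches_le_max () == 0);
  assert (mismatches_ge_min () == 0);
}
```

The > and < cases follow by negating both sides, so they are not checked
separately here.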

 >
 >> +   (if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
 >> + (with
 >> +   {
 >> +   wide_int max = wi::max_value (TREE_TYPE (@0));
 >> +   wide_int min = wi::min_value (TREE_TYPE (@0));
 >> +
 >> +   wide_int c2 = rop == PLUS_EXPR
 >> + ? wi::add (wi::to_wide (@2), wi::to_wide (@1))
 >> + : wi::sub (wi::to_wide (@2), wi::to_wide (@1));
 

[PATCH 4/9 v2] c++, coroutines: Fix handling of early exceptions [PR113773].

2024-08-23 Thread Iain Sandoe
Hi Jason,

>>+  tree iarc_m = lookup_member (frame_type, coro_frame_i_a_r_c_id,
>>+1, 0, tf_warning_or_error);
>>+  tree iarc_x = build_class_member_access_expr (deref_fp, iarc_m, NULL_TREE,
>>+false, tf_warning_or_error);

>Do you need to call lookup_member directly?  I'd think you could just pass the 
>identifier to build_class_member_access_expr.  And in several other places in 
>coroutine.cc, as well.

AFAICT, I do... the doc for build_class_member_access_expr () says:
 "MEMBER is a DECL or baselink.".  We could extend the API to allow a member
identifier (which would be nice).  Maybe I'm missing something else?

>> /* We always expect to delete the frame.  */

>This seems to no longer be true after this patch?
Updated the comments.

>>+  fnf_if = do_poplevel (fnf_if_scope);
>>+  add_stmt (fnf_if);

>Why not finish_if_stmt instead of these 4 lines?
Actually, I have no recollection of why they are like that, some fallout of
the evolution of the code...

Amended to use finish_if_stmt() instead.
retested on x86_64-darwin, OK for trunk?
thanks
Iain

--- 8< ---

The responsibility for destroying part of the frame content (promise,
arg copies and the frame itself) transitions from the ramp to the
body of the coroutine once we reach the await_resume () for the
initial suspend.

We added the variable that flags the transition, but failed to act on
it.  This corrects that so that the ramp only tries to run DTORs for
objects when an exception occurs before the initial suspend await
resume has started.
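
The hand-over described above can be pictured with a plain C++ analogy.
This is only a sketch under stated assumptions: Frame, body and ramp are
hypothetical stand-ins, not the trees that build_ramp_function emits, and
the sketch swallows the exception to stay short.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the coroutine frame; not GCC's representation.
struct Frame
{
  bool initial_await_resume_called = false; // flags the hand-over
  std::vector<std::string> log;             // records who cleaned up
};

// Stand-in for the coroutine body.  Once the flag is set, the body is
// responsible for cleanup, even if it lets the exception escape.
void body (Frame &f, bool throw_in_body)
{
  f.initial_await_resume_called = true;
  if (throw_in_body)
    {
      f.log.push_back ("body cleanup");
      throw std::runtime_error ("thrown after the hand-over");
    }
}

// Stand-in for the ramp.  It cleans up only if the exception arrived
// before the hand-over; otherwise the same objects would be destroyed twice.
void ramp (Frame &f, bool throw_early, bool throw_in_body)
{
  try
    {
      if (throw_early)
        throw std::runtime_error ("thrown before the hand-over");
      body (f, throw_in_body);
    }
  catch (const std::runtime_error &)
    {
      if (!f.initial_await_resume_called)
        f.log.push_back ("ramp cleanup");
    }
}

// Exercise the three paths: early throw, late throw, no throw.
void demo ()
{
  Frame early, late, clean;
  ramp (early, true, false);
  ramp (late, false, true);
  ramp (clean, false, false);
  assert (early.log.size () == 1 && early.log[0] == "ramp cleanup");
  assert (late.log.size () == 1 && late.log[0] == "body cleanup");
  assert (clean.log.empty ());
}
```

In each path exactly one party, or neither, performs the cleanup, which is
the property the flag is there to guarantee.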

PR c++/113773

gcc/cp/ChangeLog:

* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Only clean up the
frame state on exceptions that occur before the initial await
resume has begun.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/torture/pr113773.C: New test.

Signed-off-by: Iain Sandoe 
---
 gcc/cp/coroutines.cc  | 39 +++
 .../g++.dg/coroutines/torture/pr113773.C  | 66 +++
 2 files changed, 92 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/torture/pr113773.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 8cdc76a6054..2f128db42e1 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -5129,12 +5129,22 @@ cp_coroutine_transform::build_ramp_function ()
  finish_if_stmt_cond (coro_gro_live, gro_d_if);
  finish_expr_stmt (gro_ret_dtor);
  finish_then_clause (gro_d_if);
- tree gro_d_if_scope = IF_SCOPE (gro_d_if);
- IF_SCOPE (gro_d_if) = NULL;
- gro_d_if = do_poplevel (gro_d_if_scope);
- add_stmt (gro_d_if);
+ finish_if_stmt (gro_d_if);
}
 
+  /* Before initial resume is called, the responsibility for cleanup on
+exception falls to the ramp.  After that, the coroutine body code
+should do the cleanup.  */
+  tree iarc_m = lookup_member (frame_type, coro_frame_i_a_r_c_id,
+  1, 0, tf_warning_or_error);
+  tree iarc_x
+   = build_class_member_access_expr (deref_fp, iarc_m, NULL_TREE,
+ /*preserve_reference*/false,
+ tf_warning_or_error);
+  tree not_iarc
+   = build1_loc (loc, TRUTH_NOT_EXPR, boolean_type_node, iarc_x);
+  tree cleanup_if = begin_if_stmt ();
+  finish_if_stmt_cond (not_iarc, cleanup_if);
   /* If the promise is live, then run its dtor if that's available.  */
   if (promise_dtor && promise_dtor != error_mark_node)
{
@@ -5142,10 +5152,7 @@ cp_coroutine_transform::build_ramp_function ()
  finish_if_stmt_cond (coro_promise_live, promise_d_if);
  finish_expr_stmt (promise_dtor);
  finish_then_clause (promise_d_if);
- tree promise_d_if_scope = IF_SCOPE (promise_d_if);
- IF_SCOPE (promise_d_if) = NULL;
- promise_d_if = do_poplevel (promise_d_if_scope);
- add_stmt (promise_d_if);
+ finish_if_stmt (promise_d_if);
}
 
   /* Clean up any frame copies of parms with non-trivial dtors.
@@ -5169,15 +5176,21 @@ cp_coroutine_transform::build_ramp_function ()
  finish_if_stmt_cond (parm_i->guard_var, dtor_if);
  finish_expr_stmt (parm_i->fr_copy_dtor);
  finish_then_clause (dtor_if);
- tree parm_d_if_scope = IF_SCOPE (dtor_if);
- IF_SCOPE (dtor_if) = NULL;
- dtor_if = do_poplevel (parm_d_if_scope);
- add_stmt (dtor_if);
+ finish_if_stmt (dtor_if);
}
}
 
-  /* We always expect to delete the frame.  */
+  /* Now delete the frame if required.  */
+  tree fnf_if = begin_if_stmt ();
+  finish_if_stmt_cond (fnf_x, fnf_if);
   finish_expr_stmt (delete_frame_call);
+  finish_then_clause (fnf_if);
+  

Re: [PATCH] late-combine: Preserve INSN_CODE when modifying notes [PR116343]

2024-08-23 Thread Jeff Law

On 8/23/24 6:02 AM, Georg-Johann Lay wrote:
> Hi, this fails on machines that don't support scheduling:
>
> cc1: warning: instruction scheduling not supported on this target machine
>
> FAIL: gcc.dg/torture/pr116343.c   -O0  (test for excess errors)
> Excess errors:
> cc1: warning: instruction scheduling not supported on this target machine

Two paths make sense to me.

First we could add a -w to the flags in the relevant testcases to 
suppress the warning.


Second we could just eliminate the warning completely.  The warning may 
have made sense in the run-up to gcc-2 when we added the instruction 
scheduler.  But we're 30 years past that point.


I'd support either approach.

jeff


RE: [RFC][PATCH] AArch64: Remove AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS

2024-08-23 Thread Tamar Christina
Hi Jennifer,

> -Original Message-
> From: Jennifer Schmitz 
> Sent: Friday, August 23, 2024 1:07 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford ; Kyrylo Tkachov
> 
> Subject: [RFC][PATCH] AArch64: Remove
> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> 
> This patch removes the AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> tunable and
> use_new_vector_costs entry in aarch64-tuning-flags.def and makes the
> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS paths in the backend the
> default.
> To that end, the function aarch64_use_new_vector_costs_p and its uses were
> removed. Additionally, guards were added prevent nullpointer dereferences of
> fields in cpu_vector_cost.
> 

I'm not against this change, but it does mean that we now switch old Adv. SIMD
cost models as well to the new throughput based cost models.  That means that
-mcpu=generic now behaves differently, as does -mcpu=neoverse-n1, and I think
some distros explicitly use these (I believe Yocto, for instance, does).

Have we validated that the old generic cost model still behaves sensibly with 
this change?

> The patch was bootstrapped and regtested on aarch64-linux-gnu:
> No problems bootstrapping, but several test files (in aarch64-sve.exp:
> gather_load_extend_X.c
> where X is 1 to 4, strided_load_2.c, strided_store_2.c) fail because of small
> differences
> in codegen that make some of the scan-assembler-times tests fail.
> 
> Kyrill suggested to add a -fvect-cost-model=unlimited flag to these tests and add
> some

I don't personally like unlimited here as unlimited means just vectorize at any
cost.  This means that costing between modes is also disabled. A lot of these
testcases are intended to test not just that we vectorize but that we vectorize
with efficient code.

I'd prefer to use -fvect-cost-model=dynamic if that fixes the testcases.

Thanks,
Tamar

> logic to aarch64_vector_costs::add_stmt_cost to disable the changes in vector
> instructions
> when flag_vect_cost_model == VECT_COST_MODEL_UNLIMITED. If you agree with
> that
> suggestion, I propose prepending the current patch by one that implements this
> logic and adding
> -fvect-cost-model=unlimited to the failing tests. Please advise.
> 
> Signed-off-by: Jennifer Schmitz 
> 
> gcc/
>   * config/aarch64/aarch64-tuning-flags.def: Remove
>   use_new_vector_costs as tuning option.
>   * config/aarch64/aarch64.cc (aarch64_use_new_vector_costs_p):
>   Remove.
>   (aarch64_in_loop_reduction_latency): Add nullpointer dereference
>   guard.
>   (aarch64_detect_vector_stmt_subtype): Likewise.
>   (aarch64_vector_costs::add_stmt_cost): Remove use of
>   aarch64_use_new_vector_costs_p and add nullpointer dereference
>   guards.
>   (aarch64_vector_costs::finish_cost): Remove use of
>   aarch64_use_new_vector_costs_p.
>   * config/aarch64/tuning_models/cortexx925.h: Remove
>   AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS.
>   * config/aarch64/tuning_models/generic_armv8_a.h: Likewise.
>   * config/aarch64/tuning_models/generic_armv9_a.h: Likewise.
>   * config/aarch64/tuning_models/neoverse512tvb.h: Likewise.
>   * config/aarch64/tuning_models/neoversen2.h: Likewise.
>   * config/aarch64/tuning_models/neoversen3.h: Likewise.
>   * config/aarch64/tuning_models/neoversev1.h: Likewise.
>   * config/aarch64/tuning_models/neoversev2.h: Likewise.
>   * config/aarch64/tuning_models/neoversev3.h: Likewise.


Re: [PATCH] late-combine: Preserve INSN_CODE when modifying notes [PR116343]

2024-08-23 Thread Richard Biener

> On 23.08.2024 at 16:49, Jeff Law wrote:
> 
> 
> 
>> On 8/23/24 6:02 AM, Georg-Johann Lay wrote:
>> 
>> Hi, this fails on machines that don't support scheduling:
>> cc1: warning: instruction scheduling not supported on this target machine
>> FAIL: gcc.dg/torture/pr116343.c   -O0  (test for excess errors)
>> Excess errors:
>> cc1: warning: instruction scheduling not supported on this target machine
> Two paths make sense to me.
> 
> First we could add a -w to the flags in the relevant testcases to suppress 
> the warning.
> 
> Second we could just eliminate the warning completely.  The warning may have 
> made sense in the run-up to gcc-2 when we added the instruction scheduler.  
> But we're 30 years past that point.
> 
> I'd support either approach.

I think there’s an effective target for insn scheduling 

> jeff


Re: [PATCH] late-combine: Preserve INSN_CODE when modifying notes [PR116343]

2024-08-23 Thread Jeff Law

On 8/23/24 9:45 AM, Richard Biener wrote:
>> On 23.08.2024 at 16:49, Jeff Law wrote:
>>
>>> On 8/23/24 6:02 AM, Georg-Johann Lay wrote:
>>>
>>> Hi, this fails on machines that don't support scheduling:
>>> cc1: warning: instruction scheduling not supported on this target machine
>>> FAIL: gcc.dg/torture/pr116343.c   -O0  (test for excess errors)
>>> Excess errors:
>>> cc1: warning: instruction scheduling not supported on this target machine
>>
>> Two paths make sense to me.
>>
>> First we could add a -w to the flags in the relevant testcases to suppress the
>> warning.
>>
>> Second we could just eliminate the warning completely.  The warning may have
>> made sense in the run-up to gcc-2 when we added the instruction scheduler.  But
>> we're 30 years past that point.
>>
>> I'd support either approach.
>
> I think there’s an effective target for insn scheduling

If so, that's fine by me as well.

jeff


[PATCH] MIPS: Include missing mips16.S in libgcc/lib1funcs.S

2024-08-23 Thread YunQiang Su
mips16.S has been missing since
commit 29b74545531f6afbee9fc38c267524326dbfbedf
Date:   Thu Jun 1 10:14:24 2023 +0800

MIPS: Add speculation_barrier support

Without mips16.S included, some symbols are missing for mips16, and
so some software fails to build.

libgcc/ChangeLog:

* config/mips/lib1funcs.S: Include mips16.S.
---
 libgcc/config/mips/lib1funcs.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgcc/config/mips/lib1funcs.S b/libgcc/config/mips/lib1funcs.S
index fa8114b37d9..324a84e7846 100644
--- a/libgcc/config/mips/lib1funcs.S
+++ b/libgcc/config/mips/lib1funcs.S
@@ -19,7 +19,7 @@ a copy of the GCC Runtime Library Exception along with this program;
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 .  */
 
-//#include "mips16.S"
+#include "mips16.S"
 
 #ifdef L_speculation_barrier
 
-- 
2.39.3 (Apple Git-146)



Re: [PATCH] late-combine: Preserve INSN_CODE when modifying notes [PR116343]

2024-08-23 Thread Georg-Johann Lay

On 23.08.24 at 17:45, Richard Biener wrote:
>> On 23.08.2024 at 16:49, Jeff Law wrote:
>>
>>> On 8/23/24 6:02 AM, Georg-Johann Lay wrote:
>>>
>>> Hi, this fails on machines that don't support scheduling:
>>> cc1: warning: instruction scheduling not supported on this target machine
>>> FAIL: gcc.dg/torture/pr116343.c   -O0  (test for excess errors)
>>> Excess errors:
>>> cc1: warning: instruction scheduling not supported on this target machine
>>
>> Two paths make sense to me.
>>
>> First we could add a -w to the flags in the relevant testcases to suppress the
>> warning.
>>
>> Second we could just eliminate the warning completely.  The warning may have
>> made sense in the run-up to gcc-2 when we added the instruction scheduler.  But
>> we're 30 years past that point.
>>
>> I'd support either approach.
>
> I think there’s an effective target for insn scheduling


Ya,


/* { dg-require-effective-target scheduling } */

Johann


[PATCH] c++: Fix overeager Woverloaded-virtual with conversion operators [PR109918]

2024-08-23 Thread Simon Martin
We currently emit an incorrect -Woverloaded-virtual warning for the following
test case:

=== cut here ===
struct A {
  virtual operator int() { return 42; }
  virtual operator char() = 0;
};
struct B : public A {
  operator char() { return 'A'; }
};
=== cut here ===

The problem is that warn_hidden relies on get_basefndecls to find the methods
in A possibly hidden by B's operator char(), and gets both the conversion
operators to int and to char.  It eventually wrongly concludes that the
conversion to int is hidden.

This patch fixes this by filtering out conversion operators to different types
from the list returned by get_basefndecls.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/109918

gcc/cp/ChangeLog:

* class.cc (warn_hidden): Filter out conversion operators to different
types.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Woverloaded-virt5.C: New test.

---
 gcc/cp/class.cc   | 33 ---
 gcc/testsuite/g++.dg/warn/Woverloaded-virt5.C | 12 +++
 2 files changed, 34 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Woverloaded-virt5.C

diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
index fb6c3370950..a8178a31fe8 100644
--- a/gcc/cp/class.cc
+++ b/gcc/cp/class.cc
@@ -3267,18 +3267,29 @@ warn_hidden (tree t)
if (TREE_CODE (fndecl) == FUNCTION_DECL
&& DECL_VINDEX (fndecl))
  {
-   /* If the method from the base class has the same
-  signature as the method from the derived class, it
-  has been overridden.  Note that we can't move on
-  after finding one match: fndecl might override
-  multiple base fns.  */
for (size_t k = 0; k < base_fndecls.length (); k++)
- if (base_fndecls[k]
- && same_signature_p (fndecl, base_fndecls[k]))
-   {
- base_fndecls[k] = NULL_TREE;
- any_override = true;
-   }
+ {
+   if (!base_fndecls[k])
+ continue;
+   /* If FNS is a conversion operator, base_fndecls contains
+  all conversion operators from base classes; we need to
+  remove those converting to a different type.  */
+   if (IDENTIFIER_CONV_OP_P (name)
+   && !same_type_p (DECL_CONV_FN_TYPE (fndecl),
+DECL_CONV_FN_TYPE (base_fndecls[k])))
+ {
+   base_fndecls[k] = NULL_TREE;
+ }
+   /* If the method from the base class has the same signature
+  as the method from the derived class, it has been
+  overridden.  Note that we can't move on after finding
+  one match: fndecl might override multiple base fns.  */
+   else if (same_signature_p (fndecl, base_fndecls[k]))
+ {
+   base_fndecls[k] = NULL_TREE;
+   any_override = true;
+ }
+ }
  }
if (!any_override)
  seen_non_override = true;
diff --git a/gcc/testsuite/g++.dg/warn/Woverloaded-virt5.C b/gcc/testsuite/g++.dg/warn/Woverloaded-virt5.C
new file mode 100644
index 000..ea26569e565
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Woverloaded-virt5.C
@@ -0,0 +1,12 @@
+// PR c++/109918
+// { dg-do compile }
+// { dg-additional-options -Woverloaded-virtual }
+
+struct A {
+  virtual operator int() { return 42; }
+  virtual operator char() = 0;
+};
+
+struct B : public A {
+  operator char() { return 'A'; }
+};
-- 
2.44.0




Re: [PATCH] rs6000: Fix PTImode handling in power8 swap optimization pass [PR116415]

2024-08-23 Thread Peter Bergner
On 8/22/24 8:48 PM, Peter Bergner wrote:
> On 8/22/24 4:39 AM, Kewen.Lin wrote:
>> OK for trunk and all active release branches with/without these nits tweaked,
>> but please give others two days or so to comment, thanks!
> 
> I'll make the suggested changes and push them to trunk when my new set of
> regtests are clean.  I'll let it bake there and then push to the release
> later.   Thanks!

The updated patch tested clean as expected, so I pushed it to trunk.
I'll let it sit there for a bit to let our CI testers verify it doesn't
expose any issues on our various different builds before pushing to
the release branches.  Thanks!

Peter




Re: [PATCH] ifcvt: Disallow emitting call instructions in noce_convert_multiple_sets [PR116358]

2024-08-23 Thread Philipp Tomsich
Applied to master, thanks!
--Philipp.


On Thu, 22 Aug 2024 at 20:30, Jeff Law  wrote:
>
>
>
> On 8/22/24 5:04 AM, Manolis Tsamis wrote:
> > Similar to not allowing jump instructions in the generated code, we also
> > shouldn't allow call instructions in noce_convert_multiple_sets.
> > In the case of PR116358 a libcall was generated from force_operand.
> >
> >   PR middle-end/116358
> >
> > gcc/ChangeLog:
> >
> >   * ifcvt.cc (noce_convert_multiple_sets): Disallow call insns.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/aarch64/pr116358.c: New test.
> OK
> jeff
>


Re: [PATCH] ifcvt: Do not overwrite results in noce_convert_multiple_sets [PR116372, PR116405]

2024-08-23 Thread Philipp Tomsich
As reported on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116372,
this change restores bootstrap.

Committing as obvious.
--Philipp.

On Tue, 20 Aug 2024 at 21:57, Manolis Tsamis  wrote:
>
> Now that more operations are allowed for noce_convert_multiple_sets, it is
> possible that the same register appears multiple times as target in a
> basic block.  After noce_convert_multiple_sets_1 is called we potentially
> also emit register moves from temporaries back to the original targets.
> In some cases where the target registers overlap with the block's condition,
> these register moves may overwrite intermediate variables because they're
> emitted after the if-converted code.  To address this issue we now iterate
> backwards and keep track of seen registers when emitting these final register
> moves.
>
> Fix-up for the recent ifcvt commit 72c9b5f4
>
> PR rtl-optimization/116372
> PR rtl-optimization/116405
>
> gcc/ChangeLog:
>
> * ifcvt.cc (noce_convert_multiple_sets): Iterate backwards and track
> target registers.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr116372.c: New test.
> * gcc.dg/pr116405.c: New test.
>
> Signed-off-by: Manolis Tsamis 
> ---
>
>  gcc/ifcvt.cc| 22 ++
>  gcc/testsuite/gcc.dg/pr116372.c | 13 +
>  gcc/testsuite/gcc.dg/pr116405.c | 17 +
>  3 files changed, 48 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr116372.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr116405.c
>
> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> index da59c907891..36de036661b 100644
> --- a/gcc/ifcvt.cc
> +++ b/gcc/ifcvt.cc
> @@ -3515,10 +3515,24 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>   given an empty BB to convert, and we can't handle that.  */
>gcc_assert (!insn_info.is_empty ());
>
> -  /* Now fixup the assignments.  */
> -  for (unsigned i = 0; i < insn_info.length (); i++)
> -if (insn_info[i]->target != insn_info[i]->temporary)
> -  noce_emit_move_insn (insn_info[i]->target, insn_info[i]->temporary);
> +  /* Now fixup the assignments.
> + PR116405: Iterate in reverse order and keep track of the targets so that
> + a move does not overwrite a subsequent value when multiple instructions
> + have the same target.  */
> +  unsigned i;
> +  noce_multiple_sets_info *info;
> +  bitmap set_targets = BITMAP_ALLOC (&reg_obstack);
> +  FOR_EACH_VEC_ELT_REVERSE (insn_info, i, info)
> +{
> +  gcc_checking_assert (REG_P (info->target));
> +
> +  if (info->target != info->temporary
> + && !bitmap_bit_p (set_targets, REGNO (info->target)))
> +   noce_emit_move_insn (info->target, info->temporary);
> +
> +  bitmap_set_bit (set_targets, REGNO (info->target));
> +}
> +  BITMAP_FREE (set_targets);
>
>/* Actually emit the sequence if it isn't too expensive.  */
>rtx_insn *seq = get_insns ();
> diff --git a/gcc/testsuite/gcc.dg/pr116372.c b/gcc/testsuite/gcc.dg/pr116372.c
> new file mode 100644
> index 000..e9878ac5042
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr116372.c
> @@ -0,0 +1,13 @@
> +/* PR rtl-optimization/116372 */
> +/* { dg-do run } */
> +/* { dg-options "-O1" } */
> +/* { dg-additional-options "-march=z13" { target s390x-*-* } } */
> +
> +long x = -0x7fff - 1;
> +int main (void)
> +{
> +  long y = x % (-0xf - 1);
> +  if (-0x7fff - 1 + y == x == 0)
> +__builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/pr116405.c b/gcc/testsuite/gcc.dg/pr116405.c
> new file mode 100644
> index 000..9223f15a298
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr116405.c
> @@ -0,0 +1,17 @@
> +/* PR rtl-optimization/116405 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fno-ssa-phiopt -fno-tree-dce" } */
> +
> +int printf(const char *, ...);
> +int a, b = 2, c = 1;
> +unsigned d, e;
> +int main() {
> + L:
> +  a = -1 / c;
> +  d = ~(b && (c && ~e) & b);
> +  printf("0\n");
> +  c = 0;
> +  if (d != -1)
> +goto L;
> +  return 0;
> +}
> --
> 2.34.1
>
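
For readers skimming the quoted fix-up loop, the reverse walk with a set of
already-seen targets can be sketched outside of GCC as follows.  Plain ints
stand in for registers, and Move and final_moves are hypothetical names used
only for this illustration; the sketch models which moves get emitted, not
the RTL that noce_emit_move_insn produces.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Each entry means "copy temporary `second` into target `first`".
using Move = std::pair<int, int>;

// Walk the list backwards and keep only the last move for each target,
// mirroring the quoted loop: a move is emitted only when target and
// temporary differ and the target has not been seen yet, and the target is
// marked as seen either way.
std::vector<Move> final_moves (const std::vector<Move> &info)
{
  std::vector<Move> out;
  std::set<int> seen;
  for (auto it = info.rbegin (); it != info.rend (); ++it)
    {
      if (it->first != it->second && seen.count (it->first) == 0)
        out.push_back (*it);
      seen.insert (it->first);
    }
  return out;
}

// Target 1 is written twice; only its last temporary (11) may be copied back.
void demo ()
{
  const std::vector<Move> info = { {1, 10}, {2, 20}, {1, 11} };
  const std::vector<Move> expected = { {1, 11}, {2, 20} };
  assert (final_moves (info) == expected);
}
```

Note that when the last write to a target used the target itself as its
temporary, no move is emitted for that target at all, which is what the
unconditional bitmap_set_bit in the patch achieves.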


[Committed 1/9] RISC-V: Use encoded nelts when calling repeating_sequence_p

2024-08-23 Thread Patrick O'Neill

On 8/22/24 13:41, Robin Dapp wrote:
> Before looking at the rest (tomorrow) - this is OK.

Committed - thanks!

Patrick



Re: [PATCH 3/9] RISC-V: Handle 0.0 floating point pattern costing to match const_vector expander

2024-08-23 Thread Patrick O'Neill

On 8/22/24 13:45, Robin Dapp wrote:
>> +   /* Constants in range -16 ~ 15 integer or 0.0 floating-point
>> +  can be emitted using vmv.v.i.  */
>> +   if (satisfies_constraint_vi (x) || satisfies_constraint_Wc0 (x))
>>   return 1;
>
> Just a nit but while you're at it, don't you want to split this off into
> valid_vector_immediate or so?  That would make it extensible more easily.
>
> Your call, OK either way.

Makes sense - I'll update this for v2


[PATCH 5/9 v2] c++, coroutines: Only allow void get_return_object if the ramp is void [PR100476].

2024-08-23 Thread Iain Sandoe
Hi Jason

>>+  /* Check for a bad get return object type.  */
>>+  tree gro_return_type = FUNC_OR_METHOD_TYPE_P (TREE_TYPE (get_ro_meth))
>>+? TREE_TYPE (TREE_TYPE (get_ro_meth))
>>+: TREE_TYPE (get_ro_meth);

>Is this to allow get_return_type to be a function-like-object?  If so, 
>checking its TREE_TYPE won't give the return type of its op(). Shouldn't we 
>just use TREE_TYPE (get_ro) once we have it?

Yes, the idea is to test as much as possible of what needs to be diagnosed
early, so that we do not need to tweak the input_pointer to get sensible
messages.  However, in this case what you suggest will work fine - we can use
error_at ().  So now checking once we have the get_ro.

retested on x86_64-darwin, OK for trunk?
thanks
Iain

--- 8< ---

Require that the value returned by get_return_object is convertible to
the ramp return.  This means that the only time we allow a void
get_return_object, is when the ramp is also a void function.

We diagnose this early to allow us to exit the ramp build if the return
values are incompatible.

PR c++/100476

gcc/cp/ChangeLog:

* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Remove special
handling of void get_return_object expressions.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/coro-bad-gro-01-void-gro-non-class-coro.C:
Adjust expected diagnostic.
* g++.dg/coroutines/pr102489.C: Avoid void get_return_object.
* g++.dg/coroutines/pr103868.C: Likewise.
* g++.dg/coroutines/pr94879-folly-1.C: Likewise.
* g++.dg/coroutines/pr94883-folly-2.C: Likewise.
* g++.dg/coroutines/pr96749-2.C: Likewise.

Signed-off-by: Iain Sandoe 
---
 gcc/cp/coroutines.cc  | 47 +--
 .../coro-bad-gro-01-void-gro-non-class-coro.C |  2 +-
 gcc/testsuite/g++.dg/coroutines/pr102489.C|  2 +-
 gcc/testsuite/g++.dg/coroutines/pr103868.C|  2 +-
 .../g++.dg/coroutines/pr94879-folly-1.C   |  3 +-
 .../g++.dg/coroutines/pr94883-folly-2.C   | 39 +++
 gcc/testsuite/g++.dg/coroutines/pr96749-2.C   |  2 +-
 7 files changed, 48 insertions(+), 49 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 2f128db42e1..3c0757403ce 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -4605,6 +4605,7 @@ cp_coroutine_transform::build_ramp_function ()
 
   tree promise_type = get_coroutine_promise_type (orig_fn_decl);
   tree fn_return_type = TREE_TYPE (TREE_TYPE (orig_fn_decl));
+  bool void_ramp_p = VOID_TYPE_P (fn_return_type);
 
   /* [dcl.fct.def.coroutine] / 10 (part1)
 The unqualified-id get_return_object_on_allocation_failure is looked up
@@ -4771,7 +4772,7 @@ cp_coroutine_transform::build_ramp_function ()
   tree cond = build1 (CONVERT_EXPR, frame_ptr_type, nullptr_node);
   cond = build2 (EQ_EXPR, boolean_type_node, coro_fp, cond);
   finish_if_stmt_cond (cond, if_stmt);
-  if (VOID_TYPE_P (fn_return_type))
+  if (void_ramp_p)
{
  /* Execute the get-return-object-on-alloc-fail call...  */
  finish_expr_stmt (grooaf);
@@ -4971,17 +4972,27 @@ cp_coroutine_transform::build_ramp_function ()
 coro_get_return_object_identifier,
 fn_start, NULL, /*musthave=*/true);
   /* Without a return object we haven't got much clue what's going on.  */
-  if (get_ro == error_mark_node)
+  if (!get_ro || get_ro == error_mark_node)
 {
   BIND_EXPR_BODY (ramp_bind) = pop_stmt_list (ramp_outer_bind);
   /* Suppress warnings about the missing return value.  */
   suppress_warning (orig_fn_decl, OPT_Wreturn_type);
   return false;
 }
+ 
+  /* Check for a bad get return object type.
+ [dcl.fct.def.coroutine] / 7 requires:
+ The expression promise.get_return_object() is used to initialize the
+ returned reference or prvalue result object ... */
+  tree gro_type = TREE_TYPE (get_ro);
+  if (VOID_TYPE_P (gro_type) && !void_ramp_p)
+{
+  error_at (fn_start, "no viable conversion from %<void%> provided by"
+   " %<get_return_object%> to return type %qT", fn_return_type);
+  return false;
+}
 
   tree gro_context_body = push_stmt_list ();
-  tree gro_type = TREE_TYPE (get_ro);
-  bool gro_is_void_p = VOID_TYPE_P (gro_type);
 
   tree gro = NULL_TREE;
   tree gro_bind_vars = NULL_TREE;
@@ -4990,8 +5001,11 @@ cp_coroutine_transform::build_ramp_function ()
   tree gro_cleanup_stmt = NULL_TREE;
   /* We have to sequence the call to get_return_object before initial
  suspend.  */
-  if (gro_is_void_p)
-r = get_ro;
+  if (void_ramp_p)
+{
+  gcc_checking_assert (VOID_TYPE_P (gro_type));
+  r = get_ro;
+}
   else if (same_type_p (gro_type, fn_return_type))
 {
  /* [dcl.fct.def.coroutine] / 7
@@ -5072,28 +5086,11 @@ cp_coroutine_transform::build_ramp_function ()
  for an object of the return type.  */
 
   if (sa

[r15-3128 Regression] FAIL: gcc.target/i386/part-vect-complexhf.c scan-assembler-times vfmaddcph[ \\t] 1 on Linux/x86_64

2024-08-23 Thread haochen.jiang
On Linux/x86_64,

de1923f9f4d5344694c22ca883aeb15caf635734 is the first bad commit
commit de1923f9f4d5344694c22ca883aeb15caf635734
Author: Richard Biener 
Date:   Fri Aug 23 13:44:29 2024 +0200

tree-optimization/116463 - complex lowering leaves around dead stmts

caused

FAIL: gcc.target/i386/avx512fp16-vector-complex-float.c scan-assembler-not vfmadd[123]*ph[ \\t]
FAIL: gcc.target/i386/avx512fp16-vector-complex-float.c scan-assembler-times vfmaddcph[ \\t] 1
FAIL: gcc.target/i386/part-vect-complexhf.c scan-assembler-times vfmaddcph[ \\t] 1

with GCC configured with

../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-3128/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512fp16-vector-complex-float.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512fp16-vector-complex-float.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512fp16-vector-complex-float.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512fp16-vector-complex-float.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/part-vect-complexhf.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/part-vect-complexhf.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


[PATCH] c++/coros: do not assume coros don't nest [PR113457]

2024-08-23 Thread Arsen Arsenović
Tested against folly and cppcoro, currently regstrapping on
x86_64-pc-linux-gnu.  coroutine.exp and coro-torture.exp passed.

OK for trunk?  (after regstrap)
-- >8 --
In the testcase presented in the PR, during template expansion, a
tsubst of an operand causes a lambda coroutine to be processed, causing
it to get an initial suspend and final suspend.  The code for assigning
awaitable var names (get_awaitable_var) assumed that the sequence Is ->
Is -> Fs -> Fs is impossible (i.e. that one could only 'open' one
coroutine before closing it at a time), and reset the counter used for
unique numbering each time a final suspend occurred.  This assumption is
false in a few cases, usually when lambdas are involved.

Instead of storing this counter in a static-storage variable, we can
store it in coroutine_info.  This struct is local to each function, so
we don't need to worry about "cross-contamination" nor resetting.
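
To illustrate the failure mode described above, here is a small standalone model. It is only a sketch: the names (coro_info_sketch, name_shared_sketch and so on) are hypothetical and nothing in it is the GCC code itself. It replays the order "outer Is, outer await, nested Is, nested Fs, outer await, outer Fs" against a shared counter that is reset at each final suspend, and against a counter kept per coroutine.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the suspend point kinds (not the GCC enum).
enum suspend_kind_sketch { AWAIT_POINT, INITIAL_POINT, FINAL_POINT };

// Hypothetical stand-in for the per-function coroutine_info record.
struct coro_info_sketch { int awaitable_number = 0; };

// Old scheme: one counter shared by all coroutines, reset at every
// final suspend.
std::string name_shared_sketch(suspend_kind_sketch k) {
  static int awn = 0;
  switch (k) {
    case INITIAL_POINT: return "Is";
    case FINAL_POINT: awn = 0; return "Fs";
    default: return "Aw" + std::to_string(awn++);
  }
}

// New scheme: the counter lives in the per-coroutine record and is
// never reset.
std::string name_per_coro_sketch(coro_info_sketch &info, suspend_kind_sketch k) {
  switch (k) {
    case INITIAL_POINT: return "Is";
    case FINAL_POINT: return "Fs";
    default: return "Aw" + std::to_string(info.awaitable_number++);
  }
}

// Returns true if the outer coroutine ends up with two temporaries of
// the same name under the shared-counter scheme.
bool shared_counter_collides_sketch() {
  name_shared_sketch(INITIAL_POINT);                      // outer Is
  std::string first = name_shared_sketch(AWAIT_POINT);   // outer await
  name_shared_sketch(INITIAL_POINT);                      // nested Is
  name_shared_sketch(FINAL_POINT);                        // nested Fs resets the counter
  std::string second = name_shared_sketch(AWAIT_POINT);  // outer await, counter restarted
  name_shared_sketch(FINAL_POINT);                        // outer Fs
  return first == second;
}

// Same sequence, but each coroutine carries its own counter.
bool per_coro_counter_collides_sketch() {
  coro_info_sketch outer, nested;
  name_per_coro_sketch(outer, INITIAL_POINT);
  std::string first = name_per_coro_sketch(outer, AWAIT_POINT);
  name_per_coro_sketch(nested, INITIAL_POINT);
  name_per_coro_sketch(nested, FINAL_POINT);
  std::string second = name_per_coro_sketch(outer, AWAIT_POINT);
  name_per_coro_sketch(outer, FINAL_POINT);
  return first == second;
}
```

In this model the shared counter hands the outer coroutine the same name twice, while the per-coroutine counter does not.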

PR c++/113457 - Nesting coroutine definitions (e.g. via a lambda or a 
template expansion) can ICE the compiler

gcc/cp/ChangeLog:

* coroutines.cc (struct coroutine_info): Add integer field
awaitable_number.  This is a counter used for assigning unique
names to awaitable temporaries.
(get_awaitable_var): Use awaitable_number from coroutine_info
instead of the static int awn.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr113457-1.C: New test.
* g++.dg/coroutines/pr113457.C: New test.
---
 gcc/cp/coroutines.cc |  19 +-
 gcc/testsuite/g++.dg/coroutines/pr113457-1.C |  25 +++
 gcc/testsuite/g++.dg/coroutines/pr113457.C   | 178 +++
 3 files changed, 216 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr113457-1.C
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr113457.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 81096784b4d7..65d17dac4d89 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -95,6 +95,10 @@ struct GTY((for_user)) coroutine_info
   tree return_void;   /* The expression for p.return_void() if it exists.  */
   location_t first_coro_expr; /* The location of the expression that turned
 this funtion into a coroutine.  */
+
+  /* Temporary variable number assigned by get_awaitable_var.  */
+  int awaitable_number = 0;
+
   /* Flags to avoid repeated errors for per-function issues.  */
   bool coro_ret_type_error_emitted;
   bool coro_promise_error_emitted;
@@ -1007,15 +1011,18 @@ enum suspend_point_kind {
 static tree
 get_awaitable_var (suspend_point_kind suspend_kind, tree v_type)
 {
-  static int awn = 0;
+  auto cinfo = get_coroutine_info (current_function_decl);
+  gcc_assert (cinfo);
   char *buf;
   switch (suspend_kind)
 {
-  default: buf = xasprintf ("Aw%d", awn++); break;
-  case CO_YIELD_SUSPEND_POINT: buf =  xasprintf ("Yd%d", awn++); break;
-  case INITIAL_SUSPEND_POINT: buf =  xasprintf ("Is"); break;
-  case FINAL_SUSPEND_POINT: buf =  xasprintf ("Fs"); awn = 0; break;
-  }
+default: buf = xasprintf ("Aw%d", cinfo->awaitable_number++); break;
+case CO_YIELD_SUSPEND_POINT:
+  buf = xasprintf ("Yd%d", cinfo->awaitable_number++);
+  break;
+case INITIAL_SUSPEND_POINT: buf =  xasprintf ("Is"); break;
+case FINAL_SUSPEND_POINT: buf =  xasprintf ("Fs"); break;
+}
   tree ret = get_identifier (buf);
   free (buf);
   ret = build_lang_decl (VAR_DECL, ret, v_type);
diff --git a/gcc/testsuite/g++.dg/coroutines/pr113457-1.C 
b/gcc/testsuite/g++.dg/coroutines/pr113457-1.C
new file mode 100644
index ..fcf67e15271c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/pr113457-1.C
@@ -0,0 +1,25 @@
+// https://gcc.gnu.org/PR113457
+#include <coroutine>
+
+struct coro
+{
+  struct promise_type
+  {
+std::suspend_never initial_suspend ();
+std::suspend_never final_suspend () noexcept;
+void return_void ();
+void unhandled_exception ();
+coro get_return_object ();
+  };
+};
+
+struct not_quite_suspend_never : std::suspend_never
+{};
+
+coro
+foo ()
+{
+  co_await std::suspend_never{},
+[] () -> coro { co_return; },
+co_await not_quite_suspend_never{};
+}
diff --git a/gcc/testsuite/g++.dg/coroutines/pr113457.C 
b/gcc/testsuite/g++.dg/coroutines/pr113457.C
new file mode 100644
index ..77b1a3ceaa2b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/pr113457.C
@@ -0,0 +1,178 @@
+// https://gcc.gnu.org/PR113457
+namespace std {
+template  _Up __declval(int);
+template  auto declval() noexcept -> decltype(__declval<_Tp>(0));
+template  struct remove_cv {
+  using type = __remove_cv(_Tp);
+};
+template  using remove_cv_t = typename remove_cv<_Tp>::type;
+template  struct remove_reference {
+  using type = __remove_reference(_Tp);
+};
+template 
+using remove_reference_t = typename remove_reference<_Tp>::type;
+template  inline constexpr bool is_array_v = __is_array(_Tp);
+template  struct remove_

Re: [PATCH] c++/coros: do not assume coros don't nest [PR113457]

2024-08-23 Thread Iain Sandoe
Hi Arsen,

sorry, I missed one point when I looked through this earlier ..

> On 23 Aug 2024, at 20:23, Arsen Arsenović  wrote:
> 
> Tested against folly and cppcoro, currently regstrapping on
> x86_64-pc-linux-gnu.  coroutine.exp and coro-torture.exp passed.
> 
> OK for trunk?  (after regstrap)
> -- >8 --
> In the testcase presented in the PR, during template expansion, a
> tsubst of an operand causes a lambda coroutine to be processed, causing
> it to get an initial suspend and final suspend.  The code for assigning
> awaitable var names (get_awaitable_var) assumed that the sequence Is ->
> Is -> Fs -> Fs is impossible (i.e. that one could only 'open' one
> coroutine before closing it at a time), and reset the counter used for
> unique numbering each time a final suspend occurred.  This assumption is
> false in a few cases, usually when lambdas are involved.
> 
> Instead of storing this counter in a static-storage variable, we can
> store it in coroutine_info.  This struct is local to each function, so
> we don't need to worry about "cross-contamination" nor resetting.
> 
>   PR c++/113457 - Nesting coroutine definitions (e.g. via a lambda or a 
> template expansion) can ICE the compiler
> 
> gcc/cp/ChangeLog:
> 
>   * coroutines.cc (struct coroutine_info): Add integer field
>   awaitable_number.  This is a counter used for assigning unique
>   names to awaitable temporaries.
>   (get_awaitable_var): Use awaitable_number from coroutine_info
>   instead of the static int awn.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/coroutines/pr113457-1.C: New test.
>   * g++.dg/coroutines/pr113457.C: New test.
> ---
> gcc/cp/coroutines.cc |  19 +-
> gcc/testsuite/g++.dg/coroutines/pr113457-1.C |  25 +++
> gcc/testsuite/g++.dg/coroutines/pr113457.C   | 178 +++
> 3 files changed, 216 insertions(+), 6 deletions(-)
> create mode 100644 gcc/testsuite/g++.dg/coroutines/pr113457-1.C
> create mode 100644 gcc/testsuite/g++.dg/coroutines/pr113457.C
> 
> diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
> index 81096784b4d7..65d17dac4d89 100644
> --- a/gcc/cp/coroutines.cc
> +++ b/gcc/cp/coroutines.cc
> @@ -95,6 +95,10 @@ struct GTY((for_user)) coroutine_info
>   tree return_void;   /* The expression for p.return_void() if it exists.  */
>   location_t first_coro_expr; /* The location of the expression that turned
>this funtion into a coroutine.  */
> +
> +  /* Temporary variable number assigned by get_awaitable_var.  */
> +  int awaitable_number = 0;
> +
>   /* Flags to avoid repeated errors for per-function issues.  */
>   bool coro_ret_type_error_emitted;
>   bool coro_promise_error_emitted;
> @@ -1007,15 +1011,18 @@ enum suspend_point_kind {
> static tree
> get_awaitable_var (suspend_point_kind suspend_kind, tree v_type)
> {
> -  static int awn = 0;
> +  auto cinfo = get_coroutine_info (current_function_decl);
> +  gcc_assert (cinfo);

If the purpose of this is to check for mistakes during development (i.e. we do
not see a reason for having it in a released compiler) - then it’s better to use
gcc_checking_assert() which will disappear for non-checking builds.

>   char *buf;
>   switch (suspend_kind)
> {
> -  default: buf = xasprintf ("Aw%d", awn++); break;
> -  case CO_YIELD_SUSPEND_POINT: buf =  xasprintf ("Yd%d", awn++); break;
> -  case INITIAL_SUSPEND_POINT: buf =  xasprintf ("Is"); break;
> -  case FINAL_SUSPEND_POINT: buf =  xasprintf ("Fs"); awn = 0; break;
> -  }
> +default: buf = xasprintf ("Aw%d", cinfo->awaitable_number++); break;
> +case CO_YIELD_SUSPEND_POINT:
> +  buf = xasprintf ("Yd%d", cinfo->awaitable_number++);
> +  break;
> +case INITIAL_SUSPEND_POINT: buf =  xasprintf ("Is"); break;
> +case FINAL_SUSPEND_POINT: buf =  xasprintf ("Fs"); break;
> +}
>   tree ret = get_identifier (buf);
>   free (buf);
>   ret = build_lang_decl (VAR_DECL, ret, v_type);
> diff --git a/gcc/testsuite/g++.dg/coroutines/pr113457-1.C 
> b/gcc/testsuite/g++.dg/coroutines/pr113457-1.C
> new file mode 100644
> index ..fcf67e15271c
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/coroutines/pr113457-1.C
> @@ -0,0 +1,25 @@
> +// https://gcc.gnu.org/PR113457
> +#include <coroutine>
> +
> +struct coro
> +{
> +  struct promise_type
> +  {
> +std::suspend_never initial_suspend ();
> +std::suspend_never final_suspend () noexcept;
> +void return_void ();
> +void unhandled_exception ();
> +coro get_return_object ();
> +  };
> +};
> +
> +struct not_quite_suspend_never : std::suspend_never
> +{};
> +
> +coro
> +foo ()
> +{
> +  co_await std::suspend_never{},
> +[] () -> coro { co_return; },
> +co_await not_quite_suspend_never{};
> +}
> diff --git a/gcc/testsuite/g++.dg/coroutines/pr113457.C 
> b/gcc/testsuite/g++.dg/coroutines/pr113457.C
> new file mode 100644
> index ..77b1a3ceaa2b
> --- /dev/null
> +++ b/g

Re: [PATCH] c++/coros: do not assume coros don't nest [PR113457]

2024-08-23 Thread Arsen Arsenović
Iain Sandoe  writes:

>> static tree
>> get_awaitable_var (suspend_point_kind suspend_kind, tree v_type)
>> {
>> -  static int awn = 0;
>> +  auto cinfo = get_coroutine_info (current_function_decl);
>> +  gcc_assert (cinfo);
>
> If the purpose of this is to check for mistakes during development (i.e. we do
> not see a reason for having it in a released compiler) - then it’s better to
> use gcc_checking_assert() which will disappear for non-checking builds.

I figured it was OK since this check is extremely light - I can use
gcc_checking_assert if you prefer.  No strong feelings in this instance.
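
For readers following along, the difference being discussed can be sketched with two hypothetical macros. This is an assumption-laden illustration, not GCC's actual definitions: sketch_assert is always active, while sketch_checking_assert is compiled down to an expression that never evaluates its argument unless SKETCH_CHECKING (a made-up switch) is non-zero.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical build-time switch standing in for a "checking" build.
#ifndef SKETCH_CHECKING
#define SKETCH_CHECKING 0
#endif

// Report the failed expression and abort.
[[noreturn]] inline void sketch_fail(const char *expr) {
  std::fprintf(stderr, "assertion failed: %s\n", expr);
  std::abort();
}

// Always-on assertion: the expression is evaluated in every build.
#define sketch_assert(EXPR) \
  ((void)((EXPR) ? 0 : (sketch_fail(#EXPR), 0)))

// Checking-only assertion: in non-checking builds the expression is
// still type-checked but never evaluated, so it costs nothing at run time.
#if SKETCH_CHECKING
#define sketch_checking_assert(EXPR) sketch_assert(EXPR)
#else
#define sketch_checking_assert(EXPR) ((void)(0 && (EXPR)))
#endif

// Count how often the asserted expression is evaluated.
int evaluations_always_on_sketch() {
  int n = 0;
  sketch_assert(++n > 0);
  return n;
}

int evaluations_checking_sketch() {
  int n = 0;
  sketch_checking_assert(++n > 0);
  return n;
}
```

With the switch left at its default, the always-on form evaluates its argument once and the checking form not at all, which is the trade-off weighed above for a check this cheap.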

Thanks.
-- 
Arsen Arsenović




Re: [PATCH] tree-optimization/116463 - complex lowering leaves around dead stmts

2024-08-23 Thread Andrew Pinski
On Fri, Aug 23, 2024 at 5:38 AM Richard Biener  wrote:
>
> Complex lowering generally replaces existing complex defs with
> COMPLEX_EXPRs but those might be dead when it can always refer to
> components from the lattice.  This in turn can pessimize followup
> transforms like forwprop and reassoc, the following makes sure to
> get rid of dead COMPLEX_EXPRs generated by using
> simple_dce_from_worklist.

Just an FYI, I had noticed this also when looking into PR 115544, 2
months ago and I was thinking about implementing it then.
It also fixes that issue without the change to the _BitInt lowering.
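
As a rough picture of what a worklist-driven DCE of this kind does, here is a toy model. Everything in it is hypothetical (the stmt_sketch IR, the function names); it is not simple_dce_from_worklist itself, only a sketch of the general idea: a seeded definition is removed if nothing uses it, and its operands then become candidates in turn.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy IR: statement i defines value i and reads `operands`.
struct stmt_sketch {
  std::vector<int> operands;
  bool has_side_effects = false;
  bool removed = false;
};

// Remove dead definitions reachable from `worklist`; returns how many
// statements were removed.
int dce_from_worklist_sketch(std::vector<stmt_sketch> &stmts,
                             std::set<int> worklist) {
  // Count the uses of every definition among live statements.
  std::vector<int> uses(stmts.size(), 0);
  for (const stmt_sketch &s : stmts)
    if (!s.removed)
      for (int op : s.operands)
        ++uses[op];

  int removed = 0;
  while (!worklist.empty()) {
    int def = *worklist.begin();
    worklist.erase(worklist.begin());
    stmt_sketch &s = stmts[def];
    if (s.removed || s.has_side_effects || uses[def] > 0)
      continue;
    s.removed = true;
    ++removed;
    // Operands of a removed statement may have become dead as well.
    for (int op : s.operands) {
      --uses[op];
      worklist.insert(op);
    }
  }
  return removed;
}

// Example: value 2 plays the role of a dead COMPLEX_EXPR built from
// values 0 and 1; statement 3 keeps value 0 alive.
int dce_example_sketch() {
  std::vector<stmt_sketch> stmts(4);
  stmts[2].operands = {0, 1};
  stmts[3].operands = {0, 0};
  stmts[3].has_side_effects = true;
  return dce_from_worklist_sketch(stmts, {2});
}
```

In the example, seeding the worklist with the dead def removes it and the operand that only it used, and leaves the operand that is still referenced elsewhere.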

Thanks,
Andrew Pinski

>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, this will cause
> the following fallout which is similar to the aarch64 fallout in
> PR116463, complex SLP recognition being somewhat fragile.  I'll track
> this there.  Pushed.
>
> FAIL: gcc.target/i386/avx512fp16-vector-complex-float.c scan-assembler-not vfmadd[123]*ph[ \\t]
> FAIL: gcc.target/i386/avx512fp16-vector-complex-float.c scan-assembler-times vfmaddcph[ \\t] 1
> FAIL: gcc.target/i386/part-vect-complexhf.c scan-assembler-times vfmaddcph[ \\t] 1
>
>
> PR tree-optimization/116463
> * tree-complex.cc: Include tree-ssa-dce.h.
> (dce_worklist): New global.
> (update_complex_assignment): Add SSA def to the DCE worklist.
> (tree_lower_complex): Perform DCE.
> ---
>  gcc/tree-complex.cc | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/gcc/tree-complex.cc b/gcc/tree-complex.cc
> index dfb45b9d91c..7480c07640e 100644
> --- a/gcc/tree-complex.cc
> +++ b/gcc/tree-complex.cc
> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "case-cfn-macros.h"
>  #include "builtins.h"
>  #include "optabs-tree.h"
> +#include "tree-ssa-dce.h"
>
>  /* For each complex ssa name, a lattice value.  We're interested in finding
> out whether a complex number is degenerate in some way, having only real
> @@ -88,6 +89,9 @@ static vec phis_to_revisit;
>  /* BBs that need EH cleanup.  */
>  static bitmap need_eh_cleanup;
>
> +/* SSA defs we should try to DCE.  */
> +static bitmap dce_worklist;
> +
>  /* Lookup UID in the complex_variable_components hashtable and return the
> associated tree.  */
>  static tree
> @@ -731,6 +735,7 @@ update_complex_assignment (gimple_stmt_iterator *gsi, 
> tree r, tree i)
>update_stmt (stmt);
>if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt))
>  bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);
> +  bitmap_set_bit (dce_worklist, SSA_NAME_VERSION (gimple_assign_lhs (stmt)));
>
>update_complex_components (gsi, gsi_stmt (*gsi), r, i);
>  }
> @@ -1962,6 +1967,7 @@ tree_lower_complex (void)
>complex_propagate.ssa_propagate ();
>
>need_eh_cleanup = BITMAP_ALLOC (NULL);
> +  dce_worklist = BITMAP_ALLOC (NULL);
>
>complex_variable_components = new int_tree_htab_type (10);
>
> @@ -2008,6 +2014,9 @@ tree_lower_complex (void)
>
>gsi_commit_edge_inserts ();
>
> +  simple_dce_from_worklist (dce_worklist, need_eh_cleanup);
> +  BITMAP_FREE (dce_worklist);
> +
>unsigned todo
>  = gimple_purge_all_dead_eh_edges (need_eh_cleanup) ? TODO_cleanup_cfg : 
> 0;
>BITMAP_FREE (need_eh_cleanup);
> --
> 2.43.0


[committed] libstdc++: Improve Doxygen docs for std::allocator_traits specializations

2024-08-23 Thread Jonathan Wakely
Pushed to trunk.

-- >8 --

The main fix here is to use @headerfile so that the docs show the correct
header file instead of an internal header like alloc_traits.h.
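
As a minimal illustration of the markup in question, the following sketch documents a specialization so that Doxygen attributes it to a public header rather than the file it happens to live in. The types (my_alloc, traits_sketch) and the header name my_memory are made up for the example and are not part of libstdc++.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {
  /// A hypothetical allocator type, imagined to live in an internal header.
  template<typename T>
    struct my_alloc { using value_type = T; };

  /// Hypothetical traits primary template.
  template<typename A>
    struct traits_sketch;

  /**
   * @brief  Partial specialization for `sketch::my_alloc`
   * @headerfile my_memory
   * @since C++11
   */
  template<typename T>
    struct traits_sketch<my_alloc<T>>
    {
      /// Largest number of objects that could in principle be allocated.
      static constexpr std::size_t max_size() noexcept
      { return std::size_t(-1) / sizeof(T); }
    };
}
```

The @headerfile line is what tells Doxygen which header to show to users of the class.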

libstdc++-v3/ChangeLog:

* include/bits/alloc_traits.h: Improve doxygen docs for
allocator_traits specializations.
* include/bits/memory_resource.h: Likewise.
---
 libstdc++-v3/include/bits/alloc_traits.h| 16 ++--
 libstdc++-v3/include/bits/memory_resource.h |  9 +++--
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/bits/alloc_traits.h 
b/libstdc++-v3/include/bits/alloc_traits.h
index c2acc2ab207..c64f4757d5d 100644
--- a/libstdc++-v3/include/bits/alloc_traits.h
+++ b/libstdc++-v3/include/bits/alloc_traits.h
@@ -549,7 +549,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #pragma GCC diagnostic pop
 
 #if _GLIBCXX_HOSTED
-  /// Partial specialization for std::allocator.
+  /**
+   * @brief  Partial specialization for `std::allocator`
+   * @headerfile memory
+   * @ingroup allocators
+   * @since C++11
+   * @see std::allocator_traits
+  */
   template
 struct allocator_traits>
 {
@@ -720,7 +726,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return __rhs; }
 };
 
-  /// Explicit specialization for std::allocator.
+  /**
+   * @brief  Explicit specialization for `std::allocator`
+   * @headerfile memory
+   * @ingroup allocators
+   * @since C++11
+   * @see std::allocator_traits
+  */
   template<>
 struct allocator_traits>
 {
diff --git a/libstdc++-v3/include/bits/memory_resource.h 
b/libstdc++-v3/include/bits/memory_resource.h
index 5f50b296df7..db515fb30ef 100644
--- a/libstdc++-v3/include/bits/memory_resource.h
+++ b/libstdc++-v3/include/bits/memory_resource.h
@@ -52,7 +52,7 @@ namespace std _GLIBCXX_VISIBILITY(default)
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 namespace pmr
 {
-  /// Class memory_resource
+  /// Class `memory_resource`
   /**
* @ingroup pmr
* @headerfile memory_resource
@@ -385,7 +385,12 @@ namespace pmr
 
   template struct allocator_traits;
 
-  /// Partial specialization for std::pmr::polymorphic_allocator
+  /// Partial specialization for `std::pmr::polymorphic_allocator`
+  /**
+   * @ingroup pmr
+   * @headerfile memory_resource
+   * @since C++17
+   */
   template
 struct allocator_traits>
 {
-- 
2.46.0



[committed] libstdc++: Hide std::tuple internals from Doxygen docs

2024-08-23 Thread Jonathan Wakely
Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* include/std/tuple: Do not include implementation details in
Doxygen documentation.
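
For context, the general shape of this kind of change looks like the sketch below. The helper and function names are hypothetical; only the @cond/@endcond pair with the "undocumented" section label mirrors what the patch adds. Doxygen skips everything between the two markers unless that section is explicitly enabled.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {
  /// A hypothetical empty tag type used in the example.
  struct empty_tag { };

  /// @cond undocumented
  // Implementation detail: hidden from the generated documentation.
  template<typename T>
    struct is_empty_helper : std::is_empty<T> { };
  /// @endcond

  /// Reports whether `T` is an empty class type.
  template<typename T>
    constexpr bool is_empty_class() noexcept
    { return is_empty_helper<T>::value; }
}
```

The public function stays documented while the helper it relies on no longer shows up in the output.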
---
 libstdc++-v3/include/std/tuple | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index 93b649e7d21..70cf4dba7b9 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -66,6 +66,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 class tuple;
 
+  /// @cond undocumented
   template
 struct __is_empty_non_tuple : is_empty<_Tp> { };
 
@@ -823,6 +824,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
static constexpr bool __is_explicitly_constructible()
{ return false; }
 };
+  /// @endcond
 
   /// Primary class template, tuple
   template
-- 
2.46.0



[committed] libstdc++: Update and clarify Doxygen version requirements in manual

2024-08-23 Thread Jonathan Wakely
Pushed to trunk.

-- >8 --

There are lots of bugs that affect libstdc++ output from Doxygen, so
using 1.9.6 or later is recommended. Give a lower minimum, because some
distros still use 1.9.1 and that will work, albeit suboptimally.

libstdc++-v3/ChangeLog:

* doc/xml/manual/documentation_hacking.xml: Update minimum
Doxygen version.
* doc/html/*: Regenerate.
---
 libstdc++-v3/doc/html/manual/debug.html | 2 +-
 libstdc++-v3/doc/html/manual/documentation_hacking.html | 5 +++--
 libstdc++-v3/doc/html/manual/setup.html | 3 +--
 libstdc++-v3/doc/html/manual/using_exceptions.html  | 4 ++--
 libstdc++-v3/doc/html/manual/using_headers.html | 2 +-
 libstdc++-v3/doc/xml/manual/documentation_hacking.xml   | 5 +++--
 6 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/doc/html/manual/debug.html 
b/libstdc++-v3/doc/html/manual/debug.html
index a5f51569e00..1623cd30486 100644
--- a/libstdc++-v3/doc/html/manual/debug.html
+++ b/libstdc++-v3/doc/html/manual/debug.html
@@ -250,4 +250,4 @@
   with C++11 and later standards. They might be removed at a future date.
   Prev Up NextExceptions Home Part II. 
 Standard Contents
-  
+  
\ No newline at end of file
diff --git a/libstdc++-v3/doc/html/manual/documentation_hacking.html 
b/libstdc++-v3/doc/html/manual/documentation_hacking.html
index 047a62e0831..6b462b44acf 100644
--- a/libstdc++-v3/doc/html/manual/documentation_hacking.html
+++ b/libstdc++-v3/doc/html/manual/documentation_hacking.html
@@ -112,9 +112,10 @@
   supported, and are always aliased to dummy rules. These
   unsupported formats are: info,
   ps, and dvi.
-DoxygenPrerequisitesTable 
B.1. Doxygen PrerequisitesToolVersionRequired Bycoreutils8.5allbash4.1alldoxygen1.7.6.1allgraphviz2.26graphical 
hierarchiespdflatex2007-59pdf 
output
+DoxygenPrerequisitesTable 
B.1. Doxygen PrerequisitesToolVersionRequired Bycoreutils8.5allbash4.1alldoxygen1.9.1allgraphviz2.26graphical 
hierarchiespdflatex2007-59pdf 
output
Prerequisite tools are Bash 2.0 or later,
-   https://www.doxygen.nl"; 
target="_top">Doxygen, and
+   https://www.doxygen.nl"; target="_top">Doxygen
+   1.9.1 or later (for best results use at least 1.9.6), and
the http://www.gnu.org/software/coreutils/"; 
target="_top">GNU
coreutils. (GNU versions of find, xargs, and possibly
sed and grep are used, just because the GNU versions make
diff --git a/libstdc++-v3/doc/html/manual/setup.html 
b/libstdc++-v3/doc/html/manual/setup.html
index d8c5ff65cff..67bb6c108a1 100644
--- a/libstdc++-v3/doc/html/manual/setup.html
+++ b/libstdc++-v3/doc/html/manual/setup.html
@@ -22,8 +22,7 @@
   Because libstdc++ is part of GCC, the primary source for
installation instructions is
https://gcc.gnu.org/install/"; target="_top">the GCC 
install page.
-   In particular, the list of prerequisite software needed to build
-   the library
+   In particular, list of prerequisite software needed to build the library
https://gcc.gnu.org/install/prerequisites.html"; 
target="_top">
starts with those requirements. The same pages also list
the tools you will need if you wish to modify the source.
diff --git a/libstdc++-v3/doc/html/manual/using_exceptions.html 
b/libstdc++-v3/doc/html/manual/using_exceptions.html
index f3556ef9d75..706b27e1479 100644
--- a/libstdc++-v3/doc/html/manual/using_exceptions.html
+++ b/libstdc++-v3/doc/html/manual/using_exceptions.html
@@ -166,8 +166,8 @@ exception neutrality and exception safety.
 implicitly generated magic necessary to
 support try and catch blocks
 and thrown objects. (Language support
-for -fno-exceptions is documented in the GNU
-GCC https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#Code-Gen-Options";
 target="_top">manual.)
+for -fno-exceptions is documented in the GCC 
+https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#Code-Gen-Options";
 target="_top">manual.)
   Before detailing the library support
 for -fno-exceptions, first a passing note on
 the things lost when this flag is used: it will break exceptions
diff --git a/libstdc++-v3/doc/html/manual/using_headers.html 
b/libstdc++-v3/doc/html/manual/using_headers.html
index 5f669862654..49a82614344 100644
--- a/libstdc++-v3/doc/html/manual/using_headers.html
+++ b/libstdc++-v3/doc/html/manual/using_headers.html
@@ -186,5 +186,5 @@ g++ -Winvalid-pch -I. -include stdc++.h -H -g -O2 hello.cc 
-o test.exe
 ! ./stdc++.h.gch
 . /mnt/share/bld/H-x86-gcc.20071201/include/c++/4.3.0/iostream
 . /mnt/share/bld/H-x86-gcc.20071201include/c++/4.3.0/string
-The exclamation point to the left of the stdc++.h.gch listing means that the generated PCH file was 
used. Detailed information about creating precompiled header 
files can be found in the GCC https://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html"; 
target="_top">documentation.
+The exclamation point to the le

Re: [PATCH] gm2: export all libc number conversion functions

2024-08-23 Thread Gaius Mulley
Wilken Gottwalt  writes:

> Export all string to integral and floating point number conversion functions
> (atof, atoi, atol, atoll, strtod, strtof, strtold, strtol, strtoll, strtoul,
> strtoull).
>
> gcc/gm2:
>   * gm2-libs/libc.def: Export all string to number conversion functions.
>
> Signed-off-by: Wilken Gottwalt 

Hi Wilken,

many thanks for the patch (I'll apply it) - yes lgtm,

regards,
Gaius


Re: [PATCH 3/9 v3] c++, coroutines: Separate allocator work from the ramp body build.

2024-08-23 Thread Jason Merrill

On 8/23/24 9:41 AM, Iain Sandoe wrote:

http://eel.is/c++draft/dcl.fct.def.coroutine#12 (sentence 2) says " If both a 
usual deallocation function with only a pointer parameter and a usual deallocation 
function with both a pointer parameter and a size parameter are found, then the 
selected deallocation function shall be the one with two parameters.”
however, if my promise provides both - the one with a single param is always 
chosen.
It is not that the other overload is invalid - if I do not include the single 
param version, the two param one is happily selected.



Ah, that's backwards from https://eel.is/c++draft/expr.delete#9.4 "If the 
deallocation functions belong to a class scope, the one without a parameter of type 
std::size_t is selected."

This is implemented as

  /* -- If the deallocation functions have class scope, the one without 
a parameter of type std::size_t is selected.  */
  bool want_size;
  if (DECL_CLASS_SCOPE_P (fn))
want_size = false;

I guess we need some way for build_op_delete_call to know that we want the 
other preference in this case.
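
The ordinary delete-expression preference quoted above can be seen in a small standalone program. This is only a sketch with made-up class names; it shows the [expr.delete] behaviour for class-scope deallocation functions and says nothing about how the coroutine frame case should be implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Records which overload the last delete-expression picked.
static int last_delete_arity = 0;

// Hypothetical class providing both usual deallocation functions.
struct both_deleters_sketch {
  static void operator delete(void *p) noexcept {
    last_delete_arity = 1;
    ::operator delete(p);
  }
  static void operator delete(void *p, std::size_t) noexcept {
    last_delete_arity = 2;
    ::operator delete(p);
  }
};

// Hypothetical class providing only the sized form.
struct sized_only_sketch {
  static void operator delete(void *p, std::size_t) noexcept {
    last_delete_arity = 2;
    ::operator delete(p);
  }
};

int arity_used_for_both_sketch() {
  last_delete_arity = 0;
  delete new both_deleters_sketch;
  return last_delete_arity;
}

int arity_used_for_sized_only_sketch() {
  last_delete_arity = 0;
  delete new sized_only_sketch;
  return last_delete_arity;
}
```

With both overloads present the delete-expression picks the one-parameter form; the sized form is used only when it is the sole candidate. That is the opposite of what [dcl.fct.def.coroutine] asks for when freeing the coroutine frame.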



Adding a defaulted param to the existing call seems to be messy since it would 
interrupt the complain being last parm theme..



… I suppose we could add an overload with an additional bool specifying 
priority to the two argument case?



if that seems reasonable, I can take that on - as part of this patch (or 
separately).


That patch might look as below.

I have addressed the other comments here;

If this does not look like a good direction, then perhaps we could proceed
with the original version and address improvements as a follow-on along with
the other changes we need to make to support over-aligned frame objects?

thanks,
Iain

--- 8< ---

This splits out the building of the allocation and deallocation expressions
and runs them early in the ramp build, so that we can exit if they are not
usable, before we start building the ramp body.

Likewise move checks for other required resources to the beginning of the
ramp builder.

This is preparation for work needed to update the allocation/destruction
in cases where we have excess alignment of the promise or other saved frame
state.

gcc/cp/ChangeLog:

* call.cc (build_op_delete_call_1): Renamed and added a param
to allow the caller to prioritize two argument usual deleters.
(build_op_delete_call): Add an overload to expose the option
to prioritize two argument deleters.
* coroutines.cc (coro_get_frame_dtor): Rename...
(build_coroutine_frame_delete_expr):... to this; simplify to
use build_op_delete_call for all cases.
(build_actor_fn): Use revised frame delete function.
(build_coroutine_frame_alloc_expr): New.
(cp_coroutine_transform::complete_ramp_function): Rename...
(cp_coroutine_transform::build_ramp_function): ... to this.
Reorder code to carry out checks for prerequisites before the
codegen. Split out the allocation/delete code.
(cp_coroutine_transform::apply_transforms): Use revised name.
* coroutines.h: Rename function.
* cp-tree.h (build_op_delete_call): Add an overload for the version
that allows two operand deleters.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/coro-bad-alloc-01-bad-op-del.C: Use revised
diagnostics.
* g++.dg/coroutines/coro-bad-gro-00-class-gro-scalar-return.C:
Likewise.
* g++.dg/coroutines/coro-bad-gro-01-void-gro-non-class-coro.C:
Likewise.
* g++.dg/coroutines/coro-bad-grooaf-00-static.C: Likewise.
* g++.dg/coroutines/ramp-return-b.C: Likewise.

Signed-off-by: Iain Sandoe 
---
  gcc/cp/call.cc|  36 +-
  gcc/cp/coroutines.cc  | 462 ++
  gcc/cp/coroutines.h   |   2 +-
  gcc/cp/cp-tree.h  |   3 +
  .../coroutines/coro-bad-alloc-01-bad-op-del.C |   2 +-
  .../coro-bad-gro-00-class-gro-scalar-return.C |   4 +-
  .../coro-bad-gro-01-void-gro-non-class-coro.C |   4 +-
  .../coroutines/coro-bad-grooaf-00-static.C|   6 +-
  .../g++.dg/coroutines/ramp-return-b.C |   8 +-
  9 files changed, 292 insertions(+), 235 deletions(-)

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 0fe679aae9f..377882a1c61 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -7851,6 +7851,8 @@ usual_deallocation_fn_p (tree fn)
 SIZE is the size of the memory block to be deleted.
 GLOBAL_P is true if the delete-expression should not consider
 class-specific delete operators.
+   TWO_ARGS_PRIORITY_P is true if a two argument usual deallocation should be
+   chosen in preference to the single argument version in a class context.
 PLACEMENT is the corresponding placement new call, or NULL_TREE.
  
 If this call to "operator delete" is being generated as part to

@@ -7859,10 +7861,10 @@ usual_deallocation_

Re: [PATCH 4/9 v2] c++, coroutines: Fix handling of early exceptions [PR113773].

2024-08-23 Thread Jason Merrill

On 8/23/24 10:36 AM, Iain Sandoe wrote:

Hi Jason,


+  tree iarc_m = lookup_member (frame_type, coro_frame_i_a_r_c_id,
+  1, 0, tf_warning_or_error);
+  tree iarc_x = build_class_member_access_expr (deref_fp, iarc_m, 
NULL_TREE,
+  false, tf_warning_or_error);



Do you need to call lookup_member directly?  I'd think you could just pass the 
identifier to build_class_member_access_expr.  And in several other places in 
coroutines.cc, as well.


AFAICT, I do... the doc for build_class_member_access_expr () says:
  "MEMBER is a DECL or baselink.".  We could extend the API to allow a member
identifier (which would be nice).  Maybe I'm missing something else?


Ah, I was confusing it with /finish/_class_member_access_expr, which can 
take an identifier.  Would it work to use that instead of build_?


In the meantime, this patch is OK.

Jason



Re: [PATCH 5/9 v2] c++, coroutines: Only allow void get_return_object if the ramp is void [PR100476].

2024-08-23 Thread Jason Merrill

On 8/23/24 2:30 PM, Iain Sandoe wrote:

Hi Jason


+  /* Check for a bad get return object type.  */
+  tree gro_return_type = FUNC_OR_METHOD_TYPE_P (TREE_TYPE (get_ro_meth))
+  ? TREE_TYPE (TREE_TYPE (get_ro_meth))
+  : TREE_TYPE (get_ro_meth);



Is this to allow get_return_object to be a function-like-object?  If so, checking 
its TREE_TYPE won't give the return type of its op(). Shouldn't we just use 
TREE_TYPE (get_ro) once we have it?


Yes the idea is to try and test as much of the stuff that needs to be diagnosed
early, so that we do not need to tweak the input_pointer to get sensible 
messages.
However, in this case what you suggest will work fine - we can use error_at 
().
So now checking once we have the get_ro.

retested on x86_64-darwin, OK for trunk?


OK.


thanks
Iain

--- 8< ---

Require that the value returned by get_return_object is convertible to
the ramp return.  This means that the only time we allow a void
get_return_object, is when the ramp is also a void function.

We diagnose this early to allow us to exit the ramp build if the return
values are incompatible.
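
The rule can be approximated outside the compiler with a type trait. The sketch below is an assumption: it models "convertible to the ramp return" with std::is_convertible, which is not how the front end performs the check, and task_sketch is a made-up return type.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical approximation: the type produced by get_return_object
// (Gro) is usable if it converts to the ramp's return type (Ramp).
template<typename Gro, typename Ramp>
constexpr bool gro_usable_sketch() noexcept
{ return std::is_convertible<Gro, Ramp>::value; }

// Hypothetical coroutine return type constructible from an int handle.
struct task_sketch { task_sketch(int) { } };

// A void get_return_object is fine only for a void ramp...
static_assert(gro_usable_sketch<void, void>(), "void to void is accepted");
// ...and is rejected when the ramp has to return an object.
static_assert(!gro_usable_sketch<void, task_sketch>(), "void to class is rejected");
// A non-void result converts through the usual implicit conversions.
static_assert(gro_usable_sketch<int, task_sketch>(), "int converts to task_sketch");
```

Under this model, void is acceptable only when both sides are void, matching the intent stated above.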

PR c++/100476

gcc/cp/ChangeLog:

* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Remove special
handling of void get_return_object expressions.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/coro-bad-gro-01-void-gro-non-class-coro.C:
Adjust expected diagnostic.
* g++.dg/coroutines/pr102489.C: Avoid void get_return_object.
* g++.dg/coroutines/pr103868.C: Likewise.
* g++.dg/coroutines/pr94879-folly-1.C: Likewise.
* g++.dg/coroutines/pr94883-folly-2.C: Likewise.
* g++.dg/coroutines/pr96749-2.C: Likewise.

Signed-off-by: Iain Sandoe 
---
  gcc/cp/coroutines.cc  | 47 +--
  .../coro-bad-gro-01-void-gro-non-class-coro.C |  2 +-
  gcc/testsuite/g++.dg/coroutines/pr102489.C|  2 +-
  gcc/testsuite/g++.dg/coroutines/pr103868.C|  2 +-
  .../g++.dg/coroutines/pr94879-folly-1.C   |  3 +-
  .../g++.dg/coroutines/pr94883-folly-2.C   | 39 +++
  gcc/testsuite/g++.dg/coroutines/pr96749-2.C   |  2 +-
  7 files changed, 48 insertions(+), 49 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 2f128db42e1..3c0757403ce 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -4605,6 +4605,7 @@ cp_coroutine_transform::build_ramp_function ()
  
tree promise_type = get_coroutine_promise_type (orig_fn_decl);

tree fn_return_type = TREE_TYPE (TREE_TYPE (orig_fn_decl));
+  bool void_ramp_p = VOID_TYPE_P (fn_return_type);
  
/* [dcl.fct.def.coroutine] / 10 (part1)

  The unqualified-id get_return_object_on_allocation_failure is looked up
@@ -4771,7 +4772,7 @@ cp_coroutine_transform::build_ramp_function ()
tree cond = build1 (CONVERT_EXPR, frame_ptr_type, nullptr_node);
cond = build2 (EQ_EXPR, boolean_type_node, coro_fp, cond);
finish_if_stmt_cond (cond, if_stmt);
-  if (VOID_TYPE_P (fn_return_type))
+  if (void_ramp_p)
{
  /* Execute the get-return-object-on-alloc-fail call...  */
  finish_expr_stmt (grooaf);
@@ -4971,17 +4972,27 @@ cp_coroutine_transform::build_ramp_function ()
 coro_get_return_object_identifier,
 fn_start, NULL, /*musthave=*/true);
/* Without a return object we haven't got much clue what's going on.  */
-  if (get_ro == error_mark_node)
+  if (!get_ro || get_ro == error_mark_node)
  {
BIND_EXPR_BODY (ramp_bind) = pop_stmt_list (ramp_outer_bind);
/* Suppress warnings about the missing return value.  */
suppress_warning (orig_fn_decl, OPT_Wreturn_type);
return false;
  }
+
+  /* Check for a bad get return object type.
+ [dcl.fct.def.coroutine] / 7 requires:
+ The expression promise.get_return_object() is used to initialize the
+ returned reference or prvalue result object ... */
+  tree gro_type = TREE_TYPE (get_ro);
+  if (VOID_TYPE_P (gro_type) && !void_ramp_p)
+{
+  error_at (fn_start, "no viable conversion from %<void%> provided by"
+   " %<get_return_object%> to return type %qT", fn_return_type);
+  return false;
+}
  
tree gro_context_body = push_stmt_list ();

-  tree gro_type = TREE_TYPE (get_ro);
-  bool gro_is_void_p = VOID_TYPE_P (gro_type);
  
tree gro = NULL_TREE;

tree gro_bind_vars = NULL_TREE;
@@ -4990,8 +5001,11 @@ cp_coroutine_transform::build_ramp_function ()
tree gro_cleanup_stmt = NULL_TREE;
/* We have to sequence the call to get_return_object before initial
   suspend.  */
-  if (gro_is_void_p)
-r = get_ro;
+  if (void_ramp_p)
+{
+  gcc_checking_assert (VOID_TYPE_P (gro_type));
+  r = get_ro;
+}
else if (same_type_p (gro_type, fn_return_type))
  {
   /* [dcl.fct.def.coroutine] / 7
@@ -5072,28 +5086,11 @@ cp_coroutine_trans

[PATCH v3] Optimize initialization of small padded objects

2024-08-23 Thread Alexandre Oliva
Hello, Richi,

Thanks for the review and the feedback.

On Aug 22, 2024, Richard Biener wrote:

>> +   /* If the object is small enough to go in registers, and it's
>> +  not required to be constructed in memory, clear it first.
>> +  That will avoid wasting cycles preserving any padding bits
>> +  that might be there, and if there aren't any, the compiler
>> +  is smart enough to optimize the clearing out.  */
>> +   else if (complete_p <= 0

> I wonder if for complete_p == 1 zeroing first also makes a difference?

It does, at the very least in that it raises dead assignment warnings.

In cases I analyzed, the stores eventually got combined and it didn't
make a difference.  But it's conceivable that zero-initialization might
aid some cases on some targets.  I didn't pursue that line of
investigation much: the warnings, and the difficulties I envisioned to
silence them, led me away.
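
To make the `complete_p <= 0` test above easier to follow, here is a minimal sketch of a three-way completeness flag.  The encoding (1 = complete with no padding, 0 = incomplete, -1 = complete but with uninitialized padding) is an assumption inferred from the condition and the ChangeLog wording, not something this excerpt spells out.  The names classify_init and should_clear_first, and the small_enough and optimizing parameters, are hypothetical stand-ins for the real checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed encoding of the completeness flag (hypothetical names).  */
enum
{
  INIT_COMPLETE_WITH_PADDING = -1, /* every field set, padding left over */
  INIT_INCOMPLETE = 0,             /* some field not initialized */
  INIT_COMPLETE = 1                /* every field set, no padding */
};

/* Classify a constructor from two facts about it.  */
static int
classify_init (bool all_fields_initialized, bool has_padding)
{
  if (!all_fields_initialized)
    return INIT_INCOMPLETE;
  return has_padding ? INIT_COMPLETE_WITH_PADDING : INIT_COMPLETE;
}

/* Model of the guard: clear first unless initialization is complete
   and padding-free, in which case the clear would be a dead store.  */
static bool
should_clear_first (int complete_p, bool small_enough, bool optimizing)
{
  assert (complete_p >= -1 && complete_p <= 1);
  return complete_p <= 0 && small_enough && optimizing;
}
```

Under this assumed encoding, a fully initialized object without padding (value 1) is the only case that skips the clearing, which matches the dead-assignment concern described above.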

> I also wonder whether the extra zeroing confuses SRA - try say struct
> s { short a; int b; } and see if SRA is still happily considering it.

Yeah, SRA seems to be happy to drop the struct object completely, in
which case the zero-initialization of the padding bits gets discarded,
and also to SRA relevant fields while keeping the whole object if it's
needed as a whole, so that the zero-initialization (often narrowed to
the padding bits only) is retained and aids in merging the stores more
efficiently.
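
For readers who want to see the padding in the struct mentioned above, the following sketch measures it with offsetof and sizeof.  The helper names are hypothetical, and the exact byte counts depend on the target ABI (two bytes of interior padding is typical where int is 4-byte aligned), so the code only checks relations that hold everywhere.

```c
#include <assert.h>
#include <stddef.h>

/* The struct suggested in the review.  */
struct s { short a; int b; };

/* Padding bytes between the end of 'a' and the start of 'b'.  */
static size_t
padding_after_a (void)
{
  return offsetof (struct s, b) - (offsetof (struct s, a) + sizeof (short));
}

/* All padding bytes in the struct, interior plus tail.  */
static size_t
total_padding (void)
{
  return sizeof (struct s) - (sizeof (short) + sizeof (int));
}

/* Sanity check: interior padding can never exceed the total.  */
static void
check_layout (void)
{
  assert (total_padding () >= padding_after_a ());
}

/* A fully initialized small padded object of the kind discussed:
   both fields are set, the padding bytes are not.  */
static int
make_and_sum (short a, int b)
{
  struct s v = { a, b };
  return v.a + v.b;
}
```

Compiling make_and_sum with and without the patch and comparing the SRA dump would be one way to repeat the experiment described here.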

>> +&& !TREE_ADDRESSABLE (ctor) && !TREE_THIS_VOLATILE (object)

> 2nd && to the next line

'k

>> +&& (TYPE_MODE (type) != BLKmode || TYPE_NO_FORCE_BLK (type))
>> +&& (opt_for_fn (cfun->decl, optimize)
>> +|| opt_for_fn (cfun->decl, optimize_size)))

> that's simplified as && optimize

It felt like a regression to add non-*fun references to flags in a
function that explicitly mentioned *fun elsewhere, but sure 'optimize'
looks nicer ;-)


Regstrapping on x86_64-linux-gnu.  Ok to install?


Optimize initialization of small padded objects

When small objects containing padding bits (or bytes) are fully
initialized, we will often store them in registers, and setting
bitfields and other small fields will attempt to preserve the
uninitialized padding bits, which tends to be expensive.
Zero-initializing registers, OTOH, tends to be cheap.

So, if we're optimizing, zero-initialize such small padded objects
even if that's not needed for correctness.  We can't zero-initialize
all such padding objects, though: if there's no padding whatsoever,
and all fields are initialized with nonzero, the zero initialization
would be flagged as dead.  That's why we introduce machinery to detect
whether objects have padding bits.  I considered distinguishing
between bitfields, units and larger padding elements, but I didn't
pursue that distinction.
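
The cost difference can be illustrated with a hand-written model of a register holding two small fields.  This is only a sketch: the layout (a 3-bit field at bit 0 and a 5-bit field at bit 8 of a 32-bit word, everything else padding) and all names are hypothetical, and real code generation will differ by target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: F1 is 3 bits at bit 0, F2 is 5 bits at bit 8.  */
#define F1_SHIFT 0
#define F1_MASK  0x7u
#define F2_SHIFT 8
#define F2_MASK  0x1fu

/* Preserving the padding: each field store is a read-modify-write
   that masks out the old field bits before inserting the new ones.  */
static uint32_t
init_preserving (uint32_t old, uint32_t f1, uint32_t f2)
{
  assert (f1 <= F1_MASK && f2 <= F2_MASK);
  uint32_t w = old;
  w = (w & ~(F1_MASK << F1_SHIFT)) | (f1 << F1_SHIFT);
  w = (w & ~(F2_MASK << F2_SHIFT)) | (f2 << F2_SHIFT);
  return w;
}

/* Clearing first: the old contents are never read, so the field
   stores reduce to plain ORs that are easy to merge into a constant.  */
static uint32_t
init_zero_first (uint32_t f1, uint32_t f2)
{
  assert (f1 <= F1_MASK && f2 <= F2_MASK);
  uint32_t w = 0;
  w |= f1 << F1_SHIFT;
  w |= f2 << F2_SHIFT;
  return w;
}
```

Both functions produce the same field values; they differ only in what ends up in the padding bits and in how much work is needed to get there.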

Since the object's zero-initialization subsumes fields'
zero-initialization, the empty string test in builtin-snprintf-6.c's
test_assign_aggregate would regress without the addition of
native_encode_constructor.


for  gcc/ChangeLog

* expr.cc (categorize_ctor_elements_1): Change p_complete to
int, to distinguish complete initialization in presence or
absence of uninitialized padding bits.
(categorize_ctor_elements): Likewise.  Adjust all callers...
* expr.h (categorize_ctor_elements): ... and declaration.
(type_has_padding_at_level_p): New.
* gimple-fold.cc (type_has_padding_at_level_p): New.
* fold-const.cc (native_encode_constructor): New.
(native_encode_expr): Call it.
* gimplify.cc (gimplify_init_constructor): Clear small
non-addressable non-volatile objects with padding or
other uninitialized fields as an optimization.

for  gcc/testsuite/ChangeLog

* gcc.dg/init-pad-1.c: New.
---
 gcc/expr.cc   |   20 ++-
 gcc/expr.h|3 +-
 gcc/fold-const.cc |   36 +++
 gcc/gimple-fold.cc|   50 +
 gcc/gimplify.cc   |   14 ++
 gcc/testsuite/gcc.dg/init-pad-1.c |   18 +
 6 files changed, 132 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/init-pad-1.c

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 8d17a5a39b4bd..320be8b17a13e 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -7096,7 +7096,7 @@ count_type_elements (const_tree type, bool for_ctor_p)
 static bool
 categorize_ctor_elements_1 (const_tree ctor, HOST_WIDE_INT *p_nz_elts,
HOST_WIDE_INT *p_unique_nz_elts,
-   HOST_WIDE_INT *p_init_elts, bool *p_complete)
+   HOST_WIDE_INT *p_init_elts, int *p_complete)
 {
   unsigned HOST_WIDE_INT idx;
   HOST_WIDE_INT nz_elts, unique_nz_elts, init_elts, num_fields;
@@ -7218,7 +7218,10 @@ categorize_ctor_elements_1 (const_tree ctor, 
HOST_W

Re: [PATCH v3] Optimize initialization of small padded objects

2024-08-23 Thread Richard Biener



> On 24.08.2024 at 06:59, Alexandre Oliva wrote:
> 
> Hello, Richi,
> 
> Thanks for the review and the feedback.
> 
> On Aug 22, 2024, Richard Biener wrote:
> 
>>> +   /* If the object is small enough to go in registers, and it's
>>> +  not required to be constructed in memory, clear it first.
>>> +  That will avoid wasting cycles preserving any padding bits
>>> +  that might be there, and if there aren't any, the compiler
>>> +  is smart enough to optimize the clearing out.  */
>>> +   else if (complete_p <= 0
> 
>> I wonder if for complete_p == 1 zeroing first also makes a difference?
> 
> It does, at the very least in that it raises dead assignment warnings.
> 
> In cases I analyzed, the stores eventually got combined and it didn't
> make a difference.  But it's conceivable that zero-initialization might
> aid some cases on some targets.  I didn't pursue that line of
> investigation much: the warnings, and the difficulties I envisioned to
> silence them, led me away.
> 
>> I also wonder whether the extra zeroing confuses SRA - try say struct
>> s { short a; int b; } and see if SRA is still happily considering it.
> 
> Yeah, SRA seems to be happy to drop the struct object completely, in
> which case the zero-initialization of the padding bits gets discarded,
> and also to SRA relevant fields while keeping the whole object if it's
> needed as a whole, so that the zero-initialization (often narrowed to
> the padding bits only) is retained and aids in merging the stores more
> efficiently.
> 
>>> +&& !TREE_ADDRESSABLE (ctor) && !TREE_THIS_VOLATILE (object)
> 
>> 2nd && to the next line
> 
> 'k
> 
>>> +&& (TYPE_MODE (type) != BLKmode || TYPE_NO_FORCE_BLK (type))
>>> +&& (opt_for_fn (cfun->decl, optimize)
>>> +|| opt_for_fn (cfun->decl, optimize_size)))
> 
>> that's simplified as && optimize
> 
> It felt like a regression to add non-*fun references to flags in a
> function that explicitly mentioned *fun elsewhere, but sure 'optimize'
> looks nicer ;-)
> 
> 
> Regstrapping on x86_64-linux-gnu.  Ok to install?

Ok if Jakub doesn’t have any further comments next week.

Thanks,
Richard 

> 
> Optimize initialization of small padded objects
> 
> When small objects containing padding bits (or bytes) are fully
> initialized, we will often store them in registers, and setting
> bitfields and other small fields will attempt to preserve the
> uninitialized padding bits, which tends to be expensive.
> Zero-initializing registers, OTOH, tends to be cheap.
> 
> So, if we're optimizing, zero-initialize such small padded objects
> even if that's not needed for correctness.  We can't zero-initialize
> all such padding objects, though: if there's no padding whatsoever,
> and all fields are initialized with nonzero, the zero initialization
> would be flagged as dead.  That's why we introduce machinery to detect
> whether objects have padding bits.  I considered distinguishing
> between bitfields, units and larger padding elements, but I didn't
> pursue that distinction.
> 
> Since the object's zero-initialization subsumes fields'
> zero-initialization, the empty string test in builtin-snprintf-6.c's
> test_assign_aggregate would regress without the addition of
> native_encode_constructor.
> 
> 
> for  gcc/ChangeLog
> 
>* expr.cc (categorize_ctor_elements_1): Change p_complete to
>int, to distinguish complete initialization in presence or
>absence of uninitialized padding bits.
>(categorize_ctor_elements): Likewise.  Adjust all callers...
>* expr.h (categorize_ctor_elements): ... and declaration.
>(type_has_padding_at_level_p): New.
>* gimple-fold.cc (type_has_padding_at_level_p): New.
>* fold-const.cc (native_encode_constructor): New.
>(native_encode_expr): Call it.
>* gimplify.cc (gimplify_init_constructor): Clear small
>non-addressable non-volatile objects with padding or
>other uninitialized fields as an optimization.
> 
> for  gcc/testsuite/ChangeLog
> 
>* gcc.dg/init-pad-1.c: New.
> ---
> gcc/expr.cc   |   20 ++-
> gcc/expr.h|3 +-
> gcc/fold-const.cc |   36 +++
> gcc/gimple-fold.cc|   50 +
> gcc/gimplify.cc   |   14 ++
> gcc/testsuite/gcc.dg/init-pad-1.c |   18 +
> 6 files changed, 132 insertions(+), 9 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/init-pad-1.c
> 
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 8d17a5a39b4bd..320be8b17a13e 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -7096,7 +7096,7 @@ count_type_elements (const_tree type, bool for_ctor_p)
> static bool
> categorize_ctor_elements_1 (const_tree ctor, HOST_WIDE_INT *p_nz_elts,
>HOST_WIDE_INT *p_unique_nz_elts,
> -HOST_WIDE_INT *p_init_elts, bool *p_complete
