Re: [PATCH] RISC-V: Fix missing abi arg in test

2024-08-08 Thread Robin Dapp
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c
> index d150f20b5d9..02814183dbb 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-options "-O3 -march=rv64gcv_zvl256b -fdump-rtl-expand-details" } */
> +/* { dg-options "-O3 -march=rv64gcv_zvl256b -mabi=lp64d 
> -fdump-rtl-expand-details" } */
>  
>  int b[24];
>  _Bool c[24];

OK.

We really want to have -march imply an -mabi, especially if there's no other
choice anyway.

-- 
Regards
 Robin



[PATCH 2/3] gcov: branch, conds, calls in function summaries

2024-08-08 Thread Jørgen Kvalsvik
The gcov function summaries only output the covered lines, not the
branches and calls. Since the function summaries are opt-in, it
probably makes sense to also include branch coverage, calls, and
condition coverage.

$ gcc --coverage -fpath-coverage hello.c -o hello
$ ./hello

Before:
$ gcov -f hello
Function 'main'
Lines executed:100.00% of 4

Function 'fn'
Lines executed:100.00% of 7

File 'hello.c'
Lines executed:100.00% of 11
Creating 'hello.c.gcov'

After:
$ gcov -f hello
Function 'main'
Lines executed:100.00% of 3
No branches
Calls executed:100.00% of 1

Function 'fn'
Lines executed:100.00% of 7
Branches executed:100.00% of 4
Taken at least once:50.00% of 4
No calls

File 'hello.c'
Lines executed:100.00% of 10
Creating 'hello.c.gcov'

Lines executed:100.00% of 10

With conditions:
$ gcov -fg hello
Function 'main'
Lines executed:100.00% of 3
No branches
Calls executed:100.00% of 1
No conditions

Function 'fn'
Lines executed:100.00% of 7
Branches executed:100.00% of 4
Taken at least once:50.00% of 4
Condition outcomes covered:100.00% of 8
No calls

File 'hello.c'
Lines executed:100.00% of 10
Creating 'hello.c.gcov'

Lines executed:100.00% of 10

gcc/ChangeLog:

* gcov.cc (generate_results): Count branches, conditions.
(function_summary): Output branch, calls, condition count.
---
 gcc/gcov.cc | 48 +++-
 1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/gcc/gcov.cc b/gcc/gcov.cc
index 5eb40f94b99..74ebcf10e4b 100644
--- a/gcc/gcov.cc
+++ b/gcc/gcov.cc
@@ -1687,11 +1687,19 @@ generate_results (const char *file_name)
   memset (&coverage, 0, sizeof (coverage));
   coverage.name = fn->get_name ();
   add_line_counts (flag_function_summary ? &coverage : NULL, fn);
-  if (flag_function_summary)
-   {
- function_summary (&coverage);
- fnotice (stdout, "\n");
-   }
+
+  if (!flag_function_summary)
+   continue;
+
+  for (const block_info& block : fn->blocks)
+   for (arc_info *arc = block.succ; arc; arc = arc->succ_next)
+ add_branch_counts (&coverage, arc);
+
+  for (const block_info& block : fn->blocks)
+   add_condition_counts (&coverage, &block);
+
+  function_summary (&coverage);
+  fnotice (stdout, "\n");
 }
 
   name_map needle;
@@ -2764,6 +2772,36 @@ function_summary (const coverage_info *coverage)
 {
   fnotice (stdout, "%s '%s'\n", "Function", coverage->name);
   executed_summary (coverage->lines, coverage->lines_executed);
+
+  if (coverage->branches)
+{
+  fnotice (stdout, "Branches executed:%s of %d\n",
+  format_gcov (coverage->branches_executed, coverage->branches, 2),
+  coverage->branches);
+  fnotice (stdout, "Taken at least once:%s of %d\n",
+  format_gcov (coverage->branches_taken, coverage->branches, 2),
+   coverage->branches);
+}
+  else
+fnotice (stdout, "No branches\n");
+
+  if (coverage->calls)
+fnotice (stdout, "Calls executed:%s of %d\n",
+format_gcov (coverage->calls_executed, coverage->calls, 2),
+coverage->calls);
+  else
+fnotice (stdout, "No calls\n");
+
+  if (flag_conditions)
+{
+  if (coverage->conditions)
+   fnotice (stdout, "Condition outcomes covered:%s of %d\n",
+format_gcov (coverage->conditions_covered,
+ coverage->conditions, 2),
+coverage->conditions);
+  else
+   fnotice (stdout, "No conditions\n");
+}
 }
 
 /* Output summary info for a file.  */
-- 
2.39.2



[PATCH 1/3] gcov: Cache source files

2024-08-08 Thread Jørgen Kvalsvik
Cache the source files as they are read, rather than discarding them at
the end of output_lines (), and move the reading of the source file to
the new function slurp.

This patch does not really change anything other than moving the file
reading out of output_file, but it sets gcov up for more interaction
with the source file. The motivating example is reporting coverage on
functions from different source files, notably C++ headers and
__attribute__((always_inline)) functions.

Here is an example of what gcov does today:

hello.h:
inline __attribute__((always_inline))
int hello (const char *s)
{
  if (s)
printf ("hello, %s!\n", s);
  else
printf ("hello, world!\n");
  return 0;
}

hello.c:
int notmain(const char *entity)
{
  return hello (entity);
}

int main()
{
  const char *empty = 0;
  if (!empty)
hello (empty);
  else
puts ("Goodbye!");
}

$ gcov -abc hello
function notmain called 0 returned 0% blocks executed 0%
#:4:int notmain(const char *entity)
%:4-block 2
branch  0 never executed (fallthrough)
branch  1 never executed
-:5:{
#:6:  return hello (entity);
%:6-block 7
-:7:}

Clearly there is a branch in notmain, but the branch comes from the
inlining of hello. This is not very obvious from looking at the output.
Here is hello.h.gcov:

-:3:inline __attribute__((always_inline))
-:4:int hello (const char *s)
-:5:{
#:6:  if (s)
%:6-block 3
branch  0 never executed (fallthrough)
branch  1 never executed
%:6-block 2
branch  2 never executed (fallthrough)
branch  3 never executed
#:7:printf ("hello, %s!\n", s);
%:7-block 4
call0 never executed
%:7-block 3
call1 never executed
-:8:  else
#:9:printf ("hello, world!\n");
%:9-block 5
call0 never executed
%:9-block 4
call1 never executed
#:   10:  return 0;
%:   10-block 6
%:   10-block 5
-:   11:}

The blocks from the different call sites have all been interleaved.

The reporting could be tuned to list the inlined function, too, like
this:

1:4:int notmain(const char *entity)
-: == inlined from hello.h ==
1:6:  if (s)
branch  0 taken 0 (fallthrough)
branch  1 taken 1
#:7:printf ("hello, %s!\n", s);
%:7-block 3
call0 never executed
-:8:  else
1:9:printf ("hello, world!\n");
1:9-block 4
call0 returned 1
1:   10:  return 0;
1:   10-block 5
-: == inlined from hello.h (end) ==
-:5:{
1:6:  return hello (entity);
1:6-block 7
-:7:}

Implementing something to this effect relies on having the sources for
both files (hello.c, hello.h) available, which is what this patch sets
up.

Note that the previous reading code would leak the source file contents,
so explicitly storing them is not a huge departure, nor does it have
meaningful performance implications. I verified this with valgrind:

With slurp:

$ valgrind gcov ./hello
== == Memcheck, a memory error detector
== == Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
== == Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
== == Command: ./gcc/gcov demo
== ==
File 'hello.c'
Lines executed:100.00% of 4
Creating 'hello.c.gcov'

File 'hello.h'
Lines executed:75.00% of 4
Creating 'hello.h.gcov'
== ==
== == HEAP SUMMARY:
== == in use at exit: 84,907 bytes in 54 blocks
== ==   total heap usage: 254 allocs, 200 frees, 137,156 bytes allocated
== ==
== == LEAK SUMMARY:
== ==definitely lost: 1,237 bytes in 22 blocks
== ==indirectly lost: 562 bytes in 18 blocks
== ==  possibly lost: 0 bytes in 0 blocks
== ==still reachable: 83,108 bytes in 14 blocks
== ==   of which reachable via heuristic:
== == newarray   : 1,544 bytes in 1 blocks
== == suppressed: 0 bytes in 0 blocks
== == Rerun with --leak-check=full to see details of leaked memory
== ==
== == For lists of detected and suppressed errors, rerun with: -s
== == ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Without slurp:

$ valgrind gcov ./demo
== == Memcheck, a memory error detector
== == Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
== == Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
== == Command: ./gcc/gcov demo
== ==
File 'hello.c'
Lines executed:100.00% of 4
Creating 'hello.c.gcov'

File 'hello.h'
Lines executed:75.00% of 4
Creating 'hello.h.gcov'

Lines executed:87.50% of 8
== ==
== == HEAP SUMMARY:
== == in use at exit: 85,316 bytes in 82 blocks
== ==   total heap usage: 250 allocs, 168 frees, 137,084 bytes allocated
== ==
== == LEAK SUMMARY:
== ==definitely lost: 1,646 bytes in 50 blocks
== ==indirectly lost: 562 bytes in 18 blocks
== ==  possibly lost: 0 bytes in 0 blocks
==

[PATCH] tree-optimization/116258 - fix i386 testcase

2024-08-08 Thread Richard Biener
With -march=cascadelake we use vpermilps instead of shufps.

Tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/116258
* gcc.target/i386/pr116258.c: Also allow vpermilps.
---
 gcc/testsuite/gcc.target/i386/pr116258.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr116258.c 
b/gcc/testsuite/gcc.target/i386/pr116258.c
index bd7d3a97b2c..cb67e4085c5 100644
--- a/gcc/testsuite/gcc.target/i386/pr116258.c
+++ b/gcc/testsuite/gcc.target/i386/pr116258.c
@@ -10,5 +10,5 @@
   return (x + h(t));
 }
 
-/* { dg-final { scan-assembler-times "shufps" 1 } } */
+/* { dg-final { scan-assembler-times "shufps|permilps" 1 } } */
 /* { dg-final { scan-assembler-not "unpck" } } */
-- 
2.43.0


Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-08 Thread Martin Uecker
On Thursday, 2024-08-08 at 00:09 +0200, Alejandro Colomar wrote:
> Hi Martin,
> > 
...

> > > 
> > > I would personally prefer supporting [0], and consider that not
> > > supporting [*] is a bug in the implementation of [*] (and thus not my
> > > problem).
> > > 
> > > However, since GCC doesn't support 0-length arrays, I'm not sure that
> > > would be correct.
> > > 
> > > What do you think?
> > 
> > I think the logic in your patch is OK as is.  It does not exactly
> > what you want, as it now treats some [0] as [*] but I would not
> > make the logic more complex here when we will fix it properly
> > anyway.
> 
> I'm detecting some issues with my patches.
> 
>   $ cat zero.c
>   static int A[__lengthof__(int [0])];
>   static int B[__lengthof__(A)];
> 
>   static int C[0];
>   static int D[__lengthof__(C)];
> 
>   void fa(char (*a)[3][*], int (*x)[__lengthof__(*a)]);  // x: array
>   void fb(char (*a)[*][3], int (*x)[__lengthof__(*a)]);  // x: vla
>   void fc(char (*a)[3], int (*x)[__lengthof__(*a)]);  // x: array
>   void fd(char (*a)[0], int (*x)[__lengthof__(*a)]);  // x: ?
>   void fe(char (*a)[*], int (*x)[__lengthof__(*a)]);  // x: vla
>   void ff(char (*a)[*], int (*x)[*]);  // x: array
> 
> 
>   static int W[1];
>   static int X[__lengthof__(W)];
>   static int Y[0];
>   static int Z[__lengthof__(Y)];
> 
>   $ /opt/local/gnu/gcc/lengthof/bin/gcc zero.c
>   zero.c:18:12: error: variably modified ‘Z’ at file scope
>  18 | static int Z[__lengthof__(Y)];
> |^
> 
> 
> See that D, which is identical to Z, does not cause an error.
> There's one case of [0] resulting in a constant expression, and another
> in a VLA.  Can you please help investigate why it's happening?

This seems to be another bug where we incorrectly set
C_TYPE_VARIABLE_SIZE and this also affects sizeof:

https://godbolt.org/z/a8Ej6c5jr

Strangely, it seems related to the function declaration
with the unspecified size before.  I will look into this;
I am just working on some checking functions that make sure
those bits are consistent all the time, because I also
missed some cases where I need to set C_TYPE_VARIABLY_MODIFIED.

I filed a new bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116284

...

> |^
> 
> If I make [0] always result in a constant expression (and thus break
> some [*] cases), by doing
> 
>   -  var = var || (zero && C_TYPE_VARIABLE_SIZE (type));
> 
> Then the problem disappears.  But I'm worried that it might be hiding
> the problem instead of removing it, since I don't really understand why
> it's happening.  Do you know why?
> 
> Anyway, I'll remove that line to support [0].  But it would be
> interesting to learn why this problem triggers.

You need the line to support variable size arrays. Please just uncomment
your test with a reference to the bug for now and I will try to fix this ASAP.

Martin



> Alex
> 



Re: [PATCH] Support if conversion for switches

2024-08-08 Thread Richard Biener
On Wed, Aug 7, 2024 at 9:33 PM Andi Kleen  wrote:
>
> > > + /* Create chain of switch tests for each case.  */
> > > + tree switch_cond = NULL_TREE;
> > > + tree index = gimple_switch_index (sw);
> > > + for (unsigned i = 1; i < gimple_switch_num_labels (sw); i++)
> > > +   {
> > > + tree label = gimple_switch_label (sw, i);
> > > + tree case_cond;
> > > + /* This currently cannot happen because tree-cfg lowers 
> > > range
> > > +switches with a single destination to COND.  */
> >
> > But it should also lower non-range switches with a single destination ...?
> > See convert_single_case_switch.  You say
> >
> >   switch (i)
> > {
> > case 1:
> > case 5 ... 7:
> >   return 42;
> > default:
> >   return 0;
> > }
> >
> > doesn't hit here with a CASE_HIGH for the 5 ... 7 CASE_LABEL?
>
> Yes it can actually happen. I'll correct the comment/description
> and add a test case.
>
> But your comment made me realize there is a major bug.
>
> if_convertible_switch_p also needs to check that the labels don't fall
> through, so that the flow graph is diamond-shaped.  Need some easy way to
> verify that.

Do we verify this for if()s?  That is,

  if (i)
{
  ...
   goto fallthru;
}
  else
   {
fallthru:
 ...
   }

For ifs we seem to add the predicate to both edges even in the degenerate case.

>
> -Andi


Re: [PATCH] vect: Small C++11-ification of vect_vect_recog_func_ptrs

2024-08-08 Thread Richard Biener
On Thu, Aug 8, 2024 at 12:11 AM Andrew Pinski  wrote:
>
> This is a small C++11-ification for the use of vect_vect_recog_func_ptrs.
> It changes the loop into a range-based loop, which lets us remove the
> variable definition of NUM_PATTERNS.  It also uses a const reference
> instead of a pointer.
>
> Bootstrapped and tested on x86_64-linux-gnu.

OK

> gcc/ChangeLog:
>
> * tree-vect-patterns.cc (NUM_PATTERNS): Delete.
> (vect_pattern_recog_1): Constify and change
> recog_func to a reference.
> (vect_pattern_recog): Use range-based loop over
> vect_vect_recog_func_ptrs.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/tree-vect-patterns.cc | 12 +---
>  1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 87b3dc413b8..f52de2b6972 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -7362,8 +7362,6 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
>/* These must come after the double widening ones.  */
>  };
>
> -const unsigned int NUM_PATTERNS = ARRAY_SIZE (vect_vect_recog_func_ptrs);
> -
>  /* Mark statements that are involved in a pattern.  */
>
>  void
> @@ -7518,7 +7516,7 @@ vect_mark_pattern_stmts (vec_info *vinfo,
>
>  static void
>  vect_pattern_recog_1 (vec_info *vinfo,
> - vect_recog_func *recog_func, stmt_vec_info stmt_info)
> + const vect_recog_func &recog_func, stmt_vec_info 
> stmt_info)
>  {
>gimple *pattern_stmt;
>tree pattern_vectype;
> @@ -7538,7 +7536,7 @@ vect_pattern_recog_1 (vec_info *vinfo,
>  }
>
>gcc_assert (!STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
> -  pattern_stmt = recog_func->fn (vinfo, stmt_info, &pattern_vectype);
> +  pattern_stmt = recog_func.fn (vinfo, stmt_info, &pattern_vectype);
>if (!pattern_stmt)
>  {
>/* Clear any half-formed pattern definition sequence.  */
> @@ -7550,7 +7548,7 @@ vect_pattern_recog_1 (vec_info *vinfo,
>if (dump_enabled_p ())
>  dump_printf_loc (MSG_NOTE, vect_location,
>  "%s pattern recognized: %G",
> -recog_func->name, pattern_stmt);
> +recog_func.name, pattern_stmt);
>
>/* Mark the stmts that are involved in the pattern. */
>vect_mark_pattern_stmts (vinfo, stmt_info, pattern_stmt, pattern_vectype);
> @@ -7658,8 +7656,8 @@ vect_pattern_recog (vec_info *vinfo)
> continue;
>
>   /* Scan over all generic vect_recog_xxx_pattern functions.  */
> - for (unsigned j = 0; j < NUM_PATTERNS; j++)
> -   vect_pattern_recog_1 (vinfo, &vect_vect_recog_func_ptrs[j],
> + for (const auto &func_ptr : vect_vect_recog_func_ptrs)
> +   vect_pattern_recog_1 (vinfo, func_ptr,
>   stmt_info);
> }
>  }
> --
> 2.43.0
>


Re: sched1 pathology on RISC-V : PR/114729

2024-08-08 Thread Richard Biener
On Thu, Aug 8, 2024 at 12:17 AM Vineet Gupta  wrote:
>
> On 8/7/24 12:28, Jeff Law wrote:
> > On 8/7/24 11:47 AM, Richard Sandiford wrote:
> >> I should probably start by saying that the "model" heuristic is now
> >> pretty old and was originally tuned for an in-order AArch32 core.
> >> The aim wasn't to *minimise* spilling, but to strike a better balance
> >> between parallelising with spills vs. sequentialising.  At the time,
> >> scheduling without taking register pressure into account would overly
> >> parallelise things, whereas the original -fsched-pressure would overly
> >> serialise (i.e. was too conservative).
> >>
> >> There were specific workloads in, er, a formerly popular embedded
> >> benchmark that benefitted significantly from *some* spilling.
> >>
> >> This comment probably sums up the trade-off best:
> >>
> >> This pressure cost is deliberately timid.  The intention has been
> >> to choose a heuristic that rarely interferes with the normal list
> >> scheduler in cases where that scheduler would produce good code.
> >> We simply want to curb some of its worst excesses.
> >>
> >> Because it was tuned for an in-order core, it was operating in an
> >> environment where instruction latencies were meaningful and realistic.
> >> So it still deferred to those to quite a big extent.  This is almost
> >> certainly too conservative for out-of-order cores.
> > What's interesting here is that the increased spilling roughly doubles
> > the number of dynamic instructions we have to execute for the benchmark.
> >   While a good uarch design can hide a lot of that overhead, it's still
> > crazy bad.
>
> [snip...]
>
> >> ...I think for OoO cores, this:
> >>
> >> baseECC (X) could itself be used as the ECC value described above.
> >> However, this is often too conservative, in the sense that it
> >> tends to make high-priority instructions that increase pressure
> >> wait too long in cases where introducing a spill would be better.
> >> For this reason the final ECC is a priority-adjusted form of
> >> baseECC (X).  Specifically, we calculate:
> >>
> >>   P (X) = INSN_PRIORITY (X) - insn_delay (X) - baseECC (X)
> >>   baseP = MAX { P (X) | baseECC (X) <= 0 }
> >>
> >> Then:
> >>
> >>   ECC (X) = MAX (MIN (baseP - P (X), baseECC (X)), 0)
> >>
> >> Thus an instruction's effect on pressure is ignored if it has a high
> >> enough priority relative to the ones that don't increase pressure.
> >> Negative values of baseECC (X) do not increase the priority of X
> >> itself, but they do make it harder for other instructions to
> >> increase the pressure further.
> >>
> >> is probably not appropriate.  We should probably just use the baseECC,
> >> as suggested by the first sentence in the comment.  It looks like the hack:
> >>
> >> diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
> >> index 1bc610f9a5f..9601e929a88 100644
> >> --- a/gcc/haifa-sched.cc
> >> +++ b/gcc/haifa-sched.cc
> >> @@ -2512,7 +2512,7 @@ model_set_excess_costs (rtx_insn **insns, int count)
> >>  print_p = true;
> >>}
> >>  cost = model_excess_cost (insns[i], print_p);
> >> -if (cost <= 0)
> >> +if (cost <= 0 && 0)
> >>{
> >>  priority = INSN_PRIORITY (insns[i]) - insn_delay (insns[i]) - 
> >> cost;
> >>  priority_base = MAX (priority_base, priority);
> >> @@ -2525,6 +2525,7 @@ model_set_excess_costs (rtx_insn **insns, int count)
> >>
> >> /* Use MAX (baseECC, 0) and baseP to calculcate ECC for each
> >>instruction.  */
> >> +  if (0)
> >> for (i = 0; i < count; i++)
> >>   {
> >> cost = INSN_REG_PRESSURE_EXCESS_COST_CHANGE (insns[i]);
> >>
> >> fixes things for me.  Perhaps we should replace these && 0s
> >> with a query for an out-of-order core?
>
> Yes, removing this heuristic does improve things, but unfortunately it seems
> there's more in sched1 that needs unraveling - Jeff is right after all :-)
>
> Function (dynamic insn count)                                       |  upstream | -fno-schedule-insns |   Patch |
> _ZL24ML_BSSN_Dissipation_BodyPK4_cGHiiPKdS3_S3_PKiS5_iPKPd.lto_priv |    55,702 |              43,132 |  45,788 |
> _ZL19ML_BSSN_Advect_BodyPK4_cGHiiPKdS3_S3_PKiS5_iPKPd.lto_priv      |   144,278 |              59,204 | 132,588 |
> _ZL24ML_BSSN_constraints_BodyPK4_cGHiiPKdS3_S3_PKiS5_iPKPd.lto_priv |   321,476 |             138,074 | 253,206 |
> _ZL16ML_BSSN_RHS_BodyPK4_cGHiiPKdS3_S3_PKiS5_iPKPd.lto_priv         |   483,794 |             179,694 | 360,286 |

Note even on x86 we spill like crazy in these functions - we are
dealing with >16 memory streams here so it is
inevitable that spilling is necessary with the tendency to hoist loads
and sink stores.

So this

Re: [PATCH] ada: Fix s-taprop__solaris.adb compilation

2024-08-08 Thread Marc Poulhiès
Rainer Orth  writes:

Hello,

> Solaris Ada bootstrap is broken as of 2024-08-06 with
>
> s-taprop.adb:1971:23: error: "int" is not visible
> s-taprop.adb:1971:23: error: multiple use clauses cause hiding
> s-taprop.adb:1971:23: error: hidden declaration at s-osinte.ads:51
> s-taprop.adb:1971:23: error: hidden declaration at i-c.ads:62
>
> because one instance of int isn't qualified.  This patch fixes this.
>
> Bootstrapped without regressions on i386-pc-solaris2.11 and
> sparc-sun-solaris2.11.
>
> Ok for trunk?

Yes, thanks!

Marc


Re: [PATCH] doc: move the cross reference for -fprofile-arcs to the right paragraph

2024-08-08 Thread Richard Biener
On Thu, Aug 8, 2024 at 4:03 AM Wentao Zhang  wrote:
>
> The referenced page contains more explanation of auxname.gcda produced
> by gcov profiler, which is a continuation of -fprofile-arcs's
> description.

OK

> gcc/ChangeLog:
>
> * doc/invoke.texi (Instrumentation Options): Move the cross
> reference of "Cross-profiling" under the description for flag
> "-fprofile-arcs".
> ---
>  gcc/doc/invoke.texi | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 27539a017..cd10d6cd5 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -17328,6 +17328,8 @@ Note that if a command line directly links source 
> files, the corresponding
>  E.g. @code{gcc a.c b.c -o binary} would generate @file{binary-a.gcda} and
>  @file{binary-b.gcda} files.
>
> +@xref{Cross-profiling}.
> +
>  @item -fcondition-coverage
>  @opindex fcondition-coverage
>  Add code so that program conditions are instrumented.  During execution the
> @@ -17336,8 +17338,6 @@ can be used to verify that all terms in a Boolean 
> function are tested and have
>  an independent effect on the outcome of a decision.  The result can be read
>  with @code{gcov --conditions}.
>
> -@xref{Cross-profiling}.
> -
>  @cindex @command{gcov}
>  @opindex coverage
>  @item --coverage
> --
> 2.34.1
>


Re: [PATCH] Ada, libgnarl: Fix s-taprop__posix.adb compilation.

2024-08-08 Thread Marc Poulhiès
Iain Sandoe  writes:

Hello,

> Tested on x86_64-darwin21, OK for trunk?

Yes, thanks!
Marc


Re: [PATCH v3] diagnostics: Follow DECL_ORIGIN in lhd_print_error_function [PR102061]

2024-08-08 Thread Richard Biener
On Thu, Aug 8, 2024 at 4:55 AM Peter Damianov  wrote:
>
> Currently, if a warning references a cloned function, the name of the cloned
> function will be emitted in the "In function 'xyz'" part of the diagnostic,
> which users aren't supposed to see. This patch follows the DECL_ORIGIN link
> to get the name of the original function, so the internal compiler details
> aren't exposed.

Note I see an almost exact copy of the function in cp/error.cc as
cp_print_error_function (possibly more modern), specifically using

  pp_printf (context->printer, function_category (fndecl),
 fndecl);

which ends up using %qD.

I've CCed David who likely invented diagnostic_abstract_origin and friends.

> gcc/ChangeLog:
> PR diagnostics/102061
> * langhooks.cc (lhd_print_error_function): Follow DECL_ORIGIN
> links.
> * gcc.dg/pr102061.c: New testcase.
>
> Signed-off-by: Peter Damianov 
> ---
> v3: also follow DECL_ORIGIN when emitting "inlined from" warnings, I missed 
> this before.
> Add testcase.
>
>  gcc/langhooks.cc|  3 +++
>  gcc/testsuite/gcc.dg/pr102061.c | 35 +
>  2 files changed, 38 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr102061.c
>
> diff --git a/gcc/langhooks.cc b/gcc/langhooks.cc
> index 61f2b676256..7a2a66b3c39 100644
> --- a/gcc/langhooks.cc
> +++ b/gcc/langhooks.cc
> @@ -395,6 +395,8 @@ lhd_print_error_function (diagnostic_context *context, 
> const char *file,
>   else
> fndecl = current_function_decl;
>
> + fndecl = DECL_ORIGIN(fndecl);

Space after DECL_ORIGIN.  There's a comment warranted for what we
intend do to here.

I think this change is reasonable.

> +
>   if (TREE_CODE (TREE_TYPE (fndecl)) == METHOD_TYPE)
> pp_printf
>   (context->printer, _("In member function %qs"),
> @@ -439,6 +441,7 @@ lhd_print_error_function (diagnostic_context *context, 
> const char *file,
> }
>   if (fndecl)
> {
> + fndecl = DECL_ORIGIN(fndecl);

Space missing again.

This change OTOH might cause us to print

inlined from foo at ...
inlined from foo at ...

so duplicating an inline frame, for example when we split a function and then
inline both parts, or when we inline an IPA-CP forwarder and the specific
clone. It's not obvious what we should do here since of course for a recursive
function we can have a function inlined two times in a row.

The testcase only triggers the first case, right?

David, any comments?  I think the patch is OK with the formatting fixed.

Thanks,
Richard.

>   expanded_location s = expand_location (*locus);
>   pp_comma (context->printer);
>   pp_newline (context->printer);
> diff --git a/gcc/testsuite/gcc.dg/pr102061.c b/gcc/testsuite/gcc.dg/pr102061.c
> new file mode 100644
> index 000..dbdd23965e7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr102061.c
> @@ -0,0 +1,35 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Wall -O2" } */
> +/* { dg-message "inlined from 'bar'" "" { target *-*-* } 0 } */
> +/* { dg-excess-errors "" } */
> +
> +static inline void
> +foo (char *p)
> +{
> +  __builtin___memcpy_chk (p, "abc", 3, __builtin_object_size (p, 0));
> +}
> +static void
> +bar (char *p) __attribute__((noinline));
> +static void
> +bar (char *p)
> +{
> +  foo (p);
> +}
> +void f(char*) __attribute__((noipa));
> +char buf[2];
> +void
> +baz (void) __attribute__((noinline));
> +void
> +baz (void)
> +{
> +  bar (buf);
> +  f(buf);
> +}
> +
> +void f(char*)
> +{}
> +
> +int main(void)
> +{
> +baz();
> +}
> --
> 2.39.2
>


Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-08 Thread Alejandro Colomar
Hello Jens,

On Thu, Aug 08, 2024 at 07:35:12AM GMT, Jₑₙₛ Gustedt wrote:
> Hello Alejandro,
> 
> On Thu, 8 Aug 2024 00:44:02 +0200, Alejandro Colomar wrote:
> 
> > +Its syntax is similar to @code{sizeof}.
> 
> For my curiosity, do you also make the same distinction that with
> expressions you may omit the parenthesis?

I thought of it.  TBH, I haven't tested that thoroughly.

In principle, I have implemented it in the same way as sizeof, yes.

Personally, I would have never allowed sizeof without parentheses, but I
understand there are people who think the parentheses hurt readability,
so I kept it in the same way.

I'm not sure why the parentheses are necessary with type names in
sizeof, but to maintain expectations, I think it would be better to do
the same here.

> 
> I wouldn't be sure that we should continue that distinction from
> `sizeof`.

But then, what do we do?  Allow lengthof with type names without parens?
Or require parens?  I'm not comfortable with that choice.

> Also that prefix variant would be difficult to wrap in a
> `lengthof` macro (without underscores) as we would probably like to
> have it in the end.

Do you mean that I should add _Lengthof?  We're adding __lengthof__ to
be a GNU extension with relative freedom from ISO.  If I sent a patch
adding _Lengthof, we'd have to send a proposal to ISO at the same time,
and we'd be waiting for ISO to discuss it before I can merge it.  And we
couldn't bring prior art to ISO.

With this approach instead, the plan is:

-  Merge __lengthof__ in GCC before ISO hears of it (well, there are
   already several WG14 members in this discussion, so you have actually
   heard of it, but we're free to do more or less what we want).

-  Propose _Lengthof to ISO C, with prior art in GCC as __lengthof__,
   proposing the same semantics.  Also propose a lengthof macro defined
   in 

-  When ISO C accepts _Lengthof and lengthof, map _Lengthof in GCC to
   the same internals as __lengthof__, so they are the same thing.

Still, I'm interested in having some feedback from WG14, to prevent
implementing something that will have modifications when merged to
ISO C, so please CC anyone interested from WG14, if you know of any.

Have a lovely day!
Alex

-- 





[x86 PATCH] Tweak ix86_mode_can_transfer_bits to restore bootstrap on RHEL.

2024-08-08 Thread Roger Sayle

This minor patch, very similar to one posted and approved previously at
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657229.html is
required to restore builds on systems using gcc 4.8 as a host compiler.
Using the enumeration constants E_SFmode and E_DFmode avoids issues with
SFmode and DFmode being "non-literal types in constant expressions".

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, with no new failures.  Ok for mainline?


2024-08-08  Roger Sayle  

gcc/ChangeLog
* config/i386/i386.cc (ix86_mode_can_transfer_bits): Use E_?Fmode
enumeration constants in switch statement.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 8f289b5..02e2829 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -26113,8 +26113,8 @@ ix86_mode_can_transfer_bits (machine_mode mode)
   || GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT)
 switch (GET_MODE_INNER (mode))
   {
-  case SFmode:
-  case DFmode:
+  case E_SFmode:
+  case E_DFmode:
/* These suffer from normalization upon load when not using SSE.  */
return !(ix86_fpmath & FPMATH_387);
   default:


Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-08 Thread Alejandro Colomar
Hi Martin,

On Thu, Aug 08, 2024 at 09:39:59AM GMT, Martin Uecker wrote:
> > $ /opt/local/gnu/gcc/lengthof/bin/gcc zero.c
> > zero.c:18:12: error: variably modified ‘Z’ at file scope
> >18 | static int Z[__lengthof__(Y)];
> >   |^
> > 
> > 
> > See that D, which is identical to Z, does not cause an error.
> > There's one case of [0] resulting in a constant expression, and another
> > in a VLA.  Can you please help investigate why it's happening?
> 
> This seems to be another bug where we incorrectly set
> C_TYPE_VARIABLE_SIZE and this also affects sizeof:
> 
> https://godbolt.org/z/a8Ej6c5jr
> 
> Strangely it seems related to the function declaration
> with the unspecified size before.  I will look into this,
> I am just working on some checking functions that make sure
> that those bits are consistent all the time because I also
> missed some cases where I need to set C_TYPE_VARIABLY_MODIFIED
> 
> I filed a new bug:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116284

Huh, that's obscure!  Thanks!  :-)

> 
> ...
> 
> >   |^
> > 
> > If I make [0] always result in a constant expression (and thus break
> > some [*] cases), by doing
> > 
> > -  var = var || (zero && C_TYPE_VARIABLE_SIZE (type));
> > 
> > Then the problem disappears.  But I'm worried that it might be hiding
> > the problem instead of removing it, since I don't really understand why
> > it's happening.  Do you know why?
> > 
> > Anyway, I'll remove that line to support [0].  But it would be
> > interesting to learn why this problem triggers.
> 
> You need the line to support variable size arrays.

Not really.  'zero' is only true for [0] and for [*], but not for
[zero], right?

All vla tests seem to pass if I remove that line.  The only issue will
be that

void f(char (*a)[*], int (*x)[__lengthof__(*a)]);

will result in 'int (*x)[0]' until you change the implementation of [*],
but I think we can live with that small detail.
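For comparison, the classic sizeof-based idiom that `__lengthof__` is meant to replace also yields a constant expression usable at file scope; a small sketch (the `lengthof` macro below is the usual illustration, not part of the patch):

```cpp
#include <cstddef>

// Classic sizeof-based length idiom; unlike the proposed __lengthof__
// operator, this macro silently misfires when handed a pointer.
#define lengthof(a) (sizeof (a) / sizeof ((a)[0]))

static int Y[5];
static int Z[lengthof (Y)];  // constant expression, usable at file scope
```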

> Please just  uncomment
> your test with a reference to the bug for now and I will try fix this ASAP.

I'll send v6 in a moment; feel free to insist in this if you disagree
after seeing it, but I think it works well without the line.

> 
> Martin

Cheers,
Alex

-- 





[PATCH] c++: Attempt to implement C++26 P3034R1 - Module Declarations Shouldn't be Macros [PR114461]

2024-08-08 Thread Jakub Jelinek
Hi!

This is an attempt to implement the https://wg21.link/p3034r1 paper,
but I'm afraid the wording in the paper is bad for multiple reasons.
I think I understand the intent, that the module name and partition
if any shouldn't come from macros so that they can be scanned for
without preprocessing, but on the other side doesn't want to disable
macro expansion in pp-module altogether, because e.g. the optional
attribute in module-declaration would be nice to come from macros
as which exact attribute is needed might need to be decided based on
preprocessor checks.
The paper added https://eel.is/c++draft/cpp.module#2
which uses partly the wording from https://eel.is/c++draft/cpp.module#1

The first issue I see is that using that "defined as an object-like macro"
from there means IMHO something very different in those 2 paragraphs.
As per https://eel.is/c++draft/cpp.pre#7.sentence-1 preprocessing tokens
in preprocessing directives aren't subject to macro expansion unless
otherwise stated, and so the export and module tokens aren't expanded
and so the requirement that they aren't defined as an object-like macro
makes perfect sense.  The problem with the new paragraph is that
https://eel.is/c++draft/cpp.module#3.sentence-1 says that the rest of
the tokens are macro expanded and after macro expansion none of the
tokens can be defined as an object-like macro; if they were, they'd
have been expanded to that.  So, I think either the wording needs to change
such that not all preprocessing tokens after module are macro expanded,
only those which are after the pp-module-name and if any pp-module-partition
tokens, or all tokens after module are macro expanded but none of the tokens in
pp-module-name and pp-module-partition if any must come from macro
expansion.  The patch below implements it as if the former would be
specified (but see later), so essentially scans the preprocessing tokens
after module without expansion, if the first one is an identifier, it
disables expansion for it and then if followed by . or : expects another
such identifier (again with disabled expansion), but stops after second
: is seen.
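As a toy illustration of that scanning rule (a hypothetical recognizer over pre-split tokens, not the actual libcpp implementation): after `module`, accept identifier ('.' identifier)* with at most one ':' partition, and stop (re-enabling macro expansion) at the second ':' or any token that doesn't fit the grammar.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: count how many leading tokens belong to the
// pp-module-name / pp-module-partition region, i.e. how many tokens
// are scanned with macro expansion disabled.
std::size_t
module_name_tokens (const std::vector<std::string> &toks)
{
  auto is_ident = [] (const std::string &t)
  {
    if (t.empty () || (!std::isalpha ((unsigned char) t[0]) && t[0] != '_'))
      return false;
    for (char c : t)
      if (!std::isalnum ((unsigned char) c) && c != '_')
	return false;
    return true;
  };

  std::size_t i = 0;
  bool seen_colon = false;
  while (i < toks.size () && is_ident (toks[i]))
    {
      ++i;
      if (i < toks.size () && toks[i] == ".")
	{ ++i; continue; }          // dotted module name continues
      if (i < toks.size () && toks[i] == ":" && !seen_colon)
	{ seen_colon = true; ++i; continue; }  // partition starts
      break;                        // anything else: expansion resumes
    }
  return i;
}
```

Note that for `module : private ;` (the private-module-fragment case from the second issue) this recognizer matches nothing, which is exactly the grammar gap described above.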

Second issue is that while the global-module-fragment start is fine, matches
the syntax of the new paragraph where the pp-tokens[opt] aren't present,
there is also private-module-fragment in the syntax where module is
followed by : private ; and in that case the colon doesn't match the
pp-module-name grammar and appears now to be invalid.  I think the
https://eel.is/c++draft/cpp.module#2
paragraph needs to change so that it allows also that pp-tokens of
a pp-module may also be : pp-tokens[opt] (and in that case, I think
the colon shouldn't come from a macro and private and/or ; can).

Third issue is that there are too many pp-tokens in
https://eel.is/c++draft/cpp.module , one is all the tokens between
module keyword and the semicolon and one is the optional extra tokens
after pp-module-partition (if any, if missing, after pp-module).
Perhaps introducing some other non-terminal would help talking about it?
So in "where the pp-tokens (if any) shall not begin with a ( preprocessing
token" it isn't obvious which pp-tokens it is talking about (my assumption
is the latter) and also whether ( can't appear there just before macro
expansion or also after expansion.  The patch expects only before expansion,
so
#define F ();
export module foo F
would be valid during preprocessing but obviously invalid during
compilation, but
#define foo(n) n;
export module foo (3)
would be invalid already during preprocessing.

The last issue applies only if the first issue is resolved to allow
expansion of tokens after : if first token, or after pp-module-partition
if present or after pp-module-name if present.  When non-preprocessing
scanner sees
export module foo.bar:baz.qux;
it knows nothing can come from preprocessing macros and is ok, but if it
sees
export module foo.bar:baz qux
then it can't know whether it will be
export module foo.bar:baz;
or
export module foo.bar:baz [[]];
or
export module foo.bar:baz.freddy.garply;
because qux could be validly a macro, which expands to ; or [[]];
or .freddy.garply; etc.  So, either the non-preprocessing scanner would
need to note it as possible export of foo.bar:baz* module partitions
and preprocess if it needs to know the details or just compile, or if that
is not ok, the wording would need to rule out that the expansion of (the
second) pp-tokens if any can't start with . or : (colon would be only
problematic if it isn't present in the tokens before it already).
So, if e.g. defining qux above to . whatever is invalid, then the scanner
can rely it sees the whole module name and partition.

The patch below implements what is above described as the first variant
of the first issue resolution, i.e. disables expansion of as many tokens
as could be in the valid module name and module partition syntax, but
as soon as it e.g. sees two adjacent identifiers, the second one can be
macro expanded.  So, effecti

Re: [x86 PATCH] Tweak ix86_mode_can_transfer_bits to restore bootstrap on RHEL.

2024-08-08 Thread Uros Bizjak
On Thu, Aug 8, 2024 at 10:28 AM Roger Sayle  wrote:
>
>
> This minor patch, very similar to one posted and approved previously at
> https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657229.html is
> required to restore builds on systems using gcc 4.8 as a host compiler.
> Using the enumeration constants E_SFmode and E_DFmode avoids issues with
> SFmode and DFmode being "non-literal types in constant expressions".
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, with no new failures.  Ok for mainline?
>
>
> 2024-08-08  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.cc (ix86_mode_can_transfer_bits): Use E_?Fmode
> enumeration constants in switch statement.

OK, also as an obvious patch.

Thanks,
Uros.

>
>
> Thanks in advance,
> Roger
> --
>


[committed] libgomp.c++/static-aggr-constructor-destructor-{1,2}.C: Fix scan-tree-dump (was: [r15-2799 Regression] FAIL: libgomp.c++/static-aggr-constructor-destructor-2.C scan-tree-dump-times optimiz

2024-08-08 Thread Tobias Burnus

haochen.jiang wrote:

FAIL: libgomp.c++/static-aggr-constructor-destructor-1.C scan-tree-dump-times optimized 
"__attribute__\\(\\([^\n\r]*omp declare target nohost" 1
FAIL: libgomp.c++/static-aggr-constructor-destructor-1.C scan-tree-dump-times optimized 
"void _GLOBAL__off_I_v1" 1


Those symbols are generated even with ENABLE_OFFLOADING == false, but in 
that case they are optimized away (as they should be).


With offloading, the pass removing them comes too late, but we should 
handle 'nohost' explicitly. Once done, the dump will be the same (no 
symbol). Until this is implemented, we now do:


To make this test pass, we now use 'target (!) offload_target_any' to 
separate the cases, even though offload_target_any does not completely 
match ENABLE_OFFLOADING.*


Committed as r15-2814-ge3a6dec326a127

Tobias

(* If you configured with --enable-offload-defaulted and have no offload 
binaries available or when you smuggle '-foffload=disable' to the 
commandline, ENABLE_OFFLOADING is true while offload_target_any is false.)
commit e3a6dec326a127ad549246435b9d3835e9a32407
Author: Tobias Burnus 
Date:   Thu Aug 8 10:42:25 2024 +0200

libgomp.c++/static-aggr-constructor-destructor-{1,2}.C: Fix scan-tree-dump

In principle, the optimized dump should be the same on the host, but as
'nohost' is not handled, it is present.  However, when ENABLE_OFFLOADING is
false, it is handled early enough to remove the function.

libgomp/ChangeLog:

* testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C: Split
scan-tree-dump into with and without target offload_target_any.
* testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C:
Likewise.
---
 .../libgomp.c++/static-aggr-constructor-destructor-1.C   | 15 ---
 .../libgomp.c++/static-aggr-constructor-destructor-2.C   | 16 +---
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C b/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C
index 403a071c0c0..b5aafc8cabc 100644
--- a/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C
+++ b/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C
@@ -9,9 +9,18 @@
 
 // { dg-final { scan-tree-dump-not "omp_is_initial_device" "optimized" } }
 // { dg-final { scan-tree-dump-not "__omp_target_static_init_and_destruction" "optimized" } }
-// FIXME: should be '-not' not '-times' 1:
-// { dg-final { scan-tree-dump-times "void _GLOBAL__off_I_v1" 1 "optimized" } }
-// { dg-final { scan-tree-dump-times "__attribute__\\(\\(\[^\n\r]*omp declare target nohost" 1 "optimized" } }
+
+// (A) No offloading configured: The symbols aren't present
+// Caveat: They are present with -foffload=disable - or offloading
+// configured but none of the optional offload packages/binaries installed.
+// But the 'offload_target_any' check cannot distinguish those
+// { dg-final { scan-tree-dump-not "void _GLOBAL__off_I_v1" "optimized" { target { ! offload_target_any } } } }
+// { dg-final { scan-tree-dump-not "__attribute__\\(\\(\[^\n\r]*omp declare target nohost" "optimized" { target { ! offload_target_any } } } }
+
+// (B) With offload configured (and compiling for an offload target)
+// the symbols are present (missed optimization). Hence: FIXME.
+// { dg-final { scan-tree-dump-times "void _GLOBAL__off_I_v1" 1 "optimized" { target offload_target_any } } }
+// { dg-final { scan-tree-dump-times "__attribute__\\(\\(\[^\n\r]*omp declare target nohost" 1 "optimized" { target offload_target_any } } }
 
 // { dg-final { only_for_offload_target amdgcn-amdhsa scan-offload-tree-dump-not "omp_initial_device;" "optimized" { target offload_target_amdgcn } } }
 // { dg-final { only_for_offload_target amdgcn-amdhsa scan-offload-tree-dump "v1\\._x = 5;" "optimized" { target offload_target_amdgcn } } }
diff --git a/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C b/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C
index 6dd4260a522..9652a721bbe 100644
--- a/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C
+++ b/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C
@@ -9,9 +9,19 @@
 
 // { dg-final { scan-tree-dump-not "omp_is_initial_device" "optimized" } }
 // { dg-final { scan-tree-dump-not "__omp_target_static_init_and_destruction" "optimized" } }
-// FIXME: should be '-not' not '-times' 1:
-// { dg-final { scan-tree-dump-times "void _GLOBAL__off_I_" 1 "optimized" } }
-// { dg-final { scan-tree-dump-times "__attribute__\\(\\(\[^\n\r]*omp declare target nohost" 1 "optimized" } }
+
+// (A) No offloading configured: The symbols aren't present
+// Caveat: They are present with -foffload=disable - or offloading
+// configured but none of the optional offload packages/binaries installed.
+// But the 'offload_target_any' check cannot distinguish those

PING [PATCH v2 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert pass

2024-08-08 Thread Kong, Lingling
Hi,

Gently ping.

Thanks,
Lingling
From: Kong, Lingling 
Sent: Tuesday, June 25, 2024 2:46 PM
To: gcc-patches@gcc.gnu.org
Cc: Alexander Monakov ; Uros Bizjak ; 
lingling.ko...@gmail.com; Hongtao Liu ; Jeff Law 
; Richard Biener 
Subject: RE: [PATCH v2 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert pass

Hi,


Gently ping for this.

This version has removed the target hook and added a new optab for cfcmov.

Thanks,
Lingling

From: Kong, Lingling <lingling.k...@intel.com>
Sent: Tuesday, June 18, 2024 3:41 PM
To: gcc-patches@gcc.gnu.org
Cc: Alexander Monakov <amona...@ispras.ru>; Uros Bizjak <ubiz...@gmail.com>;
lingling.ko...@gmail.com; Hongtao Liu <crazy...@gmail.com>; Jeff Law
<jeffreya...@gmail.com>; Richard Biener <richard.guent...@gmail.com>
Subject: [PATCH v2 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert pass


The APX CFCMOV feature implements conditionally faulting loads and stores:
all memory faults are suppressed when the condition code evaluates to false
while a memory operand is loaded or stored.  This makes it possible to
if-convert a conditional move whose memory operand may trap or fault.

In the middle-end we currently don't support a conditional move if we know
that a load from A or B could trap or fault.  To enable CFCMOV, we added
a new optab.

A conditional suppress-fault store does not move any arithmetic
calculations.  For conditional loads, only the combination of one
potentially trapping MEM with one non-trapping, non-MEM operand is
supported.


gcc/ChangeLog:

	* ifcvt.cc (noce_try_cmove_load_mem_notrap): Allow convert
	to cfcmov for conditional load.
	(noce_try_cmove_store_mem_notrap): Convert to conditional store.
	(noce_process_if_block): Ditto.
	* optabs.def (OPTAB_D): New optab.
---
 gcc/ifcvt.cc   | 246 -
 gcc/optabs.def |   1 +
 2 files changed, 246 insertions(+), 1 deletion(-)



diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 58ed42673e5..65c069b8cc6 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -783,6 +783,8 @@ static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx,
			    rtx, rtx, rtx, rtx = NULL, rtx = NULL);
 static bool noce_try_cmove (struct noce_if_info *);
 static bool noce_try_cmove_arith (struct noce_if_info *);
+static bool noce_try_cmove_load_mem_notrap (struct noce_if_info *);
+static bool noce_try_cmove_store_mem_notrap (struct noce_if_info *, rtx *, rtx);
 static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
 static bool noce_try_minmax (struct noce_if_info *);
 static bool noce_try_abs (struct noce_if_info *);
@@ -2401,6 +2403,233 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   return false;
 }
 
+/* When target support suppress memory fault, try more complex cases involving
+   conditional_move's source or dest may trap or fault.  */
+
+static bool
+noce_try_cmove_load_mem_notrap (struct noce_if_info *if_info)
+{
+  rtx a = if_info->a;
+  rtx b = if_info->b;
+  rtx x = if_info->x;
+
+  if (MEM_P (x))
+    return false;
+  /* Just handle a conditional move from one trap MEM + other non_trap,
+     non mem cases.  */
+  if (!(MEM_P (a) ^ MEM_P (b)))
+    return false;
+  bool a_trap = may_trap_or_fault_p (a);
+  bool b_trap = may_trap_or_fault_p (b);
+
+  if (!(a_trap ^ b_trap))
+    return false;
+  if (a_trap && !MEM_P (a))
+    return false;
+  if (b_trap && !MEM_P (b))
+    return false;
+
+  rtx orig_b;
+  rtx_insn *insn_a, *insn_b;
+  bool a_simple = if_info->then_simple;
+  bool b_simple = if_info->else_simple;
+  basic_block then_bb = if_info->then_bb;
+  basic_block else_bb = if_info->else_bb;
+  rtx target;
+  enum rtx_code code;
+  rtx cond = if_info->cond;
+  rtx_insn *ifcvt_seq;
+
+  /* if (test) x = *a; else x = c - d;
+     => x = c - d;
+	if (test)
+	  x = *a;
+  */
+
+  code = GET_CODE (cond);
+  insn_a = if_info->insn_a;
+  insn_b = if_info->insn_b;
+  machine_mode x_mode = GET_MODE (x);
+
+  /* Because we only handle one trap MEM + other non_trap, non mem cases,
+     just move one trap MEM always in then_bb.  */
+  if (noce_reversed_cond_code (if_info) != UNKNOWN)
+    {
+      bool reversep = false;
+      if (b_trap)
+	reversep = true;
+
+      if (reversep)
+	{
+	  if (if_info->rev_cond)
+	    {
+	      cond = if_info->rev_cond;
+	      code = GET_CODE (cond);
+	    }
+	  else
+	    code = reversed_comparison_code (cond, if_info->jump);
+	  std::swap (a, b);
+	  std::swap (insn_a, insn_b);
+	  std::swap (a_simple, b_simple);
+	  std::swap (then_bb, else_bb);
+

Re: [PATCH 0/8] fortran: Inline MINLOC/MAXLOC without DIM argument [PR90608]

2024-08-08 Thread Mikael Morin

Le 07/08/2024 à 12:03, Harald Anlauf a écrit :

Hi Mikael, Thomas!

On 07.08.24 at 11:11, Mikael Morin wrote:

Hello,

Le 06/08/2024 à 22:57, Thomas Koenig a écrit :

Hi Mikael and Harald,


- inline expansion is inhibited at -Os.  But wouldn't it be good if
   we make this expansion also dependent on -ffrontend-optimize?
   (This was the case for rank-1 before your patch).



By the way, I disabled the minmaxloc frontend optimization without too
much thought, because it was preventing me from seeing the effects of my
patches in the dumps.  Now that both of you have put some focus on it, I
think the optimization should be completely removed instead, because the
patches make it unreachable.


The original idea was to have -ffrontend-optimize as a check if anything
went wrong with front-end optimization in particular - if the bug went
away with -fno-frontend-optimize, we knew where to look (and I knew
I had to look).


It also provides a way for users to workaround bugs in frontend
optimizations.  If inline expansion were dependent on the flag, it would
also provide the same benefit, but it would be using the flag outside of
its intended scope, so I would rather not do it.


So, probably better to not do this at -Os.  One thought: Should we
also do the inlining without optimization?


At -Os: no inline expansion.  Don't we all agree on that?
I'm fine with also disabling expansion at -O0.


The following change to patch 2/8 does what I had in mind:

diff --git a/gcc/fortran/trans-intrinsic.cc 
b/gcc/fortran/trans-intrinsic.cc

index 9f3c3ce47bc..cc0d00f4e39 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -11650,6 +11650,29 @@ gfc_inline_intrinsic_function_p (gfc_expr *expr)
  case GFC_ISYM_TRANSPOSE:
    return true;

+    case GFC_ISYM_MINLOC:
+    case GFC_ISYM_MAXLOC:
+  {
+    /* Disable inline expansion if code size matters.  */
+    if (optimize_size)
+  return false;

 /* Disable inline expansion if frontend optimization is disabled.  */
 if (!flag_frontend_optimize)
   return false;


As a result, the following happens:

- at -Os, inlining will never happen (as you had it)
- at -O0, the default is -fno-frontend-optimize, and we get the
   library implementation.  Inlining is forced with -ffrontend-optimize.
- at higher -Ox, the default is -ffrontend-optimize.

I believe this is also what Thomas' original motivation was.

(This flag actually helps to see that the inlining code in gcc-14
is currently broken for minloc/maxloc and optional back argument.)

As we are not planning to remove the library implementation (-Os!),
this is also the best way to compare library to inline code.


This makes perfect sense, but why reuse the -ffrontend-optimize option?
The manual describes it as:

This option performs front-end optimization, based on manipulating parts of
the Fortran parse tree


These patches are about inlining, there is no manipulation of the parse 
tree.  So I would rather use a separate option (-finline-intrinsics?).


Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-08 Thread Jens Gustedt
Hi

On 8 August 2024 10:26:14 CEST, Alejandro Colomar wrote:
> Hello Jens,
> 
> On Thu, Aug 08, 2024 at 07:35:12AM GMT, Jₑₙₛ Gustedt wrote:
> > Hello Alejandro,
> > 
> > On Thu, 8 Aug 2024 00:44:02 +0200, Alejandro Colomar wrote:
> > 
> > > +Its syntax is similar to @code{sizeof}.
> > 
> > For my curiosity, do you also make the same distinction that with
> > expressions you may omit the parenthesis?
> 
> I thought of it.  TBH, I haven't tested that thoroughly.
> 
> In principle, I have implemented it in the same way as sizeof, yes.
> 
> Personally, I would have never allowed sizeof without parentheses, but I
> understand there are people who think the parentheses hurt readability,
> so I kept it in the same way.
> 
> I'm not sure why the parentheses are necessary with type names in
> sizeof,

Probably because of operator precedence: there would be no rule telling us
where sizeof ends, and we'd switch back from parsing a type to parsing an
expression.
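For illustration, the two forms side by side (a hypothetical snippet, not from the patch): `sizeof int * p` would be ambiguous between `sizeof (int *)` and `sizeof (int) * p`, which is why the type-name form requires parentheses while the expression form doesn't.

```cpp
#include <cstddef>

// Expression operand: no parentheses required; sizeof binds to x.
// Type-name operand: parentheses are mandatory, resolving where the
// type ends and the surrounding expression resumes.
std::size_t
demo ()
{
  int x = 0;
  std::size_t a = sizeof x;      // expression form
  std::size_t b = sizeof (int);  // type form
  return a + b;
}
```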


> but to maintain expectations, I think it would be better to do
> the same here.

Just to compare, the recent additions in C23 typeof etc. only have the 
parenthesized versions. So there would be precedent. And it really eases 
transition


> > 
> > I wouldn't be sure that we should continue that distinction from
> > `sizeof`.
> 
> But then, what do we do?  Allow lengthof with type names without parens?
> Or require parens?  I'm not comfortable with that choice.
> 
> > Also that prefix variant would be difficult to wrap in a
> > `lengthof` macro (without underscores) as we would probably like to
> > have it in the end.
> 
> Do you mean that I should add _Lengthof?  We're adding __lengthof__ to
> be a GNU extension with relative freedom from ISO.  If I sent a patch
> adding _Lengthof, we'd have to send a proposal to ISO at the same time,
> and we'd be waiting for ISO to discuss it before I can merge it.  And we
> couldn't bring prior art to ISO.
> 
> With this approach instead, the plan is:
> 
> -  Merge __lengthof__ in GCC before ISO hears of it (well, there are
>already several WG14 members in this discussion, so you have actually
>heard of it, but we're free to do more or less what we want).
> 
> -  Propose _Lengthof to ISO C, with prior art in GCC as __lengthof__,
>proposing the same semantics.  Also propose a lengthof macro defined
>in 

I don't really see why we should take a detour via _Lengthof; I would hope we
could directly propose lengthof for standardization.

> -  When ISO C accepts _Lengthof and lengthof, map _Lengthof in GCC to
>the same internals as __lengthof__, so they are the same thing.
> 
> Still, I'm interested in having some feedback from WG14, to prevent
> implementing something that will have modifications when merged to
> ISO C, so please CC anyone interested from WG14, if you know of any.

I think it would be more important to have clang on board with this.

In any case, thanks for doing this!

Jens


-- 
Jens Gustedt - INRIA & ICube, Strasbourg, France


[PATCH 2/3] gcov: branch, conds, calls in function summaries

2024-08-08 Thread Jørgen Kvalsvik
The gcov function summaries only output the covered lines, not the
branches and calls. Since the function summary is opt-in, it
probably makes sense to also include branch coverage, calls, and
condition coverage.

$ gcc --coverage -fpath-coverage hello.c -o hello
$ ./hello

Before:
$ gcov -f hello
Function 'main'
Lines executed:100.00% of 4

Function 'fn'
Lines executed:100.00% of 7

File 'hello.c'
Lines executed:100.00% of 11
Creating 'hello.c.gcov'

After:
$ gcov -f hello
Function 'main'
Lines executed:100.00% of 3
No branches
Calls executed:100.00% of 1

Function 'fn'
Lines executed:100.00% of 7
Branches executed:100.00% of 4
Taken at least once:50.00% of 4
No calls

File 'hello.c'
Lines executed:100.00% of 10
Creating 'hello.c.gcov'

Lines executed:100.00% of 10

With conditions:
$ gcov -fg hello
Function 'main'
Lines executed:100.00% of 3
No branches
Calls executed:100.00% of 1
No conditions

Function 'fn'
Lines executed:100.00% of 7
Branches executed:100.00% of 4
Taken at least once:50.00% of 4
Condition outcomes covered:100.00% of 8
No calls

File 'hello.c'
Lines executed:100.00% of 10
Creating 'hello.c.gcov'

Lines executed:100.00% of 10

gcc/ChangeLog:

* gcov.cc (generate_results): Count branches, conditions.
(function_summary): Output branch, calls, condition count.
---
 gcc/gcov.cc | 48 +++-
 1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/gcc/gcov.cc b/gcc/gcov.cc
index 19019f404ee..7215a00c702 100644
--- a/gcc/gcov.cc
+++ b/gcc/gcov.cc
@@ -1687,11 +1687,19 @@ generate_results (const char *file_name)
   memset (&coverage, 0, sizeof (coverage));
   coverage.name = fn->get_name ();
   add_line_counts (flag_function_summary ? &coverage : NULL, fn);
-  if (flag_function_summary)
-   {
- function_summary (&coverage);
- fnotice (stdout, "\n");
-   }
+
+  if (!flag_function_summary)
+   continue;
+
+  for (const block_info& block : fn->blocks)
+   for (arc_info *arc = block.succ; arc; arc = arc->succ_next)
+ add_branch_counts (&coverage, arc);
+
+  for (const block_info& block : fn->blocks)
+   add_condition_counts (&coverage, &block);
+
+  function_summary (&coverage);
+  fnotice (stdout, "\n");
 }
 
   name_map needle;
@@ -2764,6 +2772,36 @@ function_summary (const coverage_info *coverage)
 {
   fnotice (stdout, "%s '%s'\n", "Function", coverage->name);
   executed_summary (coverage->lines, coverage->lines_executed);
+
+  if (coverage->branches)
+{
+  fnotice (stdout, "Branches executed:%s of %d\n",
+  format_gcov (coverage->branches_executed, coverage->branches, 2),
+  coverage->branches);
+  fnotice (stdout, "Taken at least once:%s of %d\n",
+  format_gcov (coverage->branches_taken, coverage->branches, 2),
+   coverage->branches);
+}
+  else
+fnotice (stdout, "No branches\n");
+
+  if (coverage->calls)
+fnotice (stdout, "Calls executed:%s of %d\n",
+format_gcov (coverage->calls_executed, coverage->calls, 2),
+coverage->calls);
+  else
+fnotice (stdout, "No calls\n");
+
+  if (flag_conditions)
+{
+  if (coverage->conditions)
+   fnotice (stdout, "Condition outcomes covered:%s of %d\n",
+format_gcov (coverage->conditions_covered,
+ coverage->conditions, 2),
+coverage->conditions);
+  else
+   fnotice (stdout, "No conditions\n");
+}
 }
 
 /* Output summary info for a file.  */
-- 
2.39.2
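The percentage lines in the summaries above come from gcov's format_gcov helper, which the patch reuses; a rough sketch of the underlying "covered of total" computation (a hypothetical simplification — the real helper also takes care never to round to 0% or 100% unless the ratio is exact):

```cpp
#include <cstdio>
#include <string>

// Hypothetical simplification of gcov's format_gcov: render
// covered/total as a percentage with two decimal places.
std::string
format_percent (unsigned covered, unsigned total)
{
  char buf[32];
  std::snprintf (buf, sizeof buf, "%.2f%%",
		 total ? 100.0 * covered / total : 0.0);
  return buf;
}
```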




[PATCH 1/3] gcov: Cache source files

2024-08-08 Thread Jørgen Kvalsvik
Cache the source files as they are read, rather than discarding them at
the end of output_lines (), and move the reading of the source file to
the new function slurp.

This patch does not really change anything other than moving the file
reading out of output_file, but it sets gcov up for more interaction with
the source file. The motivating example is reporting coverage on
functions from different source files, notably C++ headers and
__attribute__((always_inline)) functions.

Here is an example of what gcov does today:

hello.h:
inline __attribute__((always_inline))
int hello (const char *s)
{
  if (s)
printf ("hello, %s!\n", s);
  else
printf ("hello, world!\n");
  return 0;
}

hello.c:
int notmain(const char *entity)
{
  return hello (entity);
}

int main()
{
  const char *empty = 0;
  if (!empty)
hello (empty);
  else
puts ("Goodbye!");
}

$ gcov -abc hello
function notmain called 0 returned 0% blocks executed 0%
#:4:int notmain(const char *entity)
%:4-block 2
branch  0 never executed (fallthrough)
branch  1 never executed
-:5:{
#:6:  return hello (entity);
%:6-block 7
-:7:}

Clearly there is a branch in notmain, but the branch comes from the
inlining of hello. This is not very obvious from looking at the output.
Here is hello.h.gcov:

-:3:inline __attribute__((always_inline))
-:4:int hello (const char *s)
-:5:{
#:6:  if (s)
%:6-block 3
branch  0 never executed (fallthrough)
branch  1 never executed
%:6-block 2
branch  2 never executed (fallthrough)
branch  3 never executed
#:7:printf ("hello, %s!\n", s);
%:7-block 4
call0 never executed
%:7-block 3
call1 never executed
-:8:  else
#:9:printf ("hello, world!\n");
%:9-block 5
call0 never executed
%:9-block 4
call1 never executed
#:   10:  return 0;
%:   10-block 6
%:   10-block 5
-:   11:}

The blocks from the different call sites have all been interleaved.

The reporting could be tuned to list the inlined function, too, like
this:

1:4:int notmain(const char *entity)
-: == inlined from hello.h ==
1:6:  if (s)
branch  0 taken 0 (fallthrough)
branch  1 taken 1
#:7:printf ("hello, %s!\n", s);
%:7-block 3
call0 never executed
-:8:  else
1:9:printf ("hello, world!\n");
1:9-block 4
call0 returned 1
1:   10:  return 0;
1:   10-block 5
-: == inlined from hello.h (end) ==
-:5:{
1:6:  return hello (entity);
1:6-block 7
-:7:}

Implementing something to this effect relies on having the sources for
both files (hello.c, hello.h) available, which is what this patch sets
up.

Note that the previous reading code would leak the source file contents,
so explicitly storing them is neither a huge departure nor a performance
concern.  I verified this with valgrind:

With slurp:

$ valgrind gcov ./hello
== == Memcheck, a memory error detector
== == Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
== == Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
== == Command: ./gcc/gcov demo
== ==
File 'hello.c'
Lines executed:100.00% of 4
Creating 'hello.c.gcov'

File 'hello.h'
Lines executed:75.00% of 4
Creating 'hello.h.gcov'
== ==
== == HEAP SUMMARY:
== == in use at exit: 84,907 bytes in 54 blocks
== ==   total heap usage: 254 allocs, 200 frees, 137,156 bytes allocated
== ==
== == LEAK SUMMARY:
== ==definitely lost: 1,237 bytes in 22 blocks
== ==indirectly lost: 562 bytes in 18 blocks
== ==  possibly lost: 0 bytes in 0 blocks
== ==still reachable: 83,108 bytes in 14 blocks
== ==   of which reachable via heuristic:
== == newarray   : 1,544 bytes in 1 blocks
== == suppressed: 0 bytes in 0 blocks
== == Rerun with --leak-check=full to see details of leaked memory
== ==
== == For lists of detected and suppressed errors, rerun with: -s
== == ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Without slurp:

$ valgrind gcov ./demo
== == Memcheck, a memory error detector
== == Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
== == Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
== == Command: ./gcc/gcov demo
== ==
File 'hello.c'
Lines executed:100.00% of 4
Creating 'hello.c.gcov'

File 'hello.h'
Lines executed:75.00% of 4
Creating 'hello.h.gcov'

Lines executed:87.50% of 8
== ==
== == HEAP SUMMARY:
== == in use at exit: 85,316 bytes in 82 blocks
== ==   total heap usage: 250 allocs, 168 frees, 137,084 bytes allocated
== ==
== == LEAK SUMMARY:
== ==definitely lost: 1,646 bytes in 50 blocks
== ==indirectly lost: 562 bytes in 18 blocks
== ==  possibly lost: 0 bytes in 0 blocks
==

[PATCH 0/3] Prime path coverage in gcc/gcov

2024-08-08 Thread Jørgen Kvalsvik
I think this patch is ready for review now. I'm resubmitting these
patches with a few tiny fixes so they build properly.

These are the main highlights since v3:

1. Atomics are issued under -fprofile-update=atomic
2. Giving up after exceeding the path limit in more phases, to avoid
   accidentally getting stuck between checks.
3. Fixed some ICEs, mostly around setjmp.
4. Refactoring, comments.
5. Manual entries, --help.

Jørgen Kvalsvik (3):
  gcov: Cache source files
  gcov: Add branch, conds, calls in function summary
  Add prime path coverage to gcc/gcov

 gcc/Makefile.in|6 +-
 gcc/builtins.cc|2 +-
 gcc/collect2.cc|5 +-
 gcc/common.opt |   14 +
 gcc/doc/gcov.texi  |  155 ++
 gcc/doc/invoke.texi|   35 +
 gcc/gcc.cc |4 +-
 gcc/gcov-counter.def   |3 +
 gcc/gcov-io.h  |3 +
 gcc/gcov.cc|  537 ++-
 gcc/ipa-inline.cc  |2 +-
 gcc/passes.cc  |4 +-
 gcc/path-coverage.cc   |  778 +
 gcc/prime-paths.cc | 2006 
 gcc/profile.cc |6 +-
 gcc/selftest-run-tests.cc  |1 +
 gcc/selftest.h |1 +
 gcc/testsuite/g++.dg/gcov/gcov-22.C|  170 ++
 gcc/testsuite/gcc.misc-tests/gcov-29.c |  869 ++
 gcc/testsuite/gcc.misc-tests/gcov-30.c |  869 ++
 gcc/testsuite/lib/gcov.exp |   92 +-
 gcc/tree-profile.cc|   11 +-
 22 files changed, 5534 insertions(+), 39 deletions(-)
 create mode 100644 gcc/path-coverage.cc
 create mode 100644 gcc/prime-paths.cc
 create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-22.C
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-29.c
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-30.c

-- 
2.39.2



Re: sched1 pathology on RISC-V : PR/114729

2024-08-08 Thread Richard Sandiford
Vineet Gupta  writes:
> On 8/7/24 12:28, Jeff Law wrote:
>> On 8/7/24 11:47 AM, Richard Sandiford wrote:
>>> I should probably start by saying that the "model" heuristic is now
>>> pretty old and was originally tuned for an in-order AArch32 core.
>>> The aim wasn't to *minimise* spilling, but to strike a better balance
>>> between parallelising with spills vs. sequentialising.  At the time,
>>> scheduling without taking register pressure into account would overly
>>> parallelise things, whereas the original -fsched-pressure would overly
>>> serialise (i.e. was too conservative).
>>>
>>> There were specific workloads in, er, a formerly popular embedded
>>> benchmark that benefitted significantly from *some* spilling.
>>>
>>> This comment probably sums up the trade-off best:
>>>
>>> This pressure cost is deliberately timid.  The intention has been
>>> to choose a heuristic that rarely interferes with the normal list
>>> scheduler in cases where that scheduler would produce good code.
>>> We simply want to curb some of its worst excesses.
>>>
>>> Because it was tuned for an in-order core, it was operating in an
>>> environment where instruction latencies were meaningful and realistic.
>>> So it still deferred to those to quite a big extent.  This is almost
>>> certainly too conservative for out-of-order cores.
>> What's interesting here is that the increased spilling roughly doubles 
>> the number of dynamic instructions we have to execute for the benchmark. 
>>   While a good uarch design can hide a lot of that overhead, it's still 
>> crazy bad.
>
> [snip...]
>
>>> ...I think for OoO cores, this:
>>>
>>> baseECC (X) could itself be used as the ECC value described above.
>>> However, this is often too conservative, in the sense that it
>>> tends to make high-priority instructions that increase pressure
>>> wait too long in cases where introducing a spill would be better.
>>> For this reason the final ECC is a priority-adjusted form of
>>> baseECC (X).  Specifically, we calculate:
>>>
>>>   P (X) = INSN_PRIORITY (X) - insn_delay (X) - baseECC (X)
>>>   baseP = MAX { P (X) | baseECC (X) <= 0 }
>>>
>>> Then:
>>>
>>>   ECC (X) = MAX (MIN (baseP - P (X), baseECC (X)), 0)
>>>
>>> Thus an instruction's effect on pressure is ignored if it has a high
>>> enough priority relative to the ones that don't increase pressure.
>>> Negative values of baseECC (X) do not increase the priority of X
>>> itself, but they do make it harder for other instructions to
>>> increase the pressure further.
>>>
>>> is probably not appropriate.  We should probably just use the baseECC,
>>> as suggested by the first sentence in the comment.  It looks like the hack:
>>>
>>> diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
>>> index 1bc610f9a5f..9601e929a88 100644
>>> --- a/gcc/haifa-sched.cc
>>> +++ b/gcc/haifa-sched.cc
>>> @@ -2512,7 +2512,7 @@ model_set_excess_costs (rtx_insn **insns, int count)
>>> print_p = true;
>>>   }
>>> cost = model_excess_cost (insns[i], print_p);
>>> -   if (cost <= 0)
>>> +   if (cost <= 0 && 0)
>>>   {
>>> priority = INSN_PRIORITY (insns[i]) - insn_delay (insns[i]) - cost;
>>> priority_base = MAX (priority_base, priority);
>>> @@ -2525,6 +2525,7 @@ model_set_excess_costs (rtx_insn **insns, int count)
>>>   
>>> /* Use MAX (baseECC, 0) and baseP to calculcate ECC for each
>>>instruction.  */
>>> +  if (0)
>>> for (i = 0; i < count; i++)
>>>   {
>>> cost = INSN_REG_PRESSURE_EXCESS_COST_CHANGE (insns[i]);
>>>
>>> fixes things for me.  Perhaps we should replace these && 0s
>>> with a query for an out-of-order core?
>
> Yes, removing this heuristic does improve things, but unfortunately it seems 
> there's more in sched1 that needs unraveling - Jeff is right after all :-)
>
>
> function                                                            | upstream | -fno-schedule-insns |   Patch
> _ZL24ML_BSSN_Dissipation_BodyPK4_cGHiiPKdS3_S3_PKiS5_iPKPd.lto_priv |   55,702 |              43,132 |  45,788
> _ZL19ML_BSSN_Advect_BodyPK4_cGHiiPKdS3_S3_PKiS5_iPKPd.lto_priv      |  144,278 |              59,204 | 132,588
> _ZL24ML_BSSN_constraints_BodyPK4_cGHiiPKdS3_S3_PKiS5_iPKPd.lto_priv |  321,476 |             138,074 | 253,206
> _ZL16ML_BSSN_RHS_BodyPK4_cGHiiPKdS3_S3_PKiS5_iPKPd.lto_priv         |  483,794 |             179,694 | 360,286
>
>
>
>>>
>>> I haven't benchmarked this. :)  And I'm looking at the code for the
>>> first time in many years, so I'm certainly forgetting details.
>> Well, I think we're probably too focused on ooo vs in-order.  The 
>> badness we're seeing I think would likely trigger on the in-order risc-v 
>> implementations out there 

Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-08 Thread Martin Uecker
Am Donnerstag, dem 08.08.2024 um 10:42 +0200 schrieb Alejandro Colomar:
> > 
> > ...
> > 
> > > |^
> > > 
> > > If I make [0] always result in a constant expression (and thus break
> > > some [*] cases), by doing
> > > 
> > >   -  var = var || (zero && C_TYPE_VARIABLE_SIZE (type));
> > > 
> > > Then the problem disappears.  But I'm worried that it might be hiding
> > > the problem instead of removing it, since I don't really understand why
> > > it's happening.  Do you know why?
> > > 
> > > Anyway, I'll remove that line to support [0].  But it would be
> > > interesting to learn why this problem triggers.
> > 
> > You need the line to support variable size arrays.
> 
> Not really.  'zero' is only true for [0] and for [*], but nor for
> [zero], right?  
> 
> All vla tests seem to pass if I remove that line.  The only issue will
> be that
> 
>   void f(char (*a)[*], int (*x)[__lengthof__(*a)]);
> 
> will result in 'int (*x)[0]' until you change the implementation of [*],
> but I think we can live with that small detail.


I plan to change the representation of [0], so it would be nice if the
[*] cases are correct as much as possible so that they not get forgotten
later.

Martin

> 
> > Please just  uncomment
> > your test with a reference to the bug for now and I will try fix this ASAP.
> 
> I'll send v6 in a moment; feel free to insist in this if you disagree
> after seeing it, but I think it works well without the line.
> 
> > 
> > Martin
> 
> Cheers,
> Alex
> 

-- 
Univ.-Prof. Dr. rer. nat. Martin Uecker
Graz University of Technology
Institute of Biomedical Imaging




Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-08 Thread Alejandro Colomar
Hi Jens,

On Thu, Aug 08, 2024 at 11:13:02AM GMT, Jens Gustedt wrote:
> > but to maintain expectations, I think it would be better to do
> > the same here.
> 
> Just to compare, the recent additions in C23 typeof etc. only have the
> parenthesized versions. So there would be precedent. And it really
> eases transition

Hmmm, interesting.

The good part of reusing sizeof syntax is that I can reuse internal code
for sizeof.  But I'll check if I can change it easily to only support
parens.

> > > I wouldn't be sure that we should continue that distinction from
> > > `sizeof`.
> > 
> > But then, what do we do?  Allow lengthof with type names without parens?
> > Or require parens?  I'm not comfortable with that choice.
> > 
> > > Also that prefix variant would be difficult to wrap in a
> > > `lengthof` macro (without underscores) as we would probably like to
> > > have it in the end.
> > 
> > Do you mean that I should add _Lengthof?  We're adding __lengthof__ to
> > be a GNU extension with relative freedom from ISO.  If I sent a patch
> > adding _Lengthof, we'd have to send a proposal to ISO at the same time,
> > and we'd be waiting for ISO to discuss it before I can merge it.  And we
> > couldn't bring prior art to ISO.
> > 
> > With this approach instead, the plan is:
> > 
> > -  Merge __lengthof__ in GCC before ISO hears of it (well, there are
> >already several WG14 members in this discussion, so you have actually
> >heard of it, but we're free to do more or less what we want).
> > 
> > -  Propose _Lengthof to ISO C, with prior art in GCC as __lengthof__,
> >proposing the same semantics.  Also propose a lengthof macro defined
> >in 
> 
> I don't really see why we should take a detour via _Lengthof, I would
> hope we could directly propose lengthof as the standardization

Hmmm, maybe programs already use lengthof for some other purpose.
Hopefully not, but I don't know.  In any case, I'm fine with both
approaches.

> > -  When ISO C accepts _Lengthof and lengthof, map _Lengthof in GCC to
> >the same internals as __lengthof__, so they are the same thing.
> > 
> > Still, I'm interested in having some feedback from WG14, to prevent
> > implementing something that will have modifications when merged to
> > ISO C, so please CC anyone interested from WG14, if you know of any.
> 
> I think that more important would be to have clang on board with this.

Does anyone have any Clang maintainer in mind that would be interested
in being CCed?  If so, please let me know (and/or add it yourselves).

> 
> In any case, thanks for doing this!

:-)

Cheers,
Alex

-- 



signature.asc
Description: PGP signature


Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-08 Thread Alejandro Colomar
On Thu, Aug 08, 2024 at 11:23:51AM GMT, Martin Uecker wrote:
> > Not really.  'zero' is only true for [0] and for [*], but nor for
> > [zero], right?  
> > 
> > All vla tests seem to pass if I remove that line.  The only issue will
> > be that
> > 
> > void f(char (*a)[*], int (*x)[__lengthof__(*a)]);
> > 
> > will result in 'int (*x)[0]' until you change the implementation of [*],
> > but I think we can live with that small detail.
> 
> 
> I plan to change the representation of [0], so it would be nice if the
> [*] cases are correct as much as possible so that they not get forgotten
> later.

Ahhh, thanks!  Will do, then.

> 
> Martin

Cheers,
Alex

-- 



signature.asc
Description: PGP signature


[PATCH] tree-optimization/116024 - match.pd: add 4 int-compare simplifications

2024-08-08 Thread Artemiy Volkov
This patch implements match.pd patterns for the following transformations:

(1) (UB-on-overflow types) C1 - X cmp C2 -> X cmp C1 - C2

(2) (unsigned types) C1 - X cmp C2 ->
(a) X cmp C1 - C2, when cmp is !=, ==
(b) X - (C1 - C2) cmp C2, when cmp is <=, >
(c) X - (C1 - C2 + 1) cmp C2, when cmp is <, >=

(3) (signed wrapping types) C1 - X cmp C2
(a) X cmp C1 - C2, when cmp is !=, ==
(b) X - (C1 + 1) rcmp -C2 - 1, otherwise

(4) (all wrapping types) X + C1 cmp C2 ->
(a) X cmp C2 - C1, when cmp is !=, ==
(b) X cmp -C1, when cmp is <=, > and C2 - C1 == max
(c) X cmp -C1, when cmp is <, >= and C2 - C1 == min

Included along are testcases for all the aforementioned changes.  This
patch has been bootstrapped and regtested on aarch64, x86_64, and i386,
and additionally regtested on riscv32.  Existing tests were adjusted
where necessary.

gcc/ChangeLog:

PR tree-optimization/116024
* match.pd: New transformations around integer comparison.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr116024.c: New test.
* gcc.dg/tree-ssa/pr116024-1.c: Ditto.
* gcc.dg/tree-ssa/pr116024-1-fwrapv.c: Ditto.
* gcc.dg/tree-ssa/pr116024-2.c: Ditto.
* gcc.dg/tree-ssa/pr116024-2-fwrapv.c: Ditto.
* gcc.dg/pr67089-6.c: Adjust.
* gcc.target/aarch64/gtu_to_ltu_cmp_1.c: Ditto.

Signed-off-by: Artemiy Volkov 
---
 gcc/match.pd   | 75 +-
 gcc/testsuite/gcc.dg/pr67089-6.c   |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr116024-1-fwrapv.c  | 73 +
 gcc/testsuite/gcc.dg/tree-ssa/pr116024-1.c | 73 +
 gcc/testsuite/gcc.dg/tree-ssa/pr116024-2-fwrapv.c  | 37 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr116024-2.c | 38 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr116024.c   | 73 +
 .../gcc.target/aarch64/gtu_to_ltu_cmp_1.c  |  2 +-
 8 files changed, 371 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-1-fwrapv.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-2-fwrapv.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024.c

diff --git a/gcc/match.pd b/gcc/match.pd
index d401e75..97d4398 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8418,6 +8418,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(cmp @0 { TREE_OVERFLOW (res)
 ? drop_tree_overflow (res) : res; }
 (for cmp (lt le gt ge)
+ rcmp (gt ge lt le)
  (for op (plus minus)
   rop (minus plus)
   (simplify
@@ -8445,7 +8446,79 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  "X cmp C2 -+ C1"),
 WARN_STRICT_OVERFLOW_COMPARISON);
}
-   (cmp @0 { res; })
+   (cmp @0 { res; })
+/* For wrapping types, simplify X + C1 CMP C2 to X CMP -C1 when possible.  */
+   (if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
+ (with
+   {
+   wide_int max = wi::max_value (TREE_TYPE (@0));
+   wide_int min = wi::min_value (TREE_TYPE (@0));
+
+   wide_int c2 = rop == PLUS_EXPR
+ ? wi::add (wi::to_wide (@2), wi::to_wide (@1))
+ : wi::sub (wi::to_wide (@2), wi::to_wide (@1));
+   }
+   (if (((cmp == LE_EXPR || cmp == GT_EXPR) && wi::eq_p (c2, max))
+   || ((cmp == LT_EXPR || cmp == GE_EXPR) && wi::eq_p (c2, min)))
+ (with
+  {
+wide_int c1 = rop == PLUS_EXPR
+  ? wi::add (min, wi::to_wide (@1))
+  : wi::sub (min, wi::to_wide (@1));
+tree c1_cst = build_uniform_cst (TREE_TYPE (@0),
+   wide_int_to_tree (TREE_TYPE (@0), c1));
+  }
+  (rcmp @0 { c1_cst; })
+
+/* Invert sign of X in comparisons of the form C1 - X CMP C2.  */
+
+(for cmp (lt le gt ge eq ne)
+ rcmp (gt ge lt le eq ne)
+  (simplify
+   (cmp (minus INTEGER_CST@0 @1) INTEGER_CST@2)
+   (if (!TREE_OVERFLOW (@0) && !TREE_OVERFLOW (@2)
+   && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@1)))
+ (with { tree res = int_const_binop (MINUS_EXPR, @0, @2); }
+  (if (TREE_OVERFLOW (res))
+   (with
+   {
+ fold_overflow_warning (("assuming signed overflow does not occur "
+ "when simplifying conditional to constant"),
+ WARN_STRICT_OVERFLOW_CONDITIONAL);
+   }
+   (switch
+(if (cmp == NE_EXPR)
+ { constant_boolean_node (true, type); })
+(if (cmp == EQ_EXPR)
+ { constant_boolean_node (false, type); })
+{
+  bool less = cmp == LE_EXPR || cmp == LT_EXPR;
+  bool ovf_high = wi::lt_p (wi::to_wide (@0), 0,
+TYPE_S

Re: [PATCH] c++: Attempt to implement C++26 P3034R1 - Module Declarations Shouldn't be Macros [PR114461]

2024-08-08 Thread Jakub Jelinek
On Thu, Aug 08, 2024 at 10:44:31AM +0200, Jakub Jelinek wrote:
> I think the patch is at least a step in the direction of the paper's
> intent, but perhaps not full.  If we need to check for initial : or .
> in the expansion of the first identifier after the module name or
> module partition, not sure how it would be implemented

Maybe set some new NODE_* flag on the first CPP_NAME token after the
module name/partition unexpanded token iff it is some macro (object-like or
function-like) and in cpp_get_token_1 error if the first token from such
macro is CPP_DOT or CPP_COLON.  Ugly, but could work.

Or if the wording is changed to require that none of the pp-module-name or
pp-module-partition tokens come from macro expansion mark with some flag
all tokens from macro expansion.  We already have -ftrack-macro-expansion=,
but that can be disabled and we'd need to diagnose it even in that case.

Jakub



[PATCH] RISC-V: tree-optimization/116274 - overzealous SLP vectorization

2024-08-08 Thread Richard Biener
The following tries to address that the vectorizer fails to have
precise knowledge of argument and return calling conventions and
views some accesses as loads and stores that are not.
This is mainly important when doing basic-block vectorization as
otherwise loop indexing would force such arguments to memory.

On x86 the reduction in the number of apparent loads and stores
often dominates cost analysis so the following tries to mitigate
this aggressively by adjusting only the scalar load and store
cost, reducing them to the cost of a simple scalar statement,
but not touching the vector access cost which would be much
harder to estimate.  Thereby we err on the side of not performing
basic-block vectorization.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard - we can of course do this adjustment in the backend as well
but it might be worthwhile in generic code.  Do you see similar
issues on arm?

PR tree-optimization/116274
* tree-vect-slp.cc (vect_bb_slp_scalar_cost): Cost scalar loads
and stores as simple scalar stmts when they access a non-global,
not address-taken variable that doesn't have BLKmode assigned.

* gcc.target/i386/pr116274.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/pr116274.c |  9 +
 gcc/tree-vect-slp.cc | 12 +++-
 2 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr116274.c

diff --git a/gcc/testsuite/gcc.target/i386/pr116274.c 
b/gcc/testsuite/gcc.target/i386/pr116274.c
new file mode 100644
index 000..d5811344b93
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr116274.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-slp2-optimized" } */
+
+struct a { long x,y; };
+long test(struct a a) { return a.x+a.y; }
+
+/* { dg-final { scan-tree-dump-not "basic block part vectorized" "slp2" } } */
+/* { dg-final { scan-assembler-times "addl|leaq" 1 } } */
+/* { dg-final { scan-assembler-not "padd" } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 3464d0c0e23..e43ff721100 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -7807,7 +7807,17 @@ next_lane:
   vect_cost_for_stmt kind;
   if (STMT_VINFO_DATA_REF (orig_stmt_info))
{
- if (DR_IS_READ (STMT_VINFO_DATA_REF (orig_stmt_info)))
+ data_reference_p dr = STMT_VINFO_DATA_REF (orig_stmt_info);
+ tree base = get_base_address (DR_REF (dr));
+ /* When the scalar access is to a non-global not address-taken
+decl that is not BLKmode assume we can access it with a single
+non-load/store instruction.  */
+ if (DECL_P (base)
+ && !is_global_var (base)
+ && !TREE_ADDRESSABLE (base)
+ && DECL_MODE (base) != BLKmode)
+   kind = scalar_stmt;
+ else if (DR_IS_READ (STMT_VINFO_DATA_REF (orig_stmt_info)))
kind = scalar_load;
  else
kind = scalar_store;
-- 
2.43.0


Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-08 Thread Joseph Myers
On Thu, 8 Aug 2024, Alejandro Colomar wrote:

> Hi Jens,
> 
> On Thu, Aug 08, 2024 at 11:13:02AM GMT, Jens Gustedt wrote:
> > > but to maintain expectations, I think it would be better to do
> > > the same here.
> > 
> > Just to compare, the recent additions in C23 typeof etc. only have the
> > parenthesized versions. So there would be precedent. And it really
> > eases transition
> 
> Hmmm, interesting.
> 
> The good part of reusing sizeof syntax is that I can reuse internal code
> for sizeof.  But I'll check if I can change it easily to only support
> parens.

Since typeof produces a type, it's used in different syntactic contexts 
from sizeof, so has different ambiguity issues, and requiring parentheses 
with typeof is not relevant to sizeof/lengthof.  I think lengthof should 
follow sizeof.  Make sure there's a testcase for lengthof applied to a 
compound literal (the case that illustrates how, on parsing sizeof 
(type-name), the compiler needs to see what comes after (type-name) to 
determine whether it's actually sizeof applied to an expression (if '{' 
follows) or to a type (otherwise)).  (If you're following the sizeof 
implementation closely enough, this should just work.)

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH v3] diagnostics: Follow DECL_ORIGIN in lhd_print_error_function [PR102061]

2024-08-08 Thread Peter0x44

On 2024-08-08 09:04, Richard Biener wrote:
On Thu, Aug 8, 2024 at 4:55 AM Peter Damianov  
wrote:


Currently, if a warning references a cloned function, the name of the 
cloned
function will be emitted in the "In function 'xyz'" part of the 
diagnostic,
which users aren't supposed to see. This patch follows the DECL_ORIGIN 
link
to get the name of the original function, so the internal compiler 
details

aren't exposed.


Note I see an almost exact copy of the function in cp/error.cc as
cp_print_error_function (possibly more modern), specifically using
I noticed that too, but I'm not sure under what circumstances it is used.
I checked that my patch removes the cloned names in the diagnostic for
both C and C++.


  pp_printf (context->printer, function_category (fndecl),
 fndecl);

which ends up using %qD.

I've CCed David who likely invented diagnostic_abstract_origin and 
friends.



gcc/ChangeLog:
PR diagnostics/102061
* langhooks.cc (lhd_print_error_function): Follow DECL_ORIGIN
links.
* gcc.dg/pr102061.c: New testcase.

Signed-off-by: Peter Damianov 
---
v3: also follow DECL_ORIGIN when emitting "inlined from" warnings, I 
missed this before.

Add testcase.

 gcc/langhooks.cc|  3 +++
 gcc/testsuite/gcc.dg/pr102061.c | 35 
+

 2 files changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr102061.c

diff --git a/gcc/langhooks.cc b/gcc/langhooks.cc
index 61f2b676256..7a2a66b3c39 100644
--- a/gcc/langhooks.cc
+++ b/gcc/langhooks.cc
@@ -395,6 +395,8 @@ lhd_print_error_function (diagnostic_context 
*context, const char *file,

  else
fndecl = current_function_decl;

+ fndecl = DECL_ORIGIN(fndecl);


Space after DECL_ORIGIN.  There's a comment warranted for what we
intend do to here.

I think this change is reasonable.


+
  if (TREE_CODE (TREE_TYPE (fndecl)) == METHOD_TYPE)
pp_printf
  (context->printer, _("In member function %qs"),
@@ -439,6 +441,7 @@ lhd_print_error_function (diagnostic_context 
*context, const char *file,

}
  if (fndecl)
{
+ fndecl = DECL_ORIGIN(fndecl);


Space missing again.

This change OTOH might cause us to print

inlined from foo at ...
inlined from foo at ...

so duplicating an inline, for example in the case where we split a
function and then inline both parts, or where we inline an IPA-CP
forwarder and the specific clone.  It's not obvious what we should do
here, since of course for a recursive
function we can have a function inlined two times in a row.

The testcase only triggers the first case, right?
Correct. I don't know how to construct a testcase exercising that; the
thing that first made me notice this was actually a case with ".isra"
rather than ".constprop".  It would be nice for completeness to also
have a testcase covering the other path, not just inlining.


David, any comments?  I think the patch is OK with the formatting 
fixed.


Thanks,
Richard.


  expanded_location s = expand_location (*locus);
  pp_comma (context->printer);
  pp_newline (context->printer);
diff --git a/gcc/testsuite/gcc.dg/pr102061.c 
b/gcc/testsuite/gcc.dg/pr102061.c

new file mode 100644
index 000..dbdd23965e7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr102061.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-Wall -O2" } */
+/* { dg-message "inlined from 'bar'" "" { target *-*-* } 0 } */
+/* { dg-excess-errors "" } */
+
+static inline void
+foo (char *p)
+{
+  __builtin___memcpy_chk (p, "abc", 3, __builtin_object_size (p, 0));
+}
+static void
+bar (char *p) __attribute__((noinline));
+static void
+bar (char *p)
+{
+  foo (p);
+}
+void f(char*) __attribute__((noipa));
+char buf[2];
+void
+baz (void) __attribute__((noinline));
+void
+baz (void)
+{
+  bar (buf);
+  f(buf);
+}
+
+void f(char*)
+{}
+
+int main(void)
+{
+baz();
+}
--
2.39.2


Thanks for the review,
Peter D.


[PATCH] c++: Propagate TREE_ADDRESSABLE in fixup_type_variants [PR115062]

2024-08-08 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

The change to 'finish_struct_bits' is not required for this PR but I
felt it was a nice cleanup; happy to commit without it though if
preferred.

-- >8 --

This has caused issues with modules when an import fills in the
definition of a type already created with a typedef.

PR c++/115062

gcc/cp/ChangeLog:

* class.cc (fixup_type_variants): Propagate TREE_ADDRESSABLE.
(finish_struct_bits): Cleanup now that TREE_ADDRESSABLE is
propagated by fixup_type_variants.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr115062_a.H: New test.
* g++.dg/modules/pr115062_b.H: New test.
* g++.dg/modules/pr115062_c.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/class.cc   | 31 ++-
 gcc/testsuite/g++.dg/modules/pr115062_a.H |  6 +
 gcc/testsuite/g++.dg/modules/pr115062_b.H | 14 ++
 gcc/testsuite/g++.dg/modules/pr115062_c.C |  9 +++
 4 files changed, 43 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/pr115062_a.H
 create mode 100644 gcc/testsuite/g++.dg/modules/pr115062_b.H
 create mode 100644 gcc/testsuite/g++.dg/modules/pr115062_c.C

diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
index 718601756dd..fb6c3370950 100644
--- a/gcc/cp/class.cc
+++ b/gcc/cp/class.cc
@@ -2312,6 +2312,7 @@ fixup_type_variants (tree type)
   TYPE_PRECISION (variant) = TYPE_PRECISION (type);
   TYPE_MODE_RAW (variant) = TYPE_MODE_RAW (type);
   TYPE_EMPTY_P (variant) = TYPE_EMPTY_P (type);
+  TREE_ADDRESSABLE (variant) = TREE_ADDRESSABLE (type);
 }
 }
 
@@ -2378,8 +2379,17 @@ fixup_attribute_variants (tree t)
 static void
 finish_struct_bits (tree t)
 {
-  /* Fix up variants (if any).  */
-  fixup_type_variants (t);
+  /* If this type has a copy constructor or a destructor, force its
+ mode to be BLKmode, and force its TREE_ADDRESSABLE bit to be
+ nonzero.  This will cause it to be passed by invisible reference
+ and prevent it from being returned in a register.  */
+  if (type_has_nontrivial_copy_init (t)
+  || TYPE_HAS_NONTRIVIAL_DESTRUCTOR (t))
+{
+  SET_DECL_MODE (TYPE_MAIN_DECL (t), BLKmode);
+  SET_TYPE_MODE (t, BLKmode);
+  TREE_ADDRESSABLE (t) = 1;
+}
 
   if (BINFO_N_BASE_BINFOS (TYPE_BINFO (t)) && TYPE_POLYMORPHIC_P (t))
 /* For a class w/o baseclasses, 'finish_struct' has set
@@ -2392,21 +2402,8 @@ finish_struct_bits (tree t)
looking in the vtables).  */
 get_pure_virtuals (t);
 
-  /* If this type has a copy constructor or a destructor, force its
- mode to be BLKmode, and force its TREE_ADDRESSABLE bit to be
- nonzero.  This will cause it to be passed by invisible reference
- and prevent it from being returned in a register.  */
-  if (type_has_nontrivial_copy_init (t)
-  || TYPE_HAS_NONTRIVIAL_DESTRUCTOR (t))
-{
-  tree variants;
-  SET_DECL_MODE (TYPE_MAIN_DECL (t), BLKmode);
-  for (variants = t; variants; variants = TYPE_NEXT_VARIANT (variants))
-   {
- SET_TYPE_MODE (variants, BLKmode);
- TREE_ADDRESSABLE (variants) = 1;
-   }
-}
+  /* Fix up variants (if any).  */
+  fixup_type_variants (t);
 }
 
 /* Issue warnings about T having private constructors, but no friends,
diff --git a/gcc/testsuite/g++.dg/modules/pr115062_a.H 
b/gcc/testsuite/g++.dg/modules/pr115062_a.H
new file mode 100644
index 000..3c9daac317e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr115062_a.H
@@ -0,0 +1,6 @@
+// PR c++/115062
+// { dg-additional-options "-fmodule-header" }
+// { dg-module-cmi {} }
+
+template  class S;
+typedef S X;
diff --git a/gcc/testsuite/g++.dg/modules/pr115062_b.H 
b/gcc/testsuite/g++.dg/modules/pr115062_b.H
new file mode 100644
index 000..d8da59591ec
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr115062_b.H
@@ -0,0 +1,14 @@
+// PR c++/115062
+// { dg-additional-options "-fmodule-header" }
+// { dg-module-cmi {} }
+
+template 
+struct S {
+  int a;
+  long b;
+  union {};
+  ~S();
+  void foo();
+};
+extern template void S::foo();
+S operator+(S, const char *);
diff --git a/gcc/testsuite/g++.dg/modules/pr115062_c.C 
b/gcc/testsuite/g++.dg/modules/pr115062_c.C
new file mode 100644
index 000..5255b9ffca7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr115062_c.C
@@ -0,0 +1,9 @@
+// PR c++/115062
+// { dg-additional-options "-fmodules-ts" }
+
+import "pr115062_a.H";
+import "pr115062_b.H";
+
+int main() {
+  X x = X() + "";
+}
-- 
2.43.2



Re: [RFC] Generalize formation of lane-reducing ops in loop reduction

2024-08-08 Thread Richard Biener
On Sat, Aug 3, 2024 at 2:42 PM Feng Xue OS  wrote:
>
> >> 1. Background
> >>
> >> For loop reduction of accumulating result of a widening operation, the
> >> preferred pattern is lane-reducing operation, if supported by target. 
> >> Because
> >> this kind of operation need not preserve intermediate results of widening
> >> operation, and only produces reduced amount of final results for 
> >> accumulation,
> >> choosing the pattern could lead to pretty compact codegen.
> >>
> >> Three lane-reducing opcodes are defined in gcc, belonging to two kinds of
> >> operations: dot-product (DOT_PROD_EXPR) and sum-of-absolute-difference
> >> (SAD_EXPR). WIDEN_SUM_EXPR could be seen as a degenerated dot-product with 
> >> a
> >> constant operand as "1". Currently, gcc only supports recognition of simple
> >> lane-reducing case, in which each accumulation statement of loop reduction
> >> forms one pattern:
> >>
> >>  char  *d0, *d1;
> >>  short *s0, *s1;
> >>
> >>  for (i) {
> >>sum += d0[i] * d1[i];  //  = DOT_PROD  >> char>
> >>sum += abs(s0[i] - s1[i]); //  = SAD 
> >>  }
> >>
> >> We could rewrite the example as the below using only one statement, whose 
> >> non-
> >> reduction addend is the sum of the above right-side parts. As a whole, the
> >> addend would match nothing, while its two sub-expressions could be 
> >> recognized
> >> as corresponding lane-reducing patterns.
> >>
> >>  for (i) {
> >>sum += d0[i] * d1[i] + abs(s0[i] - s1[i]);
> >>  }
> >
> > Note we try to recognize the original form as SLP reduction (which of
> > course fails).
> >
> >> This case might be too elaborately crafted to be very common in reality.
> >> Though, we do find seemingly variant but essentially similar code pattern 
> >> in
> >> some AI applications, which use matrix-vector operations extensively, some
> >> usages are just single loop reduction composed of multiple dot-products. A
> >> code snippet from ggml:
> >>
> >>  for (int j = 0; j < qk/2; ++j) {
> >>const uint8_t xh_0 = ((qh >> (j +  0)) << 4) & 0x10;
> >>const uint8_t xh_1 = ((qh >> (j + 12)) ) & 0x10;
> >>
> >>const int32_t x0 = (x[i].qs[j] & 0xF) | xh_0;
> >>const int32_t x1 = (x[i].qs[j] >>  4) | xh_1;
> >>
> >>sumi += (x0 * y[i].qs[j]) + (x1 * y[i].qs[j + qk/2]);
> >>  }
> >>
> >> In the source level, it appears to be a nature and minor scaling-up of 
> >> simple
> >> one lane-reducing pattern, but it is beyond capability of current 
> >> vectorization
> >> pattern recognition, and needs some kind of generic extension to the 
> >> framework.
>
> Sorry for late response.
>
> > So this is about re-associating lane-reducing ops to alternative 
> > lane-reducing
> > ops to save repeated accumulation steps?
>
> You mean re-associating slp-based lane-reducing ops to loop-based?

Yes.

> > The thing is that IMO pattern recognition as we do now is limiting and 
> > should
> > eventually move to the SLP side where we should be able to more freely
> > "undo" and associate.
>
> No matter whether pattern recognition is done prior to or within SLP, the
> essential thing is that we need to figure out which op is qualified for
> lane-reducing by some means.
>
> For example, when seeing a mult in a loop with vectorization-favored shape,
> ...
> t = a * b; // char a, b
> ...
>
> we could not say it is decidedly applicable for reduced computation via
> dot-product even if the corresponding target ISA is available.

True.  Note there's a PR which shows SLP lane-reducing written out like

  a[i] = b[4*i] * 3 + b[4*i+1] * 3 + b[4*i+2] * 3 + b[4*i+3] * 3;

which we cannot change to a DOT_PROD because we do not know which
lanes are reduced.  My point was there are non-reduction cases where knowing
which actual lanes get reduced would help.  For reductions it's not important
and associating in a way to expose more possible (reduction) lane reductions
is almost always going to be a win.

> Recognition of normal patterns merely involves a local statement-based match,
> while for lane-reducing, the validity check requires global loop-wise analysis
> of the structure of the reduction, probably not the same as, but close to,
> what is proposed in the RFC. The basic logic, IMHO, is independent of where
> pattern recognition is implemented. As a matter of fact, this is not about
> "associating", but "tagging" (marking all lane-reducing-quantifiable
> statements). After that process, a "re-associator" could play its role to
> guide selection of either the loop-based or the SLP-based lane-reducing op.
>
> > I've searched twice now, a few days ago I read that the optabs not 
> > specifying
> > which lanes are combined/reduced is a limitation.  Yes, it is - I hope we 
> > can
> > rectify this, so if this is motivation enough we should split the optabs up
> > into even/odd/hi/lo (or whatever else interesting targets actually do).
>
> Actually, how lanes are combined/reduced does not matter too much with regard
> to recognition of lane-reducing patterns.
>
> > I did read through th

[PATCH] c++/modules: Assume header bindings are global module

2024-08-08 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

While stepping through some code I noticed that we do some extra work
(finding the originating module decl, stripping the template, and
inspecting the attached-ness) for every declaration taken from a header
unit.  This doesn't seem necessary though since no declaration in a
header unit can be attached to anything but the global module, so we can
just assume that global_p will be true.

This was the original behaviour before I removed this assumption while
refactoring for r15-2807-gc592310d5275e0.

gcc/cp/ChangeLog:

* module.cc (module_state::read_cluster): Assume header module
declarations will require GM merging.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 58ad8cbdb61..f4d137b13a1 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -15361,7 +15361,7 @@ module_state::read_cluster (unsigned snum)
tree visible = NULL_TREE;
tree type = NULL_TREE;
bool dedup = false;
-   bool global_p = false;
+   bool global_p = is_header ();
 
/* We rely on the bindings being in the reverse order of
   the resulting overload set.  */
-- 
2.43.2



Re: [PATCH] vect: Multistep float->int conversion only with no trapping math

2024-08-08 Thread Richard Biener
On Mon, Aug 5, 2024 at 4:02 PM Juergen Christ  wrote:
>
> Am Mon, Aug 05, 2024 at 01:00:31PM +0200 schrieb Richard Biener:
> > On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ  wrote:
> > >
> > > Do not convert floats to ints in multiple steps if trapping math is
> > > enabled.  This might hide some inexact signals.
> > >
> > > Also use correct sign (the sign of the target integer type) for the
> > > intermediate steps.  This only affects undefined behaviour (casting
> > > floats to unsigned datatype where the float is negative).
> > >
> > > gcc/ChangeLog:
> > >
> > > * tree-vect-stmts.cc (vectorizable_conversion): multi-step
> > >   float to int conversion only with no trapping math and correct
> > >   sign.
> > >
> > > Signed-off-by: Juergen Christ 
> > >
> > > Bootstrapped and tested on x84 and s390.  Ok for trunk?
> > >
> > > ---
> > >  gcc/tree-vect-stmts.cc | 8 +---
> > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > index fdcda0d2abae..2ddd13383193 100644
> > > --- a/gcc/tree-vect-stmts.cc
> > > +++ b/gcc/tree-vect-stmts.cc
> > > @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info *vinfo,
> > > break;
> > >
> > >   cvt_type
> > > -   = build_nonstandard_integer_type (GET_MODE_BITSIZE 
> > > (rhs_mode), 0);
> > > +   = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode),
> > > + TYPE_UNSIGNED (lhs_type));
> >
> > But lhs_type should be a float type here, the idea that for a
> > FLOAT_EXPR (int -> float)
> > a signed integer type is the natural one to use - as it's 2x wider
> > than the original
> > RHS type it's signedness doesn't matter.  Note all float types should be
> > !TYPE_UNSIGNED so this hunk is a no-op but still less clear on the intent 
> > IMO.
> >
> > Please drop it.
>
> Will do.  Sorry about that.
>
> > >   cvt_type = get_same_sized_vectype (cvt_type, vectype_in);
> > >   if (cvt_type == NULL_TREE)
> > > goto unsupported;
> > > @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info *vinfo,
> > >if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE (rhs_mode))
> > > goto unsupported;
> > >
> > > -  if (code == FIX_TRUNC_EXPR)
> > > +  if (code == FIX_TRUNC_EXPR && !flag_trapping_math)
> > > {
> > >   cvt_type
> > > -   = build_nonstandard_integer_type (GET_MODE_BITSIZE 
> > > (rhs_mode), 0);
> > > +   = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode),
> > > + TYPE_UNSIGNED (lhs_type));
> >
> > Here it might be relevant for correctness - we have to choose between
> > sfix and ufix for the float -> [u]int conversion.
> >
> > Do  you have a testcase?  Shouldn't the exactness be independent of the 
> > integer
> > type we convert to?
>
> I was looking at this little program which contains undefined behaviour:
>
> #include 
>
> __attribute__((noinline,noclone,noipa))
> void
> vec_pack_ufix_trunc_v2df (double *in, unsigned int *out);
>
> void
> vec_pack_ufix_trunc_v2df (double *in, unsigned int *out)
> {
> out[0] = in[0];
> out[1] = in[1];
> out[2] = in[2];
> out[3] = in[3];
> }
>
> int main()
> {
> double in[] = {-1,-2,-3,-4};
> unsigned int out[4];
>
> vec_pack_ufix_trunc_v2df (in, out);
> for (int i = 0; i < 4; ++i)
> printf("out[%d] = %u\n", i, out[i]);
> return 0;
> }
>
> On s390x, I get different results after vectorization:
>
> out[0] = 4294967295
> out[1] = 4294967294
> out[2] = 4294967293
> out[3] = 4294967292
>
> than without vectorization:
>
> out[0] = 0
> out[1] = 0
> out[2] = 0
> out[3] = 0
>
> Even if this is undefined behaviour, I think it would be nice to have
> consistent results here.
>
> Also, while I added an expander to circumvent this problem in a
> previous patch, reviewers requested to hide this behind trapping math.
> Thus, I looked into this.
>
> Seeing the result from the CI for aarch64, I guess there are some
> tests that actually expect this vectorization to always happen even
> though it might not be safe w.r.t. trapping math.

I do remember this was extensively discussed (but we might have missed
something) and one argument indeed was that when it's undefined behavior
we can do the vectorization given the actual values might be in-bound.

Richard.

> >
> > >   cvt_type = get_same_sized_vectype (cvt_type, vectype_in);
> > >   if (cvt_type == NULL_TREE)
> > > goto unsupported;
> > > --
> > > 2.43.5
> > >


Re: [PATCH v2] Rearrange SLP nodes with duplicate statements. [PR98138]

2024-08-08 Thread Richard Biener
On Tue, Aug 6, 2024 at 12:38 PM Manolis Tsamis  wrote:
>
> Pinging this for a review and/or further feedback.
>
> Thanks,
> Manolis
>
> On Wed, Jun 26, 2024 at 3:06 PM Manolis Tsamis  
> wrote:
> >
> > This change checks when a two_operators SLP node has multiple occurrences of
> > the same statement (e.g. {A, B, A, B, ...}) and tries to rearrange the 
> > operands
> > so that there are no duplicates. Two vec_perm expressions are then 
> > introduced
> > to recreate the original ordering. These duplicates can appear due to how
> > two_operators nodes are handled, and they prevent vectorization in some 
> > cases.
> >
> > This targets the vectorization of the SPEC2017 x264 pixel_satd functions.
> > On some processors, an improvement of more than 10% on x264 has been observed.
> >
> > See also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138

This patch is OK.

Sorry for the slow reply/review.

Thanks,
Richard.

> > gcc/ChangeLog:
> >
> > * tree-vect-slp.cc: Avoid duplicates in two_operators nodes.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/vect-slp-two-operator.c: New test.
> >
> > Signed-off-by: Manolis Tsamis 
> > ---
> >
> > Changes in v2:
> > - Do not use predefined patterns; support rearrangement of arbitrary
> > node orderings.
> > - Only apply for two_operators nodes.
> > - Recurse with single SLP operand instead of two duplicated ones.
> > - Refactoring of code.
> >
> >  .../aarch64/vect-slp-two-operator.c   |  36 ++
> >  gcc/tree-vect-slp.cc  | 114 ++
> >  2 files changed, 150 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-slp-two-operator.c
> >
> > diff --git a/gcc/testsuite/gcc.target/aarch64/vect-slp-two-operator.c 
> > b/gcc/testsuite/gcc.target/aarch64/vect-slp-two-operator.c
> > new file mode 100644
> > index 000..b6b093ffc34
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/vect-slp-two-operator.c
> > @@ -0,0 +1,36 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect 
> > -fdump-tree-vect-details" } */
> > +
> > +typedef unsigned char uint8_t;
> > +typedef unsigned int uint32_t;
> > +
> > +#define HADAMARD4(d0, d1, d2, d3, s0, s1, s2, s3) {\
> > +int t0 = s0 + s1;\
> > +int t1 = s0 - s1;\
> > +int t2 = s2 + s3;\
> > +int t3 = s2 - s3;\
> > +d0 = t0 + t2;\
> > +d1 = t1 + t3;\
> > +d2 = t0 - t2;\
> > +d3 = t1 - t3;\
> > +}
> > +
> > +void sink(uint32_t tmp[4][4]);
> > +
> > +int x264_pixel_satd_8x4( uint8_t *pix1, int i_pix1, uint8_t *pix2, int 
> > i_pix2 )
> > +{
> > +uint32_t tmp[4][4];
> > +int sum = 0;
> > +for( int i = 0; i < 4; i++, pix1 += i_pix1, pix2 += i_pix2 )
> > +{
> > +uint32_t a0 = (pix1[0] - pix2[0]) + ((pix1[4] - pix2[4]) << 16);
> > +uint32_t a1 = (pix1[1] - pix2[1]) + ((pix1[5] - pix2[5]) << 16);
> > +uint32_t a2 = (pix1[2] - pix2[2]) + ((pix1[6] - pix2[6]) << 16);
> > +uint32_t a3 = (pix1[3] - pix2[3]) + ((pix1[7] - pix2[7]) << 16);
> > +HADAMARD4( tmp[i][0], tmp[i][1], tmp[i][2], tmp[i][3], a0,a1,a2,a3 
> > );
> > +}
> > +sink(tmp);
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index b47b7e8c979..60d0d388dff 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -2420,6 +2420,95 @@ out:
> >}
> >swap = NULL;
> >
> > +  bool has_two_operators_perm = false;
> > +  auto_vec two_op_perm_indices[2];
> > +  vec two_op_scalar_stmts[2] = {vNULL, vNULL};
> > +
> > +  if (two_operators && oprnds_info.length () == 2 && group_size > 2)
> > +{
> > +  unsigned idx = 0;
> > +  hash_map seen;
> > +  vec new_oprnds_info
> > +   = vect_create_oprnd_info (1, group_size);
> > +  bool success = true;
> > +
> > +  enum tree_code code = ERROR_MARK;
> > +  if (oprnds_info[0]->def_stmts[0]
> > + && is_a (oprnds_info[0]->def_stmts[0]->stmt))
> > +   code = gimple_assign_rhs_code (oprnds_info[0]->def_stmts[0]->stmt);
> > +
> > +  for (unsigned j = 0; j < group_size; ++j)
> > +   {
> > + FOR_EACH_VEC_ELT (oprnds_info, i, oprnd_info)
> > +   {
> > + stmt_vec_info stmt_info = oprnd_info->def_stmts[j];
> > + if (!stmt_info || !stmt_info->stmt
> > + || !is_a (stmt_info->stmt)
> > + || gimple_assign_rhs_code (stmt_info->stmt) != code
> > + || skip_args[i])
> > +   {
> > + success = false;
> > + break;
> > +   }
> > +
> > + bool exists;
> > + unsigned &stmt_idx
> > +   = seen.get_or_insert (stmt_info->stmt, &exists);
> > +
> > + if (!exists)
>

[Patch] libgomp/libgomp.texi: Mention -fno-builtin-omp_is_initial_device (was: Re: [PATCH, v3] OpenMP: Constructors and destructors for "declare target" static aggregates)

2024-08-08 Thread Tobias Burnus

Document -fno-builtin-omp_is_initial_device as discussed:

Jakub Jelinek wrote:

RFC: Should be document this new built-in some where? If so, where? As part
of the routine description in libgomp.texi? Or in extend.texi (or even
invoke.texi)?

I think libgomp.texi in the omp_is_initial_device description, mention
that the compiler folds it by default and that if that is undesirable,
there is this option to use.


Unless there are wording suggestions, I will commit it later today.

Tobias
libgomp/libgomp.texi: Mention -fno-builtin-omp_is_initial_device

libgomp/ChangeLog:

	* libgomp.texi (omp_is_initial_device): Mention
	-fno-builtin-omp_is_initial_device and folding by default.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index c6759dd03bc..96cc0e4baa8 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -1754,6 +1754,10 @@ This function returns @code{true} if currently running on the host device,
 @code{false} otherwise.  Here, @code{true} and @code{false} represent
 their language-specific counterparts.
 
+Note that in GCC this value is already folded to a constant in the compiler;
+compile with @option{-fno-builtin-omp_is_initial_device} if a run-time function
+is desired.
+
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
 @item @emph{Prototype}: @tab @code{int omp_is_initial_device(void);}


Re: [Patch] libgomp/libgomp.texi: Mention -fno-builtin-omp_is_initial_device (was: Re: [PATCH, v3] OpenMP: Constructors and destructors for "declare target" static aggregates)

2024-08-08 Thread Jakub Jelinek
On Thu, Aug 08, 2024 at 02:18:48PM +0200, Tobias Burnus wrote:
> Document  -fno-builtin-omp_is_initial_device as discussed:
> 
> Jakub Jelinek wrote:
> > > RFC: Should be document this new built-in some where? If so, where? As 
> > > part
> > > of the routine description in libgomp.texi? Or in extend.texi (or even
> > > invoke.texi)?
> > I think libgomp.texi in the omp_is_initial_device description, mention
> > that the compiler folds it by default and that if that is undesirable,
> > there is this option to use.
> 
> Unless there are wording suggestions, I will commit it later today.
> 
> Tobias

> libgomp/libgomp.texi: Mention -fno-builtin-omp_is_initial_device
> 
> libgomp/ChangeLog:
> 
>   * libgomp.texi (omp_is_initial_device): Mention
>   -fno-builtin-omp_is_initial_device and folding by default.

LGTM.

Jakub



[Patch] libgomp.texi: Update implementation status table for OpenMP TR13

2024-08-08 Thread Tobias Burnus
Update for the very recently released TR13. Unsurprisingly, most items
are still unimplemented.


→ https://www.openmp.org/specifications/ → Technical Report 13

Comments, suggestions, typo fixes? — If not, I will commit it later today.

Tobias
libgomp.texi: Update implementation status table for OpenMP TR13

libgomp/ChangeLog:

	* libgomp.texi (OpenMP Technical Report 13): Renamed from
	'OpenMP Technical Report 12'; updated for TR13 changes.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index c6759dd03bc..96cc0e4baa8 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -169,7 +169,7 @@ See also @ref{OpenMP Implementation Status}.
 * OpenMP 5.0:: Feature completion status to 5.0 specification
 * OpenMP 5.1:: Feature completion status to 5.1 specification
 * OpenMP 5.2:: Feature completion status to 5.2 specification
-* OpenMP Technical Report 12:: Feature completion status to second 6.0 preview
+* OpenMP Technical Report 13:: Feature completion status to third 6.0 preview
 @end menu
 
 The @code{_OPENMP} preprocessor macro and Fortran's @code{openmp_version}
@@ -391,7 +391,7 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
 @item @code{destroy} clause with destroy-var argument on @code{depobj}
   @tab Y @tab
 @item Deprecation of no-argument @code{destroy} clause on @code{depobj}
-  @tab N @tab
+  @tab N/A @tab undeprecated in OpenMP 6
 @item @code{linear} clause syntax changes and @code{step} modifier @tab Y @tab
 @item Deprecation of minus operator for reductions @tab N @tab
 @item Deprecation of separating @code{map} modifiers without comma @tab N @tab
@@ -448,20 +448,24 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
 @end multitable
 
 
-@node OpenMP Technical Report 12
-@section OpenMP Technical Report 12
+@node OpenMP Technical Report 13
+@section OpenMP Technical Report 13
 
-Technical Report (TR) 12 is the second preview for OpenMP 6.0.
+Technical Report (TR) 13 is the third preview for OpenMP 6.0.
 
 @unnumberedsubsec New features listed in Appendix B of the OpenMP specification
 @multitable @columnfractions .60 .10 .25
-@item Features deprecated in versions 5.2, 5.1 and 5.0 were removed
+@item Features deprecated in versions 5.0, 5.1 and 5.2 were removed
   @tab N/A @tab Backward compatibility
 @item Full support for C23 was added @tab P @tab
 @item Full support for C++23 was added @tab P @tab
+@item Full support for Fortran 2023 was added @tab P @tab
 @item @code{_ALL} suffix to the device-scope environment variables
   @tab P @tab Host device number wrongly accepted
 @item @code{num_threads} now accepts a list @tab N @tab
+@item Abstract names added for @code{OMP_NUM_THREADS},
+  @code{OMP_THREAD_LIMIT} and @code{OMP_TEAMS_THREAD_LIMIT}
+  @tab N @tab
 @item Supporting increments with abstract names in @code{OMP_PLACES} @tab N @tab
 @item Extension of @code{OMP_DEFAULT_DEVICE} and new
   @code{OMP_AVAILABLE_DEVICES} environment vars @tab N @tab
@@ -470,28 +474,51 @@ Technical Report (TR) 12 is the second preview for OpenMP 6.0.
   @tab Y @tab
 @item The OpenMP directive syntax was extended to include C 23 attribute
   specifiers @tab Y @tab
+@item Support for pure directives in Fortran's @code{do concurrent} @tab N @tab
 @item All inarguable clauses take now an optional Boolean argument @tab N @tab
 @item For Fortran, @emph{locator list} can be also function reference with
   data pointer result @tab N @tab
 @item Concept of @emph{assumed-size arrays} in C and C++
   @tab N @tab
 @item @emph{directive-name-modifier} accepted in all clauses @tab N @tab
+@item Argument-free version of @code{depobj} including added @code{init} clause
+  @tab N @tab
+@item Undeprecate omitting the argument to the @code{depend} clause of
+  the argument version of the @code{depend} construct @tab Y @tab
 @item For Fortran, atomic with BLOCK construct and, for C/C++, with
   unlimited curly braces supported @tab N @tab
+@item For Fortran, atomic with pointer comparison @tab N @tab
+@item For Fortran, atomic with enum and enumeration types @tab N @tab
 @item For Fortran, atomic compare with storing the comparison result
   @tab N @tab
 @item New @code{looprange} clause @tab N @tab
-@item Ref-count change for @code{use_device_ptr}/@code{use_device_addr}
+@item For Fortran, handling polymorphic types in data-sharing-attribute
+  clauses @tab P @tab @code{private} not supported
+@item For Fortran, rejecting polymorphic types in data-mapping clauses
+  @tab N @tab not diagnosed (and mostly unsupported)
+@item New @code{taskgraph} construct including @emph{saved} modifier and
+  @code{replayable} clause @tab N @tab
+@item @code{default} clause on the @code{target} directive @tab N @tab
+@item Ref-count change for @code{use_device_ptr} and @code{use_device_addr}
   @tab N @tab
 @item Support for inductions @tab N @tab
+@item Deprecation of the combiner expressio

Re: [PATCH] c++/modules: Assume header bindings are global module

2024-08-08 Thread Jason Merrill

On 8/8/24 8:06 AM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?


OK.


-- >8 --

While stepping through some code I noticed that we do some extra work
(finding the originating module decl, stripping the template, and
inspecting the attached-ness) for every declaration taken from a header
unit.  This doesn't seem necessary though since no declaration in a
header unit can be attached to anything but the global module, so we can
just assume that global_p will be true.

This was the original behaviour before I removed this assumption while
refactoring for r15-2807-gc592310d5275e0.

gcc/cp/ChangeLog:

* module.cc (module_state::read_cluster): Assume header module
declarations will require GM merging.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 58ad8cbdb61..f4d137b13a1 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -15361,7 +15361,7 @@ module_state::read_cluster (unsigned snum)
tree visible = NULL_TREE;
tree type = NULL_TREE;
bool dedup = false;
-   bool global_p = false;
+   bool global_p = is_header ();
  
  	/* We rely on the bindings being in the reverse order of

   the resulting overload set.  */




Re: [PATCH] c++: Propagate TREE_ADDRESSABLE in fixup_type_variants [PR115062]

2024-08-08 Thread Jason Merrill

On 8/8/24 7:59 AM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?


OK.


The change to 'finish_struct_bits' is not required for this PR but I
felt it was a nice cleanup; happy to commit without it though if
preferred.

-- >8 --

This has caused issues with modules when an import fills in the
definition of a type already created with a typedef.

PR c++/115062

gcc/cp/ChangeLog:

* class.cc (fixup_type_variants): Propagate TREE_ADDRESSABLE.
(finish_struct_bits): Cleanup now that TREE_ADDRESSABLE is
propagated by fixup_type_variants.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr115062_a.H: New test.
* g++.dg/modules/pr115062_b.H: New test.
* g++.dg/modules/pr115062_c.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/class.cc   | 31 ++-
  gcc/testsuite/g++.dg/modules/pr115062_a.H |  6 +
  gcc/testsuite/g++.dg/modules/pr115062_b.H | 14 ++
  gcc/testsuite/g++.dg/modules/pr115062_c.C |  9 +++
  4 files changed, 43 insertions(+), 17 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/pr115062_a.H
  create mode 100644 gcc/testsuite/g++.dg/modules/pr115062_b.H
  create mode 100644 gcc/testsuite/g++.dg/modules/pr115062_c.C

diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
index 718601756dd..fb6c3370950 100644
--- a/gcc/cp/class.cc
+++ b/gcc/cp/class.cc
@@ -2312,6 +2312,7 @@ fixup_type_variants (tree type)
TYPE_PRECISION (variant) = TYPE_PRECISION (type);
TYPE_MODE_RAW (variant) = TYPE_MODE_RAW (type);
TYPE_EMPTY_P (variant) = TYPE_EMPTY_P (type);
+  TREE_ADDRESSABLE (variant) = TREE_ADDRESSABLE (type);
  }
  }
  
@@ -2378,8 +2379,17 @@ fixup_attribute_variants (tree t)

  static void
  finish_struct_bits (tree t)
  {
-  /* Fix up variants (if any).  */
-  fixup_type_variants (t);
+  /* If this type has a copy constructor or a destructor, force its
+ mode to be BLKmode, and force its TREE_ADDRESSABLE bit to be
+ nonzero.  This will cause it to be passed by invisible reference
+ and prevent it from being returned in a register.  */
+  if (type_has_nontrivial_copy_init (t)
+  || TYPE_HAS_NONTRIVIAL_DESTRUCTOR (t))
+{
+  SET_DECL_MODE (TYPE_MAIN_DECL (t), BLKmode);
+  SET_TYPE_MODE (t, BLKmode);
+  TREE_ADDRESSABLE (t) = 1;
+}
  
if (BINFO_N_BASE_BINFOS (TYPE_BINFO (t)) && TYPE_POLYMORPHIC_P (t))

  /* For a class w/o baseclasses, 'finish_struct' has set
@@ -2392,21 +2402,8 @@ finish_struct_bits (tree t)
 looking in the vtables).  */
  get_pure_virtuals (t);
  
-  /* If this type has a copy constructor or a destructor, force its

- mode to be BLKmode, and force its TREE_ADDRESSABLE bit to be
- nonzero.  This will cause it to be passed by invisible reference
- and prevent it from being returned in a register.  */
-  if (type_has_nontrivial_copy_init (t)
-  || TYPE_HAS_NONTRIVIAL_DESTRUCTOR (t))
-{
-  tree variants;
-  SET_DECL_MODE (TYPE_MAIN_DECL (t), BLKmode);
-  for (variants = t; variants; variants = TYPE_NEXT_VARIANT (variants))
-   {
- SET_TYPE_MODE (variants, BLKmode);
- TREE_ADDRESSABLE (variants) = 1;
-   }
-}
+  /* Fix up variants (if any).  */
+  fixup_type_variants (t);
  }
  
  /* Issue warnings about T having private constructors, but no friends,

diff --git a/gcc/testsuite/g++.dg/modules/pr115062_a.H 
b/gcc/testsuite/g++.dg/modules/pr115062_a.H
new file mode 100644
index 000..3c9daac317e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr115062_a.H
@@ -0,0 +1,6 @@
+// PR c++/115062
+// { dg-additional-options "-fmodule-header" }
+// { dg-module-cmi {} }
+
+template  class S;
+typedef S X;
diff --git a/gcc/testsuite/g++.dg/modules/pr115062_b.H 
b/gcc/testsuite/g++.dg/modules/pr115062_b.H
new file mode 100644
index 000..d8da59591ec
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr115062_b.H
@@ -0,0 +1,14 @@
+// PR c++/115062
+// { dg-additional-options "-fmodule-header" }
+// { dg-module-cmi {} }
+
+template 
+struct S {
+  int a;
+  long b;
+  union {};
+  ~S();
+  void foo();
+};
+extern template void S::foo();
+S operator+(S, const char *);
diff --git a/gcc/testsuite/g++.dg/modules/pr115062_c.C 
b/gcc/testsuite/g++.dg/modules/pr115062_c.C
new file mode 100644
index 000..5255b9ffca7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr115062_c.C
@@ -0,0 +1,9 @@
+// PR c++/115062
+// { dg-additional-options "-fmodules-ts" }
+
+import "pr115062_a.H";
+import "pr115062_b.H";
+
+int main() {
+  X x = X() + "";
+}




[nvptx] Pass -m32/-m64 to host_compiler if it has multilib support

2024-08-08 Thread Prathamesh Kulkarni
Hi Richard,
After the fix for the differing NUM_POLY_INT_COEFFS in AArch64/nvptx
offloading, the following minimal test:

int main()
{
  int x;
  #pragma omp target map(x)
x = 5;
  return x;
}

compiled with -fopenmp -foffload=nvptx-none now fails with:
gcc: error: unrecognized command-line option '-m64'
nvptx mkoffload: fatal error: ../install/bin/gcc returned 1 exit status 
compilation terminated.

As mentioned in the RFC email, this happens because
nvptx/mkoffload.cc:compile_native passes -m64/-m32 to the host compiler
depending on whether offload_abi is OFFLOAD_ABI_LP64 or OFFLOAD_ABI_ILP32,
and the aarch64 backend doesn't recognize these options.

Based on your suggestion in: 
https://gcc.gnu.org/pipermail/gcc/2024-July/244470.html,
The attached patch generates a new macro HOST_MULTILIB derived from
$enable_as_accelerator_for, and in mkoffload.cc it gates passing -m32/-m64
to host_compiler on HOST_MULTILIB. I verified that the macro is set to 0 for
an aarch64 host (and thus avoids the unrecognized-command-line-option error
above), and is set to 1 for an x86_64 host.

Does the patch look OK ?

Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh
[nvptx] Pass -m32/-m64 to host_compiler if it has multilib support.

gcc/ChangeLog:
* configure.ac: Generate new macro HOST_MULTILIB.
* config.in: Regenerate.
* configure: Likewise.
* config/nvptx/mkoffload.cc (compile_native): Gate appending
"-m32"/"-m64" to argv_obstack on HOST_MULTILIB.
(main): Likewise.

Signed-off-by: Prathamesh Kulkarni 

diff --git a/gcc/config.in b/gcc/config.in
index 7fcabbe5061..3c509356f0a 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -2270,6 +2270,12 @@
 #endif
 
 
+/* Define if host has multilib support. */
+#ifndef USED_FOR_TARGET
+#undef HOST_MULTILIB
+#endif
+
+
 /* Define which stat syscall is able to handle 64bit indodes. */
 #ifndef USED_FOR_TARGET
 #undef HOST_STAT_FOR_64BIT_INODES
diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 503b1abcefd..f7d29bd5215 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -607,17 +607,18 @@ compile_native (const char *infile, const char *outfile, 
const char *compiler,
   obstack_ptr_grow (&argv_obstack, ptx_dumpbase);
   obstack_ptr_grow (&argv_obstack, "-dumpbase-ext");
   obstack_ptr_grow (&argv_obstack, ".c");
-  switch (offload_abi)
-{
-case OFFLOAD_ABI_LP64:
-  obstack_ptr_grow (&argv_obstack, "-m64");
-  break;
-case OFFLOAD_ABI_ILP32:
-  obstack_ptr_grow (&argv_obstack, "-m32");
-  break;
-default:
-  gcc_unreachable ();
-}
+  if (HOST_MULTILIB)
+switch (offload_abi)
+  {
+   case OFFLOAD_ABI_LP64:
+ obstack_ptr_grow (&argv_obstack, "-m64");
+ break;
+   case OFFLOAD_ABI_ILP32:
+ obstack_ptr_grow (&argv_obstack, "-m32");
+ break;
+   default:
+ gcc_unreachable ();
+  }
   obstack_ptr_grow (&argv_obstack, infile);
   obstack_ptr_grow (&argv_obstack, "-c");
   obstack_ptr_grow (&argv_obstack, "-o");
@@ -761,17 +762,18 @@ main (int argc, char **argv)
   if (verbose)
 obstack_ptr_grow (&argv_obstack, "-v");
   obstack_ptr_grow (&argv_obstack, "-xlto");
-  switch (offload_abi)
-{
-case OFFLOAD_ABI_LP64:
-  obstack_ptr_grow (&argv_obstack, "-m64");
-  break;
-case OFFLOAD_ABI_ILP32:
-  obstack_ptr_grow (&argv_obstack, "-m32");
-  break;
-default:
-  gcc_unreachable ();
-}
+  if (HOST_MULTILIB)
+switch (offload_abi)
+  {
+   case OFFLOAD_ABI_LP64:
+ obstack_ptr_grow (&argv_obstack, "-m64");
+ break;
+   case OFFLOAD_ABI_ILP32:
+ obstack_ptr_grow (&argv_obstack, "-m32");
+ break;
+   default:
+ gcc_unreachable ();
+  }
   if (fopenmp)
 obstack_ptr_grow (&argv_obstack, "-mgomp");
 
diff --git a/gcc/configure b/gcc/configure
index 557ea5fa3ac..cdfa06f0c80 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -931,6 +931,7 @@ infodir
 docdir
 oldincludedir
 includedir
+runstatedir
 localstatedir
 sharedstatedir
 sysconfdir
@@ -1115,6 +1116,7 @@ datadir='${datarootdir}'
 sysconfdir='${prefix}/etc'
 sharedstatedir='${prefix}/com'
 localstatedir='${prefix}/var'
+runstatedir='${localstatedir}/run'
 includedir='${prefix}/include'
 oldincludedir='/usr/include'
 docdir='${datarootdir}/doc/${PACKAGE}'
@@ -1367,6 +1369,15 @@ do
   | -silent | --silent | --silen | --sile | --sil)
 silent=yes ;;
 
+  -runstatedir | --runstatedir | --runstatedi | --runstated \
+  | --runstate | --runstat | --runsta | --runst | --runs \
+  | --run | --ru | --r)
+ac_prev=runstatedir ;;
+  -runstatedir=* | --runstatedir=* | --runstatedi=* | --runstated=* \
+  | --runstate=* | --runstat=* | --runsta=* | --runst=* | --runs=* \
+  | --run=* | --ru=* | --r=*)
+runstatedir=$ac_optarg ;;
+
   -sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb)
 ac_prev=sbindir ;;
   -sbindir=* | --sbindir=* | --sbindi=*

Re: [PATCH] RISC-V: tree-optimization/116274 - overzealous SLP vectorization

2024-08-08 Thread Richard Sandiford
Richard Biener  writes:
> The following tries to address that the vectorizer fails to have
> precise knowledge of argument and return calling conventions and
> views some accesses as loads and stores that are not.
> This is mainly important when doing basic-block vectorization as
> otherwise loop indexing would force such arguments to memory.
>
> On x86 the reduction in the number of apparent loads and stores
> often dominates cost analysis so the following tries to mitigate
> this aggressively by adjusting only the scalar load and store
> cost, reducing them to the cost of a simple scalar statement,
> but not touching the vector access cost which would be much
> harder to estimate.  Thereby we error on the side of not performing
> basic-block vectorization.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> Richard - we can of course do this adjustment in the backend as well
> but it might be worthwhile in generic code.  Do you see similar
> issues on arm?

Yeah, a pathological case is:

struct a { float f[4]; };
struct a test(struct a a) {
  a.f[0] += 1;
  a.f[1] += 2;
  a.f[2] += 3;
  a.f[3] += 4;
  return a;
}

which with -O2 generates:

test:
.LFB0:
.cfi_startproc
fmovw1, s2
fmovw4, s0
mov x0, 0
fmovw3, s1
sub sp, sp, #16
.cfi_def_cfa_offset 16
mov x2, 0
bfi x0, x1, 0, 32
fmovw1, s3
bfi x2, x4, 0, 32
bfi x2, x3, 32, 32
bfi x0, x1, 32, 32
adrpx1, .LC0
stp x2, x0, [sp]
ldr q30, [sp]
ldr q31, [x1, #:lo12:.LC0]
add sp, sp, 16
.cfi_def_cfa_offset 0
faddv31.4s, v30.4s, v31.4s
umovx0, v31.d[0]
umovx1, v31.d[1]
mov x3, x0
lsr x4, x0, 32
lsr x0, x1, 32
fmovs1, w4
fmovs3, w0
fmovs2, w1
lsr w0, w3, 0
fmovs0, w0
ret
.cfi_endproc

Admittedly most of the badness there would probably be fixed by
parameter and return fsra (Jiufu Guo's patch), but it still doesn't
make much sense to marshall 4 separate floats into one vector for
a single addition, only to tear it apart into 4 separate floats
afterwards.  We should just do four scalar additions instead.

(The patch doesn't fix this case, although it does trigger.)

>   PR tree-optimization/116274
>   * tree-vect-slp.cc (vect_bb_slp_scalar_cost): Cost scalar loads
>   and stores as simple scalar stmts when they access a non-global,
>   not address-taken variable that doesn't have BLKmode assigned.
>
>   * gcc.target/i386/pr116274.c: New testcase.
> ---
>  gcc/testsuite/gcc.target/i386/pr116274.c |  9 +
>  gcc/tree-vect-slp.cc | 12 +++-
>  2 files changed, 20 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr116274.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr116274.c 
> b/gcc/testsuite/gcc.target/i386/pr116274.c
> new file mode 100644
> index 000..d5811344b93
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr116274.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-slp2-optimized" } */
> +
> +struct a { long x,y; };
> +long test(struct a a) { return a.x+a.y; }
> +
> +/* { dg-final { scan-tree-dump-not "basic block part vectorized" "slp2" } } 
> */
> +/* { dg-final { scan-assembler-times "addl|leaq" 1 } } */
> +/* { dg-final { scan-assembler-not "padd" } } */
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 3464d0c0e23..e43ff721100 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -7807,7 +7807,17 @@ next_lane:
>vect_cost_for_stmt kind;
>if (STMT_VINFO_DATA_REF (orig_stmt_info))
>   {
> -   if (DR_IS_READ (STMT_VINFO_DATA_REF (orig_stmt_info)))
> +   data_reference_p dr = STMT_VINFO_DATA_REF (orig_stmt_info);
> +   tree base = get_base_address (DR_REF (dr));
> +   /* When the scalar access is to a non-global not address-taken
> +  decl that is not BLKmode assume we can access it with a single
> +  non-load/store instruction.  */
> +   if (DECL_P (base)
> +   && !is_global_var (base)
> +   && !TREE_ADDRESSABLE (base)
> +   && DECL_MODE (base) != BLKmode)
> + kind = scalar_stmt;
> +   else if (DR_IS_READ (STMT_VINFO_DATA_REF (orig_stmt_info)))
>   kind = scalar_load;
> else
>   kind = scalar_store;

LGTM FWIW, but did you consider skipping the cost altogether?
I'm not sure what the scalar_stmt would correspond to in practice,
if we assume that the ABI (for parameters/returns) or RA (for locals)
puts the data in a sensible register class for the datatype.

Thanks,
Richard


Re: [nvptx] Pass -m32/-m64 to host_compiler if it has multilib support

2024-08-08 Thread Andrew Pinski
On Thu, Aug 8, 2024 at 6:11 AM Prathamesh Kulkarni
 wrote:
>
> Hi Richard,
> After differing NUM_POLY_INT_COEFFS fix for AArch64/nvptx offloading, the 
> following minimal test:
>
> int main()
> {
>   int x;
>   #pragma omp target map(x)
> x = 5;
>   return x;
> }
>
> compiled with -fopenmp -foffload=nvptx-none now fails with:
> gcc: error: unrecognized command-line option '-m64'
> nvptx mkoffload: fatal error: ../install/bin/gcc returned 1 exit status 
> compilation terminated.
>
> As mentioned in RFC email, this happens because 
> nvptx/mkoffload.cc:compile_native passes -m64/-m32 to host compiler depending 
> on whether
> offload_abi is OFFLOAD_ABI_LP64 or OFFLOAD_ABI_ILP32, and aarch64 backend 
> doesn't recognize these options.
>
> Based on your suggestion in: 
> https://gcc.gnu.org/pipermail/gcc/2024-July/244470.html,
> The attached patch generates new macro HOST_MULTILIB derived from 
> $enable_as_accelerator_for, and in mkoffload.cc it gates passing -m32/-m64
> to host_compiler on HOST_MULTILIB. I verified that the macro is set to 0 for 
> aarch64 host (and thus avoids above unrecognized command line option error),
> and is set to 1 for x86_64 host.
>
> Does the patch look OK ?

Note I think the usage of the name MULTILIB here is wrong because
aarch64 (and riscv) could have MULTILIB support; just the options are
different. For aarch64, it would be -mabi=ilp32/-mabi=lp64 (for riscv
it is more complex).

This most likely should be something more complex due to the above.
Maybe call it HOST_64_32, but even that seems wrong due to AArch64
having ILP32 support and such.
What about HOST_64ABI_OPTS="-mabi=lp64"/HOST_32ABI_OPTS="-mabi=ilp32"?
But I am not sure if that would be enough to support RISC-V, which
requires two options.

Thanks,
Andrew Pinski

>
> Signed-off-by: Prathamesh Kulkarni 
>
> Thanks,
> Prathamesh


[commit] amdgcn: Re-enable trampolines

2024-08-08 Thread Andrew Stubbs
Previously, trampolines worked on GCN3 devices, but the newer GCN5
devices had different permissions on the stack memory space we were
using.

That changed when we added the reverse-offload features because we
switched from using the "private" memory space to using a regular memory
allocation.

The execute permissions on this new space permit trampolines to work
just as they did before.

This patch has been committed to mainline and will be pushed to the OG14
branch shortly.

Andrew

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_trampoline_init): Re-enable trampolines.
---
 gcc/config/gcn/gcn.cc | 5 -
 1 file changed, 5 deletions(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 00f2978559b..b22132de6ab 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -3799,11 +3799,6 @@ gcn_asm_trampoline_template (FILE *f)
 static void
 gcn_trampoline_init (rtx m_tramp, tree fndecl, rtx chain_value)
 {
-  // FIXME
-  if (TARGET_GCN5_PLUS)
-sorry ("nested function trampolines not supported on GCN5 due to"
-   " non-executable stacks");
-
   emit_block_move (m_tramp, assemble_trampoline_template (),
   GEN_INT (TRAMPOLINE_SIZE), BLOCK_OP_NORMAL);
 
-- 
2.45.2



Re: [Patch] libgomp/libgomp.texi: Mention -fno-builtin-omp_is_initial_device

2024-08-08 Thread Sandra Loosemore

On 8/8/24 06:20, Jakub Jelinek wrote:

On Thu, Aug 08, 2024 at 02:18:48PM +0200, Tobias Burnus wrote:

Document  -fno-builtin-omp_is_initial_device as discussed:

Jakub Jelinek wrote:

RFC: Should we document this new built-in somewhere? If so, where? As part
of the routine description in libgomp.texi? Or in extend.texi (or even
invoke.texi)?

I think libgomp.texi in the omp_is_initial_device description, mention
that the compiler folds it by default and that if that is undesirable,
there is this option to use.


Unless there are wording suggestions, I will commit it later today.

Tobias



libgomp/libgomp.texi: Mention -fno-builtin-omp_is_initial_device

libgomp/ChangeLog:

* libgomp.texi (omp_is_initial_device): Mention
-fno-builtin-omp_is_initial_device and folding by default.


LGTM.


Me too.

-Sandra



Re: [PATCH v3 1/7] OpenMP: dispatch + adjust_args tree data structures and front-end interfaces

2024-08-08 Thread Tobias Burnus

Paul-Antoine Arras wrote:

This patch introduces the OMP_DISPATCH tree node, as well as two new clauses
`nocontext` and `novariants`. It defines/exposes interfaces that will be
used in subsequent patches that add front-end and middle-end support, but
nothing generates these nodes yet.


LGTM - thanks!

Tobias


gcc/ChangeLog:

* builtin-types.def (BT_FN_PTR_CONST_PTR_INT): New.
* omp-selectors.h (enum omp_ts_code): Add OMP_TRAIT_CONSTRUCT_DISPATCH.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_NOVARIANTS
and OMP_CLAUSE_NOCONTEXT.
(dump_generic_node): Handle OMP_DISPATCH.
* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
(omp_clause_code_name): Add "novariants" and "nocontext".
* tree.def (OMP_DISPATCH): New.
* tree.h (OMP_DISPATCH_BODY): New macro.
(OMP_DISPATCH_CLAUSES): New macro.
(OMP_CLAUSE_NOVARIANTS_EXPR): New macro.
(OMP_CLAUSE_NOCONTEXT_EXPR): New macro.

gcc/fortran/ChangeLog:

* types.def (BT_FN_PTR_CONST_PTR_INT): Declare.
---
  gcc/builtin-types.def|  1 +
  gcc/fortran/types.def|  1 +
  gcc/omp-selectors.h  |  1 +
  gcc/tree-core.h  |  7 +++
  gcc/tree-pretty-print.cc | 21 +
  gcc/tree.cc  |  4 
  gcc/tree.def |  5 +
  gcc/tree.h   |  7 +++
  8 files changed, 47 insertions(+)

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index c97d6bad1de..ef7aaf67d13 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -677,6 +677,7 @@ DEF_FUNCTION_TYPE_2 (BT_FN_INT_FEXCEPT_T_PTR_INT, BT_INT, 
BT_FEXCEPT_T_PTR,
  DEF_FUNCTION_TYPE_2 (BT_FN_INT_CONST_FEXCEPT_T_PTR_INT, BT_INT,
 BT_CONST_FEXCEPT_T_PTR, BT_INT)
  DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_UINT8, BT_PTR, BT_CONST_PTR, 
BT_UINT8)
+DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_INT, BT_PTR, BT_CONST_PTR, BT_INT)

  DEF_POINTER_TYPE (BT_PTR_FN_VOID_PTR_PTR, BT_FN_VOID_PTR_PTR)

diff --git a/gcc/fortran/types.def b/gcc/fortran/types.def
index 390cc9542f7..5047c8f816a 100644
--- a/gcc/fortran/types.def
+++ b/gcc/fortran/types.def
@@ -120,6 +120,7 @@ DEF_FUNCTION_TYPE_2 (BT_FN_BOOL_INT_BOOL, BT_BOOL, BT_INT, 
BT_BOOL)
  DEF_FUNCTION_TYPE_2 (BT_FN_VOID_PTR_PTRMODE,
 BT_VOID, BT_PTR, BT_PTRMODE)
  DEF_FUNCTION_TYPE_2 (BT_FN_VOID_CONST_PTR_SIZE, BT_VOID, BT_CONST_PTR, 
BT_SIZE)
+DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_INT, BT_PTR, BT_CONST_PTR, BT_INT)

  DEF_POINTER_TYPE (BT_PTR_FN_VOID_PTR_PTR, BT_FN_VOID_PTR_PTR)

diff --git a/gcc/omp-selectors.h b/gcc/omp-selectors.h
index c61808ec0ad..ef3ce9a449a 100644
--- a/gcc/omp-selectors.h
+++ b/gcc/omp-selectors.h
@@ -55,6 +55,7 @@ enum omp_ts_code {
OMP_TRAIT_CONSTRUCT_PARALLEL,
OMP_TRAIT_CONSTRUCT_FOR,
OMP_TRAIT_CONSTRUCT_SIMD,
+  OMP_TRAIT_CONSTRUCT_DISPATCH,
OMP_TRAIT_LAST,
OMP_TRAIT_INVALID = -1
  };
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 27c569c7702..508f5c580d4 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -542,6 +542,13 @@ enum omp_clause_code {

/* OpenACC clause: nohost.  */
OMP_CLAUSE_NOHOST,
+
+  /* OpenMP clause: novariants (scalar-expression).  */
+  OMP_CLAUSE_NOVARIANTS,
+
+  /* OpenMP clause: nocontext (scalar-expression).  */
+  OMP_CLAUSE_NOCONTEXT,
+
  };

  #undef DEFTREESTRUCT
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 4bb946bb0e8..752a402e0d0 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -506,6 +506,22 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, 
dump_flags_t flags)
  case OMP_CLAUSE_EXCLUSIVE:
name = "exclusive";
goto print_remap;
+case OMP_CLAUSE_NOVARIANTS:
+  pp_string (pp, "novariants");
+  pp_left_paren (pp);
+  gcc_assert (OMP_CLAUSE_NOVARIANTS_EXPR (clause));
+  dump_generic_node (pp, OMP_CLAUSE_NOVARIANTS_EXPR (clause), spc, flags,
+false);
+  pp_right_paren (pp);
+  break;
+case OMP_CLAUSE_NOCONTEXT:
+  pp_string (pp, "nocontext");
+  pp_left_paren (pp);
+  gcc_assert (OMP_CLAUSE_NOCONTEXT_EXPR (clause));
+  dump_generic_node (pp, OMP_CLAUSE_NOCONTEXT_EXPR (clause), spc, flags,
+false);
+  pp_right_paren (pp);
+  break;
  case OMP_CLAUSE__LOOPTEMP_:
name = "_looptemp_";
goto print_remap;
@@ -3947,6 +3963,11 @@ dump_generic_node (pretty_printer *pp, tree node, int 
spc, dump_flags_t flags,
dump_omp_clauses (pp, OMP_SECTIONS_CLAUSES (node), spc, flags);
goto dump_omp_body;

+case OMP_DISPATCH:
+  pp_string (pp, "#pragma omp dispatch");
+  dump_omp_clauses (pp, OMP_DISPATCH_CLAUSES (node), spc, flags);
+  goto dump_omp_body;
+
  case OMP_SECTION:
pp_string (pp, "#pra

Re: [PATCH] Support if conversion for switches

2024-08-08 Thread Andi Kleen
> > But your comment made me realize there is a major bug.
> >
> > if_convertible_switch_p also needs to check that that the labels don't fall
> > through, so the the flow graph is diamond shape.  Need some easy way to
> > verify that.
> 
> Do we verify this for if()s?  That is,

No we do not. After some consideration it isn't a bug at all.

> 
>   if (i)
> {
>   ...
>goto fallthru;
> }
>   else
>{
> fallthru:
>  ...
>}
> 
> For ifs we seem to add the predicate to both edges even in the degenerate 
> case.

Yes we do.

-Andi


Re: [Patch] libgomp.texi: Update implementation status table for OpenMP TR13

2024-08-08 Thread Sandra Loosemore

On 8/8/24 06:21, Tobias Burnus wrote:
Update for the very recently released TR13. Unsurprisingly, most items 
are still unimplemented.


→ https://www.openmp.org/specifications/ → Technical Report 13

Comments, suggestions, typo fixes? — If not, I will commit it later today.


I've got a few things...


 @item @code{workdistribute} directive for Fortran @tab N
-  @tab Renamed just after TR12; added in TR12 as @code{coexecute}
+  @tab Intermittendly known as @code{coexecute}


"Intermittendly" isn't a word.  I'm not sure what you're trying to say 
here, but I don't think we need to document things that are not part of 
any official standard and were never implemented in GCC.



+@item Deprecation of the @code{target_data_op}, @code{target},
+  @code{target_map and target_submit} callbacks and as value that
+  @code{set_callback} must return @tab N @tab


Do you mean "@code{target_map} and @code{target_submit}"?

And s/as value/as values/ (since there are more than one).


+@item The @code{values ompt_target_data_transfer_to_device},
+  @code{ompt_target_data_transfer_from_device},
+  @code{ompt_target_data_transfer_to_device_async} and
+  @code{ompt_target_data_transfer_from_device_async} were deprecated
+  @tab N @tab


Doesn't say what the things with these names are.  How about

"The  enumerators for the @code{target_data_op} OMPT type were 
deprecated."


-Sandra





[PATCH v2 0/4] aarch64: Fix intrinsic availability [PR112108]

2024-08-08 Thread Andrew Carlotti
This series of patches fixes issues with some intrinsics being incorrectly
gated by global target options, instead of just using function-specific target
options.  These issues have been present since the +tme, +memtag and +ls64
intrinsics were introduced.

This series is a rebased and fixed version of the series I sent last November:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635798.html

Patch 1 is updated to fix formatting, and to retain SME error reporting that
was merged after the original series was posted.

Patch 2 is updated after the creation of aarch64_general_check_builtin_call
upstream.

Patches 2-4 are also updated to use aarch64_general_simulate_builtin, and to
initialise the intrinsics within handle_arm_acle_h.


Bootstrapped and regression tested on aarch64.  Ok to merge?

Also, ok for backports to affected versions (with regression tests)?


[PATCH v2 1/4] aarch64: Refactor check_required_extensions

2024-08-08 Thread Andrew Carlotti
Move SVE extension checking functionality to aarch64-builtins.cc, so
that it can be shared by non-SVE intrinsics.

gcc/ChangeLog:

* config/aarch64/aarch64-sve-builtins.cc (check_builtin_call)
(expand_builtin): Update calls to the below.
(report_missing_extension, check_required_registers)
(check_required_extensions): Move out of aarch64_sve namespace,
rename, and move into...
* config/aarch64/aarch64-builtins.cc (aarch64_report_missing_extension)
(aarch64_check_non_general_registers)
(aarch64_check_required_extensions) ...here.
* config/aarch64/aarch64-protos.h (aarch64_check_required_extensions):
Add prototype.


diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 
30669f8aa1823b64689c67e306d38e234bd31698..d0fb8bc1d1fedb382cba1a1f09a9c3ce6757ee22
 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -2180,6 +2180,110 @@ aarch64_general_builtin_decl (unsigned code, bool)
   return aarch64_builtin_decls[code];
 }
 
+/* True if we've already complained about attempts to use functions
+   when the required extension is disabled.  */
+static bool reported_missing_extension_p;
+
+/* True if we've already complained about attempts to use functions
+   which require registers that are missing.  */
+static bool reported_missing_registers_p;
+
+/* Report an error against LOCATION that the user has tried to use
+   function FNDECL when extension EXTENSION is disabled.  */
+static void
+aarch64_report_missing_extension (location_t location, tree fndecl,
+ const char *extension)
+{
+  /* Avoid reporting a slew of messages for a single oversight.  */
+  if (reported_missing_extension_p)
+return;
+
+  error_at (location, "ACLE function %qD requires ISA extension %qs",
+   fndecl, extension);
+  inform (location, "you can enable %qs using the command-line"
+ " option %<-march%>, or by using the %"
+ " attribute or pragma", extension);
+  reported_missing_extension_p = true;
+}
+
+/* Check whether non-general registers required by ACLE function fndecl are
+ * available.  Report an error against LOCATION and return false if not.  */
+static bool
+aarch64_check_non_general_registers (location_t location, tree fndecl)
+{
+  /* Avoid reporting a slew of messages for a single oversight.  */
+  if (reported_missing_registers_p)
+return false;
+
+  if (TARGET_GENERAL_REGS_ONLY)
+{
+  /* FP/SIMD/SVE registers are not usable when -mgeneral-regs-only option
+is specified.  */
+  error_at (location,
+   "ACLE function %qD is incompatible with the use of %qs",
+   fndecl, "-mgeneral-regs-only");
+  reported_missing_registers_p = true;
+  return false;
+}
+
+  return true;
+}
+
+/* Check whether all the AARCH64_FL_* values in REQUIRED_EXTENSIONS are
+   enabled, given that those extensions are required for function FNDECL.
+   Report an error against LOCATION if not.
+   If REQUIRES_NON_GENERAL_REGISTERS is true, then also check whether
+   non-general registers are available.  */
+bool
+aarch64_check_required_extensions (location_t location, tree fndecl,
+  aarch64_feature_flags required_extensions,
+  bool requires_non_general_registers)
+{
+  auto missing_extensions = required_extensions & ~aarch64_asm_isa_flags;
+  if (missing_extensions == 0)
+return requires_non_general_registers
+  ? aarch64_check_non_general_registers (location, fndecl)
+  : true;
+
+  if (missing_extensions & AARCH64_FL_SM_OFF)
+{
+  error_at (location, "ACLE function %qD cannot be called when"
+   " SME streaming mode is enabled", fndecl);
+  return false;
+}
+
+  if (missing_extensions & AARCH64_FL_SM_ON)
+{
+  error_at (location, "ACLE function %qD can only be called when"
+   " SME streaming mode is enabled", fndecl);
+  return false;
+}
+
+  if (missing_extensions & AARCH64_FL_ZA_ON)
+{
+  error_at (location, "ACLE function %qD can only be called from"
+   " a function that has %qs state", fndecl, "za");
+  return false;
+}
+
+  static const struct {
+aarch64_feature_flags flag;
+const char *name;
+  } extensions[] = {
+#define AARCH64_OPT_EXTENSION(EXT_NAME, IDENT, C, D, E, F) \
+{ AARCH64_FL_##IDENT, EXT_NAME },
+#include "aarch64-option-extensions.def"
+  };
+
+  for (unsigned int i = 0; i < ARRAY_SIZE (extensions); ++i)
+if (missing_extensions & extensions[i].flag)
+  {
+   aarch64_report_missing_extension (location, fndecl, extensions[i].name);
+   return false;
+  }
+  gcc_unreachable ();
+}
+
 bool
 aarch64_general_check_builtin_call (location_t location, vec,
unsigned int code, tree fndecl,
diff --git a/gcc/config/aarch64/aarch64-pr

[COMMITTED 1/6] ada: Finalization_Size raises Constraint_Error

2024-08-08 Thread Marc Poulhiès
From: Javier Miranda 

When the attribute Finalization_Size is applied to an interface type
object, the compiler-generated code fails at runtime, raising a
Constraint_Error exception.

gcc/ada/

* exp_attr.adb (Expand_N_Attribute_Reference) :
If the prefix is an interface type, generate code to obtain its
address and displace it to reference the base of the object.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_attr.adb | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/exp_attr.adb b/gcc/ada/exp_attr.adb
index 13c7444ca87..6475308f71b 100644
--- a/gcc/ada/exp_attr.adb
+++ b/gcc/ada/exp_attr.adb
@@ -3688,11 +3688,34 @@ package body Exp_Attr is
 
  --  Local variables
 
- Size : Entity_Id;
+ P_Loc : constant Source_Ptr := Sloc (Pref);
+ Size  : Entity_Id;
 
   --  Start of processing for Finalization_Size
 
   begin
+ --  If the prefix is an interface type, generate code to obtain its
+ --  address and displace it to reference the base of the object.
+
+ if Is_Interface (Ptyp) then
+--  Generate:
+--Ptyp!(tag_ptr!($base_address (ptr.all'address)).all)
+
+Rewrite (Pref,
+  Unchecked_Convert_To (Ptyp,
+Make_Explicit_Dereference (P_Loc,
+  Unchecked_Convert_To (RTE (RE_Tag_Ptr),
+Make_Function_Call (P_Loc,
+  Name => New_Occurrence_Of
+(RTE (RE_Base_Address), P_Loc),
+  Parameter_Associations =>
+New_List (
+  Make_Attribute_Reference (P_Loc,
+Prefix => Duplicate_Subexpr (Pref),
+Attribute_Name => Name_Address)));
+Analyze_And_Resolve (Pref, Ptyp);
+ end if;
+
  --  If the prefix is the dereference of an access value subject to
  --  pragma No_Heap_Finalization, then no header has been added.
 
-- 
2.45.2



[COMMITTED 4/6] ada: Run-time error with GNAT-LLVM on container aggregate with finalization

2024-08-08 Thread Marc Poulhiès
From: Gary Dismukes 

When unnesting is enabled, the compiler was failing to copy the At_End_Proc
field from a block statement to the procedure created to replace it when
unnesting of top-level blocks is done.  At run time this could lead to
exceptions due to missing finalization calls.

gcc/ada/

* exp_ch7.adb (Unnest_Block): Copy the At_End_Proc from the block
statement to the newly created subprogram body.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch7.adb | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/exp_ch7.adb b/gcc/ada/exp_ch7.adb
index 72f0b539c2e..640ad5c60b8 100644
--- a/gcc/ada/exp_ch7.adb
+++ b/gcc/ada/exp_ch7.adb
@@ -8932,7 +8932,8 @@ package body Exp_Ch7 is
   Defining_Unit_Name => Local_Proc),
   Declarations   => Declarations (Decl),
   Handled_Statement_Sequence =>
-Handled_Statement_Sequence (Decl));
+Handled_Statement_Sequence (Decl),
+  At_End_Proc=> New_Copy_Tree (At_End_Proc (Decl)));
 
   --  Handlers in the block may contain nested subprograms that require
   --  unnesting.
-- 
2.45.2



[PATCH v2 2/4] aarch64: Fix tme intrinsic availability

2024-08-08 Thread Andrew Carlotti
The availability of tme intrinsics was previously gated at both
initialisation time (using global target options) and usage time
(accounting for function-specific target options).  This patch removes
the check at initialisation time, and also moves the intrinsics out of
the header file to allow for better error messages (matching the
existing error messages for SVE intrinsics).

gcc/ChangeLog:

PR target/112108
* config/aarch64/aarch64-builtins.cc (aarch64_init_tme_builtins):
(aarch64_general_init_builtins): Move tme initialisation...
(handle_arm_acle_h): ...to here, and remove feature check.
(aarch64_general_check_builtin_call): Check tme intrinsics.
(aarch64_expand_builtin_tme): Check feature availability.
* config/aarch64/arm_acle.h (__tstart, __tcommit, __tcancel)
(__ttest): Remove.
(_TMFAILURE_*): Define unconditionally.

gcc/testsuite/ChangeLog:

PR target/112108
* gcc.target/aarch64/acle/tme_guard-1.c: New test.
* gcc.target/aarch64/acle/tme_guard-2.c: New test.
* gcc.target/aarch64/acle/tme_guard-3.c: New test.
* gcc.target/aarch64/acle/tme_guard-4.c: New test.


diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 
d0fb8bc1d1fedb382cba1a1f09a9c3ce6757ee22..f7d31d8c4308b4a883f8ce7df5c3ee319a9c
 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -1791,19 +1791,19 @@ aarch64_init_tme_builtins (void)
 = build_function_type_list (void_type_node, uint64_type_node, NULL);
 
   aarch64_builtin_decls[AARCH64_TME_BUILTIN_TSTART]
-= aarch64_general_add_builtin ("__builtin_aarch64_tstart",
+= aarch64_general_simulate_builtin ("__tstart",
   ftype_uint64_void,
   AARCH64_TME_BUILTIN_TSTART);
   aarch64_builtin_decls[AARCH64_TME_BUILTIN_TTEST]
-= aarch64_general_add_builtin ("__builtin_aarch64_ttest",
+= aarch64_general_simulate_builtin ("__ttest",
   ftype_uint64_void,
   AARCH64_TME_BUILTIN_TTEST);
   aarch64_builtin_decls[AARCH64_TME_BUILTIN_TCOMMIT]
-= aarch64_general_add_builtin ("__builtin_aarch64_tcommit",
+= aarch64_general_simulate_builtin ("__tcommit",
   ftype_void_void,
   AARCH64_TME_BUILTIN_TCOMMIT);
   aarch64_builtin_decls[AARCH64_TME_BUILTIN_TCANCEL]
-= aarch64_general_add_builtin ("__builtin_aarch64_tcancel",
+= aarch64_general_simulate_builtin ("__tcancel",
   ftype_void_uint64,
   AARCH64_TME_BUILTIN_TCANCEL);
 }
@@ -2068,6 +2068,7 @@ handle_arm_acle_h (void)
 {
   if (TARGET_LS64)
 aarch64_init_ls64_builtins ();
+  aarch64_init_tme_builtins ();
 }
 
 /* Initialize fpsr fpcr getters and setters.  */
@@ -2160,9 +2161,6 @@ aarch64_general_init_builtins (void)
   if (!TARGET_ILP32)
 aarch64_init_pauth_hint_builtins ();
 
-  if (TARGET_TME)
-aarch64_init_tme_builtins ();
-
   if (TARGET_MEMTAG)
 aarch64_init_memtag_builtins ();
 
@@ -2289,6 +2287,7 @@ aarch64_general_check_builtin_call (location_t location, 
vec,
unsigned int code, tree fndecl,
unsigned int nargs ATTRIBUTE_UNUSED, tree *args)
 {
+  tree decl = aarch64_builtin_decls[code];
   switch (code)
 {
 case AARCH64_RSR:
@@ -2301,15 +2300,28 @@ aarch64_general_check_builtin_call (location_t 
location, vec,
 case AARCH64_WSR64:
 case AARCH64_WSRF:
 case AARCH64_WSRF64:
-  tree addr = STRIP_NOPS (args[0]);
-  if (TREE_CODE (TREE_TYPE (addr)) != POINTER_TYPE
- || TREE_CODE (addr) != ADDR_EXPR
- || TREE_CODE (TREE_OPERAND (addr, 0)) != STRING_CST)
-   {
- error_at (location, "first argument to %qD must be a string literal",
-   fndecl);
- return false;
-   }
+  {
+   tree addr = STRIP_NOPS (args[0]);
+   if (TREE_CODE (TREE_TYPE (addr)) != POINTER_TYPE
+   || TREE_CODE (addr) != ADDR_EXPR
+   || TREE_CODE (TREE_OPERAND (addr, 0)) != STRING_CST)
+ {
+   error_at (location, "first argument to %qD must be a string 
literal",
+ fndecl);
+   return false;
+ }
+   break;
+  }
+
+case AARCH64_TME_BUILTIN_TSTART:
+case AARCH64_TME_BUILTIN_TCOMMIT:
+case AARCH64_TME_BUILTIN_TTEST:
+case AARCH64_TME_BUILTIN_TCANCEL:
+  return aarch64_check_required_extensions (location, decl,
+   AARCH64_FL_TME, false);
+
+default:
+  break;
 }
   /* Default behavior.  */
   return true;
@@ -2734,6 +2746,11 @@ aarch64_expand_fcmla_builtin (tree exp, rtx target, int 
fcode)
 static rtx
 aarch64_expand_builtin_tme (int fcode, tree exp, rtx target)
 {
+  tree fndec

[COMMITTED 2/6] ada: Spurious maximum nesting level warnings

2024-08-08 Thread Marc Poulhiès
From: Justin Squirek 

This patch fixes an issue in the compiler whereby disabling style checks via
pragma Style_Checks ("-L") resulted in the maximum nesting level being set to
zero but the check still being enabled, leading to spurious maximum nesting
level exceeded warnings.

gcc/ada/

* stylesw.adb (Set_Style_Check_Options): Disable the max nesting
level check when unspecified.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/stylesw.adb | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/stylesw.adb b/gcc/ada/stylesw.adb
index 76004455b10..6ef8e205e96 100644
--- a/gcc/ada/stylesw.adb
+++ b/gcc/ada/stylesw.adb
@@ -537,7 +537,8 @@ package body Stylesw is
Style_Check_Layout := False;
 
 when 'L' =>
-   Style_Max_Nesting_Level := 0;
+   Style_Max_Nesting_Level:= 0;
+   Style_Check_Max_Nesting_Level  := False;
 
 when 'm' =>
Style_Check_Max_Line_Length:= False;
-- 
2.45.2



[COMMITTED 6/6] ada: Missing legality check when type completed

2024-08-08 Thread Marc Poulhiès
From: Steve Baird 

An access discriminant is allowed to have a default value only if the
discriminated type is immutably limited. In the case of a discriminated
limited private type declaration, this rule needs to be checked when
the completion of the type is seen.

gcc/ada/

* sem_ch6.adb (Check_Discriminant_Conformance): Perform check for
illegal access discriminant default values when the completion of
a limited private type is analyzed.
* sem_aux.adb (Is_Immutably_Limited): If passed the
not-yet-analyzed entity for the full view of a record type, test
the Limited_Present flag
(which is set by the parser).

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_aux.adb | 11 +++
 gcc/ada/sem_ch6.adb | 14 ++
 2 files changed, 25 insertions(+)

diff --git a/gcc/ada/sem_aux.adb b/gcc/ada/sem_aux.adb
index 0639a2e4d86..9903a2b6a16 100644
--- a/gcc/ada/sem_aux.adb
+++ b/gcc/ada/sem_aux.adb
@@ -1118,6 +1118,17 @@ package body Sem_Aux is
 
   elsif Is_Private_Type (Btype) then
 
+  --  If Ent occurs in the completion of a limited private type, then
+  --  look for the word "limited" in the full view.
+
+ if Nkind (Parent (Ent)) = N_Full_Type_Declaration
+   and then Nkind (Type_Definition (Parent (Ent))) =
+  N_Record_Definition
+   and then Limited_Present (Type_Definition (Parent (Ent)))
+ then
+return True;
+ end if;
+
  --  AI05-0063: A type derived from a limited private formal type is
  --  not immutably limited in a generic body.
 
diff --git a/gcc/ada/sem_ch6.adb b/gcc/ada/sem_ch6.adb
index d3912ffc9d5..5735efb327c 100644
--- a/gcc/ada/sem_ch6.adb
+++ b/gcc/ada/sem_ch6.adb
@@ -6456,6 +6456,20 @@ package body Sem_Ch6 is
  New_Discr_Id);
   return;
end if;
+
+   if NewD
+ and then Ada_Version >= Ada_2005
+ and then Nkind (Discriminant_Type (New_Discr)) =
+N_Access_Definition
+ and then not Is_Immutably_Limited_Type
+(Defining_Identifier (N))
+   then
+  Error_Msg_N
+("(Ada 2005) default value for access discriminant "
+ & "requires immutably limited type",
+ Expression (New_Discr));
+  return;
+   end if;
 end if;
  end;
 
-- 
2.45.2



[COMMITTED 5/6] ada: Etype missing for raise expression

2024-08-08 Thread Marc Poulhiès
From: Steve Baird 

If the primitive equality operator of the component type of an array type is
abstract, then a call to that abstract function raises Program_Error (when
such a call is legal). The FE generates a raise expression to implement this.
That raise expression is an expression so it should have a valid Etype.

gcc/ada/

* exp_ch4.adb (Build_Eq_Call): In the abstract callee case, copy
the Etype of the callee onto the Make_Raise_Program_Error result.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch4.adb | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb
index 18ec7125cc1..106305f4636 100644
--- a/gcc/ada/exp_ch4.adb
+++ b/gcc/ada/exp_ch4.adb
@@ -443,8 +443,11 @@ package body Exp_Ch4 is
begin
   if Present (Eq) then
  if Is_Abstract_Subprogram (Eq) then
-return Make_Raise_Program_Error (Loc,
-   Reason =>  PE_Explicit_Raise);
+return Result : constant Node_Id :=
+  Make_Raise_Program_Error (Loc, Reason =>  PE_Explicit_Raise)
+do
+   Set_Etype (Result, Etype (Eq));
+end return;
 
  else
 return
-- 
2.45.2



[COMMITTED 3/6] ada: Further refinements to mutably tagged types

2024-08-08 Thread Marc Poulhiès
From: Justin Squirek 

This patch further enhances the mutably tagged type implementation by fixing
several oversights relating to generic instantiations, attributes, and
type conversions.

gcc/ada/

* exp_put_image.adb (Append_Component_Attr): Obtain the mutably
tagged type for the component type.
* mutably_tagged.adb (Make_Mutably_Tagged_Conversion): Add more
cases to avoid conversion generation.
* sem_attr.adb (Check_Put_Image_Attribute): Add mutably tagged
type conversion.
* sem_ch12.adb (Analyze_One_Association): Add rewrite for formal
type declarations which are mutably tagged type to their
equivalent type.
(Instantiate_Type): Add condition to obtain class wide equivalent
types.
(Validate_Private_Type_Instance): Add check for class wide
equivalent types which are considered "definite".
* sem_util.adb (Is_Variable): Add condition to handle selected
components of view conversions. Add missing check for selected
components.
(Is_View_Conversion): Add condition to handle class wide
equivalent types.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_put_image.adb  | 25 ++---
 gcc/ada/mutably_tagged.adb | 21 ++---
 gcc/ada/sem_attr.adb   |  7 +++
 gcc/ada/sem_ch12.adb   | 25 +++--
 gcc/ada/sem_util.adb   | 14 +-
 5 files changed, 71 insertions(+), 21 deletions(-)

diff --git a/gcc/ada/exp_put_image.adb b/gcc/ada/exp_put_image.adb
index bf14eded93e..217c38a30e7 100644
--- a/gcc/ada/exp_put_image.adb
+++ b/gcc/ada/exp_put_image.adb
@@ -32,6 +32,7 @@ with Einfo.Utils;use Einfo.Utils;
 with Exp_Tss;use Exp_Tss;
 with Exp_Util;   use Exp_Util;
 with Lib;use Lib;
+with Mutably_Tagged; use Mutably_Tagged;
 with Namet;  use Namet;
 with Nlists; use Nlists;
 with Nmake;  use Nmake;
@@ -402,9 +403,9 @@ package body Exp_Put_Image is
   end;
end Build_Elementary_Put_Image_Call;
 
-   -
+   -
-- Build_String_Put_Image_Call --
-   -
+   -
 
function Build_String_Put_Image_Call (N : Node_Id) return Node_Id is
   Loc : constant Source_Ptr := Sloc (N);
@@ -485,9 +486,9 @@ package body Exp_Put_Image is
 Relocate_Node (Sink)));
end Build_Protected_Put_Image_Call;
 
-   
+   ---
-- Build_Task_Put_Image_Call --
-   
+   ---
 
--  For "Task_Type'Put_Image (S, Task_Object)", build:
--
@@ -650,12 +651,14 @@ package body Exp_Put_Image is
  return Result;
   end Make_Component_List_Attributes;
 
-  
+  ---
   -- Append_Component_Attr --
-  
+  ---
 
   procedure Append_Component_Attr (Clist : List_Id; C : Entity_Id) is
- Component_Typ : constant Entity_Id := Put_Image_Base_Type (Etype (C));
+ Component_Typ : constant Entity_Id :=
+   Put_Image_Base_Type
+ (Get_Corresponding_Mutably_Tagged_Type_If_Present (Etype (C)));
   begin
  if Ekind (C) /= E_Void then
 Append_To (Clist,
@@ -936,9 +939,9 @@ package body Exp_Put_Image is
   Build_Put_Image_Proc (Loc, Btyp, Decl, Pnam, Stms);
end Build_Record_Put_Image_Procedure;
 
-   ---
+   -
-- Build_Put_Image_Profile --
-   ---
+   -
 
function Build_Put_Image_Profile
  (Loc : Source_Ptr; Typ : Entity_Id) return List_Id
@@ -983,9 +986,9 @@ package body Exp_Put_Image is
   Statements => Stms));
end Build_Put_Image_Proc;
 
-   
+   --
-- Build_Unknown_Put_Image_Call --
-   
+   --
 
function Build_Unknown_Put_Image_Call (N : Node_Id) return Node_Id is
   Loc: constant Source_Ptr := Sloc (N);
diff --git a/gcc/ada/mutably_tagged.adb b/gcc/ada/mutably_tagged.adb
index 34b032f08c8..495cdd0fcfb 100644
--- a/gcc/ada/mutably_tagged.adb
+++ b/gcc/ada/mutably_tagged.adb
@@ -272,15 +272,22 @@ package body Mutably_Tagged is
   if Force
 
 --  Otherwise, don't make the conversion when N is on the left-hand
---  side of the assignment, is already part of an unchecked conversion,
---  or is part of a renaming.
+--  side of the assignment, in cases where we need the actual type
+--  such as a subtype or object renaming declaration, or a generic or
+--  pa

[PATCH v2 4/4] aarch64: Fix ls64 intrinsic availability

2024-08-08 Thread Andrew Carlotti
The availability of ls64 intrinsics and data types was determined
solely by the globally specified architecture features, which did not
reflect any changes specified in target pragmas or attributes.

This patch removes the initialisation-time guards for the intrinsics,
and replaces them with checks at use time. We also get better error
messages when ls64 is not available (matching the existing error
messages for SVE intrinsics).

The data512_t type is made always available; this is consistent with the
present behaviour for Neon fp16/bf16 types.

gcc/ChangeLog:

PR target/112108
* config/aarch64/aarch64-builtins.cc (handle_arm_acle_h): Remove
feature check at initialisation.
(aarch64_general_check_builtin_call): Check ls64 intrinsics.
(aarch64_expand_builtin_ls64): Add feature check.
* config/aarch64/arm_acle.h: (data512_t) Make always available.

gcc/testsuite/ChangeLog:

PR target/112108
* gcc.target/aarch64/acle/ls64_guard-1.c: New test.
* gcc.target/aarch64/acle/ls64_guard-2.c: New test.
* gcc.target/aarch64/acle/ls64_guard-3.c: New test.
* gcc.target/aarch64/acle/ls64_guard-4.c: New test.


diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 
50667e555497b483aea6a64bb5809ddc62cedf83..ba0147a2077514b4d2a6f9bccc8e7fe897d891b3
 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -2066,8 +2066,7 @@ aarch64_init_data_intrinsics (void)
 void
 handle_arm_acle_h (void)
 {
-  if (TARGET_LS64)
-aarch64_init_ls64_builtins ();
+  aarch64_init_ls64_builtins ();
   aarch64_init_tme_builtins ();
   aarch64_init_memtag_builtins ();
 }
@@ -2318,6 +2317,13 @@ aarch64_general_check_builtin_call (location_t location, 
vec,
   return aarch64_check_required_extensions (location, decl,
AARCH64_FL_TME, false);
 
+case AARCH64_LS64_BUILTIN_LD64B:
+case AARCH64_LS64_BUILTIN_ST64B:
+case AARCH64_LS64_BUILTIN_ST64BV:
+case AARCH64_LS64_BUILTIN_ST64BV0:
+  return aarch64_check_required_extensions (location, decl,
+   AARCH64_FL_LS64, false);
+
 default:
   break;
 }
@@ -2798,6 +2804,11 @@ aarch64_expand_builtin_ls64 (int fcode, tree exp, rtx 
target)
 {
   expand_operand ops[3];
 
+  tree fndecl = aarch64_builtin_decls[fcode];
+  if (!aarch64_check_required_extensions (EXPR_LOCATION (exp), fndecl,
+ AARCH64_FL_LS64, false))
+return target;
+
   switch (fcode)
 {
 case AARCH64_LS64_BUILTIN_LD64B:
diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h
index 
ab04326791309796125860ce64e63fe858a4a733..ab4e7e60e046a9e9c81237de2ca5463c3d4f96ca
 100644
--- a/gcc/config/aarch64/arm_acle.h
+++ b/gcc/config/aarch64/arm_acle.h
@@ -265,9 +265,7 @@ __crc32d (uint32_t __a, uint64_t __b)
 #define _TMFAILURE_INT0x0080u
 #define _TMFAILURE_TRIVIAL0x0100u
 
-#ifdef __ARM_FEATURE_LS64
 typedef __arm_data512_t data512_t;
-#endif
 
 #pragma GCC push_options
 #pragma GCC target ("+nothing+rng")
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-1.c 
b/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-1.c
new file mode 100644
index 
..7dfc193a2934c994220280990316027c07e75ac4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=armv8.6-a" } */
+
+#include 
+
+data512_t foo (void * p)
+{
+  return __arm_ld64b (p); /* { dg-error {ACLE function '__arm_ld64b' requires 
ISA extension 'ls64'} } */
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-2.c 
b/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-2.c
new file mode 100644
index 
..3ede05a81f026f8606ee2c9cd56f15ce45caa1c8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=armv8.6-a" } */
+
+#include 
+
+#pragma GCC target("arch=armv8-a+ls64")
+data512_t foo (void * p)
+{
+  return __arm_ld64b (p);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-3.c 
b/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-3.c
new file mode 100644
index 
..e0fccdad7bec4aa522fb709d010289fd02f91d05
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-3.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=armv8-a+ls64 -mgeneral-regs-only" } */
+
+#include 
+
+data512_t foo (void * p)
+{
+  return __arm_ld64b (p);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-4.c 
b/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-4.c
new file mode 100644
index 
..af1d9a4241fd0047c52735a8103eeaa4552

[PATCH v2 3/4] aarch64: Fix memtag intrinsic availability

2024-08-08 Thread Andrew Carlotti
The availability of memtag intrinsics and data types was determined
solely by the globally specified architecture features, which did not
reflect any changes specified in target pragmas or attributes.

This patch removes the initialisation-time guards for the intrinsics,
and replaces them with checks at use time. It also removes the macro
indirection from the header file - this simplifies the header, and
allows the missing extension error reporting to find the user-facing
intrinsic names.

gcc/ChangeLog:

PR target/112108
* config/aarch64/aarch64-builtins.cc (aarch64_init_memtag_builtins):
Replace internal builtin names with intrinsic names.
(aarch64_general_init_builtins): Move memtag intialisation...
(handle_arm_acle_h): ...to here, and remove feature check.
(aarch64_general_check_builtin_call): Check memtag intrinsics.
(aarch64_expand_builtin_memtag): Add feature check.
* config/aarch64/arm_acle.h (__arm_mte_create_random_tag)
(__arm_mte_exclude_tag, __arm_mte_ptrdiff)
(__arm_mte_increment_tag, __arm_mte_set_tag, __arm_mte_get_tag):
Remove.

gcc/testsuite/ChangeLog:

PR target/112108
* gcc.target/aarch64/acle/memtag_guard-1.c: New test.
* gcc.target/aarch64/acle/memtag_guard-2.c: New test.
* gcc.target/aarch64/acle/memtag_guard-3.c: New test.
* gcc.target/aarch64/acle/memtag_guard-4.c: New test.


diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 
f7d31d8c4308b4a883f8ce7df5c3ee319a9c..50667e555497b483aea6a64bb5809ddc62cedf83
 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -1936,7 +1936,7 @@ aarch64_init_memtag_builtins (void)
 
 #define AARCH64_INIT_MEMTAG_BUILTINS_DECL(F, N, I, T) \
   aarch64_builtin_decls[AARCH64_MEMTAG_BUILTIN_##F] \
-= aarch64_general_add_builtin ("__builtin_aarch64_memtag_"#N, \
+= aarch64_general_simulate_builtin ("__arm_mte_"#N, \
   T, AARCH64_MEMTAG_BUILTIN_##F); \
   aarch64_memtag_builtin_data[AARCH64_MEMTAG_BUILTIN_##F - \
  AARCH64_MEMTAG_BUILTIN_START - 1] = \
@@ -1944,19 +1944,19 @@ aarch64_init_memtag_builtins (void)
 
   fntype = build_function_type_list (ptr_type_node, ptr_type_node,
 uint64_type_node, NULL);
-  AARCH64_INIT_MEMTAG_BUILTINS_DECL (IRG, irg, irg, fntype);
+  AARCH64_INIT_MEMTAG_BUILTINS_DECL (IRG, create_random_tag, irg, fntype);
 
   fntype = build_function_type_list (uint64_type_node, ptr_type_node,
 uint64_type_node, NULL);
-  AARCH64_INIT_MEMTAG_BUILTINS_DECL (GMI, gmi, gmi, fntype);
+  AARCH64_INIT_MEMTAG_BUILTINS_DECL (GMI, exclude_tag, gmi, fntype);
 
   fntype = build_function_type_list (ptrdiff_type_node, ptr_type_node,
 ptr_type_node, NULL);
-  AARCH64_INIT_MEMTAG_BUILTINS_DECL (SUBP, subp, subp, fntype);
+  AARCH64_INIT_MEMTAG_BUILTINS_DECL (SUBP, ptrdiff, subp, fntype);
 
   fntype = build_function_type_list (ptr_type_node, ptr_type_node,
 unsigned_type_node, NULL);
-  AARCH64_INIT_MEMTAG_BUILTINS_DECL (INC_TAG, inc_tag, addg, fntype);
+  AARCH64_INIT_MEMTAG_BUILTINS_DECL (INC_TAG, increment_tag, addg, fntype);
 
   fntype = build_function_type_list (void_type_node, ptr_type_node, NULL);
   AARCH64_INIT_MEMTAG_BUILTINS_DECL (SET_TAG, set_tag, stg, fntype);
@@ -2069,6 +2069,7 @@ handle_arm_acle_h (void)
   if (TARGET_LS64)
 aarch64_init_ls64_builtins ();
   aarch64_init_tme_builtins ();
+  aarch64_init_memtag_builtins ();
 }
 
 /* Initialize fpsr fpcr getters and setters.  */
@@ -2161,9 +2162,6 @@ aarch64_general_init_builtins (void)
   if (!TARGET_ILP32)
 aarch64_init_pauth_hint_builtins ();
 
-  if (TARGET_MEMTAG)
-aarch64_init_memtag_builtins ();
-
   if (in_lto_p)
 handle_arm_acle_h ();
 }
@@ -2323,7 +2321,12 @@ aarch64_general_check_builtin_call (location_t location, 
vec,
 default:
   break;
 }
-  /* Default behavior.  */
+
+  if (code >= AARCH64_MEMTAG_BUILTIN_START
+  && code <= AARCH64_MEMTAG_BUILTIN_END)
+   return aarch64_check_required_extensions (location, decl,
+ AARCH64_FL_MEMTAG, false);
+
   return true;
 }
 
@@ -3098,6 +3101,11 @@ aarch64_expand_builtin_memtag (int fcode, tree exp, rtx 
target)
   return const0_rtx;
 }
 
+  tree fndecl = aarch64_builtin_decls[fcode];
+  if (!aarch64_check_required_extensions (EXPR_LOCATION (exp), fndecl,
+ AARCH64_FL_MEMTAG, false))
+return target;
+
   rtx pat = NULL;
   enum insn_code icode = aarch64_memtag_builtin_data[fcode -
   AARCH64_MEMTAG_BUILTIN_START - 1].icode;
diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h
index 
2d84ab1bd3f3241196727d7a632a155014708081..

Re: [PATCH 1/3] RISC-V: testsuite: xtheadfmemidx: Rename test and add similar Zfa test

2024-08-08 Thread Christoph Müllner
On Wed, Aug 7, 2024 at 4:48 PM Jeff Law  wrote:
>
>
>
> On 8/7/24 12:27 AM, Christoph Müllner wrote:
> > Test file xtheadfmemidx-medany.c has been added in b79cd204c780 as a
> > test case that provoked an ICE when loading DFmode registers via two
> > SImode register loads followed by a SI->DF[63:32] move from XTheadFmv.
> > Since Zfa is affected in the same way as XTheadFmv, even if both
> > have slightly different instructions, let's add a test for Zfa as well
> > and give the tests proper names.
> >
> > Let's also add a test into the test files that counts the SI->DF moves
> > from XTheadFmv/Zfa.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/xtheadfmemidx-medany.c: Move to...
> >   * gcc.target/riscv/xtheadfmemidx-xtheadfmv-medany.c: ...here.
> >   * gcc.target/riscv/xtheadfmemidx-zfa-medany.c: New test.
> OK
> jeff

OK to backport the three patches of this series on GCC 14 (which is
also affected by PR116131)?


Re: [PATCH 1/3] RISC-V: testsuite: xtheadfmemidx: Rename test and add similar Zfa test

2024-08-08 Thread Jeff Law




On 8/8/24 8:34 AM, Christoph Müllner wrote:

On Wed, Aug 7, 2024 at 4:48 PM Jeff Law  wrote:




On 8/7/24 12:27 AM, Christoph Müllner wrote:

Test file xtheadfmemidx-medany.c has been added in b79cd204c780 as a
test case that provoked an ICE when loading DFmode registers via two
SImode register loads followed by a SI->DF[63:32] move from XTheadFmv.
Since Zfa is affected in the same way as XTheadFmv, even if both
have slightly different instructions, let's add a test for Zfa as well
and give the tests proper names.

Let's also add a test into the test files that counts the SI->DF moves
from XTheadFmv/Zfa.

gcc/testsuite/ChangeLog:

   * gcc.target/riscv/xtheadfmemidx-medany.c: Move to...
   * gcc.target/riscv/xtheadfmemidx-xtheadfmv-medany.c: ...here.
   * gcc.target/riscv/xtheadfmemidx-zfa-medany.c: New test.

OK
jeff


OK to backport the three patches of this series on GCC 14 (which is
also affected by PR116131)?

Of course.
jeff



Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-08 Thread Jens Gustedt
On 8 August 2024 at 13:28:57 CEST, Joseph Myers wrote:
> On Thu, 8 Aug 2024, Alejandro Colomar wrote:
> 
> > Hi Jens,
> > 
> > On Thu, Aug 08, 2024 at 11:13:02AM GMT, Jens Gustedt wrote:
> > > > but to maintain expectations, I think it would be better to do
> > > > the same here.
> > > 
> > > Just to compare, the recent additions in C23 typeof etc. only have the
> > > parenthesized versions. So there would be precedent. And it really
> > > eases transition
> > 
> > Hmmm, interesting.
> > 
> > The good part of reusing sizeof syntax is that I can reuse internal code
> > for sizeof.  But I'll check if I can change it easily to only support
> > parens.
> 
> Since typeof produces a type, it's used in different syntactic contexts 
> from sizeof, so has different ambiguity issues, and requiring parentheses 
> with typeof is not relevant to sizeof/lengthof.  I think lengthof should 
> follow sizeof.  Make sure there's a testcase for lengthof applied to a 
> compound literal (the case that illustrates how, on parsing sizeof 
> (type-name), the compiler needs to see what comes after (type-name) to 
> determine whether it's actually sizeof applied to an expression (if '{' 
> follows) or to a type (otherwise)).  (If you're following the sizeof 
> implementation closely enough, this should just work.)
> 
> -- 
> Joseph S. Myers
> josmy...@redhat.com
> 

Hi, 
I am not convinced that we should introduce the same syntax weirdness
for this feature. sizeof seems to be the only place in the core language
where a keyword is used as an operator in expressions, and
that does not resemble function-call notation. In particular your 
example with compound literals shows that we could avoid syntax look-ahead 
by not doing this. (People argued violently against look-ahead when we 
discussed possible inclusion of lambdas into C23)

We don't have to repeat all historic accidents when inventing a new feature.
Sure that gcc may invent anything to their liking, but when and if we pass this
for standardisation we will give such considerations a careful look.

Jens
-- 
Jens Gustedt - INRIA & ICube, Strasbourg, France 

Re: [PATCH] RISC-V: tree-optimization/116274 - overzealous SLP vectorization

2024-08-08 Thread Richard Biener



> On 08.08.2024 at 15:12, Richard Sandiford wrote:
> 
> Richard Biener  writes:
>> The following tries to address that the vectorizer fails to have
>> precise knowledge of argument and return calling conventions and
>> views some accesses as loads and stores that are not.
>> This is mainly important when doing basic-block vectorization as
>> otherwise loop indexing would force such arguments to memory.
>> 
>> On x86 the reduction in the number of apparent loads and stores
>> often dominates cost analysis so the following tries to mitigate
>> this aggressively by adjusting only the scalar load and store
>> cost, reducing them to the cost of a simple scalar statement,
>> but not touching the vector access cost which would be much
>> harder to estimate.  Thereby we error on the side of not performing
>> basic-block vectorization.
>> 
>> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> 
>> Richard - we can of course do this adjustment in the backend as well
>> but it might be worthwhile in generic code.  Do you see similar
>> issues on arm?
> 
> Yeah, a pathological case is:
> 
> struct a { float f[4]; };
> struct a test(struct a a) {
>  a.f[0] += 1;
>  a.f[1] += 2;
>  a.f[2] += 3;
>  a.f[3] += 4;
>  return a;
> }
> 
> which with -O2 generates:
> 
> test:
> .LFB0:
>        .cfi_startproc
>        fmov    w1, s2
>        fmov    w4, s0
>        mov     x0, 0
>        fmov    w3, s1
>        sub     sp, sp, #16
>        .cfi_def_cfa_offset 16
>        mov     x2, 0
>        bfi     x0, x1, 0, 32
>        fmov    w1, s3
>        bfi     x2, x4, 0, 32
>        bfi     x2, x3, 32, 32
>        bfi     x0, x1, 32, 32
>        adrp    x1, .LC0
>        stp     x2, x0, [sp]
>        ldr     q30, [sp]
>        ldr     q31, [x1, #:lo12:.LC0]
>        add     sp, sp, 16
>        .cfi_def_cfa_offset 0
>        fadd    v31.4s, v30.4s, v31.4s
>        umov    x0, v31.d[0]
>        umov    x1, v31.d[1]
>        mov     x3, x0
>        lsr     x4, x0, 32
>        lsr     x0, x1, 32
>        fmov    s1, w4
>        fmov    s3, w0
>        fmov    s2, w1
>        lsr     w0, w3, 0
>        fmov    s0, w0
>        ret
>        .cfi_endproc
> 
> Admittedly most of the badness there would probably be fixed by
> parameter and return fsra (Jiufu Guo's patch), but it still doesn't
> make much sense to marshall 4 separate floats into one vector for
> a single addition, only to tear it apart into 4 separate floats
> afterwards.  We should just do four scalar additions instead.
> 
> (The patch doesn't fix this case, although it does trigger.)
> 
>>PR tree-optimization/116274
>>* tree-vect-slp.cc (vect_bb_slp_scalar_cost): Cost scalar loads
>>and stores as simple scalar stmts when they access a non-global,
>>not address-taken variable that doesn't have BLKmode assigned.
>> 
>>* gcc.target/i386/pr116274.c: New testcase.
>> ---
>> gcc/testsuite/gcc.target/i386/pr116274.c |  9 +
>> gcc/tree-vect-slp.cc | 12 +++-
>> 2 files changed, 20 insertions(+), 1 deletion(-)
>> create mode 100644 gcc/testsuite/gcc.target/i386/pr116274.c
>> 
>> diff --git a/gcc/testsuite/gcc.target/i386/pr116274.c 
>> b/gcc/testsuite/gcc.target/i386/pr116274.c
>> new file mode 100644
>> index 000..d5811344b93
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/pr116274.c
>> @@ -0,0 +1,9 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-slp2-optimized" } */
>> +
>> +struct a { long x,y; };
>> +long test(struct a a) { return a.x+a.y; }
>> +
>> +/* { dg-final { scan-tree-dump-not "basic block part vectorized" "slp2" } } 
>> */
>> +/* { dg-final { scan-assembler-times "addl|leaq" 1 } } */
>> +/* { dg-final { scan-assembler-not "padd" } } */
>> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
>> index 3464d0c0e23..e43ff721100 100644
>> --- a/gcc/tree-vect-slp.cc
>> +++ b/gcc/tree-vect-slp.cc
>> @@ -7807,7 +7807,17 @@ next_lane:
>>   vect_cost_for_stmt kind;
>>   if (STMT_VINFO_DATA_REF (orig_stmt_info))
>>{
>> -  if (DR_IS_READ (STMT_VINFO_DATA_REF (orig_stmt_info)))
>> +  data_reference_p dr = STMT_VINFO_DATA_REF (orig_stmt_info);
>> +  tree base = get_base_address (DR_REF (dr));
>> +  /* When the scalar access is to a non-global not address-taken
>> + decl that is not BLKmode assume we can access it with a single
>> + non-load/store instruction.  */
>> +  if (DECL_P (base)
>> +  && !is_global_var (base)
>> +  && !TREE_ADDRESSABLE (base)
>> +  && DECL_MODE (base) != BLKmode)
>> +kind = scalar_stmt;
>> +  else if (DR_IS_READ (STMT_VINFO_DATA_REF (orig_stmt_info)))
>>kind = scalar_load;
>>  else
>>kind = scalar_store;
> 
> LGTM FWIW, but did you consider skipping the cost altogether?
> I'm not sure what the scalar_stmt would correspond to in practice,
> if we assume that the ABI (for parameters/returns) or RA (for locals)
> puts the data in a sensible register class for

Re: [PATCH v3 1/2] aarch64: Add AdvSIMD faminmax intrinsics

2024-08-08 Thread Kyrylo Tkachov
Hi Saurabh,

> On 7 Aug 2024, at 17:11, saurabh@arm.com wrote:
>
>
> The AArch64 FEAT_FAMINMAX extension is optional from Armv9.2-a and
> mandatory from Armv9.5-a. It introduces instructions for computing the
> floating point absolute maximum and minimum of the two vectors element-wise.
>
> This patch does two things:
> 1. Introduces AdvSIMD faminmax intrinsics.
> 2. Move report_missing_extension and reported_missing_extension_p to
>   make it more usable.
>
> The intrinsics of this extension are implemented as the following
> builtin functions:
> * vamax_f16
> * vamaxq_f16
> * vamax_f32
> * vamaxq_f32
> * vamaxq_f64
> * vamin_f16
> * vaminq_f16
> * vamin_f32
> * vaminq_f32
> * vaminq_f64
>
> We moved the definition of `report_missing_extension` from
> gcc/config/aarch64/aarch64-sve-builtins.cc to
> gcc/config/aarch64/aarch64-builtins.cc and its declaration to
> gcc/config/aarch64/aarch64-builtins.h. We also moved the declaration
> of `reported_missing_extension_p` from
> gcc/config/aarch64/aarch64-sve-builtins.cc
> to gcc/config/aarch64/aarch64-builtins.cc, closer to the definition of
> `report_missing_extension`. In the existing code structure, this leads
> to `report_missing_extension` being usable from both normal builtins
> and sve builtins.
>
> gcc/ChangeLog:
>
>* config/aarch64/aarch64-builtins.cc
>(enum aarch64_builtins): New enum values for faminmax builtins.
>(aarch64_init_faminmax_builtins): New function to declare new
> builtins.
>(handle_arm_neon_h): Modify to call
> aarch64_init_faminmax_builtins.
>(aarch64_general_check_builtin_call): Modify to check whether
> the +faminmax flag is being used and print an error message if it
> is not.
>(aarch64_expand_builtin_faminmax): New function to emit
> instructions of this extension.
>(aarch64_general_expand_builtin): Modify to call
> aarch64_expand_builtin_faminmax.
>(report_missing_extension): Move from
> config/aarch64/aarch64-sve-builtins.cc.
>* config/aarch64/aarch64-builtins.h
>(report_missing_extension): Declaration for this function so
> that it can be used wherever this header is included.
>(reported_missing_extension_p): Move from
> config/aarch64/aarch64-sve-builtins.cc
>* config/aarch64/aarch64-option-extensions.def
>(AARCH64_OPT_EXTENSION): Introduce new flag for this
> extension.
>* config/aarch64/aarch64-simd.md
>(aarch64_): Instruction pattern for
> faminmax intrinsics.
>* config/aarch64/aarch64-sve-builtins.cc
>(reported_missing_extension_p): Move to
> config/aarch64/aarch64-builtins.c
>(report_missing_extension): Move to
> config/aarch64/aarch64-builtins.cc
>* config/aarch64/aarch64.h
>(TARGET_FAMINMAX): Introduce new flag for this extension.
>* config/aarch64/iterators.md: Introduce new iterators for
>  faminmax intrinsics.
>* config/arm/types.md: Introduce neon_fp_aminmax attributes.
>* doc/invoke.texi: Document extension in AArch64 Options.
>

Thank you for the updates.
It seems now that the report_missing_extensions refactoring is also done by 
Andrew’s patch at:
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659875.html

Looks like you’ll need to coordinate on how to land this change.
I think Andrew’s changes should go in first, with this patch rebased on top
of them.
Otherwise ok.
Thanks,
Kyrill

> gcc/testsuite/ChangeLog:
>
>* gcc.target/aarch64/simd/faminmax-builtins-no-flag.c: New test.
>* gcc.target/aarch64/simd/faminmax-builtins.c: New test.
> ---
> gcc/config/aarch64/aarch64-builtins.cc| 173 +-
> gcc/config/aarch64/aarch64-builtins.h |   5 +-
> .../aarch64/aarch64-option-extensions.def |   2 +
> gcc/config/aarch64/aarch64-simd.md|  11 ++
> gcc/config/aarch64/aarch64-sve-builtins.cc|  22 ---
> gcc/config/aarch64/aarch64.h  |   4 +
> gcc/config/aarch64/iterators.md   |   9 +
> gcc/config/arm/types.md   |   6 +
> gcc/doc/invoke.texi   |   2 +
> .../aarch64/simd/faminmax-builtins-no-flag.c  |  10 +
> .../aarch64/simd/faminmax-builtins.c  | 115 
> 11 files changed, 327 insertions(+), 32 deletions(-)
> create mode 100644 
> gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins-no-flag.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins.c
>

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index 30669f8aa18..cd590186f22 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -829,6 +829,17 @@ enum aarch64_builtins
   AARCH64_RBIT,
   AARCH64_RBITL,
   AARCH64_RBITLL,
+  /* FAMINMAX builtins.  */
+  AARCH64_FAMINMAX_BUILTIN_FAMAX4H,
+  AARCH64_FAMINMAX_BUILTIN_FAMAX8H,
+  AARCH64_FAMINMAX_BUIL

Re: [PATCH] RISC-V: tree-optimization/116274 - overzealous SLP vectorization

2024-08-08 Thread Richard Sandiford
Richard Biener  writes:
>> Am 08.08.2024 um 15:12 schrieb Richard Sandiford :
>>>PR tree-optimization/116274
>>>* tree-vect-slp.cc (vect_bb_slp_scalar_cost): Cost scalar loads
>>>and stores as simple scalar stmts when they access a non-global,
>>>not address-taken variable that doesn't have BLKmode assigned.
>>> 
>>>* gcc.target/i386/pr116274.c: New testcase.
>>> ---
>>> gcc/testsuite/gcc.target/i386/pr116274.c |  9 +
>>> gcc/tree-vect-slp.cc | 12 +++-
>>> 2 files changed, 20 insertions(+), 1 deletion(-)
>>> create mode 100644 gcc/testsuite/gcc.target/i386/pr116274.c
>>> 
>>> diff --git a/gcc/testsuite/gcc.target/i386/pr116274.c 
>>> b/gcc/testsuite/gcc.target/i386/pr116274.c
>>> new file mode 100644
>>> index 000..d5811344b93
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/i386/pr116274.c
>>> @@ -0,0 +1,9 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fdump-tree-slp2-optimized" } */
>>> +
>>> +struct a { long x,y; };
>>> +long test(struct a a) { return a.x+a.y; }
>>> +
>>> +/* { dg-final { scan-tree-dump-not "basic block part vectorized" "slp2" } 
>>> } */
>>> +/* { dg-final { scan-assembler-times "addl|leaq" 1 } } */
>>> +/* { dg-final { scan-assembler-not "padd" } } */
>>> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
>>> index 3464d0c0e23..e43ff721100 100644
>>> --- a/gcc/tree-vect-slp.cc
>>> +++ b/gcc/tree-vect-slp.cc
>>> @@ -7807,7 +7807,17 @@ next_lane:
>>>   vect_cost_for_stmt kind;
>>>   if (STMT_VINFO_DATA_REF (orig_stmt_info))
>>>{
>>> -  if (DR_IS_READ (STMT_VINFO_DATA_REF (orig_stmt_info)))
>>> +  data_reference_p dr = STMT_VINFO_DATA_REF (orig_stmt_info);
>>> +  tree base = get_base_address (DR_REF (dr));
>>> +  /* When the scalar access is to a non-global not address-taken
>>> + decl that is not BLKmode assume we can access it with a single
>>> + non-load/store instruction.  */
>>> +  if (DECL_P (base)
>>> +  && !is_global_var (base)
>>> +  && !TREE_ADDRESSABLE (base)
>>> +  && DECL_MODE (base) != BLKmode)
>>> +kind = scalar_stmt;
>>> +  else if (DR_IS_READ (STMT_VINFO_DATA_REF (orig_stmt_info)))
>>>kind = scalar_load;
>>>  else
>>>kind = scalar_store;
>> 
>> LGTM FWIW, but did you consider skipping the cost altogether?
>> I'm not sure what the scalar_stmt would correspond to in practice,
>> if we assume that the ABI (for parameters/returns) or RA (for locals)
>> puts the data in a sensible register class for the datatype.
>
> On x86_64 you get up to two eightbytes in two gpr or float regs, so with an 
> example with four int we’d get a scalar shift for the second and fourth int 
> and with eight short you get to the point where the vector marshaling might 
> be profitable.  So it’s a heuristic that says it’s likely not zero cost but 
> definitely not as high as a load.  Anything better would need to know the 
> actual register passings.

Ah, yeah, fair enough.  I suppose that would be true for aarch64 too
on things like:

  struct a { char f[4]; };
  struct a test(struct a a) {
a.f[0] += 1;
a.f[1] += 2;
a.f[2] += 3;
a.f[3] += 4;
return a;
  }

It's just that there are important cases where it wouldn't happen for
floats on aarch64, and the scalar_stmt cost for floats is typically 2.

But like you say, that could be fixed later, and this should be a
strict improvement over the status quo.

Richard


[PATCH] c++: DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P cleanups

2024-08-08 Thread Patrick Palka
DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P templates can only appear as part
of a template friend declaration, and in turn get partially instantiated
only from tsubst_friend_function or tsubst_friend_class.  So rather than
having tsubst_template_decl clear the flag, let's leave it up to the
tsubst friend routines to clear it so that template friend handling stays
localized (note that tsubst_friend_function already was clearing it).

Also the template depth comparison test within tsubst_friend_class is
equivalent to DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P since such templates
always have more levels than the class context, and it's not possible to
directly refer to an existing template that has more levels than the
current template context.

gcc/cp/ChangeLog:

* pt.cc (tsubst_friend_class): Simplify depth comparison test
in the redeclaration code path to
DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P.  Clear the flag in the
new template code path after partial instantiation here ...
(tsubst_template_decl): ... instead of here.
---
 gcc/cp/pt.cc | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 677ed7d1289..d468a3037b6 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -11764,8 +11764,7 @@ tsubst_friend_class (tree friend_tmpl, tree args)
   compatible with the attachment of the friend template.  */
module_may_redeclare (tmpl, friend_tmpl);
 
-  if (TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (friend_tmpl))
- > TMPL_ARGS_DEPTH (args))
+  if (DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P (friend_tmpl))
{
  tree parms = tsubst_template_parms (DECL_TEMPLATE_PARMS (friend_tmpl),
  args, tf_warning_or_error);
@@ -11807,6 +11806,7 @@ tsubst_friend_class (tree friend_tmpl, tree args)
  CLASSTYPE_USE_TEMPLATE (TREE_TYPE (tmpl)) = 0;
  CLASSTYPE_TI_ARGS (TREE_TYPE (tmpl))
= INNERMOST_TEMPLATE_ARGS (CLASSTYPE_TI_ARGS (TREE_TYPE (tmpl)));
+ DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P (tmpl) = false;
 
  /* Substitute into and set the constraints on the new declaration.  */
  if (tree ci = get_constraints (friend_tmpl))
@@ -15008,8 +15008,6 @@ tsubst_template_decl (tree t, tree args, tsubst_flags_t 
complain,
   if (PRIMARY_TEMPLATE_P (t))
 DECL_PRIMARY_TEMPLATE (r) = r;
 
-  DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P (r) = false;
-
   if (!lambda_fntype && !class_p)
 {
   /* Record this non-type partial instantiation.  */
-- 
2.46.0.39.g891ee3b9db



[PATCH] c++: clean up cp_identifier_kind checks

2024-08-08 Thread Patrick Palka
The predicates for checking an IDENTIFIER node's cp_identifier_kind
currently directly test the three flag bits that encode the kind.  This
patch instead makes the checks first reconstruct the cp_identifier_kind
in its entirety and then compare that.

gcc/cp/ChangeLog:

* cp-tree.h (get_identifier_kind): Define.
(IDENTIFIER_KEYWORD_P): Redefine using get_identifier_kind.
(IDENTIFIER_CDTOR_P): Likewise.
(IDENTIFIER_CTOR_P): Likewise.
(IDENTIFIER_DTOR_P): Likewise.
(IDENTIFIER_ANY_OP_P): Likewise.
(IDENTIFIER_OVL_OP_P): Likewise.
(IDENTIFIER_ASSIGN_OP_P): Likewise.
(IDENTIFIER_CONV_OP_P): Likewise.
(IDENTIFIER_TRAIT_P): Likewise.
* parser.cc (cp_lexer_peek_trait): Mark IDENTIFIER_TRAIT_P
test UNLIKELY.
---
 gcc/cp/cp-tree.h | 41 +
 gcc/cp/parser.cc |  3 ++-
 2 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index b81bc91208f..0c25ec5a04e 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -1255,56 +1255,57 @@ enum cp_identifier_kind {
 #define IDENTIFIER_VIRTUAL_P(NODE) \
   TREE_LANG_FLAG_5 (IDENTIFIER_NODE_CHECK (NODE))
 
+/* Return the cp_identifier_kind of the given IDENTIFIER node ID.  */
+
+ATTRIBUTE_PURE inline cp_identifier_kind
+get_identifier_kind (tree id)
+{
+  unsigned bit0 = IDENTIFIER_KIND_BIT_0 (id);
+  unsigned bit1 = IDENTIFIER_KIND_BIT_1 (id);
+  unsigned bit2 = IDENTIFIER_KIND_BIT_2 (id);
+  return cp_identifier_kind ((bit2 << 2) | (bit1 << 1) | bit0);
+}
+
 /* True if this identifier is a reserved word.  C_RID_CODE (node) is
then the RID_* value of the keyword.  Value 1.  */
 #define IDENTIFIER_KEYWORD_P(NODE) \
-  ((!IDENTIFIER_KIND_BIT_2 (NODE)) \
-   & (!IDENTIFIER_KIND_BIT_1 (NODE))   \
-   & IDENTIFIER_KIND_BIT_0 (NODE))
+  (get_identifier_kind (NODE) == cik_keyword)
 
 /* True if this identifier is the name of a constructor or
destructor.  Value 2 or 3.  */
 #define IDENTIFIER_CDTOR_P(NODE)   \
-  ((!IDENTIFIER_KIND_BIT_2 (NODE)) \
-   & IDENTIFIER_KIND_BIT_1 (NODE))
+  (IDENTIFIER_CTOR_P (NODE) || IDENTIFIER_DTOR_P (NODE))
 
 /* True if this identifier is the name of a constructor.  Value 2.  */
 #define IDENTIFIER_CTOR_P(NODE)\
-  (IDENTIFIER_CDTOR_P(NODE)\
-& (!IDENTIFIER_KIND_BIT_0 (NODE)))
+  (get_identifier_kind (NODE) == cik_ctor)
 
 /* True if this identifier is the name of a destructor.  Value 3.  */
 #define IDENTIFIER_DTOR_P(NODE)\
-  (IDENTIFIER_CDTOR_P(NODE)\
-& IDENTIFIER_KIND_BIT_0 (NODE))
+  (get_identifier_kind (NODE) == cik_dtor)
 
 /* True if this identifier is for any operator name (including
conversions).  Value 4, 5, or 6.  */
 #define IDENTIFIER_ANY_OP_P(NODE)  \
-  (IDENTIFIER_KIND_BIT_2 (NODE) && !IDENTIFIER_TRAIT_P (NODE))
+  (IDENTIFIER_OVL_OP_P (NODE) || IDENTIFIER_CONV_OP_P (NODE))
 
 /* True if this identifier is for an overloaded operator. Values 4, 5.  */
 #define IDENTIFIER_OVL_OP_P(NODE)  \
-  (IDENTIFIER_ANY_OP_P (NODE)  \
-   & (!IDENTIFIER_KIND_BIT_1 (NODE)))
+  (get_identifier_kind (NODE) == cik_simple_op  \
+   || get_identifier_kind (NODE) == cik_assign_op)
 
 /* True if this identifier is for any assignment. Values 5.  */
 #define IDENTIFIER_ASSIGN_OP_P(NODE)   \
-  (IDENTIFIER_OVL_OP_P (NODE)  \
-   & IDENTIFIER_KIND_BIT_0 (NODE))
+  (get_identifier_kind (NODE) == cik_assign_op)
 
 /* True if this identifier is the name of a type-conversion
operator.  Value 6.  */
 #define IDENTIFIER_CONV_OP_P(NODE) \
-  (IDENTIFIER_ANY_OP_P (NODE)  \
-   & IDENTIFIER_KIND_BIT_1 (NODE)  \
-   & (!IDENTIFIER_KIND_BIT_0 (NODE)))
+  (get_identifier_kind (NODE) == cik_conv_op)
 
 /* True if this identifier is the name of a built-in trait.  */
 #define IDENTIFIER_TRAIT_P(NODE)   \
-  (IDENTIFIER_KIND_BIT_0 (NODE)\
-   & IDENTIFIER_KIND_BIT_1 (NODE)  \
-   & IDENTIFIER_KIND_BIT_2 (NODE))
+  (get_identifier_kind (NODE) == cik_trait)
 
 /* True if this identifier is a new or delete operator.  */
 #define IDENTIFIER_NEWDEL_OP_P(NODE)   \
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index f625b0a310c..60ff41ec9fa 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -1198,7 +1198,8 @@ static const cp_trait *
 cp_lexer_peek_trait (cp_lexer *lexer)
 {
   const cp_token *token1 = cp_lexer_peek_token (lexer);
-  if (token1->type == CPP_NAME && IDENTIFIER_TRAIT_P (token1->u.value))
+  if (token1->type == CPP_NAME
+  && UNLIKELY (IDENTIFIER_TRAIT_P (token1->u.value)))
 {
   const cp_trait &trait = cp_traits[IDENTIFIER_CP_INDEX (token1->u.value)];
   const bool is_pack_element = (trait.kind == CPTK_TYPE_PACK_ELEMENT);
-- 
2.46.0.39.g891ee3b9db

Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-08 Thread Martin Uecker
Am Donnerstag, dem 08.08.2024 um 16:56 +0200 schrieb Jens Gustedt:
> Am 8. August 2024 13:28:57 MESZ schrieb Joseph Myers :
> > On Thu, 8 Aug 2024, Alejandro Colomar wrote:
> > 
> > > Hi Jens,
> > > 
> > > On Thu, Aug 08, 2024 at 11:13:02AM GMT, Jens Gustedt wrote:
> > > > > but to maintain expectations, I think it would be better to do
> > > > > the same here.
> > > > > 
> > > > 
> > > > Just to compare, the recent additions in C23 typeof etc. only have the
> > > > parenthesized versions. So there would be precedent. And it really
> > > > eases transition
> > > > 
> > > Hmmm, interesting.
> > > 
> > > The good part of reusing sizeof syntax is that I can reuse internal code
> > > for sizeof. But I'll check if I can change it easily to only support
> > > parens.
> > > 
> > 
> > Since typeof produces a type, it's used in different syntactic contexts 
> > from sizeof, so has different ambiguity issues, and requiring parentheses 
> > with typeof is not relevant to sizeof/lengthof. I think lengthof should 
> > follow sizeof. Make sure there's a testcase for lengthof applied to a 
> > compound literal (the case that illustrates how, on parsing sizeof 
> > (type-name), the compiler needs to see what comes after (type-name) to 
> > determine whether it's actually sizeof applied to an expression (if '{' 
> > follows) or to a type (otherwise)). (If you're following the sizeof 
> > implementation closely enough, this should just work.)

> Hi, 
> I am not convinced that we should introduce the same syntax weirdness
> for this feature. sizeof seems to be the only place in the core language
> where a keyword is used as an operator in expressions, and
> that does not resemble function-call notation. In particular your 
> example with compound literals shows that we could avoid syntax look-ahead 
> by not doing this. 

It is the other way around: With the "(" there is the ambiguity
whether this starts a compound literal or a type name enclosed
in parentheses.  But this is not problematic for parsing.

Martin


> (People argued violently against look-ahead when we discussed
> possible inclusion of lambdas into C23)

> We don't have to repeat all historic accidents when inventing a new feature.
> Sure that gcc may invent anything to their liking, but when and if we pass
> this for standardisation we will give such considerations a careful look.

> Jens



Re: [PATCH v2 1/4] aarch64: Refactor check_required_extensions

2024-08-08 Thread Richard Sandiford
Andrew Carlotti  writes:
> Move SVE extension checking functionality to aarch64-builtins.cc, so
> that it can be shared by non-SVE intrinsics.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-sve-builtins.cc (check_builtin_call)
>   (expand_builtin): Update calls to the below.
>   (report_missing_extension, check_required_registers)
>   (check_required_extensions): Move out of aarch64_sve namespace,
>   rename, and move into...
>   * config/aarch64/aarch64-builtins.cc (aarch64_report_missing_extension)
>   (aarch64_check_non_general_registers)
>   (aarch64_check_required_extensions) ...here.
>   * config/aarch64/aarch64-protos.h (aarch64_check_required_extensions):
>   Add prototype.
>
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> b/gcc/config/aarch64/aarch64-builtins.cc
> index 
> 30669f8aa1823b64689c67e306d38e234bd31698..d0fb8bc1d1fedb382cba1a1f09a9c3ce6757ee22
>  100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -2180,6 +2180,110 @@ aarch64_general_builtin_decl (unsigned code, bool)
>return aarch64_builtin_decls[code];
>  }
>  
> +/* True if we've already complained about attempts to use functions
> +   when the required extension is disabled.  */
> +static bool reported_missing_extension_p;
> +
> +/* True if we've already complained about attempts to use functions
> +   which require registers that are missing.  */
> +static bool reported_missing_registers_p;
> +
> +/* Report an error against LOCATION that the user has tried to use
> +   function FNDECL when extension EXTENSION is disabled.  */
> +static void
> +aarch64_report_missing_extension (location_t location, tree fndecl,
> +   const char *extension)
> +{
> +  /* Avoid reporting a slew of messages for a single oversight.  */
> +  if (reported_missing_extension_p)
> +return;
> +
> +  error_at (location, "ACLE function %qD requires ISA extension %qs",
> + fndecl, extension);
> +  inform (location, "you can enable %qs using the command-line"
> +   " option %<-march%>, or by using the %"
> +   " attribute or pragma", extension);
> +  reported_missing_extension_p = true;
> +}
> +
> +/* Check whether non-general registers required by ACLE function fndecl are
> + * available.  Report an error against LOCATION and return false if not.  */

Nit: should be no leading "*" on this line.

> +static bool
> +aarch64_check_non_general_registers (location_t location, tree fndecl)
> +{
> +  /* Avoid reporting a slew of messages for a single oversight.  */
> +  if (reported_missing_registers_p)
> +return false;
> +
> +  if (TARGET_GENERAL_REGS_ONLY)
> +{
> +  /* FP/SIMD/SVE registers are not usable when -mgeneral-regs-only option
> +  is specified.  */
> +  error_at (location,
> + "ACLE function %qD is incompatible with the use of %qs",
> + fndecl, "-mgeneral-regs-only");
> +  reported_missing_registers_p = true;
> +  return false;
> +}
> +
> +  return true;
> +}
> +
> +/* Check whether all the AARCH64_FL_* values in REQUIRED_EXTENSIONS are
> +   enabled, given that those extensions are required for function FNDECL.
> +   Report an error against LOCATION if not.
> +   If REQUIRES_NON_GENERAL_REGISTERS is true, then also check whether
> +   non-general registers are available.  */
> +bool
> +aarch64_check_required_extensions (location_t location, tree fndecl,
> +aarch64_feature_flags required_extensions,
> +bool requires_non_general_registers)

Rather than pass requires_non_general_registers, could we just test
whether:

  (get_flags_off (AARCH64_FL_FP) & required_extensions)

?  (The call is a constexpr.)  So:

> +{
> +  auto missing_extensions = required_extensions & ~aarch64_asm_isa_flags;
> +  if (missing_extensions == 0)
> +return requires_non_general_registers
> +? aarch64_check_non_general_registers (location, fndecl)
> +: true;

return (!(get_flags_off (AARCH64_FL_FP) & required_extensions)
|| aarch64_check_non_general_registers (location, fndecl));

LGTM otherwise, but please give 24 hours for others to comment.

Thanks,
Richard

> +
> +  if (missing_extensions & AARCH64_FL_SM_OFF)
> +{
> +  error_at (location, "ACLE function %qD cannot be called when"
> + " SME streaming mode is enabled", fndecl);
> +  return false;
> +}
> +
> +  if (missing_extensions & AARCH64_FL_SM_ON)
> +{
> +  error_at (location, "ACLE function %qD can only be called when"
> + " SME streaming mode is enabled", fndecl);
> +  return false;
> +}
> +
> +  if (missing_extensions & AARCH64_FL_ZA_ON)
> +{
> +  error_at (location, "ACLE function %qD can only be called from"
> + " a function that has %qs state", fndecl, "za");
> +  return false;
> +}
> +
> +  static const struct {
> +aarch64_feature_fl

[PATCH v2] Support if conversion for switches

2024-08-08 Thread Andi Kleen
The gimple-if-to-switch pass converts if statements with
multiple equality checks on the same value to a switch.  This breaks
vectorization, which cannot handle switches.

Teach the tree-if-conv pass used by the vectorizer to handle
simple switch statements, like those created by if-to-switch earlier.
These are switches that have only a single non-default block;
they are handled similarly to COND in if conversion.

This makes the vect-bitfield-read-1-not test fail. The test
checks for a bitfield analysis failing, but it actually
relied on the ifcvt erroring out early because the test
is using a switch. The if conversion still does not
work because the switch is not in a form that this
patch can handle, but it fails much later and the bitfield
analysis succeeds, which makes the test fail. I marked
it xfail because it doesn't seem to be testing what it wants
to test.

[v2: Fix tests to run correctly. Update comments and commit log.
 Fix gimple switch accessor use.]

gcc/ChangeLog:

PR tree-opt/115866
* tree-if-conv.cc (if_convertible_switch_p): New function.
(if_convertible_stmt_p): Check for switch.
(get_loop_body_in_if_conv_order): Handle switch.
(predicate_bbs): Likewise.
(predicate_statements): Likewise.
(remove_conditions_and_labels): Likewise.
(ifcvt_split_critical_edges): Likewise.
(ifcvt_local_dce): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-switch-ifcvt-1.c: New test.
* gcc.dg/vect/vect-switch-ifcvt-2.c: New test.
* gcc.dg/vect/vect-switch-search-line-fast.c: New test.
* gcc.dg/vect/vect-bitfield-read-1-not.c: Change to xfail.
---
 gcc/doc/cfg.texi  |   4 +-
 .../gcc.dg/vect/vect-bitfield-read-1-not.c|   2 +-
 .../gcc.dg/vect/vect-switch-ifcvt-1.c | 115 ++
 .../gcc.dg/vect/vect-switch-ifcvt-2.c |  49 
 .../vect/vect-switch-search-line-fast.c   |  17 +++
 gcc/tree-if-conv.cc   |  93 +-
 6 files changed, 272 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c

diff --git a/gcc/doc/cfg.texi b/gcc/doc/cfg.texi
index 9a22420f91f..a6f2b9f97d6 100644
--- a/gcc/doc/cfg.texi
+++ b/gcc/doc/cfg.texi
@@ -83,13 +83,13 @@ lexicographical order, except @code{ENTRY_BLOCK} and 
@code{EXIT_BLOCK}.
 The macro @code{FOR_ALL_BB} also visits all basic blocks in
 lexicographical order, including @code{ENTRY_BLOCK} and @code{EXIT_BLOCK}.
 
-@findex post_order_compute, inverted_post_order_compute, walk_dominator_tree
+@findex post_order_compute, inverted_post_order_compute, dom_walker::walk
 The functions @code{post_order_compute} and @code{inverted_post_order_compute}
 can be used to compute topological orders of the CFG.  The orders are
 stored as vectors of basic block indices.  The @code{BASIC_BLOCK} array
 can be used to iterate each basic block by index.
 Dominator traversals are also possible using
-@code{walk_dominator_tree}.  Given two basic blocks A and B, block A
+@code{dom_walker::walk}.  Given two basic blocks A and B, block A
 dominates block B if A is @emph{always} executed before B@.
 
 Each @code{basic_block} also contains pointers to the first
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c 
b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
index 0d91067ebb2..85f4de8464a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
@@ -55,6 +55,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-not "Bitfield OK to lower." "ifcvt" } } */
+/* { dg-final { scan-tree-dump-times "Bitfield OK to lower." 0 "ifcvt" { xfail 
*-*-* } } } */
 
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
new file mode 100644
index 000..f5352ef8ed7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
@@ -0,0 +1,115 @@
+/* { dg-require-effective-target vect_int } */
+#include "tree-vect.h"
+
+extern void abort (void);
+
+int
+f1 (char *s)
+{
+  int c = 0;
+  int i;
+  for (i = 0; i < 64; i++)
+{
+  switch (*s)
+   {
+   case ',':
+   case '|':
+ c++;
+   }
+  s++;
+}
+  return c;
+}
+
+int
+f2 (char *s)
+{
+  int c = 0;
+  int i;
+  for (i = 0; i < 64; i++)
+{
+  if (*s != '#')
+   {
+ switch (*s)
+   {
+   case ',':
+   case '|':
+ c++;
+   }
+   }
+  s++;
+}
+  return c;
+}
+
+int
+f3 (char *s)
+{
+  int c = 0;
+  int i;
+  for (i = 0; i < 64; i++)
+{
+  if (*s != '#')
+if (*s == ',' || *s == '|' || *s == '@' || *s == '*')
+ c++;
+  s++;
+}
+  return c;
+}
+
+
+int
+f4 (char *s)
+{
+  

[PATCH] Fix reference to the dom walker function in the documentation

2024-08-08 Thread Andi Kleen
From: Andi Kleen 

It is using a class now with a different name.

I will commit as obvious unless someone complains

Also I included this patch by mistake in my earlier if conversion v2
patch. Please ignore that hunk there.

gcc/ChangeLog:

* doc/cfg.texi: Fix references to dom_walker.
---
 gcc/doc/cfg.texi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/cfg.texi b/gcc/doc/cfg.texi
index 9a22420f91f..a6f2b9f97d6 100644
--- a/gcc/doc/cfg.texi
+++ b/gcc/doc/cfg.texi
@@ -83,13 +83,13 @@ lexicographical order, except @code{ENTRY_BLOCK} and 
@code{EXIT_BLOCK}.
 The macro @code{FOR_ALL_BB} also visits all basic blocks in
 lexicographical order, including @code{ENTRY_BLOCK} and @code{EXIT_BLOCK}.
 
-@findex post_order_compute, inverted_post_order_compute, walk_dominator_tree
+@findex post_order_compute, inverted_post_order_compute, dom_walker::walk
 The functions @code{post_order_compute} and @code{inverted_post_order_compute}
 can be used to compute topological orders of the CFG.  The orders are
 stored as vectors of basic block indices.  The @code{BASIC_BLOCK} array
 can be used to iterate each basic block by index.
 Dominator traversals are also possible using
-@code{walk_dominator_tree}.  Given two basic blocks A and B, block A
+@code{dom_walker::walk}.  Given two basic blocks A and B, block A
 dominates block B if A is @emph{always} executed before B@.
 
 Each @code{basic_block} also contains pointers to the first
-- 
2.45.2



Re: [PATCH v2 2/4] aarch64: Fix tme intrinsic availability

2024-08-08 Thread Richard Sandiford
Andrew Carlotti  writes:
> The availability of tme intrinsics was previously gated at both
> initialisation time (using global target options) and usage time
> (accounting for function-specific target options).  This patch removes
> the check at initialisation time, and also moves the intrinsics out of
> the header file to allow for better error messages (matching the
> existing error messages for SVE intrinsics).
>
> gcc/ChangeLog:
>
>   PR target/112108
>   * config/aarch64/aarch64-builtins.cc (aarch64_init_tme_builtins):
>   (aarch64_general_init_builtins): Move tme initialisation...
>   (handle_arm_acle_h): ...to here, and remove feature check.
>   (aarch64_general_check_builtin_call): Check tme intrinsics.
>   (aarch64_expand_builtin_tme): Check feature availability.
>   * config/aarch64/arm_acle.h (__tstart, __tcommit, __tcancel)
>   (__ttest): Remove.
>   (_TMFAILURE_*): Define unconditionally.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/112108
>   * gcc.target/aarch64/acle/tme_guard-1.c: New test.
>   * gcc.target/aarch64/acle/tme_guard-2.c: New test.
>   * gcc.target/aarch64/acle/tme_guard-3.c: New test.
>   * gcc.target/aarch64/acle/tme_guard-4.c: New test.
>
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> b/gcc/config/aarch64/aarch64-builtins.cc
> index 
> d0fb8bc1d1fedb382cba1a1f09a9c3ce6757ee22..f7d31d8c4308b4a883f8ce7df5c3ee319a9c
>  100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -1791,19 +1791,19 @@ aarch64_init_tme_builtins (void)
>  = build_function_type_list (void_type_node, uint64_type_node, NULL);
>  
>aarch64_builtin_decls[AARCH64_TME_BUILTIN_TSTART]
> -= aarch64_general_add_builtin ("__builtin_aarch64_tstart",
> += aarch64_general_simulate_builtin ("__tstart",
>  ftype_uint64_void,
>  AARCH64_TME_BUILTIN_TSTART);
>aarch64_builtin_decls[AARCH64_TME_BUILTIN_TTEST]
> -= aarch64_general_add_builtin ("__builtin_aarch64_ttest",
> += aarch64_general_simulate_builtin ("__ttest",
>  ftype_uint64_void,
>  AARCH64_TME_BUILTIN_TTEST);
>aarch64_builtin_decls[AARCH64_TME_BUILTIN_TCOMMIT]
> -= aarch64_general_add_builtin ("__builtin_aarch64_tcommit",
> += aarch64_general_simulate_builtin ("__tcommit",
>  ftype_void_void,
>  AARCH64_TME_BUILTIN_TCOMMIT);
>aarch64_builtin_decls[AARCH64_TME_BUILTIN_TCANCEL]
> -= aarch64_general_add_builtin ("__builtin_aarch64_tcancel",
> += aarch64_general_simulate_builtin ("__tcancel",
>  ftype_void_uint64,
>  AARCH64_TME_BUILTIN_TCANCEL);

Very minor, sorry, but could you reindent the arguments to match
the new function name?

>  }
> @@ -2068,6 +2068,7 @@ handle_arm_acle_h (void)
>  {
>if (TARGET_LS64)
>  aarch64_init_ls64_builtins ();
> +  aarch64_init_tme_builtins ();
>  }
>  
>  /* Initialize fpsr fpcr getters and setters.  */
> @@ -2160,9 +2161,6 @@ aarch64_general_init_builtins (void)
>if (!TARGET_ILP32)
>  aarch64_init_pauth_hint_builtins ();
>  
> -  if (TARGET_TME)
> -aarch64_init_tme_builtins ();
> -
>if (TARGET_MEMTAG)
>  aarch64_init_memtag_builtins ();
>  
> @@ -2289,6 +2287,7 @@ aarch64_general_check_builtin_call (location_t 
> location, vec,
>   unsigned int code, tree fndecl,
>   unsigned int nargs ATTRIBUTE_UNUSED, tree *args)
>  {
> +  tree decl = aarch64_builtin_decls[code];
>switch (code)
>  {
>  case AARCH64_RSR:
> @@ -2301,15 +2300,28 @@ aarch64_general_check_builtin_call (location_t 
> location, vec,
>  case AARCH64_WSR64:
>  case AARCH64_WSRF:
>  case AARCH64_WSRF64:
> -  tree addr = STRIP_NOPS (args[0]);
> -  if (TREE_CODE (TREE_TYPE (addr)) != POINTER_TYPE
> -   || TREE_CODE (addr) != ADDR_EXPR
> -   || TREE_CODE (TREE_OPERAND (addr, 0)) != STRING_CST)
> - {
> -   error_at (location, "first argument to %qD must be a string literal",
> - fndecl);
> -   return false;
> - }
> +  {
> + tree addr = STRIP_NOPS (args[0]);
> + if (TREE_CODE (TREE_TYPE (addr)) != POINTER_TYPE
> + || TREE_CODE (addr) != ADDR_EXPR
> + || TREE_CODE (TREE_OPERAND (addr, 0)) != STRING_CST)
> +   {
> + error_at (location, "first argument to %qD must be a string 
> literal",
> +   fndecl);
> + return false;
> +   }
> + break;
> +  }
> +
> +case AARCH64_TME_BUILTIN_TSTART:
> +case AARCH64_TME_BUILTIN_TCOMMIT:
> +case AARCH64_TME_BUILTIN_TTEST:
> +case AARCH64_TME_BUILTIN_TCANCEL:
> +  return aarch64_check_required_extensions (location, decl,
> + AARCH64_FL_TME, f

Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-08 Thread Jens Gustedt
Am 8. August 2024 17:42:54 MESZ schrieb Martin Uecker :
> Am Donnerstag, dem 08.08.2024 um 16:56 +0200 schrieb Jens Gustedt:
> > Am 8. August 2024 13:28:57 MESZ schrieb Joseph Myers :
> > > On Thu, 8 Aug 2024, Alejandro Colomar wrote:
> > > 
> > > > Hi Jens,
> > > > 
> > > > On Thu, Aug 08, 2024 at 11:13:02AM GMT, Jens Gustedt wrote:
> > > > > > but to maintain expectations, I think it would be better to do
> > > > > > the same here.
> > > > > > 
> > > > > 
> > > > > Just to compare, the recent additions in C23 typeof etc. only have the
> > > > > parenthesized versions. So there would be precedent. And it really
> > > > > eases transition
> > > > > 
> > > > Hmmm, interesting.
> > > > 
> > > > The good part of reusing sizeof syntax is that I can reuse internal code
> > > > for sizeof. But I'll check if I can change it easily to only support
> > > > parens.
> > > > 
> > > 
> > > Since typeof produces a type, it's used in different syntactic contexts 
> > > from sizeof, so has different ambiguity issues, and requiring parentheses 
> > > with typeof is not relevant to sizeof/lengthof. I think lengthof should 
> > > follow sizeof. Make sure there's a testcase for lengthof applied to a 
> > > compound literal (the case that illustrates how, on parsing sizeof 
> > > (type-name), the compiler needs to see what comes after (type-name) to 
> > > determine whether it's actually sizeof applied to an expression (if '{' 
> > > follows) or to a type (otherwise)). (If you're following the sizeof 
> > > implementation closely enough, this should just work.)
> 
> > Hi, 
> > I am not convinced that we should introduce the same syntax weirdness
> > for this feature. sizeof seems to be the only place in the core language
> > where a keyword is used as an operator in expressions, and
> > that does not resemble function-call notation. In particular your 
> > example with compound literals shows that we could avoid syntax look-ahead 
> > by not doing this. 
> 
> It is the other way around: With the "(" there is the ambiguity
> whether this starts a compound literal or a type name enclosed
> in parentheses.  But this is not problematic for parsing.

No, the ambiguity is there because the first ( after the keyword could start 
either a type in parentheses or an expression, and among these a compound 
literal. If that first parenthesis were part of the construct (as for the 
typeof or offsetof constructs) there would be no ambiguity, as the only look 
ahead would be balanced parenthesis parsing.

And just because there is "no problem" only because we have learned to deal 
with this weirdness, it still doesn't mean we have to carry an inconsistency 
forward when we don't even remember why we have it.


> 
> Martin
> 
> 
> > (People argued violently against look-ahead when we discussed
> > possible inclusion of lambdas into C23)
> 
> > We don't have to repeat all historic accidents when inventing a new feature.
> > Sure that gcc may invent anything to their liking, but when and if we pass
> > this for standardisation we will give such considerations a careful look.
> 
> > Jens
> 


-- 
Jens Gustedt - INRIA & ICube, Strasbourg, France


[committed] amdgcn: Fix VGPR max count

2024-08-08 Thread Andrew Stubbs
The metadata for RDNA3 kernels allocates VGPRs in blocks of 12, which means the
maximum usable number of registers is 252.  This patch prevents the compiler
from exceeding this artificial limit.

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_conditional_register_usage): Fix registers
remaining after maximum allocation using TARGET_VGPR_GRANULARITY.
---
 gcc/config/gcn/gcn.cc | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index b22132de6ab..0725d15c8ed 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -2493,6 +2493,13 @@ gcn_secondary_reload (bool in_p, rtx x, reg_class_t 
rclass,
 static void
 gcn_conditional_register_usage (void)
 {
+  /* Some architectures have a register allocation granularity that does not
+ permit use of the full register count.  */
+  for (int i = 256 - (256 % TARGET_VGPR_GRANULARITY);
+   i < 256;
+   i++)
+fixed_regs[VGPR_REGNO (i)] = call_used_regs[VGPR_REGNO (i)] = 1;
+
   if (!cfun || !cfun->machine)
 return;
 
-- 
2.45.2



Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-08 Thread Martin Uecker
Am Donnerstag, dem 08.08.2024 um 18:23 +0200 schrieb Jens Gustedt:
> As said, even if we don't consider this problematic because we are used to 
> the mildly complex case distinction that you just exposed over several 
> paragraphs, it doesn't mean that we should
> do it, nor does it mean that it would be beneficial for our users or for 
> other implementations that would like to follow. 
> 
> And also as said, all other features in the standard, whether types, typeof, or 
> expressions, e.g. offsetof, unreachable or other gnu extensions, neither have 
> nor need this kind of syntax.
> 
> We should be designing features for the future, not the past.


While not problematic for parsing, I see now how the grammar would become
better if we eliminated this quirk.  Thanks!

But we should then deprecate this for sizeof too.


Martin


> 
> Jens



Re: [PATCH v2 3/4] aarch64: Fix memtag intrinsic availability

2024-08-08 Thread Richard Sandiford
Andrew Carlotti  writes:
> The availability of memtag intrinsics and data types were determined
> solely by the globally specified architecture features, which did not
> reflect any changes specified in target pragmas or attributes.
>
> This patch removes the initialisation-time guards for the intrinsics,
> and replaces them with checks at use time. It also removes the macro
> indirection from the header file - this simplifies the header, and
> allows the missing extension error reporting to find the user-facing
> intrinsic names.
>
> gcc/ChangeLog:
>
>   PR target/112108
>   * config/aarch64/aarch64-builtins.cc (aarch64_init_memtag_builtins):
>   Replace internal builtin names with intrinsic names.
>   (aarch64_general_init_builtins): Move memtag intialisation...
>   (handle_arm_acle_h): ...to here, and remove feature check.
>   (aarch64_general_check_builtin_call): Check memtag intrinsics.
>   (aarch64_expand_builtin_memtag): Add feature check.
>   * config/aarch64/arm_acle.h (__arm_mte_create_random_tag)
>   (__arm_mte_exclude_tag, __arm_mte_ptrdiff)
>   (__arm_mte_increment_tag, __arm_mte_set_tag, __arm_mte_get_tag):
>   Remove.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/112108
>   * gcc.target/aarch64/acle/memtag_guard-1.c: New test.
>   * gcc.target/aarch64/acle/memtag_guard-2.c: New test.
>   * gcc.target/aarch64/acle/memtag_guard-3.c: New test.
>   * gcc.target/aarch64/acle/memtag_guard-4.c: New test.

Same comments about reindentation and expand checking as for 2/4.
Also, one very minor nit:

> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> b/gcc/config/aarch64/aarch64-builtins.cc
> index 
> f7d31d8c4308b4a883f8ce7df5c3ee319a9c..50667e555497b483aea6a64bb5809ddc62cedf83
>  100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -1936,7 +1936,7 @@ aarch64_init_memtag_builtins (void)
>  
>  #define AARCH64_INIT_MEMTAG_BUILTINS_DECL(F, N, I, T) \
>aarch64_builtin_decls[AARCH64_MEMTAG_BUILTIN_##F] \
> -= aarch64_general_add_builtin ("__builtin_aarch64_memtag_"#N, \
> += aarch64_general_simulate_builtin ("__arm_mte_"#N, \
>  T, AARCH64_MEMTAG_BUILTIN_##F); \
>aarch64_memtag_builtin_data[AARCH64_MEMTAG_BUILTIN_##F - \
> AARCH64_MEMTAG_BUILTIN_START - 1] = \
> @@ -1944,19 +1944,19 @@ aarch64_init_memtag_builtins (void)
>  
>fntype = build_function_type_list (ptr_type_node, ptr_type_node,
>uint64_type_node, NULL);
> -  AARCH64_INIT_MEMTAG_BUILTINS_DECL (IRG, irg, irg, fntype);
> +  AARCH64_INIT_MEMTAG_BUILTINS_DECL (IRG, create_random_tag, irg, fntype);
>  
>fntype = build_function_type_list (uint64_type_node, ptr_type_node,
>uint64_type_node, NULL);
> -  AARCH64_INIT_MEMTAG_BUILTINS_DECL (GMI, gmi, gmi, fntype);
> +  AARCH64_INIT_MEMTAG_BUILTINS_DECL (GMI, exclude_tag, gmi, fntype);
>  
>fntype = build_function_type_list (ptrdiff_type_node, ptr_type_node,
>ptr_type_node, NULL);
> -  AARCH64_INIT_MEMTAG_BUILTINS_DECL (SUBP, subp, subp, fntype);
> +  AARCH64_INIT_MEMTAG_BUILTINS_DECL (SUBP, ptrdiff, subp, fntype);
>  
>fntype = build_function_type_list (ptr_type_node, ptr_type_node,
>unsigned_type_node, NULL);
> -  AARCH64_INIT_MEMTAG_BUILTINS_DECL (INC_TAG, inc_tag, addg, fntype);
> +  AARCH64_INIT_MEMTAG_BUILTINS_DECL (INC_TAG, increment_tag, addg, fntype);
>  
>fntype = build_function_type_list (void_type_node, ptr_type_node, NULL);
>AARCH64_INIT_MEMTAG_BUILTINS_DECL (SET_TAG, set_tag, stg, fntype);
> @@ -2069,6 +2069,7 @@ handle_arm_acle_h (void)
>if (TARGET_LS64)
>  aarch64_init_ls64_builtins ();
>aarch64_init_tme_builtins ();
> +  aarch64_init_memtag_builtins ();
>  }
>  
>  /* Initialize fpsr fpcr getters and setters.  */
> @@ -2161,9 +2162,6 @@ aarch64_general_init_builtins (void)
>if (!TARGET_ILP32)
>  aarch64_init_pauth_hint_builtins ();
>  
> -  if (TARGET_MEMTAG)
> -aarch64_init_memtag_builtins ();
> -
>if (in_lto_p)
>  handle_arm_acle_h ();
>  }
> @@ -2323,7 +2321,12 @@ aarch64_general_check_builtin_call (location_t 
> location, vec,
>  default:
>break;
>  }
> -  /* Default behavior.  */
> +
> +  if (code >= AARCH64_MEMTAG_BUILTIN_START
> +  && code <= AARCH64_MEMTAG_BUILTIN_END)
> + return aarch64_check_required_extensions (location, decl,
> +   AARCH64_FL_MEMTAG, false);

The return statement should be indented by 4 rather than 8 columns.

LGTM otherwise, but please give others 24 hours to comment.

Thanks,
Richard

> +
>return true;
>  }
>  
> @@ -3098,6 +3101,11 @@ aarch64_expand_builtin_memtag (int fcode, tree exp, 
> rtx target)
>return const0_rtx;
>  }
>  
> +  tree fndecl = aarch64_builtin_decls[fcode];
> + 

Re: [PATCH v2 4/4] aarch64: Fix ls64 intrinsic availability

2024-08-08 Thread Richard Sandiford
Andrew Carlotti  writes:
> The availability of ls64 intrinsics and data types were determined
> solely by the globally specified architecture features, which did not
> reflect any changes specified in target pragmas or attributes.
>
> This patch removes the initialisation-time guards for the intrinsics,
> and replaces them with checks at use time. We also get better error
> messages when ls64 is not available (matching the existing error
> messages for SVE intrinsics).
>
> The data512_t type is made always available; this is consistent with the
> present behaviour for Neon fp16/bf16 types.
>
> gcc/ChangeLog:
>
>   PR target/112108
>   * config/aarch64/aarch64-builtins.cc (handle_arm_acle_h): Remove
>   feature check at initialisation.
>   (aarch64_general_check_builtin_call): Check ls64 intrinsics.
>   (aarch64_expand_builtin_ls64): Add feature check.
>   * config/aarch64/arm_acle.h: (data512_t) Make always available.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/112108
>   * gcc.target/aarch64/acle/ls64_guard-1.c: New test.
>   * gcc.target/aarch64/acle/ls64_guard-2.c: New test.
>   * gcc.target/aarch64/acle/ls64_guard-3.c: New test.
>   * gcc.target/aarch64/acle/ls64_guard-4.c: New test.

Same comment as 2/4 about checking during expansion.  LGTM otherwise,
but please give others 24 hours to comment.

Thanks for cleaning this up.

Richard

> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> b/gcc/config/aarch64/aarch64-builtins.cc
> index 
> 50667e555497b483aea6a64bb5809ddc62cedf83..ba0147a2077514b4d2a6f9bccc8e7fe897d891b3
>  100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -2066,8 +2066,7 @@ aarch64_init_data_intrinsics (void)
>  void
>  handle_arm_acle_h (void)
>  {
> -  if (TARGET_LS64)
> -aarch64_init_ls64_builtins ();
> +  aarch64_init_ls64_builtins ();
>aarch64_init_tme_builtins ();
>aarch64_init_memtag_builtins ();
>  }
> @@ -2318,6 +2317,13 @@ aarch64_general_check_builtin_call (location_t 
> location, vec,
>return aarch64_check_required_extensions (location, decl,
>   AARCH64_FL_TME, false);
>  
> +case AARCH64_LS64_BUILTIN_LD64B:
> +case AARCH64_LS64_BUILTIN_ST64B:
> +case AARCH64_LS64_BUILTIN_ST64BV:
> +case AARCH64_LS64_BUILTIN_ST64BV0:
> +  return aarch64_check_required_extensions (location, decl,
> + AARCH64_FL_LS64, false);
> +
>  default:
>break;
>  }
> @@ -2798,6 +2804,11 @@ aarch64_expand_builtin_ls64 (int fcode, tree exp, rtx 
> target)
>  {
>expand_operand ops[3];
>  
> +  tree fndecl = aarch64_builtin_decls[fcode];
> +  if (!aarch64_check_required_extensions (EXPR_LOCATION (exp), fndecl,
> +   AARCH64_FL_LS64, false))
> +return target;
> +
>switch (fcode)
>  {
>  case AARCH64_LS64_BUILTIN_LD64B:
> diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h
> index 
> ab04326791309796125860ce64e63fe858a4a733..ab4e7e60e046a9e9c81237de2ca5463c3d4f96ca
>  100644
> --- a/gcc/config/aarch64/arm_acle.h
> +++ b/gcc/config/aarch64/arm_acle.h
> @@ -265,9 +265,7 @@ __crc32d (uint32_t __a, uint64_t __b)
>  #define _TMFAILURE_INT0x0080u
>  #define _TMFAILURE_TRIVIAL0x0100u
>  
> -#ifdef __ARM_FEATURE_LS64
>  typedef __arm_data512_t data512_t;
> -#endif
>  
>  #pragma GCC push_options
>  #pragma GCC target ("+nothing+rng")
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-1.c 
> b/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-1.c
> new file mode 100644
> index 
> ..7dfc193a2934c994220280990316027c07e75ac4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-1.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=armv8.6-a" } */
> +
> +#include 
> +
> +data512_t foo (void * p)
> +{
> +  return __arm_ld64b (p); /* { dg-error {ACLE function '__arm_ld64b' 
> requires ISA extension 'ls64'} } */
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-2.c 
> b/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-2.c
> new file mode 100644
> index 
> ..3ede05a81f026f8606ee2c9cd56f15ce45caa1c8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-2.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=armv8.6-a" } */
> +
> +#include 
> +
> +#pragma GCC target("arch=armv8-a+ls64")
> +data512_t foo (void * p)
> +{
> +  return __arm_ld64b (p);
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-3.c 
> b/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-3.c
> new file mode 100644
> index 
> ..e0fccdad7bec4aa522fb709d010289fd02f91d05
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-3.c
> @@ -0,0

Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-08 Thread Jens Gustedt
As said, even if we don't consider this problematic because we are used to the 
mildly complex case distinction that you just exposed over several paragraphs, 
it doesn't mean that we should do it, nor does it mean that it would be 
beneficial for our users or for other implementations that would like to 
follow. 

And also as said, all other features in the standard, being types, typeof, or 
expressions, e.g. offsetof, unreachable, or other GNU extensions, don't have or 
need this kind of syntax.

We should be designing features for the future, not the past

Jens
-- 
Jens Gustedt - INRIA & ICube, Strasbourg, France 

Re: [PATCH 0/8] fortran: Inline MINLOC/MAXLOC without DIM argument [PR90608]

2024-08-08 Thread Steve Kargl
On Thu, Aug 08, 2024 at 11:09:10AM +0200, Mikael Morin wrote:
> 
> These patches are about inlining, there is no manipulation of the parse
> tree.  So I would rather use a separate option (-finline-intrinsics?).

I've only followed the discussion from afar, but gcc already supports
a -finline and -fno-inline option.

% gfortran13 -o z -fno-inline a.f90
% ./z
 0  -2.459358E-01  5.567117E-02
 1   4.347283E-02 -9.840712E-03
 2   2.546304E-01 -5.763932E-02
 3   5.837931E-02 -1.321501E-02
 4  -2.196027E-01  4.971030E-02
 5  -2.340615E-01  5.298326E-02
 6  -1.445877E-02  3.272955E-03
 7   2.167110E-01 -4.905571E-02
 8   3.178541E-01 -7.195095E-02
 9   2.918557E-01 -6.606582E-02
 4.347275E-02  2.490154E-01

gcc/opts.cc:  SET_OPTION_IF_UNSET (opts, opts_set, flag_inline_functions, 
value);

This, of course, controls all inlining not just intrinsic subprograms.

PS: Thanks for the series of cleanup and improvement patches!
-- 
Steve


[patch,avr,applied] Fix a typo in built-in documentation

2024-08-08 Thread Georg-Johann Lay

Applied as obvious.

Johann

--


AVR: Fix a typo in __builtin_avr_mask1 documentation.

gcc/
* doc/extend.texi (AVR Built-in Functions) : Fix a typo.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 89fe5db7aed..ae1ada3cdf8 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -17052,7 +17052,7 @@ __builtin_avr_insert_bits (0x01234567, bits, 0);
 @defbuiltin{uint8_t __builtin_avr_mask1 (uint8_t @var{mask}, uint8_t @var{offs})}

 Rotate the 8-bit constant value @var{mask} by an offset of @var{offs},
 where @var{mask} is in @{ 0x01, 0xfe, 0x7f, 0x80 @}.
-This built-in can be use as an alternative to 8-bit expressions like
+This built-in can be used as an alternative to 8-bit expressions like
 @code{1 << offs} when their computation consumes too much
 time, and @var{offs} is known to be in the range 0@dots{}7.
 @example


Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-08 Thread Joseph Myers
On Thu, 8 Aug 2024, Jens Gustedt wrote:

> No, the ambiguity is there because the first ( after the keyword could 
> start either a type in parenthesis or an expression, and among these a 
> compound literal. If that first parenthesis would be part of the 
> construct (as for the typeof or offsetof constructs) there would be no 
> ambiguity a the only look ahead would be balanced parenthesis parsing.

I don't consider this ambiguity / unbounded lookahead in any problematic 
sense.  There are the following cases for sizeof:

* Not followed by '(': sizeof unary-expression.

* Followed by '(' then a token that does not start a type-name: sizeof 
unary-expression.

* Followed by '(' then a token that does start a type-name: sizeof 
(type-name) later-tokens, where if later-tokens start with '{' then it's 
sizeof unary-expression and otherwise it's sizeof (type-name).

The last case is not problematic because the parsing of the type-name 
doesn't depend at all on what comes after it; it's parsed exactly the same 
whether it's part of sizeof (type-name) or a compound literal.  
Fundamentally this is exactly the same as if a cast-expression starts with 
(type-name): until the end of the type name, you don't know whether it's a 
cast, or whether the cast-expression is actually a unary-expression which 
is a postfix-expression which is a compound-literal.  In both cases, the 
parsing of a compound-literal is entered only after the initial 
(type-name) has been seen, because until after the (type-name) it's not 
known which construct is being parsed.

-- 
Joseph S. Myers
josmy...@redhat.com



[patch,avr,applied] Tweak post-inc address adjustments for some __flash reads.

2024-08-08 Thread Georg-Johann Lay

Some post-inc address adjustments can be skipped when the
address register is unused after.

Johann

--

AVR: Improve POST_INC output in some rare cases.

gcc/
* config/avr/avr.cc (avr_insn_has_reg_unused_note_p): New function.
(_reg_unused_after): Use it to recognize more cases.
(avr_out_lpm_no_lpmx) [POST_INC]: Use reg_unused_after.

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 7229aac747b..0b3fd7a36d0 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -4621,7 +4621,7 @@ avr_out_lpm_no_lpmx (rtx_insn *insn, rtx *xop, int *plen)
 	avr_asm_len ("mov %0,r0", &reg, plen, 1);
 	}
 
-  if (! _reg_unused_after (insn, xop[2], false))
+  if (! reg_unused_after (insn, xop[2]))
 	avr_asm_len ("adiw %2,1", xop, plen, 1);
 
   break; /* POST_INC */
@@ -11318,6 +11318,25 @@ avr_adjust_insn_length (rtx_insn *insn, int len)
   return len;
 }
 
+
+/* Return true when INSN has a REG_UNUSED note for hard reg REG.
+   rtlanal.cc::find_reg_note() uses == to compare XEXP (link, 0)
+   therefore use a custom function.  */
+
+static bool
+avr_insn_has_reg_unused_note_p (rtx_insn *insn, rtx reg)
+{
+  for (rtx link = REG_NOTES (insn); link; link = XEXP (link, 1))
+if (REG_NOTE_KIND (link) == REG_UNUSED
+	&& REG_P (XEXP (link, 0))
+	&& REGNO (reg) >= REGNO (XEXP (link, 0))
+	&& END_REGNO (reg) <= END_REGNO (XEXP (link, 0)))
+  return true;
+
+  return false;
+}
+
+
 /* Return nonzero if register REG dead after INSN.  */
 
 int
@@ -11344,6 +11363,17 @@ _reg_unused_after (rtx_insn *insn, rtx reg, bool look_at_insn)
   if (set && !MEM_P (SET_DEST (set))
 	  && reg_overlap_mentioned_p (reg, SET_DEST (set)))
 	return 1;
+
+  /* This case occurs when fuse-add introduced a POST_INC addressing,
+	 but the address register is unused after.  */
+  if (set)
+	{
+	  rtx mem = MEM_P (SET_SRC (set)) ? SET_SRC (set) : SET_DEST (set);
+	  if (MEM_P (mem)
+	  && reg_overlap_mentioned_p (reg, XEXP (mem, 0))
+	  && avr_insn_has_reg_unused_note_p (insn, reg))
+	return 1;
+	}
 }
 
   while ((insn = NEXT_INSN (insn)))


[patch,avr,applied] Fix target/116295 unrecognizable insn

2024-08-08 Thread Georg-Johann Lay

Applied this fix to trunk and v14 branch.

Johann

--

AVR: target/116295 - Fix unrecognizable insn with __flash read.

Some loads from non-generic address-spaces are performed by
libgcc calls, and they don't have a POST_INC form.  Don't consider
such insns when running -mfuse-add.

PR target/116295
gcc/
* config/avr/avr.cc (Mem_Insn::Mem_Insn): Don't consider MEMs
that are avr_mem_memx_p or avr_load_libgcc_p.

gcc/testsuite/
	* gcc.target/avr/torture/pr116295.c: New test.

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 0b3fd7a36d0..5cfd67a8e74 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -2121,6 +2121,10 @@ avr_pass_fuse_add::Mem_Insn::Mem_Insn (rtx_insn *insn)
   else
 return;
 
+  if (avr_mem_memx_p (mem)
+  || avr_load_libgcc_p (mem))
+return;
+
   addr = XEXP (mem, 0);
   addr_code = GET_CODE (addr);
 
diff --git a/gcc/testsuite/gcc.target/avr/torture/pr116295.c b/gcc/testsuite/gcc.target/avr/torture/pr116295.c
new file mode 100644
index 000..0b3d380ff14
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/torture/pr116295.c
@@ -0,0 +1,22 @@
+/* { dg-do link } */
+/* { dg-additional-options "-std=gnu99" } */
+
+#ifdef __FLASH
+
+long val;
+
+__attribute__((used))
+const __flash long*
+load4_flash (const __flash long *p)
+{
+val += *p++;
+val += *p++;
+return p;
+}
+
+#endif
+
+int main (void)
+{
+return 0;
+}


[PATCH] c++: inherited CTAD fixes [PR116276]

2024-08-08 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this
look OK for trunk/14?

-- >8 --

This implements the inherited vs non-inherited guide tiebreaker
specified by P2582R1.  In order to track inherited-ness of a guide
it seems natural to reuse the lang_decl_fn::context field that already
tracks inherited-ness of a constructor.

This patch also works around CLASSTYPE_CONSTRUCTORS apparently not
always containing all inherited constructors, by iterating over
TYPE_FIELDS instead.

This patch also makes us recognize another written form of inherited
constructor, 'using Base::Base::Base' whose USING_DECL_SCOPE is a
TYPENAME_TYPE.

PR c++/116276

gcc/cp/ChangeLog:

* call.cc (joust): Implement P2582R1 inherited vs non-inherited
guide tiebreaker.
* cp-tree.h (lang_decl_fn::context): Document usage in
deduction_guide_p FUNCTION_DECLs.
(inherited_guide_p): Declare.
* pt.cc (inherited_guide_p): Define.
(set_inherited_guide_context): Define.
(alias_ctad_tweaks): Use set_inherited_guide_context.
(inherited_ctad_tweaks): Recognize some inherited constructors
whose scope is a TYPENAME_TYPE.
(ctor_deduction_guides_for): For C++23 inherited CTAD, loop
over TYPE_FIELDS instead of using CLASSTYPE_CONSTRUCTORS to
recognize all relevant using-decls.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/class-deduction-inherited4.C: Extend test.
* g++.dg/cpp23/class-deduction-inherited5.C: New test.
---
 gcc/cp/call.cc| 22 +
 gcc/cp/cp-tree.h  |  8 +++-
 gcc/cp/pt.cc  | 45 +++
 .../g++.dg/cpp23/class-deduction-inherited4.C | 15 ++-
 .../g++.dg/cpp23/class-deduction-inherited5.C | 25 +++
 5 files changed, 103 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp23/class-deduction-inherited5.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index a75e2e5e3af..3287f77b59b 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -13261,6 +13261,28 @@ joust (struct z_candidate *cand1, struct z_candidate *cand2, bool warn,
   else if (cand2->rewritten ())
 return 1;
 
+  /* F1 and F2 are generated from class template argument deduction for a class
+ D, and F2 is generated from inheriting constructors from a base class of D
+ while F1 is not, and for each explicit function argument, the corresponding
+ parameters of F1 and F2 are either both ellipses or have the same type  */
+  if (deduction_guide_p (cand1->fn))
+{
+  bool inherited1 = inherited_guide_p (cand1->fn);
+  bool inherited2 = inherited_guide_p (cand2->fn);
+  if (int diff = inherited2 - inherited1)
+   {
+ for (i = 0; i < len; ++i)
+   {
+ conversion *t1 = cand1->convs[i + off1];
+ conversion *t2 = cand2->convs[i + off2];
+ if (!same_type_p (t1->type, t2->type))
+   break;
+   }
+ if (i == len)
+   return diff;
+   }
+}
+
   /* F1 is generated from a deduction-guide (13.3.1.8) and F2 is not */
   if (deduction_guide_p (cand1->fn))
 {
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 0c25ec5a04e..0b76fef0df4 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -2973,8 +2973,11 @@ struct GTY(()) lang_decl_fn {
  chained here.  This pointer thunks to return pointer thunks
  will be chained on the return pointer thunk.
  For a DECL_CONSTUCTOR_P FUNCTION_DECL, this is the base from
- whence we inherit.  Otherwise, it is the class in which a
- (namespace-scope) friend is defined (if any).   */
+ whence we inherit.
+ For a deduction_guide_p FUNCTION_DECL, this is the base class
+ from whence we inherited the guide (if any).
+ Otherwise, it is the class in which a (namespace-scope) friend
+ is defined (if any).  */
   tree context;
 
   union lang_decl_u5
@@ -7655,6 +7658,7 @@ extern bool deduction_guide_p (const_tree);
 extern bool copy_guide_p   (const_tree);
 extern bool template_guide_p   (const_tree);
 extern bool builtin_guide_p(const_tree);
+extern bool inherited_guide_p  (const_tree);
 extern void store_explicit_specifier   (tree, tree);
 extern tree lookup_explicit_specifier  (tree);
 extern tree lookup_imported_hidden_friend  (tree);
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index d468a3037b6..b518e6d5185 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -29678,6 +29678,26 @@ builtin_guide_p (const_tree fn)
   return true;
 }
 
+/* True if FN is a C++23 inherited guide.  */
+
+bool
+inherited_guide_p (const_tree fn)
+{
+  gcc_assert (deduction_guide_p (fn));
+  return LANG_DECL_FN_CHECK (fn)->context != NULL_TREE;
+}
+
+/* Set the base class from which this transformed guide was inherited
+   as part of C++23 inherited

Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-08 Thread Alejandro Colomar
Hi Martin, Jens, Joseph,

On Thu, Aug 08, 2024 at 06:30:42PM GMT, Martin Uecker wrote:
> On Thursday, 08.08.2024 at 18:23 +0200, Jens Gustedt wrote:
> > As said, even if we don't consider this problematic because we are used to 
> > the mildly complex case distinction that you just exposed over several 
> > paragraphs, it doesn't mean that we should
> > do it, nor does it mean that it would be beneficial for our users or for 
> > other implementations that would like to follow. 
> > 
> > And also as said, all other features in the standard, being types, typeof, 
> > or expressions, e.g offsetof, unreachable or other gnu extensions,  don't 
> > have nor need this kind of syntax.
> > 
> > We should be designing features for the future, not the past
> 
> 
> While not problematic for parsing, I see now how the grammar would become
> better if we eliminated this quirk. Thanks!
> 
> But we should then deprecate this for sizeof too.

How about having __lengthof__ behave like sizeof, but deprecate it in
sizeof too?

ISO C could accept only lengthof() with parens, and we could have it
without them as a deprecated-on-arrival GNU extension.

And then remove it from both at some point in the future.

We could start by adding a -Wall warning for sizeof without parens, and
promote it to an error a few versions later.

Have a lovely day!
Alex

P.S.:  I'm doing a whole-tree update to use __lengthof__ instead of
open-coded sizeof divisions or macros based on it, and I've found several
bugs already.  I'll use this change to test the new operator in the
entire code base, which should result in no regressions at all.  That
would be an interesting test suite.  :)

However, I warn in advance that it will be painful to review that patch.

-- 





[r15-2820 Regression] FAIL: gcc.target/i386/pr105493.c scan-tree-dump-times slp1 " MEM \\[[^]]*\\] = " 4 on Linux/x86_64

2024-08-08 Thread haochen.jiang
On Linux/x86_64,

ab18785840d7b8afd9f716bab9d1eab415bc4fe9 is the first bad commit
commit ab18785840d7b8afd9f716bab9d1eab415bc4fe9
Author: Manolis Tsamis 
Date:   Tue Jun 25 08:00:04 2024 -0700

Rearrange SLP nodes with duplicate statements [PR98138]

caused

FAIL: gcc.target/i386/pr105493.c scan-tree-dump-times slp1 "  MEM 
 \\[[^]]*\\] = " 4

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-2820/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr105493.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr105493.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com.)
(If you met problems with cascadelake related, disabling AVX512F in command 
line might save that.)
(However, please make sure that there is no potential problems with AVX512.)


[PATCH 1/2] RISC-V: Constant synthesis with same upper and lower halves

2024-08-08 Thread Raphael Moreira Zinsly
From: Raphael Zinsly 

Improve handling of constants whose upper and lower 32-bit
halves are the same and Zbkb is not available in riscv_move_integer.
riscv_split_integer already handles this, but the changes in
riscv_build_integer make it possible to improve code generation for
negative values.

e.g. for:

unsigned long f (void) { return 0xf857f2def857f2deUL; }

Without the patch:

	li	a0,-128454656
	addi	a0,a0,734
	li	a5,-128454656
	addi	a5,a5,735
	slli	a5,a5,32
	add	a0,a5,a0

With the patch:

	li	a0,128454656
	addi	a0,a0,-735
	slli	a5,a0,32
	add	a0,a0,a5
	xori	a0,a0,-1

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_build_integer): Detect constants
with the same 32-bit halves and without Zbkb.
	(riscv_move_integer): Add synthesis of these constants.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/synthesis-11.c: New test.

Co-authored-by: Jeff Law 
---
 gcc/config/riscv/riscv.cc | 59 +--
 gcc/testsuite/gcc.target/riscv/synthesis-11.c | 40 +
 2 files changed, 93 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/synthesis-11.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 8ece7859945..454220d8ba4 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1223,6 +1223,43 @@ riscv_build_integer (struct riscv_integer_op *codes, HOST_WIDE_INT value,
}
 
 }
+  else if (cost > 3 && TARGET_64BIT && can_create_pseudo_p ())
+{
+  struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS];
+  int alt_cost;
+
+  unsigned HOST_WIDE_INT loval = value & 0x;
+  unsigned HOST_WIDE_INT hival = (value & ~loval) >> 32;
+  bool bit31 = (hival & 0x8000) != 0;
+  /* Without pack we can generate it with a shift 32 followed by an or.  */
+  if (hival == loval && !bit31)
+   {
+ alt_cost = 2 + riscv_build_integer_1 (alt_codes,
+   sext_hwi (loval, 32), mode);
+ if (alt_cost < cost)
+   {
+ /* We need to save the first constant we build.  */
+ alt_codes[alt_cost - 3].save_temporary = true;
+
+ /* Now we want to shift the previously generated constant into the
+high half.  */
+ alt_codes[alt_cost - 2].code = ASHIFT;
+ alt_codes[alt_cost - 2].value = 32;
+ alt_codes[alt_cost - 2].use_uw = false;
+ alt_codes[alt_cost - 2].save_temporary = false;
+
+		 /* And the final step, IOR the two halves together.  Since this uses
+		    the saved temporary, use CONCAT similar to what we do for Zbkb.  */
+ alt_codes[alt_cost - 1].code = CONCAT;
+ alt_codes[alt_cost - 1].value = 0;
+ alt_codes[alt_cost - 1].use_uw = false;
+ alt_codes[alt_cost - 1].save_temporary = false;
+
+ memcpy (codes, alt_codes, sizeof (alt_codes));
+ cost = alt_cost;
+   }
+   }
+}
 
   return cost;
 }
@@ -2786,12 +2823,22 @@ riscv_move_integer (rtx temp, rtx dest, HOST_WIDE_INT value,
}
  else if (codes[i].code == CONCAT || codes[i].code == VEC_MERGE)
{
- rtx t = can_create_pseudo_p () ? gen_reg_rtx (mode) : temp;
- rtx t2 = codes[i].code == VEC_MERGE ? old_value : x;
- gcc_assert (t2);
- t2 = gen_lowpart (SImode, t2);
- emit_insn (gen_riscv_xpack_di_si_2 (t, x, GEN_INT (32), t2));
- x = t;
+ if (codes[i].code == CONCAT && !TARGET_ZBKB)
+   {
+ /* The two values should have no bits in common, so we can
+use PLUS instead of IOR which has a higher chance of
+using a compressed instruction.  */
+ x = gen_rtx_PLUS (mode, x, old_value);
+   }
+ else
+   {
+ rtx t = can_create_pseudo_p () ? gen_reg_rtx (mode) : temp;
+ rtx t2 = codes[i].code == VEC_MERGE ? old_value : x;
+ gcc_assert (t2);
+ t2 = gen_lowpart (SImode, t2);
+ emit_insn (gen_riscv_xpack_di_si_2 (t, x, GEN_INT (32), t2));
+ x = t;
+   }
}
  else
x = gen_rtx_fmt_ee (codes[i].code, mode,
diff --git a/gcc/testsuite/gcc.target/riscv/synthesis-11.c b/gcc/testsuite/gcc.target/riscv/synthesis-11.c
new file mode 100644
index 000..98401d5ca32
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/synthesis-11.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* We aggressively skip as we really just need to test the basic synthesis
+   which shouldn't vary based on the optimization level.  -O1 seems to work
+   and eliminates the usual sources of extraneous dead code that would throw
+   off 

[PATCH 2/2] RISC-V: Constant synthesis by shifting the lower half

2024-08-08 Thread Raphael Moreira Zinsly
Improve handling of constants where the high half can be constructed
by shifting the low half.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_build_integer): Detect constants
	where the higher half is a shift of the lower half.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/synthesis-12.c: New test.
---
 gcc/config/riscv/riscv.cc | 39 +++
 gcc/testsuite/gcc.target/riscv/synthesis-12.c | 27 +
 2 files changed, 66 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/synthesis-12.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 454220d8ba4..a3e8a243f15 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1259,6 +1259,45 @@ riscv_build_integer (struct riscv_integer_op *codes, HOST_WIDE_INT value,
  cost = alt_cost;
}
}
+
+  if (cost > 4 && !bit31)
+   {
+ int trailing_shift = ctz_hwi (loval) - ctz_hwi (hival);
+ int leading_shift = clz_hwi (loval) - clz_hwi (hival);
+ alt_cost = 2 + riscv_build_integer_1 (alt_codes, sext_hwi (loval, 32),
+   mode);
+ /* For constants where the upper half is a shift of the lower half we
+can do a similar transformation as for constants with the same
+halves.  */
+ if (alt_cost < cost)
+   {
+ alt_codes[alt_cost - 3].save_temporary = true;
+ alt_codes[alt_cost - 2].code = ASHIFT;
+ alt_codes[alt_cost - 2].use_uw = false;
+ alt_codes[alt_cost - 2].save_temporary = false;
+ alt_codes[alt_cost - 1].code = CONCAT;
+ alt_codes[alt_cost - 1].value = 0;
+ alt_codes[alt_cost - 1].use_uw = false;
+ alt_codes[alt_cost - 1].save_temporary = false;
+
+ /* Adjust the shift into the high half accordingly.  */
+ if ((trailing_shift > 0 && hival == (loval >> trailing_shift)) ||
+  (trailing_shift < 0 && hival == (loval << trailing_shift)))
+   {
+ alt_codes[alt_cost - 2].value = 32 - trailing_shift;
+ memcpy (codes, alt_codes, sizeof (alt_codes));
+ cost = alt_cost;
+   }
+ else if ((leading_shift < 0 && hival == (loval >> leading_shift))
+   || (leading_shift > 0
+   && hival == (loval << leading_shift)))
+   {
+ alt_codes[alt_cost - 2].value = 32 + leading_shift;
+ memcpy (codes, alt_codes, sizeof (alt_codes));
+ cost = alt_cost;
+   }
+   }
+   }
 }
 
   return cost;
diff --git a/gcc/testsuite/gcc.target/riscv/synthesis-12.c b/gcc/testsuite/gcc.target/riscv/synthesis-12.c
new file mode 100644
index 000..0265a2d6f13
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/synthesis-12.c
@@ -0,0 +1,27 @@
+
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* We aggressively skip as we really just need to test the basic synthesis
+   which shouldn't vary based on the optimization level.  -O1 seems to work
+   and eliminates the usual sources of extraneous dead code that would throw
+   off the counts.  */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O2" "-O3" "-Os" "-Oz" "-flto" } } */
+/* { dg-options "-march=rv64gc" } */
+
+/* Rather than test for a specific synthesis of all these constants or
+   having thousands of tests each testing one variant, we just test the
+   total number of instructions.
+
+   This isn't expected to change much and any change is worthy of a look.  */
+/* { dg-final { scan-assembler-times "\\t(add|addi|bseti|li|pack|ret|sh1add|sh2add|sh3add|slli|srli|xori|or)" 45 } } */
+
+
+unsigned long foo_0x7857f2de7857f2de(void) { return 0x7857f2de7857f2deUL; }
+unsigned long foo_0x7fffdffe3fffefff(void) { return 0x7fffdffe3fffefffUL; }
+unsigned long foo_0x17fe3fffeffc(void) { return 0x17fe3fffeffcUL; }
+unsigned long foo_0x0a3fdbf0028ff6fc(void) { return 0x0a3fdbf0028ff6fcUL; }
+unsigned long foo_0x014067e805019fa0(void) { return 0x014067e805019fa0UL; }
+unsigned long foo_0x09d87e90009d87e9(void) { return 0x09d87e90009d87e9UL; }
+unsigned long foo_0x230232118119(void) { return 0x230232118119UL; }
+unsigned long foo_0x000711eb00e23d60(void) { return 0x000711eb00e23d60UL; }
+unsigned long foo_0x598381660e00(void) { return 0x598381660e00UL; }
-- 
2.42.0



Re: [PATCH 0/8] fortran: Inline MINLOC/MAXLOC without DIM argument [PR90608]

2024-08-08 Thread Thomas Koenig

On 08.08.24 at 11:09, Mikael Morin wrote:


As we are not planning to remove the library implementation (-Os!),
this is also the best way to compare library to inline code.


This makes perfect sense, but why reuse the -ffrontend-optimize option?
The manual describes it as:
This option performs front-end optimization, based on manipulating 
parts of the Fortran parse tree


These patches are about inlining, there is no manipulation of the parse 
tree.  So I would rather use a separate option (-finline-intrinsics?).


I concur, -ffrontend-optimize should remain specific to manipulating
the parse tree.

Having a -finline-intrinsics option, which would be set at any
optimization level except -Os, may be the right way forward.

Or something else :-)

Best regards

Thomas


Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-08 Thread David Brown




On 08/08/2024 11:13, Jens Gustedt wrote:

Hi

On 8 August 2024 10:26:14 CEST, Alejandro Colomar wrote:

Hello Jens,

On Thu, Aug 08, 2024 at 07:35:12AM GMT, Jₑₙₛ Gustedt wrote:

Hello Alejandro,

On Thu, 8 Aug 2024 00:44:02 +0200, Alejandro Colomar wrote:


+Its syntax is similar to @code{sizeof}.


For my curiosity, do you also make the same distinction that with
expressions you may omit the parenthesis?


I thought of it.  TBH, I haven't tested that thoroughly.

In principle, I have implemented it in the same way as sizeof, yes.

Personally, I would have never allowed sizeof without parentheses, but I
understand there are people who think the parentheses hurt readability,
so I kept it in the same way.

I'm not sure why the parentheses are necessary with type names in
sizeof,


Probably because of operator precedence: there would be no rule that tells us 
where sizeof ends, and we'd switch back from parsing a type to parsing an 
expression



I personally have always found it looks odd that the sizeof operator 
does not always need parentheses - I suppose that is because it is a 
word, rather than punctuation.  To me, it looks more like a function or 
function-like macro.  And I'd view lengthof in the same light.  However, 
that's just personal opinion, not a rational argument!





but to maintain expectations, I think it would be better to do
the same here.


Just to compare, the recent additions in C23 typeof etc. only have the 
parenthesized versions. So there would be precedent. And it really eases 
transition



_Alignof (now "alignof") from C11 always needs parentheses too - but it 
always applies to a type, not an expression.  (I think it should also be 
possible to use it with expressions for consistency, but that's another 
matter.)


As I see it, there is a good reason to say that a "lengthof" feature 
should always have parentheses.  With "typeof" (either as the gcc 
extension or the C23 feature), you can come a long way to the 
functionality of the proposed "lengthof" (or "__lengthof__") using a 
macro.  This will mean that if someone writes code using the new feature 
in gcc, and another person wants to compile the code with older gcc or a 
different compiler, they can use a macro (even "#define lengthof(arr) 
(sizeof(arr)/sizeof((arr)[0]))", which is less safe but works everywhere)
instead.  But that is only true if the person writing the original 
"lengthof" code has included the parentheses.






I wouldn't be sure that we should continue that distinction from
`sizeof`.


But then, what do we do?  Allow lengthof with type names without parens?
Or require parens?  I'm not comfortable with that choice.


Also that prefix variant would be difficult to wrap in a
`lengthof` macro (without underscores) as we would probably like to
have it in the end.


Do you mean that I should add _Lengthof?  We're adding __lengthof__ to
be a GNU extension with relative freedom from ISO.  If I sent a patch
adding _Lengthof, we'd have to send a proposal to ISO at the same time,
and we'd be waiting for ISO to discuss it before I can merge it.  And we
couldn't bring prior art to ISO.

With this approach instead, the plan is:

-  Merge __lengthof__ in GCC before ISO hears of it (well, there are
already several WG14 members in this discussion, so you have actually
heard of it, but we're free to do more or less what we want).

-  Propose _Lengthof to ISO C, with prior art in GCC as __lengthof__,
proposing the same semantics.  Also propose a lengthof macro defined
in 


I don't really see why we should take a detour via _Lengthof; I would hope we 
could directly propose lengthof for standardization



It is traditional for C.  It has taken until C23 to get alignof, bool, 
etc., as full keywords.  I would expect that we would have _Lengthof for 
a transitional period while "lengthof" is in "" and other 
uses of it are deprecated.  Changes in C happen slowly if backwards 
compatibility is threatened (too slowly for some people, too fast for 
others).



-  When ISO C accepts _Lengthof and lengthof, map _Lengthof in GCC to
the same internals as __lengthof__, so they are the same thing.

Still, I'm interested in having some feedback from WG14, to prevent
implementing something that will have modifications when merged to
ISO C, so please CC anyone interested from WG14, if you know of any.


I think that more important would be to have clang on board with this.

In any case, thanks for doing this!

Jens




Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-08 Thread Joseph Myers
On Thu, 8 Aug 2024, Alejandro Colomar wrote:

> How about having __lengthof__ behave like sizeof, but deprecate it in
> sizeof too?

Deprecation would be a matter for WG14.

> We could start by adding a -Wall warning for sizeof without parens, and
> promote it to an error a few versions later.

This is very far outside the scope of -Wall.  There is nothing confusing 
for the programmer about sizeof without parentheses and no likelihood that 
the programmer meant something other than the semantics of the code.

GCC should not be opinionated about promoting personal ideas of what is or 
is not good style or what might or might not be a future language feature; 
it should support a wide range of different programming styles.  The 
threshold for warning about something in -Wall (or -Wextra) should be much 
higher than "the language design would be simpler without this feature".

> P.S.:  I'm doing a whole-tree update to use __lengthof__ instead of
> open-coded sizeof divisons or macros based on it, and I've found several
> bugs already.  I'll use this change to test the new operator in the
> entire code base, which should result in no regressions at all.  That
> would be an interesting test suite.  :)

I think the code base (code on the host is generally in C++) should be 
readable to people who know C++ (C++11 is the documented requirement for 
building GCC - we're very conservative about adopting new language 
versions, to facilitate bootstrapping on a wide range of systems) as it 
is, not a playground for trying out new language features.  We have enough 
GCC-specific versions of standard features as it is (e.g. the GCC-specific 
vectors designed to interoperate with GCC's garbage collection), using a 
new feature that doesn't add expressivity and isn't in any standard C++ 
version doesn't seem like a good idea to me.

Actual bugs should of course be fixed.  But certainly standard features 
are preferable to something specific to GCC, and existing macros in GCC 
such as ARRAY_SIZE that people are at least familiar with are preferable 
to introducing a new language feature.

*If* the feature were adopted into C++26, we could then consider if 
existing macros should be renamed to look more like the future language 
feature.

Target code is at least always compiled with the same version of GCC, but 
it still shouldn't be a playground for new language features; that doesn't 
help readability, backporting patches to versions without the features, 
etc.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] aarch64/testsuite: Fix if-compare_2.c for removing vcond{, u, eq} patterns [PR116041]

2024-08-08 Thread Richard Sandiford
Andrew Pinski  writes:
> For bar1 and bar2, we are currently expecting to use the bsl instruction but
> with slightly different register allocation inside the loop (which happens 
> after
> the removal of the vcond{,u,eq} patterns), we get the bit instruction.  The 
> pattern that
> outputs bsl instruction will output bit and bif too depending register 
> allocation.
>
> So let's check for bsl, bit or bif instructions instead of just bsl 
> instruction.
>
> Tested on aarch64 both with an unmodified compiler and one which has the 
> patch to disable
> these optabs.
>
> gcc/testsuite/ChangeLog:
>
>   PR testsuite/116041
>   * gcc.target/aarch64/if-compare_2.c: Support bit and bif for
>   both bar1 and bar2; add comment on why too.

OK, thanks.

Richard

>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/testsuite/gcc.target/aarch64/if-compare_2.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/if-compare_2.c 
> b/gcc/testsuite/gcc.target/aarch64/if-compare_2.c
> index 14988abac45..f5a2b1956e3 100644
> --- a/gcc/testsuite/gcc.target/aarch64/if-compare_2.c
> +++ b/gcc/testsuite/gcc.target/aarch64/if-compare_2.c
> @@ -8,6 +8,7 @@
>  
>  typedef int v4si __attribute__ ((vector_size (16)));
>  
> +
>  /*
>  **foo1:
>  **   cmgtv0.4s, v1.4s, v0.4s
> @@ -29,11 +30,13 @@ v4si foo2 (v4si a, v4si b, v4si c, v4si d) {
>  }
>  
>  
> +/* The bsl could be bit or bif depending on register
> +   allocator inside the loop. */
>  /**
>  **bar1:
>  **...
>  **   cmgev[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> -**   bsl v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b
> +**   (bsl|bit|bif)   v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b
>  **   and v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b
>  **...
>  */
> @@ -44,11 +47,13 @@ void bar1 (int * restrict a, int * restrict b, int * 
> restrict c,
>  res[i] = ((a[i] < b[i]) & c[i]) | ((a[i] >= b[i]) & d[i]);
>  }
>  
> +/* The bsl could be bit or bif depending on register
> +   allocator inside the loop. */
>  /**
>  **bar2:
>  **...
>  **   cmgev[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> -**   bsl v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b
> +**   (bsl|bit|bif)   v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b
>  **...
>  */
>  void bar2 (int * restrict a, int * restrict b, int * restrict c,


[PATCH] c++: ICE with NSDMIs and fn arguments [PR116015]

2024-08-08 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
The problem in this PR is that we ended up with

  {.rows=(&)->n,
   .outer_stride=(&)->rows}

that is, two PLACEHOLDER_EXPRs for different types on the same level
in one { }.  That should not happen; we may, for instance, neglect to
replace a PLACEHOLDER_EXPR due to CONSTRUCTOR_PLACEHOLDER_BOUNDARY on
the constructor.

The same problem happened in PR100252, which I fixed by introducing
replace_placeholders_for_class_temp_r.  That didn't work here, though,
because r_p_for_c_t_r only works for non-eliding TARGET_EXPRs: replacing
a PLACEHOLDER_EXPR with a temporary that is going to be elided will
result in a crash in gimplify_var_or_parm_decl when it encounters such
a loose decl.

But leaving the PLACEHOLDER_EXPRs in is also bad because then we end
up with this PR.

TARGET_EXPRs for function arguments are elided in gimplify_arg.  The
argument will get a real temporary only in get_formal_tmp_var.  One
idea was to use the temporary that is going to be elided anyway, and
then replace_decl it with the real object once we get it.  But that
didn't work out: one problem is that we elide the TARGET_EXPR for an
argument before we create the real temporary for the argument, and
when we get it, the context that this was a TARGET_EXPR for an argument
has been lost.  We're also in the middle end territory now, even though
this is a C++-specific problem.

I figured that since the to-be-elided temporary is going to stay around
until gimplification, the front end is free to use it.  Once we're done
with things like store_init_value, which replaces PLACEHOLDER_EXPRs with
the decl it is initializing, we can turn those to-be-elided temporaries
into PLACEHOLDER_EXPRs again, so that cp_gimplify_init_expr can replace
them with the real object once available.  The context is not lost so we
do not need an extra flag for these makeshift temporaries.

PR c++/116015

gcc/cp/ChangeLog:

* cp-gimplify.cc (replace_argument_temps_with_placeholders): New.
(cp_genericize_r) : Call it.
* typeck2.cc (replace_placeholders_for_class_temp_r): Do replace
placeholders in TARGET_EXPRs for function arguments.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/nsdmi-aggr23.C: New test.
---
 gcc/cp/cp-gimplify.cc | 24 +
 gcc/cp/typeck2.cc | 23 
 gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr23.C | 26 +++
 3 files changed, 73 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr23.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index 003e68f1ea7..7f203cd6804 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1597,6 +1597,29 @@ predeclare_vla (tree expr)
 }
 }
 
+/* Replace the temporaries used for function arguments with PLACEHOLDER_EXPRs
+   so that they can be replaced again.  See the comment in
+   replace_placeholders_for_class_temp_r for more context.  CALL is either
+   the CALL_EXPR or AGGR_INIT_EXPR whose arguments we're going to adjust.  */
+
+static void
+replace_argument_temps_with_placeholders (tree call)
+{
+  for (int i = 0; i < call_expr_nargs (call); ++i)
+{
+  tree arg = get_nth_callarg (call, i);
+  if (TREE_CODE (arg) == TARGET_EXPR && TARGET_EXPR_ELIDING_P (arg))
+   {
+ tree slot = TARGET_EXPR_SLOT (arg);
+ tree placeholder = build0 (PLACEHOLDER_EXPR, TREE_TYPE (slot));
+ tree &init = TARGET_EXPR_INITIAL (arg);
+ if (replace_decl (&init, slot, placeholder)
+ && TREE_CODE (init) == CONSTRUCTOR)
+   CONSTRUCTOR_PLACEHOLDER_BOUNDARY (init) = true;
+   }
+}
+}
+
 /* Perform any pre-gimplification lowering of C++ front end trees to
GENERIC.  */
 
@@ -2125,6 +2148,7 @@ cp_genericize_r (tree *stmt_p, int *walk_subtrees, void *data)
 version is inlinable, a direct call to this version can be made
 otherwise the call should go through the dispatcher.  */
   {
+   replace_argument_temps_with_placeholders (stmt);
tree fn = cp_get_callee_fndecl_nofold (stmt);
if (fn && DECL_FUNCTION_VERSIONED (fn)
&& (current_function_decl == NULL
diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
index 30a6fbe95c9..9b6109ac3ff 100644
--- a/gcc/cp/typeck2.cc
+++ b/gcc/cp/typeck2.cc
@@ -1425,6 +1425,29 @@ replace_placeholders_for_class_temp_r (tree *tp, int *, void *)
 {
   tree t = *tp;
 
+  /* If a TARGET_EXPR is an initializer of a function argument, it is
+ going to be elided in gimplify_arg.  So, we should not be using
+ its slot to replace the PLACEHOLDER_EXPR.  But we will only have
+ the real object once we've gimplified the argument, which is too
+ late: we should replace the PLACEHOLDER_EXPR now to avoid winding
+ up with two different PLACEHOLDER_EXPRs in a single {} (c++/116015).
+
+ We can get away with using the temporary...temporarily, bec

Re: [PATCH 3/3] libcpp: add AVX2 helper

2024-08-08 Thread Alexander Monakov


> On Wed, 7 Aug 2024, Richard Biener wrote:
> 
> > OK with that change.
> > 
> > Did you think about a AVX512 version (possibly with 32 byte vectors)?
> > In case there's a more efficient variant of pshufb/pmovmskb available
> > there - possibly
> > the load on the branch unit could be lessened with using masking.
> 
> Thanks for the idea; unfortunately I don't see any possible improvement.
> It would trade pmovmskb-(test+jcc,fused) for ktest-jcc, so unless the
> latencies are shorter it seems to be a wash. The only way to use fewer
> branches seems to be employing longer vectors.

A better answer is that of course we can reduce branching without AVX2
by using two SSE vectors in place of one 32-byte AVX vector. In fact,
with a bit of TLC this idea works out really well, and the result
closely matches the AVX2 code in performance. To put that in numbers,
on the testcase from my microbenchmark archive, unmodified GCC shows

 Performance counter stats for 'gcc/cc1plus.orig -fsyntax-only -quiet 
t-rawstr.cc' (9 runs):

 39.13 msec task-clock:u #1.101 CPUs 
utilized   ( +-  0.30% )
 0  context-switches:u   #0.000 /sec
 0  cpu-migrations:u #0.000 /sec
 2,206  page-faults:u#   56.374 K/sec   
( +-  2.58% )
61,502,159  cycles:u #1.572 GHz 
( +-  0.08% )
   749,841  stalled-cycles-frontend:u#1.22% frontend 
cycles idle( +-  0.20% )
 6,831,862  stalled-cycles-backend:u #   11.11% backend 
cycles idle ( +-  0.63% )
   141,972,604  instructions:u   #2.31  insn per 
cycle
  #0.05  stalled cycles per 
insn ( +-  0.00% )
46,054,279  branches:u   #1.177 G/sec   
( +-  0.00% )
   325,134  branch-misses:u  #0.71% of all 
branches ( +-  0.11% )

  0.035550 +- 0.000373 seconds time elapsed  ( +-  1.05% )


then with the AVX2 helper from my patchset we have

 Performance counter stats for 'gcc/cc1plus.avx -fsyntax-only -quiet 
t-rawstr.cc' (9 runs):

 36.39 msec task-clock:u #1.112 CPUs 
utilized   ( +-  0.27% )
 0  context-switches:u   #0.000 /sec
 0  cpu-migrations:u #0.000 /sec
 2,208  page-faults:u#   60.677 K/sec   
( +-  2.55% )
56,527,349  cycles:u #1.553 GHz 
( +-  0.09% )
   728,417  stalled-cycles-frontend:u#1.29% frontend 
cycles idle( +-  0.38% )
 6,221,761  stalled-cycles-backend:u #   11.01% backend 
cycles idle ( +-  1.58% )
   141,296,340  instructions:u   #2.50  insn per 
cycle
  #0.04  stalled cycles per 
insn ( +-  0.00% )
45,758,162  branches:u   #1.257 G/sec   
( +-  0.00% )
   295,042  branch-misses:u  #0.64% of all 
branches ( +-  0.12% )

  0.032736 +- 0.000460 seconds time elapsed  ( +-  1.41% )


and with the revised patch that uses SSSE3 more cleverly

 Performance counter stats for 'gcc/cc1plus -fsyntax-only -quiet t-rawstr.cc' 
(9 runs):

 36.89 msec task-clock:u #1.110 CPUs 
utilized   ( +-  0.29% )
 0  context-switches:u   #0.000 /sec
 0  cpu-migrations:u #0.000 /sec
 2,374  page-faults:u#   64.349 K/sec   
( +-  3.77% )
56,556,237  cycles:u #1.533 GHz 
( +-  0.11% )
   733,192  stalled-cycles-frontend:u#1.30% frontend 
cycles idle( +-  1.08% )
 6,271,987  stalled-cycles-backend:u #   11.09% backend 
cycles idle ( +-  1.89% )
   142,743,102  instructions:u   #2.52  insn per 
cycle
  #0.04  stalled cycles per 
insn ( +-  0.00% )
45,646,829  branches:u   #1.237 G/sec   
( +-  0.00% )
   295,155  branch-misses:u  #0.65% of all 
branches ( +-  0.11% )

  0.033242 +- 0.000418 seconds time elapsed  ( +-  1.26% )


Is the revised patch below still ok? I've rolled the configury changes into it,
and dropped the (now unnecessary) AVX2 helper (an

Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-08 Thread Alejandro Colomar
Hi Joseph,

On Thu, Aug 08, 2024 at 05:31:05PM GMT, Joseph Myers wrote:
> On Thu, 8 Aug 2024, Alejandro Colomar wrote:
> 
> > How about having __lengthof__ behave like sizeof, but deprecate it in
> > sizeof too?
> 
> Deprecation would be a matter for WG14.

Yep; I wouldn't add it to -Wall unless WG14 decides to deprecate it
first.  But if it does, that could be the path.  For lengthof, I think
keeping it like sizeof would be the simplest, as an implementer.  And
users will probably not care too much.  And if WG14 decides to deprecate
it from sizeof, they can also deprecate it from lengthof at the same
time.

> I think the code base (code on the host is generally in C++) should be 
> readable to people who know C++ (C++11 is the documented requirement for 
> building GCC - we're very conservative about adopting new language 
> versions, to facilitate bootstrapping on a wide range of systems) as it 
> is, not a playground for trying out new language features.  We have enough 
> GCC-specific versions of standard features as it is (e.g. the GCC-specific 
> vectors designed to interoperate with GCC's garbage collection), using a 
> new feature that doesn't add expressivity and isn't in any standard C++ 
> version doesn't seem like a good idea to me.
> 
> Actual bugs should of course be fixed.  But certainly standard features 
> are preferable to something specific to GCC, and existing macros in GCC 
> such as ARRAY_SIZE that people are at least familiar with are preferable 
> to introducing a new language feature.

ARRAY_SIZE() is very rarely used.  From what I've seen, most of the
existing code uses the raw sizeof division, and there's a non-negligible
amount of typos in those.

I suggest that someone at least converts most or all calls to
ARRAY_SIZE(), so that it can later easily be changed to lengthof().

I can provide my patch as a draft, so that it's just adding some include
and s/__lengthof__/ARRAY_SIZE/, plus some whitespace and parens fixes.

> 
> *If* the feature were adopted into C++26, we could then consider if 
> existing macros should be renamed to look more like the future language 
> feature.
> 
> Target code is at least always compiled with the same version of GCC, but 
> it still shouldn't be a playground for new language features; that doesn't 
> help readability, backporting patches to versions without the features, 
> etc.

It will serve me as a huge test suite anyway; so it's worth it even if
just for myself.  And it will uncover bugs.  :)

Thanks!

Have a lovely day!
Alex

-- 





Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-08 Thread Martin Uecker
On Thursday, 08.08.2024 at 20:04 +0200, Alejandro Colomar wrote:

> 
...
> > 
> > *If* the feature were adopted into C++26, we could then consider if 
> > existing macros should be renamed to look more like the future language 
> > feature.
> > 
> > Target code is at least always compiled with the same version of GCC, but 
> > it still shouldn't be a playground for new language features; that doesn't 
> > help readability, backporting patches to versions without the features, 
> > etc.
> 
> It will serve me as a huge test suite anyway; so it's worth it even if
> just for myself.  And it will uncover bugs.  :)

Did you implement a C++ version? Or are you talking about the C parts
of the code.  It is a bit sad that we do not get the testing of the
C FE anymore which a self-hosting would have.

Martin

