[PATCH] expand: Fix handling of asm goto outputs vs. PHI argument adjustments [PR113921]

2024-02-15 Thread Jakub Jelinek
Hi!

The Linux kernel and the following testcase distilled from it is
miscompiled, because tree-outof-ssa.cc (eliminate_phi) emits some
fixups on some of the edges (but doesn't commit edge insertions).
Later expand_asm_stmt emits further instructions on the same edge.
Now the problem is that expand_asm_stmt uses insert_insn_on_edge
to add its own fixups, but that function appends to the existing
sequence on the edge if any.  And the bug triggers when the
fixup sequence emitted by eliminate_phi uses a pseudo which the
fixup sequence emitted by expand_asm_stmt later on sets.
So, we end up with
  (set (reg A) (asm_operands ...))
and on one of the edges the queued sequence
  (set (reg C) (reg B)) // added by eliminate_phi
  (set (reg B) (reg A)) // added by expand_asm_stmt
That is wrong: what we emit in expand_asm_stmt needs to be as close
to the asm_operands as possible (the outputs aren't known until
expand_asm_stmt is called, and the PHI fixup code assumes it is reg B
which holds the right value), and the PHI adjustments need to be done
after it.

So, the following patch introduces a prepend_insn_to_edge function and
uses it from expand_asm_stmt, so that we queue
  (set (reg B) (reg A)) // added by expand_asm_stmt
  (set (reg C) (reg B)) // added by eliminate_phi
instead and so the value from the asm_operands output propagates correctly
to the PHI result.
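
For reference, the kind of source construct involved is an asm goto with
an output operand whose value is used both on the fallthrough path and
after one of the labels.  A rough illustrative sketch only (not the
reduced testcase from the PR):

  int
  foo (int x)
  {
    int out;
    asm goto ("" : "=r" (out) : "r" (x) : : lab);
    /* The asm output has to reach both the fallthrough use and the use
       after the label, so fixup copies get queued on the outgoing edges.  */
    return out;
  lab:
    return out + 1;
  }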

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I think we need to backport it to all release branches (fortunately, older
unsupported compilers aren't affected because GCC 11 was the first one to
support asm goto with outputs).  In cfgexpand.cc it won't apply cleanly
due to the PR113415 fix, but manually applying it there will work.

2024-02-15  Jakub Jelinek  

PR middle-end/113921
* cfgrtl.h (prepend_insn_to_edge): New declaration.
* cfgrtl.cc (insert_insn_on_edge): Clarify behavior in function
comment.
(prepend_insn_to_edge): New function.
* cfgexpand.cc (expand_asm_stmt): Use prepend_insn_to_edge instead of
insert_insn_on_edge.

* gcc.target/i386/pr113921.c: New test.

--- gcc/cfgrtl.h.jj 2024-01-03 11:51:42.576577897 +0100
+++ gcc/cfgrtl.h    2024-02-14 21:19:13.029797669 +0100
@@ -38,6 +38,7 @@ extern edge try_redirect_by_replacing_ju
 extern void emit_barrier_after_bb (basic_block bb);
 extern basic_block force_nonfallthru_and_redirect (edge, basic_block, rtx);
 extern void insert_insn_on_edge (rtx, edge);
+extern void prepend_insn_to_edge (rtx, edge);
 extern void commit_one_edge_insertion (edge e);
 extern void commit_edge_insertions (void);
 extern void print_rtl_with_bb (FILE *, const rtx_insn *, dump_flags_t);
--- gcc/cfgrtl.cc.jj    2024-01-03 11:51:28.900767705 +0100
+++ gcc/cfgrtl.cc   2024-02-14 21:19:24.036651779 +0100
@@ -25,7 +25,7 @@ along with GCC; see the file COPYING3.
  - CFG-aware instruction chain manipulation
 delete_insn, delete_insn_chain
  - Edge splitting and committing to edges
-insert_insn_on_edge, commit_edge_insertions
+insert_insn_on_edge, prepend_insn_to_edge, commit_edge_insertions
  - CFG updating after insn simplification
 purge_dead_edges, purge_all_dead_edges
  - CFG fixing after coarse manipulation
@@ -1966,7 +1966,8 @@ rtl_split_edge (edge edge_in)
 
 /* Queue instructions for insertion on an edge between two basic blocks.
The new instructions and basic blocks (if any) will not appear in the
-   CFG until commit_edge_insertions is called.  */
+   CFG until commit_edge_insertions is called.  If there are already
+   queued instructions on the edge, PATTERN is appended to them.  */
 
 void
 insert_insn_on_edge (rtx pattern, edge e)
@@ -1984,6 +1985,25 @@ insert_insn_on_edge (rtx pattern, edge e
 
   e->insns.r = get_insns ();
   end_sequence ();
+}
+
+/* Like insert_insn_on_edge, but if there are already queued instructions
+   on the edge, PATTERN is prepended to them.  */
+
+void
+prepend_insn_to_edge (rtx pattern, edge e)
+{
+  /* We cannot insert instructions on an abnormal critical edge.
+ It will be easier to find the culprit if we die now.  */
+  gcc_assert (!((e->flags & EDGE_ABNORMAL) && EDGE_CRITICAL_P (e)));
+
+  start_sequence ();
+
+  emit_insn (pattern);
+  emit_insn (e->insns.r);
+
+  e->insns.r = get_insns ();
+  end_sequence ();
 }
 
 /* Update the CFG for the instructions queued on edge E.  */
--- gcc/cfgexpand.cc.jj 2024-02-10 11:25:09.995474027 +0100
+++ gcc/cfgexpand.cc    2024-02-14 21:27:23.219300727 +0100
@@ -3687,7 +3687,7 @@ expand_asm_stmt (gasm *stmt)
  copy = get_insns ();
  end_sequence ();
}
- insert_insn_on_edge (copy, e);
+ prepend_insn_to_edge (copy, e);
}
}
 }
--- gcc/testsuite/gcc.target/i386/pr113921.c.jj 2024-02-14 21:21:15.194178515 +0100
+++ gcc/testsuite/gcc.target/i386/pr113921.c    2024-02-14 21:20:52.745476040 +0100
@@ -0,0 +1,20 @@
+/* PR midd

Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access

2024-02-15 Thread Richard Biener
On Wed, 14 Feb 2024, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Wed, 14 Feb 2024, Richard Sandiford wrote:
> >
> >> Richard Biener  writes:
> >> > The following avoids accessing out-of-bound vector elements when
> >> > native encoding a boolean vector with sub-BITS_PER_UNIT precision
> >> > elements.  The error was basing the number of elements to extract
> >> > on the rounded up total byte size involved and the patch bases
> >> > everything on the total number of elements to extract instead.
> >> 
> >> It's too long ago to be certain, but I think this was a deliberate choice.
> >> The point of the new vector constant encoding is that it can give an
> >> allegedly sensible value for any given index, even out-of-range ones.
> >> 
> >> Since the padding bits are undefined, we should in principle have a free
> >> choice of what to use.  And for VLA, it's often better to continue the
> >> existing pattern rather than force to zero.
> >> 
> >> I don't strongly object to changing it.  I think we should be careful
> >> about relying on zeroing for correctness though.  The bits are in principle
> >> undefined and we can't rely on reading zeros from equivalent memory or
> >> register values.
> >
> > The main motivation for a change here is to allow catching out-of-bound
> > indices again for VECTOR_CST_ELT, at least for constant nunits because
> > it might be a programming error like fat-fingering the index.  I do
> > think it's a regression that we no longer catch those.
> >
> > It's probably also a bit non-obvious how an encoding continues and
> > there might be DImode masks that can be represented by a 
> > zero-extended QImode immediate but "continued" it would require
> > a larger immediate.
> >
> > The change also effectively only changes something for 1 byte
> > encodings since nunits is a power of two and so is the element
> > size in bits.
> 
> Yeah, but even there, there's an argument that all-1s (0xff) is a more
> obvious value for an all-1s mask.
> 
> > A patch restoring the VECTOR_CST_ELT checking might be the
> > following
> >
> > diff --git a/gcc/tree.cc b/gcc/tree.cc
> > index 046a558d1b0..4c9b05167fd 100644
> > --- a/gcc/tree.cc
> > +++ b/gcc/tree.cc
> > @@ -10325,6 +10325,9 @@ vector_cst_elt (const_tree t, unsigned int i)
> >if (i < encoded_nelts)
> >  return VECTOR_CST_ENCODED_ELT (t, i);
> >  
> > +  /* Catch out-of-bound element accesses.  */
> > +  gcc_checking_assert (maybe_gt (VECTOR_CST_NELTS (t), i));
> > +
> >/* If there are no steps, the final encoded value is the right one.  */
> >if (!VECTOR_CST_STEPPED_P (t))
> >  {
> >
> > but it triggers quite a bit via const_binop for, for example
> >
> > #2  0x011c1506 in const_binop (code=PLUS_EXPR, 
> > arg1=, arg2=)
> > (gdb) p debug_generic_expr (arg1)
> > { 12, 13, 14, 15 }
> > $5 = void
> > (gdb) p debug_generic_expr (arg2)
> > { -2, -2, -2, -3 }
> > (gdb) p count
> > $4 = 6
> > (gdb) l
> > 1711  if (!elts.new_binary_operation (type, arg1, arg2, 
> > step_ok_p))
> > 1712return NULL_TREE;
> > 1713  unsigned int count = elts.encoded_nelts ();
> > 1714  for (unsigned int i = 0; i < count; ++i)
> > 1715{
> > 1716  tree elem1 = VECTOR_CST_ELT (arg1, i);
> > 1717  tree elem2 = VECTOR_CST_ELT (arg2, i);
> > 1718
> > 1719  tree elt = const_binop (code, elem1, elem2);
> >
> > this seems like an error to me - why would we, for fixed-size
> > vectors and for PLUS ever create a vector encoding with 6 elements?!
> > That seems at least inefficient to me?
> 
> It's a case of picking your poison.  On the other side, operating
> individually on each element of a V64QI is inefficient when the
> representation says up-front that all elements are equal.

True, though I wonder why for VLS vectors new_binary_operation
doesn't cap the number of encoded elts on the fixed vector size,
like doing

  encoded_elts = ordered_min (TYPE_VECTOR_SUBPARTS (..), encoded_elts);

or if there's no good way to write it applying for both VLA and VLS
do it only when TYPE_VECTOR_SUBPARTS is constant.

> Fundamentally, operations on VLA vectors are treated as functions
> that map patterns to patterns.  The number of elements that are
> consumed isn't really relevant to the function itself.  The VLA
> folders therefore rely on being able to read an element from a pattern
> even if the index is outside TYPE_VECTOR_SUBPARTS.
> 
> There were two reasons for using VLA paths for VLS vectors.
> One I mentioned above: it saves time when all elements are equal,
> or have a similarly compact representation.  The other is that it
> makes VLA less special and ensures that the code gets more testing.
> 
> Maybe one compromise between that and the assert would be:
> 
> (1) enforce the assert only for VLS and

that's what I did by using maybe_gt?

> (2) add new checks to ensure that a VLA-friendly operation will never
> read out-of-bounds for VLS vectors
> 
> But I thi

RE: [PATCH]AArch64: update vget_set_lane_1.c test output

2024-02-15 Thread Tamar Christina
> -Original Message-
> From: Richard Sandiford 
> Sent: Thursday, February 1, 2024 4:42 PM
> To: Tamar Christina 
> Cc: Andrew Pinski ; gcc-patches@gcc.gnu.org; nd
> ; Richard Earnshaw ; Marcus
> Shawcroft ; Kyrylo Tkachov
> 
> Subject: Re: [PATCH]AArch64: update vget_set_lane_1.c test output
> 
> Tamar Christina  writes:
> >> -Original Message-
> >> From: Richard Sandiford 
> >> Sent: Thursday, February 1, 2024 2:24 PM
> >> To: Andrew Pinski 
> >> Cc: Tamar Christina ; gcc-patches@gcc.gnu.org; nd
> >> ; Richard Earnshaw ; Marcus
> >> Shawcroft ; Kyrylo Tkachov
> >> 
> >> Subject: Re: [PATCH]AArch64: update vget_set_lane_1.c test output
> >>
> >> Andrew Pinski  writes:
> >> > On Thu, Feb 1, 2024 at 1:26 AM Tamar Christina 
> >> wrote:
> >> >>
> >> >> Hi All,
> >> >>
> >> >> In the vget_set_lane_1.c test the following entries now generate a zip1
> instead
> >> of an INS
> >> >>
> >> >> BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
> >> >> BUILD_TEST (int32x2_t,   int32x2_t,   , , s32, 1, 0)
> >> >> BUILD_TEST (uint32x2_t,  uint32x2_t,  , , u32, 1, 0)
> >> >>
> >> >> This is because the non-Q variant for indices 0 and 1 are just 
> >> >> shuffling values.
> >> >> There is no perf difference between INS SIMD to SIMD and ZIP, as such 
> >> >> just
> >> update the
> >> >> test file.
> >> > Hmm, is this true on all cores? I suspect there is a core out there
> >> > where INS is implemented with a much lower latency than ZIP.
> >> > If we look at config/aarch64/thunderx.md, we can see INS is 2 cycles
> >> > while ZIP is 6 cycles (3/7 for q versions).
> >> > Now I don't have any invested interest in that core any more but I
> >> > just wanted to point out that is not exactly true for all cores.
> >>
> >> Thanks for the pointer.  In that case, perhaps we should prefer
> >> aarch64_evpc_ins over aarch64_evpc_zip in
> aarch64_expand_vec_perm_const_1?
> >> That's enough to fix this failure, but it'll probably require other
> >> tests to be adjusted...
> >
> > I think given that ThunderX is a 10-year-old micro-architecture with
> > several cases where often-used instructions have very high latencies,
> > generic codegen should not be blocked from progressing because of it.
> >
> > We use zips in many things and if thunderx codegen is really of that
> > much importance then I think the old codegen should be gated behind
> > -mcpu=thunderx rather than preventing generic changes.
> 
> But you said there was no perf difference between INS and ZIP, so it
> sounds like for all known cases, using INS rather than ZIP is either
> neutral or better.
> 
> There's also the possible secondary benefit that the INS patterns use
> standard RTL operations whereas the ZIP patterns use unspecs.
> 
> Keeping ZIP seems OK if there's a specific reason to prefer it over INS
> for more modern cores though.

Ok, that's a fair point.  Doing some due diligence, the Neoverse-E1 and
Cortex-A65 SWoGs seem to imply that ZIP has better throughput than INS
there.  However the entries are inconsistent and I can't measure the
difference, so I believe this to be a documentation bug.

That said, switching the operands seems to show one issue in that preferring
INS degenerates code in cases where we are inserting the top bits of the first
parameter into the bottom of the second parameter and returning.

ZIP being a three-operand instruction allows us to put the result into the
final destination register with one operation, whereas INS requires an fmov:

foo_uzp1_s32:
        ins     v0.s[1], v1.s[0]
        fmov    d0, d0
        ret
foo_uzp2_s32:
        ins     v1.s[0], v0.s[1]
        fmov    d0, d1
        ret

I've posted uzp but zip has the same issue.

So I guess it's not better to flip the order but perhaps I should add a case to
the zip/unzip RTL patterns for when op0 == op1?

Thanks,
Tamar
> 
> Thanks,
> Richard



Re: Question on -fwrapv and -fwrapv-pointer

2024-02-15 Thread Fangrui Song
On Fri, Sep 15, 2023 at 11:43 AM Kees Cook via Gcc-patches
 wrote:
>
> On Fri, Sep 15, 2023 at 05:47:08PM +, Qing Zhao wrote:
> >
> >
> > > On Sep 15, 2023, at 1:26 PM, Richard Biener  
> > > wrote:
> > >
> > >
> > >
> > >> Am 15.09.2023 um 17:37 schrieb Qing Zhao :
> > >>
> > >> 
> > >>
> >  On Sep 15, 2023, at 11:29 AM, Richard Biener 
> >   wrote:
> > 
> > 
> > 
> > > Am 15.09.2023 um 17:25 schrieb Qing Zhao :
> > 
> >  
> > 
> > > On Sep 15, 2023, at 8:41 AM, Arsen Arsenović  wrote:
> > >
> > >
> > > Qing Zhao  writes:
> > >
> > >> Even though unsigned integer overflow is well defined, it might be
> > >> unintentional; shall we warn the user about this?
> > >
> > > This would be better addressed by providing operators or functions 
> > > that
> > > do overflow checking in the language, so that they can be explicitly
> > > used where overflow is unexpected.
> > 
> >  Yes, that will be very helpful to prevent unexpected overflow in the
> >  program in general.
> >  However, this will mainly benefit new code.
> > 
> >  For existing C code, especially large applications, we still need
> >  to identify all the places where the overflow is unexpected, and fix them.
> > 
> >  One good example is linux kernel.
> > 
> > > One could easily imagine a scenario
> > > where overflow is not expected in some region of code but is in the
> > > larger application.
> > 
> >  Yes, that's exactly the same situation the Linux kernel faces now: the
> >  unexpected overflow and expected wrap-around are mixed together inside
> >  one module.
> >  It's hard to detect the unexpected overflow in such a situation based
> >  on the current GCC.
> > >>>
> > >>> But that's hardly GCC's fault, nor can GCC fix that in any way.  Only
> > >>> the programmer can distinguish both cases.
> > >>
> > >> Right, the compiler cannot fix this.
> > >> But it can provide some tools to help the user detect this more
> > >> conveniently.
> > >>
> > >> Right now, GCC provides two sets of options for different types:
> > >>
> > >> A. Turn the overflow into expected wrap-around (remove UB);
> > >> B. Detect overflow;
> > >>
> > >>              A                    B
> > >>              remove UB            -fsanitize=…
> > >> signed       -fwrapv              signed-integer-overflow
> > >> pointer      -fwrapv-pointer      pointer-overflow (broken in Clang)
> > >>
> > >> However, options in A and B exclude each other.  They cannot be mixed
> > >> together for a single file.
> > >>
> > >> What's requested from the kernel is:
> > >>
> > >> the compiler needs to provide functionality that can mix these two
> > >> together for a single file,
> > >>
> > >> i.e., apply A (convert UB to defined wrap-around behavior) only to part
> > >> of the program, and then add -fsanitize=*overflow to detect all other
> > >> unexpected overflows in the program.

Yes, I believe combining A and B should be allowed.
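
To make the request concrete, a rough sketch of the desired usage (the
__attribute__((wrapv)) spelling below is the *requested* feature from
PR102317, not something GCC implements today; the closest current
equivalent is the optimize ("wrapv") attribute mentioned further down):

  /* The rest of the file is built with -fsanitize=signed-integer-overflow.  */

  __attribute__((wrapv))        /* hypothetical per-function opt-out */
  int
  refcount_add (int r, int n)
  {
    return r + n;               /* wrap-around here is intentional */
  }

  int
  compute (int a, int b)
  {
    return a + b;               /* an overflow here should be diagnosed */
  }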

> > >> This is currently missing from GCC, I guess?
> > >
> > > How can GCC know which part of the program wants wrapping and which 
> > > sanitizing?
> >
> > GCC doesn’t know, but the user knows.
> >
> > Then just provide the user a way to mark part of the program to be wrapping 
> > around and excluded from sanitizing?
> >
> > Currently, GCC provides
> >
> > __attribute__((optimize ("wrapv")))
> >
> > to mark the specific function to be wrapped around.
> >
> > However, this attribute does not work for the Linux kernel for the
> > following reason:
> >
> > the optimize attribute should only be used for debugging purposes;
> > the kernel has banned its usage.
> >
> > So, a new attribute was requested from Linux kernel security:
> >
> >  request wrap-around behavior for specific function (PR102317)
> > __attribute__((wrapv))
> >
> > Is this request reasonable?
>
> After working through this discussion, I'd say it's likely more helpful
> to have a way to disable the sanitizers for a given function (or
> variable). i.e. The goal for the kernel would be that untrapped wrap-around
> would be the very rare exception. e.g. our refcount_t implementation:
> https://elixir.bootlin.com/linux/v6.5/source/include/linux/refcount.h#L200
>
> Then we can continue to build the kernel with -fno-strict-overflow (to
> avoid UB), but gain sanitizer coverage for all run-time wraps, except
> for the very few places where we depend on it. Getting there will also
> take some non-trivial refactoring on our end, but without the sanitizers
> we're unlikely to find them all.
>
> --
> Kees Cook

I see a Clang patch that proposes -fsanitize=signed-integer-wrap,
which appears to be the same as signed-integer-overflow, but performs
the check in the -fwrapv mode.
I made a reply to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102317#c13 that we
probably should just make -fsanitize=signed-integer-overflow work with
-fwrapv.

(This message is made so that interested folk

Re: [PATCH] lower-bitint: Ensure we don't get coalescing ICEs for (ab) SSA_NAMEs used in mul/div/mod [PR113567]

2024-02-15 Thread Richard Biener
On Thu, 15 Feb 2024, Jakub Jelinek wrote:

> Hi!
> 
> The build_bitint_stmt_ssa_conflicts hook has a special case for
> multiplication, division and modulo, where to ensure there is no overlap
> between lhs and rhs1/rhs2 arrays we make the lhs conflict with the
> operands.
> On the following testcase, we have
>   # a_1(ab) = PHI 
> lab:
>   a_3(ab) = a_1(ab) % 3;
> before lowering and this special case causes a_3(ab) and a_1(ab) to
> conflict, but the PHI requires them not to conflict, so we ICE because we
> can't find some partitioning that will work.
> 
> The following patch fixes this by special casing such statements before
> the partitioning, forcing the inputs of the multiplication/division which
> have large/huge _BitInt (ab) lhs into new non-(ab) SSA_NAMEs initialized
> right before the multiplication/division.  This allows the partitioning
> to work then, as it has the possibility to use a different partition for
> the */% operands.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2024-02-15  Jakub Jelinek  
> 
>   PR tree-optimization/113567
>   * gimple-lower-bitint.cc (gimple_lower_bitint): For large/huge
>   _BitInt multiplication, division or modulo with
>   SSA_NAME_OCCURS_IN_ABNORMAL_PHI lhs and at least one of rhs1 and rhs2
>   force the affected inputs into a new SSA_NAME.
> 
>   * gcc.dg/bitint-90.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2024-02-12 20:45:50.156275452 +0100
> +++ gcc/gimple-lower-bitint.cc2024-02-14 18:17:36.630664828 +0100
> @@ -5973,6 +5973,47 @@ gimple_lower_bitint (void)
> {
> default:
>   break;
> +   case MULT_EXPR:
> +   case TRUNC_DIV_EXPR:
> +   case TRUNC_MOD_EXPR:
> + if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (s))
> +   {
> + location_t loc = gimple_location (stmt);
> + gsi = gsi_for_stmt (stmt);
> + tree rhs1 = gimple_assign_rhs1 (stmt);
> + tree rhs2 = gimple_assign_rhs2 (stmt);
> + /* For multiplication and division with (ab)
> +lhs and one or both operands force the operands
> +into new SSA_NAMEs to avoid coalescing failures.  */
> + if (TREE_CODE (rhs1) == SSA_NAME
> + && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs1))
> +   {
> + first_large_huge = 0;
> + tree t = make_ssa_name (TREE_TYPE (rhs1));
> + g = gimple_build_assign (t, SSA_NAME, rhs1);
> + gsi_insert_before (&gsi, g, GSI_SAME_STMT);
> + gimple_set_location (g, loc);
> + gimple_assign_set_rhs1 (stmt, t);
> + if (rhs1 == rhs2)
> +   {
> + gimple_assign_set_rhs2 (stmt, t);
> + rhs2 = t;
> +   }
> + update_stmt (stmt);
> +   }
> + if (TREE_CODE (rhs2) == SSA_NAME
> + && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs2))
> +   {
> + first_large_huge = 0;
> + tree t = make_ssa_name (TREE_TYPE (rhs2));
> + g = gimple_build_assign (t, SSA_NAME, rhs2);
> + gsi_insert_before (&gsi, g, GSI_SAME_STMT);
> + gimple_set_location (g, loc);
> + gimple_assign_set_rhs2 (stmt, t);
> + update_stmt (stmt);
> +   }
> +   }
> + break;
> case LROTATE_EXPR:
> case RROTATE_EXPR:
>   {
> --- gcc/testsuite/gcc.dg/bitint-90.c.jj   2024-02-14 18:24:20.546018881 
> +0100
> +++ gcc/testsuite/gcc.dg/bitint-90.c  2024-02-14 18:24:09.900167668 +0100
> @@ -0,0 +1,23 @@
> +/* PR tree-optimization/113567 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-O2" } */
> +
> +#if __BITINT_MAXWIDTH__ >= 129
> +_BitInt(129) v;
> +
> +void
> +foo (_BitInt(129) a, int i)
> +{
> +  __label__  l1, l2;
> +  i &= 1;
> +  void *p[] = { &&l1, &&l2 };
> +l1:
> +  a %= 3;
> +  v = a;
> +  i = !i;
> +  goto *(p[i]);
> +l2:;
> +}
> +#else
> +int i;
> +#endif
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] gccrs: Avoid *.bak suffixed tests - use dg-skip-if instead

2024-02-15 Thread Jakub Jelinek
On Fri, Feb 09, 2024 at 11:03:38AM +0100, Jakub Jelinek wrote:
> On Wed, Feb 07, 2024 at 12:43:59PM +0100, arthur.co...@embecosm.com wrote:
> > From: Philip Herron 
> > 
> > This patch introduces one regression because generics are getting better
> > understood over time. The code here used to apply generics with the same
> > symbol from previous segments which was a bit of a hack with out limited
> > inference variable support. The regression looks like it will be related
> > to another issue which needs to default integer inference variables much
> > more aggresivly to default integer.
> > 
> > Fixes #2723
> > 
> > gcc/rust/ChangeLog:
> > 
> > * typecheck/rust-hir-type-check-path.cc 
> > (TypeCheckExpr::resolve_segments): remove hack
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * rust/compile/issue-1773.rs: Moved to...
> > * rust/compile/issue-1773.rs.bak: ...here.
> 
> Please don't use such suffixes in the testsuite.
> Either delete the testcase, or xfail it somehow until the bug is fixed.

To be precise, I have scripts to look for backup files in the tree (*~,
*.bak, *.orig, *.rej etc.) and this stands in the way several times a day.

Here is a fix for that in patch form, tested on x86_64-linux with
make check-rust RUNTESTFLAGS='compile.exp=issue-1773.rs'
Ok for trunk?

2024-02-15  Jakub Jelinek  

* rust/compile/issue-1773.rs.bak: Rename to ...
* rust/compile/issue-1773.rs: ... this.  Add dg-skip-if directive.

diff --git a/gcc/testsuite/rust/compile/issue-1773.rs.bak 
b/gcc/testsuite/rust/compile/issue-1773.rs
similarity index 89%
rename from gcc/testsuite/rust/compile/issue-1773.rs.bak
rename to gcc/testsuite/rust/compile/issue-1773.rs
index a4542aea00b..468497a4792 100644
--- a/gcc/testsuite/rust/compile/issue-1773.rs.bak
+++ b/gcc/testsuite/rust/compile/issue-1773.rs
@@ -1,4 +1,5 @@
 #[lang = "sized"]
+// { dg-skip-if "" { *-*-* } }
 pub trait Sized {}
 
 trait Foo {

Jakub



RE: [COMMITTED V3 1/4] RISC-V: Add non-vector types to dfa pipelines

2024-02-15 Thread Li, Pan2
Hi Edwin,

Sorry for the late reply due to the holiday. I double-checked the
calling-convention-*.c dumps; it is safe to adjust the asm check to the number
you mentioned.

Pan

-Original Message-
From: Edwin Lu  
Sent: Tuesday, February 6, 2024 2:42 AM
To: Li, Pan2 ; juzhe.zh...@rivai.ai; gcc-patches 

Cc: Robin Dapp ; kito.cheng ; 
jeffreyalaw ; palmer ; vineetg 
; Patrick O'Neill 
Subject: Re: [COMMITTED V3 1/4] RISC-V: Add non-vector types to dfa pipelines

On 2/2/2024 11:10 PM, Li, Pan2 wrote:
> Hi Edwin
> 
>> I believe the only problematic failures are the 5 vls calling convention
>> ones where only 24 ld\\s+a[0-1],\\s*[0-9]+\\(sp\\) are found.
> 
> Does this "only 24" comes from calling-convention-1.c?

Oops sorry about that. I said I would include all the 7 failures and 
ended up not doing that. The failures are here
FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-1.c -O3 
-ftree-vectorize --param riscv-autovec-preference=scalable 
scan-assembler-times ld\\s+a[0-1],\\s*[0-9]+\\(sp\\) 35
FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-2.c -O3 
-ftree-vectorize --param riscv-autovec-preference=scalable 
scan-assembler-times ld\\s+a[0-1],\\s*[0-9]+\\(sp\\) 33
FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-3.c -O3 
-ftree-vectorize --param riscv-autovec-preference=scalable 
scan-assembler-times ld\\s+a[0-1],\\s*[0-9]+\\(sp\\) 31
FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-4.c -O3 
-ftree-vectorize --param riscv-autovec-preference=scalable 
scan-assembler-times ld\\s+a[0-1],\\s*[0-9]+\\(sp\\) 29
FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-7.c -O3 
-ftree-vectorize --param riscv-autovec-preference=scalable 
scan-assembler-times ld\\s+a[0-1],\\s*[0-9]+\\(sp\\) 29

These all have the problem of only 24 ld\\s+a[0-1],\\s*[0-9]+\\(sp\\) 
being found. So that is calling-conventions 1, 2, 3, 4, 7 with only 24 
matching RE.

FAIL: gcc.target/riscv/rvv/base/vcreate.c scan-assembler-times 
vmv1r.v\\s+v[0-9]+,\\s*v[0-9]+ 24 <-- found 36 times
FAIL: gcc.target/riscv/rvv/base/vcreate.c scan-assembler-times 
vmv2r.v\\s+v[0-9]+,\\s*v[0-9]+ 12 <-- found 28 times
FAIL: gcc.target/riscv/rvv/base/vcreate.c scan-assembler-times 
vmv4r.v\\s+v[0-9]+,\\s*v[0-9]+ 16 <-- found 19 times

These find more vmv's than expected

FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-107.c   -O2 
scan-assembler-times vsetvli\\tzero,zero,e32,m1,t[au],m[au] 1 <-- found 
0 times
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-107.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   scan-assembler-times 
vsetvli\\tzero,zero,e32,m1,t[au],m[au] 1 <-- found 0 times
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-107.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times 
vsetvli\\tzero,zero,e32,m1,t[au],m[au] 1 <-- found 0 times

These failures are from vsetvli zero,a0,e2,m1,ta,ma being found instead. 
I believe these should be fine.

> 
>> This is what I'm getting locally (first instance of wrong match):
>> v32qi_RET1_ARG8:
>> .LFB109:
> 
> V32qi will pass the args by reference instead of GPR(s), thus it is expected.
> I think we need to diff the asm code before and after the patch for the whole
> test file.
> The RE "ld\\s+a[0-1],\\s*[0-9]+\\(sp\\)" is intended to check that VLS mode
> values are returned via a[0-1].
> 

I've been using this https://godbolt.org/z/vdxTY3rc7 (calling convention 
1) as my comparison to what I have compiled locally (included as 
attachment). From what I see, the differences, aside from reordering due 
to latency, are that the ld insns use a5 (for 32-512) or t4 (for 
1024-2048) or t5 (for 4096) for ARG8 and ARG9. Is there something else 
that I might be missing?

Edwin



Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access

2024-02-15 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, 14 Feb 2024, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > On Wed, 14 Feb 2024, Richard Sandiford wrote:
>> >
>> >> Richard Biener  writes:
>> >> > The following avoids accessing out-of-bound vector elements when
>> >> > native encoding a boolean vector with sub-BITS_PER_UNIT precision
>> >> > elements.  The error was basing the number of elements to extract
>> >> > on the rounded up total byte size involved and the patch bases
>> >> > everything on the total number of elements to extract instead.
>> >> 
>> >> It's too long ago to be certain, but I think this was a deliberate choice.
>> >> The point of the new vector constant encoding is that it can give an
>> >> allegedly sensible value for any given index, even out-of-range ones.
>> >> 
>> >> Since the padding bits are undefined, we should in principle have a free
>> >> choice of what to use.  And for VLA, it's often better to continue the
>> >> existing pattern rather than force to zero.
>> >> 
>> >> I don't strongly object to changing it.  I think we should be careful
>> >> about relying on zeroing for correctness though.  The bits are in 
>> >> principle
>> >> undefined and we can't rely on reading zeros from equivalent memory or
>> >> register values.
>> >
>> > The main motivation for a change here is to allow catching out-of-bound
>> > indices again for VECTOR_CST_ELT, at least for constant nunits because
>> > it might be a programming error like fat-fingering the index.  I do
>> > think it's a regression that we no longer catch those.
>> >
>> > It's probably also a bit non-obvious how an encoding continues and
>> > there might be DImode masks that can be represented by a 
>> > zero-extended QImode immediate but "continued" it would require
>> > a larger immediate.
>> >
>> > The change also effectively only changes something for 1 byte
>> > encodings since nunits is a power of two and so is the element
>> > size in bits.
>> 
>> Yeah, but even there, there's an argument that all-1s (0xff) is a more
>> obvious value for an all-1s mask.
>> 
>> > A patch restoring the VECTOR_CST_ELT checking might be the
>> > following
>> >
>> > diff --git a/gcc/tree.cc b/gcc/tree.cc
>> > index 046a558d1b0..4c9b05167fd 100644
>> > --- a/gcc/tree.cc
>> > +++ b/gcc/tree.cc
>> > @@ -10325,6 +10325,9 @@ vector_cst_elt (const_tree t, unsigned int i)
>> >if (i < encoded_nelts)
>> >  return VECTOR_CST_ENCODED_ELT (t, i);
>> >  
>> > +  /* Catch out-of-bound element accesses.  */
>> > +  gcc_checking_assert (maybe_gt (VECTOR_CST_NELTS (t), i));
>> > +
>> >/* If there are no steps, the final encoded value is the right one.  */
>> >if (!VECTOR_CST_STEPPED_P (t))
>> >  {
>> >
>> > but it triggers quite a bit via const_binop for, for example
>> >
>> > #2  0x011c1506 in const_binop (code=PLUS_EXPR, 
>> > arg1=, arg2=)
>> > (gdb) p debug_generic_expr (arg1)
>> > { 12, 13, 14, 15 }
>> > $5 = void
>> > (gdb) p debug_generic_expr (arg2)
>> > { -2, -2, -2, -3 }
>> > (gdb) p count
>> > $4 = 6
>> > (gdb) l
>> > 1711  if (!elts.new_binary_operation (type, arg1, arg2, 
>> > step_ok_p))
>> > 1712return NULL_TREE;
>> > 1713  unsigned int count = elts.encoded_nelts ();
>> > 1714  for (unsigned int i = 0; i < count; ++i)
>> > 1715{
>> > 1716  tree elem1 = VECTOR_CST_ELT (arg1, i);
>> > 1717  tree elem2 = VECTOR_CST_ELT (arg2, i);
>> > 1718
>> > 1719  tree elt = const_binop (code, elem1, elem2);
>> >
>> > this seems like an error to me - why would we, for fixed-size
>> > vectors and for PLUS ever create a vector encoding with 6 elements?!
>> > That seems at least inefficient to me?
>> 
>> It's a case of picking your poison.  On the other side, operating
>> individually on each element of a V64QI is inefficient when the
>> representation says up-front that all elements are equal.
>
> True, though I wonder why for VLS vectors new_binary_operation
> doesn't cap the number of encoded elts on the fixed vector size,
> like doing
>
>   encoded_elts = ordered_min (TYPE_VECTOR_SUBPARTS (..), encoded_elts);
>
> or if there's no good way to write it applying for both VLA and VLS
> do it only when TYPE_VECTOR_SUBPARTS is constant.

ordered_min can't be used because there's no guarantee that encoded_elts
and TYPE_VECTOR_SUBPARTS are well-ordered for the VLA case.  E.g.
for a stepped (3-element) encoding and a length of 2+2X, the stepped
encoding is longer for X==0 and the vector is longer for X>0.

But yeah, in general, trying to enforce this for VLS would probably
lead to a proliferation of more "if VLA do one thing, if VLS do some
other thing".  The aim was to avoid that where it didn't seem strictly
necessary.

>> Fundamentally, operations on VLA vectors are treated as functions
>> that map patterns to patterns.  The number of elements that are
>> consumed isn't really relevant to the function itself.  The VLA
>> folders therefore rely on being t

Re: [PATCH] Arm: Fix incorrect tailcall-generation for indirect calls [PR113780]

2024-02-15 Thread Tejas Belagod

On 2/14/24 3:55 PM, Richard Earnshaw (lists) wrote:

On 14/02/2024 09:20, Tejas Belagod wrote:

On 2/7/24 11:41 PM, Richard Earnshaw (lists) wrote:

On 07/02/2024 07:59, Tejas Belagod wrote:

This patch fixes a bug that causes indirect calls in PAC-enabled functions
to be tailcalled incorrectly when all argument registers R0-R3 are used.

Tested on arm-none-eabi for armv8.1-m.main. OK for trunk?

2024-02-07  Tejas Belagod  

 PR target/113780
 * gcc/config/arm.cc (arm_function_ok_for_sibcall): Don't allow tailcalls
   for indirect calls with 4 or more arguments in pac-enabled functions.

 * gcc.target/arm/pac-sibcall.c: New.
---
   gcc/config/arm/arm.cc  | 12 
   gcc/testsuite/gcc.target/arm/pac-sibcall.c | 11 +++
   2 files changed, 19 insertions(+), 4 deletions(-)
   create mode 100644 gcc/testsuite/gcc.target/arm/pac-sibcall.c

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index c44047c377a..c1f8286a4d4 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -7980,10 +7980,14 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
     && DECL_WEAK (decl))
   return false;
   -  /* We cannot do a tailcall for an indirect call by descriptor if all the
- argument registers are used because the only register left to load the
- address is IP and it will already contain the static chain.  */
-  if (!decl && CALL_EXPR_BY_DESCRIPTOR (exp) && !flag_trampolines)
+  /* We cannot do a tailcall for an indirect call by descriptor or for an
+ indirect call in a pac-enabled function if all the argument registers
+ are used because the only register left to load the address is IP and
+ it will already contain the static chain or the PAC signature in the
+ case of PAC-enabled functions.  */


This comment is becoming a bit unwieldy.  I suggest restructuring it as:

We cannot tailcall an indirect call by descriptor if all the call-clobbered
general registers are live (r0-r3 and ip).  This can happen when:
    - IP contains the static chain, or
    - IP is needed for validating the PAC signature.



+  if (!decl
+  && ((CALL_EXPR_BY_DESCRIPTOR (exp) && !flag_trampolines)
+  || arm_current_function_pac_enabled_p()))
   {
     tree fntype = TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp)));
     CUMULATIVE_ARGS cum;
diff --git a/gcc/testsuite/gcc.target/arm/pac-sibcall.c 
b/gcc/testsuite/gcc.target/arm/pac-sibcall.c
new file mode 100644
index 000..c57bf7a952c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pac-sibcall.c
@@ -0,0 +1,11 @@
+/* Testing return address signing.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target mbranch_protection_ok } */
+/* { dg-options " -mcpu=cortex-m85 -mbranch-protection=pac-ret+leaf -O2" } */


No, you can't just add options like this, you need to first check that they 
won't result in conflicts with other options on the command line.  See 
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/644077.html for an 
example of how to handle this.


Thanks for the review, Richard. Respin attached.

Thanks,
Tejas.


+
+void fail(void (*f)(int, int, int, int))
+{
+  f(1, 2, 3, 4);
+}
+
+/* { dg-final { scan-assembler-not "bx\tip\t@ indirect register sibling call" 
} } */


R.


+++ b/gcc/testsuite/gcc.target/arm/pac-sibcall.c
@@ -0,0 +1,14 @@
+/* If all call-clobbered general registers are live (r0-r3, ip), disable
+   indirect tail-call for a PAC-enabled function.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target mbranch_protection_ok } */
This only checks if -mbranch-protection can work with the existing 
architecture/cpu; not with the flags you're about to add below.  You should 
check for arm_arch_v8_1m_main_pacbti_ok instead; then you can assume that 
-mbranch-protection can be added.



Indeed! Thanks for catching that.


+/* { dg-add-options arm_arch_v8_1m_main_pacbti } */
+/* { dg-additional-options "-mbranch-protection=pac-ret+leaf -O2" } */

Otherwise this is OK if you fix the above.



Thanks Richard. Respin attached. Will apply.

Thanks,
Tejas.


R.
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 
c44047c377a802d0c1dc1406df1b88a6b079607b..1cd69268ee986a0953cc85ab259355d2191250ac
 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -7980,10 +7980,13 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
   && DECL_WEAK (decl))
 return false;
 
-  /* We cannot do a tailcall for an indirect call by descriptor if all the
- argument registers are used because the only register left to load the
- address is IP and it will already contain the static chain.  */
-  if (!decl && CALL_EXPR_BY_DESCRIPTOR (exp) && !flag_trampolines)
+  /* We cannot tailcall an indirect call by descriptor if all the 
call-clobbered
+ general registers are live (r0-r3 and ip).  This can happen when:
+  - IP contains the static chain, or
+  - IP is needed for validating the PAC signature.  */
+  if (!decl

[PATCH] RISC-V: Add new option -march=help to print all supported extensions

2024-02-15 Thread Kito Cheng
The output of -march=help is like below:

```
All available -march extensions for RISC-V:
Name                Version
i   2.0, 2.1
e   2.0
m   2.0
a   2.0, 2.1
f   2.0, 2.2
d   2.0, 2.2
...
```

Also support -print-supported-extensions and --print-supported-extensions for
clang compatibility.
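
For example, with a driver built with this patch (the compiler name below is
just illustrative):

```
$ riscv64-unknown-elf-gcc -march=help
$ riscv64-unknown-elf-gcc --print-supported-extensions
```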

gcc/ChangeLog:

PR target/109349

* common/config/riscv/riscv-common.cc (riscv_arch_help): New.
* config/riscv/riscv-protos.h (RISCV_MAJOR_VERSION_BASE): New.
(RISCV_MINOR_VERSION_BASE): Ditto.
(RISCV_REVISION_VERSION_BASE): Ditto.
* config/riscv/riscv-c.cc (riscv_ext_version_value): Use enum
rather than magic number.
* config/riscv/riscv.h (riscv_arch_help): New.
(EXTRA_SPEC_FUNCTIONS): Add riscv_arch_help.
(DRIVER_SELF_SPECS): Handle -march=help, -print-supported-extensions and
--print-supported-extensions.
* config/riscv/riscv.opt (march=help): New.
(print-supported-extensions): New.
(-print-supported-extensions): New.
* doc/invoke.texi (RISC-V Options): Document -march=help.
---
 gcc/common/config/riscv/riscv-common.cc | 46 +
 gcc/config/riscv/riscv-c.cc |  2 +-
 gcc/config/riscv/riscv-protos.h |  7 
 gcc/config/riscv/riscv.h|  7 +++-
 gcc/config/riscv/riscv.opt  | 12 +++
 gcc/doc/invoke.texi |  3 +-
 6 files changed, 74 insertions(+), 3 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 631ce8309a0..8974fa4a128 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
 #include 
 
 #define INCLUDE_STRING
+#define INCLUDE_SET
+#define INCLUDE_MAP
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -2225,6 +2227,50 @@ riscv_get_valid_option_values (int option_code,
   return v;
 }
 
+const char *
+riscv_arch_help (int argc, const char **argv)
+{
+  /* Collect all exts, and sort it in canonical order.  */
+  struct extension_comparator {
+bool operator()(const std::string& a, const std::string& b) const {
+  return subset_cmp(a, b) >= 1;
+}
+  };
+  std::map<std::string, std::set<unsigned>, extension_comparator> all_exts;
+  for (const riscv_ext_version &ext : riscv_ext_version_table)
+{
+  if (!ext.name)
+   break;
+  if (ext.name[0] == 'g')
+   continue;
+  unsigned version_value = (ext.major_version * RISCV_MAJOR_VERSION_BASE)
+   + (ext.minor_version
+  * RISCV_MINOR_VERSION_BASE);
+  all_exts[ext.name].insert(version_value);
+}
+
+  printf("All available -march extensions for RISC-V:\n");
+  printf("\t%-20sVersion\n", "Name");
+  for (auto const &ext_info : all_exts)
+{
+  printf("\t%-20s\t", ext_info.first.c_str());
+  bool first = true;
+  for (auto version : ext_info.second)
+   {
+ if (first)
+   first = false;
+ else
+   printf(", ");
+ unsigned major = version / RISCV_MAJOR_VERSION_BASE;
+ unsigned minor = (version % RISCV_MAJOR_VERSION_BASE)
+   / RISCV_MINOR_VERSION_BASE;
+ printf("%u.%u", major, minor);
+   }
+  printf("\n");
+}
+  exit (0);
+}
+
 /* Implement TARGET_OPTION_OPTIMIZATION_TABLE.  */
 static const struct default_options riscv_option_optimization_table[] =
   {
diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 94c3871c760..3ef06dcfd2d 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -37,7 +37,7 @@ along with GCC; see the file COPYING3.  If not see
 static int
 riscv_ext_version_value (unsigned major, unsigned minor)
 {
-  return (major * 100) + (minor * 1000);
+  return (major * RISCV_MAJOR_VERSION_BASE) + (minor * 
RISCV_MINOR_VERSION_BASE);
 }
 
 /* Implement TARGET_CPU_CPP_BUILTINS.  */
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index ae1685850ac..80efdf2b7e5 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -780,4 +780,11 @@ const struct riscv_tune_info *
 riscv_parse_tune (const char *, bool);
 const cpu_vector_cost *get_vector_costs ();
 
+enum
+{
+  RISCV_MAJOR_VERSION_BASE = 100,
+  RISCV_MINOR_VERSION_BASE = 1000,
+  RISCV_REVISION_VERSION_BASE = 1,
+};
+
 #endif /* ! GCC_RISCV_PROTOS_H */
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 669308cc96d..da089a03e9d 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -50,12 +50,14 @@ extern const char *riscv_expand_arch (int argc, const char 
**argv);
 extern const char *riscv_expand_arch_from_cpu (int

[PATCH] Do not record dependences from debug stmts in tail merging

2024-02-15 Thread Richard Biener
The following avoids recording BB dependences for debug stmt uses.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

It's unlikely a dependence is just because of debug stmts so
actual compare-debug issues are very unlikely.  Still spotted
while investigating a CI regression mail (for an obsolete broken
patch ...)

* tree-ssa-tail-merge.cc (same_succ_hash): Skip debug
stmts.
---
 gcc/tree-ssa-tail-merge.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/tree-ssa-tail-merge.cc b/gcc/tree-ssa-tail-merge.cc
index f4e6ae6e8a2..c8b4a79294d 100644
--- a/gcc/tree-ssa-tail-merge.cc
+++ b/gcc/tree-ssa-tail-merge.cc
@@ -474,6 +474,9 @@ same_succ_hash (const same_succ *e)
!gsi_end_p (gsi); gsi_next_nondebug (&gsi))
 {
   stmt = gsi_stmt (gsi);
+  if (is_gimple_debug (stmt))
+   continue;
+
   stmt_update_dep_bb (stmt);
   if (stmt_local_def (stmt))
continue;
-- 
2.35.3


Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-15 Thread Andrew Stubbs

On 15/02/2024 07:49, Richard Biener wrote:

On Wed, 14 Feb 2024, Andrew Stubbs wrote:


On 14/02/2024 13:43, Richard Biener wrote:

On Wed, 14 Feb 2024, Andrew Stubbs wrote:


On 14/02/2024 13:27, Richard Biener wrote:

On Wed, 14 Feb 2024, Andrew Stubbs wrote:


On 13/02/2024 08:26, Richard Biener wrote:

On Mon, 12 Feb 2024, Thomas Schwinge wrote:


Hi!

On 2023-10-20T12:51:03+0100, Andrew Stubbs 
wrote:

I've committed this patch


... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
"amdgcn: add -march=gfx1030 EXPERIMENTAL".

The RDNA2 ISA variant doesn't support certain instructions previous
implemented in GCC/GCN, so a number of patterns etc. had to be
disabled:


[...] Vector
reductions will need to be reworked for RDNA2.  [...]



* config/gcn/gcn-valu.md (@dpp_move): Disable for RDNA2.
(addc3): Add RDNA2 syntax variant.
(subc3): Likewise.
(2_exec): Add RDNA2 alternatives.
(vec_cmpdi): Likewise.
(vec_cmpdi): Likewise.
(vec_cmpdi_exec): Likewise.
(vec_cmpdi_exec): Likewise.
(vec_cmpdi_dup): Likewise.
(vec_cmpdi_dup_exec): Likewise.
(reduc__scal_): Disable for RDNA2.
(*_dpp_shr_): Likewise.
(*plus_carry_dpp_shr_): Likewise.
(*plus_carry_in_dpp_shr_): Likewise.


Etc.  The expectation being that GCC middle end copes with this, and
synthesizes some less ideal yet still functional vector code, I
presume.

The later RDNA3/gfx1100 support builds on top of this, and that's what
I'm currently working on getting proper GCC/GCN target (not offloading)
results for.

I'm seeing a good number of execution test FAILs (regressions compared to
my earlier non-gfx1100 testing), and I've now tracked down where one
large class of those comes into existence -- not yet how to resolve,
unfortunately.  But maybe, with you guys' combined vectorizer and back
end experience, the latter will be done quickly?

Richard, I don't know if you've ever run actual GCC/GCN target (not
offloading) testing; let me know if you have any questions about that.


I've only done offload testing - in the x86_64 build tree run
check-target-libgomp.  If you can tell me how to do GCN target testing
(maybe document it on the wiki even!) I can try do that as well.


Given that (at least largely?) the same patterns etc. are disabled as
in
my gfx1100 configuration, I suppose your gfx1030 one would exhibit the
same issues.  You can build GCC/GCN target like you build the
offloading
one, just remove '--enable-as-accelerator-for=[...]'.  Likely, you can
even use a offloading GCC/GCN build to reproduce the issue below.

One example is the attached 'builtin-bitops-1.c', reduced from
'gcc.c-torture/execute/builtin-bitops-1.c', where 'my_popcount' is
miscompiled as soon as '-ftree-vectorize' is effective:

$ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/ builtin-bitops-1.c
-Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/
-Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -fdump-tree-all-all
-fdump-ipa-all-all -fdump-rtl-all-all -save-temps -march=gfx1100
-O1
-ftree-vectorize

In the 'diff' of 'a-builtin-bitops-1.c.179t.vect', for example, for
'-march=gfx90a' vs. '-march=gfx1100', we see:

+builtin-bitops-1.c:7:17: missed:   reduc op not supported by
target.

..., and therefore:

-builtin-bitops-1.c:7:17: note:  Reduce using direct vector
reduction.
+builtin-bitops-1.c:7:17: note:  Reduce using vector shifts
+builtin-bitops-1.c:7:17: note:  extract scalar result

That is, instead of one '.REDUC_PLUS' for gfx90a, for gfx1100 we build
a
chain of summation of 'VEC_PERM_EXPR's.  However, there's wrong code
generated:

$ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
i=1, ints[i]=0x1 a=1, b=2
i=2, ints[i]=0x8000 a=1, b=2
i=3, ints[i]=0x2 a=1, b=2
i=4, ints[i]=0x4000 a=1, b=2
i=5, ints[i]=0x1 a=1, b=2
i=6, ints[i]=0x8000 a=1, b=2
i=7, ints[i]=0xa5a5a5a5 a=16, b=32
i=8, ints[i]=0x5a5a5a5a a=16, b=32
i=9, ints[i]=0xcafe a=11, b=22
i=10, ints[i]=0xcafe00 a=11, b=22
i=11, ints[i]=0xcafe a=11, b=22
i=12, ints[i]=0x a=32, b=64

(I can't tell if the 'b = 2 * a' pattern is purely coincidental?)

I don't speak enough "vectorization" to fully understand the generic
vectorized algorithm and its implementation.  It appears that the
"Reduce using vector shifts" code has been around for a very long time,
but also has gone through a number of changes.  I can't tell which GCC
targets/configurations it's actually used for (in the same way as for
GCN gfx1100), and thus whether there's an issue in that vectorizer
code,
or rather in the GCN back end, or GCN back end parameterizing the
generic
code?


The "shift" reduction is basically doing reduction by repeatedly
adding the upper to the lower half of the vector (each time halving
the vector size).
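
A scalar sketch of that idea (purely illustrative, ignoring the masking/exec
details of the actual GCN code):

  /* Reduce n lanes (n a power of two) by log2(n) halving steps.  */
  for (unsigned int half = n / 2; half >= 1; half /= 2)
    for (unsigned int i = 0; i < half; ++i)
      v[i] += v[i + half];
  /* v[0] now holds the sum of all the original lanes.  */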


Manually working through the 'a-builtin-bitops-1.c.265t.optimized'
code:

in

Re: [PATCH] Do not record dependences from debug stmts in tail merging

2024-02-15 Thread Jakub Jelinek
On Thu, Feb 15, 2024 at 11:00:29AM +0100, Richard Biener wrote:
> The following avoids recording BB dependences for debug stmt uses.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> It's unlikely a dependence is just because of debug stmts so
> actual compare-debug issues are very unlikely.  Still spotted
> while investigating a CI regression mail (for an obsolete broken
> patch ...)
> 
>   * tree-ssa-tail-merge.cc (same_succ_hash): Skip debug
>   stmts.

LGTM.

Jakub



Re: [PATCH 1/2] doc: Fix some standard named pattern documentation modes

2024-02-15 Thread Richard Biener
On Thu, Feb 15, 2024 at 12:16 AM Andrew Pinski  wrote:
>
> Currently these use `@var{m3}` but the 3 here is a literal 3
> and not part of the mode itself so it should not be inside
> the var. Fixed as such.
>
> Built the documentation to make sure it looks correct now.

OK

> gcc/ChangeLog:
>
> * doc/md.texi (widen_ssum, widen_usum, smulhs, umulhs,
> smulhrs, umulhrs, sdiv_pow2): Move the 3 outside of the
> var.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/doc/md.texi | 32 
>  1 file changed, 16 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index b0c61925120..274dd03d419 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5798,19 +5798,19 @@ is of a wider mode, is computed and added to operand 
> 3. Operand 3 is of a mode
>  equal or wider than the mode of the absolute difference. The result is placed
>  in operand 0, which is of the same mode as operand 3.
>
> -@cindex @code{widen_ssum@var{m3}} instruction pattern
> -@cindex @code{widen_usum@var{m3}} instruction pattern
> -@item @samp{widen_ssum@var{m3}}
> -@itemx @samp{widen_usum@var{m3}}
> +@cindex @code{widen_ssum@var{m}3} instruction pattern
> +@cindex @code{widen_usum@var{m}3} instruction pattern
> +@item @samp{widen_ssum@var{m}3}
> +@itemx @samp{widen_usum@var{m}3}
>  Operands 0 and 2 are of the same mode, which is wider than the mode of
>  operand 1. Add operand 1 to operand 2 and place the widened result in
>  operand 0. (This is used express accumulation of elements into an accumulator
>  of a wider mode.)
>
> -@cindex @code{smulhs@var{m3}} instruction pattern
> -@cindex @code{umulhs@var{m3}} instruction pattern
> -@item @samp{smulhs@var{m3}}
> -@itemx @samp{umulhs@var{m3}}
> +@cindex @code{smulhs@var{m}3} instruction pattern
> +@cindex @code{umulhs@var{m}3} instruction pattern
> +@item @samp{smulhs@var{m}3}
> +@itemx @samp{umulhs@var{m}3}
>  Signed/unsigned multiply high with scale. This is equivalent to the C code:
>  @smallexample
>  narrow op0, op1, op2;
> @@ -5820,10 +5820,10 @@ op0 = (narrow) (((wide) op1 * (wide) op2) >> (N / 2 - 
> 1));
>  where the sign of @samp{narrow} determines whether this is a signed
>  or unsigned operation, and @var{N} is the size of @samp{wide} in bits.
>
> -@cindex @code{smulhrs@var{m3}} instruction pattern
> -@cindex @code{umulhrs@var{m3}} instruction pattern
> -@item @samp{smulhrs@var{m3}}
> -@itemx @samp{umulhrs@var{m3}}
> +@cindex @code{smulhrs@var{m}3} instruction pattern
> +@cindex @code{umulhrs@var{m}3} instruction pattern
> +@item @samp{smulhrs@var{m}3}
> +@itemx @samp{umulhrs@var{m}3}
>  Signed/unsigned multiply high with round and scale. This is
>  equivalent to the C code:
>  @smallexample
> @@ -5834,10 +5834,10 @@ op0 = (narrow) (wide) op1 * (wide) op2) >> (N / 2 
> - 2)) + 1) >> 1);
>  where the sign of @samp{narrow} determines whether this is a signed
>  or unsigned operation, and @var{N} is the size of @samp{wide} in bits.
>
> -@cindex @code{sdiv_pow2@var{m3}} instruction pattern
> -@cindex @code{sdiv_pow2@var{m3}} instruction pattern
> -@item @samp{sdiv_pow2@var{m3}}
> -@itemx @samp{sdiv_pow2@var{m3}}
> +@cindex @code{sdiv_pow2@var{m}3} instruction pattern
> +@cindex @code{sdiv_pow2@var{m}3} instruction pattern
> +@item @samp{sdiv_pow2@var{m}3}
> +@itemx @samp{sdiv_pow2@var{m}3}
>  Signed division by power-of-2 immediate. Equivalent to:
>  @smallexample
>  signed op0, op1;
> --
> 2.43.0
>


Re: [PATCH 2/2] doc: Add documentation of which operand matches the mode of the standard pattern name [PR113508]

2024-02-15 Thread Richard Biener
On Thu, Feb 15, 2024 at 12:16 AM Andrew Pinski  wrote:
>
> In some of the standard pattern names, it is not obvious which mode is being 
> used in the pattern
> name. Is it operand 0, 1, or 2? Is it the wider mode or the narrower mode?
> This fixes that so there is no confusion by adding a sentence to some of them.
>
> Built the documentation to make sure that it builds.

OK.

> gcc/ChangeLog:
>
> * doc/md.texi (sdot_prod@var{m}, udot_prod@var{m},
> usdot_prod@var{m}, ssad@var{m}, usad@var{m}, widen_usum@var{m}3,
> smulhs@var{m}3, umulhs@var{m}3, smulhrs@var{m}3, umulhrs@var{m}3):
> Add sentence about what the mode m is.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/doc/md.texi | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 274dd03d419..33b37e79cd4 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5746,6 +5746,7 @@ Operand 1 and operand 2 are of the same mode. Their
>  product, which is of a wider mode, is computed and added to operand 3.
>  Operand 3 is of a mode equal or wider than the mode of the product. The
>  result is placed in operand 0, which is of the same mode as operand 3.
> +@var{m} is the mode of operand 1 and operand 2.
>
>  Semantically the expressions perform the multiplication in the following 
> signs
>
> @@ -5763,6 +5764,7 @@ Operand 1 and operand 2 are of the same mode. Their
>  product, which is of a wider mode, is computed and added to operand 3.
>  Operand 3 is of a mode equal or wider than the mode of the product. The
>  result is placed in operand 0, which is of the same mode as operand 3.
> +@var{m} is the mode of operand 1 and operand 2.
>
>  Semantically the expressions perform the multiplication in the following 
> signs
>
> @@ -5779,6 +5781,7 @@ Operand 1 must be unsigned and operand 2 signed. Their
>  product, which is of a wider mode, is computed and added to operand 3.
>  Operand 3 is of a mode equal or wider than the mode of the product. The
>  result is placed in operand 0, which is of the same mode as operand 3.
> +@var{m} is the mode of operand 1 and operand 2.
>
>  Semantically the expressions perform the multiplication in the following 
> signs
>
> @@ -5797,6 +5800,7 @@ Operand 1 and operand 2 are of the same mode. Their 
> absolute difference, which
>  is of a wider mode, is computed and added to operand 3. Operand 3 is of a 
> mode
>  equal or wider than the mode of the absolute difference. The result is placed
>  in operand 0, which is of the same mode as operand 3.
> +@var{m} is the mode of operand 1 and operand 2.
>
>  @cindex @code{widen_ssum@var{m}3} instruction pattern
>  @cindex @code{widen_usum@var{m}3} instruction pattern
> @@ -5806,6 +5810,7 @@ Operands 0 and 2 are of the same mode, which is wider 
> than the mode of
>  operand 1. Add operand 1 to operand 2 and place the widened result in
>  operand 0. (This is used express accumulation of elements into an accumulator
>  of a wider mode.)
> +@var{m} is the mode of operand 1.
>
>  @cindex @code{smulhs@var{m}3} instruction pattern
>  @cindex @code{umulhs@var{m}3} instruction pattern
> @@ -5819,6 +5824,8 @@ op0 = (narrow) (((wide) op1 * (wide) op2) >> (N / 2 - 
> 1));
>  @end smallexample
>  where the sign of @samp{narrow} determines whether this is a signed
>  or unsigned operation, and @var{N} is the size of @samp{wide} in bits.
> +@var{m} is the mode for all 3 operands (narrow). The wide mode is not 
> specified
> +and is defined to fit the whole multiply.
>
>  @cindex @code{smulhrs@var{m}3} instruction pattern
>  @cindex @code{umulhrs@var{m}3} instruction pattern
> @@ -5833,6 +5840,8 @@ op0 = (narrow) (wide) op1 * (wide) op2) >> (N / 2 - 
> 2)) + 1) >> 1);
>  @end smallexample
>  where the sign of @samp{narrow} determines whether this is a signed
>  or unsigned operation, and @var{N} is the size of @samp{wide} in bits.
> +@var{m} is the mode for all 3 operands (narrow). The wide mode is not 
> specified
> +and is defined to fit the whole multiply.
>
>  @cindex @code{sdiv_pow2@var{m}3} instruction pattern
>  @cindex @code{sdiv_pow2@var{m}3} instruction pattern
> --
> 2.43.0
>


Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access

2024-02-15 Thread Richard Biener
On Thu, 15 Feb 2024, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Wed, 14 Feb 2024, Richard Sandiford wrote:
> >
> >> Richard Biener  writes:
> >> > On Wed, 14 Feb 2024, Richard Sandiford wrote:
> >> >
> >> >> Richard Biener  writes:
> >> >> > The following avoids accessing out-of-bound vector elements when
> >> >> > native encoding a boolean vector with sub-BITS_PER_UNIT precision
> >> >> > elements.  The error was basing the number of elements to extract
> >> >> > on the rounded up total byte size involved and the patch bases
> >> >> > everything on the total number of elements to extract instead.
> >> >> 
> >> >> It's too long ago to be certain, but I think this was a deliberate 
> >> >> choice.
> >> >> The point of the new vector constant encoding is that it can give an
> >> >> allegedly sensible value for any given index, even out-of-range ones.
> >> >> 
> >> >> Since the padding bits are undefined, we should in principle have a free
> >> >> choice of what to use.  And for VLA, it's often better to continue the
> >> >> existing pattern rather than force to zero.
> >> >> 
> >> >> I don't strongly object to changing it.  I think we should be careful
> >> >> about relying on zeroing for correctness though.  The bits are in 
> >> >> principle
> >> >> undefined and we can't rely on reading zeros from equivalent memory or
> >> >> register values.
> >> >
> >> > The main motivation for a change here is to allow catching out-of-bound
> >> > indices again for VECTOR_CST_ELT, at least for constant nunits because
> >> > it might be a programming error like fat-fingering the index.  I do
> >> > think it's a regression that we no longer catch those.
> >> >
> >> > It's probably also a bit non-obvious how an encoding continues and
> >> > there might be DImode masks that can be represented by a 
> >> > zero-extended QImode immediate but "continued" it would require
> >> > a larger immediate.
> >> >
> >> > The change also effectively only changes something for 1 byte
> >> > encodings since nunits is a power of two and so is the element
> >> > size in bits.
> >> 
> >> Yeah, but even there, there's an argument that all-1s (0xff) is a more
> >> obvious value for an all-1s mask.
> >> 
> >> > A patch restoring the VECTOR_CST_ELT checking might be the
> >> > following
> >> >
> >> > diff --git a/gcc/tree.cc b/gcc/tree.cc
> >> > index 046a558d1b0..4c9b05167fd 100644
> >> > --- a/gcc/tree.cc
> >> > +++ b/gcc/tree.cc
> >> > @@ -10325,6 +10325,9 @@ vector_cst_elt (const_tree t, unsigned int i)
> >> >if (i < encoded_nelts)
> >> >  return VECTOR_CST_ENCODED_ELT (t, i);
> >> >  
> >> > +  /* Catch out-of-bound element accesses.  */
> >> > +  gcc_checking_assert (maybe_gt (VECTOR_CST_NELTS (t), i));
> >> > +
> >> >/* If there are no steps, the final encoded value is the right one.  
> >> > */
> >> >if (!VECTOR_CST_STEPPED_P (t))
> >> >  {
> >> >
> >> > but it triggers quite a bit via const_binop for, for example
> >> >
> >> > #2  0x011c1506 in const_binop (code=PLUS_EXPR, 
> >> > arg1=, arg2=)
> >> > (gdb) p debug_generic_expr (arg1)
> >> > { 12, 13, 14, 15 }
> >> > $5 = void
> >> > (gdb) p debug_generic_expr (arg2)
> >> > { -2, -2, -2, -3 }
> >> > (gdb) p count
> >> > $4 = 6
> >> > (gdb) l
> >> > 1711  if (!elts.new_binary_operation (type, arg1, arg2, 
> >> > step_ok_p))
> >> > 1712return NULL_TREE;
> >> > 1713  unsigned int count = elts.encoded_nelts ();
> >> > 1714  for (unsigned int i = 0; i < count; ++i)
> >> > 1715{
> >> > 1716  tree elem1 = VECTOR_CST_ELT (arg1, i);
> >> > 1717  tree elem2 = VECTOR_CST_ELT (arg2, i);
> >> > 1718
> >> > 1719  tree elt = const_binop (code, elem1, elem2);
> >> >
> >> > this seems like an error to me - why would we, for fixed-size
> >> > vectors and for PLUS ever create a vector encoding with 6 elements?!
> >> > That seems at least inefficient to me?
> >> 
> >> It's a case of picking your poison.  On the other side, operating
> >> individually on each element of a V64QI is inefficient when the
> >> representation says up-front that all elements are equal.
> >
> > True, though I wonder why for VLS vectors new_binary_operation
> > doesn't cap the number of encoded elts on the fixed vector size,
> > like doing
> >
> >   encoded_elts = ordered_min (TYPE_VECTOR_SUBPARTS (..), encoded_elts);
> >
> > or if there's no good way to write it applying for both VLA and VLS
> > do it only when TYPE_VECTOR_SUBPARTS is constant.
> 
> ordered_min can't be used because there's no guarantee that encoded_elts
> and TYPE_VECTOR_SUBPARTS are well-ordered for the VLA case.  E.g.
> for a stepped (3-element) encoding and a length of 2+2X, the stepped
> encoding is longer for X==0 and the vector is longer for X>0.
> 
> But yeah, in general, trying to enforce this for VLS would probably
> lead to a proliferation of more "if VLA do one thing, if VLS do some
> other thing".  The aim was to avoid th

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-15 Thread Richard Biener
On Thu, 15 Feb 2024, Andrew Stubbs wrote:

> On 15/02/2024 07:49, Richard Biener wrote:
> > On Wed, 14 Feb 2024, Andrew Stubbs wrote:
> > 
> >> On 14/02/2024 13:43, Richard Biener wrote:
> >>> On Wed, 14 Feb 2024, Andrew Stubbs wrote:
> >>>
>  On 14/02/2024 13:27, Richard Biener wrote:
> > On Wed, 14 Feb 2024, Andrew Stubbs wrote:
> >
> >> On 13/02/2024 08:26, Richard Biener wrote:
> >>> On Mon, 12 Feb 2024, Thomas Schwinge wrote:
> >>>
>  Hi!
> 
>  On 2023-10-20T12:51:03+0100, Andrew Stubbs 
>  wrote:
> > I've committed this patch
> 
>  ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
>  "amdgcn: add -march=gfx1030 EXPERIMENTAL".
> 
>  The RDNA2 ISA variant doesn't support certain instructions previously
>  implemented in GCC/GCN, so a number of patterns etc. had to be
>  disabled:
> 
> > [...] Vector
> > reductions will need to be reworked for RDNA2.  [...]
> 
> > * config/gcn/gcn-valu.md (@dpp_move): Disable for RDNA2.
> > (addc3): Add RDNA2 syntax variant.
> > (subc3): Likewise.
> > (2_exec): Add RDNA2 alternatives.
> > (vec_cmpdi): Likewise.
> > (vec_cmpdi): Likewise.
> > (vec_cmpdi_exec): Likewise.
> > (vec_cmpdi_exec): Likewise.
> > (vec_cmpdi_dup): Likewise.
> > (vec_cmpdi_dup_exec): Likewise.
> > (reduc__scal_): Disable for RDNA2.
> > (*_dpp_shr_): Likewise.
> > (*plus_carry_dpp_shr_): Likewise.
> > (*plus_carry_in_dpp_shr_): Likewise.
> 
>  Etc.  The expectation being that GCC middle end copes with this, and
>  synthesizes some less ideal yet still functional vector code, I
>  presume.
> 
>  The later RDNA3/gfx1100 support builds on top of this, and that's
>  what
>  I'm currently working on getting proper GCC/GCN target (not
>  offloading)
>  results for.
> 
>  I'm seeing a good number of execution test FAILs (regressions
>  compared
>  to
>  my earlier non-gfx1100 testing), and I've now tracked down where one
>  large class of those comes into existence -- not yet how to resolve,
>  unfortunately.  But maybe, with you guys' combined vectorizer and
>  back
>  end experience, the latter will be done quickly?
> 
>  Richard, I don't know if you've ever run actual GCC/GCN target (not
>  offloading) testing; let me know if you have any questions about
>  that.
> >>>
> >>> I've only done offload testing - in the x86_64 build tree run
> >>> check-target-libgomp.  If you can tell me how to do GCN target testing
> >>> (maybe document it on the wiki even!) I can try do that as well.
> >>>
>  Given that (at least largely?) the same patterns etc. are disabled as
>  in
>  my gfx1100 configuration, I suppose your gfx1030 one would exhibit
>  the
>  same issues.  You can build GCC/GCN target like you build the
>  offloading
>  one, just remove '--enable-as-accelerator-for=[...]'.  Likely, you
>  can
>  even use a offloading GCC/GCN build to reproduce the issue below.
> 
>  One example is the attached 'builtin-bitops-1.c', reduced from
>  'gcc.c-torture/execute/builtin-bitops-1.c', where 'my_popcount' is
>  miscompiled as soon as '-ftree-vectorize' is effective:
> 
>  $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/ builtin-bitops-1.c
>  -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/
>  -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -fdump-tree-all-all
>  -fdump-ipa-all-all -fdump-rtl-all-all -save-temps
>  -march=gfx1100
>  -O1
>  -ftree-vectorize
> 
>  In the 'diff' of 'a-builtin-bitops-1.c.179t.vect', for example, for
>  '-march=gfx90a' vs. '-march=gfx1100', we see:
> 
>  +builtin-bitops-1.c:7:17: missed:   reduc op not supported by
>  target.
> 
>  ..., and therefore:
> 
>  -builtin-bitops-1.c:7:17: note:  Reduce using direct vector
>  reduction.
>  +builtin-bitops-1.c:7:17: note:  Reduce using vector shifts
>  +builtin-bitops-1.c:7:17: note:  extract scalar result
> 
>  That is, instead of one '.REDUC_PLUS' for gfx90a, for gfx1100 we
>  build
>  a
>  chain of summation of 'VEC_PERM_EXPR's.  However, there's wrong code
>  generated:
> 
>  $ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
>  i=1, ints[i]=0x1 a=1, b=2
>  i=2, ints[i]=0x80

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-15 Thread Thomas Schwinge
Hi!

On 2024-02-15T08:49:17+0100, Richard Biener  wrote:
> On Wed, 14 Feb 2024, Andrew Stubbs wrote:
>> On 14/02/2024 13:43, Richard Biener wrote:
>> > On Wed, 14 Feb 2024, Andrew Stubbs wrote:
>> >> On 14/02/2024 13:27, Richard Biener wrote:
>> >>> On Wed, 14 Feb 2024, Andrew Stubbs wrote:
>>  On 13/02/2024 08:26, Richard Biener wrote:
>> > On Mon, 12 Feb 2024, Thomas Schwinge wrote:
>> >> On 2023-10-20T12:51:03+0100, Andrew Stubbs 
>> >> wrote:
>> >>> I've committed this patch
>> >>
>> >> ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
>> >> "amdgcn: add -march=gfx1030 EXPERIMENTAL".
>> >>
>> >> The RDNA2 ISA variant doesn't support certain instructions previously
>> >> implemented in GCC/GCN, so a number of patterns etc. had to be
>> >> disabled:
>> >>
>> >>> [...] Vector
>> >>> reductions will need to be reworked for RDNA2.  [...]
>> >>
>> >>>* config/gcn/gcn-valu.md (@dpp_move): Disable for RDNA2.
>> >>>(addc3): Add RDNA2 syntax variant.
>> >>>(subc3): Likewise.
>> >>>(2_exec): Add RDNA2 alternatives.
>> >>>(vec_cmpdi): Likewise.
>> >>>(vec_cmpdi): Likewise.
>> >>>(vec_cmpdi_exec): Likewise.
>> >>>(vec_cmpdi_exec): Likewise.
>> >>>(vec_cmpdi_dup): Likewise.
>> >>>(vec_cmpdi_dup_exec): Likewise.
>> >>>(reduc__scal_): Disable for RDNA2.
>> >>>(*_dpp_shr_): Likewise.
>> >>>(*plus_carry_dpp_shr_): Likewise.
>> >>>(*plus_carry_in_dpp_shr_): Likewise.
>> >>
>> >> Etc.  The expectation being that GCC middle end copes with this, and
>> >> synthesizes some less ideal yet still functional vector code, I 
>> >> presume.
>> >>
>> >> The later RDNA3/gfx1100 support builds on top of this, and that's what
>> >> I'm currently working on getting proper GCC/GCN target (not 
>> >> offloading)
>> >> results for.
>> >>
>> >> I'm seeing a good number of execution test FAILs (regressions 
>> >> compared to
>> >> my earlier non-gfx1100 testing), and I've now tracked down where one
>> >> large class of those comes into existence -- [...]

>> >> With the following hack applied to 'gcc/tree-vect-loop.cc':
>> >>
>> >>@@ -6687,8 +6687,9 @@ vect_create_epilog_for_reduction
>> >>(loop_vec_info
>> >>loop_vinfo,
>> >>   reduce_with_shift = have_whole_vector_shift (mode1);
>> >>   if (!VECTOR_MODE_P (mode1)
>> >>  || !directly_supported_p (code, vectype1))
>> >>reduce_with_shift = false;
>> >>+  reduce_with_shift = false;
>> >>
>> >> ..., I'm able to work around those regressions: by means of forcing
>> >> "Reduce using scalar code" instead of "Reduce using vector shifts".

>> The attached not-well-tested patch should allow only valid permutations.
>> Hopefully we go back to working code, but there'll be things that won't
>> vectorize. That said, the new "dump" output code has fewer and probably
>> cheaper instructions, so hmmm.
>
> This fixes the reduced builtin-bitops-1.c on RDNA2.

I confirm that "amdgcn: Disallow unsupported permute on RDNA devices"
also obsoletes my 'reduce_with_shift = false;' hack -- and also cures a
good number of additional FAILs (regressions), where presumably we
permute via different code paths.  Thanks!

There also are a few regressions, but only minor:

PASS: gcc.dg/vect/no-vfa-vect-depend-3.c (test for excess errors)
PASS: gcc.dg/vect/no-vfa-vect-depend-3.c execution test
PASS: gcc.dg/vect/no-vfa-vect-depend-3.c scan-tree-dump-times vect 
"vectorized 1 loops" 4
[-PASS:-]{+FAIL:+} gcc.dg/vect/no-vfa-vect-depend-3.c scan-tree-dump-times 
vect "dependence distance negative" 4

..., because:

gcc.dg/vect/no-vfa-vect-depend-3.c: pattern found 6 times
FAIL: gcc.dg/vect/no-vfa-vect-depend-3.c scan-tree-dump-times vect 
"dependence distance negative" 4

PASS: gcc.dg/vect/vect-119.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/vect/vect-119.c scan-tree-dump-times vect 
"Detected interleaving load of size 2" 1
PASS: gcc.dg/vect/vect-119.c scan-tree-dump-not optimized "Invalid sum"

..., because:

gcc.dg/vect/vect-119.c: pattern found 3 times
FAIL: gcc.dg/vect/vect-119.c scan-tree-dump-times vect "Detected 
interleaving load of size 2" 1

PASS: gcc.dg/vect/vect-reduc-mul_1.c (test for excess errors)
PASS: gcc.dg/vect/vect-reduc-mul_1.c execution test
[-PASS:-]{+FAIL:+} gcc.dg/vect/vect-reduc-mul_1.c scan-tree-dump vect 
"Reduce using vector shifts"

PASS: gcc.dg/vect/vect-reduc-mul_2.c (test for excess errors)
PASS: gcc.dg/vect/vect-reduc-mul_2.c execution test
[-PASS:-]{+FAIL:+} gcc.dg/vect/vect-reduc-mul_2.c scan-tree-dump vect 
"Reduce using vector shifts"

..., plus the following, in combination with the earlier changes
disabling patterns:

PASS: gcc.dg/vect/ve

Re: [PATCH] RISC-V: Add new option -march=help to print all supported extensions

2024-02-15 Thread Christoph Müllner
On Thu, Feb 15, 2024 at 10:56 AM Kito Cheng  wrote:
>
> The output of -march=help is like below:
>
> ```
> All available -march extensions for RISC-V:
> NameVersion
> i   2.0, 2.1
> e   2.0
> m   2.0
> a   2.0, 2.1
> f   2.0, 2.2
> d   2.0, 2.2
> ...
> ```
>
> Also support -print-supported-extensions and --print-supported-extensions for
> clang compatibility.

If I remember correctly, this feature was requested several times
in the past.
Thanks for working on this!

Reviewed-by: Christoph Müllner 

I have done a quick feature test (no bootstrapping, no check for
compiler warnings) as well.
Below you find all supported RISC-V extension in today's master branch:

All available -march extensions for RISC-V:
NameVersion
i   2.0, 2.1
e   2.0
m   2.0
a   2.0, 2.1
f   2.0, 2.2
d   2.0, 2.2
c   2.0
v   1.0
h   1.0
zic64b  1.0
zicbom  1.0
zicbop  1.0
zicboz  1.0
ziccamoa1.0
ziccif  1.0
zicclsm 1.0
ziccrse 1.0
zicntr  2.0
zicond  1.0
zicsr   2.0
zifencei2.0
zihintntl   1.0
zihintpause 2.0
zihpm   2.0
zmmul   1.0
za128rs 1.0
za64rs  1.0
zawrs   1.0
zfa 1.0
zfh 1.0
zfhmin  1.0
zfinx   1.0
zdinx   1.0
zca 1.0
zcb 1.0
zcd 1.0
zce 1.0
zcf 1.0
zcmp1.0
zcmt1.0
zba 1.0
zbb 1.0
zbc 1.0
zbkb1.0
zbkc1.0
zbkc1.0
zbkx1.0
zbs 1.0
zk  1.0
zkn 1.0
zknd1.0
zkne1.0
zknh1.0
zkr 1.0
zks 1.0
zksed   1.0
zksh1.0
zkt 1.0
ztso1.0
zvbb1.0
zvbc1.0
zve32f  1.0
zve32x  1.0
zve64d  1.0
zve64f  1.0
zve64x  1.0
zvfbfmin1.0
zvfh1.0
zvfhmin 1.0
zvkb1.0
zvkg1.0
zvkn1.0
zvknc   1.0
zvkned  1.0
zvkng   1.0
zvknha  1.0
zvknhb  1.0
zvks1.0
zvksc   1.0
zvksed  1.0
zvksg   1.0
zvksh   1.0
zvkt1.0
zvl1024b1.0
zvl128b 1.0
zvl16384b   1.0
zvl2048b1.0
zvl256b 1.0
zvl32768b   1.0
zvl32b  1.0
zvl4096b1.0
zvl512b 1.0
zvl64b  1.0
zvl65536b   1.0
zvl8192b1.0
zhinx   1.0
zhinxmin1.0
smaia   1.0
smepmp  1.0
smstateen   1.0
ssaia   1.0
sscofpmf1.0
ssstateen   1.0
sstc1.0
svinval 1.0
svnapot 1.0
svpbmt  1.0
xcvalu  1.0
xcvelw  1.0
xcvmac  1.0
xcvsimd 1.0
xtheadba1.0

RE: [PATCH]AArch64: remove ls64 from being mandatory on armv8.7-a..

2024-02-15 Thread Tamar Christina
Hi, this is a new version of the patch updating some additional tests
because some of the LTO tests required a newer binutils than my distro had.

---

The Arm Architectural Reference Manual (Version J.a, section A2.9 on FEAT_LS64)
shows that ls64 is an optional extensions and should not be enabled by default
for Armv8.7-a.

This drops it from the mandatory bits for the architecture and brings GCC in line
with LLVM and the architecture.

Note that we will not be changing binutils to preserve compatibility with older
released compilers.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master? and backport to GCC 13,12,11?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (AARCH64_ARCH): Remove LS64 from
Armv8.7-a.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/acle/ls64.C: Add +ls64.
* g++.target/aarch64/acle/ls64_lto.C: Likewise.
* gcc.target/aarch64/acle/ls64_lto.c: Likewise.
* gcc.target/aarch64/acle/pr110100.c: Likewise.
* gcc.target/aarch64/acle/pr110132.c: Likewise.
* gcc.target/aarch64/options_set_28.c: Drop check for nols64.
* gcc.target/aarch64/pragma_cpp_predefs_2.c: Correct header checks.

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64-arches.def 
b/gcc/config/aarch64/aarch64-arches.def
index 
b7115ff7c3d4a7ee7abbedcb091ef15a7efacc79..9bec30e9203bac01155281ef3474846c402bb29e
 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -37,7 +37,7 @@ AARCH64_ARCH("armv8.3-a", generic_armv8_a,   V8_3A, 
8,  (V8_2A, PAUTH, R
 AARCH64_ARCH("armv8.4-a", generic_armv8_a,   V8_4A, 8,  (V8_3A, 
F16FML, DOTPROD, FLAGM))
 AARCH64_ARCH("armv8.5-a", generic_armv8_a,   V8_5A, 8,  (V8_4A, SB, 
SSBS, PREDRES))
 AARCH64_ARCH("armv8.6-a", generic_armv8_a,   V8_6A, 8,  (V8_5A, I8MM, 
BF16))
-AARCH64_ARCH("armv8.7-a", generic_armv8_a,   V8_7A, 8,  (V8_6A, LS64))
+AARCH64_ARCH("armv8.7-a", generic_armv8_a,   V8_7A, 8,  (V8_6A))
 AARCH64_ARCH("armv8.8-a", generic_armv8_a,   V8_8A, 8,  (V8_7A, MOPS))
 AARCH64_ARCH("armv8.9-a", generic_armv8_a,   V8_9A, 8,  (V8_8A))
 AARCH64_ARCH("armv8-r",   generic_armv8_a,   V8R  , 8,  (V8_4A))
diff --git a/gcc/testsuite/g++.target/aarch64/acle/ls64.C 
b/gcc/testsuite/g++.target/aarch64/acle/ls64.C
index 
d9002785b578741bde1202761f0881dc3d47e608..dcfe6f1af6711a7f3ec2562f6aabf56baecf417d
 100644
--- a/gcc/testsuite/g++.target/aarch64/acle/ls64.C
+++ b/gcc/testsuite/g++.target/aarch64/acle/ls64.C
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-march=armv8.7-a" } */
+/* { dg-additional-options "-march=armv8.7-a+ls64" } */
 #include 
 int main()
 {
diff --git a/gcc/testsuite/g++.target/aarch64/acle/ls64_lto.C 
b/gcc/testsuite/g++.target/aarch64/acle/ls64_lto.C
index 
274a4771e1c1d13bcb1a7bdc77c2e499726f024c..0198fe2a1b78627b873bf22e3d8416dbdcc77078
 100644
--- a/gcc/testsuite/g++.target/aarch64/acle/ls64_lto.C
+++ b/gcc/testsuite/g++.target/aarch64/acle/ls64_lto.C
@@ -1,5 +1,5 @@
 /* { dg-do link { target aarch64_asm_ls64_ok } } */
-/* { dg-additional-options "-march=armv8.7-a -flto" } */
+/* { dg-additional-options "-march=armv8.7-a+ls64 -flto" } */
 #include 
 int main()
 {
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/ls64_lto.c 
b/gcc/testsuite/gcc.target/aarch64/acle/ls64_lto.c
index 
8b4f24277717675badc39dd145d365f75f5ceb27..0e5ae0b052b50b08d35151f4bc113617c1569bd3
 100644
--- a/gcc/testsuite/gcc.target/aarch64/acle/ls64_lto.c
+++ b/gcc/testsuite/gcc.target/aarch64/acle/ls64_lto.c
@@ -1,5 +1,5 @@
 /* { dg-do link { target aarch64_asm_ls64_ok } } */
-/* { dg-additional-options "-march=armv8.7-a -flto" } */
+/* { dg-additional-options "-march=armv8.7-a+ls64 -flto" } */
 #include 
 int main(void)
 {
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/pr110100.c 
b/gcc/testsuite/gcc.target/aarch64/acle/pr110100.c
index 
f56d5e619e8ac23cdf720574bd6ee08fbfd36423..62a82b97c56debad092cc8fd1ed48f0219109cd7
 100644
--- a/gcc/testsuite/gcc.target/aarch64/acle/pr110100.c
+++ b/gcc/testsuite/gcc.target/aarch64/acle/pr110100.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=armv8.7-a -O2" } */
+/* { dg-options "-march=armv8.7-a+ls64 -O2" } */
 #include 
 void do_st64b(data512_t data) {
   __arm_st64b((void*)0x1000, data);
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/pr110132.c 
b/gcc/testsuite/gcc.target/aarch64/acle/pr110132.c
index 
fb88d633dd20772fd96e976a400fe52ae0bc3647..423d91b9a99f269d01d07428414ade7cc518c711
 100644
--- a/gcc/testsuite/gcc.target/aarch64/acle/pr110132.c
+++ b/gcc/testsuite/gcc.target/aarch64/acle/pr110132.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-march=armv8.7-a" } */
+/* { dg-additional-options "-march=armv8.7-a+ls64" } */
 
 /* Check that ls64 builtins can be invoked using a preprocesed testcase
without triggering bogus builtin warnings, 

[PATCH]AArch64: xfail modes_1.f90 [PR107071]

2024-02-15 Thread Tamar Christina
Hi All,

This test has never worked on AArch64 since the day it was committed.  It has
a number of issues that prevent it from working on AArch64:

1.  IEEE does not require that FP operations raise a SIGFPE for FP operations,
only that an exception is raised somehow.

2. Most Arm designed cores don't raise SIGFPE and instead set a status register
   and some partner cores raise a SIGILL instead.

3. The way it checks for feenableexcept doesn't really work for AArch64.

As such this test doesn't seem to really provide much value on AArch64 so we
should just xfail it.

Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/testsuite/ChangeLog:

PR fortran/107071
* gfortran.dg/ieee/modes_1.f90: xfail aarch64.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gfortran.dg/ieee/modes_1.f90 
b/gcc/testsuite/gfortran.dg/ieee/modes_1.f90
index 
205c47f38007d06116289c19d6b23cf3bf83bd48..3667571969427ae7b2b96684ec1af8b3fdd4985f
 100644
--- a/gcc/testsuite/gfortran.dg/ieee/modes_1.f90
+++ b/gcc/testsuite/gfortran.dg/ieee/modes_1.f90
@@ -1,4 +1,4 @@
-! { dg-do run }
+! { dg-do run { xfail { aarch64*-*-* } } }
 !
 ! Test IEEE_MODES_TYPE, IEEE_GET_MODES and IEEE_SET_MODES
 









Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-15 Thread Andrew Stubbs

On 15/02/2024 10:21, Richard Biener wrote:
[snip]

I suppose if RDNA really only has 32 lane vectors (it sounds like it,
even if it can "simulate" 64 lane ones?) then it might make sense to
vectorize for 32 lanes?  That said, with variable-length it likely
doesn't matter but I'd not expose fixed-size modes with 64 lanes then?


For most operations, wavefrontsize=64 works just fine; the GPU runs each
instruction twice and presents a pair of hardware registers as a logical
64-lane register. This breaks down for permutations and reductions, and is
obviously inefficient when the vectors are not fully utilized, but is
otherwise compatible with the GCN/CDNA compiler.

I didn't want to invest all the effort it would take to support
wavefrontsize=32, which would be the natural mode for these devices; the
number of places that have "64" hard-coded is just too big. Not only that, but
the EXEC and VCC registers change from DImode to SImode and that's going to
break a lot of stuff. (And we have no paying customer for this.)

I'm open to patch submissions. :)


OK, I see ;)  As said for fully masked that's a good answer.  I'd
probably still not expose V64mode modes in the RTL expanders for the
vect_* patterns?  Or, what happens if you change
gcn_vectorize_preferred_simd_mode to return 32 lane modes for RDNA
and omit 64 lane modes from gcn_autovectorize_vector_modes for RDNA?


Changing the preferred mode probably would fix permute.


Does that possibly leave performance on the plate? (not sure if there's
any documents about choosing wavefrontsize=64 vs 32 with regard to
performance)

Note it would entirely forbid the vectorizer from using larger modes,
it just makes it prefer the smaller ones.  OTOH if you then run
wavefrontsize=64 on top of it, it's probably wasting the 2nd instruction
by always masking it?


Right, the GPU will continue to process the "top half" of the vector as 
an additional step, regardless whether you put anything useful there, or 
not.



So yeah.  Guess a s/64/wavefrontsize/ would be a first step towards
allowing 32 there ...


I think the DImode to SImode change is the most difficult fix. Unless 
you know of a cunning trick, that's going to mean a lot of changes to a 
lot of the machine description; substitutions, duplications, iterators, 
indirections, etc., etc., etc.


The "64" substitution would be tedious but less hairy. I did a lot of 
those when I created the fake vector sizes.



Anyway, the fix works, so that's the most important thing ;)


:)

Andrew


Re: [PATCH]AArch64: xfail modes_1.f90 [PR107071]

2024-02-15 Thread Richard Earnshaw (lists)
On 15/02/2024 10:57, Tamar Christina wrote:
> Hi All,
> 
> This test has never worked on AArch64 since the day it was committed.  It has
> a number of issues that prevent it from working on AArch64:
> 
> 1.  IEEE does not require that FP operations raise a SIGFPE for FP operations,
>     only that an exception is raised somehow.
> 
> 2. Most Arm designed cores don't raise SIGFPE and instead set a status 
> register
>    and some partner cores raise a SIGILL instead.
> 
> 3. The way it checks for feenableexcept doesn't really work for AArch64.
> 
> As such this test doesn't seem to really provide much value on AArch64 so we
> should just xfail it.
> 
> Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

Wouldn't it be better to just skip the test.  XFAIL just adds clutter to 
verbose output and suggests that someday the tools might be fixed for this case.

Better still would be a new dg-requires fp_exceptions_raise_sigfpe as a guard 
for the test.

R.

> 
> Thanks,
> Tamar
> 
> gcc/testsuite/ChangeLog:
> 
>     PR fortran/107071
>     * gfortran.dg/ieee/modes_1.f90: xfail aarch64.
> 
> --- inline copy of patch --
> diff --git a/gcc/testsuite/gfortran.dg/ieee/modes_1.f90 
> b/gcc/testsuite/gfortran.dg/ieee/modes_1.f90
> index 
> 205c47f38007d06116289c19d6b23cf3bf83bd48..3667571969427ae7b2b96684ec1af8b3fdd4985f
>  100644
> --- a/gcc/testsuite/gfortran.dg/ieee/modes_1.f90
> +++ b/gcc/testsuite/gfortran.dg/ieee/modes_1.f90
> @@ -1,4 +1,4 @@
> -! { dg-do run }
> +! { dg-do run { xfail { aarch64*-*-* } } }
>  !
>  ! Test IEEE_MODES_TYPE, IEEE_GET_MODES and IEEE_SET_MODES
>  
> 
> 
> 
> 
> -- 



RE: [PATCH]AArch64: xfail modes_1.f90 [PR107071]

2024-02-15 Thread Tamar Christina
> -Original Message-
> From: Richard Earnshaw (lists) 
> Sent: Thursday, February 15, 2024 11:01 AM
> To: Tamar Christina ; gcc-patches@gcc.gnu.org
> Cc: nd ; Marcus Shawcroft ; Kyrylo
> Tkachov ; Richard Sandiford
> 
> Subject: Re: [PATCH]AArch64: xfail modes_1.f90 [PR107071]
> 
> On 15/02/2024 10:57, Tamar Christina wrote:
> > Hi All,
> >
> > This test has never worked on AArch64 since the day it was committed.  It 
> > has
> > a number of issues that prevent it from working on AArch64:
> >
> > 1.  IEEE does not require that FP operations raise a SIGFPE for FP 
> > operations,
> >     only that an exception is raised somehow.
> >
> > 2. Most Arm designed cores don't raise SIGFPE and instead set a status 
> > register
> >    and some partner cores raise a SIGILL instead.
> >
> > 3. The way it checks for feenableexcept doesn't really work for AArch64.
> >
> > As such this test doesn't seem to really provide much value on AArch64 so we
> > should just xfail it.
> >
> > Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> 
> Wouldn't it be better to just skip the test.  XFAIL just adds clutter to 
> verbose output
> and suggests that someday the tools might be fixed for this case.
> 
> Better still would be a new dg-requires fp_exceptions_raise_sigfpe as a guard 
> for
> the test.

There seems to be check_effective_target_fenv_exceptions, which tests whether
the target can raise FP exceptions.  I'll see if that works.

Thanks,
Tamar

> 
> R.
> 
> >
> > Thanks,
> > Tamar
> >
> > gcc/testsuite/ChangeLog:
> >
> >     PR fortran/107071
> >     * gfortran.dg/ieee/modes_1.f90: xfail aarch64.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/testsuite/gfortran.dg/ieee/modes_1.f90
> b/gcc/testsuite/gfortran.dg/ieee/modes_1.f90
> > index
> 205c47f38007d06116289c19d6b23cf3bf83bd48..3667571969427ae7b2b9668
> 4ec1af8b3fdd4985f 100644
> > --- a/gcc/testsuite/gfortran.dg/ieee/modes_1.f90
> > +++ b/gcc/testsuite/gfortran.dg/ieee/modes_1.f90
> > @@ -1,4 +1,4 @@
> > -! { dg-do run }
> > +! { dg-do run { xfail { aarch64*-*-* } } }
> >  !
> >  ! Test IEEE_MODES_TYPE, IEEE_GET_MODES and IEEE_SET_MODES
> >
> >
> >
> >
> >
> > --



Re: [PATCH] RISC-V: Set require-effective-target rv64 for PR113742

2024-02-15 Thread Robin Dapp
> Ah oops I glanced over the /* { dg-do compile } */part. It should be
> fine to add '-march=rv64gc' instead then?

Hmm it's a bit tricky.  So generally -mcpu=sifive-p670 includes rv64
but it does not override a previously specified -march=rv32 (that might
have been added by the test harness or the test target).  It looks
like it does override a (build option and thus not directly specified
when compiling) --with-arch=rv32.

For now I'd stick with something like -march=rv64gc -mtune=sifive-p670
(but please check if the original problem does occur with this).
While you're at it you could delete the redundant '/' in the first
line.

In general it's a bit counterintuitive that a test specifying a
particular CPU (that supports several extensions) might have those
extensions overridden when e.g. testing on an rv32 target not supporting
them.  We also do not support cpu names in the march string
so there is no nice way of overriding previously specified marchs.

Kito: Any idea regarding this?  I read in your commit message that
mcpu has lower precedence than march.  Right now that allows us to
somewhat silently remove architecture options that are specified
last on the command line.

aarch64 warns in case something is in conflict, maybe we should do
that as well?

At least I find it a bit annoying that we don't have a way of
saying:
"This test always needs to be compiled with all arch features of
cpu = ..." and rather need to specify -march=rv64gcv_z..._z...

Without having thought this through: couldn't mcpu have a precedence
similar to march, so that the one specified last "wins" in case of
conflicts?  Possibly with an exception for
the 32/64 bit.  Does LLVM not have this problem?

Regards
 Robin



[committed] libstdc++: Use 128-bit arithmetic for std::linear_congruential_engine [PR87744]

2024-02-15 Thread Jonathan Wakely
Tested aarch64-linux and x86_64-linux (-m64 and -m32).

Pushed to trunk.

-- >8 --

For 32-bit targets without __int128 we need to implement the LCG
transition function by hand using 64-bit types.

We can also slightly simplify the __mod function by using if-constexpr
unconditionally, disabling -Wc++17-extensions warnings with diagnostic
pragmas.
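
For a rough illustration of what the transition function has to compute
(a sketch only, not the libstdc++ implementation; it assumes __int128 is
available, which is exactly what the 32-bit targets above lack):

  #include <cstdint>

  // One LCG step: x' = (a*x + c) mod m.  The product needs a 128-bit
  // intermediate so it cannot overflow when a, x and m are close to 2^64.
  std::uint64_t
  lcg_next(std::uint64_t x, std::uint64_t a, std::uint64_t c, std::uint64_t m)
  {
    unsigned __int128 t = (unsigned __int128) a * x + c;
    return (std::uint64_t) (t % m);
  }

Without a 128-bit integer type the new code below has to emulate this
product and modulo using 64-bit limbs.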

libstdc++-v3/ChangeLog:

PR libstdc++/87744
* include/bits/random.h [!__SIZEOF_INT128__] (_Select_uint_least_t):
Define specialization for 64-bit generators with
non-power-of-two modulus and large constants.
(__mod): Use if constexpr unconditionally.
* testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error
line number.
* testsuite/26_numerics/random/linear_congruential_engine/87744.cc:
New test.
---
 libstdc++-v3/include/bits/random.h| 116 --
 .../linear_congruential_engine/87744.cc   |  22 
 .../26_numerics/random/pr60037-neg.cc |   2 +-
 3 files changed, 132 insertions(+), 8 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/26_numerics/random/linear_congruential_engine/87744.cc

diff --git a/libstdc++-v3/include/bits/random.h 
b/libstdc++-v3/include/bits/random.h
index 0fbd092e7ef..5fda21af882 100644
--- a/libstdc++-v3/include/bits/random.h
+++ b/libstdc++-v3/include/bits/random.h
@@ -64,6 +64,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Implementation-space details.
   namespace __detail
   {
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wc++17-extensions"
+
 template
  (std::numeric_limits<_UIntType>::digits)>
@@ -102,6 +105,108 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 template
   struct _Select_uint_least_t<__s, 1>
   { __extension__ using type = unsigned __int128; };
+#elif __has_builtin(__builtin_add_overflow) \
+&& __has_builtin(__builtin_sub_overflow) \
+&& defined __UINT64_TYPE__
+template
+  struct _Select_uint_least_t<__s, 1>
+  {
+   // This is NOT a general-purpose 128-bit integer type.
+   // It only supports (type(a) * x + c) % m as needed by __mod.
+   struct type
+   {
+ explicit
+ type(uint64_t __a) noexcept : _M_lo(__a), _M_hi(0) { }
+
+ // pre: __l._M_hi == 0
+ friend type
+ operator*(type __l, uint64_t __x) noexcept
+ {
+   // Split 64-bit values __l._M_lo and __x into high and low 32-bit
+   // limbs and multiply those individually.
+   // l * x = (l0 + l1) * (x0 + x1) = l0x0 + l0x1 + l1x0 + l1x1
+
+   constexpr uint64_t __mask = 0xffffffff;
+   uint64_t __ll[2] = { __l._M_lo >> 32, __l._M_lo & __mask };
+   uint64_t __xx[2] = { __x >> 32, __x & __mask };
+   uint64_t __l0x0 = __ll[0] * __xx[0];
+   uint64_t __l0x1 = __ll[0] * __xx[1];
+   uint64_t __l1x0 = __ll[1] * __xx[0];
+   uint64_t __l1x1 = __ll[1] * __xx[1];
+   // These bits are the low half of __l._M_hi
+   // and the high half of __l._M_lo.
+   uint64_t __mid
+ = (__l0x1 & __mask) + (__l1x0 & __mask) + (__l1x1 >> 32);
+   __l._M_hi = __l0x0 + (__l0x1 >> 32) + (__l1x0 >> 32) + (__mid >> 
32);
+   __l._M_lo = (__mid << 32) + (__l1x1 & __mask);
+   return __l;
+ }
+
+ friend type
+ operator+(type __l, uint64_t __c) noexcept
+ {
+   __l._M_hi += __builtin_add_overflow(__l._M_lo, __c, &__l._M_lo);
+   return __l;
+ }
+
+ friend type
+ operator%(type __l, uint64_t __m) noexcept
+ {
+   if (__builtin_expect(__l._M_hi == 0, 0))
+ {
+   __l._M_lo %= __m;
+   return __l;
+ }
+
+   int __shift = __builtin_clzll(__m) + 64
+   - __builtin_clzll(__l._M_hi);
+   type __x(0);
+   if (__shift >= 64)
+ {
+   __x._M_hi = __m << (__shift - 64);
+   __x._M_lo = 0;
+ }
+   else
+ {
+   __x._M_hi = __m >> (64 - __shift);
+   __x._M_lo = __m << __shift;
+ }
+
+   while (__l._M_hi != 0 || __l._M_lo >= __m)
+ {
+   if (__x <= __l)
+ {
+   __l._M_hi -= __x._M_hi;
+   __l._M_hi -= __builtin_sub_overflow(__l._M_lo, __x._M_lo,
+   &__l._M_lo);
+ }
+   __x._M_lo = (__x._M_lo >> 1) | (__x._M_hi << 63);
+   __x._M_hi >>= 1;
+ }
+   return __l;
+ }
+
+ // pre: __l._M_hi == 0
+ explicit operator uint64_t() const noexcept
+ { return _M_lo; }
+
+ friend bool operator<(const type& __l, const type& __r) noexcept
+ {
+   if (__l._M_hi < __r._M_hi)
+ return true;
+   else if (_

[committed] libstdc++: Avoid aliasing violation in std::valarray [PR99117]

2024-02-15 Thread Jonathan Wakely
Tested aarch64-linux and x86_64-linux. Pushed to trunk.

This should be backported too, as it's a regression fix.

-- >8 --

The call to __valarray_copy constructs an _Array object to refer to
this->_M_data but that means that accesses to this->_M_data are through
a restrict-qualified pointer. This leads to undefined behaviour when
copying from an _Expr object that actually aliases this->_M_data.

Replace the call to __valarray_copy with a plain loop. I think this
removes the only use of that overload of __valarray_copy, so it could
probably be removed. I haven't done that here.
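
For reference, a minimal sketch of the aliasing situation this fixes (the
function name is made up for illustration):

  #include <valarray>

  void
  accumulate(std::valarray<int>& sum, const std::valarray<int>& e)
  {
    // The right-hand side is an expression template that refers to sum's
    // own elements, so copying it into sum through a restrict-qualified
    // pointer would be undefined behaviour; a plain element-wise loop is OK.
    sum = sum + e;
  }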

libstdc++-v3/ChangeLog:

PR libstdc++/99117
* include/std/valarray (valarray::operator=(const _Expr&)):
Use loop to copy instead of __valarray_copy with _Array.
* testsuite/26_numerics/valarray/99117.cc: New test.
---
 libstdc++-v3/include/std/valarray   |  8 +++-
 .../testsuite/26_numerics/valarray/99117.cc | 17 +
 2 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/26_numerics/valarray/99117.cc

diff --git a/libstdc++-v3/include/std/valarray 
b/libstdc++-v3/include/std/valarray
index a4eecd833f7..46cd57e7982 100644
--- a/libstdc++-v3/include/std/valarray
+++ b/libstdc++-v3/include/std/valarray
@@ -840,7 +840,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 630. arrays of valarray.
   if (_M_size == __e.size())
-   std::__valarray_copy(__e, _M_size, _Array<_Tp>(_M_data));
+   {
+ // Copy manually instead of using __valarray_copy, because __e might
+ // alias _M_data and the _Array param type of __valarray_copy uses
+ // restrict which doesn't allow aliasing.
+ for (size_t __i = 0; __i < _M_size; ++__i)
+   _M_data[__i] = __e[__i];
+   }
   else
{
  if (_M_data)
diff --git a/libstdc++-v3/testsuite/26_numerics/valarray/99117.cc 
b/libstdc++-v3/testsuite/26_numerics/valarray/99117.cc
new file mode 100644
index 000..81621bd079a
--- /dev/null
+++ b/libstdc++-v3/testsuite/26_numerics/valarray/99117.cc
@@ -0,0 +1,17 @@
+// { dg-do run { target c++11 } }
+
+// PR libstdc++/99117 cannot accumulate std::valarray
+
+#include 
+#include 
+#include 
+
+int main()
+{
+std::vector> v = {{1,1}, {2,2}};
+std::valarray sum(2);
+for (const auto& e : v)
+  sum = sum + e;
+VERIFY(sum[0]==3);
+VERIFY(sum[1]==3);
+}
-- 
2.43.0



[committed] libstdc++: Update tzdata to 2024a

2024-02-15 Thread Jonathan Wakely
Tested aarch64-linux.  Pushed to trunk.

This should be backported to gcc-13 too.

-- >8 --

Import the new 2024a tzdata.zi file. The leapseconds file was also
updated to have a new expiry (no new leap seconds were added).

libstdc++-v3/ChangeLog:

* src/c++20/tzdata.zi: Import new file from 2024a release.
* src/c++20/tzdb.cc (tzdb_list::_Node::_S_read_leap_seconds)
Update expiry date for leap seconds list.
---
 libstdc++-v3/src/c++20/tzdata.zi | 3970 +++---
 libstdc++-v3/src/c++20/tzdb.cc   |4 +-
 2 files changed, 1986 insertions(+), 1988 deletions(-)

diff --git a/libstdc++-v3/src/c++20/tzdata.zi b/libstdc++-v3/src/c++20/tzdata.zi
index 4e01359010c..be1c4085920 100644
--- a/libstdc++-v3/src/c++20/tzdata.zi
+++ b/libstdc++-v3/src/c++20/tzdata.zi
@@ -1,4 +1,4 @@
-# version 2023d
+# version 2024a
 # This zic input file is in the public domain.
 R d 1916 o - Jun 14 23s 1 S
 R d 1916 1919 - O Su>=1 23s 0 -
@@ -22,27 +22,6 @@ R d 1978 o - Mar 24 1 1 S
 R d 1978 o - S 22 3 0 -
 R d 1980 o - Ap 25 0 1 S
 R d 1980 o - O 31 2 0 -
-Z Africa/Algiers 0:12:12 - LMT 1891 Mar 16
-0:9:21 - PMT 1911 Mar 11
-0 d WE%sT 1940 F 25 2
-1 d CE%sT 1946 O 7
-0 - WET 1956 Ja 29
-1 - CET 1963 Ap 14
-0 d WE%sT 1977 O 21
-1 d CE%sT 1979 O 26
-0 d WE%sT 1981 May
-1 - CET
-Z Atlantic/Cape_Verde -1:34:4 - LMT 1912 Ja 1 2u
--2 - -02 1942 S
--2 1 -01 1945 O 15
--2 - -02 1975 N 25 2
--1 - -01
-Z Africa/Ndjamena 1:0:12 - LMT 1912
-1 - WAT 1979 O 14
-1 1 WAST 1980 Mar 8
-1 - WAT
-Z Africa/Abidjan -0:16:8 - LMT 1912
-0 - GMT
 R K 1940 o - Jul 15 0 1 S
 R K 1940 o - O 1 0 0 -
 R K 1941 o - Ap 15 0 1 S
@@ -77,21 +56,6 @@ R K 2014 o - Jul 31 24 1 S
 R K 2014 o - S lastTh 24 0 -
 R K 2023 ma - Ap lastF 0 1 S
 R K 2023 ma - O lastTh 24 0 -
-Z Africa/Cairo 2:5:9 - LMT 1900 O
-2 K EE%sT
-Z Africa/Bissau -1:2:20 - LMT 1912 Ja 1 1u
--1 - -01 1975
-0 - GMT
-Z Africa/Nairobi 2:27:16 - LMT 1908 May
-2:30 - +0230 1928 Jun 30 24
-3 - EAT 1930 Ja 4 24
-2:30 - +0230 1936 D 31 24
-2:45 - +0245 1942 Jul 31 24
-3 - EAT
-Z Africa/Monrovia -0:43:8 - LMT 1882
--0:43:8 - MMT 1919 Mar
--0:44:30 - MMT 1972 Ja 7
-0 - GMT
 R L 1951 o - O 14 2 1 S
 R L 1952 o - Ja 1 0 0 -
 R L 1953 o - O 9 2 1 S
@@ -109,21 +73,10 @@ R L 1997 o - Ap 4 0 1 S
 R L 1997 o - O 4 0 0 -
 R L 2013 o - Mar lastF 1 1 S
 R L 2013 o - O lastF 2 0 -
-Z Africa/Tripoli 0:52:44 - LMT 1920
-1 L CE%sT 1959
-2 - EET 1982
-1 L CE%sT 1990 May 4
-2 - EET 1996 S 30
-1 L CE%sT 1997 O 4
-2 - EET 2012 N 10 2
-1 L CE%sT 2013 O 25 2
-2 - EET
 R MU 1982 o - O 10 0 1 -
 R MU 1983 o - Mar 21 0 0 -
 R MU 2008 o - O lastSu 2 1 -
 R MU 2009 o - Mar lastSu 2 0 -
-Z Indian/Mauritius 3:50 - LMT 1907
-4 MU +04/+05
 R M 1939 o - S 12 0 1 -
 R M 1939 o - N 19 0 0 -
 R M 1940 o - F 25 0 1 -
@@ -307,53 +260,15 @@ R M 2086 o - Ap 14 3 -1 -
 R M 2086 o - May 19 2 0 -
 R M 2087 o - Mar 30 3 -1 -
 R M 2087 o - May 11 2 0 -
-Z Africa/Casablanca -0:30:20 - LMT 1913 O 26
-0 M +00/+01 1984 Mar 16
-1 - +01 1986
-0 M +00/+01 2018 O 28 3
-1 M +01/+00
-Z Africa/El_Aaiun -0:52:48 - LMT 1934
--1 - -01 1976 Ap 14
-0 M +00/+01 2018 O 28 3
-1 M +01/+00
-Z Africa/Maputo 2:10:20 - LMT 1903 Mar
-2 - CAT
 R NA 1994 o - Mar 21 0 -1 WAT
 R NA 1994 2017 - S Su>=1 2 0 CAT
 R NA 1995 2017 - Ap Su>=1 2 -1 WAT
-Z Africa/Windhoek 1:8:24 - LMT 1892 F 8
-1:30 - +0130 1903 Mar
-2 - SAST 1942 S 20 2
-2 1 SAST 1943 Mar 21 2
-2 - SAST 1990 Mar 21
-2 NA %s
-Z Africa/Lagos 0:13:35 - LMT 1905 Jul
-0 - GMT 1908 Jul
-0:13:35 - LMT 1914
-0:30 - +0030 1919 S
-1 - WAT
-Z Africa/Sao_Tome 0:26:56 - LMT 1884
--0:36:45 - LMT 1912 Ja 1 0u
-0 - GMT 2018 Ja 1 1
-1 - WAT 2019 Ja 1 2
-0 - GMT
 R SA 1942 1943 - S Su>=15 2 1 -
 R SA 1943 1944 - Mar Su>=15 2 0 -
-Z Africa/Johannesburg 1:52 - LMT 1892 F 8
-1:30 - SAST 1903 Mar
-2 SA SAST
 R SD 1970 o - May 1 0 1 S
 R SD 1970 1985 - O 15 0 0 -
 R SD 1971 o - Ap 30 0 1 S
 R SD 1972 1985 - Ap lastSu 0 1 S
-Z Africa/Khartoum 2:10:8 - LMT 1931
-2 SD CA%sT 2000 Ja 15 12
-3 - EAT 2017 N
-2 - CAT
-Z Africa/Juba 2:6:28 - LMT 1931
-2 SD CA%sT 2000 Ja 15 12
-3 - EAT 2021 F
-2 - CAT
 R n 1939 o - Ap 15 23s 1 S
 R n 1939 o - N 18 23s 0 -
 R n 1940 o - F 25 23s 1 S
@@ -379,90 +294,14 @@ R n 2005 o - May 1 0s 1 S
 R n 2005 o - S 30 1s 0 -
 R n 2006 2008 - Mar lastSu 2s 1 S
 R n 2006 2008 - O lastSu 2s 0 -
-Z Africa/Tunis 0:40:44 - LMT 1881 May 12
-0:9:21 - PMT 1911 Mar 11
-1 n CE%sT
-Z Antarctica/Casey 0 - -00 1969
-8 - +08 2009 O 18 2
-11 - +11 2010 Mar 5 2
-8 - +08 2011 O 28 2
-11 - +11 2012 F 21 17u
-8 - +08 2016 O 22
-11 - +11 2018 Mar 11 4
-8 - +08 2018 O 7 4
-11 - +11 2019 Mar 17 3
-8 - +08 2019 O 4 3
-11 - +11 2020 Mar 8 3
-8 - +08 2020 O 4 0:1
-11 - +11 2021 Mar 14
-8 - +08 2021 O 3 0:1
-11 - +11 2022 Mar 13
-8 - +08 2022 O 2 0:1
-11 - +11 2023 Mar 9 3
-8 - +08
-Z Antarctica/Davis 0 - -00 1957 Ja 13
-7 - +07 1964 N
-0 - -00 1969 F
-7 - +07 2009 O 18 2
-5 - +05 2010 Mar 10 20u
-7 - +07 2011 O 28 2
-5 - +05 2012 F 21 20u
-7 - +07
-Z Antarctica/Mawson 0 - -00 1954 F 13
-6 - +06 2009 O 18 2
-5

[committed] libstdc++: Remove redundant zeroing in std::bitset::operator>>= [PR113806]

2024-02-15 Thread Jonathan Wakely
Tested aarch64-linux and x86_64-linux. Pushed to trunk.

-- >8 --

The unused bits in the high word are already zero before this operation.
Shifting the used bits to the right cannot affect the unused bits, so we
don't need to sanitize them.
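
A small sketch of why this holds (illustrative only, not part of the
change): the padding bits of the last word are kept at zero, and a right
shift can only move zeros towards them.

  #include <bitset>
  #include <cassert>

  int main()
  {
    std::bitset<3> b("101");    // one word, the unused high bits are zero
    b >>= 1;                    // zeros shift in from the left
    assert(b.to_ulong() == 2);  // padding still zero, no sanitizing needed
  }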

libstdc++-v3/ChangeLog:

PR libstdc++/113806
* include/std/bitset (bitset::operator>>=): Remove redundant
call to _M_do_sanitize.
---
 libstdc++-v3/include/std/bitset | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/std/bitset b/libstdc++-v3/include/std/bitset
index 16c4040f532..ccd6d19f7a4 100644
--- a/libstdc++-v3/include/std/bitset
+++ b/libstdc++-v3/include/std/bitset
@@ -1094,10 +1094,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   operator>>=(size_t __position) _GLIBCXX_NOEXCEPT
   {
if (__builtin_expect(__position < _Nb, 1))
- {
-   this->_M_do_right_shift(__position);
-   this->_M_do_sanitize();
- }
+ this->_M_do_right_shift(__position);
else
  this->_M_do_reset();
return *this;
-- 
2.43.0



[committed] libstdc++: Use unsigned division in std::rotate [PR113811]

2024-02-15 Thread Jonathan Wakely
Tested aarch64-linux and x86_64-linux. Pushed to trunk.

-- >8 --

Signed 64-bit division is much slower than unsigned, so cast the n and
k values to unsigned before doing n %= k. We know this is safe because
neither value can be negative.

libstdc++-v3/ChangeLog:

PR libstdc++/113811
* include/bits/stl_algo.h (__rotate): Use unsigned values for
division.
---
 libstdc++-v3/include/bits/stl_algo.h | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_algo.h 
b/libstdc++-v3/include/bits/stl_algo.h
index 9496b53f887..7a0cf6b6737 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -1251,6 +1251,12 @@ _GLIBCXX_BEGIN_INLINE_ABI_NAMESPACE(_V2)
   typedef typename iterator_traits<_RandomAccessIterator>::value_type
_ValueType;
 
+#if __cplusplus >= 201103L
+  typedef typename make_unsigned<_Distance>::type _UDistance;
+#else
+  typedef _Distance _UDistance;
+#endif
+
   _Distance __n = __last   - __first;
   _Distance __k = __middle - __first;
 
@@ -1281,7 +1287,7 @@ _GLIBCXX_BEGIN_INLINE_ABI_NAMESPACE(_V2)
  ++__p;
  ++__q;
}
- __n %= __k;
+ __n = static_cast<_UDistance>(__n) % static_cast<_UDistance>(__k);
  if (__n == 0)
return __ret;
  std::swap(__n, __k);
@@ -1305,7 +1311,7 @@ _GLIBCXX_BEGIN_INLINE_ABI_NAMESPACE(_V2)
  --__q;
  std::iter_swap(__p, __q);
}
- __n %= __k;
+ __n = static_cast<_UDistance>(__n) % static_cast<_UDistance>(__k);
  if (__n == 0)
return __ret;
  std::swap(__n, __k);
-- 
2.43.0



[committed] libstdc++: Use memset to optimize std::bitset::set() [PR113807]

2024-02-15 Thread Jonathan Wakely
Tested aarch64-linux and x86_64-linux. Pushed to trunk.

-- >8 --

As pointed out in the PR we already do this for reset().
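
The idea, sketched outside the class (illustrative only):

  #include <cstdint>
  #include <cstring>

  // Setting every byte to 0xFF yields all-ones words, equivalent to the
  // per-word loop but expressed as a single memset that the compiler can
  // expand to wide stores.  A constant-evaluation path still needs the loop.
  void
  set_all(std::uint64_t* words, std::size_t n)
  {
    std::memset(words, 0xFF, n * sizeof(std::uint64_t));
  }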

libstdc++-v3/ChangeLog:

PR libstdc++/113807
* include/std/bitset (bitset::set()): Use memset instead of a
loop over the individual words.
---
 libstdc++-v3/include/std/bitset | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/bitset b/libstdc++-v3/include/std/bitset
index 3243c649731..16c4040f532 100644
--- a/libstdc++-v3/include/std/bitset
+++ b/libstdc++-v3/include/std/bitset
@@ -177,8 +177,15 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   _GLIBCXX14_CONSTEXPR void
   _M_do_set() _GLIBCXX_NOEXCEPT
   {
-   for (size_t __i = 0; __i < _Nw; __i++)
- _M_w[__i] = ~static_cast<_WordT>(0);
+#if __cplusplus >= 201402L
+   if (__builtin_is_constant_evaluated())
+ {
+   for (_WordT& __w : _M_w)
+ __w = ~static_cast<_WordT>(0);;
+   return;
+ }
+#endif
+   __builtin_memset(_M_w, 0xFF, _Nw * sizeof(_WordT));
   }
 
   _GLIBCXX14_CONSTEXPR void
-- 
2.43.0



Re: [PATCH] gccrs: Avoid *.bak suffixed tests - use dg-skip-if instead

2024-02-15 Thread Arthur Cohen

Hi Jakub,

On 2/15/24 10:10, Jakub Jelinek wrote:

On Fri, Feb 09, 2024 at 11:03:38AM +0100, Jakub Jelinek wrote:

On Wed, Feb 07, 2024 at 12:43:59PM +0100, arthur.co...@embecosm.com wrote:

From: Philip Herron 

This patch introduces one regression because generics are getting better
understood over time. The code here used to apply generics with the same
symbol from previous segments, which was a bit of a hack with our limited
inference variable support. The regression looks like it will be related
to another issue which needs to default integer inference variables much
more aggressively to default integer.

Fixes #2723

gcc/rust/ChangeLog:

* typecheck/rust-hir-type-check-path.cc 
(TypeCheckExpr::resolve_segments): remove hack

gcc/testsuite/ChangeLog:

* rust/compile/issue-1773.rs: Moved to...
* rust/compile/issue-1773.rs.bak: ...here.


Please don't use such suffixes in the testsuite.
Either delete the testcase, or xfail it somehow until the bug is fixed.


To be precise, I have scripts to look for backup files in the tree (*~,
*.bak, *.orig, *.rej etc.) and this stands in the way several times a day.

Here is a fix for that in patch form, tested on x86_64-linux with
make check-rust RUNTESTFLAGS='compile.exp=issue-1773.rs'
Ok for trunk?
2024-02-15  Jakub Jelinek  

* rust/compile/issue-1773.rs.bak: Rename to ...
* rust/compile/issue-1773.rs: ... this.  Add dg-skip-if directive.

diff --git a/gcc/testsuite/rust/compile/issue-1773.rs.bak 
b/gcc/testsuite/rust/compile/issue-1773.rs
similarity index 89%
rename from gcc/testsuite/rust/compile/issue-1773.rs.bak
rename to gcc/testsuite/rust/compile/issue-1773.rs
index a4542aea00b..468497a4792 100644
--- a/gcc/testsuite/rust/compile/issue-1773.rs.bak
+++ b/gcc/testsuite/rust/compile/issue-1773.rs
@@ -1,4 +1,5 @@
  #[lang = "sized"]
+// { dg-skip-if "" { *-*-* } }
  pub trait Sized {}
  
  trait Foo {


Jakub




Looks good to me, thanks for taking the time! OK for trunk.

Best,

Arthur


Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-15 Thread Richard Biener
On Thu, 15 Feb 2024, Andrew Stubbs wrote:

> On 15/02/2024 10:21, Richard Biener wrote:
> [snip]
> >>> I suppose if RDNA really only has 32 lane vectors (it sounds like it,
> >>> even if it can "simulate" 64 lane ones?) then it might make sense to
> >>> vectorize for 32 lanes?  That said, with variable-length it likely
> >>> doesn't matter but I'd not expose fixed-size modes with 64 lanes then?
> >>
> >> For most operations, wavefrontsize=64 works just fine; the GPU runs each
> >> instruction twice and presents a pair of hardware registers as a logical
> >> 64-lane register. This breaks down for permutations and reductions, and is
> >> obviously inefficient when the vectors are not fully utilized, but is
> >> otherwise compatible with the GCN/CDNA compiler.
> >>
> >> I didn't want to invest all the effort it would take to support
> >> wavefrontsize=32, which would be the natural mode for these devices; the
> >> number of places that have "64" hard-coded is just too big. Not only that,
> >> but
> >> the EXEC and VCC registers change from DImode to SImode and that's going to
> >> break a lot of stuff. (And we have no paying customer for this.)
> >>
> >> I'm open to patch submissions. :)
> > 
> > OK, I see ;)  As said for fully masked that's a good answer.  I'd
> > probably still not expose V64mode modes in the RTL expanders for the
> > vect_* patterns?  Or, what happens if you change
> > gcn_vectorize_preferred_simd_mode to return 32 lane modes for RDNA
> > and omit 64 lane modes from gcn_autovectorize_vector_modes for RDNA?
> 
> Changing the preferred mode probably would fix permute.
> 
> > Does that possibly leave performance on the plate? (not sure if there's
> > any documents about choosing wavefrontsize=64 vs 32 with regard to
> > performance)
> > 
> > Note it would entirely forbid the vectorizer from using larger modes,
> > it just makes it prefer the smaller ones.  OTOH if you then run
> > wavefrontsize=64 on top of it, it's probably wasting the 2nd instruction
> > by always masking it?
> 
> Right, the GPU will continue to process the "top half" of the vector as an
> additional step, regardless whether you put anything useful there, or not.
> 
> > So yeah.  Guess a s/64/wavefrontsize/ would be a first step towards
> > allowing 32 there ...
> 
> I think the DImode to SImode change is the most difficult fix. Unless you know
> of a cunning trick, that's going to mean a lot of changes to a lot of the
> machine description; substitutions, duplications, iterators, indirections,
> etc., etc., etc.

Hmm, maybe just leave it at DImode in the patterns?  OTOH mode
iterators to do both SImode and DImode might work as well, but yeah,
a lot of churn.

Richard.


Re: [PATCH] bpf: fix zero_extendqidi2 ldx template

2024-02-15 Thread Jose E. Marchesi


Hi Faust.
OK, thank you.

> Commit 77d0f9ec3809b4d2e32c36069b6b9239d301c030 inadvertently changed
> the normal asm dialect instruction template for zero_extendqidi2 from
> ldxb to ldxh. Fix that.
>
> Tested for bpf-unknown-none on x86_64-linux-gnu host.
>
> gcc/
>
>   * config/bpf/bpf.md (zero_extendqidi2): Correct asm template to
>   use ldxb instead of ldxh.
> ---
>  gcc/config/bpf/bpf.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
> index 080a63cd970..50df1aaa3e2 100644
> --- a/gcc/config/bpf/bpf.md
> +++ b/gcc/config/bpf/bpf.md
> @@ -292,7 +292,7 @@ (define_insn "zero_extendqidi2"
>"@
> {and\t%0,0xff|%0 &= 0xff}
> {mov\t%0,%1\;and\t%0,0xff|%0 = %1;%0 &= 0xff}
> -   {ldxh\t%0,%1|%0 = *(u8 *) (%1)}"
> +   {ldxb\t%0,%1|%0 = *(u8 *) (%1)}"
>[(set_attr "type" "alu,alu,ldx")])
>  
>  (define_insn "zero_extendsidi2"


[PATCH v4 10/12] libstdc++: Optimize std::add_rvalue_reference compilation performance

2024-02-15 Thread Ken Matsui
This patch optimizes the compilation performance of
std::add_rvalue_reference by dispatching to the new
__add_rvalue_reference built-in trait.
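
For context, the corner cases the helper (and presumably the built-in) must
agree on look like this; a compile-time sketch, not part of the patch:

  #include <type_traits>

  // Reference collapsing and non-referenceable types are the interesting bits.
  static_assert(std::is_same<std::add_rvalue_reference<int>::type, int&&>::value, "");
  static_assert(std::is_same<std::add_rvalue_reference<int&>::type, int&>::value, "");
  static_assert(std::is_same<std::add_rvalue_reference<void>::type, void>::value, "");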

libstdc++-v3/ChangeLog:

* include/std/type_traits (add_rvalue_reference): Use
__add_rvalue_reference built-in trait.
(__add_rvalue_reference_helper): Likewise.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 12 
 1 file changed, 12 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 1f4e6db72f4..219d36fabba 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -1157,6 +1157,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 };
 
   /// @cond undocumented
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__add_rvalue_reference)
+  template
+struct __add_rvalue_reference_helper
+{ using type = __add_rvalue_reference(_Tp); };
+#else
   template
 struct __add_rvalue_reference_helper
 { using type = _Tp; };
@@ -1164,6 +1169,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 struct __add_rvalue_reference_helper<_Tp, __void_t<_Tp&&>>
 { using type = _Tp&&; };
+#endif
 
   template
 using __add_rval_ref_t = typename __add_rvalue_reference_helper<_Tp>::type;
@@ -1720,9 +1726,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 
   /// add_rvalue_reference
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__add_rvalue_reference)
+  template
+struct add_rvalue_reference
+{ using type = __add_rvalue_reference(_Tp); };
+#else
   template
 struct add_rvalue_reference
 { using type = __add_rval_ref_t<_Tp>; };
+#endif
 
 #if __cplusplus > 201103L
   /// Alias template for remove_reference
-- 
2.43.0



[PATCH v4 05/12] c++: Implement __remove_all_extents built-in trait

2024-02-15 Thread Ken Matsui
This patch implements a built-in trait for std::remove_all_extents.

gcc/cp/ChangeLog:

* cp-trait.def: Define __remove_all_extents.
* semantics.cc (finish_trait_type): Handle
CPTK_REMOVE_ALL_EXTENTS.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of
__remove_all_extents.
* g++.dg/ext/remove_all_extents.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/cp-trait.def   |  1 +
 gcc/cp/semantics.cc   |  3 +++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C  |  3 +++
 gcc/testsuite/g++.dg/ext/remove_all_extents.C | 16 
 4 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/remove_all_extents.C

diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 3ff5611b60e..ce29108bad6 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -94,6 +94,7 @@ DEFTRAIT_EXPR (IS_TRIVIALLY_COPYABLE, 
"__is_trivially_copyable", 1)
 DEFTRAIT_EXPR (IS_UNION, "__is_union", 1)
 DEFTRAIT_EXPR (REF_CONSTRUCTS_FROM_TEMPORARY, 
"__reference_constructs_from_temporary", 2)
 DEFTRAIT_EXPR (REF_CONVERTS_FROM_TEMPORARY, 
"__reference_converts_from_temporary", 2)
+DEFTRAIT_TYPE (REMOVE_ALL_EXTENTS, "__remove_all_extents", 1)
 DEFTRAIT_TYPE (REMOVE_CV, "__remove_cv", 1)
 DEFTRAIT_TYPE (REMOVE_CVREF, "__remove_cvref", 1)
 DEFTRAIT_TYPE (REMOVE_EXTENT, "__remove_extent", 1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 6ab054b106a..c8ac5167c3c 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12769,6 +12769,9 @@ finish_trait_type (cp_trait_kind kind, tree type1, tree 
type2,
type1 = TREE_TYPE (type1);
   return build_pointer_type (type1);
 
+case CPTK_REMOVE_ALL_EXTENTS:
+  return strip_array_types (type1);
+
 case CPTK_REMOVE_CV:
   return cv_unqualified (type1);
 
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 4f1094befb9..9af64173524 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -164,6 +164,9 @@
 #if !__has_builtin (__reference_converts_from_temporary)
 # error "__has_builtin (__reference_converts_from_temporary) failed"
 #endif
+#if !__has_builtin (__remove_all_extents)
+# error "__has_builtin (__remove_all_extents) failed"
+#endif
 #if !__has_builtin (__remove_cv)
 # error "__has_builtin (__remove_cv) failed"
 #endif
diff --git a/gcc/testsuite/g++.dg/ext/remove_all_extents.C 
b/gcc/testsuite/g++.dg/ext/remove_all_extents.C
new file mode 100644
index 000..60ade2ade7f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/remove_all_extents.C
@@ -0,0 +1,16 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+class ClassType { };
+
+SA(__is_same(__remove_all_extents(int), int));
+SA(__is_same(__remove_all_extents(int[2]), int));
+SA(__is_same(__remove_all_extents(int[2][3]), int));
+SA(__is_same(__remove_all_extents(int[][3]), int));
+SA(__is_same(__remove_all_extents(const int[2][3]), const int));
+SA(__is_same(__remove_all_extents(ClassType), ClassType));
+SA(__is_same(__remove_all_extents(ClassType[2]), ClassType));
+SA(__is_same(__remove_all_extents(ClassType[2][3]), ClassType));
+SA(__is_same(__remove_all_extents(ClassType[][3]), ClassType));
+SA(__is_same(__remove_all_extents(const ClassType[2][3]), const ClassType));
-- 
2.43.0



[PATCH v4 08/12] libstdc++: Optimize std::add_lvalue_reference compilation performance

2024-02-15 Thread Ken Matsui
This patch optimizes the compilation performance of
std::add_lvalue_reference by dispatching to the new
__add_lvalue_reference built-in trait.
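
A rough way to see the effect (not part of the patch): instantiate the trait
over many distinct types and compare "time g++ -std=c++17 -c" before and after
the change.  The file name and the count below are made up for illustration.

  // bench_add_lvalue_reference.cc -- hypothetical micro-benchmark.
  #include <type_traits>
  #include <utility>

  template<int N> struct Tag { };

  template<int... Is>
  void instantiate(std::integer_sequence<int, Is...>)
  {
    // Forces one instantiation of add_lvalue_reference per distinct Tag<N>.
    ((void) sizeof(std::add_lvalue_reference_t<Tag<Is>>), ...);
  }

  int main()
  {
    instantiate(std::make_integer_sequence<int, 4096>{});
  }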

libstdc++-v3/ChangeLog:

* include/std/type_traits (add_lvalue_reference): Use
__add_lvalue_reference built-in trait.
(__add_lvalue_reference_helper): Likewise.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 12 
 1 file changed, 12 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 2e1cc1c1d5f..1f4e6db72f4 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -1129,6 +1129,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 };
 
   /// @cond undocumented
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__add_lvalue_reference)
+  template<typename _Tp>
+struct __add_lvalue_reference_helper
+{ using type = __add_lvalue_reference(_Tp); };
+#else
   template<typename _Tp, typename = void>
 struct __add_lvalue_reference_helper
 { using type = _Tp; };
@@ -1136,6 +1141,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _Tp>
 struct __add_lvalue_reference_helper<_Tp, __void_t<_Tp&>>
 { using type = _Tp&; };
+#endif
 
   template<typename _Tp>
 using __add_lval_ref_t = typename __add_lvalue_reference_helper<_Tp>::type;
@@ -1703,9 +1709,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 
   /// add_lvalue_reference
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__add_lvalue_reference)
+  template<typename _Tp>
+struct add_lvalue_reference
+{ using type = __add_lvalue_reference(_Tp); };
+#else
   template<typename _Tp>
 struct add_lvalue_reference
 { using type = __add_lval_ref_t<_Tp>; };
+#endif
 
   /// add_rvalue_reference
   template
-- 
2.43.0



[PATCH v4 11/12] c++: Implement __decay built-in trait

2024-02-15 Thread Ken Matsui
This patch implements a built-in trait for std::decay.
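
As an illustration of the mapping (it mirrors std::decay; the asserts assume a
compiler that provides the new builtin, alongside the existing __is_same
builtin used in the testsuite):

  static_assert(__is_same(__decay(int(&)[4]), int*), "");          // array -> pointer
  static_assert(__is_same(__decay(void(&)()), void(*)()), "");     // function -> pointer
  static_assert(__is_same(__decay(const volatile int), int), "");  // cv dropped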

gcc/cp/ChangeLog:

* cp-trait.def: Define __decay.
* semantics.cc (finish_trait_type): Handle CPTK_DECAY.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __decay.
* g++.dg/ext/decay.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/cp-trait.def  |  1 +
 gcc/cp/semantics.cc  | 12 
 gcc/testsuite/g++.dg/ext/decay.C | 39 
 gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 ++
 4 files changed, 55 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/decay.C

diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 9e8f9eb38b8..11270f3ae6b 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -51,6 +51,7 @@
 DEFTRAIT_TYPE (ADD_LVALUE_REFERENCE, "__add_lvalue_reference", 1)
 DEFTRAIT_TYPE (ADD_POINTER, "__add_pointer", 1)
 DEFTRAIT_TYPE (ADD_RVALUE_REFERENCE, "__add_rvalue_reference", 1)
+DEFTRAIT_TYPE (DECAY, "__decay", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_ASSIGN, "__has_nothrow_assign", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_CONSTRUCTOR, "__has_nothrow_constructor", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_COPY, "__has_nothrow_copy", 1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index f437e272ea6..256e7ef8166 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12785,6 +12785,18 @@ finish_trait_type (cp_trait_kind kind, tree type1, 
tree type2,
return type1;
   return cp_build_reference_type (type1, /*rval=*/true);
 
+case CPTK_DECAY:
+  if (TYPE_REF_P (type1))
+   type1 = TREE_TYPE (type1);
+
+  if (TREE_CODE (type1) == ARRAY_TYPE)
+   return finish_trait_type (CPTK_ADD_POINTER, TREE_TYPE (type1), type2,
+ complain);
+  else if (TREE_CODE (type1) == FUNCTION_TYPE)
+   return finish_trait_type (CPTK_ADD_POINTER, type1, type2, complain);
+  else
+   return cv_unqualified (type1);
+
 case CPTK_REMOVE_ALL_EXTENTS:
   return strip_array_types (type1);
 
diff --git a/gcc/testsuite/g++.dg/ext/decay.C b/gcc/testsuite/g++.dg/ext/decay.C
new file mode 100644
index 000..cf224b7452c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/decay.C
@@ -0,0 +1,39 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+// class ClassType { };
+
+// Positive tests.
+using test1_type = __decay(bool);
+SA(__is_same(test1_type, bool));
+
+// NB: DR 705.
+using test2_type = __decay(const int);
+SA(__is_same(test2_type, int));
+
+using test3_type = __decay(int[4]);
+SA(__is_same(test3_type, __remove_extent(int[4])*));
+
+using fn_type = void ();
+using test4_type = __decay(fn_type);
+SA(__is_same(test4_type, __add_pointer(fn_type)));
+
+using cfn_type = void () const;
+using test5_type = __decay(cfn_type);
+SA(__is_same(test5_type, cfn_type));
+
+// SA(__is_same(__add_rvalue_reference(int), int&&));
+// SA(__is_same(__add_rvalue_reference(int&&), int&&));
+// SA(__is_same(__add_rvalue_reference(int&), int&));
+// SA(__is_same(__add_rvalue_reference(const int), const int&&));
+// SA(__is_same(__add_rvalue_reference(int*), int*&&));
+// SA(__is_same(__add_rvalue_reference(ClassType&&), ClassType&&));
+// SA(__is_same(__add_rvalue_reference(ClassType), ClassType&&));
+// SA(__is_same(__add_rvalue_reference(int(int)), int(&&)(int)));
+// SA(__is_same(__add_rvalue_reference(void), void));
+// SA(__is_same(__add_rvalue_reference(const void), const void));
+// SA(__is_same(__add_rvalue_reference(bool(int) const), bool(int) const));
+// SA(__is_same(__add_rvalue_reference(bool(int) &), bool(int) &));
+// SA(__is_same(__add_rvalue_reference(bool(int) const &&), bool(int) const 
&&));
+// SA(__is_same(__add_rvalue_reference(bool(int)), bool(&&)(int)));
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 9d7e59b47fb..5b590db1cf6 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -11,6 +11,9 @@
 #if !__has_builtin (__add_rvalue_reference)
 # error "__has_builtin (__add_rvalue_reference) failed"
 #endif
+#if !__has_builtin (__decay)
+# error "__has_builtin (__decay) failed"
+#endif
 #if !__has_builtin (__builtin_addressof)
 # error "__has_builtin (__builtin_addressof) failed"
 #endif
-- 
2.43.0



[patch,avr,applied] Fix PR target/113927: Simple code triggers a stack frame

2024-02-15 Thread Georg-Johann Lay

Applied this patch

Johann

--

AVR: target 113927 - Simple code triggers stack frame for Reduced Tiny.

The -mmcu=avrtiny cores have no ADIW and SBIW instructions.  This was
implemented by clearing all regs out of regclass ADDW_REGS so that
constraint "w" never matched.  This corrupted the subset relations of
the register classes as they appear in enum reg_class.

This patch keeps ADDW_REGS like for all other cores, i.e. it contains
R24...R31.  Instead of tests like  test_hard_reg_class (ADDW_REGS, *)
the code now uses  avr_adiw_reg_p (*).  And all insns with constraint "w"
get "isa" insn attribute value of "adiw".

Plus, a new built-in macro __AVR_HAVE_ADIW__ is provided, which is more
specific than __AVR_TINY__.
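
For example (illustrative user code, not part of the patch), sources that
currently infer ADIW availability from !defined(__AVR_TINY__) can test the new
macro directly:

  #ifdef __AVR_HAVE_ADIW__
  /* Cores where word operations on R24..R31 can use ADIW/SBIW.  */
  #else
  /* Cores without ADIW/SBIW, e.g. Reduced Tiny.  */
  #endif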

gcc/
PR target/113927
* config/avr/avr.h (AVR_HAVE_ADIW): New macro.
* config/avr/avr-protos.h (avr_adiw_reg_p): New proto.
* config/avr/avr.cc (avr_adiw_reg_p): New function.
(avr_conditional_register_usage) [AVR_TINY]: Don't clear ADDW_REGS.
Replace test_hard_reg_class (ADDW_REGS, ...) with calls to avr_adiw_reg_p.
* config/avr/avr.md: Same.
(attr "isa") : Remove.
: Add.
(define_insn, define_insn_and_split): When an alternative has
constraint "w", then set attribute "isa" to "adiw".
* config/avr/avr-c.cc (avr_cpu_cpp_builtins) [AVR_HAVE_ADIW]:
Built-in define __AVR_HAVE_ADIW__.
* doc/invoke.texi (AVR Options): Document it.
diff --git a/gcc/config/avr/avr-c.cc b/gcc/config/avr/avr-c.cc
index 60905a76556..5e7f759ed73 100644
--- a/gcc/config/avr/avr-c.cc
+++ b/gcc/config/avr/avr-c.cc
@@ -307,6 +307,7 @@ avr_cpu_cpp_builtins (struct cpp_reader *pfile)
   if (AVR_HAVE_ELPMX)cpp_define (pfile, "__AVR_HAVE_ELPMX__");
   if (AVR_HAVE_MOVW) cpp_define (pfile, "__AVR_HAVE_MOVW__");
   if (AVR_HAVE_LPMX) cpp_define (pfile, "__AVR_HAVE_LPMX__");
+  if (AVR_HAVE_ADIW) cpp_define (pfile, "__AVR_HAVE_ADIW__");
 
   if (avr_arch->asm_only)
 cpp_define (pfile, "__AVR_ASM_ONLY__");
diff --git a/gcc/config/avr/avr-protos.h b/gcc/config/avr/avr-protos.h
index 46b75f96b9c..7d1f815c664 100644
--- a/gcc/config/avr/avr-protos.h
+++ b/gcc/config/avr/avr-protos.h
@@ -123,6 +123,7 @@ extern enum reg_class avr_mode_code_base_reg_class (machine_mode, addr_space_t,
 extern bool avr_regno_mode_code_ok_for_base_p (int, machine_mode, addr_space_t, RTX_CODE, RTX_CODE);
 extern rtx avr_incoming_return_addr_rtx (void);
 extern rtx avr_legitimize_reload_address (rtx*, machine_mode, int, int, int, int, rtx (*)(rtx,int));
+extern bool avr_adiw_reg_p (rtx);
 extern bool avr_mem_flash_p (rtx);
 extern bool avr_mem_memx_p (rtx);
 extern bool avr_load_libgcc_p (rtx);
diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index d21b286ed8b..4a55f14bff7 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -292,6 +292,17 @@ avr_to_int_mode (rtx x)
 : simplify_gen_subreg (int_mode_for_mode (mode).require (), x, mode, 0);
 }
 
+
+/* Return true if hard register REG supports the ADIW and SBIW instructions.  */
+
+bool
+avr_adiw_reg_p (rtx reg)
+{
+  return (AVR_HAVE_ADIW
+	  && test_hard_reg_class (ADDW_REGS, reg));
+}
+
+
 namespace {
 
 static const pass_data avr_pass_data_recompute_notes =
@@ -6272,7 +6283,7 @@ avr_out_compare (rtx_insn *insn, rtx *xop, int *plen)
   /* Word registers >= R24 can use SBIW/ADIW with 0..63.  */
 
   if (i == 0
-	  && test_hard_reg_class (ADDW_REGS, reg8))
+	  && avr_adiw_reg_p (reg8))
 	{
 	  int val16 = trunc_int_for_mode (INTVAL (xval), HImode);
 
@@ -8186,7 +8197,7 @@ avr_out_plus_1 (rtx *xop, int *plen, enum rtx_code code, int *pcc,
   if (!started
 	  && i % 2 == 0
 	  && i + 2 <= n_bytes
-	  && test_hard_reg_class (ADDW_REGS, reg8))
+	  && avr_adiw_reg_p (reg8))
 	{
 	  rtx xval16 = simplify_gen_subreg (HImode, xval, imode, i);
 	  unsigned int val16 = UINTVAL (xval16) & GET_MODE_MASK (HImode);
@@ -8678,7 +8689,7 @@ avr_out_plus_set_ZN (rtx *xop, int *plen)
 }
 
   if (n_bytes == 2
-  && test_hard_reg_class (ADDW_REGS, xreg)
+  && avr_adiw_reg_p (xreg)
   && IN_RANGE (INTVAL (xval), 1, 63))
 {
   // Add 16-bit value in [1..63] to a w register.
@@ -8705,7 +8716,7 @@ avr_out_plus_set_ZN (rtx *xop, int *plen)
 
   if (i == 0
 	  && n_bytes >= 2
-	  && test_hard_reg_class (ADDW_REGS, op[0]))
+	  && avr_adiw_reg_p (op[0]))
 	{
 	  op[1] = simplify_gen_subreg (HImode, xval, mode, 0);
 	  if (IN_RANGE (INTVAL (op[1]), 0, 63))
@@ -13312,7 +13323,6 @@ avr_conditional_register_usage (void)
 	  reg_alloc_order[i] = tiny_reg_alloc_order[i];
 	}
 
-  CLEAR_HARD_REG_SET (reg_class_contents[(int) ADDW_REGS]);
   CLEAR_HARD_REG_SET (reg_class_contents[(int) NO_LD_REGS]);
 }
 }
@@ -14043,7 +14053,7 @@ avr_out_cpymem (rtx_insn *insn ATTRIBUTE_UNUSED, rtx *op, int *plen)
 {
   addr_space_t as = (addr_space_t) INTVAL (op[0]);
   machine_mode loop_mode = GET_MODE (op[1]);
-  bool sbiw_p = test_hard_reg_class (ADDW_REGS, op[1]);
+  bool sbiw_p = avr_adiw_reg_p (op[1]);

[PATCH v4 09/12] c++: Implement __add_rvalue_reference built-in trait

2024-02-15 Thread Ken Matsui
This patch implements a built-in trait for std::add_rvalue_reference.

gcc/cp/ChangeLog:

* cp-trait.def: Define __add_rvalue_reference.
* semantics.cc (finish_trait_type): Handle
CPTK_ADD_RVALUE_REFERENCE.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of
__add_rvalue_reference.
* g++.dg/ext/add_rvalue_reference.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/cp-trait.def   |  1 +
 gcc/cp/semantics.cc   |  8 
 .../g++.dg/ext/add_rvalue_reference.C | 20 +++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C  |  3 +++
 4 files changed, 32 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/add_rvalue_reference.C

diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 7dcc6bbad76..9e8f9eb38b8 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -50,6 +50,7 @@
 
 DEFTRAIT_TYPE (ADD_LVALUE_REFERENCE, "__add_lvalue_reference", 1)
 DEFTRAIT_TYPE (ADD_POINTER, "__add_pointer", 1)
+DEFTRAIT_TYPE (ADD_RVALUE_REFERENCE, "__add_rvalue_reference", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_ASSIGN, "__has_nothrow_assign", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_CONSTRUCTOR, "__has_nothrow_constructor", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_COPY, "__has_nothrow_copy", 1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 82fc31d9f9b..f437e272ea6 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12777,6 +12777,14 @@ finish_trait_type (cp_trait_kind kind, tree type1, 
tree type2,
type1 = TREE_TYPE (type1);
   return build_pointer_type (type1);
 
+case CPTK_ADD_RVALUE_REFERENCE:
+  if (VOID_TYPE_P (type1)
+ || (FUNC_OR_METHOD_TYPE_P (type1)
+ && (type_memfn_quals (type1) != TYPE_UNQUALIFIED
+ || type_memfn_rqual (type1) != REF_QUAL_NONE)))
+   return type1;
+  return cp_build_reference_type (type1, /*rval=*/true);
+
 case CPTK_REMOVE_ALL_EXTENTS:
   return strip_array_types (type1);
 
diff --git a/gcc/testsuite/g++.dg/ext/add_rvalue_reference.C 
b/gcc/testsuite/g++.dg/ext/add_rvalue_reference.C
new file mode 100644
index 000..c92fe6bfa17
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/add_rvalue_reference.C
@@ -0,0 +1,20 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+class ClassType { };
+
+SA(__is_same(__add_rvalue_reference(int), int&&));
+SA(__is_same(__add_rvalue_reference(int&&), int&&));
+SA(__is_same(__add_rvalue_reference(int&), int&));
+SA(__is_same(__add_rvalue_reference(const int), const int&&));
+SA(__is_same(__add_rvalue_reference(int*), int*&&));
+SA(__is_same(__add_rvalue_reference(ClassType&&), ClassType&&));
+SA(__is_same(__add_rvalue_reference(ClassType), ClassType&&));
+SA(__is_same(__add_rvalue_reference(int(int)), int(&&)(int)));
+SA(__is_same(__add_rvalue_reference(void), void));
+SA(__is_same(__add_rvalue_reference(const void), const void));
+SA(__is_same(__add_rvalue_reference(bool(int) const), bool(int) const));
+SA(__is_same(__add_rvalue_reference(bool(int) &), bool(int) &));
+SA(__is_same(__add_rvalue_reference(bool(int) const &&), bool(int) const &&));
+SA(__is_same(__add_rvalue_reference(bool(int)), bool(&&)(int)));
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 1046ffe7d01..9d7e59b47fb 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -8,6 +8,9 @@
 #if !__has_builtin (__add_pointer)
 # error "__has_builtin (__add_pointer) failed"
 #endif
+#if !__has_builtin (__add_rvalue_reference)
+# error "__has_builtin (__add_rvalue_reference) failed"
+#endif
 #if !__has_builtin (__builtin_addressof)
 # error "__has_builtin (__builtin_addressof) failed"
 #endif
-- 
2.43.0



[PATCH v4 07/12] c++: Implement __add_lvalue_reference built-in trait

2024-02-15 Thread Ken Matsui
This patch implements a built-in trait for std::add_lvalue_reference.

gcc/cp/ChangeLog:

* cp-trait.def: Define __add_lvalue_reference.
* semantics.cc (finish_trait_type): Handle
CPTK_ADD_LVALUE_REFERENCE.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of
__add_lvalue_reference.
* g++.dg/ext/add_lvalue_reference.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/cp-trait.def   |  1 +
 gcc/cp/semantics.cc   |  8 +++
 .../g++.dg/ext/add_lvalue_reference.C | 21 +++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C  |  3 +++
 4 files changed, 33 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/add_lvalue_reference.C

diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index ce29108bad6..7dcc6bbad76 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -48,6 +48,7 @@
 #define DEFTRAIT_TYPE_DEFAULTED
 #endif
 
+DEFTRAIT_TYPE (ADD_LVALUE_REFERENCE, "__add_lvalue_reference", 1)
 DEFTRAIT_TYPE (ADD_POINTER, "__add_pointer", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_ASSIGN, "__has_nothrow_assign", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_CONSTRUCTOR, "__has_nothrow_constructor", 1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index c8ac5167c3c..82fc31d9f9b 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12760,6 +12760,14 @@ finish_trait_type (cp_trait_kind kind, tree type1, 
tree type2,
 
   switch (kind)
 {
+case CPTK_ADD_LVALUE_REFERENCE:
+  if (VOID_TYPE_P (type1)
+ || (FUNC_OR_METHOD_TYPE_P (type1)
+ && (type_memfn_quals (type1) != TYPE_UNQUALIFIED
+ || type_memfn_rqual (type1) != REF_QUAL_NONE)))
+   return type1;
+  return cp_build_reference_type (type1, /*rval=*/false);
+
 case CPTK_ADD_POINTER:
   if (FUNC_OR_METHOD_TYPE_P (type1)
  && (type_memfn_quals (type1) != TYPE_UNQUALIFIED
diff --git a/gcc/testsuite/g++.dg/ext/add_lvalue_reference.C 
b/gcc/testsuite/g++.dg/ext/add_lvalue_reference.C
new file mode 100644
index 000..8fe1e0300e5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/add_lvalue_reference.C
@@ -0,0 +1,21 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+class ClassType { };
+
+SA(__is_same(__add_lvalue_reference(int), int&));
+SA(__is_same(__add_lvalue_reference(int&), int&));
+SA(__is_same(__add_lvalue_reference(const int), const int&));
+SA(__is_same(__add_lvalue_reference(int*), int*&));
+SA(__is_same(__add_lvalue_reference(ClassType&), ClassType&));
+SA(__is_same(__add_lvalue_reference(ClassType), ClassType&));
+SA(__is_same(__add_lvalue_reference(int(int)), int(&)(int)));
+SA(__is_same(__add_lvalue_reference(int&&), int&));
+SA(__is_same(__add_lvalue_reference(ClassType&&), ClassType&));
+SA(__is_same(__add_lvalue_reference(void), void));
+SA(__is_same(__add_lvalue_reference(const void), const void));
+SA(__is_same(__add_lvalue_reference(bool(int) const), bool(int) const));
+SA(__is_same(__add_lvalue_reference(bool(int) &), bool(int) &));
+SA(__is_same(__add_lvalue_reference(bool(int) const &&), bool(int) const &&));
+SA(__is_same(__add_lvalue_reference(bool(int)), bool(&)(int)));
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 9af64173524..1046ffe7d01 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -2,6 +2,9 @@
 // { dg-do compile }
 // Verify that __has_builtin gives the correct answer for C++ built-ins.
 
+#if !__has_builtin (__add_lvalue_reference)
+# error "__has_builtin (__add_lvalue_reference) failed"
+#endif
 #if !__has_builtin (__add_pointer)
 # error "__has_builtin (__add_pointer) failed"
 #endif
-- 
2.43.0



[PATCH v4 12/12] libstdc++: Optimize std::decay compilation performance

2024-02-15 Thread Ken Matsui
This patch optimizes the compilation performance of std::decay
by dispatching to the new __decay built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (decay): Use __decay built-in trait.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 219d36fabba..90718d772dd 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -2288,6 +2288,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   /// @cond undocumented
 
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__decay)
+  template<typename _Tp>
+struct decay
+{ using type = __decay(_Tp); };
+#else
   // Decay trait for arrays and functions, used for perfect forwarding
   // in make_pair, make_tuple, etc.
   template<typename _Tp>
@@ -2319,6 +2324,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _Tp>
 struct decay<_Tp&&>
 { using type = typename __decay_selector<_Tp>::type; };
+#endif
 
   /// @cond undocumented
 
-- 
2.43.0



[PATCH v4 06/12] libstdc++: Optimize std::remove_all_extents compilation performance

2024-02-15 Thread Ken Matsui
This patch optimizes the compilation performance of
std::remove_all_extents by dispatching to the new __remove_all_extents
built-in trait.
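
For illustration (not part of the patch): the recursive library definition
instantiates one specialization per array dimension, e.g.
remove_all_extents<int[2][3][4]> -> remove_all_extents<int[3][4]> ->
remove_all_extents<int[4]> -> remove_all_extents<int>, whereas the builtin is
evaluated by the compiler in a single step.  With a compiler providing the
builtin, both agree:

  static_assert(__is_same(__remove_all_extents(int[2][3][4]), int),
                "builtin matches the recursive definition");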

libstdc++-v3/ChangeLog:

* include/std/type_traits (remove_all_extents): Use
__remove_all_extents built-in trait.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 0fb1762186c..2e1cc1c1d5f 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -2083,6 +2083,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 
   /// remove_all_extents
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__remove_all_extents)
+  template<typename _Tp>
+struct remove_all_extents
+{ using type = __remove_all_extents(_Tp); };
+#else
   template<typename _Tp>
 struct remove_all_extents
 { using type = _Tp; };
@@ -2094,6 +2099,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _Tp>
 struct remove_all_extents<_Tp[]>
 { using type = typename remove_all_extents<_Tp>::type; };
+#endif
 
 #if __cplusplus > 201103L
   /// Alias template for remove_extent
-- 
2.43.0



Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-15 Thread Andrew Stubbs

On 15/02/2024 10:23, Thomas Schwinge wrote:

Hi!

On 2024-02-15T08:49:17+0100, Richard Biener  wrote:

On Wed, 14 Feb 2024, Andrew Stubbs wrote:

On 14/02/2024 13:43, Richard Biener wrote:

On Wed, 14 Feb 2024, Andrew Stubbs wrote:

On 14/02/2024 13:27, Richard Biener wrote:

On Wed, 14 Feb 2024, Andrew Stubbs wrote:

On 13/02/2024 08:26, Richard Biener wrote:

On Mon, 12 Feb 2024, Thomas Schwinge wrote:

On 2023-10-20T12:51:03+0100, Andrew Stubbs 
wrote:

I've committed this patch


... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
"amdgcn: add -march=gfx1030 EXPERIMENTAL".

The RDNA2 ISA variant doesn't support certain instructions previous
implemented in GCC/GCN, so a number of patterns etc. had to be
disabled:


[...] Vector
reductions will need to be reworked for RDNA2.  [...]



* config/gcn/gcn-valu.md (@dpp_move): Disable for RDNA2.
(addc3): Add RDNA2 syntax variant.
(subc3): Likewise.
(2_exec): Add RDNA2 alternatives.
(vec_cmpdi): Likewise.
(vec_cmpdi): Likewise.
(vec_cmpdi_exec): Likewise.
(vec_cmpdi_exec): Likewise.
(vec_cmpdi_dup): Likewise.
(vec_cmpdi_dup_exec): Likewise.
(reduc__scal_): Disable for RDNA2.
(*_dpp_shr_): Likewise.
(*plus_carry_dpp_shr_): Likewise.
(*plus_carry_in_dpp_shr_): Likewise.


Etc.  The expectation being that GCC middle end copes with this, and
synthesizes some less ideal yet still functional vector code, I presume.

The later RDNA3/gfx1100 support builds on top of this, and that's what
I'm currently working on getting proper GCC/GCN target (not offloading)
results for.

I'm seeing a good number of execution test FAILs (regressions compared to
my earlier non-gfx1100 testing), and I've now tracked down where one
large class of those comes into existence -- [...]



With the following hack applied to 'gcc/tree-vect-loop.cc':

@@ -6687,8 +6687,9 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
   reduce_with_shift = have_whole_vector_shift (mode1);
   if (!VECTOR_MODE_P (mode1)
       || !directly_supported_p (code, vectype1))
     reduce_with_shift = false;
+  reduce_with_shift = false;

..., I'm able to work around those regressions: by means of forcing
"Reduce using scalar code" instead of "Reduce using vector shifts".



The attached not-well-tested patch should allow only valid permutations.
Hopefully we go back to working code, but there'll be things that won't
vectorize. That said, the new "dump" output code has fewer and probably
cheaper instructions, so hmmm.


This fixes the reduced builtin-bitops-1.c on RDNA2.


I confirm that "amdgcn: Disallow unsupported permute on RDNA devices"
also obsoletes my 'reduce_with_shift = false;' hack -- and also cures a
good number of additional FAILs (regressions), where presumably we
permute via different code paths.  Thanks!

There also are a few regressions, but only minor:

 PASS: gcc.dg/vect/no-vfa-vect-depend-3.c (test for excess errors)
 PASS: gcc.dg/vect/no-vfa-vect-depend-3.c execution test
 PASS: gcc.dg/vect/no-vfa-vect-depend-3.c scan-tree-dump-times vect "vectorized 
1 loops" 4
 [-PASS:-]{+FAIL:+} gcc.dg/vect/no-vfa-vect-depend-3.c scan-tree-dump-times vect 
"dependence distance negative" 4

..., because:

 gcc.dg/vect/no-vfa-vect-depend-3.c: pattern found 6 times
 FAIL: gcc.dg/vect/no-vfa-vect-depend-3.c scan-tree-dump-times vect "dependence 
distance negative" 4

 PASS: gcc.dg/vect/vect-119.c (test for excess errors)
 [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-119.c scan-tree-dump-times vect "Detected 
interleaving load of size 2" 1
 PASS: gcc.dg/vect/vect-119.c scan-tree-dump-not optimized "Invalid sum"

..., because:

 gcc.dg/vect/vect-119.c: pattern found 3 times
 FAIL: gcc.dg/vect/vect-119.c scan-tree-dump-times vect "Detected interleaving 
load of size 2" 1

 PASS: gcc.dg/vect/vect-reduc-mul_1.c (test for excess errors)
 PASS: gcc.dg/vect/vect-reduc-mul_1.c execution test
 [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-reduc-mul_1.c scan-tree-dump vect "Reduce 
using vector shifts"

 PASS: gcc.dg/vect/vect-reduc-mul_2.c (test for excess errors)
 PASS: gcc.dg/vect/vect-reduc-mul_2.c execution test
 [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-reduc-mul_2.c scan-tree-dump vect "Reduce 
using vector shifts"

..., plus the following, in combination with the earlier changes
disabling patterns:

 PASS: gcc.dg/vect/vect-reduc-or_1.c (test for excess errors)
 PASS: gcc.dg/vect/vect-reduc-or_1.c execution test
 [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-reduc-or_1.c scan-tree-dump vect "Reduce 
using direct vector reduction"

 PASS: gcc.dg/vect/vect-reduc-or_2.c (test for excess errors)
 PASS: gcc.dg/vect/vect-reduc-or_2.c execution test
 [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-reduc-or_2.c scan-tree-dump vect "Reduce 
using direct vector reduction"

Such test cases will need conditionalization on specific conf

Re: [PATCH][_GLIBCXX_DEBUG] Fix std::__niter_base behavior

2024-02-15 Thread Jonathan Wakely
On Wed, 14 Feb 2024 at 21:48, François Dumont  wrote:

>
> On 14/02/2024 20:44, Jonathan Wakely wrote:
>
>
>
> On Wed, 14 Feb 2024 at 18:39, François Dumont 
> wrote:
>
>> libstdc++: [_GLIBCXX_DEBUG] Fix std::__niter_base behavior
>>
>> std::__niter_base is used in _GLIBCXX_DEBUG mode to remove
>> _Safe_iterator<>
>> wrapper on random access iterators.  But in doing so it should also
>> preserve the original behavior of removing the __normal_iterator
>> wrapper.
>>
>> libstdc++-v3/ChangeLog:
>>
>>  * include/bits/stl_algobase.h (std::__niter_base): Redefine the
>> overload
>>  definitions for __gnu_debug::_Safe_iterator.
>>  * include/debug/safe_iterator.tcc (std::__niter_base): Adapt
>> declarations.
>>
>> Ok to commit once all tests completed (still need to check pre-c++11) ?
>>
>
>
> The declaration in  include/bits/stl_algobase.h has a noexcept-specifier
> but the definition in include/debug/safe_iterator.tcc does not have one -
> that seems wrong (I'm surprised it even compiles).
>
> It does !
>

The diagnostic is suppressed without -Wsystem-headers:

/home/jwakely/gcc/14/include/c++/14.0.1/debug/safe_iterator.tcc:255:5: warning:
declaration of 'template constexpr decltype (std::__
niter_base(declval<_Ite>())) std::__niter_base(const
__gnu_debug::_Safe_iterator<_Iterator, _Sequence,
random_access_iterator_tag>&)' has a different except
ion specifier [-Wsystem-headers]
 255 | __niter_base(const ::__gnu_debug::_Safe_iterator<_Ite, _Seq,
 | ^~~~
/home/jwakely/gcc/14/include/c++/14.0.1/bits/stl_algobase.h:335:5: note: from
previous declaration 'template constexpr decltype
(std
::__niter_base(declval<_Ite>())) std::__niter_base(const
__gnu_debug::_Safe_iterator<_Iterator, _Sequence,
random_access_iterator_tag>&) noexcept (noexcept
(is_nothrow_copy_constructible()))>::value))'
 335 | __niter_base(const ::__gnu_debug::_Safe_iterator<_Ite, _Seq,
 | ^~~~


It's a hard error with Clang though:

deb.cc:7:10: error: call to '__niter_base' is ambiguous






> I thought it was only necessary at declaration, and I also had trouble
> doing it right at definition because of the interaction with the auto and
> ->.
>

The trailing-return-type has to come after the noexcept-specifier.
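
For example (generic sketch; g is a placeholder and not from the patch):

  int g(int);   // placeholder

  template<typename T>
  auto f(T t) noexcept(noexcept(g(t))) -> decltype(g(t));   // OK

  // Putting the noexcept-specifier after "-> decltype(g(t))" would not parse.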



> Now simplified and consistent in this new proposal.
>
>
> Just using std::is_nothrow_copy_constructible<_Ite> seems simpler, that
> will be true for __normal_iterator if
> is_nothrow_copy_constructible is true.
>
> Ok
>
>
> The definition in include/debug/safe_iterator.tcc should use
> std::declval<_Ite>() not declval<_Ite>(). Is there any reason why the
> definition uses a late-specified-return-type (i.e. auto and ->) when the
> declaration doesn't?
>
>
> I initially planned to use '-> std::decltype(std::__niter_base(__it.base()))'
> but this did not compile due to an ambiguity issue.  So I resorted to using
> std::declval and I could then have done it the same way as the declaration,
> done now.
>
> Attached is what I'm testing, ok to commit once fully tested ?
>

OK, thanks.


[PATCH] c-c++-common/Wrestrict.c: fix some typos and enable for LLP64

2024-02-15 Thread Jonathan Yong

Attached patch OK?

Copy/pasted for review convenience.

diff --git a/gcc/testsuite/c-c++-common/Wrestrict.c 
b/gcc/testsuite/c-c++-common/Wrestrict.c
index 4d005a618b3..57a3f67e21e 100644
--- a/gcc/testsuite/c-c++-common/Wrestrict.c
+++ b/gcc/testsuite/c-c++-common/Wrestrict.c
@@ -381,14 +381,14 @@ void test_memcpy_range_exceed (char *d, const char *s)
   T (d + i, s + 1, 3);   /* { dg-warning "accessing 3 bytes at offsets \\\[\[0-9\]+, 
\[0-9\]+] and 1 overlaps 1 byte" "memcpy" } */
 
 #if __SIZEOF_SIZE_T__ == 8

-  /* Verfiy the offset and size computation is correct.  The overlap
- offset mentioned in the warning plus sthe size of the access must
+  /* Verify the offset and size computation is correct.  The overlap
+ offset mentioned in the warning plus the size of the access must
  not exceed DIFF_MAX.  */
-  T (d, d + i, 5);   /* { dg-warning "accessing 5 bytes at offsets 0 and 
\\\[9223372036854775805, 9223372036854775807] overlaps 3 bytes at offset 9223372036854775802" 
"LP64" { target lp64 } } */
-  T (d + i, d, 5);   /* { dg-warning "accessing 5 bytes at offsets \\\[9223372036854775805, 
9223372036854775807] and 0 overlaps 3 bytes at offset 9223372036854775802" "LP64" { 
target lp64 } } */
+  T (d, d + i, 5);   /* { dg-warning "accessing 5 bytes at offsets 0 and 
\\\[9223372036854775805, 9223372036854775807] overlaps 3 bytes at offset 9223372036854775802" 
"LP64" { target { lp64 || llp64 } } } */
+  T (d + i, d, 5);   /* { dg-warning "accessing 5 bytes at offsets \\\[9223372036854775805, 
9223372036854775807] and 0 overlaps 3 bytes at offset 9223372036854775802" "LP64" { 
target { lp64 || llp64 } } } */
 
-  T (d, s + i, 5);   /* { dg-warning "accessing 5 bytes at offsets 0 and \\\[9223372036854775805, 9223372036854775807] overlaps 3 bytes at offset 9223372036854775802" "LP64" { target lp64 } } */

-  T (d + i, s, 5);   /* { dg-warning "accessing 5 bytes at offsets \\\[9223372036854775805, 
9223372036854775807] and 0 overlaps 3 bytes at offset 9223372036854775802" "LP64" { 
target lp64 } } */
+  T (d, s + i, 5);   /* { dg-warning "accessing 5 bytes at offsets 0 and 
\\\[9223372036854775805, 9223372036854775807] overlaps 3 bytes at offset 9223372036854775802" 
"LP64" { target { lp64 || llp64 } } } */
+  T (d + i, s, 5);   /* { dg-warning "accessing 5 bytes at offsets \\\[9223372036854775805, 
9223372036854775807] and 0 overlaps 3 bytes at offset 9223372036854775802" "LP64" { 
target { lp64 || llp64 } } } */
 #elif __SIZEOF_SIZE_T__ == 4
   T (d, d + i, 5);   /* { dg-warning "accessing 5 bytes at offsets 0 and \\\[2147483645, 
2147483647] overlaps 3 bytes at offset 2147483642" "ILP32" { target ilp32 } } */
   T (d + i, d, 5);   /* { dg-warning "accessing 5 bytes at offsets \\\[2147483645, 
2147483647] and 0 overlaps 3 bytes at offset 2147483642" "ILP32" { target ilp32 } } 
*/From 57b2310756b5d0de99fbdbf7b0b11f01fe66be11 Mon Sep 17 00:00:00 2001
From: Jonathan Yong <10wa...@gmail.com>
Date: Sun, 11 Feb 2024 09:25:25 +
Subject: [PATCH] c-c++-common/Wrestrict.c: fix some typos and enable for LLP64

Signed-off-by: Jonathan Yong <10wa...@gmail.com>

gcc/testsuite:

	* c-c++-common/Wrestrict.c: Fix typos in comments and
	enable for LLP64 testing.
---
 gcc/testsuite/c-c++-common/Wrestrict.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/Wrestrict.c b/gcc/testsuite/c-c++-common/Wrestrict.c
index 4d005a618b3..57a3f67e21e 100644
--- a/gcc/testsuite/c-c++-common/Wrestrict.c
+++ b/gcc/testsuite/c-c++-common/Wrestrict.c
@@ -381,14 +381,14 @@ void test_memcpy_range_exceed (char *d, const char *s)
   T (d + i, s + 1, 3);   /* { dg-warning "accessing 3 bytes at offsets \\\[\[0-9\]+, \[0-9\]+] and 1 overlaps 1 byte" "memcpy" } */
 
 #if __SIZEOF_SIZE_T__ == 8
-  /* Verfiy the offset and size computation is correct.  The overlap
- offset mentioned in the warning plus sthe size of the access must
+  /* Verify the offset and size computation is correct.  The overlap
+ offset mentioned in the warning plus the size of the access must
  not exceed DIFF_MAX.  */
-  T (d, d + i, 5);   /* { dg-warning "accessing 5 bytes at offsets 0 and \\\[9223372036854775805, 9223372036854775807] overlaps 3 bytes at offset 9223372036854775802" "LP64" { target lp64 } } */
-  T (d + i, d, 5);   /* { dg-warning "accessing 5 bytes at offsets \\\[9223372036854775805, 9223372036854775807] and 0 overlaps 3 bytes at offset 9223372036854775802" "LP64" { target lp64 } } */
+  T (d, d + i, 5);   /* { dg-warning "accessing 5 bytes at offsets 0 and \\\[9223372036854775805, 9223372036854775807] overlaps 3 bytes at offset 9223372036854775802" "LP64" { target { lp64 || llp64 } } } */
+  T (d + i, d, 5);   /* { dg-warning "accessing 5 bytes at offsets \\\[9223372036854775805, 9223372036854775807] and 0 overlaps 3 bytes at offset 9223372036854775802" "LP64" { target { lp64 || llp64 } } } */
 
-  T (d, s + i, 5);   /* { dg-warning "accessing 5 bytes at offsets 0 and \\

Re: [PATCH] libgccjit: Add support for machine-dependent builtins

2024-02-15 Thread Antoni Boucher
David: Ping

On Thu, 2024-02-08 at 08:59 -0500, Antoni Boucher wrote:
> David: Ping.
> 
> On Wed, 2024-01-10 at 18:58 -0500, Antoni Boucher wrote:
> > Here it is: https://gcc.gnu.org/pipermail/jit/2023q4/001725.html
> > 
> > On Wed, 2024-01-10 at 18:44 -0500, David Malcolm wrote:
> > > On Wed, 2024-01-10 at 18:29 -0500, Antoni Boucher wrote:
> > > > David: Ping in case you missed this patch.
> > > 
> > > For some reason it's not showing up in patchwork (or, at least, I
> > > can't
> > > find it there).  Do you have a URL for it there?
> > > 
> > > Sorry about this
> > > Dave
> > > 
> > > > 
> > > > On Sat, 2023-02-11 at 17:37 -0800, Andrew Pinski wrote:
> > > > > On Sat, Feb 11, 2023 at 4:31 PM Antoni Boucher via Gcc-
> > > > > patches
> > > > >  wrote:
> > > > > > 
> > > > > > Hi.
> > > > > > This patch adds support for machine-dependent builtins in
> > > > > > libgccjit
> > > > > > (bug 108762).
> > > > > > 
> > > > > > There are two things I don't like in this patch:
> > > > > > 
> > > > > >  1. There are a few functions copied from the C frontend
> > > > > > (common_mark_addressable_vec and a few others).
> > > > > > 
> > > > > >  2. Getting a target builtin only works from the second
> > > > > > compilation
> > > > > > since the type information is recorded at the first
> > > > > > compilation.
> > > > > > I
> > > > > > couldn't find a way to get the builtin data without using
> > > > > > the
> > > > > > langhook.
> > > > > > It is necessary to get the type information for type
> > > > > > checking
> > > > > > and
> > > > > > instrospection.
> > > > > > 
> > > > > > Any idea how to fix these issues?
> > > > > 
> > > > > Seems like you should do this patch in a few steps; that is
> > > > > split
> > > > > it
> > > > > up.
> > > > > Definitely split out GCC_JIT_TYPE_BFLOAT16 support.
> > > > > I also think the vector support should be in a different
> > > > > patch
> > > > > too.
> > > > > 
> > > > > Splitting out these parts would definitely make it easier for
> > > > > review
> > > > > and make incremental improvements.
> > > > > 
> > > > > Thanks,
> > > > > Andrew Pinski
> > > > > 
> > > > > 
> > > > > 
> > > > > > 
> > > > > > Thanks for the review.
> > > > 
> > > 
> > 
> 



Re: [PATCH] libgccjit: Clear pending_assemble_externals_processed

2024-02-15 Thread Antoni Boucher
David: Ping.

On Thu, 2024-02-08 at 17:09 -0500, Antoni Boucher wrote:
> Hi.
> This patch fixes the bug 113842.
> I cannot yet add a test with this patch since it requires using
> try/catch which is not yet merged in master.
> Thanks for the review.



Re: [PATCH][GCC][Arm] Missing optimization pattern for rev16 on architectures with thumb1

2024-02-15 Thread Richard Earnshaw (lists)
On 12/02/2024 13:48, Matthieu Longo wrote:
> This patch marks a rev16 test as XFAIL for architectures having only Thumb1 
> support. The generated code is functionally correct, but the optimization is 
> disabled when -mthumb is equivalent to Thumb1. Fixing the root issue would 
> require changes that are not suitable for GCC 14 stage 4.
> 
> More information at https://linaro.atlassian.net/browse/GNU-1141
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/arm/rev16_2.c: XFAIL when compiled with Thumb1.

Thanks, I've tweaked the commit message slightly and pushed this.

R.
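
For reference, the canonical rev16 idiom such tests look for is the byte swap
within each 16-bit halfword (illustrative only; the actual rev16_2.c source
may differ):

  unsigned int
  rev16 (unsigned int x)
  {
    return ((x & 0xff00ff00u) >> 8) | ((x & 0x00ff00ffu) << 8);
  }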


[PATCH] tree-optimization/111156 - properly dissolve SLP only groups

2024-02-15 Thread Richard Biener
The following fixes the failure to look at pattern stmts when we
need to dissolve SLP-only groups.

Bootstrapped and tested on x86-64-unknown-linux-gnu, pushed.

PR tree-optimization/111156
* tree-vect-loop.cc (vect_dissolve_slp_only_groups): Look
at the pattern stmt if any.
---
 gcc/tree-vect-loop.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 9e26b09504d..5a5865c42fc 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2551,7 +2551,8 @@ vect_dissolve_slp_only_groups (loop_vec_info loop_vinfo)
   FOR_EACH_VEC_ELT (datarefs, i, dr)
 {
   gcc_assert (DR_REF (dr));
-  stmt_vec_info stmt_info = loop_vinfo->lookup_stmt (DR_STMT (dr));
+  stmt_vec_info stmt_info
+   = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (DR_STMT (dr)));
 
   /* Check if the load is a part of an interleaving chain.  */
   if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
-- 
2.35.3


Re: [PATCH] expand: Fix handling of asm goto outputs vs. PHI argument adjustments [PR113921]

2024-02-15 Thread Richard Biener
On Thu, 15 Feb 2024, Jakub Jelinek wrote:

> Hi!
> 
> The Linux kernel and the following testcase distilled from it is
> miscompiled, because tree-outof-ssa.cc (eliminate_phi) emits some
> fixups on some of the edges (but doesn't commit edge insertions).
> Later expand_asm_stmt emits further instructions on the same edge.
> Now the problem is that expand_asm_stmt uses insert_insn_on_edge
> to add its own fixups, but that function appends to the existing
> sequence on the edge if any.  And the bug triggers when the
> fixup sequence emitted by eliminate_phi uses a pseudo which the
> fixup sequence emitted by expand_asm_stmt later on sets.
> So, we end up with
>   (set (reg A) (asm_operands ...))
> and on one of the edges queued sequence
>   (set (reg C) (reg B)) // added by eliminate_phi
>   (set (reg B) (reg A)) // added by expand_asm_stmt
> That is wrong, what we emit by expand_asm_stmt needs to be as close
> to the asm_operands as possible (they aren't known until expand_asm_stmt
> is called, the PHI fixup code assumes it is reg B which holds the right
> value) and the PHI adjustments need to be done after it.
> 
> So, the following patch introduces a prepend_insn_to_edge function and
> uses it from expand_asm_stmt, so that we queue
>   (set (reg B) (reg A)) // added by expand_asm_stmt
>   (set (reg C) (reg B)) // added by eliminate_phi
> instead and so the value from the asm_operands output propagates correctly
> to the PHI result.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> I think we need to backport it to all release branches (fortunately
> non-supported compilers aren't affected because GCC 11 was the first one
> to support asm goto with outputs), in cfgexpand.cc it won't apply cleanly
> due to the PR113415 fix, but manually applying it there will work.
> 
> 2024-02-15  Jakub Jelinek  
> 
>   PR middle-end/113921
>   * cfgrtl.h (prepend_insn_to_edge): New declaration.
>   * cfgrtl.cc (insert_insn_on_edge): Clarify behavior in function
>   comment.
>   (prepend_insn_to_edge): New function.
>   * cfgexpand.cc (expand_asm_stmt): Use prepend_insn_to_edge instead of
>   insert_insn_on_edge.
> 
>   * gcc.target/i386/pr113921.c: New test.
> 
> --- gcc/cfgrtl.h.jj   2024-01-03 11:51:42.576577897 +0100
> +++ gcc/cfgrtl.h  2024-02-14 21:19:13.029797669 +0100
> @@ -38,6 +38,7 @@ extern edge try_redirect_by_replacing_ju
>  extern void emit_barrier_after_bb (basic_block bb);
>  extern basic_block force_nonfallthru_and_redirect (edge, basic_block, rtx);
>  extern void insert_insn_on_edge (rtx, edge);
> +extern void prepend_insn_to_edge (rtx, edge);
>  extern void commit_one_edge_insertion (edge e);
>  extern void commit_edge_insertions (void);
>  extern void print_rtl_with_bb (FILE *, const rtx_insn *, dump_flags_t);
> --- gcc/cfgrtl.cc.jj  2024-01-03 11:51:28.900767705 +0100
> +++ gcc/cfgrtl.cc 2024-02-14 21:19:24.036651779 +0100
> @@ -25,7 +25,7 @@ along with GCC; see the file COPYING3.
>   - CFG-aware instruction chain manipulation
>delete_insn, delete_insn_chain
>   - Edge splitting and committing to edges
> -  insert_insn_on_edge, commit_edge_insertions
> +  insert_insn_on_edge, prepend_insn_to_edge, commit_edge_insertions
>   - CFG updating after insn simplification
>purge_dead_edges, purge_all_dead_edges
>   - CFG fixing after coarse manipulation
> @@ -1966,7 +1966,8 @@ rtl_split_edge (edge edge_in)
>  
>  /* Queue instructions for insertion on an edge between two basic blocks.
> The new instructions and basic blocks (if any) will not appear in the
> -   CFG until commit_edge_insertions is called.  */
> +   CFG until commit_edge_insertions is called.  If there are already
> +   queued instructions on the edge, PATTERN is appended to them.  */
>  
>  void
>  insert_insn_on_edge (rtx pattern, edge e)
> @@ -1984,6 +1985,25 @@ insert_insn_on_edge (rtx pattern, edge e
>  
>e->insns.r = get_insns ();
>end_sequence ();
> +}
> +
> +/* Like insert_insn_on_edge, but if there are already queued instructions
> +   on the edge, PATTERN is prepended to them.  */
> +
> +void
> +prepend_insn_to_edge (rtx pattern, edge e)
> +{
> +  /* We cannot insert instructions on an abnormal critical edge.
> + It will be easier to find the culprit if we die now.  */
> +  gcc_assert (!((e->flags & EDGE_ABNORMAL) && EDGE_CRITICAL_P (e)));
> +
> +  start_sequence ();
> +
> +  emit_insn (pattern);
> +  emit_insn (e->insns.r);
> +
> +  e->insns.r = get_insns ();
> +  end_sequence ();
>  }
>  
>  /* Update the CFG for the instructions queued on edge E.  */
> --- gcc/cfgexpand.cc.jj   2024-02-10 11:25:09.995474027 +0100
> +++ gcc/cfgexpand.cc  2024-02-14 21:27:23.219300727 +0100
> @@ -3687,7 +3687,7 @@ expand_asm_stmt (gasm *stmt)
> copy = get_insns ();
> end_sequence ();
>   }
> -   insert_insn_on_edge (copy, e);
> +   prepend_insn_to_edge (copy, e);

Re: [PATCH] aarch64: Improve PERM<{0}, a, ...> (64bit) by adding whole vector shift right [PR113872]

2024-02-15 Thread Richard Sandiford
Andrew Pinski  writes:
> The backend currently defines a whole vector shift left for 64-bit vectors;
> adding the shift right can also improve code for some PERMs too.  So this
> adds that pattern.

Is this reversed?  It looks like we have the shift right and the patch is
adding the shift left (at least in GCC internal and little-endian terms).

But on many Arm cores, EXT has a higher throughput than SHL, so I don't think
we should do this unconditionally.

Thanks,
Richard

>
> I added a testcase for the shift left also. I also fixed the instruction 
> template
> there which was using a space instead of a tab after the instruction.
>
> Built and tested on aarch64-linux-gnu.
>
>   PR target/113872
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-simd.md (vec_shr_<mode>): Use tab instead of
>   space after the instruction in the template.
>   (vec_shl_<mode>): New pattern.
>   * config/aarch64/iterators.md (unspec): Add UNSPEC_VEC_SHL
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/perm_zero-1.c: New test.
>   * gcc.target/aarch64/perm_zero-2.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/aarch64/aarch64-simd.md | 18 --
>  gcc/config/aarch64/iterators.md|  1 +
>  gcc/testsuite/gcc.target/aarch64/perm_zero-1.c | 15 +++
>  gcc/testsuite/gcc.target/aarch64/perm_zero-2.c | 15 +++
>  4 files changed, 47 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/perm_zero-1.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/perm_zero-2.c
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index f8bb973a278..0d2f1ea3902 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1592,9 +1592,23 @@ (define_insn "vec_shr_<mode>"
>"TARGET_SIMD"
>{
>  if (BYTES_BIG_ENDIAN)
> -  return "shl %d0, %d1, %2";
> +  return "shl\t%d0, %d1, %2";
>  else
> -  return "ushr %d0, %d1, %2";
> +  return "ushr\t%d0, %d1, %2";
> +  }
> +  [(set_attr "type" "neon_shift_imm")]
> +)
> +(define_insn "vec_shl_<mode>"
> +  [(set (match_operand:VD 0 "register_operand" "=w")
> +(unspec:VD [(match_operand:VD 1 "register_operand" "w")
> + (match_operand:SI 2 "immediate_operand" "i")]
> +UNSPEC_VEC_SHL))]
> +  "TARGET_SIMD"
> +  {
> +if (BYTES_BIG_ENDIAN)
> +  return "ushr\t%d0, %d1, %2";
> +else
> +  return "shl\t%d0, %d1, %2";
>}
>[(set_attr "type" "neon_shift_imm")]
>  )
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 99cde46f1ba..3aebe9cf18a 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -758,6 +758,7 @@ (define_c_enum "unspec"
>  UNSPEC_PMULL; Used in aarch64-simd.md.
>  UNSPEC_PMULL2   ; Used in aarch64-simd.md.
>  UNSPEC_REV_REGLIST  ; Used in aarch64-simd.md.
> +UNSPEC_VEC_SHL  ; Used in aarch64-simd.md.
>  UNSPEC_VEC_SHR  ; Used in aarch64-simd.md.
>  UNSPEC_SQRDMLAH ; Used in aarch64-simd.md.
>  UNSPEC_SQRDMLSH ; Used in aarch64-simd.md.
> diff --git a/gcc/testsuite/gcc.target/aarch64/perm_zero-1.c 
> b/gcc/testsuite/gcc.target/aarch64/perm_zero-1.c
> new file mode 100644
> index 000..3c8f0591a2f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/perm_zero-1.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2"  } */
> +/* PR target/113872 */
> +/* For 64bit vectors, PERM with a constant 0 should produce a shift instead 
> of the ext instruction. */
> +
> +#define vect64 __attribute__((vector_size(8)))
> +
> +void f(vect64  unsigned short *a)
> +{
> +  *a = __builtin_shufflevector((vect64 unsigned short){0},*a, 3,4,5,6);
> +}
> +
> +/* { dg-final { scan-assembler-times "ushr\t" 1 { target aarch64_big_endian 
> } } } */
> +/* { dg-final { scan-assembler-times "shl\t" 1 { target 
> aarch64_little_endian } } } */
> +/* { dg-final { scan-assembler-not "ext\t"  } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/perm_zero-2.c 
> b/gcc/testsuite/gcc.target/aarch64/perm_zero-2.c
> new file mode 100644
> index 000..970e428f832
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/perm_zero-2.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2"  } */
> +/* PR target/113872 */
> +/* For 64bit vectors, PERM with a constant 0 should produce a shift instead 
> of the ext instruction. */
> +
> +#define vect64 __attribute__((vector_size(8)))
> +
> +void f(vect64  unsigned short *a)
> +{
> +  *a = __builtin_shufflevector(*a, (vect64 unsigned short){0},3,4,5,6);
> +}
> +
> +/* { dg-final { scan-assembler-times "shl\t" 1 { target aarch64_big_endian } 
> } } */
> +/* { dg-final { scan-assembler-times "ushr\t" 1 { target 
> aarch64_little_endian } } } */
> +/* { dg-final { scan-assembler-not "ext\t"  } } */


[committed] testsuite: Add testcase for already fixed PR [PR107385]

2024-02-15 Thread Jakub Jelinek
Hi!

This testcase has been fixed by the PR113921 fix, but unlike the testcase
there, this one is not target specific.

Tested on x86_64-linux -m32/-m64, committed to trunk as obvious.

2024-02-15  Jakub Jelinek  

PR middle-end/107385
* gcc.dg/pr107385.c: New test.

--- gcc/testsuite/gcc.dg/pr107385.c.jj  2024-01-13 00:05:00.077372302 +0100
+++ gcc/testsuite/gcc.dg/pr107385.c 2024-02-15 09:18:47.711260427 +0100
@@ -0,0 +1,20 @@
+/* PR middle-end/107385 */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+__attribute__((noipa)) int
+foo (void)
+{
+  int x;
+  asm goto ("": "=r" (x) : "0" (15) :: lab);
+  x = 6;
+lab:
+  return x;
+}
+
+int
+main ()
+{
+  if (foo () != 6)
+__builtin_abort ();
+}

Jakub



[PATCH] c++: implicit move with throw [PR113853]

2024-02-15 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Here we have

  template<typename T>
  auto is_throwable(T t) -> decltype(throw t, true) { ... }

where we didn't properly mark 't' as IMPLICIT_RVALUE_P, which caused
the wrong overload to have been chosen.  Jason figured out it's because
we don't correctly implement [expr.prim.id.unqual]#4.2, which post-P2266
says that an id-expression is move-eligible if

"the id-expression (possibly parenthesized) is the operand of
a throw-expression, and names an implicitly movable entity that belongs
to a scope that does not contain the compound-statement of the innermost
lambda-expression, try-block, or function-try-block (if any) whose
compound-statement or ctor-initializer contains the throw-expression."

I worked out that it's trying to say that given

  struct X {
X();
X(const X&);
X(X&&) = delete;
  };

the following should fail: the scope of the throw is an sk_try, and it's
also x's scope S, and S "does not contain the compound-statement of the
*try-block" so x is move-eligible, so we move, so we fail.

  void f ()
  try {
X x;
throw x;  // use of deleted function
  } catch (...) {
  }

Whereas here:

  void g (X x)
  try {
throw x;
  } catch (...) {
  }

the throw is again in an sk_try, but x's scope is an sk_function_parms
which *does* contain the {} of the *try-block, so x is not move-eligible,
so we don't move, so we use X(const X&), and the code is fine.

The current code also doesn't seem to handle

  void h (X x) {
void z (decltype(throw x, true));
  }

where there's no enclosing lambda or sk_try so we should move.

I'm not doing anything about lambdas because we shouldn't reach the
code at the end of the function: the DECL_HAS_VALUE_EXPR_P check
shouldn't let us go further.

PR c++/113789
PR c++/113853

gcc/cp/ChangeLog:

* typeck.cc (treat_lvalue_as_rvalue_p): Update code to better
reflect [expr.prim.id.unqual]#4.2.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/sfinae69.C: Remove dg-bogus.
* g++.dg/cpp0x/sfinae70.C: New test.
* g++.dg/cpp0x/sfinae71.C: New test.
* g++.dg/cpp0x/sfinae72.C: New test.
* g++.dg/cpp2a/implicit-move4.C: New test.
---
 gcc/cp/typeck.cc| 32 +--
 gcc/testsuite/g++.dg/cpp0x/sfinae69.C   |  2 +-
 gcc/testsuite/g++.dg/cpp0x/sfinae70.C   | 16 ++
 gcc/testsuite/g++.dg/cpp0x/sfinae71.C   | 17 ++
 gcc/testsuite/g++.dg/cpp0x/sfinae72.C   | 17 ++
 gcc/testsuite/g++.dg/cpp2a/implicit-move4.C | 59 +
 6 files changed, 126 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/sfinae70.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/sfinae71.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/sfinae72.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/implicit-move4.C

diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 132c55cfc6d..0dc44cd87ca 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -10837,37 +10837,37 @@ treat_lvalue_as_rvalue_p (tree expr, bool return_p)
  parenthesized) id-expression that names an implicitly movable entity
  declared in the body or parameter-declaration-clause of the innermost
  enclosing function or lambda-expression, */
-  if (DECL_CONTEXT (retval) != current_function_decl)
-return NULL_TREE;
   if (return_p)
 {
+  if (DECL_CONTEXT (retval) != current_function_decl)
+   return NULL_TREE;
   expr = move (expr);
   if (expr == error_mark_node)
return NULL_TREE;
   return set_implicit_rvalue_p (expr);
 }
 
-  /* if the operand of a throw-expression is a (possibly parenthesized)
- id-expression that names an implicitly movable entity whose scope does not
- extend beyond the compound-statement of the innermost try-block or
- function-try-block (if any) whose compound-statement or ctor-initializer
- encloses the throw-expression, */
+  /* if the id-expression (possibly parenthesized) is the operand of
+ a throw-expression, and names an implicitly movable entity that belongs
+ to a scope that does not contain the compound-statement of the innermost
+ lambda-expression, try-block, or function-try-block (if any) whose
+ compound-statement or ctor-initializer contains the throw-expression.  */
 
   /* C++20 added move on throw of parms.  */
   if (TREE_CODE (retval) == PARM_DECL && cxx_dialect < cxx20)
 return NULL_TREE;
 
   for (cp_binding_level *b = current_binding_level;
-   ; b = b->level_chain)
-{
-  for (tree decl = b->names; decl; decl = TREE_CHAIN (decl))
-   if (decl == retval)
- return set_implicit_rvalue_p (move (expr));
-  if (b->kind == sk_function_parms
- || b->kind == sk_try
- || b->kind == sk_namespace)
+   b->kind != sk_namespace; b = b->level_chain)
+if (b->kind == sk_try)
+  {
+   for (tree decl = b->names; decl; decl = TREE_CHAIN (decl))
+ 

[PATCH v5 11/14] c++: Implement __decay built-in trait

2024-02-15 Thread Ken Matsui
This patch implements a built-in trait for std::decay.

gcc/cp/ChangeLog:

* cp-trait.def: Define __decay.
* semantics.cc (finish_trait_type): Handle CPTK_DECAY.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __decay.
* g++.dg/ext/decay.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/cp-trait.def  |  1 +
 gcc/cp/semantics.cc  | 12 
 gcc/testsuite/g++.dg/ext/decay.C | 39 
 gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 ++
 4 files changed, 55 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/decay.C

diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 9e8f9eb38b8..11270f3ae6b 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -51,6 +51,7 @@
 DEFTRAIT_TYPE (ADD_LVALUE_REFERENCE, "__add_lvalue_reference", 1)
 DEFTRAIT_TYPE (ADD_POINTER, "__add_pointer", 1)
 DEFTRAIT_TYPE (ADD_RVALUE_REFERENCE, "__add_rvalue_reference", 1)
+DEFTRAIT_TYPE (DECAY, "__decay", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_ASSIGN, "__has_nothrow_assign", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_CONSTRUCTOR, "__has_nothrow_constructor", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_COPY, "__has_nothrow_copy", 1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index f437e272ea6..256e7ef8166 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12785,6 +12785,18 @@ finish_trait_type (cp_trait_kind kind, tree type1, 
tree type2,
return type1;
   return cp_build_reference_type (type1, /*rval=*/true);
 
+case CPTK_DECAY:
+  if (TYPE_REF_P (type1))
+   type1 = TREE_TYPE (type1);
+
+  if (TREE_CODE (type1) == ARRAY_TYPE)
+   return finish_trait_type (CPTK_ADD_POINTER, TREE_TYPE (type1), type2,
+ complain);
+  else if (TREE_CODE (type1) == FUNCTION_TYPE)
+   return finish_trait_type (CPTK_ADD_POINTER, type1, type2, complain);
+  else
+   return cv_unqualified (type1);
+
 case CPTK_REMOVE_ALL_EXTENTS:
   return strip_array_types (type1);
 
diff --git a/gcc/testsuite/g++.dg/ext/decay.C b/gcc/testsuite/g++.dg/ext/decay.C
new file mode 100644
index 000..cf224b7452c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/decay.C
@@ -0,0 +1,39 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+// class ClassType { };
+
+// Positive tests.
+using test1_type = __decay(bool);
+SA(__is_same(test1_type, bool));
+
+// NB: DR 705.
+using test2_type = __decay(const int);
+SA(__is_same(test2_type, int));
+
+using test3_type = __decay(int[4]);
+SA(__is_same(test3_type, __remove_extent(int[4])*));
+
+using fn_type = void ();
+using test4_type = __decay(fn_type);
+SA(__is_same(test4_type, __add_pointer(fn_type)));
+
+using cfn_type = void () const;
+using test5_type = __decay(cfn_type);
+SA(__is_same(test5_type, cfn_type));
+
+// SA(__is_same(__add_rvalue_reference(int), int&&));
+// SA(__is_same(__add_rvalue_reference(int&&), int&&));
+// SA(__is_same(__add_rvalue_reference(int&), int&));
+// SA(__is_same(__add_rvalue_reference(const int), const int&&));
+// SA(__is_same(__add_rvalue_reference(int*), int*&&));
+// SA(__is_same(__add_rvalue_reference(ClassType&&), ClassType&&));
+// SA(__is_same(__add_rvalue_reference(ClassType), ClassType&&));
+// SA(__is_same(__add_rvalue_reference(int(int)), int(&&)(int)));
+// SA(__is_same(__add_rvalue_reference(void), void));
+// SA(__is_same(__add_rvalue_reference(const void), const void));
+// SA(__is_same(__add_rvalue_reference(bool(int) const), bool(int) const));
+// SA(__is_same(__add_rvalue_reference(bool(int) &), bool(int) &));
+// SA(__is_same(__add_rvalue_reference(bool(int) const &&), bool(int) const 
&&));
+// SA(__is_same(__add_rvalue_reference(bool(int)), bool(&&)(int)));
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 9d7e59b47fb..5b590db1cf6 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -11,6 +11,9 @@
 #if !__has_builtin (__add_rvalue_reference)
 # error "__has_builtin (__add_rvalue_reference) failed"
 #endif
+#if !__has_builtin (__decay)
+# error "__has_builtin (__decay) failed"
+#endif
 #if !__has_builtin (__builtin_addressof)
 # error "__has_builtin (__builtin_addressof) failed"
 #endif
-- 
2.43.0



[PATCH v5 03/14] c++: Implement __remove_extent built-in trait

2024-02-15 Thread Ken Matsui
This patch implements a built-in trait for std::remove_extent.
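
Like std::remove_extent, the trait strips only the outermost array
dimension; the companion __remove_all_extents trait (patch 05/14) strips
all of them.  For example:

  static_assert(__is_same(__remove_extent(int[2][3]), int[3]), "");
  static_assert(__is_same(__remove_all_extents(int[2][3]), int), "");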

gcc/cp/ChangeLog:

* cp-trait.def: Define __remove_extent.
* semantics.cc (finish_trait_type): Handle CPTK_REMOVE_EXTENT.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __remove_extent.
* g++.dg/ext/remove_extent.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/cp-trait.def  |  1 +
 gcc/cp/semantics.cc  |  5 +
 gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 +++
 gcc/testsuite/g++.dg/ext/remove_extent.C | 16 
 4 files changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/remove_extent.C

diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index cec385ee501..3ff5611b60e 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -96,6 +96,7 @@ DEFTRAIT_EXPR (REF_CONSTRUCTS_FROM_TEMPORARY, 
"__reference_constructs_from_tempo
 DEFTRAIT_EXPR (REF_CONVERTS_FROM_TEMPORARY, 
"__reference_converts_from_temporary", 2)
 DEFTRAIT_TYPE (REMOVE_CV, "__remove_cv", 1)
 DEFTRAIT_TYPE (REMOVE_CVREF, "__remove_cvref", 1)
+DEFTRAIT_TYPE (REMOVE_EXTENT, "__remove_extent", 1)
 DEFTRAIT_TYPE (REMOVE_POINTER, "__remove_pointer", 1)
 DEFTRAIT_TYPE (REMOVE_REFERENCE, "__remove_reference", 1)
 DEFTRAIT_TYPE (TYPE_PACK_ELEMENT, "__type_pack_element", -1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 8dc975495a8..6ab054b106a 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12777,6 +12777,11 @@ finish_trait_type (cp_trait_kind kind, tree type1, 
tree type2,
type1 = TREE_TYPE (type1);
   return cv_unqualified (type1);
 
+case CPTK_REMOVE_EXTENT:
+  if (TREE_CODE (type1) == ARRAY_TYPE)
+   type1 = TREE_TYPE (type1);
+  return type1;
+
 case CPTK_REMOVE_POINTER:
   if (TYPE_PTR_P (type1))
type1 = TREE_TYPE (type1);
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 56e8db7ac32..4f1094befb9 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -170,6 +170,9 @@
 #if !__has_builtin (__remove_cvref)
 # error "__has_builtin (__remove_cvref) failed"
 #endif
+#if !__has_builtin (__remove_extent)
+# error "__has_builtin (__remove_extent) failed"
+#endif
 #if !__has_builtin (__remove_pointer)
 # error "__has_builtin (__remove_pointer) failed"
 #endif
diff --git a/gcc/testsuite/g++.dg/ext/remove_extent.C 
b/gcc/testsuite/g++.dg/ext/remove_extent.C
new file mode 100644
index 000..6183aca5a48
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/remove_extent.C
@@ -0,0 +1,16 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+class ClassType { };
+
+SA(__is_same(__remove_extent(int), int));
+SA(__is_same(__remove_extent(int[2]), int));
+SA(__is_same(__remove_extent(int[2][3]), int[3]));
+SA(__is_same(__remove_extent(int[][3]), int[3]));
+SA(__is_same(__remove_extent(const int[2]), const int));
+SA(__is_same(__remove_extent(ClassType), ClassType));
+SA(__is_same(__remove_extent(ClassType[2]), ClassType));
+SA(__is_same(__remove_extent(ClassType[2][3]), ClassType[3]));
+SA(__is_same(__remove_extent(ClassType[][3]), ClassType[3]));
+SA(__is_same(__remove_extent(const ClassType[2]), const ClassType));
-- 
2.43.0



[PATCH v5 08/14] libstdc++: Optimize std::add_lvalue_reference compilation performance

2024-02-15 Thread Ken Matsui
This patch optimizes the compilation performance of
std::add_lvalue_reference by dispatching to the new
__add_lvalue_reference built-in trait.
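
For context, the dispatch is guarded by _GLIBCXX_USE_BUILTIN_TRAIT, which
as far as I know boils down to a __has_builtin check, roughly (sketch
only; the real definition lives in <bits/c++config>):

  #define _GLIBCXX_USE_BUILTIN_TRAIT(BT) __has_builtin(BT)

so compilers without the new built-in keep using the existing
class-template implementation in the #else branch.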

libstdc++-v3/ChangeLog:

* include/std/type_traits (add_lvalue_reference): Use
__add_lvalue_reference built-in trait.
(__add_lvalue_reference_helper): Likewise.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 12 
 1 file changed, 12 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 2e1cc1c1d5f..1f4e6db72f4 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -1129,6 +1129,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 };
 
   /// @cond undocumented
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__add_lvalue_reference)
+  template<typename _Tp>
+struct __add_lvalue_reference_helper
+{ using type = __add_lvalue_reference(_Tp); };
+#else
   template<typename _Tp, typename = void>
 struct __add_lvalue_reference_helper
 { using type = _Tp; };
@@ -1136,6 +1141,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _Tp>
 struct __add_lvalue_reference_helper<_Tp, __void_t<_Tp&>>
 { using type = _Tp&; };
+#endif
 
   template<typename _Tp>
 using __add_lval_ref_t = typename __add_lvalue_reference_helper<_Tp>::type;
@@ -1703,9 +1709,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 
   /// add_lvalue_reference
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__add_lvalue_reference)
+  template<typename _Tp>
+struct add_lvalue_reference
+{ using type = __add_lvalue_reference(_Tp); };
+#else
   template<typename _Tp>
 struct add_lvalue_reference
 { using type = __add_lval_ref_t<_Tp>; };
+#endif
 
   /// add_rvalue_reference
   template
-- 
2.43.0



[PATCH v5 06/14] libstdc++: Optimize std::remove_all_extents compilation performance

2024-02-15 Thread Ken Matsui
This patch optimizes the compilation performance of
std::remove_all_extents by dispatching to the new __remove_all_extents
built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (remove_all_extents): Use
__remove_all_extents built-in trait.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 0fb1762186c..2e1cc1c1d5f 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -2083,6 +2083,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 
   /// remove_all_extents
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__remove_all_extents)
+  template<typename _Tp>
+struct remove_all_extents
+{ using type = __remove_all_extents(_Tp); };
+#else
   template<typename _Tp>
 struct remove_all_extents
 { using type = _Tp; };
@@ -2094,6 +2099,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _Tp>
 struct remove_all_extents<_Tp[]>
 { using type = typename remove_all_extents<_Tp>::type; };
+#endif
 
 #if __cplusplus > 201103L
   /// Alias template for remove_extent
-- 
2.43.0



[PATCH v5 02/14] libstdc++: Optimize std::add_pointer compilation performance

2024-02-15 Thread Ken Matsui
This patch optimizes the compilation performance of std::add_pointer
by dispatching to the new __add_pointer built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (add_pointer): Use __add_pointer
built-in trait.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 21402fd8c13..3bde7cb8ba3 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -2121,6 +2121,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { };
 #endif
 
+  /// add_pointer
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__add_pointer)
+  template<typename _Tp>
+struct add_pointer
+{ using type = __add_pointer(_Tp); };
+#else
   template<typename _Tp, typename = void>
 struct __add_pointer_helper
 { using type = _Tp; };
@@ -2129,7 +2135,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct __add_pointer_helper<_Tp, __void_t<_Tp*>>
 { using type = _Tp*; };
 
-  /// add_pointer
   template<typename _Tp>
 struct add_pointer
 : public __add_pointer_helper<_Tp>
@@ -2142,6 +2147,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _Tp>
 struct add_pointer<_Tp&&>
 { using type = _Tp*; };
+#endif
 
 #if __cplusplus > 201103L
   /// Alias template for remove_pointer
-- 
2.43.0



[PATCH v5 07/14] c++: Implement __add_lvalue_reference built-in trait

2024-02-15 Thread Ken Matsui
This patch implements a built-in trait for std::add_lvalue_reference.
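
The semantics follow std::add_lvalue_reference: reference collapsing
applies, while void and cv/ref-qualified (abominable) function types are
returned unchanged because no reference to them can be formed.  For
example:

  static_assert(__is_same(__add_lvalue_reference(int&&), int&), "");
  static_assert(__is_same(__add_lvalue_reference(void), void), "");
  static_assert(__is_same(__add_lvalue_reference(bool(int) const),
                          bool(int) const), "");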

gcc/cp/ChangeLog:

* cp-trait.def: Define __add_lvalue_reference.
* semantics.cc (finish_trait_type): Handle
CPTK_ADD_LVALUE_REFERENCE.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of
__add_lvalue_reference.
* g++.dg/ext/add_lvalue_reference.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/cp-trait.def   |  1 +
 gcc/cp/semantics.cc   |  8 +++
 .../g++.dg/ext/add_lvalue_reference.C | 21 +++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C  |  3 +++
 4 files changed, 33 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/add_lvalue_reference.C

diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index ce29108bad6..7dcc6bbad76 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -48,6 +48,7 @@
 #define DEFTRAIT_TYPE_DEFAULTED
 #endif
 
+DEFTRAIT_TYPE (ADD_LVALUE_REFERENCE, "__add_lvalue_reference", 1)
 DEFTRAIT_TYPE (ADD_POINTER, "__add_pointer", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_ASSIGN, "__has_nothrow_assign", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_CONSTRUCTOR, "__has_nothrow_constructor", 1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index c8ac5167c3c..82fc31d9f9b 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12760,6 +12760,14 @@ finish_trait_type (cp_trait_kind kind, tree type1, 
tree type2,
 
   switch (kind)
 {
+case CPTK_ADD_LVALUE_REFERENCE:
+  if (VOID_TYPE_P (type1)
+ || (FUNC_OR_METHOD_TYPE_P (type1)
+ && (type_memfn_quals (type1) != TYPE_UNQUALIFIED
+ || type_memfn_rqual (type1) != REF_QUAL_NONE)))
+   return type1;
+  return cp_build_reference_type (type1, /*rval=*/false);
+
 case CPTK_ADD_POINTER:
   if (FUNC_OR_METHOD_TYPE_P (type1)
  && (type_memfn_quals (type1) != TYPE_UNQUALIFIED
diff --git a/gcc/testsuite/g++.dg/ext/add_lvalue_reference.C 
b/gcc/testsuite/g++.dg/ext/add_lvalue_reference.C
new file mode 100644
index 000..8fe1e0300e5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/add_lvalue_reference.C
@@ -0,0 +1,21 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+class ClassType { };
+
+SA(__is_same(__add_lvalue_reference(int), int&));
+SA(__is_same(__add_lvalue_reference(int&), int&));
+SA(__is_same(__add_lvalue_reference(const int), const int&));
+SA(__is_same(__add_lvalue_reference(int*), int*&));
+SA(__is_same(__add_lvalue_reference(ClassType&), ClassType&));
+SA(__is_same(__add_lvalue_reference(ClassType), ClassType&));
+SA(__is_same(__add_lvalue_reference(int(int)), int(&)(int)));
+SA(__is_same(__add_lvalue_reference(int&&), int&));
+SA(__is_same(__add_lvalue_reference(ClassType&&), ClassType&));
+SA(__is_same(__add_lvalue_reference(void), void));
+SA(__is_same(__add_lvalue_reference(const void), const void));
+SA(__is_same(__add_lvalue_reference(bool(int) const), bool(int) const));
+SA(__is_same(__add_lvalue_reference(bool(int) &), bool(int) &));
+SA(__is_same(__add_lvalue_reference(bool(int) const &&), bool(int) const &&));
+SA(__is_same(__add_lvalue_reference(bool(int)), bool(&)(int)));
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 9af64173524..1046ffe7d01 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -2,6 +2,9 @@
 // { dg-do compile }
 // Verify that __has_builtin gives the correct answer for C++ built-ins.
 
+#if !__has_builtin (__add_lvalue_reference)
+# error "__has_builtin (__add_lvalue_reference) failed"
+#endif
 #if !__has_builtin (__add_pointer)
 # error "__has_builtin (__add_pointer) failed"
 #endif
-- 
2.43.0



[PATCH v5 13/14] c++: Implement __rank built-in trait

2024-02-15 Thread Ken Matsui
This patch implements a built-in trait for std::rank.
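
Unlike the type-yielding traits in this series, __rank is an
expression-level trait: it evaluates to the number of array dimensions of
its operand as a size_t constant, which is why it is handled directly in
finish_trait_expr instead of trait_expr_value.  For example:

  static_assert(__rank(int) == 0, "");
  static_assert(__rank(int[2]) == 1, "");
  static_assert(__rank(int[][4][6]) == 3, "");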

gcc/cp/ChangeLog:

* cp-trait.def: Define __rank.
* semantics.cc (trait_expr_value): Handle CPTK_RANK.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __rank.
* g++.dg/ext/rank.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/cp-trait.def  |  1 +
 gcc/cp/semantics.cc  | 18 --
 gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 +++
 gcc/testsuite/g++.dg/ext/rank.C  | 14 ++
 4 files changed, 34 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/rank.C

diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 11270f3ae6b..3d5a7970563 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -95,6 +95,7 @@ DEFTRAIT_EXPR (IS_TRIVIALLY_ASSIGNABLE, 
"__is_trivially_assignable", 2)
 DEFTRAIT_EXPR (IS_TRIVIALLY_CONSTRUCTIBLE, "__is_trivially_constructible", -1)
 DEFTRAIT_EXPR (IS_TRIVIALLY_COPYABLE, "__is_trivially_copyable", 1)
 DEFTRAIT_EXPR (IS_UNION, "__is_union", 1)
+DEFTRAIT_EXPR (RANK, "__rank", 1)
 DEFTRAIT_EXPR (REF_CONSTRUCTS_FROM_TEMPORARY, 
"__reference_constructs_from_temporary", 2)
 DEFTRAIT_EXPR (REF_CONVERTS_FROM_TEMPORARY, 
"__reference_converts_from_temporary", 2)
 DEFTRAIT_TYPE (REMOVE_ALL_EXTENTS, "__remove_all_extents", 1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 256e7ef8166..4f285909b83 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12538,6 +12538,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree 
type2)
 case CPTK_IS_DEDUCIBLE:
   return type_targs_deducible_from (type1, type2);
 
+/* __rank is handled in finish_trait_expr. */
+case CPTK_RANK:
+
 #define DEFTRAIT_TYPE(CODE, NAME, ARITY) \
 case CPTK_##CODE:
 #include "cp-trait.def"
@@ -12698,6 +12701,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, 
tree type1, tree type2)
 case CPTK_IS_SAME:
 case CPTK_IS_SCOPED_ENUM:
 case CPTK_IS_UNION:
+case CPTK_RANK:
   break;
 
 case CPTK_IS_LAYOUT_COMPATIBLE:
@@ -12729,8 +12733,18 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, 
tree type1, tree type2)
   gcc_unreachable ();
 }
 
-  tree val = (trait_expr_value (kind, type1, type2)
- ? boolean_true_node : boolean_false_node);
+  tree val;
+  if (kind == CPTK_RANK)
+{
+  size_t rank = 0;
+  for (; TREE_CODE (type1) == ARRAY_TYPE; type1 = TREE_TYPE (type1))
+   ++rank;
+  val = build_int_cst (size_type_node, rank);
+}
+  else
+val = (trait_expr_value (kind, type1, type2)
+  ? boolean_true_node : boolean_false_node);
+
   return maybe_wrap_with_location (val, loc);
 }
 
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 5b590db1cf6..a00193c1a81 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -167,6 +167,9 @@
 #if !__has_builtin (__is_union)
 # error "__has_builtin (__is_union) failed"
 #endif
+#if !__has_builtin (__rank)
+# error "__has_builtin (__rank) failed"
+#endif
 #if !__has_builtin (__reference_constructs_from_temporary)
 # error "__has_builtin (__reference_constructs_from_temporary) failed"
 #endif
diff --git a/gcc/testsuite/g++.dg/ext/rank.C b/gcc/testsuite/g++.dg/ext/rank.C
new file mode 100644
index 000..bab062d776e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/rank.C
@@ -0,0 +1,14 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+class ClassType { };
+
+SA(__rank(int) == 0);
+SA(__rank(int[2]) == 1);
+SA(__rank(int[][4]) == 2);
+SA(__rank(int[2][2][4][4][6][6]) == 6);
+SA(__rank(ClassType) == 0);
+SA(__rank(ClassType[2]) == 1);
+SA(__rank(ClassType[][4]) == 2);
+SA(__rank(ClassType[2][2][4][4][6][6]) == 6);
-- 
2.43.0



[PATCH v5 05/14] c++: Implement __remove_all_extents built-in trait

2024-02-15 Thread Ken Matsui
This patch implements a built-in trait for std::remove_all_extents.

gcc/cp/ChangeLog:

* cp-trait.def: Define __remove_all_extents.
* semantics.cc (finish_trait_type): Handle
CPTK_REMOVE_ALL_EXTENTS.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of
__remove_all_extents.
* g++.dg/ext/remove_all_extents.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/cp-trait.def   |  1 +
 gcc/cp/semantics.cc   |  3 +++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C  |  3 +++
 gcc/testsuite/g++.dg/ext/remove_all_extents.C | 16 
 4 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/remove_all_extents.C

diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 3ff5611b60e..ce29108bad6 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -94,6 +94,7 @@ DEFTRAIT_EXPR (IS_TRIVIALLY_COPYABLE, 
"__is_trivially_copyable", 1)
 DEFTRAIT_EXPR (IS_UNION, "__is_union", 1)
 DEFTRAIT_EXPR (REF_CONSTRUCTS_FROM_TEMPORARY, 
"__reference_constructs_from_temporary", 2)
 DEFTRAIT_EXPR (REF_CONVERTS_FROM_TEMPORARY, 
"__reference_converts_from_temporary", 2)
+DEFTRAIT_TYPE (REMOVE_ALL_EXTENTS, "__remove_all_extents", 1)
 DEFTRAIT_TYPE (REMOVE_CV, "__remove_cv", 1)
 DEFTRAIT_TYPE (REMOVE_CVREF, "__remove_cvref", 1)
 DEFTRAIT_TYPE (REMOVE_EXTENT, "__remove_extent", 1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 6ab054b106a..c8ac5167c3c 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12769,6 +12769,9 @@ finish_trait_type (cp_trait_kind kind, tree type1, tree 
type2,
type1 = TREE_TYPE (type1);
   return build_pointer_type (type1);
 
+case CPTK_REMOVE_ALL_EXTENTS:
+  return strip_array_types (type1);
+
 case CPTK_REMOVE_CV:
   return cv_unqualified (type1);
 
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 4f1094befb9..9af64173524 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -164,6 +164,9 @@
 #if !__has_builtin (__reference_converts_from_temporary)
 # error "__has_builtin (__reference_converts_from_temporary) failed"
 #endif
+#if !__has_builtin (__remove_all_extents)
+# error "__has_builtin (__remove_all_extents) failed"
+#endif
 #if !__has_builtin (__remove_cv)
 # error "__has_builtin (__remove_cv) failed"
 #endif
diff --git a/gcc/testsuite/g++.dg/ext/remove_all_extents.C 
b/gcc/testsuite/g++.dg/ext/remove_all_extents.C
new file mode 100644
index 000..60ade2ade7f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/remove_all_extents.C
@@ -0,0 +1,16 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+class ClassType { };
+
+SA(__is_same(__remove_all_extents(int), int));
+SA(__is_same(__remove_all_extents(int[2]), int));
+SA(__is_same(__remove_all_extents(int[2][3]), int));
+SA(__is_same(__remove_all_extents(int[][3]), int));
+SA(__is_same(__remove_all_extents(const int[2][3]), const int));
+SA(__is_same(__remove_all_extents(ClassType), ClassType));
+SA(__is_same(__remove_all_extents(ClassType[2]), ClassType));
+SA(__is_same(__remove_all_extents(ClassType[2][3]), ClassType));
+SA(__is_same(__remove_all_extents(ClassType[][3]), ClassType));
+SA(__is_same(__remove_all_extents(const ClassType[2][3]), const ClassType));
-- 
2.43.0



[PATCH v5 10/14] libstdc++: Optimize std::add_rvalue_reference compilation performance

2024-02-15 Thread Ken Matsui
This patch optimizes the compilation performance of
std::add_rvalue_reference by dispatching to the new
__add_rvalue_reference built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (add_rvalue_reference): Use
__add_rvalue_reference built-in trait.
(__add_rvalue_reference_helper): Likewise.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 12 
 1 file changed, 12 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 1f4e6db72f4..219d36fabba 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -1157,6 +1157,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 };
 
   /// @cond undocumented
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__add_rvalue_reference)
+  template<typename _Tp>
+struct __add_rvalue_reference_helper
+{ using type = __add_rvalue_reference(_Tp); };
+#else
   template<typename _Tp, typename = void>
 struct __add_rvalue_reference_helper
 { using type = _Tp; };
@@ -1164,6 +1169,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _Tp>
 struct __add_rvalue_reference_helper<_Tp, __void_t<_Tp&&>>
 { using type = _Tp&&; };
+#endif
 
   template<typename _Tp>
 using __add_rval_ref_t = typename __add_rvalue_reference_helper<_Tp>::type;
@@ -1720,9 +1726,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 
   /// add_rvalue_reference
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__add_rvalue_reference)
+  template<typename _Tp>
+struct add_rvalue_reference
+{ using type = __add_rvalue_reference(_Tp); };
+#else
   template<typename _Tp>
 struct add_rvalue_reference
 { using type = __add_rval_ref_t<_Tp>; };
+#endif
 
 #if __cplusplus > 201103L
   /// Alias template for remove_reference
-- 
2.43.0



[PATCH v5 14/14] libstdc++: Optimize std::rank compilation performance

2024-02-15 Thread Ken Matsui
This patch optimizes the compilation performance of std::rank
by dispatching to the new __rank built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (rank): Use __rank built-in trait.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 90718d772dd..5d2e6eaa2a2 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -1445,6 +1445,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 };
 
   /// rank
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__rank)
+  template<typename _Tp>
+    struct rank
+    : public integral_constant<std::size_t, __rank(_Tp)> { };
+#else
   template<typename>
     struct rank
     : public integral_constant<std::size_t, 0> { };
@@ -1456,6 +1461,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _Tp>
     struct rank<_Tp[]>
     : public integral_constant<std::size_t, 1 + rank<_Tp>::value> { };
+#endif
 
   /// extent
   template
-- 
2.43.0



[PATCH v5 09/14] c++: Implement __add_rvalue_reference built-in trait

2024-02-15 Thread Ken Matsui
This patch implements a built-in trait for std::add_rvalue_reference.

gcc/cp/ChangeLog:

* cp-trait.def: Define __add_rvalue_reference.
* semantics.cc (finish_trait_type): Handle
CPTK_ADD_RVALUE_REFERENCE.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of
__add_rvalue_reference.
* g++.dg/ext/add_rvalue_reference.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/cp-trait.def   |  1 +
 gcc/cp/semantics.cc   |  8 
 .../g++.dg/ext/add_rvalue_reference.C | 20 +++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C  |  3 +++
 4 files changed, 32 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/add_rvalue_reference.C

diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 7dcc6bbad76..9e8f9eb38b8 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -50,6 +50,7 @@
 
 DEFTRAIT_TYPE (ADD_LVALUE_REFERENCE, "__add_lvalue_reference", 1)
 DEFTRAIT_TYPE (ADD_POINTER, "__add_pointer", 1)
+DEFTRAIT_TYPE (ADD_RVALUE_REFERENCE, "__add_rvalue_reference", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_ASSIGN, "__has_nothrow_assign", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_CONSTRUCTOR, "__has_nothrow_constructor", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_COPY, "__has_nothrow_copy", 1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 82fc31d9f9b..f437e272ea6 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12777,6 +12777,14 @@ finish_trait_type (cp_trait_kind kind, tree type1, 
tree type2,
type1 = TREE_TYPE (type1);
   return build_pointer_type (type1);
 
+case CPTK_ADD_RVALUE_REFERENCE:
+  if (VOID_TYPE_P (type1)
+ || (FUNC_OR_METHOD_TYPE_P (type1)
+ && (type_memfn_quals (type1) != TYPE_UNQUALIFIED
+ || type_memfn_rqual (type1) != REF_QUAL_NONE)))
+   return type1;
+  return cp_build_reference_type (type1, /*rval=*/true);
+
 case CPTK_REMOVE_ALL_EXTENTS:
   return strip_array_types (type1);
 
diff --git a/gcc/testsuite/g++.dg/ext/add_rvalue_reference.C 
b/gcc/testsuite/g++.dg/ext/add_rvalue_reference.C
new file mode 100644
index 000..c92fe6bfa17
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/add_rvalue_reference.C
@@ -0,0 +1,20 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+class ClassType { };
+
+SA(__is_same(__add_rvalue_reference(int), int&&));
+SA(__is_same(__add_rvalue_reference(int&&), int&&));
+SA(__is_same(__add_rvalue_reference(int&), int&));
+SA(__is_same(__add_rvalue_reference(const int), const int&&));
+SA(__is_same(__add_rvalue_reference(int*), int*&&));
+SA(__is_same(__add_rvalue_reference(ClassType&&), ClassType&&));
+SA(__is_same(__add_rvalue_reference(ClassType), ClassType&&));
+SA(__is_same(__add_rvalue_reference(int(int)), int(&&)(int)));
+SA(__is_same(__add_rvalue_reference(void), void));
+SA(__is_same(__add_rvalue_reference(const void), const void));
+SA(__is_same(__add_rvalue_reference(bool(int) const), bool(int) const));
+SA(__is_same(__add_rvalue_reference(bool(int) &), bool(int) &));
+SA(__is_same(__add_rvalue_reference(bool(int) const &&), bool(int) const &&));
+SA(__is_same(__add_rvalue_reference(bool(int)), bool(&&)(int)));
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 1046ffe7d01..9d7e59b47fb 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -8,6 +8,9 @@
 #if !__has_builtin (__add_pointer)
 # error "__has_builtin (__add_pointer) failed"
 #endif
+#if !__has_builtin (__add_rvalue_reference)
+# error "__has_builtin (__add_rvalue_reference) failed"
+#endif
 #if !__has_builtin (__builtin_addressof)
 # error "__has_builtin (__builtin_addressof) failed"
 #endif
-- 
2.43.0



[PATCH v5 12/14] libstdc++: Optimize std::decay compilation performance

2024-02-15 Thread Ken Matsui
This patch optimizes the compilation performance of std::decay
by dispatching to the new __decay built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (decay): Use __decay built-in trait.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 219d36fabba..90718d772dd 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -2288,6 +2288,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   /// @cond undocumented
 
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__decay)
+  template<typename _Tp>
+struct decay
+{ using type = __decay(_Tp); };
+#else
   // Decay trait for arrays and functions, used for perfect forwarding
   // in make_pair, make_tuple, etc.
   template
@@ -2319,6 +2324,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _Tp>
 struct decay<_Tp&&>
 { using type = typename __decay_selector<_Tp>::type; };
+#endif
 
   /// @cond undocumented
 
-- 
2.43.0



[PATCH v5 04/14] libstdc++: Optimize std::remove_extent compilation performance

2024-02-15 Thread Ken Matsui
This patch optimizes the compilation performance of std::remove_extent
by dispatching to the new __remove_extent built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (remove_extent): Use __remove_extent
built-in trait.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 3bde7cb8ba3..0fb1762186c 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -2064,6 +2064,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Array modifications.
 
   /// remove_extent
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__remove_extent)
+  template<typename _Tp>
+struct remove_extent
+{ using type = __remove_extent(_Tp); };
+#else
   template<typename _Tp>
 struct remove_extent
 { using type = _Tp; };
@@ -2075,6 +2080,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _Tp>
 struct remove_extent<_Tp[]>
 { using type = _Tp; };
+#endif
 
   /// remove_all_extents
   template
-- 
2.43.0



[PATCH v5 01/14] c++: Implement __add_pointer built-in trait

2024-02-15 Thread Ken Matsui
This patch implements a built-in trait for std::add_pointer.
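
As with std::add_pointer, the trait forms a pointer to the referenced or
given type, except for cv/ref-qualified function types, which are
returned unchanged since pointers to such types cannot be formed.  For
example:

  static_assert(__is_same(__add_pointer(int&), int*), "");
  static_assert(__is_same(__add_pointer(void() const), void() const), "");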

gcc/cp/ChangeLog:

* cp-trait.def: Define __add_pointer.
* semantics.cc (finish_trait_type): Handle CPTK_ADD_POINTER.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __add_pointer.
* g++.dg/ext/add_pointer.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/cp-trait.def  |  1 +
 gcc/cp/semantics.cc  |  9 ++
 gcc/testsuite/g++.dg/ext/add_pointer.C   | 39 
 gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 ++
 4 files changed, 52 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/add_pointer.C

diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 394f006f20f..cec385ee501 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -48,6 +48,7 @@
 #define DEFTRAIT_TYPE_DEFAULTED
 #endif
 
+DEFTRAIT_TYPE (ADD_POINTER, "__add_pointer", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_ASSIGN, "__has_nothrow_assign", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_CONSTRUCTOR, "__has_nothrow_constructor", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_COPY, "__has_nothrow_copy", 1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 57840176863..8dc975495a8 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12760,6 +12760,15 @@ finish_trait_type (cp_trait_kind kind, tree type1, 
tree type2,
 
   switch (kind)
 {
+case CPTK_ADD_POINTER:
+  if (FUNC_OR_METHOD_TYPE_P (type1)
+ && (type_memfn_quals (type1) != TYPE_UNQUALIFIED
+ || type_memfn_rqual (type1) != REF_QUAL_NONE))
+   return type1;
+  if (TYPE_REF_P (type1))
+   type1 = TREE_TYPE (type1);
+  return build_pointer_type (type1);
+
 case CPTK_REMOVE_CV:
   return cv_unqualified (type1);
 
diff --git a/gcc/testsuite/g++.dg/ext/add_pointer.C 
b/gcc/testsuite/g++.dg/ext/add_pointer.C
new file mode 100644
index 000..c405cdd0feb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/add_pointer.C
@@ -0,0 +1,39 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+class ClassType { };
+
+SA(__is_same(__add_pointer(int), int*));
+SA(__is_same(__add_pointer(int*), int**));
+SA(__is_same(__add_pointer(const int), const int*));
+SA(__is_same(__add_pointer(int&), int*));
+SA(__is_same(__add_pointer(ClassType*), ClassType**));
+SA(__is_same(__add_pointer(ClassType), ClassType*));
+SA(__is_same(__add_pointer(void), void*));
+SA(__is_same(__add_pointer(const void), const void*));
+SA(__is_same(__add_pointer(volatile void), volatile void*));
+SA(__is_same(__add_pointer(const volatile void), const volatile void*));
+
+void f1();
+using f1_type = decltype(f1);
+using pf1_type = decltype(&f1);
+SA(__is_same(__add_pointer(f1_type), pf1_type));
+
+void f2() noexcept; // PR libstdc++/78361
+using f2_type = decltype(f2);
+using pf2_type = decltype(&f2);
+SA(__is_same(__add_pointer(f2_type), pf2_type));
+
+using fn_type = void();
+using pfn_type = void(*)();
+SA(__is_same(__add_pointer(fn_type), pfn_type));
+
+SA(__is_same(__add_pointer(void() &), void() &));
+SA(__is_same(__add_pointer(void() & noexcept), void() & noexcept));
+SA(__is_same(__add_pointer(void() const), void() const));
+SA(__is_same(__add_pointer(void(...) &), void(...) &));
+SA(__is_same(__add_pointer(void(...) & noexcept), void(...) & noexcept));
+SA(__is_same(__add_pointer(void(...) const), void(...) const));
+
+SA(__is_same(__add_pointer(void() __restrict), void() __restrict));
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 02b4b4d745d..56e8db7ac32 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -2,6 +2,9 @@
 // { dg-do compile }
 // Verify that __has_builtin gives the correct answer for C++ built-ins.
 
+#if !__has_builtin (__add_pointer)
+# error "__has_builtin (__add_pointer) failed"
+#endif
 #if !__has_builtin (__builtin_addressof)
 # error "__has_builtin (__builtin_addressof) failed"
 #endif
-- 
2.43.0



Re: [PATCH] libgccjit: Clear pending_assemble_externals_processed

2024-02-15 Thread David Malcolm
On Thu, 2024-02-08 at 17:09 -0500, Antoni Boucher wrote:
> Hi.
> This patch fixes the bug 113842.
> I cannot yet add a test with this patch since it requires using
> try/catch which is not yet merged in master.
> Thanks for the review.

Thanks; patch looks good for trunk, assuming you've tested it on a
target that defines ASM_OUTPUT_EXTERNAL.

Dave



Re: [PATCH] libgccjit: Fix ira cost segfault

2024-02-15 Thread Antoni Boucher
This patch is indeed not necessary anymore.

On Wed, 2024-01-10 at 09:32 -0500, David Malcolm wrote:
> On Wed, 2024-01-10 at 09:30 -0500, David Malcolm wrote:
> > On Thu, 2023-11-16 at 17:28 -0500, Antoni Boucher wrote:
> > > Hi.
> > > This patch fixes a segfault that happens when compiling librsvg
> > > (more
> > > specifically its dependency aho-corasick) with rustc_codegen_gcc
> > > (bug
> > > 112575).
> > > I was not able to create a reproducer for this bug: I'm assuming
> > > I
> > > might need to concat all the reproducers together in the same
> > > file
> > > in
> > > order to be able to reproduce the issue.
> > 
> > Hi Antoni
> > 
> > Thanks for the patch; sorry for missing it before.
> > 
> > CCing the i386 maintainers; quoting the patch here to give them
> > context:
> 
> Oops; actually adding them to the CC this time; sorry.
> 
> > 
> > > From e0f4f51682266bc9f507afdb64908ed3695a2f5e Mon Sep 17 00:00:00
> > > 2001
> > > From: Antoni Boucher 
> > > Date: Thu, 2 Nov 2023 17:18:35 -0400
> > > Subject: [PATCH] libgccjit: Fix ira cost segfault
> > > 
> > > gcc/ChangeLog:
> > > PR jit/112575
> > > * config/i386/i386-options.cc
> > > (ix86_option_override_internal):
> > > Cleanup target_attribute_cache.
> > > ---
> > >  gcc/config/i386/i386-options.cc | 6 ++
> > >  1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/gcc/config/i386/i386-options.cc
> > > b/gcc/config/i386/i386-options.cc
> > > index df7d24352d1..f596c0fb53c 100644
> > > --- a/gcc/config/i386/i386-options.cc
> > > +++ b/gcc/config/i386/i386-options.cc
> > > @@ -3070,6 +3070,12 @@ ix86_option_override_internal (bool
> > > main_args_p,
> > > = opts->x_flag_unsafe_math_optimizations;
> > >    target_option_default_node = target_option_current_node
> > >  = build_target_option_node (opts, opts_set);
> > > +  /* TODO: check if this is the correct location.  It should
> > > probably be in
> > > +    some finalizer function, but I don't
> > > +    know if there's one.  */
> > > +  target_attribute_cache[0] = NULL;
> > > +  target_attribute_cache[1] = NULL;
> > > +  target_attribute_cache[2] = NULL;
> > >  }
> > >  
> > >    if (opts->x_flag_cf_protection != CF_NONE)
> > > -- 
> > > 2.42.1
> > > 
> > 
> > Presumably this happens when there's more than one in-process
> > invocation of the compiler code (via libgccjit).
> > 
> > > 
> > > I'm also not sure I put the cleanup in the correct location.
> > > Is there any finalizer function for target specific code?
> > 
> > As you know (but the i386 maintainers might not), to allow multiple
> > in-
> > process invocations of the compiler code (for libgccjit) we've been
> > putting code to reset global state in various
> > {filename_cc}_finalize
> > functions called from toplev::finalize (see the end of toplev.cc).
> > 
> > There doesn't seem to be any kind of hook at this time for calling
> > target-specific cleanups from toplev::finalize.
> > 
> > However, as of r14-4003-geaa8e8541349df ggc_common_finalize zeroes
> > everything marked with GTY.  The array target_attribute_cache does
> > have
> > a GTY marking, so perhaps as of that commit this patch isn't
> > necessary?
> > 
> > Otherwise, if special-casing this is required, sorry: I'm not
> > familiar
> > enough with i386-options.cc to know if the patch is correct.
> > 
> > > 
> > > Thanks to fix this issue.
> > 
> > Dave
> 



[PATCH 0/1 V2] Target independent code for common infrastructure of load,store fusion for rs6000 and aarch64 target.

2024-02-15 Thread Ajit Agarwal
Hello Richard:

As per your suggestion, I have divided the patch into target-independent
and target-dependent parts for the aarch64 target.  I kept
aarch64-ldp-fusion.cc the same and did not change it.

The common infrastructure for load/store pair fusion is divided into
target-independent code and target-dependent code for the rs6000 and
aarch64 targets.

Target independent code is structured in the following files.
gcc/pair-fusion-base.h
gcc/pair-fusion-common.cc
gcc/pair-fusion.cc

The target-independent code is generic code that uses pure virtual
functions as the interface between the target-independent and
target-dependent code.
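
To illustrate the intended shape of the interface (the names below are
only illustrative, not necessarily the ones used in pair-fusion-base.h;
see the patch for the real class): the generic pass core calls into the
target only through pure virtual hooks which each backend overrides,
roughly:

  // Illustrative sketch only, not the actual interface.
  struct pair_fusion
  {
    // Hooks the target-independent code calls; rs6000 and aarch64
    // each provide a derived class implementing them.
    virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
    virtual bool pair_mem_ok_with_policy (rtx mem, bool load_p) = 0;
    virtual rtx gen_pair (rtx *pats, rtx writeback, bool load_p) = 0;
    virtual ~pair_fusion () {}
  };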

Thanks & Regards
Ajit

Target independent code for common infrastructure of load
store fusion for rs6000 and aarch64 target.

The common infrastructure for load/store pair fusion is divided into
target-independent code and target-dependent code for the rs6000 and
aarch64 targets.

Target independent code is structured in the following files.
gcc/pair-fusion-base.h
gcc/pair-fusion-common.cc
gcc/pair-fusion.cc

The target-independent code is generic code that uses pure virtual
functions as the interface between the target-independent and
target-dependent code.

2024-02-15  Ajit Kumar Agarwal  

gcc/ChangeLog:

* pair-fusion-base.h: Generic header code for load store fusion
that can be shared across different architectures.
* pair-fusion-common.cc: Generic source code for load store
fusion that can be shared across different architectures.
* pair-fusion.cc: Generic implementation of pair_fusion class
defined in pair-fusion-base.h
* Makefile.in: Add new object files pair-fusion.o and
pair-fusion-common.o.
---
 gcc/Makefile.in   |2 +
 gcc/pair-fusion-base.h|  586 ++
 gcc/pair-fusion-common.cc | 1202 
 gcc/pair-fusion.cc| 1225 +
 4 files changed, 3015 insertions(+)
 create mode 100644 gcc/pair-fusion-base.h
 create mode 100644 gcc/pair-fusion-common.cc
 create mode 100644 gcc/pair-fusion.cc

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a74761b7ab3..df5061ddfe7 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1563,6 +1563,8 @@ OBJS = \
ipa-strub.o \
ipa.o \
ira.o \
+   pair-fusion-common.o \
+   pair-fusion.o \
ira-build.o \
ira-costs.o \
ira-conflicts.o \
diff --git a/gcc/pair-fusion-base.h b/gcc/pair-fusion-base.h
new file mode 100644
index 000..fdaf4fd743d
--- /dev/null
+++ b/gcc/pair-fusion-base.h
@@ -0,0 +1,586 @@
+// Generic code for Pair MEM  fusion optimization pass.
+// Copyright (C) 2023-2024 Free Software Foundation, Inc.
+//
+// This file is part of GCC.
+//
+// GCC is free software; you can redistribute it and/or modify it
+// under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// GCC is distributed in the hope that it will be useful, but
+// WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+// General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with GCC; see the file COPYING3.  If not see
+// .
+
+#ifndef GCC_PAIR_FUSION_H
+#define GCC_PAIR_FUSION_H
+#define INCLUDE_ALGORITHM
+#define INCLUDE_FUNCTIONAL
+#define INCLUDE_LIST
+#define INCLUDE_TYPE_TRAITS
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "df.h"
+#include "rtl-iter.h"
+#include "rtl-ssa.h"
+#include "cfgcleanup.h"
+#include "tree-pass.h"
+#include "ordered-hash-map.h"
+#include "tree-dfa.h"
+#include "fold-const.h"
+#include "tree-hash-traits.h"
+#include "print-tree.h"
+#include "insn-attr.h"
+using namespace rtl_ssa;
+// We pack these fields (load_p, fpsimd_p, and size) into an integer
+// (LFS) which we use as part of the key into the main hash tables.
+//
+// The idea is that we group candidates together only if they agree on
+// the fields below.  Candidates that disagree on any of these
+// properties shouldn't be merged together.
+struct lfs_fields
+{
+  bool load_p;
+  bool fpsimd_p;
+  unsigned size;
+};
+
+using insn_list_t = std::list<insn_info *>;
+using insn_iter_t = insn_list_t::iterator;
+
+// Information about the accesses at a given offset from a particular
+// base.  Stored in an access_group, see below.
+struct access_record
+{
+  poly_int64 offset;
+  std::list<insn_info *> cand_insns;
+  std::list<access_record>::iterator place;
+
+  access_record (poly_int64 off) : offset (off) {}
+};
+
+// A group of accesses where adjacent accesses could be ldp/stp
+// candidates.  The splay tree supports efficient insertion,
+// while the list supports efficient iteration.
+struct access_group
+{
+  splay_tree<access_record *> tree;
+  std::list<access_record> list;
+
+  template<typename Alloc>
+  inline void track (Alloc node_alloc, poly_int64 offset, insn_info *insn

Re: [PATCH] Notes on the warnings-as-errors change in GCC 14

2024-02-15 Thread Florian Weimer
* Jonathan Wakely:

>>+To fix the remaining int-conversions issues, add casts
>>+to an appropriate pointer or integer type.  On GNU systems, the
>>+standard (but generally optional) types
>
> I know what you mean here, but I'm not sure the parenthesis adds
> clarity for anybody who doesn't already know that those types are
> optional for conformance. I think they're supported for all targets
> that GCC supports, so maybe just omit the parenthesis.

Fair enough, I've dropped the parentheses.  Thank you for your other
suggestions as well, I have applied them.

Florian



Re: [PATCH][GCC 12] aarch64: Avoid out-of-range shrink-wrapped saves [PR111677]

2024-02-15 Thread Alex Coplan
On 14/02/2024 11:18, Richard Sandiford wrote:
> Alex Coplan  writes:
> > This is a backport of the GCC 13 fix for PR111677 to the GCC 12 branch.
> > The only part of the patch that isn't a straight cherry-pick is due to
> > the TX iterator lacking TDmode for GCC 12, so this version adjusts
> > TX_V16QI accordingly.
> >
> > Bootstrapped/regtested on aarch64-linux-gnu, the only changes in the
> > testsuite I saw were in
> > gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c where the dg-output
> > "READ of size 4 [...]" check appears to be flaky on the GCC 12 branch
> > since libhwasan gained the short granule tag feature, I've requested a
> > backport of the following patch (committed as
> > r13-100-g3771486daa1e904ceae6f3e135b28e58af33849f) which should fix that
> > (independent) issue for GCC 12:
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645278.html
> >
> > OK for the GCC 12 branch?
> 
> OK, thanks.

Thanks.  The patch cherry-picks cleanly on the GCC 11 branch, and
bootstraps/regtests OK there.  Is it OK for GCC 11 too, even though the
issue is latent there (at least for the testcase in the patch)?

Alex

> 
> Richard
> 
> > Thanks,
> > Alex
> >
> > -- >8 --
> >
> > The PR shows us ICEing due to an unrecognizable TFmode save emitted by
> > aarch64_process_components.  The problem is that for T{I,F,D}mode we
> > conservatively require mems to be in range for x-register ldp/stp.  That
> > is because (at least for TImode) it can be allocated to both GPRs and
> > FPRs, and in the GPR case that is an x-reg ldp/stp, and the FPR case is
> > a q-register load/store.
> >
> > As Richard pointed out in the PR, aarch64_get_separate_components
> > already checks that the offsets are suitable for a single load, so we
> > just need to choose a mode in aarch64_reg_save_mode that gives the full
> > q-register range.  In this patch, we choose V16QImode as an alternative
> > 16-byte "bag-of-bits" mode that doesn't have the artificial range
> > restrictions imposed on T{I,F,D}mode.
> >
> > Unlike for GCC 14 we need additional handling in the load/store pair
> > code as various cases are not expecting to see V16QImode (particularly
> > the writeback patterns, but also aarch64_gen_load_pair).
> >
> > gcc/ChangeLog:
> >
> > PR target/111677
> > * config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use
> > V16QImode for the full 16-byte FPR saves in the vector PCS case.
> > (aarch64_gen_storewb_pair): Handle V16QImode.
> > (aarch64_gen_loadwb_pair): Likewise.
> > (aarch64_gen_load_pair): Likewise.
> > * config/aarch64/aarch64.md (loadwb_pair_):
> > Rename to ...
> > (loadwb_pair_): ... this, extending to
> > V16QImode.
> > (storewb_pair_): Rename to ...
> > (storewb_pair_): ... this, extending to
> > V16QImode.
> > * config/aarch64/iterators.md (TX_V16QI): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/111677
> > * gcc.target/aarch64/torture/pr111677.c: New test.
> >
> > (cherry picked from commit 2bd8264a131ee1215d3bc6181722f9d30f5569c3)
> > ---
> >  gcc/config/aarch64/aarch64.cc | 13 ++-
> >  gcc/config/aarch64/aarch64.md | 35 ++-
> >  gcc/config/aarch64/iterators.md   |  3 ++
> >  .../gcc.target/aarch64/torture/pr111677.c | 28 +++
> >  4 files changed, 61 insertions(+), 18 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/torture/pr111677.c
> >
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 3bccd96a23d..2bbba323770 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -4135,7 +4135,7 @@ aarch64_reg_save_mode (unsigned int regno)
> >case ARM_PCS_SIMD:
> > /* The vector PCS saves the low 128 bits (which is the full
> >register on non-SVE targets).  */
> > -   return TFmode;
> > +   return V16QImode;
> >  
> >case ARM_PCS_SVE:
> > /* Use vectors of DImode for registers that need frame
> > @@ -8602,6 +8602,10 @@ aarch64_gen_storewb_pair (machine_mode mode, rtx 
> > base, rtx reg, rtx reg2,
> >return gen_storewb_pairtf_di (base, base, reg, reg2,
> > GEN_INT (-adjustment),
> > GEN_INT (UNITS_PER_VREG - adjustment));
> > +case E_V16QImode:
> > +  return gen_storewb_pairv16qi_di (base, base, reg, reg2,
> > +  GEN_INT (-adjustment),
> > +  GEN_INT (UNITS_PER_VREG - adjustment));
> >  default:
> >gcc_unreachable ();
> >  }
> > @@ -8647,6 +8651,10 @@ aarch64_gen_loadwb_pair (machine_mode mode, rtx 
> > base, rtx reg, rtx reg2,
> >  case E_TFmode:
> >return gen_loadwb_pairtf_di (base, base, reg, reg2, GEN_INT 
> > (adjustment),
> >GEN_INT (UNITS_PER_VREG));
> > +case E_V16QImode:
> > +  return gen_loadwb_pairv16qi_di (base, base, 

Re: [PATCH] Notes on the warnings-as-errors change in GCC 14

2024-02-15 Thread Florian Weimer
* Sam James:

> It's fine if you leave this out, but consider mentioning the common
> pitfall of autoconf projects not including config.h consistently before
> all includes. We could also mention AC_USE_SYSTEM_EXTENSIONS.

I added: 

“
Alternatively, projects using Autoconf
could enable AC_USE_SYSTEM_EXTENSIONS.
”
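
For projects that go that route, the usual pattern (from memory, so worth
double-checking against the Autoconf manual) is to invoke the macro early
in configure.ac, for example:

  AC_INIT([mypkg], [1.0])          dnl placeholder name/version
  AC_CONFIG_HEADERS([config.h])
  AC_USE_SYSTEM_EXTENSIONS         dnl defines _GNU_SOURCE etc. in config.h
  AC_PROG_CC

so the extension declarations become visible wherever config.h is
included first.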

<config.h> inclusion is a larger issue, I think, best addressed by
future diagnostics.

>> +
>> +When building library code on GNU systems, it was possible to call
>> +undefined (not just undeclared) functions and still run other code in
>> +the library, particularly if ELF lazy binding was used.  Only
>> +executing the undefined function call would result in a lazy binding
>> +error and program crash.
>
> Maybe explicitly refer to the bfd linker's relaxed behaviour so it
> sounds less mysterious.

Like this?

“

When building library code on GNU systems,
<a href="https://sourceware.org/binutils/docs-2.42/ld/Options.html#index-_002d_002dallow_002dshlib_002dundefined">it
was possible to call</a>
undefined (not just undeclared) functions
and still run other code in the library, particularly if ELF lazy binding
was used.  Only executing the undefined function call would result in a
lazy binding error and program crash.
”

Thanks,
Florian



Re: [PATCH] Notes on the warnings-as-errors change in GCC 14

2024-02-15 Thread Sam James

Florian Weimer  writes:

> * Sam James:
>
>> It's fine if you leave this out, but consider mentioning the common
>> pitfall of autoconf projects not including config.h consistently before
>> all includes. We could also mention AC_USE_SYSTEM_EXTENSIONS.
>
> I added: 
>
> “
> Alternatively, projects using Autoconf
> could enable AC_USE_SYSTEM_EXTENSIONS.
> ”
>
> <config.h> inclusion is a larger issue, I think, best addressed by
> future diagnostics.

OK, works for me. We should discuss some options for the latter at some
point though (I think we could do it for libc cases where it matters for
LFS at least) but that's for another time.

>
>>> +
>>> +When building library code on GNU systems, it was possible to call
>>> +undefined (not just undeclared) functions and still run other code in
>>> +the library, particularly if ELF lazy binding was used.  Only
>>> +executing the undefined function call would result in a lazy binding
>>> +error and program crash.
>>
>> Maybe explicitly refer to the bfd linker's relaxed behaviour so it
>> sounds less mysterious.
>
> Like this?
>
> “
> 
> When building library code on GNU systems,
> <a href="https://sourceware.org/binutils/docs-2.42/ld/Options.html#index-_002d_002dallow_002dshlib_002dundefined">it
> was possible to call</a>
> undefined (not just undeclared) functions
> and still run other code in the library, particularly if ELF lazy binding
> was used.  Only executing the undefined function call would result in a
> lazy binding error and program crash.
> ”

Sounds good, thanks.

>
> Thanks,
> Florian

best,
sam


signature.asc
Description: PGP signature


Re: [PATCH] Notes on the warnings-as-errors change in GCC 14

2024-02-15 Thread Florian Weimer
* Gerald Pfeifer:

>>  This mostly happens in function definitions
>> +that are not prototypes
>
> Naive questions: Can definitions really be prototypes (in C)?

Yes, I think so: definitions can be declarations, and function
prototypes are declarations.  The standard uses the phrase “function
definition that does not include a function prototype declarator”.
Should I write “old-style function definition” instead?
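
For instance (hypothetical illustration):

  int f (x)      /* old-style definition, not a prototype */
  int x;
  { return x; }

  int g (int x)  /* definition whose declarator is also a prototype */
  { return x; }

The second form also declares a prototype for subsequent calls; the
first does not.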

>
>> +declared outside the parameter list.  Using the correct
>> +type maybe required to avoid int-conversion errors (see below).
>
> Something feels odd with this sentence?

The fix is to write “may[ ]be“, as suggested by other reviewers.

>> +Incorrectly spelled type names in function declarations are treated as
>> +errors in more cases, under a
>> +new -Wdeclaration-missing-parameter-type warning.  The
>> +second line in the following example is now treated as an error
>> +(previously this resulted in an unnamed warning):
>
> What is an "unnamed" warning? Can we simply omit "unnamed" here?

A warning not controlled by a specific -W… option.  I've made the
change.

>> +GCC will type-check function arguments after that, potentially
>> +requiring further changes.  (Previously, the function declaration was
>> +treated as not having no prototype.)
>
> That second sentence uses double negation, which logically is the same as 
> just the original statement.

Other reviews suggests to change it to “not having [a] prototype”.

>> +
>> +By default, GCC still accepts returning an expression of
>> +type void from within a function that itself
>> +returns void, as a GNU extension that matches C++ rules
>> +in this area.
>
> Does the GNU extension match C++ (standard rules)?

Yes.  Should I write “matches [standard] C++ rules”?

Thanks,
Florian



RE: [PATCH] aarch64: Improve PERM<{0}, a, ...> (64bit) by adding whole vector shift right [PR113872]

2024-02-15 Thread Tamar Christina
> -Original Message-
> From: Richard Sandiford 
> Sent: Thursday, February 15, 2024 2:56 PM
> To: Andrew Pinski 
> Cc: gcc-patches@gcc.gnu.org; Tamar Christina 
> Subject: Re: [PATCH] aarch64: Improve PERM<{0}, a, ...> (64bit) by adding 
> whole
> vector shift right [PR113872]
> 
> Andrew Pinski  writes:
> > The backend currently defines a whole vector shift left for 64bit vectors, 
> > adding
> the
> > shift right can also improve code for some PERMs too. So this adds that 
> > pattern.
> 
> Is this reversed?  It looks like we have the shift right and the patch is
> adding the shift left (at least in GCC internal and little-endian terms).
> 
> But on many Arm cores, EXT has a higher throughput than SHL, so I don't think
> we should do this unconditionally.

Yeah, on most (if not all) Arm cores EXT has higher throughput than SHL,
and on Cortex-A75 EXT has both higher throughput and lower latency.

I guess the expected gain here is that we wouldn't need to create the
zero vector.  However, on modern Arm cores creating the zero vector is
free using movi, and EXT being a three-operand instruction also means we
only need one copy if it is e.g. in a loop.

Kind Regards,
Tamar

> 
> Thanks,
> Richard
> 
> >
> > I added a testcase for the shift left also. I also fixed the instruction 
> > template
> > there which was using a space instead of a tab after the instruction.
> >
> > Built and tested on aarch64-linux-gnu.
> >
> > PR target/113872
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-simd.md (vec_shr_<mode>):
> Use tab instead of space after
> > the instruction in the template.
> > (vec_shl_<mode>): New pattern
> > * config/aarch64/iterators.md (unspec): Add UNSPEC_VEC_SHL
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/perm_zero-1.c: New test.
> > * gcc.target/aarch64/perm_zero-2.c: New test.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/config/aarch64/aarch64-simd.md | 18 --
> >  gcc/config/aarch64/iterators.md|  1 +
> >  gcc/testsuite/gcc.target/aarch64/perm_zero-1.c | 15 +++
> >  gcc/testsuite/gcc.target/aarch64/perm_zero-2.c | 15 +++
> >  4 files changed, 47 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/perm_zero-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/perm_zero-2.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-simd.md
> b/gcc/config/aarch64/aarch64-simd.md
> > index f8bb973a278..0d2f1ea3902 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -1592,9 +1592,23 @@ (define_insn "vec_shr_<mode>"
> >"TARGET_SIMD"
> >{
> >  if (BYTES_BIG_ENDIAN)
> > -  return "shl %d0, %d1, %2";
> > +  return "shl\t%d0, %d1, %2";
> >  else
> > -  return "ushr %d0, %d1, %2";
> > +  return "ushr\t%d0, %d1, %2";
> > +  }
> > +  [(set_attr "type" "neon_shift_imm")]
> > +)
> > +(define_insn "vec_shl_<mode>"
> > +  [(set (match_operand:VD 0 "register_operand" "=w")
> > +(unspec:VD [(match_operand:VD 1 "register_operand" "w")
> > +   (match_operand:SI 2 "immediate_operand" "i")]
> > +  UNSPEC_VEC_SHL))]
> > +  "TARGET_SIMD"
> > +  {
> > +if (BYTES_BIG_ENDIAN)
> > +  return "ushr\t%d0, %d1, %2";
> > +else
> > +  return "shl\t%d0, %d1, %2";
> >}
> >[(set_attr "type" "neon_shift_imm")]
> >  )
> > diff --git a/gcc/config/aarch64/iterators.md 
> > b/gcc/config/aarch64/iterators.md
> > index 99cde46f1ba..3aebe9cf18a 100644
> > --- a/gcc/config/aarch64/iterators.md
> > +++ b/gcc/config/aarch64/iterators.md
> > @@ -758,6 +758,7 @@ (define_c_enum "unspec"
> >  UNSPEC_PMULL; Used in aarch64-simd.md.
> >  UNSPEC_PMULL2   ; Used in aarch64-simd.md.
> >  UNSPEC_REV_REGLIST  ; Used in aarch64-simd.md.
> > +UNSPEC_VEC_SHL  ; Used in aarch64-simd.md.
> >  UNSPEC_VEC_SHR  ; Used in aarch64-simd.md.
> >  UNSPEC_SQRDMLAH ; Used in aarch64-simd.md.
> >  UNSPEC_SQRDMLSH ; Used in aarch64-simd.md.
> > diff --git a/gcc/testsuite/gcc.target/aarch64/perm_zero-1.c
> b/gcc/testsuite/gcc.target/aarch64/perm_zero-1.c
> > new file mode 100644
> > index 000..3c8f0591a2f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/perm_zero-1.c
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2"  } */
> > +/* PR target/113872 */
> > +/* For 64bit vectors, PERM with a constant 0 should produce a shift 
> > instead of
> the ext instruction. */
> > +
> > +#define vect64 __attribute__((vector_size(8)))
> > +
> > +void f(vect64  unsigned short *a)
> > +{
> > +  *a = __builtin_shufflevector((vect64 unsigned short){0},*a, 3,4,5,6);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "ushr\t" 1 { target 
> > aarch64_big_endian } } }
> */
> > +/* { dg-final { scan-assembler-times "shl\t" 1 { target 
> > aarch64_little_endian } } }
> */
> > +/* { dg-final { scan-assembler-not "ext\t"  } 

[PATCH] libgccjit: Add count zeroes builtins to ensure_optimization_builtins_exist

2024-02-15 Thread Antoni Boucher
Hi.
This patch adds some missing builtins that can be generated by
optimizations.
I'm not sure how to add a test for this one.
Do you know the C code that can be optimized to a builtin_clz?
Thanks for the review.
From 578cb40bd333abf57e5b3b08d3453bdcf7ad80b5 Mon Sep 17 00:00:00 2001
From: Antoni Boucher 
Date: Thu, 8 Feb 2024 21:48:27 -0500
Subject: [PATCH] libgccjit: Add count zeroes builtins to
 ensure_optimization_builtins_exist

gcc/jit/ChangeLog:

	* jit-builtins.cc (ensure_optimization_builtins_exist): Add
	missing builtins.
---
 gcc/jit/jit-builtins.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/jit/jit-builtins.cc b/gcc/jit/jit-builtins.cc
index e0bb24738dd..0c13c8db586 100644
--- a/gcc/jit/jit-builtins.cc
+++ b/gcc/jit/jit-builtins.cc
@@ -612,6 +612,12 @@ builtins_manager::ensure_optimization_builtins_exist ()
   (void)get_builtin_function_by_id (BUILT_IN_POPCOUNT);
   (void)get_builtin_function_by_id (BUILT_IN_POPCOUNTL);
   (void)get_builtin_function_by_id (BUILT_IN_POPCOUNTLL);
+  (void)get_builtin_function_by_id (BUILT_IN_CLZ);
+  (void)get_builtin_function_by_id (BUILT_IN_CTZ);
+  (void)get_builtin_function_by_id (BUILT_IN_CLZL);
+  (void)get_builtin_function_by_id (BUILT_IN_CTZL);
+  (void)get_builtin_function_by_id (BUILT_IN_CLZLL);
+  (void)get_builtin_function_by_id (BUILT_IN_CTZLL);
 }
 
 /* Playback support.  */
-- 
2.43.0



Re: [PATCH 0/1 V2] Target independent code for common infrastructure of load,store fusion for rs6000 and aarch64 target.

2024-02-15 Thread Alex Coplan
On 15/02/2024 21:24, Ajit Agarwal wrote:
> Hello Richard:
> 
> As per your suggestion I have divided the patch into target independent
> and target dependent for aarch64 target. I kept aarch64-ldp-fusion same
> and did not change that.

I'm not sure this was what Richard suggested doing, though.
He said (from
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645545.html):

> Maybe one way of making the review easier would be to split the aarch64
> pass into the "target-dependent" and "target-independent" pieces
> in-place, i.e. keeping everything within aarch64-ldp-fusion.cc, and then
> (as separate patches) move the target-independent pieces outside
> config/aarch64.

but this adds the target-independent parts separately instead of
splitting it out within config/aarch64 (which I agree should make the
review easier).

Thanks,
Alex

> 
> Common infrastructure of load store pair fusion is divided into
> target independent and target dependent code for rs6000 and aarch64
> target.
> 
> Target independent code is structured in the following files.
> gcc/pair-fusion-base.h
> gcc/pair-fusion-common.cc
> gcc/pair-fusion.cc
> 
> Target independent code is the Generic code with pure virtual
> > function to interface between target independent and dependent
> code.
> 
> Thanks & Regards
> Ajit
> 
> Target independent code for common infrastructure of load
> store fusion for rs6000 and aarch64 target.
> 
> Common infrastructure of load store pair fusion is divided into
> target independent and target dependent code for rs6000 and aarch64
> target.
> 
> Target independent code is structured in the following files.
> gcc/pair-fusion-base.h
> gcc/pair-fusion-common.cc
> gcc/pair-fusion.cc
> 
> Target independent code is the Generic code with pure virtual
> > function to interface between target independent and dependent
> code.
> 
> 2024-02-15  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * pair-fusion-base.h: Generic header code for load store fusion
>   that can be shared across different architectures.
>   * pair-fusion-common.cc: Generic source code for load store
>   fusion that can be shared across different architectures.
>   * pair-fusion.cc: Generic implementation of pair_fusion class
>   defined in pair-fusion-base.h
>   * Makefile.in: Add new executable pair-fusion.o and
>   pair-fusion-common.o.
> ---
>  gcc/Makefile.in   |2 +
>  gcc/pair-fusion-base.h|  586 ++
>  gcc/pair-fusion-common.cc | 1202 
>  gcc/pair-fusion.cc| 1225 +
>  4 files changed, 3015 insertions(+)
>  create mode 100644 gcc/pair-fusion-base.h
>  create mode 100644 gcc/pair-fusion-common.cc
>  create mode 100644 gcc/pair-fusion.cc
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index a74761b7ab3..df5061ddfe7 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1563,6 +1563,8 @@ OBJS = \
>   ipa-strub.o \
>   ipa.o \
>   ira.o \
> + pair-fusion-common.o \
> + pair-fusion.o \
>   ira-build.o \
>   ira-costs.o \
>   ira-conflicts.o \
> diff --git a/gcc/pair-fusion-base.h b/gcc/pair-fusion-base.h
> new file mode 100644
> index 000..fdaf4fd743d
> --- /dev/null
> +++ b/gcc/pair-fusion-base.h
> @@ -0,0 +1,586 @@
> +// Generic code for Pair MEM  fusion optimization pass.
> +// Copyright (C) 2023-2024 Free Software Foundation, Inc.
> +//
> +// This file is part of GCC.
> +//
> +// GCC is free software; you can redistribute it and/or modify it
> +// under the terms of the GNU General Public License as published by
> +// the Free Software Foundation; either version 3, or (at your option)
> +// any later version.
> +//
> +// GCC is distributed in the hope that it will be useful, but
> +// WITHOUT ANY WARRANTY; without even the implied warranty of
> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +// General Public License for more details.
> +//
> +// You should have received a copy of the GNU General Public License
> +// along with GCC; see the file COPYING3.  If not see
> +// .
> +
> +#ifndef GCC_PAIR_FUSION_H
> +#define GCC_PAIR_FUSION_H
> +#define INCLUDE_ALGORITHM
> +#define INCLUDE_FUNCTIONAL
> +#define INCLUDE_LIST
> +#define INCLUDE_TYPE_TRAITS
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "rtl.h"
> +#include "df.h"
> +#include "rtl-iter.h"
> +#include "rtl-ssa.h"
> +#include "cfgcleanup.h"
> +#include "tree-pass.h"
> +#include "ordered-hash-map.h"
> +#include "tree-dfa.h"
> +#include "fold-const.h"
> +#include "tree-hash-traits.h"
> +#include "print-tree.h"
> +#include "insn-attr.h"
> +using namespace rtl_ssa;
> +// We pack these fields (load_p, fpsimd_p, and size) into an integer
> +// (LFS) which we use as part of the key into the main hash tables.
> +//
> +// The idea is that we group candidates together only if they agree on
> +

Re: [PATCH] Notes on the warnings-as-errors change in GCC 14

2024-02-15 Thread Florian Weimer
* Gerald Pfeifer:

> On Fri, 2 Feb 2024, Florian Weimer wrote:
>>  htdocs/gcc-14/porting_to.html | 465 
>> ++
>>  1 file changed, 465 insertions(+)
>> +
>> +Using pointers as integers and vice versa 
>> (-Werror=int-conversion)
>
>> +It makes sense to address missing int type, implicit
>
> Should this be plural here ("int types") or some adding a 
> word such as "declaration"? Genuine question.

“missing int type[s]” seems to be okay.

>> +Some of them will be caused by missing types resulting
>> +in int, and the default int return type of
>> +implicitly declared functions.
>
> ...resulting in implicit int... or something like that?
> Not sure how to be phrase it.

I went with: “missing types [treated as] int”

>> +GCC no longer casts all pointer types to all other pointer types.
>
> Do you mean it no longer does so implicitly, or not at all? That is,
> there are now cases where even an explicit cast such as
>
>   foo_p = (foo_type*) bar_p
>
> no longer works? Or just
>
>   foo_p = bar_p
>
> no longer works for all combinations?

The latter, other reviewers noted it as well, and I've got this now:
“GCC no longer [allows implicitly casting] all pointer types to all”

>> +appropriate casts, and maybe consider using code void *
>> +in more places (particularly for old programs that predate the
>> +introduction of void into the C language).
>
> Here I got confused.
>
> At first I thought I was reading that void * should be used 
> for cases where void did not exist yet. Now I think I 
> understand: this is about programs where void * was not used 
> since it was not part of the language yet - and the encouragement is to 
> update such old code by using it. 
>
> If so, how about making the second case void *, too?

Makes sense.  Technically you can't have void * without void,
but I can see this may be confusing.
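
Pulling together the casting point above and the void * suggestion, a made-up
sketch of the class of error being discussed and the two usual fixes (the
types and functions below are invented purely for illustration):

struct foo { int x; };
struct bar { int y; };

static void takes_foo (struct foo *p) { (void) p; }
static void takes_any (void *p) { (void) p; }

static void f (struct bar *q)
{
  /* takes_foo (q);  -- now rejected: incompatible pointer type */
  takes_foo ((struct foo *) q);  /* explicit cast (check it is really valid) */
  takes_any (q);                 /* or accept void * in the callee */
}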

>> +#include 
>
> I *think* we may need to use &gt; here instead of plain '>', though I may 
> be wrong.

No, only < needs to be quoted.  This is true even for XML, not just
HTML5.  Do you want me to change these to &gt;?

>> +
>> +int
>> +compare (const void *a1, const void *b1)
>> +{
>> +  char *const *a = a1;
>> +  char *const *b = b1;
>> +  return strcmp (*a, *b);
>> +}
>> +
>
> Great that you include this example here, that really helps!
>
> Just why "const void *a1" versus "char *const *a", that is, the different 
> placement of const?

It's the right type. 8-)  The example uses an array of char *,
not const char *.
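
Spelled out as a complete program (a minimal sketch, sorting an array of
char * exactly as in the guide's example):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort hands the comparator const void * pointers to the elements.
   Each element here is a char *, so inside the comparator they are
   viewed as char *const *: the stored pointer is const through this
   view, while the characters it points to are not.  */
static int
compare (const void *a1, const void *b1)
{
  char *const *a = a1;
  char *const *b = b1;
  return strcmp (*a, *b);
}

int
main (void)
{
  char *names[] = { "delta", "alpha", "charlie" };
  qsort (names, 3, sizeof names[0], compare);
  printf ("%s %s %s\n", names[0], names[1], names[2]);
  return 0;
}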

>> +unrelated to the actual objective of the probe.  These failed probes
>> +tend to consistently disable program features and their tests, which
>> +means that an unexpected probe failure may result in silently dropping
>> +features.
>
> Omit "consistently"? I'm not sure what it adds here. And simplify the 
> second half, something like
>
>   These failed probes tend to disable program features (and their tests), 
>   resulting in silently dropping features.

What about this?

   These failed probes tend to disable program features [together with]
   their tests[], resulting in silently dropping features.

This is what I meant by “consistently”: implementations and tests are
gone, so the testsuite doesn't flag it.

>> +In cases where this is a concern, generated config.log,
>> +config.h and other source code files can be compared
>> +using https://www.gnu.org/software/diffutils/";>diff,
>> +to ensure there are no unexpected differences.
>
> I wouldn't link to GNU diffutils here; just refer to the diff 
> command - or even omit that aspect and leave it at "can be compared".

diff is really useful for that, manual comparison isn't. 8-)
I can drop the hyperlink.

>> +Some build systems do not pass the CFLAGS environment
>> +or make variable to all parts of the builds
>
> Is "make" a common variable? What is the context here?

Hmm, I meant to allude to $(CFLAGS) here.

“CFLAGS [] variable to all parts of the builds” should be
sufficient.

>> +
>> +It is unclear at which point GCC can enable the C23 bool
>> +keyword by default (making the bool type available
>> +without including #include  explicitly).
>
> Does C ever include some header files implicitly?

GCC does, for .  Not relevant here, though.

> For the benefit of the doubt: Okay, and thank you, modulo feedback from 
> Jonathan and my two responses.

Thank you for your review.

I need to add two more code examples to the Autoconf section, should I
post a v2 with that, or add that in a subsequent commit?

Florian



Re: [PATCH] Notes on the warnings-as-errors change in GCC 14

2024-02-15 Thread Florian Weimer
* Gerald Pfeifer:

> On Fri, 2 Feb 2024, Florian Weimer wrote:
>> +Certain warnings are now errors
>
> That's quite a nice description, thank you, Florian!
>
>> +The initial ISO C standard and its 1999 revision removed support for
>
> May I suggest to wrap paragraphs in <p>...</p>? Not strictly necessary any 
> more, now that we switched to HTML 5, though more consistent and explicit.

I've tried this now, and <p> adds extra whitespace in some cases, for
example here:

“
<p>
In most cases, simply adding the missing int keyword
addresses the error.  For example, a flag variable like
</p>

<pre>
  static initialized;
</pre>

<p>
becomes:
</p>

<pre>
  static int initialized;
</pre>
”

I would have to nest the <pre> in the <p>, which suggests a complexity
that just isn't there.

Thanks,
Florian



Re: [PATCH] libgccjit: Add count zeroes builtins to ensure_optimization_builtins_exist

2024-02-15 Thread David Malcolm
On Thu, 2024-02-15 at 11:32 -0500, Antoni Boucher wrote:
> Hi.
> This patch adds some missing builtins that can be generated by
> optimizations.
> I'm not sure how to add a test for this one.
> Do you know the C code that can be optimized to a builtin_clz?

I don't.  Given a reproducer we could probably figure it out, but it's
probably not worth bothering.

> Thanks for the review.

Thanks, looks good to me for trunk.

Dave



[PATCH] testsuite: Define _POSIX_SOURCE for tests [PR113278]

2024-02-15 Thread Torbjörn SVENSSON
Ok for trunk?

--

As the tests assume that fileno() is visible (it is only part of POSIX),
define the guard to ensure that it is visible.  Currently, glibc appears
to always have it declared in C++; newlib does not.

Without this patch, fails like this can be seen:

Testing analyzer/fileno-1.c,  -std=c++98
.../fileno-1.c: In function 'int test_pass_through(FILE*)':
.../fileno-1.c:5:10: error: 'fileno' was not declared in this scope
FAIL: c-c++-common/analyzer/fileno-1.c  -std=c++98 (test for excess errors)

Patch has been verified on Linux.
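
A minimal sketch of what the tests rely on (this mirrors fileno-1.c; the only
point is that the feature-test macro has to be visible before the first
include so that newlib declares fileno in strict modes):

#define _POSIX_SOURCE
#include <stdio.h>

int
test_pass_through (FILE *stream)
{
  return fileno (stream);
}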

gcc/testsuite/ChangeLog:
PR113278
* c-c++-common/analyzer/fileno-1.c: Define _POSIX_SOURCE.
* c-c++-common/analyzer/flex-with-call-summaries.c: Same.
* c-c++-common/analyzer/flex-without-call-summaries.c: Same.

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/testsuite/c-c++-common/analyzer/fileno-1.c  | 2 ++
 gcc/testsuite/c-c++-common/analyzer/flex-with-call-summaries.c  | 1 +
 .../c-c++-common/analyzer/flex-without-call-summaries.c | 1 +
 3 files changed, 4 insertions(+)

diff --git a/gcc/testsuite/c-c++-common/analyzer/fileno-1.c 
b/gcc/testsuite/c-c++-common/analyzer/fileno-1.c
index d34e51a5022..9f9af7116e6 100644
--- a/gcc/testsuite/c-c++-common/analyzer/fileno-1.c
+++ b/gcc/testsuite/c-c++-common/analyzer/fileno-1.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-D_POSIX_SOURCE" } */
+
 #include 
 
 int test_pass_through (FILE *stream)
diff --git a/gcc/testsuite/c-c++-common/analyzer/flex-with-call-summaries.c 
b/gcc/testsuite/c-c++-common/analyzer/flex-with-call-summaries.c
index 963a84bc9ab..cbb953ad06a 100644
--- a/gcc/testsuite/c-c++-common/analyzer/flex-with-call-summaries.c
+++ b/gcc/testsuite/c-c++-common/analyzer/flex-with-call-summaries.c
@@ -6,6 +6,7 @@
 /* { dg-additional-options "-fanalyzer-call-summaries" } */
 /* { dg-additional-options "-Wno-analyzer-too-complex" } */
 /* { dg-additional-options "-Wno-analyzer-symbol-too-complex" } */
+/* { dg-additional-options "-D_POSIX_SOURCE" } */
 
 /* A lexical scanner generated by flex */
 
diff --git a/gcc/testsuite/c-c++-common/analyzer/flex-without-call-summaries.c 
b/gcc/testsuite/c-c++-common/analyzer/flex-without-call-summaries.c
index b1c23312137..c6ecb25d25d 100644
--- a/gcc/testsuite/c-c++-common/analyzer/flex-without-call-summaries.c
+++ b/gcc/testsuite/c-c++-common/analyzer/flex-without-call-summaries.c
@@ -4,6 +4,7 @@
 /* { dg-additional-options "-fno-analyzer-call-summaries" } */
 
 /* { dg-additional-options "-Wno-analyzer-too-complex" } */
+/* { dg-additional-options "-D_POSIX_SOURCE" } */
 
 
 /* A lexical scanner generated by flex */
-- 
2.25.1



RE: [PATCH] aarch64: Improve PERM<{0}, a, ...> (64bit) by adding whole vector shift right [PR113872]

2024-02-15 Thread Andrew Pinski (QUIC)
> -Original Message-
> From: Tamar Christina 
> Sent: Thursday, February 15, 2024 8:27 AM
> To: Richard Sandiford ; Andrew Pinski (QUIC)
> 
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH] aarch64: Improve PERM<{0}, a, ...> (64bit) by adding
> whole vector shift right [PR113872]
> 
> > -Original Message-
> > From: Richard Sandiford 
> > Sent: Thursday, February 15, 2024 2:56 PM
> > To: Andrew Pinski 
> > Cc: gcc-patches@gcc.gnu.org; Tamar Christina 
> > Subject: Re: [PATCH] aarch64: Improve PERM<{0}, a, ...> (64bit) by
> > adding whole vector shift right [PR113872]
> >
> > Andrew Pinski  writes:
> > > The backend currently defines a whole vector shift left for 64bit
> > > vectors, adding
> > the
> > > shift right can also improve code for some PERMs too. So this adds that
> pattern.
> >
> > Is this reversed?  It looks like we have the shift right and the patch
> > is adding the shift left (at least in GCC internal and little-endian terms).
> >
> > But on many Arm cores, EXT has a higher throughput than SHL, so I
> > don't think we should do this unconditionally.
> 
> Yeah, on most (if not all) all Arm cores the EXT has higher throughput than 
> SHL
> and on Cortex-A75 the EXT has both higher throughput and lower latency.
> 
> I guess the expected gain here is that we wouldn't need to create the zero
> vector, However on modern Arm cores the zero vector creation is free using
> movi and EXT being three operand also means we only need one copy if e.g in
> a loop.

That might be true on Arm's cores, but that is not true on Qualcomm's cores,
which I am working with.
EXT and SHL have the same throughput and latency, but movi is not free (in
that it is still not done in the renamer) and a register is definitely not
free if there is huge register pressure.
So I think we will need to figure out a way to have this as a tuning mechanism.

Thanks,
Andrew Pinski

> 
> Kind Regards,
> Tamar
> 
> >
> > Thanks,
> > Richard
> >
> > >
> > > I added a testcase for the shift left also. I also fixed the
> > > instruction template there which was using a space instead of a tab after
> the instruction.
> > >
> > > Built and tested on aarch64-linux-gnu.
> > >
> > >   PR target/113872
> > >
> > > gcc/ChangeLog:
> > >
> > >   * config/aarch64/aarch64-simd.md (vec_shr_<mode>): Use tab
> > >   instead of space after the instruction in the template.
> > >   (vec_shl_<mode>): New pattern
> > >   * config/aarch64/iterators.md (unspec): Add UNSPEC_VEC_SHL
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.target/aarch64/perm_zero-1.c: New test.
> > >   * gcc.target/aarch64/perm_zero-2.c: New test.
> > >
> > > Signed-off-by: Andrew Pinski 
> > > ---
> > >  gcc/config/aarch64/aarch64-simd.md | 18 --
> > >  gcc/config/aarch64/iterators.md|  1 +
> > >  gcc/testsuite/gcc.target/aarch64/perm_zero-1.c | 15 +++
> > > gcc/testsuite/gcc.target/aarch64/perm_zero-2.c | 15 +++
> > >  4 files changed, 47 insertions(+), 2 deletions(-)  create mode
> > > 100644 gcc/testsuite/gcc.target/aarch64/perm_zero-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/aarch64/perm_zero-2.c
> > >
> > > diff --git a/gcc/config/aarch64/aarch64-simd.md
> > b/gcc/config/aarch64/aarch64-simd.md
> > > index f8bb973a278..0d2f1ea3902 100644
> > > --- a/gcc/config/aarch64/aarch64-simd.md
> > > +++ b/gcc/config/aarch64/aarch64-simd.md
> > > @@ -1592,9 +1592,23 @@ (define_insn "vec_shr_<mode>"
> > >"TARGET_SIMD"
> > >{
> > >  if (BYTES_BIG_ENDIAN)
> > > -  return "shl %d0, %d1, %2";
> > > +  return "shl\t%d0, %d1, %2";
> > >  else
> > > -  return "ushr %d0, %d1, %2";
> > > +  return "ushr\t%d0, %d1, %2";
> > > +  }
> > > +  [(set_attr "type" "neon_shift_imm")]
> > > +)
> > > +(define_insn "vec_shl_<mode>"
> > > +  [(set (match_operand:VD 0 "register_operand" "=w")
> > > +(unspec:VD [(match_operand:VD 1 "register_operand" "w")
> > > + (match_operand:SI 2 "immediate_operand" "i")]
> > > +UNSPEC_VEC_SHL))]
> > > +  "TARGET_SIMD"
> > > +  {
> > > +if (BYTES_BIG_ENDIAN)
> > > +  return "ushr\t%d0, %d1, %2";
> > > +else
> > > +  return "shl\t%d0, %d1, %2";
> > >}
> > >[(set_attr "type" "neon_shift_imm")]
> > >  )
> > > diff --git a/gcc/config/aarch64/iterators.md
> > > b/gcc/config/aarch64/iterators.md index 99cde46f1ba..3aebe9cf18a
> > > 100644
> > > --- a/gcc/config/aarch64/iterators.md
> > > +++ b/gcc/config/aarch64/iterators.md
> > > @@ -758,6 +758,7 @@ (define_c_enum "unspec"
> > >  UNSPEC_PMULL; Used in aarch64-simd.md.
> > >  UNSPEC_PMULL2   ; Used in aarch64-simd.md.
> > >  UNSPEC_REV_REGLIST  ; Used in aarch64-simd.md.
> > > +UNSPEC_VEC_SHL  ; Used in aarch64-simd.md.
> > >  UNSPEC_VEC_SHR  ; Used in aarch64-simd.md.
> > >  UNSPEC_SQRDMLAH ; Used in aarch64-simd.md.
> > >  UNSPEC_SQRDMLSH ; Used in aarch64-simd.md.
> > > diff --git a/gcc/testsuite/gcc.target/

Re: [PATCH] tree-optimization/113910 - huge compile time during PTA

2024-02-15 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, 14 Feb 2024, Richard Biener wrote:
>
>> For the testcase in PR113910 we spend a lot of time in PTA comparing
>> bitmaps for looking up equivalence class members.  This points to
>> the very weak bitmap_hash function which effectively hashes set
>> and a subset of not set bits.  The following improves it by mixing
>> that weak result with the population count of the bitmap, reducing
>> the number of collisions significantly.  It's still by no means
>> a good hash function.
>> 
>> One major problem with it was that it simply truncated the
>> BITMAP_WORD sized intermediate hash to hashval_t which is
>> unsigned int, effectively not hashing half of the bits.  That solves
>> most of the slowness.  Mixing in the population count improves
>> compile-time by another 30% though.
>> 
>> This reduces the compile-time for the testcase from tens of minutes
>> to 30 seconds and PTA time from 99% to 25%.  bitmap_equal_p is gone
>> from the profile.
>> 
>> Bootstrap and regtest running on x86_64-unknown-linux-gnu, will
>> push to trunk and branches.
>
> Ha, and it breaks bootstrap because I misunderstood
> bitmap_count_bits_in_word (should be word_s_).  Fixing this, it turns
> out that hashing the population count doesn't help at all,
> so I'm re-testing the following simpler variant, giving up on the
> cheap last 25% but solving the regression as well.
>
> Richard.
>
> From a76aebfdc4b6247db6a061e6395fd088a5694122 Mon Sep 17 00:00:00 2001
> From: Richard Biener 
> Date: Wed, 14 Feb 2024 12:33:13 +0100
> Subject: [PATCH] tree-optimization/113910 - huge compile time during PTA
> To: gcc-patches@gcc.gnu.org
>
> For the testcase in PR113910 we spend a lot of time in PTA comparing
> bitmaps for looking up equivalence class members.  This points to
> the very weak bitmap_hash function which effectively hashes set
> and a subset of not set bits.
>
> The major problem with it is that it simply truncates the
> BITMAP_WORD sized intermediate hash to hashval_t which is
> unsigned int, effectively not hashing half of the bits.
>
> This reduces the compile-time for the testcase from tens of minutes
> to 42 seconds and PTA time from 99% to 46%.
>
>   PR tree-optimization/113910
>   * bitmap.cc (bitmap_hash): Mix the full element "hash" to
>   the hashval_t hash.
> ---
>  gcc/bitmap.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/bitmap.cc b/gcc/bitmap.cc
> index 6cf326bca5a..459e32c1ad1 100644
> --- a/gcc/bitmap.cc
> +++ b/gcc/bitmap.cc
> @@ -2706,7 +2706,7 @@ bitmap_hash (const_bitmap head)
>for (ix = 0; ix != BITMAP_ELEMENT_WORDS; ix++)
>   hash ^= ptr->bits[ix];
>  }
> -  return (hashval_t)hash;
> +  return iterative_hash (&hash, sizeof (hash), 0);
>  }
>  
>  

LGTM FWIW, but just curious: does using the iterative hash routines for
each update (instead of ^) help too, or is it too slow?  Or maybe do an
iterative hash for the idx part and keep ^ for the bits accumulation?
Also wonder whether using + rather than ^ for the bits accumulation
would help...

Thanks,
Richard
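
As a standalone illustration of the truncation problem (not the GCC code; the
mixing step below is just one cheap possibility, rather than the
iterative_hash call the patch actually uses):

#include <stdint.h>
#include <stdio.h>

static uint32_t
truncated_hash (const uint64_t *words, int n)
{
  uint64_t h = 0;
  for (int i = 0; i < n; i++)
    h ^= words[i];
  return (uint32_t) h;                 /* high 32 bits are dropped */
}

static uint32_t
mixed_hash (const uint64_t *words, int n)
{
  uint64_t h = 0;
  for (int i = 0; i < n; i++)
    h ^= words[i];
  return (uint32_t) (h ^ (h >> 32));   /* keep all 64 bits in play */
}

int
main (void)
{
  uint64_t a[1] = { 0x0000000100000000ull };
  uint64_t b[1] = { 0x0000000200000000ull };
  /* The inputs differ only in the high word: they collide when
     truncated, but not when the high half is mixed in.  */
  printf ("%d %d\n", truncated_hash (a, 1) == truncated_hash (b, 1),
          mixed_hash (a, 1) == mixed_hash (b, 1));   /* prints "1 0" */
  return 0;
}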




Re: [PATCH 0/1 V2] Target independent code for common infrastructure of load,store fusion for rs6000 and aarch64 target.

2024-02-15 Thread Ajit Agarwal
Hello Alex:

On 15/02/24 10:12 pm, Alex Coplan wrote:
> On 15/02/2024 21:24, Ajit Agarwal wrote:
>> Hello Richard:
>>
>> As per your suggestion I have divided the patch into target independent
>> and target dependent for aarch64 target. I kept aarch64-ldp-fusion same
>> and did not change that.
> 
> I'm not sure this was what Richard suggested doing, though.
> He said (from
> https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645545.html):
> 
>> Maybe one way of making the review easier would be to split the aarch64
>> pass into the "target-dependent" and "target-independent" pieces
>> in-place, i.e. keeping everything within aarch64-ldp-fusion.cc, and then
>> (as separate patches) move the target-independent pieces outside
>> config/aarch64.
> 
> but this adds the target-independent parts separately instead of
> splitting it out within config/aarch64 (which I agree should make the
> review easier).

I am sorry, I didn't follow. Could you kindly elaborate on this?

Thanks & Regards
Ajit
> 
> Thanks,
> Alex
> 
>>
>> Common infrastructure of load store pair fusion is divided into
>> target independent and target dependent code for rs6000 and aarch64
>> target.
>>
>> Target independent code is structured in the following files.
>> gcc/pair-fusion-base.h
>> gcc/pair-fusion-common.cc
>> gcc/pair-fusion.cc
>>
>> Target independent code is the Generic code with pure virtual
>> function to interface between target independent and dependent
>> code.
>>
>> Thanks & Regards
>> Ajit
>>
>> Target independent code for common infrastructure of load
>> store fusion for rs6000 and aarch64 target.
>>
>> Common infrastructure of load store pair fusion is divided into
>> target independent and target dependent code for rs6000 and aarch64
>> target.
>>
>> Target independent code is structured in the following files.
>> gcc/pair-fusion-base.h
>> gcc/pair-fusion-common.cc
>> gcc/pair-fusion.cc
>>
>> Target independent code is the Generic code with pure virtual
>> function to interface between target independent and dependent
>> code.
>>
>> 2024-02-15  Ajit Kumar Agarwal  
>>
>> gcc/ChangeLog:
>>
>>  * pair-fusion-base.h: Generic header code for load store fusion
>>  that can be shared across different architectures.
>>  * pair-fusion-common.cc: Generic source code for load store
>>  fusion that can be shared across different architectures.
>>  * pair-fusion.cc: Generic implementation of pair_fusion class
>>  defined in pair-fusion-base.h
>>  * Makefile.in: Add new executable pair-fusion.o and
>>  pair-fusion-common.o.
>> ---
>>  gcc/Makefile.in   |2 +
>>  gcc/pair-fusion-base.h|  586 ++
>>  gcc/pair-fusion-common.cc | 1202 
>>  gcc/pair-fusion.cc| 1225 +
>>  4 files changed, 3015 insertions(+)
>>  create mode 100644 gcc/pair-fusion-base.h
>>  create mode 100644 gcc/pair-fusion-common.cc
>>  create mode 100644 gcc/pair-fusion.cc
>>
>> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
>> index a74761b7ab3..df5061ddfe7 100644
>> --- a/gcc/Makefile.in
>> +++ b/gcc/Makefile.in
>> @@ -1563,6 +1563,8 @@ OBJS = \
>>  ipa-strub.o \
>>  ipa.o \
>>  ira.o \
>> +pair-fusion-common.o \
>> +pair-fusion.o \
>>  ira-build.o \
>>  ira-costs.o \
>>  ira-conflicts.o \
>> diff --git a/gcc/pair-fusion-base.h b/gcc/pair-fusion-base.h
>> new file mode 100644
>> index 000..fdaf4fd743d
>> --- /dev/null
>> +++ b/gcc/pair-fusion-base.h
>> @@ -0,0 +1,586 @@
>> +// Generic code for Pair MEM  fusion optimization pass.
>> +// Copyright (C) 2023-2024 Free Software Foundation, Inc.
>> +//
>> +// This file is part of GCC.
>> +//
>> +// GCC is free software; you can redistribute it and/or modify it
>> +// under the terms of the GNU General Public License as published by
>> +// the Free Software Foundation; either version 3, or (at your option)
>> +// any later version.
>> +//
>> +// GCC is distributed in the hope that it will be useful, but
>> +// WITHOUT ANY WARRANTY; without even the implied warranty of
>> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +// General Public License for more details.
>> +//
>> +// You should have received a copy of the GNU General Public License
>> +// along with GCC; see the file COPYING3.  If not see
>> +// .
>> +
>> +#ifndef GCC_PAIR_FUSION_H
>> +#define GCC_PAIR_FUSION_H
>> +#define INCLUDE_ALGORITHM
>> +#define INCLUDE_FUNCTIONAL
>> +#define INCLUDE_LIST
>> +#define INCLUDE_TYPE_TRAITS
>> +#include "config.h"
>> +#include "system.h"
>> +#include "coretypes.h"
>> +#include "backend.h"
>> +#include "rtl.h"
>> +#include "df.h"
>> +#include "rtl-iter.h"
>> +#include "rtl-ssa.h"
>> +#include "cfgcleanup.h"
>> +#include "tree-pass.h"
>> +#include "ordered-hash-map.h"
>> +#include "tree-dfa.h"
>> +#include "fold-const.h"
>> +#include "tree-hash-traits.h"
>> +#include "print-tree.h"
>> +#include "insn

Re: [PATCH 0/1 V2] Target independent code for common infrastructure of load,store fusion for rs6000 and aarch64 target.

2024-02-15 Thread Alex Coplan
On 15/02/2024 22:38, Ajit Agarwal wrote:
> Hello Alex:
> 
> On 15/02/24 10:12 pm, Alex Coplan wrote:
> > On 15/02/2024 21:24, Ajit Agarwal wrote:
> >> Hello Richard:
> >>
> >> As per your suggestion I have divided the patch into target independent
> >> and target dependent for aarch64 target. I kept aarch64-ldp-fusion same
> >> and did not change that.
> > 
> > I'm not sure this was what Richard suggested doing, though.
> > He said (from
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645545.html):
> > 
> >> Maybe one way of making the review easier would be to split the aarch64
> >> pass into the "target-dependent" and "target-independent" pieces
> >> in-place, i.e. keeping everything within aarch64-ldp-fusion.cc, and then
> >> (as separate patches) move the target-independent pieces outside
> >> config/aarch64.
> > 
> > but this adds the target-independent parts separately instead of
> > splitting it out within config/aarch64 (which I agree should make the
> > review easier).
> 
> I am sorry, I didn't follow. Could you kindly elaborate on this?

So IIUC Richard was suggesting splitting into target-independent and
target-dependent pieces within aarch64-ldp-fusion.cc as a first step,
i.e. you introduce the abstractions (virtual functions) needed within
that file.  That should hopefully be a relatively small diff.

Then in a separate patch you can move the target-independent parts out of
config/aarch64.

Does that make sense?

Thanks,
Alex

> 
> Thanks & Regards
> Ajit
> > 
> > Thanks,
> > Alex
> > 
> >>
> >> Common infrastructure of load store pair fusion is divided into
> >> target independent and target dependent code for rs6000 and aarch64
> >> target.
> >>
> >> Target independent code is structured in the following files.
> >> gcc/pair-fusion-base.h
> >> gcc/pair-fusion-common.cc
> >> gcc/pair-fusion.cc
> >>
> >> Target independent code is the Generic code with pure virtual
> >> function to interface between target independent and dependent
> >> code.
> >>
> >> Thanks & Regards
> >> Ajit
> >>
> >> Target independent code for common infrastructure of load
> >> store fusion for rs6000 and aarch64 target.
> >>
> >> Common infrastructure of load store pair fusion is divided into
> >> target independent and target dependent code for rs6000 and aarch64
> >> target.
> >>
> >> Target independent code is structured in the following files.
> >> gcc/pair-fusion-base.h
> >> gcc/pair-fusion-common.cc
> >> gcc/pair-fusion.cc
> >>
> >> Target independent code is the Generic code with pure virtual
> >> function to interface between target independent and dependent
> >> code.
> >>
> >> 2024-02-15  Ajit Kumar Agarwal  
> >>
> >> gcc/ChangeLog:
> >>
> >>* pair-fusion-base.h: Generic header code for load store fusion
> >>that can be shared across different architectures.
> >>* pair-fusion-common.cc: Generic source code for load store
> >>fusion that can be shared across different architectures.
> >>* pair-fusion.cc: Generic implementation of pair_fusion class
> >>defined in pair-fusion-base.h
> >>* Makefile.in: Add new executable pair-fusion.o and
> >>pair-fusion-common.o.
> >> ---
> >>  gcc/Makefile.in   |2 +
> >>  gcc/pair-fusion-base.h|  586 ++
> >>  gcc/pair-fusion-common.cc | 1202 
> >>  gcc/pair-fusion.cc| 1225 +
> >>  4 files changed, 3015 insertions(+)
> >>  create mode 100644 gcc/pair-fusion-base.h
> >>  create mode 100644 gcc/pair-fusion-common.cc
> >>  create mode 100644 gcc/pair-fusion.cc
> >>
> >> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> >> index a74761b7ab3..df5061ddfe7 100644
> >> --- a/gcc/Makefile.in
> >> +++ b/gcc/Makefile.in
> >> @@ -1563,6 +1563,8 @@ OBJS = \
> >>ipa-strub.o \
> >>ipa.o \
> >>ira.o \
> >> +  pair-fusion-common.o \
> >> +  pair-fusion.o \
> >>ira-build.o \
> >>ira-costs.o \
> >>ira-conflicts.o \
> >> diff --git a/gcc/pair-fusion-base.h b/gcc/pair-fusion-base.h
> >> new file mode 100644
> >> index 000..fdaf4fd743d
> >> --- /dev/null
> >> +++ b/gcc/pair-fusion-base.h
> >> @@ -0,0 +1,586 @@
> >> +// Generic code for Pair MEM  fusion optimization pass.
> >> +// Copyright (C) 2023-2024 Free Software Foundation, Inc.
> >> +//
> >> +// This file is part of GCC.
> >> +//
> >> +// GCC is free software; you can redistribute it and/or modify it
> >> +// under the terms of the GNU General Public License as published by
> >> +// the Free Software Foundation; either version 3, or (at your option)
> >> +// any later version.
> >> +//
> >> +// GCC is distributed in the hope that it will be useful, but
> >> +// WITHOUT ANY WARRANTY; without even the implied warranty of
> >> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> >> +// General Public License for more details.
> >> +//
> >> +// You should have received a copy of the GNU General Public License
> >> +// along with GCC; see the file COP

Re: [PATCH] testsuite: Define _POSIX_SOURCE for tests [PR113278]

2024-02-15 Thread Mike Stump
On Feb 15, 2024, at 9:03 AM, Torbjörn SVENSSON wrote:
> 
> Ok for trunk?

Ok.

> gcc/testsuite/ChangeLog:
>   PR113278
>   * c-c++-common/analyzer/fileno-1.c: Define _POSIX_SOURCE.
>   * c-c++-common/analyzer/flex-with-call-summaries.c: Same.
>   * c-c++-common/analyzer/flex-without-call-summaries.c: Same.


Re: [PATCH] tree-optimization/113910 - huge compile time during PTA

2024-02-15 Thread Richard Biener



> On 15.02.2024 at 18:06, Richard Sandiford wrote:
> 
> Richard Biener  writes:
>>> On Wed, 14 Feb 2024, Richard Biener wrote:
>>> 
>>> For the testcase in PR113910 we spend a lot of time in PTA comparing
>>> bitmaps for looking up equivalence class members.  This points to
>>> the very weak bitmap_hash function which effectively hashes set
>>> and a subset of not set bits.  The following improves it by mixing
>>> that weak result with the population count of the bitmap, reducing
>>> the number of collisions significantly.  It's still by no means
>>> a good hash function.
>>> 
>>> One major problem with it was that it simply truncated the
>>> BITMAP_WORD sized intermediate hash to hashval_t which is
>>> unsigned int, effectively not hashing half of the bits.  That solves
>>> most of the slowness.  Mixing in the population count improves
>>> compile-time by another 30% though.
>>> 
>>> This reduces the compile-time for the testcase from tens of minutes
>>> to 30 seconds and PTA time from 99% to 25%.  bitmap_equal_p is gone
>>> from the profile.
>>> 
>>> Bootstrap and regtest running on x86_64-unknown-linux-gnu, will
>>> push to trunk and branches.
>> 
>> Ha, and it breaks bootstrap because I misunderstood
>> bitmap_count_bits_in_word (should be word_s_).  Fixing this, it turns
>> out that hashing the population count doesn't help at all,
>> so I'm re-testing the following simpler variant, giving up on the
>> cheap last 25% but solving the regression as well.
>> 
>> Richard.
>> 
>> From a76aebfdc4b6247db6a061e6395fd088a5694122 Mon Sep 17 00:00:00 2001
>> From: Richard Biener 
>> Date: Wed, 14 Feb 2024 12:33:13 +0100
>> Subject: [PATCH] tree-optimization/113910 - huge compile time during PTA
>> To: gcc-patches@gcc.gnu.org
>> 
>> For the testcase in PR113910 we spend a lot of time in PTA comparing
>> bitmaps for looking up equivalence class members.  This points to
>> the very weak bitmap_hash function which effectively hashes set
>> and a subset of not set bits.
>> 
>> The major problem with it is that it simply truncates the
>> BITMAP_WORD sized intermediate hash to hashval_t which is
>> unsigned int, effectively not hashing half of the bits.
>> 
>> This reduces the compile-time for the testcase from tens of minutes
>> to 42 seconds and PTA time from 99% to 46%.
>> 
>>PR tree-optimization/113910
>>* bitmap.cc (bitmap_hash): Mix the full element "hash" to
>>the hashval_t hash.
>> ---
>> gcc/bitmap.cc | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/gcc/bitmap.cc b/gcc/bitmap.cc
>> index 6cf326bca5a..459e32c1ad1 100644
>> --- a/gcc/bitmap.cc
>> +++ b/gcc/bitmap.cc
>> @@ -2706,7 +2706,7 @@ bitmap_hash (const_bitmap head)
>>   for (ix = 0; ix != BITMAP_ELEMENT_WORDS; ix++)
>>hash ^= ptr->bits[ix];
>> }
>> -  return (hashval_t)hash;
>> +  return iterative_hash (&hash, sizeof (hash), 0);
>> }
>> 
>> 
> 
> LGTM FWIW, but just curious: does using the iterative hash routines for
> each update (instead of ^) help too, or is it too slow?

It helps but is too slow.

>  Or maybe do an
> iterative hash for the idx part and keep ^ for the bits accumulation?
> Also wonder whether using + rather than ^ for the bits accumulation
> would help...

I have a patch picking the new BFD string hash for this which is fast.  I’ll do 
this for stage1, trying to replace our generic hash function.

Richard 

> Thanks,
> Richard
> 
> 


Re: [PATCH] testsuite: Add support for scanning assembly with comparator

2024-02-15 Thread Mike Stump
On Feb 12, 2024, at 11:38 AM, Edwin Lu  wrote:
> 
> There is currently no support for matching at least x lines of assembly
> (only scan-assembler-times). This patch would allow setting upper or lower
> bounds.
> 
> Use case: using different scheduler descriptions and/or cost models will 
> change
> assembler output. Testing common functionality across tunes would require a
> separate testcase per tune since each assembly output would be different. If 
> we
> know a base number of lines should appear across all tunes (i.e. testing 
> return
> values: we expect at minimum n stores into register x), we can lower-bound the
> test to search for scan-assembler-bound {RE for storing into register x} >= n.
> This avoids artificially inflating the scan-assembler-times expected count due
> to the assembler choosing to perform extra stores into register x (using it as
> a temporary register).
> 
> The testcase would be more robust to cpu/tune changes at the cost of not being
> as granular towards specific cpu tuning.

I didn't see an Ok?  Just in case you forgot, yes, this is ok.
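
A hypothetical testcase using the proposed directive might look as follows
(the options, pattern, instruction and bound here are invented purely for
illustration):

/* { dg-do compile } */
/* { dg-options "-O2" } */

int
sum (int a, int b)
{
  return a + b;
}

/* At least one add, whichever tune or cost model is in effect.  */
/* { dg-final { scan-assembler-bound {add\t} >= 1 } } */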

Re: [COMMITTED V3 1/4] RISC-V: Add non-vector types to dfa pipelines

2024-02-15 Thread Edwin Lu

On 2/15/2024 1:25 AM, Li, Pan2 wrote:


> Sorry for the late reply due to the holiday. I double-checked the
> calling-convention-*.c dump; it is safe to adjust the asm check to the
> number you mentioned.


Hi Pan,

I hope you had a good holiday! I already changed the numbers and added a 
bit more checks and documentation to the calling-convention-*.c files in 
this patch 
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645638.html.


If you have the time, it'd be great if you can take a look at it.

Thanks!
Edwin



Re: [PATCH] testsuite: Define _POSIX_SOURCE for tests [PR113278]

2024-02-15 Thread Torbjorn SVENSSON




On 2024-02-15 18:18, Mike Stump wrote:

> On Feb 15, 2024, at 9:03 AM, Torbjörn SVENSSON wrote:
>>
>> Ok for trunk?
>
> Ok.


Pushed as 8e8c2d2b34971bb29e74341a3efc625f1db06639.




gcc/testsuite/ChangeLog:
PR113278
* c-c++-common/analyzer/fileno-1.c: Define _POSIX_SOURCE.
* c-c++-common/analyzer/flex-with-call-summaries.c: Same.
* c-c++-common/analyzer/flex-without-call-summaries.c: Same.


Re: [PATCH 0/1 V2] Target independent code for common infrastructure of load,store fusion for rs6000 and aarch64 target.

2024-02-15 Thread Ajit Agarwal



On 15/02/24 10:43 pm, Alex Coplan wrote:
> So IIUC Richard was suggesting splitting into target-independent and
> target-dependent pieces within aarch64-ldp-fusion.cc as a first step,
> i.e. you introduce the abstractions (virtual functions) needed within
> that file.  That should hopefully be a relatively small diff.
> 
> Then in a separate patch you can move the target-independent parts out of
> config/aarch64.
> 
> Does that make sense?

Thanks a lot for explaining this. Sure I will do that and send the patch as per
above.

Thanks & Regards
Ajit


[PATCH] Fortran: fix passing array component to polymorphic argument [PR105658]

2024-02-15 Thread Peter Hill
Dear all,

The attached patch fixes PR105658 by forcing an array temporary to be
created. This is required when passing an array component, but this
didn't happen if the dummy argument was an unlimited polymorphic type.

The problem bit of code is in `gfc_conv_expr_descriptor`, near L7828:

  subref_array_target = (is_subref_array (expr)
 && (se->direct_byref
|| expr->ts.type == BT_CHARACTER));
  need_tmp = (gfc_ref_needs_temporary_p (expr->ref)
  && !subref_array_target);

where `need_tmp` is being evaluated to 0.  The logic here isn't clear
to me, and this function is used in several places, which is why I
went with setting `parmse.force_tmp = 1` in `gfc_conv_procedure_call`
and using the same conditional as the later branch for the
non-polymorphic case (near the call to `gfc_conv_subref_array_arg`)

If this patch is ok, please could someone commit it for me? This is my
first patch for GCC, so apologies in advance if the commit message is
missing something.

Tested on x86_64-pc-linux-gnu.

The bug is present in gfortran back to 4.9, so should it also be backported?

Cheers,
Peter

 PR fortran/105658

gcc/fortran/ChangeLog

* trans-expr.cc (gfc_conv_procedure_call): When passing an
array component reference of intrinsic type to a procedure
with an unlimited polymorphic dummy argument, a temporary
should be created.

gcc/testsuite/ChangeLog

* gfortran.dg/PR105658.f90: New test.
---
 gcc/fortran/trans-expr.cc  |  8 
 gcc/testsuite/gfortran.dg/PR105658.f90 | 25 +
 2 files changed, 33 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/PR105658.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index a0593b76f18..7fd3047c4e9 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6439,6 +6439,14 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
   CLASS object for the unlimited polymorphic formal.  */
gfc_find_vtab (&e->ts);
gfc_init_se (&parmse, se);
+   /* The actual argument is a component reference to an array
+  of derived types, so we need to force creation of a
+  temporary */
+   if (e->expr_type == EXPR_VARIABLE
+   && is_subref_array (e)
+   && !(fsym && fsym->attr.pointer))
+ parmse.force_tmp = 1;
+
gfc_conv_intrinsic_to_class (&parmse, e, fsym->ts);

  }
diff --git a/gcc/testsuite/gfortran.dg/PR105658.f90 b/gcc/testsuite/gfortran.dg/PR105658.f90
new file mode 100644
index 000..407ee25f77c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/PR105658.f90
@@ -0,0 +1,25 @@
+! { dg-do compile }
+! { dg-options "-Warray-temporaries" }
+! Test fix for incorrectly passing array component to unlimited polymorphic procedure
+
+module test_PR105658_mod
+  implicit none
+  type :: foo
+integer :: member1
+integer :: member2
+  end type foo
+contains
+  subroutine print_poly(array)
+class(*), dimension(:), intent(in) :: array
+select type(array)
+type is (integer)
+  print*, array
+end select
+  end subroutine print_poly
+
+  subroutine do_print(thing)
+type(foo), dimension(3), intent(in) :: thing
+call print_poly(thing%member1) ! { dg-warning "array temporary" }
+  end subroutine do_print
+
+end module test_PR105658_mod
-- 
2.43.0


Re: [PATCH] aarch64: Fix undefined code in vect_ctz_1.c

2024-02-15 Thread Richard Sandiford
Andrew Pinski  writes:
> The testcase gcc.target/aarch64/vect_ctz_1.c fails execution when running
> with -march=armv9-a because the testcase calls __builtin_ctz with a value of 0.
> The testcase should not depend on undefined behavior of __builtin_ctz, so this
> changes it to use the g form with a second argument of 32. Now the execution
> part of the testcase works. It still has a scan-assembler failure, which
> should be fixed separately.
>
> OK? Tested on aarch64-linux-gnu.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/vect_ctz_1.c (TEST): Use g form of the builtin and 
> pass 32
>   as the value expected at 0.

OK, but it looks like vect-clz.c could use the same fix.

I think we have enough coverage elsewhere that we still use CLZ for the
"normal" builtins (e.g. sve/clz_1.c).

Richard

> Signed-off-by: Andrew Pinski 
> ---
>  gcc/testsuite/gcc.target/aarch64/vect_ctz_1.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect_ctz_1.c 
> b/gcc/testsuite/gcc.target/aarch64/vect_ctz_1.c
> index c4eaf5b3a91..5fcf1e31ab2 100644
> --- a/gcc/testsuite/gcc.target/aarch64/vect_ctz_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/vect_ctz_1.c
> @@ -9,7 +9,7 @@ count_tz_##name (unsigned *__restrict a, int *__restrict b) \
>  { \
>int i; \
>for (i = 0; i < count; i++) \
> -b[i] = __builtin_##subname (a[i]); \
> +b[i] = __builtin_##subname##g (a[i], 32); \
>  }
>  
>  #define CHECK(name, count, input, output) \
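
For reference, the g-suffixed builtins used in the fix take an optional second
argument that is returned when the input is zero, which is what makes the zero
case well defined (a small sketch, assuming GCC 14's type-generic count-zero
builtins):

int
safe_ctz (unsigned int x)
{
  return __builtin_ctzg (x, 32);  /* 32 when x == 0, ctz (x) otherwise */
}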


Re: [PATCH] aarch64, acle header: Cast uint64_t pointers to DIMode.

2024-02-15 Thread Richard Sandiford
Iain Sandoe  writes:
>> On 5 Feb 2024, at 14:56, Iain Sandoe  wrote:
>> 
>> Tested on aarch64-linux,darwin and a cross from aarch64-darwin to linux,
>> OK for trunk, or some alternative is needed?
>
> Hmm.. apparently, this fails the linaro pre-commit CI for g++ with:
> error: invalid conversion from 'long int*' to 'long unsigned int*' 
> [-fpermissive]
>
> So, I guess some alternative is needed, advice welcome,

The builtins are registered with:

static void
aarch64_init_rng_builtins (void)
{
  tree unsigned_ptr_type = build_pointer_type (unsigned_intDI_type_node);
  ...

Does it work if you change unsigned_intDI_type_node to
get_typenode_from_name (UINT64_TYPE)?

Thanks,
Richard


[PATCH] c++/modules: optimize tree flag streaming

2024-02-15 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

One would expect consecutive calls to bytes_in/out::b for streaming
adjacent bits, as we do for tree flag streaming, to at least be
optimized by the compiler into individual bit operations using
statically known bit positions (and ideally merged into larger sized
reads/writes).

Unfortunately this doesn't happen because the compiler has trouble
tracking the values of this->bit_pos and this->bit_val across such
calls, likely because the compiler doesn't know 'this' and so it's
treated as global memory.  This means for each consecutive bit stream
operation, bit_pos and bit_val are loaded from memory, checked if
buffering is needed, and finally the bit is extracted from bit_val
according to the (unknown) bit_pos, even though relative to the previous
operation (if we didn't need to buffer) bit_val is unchanged and bit_pos
is just 1 larger.  This ends up being quite slow, with tree_node_bools
taking 10% of time when streaming in parts of the std module.

This patch optimizes this by making tracking of bit_pos and bit_val
easier for the compiler.  Rather than bit_pos and bit_val being members
of the (effectively global) bytes_in/out objects, this patch factors out
the bit streaming code/state into separate classes bits_in/out that get
constructed locally as needed for bit streaming.  Since these objects
are now clearly local, the compiler can more easily track their values.

And since bit streaming is intended to be batched it's natural for these
new classes to be RAII-enabled such that the bit stream is flushed upon
destruction.

In order to make the most of this improved tracking of bit position,
this patch changes parts where we conditionally stream a tree flag
to unconditionally stream (the flag or a dummy value).  That way
the number of bits streamed and the respective bit positions are as
statically known as reasonably possible.  In lang_decl_bools and
lang_type_bools we flush the current bit buffer at the start so that
subsequent bit positions are statically known.  And in core bools, we
can add explicit early exits utilizing invariants that the compiler
can't figure out itself (e.g. a tree code can't have both TS_TYPE_COMMON
and TS_DECL_COMMON, and if a tree code doesn't have TS_DECL_COMMON then
it doesn't have TS_DECL_WITH_VIS).  Finally if we're streaming fewer
than 4 bits, it's more space efficient to stream them as individual
bytes rather than as packed bits (due to the 32-bit buffer).

This patch also moves the definitions of the relevant streaming classes
into anonymous namespaces so that the compiler can make more informed
decisions about inlining their member functions.

After this patch, compile time for a simple Hello World using the std
module is reduced by 7% with a release compiler.  The on-disk size of
the std module increases by 0.7% (presumably due to the extra flushing
done in lang_decl_bools and lang_type_bools).

The bit stream out performance isn't improved as much as the stream in
due to the spans/lengths instrumentation performed on stream out (which
probably should be e.g. removed for release builds?)

gcc/cp/ChangeLog:

* module.cc
(class data): Enclose in an anonymous namespace.
(data::calc_crc): Moved from bytes::calc_crc.
(class bytes): Remove.  Move bit_flush to namespace scope.
(class bytes_in): Enclose in an anonymous namespace.  Inherit
directly from data and adjust accordingly.  Move b and bflush
members to bits_in.
(class bytes_out): As above.  Remove is_set static data member.
(bit_flush): Moved from class bytes.
(struct bits_in): Define.
(struct bits_out): Define.
(bytes_out::bflush): Moved to bits_out/in.
(bytes_in::bflush): Likewise
(bytes_in::bfill): Removed.
(bytes_out::b): Moved to bits_out/in.
(bytes_in::b): Likewise.
(class trees_in): Enclose in an anonymous namespace.
(class trees_out): Enclose in an anonymous namespace.
(trees_out::core_bools): Add bits_out/in parameter and use it.
Unconditionally stream a bit for public_flag.  Add early exits
as appropriate.
(trees_in::core_bools): Likewise.
(trees_out::lang_decl_bools): Add bits_out/in parameter and use
it.  Flush the current bit buffer at the start.  Unconditionally
stream a bit for module_keyed_decls_p.
(trees_in::lang_decl_bools): Likewise.
(trees_out::lang_type_bools): Add bits_out/in parameter and use
it.  Flush the current bit buffer at the start.
(trees_in::lang_type_bools): Likewise.
(trees_out::tree_node_bools): Construct a bits_out object and
use/pass it.
(trees_in::tree_node_bools): Likewise.
(trees_out::decl_value): Stream stray bit values as bytes.
(trees_in::decl_value): Likewise.
(module_state::write_define): Likewise.
(m

Re: [PATCH][_GLIBCXX_DEBUG] Fix std::__niter_base behavior

2024-02-15 Thread François Dumont


On 15/02/2024 14:17, Jonathan Wakely wrote:



On Wed, 14 Feb 2024 at 21:48, François Dumont  
wrote:



On 14/02/2024 20:44, Jonathan Wakely wrote:



On Wed, 14 Feb 2024 at 18:39, François Dumont
 wrote:

libstdc++: [_GLIBCXX_DEBUG] Fix std::__niter_base behavior

std::__niter_base is used in _GLIBCXX_DEBUG mode to remove
_Safe_iterator<>
wrapper on random access iterators. But doing so it should
also preserve
original
behavior to remove __normal_iterator wrapper.

libstdc++-v3/ChangeLog:

 * include/bits/stl_algobase.h (std::__niter_base):
Redefine the
overload
 definitions for __gnu_debug::_Safe_iterator.
 * include/debug/safe_iterator.tcc (std::__niter_base):
Adapt
declarations.

Ok to commit once all tests completed (still need to check
pre-c++11) ?



The declaration in  include/bits/stl_algobase.h has a
noexcept-specifier but the definition in
include/debug/safe_iterator.tcc does not have one - that seems
wrong (I'm surprised it even compiles).


It does !


The diagnostic is suppressed without -Wsystem-headers:

/home/jwakely/gcc/14/include/c++/14.0.1/debug/safe_iterator.tcc:255:5: warning:
declaration of 'template constexpr decltype (std::__niter_base(declval<_Ite>()))
std::__niter_base(const __gnu_debug::_Safe_iterator<_Iterator, _Sequence,
random_access_iterator_tag>&)' has a different exception specifier [-Wsystem-headers]
 255 | __niter_base(const ::__gnu_debug::_Safe_iterator<_Ite, _Seq,
 | ^~~~
/home/jwakely/gcc/14/include/c++/14.0.1/bits/stl_algobase.h:335:5: note:
from previous declaration 'template constexpr decltype (std::__niter_base(declval<_Ite>()))
std::__niter_base(const __gnu_debug::_Safe_iterator<_Iterator, _Sequence,
random_access_iterator_tag>&) noexcept (noexcept(is_nothrow_copy_constructible(std::__niter_base(declval<_Ite>()))>::value))'
 335 | __niter_base(const ::__gnu_debug::_Safe_iterator<_Ite, _Seq,
 | ^~~~


It's a hard error with Clang though:

deb.cc:7:10: error: call to '__niter_base' is ambiguous



Yes, I eventually got the error too, I hadn't run enough tests yet.





I thought it was only necessary at declaration, and I also had
troubles doing it right at definition because of the interaction
with the auto and ->.


The trailing-return-type has to come after the noexcept-specifier.

Now simplified and consistent in this new proposal.



Just using std::is_nothrow_copy_constructible<_Ite> seems
simpler, that will be true for __normal_iterator if
is_nothrow_copy_constructible is true.


Ok



The definition in include/debug/safe_iterator.tcc should use
std::declval<_Ite>() not declval<_Ite>(). Is there any reason why
the definition uses a late-specified-return-type (i.e. auto and
->) when the declaration doesn't?



I initially plan to use '->
std::decltype(std::__niter_base(__it.base()))' but this did not
compile, ambiguity issue. So I resort to using std::declval and I
could have then done it the same way as declaration, done now.

Attached is what I'm testing, ok to commit once fully tested ?


OK, thanks.


Thanks for validation but I have a problem to test for c++98.

When I do:

make CXXFLAGS=-std=c++98 check-debug

I see in debug/libstdc++.log for example:

Executing on host: /home/fdumont/dev/gcc/build/./gcc/xg++ -shared-libgcc 
... -mshstk -std=c++98 -g -O2 -DLOCALEDIR="." -nostdinc++ 
-I/home/fdumont/dev/gcc/... 
/home/fdumont/dev/gcc/git/libstdc++-v3/testsuite/25_algorithms/copy/3.cc 
-D_GLIBCXX_DEBUG   -std=gnu++17  -include bits/stdc++.h ...  -lm -o 
./3.exe    (timeout = 360)


The -std=c++98 is there but later comes the -std=gnu++17 so I think it 
runs in C++17, no ?


I also tried the documented alternative:

make check 
'RUNTESTFLAGS=--target_board=unix/-O3\"{-std=gnu++98,-std=gnu++11,-std=gnu++14}\"'

but same problem, -std=gnu++17 comes last.

I'll try to rebuild all from scratch but I won't commit soon then.



Re: [PATCH][_GLIBCXX_DEBUG] Fix std::__niter_base behavior

2024-02-15 Thread Jonathan Wakely
On Thu, 15 Feb 2024 at 18:38, François Dumont  wrote:

>
> On 15/02/2024 14:17, Jonathan Wakely wrote:
>
>
>
> On Wed, 14 Feb 2024 at 21:48, François Dumont 
> wrote:
>
>>
>> On 14/02/2024 20:44, Jonathan Wakely wrote:
>>
>>
>>
>> On Wed, 14 Feb 2024 at 18:39, François Dumont 
>> wrote:
>>
>>> libstdc++: [_GLIBCXX_DEBUG] Fix std::__niter_base behavior
>>>
>>> std::__niter_base is used in _GLIBCXX_DEBUG mode to remove
>>> _Safe_iterator<>
>>> wrapper on random access iterators. But doing so it should also preserve
>>> original
>>> behavior to remove __normal_iterator wrapper.
>>>
>>> libstdc++-v3/ChangeLog:
>>>
>>>  * include/bits/stl_algobase.h (std::__niter_base): Redefine the
>>> overload
>>>  definitions for __gnu_debug::_Safe_iterator.
>>>  * include/debug/safe_iterator.tcc (std::__niter_base): Adapt
>>> declarations.
>>>
>>> Ok to commit once all tests completed (still need to check pre-c++11) ?
>>>
>>
>>
>> The declaration in  include/bits/stl_algobase.h has a noexcept-specifier
>> but the definition in include/debug/safe_iterator.tcc does not have one -
>> that seems wrong (I'm surprised it even compiles).
>>
>> It does !
>>
>
> The diagnostic is suppressed without -Wsystem-headers:
>
> /home/jwakely/gcc/14/include/c++/14.0.1/debug/safe_iterator.tcc:255:5: 
> warning:
> declaration of 'template constexpr decltype
> (std::__
> niter_base(declval<_Ite>())) std::__niter_base(const
> __gnu_debug::_Safe_iterator<_Iterator, _Sequence,
> random_access_iterator_tag>&)' has a different exception specifier [-Wsystem-headers]
>  255 | __niter_base(const ::__gnu_debug::_Safe_iterator<_Ite, _Seq,
>  | ^~~~
> /home/jwakely/gcc/14/include/c++/14.0.1/bits/stl_algobase.h:335:5: note: from
> previous declaration 'template constexpr decltype
> (std
> ::__niter_base(declval<_Ite>())) std::__niter_base(const
> __gnu_debug::_Safe_iterator<_Iterator, _Sequence,
> random_access_iterator_tag>&) noexcept (noexcept
> (is_nothrow_copy_constructible (std::__niter_base(declval<_Ite>()))>::value))'
>  335 | __niter_base(const ::__gnu_debug::_Safe_iterator<_Ite, _Seq,
>  | ^~~~
>
>
> It's a hard error with Clang though:
>
> deb.cc:7:10: error: call to '__niter_base' is ambiguous
>
>
> Yes, I eventually got the error too, I hadn't run enough tests yet.
>
>
>
>
>
>
>> I thought it was only necessary at declaration, and I also had troubles
>> doing it right at definition because of the interaction with the auto and
>> ->.
>>
>
> The trailing-return-type has to come after the noexcept-specifier.
>
>
>
>> Now simplified and consistent in this new proposal.
>>
>>
>> Just using std::is_nothrow_copy_constructible<_Ite> seems simpler, that
>> will be true for __normal_iterator if
>> is_nothrow_copy_constructible is true.
>>
>> Ok
>>
>>
>> The definition in include/debug/safe_iterator.tcc should use
>> std::declval<_Ite>() not declval<_Ite>(). Is there any reason why the
>> definition uses a late-specified-return-type (i.e. auto and ->) when the
>> declaration doesn't?
>>
>>
>> I initially planned to use '->
>> std::decltype(std::__niter_base(__it.base()))' but this did not compile
>> (ambiguity issue). So I resorted to using std::declval, and I could then
>> have done it the same way as the declaration, which is done now.
>>
>> Attached is what I'm testing, ok to commit once fully tested ?
>>
>
> OK, thanks.
>
> Thanks for the validation, but I have a problem testing for C++98.
>
> When I do:
>
> make CXXFLAGS=-std=c++98 check-debug
>

That doesn't work any more, see
https://gcc.gnu.org/onlinedocs/libstdc++/manual/test.html#test.run.permutations



> I see in debug/libstdc++.log for example:
>
> Executing on host: /home/fdumont/dev/gcc/build/./gcc/xg++ -shared-libgcc
> ... -mshstk -std=c++98 -g -O2 -DLOCALEDIR="." -nostdinc++
> -I/home/fdumont/dev/gcc/...
> /home/fdumont/dev/gcc/git/libstdc++-v3/testsuite/25_algorithms/copy/3.cc
> -D_GLIBCXX_DEBUG   -std=gnu++17  -include bits/stdc++.h ...  -lm  -o
> ./3.exe(timeout = 360)
>
> The -std=c++98 is there, but -std=gnu++17 comes later, so I think it
> runs in C++17, no?
>
> I also tried the documented alternative:
>
> make check 'RUNTESTFLAGS=--target_board=unix/-O3\"{-std=gnu++98,-std=gnu++11,-std=gnu++14}\"'
>
>
> but same problem, -std=gnu++17 comes last.
>
> I'll try to rebuild everything from scratch, but then I won't commit soon.
>
>
>


[PATCH 0/2 V2] aarch64: Place target independent and dependent code in one file.

2024-02-15 Thread Ajit Agarwal
Hello Alex/Richard:

I have placed the target-independent and target-dependent code in
aarch64-ldp-fusion.cc for load/store fusion.

The common infrastructure for load/store pair fusion is divided into
target-independent and target-dependent code.

The target-independent code is the generic code, with pure virtual
functions providing the interface between the target-independent and
target-dependent parts; see the sketch below.

The target-dependent code is the aarch64 implementation of those pure
virtual functions, together with the calls into the target-independent
code.
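To make the intended split a bit more concrete, here is a schematic sketch
(all names invented for illustration; this is not the patch itself) of the
shape described above: a target-independent core that only talks to pure
virtual hooks, and an aarch64 class that implements them.

  #include <iostream>

  // Target-independent core: generic driver code plus pure virtual hooks.
  struct pair_fusion
  {
    virtual ~pair_fusion () = default;
    // Hook the target must implement, e.g. an offset/immediate range check.
    virtual bool pair_mem_ok_p (long offset) const = 0;

    // Generic code: knows nothing about the target beyond the hooks.
    bool try_fuse_at (long offset) const { return pair_mem_ok_p (offset); }
  };

  // Target-dependent part: the aarch64 implementation of the hooks.
  struct aarch64_pair_fusion : pair_fusion
  {
    bool pair_mem_ok_p (long offset) const override
    { return offset >= -64 && offset <= 63; }
  };

  int main ()
  {
    aarch64_pair_fusion fuser;
    std::cout << fuser.try_fuse_at (32) << ' ' << fuser.try_fuse_at (100) << '\n';  // prints "1 0"
  }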

Bootstrapped on aarch64-linux-gnu.

Thanks & Regards
Ajit


aarch64: Place target independent and dependent code in one file.

The common infrastructure for load/store pair fusion is divided into
target-independent and target-dependent code.

The target-independent code is the generic code, with pure virtual
functions providing the interface between the target-independent and
target-dependent parts.

The target-dependent code is the aarch64 implementation of those pure
virtual functions, together with the calls into the target-independent
code.

2024-02-15  Ajit Kumar Agarwal  

gcc/ChangeLog:

* config/aarch64/aarch64-ldp-fusion.cc: Place target-independent
and target-dependent code in one file.
---
 gcc/config/aarch64/aarch64-ldp-fusion.cc | 3513 --
 1 file changed, 1842 insertions(+), 1671 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 22ed95eb743..0ab842e2bbb 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -17,6 +17,7 @@
 // along with GCC; see the file COPYING3.  If not see
 // <http://www.gnu.org/licenses/>.
 
+
 #define INCLUDE_ALGORITHM
 #define INCLUDE_FUNCTIONAL
 #define INCLUDE_LIST
@@ -37,13 +38,12 @@
 #include "tree-hash-traits.h"
 #include "print-tree.h"
 #include "insn-attr.h"
-
 using namespace rtl_ssa;
 
-static constexpr HOST_WIDE_INT LDP_IMM_BITS = 7;
-static constexpr HOST_WIDE_INT LDP_IMM_SIGN_BIT = (1 << (LDP_IMM_BITS - 1));
-static constexpr HOST_WIDE_INT LDP_MAX_IMM = LDP_IMM_SIGN_BIT - 1;
-static constexpr HOST_WIDE_INT LDP_MIN_IMM = -LDP_MAX_IMM - 1;
+static constexpr HOST_WIDE_INT PAIR_MEM_IMM_BITS = 7;
+static constexpr HOST_WIDE_INT PAIR_MEM_IMM_SIGN_BIT = (1 << (PAIR_MEM_IMM_BITS - 1));
+static constexpr HOST_WIDE_INT PAIR_MEM_MAX_IMM = PAIR_MEM_IMM_SIGN_BIT - 1;
+static constexpr HOST_WIDE_INT PAIR_MEM_MIN_IMM = -PAIR_MEM_MAX_IMM - 1;
 
 // We pack these fields (load_p, fpsimd_p, and size) into an integer
 // (LFS) which we use as part of the key into the main hash tables.
@@ -138,8 +138,144 @@ struct alt_base
   poly_int64 offset;
 };
 
+// Class that implements a state machine for building the changes needed to form
+// a store pair instruction.  This allows us to easily build the changes in
+// program order, as required by rtl-ssa.
+struct stp_change_builder
+{
+  enum class state
+  {
+FIRST,
+INSERT,
+FIXUP_USE,
+LAST,
+DONE
+  };
+
+  enum class action
+  {
+TOMBSTONE,
+CHANGE,
+INSERT,
+FIXUP_USE
+  };
+
+  struct change
+  {
+action type;
+insn_info *insn;
+  };
+
+  bool done () const { return m_state == state::DONE; }
+
+  stp_change_builder (insn_info *insns[2],
+ insn_info *repurpose,
+ insn_info *dest)
+: m_state (state::FIRST), m_insns { insns[0], insns[1] },
+  m_repurpose (repurpose), m_dest (dest), m_use (nullptr) {}
+
+  change get_change () const
+  {
+switch (m_state)
+  {
+  case state::FIRST:
+   return {
+ m_insns[0] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
+ m_insns[0]
+   };
+  case state::LAST:
+   return {
+ m_insns[1] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
+ m_insns[1]
+   };
+  case state::INSERT:
+   return { action::INSERT, m_dest };
+  case state::FIXUP_USE:
+   return { action::FIXUP_USE, m_use->insn () };
+  case state::DONE:
+   break;
+  }
+
+gcc_unreachable ();
+  }
+
+  // Transition to the next state.
+  void advance ()
+  {
+switch (m_state)
+  {
+  case state::FIRST:
+   if (m_repurpose)
+ m_state = state::LAST;
+   else
+ m_state = state::INSERT;
+   break;
+  case state::INSERT:
+  {
+   def_info *def = memory_access (m_insns[0]->defs ());
+   while (*def->next_def ()->insn () <= *m_dest)
+ def = def->next_def ();
+
+   // Now we know DEF feeds the insertion point for the new stp.
+   // Look for any uses of DEF that will consume the new stp.
+   gcc_assert (*def->insn () <= *m_dest
+   && *def->next_def ()->insn () > *m_dest);
+
+   auto set = as_a<set_info *> (def);
+   for (auto use : set->nondebug_insn_uses ())
+ if (*use->insn () > *m_dest)
+   {
+ m_use = use;
+ break;
+   }
+
+   if (m_use)
+ m_state = state::FIXUP_USE;
+   else
+ m_state = state::LAST;
+   break;
+  }
+

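As a quick stand-alone check of the arithmetic behind the renamed PAIR_MEM_*
constants above (using int64_t in place of HOST_WIDE_INT; the values are the
same as the LDP_* constants they replace): a 7-bit signed immediate spans
[-64, 63].

  #include <cstdint>

  static constexpr std::int64_t IMM_BITS = 7;
  static constexpr std::int64_t IMM_SIGN_BIT = std::int64_t (1) << (IMM_BITS - 1);  // 64
  static constexpr std::int64_t MAX_IMM = IMM_SIGN_BIT - 1;                         // 63
  static constexpr std::int64_t MIN_IMM = -MAX_IMM - 1;                             // -64

  static_assert (MAX_IMM == 63 && MIN_IMM == -64, "7-bit signed immediate range");

  int main () { return 0; }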