Re: [PATCH] slp: fix sharing of SLP only patterns. (PR99149)

2021-02-22 Thread Richard Biener
On Fri, 19 Feb 2021, Tamar Christina wrote:

> Hi Richi,
> 
> The attached testcase ICEs due to a couple of issues.
> In the testcase you have two SLP instances that share the majority of their
> definition with each other.  One tree defines a COMPLEX_MUL sequence and the
> other tree a COMPLEX_FMA.
> 
> The ice happens because:
> 
> 1. the refcounts are wrong, in particular the FMA case doesn't correctly count
> the references for the COMPLEX_MUL that it consumes.
> 
> 2. when the FMA is created it incorrectly assumes it can just tear apart the 
> MUL
> node that it's consuming.  This is wrong and should only be done when there is
> no more uses of the node, in which case the vector only pattern is no longer
> relevant.
> 
> To fix the last part the SLP only pattern reset code was moved into
> vect_free_slp_tree which results in cleaner code.  I also think it does belong
> there since that function knows when there are no more uses of the node and so
> the pattern should be unmarked, so when the the vectorizer is inspecting the 
> BB
> it doesn't find the now invalid vector only patterns.

Note that if you rely on the pattern being reset when it's no longer used
then this won't work in case we end up in vect_slp_convert_to_external
because that does

  /* Don't remove and free the child nodes here, since they could be
 referenced by other structures.  The analysis and scheduling phases
 (need to) ignore child nodes of anything that isn't 
vect_internal_def.  */
  unsigned int group_size = SLP_TREE_LANES (node);
  SLP_TREE_DEF_TYPE (node) = vect_external_def;
  SLP_TREE_SCALAR_OPS (node).safe_grow (group_size, true);
  SLP_TREE_LOAD_PERMUTATION (node).release ();
  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
{
  tree lhs = gimple_get_lhs (vect_orig_stmt (stmt_info)->stmt);

where I don't remember the reason for not freeing the children
(it's Richards code, so you'd have to ask him).  OTOH the above only
triggers for non-loop vectorization where the pattern reset isn't
needed?

> This has the obvious problem in that, eventually after analysis is done, the
> entire SLP tree is dissolved before codegen.  Where we get into trouble as we
> have now dissolved the patterns too...

Huh?  Codegen needs the SLP nodes so they most definitely survive until
after vectorization has finished.  So did you run into actual problems
with this approach?

> My initial thought was to add a parameter to vect_free_slp_tree, but I know 
> you
> wouldn't like that.  So I am sending this patch up as an RFC.
> 
> PS. This testcase actually shows that the codegen we get in these cases is not
> optimal.  Currently this won't vectorize as the compiler thinks the vector
> version is too expensive.
>
> My guess here is because the patterns now unshare the tree and it's likely
> costing the setup for the vector code twice?

I can see it figures there are LIVE lanes out of the SLP tree, likely
because the way vect_bb_slp_mark_live_stmts works out the stmt to
vectorize doesn't work for SLP patterns.  It sees

t.ii:19:10: note: node 0x3b87dd0 (max_nunits=2, refcnt=1)
t.ii:19:10: note: op template: MEM  [(struct a *)_10] = _24;
t.ii:19:10: note:   stmt 0 MEM  [(struct a *)_10] = _24;
t.ii:19:10: note:   stmt 1 MEM  [(struct a *)_10 + 4B] = _25;
t.ii:19:10: note:   children 0x3b87e58
t.ii:19:10: note: node 0x3b87e58 (max_nunits=2, refcnt=1)
t.ii:19:10: note: op template: slp_patt_34 = .COMPLEX_FMA (_24, _24, _24);
t.ii:19:10: note:   stmt 0 _24 = l$b_11 + _20;
t.ii:19:10: note:   stmt 1 _25 = l$c_12 + _23;
t.ii:19:10: note:   children 0x3b87ee0 0x3b88100 0x3b88100
t.ii:19:10: note: node 0x3b87ee0 (max_nunits=2, refcnt=1)
t.ii:19:10: note: op template: l$b_11 = MEM[(const struct a &)_10].b;
t.ii:19:10: note:   stmt 0 l$b_11 = MEM[(const struct a &)_10].b;
t.ii:19:10: note:   stmt 1 l$c_12 = MEM[(const struct a &)_10].c;
t.ii:19:10: note: node 0x3b88100 (max_nunits=2, refcnt=4)
t.ii:19:10: note: op template: k$b_4 = MEM[(const struct a &)_3].b;
t.ii:19:10: note:   stmt 0 k$b_4 = MEM[(const struct a &)_3].b;
t.ii:19:10: note:   stmt 1 k$c_5 = MEM[(const struct a &)_3].c;

but the scalar stmts for the .COMPLEX_FMA pattern have uses that
are not marked as PURE_SLP (_20 and _23).

There's the other issue that we have

t.ii:19:10: note: op template: slp_patt_33 = .COMPLEX_MUL (_20, _20);
t.ii:19:10: note:   stmt 0 _20 = _18 - _19;
t.ii:19:10: note:   stmt 1 _23 = _21 + _22;
t.ii:19:10: note:   children 0x3b88180 0x3b88538
t.ii:19:10: note: node 0x3b88538 (max_nunits=1, refcnt=1)
t.ii:19:10: note: op: VEC_PERM_EXPR
t.ii:19:10: note:   { }
t.ii:19:10: note:   lane permutation { 0[0] 1[1] }
t.ii:19:10: note:   children 0x3b880f8 0x3b88290
t.ii:19:10: note: node 0x3b880f8 (max_nunits=2, refcnt=1)
t.ii:19:10: note: op template: k$b_4 = MEM[(const struct a &)_3].b;
t.ii:19:10: note:   stmt 0 k$b_4 = MEM[(const struct a &)_3].b;
t.ii:19:10: note:   stmt 

[PATCH] tree-optimization/99165 - fix ICE in store-merging w/ non-call EH

2021-02-22 Thread Richard Biener
This adds a missing accumulation to ret.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-02-22  Richard Biener  

PR tree-optimization/99165
* gimple-ssa-store-merging.c (pass_store_merging::process_store):
Accumulate changed to ret.

* g++.dg/pr99165.C: New testcase.
---
 gcc/gimple-ssa-store-merging.c | 2 +-
 gcc/testsuite/g++.dg/pr99165.C | 7 +++
 2 files changed, 8 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/pr99165.C

diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c
index b4c5e8eb9a8..213c1551d39 100644
--- a/gcc/gimple-ssa-store-merging.c
+++ b/gcc/gimple-ssa-store-merging.c
@@ -5230,7 +5230,7 @@ pass_store_merging::process_store (gimple *stmt)
  if (idx >= (unsigned)param_max_store_chains_to_track
  || (n_stores + (*e)->m_store_info.length ()
  > (unsigned)param_max_stores_to_track))
-   terminate_and_process_chain (*e);
+   ret |= terminate_and_process_chain (*e);
  else
{
  n_stores += (*e)->m_store_info.length ();
diff --git a/gcc/testsuite/g++.dg/pr99165.C b/gcc/testsuite/g++.dg/pr99165.C
new file mode 100644
index 000..70ffd0345cf
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr99165.C
@@ -0,0 +1,7 @@
+// { dg-do compile }
+// { dg-options "-O2 -fnon-call-exceptions --param=max-stores-to-track=2" }
+
+struct A {
+  A() : i() {}
+  int i;
+} *ap2 = new A[3];
-- 
2.26.2


Re: [PATCH] Fix ICE in tree_inlinable_function_p.

2021-02-22 Thread Richard Biener via Gcc-patches
On Sat, Feb 20, 2021 at 11:49 AM Martin Liška  wrote:
>
> After g:1a2a7096e5e20d736c6138179470b21aa5a74864 we forbid inlining
> for a VLA types. What we miss is setting inline_forbidden_reason
> variable.
>
> Fixes:
>
> ./xgcc -B. -O3 -c 
> /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/pr99122-2.c -Winline
>
> during GIMPLE pass: local-fnsummary
> /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/pr99122-2.c: In function 
> ‘foo’:
> /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/pr99122-2.c:21:1: internal 
> compiler error: Segmentation fault
> 21 | }
>| ^
> 0xe8b2ca crash_signal
> /home/marxin/Programming/gcc/gcc/toplev.c:327
> 0x1a92733 pp_format(pretty_printer*, text_info*)
> /home/marxin/Programming/gcc/gcc/pretty-print.c:1096
> 0x1a76b90 diagnostic_report_diagnostic(diagnostic_context*, diagnostic_info*)
> /home/marxin/Programming/gcc/gcc/diagnostic.c:1244
> 0x1a79994 diagnostic_impl
> /home/marxin/Programming/gcc/gcc/diagnostic.c:1406
> 0x1a79994 warning(int, char const*, ...)
> /home/marxin/Programming/gcc/gcc/diagnostic.c:1527
> 0xf1bb16 tree_inlinable_function_p(tree_node*)
> /home/marxin/Programming/gcc/gcc/tree-inline.c:4123
> 0xc3f1c5 compute_fn_summary(cgraph_node*, bool)
> /home/marxin/Programming/gcc/gcc/ipa-fnsummary.c:3110
> 0xc3f937 compute_fn_summary_for_current
> /home/marxin/Programming/gcc/gcc/ipa-fnsummary.c:3160
> 0xc3f937 execute
> /home/marxin/Programming/gcc/gcc/ipa-fnsummary.c:4768
>
> Ready to be installed?

OK.

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> * tree-inline.c (inline_forbidden_p): Set
> inline_forbidden_reason.
> ---
>   gcc/tree-inline.c | 14 --
>   1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
> index c993b1fee8a..1dcb31c0267 100644
> --- a/gcc/tree-inline.c
> +++ b/gcc/tree-inline.c
> @@ -4027,10 +4027,20 @@ inline_forbidden_p (tree fndecl)
>the caller.  */
> if (COMPLETE_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl)))
> && !poly_int_tree_p (TYPE_SIZE (TREE_TYPE (TREE_TYPE (fndecl)
> -return true;
> +{
> +  inline_forbidden_reason
> +   = G_("function %q+F can never be inlined because "
> +"it has a VLA return argument");
> +  return true;
> +}
> for (tree parm = DECL_ARGUMENTS (fndecl); parm; parm = DECL_CHAIN (parm))
>   if (!poly_int_tree_p (DECL_SIZE (parm)))
> -  return true;
> +  {
> +   inline_forbidden_reason
> + = G_("function %q+F can never be inlined because "
> +  "it has a VLA argument");
> +   return true;
> +  }
>
> FOR_EACH_BB_FN (bb, fun)
>   {
> --
> 2.30.1
>


[PATCH] dump SLP subgraph before costing

2021-02-22 Thread Richard Biener
This adds another dump of the SLP subgraph we're throwing at costing.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-02-22  Richard Biener  

* tree-vect-slp.c (vect_bb_vectorization_profitable_p): Dump
costed subgraph.
---
 gcc/tree-vect-slp.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index ea8a97b01c6..c521c34a83b 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -4366,6 +4366,15 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo,
   unsigned int vec_inside_cost = 0, vec_outside_cost = 0, scalar_cost = 0;
   unsigned int vec_prologue_cost = 0, vec_epilogue_cost = 0;
 
+  if (dump_enabled_p ())
+{
+  dump_printf_loc (MSG_NOTE, vect_location, "Costing subgraph: \n");
+  hash_set visited;
+  FOR_EACH_VEC_ELT (slp_instances, i, instance)
+   vect_print_slp_graph (MSG_NOTE, vect_location,
+ SLP_INSTANCE_TREE (instance), visited);
+}
+
   /* Calculate scalar cost and sum the cost for the vector stmts
  previously collected.  */
   stmt_vector_for_cost scalar_costs = vNULL;
-- 
2.26.2


Re: [PR94092] Re: [RFC] test builtin ratio for loop distribution

2021-02-22 Thread Richard Biener via Gcc-patches
On Fri, Feb 19, 2021 at 9:08 AM Alexandre Oliva  wrote:
>
> Here's an improved version of the patch.  Regstrapped on
> x86_64-linux-gnu, with and without a patchlet that moved multi-pieces
> ahead of setmem, and also tested with riscv32-elf.
>
> Is it ok to install?  Or should it wait for stage1?

It generally looks OK but I'd wait for stage1.  I'd also like to see
comments from others.  Note that I think we need to guard
the loop emitting on optimize_*_for_speed/!size.  It's also not entirely
clear to me how the code avoids expanding all > 1 byte block size
memsets to a loop (thus bypassing more optimal libc implementations
for larger sizes).  I also remember that we sometimes do a dynamic
dispatch between inlined and non-inlined code paths, though that might
be target specific handling in the x86 backend.

Thanks,
Richard.

> [PR94092] introduce try store by multiple pieces
>
> From: Alexandre Oliva 
>
> The ldist pass turns even very short loops into memset calls.  E.g.,
> the TFmode emulation calls end with a loop of up to 3 iterations, to
> zero out trailing words, and the loop distribution pass turns them
> into calls of the memset builtin.
>
> Though short constant-length clearing memsets are usually dealt with
> efficiently, for non-constant-length ones, the options are setmemM, or
> a function calls.
>
> RISC-V doesn't have any setmemM pattern, so the loops above end up
> "optimized" into memset calls, incurring not only the overhead of an
> explicit call, but also discarding the information the compiler has
> about the alignment of the destination, and that the length is a
> multiple of the word alignment.
>
> This patch handles variable lengths with multiple conditional
> power-of-2-constant-sized stores-by-pieces, so as to reduce the
> overhead of length compares.
>
> It also changes the last copy-prop pass into ccp, so that pointer
> alignment and length's nonzero bits are detected and made available
> for the expander, even for ldist-introduced SSA_NAMEs.
>
>
> for  gcc/ChangeLog
>
> PR tree-optimization/94092
> * builtins.c (try_store_by_multiple_pieces): New.
> (expand_builtin_memset_args): Use it.  If target_char_cast
> fails, proceed as for non-constant val.  Pass len's ctz to...
> * expr.c (clear_storage_hints): ... this.  Try store by
> multiple pieces after setmem.
> (clear_storage): Adjust.
> * expr.h (clear_storage_hints): Likewise.
> (try_store_by_multiple_pieces): Declare.
> * passes.def: Replace the last copy_prop to ccp.
> ---
>  gcc/builtins.c |  182 
> ++--
>  gcc/expr.c |9 ++-
>  gcc/expr.h |   13 
>  gcc/passes.def |5 +-
>  4 files changed, 197 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 0aed008687ccc..05b14f2a3f6d3 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -6600,6 +6600,166 @@ expand_builtin_memset (tree exp, rtx target, 
> machine_mode mode)
>return expand_builtin_memset_args (dest, val, len, target, mode, exp);
>  }
>
> +/* Try to store VAL (or, if NULL_RTX, VALC) in LEN bytes starting at TO.
> +   Return TRUE if successful, FALSE otherwise.  TO is assumed to be
> +   aligned at an ALIGN-bits boundary.  LEN must be a multiple of
> +   1< +
> +   The strategy is to issue one store_by_pieces for each power of two,
> +   from most to least significant, guarded by a test on whether there
> +   are at least that many bytes left to copy in LEN.
> +
> +   ??? Should we skip some powers of two in favor of loops?  Maybe start
> +   at the max of TO/LEN/word alignment, at least when optimizing for
> +   size, instead of ensuring O(log len) dynamic compares?  */
> +
> +bool
> +try_store_by_multiple_pieces (rtx to, rtx len, unsigned int ctz_len,
> + unsigned HOST_WIDE_INT min_len,
> + unsigned HOST_WIDE_INT max_len,
> + rtx val, char valc, unsigned int align)
> +{
> +  int max_bits = floor_log2 (max_len);
> +  int min_bits = floor_log2 (min_len);
> +  int sctz_len = ctz_len;
> +
> +  gcc_checking_assert (sctz_len >= 0);
> +
> +  if (val)
> +valc = 1;
> +
> +  /* Bits more significant than TST_BITS are part of the shared prefix
> + in the binary representation of both min_len and max_len.  Since
> + they're identical, we don't need to test them in the loop.  */
> +  int tst_bits = (max_bits != min_bits ? max_bits
> + : floor_log2 (max_len ^ min_len));
> +
> +  /* Check whether it's profitable to start by storing a fixed BLKSIZE
> + bytes, to lower max_bits.  In the unlikely case of a constant LEN
> + (implied by identical MAX_LEN and MIN_LEN), we want to issue a
> + single store_by_pieces, but otherwise, select the minimum multiple
> + of the ALIGN (in bytes) and of the MCD of the possible LENs, that
> + brings MAX_LEN below TST_BITS, if that's

Re: [PATCH][PR98791]: IRA: Make sure allocno copy mode's are ordered

2021-02-22 Thread Alex Coplan via Gcc-patches
Hi Andre,

Thanks for fixing this.

On 19/02/2021 10:53, Andre Vieira (lists) via Gcc-patches wrote:
> Hi,
> 
> This patch makes sure that allocno copies are not created for unordered
> modes. The testcases in the PR highlighted a case where an allocno copy was
> being created for:
> (insn 121 120 123 11 (parallel [
>     (set (reg:VNx2QI 217)
>     (vec_duplicate:VNx2QI (subreg/s/v:QI (reg:SI 93 [ _2 ]) 0)))
>     (clobber (scratch:VNx16BI))
>     ]) 4750 {*vec_duplicatevnx2qi_reg}
>  (expr_list:REG_DEAD (reg:SI 93 [ _2 ])
>     (nil)))
> 
> As the compiler detected that the vec_duplicate_reg pattern allowed
> the input and output operand to be of the same register class, it tried to
> create an allocno copy for these two operands, stripping subregs in the
> process. However, this meant that the copy was between VNx2QI and SI, which
> have unordered mode precisions.
> 
> So at compile time we do not know which of the two modes is smaller which is
> a requirement when updating allocno copy costs.
> 
> Regression tested on aarch64-linux-gnu.
> 
> Is this OK for trunk (and after a week backport to gcc-10) ?
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c
> new file mode 100644
> index 
> ..ee0c7b51602cacd45f9e33acecb1eaa9f9edebf2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c
> @@ -0,0 +1,12 @@
> +/* PR rtl-optimization/98791  */
> +/* { dg-do compile } */
> +/* { dg-options "-O -ftree-vectorize --param=aarch64-autovec-preference=3" } 
> */
> +extern char a[], b[];
> +short c, d;
> +long *e;
> +void f() {
> +  for (int g; g < c; g += 1) {
> +a[g] = d;
> +b[g] = e[g];
> +  }
> +}

For the testcase, you might want to use the one I posted most recently:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98791#c3
which avoids the dependency on the aarch64-autovec-preference param
(which is in GCC 11 only) as this will simplify backporting.

But if it's preferable to have a testcase without SVE intrinsics for GCC
11 then we should stick with that.

Cheers,
Alex


Re: [PATCH] gcov: use mmap pools for KVP.

2021-02-22 Thread Martin Liška

PING^3

On 2/9/21 9:37 AM, Martin Liška wrote:

PING^2

@Honza: ?

On 1/29/21 2:33 PM, Martin Liška wrote:

PING^1

On 1/25/21 1:51 PM, Martin Liška wrote:

On 1/22/21 3:33 PM, Jan Hubicka wrote:

It is definitly doable (gcov machinery is quite flexible WRT having more
types of counters).


Yes, that would introduce back the dropped TOPN counters which I intentionally
dropped.


We could bring back topn counters or the easier dominating vlaue ones
and add command line option.  However perhaps better in long term would
be keep adding mmap ifdefs for targets where it is important...

Kernel guys seems to be getting on profile feedback with clang, so we
should keep in mind posibility of supporting that as well.


Sure, that should not be so difficult. They should handle it in their kernel
run-time.




  We could however also ask users to either resort to
-fno-profile-values or implement mmap equivalent ifdefs to libgcov if
they really care about malloc profiling.


Seems reasonable to me.

Well, the current approach makes some assumptions on mmap (and malloc), but
they seem reasonable to me. I don't expect anybody building Mingw Firefox PGO 
build,
it's likely unsupported configuration.


It is possible to build Firefox with mingw on windows and I would
expected feedback to work.
https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Build_Instructions/Compiling_Mozilla_With_Mingw
However this is not only about Firefox - we notice problems with Firefox
since it is only real word app where we look at PGO more systematically.
But we definitly should aim for PGO to be useful for apps of similar
complexity.  It would be nice to incrase this testing coverage.


I see.





Another observation about the tcmalloc 'malloc' implementation. It hangs in a 
PGO build,
but libgcov would be happy if NULL would be returned.


Yep, I would expect other folks to try to PGO optimize malloc
implementations an we could run into variety of issues...


Ok, can we go with the suggested mmap patch for now?

Thanks,
Martin



Honza


Martin



So personally I do not see this as a must feature (and I think Martin
was really looking forward to drop the former topn profile
implementation :)

Another intersting case would be, of course, profiling of kernel...

Honza













[Patch] Fortran/OpenMP: Fix optional dummy procedures [PR99171]

2021-02-22 Thread Tobias Burnus

Normal dummy arguments get some additional redirection if they are
OPTIONAL; however, that's not the case for dummy procedures.

That was shown by a simple 'procedure(), optional :: proc' example
in the PR. – The fix is as simple.

However, I thought it still makes sense to test all combinations of
procedure pointer (incl. c_funptr) with optional and pointer...

OK for mainline and GCC 10 (it is a 10/11 regression)?

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
Fortran/OpenMP: Fix optional dummy procedures [PR99171]

gcc/fortran/ChangeLog:

	PR fortran/99171
	* trans-openmp.c (gfc_omp_is_optional_argument): Regard optional
	dummy procs as nonoptional as no special treatment is needed.

libgomp/ChangeLog:

	PR fortran/99171
	* testsuite/libgomp.fortran/dummy-procs-1.f90: New test.

 gcc/fortran/trans-openmp.c |   5 +-
 .../testsuite/libgomp.fortran/dummy-procs-1.f90| 393 +
 2 files changed, 397 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 67e370f8b57..349df1cc346 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -64,7 +64,9 @@ gfc_omp_is_allocatable_or_ptr (const_tree decl)
 /* True if the argument is an optional argument; except that false is also
returned for arguments with the value attribute (nonpointers) and for
assumed-shape variables (decl is a local variable containing arg->data).
-   Note that pvoid_type_node is for 'type(c_ptr), value.  */
+   Note that for 'procedure(), optional' the value false is used as that's
+   always a pointer and no additional indirection is used.
+   Note that pvoid_type_node is for 'type(c_ptr), value' (and c_funloc).  */
 
 static bool
 gfc_omp_is_optional_argument (const_tree decl)
@@ -73,6 +75,7 @@ gfc_omp_is_optional_argument (const_tree decl)
 	  && DECL_LANG_SPECIFIC (decl)
 	  && TREE_CODE (TREE_TYPE (decl)) == POINTER_TYPE
 	  && !VOID_TYPE_P (TREE_TYPE (TREE_TYPE (decl)))
+	  && TREE_CODE (TREE_TYPE (TREE_TYPE (decl))) != FUNCTION_TYPE
 	  && GFC_DECL_OPTIONAL_ARGUMENT (decl));
 }
 
diff --git a/libgomp/testsuite/libgomp.fortran/dummy-procs-1.f90 b/libgomp/testsuite/libgomp.fortran/dummy-procs-1.f90
new file mode 100644
index 000..fcb17ce69a9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/dummy-procs-1.f90
@@ -0,0 +1,393 @@
+! { dg-do run }
+!
+! PR fortran/99171
+!
+! Check dummy procedure arguments, especially optional ones
+!
+module m
+  use iso_c_binding
+  implicit none (type, external)
+  integer :: cnt
+  integer :: cnt2
+contains
+  subroutine proc()
+cnt = cnt + 1
+  end subroutine
+
+  subroutine proc2()
+cnt2 = cnt2 + 1
+  end subroutine
+
+  subroutine check(my_proc)
+procedure(proc) :: my_proc
+cnt = 42
+call my_proc()
+if (cnt /= 43) stop 1
+
+!$omp parallel
+  call my_proc()
+!$omp end parallel
+if (cnt <= 43) stop 2 
+  end
+
+  subroutine check_opt(my_proc)
+procedure(proc), optional :: my_proc
+logical :: is_present
+is_present = present(my_proc)
+cnt = 55
+if (present (my_proc)) then
+  call my_proc()
+  if (cnt /= 56) stop 3
+endif
+
+!$omp parallel
+  if (is_present .neqv. present (my_proc)) stop 4
+  if (present (my_proc)) then
+call my_proc()
+if (cnt <= 56) stop 5
+  end if
+!$omp end parallel
+if (is_present) then
+  if (cnt <= 56) stop 6
+else if (cnt /= 55) then
+  stop 7
+end if
+  end
+
+  subroutine check_ptr(my_proc)
+procedure(proc), pointer :: my_proc
+logical :: is_assoc
+integer :: mycnt
+is_assoc = associated (my_proc)
+
+cnt = 10
+cnt2 = 20
+if (associated (my_proc)) then
+  call my_proc()
+  if (cnt /= 11 .or. cnt2 /= 20) stop 8
+endif
+
+!$omp parallel
+  if (is_assoc .neqv. associated (my_proc)) stop 9
+  if (associated (my_proc)) then
+if (.not. associated (my_proc, proc)) stop 10
+call my_proc()
+if (cnt <= 11 .or. cnt2 /= 20) stop 11
+  else if (cnt /= 10 .or. cnt2 /= 20) then
+stop 12
+  end if
+!$omp end parallel
+if (is_assoc .neqv. associated (my_proc)) stop 13
+if (associated (my_proc)) then
+  if (cnt <= 11 .or. cnt2 /= 20) stop 14
+else if (is_assoc .and. (cnt /= 11 .or. cnt2 /= 20)) then
+  stop 15
+end if
+
+cnt = 30
+cnt2 = 40
+mycnt = 0
+!$omp parallel shared(mycnt)
+  !$omp critical
+ my_proc => proc2
+ if (.not.associated (my_proc, proc2)) stop 17
+ mycnt = mycnt + 1
+ call my_proc()
+ if (cnt2 /= 40 + mycnt .or. cnt /= 30) stop 18
+  !$omp end critical
+!$omp end parallel
+if (.not.associated (my_proc, proc2)) stop 19
+if (cnt2 /= 40 + mycnt .or. cnt /= 30) stop 20
+  end
+
+  subroutine check_ptr_o

Re: [Patch] Fortran/OpenMP: Fix optional dummy procedures [PR99171]

2021-02-22 Thread Jakub Jelinek via Gcc-patches
On Mon, Feb 22, 2021 at 01:06:56PM +0100, Tobias Burnus wrote:
> Normal dummy arguments get some additional redirection if they are
> OPTIONAL; however, that's not the case for dummy procedures.
> 
> That was shown by a simple 'procedure(), optional :: proc' example
> in the PR. – The fix is as simple.
> 
> However, I thought it still makes sense to test all combinations of
> procedure pointer (incl. c_funptr) with optional and pointer...
> 
> OK for mainline and GCC 10 (it is a 10/11 regression)?

Ok, thanks.

> gcc/fortran/ChangeLog:
> 
>   PR fortran/99171
>   * trans-openmp.c (gfc_omp_is_optional_argument): Regard optional
>   dummy procs as nonoptional as no special treatment is needed.
> 
> libgomp/ChangeLog:
> 
>   PR fortran/99171
>   * testsuite/libgomp.fortran/dummy-procs-1.f90: New test.
> 
>  gcc/fortran/trans-openmp.c |   5 +-
>  .../testsuite/libgomp.fortran/dummy-procs-1.f90| 393 
> +
>  2 files changed, 397 insertions(+), 1 deletion(-)

Jakub



Re: [PATCH v2 2/2] ada: add 128bit operation to MIPS N32 and N64

2021-02-22 Thread YunQiang Su
Arnaud Charlet  于2021年2月18日周四 下午3:38写道:
>
> > For MIPS N64 and N32:
> >   add GNATRTL_128BIT_PAIRS to LIBGNAT_TARGET_PAIRS
> >   add GNATRTL_128BIT_OBJS to EXTRA_GNATRTL_NONTASKING_OBJS
> >
> > gcc/ada/ChangeLog:
> >   PR ada/98996
> >   * Makefile.rtl (LIBGNAT_TARGET_PAIRS, EXTRA_GNATRTL_NONTASKING_OBJS)
> >   : add 128Bit operation file to MIPS N64 and N32.
>
> Note that the ChangeLog is generated automatically these days.
>
> The change is OK, thanks.

can you help me to commit this patchset?

>
> > ---
> >  gcc/ada/ChangeLog|  6 ++
> >  gcc/ada/Makefile.rtl | 12 
> >  2 files changed, 18 insertions(+)
> >
> > diff --git a/gcc/ada/ChangeLog b/gcc/ada/ChangeLog
> > index 52faefaa2ae..3565a32c5ac 100644
> > --- a/gcc/ada/ChangeLog
> > +++ b/gcc/ada/ChangeLog
> > @@ -1,3 +1,9 @@
> > +2021-02-18  YunQiang Su  
> > +
> > + PR ada/98996
> > + * Makefile.rtl (LIBGNAT_TARGET_PAIRS, EXTRA_GNATRTL_NONTASKING_OBJS)
> > + : add 128Bit operation file to MIPS N64 and N32.
> > +
> >  2021-02-12  Arnaud Charlet  
> >
> >   * repinfo.ads, repinfo.adb (*SO_Ref*): Restore.
> > diff --git a/gcc/ada/Makefile.rtl b/gcc/ada/Makefile.rtl
> > index 35faf13ea46..d86eb8acbf3 100644
> > --- a/gcc/ada/Makefile.rtl
> > +++ b/gcc/ada/Makefile.rtl
> > @@ -2311,6 +2311,18 @@ ifeq ($(strip $(filter-out mips% 
> > linux%,$(target_cpu) $(target_os))),)
> >s-tpopsp.adb >system.ads >
> > +  ifeq ($(strip $(filter-out mips64% mipsisa64%,$(target_cpu))),)
> > +ifneq ($(strip $(MULTISUBDIR)),/32)
> > +  LIBGNAT_TARGET_PAIRS += $(GNATRTL_128BIT_PAIRS)
> > +  EXTRA_GNATRTL_NONTASKING_OBJS += $(GNATRTL_128BIT_OBJS)
> > +endif
> > +  else
> > +ifeq ($(strip $(filter-out /64 /n32,$(MULTISUBDIR))),)
> > +  LIBGNAT_TARGET_PAIRS += $(GNATRTL_128BIT_PAIRS)
> > +  EXTRA_GNATRTL_NONTASKING_OBJS += $(GNATRTL_128BIT_OBJS)
> > +endif
> > +  endif
> > +
> >TOOLS_TARGET_PAIRS = indepsw.adb >
> >EXTRA_GNATRTL_TASKING_OBJS=s-linux.o
> > --
> > 2.20.1


Re: [PATCH][PR98791]: IRA: Make sure allocno copy mode's are ordered

2021-02-22 Thread Andre Vieira (lists) via Gcc-patches

Hi Alex,

On 22/02/2021 10:20, Alex Coplan wrote:

For the testcase, you might want to use the one I posted most recently:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98791#c3
which avoids the dependency on the aarch64-autovec-preference param
(which is in GCC 11 only) as this will simplify backporting.

But if it's preferable to have a testcase without SVE intrinsics for GCC
11 then we should stick with that.
I don't see any problem with having SVE intrinsics in the testcase, 
committed with your other test as I agree it makes the backport easier 
eventually.


Thanks for pointing that out.
diff --git a/gcc/ira-conflicts.c b/gcc/ira-conflicts.c
index 
2c2234734c3166872d94d94c5960045cb89ff2a8..d83cfc1c1a708ba04f5e01a395721540e31173f0
 100644
--- a/gcc/ira-conflicts.c
+++ b/gcc/ira-conflicts.c
@@ -275,7 +275,10 @@ process_regs_for_copy (rtx reg1, rtx reg2, bool 
constraint_p,
   ira_allocno_t a1 = ira_curr_regno_allocno_map[REGNO (reg1)];
   ira_allocno_t a2 = ira_curr_regno_allocno_map[REGNO (reg2)];
 
-  if (!allocnos_conflict_for_copy_p (a1, a2) && offset1 == offset2)
+  if (!allocnos_conflict_for_copy_p (a1, a2)
+ && offset1 == offset2
+ && ordered_p (GET_MODE_PRECISION (ALLOCNO_MODE (a1)),
+   GET_MODE_PRECISION (ALLOCNO_MODE (a2
{
  cp = ira_add_allocno_copy (a1, a2, freq, constraint_p, insn,
 ira_curr_loop_tree_node);
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c 
b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c
new file mode 100644
index 
..cc1f1831afb68ba70016cbe26f8f9273cfceafa8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c
@@ -0,0 +1,12 @@
+/* PR rtl-optimization/98791  */
+/* { dg-do compile } */
+/* { dg-options "-O -ftree-vectorize" } */
+#include 
+extern char a[11];
+extern long b[];
+void f() {
+  for (int d; d < 10; d++) {
+a[d] = svaddv(svptrue_b8(), svdup_u8(0));
+b[d] = 0;
+  }
+}


Re: [WIP] Re: [PATCH] openmp: Fix intermittent hanging of task-detach-6 libgomp tests [PR98738]

2021-02-22 Thread Jakub Jelinek via Gcc-patches
On Fri, Feb 19, 2021 at 07:12:42PM +, Kwok Cheung Yeung wrote:
> I have opted for a union of completion_sem (for tasks that are undeferred)
> and a struct gomp_team *detach_team (for deferred tasks) that holds the team
> if the completion event has not yet fulfilled, or NULL if is it. I don't see
> the point of having an indirection to the union here since the union is just
> the size of a pointer, so it might as well be inlined.

I see three issues with the union of completion_sem and detach_team done
that way.

1) while linux --enable-futex and accel gomp_sem_t is small (int), rtems
   and especially posix gomp_sem_t is large; so while it might be a good
   idea to inline gomp_sem_t on config/{linux,accel} into the union, for
   the rest it might be better to use indirection; if it is only for the
   undeferred tasks, it could be just using an automatic variable and
   put into the struct address of that; could be done either always,
   or define some macro in config/{linux,accel}/sem.h that gomp_sem_t is
   small and decide on the indirection based on that macro
2) kind == GOMP_TASK_UNDEFERRED is true also for the deferred tasks while
   running the cpyfn callback; guess this could be dealt with making sure
   the detach handling is done only after
  thr->task = task;
  if (cpyfn)
{
  cpyfn (arg, data);
  task->copy_ctors_done = true;
}
  else
memcpy (arg, data, arg_size);
  thr->task = parent;
  task->kind = GOMP_TASK_WAITING;
  task->fn = fn;
  task->fn_data = arg;
  task->final_task = (flags & GOMP_TASK_FLAG_FINAL) >> 1;
   I see you've instead removed the GOMP_TASK_UNDEFERRED but the rationale
   for that is that the copy constructors are being run synchronously
3) kind is not constant, for the deferred tasks it can change over the
   lifetime of the task, as you've said in the comments, it is kind ==
   GOMP_TASK_UNDEFERRED vs. other values; while the changes of task->kind
   are done while holding the task lock, omp_fulfill_event reads it before
   locking that lock, so I think it needs to be done using
   if (__atomic_load_n (&task->kind, MEMMODEL_RELAXED) == GOMP_TASK_UNDEFERRED)
   Pedantically the stores to task->kind also need to be done
   with __atomic_store_n MEMMODEL_RELAXED.

Now, similarly for 3) on task->kind, task->detach_team is similar case,
again, some other omp_fulfill_event can clear it (under lock, but still read
outside of the lock), so it
probably should be read with
  struct gomp_team *team
= __atomic_load_n (&task->detach_team, MEMMODEL_RELAXED);
And again, pedantically the detach_team stores should be atomic relaxed
stores too.  

> > Do you agree with this, or see some reason why this can't work?
> 
> The main problem I see is this code in gomp_barrier_handle_tasks:
> 
> if (--team->task_count == 0
> && gomp_team_barrier_waiting_for_tasks (&team->barrier))
>   {
> gomp_team_barrier_done (&team->barrier, state);
> 
> We do not have access to state from within omp_fulfill_event, so how should
> this be handled?

Sure, omp_fulfill_event shouldn't do any waiting, it needs to awake anything
that could have been waiting.

> @@ -688,8 +697,7 @@ struct gomp_team
>int work_share_cancelled;
>int team_cancelled;
>  
> -  /* Tasks waiting for their completion event to be fulfilled.  */
> -  struct priority_queue task_detach_queue;
> +  /* Number of tasks waiting for their completion event to be fulfilled.  */
>unsigned int task_detach_count;

Do we need task_detach_count?  Currently it is only initialized and
incremented/decremented, but never tested for anything.
Though see below.

> +  gomp_debug (0, "omp_fulfill_event: %p event fulfilled for finished task\n",
> +   task);
> +  size_t new_tasks = gomp_task_run_post_handle_depend (task, team);
> +  gomp_task_run_post_remove_parent (task);
> +  gomp_clear_parent (&task->children_queue);
> +  gomp_task_run_post_remove_taskgroup (task);
> +  team->task_count--;
> +  team->task_detach_count--;
> +
> +  /* Wake up any threads that may be waiting for the detached task
> + to complete.  */
> +  struct gomp_task *parent = task->parent;
> +
> +  if (parent && parent->taskwait)
> +{
> +  if (parent->taskwait->in_taskwait)
> + {
> +   parent->taskwait->in_taskwait = false;
> +   gomp_sem_post (&parent->taskwait->taskwait_sem);
> + }
> +  else if (parent->taskwait->in_depend_wait)
> + {
> +   parent->taskwait->in_depend_wait = false;
> +   gomp_sem_post (&parent->taskwait->taskwait_sem);
> + }
> +}

Looking at gomp_task_run_post_remove_parent, doesn't that function
already handle the in_taskwait and in_depend_wait gomp_sem_posts?

> +  if (task->taskgroup && task->taskgroup->in_taskgroup_wait)
> +{
> +  task->taskgroup->in_taskgroup_wait = false;
> +  gomp_sem_post (&task->taskgroup->taskgroup_sem);
> +}

And into gomp_task_run_post_remove_t

Re: [PATCH] doc: c: c++: Document the C/C++ extended asm empty input constraints

2021-02-22 Thread Jonathan Wakely via Gcc-patches

On 15/02/21 23:22 +, Neven Sajko wrote:

There is a long-standing, but undocumented GCC inline assembly feature
that's part of the extended asm GCC extension to C and C++: extended
asm empty input constraints.

Although I don't really use extended asm much, and I never contributed
to GCC before; I tried to document the feature as far as I understand
it. I ran make html to check that the changed Texinfo is well formed.

FTR, empty input constraints have been mentioned on the GCC mailing
lists, e.g.:
https://gcc.gnu.org/pipermail/gcc-help/2015-June/124410.html

I release this contribution into the public domain.

Neven Sajko

gcc/ChangeLog:

   * doc/md.texi: Document extended asm empty input constraints

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index e3686dbfe..deccfd38a 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -1131,7 +1131,102 @@ the addressing register.
@subsection Simple Constraints
@cindex simple constraints

-The simplest kind of constraint is a string full of letters, each of
+An input constraint is allowed to be an empty string, in which case it is
+called an empty input constraint. (When an empty input constraint is used,
+the assembler template will most probably also be empty. I.e., the @code{asm}
+declaration need not contain actual assembly code.) An empty input


This parenthesis seems to be important to the next sentence, so should
it really be in parentheses?

Also, it's an asm statement, not asm declaration, isn't it?

And we don't seem to document that an empty assembler tempalte is
allowed, except in passing here.


+constraint can be used to create an artificial dependency on a C or C++


Is it redundant to say "C or C++ variable"? What other kind of
variable could it be talking about?


+variable (the variable that appears in the expression associated with the
+constraint) without incurring unnecessary costs to performance.


I think this whole paragraph (in fact all the new text) is quite
verbose and could be much more succint.


+An input constraint is allowed to be an empty string, in which case it is
+called an empty input constraint. (When an empty input constraint is used,
+the assembler template will most probably also be empty. I.e., the @code{asm}
+declaration need not contain actual assembly code.) An empty input
+constraint can be used to create an artificial dependency on a C or C++
+variable (the variable that appears in the expression associated with the
+constraint) without incurring unnecessary costs to performance.


Also, does the discussion of empty input constraints really belong
first? Isn't it a special case, and so should be documented after the
more common case? And do we really want to spend as long talking about
this special case as we do about all the other types of input
constraint combined?


+An example of where such behavior may be useful is for preventing compiler


"This can be useful to prevent compiler optimizations like dead store
elimination and hoisting code outside loops. Specific applications may
include ..."


+optimizations like dead store elimination or hoisting code outside a loop for
+certain pieces of C or C++ code. Specific applications may include direct
+interaction with hardware features; or things like testing, fuzzing and
+benchmarking.
+
+Here's a simple C++20 program that is not useful in practice but demonstrates
+relevant behavior; store it as a file called asm.cc:
+
+@verbatim
+#include 
+
+int
+main() {
+// Greater than or equal to zero.
+constexpr int asmV = ASM_V;
+
+// The exact content of v is irrelevant for
+// this example.
+std::vector v{7, 6, 9, 3, 2, 0};


This is ill-formed because int -> char is a narrowing conversion.
Please use valid C++ for the example.



+
+for (int i{0}; i < (1 << 28); i++) {
+for (int j{0}; j < 6; j++) {


I don't think the brace initialization helps the example. Consider
that many readers of the GCC manual are C programmers. Why use a
feature specific to C++ when you could just say "int i = 0;" so
everybody understands it?


+// The exact operation on the contents
+// of v is not relevant for this
+// example.
+v[j]++;
+
+if constexpr (1 <= asmV) {


I don't think the Yoda Conditions help the example.


+asm volatile ("" :: ""(v.size()));
+for (auto x: v) {
+asm volatile ("" :: ""(x));
+}
+}
+if constexpr (2 <= asmV) {
+asm volatile ("" :: ""(v.size()));
+for (auto x: v) {
+asm volatile ("" :: ""(x));
+}
+}
+if constexpr (3 <= asmV) {
+asm volatile ("" :: ""(v.size()));
+for (auto x: v) {
+asm volatile ("" :: ""(x));
+}
+}
+}
+}
+
+return 0;


A return statement is not needed for main, so removing it k

c++: cross-header-unit template definitions [PR 99153]

2021-02-22 Thread Nathan Sidwell
A member function can be defined in a different	header-file than the one 
defining the class.  In such situations we must unmark the decl as 
imported.  When the entity is a template we failed to unmark the 
template_decl.


Perhaps	the duplication	of these flags on the template_decl from the 
underlying decl is an error.  I	set on the fence about it for a	long 
time during development, but I don't think now is the time to change 
that (barring catastrophic bugs).


PR c++/99153
gcc/cp/
* decl.c (duplicate_decls): Move DECL_MODULE_IMPORT_P propagation
to common-path.
* module.cc (set_defining_module): Add assert.
gcc/testsuite/
* g++.dg/modules/pr99153_a.H: New.
* g++.dg/modules/pr99153_b.H: New.

--
Nathan Sidwell
diff --git c/gcc/cp/decl.c w/gcc/cp/decl.c
index 6f3414f058e..7fa8f52d667 100644
--- c/gcc/cp/decl.c
+++ w/gcc/cp/decl.c
@@ -2879,19 +2879,6 @@ duplicate_decls (tree newdecl, tree olddecl, bool hiding, bool was_hidden)
 	  (char *) newdecl + sizeof (struct tree_common),
 	  sizeof (struct tree_decl_common) - sizeof (struct tree_common));
 
-  if (DECL_LANG_SPECIFIC (olddecl) && DECL_TEMPLATE_INFO (olddecl))
-	{
-	  /* Repropagate the module information to the template.  */
-	  tree tmpl = DECL_TI_TEMPLATE (olddecl);
-
-	  if (DECL_TEMPLATE_RESULT (tmpl) == olddecl)
-	{
-	  DECL_MODULE_PURVIEW_P (tmpl) = DECL_MODULE_PURVIEW_P (olddecl);
-	  gcc_checking_assert (!DECL_MODULE_IMPORT_P (olddecl));
-	  DECL_MODULE_IMPORT_P (tmpl) = false;
-	}
-	}
-
   switch (TREE_CODE (newdecl))
 	{
 	case LABEL_DECL:
@@ -2925,6 +2912,19 @@ duplicate_decls (tree newdecl, tree olddecl, bool hiding, bool was_hidden)
 	}
 }
 
+  if (DECL_LANG_SPECIFIC (olddecl) && DECL_TEMPLATE_INFO (olddecl))
+{
+  /* Repropagate the module information to the template.  */
+  tree tmpl = DECL_TI_TEMPLATE (olddecl);
+
+  if (DECL_TEMPLATE_RESULT (tmpl) == olddecl)
+	{
+	  DECL_MODULE_PURVIEW_P (tmpl) = DECL_MODULE_PURVIEW_P (olddecl);
+	  gcc_checking_assert (!DECL_MODULE_IMPORT_P (olddecl));
+	  DECL_MODULE_IMPORT_P (tmpl) = false;
+	}
+}
+
   if (VAR_OR_FUNCTION_DECL_P (newdecl))
 {
   if (DECL_EXTERNAL (olddecl)
diff --git c/gcc/cp/module.cc w/gcc/cp/module.cc
index 3d17b8ddcdb..7a40be3db35 100644
--- c/gcc/cp/module.cc
+++ w/gcc/cp/module.cc
@@ -18516,6 +18516,7 @@ set_defining_module (tree decl)
 		  gcc_checking_assert (!use_tpl);
 		  /* Get to the TEMPLATE_DECL.  */
 		  decl = TI_TEMPLATE (ti);
+		  gcc_checking_assert (!DECL_MODULE_IMPORT_P (decl));
 		}
 
 	  /* Record it on the class_members list.  */
diff --git c/gcc/testsuite/g++.dg/modules/pr99153_a.H w/gcc/testsuite/g++.dg/modules/pr99153_a.H
new file mode 100644
index 000..3eaa76bdc32
--- /dev/null
+++ w/gcc/testsuite/g++.dg/modules/pr99153_a.H
@@ -0,0 +1,11 @@
+// { dg-additional-options -fmodule-header }
+// { dg-module-cmi {} }
+
+template
+struct pair
+{
+  inline void Frob ();
+};
+
+template
+inline void Widget ();
diff --git c/gcc/testsuite/g++.dg/modules/pr99153_b.H w/gcc/testsuite/g++.dg/modules/pr99153_b.H
new file mode 100644
index 000..5699378d317
--- /dev/null
+++ w/gcc/testsuite/g++.dg/modules/pr99153_b.H
@@ -0,0 +1,15 @@
+// PR 99153 Mismatched flags on template and result
+// { dg-additional-options -fmodule-header }
+// { dg-module-cmi {} }
+
+import  "pr99153_a.H";
+
+template
+inline  void pair<_T1>::Frob()
+{ }
+
+
+template
+inline void Widget () 
+{
+}


c++: Recursing header imports and other duplicates [PR 99174]

2021-02-22 Thread Nathan Sidwell
The fix	for 98741 introduced two issues.  (a) recursive	header units 
iced because we tried to read the preprocessor state after having failed 
to read the config.  (b) we could have duplicate imports	of named 
modules in our pending queue, and that would lead to duplicate requests 
for pathnames, which coupled with the use of a null-pathname to indicate 
we'd asked could lead to desynchronization with the	module mapper. 
Fixed by adding a 'visited' flag to module state, so we ask exactly once.


PR c++/99174
gcc/cp
* module.cc (struct module_state): Add visited_p flag.
(name_pending_imports): Use it to avoid duplicate requests.
(preprocess_module): Don't read preprocessor state if we failed to
load a module's config.
gcc/testsuite/
* g++.dg/modules/pr99174-1_a.C: New.
* g++.dg/modules/pr99174-1_b.C: New.
* g++.dg/modules/pr99174-1_c.C: New.
* g++.dg/modules/pr99174.H: New.

--
Nathan Sidwell
diff --git c/gcc/cp/module.cc w/gcc/cp/module.cc
index 3d17b8ddcdb..09a9ca8778b 100644
--- c/gcc/cp/module.cc
+++ w/gcc/cp/module.cc
@@ -3551,9 +3551,10 @@ class GTY((chain_next ("%h.parent"), for_user)) module_state {
   bool call_init_p : 1; /* This module's global initializer needs
 			   calling.  */
   bool inform_read_p : 1; /* Inform of a read.  */
+  bool visited_p : 1;/* A walk-once flag. */
   /* Record extensions emitted or permitted.  */
   unsigned extensions : SE_BITS;
-  /* 13 bits used, 3 bits remain  */
+  /* 14 bits used, 2 bits remain  */
 
  public:
   module_state (tree name, module_state *, bool);
@@ -3787,6 +3788,7 @@ module_state::module_state (tree name, module_state *parent, bool partition)
   partition_p = partition;
 
   inform_read_p = false;
+  visited_p = false;
 
   extensions = 0;
   if (name && TREE_CODE (name) == STRING_CST)
@@ -19304,16 +19306,16 @@ name_pending_imports (cpp_reader *reader, bool at_end)
 {
   module_state *module = (*pending_imports)[ix];
   gcc_checking_assert (module->is_direct ());
-  if (!module->filename)
+  if (!module->filename
+	  && !module->visited_p
+	  && (module->is_header () || !only_headers))
 	{
-	  Cody::Flags flags
-	= (flag_preprocess_only ? Cody::Flags::None
-	   : Cody::Flags::NameOnly);
+	  module->visited_p = true;
+	  Cody::Flags flags = (flag_preprocess_only
+			   ? Cody::Flags::None : Cody::Flags::NameOnly);
 
-	  if (only_headers && !module->is_header ())
-	;
-	  else if (module->module_p
-		   && (module->is_partition () || module->exported_p))
+	  if (module->module_p
+	  && (module->is_partition () || module->exported_p))
 	mapper->ModuleExport (module->get_flatname (), flags);
 	  else
 	mapper->ModuleImport (module->get_flatname (), flags);
@@ -19325,15 +19327,13 @@ name_pending_imports (cpp_reader *reader, bool at_end)
   for (unsigned ix = 0; ix != pending_imports->length (); ix++)
 {
   module_state *module = (*pending_imports)[ix];
-  gcc_checking_assert (module->is_direct ());
-  if (only_headers && !module->is_header ())
-	;
-  else if (!module->filename)
+  if (module->visited_p)
 	{
-	  Cody::Packet const &p = *r_iter;
-	  ++r_iter;
+	  module->visited_p = false;
+	  gcc_checking_assert (!module->filename);
 
-	  module->set_filename (p);
+	  module->set_filename (*r_iter);
+	  ++r_iter;
 	}
 }
 
@@ -19443,7 +19443,8 @@ preprocess_module (module_state *module, location_t from_loc,
 	  /* Now read the preprocessor state of this particular
 	 import.  */
 	  unsigned n = dump.push (module);
-	  if (module->read_preprocessor (true))
+	  if (module->loadedness == ML_CONFIG
+	  && module->read_preprocessor (true))
 	module->import_macros ();
 	  dump.pop (n);
 
diff --git c/gcc/testsuite/g++.dg/modules/pr99174-1_a.C w/gcc/testsuite/g++.dg/modules/pr99174-1_a.C
new file mode 100644
index 000..c22b45bff45
--- /dev/null
+++ w/gcc/testsuite/g++.dg/modules/pr99174-1_a.C
@@ -0,0 +1,8 @@
+// PR 99174 what if we import the same module twice (with no
+// intervening header import)?
+// { dg-additional-options -fmodules-ts }
+
+export module Foo;
+// { dg-module-cmi Foo }
+
+export void Foo ();
diff --git c/gcc/testsuite/g++.dg/modules/pr99174-1_b.C w/gcc/testsuite/g++.dg/modules/pr99174-1_b.C
new file mode 100644
index 000..aaa0a9492ad
--- /dev/null
+++ w/gcc/testsuite/g++.dg/modules/pr99174-1_b.C
@@ -0,0 +1,6 @@
+// { dg-additional-options -fmodules-ts }
+
+export module Bar;
+// { dg-module-cmi Bar }
+
+export void Bar ();
diff --git c/gcc/testsuite/g++.dg/modules/pr99174-1_c.C w/gcc/testsuite/g++.dg/modules/pr99174-1_c.C
new file mode 100644
index 000..58936f292f8
--- /dev/null
+++ w/gcc/testsuite/g++.dg/modules/pr99174-1_c.C
@@ -0,0 +1,11 @@
+// { dg-additional-options -fmodules-ts }
+
+import Foo;
+import Foo;
+import Bar;
+
+int main ()
+{
+  Foo ();
+  Bar ();
+}
diff --git c/gcc/testsuite/g++.dg/modules/pr99174.H w/gcc/testsuite/g+

[committed] hppa: Add mi_thunk support for vcalls

2021-02-22 Thread John David Anglin
The attached change adds mi_thunk support for vcalls on hppa.  Tested on 
hppa-unknown-linux-gnu,
hppa2.0w-hp-hpux11.11 and hppa64-hp-hpux11.11.  Committed to trunk and gcc-10.

Dave
-- 
John David Anglin  dave.ang...@bell.net

Add mi_thunk support for vcalls on hppa.

gcc/ChangeLog:

PR target/85074
* config/pa/pa.c (TARGET_ASM_CAN_OUTPUT_MI_THUNK): Define as
hook_bool_const_tree_hwi_hwi_const_tree_true.
(pa_asm_output_mi_thunk): Add support for nonzero vcall_offset.

diff --git a/gcc/config/pa/pa.c b/gcc/config/pa/pa.c
index 3921b5c98de..d7fcd11e504 100644
--- a/gcc/config/pa/pa.c
+++ b/gcc/config/pa/pa.c
@@ -293,7 +293,7 @@ static size_t n_deferred_plabels = 0;
 #undef TARGET_ASM_OUTPUT_MI_THUNK
 #define TARGET_ASM_OUTPUT_MI_THUNK pa_asm_output_mi_thunk
 #undef TARGET_ASM_CAN_OUTPUT_MI_THUNK
-#define TARGET_ASM_CAN_OUTPUT_MI_THUNK default_can_output_mi_thunk_no_vcall
+#define TARGET_ASM_CAN_OUTPUT_MI_THUNK 
hook_bool_const_tree_hwi_hwi_const_tree_true

 #undef TARGET_ASM_FILE_END
 #define TARGET_ASM_FILE_END pa_file_end
@@ -8461,12 +8461,15 @@ pa_is_function_label_plus_const (rtx op)
  && GET_CODE (XEXP (op, 1)) == CONST_INT);
 }

-/* Output assembly code for a thunk to FUNCTION.  */
+/* Output the assembler code for a thunk function.  THUNK_DECL is the
+   declaration for the thunk function itself, FUNCTION is the decl for
+   the target function.  DELTA is an immediate constant offset to be
+   added to THIS.  If VCALL_OFFSET is nonzero, the word at
+   *(*this + vcall_offset) should be added to THIS.  */

 static void
 pa_asm_output_mi_thunk (FILE *file, tree thunk_fndecl, HOST_WIDE_INT delta,
-   HOST_WIDE_INT vcall_offset ATTRIBUTE_UNUSED,
-   tree function)
+   HOST_WIDE_INT vcall_offset, tree function)
 {
   const char *fnname = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (thunk_fndecl));
   static unsigned int current_thunk_number;
@@ -8482,201 +8485,386 @@ pa_asm_output_mi_thunk (FILE *file, tree 
thunk_fndecl, HOST_WIDE_INT delta,
   assemble_start_function (thunk_fndecl, fnname);
   final_start_function (emit_barrier (), file, 1);

-  /* Output the thunk.  We know that the function is in the same
- translation unit (i.e., the same space) as the thunk, and that
- thunks are output after their method.  Thus, we don't need an
- external branch to reach the function.  With SOM and GAS,
- functions and thunks are effectively in different sections.
- Thus, we can always use a IA-relative branch and the linker
- will add a long branch stub if necessary.
-
- However, we have to be careful when generating PIC code on the
- SOM port to ensure that the sequence does not transfer to an
- import stub for the target function as this could clobber the
- return value saved at SP-24.  This would also apply to the
- 32-bit linux port if the multi-space model is implemented.  */
-  if ((!TARGET_LONG_CALLS && TARGET_SOM && !TARGET_PORTABLE_RUNTIME
-   && !(flag_pic && TREE_PUBLIC (function))
-   && (TARGET_GAS || last_address < 262132))
-  || (!TARGET_LONG_CALLS && !TARGET_SOM && !TARGET_PORTABLE_RUNTIME
- && ((targetm_common.have_named_sections
-  && DECL_SECTION_NAME (thunk_fndecl) != NULL
-  /* The GNU 64-bit linker has rather poor stub management.
- So, we use a long branch from thunks that aren't in
- the same section as the target function.  */
-  && ((!TARGET_64BIT
-   && (DECL_SECTION_NAME (thunk_fndecl)
-   != DECL_SECTION_NAME (function)))
-  || ((DECL_SECTION_NAME (thunk_fndecl)
-   == DECL_SECTION_NAME (function))
-  && last_address < 262132)))
- /* In this case, we need to be able to reach the start of
-the stub table even though the function is likely closer
-and can be jumped to directly.  */
- || (targetm_common.have_named_sections
- && DECL_SECTION_NAME (thunk_fndecl) == NULL
- && DECL_SECTION_NAME (function) == NULL
- && total_code_bytes < MAX_PCREL17F_OFFSET)
- /* Likewise.  */
- || (!targetm_common.have_named_sections
- && total_code_bytes < MAX_PCREL17F_OFFSET
-{
-  if (!val_14)
-   output_asm_insn ("addil L'%2,%%r26", xoperands);
-
-  output_asm_insn ("b %0", xoperands);
-
-  if (val_14)
-   {
- output_asm_insn ("ldo %2(%%r26),%%r26", xoperands);
- nbytes += 8;
+  if (!vcall_offset)
+{
+  /* Output the thunk.  We know that the function is in the same
+translation unit (i.e., the same space) as the thunk, and that
+thunks are output after their method.  Thus, we don't need an
+external branch to reach the function.  With SOM and GAS,
+functions and thunk

[gcc-12 PATCH] ira: Correct HONOR_REG_ALLOC_ORDER usage

2021-02-22 Thread Uros Bizjak via Gcc-patches
The intention of HONOR_REG_ALLOC_ORDER is to ensure that IRA allocates
registers in the order given by REG_ALLOC_ORDER.  However in
ira_better_spill_reload_regno_p, there is still a place where the
calculation depends on the presence of REG_ALLOC_ORDER, ignoring
HONOR_REG_ALLOC_ORDER macro altogether.  The patch uses the correct macro
at this place.

On the other hand, assign_hard_reg function respects HONOR_REG_ALLOC_ORDER,
but expects this macro to return 1 to avoid internal cost calculations.
As the macro is defined to 0 by default, it is expected that targets redefine
HONOR_REG_ALLOC_ORDER to return nonzero value, even if REG_ALLOC_ORDER
is defined.  This approach is prone to errors, so the patch defines
HONOR_REG_ALLOC_ORDER to 1 by default if REG_ALLOC_ORDER is defined.

2021-02-22  Uroš Bizjak  

gcc/
* defaults.h (HONOR_REG_ALLOC_ORDER): If not defined,
define to 1 if REG_ALLOC_ORDER is defined.
* doc/tm.texi.in (HONOR_REG_ALLOC_ORDER):
Describe new default definition.
* doc/tm.texi: Regenerate.
* ira-color.c (ira_better_spill_reload_regno_p):
Use HONOR_REG_ALLOC_ORDER instead of REG_ALLOC_ORDER
to determine better spill reload regno.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

OK for gcc-12 when it opens?

Uros.
diff --git a/gcc/defaults.h b/gcc/defaults.h
index 91216593e75..2af4add0c05 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -1047,7 +1047,11 @@ see the files COPYING3 and COPYING.RUNTIME respectively. 
 If not, see
 #endif
 
 #ifndef HONOR_REG_ALLOC_ORDER
-#define HONOR_REG_ALLOC_ORDER 0
+# if defined REG_ALLOC_ORDER
+#  define HONOR_REG_ALLOC_ORDER 1
+# else
+#  define HONOR_REG_ALLOC_ORDER 0
+# endif
 #endif
 
 /* EXIT_IGNORE_STACK should be nonzero if, when returning from a function,
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 062785af1e2..9a346555ec8 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -2050,7 +2050,10 @@ prologue and restoring it in the epilogue.  This 
discourages it from
 using call-saved registers.  If a machine wants to ensure that IRA
 allocates registers in the order given by REG_ALLOC_ORDER even if some
 call-saved registers appear earlier than call-used ones, then define this
-macro as a C expression to nonzero. Default is 0.
+macro as a C expression to nonzero.
+
+The default definition is @code{1} if @code{REG_ALLOC_ORDER} is defined;
+otherwise, it is @code{0}.
 @end defmac
 
 @defmac IRA_HARD_REGNO_ADD_COST_MULTIPLIER (@var{regno})
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 3b19e6f4281..1cb7fad7a46 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -1797,7 +1797,10 @@ prologue and restoring it in the epilogue.  This 
discourages it from
 using call-saved registers.  If a machine wants to ensure that IRA
 allocates registers in the order given by REG_ALLOC_ORDER even if some
 call-saved registers appear earlier than call-used ones, then define this
-macro as a C expression to nonzero. Default is 0.
+macro as a C expression to nonzero.
+
+The default definition is @code{1} if @code{REG_ALLOC_ORDER} is defined;
+otherwise, it is @code{0}.
 @end defmac
 
 @defmac IRA_HARD_REGNO_ADD_COST_MULTIPLIER (@var{regno})
diff --git a/gcc/ira-color.c b/gcc/ira-color.c
index 3d01c60800c..4586c9a1e08 100644
--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -4835,14 +4835,17 @@ ira_better_spill_reload_regno_p (int *regnos, int 
*other_regnos,
 return cost < other_cost;
   if (length != other_length)
 return length > other_length;
-#ifdef REG_ALLOC_ORDER
-  if (hard_regno >= 0 && other_hard_regno >= 0)
-return (inv_reg_alloc_order[hard_regno]
-   < inv_reg_alloc_order[other_hard_regno]);
-#else
-  if (call_used_count != other_call_used_count)
-return call_used_count > other_call_used_count;
-#endif
+  if (HONOR_REG_ALLOC_ORDER)
+{
+  if (hard_regno >= 0 && other_hard_regno >= 0)
+   return (inv_reg_alloc_order[hard_regno]
+   < inv_reg_alloc_order[other_hard_regno]);
+}
+  else
+{
+  if (call_used_count != other_call_used_count)
+   return call_used_count > other_call_used_count;
+}
   return false;
 }
 


Re: [PATCH] doc: c: c++: Document the C/C++ extended asm empty input constraints

2021-02-22 Thread Segher Boessenkool
Hi!

First off, thanbk you for the patch!

On Mon, Feb 15, 2021 at 11:22:52PM +, Neven Sajko via Gcc-patches wrote:
> There is a long-standing, but undocumented GCC inline assembly feature
> that's part of the extended asm GCC extension to C and C++: extended
> asm empty input constraints.

There is no such thing.  *All* empty constraints have the same
semantics: anything whatsoever will do.  Any register, any constant, any
memory.

> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -1131,7 +1131,102 @@ the addressing register.
>  @subsection Simple Constraints
>  @cindex simple constraints
> 
> -The simplest kind of constraint is a string full of letters, each of
> +An input constraint is allowed to be an empty string, in which case it is
> +called an empty input constraint.

That is just shorthand for "empty constraint that is used for an input
operand".  It is not special, and it *is* documented:
https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#Simple-Constraints
  The simplest kind of constraint is a string full of letters, each of
  which describes one kind of operand that is permitted.

A length zero string is allowed as well.  This could be made more
explicit sure; OTOH, it isn't very often useful.  So your example
(using it for making a dependency) is certainly useful to have.  But
it is not a special case at all.

> (When an empty input constraint is used,
> +the assembler template will most probably also be empty. I.e., the @code{asm}
> +declaration need not contain actual assembly code.)

Don't use parentheses like this in documentation please.

> An empty input
> +constraint can be used to create an artificial dependency on a C or C++
> +variable (the variable that appears in the expression associated with the
> +constraint) without incurring unnecessary costs to performance.

It still needs a register (or memory) reserved there (or sometimes a
constant can be used, but you have no dependency in that case!)

> +An example of where such behavior may be useful is for preventing compiler
> +optimizations like dead store elimination or hoisting code outside a loop for
> +certain pieces of C or C++ code.

You should not think about preventing the compiler from doing something.
Instead, you can give the compiler extra information that makes it *do*
something: it has to, because it has to implement the semantics your
source program has.

> Specific applications may include direct
> +interaction with hardware features; or things like testing, fuzzing and
> +benchmarking.

What does this mean?


Here is a simple example showing why this isn't as simple to use as
you imply here:

===
void f(int x)
{
asm volatile("" :: ""(x));
}

void g(void)
{
return f(42);
}
===

Both function compile to (taking aarch64 as example) just "ret".  But,
if you look at what the compiler does, you see in the "dfinish" pass it
has for f:

(insn:TI 6 3 20 (asm_operands/v ("") ("") 0 [
(reg:SI 0 x0 [93])
]
 [
(asm_input:SI ("") zlc.c:3)
]
 [] zlc.c:3) "zlc.c":3:2 -1
 (expr_list:REG_DEAD (reg:SI 0 x0 [93])
(nil)))


(so it has register x0 as input), while function g has

(insn:TI 5 2 16 (asm_operands/v ("") ("") 0 [
(const_int 42 [0x2a])
]
 [
(asm_input:SI ("") zlc.c:3)
]
 [] zlc.c:3) "zlc.c":3:2 -1
 (nil))

which has no dependency, gets fed the constant 42 instead, because
*anything at all* is allowed by an empty constraint.

You can also make this clear by using

asm volatile("# %0" :: ""(x));

which gives
# x0
resp.
# 42

or, with -fverbose-asm:
# x0// tmp93
and
# 42//

which is clear as mud, but it means in f there was a variable as input
to the asm, and in g there wasn't.


Segher


cris: Fix addi insn mult vs. shift canonicalization

2021-02-22 Thread Hans-Peter Nilsson via Gcc-patches
Ever since the canonicalization clean-up of (mult X (1 << N)) into
(ashift X N) outside addresses, the CRIS addi patterns have been
unmatched.  No big deal.

Unfortunately, nobody thought of adjusting reloaded addresses, so
transforming mult into a shift has to be a kludged for when reload
decides that it has to move an address like (plus (mult reg0 4) reg1)
into a register, as happens building libgfortran.  (No, simplify_rtx
et al don't automatically DTRT.)  Something less kludgy would make
sense if it wasn't for the current late development stage and reload
being deprecated.  I don't know whether this issue is absent for LRA,
though.

I added a testsuite for the reload issue, despite being exposed by a
libgfortran build, so the issue would be covered by C/C++ builds, but
to the CRIS test-suite, not as a generic test, to avoid bad feelings
from anyone preferring short test-times to redundant coverage.

Committed.

gcc:
* config/cris/cris.c (cris_print_operand) <'T'>: Change
valid operand from is now an addi mult-value to shift-value.
* config/cris/cris.md (*addi): Change expression of scaled
operand from mult to ashift.
* config/cris/cris.md (*addi_reload): New insn_and_split.

gcc/testsuite:
* gcc.target/cris/torture/sync-reload-mul-1.c: New test.
---
 gcc/config/cris/cris.c | 23 +--
 gcc/config/cris/cris.md| 33 +++---
 .../gcc.target/cris/torture/sync-reload-mul-1.c| 13 +
 3 files changed, 57 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/cris/torture/sync-reload-mul-1.c

diff --git a/gcc/config/cris/cris.c b/gcc/config/cris/cris.c
index 48ea8552e593..8a42aa16da13 100644
--- a/gcc/config/cris/cris.c
+++ b/gcc/config/cris/cris.c
@@ -880,9 +880,6 @@ cris_print_operand (FILE *file, rtx x, int code)
 {
   rtx operand = x;
 
-  /* Size-strings corresponding to MULT expressions.  */
-  static const char *const mults[] = { "BAD:0", ".b", ".w", "BAD:3", ".d" };
-
   /* New code entries should just be added to the switch below.  If
  handling is finished, just return.  If handling was just a
  modification of the operand, the modified operand should be put in
@@ -1212,11 +1209,21 @@ cris_print_operand (FILE *file, rtx x, int code)
   return;
 
 case 'T':
-  /* Print the size letter for an operand to a MULT, which must be a
-const_int with a suitable value.  */
-  if (!CONST_INT_P (operand) || INTVAL (operand) > 4)
-   LOSE_AND_RETURN ("invalid operand for 'T' modifier", x);
-  fprintf (file, "%s", mults[INTVAL (operand)]);
+  {
+   /* Print the size letter for an operand to a ASHIFT, which must be a
+  const_int with a suitable value.  */
+   int shiftval;
+
+   if (!CONST_INT_P (operand))
+ LOSE_AND_RETURN ("invalid operand for 'T' modifier", x);
+
+   shiftval = INTVAL (operand);
+
+   if (!(shiftval == 1 || shiftval == 2))
+ LOSE_AND_RETURN ("invalid operand for 'T' modifier", x);
+
+   fprintf (file, "%s", shiftval == 1 ? ".w" : ".d");
+  }
   return;
 
 case 0:
diff --git a/gcc/config/cris/cris.md b/gcc/config/cris/cris.md
index 0fd29f99ba99..069f7e00b6db 100644
--- a/gcc/config/cris/cris.md
+++ b/gcc/config/cris/cris.md
@@ -1278,18 +1278,43 @@ (define_insn "addi_mul"
 (define_insn "*addi"
   [(set (match_operand:SI 0 "register_operand" "=r")
(plus:SI
-(mult:SI (match_operand:SI 2 "register_operand" "r")
- (match_operand:SI 3 "const_int_operand" "n"))
+(ashift:SI (match_operand:SI 2 "register_operand" "r")
+   (match_operand:SI 3 "const_int_operand" "n"))
 (match_operand:SI 1 "register_operand" "0")))]
   "operands[0] != frame_pointer_rtx
&& operands[1] != frame_pointer_rtx
&& CONST_INT_P (operands[3])
-   && (INTVAL (operands[3]) == 1
-   || INTVAL (operands[3]) == 2 || INTVAL (operands[3]) == 4)"
+   && (INTVAL (operands[3]) == 1 || INTVAL (operands[3]) == 2)"
   "addi %2%T3,%0"
   [(set_attr "slottable" "yes")
(set_attr "cc" "none")])
 
+;; The mult-vs-ashift canonicalization-cleanup plagues us: nothing in
+;; reload transforms a "scaled multiplication" into an ashift in a
+;; reloaded address; it's passed as-is and expected to be recognized,
+;; or else we get a tell-tale "unrecognizable insn".
+;; On top of that, we *should* match the bare insn, as a *matching
+;; pattern* (as opposed to e.g. a reload_load_address expander
+;; changing the mul into an ashift), so can_reload_into will re-use
+;; registers in the reloaded expression instead of allocating a new
+;; register.
+(define_insn_and_split "*addi_reload"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (plus:SI
+(mult:SI (match_operand:SI 2 "register_operand" "r")
+ (match_operand:SI 3 "const_int_operand" "n"))
+(match_operand:SI 1 "register_operand" "

cris: testsuite/gcc.target/cris/biap-mul.c: New test.

2021-02-22 Thread Hans-Peter Nilsson via Gcc-patches
Needed coverage for that *addi_mul pattern.  Committed.

gcc/testsuite:
* gcc.target/cris/biap-mul.c: New test.
---
 gcc/testsuite/gcc.target/cris/biap-mul.c | 15 +++
 1 file changed, 15 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/cris/biap-mul.c

diff --git a/gcc/testsuite/gcc.target/cris/biap-mul.c 
b/gcc/testsuite/gcc.target/cris/biap-mul.c
new file mode 100644
index ..e0054632b239
--- /dev/null
+++ b/gcc/testsuite/gcc.target/cris/biap-mul.c
@@ -0,0 +1,15 @@
+/* Make sure ADDI is used for trivial multiplications too.  */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-times "\taddi" 2 } } */
+/* { dg-final { scan-assembler-not "\tlsl|\tmul|\tmove|\tadd\[^i\]" } } */
+
+int xyzzy (int r10)
+{
+  return r10 * 5;
+}
+
+int plugh (int r10)
+{
+  return r10 * 3;
+}
-- 
2.11.0



cris: testsuite/gcc.target/cris/biap.c: Add a Y+=X*2 to the Y+=X*4.

2021-02-22 Thread Hans-Peter Nilsson via Gcc-patches
Also, tweak the scan-assembler regexps to include a tab,
lest they may spuriously match file-paths in the emitted
assembly code, should some be added at some point.  And, add
"mul", "move" and (non-addi-)"add" to insns that shouldn't
appear.

Committed.

gcc/testsuite:
* gcc.target/cris/biap.c: Add a Y+=X*2 to the Y+=X*4.
---
 gcc/testsuite/gcc.target/cris/biap.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/cris/biap.c 
b/gcc/testsuite/gcc.target/cris/biap.c
index 1f3b4368a36a..f31e61c20c5f 100644
--- a/gcc/testsuite/gcc.target/cris/biap.c
+++ b/gcc/testsuite/gcc.target/cris/biap.c
@@ -2,10 +2,15 @@
See also PR37939.  */
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
-/* { dg-final { scan-assembler "addi" } } */
-/* { dg-final { scan-assembler-not "lsl" } } */
+/* { dg-final { scan-assembler-times "\taddi" 2 } } */
+/* { dg-final { scan-assembler-not "\tlsl|\tmul|\tmove|\tadd\[^i\]" } } */
 
 int xyzzy (int r10, int r11)
 {
   return r11 * 4 + r10;
 }
+
+int plugh (int r10, int r11)
+{
+  return r11 * 2 + r10;
+}
-- 
2.11.0



Re: [WIP] Re: [PATCH] openmp: Fix intermittent hanging of task-detach-6 libgomp tests [PR98738]

2021-02-22 Thread Jakub Jelinek via Gcc-patches
On Mon, Feb 22, 2021 at 02:49:44PM +0100, Jakub Jelinek wrote:
> So, I think for the team != gomp_thread ()->ts.team
> && !do_wake
> && gomp_team_barrier_waiting_for_tasks (&team->barrier)
> && team->task_detach_count == 0
> case, we need to wake up 1 thread anyway and arrange for it to do:
>   gomp_team_barrier_done (&team->barrier, state);
>   gomp_mutex_unlock (&team->task_lock);
>   gomp_team_barrier_wake (&team->barrier, 0);
> Possibly in
>   if (!priority_queue_empty_p (&team->task_queue, MEMMODEL_RELAXED))
> add
>   else if (team->task_count == 0
>  && gomp_team_barrier_waiting_for_tasks (&team->barrier))
>   {
> gomp_team_barrier_done (&team->barrier, state);
> gomp_mutex_unlock (&team->task_lock);
> gomp_team_barrier_wake (&team->barrier, 0);
> if (to_free)
>   {
> gomp_finish_task (to_free);
> free (to_free);
>   }
> return;
>   }
> but the:
>   if (--team->task_count == 0
>   && gomp_team_barrier_waiting_for_tasks (&team->barrier))
> {
>   gomp_team_barrier_done (&team->barrier, state);
>   gomp_mutex_unlock (&team->task_lock);
>   gomp_team_barrier_wake (&team->barrier, 0);
>   gomp_mutex_lock (&team->task_lock);
> }
> in that case would then be incorrect, we don't want to do that twice.
> So, either that second if would need to do the to_free handling
> and return instead of taking the lock again and looping, or
> perhaps we can just do
> --team->task_count;
> there instead and let the above added else if handle that?

FYI, I've just tested that part of change alone whether it doesn't break
anything else and it worked fine:

diff --git a/libgomp/task.c b/libgomp/task.c
index b242e7c8d20..9c27c3b5148 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -1405,6 +1405,19 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state)
  team->task_running_count++;
  child_task->in_tied_task = true;
}
+  else if (team->task_count == 0
+  && gomp_team_barrier_waiting_for_tasks (&team->barrier))
+   {
+ gomp_team_barrier_done (&team->barrier, state);
+ gomp_mutex_unlock (&team->task_lock);
+ gomp_team_barrier_wake (&team->barrier, 0);
+ if (to_free)
+   {
+ gomp_finish_task (to_free);
+ free (to_free);
+   }
+ return;
+   }
   gomp_mutex_unlock (&team->task_lock);
   if (do_wake)
{
@@ -1479,14 +1492,7 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state)
  if (do_wake > new_tasks)
do_wake = new_tasks;
}
- if (--team->task_count == 0
- && gomp_team_barrier_waiting_for_tasks (&team->barrier))
-   {
- gomp_team_barrier_done (&team->barrier, state);
- gomp_mutex_unlock (&team->task_lock);
- gomp_team_barrier_wake (&team->barrier, 0);
- gomp_mutex_lock (&team->task_lock);
-   }
+ --team->task_count;
}
}
 }


Jakub



Re: [PATCH] libstdc++: Fix up parallel_backend_serial.h [PR97549]

2021-02-22 Thread Thomas Rodgers via Gcc-patches



- Original Message -
> Hi!
> 
> In GCC 10, parallel_backend.h just included parallel_backend_{serial,tbb}.h
> and
> did nothing beyond that, and parallel_backend_tbb.h provided directly
> namespace __pstl { namespace __par_backend { ... } }
> and defined everything in there, while parallel_backend_serial.h did:
> namespace __pstl { namespace __serial { ... } } and had this
> namespace __pstl { namespace __par_backend { using namespace
> __pstl::__serial; } }
> at the end.
> In GCC 11, parallel_backend.h does:
> namespace __pstl { namespace __par_backend = __serial_backend; }
> after including parallel_backend_serial.h or
> namespace __pstl { namespace __par_backend = __tbb_backend; }
> after including parallel_backend_tbb.h.  The latter then has:
> namespace __pstl { namespace __tbb_backend { ... } }
> and no using etc. at the end, while parallel_backend_serial.h changed to:
> namespace __pstl { namespace __serial_backend { ... } }
> but has this leftover block from the GCC 10 times.  Even changing that
> using namespace __pstl::__serial;
> to
> using namespace __pstl::__serial_backend;
> doesn't work, as it clashes with
> namespace __pstl { namespace __par_backend = __serial_backend; }
> in parallel_backend.h.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux
> both without and with tbb-devel installed.
> 
> Ok for trunk?

Yes, thanks.

> 
> 2021-02-20  Jakub Jelinek  
> 
>   PR libstdc++-v3/97549
>   * include/pstl/parallel_backend_serial.h: Remove __pstl::__par_backend.
> 
> --- libstdc++-v3/include/pstl/parallel_backend_serial.h.jj2020-10-21
> 19:33:24.059872401 +0200
> +++ libstdc++-v3/include/pstl/parallel_backend_serial.h   2021-02-19
> 11:59:56.414645219 +0100
> @@ -127,12 +127,4 @@ __parallel_invoke(_ExecutionPolicy&&, _F
>  } // namespace __serial_backend
>  } // namespace __pstl
>  
> -namespace __pstl
> -{
> -namespace __par_backend
> -{
> -using namespace __pstl::__serial;
> -}
> -} // namespace __pstl
> -
>  #endif /* _PSTL_PARALLEL_BACKEND_SERIAL_H */
> 
>   Jakub
> 



Committed: g++.dg/warn/Warray-bounds-10..13: Fix for 32-bit newlib targets

2021-02-22 Thread Hans-Peter Nilsson via Gcc-patches
See gcc/config/newlib-stdint.h, where targets that have
LONG_TYPE_SIZE == 32, get __INT32_TYPE__ defined to "long
int".

All these tests have "typedef __INT32_TYPE__ int32_t;".
Thus the tests fail for 32-bit newlib targets due to related
warning messages being matched to "aka int" where the
emitted message for these targets have "aka long int".

Tested cris-elf and x86_64-linux, committed as obvious.

gcc/testsuite:
* g++.dg/warn/Warray-bounds-10.C, g++.dg/warn/Warray-bounds-11.C,
g++.dg/warn/Warray-bounds-12.C, g++.dg/warn/Warray-bounds-13.C:
Handle __INT32_TYPE__ being "long int".
---
 gcc/testsuite/g++.dg/warn/Warray-bounds-10.C | 24 
 gcc/testsuite/g++.dg/warn/Warray-bounds-11.C | 24 
 gcc/testsuite/g++.dg/warn/Warray-bounds-12.C | 24 
 gcc/testsuite/g++.dg/warn/Warray-bounds-13.C | 24 
 4 files changed, 48 insertions(+), 48 deletions(-)

diff --git a/gcc/testsuite/g++.dg/warn/Warray-bounds-10.C 
b/gcc/testsuite/g++.dg/warn/Warray-bounds-10.C
index 22466977b68a..4deea31fae03 100644
--- a/gcc/testsuite/g++.dg/warn/Warray-bounds-10.C
+++ b/gcc/testsuite/g++.dg/warn/Warray-bounds-10.C
@@ -19,9 +19,9 @@ void warn_op_new ()
 {
   T (int32_t, 0, 0);  // { dg-warning "array subscript 0 is outside 
array bounds of 'int32_t \\\[0]'" }
   // { dg-message "referencing an object of size 
\\d allocated by 'void\\\* operator new\\\(\(long \)?unsigned int\\\)'" "note" 
{ target *-*-* } .-1 }
-  T (int32_t, 1, 0);  // { dg-warning "array subscript 'int32_t {aka 
int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[1]'" }
-  T (int32_t, 2, 0); //  { dg-warning "array subscript 'int32_t {aka 
int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[2]'" }
-  T (int32_t, 3, 0); // { dg-warning "array subscript 'int32_t {aka 
int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[3]'" }
+  T (int32_t, 1, 0);  // { dg-warning "array subscript 'int32_t {aka 
(long )?int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[1]'" }
+  T (int32_t, 2, 0); //  { dg-warning "array subscript 'int32_t {aka 
(long )?int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[2]'" }
+  T (int32_t, 3, 0); // { dg-warning "array subscript 'int32_t {aka 
(long )?int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[3]'" }
 
   T (int32_t, 4, 0);
 
@@ -30,9 +30,9 @@ void warn_op_new ()
   T (int32_t, 2, 1);  // { dg-warning "array subscript 1 is outside 
array bounds " }
   T (int32_t, 3, 1);  // { dg-warning "array subscript 1 is outside 
array bounds " }
   T (int32_t, 4, 1);  // { dg-warning "array subscript 1 is outside 
array bounds " }
-  T (int32_t, 5, 1);  // { dg-warning "array subscript 'int32_t {aka 
int}\\\[1]' is partly outside array bounds of 'unsigned char \\\[5]" }
-  T (int32_t, 6, 1);  // { dg-warning "array subscript 'int32_t {aka 
int}\\\[1]' is partly outside array bounds of 'unsigned char \\\[6]" }
-  T (int32_t, 7, 1);  // { dg-warning "array subscript 'int32_t {aka 
int}\\\[1]' is partly outside array bounds of 'unsigned char \\\[7]" }
+  T (int32_t, 5, 1);  // { dg-warning "array subscript 'int32_t {aka 
(long )?int}\\\[1]' is partly outside array bounds of 'unsigned char \\\[5]" }
+  T (int32_t, 6, 1);  // { dg-warning "array subscript 'int32_t {aka 
(long )?int}\\\[1]' is partly outside array bounds of 'unsigned char \\\[6]" }
+  T (int32_t, 7, 1);  // { dg-warning "array subscript 'int32_t {aka 
(long )?int}\\\[1]' is partly outside array bounds of 'unsigned char \\\[7]" }
 
   T (int32_t, 8, 1);
 }
@@ -45,9 +45,9 @@ void warn_op_array_new ()
 
   T (int32_t, 0, 0);  // { dg-warning "array subscript 0 is outside 
array bounds of 'int32_t \\\[0]'" }
   // { dg-message "referencing an object of size 
\\d allocated by 'void\\\* operator new \\\[]\\\(\(long \)?unsigned int\\\)'" 
"note" { target *-*-* } .-1 }
-  T (int32_t, 1, 0);  // { dg-warning "array subscript 'int32_t {aka 
int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[1]'" }
-  T (int32_t, 2, 0); //  { dg-warning "array subscript 'int32_t {aka 
int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[2]'" }
-  T (int32_t, 3, 0); // { dg-warning "array subscript 'int32_t {aka 
int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[3]'" }
+  T (int32_t, 1, 0);  // { dg-warning "array subscript 'int32_t {aka 
(long )?int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[1]'" }
+  T (int32_t, 2, 0); //  { dg-warning "array subscript 'int32_t {aka 
(long )?int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[2]'" }
+  T (int32_t, 3, 0); // { dg-warning "array subscript 'int32_t {aka 
(long )?int}\\\

Re: [PATCH, V2] Add conversions between _Float128 and Decimal.

2021-02-22 Thread Michael Meissner via Gcc-patches
On Sat, Feb 20, 2021 at 06:33:12PM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Feb 09, 2021 at 02:35:05AM -0500, Michael Meissner wrote:
> > This patch implements conversions between _Float128 and the 3 Decimal 
> > floating
> > types.  It does this by extendending the dfp-bit conversions to add a new
> > binary floating point type (KF), and doing the conversions in the same 
> > manner
> > as the other binary/decimal conversions.
> 
> > For conversions from _Float128 to Decimal, this patch uses a function
> > (__sprintfkf) instead of the sprintf function to convert long double values 
> > to
> > strings.  The __sprintfkf function determines if GLIBC 2.32 or newer is used
> > and calls the IEEE 128-bit version of sprintf (__sprintfieee128).  If the 
> > GLIBC
> > is earlier than 2.32, the code will convert _Float128 to __ibm128 and then 
> > use
> > the normal sprintf to convert this value.
> 
> So if built with a glibc version before 2.32 (less than a year old) it
> will give the wrong answer.  This needs improving, or it will be another
> eight or so years until this is generally usable.

But until the long double format default is changed to be IEEE 128-bit floating
point,  only the people who explicitly convert between __float128/_Float128 and
the decimal types will see the issue.

In order to switch to the IEEE 128-bit floating point long double, you need at
least GLIBC 2.32, and you will get full accuracy.

> > The compilers built fine providing I recompiled gmp, mpc, and mpfr with the
> > appropriate long double options.  There were a few differences in the test
> > suite runs that will be addressed in later patches, but over all it works
> > well.
> 
> What kind of differences?  I assume you checked all, and all differences
> are an improvement, or the differences are inconsequential and the test
> is not very good?

I have submitted patches for these before, and I will shortly ping or resubmit
them again.  But I felt this was more import to get in before worrying about
changes to the testsuite.  The changes are:

* C test   c-c++-common/dfp/convert-bfp-11.c fails
* C test   gcc.target/powerpc/pr70117.c  fails
* C test   gcc.dg/torture/float128-nan.c fails
* C test   gcc.target/powerpc/nan128-1.c fails
* C++ test c-c++-common/dfp/convert-bfp-11.c fails
* C++ testsall modules tests fails
* Fortran test gfortran.dg/default_format_2.f90  now passes
* Fortran test gfortran.dg/default_format_denormal_2.f90 now passes
* Fortran test gfortran.dg/ieee/large_2.f90  now passes

The following two tests test facets of the IBM 128-bit long double
implementation.  Since they are hard wired to use IBM 128-bit long double, I've
added options in the patches to run these tests with -mabi=ibmlongdouble if the
default is -mabi=ieeelongdouble.

* c-c++-common/dfp/convert-bfp-11.c
* gcc.target/powerpc/pr70117.c

The following two tests fail because they are testing the old libquadmath 'q'
built-ins, and there is a subtle difference between using the _Float128
built-in function for the nans function and the long double built-in function
for nans.  In particular, the signalling NaN is silently converted to a quiet
NaN.

* gcc.dg/torture/float128-nan.c
* gcc.target/powerpc/nan128-1.c

The modules failures are PR c++/98645, and are not a back end feature.

The 3 fortran tests now pass if long double uses the IEEE 128-bit
representation:

* gfortran.dg/default_format_2.f90
* gfortran.dg/default_format_denormal_2.f90
* gfortran.dg/ieee/large_2.f90

> 
> > --- /dev/null
> > +++ b/libgcc/config/rs6000/_sprintfkf.c
> > @@ -0,0 +1,57 @@
> > +   If we are linked against an earlier library, we will have fake it by
> > +   converting the value to long double, and using sprinf to do the 
> > conversion.
> 
> (typo, sprintf)

Fixed.

> > +   This isn't ideal, as IEEE 128-bit has more exponent range than IBM
> > +   128-bit.  */
> 
> And that results in some numbers being printed as Inf that are not.  But
> also, the significand has more bits, so there is a loss of precision as
> well.

Yes.

> > +int __sprintfkf (char *restrict string,
> > +const char *restrict format,
> > +_Float128 number)
> > +{
> > +  if (__sprintfieee128)
> > +return __sprintfieee128 (string, format, number);
> > +
> > +  return sprintf (string, format, (long double)number);
> 
> Space after cast.

Fixed.

> > +_Float128
> > +__strtokf (const char *string, char **endptr)
> > +{
> > +  long double num;
> > +
> > +  if (__strtoieee128)
> > +return __strtoieee128 (string, endptr);
> > +
> > +  num = strtold (string, endptr);
> > +  return (_Float128) num;
> 
> Do not cast return values please.  All casts you do should be *needed*,
> have a purpose.

Well it is changing type (num is long double, i.e. IBM 128-bit long double) and
the return is _Flo

[PATCH, constexpr, coroutines ] Generic lambda coroutines cannot be constexpr [PR96251].

2021-02-22 Thread Iain Sandoe

Hi,

PR96251 (and dups) is a rejects-valid bug against coroutines (but I’m
not sure that the problem is really there).

 * coroutines cannot be constexpr.

* Part of the way that the coroutine implementation works is a
consequence of the "lazy discovery of coroutine-ness”;  whenever
we first encounter a coroutine keyword...

.. we mark the function as a coroutine, and then we can deal with
diagnostics etc. that change under these circumstances.  This
marking of the function is independent of whether keyword
expressions are type-dependent or not.

* when we encounter a coroutine keyword in a constexpr context
we error.

* the constexpr machinery also has been taught that coroutine
expressions should cause constexpr-ness to be rejected when
checking for "potentially constexpr".

So why is there a problem?

* lambdas are implicitly constexpr from C++17.

* the constexpr machinery tends to bail early *with a conservative
  assumption that the expression is potentially constexpr* when it
  finds a type-dependent expression - without evaluating sub-
  expressions to see if they are valid (thus missing coroutine exprs.
  in such positions).

The combination of these ^^ two things, means that for many generic
lambdas with non-trivial bodies - we then enter instantiation with a
constexpr type-dependent coroutine (which is a Thing that Should not
Be).  As soon as we try to tsub any coroutine keyword expression
encountered, we error out.

* I was not able to see any way in which the instantiation process
  could be made to bail in this case and re-try for non-constexpr.

* Nor able to see somewhere else where that decision could be made.

^^ these two points reflect that I'm not familiar with the constexpr
machinery.

* Proposed solution.

Since coroutines cannot be constexpr (not even sure what that would
mean in any practicable implementation).  The fix proposed is to
reject constexpr-ness for coroutine functions as early as possible.

* a) We can prevent a generic lambda coroutine from being converted
  to constexpr in finish_function ().  Likewise, via the tests below, we can
  avoid it for regular lambdas.

  b) We can also reject coroutine function decls early in both
  is_valid_constexpr_fn () and potential_constant_expression_1 ().

So - is there some alternate solution that would be better?



(in some ways it seems pointless to delay rejection of a coroutine
 function to some later point).

OTOH, I suppose that one could have some weird code where coroutine
coroutine keywords only appeared in a non-constexpr paths of the code
- but our current lowering is not prepared for such a circumstance.

AFAIU the standard, there's no dispensation from the "cannot be
constexpr" for such a code construction .. but ICBW.

In any event, the cases reported (and fixed by this patch) are not trying
anything so fiendishly clever.

Tested on x86_64-darwin.

Suggestions?
OK for master (after wider testing)?
thanks
Iain

= 

coroutines cannot be constexpr, but generic lambdas (from C++17)
are implicitly assumed to be, and the processing of type-
dependent lambda bodies often terminates before any coroutine
expressions are encountered.  For example, the PR notes cases
where the coro expressions are hidden by type-dependent for or
switch expressions.

The solution proposed is to deny coroutine generic lambdas.  We also
tighten up the checks for is_valid_constexpr_fn() and do an early
test for coroutine functions in checking for potential constant
expressions.

gcc/cp/ChangeLog:

PR c++/96251
* constexpr.c (is_valid_constexpr_fn): Test for a
coroutine before the chekc for lambda.
(potential_constant_expression_1): Disallow coroutine
function decls.
* decl.c (finish_function): Do not mark generic
coroutine lambda templates as constexpr.

gcc/testsuite/ChangeLog:

PR c++/96251
* g++.dg/coroutines/pr96251.C: New test.
---
 gcc/cp/constexpr.c|  7 ++-
 gcc/cp/decl.c |  2 +-
 gcc/testsuite/g++.dg/coroutines/pr96251.C | 22 ++
 3 files changed, 29 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr96251.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 377fe322ee8..036bf04efa9 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -210,7 +210,9 @@ is_valid_constexpr_fn (tree fun, bool complain)
  }
 }

-  if (LAMBDA_TYPE_P (CP_DECL_CONTEXT (fun)) && cxx_dialect < cxx17)
+  if (DECL_COROUTINE_P (fun))
+ret = false;
+  else if (LAMBDA_TYPE_P (CP_DECL_CONTEXT (fun)) && cxx_dialect < cxx17)
 {
   ret = false;
   if (complain)
@@ -7827,6 +7829,9 @@ potential_constant_expression_1 (tree t, bool  
want_rval, bool strict, bool now,

   switch (TREE_CODE (t))
 {
 case FUNCTION_DECL:
+  if (DECL_COROUTINE_P (t))
+   return false;
+   /* FALLTHROUGH.  */
 case BASELINK:
 case TEMPLATE_DE

[PATCH] PR fortran/99206 - [11 Regression] ICE in add_init_expr_to_sym, at fortran/decl.c:1980

2021-02-22 Thread Harald Anlauf via Gcc-patches
Dear all,

when simplifying the RESHAPE intrinsic we lost the string length when
the resulting array had size zero.  The attached patch sets the resulting
length from the source.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald


PR fortran/99206 - ICE in add_init_expr_to_sym, at fortran/decl.c:1980

Make sure that the string length is properly set during simplification
even when the resulting array is zero-sized.

gcc/fortran/ChangeLog:

PR fortran/99206
* simplify.c (gfc_simplify_reshape): Set string length for
character arguments.

gcc/testsuite/ChangeLog:

PR fortran/99206
* gfortran.dg/reshape_zerosize_4.f90: New test.

diff --git a/gcc/fortran/simplify.c b/gcc/fortran/simplify.c
index 35f267db588..388aca7c38c 100644
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -6887,6 +6887,8 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr *shape_exp,
 			   &source->where);
   if (source->ts.type == BT_DERIVED)
 result->ts.u.derived = source->ts.u.derived;
+  if (source->ts.type == BT_CHARACTER && result->ts.u.cl == NULL)
+result->ts = source->ts;
   result->rank = rank;
   result->shape = gfc_get_shape (rank);
   for (i = 0; i < rank; i++)
diff --git a/gcc/testsuite/gfortran.dg/reshape_zerosize_4.f90 b/gcc/testsuite/gfortran.dg/reshape_zerosize_4.f90
new file mode 100644
index 000..d9a0d217271
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/reshape_zerosize_4.f90
@@ -0,0 +1,14 @@
+! { dg-do run }
+! PR fortran/99206 - ICE in add_init_expr_to_sym, at fortran/decl.c:1980
+! Check simplifier of RESHAPE for character arrays.
+
+program p
+  character(*),parameter :: a(0) = reshape([  'ab'], [0])
+  character(*,kind=4), parameter :: c(0) = reshape([4_'cd'], [0])
+  if (len (a)   /= 2) stop 1
+  if (len (reshape (  ['ab'], [0])) /= 2) stop 2
+  if (kind(reshape (  ['ab'], [0])) /= 1) stop 3
+  if (len (c)   /= 2) stop 4
+  if (len (reshape ([4_'cd'], [0])) /= 2) stop 5
+  if (kind(reshape ([4_'cd'], [0])) /= 4) stop 6
+end


Re: [PATCH, V2] Add conversions between _Float128 and Decimal.

2021-02-22 Thread Segher Boessenkool
On Mon, Feb 22, 2021 at 03:52:52PM -0500, Michael Meissner wrote:
> On Sat, Feb 20, 2021 at 06:33:12PM -0600, Segher Boessenkool wrote:
> > So if built with a glibc version before 2.32 (less than a year old) it
> > will give the wrong answer.  This needs improving, or it will be another
> > eight or so years until this is generally usable.
> 
> But until the long double format default is changed to be IEEE 128-bit 
> floating
> point,  only the people who explicitly convert between __float128/_Float128 
> and
> the decimal types will see the issue.
> 
> In order to switch to the IEEE 128-bit floating point long double, you need at
> least GLIBC 2.32, and you will get full accuracy.

But see my point.  It only works if you can break all links with the
past.  Which cannot work if you have to work together with other
communities, don't just do your own thing.

("Full accuracy" is a strange thing to say...  doubele-double has (much)
higher precision for some numbers...  but if you want IEEE float
semantics, double-double is simply *wrong*).

> > > The compilers built fine providing I recompiled gmp, mpc, and mpfr with 
> > > the
> > > appropriate long double options.  There were a few differences in the test
> > > suite runs that will be addressed in later patches, but over all it works
> > > well.
> > 
> > What kind of differences?  I assume you checked all, and all differences
> > are an improvement, or the differences are inconsequential and the test
> > is not very good?
> 
> I have submitted patches for these before, and I will shortly ping or resubmit
> them again.  But I felt this was more import to get in before worrying about
> changes to the testsuite.

That isn't what I asked?  I don't want to investigate all these bugs, I
just asked if you did and found nothing that should hold up this patch.
Just "yes" would work :-)

> The following two tests fail because they are testing the old libquadmath 'q'
> built-ins, and there is a subtle difference between using the _Float128
> built-in function for the nans function and the long double built-in function
> for nans.  In particular, the signalling NaN is silently converted to a quiet
> NaN.
> 
> * gcc.dg/torture/float128-nan.c
> * gcc.target/powerpc/nan128-1.c

Both tests needs to use -fsignaling-nans.  Do they work better with that
fixed?

> > > +_Float128
> > > +__strtokf (const char *string, char **endptr)
> > > +{
> > > +  long double num;
> > > +
> > > +  if (__strtoieee128)
> > > +return __strtoieee128 (string, endptr);
> > > +
> > > +  num = strtold (string, endptr);
> > > +  return (_Float128) num;
> > 
> > Do not cast return values please.  All casts you do should be *needed*,
> > have a purpose.
> 
> Well it is changing type (num is long double, i.e. IBM 128-bit long double) 
> and
> the return is _Float128.

And this does not make a difference since function prototypes exist.  So
please don't.  *All* casts there are should be there for a reason; and
the few casts you end up using all look like a "danger!" sign to a
careful reader (they hide compiler warnings, and that is the *least*
trouble they cause!)


Thanks,


Segher


RE: [PATCH] aarch64: Add internal tune flag to minimise VL-based scalar ops

2021-02-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  On Behalf Of
> Kyrylo Tkachov via Gcc-patches
> Sent: 16 February 2021 15:20
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] aarch64: Add internal tune flag to minimise VL-based scalar
> ops
> 
> Hi all,
> 
> This patch introduces an internal tune flag to break up VL-based scalar ops
> into a GP-reg scalar op with the VL read kept separate. This can be preferable
> on some CPUs.
> 
> I went for a tune param rather than extending the rtx costs as our RTX costs
> tables aren't set up to track
> this intricacy.
> 
> I've confirmed that on the simple loop:
> void vadd (int *dst, int *op1, int *op2, int count)
> {
>   for (int i = 0; i < count; ++i)
> dst[i] = op1[i] + op2[i];
> }
> 
> we now split the incw into a cntw outside the loop and the add inside.
> 
> +   cntwx5
> ...
> loop:
> -   incwx4
> +   add x4, x4, x5
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> This is a minimally invasive fix to help the performance of just -
> mcpu=neoverse-v1 for
> GCC 11 so I'd like to have it in stage4 if possible,
> but I'd appreciate some feedback on the risk assessment of it.

After some offline discussion and evaluation with Richard Sandiford we are 
happy with the patch to go in now as it's a minimal codegen change isolated to 
a specific non-default tuning, so I've pushed the patch.
Thanks,
Kyrill

> 
> Thanks,
> Kyrill
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-tuning-flags.def (cse_sve_vl_constants):
>   Define.
>   * config/aarch64/aarch64.md (add3): Force
> CONST_POLY_INT immediates
>   into a register when the above is enabled.
>   * config/aarch64/aarch64.c (neoversev1_tunings):
>   AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS.
>   (aarch64_rtx_costs): Use
> AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS.
> 
> gcc/testsuite/
> 
>   * gcc.target/aarch64/sve/cse_sve_vl_constants_1.c: New test.


[PATCH] [libstdc++] Refactor/cleanup of atomic wait implementation

2021-02-22 Thread Thomas Rodgers
From: Thomas Rodgers 

This is a substantial rewrite of the atomic wait/notify (and timed wait
counterparts) implementation.

The previous __platform_wait looped on EINTR however this behavior is
not required by the standard. A new _GLIBCXX_HAVE_PLATFORM_WAIT macro
now controls whether wait/notify are implemented using a platform
specific primitive or with a platform agnostic mutex/condvar. This
patch only supplies a definition for linux futexes. A future update
could add support __ulock_wait/wake on Darwin, for instance.

The members of __waiters were lifted to a new base class. The members
are now arranged such that overall sizeof(__waiters_base) fits in two
cache lines (on platforms with at least 64 byte cache lines). The
definition will also use destructive_interference_size for this if it
is available.

The __waiters type is now specific to untimed waits. Timed waits have a
corresponding __timed_waiters type. Much of the code has been moved from
the previous __atomic_wait() free function to the __waiter_base template
and a __waiter derived type is provided to implement the un-timed wait
operations. A similar change has been made to the timed wait
implementation.

The __atomic_spin code has been extended to take a spin policy which is
invoked after the initial busy wait loop. The default policy is to
return from the spin. The timed wait code adds a timed backoff spinning
policy. The code from  which implements this_thread::sleep_for,
sleep_until has been moved to a new  header
which allows the thread sleep code to be consumed without pulling in the
whole of .

The entry points into the wait/notify code have been restructured to
support either -
   * Testing the current value of the atomic stored at the given address
 and waiting on a notification.
   * Applying a predicate to determine if the wait was satisfied.
The entry points were renamed to make it clear that the wait and wake
operations operate on addresses. The first variant takes the expected
value and a function which returns the current value that should be used
in comparison operations, these operations are named with a _v suffix
(e.g. 'value'). All atomic<_Tp> wait/notify operations use the first
variant. Barriers, latches and semaphores use the predicate variant.

This change also centralizes what it means to compare values for the
purposes of atomic::wait rather than scattering through individual
predicates.

This change also centralizes the repetitive code which adjusts for
different user supplied clocks (this should be moved elsewhere
and all such adjustments should use a common implementation).

libstdc++-v3/ChangeLog:
* include/Makefile.am: Add new  header.
* include/Makefile.in: Regenerate.
* include/bits/atomic_base.h: Adjust all calls
to __atomic_wait/__atomic_notify for new call signatures.
* include/bits/atomic_wait.h: Extensive rewrite.
* include/bits/atomic_timed_wait.h: Likewise.
* include/bits/semaphore_base.h: Adjust all calls
to __atomic_wait/__atomic_notify for new call signatures.
* include/bits/std_thread_sleep.h: New file.
* include/std/atomic: Likewise.
* include/std/barrier: Likewise.
* include/std/latch: Likewise.
* testsuite/29_atomics/atomic/wait_notify/bool.cc: Simplify
test.
* testsuite/29_atomics/atomic/wait_notify/generic.cc: Likewise.
* testsuite/29_atomics/atomic/wait_notify/pointers.cc: Likewise.
* testsuite/29_atomics/atomic_flag/wait_notify.cc: Likewise.
* testsuite/29_atomics/atomic_float/wait_notify.cc: Likewise.
* testsuite/29_atomics/atomic_integral/wait_notify.cc: Likewise.
* testsuite/29_atomics/atomic_ref/wait_notify.cc: Likewise.
---
 libstdc++-v3/include/Makefile.am  |   1 +
 libstdc++-v3/include/Makefile.in  |   1 +
 libstdc++-v3/include/bits/atomic_base.h   |  36 +-
 libstdc++-v3/include/bits/atomic_timed_wait.h | 398 +++--
 libstdc++-v3/include/bits/atomic_wait.h   | 400 +++---
 libstdc++-v3/include/bits/semaphore_base.h|  73 +---
 libstdc++-v3/include/bits/std_thread_sleep.h  | 119 ++
 libstdc++-v3/include/std/atomic   |  15 +-
 libstdc++-v3/include/std/barrier  |   4 +-
 libstdc++-v3/include/std/latch|   4 +-
 libstdc++-v3/include/std/thread   |  68 +--
 .../29_atomics/atomic/wait_notify/bool.cc |  37 +-
 .../29_atomics/atomic/wait_notify/generic.cc  |  19 +-
 .../29_atomics/atomic/wait_notify/pointers.cc |  36 +-
 .../29_atomics/atomic_flag/wait_notify/1.cc   |  37 +-
 .../29_atomics/atomic_float/wait_notify.cc|  26 +-
 .../29_atomics/atomic_integral/wait_notify.cc |  73 ++--
 .../29_atomics/atomic_ref/wait_notify.cc  |  74 +---
 18 files changed, 804 insertions(+), 617 deletions(-)
 create mode 100644 libstdc++-v3/include/bits/std_thread_sleep.h

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v

[PATCH 2/2] libstdc++: Fix endianness issue with IBM long double [PR98384]

2021-02-22 Thread Patrick Palka via Gcc-patches
The logic in std::to_chars for extracting the high- and low-order parts
of a IBM long double value does the right thing on powerpc64le, but not
on powerpc64be.  This patch makes the extraction endian-agnostic, which
fixes the execution FAIL of to_chars/long_double.cc on powerpc64be.

Tested on powerpc64le and powerpc64be.  Does this look OK for trunk?

libstdc++-v3/ChangeLog:

PR libstdc++/98384
* src/c++17/floating_to_chars.cc (get_ieee_repr): Extract
the high- and low-order parts from an IBM long double value
in an endian-agnostic way.
---
 libstdc++-v3/src/c++17/floating_to_chars.cc | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/src/c++17/floating_to_chars.cc 
b/libstdc++-v3/src/c++17/floating_to_chars.cc
index a50548acae7..4b2f85c1c1a 100644
--- a/libstdc++-v3/src/c++17/floating_to_chars.cc
+++ b/libstdc++-v3/src/c++17/floating_to_chars.cc
@@ -395,11 +395,11 @@ namespace
   // of the high part, and we merge the mantissa of the high part with the
   // mantissa (and the implicit leading bit) of the low part.
   using uint_t = unsigned __int128;
-  uint_t value_bits = 0;
-  memcpy(&value_bits, &value, sizeof(value_bits));
+  uint64_t value_bits[2] = {};
+  memcpy(value_bits, &value, sizeof(value_bits));
 
-  const uint64_t value_hi = value_bits;
-  const uint64_t value_lo = value_bits >> 64;
+  const uint64_t value_hi = value_bits[0];
+  const uint64_t value_lo = value_bits[1];
 
   uint64_t mantissa_hi = value_hi & ((1ull << 52) - 1);
   unsigned exponent_hi = (value_hi >> 52) & ((1ull << 11) - 1);
-- 
2.30.1.559.g2283e0e9af



[PATCH 1/2] libstdc++: Robustify long double std::to_chars testcase [PR98384]

2021-02-22 Thread Patrick Palka via Gcc-patches
This makes the hexadecimal section of the long double std::to_chars
testcase more robust by avoiding false-negative FAILs due to printf
using a different leading hex digit than us, and by additionally
verifying the correctness of the hexadecimal form via round-tripping
through std::from_chars.

Tested on x86, x86_64, powerpc64be, powerpc64le and aarch64.  Does this
look OK for trunk?

libstdc++-v3/ChangeLog:

PR libstdc++/98384
* testsuite/20_util/to_chars/long_double.cc: Include .
(test01): Simplify verifying the nearby values by using a
2-iteration loop and a dedicated output buffer to check that the
nearby values are different.  Factor out the printf-based
verification into a local function, and check that the leading
hex digits agree before comparing with the output of printf.
Also verify the output by round-tripping it through from_chars.
---
 .../testsuite/20_util/to_chars/long_double.cc | 73 ---
 1 file changed, 47 insertions(+), 26 deletions(-)

diff --git a/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc 
b/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc
index 4f72cb65400..34cc03034cc 100644
--- a/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc
+++ b/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -50,6 +51,38 @@ namespace detail
 void
 test01()
 {
+  // Verifies correctness of the hexadecimal form [BEGIN,END) for VALUE by
+  // round-tripping it through from_chars (if available).
+  auto verify_via_from_chars = [] (char *begin, char *end, long double value) {
+#if __cpp_lib_to_chars >= 201611L || _GLIBCXX_HAVE_USELOCALE
+long double roundtrip;
+auto result = from_chars(begin, end, roundtrip, chars_format::hex);
+VERIFY( result.ec == errc{} );
+VERIFY( result.ptr == end );
+VERIFY( roundtrip == value );
+#endif
+  };
+
+  // Verifies correctness of the null-terminated hexadecimal form at BEGIN
+  // for VALUE and PRECISION by comparing it with the output of printf's %La
+  // conversion specifier.
+  auto verify_via_printf = [] (char *begin, long double value,
+  optional precision = nullopt) {
+char printf_buffer[1024] = {};
+if (precision.has_value())
+  sprintf(printf_buffer, "%.*La", precision.value(), value);
+else
+  sprintf(printf_buffer, "%La", value);
+
+// Only compare with the output of printf if the leading hex digits agree.
+// If the leading hex digit of our form doesn't agree with that of printf,
+// then the two forms may still be equivalent (e.g. 1.1p+0 vs 8.8p-3), so 
we
+// don't want a FAIL in this case.  But if the leading hex digits do agree,
+// then we do expect the two forms to be the same.
+if (printf_buffer[strlen("0x")] == begin[0])
+  VERIFY( !strcmp(begin, printf_buffer+strlen("0x")) );
+  };
+
   const long double hex_testcases[]
 = { detail::nextdownl(numeric_limits::max()),
detail::nextupl(numeric_limits::min()),
@@ -92,38 +125,27 @@ test01()
if (testcase == 0.0L || isinf(testcase))
  continue;
 
-   char to_chars_buffer[1024], printf_buffer[1024];
-   memset(to_chars_buffer, '\0', sizeof(to_chars_buffer));
-   memset(printf_buffer, '\0', sizeof(printf_buffer));
-
+   char to_chars_buffer[1024] = {};
auto result = to_chars(begin(to_chars_buffer), end(to_chars_buffer),
   testcase, chars_format::hex);
VERIFY( result.ec == errc{} );
*result.ptr = '\0';
-   sprintf(printf_buffer, "%La", testcase);
-   VERIFY( !strcmp(to_chars_buffer, printf_buffer+strlen("0x")) );
+   verify_via_from_chars(begin(to_chars_buffer), result.ptr, testcase);
+   verify_via_printf(to_chars_buffer, testcase);
 
+   // Verify the nearby values, and also check they have a different
+   // shortest form.
+   for (long double nearby
+: { detail::nextdownl(testcase), detail::nextupl(testcase) })
  {
-   // Verify that the nearby values have a different shortest form.
-   testcase = detail::nextdownl(testcase);
-   result = to_chars(begin(to_chars_buffer), end(to_chars_buffer),
- testcase, chars_format::hex);
-   VERIFY( result.ec == errc{} );
-   *result.ptr = '\0';
-   VERIFY( strcmp(to_chars_buffer, printf_buffer+strlen("0x")) != 0);
-   sprintf(printf_buffer, "%La", testcase);
-   VERIFY( !strcmp(to_chars_buffer, printf_buffer+strlen("0x")) );
-
-   testcase = detail::nextupl(detail::nextupl(testcase));
-   result = to_chars(begin(to_chars_buffer), end(to_chars_buffer),
- testcase, chars_format::hex);
+   char nearby_buffer[1024] = {};
+   result = to_chars(begin(nearby_buffer), end(nearby_buffer),
+ 

Re: [PATCH 2/2] libstdc++: Fix endianness issue with IBM long double [PR98384]

2021-02-22 Thread Jonathan Wakely via Gcc-patches
On Mon, 22 Feb 2021, 22:13 Patrick Palka via Libstdc++, <
libstd...@gcc.gnu.org> wrote:

> The logic in std::to_chars for extracting the high- and low-order parts
> of a IBM long double value does the right thing on powerpc64le, but not
> on powerpc64be.  This patch makes the extraction endian-agnostic, which
> fixes the execution FAIL of to_chars/long_double.cc on powerpc64be.
>
> Tested on powerpc64le and powerpc64be.  Does this look OK for trunk?
>


I was going to suggest __builtin_unpack_longdouble but actually since you
want the bits this is better. You'd need another step to get the bits from
each double.

OK for trunk.


Re: [PATCH] C++: target attribute - local decl

2021-02-22 Thread Jason Merrill via Gcc-patches

On 2/18/21 7:15 AM, Martin Liška wrote:

We crash when target attribute get_function_versions_dispatcher is called
for a function that is not registered in call graph. This fixes that
by emitting a new error.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/cp/ChangeLog:

 PR c++/99108
 * call.c (get_function_version_dispatcher): Do not parse
 target attribute for a function with a missing declaration.

gcc/testsuite/ChangeLog:

 PR c++/99108
 * g++.target/i386/pr99108.C: New test.
---
  gcc/cp/call.c   |  8 +++-
  gcc/testsuite/g++.target/i386/pr99108.C | 18 ++
  2 files changed, 25 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.target/i386/pr99108.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 186feef6fe3..844853e504e 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -8386,8 +8386,14 @@ get_function_version_dispatcher (tree fn)
    && DECL_FUNCTION_VERSIONED (fn));

    gcc_assert (targetm.get_function_versions_dispatcher);
-  dispatcher_decl = targetm.get_function_versions_dispatcher (fn);
+  if (cgraph_node::get (fn) == NULL)
+    {
+  error_at (DECL_SOURCE_LOCATION (fn), "missing declaration "
+    "for a multiversioned function");


This seems like the wrong message.  The declaration isn't missing, 
you're using its source location for the error.  Do you mean 
"definition"?  But adding definitions earlier in the file doesn't help. 
And having file-scope declarations without definitions is fine.


The problem seems to be with the handling of local decls.  If 
DECL_LOCAL_DECL_P, you need to look at DECL_LOCAL_DECL_ALIAS to find the 
namespace-scope decl.  But then if there is no preceding namespace-scope 
declaration, the new decl created by push_local_extern_decl_alias 
doesn't have a cgraph node, either.  I guess maybe_function_versions 
also needs to look through DECL_LOCAL_DECL_ALIAS.



+  return NULL;
+    }

+  dispatcher_decl = targetm.get_function_versions_dispatcher (fn);
    if (dispatcher_decl == NULL)
  {
    error_at (input_location, "use of multiversioned function "
diff --git a/gcc/testsuite/g++.target/i386/pr99108.C 
b/gcc/testsuite/g++.target/i386/pr99108.C

new file mode 100644
index 000..b0c4ffa2688
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr99108.C
@@ -0,0 +1,18 @@
+/* PR c++/99108 */
+/* { dg-do compile { target c++20 } } */
+/* { dg-require-ifunc "" }  */
+
+struct A {
+  void foo(auto);
+};
+void A::foo(auto)
+{
+  int f(void) __attribute__((target("default")));
+  int f(void) __attribute__((target("arch=atom"))); /* { dg-error 
"missing declaration for a multiversioned function" } */

+  int b = f();
+}
+void bar(void)
+{
+  A c;
+  c.foo(7);
+}




Re: [PATCH v7] Practical improvement to libgcc complex divide

2021-02-22 Thread Patrick McGehearty via Gcc-patches

I decided to use float_h_prefix instead of modename or mode prefix
as Joseph has better insight as to naming expectations when
others are working in gcc/c-family/c-cppbuiltin.c.

All requested changes made and tested. Patch coming shortly.

- patrick


On 2/19/2021 5:48 PM, Joseph Myers wrote:

On Fri, 19 Feb 2021, Patrick McGehearty via Gcc-patches wrote:


Here you're properly computing the mapping from mode to float.h macro
prefix (though I think "modename" is a confusing name for the variable
used for float.h macro prefixes; to me, "modename" suggests the names such
as SF or DF, which are actually "name" here, and something like
"float_h_prefix" would be better than "modename").

float_h_prefix puts me in mind of half-precision floating point rather than
float.h.
How would modeprefix do instead of modename?

I'm not sure modeprefix is better; you could say SF is the prefix of
SFmode, for example.  Hence my suggestion of a name that makes clear it's
the prefix *in float.h*.  typeprefix might be a possibility, as it's
really a prefix associated with a corresponding type rather than directly
with the mode.





[PATCH] Require SHF_GNU_RETAIN support for retain test

2021-02-22 Thread H.J. Lu via Gcc-patches
Since retain attribute requires SHF_GNU_RETAIN, run retain tests only
if SHF_GNU_RETAIN is supported.

PR testsuite/99173
* c-c++-common/attr-retain-5.c: Require R_flag_in_section.
* c-c++-common/attr-retain-6.c: Likewise.
* c-c++-common/attr-retain-7.c: Likewise.
* c-c++-common/attr-retain-8.c: Likewise.
* c-c++-common/attr-retain-9.c: Likewise.
---
 gcc/testsuite/c-c++-common/attr-retain-5.c | 2 +-
 gcc/testsuite/c-c++-common/attr-retain-6.c | 2 +-
 gcc/testsuite/c-c++-common/attr-retain-7.c | 2 +-
 gcc/testsuite/c-c++-common/attr-retain-8.c | 2 +-
 gcc/testsuite/c-c++-common/attr-retain-9.c | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/attr-retain-5.c 
b/gcc/testsuite/c-c++-common/attr-retain-5.c
index ee6e2c4e054..1f3f8bfb5c5 100644
--- a/gcc/testsuite/c-c++-common/attr-retain-5.c
+++ b/gcc/testsuite/c-c++-common/attr-retain-5.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target R_flag_in_section } } */
 /* { dg-skip-if "non-ELF target" { *-*-darwin* powerpc*-*-aix* } } */
 /* { dg-options "-Wall -O2" } */
 
diff --git a/gcc/testsuite/c-c++-common/attr-retain-6.c 
b/gcc/testsuite/c-c++-common/attr-retain-6.c
index 9aead148ec8..ed6cfae2948 100644
--- a/gcc/testsuite/c-c++-common/attr-retain-6.c
+++ b/gcc/testsuite/c-c++-common/attr-retain-6.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target R_flag_in_section } } */
 /* { dg-skip-if "non-ELF target" { *-*-darwin* powerpc*-*-aix* } } */
 /* { dg-options "-Wall -O2" } */
 
diff --git a/gcc/testsuite/c-c++-common/attr-retain-7.c 
b/gcc/testsuite/c-c++-common/attr-retain-7.c
index 4c1673583db..c56d18a88e0 100644
--- a/gcc/testsuite/c-c++-common/attr-retain-7.c
+++ b/gcc/testsuite/c-c++-common/attr-retain-7.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target R_flag_in_section } } */
 /* { dg-skip-if "non-ELF target" { *-*-darwin* powerpc*-*-aix* } } */
 /* { dg-options "-Wall -O2" } */
 
diff --git a/gcc/testsuite/c-c++-common/attr-retain-8.c 
b/gcc/testsuite/c-c++-common/attr-retain-8.c
index 7e067f56274..c1a9b52972e 100644
--- a/gcc/testsuite/c-c++-common/attr-retain-8.c
+++ b/gcc/testsuite/c-c++-common/attr-retain-8.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target R_flag_in_section } } */
 /* { dg-skip-if "non-ELF target" { *-*-darwin* powerpc*-*-aix* } } */
 /* { dg-options "-Wall -O2" } */
 
diff --git a/gcc/testsuite/c-c++-common/attr-retain-9.c 
b/gcc/testsuite/c-c++-common/attr-retain-9.c
index 81accc0ee91..e6d7c0b43da 100644
--- a/gcc/testsuite/c-c++-common/attr-retain-9.c
+++ b/gcc/testsuite/c-c++-common/attr-retain-9.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target R_flag_in_section } } */
 /* { dg-skip-if "non-ELF target" { *-*-darwin* powerpc*-*-aix* } } */
 /* { dg-options "-Wall -O2" } */
 
-- 
2.29.2



Re: [PATCH] handle bad __dynamic_cast more gracefully (PR 99074)

2021-02-22 Thread Jason Merrill via Gcc-patches

On 2/13/21 7:31 PM, Martin Sebor wrote:

The test case in PR 99074 invokes dynamic_cast with the this pointer
in a non-static member function called on a null pointer.  The call
is, of course, undefined and other different circumstances would be
diagnosed by -Wnonnull.   Unfortunately, in the test case, the null
pointer is the result of inlining and constant propagation and so
detected neither by the front end -Wnonnull nor by the middle end.
The program ends up passing it to __dynamic_cast() which then
crashes at runtime (again, not surprising for undefined behavior.

However, the reporter says the code behaved gracefully (didn't crash)
when compiled with GCC 4.8, and in my tests it also doesn't crash
when compiled with Clang or ICC.  I looked to see if it's possible
to do better and it seems it is.

The attached patch improves things by changing __dynamic_cast to
fail by returning null when the first argument is null, and also


This hunk is OK.


by declaring __dynamic_cast with attribute nonnull so that invalid
calls to it with a constant null pointer can be detected at compile
time.


This is not; dynamic_cast is specified to return null for a null operand.

"If v is a null pointer value, the result is a null pointer value."

The undefined behavior is the call to _to_object, not the dynamic_cast.


Since the test case is undefined it seems borderline whether this
can strictly be considered a regression, even if some previous
releases handled it more gracefully.


Indeed.  But handling the null case in __dynamic_cast as well as in the 
compiler seems harmless enough.


Jason



[PATCH v8] Practical improvement to libgcc complex divide

2021-02-22 Thread Patrick McGehearty via Gcc-patches
Changes in this version from Version 7:
gcc/c-family/c-cppbuiltin.c
Changed modename to float_h_prefix
Removed (mode == TYPE_MODE...) code left over from earlier approach.
libgcc/libgcc2.c
Removed conditional use of XF constants in TF case.
Also left over from earlier approach.

Tested _Float64x case on x86. Works as expected.

Correctness and performance test programs used during development of
this project may be found in the attachment to:
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg254210.html

Summary of Purpose

This patch to libgcc/libgcc2.c __divdc3 provides an
opportunity to gain important improvements to the quality of answers
for the default complex divide routine (half, float, double, extended,
long double precisions) when dealing with very large or very small exponents.

The current code correctly implements Smith's method (1962) [2]
further modified by c99's requirements for dealing with NaN (not a
number) results. When working with input values where the exponents
are greater than *_MAX_EXP/2 or less than -(*_MAX_EXP)/2, results are
substantially different from the answers provided by quad precision
more than 1% of the time. This error rate may be unacceptable for many
applications that cannot a priori restrict their computations to the
safe range. The proposed method reduces the frequency of
"substantially different" answers by more than 99% for double
precision at a modest cost of performance.

Differences between current gcc methods and the new method will be
described. Then accuracy and performance differences will be discussed.

Background

This project started with an investigation related to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59714.  Study of Beebe[1]
provided an overview of past and recent practice for computing complex
divide. The current glibc implementation is based on Robert Smith's
algorithm [2] from 1962.  A google search found the paper by Baudin
and Smith [3] (same Robert Smith) published in 2012. Elen Kalda's
proposed patch [4] is based on that paper.

I developed two sets of test data by randomly distributing values over
a restricted range and the full range of input values. The current
complex divide handled the restricted range well enough, but failed on
the full range more than 1% of the time. Baudin and Smith's primary
test for "ratio" equals zero reduced the cases with 16 or more error
bits by a factor of 5, but still left too many flawed answers. Adding
debug print out to cases with substantial errors allowed me to see the
intermediate calculations for test values that failed. I noted that
for many of the failures, "ratio" was a subnormal. Changing the
"ratio" test from check for zero to check for subnormal reduced the 16
bit error rate by another factor of 12. This single modified test
provides the greatest benefit for the least cost, but the percentage
of cases with greater than 16 bit errors (double precision data) is
still greater than 0.027% (2.7 in 10,000).

Continued examination of remaining errors and their intermediate
computations led to the various tests of input value tests and scaling
to avoid under/overflow. The current patch does not handle some of the
rare and most extreme combinations of input values, but the random
test data is only showing 1 case in 10 million that has an error of
greater than 12 bits. That case has 18 bits of error and is due to
subtraction cancellation. These results are significantly better
than the results reported by Baudin and Smith.

Support for half, float, double, extended, and long double precision
is included as all are handled with suitable preprocessor symbols in a
single source routine. Since half precision is computed with float
precision as per current libgcc practice, the enhanced algorithm
provides no benefit for half precision and would cost performance.
Further investigation showed changing the half precision algorithm
to use the simple formula (real=a*c+b*d imag=b*c-a*d) caused no
loss of precision and modest improvement in performance.

The existing constants for each precision:
float: FLT_MAX, FLT_MIN;
double: DBL_MAX, DBL_MIN;
extended and/or long double: LDBL_MAX, LDBL_MIN
are used for avoiding the more common overflow/underflow cases.  This
use is made generic by defining appropriate __LIBGCC2_* macros in
c-cppbuiltin.c.

Tests are added for when both parts of the denominator have exponents
small enough to allow shifting any subnormal values to normal values
all input values could be scaled up without risking overflow. That
gained a clear improvement in accuracy. Similarly, when either
numerator was subnormal and the other numerator and both denominator
values were not too large, scaling could be used to reduce risk of
computing with subnormals.  The test and scaling values used all fit
within the allowed exponent range for each precision required by the C
standard.

Float precision has more difficulty with getting correct answers than
double pre

Re: [PATCH] Require SHF_GNU_RETAIN support for retain test

2021-02-22 Thread Jakub Jelinek via Gcc-patches
On Mon, Feb 22, 2021 at 03:06:58PM -0800, H.J. Lu via Gcc-patches wrote:
> Since retain attribute requires SHF_GNU_RETAIN, run retain tests only
> if SHF_GNU_RETAIN is supported.
> 
>   PR testsuite/99173
>   * c-c++-common/attr-retain-5.c: Require R_flag_in_section.
>   * c-c++-common/attr-retain-6.c: Likewise.
>   * c-c++-common/attr-retain-7.c: Likewise.
>   * c-c++-common/attr-retain-8.c: Likewise.
>   * c-c++-common/attr-retain-9.c: Likewise.

Ok, thanks.

Jakub



[committed] analyzer: handle error/error_at_line [PR99196]

2021-02-22 Thread David Malcolm via Gcc-patches
PR analyzer/99196 describes a false positive from -fanalyzer due to
the analyzer not "knowing" that calls to GNU libc's error(3) with a
nonzero status terminate the process and thus don't return.

This patch fixes the false positive by special-casing "error" and
"error_at_line".

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r11-7333-g5ee4ba031dd9fc60bf2494ca30f46c0acaa34805.

gcc/analyzer/ChangeLog:
PR analyzer/99196
* engine.cc (exploded_node::on_stmt): Provide terminate_path
flag as a way for on_call_pre to terminate the current analysis
path.
* region-model-impl-calls.cc (call_details::num_args): New.
(region_model::impl_call_error): New.
* region-model.cc (region_model::on_call_pre): Add param
"out_terminate_path".  Handle "error" and "error_at_line".
* region-model.h (call_details::num_args): New decl.
(region_model::on_call_pre): Add param "out_terminate_path".
(region_model::impl_call_error): New decl.

gcc/testsuite/ChangeLog:
PR analyzer/99196
* gcc.dg/analyzer/error-1.c: New test.
* gcc.dg/analyzer/error-2.c: New test.
* gcc.dg/analyzer/error-3.c: New test.
---
 gcc/analyzer/engine.cc  |  6 ++-
 gcc/analyzer/region-model-impl-calls.cc | 38 ++
 gcc/analyzer/region-model.cc| 22 -
 gcc/analyzer/region-model.h |  7 ++-
 gcc/testsuite/gcc.dg/analyzer/error-1.c | 66 +
 gcc/testsuite/gcc.dg/analyzer/error-2.c | 48 ++
 gcc/testsuite/gcc.dg/analyzer/error-3.c | 11 +
 7 files changed, 194 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/error-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/error-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/error-3.c

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 89b3be22ecb..0edeb490a53 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -1132,6 +1132,7 @@ exploded_node::on_stmt (exploded_graph &eg,
  stmt);
 
   bool unknown_side_effects = false;
+  bool terminate_path = false;
 
   switch (gimple_code (stmt))
 {
@@ -1203,7 +1204,7 @@ exploded_node::on_stmt (exploded_graph &eg,
  }
else
  unknown_side_effects
-   = state->m_region_model->on_call_pre (call, &ctxt);
+   = state->m_region_model->on_call_pre (call, &ctxt, &terminate_path);
   }
   break;
 
@@ -1215,6 +1216,9 @@ exploded_node::on_stmt (exploded_graph &eg,
   break;
 }
 
+  if (terminate_path)
+return on_stmt_flags::terminate_path ();
+
   bool any_sm_changes = false;
   int sm_idx;
   sm_state_map *smap;
diff --git a/gcc/analyzer/region-model-impl-calls.cc 
b/gcc/analyzer/region-model-impl-calls.cc
index d3b66ba7375..72404a5bc61 100644
--- a/gcc/analyzer/region-model-impl-calls.cc
+++ b/gcc/analyzer/region-model-impl-calls.cc
@@ -96,6 +96,14 @@ call_details::maybe_set_lhs (const svalue *result) const
 return false;
 }
 
+/* Return the number of arguments used by the call statement.  */
+
+unsigned
+call_details::num_args () const
+{
+  return gimple_call_num_args (m_call);
+}
+
 /* Get argument IDX at the callsite as a tree.  */
 
 tree
@@ -240,6 +248,36 @@ region_model::impl_call_calloc (const call_details &cd)
   return true;
 }
 
+/* Handle the on_call_pre part of "error" and "error_at_line" from
+   GNU's non-standard .
+   MIN_ARGS identifies the minimum number of expected arguments
+   to be consistent with such a call (3 and 5 respectively).
+   Return true if handling it as one of these functions.
+   Write true to *OUT_TERMINATE_PATH if this execution path should be
+   terminated (e.g. the function call terminates the process).  */
+
+bool
+region_model::impl_call_error (const call_details &cd, unsigned min_args,
+  bool *out_terminate_path)
+{
+  /* Bail if not enough args.  */
+  if (cd.num_args () < min_args)
+return false;
+
+  /* Initial argument ought to be of type "int".  */
+  if (cd.get_arg_type (0) != integer_type_node)
+return false;
+
+  /* The process exits if status != 0, so it only continues
+ for the case where status == 0.
+ Add that constraint, or terminate this analysis path.  */
+  tree status = cd.get_arg_tree (0);
+  if (!add_constraint (status, EQ_EXPR, integer_zero_node, cd.get_ctxt ()))
+*out_terminate_path = true;
+
+  return true;
+}
+
 /* Handle the on_call_post part of "free", after sm-handling.
 
If the ptr points to an underlying heap region, delete the region,
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 159b0f18654..2053f8f79bb 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -741,10 +741,14 @@ region_model::on_assignment (const gassign *assign, 
region_model_context *ctxt)
 
Return true if the function call has u

Re: [PATCH] doc: c: c++: Document the C/C++ extended asm empty input constraints

2021-02-22 Thread Neven Sajko via Gcc-patches
On Mon, 22 Feb 2021 at 16:30, Segher Boessenkool
 wrote:
>
> Hi!
>
> First off, thanbk you for the patch!

You're welcome!

> On Mon, Feb 15, 2021 at 11:22:52PM +, Neven Sajko via Gcc-patches wrote:
> > There is a long-standing, but undocumented GCC inline assembly feature
> > that's part of the extended asm GCC extension to C and C++: extended
> > asm empty input constraints.
>
> There is no such thing.  *All* empty constraints have the same
> semantics: anything whatsoever will do.  Any register, any constant, any
> memory.

What I was trying to express is that input operand constraints are
unlike output operand constraints in that they can be empty. I now
realize I ended up being slightly confusing, though.

> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -1131,7 +1131,102 @@ the addressing register.
> >  @subsection Simple Constraints
> >  @cindex simple constraints
> >
> > -The simplest kind of constraint is a string full of letters, each of
> > +An input constraint is allowed to be an empty string, in which case it is
> > +called an empty input constraint.
>
> That is just shorthand for "empty constraint that is used for an input
> operand".  It is not special, and it *is* documented:
> https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#Simple-Constraints
>   The simplest kind of constraint is a string full of letters, each of
>   which describes one kind of operand that is permitted.
>
> A length zero string is allowed as well.  This could be made more
> explicit sure; OTOH, it isn't very often useful.  So your example
> (using it for making a dependency) is certainly useful to have.  But
> it is not a special case at all.

Syntactically, it's not a special case; but I definitely think the
semantics could be better documented. Proof:

* There's a relevant Stack Overflow question. If I didn't know better
I'd conclude from the discussion there that empty input constraints
are undocumented and unsupported, and there would surely be an answer
if the documentation on the GCC side was a bit better:
https://stackoverflow.com/questions/63305223/gcc-asm-with-empty-input-operand-constraint

* Clang erroneously doesn't support empty constraints for many years
now (even though their internal documentation still says empty input
constraints are supported, and external documentation says they
support all the same constraints as GCC does). I suppose they may have
been mislead by the lack of explicit mention of the feature in GCC's
documentation.

> > (When an empty input constraint is used,
> > +the assembler template will most probably also be empty. I.e., the 
> > @code{asm}
> > +declaration need not contain actual assembly code.)
>
> Don't use parentheses like this in documentation please.

OK.

> > An empty input
> > +constraint can be used to create an artificial dependency on a C or C++
> > +variable (the variable that appears in the expression associated with the
> > +constraint) without incurring unnecessary costs to performance.
>
> It still needs a register (or memory) reserved there (or sometimes a
> constant can be used, but you have no dependency in that case!)

Yeah, this is a bit more complicated than I perhaps implied. An asm
volatile can tell the compiler "I need this value calculated at this
point", but the compiler may still choose to eliminate the calculation
from the generated code if it can perform it itself at compilation
time. Thus currently the programmer must be able to predict if GCC
will be able compute the value of some variable or expression; the
good thing is that this is usually easy to predict.

> > +An example of where such behavior may be useful is for preventing compiler
> > +optimizations like dead store elimination or hoisting code outside a loop 
> > for
> > +certain pieces of C or C++ code.
>
> You should not think about preventing the compiler from doing something.
> Instead, you can give the compiler extra information that makes it *do*
> something: it has to, because it has to implement the semantics your
> source program has.
>
> > Specific applications may include direct
> > +interaction with hardware features; or things like testing, fuzzing and
> > +benchmarking.
>
> What does this mean?

The manual already has examples for "direct interaction with hardware features".

Benchmarking is another relatively well known example of an activity
during which we may be inconvenienced by the compiler doing dead store
elimination and loop hoisting at certain specific places in the code.
E.g., Google's Benchmark has DoNotOptimize and Facebook's Folly has
doNotOptimizeAway:

https://github.com/google/benchmark/blob/master/include/benchmark/benchmark.h#L308
https://github.com/facebook/folly/blob/master/folly/BenchmarkUtil.h#L73

Unit testing and fuzzing are other such examples, specifically when
trying to test for undefined behavior with sanitizers.

> Here is a simple example showing why this isn't as simple to use as
> you imply here:
>
> ===
> void f(int x)

PING 4 [PATCH] correct fix to avoid false positives for vectorized stores (PR 96963, 94655)

2021-02-22 Thread Martin Sebor via Gcc-patches

Ping 4:
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564059.html

On 2/14/21 5:43 PM, Martin Sebor wrote:

Ping 3:
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564059.html

This is a fix for two P2 bugs (false positives).

On 2/6/21 10:13 AM, Martin Sebor wrote:

Ping 2:
   https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564059.html

On 1/29/21 10:20 AM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564059.html

On 1/21/21 4:38 PM, Martin Sebor wrote:

The hack I put in compute_objsize() last January for pr93200 isn't
quite correct.  It happened to suppress the false positive there
but, due to what looks like a thinko on my part, not in some other
test cases involving vectorized stores.

The attached change adjusts the hack to have compute_objsize() give
up on all MEM_REFs with a vector type.  This effectively disables
-Wstringop-{overflow,overread} for vector accesses (either written
by the user or synthesized by GCC from ordinary accesses).  It
doesn't affect -Warray-bounds because this warning doesn't use
compute_objsize() yet.  When it does (it should considerably
simplify the code) some additional changes will be needed to
preserve -Warray-bounds for out of bounds vector accesses.
The test this patch adds should serve as a reminder to make
it if we forget.

Tested on x86_64-linux.  Since PR 94655 was reported against GCC
10 I'd like to apply this fix to both the trunk and the 10 branch.

Martin










Committed: g++.dg/warn/Wplacement-new-size-1.C, -2, -6: Fix for default_packed targets

2021-02-22 Thread Hans-Peter Nilsson via Gcc-patches
Looking at commit de05c19d5fd6, that adjustment to these
tests apparently assumed that the testsuite is run (only) on
targets where structure memory layout has padding as per
"natural alignment".  For cris-elf, a target with no padding
in structure memory layout, these tests have been failing
since that commit.

Tested cris-elf and x86_64-linux, committed as obvious.

gcc/testsuite:
* g++.dg/warn/Wplacement-new-size-1.C,
g++.dg/warn/Wplacement-new-size-2.C,
g++.dg/warn/Wplacement-new-size-6.C: Adjust for
default_packed targets.
---
 gcc/testsuite/g++.dg/warn/Wplacement-new-size-1.C | 12 ++--
 gcc/testsuite/g++.dg/warn/Wplacement-new-size-2.C | 14 +++---
 gcc/testsuite/g++.dg/warn/Wplacement-new-size-6.C | 22 +++---
 3 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-1.C 
b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-1.C
index cec83163dbe7..1ad5c2da75af 100644
--- a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-1.C
+++ b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-1.C
@@ -67,8 +67,8 @@ void fBx (BAx *pbx, BAx &rbx)
 {
   BAx bax;
   // The uninitialized flexible array takes up the bytes of padding.
-  new (bax.ax.a) char;
-  new (bax.ax.a) Int16;
+  new (bax.ax.a) char; // { dg-warning "placement" "" { target 
default_packed } }
+  new (bax.ax.a) Int16;// { dg-warning "placement" "" { target 
default_packed } }
   new (bax.ax.a) Int32;// { dg-warning "placement" }
 
   new (pbx->ax.a) char;
@@ -86,10 +86,10 @@ void fBx1 ()
   static BAx bax1 = { 1, /* Ax = */ { 2, /* a[] = */ {} } };
 
   // The empty flexible array takes up the bytes of padding.
-  new (bax1.ax.a) char;
-  new (bax1.ax.a) char[2];
-  new (bax1.ax.a) Int16;
-  new (bax1.ax.a) char[3];
+  new (bax1.ax.a) char; // { dg-warning "placement" "" { target 
default_packed } }
+  new (bax1.ax.a) char[2];  // { dg-warning "placement" "" { target 
default_packed } }
+  new (bax1.ax.a) Int16;// { dg-warning "placement" "" { target 
default_packed } }
+  new (bax1.ax.a) char[3];  // { dg-warning "placement" "" { target 
default_packed } }
   new (bax1.ax.a) char[4];  // { dg-warning "placement" }
   new (bax1.ax.a) Int32;// { dg-warning "placement" }
 }
diff --git a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-2.C 
b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-2.C
index e5fdfe1f603e..a4de2b04629c 100644
--- a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-2.C
+++ b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-2.C
@@ -125,9 +125,9 @@ void fBx (BAx *pbx, BAx &rbx)
 {
   BAx bax;
   // The uninitialized flexible array takes up the bytes of padding.
-  new (bax.ax.a) char;
-  new (bax.ax.a) Int16;
-  new (bax.ax.a) char[3];
+  new (bax.ax.a) char;// { dg-warning "placement" "" { target 
default_packed } }
+  new (bax.ax.a) Int16;   // { dg-warning "placement" "" { target 
default_packed } }
+  new (bax.ax.a) char[3]; // { dg-warning "placement" "" { target 
default_packed } }
   new (bax.ax.a) Int32;   // { dg-warning "placement" }
   new (bax.ax.a) char[4]; // { dg-warning "placement" }
   new (bax.ax.a) char[5]; // { dg-warning "placement" }
@@ -147,10 +147,10 @@ void fBx1 ()
   static BAx bax1 = { 1, /* Ax = */ { 2, /* a[] = */ {} } };
 
   // The empty flexible array takes up the bytes of padding.
-  new (bax1.ax.a) char;
-  new (bax1.ax.a) char[2];
-  new (bax1.ax.a) Int16;
-  new (bax1.ax.a) char[3];
+  new (bax1.ax.a) char;   // { dg-warning "placement" "" { target 
default_packed } }
+  new (bax1.ax.a) char[2];// { dg-warning "placement" "" { target 
default_packed } }
+  new (bax1.ax.a) Int16;  // { dg-warning "placement" "" { target 
default_packed } }
+  new (bax1.ax.a) char[3];// { dg-warning "placement" "" { target 
default_packed } }
   new (bax1.ax.a) Int32;  // { dg-warning "placement" }
   new (bax1.ax.a) char[4];// { dg-warning "placement" }
   new (bax1.ax.a) char[5];// { dg-warning "placement" }
diff --git a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-6.C 
b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-6.C
index 5eb63d23b477..86a18fa08e63 100644
--- a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-6.C
+++ b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-6.C
@@ -18,9 +18,9 @@ void fBx1 ()
   static BAx bax1 = { 1, /* Ax = */ { 2, /* a[] = */ { 3 } } }; // { dg-error 
"initialization of flexible array member in a nested context" }
 
   // The first three bytes of the flexible array member live in the padding.
-  new (bax1.ax.a) char;
-  new (bax1.ax.a) char[2];
-  new (bax1.ax.a) Int16;
+  new (bax1.ax.a) char; // { dg-warning "placement" "" { target 
default_packed } }
+  new (bax1.ax.a) char[2];  // { dg-warning "placement" "" { target 
default_packed } }
+  new (bax1.ax.a) Int16;// { dg-warning "placement" "" { target 
default_packed } }
   new (bax1.ax.a) Int32;// { dg-warning "placement" }
 }
 
@@ -29,10 +29,10 @@ 

Re: [PATCH] handle bad __dynamic_cast more gracefully (PR 99074)

2021-02-22 Thread Martin Sebor via Gcc-patches

On 2/22/21 4:08 PM, Jason Merrill wrote:

On 2/13/21 7:31 PM, Martin Sebor wrote:

The test case in PR 99074 invokes dynamic_cast with the this pointer
in a non-static member function called on a null pointer.  The call
is, of course, undefined and other different circumstances would be
diagnosed by -Wnonnull.   Unfortunately, in the test case, the null
pointer is the result of inlining and constant propagation and so
detected neither by the front end -Wnonnull nor by the middle end.
The program ends up passing it to __dynamic_cast() which then
crashes at runtime (again, not surprising for undefined behavior.

However, the reporter says the code behaved gracefully (didn't crash)
when compiled with GCC 4.8, and in my tests it also doesn't crash
when compiled with Clang or ICC.  I looked to see if it's possible
to do better and it seems it is.

The attached patch improves things by changing __dynamic_cast to
fail by returning null when the first argument is null, and also


This hunk is OK.


by declaring __dynamic_cast with attribute nonnull so that invalid
calls to it with a constant null pointer can be detected at compile
time.


This is not; dynamic_cast is specified to return null for a null operand.

"If v is a null pointer value, the result is a null pointer value."

The undefined behavior is the call to _to_object, not the dynamic_cast.


Yes, of course.  Just to be clear, in case it's not from the patch,
it adds nonnull to the __dynamic_cast() function in libsupc++ which
(if I read the comment right) documents nonnull-ness as its
precondition.  The function should never be called with a null
pointer except in buggy code like in the PR.  I don't think this
is the most elegant way to diagnose the user bug but I also couldn't
think of anything better.  Do you have any suggestions?  (I ask in
part because for GCC 12 I'd like to see about issuing the warning
requested on PR 12277.)

WRT to the documentation of __dynamic_cast(), I didn't remove
the comment in the patch that mentions the precondition because
of the new warning.  If we want to consider null pointers valid
input to the function it seems we should update the comment.  Do
you agree?

The comment is below.  I assume SUB refers to SRC_PTR.

/* sub: source address to be adjusted; nonnull, and since the
 *  source object is polymorphic, *(void**)sub is a virtual pointer.
 * src: static type of the source object.
 * dst: destination type (the "T" in "dynamic_cast(v)").
 * src2dst_offset: a static hint about the location of the
 *source subobject with respect to the complete object;
 *special negative values are:
 *   -1: no hint
 *   -2: src is not a public base of dst
 *   -3: src is a multiple public base type but never a
 *   virtual base type
 *otherwise, the src type is a unique public nonvirtual
 *base type of dst at offset src2dst_offset from the
 *origin of dst.  */
extern "C" void *
__dynamic_cast (const void *src_ptr,// object started from
const __class_type_info *src_type, // type of the 
starting object

const __class_type_info *dst_type, // desired target type
ptrdiff_t src2dst) // how src and dst are related




Since the test case is undefined it seems borderline whether this
can strictly be considered a regression, even if some previous
releases handled it more gracefully.


Indeed.  But handling the null case in __dynamic_cast as well as in the 
compiler seems harmless enough.


Okay.

Martin



Jason





Re: Committed: g++.dg/warn/Wplacement-new-size-1.C, -2, -6: Fix for default_packed targets

2021-02-22 Thread Martin Sebor via Gcc-patches

On 2/22/21 5:48 PM, Hans-Peter Nilsson wrote:

Looking at commit de05c19d5fd6, that adjustment to these
tests apparently assumed that the testsuite is run (only) on
targets where structure memory layout has padding as per
"natural alignment".  For cris-elf, a target with no padding
in structure memory layout, these tests have been failing
since that commit.

Tested cris-elf and x86_64-linux, committed as obvious.


Thanks for the heads up.  Would declaring the structs with attribute
packed work?  If so, that would seem more robust to me since it would
exercise the behavior on all targets.

Martin



gcc/testsuite:
* g++.dg/warn/Wplacement-new-size-1.C,
g++.dg/warn/Wplacement-new-size-2.C,
g++.dg/warn/Wplacement-new-size-6.C: Adjust for
default_packed targets.
---
  gcc/testsuite/g++.dg/warn/Wplacement-new-size-1.C | 12 ++--
  gcc/testsuite/g++.dg/warn/Wplacement-new-size-2.C | 14 +++---
  gcc/testsuite/g++.dg/warn/Wplacement-new-size-6.C | 22 +++---
  3 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-1.C 
b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-1.C
index cec83163dbe7..1ad5c2da75af 100644
--- a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-1.C
+++ b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-1.C
@@ -67,8 +67,8 @@ void fBx (BAx *pbx, BAx &rbx)
  {
BAx bax;
// The uninitialized flexible array takes up the bytes of padding.
-  new (bax.ax.a) char;
-  new (bax.ax.a) Int16;
+  new (bax.ax.a) char; // { dg-warning "placement" "" { target 
default_packed } }
+  new (bax.ax.a) Int16;// { dg-warning "placement" "" { target 
default_packed } }
new (bax.ax.a) Int32;// { dg-warning "placement" }
  
new (pbx->ax.a) char;

@@ -86,10 +86,10 @@ void fBx1 ()
static BAx bax1 = { 1, /* Ax = */ { 2, /* a[] = */ {} } };
  
// The empty flexible array takes up the bytes of padding.

-  new (bax1.ax.a) char;
-  new (bax1.ax.a) char[2];
-  new (bax1.ax.a) Int16;
-  new (bax1.ax.a) char[3];
+  new (bax1.ax.a) char; // { dg-warning "placement" "" { target 
default_packed } }
+  new (bax1.ax.a) char[2];  // { dg-warning "placement" "" { target 
default_packed } }
+  new (bax1.ax.a) Int16;// { dg-warning "placement" "" { target 
default_packed } }
+  new (bax1.ax.a) char[3];  // { dg-warning "placement" "" { target 
default_packed } }
new (bax1.ax.a) char[4];  // { dg-warning "placement" }
new (bax1.ax.a) Int32;// { dg-warning "placement" }
  }
diff --git a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-2.C 
b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-2.C
index e5fdfe1f603e..a4de2b04629c 100644
--- a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-2.C
+++ b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-2.C
@@ -125,9 +125,9 @@ void fBx (BAx *pbx, BAx &rbx)
  {
BAx bax;
// The uninitialized flexible array takes up the bytes of padding.
-  new (bax.ax.a) char;
-  new (bax.ax.a) Int16;
-  new (bax.ax.a) char[3];
+  new (bax.ax.a) char;// { dg-warning "placement" "" { target 
default_packed } }
+  new (bax.ax.a) Int16;   // { dg-warning "placement" "" { target 
default_packed } }
+  new (bax.ax.a) char[3]; // { dg-warning "placement" "" { target 
default_packed } }
new (bax.ax.a) Int32;   // { dg-warning "placement" }
new (bax.ax.a) char[4]; // { dg-warning "placement" }
new (bax.ax.a) char[5]; // { dg-warning "placement" }
@@ -147,10 +147,10 @@ void fBx1 ()
static BAx bax1 = { 1, /* Ax = */ { 2, /* a[] = */ {} } };
  
// The empty flexible array takes up the bytes of padding.

-  new (bax1.ax.a) char;
-  new (bax1.ax.a) char[2];
-  new (bax1.ax.a) Int16;
-  new (bax1.ax.a) char[3];
+  new (bax1.ax.a) char;   // { dg-warning "placement" "" { target 
default_packed } }
+  new (bax1.ax.a) char[2];// { dg-warning "placement" "" { target 
default_packed } }
+  new (bax1.ax.a) Int16;  // { dg-warning "placement" "" { target 
default_packed } }
+  new (bax1.ax.a) char[3];// { dg-warning "placement" "" { target 
default_packed } }
new (bax1.ax.a) Int32;  // { dg-warning "placement" }
new (bax1.ax.a) char[4];// { dg-warning "placement" }
new (bax1.ax.a) char[5];// { dg-warning "placement" }
diff --git a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-6.C 
b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-6.C
index 5eb63d23b477..86a18fa08e63 100644
--- a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-6.C
+++ b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-6.C
@@ -18,9 +18,9 @@ void fBx1 ()
static BAx bax1 = { 1, /* Ax = */ { 2, /* a[] = */ { 3 } } }; // { dg-error 
"initialization of flexible array member in a nested context" }
  
// The first three bytes of the flexible array member live in the padding.

-  new (bax1.ax.a) char;
-  new (bax1.ax.a) char[2];
-  new (bax1.ax.a) Int16;
+  new (bax1.ax.a) char; // { dg-warning "placement" "" { target 
default_p

Re: Committed: g++.dg/warn/Wplacement-new-size-1.C, -2, -6: Fix for default_packed targets

2021-02-22 Thread Hans-Peter Nilsson via Gcc-patches
> From: Martin Sebor 
> Date: Tue, 23 Feb 2021 02:07:59 +0100

> On 2/22/21 5:48 PM, Hans-Peter Nilsson wrote:
> > Looking at commit de05c19d5fd6, that adjustment to these
> > tests apparently assumed that the testsuite is run (only) on
> > targets where structure memory layout has padding as per
> > "natural alignment".  For cris-elf, a target with no padding
> > in structure memory layout, these tests have been failing
> > since that commit.
> > 
> > Tested cris-elf and x86_64-linux, committed as obvious.
> 
> Thanks for the heads up.  Would declaring the structs with attribute
> packed work?

That seems likely, yes.

brgds, H-P


Re: [PATCH] PR fortran/99206 - [11 Regression] ICE in add_init_expr_to_sym, at fortran/decl.c:1980

2021-02-22 Thread Jerry DeLisle

Hi Harald, Looks Good to Me.

OK

Thanks

On 2/22/21 1:14 PM, Harald Anlauf via Fortran wrote:

Dear all,

when simplifying the RESHAPE intrinsic we lost the string length when
the resulting array had size zero.  The attached patch sets the resulting
length from the source.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald


PR fortran/99206 - ICE in add_init_expr_to_sym, at fortran/decl.c:1980

Make sure that the string length is properly set during simplification
even when the resulting array is zero-sized.

gcc/fortran/ChangeLog:

PR fortran/99206
* simplify.c (gfc_simplify_reshape): Set string length for
character arguments.

gcc/testsuite/ChangeLog:

PR fortran/99206
* gfortran.dg/reshape_zerosize_4.f90: New test.





Re: [PATCH] handle bad __dynamic_cast more gracefully (PR 99074)

2021-02-22 Thread Jason Merrill via Gcc-patches

On 2/22/21 8:00 PM, Martin Sebor wrote:

On 2/22/21 4:08 PM, Jason Merrill wrote:

On 2/13/21 7:31 PM, Martin Sebor wrote:

The test case in PR 99074 invokes dynamic_cast with the this pointer
in a non-static member function called on a null pointer.  The call
is, of course, undefined and other different circumstances would be
diagnosed by -Wnonnull.   Unfortunately, in the test case, the null
pointer is the result of inlining and constant propagation and so
detected neither by the front end -Wnonnull nor by the middle end.
The program ends up passing it to __dynamic_cast() which then
crashes at runtime (again, not surprising for undefined behavior.

However, the reporter says the code behaved gracefully (didn't crash)
when compiled with GCC 4.8, and in my tests it also doesn't crash
when compiled with Clang or ICC.  I looked to see if it's possible
to do better and it seems it is.

The attached patch improves things by changing __dynamic_cast to
fail by returning null when the first argument is null, and also


This hunk is OK.


by declaring __dynamic_cast with attribute nonnull so that invalid
calls to it with a constant null pointer can be detected at compile
time.


This is not; dynamic_cast is specified to return null for a null operand.

"If v is a null pointer value, the result is a null pointer value."

The undefined behavior is the call to _to_object, not the dynamic_cast.


Yes, of course.  Just to be clear, in case it's not from the patch,
it adds nonnull to the __dynamic_cast() function in libsupc++ which
(if I read the comment right) documents nonnull-ness as its
precondition.  The function should never be called with a null
pointer except in buggy code like in the PR.


True.  So the attribute is technically correct, but the resulting 
warning is misleading, since the user shouldn't need to know anything 
about the internal implementation of dynamic_cast, and the user-visible 
feature doesn't have that constraint.



I don't think this
is the most elegant way to diagnose the user bug but I also couldn't
think of anything better.  Do you have any suggestions?  (I ask in
part because for GCC 12 I'd like to see about issuing the warning
requested on PR 12277.)


If we can manage to warn about a null argument to __dynamic_cast, I'd 
think we should also be able to warn about a null 'this' argument to 
_to_object.



WRT to the documentation of __dynamic_cast(), I didn't remove
the comment in the patch that mentions the precondition because
of the new warning.  If we want to consider null pointers valid
input to the function it seems we should update the comment.  Do
you agree?


The comment is from the ABI 
(https://itanium-cxx-abi.github.io/cxx-abi/abi.html#dynamic_cast-algorithm), 
we shouldn't change it.



The comment is below.  I assume SUB refers to SRC_PTR.

/* sub: source address to be adjusted; nonnull, and since the
  *  source object is polymorphic, *(void**)sub is a virtual pointer.
  * src: static type of the source object.
  * dst: destination type (the "T" in "dynamic_cast(v)").
  * src2dst_offset: a static hint about the location of the
  *    source subobject with respect to the complete object;
  *    special negative values are:
  *   -1: no hint
  *   -2: src is not a public base of dst
  *   -3: src is a multiple public base type but never a
  *   virtual base type
  *    otherwise, the src type is a unique public nonvirtual
  *    base type of dst at offset src2dst_offset from the
  *    origin of dst.  */
extern "C" void *
__dynamic_cast (const void *src_ptr,    // object started from
     const __class_type_info *src_type, // type of the 
starting object

     const __class_type_info *dst_type, // desired target type
     ptrdiff_t src2dst) // how src and dst are related



Since the test case is undefined it seems borderline whether this
can strictly be considered a regression, even if some previous
releases handled it more gracefully.


Indeed.  But handling the null case in __dynamic_cast as well as in 
the compiler seems harmless enough.


Okay.


I guess it's a question of which behavior is preferable in (this) case 
of undefined behavior.  Doing something reasonable is in general a valid 
choice.  OTOH, the crash revealed the undefined behavior in the testcase.


Jason



Re: [PATCH, constexpr, coroutines ] Generic lambda coroutines cannot be constexpr [PR96251].

2021-02-22 Thread Jason Merrill via Gcc-patches

On 2/22/21 3:59 PM, Iain Sandoe wrote:

Hi,

PR96251 (and dups) is a rejects-valid bug against coroutines (but I’m
not sure that the problem is really there).

  * coroutines cannot be constexpr.

* Part of the way that the coroutine implementation works is a
consequence of the "lazy discovery of coroutine-ness”;  whenever
we first encounter a coroutine keyword...

.. we mark the function as a coroutine, and then we can deal with
diagnostics etc. that change under these circumstances.  This
marking of the function is independent of whether keyword
expressions are type-dependent or not.

* when we encounter a coroutine keyword in a constexpr context
we error.

* the constexpr machinery also has been taught that coroutine
expressions should cause constexpr-ness to be rejected when
checking for "potentially constexpr".

So why is there a problem?

* lambdas are implicitly constexpr from C++17.

* the constexpr machinery tends to bail early *with a conservative
   assumption that the expression is potentially constexpr* when it
   finds a type-dependent expression - without evaluating sub-
   expressions to see if they are valid (thus missing coroutine exprs.
   in such positions).

The combination of these ^^ two things, means that for many generic
lambdas with non-trivial bodies - we then enter instantiation with a
constexpr type-dependent coroutine (which is a Thing that Should not
Be).  As soon as we try to tsub any coroutine keyword expression
encountered, we error out.

* I was not able to see any way in which the instantiation process
   could be made to bail in this case and re-try for non-constexpr.


Many of the other places that set cp_function_chain->invalid_constexpr 
condition their errors on !is_instantiation_of_constexpr, which should 
also fix this testcase.



* Nor able to see somewhere else where that decision could be made.

^^ these two points reflect that I'm not familiar with the constexpr
     machinery.

* Proposed solution.

Since coroutines cannot be constexpr (not even sure what that would
mean in any practicable implementation).  The fix proposed is to
reject constexpr-ness for coroutine functions as early as possible.

* a) We can prevent a generic lambda coroutine from being converted
   to constexpr in finish_function ().  Likewise, via the tests below, 
we can

   avoid it for regular lambdas.

   b) We can also reject coroutine function decls early in both
   is_valid_constexpr_fn () and potential_constant_expression_1 ().

So - is there some alternate solution that would be better?



(in some ways it seems pointless to delay rejection of a coroutine
  function to some later point).

OTOH, I suppose that one could have some weird code where coroutine
coroutine keywords only appeared in a non-constexpr paths of the code
- but our current lowering is not prepared for such a circumstance.

AFAIU the standard, there's no dispensation from the "cannot be
constexpr" for such a code construction .. but ICBW.

In any event, the cases reported (and fixed by this patch) are not trying
anything so fiendishly clever.

Tested on x86_64-darwin.

Suggestions?



OK for master (after wider testing)?
thanks
Iain

= 

coroutines cannot be constexpr, but generic lambdas (from C++17)
are implicitly assumed to be, and the processing of type-
dependent lambda bodies often terminates before any coroutine
expressions are encountered.  For example, the PR notes cases
where the coro expressions are hidden by type-dependent for or
switch expressions.

The solution proposed is to deny coroutine generic lambdas.  We also
tighten up the checks for is_valid_constexpr_fn() and do an early
test for coroutine functions in checking for potential constant
expressions.

gcc/cp/ChangeLog:

 PR c++/96251
 * constexpr.c (is_valid_constexpr_fn): Test for a
 coroutine before the chekc for lambda.
 (potential_constant_expression_1): Disallow coroutine
 function decls.
 * decl.c (finish_function): Do not mark generic
 coroutine lambda templates as constexpr.

gcc/testsuite/ChangeLog:

 PR c++/96251
 * g++.dg/coroutines/pr96251.C: New test.
---
  gcc/cp/constexpr.c    |  7 ++-
  gcc/cp/decl.c |  2 +-
  gcc/testsuite/g++.dg/coroutines/pr96251.C | 22 ++
  3 files changed, 29 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/coroutines/pr96251.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 377fe322ee8..036bf04efa9 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -210,7 +210,9 @@ is_valid_constexpr_fn (tree fun, bool complain)
    }
  }

-  if (LAMBDA_TYPE_P (CP_DECL_CONTEXT (fun)) && cxx_dialect < cxx17)
+  if (DECL_COROUTINE_P (fun))
+    ret = false;
+  else if (LAMBDA_TYPE_P (CP_DECL_CONTEXT (fun)) && cxx_dialect < cxx17)
  {
    ret = false;
    if (complain)
@@ -7827,6 +7829,9 @@ potential_constant_expression_1 (tree t, bool 
want_rval

Re: [PATCH] c++: Micro-optimize instantiation_dependent_expression_p

2021-02-22 Thread Jason Merrill via Gcc-patches

On 2/12/21 6:36 PM, Patrick Palka wrote:

This makes instantiation_dependent_expression_p avoid checking
potential_constant_expression when processing_template_decl isn't set
(and hence when value_dependent_expression_p is definitely false).


OK.


gcc/cp/ChangeLog:

* pt.c (instantiation_dependent_expression_p): Check
processing_template_decl before checking
potential_constant_expression.
---
  gcc/cp/pt.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index dd89dd6f896..5102bf02d0f 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -27519,7 +27519,8 @@ bool
  instantiation_dependent_expression_p (tree expression)
  {
return (instantiation_dependent_uneval_expression_p (expression)
- || (potential_constant_expression (expression)
+ || (processing_template_decl
+ && potential_constant_expression (expression)
  && value_dependent_expression_p (expression)));
  }
  





Re: [PATCH v2 0/5] RISC-V big endian support

2021-02-22 Thread Kito Cheng via Gcc-patches
Hi Marcus:

Thanks for the quick update, I am testing your V2 patch now, the result seems
really great now, some of fail case seems like not cause by
big-endian patch, I am reviewing and comparing the fail case with the
little-endian build.

> Should I make a PR against riscv-newlib on GitHub, or would you prefer
> some other process for dealing with newlib fixes related to these
> patches?

Could you send to newlib mailing list directly, ideally riscv-newlib
just a buffer
we don't want to hold any patch there as possible.
https://sourceware.org/mailman/listinfo/newlib/




On Sun, Feb 21, 2021 at 8:17 AM Marcus Comstedt  wrote:
>
> This is an update to the patch series for big endian RISC-V support.
>
> Changes since last version:
>
>   * Added documentation of -mbig-endian and -mlittle-endian
>
>   * New patch: Fix soft-fp endianness setting
>
>   * New patch: Fix trampoline generation on big endian
>
>   * New patch: Update the shift-shift-5.c testcase to work correctly
> on big endian
>
> With these changes, and two fixes to newlib (setting correct floating
> point byteorder, and an update to the handcoded assembler for strcmp),
> I'm now down to
>
>= Summary of gcc testsuite =
> | # of unexpected case / # of unique unexpected 
> case
> |  gcc |  g++ | gfortran |
>  rv64gc/   lp64/ medlow |   14 / 8 |   39 /10 |  - |
>
> and of these only two failures do not also occur for little endian:
>
> FAIL: gcc.target/riscv/shift-and-1.c scan-assembler-not andi
> FAIL: gcc.target/riscv/shift-and-2.c scan-assembler-not andi
>
> I'm quite puzzled why these two testcases give different results with
> -mbig-endian compared to with -mlittle-endian though, since they only
> deal with register-to-register operations so the memory model should be
> completely irrelevant...
>
>
>   // Marcus
>
>
>


[PATCH v3 1/2] MIPS: unaligned load: use SImode for SUBREG if OK (PR98996)

2021-02-22 Thread YunQiang Su
It is found by ada s-pack96.adb ftbfs, due to 96bit load: 96 = 64 + 32.
While the 32bit pair of l r is mark as SUBREG, so they are
not in SImode, make it fail to find suitable insn.

gcc/ChangeLog:

   * config/mips/mips.c (mips_expand_ext_as_unaligned_load):
   If TARGET_64BIT and dest is SUBREG, we check the width, if it
   equal to SImode, we use SImode operation, just like what we are
   doing for REG one.
---
 gcc/ChangeLog  | 8 
 gcc/config/mips/mips.c | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3bd877243b7..e86d7817d9d 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2021-02-23  YunQiang Su  
+
+   PR target/98996
+   * config/mips/mips.c (mips_expand_ext_as_unaligned_load):
+   If TARGET_64BIT and dest is SUBREG, we check the width, if it
+   equal to SImode, we use SImode operation, just like what we are
+   doing for REG one.
+
 2021-02-22  Kyrylo Tkachov  
 
* config/aarch64/aarch64-tuning-flags.def (cse_sve_vl_constants):
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 8bd2d29552e..e901d860c3d 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -8400,7 +8400,7 @@ mips_expand_ext_as_unaligned_load (rtx dest, rtx src, 
HOST_WIDE_INT width,
   /* If TARGET_64BIT, the destination of a 32-bit "extz" or "extzv" will
  be a DImode, create a new temp and emit a zero extend at the end.  */
   if (GET_MODE (dest) == DImode
-  && REG_P (dest)
+  && (REG_P (dest) || (SUBREG_P(dest) && !MEM_P(SUBREG_REG(dest
   && GET_MODE_BITSIZE (SImode) == width)
 {
   dest1 = dest;
-- 
2.20.1



[PATCH v3 2/2] ada: add 128bit operation to MIPS N32 and N64

2021-02-22 Thread YunQiang Su
For MIPS N64 and N32:
  add GNATRTL_128BIT_PAIRS to LIBGNAT_TARGET_PAIRS
  add GNATRTL_128BIT_OBJS to EXTRA_GNATRTL_NONTASKING_OBJS

gcc/ada/ChangeLog:
PR ada/98996
* Makefile.rtl:  add 128Bit operation file to MIPS
N64 and N32 (LIBGNAT_TARGET_PAIRS, EXTRA_GNATRTL_NONTASKING_OBJS).
---
 gcc/ada/ChangeLog|  6 ++
 gcc/ada/Makefile.rtl | 12 
 2 files changed, 18 insertions(+)

diff --git a/gcc/ada/ChangeLog b/gcc/ada/ChangeLog
index 52faefaa2ae..fb09986bde0 100644
--- a/gcc/ada/ChangeLog
+++ b/gcc/ada/ChangeLog
@@ -1,3 +1,9 @@
+2021-02-23  YunQiang Su  
+
+   PR ada/98996
+   * Makefile.rtl:  add 128Bit operation file to MIPS
+   N64 and N32 (LIBGNAT_TARGET_PAIRS, EXTRA_GNATRTL_NONTASKING_OBJS).
+
 2021-02-12  Arnaud Charlet  
 
* repinfo.ads, repinfo.adb (*SO_Ref*): Restore.
diff --git a/gcc/ada/Makefile.rtl b/gcc/ada/Makefile.rtl
index 35faf13ea46..987eff0abba 100644
--- a/gcc/ada/Makefile.rtl
+++ b/gcc/ada/Makefile.rtl
@@ -2311,6 +2311,18 @@ ifeq ($(strip $(filter-out mips% linux%,$(target_cpu) 
$(target_os))),)
   s-tpopsp.adb

Re: [PATCH] c++: Private parent access check for using decls [PR19377]

2021-02-22 Thread Jason Merrill via Gcc-patches

On 2/15/21 7:45 AM, Anthony Sharp wrote:

PARENT_BINFO, has private access to DECL.  Examine certain sepcial cases


"special"


I did try the name
lookup as Jason suggested but, as I say, in the case of overloaded
functions it returns multiple values, so it didn't help in determining
what DECL a USING_DECL inherited.


I was thinking you could walk through the overload set to see if it 
contains DECL.


Jason



Re: [PATCH] c++: Fix folding of non-dependent BASELINKs [PR95468]

2021-02-22 Thread Jason Merrill via Gcc-patches

On 2/13/21 10:21 PM, Patrick Palka wrote:

On Fri, 12 Feb 2021, Patrick Palka wrote:


Here, the problem ultimately seems to be that tsubst_copy_and_build,
when called with empty args as we do during non-dependent expression
folding, doesn't touch BASELINKs at all: it delegates to tsubst_copy
which then immediately exits early due to the empty args.  This means
that the CAST_EXPR int(1) in the BASELINK A::condition never
gets folded (as part of folding of the overall CALL_EXPR), which later
causes us to crash when performing overload resolution of the rebuilt
CALL_EXPR (which is in terms of this still-templated BASELINK).

This doesn't happen when condition() is a namespace-scope function
because then condition is represented as a TEMPLATE_ID_EXPR
rather than a BASELINK, which does get handled directly from
tsubst_copy_and_build.

This patch fixes this issue by having tsubst_copy_and_build handle
BASELINK directly rather than delegating to tsubst_copy, so that it
processes BASELINKS even when args is empty.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

PR c++/95468
* pt.c (tsubst_copy_and_build) : New case, copied
over from tsubst_copy.

gcc/testsuite/ChangeLog:

PR c++/95468
* g++.dg/template/non-dependent15.C: New test.
---
  gcc/cp/pt.c |  5 +
  gcc/testsuite/g++.dg/template/non-dependent15.C | 12 
  2 files changed, 17 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/template/non-dependent15.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 5102bf02d0f..5b2f43dc5c1 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -19856,6 +19856,11 @@ tsubst_copy_and_build (tree t,
  case SCOPE_REF:
RETURN (tsubst_qualified_id (t, args, complain, in_decl, /*done=*/true,
  /*address_p=*/false));
+
+case BASELINK:
+  return tsubst_baselink (t, current_nonlambda_class_type (),
+ args, complain, in_decl);


Oops, this should use the RETURN macro instead of a bare 'return' so
that input_location gets properly restored on function exit.


Indeed.


And it looks like there's an existing instance of this bug in the
LAMBDA_EXPR case of tsubst_copy_and_build.  Perhaps it's a good time to
replace these RETURN macros with the equivalent use of iloc_sentinel,
like so?  Bootstrapped and tested on x86_64-pc-linux-gnu.


Both patches are OK.


-- >8 --

Subject: [PATCH] c++: Replace RETURN macros with iloc_sentinel

This replaces the C-era RETURN macro idiom used by some of the tsubsting
functions with an iloc_sentinel declared at the start of each function.

gcc/cp/ChangeLog:

* pt.c (tsubst_decl): Delete the RETURN macro, and replace its
uses with a plain 'return'.  Set up an iloc_sentinel at the
start of the function to set and restore input_location.
(tsubst_expr): Likewise.  Remove redundant break statements that
immediately follow a return.
(tsubst_copy_and_build): Likewise.  Remove 'retval' local
variable.  Add gcc_unreachable to the end of the function.
---
  gcc/cp/pt.c | 316 
  1 file changed, 145 insertions(+), 171 deletions(-)

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 5b2f43dc5c1..32d1759258c 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -14414,15 +14414,12 @@ enclosing_instantiation_of (tree otctx)
  static tree
  tsubst_decl (tree t, tree args, tsubst_flags_t complain)
  {
-#define RETURN(EXP) do { r = (EXP); goto out; } while(0)
-  location_t saved_loc;
tree r = NULL_TREE;
tree in_decl = t;
hashval_t hash = 0;
  
-  /* Set the filename and linenumber to improve error-reporting.  */

-  saved_loc = input_location;
-  input_location = DECL_SOURCE_LOCATION (t);
+  /* Set the source position to improve error-reporting.  */
+  iloc_sentinel ils (DECL_SOURCE_LOCATION (t));
  
switch (TREE_CODE (t))

  {
@@ -14453,7 +14450,7 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
  if (spec
  && TREE_CODE (spec) == PARM_DECL
  && TREE_CODE (TREE_TYPE (spec)) != TYPE_PACK_EXPANSION)
-  RETURN (spec);
+ return spec;
  
  /* Expand the TYPE_PACK_EXPANSION that provides the types for

 the parameters in this function parameter pack.  */
@@ -14466,8 +14463,8 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
  /* Zero-length parameter packs are boring. Just substitute
 into the chain.  */
if (len == 0 && !cp_unevaluated_operand)
-  RETURN (tsubst (TREE_CHAIN (t), args, complain,
- TREE_CHAIN (t)));
+ return tsubst (TREE_CHAIN (t), args, complain,
+TREE_CHAIN (t));
}
  else
{
@@ -14

[PATCH,rs6000] [v2] Optimize pcrel access of globals

2021-02-22 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey 

This patch implements a RTL pass that looks for pc-relative loads of the
address of an external variable using the PCREL_GOT relocation and a
single load or store that uses that external address.

Produced by a cast of thousands:
 * Michael Meissner
 * Peter Bergner
 * Bill Schmidt
 * Alan Modra
 * Segher Boessenkool
 * Aaron Sawdey

This incorporates the changes requested in Segher's review. A few things I
did not change were the insn-at-a-time scan that could be done with DF, and
I did not change to using statistics.[ch] for the counters struct. I did try
to improve the naming, and rewrote a number of comments to make them consistent
with the code, and generally tried to make things more readable.

OK for trunk if bootstrap/regtest passes?

Thanks!
   Aaron

gcc/ChangeLog:

* config.gcc: Add pcrel-opt.o.
* config/rs6000/pcrel-opt.c: New file.
* config/rs6000/pcrel-opt.md: New file.
* config/rs6000/predicates.md: Add d_form_memory predicate.
* config/rs6000/rs6000-cpus.def: Add OPTION_MASK_PCREL_OPT.
* config/rs6000/rs6000-passes.def: Add pass_pcrel_opt.
* config/rs6000/rs6000-protos.h: Add reg_to_non_prefixed(),
pcrel_opt_valid_mem_p(), output_pcrel_opt_reloc(),
and make_pass_pcrel_opt().
* config/rs6000/rs6000.c (reg_to_non_prefixed): Make global.
(rs6000_option_override_internal): Add pcrel-opt.
(rs6000_delegitimize_address): Support pcrel-opt.
(rs6000_opt_masks): Add pcrel-opt.
(pcrel_opt_valid_mem_p): New function.
(reg_to_non_prefixed): Make global.
(rs6000_asm_output_opcode): Reset next_insn_prefixed_p.
(output_pcrel_opt_reloc): New function.
* config/rs6000/rs6000.md (loads_extern_addr): New attr.
(pcrel_extern_addr): Set loads_extern_addr.
Add include for pcrel-opt.md.
* config/rs6000/rs6000.opt: Add -mpcrel-opt.
* config/rs6000/t-rs6000: Add rules for pcrel-opt.c and
pcrel-opt.md.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pcrel-opt-inc-di.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-df.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-di.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-hi.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-qi.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-sf.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-si.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-vector.c: New test.
* gcc.target/powerpc/pcrel-opt-st-df.c: New test.
* gcc.target/powerpc/pcrel-opt-st-di.c: New test.
* gcc.target/powerpc/pcrel-opt-st-hi.c: New test.
* gcc.target/powerpc/pcrel-opt-st-qi.c: New test.
* gcc.target/powerpc/pcrel-opt-st-sf.c: New test.
* gcc.target/powerpc/pcrel-opt-st-si.c: New test.
* gcc.target/powerpc/pcrel-opt-st-vector.c: New test.
---
 gcc/config.gcc|   8 +-
 gcc/config/rs6000/pcrel-opt.md| 399 
 gcc/config/rs6000/predicates.md   |  21 +
 gcc/config/rs6000/rs6000-cpus.def |   2 +
 gcc/config/rs6000/rs6000-passes.def   |   8 +
 gcc/config/rs6000/rs6000-pcrel-opt.c  | 924 ++
 gcc/config/rs6000/rs6000-protos.h |   4 +
 gcc/config/rs6000/rs6000.c| 111 ++-
 gcc/config/rs6000/rs6000.md   |   8 +-
 gcc/config/rs6000/rs6000.opt  |   4 +
 gcc/config/rs6000/t-rs6000|   7 +-
 .../gcc.target/powerpc/pcrel-opt-inc-di.c |  17 +
 .../gcc.target/powerpc/pcrel-opt-ld-df.c  |  36 +
 .../gcc.target/powerpc/pcrel-opt-ld-di.c  |  42 +
 .../gcc.target/powerpc/pcrel-opt-ld-hi.c  |  42 +
 .../gcc.target/powerpc/pcrel-opt-ld-qi.c  |  42 +
 .../gcc.target/powerpc/pcrel-opt-ld-sf.c  |  42 +
 .../gcc.target/powerpc/pcrel-opt-ld-si.c  |  41 +
 .../gcc.target/powerpc/pcrel-opt-ld-vector.c  |  36 +
 .../gcc.target/powerpc/pcrel-opt-st-df.c  |  36 +
 .../gcc.target/powerpc/pcrel-opt-st-di.c  |  36 +
 .../gcc.target/powerpc/pcrel-opt-st-hi.c  |  42 +
 .../gcc.target/powerpc/pcrel-opt-st-qi.c  |  42 +
 .../gcc.target/powerpc/pcrel-opt-st-sf.c  |  36 +
 .../gcc.target/powerpc/pcrel-opt-st-si.c  |  41 +
 .../gcc.target/powerpc/pcrel-opt-st-vector.c  |  36 +
 26 files changed, 2054 insertions(+), 9 deletions(-)
 create mode 100644 gcc/config/rs6000/pcrel-opt.md
 create mode 100644 gcc/config/rs6000/rs6000-pcrel-opt.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-hi.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-qi.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-sf.c
 create mode 

[COMMITED][BACKPORT GCC10] aarch64: Add cpu cost tables for A64FX

2021-02-22 Thread Qian Jianhua
This is a backport of adding cost tables for A64FX.
Bootstrapped and tested on aarch64-none-linux-gnu.

2021-02-23 Qian Jianhua 

gcc/ChangeLog:

* config/aarch64/aarch64-cost-tables.h (a64fx_extra_costs): New.
* config/aarch64/aarch64.c (a64fx_addrcost_table): New.
(a64fx_regmove_cost, a64fx_vector_cost): New.
(a64fx_tunings): Use the new added cost tables.
---
 gcc/config/aarch64/aarch64-cost-tables.h | 103 +++
 gcc/config/aarch64/aarch64.c |  53 +++-
 2 files changed, 152 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
b/gcc/config/aarch64/aarch64-cost-tables.h
index 8a98bf4278c..c6805717f6e 100644
--- a/gcc/config/aarch64/aarch64-cost-tables.h
+++ b/gcc/config/aarch64/aarch64-cost-tables.h
@@ -541,4 +541,107 @@ const struct cpu_cost_table tsv110_extra_costs =
   }
 };
 
+const struct cpu_cost_table a64fx_extra_costs =
+{
+  /* ALU */
+  {
+0, /* arith.  */
+0, /* logical.  */
+0, /* shift.  */
+0, /* shift_reg.  */
+COSTS_N_INSNS (1), /* arith_shift.  */
+COSTS_N_INSNS (1), /* arith_shift_reg.  */
+COSTS_N_INSNS (1), /* log_shift.  */
+COSTS_N_INSNS (1), /* log_shift_reg.  */
+0, /* extend.  */
+COSTS_N_INSNS (1), /* extend_arith.  */
+0, /* bfi.  */
+0, /* bfx.  */
+0, /* clz.  */
+0, /* rev.  */
+0, /* non_exec.  */
+true   /* non_exec_costs_exec.  */
+  },
+  {
+/* MULT SImode */
+{
+  COSTS_N_INSNS (4),   /* simple.  */
+  COSTS_N_INSNS (4),   /* flag_setting.  */
+  COSTS_N_INSNS (4),   /* extend.  */
+  COSTS_N_INSNS (5),   /* add.  */
+  COSTS_N_INSNS (5),   /* extend_add.  */
+  COSTS_N_INSNS (18)   /* idiv.  */
+},
+/* MULT DImode */
+{
+  COSTS_N_INSNS (4),   /* simple.  */
+  0,   /* flag_setting (N/A).  */
+  COSTS_N_INSNS (4),   /* extend.  */
+  COSTS_N_INSNS (5),   /* add.  */
+  COSTS_N_INSNS (5),   /* extend_add.  */
+  COSTS_N_INSNS (26)   /* idiv.  */
+}
+  },
+  /* LD/ST */
+  {
+COSTS_N_INSNS (4), /* load.  */
+COSTS_N_INSNS (4), /* load_sign_extend.  */
+COSTS_N_INSNS (5), /* ldrd.  */
+COSTS_N_INSNS (4), /* ldm_1st.  */
+1, /* ldm_regs_per_insn_1st.  */
+2, /* ldm_regs_per_insn_subsequent.  */
+COSTS_N_INSNS (4), /* loadf.  */
+COSTS_N_INSNS (4), /* loadd.  */
+COSTS_N_INSNS (5), /* load_unaligned.  */
+0, /* store.  */
+0, /* strd.  */
+0, /* stm_1st.  */
+1, /* stm_regs_per_insn_1st.  */
+2, /* stm_regs_per_insn_subsequent.  */
+0, /* storef.  */
+0, /* stored.  */
+0, /* store_unaligned.  */
+COSTS_N_INSNS (1), /* loadv.  */
+COSTS_N_INSNS (1)  /* storev.  */
+  },
+  {
+/* FP SFmode */
+{
+  COSTS_N_INSNS (6),  /* div.  */
+  COSTS_N_INSNS (1),   /* mult.  */
+  COSTS_N_INSNS (1),   /* mult_addsub.  */
+  COSTS_N_INSNS (2),   /* fma.  */
+  COSTS_N_INSNS (1),   /* addsub.  */
+  COSTS_N_INSNS (1),   /* fpconst.  */
+  COSTS_N_INSNS (1),   /* neg.  */
+  COSTS_N_INSNS (1),   /* compare.  */
+  COSTS_N_INSNS (2),   /* widen.  */
+  COSTS_N_INSNS (2),   /* narrow.  */
+  COSTS_N_INSNS (2),   /* toint.  */
+  COSTS_N_INSNS (2),   /* fromint.  */
+  COSTS_N_INSNS (2)/* roundint.  */
+},
+/* FP DFmode */
+{
+  COSTS_N_INSNS (11),  /* div.  */
+  COSTS_N_INSNS (1),   /* mult.  */
+  COSTS_N_INSNS (1),   /* mult_addsub.  */
+  COSTS_N_INSNS (2),   /* fma.  */
+  COSTS_N_INSNS (1),   /* addsub.  */
+  COSTS_N_INSNS (1),   /* fpconst.  */
+  COSTS_N_INSNS (1),   /* neg.  */
+  COSTS_N_INSNS (1),   /* compare.  */
+  COSTS_N_INSNS (2),   /* widen.  */
+  COSTS_N_INSNS (2),   /* narrow.  */
+  COSTS_N_INSNS (2),   /* toint.  */
+  COSTS_N_INSNS (2),   /* fromint.  */
+  COSTS_N_INSNS (2)/* roundint.  */
+}
+  },
+  /* Vector */
+  {
+COSTS_N_INSNS (1)  /* alu.  */
+  }
+};
+
 #endif
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9bfc99ab090..46fe6835506 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -464,6 +464,22 @@ static const struct cpu_addrcost_table 
qdf24xx_addrcost_table =
   2, /* imm_offset  */
 };
 
+static const struct 

Re: [PATCH v2 0/5] RISC-V big endian support

2021-02-22 Thread Kito Cheng via Gcc-patches
Seems like only 3 fail are related to big-endian, you don't need to
worry about other fails.

FAIL: gcc.c-torture/execute/string-opt-5.c
FAIL: gcc.target/riscv/shift-and-1.c scan-assembler-not andi
FAIL: gcc.target/riscv/shift-and-2.c scan-assembler-not andi

On Tue, Feb 23, 2021 at 10:38 AM Kito Cheng  wrote:
>
> Hi Marcus:
>
> Thanks for the quick update, I am testing your V2 patch now, the result seems
> really great now, some of fail case seems like not cause by
> big-endian patch, I am reviewing and comparing the fail case with the
> little-endian build.
>
> > Should I make a PR against riscv-newlib on GitHub, or would you prefer
> > some other process for dealing with newlib fixes related to these
> > patches?
>
> Could you send to newlib mailing list directly, ideally riscv-newlib
> just a buffer
> we don't want to hold any patch there as possible.
> https://sourceware.org/mailman/listinfo/newlib/
>
>
>
>
> On Sun, Feb 21, 2021 at 8:17 AM Marcus Comstedt  wrote:
> >
> > This is an update to the patch series for big endian RISC-V support.
> >
> > Changes since last version:
> >
> >   * Added documentation of -mbig-endian and -mlittle-endian
> >
> >   * New patch: Fix soft-fp endianness setting
> >
> >   * New patch: Fix trampoline generation on big endian
> >
> >   * New patch: Update the shift-shift-5.c testcase to work correctly
> > on big endian
> >
> > With these changes, and two fixes to newlib (setting correct floating
> > point byteorder, and an update to the handcoded assembler for strcmp),
> > I'm now down to
> >
> >= Summary of gcc testsuite =
> > | # of unexpected case / # of unique unexpected 
> > case
> > |  gcc |  g++ | gfortran |
> >  rv64gc/   lp64/ medlow |   14 / 8 |   39 /10 |  - |
> >
> > and of these only two failures do not also occur for little endian:
> >
> > FAIL: gcc.target/riscv/shift-and-1.c scan-assembler-not andi
> > FAIL: gcc.target/riscv/shift-and-2.c scan-assembler-not andi
> >
> > I'm quite puzzled why these two testcases give different results with
> > -mbig-endian compared to with -mlittle-endian though, since they only
> > deal with register-to-register operations so the memory model should be
> > completely irrelevant...
> >
> >
> >   // Marcus
> >
> >
> >


Re: [PATCH v2 0/5] RISC-V big endian support

2021-02-22 Thread Marcus Comstedt


Hi Kito,

Kito Cheng  writes:

> FAIL: gcc.c-torture/execute/string-opt-5.c
> FAIL: gcc.target/riscv/shift-and-1.c scan-assembler-not andi
> FAIL: gcc.target/riscv/shift-and-2.c scan-assembler-not andi

string-opt-5.c is one of the newlib issues I mentioned (handcoded
assembler for strcmp which assumed LE (it was intended to #error out
on BE, but used "BYTE_ORDER" instead of "__BYTE_ORDER__", so the check
never worked)).  I'll send the fixes later today.

The shift-and tests don't generate incorrect code or anything, but
it's still puzzling why the generated code is different from with
-mlittle-endian.


  // Marcus




Re: [gcc-12 PATCH] ira: Correct HONOR_REG_ALLOC_ORDER usage

2021-02-22 Thread Richard Biener
On Mon, 22 Feb 2021, Uros Bizjak wrote:

> The intention of HONOR_REG_ALLOC_ORDER is to ensure that IRA allocates
> registers in the order given by REG_ALLOC_ORDER.  However in
> ira_better_spill_reload_regno_p, there is still a place where the
> calculation depends on the presence of REG_ALLOC_ORDER, ignoring
> HONOR_REG_ALLOC_ORDER macro altogether.  The patch uses the correct macro
> at this place.
> 
> On the other hand, assign_hard_reg function respects HONOR_REG_ALLOC_ORDER,
> but expects this macro to return 1 to avoid internal cost calculations.
> As the macro is defined to 0 by default, it is expected that targets redefine
> HONOR_REG_ALLOC_ORDER to return nonzero value, even if REG_ALLOC_ORDER
> is defined.  This approach is prone to errors, so the patch defines
> HONOR_REG_ALLOC_ORDER to 1 by default if REG_ALLOC_ORDER is defined.
> 
> 2021-02-22  Uroš Bizjak  
> 
> gcc/
> * defaults.h (HONOR_REG_ALLOC_ORDER): If not defined,
> define to 1 if REG_ALLOC_ORDER is defined.
> * doc/tm.texi.in (HONOR_REG_ALLOC_ORDER):
> Describe new default definition.
> * doc/tm.texi: Regenerate.
> * ira-color.c (ira_better_spill_reload_regno_p):
> Use HONOR_REG_ALLOC_ORDER instead of REG_ALLOC_ORDER
> to determine better spill reload regno.
> 
> Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> 
> OK for gcc-12 when it opens?

OK in case Vlad doesn't have a better suggestion or further comments.

Do you have an idea for how many targets the changed default is an
actual change?

Thanks,
Richard.

> Uros.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)