[PATCH] Enable init_ggc_heuristics for ENABLE_GC_CHECKING.

2019-11-19 Thread Martin Liška

One potential improvement is to enable the heuristics even when
ENABLE_GC_CHECKING is defined. That macro is about sanity checking
and poisoning of GGC memory, which seems completely independent
of setting the default parameters.

Ready to be installed?
Thanks,
Martin
From e0eab3674caedfcd5b332a335f513c501ab8dfb4 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 19 Nov 2019 09:09:05 +0100
Subject: [PATCH] Enable init_ggc_heuristics for ENABLE_GC_CHECKING.

gcc/ChangeLog:

2019-11-19  Martin Liska  

	* opts.c: Enable init_ggc_heuristics also when
	ENABLE_GC_CHECKING is enabled.
---
 gcc/opts.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/opts.c b/gcc/opts.c
index 7e363e5c1d6..3c197bc2f76 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -275,7 +275,7 @@ init_opts_obstack (void)
   gcc_obstack_init (&opts_obstack);
 }
 
-#if !defined ENABLE_GC_CHECKING && !defined ENABLE_GC_ALWAYS_COLLECT
+#if !defined ENABLE_GC_ALWAYS_COLLECT
 
 /* Modify the bound based on rlimits.  */
 
@@ -373,7 +373,7 @@ ggc_min_heapsize_heuristic (void)
 static void
 init_ggc_heuristics (void)
 {
-#if !defined ENABLE_GC_CHECKING && !defined ENABLE_GC_ALWAYS_COLLECT
+#if !defined ENABLE_GC_ALWAYS_COLLECT
   param_ggc_min_expand = ggc_min_expand_heuristic ();
   param_ggc_min_heapsize = ggc_min_heapsize_heuristic ();
 #endif
-- 
2.24.0



Re: [PATCH v2 6/6] aarch64: Add testsuite checks for asm-flag

2019-11-19 Thread Christophe Lyon
On Mon, 18 Nov 2019 at 20:54, Richard Henderson
 wrote:
>
> On 11/18/19 1:30 PM, Christophe Lyon wrote:
> > I'm sorry to notice that the last test (asm-flag-6.c) fails to execute
> > when compiling with -mabi=ilp32. I have less details than for Arm,
> > because here I'm using the Foundation Model as simulator instead of
> > Qemu. In addition, I'm using an old version of it, so maybe it's a
> > simulator bug. Does it work on your side?
>
> I don't know how to test ilp32 with qemu.  Is there a distribution that uses
> this mode, so that one could test in system mode?  We don't have user-only
> support for ilp32.
>

Sorry I wasn't clear: I test aarch64-elf with -mabi=ilp32, using newlib.

> I think I have reproduced this with newlib and aarch64-elf.  It could be
> solvable by using either unsigned long long, or by using %w constraints with
> the two cmp instructions.
>
> Except that I made that change and the failure didn't go away.  I'm having
> trouble building a version of gdb that can debug this...
>
>
> r~
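
For reference, a minimal sketch of the %w idea mentioned above
(hypothetical function, assuming the aarch64 asm-flag support added by
this series): the "w" operand modifier prints the 32-bit register
name, so a 32-bit 'long' under -mabi=ilp32 is compared in a w register
rather than an x register.

long
test_ge (long x, long y)
{
  long ge;
  __asm__ ("cmp %w[x], %w[y]"
	   : "=@ccge" (ge)
	   : [x] "r" (x), [y] "r" (y));
  return ge;
}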


Re: [PATCH] libstdc++: Define C++20 range utilities and range factories

2019-11-19 Thread Stephan Bergmann

On 17/11/2019 02:07, Jonathan Wakely wrote:

This adds another chunk of the <ranges> header.

The changes from P1456R1 (Move-only views) and P1862R1 (Range adaptors
for non-copyable iterators) are included, but not the changes from
P1870R1 (forwarding-range is too subtle).

The tests for subrange and iota_view are poor and should be improved.


At least with recent Clang trunk, with -std=c++2a this causes failures like


In file included from 
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/10.0.0/../../../../include/c++/10.0.0/set:61:
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/10.0.0/../../../../include/c++/10.0.0/bits/stl_set.h:1057:48:
 error: default initialization of an object of const type 'const bool'
  template<typename _Tp> inline constexpr bool __enable_view_impl;
   ^
In file included from 
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/10.0.0/../../../../include/c++/10.0.0/set:62:
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/10.0.0/../../../../include/c++/10.0.0/bits/stl_multiset.h:1045:48:
 error: redefinition of '__enable_view_impl'
  template<typename _Tp> inline constexpr bool __enable_view_impl;
   ^
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/10.0.0/../../../../include/c++/10.0.0/bits/stl_set.h:1057:48:
 note: previous definition is here
  template<typename _Tp> inline constexpr bool __enable_view_impl;
   ^


and I'm not sure whether this is legal C++20 code that Clang is just not 
capable of handling?




[PATCH] Fix up __builtin_mul_overflow for signed * signed -> unsigned (PR middle-end/91450)

2019-11-19 Thread Jakub Jelinek
Hi!

This case is correctly described in the comment in expand_mul_overflow:
 s1 * s2 -> ur
t1 = (s1 & s2) < 0 ? (-(U) s1) : ((U) s1)
t2 = (s1 & s2) < 0 ? (-(U) s2) : ((U) s2)
res = t1 * t2
ovf = (s1 ^ s2) < 0 ? (s1 && s2) : main_ovf (true)  */
where if one of the operands is negative and the other non-negative,
there is an overflow whenever both operands are non-zero, i.e. s1 && s2,
but unfortunately the code has been implementing s1 & s2 instead,
which is incorrect, e.g. for -2 and 1, where (-2 & 1) == 0 but
(-2 && 1) != 0.
The second hunk handles the case when we know from VRP which operand
is negative and which one is non-negative; obviously the negative one
can't be zero, so we can just compare the non-negative one against zero.
The third hunk is when we don't know this precisely from VRP.  In that case
we used to do essentially
  if ((s1 & s2) == 0)
goto main_label;
  ovf = 1;
main_label:
  ovf |= __builtin_mul_overflow ((unsigned) s1, (unsigned) s2, &r);
(the fallthrough is ok, because mul_overflow when one of the operands
is zero will just compute zero), this patch changes it to:
  if (s1 == 0)
goto main_label;
  if (s2 == 0)
goto main_label;
  ovf = 1;
main_label:
  ovf |= __builtin_mul_overflow ((unsigned) s1, (unsigned) s2, &r);
but as an optimization, if we know from VRP that one of the operands
is negative, we don't bother comparing that operand to 0.
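
As a self-contained illustration (a sketch, not the committed
testcase), -2 * 1 is -2, which does not fit in unsigned int, so the
built-in must report overflow; the old (s1 & s2) check missed it
because (-2 & 1) == 0:

int
main ()
{
  int a = -2, b = 1;
  unsigned int r;
  if (!__builtin_mul_overflow (a, b, &r))
    __builtin_abort ();	/* reached with the old, buggy expansion */
  return 0;
}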

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
and release branches?

2019-11-19  Jakub Jelinek  

PR middle-end/91450
* internal-fn.c (expand_mul_overflow): For s1 * s2 -> ur, if one
operand is negative and one non-negative, compare the non-negative
one against 0 rather than comparing s1 & s2 against 0.  Otherwise,
don't compare (s1 & s2) == 0, but compare separately both s1 == 0
and s2 == 0, unless one of them is known to be negative.  Remove
tem2 variable, use tem where tem2 has been used before.

* gcc.c-torture/execute/pr91450-1.c: New test.
* gcc.c-torture/execute/pr91450-2.c: New test.

--- gcc/internal-fn.c.jj2019-11-08 11:33:48.0 +0100
+++ gcc/internal-fn.c   2019-11-18 14:41:48.247381649 +0100
@@ -1408,7 +1408,7 @@ expand_mul_overflow (location_t loc, tre
   /* s1 * s2 -> ur  */
   if (!uns0_p && !uns1_p && unsr_p)
 {
-  rtx tem, tem2;
+  rtx tem;
   switch (pos_neg0 | pos_neg1)
{
case 1: /* Both operands known to be non-negative.  */
@@ -1438,10 +1438,8 @@ expand_mul_overflow (location_t loc, tre
  ops.op2 = NULL_TREE;
  ops.location = loc;
  res = expand_expr_real_2 (&ops, NULL_RTX, mode, EXPAND_NORMAL);
- tem = expand_binop (mode, and_optab, op0, op1, NULL_RTX, false,
- OPTAB_LIB_WIDEN);
- do_compare_rtx_and_jump (tem, const0_rtx, EQ, true, mode,
-  NULL_RTX, NULL, done_label,
+ do_compare_rtx_and_jump (pos_neg0 == 1 ? op0 : op1, const0_rtx, EQ,
+  true, mode, NULL_RTX, NULL, done_label,
   profile_probability::very_likely ());
  goto do_error_label;
}
@@ -1472,16 +1470,23 @@ expand_mul_overflow (location_t loc, tre
  arg1 = error_mark_node;
  emit_jump (do_main_label);
  emit_label (after_negate_label);
- tem2 = expand_binop (mode, xor_optab, op0, op1, NULL_RTX, false,
-  OPTAB_LIB_WIDEN);
- do_compare_rtx_and_jump (tem2, const0_rtx, GE, false, mode, NULL_RTX,
-  NULL, do_main_label, profile_probability::very_likely ());
+ tem = expand_binop (mode, xor_optab, op0, op1, NULL_RTX, false,
+ OPTAB_LIB_WIDEN);
+ do_compare_rtx_and_jump (tem, const0_rtx, GE, false, mode, NULL_RTX,
+  NULL, do_main_label,
+  profile_probability::very_likely ());
  /* One argument is negative here, the other positive.  This
 overflows always, unless one of the arguments is 0.  But
 if e.g. s2 is 0, (U) s1 * 0 doesn't overflow, whatever s1
 is, thus we can keep do_main code oring in overflow as is.  */
- do_compare_rtx_and_jump (tem, const0_rtx, EQ, true, mode, NULL_RTX,
-  NULL, do_main_label, profile_probability::very_likely ());
+ if (pos_neg0 != 2)
+   do_compare_rtx_and_jump (op0, const0_rtx, EQ, true, mode, NULL_RTX,
+NULL, do_main_label,
+profile_probability::very_unlikely ());
+ if (pos_neg1 != 2)
+   do_compare_rtx_and_jump (op1, const0_rtx, EQ, true, mode, NULL_RTX,
+NULL, do_main_label,
+   

Re: [PATCH] Restore enable_if lost during original import of pstl

2019-11-19 Thread Jonathan Wakely

On 18/11/19 20:54 -0800, Thomas Rodgers wrote:


* include/pstl/glue_numeric_defs.h: Restore enable_if lost during
original import of pstl.
* include/pstl/glue_numeric_impl.h: Likewise.


OK for trunk and gcc-9-branch, thanks.



Re: [PATCH] libstdc++: Define C++20 range utilities and range factories

2019-11-19 Thread Jonathan Wakely

On 19/11/19 09:38 +0100, Stephan Bergmann wrote:

On 17/11/2019 02:07, Jonathan Wakely wrote:

This adds another chunk of the <ranges> header.

The changes from P1456R1 (Move-only views) and P1862R1 (Range adaptors
for non-copyable iterators) are included, but not the changes from
P1870R1 (forwarding-range is too subtle).

The tests for subrange and iota_view are poor and should be improved.


At least with recent Clang trunk, with -std=c++2a this causes failures like


In file included from 
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/10.0.0/../../../../include/c++/10.0.0/set:61:
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/10.0.0/../../../../include/c++/10.0.0/bits/stl_set.h:1057:48:
 error: default initialization of an object of const type 'const bool'
 template<typename _Tp> inline constexpr bool __enable_view_impl;
  ^
In file included from 
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/10.0.0/../../../../include/c++/10.0.0/set:62:
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/10.0.0/../../../../include/c++/10.0.0/bits/stl_multiset.h:1045:48:
 error: redefinition of '__enable_view_impl'
 template<typename _Tp> inline constexpr bool __enable_view_impl;
  ^
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/10.0.0/../../../../include/c++/10.0.0/bits/stl_set.h:1057:48:
 note: previous definition is here
 template<typename _Tp> inline constexpr bool __enable_view_impl;
  ^


and I'm not sure whether this is legal C++20 code that Clang is just 
not capable of handling?


I'm not sure either. I checked with GCC and it did what I expected
(the first declaration is not a definition, just a declaration). I'll
check if that's valid.

For now the simplest fix would be to just guard all those variable
templates with #if __cpp_lib_concepts since std::ranges::enable_view
only exists when concepts are supported (which is false for Clang).

I'll look into it, thanks for the heads up.




Re: [WIP PATCH] add object access attributes (PR 83859)

2019-11-19 Thread Richard Biener
On Mon, Nov 18, 2019 at 5:45 PM Martin Sebor  wrote:
>
> On 11/18/19 1:36 AM, Richard Biener wrote:
> > On Fri, Nov 15, 2019 at 10:28 PM Martin Sebor  wrote:
> >>
> >> Thanks for the suggestion.  I will do that for GCC 11.  I take
> >> Richard's point that the attributes' semantics need to be clearly
> >> and carefully specified before they're put to use for optimization.
> >
> > Before they are exposed to users please.  It doesn't help if we
> > specify the same attribute for optimization later when uses are out
> > in the wild "guessing" at what the possible interpretation is.
> >
> > Maybe we can name your attributes maybe_readonly and friends
> > to clearly indicate that this is only a guess by the user so at most
> > usable for diagnostics but never for optimization.
> >
> > Since we have quite costly attribute lookup I also prefer something
> > that translates to less attributes - how about
> > __attribute__((diag_argspec(1, readonly), diag_argspec(2, writeonly)))
> > to indicate argument 1 is maybe readonly, 2 is writeonly?  We can
> > then merge this into a single diag_argspec attribute instance we can
> > look up.
>
> I can look into making a change along these lines.
>
> I'm not fond of the idea of introducing a "maybe" kind of attributes
> now and another parallel "for-sure" set later.  My goal is to have
> the attributes express the same access constraints as those on
> the arguments to built-in string functions like memcpy or strcat
> (i.e., read_only only reads a pointed-to object, write_only only
> writes, and, for strcat, read_write both reads and writes it).
>
> Those properties are sufficiently well understood.  The three
> attributes aren't intended to express constraints on aliasing
> or on the side-effects of the functions, like restrict or
> the const and pure attributes.

They at least sound like they do.  I didn't look at the latest version
of the patch but please amend the documentation of the attributes
to then say that they will never be used in a way affecting code
generation.

Thanks,
Richard.

>
> To let users do more than that, some additional annotation will
> probably be necessary.  In my WIP patch I have a no_side_effect
> attribute that lets functions do more than const and pure but
> that's still work in progress that I don't plan to submit for
> GCC 10.
>
> Martin
>
> >
> >>>
> >>> I don't see anything terribly concerning.  Looking forward to the final
> >>> iteration here.
> >>
> >> Attached is a subset of the original patch that just adds the three
> >> attributes and uses them to do buffer overflow checking.  I have
> >> also enhanced the detection of invalid arguments (null pointers,
> >> negative sizes).
> >>
> >> Retested on x86_64-linux.
> >>
> >> Martin
>


[committed] Punt instead of assertion failure when TYPE_MODE is not vmode (PR tree-optimization/92557)

2019-11-19 Thread Jakub Jelinek
Hi!

This patch restores the previous behavior, there could be many reasons why
TYPE_MODE is BLKmode or some integral mode instead of a vector mode,
unsupported vector mode, lack of available registers etc.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2019-11-19  Jakub Jelinek  

PR tree-optimization/92557
* omp-low.c (omp_clause_aligned_alignment): Punt if TYPE_MODE is not
vmode rather than asserting it always is.

* gcc.dg/gomp/pr92557.c: New test.

--- gcc/omp-low.c.jj2019-11-15 00:37:26.359069152 +0100
+++ gcc/omp-low.c   2019-11-18 16:17:27.767714777 +0100
@@ -4086,8 +4086,8 @@ omp_clause_aligned_alignment (tree claus
if (type == NULL_TREE || TYPE_MODE (type) != mode)
  continue;
type = build_vector_type_for_mode (type, vmode);
-   /* The functions above are not allowed to return invalid modes.  */
-   gcc_assert (TYPE_MODE (type) == vmode);
+   if (TYPE_MODE (type) != vmode)
+ continue;
if (TYPE_ALIGN_UNIT (type) > al)
  al = TYPE_ALIGN_UNIT (type);
   }
--- gcc/testsuite/gcc.dg/gomp/pr92557.c.jj	2019-11-18 16:18:18.246961685 +0100
+++ gcc/testsuite/gcc.dg/gomp/pr92557.c 2019-11-18 16:19:21.531017555 +0100
@@ -0,0 +1,13 @@
+/* PR tree-optimization/92557 */
+/* { dg-do compile } */
+/* { dg-additional-options "-maltivec" { target powerpc*-*-* } } */
+
+void
+foo (double *p)
+{
+  int i;
+
+#pragma omp simd aligned (p)
+  for (i = 0; i < 1; ++i)
+p[i] = 7.0;
+}

Jakub



[PING] Re: [PATCH] Fix PR92088

2019-11-19 Thread Richard Biener
On Fri, 8 Nov 2019, Richard Biener wrote:

> 
> The following works around a middle-end limitation, namely that it
> cannot deal with by-value passing of VLAs transparently during
> inlining (only DECL_BY_REFERENCE is handled), by marking such
> functions as not inlinable in the C frontend.  This avoids ICEs later.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> OK for trunk?

Ping.

> Thanks,
> Richard.
> 
> 2019-11-09  Richard Biener  
> 
>   PR c/92088
>   c/
>   * c-decl.c (grokdeclarator): Prevent inlining of nested
>   function with VLA arguments.
> 
>   * builtins.c (compute_objsize): Deal with VLAs.
> 
>   * gcc.dg/torture/pr92088-1.c: New testcase.
>   * gcc.dg/torture/pr92088-2.c: Likewise.
> 
> Index: gcc/builtins.c
> ===
> --- gcc/builtins.c(revision 277906)
> +++ gcc/builtins.c(working copy)
> @@ -3708,7 +3708,8 @@ compute_objsize (tree dest, int ostype,
>if (DECL_P (ref))
>  {
>*pdecl = ref;
> -  return DECL_SIZE_UNIT (ref);
> +  if (tree size = DECL_SIZE_UNIT (ref))
> + return TREE_CODE (size) == INTEGER_CST ? size : NULL_TREE;
>  }
>  
>tree type = TREE_TYPE (dest);
> Index: gcc/c/c-decl.c
> ===
> --- gcc/c/c-decl.c(revision 277906)
> +++ gcc/c/c-decl.c(working copy)
> @@ -7304,6 +7304,23 @@ grokdeclarator (const struct c_declarato
>   "no linkage");
>}
>  
> +/* For nested functions disqualify ones taking VLAs by value
> +   from inlining since the middle-end cannot deal with this.
> +   ???  We should arrange for those to be passed by reference
> +   with emitting the copy on the caller side in the frontend.  */
> +if (storage_class == csc_none
> + && TREE_CODE (type) == FUNCTION_TYPE)
> +  for (tree al = TYPE_ARG_TYPES (type); al; al = TREE_CHAIN (al))
> + {
> +   tree arg = TREE_VALUE (al);
> +   if (arg != error_mark_node
> +   && C_TYPE_VARIABLE_SIZE (TREE_VALUE (al)))
> + {
> +   DECL_UNINLINABLE (decl) = 1;
> +   break;
> + }
> + }
> +
>  /* Record `register' declaration for warnings on &
> and in case doing stupid register allocation.  */
>  
> Index: gcc/testsuite/gcc.dg/torture/pr92088-1.c
> ===
> --- gcc/testsuite/gcc.dg/torture/pr92088-1.c  (revision 0)
> +++ gcc/testsuite/gcc.dg/torture/pr92088-1.c  (working copy)
> @@ -0,0 +1,22 @@
> +/* { dg-do run } */
> +
> +int __attribute__((noipa))
> +g (char *p)
> +{
> +  return p[9];
> +}
> +int main (int argc, char **argv)
> +{
> +  struct S {
> +char toto[argc + 16];
> +  };
> +  int f (struct S arg) {
> +  __builtin_strcpy(arg.toto, "helloworld");
> +  return g (arg.toto);
> +  }
> +  struct S bob;
> +  __builtin_strcpy(bob.toto, "coucoucoucou");
> +  if (f(bob) != 'd' || __builtin_strcmp (bob.toto, "coucoucoucou"))
> +__builtin_abort ();
> +  return 0;
> +}
> Index: gcc/testsuite/gcc.dg/torture/pr92088-2.c
> ===
> --- gcc/testsuite/gcc.dg/torture/pr92088-2.c  (revision 0)
> +++ gcc/testsuite/gcc.dg/torture/pr92088-2.c  (working copy)
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +
> +void foo(int n)
> +{
> +  struct X { int a[n]; } y;
> +
> +  struct X baz (struct X x)
> +{
> +  x.a[0] = 1;
> +  return x;
> +}
> +
> +  y.a[0] = 0;
> +  y = baz(y);
> +  if (y.a[0] != 1)
> +__builtin_abort ();
> +}
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

[committed] Fix ICE in handle_omp_class_iterator (PR c++/92504)

2019-11-19 Thread Jakub Jelinek
Hi!

Calling cp_fully_fold (which has been introduced with C++ late folding)
before actually diagnosing invalid arguments of the comparison is
problematic, because as the testcase shows the folding can then ICE before
the invalid code is reported.

Removing the cp_fully_fold call doesn't regress anything, so I've removed
it after doing bootstrap/regtest on x86_64-linux and i686-linux and committed
to trunk.

2019-11-19  Jakub Jelinek  

PR c++/92504
* semantics.c (handle_omp_for_class_iterator): Don't call
cp_fully_fold on cond.

* g++.dg/gomp/pr92504.C: New test.

--- gcc/cp/semantics.c.jj   2019-11-11 21:04:24.837962966 +0100
+++ gcc/cp/semantics.c  2019-11-18 19:19:36.921680774 +0100
@@ -8434,7 +8434,6 @@ handle_omp_for_class_iterator (int i, lo
   if (init && EXPR_HAS_LOCATION (init))
 elocus = EXPR_LOCATION (init);
 
-  cond = cp_fully_fold (cond);
   switch (TREE_CODE (cond))
 {
 case GT_EXPR:
--- gcc/testsuite/g++.dg/gomp/pr92504.C.jj	2019-11-18 19:18:45.211452647 +0100
+++ gcc/testsuite/g++.dg/gomp/pr92504.C 2019-11-18 19:19:19.660938426 +0100
@@ -0,0 +1,29 @@
+// PR c++/92504
+// { dg-do compile { target c++11 } }
+// { dg-additional-options "-O2" }
+
+namespace std {
+  typedef __SIZE_TYPE__ size_t;
+  typedef __PTRDIFF_TYPE__ ptrdiff_t;
+}
+
+struct A {
+  A ();
+  A (const A &);
+  A & operator++ ();
+  bool operator != (const A &) const;
+  std::ptrdiff_t operator - (const A &);
+  A & operator += (std::size_t);
+  int a;
+  A & begin ();
+  A & end ();  // { dg-message "declared here" }
+};
+
+void
+bar ()
+{
+  A a;
+  #pragma omp for
+  for (auto b = a; b != a.end; ++b)	// { dg-error "invalid use of non-static member function" }
+;
+}

Jakub



[PATCH 2/3] [ARC] Add scaled load pattern

2019-11-19 Thread Claudiu Zissulescu
ARC processors can use scaled addresses, i.e., the offset part of the
load address can be shifted by 2 (multiplied by 4). Add this pattern
and a test for it.
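
For illustration, a minimal sketch of source the new pattern can catch
(hypothetical function; with 4-byte elements the index is scaled by 4
inside the load, giving a single scaled load instead of a shift plus
plain load):

int
load_elem (const int *base, int i)
{
  return base[i];	/* address = base + i*4 -> ld.as rD,[rBase,rI] */
}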

gcc/
-xx-xx  Claudiu Zissulescu  

* config/arc/arc.md (load_scaledsi): New pattern.

gcc/testsuite/
-xx-xx  Claudiu Zissulescu  

* gcc.target/arc/scaled-ld.c: New test.
---
 gcc/config/arc/arc.md| 12 
 gcc/testsuite/gcc.target/arc/scaled-ld.c | 13 +
 2 files changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arc/scaled-ld.c

diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index d2b7a45b6e6..ed16be65cab 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -867,6 +867,18 @@ archs4x, archs4xd"
   "st%U0 %1,%0\;st%U0.di %1,%0"
   [(set_attr "type" "store")])
 
+(define_insn "*load_scaledsi"
+  [(set (match_operand:SI 0 "register_operand" "=q,r,r")
+   (mem:SI (plus:SI (mult:SI (match_operand:SI 1 "register_operand" "q,r,r")
+                             (const_int 4))
+                    (match_operand:SI 2 "nonmemory_operand" "q,r,Cal"))))]
+  ""
+  "ld%?.as\\t%0,[%2,%1]"
+  [(set_attr "type" "load")
+   (set_attr "iscompact" "true,false,false")
+   (set_attr "length" "2,4,8")
+   (set_attr "cpu_facility" "cd,*,*")])
+
 ;; Combiner patterns for compare with zero
 (define_mode_iterator SQH [QI HI])
 (define_mode_attr SQH_postfix [(QI "b") (HI "%_")])
diff --git a/gcc/testsuite/gcc.target/arc/scaled-ld.c b/gcc/testsuite/gcc.target/arc/scaled-ld.c
new file mode 100644
index 000..bebae8fe505
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/scaled-ld.c
@@ -0,0 +1,13 @@
+/* Simple test for scaled load addressing.  */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+const int table[2] = {1, 2};
+
+int
+foo (char i)
+{
+  return table[i];
+}
+
+/* { dg-final { scan-assembler "ld.as" } } */
-- 
2.23.0



[PATCH 3/3] [ARC] Register ARC specific passes with a .def file.

2019-11-19 Thread Claudiu Zissulescu
Use arc-passes.def to register ARC specific passes.

Ok to apply?
Claudiu

gcc/
-xx-xx  Claudiu Zissulescu  

* config/arc/arc-protos.h (make_pass_arc_ifcvt): Declare.
(make_pass_arc_predicate_delay_insns): Likewise.
* config/arc/arc.c (class pass_arc_ifcvt): Reformat text, add gate
method.
(class pass_arc_predicate_delay_insns): Likewise.
(arc_init): Remove registering of ARC specific passes.
* config/arc/t-arc (PASSES_EXTRA): Add arc-passes.def.
* config/arc/arc-passes.def: New file.
---
 gcc/config/arc/arc-passes.def | 29 
 gcc/config/arc/arc-protos.h   |  3 ++
 gcc/config/arc/arc.c  | 64 +++
 gcc/config/arc/t-arc  |  2 ++
 4 files changed, 61 insertions(+), 37 deletions(-)
 create mode 100644 gcc/config/arc/arc-passes.def

diff --git a/gcc/config/arc/arc-passes.def b/gcc/config/arc/arc-passes.def
new file mode 100644
index 000..ebb69a563df
--- /dev/null
+++ b/gcc/config/arc/arc-passes.def
@@ -0,0 +1,29 @@
+/* Description of target passes for ARC.
+   Copyright (C) 2019 Free Software Foundation, Inc. */
+
+/* This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it under
+   the terms of the GNU General Public License as published by the Free
+   Software Foundation; either version 3, or (at your option) any later
+   version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or
+   FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+   for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+/* First target dependent ARC if-conversion pass.  */
+INSERT_PASS_AFTER (pass_delay_slots, 1, pass_arc_ifcvt);
+
+/* Second target dependent ARC if-conversion pass.  */
+INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_arc_ifcvt);
+
+/* Find annulled delay insns and convert them to use the appropriate
+   predicate.  This allows branch shortening to size up these
+   instructions properly.  */
+INSERT_PASS_AFTER (pass_delay_slots, 1, pass_arc_predicate_delay_insns);
diff --git a/gcc/config/arc/arc-protos.h b/gcc/config/arc/arc-protos.h
index 1220e77206d..da128dcbcb2 100644
--- a/gcc/config/arc/arc-protos.h
+++ b/gcc/config/arc/arc-protos.h
@@ -111,3 +111,6 @@ extern void arc_eh_return_address_location (rtx);
 extern bool arc_is_jli_call_p (rtx);
 extern void arc_file_end (void);
 extern bool arc_is_secure_call_p (rtx);
+
+rtl_opt_pass * make_pass_arc_ifcvt (gcc::context *ctxt);
+rtl_opt_pass * make_pass_arc_predicate_delay_insns (gcc::context *ctxt);
diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 918c4e06533..d67775448a3 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -961,14 +961,24 @@ const pass_data pass_data_arc_ifcvt =
 
 class pass_arc_ifcvt : public rtl_opt_pass
 {
-public:
-  pass_arc_ifcvt(gcc::context *ctxt)
-  : rtl_opt_pass(pass_data_arc_ifcvt, ctxt)
-  {}
+ public:
+ pass_arc_ifcvt (gcc::context *ctxt)
+   : rtl_opt_pass (pass_data_arc_ifcvt, ctxt)
+{}
 
   /* opt_pass methods: */
-  opt_pass * clone () { return new pass_arc_ifcvt (m_ctxt); }
-  virtual unsigned int execute (function *) { return arc_ifcvt (); }
+  opt_pass * clone ()
+{
+  return new pass_arc_ifcvt (m_ctxt);
+}
+  virtual unsigned int execute (function *)
+  {
+return arc_ifcvt ();
+  }
+  virtual bool gate (function *)
+  {
+return (optimize > 1 && !TARGET_NO_COND_EXEC);
+  }
 };
 
 } // anon namespace
@@ -998,16 +1008,20 @@ const pass_data pass_data_arc_predicate_delay_insns =
 
 class pass_arc_predicate_delay_insns : public rtl_opt_pass
 {
-public:
-  pass_arc_predicate_delay_insns(gcc::context *ctxt)
-  : rtl_opt_pass(pass_data_arc_predicate_delay_insns, ctxt)
-  {}
+ public:
+ pass_arc_predicate_delay_insns(gcc::context *ctxt)
+   : rtl_opt_pass(pass_data_arc_predicate_delay_insns, ctxt)
+{}
 
   /* opt_pass methods: */
   virtual unsigned int execute (function *)
-{
-  return arc_predicate_delay_insns ();
-}
+  {
+return arc_predicate_delay_insns ();
+  }
+  virtual bool gate (function *)
+  {
+return flag_delayed_branch;
+  }
 };
 
 } // anon namespace
@@ -1100,30 +1114,6 @@ arc_init (void)
   arc_punct_chars['&'] = 1;
   arc_punct_chars['+'] = 1;
   arc_punct_chars['_'] = 1;
-
-  if (optimize > 1 && !TARGET_NO_COND_EXEC)
-{
-  /* There are two target-independent ifcvt passes, and arc_reorg may do
-one or more arc_ifcvt calls.  */
-  opt_pass *pass_arc_ifcvt_4 = make_pass_arc_ifcvt (g);
-  struct register_pass_info arc_ifcvt4_info
-   = { pass_arc_ifcvt_4, "dbr", 1, PASS_POS_INSERT_AFTER };
-  struct register_pass_info arc_ifcvt5_info
-   = { pass_arc_ifcvt_4->clone (), "shorten", 1, PASS_POS_INSER

[PATCH 1/3] [ARC] Fix failing pr77309 for ARC700

2019-11-19 Thread Claudiu Zissulescu
The patterns neg_scc_insn and not_scc_insn are not correct, causing the
pr77309 test to fail for ARC700. Add two new bic-compare-with-zero
patterns to improve the generated code.
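
For illustration, a sketch of the kind of source the new patterns
match (hypothetical function; bic.f computes op2 & ~op1 and sets the
Z/N flags, so no separate test instruction is needed):

int
any_bits_outside (unsigned int val, unsigned int mask)
{
  return (val & ~mask) != 0;	/* -> bic.f 0,r_val,r_mask */
}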

gcc/
-xx-xx  Claudiu Zissulescu  

* config/arc/arc.md (bic_f): Use cc_set_register predicate.
(bic_cmp0_noout): New pattern.
(bic_cmp0): Likewise.
(neg_scc_insn): Remove pattern.
(not_scc_insn): Likewise.
---
 gcc/config/arc/arc.md | 52 ---
 1 file changed, 29 insertions(+), 23 deletions(-)

diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index 7da52199415..d2b7a45b6e6 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -1272,7 +1272,7 @@ archs4x, archs4xd"
   "")
 
 (define_insn "*bic_f"
-  [(set (match_operand 3 "cc_register" "=Rcc,Rcc,Rcc")
+  [(set (match_operand 3 "cc_set_register" "")
(match_operator 4 "zn_compare_operator"
  [(and:SI (match_operand:SI 1 "register_operand" "c,0,c")
   (not:SI
@@ -1286,6 +1286,34 @@ archs4x, archs4xd"
(set_attr "cond" "set_zn,set_zn,set_zn")
(set_attr "length" "4,4,8")])
 
+(define_insn "*bic_cmp0_noout"
+  [(set (match_operand 0 "cc_set_register" "")
+   (compare:CC_ZN
+(and:SI (not:SI (match_operand:SI 1 "nonmemory_operand" "Lr,Cal,r"))
+(match_operand:SI 2 "nonmemory_operand" "r,r,Cal"))
+(const_int 0)))]
+  "register_operand (operands[1], SImode)
+   || register_operand (operands[2], SImode)"
+  "bic.f\\t0,%2,%1"
+  [(set_attr "type" "unary")
+   (set_attr "cond" "set_zn")
+   (set_attr "length" "4,8,8")])
+
+(define_insn "*bic_cmp0"
+  [(set (match_operand 0 "cc_set_register" "")
+   (compare:CC_ZN
+(and:SI (not:SI (match_operand:SI 1 "nonmemory_operand" "Lr,Cal,r"))
+(match_operand:SI 2 "nonmemory_operand" "r,r,Cal"))
+(const_int 0)))
+   (set (match_operand:SI 3 "register_operand" "=r,r,r")
+   (and:SI (not:SI (match_dup 1)) (match_dup 2)))]
+  "register_operand (operands[1], SImode)
+   || register_operand (operands[2], SImode)"
+  "bic.f\\t%3,%2,%1"
+  [(set_attr "type" "unary")
+   (set_attr "cond" "set_zn")
+   (set_attr "length" "4,8,8")])
+
 (define_expand "movdi"
   [(set (match_operand:DI 0 "move_dest_operand" "")
(match_operand:DI 1 "general_operand" ""))]
@@ -3763,28 +3791,6 @@ archs4x, archs4xd"
 }
   [(set_attr "type" "unary")])
 
-;; ??? At least for ARC600, we should use sbc b,b,s12 if we want a value
-;; that is one lower if the carry flag is set.
-
-;; ??? Look up negscc insn.  See pa.md for example.
-(define_insn "*neg_scc_insn"
-  [(set (match_operand:SI 0 "dest_reg_operand" "=w")
-   (neg:SI (match_operator:SI 1 "proper_comparison_operator"
-[(reg CC_REG) (const_int 0)])))]
-  ""
-  "mov %0,-1\;sub.%D1 %0,%0,%0"
-  [(set_attr "type" "unary")
-   (set_attr "length" "8")])
-
-(define_insn "*not_scc_insn"
-  [(set (match_operand:SI 0 "dest_reg_operand" "=w")
-   (not:SI (match_operator:SI 1 "proper_comparison_operator"
-[(reg CC_REG) (const_int 0)])))]
-  ""
-  "mov %0,1\;sub.%d1 %0,%0,%0"
-  [(set_attr "type" "unary")
-   (set_attr "length" "8")])
-
 ; cond_exec patterns
 (define_insn "*movsi_ne"
   [(cond_exec
-- 
2.23.0



[PATCH] Use xchg for -Os (PR target/92549)

2019-11-19 Thread Jakub Jelinek
Hi!

xchg instruction is smaller, in some cases much smaller than 3 moves,
(e.g. in the testcase 2 bytes vs. 8 bytes), and is not a performance
disaster, but from Agner Fog tables and
https://stackoverflow.com/questions/45766444/why-is-xchg-reg-reg-a-3-micro-op-instruction-on-modern-intel-architectures
it doesn't seem to be something we'd want to use when optimizing for speed,
at least not on Intel.

While we have *swap patterns, those are very unlikely to be triggered
during combine, usually we have different pseudos in there and the actual
need for swapping is only materialized during RA.

The following patch does it when optimizing the insn for size only.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-11-19  Jakub Jelinek  

PR target/92549
* config/i386/i386.md (peephole2 for *swap): New peephole2.

* gcc.target/i386/pr92549.c: New test.

--- gcc/config/i386/i386.md.jj  2019-10-28 22:16:14.583008121 +0100
+++ gcc/config/i386/i386.md 2019-11-18 17:06:36.050742545 +0100
@@ -2787,6 +2787,17 @@
(set_attr "amdfam10_decode" "double")
(set_attr "bdver1_decode" "double")])
 
+(define_peephole2
+  [(set (match_operand:SWI 0 "register_operand")
+   (match_operand:SWI 1 "register_operand"))
+   (set (match_dup 1)
+   (match_operand:SWI 2 "register_operand"))
+   (set (match_dup 2) (match_dup 0))]
+  "peep2_reg_dead_p (3, operands[0])
+   && optimize_insn_for_size_p ()"
+  [(parallel [(set (match_dup 1) (match_dup 2))
+ (set (match_dup 2) (match_dup 1))])])
+
 (define_expand "movstrict"
   [(set (strict_low_part (match_operand:SWI12 0 "register_operand"))
(match_operand:SWI12 1 "general_operand"))]
--- gcc/testsuite/gcc.target/i386/pr92549.c.jj	2019-11-18 17:48:35.533177377 +0100
+++ gcc/testsuite/gcc.target/i386/pr92549.c	2019-11-18 17:49:31.888336444 +0100
@@ -0,0 +1,28 @@
+/* PR target/92549 */
+/* { dg-do compile } */
+/* { dg-options "-Os -masm=att" } */
+/* { dg-final { scan-assembler "xchgl" } } */
+
+#ifdef __i386__
+#define R , regparm (2)
+#else
+#define R
+#endif
+
+__attribute__((noipa R)) int
+bar (int a, int b)
+{
+  return b - a + 5;
+}
+
+__attribute__((noipa R)) int
+foo (int a, int b)
+{
+  return 1 + bar (b, a);
+}
+
+int
+main ()
+{
+  return foo (39, 3);
+}

Jakub



Re: [PATCH] Enable init_ggc_heuristics for ENABLE_GC_CHECKING.

2019-11-19 Thread Richard Biener
On Tue, Nov 19, 2019 at 9:12 AM Martin Liška  wrote:
>
> One potential improvement is to enable the heuristics even when
> ENABLE_GC_CHECKING is defined. That macro is about sanity checking
> and poisoning of GGC memory, which seems completely independent
> of setting the default parameters.

Well, doing more collects is also "checking", and the GC checking part
this way becomes deterministic (not dependent on host setup), so no,
this is by no means an obvious change.  If we think we collect too
often we can adjust the GC param static defaults to today's reality instead
(mostly --param ggc-min-heapsize which has been 4MB for ages).

Btw, I see the params.opt default is now 4096 unconditionally so
does ENABLE_GC_ALWAYS_COLLECT no longer "work"?
It used to adjust the params.def default values.

Richard.

> Ready to be installed?
> Thanks,
> Martin


Re: [PATCH] Add one more pass_convert_switch late.

2019-11-19 Thread Jakub Jelinek
On Mon, Nov 18, 2019 at 03:34:35PM +0100, Martin Liška wrote:
> > Now, I believe with the if to gswitch optimization these will only rarely
> > kick in, because the IL will have switches that reassoc doesn't handle,
> > instead of series of ifs.
> 
> Yes, so my question is whether reassoc can handle gswitch statements similarly
> to series of if statements? Note that a user can write explicitly something like:

I don't see how.  Either the cswitch pass runs early, and then you need to
take it into account when deciding whether to expand it as series of ifs or
table lookup or bittest (say, realize you'd need just a single condition to
cover everything and go with if series), or it runs late and then on the
other side if reassoc turned gswitch into if series without taking into
account other possibilities like table lookup, it could pessimize stuff.
So, IMHO cswitch should be able to do this too.

> 
> switch (a)
>case 0...30:
>   return 1;
>case 32:
>   return 1;
>case 34:
>   return 1;
> 
> which won't be optimized by reassoc, while it can handle something like
> 0<=A<=30 | A==32 | A==34.
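
For comparison, a sketch (hypothetical function) of the explicit
if-form reassoc can already optimize into a range test plus equality
tests:

int
f (unsigned int a)
{
  return (a <= 30) | (a == 32) | (a == 34);
}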

Jakub



Fix typo in the manual

2019-11-19 Thread Eric Botcazou
Tested on x86_64-suse-linux, applied on all active branches.


2019-11-19  Eric Botcazou  

* doc/invoke.texi (-gno-internal-reset-location-views): Fix typo.

-- 
Eric Botcazou

Index: doc/invoke.texi
===
--- doc/invoke.texi	(revision 278433)
+++ doc/invoke.texi	(working copy)
@@ -8175,7 +8175,7 @@ consumers are not expected to support th
 would be rendered unable to decode location lists using it.
 
 @item -ginternal-reset-location-views
-@itemx -gnointernal-reset-location-views
+@itemx -gno-internal-reset-location-views
 @opindex ginternal-reset-location-views
 @opindex gno-internal-reset-location-views
 Attempt to determine location views that can be omitted from location


Re: [PATCH] Fix up __builtin_mul_overflow for signed * signed -> unsigned (PR middle-end/91450)

2019-11-19 Thread Richard Biener
On Tue, 19 Nov 2019, Jakub Jelinek wrote:

> Hi!
> 
> This case is correctly described in the comment in expand_mul_overflow:
>  s1 * s2 -> ur
> t1 = (s1 & s2) < 0 ? (-(U) s1) : ((U) s1)
> t2 = (s1 & s2) < 0 ? (-(U) s2) : ((U) s2)
> res = t1 * t2
> ovf = (s1 ^ s2) < 0 ? (s1 && s2) : main_ovf (true)  */
> where if one of the operands is negative and the other non-negative,
> there is an overflow whenever both operands are non-zero, i.e. s1 && s2,
> but unfortunately the code has been implementing s1 & s2 instead,
> which is incorrect, e.g. for -2 and 1, where (-2 & 1) == 0 but
> (-2 && 1) != 0.
> The second hunk handles the case when we know from VRP which operand
> is negative and which one is non-negative; obviously the negative one
> can't be zero, so we can just compare the non-negative one against zero.
> The third hunk is when we don't know this precisely from VRP.  In that case
> we used to do essentially
>   if ((s1 & s2) == 0)
> goto main_label;
>   ovf = 1;
> main_label:
>   ovf |= __builtin_mul_overflow ((unsigned) s1, (unsigned) s2, &r);
> (the fallthrough is ok, because mul_overflow when one of the operands
> is zero will just compute zero), this patch changes it to:
>   if (s1 == 0)
> goto main_label;
>   if (s2 == 0)
> goto main_label;
>   ovf = 1;
> main_label:
>   ovf |= __builtin_mul_overflow ((unsigned) s1, (unsigned) s2, &r);
> but as an optimization, if we know from VRP that one of the operands
> is negative, we don't bother comparing that operand to 0.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
> and release branches?

OK.

Thanks,
Richard.
 
> 2019-11-19  Jakub Jelinek  
> 
>   PR middle-end/91450
>   * internal-fn.c (expand_mul_overflow): For s1 * s2 -> ur, if one
>   operand is negative and one non-negative, compare the non-negative
>   one against 0 rather than comparing s1 & s2 against 0.  Otherwise,
>   don't compare (s1 & s2) == 0, but compare separately both s1 == 0
>   and s2 == 0, unless one of them is known to be negative.  Remove
>   tem2 variable, use tem where tem2 has been used before.
> 
>   * gcc.c-torture/execute/pr91450-1.c: New test.
>   * gcc.c-torture/execute/pr91450-2.c: New test.
> 
> --- gcc/internal-fn.c.jj  2019-11-08 11:33:48.0 +0100
> +++ gcc/internal-fn.c 2019-11-18 14:41:48.247381649 +0100
> @@ -1408,7 +1408,7 @@ expand_mul_overflow (location_t loc, tre
>/* s1 * s2 -> ur  */
>if (!uns0_p && !uns1_p && unsr_p)
>  {
> -  rtx tem, tem2;
> +  rtx tem;
>switch (pos_neg0 | pos_neg1)
>   {
>   case 1: /* Both operands known to be non-negative.  */
> @@ -1438,10 +1438,8 @@ expand_mul_overflow (location_t loc, tre
> ops.op2 = NULL_TREE;
> ops.location = loc;
> res = expand_expr_real_2 (&ops, NULL_RTX, mode, EXPAND_NORMAL);
> -   tem = expand_binop (mode, and_optab, op0, op1, NULL_RTX, false,
> -   OPTAB_LIB_WIDEN);
> -   do_compare_rtx_and_jump (tem, const0_rtx, EQ, true, mode,
> -NULL_RTX, NULL, done_label,
> +   do_compare_rtx_and_jump (pos_neg0 == 1 ? op0 : op1, const0_rtx, EQ,
> +true, mode, NULL_RTX, NULL, done_label,
>  profile_probability::very_likely ());
> goto do_error_label;
>   }
> @@ -1472,16 +1470,23 @@ expand_mul_overflow (location_t loc, tre
> arg1 = error_mark_node;
> emit_jump (do_main_label);
> emit_label (after_negate_label);
> -   tem2 = expand_binop (mode, xor_optab, op0, op1, NULL_RTX, false,
> -OPTAB_LIB_WIDEN);
> -   do_compare_rtx_and_jump (tem2, const0_rtx, GE, false, mode, NULL_RTX,
> -NULL, do_main_label, profile_probability::very_likely ());
> +   tem = expand_binop (mode, xor_optab, op0, op1, NULL_RTX, false,
> +   OPTAB_LIB_WIDEN);
> +   do_compare_rtx_and_jump (tem, const0_rtx, GE, false, mode, NULL_RTX,
> +NULL, do_main_label,
> +profile_probability::very_likely ());
> /* One argument is negative here, the other positive.  This
>overflows always, unless one of the arguments is 0.  But
>if e.g. s2 is 0, (U) s1 * 0 doesn't overflow, whatever s1
>is, thus we can keep do_main code oring in overflow as is.  */
> -   do_compare_rtx_and_jump (tem, const0_rtx, EQ, true, mode, NULL_RTX,
> -NULL, do_main_label, profile_probability::very_likely ());
> +   if (pos_neg0 != 2)
> + do_compare_rtx_and_jump (op0, const0_rtx, EQ, true, mode, NULL_RTX,
> +  NULL, do_main_label,
> +  profile_probability::very_unlikely ());
> 

Re: [PATCH] Use xchg for -Os (PR target/92549)

2019-11-19 Thread Richard Biener
On Tue, Nov 19, 2019 at 10:04 AM Jakub Jelinek  wrote:
>
> Hi!
>
> xchg instruction is smaller, in some cases much smaller than 3 moves,
> (e.g. in the testcase 2 bytes vs. 8 bytes), and is not a performance
> disaster, but from Agner Fog tables and
> https://stackoverflow.com/questions/45766444/why-is-xchg-reg-reg-a-3-micro-op-instruction-on-modern-intel-architectures
> it doesn't seem to be something we'd want to use when optimizing for speed,
> at least not on Intel.
>
> While we have *swap patterns, those are very unlikely to be triggered
> during combine, usually we have different pseudos in there and the actual
> need for swapping is only materialized during RA.
>
> The following patch does it when optimizing the insn for size only.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I wonder if IRA/LRA should be made aware of an xchg instruction
(and its cost?) so it knows it doesn't actually need a temporary register?

Richard.

> 2019-11-19  Jakub Jelinek  
>
> PR target/92549
> * config/i386/i386.md (peephole2 for *swap): New peephole2.
>
> * gcc.target/i386/pr92549.c: New test.
>
> --- gcc/config/i386/i386.md.jj  2019-10-28 22:16:14.583008121 +0100
> +++ gcc/config/i386/i386.md 2019-11-18 17:06:36.050742545 +0100
> @@ -2787,6 +2787,17 @@
> (set_attr "amdfam10_decode" "double")
> (set_attr "bdver1_decode" "double")])
>
> +(define_peephole2
> +  [(set (match_operand:SWI 0 "register_operand")
> +   (match_operand:SWI 1 "register_operand"))
> +   (set (match_dup 1)
> +   (match_operand:SWI 2 "register_operand"))
> +   (set (match_dup 2) (match_dup 0))]
> +  "peep2_reg_dead_p (3, operands[0])
> +   && optimize_insn_for_size_p ()"
> +  [(parallel [(set (match_dup 1) (match_dup 2))
> + (set (match_dup 2) (match_dup 1))])])
> +
>  (define_expand "movstrict<mode>"
>[(set (strict_low_part (match_operand:SWI12 0 "register_operand"))
> (match_operand:SWI12 1 "general_operand"))]
> --- gcc/testsuite/gcc.target/i386/pr92549.c.jj	2019-11-18 17:48:35.533177377 +0100
> +++ gcc/testsuite/gcc.target/i386/pr92549.c	2019-11-18 17:49:31.888336444 +0100
> @@ -0,0 +1,28 @@
> +/* PR target/92549 */
> +/* { dg-do compile } */
> +/* { dg-options "-Os -masm=att" } */
> +/* { dg-final { scan-assembler "xchgl" } } */
> +
> +#ifdef __i386__
> +#define R , regparm (2)
> +#else
> +#define R
> +#endif
> +
> +__attribute__((noipa R)) int
> +bar (int a, int b)
> +{
> +  return b - a + 5;
> +}
> +
> +__attribute__((noipa R)) int
> +foo (int a, int b)
> +{
> +  return 1 + bar (b, a);
> +}
> +
> +int
> +main ()
> +{
> +  return foo (39, 3);
> +}
>
> Jakub
>


Re: [PATCH] Use xchg for -Os (PR target/92549)

2019-11-19 Thread Uros Bizjak
On Tue, Nov 19, 2019 at 10:04 AM Jakub Jelinek  wrote:
>
> Hi!
>
> xchg instruction is smaller, in some cases much smaller than 3 moves,
> (e.g. in the testcase 2 bytes vs. 8 bytes), and is not a performance
> disaster, but from Agner Fog tables and
> https://stackoverflow.com/questions/45766444/why-is-xchg-reg-reg-a-3-micro-op-instruction-on-modern-intel-architectures
> it doesn't seem to be something we'd want to use when optimizing for speed,
> at least not on Intel.
>
> While we have *swap patterns, those are very unlikely to be triggered
> during combine, usually we have different pseudos in there and the actual
> need for swapping is only materialized during RA.
>
> The following patch does it when optimizing the insn for size only.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2019-11-19  Jakub Jelinek  
>
> PR target/92549
> * config/i386/i386.md (peephole2 for *swap): New peephole2.
>
> * gcc.target/i386/pr92549.c: New test.

OK with some test adjustments.

Thanks,
Uros.

> --- gcc/config/i386/i386.md.jj  2019-10-28 22:16:14.583008121 +0100
> +++ gcc/config/i386/i386.md 2019-11-18 17:06:36.050742545 +0100
> @@ -2787,6 +2787,17 @@
> (set_attr "amdfam10_decode" "double")
> (set_attr "bdver1_decode" "double")])
>
> +(define_peephole2
> +  [(set (match_operand:SWI 0 "register_operand")
> +   (match_operand:SWI 1 "register_operand"))
> +   (set (match_dup 1)
> +   (match_operand:SWI 2 "register_operand"))
> +   (set (match_dup 2) (match_dup 0))]
> +  "peep2_reg_dead_p (3, operands[0])
> +   && optimize_insn_for_size_p ()"
> +  [(parallel [(set (match_dup 1) (match_dup 2))
> + (set (match_dup 2) (match_dup 1))])])
> +
>  (define_expand "movstrict<mode>"
>[(set (strict_low_part (match_operand:SWI12 0 "register_operand"))
> (match_operand:SWI12 1 "general_operand"))]
> --- gcc/testsuite/gcc.target/i386/pr92549.c.jj	2019-11-18 17:48:35.533177377 +0100
> +++ gcc/testsuite/gcc.target/i386/pr92549.c	2019-11-18 17:49:31.888336444 +0100
> @@ -0,0 +1,28 @@
> +/* PR target/92549 */
> +/* { dg-do compile } */
> +/* { dg-options "-Os -masm=att" } */
> +/* { dg-final { scan-assembler "xchgl" } } */
> +
> +#ifdef __i386__
> +#define R , regparm (2)
> +#else
> +#define R
> +#endif

Please use

/* { dg-additional-options "-mregparm=2" { target ia32 } } */

instead.

> +
> +__attribute__((noipa R)) int
> +bar (int a, int b)
> +{
> +  return b - a + 5;
> +}
> +
> +__attribute__((noipa R)) int
> +foo (int a, int b)
> +{
> +  return 1 + bar (b, a);
> +}
> +
> +int
> +main ()
> +{
> +  return foo (39, 3);
> +}

No need for main in compile-only tests.

> Jakub
>


Re: [PATCH v2 6/6] aarch64: Add testsuite checks for asm-flag

2019-11-19 Thread Richard Henderson
On 11/19/19 9:29 AM, Christophe Lyon wrote:
> On Mon, 18 Nov 2019 at 20:54, Richard Henderson
>  wrote:
>>
>> On 11/18/19 1:30 PM, Christophe Lyon wrote:
>>> I'm sorry to notice that the last test (asm-flag-6.c) fails to execute
>>> when compiling with -mabi=ilp32. I have less details than for Arm,
>>> because here I'm using the Foundation Model as simulator instead of
>>> Qemu. In addition, I'm using an old version of it, so maybe it's a
>>> simulator bug. Does it work on your side?
>>
>> I don't know how to test ilp32 with qemu.  Is there a distribution that uses
>> this mode, so that one could test in system mode?  We don't have user-only
>> support for ilp32.
>>
> 
> Sorry I wasn't clear: I test aarch64-elf with -mabi=ilp32, using newlib.

In the short term, can you please try this testsuite patch?


r~
diff --git a/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c 
b/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
index 963b5a48c70..54d7fbf317d 100644
--- a/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
+++ b/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
@@ -1,6 +1,12 @@
 /* Executable testcase for 'output flags.'  */
 /* { dg-do run } */
 
+#ifdef __LP64__
+#define W ""
+#else
+#define W "w"
+#endif
+
 int test_bits (long nzcv)
 {
   long n, z, c, v;
@@ -16,7 +22,7 @@ int test_cmps (long x, long y)
 {
   long gt, lt, ge, le;
 
-  __asm__ ("cmp %[x], %[y]"
+  __asm__ ("cmp %"W"[x], %"W"[y]"
   : "=@ccgt"(gt), "=@cclt"(lt), "=@ccge"(ge), "=@ccle"(le)
   : [x] "r"(x), [y] "r"(y));
 
@@ -30,7 +36,7 @@ int test_cmpu (unsigned long x, unsigned long y)
 {
   long gt, lt, ge, le;
 
-  __asm__ ("cmp %[x], %[y]"
+  __asm__ ("cmp %"W"[x], %"W"[y]"
   : "=@cchi"(gt), "=@cclo"(lt), "=@cchs"(ge), "=@ccls"(le)
   : [x] "r"(x), [y] "r"(y));
 


Re: [PATCH] Add one more pass_convert_switch late.

2019-11-19 Thread Richard Biener
On Mon, Nov 18, 2019 at 10:17 AM Martin Liška  wrote:
>
> On 11/14/19 1:15 PM, Richard Biener wrote:
> > Hmm.  I was thinking of moving the pass instead of adding another one.
> > What's the reason to run switch-conversion during early optimization again?
>
> I'm adding CC, as he's the author of cswitch pass and he probably knows
> more about the pass placement. But I bet early conversion to array access
> rapidly simplifies CFG and other passes can benefit from that.
>
> >
> > But then it's probably not too bad... (and somehow I'd still like to see
> > switch-conversion, switch lowering and if-to-switch be "integrated"
> > somehow, analyzing the IL and then outputting optimized if/switch
> > according to the same cost metric).
>
> Well, my intuition is that gswitch statement is a canonical representation
> of a switch. And so that one wants to have if-to-switch early in the pipeline
> and switch lowering very late?

Just to throw in a tiny thing here - while a switch seems a natural canonical
representation it is actually a poor one for most GCC passes since they're
going to give up on switches and it's CFG complexity.  GCCs switch
representation also isn't exactly good (remember all that label lookup
stuff you have to do).  Switches are a quite high-level representation
and to me only "good" if we are actually going to emit a jump-table.
Given converting a series of ifs to a switch is "hard" (the opposite
is easy) going for a switch early (before CFG transforms mess up
things too much) kind-of makes sense, but lowering back non-jump table
ones soon is equally important.

That's of course all unrelated to adding another switch-conversion pass.
(and yes, switch-converted IL is more friendly than a switch)

Richard.

> Martin
>


Re: [PATCH] libstdc++: Define C++20 range utilities and range factories

2019-11-19 Thread Jonathan Wakely

On 19/11/19 08:51 +, Jonathan Wakely wrote:

On 19/11/19 09:38 +0100, Stephan Bergmann wrote:

On 17/11/2019 02:07, Jonathan Wakely wrote:

This adds another chunk of the <ranges> header.

The changes from P1456R1 (Move-only views) and P1862R1 (Range adaptors
for non-copyable iterators) are included, but not the changes from
P1870R1 (forwarding-range is too subtle).

The tests for subrange and iota_view are poor and should be improved.


At least with recent Clang trunk, with -std=c++2a this causes failures like


In file included from 
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/10.0.0/../../../../include/c++/10.0.0/set:61:
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/10.0.0/../../../../include/c++/10.0.0/bits/stl_set.h:1057:48:
 error: default initialization of an object of const type 'const bool'
template<typename _Tp> inline constexpr bool __enable_view_impl;
 ^
In file included from 
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/10.0.0/../../../../include/c++/10.0.0/set:62:
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/10.0.0/../../../../include/c++/10.0.0/bits/stl_multiset.h:1045:48:
 error: redefinition of '__enable_view_impl'
template<typename _Tp> inline constexpr bool __enable_view_impl;
 ^
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/10.0.0/../../../../include/c++/10.0.0/bits/stl_set.h:1057:48:
 note: previous definition is here
template<typename _Tp> inline constexpr bool __enable_view_impl;
 ^


and I'm not sure whether this is legal C++20 code that Clang is just 
not capable of handling?


I'm not sure either. I checked with GCC and it did what I expected
(the first declaration is not a definition, just a declaration). I'll
check if that's valid.

For now the simplest fix would be to just guard all those variable
templates with #if __cpp_lib_concepts since std::ranges::enable_view
only exists when concepts are supported (which is false for Clang).

I'll look into it, thanks for the heads up.


GCC is wrong. I reported https://gcc.gnu.org/PR92576 and I've fixed the
libstdc++ code with the attached patch.
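
A minimal illustration of the rule in question (a sketch with
hypothetical names): a constexpr variable template is a definition, so
its const object needs an initializer, while an extern const
declaration does not:

template<typename T> inline constexpr bool flag;	// ill-formed: no initializer
template<typename T> extern inline const bool flag2;	// OK, mere declaration
template<> inline constexpr bool flag2<int> = false;	// OK, a definition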

commit 60c1e565ca97887d1d51490237bc3463875c4e99
Author: Jonathan Wakely 
Date:   Tue Nov 19 09:13:42 2019 +

libstdc++: Fix declarations of variable templates

This code is invalid and rejected by other compilers (see PR 92576).

* include/bits/regex.h (ranges::__detail::__enable_view_impl): Fix
declaration.
* include/bits/stl_multiset.h (ranges::__detail::__enable_view_impl):
Likewise.
* include/bits/stl_set.h (ranges::__detail::__enable_view_impl):
Likewise.
* include/bits/unordered_set.h (ranges::__detail::__enable_view_impl):
Likewise.
* include/debug/multiset.h (ranges::__detail::__enable_view_impl):
Likewise.
* include/debug/set.h (ranges::__detail::__enable_view_impl): Likewise.
* include/debug/unordered_set (ranges::__detail::__enable_view_impl):
Likewise.

diff --git a/libstdc++-v3/include/bits/regex.h b/libstdc++-v3/include/bits/regex.h
index 49994369563..4a19065fb58 100644
--- a/libstdc++-v3/include/bits/regex.h
+++ b/libstdc++-v3/include/bits/regex.h
@@ -2061,7 +2061,7 @@ _GLIBCXX_END_NAMESPACE_CXX11
 #if __cplusplus > 201703L
 namespace ranges::__detail
 {
-  template<typename _Tp> inline constexpr bool __enable_view_impl;
+  template<typename _Tp> extern inline const bool __enable_view_impl;
   template<typename _Bi_iter, typename _Alloc>
     inline constexpr bool __enable_view_impl<match_results<_Bi_iter, _Alloc>>
       = false;
diff --git a/libstdc++-v3/include/bits/stl_multiset.h b/libstdc++-v3/include/bits/stl_multiset.h
index 9e34961f4a5..14207e62205 100644
--- a/libstdc++-v3/include/bits/stl_multiset.h
+++ b/libstdc++-v3/include/bits/stl_multiset.h
@@ -1042,7 +1042,7 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
 #if __cplusplus > 201703L
 namespace ranges::__detail
 {
-  template<typename _Tp> inline constexpr bool __enable_view_impl;
+  template<typename _Tp> extern inline const bool __enable_view_impl;
   template<typename _Key, typename _Compare, typename _Alloc>
 inline constexpr bool
   __enable_view_impl<_GLIBCXX_STD_C::multiset<_Key, _Compare, _Alloc>>
diff --git a/libstdc++-v3/include/bits/stl_set.h b/libstdc++-v3/include/bits/stl_set.h
index 135d57af496..85f26c24dae 100644
--- a/libstdc++-v3/include/bits/stl_set.h
+++ b/libstdc++-v3/include/bits/stl_set.h
@@ -1054,7 +1054,7 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
 #if __cplusplus > 201703L
 namespace ranges::__detail
 {
-  template<typename _Tp> inline constexpr bool __enable_view_impl;
+  template<typename _Tp> extern inline const bool __enable_view_impl;
   template<typename _Key, typename _Compare, typename _Alloc>
 inline constexpr bool
   __enable_view_impl<_GLIBCXX_STD_C::set<_Key, _Compare, _Alloc>> = false;
diff --git a/libstdc++-v3/include/bits/unordered_set.h b/libstdc++-v3/include/bits/unordered_set.h
index 98943fb1a47..b138d02bbe8 100644
--- a/libstdc++-v3/include/bits/unordered_set.h
+++ b/libstdc++-v3/include/bits/unordered_set.h
@@ -1775,7 +1775,7 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
 #if __cplusp

Re: [SVE] PR89007 - Implement generic vector average expansion

2019-11-19 Thread Prathamesh Kulkarni
On Mon, 18 Nov 2019 at 16:17, Kyrill Tkachov
 wrote:
>
> Hi Prathamesh,
>
> On 11/14/19 6:47 PM, Prathamesh Kulkarni wrote:
> > Hi,
> > As suggested in PR, the attached patch falls back to distributing
> > rshift over plus_expr instead of fallback widening -> arithmetic ->
> > narrowing sequence, if target support is not available.
> > Bootstrap+tested on x86_64-unknown-linux-gnu and aarch64-linux-gnu.
> > OK to commit ?
> >
> > Thanks,
> > Prathamesh
>
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr89007.c b/gcc/testsuite/gcc.target/aarch64/sve/pr89007.c
> new file mode 100644
> index 000..b682f3f3b74
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr89007.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +#define N 1024
> +unsigned char dst[N];
> +unsigned char in1[N];
> +unsigned char in2[N];
> +
> +void
> +foo ()
> +{
> +  for( int x = 0; x < N; x++ )
> +dst[x] = (in1[x] + in2[x] + 1) >> 1;
> +}
> +
> +/* { dg-final { scan-assembler-not {\tuunpklo\t} } } */
> +/* { dg-final { scan-assembler-not {\tuunpkhi\t} } } */
>
>
> I think you'll want to make the test a bit stronger to test the actual
> instructions expected here.
> You'll also want to test the IFN_AVG_FLOOR case, as your patch adds support 
> for it too.
Hi Kyrill,
Thanks for the suggestions, I have updated the tests in the attached patch.
Does it look OK ?

Thanks,
Prathamesh
>
> Thanks,
> Kyrill
>
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index 8ebbcd76b64..7025a3b4dc2 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -2019,22 +2019,59 @@ vect_recog_average_pattern (stmt_vec_info 
> last_stmt_info, tree *type_out)
>
> /* Check for target support.  */
> tree new_vectype = get_vectype_for_scalar_type (vinfo, new_type);
> -  if (!new_vectype
> -  || !direct_internal_fn_supported_p (ifn, new_vectype,
> - OPTIMIZE_FOR_SPEED))
> +
> +  if (!new_vectype)
>   return NULL;
>
> +  bool ifn_supported
> += direct_internal_fn_supported_p (ifn, new_vectype, OPTIMIZE_FOR_SPEED);
> +
> /* The IR requires a valid vector type for the cast result, even though
>it's likely to be discarded.  */
> *type_out = get_vectype_for_scalar_type (vinfo, type);
> if (!*type_out)
>   return NULL;
>
> -  /* Generate the IFN_AVG* call.  */
> tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
> tree new_ops[2];
> vect_convert_inputs (last_stmt_info, 2, new_ops, new_type,
>unprom, new_vectype);
> +
> +  if (!ifn_supported)
> +{
> +  /* If there is no target support available, generate code
> +to distribute rshift over plus and add one depending
> +upon floor or ceil rounding.  */
> +
> +  tree one_cst = build_one_cst (new_type);
> +
> +  tree tmp1 = vect_recog_temp_ssa_var (new_type, NULL);
> +  gassign *g1 = gimple_build_assign (tmp1, RSHIFT_EXPR, new_ops[0], 
> one_cst);
> +
> +  tree tmp2 = vect_recog_temp_ssa_var (new_type, NULL);
> +  gassign *g2 = gimple_build_assign (tmp2, RSHIFT_EXPR, new_ops[1], 
> one_cst);
> +
> +  tree tmp3 = vect_recog_temp_ssa_var (new_type, NULL);
> +  gassign *g3 = gimple_build_assign (tmp3, PLUS_EXPR, tmp1, tmp2);
> +
> +  tree tmp4 = vect_recog_temp_ssa_var (new_type, NULL);
> +  tree_code c = (ifn == IFN_AVG_CEIL) ? BIT_IOR_EXPR : BIT_AND_EXPR;
> +  gassign *g4 = gimple_build_assign (tmp4, c, new_ops[0], new_ops[1]);
> +
> +  tree tmp5 = vect_recog_temp_ssa_var (new_type, NULL);
> +  gassign *g5 = gimple_build_assign (tmp5, BIT_AND_EXPR, tmp4, one_cst);
> +
> +  gassign *g6 = gimple_build_assign (new_var, PLUS_EXPR, tmp3, tmp5);
> +
> +  append_pattern_def_seq (last_stmt_info, g1, new_vectype);
> +  append_pattern_def_seq (last_stmt_info, g2, new_vectype);
> +  append_pattern_def_seq (last_stmt_info, g3, new_vectype);
> +  append_pattern_def_seq (last_stmt_info, g4, new_vectype);
> +  append_pattern_def_seq (last_stmt_info, g5, new_vectype);
> +  return vect_convert_output (last_stmt_info, type, g6, new_vectype);
> +}
> +
> +  /* Generate the IFN_AVG* call.  */
> gcall *average_stmt = gimple_build_call_internal (ifn, 2, new_ops[0],
> new_ops[1]);
> gimple_call_set_lhs (average_stmt, new_var);
>
2019-11-19  Prathamesh Kulkarni  

PR tree-optimization/89007
* tree-vect-patterns.c (vect_recog_average_pattern): If there is no
target support available, generate code to distribute rshift over plus
and add one depending upon floor or ceil rounding.

testsuite/
* gcc.target/aarch64/sve/pr89007.c: New test.

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c 
b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c
new file mode 100644
index 000..32095c63c61
--- /dev/null
+++ b/gcc/testsuite/gcc.t

Re: [PATCH] Use xchg for -Os (PR target/92549)

2019-11-19 Thread Jan Hubicka
> Hi!
> 
> The xchg instruction is smaller, in some cases much smaller than 3 moves
> (e.g. in the testcase 2 bytes vs. 8 bytes), and is not a performance
> disaster, but from Agner Fog's tables and
> https://stackoverflow.com/questions/45766444/why-is-xchg-reg-reg-a-3-micro-op-instruction-on-modern-intel-architectures
> it doesn't seem to be something we'd want to use when optimizing for speed,
> at least not on Intel.
> 
> While we have *swap patterns, those are very unlikely to be triggered
> during combine, usually we have different pseudos in there and the actual
> need for swapping is only materialized during RA.
> 
> The following patch does it when optimizing the insn for size only.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2019-11-19  Jakub Jelinek  
> 
>   PR target/92549
>   * config/i386/i386.md (peephole2 for *swap): New peephole2.
> 
>   * gcc.target/i386/pr92549.c: New test.

Yep, it makes sense to me.
I remember poking around xchg a while ago.  We used to generate it
pre-reload, which had various unexpected consequences, but doing it via
a peephole seems safe.  So the patch is OK.

Honza
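
For illustration, the peephole turns a three-move register swap such as

	movl	%edi, %eax	# tmp = a
	movl	%esi, %edi	# a = b
	movl	%eax, %esi	# b = tmp; %eax dead after this

into a single

	xchgl	%edi, %esi

(a sketch; the actual registers depend on allocation).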
> 
> --- gcc/config/i386/i386.md.jj	2019-10-28 22:16:14.583008121 +0100
> +++ gcc/config/i386/i386.md   2019-11-18 17:06:36.050742545 +0100
> @@ -2787,6 +2787,17 @@
> (set_attr "amdfam10_decode" "double")
> (set_attr "bdver1_decode" "double")])
>  
> +(define_peephole2
> +  [(set (match_operand:SWI 0 "register_operand")
> + (match_operand:SWI 1 "register_operand"))
> +   (set (match_dup 1)
> + (match_operand:SWI 2 "register_operand"))
> +   (set (match_dup 2) (match_dup 0))]
> +  "peep2_reg_dead_p (3, operands[0])
> +   && optimize_insn_for_size_p ()"
> +  [(parallel [(set (match_dup 1) (match_dup 2))
> +   (set (match_dup 2) (match_dup 1))])])
> +
>  (define_expand "movstrict"
>[(set (strict_low_part (match_operand:SWI12 0 "register_operand"))
>   (match_operand:SWI12 1 "general_operand"))]
> --- gcc/testsuite/gcc.target/i386/pr92549.c.jj	2019-11-18 17:48:35.533177377 +0100
> +++ gcc/testsuite/gcc.target/i386/pr92549.c	2019-11-18 17:49:31.888336444 +0100
> @@ -0,0 +1,28 @@
> +/* PR target/92549 */
> +/* { dg-do compile } */
> +/* { dg-options "-Os -masm=att" } */
> +/* { dg-final { scan-assembler "xchgl" } } */
> +
> +#ifdef __i386__
> +#define R , regparm (2)
> +#else
> +#define R
> +#endif
> +
> +__attribute__((noipa R)) int
> +bar (int a, int b)
> +{
> +  return b - a + 5;
> +}
> +
> +__attribute__((noipa R)) int
> +foo (int a, int b)
> +{
> +  return 1 + bar (b, a);
> +}
> +
> +int
> +main ()
> +{
> +  return foo (39, 3);
> +}
> 
>   Jakub
> 


Re: [C++ coroutines 4/6] Middle end expanders and transforms.

2019-11-19 Thread Richard Biener
On Sun, Nov 17, 2019 at 11:27 AM Iain Sandoe  wrote:
>
>
> As described in the covering note, the main part of this is the
> expansion of the library support builtins; these are simple boolean
> or numerical substitutions.
>
> The functionality of implementing an exit from scope without cleanup
> is performed here by lowering an IFN to a gimple goto.
>
> The final part is the expansion of the coroutine IFNs that describe the
> state machine connections to the dispatchers.
>
>In the front end we construct a single actor function that contains
>the coroutine state machine.
>
>The actor function has three entry conditions:
> 1. from the ramp, resume point 0 - to initial-suspend.
> 2. when resume () is executed (resume point N).
> 3. from the destroy () shim when that is executed.
>
>The actor function begins with two dispatchers; one for resume and
>one for destroy (where the initial entry from the ramp is a special-
>case of resume point 0).
>
>Each suspend point and each dispatch entry is marked with an IFN such
>that we can connect the relevant dispatchers to their target labels.
>
>So, if we have:
>
>CO_YIELD (NUM, FINAL, RES_LAB, DEST_LAB, FRAME_PTR)
>
>This is await point NUM, and is the final await if FINAL is non-zero.
>The resume point is RES_LAB, and the destroy point is DEST_LAB.
>
>We expect to find a CO_ACTOR (NUM) in the resume dispatcher and a
>CO_ACTOR (NUM+1) in the destroy dispatcher.
>
>Initially, the intent of keeping the resume and destroy paths together
>is that the conditionals controlling them are identical, and thus there
>would be duplication of any optimisation of those paths if the split
>were earlier.
>
>Subsequent inlining of the actor (and DCE) is then able to extract the
>resume and destroy paths as separate functions if that is found
>profitable by the optimisers.
>
>    Once we have remade the connections to their correct positions, we elide
>the labels that the front end inserted.
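
As a rough illustration of the actor shape described above (hypothetical
pseudo-C; the real lowering is GIMPLE and the field names are invented):

typedef struct frame { int resume_index; } frame;

static void
actor (frame *frm)
{
  /* Resume dispatcher; entry from the ramp is resume point 0.
     A second, parallel dispatcher routes destroy () the same way.  */
  switch (frm->resume_index)
    {
    case 0: goto initial_suspend;
    case 1: goto resume_point_1;	/* one case per suspend point */
    default: return;
    }

 initial_suspend:
  frm->resume_index = 1;	/* run up to the first suspend point */
  return;

 resume_point_1:
  return;			/* continue after the first suspend */
}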

Comments inline.

> gcc/ChangeLog:
>
> 2019-11-17  Iain Sandoe  
>
> * Makefile.in: Add coroutine-passes.o.
> * coroutine-passes.cc: New file.
> * passes.def: Add pass_coroutine_lower_builtins,
> pass_coroutine_early_expand_ifns and
> pass_coroutine_finalize_frame.
> * tree-pass.h (make_pass_coroutine_lower_builtins): New.
> (make_pass_coroutine_early_expand_ifns): New.
> (make_pass_coroutine_finalize_frame): New.
> ---
>  gcc/Makefile.in |   1 +
>  gcc/coroutine-passes.cc | 621 
> 
>  gcc/passes.def  |   3 +
>  gcc/tree-pass.h |   3 +
>  4 files changed, 628 insertions(+)
>  create mode 100644 gcc/coroutine-passes.cc
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index ac21401..fc7226a 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1266,6 +1266,7 @@ OBJS = \
> compare-elim.o \
> context.o \
> convert.o \
> +   coroutine-passes.o \
> coverage.o \
> cppbuiltin.o \
> cppdefault.o \
> diff --git a/gcc/coroutine-passes.cc b/gcc/coroutine-passes.cc
> new file mode 100644
> index 000..33e1d38
> --- /dev/null
> +++ b/gcc/coroutine-passes.cc
> @@ -0,0 +1,621 @@
> +/* coroutine expansion and optimisation passes.
> +
> +   Copyright (C) 2018-2019 Free Software Foundation, Inc.
> +
> + Contributed by Iain Sandoe  under contract to Facebook.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "target.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "tree-pass.h"
> +#include "ssa.h"
> +#include "cgraph.h"
> +#include "pretty-print.h"
> +#include "diagnostic-core.h"
> +#include "fold-const.h"
> +#include "internal-fn.h"
> +#include "langhooks.h"
> +#include "gimplify.h"
> +#include "gimple-iterator.h"
> +#include "gimplify-me.h"
> +#include "gimple-walk.h"
> +#include "gimple-fold.h"
> +#include "tree-cfg.h"
> +#include "tree-into-ssa.h"
> +#include "tree-ssa-propagate.h"
> +#include "gimple-pretty-print.h"
> +#include "cfghooks.h"
> +
> +/* Here we:
> +   * lower the internal function that implements an exit from scope.
> +   * expand the builtins that are used to implement the

Re: [PATCH] Restore init_ggc_heuristics.

2019-11-19 Thread Richard Biener
On Mon, Nov 18, 2019 at 1:24 PM Martin Liška  wrote:
>
> Hello.
>
> After my param to option transformation, we lost automatic GGC
> detection. It's because init_ggc_heuristics is called before
> init_options_struct which memsets all the values to zero first.
>
> I've tested the patch with --enable-checking=release and I hope
> Honza can test it more?
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

I prefer to _not_ move all the functions.  Moving the init_ggc_heuristics
call is OK.

Thanks,
Richard.

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> 2019-11-18  Martin Liska  
>
> * ggc-common.c (ggc_rlimit_bound): Move to opts.c
> (ggc_min_expand_heuristic): Likewise.
> (ggc_min_heapsize_heuristic): Likewise.
> (init_ggc_heuristics): Likewise.
> * ggc.h (init_ggc_heuristics): Remove declaration.
> * opts.c (ggc_rlimit_bound): Moved here from ggc-common.c.
> (ggc_min_expand_heuristic): Likewise.
> (ggc_min_heapsize_heuristic): Likewise.
> (init_ggc_heuristics): Likewise.
> (init_options_struct): Init GGC params.
> * toplev.c (general_init): Remove call to init_ggc_heuristics.
> ---
>   gcc/ggc-common.c | 103 -
>   gcc/ggc.h|   3 --
>   gcc/opts.c   | 106 +++
>   gcc/toplev.c |   4 --
>   4 files changed, 106 insertions(+), 110 deletions(-)
>
>


Re: [PATCH v2 6/6] aarch64: Add testsuite checks for asm-flag

2019-11-19 Thread Christophe Lyon
On Tue, 19 Nov 2019 at 10:23, Richard Henderson
 wrote:
>
> On 11/19/19 9:29 AM, Christophe Lyon wrote:
> > On Mon, 18 Nov 2019 at 20:54, Richard Henderson
> >  wrote:
> >>
> >> On 11/18/19 1:30 PM, Christophe Lyon wrote:
> >>> I'm sorry to notice that the last test (asm-flag-6.c) fails to execute
> >>> when compiling with -mabi=ilp32. I have less details than for Arm,
> >>> because here I'm using the Foundation Model as simulator instead of
> >>> Qemu. In addition, I'm using an old version of it, so maybe it's a
> >>> simulator bug. Does it work on your side?
> >>
> >> I don't know how to test ilp32 with qemu.  Is there a distribution that 
> >> uses
> >> this mode, and one tests in system mode?  We don't have user-only support 
> >> for
> >> ilp32.
> >>
> >
> > Sorry I wasn't clear: I test aarch64-elf with -mabi=ilp32, using newlib.
>
> In the short term, can you please try this testsuite patch?
>
I confirm this patch makes the test pass.

>
> r~


Re: [PATCH] Initialize a variable due to -Wmaybe-uninitialized.

2019-11-19 Thread Andreas Schwab
On Nov 18 2019, Jeff Law wrote:

> On 11/18/19 6:17 AM, Martin Liška wrote:
>> Hi.
>> 
>> The patch is about yet another bootstrap -Wmaybe-uninitialized warning.
>> I've just tested that on risv64 cross compiler with latest trunk.
>> 
>> Ready to be installed?
>> Thanks,
>> Martin
>> 
>> gcc/ChangeLog:
>> 
>> 2019-11-18  Martin Liska  
>> 
>> PR bootstrap/92540
>> * config/riscv/riscv.c (riscv_address_insns): Initialize
>> addr in order to remove bootstrap -Wmaybe-uninitialized
>> error.
> OK.  I had this internally, but haven't had time to analyze if the
> warning was a false positive or not.

IMHO it is a false positive.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH] Add one more pass_convert_switch late.

2019-11-19 Thread Martin Liška

On 11/19/19 10:12 AM, Jakub Jelinek wrote:

On Mon, Nov 18, 2019 at 03:34:35PM +0100, Martin Liška wrote:

Now, I believe with the if to gswitch optimization these will only rarely
kick in, because the IL will have switches that reassoc doesn't handle,
instead of series of ifs.


Yes, so my question is whether reassoc can handle gswitch statements similarly
to series of if statements? Note that the user can write explicitly something like:


I don't see how.  Either the cswitch pass runs early, and then you need to
take it into account when deciding whether to expand it as series of ifs or
table lookup or bittest (say, realize you'd need just a single condition to
cover everything and go with if series), or it runs late and then on the
other side if reassoc turned gswitch into if series without taking into
account other possibilities like table lookup, it could pessimize stuff.
So, IMHO cswitch should be able to do this too.


I agree with you that we should teach switch lowering to also consider the
transformation that reassoc does. I understand it as a generalized bit test,
and I would be happy to implement that on the gswitch level. Which leads us
back to the need for an if-to-switch conversion pass that will enable such an
optimization.
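
For example, the generalized bit test for the reassoc-32.c values is
roughly (a sketch; note the range guard before the shift):

int
test_bittest (int a, int b, int c)
{
  /* a == 10 || a == 12 || a == 26: bits 10, 12 and 26 -> 0x04001400.  */
  if ((unsigned) a <= 26 && ((0x04001400u >> a) & 1))
    return b;
  return c;
}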

My counter-example is following:

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-32.c 
b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-32.c
index 944362ad076..58f9737b918 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-32.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-32.c
@@ -6,10 +6,15 @@
 
 int test (int a, int b, int c)

 {
-  if ( a == 10 || a == 12 || a == 26)
-return b;
-  else
-return c;
+  switch (a)
+{
+case 10:
+case 12:
+case 26:
+  return b;
+default:
+  return c;
+}
 }
 
 int main ()


Note that I see both representations in the source code as equivalent.
It's probably just a matter of taste ;)

Martin





switch (a)
case 0...30:
   return 1;
case 32:
   return 1;
case 34:
   return 1;

which won't be optimized by reassoc, while it can handle something like
0<=A<=30 | A==32 | A==34.


Jakub





Re: [PATCH] Add one more pass_convert_switch late.

2019-11-19 Thread Jakub Jelinek
On Tue, Nov 19, 2019 at 11:20:16AM +0100, Martin Liška wrote:
> On 11/19/19 10:12 AM, Jakub Jelinek wrote:
> > On Mon, Nov 18, 2019 at 03:34:35PM +0100, Martin Liška wrote:
> > > > Now, I believe with the if to gswitch optimization these will only 
> > > > rarely
> > > > kick in, because the IL will have switches that reassoc doesn't handle,
> > > > instead of series of ifs.
> > > 
> > > Yes, so my question is whether reassoc can handle gswitch statements
> > > similarly
> > > to series of if statements? Note that the user can write explicitly something
> > > like:
> > 
> > I don't see how.  Either the cswitch pass runs early, and then you need to
> > take it into account when deciding whether to expand it as series of ifs or
> > table lookup or bittest (say, realize you'd need just a single condition to
> > cover everything and go with if series), or it runs late and then on the
> > other side if reassoc turned gswitch into if series without taking into
> > account other possibilities like table lookup, it could pessimize stuff.
> > So, IMHO cswitch should be able to do this too.
> 
> I agree with you that we should teach switch lowering to also consider the
> transformation that reassoc does. I understand it as a generalized bit test,
> and I would be happy to implement that on gswitch level. Which leads us
> back to need for if-to-switch conversion pass that will enable such 
> optimization.

The point I'm trying to make is that with if-to-switch, if cswitch doesn't
handle it, we have a regression.  If the user writes it as a switch, it is a
missed optimization if we generate worse code than we could, but not a
regression.

Jakub



[arm, v3] Follow up for asm-flags (thumb1, ilp32)

2019-11-19 Thread Richard Henderson
I'm not sure what happened to v2.  I can see it in my sent email, but it never
made it to the mailing list, and possibly not to Richard E. either.

So resending, with an extra testsuite fix for ilp32, spotted by Christophe.

Re thumb1, rather than an ifdef in config/arm/aarch-common.c, as I did in v1, I
am swapping out a targetm hook when changing into and out of thumb1 mode.
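
In thumb1 mode a use of the flag-output constraints now gets a sorry ()
instead of silently wrong code, e.g. for (a sketch):

int
x_lt_y (int x, int y)
{
  int lt;
  __asm__ ("cmp %1, %2" : "=@cclt"(lt) : "r"(x), "r"(y));
  return lt;
}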


r~


gcc/
* config/arm/arm-c.c (arm_cpu_builtins): Use def_or_undef_macro
to define __GCC_ASM_FLAG_OUTPUTS__.
* config/arm/arm.c (thumb1_md_asm_adjust): New function.
(arm_option_params_internal): Swap out targetm.md_asm_adjust
depending on TARGET_THUMB1.
* doc/extend.texi (FlagOutputOperands): Document thumb1 restriction.

gcc/testsuite/
* testsuite/gcc.target/arm/asm-flag-3.c: Skip for thumb1.
* testsuite/gcc.target/arm/asm-flag-5.c: Likewise.
* testsuite/gcc.target/arm/asm-flag-6.c: Likewise.
* testsuite/gcc.target/arm/asm-flag-4.c: New test.

* testsuite/gcc.target/aarch64/asm-flag-6.c: Use %w for
asm inputs to cmp instruction for ILP32.


diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index c4485ce7af1..546b35a5cbd 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -122,7 +122,8 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   if (arm_arch_notm)
 builtin_define ("__ARM_ARCH_ISA_ARM");
   builtin_define ("__APCS_32__");
-  builtin_define ("__GCC_ASM_FLAG_OUTPUTS__");
+
+  def_or_undef_macro (pfile, "__GCC_ASM_FLAG_OUTPUTS__", !TARGET_THUMB1);
 
   def_or_undef_macro (pfile, "__thumb__", TARGET_THUMB);
   def_or_undef_macro (pfile, "__thumb2__", TARGET_THUMB2);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 1fd30c238cd..a6b401b7f2e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -325,6 +325,9 @@ static unsigned int arm_hard_regno_nregs (unsigned int, 
machine_mode);
 static bool arm_hard_regno_mode_ok (unsigned int, machine_mode);
 static bool arm_modes_tieable_p (machine_mode, machine_mode);
 static HOST_WIDE_INT arm_constant_alignment (const_tree, HOST_WIDE_INT);
+static rtx_insn * thumb1_md_asm_adjust (vec<rtx> &, vec<rtx> &,
+					vec<const char *> &, vec<rtx> &,
+					HARD_REG_SET &);
 
 /* Table of machine attributes.  */
 static const struct attribute_spec arm_attribute_table[] =
@@ -2941,6 +2944,11 @@ arm_option_params_internal (void)
   /* For THUMB2, we limit the conditional sequence to one IT block.  */
   if (TARGET_THUMB2)
 max_insns_skipped = MIN (max_insns_skipped, MAX_INSN_PER_IT_BLOCK);
+
+  if (TARGET_THUMB1)
+targetm.md_asm_adjust = thumb1_md_asm_adjust;
+  else
+targetm.md_asm_adjust = arm_md_asm_adjust;
 }
 
 /* True if -mflip-thumb should next add an attribute for the default
@@ -32528,6 +32536,23 @@ arm_run_selftests (void)
 #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests
 #endif /* CHECKING_P */
 
+/* Worker function for TARGET_MD_ASM_ADJUST, while in thumb1 mode.
+   Unlike the arm version, we do NOT implement asm flag outputs.  */
+
+rtx_insn *
+thumb1_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &/*inputs*/,
+		      vec<const char *> &constraints,
+		      vec<rtx> &/*clobbers*/, HARD_REG_SET &/*clobbered_regs*/)
+{
+  for (unsigned i = 0, n = outputs.length (); i < n; ++i)
+if (strncmp (constraints[i], "=@cc", 4) == 0)
+  {
+   sorry ("asm flags not supported in thumb1 mode");
+   break;
+  }
+  return NULL;
+}
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-arm.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c 
b/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
index 963b5a48c70..54d7fbf317d 100644
--- a/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
+++ b/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
@@ -1,6 +1,12 @@
 /* Executable testcase for 'output flags.'  */
 /* { dg-do run } */
 
+#ifdef __LP64__
+#define W ""
+#else
+#define W "w"
+#endif
+
 int test_bits (long nzcv)
 {
   long n, z, c, v;
@@ -16,7 +22,7 @@ int test_cmps (long x, long y)
 {
   long gt, lt, ge, le;
 
-  __asm__ ("cmp %[x], %[y]"
+  __asm__ ("cmp %"W"[x], %"W"[y]"
   : "=@ccgt"(gt), "=@cclt"(lt), "=@ccge"(ge), "=@ccle"(le)
   : [x] "r"(x), [y] "r"(y));
 
@@ -30,7 +36,7 @@ int test_cmpu (unsigned long x, unsigned long y)
 {
   long gt, lt, ge, le;
 
-  __asm__ ("cmp %[x], %[y]"
+  __asm__ ("cmp %"W"[x], %"W"[y]"
   : "=@cchi"(gt), "=@cclo"(lt), "=@cchs"(ge), "=@ccls"(le)
   : [x] "r"(x), [y] "r"(y));
 
diff --git a/gcc/testsuite/gcc.target/arm/asm-flag-1.c 
b/gcc/testsuite/gcc.target/arm/asm-flag-1.c
index 9707ebfcebb..97104d3ac73 100644
--- a/gcc/testsuite/gcc.target/arm/asm-flag-1.c
+++ b/gcc/testsuite/gcc.target/arm/asm-flag-1.c
@@ -1,6 +1,7 @@
 /* Test the valid @cc asm flag outputs.  */
 /* { dg-do compile } */
 /* { dg-options "-O" } */
+/* { dg-skip-if "" { arm_thumb1 } } */
 
 #ifndef __GCC_ASM_FLAG_OUTPUTS__
 #error "m

Re: [patch, fortran] Load scalar intent-in variables at the beginning of procedures

2019-11-19 Thread Bernhard Reutner-Fischer
On 16 November 2019 21:33:58 CET, Thomas Koenig  wrote:
>Hello world,
>
>here is an update to the patch.

+ char name[GFC_MAX_SYMBOL_LEN + 1];
+ snprintf (name, GFC_MAX_SYMBOL_LEN, "__dummy_%d_%s", var_num++,
+   f->sym->name);
+
+ if (gfc_get_sym_tree (name, ns, &symtree, false) != 0)

(I) you should add sizeof ("__dummy__") + 10 to the buffer size, to leave
room for an unsigned long %d or the like.
(II) s/__dummy/__intent_in/ for clarity?
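
I.e. something along these lines (untested sketch combining both points,
using the names from the quoted patch):

	  char name[GFC_MAX_SYMBOL_LEN + sizeof ("__intent_in__") + 10];
	  snprintf (name, sizeof (name), "__intent_in_%d_%s", var_num++,
		    f->sym->name);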

thanks,


Re: [arm, v3] Follow up for asm-flags (thumb1, ilp32)

2019-11-19 Thread Richard Earnshaw (lists)

On 19/11/2019 10:35, Richard Henderson wrote:

I'm not sure what happened to v2.  I can see it in my sent email, but it never
made it to the mailing list, and possibly not to Richard E. either.

So resending, with an extra testsuite fix for ilp32, spotted by Christophe.

Re thumb1, rather than an ifdef in config/arm/aarch-common.c, as I did in v1, I
am swapping out a targetm hook when changing into and out of thumb1 mode.


r~




OK.

R.


Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics

2019-11-19 Thread Kyrill Tkachov

Hi Dennis,

On 11/12/19 5:32 PM, Dennis Zhang wrote:

Hi Kyrill,

On 12/11/2019 15:57, Kyrill Tkachov wrote:

On 11/12/19 3:50 PM, Dennis Zhang wrote:

Hi Kyrill,

On 12/11/2019 09:40, Kyrill Tkachov wrote:

Hi Dennis,

On 11/7/19 1:48 PM, Dennis Zhang wrote:

Hi Kyrill,

I have rebased the patch on top of current trunk.
For resolve_overloaded, I redefined my memtag overloading function to
fit the latest resolve_overloaded_builtin interface.

Regression tested again and survived for aarch64-none-linux-gnu.

Please reply inline rather than top-posting on gcc-patches.



Cheers
Dennis

Changelog is updated as following:

gcc/ChangeLog:

2019-11-07  Dennis Zhang  

  * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
  AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
  AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
  AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
  AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
  (aarch64_init_memtag_builtins): New.
  (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
  (aarch64_general_init_builtins): Call
aarch64_init_memtag_builtins.
  (aarch64_expand_builtin_memtag): New.
  (aarch64_general_expand_builtin): Call
aarch64_expand_builtin_memtag.
  (AARCH64_BUILTIN_SUBCODE): New macro.
  (aarch64_resolve_overloaded_memtag): New.
  (aarch64_resolve_overloaded_builtin_general): New hook. Call
  aarch64_resolve_overloaded_memtag to handle overloaded MTE
builtins.
  * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
  __ARM_FEATURE_MEMORY_TAGGING when enabled.
  (aarch64_resolve_overloaded_builtin): Call
  aarch64_resolve_overloaded_builtin_general.
  * config/aarch64/aarch64-protos.h
  (aarch64_resolve_overloaded_builtin_general): New declaration.
  * config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
  (TARGET_MEMTAG): Likewise.
  * config/aarch64/aarch64.md (define_c_enum "unspec"): Add
  UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE.
  (irg, gmi, subp, addg, ldg, stg): New instructions.
  * config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New
macro.
  (__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise.
  (__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag):
Likewise.
  * config/aarch64/predicates.md (aarch64_memtag_tag_offset): New.
  (aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
  * config/arm/types.md (memtag): New.
  * doc/invoke.texi (-memtag): Update description.

gcc/testsuite/ChangeLog:

2019-11-07  Dennis Zhang  

  * gcc.target/aarch64/acle/memtag_1.c: New test.
  * gcc.target/aarch64/acle/memtag_2.c: New test.
  * gcc.target/aarch64/acle/memtag_3.c: New test.


On 04/11/2019 16:40, Kyrill Tkachov wrote:

Hi Dennis,

On 10/17/19 11:03 AM, Dennis Zhang wrote:

Hi,

Arm Memory Tagging Extension (MTE) is published with Armv8.5-A.
It can be used for spatial and temporal memory safety detection and
lightweight lock and key system.

This patch enables new intrinsics leveraging MTE instructions to
implement functionalities of creating tags, setting tags, reading
tags,
and manipulating tags.
The intrinsics are part of Arm ACLE extension:
https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics

The MTE ISA specification can be found at
https://developer.arm.com/docs/ddi0487/latest chapter D6.

Bootstrapped and regtested for aarch64-none-linux-gnu.

Please help to check if it's OK for trunk.


This looks mostly ok to me but for further review this needs to be
rebased on top of current trunk as there are some conflicts with
the SVE
ACLE changes that recently went in. Most conflicts looks trivial to
resolve but one that needs more attention is the definition of the
TARGET_RESOLVE_OVERLOADED_BUILTIN hook.

Thanks,

Kyrill


Many Thanks
Dennis

gcc/ChangeLog:

2019-10-16  Dennis Zhang  

   * config/aarch64/aarch64-builtins.c (enum
aarch64_builtins): Add
   AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
   AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
   AARCH64_MEMTAG_BUILTIN_INC_TAG,
AARCH64_MEMTAG_BUILTIN_SET_TAG,
   AARCH64_MEMTAG_BUILTIN_GET_TAG, and
AARCH64_MEMTAG_BUILTIN_END.
   (aarch64_init_memtag_builtins): New.
   (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
   (aarch64_general_init_builtins): Call
aarch64_init_memtag_builtins.
   (aarch64_expand_builtin_memtag): New.
   (aarch64_general_expand_builtin): Call
aarch64_expand_builtin_memtag.
   (AARCH64_BUILTIN_SUBCODE): New macro.
   (aarch64_resolve_overloaded_memtag): New.
   (aarch64_resolve_overloaded_builtin): New hook. Call
   aarch64_resolve_overloaded_memtag to handle overloaded MTE
builtins.
   * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
Define
   __ARM_FEATURE_MEMOR

Re: Add a new combine pass

2019-11-19 Thread Richard Sandiford
Segher Boessenkool  writes:
> On Sun, Nov 17, 2019 at 11:35:26PM +, Richard Sandiford wrote:
>> While working on SVE, I've noticed several cases in which we fail
>> to combine instructions because the combined form would need to be
>> placed earlier in the instruction stream than the last of the
>> instructions being combined.  This includes one very important
>> case in the handling of the first fault register (FFR).
>
> Do you have an example of that?

It's difficult to share realistic examples at this stage since this
isn't really the right forum for making them public for the first time.
But in rtl terms we have:

(set (reg/v:VNx16BI 102 [ ok ])
 (reg:VNx16BI 85 ffrt))
(set (reg:VNx16BI 85 ffrt)
 (unspec:VNx16BI [(reg:VNx16BI 85 ffrt)] UNSPEC_UPDATE_FFRT))
(set (reg:CC_NZC 66 cc)
 (unspec:CC_NZC
   [(reg:VNx16BI 106) repeated x2
(const_int 1 [0x1])
(reg/v:VNx16BI 102 [ ok ])] UNSPEC_PTEST))

and want to combine the first and third instruction at the site of the
first instruction.  Current combine gives:

Trying 18 -> 24:
   18: r102:VNx16BI=ffrt:VNx16BI
   24: cc:CC_NZC=unspec[r106:VNx16BI,r106:VNx16BI,0x1,r102:VNx16BI] 104
Can't combine i2 into i3

because of:

  /* Make sure that the value that is to be substituted for the register
 does not use any registers whose values alter in between.  However,
 If the insns are adjacent, a use can't cross a set even though we
 think it might (this can happen for a sequence of insns each setting
 the same destination; last_set of that register might point to
 a NOTE).  If INSN has a REG_EQUIV note, the register is always
 equivalent to the memory so the substitution is valid even if there
 are intervening stores.  Also, don't move a volatile asm or
 UNSPEC_VOLATILE across any other insns.  */
  || (! all_adjacent
  && (((!MEM_P (src)
|| ! find_reg_note (insn, REG_EQUIV, src))
   && modified_between_p (src, insn, i3))
  || (GET_CODE (src) == ASM_OPERANDS && MEM_VOLATILE_P (src))
  || GET_CODE (src) == UNSPEC_VOLATILE))

>> Combine currently requires the combined instruction to live at the same
>> location as i3.
>
> Or i2 and i3.
>
>> I thought about trying to relax that restriction, but it
>> would be difficult to do with the current pass structure while keeping
>> everything linear-ish time.
>
> s/difficult/impossible/, yes.
>
> A long time ago we had to only move insns forward for correctness even,
> but that should no longer be required, combine always is finite by other
> means now.
>
>> So this patch instead goes for an option that has been talked about
>> several times over the years: writing a new combine pass that just
>> does instruction combination, and not all the other optimisations
>> that have been bolted onto combine over time.  E.g. it deliberately
>> doesn't do things like nonzero-bits tracking, since that really ought
>> to be a separate, more global, optimisation.
>
> In my dreams tracking nonzero bits would be a dataflow problem.
>
>> This is still far from being a realistic replacement for even
>> the combine parts of the current combine pass.  E.g.:
>> 
>> - it only handles combinations that can be built up from individual
>>   two-instruction combinations.
>
> And combine does any of {2,3,4}->{1,2} combinations, and it also can
> modify a third insn ("other_insn").  For the bigger ->1 combos, if it
> *can* be decomposed in a bunch of 2->1, then those result in insns that
> are greater cost than those we started with (or else those combinations
> *would* be done).  For the ->2 combinations, there are many ways those
> two insns can be formed: it can be the two arms of a parallel, or
> combine can break a non-matching insn into two at what looks like a good
> spot for that, or it can use a define_split for it.
>
> All those things lead to many more successful combinations :-)

Right.  I definitely want to support multi-insn combos too.  It's one of
the TODOs in the head comment, along with the other points in this list.
Like I say, it's not yet a realistic replacement for even the combine parts
of the current pass.

>> On a more positive note, the pass handles things that the current
>> combine pass doesn't:
>> 
>> - the main motivating feature mentioned above: it works out where
>>   the combined instruction could validly live and moves it there
>>   if necessary.  If there are a range of valid places, it tries
>>   to pick the best one based on register pressure (although only
>>   with a simple heuristic for now).
>
> How are dependencies represented in your new pass?  If it just does
> walks over the insn stream for everything, you get quadratic complexity
> if you move insns backwards.  We have that in combine already, mostly
> from modified_between_p, but that is limited because of how LOG_LINKS
> work, and we have been doing this for so long and there are no problem

Re: [PATCH] Enable init_ggc_heuristics for ENABLE_GC_CHECKING.

2019-11-19 Thread Martin Liška

On 11/19/19 10:12 AM, Richard Biener wrote:

On Tue, Nov 19, 2019 at 9:12 AM Martin Liška  wrote:


One potential improvement is to enable the heuristics
for ENABLE_GC_CHECKING. The macro is about sanity checking
and poisoning of GGC memory. Which seems to me completely
independent to setting of the default parameters.


Well, doing more collects also is "checking" and the GC checking part
this way becomes deterministic (not dependent on host setup) so no,
this is by no means an obvious change.  If we think we collect too
often we can adjust the GC param static defaults to todays reality instead
(mostly --param ggc-min-heapsize which has been 4MB for ages).


Sure, that's probably a more reasonable change we can do. Do you have a suggestion
what the default can be now?
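
(For reference, the current defaults correspond to

  --param ggc-min-heapsize=4096 --param ggc-min-expand=30

i.e. a 4MB minimum heap, with the size in kilobytes and the expansion
in percent.)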



Btw, I see the params.opt default is now 4096 unconditionally so
does ENABLE_GC_ALWAYS_COLLECT no longer "work"?
It used to adjust the params.def default values.


It will work with ENABLE_GC_ALWAYS_COLLECT once the patch
in the previous email is applied.

Martin



Richard.


Ready to be installed?
Thanks,
Martin




Re: [PATCH] Restore init_ggc_heuristics.

2019-11-19 Thread Martin Liška

On 11/19/19 11:03 AM, Richard Biener wrote:

On Mon, Nov 18, 2019 at 1:24 PM Martin Liška  wrote:


Hello.

After my param to option transformation, we lost automatic GGC
detection. It's because init_ggc_heuristics is called before
init_options_struct which memsets all the values to zero first.

I've tested the patch with --enable-checking=release and I hope
Honza can test it more?

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?


I prefer to _not_ move all the functions.  Moving the init_ggc_heuristics
call is OK.


I would like to, but opts.o is also put into all wrappers:

g++ -no-pie   -g   -DIN_GCC -fno-exceptions -fno-rtti 
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic 
-Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  
-DHAVE_CONFIG_H -static-libstdc++ -static-libgcc   -o xgcc gcc.o gcc-main.o 
ggc-none.o \
  c/gccspec.o driver-i386.o  libcommon-target.a \
   libcommon.a ../libcpp/libcpp.a   ../libbacktrace/.libs/libbacktrace.a 
../libiberty/libiberty.a ../libdecnumber/libdecnumber.a
/usr/lib64/gcc/x86_64-suse-linux/9/../../../../x86_64-suse-linux/bin/ld: 
libcommon-target.a(opts.o): in function `init_options_struct(gcc_options*, 
gcc_options*)':
/home/marxin/Programming/gcc/gcc/opts.c:292: undefined reference to 
`init_ggc_heuristics()'
collect2: error: ld returned 1 exit status
make: *** [Makefile:2037: xgcc] Error 1

and adding ggc-common.o to OBJS-libcommon-target will not work.
That's why I also moved the implementation.

Martin



Thanks,
Richard.


Thanks,
Martin

gcc/ChangeLog:

2019-11-18  Martin Liska  

 * ggc-common.c (ggc_rlimit_bound): Move to opts.c
 (ggc_min_expand_heuristic): Likewise.
 (ggc_min_heapsize_heuristic): Likewise.
 (init_ggc_heuristics): Likewise.
 * ggc.h (init_ggc_heuristics): Remove declaration.
 * opts.c (ggc_rlimit_bound): Moved here from ggc-common.c.
 (ggc_min_expand_heuristic): Likewise.
 (ggc_min_heapsize_heuristic): Likewise.
 (init_ggc_heuristics): Likewise.
 (init_options_struct): Init GGC params.
 * toplev.c (general_init): Remove call to init_ggc_heuristics.
---
   gcc/ggc-common.c | 103 -
   gcc/ggc.h|   3 --
   gcc/opts.c   | 106 +++
   gcc/toplev.c |   4 --
   4 files changed, 106 insertions(+), 110 deletions(-)






[PATCH] Fix PR92581

2019-11-19 Thread Richard Biener


We are now vectorizing condition reduction chains but epilogue generation
doesn't consider them.  The following fixes this by simply emitting
a chain of index vector updates rather than trying to be clever
and combining the conditions, which is quite target dependent.
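
A condition reduction chain here is a reduction whose condition is built
from more than one vectorized COND_EXPR, e.g. (illustrative only, not
the new testcase):

int
last_match (int *a, int *b, int n)
{
  int last = -1;
  for (int i = 0; i < n; ++i)
    if (a[i] < 16 && b[i] > 4)	/* two conditions feed the reduction */
      last = i;
  return last;
}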

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-11-19  Richard Biener  

PR tree-optimization/92581
* tree-vect-loop.c (vect_create_epilog_for_reduction): For
condition reduction chains gather all conditions involved
for computing the index reduction vector.

* gcc.dg/vect/vect-cond-reduc-5.c: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 278431)
+++ gcc/tree-vect-loop.c(working copy)
@@ -4552,18 +4552,26 @@ vect_create_epilog_for_reduction (stmt_v
  zeroes.  */
   if (STMT_VINFO_REDUC_TYPE (reduc_info) == COND_REDUCTION)
 {
+  auto_vec<std::pair<tree, bool>, 2> ccompares;
   stmt_vec_info cond_info = STMT_VINFO_REDUC_DEF (reduc_info);
   cond_info = vect_stmt_to_vectorize (cond_info);
-  while (gimple_assign_rhs_code (cond_info->stmt) != COND_EXPR)
+  while (cond_info != reduc_info)
{
+ if (gimple_assign_rhs_code (cond_info->stmt) == COND_EXPR)
+   {
+ gimple *vec_stmt = STMT_VINFO_VEC_STMT (cond_info)->stmt;
+ gcc_assert (gimple_assign_rhs_code (vec_stmt) == VEC_COND_EXPR);
+ ccompares.safe_push
+   (std::make_pair (unshare_expr (gimple_assign_rhs1 (vec_stmt)),
+STMT_VINFO_REDUC_IDX (cond_info) == 2));
+   }
  cond_info
= loop_vinfo->lookup_def (gimple_op (cond_info->stmt,
 1 + STMT_VINFO_REDUC_IDX
(cond_info)));
  cond_info = vect_stmt_to_vectorize (cond_info);
}
-  gimple *vec_stmt = STMT_VINFO_VEC_STMT (cond_info)->stmt;
-  gcc_assert (gimple_assign_rhs_code (vec_stmt) == VEC_COND_EXPR);
+  gcc_assert (ccompares.length () != 0);
 
   tree indx_before_incr, indx_after_incr;
   poly_uint64 nunits_out = TYPE_VECTOR_SUBPARTS (vectype);
@@ -4605,37 +4613,35 @@ vect_create_epilog_for_reduction (stmt_v
+  add_phi_arg (as_a <gphi *> (new_phi), vec_zero,
   loop_preheader_edge (loop), UNKNOWN_LOCATION);
 
-  /* Now take the condition from the loops original cond_expr
-(VEC_STMT) and produce a new cond_expr (INDEX_COND_EXPR) which for
+  /* Now take the condition from the loops original cond_exprs
+and produce a new cond_exprs (INDEX_COND_EXPR) which for
 every match uses values from the induction variable
 (INDEX_BEFORE_INCR) otherwise uses values from the phi node
 (NEW_PHI_TREE).
 Finally, we update the phi (NEW_PHI_TREE) to take the value of
 the new cond_expr (INDEX_COND_EXPR).  */
-
-  /* Duplicate the condition from vec_stmt.  */
-  tree ccompare = unshare_expr (gimple_assign_rhs1 (vec_stmt));
-
-  /* Create a conditional, where the condition is taken from vec_stmt
-(CCOMPARE).  The then and else values mirror the main VEC_COND_EXPR:
-the reduction phi corresponds to NEW_PHI_TREE and the new values
-correspond to INDEX_BEFORE_INCR.  */
-  gcc_assert (STMT_VINFO_REDUC_IDX (cond_info) >= 1);
-  tree index_cond_expr;
-  if (STMT_VINFO_REDUC_IDX (cond_info) == 2)
-   index_cond_expr = build3 (VEC_COND_EXPR, cr_index_vector_type,
- ccompare, indx_before_incr, new_phi_tree);
-  else
-   index_cond_expr = build3 (VEC_COND_EXPR, cr_index_vector_type,
- ccompare, new_phi_tree, indx_before_incr);
-  induction_index = make_ssa_name (cr_index_vector_type);
-  gimple *index_condition = gimple_build_assign (induction_index,
-index_cond_expr);
-  gsi_insert_before (&incr_gsi, index_condition, GSI_SAME_STMT);
-  stmt_vec_info index_vec_info = loop_vinfo->add_stmt (index_condition);
+  gimple_seq stmts = NULL;
+  for (int i = ccompares.length () - 1; i != -1; --i)
+   {
+ tree ccompare = ccompares[i].first;
+ if (ccompares[i].second)
+   new_phi_tree = gimple_build (&stmts, VEC_COND_EXPR,
+cr_index_vector_type,
+ccompare,
+indx_before_incr, new_phi_tree);
+ else
+   new_phi_tree = gimple_build (&stmts, VEC_COND_EXPR,
+cr_index_vector_type,
+ccompare,
+new_phi_tree, indx_before_incr);
+   }
+  gsi_insert_seq_before (&incr_gsi, stmts, GSI_SAME_STMT);

[patch, openacc] Adjust tests for amdgcn offloading

2019-11-19 Thread Andrew Stubbs
This patch adds GCN special casing for most of the OpenACC libgomp tests 
that require it. It also disables one testcase that explicitly uses CUDA.


OK to commit?

Andrew
Update OpenACC tests for amdgcn

2019-11-19  Andrew Stubbs  

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c: Handle gcn.
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/asyncwait-nop-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/function-not-offloaded.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/tile-1.c: Likewise.
	* testsuite/libgomp.oacc-fortran/error_stop-1.f: Likewise.
	* testsuite/libgomp.oacc-fortran/error_stop-2.f: Likewise.
	* testsuite/libgomp.oacc-fortran/error_stop-3.f: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/async_queue-1.c: Disable on GCN.

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c
index b356feb8108..e82a03e8f3c 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c
@@ -224,6 +224,8 @@ static void cb_compute_construct_end (acc_prof_info *prof_info, acc_event_info *
 
   if (acc_device_type == acc_device_host)
 assert (api_info->device_api == acc_device_api_none);
+  else if (acc_device_type == acc_device_gcn)
+assert (api_info->device_api == acc_device_api_other);
   else
 assert (api_info->device_api == acc_device_api_cuda);
   assert (api_info->valid_bytes == _ACC_API_INFO_VALID_BYTES);
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
index 7cfc364e411..ddf647cda9b 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
@@ -106,6 +106,8 @@ static void cb_enqueue_launch_start (acc_prof_info *prof_info, acc_event_info *e
 assert (event_info->launch_event.vector_length >= 1);
   else if (acc_device_type == acc_device_nvidia) /* ... is special.  */
 assert (event_info->launch_event.vector_length == 32);
+  else if (acc_device_type == acc_device_gcn) /* ...and so is this.  */
+assert (event_info->launch_event.vector_length == 64);
   else
 {
 #ifdef __OPTIMIZE__
@@ -118,6 +120,8 @@ static void cb_enqueue_launch_start (acc_prof_info *prof_info, acc_event_info *e
 
   if (acc_device_type == acc_device_host)
 assert (api_info->device_api == acc_device_api_none);
+  else if (acc_device_type == acc_device_gcn)
+assert (api_info->device_api == acc_device_api_other);
   else
 assert (api_info->device_api == acc_device_api_cuda);
   assert (api_info->valid_bytes == _ACC_API_INFO_VALID_BYTES);
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c
index ac6eb48cbbe..dc7c7582ce2 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c
@@ -265,6 +265,8 @@ static void cb_enter_data_end (acc_prof_info *prof_info, acc_event_info *event_i
 
   if (acc_device_type == acc_device_host)
 assert (api_info->device_api == acc_device_api_none);
+  else if (acc_device_type == acc_device_gcn)
+assert (api_info->device_api == acc_device_api_other);
   else
 assert (api_info->device_api == acc_device_api_cuda);
   assert (api_info->valid_bytes == _ACC_API_INFO_VALID_BYTES);
@@ -319,6 +321,8 @@ static void cb_exit_data_start (acc_prof_info *prof_info, acc_event_info *event_
 
   if (acc_device_type == acc_device_host)
 assert (api_info->device_api == acc_device_api_none);
+  else if (acc_device_type == acc_device_gcn)
+assert (api_info->device_api == acc_device_api_other);
   else
 assert (api_info->device_api == acc_device_api_cuda);
   assert (api_info->valid_bytes == _ACC_API_INFO_VALID_BYTES);
@@ -371,6 +375,8 @@ static void cb_exit_data_end (acc_prof_info *prof_info, acc_event_info *event_in
 
   if (acc_device_type == acc_device_host)
 assert (api_info->device_api == acc_device_api_none);
+  else if (acc_device_type == acc_device_gcn)
+assert (api_info->device_api == acc_device_api_other);
   else
 assert (api_info->device_api == acc_device_api_cuda);
   assert (api_info->valid_bytes == _ACC_API_INFO_VALID_BYTES);
@@ -510,6 +516,8 @@ static void cb_compute_construct_end (acc_prof_info *prof_info, acc_event_info *
 
   if (acc_device_type == acc_device_host)
 assert (api_info->device_api == acc_device_api_none);
+  else if (acc_device_type == acc_device_gcn)
+assert (api_info->device_api == acc_device_api_other);
   else
 assert (api_info->device_api == acc_device_api_cuda);
   assert (api_info->valid_bytes == _ACC_API_INFO_VALID_BYTES);

Re: Symver attribute

2019-11-19 Thread Segher Boessenkool
On Tue, Nov 19, 2019 at 07:29:37AM +0100, Jan Hubicka wrote:
> The current patch makes GCC accept the symver attribute on all ELF targets
> and simply output a .symver directive into the assembly file, hoping for
> the best.  I hope that is acceptable since the user will be informed by
> the assembler when symver is not supported.
> 
> For the testsuite we however need tests to not fail when .symver is not
> accepted by the assembler; that can probably be tested by the dg-require
> infrastructure...
> 
> It is not a perfect solution, but also not much worse than what we do
> elsewhere...

Sounds perfectly fine to me, but how many tests will need changing?  Is
it only those that use symbol versioning directly?


Segher
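
For reference, the attribute as proposed is used along these lines
(a sketch):

/* Emits a ".symver" directive binding this definition to foo@VERS_1.  */
__attribute__ ((symver ("foo@VERS_1")))
int
foo_v1 (void)
{
  return 1;
}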


Re: [PATCH] Enable init_ggc_heuristics for ENABLE_GC_CHECKING.

2019-11-19 Thread Richard Biener
On Tue, Nov 19, 2019 at 12:36 PM Martin Liška  wrote:
>
> On 11/19/19 10:12 AM, Richard Biener wrote:
> > On Tue, Nov 19, 2019 at 9:12 AM Martin Liška  wrote:
> >>
> >> One potential improvement is to enable the heuristics
> >> for ENABLE_GC_CHECKING. The macro is about sanity checking
> >> and poisoning of GGC memory. Which seems to me completely
> >> independent to setting of the default parameters.
> >
> > Well, doing more collects also is "checking" and the GC checking part
> > this way becomes deterministic (not dependent on host setup) so no,
> > this is by no means an obvious change.  If we think we collect too
> > often we can adjust the GC param static defaults to todays reality instead
> > (mostly --param ggc-min-heapsize which has been 4MB for ages).
>
> Sure, that's probably a more reasonable change we can do. Do you have a
> suggestion
> what the default can be now?

I'm not sure.  For GC checking (poisoning, etc.) more collecting is
better because
then stuff is poisoned earlier.  The question is really the intent here.

> >
> > Btw, I see the params.opt default is now 4096 unconditionally so
> > does ENABLE_GC_ALWAYS_COLLECT no longer "work"?
> > It used to adjust the params.def default values.
>
> It will work with ENABLE_GC_ALWAYS_COLLECT once the patch
> in the previous email is applied.
>
> Martin
>
> >
> > Richard.
> >
> >> Ready to be installed?
> >> Thanks,
> >> Martin
>


Re: [PATCH] Restore init_ggc_heuristics.

2019-11-19 Thread Richard Biener
On Tue, Nov 19, 2019 at 12:37 PM Martin Liška  wrote:
>
> On 11/19/19 11:03 AM, Richard Biener wrote:
> > On Mon, Nov 18, 2019 at 1:24 PM Martin Liška  wrote:
> >>
> >> Hello.
> >>
> >> After my param to option transformation, we lost automatic GGC
> >> detection. It's because init_ggc_heuristics is called before
> >> init_options_struct which memsets all the values to zero first.
> >>
> >> I've tested the patch with --enable-checking=release and I hope
> >> Honza can test it more?
> >>
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>
> >> Ready to be installed?
> >
> > I prefer to _not_ move all the functions.  Moving the init_ggc_heuristics
> > call is OK.
>
> I would like to, but opts.o is also put into all wrappers:
>
> g++ -no-pie   -g   -DIN_GCC -fno-exceptions -fno-rtti 
> -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
> -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic 
> -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  
> -DHAVE_CONFIG_H -static-libstdc++ -static-libgcc   -o xgcc gcc.o gcc-main.o 
> ggc-none.o \
>c/gccspec.o driver-i386.o  libcommon-target.a \
> libcommon.a ../libcpp/libcpp.a   ../libbacktrace/.libs/libbacktrace.a 
> ../libiberty/libiberty.a ../libdecnumber/libdecnumber.a
> /usr/lib64/gcc/x86_64-suse-linux/9/../../../../x86_64-suse-linux/bin/ld: 
> libcommon-target.a(opts.o): in function `init_options_struct(gcc_options*, 
> gcc_options*)':
> /home/marxin/Programming/gcc/gcc/opts.c:292: undefined reference to 
> `init_ggc_heuristics()'
> collect2: error: ld returned 1 exit status
> make: *** [Makefile:2037: xgcc] Error 1
>
> and adding ggc-common.o to OBJS-libcommon-target will not work.
> That's why I also moved the implementation.

Well, then call it from the caller of init_options_struct instead,
right after it or after the
langhook variant is called?

Richard.

>
> Martin
>
> >
> > Thanks,
> > Richard.
> >
> >> Thanks,
> >> Martin
> >>
> >> gcc/ChangeLog:
> >>
> >> 2019-11-18  Martin Liska  
> >>
> >>  * ggc-common.c (ggc_rlimit_bound): Move to opts.c
> >>  (ggc_min_expand_heuristic): Likewise.
> >>  (ggc_min_heapsize_heuristic): Likewise.
> >>  (init_ggc_heuristics): Likewise.
> >>  * ggc.h (init_ggc_heuristics): Remove declaration.
> >>  * opts.c (ggc_rlimit_bound): Moved here from ggc-common.c.
> >>  (ggc_min_expand_heuristic): Likewise.
> >>  (ggc_min_heapsize_heuristic): Likewise.
> >>  (init_ggc_heuristics): Likewise.
> >>  (init_options_struct): Init GGC params.
> >>  * toplev.c (general_init): Remove call to init_ggc_heuristics.
> >> ---
> >>gcc/ggc-common.c | 103 -
> >>gcc/ggc.h|   3 --
> >>gcc/opts.c   | 106 +++
> >>gcc/toplev.c |   4 --
> >>4 files changed, 106 insertions(+), 110 deletions(-)
> >>
> >>
>


Re: [committed] Punt instead of assertion failure when TYPE_MODE is not vmode (PR tree-optimization/92557)

2019-11-19 Thread Richard Sandiford
Jakub Jelinek  writes:
> Hi!
>
> This patch restores the previous behavior, there could be many reasons why
> TYPE_MODE is BLKmode or some integral mode instead of a vector mode,
> unsupported vector mode, lack of available registers etc.

For the record, I think the assert was valid and I'd have preferred it
if we'd fixed the targets so that they don't return bogus modes for
preferred_simd_mode.  After all, it's the targets that decide which
vector modes are supported and what can go in registers.

So IMO the assert simply exposed a genuine target bug.  I don't strongly
object though.

Thanks,
Richard

> Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.
>
> 2019-11-19  Jakub Jelinek  
>
>   PR tree-optimization/92557
>   * omp-low.c (omp_clause_aligned_alignment): Punt if TYPE_MODE is not
>   vmode rather than asserting it always is.
>
>   * gcc.dg/gomp/pr92557.c: New test.
>
> --- gcc/omp-low.c.jj  2019-11-15 00:37:26.359069152 +0100
> +++ gcc/omp-low.c 2019-11-18 16:17:27.767714777 +0100
> @@ -4086,8 +4086,8 @@ omp_clause_aligned_alignment (tree claus
>   if (type == NULL_TREE || TYPE_MODE (type) != mode)
> continue;
>   type = build_vector_type_for_mode (type, vmode);
> - /* The functions above are not allowed to return invalid modes.  */
> - gcc_assert (TYPE_MODE (type) == vmode);
> + if (TYPE_MODE (type) != vmode)
> +   continue;
>   if (TYPE_ALIGN_UNIT (type) > al)
> al = TYPE_ALIGN_UNIT (type);
>}
> --- gcc/testsuite/gcc.dg/gomp/pr92557.c.jj	2019-11-18 16:18:18.246961685 +0100
> +++ gcc/testsuite/gcc.dg/gomp/pr92557.c	2019-11-18 16:19:21.531017555 +0100
> @@ -0,0 +1,13 @@
> +/* PR tree-optimization/92557 */
> +/* { dg-do compile } */
> +/* { dg-additional-options "-maltivec" { target powerpc*-*-* } } */
> +
> +void
> +foo (double *p)
> +{
> +  int i;
> +
> +#pragma omp simd aligned (p)
> +  for (i = 0; i < 1; ++i)
> +p[i] = 7.0;
> +}
>
>   Jakub


Re: [committed] Punt instead of assertion failure when TYPE_MODE is not vmode (PR tree-optimization/92557)

2019-11-19 Thread Jakub Jelinek
On Tue, Nov 19, 2019 at 12:58:19PM +, Richard Sandiford wrote:
> Jakub Jelinek  writes:
> > This patch restores the previous behavior, there could be many reasons why
> > TYPE_MODE is BLKmode or some integral mode instead of a vector mode,
> > unsupported vector mode, lack of available registers etc.
> 
> For the record, I think the assert was valid and I'd have preferred it
> if we'd fixed the targets so that they don't return bogus modes for
> preferred_simd_mode.  After all, it's the targets that decide which
> vector modes are supported and what can go in registers.
> 
> So IMO the assert simply exposed a genuine target bug.  I don't strongly
> object though.

I don't object if it is fixed in the targets, I just don't think
omp_clause_aligned_alignment is the place where such target bugs should be
diagnosed.  If we want to enforce it, we should enforce it in some self-test
(though, that wouldn't cover all possible ISA combinations), or in some
checking function somewhere where it is always triggered.
The vectorizer apparently doesn't enforce that.

Jakub
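
Such a check could live in the selftest framework, e.g. (hypothetical
sketch, not proposed code):

/* Assert that preferred_simd_mode only returns supported vector modes.  */
static void
test_preferred_simd_mode ()
{
  machine_mode vmode = targetm.vectorize.preferred_simd_mode (QImode);
  if (VECTOR_MODE_P (vmode))
    ASSERT_TRUE (targetm.vector_mode_supported_p (vmode));
}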



Re: [PATCH] Restore init_ggc_heuristics.

2019-11-19 Thread Martin Liška

On 11/19/19 1:42 PM, Richard Biener wrote:

Well, then call it from the caller of init_options_struct instead,
right after it or after the
langhook variant is called?


Yes, that's definitely possible, there's a patch that I've just locally tested.

Ready for trunk?
Thanks,
Martin
>From b868170d585354e3cfedb4b6076ec66d475fa66d Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 19 Nov 2019 13:55:40 +0100
Subject: [PATCH] Restore init_ggc_heuristics.

gcc/ChangeLog:

2019-11-19  Martin Liska  

	* toplev.c (general_init): Move the call...
	(toplev::main): ... here as we need init_options_struct
	to be called first.
---
 gcc/toplev.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/toplev.c b/gcc/toplev.c
index d4583bac66c..cfc757d84ba 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1240,10 +1240,6 @@ general_init (const char *argv0, bool init_signals)
   /* Initialize register usage now so switches may override.  */
   init_reg_sets ();
 
-  /* This must be done after global_init_params but before argument
- processing.  */
-  init_ggc_heuristics ();
-
   /* Create the singleton holder for global state.  This creates the
  dump manager.  */
   g = new gcc::context ();
@@ -2377,6 +2373,10 @@ toplev::main (int argc, char **argv)
   init_options_struct (&global_options, &global_options_set);
   lang_hooks.init_options_struct (&global_options);
 
+  /* Init GGC heuristics must be called after we initialize
+ options.  */
+  init_ggc_heuristics ();
+
   /* Convert the options to an array.  */
   decode_cmdline_options_to_array_default_mask (argc,
 		CONST_CAST2 (const char **,
-- 
2.24.0



Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics

2019-11-19 Thread Dennis Zhang
Hi Kyrill,

On 19/11/2019 11:21, Kyrill Tkachov wrote:
> Hi Dennis,
> 
> On 11/12/19 5:32 PM, Dennis Zhang wrote:
>> Hi Kyrill,
>>
>> On 12/11/2019 15:57, Kyrill Tkachov wrote:
>>> On 11/12/19 3:50 PM, Dennis Zhang wrote:
 Hi Kyrill,

 On 12/11/2019 09:40, Kyrill Tkachov wrote:
> Hi Dennis,
>
> On 11/7/19 1:48 PM, Dennis Zhang wrote:
>> Hi Kyrill,
>>
>> I have rebased the patch on top of current trunk.
>> For resolve_overloaded, I redefined my memtag overloading function to
>> fit the latest resolve_overloaded_builtin interface.
>>
>> Regression tested again and survived for aarch64-none-linux-gnu.
> Please reply inline rather than top-posting on gcc-patches.
>
>
>> Cheers
>> Dennis
>>
>> Changelog is updated as following:
>>
>> gcc/ChangeLog:
>>
>> 2019-11-07  Dennis Zhang  
>>
>>   * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): 
>> Add
>>   AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
>>   AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
>>   AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
>>   AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
>>   (aarch64_init_memtag_builtins): New.
>>   (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
>>   (aarch64_general_init_builtins): Call
>> aarch64_init_memtag_builtins.
>>   (aarch64_expand_builtin_memtag): New.
>>   (aarch64_general_expand_builtin): Call
>> aarch64_expand_builtin_memtag.
>>   (AARCH64_BUILTIN_SUBCODE): New macro.
>>   (aarch64_resolve_overloaded_memtag): New.
>>   (aarch64_resolve_overloaded_builtin_general): New hook. Call
>>   aarch64_resolve_overloaded_memtag to handle overloaded MTE
>> builtins.
>>   * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): 
>> Define
>>   __ARM_FEATURE_MEMORY_TAGGING when enabled.
>>   (aarch64_resolve_overloaded_builtin): Call
>>   aarch64_resolve_overloaded_builtin_general.
>>   * config/aarch64/aarch64-protos.h
>>   (aarch64_resolve_overloaded_builtin_general): New declaration.
>>   * config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
>>   (TARGET_MEMTAG): Likewise.
>>   * config/aarch64/aarch64.md (define_c_enum "unspec"): Add
>>   UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE.
>>   (irg, gmi, subp, addg, ldg, stg): New instructions.
>>   * config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New
>> macro.
>>   (__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise.
>>   (__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag):
>> Likewise.
>>   * config/aarch64/predicates.md (aarch64_memtag_tag_offset): 
>> New.
>>   (aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
>>   * config/arm/types.md (memtag): New.
>>   * doc/invoke.texi (-memtag): Update description.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-11-07  Dennis Zhang  
>>
>>   * gcc.target/aarch64/acle/memtag_1.c: New test.
>>   * gcc.target/aarch64/acle/memtag_2.c: New test.
>>   * gcc.target/aarch64/acle/memtag_3.c: New test.
>>
>>
>> On 04/11/2019 16:40, Kyrill Tkachov wrote:
>>> Hi Dennis,
>>>
>>> On 10/17/19 11:03 AM, Dennis Zhang wrote:
 Hi,

 Arm Memory Tagging Extension (MTE) is published with Armv8.5-A.
 It can be used for spatial and temporal memory safety detection and
 lightweight lock and key system.

 This patch enables new intrinsics leveraging MTE instructions to
 implement functionalities of creating tags, setting tags, reading
 tags,
 and manipulating tags.
 The intrinsics are part of Arm ACLE extension:
 https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics 


 The MTE ISA specification can be found at
 https://developer.arm.com/docs/ddi0487/latest chapter D6.

 Bootstraped and regtested for aarch64-none-linux-gnu.

 Please help to check if it's OK for trunk.

>>> This looks mostly ok to me but for further review this needs to be
>>> rebased on top of current trunk as there are some conflicts with
>>> the SVE
>>> ACLE changes that recently went in. Most conflicts looks trivial to
>>> resolve but one that needs more attention is the definition of the
>>> TARGET_RESOLVE_OVERLOADED_BUILTIN hook.
>>>
>>> Thanks,
>>>
>>> Kyrill
>>>
 Many Thanks
 Dennis

 gcc/ChangeLog:

 2019-10-16  Dennis Zhang  

    * config/aarch64/aarch64-builtins.c (enum
 aarch64_builtins): Add
  

Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics

2019-11-19 Thread Kyrill Tkachov



On 11/19/19 1:41 PM, Dennis Zhang wrote:

Hi Kyrill,

On 19/11/2019 11:21, Kyrill Tkachov wrote:

Hi Dennis,

On 11/12/19 5:32 PM, Dennis Zhang wrote:

Hi Kyrill,

On 12/11/2019 15:57, Kyrill Tkachov wrote:

On 11/12/19 3:50 PM, Dennis Zhang wrote:

Hi Kyrill,

On 12/11/2019 09:40, Kyrill Tkachov wrote:

Hi Dennis,

On 11/7/19 1:48 PM, Dennis Zhang wrote:

Hi Kyrill,

I have rebased the patch on top of current trunk.
For resolve_overloaded, I redefined my memtag overloading function to
fit the latest resolve_overloaded_builtin interface.

Regression tested again and survived for aarch64-none-linux-gnu.

Please reply inline rather than top-posting on gcc-patches.



Cheers
Dennis

Changelog is updated as following:

gcc/ChangeLog:

2019-11-07  Dennis Zhang  

   * config/aarch64/aarch64-builtins.c (enum aarch64_builtins):
Add
   AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
   AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
   AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
   AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
   (aarch64_init_memtag_builtins): New.
   (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
   (aarch64_general_init_builtins): Call
aarch64_init_memtag_builtins.
   (aarch64_expand_builtin_memtag): New.
   (aarch64_general_expand_builtin): Call
aarch64_expand_builtin_memtag.
   (AARCH64_BUILTIN_SUBCODE): New macro.
   (aarch64_resolve_overloaded_memtag): New.
   (aarch64_resolve_overloaded_builtin_general): New hook. Call
   aarch64_resolve_overloaded_memtag to handle overloaded MTE
builtins.
   * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
Define
   __ARM_FEATURE_MEMORY_TAGGING when enabled.
   (aarch64_resolve_overloaded_builtin): Call
   aarch64_resolve_overloaded_builtin_general.
   * config/aarch64/aarch64-protos.h
   (aarch64_resolve_overloaded_builtin_general): New declaration.
   * config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
   (TARGET_MEMTAG): Likewise.
   * config/aarch64/aarch64.md (define_c_enum "unspec"): Add
   UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE.
   (irg, gmi, subp, addg, ldg, stg): New instructions.
   * config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New
macro.
   (__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise.
   (__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag):
Likewise.
   * config/aarch64/predicates.md (aarch64_memtag_tag_offset):
New.
   (aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
   * config/arm/types.md (memtag): New.
   * doc/invoke.texi (-memtag): Update description.

gcc/testsuite/ChangeLog:

2019-11-07  Dennis Zhang  

   * gcc.target/aarch64/acle/memtag_1.c: New test.
   * gcc.target/aarch64/acle/memtag_2.c: New test.
   * gcc.target/aarch64/acle/memtag_3.c: New test.
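
For readers new to these intrinsics, a minimal usage sketch (illustrative
only, not taken from the patch or its tests; the signatures follow the ACLE
document linked below, and it assumes -march=armv8.5-a+memtag):

#include <arm_acle.h>
#include <stdint.h>

void
tag_demo (void)
{
  /* One 16-byte MTE granule, suitably aligned.  */
  uint64_t buf[2] __attribute__ ((aligned (16)));

  /* Derive a pointer to BUF carrying a random tag (irg).  */
  void *p = __arm_mte_create_random_tag (buf, 0);

  /* Write that tag into the granule's tag storage (stg).  */
  __arm_mte_set_tag (p);

  /* Read the tag back from tag memory (ldg); Q equals P on a match.  */
  void *q = __arm_mte_get_tag (p);
  (void) q;
}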


On 04/11/2019 16:40, Kyrill Tkachov wrote:

Hi Dennis,

On 10/17/19 11:03 AM, Dennis Zhang wrote:

Hi,

Arm Memory Tagging Extension (MTE) is published with Armv8.5-A.
It can be used for spatial and temporal memory safety detection and
a lightweight lock and key system.

This patch enables new intrinsics leveraging MTE instructions to
implement functionalities of creating tags, setting tags, reading
tags,
and manipulating tags.
The intrinsics are part of Arm ACLE extension:
https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics


The MTE ISA specification can be found at
https://developer.arm.com/docs/ddi0487/latest chapter D6.

Bootstraped and regtested for aarch64-none-linux-gnu.

Please help to check if it's OK for trunk.


This looks mostly ok to me but for further review this needs to be
rebased on top of current trunk as there are some conflicts with
the SVE
ACLE changes that recently went in. Most conflicts looks trivial to
resolve but one that needs more attention is the definition of the
TARGET_RESOLVE_OVERLOADED_BUILTIN hook.

Thanks,

Kyrill


Many Thanks
Dennis

gcc/ChangeLog:

2019-10-16  Dennis Zhang  

    * config/aarch64/aarch64-builtins.c (enum
aarch64_builtins): Add
    AARCH64_MEMTAG_BUILTIN_START,
AARCH64_MEMTAG_BUILTIN_IRG,
    AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
    AARCH64_MEMTAG_BUILTIN_INC_TAG,
AARCH64_MEMTAG_BUILTIN_SET_TAG,
    AARCH64_MEMTAG_BUILTIN_GET_TAG, and
AARCH64_MEMTAG_BUILTIN_END.
    (aarch64_init_memtag_builtins): New.
    (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
    (aarch64_general_init_builtins): Call
aarch64_init_memtag_builtins.
    (aarch64_expand_builtin_memtag): New.
    (aarch64_general_expand_builtin): Call
aarch64_expand_builtin_memtag.
    (AARCH64_BUILTIN_SUBCODE): New macro.
    (aarch64_resolve_overloaded_memtag): New.
    (aarch64_resolve_overloaded_builtin): New hook. Call
    aarch64_resolve_overloaded_

Re: [PATCH] Add one more pass_convert_switch late.

2019-11-19 Thread Martin Liška

On 11/19/19 10:25 AM, Richard Biener wrote:

On Mon, Nov 18, 2019 at 10:17 AM Martin Liška  wrote:


On 11/14/19 1:15 PM, Richard Biener wrote:

Hmm.  I was thinking of moving the pass instead of adding another one.
What's the reason to run switch-conversion during early optimization again?


I'm adding CC, as he's the author of the cswitch pass and he probably knows
more about the pass placement. But I bet early conversion to array access
rapidly simplifies CFG and other passes can benefit from that.
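
For illustration, the kind of early rewrite meant here, sketched in C with
made-up values (GCC's generated table is actually named CSWTCH.N; this is
not from any real testcase):

/* A dense switch returning constants...  */
int
color_code (int c)
{
  switch (c)
    {
    case 0: return 12;
    case 1: return 27;
    case 2: return 38;
    case 3: return 41;
    default: return 0;
    }
}

/* ...is converted into a branch-free array access, roughly:  */
static const int CSWTCH[4] = { 12, 27, 38, 41 };

int
color_code_converted (int c)
{
  return (unsigned) c <= 3 ? CSWTCH[c] : 0;
}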



But then it's probably not too bad... (and somehow I'd still like to see
switch-conversion, switch lowering and if-to-switch be "integrated"
somehow, analyzing the IL and then outputting optimized if/switch
according to the same cost metric).


Well, my intuition is that a gswitch statement is the canonical representation
of a switch. And so one wants to have if-to-switch early in the pipeline
and switch lowering very late?


Just to throw in a tiny thing here - while a switch seems a natural canonical
representation, it is actually a poor one for most GCC passes since they're
going to give up on switches and their CFG complexity.  GCC's switch
representation also isn't exactly good (remember all that label lookup
stuff you have to do).  Switches are a quite high-level representation
and to me only "good" if we are actually going to emit a jump-table.


I agree with you here. My simple counterexample would be user input
where a switch is explicitly used :)


Given converting a series of ifs to a switch is "hard" (the opposite
is easy) going for a switch early (before CFG transforms mess up
things too much) kind-of makes sense, but lowering back non-jump table
ones soon is equally important.


What we can do is share the decision logic between the iftoswitch pass and
cswitch/switch lowering. That way we can convert chains of if statements
that will be interesting for any of the later passes.

One comment here is that 'if-elseif -> gswitch -> lowering to ifs' can
still produce faster code, as we expand smartly to a decision tree.
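
A small sketch of that with hypothetical case values: a sparse chain that
gets no jump table can still be lowered from the gswitch into a balanced
comparison tree rather than the original linear chain:

void f (int);

/* Linear source form: worst case four compares.  */
void
dispatch (long x)
{
  if (x == 10) f (0);
  else if (x == 204) f (1);
  else if (x == 1003) f (2);
  else if (x == 70000) f (3);
}

/* Balanced lowering of the equivalent gswitch: worst case three.  */
void
dispatch_lowered (long x)
{
  if (x < 1003)
    {
      if (x == 10) f (0);
      else if (x == 204) f (1);
    }
  else if (x == 1003) f (2);
  else if (x == 70000) f (3);
}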

The question still remains what do we want for GCC 10? A selective
iftoswitch conversion?

Thanks,
Martin



That's of course all unrelated to adding another switch-conversion pass.
(and yes, switch-converted IL is more friendly than a switch)

Richard.


Martin





Re: [PATCH] Add one more pass_convert_switch late.

2019-11-19 Thread Martin Liška

On 11/19/19 11:25 AM, Jakub Jelinek wrote:

The point I'm trying to make is that with if-to-switch, if cswitch doesn't
handle it, we have a regression.  If user writes it as a switch, it is a
missed optimization if we generate worse code than we could, but not a
regression.


Sure, but the question is how common are the conditions that reassoc can
transform using the fancy popcount techniques? On the other hand, if-to-switch
can optimize more switch statements to e.g. jump tables.

As mentioned in the previous email, I can imagine doing gswitch from an if chain
only if cswitch/switch lowering can handle it.

Martin


[committed] Update loop-1.c test for amdgcn

2019-11-19 Thread Andrew Stubbs
I've committed this minor testcase update. The change updates the 
expected result for amdgcn, but shouldn't affect any other target.


The compiler used to calculate the function address once and then call 
it five times. Now it repeats the calculation five times, so the pattern 
doesn't match. The code is still correct for the purpose of the testcase 
either way, however, so I'm removing the over-fussy match.


Andrew
Update loop-1.c test for amdgcn

2019-11-19  Andrew Stubbs  

	gcc/testsuite/
	* gcc.dg/tree-ssa/loop-1.c: Change amdgcn assembler scan.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-1.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-1.c
index 4b5a43457b0..39ee4dea883 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-1.c
@@ -45,8 +45,6 @@ int xxx(void)
relaxation.  */
 /* CRIS and MSP430 keep the address in a register.  */
 /* m68k sometimes puts the address in a register, depending on CPU and PIC.  */
-/* AMD GCN loads symbol addresses as hi/lo pairs, and then reuses that for
-   each jump.  */
 
 /* { dg-final { scan-assembler-times "foo" 5 { xfail hppa*-*-* ia64*-*-* sh*-*-* cris-*-* crisv32-*-* fido-*-* m68k-*-* i?86-*-mingw* i?86-*-cygwin* x86_64-*-mingw* visium-*-* nvptx*-*-* pdp11*-*-* msp430-*-* amdgcn*-*-* } } } */
 /* { dg-final { scan-assembler-times "foo,%r" 5 { target hppa*-*-* } } } */
@@ -58,5 +56,4 @@ int xxx(void)
 /* { dg-final { scan-assembler-times "\[jb\]sr" 5 { target fido-*-* m68k-*-* pdp11-*-* } } } */
 /* { dg-final { scan-assembler-times "bra *tr,r\[1-9\]*,r21" 5 { target visium-*-* } } } */
 /* { dg-final { scan-assembler-times "(?n)\[ \t\]call\[ \t\].*\[ \t\]foo," 5 { target nvptx*-*-* } } } */
-/* { dg-final { scan-assembler-times "add_u32\t\[sv\]\[0-9\]*, \[sv\]\[0-9\]*, foo@rel32@lo" 1 { target { amdgcn*-*-* } } } } */
 /* { dg-final { scan-assembler-times "s_swappc_b64" 5 { target { amdgcn*-*-* } } } } */


Re: [PATCH] Restore init_ggc_heuristics.

2019-11-19 Thread Richard Biener
On Tue, Nov 19, 2019 at 2:18 PM Martin Liška  wrote:
>
> On 11/19/19 1:42 PM, Richard Biener wrote:
> > Well, then call it from the caller of init_options_struct instead,
> > right after it or after the
> > langhook variant is called?
>
> Yes, that's definitely possible, there's a patch that I've just locally 
> tested.
>
> Ready for trunk?

OK.

> Thanks,
> Martin


Re: [PATCH] Add one more pass_convert_switch late.

2019-11-19 Thread Richard Biener
On Tue, Nov 19, 2019 at 2:54 PM Martin Liška  wrote:
>
> On 11/19/19 10:25 AM, Richard Biener wrote:
> > On Mon, Nov 18, 2019 at 10:17 AM Martin Liška  wrote:
> >>
> >> On 11/14/19 1:15 PM, Richard Biener wrote:
> >>> Hmm.  I was thinking of moving the pass instead of adding another one.
> >>> What's the reason to run switch-conversion during early optimization 
> >>> again?
> >>
> >> I'm adding CC, as he's the author of the cswitch pass and he probably knows
> >> more about the pass placement. But I bet early conversion to array access
> >> rapidly simplifies CFG and other passes can benefit from that.
> >>
> >>>
> >>> But then it's probably not too bad... (and somehow I'd still like to see
> >>> switch-conversion, switch lowering and if-to-switch be "integrated"
> >>> somehow, analyzing the IL and then outputting optimized if/switch
> >>> according to the same cost metric).
> >>
> >> Well, my intuition is that a gswitch statement is the canonical representation
> >> of a switch. And so one wants to have if-to-switch early in the
> >> pipeline
> >> and switch lowering very late?
> >
> > Just to throw in a tiny thing here - while a switch seems a natural 
> > canonical
> > representation, it is actually a poor one for most GCC passes since they're
> > going to give up on switches and their CFG complexity.  GCC's switch
> > representation also isn't exactly good (remember all that label lookup
> > stuff you have to do).  Switches are a quite high-level representation
> > and to me only "good" if we are actually going to emit a jump-table.
>
> I agree with you here. My simple counterexample would be user input
> where a switch is explicitly used :)
>
> > Given converting a series of ifs to a switch is "hard" (the opposite
> > is easy) going for a switch early (before CFG transforms mess up
> > things too much) kind-of makes sense, but lowering back non-jump table
> > ones soon is equally important.
>
> What we can do is share the decision logic between the iftoswitch pass and
> cswitch/switch lowering. That way we can convert chains of if statements
> that will be interesting for any of the later passes.
>
> One comment here is that 'if-elseif -> gswitch -> lowering to ifs' can
> still produce faster code, as we expand smartly to a decision tree.

Sure.  As said in the if-to-switch pass "review" all these (now three)
passes should share an intermediate representation and costing they
then generate code from (either switch-converted lookups, switches
or optimized if-chains).  There's no reason to actually generate a switch
from an if-chain if we just want to optimize the if-chain (it might be
convenient for the code generator though, not sure).  Likewise if the
switch ends up switch-converted there's no reason to produce the
switch first.

But first and foremost sharing costing and transform decision code
which means having some unified representation of the IL semantics
is what is needed for this.

> The question still remains what do we want for GCC 10? A selective
> iftoswitch conversion?

Not sure if we really need it for GCC 10 - was there any pressing reason
other than PRxyz asks for it?

Thanks,
Richard.

> Thanks,
> Martin
>
> >
> > That's of course all unrelated to adding another switch-conversion pass.
> > (and yes, switch-converted IL is more friendly than a switch)
> >
> > Richard.
> >
> >> Martin
> >>
>


Re: Symver attribute

2019-11-19 Thread Jan Hubicka
> On Tue, Nov 19, 2019 at 07:29:37AM +0100, Jan Hubicka wrote:
> > Current patch makes GCC to accept symver attribute on all ELF targets
> > and simply output .symver directive into the assembly file hoping for
> > the best.  I hope that is acceptable since user will be informed by
> > assembler that symver is not supported.
> > 
> > For testsuite we however needs tests to not fail when .symver is not
> > accepted by assembler that can probably by tested by the dg-require
> > infrastructure...
> > 
> > It is not perfect solution but also not much worse than what we do
> > elsewhere...
> 
> Sounds perfectly fine to me, but how many tests will need changing?  Is
> it only those that use symbol versioning directly?

There are no tests presently, I plan to write some so those will get
dg-require-symver.

.symver is output only if you use the explicit symver function/variable
attribute. So the "only" downside of this is that instead of getting
a friendly error message from GCC that your target does not support
symvers (because we cannot easily check for it) you will get a less
friendly error message from the assembler.  I hope that is acceptable since
we have pre-existing situations like that already.
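
For reference, a minimal sketch of the attribute in use (illustrative only;
the version node and symbol names are made up):

/* Bind this definition to the versioned ELF symbol foo@VERS_2;
   GCC emits a .symver directive for the assembler to resolve.  */
__attribute__ ((symver ("foo@VERS_2")))
int
foo_v2 (void)
{
  return 2;
}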

Honza


> 
> 
> Segher


Re: [committed] Punt instead of assertion failure when TYPE_MODE is not vmode (PR tree-optimization/92557)

2019-11-19 Thread Segher Boessenkool
On Tue, Nov 19, 2019 at 12:58:19PM +, Richard Sandiford wrote:
> Fow the record, I think the assert was valid and I'd have preferred it
> if we'd fixed the targets so that they don't return bogus modes for
> preferred_simd_mode.  After all, it's the targets that decide which
> vector modes are supported and what can go in registers.
> 
> So IMO the assert simply exposed a genuine target bug.  I don't strongly
> object though.

Why is this a bug?  Saying V2DI is the preferred vector mode for DI does
not imply that we can actually put V2DI in registers, does it?  It isn't
documented like that, in any case.

(Of course it is somewhat silly to return a mode that cannot be used, and
it is likely the target can do better.)


Segher


Re: [PATCH][ARM] Improve max_cond_insns setting for Cortex cores

2019-11-19 Thread Wilco Dijkstra
ping

Various CPUs have max_cond_insns set to 5 for historical reasons.
Benchmarking shows that max_cond_insns=2 is fastest on modern Cortex-A
cores, so change it to 2 for all Cortex-A cores.  Set max_cond_insns
to 4 on Thumb-2 architectures given it's already limited to that by
MAX_INSN_PER_IT_BLOCK.  Also use the CPU tuning setting when a CPU/tune
is selected if -mrestrict-it is not explicitly set.
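
To make the parameter concrete, a hedged example (not from the benchmarks):

int
clamp2 (int x, int y, int limit)
{
  /* A two-statement body is roughly two predicated instructions, so it
     is if-converted while max_cond_insns >= 2; once the body grows past
     the tuning limit, a branch is kept instead.  */
  if (x > limit)
    {
      x = limit;
      y = 0;
    }
  return x + y;
}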

On Cortex-A57 this gives a 1.1% performance gain on SPECINT2006 as well
as a 0.4% codesize reduction.

Bootstrapped on armhf. OK for commit?

ChangeLog:

2019-08-19  Wilco Dijkstra  

* gcc/config/arm/arm.c (arm_option_override_internal):
Use max_cond_insns from CPU tuning unless -mrestrict-it is used.
(arm_v6t2_tune): Set max_cond_insns to 4.
(arm_cortex_tune): Set max_cond_insns to 2.
(arm_cortex_a8_tune): Likewise.
(arm_cortex_a7_tune): Likewise.
(arm_cortex_a35_tune): Likewise.
(arm_cortex_a53_tune): Likewise.
(arm_cortex_a5_tune): Likewise.
(arm_cortex_a9_tune): Likewise.
(arm_v6m_tune): Set max_cond_insns to 4.
---

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
628cf02f23fb29392a63d87f561c3ee2fb73a515..38ac16ad1def91ca78ccfa98fd1679b2b5114851
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1943,7 +1943,7 @@ const struct tune_params arm_v6t2_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  4,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   1,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1968,7 +1968,7 @@ const struct tune_params arm_cortex_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1991,7 +1991,7 @@ const struct tune_params arm_cortex_a8_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2014,7 +2014,7 @@ const struct tune_params arm_cortex_a7_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2060,7 +2060,7 @@ const struct tune_params arm_cortex_a35_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   1,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2083,7 +2083,7 @@ const struct tune_params arm_cortex_a53_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2167,9 +2167,6 @@ const struct tune_params arm_xgene1_tune =
   tune_params::SCHED_AUTOPREF_OFF
 };
 
-/* Branches can be dual-issued on Cortex-A5, so conditional execution is
-   less appealing.  Set max_insns_skipped to a low value.  */
-
 const struct tune_params arm_cortex_a5_tune =
 {
   &cortexa5_extra_costs,
@@ -2178,7 +2175,7 @@ const struct tune_params arm_cortex_a5_tune =
   arm_cortex_a5_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  1,   /* Max cond insns.  */
+  2,

Re: [PATCH][Arm] Only enable fsched-pressure with Ofast

2019-11-19 Thread Wilco Dijkstra
ping

The current pressure scheduler doesn't appear to correctly track register
pressure and avoid creating unnecessary spills when register pressure is high.
As a result, disabling the early scheduler improves integer performance
considerably and reduces codesize as a bonus. Since scheduling floating point
code is generally beneficial (more registers and higher latencies), only enable
the pressure scheduler with -Ofast.
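
For illustration, the kind of code where this shows up (hypothetical kernel,
not from SPEC):

int
sum8 (const int *a)
{
  /* Scheduling all eight loads first keeps eight values live at once to
     hide latency; with accurate pressure tracking the loads would be
     interleaved with the adds instead of spilling.  */
  int t0 = a[0], t1 = a[1], t2 = a[2], t3 = a[3];
  int t4 = a[4], t5 = a[5], t6 = a[6], t7 = a[7];
  return ((t0 + t1) + (t2 + t3)) + ((t4 + t5) + (t6 + t7));
}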

On Cortex-A57 this gives a 0.7% performance gain on SPECINT2006 as well
as a 0.2% codesize reduction.

Bootstrapped on armhf. OK for commit?

ChangeLog:

2019-11-06  Wilco Dijkstra  

* gcc/common/config/arm-common.c (arm_option_optimization_table):
Enable fsched_pressure with Ofast only.

--
diff --git a/gcc/common/config/arm/arm-common.c 
b/gcc/common/config/arm/arm-common.c
index 
41a920f6dc96833e778faa8dbcc19beac483734c..b761d3abd670a144a593c4b410b1e7fbdcb52f56
 100644
--- a/gcc/common/config/arm/arm-common.c
+++ b/gcc/common/config/arm/arm-common.c
@@ -38,7 +38,7 @@ static const struct default_options 
arm_option_optimization_table[] =
   {
 /* Enable section anchors by default at -O1 or higher.  */
 { OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 1 },
-{ OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
+{ OPT_LEVELS_FAST, OPT_fsched_pressure, NULL, 1 },
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 

Re: [PATCH][AArch64] PR79262: Adjust vector cost

2019-11-19 Thread Wilco Dijkstra


ping

PR79262 has been fixed for almost all AArch64 CPUs; however, the example is still
vectorized in a few cases, resulting in lower performance.  Increase the cost of
vector-to-scalar moves so it is more similar to the other vector costs.  As a
result -mcpu=cortex-a53 no longer vectorizes the testcase - libquantum and SPECv6
performance improves.
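
For context, the operation whose cost is raised looks like this at the
source level (sketch using GCC's generic vector extension; not the PR
testcase):

typedef int v4si __attribute__ ((vector_size (16)));

int
lane_sum (v4si v)
{
  /* Each scalar use of a lane is one vector-to-scalar move (e.g. umov
     on AArch64); loops like the PR79262 one are dominated by these.  */
  return v[0] + v[1] + v[2] + v[3];
}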

OK for commit?

ChangeLog:
2018-01-22  Wilco Dijkstra  

PR target/79262
* config/aarch64/aarch64.c (generic_vector_cost): Adjust 
vec_to_scalar_cost.
--

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
c6a83c881038873d8b68e36f906783be63ddde56..43f5b7162152ca92a916f4febee01f624c375202
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -403,7 +403,7 @@ static const struct cpu_vector_cost generic_vector_cost =
   1, /* vec_int_stmt_cost  */
   1, /* vec_fp_stmt_cost  */
   2, /* vec_permute_cost  */
-  1, /* vec_to_scalar_cost  */
+  2, /* vec_to_scalar_cost  */
   1, /* scalar_to_vec_cost  */
   1, /* vec_align_load_cost  */
   1, /* vec_unalign_load_cost  */



Re: [PATCH][AArch64] PR79262: Adjust vector cost

2019-11-19 Thread Richard Sandiford
Wilco Dijkstra  writes:
> ping

I acked this here: https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01229.html

>
> PR79262 has been fixed for almost all AArch64 cpus, however the example is 
> still
> vectorized in a few cases, resulting in lower performance.  Increase the cost 
> of
> vector-to-scalar moves so it is more similar to the other vector costs. As a 
> result
> -mcpu=cortex-a53 no longer vectorizes the testcase - libquantum and SPECv6
> performance improves.
>
> OK for commit?
>
> ChangeLog:
> 2018-01-22  Wilco Dijkstra  
>
> PR target/79262
> * config/aarch64/aarch64.c (generic_vector_cost): Adjust 
> vec_to_scalar_cost.
> --
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> c6a83c881038873d8b68e36f906783be63ddde56..43f5b7162152ca92a916f4febee01f624c375202
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -403,7 +403,7 @@ static const struct cpu_vector_cost generic_vector_cost =
>1, /* vec_int_stmt_cost  */
>1, /* vec_fp_stmt_cost  */
>2, /* vec_permute_cost  */
> -  1, /* vec_to_scalar_cost  */
> +  2, /* vec_to_scalar_cost  */
>1, /* scalar_to_vec_cost  */
>1, /* vec_align_load_cost  */
>1, /* vec_unalign_load_cost  */


Re: [PATCH] Use xchg for -Os (PR target/92549)

2019-11-19 Thread Vladimir Makarov

On 11/19/19 4:19 AM, Richard Biener wrote:

On Tue, Nov 19, 2019 at 10:04 AM Jakub Jelinek  wrote:

Hi!

The xchg instruction is smaller, in some cases much smaller than 3 moves
(e.g. in the testcase 2 bytes vs. 8 bytes), and is not a performance
disaster, but from Agner Fog's tables and
https://stackoverflow.com/questions/45766444/why-is-xchg-reg-reg-a-3-micro-op-instruction-on-modern-intel-architectures
it doesn't seem to be something we'd want to use when optimizing for speed,
at least not on Intel.

While we have *swap patterns, those are very unlikely to be triggered
during combine, usually we have different pseudos in there and the actual
need for swapping is only materialized during RA.
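
A minimal case where the need only appears at RA time (illustrative):

void g (int, int);  /* hypothetical callee */

void
f (int x, int y)
{
  /* X and Y arrive in the first two argument registers but must leave
     in the opposite ones; before RA these are just two pseudo copies,
     and only register allocation turns them into a hard-register swap
     that -Os can emit as a single xchg.  */
  g (y, x);
}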

The following patch does it when optimizing the insn for size only.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I wonder if IRA/LRA should be made aware of an xchg instruction
(and its cost?) so it knows it doesn't actually need a temporary register?


There is one place where it could be used.  It is 
ira-emit.c::modify_move_list, where register shuffling on region borders
happens.  It is possible that pseudo1:hr1, pseudo2:hr2 inside a region
should be pseudo1:hr2 and pseudo2:hr1 outside the region.


First I used a trick of 3 xor insns to break the cycle without creating a
new temp.  But such move cycles were such rare events that I just settled
on new temp creation.  That is an extremely rare case; the goal of IRA is
to decrease move generation on region borders.  But if somebody implements
xchg usage there, I'll not be against it.
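
(The three-xor trick mentioned above, as a C sketch; it breaks a
two-register move cycle without a scratch, assuming the two locations are
distinct:)

static inline void
xor_swap (unsigned int *a, unsigned int *b)
{
  /* Caution: A and B must not alias, or both end up zero.  */
  *a ^= *b;   /* a = a0 ^ b0            */
  *b ^= *a;   /* b = b0 ^ a0 ^ b0 = a0  */
  *a ^= *b;   /* a = a0 ^ b0 ^ a0 = b0  */
}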




Re: [PATCH] Fix dwarf-lineinfo inconsistency of inlined subroutines

2019-11-19 Thread Bernd Edlinger
On 11/19/19 3:01 AM, Alexandre Oliva wrote:
> Hello, Bernd,
> 
> Apologies for taking so long to respond.  I had not noticed your patch
> before being explicitly copied on it.
> 
> IIUC you're describing a problem in GDB, that could be summed up as its
> not paying attention to is_stmt and being unaware of location views, and
> you appear to be proposing to work around that by changing what AFAICT
> is a correct line number program into one that, though incorrect, will
> get more desirable behavior out of current GDB.
> 
> If that is so, I'm inclined to disagree that this is a desirable change.
> We should strive to generate correct, rather than incorrect debug
> information.  It was foreseen and expected that statement frontiers and
> location views would introduce occasional behavior regressions in
> debuggers unable to consume this information; what was not expected was
> that debuggers would lack support for it for so long.
> 
> I'd much rather make debug info incorrect, and instead introduce support
> for the extended debug info in consumers.  I realize it would take a lot
> more work to implement a proper fix there, however.  Unfortunately, I
> don't know of anyone currently working on that counterpart
> implementation, so the best recommendation I can offer to avoid this
> problem is to disable statement frontiers (-gno-statement-frontiers) and
> location views (-gno-variable-location-views).  This will get you line
> number programs and location lists that do not exercise features that
> GDB is not aware of.
> 
> Perhaps we should change our defaults, if the situation with GDB does
> not change :-(
> 

I tried those options, and -gno-variable-location-views changes
nothing in the debug experience, but -gno-statement-frontiers
makes gdb do this:

Breakpoint 1, get_alias_set (t=t@entry=0x77ff6c60) at 
../../gcc-10-20191117/gcc/alias.c:829
829 {
(gdb) n
837   if (t == error_mark_node
(gdb) n
829 {
(gdb) n
838   || (! TYPE_P (t)
(gdb) n
839   && (TREE_TYPE (t) == 0 || TREE_TYPE (t) == error_mark_node)))
(gdb) n
851   STRIP_NOPS (t);
(gdb)


So while the inline function is skipped this time, the function prologue
is executed twice, and the code is probably more jumpy than before.

It is interesting that my previous attempt at fixing it in gdb resulted
in basically the same effect :-/


My question to you is this: can you explain, given the following debug info, 
extracted
by readelf from alias.c, why the address 0x6e11 is in the inlined subroutine
and *not* in the get_alias_set function?  If you can explain it to me, I can 
probably
explain it to gdb as well.


  [0x8b37]  Special opcode 61: advance Address by 4 to 0x6e04 and Line by 0 to 3386
  [0x8b38]  Set is_stmt to 1
  [0x8b39]  Special opcode 189: advance Address by 13 to 0x6e11 and Line by 2 to 3388
  [0x8b3a]  Set is_stmt to 0
  [0x8b3b]  Copy (view 1)
  [0x8b3c]  Set File Name to entry 5 in the File Name Table
  [0x8b3e]  Set column to 8
  [0x8b40]  Advance Line by -2549 to 839

 <2><45b53>: Abbrev Number: 14 (DW_TAG_inlined_subroutine)
<45b54>   DW_AT_abstract_origin: <0x4c5b0>
<45b58>   DW_AT_entry_pc: 0x6e00
<45b60>   DW_AT_GNU_entry_view: 1
<45b62>   DW_AT_ranges  : 0x9d60
<45b66>   DW_AT_call_file   : 5
<45b67>   DW_AT_call_line   : 839
<45b69>   DW_AT_call_column : 8
<45b6a>   DW_AT_sibling : <0x45be4>

9d60 6e00 6e11
9d60 1115 112f
9d60 <End of list>

So I read this: when pc==0x6e11, the location is line 3388 (in tree.h) and it
is a statement.  But the inlined subroutine "contains_struct_check" extends
from 0x6e00..0x6e11 and from 0x1115..0x112f.  Therefore 0x6e11 is OUTSIDE the
subroutine, and that is what gdb thinks there as well.


Thanks
Bernd.


Re: [PATCH][AArch64] PR79262: Adjust vector cost

2019-11-19 Thread Wilco Dijkstra
Hi Richard,

> I acked this here: 
> https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01229.html

Thanks - I missed your email, but it's committed now. Yes we will
need to look at the vector costs again and retune them based on
recent vectorizer improvements and latest microarchitectures.

Cheers,
Wilco



RFA; patch to fix PR90007

2019-11-19 Thread Vladimir Makarov

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90007

Some time ago, code which permits LRA to reload a hard register into
memory, as it did for pseudos, was added.  But this LRA possibility was
not reflected in recog.c.  The following patch fixes this discrepancy
and as a result fixes PR90007.


OK to commit?

Index: ChangeLog
===
--- ChangeLog	(revision 278451)
+++ ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2019-11-19  Vladimir Makarov  
+
+	PR rtl-optimization/90007
+	* recog.c (constrain_operands): Permit hard registers too for
+	memory when LRA is used.
+
 2019-11-19  Martin Liska  
 
 	* toplev.c (general_init): Move the call...
Index: recog.c
===
--- recog.c	(revision 278413)
+++ recog.c	(working copy)
@@ -2757,10 +2757,9 @@ constrain_operands (int strict, alternat
 			   /* Before reload, accept what reload can turn
   into a mem.  */
 			   || (strict < 0 && CONSTANT_P (op))
-			   /* Before reload, accept a pseudo,
+			   /* Before reload, accept a pseudo or hard register,
   since LRA can turn it into a mem.  */
-			   || (strict < 0 && targetm.lra_p () && REG_P (op)
-   && REGNO (op) >= FIRST_PSEUDO_REGISTER)
+			   || (strict < 0 && targetm.lra_p () && REG_P (op))
 			   /* During reload, accept a pseudo  */
 			   || (reload_in_progress && REG_P (op)
    && REGNO (op) >= FIRST_PSEUDO_REGISTER)))
Index: testsuite/ChangeLog
===
--- testsuite/ChangeLog	(revision 278451)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2019-11-19  Vladimir Makarov  
+
+	PR rtl-optimization/90007
+	* gcc.target/i386/pr90007.c: New test.
+
 2019-11-15  Andrew Sutton  
 
 	PR c++/89913
Index: testsuite/gcc.target/i386/pr90007.c
===
--- testsuite/gcc.target/i386/pr90007.c	(nonexistent)
+++ testsuite/gcc.target/i386/pr90007.c	(working copy)
@@ -0,0 +1,15 @@
+/* PR rtl-optimization/90007 */
+/* { dg-do compile { target x86_64-*-* } } */
+/* { dg-options "-march=bdver1 -mfpmath=387 -O1 -fschedule-insns -fselective-scheduling" } */
+
+void
+qj (int b9, int r9, int k4, int k0, int e7)
+{
+  (void) b9;
+  (void) r9;
+  (void) k4;
+
+  while (!!k0 == e7 * 1.1)
+{
+}
+}


Reverting r278411

2019-11-19 Thread Segher Boessenkool
Hi,

r278411 causes bootstrap to fail on at least all powerpc*.  Also, I am
the author of this simplify-rtx code, and I think what you did is all wrong.
Also, it should be two patches, the CSE part should be separate.  (I can
not tell if that is the part that regresses us, either).

Could you please revert the patch?  I'll re-implement the simplify-rtx
part of it properly, if you want.  And I can test the CSE part separately.
But right now we need trunk to work again.

Thanks,


Segher


Re: Symver attribute

2019-11-19 Thread Segher Boessenkool
On Tue, Nov 19, 2019 at 03:53:43PM +0100, Jan Hubicka wrote:
> > Sounds perfectly fine to me, but how many tests will need changing?  Is
> > it only those that use symbol versioning directly?
> 
> There are no tests presently, I plan to write some so those will get
> dg-require-symver.
> 
> .symver is output only if you use the explicit symver function/variable
> attribute. So the "only" downside of this is that instead of getting
> a friendly error message from GCC that your target does not support
> symvers (because we cannot easily check for it) you will get a less
> friendly error message from the assembler.  I hope that is acceptable since
> we have pre-existing situations like that already.

Ah, I thought this would happen for various things from libc as well, so
there would be a lot of random fallout.  I probably misunderstood :-)


Segher


Re: Reverting r278411

2019-11-19 Thread Richard Sandiford
Segher Boessenkool  writes:
> Hi,
>
> r278411 causes bootstrap to fail on at least all powerpc*.  Also, I am
> the author of this simplify-rtx code, and I think what you did is all wrong.
> Also, it should be two patches, the CSE part should be separate.  (I can
> not tell if that is the part that regresses us, either).
>
> Could you please revert the patch?  I'll re-implement the simplify-rtx
> part of it properly, if you want.  And I can test the CSE part separately.
> But right now we need trunk to work again.

OK, done as r278455.

Richard


Re: Reverting r278411

2019-11-19 Thread Segher Boessenkool
On Tue, Nov 19, 2019 at 04:32:09PM +, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> > r278411 causes bootstrap to fail on at least all powerpc*.  Also, I am
> > the author of this simplify-rtx code, and I think what you did is all wrong.
> > Also, it should be two patches, the CSE part should be separate.  (I can
> > not tell if that is the part that regresses us, either).
> >
> > Could you please revert the patch?  I'll re-implement the simplify-rtx
> > part of it properly, if you want.  And I can test the CSE part separately.
> > But right now we need trunk to work again.
> 
> OK, done as r278455.

Thanks!


Segher


Re: Symver attribute

2019-11-19 Thread Jakub Jelinek
On Tue, Nov 19, 2019 at 10:19:05AM -0600, Segher Boessenkool wrote:
> On Tue, Nov 19, 2019 at 03:53:43PM +0100, Jan Hubicka wrote:
> > > Sounds perfectly fine to me, but how many tests will need changing?  Is
> > > it only those that use symbol versioning directly?
> > 
> > There are no tests presently, I plan to write some so those will get
> > dg-require-symver.
> > 
> > .symver is output only if you use the explicit symver function/variable
> > attribute. So the "only" downside of this is that instead of getting
> > a friendly error message from GCC that your target does not support
> > symvers (because we cannot easily check for it) you will get a less
> > friendly error message from the assembler.  I hope that is acceptable since
> > we have pre-existing situations like that already.
> 
> Ah, I thought this would happen for various things from libc as well, so
> there would be a lot of random fallout.  I probably misunderstood :-)

glibc so far uses inline asm with .symver directives.  That could change one
day of course conditionally to use the GCC symver attribute.

Jakub



Re: [PATCH, rs6000] Refactor FP vector comparison operators

2019-11-19 Thread Segher Boessenkool
Hi!

On Tue, Nov 12, 2019 at 06:41:07PM +0800, Kewen.Lin wrote:
> +;; code iterators and attributes for vector FP comparison operators:
> +(define_code_iterator vector_fp_comparison_operator [lt le ne
> +  ungt unge unlt unle])

Let's indent that differently?

(define_code_iterator
  vector_fp_comparison_operator [lt le ne ungt unge unlt unle])

is nice I think.

> +; There are 14 possible vector FP comparison operators; gt and eq of them
> +; have been expanded above, so just support the 12 remaining operators here.
>  
> +; 0. For ge:

(Don't number comments please).

> +(define_expand "vector_ge<mode>"
> +  [(set (match_operand:VEC_F 0 "vlogical_operand")
> +	(ge:VEC_F (match_operand:VEC_F 1 "vlogical_operand")
> +		  (match_operand:VEC_F 2 "vlogical_operand")))]
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
> +  "")

This already exists?  Line 764 in vector.md?

> +; 1. For lt/le/ne/ungt/unge/unlt/unle:
> +; lt(a,b)   = gt(b,a)
> +; le(a,b)   = ge(b,a)
> +; unge(a,b) = ~lt(a,b)
> +; unle(a,b) = ~gt(a,b)
> +; ne(a,b)   = ~eq(a,b)
> +; ungt(a,b) = ~le(a,b)
> +; unlt(a,b) = ~ge(a,b)
> +(define_insn_and_split "vector_<code><mode>"
>    [(set (match_operand:VEC_F 0 "vfloat_operand")
> +	(vector_fp_comparison_operator:VEC_F
> +	   (match_operand:VEC_F 1 "vfloat_operand")
> +	   (match_operand:VEC_F 2 "vfloat_operand")))]
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "#"
> +  "can_create_pseudo_p ()"
> +  [(pc)]
>  {
> +  enum rtx_code cond = <CODE>;
> +  rtx res = gen_reg_rtx (<MODE>mode);
> +  bool need_invert = false;
> +  switch (cond)
> +{
> +case LT:
> +case LE:
> +  cond = swap_condition (cond);
> +  std::swap (operands[1], operands[2]);
> +  break;
> +case UNLE:
> +case UNLT:
> +case NE:
> +case UNGE:
> +case UNGT:
> +  cond = reverse_condition_maybe_unordered (cond);
> +  need_invert = true;
> +  break;
> +default:
> +  gcc_unreachable ();
> +}
> +
> +  rtx comp = gen_rtx_fmt_ee (cond, <MODE>mode, operands[1], operands[2]);
> +  emit_insn (gen_rtx_SET (res, comp));
> +
> +  if (need_invert)
> +    emit_insn (gen_one_cmpl<mode>2 (res, res));
> +
> +  emit_insn (gen_rtx_SET (operands[0], res));
>  })

Does this work for (say) ungt?  I would do two switch statements: the
first only to do the invert, and then the second to do the swap (with
the modified cond!)  So:

{
  enum rtx_code cond = <CODE>;
  bool need_invert = false;

  if (cond == UNLE || cond == UNLT || cond == NE
      || cond == UNGE || cond == UNGT)
    {
      cond = reverse_condition_maybe_unordered (cond);
      need_invert = true;
    }

  if (cond == LT || cond == LE)
    {
      cond = swap_condition (cond);
      std::swap (operands[1], operands[2]);
    }

  rtx comp = gen_rtx_fmt_ee (cond, <MODE>mode, operands[1], operands[2]);

  if (need_invert)
    {
      rtx res = gen_reg_rtx (<MODE>mode);
      emit_insn (gen_rtx_SET (res, comp));
      emit_insn (gen_one_cmpl<mode>2 (operands[0], res));
    }
  else
    emit_insn (gen_rtx_SET (operands[0], comp));
})

> +; 2. For ltgt/uneq/ordered/unordered:
> +; ltgt: gt(a,b) | gt(b,a)
> +; uneq: ~gt(a,b) & ~gt(b,a)
> +; ordered: ge(a,b) | ge(b,a)
> +; unordered: ~ge(a,b) & ~ge(b,a)

You could write that as
  ~(ge(a,b) | ge(b,a))
which may show the structure better?

> +(define_insn_and_split "vector_<code><mode>"
>    [(set (match_operand:VEC_F 0 "vfloat_operand")
> +	(vector_fp_extra_comparison_operator:VEC_F
> +	   (match_operand:VEC_F 1 "vfloat_operand")
> +	   (match_operand:VEC_F 2 "vfloat_operand")))]
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>"#"
> +  "can_create_pseudo_p ()"

This should be "&& can_create_pseudo ()".

Also, can_create_pseudo in the split condition can fail, in theory
anyway.  It should be part of the insn condition, instead, and even
then it can fail: after the last split pass, but before reload, such
an insn can in principle be created, and then it will never be split.

In this case, maybe you should add a scratch register; in the previous
case, you can reuse operands[0] for res instead of making a new reg
for it, if !can_create_pseudo ().

>  {
> +  enum rtx_code cond = <CODE>;
> +  rtx res1 = gen_reg_rtx (<MODE>mode);
> +  rtx res2 = gen_reg_rtx (<MODE>mode);
> +  bool need_invert = false;
> +  switch (cond)
> +{
> +case UNEQ:
> +  need_invert = true;
> +/* Fall through.  */
> +case LTGT:
> +  cond = GT;
> +  break;
> +case UNORDERED:
> +  need_invert = true;
> +/* Fall through.  */
> +case ORDERED:
> +  cond = GE;
> +  break;
> +default:
> +  gcc_unreachable ();
> +}
> +
> +  rtx comp1 = gen_rtx_fmt_ee (cond, <MODE>mode, operands[1], operands[2]);
> +  emit_insn (gen_rtx_SET (res1, comp1));
> +  rtx comp2 = gen_rtx_fmt_ee (cond, <MODE>mode, operands[2], operands[1]);
> +  emit_insn (gen_rtx_SET (res2, comp2));
> +
> +  emit_insn (gen_ior<mode>3 (res1, res1, res2));
> +
> +  if (need_invert)
>

Re: [PATCH] Multibyte awareness for diagnostics (PR 49973)

2019-11-19 Thread David Malcolm
On Fri, 2019-09-27 at 16:41 -0400, Lewis Hyatt wrote:
> On Thu, Sep 26, 2019 at 08:46:56PM +, Joseph Myers wrote:
> > On Thu, 26 Sep 2019, Lewis Hyatt wrote:
> > 
> > > A couple notes: 
> > > - In order to avoid any portability problems with wchar_t,
> > > the
> > > equivalent of wcwidth() from libc is implemented in-house.
> > 
> > I'm uneasy about contrib/gen_wcwidth.cpp doing the generation using
> > host 
> > libc's wcwidth.  The effect is that libcpp/generated_cpp_wcwidth.h
> > is 
> > *not* reproducible unless you know exactly what host (libc version,
> > locale 
> > used when running the program, distribution patches to libc and
> > locale 
> > data) was used to run the program.  I think we need a generator
> > that works 
> > from Unicode data in some way so we can explicitly say what version
> > of the 
> > (unmodified) Unicode data was used.
> 
> Here is a revised patch that hopefully addresses your concerns. I
> borrowed the
> relevant Python code for parsing Unicode's data files from glibc,
> then added a
> new script that parses the locale data they output into the same sort
> of simply
> searchable tables I was creating before. The new generated table is
> very close
> to the old one, but there are some differences due to improvements
> that have
> been made to glibc recently, affecting 200 or so codepoints. The
> procedure for
> updating GCC's wcwidth would then be the following:
> 
> -The three Unicode data files live in contrib/unicode/
> {EastAsianWidth.txt,PropList.txt,UnicodeData.txt} and can be updated
> at any
> time when Unicode changes them.
> 
> -glibc's processing logic lives in two Python scripts in
> contrib/unicode/from_glibc and these would ideally be updated when
> glibc makes
> updates. It seems they occasionally put some manual overrides, etc.,
> based on
> feedback and bug reports. (These are the verbatim scripts from glibc
> with no
> changes, so they need only be copied over.)
> 
> -contrib/unicode/gen_wcwidth.py runs the glibc code, using GCC's
> Unicode data
> files as inputs, and produces the necessary tables for
> libcpp/generated_cpp_wcwidth.h.
> 
> Hope that sounds better. This way it is relatively straightforward to
> keep in
> sync with glibc (which seems desirable to me anyway), but is also
> always
> reproducible.
> 
> Note: I did not include the three large unicode data files in this
> emailed
> patch, although they would be committed as part of the patch
> presumably.
> They are available here:
> ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt
> ftp://ftp.unicode.org/Public/UNIDATA/EastAsianWidth.txt
> ftp://ftp.unicode.org/Public/UNIDATA/PropList.txt
> 
> The rest of the patch is unchanged from before, other than one
> comment updated
> to reflect the new situation, and charset.c rebased to current trunk.
> 
> Thank you for taking the time to review this.
> 
> -Lewis

Thanks for posting this patch; I'm sorry about how long it's taken me
to review it.

BTW, have you done GCC contributor paperwork?
  https://gcc.gnu.org/contribute.html#legal

> diff --git a/contrib/unicode/from_glibc/unicode_utils.py 
> b/contrib/unicode/from_glibc/unicode_utils.py
> new file mode 100644
> index 000..a9e94cce418
> --- /dev/null
> +++ b/contrib/unicode/from_glibc/unicode_utils.py

[...snip...]

I'll leave it for Joseph to comment on whether this approach satisfies
his concerns; I'll focus on the diagnostic-show-locus.c changes.

I'm assuming that all of the Python code is Python 3, rather than 2
(I see python 3 shebangs and python3-style-print, so it looks good from
that POV).  It appears that there's no need for the build machine to
have Python, it's only needed when maintainers are refreshing the data
from glibc.

If we're going with this approach, and redistributing those unicode data
files as part of our repo and tarballs, do we need some kind of
copyright/license statement that spells out the situation?  What does
glibc do for this?

> diff --git a/contrib/unicode/from_glibc/utf8_gen.py 
> b/contrib/unicode/from_glibc/utf8_gen.py
> new file mode 100755
> index 000..0e5583cd259
> --- /dev/null
> +++ b/contrib/unicode/from_glibc/utf8_gen.py

[...snip...]

> diff --git a/contrib/unicode/gen_wcwidth.py b/contrib/unicode/gen_wcwidth.py
> new file mode 100755
> index 000..02b28bcedcf
> --- /dev/null
> +++ b/contrib/unicode/gen_wcwidth.py

[...snip...]

If we're going with this approach (which I'll leave to Joseph), perhaps
this directory should have a brief README (covering much of the material
you've mentioned in the email with the patch, talking about syncing with
glibc, regenerating the data, etc).

> diff --git a/gcc/diagnostic-show-locus.c b/gcc/diagnostic-show-locus.c
> index 4d563dda8f4..7a5bd36d962 100644
> --- a/gcc/diagnostic-show-locus.c
> +++ b/gcc/diagnostic-show-locus.c
> @@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gcc-rich-location.h"
>  #include "selftest.h"
>  #include "selftest-diagnost

Re: [PATCH] Free dominance info at the beginning of pass_jump_after_combine

2019-11-19 Thread Segher Boessenkool
On Wed, Nov 13, 2019 at 01:15:18PM +0100, Ilya Leoshkevich wrote:
> > But OTOH it may well be the case that other things in cleanup_cfg make
> > the known dominance info invalid as well, in which case just the comment
> > is a bit misleading.  Sounds likely to me :-)
> 
> Yeah, that's what I worry about as well.  In particular, this block in
> try_optimize_cfg:

Heh, I did that code, whoops :-)

> /* Try to change a conditional branch to a return to the
>respective conditional return.  */
> if (EDGE_COUNT (b->succs) == 2
> && any_condjump_p (BB_END (b))
> && bb_is_just_return (BRANCH_EDGE (b)->dest, &ret, &use))
>   {
> if (redirect_jump (as_a  (BB_END (b)),
>PATTERN (ret), 0))
>   {
> if (use)
>   emit_insn_before (copy_insn (PATTERN (use)),
> BB_END (b));
> if (dump_file)
>   fprintf (dump_file, "Changed conditional jump %d->%d "
>"to conditional return.\n",
>b->index, BRANCH_EDGE (b)->dest->index);
> redirect_edge_succ (BRANCH_EDGE (b),
> EXIT_BLOCK_PTR_FOR_FN (cfun));
> BRANCH_EDGE (b)->flags &= ~EDGE_CROSSING;
> changed_here = true;
>   }
>   }
> 
> runs regardless of cleanup mode, and it makes use of redirect_edge_succ,
> which does not update dominators.

Yeah.  Of course this only does anything if the targeted block is only
a return insn (and maybe a USE of the return value), so nothing bad will
happen, but the dom info is not technically correct anymore.


Segher


Re: [PATCH] Multibyte awareness for diagnostics (PR 49973)

2019-11-19 Thread Joseph Myers
On Tue, 19 Nov 2019, David Malcolm wrote:

> If we're going with this approach, and redistributing those unicode data
> files as part of our repo and tarballs, do we need some kind of
> copyright/license statement that spells out the situation?  What does
> glibc do for this?

glibc includes the files in the localedata/unicode-gen/ directory.

My inclination is that we should include them in the GCC sources, but 
there were concerns about doing so when I asked about that issue when last 
updating the data used for warnings about normalization of UCNs in 
identifiers (which I should probably update again - given the work I did 
last time, this time it should just be a regeneration with newer input 
files).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [C++ coroutines 3/6] Front end parsing and transforms.

2019-11-19 Thread Nathan Sidwell

On 11/17/19 5:25 AM, Iain Sandoe wrote:


+++ b/gcc/cp/coroutines.cc



+/* = Morph and Expand. =



+/* Helpers for label creation.  */
+static tree
+create_anon_label_with_ctx (location_t loc, tree ctx)
+{
+  tree lab = build_decl (loc, LABEL_DECL, NULL_TREE, void_type_node);
+
+  DECL_ARTIFICIAL (lab) = 1;
+  DECL_IGNORED_P (lab) = 1;


We can use true for such boolean flags now, your call.



+/* We mark our named labels as used, because we want to keep them in place
+   during development.  FIXME: Remove this before integration.  */


FIXME?


+struct __proxy_replace
+{
+  tree from, to;
+};


std::pair<tree, tree>?



+/* If this is a coreturn statement (or one wrapped in a cleanup) then
+   return the list of statements to replace it.  */
+static tree


space between comment and function -- here and elsewhere.


+  /* p.return_void and p.return_value are probably void, but it's not
+ clear if that's intended to be a guarantee.  CHECKME.  */


CHECKME?


+  /* We might have a single co_return statement, in which case, we do
+have to replace it with a list.  */


'do have to' reads weirdly -> 'have to' or 'do not have to' ?



+static tree
+co_await_expander (tree *stmt, int * /*do_subtree*/, void *d)
+{



+  r = build2_loc (loc, INIT_EXPR, TREE_TYPE (sv_handle), sv_handle,
+ suspend);
+  append_to_statement_list (r, &body_list);
+  tree resume
+   = lookup_member (TREE_TYPE (sv_handle), get_identifier ("resume"), 1, 0,
+tf_warning_or_error);
+  resume = build_new_method_call (sv_handle, resume, NULL, NULL_TREE,
+ LOOKUP_NORMAL, NULL, tf_warning_or_error);
+  resume = coro_build_cvt_void_expr_stmt (resume, loc);


Do we have a way of forcing this to be a tail call?  Should comment 
and/or TODO:



+  append_to_statement_list (resume, &body_list);
+}


// comments approaching ...


+  add_stmt (r); // case 0:
+  // Implement the suspend, a scope exit without clean ups.
+  r = build_call_expr_internal_loc (loc, IFN_CO_SUSPN, void_type_node, 1, 
susp);
+  r = coro_build_cvt_void_expr_stmt (r, loc);


// danger zone over :)  [there are others scattered around though]



+/* When we built the await expressions, we didn't know the coro frame
+   layout, therefore no idea where to find the promise or where to put
+   the awaitables.  Now we know these things, fill them in.  */
+static tree
+transform_await_expr (tree await_expr, struct __await_xform_data *xform)
+{
+  struct suspend_point_info *si = suspend_points->get (await_expr);
+  location_t loc = EXPR_LOCATION (await_expr);
+  if (!si)
+{
+  error_at (loc, "no suspend point info for %qD", await_expr);


that looks like implementor speak -- is it an ICE?



+  /* FIXME: determine if it's better to walk the co_await several times with
+ a quick test, or once with a more complex test.  */


Probably can simply be an 'I'm not sure, I went with ...' comment?



+  /* Update the block associated with the outer scope of the orig fn.  */
+  tree first = expr_first (fnbody);
+  if (first && TREE_CODE (first) == BIND_EXPR)
+{
+  /* We will discard this, since it's connected to the original scope
+nest... ??? CHECKME, this might be overly cautious.  */

?


+  tree block = BIND_EXPR_BLOCK (first);
+  if (block) // For this to be missing is probably a bug.


gcc_assert?


+   {
+ gcc_assert (BLOCK_SUPERCONTEXT (block) == NULL_TREE);
+ gcc_assert (BLOCK_CHAIN (block) == NULL_TREE);
+ BLOCK_SUPERCONTEXT (block) = top_block;
+ BLOCK_SUBBLOCKS (top_block) = block;
+   }
+}
+
+  add_stmt (actor_bind);
+  tree actor_body = push_stmt_list ();
+
+  /* FIXME: this is development marker, remove later.  */


FIXME



+  tree return_void = NULL_TREE;
+  tree rvm
+= lookup_promise_member (orig, "return_void", loc, false /*musthave*/);

  /*musthave=*/false  I think there are other similar cases



+  /* Get a reference to the final suspend var in the frame.  */
+  transform_await_expr (final_await, &xform);
+  r = coro_build_expr_stmt (final_await, loc);
+  add_stmt (r);
+
+  /* now do the tail of the function.  */

now->Now


+  /* Here deallocate the frame (if we allocated it), which we will have at
+ present.  */


sentence is confusing.  reword?


+/* Helper that returns an identifier for an appended extension to the
+   current un-mangled function name.  */
+static tree
+get_fn_local_identifier (tree orig, const char *append)
+{
+  /* Figure out the bits we need to generate names for the outlined things
+ For consistency, this needs to behave the same way as
+ ASM_FORMAT_PRIVATE_NAME does. */


pity we don't have a generic helper already.


+  char *an;
+  if (DECL_ASSEMBLER_NAME (orig))
+an = ACONCAT ((IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (orig)), sep, 
append,
+  (char *) 0));
+  else if (DECL_USE_TEMPLATE (

Re: [PATCH] Fix slowness in demangler

2019-11-19 Thread Jeff Law
On 11/19/19 12:00 PM, Tim Rühsen wrote:
> On 16.11.19 18:14, Jeff Law wrote:
>> On 11/12/19 9:39 AM, Tim Rühsen wrote:
>>> On 11/12/19 4:15 PM, Ian Lance Taylor wrote:
 On Tue, Nov 12, 2019 at 6:15 AM Tim Rühsen  wrote:
> this is a proposal to fix
> https://sourceware.org/bugzilla/show_bug.cgi?id=25180
>
> In short:
> cxxfilt
> _ZZ1_DO1z1Dclaa1D1VEE1VE2zo
>
> takes several minutes with 100% CPU before it comes back with a result.
>
> With this patch the result is returned immediately. The test suite in
> binutils-gdb/libiberty/ throws no error.
>
> I'd like to note that I am not subscribed to the list, so please add me
> to CC when replying. Thanks in advance.
 This is OK with an appropriate ChangeLog entry.
>>> Thanks for feedback, Ian.
>>>
>>> Attached is the patch with a ChangeLog entry.
>>>
>>> Regards, Tim
> ...
> 
>> Thanks.  Installed after minor twiddling of the ChangeLog.
>>
>> jeff
> 
> Thanks. Where exactly did it go? Still can't see it in
> git://sourceware.org/git/binutils-gdb.git. Is there another repository?
It's in the GCC repo.  binutils-gdb will (at some point) pull it from GCC.
jeff



Re: [PATCH] Fix slowness in demangler

2019-11-19 Thread Tim Rühsen
On 16.11.19 18:14, Jeff Law wrote:
> On 11/12/19 9:39 AM, Tim Rühsen wrote:
>> On 11/12/19 4:15 PM, Ian Lance Taylor wrote:
>>> On Tue, Nov 12, 2019 at 6:15 AM Tim Rühsen  wrote:
 this is a proposal to fix
 https://sourceware.org/bugzilla/show_bug.cgi?id=25180

 In short:
 cxxfilt
 _ZZ1_DO1z1Dclaa1D1VEE1VE2zo

 takes several minutes with 100% CPU before it comes back with a result.

 With this patch the result is returned immediately. The test suite in
 binutils-gdb/libiberty/ throws no error.

 I'd like to note that I am not subscribed to the list, so please add me
 to CC when replying. Thanks in advance.
>>> This is OK with an appropriate ChangeLog entry.
>> Thanks for feedback, Ian.
>>
>> Attached is the patch with a ChangeLog entry.
>>
>> Regards, Tim
...

> Thanks.  Installed after minor twiddling of the ChangeLog.
> 
> jeff

Thanks. Where exactly did it go? Still can't see it in
git://sourceware.org/git/binutils-gdb.git. Is there another repository?

Regards, Tim





Minor speedup in estimate_edge_badness

2019-11-19 Thread Jan Hubicka
Hi,
this patch avoids some extra calculation in edge_badness.

Bootstrapped/regtested x86_64-linux. Committed.

* ipa-inline.c (inlining_speedup): New function.
(edge_badness): Use it.
Index: ipa-inline.c
===
--- ipa-inline.c(revision 278441)
+++ ipa-inline.c(working copy)
@@ -768,6 +768,33 @@ compute_inlined_call_time (struct cgraph
   return time;
 }
 
+/* Determine time saved by inlining EDGE of frequency FREQ
+   where callee's runtime w/o inlining is UNINLINED_TIME
+   and with inlining is INLINED_TIME.  */
+
+inline sreal
+inlining_speedup (struct cgraph_edge *edge,
+ sreal freq,
+ sreal uninlined_time,
+ sreal inlined_time)
+{
+  sreal speedup = uninlined_time - inlined_time;
+  /* Handling of call_time should match one in ipa-inline-fnsummary.c
+ (estimate_edge_size_and_time).  */
+  sreal call_time = ipa_call_summaries->get (edge)->call_stmt_time;
+
+  if (freq > 0)
+{
+  speedup = (speedup + call_time);
+  if (freq != 1)
+   speedup = speedup * freq;
+}
+  else if (freq == 0)
+speedup = speedup >> 11;
+  gcc_checking_assert (speedup >= 0);
+  return speedup;
+}
+
 /* Return true if the speedup for inlining E is bigger than
PARAM_MAX_INLINE_MIN_SPEEDUP.  */
 
@@ -1149,10 +1176,8 @@ edge_badness (struct cgraph_edge *edge,
   sreal numerator, denominator;
   int overall_growth;
   sreal freq = edge->sreal_frequency ();
-  sreal inlined_time = compute_inlined_call_time (edge, edge_time, freq);
 
-  numerator = (compute_uninlined_call_time (edge, unspec_edge_time, freq)
-  - inlined_time);
+  numerator = inlining_speedup (edge, freq, unspec_edge_time, edge_time);
   if (numerator <= 0)
numerator = ((sreal) 1 >> 8);
   if (caller->count.ipa ().nonzero_p ())
@@ -1235,16 +1260,14 @@ edge_badness (struct cgraph_edge *edge,
  fprintf (dump_file,
   "  %f: guessed profile. frequency %f, count %" PRId64
   " caller count %" PRId64
-  " time w/o inlining %f, time with inlining %f"
+  " time saved %f"
   " overall growth %i (current) %i (original)"
   " %i (compensated)\n",
   badness.to_double (),
   freq.to_double (),
   edge->count.ipa ().initialized_p () ? edge->count.ipa 
().to_gcov_type () : -1,
   caller->count.ipa ().initialized_p () ? caller->count.ipa 
().to_gcov_type () : -1,
-  compute_uninlined_call_time (edge,
-   unspec_edge_time, 
freq).to_double (),
-  inlined_time.to_double (),
+  inlining_speedup (edge, freq, unspec_edge_time, 
edge_time).to_double (),
   estimate_growth (callee),
   callee_info->growth, overall_growth);
}


Re: [PATCH] Fix slowness in demangler

2019-11-19 Thread Tim Rühsen
On 19.11.19 20:01, Jeff Law wrote:
> On 11/19/19 12:00 PM, Tim Rühsen wrote:
>> Thanks. Where exactly did it go ? Still can't see it in
>> git://sourceware.org/git/binutils-gdb.git. Is there another repository ?
> It's in the GCC repo.  binutils-gdb will (at some point) pull it from GCC.
> jeff

Thanks, good to know, will clone it from there :-)

Tim





Remove unused parameter to estimate_edge_size_and_time

2019-11-19 Thread Jan Hubicka
Hi,
this patch removes unused argument of estimate_edge_size_and_time.
Bootstrapped/regtested x86_64-linux, committed.

* ipa-fnsummary.c (estimate_edge_size_and_time): Drop parameter PROP.
(estimate_calls_size_and_time): Update.
Index: ipa-fnsummary.c
===
--- ipa-fnsummary.c (revision 278441)
+++ ipa-fnsummary.c (working copy)
@@ -2950,7 +2950,6 @@ estimate_edge_devirt_benefit (struct cgr
 static inline void
 estimate_edge_size_and_time (struct cgraph_edge *e, int *size, int *min_size,
 sreal *time,
-int prob,
 vec known_vals,
 vec known_contexts,
 vec known_aggs,
@@ -2960,6 +2959,7 @@ estimate_edge_size_and_time (struct cgra
   int call_size = es->call_stmt_size;
   int call_time = es->call_stmt_time;
   int cur_size;
+
   if (!e->callee && hints && e->maybe_hot_p ()
   && estimate_edge_devirt_benefit (e, &call_size, &call_time,
   known_vals, known_contexts, known_aggs))
@@ -2968,12 +2968,8 @@ estimate_edge_size_and_time (struct cgra
   *size += cur_size;
   if (min_size)
 *min_size += cur_size;
-  if (!time)
-;
-  else if (prob == REG_BR_PROB_BASE)
+  if (time)
 *time += ((sreal)call_time) * e->sreal_frequency ();
-  else
-*time += ((sreal)call_time * prob) * e->sreal_frequency ();
 }
 
 
@@ -3019,7 +3015,7 @@ estimate_calls_size_and_time (struct cgr
 so we do not need to compute probabilities.  */
  estimate_edge_size_and_time (e, size,
   es->predicate ? NULL : min_size,
-  time, REG_BR_PROB_BASE,
+  time,
   known_vals, known_contexts,
   known_aggs, hints);
}
@@ -3031,7 +3027,7 @@ estimate_calls_size_and_time (struct cgr
  || es->predicate->evaluate (possible_truths))
estimate_edge_size_and_time (e, size,
 es->predicate ? NULL : min_size,
-time, REG_BR_PROB_BASE,
+time,
 known_vals, known_contexts, known_aggs,
 hints);
 }


Re: Reverting r278411

2019-11-19 Thread Richard Sandiford
Segher Boessenkool  writes:
> Hi,
>
> r278411 causes bootstrap to fail on at least all powerpc*.  Also, I am
> author of this simplify-rtx code, and I think what you did is all wrong.
> Also, it should be two patches, the CSE part should be separate.  (I can
> not tell if that is the part that regresses us, either).

I committed them as one patch because I'd tested them together,
which I thought was the way this should be done.  In practice
a clean revert would have to be done as a pair anyway: reverting
one part individually would have required XFAILs on the tests.

And yeah, it was the cse.c part that was the problem.
find_sets_in_insn has:

  /* Don't count call-insns, (set (reg 0) (call ...)), as a set.
 The hard function value register is used only once, to copy to
 someplace else, so it isn't worth cse'ing.  */
  else if (GET_CODE (SET_SRC (x)) == CALL)
;

So n_sets == 1 can be true if we have a single SET in parallel
with a call.  Unsurprisingly, I guess no-one had MEM sets in
parallel with a call, so it didn't matter until now.

I'm retesting with a !CALL_P check added to make sure that was
the only problem.
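
For concreteness, the shape of the guard under test might be (a sketch
only; placement and exact form depend on the surrounding cse.c code):

/* Sketch: don't treat an insn as a plain single-set when its SET
   sits in a PARALLEL with a call, since find_sets_in_insn does not
   count the call's own SET.  */
if (n_sets == 1 && !CALL_P (insn))
  {
    /* ... existing single-set handling ... */
  }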

Before I resubmit, why is the simplify-rtx.c part all wrong?

Thanks,
Richard


Re: [PATCH v2] PR92398: Fix testcase failure of pr72804.c

2019-11-19 Thread Segher Boessenkool
Hi!

On Mon, Nov 18, 2019 at 04:21:07PM +0800, luoxhu wrote:
> On 2019/11/15 18:17, Segher Boessenkool wrote:
> > On Thu, Nov 14, 2019 at 09:12:32PM -0600, Xiong Hu Luo wrote:
> >> P9LE generated instruction is not worse than P8LE.
> >> mtvsrdd;xxlnot;stxv vs. not;not;std;std.
> >> Update the test case to fix failures.
> > 
> > So this no longer runs it for p7, and it also doesn't run it for cpus
> > after p9 anymore.  Could you change it to be a test for p9 and above,
> > and one for before p9?  Does that work?
> Since one test file could not support multiple dg do-compile or dg-options,
> it seems not feasible to build both power7, power8 and power9 option in one
> source file, I added check_effective_target_power[7,8,9] and
> add_options_for_power[7,8,9] in target-supports.exp for each cpu configure.

You can add a selector to the scan-assembler tests, I meant.

There are two ways to run tests: in one way you use just what -mcpu=
etc. you happen to get; in the other way, you force particular options.
With the first way, you will automatically also test everything on newer
cpus.  With the second way, you also test (for example) p8 code gen when
actually testing on a p9 system.  Both approaches have their uses.

In this case, I would use the first approach, but okay :-)
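
For the record, with the first approach a single file can carry both
expectations via selectors on the directives themselves.  A sketch using
this thread's power9 effective target:

/* { dg-final { scan-assembler-times "not " 2 { target power9 } } } */
/* { dg-final { scan-assembler-times "not " 4 { target { ! power9 } } } } */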

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr72804.h
> @@ -0,0 +1,17 @@
> +/* This test code is included into pr72804.p8.c and pr72804.p9.c

And p7.

> +   The two files have the tests for the number of instructions generated for
> +   P8LE versus P9LE.  */

Three files :-)

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr72804.p7.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2" } */
> +/* { dg-add-options power7 } */
> +
> +/* { dg-final { scan-assembler-times "not " 4 {xfail be} } } */
> +/* { dg-final { scan-assembler-times "std " 2  } } */
> +/* { dg-final { scan-assembler-times "ld " 2 } } */
> +/* { dg-final { scan-assembler-not "lxvd2x" {xfail be} } } */
> +/* { dg-final { scan-assembler-not "stxvd2x" {xfail be} } } */
> +/* { dg-final { scan-assembler-not "xxpermdi" } } */
> +/* { dg-final { scan-assembler-not "mfvsrd" } } */
> +/* { dg-final { scan-assembler-not "mfvsrd" } } */

So this is exactly the same as p8.  Hmm.

> --- a/gcc/testsuite/gcc.target/powerpc/pr72804.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr72804.p9.c
> @@ -1,25 +1,16 @@
>  /* { dg-do compile { target { lp64 } } } */
> -/* { dg-skip-if "" { powerpc*-*-darwin* } } */
>  /* { dg-require-effective-target powerpc_vsx_ok } */
> -/* { dg-options "-O2 -mvsx -fno-inline-functions --param 
> max-inline-insns-single-O2=200" } */
> +/* { dg-options "-O2 -mvsx" } */
> +/* { dg-add-options power9 } */

-mvsx is superfluous if you have -mcpu=power9 already.
You shouldn't test powerpc_vsx_ok if you force a CPU that has it.

> -/* { dg-final { scan-assembler-times "not " 4 } } */
> -/* { dg-final { scan-assembler-times "std " 2 } } */
> +/* { dg-final { scan-assembler-times "not " 2 { target power9 } } } */
> +/* { dg-final { scan-assembler-not "std " { target power9 } } } */
>  /* { dg-final { scan-assembler-times "ld " 2 } } */
>  /* { dg-final { scan-assembler-not "lxvd2x" } } */
>  /* { dg-final { scan-assembler-not "stxvd2x" } } */
>  /* { dg-final { scan-assembler-not "xxpermdi" } } */
>  /* { dg-final { scan-assembler-not "mfvsrd" } } */
>  /* { dg-final { scan-assembler-not "mfvsrd" } } */

You now force power9 for this test, so you shouldn't test for it anymore.

> +proc check_effective_target_power7 {  } {
> +  return [check_no_compiler_messages_nocache power7 assembly {
> + #if !(!defined(_ARCH_PWR8) && defined(_ARCH_PWR7))
> + #error !_ARCH_PWR7
> + #endif
> +  } "-mdejagnu-cpu=power7"]
> +}

So this tests for p7 exactly (not power8 or up).  Okay.  Add a comment
saying that?

> +proc add_options_for_power7 { flags } {
> +  if { [istarget powerpc*-*-*]  } {
> +return "$flags -mdejagnu-cpu=power7"
> +  }
> +  return "$flags"
> +}

Is this better than just adding -mdejagnu-cpu=power7 directly?


Segher


[C++ PATCH] Consider parm types equivalence for operator rewrite tiebreaker.

2019-11-19 Thread Jason Merrill
The C++ committee continues to discuss how best to avoid breaking existing
code with the new rules for reversed operators.  A recent suggestion was to
base the tie-breaker on the parameter types of the candidates, which made a
lot of sense to me, so this patch implements that.

This patch also mentions that a candidate was reversed or rewritten when
printing the list of candidates, and warns about a comparison that becomes
recursive under the new rules.  There is no flag for this warning; people
can silence it by swapping the operands.

Tested x86_64-pc-linux-gnu, applying to trunk.

* call.c (same_fn_or_template): Change to cand_parms_match.
(joust): Adjust.
(print_z_candidate): Mark rewritten/reversed candidates.
(build_new_op_1): Warn about recursive call with reversed arguments.
---
 gcc/cp/call.c                         | 63 ++++++++++++++++++-----------
 .../g++.dg/cpp2a/spaceship-rewrite2.C | 12 ++++++++++++
 .../g++.dg/cpp2a/spaceship-rewrite3.C | 10 ++++++++++
 .../g++.dg/cpp2a/spaceship-rewrite4.C | 12 ++++++++++++
 4 files changed, 73 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/spaceship-rewrite2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/spaceship-rewrite3.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/spaceship-rewrite4.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 13639a1c901..f4dfa7b3f56 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -3694,6 +3694,10 @@ print_z_candidate (location_t loc, const char *msgstr,
 inform (cloc, "%s%#qD (near match)", msg, fn);
   else if (DECL_DELETED_FN (fn))
 inform (cloc, "%s%#qD (deleted)", msg, fn);
+  else if (candidate->reversed ())
+inform (cloc, "%s%#qD (reversed)", msg, fn);
+  else if (candidate->rewritten ())
+inform (cloc, "%s%#qD (rewritten)", msg, fn);
   else
 inform (cloc, "%s%#qD", msg, fn);
   if (fn != candidate->fn)
@@ -6219,8 +6223,14 @@ build_new_op_1 (const op_location_t &loc, enum tree_code 
code, int flags,
  else
{
  if (cand->reversed ())
-   /* We swapped these in add_candidate, swap them back now.  */
-   std::swap (cand->convs[0], cand->convs[1]);
+   {
+ /* We swapped these in add_candidate, swap them back now.  */
+ std::swap (cand->convs[0], cand->convs[1]);
+ if (cand->fn == current_function_decl)
+   warning_at (loc, 0, "in C++20 this comparison calls the "
+   "current function recursively with reversed "
+   "arguments");
+   }
  result = build_over_call (cand, LOOKUP_NORMAL, complain);
}
 
@@ -10995,18 +11005,32 @@ joust_maybe_elide_copy (z_candidate *&cand)
   return false;
 }
 
-/* True if cand1 and cand2 represent the same function or function
-   template.  */
+/* True if the defining declarations of the two candidates have equivalent
+   parameters.  */
 
-static bool
-same_fn_or_template (z_candidate *cand1, z_candidate *cand2)
+bool
+cand_parms_match (z_candidate *c1, z_candidate *c2)
 {
-  if (cand1->fn == cand2->fn)
+  tree fn1 = c1->template_decl;
+  tree fn2 = c2->template_decl;
+  if (fn1 && fn2)
+{
+  fn1 = most_general_template (TI_TEMPLATE (fn1));
+  fn1 = DECL_TEMPLATE_RESULT (fn1);
+  fn2 = most_general_template (TI_TEMPLATE (fn2));
+  fn2 = DECL_TEMPLATE_RESULT (fn2);
+}
+  else
+{
+  fn1 = c1->fn;
+  fn2 = c2->fn;
+}
+  if (fn1 == fn2)
 return true;
-  if (!cand1->template_decl || !cand2->template_decl)
+  if (identifier_p (fn1) || identifier_p (fn2))
 return false;
-  return (most_general_template (TI_TEMPLATE (cand1->template_decl))
- == most_general_template (TI_TEMPLATE (cand2->template_decl)));
+  return compparms (TYPE_ARG_TYPES (TREE_TYPE (fn1)),
+   TYPE_ARG_TYPES (TREE_TYPE (fn2)));
 }
 
 /* Compare two candidates for overloading as described in
@@ -11155,20 +11179,11 @@ joust (struct z_candidate *cand1, struct z_candidate 
*cand2, bool warn,
 
  if (winner && comp != winner)
{
- if (same_fn_or_template (cand1, cand2))
-   {
- /* Ambiguity between normal and reversed versions of the
-same comparison operator; prefer the normal one.
-https://lists.isocpp.org/core/2019/10/7438.php  */
- if (cand1->reversed ())
-   winner = -1;
- else
-   {
- gcc_checking_assert (cand2->reversed ());
- winner = 1;
-   }
- break;
-   }
+ /* Ambiguity between normal and reversed comparison operators
+with the same parameter types; prefer the normal one.  */
+ if ((cand1->reversed () != cand2->reversed ())
+ && cand_parms_match (can

Re: Add a new combine pass

2019-11-19 Thread Segher Boessenkool
On Tue, Nov 19, 2019 at 11:33:13AM +, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> > On Sun, Nov 17, 2019 at 11:35:26PM +, Richard Sandiford wrote:
> >> While working on SVE, I've noticed several cases in which we fail
> >> to combine instructions because the combined form would need to be
> >> placed earlier in the instruction stream than the last of the
> >> instructions being combined.  This includes one very important
> >> case in the handling of the first fault register (FFR).
> >
> > Do you have an example of that?
> 
> It's difficult to share realistic examples at this stage since this
> isn't really the right forum for making them public for the first time.

Oh I'm very sorry.  In the future, just say "Future" and I know what
you mean :-)

>   /* Make sure that the value that is to be substituted for the register
>does not use any registers whose values alter in between.  However,
>If the insns are adjacent, a use can't cross a set even though we
>think it might (this can happen for a sequence of insns each setting
>the same destination; last_set of that register might point to
>a NOTE).  If INSN has a REG_EQUIV note, the register is always
>equivalent to the memory so the substitution is valid even if there
>are intervening stores.  Also, don't move a volatile asm or
>UNSPEC_VOLATILE across any other insns.  */
>   || (! all_adjacent
> && (((!MEM_P (src)
>   || ! find_reg_note (insn, REG_EQUIV, src))
>  && modified_between_p (src, insn, i3))
> || (GET_CODE (src) == ASM_OPERANDS && MEM_VOLATILE_P (src))
> || GET_CODE (src) == UNSPEC_VOLATILE))

So this would work if you had pseudos here, instead of the hard reg?
Because it is a hard reg it is the same number in both places, making it
hard to move.

> > How are dependencies represented in your new pass?  If it just does
> > walks over the insn stream for everything, you get quadratic complexity
> > if you move insns backwards.  We have that in combine already, mostly
> > from modified_between_p, but that is limited because of how LOG_LINKS
> > work, and we have been doing this for so long and there are no problems
> > found with it, so it must work in practice.  But I am worried about it
> > when moving insns back an unlimited distance.
> 
> It builds def-use chains, but using a constant limit on the number of
> explicitly-recorded uses.  All other uses go in a numerical live range
> from which they (conservatively) never escape.  The def-use chains
> represent memory as a single entity, a bit like in gimple.

Ah.  So that range thing ensures correctness.

Why don't you use DF for the DU chains?
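
For reference, the bounded record being described might look roughly
like this (an assumed shape; the pass's actual types are not shown in
the thread):

/* Sketch only: a few explicitly recorded uses, with all further uses
   folded into a conservative numeric live range.  */
#define MAX_RECORDED_USES 4   /* illustrative bound */

struct def_record
{
  rtx_insn *def;                      /* the defining insn (rtx_insn as in rtl.h) */
  rtx_insn *use[MAX_RECORDED_USES];   /* explicitly tracked uses */
  unsigned int num_uses;              /* how many are recorded */
  int range_start, range_end;         /* luid range covering the rest */
};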

> >> - it tries using REG_EQUAL notes for the final instruction.
> >
> > And that.
> 
> I meant REG_EQUAL notes on i3, i.e. it tries replacing the src of i3
> with i3's REG_EQUAL note and combining into that.  Does combine do that?
> I couldn't see it, and in:
> 
>https://gcc.gnu.org/ml/gcc/2019-06/msg00148.html
> 
> you seemed to reject the idea of allowing it.

Yes, I still do.  Do you have an example where it helps?

> >> - it can parallelise two independent instructions that both read from
> >>   the same register or both read from memory.
> >
> > That only if somehow there is a link between the two (so essentially
> > never).  The only combinations tried by combine are those via LOG_LINKs,
> > which are between a SET and the first corresponding use.  This is a key
> > factor that makes it kind of linear (instead of exponential) complexity.
> 
> Tracking limited def-use chains is what makes this last bit easy.
> We can just try parallelising two instructions from the (bounded) list
> of uses.  And for this case there's not any garbage rtl involved, since
> we reuse the same PARALLEL rtx between attempts.  The cost is basically
> all in the recog call (which would obviously mount up if we went
> overboard).

*All* examples above and below are just this.

If you disable everything else, what do the statistics look like then?

> > One thing I want to do is some mini-combine after every split, probably
> > only with the insns new from the split.  But we have no cfglayout mode
> > anymore then, and only hard regs (except in the first split pass, which
> > is just a little later than your new pass).
> 
> Yeah, sounds like it could be useful.  I guess there'd need to be
> an extra condition on the combination that the new insn can't be
> immediately split.

It would run *after* split.  Not interleaved with it.

> > And amount of garbage produced?
> 
> If -ftime-report stats are accurate, then the total amount of
> memory allocated is:
> 
> run-combine=2 (normal combine): 1793 kB
> run-combine=4 (new pass only):98 kB
> run-combine=6 (both passes):1871 kB (new pass accounts for 78 kB)
> 
> But again that's not a fair comparison when the main combine pass does more.

The way combine does SUBST is pretty fu

Re: [C++ PATCH] Fix error-recovery with constexpr dtor (PR c++/92414)

2019-11-19 Thread Jason Merrill

On 11/18/19 11:22 PM, Jakub Jelinek wrote:

On Mon, Nov 18, 2019 at 02:38:51PM -0500, Jason Merrill wrote:

On 11/8/19 9:14 AM, Jakub Jelinek wrote:

Hi!

We ICE on the following testcase, because DECL_INITIAL (decl) is
error_mark_node due to previously reported error and
cxx_eval_outermost_constant_expr is unhappy if ctx.ctor is not
a CONSTRUCTOR, but error_mark_node.

If the initializer is invalid, it should have been diagnosed already and
there is no need to try to evaluate constexpr dtor on it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-11-08  Jakub Jelinek  

PR c++/92414
* constexpr.c (cxx_constant_dtor): Don't call
cxx_eval_outermost_constant_expr if DECL_INITIAL is erroneous.


Why add the error-checking here rather than in
cxx_eval_outermost_constant_expr when we use DECL_INITIAL?


So like this?
Also passed bootstrap/regtest on x86_64-linux and i686-linux:


OK, thanks.


2019-11-19  Jakub Jelinek  

PR c++/92414
* constexpr.c (cxx_eval_outermost_constant_expr): If DECL_INITIAL
on object is erroneous, return t without trying to evaluate
a constexpr dtor.

* g++.dg/cpp2a/constexpr-dtor4.C: New test.

--- gcc/cp/constexpr.c.jj   2019-11-18 21:50:49.539233013 +0100
+++ gcc/cp/constexpr.c  2019-11-18 21:52:47.217475549 +0100
@@ -5834,6 +5834,8 @@ cxx_eval_outermost_constant_expr (tree t
  gcc_assert (object && VAR_P (object));
  gcc_assert (DECL_DECLARED_CONSTEXPR_P (object));
  gcc_assert (DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (object));
+ if (error_operand_p (DECL_INITIAL (object)))
+   return t;
  ctx.ctor = unshare_expr (DECL_INITIAL (object));
  TREE_READONLY (ctx.ctor) = false;
  /* Temporarily force decl_really_constant_value to return false
--- gcc/testsuite/g++.dg/cpp2a/constexpr-dtor4.C.jj 2019-11-18 
21:50:56.152134256 +0100
+++ gcc/testsuite/g++.dg/cpp2a/constexpr-dtor4.C2019-11-18 
21:50:56.152134256 +0100
@@ -0,0 +1,15 @@
+// PR c++/92414
+// { dg-do compile { target c++2a } }
+
+struct A { virtual void foo (); };
+
+struct B : A {
+  constexpr B (int);   // { dg-warning "used but never defined" }
+  constexpr ~B () { }
+};
+
+struct D : B {
+  constexpr D () : B (42) { }  // { dg-error "used before its definition" }
+};
+
+constexpr D d; // { dg-message "in 'constexpr' expansion of" }


Jakub





Re: [PATCH 00/49] RFC: Add a static analysis framework to GCC

2019-11-19 Thread David Malcolm
On Sat, 2019-11-16 at 21:42 +0100, Thomas Schwinge wrote:
> Hi David!
> 
> On 2019-11-15T20:22:47-0500, David Malcolm 
> wrote:
> > This patch kit
> 
> (I have not looked at the patches.)  ;-)
> 
> > introduces a static analysis pass for GCC that can diagnose
> > various kinds of problems in C code at compile-time (e.g. double-
> > free,
> > use-after-free, etc).
> 
> Sounds great from the description!

Thanks.

> Would it make sense to add to the wiki page
>  a (high-level)
> comparison to other static analyzers (Coverity, cppcheck,
> clang-static-analyzer, others?), in terms of how they work, what
> their
> respective benefits are, what their design goals are, etc.  (Of
> course
> understanding that yours is much less mature at this point; talking
> about
> high-level design rather than current implementation status.)
> 
> For example, why do we want that in GCC instead of an external tool
> -- in
> part covered in your Rationale.  Can a compiler-side implementation
> benefit from having more information available than an external tool?
> GCC-side implementation is readily available (modulo GCC plugin
> installation?) vs. external ones need to be installed/set up first.
> GCC-side one only works with GCC-supported languages.  GCC-side one
> analyzes actual code being compiled -- thinking about preprocessor-
> level
> '#if' etc., which surely are problematic for external tools that are
> not
> actually replicating a real build.  And so on.  (If you don't want to
> spell out Coverity, cppcheck, clang-static-analyzer, etc., maybe just
> compare yours to external tools.)
> 
> Just an idea, because I wondered about these things.

Thanks; I've added some notes to the "Rationale" section of the wiki
page.

A lot of the information you're after is hidden in patch 2 of the kit,
in an analysis.texi (though admittedly that's hard to read in "patch
that adds a .texi file" form).

For now, I've uploaded a prebuilt version of the HTML to:

https://dmalcolm.fedorapeople.org/gcc/2019-11-19/gccint/Static-Analyzer.html


> > The analyzer runs as an IPA pass on the gimple SSA representation.
> > It associates state machines with data, with transitions at certain
> > statements and edges.  It finds "interesting" interprocedural paths
> > through the user's code, in which bogus state transitions happen.
> > 
> > For example, given:
> > 
> >free (ptr);
> >free (ptr);
> > 
> > at the first call, "ptr" transitions to the "freed" state, and
> > at the second call the analyzer complains, since "ptr" is already
> > in
> > the "freed" state (unless "ptr" is NULL, in which case it stays in
> > the NULL state for both calls).
> > 
> > Specific state machines include:
> > - a checker for malloc/free, for detecting double-free, resource
> > leaks,
> >   use-after-free, etc (sm-malloc.cc), and
> 
> I can immediately see how this can be useful for a bunch of
> 'malloc'/'free'-like etc. OpenACC Runtime Library calls as well as
> source
> code directives.  ..., and this would've flagged existing code in the
> libgomp OpenACC tests, which recently has given me some grief. Short
> summary/examples:
> 
> In addition to host-side 'malloc'/'free', there is device-side
> (separate
> memory space) 'acc_malloc'/'acc_free'. 

I've been thinking about generalizing the malloc/free checker to cover
resource acquisition/release pairs, adding a "domain" for the
allocation, where we'd complain if the resource release function isn't
of the same domain as the resource acquisition function.

Allocation domains might be:
  malloc/free
  C++ scalar new/delete
  C++ array new/delete
  FILE * (fopen/fclose)
  "foo_alloc"/"foo_release" for libfoo (i.e. user-extensible, via
attributes)

and thus catch things like deleting with scalar delete when the buffer
was allocated using new[], and various kinds of layering violations.
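
A minimal C instance of such a mismatch in the FILE * domain (my own
example, not from the patch kit):

#include <stdio.h>
#include <stdlib.h>

/* Acquired in the fopen/fclose domain, released in the malloc/free
   domain: the generalized checker sketched above would flag this.  */
void
layering_violation (const char *path)
{
  FILE *f = fopen (path, "r");
  if (f != NULL)
    free (f);   /* should have been fclose (f) */
}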

I'm afraid that I'm not very familiar with OpenACC.  Would
acc_malloc/acc_free fit into that pattern, or would more be needed? 
For example, can you e.g. dereference a device-side pointer in host
code, or would we ideally issue a diagnostic about that?

>  Static checking example: don't
> mix up host-side and device-side pointers.  (Both are normal C/C++
> pointers.  Hmm, maybe such checking could easily be implemented even
> outside of your checker by annotating the respective function
> declarations with an attribute describing which in/out arguments are
> host-side vs. device-side pointers.)
> 
> Then, there are functions to "map" host-side and device-side memory:
> 'acc_map_data'/'acc_unmap_data'.  Static checking example: you must
> not
> 'acc_free' memory spaces that are still mapped.

It sounds like this state machine is somewhat more complicated.

Is there a state transition diagram for this somewhere?  I don't have
that for my state machines, but there are at least lists of states; see
e.g. the various state_t within malloc_state_machine
near the top of:
https://gcc.gnu.or

Re: [PING] Re: [PATCH] Fix PR92088

2019-11-19 Thread Joseph Myers
On Tue, 19 Nov 2019, Richard Biener wrote:

> > +/* For nested functions disqualify ones taking VLAs by value
> > +   from inlining since the middle-end cannot deal with this.
> > +   ???  We should arrange for those to be passed by reference
> > +   with emitting the copy on the caller side in the frontend.  */
> > +if (storage_class == csc_none
> > +   && TREE_CODE (type) == FUNCTION_TYPE)
> > +  for (tree al = TYPE_ARG_TYPES (type); al; al = TREE_CHAIN (al))
> > +   {
> > + tree arg = TREE_VALUE (al);
> > + if (arg != error_mark_node
> > + && C_TYPE_VARIABLE_SIZE (TREE_VALUE (al)))

The second use of TREE_VALUE (al) looks like it would be better as a 
reference to arg.  OK with that change.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Multibyte awareness for diagnostics (PR 49973)

2019-11-19 Thread Joseph Myers
On Tue, 19 Nov 2019, David Malcolm wrote:

> If we're going with this approach (which I'll leave to Joseph), perhaps

I think reusing the glibc generator is appropriate.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: C++ PATCH for c++/91363 - P0960R3: Parenthesized initialization of aggregates

2019-11-19 Thread Jason Merrill

On 11/19/19 1:44 AM, Marek Polacek wrote:

It also looks like you're using the LOOKUP flag to mean two different
things:

1) try to treat parenthesized args as an aggregate initializer
(build_new_method_call_1)
2) treat this CONSTRUCTOR as coming from parenthesized args
(store_init_value/digest_init)


Correct.


Why is the flag needed for #1?  When do we not want to try to treat the args
as an aggregate initializer?


There are cases where treating the args as an aggregate initializer causes
spurious overload resolution successes, e.g.

  void swap(int&, int&);

  int& get();

  struct pair {
    void swap(pair&) noexcept(noexcept(swap(get(), get()))) { } // { dg-error "no matching function for call" }
  };

There are no viable candidates for pair::swap (# of args mismatch) but since
pair is an aggregate, build_new_method_call_1 would return a CONSTRUCTOR so
overload resolution would succeed.  Another case had to do with SFINAE and
decltype where we didn't evaluate the arg, but succeeding in the
no-viable-function case caused the compiler to choose the wrong function.


Hmm, but then the parenthesized list is arguments for swap, not an 
initializer for a single argument of swap.  That would be using it for 
copy-initialization, and we only want to treat parenthesized args as an 
aggregate initializer in direct-initialization.  Can we check for 
direct-init (i.e. !LOOKUP_ONLYCONVERTING) instead?



-  expr = get_target_expr_sfinae (digest_init (totype, expr, complain),
+  /* We want an implicit lookup, but when initializing an aggregate
+from a parenthesized list, we must remember not to warn about
+narrowing conversions.  */
+  flags = LOOKUP_IMPLICIT;
+  if (CONSTRUCTOR_IS_PAREN_INIT (expr))
+   flags |= LOOKUP_AGGREGATE_PAREN_INIT;
+  expr = get_target_expr_sfinae (digest_init_flags (totype, expr,
+   flags, complain),


I was thinking to set the flag inside digest_init, so callers don't need 
to know about it.



@@ -6704,6 +6761,14 @@ check_initializer (tree decl, tree init, int flags, 
vec **cleanups)
}
  init_code = NULL_TREE;
}
+ else if (BRACE_ENCLOSED_INITIALIZER_P (init_code))
+   {
+ /* In C++20, the call to build_aggr_init could have created
+a CONSTRUCTOR to handle A(1, 2).  */
+ gcc_assert (cxx_dialect >= cxx2a);
+ init = init_code;
+ flags |= LOOKUP_AGGREGATE_PAREN_INIT;
+   }


I think you want to clear init_code after copying it into init.


+ init = build_aggr_init (decl, init, flags, tf_warning_or_error);
+ /* In C++20, a member initializer list can be initializing an
+aggregate from a parenthesized list of values:
+
+  struct S {
+A aggr;
+S() : aggr(1, 2, 3) { }
+  };
+
+ Build up an INIT_EXPR like we do for aggr{1, 2, 3}, so that
+ build_data_member_initialization can grok it.  */
+ if (cxx_dialect >= cxx2a)
+   {
+ tree t = init;
+ while (TREE_CODE (t) == EXPR_STMT
+|| TREE_CODE (t) == CONVERT_EXPR)
+   t = TREE_OPERAND (t, 0);
+ if (BRACE_ENCLOSED_INITIALIZER_P (t))
+   {
+ t = digest_init_flags (type, t,
+(LOOKUP_IMPLICIT
+ | LOOKUP_AGGREGATE_PAREN_INIT),
+tf_warning_or_error);
+ init = build2 (INIT_EXPR, type, decl, t);
+   }
+   }


All this should happen within the call to build_aggr_init. 
expand_default_init already builds INIT_EXPR in various cases; it could 
do so for this case, too.



+  if (BRACE_ENCLOSED_INITIALIZER_P (exp))
+{
+  gcc_assert (cxx_dialect >= cxx2a);
+  return finish_compound_literal (type, exp, complain,
+ fcl_functional_paren);
+}


How about handling this in build_cplus_new instead of places that also 
call build_cplus_new?


It might also be better to call digest_init in build_new_method_call_1 
and not involve finish_compound_literal at all, since the latter calls 
reshape_init.


Jason



Re: [C++ PATCH] c++/92450 - ICE with invalid nested name specifier.

2019-11-19 Thread Jason Merrill

On 11/18/19 7:04 PM, Marek Polacek wrote:

The newly added diagnostic causes an ICE because the new grokdeclarator call
returned error_mark_node and DECL_SOURCE_LOCATION crashes on that.  So don't
attempt to print the new diagnostic if we've encountered something not
resembling a bit-field.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-11-18  Marek Polacek  

PR c++/92450 - ICE with invalid nested name specifier.
* parser.c (cp_parser_member_declaration): Bail out for invalid code.

* g++.dg/parse/crash71.C: New test.

diff --git gcc/cp/parser.c gcc/cp/parser.c
index c473e7fd92f..caed6946977 100644
--- gcc/cp/parser.c
+++ gcc/cp/parser.c
@@ -25052,6 +25052,9 @@ cp_parser_member_declaration (cp_parser* parser)
  tree d = grokdeclarator (declarator, &decl_specifiers,
   BITFIELD, /*initialized=*/false,
   &attributes);
+ /* Hopelessly invalid code, give up.  */
+ if (error_operand_p (d))
+   goto out;
  error_at (DECL_SOURCE_LOCATION (d),
"bit-field %qD has non-integral type %qT",
d, TREE_TYPE (d));


Don't we still want the rest of the error-recovery code there, i.e. 
skipping to the end of the statement?


Jason



Re: [C++ Patch] Avoid duplicate warning

2019-11-19 Thread Jason Merrill

On 11/18/19 3:45 PM, Paolo Carlini wrote:

functions like c_common_truthvalue_conversion are shared with the C 
front-end thus don't get a tsubst_flags_t argument. It seems clear that 
long term we have to do something about those but in the meanwhile we 
have been using warning sentinels which, along the paths which certainly 
must have all the warnings suppressed, work well for now and cannot 
cause regressions. Here I propose to add (at least) one more to 
ocp_convert since I have a straightforward testcase.


OK.


Not sure if we want to proactively add, say, one for warn_parentheses too.


Not without a testcase.

Jason



Re: [patch, fortran] Load scalar intent-in variables at the beginning of procedures

2019-11-19 Thread Thomas Koenig

Am 19.11.19 um 11:39 schrieb Bernhard Reutner-Fischer:

+ char name[GFC_MAX_SYMBOL_LEN + 1];
+ snprintf (name, GFC_MAX_SYMBOL_LEN, "__dummy_%d_%s", var_num++,
+   f->sym->name);
+
+ if (gfc_get_sym_tree (name, ns, &symtree, false) != 0)

(I) you should + sizeof(__dummy__) + 10 for unsigned long %d or the like.


GFC_MAX_SYMBOL_LEN is the maximum length of a gfortran symbol. AFAIK, it
is not possible to use a longer symbol name than that, so it needs to be
truncated. Uniqueness of the variable name is guaranteed by the var_num
variable.

If the user puts dummy arguments Supercalifragilisticexpialidociousa and
Supercalifragilisticexpialidociousb into the argument list of a
procedure, he will have to look at the numbers to differentiate them :-)
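
In C terms the scheme boils down to this sketch (GFC_MAX_SYMBOL_LEN
assumed at its usual value of 63):

#include <stdio.h>

#define GFC_MAX_SYMBOL_LEN 63   /* gfortran's symbol-length limit */

int
main (void)
{
  char name[GFC_MAX_SYMBOL_LEN + 1];
  const char *args[2] = { "Supercalifragilisticexpialidociousa",
                          "Supercalifragilisticexpialidociousb" };
  /* Long enough tails get truncated identically; the counter alone
     keeps the generated names distinct.  */
  for (int var_num = 0; var_num < 2; var_num++)
    {
      snprintf (name, GFC_MAX_SYMBOL_LEN, "__dummy_%d_%s",
                var_num, args[var_num]);
      puts (name);
    }
  return 0;
}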


(II) s/__dummy/__intent_in/ for clarity?


It's moved away a bit from INTENT(IN) now, because an argument which
cannot be modified (even by passing to a procedure with a dummy argument
with unknown intent) is now also handled.

Regards

Thomas


libgo patch committed: Better cgo handling for . in pkgpath

2019-11-19 Thread Ian Lance Taylor
This libgo patch by Than McIntosh updates cgo's
gccgoPkgpathToSymbolNew() to bring it into conformance with the way
that gccgo now handles package paths with embedded dots (see
https://golang.org/cl/200838).  See also https://gcc.gnu.org/PR61880,
a related bug.  This patch is a copy of https://golang.org/cl/207957
in the main Go repo.  This is for https://golang.org/issue/35623.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 278316)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-25d5e9dca49ad3f49393b254dd87f8df51487c65
+245904ac148f15db530fb8d50f526c47d08024c5
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/cmd/cgo/out.go
===
--- libgo/go/cmd/cgo/out.go (revision 277299)
+++ libgo/go/cmd/cgo/out.go (working copy)
@@ -1318,8 +1318,10 @@ func gccgoPkgpathToSymbolNew(ppath strin
for _, c := range []byte(ppath) {
switch {
case 'A' <= c && c <= 'Z', 'a' <= c && c <= 'z',
-   '0' <= c && c <= '9', c == '_', c == '.':
+   '0' <= c && c <= '9', c == '_':
bsl = append(bsl, c)
+   case c == '.':
+   bsl = append(bsl, ".x2e"...)
default:
changed = true
encbytes := []byte(fmt.Sprintf("..z%02x", c))


[PATCH v2] libgomp/test: Add flags to find libatomic in build-tree testing

2019-11-19 Thread Maciej W. Rozycki
Add flags to find libatomic in build-tree testing, fixing a catastrophic 
libgomp testsuite failure with targets such as `riscv-linux-gnu' that 
imply `-latomic' with the `-pthread' GCC option, implied in turn by the 
`-fopenacc' and `-fopenmp' options, removing failures like:

.../bin/riscv64-linux-gnu-ld: cannot find -latomic
collect2: error: ld returned 1 exit status
compiler exited with status 1
FAIL: libgomp.c/../libgomp.c-c++-common/atomic-18.c (test for excess errors)
Excess errors:
.../bin/riscv64-linux-gnu-ld: cannot find -latomic

UNRESOLVED: libgomp.c/../libgomp.c-c++-common/atomic-18.c compilation failed to 
produce executable

and bringing overall test results for the said target (here with the 
`x86_64-linux-gnu' host and RISC-V QEMU in the Linux user emulation mode 
as the target board) from:

=== libgomp Summary ===

# of expected passes90
# of unexpected failures3267
# of expected failures  2
# of unresolved testcases   3247
# of unsupported tests  548

to:

=== libgomp Summary ===

# of expected passes6834
# of unexpected failures4
# of expected failures  4
# of unsupported tests  518

libgomp/
* testsuite/lib/libgomp.exp (libgomp_init): Add flags to find 
libatomic in build-tree testing.
---
Hi Jakub,

 Thank you for your review.

On Mon, 18 Nov 2019, Jakub Jelinek wrote:

> > --- gcc.orig/libgomp/testsuite/lib/libgomp.exp
> > +++ gcc/libgomp/testsuite/lib/libgomp.exp
> > @@ -174,6 +174,16 @@ proc libgomp_init { args } {
> >  # For build-tree testing, also consider the library paths used for 
> > builing.
> >  # For installed testing, we assume all that to be provided in the 
> > sysroot.
> >  if { $blddir != "" } {
> > +   # Offload options imply `-pthread', and that implies `-latomic'
> > +   # on some targets, so wire in libatomic build directories.
> 
> -fopenmp is not an option I'd like to call Offload option, OpenMP is 
> much more than just offloading, and this isn't on some targets, but only 
> one, riscv*.
> So, I think it should be done only for that target and talk about 
> -fopenmp/-fopenacc options instead of Offload options.

 Thanks for straightening me out on OpenMP; I have adjusted the comment 
and the change description accordingly.

 I think it makes no sense to require people to update this test harness 
if a future target also requires access to libatomic; the `-L' option and 
the dynamic loader path are harmless if not required, as libatomic won't 
actually be pulled unless otherwise requested and there is no other 
library in libatomic's build directory that could be used unintentionally.  
Whereas in the absence of these settings a random version of libatomic 
already present somewhere on the system may accidentally be used and 
affect test results somehow.

 Why do you think it is important that it is only the relevant (i.e. 
`riscv*') targets that the directory providing newly-built libatomic is 
pointed at ?

  Maciej

Changes from v1:

- Replaced references to "offload options" with "`-fopenacc' and 
  `-fopenmp' options".
---
 libgomp/testsuite/lib/libgomp.exp |   11 +++
 1 file changed, 11 insertions(+)

gcc-test-libgomp-atomic-lib-path.diff
Index: gcc/libgomp/testsuite/lib/libgomp.exp
===
--- gcc.orig/libgomp/testsuite/lib/libgomp.exp
+++ gcc/libgomp/testsuite/lib/libgomp.exp
@@ -174,6 +174,17 @@ proc libgomp_init { args } {
 # For build-tree testing, also consider the library paths used for builing.
 # For installed testing, we assume all that to be provided in the sysroot.
 if { $blddir != "" } {
+   # The `-fopenacc' and `-fopenmp' options imply `-pthread', and
+   # that implies `-latomic' on some targets, so wire in libatomic
+   # build directories.
+   set shlib_ext [get_shlib_extension]
+   set atomic_library_path "${blddir}/../libatomic/.libs"
+   if { [file exists "${atomic_library_path}/libatomic.a"]
+|| [file exists \
+"${atomic_library_path}/libatomic.${shlib_ext}"] } {
+   lappend ALWAYS_CFLAGS "additional_flags=-L${atomic_library_path}"
+   append always_ld_library_path ":${atomic_library_path}"
+   }
global cuda_driver_include
global cuda_driver_lib
if { $cuda_driver_include != "" } {


Re: Reverting r278411

2019-11-19 Thread Segher Boessenkool
Hi Richard,

On Tue, Nov 19, 2019 at 07:35:10PM +, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> > r278411 causes bootstrap to fail on at least all powerpc*.  Also, I am
> > author of this simplify-rtx code, and I think what you did is all wrong.
> > Also, it should be two patches, the CSE part should be separate.  (I can
> > not tell if that is the part that regresses us, either).
> 
> I committed them as one patch because I'd tested them together,
> which I thought was the way this should be done.

Ideally you would test separate patches separately, too :-)

> In practice
> a clean revert would have to be done as a pair anyway: reverting
> one part individually would have required XFAILs on the tests.
> 
> And yeah, it was the cse.c part that was the problem.

Thanks for finding the problem so quickly, much appreciated.

> find_sets_in_insn has:
> 
>   /* Don't count call-insns, (set (reg 0) (call ...)), as a set.
>The hard function value register is used only once, to copy to
>someplace else, so it isn't worth cse'ing.  */
>   else if (GET_CODE (SET_SRC (x)) == CALL)
>   ;
> 
> So n_sets == 1 can be true if we have a single SET in parallel
> with a call.  Unsurprising, I guess no-one had MEM sets in
> parallel with a call, so it didn't matter until now.
> 
> I'm retesting with a !CALL_P check added to make sure that was
> the only problem.
> 
> Before I resubmit, why is the simplify-rtx.c part all wrong?

It was nice and simple, and it isn't anymore.  8 4 2 1 for the four of
lt gt eq un are hardly worth documenting or making symbolic constants
for, because a) they are familiar to all powerpc users anyway (the
assembler has those as predefined constants!), admittedly this isn't a
strong argument for most people; but also b) they were only used in two
short and obvious routines.  After your patch the routines are no
longer short or obvious.

A comparison result is exactly one of four things: lt, gt, eq, or un.
So we can express testing for any subset of those with just an OR of
the masks for the individual conditions.  Whether a comparison is
floating point, floating point no-nans, signed, or unsigned, is just
a property of the comparison, not of the result, in this representation.

If you do not mix signed and unsigned comparisons (they make not much
sense mixed anyway(*)), you need no changes at all: just translate
ltu/gtu/leu/geu to/from lt/gt/le/ge on the way in and out of this
function (there already are helper functions for that, signed_condition
and unsigned_condition).
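
Spelled out, the whole encoding is just this (a sketch of the scheme
described above, using the bit values PowerPC users know):

#define MASK_LT 8   /* result was "less than"    */
#define MASK_GT 4   /* result was "greater than" */
#define MASK_EQ 2   /* result was "equal"        */
#define MASK_UN 1   /* result was "unordered"    */

/* Any predicate is an OR of outcome bits: */
#define MASK_LE (MASK_LT | MASK_EQ)
#define MASK_GE (MASK_GT | MASK_EQ)
#define MASK_NE (MASK_LT | MASK_GT | MASK_UN)   /* anything but eq */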


Segher

(*) lt(a,b) & ltu(a,b) means the signs of a and b are equal, and a

Re: [PATCH v2] libgomp/test: Add flags to find libatomic in build-tree testing

2019-11-19 Thread Jakub Jelinek
On Tue, Nov 19, 2019 at 11:11:39PM +, Maciej W. Rozycki wrote:
>  Why do you think it is important that it is only the relevant (i.e. 
> `riscv*') targets that the directory providing newly-built libatomic is 
> pointed at ?

Such changes don't happen every day: no other port has added it the way
riscv did during the 2.5 years since riscv gained it, and no port needed
it ever before.  With unneeded -L options the log file is larger, and one
needs to cut'n'paste more when trying to reproduce stuff, etc.

Jakub



[C/C++ PATCH] Fix up build of GCC 4.6 and earlier with GCC 9+ (PR c/90677)

2019-11-19 Thread Jakub Jelinek
Hi!

The following patch fixes build of older GCC releases with newer ones.
In GCC 4.6 and earlier, we had:
struct cgraph_node;
struct cgraph_node *cgraph_node (tree);
and that is something on which GCC 9+ code errors on if it sees any
__gcc_diag__ and similar attributes, because cgraph_node it wants to find is
not a type.

As older GCC releases don't have the __gcc_diag__ etc. attributes guarded on
no newer GCC releases, only on minimum GCC version that does support it,
I think we need to make sure we don't reject what older GCCs used to have.
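
Concretely, something like the following must keep compiling (my sketch
of the old pattern; the actual test added is c-c++-common/pr90677.c):

union tree_node;
typedef union tree_node *tree;

/* GCC 4.6-era declarations: cgraph_node is a function, not a type.  */
struct cgraph_node;
struct cgraph_node *cgraph_node (tree);

/* The gcc_diag format attribute is what triggers the type lookup.  */
extern void diag (const char *, ...)
  __attribute__ ((__format__ (__gcc_diag__, 1, 2)));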

The following patch does that.  In addition, get_pointer_to_named_type
looked misnamed, because we actually aren't interested in getting gimple *
etc. type, but gimple.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
and eventually 9 too?

2019-11-19  Jakub Jelinek  

PR c/90677
* c-format.c (get_pointer_to_named_type): Renamed to ...
(get_named_type): ... this.  If result is FUNCTION_DECL for
cgraph_node, return cgraph_node from pointee of return type if
possible instead of emitting an error.
(init_dynamic_diag_info): Adjust get_pointer_to_named_type callers
to call get_named_type instead.  Formatting fixes.

* c-c++-common/pr90677.c: New test.

--- gcc/c-family/c-format.c.jj  2019-10-05 09:35:12.917997709 +0200
+++ gcc/c-family/c-format.c 2019-11-19 13:05:10.113308578 +0100
@@ -4899,21 +4899,39 @@ init_dynamic_gfc_info (void)
 }
 }
 
-/* Lookup the type named NAME and return a pointer-to-NAME type if found.
-   Otherwise, return void_type_node if NAME has not been used yet, or 
NULL_TREE if
-   NAME is not a type (issuing an error).  */
+/* Lookup the type named NAME and return a NAME type if found.
+   Otherwise, return void_type_node if NAME has not been used yet,
+   or NULL_TREE if NAME is not a type (issuing an error).  */
 
 static tree
-get_pointer_to_named_type (const char *name)
+get_named_type (const char *name)
 {
   tree result;
-  if ((result = maybe_get_identifier (name)))
+  if (tree id = maybe_get_identifier (name))
 {
-  result = identifier_global_value (result);
+  result = identifier_global_value (id);
   if (result)
{
  if (TREE_CODE (result) != TYPE_DECL)
{
+ if (TREE_CODE (result) == FUNCTION_DECL
+ && !strcmp (name, "cgraph_node")
+ && TREE_CODE (TREE_TYPE (result)) == FUNCTION_TYPE)
+   {
+ /* Before GCC 4.7, there used to be
+struct cgraph_node;
+struct cgraph_node *cgraph_node (tree);
+Don't error in this case, so that GCC 9+ can still
+compile GCC 4.6 and earlier.  */
+ tree res = TREE_TYPE (TREE_TYPE (result));
+ if (TREE_CODE (res) == POINTER_TYPE
+ && (TYPE_NAME (TREE_TYPE (res)) == id
+ || (TREE_CODE (TYPE_NAME (TREE_TYPE (res)))
+ == TYPE_DECL
+ && (DECL_NAME (TYPE_NAME (TREE_TYPE (res)))
+ == id
+   return TREE_TYPE (res);
+   }
  error ("%qs is not defined as a type", name);
  result = NULL_TREE;
}
@@ -4953,23 +4971,24 @@ init_dynamic_diag_info (void)
 an extra type level.  */
   if ((local_tree_type_node = maybe_get_identifier ("tree")))
{
- local_tree_type_node = identifier_global_value (local_tree_type_node);
+ local_tree_type_node
+   = identifier_global_value (local_tree_type_node);
  if (local_tree_type_node)
{
  if (TREE_CODE (local_tree_type_node) != TYPE_DECL)
{
  error ("% is not defined as a type");
- local_tree_type_node = 0;
+ local_tree_type_node = NULL_TREE;
}
  else if (TREE_CODE (TREE_TYPE (local_tree_type_node))
   != POINTER_TYPE)
{
  error ("% is not defined as a pointer type");
- local_tree_type_node = 0;
+ local_tree_type_node = NULL_TREE;
}
  else
-   local_tree_type_node =
- TREE_TYPE (TREE_TYPE (local_tree_type_node));
+   local_tree_type_node
+ = TREE_TYPE (TREE_TYPE (local_tree_type_node));
}
}
   else
@@ -4979,12 +4998,12 @@ init_dynamic_diag_info (void)
   /* Similar to the above but for gimple*.  */
   if (!local_gimple_ptr_node
   || local_gimple_ptr_node == void_type_node)
-local_gimple_ptr_node = get_pointer_to_named_type ("gimple");
+local_gimple_ptr_node = get_named_type ("gimple");
 
   /* Similar to the above but for cgraph_node*.  */
   if (!local_cgraph_node_ptr_node
   || local_cgraph_node_ptr_node == void_type

[C++ PATCH] Fix weird addr_expr not supported by dump_expr messages (PR c++/90767)

2019-11-19 Thread Jakub Jelinek
Hi!

The following patch is a minimal fix to avoid
cannot convert ‘‘addr_expr’ not supported by dump_type’ to ‘X*’
and similar messages.  The recently added complain_about_bad_argument
function expects a from_type argument, but conv->from isn't necessarily a
type, it can be an expression too.

With this patch one gets error like:
cannot convert ‘const X*’ to ‘X*’
and note like
initializing argument 'this' of ‘void X::foo()’
Still, perhaps what GCC 8 and earlier used to emit might be clearer:
pr90767-1.C: In member function ‘X::operator T() const’:
pr90767-1.C:12:7: error: no matching function for call to ‘X::foo() const’
pr90767-1.C:6:8: note: candidate: ‘void X::foo()’ 
pr90767-1.C:6:8: note:   passing ‘const X*’ as ‘this’ argument discards 
qualifiers
There is the print_conversion_rejection function that handles the various
cases, like this vs. other arguments, conv->from with expr type vs. type
etc.
Though, I must say I don't understand the reasons why 
complain_about_bad_argument
has been added and whether we'd want to emit there what
print_conversion_rejection prints as notes with 2 leading spaces instead as
errors with no leading spaces.

In any case, I think the patch below is a step in the right direction.

2019-11-19  Jakub Jelinek  

PR c++/90767
* call.c (complain_about_no_candidates_for_method_call): If
conv->from is not a type, pass to complain_about_bad_argument
lvalue_type of conv->from.

* g++.dg/diagnostic/pr90767-1.C: New test.
* g++.dg/diagnostic/pr90767-2.C: New test.

--- gcc/cp/call.c.jj2019-11-18 18:49:14.461880924 +0100
+++ gcc/cp/call.c   2019-11-19 14:40:19.121937148 +0100
@@ -9861,8 +9861,11 @@ complain_about_no_candidates_for_method_
  if (const conversion_info *conv
= maybe_get_bad_conversion_for_unmatched_call (candidate))
{
+ tree from_type = conv->from;
+ if (!TYPE_P (conv->from))
+   from_type = lvalue_type (conv->from);
  complain_about_bad_argument (conv->loc,
-  conv->from, conv->to_type,
+  from_type, conv->to_type,
   candidate->fn, conv->n_arg);
  return;
}
--- gcc/testsuite/g++.dg/diagnostic/pr90767-1.C.jj  2019-11-19 
14:48:00.386041586 +0100
+++ gcc/testsuite/g++.dg/diagnostic/pr90767-1.C 2019-11-19 14:46:53.395043036 
+0100
@@ -0,0 +1,15 @@
+// PR c++/90767
+// { dg-do compile }
+
+struct X {
+  int n;
+  void foo (); // { dg-message "initializing argument 'this'" }
+
+  template
+  operator T () const
+{
+  if (n == 0)
+   foo (); // { dg-error "cannot convert 'const X\\*' to 'X\\*'" }
+  return n;
+}
+};
--- gcc/testsuite/g++.dg/diagnostic/pr90767-2.C.jj  2019-11-19 
14:50:48.923522136 +0100
+++ gcc/testsuite/g++.dg/diagnostic/pr90767-2.C 2019-11-19 14:52:27.324051149 
+0100
@@ -0,0 +1,15 @@
+// PR c++/90767
+// { dg-do compile }
+
+struct A {
+  struct B { B (int) {} };
+
+  template 
+  void foo ()
+  {
+int x = 0;
+bar (x);   // { dg-error "cannot convert 'int' to 'A::B&'" }
+  }
+
+  void bar (B &arg) {} // { dg-message "initializing argument 1" }
+};

Jakub



[PATCH] Fix ICE during MEM_REF expansion (PR middle-end/90840)

2019-11-19 Thread Jakub Jelinek
Hi!

On the following testcase we ICE on i686-linux (32-bit), because we store
a structure (first 96-bit, then 72-bit) into the first part of a 2x 96-bit
complex long double; for 96-bit floats there is no corresponding
integral mode that covers them, and we ICE when op0 is not a MEM (it is a
REG).
The following patch handles the simple case where the whole dest REG is
covered and value is a MEM using a load from the memory, and for the rest
just spills on the stack, similarly how we punt when for stores to complex
REGs if the bitsize/bitnum cover portions of both halves.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-11-19  Jakub Jelinek  

PR middle-end/90840
* expmed.c (store_bit_field_1): Handle the case where op0 is not a MEM
and has a mode that doesn't have corresponding integral type.

* gcc.c-torture/compile/pr90840.c: New test.

--- gcc/expmed.c.jj 2019-11-15 00:37:32.0 +0100
+++ gcc/expmed.c2019-11-19 17:09:22.035129617 +0100
@@ -840,6 +840,27 @@ store_bit_field_1 (rtx str_rtx, poly_uin
   if (MEM_P (op0))
op0 = adjust_bitfield_address_size (op0, op0_mode.else_blk (),
0, MEM_SIZE (op0));
+  else if (!op0_mode.exists ())
+   {
+ if (ibitnum == 0
+ && known_eq (ibitsize, GET_MODE_BITSIZE (GET_MODE (op0)))
+ && MEM_P (value)
+ && !reverse)
+   {
+ value = adjust_address (value, GET_MODE (op0), 0);
+ emit_move_insn (op0, value);
+ return true;
+   }
+ if (!fallback_p)
+   return false;
+ rtx temp = assign_stack_temp (GET_MODE (op0),
+   GET_MODE_SIZE (GET_MODE (op0)));
+ emit_move_insn (temp, op0);
+ store_bit_field_1 (temp, bitsize, bitnum, 0, 0, fieldmode, value,
+reverse, fallback_p);
+ emit_move_insn (op0, temp);
+ return true;
+   }
   else
op0 = gen_lowpart (op0_mode.require (), op0);
 }
--- gcc/testsuite/gcc.c-torture/compile/pr90840.c.jj2019-11-19 
17:18:31.361918896 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr90840.c   2019-11-19 
17:11:06.010575339 +0100
@@ -0,0 +1,19 @@
+/* PR middle-end/90840 */
+struct S { long long a; int b; };
+struct S foo (void);
+struct __attribute__((packed)) T { long long a; char b; };
+struct T baz (void);
+
+void
+bar (void)
+{
+  _Complex long double c;
+  *(struct S *) &c = foo ();
+}
+
+void
+qux (void)
+{
+  _Complex long double c;
+  *(struct T *) &c = baz ();
+}

Jakub



[C++ PATCH] Fix up lambda decl specifier parsing ICE (PR c++/90842)

2019-11-19 Thread Jakub Jelinek
Hi!

In lambdas, the only valid decl specifiers are mutable, constexpr or
consteval.  For various other simple specifiers it is fine to parse them
and reject afterwards when parsing means simply consuming a single token
and setting some flags, but as the testcase shows, allowing type
specifiers in particular, including new type definitions in there, can
cause ICEs.  The following patch punts for the cases where the parsing
isn't that simple, which I think is concept, typedef (there we e.g.
commit tentative parsing) or the type specifiers.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-11-19  Jakub Jelinek  

PR c++/90842
* parser.c (cp_parser_decl_specifier_seq): For concept, typedef
or type specifier with CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR
don't try to parse them at all.

* g++.dg/cpp1y/lambda-generic-90842.C: New test.
* g++.dg/cpp0x/lambda/lambda-86550.C: Adjust expected diagnostics.

--- gcc/cp/parser.c.jj  2019-11-14 09:13:24.356104252 +0100
+++ gcc/cp/parser.c 2019-11-19 17:47:24.776014270 +0100
@@ -14094,6 +14094,12 @@ cp_parser_decl_specifier_seq (cp_parser*
  break;
 
 case RID_CONCEPT:
+  if (flags & CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR)
+{
+ found_decl_spec = false;
+ break;
+}
+
   ds = ds_concept;
   cp_lexer_consume_token (parser->lexer);
 
@@ -14136,6 +14142,12 @@ cp_parser_decl_specifier_seq (cp_parser*
  /* decl-specifier:
   typedef  */
case RID_TYPEDEF:
+  if (flags & CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR)
+{
+ found_decl_spec = false;
+ break;
+}
+
  ds = ds_typedef;
  /* Consume the token.  */
  cp_lexer_consume_token (parser->lexer);
@@ -14229,7 +14241,9 @@ cp_parser_decl_specifier_seq (cp_parser*
 
   /* If we don't have a DECL_SPEC yet, then we must be looking at
 a type-specifier.  */
-  if (!found_decl_spec && !constructor_p)
+  if (!found_decl_spec
+ && !constructor_p
+ && (flags & CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR) == 0)
{
  int decl_spec_declares_class_or_enum;
  bool is_cv_qualifier;
@@ -14288,9 +14302,6 @@ cp_parser_decl_specifier_seq (cp_parser*
  found_decl_spec = true;
  if (!is_cv_qualifier)
decl_specs->any_type_specifiers_p = true;
-
- if ((flags & CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR) != 0)
-   error_at (token->location, "type-specifier invalid in lambda");
}
}
 
--- gcc/testsuite/g++.dg/cpp1y/lambda-generic-90842.C.jj2019-11-19 
17:53:37.682440002 +0100
+++ gcc/testsuite/g++.dg/cpp1y/lambda-generic-90842.C   2019-11-19 
17:53:01.815976144 +0100
@@ -0,0 +1,4 @@
+// PR c++/90842
+// { dg-do compile { target c++14 } }
+
+auto a = [](auto x) struct C { void foo (); } {};  // { dg-error 
"expected" }
--- gcc/testsuite/g++.dg/cpp0x/lambda/lambda-86550.C.jj 2018-07-18 
23:01:22.824082949 +0200
+++ gcc/testsuite/g++.dg/cpp0x/lambda/lambda-86550.C2019-11-19 
17:54:26.978703112 +0100
@@ -4,6 +4,6 @@
 void
 foo ()
 {
-  auto a = []() bool {};   // { dg-error "type-specifier 
invalid in lambda" }
-  auto b = []() bool bool bool bool int {};// { dg-error "type-specifier 
invalid in lambda" }
+  auto a = []() bool {};   // { dg-error "expected" }
+  auto b = []() bool bool bool bool int {};// { dg-error "expected" }
 }

Jakub



[PATCH] Fix up arch= handling in x86 target attribute (PR target/90867)

2019-11-19 Thread Jakub Jelinek
Hi!

The arch= handling in the target attribute right now clears almost all
isa_flags and all isa_flags2, so that the later call to
ix86_option_override_internal can set just the ISA options for the
specific arch and nothing else.  Unfortunately, that doesn't work,
because next to ix86_isa_flags{,2} we also have the
ix86_isa_flags{,2}_explicit bitmasks, and ix86_option_override_internal,
say for arch=x86_64, will not try to set the sse2 ISA when we cleared it
in ix86_isa_flags but kept it set in ix86_isa_flags_explicit.
So the testcase works fine with -O2, but doesn't work with -O2 -msse2:
in the former case ix86_isa_flags_explicit doesn't have the MASK_SSE2
bit set, while in the latter it does, so with -O2 alone we end up with
MASK_SSE2 in ix86_isa_flags in the function with the target attribute,
but with -O2 -msse2 we don't.

The following patch thus clears both ix86_isa_flags{,2} and the
corresponding ix86_isa_flags{,2}_explicit bitmasks.  Also, so that e.g.
target ("arch=x86_64", "avx") works, the clearing is done when actually
seeing the arch=, not at the end.  target ("avx", "arch=x86_64") will
still not enable avx, like before, though.
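To illustrate the intended semantics (my sketch, not part of the patch;
f1/f2 are made-up names, attribute spelling as in the new testcase):

  __attribute__((target ("arch=x86-64", "avx"))) void f1 (void); /* avx on */
  __attribute__((target ("avx", "arch=x86-64"))) void f2 (void); /* avx off:
                                       the later arch= clears it again */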

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-11-19  Jakub Jelinek  

PR target/90867
* config/i386/i386-options.c (ix86_valid_target_attribute_tree): Don't
clear opts->x_ix86_isa_flags{,2} here...
(ix86_valid_target_attribute_inner_p): ... but here when seeing
arch=.  Also clear opts->x_ix86_isa_flags{,2}_explicit.

* gcc.target/i386/pr90867.c: New test.

--- gcc/config/i386/i386-options.c.jj   2019-11-18 12:07:54.672405129 +0100
+++ gcc/config/i386/i386-options.c  2019-11-19 18:43:36.426606458 +0100
@@ -1147,7 +1147,25 @@ ix86_valid_target_attribute_inner_p (tre
  ret = false;
}
  else
-   p_strings[opt] = xstrdup (p + opt_len);
+   {
+ p_strings[opt] = xstrdup (p + opt_len);
+ if (opt == IX86_FUNCTION_SPECIFIC_ARCH)
+   {
+ /* If arch= is set,  clear all bits in x_ix86_isa_flags,
+except for ISA_64BIT, ABI_64, ABI_X32, and CODE16
+and all bits in x_ix86_isa_flags2.  */
+ opts->x_ix86_isa_flags &= (OPTION_MASK_ISA_64BIT
+| OPTION_MASK_ABI_64
+| OPTION_MASK_ABI_X32
+| OPTION_MASK_CODE16);
+ opts->x_ix86_isa_flags_explicit &= (OPTION_MASK_ISA_64BIT
+ | OPTION_MASK_ABI_64
+ | OPTION_MASK_ABI_X32
+ | OPTION_MASK_CODE16);
+ opts->x_ix86_isa_flags2 = 0;
+ opts->x_ix86_isa_flags2_explicit = 0;
+   }
+   }
}
 
   else if (type == ix86_opt_enum)
@@ -1225,18 +1243,8 @@ ix86_valid_target_attribute_tree (tree f
   /* If we are using the default tune= or arch=, undo the string assigned,
 and use the default.  */
   if (option_strings[IX86_FUNCTION_SPECIFIC_ARCH])
-   {
- opts->x_ix86_arch_string
-   = ggc_strdup (option_strings[IX86_FUNCTION_SPECIFIC_ARCH]);
-
- /* If arch= is set,  clear all bits in x_ix86_isa_flags,
-except for ISA_64BIT, ABI_64, ABI_X32, and CODE16.  */
- opts->x_ix86_isa_flags &= (OPTION_MASK_ISA_64BIT
-| OPTION_MASK_ABI_64
-| OPTION_MASK_ABI_X32
-| OPTION_MASK_CODE16);
- opts->x_ix86_isa_flags2 = 0;
-   }
+   opts->x_ix86_arch_string
+ = ggc_strdup (option_strings[IX86_FUNCTION_SPECIFIC_ARCH]);
   else if (!orig_arch_specified)
opts->x_ix86_arch_string = NULL;
 
--- gcc/testsuite/gcc.target/i386/pr90867.c.jj  2019-11-19 18:49:11.746592189 +0100
+++ gcc/testsuite/gcc.target/i386/pr90867.c 2019-11-19 18:48:17.271406789 +0100
@@ -0,0 +1,30 @@
+/* PR target/90867 */
+/* { dg-do run { target lp64 } } */
+/* { dg-options "-O2 -msse2" } */
+
+unsigned long long freq = 3600000000UL;   /* 3.6 GHz = 3600.0 MHz */
+
+__attribute__((noipa)) void
+bar (double x)
+{
+  static double d = 3600000000.0;
+  if (x != d)
+__builtin_abort ();
+  d /= 1000.0;
+}
+
+__attribute__ ((target ("arch=x86-64"))) int
+foo ()
+{
+  bar ((double) freq);
+  bar (1e-3 * freq);
+  bar (1e-6 * freq);
+  bar (1e-9 * freq);
+  return 0;
+}
+
+int
+main ()
+{
+  return foo ();
+}

Jakub



Re: [PATCH 3/3] [ARC] Register ARC specific passes with a .def file.

2019-11-19 Thread Jeff Law
On 11/19/19 2:02 AM, Claudiu Zissulescu wrote:
> Use arc-passes.def to register ARC specific passes.
> 
> Ok to apply?
> Claudiu
> 
> gcc/
> -xx-xx  Claudiu Zissulescu  
> 
>   * config/arc/arc-protos.h (make_pass_arc_ifcvt): Declare.
>   (make_pass_arc_predicate_delay_insns): Likewise.
>   * config/arc/arc.c (class pass_arc_ifcvt): Reformat text, add gate
>   method.
>   (class pass_arc_predicate_delay_insns): Likewise.
>   (arc_init): Remove registering of ARC specific passes.
>   * config/arc/t-arc (PASSES_EXTRA): Add arc-passes.def.
>   * config/arc/arc-passes.def: New file.
OK
jeff


