Re: [PATCH] Fix PR79201 (half-way)

2017-05-12 Thread Richard Biener
On Thu, 11 May 2017, Uros Bizjak wrote:

> On Thu, May 11, 2017 at 4:02 PM, Richard Biener  wrote:
> 
> >> > Uros added the testcase in 2008 -- I think if we want to have a testcase
> >> > for the original issue we need a different one.  Or simply remove
> >> > the testcase.
> >>
> >> No, there is something going on in the testcase:
> >>
> >> .L3:
> >> movq(%ecx,%eax,8), %mm1
> >> paddq   (%ebx,%eax,8), %mm1
> >> addl$1, %eax
> >> movq%mm1, %mm0
> >> cmpl%eax, %edx
> >> jne .L3
> >>
> >>
> >> The compiler should allocate %mm0 to movq and paddq to avoid %mm1 ->
> >> %mm0 move. These are all movv1di patterns (they shouldn't interfere
> >> with movdi), and it is not clear to me why RA allocates %mm1 instead
> >> of %mm0.
> >
> > In any case the testcase is no longer testing what it tested as the
> > input to RA is now different.  The testcase doesn't make much sense:
> 
> Following is the cleaned testcase:
> 
> --cut here--
> /* { dg-do compile { target ia32 } } */
> /* { dg-options "-O2 -msse2 -mtune=core2" } */
> /* { dg-additional-options "-mno-vect8-ret-in-mem" { target *-*-vxworks* } } 
> */
> /* { dg-additional-options "-mabi=sysv" { target x86_64-*-mingw* } } */
> 
> #include 
> 
> typedef __SIZE_TYPE__ size_t;
> 
> __m64
> unsigned_add3 (const __m64 * a, const __m64 * b, size_t count)
> {
>   __m64 sum = { 0, 0 };
> 
>   if (count > 0)
> sum = _mm_add_si64 (a[count-1], b[count-1]);
> 
>   return sum;
> }
> 
> /* { dg-final { scan-assembler-times "movq\[ \\t\]+\[^\n\]*%mm" 1 } } */
> -- cut here--
> 
> The testcase still tests that only one movq is generated (gcc-4.1
> generated three). However, I have disabled the test on x86_64, since
> x86_64 returns mmx values in XMM registers, and MMX -> XMM moves
> always go through memory.

Thank you very much.

Richard.


Re: [RFA] Improve tree-ssa-uninit.c's predicate simplification

2017-05-12 Thread Richard Biener
On Thu, May 11, 2017 at 4:07 PM, Jeff Law  wrote:
> On 05/11/2017 01:50 AM, Richard Biener wrote:
>
>>
>> It actually seems to handle negation as well.  Which means it
>> handles disjunctive normal form.
>
> Negation is "handled" by allowing an individual predicate to be negated.
> However, the predicate must still be is_neq_zero_form_p to really
> participate in expansion, normalization & simplification (and negated
> predicates do not fit that form).  Furthermore, you can only negate an
> individual predicate, not a chain of predicates.  Thus you can express !A,
> but you can not express !(A & B) AFAICT.

Yes, because that's not normal form.  !(A && B) -> !A || !B, normalization can
cause quite some "duplication" as it simply spells out all possible combinations
where the result is true.

> Thus I was left with just looking for simplifications where I could
> eliminate a common term and the result would fit in the forms supported by
> tree-ssa-uninit.c
>
>
>> The patch should "simply" transform the input into disjunctive normal
>> form.
>> (X | Y) & (X | Z) happens to be conjunctive normal form (but I'm sure that
>> generally the input may be not in any of the two normal forms).
>
> THe disjunctive normal form can't actually be expressed with the
> infrastructure in tree-ssa-uninit.c.  Adding it seems like a huge amount of
> work with marginal benefit.  Maybe I'm missing something.

It seems to exactly support disjunctive normal form.

>>
>> Adding a single special-case doesn't look so useful to me.
>
> It's certainly not something that brings us a huge benefit -- adding
> disjunctive normal form would would be a much larger change conceptually as
> well as significantly complicate the existing code and it's not clear it
> would actually be a larger benefit than just supporting the elimination of a
> common term.

Adding "proper" normalization of all cases might be complicated, yes.

> Filtering of uninit warnings isn't all that interesting of a problem IMHO.
> When this code does something useful it really means that either an earlier
> pass (usually jump threading) missed an optimization or that the
> optimization was too expensive relative to the gain.  The regression of
> uninit-pred-8 falls into both categories -- we can't detect the jump thread
> and even if we did, the cost (in terms of blocks copied) would be huge
> relative to the elimination of a single runtime branch.
>
>
>>
>> Ugh, looking at the code it seems to be full of special cases (read:
>> it's quite ad-hoc) rather than building up a tree of |& conditions
>> and then normalizing it.
>
> Right.  It doesn't build a tree of operations, but it does build a chain of
> normalized predicates in conjunctive normal form.   In theory, any
> transformation that works on conjunctive normal form should be supportable
> in this code.

might have swapped conjunctive for disjunctive, whatever - it supports exactly
one of the normal forms so my question was why it doesn't "simply"
properly normalize all predicate trees rather than having ad-hoc handling of
some special cases.

>> Even if not pretty (vec > ...) the data structure in uninit
>> looks
>> sensible just it seems that while function sames suggest it should work
>> the
>> way I'd like it to it doesn't (for some reason).  Can't we fix that
>> instead please?
>
> That's probably a larger project than I can justify tackling at this time.

Pulling it out yes - but making the uninit normal form working properly?

Honestly I didn't try (otherwise you'd have seen a patch ;)) but before looking
yesterday I had the impression the code was much farther away from working
on one of the normal forms.

Richard.

> FWIW, I have asked folks in the past to look into pulling out the predicate
> building and analysis bits for re-use by other passes -- that code is
> probably the most interesting long term.  That would seem to be the natural
> time to rethink some of the implementation decisions, particularly since
> improvements we made to the predicate analysis would likely help multiple
> passes rather than just filtering uninit warnings.
>
> Jeff


Re: [PATCH] Don't assume __secure_getenv is available

2017-05-12 Thread Thomas Schwinge
Hi!

On Thu, 27 Apr 2017 21:50:51 +0300, Janne Blomqvist  
wrote:
> [...], retain the support for __secure_getenv but call it only via a
> weak reference.
> 
> Regtested on x86_64-pc-linux-gnu, Ok for trunk, 7.x when it reopens,
> 6, 5?

Hmm, how has this been tested?  Because:

> --- a/libgfortran/runtime/environ.c
> +++ b/libgfortran/runtime/environ.c

>  #ifdef FALLBACK_SECURE_GETENV
> +
> +#if SUPPORTS_WEAKREF && defined(HAVE___SECURE_GETENV)
> +static char* weak_secure_getenv (const char*)
> +  __attribute__((__weakref__("__secure_gettime")));
> +#endif

"gettime" vs. "getenv"?  ;-)


Grüße
 Thomas


Re: [PATCH] Externalize definition of create_tmp_reg_or_ssa_name

2017-05-12 Thread Richard Biener
On Thu, May 11, 2017 at 5:52 PM, Will Schmidt  wrote:
> Hi,
>
> I had initially posted this back in Dec, at which time it
> was given an OK.  Since a bunch of time has passed, I'm reposting
> to refresh memories, and ensure it's still OK.  :-)
>
> I will be needing access to the create_tmp_reg_or_ssa_name()
> function for some subsequent early expansion of vectors patches.
> Thus...
>
> Externalize the definition of create_tmp_reg_or_ssa_name
> for use in rs6000.c.  (Usage will show up in later patches).
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu, and
> powerpc-unknown-linux, (p7,p8le,p8be) with no regressions.
>
> OK for trunk?

Ok.

>
> 2017-05-10  Will Schmidt  
>
> [gcc]
> * gimple-fold.c (create_tmp_reg_or_ssa_name): Remove static 
> declaration.
> * gimple-fold.h (create_tmp_reg_or_ssa_name): New prototype.
>
> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> index 9fd45d1..2eaeecb 100644
> --- a/gcc/gimple-fold.c
> +++ b/gcc/gimple-fold.c
> @@ -161,8 +161,8 @@ can_refer_decl_in_current_unit_p (tree decl, tree 
> from_decl)
> is in SSA form, a SSA name is created.  Otherwise a temporary register
> is made.  */
>
> -static tree
> -create_tmp_reg_or_ssa_name (tree type, gimple *stmt = NULL)
> +tree
> +create_tmp_reg_or_ssa_name (tree type, gimple *stmt)
>  {
>if (gimple_in_ssa_p (cfun))
>  return make_ssa_name (type, stmt);
> diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
> index e4931a1..d34d880 100644
> --- a/gcc/gimple-fold.h
> +++ b/gcc/gimple-fold.h
> @@ -22,6 +22,7 @@ along with GCC; see the file COPYING3.  If not see
>  #ifndef GCC_GIMPLE_FOLD_H
>  #define GCC_GIMPLE_FOLD_H
>
> +extern tree create_tmp_reg_or_ssa_name (tree, gimple *stmt = NULL);
>  extern tree canonicalize_constructor_val (tree, tree);
>  extern tree get_symbol_constant_value (tree);
>  extern bool get_range_strlen (tree, tree[2]);
>
>


Re: [PATCH] decl lang hooks

2017-05-12 Thread Richard Biener
On Thu, May 11, 2017 at 7:58 PM, Nathan Sidwell  wrote:
> On 05/11/2017 01:50 PM, Rainer Orth wrote:
>
>> however, it breaks bootstrap with --enable-languages=obj-c++:
>
>
> wierd, I thought --enable-languges=all enabled that (and I have seen objc
> issues pop up during development).  Will take another look.

to enable all languages you need to do
--enable-languages=all,ada,obj-c++,go,brig

Richard.

> nathan
>
>
> --
> Nathan Sidwell


Re: [PATCH] Don't assume __secure_getenv is available

2017-05-12 Thread Janne Blomqvist
On Fri, May 12, 2017 at 10:23 AM, Thomas Schwinge
 wrote:
> Hi!
>
> On Thu, 27 Apr 2017 21:50:51 +0300, Janne Blomqvist 
>  wrote:
>> [...], retain the support for __secure_getenv but call it only via a
>> weak reference.
>>
>> Regtested on x86_64-pc-linux-gnu, Ok for trunk, 7.x when it reopens,
>> 6, 5?
>
> Hmm, how has this been tested?  Because:
>
>> --- a/libgfortran/runtime/environ.c
>> +++ b/libgfortran/runtime/environ.c
>
>>  #ifdef FALLBACK_SECURE_GETENV
>> +
>> +#if SUPPORTS_WEAKREF && defined(HAVE___SECURE_GETENV)
>> +static char* weak_secure_getenv (const char*)
>> +  __attribute__((__weakref__("__secure_gettime")));
>> +#endif
>
> "gettime" vs. "getenv"?  ;-)

Oops. I'm not at my gcc development box now, please consider a patch
to fix this pre-approved.

As for testing, I regtested, but my gcc development machine has glibc
2.23 which has secure_getenv so it doesn't exercise the fallback
path..



-- 
Janne Blomqvist


Re: [PATCH] Don't fold in save_expr (PR sanitizer/80536, PR sanitizer/80386)

2017-05-12 Thread Richard Biener
On Thu, May 11, 2017 at 3:49 PM, Marek Polacek  wrote:
> I had an interesting time coming to grips with these two PRs.  But it
> essentially comes down to the fold call in save_expr.  With that, we can call
> fold() on an expression whose operands weren't folded yet, but that is what
> code in fold_binary_loc assumes.  Since fold() is not recursive, we could end
> up there with expressions such as
> i * ((unsigned long) -0 + 1) * 2
> causing various sorts of problems: turning valid code into invalid, infinite
> recursion, etc.
>
> It's not clear why save_expr folds.  I did some archeology but all I could
> find was r67189 (that's 2003) which had:
>
> -  tree t = expr;
> +  tree t = fold (expr);
>tree inner;
>
> -  /* Don't fold a COMPONENT_EXPR: if the operand was a CONSTRUCTOR (the
> - only time it will fold), it can cause problems with PLACEHOLDER_EXPRs
> - in Ada.  Moreover, it isn't at all clear why we fold here at all.  */
> -  if (TREE_CODE (t) != COMPONENT_REF)
> -t = fold (t);
>
> so it wasn't clear even 14 years ago.  But now we know the call is harmful.
>
> Now, fold() doesn't recurse, but can it happen that it folds something anyway?
> And maybe turn the expression into an invariant so that we don't need to wrap
> it in a SAVE_EXPR?  Turns out it can, but in the C FE, which either uses
> c_save_expr that c_fully_folds, or calls save_expr when in_late_binary_op (so
> the operands have already been folded), it's very rare, and can only be 
> triggered
> by code such as
>
> void
> f (int i)
> {
>   int (*a)[i];
>   int x[sizeof (*a)];
> }
>
> so I'm not very worried about that.  For C++, Jakub suggested to add SAVE_EXPR
> handling to cp_fold.
>
> Future work: get rid of c_save_expr, only use save_expr, and teach 
> c_fully_fold
> how to fold SAVE_EXPR (once).  Although there's this c_wrap_maybe_const
> business...
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

Interesting - I tried this once (removing fold () from save_expr) but
it failed miserably
(don't remember the details).  Maybe the cp/ part fixes it up.

Did you include Ada in testing?

Otherwise the tree.c change is ok.

(yay, one less to go in the attempt to remove fold ())

Thanks!
Richard.

> 2017-05-11  Marek Polacek  
>
> PR sanitizer/80536
> PR sanitizer/80386
> * cp-gimplify.c (cp_fold): Handle SAVE_EXPR.
>
> * tree.c (save_expr): Don't fold the expression.
>
> * c-c++-common/ubsan/pr80536.c: New test.
> * g++.dg/ubsan/pr80386.C: New test.
>
> diff --git gcc/cp/cp-gimplify.c gcc/cp/cp-gimplify.c
> index de62414..38a8ed4 100644
> --- gcc/cp/cp-gimplify.c
> +++ gcc/cp/cp-gimplify.c
> @@ -2428,6 +2428,15 @@ cp_fold (tree x)
>x = fold (x);
>break;
>
> +case SAVE_EXPR:
> +  /* A SAVE_EXPR might contain e.g. (0 * i) + (0 * j), which, after
> +folding, evaluates to an invariant.  In that case no need to wrap
> +this folded tree with a SAVE_EXPR.  */
> +  r = cp_fold (TREE_OPERAND (x, 0));
> +  if (tree_invariant_p (r))
> +   x = r;
> +  break;
> +
>  default:
>return org_x;
>  }
> diff --git gcc/testsuite/c-c++-common/ubsan/pr80536.c 
> gcc/testsuite/c-c++-common/ubsan/pr80536.c
> index e69de29..23913ad 100644
> --- gcc/testsuite/c-c++-common/ubsan/pr80536.c
> +++ gcc/testsuite/c-c++-common/ubsan/pr80536.c
> @@ -0,0 +1,9 @@
> +/* PR sanitizer/80536 */
> +/* { dg-do compile } */
> +/* { dg-options "-fsanitize=undefined" } */
> +
> +int
> +foo (int i)
> +{
> +  return ((i * (unsigned long long) (-0 + 1UL)) * 2) % 1;
> +}
> diff --git gcc/testsuite/g++.dg/ubsan/pr80386.C 
> gcc/testsuite/g++.dg/ubsan/pr80386.C
> index e69de29..60122da 100644
> --- gcc/testsuite/g++.dg/ubsan/pr80386.C
> +++ gcc/testsuite/g++.dg/ubsan/pr80386.C
> @@ -0,0 +1,13 @@
> +// PR sanitizer/80386
> +// { dg-do run }
> +// { dg-options "-fsanitize=undefined -fno-sanitize-recover" }
> +
> +static unsigned long long int i = 13996271126042720493ULL;
> +
> +int
> +main ()
> +{
> +  int r = (((2921 + 0) - short(i)) + 0x7fff) >> 0;
> +  asm volatile ("" : "+g" (r));
> +  return 0;
> +}
> diff --git gcc/tree.c gcc/tree.c
> index b76b521..7506725 100644
> --- gcc/tree.c
> +++ gcc/tree.c
> @@ -3337,7 +3337,6 @@ tree_invariant_p (tree t)
>  tree
>  save_expr (tree expr)
>  {
> -  tree t = fold (expr);
>tree inner;
>
>/* If the tree evaluates to a constant, then we don't want to hide that
> @@ -3345,33 +3344,32 @@ save_expr (tree expr)
>   However, a read-only object that has side effects cannot be bypassed.
>   Since it is no problem to reevaluate literals, we just return the
>   literal node.  */
> -  inner = skip_simple_arithmetic (t);
> +  inner = skip_simple_arithmetic (expr);
>if (TREE_CODE (inner) == ERROR_MARK)
>  return inner;
>
>if (tree_invariant_p_1 (inner))
> -return t;
> +return expr;
>
>/* If INNER contains a PLACEHOLDER_EXPR, we must evaluate it each time, 
> since
>   i

Re: [PATCH GCC8][13/33]Rewrite cost computation of ivopts

2017-05-12 Thread Christophe Lyon
Hi Bin,


On 4 May 2017 at 17:25, Bin.Cheng  wrote:
> On Wed, Apr 26, 2017 at 11:18 AM, Richard Biener
>  wrote:
>> On Wed, Apr 26, 2017 at 12:12 PM, Bin.Cheng  wrote:
>>> On Wed, Apr 26, 2017 at 10:50 AM, Richard Biener
>>>  wrote:
 On Tue, Apr 18, 2017 at 12:43 PM, Bin Cheng  wrote:
> Hi,
> This is the major part of this patch series.  It rewrites cost 
> computation of ivopts using tree affine.
> Apart from description given by cover message:
>   A) New computation cost model.  Currently, there are big amount code 
> trying to understand
>  tree expression and estimate its computation cost.  The model is 
> designed long ago
>  for generic tree expressions.  In order to process generic 
> expression (even address
>  expression of array/memory references), it has code for too many 
> corner cases.  The
>  problem is it's somehow impossible to handle all complicated 
> expressions, even with
>  complicated logic in functions like get_computation_cost_at, 
> difference_cost,
>  ptr_difference_cost, get_address_cost and so on...  The second 
> problem is it's hard
>  to keep cost model consistent among special cases.  As special cases 
> being added
>  from time to time, the model is no long unified any more.  There are 
> cases that right
>  cost results in bad code, or vice versa, wrong cost results in good 
> code.  Finally,
>  it's difficult to add code for new cases.
>  This patch introduces a new cost computation model by using tree 
> affine.  Tree exprs
>  are lowered to aff_tree which is simple arithmetic operation 
> usually.  Code handling
>  special cases is no longer necessary, which brings us quite 
> simplicity.  It is also
>  easier to compute consistent costs among different expressions using 
> tree affine,
>  which gives us a unified cost model.
> This patch also fixes issue that cost computation for address type iv_use 
> is inconsistent
> with how it is re-rewritten in the end.  It greatly simplified cost 
> computation.
>
> Is it OK?

 The patch is quite hard to follow (diff messes up here -- this is a
 case where a context
>>> Hi Richard,
>>> Thanks for reviewing, attachment is the updated context diff.  It also
>>> includes a minor fix handling pre-increment addressing mode,
>>> specifically, it adds two lines of code:
>>>
>>>  if (stmt_after_increment (data->current_loop, cand, use->stmt))
>>> ainc_offset += ainc_step;
>>>
>>>
 diff is easier to read).  I trust you on the implementation details
 here, the overall structure
 looks ok to me.  The only question I have is with regarding to
 get_loop_invariant_expr
 which seems to be much less sophisticated than before (basically it's
 now what was
 record_inv_expr?).  I suppose the old functionality is superseeded by
 using affines
 everywhere else.
>>> Yes, previous version tries a lot to cancel common sub expression when
>>> representing use with cand.  New version relies on tree affine which
>>> is much better.  One problem with invariant expression estimation is
>>> we don't know in IVOPT if the inv_expr would be hoisted or not by
>>> later passes.  This problem exists all the time, we can only make
>>> assumptions here, I think new version is bit more aggressive in
>>> recording new inv_expr here.
>>
>> Thanks.
>>
>> LGTM.
> Trivial update replacing calls to aff_combination_simple_p because of
> change in previous patch.
>
> Thanks,
> bin

This commit (r247885) causes regressions on aarch64:
  - PASS now FAIL [PASS => FAIL]:

  Executed from: gcc.target/aarch64/aarch64.exp
gcc.target/aarch64/pr62178.c scan-assembler ld1r\\t{v[0-9]+.

Christophe

>>
>> Richard.
>>
>>> Thanks,
>>> bin

 Otherwise ok.

 Thanks,
 Richard.


> Thanks,
> bin
> 2017-04-11  Bin Cheng  
>
> * tree-ssa-loop-ivopts.c (get_loop_invariant_expr): Simplify.
> (adjust_setup_cost): New parameter supporting round up adjustment.
> (struct address_cost_data): Delete.
> (force_expr_to_var_cost): Don't bound cost with spill_cost.
> (split_address_cost, ptr_difference_cost): Delete.
> (difference_cost, compare_aff_trees, record_inv_expr): Delete.
> (struct ainc_cost_data): New struct.
> (get_address_cost_ainc): New function.
> (get_address_cost, get_computation_cost): Reimplement.
> (determine_group_iv_cost_address): Record inv_expr for all uses of
> a group.
> (determine_group_iv_cost_cond): Call get_loop_invariant_expr.
> (iv_ca_has_deps): Reimplemented to ...
> (iv_ca_more_deps): ... this.  Check if NEW_CP introduces more deps
> than OLD_CP.
>   

Re: [PATCH] Don't assume __secure_getenv is available

2017-05-12 Thread Thomas Schwinge
Hi!

On Fri, 12 May 2017 10:26:59 +0300, Janne Blomqvist  
wrote:
> On Fri, May 12, 2017 at 10:23 AM, Thomas Schwinge
>  wrote:
> > On Thu, 27 Apr 2017 21:50:51 +0300, Janne Blomqvist 
> >  wrote:
> >> [...], retain the support for __secure_getenv but call it only via a
> >> weak reference.
> >>
> >> Regtested on x86_64-pc-linux-gnu, Ok for trunk, 7.x when it reopens,
> >> 6, 5?
> >
> > Hmm, how has this been tested?  Because:
> >
> >> --- a/libgfortran/runtime/environ.c
> >> +++ b/libgfortran/runtime/environ.c
> >
> >>  #ifdef FALLBACK_SECURE_GETENV
> >> +
> >> +#if SUPPORTS_WEAKREF && defined(HAVE___SECURE_GETENV)
> >> +static char* weak_secure_getenv (const char*)
> >> +  __attribute__((__weakref__("__secure_gettime")));
> >> +#endif
> >
> > "gettime" vs. "getenv"?  ;-)
> 
> Oops. I'm not at my gcc development box now, please consider a patch
> to fix this pre-approved.
> 
> As for testing, I regtested, but my gcc development machine has glibc
> 2.23 which has secure_getenv so it doesn't exercise the fallback
> path..

Then, that clearly isn't an appropriate testing methodology?  What I do
in such cases is manually induce the expected environment (for example,
here I'd probably try hacking out "HAVE_SECURE_GETENV" and
"HAVE___SECURE_GETENV" from the generated "config.h" or even
"libgfortran/configure"), and then manually run something that is
expected to behave differently in an environment relevant to
"secure_getenv" -- I doubt that such a thing is being (or, can really be)
included in the GCC testsuite?

Still untested -- ;-) -- but including another typo fix, committed to
trunk in r247952:

commit bc9457364b4b9a847c91e35a0aa5fc3b73df53a0
Author: tschwinge 
Date:   Fri May 12 07:56:41 2017 +

Typo fixes for "Don't assume __secure_getenv is available"

libgfortran/
* runtime/environ.c (weak_secure_getenv): Fix "__secure_gettime"
vs. "__secure_getenv" typo.
(secure_getenv): Fix "HAVE__SECURE_GETENV"
vs. "HAVE___SECURE_GETENV" typo.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@247952 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgfortran/ChangeLog | 7 +++
 libgfortran/runtime/environ.c | 4 ++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git libgfortran/ChangeLog libgfortran/ChangeLog
index 337daaf..6b7da0a 100644
--- libgfortran/ChangeLog
+++ libgfortran/ChangeLog
@@ -1,3 +1,10 @@
+2017-05-12  Thomas Schwinge  
+
+   * runtime/environ.c (weak_secure_getenv): Fix "__secure_gettime"
+   vs. "__secure_getenv" typo.
+   (secure_getenv): Fix "HAVE__SECURE_GETENV"
+   vs. "HAVE___SECURE_GETENV" typo.
+
 2017-05-11  Janne Blomqvist  
 
* libgfortran.h: HAVE_SECURE_GETENV: Don't check
diff --git libgfortran/runtime/environ.c libgfortran/runtime/environ.c
index 969dcdf..f0a593e 100644
--- libgfortran/runtime/environ.c
+++ libgfortran/runtime/environ.c
@@ -40,13 +40,13 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
 
 #if SUPPORTS_WEAKREF && defined(HAVE___SECURE_GETENV)
 static char* weak_secure_getenv (const char*)
-  __attribute__((__weakref__("__secure_gettime")));
+  __attribute__((__weakref__("__secure_getenv")));
 #endif
 
 char *
 secure_getenv (const char *name)
 {
-#if SUPPORTS_WEAKREF && defined(HAVE__SECURE_GETENV)
+#if SUPPORTS_WEAKREF && defined(HAVE___SECURE_GETENV)
   if (weak_secure_getenv)
 return weak_secure_getenv (name);
 #endif


Grüße
 Thomas


Re: [PATCH] Don't assume __secure_getenv is available

2017-05-12 Thread Janne Blomqvist
On Fri, May 12, 2017 at 11:02 AM, Thomas Schwinge
 wrote:
> Hi!
>
> On Fri, 12 May 2017 10:26:59 +0300, Janne Blomqvist 
>  wrote:
>> On Fri, May 12, 2017 at 10:23 AM, Thomas Schwinge
>>  wrote:
>> > On Thu, 27 Apr 2017 21:50:51 +0300, Janne Blomqvist 
>> >  wrote:
>> >> [...], retain the support for __secure_getenv but call it only via a
>> >> weak reference.
>> >>
>> >> Regtested on x86_64-pc-linux-gnu, Ok for trunk, 7.x when it reopens,
>> >> 6, 5?
>> >
>> > Hmm, how has this been tested?  Because:
>> >
>> >> --- a/libgfortran/runtime/environ.c
>> >> +++ b/libgfortran/runtime/environ.c
>> >
>> >>  #ifdef FALLBACK_SECURE_GETENV
>> >> +
>> >> +#if SUPPORTS_WEAKREF && defined(HAVE___SECURE_GETENV)
>> >> +static char* weak_secure_getenv (const char*)
>> >> +  __attribute__((__weakref__("__secure_gettime")));
>> >> +#endif
>> >
>> > "gettime" vs. "getenv"?  ;-)
>>
>> Oops. I'm not at my gcc development box now, please consider a patch
>> to fix this pre-approved.
>>
>> As for testing, I regtested, but my gcc development machine has glibc
>> 2.23 which has secure_getenv so it doesn't exercise the fallback
>> path..
>
> Then, that clearly isn't an appropriate testing methodology?  What I do
> in such cases is manually induce the expected environment (for example,
> here I'd probably try hacking out "HAVE_SECURE_GETENV" and
> "HAVE___SECURE_GETENV" from the generated "config.h" or even
> "libgfortran/configure"), and then manually run something that is
> expected to behave differently in an environment relevant to
> "secure_getenv"

Yes, I've done something like that in the past to test different
branches of #ifdeffy code. Here it was simple enough that I thought
"what could possibly go wrong?"; Plenty, apparently! ;)

> Still untested -- ;-) -- but including another typo fix, committed to
> trunk in r247952:

Thanks. I had actually fixed the HAVE___SECURE_GETENV typo myself in
my git tree, but then I committed the wrong patch to svn. Aagh..

Moar coffee needed...



-- 
Janne Blomqvist


Re: [patch, libfortran] Fix amount of memory allocation for matrix - vector calculation

2017-05-12 Thread Janne Blomqvist
On Fri, May 12, 2017 at 1:14 AM, Thomas Koenig  wrote:
> Hello world,
>
> the memory allocation for the buffer in the library matmul
> routines still has one problem: The value of 0xdeadbeef meant
> as poison could end up in the calculation of the size of the
> buffer for the blocked matmul.
>
> The attached patch fixes that. Verified with regression-test,
> also by running a few select test cases under valgrind.
>
> No test case because nothing appeared to fail.
>
> OK for trunk?

Patch missing?




-- 
Janne Blomqvist


Re: Test cases to check OpenACC offloaded function's attributes and classification

2017-05-12 Thread Thomas Schwinge
Hi!

On Wed, 10 May 2017 17:49:48 +0200, Jakub Jelinek  wrote:
> On Mon, May 08, 2017 at 07:02:15PM +0200, Thomas Schwinge wrote:
> > On Thu, 4 Aug 2016 16:06:10 +0200, I wrote:
> > > On Wed, 27 Jul 2016 10:59:02 +0200, I wrote:
> > > > OK for trunk?
> > 
> > (In the mean time, I also added some more testing.)

> > Test cases to check OpenACC offloaded function's attributes and 
> > classification

> Dunno if it isn't too fragile, but if you are willing to maintain it, ok.

Sure.  (Why would I propose it otherwise?)

Committed to trunk in r247953:

commit 692b887e5afc026d8217f0654896f6777edbf7a7
Author: tschwinge 
Date:   Fri May 12 08:42:31 2017 +

Test cases to check OpenACC offloaded function's attributes and 
classification

gcc/testsuite/
* c-c++-common/goacc/classify-kernels-unparallelized.c: New file.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/classify-parallel.c: Likewise.
* c-c++-common/goacc/classify-routine.c: Likewise.
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Likewise.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
* gfortran.dg/goacc/classify-parallel.f95: Likewise.
* gfortran.dg/goacc/classify-routine.f95: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@247953 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/testsuite/ChangeLog| 11 ++
 .../goacc/classify-kernels-unparallelized.c| 39 
 .../c-c++-common/goacc/classify-kernels.c  | 35 ++
 .../c-c++-common/goacc/classify-parallel.c | 28 +++
 .../c-c++-common/goacc/classify-routine.c  | 30 
 .../goacc/classify-kernels-unparallelized.f95  | 41 ++
 .../gfortran.dg/goacc/classify-kernels.f95 | 37 +++
 .../gfortran.dg/goacc/classify-parallel.f95| 30 
 .../gfortran.dg/goacc/classify-routine.f95 | 29 +++
 9 files changed, 280 insertions(+)

diff --git gcc/testsuite/ChangeLog gcc/testsuite/ChangeLog
index 3553c99..5ed40a5 100644
--- gcc/testsuite/ChangeLog
+++ gcc/testsuite/ChangeLog
@@ -1,3 +1,14 @@
+2017-05-12  Thomas Schwinge  
+
+   * c-c++-common/goacc/classify-kernels-unparallelized.c: New file.
+   * c-c++-common/goacc/classify-kernels.c: Likewise.
+   * c-c++-common/goacc/classify-parallel.c: Likewise.
+   * c-c++-common/goacc/classify-routine.c: Likewise.
+   * gfortran.dg/goacc/classify-kernels-unparallelized.f95: Likewise.
+   * gfortran.dg/goacc/classify-kernels.f95: Likewise.
+   * gfortran.dg/goacc/classify-parallel.f95: Likewise.
+   * gfortran.dg/goacc/classify-routine.f95: Likewise.
+
 2017-05-11  Nathan Sidwell  
 
* lib/gcc-dg.exp (schedule-cleanups): Add lang dump capability.
diff --git gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c 
gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
new file mode 100644
index 000..a76351c
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
@@ -0,0 +1,39 @@
+/* Check offloaded function's attributes and classification for unparallelized
+   OpenACC kernels.  */
+
+/* { dg-additional-options "-O2" }
+   { dg-additional-options "-fdump-tree-ompexp" }
+   { dg-additional-options "-fdump-tree-parloops1-all" }
+   { dg-additional-options "-fdump-tree-oaccdevlow" } */
+
+#define N 1024
+
+extern unsigned int *__restrict a;
+extern unsigned int *__restrict b;
+extern unsigned int *__restrict c;
+
+/* An "extern"al mapping of loop iterations/array indices makes the loop
+   unparallelizable.  */
+extern unsigned int f (unsigned int);
+
+void KERNELS ()
+{
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  for (unsigned int i = 0; i < N; i++)
+c[i] = a[f (i)] + b[f (i)];
+}
+
+/* Check the offloaded function's attributes.
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(omp target 
entrypoint\\)\\)" 1 "ompexp" } } */
+
+/* Check that exactly one OpenACC kernels construct is analyzed, and that it
+   can't be parallelized.
+   { dg-final { scan-tree-dump-times "FAILED:" 1 "parloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function 
\\(, , \\), omp target entrypoint\\)\\)" 1 "parloops1" } }
+   { dg-final { scan-tree-dump-not "SUCCESS: may be parallelized" "parloops1" 
} } */
+
+/* Check the offloaded function's classification and compute dimensions (will
+   always be 1 x 1 x 1 for non-offloading compilation).
+   { dg-final { scan-tree-dump-times "(?n)Function is kernels offload" 1 
"oaccdevlow" } }
+   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 
1 "oaccdevlow" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function 
\\(1, 1, 1\\), omp target entrypoint\\)\\)" 1 "oaccdevlow" } } *

Re: [PATCH] Make PRE/FRE elimination not do useless work

2017-05-12 Thread Richard Biener
On Thu, 11 May 2017, Christophe Lyon wrote:

> Hi Richard,
> 
> 
> On 10 May 2017 at 16:20, Richard Biener  wrote:
> >
> > So this is a patch that makes skipping unreachable code when
> > doing elimination possible.  Previously interesting interactions
> > with tail-merging made this impossible, now I seem to have
> > figured a way around this.
> >
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >
> > With this more elaborate stmt folding during elimination might
> > be in reach.
> >
> > Richard.
> >
> > 2017-05-10  Richard Biener  
> >
> > * tree-ssa-pre.c (eliminate_dom_walker::before_dom_children):
> > Skip unreachable blocks and destinations.
> > (eliminate): Move stmt removal and fixup ...
> > (fini_eliminate): ... here.  Skip inserted exprs.
> > (pass_pre::execute): Move fini_pre after fini_eliminate.
> > * tree-ssa-tailmerge.c: Include tree-cfgcleanup.h.
> > (tail_merge_optimize): Run cleanup_tree_cfg if requested by
> > PRE to get rid of dead code that has invalid SSA form and
> > split critical edges again.
> >
> > Index: gcc/tree-ssa-pre.c
> > ===
> > --- gcc/tree-ssa-pre.c  (revision 247831)
> > +++ gcc/tree-ssa-pre.c  (working copy)
> > @@ -4196,9 +4196,14 @@ eliminate_dom_walker::before_dom_childre
> >/* Mark new bb.  */
> >el_avail_stack.safe_push (NULL_TREE);
> >
> > -  /* ???  If we do nothing for unreachable blocks then this will confuse
> > - tailmerging.  Eventually we can reduce its reliance on SCCVN now
> > - that we fully copy/constant-propagate (most) things.  */
> > +  /* Skip unreachable blocks marked unreachable during the SCCVN domwalk.  
> > */
> > +  edge_iterator ei;
> > +  edge e;
> > +  FOR_EACH_EDGE (e, ei, b->preds)
> > +if (e->flags & EDGE_EXECUTABLE)
> > +  break;
> > +  if (! e)
> > +return NULL;
> >
> >for (gphi_iterator gsi = gsi_start_phis (b); !gsi_end_p (gsi);)
> >  {
> > @@ -4695,10 +4700,8 @@ eliminate_dom_walker::before_dom_childre
> >  }
> >
> >/* Replace destination PHI arguments.  */
> > -  edge_iterator ei;
> > -  edge e;
> >FOR_EACH_EDGE (e, ei, b->succs)
> > -{
> > +if (e->flags & EDGE_EXECUTABLE)
> >for (gphi_iterator gsi = gsi_start_phis (e->dest);
> >!gsi_end_p (gsi);
> >gsi_next (&gsi))
> > @@ -4717,7 +4720,6 @@ eliminate_dom_walker::before_dom_childre
> > gimple_set_plf (SSA_NAME_DEF_STMT (sprime), NECESSARY, 
> > true);
> > }
> > }
> > -}
> >return NULL;
> >  }
> >
> > @@ -4743,9 +4745,6 @@ eliminate_dom_walker::after_dom_children
> >  static unsigned int
> >  eliminate (bool do_pre)
> >  {
> > -  gimple_stmt_iterator gsi;
> > -  gimple *stmt;
> > -
> >need_eh_cleanup = BITMAP_ALLOC (NULL);
> >need_ab_cleanup = BITMAP_ALLOC (NULL);
> >
> > @@ -4761,6 +4760,18 @@ eliminate (bool do_pre)
> >el_avail.release ();
> >el_avail_stack.release ();
> >
> > +  return el_todo;
> > +}
> > +
> > +/* Perform CFG cleanups made necessary by elimination.  */
> > +
> > +static unsigned
> > +fini_eliminate (void)
> > +{
> > +  gimple_stmt_iterator gsi;
> > +  gimple *stmt;
> > +  unsigned todo = 0;
> > +
> >/* We cannot remove stmts during BB walk, especially not release SSA
> >   names there as this confuses the VN machinery.  The stmts ending
> >   up in el_to_remove are either stores or simple copies.
> > @@ -4782,8 +4793,9 @@ eliminate (bool do_pre)
> > lhs = gimple_get_lhs (stmt);
> >
> >if (inserted_exprs
> > - && TREE_CODE (lhs) == SSA_NAME)
> > -   bitmap_clear_bit (inserted_exprs, SSA_NAME_VERSION (lhs));
> > + && TREE_CODE (lhs) == SSA_NAME
> > + && bitmap_bit_p (inserted_exprs, SSA_NAME_VERSION (lhs)))
> > +   continue;
> >
> >gsi = gsi_for_stmt (stmt);
> >if (gimple_code (stmt) == GIMPLE_PHI)
> > @@ -4800,7 +4812,7 @@ eliminate (bool do_pre)
> > }
> >
> >/* Removing a stmt may expose a forwarder block.  */
> > -  el_todo |= TODO_cleanup_cfg;
> > +  todo |= TODO_cleanup_cfg;
> >  }
> >el_to_remove.release ();
> >
> > @@ -4819,18 +4831,10 @@ eliminate (bool do_pre)
> > }
> >
> >if (fixup_noreturn_call (stmt))
> > -   el_todo |= TODO_cleanup_cfg;
> > +   todo |= TODO_cleanup_cfg;
> >  }
> >el_to_fixup.release ();
> >
> > -  return el_todo;
> > -}
> > -
> > -/* Perform CFG cleanups made necessary by elimination.  */
> > -
> > -static unsigned
> > -fini_eliminate (void)
> > -{
> >bool do_eh_cleanup = !bitmap_empty_p (need_eh_cleanup);
> >bool do_ab_cleanup = !bitmap_empty_p (need_ab_cleanup);
> >
> > @@ -4844,8 +4848,8 @@ fini_eliminate (void)
> >BITMAP_FREE (need_ab_cleanup);
> >
> >if (do_eh_cleanup || do_ab_cleanup)
> > -return TODO_cleanup_cfg;
> > -  return 0;
> > +todo |= TODO_cleanup_cfg;
> > +  

[PATCH] Fix PR80713

2017-05-12 Thread Richard Biener

The following fixes up my earlier change in PRE for fallout with the
intricate interaction of the two "DCE" algorithms in PRE...

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2017-05-12  Richard Biener  

PR tree-optimization/80713
* tree-ssa-pre.c (remove_dead_inserted_code): Clear
inserted_exprs bit for not removed stmts.

* gcc.dg/torture/pr80713.c: New testcase.

Index: gcc/tree-ssa-pre.c
===
--- gcc/tree-ssa-pre.c  (revision 247952)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -4947,8 +4947,14 @@ remove_dead_inserted_code (void)
}
 }
 
+  unsigned int to_clear = -1U;
   EXECUTE_IF_SET_IN_BITMAP (inserted_exprs, 0, i, bi)
 {
+  if (to_clear != -1U)
+   {
+ bitmap_clear_bit (inserted_exprs, to_clear);
+ to_clear = -1U;
+   }
   t = SSA_NAME_DEF_STMT (ssa_name (i));
   if (!gimple_plf (t, NECESSARY))
{
@@ -4969,7 +4975,14 @@ remove_dead_inserted_code (void)
  release_defs (t);
}
}
+  else
+   /* eliminate_fini will skip stmts marked for removal if we
+  already removed it and uses inserted_exprs for this, so
+  clear those we didn't end up removing.  */
+   to_clear = i;
 }
+  if (to_clear != -1U)
+bitmap_clear_bit (inserted_exprs, to_clear);
   BITMAP_FREE (worklist);
 }
 
Index: gcc/testsuite/gcc.dg/torture/pr80713.c
===
--- gcc/testsuite/gcc.dg/torture/pr80713.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr80713.c  (working copy)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+
+int a, b, d, e, f;
+int *c;
+void g()
+{
+  for (;;)
+{
+  if (*c) {
+ int h;
+ *c = &h;
+  } else
+   b = 0;
+  if (f)
+   *c = g;
+  else
+   for (; a; a++)
+ for (;;) {
+ if (d)
+   break;
+ c = e;
+ }
+}
+}


Re: Use "oacc kernels" attribute for OpenACC kernels

2017-05-12 Thread Thomas Schwinge
Hi!

On Wed, 10 May 2017 18:28:38 +0200, Jakub Jelinek  wrote:
> On Mon, May 08, 2017 at 09:29:28PM +0200, Thomas Schwinge wrote:
> > Use "oacc kernels" attribute for OpenACC kernels

> > * omp-expand.c (expand_omp_target)
> > : Set "oacc kernels" attribute.
> 
> I think
>   * omp-expand.c (expand_omp_target) :
>   Set "oacc kernels" attribute.
> fits better.

So you overrule how Emacs Change Log mode's fill-paragraph things this
should be formatted.  ;-)

> > @@ -7451,7 +7457,7 @@ expand_omp_target (struct omp_region *region)
> >break;
> >  case BUILT_IN_GOACC_PARALLEL:
> >{
> > -   oacc_set_fn_attrib (child_fn, clauses, oacc_kernels_p, &args);
> > +   oacc_set_fn_attrib (child_fn, clauses, &args);
> > tagging = true;
> >}
> >/* FALLTHRU */
> 
> The {}s aren't needed around this, could you drop them?

Done, but aren't such cleanups usually requested to be done separately of
actual code changes?

> > +   pos = tree_cons (purpose[ix],
> > +build_int_cst (integer_type_node, dims[ix]),
> > +pos);
> 
> pos); would fit on the earlier line.

As already split over more than one line, I thought it was clearer if the
"chain" parameter was on its own line -- but you get to overrule me
there, too.  ;-)

> Ok with those changes.

Thanks; committed to trunk in r247955:

commit 1d3ea8fcacec29c9a89d9d0a505ed5fbdd5ad73e
Author: tschwinge 
Date:   Fri May 12 09:02:55 2017 +

Use "oacc kernels" attribute for OpenACC kernels

gcc/
* omp-expand.c (expand_omp_target) 
:
Set "oacc kernels" attribute.
* omp-general.c (oacc_set_fn_attrib): Remove is_kernel formal
parameter.  Adjust all users.
(oacc_fn_attrib_kernels_p): Remove function.
* omp-offload.c (execute_oacc_device_lower): Look for "oacc
kernels" attribute instead of calling oacc_fn_attrib_kernels_p.
* tree-ssa-loop.c (gate_oacc_kernels): Likewise.
* tree-parloops.c (create_parallel_loop): If oacc_kernels_p,
assert "oacc kernels" attribute is set.
gcc/testsuite/
* c-c++-common/goacc/classify-kernels-unparallelized.c: Adjust.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/classify-parallel.c: Likewise.
* c-c++-common/goacc/classify-routine.c: Likewise.
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Likewise.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
* gfortran.dg/goacc/classify-parallel.f95: Likewise.
* gfortran.dg/goacc/classify-routine.f95: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@247955 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog  | 13 
 gcc/omp-expand.c   | 20 ++
 gcc/omp-general.c  | 18 ++--
 gcc/omp-general.h  |  4 +---
 gcc/omp-offload.c  | 24 +++---
 gcc/testsuite/ChangeLog|  9 
 .../goacc/classify-kernels-unparallelized.c|  8 
 .../c-c++-common/goacc/classify-kernels.c  |  8 
 .../c-c++-common/goacc/classify-parallel.c |  2 +-
 .../c-c++-common/goacc/classify-routine.c  |  2 +-
 .../goacc/classify-kernels-unparallelized.f95  |  8 
 .../gfortran.dg/goacc/classify-kernels.f95 |  8 
 .../gfortran.dg/goacc/classify-parallel.f95|  2 +-
 .../gfortran.dg/goacc/classify-routine.f95 |  2 +-
 gcc/tree-parloops.c|  5 -
 gcc/tree-ssa-loop.c|  5 +
 16 files changed, 74 insertions(+), 64 deletions(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index e1f8cf5..aeb22df 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,16 @@
+2017-05-12  Thomas Schwinge  
+
+   * omp-expand.c (expand_omp_target) :
+   Set "oacc kernels" attribute.
+   * omp-general.c (oacc_set_fn_attrib): Remove is_kernel formal
+   parameter.  Adjust all users.
+   (oacc_fn_attrib_kernels_p): Remove function.
+   * omp-offload.c (execute_oacc_device_lower): Look for "oacc
+   kernels" attribute instead of calling oacc_fn_attrib_kernels_p.
+   * tree-ssa-loop.c (gate_oacc_kernels): Likewise.
+   * tree-parloops.c (create_parallel_loop): If oacc_kernels_p,
+   assert "oacc kernels" attribute is set.
+
 2017-05-11  Carl Love  
 
* config/rs6000/rs6000-c: Add support for built-in functions
diff --git gcc/omp-expand.c gcc/omp-expand.c
index 5c48b78..7a7c747 100644
--- gcc/omp-expand.c
+++ gcc/omp-expand.c
@@ -7083,7 +7083,16 @@ expand_omp_target (struct omp_region *region)
   exit_bb = region->exit;
 
   if (gimple_o

Re: Use "oacc kernels parallelized" attribute for parallelized OpenACC kernels

2017-05-12 Thread Thomas Schwinge
Hi!

On Wed, 10 May 2017 18:30:54 +0200, Jakub Jelinek  wrote:
> On Tue, May 09, 2017 at 10:57:34PM +0200, Thomas Schwinge wrote:
> > Use "oacc kernels parallelized" attribute for parallelized OpenACC 
> > kernels

> Ok.

Thanks.  Committed to trunk in r247957:

commit 5dd0c4e81e7a79afccfc936407affbdda2e3b737
Author: tschwinge 
Date:   Fri May 12 09:18:34 2017 +

[PR middle-end/69921] Use "oacc kernels parallelized" attribute for 
parallelized OpenACC kernels

gcc/
PR middle-end/69921
* tree-parloops.c (create_parallel_loop): Set "oacc kernels
parallelized" attribute for parallelized OpenACC kernels.
* omp-offload.c (execute_oacc_device_lower): Use it.
gcc/testsuite/
* c-c++-common/goacc/classify-kernels-unparallelized.c: Adjust.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/kernels-counter-vars-function-scope.c:
Likewise.
* c-c++-common/goacc/kernels-double-reduction-n.c: Likewise.
* c-c++-common/goacc/kernels-double-reduction.c: Likewise.
* c-c++-common/goacc/kernels-loop-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-3.c: Likewise.
* c-c++-common/goacc/kernels-loop-g.c: Likewise.
* c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise.
* c-c++-common/goacc/kernels-loop-n.c: Likewise.
* c-c++-common/goacc/kernels-loop-nest.c: Likewise.
* c-c++-common/goacc/kernels-loop.c: Likewise.
* c-c++-common/goacc/kernels-one-counter-var.c: Likewise.
* c-c++-common/goacc/kernels-reduction.c: Likewise.
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Likewise.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-update.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-n.f95: Likewise.
* gfortran.dg/goacc/kernels-loop.f95: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@247957 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog  |  5 +
 gcc/omp-offload.c  | 24 
 gcc/testsuite/ChangeLog| 26 ++
 .../goacc/classify-kernels-unparallelized.c|  2 +-
 .../c-c++-common/goacc/classify-kernels.c  |  6 ++---
 .../goacc/kernels-counter-vars-function-scope.c|  3 +--
 .../goacc/kernels-double-reduction-n.c |  3 +--
 .../c-c++-common/goacc/kernels-double-reduction.c  |  3 +--
 gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c  |  3 +--
 gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c  |  3 +--
 gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c  |  3 +--
 .../c-c++-common/goacc/kernels-loop-mod-not-zero.c |  3 +--
 gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c  |  3 +--
 .../c-c++-common/goacc/kernels-loop-nest.c |  3 +--
 gcc/testsuite/c-c++-common/goacc/kernels-loop.c|  3 +--
 .../c-c++-common/goacc/kernels-one-counter-var.c   |  3 +--
 .../c-c++-common/goacc/kernels-reduction.c |  3 +--
 .../goacc/classify-kernels-unparallelized.f95  |  2 +-
 .../gfortran.dg/goacc/classify-kernels.f95 |  6 ++---
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95 |  3 +--
 .../gfortran.dg/goacc/kernels-loop-data-2.f95  |  3 +--
 .../goacc/kernels-loop-data-enter-exit-2.f95   |  3 +--
 .../goacc/kernels-loop-data-enter-exit.f95 |  3 +--
 .../gfortran.dg/goacc/kernels-loop-data-update.f95 |  3 +--
 .../gfortran.dg/goacc/kernels-loop-data.f95|  3 +--
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95 |  5 ++---
 gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95   |  3 +--
 gcc/tree-parloops.c| 16 +++--
 28 files changed, 89 insertions(+), 60 deletions(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index aeb22df..580a3db 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,5 +1,10 @@
 2017-05-12  Thomas Schwinge  
 
+   PR middle-end/69921
+   * tree-parloops.c (create_parallel_loop): Set "oacc kernels
+   parallelized" attribute for parallelized OpenACC kernels.
+   * omp-offload.c (execute_oacc_device_lower): Use it.
+
* omp-expand.c (expand_omp_target) :
Set "oacc kernels" attribute.
* omp-general.c (oacc_set_fn_attrib): Remove is_kernel formal
diff --git gcc/omp-offload.c gcc/omp-offload.c
index d24f131..9372f9e 100644
--- gcc/omp-offload.c
+++ gcc/omp-offload.c
@@ -1444,6 +1444,13 @@ exe

Re: [PATCH] make RTL/TREE/IPA dump kind an index

2017-05-12 Thread Rainer Orth
Hi Nathan,

> On 05/11/2017 06:57 PM, Rainer Orth wrote:
> er, the introduction of scanlang.exp broke libatomic, libgomp, and
>> libitm testing.  Their logfiles show
>>
>> trunk/12-gcc/build/sparc-sun-solaris2.12/libatomic/testsuite/libatomic.log:ERROR:
>> Couldn't find library file scanlang.exp.
>> trunk/12-gcc/build/sparc-sun-solaris2.12/libgomp/testsuite/libgomp.log:ERROR:
>> Couldn't find library file scanlang.exp.
>> trunk/12-gcc/build/sparc-sun-solaris2.12/libitm/testsuite/libitm.log:ERROR:
>> Couldn't find library file scanlang.exp.
>>
>> and the testsuites aren't run.  The following patch fixes this (manually
>> tested on libatomic and libitm so far).
>
> Thanks Rainer!

I've now installed the patch with the following ChangeLog entry after
libgomp testing finished:

2017-05-12  Rainer Orth  

libitm:
* testsuite/lib/libitm.exp: Load scanlang.exp.

libgomp:
* testsuite/lib/libgomp.exp: Load scanlang.exp.

libatomic:
* testsuite/lib/libatomic.exp: Load scanlang.exp.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Heads-Up: early LTO debug to land, breaking Mach-O / [X]COFF

2017-05-12 Thread Richard Biener

This is a heads-up that I am in the process of implementing the last
of Jasons review comments on the dwarf2out parts of early LTO debug
support.  I hope to post final patches early next week after thoroughly
re-testing everything.

Note that Mach-O and [X]COFF support in the simple-object machinery
is still missing for the early LTO debug feature so I am going to
break LTOing with DWARF debuginfo on Darwin and Windows (CCing
maintainers).  Mach-O support has been worked on a bit by Iain
and myself but the simple-object piece is still missing.
A workaround is to use stabs on these targets with LTO.

DWARF part: https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01023.html
simple-object part: 
https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01733.html

both still apply with some fuzz.

Richard.


Re: [PATCH 4/5 v2] Vect peeling cost model

2017-05-12 Thread Richard Biener
On Thu, May 11, 2017 at 5:21 PM, Robin Dapp  wrote:
> Included the workaround for SLP now. With it, testsuite is clean on x86
> as well.

All patches in the series are ok now.

Thanks,
Richard.

> gcc/ChangeLog:
>
> 2017-05-11  Robin Dapp  
>
> * tree-vect-data-refs.c (vect_get_data_access_cost):
> Workaround for SLP handling.
> (vect_enhance_data_refs_alignment):
> Remove check for supportable_dr_alignment, compute costs for
> doing no peeling at all, compare to the best peeling costs so
> far and do no peeling if cheaper.


Re: OpenACC C front end maintenance: c_parser_oacc_single_int_clause

2017-05-12 Thread Thomas Schwinge
Hi!

On Wed, 10 May 2017 18:32:28 +0200, Jakub Jelinek  wrote:
> On Tue, May 09, 2017 at 11:27:14PM +0200, Thomas Schwinge wrote:
> > OpenACC C front end maintenance: c_parser_oacc_single_int_clause

> Ok.

Thanks.  Committed to trunk in r247960:

commit 641fc3aef896abe9687038abfb5fe186c4f350c8
Author: tschwinge 
Date:   Fri May 12 09:33:18 2017 +

OpenACC C front end maintenance: c_parser_oacc_single_int_clause

gcc/c/
* c-parser.c (c_parser_omp_clause_num_gangs)
(c_parser_omp_clause_num_workers)
(c_parser_omp_clause_vector_length): Merge functions into...
(c_parser_oacc_single_int_clause): ... this new function.  Adjust
all users.
gcc/testsuite/
* c-c++-common/goacc/parallel-dims-1.c: New file.
* c-c++-common/goacc/parallel-dims-2.c: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@247960 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/c/ChangeLog|   8 +
 gcc/c/c-parser.c   | 176 ++---
 gcc/testsuite/ChangeLog|   3 +
 gcc/testsuite/c-c++-common/goacc/parallel-dims-1.c |   8 +
 gcc/testsuite/c-c++-common/goacc/parallel-dims-2.c | 134 
 5 files changed, 202 insertions(+), 127 deletions(-)

diff --git gcc/c/ChangeLog gcc/c/ChangeLog
index 497c9b9..b12f41d 100644
--- gcc/c/ChangeLog
+++ gcc/c/ChangeLog
@@ -1,3 +1,11 @@
+2017-05-12  Thomas Schwinge  
+
+   * c-parser.c (c_parser_omp_clause_num_gangs)
+   (c_parser_omp_clause_num_workers)
+   (c_parser_omp_clause_vector_length): Merge functions into...
+   (c_parser_oacc_single_int_clause): ... this new function.  Adjust
+   all users.
+
 2017-05-11  Nathan Sidwell  
 
* gimple-parser.c: Don't #include tree-dump.h.
diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index 9398652..90d2d17 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -11339,51 +11339,6 @@ c_parser_omp_clause_nowait (c_parser *parser 
ATTRIBUTE_UNUSED, tree list)
   return c;
 }
 
-/* OpenACC:
-   num_gangs ( expression ) */
-
-static tree
-c_parser_omp_clause_num_gangs (c_parser *parser, tree list)
-{
-  location_t num_gangs_loc = c_parser_peek_token (parser)->location;
-  if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
-{
-  location_t expr_loc = c_parser_peek_token (parser)->location;
-  c_expr expr = c_parser_expression (parser);
-  expr = convert_lvalue_to_rvalue (expr_loc, expr, false, true);
-  tree c, t = expr.value;
-  t = c_fully_fold (t, false, NULL);
-
-  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, "expected %<)%>");
-
-  if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
-   {
- c_parser_error (parser, "expected integer expression");
- return list;
-   }
-
-  /* Attempt to statically determine when the number isn't positive.  */
-  c = fold_build2_loc (expr_loc, LE_EXPR, boolean_type_node, t,
-  build_int_cst (TREE_TYPE (t), 0));
-  protected_set_expr_location (c, expr_loc);
-  if (c == boolean_true_node)
-   {
- warning_at (expr_loc, 0,
- "% value must be positive");
- t = integer_one_node;
-   }
-
-  check_no_duplicate_clause (list, OMP_CLAUSE_NUM_GANGS, "num_gangs");
-
-  c = build_omp_clause (num_gangs_loc, OMP_CLAUSE_NUM_GANGS);
-  OMP_CLAUSE_NUM_GANGS_EXPR (c) = t;
-  OMP_CLAUSE_CHAIN (c) = list;
-  list = c;
-}
-
-  return list;
-}
-
 /* OpenMP 2.5:
num_threads ( expression ) */
 
@@ -11671,48 +11626,54 @@ c_parser_omp_clause_is_device_ptr (c_parser *parser, 
tree list)
 }
 
 /* OpenACC:
-   num_workers ( expression ) */
+   num_gangs ( expression )
+   num_workers ( expression )
+   vector_length ( expression )  */
 
 static tree
-c_parser_omp_clause_num_workers (c_parser *parser, tree list)
+c_parser_oacc_single_int_clause (c_parser *parser, omp_clause_code code,
+tree list)
 {
-  location_t num_workers_loc = c_parser_peek_token (parser)->location;
-  if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
+  location_t loc = c_parser_peek_token (parser)->location;
+
+  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
+return list;
+
+  location_t expr_loc = c_parser_peek_token (parser)->location;
+  c_expr expr = c_parser_expression (parser);
+  expr = convert_lvalue_to_rvalue (expr_loc, expr, false, true);
+  tree c, t = expr.value;
+  t = c_fully_fold (t, false, NULL);
+
+  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, "expected %<)%>");
+
+  if (t == error_mark_node)
+return list;
+  else if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
 {
-  location_t expr_loc = c_parser_peek_token (parser)->location;
-  c_expr expr = c_parser_expression (parser);
-  expr = convert_lvalue_to_rvalue (expr_loc, expr, false, true);
-  tree c, t

[PATCH] Limit perf data buffer during profiling

2017-05-12 Thread Andi Kleen
From: Andi Kleen 

With high -j parallelism the autofdo tests can randomly fail.
autofdo uses Linux perf to record profiling data.
Linux perf uses a locked perf buffer. By default it has
around 516k buffer per uid (/proc/sys/kernel/perf_event_mlock_kb).

An individual perf record tries to grab the full 516k,
which makes parallel perf record fail.

This patch limits the perf buffer for individual perf record to 8k.
With the default settings this allows a parallelism of the test
cases of 16, which is hopefully good enough

(if not would need to add some kind of semaphore, or ask
the user to increase the limit as root)

I also removed an unneeded -o perf.data option

Thanks to Marcin to finally spotting the problem.

Passes bootstrap and test on x86_64-linux. Ok for trunk?

gcc/testsuite/:

2017-05-12  Andi Kleen  

PR testsuite/77684
* lib/target-supports.exp (profopt-perf-wrapper): Remove
-p perf.data option. Add -m8 option.
---
 gcc/testsuite/lib/target-supports.exp | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 83e7f2670e6..a22657767db 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -522,9 +522,16 @@ proc check_effective_target_keeps_null_pointer_checks { } {
 
 # Return the autofdo profile wrapper
 
+# Linux by default allows 516KB of perf event buffers
+# in /proc/sys/kernel/perfe_event_mlock_kb
+# Each individual perf tries to grab it
+# This causes problems with parallel test suite runs. Instead
+# limit us to 8 pages (32K), which should be good enough
+# for the small test programs. With the default settings 
+# this allows parallelism of 16 and higher of parallel gcc-auto-profile
 proc profopt-perf-wrapper { } {
 global srcdir
-return "$srcdir/../config/i386/gcc-auto-profile -o perf.data "
+return "$srcdir/../config/i386/gcc-auto-profile -m8 "
 }
 
 # Return true if profiling is supported on the target.
-- 
2.12.2



Re: [RFC PATCH, i386]: Enable post-reload compare elimination pass

2017-05-12 Thread Hans-Peter Nilsson
(To-list pruned, my correction doesn't need attention.)

On Thu, 11 May 2017, Hans-Peter Nilsson wrote:
> On Wed, 10 May 2017, Jakub Jelinek wrote:
>
> > On Wed, May 10, 2017 at 09:57:56PM +0200, Uros Bizjak wrote:
> > > BTW: This patch now catches 417 cases (instead of 200+) in linux
> > > build, including e.g.:
> > >
> > > (parallel [
> > > (set (reg:CCZ 17 flags)
> > > (compare:CCZ (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93])
> > > (const_int 1 [0x1]))
> > > (const_int 0 [0])))
> > > (set (reg:DI 4 si)
> > > (zero_extend:DI (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93])
> > > (const_int 1 [0x1]
> > > ])

> Anyway, people seem to drift towards the ccreg-last variant

JFTR, I miswrote that; I meant "towards the variant with
ccreg-first" as in Uros' example kept above and as opposed to my
example.

brgds, H-P


Re: [PATCH] Limit perf data buffer during profiling

2017-05-12 Thread Richard Biener
On Fri, May 12, 2017 at 11:38 AM, Andi Kleen  wrote:
> From: Andi Kleen 
>
> With high -j parallelism the autofdo tests can randomly fail.
> autofdo uses Linux perf to record profiling data.
> Linux perf uses a locked perf buffer. By default it has
> around 516k buffer per uid (/proc/sys/kernel/perf_event_mlock_kb).
>
> An individual perf record tries to grab the full 516k,
> which makes parallel perf record fail.
>
> This patch limits the perf buffer for individual perf record to 8k.
> With the default settings this allows a parallelism of the test
> cases of 16, which is hopefully good enough
>
> (if not would need to add some kind of semaphore, or ask
> the user to increase the limit as root)
>
> I also removed an unneeded -o perf.data option
>
> Thanks to Marcin to finally spotting the problem.
>
> Passes bootstrap and test on x86_64-linux. Ok for trunk?

Ok.  Can you retain -o perf.data (even if that's the default)?

Thanks,
Richard.

> gcc/testsuite/:
>
> 2017-05-12  Andi Kleen  
>
> PR testsuite/77684
> * lib/target-supports.exp (profopt-perf-wrapper): Remove
> -p perf.data option. Add -m8 option.
> ---
>  gcc/testsuite/lib/target-supports.exp | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index 83e7f2670e6..a22657767db 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -522,9 +522,16 @@ proc check_effective_target_keeps_null_pointer_checks { 
> } {
>
>  # Return the autofdo profile wrapper
>
> +# Linux by default allows 516KB of perf event buffers
> +# in /proc/sys/kernel/perfe_event_mlock_kb
> +# Each individual perf tries to grab it
> +# This causes problems with parallel test suite runs. Instead
> +# limit us to 8 pages (32K), which should be good enough
> +# for the small test programs. With the default settings
> +# this allows parallelism of 16 and higher of parallel gcc-auto-profile
>  proc profopt-perf-wrapper { } {
>  global srcdir
> -return "$srcdir/../config/i386/gcc-auto-profile -o perf.data "
> +return "$srcdir/../config/i386/gcc-auto-profile -m8 "
>  }
>
>  # Return true if profiling is supported on the target.
> --
> 2.12.2
>


Re: [PATCH GCC8][13/33]Rewrite cost computation of ivopts

2017-05-12 Thread Bin.Cheng
On Fri, May 12, 2017 at 8:39 AM, Christophe Lyon
 wrote:
> Hi Bin,
>
>
> On 4 May 2017 at 17:25, Bin.Cheng  wrote:
>> On Wed, Apr 26, 2017 at 11:18 AM, Richard Biener
>>  wrote:
>>> On Wed, Apr 26, 2017 at 12:12 PM, Bin.Cheng  wrote:
 On Wed, Apr 26, 2017 at 10:50 AM, Richard Biener
  wrote:
> On Tue, Apr 18, 2017 at 12:43 PM, Bin Cheng  wrote:
>> Hi,
>> This is the major part of this patch series.  It rewrites cost 
>> computation of ivopts using tree affine.
>> Apart from description given by cover message:
>>   A) New computation cost model.  Currently, there are big amount code 
>> trying to understand
>>  tree expression and estimate its computation cost.  The model is 
>> designed long ago
>>  for generic tree expressions.  In order to process generic 
>> expression (even address
>>  expression of array/memory references), it has code for too many 
>> corner cases.  The
>>  problem is it's somehow impossible to handle all complicated 
>> expressions, even with
>>  complicated logic in functions like get_computation_cost_at, 
>> difference_cost,
>>  ptr_difference_cost, get_address_cost and so on...  The second 
>> problem is it's hard
>>  to keep cost model consistent among special cases.  As special 
>> cases being added
>>  from time to time, the model is no long unified any more.  There 
>> are cases that right
>>  cost results in bad code, or vice versa, wrong cost results in good 
>> code.  Finally,
>>  it's difficult to add code for new cases.
>>  This patch introduces a new cost computation model by using tree 
>> affine.  Tree exprs
>>  are lowered to aff_tree which is simple arithmetic operation 
>> usually.  Code handling
>>  special cases is no longer necessary, which brings us quite 
>> simplicity.  It is also
>>  easier to compute consistent costs among different expressions 
>> using tree affine,
>>  which gives us a unified cost model.
>> This patch also fixes issue that cost computation for address type 
>> iv_use is inconsistent
>> with how it is re-rewritten in the end.  It greatly simplified cost 
>> computation.
>>
>> Is it OK?
>
> The patch is quite hard to follow (diff messes up here -- this is a
> case where a context
 Hi Richard,
 Thanks for reviewing, attachment is the updated context diff.  It also
 includes a minor fix handling pre-increment addressing mode,
 specifically, it adds two lines of code:

  if (stmt_after_increment (data->current_loop, cand, use->stmt))
 ainc_offset += ainc_step;


> diff is easier to read).  I trust you on the implementation details
> here, the overall structure
> looks ok to me.  The only question I have is with regarding to
> get_loop_invariant_expr
> which seems to be much less sophisticated than before (basically it's
> now what was
> record_inv_expr?).  I suppose the old functionality is superseeded by
> using affines
> everywhere else.
 Yes, previous version tries a lot to cancel common sub expression when
 representing use with cand.  New version relies on tree affine which
 is much better.  One problem with invariant expression estimation is
 we don't know in IVOPT if the inv_expr would be hoisted or not by
 later passes.  This problem exists all the time, we can only make
 assumptions here, I think new version is bit more aggressive in
 recording new inv_expr here.
>>>
>>> Thanks.
>>>
>>> LGTM.
>> Trivial update replacing calls to aff_combination_simple_p because of
>> change in previous patch.
>>
>> Thanks,
>> bin
>
> This commit (r247885) causes regressions on aarch64:
>   - PASS now FAIL [PASS => FAIL]:
>
>   Executed from: gcc.target/aarch64/aarch64.exp
> gcc.target/aarch64/pr62178.c scan-assembler ld1r\\t{v[0-9]+.
Hi,
Thanks for reporting.  The cause is a bit complicated for this
failure.  First, IVOPTs chooses [base + 4], rather than [base] address
expression because it has no knowledge that ld1r doesn't support
pre-autoincrement addressing mode; also RTL passes also contributes to
the regression.  I will open a PR for tracking.

Thanks,
bin
>
> Christophe
>
>>>
>>> Richard.
>>>
 Thanks,
 bin
>
> Otherwise ok.
>
> Thanks,
> Richard.
>
>
>> Thanks,
>> bin
>> 2017-04-11  Bin Cheng  
>>
>> * tree-ssa-loop-ivopts.c (get_loop_invariant_expr): Simplify.
>> (adjust_setup_cost): New parameter supporting round up 
>> adjustment.
>> (struct address_cost_data): Delete.
>> (force_expr_to_var_cost): Don't bound cost with spill_cost.
>> (split_address_cost, ptr_difference_cost): Delete.
>> (difference_cost, compare_aff_trees, record_inv_expr): Delete.
>

Re: [PATCH] Don't fold in save_expr (PR sanitizer/80536, PR sanitizer/80386)

2017-05-12 Thread Marek Polacek
On Fri, May 12, 2017 at 09:32:48AM +0200, Richard Biener wrote:
> On Thu, May 11, 2017 at 3:49 PM, Marek Polacek  wrote:
> > I had an interesting time coming to grips with these two PRs.  But it
> > essentially comes down to the fold call in save_expr.  With that, we can 
> > call
> > fold() on an expression whose operands weren't folded yet, but that is what
> > code in fold_binary_loc assumes.  Since fold() is not recursive, we could 
> > end
> > up there with expressions such as
> > i * ((unsigned long) -0 + 1) * 2
> > causing various sorts of problems: turning valid code into invalid, infinite
> > recursion, etc.
> >
> > It's not clear why save_expr folds.  I did some archeology but all I could
> > find was r67189 (that's 2003) which had:
> >
> > -  tree t = expr;
> > +  tree t = fold (expr);
> >tree inner;
> >
> > -  /* Don't fold a COMPONENT_EXPR: if the operand was a CONSTRUCTOR (the
> > - only time it will fold), it can cause problems with PLACEHOLDER_EXPRs
> > - in Ada.  Moreover, it isn't at all clear why we fold here at all.  */
> > -  if (TREE_CODE (t) != COMPONENT_REF)
> > -t = fold (t);
> >
> > so it wasn't clear even 14 years ago.  But now we know the call is harmful.

I've come to believe this was purely to discover more invariants.

> > Now, fold() doesn't recurse, but can it happen that it folds something 
> > anyway?
> > And maybe turn the expression into an invariant so that we don't need to 
> > wrap
> > it in a SAVE_EXPR?  Turns out it can, but in the C FE, which either uses
> > c_save_expr that c_fully_folds, or calls save_expr when in_late_binary_op 
> > (so
> > the operands have already been folded), it's very rare, and can only be 
> > triggered
> > by code such as
> >
> > void
> > f (int i)
> > {
> >   int (*a)[i];
> >   int x[sizeof (*a)];
> > }
> >
> > so I'm not very worried about that.  For C++, Jakub suggested to add 
> > SAVE_EXPR
> > handling to cp_fold.
> >
> > Future work: get rid of c_save_expr, only use save_expr, and teach 
> > c_fully_fold
> > how to fold SAVE_EXPR (once).  Although there's this c_wrap_maybe_const
> > business...
> >
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> Interesting - I tried this once (removing fold () from save_expr) but
> it failed miserably
> (don't remember the details).  Maybe the cp/ part fixes it up.

I think that must've been before C++ added all that constexpr bits.  The
cp_fold part is just an optimization that triggers very rarely.

> Did you include Ada in testing?

Not before, so I've tested this now with Ada included -- looks fine still.

> Otherwise the tree.c change is ok.

Thanks.  Jason/Joseph, any comments?

> (yay, one less to go in the attempt to remove fold ())

Do we have any low hanging fruit here?  Suppose not...

Marek


[PATCH] Fix PR80721

2017-05-12 Thread Richard Biener

It was pointed out by Markus that the EH emergency pool is not
kept sorted and fully merged properly for the cases of freeing
an entry before the first free entry and for the cases where
merging with the immediate successor and for the case with
merging with both successor and predecessor is possible.

The following patch attempts to fix this.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Ok for trunk?  (given low / close to no testing coverage
extra close eyes wanted!)

Reporter says maybe it can't happen in real-life as it requires
EH deallocation order not be the reverse of allocation order.
I don't know enough here for a quick guess but "in C++ everything
is possible" ;)

Thanks,
Richard.

2017-05-12  Richard Biener  
Markus Eisenmann  

PR libstdc++/80721
* libsupc++/eh_alloc.cc (pool::free): Keep list properly
sorted and add missing freelist item merging cases.

Index: libstdc++-v3/libsupc++/eh_alloc.cc
===
--- libstdc++-v3/libsupc++/eh_alloc.cc  (revision 247951)
+++ libstdc++-v3/libsupc++/eh_alloc.cc  (working copy)
@@ -194,13 +194,17 @@ namespace
   allocated_entry *e = reinterpret_cast 
(reinterpret_cast  (data) - offsetof (allocated_entry, data));
   std::size_t sz = e->size;
-  if (!first_free_entry)
+  if (!first_free_entry
+ || (reinterpret_cast  (e) + sz
+ < reinterpret_cast  (first_free_entry)))
{
- // If the free list is empty just put the entry there.
+ // If the free list is empty or the entry is before the
+ // first element and cannot be merged with it add it as
+ // the first free entry.
  free_entry *f = reinterpret_cast  (e);
  new (f) free_entry;
  f->size = sz;
- f->next = NULL;
+ f->next = first_free_entry;
  first_free_entry = f;
}
   else if (reinterpret_cast  (e) + sz
@@ -224,9 +228,17 @@ namespace
   > reinterpret_cast  (e) + sz);
   fe = &(*fe)->next)
;
+ // If we can merge the next block into us do so and continue
+ // with the cases below.
+ if (reinterpret_cast  (e) + sz
+ == reinterpret_cast  ((*fe)->next))
+   {
+ sz += (*fe)->next->size;
+ (*fe)->next = (*fe)->next->next;
+   }
  if (reinterpret_cast  (*fe) + (*fe)->size
  == reinterpret_cast  (e))
-   /* Merge with the freelist entry.  */
+   // Merge with the freelist entry.
(*fe)->size += sz;
  else
{


Re: [PATCH] Don't fold in save_expr (PR sanitizer/80536, PR sanitizer/80386)

2017-05-12 Thread Richard Biener
On Fri, May 12, 2017 at 12:09 PM, Marek Polacek  wrote:
> On Fri, May 12, 2017 at 09:32:48AM +0200, Richard Biener wrote:
>> On Thu, May 11, 2017 at 3:49 PM, Marek Polacek  wrote:
>> > I had an interesting time coming to grips with these two PRs.  But it
>> > essentially comes down to the fold call in save_expr.  With that, we can 
>> > call
>> > fold() on an expression whose operands weren't folded yet, but that is what
>> > code in fold_binary_loc assumes.  Since fold() is not recursive, we could 
>> > end
>> > up there with expressions such as
>> > i * ((unsigned long) -0 + 1) * 2
>> > causing various sorts of problems: turning valid code into invalid, 
>> > infinite
>> > recursion, etc.
>> >
>> > It's not clear why save_expr folds.  I did some archeology but all I could
>> > find was r67189 (that's 2003) which had:
>> >
>> > -  tree t = expr;
>> > +  tree t = fold (expr);
>> >tree inner;
>> >
>> > -  /* Don't fold a COMPONENT_EXPR: if the operand was a CONSTRUCTOR (the
>> > - only time it will fold), it can cause problems with PLACEHOLDER_EXPRs
>> > - in Ada.  Moreover, it isn't at all clear why we fold here at all.  */
>> > -  if (TREE_CODE (t) != COMPONENT_REF)
>> > -t = fold (t);
>> >
>> > so it wasn't clear even 14 years ago.  But now we know the call is harmful.
>
> I've come to believe this was purely to discover more invariants.
>
>> > Now, fold() doesn't recurse, but can it happen that it folds something 
>> > anyway?
>> > And maybe turn the expression into an invariant so that we don't need to 
>> > wrap
>> > it in a SAVE_EXPR?  Turns out it can, but in the C FE, which either uses
>> > c_save_expr that c_fully_folds, or calls save_expr when in_late_binary_op 
>> > (so
>> > the operands have already been folded), it's very rare, and can only be 
>> > triggered
>> > by code such as
>> >
>> > void
>> > f (int i)
>> > {
>> >   int (*a)[i];
>> >   int x[sizeof (*a)];
>> > }
>> >
>> > so I'm not very worried about that.  For C++, Jakub suggested to add 
>> > SAVE_EXPR
>> > handling to cp_fold.
>> >
>> > Future work: get rid of c_save_expr, only use save_expr, and teach 
>> > c_fully_fold
>> > how to fold SAVE_EXPR (once).  Although there's this c_wrap_maybe_const
>> > business...
>> >
>> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
>>
>> Interesting - I tried this once (removing fold () from save_expr) but
>> it failed miserably
>> (don't remember the details).  Maybe the cp/ part fixes it up.
>
> I think that must've been before C++ added all that constexpr bits.  The
> cp_fold part is just an optimization that triggers very rarely.
>
>> Did you include Ada in testing?
>
> Not before, so I've tested this now with Ada included -- looks fine still.
>
>> Otherwise the tree.c change is ok.
>
> Thanks.  Jason/Joseph, any comments?
>
>> (yay, one less to go in the attempt to remove fold ())
>
> Do we have any low hanging fruit here?  Suppose not...

Factoring out the quaternary cases to better handle

  fold (build4 (code, ...

that happens in quite a few places.  And then there's

tree t = build3 (BIT_FIELD_REF, currop->type, genop0, op1, op2);
REF_REVERSE_STORAGE_ORDER (t) = currop->reverse;
return fold (t);

for the lack of [fold_]buildN getting the REF_REVERSE_STORAGE_ORDER flag ...
(I fully blame Eric for this ;)).

Ok, now starts to be non-low-hanging.

Richard.

> Marek


Re: Go patches committed: merge recent changes to gofrontend

2017-05-12 Thread Thomas Schwinge
Hi!

This doesn't block me in any way, but I wanted to report it anyway:

On Wed, 10 May 2017 10:26:15 -0700, Ian Lance Taylor  wrote:
> I have committed a large patch to update the Go frontend and libgo to
> the recent changes in the gofrontend repository.

Doing an incremental rebuild, that ran into:

[...]/source-gcc/libgo/go/runtime/heapdump.go:379:14: error: reference to 
undefined identifier 'sys.Goexperiment'
  dumpstr(sys.Goexperiment)
  ^
make[3]: *** [runtime.lo] Error 1
make[3]: Leaving directory `[...]/build-gcc/x86_64-pc-linux-gnu/libgo'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `[...]/build-gcc/x86_64-pc-linux-gnu/libgo'
make[1]: *** [all] Error 2
make[1]: Leaving directory `[...]/build-gcc/x86_64-pc-linux-gnu/libgo'
make: *** [all-target-libgo] Error 2

Removing "x86_64-pc-linux-gnu/libgo", and rebuilding, it passed.

There is:

> commit eab2960aee91d3e3a6baa5b1bce01262d24c714f
> Author: Ian Lance Taylor 
> Date:   Thu Apr 20 17:08:19 2017 -0700
> 
> runtime/internal/sys: define Goexperiment
> 
> The gc toolchain defines Goexperiment based on the environment
> variable GOEXPERIMENT when the toolchain is built.  We just always set
> Goexperiment to the empty string.
> 
> Reviewed-on: https://go-review.googlesource.com/41292

| diff --git a/libgo/Makefile.am b/libgo/Makefile.am
| index f600a83..f4bf2bc 100644
| --- a/libgo/Makefile.am
| +++ b/libgo/Makefile.am
| 
| @@ -512,6 +512,7 @@
|   echo "package sys" > version.go.tmp
|   echo 'const DefaultGoroot = "$(prefix)"' >> version.go.tmp
|   echo 'const TheVersion = "'`cat $(srcdir)/VERSION | sed 1q`' '`$(GOC) 
--version | sed 1q`'"' >> version.go.tmp
| + echo 'const Goexperiment = ``' >> version.go.tmp
|   echo 'const GOARCH = "'$(GOARCH)'"' >> version.go.tmp
|   echo 'const GOOS = "'$(GOOS)'"' >> version.go.tmp
|   echo 'const GccgoToolDir = "$(libexecsubdir)"' >> version.go.tmp

In the failed build's tree, I do see that the Makefile has been renewed,
but the version.go file has not yet been, at the time the build failed.
Is there some missing dependency, or should that be implicit?
libgo/go/runtime/heapdump.go also is new, and is the only user of
sys.Goexperiment, as far as I can tell.


Grüße
 Thomas


Re: Heads-Up: early LTO debug to land, breaking Mach-O / [X]COFF

2017-05-12 Thread JonY
On 05/12/2017 09:24 AM, Richard Biener wrote:
> 
> This is a heads-up that I am in the process of implementing the last
> of Jasons review comments on the dwarf2out parts of early LTO debug
> support.  I hope to post final patches early next week after thoroughly
> re-testing everything.
> 
> Note that Mach-O and [X]COFF support in the simple-object machinery
> is still missing for the early LTO debug feature so I am going to
> break LTOing with DWARF debuginfo on Darwin and Windows (CCing
> maintainers).  Mach-O support has been worked on a bit by Iain
> and myself but the simple-object piece is still missing.
> A workaround is to use stabs on these targets with LTO.
> 
> DWARF part: https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01023.html
> simple-object part: 
> https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01733.html
> 
> both still apply with some fuzz.
> 
> Richard.
> 

AFAIK, LTO doesn't work well on Windows to begin with, if non-LTO modes
are unaffected, I have no issues with this.




signature.asc
Description: OpenPGP digital signature


[PATCH][X86] Add missing xgetbv xsetbv intrinsics

2017-05-12 Thread Koval, Julia
Hi,

This patch add these missing intrinsics:
_xsetbv
_xgetbv

gcc/
* config/i386/i386-builtin-types.def (VOID_FTYPE_INT_INT64): New type.
* config/i386/i386-builtin.def (__builtin_ia32_xgetbv,
__builtin_ia32_xsetbv): New builtins.
* config/i386/i386.c (ix86_expand_special_args_builtin): Process new 
type.
(ix86_expand_builtin): Special expand for new intrinsics.
* config/i386/i386.md: (UNSPECV_XGETBV, UNSPECV_XSETBV): New.
(xsetbv, xsetbv_rex64, xgetbv, xgetbv_rex64): New patterns.
* config/i386/xsaveintrin.h (_xsetbv, _getbv): New intrinsics.

gcc/testsuite
* gcc.target/i386/xgetsetbv.c: New test.

Ok for trunk?

Thanks,
Julia


xgetbv_patch
Description: xgetbv_patch


Re: [RFC PATCH, i386]: Enable post-reload compare elimination pass

2017-05-12 Thread Jakub Jelinek
On Fri, May 12, 2017 at 05:42:59AM -0400, Hans-Peter Nilsson wrote:
> (To-list pruned, my correction doesn't need attention.)
> 
> On Thu, 11 May 2017, Hans-Peter Nilsson wrote:
> > On Wed, 10 May 2017, Jakub Jelinek wrote:
> >
> > > On Wed, May 10, 2017 at 09:57:56PM +0200, Uros Bizjak wrote:
> > > > BTW: This patch now catches 417 cases (instead of 200+) in linux
> > > > build, including e.g.:
> > > >
> > > > (parallel [
> > > > (set (reg:CCZ 17 flags)
> > > > (compare:CCZ (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93])
> > > > (const_int 1 [0x1]))
> > > > (const_int 0 [0])))
> > > > (set (reg:DI 4 si)
> > > > (zero_extend:DI (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] 
> > > > [93])
> > > > (const_int 1 [0x1]
> > > > ])
> 
> > Anyway, people seem to drift towards the ccreg-last variant
> 
> JFTR, I miswrote that; I meant "towards the variant with
> ccreg-first" as in Uros' example kept above and as opposed to my
> example.

If I skim the current primary/secondary targets, then i386/x86_64, rs6000,
aarch64, arm, sparc, s390* all use the ccreg-first order, in mips I couldn't
find either of the orders.
Looking at other targets, in alpha, avr, bfin, c6x, cr16, cris, fr30, ft32,
h8300, ia64, iq2000, lm32, m32c, m32r, m68k, mcore, microblaze, mmix, moxie,
nds32, nios2, nvptx, pa, pdp11, riscv, rl78, sh, spu, stormy16, tilegx,
tilepro, v850, vax, xtensa I can't find any order,
arc, epiphany are ccreg-first,
frv is some weird mixture of ccreg-first (e.g. *combo_intop_compare2) and
ccreg-last (e.g. adddi3_lower),
mn10300, rx and visium are ccreg-last that we want to switch over now to
ccreg-first.
It was all from quickly skimming (mostly) the main config//.md
looking for \(compare and/or \(reg.*CC and looking if there are patterns
that have the same arithmetics inside a compare or something similar vs.
the operation repeated on a set, so it is possible I've missed something.

But if the above is roughly true, then it is of course preferable to change
3 less used targets than 6 primary/secondary ones + 2 further ones.

Jakub


Re: Heads-Up: early LTO debug to land, breaking Mach-O / [X]COFF

2017-05-12 Thread Iain Sandoe
Hi Richard,

> On 12 May 2017, at 10:24, Richard Biener  wrote:
> 
> 
> This is a heads-up that I am in the process of implementing the last
> of Jasons review comments on the dwarf2out parts of early LTO debug
> support.  I hope to post final patches early next week after thoroughly
> re-testing everything.
> 
> Note that Mach-O and [X]COFF support in the simple-object machinery
> is still missing for the early LTO debug feature so I am going to
> break LTOing with DWARF debuginfo on Darwin and Windows (CCing
> maintainers).  Mach-O support has been worked on a bit by Iain
> and myself but the simple-object piece is still missing.


Still on my TODO, and intending to do it for Mach-O - but rather short of 
cycles (if non-LTO is unaffected at least we have some breathing space).

> A workaround is to use stabs on these targets with LTO.

stabs isn’t going to work (well, if at all) on modern Darwin...

> DWARF part: https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01023.html
> simple-object part: 
> https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01733.html
> 
> both still apply with some fuzz.

I have a branch somewhere, will rebase - I’ve been getting stuff up to speed 
this week,
Iain



Re: Remove _Safe_container _IsCxx11AllocatorAware template parameter

2017-05-12 Thread Jonathan Wakely

On 11/05/17 22:03 +0200, François Dumont wrote:

Hi

   _Safe_container _IsCxx11AllocatorAware template allocator is only 
used if C++11 Abi is not used so I simplified it.


   * include/debug/safe_container.h [_GLIBCXX_USE_CXX11_ABI]
   (_Safe_container<>): Remove _IsCxx11AllocatorAware template parameter.
   * include/debug/string: Adapt.

   Tested under Linux x86_64 with debug mode and active C++11 Abi.


Wait, doesn't this mean that debug containers have a different base
class depending on which ABI is active in the translation unit? That
is a One-Definition Rule violation when you link together objects
compiled with different values of _GLIBCXX_USE_CXX11_ABI, which is
supposed to work. I went to enormous effort to ensure it works.



François




diff --git a/libstdc++-v3/include/debug/safe_container.h 
b/libstdc++-v3/include/debug/safe_container.h
index 3d44c15..e985c2a 100644
--- a/libstdc++-v3/include/debug/safe_container.h
+++ b/libstdc++-v3/include/debug/safe_container.h
@@ -36,8 +36,12 @@ namespace __gnu_debug
  /// Safe class dealing with some allocator dependent operations.
  template class _SafeBase,
-  bool _IsCxx11AllocatorAware = true>
+  template class _SafeBase
+#if _GLIBCXX_USE_CXX11_ABI
+  >
+#else
+  , bool _IsCxx11AllocatorAware = true>
+#endif
class _Safe_container


In any case, I don't think this is simpler, there are more lines of
code, and the preprocessor condition makes it harder to read and
harder to reason about.


: public _SafeBase<_SafeContainer>
{
@@ -82,8 +86,10 @@ namespace __gnu_debug
  {
__glibcxx_check_self_move_assign(__x);

+#  if !_GLIBCXX_USE_CXX11_ABI
if (_IsCxx11AllocatorAware)
  {
+#  endif


This is fairly pointless. Again it makes the code a bit harder to
read, and since it's a compile-time constant the condition will be
optimised away completely.



[C++ PATCH, RFC] Implement new C++ intrinsics __is_assignable and __is_constructible.

2017-05-12 Thread Ville Voutilainen
I have tested this with the full suite on Linux-PPC64. It works otherwise fine,
but there's one snag: 20_util/unique_ptr/specialized_algorithms/swap_cxx17.cc
fails, and it looks like the trait ends up instantiating the definition
of a destructor, which then ends up being hard-error ill-formed.

That is, we have

struct C { };
void swap(C&, C&) = delete;
static_assert( !std::is_swappable_v> );

struct D { D(D&&) = delete; };
static_assert( !std::is_swappable_v> );

The destructor definitions of unique_ptr and unique_ptr
are not valid because that destructor will try to call an operator()
on those deleters, and they don't have that operator.

The existing library implementation of is_constructible doesn't
instantiate the destructor definitions, or if it does, they don't cause
a hard error. This intrinsic trait approach behaves differently, and I would
welcome any help fixing that problem.

2017-05-12  Ville Voutilainen  

c-family/

Implement new C++ intrinsics __is_assignable and __is_constructible.
* c-common.c (__is_assignable, __is_constructible): New.
* c-common.h (RID_IS_ASSIGNABLE, RID_IS_CONSTRUCTIBLE): Likewise.

cp/

Implement new C++ intrinsics __is_assignable and __is_constructible.
* cp-tree.h (CPTK_IS_ASSIGNABLE, CPTK_IS_CONSTRUCTIBLE): New.
(is_xible): New.
* cxx-pretty-print.c (pp_cxx_trait_expression): Handle
CPTK_IS_ASSIGNABLE and CPTK_IS_CONSTRUCTIBLE.
* method.c (is_xible_helper): New.
(is_trivially_xible): Adjust.
(is_xible): New.
* parser.c (cp_parser_primary_expression): Handle
RID_IS_ASSIGNABLE and RID_IS_CONSTRUCTIBLE.
(cp_parser_trait_expr): Likewise.
* semantics.c (trait_expr_value): Handle
CPTK_IS_ASSIGNABLE and CPTK_IS_CONSTRUCTIBLE.

libstdc++-v3/

Implement new C++ intrinsics __is_assignable and __is_constructible.
* include/std/type_traits (__do_is_static_castable_impl): Remove.
(__is_static_castable_impl, __is_static_castable_safe): Likewise.
(__is_static_castable, __do_is_direct_constructible_impl): Likewise.
(__is_direct_constructible_impl): Likewise.
(__is_direct_constructible_new_safe): Likewise.
(__is_base_to_derived_ref, __is_lvalue_to_rvalue_ref): Likewise.
(__is_direct_constructible_ref_cast): Likewise.
(__is_direct_constructible_new, __is_direct_constructible): Likewise.
(__do_is_nary_constructible_impl): Likewise.
(__is_nary_constructible_impl, __is_nary_constructible): Likewise.
(__is_constructible_impl): Likewise.
(is_constructible): Call the intrinsic.
(__is_assignable_helper): Remove.
(is_assignable): Call the intrinsic.
(is_trivially_constructible): Likewise.
(is_trivially_assignable): Likewise.
(testsuite/20_util/declval/requirements/1_neg.cc): Adjust.
(testsuite/20_util/make_signed/requirements/typedefs_neg.cc): Likewise.
(testsuite/20_util/make_unsigned/requirements/typedefs_neg.cc):
Likewise.
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index ad686d2..369e112 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -514,6 +514,8 @@ const struct c_common_resword c_common_reswords[] =
   { "volatile",RID_VOLATILE,   0 },
   { "wchar_t", RID_WCHAR,  D_CXXONLY },
   { "while",   RID_WHILE,  0 },
+  { "__is_assignable", RID_IS_ASSIGNABLE, D_CXXONLY },
+  { "__is_constructible", RID_IS_CONSTRUCTIBLE, D_CXXONLY },
 
   /* C++ transactional memory.  */
   { "synchronized",RID_SYNCHRONIZED, D_CXX_OBJC | D_TRANSMEM },
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 9e3982d..a986056 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -172,6 +172,7 @@ enum rid
   RID_IS_TRIVIALLY_ASSIGNABLE, RID_IS_TRIVIALLY_CONSTRUCTIBLE,
   RID_IS_TRIVIALLY_COPYABLE,
   RID_IS_UNION,RID_UNDERLYING_TYPE,
+  RID_IS_ASSIGNABLE,   RID_IS_CONSTRUCTIBLE,
 
   /* C++11 */
   RID_CONSTEXPR, RID_DECLTYPE, RID_NOEXCEPT, RID_NULLPTR, RID_STATIC_ASSERT,
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 09b1364..154ba3c 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -927,7 +927,9 @@ enum cp_trait_kind
   CPTK_IS_TRIVIALLY_CONSTRUCTIBLE,
   CPTK_IS_TRIVIALLY_COPYABLE,
   CPTK_IS_UNION,
-  CPTK_UNDERLYING_TYPE
+  CPTK_UNDERLYING_TYPE,
+  CPTK_IS_ASSIGNABLE,
+  CPTK_IS_CONSTRUCTIBLE
 };
 
 /* The types that we are processing.  */
@@ -6112,6 +6114,7 @@ extern void use_thunk (tree, 
bool);
 extern bool trivial_fn_p   (tree);
 extern tree forward_parm   (tree);
 extern bool is_trivially_xible (enum tree_code, tree, tree);
+extern bool is_xible   (enum tree_code, tree, tree);
 extern tree get_defaulted_eh_spec  (tree);
 extern tree unevaluated_noexcept_spec  (void);
 extern void after_nsdmi_defaulted_late_checks   (tree);
diff --git a/gcc/cp/cxx-pretty-print.c b/gcc/cp/cx

Re: [PATCH GCC8][13/33]Rewrite cost computation of ivopts

2017-05-12 Thread Bin.Cheng
On Fri, May 12, 2017 at 10:50 AM, Bin.Cheng  wrote:
> On Fri, May 12, 2017 at 8:39 AM, Christophe Lyon
>  wrote:
>> Hi Bin,
>>
>>
>> On 4 May 2017 at 17:25, Bin.Cheng  wrote:
>>> On Wed, Apr 26, 2017 at 11:18 AM, Richard Biener
>>>  wrote:
 On Wed, Apr 26, 2017 at 12:12 PM, Bin.Cheng  wrote:
> On Wed, Apr 26, 2017 at 10:50 AM, Richard Biener
>  wrote:
>> On Tue, Apr 18, 2017 at 12:43 PM, Bin Cheng  wrote:
>>> Hi,
>>> This is the major part of this patch series.  It rewrites cost 
>>> computation of ivopts using tree affine.
>>> Apart from description given by cover message:
>>>   A) New computation cost model.  Currently, there are big amount code 
>>> trying to understand
>>>  tree expression and estimate its computation cost.  The model is 
>>> designed long ago
>>>  for generic tree expressions.  In order to process generic 
>>> expression (even address
>>>  expression of array/memory references), it has code for too many 
>>> corner cases.  The
>>>  problem is it's somehow impossible to handle all complicated 
>>> expressions, even with
>>>  complicated logic in functions like get_computation_cost_at, 
>>> difference_cost,
>>>  ptr_difference_cost, get_address_cost and so on...  The second 
>>> problem is it's hard
>>>  to keep cost model consistent among special cases.  As special 
>>> cases being added
>>>  from time to time, the model is no long unified any more.  There 
>>> are cases that right
>>>  cost results in bad code, or vice versa, wrong cost results in 
>>> good code.  Finally,
>>>  it's difficult to add code for new cases.
>>>  This patch introduces a new cost computation model by using tree 
>>> affine.  Tree exprs
>>>  are lowered to aff_tree which is simple arithmetic operation 
>>> usually.  Code handling
>>>  special cases is no longer necessary, which brings us quite 
>>> simplicity.  It is also
>>>  easier to compute consistent costs among different expressions 
>>> using tree affine,
>>>  which gives us a unified cost model.
>>> This patch also fixes issue that cost computation for address type 
>>> iv_use is inconsistent
>>> with how it is re-rewritten in the end.  It greatly simplified cost 
>>> computation.
>>>
>>> Is it OK?
>>
>> The patch is quite hard to follow (diff messes up here -- this is a
>> case where a context
> Hi Richard,
> Thanks for reviewing, attachment is the updated context diff.  It also
> includes a minor fix handling pre-increment addressing mode,
> specifically, it adds two lines of code:
>
>  if (stmt_after_increment (data->current_loop, cand, use->stmt))
> ainc_offset += ainc_step;
>
>
>> diff is easier to read).  I trust you on the implementation details
>> here, the overall structure
>> looks ok to me.  The only question I have is with regarding to
>> get_loop_invariant_expr
>> which seems to be much less sophisticated than before (basically it's
>> now what was
>> record_inv_expr?).  I suppose the old functionality is superseeded by
>> using affines
>> everywhere else.
> Yes, previous version tries a lot to cancel common sub expression when
> representing use with cand.  New version relies on tree affine which
> is much better.  One problem with invariant expression estimation is
> we don't know in IVOPT if the inv_expr would be hoisted or not by
> later passes.  This problem exists all the time, we can only make
> assumptions here, I think new version is bit more aggressive in
> recording new inv_expr here.

 Thanks.

 LGTM.
>>> Trivial update replacing calls to aff_combination_simple_p because of
>>> change in previous patch.
>>>
>>> Thanks,
>>> bin
>>
>> This commit (r247885) causes regressions on aarch64:
>>   - PASS now FAIL [PASS => FAIL]:
>>
>>   Executed from: gcc.target/aarch64/aarch64.exp
>> gcc.target/aarch64/pr62178.c scan-assembler ld1r\\t{v[0-9]+.
> Hi,
> Thanks for reporting.  The cause is a bit complicated for this
> failure.  First, IVOPTs chooses [base + 4], rather than [base] address
> expression because it has no knowledge that ld1r doesn't support
> pre-autoincrement addressing mode; also RTL passes also contributes to
> the regression.  I will open a PR for tracking.
PR80724 filed.

Thanks,
bin


Re: Avoid _Rb_tree_rotate_[left,right] symbols export

2017-05-12 Thread Jonathan Wakely

On 11/05/17 22:06 +0200, François Dumont wrote:

Hi

   When versioned namespace is active we can avoid export of 
_Rb_tree_rotate_[left,right] symbols. I also took the opportunity to 
put static functions in the anonymous namespace rather than using 
static. Is this usage of static still planned to be deprecated ?


No, I don't think so.

A much simpler (but equivalent) change would be:

--- a/libstdc++-v3/src/c++98/tree.cc
+++ b/libstdc++-v3/src/c++98/tree.cc
@@ -153,6 +153,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  /* Static keyword was missing on _Rb_tree_rotate_left.
 Export the symbol for backward compatibility until
 next ABI change.  */
+#if _GLIBCXX_INLINE_VERSION
+  static
+#endif
  void
  _Rb_tree_rotate_left(_Rb_tree_node_base* const __x,
  _Rb_tree_node_base*& __root)
@@ -184,6 +187,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  /* Static keyword was missing on _Rb_tree_rotate_right
 Export the symbol for backward compatibility until
 next ABI change.  */
+#if _GLIBCXX_INLINE_VERSION
+  static
+#endif
  void
  _Rb_tree_rotate_right(_Rb_tree_node_base* const __x,
   _Rb_tree_node_base*& __root)




   * src/c++98/tree.cc [_GLIBCXX_INLINE_VERSION]
   (_Rb_tree_rotate_left, _Rb_tree_rotate_right): Remove.
   * src/c++98/tree.cc (local_Rb_tree_increment, local_Rb_tree_decrement):
   Move to anonymous namespace.
   (local_Rb_tree_rotate_left, local_Rb_tree_rotate_right): Likewise.

Tested under Linux x86_64 with versioned namespace.


What about the normal configuration? It's much more important that the
default configuration works. The versioned namespace that nobody uses
doesn't matter.



Re: [C++ PATCH, RFC] Implement new C++ intrinsics __is_assignable and __is_constructible.

2017-05-12 Thread Daniel Krügler
2017-05-12 12:39 GMT+02:00 Ville Voutilainen :
> I have tested this with the full suite on Linux-PPC64. It works otherwise 
> fine,
> but there's one snag: 20_util/unique_ptr/specialized_algorithms/swap_cxx17.cc
> fails, and it looks like the trait ends up instantiating the definition
> of a destructor, which then ends up being hard-error ill-formed.

I'm pretty sure that the library-based implementation does not
instantiate the destructor definition.

Your description sounds remotely similar to me to the current problem
of __is_trivially_constructible intrinsic, which seems to instantiate
the copy constructor definition albeit it (IMO) shouldn't:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80654

Could there be a similar cause?

- Daniel

> That is, we have
>
> struct C { };
> void swap(C&, C&) = delete;
> static_assert( !std::is_swappable_v> );
>
> struct D { D(D&&) = delete; };
> static_assert( !std::is_swappable_v> );
>
> The destructor definitions of unique_ptr and unique_ptr
> are not valid because that destructor will try to call an operator()
> on those deleters, and they don't have that operator.
>
> The existing library implementation of is_constructible doesn't
> instantiate the destructor definitions, or if it does, they don't cause
> a hard error. This intrinsic trait approach behaves differently, and I would
> welcome any help fixing that problem.
>
> 2017-05-12  Ville Voutilainen  
>
> c-family/
>
> Implement new C++ intrinsics __is_assignable and __is_constructible.
> * c-common.c (__is_assignable, __is_constructible): New.
> * c-common.h (RID_IS_ASSIGNABLE, RID_IS_CONSTRUCTIBLE): Likewise.
>
> cp/
>
> Implement new C++ intrinsics __is_assignable and __is_constructible.
> * cp-tree.h (CPTK_IS_ASSIGNABLE, CPTK_IS_CONSTRUCTIBLE): New.
> (is_xible): New.
> * cxx-pretty-print.c (pp_cxx_trait_expression): Handle
> CPTK_IS_ASSIGNABLE and CPTK_IS_CONSTRUCTIBLE.
> * method.c (is_xible_helper): New.
> (is_trivially_xible): Adjust.
> (is_xible): New.
> * parser.c (cp_parser_primary_expression): Handle
> RID_IS_ASSIGNABLE and RID_IS_CONSTRUCTIBLE.
> (cp_parser_trait_expr): Likewise.
> * semantics.c (trait_expr_value): Handle
> CPTK_IS_ASSIGNABLE and CPTK_IS_CONSTRUCTIBLE.
>
> libstdc++-v3/
>
> Implement new C++ intrinsics __is_assignable and __is_constructible.
> * include/std/type_traits (__do_is_static_castable_impl): Remove.
> (__is_static_castable_impl, __is_static_castable_safe): Likewise.
> (__is_static_castable, __do_is_direct_constructible_impl): Likewise.
> (__is_direct_constructible_impl): Likewise.
> (__is_direct_constructible_new_safe): Likewise.
> (__is_base_to_derived_ref, __is_lvalue_to_rvalue_ref): Likewise.
> (__is_direct_constructible_ref_cast): Likewise.
> (__is_direct_constructible_new, __is_direct_constructible): Likewise.
> (__do_is_nary_constructible_impl): Likewise.
> (__is_nary_constructible_impl, __is_nary_constructible): Likewise.
> (__is_constructible_impl): Likewise.
> (is_constructible): Call the intrinsic.
> (__is_assignable_helper): Remove.
> (is_assignable): Call the intrinsic.
> (is_trivially_constructible): Likewise.
> (is_trivially_assignable): Likewise.
> (testsuite/20_util/declval/requirements/1_neg.cc): Adjust.
> (testsuite/20_util/make_signed/requirements/typedefs_neg.cc): Likewise.
> (testsuite/20_util/make_unsigned/requirements/typedefs_neg.cc):
> Likewise.



-- 


SavedURI :Show URLShow URLSavedURI :
SavedURI :Hide URLHide URLSavedURI :
https://mail.google.com/_/scs/mail-static/_/js/k=gmail.main.de.LEt2fN4ilLE.O/m=m_i,t,it/am=OCMOBiHj9kJxhnelj6j997_NLil29vVAOBGeBBRgJwD-m_0_8B_AD-qOEw/rt=h/d=1/rs=AItRSTODy9wv1JKZMABIG3Ak8ViC4kuOWA?random=1395770800154https://mail.google.com/_/scs/mail-static/_/js/k=gmail.main.de.LEt2fN4ilLE.O/m=m_i,t,it/am=OCMOBiHj9kJxhnelj6j997_NLil29vVAOBGeBBRgJwD-m_0_8B_AD-qOEw/rt=h/d=1/rs=AItRSTODy9wv1JKZMABIG3Ak8ViC4kuOWA?random=1395770800154



Re: [PATCH] Fix PR80721

2017-05-12 Thread Jonathan Wakely

On 12/05/17 12:10 +0200, Richard Biener wrote:


It was pointed out by Markus that the EH emergency pool is not
kept sorted and fully merged properly for the cases of freeing
an entry before the first free entry and for the cases where
merging with the immediate successor and for the case with
merging with both successor and predecessor is possible.

The following patch attempts to fix this.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Ok for trunk?  (given low / close to no testing coverage
extra close eyes wanted!)

Reporter says maybe it can't happen in real-life as it requires
EH deallocation order not be the reverse of allocation order.
I don't know enough here for a quick guess but "in C++ everything
is possible" ;)


It's definitely possible.

std::exception_ptr p1, p2;
try {
 throw 1;
} catch (...) {
 p1 = std::current_exception();
}
try {
 throw 2;
} catch (...) {
 p2 = std::current_exception();
}

Both exceptions are still active, and their lifetimes are now tied to
value objects which can be reset, swapped, copied etc. so their
lifetime is now arbitrary.




Re: [C++ PATCH, RFC] Implement new C++ intrinsics __is_assignable and __is_constructible.

2017-05-12 Thread Ville Voutilainen
On 12 May 2017 at 14:06, Daniel Krügler  wrote:
> 2017-05-12 12:39 GMT+02:00 Ville Voutilainen :
>> I have tested this with the full suite on Linux-PPC64. It works otherwise 
>> fine,
>> but there's one snag: 20_util/unique_ptr/specialized_algorithms/swap_cxx17.cc
>> fails, and it looks like the trait ends up instantiating the definition
>> of a destructor, which then ends up being hard-error ill-formed.
>
> I'm pretty sure that the library-based implementation does not
> instantiate the destructor definition.
>
> Your description sounds remotely similar to me to the current problem
> of __is_trivially_constructible intrinsic, which seems to instantiate
> the copy constructor definition albeit it (IMO) shouldn't:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80654
>
> Could there be a similar cause?


Seems quite plausible to me. I would be happy to fix that bug in the
same go, but I'm a bit
lost as to what exactly causes the problem. constructible_expr in method.c does
build_special_member_call for the constructor and the destructor, so
perhaps there
are some flags that could make it behave.


[PATCH GCC][1/6]Compute type mode and register pressure class mapping

2017-05-12 Thread Bin Cheng
Hi,
This will be a patch series implementing an interface which estimates register
pressure on tree ssa and uses the information in predictive common optimization.
This the first patch computing map from type modes to register pressure classes.

Given there is no pseudo register on tree ssa form, we need type mode -> 
register
class information to compute register pressure.  This patch adds such map in
struct target_ira and a function computing the map.  Though the map is computed
by guess, it's enough for use on tree level.  As a matter of fact, we only need
to identify GENERAL, FLOAT and VECTOR register classes.

Bootstrap and test on x86_64 and AArch64, is it OK?

Thanks,
bin

2017-05-10  Bin Cheng  

* ira.c (setup_mode_classes): New function.
(find_reg_classes): Call above function.
* ira.h (struct target_ira): New field x_ira_mode_classes.
(ira_mode_classes): New macro.From 9f4b7c2ec22ee65549e019cf63b25653a103aa2c Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Mon, 24 Apr 2017 14:41:28 +0100
Subject: [PATCH 1/6] ira-mode-reg_class-map-20170316.txt

---
 gcc/ira.c | 77 +++
 gcc/ira.h |  7 ++
 2 files changed, 84 insertions(+)

diff --git a/gcc/ira.c b/gcc/ira.c
index bfb0508..c9c596b 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -1153,6 +1153,82 @@ setup_class_translate (void)
   ira_pressure_classes_num, ira_pressure_classes);
 }
 
+/* Find desired register class for machine mode from information about
+   register pressure class.  On RTL level, we can compute preferred
+   register class infomation for each pseudo register or allocno.  On
+   GIMPLE level, we need to infer register class from variable's type,
+   i.e, we need map from type mode to register class.
+
+   The map information is computed by simple guess, it's good enough
+   for use on GIMPLE.  */
+void
+setup_mode_classes (void)
+{
+  int i, j;
+  machine_mode mode;
+  enum reg_class vector_class = NO_REGS;
+
+  for (i = 0; i < NUM_MACHINE_MODES; i++)
+{
+  mode = (machine_mode) i;
+  ira_mode_classes[mode] = NO_REGS;
+
+  /* Only care about integer, float and vector modes on GIMPLE.  */
+  if (!INTEGRAL_MODE_P (mode)
+ && !FLOAT_MODE_P (mode) && !VECTOR_MODE_P (mode))
+   continue;
+
+  /* Integers must be in GENERAL_REGS by default.  */
+  if (SCALAR_INT_MODE_P (mode))
+   {
+ ira_mode_classes[mode] = GENERAL_REGS;
+ continue;
+   }
+
+  /* Iterate over pressure classes and find the most appropriate
+one for this mode.  */
+  for (j = 0; j < ira_pressure_classes_num; j++)
+   {
+ HARD_REG_SET valid_for_cl;
+ enum reg_class cl = ira_pressure_classes[j];
+
+ if (!contains_reg_of_mode[cl][mode])
+   continue;
+
+ COPY_HARD_REG_SET (valid_for_cl, reg_class_contents[cl]);
+ AND_COMPL_HARD_REG_SET (valid_for_cl,
+ ira_prohibited_class_mode_regs[cl][mode]);
+ AND_COMPL_HARD_REG_SET (valid_for_cl, ira_no_alloc_regs);
+ if (hard_reg_set_empty_p (valid_for_cl))
+   continue;
+
+ if (ira_mode_classes[mode] == NO_REGS)
+   {
+ ira_mode_classes[mode] = cl;
+
+ /* Record reg_class for vector mode.  */
+ if (VECTOR_MODE_P (mode) && cl != NO_REGS)
+   vector_class = cl;
+
+ continue;
+   }
+ /* Prefer non GENERAL_REGS for floating points.  */
+ if ((FLOAT_MODE_P (mode) || VECTOR_MODE_P (mode))
+ && cl != GENERAL_REGS && ira_mode_classes[mode] == GENERAL_REGS)
+   ira_mode_classes[mode] = cl;
+   }
+}
+
+  /* Setup vector modes that are missed previously.  */
+  if (vector_class != NO_REGS)
+for (i = 0; i < NUM_MACHINE_MODES; i++)
+  {
+   mode = (machine_mode) i;
+   if (ira_mode_classes[mode] == NO_REGS && VECTOR_MODE_P (mode))
+ ira_mode_classes[mode] = vector_class;
+  }
+}
+
 /* Order numbers of allocno classes in original target allocno class
array, -1 for non-allocno classes.  */
 static int allocno_class_order[N_REG_CLASSES];
@@ -1429,6 +1505,7 @@ find_reg_classes (void)
   setup_class_translate ();
   reorder_important_classes ();
   setup_reg_class_relations ();
+  setup_mode_classes ();
 }
 
 
diff --git a/gcc/ira.h b/gcc/ira.h
index 667cfdc..b505e4d 100644
--- a/gcc/ira.h
+++ b/gcc/ira.h
@@ -66,6 +66,11 @@ struct target_ira
  class.  */
   enum reg_class x_ira_pressure_class_translate[N_REG_CLASSES];
 
+  /* Map of machine mode to register pressure class.  With this map,
+ coarse-grained register pressure can be computed on GIMPLE, where
+ we don't have insn pattern to compute preferred reg class.  */
+  enum reg_class x_ira_mode_classes[MAX_MACHINE_MODE];
+
   /* Biggest pressure register class containing stack registers.
  NO_REGS if there are no stack registe

[PATCH GCC][3/6]New file computing regional register pressure on TREE SSA

2017-05-12 Thread Bin Cheng
Hi,
This patch computes register pressure information on TREE SSA by a backward live
range data flow problem.  The major motivation is to estimate register pressure
for inner-most loop on TREE SSA, then other optimizations can use it.  So far 
the
information is used only in predcom later, but it could be useful to implement a
tree level scheduler in order to shrink live ranges.  Unfortunately the example
live range shrink pass I implemented doesn't have obvious impact on performance.
I think one reason is TER which effectively undoes its effect.  Maybe it will be
useful once TER/expanding is replaced with a better instruction selector, it is
not included in this patch.
One fact I need to mention is David proposed a similar patch long time ago at
https://gcc.gnu.org/ml/gcc-patches/2008-12/msg01261.html.  It tries to compute
register pressure information on tree ssa and shrink live ranges based on that
information.  Unfortunately the patch wasn't merged in the end.  There has been
quite changes in GCC implementation, I didn't use its code directly.  However,
I did read that patch and had it in mind when implementing this one.  If there
is any issue in this patch, it would be me that should be blamed.  I also sent
message to David about this patch and the possible relation with his.

Bootstrap and test on x86_64 and AArch64.  Is it OK?

Thanks,
bin

2017-05-10  Xinliang David Li  
Bin Cheng  

* Makefile.in (tree-ssa-regpressure.o): New object file.
* tree-ssa-regpressure.c: New file.
* tree-ssa-regpressure.h: New file.From bf6e51ff68d87c372719de567d4de49d77744f77 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Mon, 8 May 2017 15:20:27 +0100
Subject: [PATCH 3/6] tree-ssa-regpressure-20170504.txt

---
 gcc/Makefile.in|   1 +
 gcc/tree-ssa-regpressure.c | 829 +
 gcc/tree-ssa-regpressure.h |  21 ++
 3 files changed, 851 insertions(+)
 create mode 100644 gcc/tree-ssa-regpressure.c
 create mode 100644 gcc/tree-ssa-regpressure.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 97259ac..abfd4bc 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1534,6 +1534,7 @@ OBJS = \
tree-ssa-pre.o \
tree-ssa-propagate.o \
tree-ssa-reassoc.o \
+   tree-ssa-regpressure.o \
tree-ssa-sccvn.o \
tree-ssa-scopedtables.o \
tree-ssa-sink.o \
diff --git a/gcc/tree-ssa-regpressure.c b/gcc/tree-ssa-regpressure.c
new file mode 100644
index 000..ebc6576
--- /dev/null
+++ b/gcc/tree-ssa-regpressure.c
@@ -0,0 +1,829 @@
+/* Reg Pressure Model and Live Range Shrinking Optimization on TREE SSA.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "memmodel.h"
+#include "ira.h"
+#include "tree.h"
+#include "gimple.h"
+#include "cfghooks.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "gimple-pretty-print.h"
+#include "stor-layout.h"
+#include "fold-const.h"
+#include "gimple-iterator.h"
+#include "cfgloop.h"
+#include "tree-ssa-regpressure.h"
+
+
+/* Entry for invariant hash table.  */
+struct inv_entry
+{
+  tree expr;
+  unsigned id;
+  hashval_t hash;
+};
+
+/* Helpers for invariant hash table.  */
+
+struct inv_hasher : delete_ptr_hash 
+{
+  static inline hashval_t hash (const inv_entry *);
+  static inline bool equal (const inv_entry *, const inv_entry *);
+};
+
+/* Hash function for invariant hash table.  */
+
+inline hashval_t
+inv_hasher::hash (const inv_entry *entry)
+{
+  return entry->hash;
+}
+
+/* Hash table equality function for invariant hash table.  */
+
+inline bool
+inv_hasher::equal (const inv_entry *entry1, const inv_entry *entry2)
+{
+  return (entry1->hash == entry2->hash
+ && operand_equal_p (entry1->expr, entry2->expr, 0)
+ && (TYPE_PRECISION (TREE_TYPE (entry1->expr))
+ == TYPE_PRECISION (TREE_TYPE (entry2->expr;
+}
+
+/* Reg_alloc structure, coalesced from ssa variables.  */
+typedef struct reg_alloc
+{
+  /* ID of this reg_alloc.  */
+  unsigned id;
+  /* The type of ssa variables of this reg_alloc.  */
+  tree type;
+  /* Ssa variables coalesced to this reg_alloc.  */
+  bitmap vars;
+}*reg_alloc_t;
+
+/* Live range information for a basic block.  */
+struct bb_lr_info
+{
+  /* G

[PATCH GCC][2/6]Compute available register for each register classes

2017-05-12 Thread Bin Cheng
Hi,
Currently available/clobber registers are computed only for GENERAL_REGS, this
patch extends it for all reg pressure classes.  It also updates existing uses
in various places.

Bootstrap and test on x86_64 and AArch64.  Is it OK?

Thanks,
bin
2017-05-10  Bin Cheng  

* cfgloop.h (struct target_cfgloop): Change x_target_avail_regs and
x_target_clobbered_regs into array fields.
(init_avail_clobber_regs): New declaration.
* cfgloopanal.c (memmodel.h, ira.h): Include header files.
(init_set_costs): Remove computation for old x_target_avail_regs and
x_target_clobbered_regs fields.
(init_avail_clobber_regs): New function.
(estimate_reg_pressure_cost): Update the uses.
* toplev.c (cfgloop.h): Update comment why the header file is needed.
(backend_init_target): Call init_avail_clobber_regs.
* tree-predcom.c (memmodel.h, ira.h): Include header files.
(MAX_DISTANCE): Update the use.
* tree-ssa-loop-ivopts.c (determine_set_costs): Update the uses.
(determine_set_costs): Ditto.From f9a2481957916e880d89972344ecc074f74ada95 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Mon, 24 Apr 2017 14:42:23 +0100
Subject: [PATCH 2/6] init-available-regs-20170316.txt

---
 gcc/cfgloop.h  | 10 ---
 gcc/cfgloopanal.c  | 68 +++---
 gcc/toplev.c   |  3 +-
 gcc/tree-predcom.c |  4 ++-
 gcc/tree-ssa-loop-ivopts.c | 14 ++
 5 files changed, 72 insertions(+), 27 deletions(-)

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index a8bec1d..853a04a 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -754,11 +754,12 @@ loop_iterator::~loop_iterator ()
 
 /* The properties of the target.  */
 struct target_cfgloop {
-  /* Number of available registers.  */
-  unsigned x_target_avail_regs;
+  /* Number of available registers per register pressure class.  */
+  unsigned x_target_avail_regs[N_REG_CLASSES];
 
-  /* Number of available registers that are call-clobbered.  */
-  unsigned x_target_clobbered_regs;
+  /* Number of available registers that are call-clobbered, per register
+ pressure class.  */
+  unsigned x_target_clobbered_regs[N_REG_CLASSES];
 
   /* Number of registers reserved for temporary expressions.  */
   unsigned x_target_res_regs;
@@ -793,6 +794,7 @@ extern struct target_cfgloop *this_target_cfgloop;
invariant motion.  */
 extern unsigned estimate_reg_pressure_cost (unsigned, unsigned, bool, bool);
 extern void init_set_costs (void);
+extern void init_avail_clobber_regs (void);
 
 /* Loop optimizer initialization.  */
 extern void loop_optimizer_init (unsigned);
diff --git a/gcc/cfgloopanal.c b/gcc/cfgloopanal.c
index aa06c96..befbf36 100644
--- a/gcc/cfgloopanal.c
+++ b/gcc/cfgloopanal.c
@@ -22,6 +22,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "coretypes.h"
 #include "backend.h"
 #include "rtl.h"
+#include "memmodel.h"
+#include "ira.h"
 #include "tree.h"
 #include "predict.h"
 #include "memmodel.h"
@@ -336,20 +338,6 @@ init_set_costs (void)
   rtx reg2 = gen_raw_REG (SImode, LAST_VIRTUAL_REGISTER + 2);
   rtx addr = gen_raw_REG (Pmode, LAST_VIRTUAL_REGISTER + 3);
   rtx mem = validize_mem (gen_rtx_MEM (SImode, addr));
-  unsigned i;
-
-  target_avail_regs = 0;
-  target_clobbered_regs = 0;
-  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-if (TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], i)
-   && !fixed_regs[i])
-  {
-   target_avail_regs++;
-   if (call_used_regs[i])
- target_clobbered_regs++;
-  }
-
-  target_res_regs = 3;
 
   for (speed = 0; speed < 2; speed++)
  {
@@ -379,6 +367,54 @@ init_set_costs (void)
   default_rtl_profile ();
 }
 
+/* Initialize available, clobbered register for each register classes.  */
+
+void
+init_avail_clobber_regs (void)
+{
+  int j;
+  unsigned i;
+  bool general_regs_presented_p = false;
+
+  /* Check if GENERAL_REGS is one of pressure classes.  */
+  for (j = 0; j < ira_pressure_classes_num; j++)
+{
+  target_avail_regs[j] = 0;
+  target_clobbered_regs[j] = 0;
+  if (ira_pressure_classes[j] == GENERAL_REGS)
+   general_regs_presented_p = true;
+}
+  target_avail_regs[GENERAL_REGS] = 0;
+  target_clobbered_regs[GENERAL_REGS] = 0;
+
+  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+{
+  if (fixed_regs[i])
+   continue;
+
+  bool call_used = call_used_regs[i];
+
+  for (j = 0; j < ira_pressure_classes_num; j++)
+   if (TEST_HARD_REG_BIT (reg_class_contents[ira_pressure_classes[j]], i))
+ {
+   target_avail_regs[ira_pressure_classes[j]]++;
+   if (call_used)
+ target_clobbered_regs[ira_pressure_classes[j]]++;
+ }
+
+  /* Compute pressure information for GENERAL_REGS separately.  */
+  if (!general_regs_presented_p)
+   if (TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], i))
+ {
+   target_avail_regs[GENE

[PATCH GCC][4/6]Simple patch skips single element component

2017-05-12 Thread Bin Cheng
Hi,
This is a simple patch discarding simple element components earlier in predcom.
Bootstrap and test on x86_64 and AArch64, is it OK?

Thanks,
bin
2017-05-10  Bin Cheng  

* tree-predcom.c (determine_roots_comp): Skip single-elem chain.From fd3dd235dca80671d1201098c9235a17b5a2f544 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Mon, 24 Apr 2017 14:47:25 +0100
Subject: [PATCH 4/6] skip-single-elem-component-20161221.txt

---
 gcc/tree-predcom.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index 96af63f..0b7811a 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -1229,8 +1229,11 @@ determine_roots_comp (struct loop *loop,
   return;
 }
 
-  comp->refs.qsort (order_drefs);
+  /* Trivial component.  */
+  if (comp->refs.length () <= 1)
+return;
 
+  comp->refs.qsort (order_drefs);
   FOR_EACH_VEC_ELT (comp->refs, i, a)
 {
   if (!chain || DR_IS_WRITE (a->ref)
-- 
1.9.1



[PATCH GCC][6/6]Avoid aggressive predcom for high register pressure cases

2017-05-12 Thread Bin Cheng
Hi,
Aggressive precom could result in larger number loop carried variables, causes 
high
register pressure and spilling.  One example is the hot loop of 436.cactusADM, 
in
which >25 loop carried variables are introduced for the vectorized version loop,
depending on the vector factor.  This patch computes loop register pressure on 
tree
ssa using previously introduced interface.  It uses the information to prune 
chains
with simple heuristic.  For example, combined and zero-length chains are always
allowed; other chains are allowed under register cost; and loop unrolling is 
forced
off if register pressure is high.  With this patch, the benchmark can be obvious
improved on AArch64.

Bootstrap and test on x86_64 and AArch64, is it OK?

Thanks,
bin
2017-05-10  Bin Cheng  

* tree-predcom.c (stor-layout.h, tree-ssa-regpressure.h): New header
files.
(prune_chains): New function.
(tree_predictive_commoning_loop): Call compute_loop_reg_pressure to
compute reg pressure.  Prune chains based on reg pressure.  Force
to not unroll if reg pressure is high.From 744a63e16063451c6c36a8f08271c5174c902392 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Mon, 8 May 2017 11:04:46 +0100
Subject: [PATCH 6/6] pcom-reg-pressure-20170503.txt

---
 gcc/tree-predcom.c | 67 ++
 1 file changed, 67 insertions(+)

diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index f73d869..b2df037 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -197,6 +197,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "ssa.h"
 #include "gimple-pretty-print.h"
+#include "stor-layout.h"
 #include "alias.h"
 #include "fold-const.h"
 #include "cfgloop.h"
@@ -207,6 +208,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-loop-ivopts.h"
 #include "tree-ssa-loop-manip.h"
 #include "tree-ssa-loop-niter.h"
+#include "tree-ssa-regpressure.h"
 #include "tree-ssa-loop.h"
 #include "tree-into-ssa.h"
 #include "tree-dfa.h"
@@ -2510,6 +2512,57 @@ insert_init_seqs (struct loop *loop, vec chains)
   }
 }
 
+/* Prune chains causing high register pressure.  */
+
+static bool
+prune_chains (vec *chains, unsigned *max_pressure)
+{
+  bool pruned_p = false;
+  machine_mode mode;
+  enum reg_class cl;
+  unsigned i, new_pressure;
+
+  for (i = 0; i < chains->length ();)
+{
+  chain_p chain = (*chains)[i];
+  /* Always allow combined chain and zero-length chain.  */
+  if (chain->combined
+ || chain->type == CT_COMBINATION || chain->length == 0)
+   {
+ i++;
+ continue;
+   }
+
+  gcc_assert (chain->refs.length () > 0);
+  mode = TYPE_MODE (TREE_TYPE (chain->refs[0]->ref->ref));
+  /* Bypass chain that doesn't contribute to any reg_class, although
+something could be wrong when mapping type mode to reg_class.  */
+  if (ira_mode_classes[mode] == NO_REGS)
+   {
+ i++;
+ continue;
+   }
+
+  cl = ira_pressure_class_translate[ira_mode_classes[mode]];
+  /* Prune chain if it causes higher register pressure than available
+registers; otherwise keep the chain and update register pressure
+information.  */
+  new_pressure = max_pressure[cl] + chain->length - 1;
+  if (new_pressure <= target_avail_regs[cl])
+   {
+ i++;
+ max_pressure[cl] = new_pressure;
+   }
+  else
+   {
+ gimple_seq_discard (chain->init_seq);
+ chains->unordered_remove (i);
+ pruned_p = true;
+   }
+}
+  return pruned_p;
+}
+
 /* Performs predictive commoning for LOOP.  Returns true if LOOP was
unrolled.  */
 
@@ -2523,6 +2576,8 @@ tree_predictive_commoning_loop (struct loop *loop)
   unsigned unroll_factor;
   struct tree_niter_desc desc;
   bool unroll = false;
+  bool high_pressure_p;
+  unsigned max_pressure[N_REG_CLASSES];
   edge exit;
   bitmap tmp_vars;
 
@@ -2592,6 +2647,11 @@ tree_predictive_commoning_loop (struct loop *loop)
   /* Try to combine the chains that are always worked with together.  */
   try_combine_chains (&chains);
 
+  compute_loop_reg_pressure (loop, max_pressure, &high_pressure_p);
+  if (prune_chains (&chains, max_pressure))
+if (dump_file && (dump_flags & TDF_DETAILS))
+  fprintf (dump_file, "Prune chain because of high reg pressure\n");
+
   insert_init_seqs (loop, chains);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -2603,6 +2663,13 @@ tree_predictive_commoning_loop (struct loop *loop)
   /* Determine the unroll factor, and if the loop should be unrolled, ensure
  that its number of iterations is divisible by the factor.  */
   unroll_factor = determine_unroll_factor (chains);
+  /* Force to not unroll if register pressure is high.  */
+  if (high_pressure_p && unroll_factor > 1)
+{
+  unroll_factor = 1;
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "Force 

[PATCH GCC][5/6]Record initialization statements and only insert it for valid chains

2017-05-12 Thread Bin Cheng
Hi,
This patch caches initialization statements and only inserts it for valid 
chains.
Looks like current code even inserts such stmts for invalid chains which will be
deleted as dead code afterwards.

Bootstrap and test on x86_64 and AArch64, is it OK?

Thanks,
bin
2017-05-10  Bin Cheng  

* tree-predcom.c (struct chain): New field init_seq.
(prepare_initializers_chain): Record intialization stmts in above
field.  Discard it if chain is invalid.
(insert_init_seqs): New function.
(tree_predictive_commoning_loop): Call insert_init_seqs.From db10e77e52ad44416edc08f9789faabef911a9d8 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Mon, 24 Apr 2017 14:48:12 +0100
Subject: [PATCH 5/6] chain-init-seq-20170102.txt

---
 gcc/tree-predcom.c | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index 0b7811a..f73d869 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -296,6 +296,9 @@ typedef struct chain
   /* Initializers for the variables.  */
   vec inits;
 
+  /* gimple stmts intializing the initial variables of the chain.  */
+  gimple_seq init_seq;
+
   /* True if there is a use of a variable with the maximal distance
  that comes after the root in the loop.  */
   unsigned has_max_use_after : 1;
@@ -2454,12 +2457,14 @@ prepare_initializers_chain (struct loop *loop, chain_p 
chain)
   init = ref_at_iteration (dr, (int) i - n, &stmts);
   if (!chain->all_always_accessed && tree_could_trap_p (init))
{
+ if (chain->init_seq)
+   gimple_seq_discard (chain->init_seq);
  gimple_seq_discard (stmts);
  return false;
}
 
   if (stmts)
-   gsi_insert_seq_on_edge_immediate (entry, stmts);
+   gimple_seq_add_seq (&chain->init_seq, stmts);
 
   chain->inits[i] = init;
 }
@@ -2489,6 +2494,22 @@ prepare_initializers (struct loop *loop, vec 
chains)
 }
 }
 
+/* Insert all initializing gimple stmts into loop's entry edge.  */
+
+static void
+insert_init_seqs (struct loop *loop, vec chains)
+{
+  unsigned i;
+  edge entry = loop_preheader_edge (loop);
+
+  for (i = 0; i < chains.length (); ++i)
+if (chains[i]->init_seq)
+  {
+   gsi_insert_seq_on_edge_immediate (entry, chains[i]->init_seq);
+   chains[i]->init_seq = NULL;
+  }
+}
+
 /* Performs predictive commoning for LOOP.  Returns true if LOOP was
unrolled.  */
 
@@ -2571,6 +2592,8 @@ tree_predictive_commoning_loop (struct loop *loop)
   /* Try to combine the chains that are always worked with together.  */
   try_combine_chains (&chains);
 
+  insert_init_seqs (loop, chains);
+
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
   fprintf (dump_file, "Before commoning:\n\n");
-- 
1.9.1



[PATCH] Add sequence check to leaf_function_p

2017-05-12 Thread Wilco Dijkstra
This is a followup from: 
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02916.html

Add an assert to leaf_function_p to ensure it is not called from a
prolog or epilog sequence (which would incorrectly return true in a
non-leaf function).  There are several targets which still call
leaf_function_p, and while most appear safe or appear aware of the
issue, it is likely not all such calls are safe.  This check enables
any such latent bugs to be found.

Bootstrap OK on AArch64.

2017-05-11  Wilco Dijkstra  

* final.c (leaf_function_p): Check we are not in a sequence.
--

diff --git a/gcc/final.c b/gcc/final.c
index 
820162b2d28d734901375017cf0c7a3095e8903e..c9aa610d2696738342f61b9c944a7a2f18e7497c
 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -4309,6 +4309,9 @@ leaf_function_p (void)
 {
   rtx_insn *insn;
 
+  /* Check we are not in a prologue or epilogue sequence.  */
+  gcc_assert (!in_sequence_p ());
+
   /* Some back-ends (e.g. s390) want leaf functions to stay leaf
  functions even if they call mcount.  */
   if (crtl->profile && !targetm.keep_leaf_when_profiled ())


[PATCH] Properly fold stmts in PRE

2017-05-12 Thread Richard Biener

When Jeff did the DSE improvements I was reminded that PRE didn't
fold calls properly (aka not in-place).  This results in unfolded
memory ops (not inlined or removed as do nothing for size zero).

The following finally fixes that.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2017-05-12  Richard Biener  

* tree-ssa-sccvn.h (has_VN_INFO): Declare.
* tree-ssa-pre.c (eliminate_dom_walker::before_dom_children):
Fold all stmts not inplace.

* g++.dg/tree-ssa/ssa-dse-2.C: Adjust.

Index: gcc/testsuite/g++.dg/tree-ssa/ssa-dse-2.C
===
--- gcc/testsuite/g++.dg/tree-ssa/ssa-dse-2.C   (revision 247965)
+++ gcc/testsuite/g++.dg/tree-ssa/ssa-dse-2.C   (working copy)
@@ -54,6 +54,4 @@ fill_vec_av_set (av_set_t av)
 }
 
 /* { dg-final { scan-tree-dump-not "Trimming statement .head = -" "dse2" } } */
-/* { dg-final { scan-tree-dump "Deleted dead call: " "dse2" } } */
-
-
+/* { dg-final { scan-tree-dump-not "mem\[^\r\n\]*, 0\\);" "dse2" } } */
Index: gcc/tree-ssa-pre.c
===
--- gcc/tree-ssa-pre.c  (revision 247965)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -4645,30 +4645,51 @@ eliminate_dom_walker::before_dom_childre
  && TREE_CODE (gimple_assign_rhs1 (stmt)) == ADDR_EXPR)
recompute_tree_invariant_for_addr_expr (gimple_assign_rhs1 (stmt));
  gimple *old_stmt = stmt;
- if (is_gimple_call (stmt))
+ gimple_stmt_iterator prev = gsi;
+ gsi_prev (&prev);
+ if (fold_stmt (&gsi))
{
- /* ???  Only fold calls inplace for now, this may create new
-SSA names which in turn will confuse free_scc_vn SSA name
-release code.  */
- fold_stmt_inplace (&gsi);
- /* When changing a call into a noreturn call, cfg cleanup
-is needed to fix up the noreturn call.  */
- if (!was_noreturn && gimple_call_noreturn_p (stmt))
-   el_to_fixup.safe_push  (stmt);
-   }
- else
-   {
- fold_stmt (&gsi);
- stmt = gsi_stmt (gsi);
- if ((gimple_code (stmt) == GIMPLE_COND
-  && (gimple_cond_true_p (as_a  (stmt))
-  || gimple_cond_false_p (as_a  (stmt
- || (gimple_code (stmt) == GIMPLE_SWITCH
- && TREE_CODE (gimple_switch_index (
- as_a  (stmt)))
-== INTEGER_CST))
-   el_todo |= TODO_cleanup_cfg;
+ /* fold_stmt may have created new stmts inbetween
+the previous stmt and the folded stmt.  Mark
+all defs created there as varying to not confuse
+the SCCVN machinery as we're using that even during
+elimination.  */
+ if (gsi_end_p (prev))
+   prev = gsi_start_bb (b);
+ else
+   gsi_next (&prev);
+ if (gsi_stmt (prev) != gsi_stmt (gsi))
+   do
+ {
+   tree def;
+   ssa_op_iter dit;
+   FOR_EACH_SSA_TREE_OPERAND (def, gsi_stmt (prev),
+  dit, SSA_OP_ALL_DEFS)
+ /* As existing DEFs may move between stmts
+we have to guard VN_INFO_GET.  */
+ if (! has_VN_INFO (def))
+   VN_INFO_GET (def)->valnum = def;
+   if (gsi_stmt (prev) == gsi_stmt (gsi))
+ break;
+   gsi_next (&prev);
+ }
+   while (1);
}
+ stmt = gsi_stmt (gsi);
+ /* When changing a call into a noreturn call, cfg cleanup
+is needed to fix up the noreturn call.  */
+ if (!was_noreturn
+ && is_gimple_call (stmt) && gimple_call_noreturn_p (stmt))
+   el_to_fixup.safe_push  (stmt);
+ /* When changing a condition or switch into one we know what
+edge will be executed, schedule a cfg cleanup.  */
+ if ((gimple_code (stmt) == GIMPLE_COND
+  && (gimple_cond_true_p (as_a  (stmt))
+  || gimple_cond_false_p (as_a  (stmt
+ || (gimple_code (stmt) == GIMPLE_SWITCH
+ && TREE_CODE (gimple_switch_index
+ (as_a  (stmt))) == INTEGER_CST))
+   el_todo |= TODO_cleanup_cfg;
  /* If we removed EH side-effects from the statement, clean
 its EH information.  */
  if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt))
Index: gcc/tree-ssa-sccvn.h
===
--- gcc/tree-ssa-sccvn.h(revision 247965)
+++ gcc/tree-ssa-sccvn.h(working copy)
@@ -209,6 +209,7 @@ typ

Re: Go patches committed: merge recent changes to gofrontend

2017-05-12 Thread Ian Lance Taylor
On Wed, May 10, 2017 at 5:37 PM, Andrew Pinski  wrote:
> On Wed, May 10, 2017 at 10:26 AM, Ian Lance Taylor  wrote:
>> I have committed a large patch to update the Go frontend and libgo to
>> the recent changes in the gofrontend repository.  I had postponed
>> merging changes during the GCC 7 release process.  I am now merging
>> all the changes that were pending during that period.  Although this
>> is a merged patch, the changes can be seen individually in the
>> gofrontend repo (https://go.googlesource.com/gofrontend).  They are
>> also listed below.
>>
>> This is a fairly significant patch that brings in the concurrent
>> garbage collector used in the Go 1.8 runtime.  This significantly
>> reduces pauses due to garbage collection while running a Go program.
>>
>> This patch also brings in experimental support for AIX for gccgo,
>> contributed by Matthieu Sarter and others at Atos Infogérance.
>>
>> The actual patch is too large for this e-mail patch, but I have
>> attached all the changes to the gcc/go directory.
>>
>> Ian
>
>
> This causes a build failure on aarch64-linux-gnu:
> ../../../gcc/libgo/runtime/proc.c: In function ‘runtime_malg’:
> ../../../gcc/libgo/runtime/proc.c:729:43: warning: implicit
> declaration of function ‘mstats’; did you mean ‘mstart1’?
> [-Wimplicit-function-declaration]
> void *p = runtime_sysAlloc(stacksize, &mstats()->other_sys);
>^~
>mstart1
> ../../../gcc/libgo/runtime/proc.c:729:51: error: invalid type argument
> of ‘->’ (have ‘int’)
> void *p = runtime_sysAlloc(stacksize, &mstats()->other_sys);

Sorry about that.  I intended to test on a non-split-stack system, but
I forgot.  This patch fixes the build problem and lets most of the
testsuite pass.  I still see two test failures on non-split-stack that
I need to look into (index0-out and runtime on 32-bit systems).  Patch
tested on x86_64-pc-linux-gnu with and without split-stack enabled.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 247948)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-3c1258156a2ae483c5cc523cb7a3c3374cbe7c2c
+d5bfa6cebb19a154cbfbc53f6e647d2ca7adef68
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/runtime/runtime2.go
===
--- libgo/go/runtime/runtime2.go(revision 247931)
+++ libgo/go/runtime/runtime2.go(working copy)
@@ -793,3 +793,10 @@ type g_ucontext_t [(_sizeof_ucontext_t +
 // sigset is the Go version of the C type sigset_t.
 // _sigset_t is defined by the Makefile from .
 type sigset _sigset_t
+
+// getMemstats returns a pointer to the internal memstats variable,
+// for C code.
+//go:linkname getMemstats runtime.getMemstats
+func getMemstats() *mstats {
+   return &memstats
+}
Index: libgo/runtime/proc.c
===
--- libgo/runtime/proc.c(revision 247848)
+++ libgo/runtime/proc.c(working copy)
@@ -406,6 +406,8 @@ extern void globrunqput(G*)
   __asm__(GOSYM_PREFIX "runtime.globrunqput");
 extern P* pidleget(void)
   __asm__(GOSYM_PREFIX "runtime.pidleget");
+extern struct mstats* getMemstats(void)
+  __asm__(GOSYM_PREFIX "runtime.getMemstats");
 
 bool runtime_isstarted;
 
@@ -726,7 +728,7 @@ runtime_malg(bool allocatestack, bool si
 // 32-bit mode, the Go allocation space is all of
 // memory anyhow.
if(sizeof(void*) == 8) {
-   void *p = runtime_sysAlloc(stacksize, 
&mstats()->other_sys);
+   void *p = runtime_sysAlloc(stacksize, 
&getMemstats()->stacks_sys);
if(p == nil)
runtime_throw("runtime: cannot allocate memory 
for goroutine stack");
*ret_stack = (byte*)p;
Index: libgo/runtime/stack.c
===
--- libgo/runtime/stack.c   (revision 247848)
+++ libgo/runtime/stack.c   (working copy)
@@ -80,7 +80,6 @@ static void doscanstack1(G *gp, void *gc
scanstackblock(sp, (uintptr)(spsize), gcw);
}
 #else
-   M *mp;
byte* bottom;
byte* top;
 


Re: Go patches committed: merge recent changes to gofrontend

2017-05-12 Thread Ian Lance Taylor
On Fri, May 12, 2017 at 3:25 AM, Thomas Schwinge
 wrote:
>
> This doesn't block me in any way, but I wanted to report it anyway:
>
> On Wed, 10 May 2017 10:26:15 -0700, Ian Lance Taylor  wrote:
>> I have committed a large patch to update the Go frontend and libgo to
>> the recent changes in the gofrontend repository.
>
> Doing an incremental rebuild, that ran into:
>
> [...]/source-gcc/libgo/go/runtime/heapdump.go:379:14: error: reference to 
> undefined identifier 'sys.Goexperiment'
>   dumpstr(sys.Goexperiment)
>   ^
> make[3]: *** [runtime.lo] Error 1
> make[3]: Leaving directory `[...]/build-gcc/x86_64-pc-linux-gnu/libgo'
> make[2]: *** [all-recursive] Error 1
> make[2]: Leaving directory `[...]/build-gcc/x86_64-pc-linux-gnu/libgo'
> make[1]: *** [all] Error 2
> make[1]: Leaving directory `[...]/build-gcc/x86_64-pc-linux-gnu/libgo'
> make: *** [all-target-libgo] Error 2
>
> Removing "x86_64-pc-linux-gnu/libgo", and rebuilding, it passed.
>
> There is:
>
>> commit eab2960aee91d3e3a6baa5b1bce01262d24c714f
>> Author: Ian Lance Taylor 
>> Date:   Thu Apr 20 17:08:19 2017 -0700
>>
>> runtime/internal/sys: define Goexperiment
>>
>> The gc toolchain defines Goexperiment based on the environment
>> variable GOEXPERIMENT when the toolchain is built.  We just always set
>> Goexperiment to the empty string.
>>
>> Reviewed-on: https://go-review.googlesource.com/41292
>
> | diff --git a/libgo/Makefile.am b/libgo/Makefile.am
> | index f600a83..f4bf2bc 100644
> | --- a/libgo/Makefile.am
> | +++ b/libgo/Makefile.am
> |
> | @@ -512,6 +512,7 @@
> |   echo "package sys" > version.go.tmp
> |   echo 'const DefaultGoroot = "$(prefix)"' >> version.go.tmp
> |   echo 'const TheVersion = "'`cat $(srcdir)/VERSION | sed 1q`' '`$(GOC) 
> --version | sed 1q`'"' >> version.go.tmp
> | + echo 'const Goexperiment = ``' >> version.go.tmp
> |   echo 'const GOARCH = "'$(GOARCH)'"' >> version.go.tmp
> |   echo 'const GOOS = "'$(GOOS)'"' >> version.go.tmp
> |   echo 'const GccgoToolDir = "$(libexecsubdir)"' >> version.go.tmp
>
> In the failed build's tree, I do see that the Makefile has been renewed,
> but the version.go file has not yet been, at the time the build failed.
> Is there some missing dependency, or should that be implicit?
> libgo/go/runtime/heapdump.go also is new, and is the only user of
> sys.Goexperiment, as far as I can tell.

Thanks.  I don't know what would cause that.  The dependencies in
libgo/Makefile.am look right to me.  They do the same kind of thing as
I see in gcc/Makefile.in (for example, for cs-config.h).  If anybody
has any ideas, please let me know.

Ian


[PATCH v2 01/N] Add default value for last argument of dump functions.

2017-05-12 Thread Martin Liška
Hello.

After fiddling with that, I decided to come up with reworked first part
of the patch and eventual translation to a more hierarchical structure
is subject for discussion.

This first patch adds default for couple of dump functions and it helps in 
future
to transform 0 to a TDF_NONE (or whatever we call it).

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin


0001-Add-default-value-for-last-argument-of-dump-function.patch.bz2
Description: application/bzip


[PATCH v2 2/N] Introduce dump_flags_t type and use it instead of int, type.

2017-05-12 Thread Martin Liška
Second part changes 'int flags' to a new typedef.
All corresponding interfaces have been changed.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
>From fdcd3bc0c76140d8d2e28cd4b8d15cc55ddba4b0 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 1 Feb 2017 15:34:52 +0100
Subject: [PATCH 2/3] Introduce dump_flags_t type and use it instead of int
 type.

gcc/cp/ChangeLog:

2017-05-12  Martin Liska  

	* class.c (dump_class_hierarchy): Introduce dump_flags_t type and
	use it instead of int type.
	(dump_vtable): Likewise.
	(dump_vtt): Likewise.
	* decl2.c (dump_tu): Likewise.

gcc/c-family/ChangeLog:

2017-05-12  Martin Liska  

	* c-common.h: Introduce dump_flags_t type and
	use it instead of int type.
	* c-gimplify.c (c_genericize): Likewise.
	* c-opts.c: Likewise.

gcc/c/ChangeLog:

2017-05-12  Martin Liska  

	* c-decl.c (c_parse_final_cleanups): Introduce dump_flags_t type and
	use it instead of int type.

gcc/ChangeLog:

2017-05-12  Martin Liska  

	* cfg.c: Introduce dump_flags_t type and
	use it instead of int type.
	* cfg.h: Likewise.
	* cfghooks.c: Likewise.
	* cfghooks.h (struct cfg_hooks): Likewise.
	* cfgrtl.c: Likewise.
	* cfgrtl.h: Likewise.
	* cgraph.c (cgraph_node::get_body): Likewise.
	* coretypes.h: Likewise.
	* domwalk.c: Likewise.
	* domwalk.h: Likewise.
	* dumpfile.c (struct dump_option_value_info): Likewise.
	(dump_enable_all): Likewise.
	(dump_switch_p_1): Likewise.
	(opt_info_switch_p): Likewise.
	* dumpfile.h (enum tree_dump_index): Likewise.
	(struct dump_file_info): Likewise.
	* genemit.c: Likewise.
	* generic-match-head.c: Likewise.
	* gengtype.c (open_base_files): Likewise.
	* gimple-pretty-print.c: Likewise.
	* gimple-pretty-print.h: Likewise.
	* graph.c (print_graph_cfg): Likewise.
	* graphite-scop-detection.c (dot_all_sese): Likewise.
	* ipa-devirt.c (build_type_inheritance_graph): Likewise.
	* loop-unroll.c (report_unroll): Likewise.
	* passes.c (pass_manager::register_one_dump_file): Likewise.
	* print-tree.c: Likewise.
	* statistics.c: Likewise.
	* tree-cfg.c: Likewise.
	* tree-cfg.h: Likewise.
	* tree-dfa.c: Likewise.
	* tree-dfa.h: Likewise.
	* tree-dump.c (dump_function): Likewise.
	* tree-dump.h (struct dump_info): Likewise.
	* tree-pretty-print.c: Likewise.
	* tree-pretty-print.h: Likewise.
	* tree-ssa-live.c: Likewise.
	* tree-ssa-live.h: Likewise.
	* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Likewise.
	* tree-vect-loop.c: Likewise.
	* tree-vect-slp.c: Likewise.
---
 gcc/c-family/c-common.h   |   2 +-
 gcc/c-family/c-gimplify.c |   2 +-
 gcc/c-family/c-opts.c |   6 +-
 gcc/c/c-decl.c|   2 +-
 gcc/cfg.c |   8 +--
 gcc/cfg.h |   6 +-
 gcc/cfghooks.c|   4 +-
 gcc/cfghooks.h|   6 +-
 gcc/cfgrtl.c  |   8 +--
 gcc/cfgrtl.h  |   4 +-
 gcc/cgraph.c  |   2 +-
 gcc/coretypes.h   |   1 +
 gcc/cp/class.c|  14 ++---
 gcc/cp/decl2.c|   2 +-
 gcc/domwalk.c |   2 +-
 gcc/domwalk.h |   2 +-
 gcc/dumpfile.c|  52 +
 gcc/dumpfile.h|  43 --
 gcc/genemit.c |   1 -
 gcc/generic-match-head.c  |   1 -
 gcc/gengtype.c|   3 +-
 gcc/gimple-pretty-print.c | 127 +-
 gcc/gimple-pretty-print.h |  10 ++--
 gcc/graph.c   |   4 +-
 gcc/graphite-scop-detection.c |   4 +-
 gcc/ipa-devirt.c  |   2 +-
 gcc/loop-unroll.c |   2 +-
 gcc/passes.c  |   3 +-
 gcc/print-tree.c  |   2 +-
 gcc/statistics.c  |   2 +-
 gcc/tree-cfg.c|  10 ++--
 gcc/tree-cfg.h|   4 +-
 gcc/tree-dfa.c|   2 +-
 gcc/tree-dfa.h|   2 +-
 gcc/tree-dump.c   |   4 +-
 gcc/tree-dump.h   |   4 +-
 gcc/tree-pretty-print.c   |  39 ++---
 gcc/tree-pretty-print.h   |  18 +++---
 gcc/tree-ssa-live.c   |   8 +--
 gcc/tree-ssa-live.h   |   6 +-
 gcc/tree-ssa-loop-ivcanon.c   |   2 +-
 gcc/tree-vect-loop.c  |   2 +-
 gcc/tree-vect-slp.c   |   2 +-
 43 files changed, 236 insertions(+), 194 deletions(-)

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 9e3982d1ce8..ded0c37bc8a 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -904,7 +904,7 @@ extern bool c_common_post_options (const char **);
 extern bool c_common_init (void);
 extern void c_common_finish (void);
 extern void c_common_parse_file (void);
-extern FILE *get_dump_info (int, int *);
+extern FILE *get_dump_info (int, dump_flags_t *);
 extern alias_set_type c_common_get_alias_set (tree);
 extern void c_register_builtin_type (tree, const char*);
 extern bool c_promoting_integer_type_p (const_tree);
diff --git a/gc

[PATCH v2 3/N] Transform TDF_{lang,tree,ipa,rtl} to dump_kind enum.

2017-05-12 Thread Martin Liška
Third part removes TDF_* flags mentioned in the subject. These flags are used
to enable all passes of specific type and Nathan has recently separated these
by a new pair of macros. I hope moving these to a separate enum will help even 
more.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
>From 33ae777e1dc3b6026299fd19b5f2d80e60f3ca4a Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 10 May 2017 14:32:26 +0200
Subject: [PATCH 3/3] Transform TDF_{lang,tree,ipa,rtl} to dump_kind enum.

gcc/ChangeLog:

2017-05-12  Martin Liska  

	* cfgrtl.c (rtl_verify_edges): Remove usage of TDF_RTL.
	* dumpfile.c (dump_register): Use new enum dump_kind.
	(get_dump_file_name): Likewise.
	(dump_enable_all): Likewise.
	(dump_switch_p_1): Likewise.
	(enable_rtl_dump_file): Remove usage of TDF_RTL.
	* dumpfile.h (enum dump_kind): New enum type.
	(struct dump_file_info): Create constructor and
	format fields and comments.
	* passes.c (pass_manager::register_one_dump_file):
	Use num dump_kind.
	* statistics.c (statistics_early_init): Likewise.
	* tree-ssa-loop-prefetch.c (dump_mem_details): Replace
	TDF_TREE with TDF_SLIM.
	(gather_memory_references_ref): Likewise.
---
 gcc/cfgrtl.c |  2 +-
 gcc/dumpfile.c   | 84 +-
 gcc/dumpfile.h   | 87 +++-
 gcc/loop-unroll.c|  4 +-
 gcc/passes.c | 10 ++---
 gcc/statistics.c |  2 +-
 gcc/tree-ssa-loop-ivcanon.c  |  2 +-
 gcc/tree-ssa-loop-prefetch.c |  8 ++--
 8 files changed, 111 insertions(+), 88 deletions(-)

diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c
index ead82d23166..aad02921392 100644
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -2527,7 +2527,7 @@ rtl_verify_edges (void)
 	&& JUMP_P (BB_END (bb))
 	&& CROSSING_JUMP_P (BB_END (bb)))
   {
-print_rtl_with_bb (stderr, get_insns (), TDF_RTL | TDF_BLOCKS | TDF_DETAILS);
+	print_rtl_with_bb (stderr, get_insns (), TDF_BLOCKS | TDF_DETAILS);
 error ("Region crossing jump across same section in bb %i",
bb->index);
 err = 1;
diff --git a/gcc/dumpfile.c b/gcc/dumpfile.c
index ee967d825f4..ccc6db3077a 100644
--- a/gcc/dumpfile.c
+++ b/gcc/dumpfile.c
@@ -47,37 +47,41 @@ FILE *alt_dump_file = NULL;
 const char *dump_file_name;
 dump_flags_t dump_flags;
 
+dump_file_info::dump_file_info (): suffix (NULL), swtch (NULL), glob (NULL),
+  pfilename (NULL), alt_filename (NULL), pstream (NULL), alt_stream (NULL),
+  dkind (DK_none), pflags (), alt_flags (0), optgroup_flags (0),
+  pstate (0), num (0), owns_strings (false), graph_dump_initialized (false)
+{
+}
+
+dump_file_info::dump_file_info (const char *_suffix, const char *_swtch,
+dump_kind _dkind, int _num):
+  suffix (_suffix), swtch (_swtch), glob (NULL),
+  pfilename (NULL), alt_filename (NULL), pstream (NULL), alt_stream (NULL),
+  dkind (_dkind), pflags (), alt_flags (0), optgroup_flags (0),
+  pstate (0), num (_num), owns_strings (false), graph_dump_initialized (false)
+{
+}
+
 /* Table of tree dump switches. This must be consistent with the
TREE_DUMP_INDEX enumeration in dumpfile.h.  */
 static struct dump_file_info dump_files[TDI_end] =
 {
-  {NULL, NULL, NULL, NULL, NULL, NULL, NULL, 0, 0, 0, 0, 0, 0, false, false},
-  {".cgraph", "ipa-cgraph", NULL, NULL, NULL, NULL, NULL, TDF_IPA,
-   0, 0, 0, 0, 0, false, false},
-  {".type-inheritance", "ipa-type-inheritance", NULL, NULL, NULL, NULL, NULL, TDF_IPA,
-   0, 0, 0, 0, 0, false, false},
-  {".ipa-clones", "ipa-clones", NULL, NULL, NULL, NULL, NULL, TDF_IPA,
-   0, 0, 0, 0, 0, false, false},
-  {".tu", "translation-unit", NULL, NULL, NULL, NULL, NULL, TDF_LANG,
-   0, 0, 0, 0, 1, false, false},
-  {".class", "class-hierarchy", NULL, NULL, NULL, NULL, NULL, TDF_LANG,
-   0, 0, 0, 0, 2, false, false},
-  {".original", "tree-original", NULL, NULL, NULL, NULL, NULL, TDF_TREE,
-   0, 0, 0, 0, 3, false, false},
-  {".gimple", "tree-gimple", NULL, NULL, NULL, NULL, NULL, TDF_TREE,
-   0, 0, 0, 0, 4, false, false},
-  {".nested", "tree-nested", NULL, NULL, NULL, NULL, NULL, TDF_TREE,
-   0, 0, 0, 0, 5, false, false},
+  dump_file_info (),
+  dump_file_info (".cgraph", "ipa-cgraph", DK_ipa, 0),
+  dump_file_info (".type-inheritance", "ipa-type-inheritance", DK_ipa, 0),
+  dump_file_info (".ipa-clones", "ipa-clones", DK_ipa, 0),
+  dump_file_info (".tu", "translation-unit", DK_lang, 1),
+  dump_file_info (".class", "class-hierarchy", DK_lang, 2),
+  dump_file_info (".original", "tree-original", DK_tree, 3),
+  dump_file_info (".gimple", "tree-gimple", DK_tree, 4),
+  dump_file_info (".nested", "tree-nested", DK_tree, 5),
 #define FIRST_AUTO_NUMBERED_DUMP 6
 
-  {NULL, "lang-all", NULL, NULL, NULL, NULL, NULL, TDF_LANG,
-   0, 0, 0, 0, 0, false, false},
-  {NULL, "tree-all", NULL, NULL, NULL, NULL, NULL, TDF_TREE,
-   0, 0, 0, 0, 0, false, false},
-  {NULL, "rtl-all

libbacktrace patch: fix memory allocation buglet on DWARF line table with zero dirs

2017-05-12 Thread Than McIntosh via gcc-patches
Hello,

The attached patch to libbacktrace is intended to fix a memory
allocation bug involving reading of line table information.

The scenario of interest takes place when libbacktrace reads a DWARF
line table whose directory count is zero (an unusual case). If the
memory allocator invocation triggers a call to mmap, this can cause
the size passed to mmap to be zero, resulting in an EINVAL error.  The
fix is to detect the zero-directory case and avoid invoking the
allocation helper with a zero size.

Questions + comments welcome.

Thanks, Than

---

diff --git a/libbacktrace/dwarf.c b/libbacktrace/dwarf.c
index 80c64034092..bcb2c0991ef 100644
--- a/libbacktrace/dwarf.c
+++ b/libbacktrace/dwarf.c
@@ -1563,16 +1563,15 @@ add_line (struct backtrace_state *state,
struct dwarf_data *ddata,
   return 1;
 }

-/* Free the line header information.  If FREE_FILENAMES is true we
-   free the file names themselves, otherwise we leave them, as there
-   may be line structures pointing to them.  */
+/* Free the line header information.  */

 static void
 free_line_header (struct backtrace_state *state, struct line_header *hdr,
   backtrace_error_callback error_callback, void *data)
 {
-  backtrace_free (state, hdr->dirs, hdr->dirs_count * sizeof (const char *),
-  error_callback, data);
+  if (hdr->dirs_count != 0)
+backtrace_free (state, hdr->dirs, hdr->dirs_count * sizeof (const char *),
+error_callback, data);
   backtrace_free (state, hdr->filenames,
   hdr->filenames_count * sizeof (char *),
   error_callback, data);
@@ -1633,12 +1632,16 @@ read_line_header (struct backtrace_state
*state, struct unit *u,
   ++hdr->dirs_count;
 }

-  hdr->dirs = ((const char **)
-   backtrace_alloc (state,
- hdr->dirs_count * sizeof (const char *),
- line_buf->error_callback, line_buf->data));
-  if (hdr->dirs == NULL)
-return 0;
+  hdr->dirs = NULL;
+  if (hdr->dirs_count != 0)
+{
+  hdr->dirs = ((const char **)
+   backtrace_alloc (state,
+hdr->dirs_count * sizeof (const char *),
+line_buf->error_callback, line_buf->data));
+  if (hdr->dirs == NULL)
+return 0;
+}

   i = 0;
   while (*hdr_buf.buf != '\0')


Re: [PATCH] Add sequence check to leaf_function_p

2017-05-12 Thread Alexander Monakov
On Fri, 12 May 2017, Wilco Dijkstra wrote:

> This is a followup from: 
> https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02916.html
> 
> Add an assert to leaf_function_p to ensure it is not called from a
> prolog or epilog sequence (which would incorrectly return true in a
> non-leaf function).

As I understand, we need to ensure that get_insns call retrieves the topmost
sequence corresponding to the function body, not any current subsequence that
could have been started via start_sequence.  Therefore the 'prolog or epilog'
part is a bit misleading, we could be in a subsequence for other reasons, and
we need to reject those as well.  So, ...

> diff --git a/gcc/final.c b/gcc/final.c
> index 
> 820162b2d28d734901375017cf0c7a3095e8903e..c9aa610d2696738342f61b9c944a7a2f18e7497c
>  100644
> --- a/gcc/final.c
> +++ b/gcc/final.c
> @@ -4309,6 +4309,9 @@ leaf_function_p (void)
>  {
>rtx_insn *insn;
>  
> +  /* Check we are not in a prologue or epilogue sequence.  */
> +  gcc_assert (!in_sequence_p ());
> +

... can the comment please be reworded to match the code, if it's necessary to
have a comment here at all?  E.g. "Ensure we walk the entire function body after
the following get_insns call".

Thanks.
Alexander


[gomp4] Handle parallel reductions explicitly initialized by the user

2017-05-12 Thread Cesar Philippidis
This patch resolves an ICE inside the nvptx BE involving reduction
variables which are initialized by the user. E.g.

  #pragma acc parallel reduction(+:var)
  {
var = 1;

#pragma acc loop reduction(+:var)
...

Currently, the nvptx BE expects the internal function GOACC_REDUCTION to
have a LHS argument containing the reduction variable to be initialized,
however in this case that would be overly aggressive. So when the user
initializes the reduction variable explicitly to an immediate constant,
the optimizer will propagate that value to the reduction initializer
associated with inner the acc loop directly and remove the LHS for the
outer parallel reduction initializer. This shouldn't be a problem here
because during omp lowering, reduction variables are initialized just
outside of the loop. So even if the loop is a empty, the outer parallel
reduction will still be initialized properly.

The other approach I supposed would be to inhibit constant propagation
on reduction on those GOACC internal functions. But that is probably too
restrictive.

I applied this patch to gomp-4_0-branch. Bernd, I believe that trunk may
have a similar problem. If does, is this OK for trunk? I'll hopefully
start working on trunk again in a week or so.

Cesar

2017-05-12  Cesar Philippidis  

	gcc/
	* config/nvptx/nvptx.c (nvptx_goacc_reduction_init): Don't update
	LHS if it doesn't exist.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-3.c: New test.


diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index f790728..bd1236b 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4948,7 +4948,11 @@ nvptx_goacc_reduction_init (gcall *call)
 	init = var;
 	}
 
-  gimplify_assign (lhs, init, &seq);
+  /* The LHS may be NULL if a reduction variable on a parallel
+	 construct is initialized to some constant inside the parallel
+	 region.  */
+  if (lhs)
+	gimplify_assign (lhs, init, &seq);
 }
 
   pop_gimplify_context (NULL);
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-3.c
new file mode 100644
index 000..6510143
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-3.c
@@ -0,0 +1,31 @@
+/* Check a parallel reduction which is are explicitly initialized by
+   the user.  */
+
+/* { dg-do run } */
+
+#include 
+
+int
+main ()
+{
+  int n = 10;
+  float accel = 1.0, host = 1.0;
+  int i;
+
+#pragma acc parallel copyin(n) reduction(*:accel)
+  {
+accel = 1.0;
+#pragma acc loop gang reduction(*:accel)
+for( i = 1; i <= n; i++)
+  {
+	accel *= 2.0;
+  }
+  }
+
+  for (i = 1; i <= n; i++)
+host *= 2.0;
+
+  assert (accel == host);
+
+  return 0;
+}


Re: [PATCH, GCC/ARM, Stage 1] Add missing TARGET_32BIT conditional to movsi

2017-05-12 Thread Prakhar Bahuguna
On 11/05/2017 10:58:52, Kyrill Tkachov wrote:
> 
> On 11/05/17 10:56, Prakhar Bahuguna wrote:
> > Resolve the regressions introduced on non-Thumb targets by the Purecode for
> > ARMv8-M Baseline patch. The TARGET_32BIT conditional has been re-added to 
> > the
> > movsi expander and splitter in addition to TARGET_HAVE_MOVT.
> > 
> > gcc/ChangeLog:
> > 
> > 2017-05-11  Prakhar Bahuguna  
> > 
> > * config/arm/arm.md (movsi): Add TARGET_32BIT in addition to the
> > TARGET_HAVE_MOVT conditional.
> > (movt splitter): Likewise.
> > 
> > Testing done: Full regression testing for ARMv5 (including XScale), ARMv7-A,
> > ARMv7-M and ARMv8-M. The failing tests in particular 
> > (gcc.target/arm/scd42-2.c
> > and g++.dg/torture/vshuf-v4si.c) now pass and no further regressions were
> > found.
> 
> Ok if a bootstrap on arm-none-linux-gnueabihf target passes as well.
> 
> Thanks,
> Kyrill

Patch was committed in r247971 with one trivial change (brackets around the
conditionals), and an arm-none-linux-gnueabihf bootstrap was successful with
this modified patch.

-- 

Prakhar Bahuguna


[PATCH] PR libstdc++/78939 make tuple_size depend on tuple_size

2017-05-12 Thread Jonathan Wakely

This solves a conflict between late changes to the C++17 library and
core language, which make tuple_size::value cause errors for
const structured binding declarations. The problem is that LWG 2770
wants tuple_size to be complete, but sometimes have no value
member, but the core language for structured bindings barfs on a
complete type with no value member.

It's possible this will be fixed in core (e.g. by only trying to use
tuple_size when it's complete _and_ has a usable value member) but for
now let's just fix it in the library. This is consistent with the
solution in libc++ too.

PR libstdc++/78939
* include/std/utility (tuple_size): Only define partial
specializations when tuple_size::value is valid.
* testsuite/20_util/tuple/78939.cc: New.
* testsuite/20_util/tuple/cv_tuple_size_neg.cc: New.

Tested powerpc64le-linux, committed to trunk.

commit f72d64f99385f51efb8f437618f084335fe64f89
Author: Jonathan Wakely 
Date:   Wed Mar 8 03:07:48 2017 +

PR libstdc++/78939 make tuple_size depend on tuple_size

PR libstdc++/78939
* include/std/utility (tuple_size): Only define partial
specializations when tuple_size::value is valid.
* testsuite/20_util/tuple/78939.cc: New.
* testsuite/20_util/tuple/cv_tuple_size_neg.cc: New.

diff --git a/libstdc++-v3/include/std/utility b/libstdc++-v3/include/std/utility
index 188fcc2..c18bcb6 100644
--- a/libstdc++-v3/include/std/utility
+++ b/libstdc++-v3/include/std/utility
@@ -88,24 +88,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct tuple_size;
 
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
-  // 2770. tuple_size specialization is not SFINAE compatible
-  template
-struct __tuple_size_cv_impl { };
-
-  template
-struct __tuple_size_cv_impl<_Tp, 
__void_t::value)>>
-: integral_constant::value> { };
-
-  // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 2313. tuple_size should always derive from integral_constant
-  template
-struct tuple_size : __tuple_size_cv_impl<_Tp> { };
+  // 2770. tuple_size specialization is not SFINAE compatible
+
+  template::type,
+  typename = typename enable_if::value>::type,
+  size_t = tuple_size<_Tp>::value>
+using __enable_if_has_tuple_size = _Tp;
 
   template
-struct tuple_size : __tuple_size_cv_impl<_Tp> { };
+struct tuple_size>
+: public tuple_size<_Tp> { };
 
   template
-struct tuple_size : __tuple_size_cv_impl<_Tp> { };
+struct tuple_size>
+: public tuple_size<_Tp> { };
+
+  template
+struct tuple_size>
+: public tuple_size<_Tp> { };
 
   /// Gives the type of the ith element of a given tuple type.
   template
diff --git a/libstdc++-v3/testsuite/20_util/tuple/78939.cc 
b/libstdc++-v3/testsuite/20_util/tuple/78939.cc
new file mode 100644
index 000..bab143b
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/tuple/78939.cc
@@ -0,0 +1,49 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++17" }
+// { dg-do compile { target c++1z } }
+
+// PR libstdc++/78939
+
+#include 
+
+struct A { int i, j; };
+
+int
+test01()
+{
+  A a{};
+  const auto [i, j] = a;
+  return i + j;
+}
+
+int
+test02()
+{
+  A a{};
+  volatile auto [i, j] = a;
+  return i + j;
+}
+
+int
+test03()
+{
+  A a{};
+  const volatile auto [i, j] = a;
+  return i + j;
+}
diff --git a/libstdc++-v3/testsuite/20_util/tuple/cv_tuple_size_neg.cc 
b/libstdc++-v3/testsuite/20_util/tuple/cv_tuple_size_neg.cc
new file mode 100644
index 000..450ed89
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/tuple/cv_tuple_size_neg.cc
@@ -0,0 +1,35 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public Li

Re: [PATCH 0/2] [ARM] PR61551 addressing mode costs

2017-05-12 Thread Charles Baylis
Ping.

Original thread: https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01314.html

(I will fix the typos which Bernhard found before submitting)

On 21 February 2017 at 16:54,   wrote:
> From: Charles Baylis 
>
> Hi Ramana,
>
> This patch set continues previous work on fixing the cost calculations for 
> MEMs
> which use different addressing modes. It implements the approach we discussed
> at Linaro Connect BKK16.
>
> I have included some notes on the patch set as follows:
>
>
> Background:
>
> The motivating problem is that this function:
>   char *f(char *p, int8x8x4_t v, int r) { vst4_s8(p, v); p+=32; return p; }
> compiles to:
> mov r3, r0
> addsr0, r0, #32
> vst4.8  {d0-d3}, [r3]
> bx  lr
> but we would like to get:
> vst4.8  {d0-d3}, [r0]!
> bx  lr
>
> Although the ARM back end contains patterns for the write-back forms of these
> instructions, they are not currently generated. The reason for this is that 
> the
> auto-inc-dec phase does not perform this optimisation because arm_rtx_costs
> incorrectly calculates the cost of "vst4.8  {d0-d3}, [r0]!" as much higher 
> than
> "vst4.8  {d0-d3}, [r3]". For that reason, it considers the POST_INC form to be
> worse than the initial sequence of vst4/add and does not perform the
> transformation.
>
> In fact, GCC6 has regressions compared to GCC5 in this area, and no longer
> does post-indexed addressing for int64_t or 64 bit vector types.
>
>
> Solution:
>
> Change cost calculation for MEMs so that the cost of the memory access
> is computed separately from the cost of the addressing mode. A new
> table-driven mechanism is introduced for the costs of the addressing modes.
>
> The first patch in the series implements the calculation of the cost of
> the memory access.
>
> The second patch adds the table-driven model of the extra cost of the
> selected addressing mode. I don't have access to a lot of CPU pipeline
> information, so most CPUs use the generic cost table, with the exception of
> Cortex-A57.
>
>
> Testing:
>
> I did "make check" on arm-linux-gnueabihf with qemu. This patch fixes one test
> failure in lp1243022.c.
>
>
> Benchmarking:
>
> On Cortex-A15, SPEC2006 and a popular suite of embedded benchmarks perform the
> same as before this patch is applied.  This is expected, the expected gain is
> in code quality for hand-written NEON intrinsics code.
>
>
>
> Charles Baylis (2):
>   [ARM] Refactor costs calculation for MEM.
>   [ARM] Add table of costs for AAarch32 addressing modes.
>
>  gcc/config/arm/aarch-common-protos.h |  16 +
>  gcc/config/arm/aarch-cost-tables.h   |  54 ++--
>  gcc/config/arm/arm.c | 120 
> ++-
>  3 files changed, 154 insertions(+), 36 deletions(-)
>
> --
> 2.7.4
>


Re: [PATCH] PR libstdc++/78939 make tuple_size depend on tuple_size

2017-05-12 Thread Jonathan Wakely

On 12/05/17 15:49 +0100, Jonathan Wakely wrote:

This solves a conflict between late changes to the C++17 library and
core language, which make tuple_size::value cause errors for
const structured binding declarations. The problem is that LWG 2770
wants tuple_size to be complete, but sometimes have no value
member, but the core language for structured bindings barfs on a
complete type with no value member.

It's possible this will be fixed in core (e.g. by only trying to use
tuple_size when it's complete _and_ has a usable value member) but for
now let's just fix it in the library. This is consistent with the
solution in libc++ too.

PR libstdc++/78939
* include/std/utility (tuple_size): Only define partial
specializations when tuple_size::value is valid.
* testsuite/20_util/tuple/78939.cc: New.
* testsuite/20_util/tuple/cv_tuple_size_neg.cc: New.

Tested powerpc64le-linux, committed to trunk.


This is the patch for gcc-7-branch, which only uses the new
tuple_size implementation for C++17 (because it's only needed to
make structured bindings work, and there's a chance this new
implementation could cause regressions for C++14 code).

Tested x86_64-linux, committed to gcc-7-branch.


commit ff2b5b263f9a7f7a48ae5de24d9d97bf81dc116c
Author: redi 
Date:   Fri May 12 14:43:11 2017 +

PR libstdc++/78939 make tuple_size depend on tuple_size

	PR libstdc++/78939
	* include/std/utility (tuple_size) [__cplusplus > 201402L]:
	Only define partial specializations when tuple_size::value is
	valid.
	* testsuite/20_util/tuple/78939.cc: New.

diff --git a/libstdc++-v3/include/std/utility b/libstdc++-v3/include/std/utility
index 188fcc2..1d9b8eb 100644
--- a/libstdc++-v3/include/std/utility
+++ b/libstdc++-v3/include/std/utility
@@ -89,6 +89,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 2770. tuple_size specialization is not SFINAE compatible
+
+#if __cplusplus <= 201402L
   template
 struct __tuple_size_cv_impl { };
 
@@ -106,6 +108,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
 struct tuple_size : __tuple_size_cv_impl<_Tp> { };
+#else
+  template::type,
+	   typename = typename enable_if::value>::type,
+	   size_t = tuple_size<_Tp>::value>
+using __enable_if_has_tuple_size = _Tp;
+
+  template
+struct tuple_size>
+: public tuple_size<_Tp> { };
+
+  template
+struct tuple_size>
+: public tuple_size<_Tp> { };
+
+  template
+struct tuple_size>
+: public tuple_size<_Tp> { };
+#endif
 
   /// Gives the type of the ith element of a given tuple type.
   template
diff --git a/libstdc++-v3/testsuite/20_util/tuple/78939.cc b/libstdc++-v3/testsuite/20_util/tuple/78939.cc
new file mode 100644
index 000..bab143b
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/tuple/78939.cc
@@ -0,0 +1,49 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++17" }
+// { dg-do compile { target c++1z } }
+
+// PR libstdc++/78939
+
+#include 
+
+struct A { int i, j; };
+
+int
+test01()
+{
+  A a{};
+  const auto [i, j] = a;
+  return i + j;
+}
+
+int
+test02()
+{
+  A a{};
+  volatile auto [i, j] = a;
+  return i + j;
+}
+
+int
+test03()
+{
+  A a{};
+  const volatile auto [i, j] = a;
+  return i + j;
+}


Re: [PATCH] Add sequence check to leaf_function_p

2017-05-12 Thread Wilco Dijkstra
Alexander Monakov wrote:
> As I understand, we need to ensure that get_insns call retrieves the topmost
> sequence corresponding to the function body, not any current subsequence that
> could have been started via start_sequence.  Therefore the 'prolog or epilog'
> part is a bit misleading, we could be in a subsequence for other reasons, and
> we need to reject those as well.  So, ...

It's most likely that if this assert triggers, it is from a prolog or epilog 
sequence.

> ... can the comment please be reworded to match the code, if it's necessary to
> have a comment here at all?  E.g. "Ensure we walk the entire function body 
> after
> the following get_insns call".

I've changed to to "Ensure we walk the entire function body."

Wilco

2017-05-11  Wilco Dijkstra  

* final.c (leaf_function_p): Check we are not in a sequence.
--

diff --git a/gcc/final.c b/gcc/final.c
index 
820162b2d28d734901375017cf0c7a3095e8903e..e61b78b0a915df748fcf3d73cd46db855cf7a1c6
 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -4309,6 +4309,9 @@ leaf_function_p (void)
 {
   rtx_insn *insn;
 
+  /* Ensure we walk the entire function body.  */
+  gcc_assert (!in_sequence_p ());
+
   /* Some back-ends (e.g. s390) want leaf functions to stay leaf
  functions even if they call mcount.  */
   if (crtl->profile && !targetm.keep_leaf_when_profiled ())


Go patch committed: pass current function to Backend::call_expression

2017-05-12 Thread Ian Lance Taylor
This patch by Than McIntosh passes in the calling (containing)
Bfunction when invoking the Backend method call_expression(), so as to
handle the case where generation of the call forces the creation of a
temp var within the calling function.  Bootstrapped and ran Go tests
on x86_64-pc-linux-gnu.  Committed to mainline.

Ian

2017-05-12  Than McIntosh  

* go-gcc.cc (Gcc_backend::call_expression): Add caller parameter.
Index: gcc/go/go-gcc.cc
===
--- gcc/go/go-gcc.cc(revision 247937)
+++ gcc/go/go-gcc.cc(working copy)
@@ -348,7 +348,8 @@ class Gcc_backend : public Backend
   array_index_expression(Bexpression* array, Bexpression* index, Location);
 
   Bexpression*
-  call_expression(Bexpression* fn, const std::vector& args,
+  call_expression(Bfunction* caller, Bexpression* fn,
+  const std::vector& args,
   Bexpression* static_chain, Location);
 
   Bexpression*
@@ -1892,9 +1893,11 @@ Gcc_backend::array_index_expression(Bexp
 
 // Create an expression for a call to FN_EXPR with FN_ARGS.
 Bexpression*
-Gcc_backend::call_expression(Bexpression* fn_expr,
+Gcc_backend::call_expression(Bfunction*, // containing fcn for call
+ Bexpression* fn_expr,
  const std::vector& fn_args,
- Bexpression* chain_expr, Location location)
+ Bexpression* chain_expr,
+ Location location)
 {
   tree fn = fn_expr->get_tree();
   if (fn == error_mark_node || TREE_TYPE(fn) == error_mark_node)
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 247967)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-d5bfa6cebb19a154cbfbc53f6e647d2ca7adef68
+2f21020c9f61b31bd04d5b814aaa27bf976bf07a
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/backend.h
===
--- gcc/go/gofrontend/backend.h (revision 247848)
+++ gcc/go/gofrontend/backend.h (working copy)
@@ -372,9 +372,11 @@ class Backend
   virtual Bexpression*
   array_index_expression(Bexpression* array, Bexpression* index, Location) = 0;
 
-  // Create an expression for a call to FN with ARGS.
+  // Create an expression for a call to FN with ARGS, taking place within
+  // caller CALLER.
   virtual Bexpression*
-  call_expression(Bexpression* fn, const std::vector& args,
+  call_expression(Bfunction *caller, Bexpression* fn,
+  const std::vector& args,
  Bexpression* static_chain, Location) = 0;
 
   // Return an expression that allocates SIZE bytes on the stack.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 247848)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -10290,13 +10290,16 @@ Call_expression::do_get_backend(Translat
   bfn = gogo->backend()->convert_expression(bft, bfn, location);
 }
 
-  Bexpression* call = gogo->backend()->call_expression(bfn, fn_args,
-  bclosure, location);
+  Bfunction* bfunction = NULL;
+  if (context->function())
+bfunction = context->function()->func_value()->get_decl();
+  Bexpression* call = gogo->backend()->call_expression(bfunction, bfn,
+   fn_args, bclosure,
+   location);
 
   if (this->results_ != NULL)
 {
   Bexpression* bcall_ref = this->call_result_ref(context);
-  Bfunction* bfunction = context->function()->func_value()->get_decl();
   Bstatement* assn_stmt =
   gogo->backend()->assignment_statement(bfunction,
 bcall_ref, call, location);
Index: gcc/go/gofrontend/gogo.cc
===
--- gcc/go/gofrontend/gogo.cc   (revision 247848)
+++ gcc/go/gofrontend/gogo.cc   (working copy)
@@ -708,8 +708,8 @@ Gogo::init_imports(std::vectorbackend()->function_code_expression(pfunc, unknown_loc);
   Bexpression* pfunc_call =
-   this->backend()->call_expression(pfunc_code, empty_args,
-NULL, unknown_loc);
+  this->backend()->call_expression(bfunction, pfunc_code, empty_args,
+   NULL, unknown_loc);
   init_stmts.push_back(this->backend()->expression_statement(bfunction,
  pfunc_call));
 }
@@ -1498,7 +1498,7 @@ Gogo::write_globals()
   Bfunction* initfn = func->get_or_make_decl(this, *p);
   Bexpression* func_code =
   this->backend()->function_co

[PATCH] Fix PR middle-end/80707, ICE: r247844 causes error: extra outgoing edge

2017-05-12 Thread Peter Bergner
My fix for PR51513 modified group_case_labels_stmt() to remove unreachable
case statements labels.  Being a middle-end newbie, I incorrectly thought
group_case_labels_stmt() was only called very early, before we have a cfg.
With -O3, we can generate extra copies of the switch statement, well after
the cfg exists, and we end up calling group_case_labels_stmt() to optimize
them.  In those cases, we need to remove their edges from the cfg.

This passes my bootstrap and regtesting on powerpc64le-linux and x86_64-linux
with no regressions.  In addition, both David and HJ confirm this fixes the
bootstrap issues they ran into.

Is this ok for trunk?

Peter

gcc/
PR middle-end/80707
* tree-cfg.c: Remove cfg edges of unreachable case statements.

gcc/testsuite/
* g++.dg/pr80707.C: New test.

Index: gcc/tree-cfg.c
===
--- gcc/tree-cfg.c  (revision 247845)
+++ gcc/tree-cfg.c  (working copy)
@@ -1684,6 +1684,10 @@ group_case_labels_stmt (gswitch *stmt)
  || (EDGE_COUNT (base_bb->succs) == 0
  && gimple_seq_unreachable_p (bb_seq (base_bb
{
+ edge e;
+ if (base_bb != default_bb
+ && (e = find_edge (gimple_bb (stmt), base_bb)) != NULL)
+   remove_edge_and_dominated_blocks (e);
  gimple_switch_set_label (stmt, i, NULL_TREE);
  i++;
  new_size--;
Index: gcc/testsuite/g++.dg/pr80707.C
===
--- gcc/testsuite/g++.dg/pr80707.C  (nonexistent)
+++ gcc/testsuite/g++.dg/pr80707.C  (working copy)
@@ -0,0 +1,29 @@
+// PR middle-end/80707 ICE: extra outgoing edge causes verify_flow_info error.
+// { dg-do compile }
+// { dg-options "-O3" } */
+
+struct A {
+  int m_fn1(int &) const;
+};
+int A::m_fn1(int &p1) const {
+  int a[6];
+  int b = 0;
+  for (int i;; i++) {
+if (a[i])
+  break;
+b++;
+  }
+  while (b) {
+int c;
+switch (b) {
+case 1:
+  c = 0;
+  break;
+case 5:
+  c = a[0];
+}
+if (c)
+  p1 = 0;
+b--;
+  }
+}



[PING**2] [PATCH] Force use of absolute path names for gcov

2017-05-12 Thread Bernd Edlinger
Ping...

On 04/28/17 19:41, Bernd Edlinger wrote:
> Ping...
>
> I attached a rebased patch file, with the doc changes and
> merge conflicts with trunk of today fixed, but otherwise
> identical.
>
>
> Thanks
> Bernd.
>
> On 04/21/17 22:26, Bernd Edlinger wrote:
>>
>>
>> On 04/21/17 21:50, Joseph Myers wrote:
>>> On Fri, 21 Apr 2017, Bernd Edlinger wrote:
>>>
 So I would like to add a -fprofile-abs-path option that
 forces absolute path names in gcno files, which allows gcov
 to get the true canonicalized source name.
>>>
>>> I don't see any actual documentation of this option in the patch (you
>>> add
>>> it to the summary list of options, and mention it in text under the
>>> documentation of --coverage, but don't have any actual @item
>>> -fprofile-abs-path / @opindex fprofile-abs-path paragraph with text
>>> describing what the option does).
>>>
>>
>> Ah yes, thanks.
>>
>> So I'll add one more sentence to invoke.texi:
>>
>> @@ -10696,6 +10713,12 @@
>>  generate test coverage data.  Coverage data matches the source files
>>  more closely if you do not optimize.
>>
>> +@item -fprofile-abs-path
>> +@opindex fprofile-abs-path
>> +Automatically convert relative source file names to absolute path names
>> +in the @file{.gcno} files.  This allows @command{gcov} to find the
>> correct
>> +sources in projects with multiple directories.
>> +
>>  @item -fprofile-dir=@var{path}
>>  @opindex fprofile-dir
>>
>>
>>
>>
>> Bernd.


[PING][PATCH][ PR rtl-optimization/79286] Drop may_trap_p exception to testing dominance in update_equiv_regs

2017-05-12 Thread Bernd Edlinger
Ping...

On 04/29/17 09:06, Bernd Edlinger wrote:
> On 04/28/17 20:46, Jeff Law wrote:
>> On 04/28/2017 11:27 AM, Bernd Edlinger wrote:

>>>
>>> Yes I agree, that is probably not worth it.  So I could try to remove
>>> the special handling of PIC+const and see what happens.
>>>
>>> However the SYMBOL_REF_FUNCTION_P is another story, that part I would
>>> like to keep: It happens quite often, already w/o -fpic that call
>>> statements are using SYMBOL_REFs to ordinary (not weak) function
>>> symbols, and may_trap returns 1 for these call statements wihch is IMHO
>>> wrong.
>> Hmm, thinking more about this, wasn't the original case a PIC referrence
>> for something like &x[BIGNUM].
>>
>> Perhaps we could consider a PIC reference without other arithmetic as
>> safe.  That would likely pick up the SYMBOL_REF_FUNCTION_P case you want
>> as well good deal many more PIC references as non-trapping.
>>
>
> Yes, I like this idea.
>
> I tried to compile openssl with -m32 -fpic as an example, and counted
> how often the mem[pic+const] is hit: that was 2353 times, all kind of
> object refs.
>
> Then I tried your idea, and only 54 unhandled pic refs remained, all of
> them looking like this:
>
> (plus:SI (reg:SI 107)
> (const:SI (plus:SI (unspec:SI [
> (symbol_ref:SI ("bf_init") [flags 0x2]  0x2ac00f7bac60 bf_init>)
> ] UNSPEC_GOTOFF)
> (const_int 4164 [0x1044]
>
> I believe that is a negligible fall out from such a big code base.
>
> Although the pic references do no longer reach the
> SYMBOL_REF_FUNCTION_P in this version of the patch, I still see
> that happening without -fpic option, so I left it as is.
>
>
> Attached is the new version of my patch.
>
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?
>
>
> Thanks
> Bernd.


[PING**3] [PATCH, ARM] correctly encode the CC reg data flow

2017-05-12 Thread Bernd Edlinger
Ping...

On 04/29/17 19:21, Bernd Edlinger wrote:
> Ping...
>
> On 04/20/17 20:11, Bernd Edlinger wrote:
>> Ping...
>>
>> for this patch:
>> https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01351.html
>>
>> On 01/18/17 16:36, Bernd Edlinger wrote:
>>> On 01/13/17 19:28, Bernd Edlinger wrote:
 On 01/13/17 17:10, Bernd Edlinger wrote:
> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>> Hi,
>>>
>>> this is related to PR77308, the follow-up patch will depend on this
>>> one.
>>>
>>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>> before reload, a mis-compilation in libgcc function
>>> __gnu_satfractdasq
>>> was discovered, see [1] for more details.
>>>
>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>> up into this:
>>>
>>>[(set (reg:CC CC_REGNUM)
>>>  (compare:CC (match_dup 0) (match_dup 1)))
>>> (parallel [(set (reg:CC CC_REGNUM)
>>> (compare:CC (match_dup 3) (match_dup 4)))
>>>(set (match_dup 2)
>>> (minus:SI (match_dup 5)
>>>  (ltu:SI (reg:CC_C CC_REGNUM) (const_int
>>> 0])]
>>>
>>>[(set (reg:CC CC_REGNUM)
>>>  (compare:CC (match_dup 2) (match_dup 3)))
>>> (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>(set (reg:CC CC_REGNUM)
>>> (compare:CC (match_dup 0) (match_dup 1]
>>>
>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>> redundant and thus got removed, because the data values are
>>> identical.
>>>
>>> I think that applies to a number of similar pattern where data
>>> flow is happening through the CC reg.
>>>
>>> So this is a kind of correctness issue, and should be fixed
>>> independently from the optimization issue PR77308.
>>>
>>> Therefore I think the patterns need to specify the true
>>> value that will be in the CC reg, in order for cse to
>>> know what the instructions are really doing.
>>>
>>>
>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>> Is it OK for trunk?
>>>
>>
>> I agree you've found a valid problem here, but I have some issues
>> with
>> the patch itself.
>>
>>
>> (define_insn_and_split "subdi3_compare1"
>>   [(set (reg:CC_NCV CC_REGNUM)
>> (compare:CC_NCV
>>   (match_operand:DI 1 "register_operand" "r")
>>   (match_operand:DI 2 "register_operand" "r")))
>>(set (match_operand:DI 0 "register_operand" "=&r")
>> (minus:DI (match_dup 1) (match_dup 2)))]
>>   "TARGET_32BIT"
>>   "#"
>>   "&& reload_completed"
>>   [(parallel [(set (reg:CC CC_REGNUM)
>>(compare:CC (match_dup 1) (match_dup 2)))
>>   (set (match_dup 0) (minus:SI (match_dup 1) (match_dup
>> 2)))])
>>(parallel [(set (reg:CC_C CC_REGNUM)
>>(compare:CC_C
>>  (zero_extend:DI (match_dup 4))
>>  (plus:DI (zero_extend:DI (match_dup 5))
>>   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)
>>   (set (match_dup 3)
>>(minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])]
>>
>>
>> This pattern is now no-longer self consistent in that before the
>> split
>> the overall result for the condition register is in mode CC_NCV, but
>> afterwards it is just CC_C.
>>
>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>> reflect the result of the 64-bit comparison), but that then implies
>> that
>> the cc mode of subsi3_carryin_compare is incorrect as well and
>> should in
>> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to
>> agree
>> that CC_NCV is the correct mode for this operation
>>
>> I'm not sure if there are other consequences that will fall out from
>> fixing this (it's possible that we might need a change to
>> select_cc_mode
>> as well).
>>
>
> Yes, this is still a bit awkward...
>
> The N and V bit will be the correct result for the subdi3_compare1
> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
> only gets the C bit correct, the expression for N and V is a different
> one.
>
> It probably works, because the subsi3_carryin_compare instruction sets
> more CC bits than the pattern does explicitly specify the value.
> We know the subsi3_carryin_compare also computes the NV bits, but
> it i

[PING**3] [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)

2017-05-12 Thread Bernd Edlinger
Ping...

On 04/29/17 19:45, Bernd Edlinger wrote:
> Ping...
>
> I attached a rebased version since there was a merge conflict in
> the xordi3 pattern, otherwise the patch is still identical.
> It splits adddi3, subdi3, anddi3, iordi3, xordi3 and one_cmpldi2
> early when the target has no neon or iwmmxt.
>
>
> Thanks
> Bernd.
>
>
>
> On 11/28/16 20:42, Bernd Edlinger wrote:
>> On 11/25/16 12:30, Ramana Radhakrishnan wrote:
>>> On Sun, Nov 6, 2016 at 2:18 PM, Bernd Edlinger
>>>  wrote:
 Hi!

 This improves the stack usage on the sha512 test case for the case
 without hardware fpu and without iwmmxt by splitting all di-mode
 patterns right while expanding which is similar to what the
 shift-pattern
 does.  It does nothing in the case iwmmxt and fpu=neon or vfp as
 well as
 thumb1.

>>>
>>> I would go further and do this in the absence of Neon, the VFP unit
>>> being there doesn't help with DImode operations i.e. we do not have 64
>>> bit integer arithmetic instructions without Neon. The main reason why
>>> we have the DImode patterns split so late is to give a chance for
>>> folks who want to do 64 bit arithmetic in Neon a chance to make this
>>> work as well as support some of the 64 bit Neon intrinsics which IIRC
>>> map down to these instructions. Doing this just for soft-float doesn't
>>> improve the default case only. I don't usually test iwmmxt and I'm not
>>> sure who has the ability to do so, thus keeping this restriction for
>>> iwMMX is fine.
>>>
>>>
>>
>> Yes I understand, thanks for pointing that out.
>>
>> I was not aware what iwmmxt exists at all, but I noticed that most
>> 64bit expansions work completely different, and would break if we split
>> the pattern early.
>>
>> I can however only look at the assembler outout for iwmmxt, and make
>> sure that the stack usage does not get worse.
>>
>> Thus the new version of the patch keeps only thumb1, neon and iwmmxt as
>> it is: around 1570 (thumb1), 2300 (neon) and 2200 (wimmxt) bytes stack
>> for the test cases, and vfp and soft-float at around 270 bytes stack
>> usage.
>>
 It reduces the stack usage from 2300 to near optimal 272 bytes (!).

 Note this also splits many ldrd/strd instructions and therefore I will
 post a followup-patch that mitigates this effect by enabling the
 ldrd/strd
 peephole optimization after the necessary reg-testing.


 Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>
>>> What do you mean by arm-linux-gnueabihf - when folks say that I
>>> interpret it as --with-arch=armv7-a --with-float=hard
>>> --with-fpu=vfpv3-d16 or (--with-fpu=neon).
>>>
>>> If you've really bootstrapped and regtested it on armhf, doesn't this
>>> patch as it stand have no effect there i.e. no change ?
>>> arm-linux-gnueabihf usually means to me someone has configured with
>>> --with-float=hard, so there are no regressions in the hard float ABI
>>> case,
>>>
>>
>> I know it proves little.  When I say arm-linux-gnueabihf
>> I do in fact mean --enable-languages=all,ada,go,obj-c++
>> --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16
>> --with-float=hard.
>>
>> My main interest in the stack usage is of course not because of linux,
>> but because of eCos where we have very small task stacks and in fact
>> no fpu support by the O/S at all, so that patch is exactly what we need.
>>
>>
>> Bootstrapped and reg-tested on arm-linux-gnueabihf
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.


[PING**2] [PATCH, ARM] Further improve stack usage in sha512, part 2 (PR 77308)

2017-05-12 Thread Bernd Edlinger
Ping...

On 04/29/17 19:52, Bernd Edlinger wrote:
> Ping...
>
> I attached the latest version of my patch.
>
>
> Thanks
> Bernd.
>
> On 12/18/16 14:14, Bernd Edlinger wrote:
>> Hi,
>>
>> this splits the *arm_negdi2, *arm_cmpdi_insn and *arm_cmpdi_unsigned
>> also at split1 except for TARGET_NEON and TARGET_IWMMXT.
>>
>> In the new test case the stack is reduced to about 270 bytes, except
>> for neon and iwmmxt, where this does not change anything.
>>
>> This patch depends on [1] and [2] before it can be applied.
>>
>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.
>>
>>
>>
>> [1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02796.html
>> [2] https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01562.html


[PING] [PATCH] Implement a warning for bogus sizeof(pointer) / sizeof(pointer[0])

2017-05-12 Thread Bernd Edlinger
Ping for the C changes.

Thanks
Bernd.

On 05/03/17 15:14, Jason Merrill wrote:
> On Tue, May 2, 2017 at 9:26 AM, Bernd Edlinger
>  wrote:
>> On 05/01/17 17:54, Jason Merrill wrote:
>>> On Fri, Apr 28, 2017 at 1:05 PM, Bernd Edlinger
>>>  wrote:
 On 04/28/17 17:29, Martin Sebor wrote:
> On 04/28/2017 08:12 AM, Bernd Edlinger wrote:
>>
>> Do you want me to change the %qT format strings to %T ?
>
> Yes, with the surrounding %< and %> the nested directives should
> use the unquoted forms, otherwise the printer would end up quoting
> both the whole expression and the type operand.
>
> FWIW, to help avoid this mistake, I think this might be something
> for GCC -Wformat to warn on and the pretty-printer to detect (and
> ICE on).
>

 Ah, now I understand.  That's pretty advanced.

 Here is the modified patch with correct quoting of the expression.

 Bootstrap and reg-testing on x86_64-pc-linux-gnu.
>>>
 * cp-gimplify.c (cp_fold): Implement the -Wsizeof_pointer_div warning.
>>>
>>> I think this warning belongs in cp_build_binary_op rather than cp_fold.
>>>
>>
>> Done, as suggested.
>
> The pattern in that function is to treat all *_DIV_EXPR the same; I
> don't think we need to break that pattern with this patch.  So please
> move the new code after the other DIV case labels.  With that the C++
> changes are OK.
>
> Jason
>


On 05/03/17 15:14, Jason Merrill wrote:
 > On Tue, May 2, 2017 at 9:26 AM, Bernd Edlinger
 >  wrote:
 >> On 05/01/17 17:54, Jason Merrill wrote:
 >>> On Fri, Apr 28, 2017 at 1:05 PM, Bernd Edlinger
 >>>  wrote:
  On 04/28/17 17:29, Martin Sebor wrote:
 > On 04/28/2017 08:12 AM, Bernd Edlinger wrote:
 >>
 >> Do you want me to change the %qT format strings to %T ?
 >
 > Yes, with the surrounding %< and %> the nested directives should
 > use the unquoted forms, otherwise the printer would end up quoting
 > both the whole expression and the type operand.
 >
 > FWIW, to help avoid this mistake, I think this might be something
 > for GCC -Wformat to warn on and the pretty-printer to detect (and
 > ICE on).
 >
 
  Ah, now I understand.  That's pretty advanced.
 
  Here is the modified patch with correct quoting of the expression.
 
  Bootstrap and reg-testing on x86_64-pc-linux-gnu.
 >>>
  * cp-gimplify.c (cp_fold): Implement the -Wsizeof_pointer_div warning.
 >>>
 >>> I think this warning belongs in cp_build_binary_op rather than cp_fold.
 >>>
 >>
 >> Done, as suggested.
 >
 > The pattern in that function is to treat all *_DIV_EXPR the same; I
 > don't think we need to break that pattern with this patch.  So please
 > move the new code after the other DIV case labels.  With that the C++
 > changes are OK.
 >
 > Jason
 >


Re: [PATCH] Don't fold in save_expr (PR sanitizer/80536, PR sanitizer/80386)

2017-05-12 Thread Joseph Myers
On Fri, 12 May 2017, Marek Polacek wrote:

> > Otherwise the tree.c change is ok.
> 
> Thanks.  Jason/Joseph, any comments?

I don't have any comments on this.  (c_save_expr folds to avoid 
c_fully_fold needing to go inside SAVE_EXPRs.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [patch, libfortran] Fix amount of memory allocation for matrix - vector calculation

2017-05-12 Thread Thomas Koenig

Am 12.05.2017 um 10:16 schrieb Janne Blomqvist:

On Fri, May 12, 2017 at 1:14 AM, Thomas Koenig  wrote:

Hello world,

the memory allocation for the buffer in the library matmul
routines still has one problem: The value of 0xdeadbeef meant
as poison could end up in the calculation of the size of the
buffer for the blocked matmul.

The attached patch fixes that. Verified with regression-test,
also by running a few select test cases under valgrind.

No test case because nothing appeared to fail.

OK for trunk?


Patch missing?


Well, yes.

Here it is.

Regards

Thomas

Index: generated/matmul_c10.c
===
--- generated/matmul_c10.c	(Revision 247839)
+++ generated/matmul_c10.c	(Arbeitskopie)
@@ -222,9 +222,9 @@ matmul_c10_avx (gfc_array_c10 * const restrict ret
   bxstride = GFC_DESCRIPTOR_STRIDE(b,0);
 
   /* bystride should never be used for 1-dimensional b.
-	 in case it is we want it to cause a segfault, rather than
-	 an incorrect result. */
-  bystride = 0xDEADBEEF;
+ The value is only used for calculation of the
+ memory by the buffer.  */
+  bystride = 256;
   ycount = 1;
 }
   else
@@ -774,9 +774,9 @@ matmul_c10_avx2 (gfc_array_c10 * const restrict re
   bxstride = GFC_DESCRIPTOR_STRIDE(b,0);
 
   /* bystride should never be used for 1-dimensional b.
-	 in case it is we want it to cause a segfault, rather than
-	 an incorrect result. */
-  bystride = 0xDEADBEEF;
+ The value is only used for calculation of the
+ memory by the buffer.  */
+  bystride = 256;
   ycount = 1;
 }
   else
@@ -1326,9 +1326,9 @@ matmul_c10_avx512f (gfc_array_c10 * const restrict
   bxstride = GFC_DESCRIPTOR_STRIDE(b,0);
 
   /* bystride should never be used for 1-dimensional b.
-	 in case it is we want it to cause a segfault, rather than
-	 an incorrect result. */
-  bystride = 0xDEADBEEF;
+ The value is only used for calculation of the
+ memory by the buffer.  */
+  bystride = 256;
   ycount = 1;
 }
   else
@@ -1874,9 +1874,9 @@ matmul_c10_vanilla (gfc_array_c10 * const restrict
   bxstride = GFC_DESCRIPTOR_STRIDE(b,0);
 
   /* bystride should never be used for 1-dimensional b.
-	 in case it is we want it to cause a segfault, rather than
-	 an incorrect result. */
-  bystride = 0xDEADBEEF;
+ The value is only used for calculation of the
+ memory by the buffer.  */
+  bystride = 256;
   ycount = 1;
 }
   else
@@ -2480,9 +2480,9 @@ matmul_c10 (gfc_array_c10 * const restrict retarra
   bxstride = GFC_DESCRIPTOR_STRIDE(b,0);
 
   /* bystride should never be used for 1-dimensional b.
-	 in case it is we want it to cause a segfault, rather than
-	 an incorrect result. */
-  bystride = 0xDEADBEEF;
+ The value is only used for calculation of the
+ memory by the buffer.  */
+  bystride = 256;
   ycount = 1;
 }
   else
Index: generated/matmul_c16.c
===
--- generated/matmul_c16.c	(Revision 247839)
+++ generated/matmul_c16.c	(Arbeitskopie)
@@ -222,9 +222,9 @@ matmul_c16_avx (gfc_array_c16 * const restrict ret
   bxstride = GFC_DESCRIPTOR_STRIDE(b,0);
 
   /* bystride should never be used for 1-dimensional b.
-	 in case it is we want it to cause a segfault, rather than
-	 an incorrect result. */
-  bystride = 0xDEADBEEF;
+ The value is only used for calculation of the
+ memory by the buffer.  */
+  bystride = 256;
   ycount = 1;
 }
   else
@@ -774,9 +774,9 @@ matmul_c16_avx2 (gfc_array_c16 * const restrict re
   bxstride = GFC_DESCRIPTOR_STRIDE(b,0);
 
   /* bystride should never be used for 1-dimensional b.
-	 in case it is we want it to cause a segfault, rather than
-	 an incorrect result. */
-  bystride = 0xDEADBEEF;
+ The value is only used for calculation of the
+ memory by the buffer.  */
+  bystride = 256;
   ycount = 1;
 }
   else
@@ -1326,9 +1326,9 @@ matmul_c16_avx512f (gfc_array_c16 * const restrict
   bxstride = GFC_DESCRIPTOR_STRIDE(b,0);
 
   /* bystride should never be used for 1-dimensional b.
-	 in case it is we want it to cause a segfault, rather than
-	 an incorrect result. */
-  bystride = 0xDEADBEEF;
+ The value is only used for calculation of the
+ memory by the buffer.  */
+  bystride = 256;
   ycount = 1;
 }
   else
@@ -1874,9 +1874,9 @@ matmul_c16_vanilla (gfc_array_c16 * const restrict
   bxstride = GFC_DESCRIPTOR_STRIDE(b,0);
 
   /* bystride should never be used for 1-dimensional b.
-	 in case it is we want it to cause a segfault, rather than
-	 an incorrect result. */
-  bystride = 0xDEADBEEF;
+ The value is only used for calculation of the
+ memory by the buffer.  */
+  bystride = 256;
   ycount = 1;
 }
   else
@@ -2480,9 +

Re: [PATCH] Fix PR middle-end/80707, ICE: r247844 causes error: extra outgoing edge

2017-05-12 Thread Richard Biener
On May 12, 2017 6:46:29 PM GMT+02:00, Peter Bergner  
wrote:
>My fix for PR51513 modified group_case_labels_stmt() to remove
>unreachable
>case statements labels.  Being a middle-end newbie, I incorrectly
>thought
>group_case_labels_stmt() was only called very early, before we have a
>cfg.
>With -O3, we can generate extra copies of the switch statement, well
>after
>the cfg exists, and we end up calling group_case_labels_stmt() to
>optimize
>them.  In those cases, we need to remove their edges from the cfg.
>
>This passes my bootstrap and regtesting on powerpc64le-linux and
>x86_64-linux
>with no regressions.  In addition, both David and HJ confirm this fixes
>the
>bootstrap issues they ran into.
>
>Is this ok for trunk?

OK.

Richard.

>Peter
>
>gcc/
>   PR middle-end/80707
>   * tree-cfg.c: Remove cfg edges of unreachable case statements.
>
>gcc/testsuite/
>   * g++.dg/pr80707.C: New test.
>
>Index: gcc/tree-cfg.c
>===
>--- gcc/tree-cfg.c (revision 247845)
>+++ gcc/tree-cfg.c (working copy)
>@@ -1684,6 +1684,10 @@ group_case_labels_stmt (gswitch *stmt)
> || (EDGE_COUNT (base_bb->succs) == 0
> && gimple_seq_unreachable_p (bb_seq (base_bb
>   {
>+edge e;
>+if (base_bb != default_bb
>+&& (e = find_edge (gimple_bb (stmt), base_bb)) != NULL)
>+  remove_edge_and_dominated_blocks (e);
> gimple_switch_set_label (stmt, i, NULL_TREE);
> i++;
> new_size--;
>Index: gcc/testsuite/g++.dg/pr80707.C
>===
>--- gcc/testsuite/g++.dg/pr80707.C (nonexistent)
>+++ gcc/testsuite/g++.dg/pr80707.C (working copy)
>@@ -0,0 +1,29 @@
>+// PR middle-end/80707 ICE: extra outgoing edge causes
>verify_flow_info error.
>+// { dg-do compile }
>+// { dg-options "-O3" } */
>+
>+struct A {
>+  int m_fn1(int &) const;
>+};
>+int A::m_fn1(int &p1) const {
>+  int a[6];
>+  int b = 0;
>+  for (int i;; i++) {
>+if (a[i])
>+  break;
>+b++;
>+  }
>+  while (b) {
>+int c;
>+switch (b) {
>+case 1:
>+  c = 0;
>+  break;
>+case 5:
>+  c = a[0];
>+}
>+if (c)
>+  p1 = 0;
>+b--;
>+  }
>+}



Re: [PATCH] Fix PR middle-end/80707, ICE: r247844 causes error: extra outgoing edge

2017-05-12 Thread Peter Bergner
On 5/12/17 11:51 AM, Richard Biener wrote:
> On May 12, 2017 6:46:29 PM GMT+02:00, Peter Bergner  
> wrote:>> gcc/
>>  PR middle-end/80707
>>  * tree-cfg.c: Remove cfg edges of unreachable case statements.
>>
>> gcc/testsuite/
>>  * g++.dg/pr80707.C: New test.
>>
>> Is this ok for trunk?
> 
> OK.


Thanks, committed as revision 247984.

Peter




Add missing equivalence unwinding markers

2017-05-12 Thread Jeff Law



This was just something I noted while poking around.  I forgot to push 
the table unwinding markers in the newly added VRP dominator walk for 
threading.


This never generates incorrect code, but can result in failing to catch 
some jump threads as seen by the change in ssa-dom-thread-4.c.


Bootstrapped and regression tested on x86_64.  I would not expect this 
to impact the expected results for this test on logical_op_short_circuit 
targets as they were already fully threading during VRP1.


Installing on the trunk.

Jeff

commit ea442cbcd5b2ac7a656d96052c19290e113ee335
Author: Jeff Law 
Date:   Fri May 12 11:30:58 2017 -0600

* tree-vrp.c (vrp_dom_walker::before_dom_childern): Push unwinding
markers.

* g++.dg/tree-ssa/ssa-dom-thread-4.c: Update expected output.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 197c75b1ace..6cd066691b3 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2017-05-12  Jeff Law  
+
+   * tree-vrp.c (vrp_dom_walker::before_dom_childern): Push unwinding
+   markers.
+
 2017-05-12  Peter Bergner  
 
PR middle-end/80707
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 28664c78cdb..d3797eb21b4 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2017-05-12  Jeff Law  
+
+   * g++.dg/tree-ssa/ssa-dom-thread-4.c: Update expected output.
+
 2017-05-12  Peter Bergner  
 
PR middle-end/80707
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
index ed76e8119ad..e13eb8673dd 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
@@ -57,12 +57,8 @@ bitmap_ior_and_compl (bitmap dst, const_bitmap a, 
const_bitmap b,
we should thread all three, but due to a bug in the threading
code we missed the edge when the first conditional is false
(b_elt is zero, which means the second conditional is always
-   zero.  
-
-   The first two are caught by VRP1, the last is caught by DOM
-   along with another jump thread.  */
-/* { dg-final { scan-tree-dump-times "Threaded" 2 "vrp1" { target { ! 
logical_op_short_circuit } } } } */
-/* { dg-final { scan-tree-dump-times "Threaded" 2 "dom2" { target { ! 
logical_op_short_circuit } } } } */
+   zero.  VRP1 catches all three.  */
+/* { dg-final { scan-tree-dump-times "Threaded" 3 "vrp1" { target { ! 
logical_op_short_circuit } } } } */
 
 /* On targets that define LOGICAL_OP_NON_SHORT_CIRCUIT to 0, we split both
"a_elt || b_elt" and "b_elt && kill_elt" into two conditions each,
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 0db8a3c3969..b8cb887cb39 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -10540,6 +10540,8 @@ vrp_dom_walker::before_dom_children (basic_block bb)
 {
   gimple_stmt_iterator gsi;
 
+  m_avail_exprs_stack->push_marker ();
+  m_const_and_copies->push_marker ();
   for (gsi = gsi_start_nondebug_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
 {
   gimple *stmt = gsi_stmt (gsi);


Re: [PATCH 2/N] Add dump_flags_type for handling of suboptions.

2017-05-12 Thread Martin Sebor

On 05/05/2017 04:44 AM, Martin Liška wrote:

Hi.

This one is more interesting as it implements hierarchical option parsing
and as a first step I implemented that for optgroup suboptions.


I haven't gone through the rest of the patches so I could be
missing some context.  But I have a few observations about and
suggestions for the new dump_option_node (and to a smaller extent,
dump_flags_type) class template. (Feel free to disregard whatever
doesn't make sense.)

First, since dump_option_node's public member functions (including
the only ctor) are defined in dumpfile.c it seems that the template
definition doesn't really need to be provided in the header. If
that's correct, it would reduce coupling and might avoid bloat to
only declare the bare minimum needed to use the type in other .c
files, and define the rest in the one .c file where the complete
type is actually needed.

As far as I can see, there is just one instantiation of the template
in the patch, in dumpfile.c:

+  typedef dump_option_node node;

If there are no other existing instantiations and no others are
anticipated, it could even be a plain class (further reducing
complexity and bloat).

Finally, the template ctor takes a const char* and stores it
in a member, and the implicit copy ctor and assignment operator
copy the underlying vec class.  That means that the string
argument must outlive the constructed object, which is not
typically expected.  IIUC, vec is not safely copy-assignable
or copy-constructible (i.e., has undefined behavior).  At
a minimum, it would be appropriate to document these
constraints.  Removing them or preventing copy construction
and assignment to avoid getting bit by it would be even better.

For dump_flags_type, none of the members need to be explicitly
declared inline or mention the  template argument list.
operator&(dump_flags_type) should be declared const.  Compound
assignment operators should return a reference to *this  (to
behave as closely to the native operators).  The exclusive OR
operator and compound assignment are missing and should probably
be provided for completeness, even if they aren't needed.  The
m_mask member should probably be private for better encapsulation
(and operator uint64_t() const provided to provide implicit
conversion to that type, or perhaps one converting to E's
underlying type; that should also obviate the need for operator
bool).

Martin


Re: [PATCH] Add sequence check to leaf_function_p

2017-05-12 Thread Jeff Law

On 05/12/2017 07:30 AM, Alexander Monakov wrote:

On Fri, 12 May 2017, Wilco Dijkstra wrote:


This is a followup from: 
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02916.html

Add an assert to leaf_function_p to ensure it is not called from a
prolog or epilog sequence (which would incorrectly return true in a
non-leaf function).


As I understand, we need to ensure that get_insns call retrieves the topmost
sequence corresponding to the function body, not any current subsequence that
could have been started via start_sequence.  Therefore the 'prolog or epilog'
part is a bit misleading, we could be in a subsequence for other reasons, and
we need to reject those as well.  So, ...
Right.  It's not really a prologue/epilogue issue, but anytime we call 
leaf_function_p and we're not at the topmost sequence there's probably a 
bug.


jeff


Re: [PATCH] Add sequence check to leaf_function_p

2017-05-12 Thread Jeff Law

On 05/12/2017 10:29 AM, Wilco Dijkstra wrote:

Alexander Monakov wrote:

As I understand, we need to ensure that get_insns call retrieves the topmost
sequence corresponding to the function body, not any current subsequence that
could have been started via start_sequence.  Therefore the 'prolog or epilog'
part is a bit misleading, we could be in a subsequence for other reasons, and
we need to reject those as well.  So, ...


It's most likely that if this assert triggers, it is from a prolog or epilog 
sequence.


... can the comment please be reworded to match the code, if it's necessary to
have a comment here at all?  E.g. "Ensure we walk the entire function body after
the following get_insns call".


I've changed to to "Ensure we walk the entire function body."

Wilco

2017-05-11  Wilco Dijkstra  

* final.c (leaf_function_p): Check we are not in a sequence.

OK.
jeff


PR78972, 80283: Extend TER with scheduling

2017-05-12 Thread Bernd Schmidt
If you look at certain testcases like the one for PR78972, you'll find 
that the code generated by TER is maximally pessimal in terms of 
register pressure: we can generate a large number of intermediate 
results, and defer all the statements that use them up.


Another observation one can make is that doing TER doesn't actually buy 
us anything for a large subset of the values it finds: only a handful of 
places in the expand phase actually make use of the information. In 
cases where we know we aren't going to be making use of it, we could 
move expressions freely without doing TER-style substitution.


This patch uses the information collected by TER about the moveability 
of statements and performs a mini scheduling pass with the aim of 
reducing register pressure. The heuristic is fairly simple: something 
that consumes more values than it produces is preferred. This could be 
tuned further, but it already produces pretty good results: for the 
78972 testcase, the stack size is reduced from 2448 bytes to 288, and 
for PR80283, the stackframe of 496 bytes vanishes with the pass enabled.


In terms of benchmarks I've run SPEC a few times, and the last set of 
results showed not much of a change. Getting reproducible results has 
been tricky but all runs I made have been within 0%-1% improvement.


In this patch, the changed behaviour is gated with a -fschedule-ter 
option which is off by default; with that default it bootstraps and 
tests without regressions. The compiler also bootstraps with the option 
enabled, in that case there are some optimization issues. I'll address 
some of them with two followup patches, the remaining failures are:

 * a handful of guality/PR43077.c failures
   Debug insn generation is somewhat changed, and the peephole2 pass
   drops one of them on the floor.
 * three target/i386/bmi-* tests fail. These expect the combiner to
   build certain instruction patterns, and that turns out to be a
   little fragile. It would be nice to be able to use match.pd to
   produce target-specific patterns during expand.

Thoughts? Ok to apply?


Bernd
	PR middle-end/80283
	PR tree-optimization/78972
	* cfgexpand.c (should_forward_stmt_p, decide_schedule_stmt,
	resolve_deps, rank_ready_insns, add_dep, set_bb_uids, set_prios,
	floating_point_op_p, simple_schedule, expand_one_gimple_stmt):
	New static functions.
	(struct gimple_dep, struct gimple_sched_state): New structs.
	(sched_map): New static variable.
	(expand_gimple_basic_block): Use expand_one_gimple_state, and
	use the results from simple_schedule if flag_schedule_ter.
	* common.opt (fschedule-ter): New option.
	* toplev.c (process_options): Set flag_tree_ter if flag_schedule_ter.
	* doc/invoke.texi (Optimization Options): Add -fschedule-ter.
	(-fschedule-ter): Document.

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 66af699..e1b4898 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -5337,6 +5337,345 @@ expand_debug_locations (void)
   flag_strict_aliasing = save_strict_alias;
 }
 
+/* Determine whether TER decided that a statment should be forwarded.  */
+
+static bool
+should_forward_stmt_p (gimple *stmt)
+{
+  if (!SA.values)
+return false;
+
+  def_operand_p def_p;
+  def_p = SINGLE_SSA_DEF_OPERAND (stmt, SSA_OP_DEF);
+  if (def_p == NULL)
+return false;
+
+  /* Forward this stmt if it is in the list of
+ replaceable expressions.  */
+  if (bitmap_bit_p (SA.values,
+		SSA_NAME_VERSION (DEF_FROM_PTR (def_p
+return true;
+
+  return false;
+}
+
+/* Decide whether STMT should be scheduled, or left as a TER replacement.
+   Clear it as a replacement if we decide to schedule it.  */
+
+static bool
+decide_schedule_stmt (gimple *stmt)
+{
+  if (!SA.values)
+return false;
+
+  if (is_gimple_debug (stmt))
+return true;
+
+  def_operand_p def_p;
+  def_p = SINGLE_SSA_DEF_OPERAND (stmt, SSA_OP_DEF);
+
+  if (def_p == NULL)
+return false;
+
+  tree def = DEF_FROM_PTR (def_p);
+  use_operand_p use_p;
+  gimple *use_stmt;
+  if (!single_imm_use (def, &use_p, &use_stmt))
+return false;
+
+  /* There are only a few cases where expansion can actually make use of
+ replaceable SSA names.  Since TER is pessimal for scheduling in
+ general, we try to limit what we do to cases where we know it is
+ beneficial, and schedule insns for the other cases.  */
+  tree_code def_c = ERROR_MARK;
+  if (gimple_code (stmt) == GIMPLE_ASSIGN)
+def_c = gimple_assign_rhs_code (stmt);
+
+  if (gimple_code (use_stmt) == GIMPLE_ASSIGN)
+{
+  tree_code c = gimple_assign_rhs_code (use_stmt);
+  if (TREE_CODE_CLASS (c) != tcc_comparison
+	  && c != FMA_EXPR
+	  && c != SSA_NAME
+	  && c != MEM_REF
+	  && c != TARGET_MEM_REF
+	  && def_c != VIEW_CONVERT_EXPR)
+	{
+	  if (bitmap_clear_bit (SA.values, SSA_NAME_VERSION (def)))
+	return true;
+	}
+}
+  return false;
+}
+
+struct gimple_dep;
+
+/* Record state for a single statement during simple_sched.  */
+struct gimple_sched_state
+{
+  gi

Re: [PATCH] Limit perf data buffer during profiling

2017-05-12 Thread Jeff Law

On 05/12/2017 03:38 AM, Andi Kleen wrote:

From: Andi Kleen 

With high -j parallelism the autofdo tests can randomly fail.
autofdo uses Linux perf to record profiling data.
Linux perf uses a locked perf buffer. By default it has
around 516k buffer per uid (/proc/sys/kernel/perf_event_mlock_kb).

An individual perf record tries to grab the full 516k,
which makes parallel perf record fail.

This patch limits the perf buffer for individual perf record to 8k.
With the default settings this allows a parallelism of the test
cases of 16, which is hopefully good enough
I'm routinely building & testing with -j56 these days.  We'll see if 
it's sufficient.




(if not would need to add some kind of semaphore, or ask
the user to increase the limit as root)

I also removed an unneeded -o perf.data option

Thanks to Marcin to finally spotting the problem.

Passes bootstrap and test on x86_64-linux. Ok for trunk?

gcc/testsuite/:

2017-05-12  Andi Kleen  

PR testsuite/77684
* lib/target-supports.exp (profopt-perf-wrapper): Remove
-p perf.data option. Add -m8 option.

OK.  But be aware that it may not ultimately be sufficient.

jeff


Re: [C++ PATCH, RFC] Implement new C++ intrinsics __is_assignable and __is_constructible.

2017-05-12 Thread Ville Voutilainen
On 12 May 2017 at 14:15, Ville Voutilainen  wrote:
> On 12 May 2017 at 14:06, Daniel Krügler  wrote:
>> Your description sounds remotely similar to me to the current problem
>> of __is_trivially_constructible intrinsic, which seems to instantiate
>> the copy constructor definition albeit it (IMO) shouldn't:
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80654
>>
>> Could there be a similar cause?
>
>
> Seems quite plausible to me. I would be happy to fix that bug in the
> same go, but I'm a bit
> lost as to what exactly causes the problem. constructible_expr in method.c 
> does
> build_special_member_call for the constructor and the destructor, so
> perhaps there
> are some flags that could make it behave.

Well, now that Jason pointed out that cp_unevaluated_operand is the
trick, here's a new
patch that fixes that bug and passes the full testsuite on Linux-x64.

2017-05-12  Ville Voutilainen  

c-family/

Implement new C++ intrinsics __is_assignable and __is_constructible.
* c-common.c (__is_assignable, __is_constructible): New.
* c-common.h (RID_IS_ASSIGNABLE, RID_IS_CONSTRUCTIBLE): Likewise.

cp/

PR c++/80654
PR c++/80682
Implement new C++ intrinsics __is_assignable and __is_constructible.
* cp-tree.h (CPTK_IS_ASSIGNABLE, CPTK_IS_CONSTRUCTIBLE): New.
(is_xible): New.
* cxx-pretty-print.c (pp_cxx_trait_expression): Handle
CPTK_IS_ASSIGNABLE and CPTK_IS_CONSTRUCTIBLE.
* method.c (constructible_expr): Set cp_unevaluated.
(is_xible_helper): New.
(is_trivially_xible): Adjust.
(is_xible): New.
* parser.c (cp_parser_primary_expression): Handle
RID_IS_ASSIGNABLE and RID_IS_CONSTRUCTIBLE.
(cp_parser_trait_expr): Likewise.
* semantics.c (trait_expr_value): Handle
CPTK_IS_ASSIGNABLE and CPTK_IS_CONSTRUCTIBLE.

testsuite/

* g++.dg/ext/80654.C: New.

libstdc++-v3/

Implement new C++ intrinsics __is_assignable and __is_constructible.
* include/std/type_traits (__do_is_static_castable_impl): Remove.
(__is_static_castable_impl, __is_static_castable_safe): Likewise.
(__is_static_castable, __do_is_direct_constructible_impl): Likewise.
(__is_direct_constructible_impl): Likewise.
(__is_direct_constructible_new_safe): Likewise.
(__is_base_to_derived_ref, __is_lvalue_to_rvalue_ref): Likewise.
(__is_direct_constructible_ref_cast): Likewise.
(__is_direct_constructible_new, __is_direct_constructible): Likewise.
(__do_is_nary_constructible_impl): Likewise.
(__is_nary_constructible_impl, __is_nary_constructible): Likewise.
(__is_constructible_impl): Likewise.
(is_constructible): Call the intrinsic.
(__is_assignable_helper): Remove.
(is_assignable): Call the intrinsic.
(is_trivially_constructible): Likewise.
(is_trivially_assignable): Likewise.
(testsuite/20_util/declval/requirements/1_neg.cc): Adjust.
(testsuite/20_util/make_signed/requirements/typedefs_neg.cc): Likewise.
(testsuite/20_util/make_unsigned/requirements/typedefs_neg.cc):
Likewise.
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index ad686d2..369e112 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -514,6 +514,8 @@ const struct c_common_resword c_common_reswords[] =
   { "volatile",RID_VOLATILE,   0 },
   { "wchar_t", RID_WCHAR,  D_CXXONLY },
   { "while",   RID_WHILE,  0 },
+  { "__is_assignable", RID_IS_ASSIGNABLE, D_CXXONLY },
+  { "__is_constructible", RID_IS_CONSTRUCTIBLE, D_CXXONLY },
 
   /* C++ transactional memory.  */
   { "synchronized",RID_SYNCHRONIZED, D_CXX_OBJC | D_TRANSMEM },
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 9e3982d..a986056 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -172,6 +172,7 @@ enum rid
   RID_IS_TRIVIALLY_ASSIGNABLE, RID_IS_TRIVIALLY_CONSTRUCTIBLE,
   RID_IS_TRIVIALLY_COPYABLE,
   RID_IS_UNION,RID_UNDERLYING_TYPE,
+  RID_IS_ASSIGNABLE,   RID_IS_CONSTRUCTIBLE,
 
   /* C++11 */
   RID_CONSTEXPR, RID_DECLTYPE, RID_NOEXCEPT, RID_NULLPTR, RID_STATIC_ASSERT,
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 09b1364..154ba3c 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -927,7 +927,9 @@ enum cp_trait_kind
   CPTK_IS_TRIVIALLY_CONSTRUCTIBLE,
   CPTK_IS_TRIVIALLY_COPYABLE,
   CPTK_IS_UNION,
-  CPTK_UNDERLYING_TYPE
+  CPTK_UNDERLYING_TYPE,
+  CPTK_IS_ASSIGNABLE,
+  CPTK_IS_CONSTRUCTIBLE
 };
 
 /* The types that we are processing.  */
@@ -6112,6 +6114,7 @@ extern void use_thunk (tree, 
bool);
 extern bool trivial_fn_p   (tree);
 extern tree forward_parm   (tree);
 extern bool is_trivially_xible (enum tree_code, tree, tree);
+extern bool is_xible   (enum tree_code, tree, tree);
 extern tree get_defaulted_eh_spec  (tree);
 extern tree unevaluated_noexcept_spec  (void);

Re: [RFC PATCH, i386]: Enable post-reload compare elimination pass

2017-05-12 Thread Jeff Law

On 05/10/2017 01:05 PM, Uros Bizjak wrote:

On Wed, May 10, 2017 at 5:18 PM, Uros Bizjak  wrote:

On Wed, May 10, 2017 at 4:27 PM, Jakub Jelinek  wrote:

On Tue, May 09, 2017 at 06:06:47PM +0200, Uros Bizjak wrote:

Attached patch enables post-reload compare elimination pass by
providing expected patterns (duplicates of existing patterns with
setters of reg and flags switched in the parallel) for flag setting
arithmetic instructions.

The merge triggers more than 3000 times during the gcc bootstrap,
mostly in cases where intervening memory load or store prevents
combine from merging the arithmetic insn and the following compare.

Also, some recent linux x86_64 defconfig build results in ~200 merges,
removing ~200 test/cmp insns. Not much, but I think the results still
warrant the pass to be enabled.


Isn't the right fix instead to change the compare-elim.c pass to either
accept both reg vs. flags orderings in parallel, or both depending
on some target hook, or change it to the order i386.md and most other
major targets use and just fix up mn10300/rx (and aarch64?) to use the same
order?


Attached patch changes compare-elim.c order to what i386.md expects.

Thoughts?
I think with an appropriate change to the canonicalization rules in the 
manual this is fine.


I've got the visium, rx and mn103 patches handy to ensure they don't 
regress.  aarch64 doesn't seem to be affected either way but I didn't 
investigate why -- I expected it to improve with your change.


I'll write up a ChangeLog and commit the mn103, rx and visium changes 
after you commit the compare-elim & documentation bits.


jeff


Re: [PATCH, rs6000] Add x86 instrinsic headers to GCC PPC64LE taget

2017-05-12 Thread Mike Stump
On May 8, 2017, at 7:49 AM, Steven Munroe  wrote:
> Of course as part of this process we will port as many of the
> corresponding DejaGnu tests from gcc/testsuite/gcc.target/i386/ to
> gcc/testsuite/gcc.target/powerpc/ as appropriate. So far the dg-do run
> tests only require minor source changes, mostly to the platform specific
> dg-* directives. A few dg-do compile tests are needed to insure we are
> getting the expected folding/Common subexpression elimination (CSE) to
> generate the optimum sequence for PowerPC.

If there is a way to share that seems reasonable and the x86 would like to 
share...

I'd let you and the x86 folks figure out what is best.

Re: [PATCH] correct length stpncpy handling in -Wformat-overflow (PR 80669)

2017-05-12 Thread Jeff Law

On 05/08/2017 11:37 AM, Martin Sebor wrote:

The -Wformat-overflow warning newly enhanced in GCC 8.0 to detect
reading past the end of the source sequence misinterprets the size
argument to stpncpy as a request to read that many bytes from the
source sequence, rather than the number of bytes to write.  Like
strncpy, the function never reads past the end of the source string.

This constraint is already handled correctly for strncpy by
the check_sizes function so the fix is to simply remove the wrong
length handling from expand_builtin_stpncpy and let check_sizes
do the right thing.

The attached patch passes bootstrap and regression test on x86_64.

Martin

gcc-80669.diff


PR middle-end/80669 - Bad -Wstringop-overflow warnings for stpncpy

gcc/ChangeLog:

PR middle-end/80669
* builtins.c (expand_builtin_stpncpy): Simplify.

gcc/testsuite/ChangeLog:

PR middle-end/80669
* gcc.dg/builtin-stpncpy.c: New test.

OK.
jeff


Re: [PATCH] handle sprintf(d, "%s", ...) in gimple-ssa-sprintf.c

2017-05-12 Thread Jeff Law

On 05/03/2017 08:46 AM, Martin Sebor wrote:

On 05/03/2017 04:20 AM, Richard Biener wrote:

On Tue, May 2, 2017 at 4:41 PM, Martin Sebor  wrote:

FWIW, my fix for bug 79062 is only partial (it gets the pass
to run but the warnings are still not issued).  I don't quite
understand what prevents the warning flag(s) from getting set
when -flto is used.  This seems to be a bigger problem than
just the sprintf pass not doing something just right.



I've never dug deeply in the LTO stuff, but I believe we stream the
compiler
flags, so it could be something there.



We do.


Alternately you might be running into a case where in LTO mode we
recreate
base types.  Look for a type equality tester that goes beyond just
testing
pointer equality.

ie, in LTO I think we'll create a type based on the streamed data, 
but I

also think we'll create various basic types.  Thus in LTO mode pointer
equality may not be sufficient.



We make sure that for most basic types we end up re-using them where
possible.
char_type_node is an example where that generally doesn't work because
it's
value depends on a command-line flag.



That answers the first part of the question of why the sprintf
pass wouldn't run (or do anything) with -flto.   With it fixed
(as in fold-const.c or tree-ssa-strlen.c as you suggested in
bug 79602) it runs and the optimization does its job, but no
warnings are issued.  The wan_foo_flags for warnings that are
enabled implicitly (e.g., by -Wall or -Wextra on the command
line) are clear.  There seem to be dependencies between warnings
in c.opt that ignore LTO (as a language), but even with those
corrected (i.e., with LTO added as a language to -Wformat and
-Wall) the flags are still clear when LTO runs.  Does that ring
any bells for you?


You can look at the lto_opts section (it's just a string) and see
that we seem to fail to pass through -Wall (or any warning option
I tried).  This is because

  /* Also drop all options that are handled by the driver as well,
 which includes things like -o and -v or -fhelp for example.
 We do not need those.  The only exception is -foffload 
option, if we
 write it in offload_lto section.  Also drop all diagnostic 
options.  */

  if ((cl_options[option->opt_index].flags & (CL_DRIVER|CL_WARNING))
  && (!lto_stream_offload_p || option->opt_index != 
OPT_foffload_))

continue;

which means you have to explicitely enable diagnostics you want at
link time at the moment.

If you want to change that you have to do some changes to lto-wrapper.c
as for example only pass through warning options that are set on all
input files (warning options are not kept per function).


Great, thanks for the pointer!  I might tackle that at some point.

Jeff, given that we now understand why the warnings are suppressed
(i.e., the root cause of the rest of bug 79062), are you okay with
treating that independently of this PR (bug 77671) and committing
this patch (including the two-line fix to enable the sprintf return
value optimization with LTO)?
Yea.  I think you needed to add a function comment for is_call_safe and 
try_simplify_call.  With those additions this is OK.


Sorry for the delay.

jeff



Re: [PATCH] decl lang hooks

2017-05-12 Thread Mike Stump
On May 11, 2017, at 12:16 PM, Nathan Sidwell  wrote:
> 
> On 05/11/2017 01:50 PM, Rainer Orth wrote:
>> Hi Nathan,
>>> On 05/08/2017 05:34 PM, Joseph Myers wrote:
> 
>> ^~
>> make: *** [Makefile:1102: objcp/objcp-lang.o] Error 1
> 
> Having learnt how to spell --enable-languages properly today, here's the fix, 
> which I have applied.\

Thanks for your work.

Re: [PATCH] RFC: spellchecker for comments, plus -Wfixme and -Wtodo

2017-05-12 Thread Jeff Law

On 05/02/2017 01:08 PM, David Malcolm wrote:

Currently the C/C++ frontends discard comments when parsing.
It's possible to set up libcpp to capture comments as tokens,
by setting CPP_OPTION (pfile, discard_comments) to false),
and this can be enabled using the -C command line option (see
also -CC), but c-family/c-lex.c then discards any CPP_COMMENT
tokens it sees, so they're not seen by the frontend parser.

The following patch adds an (optional) callback to libcpp
for handling comments, giving the comment content, and the
location it was seen at.  This approach allows arbitrary
logic to be wired up to comments, and avoids having to
copy the comment content to a new buffer (which the CPP_COMMENT
approach does).

This could be used by plugins to chain up on the callback
e.g. to parse specially-formatted comments, e.g. for
documentation generation, or e.g. for GObject introspection
annotations [1].

As a proof of concept, the patch uses this to add a spellchecker
for comments.  It uses the Enchant meta-library:
https://abiword.github.io/enchant/
(essentially a wrapper around 8 different spellchecking libraries).
I didn't bother with the autotool detection for enchant, and
just hacked it in for now.

Example output:

test.c:3:46: warning: spellcheck_word: "evaulate"
 When NONCONST_PRED is false the code will evaulate to constant and
   ^~~~
test.c:3:46: note: suggestion: "evaluate"
 When NONCONST_PRED is false the code will evaulate to constant and
   ^~~~
   evaluate
test.c:3:46: note: suggestion: "ululate"
 When NONCONST_PRED is false the code will evaulate to constant and
   ^~~~
   ululate
test.c:3:46: note: suggestion: "elevate"
 When NONCONST_PRED is false the code will evaulate to constant and
   ^~~~
   elevate

License-wise, Enchant is LGPL 2.1 "or (at your option) any
later version." with a special exception to allow non-LGPL
spellchecking providers (e.g. to allow linking against an
OS-provided spellchecker).

Various FIXMEs are present (e.g. hardcoded "en_US" for the
language to spellcheck against).
Also, the spellchecker has a lot of false positives e.g.
it doesn't grok URLs (and thus complains when it seens them);
similar for DejaGnu directives etc.

Does enchant seem like a reasonable dependency for the compiler?
(it pulls in libpthread.so.0, libglib-2.0.so.0, libgmodule-2.0.so.0).
Or would this be better pursued as a plugin?  (if so, I'd
prefer the plugin to live in the source tree as an example,
rather than out-of-tree).

Unrelated to spellchecking, I also added two new options:
-Wfixme and -Wtodo, for warning when comments containing
"FIXME" or "TODO" are encountered.
I use such comments a lot during development.  I thought some
people might want a warning about them (I tend to just use grep
though).  [TODO: document these in invoke.texi, add test cases]

Thoughts?  Does any of this sound useful?

[not yet bootstrapped; as noted above, I haven't yet done
the autoconf stuff for handling Enchant]

[1] https://wiki.gnome.org/Projects/GObjectIntrospection/Annotations

gcc/ChangeLog:
* Makefile.in (LIBS): Hack in -lenchant for now.
(OBJS): Add spellcheck-enchant.o.
* common.opt (Wfixme): New option.
(Wtodo): New option.
* spellcheck-enchant.c: New file.
* spellcheck-enchant.h: New file.

gcc/c-family/ChangeLog:
* c-lex.c: Include spellcheck-enchant.h.
(init_c_lex): Wire up spellcheck_enchant_check_comment to the
comment callback.
* c-opts.c: Include spellcheck-enchant.h.
(c_common_post_options): Call spellcheck_enchant_init.
(c_common_finish): Call spellcheck_enchant_finish.

libcpp/ChangeLog:
* include/cpplib.h (struct cpp_callbacks): Add "comment"
callback.
* lex.c (_cpp_lex_direct): Call the comment callback if non-NULL.
enchant seems a bit out of the sweet spot, particular just to catch 
mis-spellings in comments.  But it might make an interesting plugin.


IIRC from our meeting earlier this week, you had another use case that 
might have been more compelling, but I can't remember what it was.


I do like the ability to at least capture the comments better and while 
we don't have a strong need for that capability now, we might in the future.


Jeff


[PATCH, i386]: Fix PR80723, ignore cost of adding/subtracting a carry flag for ADC/SBB insns

2017-05-12 Thread Uros Bizjak
Hello!

Attached patch adjusts RTX costs to ignore addition or subtraction of
a carry flag for ADC or SBB instruction. These operations are
essentially free.

2017-05-12  Uros Bizjak  

PR target/80723
* config/i386/i386.c (ix86_rtx_cost) [case PLUS]: Ignore the
cost of adding a carry flag for ADC instruction.
[case MINUS]: Ignore the cost of subtracting a carry flag
for SBB instruction.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}, where
the patch fixes gcc.target/i386/cadd.c FAIL.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 247980)
+++ config/i386/i386.c  (working copy)
@@ -40913,9 +40913,16 @@ ix86_rtx_costs (rtx x, machine_mode mode, int oute
}
  else if (GET_CODE (XEXP (x, 0)) == PLUS)
{
- *total = cost->lea;
- *total += rtx_cost (XEXP (XEXP (x, 0), 0), mode,
- outer_code, opno, speed);
+ /* Add with carry, ignore the cost of adding a carry flag.  */
+ if (ix86_carry_flag_operator (XEXP (XEXP (x, 0), 0), mode))
+   *total = cost->add;
+ else
+   {
+ *total = cost->lea;
+ *total += rtx_cost (XEXP (XEXP (x, 0), 0), mode,
+ outer_code, opno, speed);
+   }
+
  *total += rtx_cost (XEXP (XEXP (x, 0), 1), mode,
  outer_code, opno, speed);
  *total += rtx_cost (XEXP (x, 1), mode,
@@ -40926,6 +40933,20 @@ ix86_rtx_costs (rtx x, machine_mode mode, int oute
   /* FALLTHRU */
 
 case MINUS:
+  /* Subtract with borrow, ignore the cost of subtracting a carry flag.  */
+  if (GET_MODE_CLASS (mode) == MODE_INT
+ && GET_MODE_SIZE (mode) <= UNITS_PER_WORD
+ && GET_CODE (XEXP (x, 0)) == MINUS
+ && ix86_carry_flag_operator (XEXP (XEXP (x, 0), 1), mode))
+   {
+ *total = cost->add;
+ *total += rtx_cost (XEXP (XEXP (x, 0), 0), mode,
+ outer_code, opno, speed);
+ *total += rtx_cost (XEXP (x, 1), mode,
+ outer_code, opno, speed);
+ return true;
+   }
+
   if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
{
  /* ??? SSE cost should be used here.  */


Re: [PATCH, rs6000] Add x86 instrinsic headers to GCC PPC64LE taget

2017-05-12 Thread Steven Munroe
On Fri, 2017-05-12 at 11:38 -0700, Mike Stump wrote:
> On May 8, 2017, at 7:49 AM, Steven Munroe  wrote:
> > Of course as part of this process we will port as many of the
> > corresponding DejaGnu tests from gcc/testsuite/gcc.target/i386/ to
> > gcc/testsuite/gcc.target/powerpc/ as appropriate. So far the dg-do run
> > tests only require minor source changes, mostly to the platform specific
> > dg-* directives. A few dg-do compile tests are needed to insure we are
> > getting the expected folding/Common subexpression elimination (CSE) to
> > generate the optimum sequence for PowerPC.
> 
> If there is a way to share that seems reasonable and the x86 would like to 
> share...
> 
> I'd let you and the x86 folks figure out what is best.

It too early to tell but I have no objections to discussing options.

Are you looking to share source files? This seems like low value because
the files tend to be small and the only difference is the dg-*
directives. I don't know enough about the DejaGnu macros to even guess
at what this might entail.

So far the sharing it is mostly one way (./i386/ -> ./powerpc/) but if I
find cases that requires a new dg test and might also apply to ./i386/ I
be willing to share that with X86.




Re: [RFC PATCH, i386]: Enable post-reload compare elimination pass

2017-05-12 Thread Uros Bizjak
On Fri, May 12, 2017 at 8:34 PM, Jeff Law  wrote:
> On 05/10/2017 01:05 PM, Uros Bizjak wrote:
>>
>> On Wed, May 10, 2017 at 5:18 PM, Uros Bizjak  wrote:
>>>
>>> On Wed, May 10, 2017 at 4:27 PM, Jakub Jelinek  wrote:

 On Tue, May 09, 2017 at 06:06:47PM +0200, Uros Bizjak wrote:
>
> Attached patch enables post-reload compare elimination pass by
> providing expected patterns (duplicates of existing patterns with
> setters of reg and flags switched in the parallel) for flag setting
> arithmetic instructions.
>
> The merge triggers more than 3000 times during the gcc bootstrap,
> mostly in cases where intervening memory load or store prevents
> combine from merging the arithmetic insn and the following compare.
>
> Also, some recent linux x86_64 defconfig build results in ~200 merges,
> removing ~200 test/cmp insns. Not much, but I think the results still
> warrant the pass to be enabled.


 Isn't the right fix instead to change the compare-elim.c pass to either
 accept both reg vs. flags orderings in parallel, or both depending
 on some target hook, or change it to the order i386.md and most other
 major targets use and just fix up mn10300/rx (and aarch64?) to use the
 same
 order?
>>
>>
>> Attached patch changes compare-elim.c order to what i386.md expects.
>>
>> Thoughts?
>
> I think with an appropriate change to the canonicalization rules in the
> manual this is fine.
>
> I've got the visium, rx and mn103 patches handy to ensure they don't
> regress.  aarch64 doesn't seem to be affected either way but I didn't
> investigate why -- I expected it to improve with your change.
>
> I'll write up a ChangeLog and commit the mn103, rx and visium changes after
> you commit the compare-elim & documentation bits.

Thanks!

I went ahead and commit the patch without documentation change (which
I plan to submit in a separate patch ASAP), with the following
ChangeLog:

2017-05-12  Uros Bizjak  

* compare-elim.c (try_eliminate_compare): Canonicalize
operation with embedded compare to
[(set (reg:CCM) (compare:CCM (operation) (immediate)))
 (set (reg) (operation)].

* config/i386/i386.c (TARGET_FLAGS_REGNUM): New define.

Re-bootstrapped and re-tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 247978)
+++ config/i386/i386.c  (working copy)
@@ -51805,6 +51826,8 @@ ix86_run_selftests (void)
 #undef TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST ix86_address_cost
 
+#undef TARGET_FLAGS_REGNUM
+#define TARGET_FLAGS_REGNUM FLAGS_REG
 #undef TARGET_FIXED_CONDITION_CODE_REGS
 #define TARGET_FIXED_CONDITION_CODE_REGS ix86_fixed_condition_code_regs
 #undef TARGET_CC_MODES_COMPATIBLE
Index: compare-elim.c
===
--- compare-elim.c  (revision 247978)
+++ compare-elim.c  (working copy)
@@ -45,9 +45,9 @@ along with GCC; see the file COPYING3.  If not see
(3) If an insn of form (2) can usefully set the flags, there is
another pattern of the form
 
-   [(set (reg) (operation)
-(set (reg:CCM) (compare:CCM (operation) (immediate)))]
-
+   [(set (reg:CCM) (compare:CCM (operation) (immediate)))
+(set (reg) (operation)]
+
The mode CCM will be chosen as if by SELECT_CC_MODE.
 
Note that unlike NOTICE_UPDATE_CC, we do not handle memory operands.
@@ -582,7 +582,7 @@ equivalent_reg_at_start (rtx reg, rtx_insn *end, r
 static bool
 try_eliminate_compare (struct comparison *cmp)
 {
-  rtx x, flags, in_a, in_b, cmp_src;
+  rtx flags, in_a, in_b, cmp_src;
 
   /* We must have found an interesting "clobber" preceding the compare.  */
   if (cmp->prev_clobber == NULL)
@@ -628,7 +628,8 @@ try_eliminate_compare (struct comparison *cmp)
  Validate that PREV_CLOBBER itself does in fact refer to IN_A.  Do
  recall that we've already validated the shape of PREV_CLOBBER.  */
   rtx_insn *insn = cmp->prev_clobber;
-  x = XVECEXP (PATTERN (insn), 0, 0);
+
+  rtx x = XVECEXP (PATTERN (insn), 0, 0);
   if (rtx_equal_p (SET_DEST (x), in_a))
 cmp_src = SET_SRC (x);
 
@@ -666,13 +667,24 @@ try_eliminate_compare (struct comparison *cmp)
 flags = gen_rtx_REG (cmp->orig_mode, targetm.flags_regnum);
 
   /* Generate a new comparison for installation in the setter.  */
-  x = copy_rtx (cmp_src);
-  x = gen_rtx_COMPARE (GET_MODE (flags), x, in_b);
-  x = gen_rtx_SET (flags, x);
+  rtx y = copy_rtx (cmp_src);
+  y = gen_rtx_COMPARE (GET_MODE (flags), y, in_b);
+  y = gen_rtx_SET (flags, y);
 
+  /* Canonicalize instruction to:
+ [(set (reg:CCM) (compare:CCM (operation) (immediate)))
+  (set (reg) (operation)]  */
+
+  rtvec v = rtvec_alloc (2);
+  RTVEC_ELT (v, 0) = y;
+  RTVEC_ELT (v, 1) = x;
+  
+  rtx pat = gen_rtx_PARALLEL (VOIDmode, v);
+  
   /* Succeed if the n

[PATCH] [ARC] Recognise add_n and sub_n in combine again

2017-05-12 Thread Graham Markall
Since the combine pass canonicalises shift-add insns using plus and
ashift (as opposed to plus and mult which it previously used to do), it
no longer creates *add_n or *sub_n insns, as the patterns match plus and
mult only. The outcome of this is that some opportunities to generate
add{1,2,3} and sub{1,2,3} instructions are missed.

This change adds additional *add_n and *sub_n insns that match the
plus-ashift pattern. The original *add_n and *sub_n insns are still left
in, as they are sometimes generated later on by constant propagation.
The idea of adding these insns is modelled on the changes in:

  https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01882.html

which addresses a similar issue for the PA target.

For the small test cases that are added, even if the combine pass misses
the opportunity to generate addN or subN, constant propagation manages
to do so, so the rtl of the combine pass is checked.

gcc/ChangeLog:

* config/arc/arc.c (arc_print_operand): Handle constant operands.
(arc_rtx_costs): Add costs for new patterns.
* config/arc/arc.md: Additional *add_n and *sub_n patterns.
* config/arc/predicates.md: Add _1_2_3_operand predicate.

gcc/testsuite/ChangeLog:

* gcc.target/arc/add_n-combine.c: New.
* gcc.target/arc/sub_n-combine.c: New.
---
 gcc/ChangeLog|  7 
 gcc/config/arc/arc.c | 20 +---
 gcc/config/arc/arc.md| 26 +++
 gcc/config/arc/predicates.md |  5 +++
 gcc/testsuite/ChangeLog  |  5 +++
 gcc/testsuite/gcc.target/arc/add_n-combine.c | 48 
 gcc/testsuite/gcc.target/arc/sub_n-combine.c | 21 
 7 files changed, 128 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arc/add_n-combine.c
 create mode 100644 gcc/testsuite/gcc.target/arc/sub_n-combine.c

diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 91c28e7..42730d5 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -3483,6 +3483,14 @@ arc_print_operand (FILE *file, rtx x, int code)
 
   return;
 
+case 'c':
+  if (GET_CODE (x) == CONST_INT)
+fprintf (file, "%d", INTVAL (x) );
+  else
+output_operand_lossage ("invalid operands to %%c code");
+
+  return;
+
 case 'M':
   if (GET_CODE (x) == CONST_INT)
fprintf (file, "%d",exact_log2(~INTVAL (x)) );
@@ -4895,8 +4903,10 @@ arc_rtx_costs (rtx x, machine_mode mode, int outer_code,
*total = COSTS_N_INSNS (2);
   return false;
 case PLUS:
-  if (GET_CODE (XEXP (x, 0)) == MULT
- && _2_4_8_operand (XEXP (XEXP (x, 0), 1), VOIDmode))
+  if ((GET_CODE (XEXP (x, 0)) == ASHIFT
+  && _1_2_3_operand (XEXP (XEXP (x, 0), 1), VOIDmode))
+  || (GET_CODE (XEXP (x, 0)) == MULT
+  && _2_4_8_operand (XEXP (XEXP (x, 0), 1), VOIDmode)))
{
  *total += (rtx_cost (XEXP (x, 1), mode, PLUS, 0, speed)
 + rtx_cost (XEXP (XEXP (x, 0), 0), mode, PLUS, 1, speed));
@@ -4904,8 +4914,10 @@ arc_rtx_costs (rtx x, machine_mode mode, int outer_code,
}
   return false;
 case MINUS:
-  if (GET_CODE (XEXP (x, 1)) == MULT
- && _2_4_8_operand (XEXP (XEXP (x, 1), 1), VOIDmode))
+  if ((GET_CODE (XEXP (x, 1)) == ASHIFT
+  && _1_2_3_operand (XEXP (XEXP (x, 1), 1), VOIDmode))
+  || (GET_CODE (XEXP (x, 1)) == MULT
+  && _2_4_8_operand (XEXP (XEXP (x, 1), 1), VOIDmode)))
{
  *total += (rtx_cost (XEXP (x, 0), mode, PLUS, 0, speed)
 + rtx_cost (XEXP (XEXP (x, 1), 0), mode, PLUS, 1, speed));
diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index edb983f..ec783a0 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -2995,6 +2995,19 @@
 
 (define_insn "*add_n"
   [(set (match_operand:SI 0 "dest_reg_operand" "=Rcqq,Rcw,W,W,w,w")
+   (plus:SI (ashift:SI (match_operand:SI 1 "register_operand" 
"Rcqq,c,c,c,c,c")
+   (match_operand:SI 2 "_1_2_3_operand" ""))
+(match_operand:SI 3 "nonmemory_operand" 
"0,0,c,?Cal,?c,??Cal")))]
+  ""
+  "add%c2%? %0,%3,%1%&"
+  [(set_attr "type" "shift")
+   (set_attr "length" "*,4,4,8,4,8")
+   (set_attr "predicable" "yes,yes,no,no,no,no")
+   (set_attr "cond" "canuse,canuse,nocond,nocond,nocond,nocond")
+   (set_attr "iscompact" "maybe,false,false,false,false,false")])
+
+(define_insn "*add_n"
+  [(set (match_operand:SI 0 "dest_reg_operand" "=Rcqq,Rcw,W,W,w,w")
(plus:SI (mult:SI (match_operand:SI 1 "register_operand" 
"Rcqq,c,c,c,c,c")
  (match_operand:SI 2 "_2_4_8_operand" ""))
 (match_operand:SI 3 "nonmemory_operand" 
"0,0,c,?Cal,?c,??Cal")))]
@@ -3011,6 +3024,19 @@
 (define_insn "*sub_n"
   [(set (match_operand:SI 0 "dest_reg_operand" "=Rcw,w,w")
(minus:SI (match_operand:SI 1 "nonmemory_op

C PATCH to kill c_save_expr or towards delayed folding for the C FE

2017-05-12 Thread Marek Polacek
[ This patch depends on my previous patch that removes the fold call from
  save_expr. ]

In the effort of reducing early folding, we should avoid calling c_fully_fold
blithely, except when needed for e.g. initializers.  This is a teeny tiny step
down that path.  This patch does away with c_save_expr, which always fully
folded the expression.  Instead, I moved the SAVE_EXPR folding to c_fully_fold.
The new flag SAVE_EXPR_FOLDED_P ensures we only fold the contents once.  Also,
if the contents of the SAVE_EXPR are tree_invariant_p, we don't have to wrap
the expression with SAVE_EXPR.  I wonder if I should instead introduce a hash
map, much like C++; if we fold the contents of a SAVE_EXPR to an invariant, I
think we'll fold the expression multiple times, because in that case we cannot
check the SAVE_EXPR_FOLDED_P flag.

Comments/future directions very much appreciated.

Bootstrapped/regtested on x86_64-linux, ok for trunk once the other patch makes
it in?

2017-05-12  Marek Polacek  

* c-common.c (c_save_expr): Remove.
(c_common_truthvalue_conversion): Remove a call to c_save_expr.
* c-common.h (c_save_expr): Remove declaration.

* c-convert.c (convert): Replace c_save_expr with save_expr.  Don't
call c_fully_fold.
* c-decl.c (grokdeclarator): Replace c_save_expr with save_expr. 
* c-fold.c (c_fully_fold_internal): Handle SAVE_EXPR.
* c-parser.c (c_parser_declaration_or_fndef): Replace c_save_expr with
save_expr.
(c_parser_conditional_expression): Likewise.
* c-tree.h (SAVE_EXPR_FOLDED_P): Define.
* c-typeck.c (build_modify_expr): Replace c_save_expr with save_expr.
(process_init_element): Likewise.
(build_binary_op): Likewise.
(handle_omp_array_sections_1): Likewise.

diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index ad686d2..f606e94 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -3164,24 +3164,6 @@ c_wrap_maybe_const (tree expr, bool non_const)
   return expr;
 }
 
-/* Wrap a SAVE_EXPR around EXPR, if appropriate.  Like save_expr, but
-   for C folds the inside expression and wraps a C_MAYBE_CONST_EXPR
-   around the SAVE_EXPR if needed so that c_fully_fold does not need
-   to look inside SAVE_EXPRs.  */
-
-tree
-c_save_expr (tree expr)
-{
-  bool maybe_const = true;
-  if (c_dialect_cxx ())
-return save_expr (expr);
-  expr = c_fully_fold (expr, false, &maybe_const);
-  expr = save_expr (expr);
-  if (!maybe_const)
-expr = c_wrap_maybe_const (expr, true);
-  return expr;
-}
-
 /* Return whether EXPR is a declaration whose address can never be
NULL.  */
 
@@ -3436,7 +3418,7 @@ c_common_truthvalue_conversion (location_t location, tree 
expr)
 
   if (TREE_CODE (TREE_TYPE (expr)) == COMPLEX_TYPE)
 {
-  tree t = (in_late_binary_op ? save_expr (expr) : c_save_expr (expr));
+  tree t = save_expr (expr);
   expr = (build_binary_op
  (EXPR_LOCATION (expr),
   (TREE_SIDE_EFFECTS (expr)
diff --git gcc/c-family/c-common.h gcc/c-family/c-common.h
index 9e3982d..3981544 100644
--- gcc/c-family/c-common.h
+++ gcc/c-family/c-common.h
@@ -836,7 +836,6 @@ extern enum conversion_safety unsafe_conversion_p 
(location_t, tree, tree,
 extern bool decl_with_nonnull_addr_p (const_tree);
 extern tree c_fully_fold (tree, bool, bool *);
 extern tree c_wrap_maybe_const (tree, bool);
-extern tree c_save_expr (tree);
 extern tree c_common_truthvalue_conversion (location_t, tree);
 extern void c_apply_type_quals_to_decl (int, tree);
 extern tree c_sizeof_or_alignof_type (location_t, tree, bool, bool, int);
diff --git gcc/c/c-convert.c gcc/c/c-convert.c
index 163feff..a7764f0 100644
--- gcc/c/c-convert.c
+++ gcc/c/c-convert.c
@@ -111,13 +111,7 @@ convert (tree type, tree expr)
  && COMPLETE_TYPE_P (type)
  && do_ubsan_in_current_function ())
{
- if (in_late_binary_op)
-   expr = save_expr (expr);
- else
-   {
- expr = c_save_expr (expr);
- expr = c_fully_fold (expr, false, NULL);
-   }
+ expr = save_expr (expr);
  tree check = ubsan_instrument_float_cast (loc, type, expr);
  expr = fold_build1 (FIX_TRUNC_EXPR, type, expr);
  if (check == NULL_TREE)
@@ -146,8 +140,7 @@ convert (tree type, tree expr)
 
 case COMPLEX_TYPE:
   /* If converting from COMPLEX_TYPE to a different COMPLEX_TYPE
-and e is not COMPLEX_EXPR, convert_to_complex uses save_expr,
-but for the C FE c_save_expr needs to be called instead.  */
+and E is not COMPLEX_EXPR, convert_to_complex uses save_expr.  */
   if (TREE_CODE (TREE_TYPE (e)) == COMPLEX_TYPE)
{
  if (TREE_CODE (e) != COMPLEX_EXPR)
@@ -155,10 +148,7 @@ convert (tree type, tree expr)
  tree subtype = TREE_TYPE (type);
  tree elt_type = TREE_TYPE (TREE_TYPE (e));
 
- if (in_late_binary_op)
-   

Re: C PATCH to kill c_save_expr or towards delayed folding for the C FE

2017-05-12 Thread Jakub Jelinek
On Fri, May 12, 2017 at 09:37:27PM +0200, Marek Polacek wrote:
> @@ -565,6 +564,25 @@ c_fully_fold_internal (tree expr, bool in_init, bool 
> *maybe_const_operands,
>appropriate in any particular case.  */
>gcc_unreachable ();
>  
> +case SAVE_EXPR:
> +  /* Make sure to fold the contents of a SAVE_EXPR exactly once.  */
> +  if (!SAVE_EXPR_FOLDED_P (expr))
> + {
> +   op0 = TREE_OPERAND (expr, 0);
> +   op0 = c_fully_fold_internal (op0, in_init, maybe_const_operands,
> +maybe_const_itself, for_int_const);
> +   /* Don't wrap the folded tree in a SAVE_EXPR if we don't
> +  have to.  */
> +   if (tree_invariant_p (op0))
> + ret = op0;
> +   else
> + {
> +   TREE_OPERAND (expr, 0) = op0;
> +   SAVE_EXPR_FOLDED_P (expr) = true;
> + }
> + }

Wouldn't it be better to guard with if (!SAVE_EXPR_FOLDED_P (expr))
only c_fully_fold_internal recursion on the operand
and then use if (tree_invariant_p (op0)) unconditionally?

> @@ -113,6 +113,10 @@ along with GCC; see the file COPYING3.  If not see
> subexpression meaning it is not a constant expression.  */
>  #define CONSTRUCTOR_NON_CONST(EXPR) TREE_LANG_FLAG_1 (CONSTRUCTOR_CHECK 
> (EXPR))
>  
> +/* For a SAVE_EXPR, nonzero if the contents of the SAVE_EXPR have already
> +   been folded.  */

s/contents/operand/;s/have/has/ ?

Otherwise I'm all for this, but would like to give you and Joseph as C FE
maintainers the last word on this.

Jakub


Re: [RFC PATCH, i386]: Enable post-reload compare elimination pass

2017-05-12 Thread Uros Bizjak
On Fri, May 12, 2017 at 9:09 PM, Uros Bizjak  wrote:
> On Fri, May 12, 2017 at 8:34 PM, Jeff Law  wrote:
>> On 05/10/2017 01:05 PM, Uros Bizjak wrote:
>>>
>>> On Wed, May 10, 2017 at 5:18 PM, Uros Bizjak  wrote:

 On Wed, May 10, 2017 at 4:27 PM, Jakub Jelinek  wrote:
>
> On Tue, May 09, 2017 at 06:06:47PM +0200, Uros Bizjak wrote:
>>
>> Attached patch enables post-reload compare elimination pass by
>> providing expected patterns (duplicates of existing patterns with
>> setters of reg and flags switched in the parallel) for flag setting
>> arithmetic instructions.
>>
>> The merge triggers more than 3000 times during the gcc bootstrap,
>> mostly in cases where intervening memory load or store prevents
>> combine from merging the arithmetic insn and the following compare.
>>
>> Also, some recent linux x86_64 defconfig build results in ~200 merges,
>> removing ~200 test/cmp insns. Not much, but I think the results still
>> warrant the pass to be enabled.
>
>
> Isn't the right fix instead to change the compare-elim.c pass to either
> accept both reg vs. flags orderings in parallel, or both depending
> on some target hook, or change it to the order i386.md and most other
> major targets use and just fix up mn10300/rx (and aarch64?) to use the
> same
> order?
>>>
>>>
>>> Attached patch changes compare-elim.c order to what i386.md expects.
>>>
>>> Thoughts?
>>
>> I think with an appropriate change to the canonicalization rules in the
>> manual this is fine.
>>
>> I've got the visium, rx and mn103 patches handy to ensure they don't
>> regress.  aarch64 doesn't seem to be affected either way but I didn't
>> investigate why -- I expected it to improve with your change.
>>
>> I'll write up a ChangeLog and commit the mn103, rx and visium changes after
>> you commit the compare-elim & documentation bits.

Something like the attached patch?

2017-05-12  Uros Bizjak  

* doc/md.texi ( Canonicalization of Instructions): Describe the
canonical form of instructions that inherently set a condition
code register.

Uros.
Index: doc/md.texi
===
--- doc/md.texi (revision 247980)
+++ doc/md.texi (working copy)
@@ -7258,6 +7258,25 @@
 if the first argument is a condition code register or @code{(cc0)}.
 
 @item
+For instructions that inherently set a condition code register, the
+@code{compare} operator is always written as the first RTL expression of
+the @code{parallel} instruction pattern.  For example,
+
+@smallexample
+(define_insn ""
+  [(set (reg:CCZ FLAGS_REG)
+   (compare:CCZ
+ (plus:SI
+   (match_operand:SI 1 "register_operand" "%r")
+   (match_operand:SI 2 "register_operand" "r"))
+ (const_int 0)))
+   (set (match_operand:SI 0 "register_operand" "=r")
+   (plus:SI (match_dup 1) (match_dup 2)))]
+  ""
+  "addl %0, %1, %2")
+@end smallexample
+
+@item
 An operand of @code{neg}, @code{not}, @code{mult}, @code{plus}, or
 @code{minus} is made the first operand under the same conditions as
 above.


New Swedish PO file for 'gcc' (version 7.1.0)

2017-05-12 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Swedish team of translators.  The file is available at:

http://translationproject.org/latest/gcc/sv.po

(This file, 'gcc-7.1.0.sv.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: C PATCH to kill c_save_expr or towards delayed folding for the C FE

2017-05-12 Thread Joseph Myers
On Fri, 12 May 2017, Marek Polacek wrote:

> In the effort of reducing early folding, we should avoid calling c_fully_fold
> blithely, except when needed for e.g. initializers.  This is a teeny tiny step

Note there are several reasons for early folding in the C front end: at 
least (a) cases where logically needed (initializers and other places 
where constants are needed), (b) because warnings need a folded 
expression, (c) when the expression will go somewhere c_fully_fold does 
not recurse inside.  Also (d) convert, at least, folds regardless of 
whether it's actually necessary.

There is a case for avoiding (b) by putting the necessary information in 
the IR so the warnings can happen later from c_fully_fold, though there 
may be other possible approaches.

> @@ -146,8 +140,7 @@ convert (tree type, tree expr)
>  
>  case COMPLEX_TYPE:
>/* If converting from COMPLEX_TYPE to a different COMPLEX_TYPE
> -  and e is not COMPLEX_EXPR, convert_to_complex uses save_expr,
> -  but for the C FE c_save_expr needs to be called instead.  */
> +  and E is not COMPLEX_EXPR, convert_to_complex uses save_expr.  */
>if (TREE_CODE (TREE_TYPE (e)) == COMPLEX_TYPE)
>   {
> if (TREE_CODE (e) != COMPLEX_EXPR)

The point of this comment is to explain why we don't just call 
convert_to_complex here (see PR 47150).  So with your changes it would 
seem appropriate to change c-convert.c back to calling convert_to_complex 
here.

The changes seem otherwise OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Remove _Safe_container _IsCxx11AllocatorAware template parameter

2017-05-12 Thread François Dumont

On 12/05/2017 12:38, Jonathan Wakely wrote:

On 11/05/17 22:03 +0200, François Dumont wrote:

Hi

   _Safe_container _IsCxx11AllocatorAware template allocator is only 
used if C++11 Abi is not used so I simplified it.


   * include/debug/safe_container.h [_GLIBCXX_USE_CXX11_ABI]
   (_Safe_container<>): Remove _IsCxx11AllocatorAware template 
parameter.

   * include/debug/string: Adapt.

   Tested under Linux x86_64 with debug mode and active C++11 Abi.


Wait, doesn't this mean that debug containers have a different base
class depending on which ABI is active in the translation unit? That
is a One-Definition Rule violation when you link together objects
compiled with different values of _GLIBCXX_USE_CXX11_ABI, which is
supposed to work. I went to enormous effort to ensure it works.



François



diff --git a/libstdc++-v3/include/debug/safe_container.h 
b/libstdc++-v3/include/debug/safe_container.h

index 3d44c15..e985c2a 100644
--- a/libstdc++-v3/include/debug/safe_container.h
+++ b/libstdc++-v3/include/debug/safe_container.h
@@ -36,8 +36,12 @@ namespace __gnu_debug
  /// Safe class dealing with some allocator dependent operations.
  template class _SafeBase,
-   bool _IsCxx11AllocatorAware = true>
+   template class _SafeBase
+#if _GLIBCXX_USE_CXX11_ABI
+   >
+#else
+   , bool _IsCxx11AllocatorAware = true>
+#endif
class _Safe_container


In any case, I don't think this is simpler, there are more lines of
code, and the preprocessor condition makes it harder to read and
harder to reason about.


: public _SafeBase<_SafeContainer>
{
@@ -82,8 +86,10 @@ namespace __gnu_debug
  {
__glibcxx_check_self_move_assign(__x);

+#  if !_GLIBCXX_USE_CXX11_ABI
if (_IsCxx11AllocatorAware)
  {
+#  endif


This is fairly pointless. Again it makes the code a bit harder to
read, and since it's a compile-time constant the condition will be
optimised away completely.



Ok, I reverted it.

François




[PATCH], PR target/80510, Optimize offsettable memory references on power7/power8

2017-05-12 Thread Michael Meissner
When I was fixing PR target/68163, I noticed there was a wider problem, and
opened up PR 80510.

The problem is if the DImode, DFmode, and SFmode are allowed in Altivec
registers before ISA 3.0, and the compiler wants to do an offsettable store.
The compiler generates a move from an Altivec register to a traditional
floating point register, and then the compiler generates the STFD or STFS
instruction.

This code adds peephole2's that notices there is a move from an altivec
regsiter to fpr register and store, it changes this load the offset into a GPR,
and do the indexed store from the Altivec register.  I also added code to do
the reverse (notice if there is a load to a FPR register and copy it to an
Altivec register) and use an indexed load.

I ran the Spec 2006 floating point suite with this patch, and the LBM benchmark
shows a nearly 3% gain with this patch, and there were no significant
regressions.  I also bootstrapped the compiler and insured there were no
regressions.  Can I check this into GCC 8?  How about GCC 7?

Note, using peepholes are a quick way to fix the particular problem.  However,
it would be nice long term to arrange things so the back end can tell the
register allocator to load up the offset into a register, instead of doing the
move/store.  I tried various modifications to secondary reload, but I wasn't
able to get it to change behavor.

[gcc]
2017-05-12  Michael Meissner  

PR target/80510
* config/rs6000/predicates.md (simple_offsettable_mem_operand):
New predicate.

* config/rs6000/rs6000.md (ALTIVEC_DFORM): New iterator.
(define_peephole2 for Altivec d-form load): Add peepholes to catch
cases where the register allocator uses a move and an offsettable
memory operation to/from a FPR register on ISA 2.06/2.07.
(define_peephole2 for Altivec d-form store): Likewise.

[gcc/testsuite]
2017-05-12  Michael Meissner  

PR target/80510
* gcc.target/powerpc/pr80510-1.c: New test.
* gcc.target/powerpc/pr80510-2.c: Likewise.



-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md 
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/branches/ibm/meissner-gcc8/gcc/config/rs6000)
   (revision 247282)
+++ gcc/config/rs6000/rs6000.md (.../gcc/config/rs6000) (working copy)
@@ -701,6 +701,11 @@ (define_code_attr minmax   [(smin "min
 (define_code_attr SMINMAX  [(smin "SMIN")
 (smax "SMAX")])
 
+;; Iterator to optimize the following cases:
+;; D-form load to FPR register & move to Altivec register
+;; Move Altivec register to FPR register and store
+(define_mode_iterator ALTIVEC_DFORM [DI DF SF])
+
 
 ;; Start with fixed-point load and store insns.  Here we put only the more
 ;; complex forms.  Basic data transfer is done later.
@@ -14067,6 +14072,74 @@ (define_insn "*fusion_p9__constant
(set_attr "length" "8")])
 
 
+;; Optimize cases where we want to do a D-form load (register+offset) on
+;; ISA 2.06/2.07 to an Altivec register, and the register allocator
+;; has generated:
+;; load fpr
+;; move fpr->altivec
+
+(define_peephole2
+  [(match_scratch:DI 0 "b")
+   (set (match_operand:ALTIVEC_DFORM 1 "fpr_reg_operand")
+   (match_operand:ALTIVEC_DFORM 2 "simple_offsettable_mem_operand"))
+   (set (match_operand:ALTIVEC_DFORM 3 "altivec_register_operand")
+   (match_dup 1))]
+  "TARGET_VSX && TARGET_POWERPC64 && TARGET_UPPER_REGS_
+   && !TARGET_P9_DFORM_SCALAR && peep2_reg_dead_p (2, operands[1])"
+  [(set (match_dup 0)
+   (match_dup 4))
+   (set (match_dup 3)
+   (match_dup 5))]
+{
+  rtx tmp_reg = operands[0];
+  rtx mem = operands[2];
+  rtx addr = XEXP (mem, 0);
+  rtx add_op0, add_op1, new_addr;
+
+  gcc_assert (GET_CODE (addr) == PLUS || GET_CODE (addr) == LO_SUM);
+  add_op0 = XEXP (addr, 0);
+  add_op1 = XEXP (addr, 1);
+  gcc_assert (REG_P (add_op0));
+  new_addr = gen_rtx_PLUS (DImode, add_op0, tmp_reg);
+
+  operands[4] = add_op1;
+  operands[5] = change_address (mem, mode, new_addr);
+})
+
+;; Optimize cases were want to do a D-form store on ISA 2.06/2.07 from an
+;; Altivec register, and the register allocator has generated:
+;; move altivec->fpr
+;; store fpr
+
+(define_peephole2
+  [(match_scratch:DI 0 "b")
+   (set (match_operand:ALTIVEC_DFORM 1 "fpr_reg_operand")
+   (match_operand:ALTIVEC_DFORM 2 "altivec_register_operand"))
+   (set (match_operand:ALTIVEC_DFORM 3 "simple_offsettable_mem_operand")
+   (match_dup 1))]
+  "TARGET_VSX && TARGET_POWERPC64 && TARGET_UPPER_REGS_
+   && !TARGET_P9_DFORM_SCALAR && peep2_reg_dead_p (2, operands[1])"
+  [(set (match_dup 0)
+   (match_dup 4))
+   (set (match_dup 5)
+   (match_dup 2))]
+{
+  rtx tmp_reg = operands[0];
+  rtx mem = 

[PING] [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues

2017-05-12 Thread Daniel Santos

Ping?  I have posted revisions of the following in patch set:

05/12 - https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01442.html
09/12 - https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00348.html
11/12 - https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00350.html

I have retested them on Linux x86-64 in addition a Wine testsuite 
comparison resulting in fewer failed tests (31) than when using 
unpatched 7.1.0 (78) and 5.4.0 (78).  A cursory examination of the now 
working failures with 7.1.0 seemed to be to be due to race conditions in 
Wine that are incidentally hidden after the patches.


Is there anything else needed before we can commit these?  They still 
rebase cleanly onto the HEAD, but I can repost as "v5" if you prefer.


Thanks,
Daniel



Re: [PATCH] decl lang hooks

2017-05-12 Thread Nathan Sidwell

On 05/12/2017 03:26 AM, Richard Biener wrote:

On Thu, May 11, 2017 at 7:58 PM, Nathan Sidwell  wrote:

On 05/11/2017 01:50 PM, Rainer Orth wrote:


however, it breaks bootstrap with --enable-languages=obj-c++:



wierd, I thought --enable-languges=all enabled that (and I have seen objc
issues pop up during development).  Will take another look.


to enable all languages you need to do
--enable-languages=all,ada,obj-c++,go,brig


seems like --enable-languages=all means --enable-languages=some :)  (though it 
does help if one spells the option correctly)


nathan

--
Nathan Sidwell


Re: [PATCH] handling address mode changes inside extract_bit_field

2017-05-12 Thread Jim Wilson
On Thu, May 4, 2017 at 7:24 PM, Jeff Law  wrote:
> On 03/01/2017 03:06 PM, Jim Wilson wrote:
> This seems fine to me.  A testcase to add to the gcc.target testsuite would
> be useful, but I don't think it's strictly necessary.

Thanks for the review.  It was 2 months since I posted it, so I
retested it to make sure x86_64 and aarch64 bootstrap and make check
still worked, then checked it in.  That took me a while as I was busy
with other stuff this week.  The comment about a testcase is a good
point; I will work on one to submit.

Jim


Re: [PATCH] handling address mode changes inside extract_bit_field

2017-05-12 Thread Martin Sebor

On 05/04/2017 08:24 PM, Jeff Law wrote:

On 03/01/2017 03:06 PM, Jim Wilson wrote:

This is a proposed patch for the bug 79794 which I just submitted.
This isn't a regression, so this can wait for after the gcc 7 branch
if necessary.

The problem here is that a reg+offset MEM target is passed to
extract_bit_field with a vector register source.  On aarch64, we have
an instruction for this, but it accepts a reg address only, so the
address gets loaded into a reg inside extract_bit_field.  We then
return to expand_expr which does
   ! rtx_equal_p (temp, target)
which fails because of the address mode change, so we end up copying
target into a reg and then back to itself.

expand_expr has a solution for this problem.  There is an alt_rtl
variable that can be set when temp is logically the same as target.
This variable is currently not passed into extract_bit_field.  This
patch does that.

There is an additional complication that the actual address load into
a reg occurs inside maybe_expand_insn, and it doesn't seem reasonable
to pass alt_reg into that.  However, I can grab a bit from the
expand_operand structure to indicate when an operand is the target,
and then clear it if target is replaced with a reg.

The resulting patch works, but ends up a bit more invasive than I
hoped.  The patch has passed a bootstrap and make check test on x86_64
and aarch64.

Jim


alt-rtl.patch


Proposed patch for RTL expand bug affecting aarch64 vector code.

PR middle-end/79794
 * expmed.c (extract_bit_field_1): Add alt_rtl argument.  Before
maybe_expand_insn call, set ops[0].target.  If still set after call,
set alt_rtl.  Add extra arg to recursive calls.
(extract_bit_field): Add alt_rtl argument.  Pass to
extract_bit_field.
* expmed.h (extract_bit_field): Fix prototype.
* expr.c (emit_group_load_1, copy_blkmode_from_reg)
(copy_blkmode_to_reg, read_complex_part, store_field): Pass extra
NULL
to extract_bit_field_calls.
(expand_expr_real_1): Pass alt_rtl to expand_expr_real instead of 0.
Pass alt_rtl to extract_bit_field calls.
* calls.c (store_unaligned_arguments_into_psuedos)
load_register_parameters): Pass extra NULL to extract_bit_field
calls.
* optabs.c (maybe_legitimize_operand): Clear op->target when call
gen_reg_rtx.
* optabs.h (struct expand_operand): Add target bitfield.

The only part I found intrusive was the op->target stuff, but it wasn't
terrible.

The additional argument creates visual clutter in the diffs as the
callers get updated, but that's pretty easy to filter out.


Explicitly passing the additional argument at all the call sites
can be mitigated by giving the new alt_rtl argument a default
value of NULL in the declarations of the extract_bit_field functions.

Martin


  1   2   >