Re: [PATCH] Convert more passes to new dump framework

2013-08-30 Thread Andreas Schwab
Teresa Johnson  writes:

> Index: testsuite/gcc.dg/inline-dump.c
> ===
> --- testsuite/gcc.dg/inline-dump.c  (revision 0)
> +++ testsuite/gcc.dg/inline-dump.c  (revision 0)
> @@ -0,0 +1,11 @@
> +/* Verify that -fopt-info can output correct inline info.  */
> +/* { dg-do compile } */
> +/* { dg-options "-Wall -fopt-info-inline=stderr -O2 -fno-early-inlining" } */
> +static inline int leaf() {
> +  int i, ret = 0;
> +  for (i = 0; i < 10; i++)
> +ret += i;
> +  return ret;
> +}
> +static inline int foo(void) { return leaf(); } /* { dg-message "note:
> leaf .*inlined into bar .*via inline instance foo.*\n" } */

I don't see that message on either ia64 or m68k.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


[PATCH] Fix PR58228

2013-08-30 Thread Richard Biener

Invariant loads in the inner loop of a vectorized loop nest are not
supported, so make sure we don't run into the code that isn't prepared
to handle them.
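To illustrate (a minimal sketch, not the committed testcase below): in a nest
like the following, the inner-loop load has a zero step, i.e. it is invariant
in the inner loop, which is exactly the situation the patch now refuses to
vectorize as part of a nest.

  int a[8][8], b[8];

  void
  foo (void)
  {
    for (int i = 0; i < 8; i++)
      for (int j = 0; j < 8; j++)
        b[j] += a[i][0];   /* a[i][0] does not depend on j: zero step.  */
  }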

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2013-08-30  Richard Biener  

PR tree-optimization/58228
* tree-vect-data-refs.c (vect_analyze_data_ref_access): Do not
allow invariant loads in nested loop vectorization.

* gcc.dg/torture/pr58228.c: New testcase.

Index: gcc/tree-vect-data-refs.c
===
*** gcc/tree-vect-data-refs.c   (revision 202068)
--- gcc/tree-vect-data-refs.c   (working copy)
*** vect_analyze_data_ref_access (struct dat
*** 2272,2281 
return false;
  }
  
!   /* Allow invariant loads in loops.  */
if (loop_vinfo && integer_zerop (step))
  {
GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL;
return DR_IS_READ (dr);
  }
  
--- 2272,2288 
return false;
  }
  
!   /* Allow invariant loads in not nested loops.  */
if (loop_vinfo && integer_zerop (step))
  {
GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL;
+   if (nested_in_vect_loop_p (loop, stmt))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"zero step in inner loop of nest");
+ return false;
+   }
return DR_IS_READ (dr);
  }
  
Index: gcc/testsuite/gcc.dg/torture/pr58228.c
===
*** gcc/testsuite/gcc.dg/torture/pr58228.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr58228.c  (working copy)
***
*** 0 
--- 1,15 
+ /* { dg-do run } */
+ 
+ extern void abort (void);
+ int a[8][8] = {{1}};
+ int b, c, d, e;
+ 
+ int main ()
+ {
+   for (c = 0; c < 8; c++)
+ for (b = 0; b < 2; b++)
+   a[b + 4][c] = a[c][0];
+   if (a[4][4] != 1)
+ abort ();
+   return 0;
+ }


[PATCH] Fix PR58223

2013-08-30 Thread Richard Biener

Loop distribution fails to honor output dependences - the following
patch handles them the same as anti dependences.  All the code gathering
up dependences is a little weird and probably needs some TLC.
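To illustrate the kind of dependence involved (a made-up sketch, not the new
testcase): the two stores below can hit the same array element, so there is an
output dependence between them, and splitting them into two separate loops
would reorder the writes and change the final array contents.

  int a[16];

  void
  foo (void)
  {
    for (int i = 0; i < 16; i++)
      {
        a[i / 2] = 1;   /* may write the same element as the next store */
        a[i] = 0;       /* output dependence: the write order matters   */
      }
  }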

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2013-08-30  Richard Biener  

PR tree-optimization/58223
* tree-loop-distribution.c (has_anti_dependence): Rename to ...
(has_anti_or_output_dependence): ... this and adjust to also
look for output dependences.
(mark_nodes_having_upstream_mem_writes): Adjust.
(rdg_flag_uses): Likewise.

* gcc.dg/torture/pr58223.c: New testcase.
* gcc.dg/tree-ssa/ldist-16.c: Flip expected behavior.

Index: gcc/tree-loop-distribution.c
===
*** gcc/tree-loop-distribution.c(revision 202068)
--- gcc/tree-loop-distribution.c(working copy)
*** already_processed_vertex_p (bitmap proce
*** 542,558 
  || !bitmap_bit_p (remaining_stmts, v));
  }
  
! /* Returns NULL when there is no anti-dependence among the successors
!of vertex V, otherwise returns the edge with the anti-dep.  */
  
  static struct graph_edge *
! has_anti_dependence (struct vertex *v)
  {
struct graph_edge *e;
  
if (v->succ)
  for (e = v->succ; e; e = e->succ_next)
!   if (RDGE_TYPE (e) == anti_dd)
return e;
  
return NULL;
--- 542,560 
  || !bitmap_bit_p (remaining_stmts, v));
  }
  
! /* Returns NULL when there is no anti-dependence or output-dependence
!among the successors of vertex V, otherwise returns the edge with the
!dependency.  */
  
  static struct graph_edge *
! has_anti_or_output_dependence (struct vertex *v)
  {
struct graph_edge *e;
  
if (v->succ)
  for (e = v->succ; e; e = e->succ_next)
!   if (RDGE_TYPE (e) == anti_dd
! || RDGE_TYPE (e) == output_dd)
return e;
  
return NULL;
*** mark_nodes_having_upstream_mem_writes (s
*** 604,614 
|| predecessor_has_mem_write (rdg, &(rdg->vertices[x]))
/* In anti dependences the read should occur before
   the write, this is why both the read and the write
!  should be placed in the same partition.  */
!   || has_anti_dependence (&(rdg->vertices[x])))
! {
!   bitmap_set_bit (upstream_mem_writes, x);
! }
  }
  
nodes.release ();
--- 606,615 
|| predecessor_has_mem_write (rdg, &(rdg->vertices[x]))
/* In anti dependences the read should occur before
   the write, this is why both the read and the write
!  should be placed in the same partition.  In output
!  dependences the writes order need to be preserved.  */
!   || has_anti_or_output_dependence (&(rdg->vertices[x])))
! bitmap_set_bit (upstream_mem_writes, x);
  }
  
nodes.release ();
*** rdg_flag_uses (struct graph *rdg, int u,
*** 637,643 
use_operand_p use_p;
struct vertex *x = &(rdg->vertices[u]);
gimple stmt = RDGV_STMT (x);
!   struct graph_edge *anti_dep = has_anti_dependence (x);
  
/* Keep in the same partition the destination of an antidependence,
   because this is a store to the exact same location.  Putting this
--- 638,644 
use_operand_p use_p;
struct vertex *x = &(rdg->vertices[u]);
gimple stmt = RDGV_STMT (x);
!   struct graph_edge *anti_dep = has_anti_or_output_dependence (x);
  
/* Keep in the same partition the destination of an antidependence,
   because this is a store to the exact same location.  Putting this
Index: gcc/testsuite/gcc.dg/tree-ssa/ldist-16.c
===
*** gcc/testsuite/gcc.dg/tree-ssa/ldist-16.c(revision 202068)
--- gcc/testsuite/gcc.dg/tree-ssa/ldist-16.c(working copy)
*** void foo (int n)
*** 14,21 
  }
  }
  
! /* We should apply loop distribution and generate a memset (0).  */
  
! /* { dg-final { scan-tree-dump "distributed: split to 2" "ldist" } } */
! /* { dg-final { scan-tree-dump-times "generated memset zero" 1 "ldist" } } */
  /* { dg-final { cleanup-tree-dump "ldist" } } */
--- 14,21 
  }
  }
  
! /* We should not apply loop distribution and not generate a memset (0).  */
  
! /* { dg-final { scan-tree-dump "Loop 1 is the same" "ldist" } } */
! /* { dg-final { scan-tree-dump-times "generated memset zero" 0 "ldist" } } */
  /* { dg-final { cleanup-tree-dump "ldist" } } */
Index: gcc/testsuite/gcc.dg/torture/pr58223.c
===
*** gcc/testsuite/gcc.dg/torture/pr58223.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr58223.c  (working copy)
***
*** 0 
--- 1,16 
+ /* { dg-do run } */
+ 
+ exte

[PATCH] Fix PR58010

2013-08-30 Thread Richard Biener

Once again the vectorizer isn't prepared to vectorize dead code.
Fortunately in this case it's easy to fix - just remove the assert
that triggers and continue doing nothing for the non-existing
use of the reduction result.
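For illustration (a sketch only - in the PR itself the dead scalar cycle is
only exposed after -funswitch-loops, as in the testcase below): a reduction
whose result is never used after the loop leaves no loop-closed exit PHI for
the epilogue code to find.

  void
  f (int *a)
  {
    int sum = 0;
    for (int i = 0; i < 1024; i++)
      sum += a[i];   /* the reduction result is dead after the loop, so  */
  }                  /* loop-closed SSA provides no exit PHI for it.     */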

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2013-08-30  Richard Biener  

PR tree-optimization/58010
* tree-vect-loop.c (vect_create_epilog_for_reduction): Remove
assert that we have a loop-closed PHI.

* gcc.dg/pr58010.c: New testcase.

Index: gcc/tree-vect-loop.c
===
*** gcc/tree-vect-loop.c(revision 202068)
--- gcc/tree-vect-loop.c(working copy)
*** vect_finalize_reduction:
*** 4373,4381 
  if (!flow_bb_inside_loop_p (loop, gimple_bb (USE_STMT (use_p
phis.safe_push (USE_STMT (use_p));
  
!   /* We expect to have found an exit_phi because of loop-closed-ssa
!  form.  */
!   gcc_assert (!phis.is_empty ());
  
FOR_EACH_VEC_ELT (phis, i, exit_phi)
  {
--- 4373,4380 
  if (!flow_bb_inside_loop_p (loop, gimple_bb (USE_STMT (use_p
phis.safe_push (USE_STMT (use_p));
  
!   /* While we expect to have found an exit_phi because of loop-closed-ssa
!  form we can end up without one if the scalar cycle is dead.  */
  
FOR_EACH_VEC_ELT (phis, i, exit_phi)
  {
Index: gcc/testsuite/gcc.dg/pr58010.c
===
*** gcc/testsuite/gcc.dg/pr58010.c  (revision 0)
--- gcc/testsuite/gcc.dg/pr58010.c  (working copy)
***
*** 0 
--- 1,15 
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -funswitch-loops -ftree-vectorize" } */
+ 
+ short a, b, c, d;
+ 
+ void f(void)
+ {
+   short e;
+ 
+   for(; e; e++)
+ for(; b; b++);
+ 
+   for(d = 0; d < 4; d++)
+ a ^= (e ^= 1) || c ? : e;
+ }


[PATCH 3/n] Handle conditional compare in vrp pass

2013-08-30 Thread Zhenqiang Chen
Hi,

The patch updates vrp pass to handle conditional compare (CCMP).

CCMP is a combination of BIT_AND_EXPR/BIT_IOR_EXPR and a compare expression.
The code is similar to the code that handles BIT_AND_EXPR/BIT_IOR_EXPR and
compare expressions.

When the compare in a CCMP can be simplified, the CCMP is converted to a
copy, a compare, or a bit operation.
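As a rough source-level sketch (illustrative only, not taken from the patch):
a condition like the one below is a BIT_AND_EXPR of two compares, and once
range information proves one of the compares redundant the combined compare
can collapse to a single compare (or to a constant).

  int
  f (int a)
  {
    if (a > 10 && a > 5)   /* when a > 10 holds, "a > 5" is redundant, so  */
      return 1;            /* the combined compare simplifies to "a > 10". */
    return 0;
  }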

Bootstrapped on x86-64 and an ARM Chromebook.
No ICEs or runtime errors in the regression tests.
All "vrp"-related test cases PASS.

Thanks!
-Zhenqiang

ChangeLog:
2013-08-30  Zhenqiang Chen  

* tree-vrp.c (extract_range_from_assignment,
register_edge_assert_for,
simplify_stmt_using_ranges): Handle conditional compare.
(extract_range_from_comparison, can_simplify_ops_using_ranges_p,
simplify_ccmp_to_cmp, simple_ccmp_second_compare,
simplify_ccmp_to_op,
simplify_ccmp_stmt_using_ranges): New added.

testsuite/ChangeLog:
2013-08-30  Zhenqiang Chen  

* gcc.dg/tree-ssa/ccmp*.c: New test cases.

vrp-CCMP.patch
Description: Binary data


Re: [PATCH 6/6] Add manual GTY hooks

2013-08-30 Thread Richard Biener
On Thu, Aug 29, 2013 at 6:20 PM, David Malcolm  wrote:
> * gimple.c (gt_ggc_mx (gimple)): New, as required by GTY((user)).
> (gt_pch_nx (gimple)): Likewise.
> (gt_pch_nx (gimple, gt_pointer_operator, void *)): Likewise.
> * gimple.h  (gt_ggc_mx (gimple)): Declare.
> (gt_pch_nx (gimple)): Declare.
> (gt_pch_nx (gimple, gt_pointer_operator, void *)): Declare.

No GIMPLE should reside in PCHs so you should be able to just put
gcc_unreachable () in them ... (if dropping them does not work)
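Presumably something along these lines would do (a sketch of the suggestion,
not tested):

  /* GIMPLE is never written into a PCH, so the PCH walkers for it can
     simply assert.  */
  void
  gt_pch_nx (gimple)
  {
    gcc_unreachable ();
  }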

> * tree-cfg.c (ggc_mx (gimple&)): Remove declaration, as this
> collides with the function that GTY((user)) expects.
> (gt_ggc_mx (edge_def *)): Replace call to gt_ggc_mx on the
> gimple with gt_ggc_mx_gimple_statement_base: in the
> pre-GTY((user)) world, "gt_ggc_mx" was the function to be called
> on a possibly NULL pointed to check if needed marking and if so
> to traverse its fields.  In the GTY((user)) world, "gt_ggc_mx"
> is the function to be called on non-NULL objects immediately *after*
> they have been marked: it does not mark the object itself.
> (gt_pch_nx (gimple&)): Remove declaration.
> (gt_pch_nx (edge_def *)): Update as per the mx hook.
> ---
>  gcc/gimple.c   | 743 
> +
>  gcc/gimple.h   |   6 +
>  gcc/tree-cfg.c |   6 +-
>  3 files changed, 751 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/gimple.c b/gcc/gimple.c
> index 1ad36d1..dd99cda 100644
> --- a/gcc/gimple.c
> +++ b/gcc/gimple.c
> @@ -4338,4 +4338,747 @@ build_type_cast (tree to_type, gimple op, enum 
> ssa_mode mode)
>return build_type_cast (to_type, gimple_assign_lhs (op), mode);
>  }
>
> +void
> +gt_ggc_mx (gimple gs)
> +{
> +  gimple x = gs;
> +  /* Emulation of the "chain_next" GTY attribute.
> +
> + gs has already been marked.
> + Iterate the chain of next statements, marking until we reach one that
> + has already been marked, or NULL.   */
> +  gimple xlimit = gs->next;
> +  while (ggc_test_and_set_mark (xlimit))
> +xlimit = xlimit->next;
> +
> +  /* All of the statements within the half-open interval [x..xlimit) have
> + just been marked.  Iterate through the list, visiting their fields.  */
> +  while (x != xlimit)
> +{
> +  gt_ggc_m_15basic_block_def (x->bb);
> +  switch (gimple_statement_structure (&((*x
> +   {
> +   case GSS_BASE:
> + break;
> +   case GSS_WITH_OPS:
> + {
> +   gimple_statement_with_ops *stmt
> + = static_cast  (x);
> +   size_t num = (size_t)(stmt->num_ops);
> +   for (size_t i = 0; i != num; i++)
> + gt_ggc_m_9tree_node (stmt->op[i]);
> + }
> + break;
> +   case GSS_WITH_MEM_OPS_BASE:
> + break;
> +   case GSS_WITH_MEM_OPS:
> + {
> +   gimple_statement_with_memory_ops *stmt
> + = static_cast  (x);
> +   size_t num = (size_t)(stmt->num_ops);
> +   for (size_t i = 0; i != num; i++)
> + gt_ggc_m_9tree_node (stmt->op[i]);
> + }
> + break;
> +   case GSS_CALL:
> + {
> +   gimple_statement_call *stmt
> + = static_cast  (x);
> +   gt_ggc_m_15bitmap_head_def (stmt->call_used.vars);
> +   gt_ggc_m_15bitmap_head_def (stmt->call_clobbered.vars);
> +   switch (stmt->subcode & GF_CALL_INTERNAL)
> + {
> + case 0:
> +   gt_ggc_m_9tree_node (stmt->u.fntype);
> +   break;
> + case GF_CALL_INTERNAL:
> +   break;
> + default:
> +   break;
> + }
> +   size_t num = (size_t)(stmt->num_ops);
> +   for (size_t i = 0; i != num; i++)
> + gt_ggc_m_9tree_node (stmt->op[i]);
> + }
> + break;
> +   case GSS_OMP:
> + {
> +   gimple_statement_omp *stmt
> + = static_cast  (x);
> +   gt_ggc_mx_gimple_statement_base (stmt->body);
> + }
> + break;
> +   case GSS_BIND:
> + {
> +gimple_statement_bind *stmt
> + = static_cast  (x);
> +   gt_ggc_m_9tree_node (stmt->vars);
> +   gt_ggc_m_9tree_node (stmt->block);
> +   gt_ggc_mx_gimple_statement_base (stmt->body);
> + }
> + break;
> +   case GSS_CATCH:
> + {
> +   gimple_statement_catch *stmt
> + = static_cast  (x);
> +   gt_ggc_m_9tree_node (stmt->types);
> +   gt_ggc_mx_gimple_statement_base (stmt->handler);
> + }
> + break;
> +   case GSS_EH_FILTER:
> + {
> +   gimple_statement_eh_filter *stmt
> + = static_cast  (x);
> +   gt_ggc_m_9tree_node (stmt->types);
> +   gt_ggc_mx_gimple_statement_base (stmt->failure);
> + }
> + break;
> +

Re: [PATCH 6/6] Add manual GTY hooks

2013-08-30 Thread Richard Biener
On Thu, Aug 29, 2013 at 9:44 PM, Steven Bosscher  wrote:
> On Thu, Aug 29, 2013 at 6:20 PM, David Malcolm  wrote:
>> * gimple.c (gt_ggc_mx (gimple)): New, as required by GTY((user)).
>> (gt_pch_nx (gimple)): Likewise.
>
> GIMPLE isn't supposed to end up in a PCH. Can you please make this
> function simply call gcc_unreachable()?
>
> FWIW 1: I really think all these hand-written markers aren't a good
> idea, we should really figure out a way to have automatic marker
> function generators, something less complex than gengtype, of course.
> But to have all these calls to the type-mangled marker functions
> (gt_pch_n_9tree_node, etc.) is going to be a problem in the long term.
>
> It seems to me that the first step in all these conversions to
> hand-written markers should be to make gengtype spit out the marker
> functions *without* the type name mangling, i.e. all marker functions
> should just be gt_ggc_mx(type) / gt_pch_nx(type).

Yes, the original idea was that gengtype would do that.  For things we'd like
to optimize, the GTY((user)) marker would tell it that we provide the markers
ourselves - for example when you want to look through a non-GCed intermediate
object, or for things like GTY((chain)) stuff that doesn't really work if you
have multiple chains (without clever GTY((skip))s...).

The lack of the unmangled overloads is annoying :/  IIRC Diego halfway completed
the transition to unmangled overloads / specializations?

Richard.

> Ciao!
> Steven


Re: wide-int branch: fixed mips regression

2013-08-30 Thread Richard Biener
On Fri, Aug 30, 2013 at 12:02 AM, Kenneth Zadeck
 wrote:
> Fixed
>
> FAIL: gcc.dg/fixed-point/int-warning.c  (test for warnings, line 12)
>
> on
>
>
>  mips64-unknown-linux-gnu

But now this implementation matches how I thought the wide-int encoding
works, not how it actually works.

For unsigned 0x the predicate now does not return true.  I think
the old code is exactly correct.

Richard.


Re: Request to merge Undefined Behavior Sanitizer in (take 3)

2013-08-30 Thread Marek Polacek
On Tue, Aug 27, 2013 at 04:30:40PM +0200, Marek Polacek wrote:
> On Thu, Aug 22, 2013 at 09:01:57PM +0200, Marek Polacek wrote:
> > On Thu, Aug 22, 2013 at 07:51:07PM +0200, Marek Polacek wrote:
> > > Ping.
> > 
> > I'm withdrawing the ping for now.  I'll have to deal with some bootstrap
> > comparison failures first (ugh!).
> 
> Fixed with this patch:
> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01344.html
> Also, I've changed the hash table to pointer map:
> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01582.html

Now that the pointer_map has been converted to C hash table,
and we use GTY for it, I see no more pending issues currently.

> Regtested, ran bootstrap-ubsan on x86_64-linux.
> 
> Ok to merge ubsan into trunk?

Marek


Re: Request to merge Undefined Behavior Sanitizer in (take 3)

2013-08-30 Thread Jakub Jelinek
On Fri, Aug 30, 2013 at 10:13:06AM +0200, Marek Polacek wrote:
> On Tue, Aug 27, 2013 at 04:30:40PM +0200, Marek Polacek wrote:
> > On Thu, Aug 22, 2013 at 09:01:57PM +0200, Marek Polacek wrote:
> > > On Thu, Aug 22, 2013 at 07:51:07PM +0200, Marek Polacek wrote:
> > > > Ping.
> > > 
> > > I'm withdrawing the ping for now.  I'll have to deal with some bootstrap
> > > comparison failures first (ugh!).
> > 
> > Fixed with this patch:
> > http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01344.html
> > Also, I've changed the hash table to pointer map:
> > http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01582.html
> 
> Now that the pointer_map has been converted to C hash table,
> and we use GTY for it, I see no more pending issues currently.

So, can you please post a new final patch for the merge (with the new 
directories
or files in libsanitizer/ just listed in the ChangeLog entries, but not
actually included in the patch, that would make it too large and anyone can
look at libsanitizer/ubsan/ on the branch)?

Jakub


Re: Eliminate vectorizer analysis side effects

2013-08-30 Thread Richard Biener
On Fri, Aug 30, 2013 at 1:28 AM, Xinliang David Li  wrote:
> I was debugging a runtime failure of SPEC06 xalancbmk built with LIPO.
> Using the -fdisable- option pinpoints the problem in the slp vectorize
> pass on a particular function. dbgcnt support is added to track
> down the individual BB, but it fails even when the dbg count is set
> to 0.
>
> It turns out that no BB was actually vectorized for that function, but
> turning on/off slp-vectorize does make a difference in generated code
> -- the only difference between the good and bad case is stack layout.
>  The problem is in the alignment analysis phase -- which
> speculatively changes the base declaration's alignment regardless of
> whether the vectorization transformation will actually be performed
> later.
>
> The attached patch fixes the problem. Testing is ok. Ok for trunk?

Not in this form.  I'd rather not put extra fields in the data-refs this way.
(As for the xalancbmk runtime problem - doesn't this patch just paper
over the real issue?)

For BB SLP you still adjust DR bases that do not take part in
vectorization - the DR vector contains all refs in the BB, not just
those in the SLP instances we are going to vectorize.  So the
commit of the re-aligning decision should be done from where
we vectorize the DR, in vectorizable_load/store in its transform
phase.

If we decide to integrate the fields into the data-ref then the
analysis and commit parts should go into the data-ref machinery
as well.  Otherwise the vectorizer should use data-ref->aux or some
other way to hang off its private data.

Other than that, modifying alignment of variables that are not
vectorized is indeed a bug worth fixing.

Richard.

> thanks,
>
> David


Re: Fix PR middle-end/57370

2013-08-30 Thread Richard Biener
On Fri, Aug 30, 2013 at 2:30 AM, Easwaran Raman  wrote:
> Richard,
> Do you want me to commit everything but the  insert_stmt_after hunk
> now?

Yes.

> There are more similar failures reported in
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57393 and I have attached
> the updated patch there. Shall I send that for review? Apart from the
> debug statement issue, almost all the bugs are due to dependence
> violation because certain newly inserted statements do not have the
> right UID. Instead of trying to catch all of them, will it be better
> if I check if the stmt has a proper uid (non-zero if it is not the
> first stmt) and assign a sensible value at the point where it is used
> (find_insert_point and appears_later_in_bb) instead of where the stmt
> is created? I think that would be less brittle.

In the end all this placement stuff should be reverted and done as
post-processing after reassoc is finished.  You can remember the
inserted stmts that are candidates for moving (just set a pass-local
flag on them) and assign UIDs for the whole function after all stmts
are inserted.
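A minimal sketch of that post-processing idea (illustrative only, using the
usual statement iterators; this is not the proposed patch):

  /* After reassociation has inserted all of its statements, walk the
     function once and give every statement a fresh, increasing UID.  */
  static void
  renumber_stmt_uids (void)
  {
    basic_block bb;
    unsigned uid = 1;

    FOR_EACH_BB (bb)
      for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
           gsi_next (&gsi))
        gimple_set_uid (gsi_stmt (gsi), uid++);
  }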

Richard.


> - Easwaran
>
>
>
> On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener
>  wrote:
>> On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman  wrote:
>>> I have a new patch that supersedes this. The new patch also fixes PR
>>> tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and
>>> no test regression on x86_64/linux. Ok for trunk?
>>>
>>> 2013-07-31  Easwaran Raman  
>>>
>>> PR middle-end/57370
>>> * tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and reset
>>> of debug statements that cause inconsistent IR.
>>
>> Missing ChangeLog entry for the insert_stmt_after hunk which I do not like
>> at all.  The other hunks are ok, but we need to work harder to preserve
>> debug stmts - simply removing all is not going to fly.
>>
>> Richard.
>>
>>>
>>> testsuite/ChangeLog:
>>> 2013-07-31  Easwaran Raman  
>>>
>>> PR middle-end/57370
>>> PR tree-optimization/57393
>>> PR tree-optimization/58011
>>> * gfortran.dg/reassoc_12.f90: New testcase.
>>> * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>>> * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>>>
>>>
>>> On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman  wrote:
 Ping.

 On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman  wrote:
> A newly generated statement in build_and_add_sum  function of
> tree-ssa-reassoc.c has to be assigned the UID of its adjacent
> statement. In one instance, it was assigned the wrong uid (of an
> earlier phi statement) which messed up the IR and caused the test
> program to hang. Bootstraps and no test regressions on x86_64/linux.
> Ok for trunk?
>
> Thanks,
> Easwaran
>
>
> 2013-06-27  Easwaran Raman  
>
> PR middle-end/57370
> * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a 
> phi
> node for a non-phi gimple statement.
>
> testsuite/ChangeLog:
> 2013-06-27  Easwaran Raman  
>
> PR middle-end/57370
> * gfortran.dg/reassoc_12.f90: New testcase.
>
>
> Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
> ===
> --- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
> +++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
> @@ -0,0 +1,74 @@
> +! { dg-do compile }
> +! { dg-options "-O2 -ffast-math" }
> +! PR middle-end/57370
> +
> + SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
> +   grad_deriv,npoints, sx)
> +IMPLICIT REAL*8 (t)
> +INTEGER, PARAMETER :: dp=8
> +REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
> +   e_ndrho_ndrho_rho
> +  DO ii=1,npoints
> +  IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
> +t1425 = t233 * t557
> +t1429 = beta * t225
> +t1622 = t327 * t1621
> +t1626 = t327 * t1625
> +t1632 = t327 * t1631
> +t1685 = t105 * t1684
> +t2057 = t1636 + t8 * (t2635 + t3288)
> +  END IF
> +  IF( grad_deriv >= 3 .OR. grad_deriv == -3 ) THEN
> +t5469 = t5440 - t5443 - t5446 - t5449 - &
> +t5451 - t5454 - t5456 + t5459  - &
> +t5462 + t5466 - t5468
> +t5478 = 0.240e2_dp * t1616 * t973 * t645 * t1425
> +t5489 = 0.16e2_dp * t1429 * t1658
> +t5531 = 0.160e2_dp * t112 * t1626
> +t5533 = 0.160e2_dp * t112 * t1632
> +t5537 = 0.160e2_dp * t112 * t1622
> +t5541 = t5472 - t5478 - t5523 + t5525 + &
> +t5531 + t5533 + t5535 + t5537 + &
> +t5540
>

Re: [PATCH] Convert more passes to new dump framework

2013-08-30 Thread Richard Biener
On Thu, Aug 29, 2013 at 5:15 PM, Teresa Johnson  wrote:
> On Thu, Aug 29, 2013 at 3:04 AM, Richard Biener
>  wrote:
> New patch below that removes this global variable, and also outputs
> the node->symbol.order (in square brackets after the function name so
> as to not clutter it). Inline messages with profile data now look like:
>
> test.c:8:3: note: foobar [0] (9000) inlined into foo [2] (1000)
> with call count 9000 (via inline instance bar [3] (9000))

 Ick.  This looks both redundant and cluttered.  This is supposed to be
 understandable by GCC users, not only GCC developers.
>>>
>>> The main part that is only useful/understandable to gcc developers is
>>> the node->symbol.order in square brackets, requested by Martin. One
>>> possibility is that I could put that part under a param, disabled by
>>> default. We have something similar on the google branches that emits
>>> LIPO module info in the message, enabled via a param.
>>
>> But we have _dump files_ for that.  That's the developer-consumed
>> form of opt-info.  -fopt-info is purely user sugar and for usual translation
>> units it shouldn't exceed a single terminal full of output.
>
> But as a developer I don't want to have to parse lots of dump files
> for a summary of the major optimizations performed (e.g. inlining,
> unrolling) for an application, unless I am diving into the reasons for
> why or why not one of those optimizations occurred in a particular
> location. I really do want a summary emitted to stderr so that it is
> easily searchable/summarizable for the app as a whole.
>
> For example, some of the apps I am interested in have thousands of
> input files, and trying to collect and parse dump files for each and
> every one is overwhelming (it probably would be even if my input files
> numbered in the hundreds). What has been very useful is having these
> high level summary messages of inlines and unrolls emitted to stderr
> by -fopt-info. Then it is easy to search and sort by hotness to get a
> feel for things like what inlines are missing when moving to a new
> compiler, or compiling a new version of the source, for example. Then
> you know which files to focus on and collect dump files for.

I thought we can direct dump files to stderr now?  So, just use
-fdump-tree-all=stderr

and grep its contents.

>>
>>> I'd argue that the other information (the profile counts, emitted only
>>> when using -fprofile-use, and the inline call chains) are useful if
>>> you want to understand whether and how critical inlines are occurring.
>>> I think this is the type of information that users focused on
>>> optimizations, as well as gcc developers, want when they use
>>> -fopt-info. Otherwise it is difficult to make sense of the inline
>>> information.
>>
>> Well, I doubt that inline information is interesting to users unless we are
>> able to aggressively filter it to what users are interested in.  Which IMHO
>> isn't possible - users are interested in "I have not inlined this even though
>> inlining would severely improve performance" which would indicate a bug
>> in the heuristics we can reliably detect and thus it wouldn't be there.
>
> I have interacted with users who are aware of optimizations such as
> inlining and unrolling and want to look at that information to
> diagnose performance differences when refactoring code or using a new
> compiler version. I also think inlining (especially cross-module) is
> one example of an optimization that is still being tuned, and user
> reports of performance issues related to that have been useful.
>
> I really think that the two groups of people who will find -fopt-info
> useful are gcc developers and savvy performance-hungry users. For the
> former group the additional info is extremely useful. For the latter
> group some of the extra information may not be required (although a
> call count is useful for those using profile feedback), but IMO is not
> unreasonable.

well, your proposed output wrecks my 80x24 terminal already due to overly
long lines.

In the end we may end up with a verbosity level for each sub-set of opt-info
messages.  Ick.

Richard.

> Teresa
>
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH 3/n] Handle conditional compare in vrp pass

2013-08-30 Thread Richard Biener
On Fri, Aug 30, 2013 at 9:51 AM, Zhenqiang Chen  wrote:
> Hi,
>
> The patch updates vrp pass to handle conditional compare (CCMP).
>
> CCMP is a combine of BIT_AND_EXPR/BIT_IOR_EXPR and CMP expression. The codes
> are similar with those to handle BIT_AND_EXPR/BIT_IOR_EXPR and CMP
> expression.

Btw, this is exactly why it's backward to do it this way.  We finally succeeded
in dropping TRUTH_AND/OR/..._EXPR in favor of lowering them to
BIT_AND/OR/..._EXPR so passes need only handle _one_ variant.  Now you
introduce another that all passes need to handle.

As far as I can see you only need those "combined" expressions at RTL
expansion time.

Richard.

> When the compare in CCMP can be simplified, CCMP will be converted to a
> copy,
> compare or a bit operator.
>
> Bootstrap on x86-64 and ARM Chromebook.
> No ICE and runtime errors in regression tests.
> All "vrp" related test cases PASS.
>
> Thanks!
> -Zhenqiang
>
> ChangeLog:
> 2013-08-30  Zhenqiang Chen  
>
> * tree-vrp.c (extract_range_from_assignment,
> register_edge_assert_for,
> simplify_stmt_using_ranges): Handle conditional compare.
> (extract_range_from_comparison, can_simplify_ops_using_ranges_p,
> simplify_ccmp_to_cmp, simple_ccmp_second_compare,
> simplify_ccmp_to_op,
> simplify_ccmp_stmt_using_ranges): New added.
>
> testsuite/ChangeLog:
> 2013-08-30  Zhenqiang Chen  
>
> * gcc.dg/tree-ssa/ccmp*.c: New test cases.


Re: [PATCH 6/6] Add manual GTY hooks

2013-08-30 Thread Richard Biener
On Thu, Aug 29, 2013 at 6:20 PM, David Malcolm  wrote:
> * gimple.c (gt_ggc_mx (gimple)): New, as required by GTY((user)).
> (gt_pch_nx (gimple)): Likewise.
> (gt_pch_nx (gimple, gt_pointer_operator, void *)): Likewise.
> * gimple.h  (gt_ggc_mx (gimple)): Declare.
> (gt_pch_nx (gimple)): Declare.
> (gt_pch_nx (gimple, gt_pointer_operator, void *)): Declare.
> * tree-cfg.c (ggc_mx (gimple&)): Remove declaration, as this
> collides with the function that GTY((user)) expects.
> (gt_ggc_mx (edge_def *)): Replace call to gt_ggc_mx on the
> gimple with gt_ggc_mx_gimple_statement_base: in the
> pre-GTY((user)) world, "gt_ggc_mx" was the function to be called
> on a possibly NULL pointed to check if needed marking and if so
> to traverse its fields.  In the GTY((user)) world, "gt_ggc_mx"
> is the function to be called on non-NULL objects immediately *after*
> they have been marked: it does not mark the object itself.
> (gt_pch_nx (gimple&)): Remove declaration.
> (gt_pch_nx (edge_def *)): Update as per the mx hook.

Btw, this shows that gimple isn't a true C++ hierarchy - because of GTY
you can only ever use 'gimple' pointers, not more specialized ones
like gimple_phi as you are missing the GTY machinery for them.

I'm not 100% convinced that we should do all this at this point without
getting a better handle on the gengtype issues (it's a partial transition to
the C++ world of GCC).

> ---
>  gcc/gimple.c   | 743 
> +
>  gcc/gimple.h   |   6 +
>  gcc/tree-cfg.c |   6 +-
>  3 files changed, 751 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/gimple.c b/gcc/gimple.c
> index 1ad36d1..dd99cda 100644
> --- a/gcc/gimple.c
> +++ b/gcc/gimple.c
> @@ -4338,4 +4338,747 @@ build_type_cast (tree to_type, gimple op, enum 
> ssa_mode mode)
>return build_type_cast (to_type, gimple_assign_lhs (op), mode);
>  }
>
> +void
> +gt_ggc_mx (gimple gs)
> +{
> +  gimple x = gs;
> +  /* Emulation of the "chain_next" GTY attribute.
> +
> + gs has already been marked.
> + Iterate the chain of next statements, marking until we reach one that
> + has already been marked, or NULL.   */
> +  gimple xlimit = gs->next;
> +  while (ggc_test_and_set_mark (xlimit))
> +xlimit = xlimit->next;
> +
> +  /* All of the statements within the half-open interval [x..xlimit) have
> + just been marked.  Iterate through the list, visiting their fields.  */
> +  while (x != xlimit)
> +{
> +  gt_ggc_m_15basic_block_def (x->bb);
> +  switch (gimple_statement_structure (&((*x
> +   {
> +   case GSS_BASE:
> + break;
> +   case GSS_WITH_OPS:
> + {
> +   gimple_statement_with_ops *stmt
> + = static_cast  (x);
> +   size_t num = (size_t)(stmt->num_ops);
> +   for (size_t i = 0; i != num; i++)
> + gt_ggc_m_9tree_node (stmt->op[i]);
> + }
> + break;
> +   case GSS_WITH_MEM_OPS_BASE:
> + break;
> +   case GSS_WITH_MEM_OPS:
> + {
> +   gimple_statement_with_memory_ops *stmt
> + = static_cast  (x);
> +   size_t num = (size_t)(stmt->num_ops);
> +   for (size_t i = 0; i != num; i++)
> + gt_ggc_m_9tree_node (stmt->op[i]);
> + }
> + break;
> +   case GSS_CALL:
> + {
> +   gimple_statement_call *stmt
> + = static_cast  (x);
> +   gt_ggc_m_15bitmap_head_def (stmt->call_used.vars);
> +   gt_ggc_m_15bitmap_head_def (stmt->call_clobbered.vars);
> +   switch (stmt->subcode & GF_CALL_INTERNAL)
> + {
> + case 0:
> +   gt_ggc_m_9tree_node (stmt->u.fntype);
> +   break;
> + case GF_CALL_INTERNAL:
> +   break;
> + default:
> +   break;
> + }
> +   size_t num = (size_t)(stmt->num_ops);
> +   for (size_t i = 0; i != num; i++)
> + gt_ggc_m_9tree_node (stmt->op[i]);
> + }
> + break;
> +   case GSS_OMP:
> + {
> +   gimple_statement_omp *stmt
> + = static_cast  (x);
> +   gt_ggc_mx_gimple_statement_base (stmt->body);
> + }
> + break;
> +   case GSS_BIND:
> + {
> +gimple_statement_bind *stmt
> + = static_cast  (x);
> +   gt_ggc_m_9tree_node (stmt->vars);
> +   gt_ggc_m_9tree_node (stmt->block);
> +   gt_ggc_mx_gimple_statement_base (stmt->body);
> + }
> + break;
> +   case GSS_CATCH:
> + {
> +   gimple_statement_catch *stmt
> + = static_cast  (x);
> +   gt_ggc_m_9tree_node (stmt->types);
> +   gt_ggc_mx_gimple_statement_base (stmt->handler);
> + }
> + break;
> +   case GSS_EH_FILTER:
> +  

Re: Request to merge Undefined Behavior Sanitizer in (take 3)

2013-08-30 Thread Jakub Jelinek
On Fri, Aug 30, 2013 at 10:38:51AM +0200, Marek Polacek wrote:
> On Fri, Aug 30, 2013 at 10:15:44AM +0200, Jakub Jelinek wrote:
> > So, can you please post a new final patch for the merge (with the new 
> > directories
> > or files in libsanitizer/ just listed in the ChangeLog entries, but not
> > actually included in the patch, that would make it too large and anyone can
> > look at libsanitizer/ubsan/ on the branch)?
> 
> Yep.  This is diff between trunk and the ubsan branch without new
> files.
> 
> BTW, when merging the ChangeLog.ubsan into normal ChangeLog, should I
> change the CL entry dates to the day of the merge into the trunk, or 
> can I keep them as they are?

Usually you write a new ChangeLog entry (per changed directory) that
summarizes all the changes, using the current date.  So, e.g. for the new
files you just mention * ubsan.c: New file. etc. and don't list all the
follow-up changes.

Jakub


Re: wide-int branch now up for public comment and review

2013-08-30 Thread Richard Biener
On Thu, 29 Aug 2013, Mike Stump wrote:

> On Aug 29, 2013, at 12:36 AM, Richard Biener  wrote:
> > On Wed, 28 Aug 2013, Mike Stump wrote:
> > 
> >> On Aug 28, 2013, at 3:22 AM, Richard Biener  wrote:
> >>> Btw, rtl.h still wastes space with
> >>> 
> >>> struct GTY((variable_size)) hwivec_def {
> >>> int num_elem; /* number of elements */
> >>> HOST_WIDE_INT elem[1];
> >>> };
> >>> 
> >>> struct GTY((chain_next ("RTX_NEXT (&%h)"),
> >>>   chain_prev ("RTX_PREV (&%h)"), variable_size)) rtx_def {
> >>> ...
> >>> /* The first element of the operands of this rtx.
> >>>The number of operands and their types are controlled
> >>>by the `code' field, according to rtl.def.  */
> >>> union u {
> >>>   rtunion fld[1];
> >>>   HOST_WIDE_INT hwint[1];
> >>>   struct block_symbol block_sym;
> >>>   struct real_value rv;
> >>>   struct fixed_value fv;
> >>>   struct hwivec_def hwiv;
> >>> } GTY ((special ("rtx_def"), desc ("GET_CODE (&%0)"))) u;
> >>> };
> >>> 
> >>> there are 32bits available before the union.  If you don't use
> >>> those for num_elem then all wide-ints will at least take as
> >>> much space as DOUBLE_INTs originally took - and large ints
> >>> that would have required DOUBLE_INTs in the past will now
> >>> require more space than before.  Which means your math
> >>> motivating the 'num_elem' encoding stuff is wrong.  With
> >>> moving 'num_elem' before u you can even re-use the hwint
> >>> field in the union as the existing double-int code does
> >>> (which in fact could simply do the encoding trick in the
> >>> old CONST_DOUBLE scheme, similar for the tree INTEGER_CST
> >>> container).
> >> 
> >> So, HOST_WIDE_INT is likely 64 bits, and likely is 64 bit aligned.  The 
> >> base (stuff before the union) is 32 bits.  There is a 32 bit gap, even 
> >> if not used before the HOST_WIDE_INT elem.  We place the num_elem in
> >> this gap.
> > 
> > No, you don't.  You place num_elem 64bit aligned _after_ the gap.
> > And you have another 32bit gap, as you say, before elem.
> 
> Ah, ok, I get it, thanks for the explanation.  This removes the second 
> gap creator and puts the field into the gap before the u union.

 struct GTY((variable_size)) hwivec_def {
-  int num_elem;/* number of elements */
   HOST_WIDE_INT elem[1];
 };

no need to wrap this in an extra struct type.  In fact you can
re-use the hwint member and its accessors in

  union u {
rtunion fld[1];
HOST_WIDE_INT hwint[1];
struct block_symbol block_sym;
struct real_value rv;
struct fixed_value fv;
  } GTY ((special ("rtx_def"), desc ("GET_CODE (&%0)"))) u;

Richard.






Merge profiles of duplicated COMDAT functions

2013-08-30 Thread Jan Hubicka
Hi,
this patch makes COMDAT profiles right with LTO. (made jointly with Martin 
Liska)

To recap the issue:  COMDAT functions are produced many times.  Each copy gets
its own set of counters and, depending on inlining and linker decisions, one
or more copies of a given COMDAT function will get executed at runtime.
After reading the profile in, the profiles can be wrong when the inlining/linker
decisions differ at the -fprofile-use stage.

This has the nasty effect of optimizing important stuff for size.
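For illustration (a made-up example, not part of the patch): an inline member
function like the one below is emitted as a COMDAT in every translation unit
that uses it, and with -fprofile-generate each emitted copy carries its own
set of counters.

  // widget.h, included from many translation units
  struct widget
  {
    int clamp (int v) const   // defined in the class body: implicitly inline,
    {                         // so each TU using it emits its own COMDAT copy,
      return v < 0 ? 0 : v;   // each with its own profile counters.
    }
  };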

Now with LTO the problem is solved, since early inlining works magically
well.  The catch is that for large projects we tend to explode with
-fprofile-generate -flto, and we explicitly ask users not to use -flto when
generating the profile.  This brings the problem back.

This patch makes the profiles of multiple copies of a given COMDAT be merged
during LTO symtab merging.  This is done the busy way: both functions are read
into memory, their CFGs are compared for a match, the profiles are merged, the
cgraph profile is updated based on the CFG profile, and one of the bodies is
released from memory.  The other body will then be streamed out by WPA as if
the function had been created during WPA time.

This is not terribly cheap by design (we load COMDATs into WPA memory), but
the reading happens only when a COMDAT gets merged and more than one copy has
a non-zero count from profile feedback.  Moreover, this whole path is executed
only when -fno-lto is used for -fprofile-generate.

The compile time/memory impact seems to be under 1% on both GCC and Firefox.
For a GCC profiledbootstrap we merge 1600 functions, mostly vector accessors
and tree checking (I tried a checking-enabled build).

To make things even uglier, there is a nasty special case where we have
already merged the function declarations but absolutely need to read two
different bodies of the same function.  I did so by copying the declaration of
the function I am going to remove.  This provokes a checking failure on labels
that are in the global stream and thus have wrong context pointers in one of
the bodies.  This is harmless since we are going to throw that body away, but
it requires special-case silencing of the sanity check for LTO stream-in.  We
may want to avoid streaming in gimple statements completely, but we need to
merge statement histograms, so that will require a bit of massaging of the
on-disk format to make it possible (histograms are currently stored in the
statement section, which needs to be changed).  I plan to look into this
incrementally.

Now for the longer term, we want to make function CFGs independent of the
gimple body and we want to decide on instrumentation at link time, so we won't
require the user to rebuild with -fprofile-use, just relink.  I plan to work on
this, but not for 4.9.  Thus I hope we can go with this fix - the COMDAT
profile-loss issue has proved very difficult to deal with and very common
for C++ programs.

Bootstrapped/regtested on x86_64-linux, profiledbootstrapped and tested
on Firefox.

If there is no real opposition, I will go ahead with this patch during the
weekend, so it is picked up by the periodic testers.

Honza

* lto-symtab.c (lto_cgraph_replace_node): Merge function profiles.
* cgraph.c (cgraph_get_body): Update call of input_function_body.
* ipa-utils.c: Include lto-streamer.h and ipa-inline.h
(ipa_merge_profiles): New function.
* ipa-utils.h (ipa_merge_profiles): Declare.
* lto-streamer-in.c (lto_input_function_body): Change to use
cgraph_node as parameter.
(lto_read_body): Take cgraph node as parameter.
Index: lto-symtab.c
===
--- lto-symtab.c(revision 202099)
+++ lto-symtab.c(working copy)
@@ -80,6 +82,7 @@
   /* Redirect incomming references.  */
   ipa_clone_referring ((symtab_node)prevailing_node, &node->symbol.ref_list);
 
+  ipa_merge_profiles (prevailing_node, node);
   lto_free_function_in_decl_state_for_node ((symtab_node)node);
 
   if (node->symbol.decl != prevailing_node->symbol.decl)
Index: cgraph.c
===
--- cgraph.c(revision 202099)
+++ cgraph.c(working copy)
@@ -3109,7 +3109,7 @@
 
   gcc_assert (DECL_STRUCT_FUNCTION (decl) == NULL);
 
-  lto_input_function_body (file_data, node->symbol.decl, data);
+  lto_input_function_body (file_data, node, data);
   lto_stats.num_function_bodies++;
   lto_free_section_data (file_data, LTO_section_function_body, name,
 data, len);
Index: ipa-utils.c
===
--- ipa-utils.c (revision 202099)
+++ ipa-utils.c (working copy)
@@ -37,6 +37,8 @@
 #include "flags.h"
 #include "diagnostic.h"
 #include "langhooks.h"
+#include "lto-streamer.h"
+#include "ipa-inline.h"
 
 /* Debugging function for postorder and inorder code. NOTE is a string
that is printed before the nodes are printed.  ORDER is an array of
@@ -618,3 +620,174 @@
 {
   dump_varpool_node_set

Re: Folding (a ? b : c) op d

2013-08-30 Thread Marc Glisse


Ping:

http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01624.html

and the related:

http://gcc.gnu.org/ml/gcc-patches/2013-07/msg00230.html

--
Marc Glisse


[PATCH, c++ testsuite]: Improve g++.dg/abi/mangle33.C scan-asm pattern

2013-08-30 Thread Uros Bizjak
Hello!

I was looking at the annoying

ERROR: g++.dg/abi/mangle33.C -std=c++98: error executing dg-final:
couldn't compile regular expression pattern: out of memory
UNRESOLVED: g++.dg/abi/mangle33.C -std=c++98: error executing
dg-final: couldn't compile regular expression pattern: out of memory
ERROR: g++.dg/abi/mangle33.C -std=c++11: error executing dg-final:
couldn't compile regular expression pattern: out of memory
UNRESOLVED: g++.dg/abi/mangle33.C -std=c++11: error executing
dg-final: couldn't compile regular expression pattern: out of memory

problem, introduced with Tcl 8.6 [1], which limits the stack of the RE compiler.

The following patch tries to improve the situation by using the RE match
count feature, but unfortunately it still doesn't satisfy the new Tcl 8.6
limitation.

Nevertheless, the patch looks like a good cleanup of the RE scan pattern.

2013-08-30  Uros Bizjak  

* g++.dg/abi/mangle33.C (dg-final): Use match count in scan RE.

The patch was tested with Tcl 8.5 where it works without problems.

OK for mainline?

BTW: There are a couple of similar problems with Tcl 8.6:

ERROR: tcl error sourcing
/home/uros/gcc-svn/trunk/libffi/testsuite/libffi.call/call.exp.
ERROR: couldn't compile regular expression pattern: out of memory

and

ERROR: tcl error sourcing
/home/uros/gcc-svn/trunk/libmudflap/testsuite/libmudflap.cth/cthfrags.exp.
ERROR: couldn't compile regular expression pattern: out of memory

[1] http://sourceforge.net/p/tcl/bugs/5178/

Uros.
Index: g++.dg/abi/mangle33.C
===
--- g++.dg/abi/mangle33.C   (revision 202044)
+++ g++.dg/abi/mangle33.C   (working copy)
@@ -15,5 +15,5 @@
   int j;
 }
 
-// { dg-final { scan-assembler 
"_ZN4043abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcd

Re: [C++ Patch] PR 51424

2013-08-30 Thread Paolo Carlini

Hi,

On 08/29/2013 09:40 PM, Jason Merrill wrote:

On 08/29/2013 11:24 AM, Paolo Carlini wrote:

+  if ((complain & tf_error)
+  && (flags & LOOKUP_DELEGATING_CONS)
+  && name == complete_ctor_identifier
+  && TREE_CODE (ret) == CALL_EXPR
+  && (DECL_ABSTRACT_ORIGIN (TREE_OPERAND (CALL_EXPR_FN (ret), 0))
+  == current_function_decl))
+error ("constructor delegates to itself");
How about doing this check in perform_target_ctor instead, so you 
don't need another LOOKUP flag?
I should have explained some of that in better detail.  The main issue I
had yesterday is that the pattern matching can easily become very
difficult, if not impossible: if you look at the second half of
expand_default_init, in some cases we wrap the returned CALL_EXPR in a
COND_EXPR; also, even more difficult, in the case of constexpr constructors
we don't have CALL_EXPRs at all (yesterday my first drafts failed very
badly for those).  Only early enough, around the end of the first
build_special_member_call call, do we uniformly have CALL_EXPRs.


I could, for example, pass down a separate bit instead of playing again
with the LOOKUP_* bits.  At some point yesterday I even had that version
tested ;)


What do you think?

Thanks,
Paolo.


Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-08-30 Thread Jan Hubicka
> Great! Is this the LTO merging you were talking about in an earlier
> message, or the gcov runtime fix (that would presumably not be
> lto-specific)?

It is the LTO path - we need to merge profiles there anyway for his code
unification work.

> > I have patch to track this.  Moreover vforks seems to produce repeated
> > merging of results.
> 
> Aha, does this explain the gimp issue as well?

Not really - we still need to debug why we hit the cold section so many times
with partitioning.  I still think an easier approach will be to lock the cold
section and then start, probably with the testsuite (i.e. write a script to
compile the small testcases with FDO + partitioning and see what crashes by
hitting the cold section).

> >
> > Is it really necessary to run this from internal loop of the cfgcleanup?
> 
> The reason I added it here is that just below there is a call to
> verify_flow_info, and that will fail with the new verification.

Hmm, OK, I suppose we run the cleanup after partitioning just once or twice,
right?  We can track this incrementally - I am not sure we would gain anything
substantial by moving it out of the internal iteration loop either.
Removing unreachable blocks twice is however ugly.

> +/* Ensure that all hot bbs are included in a hot path through the
> +   procedure. This is done by calling this function twice, once
> +   with WALK_UP true (to look for paths from the entry to hot bbs) and
> +   once with WALK_UP false (to look for paths from hot bbs to the exit).
> +   Returns the updated value of COLD_BB_COUNT and adds newly-hot bbs
> +   to BBS_IN_HOT_PARTITION.  */
> +
> +static unsigned int
> +sanitize_hot_paths (bool walk_up, unsigned int cold_bb_count,
> +vec *bbs_in_hot_partition)
> +{
> +  /* Callers check this.  */
> +  gcc_checking_assert (cold_bb_count);
> +
> +  /* Keep examining hot bbs while we still have some left to check
> + and there are remaining cold bbs.  */
> +  vec hot_bbs_to_check = bbs_in_hot_partition->copy ();
> +  while (! hot_bbs_to_check.is_empty ()
> + && cold_bb_count)
> +{
> +  basic_block bb = hot_bbs_to_check.pop ();
> +  vec *edges = walk_up ? bb->preds : bb->succs;
> +  edge e;
> +  edge_iterator ei;
> +  int highest_probability = 0;
> +  int highest_freq = 0;
> +  gcov_type highest_count = 0;
> +  bool found = false;
> +
> +  /* Walk the preds/succs and check if there is at least one already
> + marked hot. Keep track of the most frequent pred/succ so that we
> + can mark it hot if we don't find one.  */
> +  FOR_EACH_EDGE (e, ei, edges)
> +{
> +  basic_block reach_bb = walk_up ? e->src : e->dest;
> +
> +  if (e->flags & EDGE_DFS_BACK)
> +continue;
> +
> +  if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION)
> +  {
> +found = true;
> +break;
> +  }
> +  /* The following loop will look for the hottest edge via
> + the edge count, if it is non-zero, then fallback to the edge
> + frequency and finally the edge probability.  */
> +  if (e->count > highest_count)
> +highest_count = e->count;
> +  int edge_freq = EDGE_FREQUENCY (e);
> +  if (edge_freq > highest_freq)
> +highest_freq = edge_freq;
> +  if (e->probability > highest_probability)
> +highest_probability = e->probability;
> +}
> +
> +  /* If bb is reached by (or reaches, in the case of !WALK_UP) another 
> hot
> + block (or unpartitioned, e.g. the entry block) then it is ok. If 
> not,
> + then the most frequent pred (or succ) needs to be adjusted.  In the
> + case where multiple preds/succs have the same frequency (e.g. a
> + 50-50 branch), then both will be adjusted.  */
> +  if (found)
> +continue;
> +
> +  FOR_EACH_EDGE (e, ei, edges)
> +{
> +  if (e->flags & EDGE_DFS_BACK)
> +continue;
> +  /* Select the hottest edge using the edge count, if it is non-zero,
> + then fallback to the edge frequency and finally the edge
> + probability.  */
> +  if (highest_count)
> +{
> +  if (e->count < highest_count)
> +continue;
> +}
> +  else if (highest_freq)

The frequency condition needs to be done only when you walk predecessors - when
you walk down, the edge probabilities are just fine.

The patch seems OK to me now.  I will make our FDO tester use partitioning so
we get this benchmarked a bit.

Honza


Re: Merge profiles of duplicated COMDAT functions

2013-08-30 Thread Richard Biener
On Fri, 30 Aug 2013, Jan Hubicka wrote:

> Hi,
> this patch makes COMDAT profiles right with LTO. (made jointly with Martin 
> Liska)
> 
> To recap the issue:  COMDAT functions are produced many times. Each copy gets
> its own set of counters and depending on inlining and linker decision, one 
> or more of copies of a given COMDAT function will get executed on runtime.
> After reading profile in, the profiles can be wrong when inlining/linker
> decision differs at -fprofile-use stage.
> 
> This has nasty effect of optimizing important stuff for size.
> 
> Now with LTO the problem is solved, since early inlining things works 
> magically
> well.  Catch is that for large projects, we tend to explode with
> -fprofile-generate -flto and we explicitely ask users to not use -flto when
> generating profile.  This brings the problem back.
> 
> This patch makes profiles of multiple copies of given comdat to be merged 
> during
> LTO symtab merging.  This is done in the busy way: both functions are read 
> into
> memory, compared if CFGs match, profiles are merged, cgraph profile is updated
> based on CFG profile and one of the bodies is released from memory.  The other
> body will then be streamed by WPA as if the function was born during WPA time.
> 
> This is not terribly cheap by design (we load COMDATs into WPA memory), but
> reading happens only when COMDAT gets merged and more than one of them has
> non-0 count from profile feedback. Moreover this whole path is executed only
> when -fno-lto is used for -fprofile-generate.
> 
> The compile time/memory impact seems under 1% both on GCC and firefox.  For 
> GCC
> profiledbootstrap we merge 1600 functions, mostly vector accestors and tree
> checking (I tried checking enabled build). 
> 
> To make things even ugier, there is nasty special case where we already merged
> function declarations, but we absolutely need to read two different bodies of
> the same function.  I did so by copying the declaration of function I am going
> to remove.  This provoke checking failure on labels that are in global stream
> and thus have wrong context pointers in one of the bodies.  This is harmless
> since we are going to throw it away, but it requires special case silencing of
> the sanity check for LTO stream in.  We may want to avoid streaming in gimple
> statements completely, but we need to merge statement histograms so that will
> require a bit of massaging of the on-disk format to make this possible.
> (histograms are currently stored into statement section that needs to be
> changed). I plan to look into this incrementally.
> 
> Now for longer term, we want to make function CFGs independent of gimple body
> and we want to decide on instrumentation at linktime, so we won't require user
> to rebuild with -fprofile-use, just relink.  I plan to work on this, but 
> not for 4.9.  Thus I hope we can go with this fix - the COMDAT profile loss
> issue has proved itself to be very difficult to deal with and very common
> for C++ programs.
> 
> Bootstrapped/regtested x86_64-liux, profiledbootstrapped and tested
> on firefox.
> 
> If there will be no real opposition, I will go ahead with this patch during
> weekend, so it is picked by periodic testers.
> 
> Honza
> 
>   * lto-symtab.c (lto_cgraph_replace_node): Merge function profiles.
>   * cgraph.c (cgraph_get_body): Update call of input_function_body.
>   * ipa-utils.c: Include lto-streamer.h and ipa-inline.h
>   (ipa_merge_profiles): New function.
>   * ipa-utils.h (ipa_merge_profiles): Declare.
>   * lto-streamer-in.c (lto_input_function_body): Change to use
>   cgraph_node as parameter.
>   (lto_read_body): Take cgraph node as parameter.
> Index: lto-symtab.c
> ===
> --- lto-symtab.c  (revision 202099)
> +++ lto-symtab.c  (working copy)
> @@ -80,6 +82,7 @@
>/* Redirect incomming references.  */
>ipa_clone_referring ((symtab_node)prevailing_node, &node->symbol.ref_list);
>  
> +  ipa_merge_profiles (prevailing_node, node);
>lto_free_function_in_decl_state_for_node ((symtab_node)node);
>  
>if (node->symbol.decl != prevailing_node->symbol.decl)
> Index: cgraph.c
> ===
> --- cgraph.c  (revision 202099)
> +++ cgraph.c  (working copy)
> @@ -3109,7 +3109,7 @@
>  
>gcc_assert (DECL_STRUCT_FUNCTION (decl) == NULL);
>  
> -  lto_input_function_body (file_data, node->symbol.decl, data);
> +  lto_input_function_body (file_data, node, data);
>lto_stats.num_function_bodies++;
>lto_free_section_data (file_data, LTO_section_function_body, name,
>data, len);
> Index: ipa-utils.c
> ===
> --- ipa-utils.c   (revision 202099)
> +++ ipa-utils.c   (working copy)
> @@ -37,6 +37,8 @@
>  #include "flags.h"
>  #include "diagnostic.h"
>  #include "langhooks.h"
> +#incl

Re: Folding (a ? b : c) op d

2013-08-30 Thread Richard Biener
On Sat, Jun 29, 2013 at 9:02 AM, Marc Glisse  wrote:
> Hello,
>
> this patch changes the test deciding whether to fold "op d" with both
> branches in (a ? b : c) op d. I don't know if the new test is right, it
> gives what I expect on the new testcase, but I may have missed important
> cases. Cc: Eric for comments as the author of the previous conditions.
>
> Passes bootstrap+testsuite on x86_64-unknown-linux-gnu.
>
> 2013-06-29  Marc Glisse  
>
> PR tree-optimization/57755
> gcc/
> * fold-const.c (fold_binary_op_with_conditional_arg): Change
> condition under which the transformation is performed.
>
> gcc/testsuite/
> * gcc.dg/pr57755.c: New testcase.
> * gcc.dg/binop-notand1a.c: Remove xfail.
> * gcc.dg/binop-notand4a.c: Likewise.
>
> --
> Marc Glisse
> Index: fold-const.c
> ===
> --- fold-const.c(revision 200556)
> +++ fold-const.c(working copy)
> @@ -6097,26 +6097,33 @@ constant_boolean_node (bool value, tree
> given here), it is the second argument.  TYPE is the type of the
> original expression.  Return NULL_TREE if no simplification is
> possible.  */
>
>  static tree
>  fold_binary_op_with_conditional_arg (location_t loc,
>  enum tree_code code,
>  tree type, tree op0, tree op1,
>  tree cond, tree arg, int cond_first_p)
>  {
> -  tree cond_type = cond_first_p ? TREE_TYPE (op0) : TREE_TYPE (op1);
> -  tree arg_type = cond_first_p ? TREE_TYPE (op1) : TREE_TYPE (op0);
> +  /* ??? This transformation is only worthwhile if we don't have
> + to wrap ARG in a SAVE_EXPR.  */
> +  if (TREE_SIDE_EFFECTS (arg))
> +return NULL_TREE;
> +
> +  /* Avoid exponential recursion.  */
> +  static int depth = 0;
> +  if (depth > 1)
> +return NULL_TREE;
> +

I don't like this kind of measure ... which one again is the transform that
undoes what this function does?

>tree test, true_value, false_value;
>tree lhs = NULL_TREE;
>tree rhs = NULL_TREE;
> -  enum tree_code cond_code = COND_EXPR;
>
>if (TREE_CODE (cond) == COND_EXPR
>|| TREE_CODE (cond) == VEC_COND_EXPR)
>  {
>test = TREE_OPERAND (cond, 0);
>true_value = TREE_OPERAND (cond, 1);
>false_value = TREE_OPERAND (cond, 2);
>/* If this operand throws an expression, then it does not make
>  sense to try to perform a logical or arithmetic operation
>  involving it.  */
> @@ -6126,54 +6133,49 @@ fold_binary_op_with_conditional_arg (loc
> rhs = false_value;
>  }
>else
>  {
>tree testtype = TREE_TYPE (cond);
>test = cond;
>true_value = constant_boolean_node (true, testtype);
>false_value = constant_boolean_node (false, testtype);
>  }
>
> -  if (TREE_CODE (TREE_TYPE (test)) == VECTOR_TYPE)
> -cond_code = VEC_COND_EXPR;
> -
> -  /* This transformation is only worthwhile if we don't have to wrap ARG
> - in a SAVE_EXPR and the operation can be simplified without recursing
> - on at least one of the branches once its pushed inside the COND_EXPR.
> */
> -  if (!TREE_CONSTANT (arg)
> -  && (TREE_SIDE_EFFECTS (arg)
> - || TREE_CODE (arg) == COND_EXPR || TREE_CODE (arg) ==
> VEC_COND_EXPR
> - || TREE_CONSTANT (true_value) || TREE_CONSTANT (false_value)))
> -return NULL_TREE;
> +  tree cond_type = cond_first_p ? TREE_TYPE (op0) : TREE_TYPE (op1);
> +  tree arg_type = cond_first_p ? TREE_TYPE (op1) : TREE_TYPE (op0);
>
>arg = fold_convert_loc (loc, arg_type, arg);
> +  ++depth;
>if (lhs == 0)
>  {
>true_value = fold_convert_loc (loc, cond_type, true_value);
>if (cond_first_p)
> lhs = fold_build2_loc (loc, code, type, true_value, arg);
>else
> lhs = fold_build2_loc (loc, code, type, arg, true_value);
>  }
>if (rhs == 0)
>  {
>false_value = fold_convert_loc (loc, cond_type, false_value);
>if (cond_first_p)
> rhs = fold_build2_loc (loc, code, type, false_value, arg);
>else
> rhs = fold_build2_loc (loc, code, type, arg, false_value);
>  }
> +  --depth;
>
>/* Check that we have simplified at least one of the branches.  */
> -  if (!TREE_CONSTANT (arg) && !TREE_CONSTANT (lhs) && !TREE_CONSTANT (rhs))
> +  if (!TREE_CONSTANT (lhs) && !TREE_CONSTANT (rhs))
>  return NULL_TREE;
>
> +  enum tree_code cond_code = VECTOR_TYPE_P (TREE_TYPE (test))
> +? VEC_COND_EXPR : COND_EXPR;
>return fold_build3_loc (loc, cond_code, type, test, lhs, rhs);
>  }
>
>
>  /* Subroutine of fold() that checks for the addition of +/- 0.0.
>
> If !NEGATE, return true if ADDEND is +/-0.0 and, for all X of type
> TYPE, X + ADDEND is the same as X.  If NEGATE, return true if X -
> ADDEND is the same as X.
>
> Index: testsuite/gcc.dg/binop-notan

Re: COMDAT missing profile handling (was Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition)

2013-08-30 Thread Jan Hubicka
> > The estimated profile is already there before reading the profile in, so we
> > just do not want to overwrite it.  Does the following work for you?
> 
> It does get the estimated frequencies on the bbs.

Good.

> > We will also need to solve the problem that in this case cgraph edges will
> > have a 0 profile.
> > We probably want to play the game there and just do the scaling for edge 
> > count,
> > since IPA passes probably do not want to care about partial profiles.
> 
> The problem I am having is that my new freqs_to_counts routine takes
> the sum of the incoming edge counts and computes the bb counts by
> scaling by the estimated bb frequency. But in the case where the
> caller was also missing profile data the edge count will have 0
> profile as you note above, so the counts remain 0. I am not quite sure
> how to handle this. Instead of using the sum of the incoming profile
> counts, I could just pick a default profile count to use as the entry
> bb count. The synthesized counts should get scaled down appropriately
> during inlining.
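
(For reference, an illustrative sketch of what such a routine could look like;
this is not the actual freqs_to_counts from Teresa's patch, just a guess at the
shape.  It assumes the 2013-era cgraph/CFG fields and macros (node->callers,
e->count, bb->frequency, bb->count, succ->probability, BB_FREQ_MAX,
REG_BR_PROB_BASE) and that the entry block frequency is BB_FREQ_MAX.  It also
shows why a zero-count caller leaves all the synthesized counts at zero.)

/* Illustrative sketch only: synthesize bb and edge counts for NODE from
   its estimated frequencies, using the sum of the incoming call edge
   counts as the entry count.  */
static void
freqs_to_counts (struct cgraph_node *node)
{
  struct function *fn = DECL_STRUCT_FUNCTION (node->symbol.decl);
  gcov_type entry_count = 0;
  struct cgraph_edge *e;
  basic_block bb;

  /* Approximate the entry count by the calls we actually saw.  If all
     callers have a 0 profile, everything below stays 0 as well.  */
  for (e = node->callers; e; e = e->next_caller)
    entry_count += e->count;

  FOR_ALL_BB_FN (bb, fn)
    {
      edge succ;
      edge_iterator ei;

      /* Scale the estimated frequency (entry block == BB_FREQ_MAX) into
         a synthetic count, then distribute it over the outgoing edges
         according to their probabilities.  */
      bb->count = (gcov_type) ((double) entry_count * bb->frequency
                               / BB_FREQ_MAX);
      FOR_EACH_EDGE (succ, ei, bb->succs)
        succ->count = (gcov_type) ((double) bb->count * succ->probability
                                   / REG_BR_PROB_BASE);
    }
}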

I am having trouble thinking of what default we should pick here.

Thinking about the whole problem, I think the following sounds like a possible
solution:

1) use the patch above to get frequencies computed where counts are 0.
   Make sure that branch probabilities are guessed, for functions with non-0
   counts, for each branch that has count 0.
2) At the point we see an inconsistency in the IPA profile (i.e. we get to the
   fact that we have direct calls with non-0 counts to a comdat with 0 count),
   drop the profile_status for them to profile_guessed and let the backend
   optimize them as if there was no FDO.
   I do not think we can do better for those.

   If we do not see this problem, we can keep status as READ so the function
   will get size optimized.  Here we will lose for indirect calls, but I
   do not see how to handle these short of making every 0 count COMDAT guessed.
   Perhaps you can try how these variants work for you performance/code size
   wise.
3) at the time we inline a function with a guessed profile into a function
   with a read profile, we will use your trick to feed the counts in based on
   frequencies.

   We will do that in tree-inline when inlining, i.e. we will never feed fake
   counts into non-inlined functions.
4) we will want to sanitize the callgraph, too.  This means looking for GUESSED
   profile functions and feeding their counts/edge counts based on the sum of
   the incoming counts.

   We can do this as part of the ipa-profile pass (I will move it to a separate
   file as it is getting more complex).  Here we will have the propagation
   issue you mention above.  Either we can ignore it or we can iterate.

3) and 4) will absolutely need some capping to be sure we won't make the
synthesized counts bigger than other counts in the program.

I think we have code for 1-3. If the plan looks sane to you, I think we can
start getting it in?

Of course we want to eliminate most of the problems by getting the runtime to
do the merging... (but the merging will not always be possible due to
CFG differences coming from different optimization flags, anyway).

Honza


Re: folding (vec_)cond_expr in a binary operation

2013-08-30 Thread Richard Biener
On Sat, Jul 6, 2013 at 9:42 PM, Marc Glisse  wrote:
> Hello,
>
> the first attached patch does not bootstrap and has at least 2 main issues.
> The second patch does pass bootstrap+testsuite, but I liked the first
> more...
>
> First, the fold-const bit causes an assertion failure (building libjava) in
> combine_cond_expr_cond, which calls:
>
>   t = fold_binary_loc (gimple_location (stmt), code, type, op0, op1);
>
> and then checks:
>
>   /* Require that we got a boolean type out if we put one in.  */
>   gcc_assert (TREE_CODE (TREE_TYPE (t)) == TREE_CODE (type));
>
> which makes sense... Note that the 2 relevant types are:

well, the check is probably old and can be relaxed to also allow all
compatible types.

> (gdb) call debug_tree((tree)0x75e78930)
>   size  bitsizetype> constant 8>
> unit size  sizetype> constant 1>
> align 8 symtab -169408800 alias set -1 canonical type 0x76635c78
> precision 1 min  max  0x76632200 1>
> pointer_to_this  chain  0x76635dc8>>
> (gdb) call debug_tree(type)
>   unsigned QI
> size  bitsizetype> constant 8>
> unit size  sizetype> constant 1>
> align 8 symtab -170348640 alias set 19 canonical type 0x764f69d8
> precision 1 min  max  0x764f92e0 1>
> pointer_to_this >
>
> I need to relax the conditions on the vec?-1:0 optimization because with
> vectors we often end up with several types that are equivalent. Can you
> think of a good way to do it? In the second patch I limit the transformation
> to vectors and hope it doesn't cause as much trouble there...
>
>
>
> Second, the way the forwprop transformation is done, it can be necessary to
> run the forwprop pass several times in a row, which is not nice. It takes:
>
> stmt_cond
> stmt_binop
>
> and produces:
>
> stmt_folded
> stmt_cond2
>
> But one of the arguments of stmt_folded could be an ssa_name defined by a
> cond_expr, which could now be propagated into stmt_folded (there may be
> other new possible transformations). However, that cond_expr has already
> been visited and won't be again. The combine part of the pass uses a PLF to
> revisit statements, but the forwprop part doesn't have any specific
> mechanism.

forwprop is designed to re-process stmts it has folded to catch cascading
effects.  Now, as it has both a forward and a backward run, you don't catch
2nd-order effects by iterating them.  On my TODO is to only do one kind,
either forward or backward propagation.

> The workarounds I can think of are:
> - manually check if some of the arguments of stmt_folded are cond_expr and
> recursively call the function on them;
> - move the transformation to the combine part of the pass;

this it is.

> - setup some system to revisit all earlier statements?
>
> I went with the second option in the second patch, but this limitation of
> forward propagation seems strange to me.
>
>
> (s/combine_binop_with_condition/forward_propagate_condition/ for the first
> patch)

Btw, as for the patch I don't like that you basically feed everything into
fold.  Yes, I know we do that for conditions because that's quite important
and nobody has re-written the condition folding as a gimple pattern matcher.
I doubt that this transform is important enough to warrant another exception ;)

Richard.

> 2013-07-06  Marc Glisse  
>
> PR tree-optimization/57755
> gcc/
> * tree-ssa-forwprop.c (combine_binop_with_condition): New function.
> (ssa_forward_propagate_and_combine): Call it.
> * fold-const.c (fold_ternary_loc): In gimple form, don't check for
> type equality.
>
> gcc/testsuite/
> * g++.dg/tree-ssa/pr57755.C: New testcase.
>
> (this message does not supersede
> http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01624.html )
>
> --
> Marc Glisse
> Index: fold-const.c
> ===
> --- fold-const.c(revision 200736)
> +++ fold-const.c(working copy)
> @@ -14124,21 +14124,23 @@ fold_ternary_loc (location_t loc, enum t
>
>/* Convert A ? 1 : 0 to simply A.  */
>if ((code == VEC_COND_EXPR ? integer_all_onesp (op1)
>  : (integer_onep (op1)
> && !VECTOR_TYPE_P (type)))
>   && integer_zerop (op2)
>   /* If we try to convert OP0 to our type, the
>  call to fold will try to move the conversion inside
>  a COND, which will recurse.  In that case, the COND_EXPR
>  is probably the best choice, so leave it alone.  */
> - && type == TREE_TYPE (arg0))
> + && (type == TREE_TYPE (arg0)
> + || (in_gimple_form
> + && useless_type_conversion_p (type, TREE_TYPE (arg0)
> return pedantic_non_lvalue_loc (loc, arg0);
>
>/* Convert A ? 0 : 1 to !A.  This prefers the use of NOT_EXPR
>  over COND_EXPR in cases such as floating point comparisons.  */
>if (integer_zerop (op1)
>   && (code =

Re: Folding (a ? b : c) op d

2013-08-30 Thread Marc Glisse

On Fri, 30 Aug 2013, Richard Biener wrote:


On Sat, Jun 29, 2013 at 9:02 AM, Marc Glisse  wrote:

Hello,

this patch changes the test deciding whether to fold "op d" with both
branches in (a ? b : c) op d. I don't know if the new test is right, it
gives what I expect on the new testcase, but I may have missed important
cases. Cc: Eric for comments as the author of the previous conditions.

Passes bootstrap+testsuite on x86_64-unknown-linux-gnu.

2013-06-29  Marc Glisse  

PR tree-optimization/57755
gcc/
* fold-const.c (fold_binary_op_with_conditional_arg): Change
condition under which the transformation is performed.

gcc/testsuite/
* gcc.dg/pr57755.c: New testcase.
* gcc.dg/binop-notand1a.c: Remove xfail.
* gcc.dg/binop-notand4a.c: Likewise.

--
Marc Glisse
Index: fold-const.c
===
--- fold-const.c(revision 200556)
+++ fold-const.c(working copy)
@@ -6097,26 +6097,33 @@ constant_boolean_node (bool value, tree
given here), it is the second argument.  TYPE is the type of the
original expression.  Return NULL_TREE if no simplification is
possible.  */

 static tree
 fold_binary_op_with_conditional_arg (location_t loc,
 enum tree_code code,
 tree type, tree op0, tree op1,
 tree cond, tree arg, int cond_first_p)
 {
-  tree cond_type = cond_first_p ? TREE_TYPE (op0) : TREE_TYPE (op1);
-  tree arg_type = cond_first_p ? TREE_TYPE (op1) : TREE_TYPE (op0);
+  /* ??? This transformation is only worthwhile if we don't have
+ to wrap ARG in a SAVE_EXPR.  */
+  if (TREE_SIDE_EFFECTS (arg))
+return NULL_TREE;
+
+  /* Avoid exponential recursion.  */
+  static int depth = 0;
+  if (depth > 1)
+return NULL_TREE;
+


I don't like this kind of measure ... which one again is the transform that
undoes what this function does?


There is no undoing here (you may be thinking of 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57286 ), it is just a 
recursion that gets very slow for nested conditions:


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55219
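
(A made-up illustration, not the PR55219 testcase: the transform rewrites
(a ? b : c) + d as a ? b + d : c + d, and when the operands are themselves
conditional each fold_build2 call recurses into both arms again, so an
expression like the one below gets expensive quickly without a depth cap.)

/* Hypothetical example only.  */
int
f (int a, int b, int c, int d)
{
  return ((a ? 1 : 2) + (b ? 3 : 4)) * ((c ? 5 : 6) + (d ? 7 : 8));
}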


   tree test, true_value, false_value;
   tree lhs = NULL_TREE;
   tree rhs = NULL_TREE;
-  enum tree_code cond_code = COND_EXPR;

   if (TREE_CODE (cond) == COND_EXPR
   || TREE_CODE (cond) == VEC_COND_EXPR)
 {
   test = TREE_OPERAND (cond, 0);
   true_value = TREE_OPERAND (cond, 1);
   false_value = TREE_OPERAND (cond, 2);
   /* If this operand throws an expression, then it does not make
 sense to try to perform a logical or arithmetic operation
 involving it.  */
@@ -6126,54 +6133,49 @@ fold_binary_op_with_conditional_arg (loc
rhs = false_value;
 }
   else
 {
   tree testtype = TREE_TYPE (cond);
   test = cond;
   true_value = constant_boolean_node (true, testtype);
   false_value = constant_boolean_node (false, testtype);
 }

-  if (TREE_CODE (TREE_TYPE (test)) == VECTOR_TYPE)
-cond_code = VEC_COND_EXPR;
-
-  /* This transformation is only worthwhile if we don't have to wrap ARG
- in a SAVE_EXPR and the operation can be simplified without recursing
- on at least one of the branches once its pushed inside the COND_EXPR.
*/
-  if (!TREE_CONSTANT (arg)
-  && (TREE_SIDE_EFFECTS (arg)
- || TREE_CODE (arg) == COND_EXPR || TREE_CODE (arg) ==
VEC_COND_EXPR
- || TREE_CONSTANT (true_value) || TREE_CONSTANT (false_value)))
-return NULL_TREE;
+  tree cond_type = cond_first_p ? TREE_TYPE (op0) : TREE_TYPE (op1);
+  tree arg_type = cond_first_p ? TREE_TYPE (op1) : TREE_TYPE (op0);

   arg = fold_convert_loc (loc, arg_type, arg);
+  ++depth;
   if (lhs == 0)
 {
   true_value = fold_convert_loc (loc, cond_type, true_value);
   if (cond_first_p)
lhs = fold_build2_loc (loc, code, type, true_value, arg);
   else
lhs = fold_build2_loc (loc, code, type, arg, true_value);
 }
   if (rhs == 0)
 {
   false_value = fold_convert_loc (loc, cond_type, false_value);
   if (cond_first_p)
rhs = fold_build2_loc (loc, code, type, false_value, arg);
   else
rhs = fold_build2_loc (loc, code, type, arg, false_value);
 }
+  --depth;

   /* Check that we have simplified at least one of the branches.  */
-  if (!TREE_CONSTANT (arg) && !TREE_CONSTANT (lhs) && !TREE_CONSTANT (rhs))
+  if (!TREE_CONSTANT (lhs) && !TREE_CONSTANT (rhs))
 return NULL_TREE;

+  enum tree_code cond_code = VECTOR_TYPE_P (TREE_TYPE (test))
+? VEC_COND_EXPR : COND_EXPR;
   return fold_build3_loc (loc, cond_code, type, test, lhs, rhs);
 }


 /* Subroutine of fold() that checks for the addition of +/- 0.0.

If !NEGATE, return true if ADDEND is +/-0.0 and, for all X of type
TYPE, X + ADDEND is the same as X.  If NEGATE, return true if X -
ADDEND is the same as X.

Inde

Re: PR 58148 patch

2013-08-30 Thread Paolo Carlini

Hi,

On 08/29/2013 09:37 PM, François Dumont wrote:
Indeed, I checked the Standard about const_pointer, so here is another 
attempt.


Tested under Linux x86_64.

2013-08-29  François Dumont  

PR libstdc++/58148
* include/debug/functions.h (__foreign_iterator_aux4): Use
sequence const_pointer as common type to compare pointers. Add a
fallback overload in case pointers cannot be cast to sequence
const_pointer.
* testsuite/23_containers/vector/modifiers/insert/58148.cc: New.

Ok to commit ?
Seems pretty good to me. I have been thinking: when the non-trivial 
__foreign_iterator_aux4 is selected it actually has available as the 
last two arguments


std::addressof(*(__it._M_get_sequence()->_M_base().begin()))
std::addressof(*__other)

we could as well give the parameters names and avoid passing __other.
Also, I think we can do everything with std::less. I'm attaching below
something I quickly hacked, untested; see if you like it and, in that case,
commit something similar.


Thanks!
Paolo.

//
Index: functions.h
===
--- functions.h (revision 202068)
+++ functions.h (working copy)
@@ -36,7 +36,7 @@
 #include // for __addressof and addressof
 #if __cplusplus >= 201103L
 # include   // for less and greater_equal
-# include   // for common_type
+# include   // for is_lvalue_reference 
and __and_
 #endif
 #include 
 
@@ -172,27 +172,28 @@
 }
 
 #if __cplusplus >= 201103L
-  template
+  // Default implementation.
+  template
 inline bool
 __foreign_iterator_aux4(const _Safe_iterator<_Iterator, _Sequence>& __it,
-   _InputIterator __other,
-   _PointerType1, _PointerType2)
+   typename _Sequence::const_pointer __begin,
+   typename _Sequence::const_pointer __other)
 {
-  typedef typename std::common_type<_PointerType1,
-   _PointerType2>::type _PointerType;
+  typedef typename _Sequence::const_pointer _PointerType;
   constexpr std::less<_PointerType> __l{};
-  constexpr std::greater_equal<_PointerType> __ge{};
 
-  return (__l(std::addressof(*__other),
- std::addressof(*(__it._M_get_sequence()->_M_base().begin(
- || __ge(std::addressof(*__other),
- std::addressof(*(__it._M_get_sequence()->_M_base().end()
-  - 1)) + 1));
+  return (__l(__other, __begin)
+ || __l(std::addressof(*(__it._M_get_sequence()->_M_base().end()
+ - 1)), __other));
 }
- 
+
+  // Fallback when address type cannot be implicitely casted to sequence
+  // const_pointer.
+  template
+inline bool
+__foreign_iterator_aux4(const _Safe_iterator<_Iterator, _Sequence>&, ...)
+{ return true; }
+
   template
 inline bool
 __foreign_iterator_aux3(const _Safe_iterator<_Iterator, _Sequence>& __it,
@@ -209,12 +210,12 @@
- std::addressof(*(__it._M_get_sequence()->_M_base().begin()))
== __it._M_get_sequence()->size() - 1)
  return (__foreign_iterator_aux4
- (__it, __other,
+ (__it,
   std::addressof(*(__it._M_get_sequence()->_M_base().begin())),
   std::addressof(*__other)));
   return true;
 }
-  
+   
   /* Fallback overload for which we can't say, assume it is valid. */
   template
 inline bool
@@ -223,7 +224,7 @@
std::false_type)
 { return true; }
 #endif
-  
+
   /** Checks that iterators do not belong to the same sequence. */
   template
 inline bool


Re: [patch 4/4] -fstrict-volatile-bitfields cleanup v3: remove from defaults on all targets

2013-08-30 Thread Richard Biener
On Tue, Jul 2, 2013 at 7:33 PM, DJ Delorie  wrote:
>
>> The choice appears to be to continue to have broken volatile bitfields
>> on ARM with no way for users to make them conform to the ABI, or to
>> change things so that they conform to the ABI if you specify
>> -fstrict-volatile-bitfields explicitly and to the C/C++ standard by
>> default, without that option.
>
> I can't speak for ARM, but for the other targets (for which I wrote
> the original patch), the requirement is that volatile bitfield
> accesses ALWAYS be in the mode of the type specified by the user.  If
> the user says "int x:8;" then SImode (assuming 32-bit ints) must
> always be used, even if QImode could be used.
>
> The reason for this is that volatile bitfields are normally used for
> memory-mapped peripherals, and accessing peripheral registers in the
> wrong mode leads to incorrect operation.

I've argued in the past that this part of the semantics can be easily
implemented in the existing C++ memory model code (well, in
get_bit_range) for the cases where it doesn't conflict with the C++
memory model.  For the cases where it conflicts, a warning can
be emitted, like when you do

struct {
volatile int x:8;
char c;
} x;

where 'c' would be within the SImode you are supposed to use
with -fstrict-volatile-bitfields.

Which raises the question ... how does

x.x = 1;

actually work?  IMHO it must be sth horribly inefficient like

 tem = x.c; // QImode
 tem = tem << 8 | 1; // combine into SImode value
 x.x = tem; // SImode store

hoping that no sane ABI makes 'x' size 2.  Oh, I _can_ make it size 2:

struct {
volatile int x:8;
char c;
} __attribute__((packed)) x;
char y;

note the fancy global object 'y' I placed after 'x'.  Now the store will
clobber y(?)  So the only 'valid' way is

  tem = x.x; // SImode read(!)
  tem = tem & 0xff..00 | 1; // manipulate value
  x.x = tem; // SImode store

but then this doesn't work either because that 1-byte aligned object
could reside at a page boundary and so the read and write would trap.

Makes me ask who designed that crap ;)

But my point was that for all these special cases that likely do not
happen in practice (fingers crossed), the C++ memory model way
doesn't interfere with -fstrict-volatile-bitfields and the code can
perfectly well be used to get as close as possible to the
-fstrict-volatile-bitfields ABI.

Richard.


Re: Improve uncprop and coalescing

2013-08-30 Thread Richard Biener
On Wed, Jun 12, 2013 at 6:04 PM, Jeff Law  wrote:
>
> On 06/07/13 03:14, Richard Biener wrote:
>
>>> +/* Given SSA_NAMEs NAME1 and NAME2, return true if they are candidates
>>> for
>>> +   coalescing together, false otherwise.
>>> +
>>> +   This must stay consistent with the code in tree-ssa-live.c which
>>> +   sets up base values in the var map.  */
>>> +
>>> +bool
>>> +gimple_can_coalesce_p (tree name1, tree name2)
>>> +{
>>> +  /* First check the SSA_NAME's associated DECL.  We only want to
>>> + coalesce if they have the same DECL or both have no associated
>>> DECL.
>>> */
>>> +  if (SSA_NAME_VAR (name1) != SSA_NAME_VAR (name2))
>>> +return false;
>>> +
>>> +  /* Now check the types.  If the types are the same, then we should
>>> + try to coalesce V1 and V2.  */
>>> +  tree t1 = TREE_TYPE (name1);
>>> +  tree t2 = TREE_TYPE (name2);
>>> +  if (t1 == t2)
>>> +return true;
>>> +
>>> +  /* If the types are not the same, check for a canonical type match.
>>> This
>>> + (for example) allows coalescing when the types are fundamentally
>>> the
>>> + same, but just have different names.  */
>>> +  if (TYPE_CANONICAL (t1) && TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2))
>>
>>
>> Please use types_compatible_p (t1, t2) here, that's the correct API to use
>> here.
>>
>>> +return true;
>>> +
>>> +  return false;
>>> +}
>>> diff --git a/gcc/tree-ssa-live.c b/gcc/tree-ssa-live.c
>>> index 83a52a0..a624d00 100644
>>> --- a/gcc/tree-ssa-live.c
>>> +++ b/gcc/tree-ssa-live.c
>>> @@ -111,8 +111,12 @@ var_map_base_init (var_map map)
>>> as it restricts the sets we compute conflicts for.
>>> Using TREE_TYPE to generate sets is the easies as
>>> type equivalency also holds for SSA names with the same
>>> -  underlying decl.  */
>>> -   m->base.from = TREE_TYPE (var);
>>> +  underlying decl.
>>> +
>>> +  Check gimple_can_coalesce_p when changing this code.  */
>>> +   m->base.from = (TYPE_CANONICAL (TREE_TYPE (var))
>>> +   ? TYPE_CANONICAL (TREE_TYPE (var))
>>> +   : TREE_TYPE (var));
>>
>>
>> eh, but it's made complicated here ... so above do
>>
>>
>> if (TYPE_CANONICAL (t1) && TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2)
>>  && types_compatible_p (t1, t2))
>>
>> because looking at useless_type_conversion_p it looks like pointer types
>> with different address-spaces may have the same canonical type.  A comment
>> on why we check both, referring to var_map_base_init, should also be added.
>
> Reading this again after a night of sleep, it appears you're agreeing that
> we can't just use types_compatible_p to drive what objects are put on the
> coalesce list.  The only code change you're asking for is to make sure we
> properly reject pointer types with different address spaces (which can be
> done via types_compatible_p).
>
>
> Right?

Yes, if I remember correctly after this long time ;)

Richard.

> jeff
>
>


Re: Ada PATCH: Fix ada/58239 by linking with xg++, not xgcc

2013-08-30 Thread Rainer Orth
Gabriel Dos Reis  writes:

> My earlier patch that formally declared pretty_printer as polymorphic
> uncovered a latent bug in the existing build machinery of the Ada
> component.  It had been using xgcc to link against C++ source files.  It
> should use xg++ instead.
>
> This patch fixes that by introducing GXX_LINK which is GCC_LINK except
> that CXX (e.g. xg++) instead of CC (e.g. xgcc) is invoked.
>
> Eric, are there other executables that need to be linked with GXX_LINK
> too but aren't triggered yet? 

It would be good to have this patch approved and committed: right now,
Ada bootstrap is broken without it.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Merge profiles of duplicated COMDAT functions

2013-08-30 Thread Jan Hubicka
> 
> Please instead remove this assert and put the checking into
> tree-cfg.c:verify_gimple_label where it should need no special-casing
> on cgraph_state.
> 
> Otherwise the non-profile bits look ok.

OK, will do.
Thank you!

Honza


Re: [PATCH] libitm: Add custom HTM fast path for RTM on x86_64.

2013-08-30 Thread Torvald Riegel
On Mon, 2013-08-26 at 09:49 -0700, Richard Henderson wrote:
> On 08/22/2013 02:57 PM, Torvald Riegel wrote:
> > On Thu, 2013-08-22 at 12:05 -0700, Richard Henderson wrote:
> >> On 08/22/2013 11:39 AM, Torvald Riegel wrote:
> >>> + /* Store edi for future HTM fast path retries.  We use a stack slot
> >>> +lower than the jmpbuf so that the jmpbuf's rip field will overlap
> >>> +with the proper return address on the stack.  */
> >>> + movl%edi, -64(%rsp)
> >>
> >> You havn't allocated the stack frame here, and you're storing
> >> outside the redzone.  This is invalid.
> >>
> >> Two possibilities:
> >>
> >>  (1) always allocate the stack frame on entry to
> >>  the function (adds two register additions to
> >>  the htm fast path -- in the noise i'd think)
> >>
> >>  (2) store the edi value in the non-htm path, with
> >>  the pr_HTMRetryableAbort bit or'd in.  (adds an
> >>  extra store to the non-htm path; probably noise).
> >>  You'd want to mask out that bit when you reload it.
> > 
> > Oops.  Picked fix (2).  Better now?
> > 
> 
> Move the andl of edi down into the HAVE_AS_RTM block, above the orl of
> HTMRetriedAfterAbort.  Ok with that change.

Committed as r202101.



Re: [PATCH] Fix PR tree-optimization/58137

2013-08-30 Thread Richard Biener
On Tue, Aug 20, 2013 at 1:01 PM, Bernd Edlinger
 wrote:
> This patch fixes a bug in the vectorized pointer arithmetic in the forwprop
> optimization pass.
>
> Although it seems to be impossible to create a vector of pointers with the
> __attribute__((vector_size(x))) syntax, the vector loop optimization together
> with the loop unrolling can create code which adds a vector of ptroff_t
> repeatedly to a vector of pointers.
>
> The code in tree-ssa-forwprop.c handles program transformations of the
> form (PTR +- CST1) +- CST2 => PTR +- (CST1+-CST2) where PTR is
> a vector of pointers and CST1 and CST2 are vectors of ptroff_t.  Without the
> fix, the result type of CST1+-CST2 was a vector of pointers, which causes the
> ICE in tree-cfg.c, which sees an undefined pointer + pointer operation.
>
> Additionally the check in tree-cfg.c allows expressions of the form CST - PTR,
> which is clearly wrong. That should be fixed too.
>
> The patch was bootstrapped and regression tested on i686-pc-linux-gnu.

It seems to me that the forwprop code does not handle the fact that we
are permissive as to using PLUS_EXPR instead of POINTER_PLUS_EXPR
for vector pointer - offset additions.  So instead of doing this dance the
better (and more easily backportable) fix is to simply disable the transform
on pointer valued PLUS_EXPR.  Like with

  if (POINTER_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt))))
    return false;

at the beginning of the function.

The real fix is of course to make vector pointer operations properly
use POINTER_PLUS_EXPR ...

Richard.



> Regards
> Bernd Edlinger


Re: [PATCH]: Fix PR middle-end/56382 -- Only move MODE_COMPLEX_FLOAT by parts if we can create pseudos

2013-08-30 Thread Eric Botcazou
> As you say, some other check might be more appropriate to determine
> whether a call to gen_reg_rtx might be needed in
> emit_move_complex_parts.
> For the PA, it would be something like "GET_MODE_BITSIZE (mode) >
> BITS_PER_WORD".
> But, we still need to check can_create_pseudo_p as we probably still
> want to use
> emit_move_complex_parts before reload.

Let's avoid trying to do something general since this seems to be really a 
corner case.  Can't we simply deal with hard registers specially?

  /* Move floating point as parts if splitting is easy.  */
  if (GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT
  && optab_handler (mov_optab, GET_MODE_INNER (mode)) != CODE_FOR_nothing
  && !(REG_P (x)
   && HARD_REGISTER_P (x)
   && hard_regno_nregs[REGNO(x)][mode] == 1)
  && !(REG_P (y)
   && HARD_REGISTER_P (y)
   && hard_regno_nregs[REGNO(y)][mode] == 1))
try_int = false;


-- 
Eric Botcazou


Re: [C++ Patch] PR 51424

2013-08-30 Thread Paolo Carlini

Hi again,

On 08/30/2013 11:06 AM, Paolo Carlini wrote:
I could, for example pass down a separate bit, instead of playing 
again with the LOOKUP_* bits. At some point yesterday I even had that 
version tested ;)

In practice, something like the attached.

By the way, as regards this comment in cp-tree.h:

/* These are uses as bits in flags passed to various functions to
   control their behavior.  Despite the LOOKUP_ prefix, many of these
   do not control name lookup.  ??? Functions using these flags should
   probably be modified to accept explicit boolean flags for the
   behaviors relevant to them.  */

I think I could rather easily remove LOOKUP_ALREADY_DIGESTED, because 
it's only used between check_initializer and store_init_value.


Thanks!
Paolo.


Index: cp/cp-tree.h
===
--- cp/cp-tree.h(revision 202101)
+++ cp/cp-tree.h(working copy)
@@ -5359,7 +5359,7 @@ extern tree do_friend (tree, 
tree, tree, tree,
 /* in init.c */
 extern tree expand_member_init (tree);
 extern void emit_mem_initializers  (tree);
-extern tree build_aggr_init(tree, tree, int,
+extern tree build_aggr_init(tree, tree, int, bool,
  tsubst_flags_t);
 extern int is_class_type   (tree, int);
 extern tree get_type_value (tree);
Index: cp/decl.c
===
--- cp/decl.c   (revision 202101)
+++ cp/decl.c   (working copy)
@@ -5539,7 +5539,8 @@ build_aggr_init_full_exprs (tree decl, tree init,
  
 {
   gcc_assert (stmts_are_full_exprs_p ());
-  return build_aggr_init (decl, init, flags, tf_warning_or_error);
+  return build_aggr_init (decl, init, flags, /*delegating_cons_p=*/false,
+ tf_warning_or_error);
 }
 
 /* Verify INIT (the initializer for DECL), and record the
Index: cp/init.c
===
--- cp/init.c   (revision 202101)
+++ cp/init.c   (working copy)
@@ -32,8 +32,10 @@ along with GCC; see the file COPYING3.  If not see
 static bool begin_init_stmts (tree *, tree *);
 static tree finish_init_stmts (bool, tree, tree);
 static void construct_virtual_base (tree, tree);
-static void expand_aggr_init_1 (tree, tree, tree, tree, int, tsubst_flags_t);
-static void expand_default_init (tree, tree, tree, tree, int, tsubst_flags_t);
+static void expand_aggr_init_1 (tree, tree, tree, tree, int, bool,
+   tsubst_flags_t);
+static void expand_default_init (tree, tree, tree, tree, int, bool,
+tsubst_flags_t);
 static void perform_member_init (tree, tree);
 static tree build_builtin_delete_call (tree);
 static int member_init_ok_or_else (tree, tree, tree);
@@ -501,7 +503,8 @@ perform_target_ctor (tree init)
   tree type = current_class_type;
 
   finish_expr_stmt (build_aggr_init (decl, init, LOOKUP_NORMAL,
- tf_warning_or_error));
+/*delegating_cons_p=*/true,
+tf_warning_or_error));
   if (TYPE_HAS_NONTRIVIAL_DESTRUCTOR (type))
 {
   tree expr = build_delete (type, decl, sfk_complete_destructor,
@@ -693,6 +696,7 @@ perform_member_init (tree member, tree init)
   "uninitialized member %qD with % type %qT",
   member, type);
  finish_expr_stmt (build_aggr_init (decl, init, flags,
+/*delegating_cons_p=*/false,
 tf_warning_or_error));
}
 }
@@ -1076,6 +1080,7 @@ emit_mem_initializers (tree mem_inits)
  tf_warning_or_error),
  arguments,
  flags,
+ /*delegating_cons_p=*/false,
   tf_warning_or_error);
  expand_cleanup_for_base (subobject, NULL_TREE);
}
@@ -1242,7 +1247,7 @@ construct_virtual_base (tree vbase, tree arguments
   exp = convert_to_base_statically (current_class_ref, vbase);
 
   expand_aggr_init_1 (vbase, current_class_ref, exp, arguments,
- 0, tf_warning_or_error);
+ 0, /*delegating_cons_p=*/false, tf_warning_or_error);
   finish_then_clause (inner_if_stmt);
   finish_if_stmt (inner_if_stmt);
 
@@ -1448,7 +1453,8 @@ expand_member_init (tree name)
perform the initialization, but not both, as it would be ambiguous.  */
 
 tree
-build_aggr_init (tree exp, tree init, int flags, tsubst_flags_t complain)
+build_aggr_init (tree exp, tree init, int flags, bool delegating_cons_p,
+tsubst_flags_t complain)
 {
   tree stmt_expr;
   tree compound_stmt;
@@ -1513,7 +15

[PATCH] Fix overflow test in fold_single_bit_test

2013-08-30 Thread Richard Biener

As noted by Kenny and Joern.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2013-08-30  Richard Biener  

* fold-const.c (fold_single_bit_test): Fix overflow test.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 202100)
+++ gcc/fold-const.c(working copy)
@@ -6634,10 +6634,10 @@ fold_single_bit_test (location_t loc, en
 not overflow, adjust BITNUM and INNER.  */
   if (TREE_CODE (inner) == RSHIFT_EXPR
  && TREE_CODE (TREE_OPERAND (inner, 1)) == INTEGER_CST
- && TREE_INT_CST_HIGH (TREE_OPERAND (inner, 1)) == 0
+ && host_integerp (TREE_OPERAND (inner, 1), 1)
  && bitnum < TYPE_PRECISION (type)
- && 0 > compare_tree_int (TREE_OPERAND (inner, 1),
-  bitnum - TYPE_PRECISION (type)))
+ && (TREE_INT_CST_LOW (TREE_OPERAND (inner, 1))
+ < (unsigned) (TYPE_PRECISION (type) - bitnum)))
{
  bitnum += TREE_INT_CST_LOW (TREE_OPERAND (inner, 1));
  inner = TREE_OPERAND (inner, 0);


Re: [PATCH] RTEMS: Add LEON3/SPARC multilibs

2013-08-30 Thread Daniel Hellstrom

Hello Sebastian,

That seems like a good idea.

Thanks,
Daniel


On 08/29/2013 01:04 PM, Sebastian Huber wrote:

Recently, support for LEON3-specific instructions was added to GCC.
Make this support available for RTEMS.

This patch should be committed to GCC 4.9.

gcc/ChangeLog
2013-08-29  Sebastian Huber  

* config/sparc/t-rtems: Add leon3 multilibs.
---
  gcc/config/sparc/t-rtems |4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/sparc/t-rtems b/gcc/config/sparc/t-rtems
index 63d0217..f1a3d84 100644
--- a/gcc/config/sparc/t-rtems
+++ b/gcc/config/sparc/t-rtems
@@ -17,6 +17,6 @@
  # .
  #
  
-MULTILIB_OPTIONS = msoft-float mcpu=v8

-MULTILIB_DIRNAMES = soft v8
+MULTILIB_OPTIONS = msoft-float mcpu=v8/mcpu=leon3
+MULTILIB_DIRNAMES = soft v8 leon3
  MULTILIB_MATCHES = msoft-float=mno-fpu




[Patch] Rewrite regex matchers

2013-08-30 Thread Tim Shen
Rewrite matchers using structs instead of closures. It's clearer
and easier to maintain now.

Tested under -m32, -m64 and debug mode.

Thanks!


-- 
Tim Shen


matchers.patch
Description: Binary data


Re: [Patch] Rewrite regex matchers

2013-08-30 Thread Paolo Carlini
.. the testcase is not Ok. We want separate tests for char and wchar_t, 
in separate subdirs with the names 'char' and 'wchar_t'. In this way, 
the framework automatically helps us and doesn't try to run the latter 
for targets which don't have wchar_t.


Eg, testsuite/21_strings is completely structured this way.

Paolo.


[PATCH] Fix up strlen pass invalidation (PR tree-optimization/58277)

2013-08-30 Thread Jakub Jelinek
Hi!

Apparently I forgot to add code for complete invalidation of everything, if
more than 100 stmts with vdefs are seen in between immediate dominator and
current basic block.  That cutoff is there so that we don't spend too much
time on invalidation, but without this patch it actually ignored all the
other vdef stmts in between those two bbs rather than just starting with no
known string lengths at the start of the bb.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8?

2013-08-30  Jakub Jelinek  

PR tree-optimization/58277
* tree-ssa-strlen.c (strlen_enter_block): If do_invalidate gave up
after seeing too many stmts with vdef in between dombb and current
bb, invalidate everything.

* gcc.c-torture/execute/pr58277-1.c: New test.
* gcc.c-torture/execute/pr58277-2.c: New test.

--- gcc/tree-ssa-strlen.c.jj2013-08-13 12:20:27.0 +0200
+++ gcc/tree-ssa-strlen.c   2013-08-30 11:02:39.717691590 +0200
@@ -1952,6 +1952,28 @@ strlen_enter_block (struct dom_walk_data
  int count_vdef = 100;
  do_invalidate (dombb, phi, visited, &count_vdef);
  BITMAP_FREE (visited);
+ if (count_vdef == 0)
+   {
+ /* If there were too many vdefs in between immediate
+dominator and current bb, invalidate everything.
+If stridx_to_strinfo has been unshared, we need
+to free it, otherwise just set it to NULL.  */
+ if (!strinfo_shared ())
+   {
+ unsigned int i;
+ strinfo si;
+
+ for (i = 1;
+  vec_safe_iterate (stridx_to_strinfo, i, &si);
+  ++i)
+   {
+ free_strinfo (si);
+ (*stridx_to_strinfo)[i] = NULL;
+   }
+   }
+ else
+   stridx_to_strinfo = NULL;
+   }
  break;
}
}
--- gcc/testsuite/gcc.c-torture/execute/pr58277-1.c.jj  2013-08-30 
11:10:36.590005465 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr58277-1.c 2013-08-30 
11:10:05.0 +0200
@@ -0,0 +1,102 @@
+/* PR tree-optimization/58277 */
+
+extern void abort (void);
+static int a[2];
+int b, c, d, *e, f, g, h, **i = &e, k, l = 1, n, o, p;
+static int **volatile j = &e;
+const int m;
+char u;
+
+int
+bar ()
+{
+  u = 0;
+  return m;
+}
+
+__attribute__((noinline, noclone)) void
+baz ()
+{
+  asm ("");
+}
+
+static int
+foo ()
+{
+  int t1;
+  g = bar ();
+  if (l)
+;
+  else
+for (;; h++)
+  {
+   *i = 0;
+   o = *e = 0;
+   if (p)
+ {
+   f = 0;
+   return 0;
+ }
+   for (;; k++)
+ {
+   int *t2 = 0;
+   int *const *t3[] = {
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, &t2, 0, 0, &t2, &t2, &t2,
+ &t2, &t2, 0, 0, 0, 0, 0, 0, 0, &t2, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, &t2, 0, 0, 0, 0, 0, 0, 0, &t2, &t2,
+ &t2, &t2, &t2, 0, 0, 0, 0, 0, 0, 0, &t2, 0, 0, 0,
+ &t2, 0, 0, 0, &t2, 0, &t2, 0, 0, &t2, 0, 0, 0, 0,
+ 0, &t2, 0, 0, 0, 0, &t2, &t2, 0, 0, 0, 0, &t2, 0,
+ 0, 0, 0, 0, 0, 0, &t2, 0, 0, 0, 0, 0, &t2, 0, 0, 0,
+ &t2, &t2
+   };
+   int *const **t4[] = {&t3[0]};
+   **i = 0;
+   if (**j)
+ break;
+   u = 0;
+ }
+   *i = *j;
+   t1 = 0;
+   for (; t1 < 5; t1++)
+ *i = *j;
+  }
+  *j = 0;
+  return 1;
+}
+
+int
+main ()
+{
+  int t5;
+  a[0] = 1;
+  {
+int *t6[6] = {&d, &d};
+for (n = 1; n; n--)
+  if (foo())
+   {
+ int *t7[] = {0};
+ d = 0;
+ for (; u < 1; u++)
+   *i = *j;
+ *i = 0;
+ *i = 0;
+ int t8[5] = {0};
+ *i = &t8[0];
+ int *const *t9 = &t6[0];
+ int *const **t10 = &t9;
+ *t10 = &t7[0];
+   }
+  }
+  u = 0;
+  for (; b; b++)
+for (t5 = 0; t5 < 10; t5++)
+  c = a[a[a[a[a[a[a[a[c]]]]]]]];
+
+  baz ();
+
+  if (!a[a[a[a[a[a[a[a[a[a[a[a[a[a[a[u]]]]]]]]]]]]]]])
+abort ();
+
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr58277-2.c.jj  2013-08-30 
11:10:39.613988421 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr58277-2.c 2013-08-30 
11:06:17.0 +0200
@@ -0,0 +1,98 @@
+/* PR tree-optimization/58277 */
+
+extern void abort (void);
+static int a[1], b, c, e, i, j, k, m, q[] = { 1, 1 }, t;
+int volatile d;
+int **r;
+static int ***volatile s = &r;
+int f, g, o, x;
+static int *volatile h = &f, *p;
+char n;
+
+static void
+fn1 ()
+{
+  b = a[a[a[a[a[a[a[a[b]]]]]]]];
+  b = a[a[a[a[a[a[a[a[b]]]]]]]];
+  b = a[a[b]];
+  b = a[a[a[a[a[a[a[a[b]]]]]]]];
+  

Re: Request to merge Undefined Behavior Sanitizer in (take 3)

2013-08-30 Thread Marek Polacek
On Fri, Aug 30, 2013 at 10:42:57AM +0200, Jakub Jelinek wrote:
> On Fri, Aug 30, 2013 at 10:38:51AM +0200, Marek Polacek wrote:
> > On Fri, Aug 30, 2013 at 10:15:44AM +0200, Jakub Jelinek wrote:
> > > So, can you please post a new final patch for the merge (with the new 
> > > directories
> > > or files in libsanitizer/ just listed in the ChangeLog entries, but not
> > > actually included in the patch, that would make it too large and anyone 
> > > can
> > > look at libsanitizer/ubsan/ on the branch)?
> > 
> > Yep.  This is diff between trunk and the ubsan branch without new
> > files.
> > 
> > BTW, when merging the ChangeLog.ubsan into normal ChangeLog, should I
> > change the CL entry dates to the day of the merge into the trunk, or 
> > can I keep them as they are?
> 
> Usually you write a new ChangeLog entry (per changed directory) that
> summarizes all the changes, using the current date.  So, e.g. for the new
> files you just mention * ubsan.c: New file. etc. and don't list all the
> follow-up changes.

I see.  For the record, here are the new ChangeLog entries I'm going
to use when doing the merge.

config/ChangeLog
2013-08-30  Marek Polacek  

* bootstrap-ubsan.mk: New.

gcc/c-family/ChangeLog
2013-08-30  Marek Polacek  

* c-ubsan.c: New file.
* c-ubsan.h: New file.

gcc/ChangeLog
2013-08-30  Marek Polacek  

* Makefile.in (ubsan.o): Add.
(c-family/c-ubsan.o): Add.
(builtins.o): Add ubsan.h dependency.
* ubsan.h: New file.
* ubsan.c: New file.
* common.opt: Add -fsanitize=undefined option.
(flag_sanitize): Add variable.
(fsanitize=): Add option.  Add Driver.
(fsanitize=thread): Remove option.
(fsanitize=address): Likewise.
(static-libubsan): New option.
* doc/invoke.texi: Document the new flag and -static-libubsan.
* sanitizer.def (DEF_SANITIZER_BUILTIN): Define.
(BUILT_IN_UBSAN_HANDLE_BUILTIN_UNREACHABLE): Define.
* builtin-attrs.def (ATTR_COLD): Define.
(ATTR_COLD_NOTHROW_LEAF_LIST): Define.
* builtins.def (BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW,
BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS): Define.
* flag-types.h (sanitize_code): New enum.
* opts.c (common_handle_option): Parse command line arguments
of -fsanitize=.  Add -fsanitize=unreachable option.
* varasm.c (get_variable_section): Adjust.
(assemble_noswitch_variable): Likewise.
(assemble_variable): Likewise.
(output_constant_def_contents): Likewise.
(categorize_decl_for_section): Likewise.
(place_block_symbol): Likewise.
(output_object_block): Likewise.
* builtins.def: Likewise.
* toplev.c (compile_file): Likewise.
(process_options): Likewise.
* cppbuiltin.c: Likewise.
* tsan.c (tsan_pass): Likewise.
(tsan_gate): Likewise.
(tsan_gate_O0): Likewise.
* cfgexpand.c (partition_stack_vars): Likewise.
(expand_stack_vars): Likewise.
(defer_stack_allocation): Likewise.
(expand_used_vars): Likewise.
* cfgcleanup.c (old_insns_match_p): Likewise.
* asan.c (asan_finish_file): Likewise.
(asan_instrument): Likewise.
(gate_asan): Likewise.
(initialize_sanitizer_builtins): Build BT_FN_VOID_PTR_PTR_PTR.
(ATTR_COLD_NOTHROW_LEAF_LIST): Define.
(asan_global_struct): Use pointer_sized_int_node instead
calling build_nonstandard_integer_type.
(initialize_sanitizer_builtins): Likewise.
(asan_finish_file): Likewise.
* gcc.c: Document %{%:function(args):X}.
(static_spec_functions): Add sanitize.
(handle_spec_function): Add retval_nonnull argument and if non-NULL,
store funcval != NULL there.
(do_spec_1): Adjust handle_spec_function caller.
(handle_braces): Allow %:function(args) as condition.
(sanitize_spec_function): New function.
(ADD_STATIC_LIBUBSAN_LIBS): Define.
(LIBUBSAN_SPEC): Likewise.
(LIBUBSAN_EARLY_SPEC): Likewise.
(SANITIZER_SPEC): Handle libubsan.
(SANITIZER_EARLY_SPEC): Likewise.
* config/darwin.h (LINK_COMMAND_SPEC_A): Use %:sanitize(address)
instead of fsanitize=address.
* config/arm/linux-eabi.h (ASAN_CC1_SPEC): Use %:sanitize(address)
instead of fsanitize=address*.
* builtins.c: Include ubsan.h.
(fold_builtin_0): Instrument __builtin_unreachable.
* config/rs6000/rs6000.h (FRAME_GROWS_DOWNWARD): Use flag_sanitize
instead of flag_asan.
* tree.h (enum tree_index): Add TI_POINTER_SIZED_TYPE.
(pointer_sized_int_node): Define.
* tree.c (build_common_tree_nodes): Initialize
pointer_sized_int_node.

gcc/cp/ChangeLog
2013-08-30  Marek Polacek  

* typeck.c (cp_build_binary_op): Add division by zero and shift
instrumentation.
* error.c (dum

[PATCH] Fix PR58250

2013-08-30 Thread Markus Trippelsdorf
During an LTO/PGO Firefox build one gets countless warnings:
  warning: -fprefetch-loop-arrays is not supported with -Os

This happens because flag_prefetch_loop_arrays is always set during the
profile-use phase.
Fix by following the suggestion of Jakub and disabling prefetching of
loop arrays when optimizing for size.

Please apply if the patch looks OK.
Thanks.

2013-08-30  Markus Trippelsdorf  

* config/i386/i386.c (ix86_option_override_internal): Only
prefetch loop_arrays during profile_use when not optimizing for
size.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a8d70bc..f385db4 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3952,7 +3952,7 @@ ix86_option_override_internal (bool main_args_p)
   /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
   if (flag_prefetch_loop_arrays < 0
   && HAVE_prefetch
-  && (optimize >= 3 || flag_profile_use)
+  && (optimize >= 3 || (flag_profile_use && !optimize_size))
   && TARGET_SOFTWARE_PREFETCHING_BENEFICIAL)
 flag_prefetch_loop_arrays = 1;
 

-- 
Markus


Re: [PATCH] Fix up strlen pass invalidation (PR tree-optimization/58277)

2013-08-30 Thread Richard Biener
On Fri, 30 Aug 2013, Jakub Jelinek wrote:

> Hi!
> 
> Apparently I forgot to add code for complete invalidation of everything, if
> more than 100 stmts with vdefs are seen in between immediate dominator and
> current basic block.  That cutoff is there so that we don't spend too much
> time on invalidation, but without this patch it actually ignored all the
> other vdef stmts in between those two bbs rather than just starting with no
> known string lengths at the start of the bb.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8?

Ok.

Thanks,
Richard.

> 2013-08-30  Jakub Jelinek  
> 
>   PR tree-optimization/58277
>   * tree-ssa-strlen.c (strlen_enter_block): If do_invalidate gave up
>   after seeing too many stmts with vdef in between dombb and current
>   bb, invalidate everything.
> 
>   * gcc.c-torture/execute/pr58277-1.c: New test.
>   * gcc.c-torture/execute/pr58277-2.c: New test.
> 
> --- gcc/tree-ssa-strlen.c.jj  2013-08-13 12:20:27.0 +0200
> +++ gcc/tree-ssa-strlen.c 2013-08-30 11:02:39.717691590 +0200
> @@ -1952,6 +1952,28 @@ strlen_enter_block (struct dom_walk_data
> int count_vdef = 100;
> do_invalidate (dombb, phi, visited, &count_vdef);
> BITMAP_FREE (visited);
> +   if (count_vdef == 0)
> + {
> +   /* If there were too many vdefs in between immediate
> +  dominator and current bb, invalidate everything.
> +  If stridx_to_strinfo has been unshared, we need
> +  to free it, otherwise just set it to NULL.  */
> +   if (!strinfo_shared ())
> + {
> +   unsigned int i;
> +   strinfo si;
> +
> +   for (i = 1;
> +vec_safe_iterate (stridx_to_strinfo, i, &si);
> +++i)
> + {
> +   free_strinfo (si);
> +   (*stridx_to_strinfo)[i] = NULL;
> + }
> + }
> +   else
> + stridx_to_strinfo = NULL;
> + }
> break;
>   }
>   }
> --- gcc/testsuite/gcc.c-torture/execute/pr58277-1.c.jj2013-08-30 
> 11:10:36.590005465 +0200
> +++ gcc/testsuite/gcc.c-torture/execute/pr58277-1.c   2013-08-30 
> 11:10:05.0 +0200
> @@ -0,0 +1,102 @@
> +/* PR tree-optimization/58277 */
> +
> +extern void abort (void);
> +static int a[2];
> +int b, c, d, *e, f, g, h, **i = &e, k, l = 1, n, o, p;
> +static int **volatile j = &e;
> +const int m;
> +char u;
> +
> +int
> +bar ()
> +{
> +  u = 0;
> +  return m;
> +}
> +
> +__attribute__((noinline, noclone)) void
> +baz ()
> +{
> +  asm ("");
> +}
> +
> +static int
> +foo ()
> +{
> +  int t1;
> +  g = bar ();
> +  if (l)
> +;
> +  else
> +for (;; h++)
> +  {
> + *i = 0;
> + o = *e = 0;
> + if (p)
> +   {
> + f = 0;
> + return 0;
> +   }
> + for (;; k++)
> +   {
> + int *t2 = 0;
> + int *const *t3[] = {
> +   0, 0, 0, 0, 0, 0, 0, 0, 0, &t2, 0, 0, &t2, &t2, &t2,
> +   &t2, &t2, 0, 0, 0, 0, 0, 0, 0, &t2, 0, 0, 0, 0, 0, 0,
> +   0, 0, 0, 0, &t2, 0, 0, 0, 0, 0, 0, 0, &t2, &t2,
> +   &t2, &t2, &t2, 0, 0, 0, 0, 0, 0, 0, &t2, 0, 0, 0,
> +   &t2, 0, 0, 0, &t2, 0, &t2, 0, 0, &t2, 0, 0, 0, 0,
> +   0, &t2, 0, 0, 0, 0, &t2, &t2, 0, 0, 0, 0, &t2, 0,
> +   0, 0, 0, 0, 0, 0, &t2, 0, 0, 0, 0, 0, &t2, 0, 0, 0,
> +   &t2, &t2
> + };
> + int *const **t4[] = {&t3[0]};
> + **i = 0;
> + if (**j)
> +   break;
> + u = 0;
> +   }
> + *i = *j;
> + t1 = 0;
> + for (; t1 < 5; t1++)
> +   *i = *j;
> +  }
> +  *j = 0;
> +  return 1;
> +}
> +
> +int
> +main ()
> +{
> +  int t5;
> +  a[0] = 1;
> +  {
> +int *t6[6] = {&d, &d};
> +for (n = 1; n; n--)
> +  if (foo())
> + {
> +   int *t7[] = {0};
> +   d = 0;
> +   for (; u < 1; u++)
> + *i = *j;
> +   *i = 0;
> +   *i = 0;
> +   int t8[5] = {0};
> +   *i = &t8[0];
> +   int *const *t9 = &t6[0];
> +   int *const **t10 = &t9;
> +   *t10 = &t7[0];
> + }
> +  }
> +  u = 0;
> +  for (; b; b++)
> +for (t5 = 0; t5 < 10; t5++)
> +  c = a[a[a[a[a[a[a[a[c]]]]]]]];
> +
> +  baz ();
> +
> +  if (!a[a[a[a[a[a[a[a[a[a[a[a[a[a[a[u]]]]]]]]]]]]]]])
> +abort ();
> +
> +  return 0;
> +}
> --- gcc/testsuite/gcc.c-torture/execute/pr58277-2.c.jj2013-08-30 
> 11:10:39.613988421 +0200
> +++ gcc/testsuite/gcc.c-torture/execute/pr58277-2.c   2013-08-30 
> 11:06:17.0 +0200
> @@ -0,0 +1,98 @@
> +/* PR tree-optimization/58277 */
> +
> +extern void abort (void);
> +static int a[1], b, c, e, i, j, k, m, q[] = { 1, 1 }, t;
> +int volatil

Re: Request to merge Undefined Behavior Sanitizer in (take 3)

2013-08-30 Thread Jakub Jelinek
On Fri, Aug 30, 2013 at 02:23:44PM +0200, Marek Polacek wrote:
> I see.  For the record, here are the new ChangeLog entries I'm going
> to use when doing the merge.

Ok for trunk, thanks.

> config/ChangeLog
> 2013-08-30  Marek Polacek  
> 
>   * bootstrap-ubsan.mk: New.
> 
> gcc/c-family/ChangeLog
> 2013-08-30  Marek Polacek  
> 
>   * c-ubsan.c: New file.
>   * c-ubsan.h: New file.
> 
> gcc/ChangeLog
> 2013-08-30  Marek Polacek  
> 
>   * Makefile.in (ubsan.o): Add.
>   (c-family/c-ubsan.o): Add.
>   (builtins.o): Add ubsan.h dependency.
>   * ubsan.h: New file.
>   * ubsan.c: New file.
>   * common.opt: Add -fsanitize=undefined option.
>   (flag_sanitize): Add variable.
>   (fsanitize=): Add option.  Add Driver.
>   (fsanitize=thread): Remove option.
>   (fsanitize=address): Likewise.
>   (static-libubsan): New option.
>   * doc/invoke.texi: Document the new flag and -static-libubsan.
>   * sanitizer.def (DEF_SANITIZER_BUILTIN): Define.
>   (BUILT_IN_UBSAN_HANDLE_BUILTIN_UNREACHABLE): Define.
>   * builtin-attrs.def (ATTR_COLD): Define.
>   (ATTR_COLD_NOTHROW_LEAF_LIST): Define.
>   * builtins.def (BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW,
>   BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS): Define.
>   * flag-types.h (sanitize_code): New enum.
>   * opts.c (common_handle_option): Parse command line arguments
>   of -fsanitize=.  Add -fsanitize=unreachable option.
>   * varasm.c (get_variable_section): Adjust.
>   (assemble_noswitch_variable): Likewise.
>   (assemble_variable): Likewise.
>   (output_constant_def_contents): Likewise.
>   (categorize_decl_for_section): Likewise.
>   (place_block_symbol): Likewise.
>   (output_object_block): Likewise.
>   * builtins.def: Likewise.
>   * toplev.c (compile_file): Likewise.
>   (process_options): Likewise.
>   * cppbuiltin.c: Likewise.
>   * tsan.c (tsan_pass): Likewise.
>   (tsan_gate): Likewise.
>   (tsan_gate_O0): Likewise.
>   * cfgexpand.c (partition_stack_vars): Likewise.
>   (expand_stack_vars): Likewise.
>   (defer_stack_allocation): Likewise.
>   (expand_used_vars): Likewise.
>   * cfgcleanup.c (old_insns_match_p): Likewise.
>   * asan.c (asan_finish_file): Likewise.
>   (asan_instrument): Likewise.
>   (gate_asan): Likewise.
>   (initialize_sanitizer_builtins): Build BT_FN_VOID_PTR_PTR_PTR.
>   (ATTR_COLD_NOTHROW_LEAF_LIST): Define.
>   (asan_global_struct): Use pointer_sized_int_node instead
>   calling build_nonstandard_integer_type.
>   (initialize_sanitizer_builtins): Likewise.
>   (asan_finish_file): Likewise.
>   * gcc.c: Document %{%:function(args):X}.
>   (static_spec_functions): Add sanitize.
>   (handle_spec_function): Add retval_nonnull argument and if non-NULL,
>   store funcval != NULL there.
>   (do_spec_1): Adjust handle_spec_function caller.
>   (handle_braces): Allow %:function(args) as condition.
>   (sanitize_spec_function): New function.
>   (ADD_STATIC_LIBUBSAN_LIBS): Define.
>   (LIBUBSAN_SPEC): Likewise.
>   (LIBUBSAN_EARLY_SPEC): Likewise.
>   (SANITIZER_SPEC): Handle libubsan.
>   (SANITIZER_EARLY_SPEC): Likewise.
>   * config/darwin.h (LINK_COMMAND_SPEC_A): Use %:sanitize(address)
>   instead of fsanitize=address.
>   * config/arm/linux-eabi.h (ASAN_CC1_SPEC): Use %:sanitize(address)
>   instead of fsanitize=address*.
>   * builtins.c: Include ubsan.h.
>   (fold_builtin_0): Instrument __builtin_unreachable.
>   * config/rs6000/rs6000.h (FRAME_GROWS_DOWNWARD): Use flag_sanitize
>   instead of flag_asan.
>   * tree.h (enum tree_index): Add TI_POINTER_SIZED_TYPE.
>   (pointer_sized_int_node): Define.
>   * tree.c (build_common_tree_nodes): Initialize
>   pointer_sized_int_node.
> 
> gcc/cp/ChangeLog
> 2013-08-30  Marek Polacek  
> 
>   * typeck.c (cp_build_binary_op): Add division by zero and shift
>   instrumentation.
>   * error.c (dump_expr): Special-case ubsan builtins.
> 
> gcc/c/ChangeLog
> 2013-08-30  Marek Polacek  
> 
>   * c-typeck.c (build_binary_op): Add division by zero and shift
>   instrumentation.
> 
> gcc/testsuite/ChangeLog
> 2013-08-30  Marek Polacek  
> 
>   * g++.dg/ubsan/div-by-zero-1.C: New test.
>   * c-c++-common/ubsan/save-expr-1.c: New test.
>   * c-c++-common/ubsan/save-expr-2.c: New test.
>   * c-c++-common/ubsan/save-expr-3.c: New test.
>   * c-c++-common/ubsan/save-expr-4.c: New test.
>   * c-c++-common/ubsan/typedef-1.c: New test.
>   * c-c++-common/ubsan/const-char-1.c: New test.
>   * c-c++-common/ubsan/const-expr.c: New test.
>   * c-c++-common/ubsan/div-by-zero-1.c: Likewise.
>   * c-c++-common/ubsan/shift-1.c: Likewise.
>   * c-c++-common/ubsan/shift-2.c: Likewise.
>   * c-c++-common/ubsan/div-by-zero-2.c: Likewise.
>   * lib/ubs

Re: RFC: patch to build GCC for arm with LRA

2013-08-30 Thread Yvan Roux
Hi,

here is a request for comments on the 2 attached patches which enable
the build of GCC on ARM with LRA.  The patches introduce a new
undocumented option -mlra to use LRA instead of reload, as was done in
previous LRA support work; it is here to ease testing and comparison
with reload and is not supposed to be kept in 4.9.

On AArch32:

* The patch includes the previous changes of this thread, add the
handling of ASHIFTRT, LSHIFTRT, ROTATE and ROTATERT in get_index_scale
and let LRA handle the constraint in Thumb.
* The status is that the build completes in ARM mode with a couple of
regressions in the testsuite that I'll investigate:
  - gcc.c-torture/execute/20060102-1.c
  - gcc.c-torture/execute/961026-1.c
  - gcc.target/arm/atomic-comp-swap-release-acquire.c
and some build failures in libstdc++ at the pass manager level (there
is an invalid read in gt_ggc_mx)
* The build of libraries in Thumb mode still doesn't complete; as
Vladimir said, the secondary_reload modification solves the LRA cycling
but we still have some issues.

On AArch64 :

* I modified must_be_index_p to avoid the call of set_address_base
with patterns which contain a MULT.
* The build completes, but there are a couple of regressions in the
testsuite that I'm looking at, for instance:
 - gcc.c-torture/execute/ieee/fp-cmp-4l.c
 - c-c++-common/torture/complex-sign-mul-minus-one.c

Any comments ?

Thanks
Yvan




On 6 July 2013 01:12, Vladimir Makarov  wrote:
> On 13-07-05 8:43 AM, Yvan Roux wrote:
>>
>> Hi,
>>
>> for AArch64 it is also needed to take into account SIGN_EXTRACT in the
>> set_address_base and set_address_index routines, as we acan encounter
>> that kind of insn for instance :
>>
>> (insn 29 27 5 7 (set (mem:SI (plus:DI (sign_extract:DI (mult:DI
>> (subreg:DI (reg/v:SI 76 [ elt ]) 0)
>> ...
>
> OK.
>
>> with the attached patch and the LRA enabled, compiler now bootstrap
>> but I've few regressions in the testsuite,
>
> It seems ok to me but I am confused of the following change:
>
>  set_address_base (struct address_info *info, rtx *loc, rtx *inner)
>  {
> -  if (GET_CODE (*inner) == LO_SUM)
> +  if (GET_CODE (*inner) == SIGN_EXTRACT)
> +inner = strip_address_mutations (&XEXP (*inner, 0));
> +
> +  if (GET_CODE (*inner) == LO_SUM || GET_CODE (*inner) == MULT)
>
>  inner = strip_address_mutations (&XEXP (*inner, 0));
>gcc_checking_assert (REG_P (*inner)
> || MEM_P (*inner)
>
> base address should not contain MULT (which you added).  It is controlled by
> the result of must_be_index_p.  So set_address_base should not have code for
> MULT and you need to change must_be_index_p in a way that set_address_base
> is not called for MULT.
>
>
>> gcc.c/torture/execute/fp-cmp-4l.c for instance. I was looking at these
>> issues before submitting  a complete AArch64 LRA enabling patch, but
>> as you are speaking about that...
>>
>> Valdimir, for the ARM target I already had the ASHIFTRT and LSHIFTRT
>> addition on my side, but still had an ICE during bootstrap with LRA
>> when compiling fixed-bit.c (the Max number of generated reload insns
>> we talk about already) is it working for you ?
>>
>>
> I did not check thumb I guess.  If what you are asking about the problem you
> sent me about 2 weeks ago, I guess one solution would be putting
>
>   if (lra_in_progress)
> return NO_REGS;
>
> at the beginning of arm/targhooks.c::default_secondary_reload.  LRA is smart
> enough to figure out what to do from constraints in most cases of when
> reload needs secondary_reload.  In arm case, default_secondary_reload only
> confuses LRA.
>
> I had no time to test this change, but it solves LRA cycling for the test
> case you sent me.
>


aarch32-lra.patch
Description: Binary data


aarch64-lra.patch
Description: Binary data


Re: [PATCH] Convert more passes to new dump framework

2013-08-30 Thread Teresa Johnson
Sorry, I should not have committed that new test along with this
portion of the patch. Removed as of r202106.
Teresa

On Fri, Aug 30, 2013 at 12:17 AM, Andreas Schwab  wrote:
> Teresa Johnson  writes:
>
>> Index: testsuite/gcc.dg/inline-dump.c
>> ===
>> --- testsuite/gcc.dg/inline-dump.c  (revision 0)
>> +++ testsuite/gcc.dg/inline-dump.c  (revision 0)
>> @@ -0,0 +1,11 @@
>> +/* Verify that -fopt-info can output correct inline info.  */
>> +/* { dg-do compile } */
>> +/* { dg-options "-Wall -fopt-info-inline=stderr -O2 -fno-early-inlining" } 
>> */
>> +static inline int leaf() {
>> +  int i, ret = 0;
>> +  for (i = 0; i < 10; i++)
>> +ret += i;
>> +  return ret;
>> +}
>> +static inline int foo(void) { return leaf(); } /* { dg-message "note:
>> leaf .*inlined into bar .*via inline instance foo.*\n" } */
>
> I don't see that message, neither on ia64 nor m68k.
>
> Andreas.
>
> --
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
> "And now for something completely different."



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: RFC: patch to build GCC for arm with LRA

2013-08-30 Thread Richard Earnshaw
On 30/08/13 14:09, Yvan Roux wrote:
> Hi,
> 
> here is a request for comments on the 2 attached patches which enable
> the build of GCC on ARM with LRA.  The patches introduce a new
> undocumented option -mlra to use LRA instead of reload, as what was
> done on previous LRA support, which is here to ease the test and
> comparison with reload and not supposed to be kept in 4.9.
> 
> On AArch32:
> 
> * The patch includes the previous changes of this thread, add the
> handling of ASHIFTRT, LSHIFTRT, ROTATE and ROTATERT in get_index_scale
> and let LRA handle the constraint in Thumb.
> * The status is that the build complete in ARM mode with a couple of
> regressions in the testsuite, that I'll investigate:
>   - gcc.c-torture/execute/20060102-1.c
>   - gcc.c-torture/execute/961026-1.c
>   - gcc.target/arm/atomic-comp-swap-release-acquire.c
> and some build failures in libstdc++ at the pass manager level (there
> is an invalid read in gt_ggc_mx)
> * The build of libraries in Thumb mode still doesn't complete, as
> Vladimir said the secondary_reload  modification solves LRA cycling
> but we still have some issues.
> 
> On AArch64 :
> 
> * I modified must_be_index_p to avoid the call of set_address_base
> with patterns which contains a MULT.
> * The build complete, but there is a couple of regression in the
> testsuite I'm looking at on
>  - gcc.c-torture/execute/ieee/fp-cmp-4l.c
>  - c-c++-common/torture/complex-sign-mul-minus-one.c
> for instance.
> 
> Any comments ?
> 

Why are you posting two conflicting sets of patches for rtlanal.c?
Changes to that file will have to work for all architectures; so while
it helps a little bit to see which changes are needed for which target,
ultimately you need one patch for that file that works everywhere.

Also, as a style nit, use

  return (test
  || test
  || test);

so that the parenthesis will force correct indentation of continuation
lines.

R.

> Thanks
> Yvan
> 
> 
> 
> 
> On 6 July 2013 01:12, Vladimir Makarov  wrote:
>> On 13-07-05 8:43 AM, Yvan Roux wrote:
>>>
>>> Hi,
>>>
>>> for AArch64 it is also needed to take into account SIGN_EXTRACT in the
>>> set_address_base and set_address_index routines, as we acan encounter
>>> that kind of insn for instance :
>>>
>>> (insn 29 27 5 7 (set (mem:SI (plus:DI (sign_extract:DI (mult:DI
>>> (subreg:DI (reg/v:SI 76 [ elt ]) 0)
>>> ...
>>
>> OK.
>>
>>> with the attached patch and the LRA enabled, compiler now bootstrap
>>> but I've few regressions in the testsuite,
>>
>> It seems ok to me but I am confused of the following change:
>>
>>  set_address_base (struct address_info *info, rtx *loc, rtx *inner)
>>  {
>> -  if (GET_CODE (*inner) == LO_SUM)
>> +  if (GET_CODE (*inner) == SIGN_EXTRACT)
>> +inner = strip_address_mutations (&XEXP (*inner, 0));
>> +
>> +  if (GET_CODE (*inner) == LO_SUM || GET_CODE (*inner) == MULT)
>>
>>  inner = strip_address_mutations (&XEXP (*inner, 0));
>>gcc_checking_assert (REG_P (*inner)
>> || MEM_P (*inner)
>>
>> base address should not contain MULT (which you added).  It is controlled by
>> the result of must_be_index_p.  So set_address_base should not have code for
>> MULT and you need to change must_be_index_p in a way that set_address_base
>> is not called for MULT.
>>
>>
>>> gcc.c/torture/execute/fp-cmp-4l.c for instance. I was looking at these
>>> issues before submitting  a complete AArch64 LRA enabling patch, but
>>> as you are speaking about that...
>>>
>>> Valdimir, for the ARM target I already had the ASHIFTRT and LSHIFTRT
>>> addition on my side, but still had an ICE during bootstrap with LRA
>>> when compiling fixed-bit.c (the Max number of generated reload insns
>>> we talk about already) is it working for you ?
>>>
>>>
>> I did not check thumb I guess.  If what you are asking about the problem you
>> sent me about 2 weeks ago, I guess one solution would be putting
>>
>>   if (lra_in_progress)
>> return NO_REGS;
>>
>> at the beginning of arm/targhooks.c::default_secondary_reload.  LRA is smart
>> enough to figure out what to do from constraints in most cases of when
>> reload needs secondary_reload.  In arm case, default_secondary_reload only
>> confuses LRA.
>>
>> I had no time to test this change, but it solves LRA cycling for the test
>> case you sent me.
>> =
>>
>> aarch32-lra.patch
>>
>>
>>
>> aarch64-lra.patch
>>
>>




Fwd: Re: [PATCH]. Fix HAVE_SYS_SDT_H for cross-compilation

2013-08-30 Thread Christian Bruel
(resent plain text, sorry)

A documentation comment on the proposed patch.

The issue occurred while building the target libgcc using the cross-gcc,
while cross-building a target-gcc

../../../../libgcc/unwind-dw2.c:42:21: fatal error: sys/sdt.h: No such
file or directory

indeed, auto-host.h had

/* Define if your target C library provides sys/sdt.h */
#define HAVE_SYS_SDT_H 1

because:

configure:26872: checking sys/sdt.h in the target C library
configure:26881: result: yes
(which is false)

So to cross build a target library, --with-build-sysroot=dir looks
appropriate to specify the alternative host root path, but
--with-sysroot does not look appropriate because it changes the search
paths (those should still be /usr/include on the target tree).

So, consequently, the --with-build-sysroot documentation sentence
"This option is only useful when you are already using --with-sysroot."
looks incorrect to me as we seem to have here a use of
--with-build-sysroot without --with-sysroot.

Not sure if it's clear, but I'm wondering why this restriction is in
the documentation?  Could we amend it?

Cheers

Christian

On 08/29/2013 10:36 AM, Christian Bruel wrote:
> Hello Bill and Jakub
>
> On 08/22/2013 07:47 PM, Jakub Jelinek wrote:
>> On Thu, Aug 22, 2013 at 09:39:48AM -0500, Bill Schmidt wrote:
>>> Hi Christian and Jakub,
>>>
>>> I'm curious whether there was ever any resolution for:
>>> http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01124.html.
> Sorry for not having sent a follow up for this.
>
> The problem is that configure was checking for cross features in the
> host root dir instead of the cross root dir.
>
> This SDT failure was only the visible part of the problem while building
> a Canadian Cross linux hosted GCC, as  we could as well silently test
> for different cross/target runtime features :-).
>
> I fixed this problem  by fixing the usage of with_build_sysroot while
> checking system features with target_header_dir when host != build.
> Checked for legacy issue with various bare or hosted SH4 compilers in
> various environments (linux, mingwn, cygwin)
>
> Comments ? does this is seems reasonable to push to trunk ?
>
> Cheers
>
> Christian
>
>






Re: [Patch] Rewrite regex matchers

2013-08-30 Thread Paolo Carlini

Hi,

On 08/30/2013 02:05 PM, Tim Shen wrote:

+  const _TraitsT&_M_traits;
+  _FlagT _M_flags;
+  bool   _M_is_non_matching;
+  std::set<_CharT>   _M_char_set;
+  std::set> _M_range_set;
+  _CharClassT_M_class_set;
another, very general comment: now that nothing is decided ABI-wise for 
these features, let's pay attention to the layouts and make sure that 
the data members are ordered in the best way to minimize the size.  For 
example, when I see a bool sandwiched between big objects, something 
seems at least weird...  Let's make measurements and optimize for 64-bit, 
but let's double check on 32-bit too.
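
For instance, here is a generic LP64 sketch of the point (toy types and
made-up member names, nothing to do with the actual regex classes):
interleaving bools with pointer-sized members wastes whole 8-byte slots,
while grouping them does not.

/* Toy example only.  */
struct bad_layout
{
  void *a;    /* offset  0 */
  bool  f1;   /* offset  8, followed by 7 bytes of padding */
  void *b;    /* offset 16 */
  bool  f2;   /* offset 24, followed by 7 bytes of tail padding */
};            /* sizeof (bad_layout) == 32 on x86_64 */

struct good_layout
{
  void *a;    /* offset  0 */
  void *b;    /* offset  8 */
  bool  f1;   /* offset 16 */
  bool  f2;   /* offset 17, then 6 bytes of tail padding */
};            /* sizeof (good_layout) == 24 on x86_64 */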


Paolo.


Re: RFC: patch to build GCC for arm with LRA

2013-08-30 Thread Yvan Roux
Sorry for the previous off-the-list-html-format answer :(

On 30 August 2013 15:18, Richard Earnshaw  wrote:
> On 30/08/13 14:09, Yvan Roux wrote:
>> Hi,
>>
>> here is a request for comments on the 2 attached patches which enable
>> the build of GCC on ARM with LRA.  The patches introduce a new
>> undocumented option -mlra to use LRA instead of reload, as what was
>> done on previous LRA support, which is here to ease the test and
>> comparison with reload and not supposed to be kept in 4.9.
>>
>> On AArch32:
>>
>> * The patch includes the previous changes of this thread, add the
>> handling of ASHIFTRT, LSHIFTRT, ROTATE and ROTATERT in get_index_scale
>> and let LRA handle the constraint in Thumb.
>> * The status is that the build complete in ARM mode with a couple of
>> regressions in the testsuite, that I'll investigate:
>>   - gcc.c-torture/execute/20060102-1.c
>>   - gcc.c-torture/execute/961026-1.c
>>   - gcc.target/arm/atomic-comp-swap-release-acquire.c
>> and some build failures in libstdc++ at the pass manager level (there
>> is an invalid read in gt_ggc_mx)
>> * The build of libraries in Thumb mode still doesn't complete, as
>> Vladimir said the secondary_reload  modification solves LRA cycling
>> but we still have some issues.
>>
>> On AArch64 :
>>
>> * I modified must_be_index_p to avoid the call of set_address_base
>> with patterns which contains a MULT.
>> * The build complete, but there is a couple of regression in the
>> testsuite I'm looking at on
>>  - gcc.c-torture/execute/ieee/fp-cmp-4l.c
>>  - c-c++-common/torture/complex-sign-mul-minus-one.c
>> for instance.
>>
>> Any comments ?
>>
>
> Why are you posting to conflicting sets of patches for rtlanal.c?
> Changes to that file will have to work for all architectures; so while
> it helps a little bit to see which changes are needed for which target,
> ultimately you need one patch for that file that works everywhere.

Yes sorry for that, I've 2 dev branches on my side for AArch32 and
AArch64, and I thought that posting one patch for each will help
people who wants to test one target in particular, but I planned to
make either one patch for switching ARM on LRA at the end or 2 patches
without conflict.

> Also, as a style nit, use
>
>   return (test
>   || test
>   || test);
>
> so that the parenthesis will force correct indentation of continuation
> lines.

Ok, will fix this.

Thanks Richard

>
> R.
>
>> Thanks
>> Yvan
>>
>>
>>
>>
>> On 6 July 2013 01:12, Vladimir Makarov  wrote:
>>> On 13-07-05 8:43 AM, Yvan Roux wrote:

 Hi,

 for AArch64 it is also needed to take into account SIGN_EXTRACT in the
 set_address_base and set_address_index routines, as we acan encounter
 that kind of insn for instance :

 (insn 29 27 5 7 (set (mem:SI (plus:DI (sign_extract:DI (mult:DI
 (subreg:DI (reg/v:SI 76 [ elt ]) 0)
 ...
>>>
>>> OK.
>>>
 with the attached patch and the LRA enabled, compiler now bootstrap
 but I've few regressions in the testsuite,
>>>
>>> It seems ok to me but I am confused of the following change:
>>>
>>>  set_address_base (struct address_info *info, rtx *loc, rtx *inner)
>>>  {
>>> -  if (GET_CODE (*inner) == LO_SUM)
>>> +  if (GET_CODE (*inner) == SIGN_EXTRACT)
>>> +inner = strip_address_mutations (&XEXP (*inner, 0));
>>> +
>>> +  if (GET_CODE (*inner) == LO_SUM || GET_CODE (*inner) == MULT)
>>>
>>>  inner = strip_address_mutations (&XEXP (*inner, 0));
>>>gcc_checking_assert (REG_P (*inner)
>>> || MEM_P (*inner)
>>>
>>> base address should not contain MULT (which you added).  It is controlled by
>>> the result of must_be_index_p.  So set_address_base should not have code for
>>> MULT and you need to change must_be_index_p in a way that set_address_base
>>> is not called for MULT.
>>>
>>>
 gcc.c/torture/execute/fp-cmp-4l.c for instance. I was looking at these
 issues before submitting  a complete AArch64 LRA enabling patch, but
 as you are speaking about that...

 Valdimir, for the ARM target I already had the ASHIFTRT and LSHIFTRT
 addition on my side, but still had an ICE during bootstrap with LRA
 when compiling fixed-bit.c (the Max number of generated reload insns
 we talk about already) is it working for you ?


>>> I did not check thumb I guess.  If what you are asking about the problem you
>>> sent me about 2 weeks ago, I guess one solution would be putting
>>>
>>>   if (lra_in_progress)
>>> return NO_REGS;
>>>
>>> at the beginning of arm/targhooks.c::default_secondary_reload.  LRA is smart
>>> enough to figure out what to do from constraints in most cases of when
>>> reload needs secondary_reload.  In arm case, default_secondary_reload only
>>> confuses LRA.
>>>
>>> I had no time to test this change, but it solves LRA cycling for the test
>>> case you sent me.
>>> =
>>>
>>> aarch32-lra.patch
>>>
>>>
>>>
>>> aarch64-lra.patch
>>>
>>>

Re: [Patch] Rewrite regex matchers

2013-08-30 Thread Jakub Jelinek
On Fri, Aug 30, 2013 at 03:26:59PM +0200, Paolo Carlini wrote:
> Hi,
> 
> On 08/30/2013 02:05 PM, Tim Shen wrote:
> >+  const _TraitsT&_M_traits;
> >+  _FlagT _M_flags;
> >+  bool   _M_is_non_matching;
> >+  std::set<_CharT>   _M_char_set;
> >+  std::set> _M_range_set;
> >+  _CharClassT_M_class_set;
> another, very general comment: now that nothing is decided ABI-wise
> for these features, let's pay attention to the layouts, let's make
> sure that the data members are ordered in the best way to minimize
> the size. For example, when I see a bool sandwiched between big
> objects something seems at least weird... Let's make measurements
> and optimize for 64-bit but let's double check on 32-bit too.

You could use the pahole utility from dwarves for this, though I'm not sure
how far it is with support for DWARF4 and the extensions recent gcc emits,
so perhaps you need -gdwarf-3 -gstrict-dwarf or similar.
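
As a lightweight complement (not a substitute for pahole), the intended
sizes can also be pinned with compile-time assertions so layout regressions
break the build immediately.  The type below is just a placeholder, not a
real libstdc++ class:

/* Placeholder type standing in for the matcher whose layout is being tuned.  */
struct _SomeMatcher
{
  void *_M_a;
  void *_M_b;
  bool  _M_flag;
};

static_assert (sizeof (_SomeMatcher) <= 3 * sizeof (void *),
	       "_SomeMatcher grew; revisit member ordering");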

Jakub


Re: [PATCH 0/6] Convert gimple to a C++ class hierarchy

2013-08-30 Thread Michael Matz
Hi,

On Thu, 29 Aug 2013, David Malcolm wrote:

> Successfully bootstrapped and tested on x86_64-unknown-linux-gnu: all
> testcases show the same results as an unpatched build (relative to
> r202029).

I'd like to see some statistics for cc1{,plus} codesize and for compile 
time of something reasonably large (needing say 60 seconds to 
compile normally), before/after patch series.

And the manual GTY markers are so not maintainable in the long run, 
gengtype or something else really needs to be taught to create them 
automatically.


Ciao,
Michael.


Re: [PATCH 0/6] Convert gimple to a C++ class hierarchy

2013-08-30 Thread Gabriel Dos Reis
On Fri, Aug 30, 2013 at 8:44 AM, Michael Matz  wrote:

> And the manual GTY markers are so not maintainable in the long run,
> gengtype or something else really needs to be taught to create them
> automatically.

I thought the principle that was acquired was that gengtype shouldn't
be improved to support more than what it does now….

-- Gaby


Re: [PATCH 0/6] Convert gimple to a C++ class hierarchy

2013-08-30 Thread Jakub Jelinek
On Fri, Aug 30, 2013 at 08:58:43AM -0500, Gabriel Dos Reis wrote:
> On Fri, Aug 30, 2013 at 8:44 AM, Michael Matz  wrote:
> 
> > And the manual GTY markers are so not maintainable in the long run,
> > gengtype or something else really needs to be taught to create them
> > automatically.
> 
> I thought the principle that was acquired was that gengtype shouldn't
> be improved to support more than what it does now….

If it means that we'll need to write and maintain tons of hand written code
that could otherwise be generated and maintained by a tool for us, that
principle doesn't look very good.

Jakub


Re: [PATCH 0/6] Convert gimple to a C++ class hierarchy

2013-08-30 Thread Gabriel Dos Reis
On Fri, Aug 30, 2013 at 9:02 AM, Jakub Jelinek  wrote:
> On Fri, Aug 30, 2013 at 08:58:43AM -0500, Gabriel Dos Reis wrote:
>> On Fri, Aug 30, 2013 at 8:44 AM, Michael Matz  wrote:
>>
>> > And the manual GTY markers are so not maintainable in the long run,
>> > gengtype or something else really needs to be taught to create them
>> > automatically.
>>
>> I thought the principle that was acquired was that gengtype shouldn't
>> be improved to support more than what it does now….
>
> If it means that we'll need to write and maintain tons of hand written code
> that could otherwise be generated and maintained by a tool for us, that
> principle doesn't look very good.
>
> Jakub

Back in March 2013, I asked about gengtype support for inheritance.

   http://gcc.gnu.org/ml/gcc/2013-03/msg00273.html

This

   http://gcc.gnu.org/ml/gcc/2013-03/msg00295.html

was the definitive answer that appeared to be the consensus.

-- Gaby


Committed: avoid null-pointer dereference in verify_changes

2013-08-30 Thread Joern Rennecke

I got nine ICEs in verify_changes for compile/20110401-1.c on ARC.

Fixed by checking changes[i].old before dereferencing it.

Bootstrapped on i686-pc-linux.gnu.

Committed as obvious.
2013-05-20  Joern Rennecke  

* recog.c (verify_changes): Verify that changes[i].old is non-zero
before applying REG_P.

Index: recog.c
===
--- recog.c (revision 202106)
+++ recog.c (working copy)
@@ -397,7 +397,10 @@ verify_changes (int num)
 MEM_ADDR_SPACE (object)))
break;
}
-  else if (REG_P (changes[i].old)
+  else if (/* changes[i].old might be zero, e.g. when putting a
+  REG_FRAME_RELATED_EXPR into a previously empty list.  */
+  changes[i].old
+  && REG_P (changes[i].old)
   && asm_noperands (PATTERN (object)) > 0
   && REG_EXPR (changes[i].old) != NULL_TREE
   && DECL_ASSEMBLER_NAME_SET_P (REG_EXPR (changes[i].old))


Re: [C++ Patch] PR 51424

2013-08-30 Thread Jason Merrill

On 08/30/2013 05:06 AM, Paolo Carlini wrote:

I should have explained some of that in better detail. The main issue I
had yesterday, is that the pattern matching can easily become very
difficult if not impossible: if you look at the second half of
expand_default_init, in some cases we wrap the returned CALL_EXPR in a
COND_EXPR


Ah, yes.  Let's go with your first patch, then, rather than make the 
change in multiple places.


Jason




Re: Merge profiles of duplicated COMDAT functions

2013-08-30 Thread Andi Kleen
Jan Hubicka  writes:
>
> Now for longer term, we want to make function CFGs independent of gimple body
> and we want to decide on instrumentation at linktime, so we won't require user
> to rebuild with -fprofile-use, just relink.  I plan to work on this, but 
> not for 4.9. 

It may already be obsolete with AutoFDO?

Longer term I hope more and more of the explicit instrumentation 
functionality can be implemented using the hardware PMU.

For example (auto fdo doesn't do this today) it would be quite
possible to do x86 value profiling without explicit instrumentation
too by looking at the PEBS records, and likely similar functionality
elsewhere.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: [Patch, AArch64] Remove arm_neon.h's dependency on stdint's macros.

2013-08-30 Thread Richard Earnshaw
On 30/08/13 15:03, Tejas Belagod wrote:
> Hi,
> 
> The attached patch removes dependency on stdint's macros used in arm_neon.h 
> viz. 
> UINT64_C() and INT64_C() making arm_neon.h more C++-friendly.
> 
> Tested on aarch64-none-elf.
> 
> OK for trunk?
> 
> Thanks,
> Tejas Belagod.
> ARM.
> 
> Changelog:
> 
> 2013-08-30  Tejas Belagod  
> 
> gcc/
>   * config/aarch64/arm_neon.h (__AARCH64_UINT64_C, __AARCH64_INT64_C): New
>   arm_neon.h's internal macros to specify 64-bit constants. Avoid using
>   stdint.h's macros.
> 
> 

OK.

R.




[PATCH] Don't issue array bound warnings on zero-length arrays

2013-08-30 Thread Meador Inge
Hi All,

This patch fixes a minor issue that can occur when issuing array bounds
warnings.  In GNU C mode we allow empty initializer lists, and the upper
bound of such zero-length arrays is initialized to -1.  This confuses the
array bound analysis in VRP and in some cases we end up issuing false
positives.  This patch fixes the issue by bailing out when a zero-length
array is encountered.

OK for trunk?

gcc/

2013-08-30  Meador Inge  

* tree-vrp.c (check_array_ref): Bail out on empty arrays.

gcc/testsuite/

2013-08-30  Meador Inge  

* gcc.dg/Warray-bounds-11.c: New testcase.

Index: gcc/testsuite/gcc.dg/Warray-bounds-11.c
===
--- gcc/testsuite/gcc.dg/Warray-bounds-11.c (revision 0)
+++ gcc/testsuite/gcc.dg/Warray-bounds-11.c (revision 0)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Warray-bounds -std=gnu99" } */
+/* Test zero-length arrays for GNU C.  */
+
+unsigned int a[] = { };
+unsigned int size_a;
+
+int test(void)
+{
+  /* This should not warn.  */
+  return size_a ? a[0] : 0;
+}
Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 202088)
+++ gcc/tree-vrp.c  (working copy)
@@ -6137,9 +6137,10 @@ check_array_ref (location_t location, tr
   low_sub = up_sub = TREE_OPERAND (ref, 1);
   up_bound = array_ref_up_bound (ref);
 
-  /* Can not check flexible arrays.  */
+  /* Can not check flexible arrays or zero-length arrays.  */
   if (!up_bound
-  || TREE_CODE (up_bound) != INTEGER_CST)
+  || TREE_CODE (up_bound) != INTEGER_CST
+  || tree_int_cst_equal (up_bound, integer_minus_one_node))
 return;
 
   /* Accesses to trailing arrays via pointers may access storage


Re: [PATCH 0/6] Convert gimple to a C++ class hierarchy

2013-08-30 Thread David Malcolm
On Fri, 2013-08-30 at 09:12 -0500, Gabriel Dos Reis wrote:
> On Fri, Aug 30, 2013 at 9:02 AM, Jakub Jelinek  wrote:
> > On Fri, Aug 30, 2013 at 08:58:43AM -0500, Gabriel Dos Reis wrote:
> >> On Fri, Aug 30, 2013 at 8:44 AM, Michael Matz  wrote:
> >>
> >> > And the manual GTY markers are so not maintainable in the long run,
> >> > gengtype or something else really needs to be taught to create them
> >> > automatically.
> >>
> >> I thought the principle that was acquired was that gengtype shouldn't
> >> be improved to support more than what it does now….
> >
> > If it means that we'll need to write and maintain tons of hand written code
> > that could otherwise be generated and maintained by a tool for us, that
> > principle doesn't look very good.
> >
> > Jakub
> 
> Back in March 2013, I asked about gengtype support for inheritance.
> 
>http://gcc.gnu.org/ml/gcc/2013-03/msg00273.html
> 
> This
> 
>http://gcc.gnu.org/ml/gcc/2013-03/msg00295.html
> 
> was the definitive answer that appeared to be the consensus.

[adding Diego to the CC]

I get the impression from the responses so far to this and to the symtab
patches [1] that people would prefer that gengtype be somehow expanded
to cope with single-inheritance.

Handwaving over syntax (borrowing from the union-marking syntax),
perhaps something like this:

struct
  GTY((chain_next ("%h.next"),
       desc ("gimple_statement_structure (&%h)")))
  gimple_statement_base {
   /* as before */
};

struct GTY((subclass_tag ("GSS_PHI"))) gimple_statement_phi : public
gimple_statement_base {
}

and then have gengtype recognize the inheritance hierarchy below
gimple_statement_base, and "do the right thing" (again I'm handwaving).

Also, perhaps we could use a new GTY flag:  "never_in_pch" or somesuch,
so that we can mark e.g. gimple and rtx as never appearing in PCH, and
thus only the GC hooks need to exist; the PCH hooks either don't need to
be generated, or could be stubs containing just gcc_unreachable?   That
would be an independent feature from the subclass support, of course.
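
(To make the stub idea concrete, the generated PCH walkers could collapse
to something like the sketch below; gt_pch_nx_* follows the usual
generated-name pattern, but this particular body is only illustrative.)

void
gt_pch_nx_gimple_statement_base (void *x ATTRIBUTE_UNUSED)
{
  /* gimple would be marked never_in_pch, so nothing reachable from
     PCH-saved data should ever point at it.  */
  gcc_unreachable ();
}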

That said, for this particular patch series, for the hand-maintained
route, there would just be one big GTY-related function to maintain,
rather than 3, given the lack of need for PCH support.

Dave

[1] http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01152.html



Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-08-30 Thread Teresa Johnson
Can someone review and ok the attached patch for trunk? It has been
bootstrapped and tested on x86_64-unknown-linux-gnu, and also tested
with -freorder-blocks-and-partition enabled for a profiledbootstrap.

(Honza, see more responses inlined below. Rong, please see note below as well).

Thanks,
Teresa

On Fri, Aug 30, 2013 at 2:14 AM, Jan Hubicka  wrote:
>> Great! Is this the LTO merging you were talking about in an earlier
>> message, or the gcov runtime fix (that would presumably not be
>> lto-specific)?
>
> It is the LTO path - we need to merge profiles there anyway for his code 
> unification
> work.

Rong - can you send a summary of the approach you are working on? Is
it LIPO-specific?

>
>> > I have patch to track this.  Moreover vforks seems to produce repeated
>> > merging of results.
>>
>> Aha, does this explain the gimp issue as well?
>
> Not really - we still need to debug why we hit cold section so many times with
> partitioning.  I sitll think easier approach will be to lock the cold section 
> and
> then start probably with testsuite (i.e. write script to compile the small 
> testcases
> with FDO + partitioning and see what crash by hitting cold section).

Ok, that is on my todo list.

>
>> >
>> > Is it really necessary to run this from internal loop of the cfgcleanup?
>>
>> The reason I added it here is that just below there is a call to
>> verify_flow_info, and that will fail with the new verification.
>
> Hmm, OK, I suppose we run the cleanup after partitioning just once or twice, 
> right?
> We can track this incrementally - I am not sure if we put it from the 
> internal iteration
> loop we would get anything substantial either.
> Removing unreachable blocks twice is however ugly.

When I was debugging the issue that led to this change I seemed to see
1-2 iterations typically, although I haven't measured it
scientifically. It would be good to revisit that and see if we can
pull both parts out of the loop, but as a separate patch.

>
>> +/* Ensure that all hot bbs are included in a hot path through the
>> +   procedure. This is done by calling this function twice, once
>> +   with WALK_UP true (to look for paths from the entry to hot bbs) and
>> +   once with WALK_UP false (to look for paths from hot bbs to the exit).
>> +   Returns the updated value of COLD_BB_COUNT and adds newly-hot bbs
>> +   to BBS_IN_HOT_PARTITION.  */
>> +
>> +static unsigned int
>> +sanitize_hot_paths (bool walk_up, unsigned int cold_bb_count,
>> +vec *bbs_in_hot_partition)
>> +{
>> +  /* Callers check this.  */
>> +  gcc_checking_assert (cold_bb_count);
>> +
>> +  /* Keep examining hot bbs while we still have some left to check
>> + and there are remaining cold bbs.  */
>> +  vec hot_bbs_to_check = bbs_in_hot_partition->copy ();
>> +  while (! hot_bbs_to_check.is_empty ()
>> + && cold_bb_count)
>> +{
>> +  basic_block bb = hot_bbs_to_check.pop ();
>> +  vec *edges = walk_up ? bb->preds : bb->succs;
>> +  edge e;
>> +  edge_iterator ei;
>> +  int highest_probability = 0;
>> +  int highest_freq = 0;
>> +  gcov_type highest_count = 0;
>> +  bool found = false;
>> +
>> +  /* Walk the preds/succs and check if there is at least one already
>> + marked hot. Keep track of the most frequent pred/succ so that we
>> + can mark it hot if we don't find one.  */
>> +  FOR_EACH_EDGE (e, ei, edges)
>> +{
>> +  basic_block reach_bb = walk_up ? e->src : e->dest;
>> +
>> +  if (e->flags & EDGE_DFS_BACK)
>> +continue;
>> +
>> +  if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION)
>> +  {
>> +found = true;
>> +break;
>> +  }
>> +  /* The following loop will look for the hottest edge via
>> + the edge count, if it is non-zero, then fallback to the edge
>> + frequency and finally the edge probability.  */
>> +  if (e->count > highest_count)
>> +highest_count = e->count;
>> +  int edge_freq = EDGE_FREQUENCY (e);
>> +  if (edge_freq > highest_freq)
>> +highest_freq = edge_freq;
>> +  if (e->probability > highest_probability)
>> +highest_probability = e->probability;
>> +}
>> +
>> +  /* If bb is reached by (or reaches, in the case of !WALK_UP) another 
>> hot
>> + block (or unpartitioned, e.g. the entry block) then it is ok. If 
>> not,
>> + then the most frequent pred (or succ) needs to be adjusted.  In the
>> + case where multiple preds/succs have the same frequency (e.g. a
>> + 50-50 branch), then both will be adjusted.  */
>> +  if (found)
>> +continue;
>> +
>> +  FOR_EACH_EDGE (e, ei, edges)
>> +{
>> +  if (e->flags & EDGE_DFS_BACK)
>> +continue;
>> +  /* Select the hottest edge using the edge count, if it is 
>> non-zero,
>> + then fallback to the 

Re: [PATCH 0/6] Convert gimple to a C++ class hierarchy

2013-08-30 Thread Michael Matz
Hi,

On Fri, 30 Aug 2013, Gabriel Dos Reis wrote:

> On Fri, Aug 30, 2013 at 9:02 AM, Jakub Jelinek  wrote:
> 
> >> I thought the principle that was acquired was that gengtype shouldn't
> >> be improved to support more than what it does now….
> >
> > If it means that we'll need to write and maintain tons of hand written code
> > that could otherwise be generated and maintained by a tool for us, that
> > principle doesn't look very good.

Exactly.

> Back in March 2013, I asked about gengtype support for inheritance.
>http://gcc.gnu.org/ml/gcc/2013-03/msg00273.html
> This
>http://gcc.gnu.org/ml/gcc/2013-03/msg00295.html
> was the definitive answer that appeared to be the consensus.

Well, it was a wrong decision then.  For some smaller types writing manual 
marker might be a sensible thing, or for some extra complicated 
constructs.  But here we're talking about the most simple struct hierarchy 
imaginable.  Having to write manual markers for that one is absurd IMO.


Ciao,
Michael.

converting wide-int so that it uses its own type rather than hwi.

2013-08-30 Thread Kenneth Zadeck

richi,

on further thought, this is going to be a huge task.  The problem is at 
the edges.  Right now we share the rep of the array with the tree-csts 
and rtl (though the rtl sharing may go away to support canonicalization).  
So if the hwi rep inside of wide-int changes then we will have to 
implement copying with reblocking at that interface, or push the type in 
there and change all of the fits_*hwi and to_*hwi interfaces to fit 
this different type.


I think I can get at least as good and perhaps better test coverage by 
changing the rep of hwi for a port.  There will also be fallout work 
there, but it will be productive, in that it is just changing code from 
only handling the hwi case to supporting all precisions.


Kenny


Re: [PATCH 0/6] Convert gimple to a C++ class hierarchy

2013-08-30 Thread Gabriel Dos Reis
On Fri, Aug 30, 2013 at 10:21 AM, Michael Matz  wrote:
> Hi,
>
> On Fri, 30 Aug 2013, Gabriel Dos Reis wrote:
>
>> On Fri, Aug 30, 2013 at 9:02 AM, Jakub Jelinek  wrote:
>>
>> >> I thought the principle that was acquired was that gengtype shouldn't
>> >> be improved to support more than what it does now….
>> >
>> > If it means that we'll need to write and maintain tons of hand written code
>> > that could otherwise be generated and maintained by a tool for us, that
>> > principle doesn't look very good.
>
> Exactly.
>
>> Back in March 2013, I asked about gengtype support for inheritance.
>>http://gcc.gnu.org/ml/gcc/2013-03/msg00273.html
>> This
>>http://gcc.gnu.org/ml/gcc/2013-03/msg00295.html
>> was the definitive answer that appeared to be the consensus.
>
> Well, it was a wrong decision then.  For some smaller types writing manual
> marker might be a sensible thing, or for some extra complicated
> constructs.  But here we're talking about the most simple struct hierarchy
> imaginable.  Having to write manual markers for that one is absurd IMO.

We can reserve the emotional strong words for later.  For now, the focus
should be for us to ensure we are being consistent and making design
decisions that carry consensus, hence my original note.

-- Gaby


Re: [Patch,Aarch64] Make frame grow downwards

2013-08-30 Thread Christophe Lyon
Hi,

Following our discussions about changing the frame direction on
aarch64, I have written a small test case using a large frame which
shows similar code generation with and without my patch. There
are small differences in register allocation which lead to larger
prologue/epilogue with my patch, but I think this could (should) be
improved independently.

I have attached the sample C source and the code generated with
(frame64.s.O2) and without (frame64.s.O2.ref) my patch when compiling
at -O2.

What do you think?

Thanks,

Christophe.

On 28 June 2013 13:41, Christophe Lyon  wrote:
> Hi,
> Following the discussion on
> http://gcc.gnu.org/ml/gcc/2013-05/msg00208.html
> here is a patch to change the frame direction.
>
> Passed 'make check' with no regression on aarch64-none-linux-gnu and
> with a few UNSUPPORTED -> PASS on aarch64-none-elf (pr34225, pr38616,
> pr38902, pr40971, pr46440, pr47766, pr49307, all of which are guarded
> by dg-require-effective-target fstack_protector).
>
> OK for trunk?
>
> Thanks,
>
> Christophe.
> 2013-06-28  Christophe Lyon  
>
> * config/aarch64/aarch64.h (FRAME_GROWS_DOWNWARD): Define to 1.
> * config/aarch64/aarch64.c (aarch64_initial_elimination_offset):
> Add get_frame_size () when eliminating frame pointer.
#define SIZE1 10

extern int func2(int);

int func1(int arg1, int arg2)
{
  int var1;
  int array1[SIZE1];
  int var2 = 0;

  for (var1 = arg1; var1 < arg2; var1++) {
array1[var1] = func2(var1);
  }

  for (var1 = arg1; var1 < arg2; var1++) {
var2 += array1[var1];
  }

  var2 += (array1[arg1] + array1[arg2-1]) / 2;

  return var2;
}


frame64.s.O2
Description: Binary data


frame64.s.O2.ref
Description: Binary data


Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-08-30 Thread Jan Hubicka
> >
> > The frequency condition needs to be done only when you walk predecestors - 
> > when
> > you walk down the edge probabilities are just fine.
> 
> True. For simplicity I think it should be fine to leave as-is so there
> isn't more special casing as the current approach works in both
> directions.

Yep, you are right.  Frequencies are safe in both directions.

I think this change belongs in the profile feedback category, so the patch is OK.

Honza


Re: [PATCH 0/6] Convert gimple to a C++ class hierarchy

2013-08-30 Thread Diego Novillo
On Fri, Aug 30, 2013 at 11:21 AM, Michael Matz  wrote:
>
> Hi,
>
> On Fri, 30 Aug 2013, Gabriel Dos Reis wrote:
>
> > On Fri, Aug 30, 2013 at 9:02 AM, Jakub Jelinek  wrote:
> >
> > >> I thought the principle that was acquired was that gengtype shouldn't
> > >> be improved to support more than what it does now….
> > >
> > > If it means that we'll need to write and maintain tons of hand written 
> > > code
> > > that could otherwise be generated and maintained by a tool for us, that
> > > principle doesn't look very good.
>
> Exactly.
>
> > Back in March 2013, I asked about gengtype support for inheritance.
> >http://gcc.gnu.org/ml/gcc/2013-03/msg00273.html
> > This
> >http://gcc.gnu.org/ml/gcc/2013-03/msg00295.html
> > was the definitive answer that appeared to be the consensus.
>
> Well, it was a wrong decision then.  For some smaller types writing manual
> marker might be a sensible thing, or for some extra complicated
> constructs.  But here we're talking about the most simple struct hierarchy
> imaginable.  Having to write manual markers for that one is absurd IMO.

I want to discourage extending gengtype more than necessary. Long
term, I would like to see memory pools replacing GC. However, that is
likely a long road so we should find an interim solution.

I vaguely remember thinking about what would be needed to have
gengtype deal with inheritance. It needed some pretty ugly
annotations. This made gengtype even more magic.  That's very bad for
maintenance.

One thing I liked was a suggestion along the lines of creating some
base templates that could be used to facilitate writing the manual
markers.

Perhaps we could minimally extend gengtype to generate those. But I
think we can take advantage of C++ template features to facilitate
this.
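
A hand-wavy, self-contained sketch of what such a helper could look like
(toy types, not the real gimple classes, and ggc_mark_with_base is purely
hypothetical).  The idea is that each hand-written subclass marker only
deals with its own members and defers the inherited ones to the base
marker:

/* Toy stand-ins for the real statement classes.  */
struct base_stmt { base_stmt *next; };
struct phi_stmt : base_stmt { base_stmt *ctrl_dep; };

void gt_ggc_mx (base_stmt *p);          /* assumed existing base marker */

/* Hypothetical shared helper.  */
template <typename Base, typename Derived>
static inline void
ggc_mark_with_base (Derived *p)
{
  gt_ggc_mx (static_cast<Base *> (p));  /* marks the inherited members */
}

/* A hand-written subclass marker then shrinks to: */
void
gt_ggc_mx (phi_stmt *p)
{
  ggc_mark_with_base<base_stmt> (p);
  gt_ggc_mx (p->ctrl_dep);              /* plus the subclass's own members */
}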


Diego.


Re: [PATCH] libitm: Add custom HTM fast path for RTM on x86_64.

2013-08-30 Thread Rainer Orth
Torvald Riegel  writes:

> On Mon, 2013-08-26 at 09:49 -0700, Richard Henderson wrote:
>> On 08/22/2013 02:57 PM, Torvald Riegel wrote:
>> > On Thu, 2013-08-22 at 12:05 -0700, Richard Henderson wrote:
>> >> On 08/22/2013 11:39 AM, Torvald Riegel wrote:
>> >>> +/* Store edi for future HTM fast path retries.  We use a stack 
>> >>> slot
>> >>> +   lower than the jmpbuf so that the jmpbuf's rip field will 
>> >>> overlap
>> >>> +   with the proper return address on the stack.  */
>> >>> +movl%edi, -64(%rsp)
>> >>
>> >> You havn't allocated the stack frame here, and you're storing
>> >> outside the redzone.  This is invalid.
>> >>
>> >> Two possibilities:
>> >>
>> >>  (1) always allocate the stack frame on entry to
>> >>  the function (adds two register additions to
>> >>  the htm fast path -- in the noise i'd think)
>> >>
>> >>  (2) store the edi value in the non-htm path, with
>> >>  the pr_HTMRetryableAbort bit or'd in.  (adds an
>> >>  extra store to the non-htm path; probably noise).
>> >>  You'd want to mask out that bit when you reload it.
>> > 
>> > Oops.  Picked fix (2).  Better now?
>> > 
>> 
>> Move the andl of edi down into the HAVE_AS_RTM block, above the orl of
>> HTMRetriedAfterAbort.  Ok with that change.
>
> Committed as r202101.
>

The patch has broken Solaris/x86 bootstrap:

ro@arenal 290 > /bin/ksh ./libtool --tag=CXX   --mode=compile 
/var/gcc/regression/trunk/11-gcc/build/./gcc/xg++ 
-B/var/gcc/regression/trunk/11-gcc/build/./gcc/ -nostdinc++ -nostdinc++ 
-I/var/gcc/regression/trunk/11-gcc/build/i386-pc-solaris2.11/libstdc++-v3/include/i386-pc-solaris2.11
 
-I/var/gcc/regression/trunk/11-gcc/build/i386-pc-solaris2.11/libstdc++-v3/include
 -I/vol/gcc/src/hg/trunk/local/libstdc++-v3/libsupc++ 
-I/vol/gcc/src/hg/trunk/local/libstdc++-v3/include/backward 
-I/vol/gcc/src/hg/trunk/local/libstdc++-v3/testsuite/util 
-L/var/gcc/regression/trunk/11-gcc/build/i386-pc-solaris2.11/libstdc++-v3/src 
-L/var/gcc/regression/trunk/11-gcc/build/i386-pc-solaris2.11/libstdc++-v3/src/.libs
 
-L/var/gcc/regression/trunk/11-gcc/build/i386-pc-solaris2.11/libstdc++-v3/libsupc++/.libs
 -B/vol/gcc/i386-pc-solaris2.11/bin/ -B/vol/gcc/i386-pc-solaris2.11/lib/ 
-isystem /vol/gcc/i386-pc-solaris2.11/include -isystem 
/vol/gcc/i386-pc-solaris2.11/sys-include-DHAVE_CONFIG_H -I. 
-I/vol/gcc/src/hg/trunk/local/libitm  
-I/vol/gcc/src/hg/trunk/local/libitm/config/x86 
-I/vol/gcc/src/hg/trunk/local/libitm/config/posix 
-I/vol/gcc/src/hg/trunk/local/libitm/config/generic 
-I/vol/gcc/src/hg/trunk/local/libitm  -march=i486 -mtune=i386 
-fomit-frame-pointer -mrtm -Wall -Werror  -Wc,-pthread -std=gnu++0x 
-funwind-tables -fno-exceptions -fno-rtti -fabi-version=4 -g -O2 -MT rwlock.lo 
-MD -MP -MF .deps/rwlock.Tpo -c -o rwlock.lo 
/vol/gcc/src/hg/trunk/local/libitm/config/posix/rwlock.cc
libtool: compile:  /var/gcc/regression/trunk/11-gcc/build/./gcc/xg++ 
-B/var/gcc/regression/trunk/11-gcc/build/./gcc/ -nostdinc++ -nostdinc++ 
-I/var/gcc/regression/trunk/11-gcc/build/i386-pc-solaris2.11/libstdc++-v3/include/i386-pc-solaris2.11
 
-I/var/gcc/regression/trunk/11-gcc/build/i386-pc-solaris2.11/libstdc++-v3/include
 -I/vol/gcc/src/hg/trunk/local/libstdc++-v3/libsupc++ 
-I/vol/gcc/src/hg/trunk/local/libstdc++-v3/include/backward 
-I/vol/gcc/src/hg/trunk/local/libstdc++-v3/testsuite/util 
-L/var/gcc/regression/trunk/11-gcc/build/i386-pc-solaris2.11/libstdc++-v3/src 
-L/var/gcc/regression/trunk/11-gcc/build/i386-pc-solaris2.11/libstdc++-v3/src/.libs
 
-L/var/gcc/regression/trunk/11-gcc/build/i386-pc-solaris2.11/libstdc++-v3/libsupc++/.libs
 -B/vol/gcc/i386-pc-solaris2.11/bin/ -B/vol/gcc/i386-pc-solaris2.11/lib/ 
-isystem /vol/gcc/i386-pc-solaris2.11/include -isystem 
/vol/gcc/i386-pc-solaris2.11/sys-include -DHAVE_CONFIG_H -I. 
-I/vol/gcc/src/hg/trunk/local/libitm 
-I/vol/gcc/src/hg/trunk/local/libitm/config/x86 
-I/vol/gcc/src/hg/trunk/local/libitm/config/posix 
-I/vol/gcc/src/hg/trunk/local/libitm/config/generic 
-I/vol/gcc/src/hg/trunk/local/libitm -march=i486 -mtune=i386 
-fomit-frame-pointer -mrtm -Wall -pthread -Werror -std=gnu++0x -funwind-tables 
-fno-exceptions -fno-rtti -fabi-version=4 -g -O2 -MT rwlock.lo -MD -MP -MF 
.deps/rwlock.Tpo -c /vol/gcc/src/hg/trunk/local/libitm/config/posix/rwlock.cc  
-fPIC -DPIC -o .libs/rwlock.o
In file included from /vol/gcc/src/hg/trunk/local/libitm/libitm_i.h:83:0,
 from 
/vol/gcc/src/hg/trunk/local/libitm/config/posix/rwlock.cc:25:
/vol/gcc/src/hg/trunk/local/libitm/config/posix/rwlock.h: In constructor 
'GTM::gtm_rwlock::gtm_rwlock()':
/vol/gcc/src/hg/trunk/local/libitm/config/posix/rwlock.h:64:18: error: 
'GTM::gtm_rwlock::c_confirmed_writers' will be initialized after 
[-Werror=reorder]
   pthread_cond_t c_confirmed_writers; // Writers wait here for readers
  ^
/vol/gcc/src/hg/trunk/local/libitm/config/posix/rwlock.h:59:29: error:   
'std::atomic GTM::gtm_rwlock::summary' [-Werror=reorder]
   std::ato

Re: [PATCH 0/6] Convert gimple to a C++ class hierarchy

2013-08-30 Thread Diego Novillo
On Fri, Aug 30, 2013 at 11:37 AM, Jakub Jelinek  wrote:

> Teaching the gengtype parser about
> {struct,class} name : public {struct,class} someothername { ... }
> as opposed to current
> {struct,class} name { ... }
> shouldn't be that hard.  And, if the complaint is that we'd need to write
> whole C++ parser for it, then the response could be that we already have
> one C++ parser (and, even have plugin support and/or emit dwarf etc.).

It isn't.  It's annoying and a duplication of effort.

> So, gengtype could very well use a g++ plugin to emit the stuff it needs,
> or parse DWARF, etc.  And, we even could not require everybody actually
> emitting those themselves, we could check some text form of that
> (gengtype.state?) into the tree, so only people actually changing the
> compiler would need to run the plugin.

Yes.  Lawrence and I thought about moving gengtype inside g++.  That
seemed like a promising approach.

> Even if you have some stuff that helps you writing those, still it will be a
> big source of bugs (very hard to debug) and a maintainance nightmare.

Debugging gengtype is much harder.  It is magic code that is not visible.


Diego.


Re: Folding (a ? b : c) op d

2013-08-30 Thread Mike Stump
On Aug 30, 2013, at 2:38 AM, Marc Glisse  wrote:
> There is no undoing here, it is just a recursion that gets very slow for 
> nested conditions:
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55219

Gosh, such simple and small code.  :-)

Re: [PATCH] Don't issue array bound warnings on zero-length arrays

2013-08-30 Thread Jeff Law

On 08/30/2013 09:13 AM, Meador Inge wrote:

Hi All,

This patch fixes a minor issue that can occur when issuing array bounds
warnings.  In GNU C mode we allow empty lists and their upper bound is
initialized to -1.  This confuses the array bound analysis in VRP and
in some cases we end up issuing false positives.  This patch fixes
the issue by bailing out when a zero-length array is encountered.

OK for trunk?

gcc/

2013-08-30  Meador Inge  

* tree-vrp.c (check_array_ref): Bail out on empty arrays.

gcc/testsuite/

2013-08-30  Meador Inge  

* gcc.dg/Warray-bounds-11.c: New testcase.

OK for the trunk.  Thanks,
Jeff



Re: [PATCH] libitm: Add custom HTM fast path for RTM on x86_64.

2013-08-30 Thread Dominique Dhumieres
> Committed as r202101.

This causes a bootstrap failure:

...
libtool: compile:  /opt/gcc/build_w/./gcc/xg++ -B/opt/gcc/build_w/./gcc/ 
-nostdinc++ -nostdinc++ 
-I/opt/gcc/build_w/x86_64-apple-darwin10.8.0/libstdc++-v3/include/x86_64-apple-darwin10.8.0
 -I/opt/gcc/build_w/x86_64-apple-darwin10.8.0/libstdc++-v3/include 
-I/opt/gcc/work/libstdc++-v3/libsupc++ 
-I/opt/gcc/work/libstdc++-v3/include/backward 
-I/opt/gcc/work/libstdc++-v3/testsuite/util 
-L/opt/gcc/build_w/x86_64-apple-darwin10.8.0/libstdc++-v3/src 
-L/opt/gcc/build_w/x86_64-apple-darwin10.8.0/libstdc++-v3/src/.libs 
-L/opt/gcc/build_w/x86_64-apple-darwin10.8.0/libstdc++-v3/libsupc++/.libs 
-B/opt/gcc/gcc4.9w/x86_64-apple-darwin10.8.0/bin/ 
-B/opt/gcc/gcc4.9w/x86_64-apple-darwin10.8.0/lib/ -isystem 
/opt/gcc/gcc4.9w/x86_64-apple-darwin10.8.0/include -isystem 
/opt/gcc/gcc4.9w/x86_64-apple-darwin10.8.0/sys-include -DHAVE_CONFIG_H -I. 
-I../../../work/libitm -I../../../work/libitm/config/x86 
-I../../../work/libitm/config/posix -I../../../work/libitm/config/generic 
-I../../../work/libitm -mrtm -Wall -pthread -Werror -std=gnu++0x 
-funwind-tables -fno-exceptions -fno-rtti -fabi-version=4 -g -O2 -MT rwlock.lo 
-MD -MP -MF .deps/rwlock.Tpo -c ../../../work/libitm/config/posix/rwlock.cc  
-fno-common -DPIC -o .libs/rwlock.o
In file included from ../../../work/libitm/libitm_i.h:83:0,
 from ../../../work/libitm/config/posix/rwlock.cc:25:
../../../work/libitm/config/posix/rwlock.h: In constructor 
'GTM::gtm_rwlock::gtm_rwlock()':
../../../work/libitm/config/posix/rwlock.h:64:18: error: 
'GTM::gtm_rwlock::c_confirmed_writers' will be initialized after 
[-Werror=reorder]
   pthread_cond_t c_confirmed_writers; // Writers wait here for readers
  ^
../../../work/libitm/config/posix/rwlock.h:59:29: error:   
'std::atomic GTM::gtm_rwlock::summary' [-Werror=reorder]
   std::atomic summary; // Bitmask of the above.
 ^
../../../work/libitm/config/posix/rwlock.cc:32:1: error:   when initialized 
here [-Werror=reorder]
 gtm_rwlock::gtm_rwlock()
 ^
cc1plus: all warnings being treated as errors
make[4]: *** [rwlock.lo] Error 1
make[3]: *** [all-recursive] Error 1
make[2]: *** [all] Error 2
make[1]: *** [all-target-libitm] Error 2
make: *** [all] Error 2

on x86_64-apple-darwin10.
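
(For anyone unfamiliar with it: -Wreorder fires when a constructor's
mem-initializer list is written in a different order than the members are
declared, because members are always initialized in declaration order.  A
minimal illustration, unrelated to the actual rwlock members:)

struct S
{
  int a;
  int b;
  /* The list says "b first", but 'a' is initialized first anyway, so
     'a (b + 1)' reads an uninitialized 'b'; with -Werror=reorder this
     is a hard error.  Listing the initializers in declaration order
     (or reordering the members) silences it.  */
  S () : b (0), a (b + 1) {}
};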

TIA

Dominique


Re: Fix PR middle-end/57370

2013-08-30 Thread Easwaran Raman
On Fri, Aug 30, 2013 at 1:25 AM, Richard Biener
 wrote:
> On Fri, Aug 30, 2013 at 2:30 AM, Easwaran Raman  wrote:
>> Richard,
>> Do you want me to commit everything but the  insert_stmt_after hunk
>> now?
>
> Yes.
>
>> There are more similar failures reported in
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57393 and I have attached
>> the updated patch there. Shall I send that for review? Apart from the
>> debug statement issue, almost all the bugs are due to dependence
>> violation because certain newly inserted statements do not have the
>> right UID. Instead of trying to catch all of them, will it be better
>> if I check if the stmt has a proper uid (non-zero if it is not the
>> first stmt) and assign a sensible value at the point where it is used
>> (find_insert_point and appears_later_in_bb) instead of where the stmt
>> is created? I think that would be less brittle.
>
> In the end all this placement stuff should be reverted and done as
> post-processing after reassoc is finished.  You can remember the
> inserted stmts that are candidates for moving (just set a pass-local
> flag on them) and assign UIDs for the whole function after all stmts
> are inserted.

The problem is we need sane UID values while reassoc is happening - not
after it is finished. As it processes each stmt in reassoc_bb, it might
generate new stmts which might have to be compared in
find_insert_point or appears_later_in_bb.

- Easwaran

> Richard.
>
>
>> - Easwaran
>>
>>
>>
>> On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener
>>  wrote:
>>> On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman  wrote:
 I have a new patch that supersedes this. The new patch also fixes PR
 tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and
 no test regression on x86_64/linux. Ok for trunk?

 2013-07-31  Easwaran Raman  

 PR middle-end/57370
 * tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and reset
 of debug statements that cause inconsistent IR.
>>>
>>> Missing ChangeLog entry for the insert_stmt_after hunk which I do not like
>>> at all.  The other hunks are ok, but we need to work harder to preserve
>>> debug stmts - simply removing all is not going to fly.
>>>
>>> Richard.
>>>

 testsuite/ChangeLog:
 2013-07-31  Easwaran Raman  

 PR middle-end/57370
 PR tree-optimization/57393
 PR tree-optimization/58011
 * gfortran.dg/reassoc_12.f90: New testcase.
 * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
 * gcc.dg/tree-ssa/reassoc-31.c: New testcase.


 On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman  wrote:
> Ping.
>
> On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman  
> wrote:
>> A newly generated statement in build_and_add_sum  function of
>> tree-ssa-reassoc.c has to be assigned the UID of its adjacent
>> statement. In one instance, it was assigned the wrong uid (of an
>> earlier phi statement) which messed up the IR and caused the test
>> program to hang. Bootstraps and no test regressions on x86_64/linux.
>> Ok for trunk?
>>
>> Thanks,
>> Easwaran
>>
>>
>> 2013-06-27  Easwaran Raman  
>>
>> PR middle-end/57370
>> * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of 
>> a phi
>> node for a non-phi gimple statement.
>>
>> testsuite/ChangeLog:
>> 2013-06-27  Easwaran Raman  
>>
>> PR middle-end/57370
>> * gfortran.dg/reassoc_12.f90: New testcase.
>>
>>
>> Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
>> ===
>> --- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
>> +++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
>> @@ -0,0 +1,74 @@
>> +! { dg-do compile }
>> +! { dg-options "-O2 -ffast-math" }
>> +! PR middle-end/57370
>> +
>> + SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
>> +   grad_deriv,npoints, sx)
>> +IMPLICIT REAL*8 (t)
>> +INTEGER, PARAMETER :: dp=8
>> +REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
>> +   e_ndrho_ndrho_rho
>> +  DO ii=1,npoints
>> +  IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
>> +t1425 = t233 * t557
>> +t1429 = beta * t225
>> +t1622 = t327 * t1621
>> +t1626 = t327 * t1625
>> +t1632 = t327 * t1631
>> +t1685 = t105 * t1684
>> +t2057 = t1636 + t8 * (t2635 + t3288)
>> +  END IF
>> +  IF( grad_deriv >= 3 .OR. grad_deriv == -3 ) THEN
>> +t5469 = t5440 - t5443 - t5446 - t5449 - &
>> +t5451 - t5454 - t5456 + t5459  - &
>> +t5462 + t5466 - t5468
>>

Re: COMDAT missing profile handling (was Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition)

2013-08-30 Thread Teresa Johnson
On Fri, Aug 30, 2013 at 2:31 AM, Jan Hubicka  wrote:
>> > The esitmated profile is already there before reading the profile in, so we
>> > only do not want to overwrite it.  Does the following work for you?
>>
>> It does get the estimated frequencies on the bbs.
>
> Good.

I ended up making some slight modifications, see below.

>
>> > We wil also need to solve problem that in this case cgraph edges will have 
>> > 0 profile.
>> > We probably want to play the game there and just do the scaling for edge 
>> > count,
>> > since IPA passes probably do not want to care about partial profiles.
>>
>> The problem I am having is that my new freqs_to_counts routine takes
>> the sum of the incoming edge counts and computes the bb counts by
>> scaling by the estimated bb frequency. But in the case where the
>> caller was also missing profile data the edge count will have 0
>> profile as you note above, so the counts remain 0. I am not quite sure
>> how to handle this. Instead of using the sum of the incoming profile
>> counts, I could just pick a default profile count to use as the entry
>> bb count. The synthesized counts should get scaled down appropriately
>> during inlining.
>
> I am having problem to think of what default we should pick here.

What I tried was: if the sum of the incoming counts is 0, use
BB_FREQ_MAX as the fake incoming profile count and apportion it to the
bbs based on the guessed profile frequencies. Rather than iterating, I
am simply doing this for all 0-count functions. I put an assert in the
inliner when testing to ensure that we never inline a 0-count body.
However, I realized after reading your proposed solution below that
this is not good since it will cause all of these (possibly truly cold)
functions to not be placed in the unlikely text section.
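
(Roughly, the apportioning looks like the sketch below; this is not the
actual patch, and the CFG accessors are written from memory, so treat it
as pseudocode.)

/* Sketch: synthesize counts for an all-zero-count FN from its guessed
   frequencies, given some assumed entry count (e.g. BB_FREQ_MAX).  */
static void
freqs_to_counts_sketch (struct function *fn, gcov_type entry_count)
{
  basic_block bb;
  FOR_ALL_BB_FN (bb, fn)
    {
      edge e;
      edge_iterator ei;
      bb->count = entry_count * bb->frequency / BB_FREQ_MAX;
      FOR_EACH_EDGE (e, ei, bb->succs)
	e->count = bb->count * e->probability / REG_BR_PROB_BASE;
    }
}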

As an aside, the patch I am testing is hitting an issue in a
profiledbootstrap that I think is an unrelated bug that seems to be
provoked by my changes:

../../gcc_trunk_7/gcc/config/i386/i386.c:33752:34: error: array
subscript is above array bounds [-Werror=array-bounds]
 if (ipar[i] + 1 != ipar[i + 1])

which seems to be a bogus error (probably related to PR 56456 and
other specific instances of this warning being bogus that have been
filed). Not sure what to do about this. But I am guessing it will not
be triggered once I rework the patch in any case...


>
> Thinking about the whole problem, I think the following sounds like possible
> sollution:
>
> 1) use the patch above to get frequencies computed where counts are 0.

Do you mean my original patch, where I am doing this via new function
init_and_estimate_bb_frequencies called from branch_prob()? I assume
so - that will give us frequencies on the branches but no counts yet,
and leave the profile status as READ.

>Make sure that branch probabilities are guessed for functions with non-0
>counts for each branch that has count 0.

Not sure about the second sentence above. Do you mean also guess
probabilities within functions that have other non-zero counts in
them. Or do you mean those reached via non-zero cgraph call counts?
The init_and_estimate_bb_frequencies routine will only guess
frequencies for routines that are all-0. I thought that the
probabilities on branches that are 0 would be 0, which seems correct
for functions with other non-zero counts?

> 2) At a point we see inconsistency in the IPA profile (i.e. we get to fact
>that we have direct calls with non-0 counts to a comdat with 0 count),
>drop the profile_status for them to profile_guessed and let the backed
>optimize them as if there was no FDO.
>I do not think we can do better for those.

I assume you mean sometime later than where we were currently doing
this when reading counts. Maybe during ipa-profile. That way it can be
done iteratively to handle the case where a 0-count comdat is calling
another 0-count comdat where they both are going to be reached via a
non-zero call further up.
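
A rough sketch of that iteration (hypothetical code, not from any posted
patch; the point is only the worklist-style repetition and the drop of
profile_status to guessed when a 0-count function has a caller that is
either non-zero or was itself already dropped):

  static void
  drop_profile_for_zero_count_comdats (void)
  {
    struct cgraph_node *node;
    bool changed = true;

    while (changed)
      {
        changed = false;
        FOR_EACH_DEFINED_FUNCTION (node)
          {
            struct function *fn = DECL_STRUCT_FUNCTION (node->symbol.decl);
            struct cgraph_edge *e;

            if (node->count
                || profile_status_for_function (fn) != PROFILE_READ)
              continue;

            for (e = node->callers; e; e = e->next_caller)
              if (e->count
                  || (profile_status_for_function
                        (DECL_STRUCT_FUNCTION (e->caller->symbol.decl))
                      == PROFILE_GUESSED))
                {
                  /* Inconsistent IPA profile: optimize as if no FDO.  */
                  profile_status_for_function (fn) = PROFILE_GUESSED;
                  changed = true;
                  break;
                }
          }
      }
  }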

>
>If we do not see this problem, we can keep status as READ so the function
>will get size optimized.  Here we will lose for indirect calls, but I
>do not see how to handle these short of making every 0 count COMDAT 
> guessed.
>Perhaps you can try how these variants work for you performance/code size
>wise.

Yes, I think you are right. We will lose a bit on the out-of-line
copy, but still should get this handled properly when inlined due to
3) below.

> 3) at a time we inline function with guessed profile into function with read
>profile, we will use your trick to feed the counts in based on frequencies.
>
>We will do that in tree-inline when inlining. I.e. we will never feed fake
>counts into non-inline functions

Right, so invoke a new routine like the following, when inlining and
before we invoke the copy_body, etc that scales the counts. Where the
count used can be the current sum of incoming cgraph edge counts
(presumably non-zero at this point since we decided 

Ubsan merged into trunk

2013-08-30 Thread Marek Polacek
I've just merged ubsan into trunk.  Please send complaints my way.
Thanks,

Marek


Re: [PATCH 0/6] Convert gimple to a C++ class hierarchy

2013-08-30 Thread Gabriel Dos Reis
On Fri, Aug 30, 2013 at 10:37 AM, Jakub Jelinek  wrote:
> On Fri, Aug 30, 2013 at 11:28:34AM -0400, Diego Novillo wrote:
>> > Well, it was a wrong decision then.  For some smaller types writing manual
>> > marker might be a sensible thing, or for some extra complicated
>> > constructs.  But here we're talking about the most simple struct hierarchy
>> > imaginable.  Having to write manual markers for that one is absurd IMO.
>>
>> I want to discourage extending gengtype more than necessary. Long
>> term, I would like to see memory pools replacing GC. However, that is
>> likely a long road so we should find an interim solution.
>
> I have doubts that, still somewhat remember the obstack era and memory pools
> would again bring all the issues back, but let's put that aside.

We didn't have the automation of RAII with obstacks, back then.
We do have evidence of widely used industrial
C and C++ compilers that are built around the idea of memory pools.

>> I vaguely remember thinking about what would be needed to have
>> gengtype deal with inheritance. It needed some pretty ugly
>> annotations. This made gengtype even more magic.  That's very bad for
>> maintenance.
>
> Teaching the gengtype parser about
> {struct,class} name : public {struct,class} someothername { … }

I would suggest NOT to allow the {struct, class} part in the base-class clause.

> as opposed to current
> {struct,class} name { ... }
> shouldn't be that hard.  And, if the complaint is that we'd need to write
> whole C++ parser for it, then the response could be that we already have
> one C++ parser (and, even have plugin support and/or emit dwarf etc.).

We should not need one.  At most the base classes would have the form

   optional-typename optional-qualifying-scope  identifier
optional-template-argument-list

that should cover most of what we want.
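
That is, base-clauses of the following shapes (illustrative, made-up
names, bodies elided) would all be covered by that form:

  struct derived_stmt : public base_stmt { /* ... */ };
  class some_node : some_ns::node_base { /* ... */ };
  template <typename T>
  struct typed_wrapper : detail::wrapper_base<T> { /* ... */ };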

> So, gengtype could very well use a g++ plugin to emit the stuff it needs,
> or parse DWARF, etc.  And, we even could not require everybody actually
> emitting those themselves, we could check some text form of that
> (gengtype.state?) into the tree, so only people actually changing the
> compiler would need to run the plugin.
>
>> One thing I liked is a suggestion that went something along the lines
>> of creating some base templates that could be used to facilitate
>> writing the manual markers.
>
> Even if you have some stuff that helps you writing those, still it will be a
> big source of bugs (very hard to debug) and a maintainance nightmare.
>
> Jakub


Re: [PATCH] Convert more passes to new dump framework

2013-08-30 Thread Xinliang David Li
Except that in this form, the dump will be extremely large and not
suitable for very large applications. Besides, we might also want to
use the same machinery (dump_printf_loc etc) for dump file dumping.
The current behavior of using '-details' to turn on opt-info-all
messages for dump files are not desirable.  How about the following:

1) add a new dump_kind modifier so that when that modifier is
specified, the messages won't goto the alt_dumpfile (controlled by
-fopt-info), but only to primary dump file. With this, the inline
messages can be dumped via:

   dump_printf_loc (OPT_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY, .)


2) add more flags in -fdump- support:

   -fdump-ipa-inline-opt   --> turn on opt-info messages only
   -fdump-ipa-inline-optall --> turn on opt-info-all messages
   -fdump-tree-pre-ir --> turn on GIMPLE dump only
   -fdump-tree-pre-details --> turn on everything (ir, optall, trace)

With this, developers can really just use


-fdump-ipa-inline-opt=stderr for inline messages.

thanks,

David

On Fri, Aug 30, 2013 at 1:30 AM, Richard Biener
 wrote:
> On Thu, Aug 29, 2013 at 5:15 PM, Teresa Johnson  wrote:
>> On Thu, Aug 29, 2013 at 3:04 AM, Richard Biener
>>  wrote:
>> New patch below that removes this global variable, and also outputs
>> the node->symbol.order (in square brackets after the function name so
>> as to not clutter it). Inline messages with profile data look like:
>>
>> test.c:8:3: note: foobar [0] (9000) inlined into foo [2] (1000)
>> with call count 9000 (via inline instance bar [3] (9000))
>
> Ick.  This looks both redundant and cluttered.  This is supposed to be
> understandable by GCC users, not only GCC developers.

 The main part that is only useful/understandable to gcc developers is
 the node->symbol.order in square brackes, requested by Martin. One
 possibility is that I could put that part under a param, disabled by
 default. We have something similar on the google branches that emits
 LIPO module info in the message, enabled via a param.
>>>
>>> But we have _dump files_ for that.  That's the developer-consumed
>>> form of opt-info.  -fopt-info is purely user sugar and for usual translation
>>> units it shouldn't exceed a single terminal full of output.
>>
>> But as a developer I don't want to have to parse lots of dump files
>> for a summary of the major optimizations performed (e.g. inlining,
>> unrolling) for an application, unless I am diving into the reasons for
>> why or why not one of those optimizations occurred in a particular
>> location. I really do want a summary emitted to stderr so that it is
>> easily searchable/summarizable for the app as a whole.
>>
>> For example, some of the apps I am interested in have thousands of
>> input files, and trying to collect and parse dump files for each and
>> every one is overwhelming (it probably would be even if my input files
>> numbered in the hundreds). What has been very useful is having these
>> high level summary messages of inlines and unrolls emitted to stderr
>> by -fopt-info. Then it is easy to search and sort by hotness to get a
>> feel for things like what inlines are missing when moving to a new
>> compiler, or compiling a new version of the source, for example. Then
>> you know which files to focus on and collect dump files for.
>
> I thought we can direct dump files to stderr now?  So, just use
> -fdump-tree-all=stderr
>
> and grep its contents.
>
>>>
 I'd argue that the other information (the profile counts, emitted only
 when using -fprofile-use, and the inline call chains) are useful if
 you want to understand whether and how critical inlines are occurring.
 I think this is the type of information that users focused on
 optimizations, as well as gcc developers, want when they use
 -fopt-info. Otherwise it is difficult to make sense of the inline
 information.
>>>
>>> Well, I doubt that inline information is interesting to users unless we are
>>> able to aggressively filter it to what users are interested in.  Which IMHO
>>> isn't possible - users are interested in "I have not inlined this even 
>>> though
>>> inlining would severely improve performance" which would indicate a bug
>>> in the heuristics we can reliably detect and thus it wouldn't be there.
>>
>> I have interacted with users who are aware of optimizations such as
>> inlining and unrolling and want to look at that information to
>> diagnose performance differences when refactoring code or using a new
>> compiler version. I also think inlining (especially cross-module) is
>> one example of an optimization that is still being tuned, and user
>> reports of performance issues related to that have been useful.
>>
>> I really think that the two groups of people who will find -fopt-info
>> useful are gcc developers and savvy performance-hungry users. For the
>> former group the additional info is extremely useful. For the latter
>> gro

Re: Fix PR middle-end/57370

2013-08-30 Thread Jakub Jelinek
On Fri, Aug 30, 2013 at 09:13:34AM -0700, Easwaran Raman wrote:
> >> There are more similar failures reported in
> >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57393 and I have attached
> >> the updated patch there. Shall I send that for review? Apart from the
> >> debug statement issue, almost all the bugs are due to dependence
> >> violation because certain newly inserted statements do not have the
> >> right UID. Instead of trying to catch all of them, will it be better
> >> if I check if the stmt has a proper uid (non-zero if it is not the
> >> first stmt) and assign a sensible value at the point where it is used
> >> (find_insert_point and appears_later_in_bb) instead of where the stmt
> >> is created? I think that would be less brittle.
> >
> > In the end all this placement stuff should be reverted and done as
> > post-processing after reassoc is finished.  You can remember the
> > inserted stmts that are candidates for moving (just set a pass-local
> > flag on them) and assign UIDs for the whole function after all stmts
> > are inserted.
> 
> The problem is we need sane UID values as reassoc is happening - not
> after it is finished. As it process each stmt in reassoc_bb, it might
> generate new stmts which might have to be compared in
> find_insert_point or appears_later_in_bb.

A new stmt will be created with UID 0 and in case there is stmt movement,
you could zero the UID during movement.  Then, you can just special case
UID zero when you are looking at UIDs.  As on trunk/4.8 gsi_for_stmt is
O(1), you can easily walk a couple of previous and/or later stmts and look
for the first non-zero UID around it, and treat it as if the UID was
previous non-zero UID + 0.5 or next non-zero UID - 0.5.  And, if you see
too many consecutive statements with UID 0 (more than some constant, 32?),
just reassign UIDs to the whole bb.  Or you could initially assign UIDs
with some gaps, then keep filling those gaps and once you fill them,
renumber again with new gaps.
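
A sketch of that lookup (name invented here; the gap scheme and the
renumber-the-whole-bb fallback are left out):

  /* Return the UID to use for STMT when comparing statement positions.
     Statements freshly created by reassoc carry UID 0 and inherit the
     UID of the closest preceding statement that already has one.  */
  static unsigned
  effective_uid (gimple stmt)
  {
    gimple_stmt_iterator gsi = gsi_for_stmt (stmt);

    while (!gsi_end_p (gsi) && gimple_uid (gsi_stmt (gsi)) == 0)
      gsi_prev (&gsi);

    return gsi_end_p (gsi) ? 0 : gimple_uid (gsi_stmt (gsi));
  }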

Jakub


Re: Fix PR middle-end/57370

2013-08-30 Thread Jakub Jelinek
On Fri, Aug 30, 2013 at 09:49:59AM -0700, Easwaran Raman wrote:
> Yes, this is pretty much what I was proposing. The current
> implementation doesn't rely on UIDs being unique - they only have to
> be monotonically non-decreasing. So, when a UID of 0 is encountered,
> go up till a non-zero UID is found and then go down and assign this
> non-zero UID. This effectively implies that each UID-0 stmt is visited
> at most twice. I don't think we need to check if I see more than
> certain number of UID-0 stmts and redo the entire BB.

Sure, renumbering the entire BB would be only for compile time reasons;
if you end up with a huge bb where 90% of stmts will have the same UID,
it is as if you weren't using UIDs at all and always walked the entire BB
to find out what stmt precedes what.  You could spend a lot of time in
appears_later_in_bb.

Looking at the code, couple of nits:
  return ((bb_a == bb_b && gimple_uid (a)  < gimple_uid (b))
extra space before <.

  gsi_next (&gsi);
  if (gsi_end_p (gsi))
return stmt1;
  for (; !gsi_end_p (gsi); gsi_next (&gsi))
{
...
}
  return stmt1;

Why not just for (gsi_next (&gsi); !gsi_end_p (gsi); gsi_next (&gsi)) ?
The extra if (gsi_end_p (gsi)) return stmt1; doesn't make any sense
when the loop does exactly that too.

And for the debug stmts, are you sure you are resetting only those debug
stmts which you have to reset?  I mean, if there are say debug stmts using
the SSA_NAME in other bb, it doesn't need to be reset (unless you reuse
the same SSA_NAME for something else), and the compiler even has code
to reconstruct SSA_NAMEs that have been removed, but were still referenced
in debug stmts, using expressions in debug stmts and debug temporaries.

Jakub


RE: [PATCH] PR58143/58227 wrong code at -O3

2013-08-30 Thread Bernd Edlinger
On Thu, 29 Aug 2013 11:54:22, Richard Biener wrote:
> On Wed, Aug 28, 2013 at 11:24 PM, Bernd Edlinger
>  wrote:
>> The lim pass (aka loop invariant motion) can move conditional expressions 
>> with
>> possible undefined behavior out of the if statement inside a loop which may 
>> cause the
>> loop optimization to silently generate wrong code as PR 
>> tree-optimization/58143 and
>> PR tree-optimization/58227 demonstrate.
>>
>> This patch prevents lim from moving code that might cause undefined behavior.
>>
>> This triggered a minor regression in gcc.target/i386/pr53397-1.c:
>> Here lim used to move the expression "2*step" out of the loop, but this may 
>> cause
>> undefined behavior on case of overflow, I propose to resolve this by adding
>> -fno-strict-overflow. The test case looks pretty constructed anyway.
>>
>> The patch was boot-strapped and regression tested on i686-pc-linux-gnu and
>> X86_64-linux-gnu
>>
>> OK for trunk?
>
> 2013-08-28 Bernd Edlinger 
>
> PR tree-optimization/58143
> PR tree-optimization/58227
> Prevent lim from moving conditional expressions
> if that could trigger undefined behavior.
> * gimple.h (gimple_could_trap_p_2): New function.
> * gimple.c (gimple_could_trap_p_2): New function.
> (gimple_could_trap_p_1): Call gimple_could_trap_p_2.
> * tree-ssa-loop-im.c (movement_possibility): Likewise.
>
> Please fold the overall comment into the movement_possibility changelog.
>

done.

> Can you instead of introducing gimple_could_trap_p_2 simply change the
> signature and name of gimple_could_trap_p_1 please? It is only called
> three times.
> A proper name could be gimple_could_trap_or_invoke_undefined_p.
>

done.

> That you need to change the two testcases makes me nervous (also please
> use -fwrapv, not -fno-strict-overflow) - did you check what happens for them?
>

Yes, I looked at that again.  That is an interesting story.
Initially the function looks like:

for (a = 1; a < step; a++) {
   for (b = 0; b  1 && n> 0) {
  for (a = 1; a < step; a++) {
    pretmp = step * 2;
    for (b = 0; b < n; b += pretmp) {

Well "step*2" was initially at loop level 2, but now I see, it is actually at 
loop level 1.

When lim does not move this out of the loop the test case fails.
But if lim moves this statement out of the loop despite the possible undefined
behavior, the test case passes.

Now I think this is good opportunity for a simple heuristic:

If a statement is at loop level 1 we can move it out of the loop,
regardless of the fact that it may invoke undefined behavior, because it is
then moved out of all loops, and thus it cannot become an induction variable
and cause problems with aggressive loop optimizations later.

Only statements with possible undefined behavior in nested loops can become
induction variable if lim moves them from one loop into an outer loop.

Therefore, with a great deal of luck, the test case will pass unmodified.
I tried it, and the patch passes bootstrap and causes zero regressions
with this heuristic.
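
In terms of the renamed predicate, that heuristic could look roughly like
this inside movement_possibility (the exact parameter of the renamed
function and the way the loop depth is obtained are assumptions here,
not taken from the attached patch):

  /* Only refuse to move potentially undefined statements when they sit
     in a nested loop; hoisting out of the outermost loop leaves them
     outside all loops, where they cannot become induction variables.  */
  if (loop_depth (gimple_bb (stmt)->loop_father) > 1
      && gimple_could_trap_or_invoke_undefined_p (stmt, true))
    return MOVE_IMPOSSIBLE;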

Ok for trunk now?

Thanks
Bernd.

> Thanks,
> Richard.
>
>
>
>> Regards
>> Bernd Edlinger 2013-08-30  Bernd Edlinger  

PR tree-optimization/58143
PR tree-optimization/58227
* gimple.c (gimple_could_trap_p_1): Rename to ...
(gimple_could_trap_or_invoke_undefined_p): ... this
and add new parameter include_undefined.
(gimple_could_trap_or_invoke_undefined_p): Adjust.
(gimple_assign_rhs_could_trap_p): Adjust.
* gimple.h (gimple_could_trap_p_1): Renamed.
* tree-if-conv.c (ifcvt_could_trap_p): Adjust.
* tree-ssa-loop-im.c (movement_possibility): Prevent
lim from moving conditional expressions if that could
trigger undefined behavior in nested loops.
(determine_invariantness_stmt): Determine if this
is a nested loop.


testsuite/

* gcc.dg/pr58143.c: New test.



patch-lim.diff
Description: Binary data


[patch, fortran, docs] Unformatted sequential and special files

2013-08-30 Thread Thomas Koenig
Hello world,

the attached patch documents the format for unformatted sequential
files and what is, and is not, supported with special files.

Tested with "make dvi" and "make info".  OK for trunk?

Thomas

2013-08-30  Thomas Koenig  

PR fortran/30162
* gfortran.texi:  Document unformatted sequential file format
and I/O with special files.
Index: gfortran.texi
===
--- gfortran.texi	(Revision 201996)
+++ gfortran.texi	(Arbeitskopie)
@@ -1121,6 +1121,8 @@
 * Internal representation of LOGICAL variables::
 * Thread-safety of the runtime library::
 * Data consistency and durability::
+* Unformatted sequential file format::
+* I/O with special files::
 @end menu
 
 
@@ -1291,7 +1293,75 @@
 releasing @code{fcntl} file locks, if the server supports them, will
 also force cache validation and flushing dirty data and metadata.
 
+@node Unformatted sequential file format
+@section Unformatted sequential file format
+@cindex unformatted sequential files
+@cindex record marker
+@cindex subrecord
 
+Unformatted sequential files are stored using record markers. Each
+full record consists of a leading record marker, the data written
+by the user program, and a trailing record marker.  The record markers
+are four-byte integers by default, and eight-byte integers if the
+@option{-fmax-subrecord-length=8} option is in effect. Each record
+marker contains the number of bytes of data in the record.
+
+The maximum number of bytes of user data in a record is 2147483639 for
+a four-byte record marker. If this is exceeded, a record is split into
+subrecords. Each subrecord also has a leading and a trailing record
+marker. If the leading record marker contains a negative number, the
+number of user data bytes in the subrecord equals the absolute value
+of this number, and another subrecord follows the current one.  If the
+trailing record marker contains a negative number, then the number of
+bytes of user data equals the absolute value of that number, and there
+is a preceding subrecord.
+
+The format for unformatted sequential data can be duplicated using
+unformatted stream, as shown in this example program:
+
+@smallexample
+program main
+  implicit none
+  integer :: i
+  real, dimension(10) :: a, b
+  call random_number(a)
+  open (10,file='test.dat',form='unformatted',access='stream')
+  inquire (iolength=i) a
+  write (10) i, a, i
+  close (10)
+  open (10,file='test.dat',form='unformatted')
+  read (10) b
+  if (all (a == b)) print *,'success!'
+end program main
+@end smallexample
+
+@node I/O with special files
+@section I/O with special files
+@cindex special files
+@cindex pipes
+@cindex FIFO
+@cindex terminal devices
+@cindex BACKSPACE
+
+Special files such as pipes, FIFOs or terminal devices are supported
+only for the following types of file access:
+
+@itemize
+
+@item Formatted sequential
+
+@item Formatted stream
+
+@item Unformatted stream
+
+@end itemize
+
+Unformatted sequential file access is not supported for
+special files.  If necessary, it can be simulated using
+unformatted stream, see @ref{Unformatted sequential file format}.
+
+@code{BACKSPACE} is not supported for special files.
+
 @c -
 @c Extensions
 @c -


Re: Fix PR middle-end/57370

2013-08-30 Thread Easwaran Raman
On Fri, Aug 30, 2013 at 9:26 AM, Jakub Jelinek  wrote:
> On Fri, Aug 30, 2013 at 09:13:34AM -0700, Easwaran Raman wrote:
>> >> There are more similar failures reported in
>> >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57393 and I have attached
>> >> the updated patch there. Shall I send that for review? Apart from the
>> >> debug statement issue, almost all the bugs are due to dependence
>> >> violation because certain newly inserted statements do not have the
>> >> right UID. Instead of trying to catch all of them, will it be better
>> >> if I check if the stmt has a proper uid (non-zero if it is not the
>> >> first stmt) and assign a sensible value at the point where it is used
>> >> (find_insert_point and appears_later_in_bb) instead of where the stmt
>> >> is created? I think that would be less brittle.
>> >
>> > In the end all this placement stuff should be reverted and done as
>> > post-processing after reassoc is finished.  You can remember the
>> > inserted stmts that are candidates for moving (just set a pass-local
>> > flag on them) and assign UIDs for the whole function after all stmts
>> > are inserted.
>>
>> The problem is we need sane UID values as reassoc is happening - not
>> after it is finished. As it process each stmt in reassoc_bb, it might
>> generate new stmts which might have to be compared in
>> find_insert_point or appears_later_in_bb.
>
> A new stmt will be created with UID 0 and in case there is stmt movement,
> you could zero the UID during movement.  Then, you can just special case
> UID zero when you are looking at UIDs.  As on trunk/4.8 gsi_for_stmt is
> O(1), you can easily walk a couple of previous and/or later stmts and look
> for the first non-zero UID around it, and treat it as if the UID was
> previous non-zero UID + 0.5 or next non-zero UID - 0.5.  And, if you see
> too many consecutive statements with UID 0 (more than some constant, 32?),
> just reassign UIDs to the whole bb.  Or you could initially assign UIDs
> with some gaps, then keep filling those gaps and once you fill them,
> renumber again with new gaps.
> Jakub

Yes, this is pretty much what I was proposing. The current
implementation doesn't rely on UIDs being unique - they only have to
be monotonically non-decreasing. So, when a UID of 0 is encountered,
go up till a non-zero UID is found and then go down and assign this
non-zero UID. This effectively implies that each UID-0 stmt is visited
at most twice. I don't think we need to check if I see more than
certain number of UID-0 stmts and redo the entire BB.

- Easwaran


Re: [PATCH] libitm: Add custom HTM fast path for RTM on x86_64.

2013-08-30 Thread Torvald Riegel
On Fri, 2013-08-30 at 16:49 +0200, Rainer Orth wrote:
> Torvald Riegel  writes:
> 
> > On Mon, 2013-08-26 at 09:49 -0700, Richard Henderson wrote:
> >> On 08/22/2013 02:57 PM, Torvald Riegel wrote:
> >> > On Thu, 2013-08-22 at 12:05 -0700, Richard Henderson wrote:
> >> >> On 08/22/2013 11:39 AM, Torvald Riegel wrote:
> >> >>> +  /* Store edi for future HTM fast path retries.  We use a stack 
> >> >>> slot
> >> >>> + lower than the jmpbuf so that the jmpbuf's rip field will 
> >> >>> overlap
> >> >>> + with the proper return address on the stack.  */
> >> >>> +  movl%edi, -64(%rsp)
> >> >>
> >> >> You havn't allocated the stack frame here, and you're storing
> >> >> outside the redzone.  This is invalid.
> >> >>
> >> >> Two possibilities:
> >> >>
> >> >>  (1) always allocate the stack frame on entry to
> >> >>  the function (adds two register additions to
> >> >>  the htm fast path -- in the noise i'd think)
> >> >>
> >> >>  (2) store the edi value in the non-htm path, with
> >> >>  the pr_HTMRetryableAbort bit or'd in.  (adds an
> >> >>  extra store to the non-htm path; probably noise).
> >> >>  You'd want to mask out that bit when you reload it.
> >> > 
> >> > Oops.  Picked fix (2).  Better now?
> >> > 
> >> 
> >> Move the andl of edi down into the HAVE_AS_RTM block, above the orl of
> >> HTMRetriedAfterAbort.  Ok with that change.
> >
> > Committed as r202101.
> >
> 
> The patch has broken Solaris/x86 bootstrap:

Sorry about that.  I'll try to remember to test on non-Linux / non-futex
systems too next time.  I committed the attached patch as r202116.
(Please note that I'll be offline for two weeks starting in about an
hour (yes, bad timing), so I hope I didn't miss anything else...)

Torvald
commit 29006efcecde6ec0366f6c88a662d30b9efb931f
Author: torvald 
Date:   Fri Aug 30 17:13:05 2013 +

libitm: Fix wrong initialization order introduced with r202101.

* config/posix/rwlock.cc: Fix initialization order.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@202116 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libitm/config/posix/rwlock.cc b/libitm/config/posix/rwlock.cc
index 488e9c2..61b6ad9 100644
--- a/libitm/config/posix/rwlock.cc
+++ b/libitm/config/posix/rwlock.cc
@@ -30,11 +30,11 @@ namespace GTM HIDDEN {
 // ??? Move this back to the header file when constexpr is implemented.
 
 gtm_rwlock::gtm_rwlock()
-  : mutex (PTHREAD_MUTEX_INITIALIZER),
+  : summary (0),
+mutex (PTHREAD_MUTEX_INITIALIZER),
 c_readers (PTHREAD_COND_INITIALIZER),
 c_writers (PTHREAD_COND_INITIALIZER),
 c_confirmed_writers (PTHREAD_COND_INITIALIZER),
-summary (0),
 a_readers (0),
 w_readers (0),
 w_writers (0)
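
The rule behind the fix is worth spelling out: non-static data members
are always initialized in declaration order, not in the order they
appear in the constructor's mem-initializer list, and GCC warns about
the mismatch with -Wreorder.  A minimal illustration (made-up type, not
libitm code):

  struct example
  {
    int summary;        // declared first, so initialized first
    int mutex;

    example ()
      : summary (0),    // keep the list in declaration order
        mutex (1)
    { }
  };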


Re: [PATCH 0/6] Convert gimple to a C++ class hierarchy

2013-08-30 Thread Jakub Jelinek
On Fri, Aug 30, 2013 at 11:28:34AM -0400, Diego Novillo wrote:
> > Well, it was a wrong decision then.  For some smaller types writing manual
> > marker might be a sensible thing, or for some extra complicated
> > constructs.  But here we're talking about the most simple struct hierarchy
> > imaginable.  Having to write manual markers for that one is absurd IMO.
> 
> I want to discourage extending gengtype more than necessary. Long
> term, I would like to see memory pools replacing GC. However, that is
> likely a long road so we should find an interim solution.

I have doubts that, still somewhat remember the obstack era and memory pools
would again bring all the issues back, but let's put that aside.

> I vaguely remember thinking about what would be needed to have
> gengtype deal with inheritance. It needed some pretty ugly
> annotations. This made gengtype even more magic.  That's very bad for
> maintenance.

Teaching the gengtype parser about
{struct,class} name : public {struct,class} someothername { ... }
as opposed to current
{struct,class} name { ... }
shouldn't be that hard.  And, if the complaint is that we'd need to write
whole C++ parser for it, then the response could be that we already have
one C++ parser (and, even have plugin support and/or emit dwarf etc.).
So, gengtype could very well use a g++ plugin to emit the stuff it needs,
or parse DWARF, etc.  And, we even could not require everybody actually
emitting those themselves, we could check some text form of that
(gengtype.state?) into the tree, so only people actually changing the
compiler would need to run the plugin.

> One thing I liked is a suggestion that went something along the lines
> of creating some base templates that could be used to facilitate
> writing the manual markers.

Even if you have some stuff that helps you writing those, still it will be a
big source of bugs (very hard to debug) and a maintainance nightmare.

Jakub


Re: Ubsan merged into trunk

2013-08-30 Thread Toon Moene

On 08/30/2013 06:14 PM, Marek Polacek wrote:


I've just merged ubsan into trunk.  Please send complaints my way.
Thanks,

Marek


Just watch the equivalent of this one:

http://gcc.gnu.org/ml/gcc-testresults/2013-08/msg02869.html

tomorrow morning (substitute "java" for "go" and "ubsan" for "asan").

Kind regards,

--
Toon Moene, Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


Re: [patch 4/4] -fstrict-volatile-bitfields cleanup v3: remove from defaults on all targets

2013-08-30 Thread DJ Delorie

In all the real-world cases I've seen, the vendor/programmer was
careful to lay out the structs properly.  I think if the struct does
not reflect a reasonable (or physically possible) layout, oh well ;-)
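
(For a concrete picture of "laid out properly": something along these
lines, with a made-up device register, where each bit-field's container
type matches the device's access width so -fstrict-volatile-bitfields
has a sensible width to honor.)

  struct hw_status
  {
    volatile unsigned short ready : 1;
    volatile unsigned short error : 1;
    volatile unsigned short code  : 14;  /* fills exactly one 16-bit access */
  };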


Re: [PATCH, c++ testsuite]: Improve g++.dg/abi/mangle33.C scan-asm pattern

2013-08-30 Thread Mike Stump
On Aug 30, 2013, at 2:02 AM, Uros Bizjak  wrote:
> I was looking at annoying
> 
> ERROR: g++.dg/abi/mangle33.C -std=c++98: error executing dg-final:
> couldn't compile regular expression pattern: out of memory

Wow…  They are amazingly clueless.  What part of the mandate, thou shalt not 
claim out of memory until the last byte of my 4 TB drive is allocated, did you 
not understand?  Could someone file a tcl bug and say they want a 100 MB regexp 
please, the 1980s are over.  I was going to say 10 MB, and realized that is 
what I would have said in 1994, then realized it isn't 1994 anymore either.

 [ pause ]

I filed:

  http://core.tcl.tk/tcl/tktview/267b7e2334ee2e9de34c4b00d6e72e2f1997085f

Thanks for finding the original tcl bug and tracking this down and letting us 
know it was a regression from 8.5 to 8.6.

> The patch was tested with Tcl 8.5 where it works without problems.
> 
> OK for mainline?

Ok.

> BTW: There are a couple of similar problems with Tcl 8.6:

> [1] http://sourceforge.net/p/tcl/bugs/5178/

:-(  closed.

I've amended the Tcl pointer with the bug pointer:

2013-08-30  Mike Stump  

* doc/install.texi (Prerequisites): Note regression in Tcl 8.6 with RE 
patterns.

Index: doc/install.texi
===
--- doc/install.texi(revision 202083)
+++ doc/install.texi(working copy)
@@ -426,7 +426,11 @@ Necessary when modifying @command{gperf}
 @itemx Expect
 @itemx Tcl
 
-Necessary to run the GCC testsuite; see the section on testing for details.
+Necessary to run the GCC testsuite; see the section on testing for
+details.  Tcl 8.6 has a known regression in RE pattern handling that
+makes parts of the testsuite fail.  See
+@uref{http://core.tcl.tk/tcl/tktview/267b7e2334ee2e9de34c4b00d6e72e2f1997085f}
+for more information.
 
 @item autogen version 5.5.4 (or later) and
 @itemx guile version 1.4.1 (or later)


I usually don't like enshrining bug reports into the documentation, but, for 
this bug, I'll make an exception.  Now, I'm hoping they will actually fix this 
in tcl, so that we don't have to do anything special in the test suite.  Wake 
me in a year and let's see if they fixed it.

Re: [PATCH 0/6] Convert gimple to a C++ class hierarchy

2013-08-30 Thread David Malcolm
On Fri, 2013-08-30 at 15:44 +0200, Michael Matz wrote:
> Hi,
> 
> On Thu, 29 Aug 2013, David Malcolm wrote:
> 
> > Successfully bootstrapped and tested on x86_64-unknown-linux-gnu: all
> > testcases show the same results as an unpatched build (relative to
> > r202029).
> 
> I'd like to see some statistics for cc1{,plus} codesize and for compile 
> time of something reasonably large (needing say 60 seconds to 
> compile normally), before/after patch series.

Here's the result of a pair of builds of r202029 without and with the
patches, configured with --enable-checking=release, running "make", then
stripping debuginfo [1]

# ll */build/gcc/cc1
-rwxr-xr-x. 1 root root 13230048 Aug 30 15:00 control/build/gcc/cc1
-rwxr-xr-x. 1 root root 13230144 Aug 30 15:00 experiment/build/gcc/cc1
(98 bytes difference)

# ll */build/gcc/cc1obj
-rwxr-xr-x. 1 root root 13426336 Aug 30 15:00 control/build/gcc/cc1obj
-rwxr-xr-x. 1 root root 13426432 Aug 30 15:00 experiment/build/gcc/cc1obj
(96 bytes diff)

# ll */build/gcc/cc1plus
-rwxr-xr-x. 1 root root 14328480 Aug 29 13:59 control/build/gcc/cc1plus
-rwxr-xr-x. 1 root root 14328608 Aug 29 13:59 experiment/build/gcc/cc1plus
(128 bytes diff)

# ll */build/gcc/f951
-rwxr-xr-x. 1 root root 13960728 Aug 30 15:05 control/build/gcc/f951
-rwxr-xr-x. 1 root root 13960856 Aug 30 15:05 experiment/build/gcc/f951
(128 bytes diff)

# ll */build/gcc/jc1
-rwxr-xr-x. 1 root root 12607704 Aug 30 15:17 control/build/gcc/jc1
-rwxr-xr-x. 1 root root 12607704 Aug 30 15:17 experiment/build/gcc/jc1
(the same size)

So the overall sizes of such binaries are essentially unchanged.

To dig a bit deeper, I extended my asmdiff tool [2] to compare sizes of
functions; I'm attaching the results of comparing cc1plus before/after.

Any suggestions on what to compile to compare performance?  By 60
seconds, do you mean 60s for one TU, or do you mean a large build e.g.
the linux kernel?

> And the manual GTY markers are so not maintainable in the long run, 
> gengtype or something else really needs to be taught to create them 
> automatically.

Apart from the GTY aspect, how do people feel about the patch series?
FWIW I have vague thoughts about doing something similar for tree -
doing so *might* give an easier route to the type vs expression
separation that Andrew spoke about at the Cauldron rearchitecture BoF.
(I started looking at doing a similar C++-ification of rtx, but...
gah)

Dave

[1] yes, I built as root; this was done on a throwaway provisioning of a
RHEL 6.4 x86_64 box.
[2] https://github.com/davidmalcolm/asmdiff
Old: test/control/build/gcc/cc1plus
New: test/experiment/build/gcc/cc1plus
  Function removed: Function('_start-0x1a684')
  Function removed: Function('gt_pch_p_18gimple_statement_d(void*, void*, void 
(*)(void*, void*), void*)')
  Function removed: Function('gt_ggc_mx_gimple_statement_d(void*)')
  Function removed: Function('gt_pch_nx_gimple_statement_d(void*)')
  Function removed: Function('vec::safe_push(constraint_expr const&)')
  Function added: Function('_start-0x1a6ac')
  Function added: Function('gt_ggc_mx(gimple_statement_base*)')
  Function added: Function('gt_pch_nx(gimple_statement_base*)')
  Function added: Function('gt_pch_nx(gimple_statement_base*, void (*)(void*, 
void*), void*)')
  Function added: Function('gt_pch_p_21gimple_statement_base(void*, void*, void 
(*)(void*, void*), void*)')
  Function added: Function('gt_ggc_mx_gimple_statement_base(void*)')
  Function added: Function('gt_pch_nx_gimple_statement_base(void*)')
  Function hash_table::find_slot_with_hash(void 
const*, unsigned int, insert_option) changed size from 5984 to 5872 bytes
  Function same_succ_def::equal(same_succ_def const*, same_succ_def const*) 
changed size from 480 to 512 bytes
  Function maybe_remove_unreachable_handlers() changed size from 112 to 128 
bytes
  Function verify_ssa_operands(gimple_statement_d*) changed size from 927 to 
960 bytes
(renamed to verify_ssa_operands(gimple_statement_base*))
  Function vect_analyze_data_ref_accesses(_loop_vec_info*, _bb_vec_info*) 
changed size from 3920 to 3872 bytes
  Function make_pass_ipa_pta(gcc::context*) changed size from 36880 to 16464 
bytes
  Function inline_merge_summary(cgraph_edge*) changed size from 17904 to 17920 
bytes
  Function release_defs_bitset(bitmap_head_def*) changed size from 2464 to 2304 
bytes
  Function gt_pch_nx(loop*&) changed size from 144 to 32 bytes
  Function make_pass_fold_builtins(gcc::context*) changed size from 5711 to 
5727 bytes
  Function dump_value_range(_IO_FILE*, value_range_d*) changed size from 32880 
to 32800 bytes
  Function extract_true_false_edges_from_block(basic_block_def*, edge_def**, 
edge_def**) changed size from 20208 to 20224 bytes
  Function tree_ssa_lim() changed size from 4847 to 4831 bytes
  Function gt_ggc_mx_control_flow_graph(void*) changed size from 96 to 192 bytes
  Function vectorize_loops() changed size from 3999 to 4191 bytes
  Function dump_node(tree_node const*, int, _IO_FILE*) ch

Re: [PATCH 6/6] Add manual GTY hooks

2013-08-30 Thread David Malcolm
On Fri, 2013-08-30 at 10:40 +0200, Richard Biener wrote:
> On Thu, Aug 29, 2013 at 6:20 PM, David Malcolm  wrote:
> > * gimple.c (gt_ggc_mx (gimple)): New, as required by GTY((user)).
> > (gt_pch_nx (gimple)): Likewise.
> > (gt_pch_nx (gimple, gt_pointer_operator, void *)): Likewise.
> > * gimple.h  (gt_ggc_mx (gimple)): Declare.
> > (gt_pch_nx (gimple)): Declare.
> > (gt_pch_nx (gimple, gt_pointer_operator, void *)): Declare.
> > * tree-cfg.c (ggc_mx (gimple&)): Remove declaration, as this
> > collides with the function that GTY((user)) expects.
> > (gt_ggc_mx (edge_def *)): Replace call to gt_ggc_mx on the
> > gimple with gt_ggc_mx_gimple_statement_base: in the
> > pre-GTY((user)) world, "gt_ggc_mx" was the function to be called
> > on a possibly NULL pointed to check if needed marking and if so
> > to traverse its fields.  In the GTY((user)) world, "gt_ggc_mx"
> > is the function to be called on non-NULL objects immediately *after*
> > they have been marked: it does not mark the object itself.
> > (gt_pch_nx (gimple&)): Remove declaration.
> > (gt_pch_nx (edge_def *)): Update as per the mx hook.
> 
> Btw, this shows that gimple isn't a true C++ hierarchy - because of GTY
> you can only ever use 'gimple' pointers, not more specialized ones
> like gimple_phi as you are missing the GTY machinery for them.

One can definitely use pointers to subclasses on the *stack*, indeed,
the patches use this throughout; IMHO it's an improvement.

FWIW I did a quick test and was also able to add dummy bss subclass ptrs
e.g.:

  static GTY(()) gimple_statement_phi *dummy_phi;

This leads to gengtype creating code like this in gtype-desc.c:

void
gt_ggc_mx_gimple_statement_phi (void *x_p)
{
  gimple_statement_phi * const x = (gimple_statement_phi *)x_p;
  if (ggc_test_and_set_mark (x))
{
  gt_ggc_mx (x);
}
}

Note how the "gt_ggc_mx (x);" is resolved by using the base-class
implementation of gt_ggc_mx i.e.: gt_ggc_mx (gimple )
since a gimple_statement_phi * is-a gimple_statement_base *.

Anything more complicated than that (e.g. containers) and we're hitting
up against the limitations of gengtype, I guess.

> I'm not 100% convinced that we should do all this at this point without
> getting a better hand on the gengtype issues (it's partial transition to
> the C++ world of GCC)

[...]



Re: [Patch] Rewrite regex matchers

2013-08-30 Thread Jan-Benedict Glaw
On Fri, 2013-08-30 15:36:03 +0200, Jakub Jelinek  wrote:
> On Fri, Aug 30, 2013 at 03:26:59PM +0200, Paolo Carlini wrote:
> > On 08/30/2013 02:05 PM, Tim Shen wrote:
> > >+  const _TraitsT&_M_traits;
> > >+  _FlagT _M_flags;
> > >+  bool   _M_is_non_matching;
> > >+  std::set<_CharT>   _M_char_set;
> > >+  std::set> _M_range_set;
> > >+  _CharClassT_M_class_set;
> > another, very general comment: now that nothing is decided ABI-wise
> > for these features, let's pay attention to the layouts, let's make
> > sure that the data members are ordered in the best way to minimize
> > the size. For example, when I see a bool sandwiched between big
> > objects something seems at least weird... Let's make measurements
> > and optimize for 64-bit but let's double check on 32-bit too.
> 
> You could use pahole utility from dwarves for this, though not sure how far
> it is with support for DWARF4 and extensions recent gcc emit, so perhaps you
> need -gdwarf-3 -gstrict-dwarf or similar.

But please keep in mind that size is not everything. Cachelines count,
too! If you save four or eight bytes off of a stack frame (or struct or
whatever), but sacrifice a cacheline or two to access badly laid out
variables in e.g. a long-running loop, you have actually lost performance.
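
A tiny illustration of the size half of that trade-off (made-up structs,
typical LP64 layout assumed; pahole reports exactly this kind of hole):

  struct padded                 /* 32 bytes: two 7-byte holes */
  {
    void *a;
    bool  f1;                   /* 1 byte + 7 bytes padding */
    void *b;
    bool  f2;                   /* 1 byte + 7 bytes tail padding */
  };

  struct repacked               /* 24 bytes: the bools share one word */
  {
    void *a;
    void *b;
    bool  f1;
    bool  f2;                   /* 2 bytes + 6 bytes tail padding */
  };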

MfG, JBG

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
Signature of:  GDB has a 'break' feature; why doesn't it have 'fix' too?
the second  :


signature.asc
Description: Digital signature


Re: [PATCH] Convert more passes to new dump framework

2013-08-30 Thread Teresa Johnson
On Fri, Aug 30, 2013 at 9:27 AM, Xinliang David Li  wrote:
> Except that in this form, the dump will be extremely large and not
> suitable for very large applications.

Yes. I did some measurements for both a fairly large source file that
is heavily optimized with LIPO and for a simple toy example that has
some inlining. For the large source file, the output from
-fdump-ipa-inline=stderr was almost 100x the line count of the
-fopt-info output. For the toy source file it was 43x. The size of the
-details output was 250x and 100x, respectively. Which is untenable
for a large app.

The issue I am having here is that I want a more verbose message, not
a more voluminous set of messages. Using either -fopt-info-all or
-fdump-ipa-inline to provoke the more verbose inline message will give
me a much greater volume of output.

One compromise could be to emit the more verbose inliner message under
a param (and a more concise "foo inlined into bar" by default with
-fopt-info). Or we could do some variant of what David talks about
below.

> Besides, we might also want to
> use the same machinery (dump_printf_loc etc) for dump file dumping.
> The current behavior of using '-details' to turn on opt-info-all
> messages for dump files are not desirable.

Interestingly, this doesn't even work. When I do
-fdump-ipa-inline-details=stderr (with my patch containing the inliner
messages) I am not getting those inliner messages emitted to stderr.
Even though in dumpfile.c "details" is set to (TDF_DETAILS |
MSG_OPTIMIZED_LOCATIONS | MSG_MISSED_OPTIMIZATION | MSG_NOTE). I'm not
sure why, but will need to debug this.

> How about the following:
>
> 1) add a new dump_kind modifier so that when that modifier is
> specified, the messages won't goto the alt_dumpfile (controlled by
> -fopt-info), but only to primary dump file. With this, the inline
> messages can be dumped via:
>
>dump_printf_loc (OPT_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY, .)

(you mean (MSG_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY) )

Typically OR-ing together flags like this indicates dump under any of
those conditions. But we could implement special handling for
OPT_DUMP_FILE_ONLY, which in the above case would mean dump only to
the primary dump file, and only under the other conditions specified
in the flag (here under "-optimized")

>
>
> 2) add more flags in -fdump- support:
>
>-fdump-ipa-inline-opt   --> turn on opt-info messages only
>-fdump-ipa-inline-optall --> turn on opt-info-all messages

According to the documentation (see the -fdump-tree- documentation on
http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#Debugging-Options),
the above are already supposed to be there (-optimized, -missed, -note
and -optall). However, specifying any of these gives a warning like:
   cc1: warning: ignoring unknown option ‘optimized’ in
‘-fdump-ipa-inline’ [enabled by default]
Probably because none is listed in the dump_options[] array in dumpfile.c.

However, I don't think there is currently a way to use -fdump- options
and *only* get one of these, as much of the current dump output is
emitted whenever there is a dump_file defined. Until everything is
migrated to the new framework it may be difficult to get this to work.

>-fdump-tree-pre-ir --> turn on GIMPLE dump only
>-fdump-tree-pre-details --> turn on everything (ir, optall, trace)
>
> With this, developers can really just use
>
>
> -fdump-ipa-inline-opt=stderr for inline messages.

Yes, if we can figure out a good way to get this to work (i.e. only
emit the optimized messages and not the rest of the dump messages).
And unfortunately to get them all you need to specify
"-fdump-ipa-all-optimized -fdump-tree-all-optimized
-fdump-rtl-all-optimized" instead of just -fopt-info. Unless we can
add -fdump-all-all-optimized.

Teresa

>
> thanks,
>
> David
>
> On Fri, Aug 30, 2013 at 1:30 AM, Richard Biener
>  wrote:
>> On Thu, Aug 29, 2013 at 5:15 PM, Teresa Johnson  wrote:
>>> On Thu, Aug 29, 2013 at 3:04 AM, Richard Biener
>>>  wrote:
>>> New patch below that removes this global variable, and also outputs
>>> the node->symbol.order (in square brackets after the function name so
>>> as to not clutter it). Inline messages with profile data look like:
>>>
>>> test.c:8:3: note: foobar [0] (9000) inlined into foo [2] (1000)
>>> with call count 9000 (via inline instance bar [3] (9000))
>>
>> Ick.  This looks both redundant and cluttered.  This is supposed to be
>> understandable by GCC users, not only GCC developers.
>
> The main part that is only useful/understandable to gcc developers is
> the node->symbol.order in square brackes, requested by Martin. One
> possibility is that I could put that part under a param, disabled by
> default. We have something similar on the google branches that emits
> LIPO module info in the message, enabled via a param.

 But we have _dump files_ for that.  That's the developer-consumed
>>

Clean up pretty printers [19/n]

2013-08-30 Thread Gabriel Dos Reis

Most of declaration pretty printing functions.
Tested on an x86_64-suse-linux.  Applied to trunk.

-- Gaby

2013-08-30  Gabriel Dos Reis  
c-family/
* c-pretty-print.h (c_pretty_printer::declaration): Now a virtual
member function.
(c_pretty_printer::declaration_specifiers): Likewise.
(c_pretty_printer::declarator): Likewise.
(c_pretty_printer::abstract_declarator): Likewise.
(c_pretty_printer::direct_abstract_declarator): Likewise.
(c_pretty_printer::direct_declarator): Likewise.
(c_pretty_printer::function_specifier): Likewise.
(pp_declaration): Adjust.
(pp_declaration_specifiers): Likewise.
(pp_abstract_declarator): Likewise.
(pp_direct_declarator): Likewise.
(pp_function_specifier): Likewise.
(pp_direct_abstract_declarator): Remove as unused.
(pp_c_declaration): Remove.
(pp_c_declaration_specifiers): Likewise.
(pp_c_declarator): Likewise.
(pp_c_direct_declarator): Likewise.
(pp_c_function_specifier): Likewise.
(pp_c_direct_abstract_declarator): Likewise.
* c-pretty-print.c (c_pretty_printer::abstract_declarator): Rename
from pp_c_abstract_declarator.  Adjust.
(c_pretty_printer::direct_abstract_declarator):  Rename from
pp_c_direct_abstract_declarator.  Adjust.
(c_pretty_printer::function_specifier): Rename from
pp_c_function_specifier.  Adjust.
(c_pretty_printer::declaration_specifiers): Rename from
pp_c_declaration_specifiers.  Adjust.
(c_pretty_printer::direct_declarator): Rename from
pp_c_direct_declarator.  Adjust.
(c_pretty_printer::declarator): Rename from pp_c_declarator.  Adjust.
(c_pretty_printer::declaration): Rename from pp_c_declaration.  Adjust.
(c_pretty_printer::c_pretty_printer): Do not assign to
declaration, declaration_specifiers, declarator,
direct_declarator, direct_abstract_declarator, function_specifier.

cp/
* cxx-pretty-print.h (cxx_pretty_printer::declaration): Declare as
overrider.
(cxx_pretty_printer::declaration_specifiers): Likewise.
(cxx_pretty_printer::function_specifier): Likewise.
(cxx_pretty_printer::declarator): Likewise.
(cxx_pretty_printer::direct_declarator): Likewise.
(cxx_pretty_printer::abstract_declarator): Likewise.
(cxx_pretty_printer::direct_abstract_declarator): Likewise.
(pp_cxx_declaration): Remove.
* cxx-pretty-print.c (cxx_pretty_printer::function_specifier):
Rename from pp_cxx_function_specifier.  Adjust.
(cxx_pretty_printer::declaration_specifiers): Rename from
pp_cxx_decl_specifier_seq.  Adjust.
(cxx_pretty_printer::direct_declarator): Rename from
pp_cxx_direct_declarator.  Adjust.
(cxx_pretty_printer::declarator): Rename from pp_cxx_declarator.
Adjust.
(cxx_pretty_printer::abstract_declarator): Rename from
pp_cxx_abstract_declarator.  Adjust.
(cxx_pretty_printer::direct_abstract_declarator): Rename from
pp_cxx_direct_abstract_declarator.  Adjust.
(cxx_pretty_printer::declaration): Rename from
pp_cxx_declaration.  Adjust.
(cxx_pretty_printer::cxx_pretty_printer): Do not assign to
declaration, declaration_specifiers, function_specifier,
declarator, direct_declarator, abstract_declarator,
direct_abstract_declarator.
* error.c (dump_decl): Adjust.

Index: gcc/c-family/c-pretty-print.c
===
--- gcc/c-family/c-pretty-print.c   (revision 202108)
+++ gcc/c-family/c-pretty-print.c   (working copy)
@@ -431,7 +431,7 @@
   function declarations, this routine prints not just the
   specifier-qualifier-list of such entities or types of such entities,
   but also the 'pointer' production part of their declarators.  The
-  remaining part is done by pp_declarator or pp_c_abstract_declarator.  */
+  remaining part is done by pp_declarator or pp_abstract_declarator.  */
 
 void
 pp_c_specifier_qualifier_list (c_pretty_printer *pp, tree t)
@@ -533,18 +533,18 @@
   pointer
   pointer(opt) direct-abstract-declarator  */
 
-static void
-pp_c_abstract_declarator (c_pretty_printer *pp, tree t)
+void
+c_pretty_printer::abstract_declarator (tree t)
 {
   if (TREE_CODE (t) == POINTER_TYPE)
 {
   if (TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE
  || TREE_CODE (TREE_TYPE (t)) == FUNCTION_TYPE)
-   pp_c_right_paren (pp);
+   pp_c_right_paren (this);
   t = TREE_TYPE (t);
 }
 
-  pp_direct_abstract_declarator (pp, t);
+  direct_abstract_declarator (t);
 }
 
 /* direct-abstract-declarator:
@@ -554,34 +554,34 @@
   direct-abstract-declarator(opt) ( parameter-type-list(opt) )  */
 
 void
-pp_c_direct_abstract_declarator (c_pretty_printer *pp, tree t)
+c_pretty_printer::direct_abst

Re: PR 58191 patch

2013-08-30 Thread François Dumont

Hi

I finally generalized this method to other debug functions; it is 
more consistent and cleans up the implementation of the debug checks. For 
the 4.8 branch I will limit it to just what really needs to be fixed.


2013-08-30  François Dumont  

PR libstdc++/58191
* include/debug/macros.h (__glibcxx_check_partitioned_lower): Add
__gnu_debug::__base calls on iterators passed to internal debug
check.
(__glibcxx_check_partitioned_lower_pred): Likewise.
(__glibcxx_check_partitioned_upper): Likewise.
(__glibcxx_check_partitioned_upper_pred): Likewise.
(__glibcxx_check_sorted): Likewise.
(__glibcxx_check_sorted_pred): Likewise.
(__glibcxx_check_sorted_set): Likewise.
(__glibcxx_check_sorted_set_pred): Likewise.
* include/debug/functions.h (__check_partitioned_lower):
Remove code to detect safe iterators.
(__check_partitioned_upper): Likewise.
(__check_sorted): Likewise.

François


On 08/27/2013 11:08 PM, Paolo Carlini wrote:

On 08/27/2013 10:57 PM, François Dumont wrote:

Hi

Here is a patch to fix the small PR 58191 regression. I don't 
remember why I hadn't used the __gnu_debug::__base from the start 
rather than trying to reproduce its behavior within the 
__check_partitioned* methods. This way we only detect random access 
safe iterator to enhance performance but do not check the iterator 
category otherwise, concept checks are there for that reason.


The patch is generated from the 4.8 branch. I still need to reg 
test it, but if it succeeds is it ok to commit? Just to trunk or 
also to the 4.8 branch?
Thanks. Let's play safe, let's apply it to mainline and let's ask 
people on the audit trail of the bug too to test it. If everything 
goes well let's backport it to the branch in a few weeks.


Thanks again,
Paolo.



Index: include/debug/functions.h
===
--- include/debug/functions.h	(revision 201966)
+++ include/debug/functions.h	(working copy)
@@ -336,15 +336,6 @@
   return true;
 }
 
-  // For performance reason, as the iterator range has been validated, check on
-  // random access safe iterators is done using the base iterator.
-  template
-inline bool
-__check_sorted_aux(const _Safe_iterator<_Iterator, _Sequence>& __first,
-		   const _Safe_iterator<_Iterator, _Sequence>& __last,
-		   std::random_access_iterator_tag __tag)
-  { return __check_sorted_aux(__first.base(), __last.base(), __tag); }
-
   // Can't check if an input iterator sequence is sorted, because we can't step
   // through the sequence.
   template
@@ -371,17 +362,6 @@
   return true;
 }
 
-  // For performance reason, as the iterator range has been validated, check on
-  // random access safe iterators is done using the base iterator.
-  template
-inline bool
-__check_sorted_aux(const _Safe_iterator<_Iterator, _Sequence>& __first,
-		   const _Safe_iterator<_Iterator, _Sequence>& __last,
-		   _Predicate __pred,
-		   std::random_access_iterator_tag __tag)
-  { return __check_sorted_aux(__first.base(), __last.base(), __pred, __tag); }
-
   // Determine if a sequence is sorted.
   template
 inline bool
@@ -470,11 +450,13 @@
   return __check_sorted_set_aux(__first, __last, __pred, _SameType());
}
 
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 270. Binary search requirements overly strict
+  // Determine if a sequence is partitioned w.r.t. this element.
   template
 inline bool
-  __check_partitioned_lower_aux(_ForwardIterator __first,
-_ForwardIterator __last, const _Tp& __value,
-std::forward_iterator_tag)
+__check_partitioned_lower(_ForwardIterator __first,
+			  _ForwardIterator __last, const _Tp& __value)
 {
   while (__first != __last && *__first < __value)
 	++__first;
@@ -487,38 +469,11 @@
   return __first == __last;
 }
 
-  // For performance reason, as the iterator range has been validated, check on
-  // random access safe iterators is done using the base iterator.
-  template
-inline bool
-__check_partitioned_lower_aux(
-			const _Safe_iterator<_Iterator, _Sequence>& __first,
-			const _Safe_iterator<_Iterator, _Sequence>& __last,
-			const _Tp& __value,
-			std::random_access_iterator_tag __tag)
-{
-  return __check_partitioned_lower_aux(__first.base(), __last.base(),
-	   __value, __tag);
-}
-
-  // _GLIBCXX_RESOLVE_LIB_DEFECTS
-  // 270. Binary search requirements overly strict
-  // Determine if a sequence is partitioned w.r.t. this element.
   template
 inline bool
-__check_partitioned_lower(_ForwardIterator __first,
+__check_partitioned_upper(_ForwardIterator __first,
 			  _ForwardIterator __last, const _Tp& __value)
 {
-  return __check_partitioned_lower_aux(__first, __last, __value,
-	   std::__iterator_category(__first));
-}
-
-  template
-inline bool
-__check_partitioned_upper_aux(_ForwardIterator __first,
-  _ForwardIter

Re: [PATCH] Convert more passes to new dump framework

2013-08-30 Thread Xinliang David Li
On Fri, Aug 30, 2013 at 12:51 PM, Teresa Johnson  wrote:
> On Fri, Aug 30, 2013 at 9:27 AM, Xinliang David Li  wrote:
>> Except that in this form, the dump will be extremely large and not
>> suitable for very large applications.
>
> Yes. I did some measurements for both a fairly large source file that
> is heavily optimized with LIPO and for a simple toy example that has
> some inlining. For the large source file, the output from
> -fdump-ipa-inline=stderr was almost 100x the line count of the
> -fopt-info output. For the toy source file it was 43x. The size of the
> -details output was 250x and 100x, respectively. Which is untenable
> for a large app.
>
> The issue I am having here is that I want a more verbose message, not
> a more voluminous set of messages. Using either -fopt-info-all or
> -fdump-ipa-inline to provoke the more verbose inline message will give
> me a much greater volume of output.
>
> One compromise could be to emit the more verbose inliner message under
> a param (and a more concise "foo inlined into bar" by default with
> -fopt-info). Or we could do some variant of what David talks about
> below.

something like --param=verbose-opt-info=1
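
Wired up, that could look like the following (the parameter name, its
default, the message text and the *_name/*_count variables are all made
up here; DEFPARAM and PARAM_VALUE are the existing param machinery):

  /* params.def */
  DEFPARAM (PARAM_VERBOSE_OPT_INFO,
            "verbose-opt-info",
            "Emit the more verbose form of -fopt-info messages.",
            0, 0, 1)

  /* at the inliner's -fopt-info emission site */
  if (PARAM_VALUE (PARAM_VERBOSE_OPT_INFO))
    dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, locus,
                     "%s (%i) inlined into %s (%i) with call count %i\n",
                     callee_name, (int) callee_count,
                     caller_name, (int) caller_count, (int) edge_count);
  else
    dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, locus,
                     "%s inlined into %s\n", callee_name, caller_name);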


>
>> Besides, we might also want to
>> use the same machinery (dump_printf_loc etc) for dump file dumping.
>> The current behavior of using '-details' to turn on opt-info-all
>> messages for dump files are not desirable.
>
> Interestingly, this doesn't even work. When I do
> -fdump-ipa-inline-details=stderr (with my patch containing the inliner
> messages) I am not getting those inliner messages emitted to stderr.
> Even though in dumpfile.c "details" is set to (TDF_DETAILS |
> MSG_OPTIMIZED_LOCATIONS | MSG_MISSED_OPTIMIZATION | MSG_NOTE). I'm not
> sure why, but will need to debug this.

It works for vectorizer pass.

>
>> How about the following:
>>
>> 1) add a new dump_kind modifier so that when that modifier is
>> specified, the messages won't goto the alt_dumpfile (controlled by
>> -fopt-info), but only to primary dump file. With this, the inline
>> messages can be dumped via:
>>
>>dump_printf_loc (OPT_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY, .)
>
> (you mean (MSG_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY) )
>

Yes.

> Typically OR-ing together flags like this indicates dump under any of
> those conditions. But we could implement special handling for
> OPT_DUMP_FILE_ONLY, which in the above case would mean dump only to
> the primary dump file, and only under the other conditions specified
> in the flag (here under "-optimized")
>
>>
>>
>> 2) add more flags in -fdump- support:
>>
>>-fdump-ipa-inline-opt   --> turn on opt-info messages only
>>-fdump-ipa-inline-optall --> turn on opt-info-all messages
>
> According to the documentation (see the -fdump-tree- documentation on
> http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#Debugging-Options),
> the above are already supposed to be there (-optimized, -missed, -note
> and -optall). However, specifying any of these gives a warning like:
>cc1: warning: ignoring unknown option ‘optimized’ in
> ‘-fdump-ipa-inline’ [enabled by default]
> Probably because none is listed in the dump_options[] array in dumpfile.c.
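
(For reference, accepting those modifiers would mean listing them in that
table -- a sketch only, reusing the existing MSG_* flags from dumpfile.h and
assuming the current dump_options[] entry layout:

  {"optimized", MSG_OPTIMIZED_LOCATIONS},
  {"missed", MSG_MISSED_OPTIMIZATION},
  {"note", MSG_NOTE},
  {"optall", MSG_OPTIMIZED_LOCATIONS | MSG_MISSED_OPTIMIZATION | MSG_NOTE},

which would at least make the option parser stop warning about them.)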
>
> However, I don't think there is currently a way to use -fdump- options
> and *only* get one of these, as much of the current dump output is
> emitted whenever there is a dump_file defined. Until everything is
> migrated to the new framework it may be difficult to get this to work.
>
>>-fdump-tree-pre-ir --> turn on GIMPLE dump only
>>-fdump-tree-pre-details --> turn on everything (ir, optall, trace)
>>
>> With this, developers can really just use
>>
>>
>> -fdump-ipa-inline-opt=stderr for inline messages.
>
> Yes, if we can figure out a good way to get this to work (i.e. only
> emit the optimized messages and not the rest of the dump messages).
> And unfortunately to get them all you need to specify
> "-fdump-ipa-all-optimized -fdump-tree-all-optimized
> -fdump-rtl-all-optimized" instead of just -fopt-info. Unless we can
> add -fdump-all-all-optimized.

Having general support requires cleaning up all the old-style if
(dump_file) fprintf (dump_file, ...) instances to be:

  if (dump_enabled_p ())
dump_printf (dump_kind );
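
For instance (illustrative only -- the message text and the "name" and
"locus" variables are made up for the example; each pass keeps its own
wording and location):

  /* Old style: writes only to the pass's primary dump file.  */
  if (dump_file && (dump_flags & TDF_DETAILS))
    fprintf (dump_file, "considering inlining %s\n", name);

  /* New style: also routed to the -fopt-info stream when enabled.  */
  if (dump_enabled_p ())
    dump_printf_loc (MSG_NOTE, locus, "considering inlining %s\n", name);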


However, it might be easier to do this filtering for the IR dump only (in
execute_function_dump) -- do not dump the IR if any of the MSG_ flags is
specified, unless an IR flag (a new flag) is also specified.

David


>
> Teresa
>
>>
>> thanks,
>>
>> David
>>
>> On Fri, Aug 30, 2013 at 1:30 AM, Richard Biener
>>  wrote:
>>> On Thu, Aug 29, 2013 at 5:15 PM, Teresa Johnson  
>>> wrote:
 On Thu, Aug 29, 2013 at 3:04 AM, Richard Biener
  wrote:
 New patch below that removes this global variable, and also outputs
 the node->symbol.order (in square brackets after the function name so
 as to not clutter it). Inline messages with profile data look look:

 test.c:8:3: note: foobar [0] (9000) inlined int

Re: Ubsan merged into trunk

2013-08-30 Thread Dominique Dhumieres
> I've just merged ubsan into trunk.  Please send complaints my way.

Bootstrap is broken on x86_64-apple-darwin10:

Making all in ubsan
make[4]: *** No rule to make target `../interception/libinterception.la', 
needed by `libubsan.la'.  Stop.
make[3]: *** [all-recursive] Error 1
make[2]: *** [all-stage1-target-libsanitizer] Error 2
make[1]: *** [stage1-bubble] Error 2
make: *** [all] Error 2

TIA

Dominique


Re: [PATCH] libitm: Add custom HTM fast path for RTM on x86_64.

2013-08-30 Thread Gerald Pfeifer
On Fri, 30 Aug 2013, Dominique Dhumieres wrote:
>> Committed as r202101.
> This causes a bootstrap failure:
> on x86_64-apple-darwin10.

And FreeBSD.  Your fix also restored bootstrap there.  Thanks 
for the quick reaction, and to Dominique and Rainer for their
reports.

Gerald


Re: Merge profiles of duplicated COMDAT functions

2013-08-30 Thread Jan Hubicka
> Jan Hubicka  writes:
> >
> > Now for longer term, we want to make function CFGs independent of gimple 
> > body
> > and we want to decide on instrumentation at linktime, so we won't require 
> > user
> > to rebuild with -fprofile-use, just relink.  I plan to work on this, but 
> > not for 4.9. 
> 
> It may be already obsolete with AutoFDO? 
> 
> Longer term I hope more and more of the explicit instrumentation 
> functionality can be implemented using the hardware PMU.
> 
> For example (auto fdo doesn't do this today) it would be quite
> possible to do x86 value profiling without explicit instrumentation
> too by looking at the PEBS records, and likely similar functionality
> elsewhere.

Well, AutoFDO is a great feature, but I do not think it is going to replace FDO
really soon.  FDO has advantages in letting you measure things exactly and
reproducibly w/o hardware dependencies.

But yep, with debug info based AutoFDO we may be able to unify comdat function
profiles w/o the issues we currently see with FDO.
I am curious how both features will develop...

Honza
> 
> -Andi
> 
> -- 
> a...@linux.intel.com -- Speaking for myself only


Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-08-30 Thread Teresa Johnson
On Fri, Aug 30, 2013 at 8:26 AM, Jan Hubicka  wrote:
>> >
>> > The frequency condition needs to be done only when you walk predecessors -
>> > when you walk down, the edge probabilities are just fine.
>>
>> True. For simplicity I think it should be fine to leave as-is so there
>> isn't more special casing as the current approach works in both
>> directions.
>
> Yep, you are right. Frequencies are safe both directions.
>
> I think this change belongs to the profile feedback category, so the patch is OK.

ok, thanks.

Teresa

>
> Honza



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

