[PATCH] Make gimple_phi_set_result adjust SSA_NAME_DEF_STMT
It's the only LHS setter that does not do so, and that results in the asymmetry
of create_phi_node needing an SSA_NAME_DEF_STMT adjustment if you fed it an
SSA name result, as opposed to a VAR_DECL result, in which case it is already
adjusted.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-08-07  Richard Guenther

	* gimple.h (gimple_phi_set_result): Adjust SSA_NAME_DEF_STMT.
	* tree-phinodes.c (make_phi_node): Allow a NULL var.
	* tree-into-ssa.c (insert_phi_nodes_for): Simplify.
	* tree-complex.c (update_phi_components): Likewise.
	* tree-ssa-loop-manip.c (create_iv): Likewise.
	(add_exit_phis_edge): Likewise.
	(split_loop_exit_edge): Likewise.
	(tree_transform_and_unroll_loop): Likewise.
	* value-prof.c (gimple_ic): Likewise.
	(gimple_stringop_fixed_value): Likewise.
	* tree-tailcall.c (tree_optimize_tail_calls_1): Likewise.
	* omp-low.c (expand_parallel_call): Likewise.
	(expand_omp_for_static_chunk): Likewise.
	(expand_omp_atomic_pipeline): Likewise.
	* tree-parloops.c (create_phi_for_local_result): Likewise.
	(transform_to_exit_first_loop): Likewise.
	* tree-vect-data-refs.c (vect_setup_realignment): Likewise.
	* graphite-scop-detection.c (canonicalize_loop_closed_ssa): Likewise.
	* tree-predcom.c (initialize_root_vars): Likewise.
	(initialize_root_vars_lm): Likewise.
	* sese.c (sese_add_exit_phis_edge): Likewise.
	* gimple-streamer-in.c (input_phi): Likewise.
	* tree-inline.c (copy_phis_for_bb): Likewise.
	* tree-ssa-phiprop.c (phiprop_insert_phi): Likewise.
	* tree-cfg.c (gimple_make_forwarder_block): Likewise.
	(gimple_duplicate_bb): Likewise.
Index: gcc/tree-into-ssa.c
===================================================================
--- gcc/tree-into-ssa.c (revision 190197)
+++ gcc/tree-into-ssa.c (working copy)
@@ -1039,10 +1039,8 @@ insert_phi_nodes_for (tree var, bitmap p
 	  tree new_lhs;
 	  gcc_assert (update_p);
-	  phi = create_phi_node (var, bb);
-
-	  new_lhs = duplicate_ssa_name (var, phi);
-	  gimple_phi_set_result (phi, new_lhs);
+	  new_lhs = duplicate_ssa_name (var, NULL);
+	  phi = create_phi_node (new_lhs, bb);
 	  add_new_name_mapping (new_lhs, var);

 	  /* Add VAR to every argument slot of PHI.  We need VAR in
Index: gcc/tree-complex.c
===================================================================
--- gcc/tree-complex.c (revision 190197)
+++ gcc/tree-complex.c (working copy)
@@ -720,17 +720,11 @@ update_phi_components (basic_block bb)
 	  lr = get_component_ssa_name (gimple_phi_result (phi), false);
 	  if (TREE_CODE (lr) == SSA_NAME)
-	    {
-	      pr = create_phi_node (lr, bb);
-	      SSA_NAME_DEF_STMT (lr) = pr;
-	    }
+	    pr = create_phi_node (lr, bb);

 	  li = get_component_ssa_name (gimple_phi_result (phi), true);
 	  if (TREE_CODE (li) == SSA_NAME)
-	    {
-	      pi = create_phi_node (li, bb);
-	      SSA_NAME_DEF_STMT (li) = pi;
-	    }
+	    pi = create_phi_node (li, bb);

 	  for (i = 0, n = gimple_phi_num_args (phi); i < n; ++i)
 	    {
Index: gcc/tree-ssa-loop-manip.c
===================================================================
--- gcc/tree-ssa-loop-manip.c (revision 190197)
+++ gcc/tree-ssa-loop-manip.c (working copy)
@@ -116,7 +116,6 @@ create_iv (tree base, tree step, tree va
       gsi_insert_seq_on_edge_immediate (pe, stmts);

   stmt = create_phi_node (vb, loop->header);
-  SSA_NAME_DEF_STMT (vb) = stmt;
   add_phi_arg (stmt, initial, loop_preheader_edge (loop), UNKNOWN_LOCATION);
   add_phi_arg (stmt, va, loop_latch_edge (loop), UNKNOWN_LOCATION);
 }
@@ -144,9 +143,8 @@ add_exit_phis_edge (basic_block exit, tr
   if (!e)
     return;

-  phi = create_phi_node (use, exit);
-  create_new_def_for (gimple_phi_result (phi), phi,
-		      gimple_phi_result_ptr (phi));
+  phi = create_phi_node (NULL_TREE, exit);
+  create_new_def_for (use, phi, gimple_phi_result_ptr (phi));
   FOR_EACH_EDGE (e, ei, exit->preds)
     add_phi_arg (phi, use, e, UNKNOWN_LOCATION);
 }
@@ -499,7 +497,6 @@ split_loop_exit_edge (edge exit)
	 of the SSA name out of the loop.  */
       new_name = duplicate_ssa_name (name, NULL);
       new_phi = create_phi_node (new_name, bb);
-      SSA_NAME_DEF_STMT (new_name) = new_phi;
       add_phi_arg (new_phi, name, exit, locus);
       SET_USE (op_p, new_name);
     }
@@ -1012,7 +1009,6 @@ tree_transform_and_unroll_loop (struct l

       new_init = make_ssa_name (var, NULL);
       phi_rest = create_phi_node (new_init, rest);
-      SSA_NAME_DEF_STMT (new_init) = phi_rest;

       add_phi_arg (phi_rest, init, precond_edge, UNKNOWN_LOCATION);
       add_phi_arg (phi_rest, next, new_exit, UNKNOWN_LOCATION);
Index: gcc/value-prof.c
Re: [patch] Use a smaller, dynamic worklist in compute_global_livein
On Tue, Aug 7, 2012 at 11:58 AM, Steven Bosscher wrote:
> On Tue, Aug 7, 2012 at 11:52 AM, Richard Guenther wrote:
>> So I wonder why simply looping over all SSA defs in a loop body and checking
>> whether a use is outside of it is not enough to compute this information ...
>> (yes, we might end up creating too many loop closed PHIs, namely on all
>> exits rather than just on those that the name is live over, but ... at
>> least maybe we can prune it with DOM info)
>
> I've been thinking the same thing, but I don't understand loop-closed
> SSA form well enough to try and rewrite it along those lines.

I've put it on my TODO to look into.

Richard.

> Ciao!
> Steven
[PATCH] Remove clobbers as we go into SSA
This makes sure clobbers of SSA names are never retained in the IL.  First by
verifying that in the gimple verifier, second by removing them when we go
into SSA form.  Third by catching those we have to get rid of when
un-nesting - this avoids stuff like

  finally { D.1234 = CLOBBER; frame.i = D.1234; }

and D.1234 being a candidate for a register.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-08-07  Richard Guenther

	* tree-into-ssa.c (rewrite_stmt): Remove clobbers for variables
	we rewrite into SSA form.
	(rewrite_enter_block): Adjust.
	* gimple-iterator.c (gsi_replace): Also allow replacement with
	a stmt without a lhs.
	* tree-ssa-live.c (remove_unused_locals): Remove code handling
	clobbers of SSA names.
	* tree-nested.c (convert_local_reference_stmt): Remove clobbers
	for variables we access through the local chain.
	* tree-cfg.c (verify_gimple_assign_single): Verify clobbers
	clobber full decls only.

Index: gcc/tree-into-ssa.c
===================================================================
--- gcc/tree-into-ssa.c (revision 190199)
+++ gcc/tree-into-ssa.c (working copy)
@@ -1323,12 +1323,12 @@ rewrite_debug_stmt_uses (gimple stmt)
    definition of a variable when a new real or virtual definition is found.  */

 static void
-rewrite_stmt (gimple_stmt_iterator si)
+rewrite_stmt (gimple_stmt_iterator *si)
 {
   use_operand_p use_p;
   def_operand_p def_p;
   ssa_op_iter iter;
-  gimple stmt = gsi_stmt (si);
+  gimple stmt = gsi_stmt (*si);

   /* If mark_def_sites decided that we don't need to rewrite this
      statement, ignore it.  */
@@ -1362,9 +1362,24 @@ rewrite_stmt (gimple_stmt_iterator si)
   FOR_EACH_SSA_DEF_OPERAND (def_p, stmt, iter, SSA_OP_ALL_DEFS)
     {
       tree var = DEF_FROM_PTR (def_p);
-      tree name = make_ssa_name (var, stmt);
+      tree name;
       tree tracked_var;
+
+      gcc_assert (DECL_P (var));
+
+      if (gimple_clobber_p (stmt)
+	  && is_gimple_reg (var))
+	{
+	  /* If we rewrite a DECL into SSA form then drop its
	     clobber stmts and replace uses with a new default def.  */
+	  gcc_assert (TREE_CODE (var) == VAR_DECL
+		      && !gimple_vdef (stmt));
+	  gsi_replace (si, gimple_build_nop (), true);
+	  register_new_def (get_or_create_ssa_default_def (cfun, var), var);
+	  break;
+	}
+
+      name = make_ssa_name (var, stmt);
       SET_DEF (def_p, name);
       register_new_def (DEF_FROM_PTR (def_p), var);
@@ -1372,7 +1387,7 @@ rewrite_stmt (gimple_stmt_iterator si)
       if (tracked_var)
	{
	  gimple note = gimple_build_debug_bind (tracked_var, name, stmt);
-	  gsi_insert_after (&si, note, GSI_SAME_STMT);
+	  gsi_insert_after (si, note, GSI_SAME_STMT);
	}
     }
 }
@@ -1439,7 +1454,7 @@ rewrite_enter_block (struct dom_walk_dat
      of a variable when a new real or virtual definition is found.  */
   if (TEST_BIT (interesting_blocks, bb->index))
     for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-      rewrite_stmt (gsi);
+      rewrite_stmt (&gsi);

   /* Step 3.  Visit all the successor blocks of BB looking for PHI nodes.
      For every PHI node found, add a new argument containing the current
Index: gcc/gimple-iterator.c
===================================================================
--- gcc/gimple-iterator.c (revision 190198)
+++ gcc/gimple-iterator.c (working copy)
@@ -427,7 +427,7 @@ gsi_replace (gimple_stmt_iterator *gsi,
   if (stmt == orig_stmt)
     return;

-  gcc_assert (!gimple_has_lhs (orig_stmt)
+  gcc_assert (!gimple_has_lhs (orig_stmt) || !gimple_has_lhs (stmt)
	      || gimple_get_lhs (orig_stmt) == gimple_get_lhs (stmt));

   gimple_set_location (stmt, gimple_location (orig_stmt));
Index: gcc/tree-ssa-live.c
===================================================================
--- gcc/tree-ssa-live.c (revision 190198)
+++ gcc/tree-ssa-live.c (working copy)
@@ -773,9 +773,6 @@ remove_unused_locals (void)
	    if (gimple_clobber_p (stmt))
	      {
		tree lhs = gimple_assign_lhs (stmt);
-		lhs = get_base_address (lhs);
-		if (TREE_CODE (lhs) == SSA_NAME)
-		  lhs = SSA_NAME_VAR (lhs);
		if (TREE_CODE (lhs) == VAR_DECL && !is_used_p (lhs))
		  {
		    unlink_stmt_vdef (stmt);
Index: gcc/tree-nested.c
===================================================================
--- gcc/tree-nested.c (revision 190198)
+++ gcc/tree-nested.c (working copy)
@@ -1727,6 +1727,20 @@ convert_local_reference_stmt (gimple_stm
      *handled_ops_p = false;
      return NULL_TREE;

+    case GIMPLE_ASSIGN:
+      if (gimple_clobber_p (stmt))
+	{
+	  tree lhs = gimple_assign_lhs (stmt);
+	  if (!use_pointer_in_
Re: libgo patch committed: Support NumCPU on more platforms
On 07/08/12 05:43, Ian Lance Taylor wrote:
> This patch to libgo, from Shenghou Ma, adds support for NumCPU on
> additional platforms: Solaris, Irix, *BSD, Darwin.  Bootstrapped and ran
> Go testsuite on x86_64-unknown-linux-gnu.  Committed to mainline.
>
> Ian

Wouldn't it be more useful on Linux to check the task's affinity settings?
Then when a task is locked to a limited set of cores it won't overload
those cores with threads.

R.

> foo.patch
>
> diff -r 12a361bc53d0 libgo/Makefile.am
> --- a/libgo/Makefile.am	Mon Aug 06 21:39:47 2012 -0700
> +++ b/libgo/Makefile.am	Mon Aug 06 21:40:21 2012 -0700
> @@ -390,6 +390,32 @@
>  runtime_lock_files = runtime/lock_sema.c runtime/thread-sema.c
>  endif
>
> +if LIBGO_IS_LINUX
> +runtime_getncpu_file = runtime/getncpu-linux.c
> +else
> +if LIBGO_IS_DARWIN
> +runtime_getncpu_file = runtime/getncpu-bsd.c
> +else
> +if LIBGO_IS_IRIX
> +runtime_getncpu_file = runtime/getncpu-irix.c
> +else
> +if LIBGO_IS_SOLARIS
> +runtime_getncpu_file = runtime/getncpu-solaris.c
> +else
> +if LIBGO_IS_FREEBSD
> +runtime_getncpu_file = runtime/getncpu-bsd.c
> +else
> +if LIBGO_IS_NETBSD
> +runtime_getncpu_file = runtime/getncpu-bsd.c
> +else
> +runtime_getncpu_file = runtime/getncpu-none.c
> +endif
> +endif
> +endif
> +endif
> +endif
> +endif
> +
>  runtime_files = \
> 	runtime/go-append.c \
> 	runtime/go-assert.c \
> @@ -481,7 +507,8 @@
> 	sema.c \
> 	sigqueue.c \
> 	string.c \
> -	time.c
> +	time.c \
> +	$(runtime_getncpu_file)
>
>  goc2c.$(OBJEXT): runtime/goc2c.c
> 	$(CC_FOR_BUILD) -c $(CFLAGS_FOR_BUILD) $<
> diff -r 12a361bc53d0 libgo/runtime/getncpu-bsd.c
> --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
> +++ b/libgo/runtime/getncpu-bsd.c	Mon Aug 06 21:40:21 2012 -0700
> @@ -0,0 +1,24 @@
> +// Copyright 2012 The Go Authors. All rights reserved.
> +// Use of this source code is governed by a BSD-style
> +// license that can be found in the LICENSE file.
> +
> +#include
> +#include
> +
> +#include "runtime.h"
> +#include "defs.h"
> +
> +int32
> +getproccount(void)
> +{
> +	int mib[2], out;
> +	size_t len;
> +
> +	mib[0] = CTL_HW;
> +	mib[1] = HW_NCPU;
> +	len = sizeof(out);
> +	if(sysctl(mib, 2, &out, &len, NULL, 0) >= 0)
> +		return (int32)out;
> +	else
> +		return 0;
> +}
> diff -r 12a361bc53d0 libgo/runtime/getncpu-irix.c
> --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
> +++ b/libgo/runtime/getncpu-irix.c	Mon Aug 06 21:40:21 2012 -0700
> @@ -0,0 +1,16 @@
> +// Copyright 2012 The Go Authors. All rights reserved.
> +// Use of this source code is governed by a BSD-style
> +// license that can be found in the LICENSE file.
> +
> +#include
> +
> +#include "runtime.h"
> +#include "defs.h"
> +
> +int32
> +getproccount(void)
> +{
> +	int32 n;
> +	n = (int32)sysconf(_SC_NPROC_ONLN);
> +	return n > 1 ? n : 1;
> +}
> diff -r 12a361bc53d0 libgo/runtime/getncpu-linux.c
> --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
> +++ b/libgo/runtime/getncpu-linux.c	Mon Aug 06 21:40:21 2012 -0700
> @@ -0,0 +1,47 @@
> +// Copyright 2012 The Go Authors. All rights reserved.
> +// Use of this source code is governed by a BSD-style
> +// license that can be found in the LICENSE file.
> +
> +#include
> +#include
> +#include
> +#include
> +
> +#include "runtime.h"
> +#include "defs.h"
> +
> +#ifndef O_CLOEXEC
> +#define O_CLOEXEC 0
> +#endif
> +
> +int32
> +getproccount(void)
> +{
> +	int32 fd, rd, cnt, cpustrlen;
> +	const char *cpustr;
> +	const byte *pos;
> +	byte *bufpos;
> +	byte buf[256];
> +
> +	fd = open("/proc/stat", O_RDONLY|O_CLOEXEC, 0);
> +	if(fd == -1)
> +		return 1;
> +	cnt = 0;
> +	bufpos = buf;
> +	cpustr = "\ncpu";
> +	cpustrlen = strlen(cpustr);
> +	for(;;) {
> +		rd = read(fd, bufpos, sizeof(buf)-cpustrlen);
> +		if(rd == -1)
> +			break;
> +		bufpos[rd] = 0;
> +		for(pos=buf; (pos=(const byte*)strstr((const char*)pos, cpustr)) != nil; cnt++, pos++) {
> +		}
> +		if(rd < cpustrlen)
> +			break;
> +		memmove(buf, bufpos+rd-cpustrlen+1, cpustrlen-1);
> +		bufpos = buf+cpustrlen-1;
> +	}
> +	close(fd);
> +	return cnt ? cnt : 1;
> +}
> diff -r 12a361bc53d0 libgo/runtime/getncpu-none.c
> --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
> +++ b/libgo/runtime/getncpu-none.c	Mon Aug 06 21:40:21 2012 -0700
> @@ -0,0 +1,12 @@
> +// Copyright 2012 The Go Authors. All rights reserved.
> +// Use of this source code is governed by a BSD-style
> +// license that can be found in the LICENSE file.
> +
> +#include "runtime.h"
> +#include "defs.h"
> +
> +int32
> +getproccount(void)
> +{
> +	return 0;
> +}
> diff -r 12a361bc53d0 libgo/runtime/getncpu-solaris.c
> --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
> +++ b/libgo/runtime/getncpu-solaris.c	Mon Aug 06 21:40:21 2012
Re: [PATCH] Set correct source location for deallocator calls
On Tue, Aug 7, 2012 at 12:07 AM, Dehao Chen wrote:
> Ping...
>
> Richard, could you shed some light on this?
>
> Thanks,
> Dehao
>
> On Mon, Jul 30, 2012 at 8:29 PM, Dehao Chen wrote:
>> Hi,
>>
>> This patch fixes the source location for automatically generated calls
>> to deallocators. For example:
>>
>> 19 void foo(int i)
>> 20 {
>> 21   for (int j = 0; j < 10; j++)
>> 22     {
>> 23       t test;
>> 24       test.foo();
>> 25       if (i + j)
>> 26         {
>> 27           test.bar();
>> 28           return;
>> 29         }
>> 30     }
>> 31   return;
>> 32 }
>>
>> The deallocator for "23 t test" is called in two places: line 28 and
>> line 30. However, gcc attributes both call sites to line 30.
>>
>> Bootstrapped and passed gcc regression tests.
>>
>> Is it ok for trunk?
>>
>> Thanks,
>> Dehao
>>
>> gcc/ChangeLog
>>
>> 2012-07-31  Dehao Chen
>>
>> 	* tree-eh.c (goto_queue_node): New field.
>> 	(record_in_goto_queue): New parameter.
>> 	(record_in_goto_queue_label): New parameter.
>> 	(lower_try_finally_copy): Update source location.
>>
>> gcc/testsuite/ChangeLog
>>
>> 2012-07-31  Dehao Chen
>>
>> 	* g++.dg/guality/deallocator.C: New test.
>>
>> Index: gcc/testsuite/g++.dg/guality/deallocator.C
>> ===================================================================
>> --- gcc/testsuite/g++.dg/guality/deallocator.C (revision 0)
>> +++ gcc/testsuite/g++.dg/guality/deallocator.C (revision 0)
>> @@ -0,0 +1,33 @@
>> +// Test that debug info generated for auto-inserted deallocator is
>> +// correctly attributed.
>> +// This patch scans for the lineno directly from assembly, which may
>> +// differ between different architectures. Because it mainly tests
>> +// FE generated debug info, without losing generality, only x86
>> +// assembly is scanned in this test.
>> +// { dg-do compile { target { i?86-*-* x86_64-*-* } } }
>> +// { dg-options "-O2 -fno-exceptions -g" }
>> +
>> +struct t {
>> +  t ();
>> +  ~t ();
>> +  void foo();
>> +  void bar();
>> +};
>> +
>> +int bar();
>> +
>> +void foo(int i)
>> +{
>> +  for (int j = 0; j < 10; j++)
>> +    {
>> +      t test;
>> +      test.foo();
>> +      if (i + j)
>> +        {
>> +          test.bar();
>> +          return;
>> +        }
>> +    }
>> +  return;
>> +}
>> +// { dg-final { scan-assembler "1 28 0" } }
>> Index: gcc/tree-eh.c
>> ===================================================================
>> --- gcc/tree-eh.c (revision 189835)
>> +++ gcc/tree-eh.c (working copy)
>> @@ -321,6 +321,7 @@
>>  struct goto_queue_node
>>  {
>>    treemple stmt;
>> +  enum gimple_code code;
>>    gimple_seq repl_stmt;
>>    gimple cont_stmt;
>>    int index;
>> @@ -560,7 +561,8 @@
>>  record_in_goto_queue (struct leh_tf_state *tf,
>>                        treemple new_stmt,
>>                        int index,
>> -                      bool is_label)
>> +                      bool is_label,
>> +                      enum gimple_code code)
>>  {
>>    size_t active, size;
>>    struct goto_queue_node *q;
>> @@ -583,6 +585,7 @@
>>    memset (q, 0, sizeof (*q));
>>    q->stmt = new_stmt;
>>    q->index = index;
>> +  q->code = code;
>>    q->is_label = is_label;
>>  }
>>
>> @@ -590,7 +593,8 @@
>>     TF is not null.  */
>>
>>  static void
>> -record_in_goto_queue_label (struct leh_tf_state *tf, treemple stmt, tree label)
>> +record_in_goto_queue_label (struct leh_tf_state *tf, treemple stmt, tree label,
>> +                            enum gimple_code code)
>>  {
>>    int index;
>>    treemple temp, new_stmt;
>> @@ -629,7 +633,7 @@
>>       since with a GIMPLE_COND we have an easy access to the then/else
>>       labels.  */
>>    new_stmt = stmt;
>> -  record_in_goto_queue (tf, new_stmt, index, true);
>> +  record_in_goto_queue (tf, new_stmt, index, true, code);
>>  }
>>
>>  /* For any GIMPLE_GOTO or GIMPLE_RETURN, decide whether it leaves a try_finally
>> @@ -649,19 +653,22 @@
>>      {
>>      case GIMPLE_COND:
>>        new_stmt.tp = gimple_op_ptr (stmt, 2);
>> -      record_in_goto_queue_label (tf, new_stmt, gimple_cond_true_label (stmt));
>> +      record_in_goto_queue_label (tf, new_stmt, gimple_cond_true_label (stmt),
>> +                                  gimple_code (stmt));
>>        new_stmt.tp = gimple_op_ptr (stmt, 3);
>> -      record_in_goto_queue_label (tf, new_stmt, gimple_cond_false_label (stmt));
>> +      record_in_goto_queue_label (tf, new_stmt, gimple_cond_false_label (stmt),
>> +                                  gimple_code (stmt));
>>        break;
>>      case GIMPLE_GOTO:
>>        new_stmt.g = stmt;
>> -      record_in_goto_queue_label (tf, new_stmt, gimple_goto_dest (stmt));
>> +      record_in_goto_queue_label (tf, new_stmt, gimple_goto_dest (stmt),
>> +                                  gimple_code (stmt));
>>        break;
>>
>>      case GIMPLE_RETURN:
>>        tf->may_return = true;
>>        new_stmt.g = stmt;
>> -      record_in_goto_queue (tf, new_stmt,
Re: [google/main] Omit TARGET_LIB_PATH from RPATH_ENVVAR in HOST_EXPORTS on bootstrap builds (issue6441093)
On Thu, Aug 2, 2012 at 7:37 AM, Simon Baldwin wrote:
> Omit TARGET_LIB_PATH from RPATH_ENVVAR in HOST_EXPORTS on bootstrap builds.
>
> Discussion and rationale at: http://gcc.gnu.org/ml/gcc/2012-06/msg00314.html
>
> For google/main. Tested for bootstrap and regression.
>
> 2012-08-02  Simon Baldwin
>
> 	* Makefile.tpl: Omit TARGET_LIB_PATH from RPATH_ENVVAR set in
> 	HOST_EXPORTS on bootstrap builds.
> 	* Makefile.in: Regenerate.

This is OK.

Does this belong in google/integration?  Or will you be sending this for
trunk?  If you're not going to be sending this to trunk, please put it in
google/integration.

Diego.
Re: add strnlen to libiberty (was Re: Assembly output optimisations)
On Tue, Aug 7, 2012 at 2:30 AM, Hans-Peter Nilsson wrote:
>
> Just don't forget that libiberty isn't a target library anymore.
> To wit, the (GCC) run-time exception is moot for that code, AFAIK.
> Maybe enough reason to abandon that rule so its code can be
> truly and freely shared between GNU projects.

The libiberty licensing is certainly confused.  I just don't want to
make it worse.

None of the code in libiberty is under the GCC Runtime Library
Exception, so I think that particular issue does not apply.

Ian
[PATCH][3/n] Allow anonymous SSA names, add copy_ssa_name
This adds copy_ssa_name, similar to duplicate_ssa_name but not copying any
annotations (thus, it just makes a new SSA name that looks similar to a
template).  This avoids a bunch of SSA_NAME_VAR uses which will no longer
work for creating SSA names off of.

The 2nd part is stuff I came across when working on the overall patch.  We
should release default defs of unused decls as they leak otherwise.  Also
may_propagate_copy_into_asm always returns true as DECL_HARD_REGISTER vars
are never is_gimple_reg()s and thus never have SSA names associated.
set_ssa_default_def should clear SSA_NAME_DEFAULT_DEF off the name it
replaces, even if the replacement is NULL.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.  I will
commit it with two revisions separating the changes to allow easier
bisection (the copy_ssa_name part is a complete no-op).

Richard.

2012-08-07  Richard Guenther

	* tree-flow.h (copy_ssa_name_fn): New function.
	(duplicate_ssa_name_fn): Likewise.
	* tree-flow-inline.h (copy_ssa_name): New function.
	(duplicate_ssa_name): Likewise.
	* tree-ssanames.c (copy_ssa_name_fn): New function.
	(duplicate_ssa_name): Rename to ...
	(duplicate_ssa_name_fn): ... this and adjust.
	* tree-tailcall.c (update_accumulator_with_ops): Use copy_ssa_name.
	* tree-vect-loop-manip.c (slpeel_update_phi_nodes_for_guard1):
	Likewise.
	(slpeel_update_phi_nodes_for_guard2): Likewise.
	(slpeel_tree_peel_loop_to_edge): Likewise.
	(vect_loop_versioning): Likewise.
	* tree-parloops.c (transform_to_exit_first_loop): Likewise.
	(create_parallel_loop): Likewise.
	* ipa-split.c (split_function): Likewise.
	* tree-vect-loop.c (vect_is_simple_reduction_1): Likewise.
	(vect_create_epilog_for_reduction): Likewise.
	* tree-vect-data-refs.c (bump_vector_ptr): Likewise.
	(vect_setup_realignment): Likewise.
	* tree-vect-stmts.c (vectorizable_load): Likewise.
	* tree-switch-conversion.c (build_one_array): Likewise.
	(gen_def_assigns): Likewise.
	* tree-cfg.c (gimple_make_forwarder_block): Likewise.
	* graphite-sese-to-poly.c (rewrite_close_phi_out_of_ssa): Call
	create_zero_dim_array with the SSA name.
	(rewrite_phi_out_of_ssa): Likewise.
	(rewrite_cross_bb_scalar_dependence): Likewise.  Use copy_ssa_name.
	* tree-dfa.c (set_ssa_default_def): Clear the SSA_NAME_DEFAULT_DEF
	bit of the old name when we clear the slot.
	* tree-ssa-live.c (remove_unused_locals): Release any default def
	associated with an unused var.
	* tree-ssa-copy.c (may_propagate_copy_into_asm): Always return true.

Index: gcc/tree-tailcall.c
===================================================================
--- gcc/tree-tailcall.c (revision 190199)
+++ gcc/tree-tailcall.c (working copy)
@@ -644,9 +644,9 @@ update_accumulator_with_ops (enum tree_c
			     gimple_stmt_iterator gsi)
 {
   gimple stmt;
-  tree var;
+  tree var = copy_ssa_name (acc, NULL);
   if (types_compatible_p (TREE_TYPE (acc), TREE_TYPE (op1)))
-    stmt = gimple_build_assign_with_ops (code, SSA_NAME_VAR (acc), acc, op1);
+    stmt = gimple_build_assign_with_ops (code, var, acc, op1);
   else
     {
       tree rhs = fold_convert (TREE_TYPE (acc),
@@ -656,11 +656,8 @@ update_accumulator_with_ops (enum tree_c
				       op1));
       rhs = force_gimple_operand_gsi (&gsi, rhs,
				      false, NULL, false, GSI_CONTINUE_LINKING);
-      stmt = gimple_build_assign (NULL_TREE, rhs);
+      stmt = gimple_build_assign (var, rhs);
     }
-  var = make_ssa_name (SSA_NAME_VAR (acc), stmt);
-  gimple_assign_set_lhs (stmt, var);
-  update_stmt (stmt);
   gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
   return var;
 }
Index: gcc/tree-vect-loop-manip.c
===================================================================
--- gcc/tree-vect-loop-manip.c (revision 190199)
+++ gcc/tree-vect-loop-manip.c (working copy)
@@ -511,14 +511,15 @@ slpeel_update_phi_nodes_for_guard1 (edge
	gsi_next (&gsi_orig), gsi_next (&gsi_update))
     {
       source_location loop_locus, guard_locus;
+      tree new_res;
       orig_phi = gsi_stmt (gsi_orig);
       update_phi = gsi_stmt (gsi_update);

       /** 1. Handle new-merge-point phis  **/

       /* 1.1. Generate new phi node in NEW_MERGE_BB:  */
-      new_phi = create_phi_node (SSA_NAME_VAR (PHI_RESULT (orig_phi)),
-				 new_merge_bb);
+      new_res = copy_ssa_name (PHI_RESULT (orig_phi), NULL);
+      new_phi = create_phi_node (new_res, new_merge_bb);

       /* 1.2. NEW_MERGE_BB has two incoming edges: GUARD_EDGE and the
	      exit-edge of LOOP.  Set the two phi args in NEW_PHI for these
	      edges:  */
@@ -547,8 +548,8 @@ slpeel_update_phi_nodes_for_guard1 (edge
	 continue;

       /* 2.1. Generate new phi node in NEW_EXIT_BB:  */
-
Re: [ping] Re: [patch v2] support for multiarch systems
On Tue, Aug 7, 2012 at 3:31 AM, Matthias Klose wrote:
> ping?
>
> On 08.07.2012 20:48, Matthias Klose wrote:
>> Please find attached v2 of the patch updated for trunk 20120706, x86 only,
>> tested on x86-linux-gnu, KFreeBSD and the Hurd.

+[case "${withval}" in
+yes|no|auto-detect) enable_multiarch=$withval;;
+*) AC_MSG_ERROR(bad value ${withval} given for --enable-multiarch option) ;;

Please just use "auto", not "auto-detect", as in libgcc/configure.ac.

> +# needed for setting the multiarch name on ARM
> +AC_SUBST(with_float)

This seems like it should be in a different patch.

> +with_float = @with_float@

Likewise.

+ifeq ($(enable_multiarch),yes)
+  if_multiarch = $(1)
+else ifeq ($(enable_multiarch),auto-detect)
+  if_multiarch = $(if $(wildcard $(shell echo $(SYSTEM_HEADER_DIR))/../../usr/lib/*/crti.o),$(1))
+else
+  if_multiarch =
+endif

This is nicely tricky but I don't understand why you are doing this in such
a confusing way.  Why not simply set the names in config.gcc?  What
information do you have at make time that you don't have at configure time?

The auto case is not general.  When cross-compiling without --sysroot, or
when using --native-system-header-dir, the test won't work.  Without
--sysroot there may not be much that you can do to auto-detect multiarch.
And with --sysroot, there should be no need to use SYSTEM_HEADER_DIR at
all; just look in the sysroot.

> +@item --enable-multiarch
> +Specify whether to enable or disable multiarch support.  The default is
> +to detect for glibc start files in a multiarch location, and enable it
> +if the files are found.

This is quite obscure for anybody who does not already know what multiarch is.

I'm sorry, I have to repeat my extreme annoyance that this scheme breaks
all earlier GCC releases.

Ian
[RFA 4.7/4.6] Re: [PATCH v2] Target-specific limits on vector alignment
Richard Guenther wrote:
> On Fri, Jul 27, 2012 at 5:24 PM, Ulrich Weigand wrote:
> > ChangeLog:
> >
> > 	* target.def (vector_alignment): New target hook.
> > 	* doc/tm.texi.in (TARGET_VECTOR_ALIGNMENT): Document new hook.
> > 	* doc/tm.texi: Regenerate.
> > 	* targhooks.c (default_vector_alignment): New function.
> > 	* targhooks.h (default_vector_alignment): Add prototype.
> > 	* stor-layout.c (layout_type): Use targetm.vector_alignment.
> > 	* config/arm/arm.c (arm_vector_alignment): New function.
> > 	(TARGET_VECTOR_ALIGNMENT): Define.
> >
> > 	* tree-vect-data-refs.c (vect_update_misalignment_for_peel): Use
> > 	vector type alignment instead of size.
> > 	* tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Use
> > 	element type size directly instead of computing it from alignment.
> > 	Fix variable naming and comment.
> >
> > testsuite/ChangeLog:
> >
> > 	* lib/target-supports.exp
> > 	(check_effective_target_vect_natural_alignment): New function.
> > 	* gcc.dg/align-2.c: Only run on targets with natural alignment
> > 	of vector types.
> > 	* gcc.dg/vect/slp-25.c: Adjust tests for targets without natural
> > 	alignment of vector types.
> >
> > OK for mainline?
>
> Ok.

Would it be OK to backport this to 4.7 and possibly 4.6?

This patch represents a change in the ABI on ARM (the alignment could
potentially affect struct member offsets, for example), which could
conceivably cause incompatibilities with code compiled with older
versions of GCC.  We had originally decided we nevertheless want to
implement this change on ARM, because:

- GCC is now in compliance with the platform ABI document
- the old code could actually be buggy since GCC *assumed* 16-byte
  alignment that wasn't actually guaranteed
- code actually affected by this change ought to be rare (code using
  NEON vector types is rare in the first place, code using *structure
  members* of such types is even rarer, and code using such structures
  in cross-module APIs seems to be extremely rare)

Nevertheless, given we do want to make this change, it would be best
to roll it out as quickly as possible, to minimize the time period
where people might use old (not yet fixed) compilers to generate
non-ABI-compliant binaries.  Thus, we think it would be good for
the change to be applied to all still-open release branches as well.

(Note that while the patch contains changes to common code, those
should be no-ops for all targets that do not implement the new hook.)

Bye,
Ulrich

-- 
Dr. Ulrich Weigand
GNU Toolchain for Linux on System z and Cell BE
ulrich.weig...@de.ibm.com
Re: [RFA 4.7/4.6] Re: [PATCH v2] Target-specific limits on vector alignment
On Tue, Aug 7, 2012 at 4:56 PM, Ulrich Weigand wrote:
> Richard Guenther wrote:
>> On Fri, Jul 27, 2012 at 5:24 PM, Ulrich Weigand wrote:
>>> ChangeLog:
>>>
>>> 	* target.def (vector_alignment): New target hook.
>>> 	* doc/tm.texi.in (TARGET_VECTOR_ALIGNMENT): Document new hook.
>>> 	* doc/tm.texi: Regenerate.
>>> 	* targhooks.c (default_vector_alignment): New function.
>>> 	* targhooks.h (default_vector_alignment): Add prototype.
>>> 	* stor-layout.c (layout_type): Use targetm.vector_alignment.
>>> 	* config/arm/arm.c (arm_vector_alignment): New function.
>>> 	(TARGET_VECTOR_ALIGNMENT): Define.
>>>
>>> 	* tree-vect-data-refs.c (vect_update_misalignment_for_peel): Use
>>> 	vector type alignment instead of size.
>>> 	* tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Use
>>> 	element type size directly instead of computing it from alignment.
>>> 	Fix variable naming and comment.
>>>
>>> testsuite/ChangeLog:
>>>
>>> 	* lib/target-supports.exp
>>> 	(check_effective_target_vect_natural_alignment): New function.
>>> 	* gcc.dg/align-2.c: Only run on targets with natural alignment
>>> 	of vector types.
>>> 	* gcc.dg/vect/slp-25.c: Adjust tests for targets without natural
>>> 	alignment of vector types.
>>>
>>> OK for mainline?
>>
>> Ok.
>
> Would it be OK to backport this to 4.7 and possibly 4.6?
>
> This patch represents a change in the ABI on ARM (the alignment could
> potentially affect struct member offsets, for example), which could
> conceivably cause incompatibilities with code compiled with older
> versions of GCC.  We had originally decided we nevertheless want to
> implement this change on ARM, because:
>
> - GCC is now in compliance with the platform ABI document
> - the old code could actually be buggy since GCC *assumed* 16-byte
>   alignment that wasn't actually guaranteed
> - code actually affected by this change ought to be rare (code using
>   NEON vector types is rare in the first place, code using *structure
>   members* of such types is even rarer, and code using such structures
>   in cross-module APIs seems to be extremely rare)
>
> Nevertheless, given we do want to make this change, it would be best
> to roll it out as quickly as possible, to minimize the time period
> where people might use old (not yet fixed) compilers to generate
> non-ABI-compliant binaries.  Thus, we think it would be good for
> the change to be applied to all still open release branches as well.
>
> (Note that while the patch contains changes to common code, those
> should be no-ops for all targets that do not implement the new hook.)

I'll defer the decision to the target maintainers.  But please double-check
for any changes in the vectorizer parts when backporting to 4.6.

Thanks,
Richard.

> Bye,
> Ulrich
>
> -- 
> Dr. Ulrich Weigand
> GNU Toolchain for Linux on System z and Cell BE
> ulrich.weig...@de.ibm.com
Re: add strnlen to libiberty (was Re: Assembly output optimisations)
On Tue, 7 Aug 2012, Ian Lance Taylor wrote:
> On Tue, Aug 7, 2012 at 2:30 AM, Hans-Peter Nilsson wrote:
>>
>> Just don't forget that libiberty isn't a target library anymore.
>> To wit, the (GCC) run-time exception is moot for that code, AFAIK.
>> Maybe enough reason to abandon that rule so its code can be
>> truly and freely shared between GNU projects.
>
> The libiberty licensing is certainly confused.  I just don't want to
> make it worse.

I think the natural way to sort it out is to move all the FSF-copyright
files therein (including include/) to GPLv3, no license exception, except
for cp-demangle.c (used in libstdc++-v3) and the headers it includes,
which should have the GCC Runtime Library Exception notice.  libiberty is
a library for a limited number of GNU projects, all under GPLv3; as far as
I know the only reason it hasn't been converted to GPLv3 is that no one
got around to doing so.  (gnulib also uses the practice of putting GPLv3
license notices on the files even if they are also available under other
licenses, with separate metadata indicating other licenses under which
files are available.)

That wouldn't sort out the question of which "This file is part of"
notices should be present, but would deal with the other license
confusion.

(Ideally I think most of libiberty would be replaced by use of gnulib in
the projects using libiberty - I see no real advantage to the GNU Project
in having those two separate libraries for substantially the same
purposes - but that's a much larger and harder task, which would also
involve, for each libiberty file replaced by a gnulib version,
ascertaining whether there are any features or local changes in the
libiberty version that should be merged into the gnulib version or any
common upstream such as glibc.  And some files in libiberty would
probably need adding to gnulib as part of such a project.)

-- 
Joseph S. Myers
jos...@codesourcery.com
Re: [RFA 4.7/4.6] Re: [PATCH v2] Target-specific limits on vector alignment
On 07/08/12 16:04, Richard Guenther wrote: > On Tue, Aug 7, 2012 at 4:56 PM, Ulrich Weigand wrote: >> Richard Guenther wrote: >>> On Fri, Jul 27, 2012 at 5:24 PM, Ulrich Weigand wrote: ChangeLog: * target.def (vector_alignment): New target hook. * doc/tm.texi.in (TARGET_VECTOR_ALIGNMENT): Document new hook. * doc/tm.texi: Regenerate. * targhooks.c (default_vector_alignment): New function. * targhooks.h (default_vector_alignment): Add prototype. * stor-layout.c (layout_type): Use targetm.vector_alignment. * config/arm/arm.c (arm_vector_alignment): New function. (TARGET_VECTOR_ALIGNMENT): Define. * tree-vect-data-refs.c (vect_update_misalignment_for_peel): Use vector type alignment instead of size. * tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Use element type size directly instead of computing it from alignment. Fix variable naming and comment. testsuite/ChangeLog: * lib/target-supports.exp (check_effective_target_vect_natural_alignment): New function. * gcc.dg/align-2.c: Only run on targets with natural alignment of vector types. * gcc.dg/vect/slp-25.c: Adjust tests for targets without natural alignment of vector types. >> >> OK for mainline? >>> >>> Ok. >> >> Would it be OK to backport this to 4.7 and possibly 4.6? >> >> This patch represents a change in the ABI on ARM (the alignment could >> potentially affect struct member offsets, for example), which could >> conceivably cause incompatibilities with code compiled with older >> versions of GCC. 
We had originally decided we nevertheless want to >> implement this change on ARM, because: >> >> - GCC is now in compliance with the platform ABI document >> - the old code could actually be buggy since GCC *assumed* 16-byte >> alignment that wasn't actually guaranteed >> - code actually affected by this change ought to be rare (code using >> NEON vector types is rare in the first place, code using *structure >> members* of such types is even rarer, and code using such structures >> in cross-module APIs seems to be extremely rare) >> >> Nevertheless, given we do want to make this change, it would be best >> to roll it out as quickly as possible, to minimize the time period >> where people might use old (not yet fixed) compilers to generate >> non-ABI-compliant binaries. Thus, we think it would be good for >> the change to be applied to all still open release branches as well. >> >> (Note that while the patch contains changes to common code, those >> should be no-ops for all targets that do not implement the new hook.) > > I'll defer the decision to the target maintainers. But please double-check > for any changes in the vectorizer parts when backporting to 4.6. > My inclination is to back-port this to all maintained branches (for consistency). However, it does need to be release-noted. R.
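For readers without an ARM background, the ABI-visible effect Ulrich describes can be sketched with GCC's generic vector extension; the type and struct names below are illustrative, and the concrete offsets depend on the target's (possibly hook-limited) vector alignment.

```cpp
#include <cstddef>

// Stand-in for a 16-byte NEON-style vector type (illustrative name).
typedef int v4si __attribute__ ((vector_size (16)));

// A structure member of vector type: if the new hook lowers the
// alignment GCC assumes for v4si, the offset of `v` -- and therefore
// struct layout across modules -- can change.  That is the ABI
// concern driving the backport request.
struct with_vec
{
  char c;
  v4si v;
};
```

Comparing `offsetof (with_vec, v)` between compiler versions is the kind of cross-module check the thread has in mind.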
Re: [RFA 4.7/4.6] Re: [PATCH v2] Target-specific limits on vector alignment
>> (Note that while the patch contains changes to common code, those >> should be no-ops for all targets that do not implement the new hook.) > > I'll defer the decision to the target maintainers. I'd rather have this consistent across all maintained release branches today than to leave this for an ABI break in 4.8 timeframe. In addition I'd like this documented in changes.html for each of the release branches. Thanks, Ramana
Re: [PATCH] put exception tables for comdat functions in comdat, too
On 08/06/2012 05:45 PM, Sandra Loosemore wrote: > OK to commit the patch as originally posted? Yes. I misread the part of that function about relocations when skimming. r~
Re: libgo patch committed: Support NumCPU on more platforms
On Tue, Aug 7, 2012 at 5:21 AM, Richard Earnshaw wrote: > > Wouldn't it be more useful on Linux to check the task's affinity > settings? Then when a task is locked to a limited set of cores it won't > overload those cores with threads. Good question. I'm not sure. This patch does not change the number of threads that the Go runtime will use, so your particular concern is not an issue. This patch changes the return value of the standard library function runtime.NumCPU. And if the number of processors is > 1, it tweaks some of the runtime code, e.g., to use a limited time spin lock (if the number of processors == 1, a spin lock is pointless, and the code moves straight to a futex). I guess that if the process is restricted to a single processor by affinity, then it might be better to be aware of that and skip the spin lock. But I'm not sure what the return value of runtime.NumCPU should be. It's documented as "NumCPU returns the number of logical CPUs on the local machine." Ian
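On Linux, the two quantities being debated — processors online versus processors the task's affinity mask permits — can be queried as below. This only sketches the distinction; the helper names are mine, not the Go runtime's implementation.

```cpp
#include <sched.h>
#include <unistd.h>

// All logical CPUs online: roughly what runtime.NumCPU is documented
// to report ("the number of logical CPUs on the local machine").
int online_cpus ()
{
  long n = sysconf (_SC_NPROCESSORS_ONLN);
  return n > 0 ? (int) n : 1;
}

// CPUs this task may actually run on, per its affinity mask; Richard's
// suggestion is to consult this before enabling the spin-lock path.
int affinity_cpus ()
{
  cpu_set_t set;
  CPU_ZERO (&set);
  if (sched_getaffinity (0, sizeof set, &set) != 0)
    return online_cpus ();	/* fall back if the query fails */
  int n = CPU_COUNT (&set);
  return n > 0 ? n : 1;
}
```

A runtime pinned to one core by affinity could then skip spinning even on a multi-core machine, while NumCPU kept reporting the machine-wide count.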
Re: [PATCH v3] s390: Convert from sync to atomic optabs
Richard Henderson wrote: > [ This ought to be exactly the patch you bootstrapped. It does > not include the SEQ follow-up. ] > > Split out s390_two_part_insv from s390_expand_cs_hqi to try > harder to use bit insertion instructions in the CAS loop. > > Reorg s390_expand_insv to aid that. Try RISBG last, after other > mechanisms have failed; don't require operands in registers for > it but force them there instead. Try a limited form of ICM. This looks good to me. Retested with no regressions. Thanks, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: s390: Avoid CAS boolean output inefficiency
Richard Henderson wrote: > On 08/06/2012 11:34 AM, Ulrich Weigand wrote: > > There is one particular inefficiency I have noticed. This code: > > > > if (!__atomic_compare_exchange_n (&v, &expected, max, 0 , 0, 0)) > > abort (); > > > > from atomic-compare-exchange-3.c gets compiled into: > > > > l %r3,0(%r2) > > larl%r1,v > > cs %r3,%r4,0(%r1) > > ipm %r1 > > sra %r1,28 > > st %r3,0(%r2) > > ltr %r1,%r1 > > jne .L3 > > > > which is extremely inefficient; it converts the condition code into > > an integer using the slow ipm, sra sequence, just so that it can > > convert the integer back into a condition code via ltr and branch > > on it ... > > This was caused (or perhaps abetted by) the representation of EQ > as NE ^ 1. With the subsequent truncation and zero-extend, I > think combine reached its insn limit of 3 before seeing everything > it needed to see. Yes, combine isn't able to handle everything in one go, but it finds an intermediate insn. With the old __sync_compare_and_swap, it is then able to optimize everything away in a second step; the only reason this doesn't work any more is the intervening store. The following patch changes the builtin expander to pass a MEM oldval as-is to the back-end expander, so that the back-end can move the store to before the CC operation. With that patch I'm also seeing all the IPMs disappear. [ Well, all except for this one: > This happens because CSE notices the cbranch vs 0, and sets r116 > to zero along the path to > > 32 if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG, > __ATOMIC_RELEASE, > __ATOMIC_ACQUIRE)) > > at which point CSE decides that it would be cheaper to "re-use" > the zero already in r116 instead of load another constant 0 here. > After that, combine is ham-strung because r116 is not dead. As you point out, this does indeed fix this problem as well: > A short-term possibility is to have the CAS insns accept general_operand, > so that the 0 gets merged. ] What do you think about this solution? 
It has the advantage that we still get the same xor code if we actually do need the ipm ... Bye, Ulrich diff -ur gcc/builtins.c gcc.new/builtins.c --- gcc/builtins.c 2012-08-07 16:04:45.054348099 +0200 +++ gcc.new/builtins.c 2012-08-07 15:44:01.304349225 +0200 @@ -5376,6 +5376,7 @@ expect = expand_normal (CALL_EXPR_ARG (exp, 1)); expect = convert_memory_address (Pmode, expect); + expect = gen_rtx_MEM (mode, expect); desired = expand_expr_force_mode (CALL_EXPR_ARG (exp, 2), mode); weak = CALL_EXPR_ARG (exp, 3); @@ -5383,14 +5384,15 @@ if (host_integerp (weak, 0) && tree_low_cst (weak, 0) != 0) is_weak = true; - oldval = copy_to_reg (gen_rtx_MEM (mode, expect)); - + oldval = expect; if (!expand_atomic_compare_and_swap ((target == const0_rtx ? NULL : &target), &oldval, mem, oldval, desired, is_weak, success, failure)) return NULL_RTX; - emit_move_insn (gen_rtx_MEM (mode, expect), oldval); + if (oldval != expect) +emit_move_insn (expect, oldval); + return target; } diff -ur gcc/config/s390/s390.md gcc.new/config/s390/s390.md --- gcc/config/s390/s390.md 2012-08-07 16:04:54.204348621 +0200 +++ gcc.new/config/s390/s390.md 2012-08-07 16:00:21.934348628 +0200 @@ -8870,7 +8870,7 @@ (define_expand "atomic_compare_and_swap" [(match_operand:SI 0 "register_operand") ;; bool success output - (match_operand:DGPR 1 "register_operand") ;; oldval output + (match_operand:DGPR 1 "nonimmediate_operand");; oldval output (match_operand:DGPR 2 "memory_operand") ;; memory (match_operand:DGPR 3 "register_operand") ;; expected intput (match_operand:DGPR 4 "register_operand") ;; newval intput @@ -8879,9 +8879,17 @@ (match_operand:SI 7 "const_int_operand")] ;; failure model "" { - rtx cc, cmp; + rtx cc, cmp, output = operands[1]; + + if (!register_operand (output, mode)) +output = gen_reg_rtx (mode); + emit_insn (gen_atomic_compare_and_swap_internal -(operands[1], operands[2], operands[3], operands[4])); +(output, operands[2], operands[3], operands[4])); + + if (output != operands[1]) 
+emit_move_insn (operands[1], output); + cc = gen_rtx_REG (CCZ1mode, CC_REGNUM); cmp = gen_rtx_EQ (SImode, cc, const0_rtx); emit_insn (gen_cstorecc4 (operands[0], cmp, cc, const0_rtx)); -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
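For reference, the semantics the back-end is trying to branch on cheaply are just those of the builtin itself; a target-independent sketch (not s390 code):

```cpp
int v = 41;

// Strong CAS: returns true iff *p == expected_val held and the store
// of `desired` happened.  The codegen goal in this thread is to branch
// on that boolean straight from the condition code, without the
// ipm/sra materialization in between.
bool cas (int *p, int expected_val, int desired)
{
  int expected = expected_val;
  return __atomic_compare_exchange_n (p, &expected, desired,
				      /*weak=*/0,
				      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

On failure the builtin writes the observed value back through the `expected` pointer, which is exactly the store that was getting in combine's way above.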
Re: [patch] Reduce memory overhead for build_insn_chain
On 08/07/2012 02:22 AM, Steven Bosscher wrote: Hello, In the test case for PR54146, build_insn_chain spends almost all its time in this loop: FOR_EACH_BB_REVERSE (bb) { bitmap_iterator bi; rtx insn; CLEAR_REG_SET (live_relevant_regs); --> memset (live_subregs_used, 0, max_regno * sizeof (int)); The test case has >400,000 basic blocks and >1,000,000 regs... Fortunately this can be done much more efficiently if one realizes that an sbitmap knows its own size (using SBITMAP_SIZE that I introduced recently). With that, live_subregs_used can be a sparse bitmap. Bootstrapped&tested on x86_64-unknown-linux-gnu. OK for trunk? Ok. Thanks for the patch, Steven.
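The essence of the win is replacing a per-block O(max_regno) memset with clearing only the entries actually touched; a minimal sketch of that idea (an illustration of the principle, not the patch's actual sbitmap code):

```cpp
#include <cstddef>
#include <vector>

struct lazily_cleared
{
  std::vector<int> value;          // dense payload, allocated once
  std::vector<std::size_t> dirty;  // indices written since last clear

  explicit lazily_cleared (std::size_t n) : value (n, 0) {}

  void set (std::size_t i, int v)
  {
    if (value[i] == 0 && v != 0)
      dirty.push_back (i);
    value[i] = v;
  }

  // O(entries touched), not O(n) -- with >1,000,000 regs cleared once
  // per each of >400,000 blocks, this difference is the whole speedup.
  void clear ()
  {
    for (std::size_t i : dirty)
      value[i] = 0;
    dirty.clear ();
  }
};
```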
Re: [cxx-conversion] Make double_int a class with methods and operators. (issue6443093)
On Aug 6, 2012, at 5:35 PM, Lawrence Crowl wrote: > Convert double_int from a struct with function into a class with > operators and methods. We have a wide_int class that replaces this class. :-( It would have been better to just convert it. Do you guys have a timeframe for the cxx-conversion landing?
Re: [cxx-conversion] Make double_int a class with methods and operators. (issue6443093)
On 12-08-07 13:22 , Mike Stump wrote: On Aug 6, 2012, at 5:35 PM, Lawrence Crowl wrote: Convert double_int from a struct with function into a class with operators and methods. We have a wide_int class that replaces this class. :-( It would have been better to just convert it. Do you guys have a timeframe for the cxx-conversion landing? Soon (for "soon" measured in days). I will try to send the merge patches this week. Diego.
[RFH / Patch] PR 54191
Hi, this is an update on C++/54191, where in a number of cases, having to do with inaccessible bases, we don't handle correctly access control under SFINAE. The attached draft patch p fixes a number of tests (and passes regression testing), where we are currently emitting inaccessible base diagnostics from lookup_base instead of properly handling the failure in SFINAE. As you can see in the draft, after a number of tries, I'm simply adding a tsubst_flags_t parameter to lookup_base: obviously something is redundant with base_access, but knowing that a returned error_mark_node in every case means that the base is either ambiguous or inaccessible (when we error out outside SFINAE) is *so* convenient! For comparison, the existing get_delta_difference_1 shows an alternate approach, quite contorted IMHO. But these are just implementation details... More interesting are the cases, commented out in the attached 54191.C, which we still get wrong also with the patch applied. In those cases we do *not* currently produce any inaccessible base diagnostics, we simply mishandle the SFINAE, apparently we don't perform any form of access control. I'm really looking for help about this set of 5 tests (4 braced casts + conditional operator) Thanks! Paolo. /// Index: typeck.c === --- typeck.c(revision 190207) +++ typeck.c(working copy) @@ -2246,8 +2246,8 @@ build_class_member_access_expr (tree object, tree tree binfo; base_kind kind; - binfo = lookup_base (access_path ? access_path : object_type, - member_scope, ba_unique, &kind); + binfo = lookup_base_sfinae (access_path ? access_path : object_type, + member_scope, ba_unique, &kind, complain); if (binfo == error_mark_node) return error_mark_node; @@ -2630,7 +2630,8 @@ finish_class_member_access_expr (tree object, tree } /* Find the base of OBJECT_TYPE corresponding to SCOPE. 
*/ - access_path = lookup_base (object_type, scope, ba_check, NULL); + access_path = lookup_base_sfinae (object_type, scope, ba_check, + NULL, complain); if (access_path == error_mark_node) return error_mark_node; if (!access_path) @@ -3150,8 +3151,8 @@ get_member_function_from_ptrfunc (tree *instance_p if (!same_type_ignoring_top_level_qualifiers_p (basetype, TREE_TYPE (TREE_TYPE (instance_ptr { - basetype = lookup_base (TREE_TYPE (TREE_TYPE (instance_ptr)), - basetype, ba_check, NULL); + basetype = lookup_base_sfinae (TREE_TYPE (TREE_TYPE (instance_ptr)), +basetype, ba_check, NULL, complain); instance_ptr = build_base_path (PLUS_EXPR, instance_ptr, basetype, 1, complain); if (instance_ptr == error_mark_node) @@ -5995,9 +5996,9 @@ build_static_cast_1 (tree type, tree expr, bool c_ ambiguity. However, if this is a static_cast being performed because the user wrote a C-style cast, then accessibility is not considered. */ - base = lookup_base (TREE_TYPE (type), intype, - c_cast_p ? ba_unique : ba_check, - NULL); + base = lookup_base_sfinae (TREE_TYPE (type), intype, +c_cast_p ? ba_unique : ba_check, +NULL, complain); /* Convert from "B*" to "D*". This function will check that "B" is not a virtual base of "D". */ @@ -6119,9 +6120,9 @@ build_static_cast_1 (tree type, tree expr, bool c_ && check_for_casting_away_constness (intype, type, STATIC_CAST_EXPR, complain)) return error_mark_node; - base = lookup_base (TREE_TYPE (type), TREE_TYPE (intype), - c_cast_p ? ba_unique : ba_check, - NULL); + base = lookup_base_sfinae (TREE_TYPE (type), TREE_TYPE (intype), +c_cast_p ? ba_unique : ba_check, +NULL, complain); expr = build_base_path (MINUS_EXPR, expr, base, /*nonnull=*/false, complain); return cp_fold_convert(type, expr); @@ -7181,16 +7182,11 @@ get_delta_difference_1 (tree from, tree to, bool c { tree binfo; base_kind kind; - base_access access = c_cast_p ? ba_unique : ba_check; - /* Note: ba_quiet does not distinguish between access control and - ambiguity. 
*/ - if (!(complain & tf_error)) -access |= ba_quiet; + binfo = lookup_base_sfinae (to, from, c_cast_p ? ba_unique : ba_check, + &kind, complain); - binfo = lookup_base (to, fr
Re: [cxx-conversion] Make double_int a class with methods and operators. (issue6443093)
On 8/7/12, Richard Guenther wrote: > On Tue, Aug 7, 2012 at 2:35 AM, Lawrence Crowl wrote: > > Convert double_int from a struct with function into a class with > > operators and methods. > > > > This patch adds the methods and operators. In general functions of > > the form "double_int_whatever" become member functions "whatever" or, > > when possible, operators. > > > > Every attempt has been made to preserve the existing algorithms, even > > at the expense of some optimization opportunities. Some examples: > > > > The ext operation takes a value and returns a value. However, that > > return value is usually assigned to the original variable. An > > operation that modified a variable would be more efficient. > > That's not always the case though and I think the interface should be > consistent with existing behavior to avoid errors creeping in during the > transition. We should probably think about naming conventions for mutating operations, as I expect we will want them eventually. > > In some cases, an outer sign-specific function calls an inner > > function with the sign as a parameter, which then decides which > > implementation to do. Decisions should not be artificially > > introduced, and the implementation of each case should be exposed as > > a separate routine. > > > > The existing operations are implemented in terms of the new > > operations, which necessarily adds a layer between the new code and > > the existing users. Once all files have migrated, this layer will > > be removed. > > > > There are several existing operations implemented in terms of even > > older legacy operations. This extra layer has not been removed. > > > > On occasion though, parameterized functions are often called > > with constant arguments. To support static statement of intent, > > and potentially faster code in the future, there are several new > > unparameterized member functions. 
Some examples: > > > > Four routines now encode both 'arithmetic or logical' and 'right or > > left' shift as part of the function name. > > > > Four routines now encode both 'signed or unsigned' and 'less than or > > greater than' as part of the function name. > > For most parts overloads that take an (unsigned) HOST_WIDE_INT argument > would be nice, as well as the ability to say dbl + 1. Hm. There seems to be significant opinion that there should not be any implicit conversions. I am okay with operations as above, but would like to hear the opinions of others. > > -typedef struct > > +typedef struct double_int > > { > > +public: > > + /* Normally, we would define constructors to create instances. > > + Two things prevent us from doing so. > > + First, defining a constructor makes the class non-POD in C++03, > > + and we certainly want double_int to be a POD. > > + Second, the GCC coding conventions prefer explicit conversion, > > + and explicit conversion operators are not available until C++11. > > */ > > + > > + static double_int make (unsigned HOST_WIDE_INT cst); > > + static double_int make (HOST_WIDE_INT cst); > > + static double_int make (unsigned int cst); > > + static double_int make (int cst); > > Did we somehow decide to not allow constructors? It's odd to convert to > C++ and end up with static member functions resembling them ... Constructors are allowed, but PODs are often passed more efficiently. That property seemed particularly important for double_int. > Also I believe the conversion above introduces possible migration errors. > Think of a previous > > HOST_WIDE_INT a; > double_int d = uhwi_to_double_int (a); > > if you write that now as > > HOST_WIDE_INT a; > double_int d = double_int::make (a); > > you get the effect of shwi_to_double_int. Oops. Hm. I think the code was more likely to be wrong originally, but I take your point. > So as an intermediate > I'd like you _not_ to introduce the make () overloads. Okay. 
> Btw, if HOST_WIDE_INT == int the above won't even compile. Is that true of any host? I am not aware of any. Anyway, it is moot if we remove the overloads. -- Lawrence Crowl
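Richard's migration hazard reduces to ordinary overload resolution; `dint` below is a simplified stand-in for the reviewed double_int (invented layout, for illustration only):

```cpp
struct dint
{
  unsigned long long low;
  long long high;   // extension of the input value lands here

  // uhwi-style overload: zero-extend.
  static dint make (unsigned long long cst) { return dint {cst, 0}; }
  // shwi-style overload: sign-extend.
  static dint make (long long cst)
  { return dint {(unsigned long long) cst, cst < 0 ? -1LL : 0LL}; }
};
```

Rewriting a former `uhwi_to_double_int (a)` call as `dint::make (a)`, where `a` is a signed HOST_WIDE_INT, silently selects the signed overload — the shwi_to_double_int behavior Richard warns about.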
Re: [cxx-conversion] Make double_int a class with methods and operators. (issue6443093)
On 8/7/12, Mike Stump wrote: > On Aug 6, 2012, at 5:35 PM, Lawrence Crowl wrote: >> Convert double_int from a struct with function into a class with >> operators and methods. > > We have a wide_int class that replaces this class. :-( Really? Where? I see neither definition nor use under gcc. > It would have been better to just convert it. That depends on how easy it is to migrate. In the process I learned that double_int isn't actually a double int, but is a precision-controlled int. -- Lawrence Crowl
[patch] PR54150
Hello, This patch fixes PR54150. The patch was approved by Richi in Bugzilla, but I won't commit this for a few days so that others can speak up if they don't agree. Ciao! Steven PR54150.diff Description: Binary data
Re: [patch] PR54150
On Tue, Aug 7, 2012 at 1:54 PM, Steven Bosscher wrote: > Hello, > > This patch fixes PR54150. The patch was approved by Richi in Bugzilla, > but I won't commit this for a few days so that others can speak up if > they don't agree. > > Ciao! > Steven OK. Thanks for the cleanup! -- Gaby
Re: PATCH: PR rtl-optimization/54157: [x32] -maddress-mode=long failures
On Sun, Aug 5, 2012 at 12:47 AM, Richard Sandiford wrote: > For the record, I can't approve this, but... > > "H.J. Lu" writes: >> i386.md has >> >> (define_expand "extzv" >> [(set (match_operand:SI 0 "register_operand") >> (zero_extract:SI (match_operand 1 "ext_register_operand") >> (match_operand:SI 2 "const8_operand") >> (match_operand:SI 3 "const8_operand")))] >> "" >> >> and mode_for_extraction picks word_mode for operand 1 since >> its mode is VOIDmode. This patch changes mode_for_extraction >> to return the mode of operand 1 if the pattern accepts any mode. >> I added *jcc_btsi_mask_2 since combine now tries a different >> pattern, which leads to test failures on gcc.target/i386/bt-mask-1.c >> and gcc.target/i386/bt-mask-2. I didn't update *jcc_btsi_mask_1 >> instead since I am not sure if it is used elsewhere. Tested on >> Linux/x86-64 and Linux/x32. OK for trunk? > > the mode of the extraction operand is defined to be word_mode > for registers (see md.texi), so that at least would need to > be updated. But I'm not convinced that the wanted_inner_mode here: > >if (! in_dest && unsignedp > - && mode_for_extraction (EP_extzv, -1) != MAX_MACHINE_MODE) > + && mode_for_extraction (EP_extzv, -1, VOIDmode) != MAX_MACHINE_MODE) > { > - wanted_inner_reg_mode = mode_for_extraction (EP_extzv, 1); > - pos_mode = mode_for_extraction (EP_extzv, 3); > - extraction_mode = mode_for_extraction (EP_extzv, 0); > + wanted_inner_reg_mode = mode_for_extraction (EP_extzv, 1, > + inner_mode); > + pos_mode = mode_for_extraction (EP_extzv, 3, VOIDmode); > + extraction_mode = mode_for_extraction (EP_extzv, 0, VOIDmode); > } > > is right. inner_mode is the mode of the thing we're extracting, > which doesn't necessarily have anything to do with what the ext* > patterns support. > > FWIW, in reply to your force_to_mode message, gen_lowpart_for_combine > looks a bit odd: > > if (omode == imode) > return x; > > /* Return identity if this is a CONST or symbolic reference. 
*/ > if (omode == Pmode > && (GET_CODE (x) == CONST > || GET_CODE (x) == SYMBOL_REF > || GET_CODE (x) == LABEL_REF)) > return x; > > So if we know the modes are different, we nevertheless return the > original mode for CONST, SYMBOL_REF or LABEL_REF. Surely the caller > isn't going to be expecting that? If we're not prepared to change > the mode to the one that the caller asked for then I think we should > goto fail instead. > > I don't know if you're hitting that or not. > > Richard Yes, I did Breakpoint 5, gen_lowpart_for_combine (omode=DImode, x=0x71b03440) at /export/gnu/import/git/gcc-misc/gcc/combine.c:10778 10778 { (const:SI (plus:SI (symbol_ref:SI ("tmp2") ) (const_int 99452 [0x1847c]))) (gdb) c Continuing. Breakpoint 5, gen_lowpart_for_combine (omode=DImode, x=0x71b03440) at /export/gnu/import/git/gcc-misc/gcc/combine.c:10778 10778 { (const:SI (plus:SI (symbol_ref:SI ("tmp2") ) (const_int 99452 [0x1847c]))) (gdb) bt #0 gen_lowpart_for_combine (omode=DImode, x=0x71b03440) at /export/gnu/import/git/gcc-misc/gcc/combine.c:10778 #1 0x00c83805 in gen_lowpart_or_truncate (x=0x71b03440, mode=DImode) at /export/gnu/import/git/gcc-misc/gcc/combine.c:8110 #2 force_to_mode (x=0x71b02cd8, mode=DImode, mask=, just_select=) at /export/gnu/import/git/gcc-misc/gcc/combine.c:8389 #3 0x00c8458a in make_extraction (mode=SImode, inner=0x71b02cd8, pos=2, pos_rtx=, len=2, unsignedp=1, in_dest=, in_compare=0) at /export/gnu/import/git/gcc-misc/gcc/combine.c:7480 #4 0x00c86f9c in make_compound_operation (x=x@entry=0x71b02d68, in_code=in_code@entry=SET) at /export/gnu/import/git/gcc-misc/gcc/combine.c:7863 #5 0x00c89924 in simplify_set (x=0x71af7c78) at /export/gnu/import/git/gcc-misc/gcc/combine.c:6539 #6 combine_simplify_rtx (x=0x71af7c78, op0_mode=VOIDmode, in_dest=in_dest@entry=0, in_cond=in_cond@entry=0) at /export/gnu/import/git/gcc-misc/gcc/combine.c:5971 #7 0x00c8bd1b in subst (x=, from=, to=, in_dest=0, in_cond=0, unique_copy=0) at 
/export/gnu/import/git/gcc-misc/gcc/combine.c:5301 #8 0x00c8b8ab in subst (x=0x71aea870, from=0x71afc880, ---Type to continue, or q to quit--- to=0x71b02cd8, in_dest=0, in_cond=0, unique_copy=0) at /export/gnu/import/git/gcc-misc/gcc/combine.c:5165 #9 0x00c8f875 in try_combine (i3=i3@entry=0x71afe5a0, i2=i2@entry=0x71afe558, i1=, i0=, i0@entry=0x0, new_direct_jump_p=new_direct_jump_p@entry=0x7fffdb64, last_combined_insn=last_combined_insn@entry=0x71afe5a0) at /export/gnu/import/git/gcc-misc/gcc/combine.c:3260 #10 0x00c94673 in combine_instructions (nregs=, f=) at /export/gnu/import/
Re: [cxx-conversion] Make double_int a class with methods and operators. (issue6443093)
On Aug 7, 2012, at 11:38 AM, Lawrence Crowl wrote: > Hm. There seems to be significant opinion that there should not be any > implicit conversions. I am okay with operations as above, but would like > to hear the opinions of others. If there is an agreed upon and expected semantic, having them is useful. In the wide-int world, which replaces double_int, I think there is an agreeable semantic and I think it is useful, so, I think we should plan on having them, though, I'd be fine with punting their implementation until such time as someone needs it. If no one ever needs the routine, I don't see the harm in not implementing it. >> Did we somehow decide to not allow constructors? It's odd to convert to >> C++ and end up with static member functions resembling them ... > > Constructors are allowed, but PODs are often passed more efficiently. > That property seemed particularly important for double_int. Show us the difference in timing. Show us the generated code. I can't imagine that it could ever matter. I think a pretty ABI is worth the cost.
Re: C++ PATCH for c++/48370 (extending lifetime of temps in aggregate initialization)
On Tue, Jun 12, 2012 at 10:00 PM, H.J. Lu wrote: > On Thu, Nov 3, 2011 at 8:50 PM, Jason Merrill wrote: >> 12.2 states that a temporary bound to a reference lives as long as the >> reference itself. We have done that for reference variables, but not in >> other cases, such as aggregate initialization of a struct with reference >> members. In C++11, elements of a std::initializer_list have the same >> semantics; they live as long as the std::initializer_list. Again, we were >> implementing that for initializer_list variables but not for >> initializer_list subobjects. This patch fixes that. >> >> Furthermore, if a temporary's lifetime is extended, we need to also extend >> the lifetimes of any temporaries bound to references in its initializer, and >> so on. >> >> The patch introduces a function extend_ref_init_temps called from >> store_init_value after the call to digest_init. To expose elements of an >> initializer_list to this function, I needed to stop using >> build_aggr_init_full_exprs for aggregate initialization of arrays, and >> consequently needed to call build_vec_init from store_init_value to use one >> EH region for cleaning up the whole array rather than one per element. To >> deal with multiple extended temporaries, we need to change the cleanup >> pointer from a single tree to a VEC, and add a discriminator to the mangled >> name of reference init temporaries; this has no ABI impact, since the >> temporaries have no linkage, but I also updated the demangler accordingly. >> >> Since we now do lifetime extension in extend_ref_init_temps, we can >> drastically simplify initialize_reference and do away with >> build_init_list_var_init. >> >> Tested x86_64-pc-linux-gnu, applying to trunk. > > This caused: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53650 > This also caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54197 -- H.J.
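The 12.2 rule the patch extends to aggregate initialization fits in a few lines; this sketches the language semantics being implemented, not the compiler internals:

```cpp
struct has_ref
{
  const int &r;   // reference member
};

int observe ()
{
  // Aggregate initialization binds the temporary int(42) to s.r; under
  // 12.2 (and with this patch), the temporary lives as long as s
  // itself, so the read below is well-defined.
  has_ref s {42};
  return s.r;
}
```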
Re: [google/gcc-4_7] Backport patch to add line info to non-virtual thunks
Looks good to me for google-4_7 branches. -Rong On Mon, Aug 6, 2012 at 5:03 PM, Cary Coutant wrote: > This patch is for the google/gcc-4_7 branch. It backports the following > patch from trunk at r190190: > > http://gcc.gnu.org/ml/gcc-patches/2012-08/msg00321.html > > GCC generates non-virtual thunks directly to assembly code, which > causes a couple of problems. First, it doesn't add source location > information to the thunk, so the thunk simply inherits the random > location from the end of the function in front of it (the function > it's a thunk for). In two different compilation units compiled > with different options, this could result in two different locations > for the same thunk, and gold will give a false positive ODR > violation warning. > > Second, if you try to compile with -S -dA, GCC crashes in final(), > where it's trying to build a mapping from insn to bb, because the > function has no cfg, and FOR_EACH_BB_REVERSE tries to dereference > cfun->cfg without checking for non-NULL. > > Bootstrapped and ran testsuite with no regressions. > > Google ref b/6891766. > > > 2012-08-06 Cary Coutant > > gcc/ > * cgraphunit.c (assemble_thunk): Add source line info. > * final.c (final): Check for non-null cfg pointer. > > gcc/testsuite/ > * g++.dg/debug/dwarf2/non-virtual-thunk.C: New test case. > > > Index: gcc/final.c > === > --- gcc/final.c (revision 190189) > +++ gcc/final.c (working copy) > @@ -1761,11 +1761,13 @@ final (rtx first, FILE *file, int optimi >start_to_bb = XCNEWVEC (basic_block, bb_map_size); >end_to_bb = XCNEWVEC (basic_block, bb_map_size); > > - FOR_EACH_BB_REVERSE (bb) > - { > - start_to_bb[INSN_UID (BB_HEAD (bb))] = bb; > - end_to_bb[INSN_UID (BB_END (bb))] = bb; > - } > + /* There is no cfg for a thunk. */ > + if (!cfun->is_thunk) > + FOR_EACH_BB_REVERSE (bb) > + { > + start_to_bb[INSN_UID (BB_HEAD (bb))] = bb; > + end_to_bb[INSN_UID (BB_END (bb))] = bb; > + } > } > >/* Output the insns. 
*/ > Index: gcc/cgraphunit.c > === > --- gcc/cgraphunit.c(revision 190189) > +++ gcc/cgraphunit.c(working copy) > @@ -1799,6 +1799,10 @@ assemble_thunk (struct cgraph_node *node >init_function_start (thunk_fndecl); >cfun->is_thunk = 1; >assemble_start_function (thunk_fndecl, fnname); > + (*debug_hooks->source_line) (DECL_SOURCE_LINE (thunk_fndecl), > + DECL_SOURCE_FILE (thunk_fndecl), > + /* discriminator */ 0, > + /* is_stmt */ 1); > >targetm.asm_out.output_mi_thunk (asm_out_file, thunk_fndecl, >fixed_offset, virtual_value, alias); > Index: gcc/testsuite/g++.dg/debug/dwarf2/non-virtual-thunk.C > === > --- gcc/testsuite/g++.dg/debug/dwarf2/non-virtual-thunk.C (revision 0) > +++ gcc/testsuite/g++.dg/debug/dwarf2/non-virtual-thunk.C (revision 0) > @@ -0,0 +1,39 @@ > +// { dg-do compile } > +// { dg-options "-g2 -dA" } > + > +// Verify that line number info is output for the non-virtual > +// thunks for C::~C(). > +// { dg-final { scan-assembler "thunk.C:30" } } > + > +class A > +{ > + public: > + A(); > + virtual ~A(); > + private: > + int i; > +}; > + > +class B > +{ > + public: > + B(); > + virtual ~B(); > + private: > + int i; > +}; > + > +class C : public A, public B > +{ > + public: > + C(); > + virtual ~C(); // line 30 > +}; > + > +C::C() > +{ > +} > + > +C::~C() > +{ > +}
[PATCH][Cilkplus] Remove unwanted static chain.
Hello Everyone, This patch is for the Cilk Plus branch affecting mainly the C++ compiler. This patch will store the initial value of a loop correctly and remove the unnecessary static chain usage for some cases of Cilk_for. Thanks, Balaji V. Iyer. Index: parser.c === --- parser.c(revision 190195) +++ parser.c(working copy) @@ -28351,6 +28351,13 @@ FOR_EXPR (statement) = decl; CILK_FOR_GRAIN (statement) = grain; + /* If an initial value is available, and it is of type integer, then we + save it in CILK_FOR_INIT. */ + if (init && TREE_TYPE (init) && INTEGRAL_TYPE_P (TREE_TYPE (init))) +CILK_FOR_INIT (statement) = init; + else +CILK_FOR_INIT (statement) = NULL_TREE; + finish_cilk_for_init_stmt (statement); if (cp_lexer_next_token_is (parser->lexer, CPP_COMMA)) Index: ChangeLog.cilk === --- ChangeLog.cilk (revision 190195) +++ ChangeLog.cilk (working copy) @@ -1,3 +1,7 @@ +2012-08-07 Balaji V. Iyer + + * parser.c (cp_parser_cilk_for): Stored the initial value in cilk_for tree. + 2012-08-01 Balaji V. Iyer * parser.c (cp_parser_userdef_char_literal): Replaced CALL_SPAWN and
RE: [PATCH][Cilkplus] Remove unwanted static chain.
Sorry, I was in the wrong directory when I was creating the patch. Here is the fixed patch. Thanks, Balaji V. Iyer. -Original Message- From: Iyer, Balaji V Sent: Tuesday, August 07, 2012 4:18 PM To: 'gcc-patches@gcc.gnu.org' Subject: [PATCH][Cilkplus] Remove unwanted static chain. Hello Everyone, This patch is for the Cilk Plus branch affecting mainly the C++ compiler. This patch will store the initial value of a loop correctly and remove the unnecessary static chain usage for some cases of Cilk_for. Thanks, Balaji V. Iyer. Index: gcc/cp/parser.c === --- gcc/cp/parser.c (revision 190195) +++ gcc/cp/parser.c (working copy) @@ -28351,6 +28351,13 @@ FOR_EXPR (statement) = decl; CILK_FOR_GRAIN (statement) = grain; + /* If an initial value is available, and it is of type integer, then we + save it in CILK_FOR_INIT. */ + if (init && TREE_TYPE (init) && INTEGRAL_TYPE_P (TREE_TYPE (init))) +CILK_FOR_INIT (statement) = init; + else +CILK_FOR_INIT (statement) = NULL_TREE; + finish_cilk_for_init_stmt (statement); if (cp_lexer_next_token_is (parser->lexer, CPP_COMMA)) Index: gcc/cp/ChangeLog.cilk === --- gcc/cp/ChangeLog.cilk (revision 190195) +++ gcc/cp/ChangeLog.cilk (working copy) @@ -1,3 +1,7 @@ +2012-08-07 Balaji V. Iyer + + * parser.c (cp_parser_cilk_for): Stored the initial value in cilk_for tree. + 2012-08-01 Balaji V. Iyer * parser.c (cp_parser_userdef_char_literal): Replaced CALL_SPAWN and
Re: [cxx-conversion] Make double_int a class with methods and operators. (issue6443093)
On Aug 7, 2012, at 11:42 AM, Lawrence Crowl wrote:
> On 8/7/12, Mike Stump wrote:
>> On Aug 6, 2012, at 5:35 PM, Lawrence Crowl wrote:
>>> Convert double_int from a struct with functions into a class with
>>> operators and methods.
>>
>> We have a wide_int class that replaces this class. :-(
>
> Really? Where? I see neither definition nor use under gcc.

It is being developed; it is largely done, but is going through final
reviews now. When we finish with those review bits, we'll send it out.
It sounds like you'll beat us to the merge window, so we'll cope with
the fallout.

>> It would have been better to just convert it.
>
> That depends on how easy it is to migrate. In the process I learned
> that double_int isn't actually a double int, but is a
> precision-controlled int.

Well, I'd call it a pair of HOST_WIDE_INTs... I think calling it a
precision-controlled int is a tad optimistic. For example, if you want
to control it to be 2048 bits, you'd find that your hopes are
misplaced. :-)
[PATCH, MIPS] fix MIPS16 hard-float function stub bugs
This patch fixes a group of bugs that were causing link errors on hard-float MIPS16 code built with a mips-linux-gnu toolchain. This is Mark Mitchell's analysis of the original problem: The MIPS16 instruction set cannot directly access hard-float registers, so helper functions in libgcc are used to access floating-point registers. In addition, the compiler handles functions which take floating-point values as arguments specially. It generates a MIPS16 version of the function that expects the floating-point arguments in general-purpose registers. But, since the hard-float ABI requires that floating-point arguments are passed in floating-point registers, the compiler also generates a MIPS32 stub which moves the floating-point values into general-purpose registers and then tail-calls the MIPS16 function. MIPS32 functions call the MIPS32 stub, while MIPS16 functions call the MIPS16 version of the function. The problem reported in the issue is an undefined symbol at link-time. The undefined symbol is *not* one of the libgcc helper functions. It is also *not* the name of a MIPS32->MIPS16 stub function. It's a third category: a local alias for the MIPS16 function. When generating a stub, the compiler also generates a local (i.e., not visible outside the object module) alias for the main function (not the stub). So, for a function "f" with floating-point arguments, we end up with: (a) "f" itself, a MIPS16 function, expecting floating-point arguments in general-purpose registers (b) "__fn_stub_f", a MIPS32 function, expecting floating-point arguments in floating-point registers. This function moves the arguments into general-purpose registers and then tail-calls "f". (c) "__fn_local_f", a local alias. The purpose of the alias is as follows. When making an indirect call (i.e., a call via a function-pointer) to a function which either takes floating-point arguments or returns a floating-point value from MIPS16 code, we must use a helper function in libgcc. 
The helper function moves values to/from the floating-point registers to conform to the normal MIPS32 ABI. But, if we're in MIPS16 code, and calling another MIPS16 function indirectly, there's no point to shuffle things in and out of floating-point registers. In that case, we can skip the helper function and just directly call the MIPS16 function. And, if we're calling from within the same object file, we don't want to use the name "f" -- that will cause the linker to generate extra goop that we don't need in the final executable. So, we optimize by calling via the local alias. But, herein lies the bug. We're trying to use the alias when we have a function that returns a floating-point value -- even though it doesn't have any floating-point arguments (see mips16_build_call_stub). But, we only generate stubs for functions that have floating-point arguments (see mips16_build_function_stub) -- not for functions that have floating-point return values. And we generate aliases as the same time that we generate stubs. So, we never generate the alias. So, the fix for this is to generate a local alias (but not a stub) for functions that return floating-point values, even if they do not have floating-point arguments. In addition to that, there were still some related problems that had to be fixed to get this to work right: * mips16_local_function_p, which is called from mips16_build_call_stub to detect the situations where we can optimize local calls, wasn't checking for weak or linkonce definitions, where the final copy of the function selected by the linker might come from a different object. * The stubs for comdat functions need to go in the same comdat group as the function. * The local stub symbol names should use the conventions for local symbol names. We've had these fixes in our local code base for some time, and I regression-tested on a mips-linux-gnu mainline build. OK to check in? -Sandra 2012-08-07 Mark Mitchell Maciej W. 
Rozycki Julian Brown gcc/ * config/mips/mips.c (mips16_local_function_p): Also reject weak and linkonce definitions. (mips16_local_alias): Use LOCAL_LABEL_PREFIX for the symbol generated. (mips16_build_function_stub): Split out the alias creation part. Declare stub once-only if target function is. (mips16_build_function_stub_and_local_alias): New function to do both things. (mips16_build_call_stub): Declare stub once-only if target function is. (mips_output_function_prologue): Also check for cases where a local alias only is needed to handle a floating-point return value. gcc/testsuite/ * gcc.target/mips/mips.exp (mips_option_groups): Add interlink-mips16. * gcc.target/mips/mips16-fn-local.c: New test. Index: gcc/config/mips/mips.c === --- gcc/config
[PATCH, debug]: Fix PR 54177, Segfault in cselib_lookup due to NULL_RTX passed from vt_add_function_parameter
Hello!

We can exit early from var_lowpart for matched modes. This avoids the
situation where var_lowpart tries to change

(gdb) p debug_rtx (incoming)
(concat:SC (reg:SF 48 $f16 [ x ])
           (reg:SF 49 $f17 [ x+4 ]))
$1 = void

to SCmode, and returns NULL_RTX, since the RTX is neither REG nor MEM.

2012-08-07  Uros Bizjak

	PR debug/54177
	* var-tracking.c (var_lowpart): Exit early for matched modes.

Tested on x86_64-pc-linux-gnu {,-m32} and alphaev68-unknown-linux-gnu,
where the patch also fixes a testsuite failure. Patch was approved in
the PR by Alexandre, and is committed to mainline SVN.

Uros.

Index: var-tracking.c
===
--- var-tracking.c	(revision 190150)
+++ var-tracking.c	(working copy)
@@ -5086,12 +5086,12 @@ var_lowpart (enum machine_mode mode, rtx loc)
 {
   unsigned int offset, reg_offset, regno;
 
+  if (GET_MODE (loc) == mode)
+    return loc;
+
   if (!REG_P (loc) && !MEM_P (loc))
     return NULL;
 
-  if (GET_MODE (loc) == mode)
-    return loc;
-
   offset = byte_lowpart_offset (mode, GET_MODE (loc));
 
   if (MEM_P (loc))
Re: Assembly output optimisations (was: PR 51094 - fprint_w() in output_addr_const() reinstated)
I should mention that with my patch .ascii is used more aggressively than
before, so if a string is longer than ELF_STRING_LIMIT all of it will be
written as .ascii, while in the past it would use .string for the string's
tail. Example diff to original behaviour:

 .LASF15458:
-	.ascii "SSA_VAR_P(DECL) (TREE_CODE (DECL) == VA"
-	.string "R_DECL || TREE_CODE (DECL) == PARM_DECL || TREE_CODE (DECL) == RESULT_DECL || (TREE_CODE (DECL) == SSA_NAME && (TREE_CODE (SSA_NAME_VAR (DECL)) == VAR_DECL || TREE_CODE (SSA_NAME_VAR (DECL)) == PARM_DECL || TREE_CODE (SSA_NAME_VAR (DECL)) == RESULT_DECL)))"
+	.ascii "SSA_VAR_P(DECL) (TREE_CODE (DECL) == VAR_DECL || TREE_CODE (D"
+	.ascii "ECL) == PARM_DECL || TREE_CODE (DECL) == RESULT_DECL || (TREE"
+	.ascii "_CODE (DECL) == SSA_NAME && (TREE_CODE (SSA_NAME_VAR (DECL)) "
+	.ascii "== VAR_DECL || TREE_CODE (SSA_NAME_VAR (DECL)) == PARM_DECL |"
+	.ascii "| TREE_CODE (SSA_NAME_VAR (DECL)) == RESULT_DECL)))\000"

BTW I can't find why ELF_STRING_LIMIT is only 256; it seems GAS supports
arbitrary lengths. I'd have to change my code if we ever set it too high
(or even unlimited) since I allocate the buffer on the stack.

Dimitris
Re: Value type of map need not be default copyable
On Sat, Aug 4, 2012 at 10:34 AM, Paolo Carlini wrote: > > > First, I think we should add libstdc++ in CC. > > Thus, I would recommend people working in this area to begin with > unordered_map, because in that case emplace is already available, assuming > that's really the point (and therefore reverting the patch was a good idea, > luckily an existing testcase helped us) > > At the same time an implementation of emplace is definitely welcome, in > any case. I've attached a patch for unordered_map which solves the rvalue reference problem. For efficiency, I've created a new _M_emplace_bucket method rather than call emplace directly. I've verified all libstdc++ tests pass (sorry for the previous oversight) and am running the full GCC test suite now. However, I'd appreciate any feedback on whether this is a reasonable approach. STL hacking is way outside my comfort zone. ;-) If this looks good, I'll take a stab at std::map. Thanks, Ollie 2012-08-03 Ollie Wild * include/bits/hashtable.h (_M_emplace_bucket): New function. * include/bits/hashtable_policy.h (operator[](key_type&&)): Replace _M_insert_bucket call with _M_emplace_bucket. * testsuite/23_containers/unordered_map/operators/2.cc: New test. commit dd690a5f164326c552c2450af6270ec27e9bfd8e Author: Ollie Wild Date: Tue Aug 7 16:34:05 2012 -0500 2012-08-03 Ollie Wild * include/bits/hashtable.h (_M_emplace_bucket): New function. * include/bits/hashtable_policy.h (operator[](key_type&&)): Replace _M_insert_bucket call with _M_emplace_bucket. * testsuite/23_containers/unordered_map/operators/2.cc: New test. 
2012-08-04 Paolo Carlini Revert: diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h index 2faf0b3..869b0e9 100644 --- a/libstdc++-v3/include/bits/hashtable.h +++ b/libstdc++-v3/include/bits/hashtable.h @@ -588,6 +588,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION iterator _M_insert_bucket(_Arg&&, size_type, __hash_code); + template + iterator + _M_emplace_bucket(size_type, __hash_code, _Args&&... __args); + template std::pair @@ -1356,6 +1360,51 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } } + // Insert v in bucket n (assumes no element with its key already present). + template +template + typename _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, + _H1, _H2, _Hash, _RehashPolicy, + _Traits>::iterator + _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, +_H1, _H2, _Hash, _RehashPolicy, _Traits>:: + _M_emplace_bucket(size_type __n, __hash_code __code, _Args&&... __args) + { + // First build the node to get access to the hash code + __node_type* __node = _M_allocate_node(std::forward<_Args>(__args)...); + __try + { + + const __rehash_state& __saved_state = _M_rehash_policy._M_state(); + std::pair __do_rehash + = _M_rehash_policy._M_need_rehash(_M_bucket_count, + _M_element_count, 1); + + if (__do_rehash.first) + { + const key_type& __k = this->_M_extract()(__node->_M_v); + __n = __hash_code_base::_M_bucket_index(__k, __code, + __do_rehash.second); + } + + this->_M_store_code(__node, __code); + if (__do_rehash.first) + _M_rehash(__do_rehash.second, __saved_state); + + _M_insert_bucket_begin(__n, __node); + ++_M_element_count; + return iterator(__node); + } + __catch(...) + { + _M_deallocate_node(__node); + __throw_exception_again; + } + } + // Insert v if no element with its key is already present. 
template_M_find_node(__n, __k, __code); if (!__p) - return __h->_M_insert_bucket(std::make_pair(std::move(__k), - mapped_type()), -__n, __code)->second; + return __h->_M_emplace_bucket(__n, __code, + std::move(__k), mapped_type())->second; return (__p->_M_v).second; } diff --git a/libstdc++-v3/testsuite/23_containers/unordered_map/operators/2.cc b/libstdc++-v3/testsuite/23_containers/unordered_map/operators/2.cc new file mode 100644 index 000..1940fa2 --- /dev/null +++ b/libstdc++-v3/testsuite/23_containers/unordered_map/operators/2.cc @@ -0,0 +1,44 @@ +// Copyright (C) 2012 +// Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your opti
Re: Value type of map need not be default copyable
Hi,

(adding in CC Francois too)

On 08/07/2012 11:43 PM, Ollie Wild wrote:
> On Sat, Aug 4, 2012 at 10:34 AM, Paolo Carlini wrote:
>> First, I think we should add libstdc++ in CC. Thus, I would recommend
>> people working in this area to begin with unordered_map, because in
>> that case emplace is already available, assuming that's really the
>> point (and therefore reverting the patch was a good idea, luckily an
>> existing testcase helped us). At the same time an implementation of
>> emplace is definitely welcome, in any case.
>
> I've attached a patch for unordered_map which solves the rvalue
> reference problem. For efficiency, I've created a new _M_emplace_bucket
> method rather than call emplace directly.

Patch looks already pretty good to me. A couple of comments:

- Did you consider whether _M_emplace_bucket could be used as an
implementation detail of _M_emplace? Or figure out in any case a
smart(er) refactoring to share code with the implementation of
_M_emplace. I didn't study your code in detail, but I think we can avoid
some redundancy.

- For consistency at least, I think we should get rid of the make_pair
in the other overload too.

> I've verified all libstdc++ tests pass (sorry for the previous
> oversight) and am running the full GCC test suite now. However, I'd
> appreciate any feedback on whether this is a reasonable approach. STL
> hacking is way outside my comfort zone. ;-)
>
> If this looks good, I'll take a stab at std::map.

That would be great. We have a PR which has remained open way too long.
The challenge is figuring out a way to avoid writing too much code
similar to what we have already for the inserts. Honestly, I tried a
while ago and failed, didn't see an easy way, but in any case it's time
to have this stuff implemented - I hear Marc ;) - one way or the other,
for 4.8.0.

Thanks,
Paolo.
Re: s390: Avoid CAS boolean output inefficiency
On 08/07/2012 10:02 AM, Ulrich Weigand wrote:
> The following patch changes the builtin expander to pass a MEM oldval
> as-is to the back-end expander, so that the back-end can move the
> store to before the CC operation. With that patch I'm also seeing
> all the IPMs disappear.
...
> What do you think about this solution? It has the advantage that
> we still get the same xor code if we actually do need the ipm ...

I'm ok with that patch.

r~
Re: Assembly output optimisations (was: PR 51094 - fprint_w() in output_addr_const() reinstated)
On Tue, Aug 7, 2012 at 2:24 PM, Dimitrios Apostolou wrote:
>
> BTW I can't find why ELF_STRING_LIMIT is only 256, it seems GAS supports
> arbitrary lengths. I'd have to change my code if we ever set it too high (or
> even unlimited) since I allocate the buffer on the stack.

See the comment for ELF_STRING_LIMIT in config/elfos.h.  gas is not
the only ELF assembler.

Ian
[google] Remove deprecated pfmon-specific functions/structs from pmu-profile.c (issue6442086)
Removes references in libgcov.c to functions and structs removed from pmu-profile.c For google/main. Tested with crosstools and bootstrap. 2012-08-07 Chris Manghane * libgcc/pmu-profile.c (enum pmu_tool_type): Remove pfmon-specific functions/structs. (enum pmu_event_type): Ditto. (enum pmu_state): Ditto. (enum cpu_vendor_signature): Ditto. (struct pmu_tool_info): Ditto. (void gcov_write_branch_mispredict_infos): Ditto. (get_x86cpu_vendor): Ditto. (parse_pmu_profile_options): Ditto. (start_addr2line_symbolizer): Ditto. (reset_symbolizer_parent_pipes): Ditto. (reset_symbolizer_child_pipes): Ditto. (end_addr2line_symbolizer): Ditto. (symbolize_addr2line): Ditto. (start_pfmon_module): Ditto. (convert_pct_to_unsigned): Ditto. (parse_load_latency_line): Ditto. (parse_branch_mispredict_line): Ditto. (parse_pfmon_load_latency): Ditto. (parse_pfmon_tool_header): Ditto. (parse_pfmon_branch_mispredicts): Ditto. (pmu_start): Ditto. (init_pmu_branch_mispredict): Ditto. (init_pmu_tool): Ditto. (__gcov_init_pmu_profiler): Ditto. (__gcov_start_pmu_profiler): Ditto. (__gcov_stop_pmu_profiler): Ditto. (gcov_write_branch_mispredict_line): Ditto. (gcov_write_load_latency_infos): Ditto. (gcov_write_branch_mispredict_infos): Ditto. (gcov_write_tool_header): Ditto. (__gcov_end_pmu_profiler): Ditto. * libgcc/libgcov.c (gcov_alloc_filename): Remove references to pfmon specific functions/structs. (pmu_profile_stop): Ditto. (gcov_exit): Ditto. (__gcov_init): Ditto. (__gcov_flush): Ditto. 
Index: libgcc/pmu-profile.c === --- libgcc/pmu-profile.c(revision 190135) +++ libgcc/pmu-profile.c(working copy) @@ -67,169 +67,11 @@ see the files COPYING3 and COPYING.RUNTIME respect #define XDELETEVEC(p) free(p) #define XDELETE(p) free(p) -#define PFMON_CMD "/usr/bin/pfmon" -#define ADDR2LINE_CMD "/usr/bin/addr2line" -#define PMU_TOOL_MAX_ARGS (20) -static char default_addr2line[] = "??:0"; -static const char pfmon_ll_header[] = "# counts %self%cum " -"<10 <32 <64<256 <1024 >=1024 %wself " -"code addr symbol\n"; -static const char pfmon_bm_header[] = -"# counts %self%cum code addr symbol\n"; - -const char *pfmon_intel_ll_args[PMU_TOOL_MAX_ARGS] = { - PFMON_CMD, - "--aggregate-results", - "--follow-all", - "--with-header", - "--smpl-module=pebs-ll", - "--ld-lat-threshold=4", - "--pebs-ll-dcmiss-code", - "--resolve-addresses", - "-emem_inst_retired:LATENCY_ABOVE_THRESHOLD", - "--long-smpl-periods=1", - 0 /* terminating NULL must be present */ -}; - -const char *pfmon_amd_ll_args[PMU_TOOL_MAX_ARGS] = { - PFMON_CMD, - "--aggregate-results", - "--follow-all", - "-uk", - "--with-header", - "--smpl-module=ibs", - "--resolve-addresses", - "-eibsop_event:uops", - "--ibs-dcmiss-code", - "--long-smpl-periods=0x0", - 0 /* terminating NULL must be present */ -}; - -const char *pfmon_intel_brm_args[PMU_TOOL_MAX_ARGS] = { - PFMON_CMD, - "--aggregate-results", - "--follow-all", - "--with-header", - "--resolve-addresses", - "-eMISPREDICTED_BRANCH_RETIRED", - "--long-smpl-periods=1", - 0 /* terminating NULL must be present */ -}; - -const char *pfmon_amd_brm_args[PMU_TOOL_MAX_ARGS] = { - PFMON_CMD, - "--aggregate-results", - "--follow-all", - "--with-header", - "--resolve-addresses", - "-eRETIRED_MISPREDICTED_BRANCH_INSTRUCTIONS", - "--long-smpl-periods=1", - 0 /* terminating NULL must be present */ -}; - -const char *addr2line_args[PMU_TOOL_MAX_ARGS] = { - ADDR2LINE_CMD, - "-e", - 0 /* terminating NULL must be present */ -}; - - -enum pmu_tool_type -{ - PTT_PFMON, - PTT_LAST -}; 
- -enum pmu_event_type -{ - PET_INTEL_LOAD_LATENCY, - PET_AMD_LOAD_LATENCY, - PET_INTEL_BRANCH_MISPREDICT, - PET_AMD_BRANCH_MISPREDICT, - PET_LAST -}; - -typedef struct pmu_tool_fns { - const char *name; /* name of the pmu tool */ - /* pmu tool commandline argument. */ - const char **arg_array; - /* Initialize pmu module. */ - void *(*init_pmu_module) (void); - /* Start profililing. */ - void (*start_pmu_module) (pid_t ppid, char *tmpfile, const char **args); - /* Stop profililing. */ - void (*stop_pmu_module) (void); - /* How to parse the output generated by the PMU tool. */ - int (*parse_pmu_output) (char *filename, void *pmu_data); - /* How to write parsed pmu data into gcda file. */ - void (*gcov_write_pmu_data) (void *data); - /* How to cleanup any data structure created during parsing. */ - void (*cleanup_pmu_data) (void *data); - /* How to initialize symbolizer for the PPID. */ - int (*start_symbolizer) (pid_t ppid); -
Re: [cxx-conversion] Make double_int a class with methods and operators. (issue6443093)
On 08/06/2012 05:35 PM, Lawrence Crowl wrote:
> +inline double_int &
> +double_int::operator ++ ()
> +{
> +  *this + double_int_one;
> +  return *this;
> +}
> +
> +inline double_int &
> +double_int::operator -- ()
> +{
> +  *this - double_int_one;
> +  return *this;
> +}

Surely unused results there?

r~
[v3] libstdc++/54005
Pretty minor change, as per PR. This version seems more appropriate for templatized types. I'll wait a bit before putting this on 4.7_branch -benjamin tested x86/linux 2012-08-07 Benjamin Kosnik PR libstdc++/54005 * include/std/atomic: Use __atomic_always_lock_free. * include/bits/atomic_base.h: Same. diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h index 9d5f4eb..598e1f1 100644 --- a/libstdc++-v3/include/bits/atomic_base.h +++ b/libstdc++-v3/include/bits/atomic_base.h @@ -422,11 +422,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION bool is_lock_free() const noexcept - { return __atomic_is_lock_free (sizeof (_M_i), &_M_i); } + { return __atomic_always_lock_free(sizeof(_M_i), &_M_i); } bool is_lock_free() const volatile noexcept - { return __atomic_is_lock_free (sizeof (_M_i), &_M_i); } + { return __atomic_always_lock_free(sizeof(_M_i), &_M_i); } void store(__int_type __i, memory_order __m = memory_order_seq_cst) noexcept @@ -716,11 +716,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION bool is_lock_free() const noexcept - { return __atomic_is_lock_free(_M_type_size(1), &_M_p); } + { return __atomic_always_lock_free(_M_type_size(1), &_M_p); } bool is_lock_free() const volatile noexcept - { return __atomic_is_lock_free(_M_type_size(1), &_M_p); } + { return __atomic_always_lock_free(_M_type_size(1), &_M_p); } void store(__pointer_type __p, diff --git a/libstdc++-v3/include/std/atomic b/libstdc++-v3/include/std/atomic index 6a08b28..b5ca606 100644 --- a/libstdc++-v3/include/std/atomic +++ b/libstdc++-v3/include/std/atomic @@ -184,11 +184,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION bool is_lock_free() const noexcept - { return __atomic_is_lock_free(sizeof(_M_i), &_M_i); } + { return __atomic_always_lock_free(sizeof(_M_i), &_M_i); } bool is_lock_free() const volatile noexcept - { return __atomic_is_lock_free(sizeof(_M_i), &_M_i); } + { return __atomic_always_lock_free(sizeof(_M_i), &_M_i); } void store(_Tp __i, memory_order _m = memory_order_seq_cst) 
noexcept
Re: Assembly output optimisations (was: PR 51094 - fprint_w() in output_addr_const() reinstated)
On Tue, 7 Aug 2012, Ian Lance Taylor wrote:
> On Tue, Aug 7, 2012 at 2:24 PM, Dimitrios Apostolou wrote:
>> BTW I can't find why ELF_STRING_LIMIT is only 256, it seems GAS supports
>> arbitrary lengths. I'd have to change my code if we ever set it too high
>> (or even unlimited) since I allocate the buffer on the stack.
>
> See the comment for ELF_STRING_LIMIT in config/elfos.h. gas is not
> the only ELF assembler.

I've seen it and thought that for non-GAS ELF platforms you should define
it to override elfos.h; I obviously misunderstood. So now I'm curious: is
this limit there, dominating all platforms, because of the worst
assembler? :-)

Dimitris

P.S. there is an obvious typo in the comment: STRING_LIMIT should be
ELF_STRING_LIMIT (I think I introduced the typo last year...)
Re: Assembly output optimisations (was: PR 51094 - fprint_w() in output_addr_const() reinstated)
On Tue, Aug 7, 2012 at 4:27 PM, Dimitrios Apostolou wrote:
> On Tue, 7 Aug 2012, Ian Lance Taylor wrote:
>> On Tue, Aug 7, 2012 at 2:24 PM, Dimitrios Apostolou wrote:
>>>
>>> BTW I can't find why ELF_STRING_LIMIT is only 256, it seems GAS supports
>>> arbitrary lengths. I'd have to change my code if we ever set it too high
>>> (or even unlimited) since I allocate the buffer on the stack.
>>
>> See the comment for ELF_STRING_LIMIT in config/elfos.h. gas is not
>> the only ELF assembler.
>
> I've seen it and thought that for non-GAS ELF platforms you should define it
> to override elfos.h, I obviously misunderstood.

I'm not aware of any assembler-specific configuration files, other than
broad differences like completely different i386 syntax. I think it
would be best to avoid introducing such files when possible.

> So now I'm curious, this
> limit is there dominating all platforms because of the worst assembler? :-)

Yes, but it's not like the limit matters in practice.

Ian
[patch] Fix problems with -fdebug-types-section and local types
With --std=c++11, a template parameter can refer to a local type defined within a function. Because that local type doesn't qualify for its own type unit, we copy it as an "unworthy" type into the type unit that refers to it, but we copy too much, leading to a comdat type unit that contains a DIE with subprogram definitions rather than declarations. These DIEs may have DW_AT_low_pc/high_pc or DW_AT_ranges attributes, and consequently can refer to range list entries that don't get emitted because they're not marked when the compile unit is scanned, eventually causing an undefined symbol at link time. In addition, while debugging this problem, I found that the DW_AT_object_pointer attribute, when left in the skeletons that are left behind in the compile unit, causes duplicate copies of the types to be copied back into the compile unit. This patch fixes these problems by removing the DW_AT_object_pointer attribute from the skeleton left behind in the compile unit, and by copying DW_TAG_subprogram DIEs as declarations when copying "unworthy" types into a type unit. In order to preserve information in the DIE structure, I also added DW_AT_abstract_origin as an attribute that should be copied when cloning a DIE as a declaration. I also fixed the dwarf4-typedef.C test, which should be turning on the -fdebug-types-section flag. Bootstrapped and tested on x86. OK for trunk? -cary 2012-08-07 Cary Coutant gcc/ * dwarf2out.c (clone_as_declaration): Copy DW_AT_abstract_origin attribute. (generate_skeleton_bottom_up): Remove DW_AT_object_pointer attribute from original DIE. (clone_tree_hash): Rename to ... (clone_tree_partial): ... this; change callers. Copy DW_TAG_subprogram DIEs as declarations. gcc/testsuite/ * testsuite/g++.dg/debug/dwarf2/dwarf4-nested.C: New test case. * testsuite/g++.dg/debug/dwarf2/dwarf4-typedef.C: Add -fdebug-types-section flag. 
Index: gcc/testsuite/g++.dg/debug/dwarf2/dwarf4-nested.C === --- gcc/testsuite/g++.dg/debug/dwarf2/dwarf4-nested.C (revision 0) +++ gcc/testsuite/g++.dg/debug/dwarf2/dwarf4-nested.C (revision 0) @@ -0,0 +1,55 @@ +// { dg-do compile } +// { dg-options "--std=c++11 -dA -gdwarf-4 -fdebug-types-section -fno-merge-debug-strings" } + +// Check that -fdebug-types-sections does not copy a full referenced type +// into a type unit. + +// Checks that at least one type unit is generated. +// +// { dg-final { scan-assembler "DIE \\(.*\\) DW_TAG_type_unit" } } +// +// Check that func is declared exactly twice in the debug info: +// once in the type unit for struct D, and once in the compile unit. +// +// { dg-final { scan-assembler-times "\\.ascii \"func0\"\[^\n\]*DW_AT_name" 2 } } +// +// Check to make sure that no type unit contains a DIE with DW_AT_low_pc +// or DW_AT_ranges. These patterns assume that the compile unit is always +// emitted after all type units. +// +// { dg-final { scan-assembler-not "\\.quad.*DW_AT_low_pc.*DIE \\(.*\\) DW_TAG_compile_unit" } } +// { dg-final { scan-assembler-not "\\.quad.*DW_AT_ranges.*DIE \\(.*\\) DW_TAG_compile_unit" } } + +struct A { + A(); + virtual ~A(); + virtual void foo(); + private: + int data; +}; + +struct B { + B(); + virtual ~B(); +}; + +extern B* table[]; + +struct D { + template + T* get(int i) + { +B*& cell = table[i]; +if (cell == 0) + cell = new T(); +return static_cast(cell); + } +}; + +void func(D* d) +{ + struct C : B { +A a; + }; + d->get(0)->a.foo(); +} Index: gcc/testsuite/g++.dg/debug/dwarf2/dwarf4-typedef.C === --- gcc/testsuite/g++.dg/debug/dwarf2/dwarf4-typedef.C (revision 190189) +++ gcc/testsuite/g++.dg/debug/dwarf2/dwarf4-typedef.C (working copy) @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-gdwarf-4" } */ +/* { dg-options "-gdwarf-4 -fdebug-types-section" } */ /* Regression test for an ICE in output_die when using -gdwarf-4. 
*/ Index: gcc/dwarf2out.c === --- gcc/dwarf2out.c (revision 190189) +++ gcc/dwarf2out.c (working copy) @@ -6357,6 +6357,7 @@ clone_as_declaration (dw_die_ref die) switch (a->dw_attr) { +case DW_AT_abstract_origin: case DW_AT_artificial: case DW_AT_containing_type: case DW_AT_external: @@ -6489,6 +6490,12 @@ generate_skeleton_bottom_up (skeleton_ch dw_die_ref clone = clone_die (c); move_all_children (c, clone); +/* If the original has a DW_AT_object_pointer attribute, + it would now point to a child DIE just moved to the + cloned tree, so we need to remove that attribute from + the original. */ +remove_AT (c, DW_AT_object_pointer); + replace_child (c, clone, prev); generate_skeleton_ancestor_tre
Re: [PATCH] Intrinsics for ADCX
Hi, Here is the patch with some obvious fixes. If there are no objections, could anyone please check it in? Changelog entry: 2012-08-08 Michael Zolotukhin * common/config/i386/i386-common.c (OPTION_MASK_ISA_ADX_SET): New. (OPTION_MASK_ISA_ADX_UNSET): Likewise. (ix86_handle_option): Handle madx option. * config.gcc (i[34567]86-*-*): Add adxintrin.h. (x86_64-*-*): Likewise. * config/i386/adxintrin.h: New header. * config/i386/driver-i386.c (host_detect_local_cpu): Detect ADCX/ADOX support. * config/i386/i386-builtin-types.def (UCHAR_FTYPE_UCHAR_UINT_UINT_PUNSIGNED): New function type. (UCHAR_FTYPE_UCHAR_ULONGLONG_ULONGLONG_PULONGLONG): Likewise. * config/i386/i386-c.c: Define __ADX__ if needed. * config/i386/i386.c (ix86_target_string): Define -madx option. (PTA_ADX): New. (ix86_option_override_internal): Handle new option. (ix86_valid_target_attribute_inner_p): Add OPT_madx. (ix86_builtins): Add IX86_BUILTIN_ADDCARRYX32, IX86_BUILTIN_ADDCARRYX64. (ix86_init_mmx_sse_builtins): Define corresponding built-ins. (ix86_expand_builtin): Handle these built-ins. (ix86_expand_args_builtin): Handle new function types. * config/i386/i386.h (TARGET_ADX): New. * config/i386/i386.md (adcx3): New define_insn. * config/i386/i386.opt (madx): New. * config/i386/x86intrin.h: Include adxintrin.h. testsuite/Changelog entry: 2012-08-08 Michael Zolotukhin * gcc.target/i386/adx-addcarryx32-1.c: New. * gcc.target/i386/adx-addcarryx32-2.c: New. * gcc.target/i386/adx-addcarryx64-1.c: New. * gcc.target/i386/adx-addcarryx64-2.c: New. * gcc.target/i386/adx-check.h: New. * gcc.target/i386/i386.exp (check_effective_target_adx): New. * gcc.target/i386/sse-12.c: Add -madx. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * g++.dg/other/i386-2.C: Ditto. * g++.dg/other/i386-3.C: Ditto. 
On 3 August 2012 17:51, Uros Bizjak wrote: > On Fri, Aug 3, 2012 at 3:24 PM, Michael Zolotukhin > wrote: >> Hi, >> I made a new version of the patch, where I tried to take into account >> Uros' remarks - is it ok for trunk? >> >> Bootstrap and new tests are passing, testing is in progress. >> >> Changelog entry: >> 2012-08-03 Michael Zolotukhin >> >> * common/config/i386/i386-common.c (OPTION_MASK_ISA_ADX_SET): New. >> (OPTION_MASK_ISA_ADX_UNSET): Likewise. >> (ix86_handle_option): Handle madx option. >> * config.gcc (i[34567]86-*-*): Add adxintrin.h. >> (x86_64-*-*): Likewise. >> * config/i386/adxintrin.h: New header. >> * config/i386/driver-i386.c (host_detect_local_cpu): Detect ADCX/ADOX >> support. >> * config/i386/i386-builtin-types.def >> (UCHAR_FTYPE_UCHAR_UINT_UINT_PINT): New function type. >> (UCHAR_FTYPE_UCHAR_ULONGLONG_ULONGLONG_PINT): Likewise. >> * config/i386/i386-c.c: Define __ADX__ if needed. >> * config/i386/i386.c (ix86_target_string): Define -madx option. >> (PTA_ADX): New. >> (ix86_option_override_internal): Handle new option. >> (ix86_valid_target_attribute_inner_p): Add OPT_madx. >> (ix86_builtins): Add IX86_BUILTIN_ADDCARRYX32, >> IX86_BUILTIN_ADDCARRYX64. >> (ix86_init_mmx_sse_builtins): Define corresponding built-ins. >> (ix86_expand_builtin): Handle these built-ins. >> (ix86_expand_args_builtin): Handle new function types. >> * config/i386/i386.h (TARGET_ADX): New. >> * config/i386/i386.md (adcx3): New define_insn. >> * config/i386/i386.opt (madx): New. >> * config/i386/x86intrin.h: Include adxintrin.h. >> >> testsuite/Changelog entry: >> 2012-08-03 Michael Zolotukhin >> >> * gcc.target/i386/adx-addcarryx32-1.c: New. >> * gcc.target/i386/adx-addcarryx32-2.c: New. >> * gcc.target/i386/adx-addcarryx64-1.c: New. >> * gcc.target/i386/adx-addcarryx64-2.c: New. >> * gcc.target/i386/adx-check.h: New. >> * gcc.target/i386/i386.exp (check_effective_target_adx): New. >> * gcc.target/i386/sse-12.c: Add -madx. >> * gcc.target/i386/sse-13.c: Ditto. 
>> * gcc.target/i386/sse-14.c: Ditto. >> * gcc.target/i386/sse-22.c: Ditto. >> * gcc.target/i386/sse-23.c: Ditto. >> * g++.dg/other/i386-2.C: Ditto. >> * g++.dg/other/i386-3.C: Ditto. > > Just change this line: > > + op4 = gen_rtx_REG (CCmode, FLAGS_REG); > > back to CCCmode. > > OK with this change. > > Thanks, > Uros. -- --- Best regards, Michael V. Zolotukhin, Software Engineer Intel Corporation. bdw-adx-4.gcc.patch Description: Binary data
Re: [cxx-conversion] Make double_int a class with methods and operators. (issue6443093)
On Tue, Aug 7, 2012 at 2:35 AM, Lawrence Crowl wrote:
> Convert double_int from a struct with functions into a class with
> operators and methods.
>
> This patch adds the methods and operators.  In general, functions of
> the form "double_int_whatever" become member functions "whatever" or,
> when possible, operators.
>
> Every attempt has been made to preserve the existing algorithms, even
> at the expense of some optimization opportunities.  Some examples:
>
> The ext operation takes a value and returns a value.  However, that
> return value is usually assigned to the original variable.  An
> operation that modified a variable would be more efficient.

That's not always the case though, and I think the interface should be
consistent with existing behavior to avoid errors creeping in during
the transition.

> In some cases, an outer sign-specific function calls an inner
> function with the sign as a parameter, which then decides which
> implementation to do.  Decisions should not be artificially
> introduced, and the implementation of each case should be exposed as
> a separate routine.
>
> The existing operations are implemented in terms of the new
> operations, which necessarily adds a layer between the new code and
> the existing users.  Once all files have migrated, this layer will
> be removed.
>
> There are several existing operations implemented in terms of even
> older legacy operations.  This extra layer has not been removed.
>
> On occasion, parameterized functions are called with constant
> arguments.  To support static statement of intent, and potentially
> faster code in the future, there are several new unparameterized
> member functions.  Some examples:
>
> Four routines now encode both 'arithmetic or logical' and 'right or
> left' shift as part of the function name.
>
> Four routines now encode both 'signed or unsigned' and 'less than or
> greater than' as part of the function name.
For most parts, overloads that take an (unsigned) HOST_WIDE_INT argument
would be nice, as well as the ability to say dbl + 1.

> -typedef struct
> +typedef struct double_int
>  {
> +public:
> +  /* Normally, we would define constructors to create instances.
> +     Two things prevent us from doing so.
> +     First, defining a constructor makes the class non-POD in C++03,
> +     and we certainly want double_int to be a POD.
> +     Second, the GCC coding conventions prefer explicit conversion,
> +     and explicit conversion operators are not available until C++11.  */
> +
> +  static double_int make (unsigned HOST_WIDE_INT cst);
> +  static double_int make (HOST_WIDE_INT cst);
> +  static double_int make (unsigned int cst);
> +  static double_int make (int cst);

Did we somehow decide to not allow constructors? It's odd to convert to
C++ and end up with static member functions resembling them ...

Also I believe the conversion above introduces possible migration
errors.  Think of a previous

  HOST_WIDE_INT a;
  double_int d = uhwi_to_double_int (a);

if you write that now as

  HOST_WIDE_INT a;
  double_int d = double_int::make (a);

you get the effect of shwi_to_double_int.  Oops.  So as an intermediate
I'd like you _not_ to introduce the make () overloads.  Btw, if
HOST_WIDE_INT == int the above won't even compile.

Richard.

> +  /* No copy assignment operator or destructor to keep the type a POD.  */
> +
> +  /* There are some special value-creation static member functions.  */
> +
> +  static double_int mask (unsigned prec);
> +  static double_int max_value (unsigned int prec, bool uns);
> +  static double_int min_value (unsigned int prec, bool uns);
> +
> +  /* The following functions are mutating operations.  */
> +
> +  double_int &operator ++ ();  // prefix
> +  double_int &operator -- ();  // prefix
> +  double_int &operator *= (double_int);
> +  double_int &operator += (double_int);
> +  double_int &operator -= (double_int);
> +
> +  /* The following functions are non-mutating operations.  */
> +
> +  /* Conversion functions.  */
> +
> +  HOST_WIDE_INT to_signed () const;
> +  unsigned HOST_WIDE_INT to_unsigned () const;
> +
> +  /* Conversion query functions.  */
> +
> +  bool fits_unsigned () const;
> +  bool fits_signed () const;
> +  bool fits (bool uns) const;
> +
> +  /* Attribute query functions.  */
> +
> +  int trailing_zeros () const;
> +  int popcount () const;
> +
> +  /* Arithmetic query operations.  */
> +
> +  bool multiple_of (double_int, bool, double_int *) const;
> +
> +  /* Arithmetic operation functions.  */
> +
> +  double_int set_bit (unsigned) const;
> +  double_int mul_with_sign (double_int, bool, int *) const;
> +
> +  double_int operator * (double_int b) const;
> +  double_int operator + (double_int b) const;
> +  double_int operator - (double_int b) const;
> +  double_int operator - () const;
> +  double_int operator ~ () const;
> +  double_int operator & (double_int b) const;
> +  double_int operator | (double_int b) const;
> +  d
Re: [patch] Use a smaller, dynamic worklist in compute_global_livein
On Tue, Aug 7, 2012 at 8:24 AM, Steven Bosscher wrote:
> Hello,
>
> In the test case for PR54146, compute_global_livein allocates/frees a
> worklist for >400,000 basic blocks on each invocation. And it's called
> a lot, for rewrite_into_loop_closed_ssa. But the maximum number of
> basic blocks ever on the work list was only ~6500. So the work list
> can be much smaller most of the time.
>
> Bootstrapped&tested on x86_64-unknown-linux-gnu. OK for trunk?

Index: tree-ssa-loop-manip.c
===
--- tree-ssa-loop-manip.c	(revision 190176)
+++ tree-ssa-loop-manip.c	(working copy)
@@ -162,10 +162,8 @@ add_exit_phis_var (tree var, bitmap live
   basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (var));
   bitmap_iterator bi;

-  if (is_gimple_reg (var))
-    bitmap_clear_bit (livein, def_bb->index);
-  else
-    bitmap_set_bit (livein, def_bb->index);
+  gcc_checking_assert (is_gimple_reg (var));
+  bitmap_clear_bit (livein, def_bb->index);

!is_gimple_reg is true for virtual operand PHIs ... and at least
find_uses_to_rename_stmt loops over all uses (so, I suppose given the
comments elsewhere about tracking vops can you fix that to
s/SSA_OP_ALL_USES/SSA_OP_USE/?

 void
-compute_global_livein (bitmap livein ATTRIBUTE_UNUSED, bitmap def_blocks ATTRIBUTE_UNUSED)
+compute_global_livein (bitmap livein, bitmap def_blocks)
 {
-  basic_block bb, *worklist, *tos;
   unsigned i;
   bitmap_iterator bi;
+  VEC (basic_block, heap) *worklist;

-  tos = worklist
-    = (basic_block *) xmalloc (sizeof (basic_block) * (last_basic_block + 1));
+  /* Normally the work list size is bounded by the number of basic
+     blocks in the largest loop.  We don't know this number, but we
+     can be fairly sure that it will be relatively small.  */
+  worklist = VEC_alloc (basic_block, heap, MAX (8, n_basic_blocks / 100));

I suppose if we really want to optimize this we can make worklist static
so we at most grow once.  (in case you want to optimize allocation
overhead, not memory used).  Other structures in the ssa rewriter have
this kind of lifetime, too (the per-SSA name info for example).

Ok with the first change, whatever you prefer on the
compute_global_livein thing.

Thanks,
Richard.

> Ciao!
> Steven
Re: [patch] Use a smaller, dynamic worklist in compute_global_livein
On Tue, Aug 7, 2012 at 10:31 AM, Richard Guenther wrote: > On Tue, Aug 7, 2012 at 8:24 AM, Steven Bosscher wrote: >> Hello, >> >> In the test case for PR54146, compute_global_livein allocates/frees a >> worklist for >400,000 basic blocks on each invocation. And it's called >> a lot, for rewrite_into_loop_closed_ssa. But the maximum number of >> basic blocks ever on the work list was only ~6500. So the work list >> can be much smaller most of the time. >> >> Bootstrapped&tested on x86_64-unknown-linux-gnu. OK for trunk? > > Index: tree-ssa-loop-manip.c > === > --- tree-ssa-loop-manip.c (revision 190176) > +++ tree-ssa-loop-manip.c (working copy) > @@ -162,10 +162,8 @@ add_exit_phis_var (tree var, bitmap live >basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (var)); >bitmap_iterator bi; > > - if (is_gimple_reg (var)) > -bitmap_clear_bit (livein, def_bb->index); > - else > -bitmap_set_bit (livein, def_bb->index); > + gcc_checking_assert (is_gimple_reg (var)); > + bitmap_clear_bit (livein, def_bb->index); > > !is_gimple_reg is true for virtual operand PHIs ... and at least > find_uses_to_rename_stmt loops over all uses (so, I suppose given the > comments elsewhere about tracking vops can you fix that to > s/SSA_OP_ALL_USES/SSA_OP_USE/? I'll give that a try. > void > -compute_global_livein (bitmap livein ATTRIBUTE_UNUSED, bitmap > def_blocks ATTRIBUTE_UNUSED) > +compute_global_livein (bitmap livein, bitmap def_blocks) > { > - basic_block bb, *worklist, *tos; >unsigned i; >bitmap_iterator bi; > + VEC (basic_block, heap) *worklist; > > - tos = worklist > -= (basic_block *) xmalloc (sizeof (basic_block) * (last_basic_block + > 1)); > + /* Normally the work list size is bounded by the number of basic > + blocks in the largest loop. We don't know this number, but we > + can be fairly sure that it will be relatively small. 
*/ > + worklist = VEC_alloc (basic_block, heap, MAX (8, n_basic_blocks / 100)); > > I suppose if we really want to optimize this we can make worklist static > so we at most grow once. (in case you want to optimize allocation overhead, > not memory used). Other structures in the ssa rewriter have this kind of > lifetime, too (the per-SSA name info for example). I don't want this to be static. This shouldn't be a persistent piece of data. But I actually did try a static work list (the full list, not the VEC), and it made no difference at all on the timings. All time is spent in the loop in compute_global_livein itself. There are >400,000 basic blocks but the maximum loop depth is only 3. I've tried out livein as a sparseset last night (allocated and filled in add_exit_phis_var and re-used for every call to add_exit_phis_edge), that reduces the time spent in compute_global_livein by a factor 2.5 for this test case (i.e. not the order-of-magnitude change I'm looking for). Why can't the normal SSA updater be used to rewrite into loop-closed SSA form? Ciao! Steven
Re: [patch] Use a smaller, dynamic worklist in compute_global_livein
On Tue, Aug 7, 2012 at 11:03 AM, Steven Bosscher wrote: > On Tue, Aug 7, 2012 at 10:31 AM, Richard Guenther > wrote: >> On Tue, Aug 7, 2012 at 8:24 AM, Steven Bosscher >> wrote: >>> Hello, >>> >>> In the test case for PR54146, compute_global_livein allocates/frees a >>> worklist for >400,000 basic blocks on each invocation. And it's called >>> a lot, for rewrite_into_loop_closed_ssa. But the maximum number of >>> basic blocks ever on the work list was only ~6500. So the work list >>> can be much smaller most of the time. >>> >>> Bootstrapped&tested on x86_64-unknown-linux-gnu. OK for trunk? >> >> Index: tree-ssa-loop-manip.c >> === >> --- tree-ssa-loop-manip.c (revision 190176) >> +++ tree-ssa-loop-manip.c (working copy) >> @@ -162,10 +162,8 @@ add_exit_phis_var (tree var, bitmap live >>basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (var)); >>bitmap_iterator bi; >> >> - if (is_gimple_reg (var)) >> -bitmap_clear_bit (livein, def_bb->index); >> - else >> -bitmap_set_bit (livein, def_bb->index); >> + gcc_checking_assert (is_gimple_reg (var)); >> + bitmap_clear_bit (livein, def_bb->index); >> >> !is_gimple_reg is true for virtual operand PHIs ... and at least >> find_uses_to_rename_stmt loops over all uses (so, I suppose given the >> comments elsewhere about tracking vops can you fix that to >> s/SSA_OP_ALL_USES/SSA_OP_USE/? > > I'll give that a try. > > >> void >> -compute_global_livein (bitmap livein ATTRIBUTE_UNUSED, bitmap >> def_blocks ATTRIBUTE_UNUSED) >> +compute_global_livein (bitmap livein, bitmap def_blocks) >> { >> - basic_block bb, *worklist, *tos; >>unsigned i; >>bitmap_iterator bi; >> + VEC (basic_block, heap) *worklist; >> >> - tos = worklist >> -= (basic_block *) xmalloc (sizeof (basic_block) * (last_basic_block + >> 1)); >> + /* Normally the work list size is bounded by the number of basic >> + blocks in the largest loop. We don't know this number, but we >> + can be fairly sure that it will be relatively small. 
*/ >> + worklist = VEC_alloc (basic_block, heap, MAX (8, n_basic_blocks / 100)); >> >> I suppose if we really want to optimize this we can make worklist static >> so we at most grow once. (in case you want to optimize allocation overhead, >> not memory used). Other structures in the ssa rewriter have this kind of >> lifetime, too (the per-SSA name info for example). > > I don't want this to be static. This shouldn't be a persistent piece of data. > But I actually did try a static work list (the full list, not the > VEC), and it made no difference at all on the timings. Ah, so the patch only reduces allocation but not compile-time itself. > All time is spent in the loop in compute_global_livein itself. There > are >400,000 basic blocks but the maximum loop depth is only 3. I've > tried out livein as a sparseset last night (allocated and filled in > add_exit_phis_var and re-used for every call to add_exit_phis_edge), > that reduces the time spent in compute_global_livein by a factor 2.5 > for this test case (i.e. not the order-of-magnitude change I'm looking > for). Another optimization would be to do @@ -440,13 +442,13 @@ compute_global_livein (bitmap livein ATT && ! bitmap_bit_p (def_blocks, pred_index) && bitmap_set_bit (livein, pred_index)) { - *tos++ = pred; + VEC_safe_push (basic_block, heap, worklist, pred); thus combine the bitmap_set_bit and bitmap_bit_p tests on livein. > Why can't the normal SSA updater be used to rewrite into loop-closed SSA form? It does not have a mode to trigger PHI inserts for loop-closed SSA form. I suppose most of the time we should try harder and not call rewrite_into_loop_closed_ssa so many times (eventually for the easy cases where we _do_ have an old->new name mapping or symbols to rename the SSA updater should deal with this). Richard. > Ciao! > Steven
Re: add strnlen to libiberty (was Re: Assembly output optimisations)
On Mon, 6 Aug 2012, Ian Lance Taylor wrote: > On Mon, Aug 6, 2012 at 10:44 PM, Dimitrios Apostolou wrote: > > > > What else is missing to make this patch appropriate for libiberty? Should I > > change the prolog in strnlen.c, since I only copied it intact from gnulib? > > We generally try to avoid straight GPL source code without runtime > exception in libiberty, and I don't see a reason to bend that rule for > this trivial function. Just don't forget that libiberty isn't a target library anymore. To wit, the (GCC) run-time exception is moot for that code, AFAIK. Maybe enough reason to abandon that rule so its code can be truly and freely shared between GNU projects. brgds, H-P
Re: [patch] Use a smaller, dynamic worklist in compute_global_livein
On Tue, Aug 7, 2012 at 11:22 AM, Richard Guenther wrote: > Another optimization would be to do > > @@ -440,13 +442,13 @@ compute_global_livein (bitmap livein ATT > && ! bitmap_bit_p (def_blocks, pred_index) > && bitmap_set_bit (livein, pred_index)) > { > - *tos++ = pred; > + VEC_safe_push (basic_block, heap, worklist, pred); > > thus combine the bitmap_set_bit and bitmap_bit_p tests on livein. That's just a micro-optimization that doesn't help much, because after the bitmap_bit_p in the if, the last accessed bitmap member is cached and the bitmap_set_bit is almost free. What is needed here, is a (much) smaller computed livein. In the end, the whole set of ~140,000 livein blocks is pruned to just 3, namely the loop exits... Ciao! Steven
Re: [patch] Use a smaller, dynamic worklist in compute_global_livein
On Tue, Aug 7, 2012 at 11:45 AM, Steven Bosscher wrote: > On Tue, Aug 7, 2012 at 11:22 AM, Richard Guenther > wrote: >> Another optimization would be to do >> >> @@ -440,13 +442,13 @@ compute_global_livein (bitmap livein ATT >> && ! bitmap_bit_p (def_blocks, pred_index) >> && bitmap_set_bit (livein, pred_index)) >> { >> - *tos++ = pred; >> + VEC_safe_push (basic_block, heap, worklist, pred); >> >> thus combine the bitmap_set_bit and bitmap_bit_p tests on livein. > > That's just a micro-optimization that doesn't help much, because after > the bitmap_bit_p in the if, the last accessed bitmap member is cached > and the bitmap_set_bit is almost free. > > What is needed here, is a (much) smaller computed livein. In the end, > the whole set of ~140,000 livein blocks is pruned to just 3, namely > the loop exits... So I wonder why simply looping over all SSA defs in a loop body and checking whether a use is outside of it is not enough to compute this information ... (yes, we might end up creating too many loop closed PHIs, namely on all exits rather than just on those that the name is live over, but ... at least maybe we can prune it with DOM info) Richard. > Ciao! > Steven
Re: [patch] Use a smaller, dynamic worklist in compute_global_livein
On Tue, Aug 7, 2012 at 11:52 AM, Richard Guenther wrote: > So I wonder why simply looping over all SSA defs in a loop body and checking > whether a use is outside of it is not enough to compute this information ... > (yes, we might end up creating too many loop closed PHIs, namely on all > exits rather than just on those that the name is live over, but ... at > least maybe we can prune it with DOM info) I've been thinking the same thing, but I don't understand loop-closed SSA form well enough to try and rewrite it along those lines. Ciao! Steven
[ping] Re: [patch v2] support for multiarch systems
ping? re-attaching the updated patch with the fixed comment in
genmultilib.

  Matthias

On 08.07.2012 20:48, Matthias Klose wrote:
> Please find attached v2 of the patch updated for trunk 20120706, x86
> only, tested on x86-linux-gnu, KFreeBSD and the Hurd.
>
> I left in the comment about the multiarch names, but I'm fine to
> change/discard it. It was first required by Joseph Myers, then not
> found necessary by Paolo Bonzini. The patch includes the changes
> suggested by Thomas Schwinge.
>
> Ok for the trunk?
>
> Matthias

2012-06-25  Matthias Klose

	* doc/invoke.texi: Document -print-multiarch.
	* doc/install.texi: Document --enable-multiarch.
	* doc/fragments.texi: Document MULTILIB_OSDIRNAMES,
	MULTIARCH_DIRNAME.
	* configure.ac: Add --enable-multiarch option.
	* configure.in: Regenerate.
	* Makefile.in (s-mlib): Pass MULTIARCH_DIRNAME to genmultilib.
	enable_multiarch, with_float: New macros.
	if_multiarch: New macro, define in terms of enable_multiarch.
	* genmultilib: Add new argument for the multiarch name.
	* gcc.c (multiarch_dir): Define.
	(for_each_path): Search for multiarch suffixes.
	(driver_handle_option): Handle multiarch option.
	(do_spec_1): Pass -imultiarch if defined.
	(main): Print multiarch.
	(set_multilib_dir): Separate multilib and multiarch names
	from multilib_select.
	(print_multilib_info): Ignore multiarch names in multilib_select.
	* incpath.c (add_standard_paths): Search the multiarch include dirs.
	* cppdefault.h (default_include): Document multiarch in multilib
	member.
	* cppdefault.c: [LOCAL_INCLUDE_DIR, STANDARD_INCLUDE_DIR] Add an
	include directory for multiarch directories.
	* common.opt: New options --print-multiarch and -imultilib.
	* config.gcc (tmake_file): Include i386/t-linux.
	(tmake_file): Include i386/t-kfreebsd.
	(tmake_file): Include i386/t-gnu.
	* config/i386/t-linux64: Add multiarch names in
	MULTILIB_OSDIRNAMES, define MULTIARCH_DIRNAME.
	* config/i386/t-gnu: New file.
	* config/i386/t-kfreebsd: Likewise.
	* config/i386/t-linux: Likewise.
Index: configure.ac
===
--- configure.ac	(revision 188931)
+++ configure.ac	(working copy)
@@ -623,6 +623,21 @@
 [], [enable_multilib=yes])
 AC_SUBST(enable_multilib)

+# Determine whether or not multiarch is enabled.
+AC_ARG_ENABLE(multiarch,
+[AS_HELP_STRING([--enable-multiarch],
+		[enable support for multiarch paths])],
+[case "${withval}" in
+yes|no|auto-detect) enable_multiarch=$withval;;
+*) AC_MSG_ERROR(bad value ${withval} given for --enable-multiarch option) ;;
+esac], [enable_multiarch=auto-detect])
+AC_MSG_CHECKING(for multiarch configuration)
+AC_SUBST(enable_multiarch)
+AC_MSG_RESULT($enable_multiarch)
+
+# needed for setting the multiarch name on ARM
+AC_SUBST(with_float)
+
 # Enable __cxa_atexit for C++.
 AC_ARG_ENABLE(__cxa_atexit,
 [AS_HELP_STRING([--enable-__cxa_atexit], [enable __cxa_atexit for C++])],

Index: cppdefault.c
===
--- cppdefault.c	(revision 188931)
+++ cppdefault.c	(working copy)
@@ -63,6 +63,7 @@
 #endif
 #ifdef LOCAL_INCLUDE_DIR
     /* /usr/local/include comes before the fixincluded header files.  */
+    { LOCAL_INCLUDE_DIR, 0, 0, 1, 1, 2 },
     { LOCAL_INCLUDE_DIR, 0, 0, 1, 1, 0 },
 #endif
 #ifdef PREFIX_INCLUDE_DIR
@@ -90,6 +91,7 @@
 #endif
 #ifdef NATIVE_SYSTEM_HEADER_DIR
     /* /usr/include comes dead last.  */
+    { NATIVE_SYSTEM_HEADER_DIR, NATIVE_SYSTEM_HEADER_COMPONENT, 0, 0, 1, 2 },
     { NATIVE_SYSTEM_HEADER_DIR, NATIVE_SYSTEM_HEADER_COMPONENT, 0, 0, 1, 0 },
 #endif
     { 0, 0, 0, 0, 0, 0 }

Index: cppdefault.h
===
--- cppdefault.h	(revision 188931)
+++ cppdefault.h	(working copy)
@@ -43,9 +43,11 @@
 				   C++.  */
   const char add_sysroot;	/* FNAME should be prefixed by
				   cpp_SYSROOT.  */
-  const char multilib;		/* FNAME should have the multilib path
-				   specified with -imultilib
-				   appended.  */
+  const char multilib;		/* FNAME should have appended
+				   - the multilib path specified with
+				     -imultilib when 1 is passed,
+				   - the multiarch path specified with
+				     -imultiarch, when 2 is passed.  */
 };

 extern const struct default_include cpp_include_defaults[];

Index: Makefile.in
===
--- Makefile.in	(revision 188931)
+++ Makefile.in	(working cop