[PATCH] Make gimple_phi_set_result adjust SSA_NAME_DEF_STMT
It's the only LHS setter that does not do so, and that results in the asymmetry
of create_phi_node needing an SSA_NAME_DEF_STMT adjustment if you fed it an
SSA name result, as opposed to a VAR_DECL result, in which case it is already
adjusted.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-08-07  Richard Guenther

	* gimple.h (gimple_phi_set_result): Adjust SSA_NAME_DEF_STMT.
	* tree-phinodes.c (make_phi_node): Allow a NULL var.
	* tree-into-ssa.c (insert_phi_nodes_for): Simplify.
	* tree-complex.c (update_phi_components): Likewise.
	* tree-ssa-loop-manip.c (create_iv): Likewise.
	(add_exit_phis_edge): Likewise.
	(split_loop_exit_edge): Likewise.
	(tree_transform_and_unroll_loop): Likewise.
	* value-prof.c (gimple_ic): Likewise.
	(gimple_stringop_fixed_value): Likewise.
	* tree-tailcall.c (tree_optimize_tail_calls_1): Likewise.
	* omp-low.c (expand_parallel_call): Likewise.
	(expand_omp_for_static_chunk): Likewise.
	(expand_omp_atomic_pipeline): Likewise.
	* tree-parloops.c (create_phi_for_local_result): Likewise.
	(transform_to_exit_first_loop): Likewise.
	* tree-vect-data-refs.c (vect_setup_realignment): Likewise.
	* graphite-scop-detection.c (canonicalize_loop_closed_ssa): Likewise.
	* tree-predcom.c (initialize_root_vars): Likewise.
	(initialize_root_vars_lm): Likewise.
	* sese.c (sese_add_exit_phis_edge): Likewise.
	* gimple-streamer-in.c (input_phi): Likewise.
	* tree-inline.c (copy_phis_for_bb): Likewise.
	* tree-ssa-phiprop.c (phiprop_insert_phi): Likewise.
	* tree-cfg.c (gimple_make_forwarder_block): Likewise.
	(gimple_duplicate_bb): Likewise.
Index: gcc/tree-into-ssa.c
===================================================================
--- gcc/tree-into-ssa.c (revision 190197)
+++ gcc/tree-into-ssa.c (working copy)
@@ -1039,10 +1039,8 @@ insert_phi_nodes_for (tree var, bitmap p
 	  tree new_lhs;
 	  gcc_assert (update_p);
-	  phi = create_phi_node (var, bb);
-
-	  new_lhs = duplicate_ssa_name (var, phi);
-	  gimple_phi_set_result (phi, new_lhs);
+	  new_lhs = duplicate_ssa_name (var, NULL);
+	  phi = create_phi_node (new_lhs, bb);
 	  add_new_name_mapping (new_lhs, var);

 	  /* Add VAR to every argument slot of PHI.  We need VAR in
Index: gcc/tree-complex.c
===================================================================
--- gcc/tree-complex.c (revision 190197)
+++ gcc/tree-complex.c (working copy)
@@ -720,17 +720,11 @@ update_phi_components (basic_block bb)
 	  lr = get_component_ssa_name (gimple_phi_result (phi), false);
 	  if (TREE_CODE (lr) == SSA_NAME)
-	    {
-	      pr = create_phi_node (lr, bb);
-	      SSA_NAME_DEF_STMT (lr) = pr;
-	    }
+	    pr = create_phi_node (lr, bb);

 	  li = get_component_ssa_name (gimple_phi_result (phi), true);
 	  if (TREE_CODE (li) == SSA_NAME)
-	    {
-	      pi = create_phi_node (li, bb);
-	      SSA_NAME_DEF_STMT (li) = pi;
-	    }
+	    pi = create_phi_node (li, bb);

 	  for (i = 0, n = gimple_phi_num_args (phi); i < n; ++i)
 	    {
Index: gcc/tree-ssa-loop-manip.c
===================================================================
--- gcc/tree-ssa-loop-manip.c (revision 190197)
+++ gcc/tree-ssa-loop-manip.c (working copy)
@@ -116,7 +116,6 @@ create_iv (tree base, tree step, tree va
       gsi_insert_seq_on_edge_immediate (pe, stmts);

   stmt = create_phi_node (vb, loop->header);
-  SSA_NAME_DEF_STMT (vb) = stmt;
   add_phi_arg (stmt, initial, loop_preheader_edge (loop), UNKNOWN_LOCATION);
   add_phi_arg (stmt, va, loop_latch_edge (loop), UNKNOWN_LOCATION);
 }
@@ -144,9 +143,8 @@ add_exit_phis_edge (basic_block exit, tr
   if (!e)
     return;

-  phi = create_phi_node (use, exit);
-  create_new_def_for (gimple_phi_result (phi), phi,
-		      gimple_phi_result_ptr (phi));
+  phi = create_phi_node (NULL_TREE, exit);
+  create_new_def_for (use, phi, gimple_phi_result_ptr (phi));
   FOR_EACH_EDGE (e, ei, exit->preds)
     add_phi_arg (phi, use, e, UNKNOWN_LOCATION);
 }
@@ -499,7 +497,6 @@ split_loop_exit_edge (edge exit)
	 of the SSA name out of the loop.  */
       new_name = duplicate_ssa_name (name, NULL);
       new_phi = create_phi_node (new_name, bb);
-      SSA_NAME_DEF_STMT (new_name) = new_phi;
       add_phi_arg (new_phi, name, exit, locus);
       SET_USE (op_p, new_name);
     }
@@ -1012,7 +1009,6 @@ tree_transform_and_unroll_loop (struct l

       new_init = make_ssa_name (var, NULL);
       phi_rest = create_phi_node (new_init, rest);
-      SSA_NAME_DEF_STMT (new_init) = phi_rest;

       add_phi_arg (phi_rest, init, precond_edge, UNKNOWN_LOCATION);
       add_phi_arg (phi_rest, next, new_exit, UNKNOWN_LOCATION);
Index: gcc/value-prof.c
Re: [patch] Use a smaller, dynamic worklist in compute_global_livein
On Tue, Aug 7, 2012 at 11:58 AM, Steven Bosscher wrote:
> On Tue, Aug 7, 2012 at 11:52 AM, Richard Guenther wrote:
>> So I wonder why simply looping over all SSA defs in a loop body and checking
>> whether a use is outside of it is not enough to compute this information ...
>> (yes, we might end up creating too many loop closed PHIs, namely on all
>> exits rather than just on those that the name is live over, but ... at
>> least maybe we can prune it with DOM info)
>
> I've been thinking the same thing, but I don't understand loop-closed
> SSA form well enough to try and rewrite it along those lines.

I've put it on my TODO to look into.

Richard.

> Ciao!
> Steven
[PATCH] Remove clobbers as we go into SSA
This makes sure clobbers of SSA names are never retained in the IL.  First by
verifying that in the gimple verifier, second by removing them when we go
into SSA form.  Third by catching those we have to get rid of when
un-nesting - this avoids stuff like

  finally { D.1234 = CLOBBER; frame.i = D.1234; }

and D.1234 being a candidate for a register.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-08-07  Richard Guenther

	* tree-into-ssa.c (rewrite_stmt): Remove clobbers for variables
	we rewrite into SSA form.
	(rewrite_enter_block): Adjust.
	* gimple-iterator.c (gsi_replace): Also allow replacement with
	a stmt without a lhs.
	* tree-ssa-live.c (remove_unused_locals): Remove code handling
	clobbers of SSA names.
	* tree-nested.c (convert_local_reference_stmt): Remove clobbers
	for variables we access through the local chain.
	* tree-cfg.c (verify_gimple_assign_single): Verify clobbers
	clobber full decls only.

Index: gcc/tree-into-ssa.c
===================================================================
--- gcc/tree-into-ssa.c (revision 190199)
+++ gcc/tree-into-ssa.c (working copy)
@@ -1323,12 +1323,12 @@ rewrite_debug_stmt_uses (gimple stmt)
    definition of a variable when a new real or virtual definition is found.  */

 static void
-rewrite_stmt (gimple_stmt_iterator si)
+rewrite_stmt (gimple_stmt_iterator *si)
 {
   use_operand_p use_p;
   def_operand_p def_p;
   ssa_op_iter iter;
-  gimple stmt = gsi_stmt (si);
+  gimple stmt = gsi_stmt (*si);

   /* If mark_def_sites decided that we don't need to rewrite this
      statement, ignore it.  */
@@ -1362,9 +1362,24 @@ rewrite_stmt (gimple_stmt_iterator si)
   FOR_EACH_SSA_DEF_OPERAND (def_p, stmt, iter, SSA_OP_ALL_DEFS)
     {
       tree var = DEF_FROM_PTR (def_p);
-      tree name = make_ssa_name (var, stmt);
+      tree name;
       tree tracked_var;
+
+      gcc_assert (DECL_P (var));
+
+      if (gimple_clobber_p (stmt)
+	  && is_gimple_reg (var))
+	{
+	  /* If we rewrite a DECL into SSA form then drop its
	     clobber stmts and replace uses with a new default def.  */
+	  gcc_assert (TREE_CODE (var) == VAR_DECL
+		      && !gimple_vdef (stmt));
+	  gsi_replace (si, gimple_build_nop (), true);
+	  register_new_def (get_or_create_ssa_default_def (cfun, var), var);
+	  break;
+	}
+
+      name = make_ssa_name (var, stmt);
       SET_DEF (def_p, name);
       register_new_def (DEF_FROM_PTR (def_p), var);
@@ -1372,7 +1387,7 @@ rewrite_stmt (gimple_stmt_iterator si)
       if (tracked_var)
	{
	  gimple note = gimple_build_debug_bind (tracked_var, name, stmt);
-	  gsi_insert_after (&si, note, GSI_SAME_STMT);
+	  gsi_insert_after (si, note, GSI_SAME_STMT);
	}
     }
 }
@@ -1439,7 +1454,7 @@ rewrite_enter_block (struct dom_walk_dat
      of a variable when a new real or virtual definition is found.  */
   if (TEST_BIT (interesting_blocks, bb->index))
     for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-      rewrite_stmt (gsi);
+      rewrite_stmt (&gsi);

   /* Step 3.  Visit all the successor blocks of BB looking for PHI nodes.
      For every PHI node found, add a new argument containing the current
Index: gcc/gimple-iterator.c
===================================================================
--- gcc/gimple-iterator.c (revision 190198)
+++ gcc/gimple-iterator.c (working copy)
@@ -427,7 +427,7 @@ gsi_replace (gimple_stmt_iterator *gsi,
   if (stmt == orig_stmt)
     return;

-  gcc_assert (!gimple_has_lhs (orig_stmt)
+  gcc_assert (!gimple_has_lhs (orig_stmt) || !gimple_has_lhs (stmt)
	      || gimple_get_lhs (orig_stmt) == gimple_get_lhs (stmt));

   gimple_set_location (stmt, gimple_location (orig_stmt));
Index: gcc/tree-ssa-live.c
===================================================================
--- gcc/tree-ssa-live.c (revision 190198)
+++ gcc/tree-ssa-live.c (working copy)
@@ -773,9 +773,6 @@ remove_unused_locals (void)
	    if (gimple_clobber_p (stmt))
	      {
		tree lhs = gimple_assign_lhs (stmt);
-		lhs = get_base_address (lhs);
-		if (TREE_CODE (lhs) == SSA_NAME)
-		  lhs = SSA_NAME_VAR (lhs);
		if (TREE_CODE (lhs) == VAR_DECL && !is_used_p (lhs))
		  {
		    unlink_stmt_vdef (stmt);
Index: gcc/tree-nested.c
===================================================================
--- gcc/tree-nested.c (revision 190198)
+++ gcc/tree-nested.c (working copy)
@@ -1727,6 +1727,20 @@ convert_local_reference_stmt (gimple_stm
      *handled_ops_p = false;
      return NULL_TREE;

+    case GIMPLE_ASSIGN:
+      if (gimple_clobber_p (stmt))
+	{
+	  tree lhs = gimple_assign_lhs (stmt);
+	  if (!use_pointer_in_
Re: libgo patch committed: Support NumCPU on more platforms
On 07/08/12 05:43, Ian Lance Taylor wrote:
> This patch to libgo, from Shenghou Ma, adds support for NumCPU on
> additional platforms: Solaris, Irix, *BSD, Darwin.  Bootstrapped and ran
> Go testsuite on x86_64-unknown-linux-gnu.  Committed to mainline.
>
> Ian

Wouldn't it be more useful on Linux to check the task's affinity settings?
Then when a task is locked to a limited set of cores it won't overload
those cores with threads.

R.

> foo.patch
>
> diff -r 12a361bc53d0 libgo/Makefile.am
> --- a/libgo/Makefile.am	Mon Aug 06 21:39:47 2012 -0700
> +++ b/libgo/Makefile.am	Mon Aug 06 21:40:21 2012 -0700
> @@ -390,6 +390,32 @@
>  runtime_lock_files = runtime/lock_sema.c runtime/thread-sema.c
>  endif
>
> +if LIBGO_IS_LINUX
> +runtime_getncpu_file = runtime/getncpu-linux.c
> +else
> +if LIBGO_IS_DARWIN
> +runtime_getncpu_file = runtime/getncpu-bsd.c
> +else
> +if LIBGO_IS_IRIX
> +runtime_getncpu_file = runtime/getncpu-irix.c
> +else
> +if LIBGO_IS_SOLARIS
> +runtime_getncpu_file = runtime/getncpu-solaris.c
> +else
> +if LIBGO_IS_FREEBSD
> +runtime_getncpu_file = runtime/getncpu-bsd.c
> +else
> +if LIBGO_IS_NETBSD
> +runtime_getncpu_file = runtime/getncpu-bsd.c
> +else
> +runtime_getncpu_file = runtime/getncpu-none.c
> +endif
> +endif
> +endif
> +endif
> +endif
> +endif
> +
>  runtime_files = \
> 	runtime/go-append.c \
> 	runtime/go-assert.c \
> @@ -481,7 +507,8 @@
> 	sema.c \
> 	sigqueue.c \
> 	string.c \
> -	time.c
> +	time.c \
> +	$(runtime_getncpu_file)
>
>  goc2c.$(OBJEXT): runtime/goc2c.c
> 	$(CC_FOR_BUILD) -c $(CFLAGS_FOR_BUILD) $<
> diff -r 12a361bc53d0 libgo/runtime/getncpu-bsd.c
> --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
> +++ b/libgo/runtime/getncpu-bsd.c	Mon Aug 06 21:40:21 2012 -0700
> @@ -0,0 +1,24 @@
> +// Copyright 2012 The Go Authors. All rights reserved.
> +// Use of this source code is governed by a BSD-style
> +// license that can be found in the LICENSE file.
> +
> +#include
> +#include
> +
> +#include "runtime.h"
> +#include "defs.h"
> +
> +int32
> +getproccount(void)
> +{
> +	int mib[2], out;
> +	size_t len;
> +
> +	mib[0] = CTL_HW;
> +	mib[1] = HW_NCPU;
> +	len = sizeof(out);
> +	if(sysctl(mib, 2, &out, &len, NULL, 0) >= 0)
> +		return (int32)out;
> +	else
> +		return 0;
> +}
> diff -r 12a361bc53d0 libgo/runtime/getncpu-irix.c
> --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
> +++ b/libgo/runtime/getncpu-irix.c	Mon Aug 06 21:40:21 2012 -0700
> @@ -0,0 +1,16 @@
> +// Copyright 2012 The Go Authors. All rights reserved.
> +// Use of this source code is governed by a BSD-style
> +// license that can be found in the LICENSE file.
> +
> +#include
> +
> +#include "runtime.h"
> +#include "defs.h"
> +
> +int32
> +getproccount(void)
> +{
> +	int32 n;
> +	n = (int32)sysconf(_SC_NPROC_ONLN);
> +	return n > 1 ? n : 1;
> +}
> diff -r 12a361bc53d0 libgo/runtime/getncpu-linux.c
> --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
> +++ b/libgo/runtime/getncpu-linux.c	Mon Aug 06 21:40:21 2012 -0700
> @@ -0,0 +1,47 @@
> +// Copyright 2012 The Go Authors. All rights reserved.
> +// Use of this source code is governed by a BSD-style
> +// license that can be found in the LICENSE file.
> +
> +#include
> +#include
> +#include
> +#include
> +
> +#include "runtime.h"
> +#include "defs.h"
> +
> +#ifndef O_CLOEXEC
> +#define O_CLOEXEC 0
> +#endif
> +
> +int32
> +getproccount(void)
> +{
> +	int32 fd, rd, cnt, cpustrlen;
> +	const char *cpustr;
> +	const byte *pos;
> +	byte *bufpos;
> +	byte buf[256];
> +
> +	fd = open("/proc/stat", O_RDONLY|O_CLOEXEC, 0);
> +	if(fd == -1)
> +		return 1;
> +	cnt = 0;
> +	bufpos = buf;
> +	cpustr = "\ncpu";
> +	cpustrlen = strlen(cpustr);
> +	for(;;) {
> +		rd = read(fd, bufpos, sizeof(buf)-cpustrlen);
> +		if(rd == -1)
> +			break;
> +		bufpos[rd] = 0;
> +		for(pos=buf; (pos=(const byte*)strstr((const char*)pos, cpustr)) != nil; cnt++, pos++) {
> +		}
> +		if(rd < cpustrlen)
> +			break;
> +		memmove(buf, bufpos+rd-cpustrlen+1, cpustrlen-1);
> +		bufpos = buf+cpustrlen-1;
> +	}
> +	close(fd);
> +	return cnt ? cnt : 1;
> +}
> diff -r 12a361bc53d0 libgo/runtime/getncpu-none.c
> --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
> +++ b/libgo/runtime/getncpu-none.c	Mon Aug 06 21:40:21 2012 -0700
> @@ -0,0 +1,12 @@
> +// Copyright 2012 The Go Authors. All rights reserved.
> +// Use of this source code is governed by a BSD-style
> +// license that can be found in the LICENSE file.
> +
> +#include "runtime.h"
> +#include "defs.h"
> +
> +int32
> +getproccount(void)
> +{
> +	return 0;
> +}
> diff -r 12a361bc53d0 libgo/runtime/getncpu-solaris.c
> --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
> +++ b/libgo/runtime/getncpu-solaris.c	Mon Aug 06 21:40:21 2012
Re: [PATCH] Set correct source location for deallocator calls
On Tue, Aug 7, 2012 at 12:07 AM, Dehao Chen wrote:
> Ping...
>
> Richard, could you shed some light on this?
>
> Thanks,
> Dehao
>
> On Mon, Jul 30, 2012 at 8:29 PM, Dehao Chen wrote:
>> Hi,
>>
>> This patch fixes the source location for automatically generated calls
>> to deallocators. For example:
>>
>> 19 void foo(int i)
>> 20 {
>> 21   for (int j = 0; j < 10; j++)
>> 22     {
>> 23       t test;
>> 24       test.foo();
>> 25       if (i + j)
>> 26         {
>> 27           test.bar();
>> 28           return;
>> 29         }
>> 30     }
>> 31   return;
>> 32 }
>>
>> The deallocator for "23 t test" is called in two places: line 28 and
>> line 30. However, gcc attributes both call sites to line 30.
>>
>> Bootstrapped and passed gcc regression tests.
>>
>> Is it ok for trunk?
>>
>> Thanks,
>> Dehao
>>
>> gcc/ChangeLog
>>
>> 2012-07-31  Dehao Chen
>>
>> 	* tree-eh.c (goto_queue_node): New field.
>> 	(record_in_goto_queue): New parameter.
>> 	(record_in_goto_queue_label): New parameter.
>> 	(lower_try_finally_copy): Update source location.
>>
>> gcc/testsuite/ChangeLog
>>
>> 2012-07-31  Dehao Chen
>>
>> 	* g++.dg/guality/deallocator.C: New test.
>>
>> Index: gcc/testsuite/g++.dg/guality/deallocator.C
>> ===================================================================
>> --- gcc/testsuite/g++.dg/guality/deallocator.C (revision 0)
>> +++ gcc/testsuite/g++.dg/guality/deallocator.C (revision 0)
>> @@ -0,0 +1,33 @@
>> +// Test that debug info generated for auto-inserted deallocator is
>> +// correctly attributed.
>> +// This patch scans for the lineno directly from assembly, which may
>> +// differ between different architectures. Because it mainly tests
>> +// FE generated debug info, without losing generality, only x86
>> +// assembly is scanned in this test.
>> +// { dg-do compile { target { i?86-*-* x86_64-*-* } } }
>> +// { dg-options "-O2 -fno-exceptions -g" }
>> +
>> +struct t {
>> +  t ();
>> +  ~t ();
>> +  void foo();
>> +  void bar();
>> +};
>> +
>> +int bar();
>> +
>> +void foo(int i)
>> +{
>> +  for (int j = 0; j < 10; j++)
>> +    {
>> +      t test;
>> +      test.foo();
>> +      if (i + j)
>> +        {
>> +          test.bar();
>> +          return;
>> +        }
>> +    }
>> +  return;
>> +}
>> +// { dg-final { scan-assembler "1 28 0" } }
>> Index: gcc/tree-eh.c
>> ===================================================================
>> --- gcc/tree-eh.c (revision 189835)
>> +++ gcc/tree-eh.c (working copy)
>> @@ -321,6 +321,7 @@
>>  struct goto_queue_node
>>  {
>>    treemple stmt;
>> +  enum gimple_code code;
>>    gimple_seq repl_stmt;
>>    gimple cont_stmt;
>>    int index;
>> @@ -560,7 +561,8 @@
>>  record_in_goto_queue (struct leh_tf_state *tf,
>>                        treemple new_stmt,
>>                        int index,
>> -                      bool is_label)
>> +                      bool is_label,
>> +                      enum gimple_code code)
>>  {
>>    size_t active, size;
>>    struct goto_queue_node *q;
>> @@ -583,6 +585,7 @@
>>    memset (q, 0, sizeof (*q));
>>    q->stmt = new_stmt;
>>    q->index = index;
>> +  q->code = code;
>>    q->is_label = is_label;
>>  }
>>
>> @@ -590,7 +593,8 @@
>>     TF is not null.  */
>>
>>  static void
>> -record_in_goto_queue_label (struct leh_tf_state *tf, treemple stmt, tree label)
>> +record_in_goto_queue_label (struct leh_tf_state *tf, treemple stmt, tree label,
>> +                            enum gimple_code code)
>>  {
>>    int index;
>>    treemple temp, new_stmt;
>> @@ -629,7 +633,7 @@
>>       since with a GIMPLE_COND we have an easy access to the then/else
>>       labels.  */
>>    new_stmt = stmt;
>> -  record_in_goto_queue (tf, new_stmt, index, true);
>> +  record_in_goto_queue (tf, new_stmt, index, true, code);
>>  }
>>
>>  /* For any GIMPLE_GOTO or GIMPLE_RETURN, decide whether it leaves a try_finally
>> @@ -649,19 +653,22 @@
>>      {
>>      case GIMPLE_COND:
>>        new_stmt.tp = gimple_op_ptr (stmt, 2);
>> -      record_in_goto_queue_label (tf, new_stmt, gimple_cond_true_label (stmt));
>> +      record_in_goto_queue_label (tf, new_stmt, gimple_cond_true_label (stmt),
>> +                                  gimple_code (stmt));
>>        new_stmt.tp = gimple_op_ptr (stmt, 3);
>> -      record_in_goto_queue_label (tf, new_stmt, gimple_cond_false_label (stmt));
>> +      record_in_goto_queue_label (tf, new_stmt, gimple_cond_false_label (stmt),
>> +                                  gimple_code (stmt));
>>        break;
>>      case GIMPLE_GOTO:
>>        new_stmt.g = stmt;
>> -      record_in_goto_queue_label (tf, new_stmt, gimple_goto_dest (stmt));
>> +      record_in_goto_queue_label (tf, new_stmt, gimple_goto_dest (stmt),
>> +                                  gimple_code (stmt));
>>        break;
>>
>>      case GIMPLE_RETURN:
>>        tf->may_return = true;
>>        new_stmt.g = stmt;
>> -      record_in_goto_queue (tf, new_stmt,
Re: [google/main] Omit TARGET_LIB_PATH from RPATH_ENVVAR in HOST_EXPORTS on bootstrap builds (issue6441093)
On Thu, Aug 2, 2012 at 7:37 AM, Simon Baldwin wrote:
> Omit TARGET_LIB_PATH from RPATH_ENVVAR in HOST_EXPORTS on bootstrap builds.
>
> Discussion and rationale at: http://gcc.gnu.org/ml/gcc/2012-06/msg00314.html
>
> For google/main. Tested for bootstrap and regression.
>
> 2012-08-02  Simon Baldwin
>
> 	* Makefile.tpl: Omit TARGET_LIB_PATH from RPATH_ENVVAR set in
> 	HOST_EXPORTS on bootstrap builds.
> 	* Makefile.in: Regenerate.

This is OK.

Does this belong in google/integration?  Or will you be sending this for
trunk?  If you're not going to be sending this to trunk, please put it in
google/integration.

Diego.
Re: add strnlen to libiberty (was Re: Assembly output optimisations)
On Tue, Aug 7, 2012 at 2:30 AM, Hans-Peter Nilsson wrote:
>
> Just don't forget that libiberty isn't a target library anymore.
> To wit, the (GCC) run-time exception is moot for that code, AFAIK.
> Maybe enough reason to abandon that rule so its code can be
> truly and freely shared between GNU projects.

The libiberty licensing is certainly confused.  I just don't want to
make it worse.

None of the code in libiberty is under the GCC Runtime Library
Exception, so I think that particular issue does not apply.

Ian
[PATCH][3/n] Allow anonymous SSA names, add copy_ssa_name
This adds copy_ssa_name, similar to duplicate_ssa_name but not copying any
annotations (thus, it just makes a new SSA name that looks similar to a
template).  This avoids a bunch of SSA_NAME_VAR uses which will no longer
work for creating SSA names off of.

The 2nd part is stuff I came across when working on the overall patch.  We
should release default defs of unused decls as they leak otherwise.  Also
may_propagate_copy_into_asm always returns true as DECL_HARD_REGISTER vars
are never is_gimple_reg()s and thus never have SSA names associated.
set_ssa_default_def should clear SSA_NAME_DEFAULT_DEF off the name it
replaces, even if the replacement is NULL.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.  I will
commit it with two revisions separating the changes to allow easier
bisection (the copy_ssa_name part is a complete no-op).

Richard.

2012-08-07  Richard Guenther

	* tree-flow.h (copy_ssa_name_fn): New function.
	(duplicate_ssa_name_fn): Likewise.
	* tree-flow-inline.h (copy_ssa_name): New function.
	(duplicate_ssa_name): Likewise.
	* tree-ssanames.c (copy_ssa_name_fn): New function.
	(duplicate_ssa_name): Rename to ...
	(duplicate_ssa_name_fn): ... this and adjust.
	* tree-tailcall.c (update_accumulator_with_ops): Use copy_ssa_name.
	* tree-vect-loop-manip.c (slpeel_update_phi_nodes_for_guard1):
	Likewise.
	(slpeel_update_phi_nodes_for_guard2): Likewise.
	(slpeel_tree_peel_loop_to_edge): Likewise.
	(vect_loop_versioning): Likewise.
	* tree-parloops.c (transform_to_exit_first_loop): Likewise.
	(create_parallel_loop): Likewise.
	* ipa-split.c (split_function): Likewise.
	* tree-vect-loop.c (vect_is_simple_reduction_1): Likewise.
	(vect_create_epilog_for_reduction): Likewise.
	* tree-vect-data-refs.c (bump_vector_ptr): Likewise.
	(vect_setup_realignment): Likewise.
	* tree-vect-stmts.c (vectorizable_load): Likewise.
	* tree-switch-conversion.c (build_one_array): Likewise.
	(gen_def_assigns): Likewise.
	* tree-cfg.c (gimple_make_forwarder_block): Likewise.
	* graphite-sese-to-poly.c (rewrite_close_phi_out_of_ssa): Call
	create_zero_dim_array with the SSA name.
	(rewrite_phi_out_of_ssa): Likewise.
	(rewrite_cross_bb_scalar_dependence): Likewise.  Use copy_ssa_name.
	* tree-dfa.c (set_ssa_default_def): Clear the SSA_NAME_DEFAULT_DEF
	bit of the old name when we clear the slot.
	* tree-ssa-live.c (remove_unused_locals): Release any default def
	associated with an unused var.
	* tree-ssa-copy.c (may_propagate_copy_into_asm): Always return true.

Index: gcc/tree-tailcall.c
===================================================================
--- gcc/tree-tailcall.c (revision 190199)
+++ gcc/tree-tailcall.c (working copy)
@@ -644,9 +644,9 @@ update_accumulator_with_ops (enum tree_c
			     gimple_stmt_iterator gsi)
 {
   gimple stmt;
-  tree var;
+  tree var = copy_ssa_name (acc, NULL);
   if (types_compatible_p (TREE_TYPE (acc), TREE_TYPE (op1)))
-    stmt = gimple_build_assign_with_ops (code, SSA_NAME_VAR (acc), acc, op1);
+    stmt = gimple_build_assign_with_ops (code, var, acc, op1);
   else
     {
       tree rhs = fold_convert (TREE_TYPE (acc),
@@ -656,11 +656,8 @@ update_accumulator_with_ops (enum tree_c
				       op1));
       rhs = force_gimple_operand_gsi (&gsi, rhs,
				      false, NULL, false, GSI_CONTINUE_LINKING);
-      stmt = gimple_build_assign (NULL_TREE, rhs);
+      stmt = gimple_build_assign (var, rhs);
     }
-  var = make_ssa_name (SSA_NAME_VAR (acc), stmt);
-  gimple_assign_set_lhs (stmt, var);
-  update_stmt (stmt);
   gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
   return var;
 }
Index: gcc/tree-vect-loop-manip.c
===================================================================
--- gcc/tree-vect-loop-manip.c (revision 190199)
+++ gcc/tree-vect-loop-manip.c (working copy)
@@ -511,14 +511,15 @@ slpeel_update_phi_nodes_for_guard1 (edge
	gsi_next (&gsi_orig), gsi_next (&gsi_update))
     {
       source_location loop_locus, guard_locus;
+      tree new_res;
       orig_phi = gsi_stmt (gsi_orig);
       update_phi = gsi_stmt (gsi_update);

       /** 1. Handle new-merge-point phis  **/

       /* 1.1. Generate new phi node in NEW_MERGE_BB:  */
-      new_phi = create_phi_node (SSA_NAME_VAR (PHI_RESULT (orig_phi)),
-				 new_merge_bb);
+      new_res = copy_ssa_name (PHI_RESULT (orig_phi), NULL);
+      new_phi = create_phi_node (new_res, new_merge_bb);

       /* 1.2. NEW_MERGE_BB has two incoming edges: GUARD_EDGE and the
	      exit-edge of LOOP.  Set the two phi args in NEW_PHI for these
	      edges:  */
@@ -547,8 +548,8 @@ slpeel_update_phi_nodes_for_guard1 (edge
	 continue;

       /* 2.1. Generate new phi node in NEW_EXIT_BB:  */
-
Re: [ping] Re: [patch v2] support for multiarch systems
On Tue, Aug 7, 2012 at 3:31 AM, Matthias Klose wrote:
> ping?
>
> On 08.07.2012 20:48, Matthias Klose wrote:
>> Please find attached v2 of the patch updated for trunk 20120706, x86 only,
>> tested on x86-linux-gnu, KFreeBSD and the Hurd.

+[case "${withval}" in
+yes|no|auto-detect) enable_multiarch=$withval;;
+*) AC_MSG_ERROR(bad value ${withval} given for --enable-multiarch option) ;;

Please just use "auto", not "auto-detect", as in libgcc/configure.ac.

> +# needed for setting the multiarch name on ARM
> +AC_SUBST(with_float)

This seems like it should be in a different patch.

> +with_float = @with_float@

Likewise.

+ifeq ($(enable_multiarch),yes)
+  if_multiarch = $(1)
+else ifeq ($(enable_multiarch),auto-detect)
+  if_multiarch = $(if $(wildcard $(shell echo $(SYSTEM_HEADER_DIR))/../../usr/lib/*/crti.o),$(1))
+else
+  if_multiarch =
+endif

This is nicely tricky but I don't understand why you are doing this in such
a confusing way.  Why not simply set the names in config.gcc?  What
information do you have at make time that you don't have at configure time?

The auto case is not general.  When cross-compiling without --sysroot, or
when using --native-system-header-dir, the test won't work.  Without
--sysroot there may not be much that you can do to auto-detect multiarch.
And with --sysroot, there should be no need to use SYSTEM_HEADER_DIR at
all; just look in the sysroot.

> +@item --enable-multiarch
> +Specify whether to enable or disable multiarch support.  The default is
> +to detect for glibc start files in a multiarch location, and enable it
> +if the files are found.

This is quite obscure for anybody who does not already know what multiarch is.

I'm sorry, I have to repeat my extreme annoyance that this scheme breaks
all earlier GCC releases.

Ian
[RFA 4.7/4.6] Re: [PATCH v2] Target-specific limits on vector alignment
Richard Guenther wrote:
> On Fri, Jul 27, 2012 at 5:24 PM, Ulrich Weigand wrote:
> > ChangeLog:
> >
> > 	* target.def (vector_alignment): New target hook.
> > 	* doc/tm.texi.in (TARGET_VECTOR_ALIGNMENT): Document new hook.
> > 	* doc/tm.texi: Regenerate.
> > 	* targhooks.c (default_vector_alignment): New function.
> > 	* targhooks.h (default_vector_alignment): Add prototype.
> > 	* stor-layout.c (layout_type): Use targetm.vector_alignment.
> > 	* config/arm/arm.c (arm_vector_alignment): New function.
> > 	(TARGET_VECTOR_ALIGNMENT): Define.
> >
> > 	* tree-vect-data-refs.c (vect_update_misalignment_for_peel): Use
> > 	vector type alignment instead of size.
> > 	* tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Use
> > 	element type size directly instead of computing it from alignment.
> > 	Fix variable naming and comment.
> >
> > testsuite/ChangeLog:
> >
> > 	* lib/target-supports.exp
> > 	(check_effective_target_vect_natural_alignment): New function.
> > 	* gcc.dg/align-2.c: Only run on targets with natural alignment
> > 	of vector types.
> > 	* gcc.dg/vect/slp-25.c: Adjust tests for targets without natural
> > 	alignment of vector types.
> >
> > OK for mainline?
>
> Ok.

Would it be OK to backport this to 4.7 and possibly 4.6?

This patch represents a change in the ABI on ARM (the alignment could
potentially affect struct member offsets, for example), which could
conceivably cause incompatibilities with code compiled with older
versions of GCC.  We had originally decided we nevertheless want to
implement this change on ARM, because:

- GCC is now in compliance with the platform ABI document
- the old code could actually be buggy since GCC *assumed* 16-byte
  alignment that wasn't actually guaranteed
- code actually affected by this change ought to be rare (code using
  NEON vector types is rare in the first place, code using *structure
  members* of such types is even rarer, and code using such structures
  in cross-module APIs seems to be extremely rare)

Nevertheless, given we do want to make this change, it would be best
to roll it out as quickly as possible, to minimize the time period
where people might use old (not yet fixed) compilers to generate
non-ABI-compliant binaries.  Thus, we think it would be good for
the change to be applied to all still-open release branches as well.

(Note that while the patch contains changes to common code, those
should be no-ops for all targets that do not implement the new hook.)

Bye,
Ulrich

-- 
Dr. Ulrich Weigand
GNU Toolchain for Linux on System z and Cell BE
ulrich.weig...@de.ibm.com
Re: [RFA 4.7/4.6] Re: [PATCH v2] Target-specific limits on vector alignment
On Tue, Aug 7, 2012 at 4:56 PM, Ulrich Weigand wrote:
> Richard Guenther wrote:
>> On Fri, Jul 27, 2012 at 5:24 PM, Ulrich Weigand wrote:
>>> ChangeLog:
>>>
>>> 	* target.def (vector_alignment): New target hook.
>>> 	* doc/tm.texi.in (TARGET_VECTOR_ALIGNMENT): Document new hook.
>>> 	* doc/tm.texi: Regenerate.
>>> 	* targhooks.c (default_vector_alignment): New function.
>>> 	* targhooks.h (default_vector_alignment): Add prototype.
>>> 	* stor-layout.c (layout_type): Use targetm.vector_alignment.
>>> 	* config/arm/arm.c (arm_vector_alignment): New function.
>>> 	(TARGET_VECTOR_ALIGNMENT): Define.
>>>
>>> 	* tree-vect-data-refs.c (vect_update_misalignment_for_peel): Use
>>> 	vector type alignment instead of size.
>>> 	* tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Use
>>> 	element type size directly instead of computing it from alignment.
>>> 	Fix variable naming and comment.
>>>
>>> testsuite/ChangeLog:
>>>
>>> 	* lib/target-supports.exp
>>> 	(check_effective_target_vect_natural_alignment): New function.
>>> 	* gcc.dg/align-2.c: Only run on targets with natural alignment
>>> 	of vector types.
>>> 	* gcc.dg/vect/slp-25.c: Adjust tests for targets without natural
>>> 	alignment of vector types.
>>>
>>> OK for mainline?
>>
>> Ok.
>
> Would it be OK to backport this to 4.7 and possibly 4.6?
>
> This patch represents a change in the ABI on ARM (the alignment could
> potentially affect struct member offsets, for example), which could
> conceivably cause incompatibilities with code compiled with older
> versions of GCC.  We had originally decided we nevertheless want to
> implement this change on ARM, because:
>
> - GCC is now in compliance with the platform ABI document
> - the old code could actually be buggy since GCC *assumed* 16-byte
>   alignment that wasn't actually guaranteed
> - code actually affected by this change ought to be rare (code using
>   NEON vector types is rare in the first place, code using *structure
>   members* of such types is even rarer, and code using such structures
>   in cross-module APIs seems to be extremely rare)
>
> Nevertheless, given we do want to make this change, it would be best
> to roll it out as quickly as possible, to minimize the time period
> where people might use old (not yet fixed) compilers to generate
> non-ABI-compliant binaries.  Thus, we think it would be good for
> the change to be applied to all still open release branches as well.
>
> (Note that while the patch contains changes to common code, those
> should be no-ops for all targets that do not implement the new hook.)

I'll defer the decision to the target maintainers.  But please double-check
for any changes in the vectorizer parts when backporting to 4.6.

Thanks,
Richard.

> Bye,
> Ulrich
>
> -- 
> Dr. Ulrich Weigand
> GNU Toolchain for Linux on System z and Cell BE
> ulrich.weig...@de.ibm.com
Re: add strnlen to libiberty (was Re: Assembly output optimisations)
On Tue, 7 Aug 2012, Ian Lance Taylor wrote:
> On Tue, Aug 7, 2012 at 2:30 AM, Hans-Peter Nilsson wrote:
>>
>> Just don't forget that libiberty isn't a target library anymore.
>> To wit, the (GCC) run-time exception is moot for that code, AFAIK.
>> Maybe enough reason to abandon that rule so its code can be
>> truly and freely shared between GNU projects.
>
> The libiberty licensing is certainly confused.  I just don't want to
> make it worse.

I think the natural way to sort it out is to move all the FSF-copyright
files therein (including include/) to GPLv3, no license exception, except
for cp-demangle.c (used in libstdc++-v3) and the headers it includes,
which should have the GCC Runtime Library Exception notice.  libiberty is
a library for a limited number of GNU projects, all under GPLv3; as far as
I know the only reason it hasn't been converted to GPLv3 is that no one
got around to doing so.  (gnulib also uses the practice of putting GPLv3
license notices on the files even if they are also available under other
licenses, with separate metadata indicating other licenses under which
files are available.)

That wouldn't sort out the question of which "This file is part of"
notices should be present, but would deal with the other license
confusion.

(Ideally I think most of libiberty would be replaced by use of gnulib in
the projects using libiberty - I see no real advantage to the GNU Project
in having those two separate libraries for substantially the same
purposes - but that's a much larger and harder task, which would also
involve, for each libiberty file replaced by a gnulib version,
ascertaining whether there are any features or local changes in the
libiberty version that should be merged into the gnulib version or any
common upstream such as glibc.  And some files in libiberty would
probably need adding to gnulib as part of such a project.)

-- 
Joseph S. Myers
jos...@codesourcery.com
Re: [RFA 4.7/4.6] Re: [PATCH v2] Target-specific limits on vector alignment
On 07/08/12 16:04, Richard Guenther wrote: > On Tue, Aug 7, 2012 at 4:56 PM, Ulrich Weigand wrote: >> Richard Guenther wrote: >>> On Fri, Jul 27, 2012 at 5:24 PM, Ulrich Weigand wrote: ChangeLog: * target.def (vector_alignment): New target hook. * doc/tm.texi.in (TARGET_VECTOR_ALIGNMENT): Document new hook. * doc/tm.texi: Regenerate. * targhooks.c (default_vector_alignment): New function. * targhooks.h (default_vector_alignment): Add prototype. * stor-layout.c (layout_type): Use targetm.vector_alignment. * config/arm/arm.c (arm_vector_alignment): New function. (TARGET_VECTOR_ALIGNMENT): Define. * tree-vect-data-refs.c (vect_update_misalignment_for_peel): Use vector type alignment instead of size. * tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Use element type size directly instead of computing it from alignment. Fix variable naming and comment. testsuite/ChangeLog: * lib/target-supports.exp (check_effective_target_vect_natural_alignment): New function. * gcc.dg/align-2.c: Only run on targets with natural alignment of vector types. * gcc.dg/vect/slp-25.c: Adjust tests for targets without natural alignment of vector types. >> >> OK for mainline? >>> >>> Ok. >> >> Would it be OK to backport this to 4.7 and possibly 4.6? >> >> This patch represents a change in the ABI on ARM (the alignment could >> potentially affect struct member offsets, for example), which could >> conceivably cause incompatibilities with code compiled with older >> versions of GCC. 
We had originally decided we nevertheless want to >> implement this change on ARM, because: >> >> - GCC is now in compliance with the platform ABI document >> - the old code could actually be buggy since GCC *assumed* 16-byte >> alignment that wasn't actually guaranteed >> - code actually affected by this change ought to be rare (code using >> NEON vector types is rare in the first place, code using *structure >> members* of such types is even rarer, and code using such structures >> in cross-module APIs seems to be extremely rare) >> >> Nevertheless, given we do want to make this change, it would be best >> to roll it out as quickly as possible, to minimize the time period >> where people might use old (not yet fixed) compilers to generate >> non-ABI-compliant binaries. Thus, we think it would be good for >> the change to be applied to all still open release branches as well. >> >> (Note that while the patch contains changes to common code, those >> should be no-ops for all targets that do not implement the new hook.) > > I'll defer the decision to the target maintainers. But please double-check > for any changes in the vectorizer parts when backporting to 4.6. > My inclination is to back-port this to all maintained branches (for consistency). However, it does need to be release-noted. R.
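For readers without an ARM background, the ABI-visible effect Ulrich describes can be sketched with GCC's generic vector extension; the type and struct names below are illustrative, and the concrete offsets depend on the target's (possibly hook-limited) vector alignment.

```cpp
#include <cstddef>

// Stand-in for a 16-byte NEON-style vector type (illustrative name).
typedef int v4si __attribute__ ((vector_size (16)));

// A structure member of vector type: if the new hook lowers the
// alignment GCC assumes for v4si, the offset of `v` -- and therefore
// struct layout across modules -- can change.  That is the ABI
// concern driving the backport request.
struct with_vec
{
  char c;
  v4si v;
};
```

Comparing `offsetof (with_vec, v)` between compiler versions is the kind of cross-module check the thread has in mind.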
Re: [RFA 4.7/4.6] Re: [PATCH v2] Target-specific limits on vector alignment
>> (Note that while the patch contains changes to common code, those >> should be no-ops for all targets that do not implement the new hook.) > > I'll defer the decision to the target maintainers. I'd rather have this consistent across all maintained release branches today than to leave this for an ABI break in 4.8 timeframe. In addition I'd like this documented in changes.html for each of the release branches. Thanks, Ramana
Re: [PATCH] put exception tables for comdat functions in comdat, too
On 08/06/2012 05:45 PM, Sandra Loosemore wrote: > OK to commit the patch as originally posted? Yes. I misread the part of that function about relocations when skimming. r~
Re: libgo patch committed: Support NumCPU on more platforms
On Tue, Aug 7, 2012 at 5:21 AM, Richard Earnshaw wrote: > > Wouldn't it be more useful on Linux to check the task's affinity > settings? Then when a task is locked to a limited set of cores it won't > overload those cores with threads. Good question. I'm not sure. This patch does not change the number of threads that the Go runtime will use, so your particular concern is not an issue. This patch changes the return value of the standard library function runtime.NumCPU. And if the number of processors is > 1, it tweaks some of the runtime code, e.g., to use a limited time spin lock (if the number of processors == 1, a spin lock is pointless, and the code moves straight to a futex). I guess that if the process is restricted to a single processor by affinity, then it might be better to be aware of that and skip the spin lock. But I'm not sure what the return value of runtime.NumCPU should be. It's documented as "NumCPU returns the number of logical CPUs on the local machine." Ian
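On Linux, the two quantities being debated — processors online versus processors the task's affinity mask permits — can be queried as below. This only sketches the distinction; the helper names are mine, not the Go runtime's implementation.

```cpp
#include <sched.h>
#include <unistd.h>

// All logical CPUs online: roughly what runtime.NumCPU is documented
// to report ("the number of logical CPUs on the local machine").
int online_cpus ()
{
  long n = sysconf (_SC_NPROCESSORS_ONLN);
  return n > 0 ? (int) n : 1;
}

// CPUs this task may actually run on, per its affinity mask; Richard's
// suggestion is to consult this before enabling the spin-lock path.
int affinity_cpus ()
{
  cpu_set_t set;
  CPU_ZERO (&set);
  if (sched_getaffinity (0, sizeof set, &set) != 0)
    return online_cpus ();	/* fall back if the query fails */
  int n = CPU_COUNT (&set);
  return n > 0 ? n : 1;
}
```

A runtime pinned to one core by affinity could then skip spinning even on a multi-core machine, while NumCPU kept reporting the machine-wide count.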
Re: [PATCH v3] s390: Convert from sync to atomic optabs
Richard Henderson wrote: > [ This ought to be exactly the patch you bootstrapped. It does > not include the SEQ follow-up. ] > > Split out s390_two_part_insv from s390_expand_cs_hqi to try > harder to use bit insertion instructions in the CAS loop. > > Reorg s390_expand_insv to aid that. Try RISBG last, after other > mechanisms have failed; don't require operands in registers for > it but force them there instead. Try a limited form of ICM. This looks good to me. Retested with no regressions. Thanks, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: s390: Avoid CAS boolean output inefficiency
Richard Henderson wrote: > On 08/06/2012 11:34 AM, Ulrich Weigand wrote: > > There is one particular inefficiency I have noticed. This code: > > > > if (!__atomic_compare_exchange_n (&v, &expected, max, 0 , 0, 0)) > > abort (); > > > > from atomic-compare-exchange-3.c gets compiled into: > > > > l %r3,0(%r2) > > larl%r1,v > > cs %r3,%r4,0(%r1) > > ipm %r1 > > sra %r1,28 > > st %r3,0(%r2) > > ltr %r1,%r1 > > jne .L3 > > > > which is extremely inefficient; it converts the condition code into > > an integer using the slow ipm, sra sequence, just so that it can > > convert the integer back into a condition code via ltr and branch > > on it ... > > This was caused (or perhaps abetted by) the representation of EQ > as NE ^ 1. With the subsequent truncation and zero-extend, I > think combine reached its insn limit of 3 before seeing everything > it needed to see. Yes, combine isn't able to handle everything in one go, but it finds an intermediate insn. With the old __sync_compare_and_swap, it is then able to optimize everything away in a second step; the only reason this doesn't work any more is the intervening store. The following patch changes the builtin expander to pass a MEM oldval as-is to the back-end expander, so that the back-end can move the store to before the CC operation. With that patch I'm also seeing all the IPMs disappear. [ Well, all except for this one: > This happens because CSE notices the cbranch vs 0, and sets r116 > to zero along the path to > > 32 if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG, > __ATOMIC_RELEASE, > __ATOMIC_ACQUIRE)) > > at which point CSE decides that it would be cheaper to "re-use" > the zero already in r116 instead of load another constant 0 here. > After that, combine is ham-strung because r116 is not dead. As you point out, this does indeed fix this problem as well: > A short-term possibility is to have the CAS insns accept general_operand, > so that the 0 gets merged. ] What do you think about this solution? 
It has the advantage that we still get the same xor code if we actually do need the ipm ... Bye, Ulrich diff -ur gcc/builtins.c gcc.new/builtins.c --- gcc/builtins.c 2012-08-07 16:04:45.054348099 +0200 +++ gcc.new/builtins.c 2012-08-07 15:44:01.304349225 +0200 @@ -5376,6 +5376,7 @@ expect = expand_normal (CALL_EXPR_ARG (exp, 1)); expect = convert_memory_address (Pmode, expect); + expect = gen_rtx_MEM (mode, expect); desired = expand_expr_force_mode (CALL_EXPR_ARG (exp, 2), mode); weak = CALL_EXPR_ARG (exp, 3); @@ -5383,14 +5384,15 @@ if (host_integerp (weak, 0) && tree_low_cst (weak, 0) != 0) is_weak = true; - oldval = copy_to_reg (gen_rtx_MEM (mode, expect)); - + oldval = expect; if (!expand_atomic_compare_and_swap ((target == const0_rtx ? NULL : &target), &oldval, mem, oldval, desired, is_weak, success, failure)) return NULL_RTX; - emit_move_insn (gen_rtx_MEM (mode, expect), oldval); + if (oldval != expect) +emit_move_insn (expect, oldval); + return target; } diff -ur gcc/config/s390/s390.md gcc.new/config/s390/s390.md --- gcc/config/s390/s390.md 2012-08-07 16:04:54.204348621 +0200 +++ gcc.new/config/s390/s390.md 2012-08-07 16:00:21.934348628 +0200 @@ -8870,7 +8870,7 @@ (define_expand "atomic_compare_and_swap" [(match_operand:SI 0 "register_operand") ;; bool success output - (match_operand:DGPR 1 "register_operand") ;; oldval output + (match_operand:DGPR 1 "nonimmediate_operand");; oldval output (match_operand:DGPR 2 "memory_operand") ;; memory (match_operand:DGPR 3 "register_operand") ;; expected intput (match_operand:DGPR 4 "register_operand") ;; newval intput @@ -8879,9 +8879,17 @@ (match_operand:SI 7 "const_int_operand")] ;; failure model "" { - rtx cc, cmp; + rtx cc, cmp, output = operands[1]; + + if (!register_operand (output, mode)) +output = gen_reg_rtx (mode); + emit_insn (gen_atomic_compare_and_swap_internal -(operands[1], operands[2], operands[3], operands[4])); +(output, operands[2], operands[3], operands[4])); + + if (output != operands[1]) 
+emit_move_insn (operands[1], output); + cc = gen_rtx_REG (CCZ1mode, CC_REGNUM); cmp = gen_rtx_EQ (SImode, cc, const0_rtx); emit_insn (gen_cstorecc4 (operands[0], cmp, cc, const0_rtx)); -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
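For reference, the semantics the back-end is trying to branch on cheaply are just those of the builtin itself; a target-independent sketch (not s390 code):

```cpp
int v = 41;

// Strong CAS: returns true iff *p == expected_val held and the store
// of `desired` happened.  The codegen goal in this thread is to branch
// on that boolean straight from the condition code, without the
// ipm/sra materialization in between.
bool cas (int *p, int expected_val, int desired)
{
  int expected = expected_val;
  return __atomic_compare_exchange_n (p, &expected, desired,
				      /*weak=*/0,
				      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

On failure the builtin writes the observed value back through the `expected` pointer, which is exactly the store that was getting in combine's way above.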
Re: [patch] Reduce memory overhead for build_insn_chain
On 08/07/2012 02:22 AM, Steven Bosscher wrote: Hello, In the test case for PR54146, build_insn_chain spends almost all its time in this loop: FOR_EACH_BB_REVERSE (bb) { bitmap_iterator bi; rtx insn; CLEAR_REG_SET (live_relevant_regs); --> memset (live_subregs_used, 0, max_regno * sizeof (int)); The test case has >400,000 basic blocks and >1,000,000 regs... Fortunately this can be done much more efficiently if one realizes that an sbitmap knows its own size (using SBITMAP_SIZE that I introduced recently). With that, live_subregs_used can be a sparse bitmap. Bootstrapped&tested on x86_64-unknown-linux-gnu. OK for trunk? Ok. Thanks for the patch, Steven.
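The essence of the win is replacing a per-block O(max_regno) memset with clearing only the entries actually touched; a minimal sketch of that idea (an illustration of the principle, not the patch's actual sbitmap code):

```cpp
#include <cstddef>
#include <vector>

struct lazily_cleared
{
  std::vector<int> value;          // dense payload, allocated once
  std::vector<std::size_t> dirty;  // indices written since last clear

  explicit lazily_cleared (std::size_t n) : value (n, 0) {}

  void set (std::size_t i, int v)
  {
    if (value[i] == 0 && v != 0)
      dirty.push_back (i);
    value[i] = v;
  }

  // O(entries touched), not O(n) -- with >1,000,000 regs cleared once
  // per each of >400,000 blocks, this difference is the whole speedup.
  void clear ()
  {
    for (std::size_t i : dirty)
      value[i] = 0;
    dirty.clear ();
  }
};
```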
Re: [cxx-conversion] Make double_int a class with methods and operators. (issue6443093)
On Aug 6, 2012, at 5:35 PM, Lawrence Crowl wrote: > Convert double_int from a struct with function into a class with > operators and methods. We have a wide_int class that replaces this class. :-( It would have been better to just convert it. Do you guys have a timeframe for the cxx-conversion landing?
Re: [cxx-conversion] Make double_int a class with methods and operators. (issue6443093)
On 12-08-07 13:22 , Mike Stump wrote: On Aug 6, 2012, at 5:35 PM, Lawrence Crowl wrote: Convert double_int from a struct with function into a class with operators and methods. We have a wide_int class that replaces this class. :-( It would have been better to just convert it. Do you guys have a timeframe for the cxx-conversion landing? Soon (for "soon" measured in days). I will try to send the merge patches this week. Diego.
[RFH / Patch] PR 54191
Hi, this is an update on C++/54191, where in a number of cases, having to do with inaccessible bases, we don't handle correctly access control under SFINAE. The attached draft patch p fixes a number of tests (and passes regression testing), where we are currently emitting inaccessible base diagnostics from lookup_base instead of properly handling the failure in SFINAE. As you can see in the draft, after a number of tries, I'm simply adding a tsubst_flags_t parameter to lookup_base: obviously something is redundant with base_access, but knowing that a returned error_mark_node in every case means that the base is either ambiguous or inaccessible (when we error out outside SFINAE) is *so* convenient! For comparison, the existing get_delta_difference_1 shows an alternate approach, quite contorted IMHO. But these are just implementation details... More interesting are the cases, commented out in the attached 54191.C, which we still get wrong also with the patch applied. In those cases we do *not* currently produce any inaccessible base diagnostics, we simply mishandle the SFINAE, apparently we don't perform any form of access control. I'm really looking for help about this set of 5 tests (4 braced casts + conditional operator) Thanks! Paolo. /// Index: typeck.c === --- typeck.c(revision 190207) +++ typeck.c(working copy) @@ -2246,8 +2246,8 @@ build_class_member_access_expr (tree object, tree tree binfo; base_kind kind; - binfo = lookup_base (access_path ? access_path : object_type, - member_scope, ba_unique, &kind); + binfo = lookup_base_sfinae (access_path ? access_path : object_type, + member_scope, ba_unique, &kind, complain); if (binfo == error_mark_node) return error_mark_node; @@ -2630,7 +2630,8 @@ finish_class_member_access_expr (tree object, tree } /* Find the base of OBJECT_TYPE corresponding to SCOPE. 
*/ - access_path = lookup_base (object_type, scope, ba_check, NULL); + access_path = lookup_base_sfinae (object_type, scope, ba_check, + NULL, complain); if (access_path == error_mark_node) return error_mark_node; if (!access_path) @@ -3150,8 +3151,8 @@ get_member_function_from_ptrfunc (tree *instance_p if (!same_type_ignoring_top_level_qualifiers_p (basetype, TREE_TYPE (TREE_TYPE (instance_ptr { - basetype = lookup_base (TREE_TYPE (TREE_TYPE (instance_ptr)), - basetype, ba_check, NULL); + basetype = lookup_base_sfinae (TREE_TYPE (TREE_TYPE (instance_ptr)), +basetype, ba_check, NULL, complain); instance_ptr = build_base_path (PLUS_EXPR, instance_ptr, basetype, 1, complain); if (instance_ptr == error_mark_node) @@ -5995,9 +5996,9 @@ build_static_cast_1 (tree type, tree expr, bool c_ ambiguity. However, if this is a static_cast being performed because the user wrote a C-style cast, then accessibility is not considered. */ - base = lookup_base (TREE_TYPE (type), intype, - c_cast_p ? ba_unique : ba_check, - NULL); + base = lookup_base_sfinae (TREE_TYPE (type), intype, +c_cast_p ? ba_unique : ba_check, +NULL, complain); /* Convert from "B*" to "D*". This function will check that "B" is not a virtual base of "D". */ @@ -6119,9 +6120,9 @@ build_static_cast_1 (tree type, tree expr, bool c_ && check_for_casting_away_constness (intype, type, STATIC_CAST_EXPR, complain)) return error_mark_node; - base = lookup_base (TREE_TYPE (type), TREE_TYPE (intype), - c_cast_p ? ba_unique : ba_check, - NULL); + base = lookup_base_sfinae (TREE_TYPE (type), TREE_TYPE (intype), +c_cast_p ? ba_unique : ba_check, +NULL, complain); expr = build_base_path (MINUS_EXPR, expr, base, /*nonnull=*/false, complain); return cp_fold_convert(type, expr); @@ -7181,16 +7182,11 @@ get_delta_difference_1 (tree from, tree to, bool c { tree binfo; base_kind kind; - base_access access = c_cast_p ? ba_unique : ba_check; - /* Note: ba_quiet does not distinguish between access control and - ambiguity. 
*/ - if (!(complain & tf_error)) -access |= ba_quiet; + binfo = lookup_base_sfinae (to, from, c_cast_p ? ba_unique : ba_check, + &kind, complain); - binfo = lookup_base (to, fr
Re: [cxx-conversion] Make double_int a class with methods and operators. (issue6443093)
On 8/7/12, Richard Guenther wrote: > On Tue, Aug 7, 2012 at 2:35 AM, Lawrence Crowl wrote: > > Convert double_int from a struct with function into a class with > > operators and methods. > > > > This patch adds the methods and operators. In general functions of > > the form "double_int_whatever" become member functions "whatever" or, > > when possible, operators. > > > > Every attempt has been made to preserve the existing algorithms, even > > at the expense of some optimization opportunities. Some examples: > > > > The ext operation takes a value and returns a value. However, that > > return value is usually assigned to the original variable. An > > operation that modified a variable would be more efficient. > > That's not always the case though and I think the interface should be > consistent with existing behavior to avoid errors creeping in during the > transition. We should probably think about naming conventions for mutating operations, as I expect we will want them eventually. > > In some cases, an outer sign-specific function calls an inner > > function with the sign as a parameter, which then decides which > > implementation to do. Decisions should not be artificially > > introduced, and the implementation of each case should be exposed as > > a separate routine. > > > > The existing operations are implemented in terms of the new > > operations, which necessarily adds a layer between the new code and > > the existing users. Once all files have migrated, this layer will > > be removed. > > > > There are several existing operations implemented in terms of even > > older legacy operations. This extra layer has not been removed. > > > > On occasion though, parameterized functions are often called > > with constant arguments. To support static statement of intent, > > and potentially faster code in the future, there are several new > > unparameterized member functions. 
Some examples: > > > > Four routines now encode both 'arithmetic or logical' and 'right or > > left' shift as part of the function name. > > > > Four routines now encode both 'signed or unsigned' and 'less than or > > greater than' as part of the function name. > > For most parts overloads that take an (unsigned) HOST_WIDE_INT argument > would be nice, as well as the ability to say dbl + 1. Hm. There seems to be significant opinion that there should not be any implicit conversions. I am okay with operations as above, but would like to hear the opinions of others. > > -typedef struct > > +typedef struct double_int > > { > > +public: > > + /* Normally, we would define constructors to create instances. > > + Two things prevent us from doing so. > > + First, defining a constructor makes the class non-POD in C++03, > > + and we certainly want double_int to be a POD. > > + Second, the GCC coding conventions prefer explicit conversion, > > + and explicit conversion operators are not available until C++11. > > */ > > + > > + static double_int make (unsigned HOST_WIDE_INT cst); > > + static double_int make (HOST_WIDE_INT cst); > > + static double_int make (unsigned int cst); > > + static double_int make (int cst); > > Did we somehow decide to not allow constructors? It's odd to convert to > C++ and end up with static member functions resembling them ... Constructors are allowed, but PODs are often passed more efficiently. That property seemed particularly important for double_int. > Also I believe the conversion above introduces possible migration errors. > Think of a previous > > HOST_WIDE_INT a; > double_int d = uhwi_to_double_int (a); > > if you write that now as > > HOST_WIDE_INT a; > double_int d = double_int::make (a); > > you get the effect of shwi_to_double_int. Oops. Hm. I think the code was more likely to be wrong originally, but I take your point. > So as an intermediate > I'd like you _not_ to introduce the make () overloads. Okay. 
> Btw, if HOST_WIDE_INT == int the above won't even compile. Is that true of any host? I am not aware of any. Anyway, it is moot if we remove the overloads. -- Lawrence Crowl
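Richard's migration hazard reduces to ordinary overload resolution; `dint` below is a simplified stand-in for the reviewed double_int (invented layout, for illustration only):

```cpp
struct dint
{
  unsigned long long low;
  long long high;   // extension of the input value lands here

  // uhwi-style overload: zero-extend.
  static dint make (unsigned long long cst) { return dint {cst, 0}; }
  // shwi-style overload: sign-extend.
  static dint make (long long cst)
  { return dint {(unsigned long long) cst, cst < 0 ? -1LL : 0LL}; }
};
```

Rewriting a former `uhwi_to_double_int (a)` call as `dint::make (a)`, where `a` is a signed HOST_WIDE_INT, silently selects the signed overload — the shwi_to_double_int behavior Richard warns about.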
Re: [cxx-conversion] Make double_int a class with methods and operators. (issue6443093)
On 8/7/12, Mike Stump wrote: > On Aug 6, 2012, at 5:35 PM, Lawrence Crowl wrote: >> Convert double_int from a struct with function into a class with >> operators and methods. > > We have a wide_int class that replaces this class. :-( Really? Where? I see neither definition nor use under gcc. > It would have been better to just convert it. That depends on how easy it is to migrate. In the process I learned that double_int isn't actually a double int, but is a precision-controlled int. -- Lawrence Crowl
[patch] PR54150
Hello, This patch fixes PR54150. The patch was approved by Richi in Bugzilla, but I won't commit this for a few days so that others can speak up if they don't agree. Ciao! Steven PR54150.diff Description: Binary data
Re: [patch] PR54150
On Tue, Aug 7, 2012 at 1:54 PM, Steven Bosscher wrote: > Hello, > > This patch fixes PR54150. The patch was approved by Richi in Bugzilla, > but I won't commit this for a few days so that others can speak up if > they don't agree. > > Ciao! > Steven OK. Thanks for the cleanup! -- Gaby
Re: PATCH: PR rtl-optimization/54157: [x32] -maddress-mode=long failures
On Sun, Aug 5, 2012 at 12:47 AM, Richard Sandiford wrote: > For the record, I can't approve this, but... > > "H.J. Lu" writes: >> i386.md has >> >> (define_expand "extzv" >> [(set (match_operand:SI 0 "register_operand") >> (zero_extract:SI (match_operand 1 "ext_register_operand") >> (match_operand:SI 2 "const8_operand") >> (match_operand:SI 3 "const8_operand")))] >> "" >> >> and mode_for_extraction picks word_mode for operand 1 since >> its mode is VOIDmode. This patch changes mode_for_extraction >> to return the mode of operand 1 if the pattern accepts any mode. >> I added *jcc_btsi_mask_2 since combine now tries a different >> pattern, which leads to test failures on gcc.target/i386/bt-mask-1.c >> and gcc.target/i386/bt-mask-2. I didn't update *jcc_btsi_mask_1 >> instead since I am not sure if it is used elsewhere. Tested on >> Linux/x86-64 and Linux/x32. OK for trunk? > > the mode of the extraction operand is defined to be word_mode > for registers (see md.texi), so that at least would need to > be updated. But I'm not convinced that the wanted_inner_mode here: > >if (! in_dest && unsignedp > - && mode_for_extraction (EP_extzv, -1) != MAX_MACHINE_MODE) > + && mode_for_extraction (EP_extzv, -1, VOIDmode) != MAX_MACHINE_MODE) > { > - wanted_inner_reg_mode = mode_for_extraction (EP_extzv, 1); > - pos_mode = mode_for_extraction (EP_extzv, 3); > - extraction_mode = mode_for_extraction (EP_extzv, 0); > + wanted_inner_reg_mode = mode_for_extraction (EP_extzv, 1, > + inner_mode); > + pos_mode = mode_for_extraction (EP_extzv, 3, VOIDmode); > + extraction_mode = mode_for_extraction (EP_extzv, 0, VOIDmode); > } > > is right. inner_mode is the mode of the thing we're extracting, > which doesn't necessarily have anything to do with what the ext* > patterns support. > > FWIW, in reply to your force_to_mode message, gen_lowpart_for_combine > looks a bit odd: > > if (omode == imode) > return x; > > /* Return identity if this is a CONST or symbolic reference. 
*/ > if (omode == Pmode > && (GET_CODE (x) == CONST > || GET_CODE (x) == SYMBOL_REF > || GET_CODE (x) == LABEL_REF)) > return x; > > So if we know the modes are different, we nevertheless return the > original mode for CONST, SYMBOL_REF or LABEL_REF. Surely the caller > isn't going to be expecting that? If we're not prepared to change > the mode to the one that the caller asked for then I think we should > goto fail instead. > > I don't know if you're hitting that or not. > > Richard Yes, I did Breakpoint 5, gen_lowpart_for_combine (omode=DImode, x=0x71b03440) at /export/gnu/import/git/gcc-misc/gcc/combine.c:10778 10778 { (const:SI (plus:SI (symbol_ref:SI ("tmp2") ) (const_int 99452 [0x1847c]))) (gdb) c Continuing. Breakpoint 5, gen_lowpart_for_combine (omode=DImode, x=0x71b03440) at /export/gnu/import/git/gcc-misc/gcc/combine.c:10778 10778 { (const:SI (plus:SI (symbol_ref:SI ("tmp2") ) (const_int 99452 [0x1847c]))) (gdb) bt #0 gen_lowpart_for_combine (omode=DImode, x=0x71b03440) at /export/gnu/import/git/gcc-misc/gcc/combine.c:10778 #1 0x00c83805 in gen_lowpart_or_truncate (x=0x71b03440, mode=DImode) at /export/gnu/import/git/gcc-misc/gcc/combine.c:8110 #2 force_to_mode (x=0x71b02cd8, mode=DImode, mask=, just_select=) at /export/gnu/import/git/gcc-misc/gcc/combine.c:8389 #3 0x00c8458a in make_extraction (mode=SImode, inner=0x71b02cd8, pos=2, pos_rtx=, len=2, unsignedp=1, in_dest=, in_compare=0) at /export/gnu/import/git/gcc-misc/gcc/combine.c:7480 #4 0x00c86f9c in make_compound_operation (x=x@entry=0x71b02d68, in_code=in_code@entry=SET) at /export/gnu/import/git/gcc-misc/gcc/combine.c:7863 #5 0x00c89924 in simplify_set (x=0x71af7c78) at /export/gnu/import/git/gcc-misc/gcc/combine.c:6539 #6 combine_simplify_rtx (x=0x71af7c78, op0_mode=VOIDmode, in_dest=in_dest@entry=0, in_cond=in_cond@entry=0) at /export/gnu/import/git/gcc-misc/gcc/combine.c:5971 #7 0x00c8bd1b in subst (x=, from=, to=, in_dest=0, in_cond=0, unique_copy=0) at 
/export/gnu/import/git/gcc-misc/gcc/combine.c:5301 #8 0x00c8b8ab in subst (x=0x71aea870, from=0x71afc880, ---Type to continue, or q to quit--- to=0x71b02cd8, in_dest=0, in_cond=0, unique_copy=0) at /export/gnu/import/git/gcc-misc/gcc/combine.c:5165 #9 0x00c8f875 in try_combine (i3=i3@entry=0x71afe5a0, i2=i2@entry=0x71afe558, i1=, i0=, i0@entry=0x0, new_direct_jump_p=new_direct_jump_p@entry=0x7fffdb64, last_combined_insn=last_combined_insn@entry=0x71afe5a0) at /export/gnu/import/git/gcc-misc/gcc/combine.c:3260 #10 0x00c94673 in combine_instructions (nregs=, f=) at /export/gnu/import/
Re: [cxx-conversion] Make double_int a class with methods and operators. (issue6443093)
On Aug 7, 2012, at 11:38 AM, Lawrence Crowl wrote: > Hm. There seems to be significant opinion that there should not be any > implicit conversions. I am okay with operations as above, but would like > to hear the opinions of others. If there is an agreed upon and expected semantic, having them is useful. In the wide-int world, which replaces double_int, I think there is an agreeable semantic and I think it is useful, so, I think we should plan on having them, though, I'd be fine with punting their implementation until such time as someone needs it. If no one ever needs the routine, I don't see the harm in not implementing it. >> Did we somehow decide to not allow constructors? It's odd to convert to >> C++ and end up with static member functions resembling them ... > > Constructors are allowed, but PODs are often passed more efficiently. > That property seemed particularly important for double_int. Show us the difference in timing. Show us the generated code. I can't imagine that it could ever matter. I think a pretty ABI is worth the cost.
Re: C++ PATCH for c++/48370 (extending lifetime of temps in aggregate initialization)
On Tue, Jun 12, 2012 at 10:00 PM, H.J. Lu wrote: > On Thu, Nov 3, 2011 at 8:50 PM, Jason Merrill wrote: >> 12.2 states that a temporary bound to a reference lives as long as the >> reference itself. We have done that for reference variables, but not in >> other cases, such as aggregate initialization of a struct with reference >> members. In C++11, elements of a std::initializer_list have the same >> semantics; they live as long as the std::initializer_list. Again, we were >> implementing that for initializer_list variables but not for >> initializer_list subobjects. This patch fixes that. >> >> Furthermore, if a temporary's lifetime is extended, we need to also extend >> the lifetimes of any temporaries bound to references in its initializer, and >> so on. >> >> The patch introduces a function extend_ref_init_temps called from >> store_init_value after the call to digest_init. To expose elements of an >> initializer_list to this function, I needed to stop using >> build_aggr_init_full_exprs for aggregate initialization of arrays, and >> consequently needed to call build_vec_init from store_init_value to use one >> EH region for cleaning up the whole array rather than one per element. To >> deal with multiple extended temporaries, we need to change the cleanup >> pointer from a single tree to a VEC, and add a discriminator to the mangled >> name of reference init temporaries; this has no ABI impact, since the >> temporaries have no linkage, but I also updated the demangler accordingly. >> >> Since we now do lifetime extension in extend_ref_init_temps, we can >> drastically simplify initialize_reference and do away with >> build_init_list_var_init. >> >> Tested x86_64-pc-linux-gnu, applying to trunk. > > This caused: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53650 > This also caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54197 -- H.J.
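The 12.2 rule the patch extends to aggregate initialization fits in a few lines; this sketches the language semantics being implemented, not the compiler internals:

```cpp
struct has_ref
{
  const int &r;   // reference member
};

int observe ()
{
  // Aggregate initialization binds the temporary int(42) to s.r; under
  // 12.2 (and with this patch), the temporary lives as long as s
  // itself, so the read below is well-defined.
  has_ref s {42};
  return s.r;
}
```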
Re: [google/gcc-4_7] Backport patch to add line info to non-virtual thunks
Looks good to me for google-4_7 branches. -Rong On Mon, Aug 6, 2012 at 5:03 PM, Cary Coutant wrote: > This patch is for the google/gcc-4_7 branch. It backports the following > patch from trunk at r190190: > > http://gcc.gnu.org/ml/gcc-patches/2012-08/msg00321.html > > GCC generates non-virtual thunks directly to assembly code, which > causes a couple of problems. First, it doesn't add source location > information to the thunk, so the thunk simply inherits the random > location from the end of the function in front of it (the function > it's a thunk for). In two different compilation units compiled > with different options, this could result in two different locations > for the same thunk, and gold will give a false positive ODR > violation warning. > > Second, if you try to compile with -S -dA, GCC crashes in final(), > where it's trying to build a mapping from insn to bb, because the > function has no cfg, and FOR_EACH_BB_REVERSE tries to dereference > cfun->cfg without checking for non-NULL. > > Bootstrapped and ran testsuite with no regressions. > > Google ref b/6891766. > > > 2012-08-06 Cary Coutant > > gcc/ > * cgraphunit.c (assemble_thunk): Add source line info. > * final.c (final): Check for non-null cfg pointer. > > gcc/testsuite/ > * g++.dg/debug/dwarf2/non-virtual-thunk.C: New test case. > > > Index: gcc/final.c > === > --- gcc/final.c (revision 190189) > +++ gcc/final.c (working copy) > @@ -1761,11 +1761,13 @@ final (rtx first, FILE *file, int optimi >start_to_bb = XCNEWVEC (basic_block, bb_map_size); >end_to_bb = XCNEWVEC (basic_block, bb_map_size); > > - FOR_EACH_BB_REVERSE (bb) > - { > - start_to_bb[INSN_UID (BB_HEAD (bb))] = bb; > - end_to_bb[INSN_UID (BB_END (bb))] = bb; > - } > + /* There is no cfg for a thunk. */ > + if (!cfun->is_thunk) > + FOR_EACH_BB_REVERSE (bb) > + { > + start_to_bb[INSN_UID (BB_HEAD (bb))] = bb; > + end_to_bb[INSN_UID (BB_END (bb))] = bb; > + } > } > >/* Output the insns. 
*/ > Index: gcc/cgraphunit.c > === > --- gcc/cgraphunit.c(revision 190189) > +++ gcc/cgraphunit.c(working copy) > @@ -1799,6 +1799,10 @@ assemble_thunk (struct cgraph_node *node >init_function_start (thunk_fndecl); >cfun->is_thunk = 1; >assemble_start_function (thunk_fndecl, fnname); > + (*debug_hooks->source_line) (DECL_SOURCE_LINE (thunk_fndecl), > + DECL_SOURCE_FILE (thunk_fndecl), > + /* discriminator */ 0, > + /* is_stmt */ 1); > >targetm.asm_out.output_mi_thunk (asm_out_file, thunk_fndecl, >fixed_offset, virtual_value, alias); > Index: gcc/testsuite/g++.dg/debug/dwarf2/non-virtual-thunk.C > === > --- gcc/testsuite/g++.dg/debug/dwarf2/non-virtual-thunk.C (revision 0) > +++ gcc/testsuite/g++.dg/debug/dwarf2/non-virtual-thunk.C (revision 0) > @@ -0,0 +1,39 @@ > +// { dg-do compile } > +// { dg-options "-g2 -dA" } > + > +// Verify that line number info is output for the non-virtual > +// thunks for C::~C(). > +// { dg-final { scan-assembler "thunk.C:30" } } > + > +class A > +{ > + public: > + A(); > + virtual ~A(); > + private: > + int i; > +}; > + > +class B > +{ > + public: > + B(); > + virtual ~B(); > + private: > + int i; > +}; > + > +class C : public A, public B > +{ > + public: > + C(); > + virtual ~C(); // line 30 > +}; > + > +C::C() > +{ > +} > + > +C::~C() > +{ > +}
[PATCH][Cilkplus] Remove unwanted static chain.
Hello Everyone, This patch is for the Cilk Plus branch affecting mainly the C++ compiler. This patch will store the initial value of a loop correctly and remove the unnecessary static chain usage for some cases of Cilk_for. Thanks, Balaji V. Iyer. Index: parser.c === --- parser.c(revision 190195) +++ parser.c(working copy) @@ -28351,6 +28351,13 @@ FOR_EXPR (statement) = decl; CILK_FOR_GRAIN (statement) = grain; + /* If an initial value is available, and it is of type integer, then we + save it in CILK_FOR_INIT. */ + if (init && TREE_TYPE (init) && INTEGRAL_TYPE_P (TREE_TYPE (init))) +CILK_FOR_INIT (statement) = init; + else +CILK_FOR_INIT (statement) = NULL_TREE; + finish_cilk_for_init_stmt (statement); if (cp_lexer_next_token_is (parser->lexer, CPP_COMMA)) Index: ChangeLog.cilk === --- ChangeLog.cilk (revision 190195) +++ ChangeLog.cilk (working copy) @@ -1,3 +1,7 @@ +2012-08-07 Balaji V. Iyer + + * parser.c (cp_parser_cilk_for): Stored the initial value in cilk_for tree. + 2012-08-01 Balaji V. Iyer * parser.c (cp_parser_userdef_char_literal): Replaced CALL_SPAWN and
RE: [PATCH][Cilkplus] Remove unwanted static chain.
Sorry, I was in the wrong directory when I was creating the patch. Here is the fixed patch. Thanks, Balaji V. Iyer. -Original Message- From: Iyer, Balaji V Sent: Tuesday, August 07, 2012 4:18 PM To: 'gcc-patches@gcc.gnu.org' Subject: [PATCH][Cilkplus] Remove unwanted static chain. Hello Everyone, This patch is for the Cilk Plus branch affecting mainly the C++ compiler. This patch will store the initial value of a loop correctly and remove the unnecessary static chain usage for some cases of Cilk_for. Thanks, Balaji V. Iyer. Index: gcc/cp/parser.c === --- gcc/cp/parser.c (revision 190195) +++ gcc/cp/parser.c (working copy) @@ -28351,6 +28351,13 @@ FOR_EXPR (statement) = decl; CILK_FOR_GRAIN (statement) = grain; + /* If an initial value is available, and it is of type integer, then we + save it in CILK_FOR_INIT. */ + if (init && TREE_TYPE (init) && INTEGRAL_TYPE_P (TREE_TYPE (init))) +CILK_FOR_INIT (statement) = init; + else +CILK_FOR_INIT (statement) = NULL_TREE; + finish_cilk_for_init_stmt (statement); if (cp_lexer_next_token_is (parser->lexer, CPP_COMMA)) Index: gcc/cp/ChangeLog.cilk === --- gcc/cp/ChangeLog.cilk (revision 190195) +++ gcc/cp/ChangeLog.cilk (working copy) @@ -1,3 +1,7 @@ +2012-08-07 Balaji V. Iyer + + * parser.c (cp_parser_cilk_for): Stored the initial value in cilk_for tree. + 2012-08-01 Balaji V. Iyer * parser.c (cp_parser_userdef_char_literal): Replaced CALL_SPAWN and
Re: [cxx-conversion] Make double_int a class with methods and operators. (issue6443093)
On Aug 7, 2012, at 11:42 AM, Lawrence Crowl wrote:
> On 8/7/12, Mike Stump wrote:
>> On Aug 6, 2012, at 5:35 PM, Lawrence Crowl wrote:
>>> Convert double_int from a struct with functions into a class with
>>> operators and methods.
>>
>> We have a wide_int class that replaces this class. :-(
>
> Really? Where? I see neither definition nor use under gcc.

It is being developed; it is largely done, but is going through final
reviews now. When we finish with those review bits, we'll send it out.
It sounds like you'll beat us to the merge window, so we'll cope with
the fallout.

>> It would have been better to just convert it.
>
> That depends on how easy it is to migrate. In the process I learned
> that double_int isn't actually a double int, but is a
> precision-controlled int.

Well, I'd call it a pair of HOST_WIDE_INTs... I think calling it a
precision-controlled int is a tad optimistic. For example, if you want
to control it to be 2048 bits, you'd find that your hopes are
misplaced. :-)
[PATCH, MIPS] fix MIPS16 hard-float function stub bugs
This patch fixes a group of bugs that were causing link errors on hard-float MIPS16 code built with a mips-linux-gnu toolchain. This is Mark Mitchell's analysis of the original problem: The MIPS16 instruction set cannot directly access hard-float registers, so helper functions in libgcc are used to access floating-point registers. In addition, the compiler handles functions which take floating-point values as arguments specially. It generates a MIPS16 version of the function that expects the floating-point arguments in general-purpose registers. But, since the hard-float ABI requires that floating-point arguments are passed in floating-point registers, the compiler also generates a MIPS32 stub which moves the floating-point values into general-purpose registers and then tail-calls the MIPS16 function. MIPS32 functions call the MIPS32 stub, while MIPS16 functions call the MIPS16 version of the function. The problem reported in the issue is an undefined symbol at link-time. The undefined symbol is *not* one of the libgcc helper functions. It is also *not* the name of a MIPS32->MIPS16 stub function. It's a third category: a local alias for the MIPS16 function. When generating a stub, the compiler also generates a local (i.e., not visible outside the object module) alias for the main function (not the stub). So, for a function "f" with floating-point arguments, we end up with: (a) "f" itself, a MIPS16 function, expecting floating-point arguments in general-purpose registers (b) "__fn_stub_f", a MIPS32 function, expecting floating-point arguments in floating-point registers. This function moves the arguments into general-purpose registers and then tail-calls "f". (c) "__fn_local_f", a local alias. The purpose of the alias is as follows. When making an indirect call (i.e., a call via a function-pointer) to a function which either takes floating-point arguments or returns a floating-point value from MIPS16 code, we must use a helper function in libgcc. 
The helper function moves values to/from the floating-point registers to conform to the normal MIPS32 ABI. But, if we're in MIPS16 code, and calling another MIPS16 function indirectly, there's no point to shuffle things in and out of floating-point registers. In that case, we can skip the helper function and just directly call the MIPS16 function. And, if we're calling from within the same object file, we don't want to use the name "f" -- that will cause the linker to generate extra goop that we don't need in the final executable. So, we optimize by calling via the local alias. But, herein lies the bug. We're trying to use the alias when we have a function that returns a floating-point value -- even though it doesn't have any floating-point arguments (see mips16_build_call_stub). But, we only generate stubs for functions that have floating-point arguments (see mips16_build_function_stub) -- not for functions that have floating-point return values. And we generate aliases as the same time that we generate stubs. So, we never generate the alias. So, the fix for this is to generate a local alias (but not a stub) for functions that return floating-point values, even if they do not have floating-point arguments. In addition to that, there were still some related problems that had to be fixed to get this to work right: * mips16_local_function_p, which is called from mips16_build_call_stub to detect the situations where we can optimize local calls, wasn't checking for weak or linkonce definitions, where the final copy of the function selected by the linker might come from a different object. * The stubs for comdat functions need to go in the same comdat group as the function. * The local stub symbol names should use the conventions for local symbol names. We've had these fixes in our local code base for some time, and I regression-tested on a mips-linux-gnu mainline build. OK to check in? -Sandra 2012-08-07 Mark Mitchell Maciej W. 
Rozycki Julian Brown gcc/ * config/mips/mips.c (mips16_local_function_p): Also reject weak and linkonce definitions. (mips16_local_alias): Use LOCAL_LABEL_PREFIX for the symbol generated. (mips16_build_function_stub): Split out the alias creation part. Declare stub once-only if target function is. (mips16_build_function_stub_and_local_alias): New function to do both things. (mips16_build_call_stub): Declare stub once-only if target function is. (mips_output_function_prologue): Also check for cases where a local alias only is needed to handle a floating-point return value. gcc/testsuite/ * gcc.target/mips/mips.exp (mips_option_groups): Add interlink-mips16. * gcc.target/mips/mips16-fn-local.c: New test. Index: gcc/config/mips/mips.c === --- gcc/config
[PATCH, debug]: Fix PR 54177, Segfault in cselib_lookup due to NULL_RTX passed from vt_add_function_parameter
Hello!

We can exit early from var_lowpart for matched modes. This avoids the
situation where var_lowpart tries to change

(gdb) p debug_rtx (incoming)
(concat:SC (reg:SF 48 $f16 [ x ])
           (reg:SF 49 $f17 [ x+4 ]))
$1 = void

to SCmode, and returns NULL_RTX, since the RTX is neither REG nor MEM.

2012-08-07  Uros Bizjak

	PR debug/54177
	* var-tracking.c (var_lowpart): Exit early for matched modes.

Tested on x86_64-pc-linux-gnu {,-m32} and alphaev68-unknown-linux-gnu,
where the patch also fixes a testsuite failure. Patch was approved in
the PR by Alexandre, and is committed to mainline SVN.

Uros.

Index: var-tracking.c
===
--- var-tracking.c	(revision 190150)
+++ var-tracking.c	(working copy)
@@ -5086,12 +5086,12 @@ var_lowpart (enum machine_mode mode, rtx loc)
 {
   unsigned int offset, reg_offset, regno;
 
+  if (GET_MODE (loc) == mode)
+    return loc;
+
   if (!REG_P (loc) && !MEM_P (loc))
     return NULL;
 
-  if (GET_MODE (loc) == mode)
-    return loc;
-
   offset = byte_lowpart_offset (mode, GET_MODE (loc));
 
   if (MEM_P (loc))
Re: Assembly output optimisations (was: PR 51094 - fprint_w() in output_addr_const() reinstated)
I should mention that with my patch .ascii is used more aggressively than
before, so if a string is longer than ELF_STRING_LIMIT all of it will be
written as .ascii, while in the past it would use .string for the string's
tail. Example diff to original behaviour:

 .LASF15458:
-	.ascii "SSA_VAR_P(DECL) (TREE_CODE (DECL) == VA"
-	.string "R_DECL || TREE_CODE (DECL) == PARM_DECL || TREE_CODE (DECL) == RESULT_DECL || (TREE_CODE (DECL) == SSA_NAME && (TREE_CODE (SSA_NAME_VAR (DECL)) == VAR_DECL || TREE_CODE (SSA_NAME_VAR (DECL)) == PARM_DECL || TREE_CODE (SSA_NAME_VAR (DECL)) == RESULT_DECL)))"
+	.ascii "SSA_VAR_P(DECL) (TREE_CODE (DECL) == VAR_DECL || TREE_CODE (D"
+	.ascii "ECL) == PARM_DECL || TREE_CODE (DECL) == RESULT_DECL || (TREE"
+	.ascii "_CODE (DECL) == SSA_NAME && (TREE_CODE (SSA_NAME_VAR (DECL)) "
+	.ascii "== VAR_DECL || TREE_CODE (SSA_NAME_VAR (DECL)) == PARM_DECL |"
+	.ascii "| TREE_CODE (SSA_NAME_VAR (DECL)) == RESULT_DECL)))\000"

BTW I can't find why ELF_STRING_LIMIT is only 256; it seems GAS supports
arbitrary lengths. I'd have to change my code if we ever set it too high
(or even unlimited) since I allocate the buffer on the stack.

Dimitris
Re: Value type of map need not be default copyable
On Sat, Aug 4, 2012 at 10:34 AM, Paolo Carlini wrote: > > > First, I think we should add libstdc++ in CC. > > Thus, I would recommend people working in this area to begin with > unordered_map, because in that case emplace is already available, assuming > that's really the point (and therefore reverting the patch was a good idea, > luckily an existing testcase helped us) > > At the same time an implementation of emplace is definitely welcome, in > any case. I've attached a patch for unordered_map which solves the rvalue reference problem. For efficiency, I've created a new _M_emplace_bucket method rather than call emplace directly. I've verified all libstdc++ tests pass (sorry for the previous oversight) and am running the full GCC test suite now. However, I'd appreciate any feedback on whether this is a reasonable approach. STL hacking is way outside my comfort zone. ;-) If this looks good, I'll take a stab at std::map. Thanks, Ollie 2012-08-03 Ollie Wild * include/bits/hashtable.h (_M_emplace_bucket): New function. * include/bits/hashtable_policy.h (operator[](key_type&&)): Replace _M_insert_bucket call with _M_emplace_bucket. * testsuite/23_containers/unordered_map/operators/2.cc: New test. commit dd690a5f164326c552c2450af6270ec27e9bfd8e Author: Ollie Wild Date: Tue Aug 7 16:34:05 2012 -0500 2012-08-03 Ollie Wild * include/bits/hashtable.h (_M_emplace_bucket): New function. * include/bits/hashtable_policy.h (operator[](key_type&&)): Replace _M_insert_bucket call with _M_emplace_bucket. * testsuite/23_containers/unordered_map/operators/2.cc: New test. 
2012-08-04 Paolo Carlini Revert: diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h index 2faf0b3..869b0e9 100644 --- a/libstdc++-v3/include/bits/hashtable.h +++ b/libstdc++-v3/include/bits/hashtable.h @@ -588,6 +588,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION iterator _M_insert_bucket(_Arg&&, size_type, __hash_code); + template + iterator + _M_emplace_bucket(size_type, __hash_code, _Args&&... __args); + template std::pair @@ -1356,6 +1360,51 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } } + // Insert v in bucket n (assumes no element with its key already present). + template +template + typename _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, + _H1, _H2, _Hash, _RehashPolicy, + _Traits>::iterator + _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, +_H1, _H2, _Hash, _RehashPolicy, _Traits>:: + _M_emplace_bucket(size_type __n, __hash_code __code, _Args&&... __args) + { + // First build the node to get access to the hash code + __node_type* __node = _M_allocate_node(std::forward<_Args>(__args)...); + __try + { + + const __rehash_state& __saved_state = _M_rehash_policy._M_state(); + std::pair __do_rehash + = _M_rehash_policy._M_need_rehash(_M_bucket_count, + _M_element_count, 1); + + if (__do_rehash.first) + { + const key_type& __k = this->_M_extract()(__node->_M_v); + __n = __hash_code_base::_M_bucket_index(__k, __code, + __do_rehash.second); + } + + this->_M_store_code(__node, __code); + if (__do_rehash.first) + _M_rehash(__do_rehash.second, __saved_state); + + _M_insert_bucket_begin(__n, __node); + ++_M_element_count; + return iterator(__node); + } + __catch(...) + { + _M_deallocate_node(__node); + __throw_exception_again; + } + } + // Insert v if no element with its key is already present. 
template_M_find_node(__n, __k, __code); if (!__p) - return __h->_M_insert_bucket(std::make_pair(std::move(__k), - mapped_type()), -__n, __code)->second; + return __h->_M_emplace_bucket(__n, __code, + std::move(__k), mapped_type())->second; return (__p->_M_v).second; } diff --git a/libstdc++-v3/testsuite/23_containers/unordered_map/operators/2.cc b/libstdc++-v3/testsuite/23_containers/unordered_map/operators/2.cc new file mode 100644 index 000..1940fa2 --- /dev/null +++ b/libstdc++-v3/testsuite/23_containers/unordered_map/operators/2.cc @@ -0,0 +1,44 @@ +// Copyright (C) 2012 +// Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your opti
Re: Value type of map need not be default copyable
Hi,

(adding in CC Francois too)

On 08/07/2012 11:43 PM, Ollie Wild wrote:
> On Sat, Aug 4, 2012 at 10:34 AM, Paolo Carlini wrote:
>> First, I think we should add libstdc++ in CC. Thus, I would recommend
>> people working in this area to begin with unordered_map, because in
>> that case emplace is already available, assuming that's really the
>> point (and therefore reverting the patch was a good idea, luckily an
>> existing testcase helped us). At the same time an implementation of
>> emplace is definitely welcome, in any case.
>
> I've attached a patch for unordered_map which solves the rvalue
> reference problem. For efficiency, I've created a new _M_emplace_bucket
> method rather than call emplace directly.

Patch looks already pretty good to me. A couple of comments:

- Did you consider whether _M_emplace_bucket could be used as an
implementation detail of _M_emplace? Or figure out in any case a
smart(er) refactoring to share code with the implementation of
_M_emplace. I didn't study your code in detail, but I think we can avoid
some redundancy.

- For consistency at least, I think we should get rid of the make_pair
in the other overload too.

> I've verified all libstdc++ tests pass (sorry for the previous
> oversight) and am running the full GCC test suite now. However, I'd
> appreciate any feedback on whether this is a reasonable approach. STL
> hacking is way outside my comfort zone. ;-)
>
> If this looks good, I'll take a stab at std::map.

That would be great. We have a PR which has remained open way too long.
The challenge is figuring out a way to avoid writing too much code
similar to what we have already for the inserts. Honestly, I tried a
while ago and failed, didn't see an easy way, but in any case it's time
to have this stuff implemented - I hear Marc ;) - one way or the other,
for 4.8.0.

Thanks,
Paolo.
Re: s390: Avoid CAS boolean output inefficiency
On 08/07/2012 10:02 AM, Ulrich Weigand wrote:
> The following patch changes the builtin expander to pass a MEM oldval
> as-is to the back-end expander, so that the back-end can move the
> store to before the CC operation. With that patch I'm also seeing
> all the IPMs disappear.
...
> What do you think about this solution? It has the advantage that
> we still get the same xor code if we actually do need the ipm ...

I'm ok with that patch.

r~
Re: Assembly output optimisations (was: PR 51094 - fprint_w() in output_addr_const() reinstated)
On Tue, Aug 7, 2012 at 2:24 PM, Dimitrios Apostolou wrote:
>
> BTW I can't find why ELF_STRING_LIMIT is only 256, it seems GAS supports
> arbitrary lengths. I'd have to change my code if we ever set it too high (or
> even unlimited) since I allocate the buffer on the stack.

See the comment for ELF_STRING_LIMIT in config/elfos.h.  gas is not
the only ELF assembler.

Ian
[google] Remove deprecated pfmon-specific functions/structs from pmu-profile.c (issue6442086)
Removes references in libgcov.c to functions and structs removed from pmu-profile.c For google/main. Tested with crosstools and bootstrap. 2012-08-07 Chris Manghane * libgcc/pmu-profile.c (enum pmu_tool_type): Remove pfmon-specific functions/structs. (enum pmu_event_type): Ditto. (enum pmu_state): Ditto. (enum cpu_vendor_signature): Ditto. (struct pmu_tool_info): Ditto. (void gcov_write_branch_mispredict_infos): Ditto. (get_x86cpu_vendor): Ditto. (parse_pmu_profile_options): Ditto. (start_addr2line_symbolizer): Ditto. (reset_symbolizer_parent_pipes): Ditto. (reset_symbolizer_child_pipes): Ditto. (end_addr2line_symbolizer): Ditto. (symbolize_addr2line): Ditto. (start_pfmon_module): Ditto. (convert_pct_to_unsigned): Ditto. (parse_load_latency_line): Ditto. (parse_branch_mispredict_line): Ditto. (parse_pfmon_load_latency): Ditto. (parse_pfmon_tool_header): Ditto. (parse_pfmon_branch_mispredicts): Ditto. (pmu_start): Ditto. (init_pmu_branch_mispredict): Ditto. (init_pmu_tool): Ditto. (__gcov_init_pmu_profiler): Ditto. (__gcov_start_pmu_profiler): Ditto. (__gcov_stop_pmu_profiler): Ditto. (gcov_write_branch_mispredict_line): Ditto. (gcov_write_load_latency_infos): Ditto. (gcov_write_branch_mispredict_infos): Ditto. (gcov_write_tool_header): Ditto. (__gcov_end_pmu_profiler): Ditto. * libgcc/libgcov.c (gcov_alloc_filename): Remove references to pfmon specific functions/structs. (pmu_profile_stop): Ditto. (gcov_exit): Ditto. (__gcov_init): Ditto. (__gcov_flush): Ditto. 
Index: libgcc/pmu-profile.c === --- libgcc/pmu-profile.c(revision 190135) +++ libgcc/pmu-profile.c(working copy) @@ -67,169 +67,11 @@ see the files COPYING3 and COPYING.RUNTIME respect #define XDELETEVEC(p) free(p) #define XDELETE(p) free(p) -#define PFMON_CMD "/usr/bin/pfmon" -#define ADDR2LINE_CMD "/usr/bin/addr2line" -#define PMU_TOOL_MAX_ARGS (20) -static char default_addr2line[] = "??:0"; -static const char pfmon_ll_header[] = "# counts %self%cum " -"<10 <32 <64<256 <1024 >=1024 %wself " -"code addr symbol\n"; -static const char pfmon_bm_header[] = -"# counts %self%cum code addr symbol\n"; - -const char *pfmon_intel_ll_args[PMU_TOOL_MAX_ARGS] = { - PFMON_CMD, - "--aggregate-results", - "--follow-all", - "--with-header", - "--smpl-module=pebs-ll", - "--ld-lat-threshold=4", - "--pebs-ll-dcmiss-code", - "--resolve-addresses", - "-emem_inst_retired:LATENCY_ABOVE_THRESHOLD", - "--long-smpl-periods=1", - 0 /* terminating NULL must be present */ -}; - -const char *pfmon_amd_ll_args[PMU_TOOL_MAX_ARGS] = { - PFMON_CMD, - "--aggregate-results", - "--follow-all", - "-uk", - "--with-header", - "--smpl-module=ibs", - "--resolve-addresses", - "-eibsop_event:uops", - "--ibs-dcmiss-code", - "--long-smpl-periods=0x0", - 0 /* terminating NULL must be present */ -}; - -const char *pfmon_intel_brm_args[PMU_TOOL_MAX_ARGS] = { - PFMON_CMD, - "--aggregate-results", - "--follow-all", - "--with-header", - "--resolve-addresses", - "-eMISPREDICTED_BRANCH_RETIRED", - "--long-smpl-periods=1", - 0 /* terminating NULL must be present */ -}; - -const char *pfmon_amd_brm_args[PMU_TOOL_MAX_ARGS] = { - PFMON_CMD, - "--aggregate-results", - "--follow-all", - "--with-header", - "--resolve-addresses", - "-eRETIRED_MISPREDICTED_BRANCH_INSTRUCTIONS", - "--long-smpl-periods=1", - 0 /* terminating NULL must be present */ -}; - -const char *addr2line_args[PMU_TOOL_MAX_ARGS] = { - ADDR2LINE_CMD, - "-e", - 0 /* terminating NULL must be present */ -}; - - -enum pmu_tool_type -{ - PTT_PFMON, - PTT_LAST -}; 
- -enum pmu_event_type -{ - PET_INTEL_LOAD_LATENCY, - PET_AMD_LOAD_LATENCY, - PET_INTEL_BRANCH_MISPREDICT, - PET_AMD_BRANCH_MISPREDICT, - PET_LAST -}; - -typedef struct pmu_tool_fns { - const char *name; /* name of the pmu tool */ - /* pmu tool commandline argument. */ - const char **arg_array; - /* Initialize pmu module. */ - void *(*init_pmu_module) (void); - /* Start profililing. */ - void (*start_pmu_module) (pid_t ppid, char *tmpfile, const char **args); - /* Stop profililing. */ - void (*stop_pmu_module) (void); - /* How to parse the output generated by the PMU tool. */ - int (*parse_pmu_output) (char *filename, void *pmu_data); - /* How to write parsed pmu data into gcda file. */ - void (*gcov_write_pmu_data) (void *data); - /* How to cleanup any data structure created during parsing. */ - void (*cleanup_pmu_data) (void *data); - /* How to initialize symbolizer for the PPID. */ - int (*start_symbolizer) (pid_t ppid); -
Re: [cxx-conversion] Make double_int a class with methods and operators. (issue6443093)
On 08/06/2012 05:35 PM, Lawrence Crowl wrote:
> +inline double_int &
> +double_int::operator ++ ()
> +{
> +  *this + double_int_one;
> +  return *this;
> +}
> +
> +inline double_int &
> +double_int::operator -- ()
> +{
> +  *this - double_int_one;
> +  return *this;
> +}

Surely unused results there?

r~
[v3] libstdc++/54005
Pretty minor change, as per PR. This version seems more appropriate for templatized types. I'll wait a bit before putting this on 4.7_branch -benjamin tested x86/linux 2012-08-07 Benjamin Kosnik PR libstdc++/54005 * include/std/atomic: Use __atomic_always_lock_free. * include/bits/atomic_base.h: Same. diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h index 9d5f4eb..598e1f1 100644 --- a/libstdc++-v3/include/bits/atomic_base.h +++ b/libstdc++-v3/include/bits/atomic_base.h @@ -422,11 +422,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION bool is_lock_free() const noexcept - { return __atomic_is_lock_free (sizeof (_M_i), &_M_i); } + { return __atomic_always_lock_free(sizeof(_M_i), &_M_i); } bool is_lock_free() const volatile noexcept - { return __atomic_is_lock_free (sizeof (_M_i), &_M_i); } + { return __atomic_always_lock_free(sizeof(_M_i), &_M_i); } void store(__int_type __i, memory_order __m = memory_order_seq_cst) noexcept @@ -716,11 +716,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION bool is_lock_free() const noexcept - { return __atomic_is_lock_free(_M_type_size(1), &_M_p); } + { return __atomic_always_lock_free(_M_type_size(1), &_M_p); } bool is_lock_free() const volatile noexcept - { return __atomic_is_lock_free(_M_type_size(1), &_M_p); } + { return __atomic_always_lock_free(_M_type_size(1), &_M_p); } void store(__pointer_type __p, diff --git a/libstdc++-v3/include/std/atomic b/libstdc++-v3/include/std/atomic index 6a08b28..b5ca606 100644 --- a/libstdc++-v3/include/std/atomic +++ b/libstdc++-v3/include/std/atomic @@ -184,11 +184,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION bool is_lock_free() const noexcept - { return __atomic_is_lock_free(sizeof(_M_i), &_M_i); } + { return __atomic_always_lock_free(sizeof(_M_i), &_M_i); } bool is_lock_free() const volatile noexcept - { return __atomic_is_lock_free(sizeof(_M_i), &_M_i); } + { return __atomic_always_lock_free(sizeof(_M_i), &_M_i); } void store(_Tp __i, memory_order _m = memory_order_seq_cst) 
noexcept
Re: Assembly output optimisations (was: PR 51094 - fprint_w() in output_addr_const() reinstated)
On Tue, 7 Aug 2012, Ian Lance Taylor wrote:
> On Tue, Aug 7, 2012 at 2:24 PM, Dimitrios Apostolou wrote:
>> BTW I can't find why ELF_STRING_LIMIT is only 256, it seems GAS supports
>> arbitrary lengths. I'd have to change my code if we ever set it too high
>> (or even unlimited) since I allocate the buffer on the stack.
>
> See the comment for ELF_STRING_LIMIT in config/elfos.h. gas is not
> the only ELF assembler.

I've seen it and thought that for non-GAS ELF platforms you should define
it to override elfos.h; I obviously misunderstood. So now I'm curious: is
this limit there, dominating all platforms, because of the worst
assembler? :-)

Dimitris

P.S. there is an obvious typo in the comment: STRING_LIMIT should be
ELF_STRING_LIMIT (I think I introduced the typo last year...)
Re: Assembly output optimisations (was: PR 51094 - fprint_w() in output_addr_const() reinstated)
On Tue, Aug 7, 2012 at 4:27 PM, Dimitrios Apostolou wrote:
> On Tue, 7 Aug 2012, Ian Lance Taylor wrote:
>> On Tue, Aug 7, 2012 at 2:24 PM, Dimitrios Apostolou wrote:
>>>
>>> BTW I can't find why ELF_STRING_LIMIT is only 256, it seems GAS supports
>>> arbitrary lengths. I'd have to change my code if we ever set it too high
>>> (or even unlimited) since I allocate the buffer on the stack.
>>
>> See the comment for ELF_STRING_LIMIT in config/elfos.h. gas is not
>> the only ELF assembler.
>
> I've seen it and thought that for non-GAS ELF platforms you should define it
> to override elfos.h, I obviously misunderstood.

I'm not aware of any assembler-specific configuration files, other than
broad differences like completely different i386 syntax. I think it
would be best to avoid introducing such files when possible.

> So now I'm curious, this
> limit is there dominating all platforms because of the worst assembler? :-)

Yes, but it's not like the limit matters in practice.

Ian
[patch] Fix problems with -fdebug-types-section and local types
With --std=c++11, a template parameter can refer to a local type defined within a function. Because that local type doesn't qualify for its own type unit, we copy it as an "unworthy" type into the type unit that refers to it, but we copy too much, leading to a comdat type unit that contains a DIE with subprogram definitions rather than declarations. These DIEs may have DW_AT_low_pc/high_pc or DW_AT_ranges attributes, and consequently can refer to range list entries that don't get emitted because they're not marked when the compile unit is scanned, eventually causing an undefined symbol at link time. In addition, while debugging this problem, I found that the DW_AT_object_pointer attribute, when left in the skeletons that are left behind in the compile unit, causes duplicate copies of the types to be copied back into the compile unit. This patch fixes these problems by removing the DW_AT_object_pointer attribute from the skeleton left behind in the compile unit, and by copying DW_TAG_subprogram DIEs as declarations when copying "unworthy" types into a type unit. In order to preserve information in the DIE structure, I also added DW_AT_abstract_origin as an attribute that should be copied when cloning a DIE as a declaration. I also fixed the dwarf4-typedef.C test, which should be turning on the -fdebug-types-section flag. Bootstrapped and tested on x86. OK for trunk? -cary 2012-08-07 Cary Coutant gcc/ * dwarf2out.c (clone_as_declaration): Copy DW_AT_abstract_origin attribute. (generate_skeleton_bottom_up): Remove DW_AT_object_pointer attribute from original DIE. (clone_tree_hash): Rename to ... (clone_tree_partial): ... this; change callers. Copy DW_TAG_subprogram DIEs as declarations. gcc/testsuite/ * testsuite/g++.dg/debug/dwarf2/dwarf4-nested.C: New test case. * testsuite/g++.dg/debug/dwarf2/dwarf4-typedef.C: Add -fdebug-types-section flag. 
Index: gcc/testsuite/g++.dg/debug/dwarf2/dwarf4-nested.C === --- gcc/testsuite/g++.dg/debug/dwarf2/dwarf4-nested.C (revision 0) +++ gcc/testsuite/g++.dg/debug/dwarf2/dwarf4-nested.C (revision 0) @@ -0,0 +1,55 @@ +// { dg-do compile } +// { dg-options "--std=c++11 -dA -gdwarf-4 -fdebug-types-section -fno-merge-debug-strings" } + +// Check that -fdebug-types-sections does not copy a full referenced type +// into a type unit. + +// Checks that at least one type unit is generated. +// +// { dg-final { scan-assembler "DIE \\(.*\\) DW_TAG_type_unit" } } +// +// Check that func is declared exactly twice in the debug info: +// once in the type unit for struct D, and once in the compile unit. +// +// { dg-final { scan-assembler-times "\\.ascii \"func0\"\[^\n\]*DW_AT_name" 2 } } +// +// Check to make sure that no type unit contains a DIE with DW_AT_low_pc +// or DW_AT_ranges. These patterns assume that the compile unit is always +// emitted after all type units. +// +// { dg-final { scan-assembler-not "\\.quad.*DW_AT_low_pc.*DIE \\(.*\\) DW_TAG_compile_unit" } } +// { dg-final { scan-assembler-not "\\.quad.*DW_AT_ranges.*DIE \\(.*\\) DW_TAG_compile_unit" } } + +struct A { + A(); + virtual ~A(); + virtual void foo(); + private: + int data; +}; + +struct B { + B(); + virtual ~B(); +}; + +extern B* table[]; + +struct D { + template + T* get(int i) + { +B*& cell = table[i]; +if (cell == 0) + cell = new T(); +return static_cast(cell); + } +}; + +void func(D* d) +{ + struct C : B { +A a; + }; + d->get(0)->a.foo(); +} Index: gcc/testsuite/g++.dg/debug/dwarf2/dwarf4-typedef.C === --- gcc/testsuite/g++.dg/debug/dwarf2/dwarf4-typedef.C (revision 190189) +++ gcc/testsuite/g++.dg/debug/dwarf2/dwarf4-typedef.C (working copy) @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-gdwarf-4" } */ +/* { dg-options "-gdwarf-4 -fdebug-types-section" } */ /* Regression test for an ICE in output_die when using -gdwarf-4. 
*/ Index: gcc/dwarf2out.c === --- gcc/dwarf2out.c (revision 190189) +++ gcc/dwarf2out.c (working copy) @@ -6357,6 +6357,7 @@ clone_as_declaration (dw_die_ref die) switch (a->dw_attr) { +case DW_AT_abstract_origin: case DW_AT_artificial: case DW_AT_containing_type: case DW_AT_external: @@ -6489,6 +6490,12 @@ generate_skeleton_bottom_up (skeleton_ch dw_die_ref clone = clone_die (c); move_all_children (c, clone); +/* If the original has a DW_AT_object_pointer attribute, + it would now point to a child DIE just moved to the + cloned tree, so we need to remove that attribute from + the original. */ +remove_AT (c, DW_AT_object_pointer); + replace_child (c, clone, prev); generate_skeleton_ancestor_tre
Re: [PATCH] Intrinsics for ADCX
Hi, Here is the patch with some obvious fixes. If there are no objections, could anyone please check it in? Changelog entry: 2012-08-08 Michael Zolotukhin * common/config/i386/i386-common.c (OPTION_MASK_ISA_ADX_SET): New. (OPTION_MASK_ISA_ADX_UNSET): Likewise. (ix86_handle_option): Handle madx option. * config.gcc (i[34567]86-*-*): Add adxintrin.h. (x86_64-*-*): Likewise. * config/i386/adxintrin.h: New header. * config/i386/driver-i386.c (host_detect_local_cpu): Detect ADCX/ADOX support. * config/i386/i386-builtin-types.def (UCHAR_FTYPE_UCHAR_UINT_UINT_PUNSIGNED): New function type. (UCHAR_FTYPE_UCHAR_ULONGLONG_ULONGLONG_PULONGLONG): Likewise. * config/i386/i386-c.c: Define __ADX__ if needed. * config/i386/i386.c (ix86_target_string): Define -madx option. (PTA_ADX): New. (ix86_option_override_internal): Handle new option. (ix86_valid_target_attribute_inner_p): Add OPT_madx. (ix86_builtins): Add IX86_BUILTIN_ADDCARRYX32, IX86_BUILTIN_ADDCARRYX64. (ix86_init_mmx_sse_builtins): Define corresponding built-ins. (ix86_expand_builtin): Handle these built-ins. (ix86_expand_args_builtin): Handle new function types. * config/i386/i386.h (TARGET_ADX): New. * config/i386/i386.md (adcx3): New define_insn. * config/i386/i386.opt (madx): New. * config/i386/x86intrin.h: Include adxintrin.h. testsuite/Changelog entry: 2012-08-08 Michael Zolotukhin * gcc.target/i386/adx-addcarryx32-1.c: New. * gcc.target/i386/adx-addcarryx32-2.c: New. * gcc.target/i386/adx-addcarryx64-1.c: New. * gcc.target/i386/adx-addcarryx64-2.c: New. * gcc.target/i386/adx-check.h: New. * gcc.target/i386/i386.exp (check_effective_target_adx): New. * gcc.target/i386/sse-12.c: Add -madx. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * g++.dg/other/i386-2.C: Ditto. * g++.dg/other/i386-3.C: Ditto. 
On 3 August 2012 17:51, Uros Bizjak wrote: > On Fri, Aug 3, 2012 at 3:24 PM, Michael Zolotukhin > wrote: >> Hi, >> I made a new version of the patch, where I tried to take into account >> Uros' remarks - is it ok for trunk? >> >> Bootstrap and new tests are passing, testing is in progress. >> >> Changelog entry: >> 2012-08-03 Michael Zolotukhin >> >> * common/config/i386/i386-common.c (OPTION_MASK_ISA_ADX_SET): New. >> (OPTION_MASK_ISA_ADX_UNSET): Likewise. >> (ix86_handle_option): Handle madx option. >> * config.gcc (i[34567]86-*-*): Add adxintrin.h. >> (x86_64-*-*): Likewise. >> * config/i386/adxintrin.h: New header. >> * config/i386/driver-i386.c (host_detect_local_cpu): Detect ADCX/ADOX >> support. >> * config/i386/i386-builtin-types.def >> (UCHAR_FTYPE_UCHAR_UINT_UINT_PINT): New function type. >> (UCHAR_FTYPE_UCHAR_ULONGLONG_ULONGLONG_PINT): Likewise. >> * config/i386/i386-c.c: Define __ADX__ if needed. >> * config/i386/i386.c (ix86_target_string): Define -madx option. >> (PTA_ADX): New. >> (ix86_option_override_internal): Handle new option. >> (ix86_valid_target_attribute_inner_p): Add OPT_madx. >> (ix86_builtins): Add IX86_BUILTIN_ADDCARRYX32, >> IX86_BUILTIN_ADDCARRYX64. >> (ix86_init_mmx_sse_builtins): Define corresponding built-ins. >> (ix86_expand_builtin): Handle these built-ins. >> (ix86_expand_args_builtin): Handle new function types. >> * config/i386/i386.h (TARGET_ADX): New. >> * config/i386/i386.md (adcx3): New define_insn. >> * config/i386/i386.opt (madx): New. >> * config/i386/x86intrin.h: Include adxintrin.h. >> >> testsuite/Changelog entry: >> 2012-08-03 Michael Zolotukhin >> >> * gcc.target/i386/adx-addcarryx32-1.c: New. >> * gcc.target/i386/adx-addcarryx32-2.c: New. >> * gcc.target/i386/adx-addcarryx64-1.c: New. >> * gcc.target/i386/adx-addcarryx64-2.c: New. >> * gcc.target/i386/adx-check.h: New. >> * gcc.target/i386/i386.exp (check_effective_target_adx): New. >> * gcc.target/i386/sse-12.c: Add -madx. >> * gcc.target/i386/sse-13.c: Ditto. 
>> * gcc.target/i386/sse-14.c: Ditto. >> * gcc.target/i386/sse-22.c: Ditto. >> * gcc.target/i386/sse-23.c: Ditto. >> * g++.dg/other/i386-2.C: Ditto. >> * g++.dg/other/i386-3.C: Ditto. > > Just change this line: > > + op4 = gen_rtx_REG (CCmode, FLAGS_REG); > > back to CCCmode. > > OK with this change. > > Thanks, > Uros. -- --- Best regards, Michael V. Zolotukhin, Software Engineer Intel Corporation. bdw-adx-4.gcc.patch Description: Binary data
Re: [cxx-conversion] Make double_int a class with methods and operators. (issue6443093)
On Tue, Aug 7, 2012 at 2:35 AM, Lawrence Crowl wrote:
> Convert double_int from a struct with functions into a class with
> operators and methods.
>
> This patch adds the methods and operators.  In general, functions of
> the form "double_int_whatever" become member functions "whatever" or,
> when possible, operators.
>
> Every attempt has been made to preserve the existing algorithms, even
> at the expense of some optimization opportunities.  Some examples:
>
> The ext operation takes a value and returns a value.  However, that
> return value is usually assigned to the original variable.  An
> operation that modified a variable would be more efficient.

That's not always the case though, and I think the interface should be
consistent with existing behavior to avoid errors creeping in during
the transition.

> In some cases, an outer sign-specific function calls an inner
> function with the sign as a parameter, which then decides which
> implementation to do.  Decisions should not be artificially
> introduced, and the implementation of each case should be exposed as
> a separate routine.
>
> The existing operations are implemented in terms of the new
> operations, which necessarily adds a layer between the new code and
> the existing users.  Once all files have migrated, this layer will
> be removed.
>
> There are several existing operations implemented in terms of even
> older legacy operations.  This extra layer has not been removed.
>
> On occasion, parameterized functions are called with constant
> arguments.  To support static statement of intent, and potentially
> faster code in the future, there are several new unparameterized
> member functions.  Some examples:
>
> Four routines now encode both 'arithmetic or logical' and 'right or
> left' shift as part of the function name.
>
> Four routines now encode both 'signed or unsigned' and 'less than or
> greater than' as part of the function name.
For most parts, overloads that take an (unsigned) HOST_WIDE_INT argument
would be nice, as well as the ability to say dbl + 1.

> -typedef struct
> +typedef struct double_int
>  {
> +public:
> +  /* Normally, we would define constructors to create instances.
> +     Two things prevent us from doing so.
> +     First, defining a constructor makes the class non-POD in C++03,
> +     and we certainly want double_int to be a POD.
> +     Second, the GCC coding conventions prefer explicit conversion,
> +     and explicit conversion operators are not available until C++11.  */
> +
> +  static double_int make (unsigned HOST_WIDE_INT cst);
> +  static double_int make (HOST_WIDE_INT cst);
> +  static double_int make (unsigned int cst);
> +  static double_int make (int cst);

Did we somehow decide to not allow constructors? It's odd to convert to
C++ and end up with static member functions resembling them ...

Also I believe the conversion above introduces possible migration
errors.  Think of a previous

  HOST_WIDE_INT a;
  double_int d = uhwi_to_double_int (a);

if you write that now as

  HOST_WIDE_INT a;
  double_int d = double_int::make (a);

you get the effect of shwi_to_double_int.  Oops.  So as an intermediate
I'd like you _not_ to introduce the make () overloads.  Btw, if
HOST_WIDE_INT == int the above won't even compile.

Richard.

> +  /* No copy assignment operator or destructor to keep the type a POD.  */
> +
> +  /* There are some special value-creation static member functions.  */
> +
> +  static double_int mask (unsigned prec);
> +  static double_int max_value (unsigned int prec, bool uns);
> +  static double_int min_value (unsigned int prec, bool uns);
> +
> +  /* The following functions are mutating operations.  */
> +
> +  double_int &operator ++ ();  // prefix
> +  double_int &operator -- ();  // prefix
> +  double_int &operator *= (double_int);
> +  double_int &operator += (double_int);
> +  double_int &operator -= (double_int);
> +
> +  /* The following functions are non-mutating operations.  */
> +
> +  /* Conversion functions.  */
> +
> +  HOST_WIDE_INT to_signed () const;
> +  unsigned HOST_WIDE_INT to_unsigned () const;
> +
> +  /* Conversion query functions.  */
> +
> +  bool fits_unsigned () const;
> +  bool fits_signed () const;
> +  bool fits (bool uns) const;
> +
> +  /* Attribute query functions.  */
> +
> +  int trailing_zeros () const;
> +  int popcount () const;
> +
> +  /* Arithmetic query operations.  */
> +
> +  bool multiple_of (double_int, bool, double_int *) const;
> +
> +  /* Arithmetic operation functions.  */
> +
> +  double_int set_bit (unsigned) const;
> +  double_int mul_with_sign (double_int, bool, int *) const;
> +
> +  double_int operator * (double_int b) const;
> +  double_int operator + (double_int b) const;
> +  double_int operator - (double_int b) const;
> +  double_int operator - () const;
> +  double_int operator ~ () const;
> +  double_int operator & (double_int b) const;
> +  double_int operator | (double_int b) const;
> +  d
Re: [patch] Use a smaller, dynamic worklist in compute_global_livein
On Tue, Aug 7, 2012 at 8:24 AM, Steven Bosscher wrote:
> Hello,
>
> In the test case for PR54146, compute_global_livein allocates/frees a
> worklist for >400,000 basic blocks on each invocation. And it's called
> a lot, for rewrite_into_loop_closed_ssa. But the maximum number of
> basic blocks ever on the work list was only ~6500. So the work list
> can be much smaller most of the time.
>
> Bootstrapped&tested on x86_64-unknown-linux-gnu. OK for trunk?

Index: tree-ssa-loop-manip.c
===
--- tree-ssa-loop-manip.c	(revision 190176)
+++ tree-ssa-loop-manip.c	(working copy)
@@ -162,10 +162,8 @@ add_exit_phis_var (tree var, bitmap live
   basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (var));
   bitmap_iterator bi;

-  if (is_gimple_reg (var))
-    bitmap_clear_bit (livein, def_bb->index);
-  else
-    bitmap_set_bit (livein, def_bb->index);
+  gcc_checking_assert (is_gimple_reg (var));
+  bitmap_clear_bit (livein, def_bb->index);

!is_gimple_reg is true for virtual operand PHIs ... and at least
find_uses_to_rename_stmt loops over all uses (so, I suppose given the
comments elsewhere about tracking vops can you fix that to
s/SSA_OP_ALL_USES/SSA_OP_USE/?

 void
-compute_global_livein (bitmap livein ATTRIBUTE_UNUSED, bitmap def_blocks ATTRIBUTE_UNUSED)
+compute_global_livein (bitmap livein, bitmap def_blocks)
 {
-  basic_block bb, *worklist, *tos;
   unsigned i;
   bitmap_iterator bi;
+  VEC (basic_block, heap) *worklist;

-  tos = worklist
-    = (basic_block *) xmalloc (sizeof (basic_block) * (last_basic_block + 1));
+  /* Normally the work list size is bounded by the number of basic
+     blocks in the largest loop.  We don't know this number, but we
+     can be fairly sure that it will be relatively small.  */
+  worklist = VEC_alloc (basic_block, heap, MAX (8, n_basic_blocks / 100));

I suppose if we really want to optimize this we can make worklist static
so we at most grow once.  (in case you want to optimize allocation
overhead, not memory used).  Other structures in the ssa rewriter have
this kind of lifetime, too (the per-SSA name info for example).

Ok with the first change, whatever you prefer on the
compute_global_livein thing.

Thanks,
Richard.

> Ciao!
> Steven
Re: [patch] Use a smaller, dynamic worklist in compute_global_livein
On Tue, Aug 7, 2012 at 10:31 AM, Richard Guenther wrote: > On Tue, Aug 7, 2012 at 8:24 AM, Steven Bosscher wrote: >> Hello, >> >> In the test case for PR54146, compute_global_livein allocates/frees a >> worklist for >400,000 basic blocks on each invocation. And it's called >> a lot, for rewrite_into_loop_closed_ssa. But the maximum number of >> basic blocks ever on the work list was only ~6500. So the work list >> can be much smaller most of the time. >> >> Bootstrapped&tested on x86_64-unknown-linux-gnu. OK for trunk? > > Index: tree-ssa-loop-manip.c > === > --- tree-ssa-loop-manip.c (revision 190176) > +++ tree-ssa-loop-manip.c (working copy) > @@ -162,10 +162,8 @@ add_exit_phis_var (tree var, bitmap live >basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (var)); >bitmap_iterator bi; > > - if (is_gimple_reg (var)) > -bitmap_clear_bit (livein, def_bb->index); > - else > -bitmap_set_bit (livein, def_bb->index); > + gcc_checking_assert (is_gimple_reg (var)); > + bitmap_clear_bit (livein, def_bb->index); > > !is_gimple_reg is true for virtual operand PHIs ... and at least > find_uses_to_rename_stmt loops over all uses (so, I suppose given the > comments elsewhere about tracking vops can you fix that to > s/SSA_OP_ALL_USES/SSA_OP_USE/? I'll give that a try. > void > -compute_global_livein (bitmap livein ATTRIBUTE_UNUSED, bitmap > def_blocks ATTRIBUTE_UNUSED) > +compute_global_livein (bitmap livein, bitmap def_blocks) > { > - basic_block bb, *worklist, *tos; >unsigned i; >bitmap_iterator bi; > + VEC (basic_block, heap) *worklist; > > - tos = worklist > -= (basic_block *) xmalloc (sizeof (basic_block) * (last_basic_block + > 1)); > + /* Normally the work list size is bounded by the number of basic > + blocks in the largest loop. We don't know this number, but we > + can be fairly sure that it will be relatively small. 
*/ > + worklist = VEC_alloc (basic_block, heap, MAX (8, n_basic_blocks / 100)); > > I suppose if we really want to optimize this we can make worklist static > so we at most grow once. (in case you want to optimize allocation overhead, > not memory used). Other structures in the ssa rewriter have this kind of > lifetime, too (the per-SSA name info for example). I don't want this to be static. This shouldn't be a persistent piece of data. But I actually did try a static work list (the full list, not the VEC), and it made no difference at all on the timings. All time is spent in the loop in compute_global_livein itself. There are >400,000 basic blocks but the maximum loop depth is only 3. I've tried out livein as a sparseset last night (allocated and filled in add_exit_phis_var and re-used for every call to add_exit_phis_edge), that reduces the time spent in compute_global_livein by a factor 2.5 for this test case (i.e. not the order-of-magnitude change I'm looking for). Why can't the normal SSA updater be used to rewrite into loop-closed SSA form? Ciao! Steven
Re: [patch] Use a smaller, dynamic worklist in compute_global_livein
On Tue, Aug 7, 2012 at 11:03 AM, Steven Bosscher wrote: > On Tue, Aug 7, 2012 at 10:31 AM, Richard Guenther > wrote: >> On Tue, Aug 7, 2012 at 8:24 AM, Steven Bosscher >> wrote: >>> Hello, >>> >>> In the test case for PR54146, compute_global_livein allocates/frees a >>> worklist for >400,000 basic blocks on each invocation. And it's called >>> a lot, for rewrite_into_loop_closed_ssa. But the maximum number of >>> basic blocks ever on the work list was only ~6500. So the work list >>> can be much smaller most of the time. >>> >>> Bootstrapped&tested on x86_64-unknown-linux-gnu. OK for trunk? >> >> Index: tree-ssa-loop-manip.c >> === >> --- tree-ssa-loop-manip.c (revision 190176) >> +++ tree-ssa-loop-manip.c (working copy) >> @@ -162,10 +162,8 @@ add_exit_phis_var (tree var, bitmap live >>basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (var)); >>bitmap_iterator bi; >> >> - if (is_gimple_reg (var)) >> -bitmap_clear_bit (livein, def_bb->index); >> - else >> -bitmap_set_bit (livein, def_bb->index); >> + gcc_checking_assert (is_gimple_reg (var)); >> + bitmap_clear_bit (livein, def_bb->index); >> >> !is_gimple_reg is true for virtual operand PHIs ... and at least >> find_uses_to_rename_stmt loops over all uses (so, I suppose given the >> comments elsewhere about tracking vops can you fix that to >> s/SSA_OP_ALL_USES/SSA_OP_USE/? > > I'll give that a try. > > >> void >> -compute_global_livein (bitmap livein ATTRIBUTE_UNUSED, bitmap >> def_blocks ATTRIBUTE_UNUSED) >> +compute_global_livein (bitmap livein, bitmap def_blocks) >> { >> - basic_block bb, *worklist, *tos; >>unsigned i; >>bitmap_iterator bi; >> + VEC (basic_block, heap) *worklist; >> >> - tos = worklist >> -= (basic_block *) xmalloc (sizeof (basic_block) * (last_basic_block + >> 1)); >> + /* Normally the work list size is bounded by the number of basic >> + blocks in the largest loop. We don't know this number, but we >> + can be fairly sure that it will be relatively small. 
*/ >> + worklist = VEC_alloc (basic_block, heap, MAX (8, n_basic_blocks / 100)); >> >> I suppose if we really want to optimize this we can make worklist static >> so we at most grow once. (in case you want to optimize allocation overhead, >> not memory used). Other structures in the ssa rewriter have this kind of >> lifetime, too (the per-SSA name info for example). > > I don't want this to be static. This shouldn't be a persistent piece of data. > But I actually did try a static work list (the full list, not the > VEC), and it made no difference at all on the timings. Ah, so the patch only reduces allocation but not compile-time itself. > All time is spent in the loop in compute_global_livein itself. There > are >400,000 basic blocks but the maximum loop depth is only 3. I've > tried out livein as a sparseset last night (allocated and filled in > add_exit_phis_var and re-used for every call to add_exit_phis_edge), > that reduces the time spent in compute_global_livein by a factor 2.5 > for this test case (i.e. not the order-of-magnitude change I'm looking > for). Another optimization would be to do @@ -440,13 +442,13 @@ compute_global_livein (bitmap livein ATT && ! bitmap_bit_p (def_blocks, pred_index) && bitmap_set_bit (livein, pred_index)) { - *tos++ = pred; + VEC_safe_push (basic_block, heap, worklist, pred); thus combine the bitmap_set_bit and bitmap_bit_p tests on livein. > Why can't the normal SSA updater be used to rewrite into loop-closed SSA form? It does not have a mode to trigger PHI inserts for loop-closed SSA form. I suppose most of the time we should try harder and not call rewrite_into_loop_closed_ssa so many times (eventually for the easy cases where we _do_ have an old->new name mapping or symbols to rename the SSA updater should deal with this). Richard. > Ciao! > Steven
Re: add strnlen to libiberty (was Re: Assembly output optimisations)
On Mon, 6 Aug 2012, Ian Lance Taylor wrote: > On Mon, Aug 6, 2012 at 10:44 PM, Dimitrios Apostolou wrote: > > > > What else is missing to make this patch appropriate for libiberty? Should I > > change the prolog in strnlen.c, since I only copied it intact from gnulib? > > We generally try to avoid straight GPL source code without runtime > exception in libiberty, and I don't see a reason to bend that rule for > this trivial function. Just don't forget that libiberty isn't a target library anymore. To wit, the (GCC) run-time exception is moot for that code, AFAIK. Maybe enough reason to abandon that rule so its code can be truly and freely shared between GNU projects. brgds, H-P
Re: [patch] Use a smaller, dynamic worklist in compute_global_livein
On Tue, Aug 7, 2012 at 11:22 AM, Richard Guenther wrote: > Another optimization would be to do > > @@ -440,13 +442,13 @@ compute_global_livein (bitmap livein ATT > && ! bitmap_bit_p (def_blocks, pred_index) > && bitmap_set_bit (livein, pred_index)) > { > - *tos++ = pred; > + VEC_safe_push (basic_block, heap, worklist, pred); > > thus combine the bitmap_set_bit and bitmap_bit_p tests on livein. That's just a micro-optimization that doesn't help much, because after the bitmap_bit_p in the if, the last accessed bitmap member is cached and the bitmap_set_bit is almost free. What is needed here, is a (much) smaller computed livein. In the end, the whole set of ~140,000 livein blocks is pruned to just 3, namely the loop exits... Ciao! Steven
Re: [patch] Use a smaller, dynamic worklist in compute_global_livein
On Tue, Aug 7, 2012 at 11:45 AM, Steven Bosscher wrote: > On Tue, Aug 7, 2012 at 11:22 AM, Richard Guenther > wrote: >> Another optimization would be to do >> >> @@ -440,13 +442,13 @@ compute_global_livein (bitmap livein ATT >> && ! bitmap_bit_p (def_blocks, pred_index) >> && bitmap_set_bit (livein, pred_index)) >> { >> - *tos++ = pred; >> + VEC_safe_push (basic_block, heap, worklist, pred); >> >> thus combine the bitmap_set_bit and bitmap_bit_p tests on livein. > > That's just a micro-optimization that doesn't help much, because after > the bitmap_bit_p in the if, the last accessed bitmap member is cached > and the bitmap_set_bit is almost free. > > What is needed here, is a (much) smaller computed livein. In the end, > the whole set of ~140,000 livein blocks is pruned to just 3, namely > the loop exits... So I wonder why simply looping over all SSA defs in a loop body and checking whether a use is outside of it is not enough to compute this information ... (yes, we might end up creating too many loop closed PHIs, namely on all exits rather than just on those that the name is live over, but ... at least maybe we can prune it with DOM info) Richard. > Ciao! > Steven
Re: [patch] Use a smaller, dynamic worklist in compute_global_livein
On Tue, Aug 7, 2012 at 11:52 AM, Richard Guenther wrote: > So I wonder why simply looping over all SSA defs in a loop body and checking > whether a use is outside of it is not enough to compute this information ... > (yes, we might end up creating too many loop closed PHIs, namely on all > exits rather than just on those that the name is live over, but ... at > least maybe we can prune it with DOM info) I've been thinking the same thing, but I don't understand loop-closed SSA form well enough to try and rewrite it along those lines. Ciao! Steven
[ping] Re: [patch v2] support for multiarch systems
ping? re-attaching the updated patch with the fixed comment in
genmultilib.

  Matthias

On 08.07.2012 20:48, Matthias Klose wrote:
> Please find attached v2 of the patch updated for trunk 20120706, x86
> only, tested on x86-linux-gnu, KFreeBSD and the Hurd.
>
> I left in the comment about the multiarch names, but I'm fine to
> change/discard it. It was first required by Joseph Myers, then not
> found necessary by Paolo Bonzini. The patch includes the changes
> suggested by Thomas Schwinge.
>
> Ok for the trunk?
>
> Matthias

2012-06-25  Matthias Klose

	* doc/invoke.texi: Document -print-multiarch.
	* doc/install.texi: Document --enable-multiarch.
	* doc/fragments.texi: Document MULTILIB_OSDIRNAMES,
	MULTIARCH_DIRNAME.
	* configure.ac: Add --enable-multiarch option.
	* configure.in: Regenerate.
	* Makefile.in (s-mlib): Pass MULTIARCH_DIRNAME to genmultilib.
	enable_multiarch, with_float: New macros.
	if_multiarch: New macro, define in terms of enable_multiarch.
	* genmultilib: Add new argument for the multiarch name.
	* gcc.c (multiarch_dir): Define.
	(for_each_path): Search for multiarch suffixes.
	(driver_handle_option): Handle multiarch option.
	(do_spec_1): Pass -imultiarch if defined.
	(main): Print multiarch.
	(set_multilib_dir): Separate multilib and multiarch names
	from multilib_select.
	(print_multilib_info): Ignore multiarch names in multilib_select.
	* incpath.c (add_standard_paths): Search the multiarch include dirs.
	* cppdefault.h (default_include): Document multiarch in multilib
	member.
	* cppdefault.c: [LOCAL_INCLUDE_DIR, STANDARD_INCLUDE_DIR] Add an
	include directory for multiarch directories.
	* common.opt: New options --print-multiarch and -imultilib.
	* config.gcc (tmake_file): Include i386/t-linux.
	(tmake_file): Include i386/t-kfreebsd.
	(tmake_file): Include i386/t-gnu.
	* config/i386/t-linux64: Add multiarch names in
	MULTILIB_OSDIRNAMES, define MULTIARCH_DIRNAME.
	* config/i386/t-gnu: New file.
	* config/i386/t-kfreebsd: Likewise.
	* config/i386/t-linux: Likewise.
Index: configure.ac
===
--- configure.ac	(revision 188931)
+++ configure.ac	(working copy)
@@ -623,6 +623,21 @@
 [], [enable_multilib=yes])
 AC_SUBST(enable_multilib)

+# Determine whether or not multiarch is enabled.
+AC_ARG_ENABLE(multiarch,
+[AS_HELP_STRING([--enable-multiarch],
+		[enable support for multiarch paths])],
+[case "${withval}" in
+yes|no|auto-detect) enable_multiarch=$withval;;
+*) AC_MSG_ERROR(bad value ${withval} given for --enable-multiarch option) ;;
+esac], [enable_multiarch=auto-detect])
+AC_MSG_CHECKING(for multiarch configuration)
+AC_SUBST(enable_multiarch)
+AC_MSG_RESULT($enable_multiarch)
+
+# needed for setting the multiarch name on ARM
+AC_SUBST(with_float)
+
 # Enable __cxa_atexit for C++.
 AC_ARG_ENABLE(__cxa_atexit,
 [AS_HELP_STRING([--enable-__cxa_atexit], [enable __cxa_atexit for C++])],

Index: cppdefault.c
===
--- cppdefault.c	(revision 188931)
+++ cppdefault.c	(working copy)
@@ -63,6 +63,7 @@
 #endif
 #ifdef LOCAL_INCLUDE_DIR
     /* /usr/local/include comes before the fixincluded header files.  */
+    { LOCAL_INCLUDE_DIR, 0, 0, 1, 1, 2 },
     { LOCAL_INCLUDE_DIR, 0, 0, 1, 1, 0 },
 #endif
 #ifdef PREFIX_INCLUDE_DIR
@@ -90,6 +91,7 @@
 #endif
 #ifdef NATIVE_SYSTEM_HEADER_DIR
     /* /usr/include comes dead last.  */
+    { NATIVE_SYSTEM_HEADER_DIR, NATIVE_SYSTEM_HEADER_COMPONENT, 0, 0, 1, 2 },
     { NATIVE_SYSTEM_HEADER_DIR, NATIVE_SYSTEM_HEADER_COMPONENT, 0, 0, 1, 0 },
 #endif
     { 0, 0, 0, 0, 0, 0 }

Index: cppdefault.h
===
--- cppdefault.h	(revision 188931)
+++ cppdefault.h	(working copy)
@@ -43,9 +43,11 @@
 				   C++.  */
   const char add_sysroot;	/* FNAME should be prefixed by
				   cpp_SYSROOT.  */
-  const char multilib;		/* FNAME should have the multilib path
-				   specified with -imultilib
-				   appended.  */
+  const char multilib;		/* FNAME should have appended
+				   - the multilib path specified with
+				     -imultilib when 1 is passed,
+				   - the multiarch path specified with
+				     -imultiarch, when 2 is passed.  */
 };

 extern const struct default_include cpp_include_defaults[];

Index: Makefile.in
===
--- Makefile.in	(revision 188931)
+++ Makefile.in	(working cop