Ping*2: [PATCH][ARM] GCC command line support for Cortex-R7
Hi Richard, Can you please help to review this patch? BR, Terry > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Terry Guo > Sent: Monday, March 04, 2013 10:46 AM > To: gcc-patches@gcc.gnu.org > Subject: Ping: [PATCH][ARM] GCC command line support for Cortex-R7 > > Ping... > > The patch is at http://gcc.gnu.org/ml/gcc-patches/2013-02/msg01105.html. > > BR, > Terry > > > -Original Message- > > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > > ow...@gcc.gnu.org] On Behalf Of Terry Guo > > Sent: Monday, February 25, 2013 10:23 AM > > To: gcc-patches@gcc.gnu.org > > Subject: [PATCH][ARM] GCC command line support for Cortex-R7 > > > > Hi, > > > > This patch is to enable GCC to accept the new command line option - > > mcpu=cortex-r7. Is it OK for trunk? > > > > BR, > > Terry > > > > 2013-02-25 Terry Guo > > > > * config/arm/arm-cores.def: Added core cortex-r7. > > * config/arm/arm-tune.md: Regenerated. > > * config/arm/arm-tables.opt: Regenerated. > > * doc/invoke.texi: Added entry for core cortex-r7. > >
Re: [patch] make gcse.c respect -fno-gcse-lm
On Mon, Mar 11, 2013 at 12:17 AM, Steven Bosscher wrote: > Hello, > > RTL PRE has an option to disable load motion. This option works fine, > except that all analysis for load motion is still performed. > > This patch stops gcse.c from recording memory sets for -fno-gcse-lm, > and conservatively returns true in oprs_unchanged_p for any MEM. > > Bootstrapped&tested on {x86_64,powerpc64}-unknown-linux-gnu. OK? Ok. Thanks, Richard. > Ciao! > Steven > > > * gcse.c (oprs_unchanged_p): Respect flag_gcse_lm. > (record_last_mem_set_info): Likewise. > > Index: gcse.c > === > --- gcse.c (revision 196576) > +++ gcse.c (working copy) > @@ -890,8 +890,9 @@ oprs_unchanged_p (const_rtx x, const_rtx >} > > case MEM: > - if (load_killed_in_block_p (current_bb, DF_INSN_LUID (insn), > - x, avail_p)) > + if (! flag_gcse_lm > + || load_killed_in_block_p (current_bb, DF_INSN_LUID (insn), > +x, avail_p)) > return 0; >else > return oprs_unchanged_p (XEXP (x, 0), insn, avail_p); > @@ -1471,10 +1472,14 @@ canon_list_insert (rtx dest ATTRIBUTE_UN > static void > record_last_mem_set_info (rtx insn) > { > - int bb = BLOCK_FOR_INSN (insn)->index; > + int bb; > + > + if (! flag_gcse_lm) > +return; > >/* load_killed_in_block_p will handle the case of calls clobbering > everything. */ > + bb = BLOCK_FOR_INSN (insn)->index; >modify_mem_list[bb].safe_push (insn); >bitmap_set_bit (modify_mem_list_set, bb);
Re: [patch PR middle-end/39326 - compile time and memory explosion in combine
On Sat, Mar 9, 2013 at 11:23 PM, Steven Bosscher wrote: > Hello, > > The attached patch fixes one of the (at least) three scalability > problems reported in PR middle-end/39326. This problem is that combine > takes forever and increases the memory footprint from ~650MB to >7GB. > The cause is DSE creating a lot of new registers in replace_read, and > those registers are not copy-prop'd out between dse1 and combine. The > result is many overlapping live ranges and single-set-single-use > registers that combine is made to work on. > > The fix is to just not create so many new registers in DSE in the > first place. It is wasteful and unnecessary if an existing register > can be re-used. > > With this patch, for the test case of the PR the time spent in > combine goes down from ~350s to ~4.5s, and the memory footprint > explosion is avoided. For my set of cc1-i files this also helps reduce > compile time a modest amount, especially for the larger files of > course. > > Bootstrapped&tested on {x86_64,powerpc64}-unknown-linux-gnu. > OK for trunk? Not sure on the patch details - but in general I wonder why _DSE_ performs full redundancy elimination at all ... which replace_read seems to be. Anyway, for this one I'd say we wait for stage1 and consider backporting for 4.8.1 given we want to do a 4.8 RC1 soon. Thanks, Richard. > Ciao! > Steven > > > PR middle-end/39326 > * dse.c (replace_read): If the stored value is a pseudo-register > that is set only once, re-use it to replace the load instead of > creating a new register.
Re: [committed] xfail gcc.dg/tree-ssa/pr55579.c on 32-bit hppa*-*-hpux*
John David Anglin writes: > Index: gcc.dg/tree-ssa/pr55579.c > === > --- gcc.dg/tree-ssa/pr55579.c (revision 196546) > +++ gcc.dg/tree-ssa/pr55579.c (working copy) > @@ -11,5 +11,5 @@ >return x; > } > > -/* { dg-final { scan-tree-dump "Created a debug-only replacement for s" > "esra" } } */ > +/* { dg-final { scan-tree-dump "Created a debug-only replacement for s" > "esra" {xfail { hppa*-*-hpux* && { ! lp64 } } } } } */ You most likely need whitespace before the xfail. Also try to add a comment (PR reference?) explaining it. Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [patch] PR middle-end/39326 - limit LIM
On Sun, Mar 10, 2013 at 2:08 AM, Steven Bosscher wrote: > Hello, > > The attached patch fixes another one of the scalability problems > reported in PR middle-end/39326. > > This problem is that tree loop-invariant code motion explodes on basic > blocks with many memory references. Compile time is quadratic in the > number of memory references in the basic block, and so are the memory > requirements when the dependences or independences are propagated > bottom-up through the loop tree. > > The fix is to give up on loops with too many memory references to > handle. There is already a param for that for loop dependence > analysis, and this patch makes LIM use the same param. > > Bootstrapped&tested on {x86_64,powerpc64}-unknown-linux-gnu. > OK for trunk? Given + well. Return true if all is well, false if something happened + that is fatal to the rest of the LIM pass. */ -static void +static bool gather_mem_refs_stmt (struct loop *loop, gimple stmt) and FOR_EACH_BB (bb) { ... + for (bsi = gsi_start_bb (bb); + !gsi_end_p (bsi) && all_ok; + gsi_next (&bsi)) + all_ok = gather_mem_refs_stmt (loop, gsi_stmt (bsi)); + + if (! all_ok) +bitmap_set_bit (loops_with_too_many_memrefs, loop->num); +} + + /* Propagate the information about loops with too many memory + references up the loop hierarchy. */ + FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST) +{ + struct loop *outer = loop_outer (loop); + if (outer == current_loops->tree_root + || ! bitmap_bit_p (loops_with_too_many_memrefs, loop->num)) + continue; + bitmap_set_bit (loops_with_too_many_memrefs, outer->num); } I don't see how this propagation works correctly as you start to mark BBs as not-ok starting from a "random" basic-block in the loop tree. You of course also end up disqualifying very small loops completely if they happen to be analyzed after a very big one you disqualify. 
Of course that's partly because memory_accesses contains all memory accesses in the function - so I think rather than limiting on length of memory_accesses you want to limit on the length of memory_accesses.refs_in_loop (well, on memory_accesses.all_refs_in_loop). And you want the initial walk over all BBs to instead walk on BBs FOR_EACH_LOOP and LI_FROM_INNERMOST (you can then do the propagation to fill all_refs_in_loop there, too). I realize there isn't a good existing BB walker for this (with a suitable place to re-set all-ok) - a simple walk over get_loop_body via LI_FROM_INNERMOST would get you visit sub-loop bodies multiple times (easily skipped by a bb->loop_father test, of course, but still ...) That is, sth like the following preparatory patch to change the iteration: Index: gcc/tree-ssa-loop-im.c === --- gcc/tree-ssa-loop-im.c (revision 196547) +++ gcc/tree-ssa-loop-im.c (working copy) @@ -1629,29 +1629,30 @@ gather_mem_refs_in_loops (void) loop_iterator li; bitmap lrefs, alrefs, alrefso; - FOR_EACH_BB (bb) -{ - loop = bb->loop_father; - if (loop == current_loops->tree_root) - continue; - - for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi)) - gather_mem_refs_stmt (loop, gsi_stmt (bsi)); -} - - /* Propagate the information about accessed memory references up - the loop hierarchy. */ FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST) { + basic_block bbs = get_loop_body (loop); + for (i = 0; i < loop->num_nodes; ++i) + { + bb = bbs[i]; + if (bb->loop_father != loop) + continue; + for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi)) + gather_mem_refs_stmt (loop, gsi_stmt (bsi)); + } + free (bbs); + + /* Propagate the information about accessed memory references up +the loop hierarchy. 
*/ lrefs = memory_accesses.refs_in_loop[loop->num]; alrefs = memory_accesses.all_refs_in_loop[loop->num]; bitmap_ior_into (alrefs, lrefs); - if (loop_outer (loop) == current_loops->tree_root) - continue; - - alrefso = memory_accesses.all_refs_in_loop[loop_outer (loop)->num]; - bitmap_ior_into (alrefso, alrefs); + if (loop_outer (loop) != current_loops->tree_root) + { + alrefso = memory_accesses.all_refs_in_loop[loop_outer (loop)->num]; + bitmap_ior_into (alrefso, alrefs); + } } } At this point this should be stage1 material, eventually backported for 4.8.1. And yes, aside from the above the rest of the patch looks good to me (move loops_with_too_many_memrefs into the memory_accesses struct?) Thanks, Richard. > Ciao! > Steven > > (The ChangeLog is a bit long but the patch is relatively straight-forward.) > > * tree-flow.h (enum move_pos): Moved to tree-ssa-loop-im.c. > * tree-ssa-loop-im.c: Include diagnostic-core.h for warning_at() > (enum move_pos): Mov
Re: [testsuite] Remove dg-excess-errors in gcc.dg/inline_[34].c and unroll_[234].c
domi...@lps.ens.fr (Dominique Dhumieres) writes: > The following tests XPASS on i?86-*-linux* and x86_64-*-linux* (see, > e.g., http://gcc.gnu.org/ml/gcc-testresults/2013-02/msg02923.html ) > > XPASS: gcc.dg/inline_3.c (test for excess errors) > XPASS: gcc.dg/inline_4.c (test for excess errors) > XPASS: gcc.dg/unroll_2.c (test for excess errors) > XPASS: gcc.dg/unroll_3.c (test for excess errors) > XPASS: gcc.dg/unroll_4.c (test for excess errors) > > The following patch fixes this to remove the noise. In addition I do not > see why these tests should be restricted to i?86-*-linux* and > x86_64-*-linux* (the other tests inline_[12].c and unroll_[15] have no > restrictions). Tested on powerpc-apple-darwin9 and x86_64-apple-darwin10. > > Ok for mainline? If yes, could someone commit the patch since I don't have > write access? I've now tested the patch on x86_64-unknown-linux-gnu and i686-unknown-linux-gnu where it removes the XPASSes (thus less testsuite noise), and on {i386-pc,sparc-sun}-solaris2.{9,10,11} where the tests pass, too. Unless the release managers object, I plan to commit it later today. Thanks and sorry for the delay. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[PATCH, ARM] - Fix for PR56470
In PR56470 the compiler is failing an internal check on the operands to shift_operator after reload has completed. This comes from reload spilling a constant shift value from a register-specified shift which is out-of-range, but permitted by the constraints due to the overloaded nature of this operation (shift_operator is intended to permit any shift by 0..31 or a multiply by a constant power of 2 -- it's really a bit of a hack, but it keeps the number of patterns in the machine description down significantly).[1] Anyway, if reload spills out a constant that can match the constraint M, then it has no way of knowing if this is for a multiply or for a shift, so we have to just handle it. Unfortunately, the attempt to tighten up the validation of user ASM blocks that mis-use the %S output specifier means we can now ICE internally. The fix is to do the validation inside shift_op rather than in the caller, and to permit the out-of-range constants that reload can generate. Tested on a native bootstrap, with no regressions. R. [1] Now that we have iterators, there is perhaps a better way of doing this, but that's a big change at this point and not suitable for 4.8. PR target/56470 * arm.c (shift_op): Validate RTL pattern on the fly. (arm_print_operand, case 'S'): Don't use shift_operator to validate the RTL. Index: arm.c === --- arm.c (revision 196523) +++ arm.c (working copy) @@ -15430,71 +15430,87 @@ shift_op (rtx op, HOST_WIDE_INT *amountp const char * mnem; enum rtx_code code = GET_CODE (op); - switch (GET_CODE (XEXP (op, 1))) -{ -case REG: -case SUBREG: - *amountp = -1; - break; - -case CONST_INT: - *amountp = INTVAL (XEXP (op, 1)); - break; - -default: - gcc_unreachable (); -} - switch (code) { case ROTATE: - gcc_assert (*amountp != -1); - *amountp = 32 - *amountp; - code = ROTATERT; + if (!CONST_INT_P (XEXP (op, 1))) + { + output_operand_lossage ("invalid shift operand"); + return NULL; + } - /* Fall through. 
*/ + code = ROTATERT; + *amountp = 32 - INTVAL (XEXP (op, 1)); + mnem = "ror"; + break; case ASHIFT: case ASHIFTRT: case LSHIFTRT: case ROTATERT: mnem = arm_shift_nmem(code); + if (CONST_INT_P (XEXP (op, 1))) + { + *amountp = INTVAL (XEXP (op, 1)); + } + else if (REG_P (XEXP (op, 1))) + { + *amountp = -1; + return mnem; + } + else + { + output_operand_lossage ("invalid shift operand"); + return NULL; + } break; case MULT: /* We never have to worry about the amount being other than a power of 2, since this case can never be reloaded from a reg. */ - gcc_assert (*amountp != -1); + if (!CONST_INT_P (XEXP (op, 1))) + { + output_operand_lossage ("invalid shift operand"); + return NULL; + } + + *amountp = INTVAL (XEXP (op, 1)) & 0x; + + /* Amount must be a power of two. */ + if (*amountp & (*amountp - 1)) + { + output_operand_lossage ("invalid shift operand"); + return NULL; + } + *amountp = int_log2 (*amountp); return ARM_LSL_NAME; default: - gcc_unreachable (); + output_operand_lossage ("invalid shift operand"); + return NULL; } - if (*amountp != -1) -{ - /* This is not 100% correct, but follows from the desire to merge -multiplication by a power of 2 with the recognizer for a -shift. >=32 is not a valid shift for "lsl", so we must try and -output a shift that produces the correct arithmetical result. -Using lsr #32 is identical except for the fact that the carry bit -is not set correctly if we set the flags; but we never use the -carry bit from such an operation, so we can ignore that. */ - if (code == ROTATERT) - /* Rotate is just modulo 32. */ - *amountp &= 31; - else if (*amountp != (*amountp & 31)) - { - if (code == ASHIFT) - mnem = "lsr"; - *amountp = 32; - } - - /* Shifts of 0 are no-ops. */ - if (*amountp == 0) - return NULL; -} + /* This is not 100% correct, but follows from the desire to merge + multiplication by a power of 2 with the recognizer for a + shift. 
>=32 is not a valid shift for "lsl", so we must try and + output a shift that produces the correct arithmetical result. + Using lsr #32 is identical except for the fact that the carry bit + is not set correctly if we set the flags; but we never use the + carry bit from such an operation, so we can ignore that. */ + if (code == ROTATERT) +/* Rotate is just modulo 32. */ +*amountp &= 31; + else if (*amountp != (*a
[PATCH][1/n] tree LIM TLC
This is a series of patches applying some TLC to LIM. This first patch gets rid of the remains of create_vop_ref_mapping and alongside cleans up how we record references. Bootstrap and regtest pending on x86_64-unknown-linux-gnu. Richard. 2013-03-11 Richard Biener * tree-ssa-loop-im.c (record_mem_ref_loc): Record ref as stored here. (gather_mem_refs_stmt): Instead of here and in create_vop_ref_mapping_loop. (gather_mem_refs_in_loops): Fold into ... (analyze_memory_references): ... this. Move data structure init to tree_ssa_lim_initialize. Propagate stored refs info as well. (create_vop_ref_mapping_loop): Remove. (create_vop_ref_mapping): Likewise. (tree_ssa_lim_initialize): Initialize ref bitmaps here. Index: trunk/gcc/tree-ssa-loop-im.c === *** trunk.orig/gcc/tree-ssa-loop-im.c 2013-03-11 12:38:43.0 +0100 --- trunk/gcc/tree-ssa-loop-im.c2013-03-11 13:00:50.417737114 +0100 *** mem_ref_locs_alloc (void) *** 1518,1528 description REF. The reference occurs in statement STMT. */ static void ! record_mem_ref_loc (mem_ref_p ref, struct loop *loop, gimple stmt, tree *loc) { mem_ref_loc_p aref = XNEW (struct mem_ref_loc); mem_ref_locs_p accs; - bitmap ril = memory_accesses.refs_in_loop[loop->num]; if (ref->accesses_in_loop.length () <= (unsigned) loop->num) --- 1518,1528 description REF. The reference occurs in statement STMT. */ static void ! record_mem_ref_loc (mem_ref_p ref, bool is_stored, ! struct loop *loop, gimple stmt, tree *loc) { mem_ref_loc_p aref = XNEW (struct mem_ref_loc); mem_ref_locs_p accs; if (ref->accesses_in_loop.length () <= (unsigned) loop->num) *** record_mem_ref_loc (mem_ref_p ref, struc *** 1536,1556 aref->stmt = stmt; aref->ref = loc; - accs->locs.safe_push (aref); - bitmap_set_bit (ril, ref->id); - } - - /* Marks reference REF as stored in LOOP. */ ! static void ! mark_ref_stored (mem_ref_p ref, struct loop *loop) ! { ! for (; !loop != current_loops->tree_root !&& !bitmap_bit_p (ref->stored, loop->num); !loop = loop_outer (loop)) ! 
bitmap_set_bit (ref->stored, loop->num); } /* Gathers memory references in statement STMT in LOOP, storing the --- 1536,1552 aref->stmt = stmt; aref->ref = loc; accs->locs.safe_push (aref); ! bitmap_set_bit (memory_accesses.refs_in_loop[loop->num], ref->id); ! if (is_stored) ! { ! bitmap_set_bit (memory_accesses.all_refs_stored_in_loop[loop->num], ! ref->id); ! while (loop != current_loops->tree_root !&& bitmap_set_bit (ref->stored, loop->num)) ! loop = loop_outer (loop); ! } } /* Gathers memory references in statement STMT in LOOP, storing the *** gather_mem_refs_stmt (struct loop *loop, *** 1582,1590 fprintf (dump_file, "Unanalyzed memory reference %u: ", id); print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); } ! if (gimple_vdef (stmt)) ! mark_ref_stored (ref, loop); ! record_mem_ref_loc (ref, loop, stmt, mem); return; } --- 1578,1584 fprintf (dump_file, "Unanalyzed memory reference %u: ", id); print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); } ! record_mem_ref_loc (ref, gimple_vdef (stmt), loop, stmt, mem); return; } *** gather_mem_refs_stmt (struct loop *loop, *** 1611,1627 } } ! if (is_stored) ! mark_ref_stored (ref, loop); ! ! record_mem_ref_loc (ref, loop, stmt, mem); return; } /* Gathers memory references in loops. */ static void ! gather_mem_refs_in_loops (void) { gimple_stmt_iterator bsi; basic_block bb; --- 1605,1618 } } ! record_mem_ref_loc (ref, is_stored, loop, stmt, mem); return; } /* Gathers memory references in loops. */ static void ! analyze_memory_references (void) { gimple_stmt_iterator bsi; basic_block bb; *** gather_mem_refs_in_loops (void) *** 1647,1731 alrefs = memory_accesses.all_refs_in_loop[loop->num]; bitmap_ior_into (alrefs, lrefs); ! if (loop_outer (loop) == current_loops->tree_root) continue; ! alrefso = memory_accesses.all_refs_in_loop[loop_outer (loop)->num]; bitmap_ior_into (alrefso, alrefs); } } - /* Create a mapping from virtual operands to references that touch them -in LOOP. 
*/ - - static void - create_vop_ref_mapping_loop (struct loop *loop) - { - bitmap refs = memory_accesses.refs_in_loop[loop->num]; - struct loop *sloop; - bitmap_iterator bi; - unsigned i; - mem_ref_p ref; - - EXECUTE_IF_SET_IN_BITMAP (refs, 0, i,
[PING][PATCH,ARM] Peephole individual LDR/STD into LDRD/STRD
PING: http://gcc.gnu.org/ml/gcc-patches/2013-02/msg00604.html Thanks, Greta > -Original Message- > From: Greta Yorsh [mailto:greta.yo...@arm.com] > Sent: 13 February 2013 13:36 > To: GCC Patches > Cc: Ramana Radhakrishnan; Richard Earnshaw; 'p...@codesourcery.com'; > 'ni...@redhat.com' > Subject: [PATCH,ARM] Peephole individual LDR/STD into LDRD/STRD > > This patch defines peephole2 patterns that merge two individual LDR > instructions into an LDRD instruction (resp. STR into STRD) whenever > possible using the following transformations: > * reorder two memory accesses, > * rename registers when storing two constants, and > * reorder target registers of a load when they are used by a > commutative operation. > > In ARM mode only, the pair of registers IP and SP is allowed as > operands in LDRD/STRD. To handle it, this patch defines a new > constraint "q" to be CORE_REGS in ARM mode and GENERAL_REGS (i.e., > equivalent to "r") otherwise. Note that in ARM mode "q" is not > equivalent to "rk" because of the way constraints are matched. The new > constraint "q" is used in place of "r" for DImode move between register > and memory. > > This is a new version of the patch posted for review a long time ago: > http://gcc.gnu.org/ml/gcc-patches/2011-11/msg00914.html > All the dependencies mentioned in the previous patch have already been > upstreamed. > Compared to the previous version, the new patch > * handles both ARM and Thumb modes in the same peephole pattern, > * does not attempt to generate LDRD/STRD when optimizing for size and > none of the LDM/STM patterns match (but it would be easy to add), > * does not include scan-assembly tests specific for cortex-a15 and > cortex-a9, because they are not stable and highly sensitive to other > optimizations. > > No regression on qemu for arm-none-eabi with cpu cortex-a15. > > Bootstrap successful on Cortex-A15 TC2. 
> > Spec2k results: > Performance: slight improvement in overall scores (less than 1%) in > both CINT2000 and CFP2000. > For individual benchmarks, there is a slight variation in performance, > within less than 1%, which I consider to be just noise. > Object size: there is a slight reduction in size in all the benchmarks > - overall 0.2% and at most 0.5% for individual benchmarks. > Baseline compiler is gcc r194473 from December 2012. > Compiled in thumb mode with hardfp. > Run on Cortex-A15 hardware. > > Ok for gcc4.9 stage 1? > > Thanks, > Greta > > gcc/ > > 2013-02-13 Greta Yorsh > > * config/arm/constraints.md (q): New constraint. > * config/arm/ldrdstrd.md: New file. > * config/arm/arm.md (ldrdstrd.md) New include. > (arm_movdi): Use "q" instead of "r" constraint > for double-word memory access. > (movdf_soft_insn): Likewise. > * config/arm/vfp.md (movdi_vfp): Likewise. > * config/arm/t-arm (MD_INCLUDES): Add ldrdstrd.md. > * config/arm/arm-protos.h (gen_operands_ldrd_strd): New > declaration. > * config/arm/arm.c (gen_operands_ldrd_strd): New function. > (mem_ok_for_ldrd_strd): Likewise. > (output_move_double): Update assertion. > > gcc/testsuite > > 2013-02-13 Greta Yorsh > > * gcc.target/arm/peep-ldrd-1.c: New test. > * gcc.target/arm/peep-strd-1.c: Likewise.
Re: C++ PATCH for c++/56567 (ICE with lambda returning init-list)
On 03/08/2013 10:54 AM, Jason Merrill wrote: My initial proposal for allowing general return type deduction allowed deduction of std::initializer_list, which is not permitted by C++11. But this doesn't make sense, because the underlying array will immediately leak, so we should just give an error even in C++1y. As pointed out in the PR, this approach was wrong because it's possible to have an expression with std::initializer_list type; the thing returned might not have itself been a brace-enclosed init list. So let's catch that earlier. Tested x86_64-pc-linux-gnu, applying to trunk. commit 3d3cc430920cbb17be4e85742c34d51b94537a37 Author: Jason Merrill Date: Mon Mar 11 12:03:31 2013 -0400 PR c++/56567 * typeck.c (check_return_expr): Disallow returning init list here. * semantics.c (apply_deduced_return_type): Not here. diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c index 233765a..e909b98 100644 --- a/gcc/cp/semantics.c +++ b/gcc/cp/semantics.c @@ -9061,12 +9061,6 @@ apply_deduced_return_type (tree fco, tree return_type) if (return_type == error_mark_node) return; - if (is_std_init_list (return_type)) -{ - error ("returning %qT", return_type); - return_type = void_type_node; -} - if (LAMBDA_FUNCTION_P (fco)) { tree lambda = CLASSTYPE_LAMBDA_EXPR (current_class_type); diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c index 58295d7..58ebcc0 100644 --- a/gcc/cp/typeck.c +++ b/gcc/cp/typeck.c @@ -8136,6 +8136,11 @@ check_return_expr (tree retval, bool *no_warning) "deduced to %"); type = error_mark_node; } + else if (retval && BRACE_ENCLOSED_INITIALIZER_P (retval)) + { + error ("returning initializer list"); + type = error_mark_node; + } else { if (!retval) diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-initlist3.C b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-initlist3.C index 029287b..f7b82ef 100644 --- a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-initlist3.C +++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-initlist3.C @@ -5,7 +5,7 @@ int main() { - []{ return { 1, 
2 }; }(); // { dg-error "initializer_list" } + []{ return { 1, 2 }; }(); // { dg-error "initializer.list" } } // { dg-prune-output "return-statement with a value" }
[c++-concepts] Semantic handling of requirements
Adding initial support for the semantic analysis of template requirements. This patch adds features for reducing requires clauses into logical formulas composed only of atomic propositions and logical connectives. The next patch will add inlining for constraint predicates. I had hoped to add the file as constraint.cc, but the build system in this version doesn't seem to have build support for .cc files (kept getting "no such file: cp/constraint.o"), but adding the file as .c worked just fine. 2013-03-11 Andrew Sutton * gcc/cp/Make-lang.in: Add constraint.c * gcc/cp/constraint.c: New. (conjoin_requirements): New. (disjoin_requirements): New. (requirement_reduction): New class. (reduce_requirements): New. * gcc/cp/cp-tree.h (reduce_requirements): New. (conjoin_requirements): New. (disjoin_requirements): New. * gcc/cp/cp-tree.h (finish_template_template_parm): Comments. * gcc/cp/semantics.c (finish_template_requirements): Start working with requirements.
Re: [Fortran, Patch] Improve documentation of the non-implemented RECORD STRUCTURE extension
*ping* Tobias Burnus: During the discussion of UNION, I decided to have a look at the current documentation. UNION is not mentioned (except for some commented lines), but record structures are. Attached is an attempt to improve the documentation. The current version is at http://gcc.gnu.org/onlinedocs/gfortran/STRUCTURE-and-RECORD.html I tested it with make pdf and make info and looked at the output. OK for the trunk? Tobias
Re: [Fortran, Patch] Improve documentation of the non-implemented RECORD STRUCTURE extension
On Mon, Mar 11, 2013 at 06:39:37PM +0100, Tobias Burnus wrote: > *ping* > > Tobias Burnus: > > During the discussion of UNION, I decided to have a look at the > > current documentation. UNION is not mentioned (except for some > > commented lines), but record structures are. Attached is an attempt > > to improve the documentation. > > > > The current version is at > > http://gcc.gnu.org/onlinedocs/gfortran/STRUCTURE-and-RECORD.html > > > > I tested it with make pdf and make info and looked at the output. > > OK for the trunk? The patch looks ok to me. I wonder if the text should identify the vendor that introduced this feature as DEC. For the youngsters on the list, DEC is the Digital Equipment Corporation. -- Steve
[Patch,AVR]: Fix PR56263
This patch implements a new warning option, -Waddr-space-convert, which warns about conversions to a non-containing address space. Address spaces are implemented in such a way that each address space is contained in each other space so that casting is possible, e.g. in code like char read_c (bool in_flash, const char *str) { if (in_flash) return * (const __flash char*) str; else return *str; } However, there is no warning about implicit or explicit address space conversions, which makes it hard to port progmem code to address space code. If an address space qualifier is missing, e.g. when calling a function that is not qualified correctly, this would result in wrong code (in contrast to pgm_read, which works no matter how the address is qualified). There is still some work to do to get more precise warnings and this is just a first take to implement PR56263, see the FIXME in the source. The details can be worked out later, e.g. for 4.8.1. Ok for trunk? PR target/56263 * config/avr/avr.c (TARGET_CONVERT_TO_TYPE): Define to... (avr_convert_to_type): ...this new static function. * config/avr/avr.opt (-Waddr-space-convert): New C option. * doc/invoke.texi (AVR Options): Document it. Index: config/avr/avr.opt === --- config/avr/avr.opt (revision 196495) +++ config/avr/avr.opt (working copy) @@ -74,3 +74,7 @@ When accessing RAM, use X as imposed by msp8 Target Report RejectNegative Var(avr_sp8) Init(0) The device has no SPH special function register. This option will be overridden by the compiler driver with the correct setting if presence/absence of SPH can be deduced from -mmcu=MCU. + +Waddr-space-convert +Warning C Report Var(avr_warn_addr_space_convert) Init(0) +Warn if the address space of an address is changed. Index: config/avr/avr.c === --- config/avr/avr.c (revision 196495) +++ config/avr/avr.c (working copy) @@ -10765,6 +10765,42 @@ avr_addr_space_subset_p (addr_space_t su } +/* Implement `TARGET_CONVERT_TO_TYPE'. 
*/ + +static tree +avr_convert_to_type (tree type, tree expr) +{ + if (avr_warn_addr_space_convert + && expr != error_mark_node + && POINTER_TYPE_P (type) + && POINTER_TYPE_P (TREE_TYPE (expr))) +{ + addr_space_t as_old = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (expr))); + addr_space_t as_new = TYPE_ADDR_SPACE (TREE_TYPE (type)); + + if (avr_log.progmem) +avr_edump ("%?: type = %t\nexpr = %t\n\n", type, expr); + + if (as_new != ADDR_SPACE_MEMX + && as_new != as_old) +{ + location_t loc = EXPR_LOCATION (expr); + const char *name_old = avr_addrspace[as_old].name; + const char *name_new = avr_addrspace[as_new].name; + + warning (OPT_Waddr_space_convert, + "conversion from address space %qs to address space %qs", + ADDR_SPACE_GENERIC_P (as_old) ? "generic" : name_old, + ADDR_SPACE_GENERIC_P (as_new) ? "generic" : name_new); + + return fold_build1_loc (loc, ADDR_SPACE_CONVERT_EXPR, type, expr); +} +} + + return NULL_TREE; +} + + /* Worker function for movmemhi expander. XOP[0] Destination as MEM:BLK XOP[1] Source " " @@ -12149,6 +12185,9 @@ avr_fold_builtin (tree fndecl, int n_arg #undef TARGET_FIXED_POINT_SUPPORTED_P #define TARGET_FIXED_POINT_SUPPORTED_P hook_bool_void_true +#undef TARGET_CONVERT_TO_TYPE +#define TARGET_CONVERT_TO_TYPE avr_convert_to_type + #undef TARGET_ADDR_SPACE_SUBSET_P #define TARGET_ADDR_SPACE_SUBSET_P avr_addr_space_subset_p Index: doc/invoke.texi === --- doc/invoke.texi (revision 196495) +++ doc/invoke.texi (working copy) @@ -514,7 +514,7 @@ Objective-C and Objective-C++ Dialects}. 
@emph{AVR Options} @gccoptlist{-mmcu=@var{mcu} -maccumulate-args -mbranch-cost=@var{cost} @gol -mcall-prologues -mint8 -mno-interrupts -mrelax @gol --mstrict-X -mtiny-stack} +-mstrict-X -mtiny-stack -Waddr-space-convert} @emph{Blackfin Options} @gccoptlist{-mcpu=@var{cpu}@r{[}-@var{sirevision}@r{]} @gol @@ -11653,6 +11653,11 @@ when @code{EICALL} or @code{EIJMP} instr Indirect jumps and calls on these devices are handled as follows by the compiler and are subject to some limitations: +@item -Waddr-space-convert +@opindex Waddr-space-convert +Warn about conversions between address spaces in the case where the +resulting address space is not contained in the incoming address space. + @itemize @bullet @item
Re: [patch PR middle-end/39326 - compile time and memory explosion in combine
On 03/11/2013 03:23 AM, Richard Biener wrote: On Sat, Mar 9, 2013 at 11:23 PM, Steven Bosscher wrote: Hello, The attached patch fixes one of the (at least) three scalability problems reported in PR middle-end/39326. This problem is that combine takes forever and increases the memory footprint from ~650MB to >7GB. The cause is DSE creating a lot of new registers in replace_read, and those registers are not copy-prop'd out between dse1 and combine. The result is many overlapping live ranges and single-set-single-use registers that combine is made to work on. The fix is to just not create so many new registers in DSE in the first place. It is wasteful and unnecessary if an existing register can be re-used. With this patch, for the test case of the PR the combine time in combine goes down from ~350s to ~4.5s, and the memory footprint explosion is avoided. For my set of cc1-i files this also helps reduce compile time a modest amount, especially for the larger files of course. Bootstrapped&tested on {x86_64,powerpc64}-unknown-linux-gnu. OK for trunk? Not sure on the patch details - but in general I wonder why _DSE_ performs full redundancy elimination at all ... which replace_read seems to be. Anyway, for this one I'd say we wait for stage1 and consider backporting for 4.8.1 given we want to do a 4.8 RC1 soon. Agreed on the defer until 4.8.1. jeff
Fix sharing of clobbers
Hi,
in the testcase we die in post-reload cprop because updating one insn's
clobber actually affects another insn's clobber. I actually introduced
clobber sharing back in 2005 to save memory for (clobber cc0). This broke
with the introduction of post-reload code copying that is now done by
shrink wrapping. Fixed by unsharing those clobbers that originate from
pseudos.

Bootstrapped/regtested x86_64-linux, OK?

	PR middle-end/56571
	* valtrack.c (cleanup_auto_inc_dec): Unshare clobbers originating
	from pseudos.
	* emit-rtl.c (verify_rtx_sharing): Likewise.
	(copy_insn_1): Likewise.
	* rtl.c (copy_rtx): Likewise.

	* gcc.c-torture/compile/pr56571.c: New testcase.

Index: valtrack.c
===
*** valtrack.c	(revision 196596)
--- valtrack.c	(working copy)
*** cleanup_auto_inc_dec (rtx src, enum mach
*** 71,77 ****
        /* SCRATCH must be shared because they represent distinct values.  */
        return x;
      case CLOBBER:
!       if (REG_P (XEXP (x, 0)) && REGNO (XEXP (x, 0)) < FIRST_PSEUDO_REGISTER)
	return x;
        break;
--- 71,78 ----
        /* SCRATCH must be shared because they represent distinct values.  */
        return x;
      case CLOBBER:
!       if (REG_P (XEXP (x, 0)) && REGNO (XEXP (x, 0)) < FIRST_PSEUDO_REGISTER
!	  && ORIGINAL_REGNO (XEXP (x, 0)) == REGNO (XEXP (x, 0)))
	return x;
        break;
Index: testsuite/gcc.c-torture/compile/pr56571.c
===
*** testsuite/gcc.c-torture/compile/pr56571.c	(revision 0)
--- testsuite/gcc.c-torture/compile/pr56571.c	(revision 0)
***
*** 0 ****
--- 1,8 ----
+ /* { dg-options "-funroll-loops -ftracer" } */
+ int a, b;
+
+ int f(void)
+ {
+   (a % b) && f();
+   a = (0 || a | (a ? : 1));
+ }
Index: emit-rtl.c
===
*** emit-rtl.c	(revision 196596)
--- emit-rtl.c	(working copy)
*** verify_rtx_sharing (rtx orig, rtx insn)
*** 2583,2589 ****
        return;
      /* SCRATCH must be shared because they represent distinct values.  */
      case CLOBBER:
!       if (REG_P (XEXP (x, 0)) && REGNO (XEXP (x, 0)) < FIRST_PSEUDO_REGISTER)
	return;
        break;
--- 2583,2593 ----
        return;
      /* SCRATCH must be shared because they represent distinct values.  */
      case CLOBBER:
!       /* Share clobbers of hard registers (like cc0), but do not share pseudo reg
!	  clobbers or clobbers of hard registers that originated as pseudos.
!	  This is needed to allow safe register renaming.  */
!       if (REG_P (XEXP (x, 0)) && REGNO (XEXP (x, 0)) < FIRST_PSEUDO_REGISTER
!	  && ORIGINAL_REGNO (XEXP (x, 0)) == REGNO (XEXP (x, 0)))
	return;
        break;
*** repeat:
*** 2797,2803 ****
        /* SCRATCH must be shared because they represent distinct values.  */
        return;
      case CLOBBER:
!       if (REG_P (XEXP (x, 0)) && REGNO (XEXP (x, 0)) < FIRST_PSEUDO_REGISTER)
	return;
        break;
--- 2801,2811 ----
        /* SCRATCH must be shared because they represent distinct values.  */
        return;
      case CLOBBER:
!       /* Share clobbers of hard registers (like cc0), but do not share pseudo reg
!	  clobbers or clobbers of hard registers that originated as pseudos.
!	  This is needed to allow safe register renaming.  */
!       if (REG_P (XEXP (x, 0)) && REGNO (XEXP (x, 0)) < FIRST_PSEUDO_REGISTER
!	  && ORIGINAL_REGNO (XEXP (x, 0)) == REGNO (XEXP (x, 0)))
	return;
        break;
*** copy_insn_1 (rtx orig)
*** 5303,5309 ****
      case SIMPLE_RETURN:
        return orig;
      case CLOBBER:
!       if (REG_P (XEXP (orig, 0)) && REGNO (XEXP (orig, 0)) < FIRST_PSEUDO_REGISTER)
	return orig;
        break;
--- 5311,5318 ----
      case SIMPLE_RETURN:
        return orig;
      case CLOBBER:
!       if (REG_P (XEXP (orig, 0)) && REGNO (XEXP (orig, 0)) < FIRST_PSEUDO_REGISTER
!	  && ORIGINAL_REGNO (XEXP (orig, 0)) == REGNO (XEXP (orig, 0)))
	return orig;
        break;
Index: rtl.c
===
*** rtl.c	(revision 196596)
--- rtl.c	(working copy)
*** copy_rtx (rtx orig)
*** 256,262 ****
      /* SCRATCH must be shared because they represent distinct values.  */
      return orig;
      case CLOBBER:
!       if (REG_P (XEXP (orig, 0)) && REGNO (XEXP (orig, 0)) < FIRST_PSEUDO_REGISTER)
	return orig;
        break;
--- 256,266 ----
      /* SCRATCH must be shared because they represent distinct values.  */
      return orig;
      case CLOBBER:
!       /* Share clobbers of hard registers (like cc0), but do not share pseudo reg
!	  clobbers or clobbers of hard registers that originated as pseudos.
!	  This is needed to allow safe register renaming.  */
!       if (REG_P (XEXP (orig, 0)) && REGNO (XEXP (orig, 0)) < FIRST_PSEUDO_REGISTER
!	  && ORIGINAL_REGNO (XEXP (orig, 0)) == REGNO (XEXP (orig, 0)))
	return orig;
        break;
Re: [patch] Make dse.c dumps less verbose unless dumping details
On 03/09/2013 12:59 PM, Steven Bosscher wrote:
> Hello,
>
> Debugging a DSE bug, I found the dumps to be almost unusable because
> they are so verbose. For my test case, the -fdump-rtl-dse1 dump is a
> ~17GB file without the attached patch (the test case has ~5 insns).
> With the patch, all this extra information is only dumped with
> -fdump-rtl-dse1-details.
>
> Bootstrapped&tested on x86_64-unknown-linux-gnu and
> powerpc64-unknown-linux-gnu. Will commit as obvious.
>
> Ciao!
> Steven
>
> 	* dse.c (delete_dead_store_insn): Respect TDF_DETAILS.
> 	(canon_address, record_store, replace_read, check_mem_read_rtx,
> 	scan_insn, dse_step1, dse_step2_init, dse_step2_spill, dse_step4,
> 	dse_step5_nospill, dse_step5_spill, dse_step6,
> 	rest_of_handle_dse): Likewise.

Given we're trying to wrap up 4.8 and get an RC out, I would have preferred this not go in. It's not strictly necessary to get 4.8 out the door. Regardless, it's done and I'm certainly not going to suggest pulling the patch out.

In the future, please avoid any non-critical checkins this late in the game.

jeff
Re: extend fwprop optimization
On 03/10/2013 11:52 PM, Wei Mi wrote:
> Hi,
>
> This is the fwprop extension patch which is put in order. Regression
> test and bootstrap pass. Please help to review its rationality. The
> following is a brief description of what I have done in the patch.
>
> In order to make fwprop more effective in rtl optimization, we extend
> it to handle general expressions instead of the three cases listed in
> the head comment in fwprop.c. The major changes include: a) we need to
> check propagation correctness for src exprs of a def which contain mem
> references (the previous fwprop for the three cases above doesn't have
> the problem); b) we need a better cost model, because the benefit is
> usually not so apparent as in the three cases above.
>
> For a general fwprop problem, there are two possible sources where
> benefit comes from. The first is that the new use insn after propagation
> and simplification may have lower cost than itself before propagation,
> or propagation may create a new insn that could be split or peephole
> optimized later and get a lower cost. The second is that if all the
> uses are replaced with the src of the def insn, the def insn could be
> deleted.
>
> So instead of checking each def-use pair independently, we use the DU
> chain to track all the uses for a def. For each def-use pair, we attempt
> the propagation and record the change candidate in the changes[] array,
> but we wait to confirm the changes until all the pairs with the same def
> are iterated. The changes confirmation is done in the function
> confirm_change_group_by_cost. We only do this for fwprop. For
> fwprop_addr, the benefit of each change is ensured by propagation_rtx_1
> using should_replace_address, so we just confirm all the changes without
> checking benefit again.

Can you please attach this to the 4.9 pending patches tracker bug. We're really focused on trying to get 4.8 out the door and this doesn't seem like suitable material for GCC 4.8.

I haven't looked at the details of the patch at all yet and doubt I would prior to GCC 4.8 going out the door.

Thanks,
jeff
Re: Fix sharing of clobbers
On 03/11/2013 12:06 PM, Jan Hubicka wrote:
> Hi,
> in the testcase we die in post-reload cprop because updating one insn's
> clobber actually affects another insn's clobber. I actually introduced
> clobber sharing back in 2005 to save memory for (clobber cc0). This
> broke with the introduction of post-reload code copying that is now done
> by shrink wrapping. Fixed by unsharing those clobbers that originate
> from pseudos.
>
> Bootstrapped/regtested x86_64-linux, OK?
>
> 	PR middle-end/56571
> 	* valtrack.c (cleanup_auto_inc_dec): Unshare clobbers originating
> 	from pseudos.
> 	* emit-rtl.c (verify_rtx_sharing): Likewise.
> 	(copy_insn_1): Likewise.
> 	* rtl.c (copy_rtx): Likewise.
>
> 	* gcc.c-torture/compile/pr56571.c: New testcase.
>
> Index: valtrack.c
> ===
> *** valtrack.c	(revision 196596)
> --- valtrack.c	(working copy)
> *** cleanup_auto_inc_dec (rtx src, enum mach
> *** 71,77 ****
>       /* SCRATCH must be shared because they represent distinct values.  */
>         return x;
>       case CLOBBER:
> !       if (REG_P (XEXP (x, 0)) && REGNO (XEXP (x, 0)) < FIRST_PSEUDO_REGISTER)
> 	return x;
>       break;
> --- 71,78 ----
>       /* SCRATCH must be shared because they represent distinct values.  */
>       return x;
>       case CLOBBER:
> !       if (REG_P (XEXP (x, 0)) && REGNO (XEXP (x, 0)) < FIRST_PSEUDO_REGISTER
> !	  && ORIGINAL_REGNO (XEXP (x, 0)) == REGNO (XEXP (x, 0)))
> 	return x;
>       break;

Comment? Otherwise one would have to know to look in verify_rtx_sharing to know why clobbers originating from pseudos aren't shared.

> *** copy_insn_1 (rtx orig)
> *** 5303,5309 ****
>       case SIMPLE_RETURN:
>         return orig;
>       case CLOBBER:
> !       if (REG_P (XEXP (orig, 0)) && REGNO (XEXP (orig, 0)) < FIRST_PSEUDO_REGISTER)
> 	return orig;
>       break;
> --- 5311,5318 ----
>       case SIMPLE_RETURN:
>         return orig;
>       case CLOBBER:
> !       if (REG_P (XEXP (orig, 0)) && REGNO (XEXP (orig, 0)) < FIRST_PSEUDO_REGISTER
> !	  && ORIGINAL_REGNO (XEXP (orig, 0)) == REGNO (XEXP (orig, 0)))
> 	return orig;
>       break;

Similarly.

OK with those changes.
jeff
Re: Fix sharing of clobbers
On Mon, Mar 11, 2013 at 07:06:13PM +0100, Jan Hubicka wrote: > *** emit-rtl.c(revision 196596) > --- emit-rtl.c(working copy) > *** verify_rtx_sharing (rtx orig, rtx insn) > *** 2583,2589 > return; > /* SCRATCH must be shared because they represent distinct values. */ > case CLOBBER: > ! if (REG_P (XEXP (x, 0)) && REGNO (XEXP (x, 0)) < > FIRST_PSEUDO_REGISTER) > return; > break; > > --- 2583,2593 > return; > /* SCRATCH must be shared because they represent distinct values. */ Beyond what Jeff asked you to change, please also move the above comment about SCRATCH back above the return; line (got accidentally moved during your 2004 change). Thanks. > case CLOBBER: > ! /* Share clobbers of hard registers (like cc0), but do not share > pseudo reg > ! clobbers or clobbers of hard registers that originated as pseudos. > ! This is needed to allow safe register renaming. */ > ! if (REG_P (XEXP (x, 0)) && REGNO (XEXP (x, 0)) < FIRST_PSEUDO_REGISTER > ! && ORIGINAL_REGNO (XEXP (x, 0)) == REGNO (XEXP (x, 0))) > return; > break; Jakub
Re: extend fwprop optimization
On Mon, Mar 11, 2013 at 7:10 PM, Jeff Law wrote: > On 03/10/2013 11:52 PM, Wei Mi wrote: >> >> Hi, >> >> This is the fwprop extension patch which is put in order. Regression >> test and bootstrap pass. Please help to review its rationality. The >> following is a brief description what I have done in the patch. >> >> In order to make fwprop more effective in rtl optimization, we extend >> it to handle general expressions instead of the three cases listed in >> the head comment in fwprop.c. The major changes include a) We need to >> check propagation correctness for src exprs of def which contain mem >> references. Previous fwprop for the three cases above doesn't have the >> problem. b) We need a better cost model because the benefit is usually >> not so apparent as the three cases above. >> >> For a general fwprop problem, there are two possible sources where >> benefit comes from. The frist is the new use insn after propagation >> and simplification may have lower cost than itself before propagation, >> or propagation may create a new insn, that could be splitted or >> peephole optimized later and get a lower cost. The second is that if >> all the uses are replaced with the src of the def insn, the def insn >> could be deleted. >> >> So instead of check each def-use pair independently, we use DU chain >> to track all the uses for a def. For each def-use pair, we attempt the >> propagation, record the change candidate in changes[] array, but we >> wait to confirm the changes until all the pairs with the same def are >> iterated. The changes confirmation is done in the func >> confirm_change_group_by_cost. We only do this for fwprop. For >> fwprop_addr, the benefit of each change is ensured by >> propagation_rtx_1 using should_replace_address, so we just confirm all >> the changes without checking benefit again. > > Can you please attach this to the 4.9 pending patches tracker bug. 
> We're really focused on trying to get 4.8 out the door and this doesn't
> seem like suitable material for GCC 4.8.
>
> I haven't looked at the details of the patch at all yet and doubt I would
> prior to GCC 4.8 going out the door.
>
> Thanks,
> jeff

Jeff,

The world has more people than you, and with different interests. This patch was posted here for comments on the idea, and while I'm sure your feedback would be very valuable, it is no more required for discussing this patch than it is for releasing GCC 4.8.

Ciao!
Steven
Re: [patch] PR middle-end/39326 - limit LIM
On Mon, Mar 11, 2013 at 10:52 AM, Richard Biener wrote: > Given > > + well. Return true if all is well, false if something happened > + that is fatal to the rest of the LIM pass. */ > > -static void > +static bool > gather_mem_refs_stmt (struct loop *loop, gimple stmt) > > and > >FOR_EACH_BB (bb) > { > ... > + for (bsi = gsi_start_bb (bb); > + !gsi_end_p (bsi) && all_ok; > + gsi_next (&bsi)) > + all_ok = gather_mem_refs_stmt (loop, gsi_stmt (bsi)); > + > + if (! all_ok) > +bitmap_set_bit (loops_with_too_many_memrefs, loop->num); > +} > + > + /* Propagate the information about loops with too many memory > + references up the loop hierarchy. */ > + FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST) > +{ > + struct loop *outer = loop_outer (loop); > + if (outer == current_loops->tree_root > + || ! bitmap_bit_p (loops_with_too_many_memrefs, loop->num)) > + continue; > + bitmap_set_bit (loops_with_too_many_memrefs, outer->num); > } > > I don't see how this propagation works correctly as you start to mark > BBs as not-ok starting from a "random" basic-block in the loop tree. Not at all. The function looks like this: static void gather_mem_refs_in_loops (bitmap loops_with_too_many_memrefs) { FOR_EACH_BB (bb) { for each gimple statement all_ok = gather_mem_refs_stmt (loop, gsi_stmt (bsi)); if (! all_ok) bitmap_set_bit (loops_with_too_many_memrefs, loop->num); } /* Propagate the information about loops with too many memory references up the loop hierarchy. */ FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST) { struct loop *outer = loop_outer (loop); if (outer == current_loops->tree_root || ! bitmap_bit_p (loops_with_too_many_memrefs, loop->num)) continue; bitmap_set_bit (loops_with_too_many_memrefs, outer->num); } /* Propagate the information about accessed memory references up the loop hierarchy. */ FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST) /* Propagate stuff */ } So all basic blocks are visited first. Note it is also like this without my patch. 
> You of course also end up disqualifying very small loops completely > if they happen to be analyzed after a very big one you disqualify. > Of course that's partly because memory_accesses contains all > memory accesses in the function - so I think rather than limiting > on length of memory_accesses you want to limit on the length of > memory_accesses.refs_in_loop (well, on memory_accesses.all_refs_in_loop). Right, I guess the limit should be per-loop, and it's "global" now. > And you want the initial walk over all BBs to instead walk on BBs > FOR_EACH_LOOP and LI_FROM_INNERMOST (you can then do the > propagation to fill all_refs_in_loop there, too). That is already what happens. > At this point this should be stage1 material, eventually backported for 4.8.1. Obviously. > And yes, aside from the above the rest of the patch looks good to me > (move loops_with_too_many_memrefs into the memory_accesses struct?) That's a good idea. I'll come back with an updated patch for trunk GCC 4.9. Ciao! Steven
Re: [patch PR middle-end/39326 - compile time and memory explosion in combine
On Mon, Mar 11, 2013 at 10:23 AM, Richard Biener wrote: > On Sat, Mar 9, 2013 at 11:23 PM, Steven Bosscher wrote: >> The attached patch fixes one of the (at least) three scalability >> problems reported in PR middle-end/39326. This problem is that combine >> takes forever and increases the memory footprint from ~650MB to >7GB. >> The cause is DSE creating a lot of new registers in replace_read, and >> those registers are not copy-prop'd out between dse1 and combine. The >> result is many overlapping live ranges and single-set-single-use >> registers that combine is made to work on. >> >> The fix is to just not create so many new registers in DSE in the >> first place. It is wasteful and unnecessary if an existing register >> can be re-used. >> >> With this patch, for the test case of the PR the combine time in >> combine goes down from ~350s to ~4.5s, and the memory footprint >> explosion is avoided. For my set of cc1-i files this also helps reduce >> compile time a modest amount, especially for the larger files of >> course. >> >> Bootstrapped&tested on {x86_64,powerpc64}-unknown-linux-gnu. >> OK for trunk? > > Not sure on the patch details - but in general I wonder why _DSE_ > performs full redundancy elimination at all ... which replace_read > seems to be. Yes. No idea why, it's always been like that. > Anyway, for this one I'd say we wait for stage1 and consider backporting > for 4.8.1 given we want to do a 4.8 RC1 soon. OK. I've found a buglet in the patch, too, so I'll post an updated patch for trunk GCC 4.9. Ciao! Steven
[Patch, Fortran, committed] Minor libgfortran/ fixes
Found using the Coverity Scanner. The first is a bug, the other just silences a dead-code warning.

Committed as Rev. 196603 after build + regtest.

Tobias

2013-03-11  Tobias Burnus

	* io/transfer.c (read_block_direct): Correct condition.
	* intrinsics/execute_command_line.c (execute_command_line):
	Remove dead code for the HAVE_FORK case.

diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c
index 515c34f..d97a325 100644
--- a/libgfortran/io/transfer.c
+++ b/libgfortran/io/transfer.c
@@ -641,7 +641,7 @@ read_block_direct (st_parameter_dt *dtp, void *buf, size_t nbytes)
 	  have_read_subrecord = sread (dtp->u.p.current_unit->s,
 				       buf + have_read_record, to_read_subrecord);
-	  if (unlikely (have_read_subrecord) < 0)
+	  if (unlikely (have_read_subrecord < 0))
 	    {
 	      generate_error (&dtp->common, LIBERROR_OS, NULL);
 	      return;
diff --git a/libgfortran/intrinsics/execute_command_line.c b/libgfortran/intrinsics/execute_command_line.c
index d0f812d..fa6ea9f 100644
--- a/libgfortran/intrinsics/execute_command_line.c
+++ b/libgfortran/intrinsics/execute_command_line.c
@@ -72,52 +72,54 @@ execute_command_line (const char *command, bool wait, int *exitstat,
 #if defined(HAVE_FORK)
   if (!wait)
     {
       /* Asynchronous execution.  */
       pid_t pid;

       set_cmdstat (cmdstat, EXEC_NOERROR);

       if ((pid = fork()) < 0)
	 set_cmdstat (cmdstat, EXEC_CHILDFAILED);
       else if (pid == 0)
	 {
	   /* Child process.  */
	   int res = system (cmd);
	   _exit (WIFEXITED(res) ? WEXITSTATUS(res) : res);
	 }
     }
   else
 #endif
     {
       /* Synchronous execution.  */
       int res = system (cmd);

       if (res == -1)
	 set_cmdstat (cmdstat, EXEC_SYSTEMFAILED);
+#ifndef HAVE_FORK
       else if (!wait)
	 set_cmdstat (cmdstat, EXEC_SYNCHRONOUS);
+#endif
       else
	 set_cmdstat (cmdstat, EXEC_NOERROR);

       if (res != -1)
	 {
 #if defined(WEXITSTATUS) && defined(WIFEXITED)
	   *exitstat = WIFEXITED(res) ? WEXITSTATUS(res) : res;
 #else
	   *exitstat = res;
 #endif
	 }
     }

   /* Now copy back to the Fortran string if needed.  */
   if (cmdstat && *cmdstat > EXEC_NOERROR)
     {
       if (cmdmsg)
	 fstrcpy (cmdmsg, cmdmsg_len, cmdmsg_values[*cmdstat],
		  strlen (cmdmsg_values[*cmdstat]));
       else
	 runtime_error ("Failure in EXECUTE_COMMAND_LINE: %s",
			cmdmsg_values[*cmdstat]);
     }
 }
[avr,committed] Fix PR56591
http://gcc.gnu.org/r196604

This adds a missing space in output_operand_lossage. Applied as obvious.

Johann

	PR target/56591
	* config/avr/avr.c (avr_print_operand): Add space after '%c' in
	output_operand_lossage message.

Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 196602)
+++ config/avr/avr.c	(working copy)
@@ -2231,7 +2231,7 @@ avr_print_operand (FILE *file, rtx x, in
     {
       HOST_WIDE_INT ival = INTVAL (avr_to_int_mode (x));

       if (code != 0)
-        output_operand_lossage ("Unsupported code '%c'for fixed-point:",
+        output_operand_lossage ("Unsupported code '%c' for fixed-point:",
                                 code);

       fprintf (file, HOST_WIDE_INT_PRINT_DEC, ival);
     }
Re: [PATCH] PowerPC merge TD/TF moves
On Thu, Mar 07, 2013 at 08:45:10PM -0500, David Edelsohn wrote: > On Wed, Jan 30, 2013 at 5:50 PM, Michael Meissner > wrote: > > This patch like the previous 2 pages combines the decimal and binary > > floating > > point moves, this time for 128-bit floating point. > > > > In doing this patch, I discovered that I left out the code in the previous > > patch to enable the wg constraint to enable -mcpu=power6x to utilize the > > direct > > move instructions. So, I added the code in this patch, and also created a > > test > > to make sure that direct moves are generated in the future. > > > > I also added the reload helper for DDmode to rs6000_vector_reload that was > > missed in the last patch. This was harmless, since that is only used with > > an > > undocumented debug switch. Hopefully sometime in the future, I will scalar > > floating point to be able to be loaded in the upper 32 VSX registers that > > are > > overlaid over the Altivec registers. > > > > Like the previous 2 patches, I've bootstrapped this, and ran make check > > with no > > regressions. Is it ok to apply when GCC 4.9 opens up? > > > > I have one more patch in the insn combination to post, combining movdi on > > systems with normal floating point and with the power6 direct move > > instructions. > > Mike, > > Which of these sets of patches adjusts and updates > rs6000_register_move_cost for -mfpgpr and for VSRs and FPRs sharing > the same register file? None of these patches adjust register_move_cost. -- Michael Meissner, IBM Now: M/S 2757, 5 Technology Place Drive, Westford, MA 01886-3141, USA March 20: M/S 2506R, 550 King Street, Littleton, MA 01460, USA meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899
Re: extend fwprop optimization
On Mon, Mar 11, 2013 at 6:52 AM, Wei Mi wrote: > This is the fwprop extension patch which is put in order. Regression > test and bootstrap pass. Please help to review its rationality. The > following is a brief description what I have done in the patch. > > In order to make fwprop more effective in rtl optimization, we extend > it to handle general expressions instead of the three cases listed in > the head comment in fwprop.c. The major changes include a) We need to > check propagation correctness for src exprs of def which contain mem > references. Previous fwprop for the three cases above doesn't have the > problem. b) We need a better cost model because the benefit is usually > not so apparent as the three cases above. > > For a general fwprop problem, there are two possible sources where > benefit comes from. The frist is the new use insn after propagation > and simplification may have lower cost than itself before propagation, > or propagation may create a new insn, that could be splitted or > peephole optimized later and get a lower cost. The second is that if > all the uses are replaced with the src of the def insn, the def insn > could be deleted. > > So instead of check each def-use pair independently, we use DU chain > to track all the uses for a def. For each def-use pair, we attempt the > propagation, record the change candidate in changes[] array, but we > wait to confirm the changes until all the pairs with the same def are > iterated. The changes confirmation is done in the func > confirm_change_group_by_cost. We only do this for fwprop. For > fwprop_addr, the benefit of each change is ensured by > propagation_rtx_1 using should_replace_address, so we just confirm all > the changes without checking benefit again. 
Hello Wei Mi,

So IIUC, in essence you are doing:

main:
  FOR_EACH_BB:
    FOR_BB_INSNS, non-debug insns only:
      for each df_ref DEF operand on insn:
        iterate_def_uses

iterate_def_uses:
  for each UD chain from DEF to USE(i):
    forward_propagate_into
  confirm changes by total benefit

I still like the idea, but there are also still a few "design issues" to resolve. Some of the same comments as before apply:

Do you really, really, really have to go so low-level as to insn splitting, peephole optimizations, and even register allocation, to get the cost right? That will almost certainly not be acceptable, and I for one would oppose such a change. It's IMHO a violation of proper engineering when your medium-to-high level code transformations have to do that. If you have strong reasons for your approach, it'd be helpful if you can explain them so that we can together look for a less intrusive solution (e.g. splitting earlier, adjusting the cost model, etc.).

So things like:

> + /* we may call peephole2_insns in fwprop phase to estimate how
> + peephole will affect the cost of the insn transformed by fwprop.
> + fwprop is done before ira phase. In that case, we simply return
> + a new pseudo register.  */
> + if (!strncmp (current_pass->name, "fwprop", 6))
> +   return gen_reg_rtx (mode);

and

> Index: config/i386/i386.c
> ===
> --- config/i386/i386.c	(revision 196270)
> +++ config/i386/i386.c	(working copy)
> @@ -15901,8 +15901,14 @@ ix86_expand_clear (rtx dest)
>  {
>    rtx tmp;
>
> -  /* We play register width games, which are only valid after reload.  */
> -  gcc_assert (reload_completed);
> +  /* We play register width games, which are only valid after reload.
> +     An exception: fwprop call peephole to estimate the change benefit,
> +     and peephole will call this func. That is before reload complete.
> +     It will not bring any problem because the peephole2_insns call is
> +     only used for cost estimation in fwprop, and its change will be
> +     abandoned immediately after the cost estimation.  */
> +  if (strncmp (current_pass->name, "fwprop", 6))
> +    gcc_assert (reload_completed);

are IMHO not OK.

Note that your patch is a bit difficult to read at some points because you have included a bunch of non-changes (whitespace fixes -- necessary cleanups but not relevant for your patch), see e.g. the changed lines that contain "lra_in_progress". Also the changes like:

> static bool
> -propagate_rtx_1 (rtx *px, rtx old_rtx, rtx new_rtx, int flags)
> +propagate_rtx_1 (rtx *px, rtx old_rtx, rtx new_rtx, bool speed)

which are quite distracting, making it harder to see what has *really* changed. You should probably just add a helper function apply_change_group_num() and avoid all the apply_change_group use fixups.

In fwprop.c:

> + /* DF_LR_RUN_DCE is used in peephole2_insns, which is called for cost
> +    estimation in estimate_split_and_peephole.  */
> + df_set_flags (DF_LR_RUN_DCE);
>   df_md_add_problem ();
>   df_note_add_problem ();
> - df_analyze ();
> + df_chain_add_problem (DF_UD_CHAIN | DF_DU_CHAIN);
>   df_maybe_reorganize_use_refs (DF_RE
[trunk][google/gcc47]Add dependence of configure-target-libmudflap on configure-target-libstdc++-v3 (issue7740043)
binWsrE0LGKO9.bin Description: Binary data
Re: [trunk][google/gcc47]Add dependence of configure-target-libmudflap on configure-target-libstdc++-v3 (issue7740043)
Don't know why the email body became an attachment. Sent it again. The review link is https://codereview.appspot.com/7740043

Hi Diego,

The nightly builds of the gcc-4.7 based ppc64 and ppc32 crosstools have failed since the build server upgraded to gPrecise one week ago. The log shows a configuration failure in libmudflap:

checking for suffix of object files... /lib/cpp
configure: error: in `/g/nightly/build/work/gcc-4.7.x-grtev3-powerpc32-8540/rpmbuild/BUILD/.../powerpc-grtev3-linux-gnu/libmudflap':
configure: error: C++ preprocessor "/lib/cpp" fails sanity check
See `config.log' for more details.

There is no /lib/cpp on the gPrecise server, though it should not be used here. What happened was that the libmudflap configure looks for a preprocessor by trying $CXX -E and then backing off to /lib/cpp. $CXX -E is failing with "unrecognized command line option ‘-funconfigured-libstdc++’", and the /lib/cpp backstop then fails also. The -funconfigured-libstdc++ is because configure can't find libstdc++/scripts/testsuite_flags. This is a so-far undiagnosed race in gcc make, masked where /lib/cpp is available. And that's absent because in this build, for whatever reason, libstdc++ loses a race with libmudflap.

The theory is confirmed by:
1) if we force --job=1, the build succeeds;
2) if we apply the following patch to build-gcc/Makefile, the build succeeds. After removing this dependency, the build fails with the same error again.

Is this patch ok for google/gcc-4_7? If the same issue exists on upstream trunk, how does the patch sound for trunk?

Thanks,
Jing

2013-03-11  Jing Yu

	* Makefile.in: (maybe-configure-target-libmudflap): Add dependence
	on configure-target-libstdc++-v3.
Index: Makefile.in
===
--- Makefile.in	(revision 196604)
+++ Makefile.in	(working copy)
@@ -31879,6 +31879,9 @@ maybe-configure-target-libmudflap:
 @if gcc-bootstrap
 configure-target-libmudflap: stage_current
 @endif gcc-bootstrap
+@if target-libstdc++-v3
+configure-target-libmudflap: configure-target-libstdc++-v3
+@endif target-libstdc++-v3
 @if target-libmudflap
 maybe-configure-target-libmudflap: configure-target-libmudflap
 configure-target-libmudflap:
Re: [committed] xfail gcc.dg/tree-ssa/pr55579.c on 32-bit hppa*-*-hpux*
On 11-Mar-13, at 5:25 AM, Rainer Orth wrote:
> You most likely need whitespace before the xfail. Also try to add a
> comment (PR reference?) explaining it.

Added comments explaining the xfails for gcc.dg/tree-ssa/pr55579.c and gcc.dg/tree-ssa/vector-4.c. Fixed whitespace.

Committed after testing on hppa2.0w-hp-hpux11.11.

Dave
--
John David Anglin	dave.ang...@bell.net

2013-03-11  John David Anglin

	* gcc.dg/tree-ssa/vector-4.c: Add comment regarding xfail.
	* gcc.dg/tree-ssa/pr55579.c: Likewise.

Index: gcc.dg/tree-ssa/vector-4.c
===
--- gcc.dg/tree-ssa/vector-4.c	(revision 196592)
+++ gcc.dg/tree-ssa/vector-4.c	(working copy)
@@ -9,6 +9,7 @@
 }

 /* The compound literal should be placed directly in the vec_perm.  */
+/* Test is xfailed on 32-bit hppa*-*-* because target-callee-copies.  */
 /* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR ;" 1 "gimple" { xfail { hppa*-*-* && { ! lp64 } } } } } */
 /* { dg-final { cleanup-tree-dump "gimple" } } */
Index: gcc.dg/tree-ssa/pr55579.c
===
--- gcc.dg/tree-ssa/pr55579.c	(revision 196592)
+++ gcc.dg/tree-ssa/pr55579.c	(working copy)
@@ -11,5 +11,6 @@
   return x;
 }

-/* { dg-final { scan-tree-dump "Created a debug-only replacement for s" "esra" {xfail { hppa*-*-hpux* && { ! lp64 } } } } } */
+/* Test fails on 32-bit hppa*-*-hpux*.  See PR debug/56307.  */
+/* { dg-final { scan-tree-dump "Created a debug-only replacement for s" "esra" { xfail { hppa*-*-hpux* && { ! lp64 } } } } } */
 /* { dg-final { cleanup-tree-dump "esra" } } */
[PATCH][4.8][4.7][4.6] Make -shared-libgcc the default on Cygwin.
Hello list,

The attached patch makes -shared-libgcc the default for Cygwin. This is something that I should have done some time ago, as shared libgcc on Cygwin is more than mature. What's more, it is vital for reliable compilation of applications that throw exceptions or share TLS variables from DLLs into the main executable; at present these compile incorrectly without an explicit -shared-libgcc. For instance, the just-released MPFR-3.1.2 doesn't work without it.

Given that it's a very simple tweak to the compiler specs on a single platform only, I would like to use my target maintainer's discretion to apply it even at this late stage, but I figure it's so close to RC1 that I should ask the RM's permission anyway. I'd also like to backport it to all the currently-open branches.

gcc/ChangeLog

2013-03-12  Dave Korn

	* config/i386/cygwin.h (SHARED_LIBGCC_SPEC): Make shared libgcc
	the default setting.

Is this OK by everyone?

cheers,
DaveK

Index: gcc/config/i386/cygwin.h
===
--- gcc/config/i386/cygwin.h	(revision 196604)
+++ gcc/config/i386/cygwin.h	(working copy)
@@ -48,11 +48,7 @@ along with GCC; see the file COPYING3.  If not see
   %{static|static-libgcc:-lgcc -lgcc_eh} \
   %{!static: \
     %{!static-libgcc: \
-      %{!shared: \
-	%{!shared-libgcc:-lgcc -lgcc_eh} \
-	%{shared-libgcc:-lgcc_s -lgcc} \
-      } \
-      %{shared:-lgcc_s -lgcc} \
+      -lgcc_s -lgcc \
     } \
   } "
 #else