Re: Intrinsics for N2965: Type traits and base classes
> This patch consists intrinsics to properly create the bases and > direct_bases of a class in the correct order (including multiple nested > ambiguous virtual and non-virtual classes) for N2965 > (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2965.html). > This allows you to create type traits for giving the base classes of the > class: Please post a ChangeLog entry with a patch. Someone added one for you: 2011-10-17 Michael Spertus * gcc/c-family/c-common.c (c_common_reswords): Add __bases, __direct_bases. * gcc/c-family/c-common.h: Add RID_BASES and RID_DIRECT_BASES. but it is in the wrong file (c-family has its own ChangeLog) and shouldn't contain the gcc/c-family prefix (paths are relative to the directory). -- Eric Botcazou
Re: [C++ Patch] PR 50757
> exactly like the recently fixed c++/17212. Tested x86_64-linux. c-family has its own ChangeLog file, all changes must be documented there. -- Eric Botcazou
Re: [patch] dwarf2out: Drop the size + performance overhead of DW_AT_sibling
On Oct 17, 2011, at 3:16 PM, Tom Tromey wrote: >> "Tristan" == Tristan Gingold writes: > > Tom> Another way to look at it is that there have been many changes to GCC's > Tom> DWARF output in the last few years. Surely these have broken these > Tom> DWARF consumers more than this change possibly could. > > Tristan> Yes, but there is -gstrict-dwarf to stay compatible with old > behavior. > > Yes, but this change is strictly compliant. Agreed. > What makes it different > from any other change that was made to make GCC more compliant? > Hypothetical consumers could also break on those changes. The others changes were often corner cases, while this one is very visible. What is wrong with my suggestion of adding a command line option to keep the siblings ? This option could be removed in a few years if nobody complained about sibling removal. Tristan.
Re: RFA: Improve tree-ssa-sink block selection
On 10/18/2011 07:10 AM, Jeff Law wrote: --- 467,475 if (gimple_code (use) != GIMPLE_PHI) { sinkbb = gimple_bb (use); ! sinkbb = select_best_block (frombb, gimple_bb (use), stmt); ! if (sinkbb == frombb) return false; *togsi = gsi_for_stmt (use); Useless assignment of sinkbb, otherwise looks fine. By the way, is it intended that sink_code_in_bb visits again postdominators that were already visited (which with domwalk would come for free)? As it is, the pass is quadratic when you have something like this: if (x1) y; else z; if (x2) y; else z; ... if (x) y; else z; if (x1) y; else z; where the postdominator tree is x1 | x2 | x | x1 y1 z1 y2 z2 ... y1 z1 \___|__|__|__|__/__,---' | EXIT_BLOCK Paolo
Re: resent2 [PATCH] Fix ICE in redirect_jump, at jump.c:1497 PR50496
> thread_prologue_and_epilogue_insns should detect all cases where a > return insn can be created. So any CFG cleanup that runs before it does > not need this functionality. So we're left with CFG cleanups that run after it and could forward edges to an edge from a return insn to the exit block in order to build a new return insn. -- Eric Botcazou
Re: [PATCH][RFC] "Fix" PR50716, override type alignment knowledge
On Mon, 17 Oct 2011, Eric Botcazou wrote: > > 2011-10-17 Richard Guenther > > > > PR middle-end/50716 > > * expr.c (get_object_or_type_alignment): New function. > > (expand_assignment): Use it. > > (expand_expr_real_1): Likewise. > > Maybe move it to builtins.c alongside the other get_*_alignment functions. I explicitly didn't do that because the function shouldn't be used without knowing exactly what you do ;) > > Index: gcc/expr.c > > === > > *** gcc/expr.c (revision 180077) > > --- gcc/expr.c (working copy) > > *** get_bit_range (unsigned HOST_WIDE_INT *b > > *** 4544,4549 > > --- 4544,4568 > > } > > } > > > > + /* Return the alignment of the object EXP, also considering its type > > +when we do not know of explicit misalignment. > > +??? Note that generally the type of an expression is not kept > > +consistent with misalignment information by the frontend, for > > +example when taking the address of a member of a packed structure. > > +So taking into account type information for alignment is generally > > +wrong, but is done here as a compromise. */ > > This sounds backwards, as taking into account type information is generally > correct, i.e. the packed case is the exception. Maybe use "in the general > case" instead: > > ??? Note that, in the general case, the type of an expression is not kept > consistent with the misalignment by the front-end, for example when taking > the address of a member of a packed structure. However, in most of the > cases, expressions have the alignment of their type, so we fall back to > the alignment of the type when we cannot compute a misalignment. Ok, I'll adjust it a bit, adding "so we fall back optimistically to the alignment...", because that's what is true - for the packed case we simply compute wrong alignment this way (but for the expand case this has been the case since forever, noticed only on strict-align targets). Note that "cannot compute a misalignment" applies also to the case where we for some reason explicitly know the alignment is just BITS_PER_UNIT (you cannot 'misalign' that). So there are still a lot of cases where known (in some sense of known) alignment is overridden by maybe bogus type information. It's just less cases now, but I think we may be able to improve the situation by maybe returning an extra argument from get_object_alignment_1 that says whether the alignment might be bigger or whether the alignment is definitively known. But that's for a followup I presume. > > + static unsigned int > + get_object_or_type_alignment (tree exp) > + { > + unsigned HOST_WIDE_INT misalign; > + unsigned int align = get_object_alignment_1 (exp, &misalign); > + align = MAX (TYPE_ALIGN (TREE_TYPE (exp)), align); > + if (misalign != 0) > + align = (misalign & -misalign); > > You don't need to go through the MAX if misalign is non-zero. Ok. Committed as follows. Richard. 2011-10-18 Richard Guenther PR middle-end/50716 * expr.c (get_object_or_type_alignment): New function. (expand_assignment): Use it. (expand_expr_real_1): Likewise. Index: gcc/expr.c === *** gcc/expr.c (revision 180077) --- gcc/expr.c (working copy) *** get_bit_range (unsigned HOST_WIDE_INT *b *** 4544,4549 --- 4544,4570 } } + /* Return the alignment of the object EXP, also considering its type +when we do not know of explicit misalignment. +??? Note that, in the general case, the type of an expression is not kept +consistent with misalignment information by the front-end, for +example when taking the address of a member of a packed structure. +However, in most of the cases, expressions have the alignment of +their type, so we optimistically fall back to the alignment of the +type when we cannot compute a misalignment. */ + + static unsigned int + get_object_or_type_alignment (tree exp) + { + unsigned HOST_WIDE_INT misalign; + unsigned int align = get_object_alignment_1 (exp, &misalign); + if (misalign != 0) + align = (misalign & -misalign); + else + align = MAX (TYPE_ALIGN (TREE_TYPE (exp)), align); + return align; + } + /* Expand an assignment that stores the value of FROM into TO. If NONTEMPORAL is true, try generating a nontemporal store. */ *** expand_assignment (tree to, tree from, b *** 4553,4559 rtx to_rtx = 0; rtx result; enum machine_mode mode; ! int align; enum insn_code icode; /* Don't crash if the lhs of the assignment was erroneous. */ --- 4574,4580 rtx to_rtx = 0; rtx result; enum machine_mode mode; ! unsigned int align; enum insn_code icode; /* Don't crash if the lhs of the assignment was erroneous. */ *** expand_
[patch#2] dwarf2out: Drop the size + performance overhead of DW_AT_sibling
On Tue, 18 Oct 2011 09:24:08 +0200, Tristan Gingold wrote: > What is wrong with my suggestion of adding a command line option to keep the > siblings ? This option could be removed in a few years if nobody complained > about sibling removal. I find an extra option just a pollution of doc and everything by an option which will never get used. Wouldn't it be enough to disable it by -gstrict-dwarf? While currently the -gstrict-dwarf meaning is different I believe the purpose is correct - to be more backward compatible. Thanks, Jan gcc/ 2011-10-12 Jan Kratochvil Stop producing DW_AT_sibling without -gstrict-dwarf. * dwarf2out.c (dwarf2out_finish): Remove calls of add_sibling_attributes if !DWARF_STRICT. Extend the comment with reason. --- gcc/dwarf2out.c (revision 180121) +++ gcc/dwarf2out.c (working copy) @@ -22496,13 +22496,17 @@ dwarf2out_finish (const char *filename) prune_unused_types (); } - /* Traverse the DIE's and add add sibling attributes to those DIE's - that have children. */ - add_sibling_attributes (comp_unit_die ()); - for (node = limbo_die_list; node; node = node->next) -add_sibling_attributes (node->die); - for (ctnode = comdat_type_list; ctnode != NULL; ctnode = ctnode->next) -add_sibling_attributes (ctnode->root_die); + if (dwarf_strict) +{ + /* Traverse the DIE's and add add sibling attributes to those DIE's that +have children. It is produced only for compatibility reasons as it is +both a size and consumer performance hit. */ + add_sibling_attributes (comp_unit_die ()); + for (node = limbo_die_list; node; node = node->next) + add_sibling_attributes (node->die); + for (ctnode = comdat_type_list; ctnode != NULL; ctnode = ctnode->next) + add_sibling_attributes (ctnode->root_die); +} /* Output a terminator label for the .text section. */ switch_to_section (text_section);
Re: [patch#2] dwarf2out: Drop the size + performance overhead of DW_AT_sibling
On Tue, Oct 18, 2011 at 10:28:09AM +0200, Jan Kratochvil wrote: > 2011-10-12 Jan Kratochvil > > Stop producing DW_AT_sibling without -gstrict-dwarf. > * dwarf2out.c (dwarf2out_finish): Remove calls of > add_sibling_attributes if !DWARF_STRICT. Extend the comment with > reason. This is ok for trunk. Jakub
Re: [Patch, Fortran] PR 47023: C_Sizeof: Rejects valid code
On 10/17/2011 11:37 PM, Janus Weil wrote: here is another patch for PR47023, which takes care of comment #1, i.e. rejecting polymorphic variables in a C-binding context. Regtested on x86_64-unknown-linux-gnu. Ok for trunk? OK. Thanks for the patch. If I saw it correctly, one still needs to fix the vendor-extension SIZEOF for procedure pointers (it currently returns one byte!) - and the issue in the original bug (comment 0). Tobias 2011-10-17 Janus Weil PR fortran/47023 * decl.c (verify_c_interop_param): Renamed to 'gfc_verify_c_interop_param'. Add error message for polymorphic arguments. (verify_c_interop): Renamed to 'gfc_verify_c_interop'. Reject polymorphic variables. (verify_bind_c_sym): Renamed 'verify_c_interop'. * gfortran.h (verify_c_interop,verify_c_interop_param): Renamed. * check.c (gfc_check_sizeof): Ditto. * resolve.c (gfc_iso_c_func_interface,resolve_fl_procedure): Ditto. * symbol.c (verify_bind_c_derived_type): Ditto. 2011-10-17 Janus Weil PR fortran/47023 * gfortran.dg/iso_c_binding_class.f03: New.
Re: [PATCH] Add capability to run several iterations of early optimizations
On Tue, Oct 18, 2011 at 1:45 AM, Maxim Kuvyrkov wrote: > On 13/10/2011, at 12:58 AM, Richard Guenther wrote: > >> On Wed, Oct 12, 2011 at 8:50 AM, Maxim Kuvyrkov >> wrote: >>> The following patch adds new knob to make GCC perform several iterations of >>> early optimizations and inlining. >>> >>> This is for dont-care-about-compile-time-optimize-all-you-can scenarios. >>> Performing several iterations of optimizations does significantly improve >>> code speed on a certain proprietary source base. Some hand-tuning of the >>> parameter value is required to get optimum performance. Another good use >>> for this option is for search and ad-hoc analysis of cases where GCC misses >>> optimization opportunities. >>> >>> With the default setting of '1', nothing is changed from the current status >>> quo. >>> >>> The patch was bootstrapped and regtested with 3 iterations set by default >>> on i686-linux-gnu. The only failures in regression testsuite were due to >>> latent bugs in handling of EH information, which are being discussed in a >>> different thread. >>> >>> Performance impact on the standard benchmarks is not conclusive, there are >>> improvements in SPEC2000 of up to 4% and regressions down to -2%, see [*]. >>> SPEC2006 benchmarks will take another day or two to complete and I will >>> update the spreadsheet then. The benchmarks were run on a Core2 system for >>> all combinations of {-m32/-m64}{-O2/-O3}. >>> >>> Effect on compilation time is fairly predictable, about 10% compile time >>> increase with 3 iterations. >>> >>> OK for trunk? >> >> I don't think this is a good idea, especially in the form you implemented it. >> >> If we'd want to iterate early optimizations we'd want to do it by iterating >> an IPA pass so that we benefit from more precise size estimates >> when trying to inline a function the second time. > > Could you elaborate on this a bit? Early optimizations are gimple passes, so > I'm missing your point here. pass_early_local_passes is an IPA pass, you want to iterate fn1, fn2, fn1, fn2, ..., not fn1, fn1 ..., fn2, fn2 ... precisely for better inlining. Thus you need to split pass_early_local_passes into pieces so you can iterate one of the IPA pieces. >> Also statically >> scheduling the passes will mess up dump files and you have no >> chance of say, noticing that nothing changed for function f and its >> callees in iteration N and thus you can skip processing them in >> iteration N + 1. > > Yes, these are the shortcomings. The dump files name changes can be fixed, > e.g., by adding a suffix to the passes on iterations after the first one. > The analysis to avoid unnecessary iterations is more complex problem. Sure. I analyzed early passes by manually duplicating them and test that they do nothing for tramp3d, which they pretty much all did at some point. >> >> So, at least you should split the pass_early_local_passes IPA pass >> into three, you'd iterate over the 2nd (definitely not over >> pass_split_functions >> though), the third would be pass_profile and pass_split_functions only. >> And you'd iterate from the place the 2nd IPA pass is executed, not >> by scheduling them N times. > > OK, I will look into this. > >> >> Then you'd have to analyze the compile-time impact of the IPA >> splitting on its own when not iterating. Then you should look >> at what actually was the optimizations that were performed >> that lead to the improvement (I can see some indirect inlining >> happening, but everything else would be a bug in present >> optimizers in the early pipeline - they are all designed to be >> roughly independent on each other and _not_ expose new >> opportunities by iteration). Thus - testcases? > > The initial motivation for the patch was to enable more indirect inlining and > devirtualization opportunities. Hm. > Since then I found the patch to be helpful in searching for optimization > opportunities and bugs. E.g., SPEC2006's 471.omnetpp drops 20% with 2 > additional iterations of early optimizations [*]. Given that applying more > optimizations should, theoretically, not decrease performance, there is > likely a very real bug or deficiency behind that. It is likely early SRA that messes up, or maybe convert switch. Early passes should be really restricted to always profitable cleanups. Your experiment looks useful to track down these bugs, but in general I don't think we want to expose iterating early passes. Richard. > Thank you, > > [*] > https://docs.google.com/spreadsheet/ccc?key=0AvK0Y-Pgj7bNdFBQMEJ6d3laeFdvdk9lQ1p0LUFkVFE&hl=en_US > > -- > Maxim Kuvyrkov > CodeSourcery / Mentor Graphics > > > >
Re: [PATCH 3/6] Emit macro expansion related diagnostics
Jason Merrill writes: > If you have a patch like this that fixes a major regression, go ahead > and check it in without waiting for approval; we can adjust it as > necessary after build is working again. OK. >> size_t num_expanded_macros; > >> - fprintf (stderr, "Number of expanded macros: %5lu\n", >> - s.num_expanded_macros); >> + fprintf (stderr, "Number of expanded macros: %5ld\n", >> + (long)s.num_expanded_macros); > > If we're going to use %l in printf, we should use long rather than > size_t in linemap_stats; that way we don't need the cast. OK, I have checked the below in. libcpp/ * include/line-map.h (struct linemap_stats): Change the type of the members from size_t to long. * macro.c (macro_arg_token_iter_init): Unconditionally initialize iter->location_ptr. gcc/c-family/ * c-lex.c (fe_file_change): Use LINEMAP_SYSP when !NO_IMPLICIT_EXTERN_C. gcc/ * input.c (dump_line_table_statistics): Use long, not size_t. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@180124 138bc75d-0d04-0410-961f-82ee72b054a4 diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c index be83b61..b151564 100644 --- a/gcc/c-family/c-lex.c +++ b/gcc/c-family/c-lex.c @@ -211,7 +211,7 @@ fe_file_change (const struct line_map *new_map) #ifndef NO_IMPLICIT_EXTERN_C if (c_header_level) ++c_header_level; - else if (new_map->sysp == 2) + else if (LINEMAP_SYSP (new_map) == 2) { c_header_level = 1; ++pending_lang_change; @@ -224,7 +224,7 @@ fe_file_change (const struct line_map *new_map) #ifndef NO_IMPLICIT_EXTERN_C if (c_header_level && --c_header_level == 0) { - if (new_map->sysp == 2) + if (LINEMAP_SYSP (new_map) == 2) warning (0, "badly nested C headers from preprocessor"); --pending_lang_change; } diff --git a/gcc/input.c b/gcc/input.c index 41842b7..a780f5c 100644 --- a/gcc/input.c +++ b/gcc/input.c @@ -80,7 +80,7 @@ void dump_line_table_statistics (void) { struct linemap_stats s; - size_t total_used_map_size, + long total_used_map_size, macro_maps_size, total_allocated_map_size; @@ -99,45 +99,45 @@ dump_line_table_statistics (void) + s.macro_maps_used_size + s.macro_maps_locations_size; - fprintf (stderr, "Number of expanded macros: %5lu\n", + fprintf (stderr, "Number of expanded macros: %5ld\n", s.num_expanded_macros); if (s.num_expanded_macros != 0) -fprintf (stderr, "Average number of tokens per macro expansion: %5lu\n", +fprintf (stderr, "Average number of tokens per macro expansion: %5ld\n", s.num_macro_tokens / s.num_expanded_macros); fprintf (stderr, "\nLine Table allocations during the " "compilation process\n"); - fprintf (stderr, "Number of ordinary maps used:%5lu%c\n", + fprintf (stderr, "Number of ordinary maps used:%5ld%c\n", SCALE (s.num_ordinary_maps_used), STAT_LABEL (s.num_ordinary_maps_used)); - fprintf (stderr, "Ordinary map used size: %5lu%c\n", + fprintf (stderr, "Ordinary map used size: %5ld%c\n", SCALE (s.ordinary_maps_used_size), STAT_LABEL (s.ordinary_maps_used_size)); - fprintf (stderr, "Number of ordinary maps allocated: %5lu%c\n", + fprintf (stderr, "Number of ordinary maps allocated: %5ld%c\n", SCALE (s.num_ordinary_maps_allocated), STAT_LABEL (s.num_ordinary_maps_allocated)); - fprintf (stderr, "Ordinary maps allocated size:%5lu%c\n", + fprintf (stderr, "Ordinary maps allocated size:%5ld%c\n", SCALE (s.ordinary_maps_allocated_size), STAT_LABEL (s.ordinary_maps_allocated_size)); - fprintf (stderr, "Number of macro maps used: %5lu%c\n", + fprintf (stderr, "Number of macro maps used: %5ld%c\n", SCALE (s.num_macro_maps_used), STAT_LABEL (s.num_macro_maps_used)); - fprintf (stderr, "Macro maps used size:%5lu%c\n", + fprintf (stderr, "Macro maps used size:%5ld%c\n", SCALE (s.macro_maps_used_size), STAT_LABEL (s.macro_maps_used_size)); - fprintf (stderr, "Macro maps locations size: %5lu%c\n", + fprintf (stderr, "Macro maps locations size: %5ld%c\n", SCALE (s.macro_maps_locations_size), STAT_LABEL (s.macro_maps_locations_size)); - fprintf (stderr, "Macro maps size: %5lu%c\n", + fprintf (stderr, "Macro maps size: %5ld%c\n", SCALE (macro_maps_size), STAT_LABEL (macro_maps_size)); - fprintf (stderr, "Duplicated maps locations size: %5lu%c\n", + fprintf (stderr, "Duplicated maps locat
Re: [PATCH] Simplify and fix restrict handling
On Mon, 17 Oct 2011, Richard Guenther wrote: > On Fri, 14 Oct 2011, Richard Guenther wrote: > > > > > This follows up Michas testcase where we fail to handle the > > conservatively propagated restrict tags properly. The following > > patch simplifies handling of restrict in the oracle and thus > > only excludes NONLOCAL (as designed), but not ESCAPED from > > conflict checking. > > > > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. > > So, after some regressions caused by this patch and some more thinking > (about more possible issues) I concluded that we can simplify things > even more by not making restrict vars point to NONLOCAL, but only > to their tag (but marking that as global and able to have a points-to > set). This way the special-casing of NONLOCAL vs. restrict can go > away, and with it all its possible problems. Restrict is now > similar to malloc () memory that escapes. > > Hopefully this one is without regressions ;) > > Bootstrap and regtest running on x86_64-unknown-linux-gnu. And this is what I ended up applying after fixing constraints again. For restrict qualified global (or parameter) pointers we now generate p = &RESTRICT_TAG RESTRICT_TAG = NONLOCAL if we implmenent the proposed RESTRICT_CAST expression from p = RESTRICT_CAST ; we'd need to generate p = &TAG; TAG = *q; With this in place we can now disambiguate restrict qualified pointers against global decls which wasn't possible before (we invented the DECL_IS_RESTRICTED_P flag for this). Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2011-10-17 Richard Guenther * tree-ssa-alias.h (struct pt_solution): Remove vars_contains_restrict member. (pt_solutions_same_restrict_base): Remove. (pt_solution_set): Adjust. * tree-ssa-alias.c (ptr_deref_may_alias_decl_p): Remove vars_contains_restrict handling. (dump_points_to_solution): Likewise. (ptr_derefs_may_alias_p): Do not call pt_solutions_same_restrict_base. * tree-ssa-structalias.c (struct variable_info): Remove is_restrict_var field. (new_var_info): Do not initialize it. (ipa_escaped_pt): Adjust. (make_constraint_from_restrict): Make the tag global. (make_constraint_from_global_restrict): New function. (make_constraint_from_heapvar): Remove. (create_variable_info_for): Do not make restrict vars point to NONLOCAL. (intra_create_variable_infos): Likewise. (find_what_var_points_to): Remove vars_contains_restrict handling. (pt_solution_set): Adjust. (pt_solution_ior_into): Likewise. (pt_solutions_same_restrict_base): Remove. (compute_points_to_sets): Do not test is_restrict_var. * cfgexpand.c (update_alias_info_with_stack_vars): Adjust. * gimple-pretty-print.c (pp_points_to_solution): Likewise. * gcc.dg/torture/restrict-1.c: New testcase. Index: gcc/tree-ssa-alias.c === *** gcc/tree-ssa-alias.c.orig 2011-10-17 16:56:54.0 +0200 --- gcc/tree-ssa-alias.c2011-10-17 16:57:21.0 +0200 *** ptr_deref_may_alias_decl_p (tree ptr, tr *** 219,231 if (!pi) return true; - /* If the decl can be used as a restrict tag and we have a restrict - pointer and that pointers points-to set doesn't contain this decl - then they can't alias. */ - if (DECL_RESTRICTED_P (decl) - && pi->pt.vars_contains_restrict) - return bitmap_bit_p (pi->pt.vars, DECL_PT_UID (decl)); - return pt_solution_includes (&pi->pt, decl); } --- 219,224 *** ptr_derefs_may_alias_p (tree ptr1, tree *** 316,326 if (!pi1 || !pi2) return true; - /* If both pointers are restrict-qualified try to disambiguate - with restrict information. */ - if (!pt_solutions_same_restrict_base (&pi1->pt, &pi2->pt)) - return false; - /* ??? This does not use TBAA to prune decls from the intersection that not both pointers may access. */ return pt_solutions_intersect (&pi1->pt, &pi2->pt); --- 309,314 *** dump_points_to_solution (FILE *file, str *** 426,433 dump_decl_set (file, pt->vars); if (pt->vars_contains_global) fprintf (file, " (includes global vars)"); - if (pt->vars_contains_restrict) - fprintf (file, " (includes restrict tags)"); } } --- 414,419 Index: gcc/tree-ssa-alias.h === *** gcc/tree-ssa-alias.h.orig 2011-10-17 16:56:54.0 +0200 --- gcc/tree-ssa-alias.h2011-10-17 16:57:21.0 +0200 *** struct GTY(()) pt_solution *** 54,61 /* Nonzero if the pt_vars bitmap includes a global variable. */ unsigned int vars_contains_global : 1; - /* Nonzero if the pt_vars bitmap includes a r
Re: [PATCH 1/6] Linemap infrastructure for virtual locations
Gerald Pfeifer writes: > On Mon, 17 Oct 2011, Dodji Seketeli wrote: >>> this looks like it's causing the following bootstrap failure for me >>> on i386-unknown-freebsd9.0? >> Yes this is http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50760, and I am >> testing the patch attached to the bug at the moment. Just curious, does >> that patch help for your target? > > Yes, applying the input.c part of the patch I am onw beyond the point > where bootstrap failed last night and into stage 3. Thank you for testing. I have checked the fix in. It should hopefully be fixed now. -- Dodji
Re: [PR50672, PATCH] Fix ice triggered by -ftree-tail-merge: verify_ssa failed: no immediate_use list
On 10/17/2011 01:51 PM, Richard Guenther wrote: > On Sun, Oct 16, 2011 at 12:05 PM, Tom de Vries wrote: >> On 10/14/2011 12:00 PM, Richard Guenther wrote: >>> On Fri, Oct 14, 2011 at 1:12 AM, Tom de Vries >>> wrote: On 10/12/2011 02:19 PM, Richard Guenther wrote: > On Wed, Oct 12, 2011 at 8:35 AM, Tom de Vries > wrote: >> Richard, >> >> I have a patch for PR50672. >> >> When compiling the testcase from the PR with -ftree-tail-merge, the >> scenario is >> as follows: >> >> We start out tail_merge_optimize with blocks 14 and 20, which are alike, >> but not >> equal, since they have different successors: >> ... >> # BLOCK 14 freq:690 >> # PRED: 25 [61.0%] (false,exec) >> >> if (wD.2197_57(D) != 0B) >>goto ; >> else >>goto ; >> # SUCC: 15 [78.4%] (true,exec) 16 [21.6%] (false,exec) >> >> >> # BLOCK 20 freq:2900 >> # PRED: 29 [100.0%] (fallthru) 31 [100.0%] (fallthru) >> >> # .MEMD.2447_209 = PHI <.MEMD.2447_125(29), .MEMD.2447_129(31)> >> if (wD.2197_57(D) != 0B) >>goto ; >> else >>goto ; >> # SUCC: 5 [85.0%] (true,exec) 6 [15.0%] (false,exec) >> ... >> >> In the first iteration, we merge block 5 with block 15 and block 6 with >> block >> 16. After that, the blocks 14 and 20 are equal. >> >> In the second iteration, the blocks 14 and 20 are merged, by redirecting >> the >> incoming edges of block 20 to block 14, and removing block 20. >> >> Block 20 also contains the definition of .MEMD.2447_209. Removing the >> definition >> delinks the vuse of .MEMD.2447_209 in block 5: >> ... >> # BLOCK 5 freq:6036 >> # PRED: 20 [85.0%] (true,exec) >> >> # PT = nonlocal escaped >> D.2306_58 = &thisD.2200_10(D)->D.2156; >> # .MEMD.2447_132 = VDEF <.MEMD.2447_209> >> # USE = anything >> # CLB = anything >> drawLineD.2135 (D.2306_58, wD.2197_57(D), gcD.2198_59(D)); >> goto ; >> # SUCC: 17 [100.0%] (fallthru,exec) >> ... > > And block 5 is retained and block 15 is discarded? > Indeed. >> After the pass, when executing the TODO_update_ssa_only_virtuals, we >> update the >> drawLine call in block 5 using rewrite_update_stmt, which calls >> maybe_replace_use for the vuse operand. >> >> However, maybe_replace_use doesn't have an effect since the old vuse and >> the new >> vuse happen to be the same (rdef == use), so SET_USE is not called and >> the vuse >> remains delinked: >> ... >> if (rdef && rdef != use) >>SET_USE (use_p, rdef); >> ... >> >> The patch fixes this by forcing SET_USE for delinked uses. > > That isn't the correct fix. Whoever unlinks the vuse (by removing its > definition) has to replace it with something valid, which is either the > bare symbol .MEM, or the VUSE associated with the removed VDEF > (thus, as unlink_stmt_vdef does). > Another try. For each deleted bb, we call unlink_stmt_vdef for the statements, and replace the .MEM phi uses with the bare .MEM symbol. Bootstrapped and reg-tested on x86_64. Ok for trunk? >>> >>> Better. For >>> >>> + >>> + FOR_EACH_IMM_USE_STMT (use_stmt, iter, res) >>> + { >>> + FOR_EACH_IMM_USE_ON_STMT (use_p, iter) >>> + SET_USE (use_p, SSA_NAME_VAR (res)); >>> + } >>> >>> you can use mark_virtual_phi_result_for_renaming (phi) instead. >>> >>> + for (i = gsi_last_bb (bb); !gsi_end_p (i); gsi_prev_nondebug (&i)) >>> +unlink_stmt_vdef (gsi_stmt (i)); >>> >>> is that actually necessary? That is, isn't the block that follows a >>> deleted block always starting with a virtual PHI? >> >> Block 20 is deleted. Block 5 follows the deleted block 20. Block 5 does not >> start with a virtual PHI. >> >>> If not it should >>> be enough to walk to the first stmt that uses a virtual operand >>> and similar to the PHI case replace all its uses with the bare >>> symbol. >> >> I think we need to handle the exposed uses (meaning outside the block) of >> vdefs >> in each deleted block. The only vdef that can have exposed uses is the last >> vdef >> in the block. So we should find the last vdef (gimple with gimple_vdef or >> virtual PHI) and handle the related uses. >> >> Bootstrapped and regtested on x86_64. OK for trunk? > > Hmmm. I can see that this will fail when the block has a store > but not a virtual PHI node, thus, when renaming does not reach > a bare .MEM symbol walking the use-def chain from the VUSE > of the store. What release_last_vdef should do is trigger a rename > of subsequent VUSEs in successors of the deleted block. Which > requires you to do something like > > static void > rename_last_vdef (basic_block bb) > { > + gimple_stmt_iterator i; > + > + for (i = gsi_last_bb (bb); !gsi_
Re: [cxx-mem-model] Rename __sync_mem to __atomic
> In order to align with a standard naming convention, this patch renames > all the __sync_mem references to __atomic. > > The only real addition is in builtins.c::is_builtin_name needed to know > that __atomic_ is also a builtin prefix. Are you sure about the renaming get_builtin_sync_mem -> get_builtin_atomic? -- Eric Botcazou
Re: [C++ Patch] PR 50757
Hi, >> exactly like the recently fixed c++/17212. Tested x86_64-linux. > > c-family has its own ChangeLog file, all changes must be documented there. Oh my, I thought I had checked. I'll fix it, sorry. Paolo
Re: [patch] Support vectorization of widening shifts
On Tue, Oct 18, 2011 at 11:39:22AM +0200, Ira Rosen wrote: > On 2 October 2011 10:30, Ira Rosen wrote: > > On 29 September 2011 17:30, Ramana Radhakrishnan > > wrote: > >> On 19 September 2011 08:54, Ira Rosen wrote: > >> > >>> > >>> Bootstrapped on powerpc64-suse-linux, tested on powerpc64-suse-linux > >>> and arm-linux-gnueabi > >>> OK for mainline? > >> > >> Sorry I missed this patch. Is there any reason why we need unspecs in > >> this case ? Can't this be represented by subregs and zero/ sign > >> extensions in RTL without the UNSPECs ? > > I committed the attached patch with Ramana's solution for testing > +/* Detect widening shift pattern: > > + type a_t; > + TYPE a_T, res_T; > + > + S1 a_t = ; > + S2 a_T = (TYPE) a_t; > + S3 res_T = a_T << CONST; > + > + where type 'TYPE' is at least double the size of type 'type'. > + > + Also detect unsgigned cases: unsigned Jakub
Re: [PATCH] fortran/50514 -- Fix static chekcing of ISHFT[C] arguments.
On 10/17/2011 07:02 PM, Steve Kargl wrote: On Mon, Oct 17, 2011 at 12:22:03PM +0200, Tobias Burnus wrote: I am not so happy with complete test replacements. How about adding it as new test case? Well, the old testcase is [...] The only line that would survive is the first line, which is covered in the updated testcase. Convinced: Do the replacement. I somehow find less_than_bitsize1's + if (strncmp (arg2, "ISHFT", 5) == 0) not that elegant and would prefer another argument, which tells the function that it should take the absolute value of the argument; however, given that ISHFT seems to be the only function which allows negative arguments, one could also leave it. In looking at the other uses of less_than_bitsize1() I could pass arg2=NULL for ISHFT[C], and then the code would become if (arg2 == NULL) { /* Special case for ISHFT[C]. */ Would that be better? It would be faster - but not necessarily better. I was rather thinking of adding another argument like "bool abs_ok". However, I think all three solutions are OK. Pick one. Tobias
Re: [C++ Patch] __builtin_choose_expr
I'm under the impression that some tests are written in C, wouldn't better fit in c-c++-common? Thanks, Paolo.
Re: [patch] Support vectorization of widening shifts
On 18 October 2011 11:43, Jakub Jelinek wrote: > On Tue, Oct 18, 2011 at 11:39:22AM +0200, Ira Rosen wrote: >> On 2 October 2011 10:30, Ira Rosen wrote: >> > On 29 September 2011 17:30, Ramana Radhakrishnan >> > wrote: >> >> On 19 September 2011 08:54, Ira Rosen wrote: >> >> >> >>> >> >>> Bootstrapped on powerpc64-suse-linux, tested on powerpc64-suse-linux >> >>> and arm-linux-gnueabi >> >>> OK for mainline? >> >> >> >> Sorry I missed this patch. Is there any reason why we need unspecs in >> >> this case ? Can't this be represented by subregs and zero/ sign >> >> extensions in RTL without the UNSPECs ? >> >> I committed the attached patch with Ramana's solution for testing > >> +/* Detect widening shift pattern: >> >> + type a_t; >> + TYPE a_T, res_T; >> + >> + S1 a_t = ; >> + S2 a_T = (TYPE) a_t; >> + S3 res_T = a_T << CONST; >> + >> + where type 'TYPE' is at least double the size of type 'type'. >> + >> + Also detect unsgigned cases: > > unsigned Thanks, I'll fix this. Ira > > Jakub >
patch ping: Dump the "degree of overlap" to compare static profile with instrumentation profile
http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01293.html Thanks! Dehao
Re: [Patch]: fix typo in rs6000.c (AIX bootstrap broken)
Ping… On Oct 13, 2011, at 5:11 PM, Tristan Gingold wrote: > Hi, > > looks like an obvious typo. Ok for trunk ? > > Tristan. > > 2011-10-13 Tristan Gingold > > * config/rs6000/rs6000.c (rs6000_init_builtins): Fix typo. > > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index 4fd2192..3bfe33e 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -12213,7 +12213,7 @@ rs6000_init_builtins (void) > > #if TARGET_XCOFF > /* AIX libm provides clog as __clog. */ > - if ((tdecl = builtin_decl_explicit ([BUILT_IN_CLOG))) != NULL_TREE) > + if ((tdecl = builtin_decl_explicit (BUILT_IN_CLOG)) != NULL_TREE) > set_user_assembler_name (tdecl, "__clog"); > #endif > >
Re: [Patch, Fortran] PR 47023: C_Sizeof: Rejects valid code
>> here is another patch for PR47023, which takes care of comment #1, >> i.e. rejecting polymorphic variables in a C-binding context. >> >> Regtested on x86_64-unknown-linux-gnu. Ok for trunk? > > OK. Thanks for the patch. Thanks. Committed as r180130. > If I saw it correctly, one still needs to fix the vendor-extension SIZEOF > for procedure pointers (it currently returns one byte!) Right. Actually I don't quite understand where that 1 byte comes from. Example: use iso_c_binding procedure(real), pointer :: pp pp => sin print *,sizeof(pp) print *,sizeof(pp(0.)) end This spits out: 1 4 The second one is correct (giving the pointee size). The first one should probably give the pointer size, but "1" is neither the size of the pointer nor the pointee. > and the issue in the original bug (comment 0). Yes. I think that's it. Cheers, Janus >> 2011-10-17 Janus Weil >> >> PR fortran/47023 >> * decl.c (verify_c_interop_param): Renamed to >> 'gfc_verify_c_interop_param'. Add error message for polymorphic >> arguments. >> (verify_c_interop): Renamed to 'gfc_verify_c_interop'. Reject >> polymorphic variables. >> (verify_bind_c_sym): Renamed 'verify_c_interop'. >> * gfortran.h (verify_c_interop,verify_c_interop_param): Renamed. >> * check.c (gfc_check_sizeof): Ditto. >> * resolve.c (gfc_iso_c_func_interface,resolve_fl_procedure): Ditto. >> * symbol.c (verify_bind_c_derived_type): Ditto. >> >> >> 2011-10-17 Janus Weil >> >> PR fortran/47023 >> * gfortran.dg/iso_c_binding_class.f03: New. > >
[PATCH] Fix PR50767
This fixes an oversight of my patch folding the added stmt in PRE - failing to update it. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2011-10-18 Richard Guenther PR tree-optimization/50767 * tree-ssa-pre.c (create_expression_by_pieces): Update the folded statement. * gcc.dg/torture/pr50767.c: New testcase. Index: gcc/tree-ssa-pre.c === *** gcc/tree-ssa-pre.c (revision 180127) --- gcc/tree-ssa-pre.c (working copy) *** create_expression_by_pieces (basic_block *** 3188,3194 /* Fold the last statement. */ gsi = gsi_last (*stmts); ! fold_stmt_inplace (&gsi); /* Add a value number to the temporary. The value may already exist in either NEW_SETS, or AVAIL_OUT, because --- 3188,3195 /* Fold the last statement. */ gsi = gsi_last (*stmts); ! if (fold_stmt_inplace (&gsi)) ! update_stmt (gsi_stmt (gsi)); /* Add a value number to the temporary. The value may already exist in either NEW_SETS, or AVAIL_OUT, because Index: gcc/testsuite/gcc.dg/torture/pr50767.c === *** gcc/testsuite/gcc.dg/torture/pr50767.c (revision 0) --- gcc/testsuite/gcc.dg/torture/pr50767.c (revision 0) *** *** 0 --- 1,21 + /* { dg-do compile } */ + /* { dg-options "-fno-tree-copy-prop -fno-tree-dominator-opts" } */ + + struct S + { + struct S *s; + }; + + static struct S *ss; + struct S *s; + + void bar(void); + + void foo(void) + { + for (;;) + { + s->s = ss; + bar (); + } + }
[patch testsuite]: Adjust tree-ssa/builtin-expect-*.c tests for high cost-branching optimization
Hello, this patch adjusts __builtin_expect tests in tree-ssa so, that short-circuit branch optimization don't lead to fallout. I've used here a multiplication, as simple substraction/addition might get optimized away. ChangeLog 2011-10-18 Kai Tietz * gcc.dg/tree-ssa/builtin-expect-1.c: Adjust test. * gcc.dg/tree-ssa/builtin-expect-2.c: Adjust test. * gcc.dg/tree-ssa/builtin-expect-3.c: Adjust test. * gcc.dg/tree-ssa/builtin-expect-4.c: Adjust test. * gcc.dg/tree-ssa/builtin-expect-5.c: Adjust test. Regression tested for x86_64-unknown-linux-gnu. Ok for apply? Regards, Kai Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-1.c === --- gcc-head.orig/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-1.c +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-1.c @@ -1,9 +1,9 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-gimple" } */ -f (int i, float j) +f (int i, float j, int i2, float j2) { - if (__builtin_expect (i > 0 && j, 0)) + if (__builtin_expect ((i * i2) > 0 && (j * j2), 0)) g (); } Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-2.c === --- gcc-head.orig/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-2.c +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-2.c @@ -1,9 +1,9 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-gimple" } */ -f (int i, float j) +f (int i, float j, int i2, float j2) { - if (__builtin_expect (i > 0 || j, 0)) + if (__builtin_expect ((i * i2) > 0 || (j * j2), 0)) ; else g (); Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-3.c === --- gcc-head.orig/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-3.c +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-3.c @@ -1,9 +1,9 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-gimple" } */ -f (int i, float j) +f (int i, float j, int i2, float j2) { - if (__builtin_expect (i > 0 && j, 0)) + if (__builtin_expect ((i * i2) > 0 && (j * j2), 0)) a (); else b (); Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-4.c === --- gcc-head.orig/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-4.c +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-4.c @@ -1,9 +1,9 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-gimple" } */ -f (int i, float j) +f (int i, float j, int i2, float j2) { - if (__builtin_expect (i > 0 || j, 0)) + if (__builtin_expect ((i * i2) > 0 || (j * j2), 0)) a (); else b (); Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-5.c === --- gcc-head.orig/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-5.c +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-5.c @@ -1,9 +1,9 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-forwprop" } */ -f (int i, float j) +f (int i, float j, int i2, float j2) { - if (__builtin_expect (i > 0 && __builtin_expect (j != 0, 1), 0)) + if (__builtin_expect ((i * i2) > 0 && __builtin_expect ((j * j2) != 0, 1), 0)) a (); else b ();
Re: [PATCH] fix -Wsign-compare error in objc-act.c (PR objc/50743)
> The FSF has had my papers since Feb '11, but are stalling trying to decide > whether to accept my employer's slightly non-standard disclaimer or not. Ah. That seems a long time even for this kind of things. Anyhow, this particular patch consisted of exactly 4 casts, so it seems to fall in the case of a "tiny patch" which doesn't require copyright assignment, so I applied it for you. Thanks
Re: [C++ Patch] __builtin_choose_expr
- Original Message - From: "Paolo Carlini" To: "Andy Gibbs" Cc: ; Sent: Tuesday, October 18, 2011 12:00 PM Subject: Re: [C++ Patch] __builtin_choose_expr > I'm under the impression that some tests are written in C, wouldn't better > fit in c-c++-common? Ok, I can do this but I think only 2 out of the 8 tests can be moved across. I will then move the duplicates out of the gcc.dg folder too. I will post a replacement patch later this afternoon. Andy
[Patch,AVR]: PR50447: Tweak addhi3
This patch do some tweaks to addhi3 like adding QI scratch register. The original *addhi3 insn is still there and located prior to new addhi3_clobber insn because addhi3 is special to reload (thanks Danis for this note) so that there is a version with and a version without scratch register. Patch passes without regressions. Ok for trunk? PR target/50447 * config/avr/avr.md (cc): New alternative out_plus_noclobber. (adjust_len): Ditto. (addhi3): Don't pipe through short; use gen_int_mode instead. Prior to reload, expand to gen_addhi3_clobber. (*addhi3): Use avr_out_plus_noclobber if applicable, use out_plus_noclobber in cc and adjust_len attribute. (addhi3_clobber): 2 new RTL peepholes. (addhi3_clobber): New insn. * config/avr/avr-protos.h: (avr_out_plus_noclobber): New prototype. * config/avr/avr.c (avr_out_plus_noclobber): New function. (notice_update_cc): Handle CC_OUT_PLUS_NOCLOBBER. (avr_out_plus_1): Tweak if only MSB is +/-1 and other bytes are 0. Set cc0 to set_zn for adiw on 16-bit values. (adjust_insn_length): Handle ADJUST_LEN_OUT_PLUS_NOCLOBBER. (expand_epilogue): No need to add 0 to frame_pointer_rtx. Johann Index: config/avr/avr.md === --- config/avr/avr.md (revision 180104) +++ config/avr/avr.md (working copy) @@ -78,7 +78,7 @@ (define_c_enum "unspecv" ;; Condition code settings. (define_attr "cc" "none,set_czn,set_zn,set_n,compare,clobber, - out_plus" + out_plus, out_plus_noclobber" (const_string "none")) (define_attr "type" "branch,branch1,arith,xcall" @@ -125,7 +125,8 @@ (define_attr "length" "" ;; Otherwise do special processing depending on the attribute. (define_attr "adjust_len" - "out_bitop, out_plus, addto_sp, tsthi, tstsi, compare, call, + "out_bitop, out_plus, out_plus_noclobber, addto_sp, + tsthi, tstsi, compare, call, mov8, mov16, mov32, reload_in16, reload_in32, ashlqi, ashrqi, lshrqi, ashlhi, ashrhi, lshrhi, @@ -759,14 +760,23 @@ (define_expand "addhi3" (plus:HI (match_operand:HI 1 "register_operand" "") (match_operand:HI 2 "nonmemory_operand" "")))] "" - " -{ - if (GET_CODE (operands[2]) == CONST_INT) -{ - short tmp = INTVAL (operands[2]); - operands[2] = GEN_INT(tmp); -} -}") + { +if (CONST_INT_P (operands[2])) + { +operands[2] = gen_int_mode (INTVAL (operands[2]), HImode); + +if (!reload_completed +&& !reload_in_progress +&& !stack_register_operand (operands[0], HImode) +&& !stack_register_operand (operands[1], HImode) +&& !d_register_operand (operands[0], HImode) +&& !d_register_operand (operands[1], HImode)) + { +emit_insn (gen_addhi3_clobber (operands[0], operands[1], operands[2])); +DONE; + } + } + }) (define_insn "*addhi3_zero_extend" @@ -803,20 +813,77 @@ (define_insn "*addhi3_sp_R" (set_attr "adjust_len" "addto_sp")]) (define_insn "*addhi3" - [(set (match_operand:HI 0 "register_operand" "=r,!w,!w,d,r,r") - (plus:HI - (match_operand:HI 1 "register_operand" "%0,0,0,0,0,0") - (match_operand:HI 2 "nonmemory_operand" "r,I,J,i,P,N")))] + [(set (match_operand:HI 0 "register_operand" "=r,d,d") +(plus:HI (match_operand:HI 1 "register_operand" "%0,0,0") + (match_operand:HI 2 "nonmemory_operand" "r,s,n")))] "" - "@ - add %A0,%A2\;adc %B0,%B2 - adiw %A0,%2 - sbiw %A0,%n2 - subi %A0,lo8(-(%2))\;sbci %B0,hi8(-(%2)) - sec\;adc %A0,__zero_reg__\;adc %B0,__zero_reg__ - sec\;sbc %A0,__zero_reg__\;sbc %B0,__zero_reg__" - [(set_attr "length" "2,1,1,2,3,3") - (set_attr "cc" "set_n,set_czn,set_czn,set_czn,set_n,set_n")]) + { +static const char * const asm_code[] = + { +"add %A0,%A2\;adc %B0,%B2", +"subi %A0,lo8(-(%2))\;sbci %B0,hi8(-(%2))", +"" + }; + +if (*asm_code[which_alternative]) + return asm_code[which_alternative]; + +return avr_out_plus_noclobber (operands, NULL, NULL); + } + [(set_attr "length" "2,2,2") + (set_attr "adjust_len" "*,*,out_plus_noclobber") + (set_attr "cc" "set_n,set_czn,out_plus_noclobber")]) + +;; Adding a constant to NO_LD_REGS might have lead to a reload of +;; that constant to LD_REGS. We don't add a scratch to *addhi3 +;; itself because that insn is special to reload. + +(define_peephole2 ; *addhi3_clobber + [(set (match_operand:HI 0 "d_register_operand" "") +(match_operand:HI 1 "const_int_operand" "")) + (set (match_operand:HI 2 "l_register_operand" "") + (plus:HI (match_dup 2) + (match_dup 0)))] + "peep2_reg_dead_p (2, operands[0])" + [(parallel [(set (match_dup 2) + (plus:HI (match_dup 2) +(match_dup 1))) + (clobber (match_dup 3))
[Patch Darwin/Ada] work around PR target/50678
The audit trail in the PR contains the detective work (mostly by Eric) that concludes we have a long-standing bug in the Darwin x86-64 sigtramp unwind data. This has been filed as radar #10302855, but we need a work-around until that is resolved (possibly forever on older systems). OK for trunk? (what opinion about 4.6?) ada: PR target/50678 * init.c (Darwin/__gnat_error_handler): Transpose rbx and rdx in the handler. Index: gcc/ada/init.c === --- gcc/ada/init.c (revision 180097) +++ gcc/ada/init.c (working copy) @@ -2287,6 +2287,16 @@ __gnat_error_handler (int sig, siginfo_t *si, void { struct Exception_Data *exception; const char *msg; +#if defined (__x86_64__) + /* Work around radar #10302855/pr50678, where the unwinders (libunwind or + libgcc_s depending on the system revision) and the DWARF unwind data for + the sigtramp have different ideas about register numbering (causing rbx + and rdx to be transposed). */ + ucontext_t *uc = (ucontext_t *)ucontext ; + unsigned long t = uc->uc_mcontext->__ss.__rbx; + uc->uc_mcontext->__ss.__rbx = uc->uc_mcontext->__ss.__rdx; + uc->uc_mcontext->__ss.__rdx =t; +#endif switch (sig) {
Re: [cxx-mem-model] Rename __sync_mem to __atomic
In order to align with a standard naming convention, this patch renames all the __sync_mem references to __atomic. The only real addition is in builtins.c::is_builtin_name needed to know that __atomic_ is also a builtin prefix. Are you sure about the renaming get_builtin_sync_mem -> get_builtin_atomic? Heh, yeah you are right. It should retain the old name. I'll fix that along with some minor formatting issues I've come across. Andrew
Re: [ARM] Fix PR49641
On 10/17/11 14:54, Richard Earnshaw wrote: > On 14/10/11 14:31, Bernd Schmidt wrote: >> On 07/13/11 16:03, Richard Earnshaw wrote: * config/arm/arm.c (store_multiple_sequence): Avoid cases where the base reg is stored iff compiling for Thumb1. * gcc.target/arm/pr49641.c: New test. >> >> Ping. Richard, you replied to the mail but didn't comment on the patch. >> >> >> Bernd >> > > > Sorry, I thought I'd made it clear that I don't think the compiler > should ever use STM with write-back if the base register is in the > stored list. We must certainly never do it if the base register is not > the first register in the list as this has always been unpredictable. > > BTW, this is not Thumb1 specific, it applies at all times. > > > So, no the patch is not OK as it stands. I'm confused. The patch disables the STM if THUMB1 and the base register is in the stored list. We only ever enable write-back for Thumb1 (see gen_stm_seq). So, what's the problem? Bernd
[4.6] Fix type of SRAed enum accesses
On Tue, Sep 27, 2011 at 03:26:03PM +0100, Richard Sandiford wrote: > This patch fixes a miscompilation of stage1 c-parser.o in an ARM bootstrap. > When an access to an enum field was SRAed, a component ref used the type > of the integer temporary variable instead of the type of the enum. > It therefore didn't alias other accesses to the same structure, > and was scheduled after a copy-load. > > Tested on x86_64-linux-gnu, and by verifying that c-parser.o is correctly > compiled for ARM after the patch. Martin says he's going to test on ia64 > too (thanks) -- I'll add 50326 to the changelog if that goes OK. > > OK to install if there are no regressions on ia64? As mentioned in bugzilla, the patch that caused this regression has been committed to 4.6 branch as well. Here is a backport of that patch to 4.6 branch, bootstrapped/regtested on x86_64-linux and i686-linux and Matthias AFAIK bootstrapped it on ia64-linux (where it previously failed to bootstrap because of this), regtest pending. Ok for 4.6? 2011-10-18 Jakub Jelinek PR target/50350 Backport from mainline 2011-09-27 Richard Sandiford PR middle-end/50386 PR middle-end/50326 * tree-sra.c (build_ref_for_model): Use the type of the field as the type of the COMPONENT_REF. --- gcc/tree-sra.c.jj 2011-09-30 17:19:10.0 +0200 +++ gcc/tree-sra.c 2011-10-18 08:01:28.682647833 +0200 @@ -1448,12 +1448,12 @@ build_ref_for_model (location_t loc, tre { if (TREE_CODE (model->expr) == COMPONENT_REF) { - tree t, exp_type; - offset -= int_bit_position (TREE_OPERAND (model->expr, 1)); + tree t, exp_type, fld = TREE_OPERAND (model->expr, 1); + offset -= int_bit_position (fld); exp_type = TREE_TYPE (TREE_OPERAND (model->expr, 0)); t = build_ref_for_offset (loc, base, offset, exp_type, gsi, insert_after); - return fold_build3_loc (loc, COMPONENT_REF, model->type, t, - TREE_OPERAND (model->expr, 1), NULL_TREE); + return fold_build3_loc (loc, COMPONENT_REF, TREE_TYPE (fld), t, fld, + NULL_TREE); } else return build_ref_for_offset (loc, base, offset, model->type, Jakub
Re: [Patch Darwin/Ada] work around PR target/50678
> The audit trail in the PR contains the detective work (mostly by Eric) that > concludes we have a long-standing bug in the Darwin x86-64 sigtramp unwind > data. > This has been filed as radar #10302855, but we need a work-around until > that is resolved (possibly forever on older systems). > > OK for trunk? > (what opinion about 4.6?) > > ada: > > PR target/50678 > * init.c (Darwin/__gnat_error_handler): Transpose rbx and rdx in the > handler. Here is Tristan's review on the above patch: -- According to the PR, this is ok. (Maybe we should restrict the swap to the known broken libc ?) Tristan.
Re: [Patch, Darwin/libiberty] fix target/48108 by adding a section wrapper scheme to the darwin port.
On 30 Sep 2011, at 00:33, Iain Sandoe wrote: I'll re-jig with the typographical changes (sorry that were so many ... ) I've addressed your points in : http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01974.html specifically (other than typographical issues_ o we retain the ability to read an old-style file. - I wonder if there should be an argument for removing this from the trunk version, but retaining in 4.6. - or, at the least, can we remove it from 4.8? o I dropped the ability to read all the sections from a file (by retaining the requirement to specify a segment name). - I will try to submit a separate patch to do this - which would essentially give the darwin version of simple-object the same functionality as the others (at the moment one can only read data from one segment at a time). o there's no need to amend the header, since the functionality is unchanged. - retested on i686/x86-64 darwin 9/10 with "bootstrap-lto bootstrap- debug" (darwin10 requires the ranlib -c fix to work when XCode >= 3.2.6 ). Once again, apologies for the number of typos in the original. OK for trunk/4.6? Iain libiberty: PR target/48108 * simple-object-mach-o.c (GNU_WRAPPER_SECTS, GNU_WRAPPER_INDEX, GNU_WRAPPER_NAMES): New macros. (simple_object_mach_o_segment): Handle wrapper scheme. (simple_object_mach_o_write_section_header): Allow the segment name to be supplied. (simple_object_mach_o_write_segment): Handle wrapper scheme. Ensure that the top-level segment name in the load command is empty. (simple_object_mach_o_write_to_file): Determine the number of sections during segment output, use that in writing the header. gcc: PR target/48108 * config/darwin.c (top level): Amend comments concerning LTO output. (lto_section_num): New variable. (darwin_lto_section_e): New GTY. (LTO_SECTS_SECTION, LTO_INDEX_SECTION): New. (LTO_NAMES_SECTION): Rename. (darwin_asm_named_section): Record LTO section counts and switches in a vec of darwin_lto_section_e. (darwin_file_start): Remove unused code. (darwin_file_end): Put an LTO section termination label. Handle output of the wrapped LTO sections, index and names table. Index: libiberty/simple-object-mach-o.c === --- libiberty/simple-object-mach-o.c(revision 180097) +++ libiberty/simple-object-mach-o.c(working copy) @@ -1,5 +1,5 @@ /* simple-object-mach-o.c -- routines to manipulate Mach-O object files. - Copyright 2010 Free Software Foundation, Inc. + Copyright 2010, 2011 Free Software Foundation, Inc. Written by Ian Lance Taylor, Google. This program is free software; you can redistribute it and/or modify it @@ -174,6 +174,15 @@ struct mach_o_section_64 #define GNU_SECTION_NAMES "__section_names" +/* A GNU-specific extension to wrap multiple sections using three + mach-o sections within a given segment. The section '__wrapper_sects' + is subdivided according to the index '__wrapper_index' and each sub + sect is named according to the names supplied in '__wrapper_names'. */ + +#define GNU_WRAPPER_SECTS "__wrapper_sects" +#define GNU_WRAPPER_INDEX "__wrapper_index" +#define GNU_WRAPPER_NAMES "__wrapper_names" + /* Private data for an simple_object_read. */ struct simple_object_mach_o_read @@ -214,8 +223,19 @@ struct simple_object_mach_o_attributes unsigned int reserved; }; -/* See if we have a Mach-O file. */ +/* See if we have a Mach-O MH_OBJECT file: + A standard MH_OBJECT (from as) will have three load commands: + 0 - LC_SEGMENT/LC_SEGMENT64 + 1 - LC_SYMTAB + 2 - LC_DYSYMTAB + + The LC_SEGMENT/LC_SEGMENT64 will introduce a single anonymous segment + containing all the sections. + + Files written by simple-object will have only the segment command + (no symbol tables). */ + static void * simple_object_mach_o_match ( unsigned char header[SIMPLE_OBJECT_MATCH_HEADER_LEN], @@ -356,9 +376,30 @@ simple_object_mach_o_section_info (int is_big_endi } } -/* Handle a segment in a Mach-O file. Return 1 if we should continue, - 0 if the caller should return. */ +/* Handle a segment in a Mach-O Object file. + This will callback to the function pfn for each "section found" the meaning + of which depends on gnu extensions to mach-o: + + If we find mach-o sections (with the segment name as specified) which also + contain: a 'sects' wrapper, an index, and a name table, we expand this into + as many sections as are specified in the index. In this case, there will + be a callback for each of these. + + We will also allow an extension that permits long names (more than 16 + characters) to be used with mach-o. In this case, the section name has + a specific format embedding an index into a name table, and the file must + contain such name table. + + Re
Re: [Patch,AVR]: PR50447: Tweak addhi3
2011/10/18 Georg-Johann Lay : > This patch do some tweaks to addhi3 like adding QI scratch register. > > The original *addhi3 insn is still there and located prior to new > addhi3_clobber insn because addhi3 is special to reload (thanks Danis for this > note) so that there is a version with and a version without scratch register. > > Patch passes without regressions. > Which improvements added by this patch ? Denis.
Re: [Patch Darwin/Ada] work around PR target/50678
On 18 Oct 2011, at 13:22, Arnaud Charlet wrote: The audit trail in the PR contains the detective work (mostly by Eric) that concludes we have a long-standing bug in the Darwin x86-64 sigtramp unwind data. This has been filed as radar #10302855, but we need a work-around until that is resolved (possibly forever on older systems). OK for trunk? (what opinion about 4.6?) ada: PR target/50678 * init.c (Darwin/__gnat_error_handler): Transpose rbx and rdx in the handler. Here is Tristan's review on the above patch: -- According to the PR, this is ok. (Maybe we should restrict the swap to the known broken libc ?) It's broken in all Libc versions that are in the wild (AFAICT from looking at the released sources). We will need to deal with configury/ __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ stuff once there is a fixed Libc. Iain
Re: [ARM] Fix PR49641
On 18/10/11 13:19, Bernd Schmidt wrote: > On 10/17/11 14:54, Richard Earnshaw wrote: >> On 14/10/11 14:31, Bernd Schmidt wrote: >>> On 07/13/11 16:03, Richard Earnshaw wrote: > * config/arm/arm.c (store_multiple_sequence): Avoid cases where > the base reg is stored iff compiling for Thumb1. > > * gcc.target/arm/pr49641.c: New test. >>> >>> Ping. Richard, you replied to the mail but didn't comment on the patch. >>> >>> >>> Bernd >>> >> >> >> Sorry, I thought I'd made it clear that I don't think the compiler >> should ever use STM with write-back if the base register is in the >> stored list. We must certainly never do it if the base register is not >> the first register in the list as this has always been unpredictable. >> >> BTW, this is not Thumb1 specific, it applies at all times. >> >> >> So, no the patch is not OK as it stands. > > I'm confused. The patch disables the STM if THUMB1 and the base register > is in the stored list. We only ever enable write-back for Thumb1 (see > gen_stm_seq). So, what's the problem? Well, if that's the case why do we need to test for Thumb1 at all? And why do we only enable write-back for Thumb1? other ISA variants can also do that (I know that Thumb1 requires write-back, but it's optionally available for the other ISA flavours). R.
Re: [Patch Darwin/Ada] work around PR target/50678
> It's broken in all Libc versions that are in the wild (AFAICT from looking > at the released sources). > > We will need to deal with > configury/__ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ stuff once > there is a fixed Libc. OK, would be good to follow up with such patch when/if this is fixed. Arno
Re: [Patch,AVR]: PR50447: Tweak addhi3
Denis Chertykov schrieb: > 2011/10/18 Georg-Johann Lay : >> This patch do some tweaks to addhi3 like adding QI scratch register. >> >> The original *addhi3 insn is still there and located prior to new >> addhi3_clobber insn because addhi3 is special to reload (thanks Danis for >> this >> note) so that there is a version with and a version without scratch register. >> >> Patch passes without regressions. >> > > Which improvements added by this patch ? > > Denis. If the addhi3 is expanded early, the addition happens with QI scratch which avoids reload of constant if target register is in NO_LD. And reduce register pressure as only QI is needed and not reload of constant to HI. Otherwise, there might be sequences like ldi r31, 2; *reload_inhi mov r12, r31 clr r13 add r14, r12 ; *addhi3 adc r15, r13 which now will be ldi r31, 2; addhi3_clobber add r14, r31 adc r15, __zero_reg__ Similar applies if the reload of the constant happens to LD regs: ldi r30, 2; *movhi clr r31 add r14, r12 ; *addhi3 adc r15, r13 will become ldi r30, 2; addhi3_clobber add r14, r30 adc r15, __zero_reg__ For *addhi3 insns the register pressure is not reduced but the insn sequence might be smarter if peep2 comes up with a QI scratch or if it detects a *reload_inhi insn just prior to the addition (and the reg that holds the reloaded constant dies after the addition). As *addhi3 is special to reload, there is still an "ordinary" add addhi insn without scratch. This is easier because, e.g. prologue and epilogue generation generate add insns (not by means of addhi3 expander but by explicit gan_rtx_PLUS). Yet the addhi3 expander factors out the situations when an addhi3 insn is to be generated via addhi3 expander late in the compilation process. Johann
Re: [Patch,AVR]: PR50447: Tweak addhi3
2011/10/18 Georg-Johann Lay : > Denis Chertykov schrieb: >> 2011/10/18 Georg-Johann Lay : >>> This patch do some tweaks to addhi3 like adding QI scratch register. >>> >>> The original *addhi3 insn is still there and located prior to new >>> addhi3_clobber insn because addhi3 is special to reload (thanks Danis for >>> this >>> note) so that there is a version with and a version without scratch >>> register. >>> >>> Patch passes without regressions. >>> >> >> Which improvements added by this patch ? >> >> Denis. > > If the addhi3 is expanded early, the addition happens with QI scratch which > avoids reload of constant if target register is in NO_LD. And reduce register > pressure as only QI is needed and not reload of constant to HI. > > Otherwise, there might be sequences like > > ldi r31, 2 ; *reload_inhi > mov r12, r31 > clr r13 > > add r14, r12 ; *addhi3 > adc r15, r13 > > which now will be > > ldi r31, 2 ; addhi3_clobber > add r14, r31 > adc r15, __zero_reg__ > > Similar applies if the reload of the constant happens to LD regs: > > ldi r30, 2 ; *movhi > clr r31 > > add r14, r12 ; *addhi3 > adc r15, r13 > > will become > > ldi r30, 2 ; addhi3_clobber > add r14, r30 > adc r15, __zero_reg__ > > For *addhi3 insns the register pressure is not reduced but the insn sequence > might be smarter if peep2 comes up with a QI scratch or if it detects a > *reload_inhi insn just prior to the addition (and the reg that holds the > reloaded constant dies after the addition). > > As *addhi3 is special to reload, there is still an "ordinary" add addhi insn > without scratch. This is easier because, e.g. prologue and epilogue generation > generate add insns (not by means of addhi3 expander but by explicit > gan_rtx_PLUS). Yet the addhi3 expander factors out the situations when an > addhi3 insn is to be generated via addhi3 expander late in the compilation > process Please provide any real world example. Denis.
Re: [ARM] Fix PR49641
On 10/18/11 14:30, Richard Earnshaw wrote: > Well, if that's the case why do we need to test for Thumb1 at all? And > why do we only enable write-back for Thumb1? other ISA variants can > also do that (I know that Thumb1 requires write-back, but it's > optionally available for the other ISA flavours). We're not trying to generate a writeback sequence with our peepholes. The problem is that on Thumb, that's the only instruction available, and we want to make use of it if possible (i.e. register dead afterwards etc.). Bernd
[Patch, Fortran, 4.6, committed] PR 50016: mitigate performance regression on Windows by calling less often _commit
This patch has been approved by Janne off list - and has been committed to the 4.6 branch only (Rev. 180138) after bootstrapping and regtesting it. It is essentially my patch from http://gcc.gnu.org/ml/fortran/2011-10/msg00120.html minus the .texi change. And the inquire.c part of Janne's patch at http://gcc.gnu.org/ml/fortran/2011-10/msg00094.html In GCC 4.6 and 4.7, every time when the gfortran-internal buffer is flushed, _commit() was called on _WIN32 (= MinGW and MinGW-w64), which caused a severe slowdown. The reason is that _commit() causes that files are written to the hard disk. With this patch, _commit() is only called when the user explicitly runs the FLUSH subroutine/statement. Additionally, if one inquires the size of an open file, now the internally known size is used instead of calling "stat". * * * For 4.7 the same issue exists but as the release is still a couple of months away, we have time to learn more about how Windows handles buffering and which consistency should be provided. - The gist of the discussion is whether flush() should automatically call _commit or whether it shouldn't, which is a question about consistency vs. performance. For details, see the thread starting at http://gcc.gnu.org/ml/fortran/2011-10/threads.html#00079 For 4.6 we decided that it makes more sense to make the committed patch available for 4.6.2 than to delay it further. The patch should fix most of the performance issues. Comment about which strategy to choose for 4.7 and insight about the buffering of Windows (i.e. whether it affects data and file meta data such as filesizes, or only the latter) are highly welcome. Tobias Index: libgfortran/ChangeLog === --- libgfortran/ChangeLog (revision 180134) +++ libgfortran/ChangeLog (working copy) @@ -1,3 +1,14 @@ +2011-10-18 Tobias Burnus + Janne Blomqvist + + PR fortran/50016 + * io/file_pos.c (st_flush): Call _commit on MinGW(-w64). + * io/intrinsics.c (flush_i4, flush_i8): Ditto. + * io/unix.c (flush_all_units_1, flush_all_units): Ditto. + (buf_flush): Remove _commit call. + * io/inquire.c (inquire_via_unit): Flush internal buffer + and call file_length instead of invoking stat via file_size. + 2011-09-11 Thomas Koenig Backport fron trunk Index: libgfortran/io/file_pos.c === --- libgfortran/io/file_pos.c (revision 180134) +++ libgfortran/io/file_pos.c (working copy) @@ -452,6 +452,10 @@ st_flush (st_parameter_filepos *fpp) fbuf_flush (u, u->mode); sflush (u->s); +#ifdef _WIN32 + /* Without _commit, changes are not visible to other file descriptors. */ + _commit (u->s->fd); +#endif unlock_unit (u); } else Index: libgfortran/io/inquire.c === --- libgfortran/io/inquire.c (revision 180134) +++ libgfortran/io/inquire.c (working copy) @@ -409,7 +409,10 @@ inquire_via_unit (st_parameter_inquire *iqp, gfc_u if (u == NULL) *iqp->size = -1; else - *iqp->size = file_size (u->file, (gfc_charlen_type) u->file_len); + { + sflush (u->s); + *iqp->size = file_length (u->s); + } } } Index: libgfortran/io/unix.c === --- libgfortran/io/unix.c (revision 180134) +++ libgfortran/io/unix.c (working copy) @@ -435,10 +435,6 @@ buf_flush (unix_stream * s) if (s->ndirty != 0) return -1; -#ifdef _WIN32 - _commit (s->fd); -#endif - return 0; } @@ -1542,7 +1538,14 @@ flush_all_units_1 (gfc_unit *u, int min_unit) if (__gthread_mutex_trylock (&u->lock)) return u; if (u->s) - sflush (u->s); + { + sflush (u->s); +#ifdef _WIN32 + /* Without _commit, changes are not visible to other + file descriptors. */ + _commit (u->s->fd); +#endif + } __gthread_mutex_unlock (&u->lock); } u = u->right; @@ -1573,6 +1576,11 @@ flush_all_units (void) if (u->closed == 0) { sflush (u->s); +#ifdef _WIN32 + /* Without _commit, changes are not visible to other + file descriptors. */ + _commit (u->s->fd); +#endif __gthread_mutex_lock (&unit_lock); __gthread_mutex_unlock (&u->lock); (void) predec_waiting_locked (u); Index: libgfortran/io/intrinsics.c === --- libgfortran/io/intrinsics.c (revision 180134) +++ libgfortran/io/intrinsics.c (working copy) @@ -207,6 +207,11 @@ flush_i4 (GFC_INTEGER_4 *unit) if (us != NULL) { sflush (us->s); +#ifdef _WIN32 + /* Without _commit, changes are not visible + to other file descriptors. */ + _commit (u->s->fd); +#endif unlock_unit (us); } } @@ -230,6 +235,11 @@ flush_i8 (GFC_INTEGER_8 *unit) if (us != NULL) { sflush (us->s); +#ifdef _WIN32 + /* Without _commit, changes are not
Re: [ARM] Fix PR49641
On 18/10/11 13:47, Bernd Schmidt wrote: > On 10/18/11 14:30, Richard Earnshaw wrote: >> Well, if that's the case why do we need to test for Thumb1 at all? And >> why do we only enable write-back for Thumb1? other ISA variants can >> also do that (I know that Thumb1 requires write-back, but it's >> optionally available for the other ISA flavours). > > We're not trying to generate a writeback sequence with our peepholes. > The problem is that on Thumb, that's the only instruction available, and > we want to make use of it if possible (i.e. register dead afterwards etc.). > > > Bernd > OK, I understand now. However, I think it's misleading to talk about thumb1 directly here -- it implies that this doesn't apply in other cases. A better patch, would be (I think) --- gcc/config/arm/arm.c(revision 175906) +++ gcc/config/arm/arm.c(working copy) @@ -9950,7 +9950,9 @@ store_multiple_sequence (rtx *operands, /* If it isn't an integer register, then we can't do this. */ if (unsorted_regs[i] < 0 || (TARGET_THUMB1 && unsorted_regs[i] > LAST_LO_REGNUM) - || (TARGET_THUMB2 && unsorted_regs[i] == base_reg) + /* The effects are unpredictable if the base reg is +both updated and stored. */ + || (base_writeback && unsorted_regs[i] == base_reg) || (TARGET_THUMB2 && unsorted_regs[i] == SP_REGNUM) || unsorted_regs[i] > 14) return 0; and then initialize base_writeback at the entry to the function (I presume that currently we would just set it to be TARGET_THUMB1), perhaps with a comment saying that we don't currently support base_writeback for other ISA variants. R.
Re: [patch testsuite]: Adjust tree-ssa/builtin-expect-*.c tests for high cost-branching optimization
- Original Message - From: "Kai Tietz" To: gcc-patches@gcc.gnu.org Cc: "Richard Guenther" Sent: Tuesday, October 18, 2011 1:33:17 PM Subject: [patch testsuite]: Adjust tree-ssa/builtin-expect-*.c tests for high cost-branching optimization Hello, this patch adjusts __builtin_expect tests in tree-ssa so, that short-circuit branch optimization don't lead to fallout. I've used here a multiplication, as simple substraction/addition might get optimized away. ChangeLog 2011-10-18 Kai Tietz * gcc.dg/tree-ssa/builtin-expect-1.c: Adjust test. * gcc.dg/tree-ssa/builtin-expect-2.c: Adjust test. * gcc.dg/tree-ssa/builtin-expect-3.c: Adjust test. * gcc.dg/tree-ssa/builtin-expect-4.c: Adjust test. * gcc.dg/tree-ssa/builtin-expect-5.c: Adjust test. Regression tested for x86_64-unknown-linux-gnu. Ok for apply? Regards, Kai --- gcc-head.orig/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-5.c +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-5.c @@ -1,9 +1,9 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-forwprop" } */ -f (int i, float j) +f (int i, float j, int i2, float j2) { - if (__builtin_expect (i > 0 && __builtin_expect (j != 0, 1), 0)) + if (__builtin_expect ((i * i2) > 0 && __builtin_expect ((j * j2) != 0, 1), 0)) a (); else b (); Well, this change might be not necessary. But to keep in all 5 tests the same conditions checked, I modified this too. On the other hand, this change isn't really necessary, if we assume that this check is testing that __builtin_expect conditions aren't merged on short-circuit optimization. Regards, Kai
Re: [PATCH][ARM] -m{cpu,tune,arch}=native
On 17/10/11 14:24, Richard Earnshaw wrote: There's a presumption in host_detect_local_cpu() that "CPU implementer" will appear before "CPU part" in the output of /proc/cpuinfo. That's probably a pretty safe assumption (and it appears that it will handle that case relatively safely -- ie not crash). I'll let you decide whether that's important enough to fix. Yes, I was aware of that. I believe it wouldn't be worth the effort required to fix it, and it would only complicate the code for no advantage. If, at some point in the future, the format of /proc/cpuinfo changes enough to break this assumption then it'll probably be the least of the problems needing fixing. However another problem I've just spotted is that you never close the file descriptor if you fail to parse the output properly (ie jump to not_found). That should be fixed before this is committed. OK with the second issue resolved. Thanks, fixed and committed, as attached. Andrew 2011-10-18 Andrew Stubbs gcc/ * config.host (arm*-*-linux*): Add driver-arm.o and x-arm. * config/arm/arm.opt: Add 'native' processor_type and arm_arch enum values. * config/arm/arm.h (host_detect_local_cpu): New prototype. (EXTRA_SPEC_FUNCTIONS): New define. (MCPU_MTUNE_NATIVE_SPECS): New define. (DRIVER_SELF_SPECS): New define. * config/arm/driver-arm.c: New file. * config/arm/x-arm: New file. * doc/invoke.texi (ARM Options): Document -mcpu=native, -mtune=native and -march=native. --- a/gcc/config.host +++ b/gcc/config.host @@ -100,6 +100,14 @@ case ${host} in esac case ${host} in + arm*-*-linux*) +case ${target} in + arm*-*-*) + host_extra_gcc_objs="driver-arm.o" + host_xmake_file="${host_xmake_file} arm/x-arm" + ;; +esac +;; alpha*-*-linux* | alpha*-dec-osf*) case ${target} in alpha*-*-linux* | alpha*-dec-osf*) --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -2255,4 +2255,21 @@ extern int making_const_table; } \ while (0) +/* -mcpu=native handling only makes sense with compiler running on + an ARM chip. */ +#if defined(__arm__) +extern const char *host_detect_local_cpu (int argc, const char **argv); +# define EXTRA_SPEC_FUNCTIONS \ + { "local_cpu_detect", host_detect_local_cpu }, + +# define MCPU_MTUNE_NATIVE_SPECS \ + " %{march=native:%http://www.gnu.org/licenses/>. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "tm.h" +#include "configargs.h" + +struct vendor_cpu { + const char *part_no; + const char *arch_name; + const char *cpu_name; +}; + +static struct vendor_cpu arm_cpu_table[] = { +{"0x926", "armv5te", "arm926ej-s"}, +{"0xa26", "armv5te", "arm1026ej-s"}, +{"0xb02", "armv6k", "mpcore"}, +{"0xb36", "armv6j", "arm1136j-s"}, +{"0xb56", "armv6t2", "arm1156t2-s"}, +{"0xb76", "armv6zk", "arm1176jz-s"}, +{"0xc05", "armv7-a", "cortex-a5"}, +{"0xc08", "armv7-a", "cortex-a8"}, +{"0xc09", "armv7-a", "cortex-a9"}, +{"0xc0f", "armv7-a", "cortex-a15"}, +{"0xc14", "armv7-r", "cortex-r4"}, +{"0xc15", "armv7-r", "cortex-r5"}, +{"0xc20", "armv6-m", "cortex-m0"}, +{"0xc21", "armv6-m", "cortex-m1"}, +{"0xc23", "armv7-m", "cortex-m3"}, +{"0xc24", "armv7e-m", "cortex-m4"}, +{NULL, NULL, NULL} +}; + +struct { + const char *vendor_no; + const struct vendor_cpu *vendor_parts; +} vendors[] = { +{"0x41", arm_cpu_table}, +{NULL, NULL} +}; + +/* This will be called by the spec parser in gcc.c when it sees + a %:local_cpu_detect(args) construct. Currently it will be called + with either "arch", "cpu" or "tune" as argument depending on if + -march=native, -mcpu=native or -mtune=native is to be substituted. + + It returns a string containing new command line parameters to be + put at the place of the above two options, depending on what CPU + this is executed. E.g. "-march=armv7-a" on a Cortex-A8 for + -march=native. If the routine can't detect a known processor, + the -march or -mtune option is discarded. + + ARGC and ARGV are set depending on the actual arguments given + in the spec. */ +const char * +host_detect_local_cpu (int argc, const char **argv) +{ + const char *val = NULL; + char buf[128]; + FILE *f; + bool arch; + const struct vendor_cpu *cpu_table = NULL; + + if (argc < 1) +goto not_found; + + arch = strcmp (argv[0], "arch") == 0; + if (!arch && strcmp (argv[0], "cpu") != 0 && strcmp (argv[0], "tune")) +goto not_found; + + f = fopen ("/proc/cpuinfo", "r"); + if (f == NULL) +goto not_found; + + while (fgets (buf, sizeof (buf), f) != NULL) +{ + /* Ensure that CPU implementer is ARM (0x41). */ + if (strncmp (buf, "CPU implementer", sizeof ("CPU implementer") - 1) == 0) + { + int i; + for (i = 0; vendors[i].vendor_no != NULL; i++) + if (strstr (buf, vendors[i].vendor_no) != NULL) + { + cpu_table = vendors[i].vendor_parts; + break; + } + } + + /* Detect arch/cpu. */ +
Re: [Patch]: fix typo in rs6000.c (AIX bootstrap broken)
Tristan, Mike fixed the typo already. Why are you still seeing a problem? - David On Tue, Oct 18, 2011 at 6:52 AM, Tristan Gingold wrote: > Ping… > > On Oct 13, 2011, at 5:11 PM, Tristan Gingold wrote: > >> Hi, >> >> looks like an obvious typo. Ok for trunk ? >> >> Tristan. >> >> 2011-10-13 Tristan Gingold >> >> * config/rs6000/rs6000.c (rs6000_init_builtins): Fix typo. >> >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c >> index 4fd2192..3bfe33e 100644 >> --- a/gcc/config/rs6000/rs6000.c >> +++ b/gcc/config/rs6000/rs6000.c >> @@ -12213,7 +12213,7 @@ rs6000_init_builtins (void) >> >> #if TARGET_XCOFF >> /* AIX libm provides clog as __clog. */ >> - if ((tdecl = builtin_decl_explicit ([BUILT_IN_CLOG))) != NULL_TREE) >> + if ((tdecl = builtin_decl_explicit (BUILT_IN_CLOG)) != NULL_TREE) >> set_user_assembler_name (tdecl, "__clog"); >> #endif >> >> > >
Re: [PATCH] Fix PR46556 (poor address generation)
Greetings, Here is a new revision of the tree portions of this patch. I moved the pattern recognizer to expand, and added additional logic to look for the same pattern in gimple form. I added two more tests to verify the new logic. I didn't run into any problems with the RTL CSE phases. I can't recall for certain what caused me to abandon the expand version previously. There may not have been good reason; too many versions to keep track of and too many interruptions, I suppose. In any case, I'm much happier having this code in the expander. Paolo's RTL logic for unpropagating the zero-offset case is not going to work out as is. It causes a number of performance degradations, which I suspect are due to the pass reordering. That's a separate issue, though, and not needed for this patch. Bootstrapped and regression-tested on powerpc64-linux. SPEC cpu2000 shows a number of small improvements and no significant degradations. SPEC cpu2006 testing is pending. Thanks, Bill 2011-10-18 Bill Schmidt gcc: PR rtl-optimization/46556 * expr.c (tree-pretty-print.h): New include. (restructure_base_and_offset): New function. (restructure_mem_ref): New function. (expand_expr_real_1): In MEM_REF case, attempt restructure_mem_ref first. In normal_inner_ref case, attempt restructure_base_and_offset first. * Makefile.in: Update dependences for expr.o. gcc/testsuite: PR rtl-optimization/46556 * gcc.dg/tree-ssa-pr46556-1.c: New test. * gcc.dg/tree-ssa-pr46556-2.c: Likewise. * gcc.dg/tree-ssa-pr46556-3.c: Likewise. * gcc.dg/tree-ssa-pr46556-4.c: Likewise. * gcc.dg/tree-ssa-pr46556-5.c: Likewise. Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c (revision 0) @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-rtl-expand" } */ + +struct x +{ + int a[16]; + int b[16]; + int c[16]; +}; + +extern void foo (int, int, int); + +void +f (struct x *p, unsigned int n) +{ + foo (p->a[n], p->c[n], p->b[n]); +} + +/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 2 "expand" } } */ +/* { dg-final { scan-rtl-dump-times "const_int 128" 1 "expand" } } */ +/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 1 "expand" } } */ +/* { dg-final { cleanup-rtl-dump "expand" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c (revision 0) @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-rtl-expand" } */ + +struct x +{ + int a[16]; + int b[16]; + int c[16]; +}; + +extern void foo (int, int, int); + +void +f (struct x *p, unsigned int n) +{ + foo (p->a[n], p->c[n], p->b[n]); + if (n > 12) +foo (p->a[n], p->c[n], p->b[n]); + else if (n > 3) +foo (p->b[n], p->a[n], p->c[n]); +} + +/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 6 "expand" } } */ +/* { dg-final { scan-rtl-dump-times "const_int 128" 3 "expand" } } */ +/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 3 "expand" } } */ +/* { dg-final { cleanup-rtl-dump "expand" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c (revision 0) @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-rtl-expand" } */ +struct x +{ + int a[16]; + int b[16]; + int c[16]; +}; + +extern void foo (int, int, int); + +void +f (struct x *p, unsigned int n) +{ + foo (p->a[n], p->c[n], p->b[n]); + if (n > 3) +{ + foo (p->a[n], p->c[n], p->b[n]); + if (n > 12) + foo (p->b[n], p->a[n], p->c[n]); +} +} + +/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 6 "expand" } } */ +/* { dg-final { scan-rtl-dump-times "const_int 128" 3 "expand" } } */ +/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 3 "expand" } } */ +/* { dg-final { cleanup-rtl-dump "expand" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-4.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-4.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-4.c (revision 0) @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-rtl-expand" } */ + +struct x +{ + int a[16]; + int b[16]; + int c[16]; +}; + +extern void foo (int, int, int); + +void +f (struct x *p, unsigned int n) +{ + foo (*((int *)p + n*4), *((int *)p + 32 + n*4), *((int *)p + 16 + n*4)); + if (n > 3) +{ + foo (*((int *)p + n*4), *((int *)p + 32 + n*4), *
Re: [Patch]: fix typo in rs6000.c (AIX bootstrap broken)
On Oct 18, 2011, at 4:23 PM, David Edelsohn wrote: > Tristan, > > Mike fixed the typo already. Why are you still seeing a problem? Indeed, was fixed later. Tristan. > > - David > > On Tue, Oct 18, 2011 at 6:52 AM, Tristan Gingold wrote: >> Ping… >> >> On Oct 13, 2011, at 5:11 PM, Tristan Gingold wrote: >> >>> Hi, >>> >>> looks like an obvious typo. Ok for trunk ? >>> >>> Tristan. >>> >>> 2011-10-13 Tristan Gingold >>> >>> * config/rs6000/rs6000.c (rs6000_init_builtins): Fix typo. >>> >>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c >>> index 4fd2192..3bfe33e 100644 >>> --- a/gcc/config/rs6000/rs6000.c >>> +++ b/gcc/config/rs6000/rs6000.c >>> @@ -12213,7 +12213,7 @@ rs6000_init_builtins (void) >>> >>> #if TARGET_XCOFF >>> /* AIX libm provides clog as __clog. */ >>> - if ((tdecl = builtin_decl_explicit ([BUILT_IN_CLOG))) != NULL_TREE) >>> + if ((tdecl = builtin_decl_explicit (BUILT_IN_CLOG)) != NULL_TREE) >>> set_user_assembler_name (tdecl, "__clog"); >>> #endif >>> >>> >> >>
Re: [PATCH][ARM] -m{cpu,tune,arch}=native
On 18/10/11 15:23, Andrew Stubbs wrote: > + /* Detect arch/cpu. */ > + if (strncmp (buf, "CPU part", sizeof ("CPU part") - 1) == 0) > + { > + int i; > + > + if (cpu_table == NULL) > + goto not_found; > + Which still jumps to not_found without closing f. R.
Patch ping: [PR 50562]: configure
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00291.html configure: add target-libgloss to skipdirs when using --without-newlib. Testing: Bootstrapped/tested on x86_64-linux and sparc-sun-solaris2.10 with a gcc+newlib+libgloss combined tree. Thanks, Jeremy
[Patch, libfortran, 1/3] Simplify handling of special files
Hi, in a few places in libgfortran we have some code for handling special and/or non-seekable files differently. The problem is that special files don't all have some nice consistent behavior. E.g. wrt. seeking, some allow seeking just fine, others allow some seeks and not others, others allow them but always return an offset of 0, and yet others fail the seek completely. The Fortran standard doesn't really help here except for noting that some files may not be positionable, and thus statements requiring the file position to be modified may fail on such files. Obviously, libgfortran itself cannot enumerate all the possible variations for how a special file may behave, and trying to impose some kind of least common denominator may hide essential capability. Having thought about this, my conclusion is that the only thing that makes sense is that we do what the caller asks us to do, and if that fails, we report the error back to the caller and let the caller handle it. The attached patch implements this. Regtested on x86_64-unknown-linux-gnu, Ok for trunk? 2011-10-18 Janne Blomqvist * io/file_pos.c (st_rewind): Handle regular and special files identically. * io/intrinsics.c (fseek_sub): Don't check whether we think the file is seekable, just do what the caller says. * io/transfer.c (skip_record): First try to seek, then fallback to reading and throwing away what we read. * io/unit.c (update_position): Don't check whether file is seekable, just try to do what we're told. (unit_truncate): Likewise. * io/unix.c (struct unix_stream): Remove special_file flag. (buf_flush): Remove code for handling unseekable files. (buf_seek): Likewise. (fd_to_stream): Use buffered IO only for regular files. (file_length): Remove is_seekable() call. (is_seekable): Remove function. (is_special): Likewise. * io/unix.h: Remove prototypes for is_seekable and is_special. -- Janne Blomqvist diff --git a/libgcc/configure b/libgcc/configure old mode 100644 new mode 100755 diff --git a/libgfortran/io/file_pos.c b/libgfortran/io/file_pos.c index b034f38..2caf601 100644 --- a/libgfortran/io/file_pos.c +++ b/libgfortran/io/file_pos.c @@ -407,20 +407,15 @@ st_rewind (st_parameter_filepos *fpp) if (sseek (u->s, 0, SEEK_SET) < 0) generate_error (&fpp->common, LIBERROR_OS, NULL); - /* Handle special files like /dev/null differently. */ - if (!is_special (u->s)) + /* Set this for compatibilty with g77 for /dev/null. */ + if (file_length (u->s) == 0) + u->endfile = AT_ENDFILE; + else { /* We are rewinding so we are not at the end. */ u->endfile = NO_ENDFILE; } - else - { - /* Set this for compatibilty with g77 for /dev/null. */ - if (file_length (u->s) == 0 && stell (u->s) == 0) - u->endfile = AT_ENDFILE; - /* Future refinements on special files can go here. */ - } - + u->current_record = 0; u->strm_pos = 1; u->read_bad = 0; diff --git a/libgfortran/io/intrinsics.c b/libgfortran/io/intrinsics.c index f48bd77..c1287d4 100644 --- a/libgfortran/io/intrinsics.c +++ b/libgfortran/io/intrinsics.c @@ -246,7 +246,7 @@ fseek_sub (int * unit, GFC_IO_INT * offset, int * whence, int * status) gfc_unit * u = find_unit (*unit); ssize_t result = -1; - if (u != NULL && is_seekable(u->s)) + if (u != NULL) { result = sseek(u->s, *offset, *whence); diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c index 12aca97..1e054f8 100644 --- a/libgfortran/io/transfer.c +++ b/libgfortran/io/transfer.c @@ -2823,18 +2823,12 @@ skip_record (st_parameter_dt *dtp, ssize_t bytes) if (dtp->u.p.current_unit->bytes_left_subrecord == 0) return; - if (is_seekable (dtp->u.p.current_unit->s)) + /* Direct access files do not generate END conditions, + only I/O errors. */ + if (sseek (dtp->u.p.current_unit->s, + dtp->u.p.current_unit->bytes_left_subrecord, SEEK_CUR) < 0) { - /* Direct access files do not generate END conditions, - only I/O errors. */ - if (sseek (dtp->u.p.current_unit->s, - dtp->u.p.current_unit->bytes_left_subrecord, SEEK_CUR) < 0) - generate_error (&dtp->common, LIBERROR_OS, NULL); - - dtp->u.p.current_unit->bytes_left_subrecord = 0; -} - else -{ /* Seek by reading data. */ + /* Seeking failed, fall back to seeking by reading data. */ while (dtp->u.p.current_unit->bytes_left_subrecord > 0) { rlength = @@ -2850,8 +2844,9 @@ skip_record (st_parameter_dt *dtp, ssize_t bytes) dtp->u.p.current_unit->bytes_left_subrecord -= readb; } + return; } - + dtp->u.p.current_unit->bytes_left_subrecord = 0; } diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c index e8a9b84..d2fb6d0 100644 --- a/libgfortran/io/unit.c +++ b/libgfortran/io/unit.c @@ -714,23 +714,21 @@ update_position (gfc_unit *u
Re: [PATCH][ARM] -m{cpu,tune,arch}=native
On 18/10/11 15:34, Richard Earnshaw wrote: On 18/10/11 15:23, Andrew Stubbs wrote: + /* Detect arch/cpu. */ + if (strncmp (buf, "CPU part", sizeof ("CPU part") - 1) == 0) + { + int i; + + if (cpu_table == NULL) + goto not_found; + Which still jumps to not_found without closing f. Hmmm, I know I fixed that, I know I did! But I appear to have lost the change somewhere when I updated my checkout? I'll fix it now. Andrew
[Patch, libfortran, 2/3] Introduce a ssize member function
Hi, the attached patch introduces a new "member function" ssize in order to determine the size of a unit. The benefit is that with unbuffered IO, when getting the size of a file we can replace 3 lseek() calls with one fstat() call, and also that for unseekable file we get a size of 0 rather than -1 (which would indicate that the syscall failed). Regtested on x86_64-unknown-linux-gnu, Ok for trunk? 2011-10-18 Janne Blomqvist * io/unix.h (struct stream): Add size function pointer. (ssize): New inline function. (file_length): Remove prototype. * io/unix.c (raw_size): New function. (raw_init): Initialize st.size pointer. (buf_size): New function. (buf_init): Initialize st.size pointer. (open_internal): Likewise. (open_internal4): Likewise. (file_length): Remove function. * io/file_pos.c (st_rewind): Use ssize instead of file_length. * io/open.c (test_endfile): Likewise. * io/transfer.c (data_transfer_init): Likewise. (next_record_r): Likewise. (next_record_w): Likewise. * io/unit.c (update_position): Likewise. -- Janne Blomqvist diff --git a/libgfortran/io/file_pos.c b/libgfortran/io/file_pos.c index 2caf601..c8ecc3a 100644 --- a/libgfortran/io/file_pos.c +++ b/libgfortran/io/file_pos.c @@ -408,7 +408,7 @@ st_rewind (st_parameter_filepos *fpp) generate_error (&fpp->common, LIBERROR_OS, NULL); /* Set this for compatibilty with g77 for /dev/null. */ - if (file_length (u->s) == 0) + if (ssize (u->s) == 0) u->endfile = AT_ENDFILE; else { diff --git a/libgfortran/io/open.c b/libgfortran/io/open.c index b26d14d..0102b9c 100644 --- a/libgfortran/io/open.c +++ b/libgfortran/io/open.c @@ -153,7 +153,7 @@ static const st_option async_opt[] = static void test_endfile (gfc_unit * u) { - if (u->endfile == NO_ENDFILE && file_length (u->s) == stell (u->s)) + if (u->endfile == NO_ENDFILE && ssize (u->s) == stell (u->s)) u->endfile = AT_ENDFILE; } diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c index 1e054f8..26263ae 100644 --- a/libgfortran/io/transfer.c +++ b/libgfortran/io/transfer.c @@ -2627,7 +2627,7 @@ data_transfer_init (st_parameter_dt *dtp, int read_flag) a partial record needs to exist. */ if (dtp->u.p.mode == READING && (dtp->rec - 1) - * dtp->u.p.current_unit->recl >= file_length (dtp->u.p.current_unit->s)) + * dtp->u.p.current_unit->recl >= ssize (dtp->u.p.current_unit->s)) { generate_error (&dtp->common, LIBERROR_BAD_OPTION, "Non-existing record number"); @@ -2944,7 +2944,7 @@ next_record_r (st_parameter_dt *dtp, int done) { bytes_left = (int) dtp->u.p.current_unit->bytes_left; bytes_left = min_off (bytes_left, - file_length (dtp->u.p.current_unit->s) + ssize (dtp->u.p.current_unit->s) - stell (dtp->u.p.current_unit->s)); if (sseek (dtp->u.p.current_unit->s, bytes_left, SEEK_CUR) < 0) @@ -3309,7 +3309,7 @@ next_record_w (st_parameter_dt *dtp, int done) { dtp->u.p.current_unit->strm_pos += len; if (dtp->u.p.current_unit->strm_pos - < file_length (dtp->u.p.current_unit->s)) + < ssize (dtp->u.p.current_unit->s)) unit_truncate (dtp->u.p.current_unit, dtp->u.p.current_unit->strm_pos - 1, &dtp->common); diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c index d2fb6d0..1d36214 100644 --- a/libgfortran/io/unit.c +++ b/libgfortran/io/unit.c @@ -719,7 +719,7 @@ update_position (gfc_unit *u) return; else if (cur == 0) u->flags.position = POSITION_REWIND; - else if (file_length (u->s) == cur) + else if (ssize (u->s) == cur) u->flags.position = POSITION_APPEND; else u->flags.position = POSITION_ASIS; diff --git a/libgfortran/io/unix.c b/libgfortran/io/unix.c index 00f7c72..c87be13 100644 --- a/libgfortran/io/unix.c +++ b/libgfortran/io/unix.c @@ -332,6 +332,16 @@ raw_tell (unix_stream * s) return lseek (s->fd, 0, SEEK_CUR); } +static gfc_offset +raw_size (unix_stream * s) +{ + struct stat statbuf; + int ret = fstat (s->fd, &statbuf); + if (ret == -1) +return ret; + return statbuf.st_size; +} + static int raw_truncate (unix_stream * s, gfc_offset length) { @@ -398,6 +408,7 @@ raw_init (unix_stream * s) s->st.write = (void *) raw_write; s->st.seek = (void *) raw_seek; s->st.tell = (void *) raw_tell; + s->st.size = (void *) raw_size; s->st.trunc = (void *) raw_truncate; s->st.close = (void *) raw_close; s->st.flush = (void *) raw_flush; @@ -584,6 +595,12 @@ buf_tell (unix_stream * s) return buf_seek (s, 0, SEEK_CUR); } +static gfc_offset +buf_size (unix_stream * s) +{ + return s->file_length; +} + static int buf_truncate (unix_stream * s, gfc_offset length) { @@ -613,6 +630,7 @@ buf_init (unix_stream * s) s->st.write = (void *) buf_write; s->st.seek = (void *) bu
Re: [PATCH 3/6] Emit macro expansion related diagnostics
Hey, Dodji, Your patch broke bootstrap on AIX because of the typedef "loc_t" introduced in tree-diagnostics.c. The typedef conflicts with a typedef in an AIX 5.3 header file for locales. AIX should not be using that namespace, but the failure occurs before fix-includes, so it is not possible to fix it there. How can we avoid the namespace conflict? Thanks, David
[Patch, libfortran, 3/3] Update file position lazily
Hi, libgfortran maintains a position flag which is used by the INQUIRE(POSITION=...) statement. Currently we update this flag after every IO statement. For unbuffered IO this is somewhat tedious, as figuring out whether we're at the beginning of a file or the end requires at least two syscalls. The attached patch moves this checking to the inquire implementation, which is certainly less frequently invoked than READ or WRITE. Also, I think I've found a small standards conformance bug. From F2008 (N1830) 9.10.2.23 (page 256): "... ASIS if the connection was opened without changing its position." and "If the file has been repositioned since the connection, the scalar-default-char-variable is assigned a processor-dependent value, which shall not be REWIND unless the file is positioned at its initial point and shall not be APPEND unless the file is positioned so that its endfile record is the next record or at its terminal point if it has no endfile record. " If my understanding of the above is correct, returning ASIS is incorrent unless the position is unchanged since the OPEN statement. Currently we return ASIS by default if it's neither REWIND nor APPEND. So the patch changes the implementation to return the processor-dependent value UNSPECIFIED in this case. Regtested on x86_64-unknown-linux-gnu, Ok for trunk? 2011-10-18 Janne Blomqvist * io/inquire.c (inquire_via_unit): Check whether we're at the beginning or end if the position is unspecified. If the position is not one of the 3 standard ones, return unspecified. * io/io.h (update_position): Remove prototype. * io/transfer.c (next_record): Set the position to unspecified, letting inquire figure it out more exactly when needed. * io/unit.c (update_position): Remove function. testsuite ChangeLog: 2011-10-18 Janne Blomqvist * gfortran.dg/inquire_5.f90: Update testcase to match the standard and current implementation. -- Janne Blomqvist diff --git a/gcc/testsuite/gfortran.dg/inquire_5.f90 b/gcc/testsuite/gfortran.dg/inquire_5.f90 index fe107a1..064f96d 100644 --- a/gcc/testsuite/gfortran.dg/inquire_5.f90 +++ b/gcc/testsuite/gfortran.dg/inquire_5.f90 @@ -1,11 +1,10 @@ ! { dg-do run { target fd_truncate } } -! { dg-options "-std=legacy" } ! ! pr19314 inquire(..position=..) segfaults ! test by thomas.koe...@online.de ! bdavis9...@comcast.net implicit none - character*20 chr + character(len=20) chr open(7,STATUS='SCRATCH') inquire(7,position=chr) if (chr.NE.'ASIS') CALL ABORT @@ -31,7 +30,8 @@ write(7,*)'this is another record' backspace(7) inquire(7,position=chr) - if (chr.NE.'ASIS') CALL ABORT + if (chr.eq.'ASIS' .or. chr .eq. 'REWIND' & + .or. chr .eq. 'APPEND') CALL ABORT rewind(7) inquire(7,position=chr) if (chr.NE.'REWIND') CALL ABORT diff --git a/libgfortran/io/inquire.c b/libgfortran/io/inquire.c index 252f29f..fb525ca 100644 --- a/libgfortran/io/inquire.c +++ b/libgfortran/io/inquire.c @@ -418,24 +418,36 @@ inquire_via_unit (st_parameter_inquire *iqp, gfc_unit * u) if (u == NULL || u->flags.access == ACCESS_DIRECT) p = undefined; else -switch (u->flags.position) - { - case POSITION_REWIND: - p = "REWIND"; - break; - case POSITION_APPEND: - p = "APPEND"; - break; - case POSITION_ASIS: - p = "ASIS"; - break; - default: - /* if not direct access, it must be - either REWIND, APPEND, or ASIS. - ASIS seems to be the best default */ - p = "ASIS"; - break; - } + { + /* If the position is unspecified, check if we can figure + out whether it's at the beginning or end. */ + if (u->flags.position == POSITION_UNSPECIFIED) + { + gfc_offset cur = stell (u->s); + if (cur == 0) + u->flags.position = POSITION_REWIND; + else if (cur != -1 && (ssize (u->s) == cur)) + u->flags.position = POSITION_APPEND; + } + switch (u->flags.position) + { + case POSITION_REWIND: + p = "REWIND"; + break; + case POSITION_APPEND: + p = "APPEND"; + break; + case POSITION_ASIS: + p = "ASIS"; + break; + default: + /* If the position has changed and is not rewind or + append, it must be set to a processor-dependent + value. */ + p = "UNSPECIFIED"; + break; + } + } cf_strcpy (iqp->position, iqp->position_len, p); } diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h index 37353d7..23f07ca 100644 --- a/libgfortran/io/io.h +++ b/libgfortran/io/io.h @@ -608,9 +608,6 @@ internal_proto(get_unit); extern void unlock_unit (gfc_unit *); internal_proto(unlock_unit); -extern void update_position (gfc_unit *); -internal
Re: [PATCH 3/6] Emit macro expansion related diagnostics
On Tue, 18 Oct 2011, David Edelsohn wrote: > Hey, Dodji, > > Your patch broke bootstrap on AIX because of the typedef "loc_t" > introduced in tree-diagnostics.c. The typedef conflicts with a > typedef in an AIX 5.3 header file for locales. AIX should not be > using that namespace, but the failure occurs before fix-includes, so > it is not possible to fix it there. The whole *_t namespace is reserved by POSIX, so it's not unreasonable for AIX to have some AIX-specific *_t typedefs although it seems quite rare for this to cause conflicts in practice (and AIX is obviously risking conflicts with future POSIX revisions by using *_t for its own types). -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH 3/6] Emit macro expansion related diagnostics
On Tue, Oct 18, 2011 at 11:09:04AM -0400, David Edelsohn wrote: > Your patch broke bootstrap on AIX because of the typedef "loc_t" > introduced in tree-diagnostics.c. The typedef conflicts with a > typedef in an AIX 5.3 header file for locales. AIX should not be > using that namespace, but the failure occurs before fix-includes, so > it is not possible to fix it there. At least in POSIX *_t is reserved for the implementation (in any header). So gcc should use something else. Jakub
Re: [PATCH 3/6] Emit macro expansion related diagnostics
On Tue, Oct 18, 2011 at 10:19 AM, Joseph S. Myers wrote: > On Tue, 18 Oct 2011, David Edelsohn wrote: > >> Hey, Dodji, >> >> Your patch broke bootstrap on AIX because of the typedef "loc_t" >> introduced in tree-diagnostics.c. The typedef conflicts with a >> typedef in an AIX 5.3 header file for locales. AIX should not be >> using that namespace, but the failure occurs before fix-includes, so >> it is not possible to fix it there. > > The whole *_t namespace is reserved by POSIX, so it's not unreasonable for > AIX to have some AIX-specific *_t typedefs although it seems quite rare > for this to cause conflicts in practice (and AIX is obviously risking > conflicts with future POSIX revisions by using *_t for its own types). any reason we could not call it location_type? or just location?
[PATCH, m68k] Enable building for ColdFire Linux
Hi, The attached patch (by Maxim Kuvyrkov) enables building GCC for Linux when the --with-arch=cf configuration option is in effect (i.e. when building a compiler targeting ColdFire rather than legacy m68k). Without the patch the build fails with an error such as: .../gcc/config/m68k/t-mlibs:45: *** Error default cpu 'mcf5475' is not in multilib set ''. Stop. OK to apply? Thanks, Julian ChangeLog Maxim Kuvyrkov gcc/ * config/m68k/t-linux (M68K_MLIB_CPU): Add ColdFire CPUs. Index: gcc/config/m68k/t-linux === --- gcc/config/m68k/t-linux (revision 179967) +++ gcc/config/m68k/t-linux (working copy) @@ -18,8 +18,8 @@ EXTRA_MULTILIB_PARTS=crtbegin.o crtend.o crtbeginS.o crtendS.o crtbeginT.o -# Only include multilibs for 680x0 CPUs with an MMU. -M68K_MLIB_CPU += && (CPU ~ "^m680") && (FLAGS ~ "FL_MMU") +# Only include multilibs for 680x0 and ColdFire CPUs with an MMU. +M68K_MLIB_CPU += && ((CPU ~ "^m680") || (CPU ~ "^mcf")) && (FLAGS ~ "FL_MMU") # This rule uses MULTILIB_MATCHES to generate a definition of # SYSROOT_SUFFIX_SPEC.
Re: [Patch, fortran] [00/14] PR fortran/50420 Support coarray subreferences
On Sunday 09 October 2011 18:25:25 Tobias Burnus wrote: > On 07.10.2011 16:38, Mikael Morin wrote: > > The full patchset has passed the fortran testsuite successfully. > > OK for trunk? > > OK for the whole patch set. Thanks for finding and fixing the issue! > Committed as follows. I committed 8 and 9 together because I was fearing breakage. patch revision === 1 180140 2 180141 3 180142 4 180143 5 180144 6 180145 7 180146 8,9180147 10 180148 11 180149 12 180150 13 180151 14 180152 tests 180153 Mikael
[PATCH] i?86 vec_perm fixes and improvements
Hi! Now that there is a better testsuite for constant reshuffling, this patch fixes various issues I found plus improves various permutations. Bootstrapped/regtested on x86_64-linux and i686-linux, additionally tested with GCC_TEST_RUN_EXPENSIVE=1 make check-gcc RUNTESTFLAGS='--target_board=unix\{-msse2,-msse4,-mavx\} dg-torture.exp=vshuf*' on AVX capable box and tested -mavx2 compiled tests on sde. Ok for trunk? Examples of improvements, say for V16HImode: - vpshuflw$228, a(%rip), %ymm0 + vmovdqa a(%rip), %ymm0 vmovdqa %ymm0, c(%rip) (for identity permutation), ICE vs. + vpbroadcastwa(%rip), %ymm0 + vmovdqa %ymm0, c(%rip) using vpbroadcast* for broadcast shuffle, - vpshufb .LC0(%rip), %ymm0, %ymm1 - vpshufb .LC1(%rip), %ymm0, %ymm0 - vpermq $78, %ymm1, %ymm1 - vpor%ymm1, %ymm0, %ymm0 + vperm2i128 $0, %ymm0, %ymm0, %ymm0 + vpshufb .LC0(%rip), %ymm0, %ymm0 when both lanes refer to just one lane, > 20 insns (full two argument non-constant shuffle) into: + vmovdqa a(%rip), %ymm0 + vpunpcklwd b(%rip), %ymm0, %ymm0 + vpshufb .LC2(%rip), %ymm0, %ymm0 + vmovdqa %ymm0, c(%rip) (resp. vpunpckhwd) when interleave gives something vpshufb can reshuffle afterwards, - vmovdqa a(%rip), %ymm0 - vpshufb .LC11(%rip), %ymm0, %ymm1 - vpshufb .LC12(%rip), %ymm0, %ymm0 - vpermq $78, %ymm1, %ymm1 - vpor%ymm1, %ymm0, %ymm0 + vpermq $156, a(%rip), %ymm0 + vpshufb .LC4(%rip), %ymm0, %ymm0 another case where vpermq can shuffle quadwords into something vpshufb can reshuffle, etc. 2011-10-18 Jakub Jelinek * config/i386/i386.c (ix86_expand_vec_perm): In merge_two use mode SUBREG of operands[0] as target. (valid_perm_using_mode_p): Don't ignore higher bits of d->perm. (expand_vec_pshufb): For V8SImode vmode emit avx2_permvarv8si. (expand_vec_perm_1): Handle identity and some broadcast permutations. (expand_vec_perm_interleave2): Handle also 32-byte modes, using vperm2[fi]128 or vpunpck[lh]* followed by single insn permutation. For d->testing_p return true earlier to avoid creating more GC garbage. (expand_vec_perm_vpermq_perm_1): New function. (expand_vec_perm_vpshufb2_vpermq): For d->testing_p return true earlier to avoid creating more GC garbage. Fix handling of V16HImode. Avoid some SUBREGs in SET_DEST. (expand_vec_perm_broadcast_1): Return false for 32-byte integer vector modes. (expand_vec_perm_vpshufb4_vpermq2): New function. (ix86_expand_vec_perm_builtin_1): Call expand_vec_perm_vpermq_perm_1 and expand_vec_perm_vpshufb4_vpermq2. --- gcc/config/i386/i386.c.jj 2011-10-17 22:27:39.0 +0200 +++ gcc/config/i386/i386.c 2011-10-18 14:08:58.0 +0200 @@ -19663,7 +19663,7 @@ ix86_expand_vec_perm (rtx operands[]) mask = expand_simple_binop (maskmode, AND, mask, vt, NULL_RTX, 0, OPTAB_DIRECT); - xops[0] = operands[0]; + xops[0] = gen_lowpart (mode, operands[0]); xops[1] = gen_lowpart (mode, t2); xops[2] = gen_lowpart (mode, t1); xops[3] = gen_rtx_EQ (maskmode, mask, vt); @@ -35006,8 +35006,7 @@ valid_perm_using_mode_p (enum machine_mo return false; else for (j = 1; j < chunk; ++j) - if ((d->perm[i] & (d->nelt - 1)) + j - != (d->perm[i + j] & (d->nelt - 1))) + if (d->perm[i] + j != d->perm[i + j]) return false; return true; @@ -35138,6 +35137,8 @@ expand_vec_perm_pshufb (struct expand_ve emit_insn (gen_ssse3_pshufbv16qi3 (target, op0, vperm)); else if (vmode == V32QImode) emit_insn (gen_avx2_pshufbv32qi3 (target, op0, vperm)); + else + emit_insn (gen_avx2_permvarv8si (target, vperm, op0)); } else { @@ -35163,9 +35164,58 @@ expand_vec_perm_1 (struct expand_vec_per if (d->op0 == d->op1) { int mask = nelt - 1; + bool identity_perm = true; + bool broadcast_perm = true; for (i = 0; i < nelt; i++) - perm2[i] = d->perm[i] & mask; + { + perm2[i] = d->perm[i] & mask; + if (perm2[i] != i) + identity_perm = false; + if (perm2[i]) + broadcast_perm = false; + } + + if (identity_perm) + { + if (!d->testing_p) + emit_move_insn (d->target, d->op0); + return true; + } + else if (broadcast_perm && TARGET_AVX2) + { + /* Use vpbroadcast{b,w,d}. */ + rtx op = d->op0, (*gen) (rtx, rtx) = NULL; + switch (d->vmode) + { + case V32QImode: + op = gen_lowpart (V16QImode, op); + gen = gen_avx2_pbroadcastv32qi; + break; + case V16HImode: + op = gen_lowpart (V8HImode, op); + gen = gen_avx2_pbro
Re: [PATCH] Clear DECL_GIMPLE_REG_P when making parameter copy addressable (PR tree-optimization/50735)
On Mon, Oct 17, 2011 at 08:49:34AM +0200, Richard Guenther wrote: > On Sun, Oct 16, 2011 at 5:47 PM, Jakub Jelinek wrote: > I think this should be exactly the other way around, using create_tmp_var, > copying TREE_ADDRESSABLE and setting DECL_GIMPLE_REG_P if it is not > addressable. > > Ok with that change. Here is what I've committed after bootstrap/regtest on x86_64-linux and i686-linux: 2011-10-18 Jakub Jelinek PR tree-optimization/50735 * function.c (gimplify_parameters): Use create_tmp_var instead of create_tmp_reg. If parm is not TREE_ADDRESSABLE and type is complex or vector type, set DECL_GIMPLE_REG_P. --- gcc/function.c.jj 2011-10-17 09:20:38.0 +0200 +++ gcc/function.c 2011-10-18 13:54:02.0 +0200 @@ -3617,7 +3617,7 @@ gimplify_parameters (void) && compare_tree_int (DECL_SIZE_UNIT (parm), STACK_CHECK_MAX_VAR_SIZE) > 0)) { - local = create_tmp_reg (type, get_name (parm)); + local = create_tmp_var (type, get_name (parm)); DECL_IGNORED_P (local) = 0; /* If PARM was addressable, move that flag over to the local copy, as its address will be taken, @@ -3625,6 +3625,9 @@ gimplify_parameters (void) as we'll query that flag during gimplification. */ if (TREE_ADDRESSABLE (parm)) TREE_ADDRESSABLE (local) = 1; + else if (TREE_CODE (type) == COMPLEX_TYPE + || TREE_CODE (type) == VECTOR_TYPE) + DECL_GIMPLE_REG_P (local) = 1; } else { Jakub
[PATCH] Fix gcc.dg/guality/asm-1.c regressions on 4.6 branch
Hi! Apparently a recent change (2011-10-1[23] or so) regressed gcc.dg/guality/asm-1.c test on x86_64-linux on the 4.6 branch. This turned out to be a bug in loc_descriptor which has been passing wrong mode on the recursive call, so when we had var_location note containing SImode subreg of some DImode expression, the inner loc_descriptor would just give up because of the mode mismatch. Fixed thusly (here the trunk version of the patch, attached 4.6 version), bootstrapped/regtested on x86_64-linux and i686-linux (both branches in each case), ok for trunk/4.6? 2011-10-18 Jakub Jelinek * dwarf2out.c (loc_descriptor): For SUBREG pass SUBREG_REG's mode as second argument instead of mode. --- gcc/dwarf2out.c.jj 2011-09-08 11:21:10.0 +0200 +++ gcc/dwarf2out.c 2011-10-18 14:09:46.0 +0200 @@ -12502,7 +12502,8 @@ loc_descriptor (rtx rtl, enum machine_mo legitimate to make the Dwarf info refer to the whole register which contains the given subreg. */ if (REG_P (SUBREG_REG (rtl)) && subreg_lowpart_p (rtl)) - loc_result = loc_descriptor (SUBREG_REG (rtl), mode, initialized); + loc_result = loc_descriptor (SUBREG_REG (rtl), +GET_MODE (SUBREG_REG (rtl)), initialized); else goto do_default; break; Jakub 2011-10-18 Jakub Jelinek * dwarf2out.c (loc_descriptor): For SUBREG pass SUBREG_REG's mode as second argument instead of mode. --- gcc/dwarf2out.c.jj 2011-08-02 10:42:23.0 +0200 +++ gcc/dwarf2out.c 2011-10-18 11:21:38.0 +0200 @@ -14608,7 +14608,8 @@ loc_descriptor (rtx rtl, enum machine_mo up an entire register. For now, just assume that it is legitimate to make the Dwarf info refer to the whole register which contains the given subreg. */ - loc_result = loc_descriptor (SUBREG_REG (rtl), mode, initialized); + loc_result = loc_descriptor (SUBREG_REG (rtl), + GET_MODE (SUBREG_REG (rtl)), initialized); break; case REG:
Re: [PATCH, i386 tests] New tests to check vectorization for AVX2 insns.
On Mon, Oct 17, 2011 at 7:49 AM, Kirill Yukhin wrote: > Thanks, guys, could anybody please commit that? > I checked it in for you. -- H.J.
[commit, spu] Fix -fPIC wrong-code regression
Hello, current mainline miscompiles -fPIC code on SPU, which leads to wrong code in libgcc, causing a large number of test failures. The problem is in the back-end's get_pic_reg routine which implements an optimization: in a leaf function, if register 74 is unused so far, use that register instead of the standard register 124 as PIC register. This helps becaues 74 is call-clobbered, and thus no prologue/epilogue code is needed. However, there is a problem in how the "register 74 is unused so far" check is implemented: with current mainline, this check succeeds in the post-reload splitter where uses of the PIC register are introduced. However, that same check is later repeated in prologue generation in order to emit code to set up the PIC register. At this point, register 74 is now considered as in use. This is true, of course, since it is in use by the code that was added in the post-reload splitter ... It would appear this used to work in the past because DF information was not updated between the post-reload splitter and prologue emitter passes. This seems to have changed recently (maybe due to the shrink- wrapping pass?), and now the back-end bug is exposed. Fortunately, the fix is simple: we perform the DF check only once and remember the result in cfun->machine. The patch below implements this. Tested on spu-elf, fixes many testcase failures. Committed to mainline. Bye, Ulrich ChangeLog: * config/spu/spu.c (struct machine_function): New data structure. (spu_init_machine_status): New function. (spu_option_override): Install it. (get_pic_reg): Set and use cfun->machine->pic_reg. (spu_split_immediate): Do not set crtl->uses_pic_offset_table. (need_to_save_reg): Use cfun->machine->pic_reg instead of checking crtl->uses_pic_offset_table. (spu_expand_prologue): Likewise. Index: gcc/config/spu/spu.c === *** gcc/config/spu/spu.c(revision 179977) --- gcc/config/spu/spu.c(working copy) *** static void spu_setup_incoming_varargs ( *** 500,509 --- 500,526 struct gcc_target targetm = TARGET_INITIALIZER; + /* Define the structure for the machine field in struct function. */ + struct GTY(()) machine_function + { + /* Register to use for PIC accesses. */ + rtx pic_reg; + }; + + /* How to allocate a 'struct machine_function'. */ + static struct machine_function * + spu_init_machine_status (void) + { + return ggc_alloc_cleared_machine_function (); + } + /* Implement TARGET_OPTION_OVERRIDE. */ static void spu_option_override (void) { + /* Set up function hooks. */ + init_machine_status = spu_init_machine_status; + /* Small loops will be unpeeled at -O3. For SPU it is more important to keep code small by default. */ if (!flag_unroll_loops && !flag_peel_loops) *** print_operand (FILE * file, rtx x, int c *** 1741,1752 static rtx get_pic_reg (void) { - rtx pic_reg = pic_offset_table_rtx; if (!reload_completed && !reload_in_progress) abort (); ! if (current_function_is_leaf && !df_regs_ever_live_p (LAST_ARG_REGNUM)) ! pic_reg = gen_rtx_REG (SImode, LAST_ARG_REGNUM); ! return pic_reg; } /* Split constant addresses to handle cases that are too large. --- 1758,1779 static rtx get_pic_reg (void) { if (!reload_completed && !reload_in_progress) abort (); ! ! /* If we've already made the decision, we need to keep with it. Once we've ! decided to use LAST_ARG_REGNUM, future calls to df_regs_ever_live_p may ! return true since the register is now live; this should not cause us to ! "switch back" to using pic_offset_table_rtx. */ ! if (!cfun->machine->pic_reg) ! { ! if (current_function_is_leaf && !df_regs_ever_live_p (LAST_ARG_REGNUM)) ! cfun->machine->pic_reg = gen_rtx_REG (SImode, LAST_ARG_REGNUM); ! else ! cfun->machine->pic_reg = pic_offset_table_rtx; ! } ! ! return cfun->machine->pic_reg; } /* Split constant addresses to handle cases that are too large. *** spu_split_immediate (rtx * ops) *** 1849,1855 { rtx pic_reg = get_pic_reg (); emit_insn (gen_addsi3 (ops[0], ops[0], pic_reg)); - crtl->uses_pic_offset_table = 1; } return flag_pic || c == IC_IL2s; } --- 1876,1881 *** need_to_save_reg (int regno, int saving) *** 1875,1883 return 1; if (flag_pic && regno == PIC_OFFSET_TABLE_REGNUM ! && (!saving || crtl->uses_pic_offset_table) ! && (!saving ! || !current_function_is_leaf || df_regs_ever_live_p (LAST_ARG_REGNUM))) return 1; return 0; } --- 1901,1907 return 1; if (flag_pic && regno == PIC_OFFSET_TABLE_REGNUM ! && (!saving || cfun->machine->pic_reg == pic_offset_table_
Re: [PATCH, i386 tests] New tests to check vectorization for AVX2 insns.
Thank you! K On Tue, Oct 18, 2011 at 7:42 PM, H.J. Lu wrote: > On Mon, Oct 17, 2011 at 7:49 AM, Kirill Yukhin > wrote: >> Thanks, guys, could anybody please commit that? >> > > I checked it in for you. > > > -- > H.J. >
Re: [PATCH 3/6] Emit macro expansion related diagnostics
David Edelsohn writes: > Your patch broke bootstrap on AIX because of the typedef "loc_t" > introduced in tree-diagnostics.c. The typedef conflicts with a > typedef in an AIX 5.3 header file for locales. AIX should not be > using that namespace, but the failure occurs before fix-includes, so > it is not possible to fix it there. > > How can we avoid the namespace conflict? I'll change the name to location_type as Gaby proposed and commit it as obvious. Sorry for the inconvenience. -- Dodji
[PATCH] Fix computed gotos on m68k
Hi, This patch fixes computed gotos on m68k, and probably other targets too (the fix is in the middle end). Several tests fail at present with "-O3 -fomit-frame-pointer". One example of erroneous behaviour is as follows: the function 'x' in comp-goto-1.c compiles to: x: lea (-104,%sp),%sp lea (104,%sp),%a0 move.l %a0,92(%sp) fmovem #63,44(%sp) movem.l #31996,(%sp) move.l %sp,96(%sp) move.l 108(%sp),-(%sp) lea (96,%sp),%a0 jsr y.1217 .L3: .L12: move.l 112(%sp),%d0 <--- wrong offset, should be 108. movem.l (%sp),#31996 fmovem 44(%sp),#63 lea (104,%sp),%sp rts Where the source code looks like: int x(a) { __label__ xlab; void y(a) { void *x = &&llab; if (a==-1) goto *x; if (a==0) goto xlab; llab: y (a-1); } y (a); xlab:; return a; } The offset is intended to load the argument value 'a' for the "return a;" statement, but actually loads one word beyond that. This has been broken because since mainline revision r160124, which (correctly) causes 'y' to be flagged as noreturn -- y never returns via the normal function return route. If ipa-pure-const.c is hacked to check that a function does not perform non-local gotos before marking it noreturn, the original (correct) offset is restored -- but that is simply papering over the problem. Prior to r160124, or with the aforementioned patch to inhibit the "noreturn" attribute for the function y, the call to y from x (just before "xlab:") has a goto to the following label emitted at expand time. Before/after that patch is applied, the diff of 148r.expand around the relevant part looks like: # BLOCK 2 freq:1 # PRED: ENTRY [100.0%] (fallthru,exec) y (a_1(D)); [static-chain: &FRAME.0] - # SUCC: 3 [100.0%] (ab,exec) + goto (xlab); + # SUCC: 3 [61.0%] (ab,exec) 4 [39.0%] (fallthru,exec) - # BLOCK 3 freq:1 - # PRED: 2 [100.0%] (ab,exec) + # BLOCK 3 freq:6100 + # PRED: 2 [61.0%] (ab,exec) : [non-local] # SUCC: 4 [100.0%] (fallthru,exec) # BLOCK 4 freq:1 - # PRED: 3 [100.0%] (fallthru,exec) + # PRED: 2 [39.0%] (fallthru,exec) 3 [100.0%] (fallthru,exec) xlab: Now, this is important because when the "goto " is emitted, a stack adjustment is also emitted to pop y's arguments (via dojump.c:do_pending_stack_adjust). Without that (i.e. the present state with no patches), when later passes try to figure out elimination offsets (from arg pointer to frame pointer, frame pointer to stack pointer and so on) at the labels L0/xlab, they get the wrong answer. (This happens in reload, called from ira.) So, on to the attached fix. As mentioned previously, GCC tracks elimination offsets throughout each function, which can vary as e.g. other functions are called. There is a concept of an "initial" offset: I believe that when nonlocal gotos are executed, and hence at all the labels a nonlocal goto may reach, the elimination offsets must have their "initial" values. We can find out if a label is the target of a nonlocal goto (and hence force the use of initial offsets) in a somewhat roundabout way, by looking up its containing basic block, seeing if that BB is a nonlocal goto target, then seeing if the label is the first instruction in the block. This seems slightly clumsy, but I didn't find another way of doing it. Interestingly perhaps, the comment near the fix in reload1.c:set_label_offsets hints at a possible issue which may be related: case CODE_LABEL: /* If we know nothing about this label, set the desired offsets. Note that this sets the offset at a label to be the offset before a label if we don't know anything about the label. This is not correct for the label after a BARRIER, but is the best guess we can make. If we guessed wrong, we will suppress an elimination that might have been possible had we been able to guess correctly. */ But obviously, in our case, rather than just failing to make an elimination which may have been possible, we get wrong code instead (there is a barrier before our non-local-goto-receiving label). Anyway. The attached patch appears to work fine, and we get the following changes in test results (cross-building and testing to ColdFire Linux): FAIL -> PASS: gcc.c-torture/execute/920501-7.c execution, -O3 -fomit-frame-pointer FAIL -> PASS: gcc.c-torture/execute/comp-goto-2.c execution, -O3 -fomit-frame-pointer FAIL -> PASS: gcc.dg/torture/stackalign/comp-goto-1.c -O3 -fomit-frame-pointer execution test FAIL -> PASS: gcc.dg/torture/stackalign/comp-goto-1.c -O3 -fomit-frame-pointer execution test [#2] FAIL -> PASS: gcc.dg/torture/stackalign/non-local-goto-4.c -O3 -fomit-frame-pointer execution test FAIL -> PASS: gcc.dg/torture/stackalign/non-local-goto-4.c -O3 -fomit-frame-pointer execution test [#2] Any comments, or OK to apply? Thanks, J
[PATCH] libiberty: fix psignal parameter type
When libiberty defines psignal, it doesn't use the canonical signature. This came up as a problem in a configuration where libiberty wants to define psignal itself, but the build environment's declares it too. This was a --with-newlib configuration using the newlib trunk, where newlib does define psignal (and strsignal, among other things), but libiberty's configure thinks it knows exactly what newlib does and doesn't define without checking. This hard-coding seems unwise to me, exactly because of the potential for cases like this, where newlib's set of available functions changes over time. I think the fragility of that hard-coding ought to be addressed somehow. But regardless, this change alleviates the immediate problem, and is otherwise harmless. Thanks, Roland libiberty/ 2011-10-18 Roland McGrath * strsignal.c (psignal): Use const second in parameter type. * functions.texi: Updated. diff --git a/libiberty/functions.texi b/libiberty/functions.texi index c9df186..2945c61 100644 --- a/libiberty/functions.texi +++ b/libiberty/functions.texi @@ -1097,7 +1097,7 @@ documented. @end deftypefn @c strsignal.c:541 -@deftypefn Supplemental void psignal (int @var{signo}, char *@var{message}) +@deftypefn Supplemental void psignal (int @var{signo}, const char *@var{message}) Print @var{message} to the standard error, followed by a colon, followed by the description of the signal specified by @var{signo}, diff --git a/libiberty/strsignal.c b/libiberty/strsignal.c index 666b1b4..3b56d16 100644 --- a/libiberty/strsignal.c +++ b/libiberty/strsignal.c @@ -538,7 +538,7 @@ strtosigno (const char *name) /* -@deftypefn Supplemental void psignal (int @var{signo}, char *@var{message}) +@deftypefn Supplemental void psignal (int @var{signo}, const char *@var{message}) Print @var{message} to the standard error, followed by a colon, followed by the description of the signal specified by @var{signo}, @@ -551,7 +551,7 @@ followed by a newline. #ifndef HAVE_PSIGNAL void -psignal (int signo, char *message) +psignal (int signo, const char *message) { if (signal_names == NULL) {
Re: [PATCH] libiberty: fix psignal parameter type
On Tue, Oct 18, 2011 at 9:49 AM, Roland McGrath wrote: > When libiberty defines psignal, it doesn't use the canonical signature. > This came up as a problem in a configuration where libiberty wants to > define psignal itself, but the build environment's declares > it too. > > This was a --with-newlib configuration using the newlib trunk, where newlib > does define psignal (and strsignal, among other things), but libiberty's > configure thinks it knows exactly what newlib does and doesn't define > without checking. This hard-coding seems unwise to me, exactly because of > the potential for cases like this, where newlib's set of available > functions changes over time. libiberty is no longer compiled for the target so this should never happen really. Thanks, Andrew Pinski
Re: [PATCH] libiberty: fix psignal parameter type
On Tue, Oct 18, 2011 at 9:50 AM, Andrew Pinski wrote: > libiberty is no longer compiled for the target so this should never > happen really. I see. I was indeed using an older source base, and just noticed that all the offending configure logic was still the same. Perhaps all the --with-newlib logic in libiberty's configure should be removed now? Regardless, the psignal change is an appropriate cleanup in its own right. Thanks, Roland
Re: [Patch,AVR]: PR50447: Tweak addhi3
Denis Chertykov schrieb: > 2011/10/18 Georg-Johann Lay : >> Denis Chertykov schrieb: >>> 2011/10/18 Georg-Johann Lay : This patch do some tweaks to addhi3 like adding QI scratch register. The original *addhi3 insn is still there and located prior to new addhi3_clobber insn because addhi3 is special to reload (thanks Danis for this note) so that there is a version with and a version without scratch register. Patch passes without regressions. >>> Which improvements added by this patch ? >>> >>> Denis. >> If the addhi3 is expanded early, the addition happens with QI scratch which >> avoids reload of constant if target register is in NO_LD. And reduce register >> pressure as only QI is needed and not reload of constant to HI. >> >> Otherwise, there might be sequences like >> >> ldi r31, 2; *reload_inhi >> mov r12, r31 >> clr r13 >> >> add r14, r12 ; *addhi3 >> adc r15, r13 >> >> which now will be >> >> ldi r31, 2; addhi3_clobber >> add r14, r31 >> adc r15, __zero_reg__ >> >> Similar applies if the reload of the constant happens to LD regs: >> >> ldi r30, 2; *movhi >> clr r31 >> >> add r14, r12 ; *addhi3 >> adc r15, r13 >> >> will become >> >> ldi r30, 2; addhi3_clobber >> add r14, r30 >> adc r15, __zero_reg__ >> >> For *addhi3 insns the register pressure is not reduced but the insn sequence >> might be smarter if peep2 comes up with a QI scratch or if it detects a >> *reload_inhi insn just prior to the addition (and the reg that holds the >> reloaded constant dies after the addition). >> >> As *addhi3 is special to reload, there is still an "ordinary" add addhi insn >> without scratch. This is easier because, e.g. prologue and epilogue >> generation >> generate add insns (not by means of addhi3 expander but by explicit >> gan_rtx_PLUS). Yet the addhi3 expander factors out the situations when an >> addhi3 insn is to be generated via addhi3 expander late in the compilation >> process > > Please provide any real world example. > > Denis. Consider avr-libc (under the assumption that it is "real world" code): In avr-libc's build directory, and with the patch integrated: $ cd avr/lib/avr4 $ make clean && make CFLAGS='-save-temps -dp -Os' $ grep -A 2 'addhi3_clobber\/2' *.s > out-nopeep2.txt (see attachment) $ grep 'addhi3_clobber\/2' *.s | wc -l 33 This shows that the insns are already there before peep2 and thus no reload of 16-bit constant is needed; an 8-bit scratch is sufficient. Alternatively, the implementation could omit the expansion to addhi3_clobber in addhi3 expander and instead rely completely on peep2. However, that does not reduce register pressure because a 16-bit register will be allocated and the peep2 just prints things smarter and needs just a QI scratch to call avr_out_plus_clobber. For +/-1, the addition with SEC/ADD/ADC resp. SEC/SBC/SBC leaves cc0 in a mess. as most loops use +/-1 on the counter variable, LDI/SUB/SBC is not shorter but better because it sets cc0. So you like this patch? Or prefer a patch that is neutral with respect to register allocator and just uses peep2 to print things smarter? Johann dtoa_prf.s: ldi r31,3; , ; 338 addhi3_clobber/2[length = 3] dtoa_prf.s- add r12,r31 ; s, dtoa_prf.s- adc r13,__zero_reg__ ; s -- dtoa_prf.s: ldi r31,3; , ; 447 addhi3_clobber/2[length = 3] dtoa_prf.s- add r12,r31 ; s, dtoa_prf.s- adc r13,__zero_reg__ ; s -- fgets.s:ldi r31,1; , ; 70 addhi3_clobber/2[length = 3] fgets.s-sub r14,r31 ; ivtmp.9, fgets.s-sbc r15,__zero_reg__ ; ivtmp.9 -- realloc.s: ldi r17,2; , ; 80 addhi3_clobber/2[length = 3] realloc.s- add r12,r17 ; tmp83, realloc.s- adc r13,__zero_reg__ ; -- realloc.s: ldi r18,2; , ; 85 addhi3_clobber/2[length = 3] realloc.s- add r12,r18 ; tmp84, realloc.s- adc r13,__zero_reg__ ; -- strtod.s: ldi r31,1; , ; 101 addhi3_clobber/2[length = 3] strtod.s- sub r14,r31 ; D.2581, strtod.s- sbc r15,__zero_reg__ ; D.2581 -- strtod.s: ldi r18,2; , ; 110 addhi3_clobber/2[length = 3] strtod.s- add r14,r18 ; nptr, strtod.s- adc r15,__zero_reg__ ; nptr -- strtod.s: ldi r21,7; , ; 120 addhi3_clobber/2[length = 3] strtod.s- add r14,r21 ; nptr, strtod.s- adc r15,__zero_reg__ ; nptr -- strtod.s: ldi r31,255 ; , ; 175 addhi3_clobber/2[length = 3] strtod.s- sub r14,r31 ; exp, strtod.s- sbc r15,r31 ; exp, -- strtod.s: ldi r18,1; , ; 185 addhi3_clobber/2[length = 3] strtod.s- sub r14,r18 ; exp, strtod.s- sbc r15,__zero_reg__ ; exp -- strtod.s: ldi r31,24 ; , ; 376 addhi3_clobbe
Re: [PATCH] libiberty: fix psignal parameter type
On Tue, Oct 18, 2011 at 9:49 AM, Roland McGrath wrote: > > libiberty/ > 2011-10-18 Roland McGrath > > * strsignal.c (psignal): Use const second in parameter type. > * functions.texi: Updated. This is OK. Thanks. Ian
Re: [PATCH] Fix computed gotos on m68k
Julian Brown writes: > Hi, > > This patch fixes computed gotos on m68k, and probably other targets too > (the fix is in the middle end). Several tests fail at present with "-O3 > -fomit-frame-pointer". This is the same bug as PR47918. As described in that PR trail, the bug can be triggered also on i386. Thanks for debugging this very nasty issue! > [...] /Mikael
Re: [PATCH] libiberty: fix psignal parameter type
On Tue, Oct 18, 2011 at 10:18 AM, Ian Lance Taylor wrote: > On Tue, Oct 18, 2011 at 9:49 AM, Roland McGrath wrote: >> >> libiberty/ >> 2011-10-18 Roland McGrath >> >> * strsignal.c (psignal): Use const second in parameter type. >> * functions.texi: Updated. > > This is OK. Thanks. But last I checked, I'm not a GCC committer. So somebody has to do it for me. Thanks, Roland
Re: [PATCH] i?86 vec_perm fixes and improvements
On 10/18/2011 08:30 AM, Jakub Jelinek wrote: > * config/i386/i386.c (ix86_expand_vec_perm): In merge_two use > mode SUBREG of operands[0] as target. > (valid_perm_using_mode_p): Don't ignore higher bits of d->perm. > (expand_vec_pshufb): For V8SImode vmode emit avx2_permvarv8si. > (expand_vec_perm_1): Handle identity and some broadcast > permutations. > (expand_vec_perm_interleave2): Handle also 32-byte modes, using > vperm2[fi]128 or vpunpck[lh]* followed by single insn permutation. > For d->testing_p return true earlier to avoid creating more GC > garbage. > (expand_vec_perm_vpermq_perm_1): New function. > (expand_vec_perm_vpshufb2_vpermq): For d->testing_p return true > earlier to avoid creating more GC garbage. Fix handling of > V16HImode. Avoid some SUBREGs in SET_DEST. > (expand_vec_perm_broadcast_1): Return false for 32-byte integer > vector modes. > (expand_vec_perm_vpshufb4_vpermq2): New function. > (ix86_expand_vec_perm_builtin_1): Call expand_vec_perm_vpermq_perm_1 > and expand_vec_perm_vpshufb4_vpermq2. Ok. r~
C++ PATCH for c++/50742 (ICE on switch with local using-decl)
A using-declaration adds a TREE_LIST to the list of names on a binding level, so we need to cope with that here. Tested x86_64-pc-linux-gnu, applying to trunk. commit 052e893fe307f33ca3d3c2ead06248e0ef738f16 Author: Jason Merrill Date: Mon Oct 17 22:04:08 2011 -0400 PR c++/50742 * decl.c (check_previous_goto_1): Handle using-decl. diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index 8b5033f..4b5b6c8 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -2683,7 +2683,8 @@ check_previous_goto_1 (tree decl, cp_binding_level* level, tree names, tree new_decls, old_decls = (b == level ? names : NULL_TREE); for (new_decls = b->names; new_decls != old_decls; - new_decls = DECL_CHAIN (new_decls)) + new_decls = (DECL_P (new_decls) ? DECL_CHAIN (new_decls) + : TREE_CHAIN (new_decls))) { int problem = decl_jump_unsafe (new_decls); if (! problem) diff --git a/gcc/testsuite/g++.dg/lookup/using23.C b/gcc/testsuite/g++.dg/lookup/using23.C new file mode 100644 index 000..5dd8d85 --- /dev/null +++ b/gcc/testsuite/g++.dg/lookup/using23.C @@ -0,0 +1,13 @@ +// PR c++/50742 + +typedef int A; + +void f(int i) +{ + switch (i) +{ +case 0: + using ::A; +default:; +} +}
Re: [PATCH 3/6] Emit macro expansion related diagnostics
Dodji Seketeli writes: > David Edelsohn writes: > >> Your patch broke bootstrap on AIX because of the typedef "loc_t" >> introduced in tree-diagnostics.c. I have committed the patch below. Tested on x86_64-unknown-linux-gnu only, unfortunately. I hope it fixes the build on AIX. From: Dodji Seketeli Date: Tue, 18 Oct 2011 18:40:06 +0200 Subject: [PATCH] Do not use loc_t as a type name gcc/ * tree-diagnostic (struct loc_t): Rename into struct loc_map_pair. (maybe_unwind_expanded_macro_loc): Adjust. --- gcc/tree-diagnostic.c | 16 1 files changed, 8 insertions(+), 8 deletions(-) diff --git a/gcc/tree-diagnostic.c b/gcc/tree-diagnostic.c index 53b350b..b4b60dc 100644 --- a/gcc/tree-diagnostic.c +++ b/gcc/tree-diagnostic.c @@ -56,10 +56,10 @@ typedef struct { const struct line_map *map; source_location where; -} loc_t; +} loc_map_pair; -DEF_VEC_O (loc_t); -DEF_VEC_ALLOC_O (loc_t, heap); +DEF_VEC_O (loc_map_pair); +DEF_VEC_ALLOC_O (loc_map_pair, heap); /* Unwind the different macro expansions that lead to the token which location is WHERE and emit diagnostics showing the resulting @@ -111,9 +111,9 @@ maybe_unwind_expanded_macro_loc (diagnostic_context *context, const struct line_map **first_exp_point_map) { const struct line_map *map; - VEC(loc_t,heap) *loc_vec = NULL; + VEC(loc_map_pair,heap) *loc_vec = NULL; unsigned ix; - loc_t loc, *iter; + loc_map_pair loc, *iter; map = linemap_lookup (line_table, where); if (!linemap_macro_expansion_map_p (map)) @@ -132,7 +132,7 @@ maybe_unwind_expanded_macro_loc (diagnostic_context *context, loc.where = where; loc.map = map; - VEC_safe_push (loc_t, heap, loc_vec, &loc); + VEC_safe_push (loc_map_pair, heap, loc_vec, &loc); /* WHERE is the location of a token inside the expansion of a macro. MAP is the map holding the locations of that macro @@ -150,7 +150,7 @@ maybe_unwind_expanded_macro_loc (diagnostic_context *context, first macro which expansion triggered this trace was expanded inside a system header. */ if (!LINEMAP_SYSP (map)) -FOR_EACH_VEC_ELT (loc_t, loc_vec, ix, iter) +FOR_EACH_VEC_ELT (loc_map_pair, loc_vec, ix, iter) { source_location resolved_def_loc = 0, resolved_exp_loc = 0; diagnostic_t saved_kind; @@ -203,7 +203,7 @@ maybe_unwind_expanded_macro_loc (diagnostic_context *context, context->printer->prefix = saved_prefix; } - VEC_free (loc_t, heap, loc_vec); + VEC_free (loc_map_pair, heap, loc_vec); } /* This is a diagnostic finalizer implementation that is aware of -- 1.7.6.2 -- Dodji
Re: [Patch,AVR]: PR50447: Tweak addhi3
2011/10/18 Georg-Johann Lay : > Denis Chertykov schrieb: >> 2011/10/18 Georg-Johann Lay : >>> Denis Chertykov schrieb: 2011/10/18 Georg-Johann Lay : > This patch do some tweaks to addhi3 like adding QI scratch register. > > The original *addhi3 insn is still there and located prior to new > addhi3_clobber insn because addhi3 is special to reload (thanks Danis for > this > note) so that there is a version with and a version without scratch > register. > > Patch passes without regressions. > Which improvements added by this patch ? Denis. >>> If the addhi3 is expanded early, the addition happens with QI scratch which >>> avoids reload of constant if target register is in NO_LD. And reduce >>> register >>> pressure as only QI is needed and not reload of constant to HI. >>> >>> Otherwise, there might be sequences like >>> >>> ldi r31, 2 ; *reload_inhi >>> mov r12, r31 >>> clr r13 >>> >>> add r14, r12 ; *addhi3 >>> adc r15, r13 >>> >>> which now will be >>> >>> ldi r31, 2 ; addhi3_clobber >>> add r14, r31 >>> adc r15, __zero_reg__ >>> >>> Similar applies if the reload of the constant happens to LD regs: >>> >>> ldi r30, 2 ; *movhi >>> clr r31 >>> >>> add r14, r12 ; *addhi3 >>> adc r15, r13 >>> >>> will become >>> >>> ldi r30, 2 ; addhi3_clobber >>> add r14, r30 >>> adc r15, __zero_reg__ >>> >>> For *addhi3 insns the register pressure is not reduced but the insn sequence >>> might be smarter if peep2 comes up with a QI scratch or if it detects a >>> *reload_inhi insn just prior to the addition (and the reg that holds the >>> reloaded constant dies after the addition). >>> >>> As *addhi3 is special to reload, there is still an "ordinary" add addhi insn >>> without scratch. This is easier because, e.g. prologue and epilogue >>> generation >>> generate add insns (not by means of addhi3 expander but by explicit >>> gan_rtx_PLUS). Yet the addhi3 expander factors out the situations when an >>> addhi3 insn is to be generated via addhi3 expander late in the compilation >>> process >> >> Please provide any real world example. >> >> Denis. > > Consider avr-libc (under the assumption that it is "real world" code): > > In avr-libc's build directory, and with the patch integrated: > > $ cd avr/lib/avr4 > $ make clean && make CFLAGS='-save-temps -dp -Os' > $ grep -A 2 'addhi3_clobber\/2' *.s > out-nopeep2.txt (see attachment) > $ grep 'addhi3_clobber\/2' *.s | wc -l > 33 > > This shows that the insns are already there before peep2 and thus no reload of > 16-bit constant is needed; an 8-bit scratch is sufficient. > > Alternatively, the implementation could omit the expansion to addhi3_clobber > in > addhi3 expander and instead rely completely on peep2. However, that does not > reduce register pressure because a 16-bit register will be allocated and the > peep2 just prints things smarter and needs just a QI scratch to call > avr_out_plus_clobber. > > For +/-1, the addition with SEC/ADD/ADC resp. SEC/SBC/SBC leaves cc0 in a > mess. > as most loops use +/-1 on the counter variable, LDI/SUB/SBC is not shorter > but > better because it sets cc0. > > So you like this patch? > Or prefer a patch that is neutral with respect to register allocator and just > uses peep2 to print things smarter? I'm interested in code improvements. What difference in size of avr-libc ? Denis.
Re: [Patch,AVR]: PR50447: Tweak addhi3
Just a note. Instead of `!reload_completed && !reload_in_progress'you can use `can_create_pseudo_p()' declared in rtl.h Denis.
Re: RFA: Improve tree-ssa-sink block selection
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 10/18/11 01:44, Paolo Bonzini wrote: > On 10/18/2011 07:10 AM, Jeff Law wrote: >> --- 467,475 if (gimple_code (use) != GIMPLE_PHI) { sinkbb = >> gimple_bb (use); ! sinkbb = select_best_block (frombb, >> gimple_bb (use), stmt); >> >> ! if (sinkbb == frombb) return false; >> >> *togsi = gsi_for_stmt (use); > > Useless assignment of sinkbb, otherwise looks fine. Thanks. I'll fix that. > > By the way, is it intended that sink_code_in_bb visits again > postdominators that were already visited (which with domwalk would > come for free)? As it is, the pass is quadratic when you have > something like this: I haven't really looked at the visitation order in tree-ssa-sink.c. The formulation as a walk over the pdom tree and sinking statements within each block is unfortunate, but workable. I'm not sure where you're seeing multiple visits of an individual block, but maybe I've got some kind of a mental block. The other way to formulate this problem is with recursion. A DFS walk to sink each statement's immediate uses, then sinking the statement itself. It's a nice, simple formulation that is annoyingly complex to implement in GCC due to "interesting" aspects of our immediate use iterators. Jeff -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJOnb/bAAoJEBRtltQi2kC7i7oIAIXnmR76SMZIBDQLHvMqEAME qTG5HkbL0kCb4EDMHYRXA4U8Wot1UWiuAkJzihBQMRWiqOs5v555iUx6N+M/l2Fu w+1SyqeZacNBLm9/bI/Ynbr55BXF3aKPxES5kqRWIJE1kCoOqKFOWAizgWJmyfMG ct13PpUqU1CT572cwl+GgCQulEWXQEgeOH/Iu/gLPFv+PSEhkEN0vA6foBYgCnk6 d4lGaa3NcQsUwhAlu4rNgQVAZL3cpTRL8QDIultxg3j/zEhcTwJ4YB4azDQcsRQ4 ZasKqD7m4HXnvuqS8TI66v+w4wKsparWyKTXAqp4dibnhON4l2Omes/ZmJokiUs= =HXaj -END PGP SIGNATURE-
Re: [Patch Darwin/Ada] work around PR target/50678
On Oct 18, 2011, at 4:58 AM, Iain Sandoe wrote: > The audit trail in the PR contains the detective work (mostly by Eric) that > concludes we have a long-standing bug in the Darwin x86-64 sigtramp unwind > data. Yeah, well, it is a bug, if they ever fix it. If they burn it in and define it as correct, then, it is an unfortunatism. If they fix it, it should be software updated to older system and all tools fixed to use the new scheme. I suspect they'll never fix it. > This has been filed as radar #10302855, but we need a work-around until that > is resolved (possibly forever on older systems). > +#if defined (__x86_64__) If this is for a target machine compile, this is fine. Looking through the code, it seems like target machine code, though, I wasn't sure. If it is for host code, then this isn't cross compile safe. For host code DARWIN_X86 and TARGET_64BIT (from tm.h) both need to be true, for one to be compiling for a 64-bit x86 machine. I think this is fine for relevant release branches.
Re: [Patch,AVR]: PR50447: Tweak addhi3
Denis Chertykov schrieb: > 2011/10/18 Georg-Johann Lay : >> Denis Chertykov schrieb: >>> 2011/10/18 Georg-Johann Lay : Denis Chertykov schrieb: > 2011/10/18 Georg-Johann Lay : >> This patch do some tweaks to addhi3 like adding QI scratch register. >> >> The original *addhi3 insn is still there and located prior to new >> addhi3_clobber insn because addhi3 is special to reload (thanks Danis >> for this >> note) so that there is a version with and a version without scratch >> register. >> >> Patch passes without regressions. >> > Which improvements added by this patch ? > > Denis. If the addhi3 is expanded early, the addition happens with QI scratch which avoids reload of constant if target register is in NO_LD. And reduce register pressure as only QI is needed and not reload of constant to HI. Otherwise, there might be sequences like ldi r31, 2; *reload_inhi mov r12, r31 clr r13 add r14, r12 ; *addhi3 adc r15, r13 which now will be ldi r31, 2; addhi3_clobber add r14, r31 adc r15, __zero_reg__ Similar applies if the reload of the constant happens to LD regs: ldi r30, 2; *movhi clr r31 add r14, r12 ; *addhi3 adc r15, r13 will become ldi r30, 2; addhi3_clobber add r14, r30 adc r15, __zero_reg__ For *addhi3 insns the register pressure is not reduced but the insn sequence might be smarter if peep2 comes up with a QI scratch or if it detects a *reload_inhi insn just prior to the addition (and the reg that holds the reloaded constant dies after the addition). As *addhi3 is special to reload, there is still an "ordinary" add addhi insn without scratch. This is easier because, e.g. prologue and epilogue generation generate add insns (not by means of addhi3 expander but by explicit gan_rtx_PLUS). Yet the addhi3 expander factors out the situations when an addhi3 insn is to be generated via addhi3 expander late in the compilation process >>> Please provide any real world example. >>> >>> Denis. >> Consider avr-libc (under the assumption that it is "real world" code): >> >> In avr-libc's build directory, and with the patch integrated: >> >> $ cd avr/lib/avr4 >> $ make clean && make CFLAGS='-save-temps -dp -Os' >> $ grep -A 2 'addhi3_clobber\/2' *.s > out-nopeep2.txt (see attachment) >> $ grep 'addhi3_clobber\/2' *.s | wc -l >> 33 >> >> This shows that the insns are already there before peep2 and thus no reload >> of >> 16-bit constant is needed; an 8-bit scratch is sufficient. >> >> Alternatively, the implementation could omit the expansion to addhi3_clobber >> in >> addhi3 expander and instead rely completely on peep2. However, that does not >> reduce register pressure because a 16-bit register will be allocated and the >> peep2 just prints things smarter and needs just a QI scratch to call >> avr_out_plus_clobber. >> >> For +/-1, the addition with SEC/ADD/ADC resp. SEC/SBC/SBC leaves cc0 in a >> mess. >> as most loops use +/-1 on the counter variable, LDI/SUB/SBC is not shorter >> but >> better because it sets cc0. >> >> So you like this patch? >> Or prefer a patch that is neutral with respect to register allocator and just >> uses peep2 to print things smarter? > > I'm interested in code improvements. > What difference in size of avr-libc ? > > Denis. I have to tool for smart size analysis, so here is just a diff: After rebuilding avr-libc with respective compiler version, did respectively: $ find . -name 'lib[mc].a' -exec avr-size {} ';' > size-orig.txt $ find . -name 'lib[mc].a' -exec avr-size {} ';' > size-patch.txt and then $ diff -U 0 size-orig.txt size-patch.txt > size.diff As far as I can see, there is not a big gain but no object increases in size. For some files like ./avr/lib/avr2/libc.a:dtoa_prf.o size gain is 3%. For ./avr/lib/avr4/libc.a:vfprintf_std.o it's 1.7% and for others just one instruction better. Johann --- size-orig.txt 2011-10-18 19:59:52.0 +0200 +++ size-patch.txt 2011-10-18 19:50:59.0 +0200 @@ -7 +7 @@ -750 0 0 750 2ee dtoa_prf.o (ex ./avr/lib/avr51/libc.a) +724 0 0 724 2d4 dtoa_prf.o (ex ./avr/lib/avr51/libc.a) @@ -11 +11 @@ -722 6 0 728 2d8 malloc.o (ex ./avr/lib/avr51/libc.a) +720 6 0 726 2d6 malloc.o (ex ./avr/lib/avr51/libc.a) @@ -15,2 +15,2 @@ -510 0 0 510 1fe realloc.o (ex ./avr/lib/avr51/libc.a) -747 0 0 747 2eb strtod.o (ex ./avr/lib/avr51/libc.a) +506 0 0 506 1fa realloc.o (ex ./avr/lib/avr51/libc.a) +739 0 0 739 2e3 strtod.o (ex ./avr/lib/avr51/libc.a) @@ -18 +18 @@ -536 0 0 536 218 strtoul.o (ex ./avr/lib/avr51/libc.a
Re: [Patch, Darwin/libiberty] fix target/48108 by adding a section wrapper scheme to the darwin port.
On Oct 18, 2011, at 5:23 AM, Iain Sandoe wrote: > o we retain the ability to read an old-style file. > - I wonder if there should be an argument for removing this from the trunk > version, but retaining in 4.6. > - or, at the least, can we remove it from 4.8? I'd be fine with simply requiring recompiles, since support is so new. If you do up the code, might as well leave it in. > OK for trunk/4.6? Ok from my perspective. Don't recall if others want to review the simple-object-mach-o.c change.
C++ PATCH for c++/50531 (ICE with template dtor defaulted outside fn)
Here we were failing to recognize that a defaulted fn had already been instantiated, and crashed trying to instantiate it again. Tested x86_64-pc-linux-gnu, applying to trunk and 4.6. commit 10ff48cb9f7817ae8e95e71bd4b22989dafb2be2 Author: Jason Merrill Date: Tue Oct 18 14:25:42 2011 -0400 PR c++/50531 * pt.c (instantiate_decl): Recognize when a function defaulted outside the class is already instantiated. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 6fc60d5..56fa632 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -18045,6 +18045,8 @@ instantiate_decl (tree d, int defer_ok, d = DECL_CLONED_FUNCTION (d); if (DECL_TEMPLATE_INSTANTIATED (d) + || (TREE_CODE (d) == FUNCTION_DECL + && DECL_DEFAULTED_FN (d) && DECL_INITIAL (d)) || DECL_TEMPLATE_SPECIALIZATION (d)) /* D has already been instantiated or explicitly specialized, so there's nothing for us to do here. diff --git a/gcc/testsuite/g++.dg/cpp0x/defaulted32.C b/gcc/testsuite/g++.dg/cpp0x/defaulted32.C new file mode 100644 index 000..351cdae --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/defaulted32.C @@ -0,0 +1,21 @@ +// PR c++/50531 +// { dg-options -std=c++0x } + +template +class DataFilter +{ + public: + inline virtual ~DataFilter(); +}; + +template +inline DataFilter::~DataFilter() = default; + +class ARCalculator : public DataFilter +{ + public: + virtual void dataStart(int, int); +}; + +void ARCalculator::dataStart(int, int) +{}
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
http://codereview.appspot.com/5272048/diff/18001/tree-asan.c File tree-asan.c (right): http://codereview.appspot.com/5272048/diff/18001/tree-asan.c#newcode325 tree-asan.c:325: base = build_addr (t, current_function_decl); You need to create a temp var and build as gimple assignment. See init_tmp_var in tree-nested.c. http://codereview.appspot.com/5272048/
Re: [trans-mem] Rename/split __transaction into __transaction_atomic and __transaction_relaxed.
Several ICEs in the TM tests on C++, but I think they are old. C TM tests work except some missing optimizations (old failures too). c-c++-common/tm/wrap-[12].c fail on C++, but not on C: wrap-1.c:5:57: error: 'transaction_wrap' argument not an identifier I believe this is an old error too? (I don't yet have a testcase summary of before the changes, so I can't compare properly. But will send an follow-up if there are regressions.) Can you run the tests before your patch to make sure there are no regressions? That is, let's be sure that these are indeed old failures. The missed optimizations, particularly the memopt* ones are indeed old, or at least came with the merge. I don't know about the other failures.
Re: [PATCH] Fix pr50717: widening multiply bad code
On 17/10/11 11:43, Richard Guenther wrote: Ok. Committed, thanks. Andrew
Re: [Patch,AVR]: PR50447: Tweak addhi3
Georg-Johann Lay schrieb: Denis Chertykov schrieb: What difference in size of avr-libc ? I have no tool for smart size analysis, so here is just a diff: After rebuilding avr-libc with respective compiler version, did respectively: $ find . -name 'lib[mc].a' -exec avr-size {} ';' > size-orig.txt $ find . -name 'lib[mc].a' -exec avr-size {} ';' > size-patch.txt and then $ diff -U 0 size-orig.txt size-patch.txt > size.diff As far as I can see, there is not a big gain but no object increases in size. For some files like ./avr/lib/avr2/libc.a:dtoa_prf.o size gain is 3%. For ./avr/lib/avr4/libc.a:vfprintf_std.o it's 1.7% and for others just one instruction better. Actually there are some cases where the size increases by one instruction: -464 0 0 464 1d0 realloc.o (ex ./avr/lib/avr31/libc.a) +466 0 0 466 1d2 realloc.o (ex ./avr/lib/avr31/libc.a) -464 0 0 464 1d0 realloc.o (ex ./avr/lib/avr3/libc.a) +466 0 0 466 1d2 realloc.o (ex ./avr/lib/avr3/libc.a) Will have a look tomorrow; presumably it's adding +/-1 that force a scratch whilst the old code does not. Johann
[cxx-mem-model] compare_exchange implementation
Here's the last of the missing intrinsics. compare_exchange is slightly different than the others since it doesn't map directly to an rtl pattern. Its format is : bool __atomic_compare_exchange (T* mem, T* expected, T desired, bool weak, memory_order success, memory_order failure) note that the expected parameter is a pointer as well. In the case where false is returned, the value of expected is updated. Instead of providing a complex rtl pattern to try to match this, instead all that is required is to provide a compare_and_swap routine, which this builtin will then add some wrapper code around to match the required functionality. there are 2 rtl patterns it looks for, atomic_compare_and_swap_weak and atomic_compare_and_swap_strong both are of the form QImode atomicCAS (mem(O), current_val(O), expected(I), desired(I), memorymodel(const int)) When neither of these patterns are present, the implementation defaults to using __sync_val_compare_and_swap to provide the implementation. New tests have been added as well. Bootstraps and no new regressions on x86_64-unknown-linux-gnu so a bit of different rtl hacking around in there for me... does everything look OK? it seems to work :-)Actual pattern implementation and debugging will happen later when we actually write custom patterns for all these rather than defaulting to the current seq-cst patterns. Andrew * optabs.h (direct_optab_index): Replace DOI_atomic_compare_exchange with DOI_atomic_compare_and_swap_{strong,weak}. (direct_op): Add DOI_atomic_compare_and_swap_{strong,weak}. * genopinit.c: Set atomic_compare_and_swap_{strong,weak}_optab. * expr.h (expand_atomic_compare_exchange): Add parameter. * builtins.c (builtin_atomic_compare_exchange): Add weak parameter and verify it is a compile time constant. * optabs.c (expand_atomic_compare_exchange): Look for the presence of atomic_compare_and_swap_{strong,weak} patterns, otherwise use __sync_val_compare_and_swap. * builtin-types.def (BT_FN_BOOL_VPTR_PTR_I{1,2,4,8,16}_BOOL_INT_INT): Add the bool parameter. * sync-builtins.def (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_*): Use new prototype. * c-family/c-common.c (resolve_overloaded_builtin): Don't try to process a return value with an error mark. * fortran/types.def (BT_FN_BOOL_VPTR_PTR_I{1,2,4,8,16}_BOOL_INT_INT): Add the bool parameter. * testsuite/gcc.dg/atomic-invalid.c: Add compare_exchange failures. * testsuite/gcc.dg/atomic-compare-exchange-{1-5}.c: New tests. Index: optabs.h === *** optabs.h(revision 180092) --- optabs.h(working copy) *** enum direct_optab_index *** 690,696 /* Atomic operations with memory model parameters. */ DOI_atomic_exchange, ! DOI_atomic_compare_exchange, DOI_atomic_load, DOI_atomic_store, DOI_atomic_add_fetch, --- 690,697 /* Atomic operations with memory model parameters. */ DOI_atomic_exchange, ! DOI_atomic_compare_and_swap_strong, ! DOI_atomic_compare_and_swap_weak, DOI_atomic_load, DOI_atomic_store, DOI_atomic_add_fetch, *** typedef struct direct_optab_d *direct_op *** 765,772 #define atomic_exchange_optab \ (&direct_optab_table[(int) DOI_atomic_exchange]) ! #define atomic_compare_exchange_optab \ ! (&direct_optab_table[(int) DOI_atomic_compare_exchange]) #define atomic_load_optab \ (&direct_optab_table[(int) DOI_atomic_load]) #define atomic_store_optab \ --- 766,775 #define atomic_exchange_optab \ (&direct_optab_table[(int) DOI_atomic_exchange]) ! #define atomic_compare_and_swap_strong_optab \ ! (&direct_optab_table[(int) DOI_atomic_compare_and_swap_strong]) ! #define atomic_compare_and_swap_weak_optab \ ! (&direct_optab_table[(int) DOI_atomic_compare_and_swap_weak]) #define atomic_load_optab \ (&direct_optab_table[(int) DOI_atomic_load]) #define atomic_store_optab \ Index: genopinit.c === *** genopinit.c (revision 180092) --- genopinit.c (working copy) *** static const char * const optabs[] = *** 244,250 "set_direct_optab_handler (sync_lock_test_and_set_optab, $A, CODE_FOR_$(sync_lock_test_and_set$I$a$))", "set_direct_optab_handler (sync_lock_release_optab, $A, CODE_FOR_$(sync_lock_release$I$a$))", "set_direct_optab_handler (atomic_exchange_optab, $A, CODE_FOR_$(atomic_exchange$I$a$))", ! "set_direct_optab_handler (atomic_compare_exchange_optab, $A, CODE_FOR_$(atomic_compare_exchange$I$a$))", "set_direct_optab_handler (atomic_load_optab, $A, CODE_FOR_$(atomic_load$I$a$))", "set_direct_optab_handler (atomic_store_optab, $A, CODE_FOR_$(atomic_store$I$a$))", "set_direct_optab_handler (atomic_add_fetch_optab, $A,
Re: [cxx-mem-model] compare_exchange implementation
On Tue, Oct 18, 2011 at 06:03:16PM -0400, Andrew MacLeod wrote: > > Here's the last of the missing intrinsics. compare_exchange is > slightly different than the others since it doesn't map directly to > an rtl pattern. Its format is : > > bool __atomic_compare_exchange (T* mem, T* expected, T desired, > bool weak, memory_order success, memory_order failure) > > note that the expected parameter is a pointer as well. In the case > where false is returned, the value of expected is updated. I think the __sync_* way here was much better (two intrinsics instead of one, one bool-ish and one returning val). With mandating T*expected you force it to be addressable, probably until expansion time at which point it will be just forced into memory, which is highly undesirable. Jakub
Re: [cxx-mem-model] compare_exchange implementation
On 10/18/2011 06:07 PM, Jakub Jelinek wrote: On Tue, Oct 18, 2011 at 06:03:16PM -0400, Andrew MacLeod wrote: Here's the last of the missing intrinsics. compare_exchange is slightly different than the others since it doesn't map directly to an rtl pattern. Its format is : bool __atomic_compare_exchange (T* mem, T* expected, T desired, bool weak, memory_order success, memory_order failure) note that the expected parameter is a pointer as well. In the case where false is returned, the value of expected is updated. I think the __sync_* way here was much better (two intrinsics instead of one, one bool-ish and one returning val). With mandating T*expected you force it to be addressable, probably until expansion time at which point it will be just forced into memory, which is highly undesirable. Its impossible to implement a weak compare and swap unless you return both parameters in one operation. the compare_exchange matches the atomic interface for c++, and if we can't resolve it with a lock free instruction sequence, we have to leave an external call with this format to a library, so I that why I provide this built-in. Neither rth nor I like the addressable parameter, so thats why I left the rtl pattern for weak and strong compare and swap without that addressable argument, and let this builtin generate wrapper code around it. You could provide an __atomic version of the bool and val routines with memory model we toyed with making the compare_and_swap return both values so we could implement a weak version, but getting 2 return values is not pretty. It could be done with 2 separate built-ins that relate to each other, but thats not great either. we may think of a more brilliant way later to do it under the covers, but at the moment, this gets us the external interface we're looking for. Andrew
[google] Suppress FDO-use related notes/warnings (issue5294043)
Suppress verbose notes/warnings printed in FDO-use compilation. (1) Add option -fprofile-use-verbose. When this option is on, FDO-use compilation prints out all the notes as that of today. When this option is off (the default), all notes are suppressed. (2) Make several unconditional warnings under OPT_Wcoverage_mismatch. So that they can be turned off via -Wno-coverage-mismatch. (3) Make several unconditional warnings for LIPO under flag_ripa_verbose. (can be turned on via -fripa-verbose). This patch is for google-main only. Tested with bootstrap and regression tests. 2011-10-18 Rong Xu * gcc/common.opt (fprofile-use-verbose): New flag. * gcc/value-prof.c (check_ic_counter): guard notes by flag_profile_use_verbose. (find_func_by_funcdef_no): Ditto. (check_ic_target): Ditto. (check_counter): Ditto. (check_ic_counter): Ditto. * gcc/mcf.c (find_minimum_cost_flow): Ditto. * gcc/profile.c (read_profile_edge_counts): (compute_branch_probabilities): * gcc/coverage.c (read_counts_file): guard LIPO warnings by flag_ripa_verbose. (get_coverage_counts): guard notes by flag_profile_use_verbose; make warning under OPT_Wcoverage_mismatch. * gcc/tree-profile.c: (gimple_gen_reusedist): Ditto. (maybe_issue_profile_use_note): Ditto. (optimize_reusedist): Ditto. * gcc/testsuite/gcc.dg/pr32773.c: add -fprofile-use-verbose. * gcc/testsuite/gcc.dg/pr40209.c: Ditto. * gcc/testsuite/gcc.dg/pr26570.c: Ditto. * gcc/testsuite/g++.dg/tree-ssa/dom-invalid.C: Ditto. Index: gcc/value-prof.c === --- gcc/value-prof.c(revision 180106) +++ gcc/value-prof.c(working copy) @@ -472,9 +472,10 @@ : DECL_SOURCE_LOCATION (current_function_decl); if (flag_profile_correction) { - inform (locus, "correcting inconsistent value profile: " - "%s profiler overall count (%d) does not match BB count " - "(%d)", name, (int)*all, (int)bb_count); + if (flag_profile_use_verbose) +inform (locus, "correcting inconsistent value profile: %s " + "profiler overall count (%d) does not match BB count " +"(%d)", name, (int)*all, (int)bb_count); *all = bb_count; if (*count > *all) *count = *all; @@ -510,33 +511,42 @@ location_t locus; if (*count1 > all && flag_profile_correction) { - locus = (stmt != NULL) - ? gimple_location (stmt) - : DECL_SOURCE_LOCATION (current_function_decl); - inform (locus, "Correcting inconsistent value profile: " - "ic (topn) profiler top target count (%ld) exceeds " - "BB count (%ld)", (long)*count1, (long)all); + if (flag_profile_use_verbose) +{ + locus = (stmt != NULL) + ? gimple_location (stmt) + : DECL_SOURCE_LOCATION (current_function_decl); + inform (locus, "Correcting inconsistent value profile: " + "ic (topn) profiler top target count (%ld) exceeds " + "BB count (%ld)", (long)*count1, (long)all); +} *count1 = all; } if (*count2 > all && flag_profile_correction) { - locus = (stmt != NULL) - ? gimple_location (stmt) - : DECL_SOURCE_LOCATION (current_function_decl); - inform (locus, "Correcting inconsistent value profile: " - "ic (topn) profiler second target count (%ld) exceeds " - "BB count (%ld)", (long)*count2, (long)all); + if (flag_profile_use_verbose) +{ + locus = (stmt != NULL) + ? gimple_location (stmt) + : DECL_SOURCE_LOCATION (current_function_decl); + inform (locus, "Correcting inconsistent value profile: " + "ic (topn) profiler second target count (%ld) exceeds " + "BB count (%ld)", (long)*count2, (long)all); +} *count2 = all; } if (*count2 > *count1) { - locus = (stmt != NULL) - ? gimple_location (stmt) - : DECL_SOURCE_LOCATION (current_function_decl); - inform (locus, "Corrupted topn ic value profile: " - "first target count (%ld) is less than the second " - "target count (%ld)", (long)*count1, (long)*count2); + if (flag_profile_use_verbose) +{ + locus = (stmt != NULL) + ? gimple_location (stmt) + : DECL_SOURCE_LOCATION (current_function_decl); + inform (locus, "Corrupted topn ic value profile: " + "first target count (%ld) is less than the second " + "target count (%ld)", (long)*count1, (long)*count2); +} return true; } @@
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
http://codereview.appspot.com/5272048/diff/18001/tree-asan.c File tree-asan.c (right): http://codereview.appspot.com/5272048/diff/18001/tree-asan.c#newcode325 tree-asan.c:325: base = build_addr (t, current_function_decl); There are issues with creating address expressions from TARGET_MEM_REF in gcc. Since you want compiler to do optimization on instrumented code as much as possible, asan instrumentation is better done as early as possible after ipa inlining -- and this will also solve this problem. I tried moving asan pass before loop opt, and it worked fine. http://codereview.appspot.com/5272048/
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
On Tue, Oct 18, 2011 at 3:56 PM, wrote: > On 2011/10/18 22:52:33, davidxl wrote: >> >> http://codereview.appspot.com/5272048/diff/18001/tree-asan.c >> File tree-asan.c (right): > > > http://codereview.appspot.com/5272048/diff/18001/tree-asan.c#newcode325 >> >> tree-asan.c:325: base = build_addr (t, current_function_decl); >> There are issues with creating address expressions from TARGET_MEM_REF > > in gcc. >> >> Since you want compiler to do optimization on instrumented code as > > much as >> >> possible, asan instrumentation is better done as early as possible > > after ipa > > Why? > I would actually say that I want the instrumentation to happen as late > as possible because this will instrument fewer memory accesses. > For example, asan certainly needs to happen after loop invariants are > moved out and common subexpressions (including loads) are eliminated. > No? yes -- so a good choice would be after PRE and PDE (pre, sink_code) which should handle most of the loop invariant memory loads. David > >> inlining -- and this will also solve this problem. I tried moving asan > > pass >> >> before loop opt, and it worked fine. > > > > http://codereview.appspot.com/5272048/ >
Re: [PATCH] Add an intermediate coverage format for gcov
On Wed, Oct 5, 2011 at 9:58 AM, Mike Stump wrote: > > On Oct 5, 2011, at 12:47 AM, Sharad Singhai wrote: > > This patch adds an intermediate coverage format (enabled via 'gcov > > -i'). This is a compact format as it does not require source files. > > I don't like any of the tags, I think if you showed them to 100 people that > had used gcov before, very few of them would be able to figure out the > meaning without reading the manual. Why make it completely cryptic? A > binary encoded compress format is smaller and just as readable. > > SF, sounds like single float to me, I'd propose file. > FN, sounds like filename, I'd propose line. > FNDA, can't even guess at what it means, I'd propose fcount. > BA, I'd propose branch, for 0, notexe, for 1, nottaken, for 2 taken. > DA, I'd propose lcount > > I'd propose fcount be listed as fname,ecount, to match the ordering of > lcount. Also, implicit in the name, is the ordering f, then count, l, then > count. > > I think if you showed the above to 100 people that had used gcov before, I > think we'd be up past 90% of the people able to figure it out, without > reading the doc. Okay, I liked the idea of self-descriptive tags. I have updated the patch based on your suggestions. I have simplified the format somewhat. Instead of repeating function name, I use a 'function' tag with the format function:,, I also dropped the unmangled function names, they were turning out to be too unreadable and not really useful in this context. Here is the updated patch. OK for trunk? 2011-10-18 Sharad Singhai * doc/gcov.texi: Document gcov intermediate format. * gcov.c (print_usage): Handle new option. (process_args): Handle new option. (get_gcov_file_intermediate_name): New function. (output_intermediate_file): New function. (generate_results): Handle new option. * testsuite/lib/gcov.exp: Handle intermediate format. * testsuite/g++.dg/gcov/gcov-8.C: New testcase. Index: doc/gcov.texi === --- doc/gcov.texi (revision 179873) +++ doc/gcov.texi (working copy) @@ -130,6 +130,7 @@ gcov [@option{-v}|@option{--version}] [@ [@option{-f}|@option{--function-summaries}] [@option{-o}|@option{--object-directory} @var{directory|file}] @var{sourcefiles} [@option{-u}|@option{--unconditional-branches}] + [@option{-i}|@option{--intermediate-format}] [@option{-d}|@option{--display-progress}] @c man end @c man begin SEEALSO @@ -216,6 +217,45 @@ Unconditional branches are normally not @itemx --display-progress Display the progress on the standard output. +@item -i +@itemx --intermediate-format +Output gcov file in an easy-to-parse intermediate text format that can +be used by @command{lcov} or other tools. The output is a single +@file{.gcov} file per @file{.gcda} file. No source code is required. + +The format of the intermediate @file{.gcov} file is plain text with +one entry per line + +@smallexample +file:@var{source_file_name} +function:@var{function_name},@var{line_number},@var{execution_count} +lcount:@var{line number},@var{execution_count} +branch:@var{line_number},@var{branch_coverage_type} + +Where the @var{branch_coverage_type} is + notexec (Branch not executed) + taken (Branch executed and taken) + nottaken (Branch executed, but not taken) + +There can be multiple @var{file} entries in an intermediate gcov +file. All entries following a @var{file} pertain to that source file +until the next @var{file} entry. +@end smallexample + +Here is a sample when @option{-i} is used in conjuction with @option{-b} option: + +@smallexample +file:array.cc +function:_Z3sumRKSt6vectorIPiSaIS0_EE,11,1 +function:main,22,1 +lcount:11,1 +lcount:12,1 +lcount:14,1 +branch:14,taken +lcount:26,1 +branch:28,nottaken +@end smallexample + @end table @command{gcov} should be run with the current directory the same as that Index: gcov.c === --- gcov.c (revision 179873) +++ gcov.c (working copy) @@ -311,6 +311,9 @@ static int flag_gcov_file = 1; static int flag_display_progress = 0; +/* Output *.gcov file in intermediate format used by 'lcov'. */ +static int flag_intermediate_format = 0; + /* For included files, make the gcov output file name include the name of the input source file. For example, if x.h is included in a.c, then the output file name is a.c##x.h.gcov instead of x.h.gcov. */ @@ -436,6 +439,7 @@ print_usage (int error_p) fnotice (file, " -o, --object-directory DIR|FILE Search for object files in DIR or called FILE\n"); fnotice (file, " -p, --preserve-pathsPreserve all pathname components\n"); fnotice (file, " -u, --unconditional-branchesShow unconditional branch counts too\n"); + fnotice (file, " -i, --intermediate-format Output .gcov file in intermediate text format\n"); fnotice (file,
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
It will be weird to put the instrumentation pass inside loop opt, besides memory loads which are loop invariants and redundant stores in loop should be handled by pre/pde. +cc Richard Guenther You may want to ask the middle-end maintainer to review your code at this point if you want it to be in trunk. thanks, David On Tue, Oct 18, 2011 at 4:12 PM, wrote: >> yes -- so a good choice would be after PRE and PDE (pre, sink_code) >> which should handle most of the loop invariant memory loads. > > how about pass_lim? I think asan should be after lim. > > http://codereview.appspot.com/5272048/ >
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
On Tue, Oct 18, 2011 at 19:31, Xinliang David Li wrote: > It will be weird to put the instrumentation pass inside loop opt, > besides memory loads which are loop invariants and redundant stores in > loop should be handled by pre/pde. > > +cc Richard Guenther > > You may want to ask the middle-end maintainer to review your code at > this point if you want it to be in trunk. For trunk inclusion, we need to decide what to do wrt mudflap. Clearly, if ASAN offers the same protections and considerable better performance, then that should be an easy decision. I still have not looked at the implementation in detail, but I like the fact that it is a pure gimple pass. If the instrumentation can be statically optimized, then that is a clear advantage over mudflap, which we have never been able to optimize properly. More comments on the patch itself coming soon. Diego.
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
On 29/09/2011, at 10:23 PM, Richard Guenther wrote: > On Thu, Sep 29, 2011 at 10:27 AM, Tom de Vries wrote: > + && SSA_NAME_IS_DEFAULT_DEF (expr) > + && TREE_CODE (var) == PARM_DECL > + && POINTER_TYPE_P (TREE_TYPE (var)) > + && TYPE_USER_ALIGN (TREE_TYPE (TREE_TYPE (var > + val = get_align_value (TYPE_ALIGN (TREE_TYPE (TREE_TYPE (var))), > + TREE_TYPE (var), 0); > > I realize that the scope where this applies is pretty narrow (and thus > bad fallout is quite unlikely). But I suppose we should document > somewhere that GCC treats alignment attributes on parameters > more strict than say, on assignments. Richard, the intention here is for a follow up patch to change "&& TYPE_USER_ALIGN" to "&& (TYPE_USER_ALIGN || flag_assume_aligned_parameters)". I understand that you will probably not like the idea of adding flag_assume_aligned_parameters, but it wouldn't make such an option any less valuable for experienced users of GCC. For some users this option will the additional piece of rope to hang themselves; for others it will produce good benefits of better performance. [Disclaimer: I've put significant amount of my time into investigation of this problem and hate to see it all go to waste :-).] Thank you, -- Maxim Kuvyrkov CodeSourcery / Mentor Graphics
Re: [google] Suppress FDO-use related notes/warnings (issue5294043)
On Tue, Oct 18, 2011 at 3:48 PM, Rong Xu wrote: > Suppress verbose notes/warnings printed in FDO-use compilation. > (1) Add option -fprofile-use-verbose. Gcc currently does not emit informational messages on high level transformations such as inlining, value profiling transformations (ic promotion info messages are emitted in lipo mode in google/main), loop peeling and unrolling, and other loop opts. I can see those are very useful for triaging performance regressions etc, but turning those on would be too verbose for non-power users and can be annoying. They (when introduced in the future) should be guarded. I can see they should be guarded using the same option and possibly with a verbose level. Something like: -fopt-info=<1,2,3> For now, a nonparameterized option should be good enough. (similarly, -fopt-report can be used to dump optimization report which is more structured -- e.g, grouped via optimizations, not functions). There are some unrelated changes (e.g, histogram free etc) which should be committed separately. David > When this option is on, FDO-use > compilation prints out all the notes as that of today. When this > option is off (the default), all notes are suppressed. > (2) Make several unconditional warnings under OPT_Wcoverage_mismatch. > So that they can be turned off via -Wno-coverage-mismatch. > (3) Make several unconditional warnings for LIPO under flag_ripa_verbose. > (can be turned on via -fripa-verbose). > > This patch is for google-main only. > > Tested with bootstrap and regression tests. > > 2011-10-18 Rong Xu > > * gcc/common.opt (fprofile-use-verbose): New flag. > * gcc/value-prof.c (check_ic_counter): guard notes by > flag_profile_use_verbose. > (find_func_by_funcdef_no): Ditto. > (check_ic_target): Ditto. > (check_counter): Ditto. > (check_ic_counter): Ditto. > * gcc/mcf.c (find_minimum_cost_flow): Ditto. > * gcc/profile.c (read_profile_edge_counts): > (compute_branch_probabilities): > * gcc/coverage.c (read_counts_file): guard LIPO > warnings by flag_ripa_verbose. > (get_coverage_counts): guard notes > by flag_profile_use_verbose; make warning > under OPT_Wcoverage_mismatch. > * gcc/tree-profile.c: (gimple_gen_reusedist): Ditto. > (maybe_issue_profile_use_note): Ditto. > (optimize_reusedist): Ditto. > * gcc/testsuite/gcc.dg/pr32773.c: add -fprofile-use-verbose. > * gcc/testsuite/gcc.dg/pr40209.c: Ditto. > * gcc/testsuite/gcc.dg/pr26570.c: Ditto. > * gcc/testsuite/g++.dg/tree-ssa/dom-invalid.C: Ditto. > > Index: gcc/value-prof.c > === > --- gcc/value-prof.c (revision 180106) > +++ gcc/value-prof.c (working copy) > @@ -472,9 +472,10 @@ > : DECL_SOURCE_LOCATION (current_function_decl); > if (flag_profile_correction) > { > - inform (locus, "correcting inconsistent value profile: " > - "%s profiler overall count (%d) does not match BB count " > - "(%d)", name, (int)*all, (int)bb_count); > + if (flag_profile_use_verbose) > + inform (locus, "correcting inconsistent value profile: %s " > + "profiler overall count (%d) does not match BB count " > + "(%d)", name, (int)*all, (int)bb_count); > *all = bb_count; > if (*count > *all) > *count = *all; > @@ -510,33 +511,42 @@ > location_t locus; > if (*count1 > all && flag_profile_correction) > { > - locus = (stmt != NULL) > - ? gimple_location (stmt) > - : DECL_SOURCE_LOCATION (current_function_decl); > - inform (locus, "Correcting inconsistent value profile: " > - "ic (topn) profiler top target count (%ld) exceeds " > - "BB count (%ld)", (long)*count1, (long)all); > + if (flag_profile_use_verbose) > + { > + locus = (stmt != NULL) > + ? gimple_location (stmt) > + : DECL_SOURCE_LOCATION (current_function_decl); > + inform (locus, "Correcting inconsistent value profile: " > + "ic (topn) profiler top target count (%ld) exceeds " > + "BB count (%ld)", (long)*count1, (long)all); > + } > *count1 = all; > } > if (*count2 > all && flag_profile_correction) > { > - locus = (stmt != NULL) > - ? gimple_location (stmt) > - : DECL_SOURCE_LOCATION (current_function_decl); > - inform (locus, "Correcting inconsistent value profile: " > - "ic (topn) profiler second target count (%ld) exceeds " > - "BB count (%ld)", (long)*count2, (long)all); > + if (flag_profile_use_verbose) > + { > + locus = (stmt != NULL) > + ? gimple_location (stmt) > + : DECL