Re: [v3] plus
On 25 September 2013 06:41, Marc Glisse wrote:
> On Wed, 25 Sep 2013, Jonathan Wakely wrote:
>
>> I've had this sitting in my tree waiting to do something with,
>
> I did ask last week if someone had done it already...

Sorry :-\

>> I'm about to go to sleep so didn't check if the test covers anything yours
>> doesn't.
>
> In the test you have basic cover for all functors, and I cover only 2 cases
> (more specifically though, since I look at the return type cv-qual).
>
> In my patch, I added constexpr and noexcept, I couldn't see any reason not
> to for such basic utilities.  Yes, I did read the wiki and noticed the vote
> yesterday about constexpr, but imho that's wrong.
>
>> It looks like your patch adds the default template argument even in
>> C++98 mode, I avoided that by putting forward declarations at the top
>> of the file, in a #if block.
>
> This only lets me write:
>   std::plus<> *p;
> in C++98 mode (std::plus<> p; gives an error), doesn't seem that bad.
>
> And I also add the void specializations in C++11 mode, as an extension.
>
> Well, let's forget my patch and go with yours, though at least adding
> noexcept seems like a good idea.

Yes, noexcept is a good idea.

> (too bad we don't have noexcept(auto) here)
> (too bad we can't use decltype(auto) for the return type, as that would
> disable sfinae, it is a bad sign when the standard turns its nose up at its
> own dog food...)

Yeah, I think the idea is that concepts will make SFINAE a thing of the
past, but we're not there yet.
Re: [PATCH] Don't always instrument shifts (PR sanitizer/58413)
Ping.

On Tue, Sep 17, 2013 at 12:32:03PM +0200, Marek Polacek wrote:
> On Mon, Sep 16, 2013 at 08:35:35PM +0200, Jakub Jelinek wrote:
> > On Fri, Sep 13, 2013 at 08:01:36PM +0200, Marek Polacek wrote:
> > I'd say the above is going to be a maintenance nightmare, with all the code
> > duplication.  And you are certainly going to miss cases that way, e.g.
> >
> >   void
> >   foo (void)
> >   {
> >     int A[-2 / -1] = {};
> >   }
> >
> > I'd say instead of adding all this, you should just at the right spot insert
> > if (integer_zerop (t)) return NULL_TREE; or similar.
> >
> > For shift instrumentation, I guess you could add
> >   if (integer_zerop (t) && (tt == NULL_TREE || integer_zerop (tt)))
> >     return NULL_TREE;
> > right before:
> >   t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (t), op0, t);
>
> Yeah, this is _much_ better.  I'm glad we can live without that
> ugliness.
>
> > > +/* PR sanitizer/58413 */
> > > +/* { dg-do run } */
> > > +/* { dg-options "-fsanitize=shift -w" } */
> > > +
> > > +int x = 7;
> > > +int
> > > +main (void)
> > > +{
> > > +  /* All of the following should pass.  */
> > > +  int A[128 >> 5] = {};
> > > +  int B[128 << 5] = {};
> > > +
> > > +  static int e =
> > > +    ((int)
> > > +     (0x | ((31 & ((1 << (4)) - 1)) << (((15) + 6) + 4)) |
> > > +      ((0) << ((15) + 6)) | ((0) << (15;
> >
> > This relies on int32plus, so needs to be
> > /* { dg-do run { target int32plus } } */
>
> Fixed.
>
> > > --- gcc/testsuite/c-c++-common/ubsan/shift-5.c.mp	2013-09-13 18:25:06.195847144 +0200
> > > +++ gcc/testsuite/c-c++-common/ubsan/shift-5.c	2013-09-13 19:16:38.990211229 +0200
> > > @@ -0,0 +1,21 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-fsanitize=shift -w" } */
> > > +/* { dg-shouldfail "ubsan" } */
> > > +
> > > +int x;
> > > +int
> > > +main (void)
> > > +{
> > > +  /* None of the following should pass.  */
> > > +  switch (x)
> > > +    {
> > > +    case 1 >> -1:	/* { dg-error "" } */
> > > +    case -1 >> -1:	/* { dg-error "" } */
> > > +    case 1 << -1:	/* { dg-error "" } */
> > > +    case -1 << -1:	/* { dg-error "" } */
> > > +    case -1 >> 200:	/* { dg-error "" } */
> > > +    case 1 << 200:	/* { dg-error "" } */
> >
> > Can't you fill in the error you are expecting, or is the problem
> > that the wording is very different between C and C++?
>
> I discovered { target c } stuff, so I filled in both error messages.
>
> This patch seems to work: bootstrap-ubsan passes + ubsan testsuite
> passes too.  Ok for trunk?
>
> 2013-09-17  Marek Polacek
> 	    Jakub Jelinek
>
> 	PR sanitizer/58413
> c-family/
> 	* c-ubsan.c (ubsan_instrument_shift): Don't instrument
> 	an expression if we can prove it is correct.
> 	(ubsan_instrument_division): Likewise.  Remove unnecessary
> 	check.
>
> testsuite/
> 	* c-c++-common/ubsan/shift-4.c: New test.
> 	* c-c++-common/ubsan/shift-5.c: New test.
> 	* c-c++-common/ubsan/div-by-zero-5.c: New test.
> 	* gcc.dg/ubsan/c-shift-1.c: New test.
>
> --- gcc/c-family/c-ubsan.c.mp	2013-09-17 12:24:44.582835840 +0200
> +++ gcc/c-family/c-ubsan.c	2013-09-17 12:24:48.772849823 +0200
> @@ -51,14 +51,6 @@ ubsan_instrument_division (location_t lo
>    if (TREE_CODE (type) != INTEGER_TYPE)
>      return NULL_TREE;
>
> -  /* If we *know* that the divisor is not -1 or 0, we don't have to
> -     instrument this expression.
> -     ??? We could use decl_constant_value to cover up more cases.  */
> -  if (TREE_CODE (op1) == INTEGER_CST
> -      && integer_nonzerop (op1)
> -      && !integer_minus_onep (op1))
> -    return NULL_TREE;
> -
>    t = fold_build2 (EQ_EXPR, boolean_type_node,
> 		   op1, build_int_cst (type, 0));
>
> @@ -74,6 +66,11 @@ ubsan_instrument_division (location_t lo
>        t = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, t, x);
>      }
>
> +  /* If the condition was folded to 0, no need to instrument
> +     this expression.  */
> +  if (integer_zerop (t))
> +    return NULL_TREE;
> +
>    /* In case we have a SAVE_EXPR in a conditional context, we need to
>       make sure it gets evaluated before the condition.  */
>    t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (t), op0, t);
> @@ -138,6 +135,11 @@ ubsan_instrument_shift (location_t loc,
>        tt = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, x, tt);
>      }
>
> +  /* If the condition was folded to 0, no need to instrument
> +     this expression.  */
> +  if (integer_zerop (t) && (tt == NULL_TREE || integer_zerop (tt)))
> +    return NULL_TREE;
> +
>    /* In case we have a SAVE_EXPR in a conditional context, we need to
>       make sure it gets evaluated before the condition.  */
>    t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (t), op0, t);
> --- gcc/testsuite/c-c++-common/ubsan/shift-4.c.mp	2013-09-17 12:25:12.130931875 +0200
> +++ gcc/testsuite/c-c++-common/ubsan/shift-4.c	2013-09-17 10:19:44.66
Re: [PATCH] Handle IDENTIFIER_NODEs in ubsan.c (PR sanitizer/58420)
Ping.

On Mon, Sep 16, 2013 at 05:49:55PM +0200, Marek Polacek wrote:
> This patch amends the chunk of code where we are determining the
> type name; I hadn't considered that IDENTIFIER_NODEs require special
> handling, since we need to omit the DECL_NAME.  I had something similar
> in the http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00917.html patch, but
> there I had a thinko: I need to check that TYPE_NAME is non-NULL first.
>
> Regtested/ran bootstrap-ubsan on x86_64-linux.
>
> Ok for trunk?
>
> 2013-09-16  Marek Polacek
>
> 	PR sanitizer/58420
> 	* ubsan.c (ubsan_type_descriptor): Handle IDENTIFIER_NODEs
> 	when determining the type name.
>
> --- gcc/ubsan.c.mp	2013-09-16 14:22:07.195918175 +0200
> +++ gcc/ubsan.c	2013-09-16 14:22:10.503929477 +0200
> @@ -260,11 +260,18 @@ ubsan_type_descriptor (tree type)
>    unsigned short tkind, tinfo;
>
>    /* At least for INTEGER_TYPE/REAL_TYPE/COMPLEX_TYPE, this should work.
> -     ??? For e.g. type_unsigned_for (type), the TYPE_NAME would be NULL.  */
> +     For e.g. type_unsigned_for (type) or bit-fields, the TYPE_NAME
> +     would be NULL.  */
>    if (TYPE_NAME (type) != NULL)
> -    tname = IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type)));
> +    {
> +      if (TREE_CODE (TYPE_NAME (type)) == IDENTIFIER_NODE)
> +	tname = IDENTIFIER_POINTER (TYPE_NAME (type));
> +      else
> +	tname = IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type)));
> +    }
>    else
>      tname = "";
> +
>    if (TREE_CODE (type) == INTEGER_TYPE)
>      {
>        /* For INTEGER_TYPE, this is 0x.  */
>
> 	Marek
Re: [PATCH] Don't always instrument shifts (PR sanitizer/58413)
On Wed, Sep 25, 2013 at 10:34:42AM +0200, Marek Polacek wrote:
> > 2013-09-17  Marek Polacek
> > 	    Jakub Jelinek
> >
> > 	PR sanitizer/58413
> > c-family/
> > 	* c-ubsan.c (ubsan_instrument_shift): Don't instrument
> > 	an expression if we can prove it is correct.
> > 	(ubsan_instrument_division): Likewise.  Remove unnecessary
> > 	check.
> >
> > testsuite/
> > 	* c-c++-common/ubsan/shift-4.c: New test.
> > 	* c-c++-common/ubsan/shift-5.c: New test.
> > 	* c-c++-common/ubsan/div-by-zero-5.c: New test.
> > 	* gcc.dg/ubsan/c-shift-1.c: New test.

Ok.

	Jakub
Re: [PATCH] Handle IDENTIFIER_NODEs in ubsan.c (PR sanitizer/58420)
On Wed, Sep 25, 2013 at 10:35:40AM +0200, Marek Polacek wrote:
> > 2013-09-16  Marek Polacek
> >
> > 	PR sanitizer/58420
> > 	* ubsan.c (ubsan_type_descriptor): Handle IDENTIFIER_NODEs
> > 	when determining the type name.

Ok.

	Jakub
Re: [patch] Separate immediate uses and phi routines from tree-flow*.h
On Tue, Sep 24, 2013 at 4:39 PM, Andrew MacLeod wrote:
> This larger patch moves all the immediate use and operand routines from
> tree-flow.h into tree-ssa-operands.h.
> It also moves the basic phi routines and prototypes into a newly created
> tree-phinodes.h, or tree-ssa-operands.h if they belong there.
> And finally it shuffles a couple of other routines, which allows
> tree-ssa-operands.h to be removed from the gimple.h header file.
>
> Of note or interest:
>
> 1 - dump_decl_set() was defined in tree-into-ssa.c, but isn't really SSA
> specific.  It's tree-specific, so normally I'd throw it into tree.c.
> Looking forward a little, it's only used in a gimple context, so when we
> map to gimple_types it will need to be converted to/created for those.
> If it is in tree.c, I'll have to create a new version for gimple types,
> and then the routine in tree.c will become unused.  Based on that, I
> figured gimple.c is the right place for it.
>
> 2 - has_zero_uses_1() and single_imm_use_1() were both in tree-cfg.c for
> some reason... they've been moved to tree-ssa-operands.c.
>
> 3 - a few routines seem like basic gimple routines, but really turn out
> to require the operand infrastructure to implement... so they are moved
> to tree-ssa-operands.[ch] as well.  This sort of thing showed up when
> removing tree-ssa-operands.h from the gimple.h include file.  These were
> things like gimple_vuse_op, gimple_vdef_op, update_stmt, and
> update_stmt_if_modified.

Note that things like gimple_vuse_op are on the interface border between
gimple (where the SSA operands are stored) and SSA operands.  So it's not
so clear for them, given they access internal gimple fields directly but
use the regular SSA operand API.  I'd prefer gimple_vuse_op and
gimple_vdef_op to stay in gimple.[ch].

If you want to remove the tree-ssa-operands.h include from gimple.h
(which in some way makes sense), then we need a new gimple-ssa.[ch] for
routines that operate on the "gimple in SSA" IL (not purely on gimple,
which could be not in SSA form, and not purely on the SSA web, which could
in theory be constructed over a non-GIMPLE representation).

Actually I like the idea of gimple-ssa.[ch].  tree-ssa-operands.[ch] would
then be solely the place for the operand infrastructure internals
implementation, not a place for generic SSA utility routines related to
SSA operands.

Moving update_stmt/update_stmt_if_modified is a similar case but maybe
less obvious (unless we have gimple-ssa.[ch]).

What do you think?

> 4 - gimple_phi_arg_def() was oddly defined:
>
> static inline tree
> gimple_phi_arg_def (gimple gs, size_t index)
> {
>   struct phi_arg_d *pd = gimple_phi_arg (gs, index);
>   return get_use_from_ptr (&pd->imm_use);
> }
>
> /* Return a pointer to the tree operand for argument I of PHI node GS.  */
>
> static inline tree *
> gimple_phi_arg_def_ptr (gimple gs, size_t index)
> {
>   return &gimple_phi_arg (gs, index)->def;
> }
>
> I'm not sure why it goes through the immediate use routines... it should
> be the same as '*gimple_phi_arg_def_ptr (x, y)'.  And by using the
> immediate-use routine, it couldn't be put in tree-phinodes.h where it
> belongs.  I changed it to be simply
>   return gimple_phi_arg (gs, index)->def;
> and bootstrapped that change, along with an assert that it was the same
> value as the immediate-use method.  Everything worked as expected.  Old
> wart I guess.

Nice cleanup.

> I didn't bother with a Makefile.in change since only its dependencies
> change, and Tromey's pending auto-dependency patch would just delete
> those anyway.
>
> This bootstraps on x86_64-unknown-linux-gnu and has no new regressions.
> OK?

> 	(swap_tree_operands): Move prototype to tree-ssa-operands.h.

Rename to swap_ssa_operands and remove the !ssa_operands_active path?
(It should be no longer needed, I think.)

> 	(gimple_phi_arg_def, gimple_phi_arg_def_ptr, gimple_phi_arg_edge,
> 	gimple_phi_arg_location, gimple_phi_arg_location_from_edge,
> 	gimple_phi_arg_set_location, gimple_phi_arg_has_location,
> 	phi_nodes, phi_nodes_ptr, set_phi_nodes, phi_ssa_name_p): Move to
> 	tree-phinodes.h.

I always found the separation of the PHI node API from the gimple API odd
... now, PHI nodes are an SSA speciality.  Unless the gimple_phi class
moves to tree-phinodes.h, the same comments as for gimple_vuse_op apply.
But maybe everything will be more obvious once we separate the core gimple
data structure from the gimple API stuff ...

Overall the patch looks ok; either massage it according to my comments
above (thinking about gimple-ssa.[ch]) or follow up with a separate patch
(not that moving things twice will hurt).

Thanks,
Richard.

> Andrew
Re: [PATCH, PR 57748] Check for out of bounds access, Part 2
On Tue, Sep 24, 2013 at 8:06 PM, Martin Jambor wrote:
> Hi,
>
> On Tue, Sep 24, 2013 at 12:02:17PM +0200, Richard Biener wrote:
>> On Tue, Sep 24, 2013 at 4:52 AM, Bernd Edlinger wrote:
>> > Hi,
>> >
>> > with the attached patch the read-side of the out of bounds accesses
>> > are fixed.  There is a single new test case pr57748-3.c that is
>> > derived from Martin's test case.  The test case does only test the
>> > read access and does not depend on part 1 of the patch.
>> >
>> > This patch was boot-strapped and regression tested on
>> > x86_64-unknown-linux-gnu.
>> >
>> > Additionally I generated eCos and an eCos-application (on ARMv5 using
>> > packed structures) with an arm-eabi cross compiler, and looked for
>> > differences in the disassembled code with and without this patch, but
>> > there were none.
>> >
>> > OK for trunk?
>>
>> Index: gcc/expr.c
>> ===================================================================
>> --- gcc/expr.c	(revision 202778)
>> +++ gcc/expr.c	(working copy)
>> @@ -9878,7 +9878,7 @@
>>  			 && modifier != EXPAND_STACK_PARM
>>  			 ? target : NULL_RTX),
>>  			VOIDmode,
>> -			modifier == EXPAND_SUM ? EXPAND_NORMAL : modifier);
>> +			EXPAND_MEMORY);
>>
>>  	/* If the bitfield is volatile, we want to access it in the
>>  	   field's mode, not the computed mode.
>>
>> Context suggests that we may arrive with EXPAND_STACK_PARM here,
>> which is a correctness modifier (see its docs).  But I'm not too
>> familiar with the details of the various expand modifiers; Eric may be
>> though.
>>
>> That said, I still believe that fixing the misalign path in
>> expand_assignment would be better than trying to avoid it.  For this
>> testcase the issue is again that expand_assignment passes the wrong
>> mode/target to the movmisalign optab.
>
> Perhaps I'm stating the obvious, but this is supposed to address a
> separate bug in the expansion of the RHS (as opposed to the first bug,
> which was in the expansion of the LHS), and so we would have to make
> expand_assignment also examine potential misalignments in the RHS,
> which it so far does not do, despite its complexity.
>
> Having said that, I also dislike the patch, but I am quite convinced
> that if we allow non-BLK structures with zero sized arrays, the fix
> will be ugly - but I'll be glad to be shown I've been wrong.

I don't think the issues have anything to do with zero sized arrays or
non-BLK structures.  The issue is just that we are feeding movmisalign
with the wrong mode (the mode of the base object) if we are expanding a
base object which is unaligned and has non-BLK mode.

So what we maybe need is another expand modifier that tells us not to use
movmisalign when expanding the base object.  Or we need to stop expanding
the base object with VOIDmode and instead pass down the mode of the access
(TYPE_MODE (TREE_TYPE (exp))), which will probably confuse other code.  So
eventually not handling the misaligned case by expansion of the base, but
inlined, similar to how it was in expand_assignment, would be needed here.

Richard.

> Martin
Improve pretty printing of obj type ref.
Hi,
the type of the class whose method is called used to be determinable (in a
wrong way) by looking up the type of the OBJ_TYPE_REF_OBJECT parameter.
Now we determine it from the method type of the call, and this is not
printed anywhere.  This adds a "cast" of the OBJ_TYPE_REF_OBJECT so I can
see it easily.

Bootstrapped/regtested x86_64-linux, OK?

Honza

	* tree-pretty-print.c (dump_generic_node): Print class type of
	OBJ_TYPE_REF.
Index: tree-pretty-print.c
===================================================================
--- tree-pretty-print.c	(revision 202838)
+++ tree-pretty-print.c	(working copy)
@@ -2040,6 +2040,12 @@ dump_generic_node (pretty_printer *buffe
       pp_string (buffer, "OBJ_TYPE_REF(");
       dump_generic_node (buffer, OBJ_TYPE_REF_EXPR (node), spc, flags, false);
       pp_semicolon (buffer);
+      if (virtual_method_call_p (node))
+	{
+	  pp_string (buffer, "(");
+	  dump_generic_node (buffer, obj_type_ref_class (node), spc, flags, false);
+	  pp_string (buffer, ")");
+	}
       dump_generic_node (buffer, OBJ_TYPE_REF_OBJECT (node), spc, flags, false);
       pp_arrow (buffer);
       dump_generic_node (buffer, OBJ_TYPE_REF_TOKEN (node), spc, flags, false);
[PATCH] Fix PR58521
This hopefully fixes the bootstrap issue people were seeing.  It fixes the
ICE on the file I reproduced it on with i586 bootstrap.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
I'm still reducing a testcase.

Richard.

2013-09-25  Richard Biener

	PR middle-end/58521
	* tree.c (iterative_hash_expr): Remove MEM_REF special handling.

Index: gcc/tree.c
===================================================================
--- gcc/tree.c	(revision 202885)
+++ gcc/tree.c	(working copy)
@@ -7280,21 +7280,6 @@ iterative_hash_expr (const_tree t, hashv
 	  }
 	return val;
       }
-    case MEM_REF:
-      {
-	/* The type of the second operand is relevant, except for
-	   its top-level qualifiers.  */
-	tree type = TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (t, 1)));
-
-	val = iterative_hash_object (TYPE_HASH (type), val);
-
-	/* We could use the standard hash computation from this point
-	   on.  */
-	val = iterative_hash_object (code, val);
-	val = iterative_hash_expr (TREE_OPERAND (t, 1), val);
-	val = iterative_hash_expr (TREE_OPERAND (t, 0), val);
-	return val;
-      }
     case FUNCTION_DECL:
       /* When referring to a built-in FUNCTION_DECL, use the __builtin__ form.
 	 Otherwise nodes that compare equal according to operand_equal_p might
Re: Improve pretty printing of obj type ref.
On Wed, 25 Sep 2013, Jan Hubicka wrote:

> Hi,
> the type of class whose method is called used to be determinable (in
> wrong way) by looking up type of OBJ_TYPE_REF_OBJECT parameter.  Now we
> determine it from the method type of the call and this is not printed
> anywhere.  This adds a "cast" of the OBJ_TYPE_REF_OBJECT so i can see it
> easily.
>
> Bootstrapped/regtested x86_64-linux, OK?
>
> Honza
>
> 	* tree-pretty-print.c (dump_generic_node): Print class type of
> 	OBJ_TYPE_REF.
> Index: tree-pretty-print.c
> ===================================================================
> --- tree-pretty-print.c	(revision 202838)
> +++ tree-pretty-print.c	(working copy)
> @@ -2040,6 +2040,12 @@ dump_generic_node (pretty_printer *buffe
>        pp_string (buffer, "OBJ_TYPE_REF(");
>        dump_generic_node (buffer, OBJ_TYPE_REF_EXPR (node), spc, flags, false);
>        pp_semicolon (buffer);
> +      if (virtual_method_call_p (node))
> +	{
> +	  pp_string (buffer, "(");
> +	  dump_generic_node (buffer, obj_type_ref_class (node), spc, flags, false);

I think you want flags | TDF_SLIM here.  Ok with that change.

Richard.
Strengthen condition in cgraph_resolve_speculation
Hi,
when the target of a polymorphic call was first determined speculatively
and later determined exactly, we compare the targets for a match and, if
they match, we use the original speculative edge.  This is because we may
have already optimized through this edge.  The test is done by comparing
decls, which brings unnecessary failures when we use local aliases.
Fixed thus.

Regtested/bootstrapped x86_64-linux, committed.

Index: ChangeLog
===================================================================
--- ChangeLog	(revision 202887)
+++ ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2013-09-25  Jan Hubicka
+
+	* cgraph.c (cgraph_resolve_speculation): Use semantical equivalency
+	test.
+
 2013-09-25  Marek Polacek
 
 	PR sanitizer/58420
Index: cgraph.c
===================================================================
--- cgraph.c	(revision 202838)
+++ cgraph.c	(working copy)
@@ -1188,7 +1188,9 @@ cgraph_resolve_speculation (struct cgrap
   gcc_assert (edge->speculative);
 
   cgraph_speculative_call_info (edge, e2, edge, ref);
-  if (ref->referred->symbol.decl != callee_decl)
+  if (!callee_decl
+      || !symtab_semantically_equivalent_p ((symtab_node) ref->referred,
+					    symtab_get_node (callee_decl)))
     {
       if (dump_file)
 	{
Context sensitive type inheritance graph walking
Hi,
this is an updated version of
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00936.html

It updates the type inheritance graph walking algorithm to be context
sensitive.  The context is stored in ipa_polymorphic_call_context,
containing:

 - OUTER_TYPE: the type of the memory object that contains the object
   whose method is called, NULL if unknown
 - OFFSET: the offset within OUTER_TYPE where the object is located
 - MAYBE_IN_CONSTRUCTION: true if the object may be in construction,
   i.e. it is a local or static variable.  At the moment we do not try to
   disprove construction; that is partly done by ipa-prop and is planned
   to be merged here in the future.
 - MAYBE_DERIVED_TYPE: set if the OUTER_TYPE object actually may be a
   derivation of the type, for example when OUTER_TYPE was determined
   from the THIS parameter of a method.

There is cgraph_indirect_call_info that walks GIMPLE code and attempts to
determine the context of a given call.  It looks for objects located in
declarations (static vars/automatic vars/parameters), objects passed by
invisible reference, and objects passed as THIS pointers.  The second two
cases are new; the first case is already done by
gimple_extract_devirt_binfo_from_cst.

The context is determined by the cgraph construction code and stored in
cgraph edges.  In the future I expect ipa-prop to manipulate and update
the contexts and propagate across them, but that is not implemented.  For
this reason I did not add the context itself into the cgraph edge; I
merely added the new info and hacked ipa-prop (temporarily) to simply
throw away the actual info.

The high-level idea is to make get_class_context fill in info for the
known-type and ancestor functions, too, and then we can have a function
combining types + doing propagation across them, replacing the current
code that deals with BINFOs directly and thus cannot deal with all the
cases above very well.

possible_polymorphic_call_targets is now a bit more complex.  It is split
into:

 - get_class_context: this function walks the OUTER_TYPE (if present) and
   looks up the type of the class actually containing the method being
   called.  Previously similar logic was part of
   gimple_extract_devirt_binfo_from_cst.
 - walk_bases: when the type is in construction, all base types are
   walked and for those containing OTR_TYPE at the proper memory location
   the corresponding virtual methods are added to the list.
 - record_binfos now walks the inner binfo from OUTER_TYPE to OTR_TYPE
   via get_binfo_at_offset.

Bootstrapped/regtested x86_64-linux.  The patch is tested by the
ipa-devirt9 testcase.  I have an extra four, but I would like to first fix
the case where the devirtualization happens in the TODO of
early_local_passes, which is not dumped anywhere.  So I plan to post these
incrementally once this code is hooked also into gimple folding.

The patch results in 60% more devirtualizations on Firefox and 10% more
speculative devirtualizations.  I think the main component missing now is
code determining the dynamic type from a call to a constructor.  I have
some prototype for this, too, which I would like to discuss incrementally.
I am not 100% sure how much harder tracking of dynamic type changes
becomes here.

Martin, does it seem to make sense?

Honza

	* cgraph.c (cgraph_create_indirect_edge): Use
	get_polymorphic_call_info.
	* cgraph.h (cgraph_indirect_call_info): Add outer_type,
	maybe_in_construction and maybe_derived_type.
	* ipa-utils.h (ipa_polymorphic_call_context): New structure.
	(ipa_dummy_polymorphic_call_context): New global var.
	(possible_polymorphic_call_targets): Add context parameter.
	(dump_possible_polymorphic_call_targets): Likewise; update
	wrappers.
	(possible_polymorphic_call_target_p): Likewise.
	(get_polymorphic_call_info): New function.
	* ipa-devirt.c (ipa_dummy_polymorphic_call_context): New function.
	(add_type_duplicate): Remove forgotten debug output.
	(method_class_type): Add sanity check.
	(maybe_record_node): Add FINALP parameter.
	(record_binfo): Add OUTER_TYPE and OFFSET; walk the inner
	binfo by get_binfo_at_offset.
	(possible_polymorphic_call_targets_1): Add OUTER_TYPE/OFFSET
	parameters; pass them to record_binfo.
	(polymorphic_call_target_d): Add context and FINAL.
	(polymorphic_call_target_hasher::hash): Hash context.
	(polymorphic_call_target_hasher::equal): Compare context.
	(free_polymorphic_call_targets_hash):
	(get_class_context): New function.
	(contains_type_p): New function.
	(get_polymorphic_call_info): New function.
	(walk_bases): New function.
	(possible_polymorphic_call_targets): Add context parameter;
	honor it.
	(dump_possible_polymorphic_call_targets): Dump context.
	(possible_polymorphic_call_target_p): Add context.
	(update_type_inheritance_graph): Update comments.
	(ipa_set_jf_known_type): Asser
[C++ PATCH] Splice when giving an error (PR c++/58510)
The following testcase ICEd because complete_ctor_at_level_p got a union
with two initializers - and didn't like that.  I think we can get away
with splicing when sorting the initializers: we already gave an error and
the program isn't accepted.

Regtested/bootstrapped on x86_64-linux, ok for trunk?

2013-09-25  Marek Polacek

	PR c++/58510
cp/
	* init.c (sort_mem_initializers): Splice when giving an error.
testsuite/
	* g++.dg/cpp0x/pr58510.C: New test.

--- gcc/cp/init.c.mp	2013-09-25 11:50:18.246432664 +0200
+++ gcc/cp/init.c	2013-09-25 11:50:18.262432728 +0200
@@ -980,9 +980,12 @@ sort_mem_initializers (tree t, tree mem_
 	  else if (TREE_VALUE (*last_p) && !TREE_VALUE (init))
 	    goto splice;
 	  else
-	    error_at (DECL_SOURCE_LOCATION (current_function_decl),
-		      "initializations for multiple members of %qT",
-		      ctx);
+	    {
+	      error_at (DECL_SOURCE_LOCATION (current_function_decl),
+			"initializations for multiple members of %qT",
+			ctx);
+	      goto splice;
+	    }
 	}
 
       last_p = p;
--- gcc/testsuite/g++.dg/cpp0x/pr58510.C.mp	2013-09-25 12:19:02.612137551 +0200
+++ gcc/testsuite/g++.dg/cpp0x/pr58510.C	2013-09-25 12:45:13.157119958 +0200
@@ -0,0 +1,11 @@
+// PR c++/58510
+// { dg-do compile { target c++11 } }
+
+void foo()
+{
+  union
+  {	// { dg-error "multiple" }
+    int i = 0;
+    char c = 0;
+  };
+}

	Marek
[PATCH] Improve out-of-SSA coalescing
This loosens the restriction of only coalescing SSA names with the same
base variable by ignoring that restriction for DECL_IGNORED_P base
variables (ok, all of them can and should be anonymous SSA names now, but
code obviously hasn't caught up 100%).

This improves the code generated for the loop in the testcase to

	.p2align 4,,10
	.p2align 3
.L4:
	xorps	%xmm1, %xmm1
	cvtsi2ss	%eax, %xmm1
	addl	$1, %eax
	cmpl	%edi, %eax
	addss	%xmm1, %xmm0
	jne	.L4

from

	jmp	.L4
	.p2align 4,,10
	.p2align 3
.L6:
	movaps	%xmm0, %xmm1
.L4:
	xorps	%xmm0, %xmm0
	cvtsi2ss	%eax, %xmm0
	addl	$1, %eax
	cmpl	%edi, %eax
	addss	%xmm1, %xmm0
	jne	.L6

avoiding the copy on the backedge and the loop entry jump.  Overall this
is similar to what Jeff was after with his latest adjustment of this code.

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

Richard.

2013-09-25  Richard Biener

	* tree-ssa-live.c (var_map_base_init): Handle SSA names with
	DECL_IGNORED_P base variables like anonymous SSA names.
	(loe_visit_block): Use gcc_checking_assert.
	* tree-ssa-coalesce.c (create_outofssa_var_map): Use
	gimple_assign_ssa_name_copy_p.
	(gimple_can_coalesce_p): Adjust according to the var_map_base_init
	change.

	* gcc.dg/tree-ssa/coalesce-2.c: New testcase.

Index: gcc/tree-ssa-live.c
===================================================================
*** gcc/tree-ssa-live.c	(revision 202883)
--- gcc/tree-ssa-live.c	(working copy)
*************** var_map_base_init (var_map map)
*** 104,110 ****
        struct tree_int_map **slot;
        unsigned baseindex;
        var = partition_to_var (map, x);
!       if (SSA_NAME_VAR (var))
  	m->base.from = SSA_NAME_VAR (var);
        else
  	/* This restricts what anonymous SSA names we can coalesce
--- 104,111 ----
        struct tree_int_map **slot;
        unsigned baseindex;
        var = partition_to_var (map, x);
!       if (SSA_NAME_VAR (var)
! 	  && !DECL_IGNORED_P (SSA_NAME_VAR (var)))
  	m->base.from = SSA_NAME_VAR (var);
        else
  	/* This restricts what anonymous SSA names we can coalesce
*************** loe_visit_block (tree_live_info_p live,
*** 990,998 ****
    edge_iterator ei;
    basic_block pred_bb;
    bitmap loe;
-   gcc_assert (!bitmap_bit_p (visited, bb->index));
    bitmap_set_bit (visited, bb->index);
    loe = live_on_entry (live, bb);
  
    FOR_EACH_EDGE (e, ei, bb->preds)
--- 993,1002 ----
    edge_iterator ei;
    basic_block pred_bb;
    bitmap loe;
+   gcc_checking_assert (!bitmap_bit_p (visited, bb->index));
    bitmap_set_bit (visited, bb->index);
+ 
    loe = live_on_entry (live, bb);
  
    FOR_EACH_EDGE (e, ei, bb->preds)
Index: gcc/tree-ssa-coalesce.c
===================================================================
*** gcc/tree-ssa-coalesce.c	(revision 202883)
--- gcc/tree-ssa-coalesce.c	(working copy)
*************** create_outofssa_var_map (coalesce_list_p
*** 980,989 ****
  	    {
  	      tree lhs = gimple_assign_lhs (stmt);
  	      tree rhs1 = gimple_assign_rhs1 (stmt);
! 
! 	      if (gimple_assign_copy_p (stmt)
! 		  && TREE_CODE (lhs) == SSA_NAME
! 		  && TREE_CODE (rhs1) == SSA_NAME
  		  && gimple_can_coalesce_p (lhs, rhs1))
  		{
  		  v1 = SSA_NAME_VERSION (lhs);
--- 982,988 ----
  	    {
  	      tree lhs = gimple_assign_lhs (stmt);
  	      tree rhs1 = gimple_assign_rhs1 (stmt);
! 	      if (gimple_assign_ssa_name_copy_p (stmt)
  		  && gimple_can_coalesce_p (lhs, rhs1))
  		{
  		  v1 = SSA_NAME_VERSION (lhs);
*************** gimple_can_coalesce_p (tree name1, tree
*** 1347,1353 ****
  {
    /* First check the SSA_NAME's associated DECL.  We only want to
       coalesce if they have the same DECL or both have no associated DECL.  */
!   if (SSA_NAME_VAR (name1) != SSA_NAME_VAR (name2))
      return false;
  
    /* Now check the types.  If the types are the same, then we should
--- 1346,1356 ----
  {
    /* First check the SSA_NAME's associated DECL.  We only want to
       coalesce if they have the same DECL or both have no associated DECL.  */
!   tree var1 = SSA_NAME_VAR (name1);
!   tree var2 = SSA_NAME_VAR (name2);
!   var1 = (var1 && !DECL_IGNORED_P (var1)) ? var1 : NULL_TREE;
!   var2 = (var2 && !DECL_IGNORED_P (var2)) ? var2 : NULL_TREE;
!   if (var1 != var2)
      return false;
  
    /* Now check the types.  If the types are the same, then we should
Index: gcc/testsuite/gcc.dg/tree-ssa/coalesce-2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/coalesce-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/coalesce-2.c	(w
Re: [PATCH GCC] Tweak gimple-ssa-strength-reduction.c:backtrace_base_for_ref () to cover different cases as seen on AArch64
Hello, Please find the updated version of the patch in the attachment. It has addressed the previous comments and also included some changes in order to pass the bootstrapping on x86_64. It's also passed the regtest on arm-none-eabi and aarch64-none-elf. It will also fix the test failure as reported here: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01317.html OK for the trunk? Thanks, Yufeng gcc/ * gimple-ssa-strength-reduction.c (safe_to_multiply_p): New function. (backtrace_base_for_ref): Call get_unwidened, check 'base_in' again and set unwidend_p with true; call safe_to_multiply_p to avoid unsafe unwidened cases. gcc/testsuite/ * gcc.dg/tree-ssa/slsr-40.c: New test. On 09/11/13 13:39, Bill Schmidt wrote: On Wed, 2013-09-11 at 10:32 +0200, Richard Biener wrote: On Tue, Sep 10, 2013 at 5:53 PM, Yufeng Zhang wrote: Hi, Following Bin's patch in http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00695.html, this patch tweaks backtrace_base_for_ref () to strip of any widening conversion after the first TREE_CODE check fails. Without this patch, the test (gcc.dg/tree-ssa/slsr-39.c) in Bin's patch will fail on AArch64, as backtrace_base_for_ref () will stop if not seeing an ssa_name since the tree code can be nop_expr instead. Regtested on arm and aarch64; still bootstrapping x86_64. OK for the trunk if the x86_64 bootstrap succeeds? Please add a testcase. Also, the comment "Strip of" should read "Strip off". Otherwise I have no comments. Thanks, Bill Richard.diff --git a/gcc/gimple-ssa-strength-reduction.c b/gcc/gimple-ssa-strength-reduction.c index 8d48add..1c04382 100644 --- a/gcc/gimple-ssa-strength-reduction.c +++ b/gcc/gimple-ssa-strength-reduction.c @@ -750,6 +750,40 @@ slsr_process_phi (gimple phi, bool speed) add_cand_for_stmt (phi, c); } +/* Utility function for backtrace_base_for_ref. 
+ + Given + + T2 = T2' + CST + RES = (wider_type) T2 * C3 + + where TYPE is TREE_TYPE (T2), this function returns false when it is + _not_ safe to carry out the following transformation. + + RES = (wider_type) T2' * C3 + (wider_type) CST * C3 + + One example unsafe case is: + + int array[40]; + array[n - 1] + + where n is a 32-bit unsigned int and pointers are 64 bits long. In this + case, the gimple for (n - 1) is: + + _2 = n_1(D) + 4294967295; // 0xffffffff + + and it is wrong to multiply the large constant by 4 in the 64-bit space. */ + +static bool +safe_to_multiply_p (tree type, double_int cst) +{ + if (TYPE_UNSIGNED (type) + && ! double_int_fits_to_tree_p (signed_type_for (type), cst)) +return false; + + return true; +} + /* Given PBASE which is a pointer to tree, look up the defining statement for it and check whether the candidate is in the form of: @@ -766,10 +800,19 @@ backtrace_base_for_ref (tree *pbase) { tree base_in = *pbase; slsr_cand_t base_cand; + bool unwidened_p = false; STRIP_NOPS (base_in); if (TREE_CODE (base_in) != SSA_NAME) -return tree_to_double_int (integer_zero_node); +{ + /* Strip off widening conversion(s) to handle cases where +e.g. 'B' is widened from an 'int' in order to calculate +a 64-bit address. */ + base_in = get_unwidened (base_in, NULL_TREE); + if (TREE_CODE (base_in) != SSA_NAME) + return tree_to_double_int (integer_zero_node); + unwidened_p = true; +} base_cand = base_cand_from_table (base_in); @@ -777,7 +820,10 @@ backtrace_base_for_ref (tree *pbase) { if (base_cand->kind == CAND_ADD && base_cand->index.is_one () - && TREE_CODE (base_cand->stride) == INTEGER_CST + && TREE_CODE (base_cand->stride) == INTEGER_CST + && (! unwidened_p + || safe_to_multiply_p (TREE_TYPE (base_cand->stride), +tree_to_double_int (base_cand->stride)))) { /* X = B + (1 * S), S is integer constant.
*/ *pbase = base_cand->base_expr; @@ -785,8 +831,11 @@ backtrace_base_for_ref (tree *pbase) } else if (base_cand->kind == CAND_ADD && TREE_CODE (base_cand->stride) == INTEGER_CST - && integer_onep (base_cand->stride)) -{ + && integer_onep (base_cand->stride) + && (! unwidened_p + || safe_to_multiply_p (TREE_TYPE (base_cand->base_expr), + base_cand->index))) + { /* X = B + (i * S), S is integer one. */ *pbase = base_cand->base_expr; return base_cand->index; diff --git a/gcc/testsuite/gcc.dg/tree-ssa/slsr-40.c b/gcc/testsuite/gcc.dg/tree-ssa/slsr-40.c new file mode 100644 index 000..72726a3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/slsr-40.c @@ -0,0 +1,27 @@ +/* Verify straight-line strength reduction for array + subscripting. + + elems[n-1] is reduced to elems + n * 4 + 0x * 4, only w
RE: [PATCH, PR 57748] Check for out of bounds access, Part 2
Hmm., On Tue, 24 Sep 2013 20:06:51, Martin Jambor wrote: > Hi, > > On Tue, Sep 24, 2013 at 12:02:17PM +0200, Richard Biener wrote: >> On Tue, Sep 24, 2013 at 4:52 AM, Bernd Edlinger >> wrote: >>> Hi, >>> >>> with the attached patch the read-side of the out of bounds accesses are >>> fixed. >>> There is a single new test case pr57748-3.c that is derived from Martin's >>> test case. >>> The test case does only test the read access and does not depend on part 1 >>> of the >>> patch. >>> >>> This patch was boot-strapped and regression tested on >>> x86_64-unknown-linux-gnu. >>> >>> Additionally I generated eCos and an eCos-application (on ARMv5 using packed >>> structures) with an arm-eabi cross compiler, and looked for differences in >>> the >>> disassembled code with and without this patch, but there were none. >>> >>> OK for trunk? >> >> Index: gcc/expr.c >> === >> --- gcc/expr.c (revision 202778) >> +++ gcc/expr.c (working copy) >> @@ -9878,7 +9878,7 @@ >> && modifier != EXPAND_STACK_PARM >> ? target : NULL_RTX), >> VOIDmode, >> - modifier == EXPAND_SUM ? EXPAND_NORMAL : modifier); >> + EXPAND_MEMORY); >> >> /* If the bitfield is volatile, we want to access it in the >> field's mode, not the computed mode. >> >> context suggests that we may arrive with EXPAND_STACK_PARM here >> which is a correctness modifier (see its docs). But I'm not too familiar >> with the details of the various expand modifiers, Eric may be though. " EXPAND_STACK_PARM is used when expanding to a TARGET on the stack for a call parameter. Such targets require special care as we haven't yet marked TARGET so that it's safe from being trashed by libcalls. We don't want to use TARGET for anything but the final result; Intermediate values must go elsewhere. Additionally, calls to emit_block_move will be flagged with BLOCK_OP_CALL_PARM." 
This should not be a problem: if modifier == EXPAND_STACK_PARM, the target is NOT passed down, and thus not giving the modifier either should not cause incorrect code. But I might be wrong... On the other hand, in the case where tem is an unaligned MEM_REF and exp is not exactly the same object as tem, EXPAND_STACK_PARM may do exactly the wrong thing.

What I can tell is that this patch does not change a single bit in the complete GCC bootstrap process, stage 2 and stage 3 are binary identical with or without this patch, and all test cases pass, even the Ada test cases...

>> That said, I still believe that fixing the misalign path in expand_assignment
>> would be better than trying to avoid it. For this testcase the issue is
>> again that expand_assignment passes the wrong mode/target to the
>> movmisalign optab.
>
> Perhaps I'm stating the obvious, but this is supposed to address a
> separate bug in the expansion of the RHS (as opposed to the first bug,
> which was in the expansion of the LHS), and so we would have to make
> expand_assignment also examine potential misalignments in the RHS,
> which it so far does not do, despite its complexity.

Thanks for pointing that out. This is true.

> Having said that, I also dislike the patch, but I am quite convinced
> that if we allow non-BLK structures with zero sized arrays, the fix
> will be ugly - but I'll be glad to be shown I've been wrong.
>
> Martin

I could of course look at the type of tem, and only mess with the expand modifier in the case of a MEM_REF. But I am not sure what to do about the VIEW_CONVERT_EXPR: if the code at normal_inner_ref is wrong, the code at VIEW_CONVERT_EXPR can't possibly be any better, although I do not yet have any idea how to write a test case for this:

/* ??? We should work harder and deal with non-zero offsets. */ if (!offset && (bitpos % BITS_PER_UNIT) == 0 && bitsize >= 0 && compare_tree_int (TYPE_SIZE (type), bitsize) == 0) { /* See the normal_inner_ref case for the rationale.
*/ orig_op0 = expand_expr (tem, (TREE_CODE (TREE_TYPE (tem)) == UNION_TYPE && (TREE_CODE (TYPE_SIZE (TREE_TYPE (tem))) != INTEGER_CST) && modifier != EXPAND_STACK_PARM ? target : NULL_RTX), VOIDmode, modifier == EXPAND_SUM ? EXPAND_NORMAL : modifier);

Especially for EXPAND_STACK_PARM? If this really does allocate something in a register, then the result will be thrown away, because it is not MEM_P? Right?

Note the RHS bug can also affect ordinary unions on ARM and all STRICT_ALIGNMENT targets. Please check the attached test case. It may not be obvious at first sight, but the assembler code for check is definitely wrong, because it reads the whole structure and uses only one byte of it. And that is not OK, because x was volatile... check:
Re: [PATCH][ubsan] Add VLA bound instrumentation
On Thu, Sep 12, 2013 at 02:26:55PM +0200, Marek Polacek wrote:
> This patch adds the instrumentation of VLA bounds. Basically, it just checks that the size of a VLA is positive. That is, we also issue an error if the size of the VLA is 0. It catches e.g.
>
> int i = 1;
> int a[i][i - 2];
>
> It is pretty straightforward, but I had issues in the C++ FE, mainly choosing the right spot where to instrument... Hopefully I picked the right one. Also note that in C++1y we throw an exception when the size of a VLA is negative; hence no need to perform the instrumentation if -std=c++1y is in effect.
>
> Regtested/ran bootstrap-ubsan on x86_64-linux, also make check -C gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} ubsan.exp' passes.
>
> Ok for trunk?

I'd like to ping this patch; below is a rebased version with the ubsan.c hunk omitted, since that part was already fixed by another patch. (It still doesn't contain alloca/SIZE_MAX/... checking, since that very much relies on libubsan. Still, it'd be felicitous to get at least the basic VLA checking in.) Ran ubsan testsuite + bootstrap-ubsan on x86_64-linux.

2013-09-25 Marek Polacek

* opts.c (common_handle_option): Handle vla-bound.
* sanitizer.def (BUILT_IN_UBSAN_HANDLE_VLA_BOUND_NOT_POSITIVE): Define.
* flag-types.h (enum sanitize_code): Add SANITIZE_VLA.
* asan.c (initialize_sanitizer_builtins): Build BT_FN_VOID_PTR_PTR.

c-family/
* c-ubsan.c: Don't include hash-table.h.
(ubsan_instrument_vla): New function.
* c-ubsan.h: Declare it.

cp/
* decl.c (create_array_type_for_decl): Add VLA instrumentation.

c/
* c-decl.c (grokdeclarator): Add VLA instrumentation.

testsuite/
* g++.dg/ubsan/cxx1y-vla.C: New test.
* c-c++-common/ubsan/vla-3.c: New test.
* c-c++-common/ubsan/vla-2.c: New test.
* c-c++-common/ubsan/vla-4.c: New test.
* c-c++-common/ubsan/vla-1.c: New test.
--- gcc/opts.c.mp 2013-09-25 14:06:58.531276511 +0200 +++ gcc/opts.c 2013-09-25 14:07:03.580294566 +0200 @@ -1428,6 +1428,7 @@ common_handle_option (struct gcc_options { "undefined", SANITIZE_UNDEFINED, sizeof "undefined" - 1 }, { "unreachable", SANITIZE_UNREACHABLE, sizeof "unreachable" - 1 }, + { "vla-bound", SANITIZE_VLA, sizeof "vla-bound" - 1 }, { NULL, 0, 0 } }; const char *comma; --- gcc/c-family/c-ubsan.c.mp 2013-09-25 14:06:58.535276527 +0200 +++ gcc/c-family/c-ubsan.c 2013-09-25 14:07:03.580294566 +0200 @@ -25,7 +25,6 @@ along with GCC; see the file COPYING3. #include "alloc-pool.h" #include "cgraph.h" #include "gimple.h" -#include "hash-table.h" #include "output.h" #include "toplev.h" #include "ubsan.h" @@ -86,8 +85,7 @@ ubsan_instrument_division (location_t lo return t; } -/* Instrument left and right shifts. If not instrumenting, return - NULL_TREE. */ +/* Instrument left and right shifts. */ tree ubsan_instrument_shift (location_t loc, enum tree_code code, @@ -157,4 +155,23 @@ ubsan_instrument_shift (location_t loc, t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node); return t; +} + +/* Instrument variable length array bound. */ + +tree +ubsan_instrument_vla (location_t loc, tree size) +{ + tree type = TREE_TYPE (size); + tree t, tt; + + t = fold_build2 (LE_EXPR, boolean_type_node, size, build_int_cst (type, 0)); + tree data = ubsan_create_data ("__ubsan_vla_data", +loc, ubsan_type_descriptor (type), NULL_TREE); + data = build_fold_addr_expr_loc (loc, data); + tt = builtin_decl_explicit (BUILT_IN_UBSAN_HANDLE_VLA_BOUND_NOT_POSITIVE); + tt = build_call_expr_loc (loc, tt, 2, data, ubsan_encode_value (size)); + t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node); + + return t; } --- gcc/c-family/c-ubsan.h.mp 2013-09-25 14:06:58.538276539 +0200 +++ gcc/c-family/c-ubsan.h 2013-09-25 14:07:03.595294628 +0200 @@ -23,5 +23,6 @@ along with GCC; see the file COPYING3. 
extern tree ubsan_instrument_division (location_t, tree, tree); extern tree ubsan_instrument_shift (location_t, enum tree_code, tree, tree); +extern tree ubsan_instrument_vla (location_t, tree); #endif /* GCC_C_UBSAN_H */ --- gcc/sanitizer.def.mp2013-09-25 14:06:58.542276558 +0200 +++ gcc/sanitizer.def 2013-09-25 14:07:03.628294753 +0200 @@ -297,3 +297,7 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HAN "__ubsan_handle_builtin_unreachable", BT_FN_VOID_PTR, ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST) +DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_VLA_BOUND_NOT_POSITIVE, + "__ubsan_handle_vla_bound_not_positive", + BT_FN_VOID_PTR_PTR, + ATTR_COLD_NOTHROW_LEAF_LIST) --- gcc/flag-types.h.mp 2013-09-25 14:06:58.546276575 +0200 +++ gcc/flag-types.h
Re: [PATCH] Improve out-of-SSA coalescing
On Wed, 25 Sep 2013, Richard Biener wrote:
> This loosens the restriction of only coalescing SSA names with the same base variable by ignoring that restriction for DECL_IGNORED_P base variables (ok, all of them can and should be anonymous SSA names now, but code obviously hasn't caught up 100%).
>
> This improves the code generated for the loop in the testcase to
>
> .p2align 4,,10
> .p2align 3
> .L4:
> xorps %xmm1, %xmm1
> cvtsi2ss %eax, %xmm1
> addl $1, %eax
> cmpl %edi, %eax
> addss %xmm1, %xmm0
> jne .L4
>
> from
>
> jmp .L4
> .p2align 4,,10
> .p2align 3
> .L6:
> movaps %xmm0, %xmm1
> .L4:
> xorps %xmm0, %xmm0
> cvtsi2ss %eax, %xmm0
> addl $1, %eax
> cmpl %edi, %eax
> addss %xmm1, %xmm0
> jne .L6
>
> avoiding the copy on the backedge and the loop entry jump. Overall this is similar to what Jeff was after with his latest adjustment of this code.
>
> Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

For some reason it miscompiles GCC itself. Hmm. Cannot spot the obvious error yet.

Richard.
Re: [PATCH, ARM, LRA] Prepare ARM build with LRA
Hmm, I don't see clearly where we lose the XEXP (x, n) information when calling must_be_base_p (*inner) and/or must_be_index_p (*inner) in set_address_base and set_address_index.

BTW, the validation on ARM (AArch32 and AArch64) is clean.

Thanks, Yvan

On 24 September 2013 18:36, Richard Sandiford wrote:
> Eric Botcazou writes:
>>> Sorry, I'm not sure I understand well, it means that you prefer the shift_and_rotate_code_p notation, right?
>>
>> Let's do the following in addition to the lsb_bitfield_op_p thing:
>> 1. Replace the LO_SUM test in set_address_base by a call to must_be_base_p,
>> 2. Replace the MULT || ASHIFT test in set_address_index by a call to must_be_index_p,
>> 3. Add the new cases to must_be_index_p directly, with a comment saying that they are e.g. for the ARM.
>
> FWIW, I'd prefer to keep it as-is, since must_be_base_p (x) and must_be_index_p (x) don't imply that we should look specifically at XEXP (x, 0) (rather than just X, or XEXP (x, 1), etc.). I think it's better to keep the code tests and the associated XEXPs together.
>
> Thanks,
> Richard
Re: [patch] Separate immediate uses and phi routines from tree-flow*.h
On 09/25/2013 04:49 AM, Richard Biener wrote:
On Tue, Sep 24, 2013 at 4:39 PM, Andrew MacLeod wrote:

3 - a few routines seem like basic gimple routines, but really turn out to require the operand infrastructure to implement... so they are moved to tree-ssa-operands.[ch] as well. This sort of thing showed up when removing tree-ssa-operands.h from the gimple.h include file. These were things like gimple_vuse_op, gimple_vdef_op, update_stmt, and update_stmt_if_modified.

Note that things like gimple_vuse_op are on the interface border between gimple (where the SSA operands are stored) and SSA operands. So it's not so clear for them, given they access internal gimple fields directly but use the regular SSA operand API. I'd prefer gimple_vuse_op and gimple_vdef_op to stay in gimple.[ch]. If you want to remove the tree-ssa-operands.h include from gimple.h (which in some way makes sense), then we need a new gimple-ssa.[ch] for routines that operate on the "gimple in SSA" IL (not purely on gimple, which could be not in SSA form, and not purely on the SSA web, which could in theory be constructed over a non-GIMPLE representation).

Actually I like the idea of gimple-ssa.[ch]. tree-ssa-operands.[ch] would then be solely the place for the operand infrastructure internals implementation, not a place for generic SSA utility routines related to SSA operands. Moving update_stmt/update_stmt_if_modified is a similar case but maybe less obvious (unless we have gimple-ssa.[ch]). What do you think?

Yeah, I was mulling something similar. I wasn't super thrilled that those were going into tree-ssa-operands.h... but they didn't belong in gimple.h either. I think adding gimple-ssa.[ch] is the way to go.

(swap_tree_operands): Move prototype to tree-ssa-operands.h.

rename to swap_ssa_operands and remove the !ssa_operands_active path? (should be no longer needed I think)

Yeah, I think you are right. It's not being used by tree-only routines, and it's trivial in that case anyway.
(gimple_phi_arg_def, gimple_phi_arg_def_ptr, gimple_phi_arg_edge, gimple_phi_arg_location, gimple_phi_arg_location_from_edge, gimple_phi_arg_set_location, gimple_phi_arg_has_location, phi_nodes, phi_nodes_ptr, set_phi_nodes, phi_ssa_name_p): Move to tree-phinodes.h.

I always found the separation of the PHI node API from the gimple API odd ... now, PHI nodes are an SSA speciality. Unless the gimple_phi class moves to tree-phinodes.h, the same comments as for gimple_vuse_op apply. But maybe everything will be more obvious once we separate the core gimple data structure from the gimple API stuff ...

Hrm. Yeah, this one isn't as clear as it might seem at first. gimple.h does have a few accessors for the base statement kind... and these are accessors for the argument, which is from core-types.h. I imagine that tree-phinodes.h should do all the phi-node related stuff short of the actual stmt itself... but it's not immediately obvious. I'd like to leave these in tree-phinodes.h and I'll make a stab at how to manage that next. It would actually be nice to get struct phi_arg_d *out* of core-types.h... but maybe that's too grand. I can move them again later if it turns out they should be in the border file or elsewhere.

Overall the patch looks ok; either massage it according to my comments above (thinking about gimple-ssa.[ch]) or follow up with a separate patch (not that moving things twice will hurt).

I'll do the massage. I like the gimple-ssa border files. I suspect there will be more shuffling of some of these routines down the road... right now it's a pretty high-level separation that's happening. Once things are better clustered, I expect more appropriate fine-grained breakdowns will become apparent.

Andrew
RE: [PATCH] Portable Volatility Warning
Hi Sandra, thanks a lot, your comments are very welcome, especially as I am not a native English speaker... On Tue, 24 Sep 2013 15:46:22, Sandra Loosemore wrote: > > I have some nit-picky documentation suggestions about this patch > > http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00100.html > >> + warning_at (input_location, OPT_Wportable_volatility, >> + "the code to accesses this volatile member is dependent on" >> + " whether -fstrict-volatile-bitfields is used;" >> + " accessing device registers may depend on this option" >> + " however it contradicts the C11/C++11 memory model"); > > I'd simplify this to something like > > "access to volatile member may not conform to the C11/C++11" > " memory model if -fstrict-volatile-bitfields is in effect" > > >> --- gcc/c-family/c.opt 2013-05-28 21:55:10.0 +0200 >> +++ gcc/c-family/c.opt 2013-07-13 11:02:38.0 +0200 >> @@ -629,6 +629,10 @@ Wpointer-to-int-cast >> C ObjC Var(warn_pointer_to_int_cast) Init(1) Warning >> Warn when a pointer is cast to an integer of a different size >> >> +Wportable-volatility >> +C ObjC C++ ObjC++ Var(warn_portable_volatility) Warning >> +Warn about code for which separate incompatible definitions exist even >> within gcc >> + > > This seems too vague. How about just > > Warn about potentially nonportable volatile memory accesses > >> +@item -Wportable-volatility >> +@opindex Wportable-volatility >> +@opindex Wno-portable-volatility >> +Warn about code which accesses volatile structure members for > > s/which/that/ > >> +which different ABI specifications may exist whether in some >> +language standard or in a target-specific ABI document. >> + >> +For instance on the ARM architecture AAPCS specifies how to >> +access volatile bit-fields. But for C/C++ there exists a >> +language standard, the C11/C++11 memory model. GCC tries to >> +follow the latter by default unless -fstrict-volatile-bitfields >> +is used.
>> + >> +As an example this option will warn about code which accesses >> +packed volatile structure members which may be dependent on >> +whether -fstrict-volatile-bitfields is used or not. > > I'd replace the last two paragraphs with just: > > In particular, this option causes GCC to warn about volatile memory > accesses for which use of the @option{-fstrict-volatile-bitfields} > option causes code to be generated that does not conform to the > C11/C++11 memory model. > > I'd also like to see a cross-reference added to the documentation for > -fstrict-volatile-bitfields, like: > > You can use @option{-Wportable-volatility} to make GCC issue warnings > about volatile structure accesses affected by > @option{-fstrict-volatile-bitfields}. > > -Sandra Richard: I do not know, is this a political issue, that is blocking the whole of Sandra's patch? Actually we (softing.com) do not really care what happens to the default setting of -fstrict-volatile-bitfields. Maybe you could look at reviewing Sandra's part 1,2,3 independently of the rest? Once that is approved, I could start again my work on the warnings part. Meanwhile I've got an idea how to solve your first objection: On Tue, 3 Sep 2013 10:53:10, Richard Biener wrote: > Just a quick first note: > > @@ -3470,6 +3470,13 @@ optimize_bit_field_compare (location_t l >|| offset != 0 || TREE_CODE (linner) == PLACEHOLDER_EXPR) > return 0; > > + /* Do not try to optimize on volatiles, as it would defeat the > + -Wportable-volatility warning later: > + If the COMPONENT_REF is replaced by a BIT_FIELD_REF we loose > + information and thus the warning cannot be generated correctly. */ > + if (warn_portable_volatility && lvolatilep) > +return 0; > > changing code-generation with a warning option looks wrong to me. Note > that I completely removed this code once (it also generates strange > BIT_FIELD_REFs) but re-instantiated it on request because of lost > optimizations.
The idea is, I'd emit the warning at the time when the COMPONENT_REF is replaced by a BIT_FIELD_REF. Then the warning code would not have access to !MEM_P (op0) || !MEM_VOLATILE_P (op0) but this should not be a big issue. Thanks Bernd.
Re: [gomp4] Dumping gimple for offload.
On 24 Sep 11:02, Richard Biener wrote: > On Mon, Sep 23, 2013 at 3:29 PM, Ilya Tocar wrote: > > Hi, > > > > I've rebased my patch. > > Is it ok for gomp4 > > Passing through "is_omp" looks bad - please find a more suitable place > to attach this meta-information. From a quick look you only need it to > produce an alternate section name, Mostly for name, but there are other uses (e. g. choosing decl states vector). > thus consider assigning the section > name in a different place. > > Richard. What do you mean by different place? I can add global dumping_omp_target variable to choose correct name, depending on it's value (patch below). Is it better? --- gcc/lto-cgraph.c | 10 ++- gcc/lto-section-out.c | 6 +- gcc/lto-streamer-out.c | 182 + gcc/lto-streamer.c | 4 +- gcc/lto-streamer.h | 6 ++ gcc/passes.c | 2 +- gcc/passes.def | 2 + gcc/timevar.def| 2 + gcc/tree-pass.h| 2 + 9 files changed, 198 insertions(+), 18 deletions(-) diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c index 952588d..5187190 100644 --- a/gcc/lto-cgraph.c +++ b/gcc/lto-cgraph.c @@ -907,9 +907,15 @@ output_symtab (void) { symtab_node node = lto_symtab_encoder_deref (encoder, i); if (cgraph_node *cnode = dyn_cast (node)) -lto_output_node (ob, cnode, encoder); + { + if (!dumping_omp_target || lookup_attribute ("omp declare target", + DECL_ATTRIBUTES (node->symbol.decl))) + lto_output_node (ob, cnode, encoder); + } else -lto_output_varpool_node (ob, varpool (node), encoder); + if (!dumping_omp_target || lookup_attribute ("omp declare target", + DECL_ATTRIBUTES (node->symbol.decl))) + lto_output_varpool_node (ob, varpool (node), encoder); } diff --git a/gcc/lto-section-out.c b/gcc/lto-section-out.c index 59eed71..ad247c1 100644 --- a/gcc/lto-section-out.c +++ b/gcc/lto-section-out.c @@ -49,6 +49,7 @@ static vec decl_state_stack; vec lto_function_decl_states; +vec omp_function_decl_states; /* Output routines shared by all of the serialization passes. 
@@ -443,5 +444,8 @@ lto_record_function_out_decl_state (tree fn_decl, state->streams[i].tree_hash_table = NULL; } state->fn_decl = fn_decl; - lto_function_decl_states.safe_push (state); + if (dumping_omp_target) +omp_function_decl_states.safe_push (state); + else +lto_function_decl_states.safe_push (state); } diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c index 8f823f2..66a0747 100644 --- a/gcc/lto-streamer-out.c +++ b/gcc/lto-streamer-out.c @@ -48,6 +48,9 @@ along with GCC; see the file COPYING3. If not see #include "cfgloop.h" +/* Off by default. */ +bool dumping_omp_target = false; + /* Clear the line info stored in DATA_IN. */ static void @@ -2014,6 +2017,93 @@ lto_output (void) #endif } +bool +gate_omp_out (void) +{ + return flag_openmp; +} + +static void +omp_output (void) +{ + /* We need omp names for sections, turn this on. */ + dumping_omp_target = true; + + lto_streamer_hooks_init(); + struct lto_out_decl_state *decl_state; + int i, n_nodes; + lto_symtab_encoder_t encoder = lto_get_out_decl_state ()->symtab_node_encoder; + + /* Initialize the streamer. */ + lto_streamer_init (); + + n_nodes = lto_symtab_encoder_size (encoder); + /* Process only the functions with bodies. */ + for (i = 0; i < n_nodes; i++) +{ + symtab_node snode = lto_symtab_encoder_deref (encoder, i); + cgraph_node *node = dyn_cast (snode); + if (node + && lto_symtab_encoder_encode_body_p (encoder, node) + && !node->symbol.alias + && gimple_has_body_p (node->symbol.decl) + && lookup_attribute ("omp declare target", + DECL_ATTRIBUTES (node->symbol.decl))) + { + decl_state = lto_new_out_decl_state (); + lto_push_out_decl_state (decl_state); + output_function (node); + lto_pop_out_decl_state (); + lto_record_function_out_decl_state (node->symbol.decl, decl_state); + } +} + + /* Emit the callgraph after emitting function bodies. 
This needs to + be done now to make sure that all the statements in every function + have been renumbered so that edges can be associated with call + statements using the statement UIDs. */ + output_symtab (); +} + +namespace { + +const pass_data pass_data_ipa_omp_gimple_out = +{ + IPA_PASS, /* type */ + "omp_gimple_out", /* name */ + OPTGROUP_NONE, /* optinfo_flags */ + true, /* has_gate */ + false, /* has_execute */ + TV_IPA_OMP_GIMPLE_OUT, /* tv_id */ + 0, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_sta
Re: [gomp4] Dumping gimple for offload.
On Wed, Sep 25, 2013 at 3:29 PM, Ilya Tocar wrote: > On 24 Sep 11:02, Richard Biener wrote: >> On Mon, Sep 23, 2013 at 3:29 PM, Ilya Tocar wrote: >> > Hi, >> > >> > I've rebased my patch. >> > Is it ok for gomp4 >> >> Passing through "is_omp" looks bad - please find a more suitable place >> to attach this meta-information. From a quick look you only need it to >> produce an alternate section name, > > Mostly for name, but there are other uses > (e. g. choosing decl states vector). > >> thus consider assigning the section >> name in a different place. >> >> Richard. > > What do you mean by different place? > I can add global dumping_omp_target variable to choose correct name, > depending on it's value (patch below). Is it better? More like passing down a different abstraction, like for > @@ -907,9 +907,15 @@ output_symtab (void) > { >symtab_node node = lto_symtab_encoder_deref (encoder, i); >if (cgraph_node *cnode = dyn_cast (node)) > -lto_output_node (ob, cnode, encoder); > + { > + if (!dumping_omp_target || lookup_attribute ("omp declare target", > + DECL_ATTRIBUTES > (node->symbol.decl))) > + lto_output_node (ob, cnode, encoder); > + } >else > -lto_output_varpool_node (ob, varpool (node), encoder); > + if (!dumping_omp_target || lookup_attribute ("omp declare target", > + DECL_ATTRIBUTES > (node->symbol.decl))) > + lto_output_varpool_node (ob, varpool (node), encoder); > > } have the symtab encoder already not contain the varpool nodes you don't need. And instead of looking up attributes, mark the symtab node with a flag. Maybe Honza has some ideas on how to design this into the machinery rather than trying to fit in from the outside as well. Richard.
Re: [PATCH, PR 57748] Check for out of bounds access, Part 2
Hi, On Wed, Sep 25, 2013 at 11:05:21AM +0200, Richard Biener wrote: > On Tue, Sep 24, 2013 at 8:06 PM, Martin Jambor wrote: > > Hi, > > > > On Tue, Sep 24, 2013 at 12:02:17PM +0200, Richard Biener wrote: > >> On Tue, Sep 24, 2013 at 4:52 AM, Bernd Edlinger > >> wrote: > >> > Hi, > >> > > >> > with the attached patch the read-side of the out of bounds accesses are > >> > fixed. > >> > There is a single new test case pr57748-3.c that is derived from > >> > Martin's test case. > >> > The test case does only test the read access and does not depend on part > >> > 1 of the > >> > patch. > >> > > >> > This patch was boot-strapped and regression tested on > >> > x86_64-unknown-linux-gnu. > >> > > >> > Additionally I generated eCos and an eCos-application (on ARMv5 using > >> > packed > >> > structures) with an arm-eabi cross compiler, and looked for differences > >> > in the > >> > disassembled code with and without this patch, but there were none. > >> > > >> > OK for trunk? > >> > >> Index: gcc/expr.c > >> === > >> --- gcc/expr.c (revision 202778) > >> +++ gcc/expr.c (working copy) > >> @@ -9878,7 +9878,7 @@ > >> && modifier != EXPAND_STACK_PARM > >> ? target : NULL_RTX), > >> VOIDmode, > >> -modifier == EXPAND_SUM ? EXPAND_NORMAL : > >> modifier); > >> +EXPAND_MEMORY); > >> > >> /* If the bitfield is volatile, we want to access it in the > >>field's mode, not the computed mode. > >> > >> context suggests that we may arrive with EXPAND_STACK_PARM here > >> which is a correctness modifier (see its docs). But I'm not too familiar > >> with the details of the various expand modifiers, Eric may be though. > >> > >> That said, I still believe that fixing the misalign path in > >> expand_assignment > >> would be better than trying to avoid it. For this testcase the issue is > >> again that expand_assignment passes the wrong mode/target to the > >> movmisalign optab. 
> > Perhaps I'm stating the obvious, but this is supposed to address a
> > separate bug in the expansion of the RHS (as opposed to the first bug
> > which was in the expansion of the LHS), and so we would have to make
> > expand_assignment also examine potential misalignments in the RHS,
> > which it so far does not do, despite its complexity.
> >
> > Having said that, I also dislike the patch, but I am quite convinced
> > that if we allow non-BLK structures with zero sized arrays, the fix
> > will be ugly - but I'll be glad to be shown I've been wrong.
>
> I don't think the issues have anything to do with zero sized arrays
> or non-BLK structures. The issue is just that we are feeding movmisalign
> with the wrong mode (the mode of the base object) if we are expanding
> a base object which is unaligned and has non-BLK mode.

I disagree. Consider a slightly modified testcase:

#include <stdlib.h>
typedef long long V __attribute__ ((vector_size (2 * sizeof (long long)), may_alias));
#if 1
typedef struct S { V a; V b[0]; } P __attribute__((aligned (1)));
struct __attribute__((packed)) T { char c; P s; };
#else
typedef struct S { V a; V b[0]; } P;
struct T { char c; P s; };
#endif
void __attribute__((noinline, noclone)) good_value (V b) { if (b[0] != 3 || b[1] != 4) __builtin_abort (); }
void __attribute__((noinline, noclone)) check (P *p) { good_value (p->b[0]); }
int main () { struct T *t = (struct T *) calloc (128, 1); V a = { 3, 4 }; t->s.b[0] = a; check (&t->s); free (t); return 0; }

The problem is the expansion of the memory load in function check. All involved modes, the one of the base structure and of the loaded component, are V2DI, so even if we somehow mixed them up, in this example it should not matter, yet the unaligned load gives wrong results. I'm still convinced that the problem is that we have a V2DImode structure which is larger than the mode tells and we want to load the data from outside of its mode. That is only happening because of zero-sized arrays.
> > So what we maybe need is another expand modifier that tells us > to not use movmisalign when expanding the base object. In that case we can just as well use EXPAND_MEMORY. If so, I'd do that only when there is a zero-sized array involved, in order not to lose movmisalign where we can use it. > Or we need to stop expanding the base object with VOIDmode and > instead pass down the mode of the access (TYPE_MODE (TREE_TYPE > (exp)) which will probably confuse other code. Well, because I think the problem is not (just) mixing up modes, I don't think this will help either. > So eventually not > handling the misaligned case by expansion of the base but inlined > similar to how it was in expand_assignment would be needed here. At least now I fail to see how this would be different from copying a large portion of expand_expr_real_1 (all of the MEM_REF
Re: [PATCH] Portable Volatility Warning
On 09/25/2013 07:23 AM, Bernd Edlinger wrote: Richard: I do not know; is this a political issue that is blocking the whole of Sandra's patch? Actually we (softing.com) do not really care what happens to the default setting of -fstrict-volatile-bitfields. Maybe you could look at reviewing Sandra's parts 1, 2, 3 independently of the rest? I can't speak for all of Mentor Graphics, but I personally do not really care what the default setting of -fstrict-volatile-bitfields is, either. Looking at it from our customers' point of view, though, this option currently causes code to be generated that is just broken and wrong and not conforming to either the AAPCS or the C11/C++11 memory model. And because it's enabled by default on ARM, that means GCC generates broken code by default on ARM. I think users would rather live with having to pass -fstrict-volatile-bitfields explicitly than to have a default that is not useful in any way. BTW, it was pointed out to me offline that there is precedent for GCC choosing to ignore details of a target-specific ABI in favor of uniform behavior across targets -- see the rant against unsigned bit-fields here: http://gcc.gnu.org/onlinedocs/gcc/Non_002dbugs.html#Non_002dbugs Anyway, I am hoping that we can reach some closure on this issue before it is too late to get the fix into GCC 4.9. It's very frustrating to me that I have been working on it since May or June, tried hard to incorporate all the feedback I received on the initial patches, spent a lot of time testing, etc., and the whole process just seems stuck. :-( -Sandra
Re: [PATCH, LRA] Remove REG_DEAD and REG_UNUSED notes.
On 09/24/2013 10:40 AM, Yvan Roux wrote: > Hi, > > This patch removes REG_DEAD and REG_UNUSED notes in update_inc_notes, > as that is what the function is supposed to do (see the comments) and as > keeping these notes produces some failures, at least on ARM. > > Thanks, > Yvan > > 2013-09-24 Yvan Roux > > * lra.c (update_inc_notes): Remove all REG_DEAD and REG_UNUSED notes. Ok. Thanks, Yvan. Another possibility would be to run the LRA live pass one more time at the end of LRA to update the notes correctly, but that would waste CPU time, as the same work will be done by the DF framework at the end of the whole RA, and the LRA live pass does much more than update notes.
Re: [PATCH, LRA] Remove REG_DEAD and REG_UNUSED notes.
On 09/24/2013 03:40 PM, Mike Stump wrote: > On Sep 24, 2013, at 12:23 PM, Steven Bosscher wrote: >> On Tue, Sep 24, 2013 at 5:03 PM, Eric Botcazou wrote: This patch removes REG_DEAD and REG_UNUSED notes >>> DF framework is supposed to do it for you. >> Unfortunately LRA uses its own DF framework. I'd say fortunately. Otherwise, LRA would decrease whole-compiler speed by up to 10%. I can say this because the first version of LRA was based on the DF framework. I guess that was one reason why reload was never moved to DF either. LRA/reload are the most live-analysis-intensive passes (too many changes in several sub-passes). LRA also uses REG notes internally and recreates them several times as a by-product of the pseudo live-range analysis used for (reload) pseudo assignments. I'd be glad if we could move to DF without compilation-time degradation. I already wrote that the DF design (although it has a nice API, which is good for prototyping) is bad from a resource-usage point of view. Keeping a lot of duplicated info alongside the RTL increases the memory footprint very much, worsens data locality, and slows down the compiler. If I remember correctly, the GCC steering committee permitted 5% compile-time degradation as a criterion to include it into the trunk, and DF achieved this with difficulty. > Is that a bug, or a feature?
Ping^6/update: contribute Synopsys Designware ARC port (6/7): documentation
The extend.texi context has changed due to the addition of the MSP430 port. OK to apply? 2013-09-24 Joern Rennecke Jeremy Bennett * doc/install.texi (--with-cpu): Mention ARC. (arc-*-elf32): New paragraph. (arc-linux-uclibc): Likewise. * doc/md.texi (Machine Constraints): Add ARC part. * doc/invoke.texi: (menu): Add ARC Options. (Machine Dependent Options) : Add synopsis. (node ARC Options): Add. * doc/extend.texi (long_call / short_call attribute): Add ARC. (ARC Built-in Functions): New section defining generic ARC built-in functions. (ARC SIMD Built-in Functions): New section defining SIMD specific built-in functions. (Declaring Attributes of Functions): Extended description of short_call and long_call attributes for ARC and added index entries. diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index edf0e28..ef10f4c 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -2813,8 +2813,9 @@ least version 2.20.1), and GNU C library (at least version 2.11.1). @item interrupt @cindex interrupt handler functions -Use this attribute on the ARM, AVR, CR16, Epiphany, M32C, M32R/D, m68k, MeP, MIPS, -MSP430, RL78, RX and Xstormy16 ports to indicate that the specified function is an +Use this attribute on the ARC, ARM, AVR, CR16, Epiphany, M32C, M32R/D, +m68k, MeP, MIPS, MSP430, RL78, RX and Xstormy16 ports to indicate that +the specified function is an interrupt handler. The compiler generates function entry and exit sequences suitable for use in an interrupt handler when this attribute is present. With Epiphany targets it may also generate a special section with @@ -2823,6 +2824,16 @@ code to initialize the interrupt vector table. Note, interrupt handlers for the Blackfin, H8/300, H8/300H, H8S, MicroBlaze, and SH processors can be specified via the @code{interrupt_handler} attribute. 
+Note, on the ARC, you must specify the kind of interrupt to be handled +in a parameter to the interrupt attribute like this: + +@smallexample +void f () __attribute__ ((interrupt ("ilink1"))); +@end smallexample + +Permissible values for this parameter are: @w{@code{ilink1}} and +@w{@code{ilink2}}. + Note, on the AVR, the hardware globally disables interrupts when an interrupt is executed. The first instruction of an interrupt handler declared with this attribute is a @code{SEI} instruction to @@ -3021,18 +3032,33 @@ unit. This is to allow easy merging of multiple compilation units into one, for example, by using the link-time optimization. For this reason the attribute is not allowed on types to annotate indirect calls. -@item long_call/short_call +@item long_call/medium_call/short_call +@cindex indirect calls on ARC @cindex indirect calls on ARM -This attribute specifies how a particular function is called on -ARM and Epiphany. Both attributes override the -@option{-mlong-calls} (@pxref{ARM Options}) -command-line switch and @code{#pragma long_calls} settings. The +@cindex indirect calls on Epiphany +These attributes specify how a particular function is called on +ARC, ARM and Epiphany - with @code{medium_call} being specific to ARC. +These attributes override the +@option{-mlong-calls} (@pxref{ARM Options} and @ref{ARC Options}) +and @option{-mmedium-calls} (@pxref{ARC Options}) +command-line switches and @code{#pragma long_calls} settings. For ARM, the @code{long_call} attribute indicates that the function might be far away from the call site and require a different (more expensive) calling sequence. The @code{short_call} attribute always places the offset to the function from the call site into the @samp{BL} instruction directly. +For ARC, a function marked with the @code{long_call} attribute is +always called using register-indirect jump-and-link instructions, +thereby enabling the called function to be placed anywhere within the +32-bit address space. 
A function marked with the @code{medium_call} +attribute will always be close enough to be called with an unconditional +branch-and-link instruction, which has a 25-bit offset from +the call site. A function marked with the @code{short_call} +attribute will always be close enough to be called with a conditional +branch-and-link instruction, which has a 21-bit offset from +the call site. + @item longcall/shortcall @cindex functions called via pointer on the RS/6000 and PowerPC On the Blackfin, RS/6000 and PowerPC, the @code{longcall} attribute @@ -8870,6 +8896,8 @@ instructions, but allow the compiler to schedule those calls. @menu * Alpha Built-in Functions:: +* ARC Built-in Functions:: +* ARC SIMD Built-in Functions:: * ARM iWMMXt Built-in Functions:: * ARM NEON Intrinsics:: * AVR Built-in Functions:: @@ -8977,6 +9005,430 @@ void *__builtin_thread_pointer (void) void __builtin_set_thread_pointer (void *) @end smallexample +@node ARC Built-in Functions +@subsection ARC Bu
[PATCH] Trivial cleanup
I was looking at the vec class to figure out the best way to do some things and realized we have a "last" member function. Using foo.last() is clearer than foo[foo.length() - 1] On a related note, our current standards require a space between a function name and the open paren for its argument list. However, it leads to fugly code in some cases. Assume foo is a vec instance and we want to look at something in the last vector element. Our current standards would imply something like this: foo.last ()->bar Which is how the current patch formats that code. However, I typically see this more often in C++ codebases as foo.last()->bar But then, what if we just want the last element without peeking deeper inside it? foo.last ()? or foo.last()? Do we want to make any changes in our style guidelines to handle this better? Anyway, bootstrapped & regression tested on x86_64-unknown-linux-gnu. Installed onto the trunk. Jeff * tree-ssa-threadedge.c (thread_across_edge): Use foo.last () rather than foo[foo.length () - 1] to access last member in a vec. * tree-ssa-threadupdate.c (register_jump_thread): Similarly. diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c index 47db280..2ca56342 100644 --- a/gcc/tree-ssa-threadedge.c +++ b/gcc/tree-ssa-threadedge.c @@ -947,8 +947,7 @@ thread_across_edge (gimple dummy_cond, } remove_temporary_equivalences (stack); - propagate_threaded_block_debug_into (path[path.length () - 1]->dest, - e->dest); + propagate_threaded_block_debug_into (path.last ()->dest, e->dest); register_jump_thread (path, false); path.release (); return; @@ -987,7 +986,7 @@ thread_across_edge (gimple dummy_cond, path.safe_push (taken_edge); found = false; if ((e->flags & EDGE_DFS_BACK) == 0 - || ! cond_arg_set_in_bb (path[path.length () - 1], e->dest) + || ! 
cond_arg_set_in_bb (path.last (), e->dest)) found = thread_around_empty_blocks (taken_edge, dummy_cond, handle_dominating_asserts, @@ -999,7 +998,7 @@ thread_across_edge (gimple dummy_cond, record the jump threading opportunity. */ if (found) { - propagate_threaded_block_debug_into (path[path.length () - 1]->dest, + propagate_threaded_block_debug_into (path.last ()->dest, taken_edge->dest); register_jump_thread (path, true); } diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index 4131128..fd5234c 100644 --- a/gcc/tree-ssa-threadupdate.c +++ b/gcc/tree-ssa-threadupdate.c @@ -1401,7 +1401,7 @@ register_jump_thread (vec path, bool through_joiner) if (!through_joiner) e3 = NULL; else -e3 = path[path.length () - 1]; +e3 = path.last (); /* This can occur if we're jumping to a constant address or or something similar. Just get out now. */
Re: [PATCH, IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.
On 09/24/2013 07:57 PM, Wei Mi wrote: > Hi, > > This patch is to address the problem described here: > http://gcc.gnu.org/ml/gcc/2013-09/msg00187.html > > The patch changes ALLOCNO_MODE of a pseudo reg to be outermode if the > pseudo reg is used in a paradoxical subreg, so IRA will not mistakenly > assign an operand with a bigger mode to a smaller hardreg which > couldn't find a pair register. > > No test is added because I cannot create a small testcase to reproduce > the problem on trunk, the difficulty of which was described in the > above post. > > bootstrap and regression pass. ok for trunk? > > Thanks, > Wei Mi. > > 2013-09-24 Wei Mi > > * ira-build.c (create_insn_allocnos): Fix ALLOCNO_MODE > in the case of paradoxical subreg. > > Index: ira-build.c > === > --- ira-build.c (revision 201963) > +++ ira-build.c (working copy) > @@ -1688,6 +1688,30 @@ create_insn_allocnos (rtx x, bool output > } >return; > } > + else if (code == SUBREG) > +{ > + int regno; > + rtx subreg_reg = SUBREG_REG (x); > + enum machine_mode outermode, innermode; > + > + create_insn_allocnos (subreg_reg, output_p); > + /* For paradoxical subreg, set allocno's mode to be > +the outermode. */ > + outermode = GET_MODE (x); > + innermode = GET_MODE (subreg_reg); > + if (REG_P (subreg_reg) > + && (GET_MODE_SIZE (outermode) > GET_MODE_SIZE (innermode))) > + { > + regno = REGNO (subreg_reg); > + if (regno >= FIRST_PSEUDO_REGISTER) > + { > + ira_allocno_t a = ira_curr_regno_allocno_map[regno]; > + ira_assert (a != NULL); > + ALLOCNO_MODE (a) = outermode; > + } > + } > + return; > +} >else if (code == SET) > { >create_insn_allocnos (SET_DEST (x), true); Wei Mi, sorry for misleading you. I thought more about the problem. It is not a trivial one. The patch is ok for the code correctness but it might hurt code performance. For example, we have code ... (reg:DI) ... ... ... (subreg:TI (reg:DI)) ... 
...(reg:DI) We need two hard regs only for the second place by transforming p = (reg:DI) ...(subreg:TI p) With this patch we require two hard regs for the whole live range of the original pseudo (which can be quite long). It might considerably worsen code performance. So the problem could be fixed in LRA, which can make this transformation, or even in IRA (that would be even better, as we put pseudo P into the global picture vs. probably spilling some pseudos for P in LRA). I need some time to think what is better (e.g. I don't know how to implement it in IRA without a big compilation slowdown). In any case, the solution for the problem will not be as easy as in the patch. Sorry again.
Re: [PATCH] Trivial cleanup
On 09/25/2013 11:33 AM, Jeff Law wrote: I was looking at the vec class to figure out the best way to do some things and realized we have a "last" member function. Using foo.last() is clearer than foo[foo.length() - 1] On a related note, our current standards require a space between a function name and the open paren for its argument list. However, it leads to fugly code in some cases. Assume foo is a vec instance and we want to look at something in the last vector element. Our current standards would imply something like this: foo.last ()->bar Which is how the current patch formats that code. However, I typically see this more often in C++ codebases as foo.last()->bar But then, what if we just want the last element without peeking deeper inside it? foo.last ()? or foo.last()? Do we want to make any changes in our style guidelines to handle this better? I noticed that with the wrapper conversion, often you will get a sequence of 3 or more method calls, and it's quite unbearable to have the spaces. Simple things like int unsignedsrcp = ptrvar.type().type().type_unsigned(); vs int unsignedsrcp = ptrvar.type ().type ().type_unsigned (); I was going to bring it up at some point too. My preference is strongly to simply eliminate the space on methods... Andrew
Re: [PATCH] Trivial cleanup
On 9/25/13 10:46 AM, Andrew MacLeod wrote: I was going to bring it up at some point too. My preference is strongly to simply eliminate the space on methods... Which wouldn't be so weird: in the libstdc++-v3 code we do it all the time. Paolo.
Re: [PATCH] Trivial cleanup
On 09/25/2013 09:48 AM, Paolo Carlini wrote: On 9/25/13 10:46 AM, Andrew MacLeod wrote: I was going to bring it up at some point too. My preference is strongly to simply eliminate the space on methods... Which wouldn't be so weird: in the libstdc++-v3 code we do it all the time. Yea. I actually reviewed the libstdc++ guidelines to see where they differed from GNU's C guidelines. I'm strongly in favor of dropping the horizontal whitespace between the method name and its open paren when the result is then dereferenced. ie foo.last()->e rather than foo.last ()->e. However, do we want an exception when the result isn't immediately dereferenced? ie, foo.last () or foo.last()? jeff
Re: [C++1y] [PATCH 3/4] ... canonical type workaround and refactoring
On 09/24/2013 02:05 AM, Adam Butcher wrote: Shall I push the patch below to trunk as an intermediate workaround whilst I get to refactoring to support on-the-fly template parm synthesis? I think let's wait for a better fix. On the subject of on-the-fly synthesis: I haven't started yet but I'm thinking to trigger in 'cp_parser_simple_type_specifier' when 'current_binding_level->kind == sk_function_parms'. I can foresee a potential issue with packs in that, upon reading the 'auto' (potentially nested in some other type), a plain template parm will be synthesized; but it may need to be a pack parm type if '...' is found later. Hmm, good point. I think there are two options: 1) Build up the type as normal and use tsubst to replace the non-pack template parameter with a pack if needed. 2) If we see 'auto', scan ahead (possibly via tentative parsing) to see if there's a ... Jason
Re: RFA: Store the REG_BR_PROB probability directly as an int
On Tue, 2013-09-24 at 21:07 +0200, Andreas Schwab wrote: > Richard Sandiford writes: > > > Sorry for the breakage. I think we need to handle INT_LIST in the same way > > as INSN_LIST though, and eliminate in XEXP (x, 1). > > > > How about the attached? Testing in progress... > > Works for me as well. > > Andreas. This patch worked for me as well on MIPS. I did a complete build and test overnight. Steve Ellcey sell...@mips.com
Re: [PATCH] Trivial cleanup
On Wed, Sep 25, 2013 at 11:46:12AM -0400, Andrew MacLeod wrote: > I noticed that with the wrapper conversion, often you will get a > sequence of 3 or more method calls, and its quite unbearable to have > the spaces. > simple things like > int unsignedsrcp = ptrvar.type().type().type_unsigned(); > vs > int unsignedsrcp = ptrvar.type ().type ().type_unsigned (); > > I was going to bring it up at some point too. My preference is > strongly to simply eliminate the space on methods... My preference is to keep the spaces, it makes code more readable, no space before ( looks very ugly to me. Jakub
Re: [PATCH v4 00/20] resurrect automatic dependencies
Paolo> The series is good! Thanks! I'm checking it in now. I'll be around to fix up any bugs I introduce. I'll send a note to the gcc list when I'm done, just to warn people. Tom
Re: [PATCH, IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.
> performance. For example, we have code > > ... (reg:DI) ... > ... > ... (subreg:TI (reg:DI)) > ... > ...(reg:DI) > > We need two hard regs only for the second place by transforming > > p = (reg:DI) > > ...(subreg:TI p) > > With this patch we require two hard regs for the whole live range of the > original pseudo (which can be quite long). It might considerably worsen > code performance. > > So the problem could be fixed in LRA which can make this transformation > or even in IRA (that would be even better as we put pseudo P into the > global picture vs. probably spilling some pseudos for P in LRA). > Thanks for your detailed explanation. Now I understand your concern here. > > I need some time to think what is better (e.g. I don't know how to > implement it in IRA without big compilation slow down). In any case, > the solution for the problem will not be as easy as in the patch. To fix it in IRA, it looks like we want a live range splitting pass for pseudos used in paradoxical subregs here. Is the potential compilation slowdown you mention here caused by more allocnos introduced by the live range splitting, or something else? Thanks, Wei Mi.
Re: [PATCH v4 00/20] resurrect automatic dependencies
> Paolo> The series is good! > > Thanks! > > I'm checking it in now. I'll be around to fix up any bugs I introduce. > I'll send a note to the gcc list when I'm done, just to warn people. Thank you for working on this! Honza > > Tom
[gomp4] Compiler side of depend clause support
Hi! This is the compiler side of the depend clause support. If a task has any depend clauses, we pass an array of pointers as the 8th argument to GOMP_task, with the value 8 ORed into the 7th argument (flags) to signal the presence of the 8th argument. The array starts with two integers (cast to void * pointers): the first is the total number of depend clauses, the second the number of inout/out depend clauses (those are treated the same), followed by the addresses of the first bytes of all the objects, first from the depend(inout:) or depend(out:) clauses, then from the depend(in:) clauses. Will commit tomorrow unless somebody chimes in. 2013-09-25 Jakub Jelinek * omp-low.c (expand_task_call): If there are depend clauses, pass bit 8 in 7th argument and pass pointer to depend array as 8th argument. (lower_depend_clauses): New function. (lower_omp_taskreg): Handle depend clauses. * omp-builtins.def (BUILT_IN_GOMP_TASK): Use BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR instead of BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT. * builtin-types.def (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT): Remove. (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR): New. fortran/ * types.def (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT): Remove. (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR): New. 
--- gcc/omp-low.c.jj2013-09-25 09:51:45.0 +0200 +++ gcc/omp-low.c 2013-09-25 15:34:53.705241334 +0200 @@ -4203,7 +4203,7 @@ expand_parallel_call (struct omp_region static void expand_task_call (basic_block bb, gimple entry_stmt) { - tree t, t1, t2, t3, flags, cond, c, c2, clauses; + tree t, t1, t2, t3, flags, cond, c, c2, clauses, depend; gimple_stmt_iterator gsi; location_t loc = gimple_location (entry_stmt); @@ -4217,8 +4217,9 @@ expand_task_call (basic_block bb, gimple c = find_omp_clause (clauses, OMP_CLAUSE_UNTIED); c2 = find_omp_clause (clauses, OMP_CLAUSE_MERGEABLE); + depend = find_omp_clause (clauses, OMP_CLAUSE_DEPEND); flags = build_int_cst (unsigned_type_node, -(c ? 1 : 0) + (c2 ? 4 : 0)); +(c ? 1 : 0) + (c2 ? 4 : 0) + (depend ? 8 : 0)); c = find_omp_clause (clauses, OMP_CLAUSE_FINAL); if (c) @@ -4229,6 +4230,10 @@ expand_task_call (basic_block bb, gimple build_int_cst (unsigned_type_node, 0)); flags = fold_build2_loc (loc, PLUS_EXPR, unsigned_type_node, flags, c); } + if (depend) +depend = OMP_CLAUSE_DECL (depend); + else +depend = build_int_cst (ptr_type_node, 0); gsi = gsi_last_bb (bb); t = gimple_omp_task_data_arg (entry_stmt); @@ -4244,9 +4249,10 @@ expand_task_call (basic_block bb, gimple t3 = build_fold_addr_expr_loc (loc, t); t = build_call_expr (builtin_decl_explicit (BUILT_IN_GOMP_TASK), - 7, t1, t2, t3, + 8, t1, t2, t3, gimple_omp_task_arg_size (entry_stmt), - gimple_omp_task_arg_align (entry_stmt), cond, flags); + gimple_omp_task_arg_align (entry_stmt), cond, flags, + depend); force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, false, GSI_CONTINUE_LINKING); @@ -9232,6 +9238,68 @@ create_task_copyfn (gimple task_stmt, om pop_cfun (); } +static void +lower_depend_clauses (gimple stmt, gimple_seq *iseq, gimple_seq *oseq) +{ + tree c, clauses; + gimple g; + size_t n_in = 0, n_out = 0, idx = 2, i; + + clauses = find_omp_clause (gimple_omp_task_clauses (stmt), +OMP_CLAUSE_DEPEND); + gcc_assert (clauses); + for (c = clauses; c; c = OMP_CLAUSE_CHAIN 
(c)) +if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEPEND) + switch (OMP_CLAUSE_DEPEND_KIND (c)) + { + case OMP_CLAUSE_DEPEND_IN: + n_in++; + break; + case OMP_CLAUSE_DEPEND_OUT: + case OMP_CLAUSE_DEPEND_INOUT: + n_out++; + break; + default: + gcc_unreachable (); + } + tree type = build_array_type_nelts (ptr_type_node, n_in + n_out + 2); + tree array = create_tmp_var (type, NULL); + tree r = build4 (ARRAY_REF, ptr_type_node, array, size_int (0), NULL_TREE, + NULL_TREE); + g = gimple_build_assign (r, build_int_cst (ptr_type_node, n_in + n_out)); + gimple_seq_add_stmt (iseq, g); + r = build4 (ARRAY_REF, ptr_type_node, array, size_int (1), NULL_TREE, + NULL_TREE); + g = gimple_build_assign (r, build_int_cst (ptr_type_node, n_out)); + gimple_seq_add_stmt (iseq, g); + for (i = 0; i < 2; i++) +{ + if ((i ? n_in : n_out) == 0) + continue; + for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c)) + if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEPEND + && ((OMP_CLAUSE_DEPEND_KIND (c) != OMP_CLAUSE_DEPEND_IN) ^ i)) + { + tree t = OMP_CL
Re: [PATCH, IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.
On 09/25/2013 12:42 PM, Wei Mi wrote: >> performance. For example, we have code >> >> ... (reg:DI) ... >> ... >> ... (subreg:TI (reg:DI)) >> ... >> ...(reg:DI) >> >> We need two hard regs only for the second place by transforming >> >> p = (reg:DI) >> >> ...(subreg:TI p) >> >> With this patch we require two hard regs for the whole live range of the >> original pseudo (which can be quite long). It might considerably worsen >> code performance. >> >> So the problem could be fixed in LRA which can make this transformation >> or even in IRA (that would be even better as we put pseudo P into the >> global picture vs. probably spilling some pseudos for P in LRA). >> > Thanks for your detailed explanation. Now I understand your concern here. > >> I need some time to think what is better (e.g. I don't know how to >> implement it in IRA without big compilation slow down). In any case, >> the solution for the problem will not be as easy as in the patch. > To fix it in IRA, it looks like we want a live range splitting pass > for pseudos used in paradoxical subregs here. Is the potential > compilation slowdown you mention here caused by more allocnos > introduced by the live range splitting, or something else? > > To determine for which occurrences of the pseudo we should do the transformation, we need to create allocnos and calculate reg classes, to know which paradoxical subregs need more hard regs (the transformation cannot be done for all paradoxical subregs, as my experience shows that many RTL changes result in worse RA even if we have heuristics to remove the generated changes, which in this case would mean trying to assign the same hard reg to the original and the new pseudo). After doing the transformations, we need to recalculate reg classes and rebuild the allocnos (both are expensive). To speed up the process it could be implemented as some kind of update of already existing data, but that would complicate the code a lot. 
So right now I think implementing this in LRA would be easier. Still, LRA has a pretty good register (re-)allocation (although it is worse than in IRA).
Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL
> Sorry for missing this problem when committing Kugan's patch. > > I have just committed the attached patch, which I hope fixes all the > spaces/indentation issues introduced. Thanks a lot! -- Eric Botcazou
Re: [PATCH, ARM, LRA] Prepare ARM build with LRA
> 2013-09-24 Yvan Roux > Vladimir Makarov > > * rtlanal.c (lsb_bitfield_op_p): New predicate for bitfield > operations from the least significant bit. > (strip_address_mutations): Add bitfield operations handling. > (must_be_index_p): Add shifting and rotate operations handling. > (set_address_base): Use must_be_base_p predicate. > (set_address_index): Use must_be_index_p predicate. OK for mainline, if you add the missing space after GET_MODE in enum machine_mode mode = GET_MODE(XEXP (x, 0)); -- Eric Botcazou
Re: [PATCH, ARM, LRA] Prepare ARM build with LRA
> FWIW, I'd prefer to keep it as-is, since must_be_base_p (x) and > must_be_index_p (x) don't imply that we should look specifically at > XEXP (x, 0) (rather than just X, or XEXP (x, 1), etc.). I think it's > better to keep the code tests and the associated XEXPs together. Feel free to revert this part, but then add appropriate comments explaining why we are interested in LO_SUM for set_address_base and in MULT and the 5 others for set_address_index. If it's because the former is rather a base than an index and vice-versa for the latter, then it's even clearer with the predicates I think. -- Eric Botcazou
[PATCH] fortran/PR58113
Hi, this test case fails very often, and the reason is not in GCC but in missing glibc rounding support for strtod. This patch fixes the test case to first determine whether rounding support is available. This is often the case for real(16) through libquadmath. So even in cases where the test case usually fails, it still tests something with this patch. Ok for trunk? Regards, Bernd Edlinger 2013-09-25 Bernd Edlinger PR fortran/58113 * gfortran.dg/round_4.f90: Check for rounding support. --- gcc/testsuite/gfortran.dg/round_4.f90 2013-07-21 13:54:27.0 +0200 +++ gcc/testsuite/gfortran.dg/round_4.f90 2013-08-23 10:16:32.0 +0200 @@ -27,6 +27,17 @@ real(xp) :: r10p, r10m, ref10u, ref10d real(qp) :: r16p, r16m, ref16u, ref16d character(len=20) :: str, round + logical :: rnd4, rnd8, rnd10, rnd16 + + ! Test for which types glibc's strtod function supports rounding + str = '0.01 0.01 0.01 0.01' + read (str, *, round='up') r4p, r8p, r10p, r16p + read (str, *, round='down') r4m, r8m, r10m, r16m + rnd4 = r4p /= r4m + rnd8 = r8p /= r8m + rnd10 = r10p /= r10m + rnd16 = r16p /= r16m +! write (*, *) rnd4, rnd8, rnd10, rnd16 ref4u = 0.10001_4 ref8u = 0.10001_8 @@ -55,40 +66,40 @@ round = 'up' call t() - if (r4p /= ref4u .or. r4m /= -ref4d) call abort() - if (r8p /= ref8u .or. r8m /= -ref8d) call abort() - if (r10p /= ref10u .or. r10m /= -ref10d) call abort() - if (r16p /= ref16u .or. r16m /= -ref16d) call abort() + if (rnd4 .and. (r4p /= ref4u .or. r4m /= -ref4d)) call abort() + if (rnd8 .and. (r8p /= ref8u .or. r8m /= -ref8d)) call abort() + if (rnd10 .and. (r10p /= ref10u .or. r10m /= -ref10d)) call abort() + if (rnd16 .and. (r16p /= ref16u .or. r16m /= -ref16d)) call abort() round = 'down' call t() - if (r4p /= ref4d .or. r4m /= -ref4u) call abort() - if (r8p /= ref8d .or. r8m /= -ref8u) call abort() - if (r10p /= ref10d .or. r10m /= -ref10u) call abort() - if (r16p /= ref16d .or. r16m /= -ref16u) call abort() + if (rnd4 .and. (r4p /= ref4d .or. 
r4m /= -ref4u)) call abort() + if (rnd8 .and. (r8p /= ref8d .or. r8m /= -ref8u)) call abort() + if (rnd10 .and. (r10p /= ref10d .or. r10m /= -ref10u)) call abort() + if (rnd16 .and. (r16p /= ref16d .or. r16m /= -ref16u)) call abort() round = 'zero' call t() - if (r4p /= ref4d .or. r4m /= -ref4d) call abort() - if (r8p /= ref8d .or. r8m /= -ref8d) call abort() - if (r10p /= ref10d .or. r10m /= -ref10d) call abort() - if (r16p /= ref16d .or. r16m /= -ref16d) call abort() + if (rnd4 .and. (r4p /= ref4d .or. r4m /= -ref4d)) call abort() + if (rnd8 .and. (r8p /= ref8d .or. r8m /= -ref8d)) call abort() + if (rnd10 .and. (r10p /= ref10d .or. r10m /= -ref10d)) call abort() + if (rnd16 .and. (r16p /= ref16d .or. r16m /= -ref16d)) call abort() round = 'nearest' call t() - if (r4p /= ref4u .or. r4m /= -ref4u) call abort() - if (r8p /= ref8u .or. r8m /= -ref8u) call abort() - if (r10p /= ref10u .or. r10m /= -ref10u) call abort() - if (r16p /= ref16u .or. r16m /= -ref16u) call abort() + if (rnd4 .and. (r4p /= ref4u .or. r4m /= -ref4u)) call abort() + if (rnd8 .and. (r8p /= ref8u .or. r8m /= -ref8u)) call abort() + if (rnd10 .and. (r10p /= ref10u .or. r10m /= -ref10u)) call abort() + if (rnd16 .and. (r16p /= ref16u .or. r16m /= -ref16u)) call abort() ! Same as nearest (but rounding towards zero if there is a tie ! [does not apply here]) round = 'compatible' call t() - if (r4p /= ref4u .or. r4m /= -ref4u) call abort() - if (r8p /= ref8u .or. r8m /= -ref8u) call abort() - if (r10p /= ref10u .or. r10m /= -ref10u) call abort() - if (r16p /= ref16u .or. r16m /= -ref16u) call abort() + if (rnd4 .and. (r4p /= ref4u .or. r4m /= -ref4u)) call abort() + if (rnd8 .and. (r8p /= ref8u .or. r8m /= -ref8u)) call abort() + if (rnd10 .and. (r10p /= ref10u .or. r10m /= -ref10u)) call abort() + if (rnd16 .and. (r16p /= ref16u .or. r16m /= -ref16u)) call abort() contains subroutine t() !print *, round
[gomp4] Fix UDR handling on combined constructs (PR libgomp/58482)
Hi! This patch fixes a problem with UDRs, where if we copy reduction clauses to multiple constructs (on combined constructs), we weren't copying OMP_CLAUSE_REDUCTION_PLACEHOLDER (which holds IDENTIFIER_NODE, TREE_VEC or some similar magic used by {,c_}finish_omp_clauses later on to properly finalize the clause. Will commit tomorrow. 2013-09-25 Jakub Jelinek PR libgomp/58482 * c-omp.c (c_omp_split_clauses) : Copy also OMP_CLAUSE_REDUCTION_PLACEHOLDER. * testsuite/libgomp.c/simd-6.c: New test. * testsuite/libgomp.c++/simd-8.C: New test. --- gcc/c-family/c-omp.c.jj 2013-09-25 09:51:45.0 +0200 +++ gcc/c-family/c-omp.c2013-09-25 19:08:05.776289461 +0200 @@ -832,6 +832,8 @@ c_omp_split_clauses (location_t loc, enu OMP_CLAUSE_DECL (c) = OMP_CLAUSE_DECL (clauses); OMP_CLAUSE_REDUCTION_CODE (c) = OMP_CLAUSE_REDUCTION_CODE (clauses); + OMP_CLAUSE_REDUCTION_PLACEHOLDER (c) + = OMP_CLAUSE_REDUCTION_PLACEHOLDER (clauses); OMP_CLAUSE_CHAIN (c) = cclauses[C_OMP_CLAUSE_SPLIT_SIMD]; cclauses[C_OMP_CLAUSE_SPLIT_SIMD] = c; } @@ -844,6 +846,8 @@ c_omp_split_clauses (location_t loc, enu OMP_CLAUSE_DECL (c) = OMP_CLAUSE_DECL (clauses); OMP_CLAUSE_REDUCTION_CODE (c) = OMP_CLAUSE_REDUCTION_CODE (clauses); + OMP_CLAUSE_REDUCTION_PLACEHOLDER (c) + = OMP_CLAUSE_REDUCTION_PLACEHOLDER (clauses); OMP_CLAUSE_CHAIN (c) = cclauses[C_OMP_CLAUSE_SPLIT_PARALLEL]; cclauses[C_OMP_CLAUSE_SPLIT_PARALLEL] = c; s = C_OMP_CLAUSE_SPLIT_TEAMS; --- libgomp/testsuite/libgomp.c/simd-6.c.jj 2013-09-25 19:15:09.572135004 +0200 +++ libgomp/testsuite/libgomp.c/simd-6.c2013-09-25 19:15:05.748158078 +0200 @@ -0,0 +1,44 @@ +/* PR libgomp/58482 */ +/* { dg-do run } */ +/* { dg-options "-O2" } */ +/* { dg-additional-options "-msse2" { target sse2_runtime } } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +extern void abort (); +int a[1024] __attribute__((aligned (32))) = { 1 }; +struct S { int s; }; +#pragma omp declare reduction (+:struct S:omp_out.s += omp_in.s) +#pragma omp declare reduction 
(foo:struct S:omp_out.s += omp_in.s) +#pragma omp declare reduction (foo:int:omp_out += omp_in) + +__attribute__((noinline, noclone)) int +foo (void) +{ + int i, u = 0; + struct S s, t; + s.s = 0; t.s = 0; + #pragma omp parallel for simd aligned(a : 32) reduction(+:s) \ + reduction(foo:t, u) + for (i = 0; i < 1024; i++) +{ + int x = a[i]; + s.s += x; + t.s += x; + u += x; +} + if (t.s != s.s || u != s.s) +abort (); + return s.s; +} + +int +main () +{ + int i; + for (i = 0; i < 1024; i++) +a[i] = (i & 31) + (i / 128); + int s = foo (); + if (s != 19456) +abort (); + return 0; +} --- libgomp/testsuite/libgomp.c++/simd-8.C.jj 2013-09-25 19:15:42.069968629 +0200 +++ libgomp/testsuite/libgomp.c++/simd-8.C 2013-09-25 19:16:03.615860022 +0200 @@ -0,0 +1,47 @@ +// PR libgomp/58482 +// { dg-do run } +// { dg-options "-O2" } +// { dg-additional-options "-msse2" { target sse2_runtime } } +// { dg-additional-options "-mavx" { target avx_runtime } } + +extern "C" void abort (); +int a[1024] __attribute__((aligned (32))) = { 1 }; +struct S +{ + int s; + S () : s (0) {} + ~S () {} +}; +#pragma omp declare reduction (+:S:omp_out.s += omp_in.s) +#pragma omp declare reduction (foo:S:omp_out.s += omp_in.s) +#pragma omp declare reduction (foo:int:omp_out += omp_in) + +__attribute__((noinline, noclone)) int +foo () +{ + int i, u = 0; + S s, t; + #pragma omp parallel for simd aligned(a : 32) reduction(+:s) \ + reduction(foo:t, u) + for (i = 0; i < 1024; i++) +{ + int x = a[i]; + s.s += x; + t.s += x; + u += x; +} + if (t.s != s.s || u != s.s) +abort (); + return s.s; +} + +int +main () +{ + int i; + for (i = 0; i < 1024; i++) +a[i] = (i & 31) + (i / 128); + int s = foo (); + if (s != 19456) +abort (); +} Jakub
Re: [PATCH, IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.
> To define for what occurrence of the pseudo we should do the
> transformation, we need to create allocnos and calculate reg classes to
> know what paradoxical subreg needs more hard regs (the transformations
> cannot be done for all paradoxical subregs, as my experience shows many
> RTL changes result in worse RA even if we have heuristics to remove the
> generated changes, as in this case would be trying to assign the same
> hard reg for the original and the new pseudo).
> After doing the transformations, we need to recalculate reg classes
> and rebuild allocnos (both are expensive). To speed up the process it
> could be implemented as some kind of update of already existing data but
> it would complicate the code much.

I see, thanks!

> So right now I think implementing this in LRA would be easier. Still, LRA
> has a pretty good register (re-)allocation (although it is worse than in
> IRA).

When you get an idea of how to fix it in LRA, if you are still busy, I
would be happy to do the implementation if you could outline your idea.

Thanks,
Wei Mi
Re: [PATCH i386 3/8] [AVX512] [1/n] Add AVX-512 patterns: VF iterator extended.
On 24 Sep 10:04, Richard Henderson wrote:
> On 08/27/2013 11:37 AM, Kirill Yukhin wrote:
> > Hello,
> >
> >> This patch is still far too large.
> >>
> >> I think you should split it up based on every single mode iterator that
> >> you need to add or change.
> >
> > Problem is that some iterators depend on each other, so patches are
> > not going to be tiny.
> >
> > Here is the 1st one. It extends the VF iterator - biggest impact, I believe.
> >
> > Is it Ok?
> >
> > Testing:
> > 1. Bootstrap pass.
> > 2. make check shows no regressions.
> > 3. Spec 2000 & 2006 build show no regressions both with and without
> > -mavx512f option.
> > 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f
> > option.
>
> Ok.
>
> r~

Checked into main trunk by Kirill Yukhin:
http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00779.html

-- Ilya
Re: Context sensitive type inheritance graph walking
On 2013.09.25 at 12:20 +0200, Jan Hubicka wrote: > this is updated version of > http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00936.html > > Bootstrapped/regtested x86_64-linux. The patch is tested by ipa-devirt9 > testcase. I have extra four, but I would like to first fix the case where the > devirtualization happens in TODO of early_local_passes that is not dumped > anywhere. So I plan to post these incrementally once this code is hooked also > into gimple folding. > > The patch results in 60% more devirtualizations on Firefox and 10% more > speculative devirtualization. I think main component missing now is code > determining dynamic type from a call to constructor. I have some prototype > for > this, too, I would like to discuss incrementally. I am not 100% sure how much > harder tracking of dynamic type changes becomes here. Hi Honza, I've tested your patch and it failed during the "profile generate" phase of an LTO/PGO build of Firefox. Reduced: markus@x4 /tmp % cat test.ii class A { public: virtual void m_fn1(); }; class B final : A { ~B(); virtual void m_fn2() { m_fn1(); } }; B::~B() {} markus@x4 /tmp % g++ -c -std=c++11 -O2 -c test.ii test.ii: In member function ‘virtual void B::m_fn2()’: test.ii:7:16: error: stmt (0x7f85504c3130) marked modified after optimization pass: virtual void m_fn2() { m_fn1(); } ^ # .MEM_6 = VDEF <.MEM_2(D)> A::m_fn1 (_5); test.ii:7:16: internal compiler error: verify_ssa failed 0xc62364 verify_ssa(bool) ../../gcc/gcc/tree-ssa.c:1046 0xa305a1 execute_function_todo ../../gcc/gcc/passes.c:1834 0xa30d07 execute_todo ../../gcc/gcc/passes.c:1866 0xa32af9 execute_one_ipa_transform_pass ../../gcc/gcc/passes.c:2049 0xa32af9 execute_all_ipa_transforms() ../../gcc/gcc/passes.c:2079 0x7e3cc0 expand_function ../../gcc/gcc/cgraphunit.c:1743 0x7e5d96 expand_all_functions ../../gcc/gcc/cgraphunit.c:1855 0x7e5d96 compile() ../../gcc/gcc/cgraphunit.c:2192 0x7e6379 finalize_compilation_unit() ../../gcc/gcc/cgraphunit.c:2269 0x5f816e 
cp_write_global_declarations() ../../gcc/gcc/cp/decl2.c:4360 Please submit a full bug report, -- Markus
Re: [PATCH, LRA, AARCH64] Switching LRA on for AArch64
Hi,

the needed LRA analyser patch was committed as r202914.

Thanks,
Yvan

On 24 September 2013 11:03, Yvan Roux wrote:
> Hi,
>
> The following patch switches LRA on for AArch64. The patch introduces
> an undocumented option -mlra to use LRA instead of reload, for
> testing purposes. Please notice that this patch is dependent on the
> one submitted in the thread below:
>
> http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00805.html
>
> Thanks,
> Yvan
>
> 2013-09-24  Yvan Roux
>
>         * config/aarch64/aarch64.opt (mlra): New option.
>         * config/aarch64/aarch64.c (aarch64_lra_p): New function.
>         (TARGET_LRA_P): Define.
*PING* Re: [Patch, Fortran] PR57697/58469 - Fix another defined-assignment issue
* PING * http://gcc.gnu.org/ml/fortran/2013-09/msg00039.html

Additionally pinging for: http://gcc.gnu.org/ml/fortran/2013-09/msg00031.html

On September 19, 2013 21:11, Tobias Burnus wrote:

This patch fixes two issues:

a) It could happen that no code change had happened. In that case, one
freed an expression which should still be used.

b) In my previous patch, I used a pointer assignment to the temporary of
the LHS (after its allocation) [only if the LHS was initially unassigned].
That led to a problem with double deallocation (temporary + LHS). In the
previous test case, it didn't matter as the LHS wasn't freed (implicit
SAVE in the main program). That's now solved by a NULL-pointer assignment.

Finally, I corrected some indenting issues and removed unreachable code.

Built and regtested on x86-64-gnu-linux. OK for the trunk and the 4.8
branch?

Tobias

PS: For the testcase of (a), I am not quite sure whether the intrinsic
assignment should invoke the defined assignment. It currently doesn't for
gfortran and crayftn. In any case, the invalid freeing is wrong.
Re: [PATCH] fortran/PR58113
Bernd Edlinger wrote:
> this test case fails very often, and the reason is not in GCC but in
> missing glibc rounding support for strtod. This patch fixes the test
> case, to first determine if the rounding support is available. This is
> often the case for real(16) through libquadmath. So even in cases where
> the test case usually fails it still tests something with this patch.
> Ok for trunk?

First, for Fortran patches, it helps if you CC fortran@ as otherwise the
email might be missed.

Your change doesn't really directly check whether strtod handles rounding
but whether libgfortran (invoking strtod) supports up/down rounding.
Hence, after your patch, one effectively checks - given that up/down
rounding works (or at least produces different values) - that
nearest/zero/up/down give the correct result.

As only a few strtod implementations currently support rounding, it is
probably the best approach. However, I think it merits a comment making
clear what is now checked (and what isn't). Maybe something along the
lines of my paragraph (but polished) - and removing the then-obsolete
parts of the existing comment.

Except for the comment update, the patch looks fine to me.

Tobias

PS: I wonder whether there is a good way to do rounded strtod without
relying on the system's libc to handle it.

changelog-round4.txt

2013-09-25  Bernd Edlinger

	PR fortran/58113
	* gfortran.dg/round_4.f90: Check for rounding support.

patch-round4.diff

--- gcc/testsuite/gfortran.dg/round_4.f90	2013-07-21 13:54:27.0 +0200
+++ gcc/testsuite/gfortran.dg/round_4.f90	2013-08-23 10:16:32.0 +0200
@@ -27,6 +27,17 @@
   real(xp) :: r10p, r10m, ref10u, ref10d
   real(qp) :: r16p, r16m, ref16u, ref16d
   character(len=20) :: str, round
+  logical :: rnd4, rnd8, rnd10, rnd16
+
+  !
Test for which types glibc's strtod function supports rounding + str = '0.01 0.01 0.01 0.01' + read (str, *, round='up') r4p, r8p, r10p, r16p + read (str, *, round='down') r4m, r8m, r10m, r16m + rnd4 = r4p /= r4m + rnd8 = r8p /= r8m + rnd10 = r10p /= r10m + rnd16 = r16p /= r16m +! write (*, *) rnd4, rnd8, rnd10, rnd16 ref4u = 0.10001_4 ref8u = 0.10001_8 @@ -55,40 +66,40 @@ round = 'up' call t() - if (r4p /= ref4u .or. r4m /= -ref4d) call abort() - if (r8p /= ref8u .or. r8m /= -ref8d) call abort() - if (r10p /= ref10u .or. r10m /= -ref10d) call abort() - if (r16p /= ref16u .or. r16m /= -ref16d) call abort() + if (rnd4 .and. (r4p /= ref4u .or. r4m /= -ref4d)) call abort() + if (rnd8 .and. (r8p /= ref8u .or. r8m /= -ref8d)) call abort() + if (rnd10 .and. (r10p /= ref10u .or. r10m /= -ref10d)) call abort() + if (rnd16 .and. (r16p /= ref16u .or. r16m /= -ref16d)) call abort() round = 'down' call t() - if (r4p /= ref4d .or. r4m /= -ref4u) call abort() - if (r8p /= ref8d .or. r8m /= -ref8u) call abort() - if (r10p /= ref10d .or. r10m /= -ref10u) call abort() - if (r16p /= ref16d .or. r16m /= -ref16u) call abort() + if (rnd4 .and. (r4p /= ref4d .or. r4m /= -ref4u)) call abort() + if (rnd8 .and. (r8p /= ref8d .or. r8m /= -ref8u)) call abort() + if (rnd10 .and. (r10p /= ref10d .or. r10m /= -ref10u)) call abort() + if (rnd16 .and. (r16p /= ref16d .or. r16m /= -ref16u)) call abort() round = 'zero' call t() - if (r4p /= ref4d .or. r4m /= -ref4d) call abort() - if (r8p /= ref8d .or. r8m /= -ref8d) call abort() - if (r10p /= ref10d .or. r10m /= -ref10d) call abort() - if (r16p /= ref16d .or. r16m /= -ref16d) call abort() + if (rnd4 .and. (r4p /= ref4d .or. r4m /= -ref4d)) call abort() + if (rnd8 .and. (r8p /= ref8d .or. r8m /= -ref8d)) call abort() + if (rnd10 .and. (r10p /= ref10d .or. r10m /= -ref10d)) call abort() + if (rnd16 .and. (r16p /= ref16d .or. r16m /= -ref16d)) call abort() round = 'nearest' call t() - if (r4p /= ref4u .or. 
r4m /= -ref4u) call abort() - if (r8p /= ref8u .or. r8m /= -ref8u) call abort() - if (r10p /= ref10u .or. r10m /= -ref10u) call abort() - if (r16p /= ref16u .or. r16m /= -ref16u) call abort() + if (rnd4 .and. (r4p /= ref4u .or. r4m /= -ref4u)) call abort() + if (rnd8 .and. (r8p /= ref8u .or. r8m /= -ref8u)) call abort() + if (rnd10 .and. (r10p /= ref10u .or. r10m /= -ref10u)) call abort() + if (rnd16 .and. (r16p /= ref16u .or. r16m /= -ref16u)) call abort() ! Same as nearest (but rounding towards zero if there is a tie ! [does not apply here]) round = 'compatible' call t() - if (r4p /= ref4u .or. r4m /= -ref4u) call abort() - if (r8p /= ref8u .or. r8m /= -ref8u) call abort() - if (r10p /= ref10u .or. r10m /= -ref10u) call abort() - if (r16p /= ref16u .or. r16m /= -ref16u) call abort() + if (rnd4 .and. (r4p /= ref4u .or. r4m /= -ref4u)) call abort() + if (rnd8 .and. (r8p /= ref8u .or. r8m /= -ref8u)) call abort() + if (rnd10 .and. (r10p /= ref10u .or. r10m /= -ref10u)) call abort()
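On Tobias's PS: one hedged sketch of a fallback, when the libc only parses to nearest, is to bracket the nearest result with its representable neighbors via nextafter. Picking the right neighbor still requires knowing the sign of the parse error, which plain strtod does not report, and that is the hard part; this sketch (names invented) shows the bracketing step only:

```cpp
#include <cmath>
#include <cstdlib>

// Parse with the libc's round-to-nearest strtod, then form the two
// neighboring representable values.  A rounding-mode-aware reader would
// return *hi for round-up and *lo for round-down, but only if it also
// knew whether the nearest result over- or under-shot the exact decimal
// value, which strtod alone cannot tell us.
static void bracket_parse (const char *s, double *lo, double *mid, double *hi)
{
  *mid = strtod (s, NULL);
  *lo = std::nextafter (*mid, -HUGE_VAL);
  *hi = std::nextafter (*mid, HUGE_VAL);
}
```

Determining the error sign needs an extended-precision (or exact bignum) comparison against the decimal string, which is presumably why so few libcs implement rounded strtod.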
Re: *PING* Re: [Patch, Fortran] PR57697/58469 - Fix another defined-assignment issue
Hi Tobias, > * PING * http://gcc.gnu.org/ml/fortran/2013-09/msg00039.html > > Additionally pinging for: > http://gcc.gnu.org/ml/fortran/2013-09/msg00031.html Both are OK. Thanks a lot for the patches! Thomas
Re: [PATCH] Trivial cleanup
On 09/25/2013 10:04 AM, Jakub Jelinek wrote: On Wed, Sep 25, 2013 at 11:46:12AM -0400, Andrew MacLeod wrote: I noticed that with the wrapper conversion, often you will get a sequence of 3 or more method calls, and its quite unbearable to have the spaces. simple things like int unsignedsrcp = ptrvar.type().type().type_unsigned(); vs int unsignedsrcp = ptrvar.type ().type ().type_unsigned (); I was going to bring it up at some point too. My preference is strongly to simply eliminate the space on methods... My preference is to keep the spaces, it makes code more readable, no space before ( looks very ugly to me. I generally find the lack of a space before the open-paren ugly as well, but I find something like Andrew's example with the horizontal spaces insanely bad. Jeff
Re: [PATCH] Trivial cleanup
On Wed, Sep 25, 2013 at 01:18:06PM -0600, Jeff Law wrote:
> On 09/25/2013 10:04 AM, Jakub Jelinek wrote:
> > On Wed, Sep 25, 2013 at 11:46:12AM -0400, Andrew MacLeod wrote:
> > > I noticed that with the wrapper conversion, often you will get a
> > > sequence of 3 or more method calls, and its quite unbearable to have
> > > the spaces. simple things like
> > >   int unsignedsrcp = ptrvar.type().type().type_unsigned();
> > > vs
> > >   int unsignedsrcp = ptrvar.type ().type ().type_unsigned ();
> > >
> > > I was going to bring it up at some point too. My preference is
> > > strongly to simply eliminate the space on methods...
> >
> > My preference is to keep the spaces, it makes code more readable,
> > no space before ( looks very ugly to me.
>
> I generally find the lack of a space before the open-paren ugly as
> well, but I find something like Andrew's example with the horizontal
> spaces insanely bad.

Then perhaps we don't want the above style and write

  int unsignedsrcp = type_unsigned (type (type (ptrvar)));

instead? If we really need to have C++ uglification everywhere, perhaps
through:

  template <typename T> tree type (T t) { return t.get_type (t); }

or similar.

	Jakub
Re: Context sensitive type inheritance graph walking
> On 2013.09.25 at 12:20 +0200, Jan Hubicka wrote:
> > this is updated version of
> > http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00936.html
> >
> > Bootstrapped/regtested x86_64-linux. The patch is tested by ipa-devirt9
> > testcase. I have extra four, but I would like to first fix the case where
> > the devirtualization happens in TODO of early_local_passes that is not
> > dumped anywhere. So I plan to post these incrementally once this code is
> > hooked also into gimple folding.
> >
> > The patch results in 60% more devirtualizations on Firefox and 10% more
> > speculative devirtualization. I think main component missing now is code
> > determining dynamic type from a call to constructor. I have some
> > prototype for this, too, I would like to discuss incrementally. I am not
> > 100% sure how much harder tracking of dynamic type changes becomes here.
>
> Hi Honza,
>
> I've tested your patch and it failed during the "profile generate" phase
> of an LTO/PGO build of Firefox.
>
> Reduced:
>
> markus@x4 /tmp % cat test.ii
> class A {
> public:
>   virtual void m_fn1();
> };
> class B final : A {
>   ~B();
>   virtual void m_fn2() { m_fn1(); }
> };
> B::~B() {}
>
> markus@x4 /tmp % g++ -c -std=c++11 -O2 -c test.ii
> test.ii: In member function ‘virtual void B::m_fn2()’:
> test.ii:7:16: error: stmt (0x7f85504c3130) marked modified after
> optimization pass:

Thanks, it looks like a latent problem in remove_unreachable_nodes that
does not update SSA after changing a call to a direct call. I will fix it
tomorrow.

Honza
Re: [PATCH, IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.
On 09/25/2013 02:00 PM, Wei Mi wrote: >> To define for what occurrence of the pseudo we should do the >> transformation, we need to create allocnos and calculate reg classes to >> know what paradoxical subreg needs more hard regs (the transformations >> can not be done for all paradoxical subregs as my experience shows many >> RTL changes result in worse RA even if we have heuristics to remove the >> generated changes as in this case would be trying to assign the same >> hard reg for the original and the new pseudo). >> After doing the transformations, we need to recalculate reg classes >> and rebuild allocnos (both are expensive). To speed up the process it >> could be implemented as some kind of update of already existing data but >> it will complicate code much. >> > I see, thanks! > >> So right now I think implementing this in LRA would be easier Still LRA >> has a pretty good register (re-)allocation (although it is worse than in >> IRA). >> > When you get an idea how to fix it in LRA, if you are still busy, I > would be happy to do the implementation if you could brief your idea. > Probably the best place to add a code for this is in lra-constraints.c::simplify_operand_subreg by permitting subreg reload for paradoxical subregs whose hard regs are not fully in allocno class of the inner pseudo. It needs a good testing (i'd check that the generated code is not changed on variety benchmarks to see that the change has no impact on the most programs performance) and you need to add a good comment describing why this change is needed.
Re: [PATCH, LRA] Remove REG_DEAD and REG_UNUSED notes.
> > The description is too terse. In the RTL middle-end, you shouldn't have
> > to manually deal with the REG_DEAD and REG_UNUSED notes (unlike
> > REG_EQUAL and REG_EQUIV notes), as the DF framework is supposed to do
> > it for you.
>
> Unfortunately LRA uses its own DF framework.

In fact reload does just the same:

  /* Make a pass over all the insns and delete all USEs which we inserted
     only to tag a REG_EQUAL note on them.  Remove all REG_DEAD and
     REG_UNUSED notes.  Delete all CLOBBER insns, except those that refer
     to the return value and the special mem:BLK CLOBBERs added to prevent
     the scheduler from misarranging variable-array code, and simplify
     (subreg (reg)) operands.  Strip and regenerate REG_INC notes that may
     have been moved around.  */
  [...]
      pnote = &REG_NOTES (insn);
      while (*pnote != 0)
	{
	  if (REG_NOTE_KIND (*pnote) == REG_DEAD
	      || REG_NOTE_KIND (*pnote) == REG_UNUSED
	      || REG_NOTE_KIND (*pnote) == REG_INC)
	    *pnote = XEXP (*pnote, 1);
	  else
	    pnote = &XEXP (*pnote, 1);
	}

so I guess LRA is entitled to do it as well. But the proper fix to this is
to recompute the REG_DEAD/REG_UNUSED notes at the beginning of postreload.

-- 
Eric Botcazou
Re: [PATCH, ARM, LRA] Prepare ARM build with LRA
Eric Botcazou writes: >> FWIW, I'd prefer to keep it as-is, since must_be_base_p (x) and >> must_be_index_p (x) don't imply that we should look specifically at >> XEXP (x, 0) (rather than just X, or XEXP (x, 1), etc.). I think it's >> better to keep the code tests and the associated XEXPs together. > > Feel free to revert this part, but then add appropriate comments explaining > why we are interested in LO_SUM for set_address_base and in MULT and the 5 > others for set_address_index. If it's because the former is rather a base > than an index and vice-versa for the latter, then it's even clearer with the > predicates I think. The idea is that must_be_base_p and must_be_index_p are used when deciding how to classify the operands of a PLUS. set_address_base and set_address_index are called once we've decided which is which and want to record the choice. There's no backing out by that stage. So in the set_* routines it isn't about whether the value is definitely a base or a definitely an index. It's just about drilling down through what we've already decided is a base or index to get the inner reg or mem, and knowing which XEXPs to look at. We could instead have used a for_each_rtx, or something like that, without any code checks. But I wanted to be precise about the types of address we allow, so that we can assert for things we don't understand. In other words, it was "designed" to require the kind of extension Yvan is adding here. E.g. although it's a bit of a stretch, it might be in future that someone wants to allow NOT in an index. It would then make sense to add NOT to must_be_index_p. But the existing check: if ((GET_CODE (*inner) == MULT || GET_CODE (*inner) == ASHIFT) && CONSTANT_P (XEXP (*inner, 1))) inner = strip_address_mutations (&XEXP (*inner, 0)); wouldn't make sense as: if (must_be_index_p (*inner) && CONSTANT_P (XEXP (*inner, 1))) inner = strip_address_mutations (&XEXP (*inner, 0)); in that case, since NOT only has one operand. 
And it might be that the NOT is allowed only outside the scale, only inside the scale, or in both positions. Not the best example, sorry. Thanks, Richard
Re: [PATCH, ARM, LRA] Prepare ARM build with LRA
> So in the set_* routines it isn't about whether the value is definitely > a base or a definitely an index. It's just about drilling down through > what we've already decided is a base or index to get the inner reg or mem, > and knowing which XEXPs to look at. We could instead have used a > for_each_rtx, or something like that, without any code checks. But I > wanted to be precise about the types of address we allow, so that we can > assert for things we don't understand. In other words, it was "designed" > to require the kind of extension Yvan is adding here. Does this mean that the design is to require a parallel implementation in the predicates and in the set routines, i.e. each time you add a new case to the predicates, you need to add it (or do something) to the set routines as well? If so, that's a little weird, but OK, feel free to revert the de-duplication part, but add comments saying that the functions must be kept synchronized. -- Eric Botcazou
Re: [PATCH, LRA] Remove REG_DEAD and REG_UNUSED notes.
On Wed, Sep 25, 2013 at 4:55 PM, Vladimir Makarov wrote:
> On 09/24/2013 03:40 PM, Mike Stump wrote:
>> On Sep 24, 2013, at 12:23 PM, Steven Bosscher wrote:
>>> On Tue, Sep 24, 2013 at 5:03 PM, Eric Botcazou wrote:
>>>>> This patch removes REG_DEAD and REG_UNUSED notes
>>>> DF framework is supposed to do it for you.
>>> Unfortunately LRA uses its own DF framework.
> I'd say fortunately. Otherwise, LRA would decrease whole compiler speed
> by up to 10%. I can say it because the first version of LRA was based
> on the DF framework.

You've said that before, but I don't think you ever really gave the DF
framework a fair chance. There are many knobs and switches on DF that can
make or ruin its performance. I don't remember seeing your DF-based LRA
code, or you seeking advice from DF maintainers.

As you may remember, LRA had a great number of relatively easy compiler
speed improvements left just before the merge from the lra-branch to the
trunk. We will never know whether your DF-based LRA would have been
competitive, given a little more care and effort, and some help from
people who know the DF framework well.

But it doesn't matter anymore now, anyway. IRA+LRA are fine the way they
are, IMHO. IRA+LRA need to have a different view on registers than almost
all other passes need: allocnos vs. REGs, insn operand number matters
more, LRA needs to see the MEM if the registers are used in an address,
etc. I'm not sure DF is the right framework for IRA+LRA to do their job!

It's only unfortunate that LRA doesn't, and maybe can't, use DF, because I
just would have liked to see the whole compiler use a single framework, to
improve maintainability (as in this case: managing REG_DEAD/REG_UNUSED
notes in only one place) and to let all passes benefit from any work done
to improve the framework.

> I'd be glad if we move to DF without compilation time degradation.
>
> I already wrote that the DF design (although it has a nice API which is
> good for prototyping) is a bad design from a resource usage point of
> view. Keeping a lot of duplicated info on the RTL side increases the
> memory footprint very much, worsens data locality, and slows down the
> compiler.

The memory footprint is still a significant downside, but the other two
points are not. Data locality can be a lot better if the DF caches are
used. For example: I rewrote the RTL CPROP pass using the DF operand
caches and that took the pass off the list of compile time bottlenecks
(before, 3-4% compile time on preprocessed GCC source; now less than 1%).
Most of the speedup came from fewer L2 cache misses, because CPROP doesn't
walk insn patterns anymore in its dataflow analysis phase.

The DF operand caches are compiler speed killers if used incorrectly, e.g.
automatic updating after each change for passes that make a lot of
changes. Manual updating is the solution used by such passes. I can only
speculate, but perhaps the DF-based LRA used an inappropriate DF caches
update mode. (What's still not done well in the DF framework is re-using
the memory allocated for the DF caches...)

> As I remember correctly, the GCC steering committee permitted 5%
> compile time degradation as a criterion to include it into the trunk and
> DF achieved this with difficulties.

In fact, as I remember it, the df-branch at the time of the merge was
still above that 5% threshold. Was it below 5% slowdown before the merge?

Either way: much of that slowdown was due to having the LR, RU, and LIVE
problems separate. The RU and LIVE problems were merged before GCC 4.3 and
that solved much of the slowdown (and a significant portion of the memory
footprint issues). Moreover, the solver itself was rewritten before GCC
4.3 was released, and all DF problems benefited from that.

Another big compiler speed problem was with the RD problem that fwprop
used at the time. The RD problem can now be much faster if reaching defs
are pruned using liveness, and all RD users benefit from this. Single
framework change, many passes benefit!

Ciao!
Steven
Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
On Tue, Sep 24, 2013 at 11:25 AM, Teresa Johnson wrote: > On Tue, Sep 24, 2013 at 10:57 AM, Jan Hubicka wrote: >>> >>> I looked at one that failed after 100 as well (20031204-1.c). In this >>> case, it was due to expansion which was creating multiple branches/bbs >>> from a logical OR and guessing incorrectly on how to assign the >>> counts: >>> >>> if (octets == 4 && (*cp == ':' || *cp == '\0')) { >>> >>> The (*cp == ':' || *cp == '\0') part looked like the following going >>> into RTL expansion: >>> >>> [20031204-1.c : 31:33] _29 = _28 == 58; >>> [20031204-1.c : 31:33] _30 = _28 == 0; >>> [20031204-1.c : 31:33] _31 = _29 | _30; >>> [20031204-1.c : 31:18] if (_31 != 0) >>> goto ; >>> else >>> goto ; >>> >>> where the result of the OR was always true, so bb 16 had a count of >>> 100 and bb 19 a count of 0. When it was expanded, the expanded version >>> of the above turned into 2 bbs with a branch in between. Both >>> comparisons were done in the first bb, but the first bb checked >>> whether the result of the *cp == '\0' compare was true, and if not >>> branched to the check for whether the *cp == ':' compare was true. It >>> gave the branch to the second check against ':' a count of 0, so that >>> bb got a count of 0 and was split out, and put the count of 100 on the >>> fall through assuming the compare with '\0' always evaluated to true. >>> In reality, this OR condition was always true because *cp was ':', not >>> '\0'. Therefore, the count of 0 on the second block with the check for >>> ':' was incorrect, we ended up trying to execute it, and failed. 
>> >> I see, we produce: >> ;; if (_26 != 0) >> >> (insn 94 93 95 (set (reg:CCZ 17 flags) >> (compare:CCZ (reg:QI 107 [ D.2184 ]) >> (const_int 0 [0]))) a.c:31 -1 >> (nil)) >> >> (insn 95 94 96 (set (reg:QI 122 [ D.2186 ]) >> (eq:QI (reg:CCZ 17 flags) >> (const_int 0 [0]))) a.c:31 -1 >> (nil)) >> >> (insn 96 95 97 (set (reg:CCZ 17 flags) >> (compare:CCZ (reg:QI 122 [ D.2186 ]) >> (const_int 0 [0]))) a.c:31 -1 >> (nil)) >> >> (jump_insn 97 96 98 (set (pc) >> (if_then_else (ne (reg:CCZ 17 flags) >> (const_int 0 [0])) >> (label_ref 100) >> (pc))) a.c:31 -1 >> (expr_list:REG_BR_PROB (const_int 6100 [0x17d4]) >> (nil))) >> >> (insn 98 97 99 (set (reg:CCZ 17 flags) >> (compare:CCZ (reg:QI 108 [ D.2186 ]) >> (const_int 0 [0]))) a.c:31 -1 >> (nil)) >> >> (jump_insn 99 98 100 (set (pc) >> (if_then_else (eq (reg:CCZ 17 flags) >> (const_int 0 [0])) >> (label_ref 0) >> (pc))) a.c:31 -1 >> (expr_list:REG_BR_PROB (const_int 3900 [0xf3c]) >> (nil))) >> >> (code_label 100 99 0 14 "" [0 uses]) >> >> That is because we TER together "_26 = _25 | _24" and "if (_26 != 0)" >> >> First I think the logic of do_jump should really be moved to trees. It is >> not >> doing things that can not be adequately represented by gimple. >> >> I am not that certain we want to move it before profiling though. >>> >>> Presumably we had the correct profile data for both blocks, but the >>> accuracy was reduced when the OR was represented as a logical >>> computation with a single branch. We could change the expansion code >>> to do something different, e.g. treat as a 50-50 branch. But we would >>> still end up with integer truncation issues when there was a single >>> training run. But that could be dealt with conservatively in the >> >> Yep, but it is still better than what we have now - if the test above was >> in hot part of program (i.e. not executed once), we will end up optimizing >> the second conditional for size. 
>> >> So I think it is do_jump bug to not distribute probabilities across the two >> conditionals introduced. >>> bbpart code as I suggested for the jump threading issue above. I.e. a >>> cold block with incoming non-cold edges conservatively not marked cold >>> for splitting. >> >> Yep, we can probably do that, but we ought to fix the individual cases >> above at least for a reasonable number of runs. > > I made this change and it removed a few of the failures. > > I looked at another case that still failed with 1 train run but passed > with 100. It turned out to be another truncation issue exposed by RTL > expansion, where we created some control flow for a memset builtin > which was in a block with an execution count of 1. Some of the blocks > got frequencies less than half the original block, so the count was > rounded down or truncated to 0. I noticed that in this case (as well > as the jump threading case I fixed by looking for non-zero incoming > edges in partitioning) that the bb frequency was non-zero. > > Why not just have probably_never_executed_bb_p simply return > false when bb->frequency is non-zero (right now it does the opposite - > returns true when bb->frequency is 0)? Making this change removed a > bunch of other failures.
[GOOGLE] Disable aggressive loop peeling to prevent code bloat.
This patch disables aggressive loop peeling when profile is available.
This prevents extensive code bloat which leads to increased i-cache
misses.

Bootstrapped and passed regression tests.

OK for google-4_8?

Thanks,
Dehao

Index: gcc/loop-unroll.c
===================================================================
--- gcc/loop-unroll.c   (revision 202926)
+++ gcc/loop-unroll.c   (working copy)
@@ -1574,8 +1574,7 @@ decide_peel_simple (struct loop *loop, int flags)
      peeling it is not the case.  Also a function call inside loop is
      also branch from branch prediction POV (and probably better reason
      to not unroll/peel).  */
-  if (desc->num_branches > 1
-      && profile_status != PROFILE_READ)
+  if (desc->num_branches > 1)
     {
       if (dump_file)
         fprintf (dump_file, ";; Not peeling, contains branches\n");
[google gcc-4_8] Applied partial backport of r202818, r202832 and r202836.
Greetings,

I committed partial backport of r202818, r202832 and r202836 to
google/gcc-4_8 branch as r202927.

2013-09-25  Paul Pluzhnikov

        * libstdc++-v3/config/abi/pre/gnu.ver: Add
        _ZSt24__throw_out_of_range_fmtPKcz.
        * libstdc++-v3/src/c++11/Makefile.am: Add snprintf_lite.
        * libstdc++-v3/src/c++11/Makefile.in: Regenerate.
        * libstdc++-v3/src/c++11/snprintf_lite.cc: New.
        * libstdc++-v3/src/c++11/functexcept.cc (__throw_out_of_range_fmt): New.

Index: libstdc++-v3/config/abi/pre/gnu.ver
===================================================================
--- libstdc++-v3/config/abi/pre/gnu.ver (revision 202926)
+++ libstdc++-v3/config/abi/pre/gnu.ver (working copy)
@@ -1357,6 +1357,13 @@

 } GLIBCXX_3.4.18;

+GLIBCXX_3.4.20 {
+
+    # std::__throw_out_of_range_fmt(char const*, ...)
+    _ZSt24__throw_out_of_range_fmtPKcz;
+
+} GLIBCXX_3.4.19;
+
 # Symbols in the support library (libsupc++) have their own tag.
 CXXABI_1.3 {

Index: libstdc++-v3/src/c++11/Makefile.am
===================================================================
--- libstdc++-v3/src/c++11/Makefile.am  (revision 202926)
+++ libstdc++-v3/src/c++11/Makefile.am  (working copy)
@@ -42,6 +42,7 @@
   random.cc \
   regex.cc \
   shared_ptr.cc \
+  snprintf_lite.cc \
   system_error.cc \
   thread.cc

Index: libstdc++-v3/src/c++11/functexcept.cc
===================================================================
--- libstdc++-v3/src/c++11/functexcept.cc       (revision 202926)
+++ libstdc++-v3/src/c++11/functexcept.cc       (working copy)
@@ -31,6 +31,7 @@
 #include
 #include
 #include
+#include

 #ifdef _GLIBCXX_USE_NLS
 # include
@@ -39,6 +40,12 @@
 # define _(msgid)   (msgid)
 #endif

+namespace __gnu_cxx
+{
+  int __snprintf_lite(char *__buf, size_t __bufsize, const char *__fmt,
+                      va_list __ap);
+}
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -80,6 +87,22 @@
   { _GLIBCXX_THROW_OR_ABORT(out_of_range(_(__s))); }

   void
+  __throw_out_of_range_fmt(const char* __fmt, ...)
+  {
+    const size_t __len = __builtin_strlen(__fmt);
+    // We expect at most 2 numbers, and 1 short string.  The additional
+    // 512 bytes should provide more than enough space for expansion.
+    const size_t __alloca_size = __len + 512;
+    char *const __s = static_cast(__builtin_alloca(__alloca_size));
+    va_list __ap;
+
+    va_start(__ap, __fmt);
+    __gnu_cxx::__snprintf_lite(__s, __alloca_size, __fmt, __ap);
+    _GLIBCXX_THROW_OR_ABORT(out_of_range(_(__s)));
+    va_end(__ap);  // Not reached.
+  }
+
+  void
   __throw_runtime_error(const char* __s __attribute__((unused)))
   { _GLIBCXX_THROW_OR_ABORT(runtime_error(_(__s))); }

Index: libstdc++-v3/src/c++11/Makefile.in
===================================================================
--- libstdc++-v3/src/c++11/Makefile.in  (revision 202926)
+++ libstdc++-v3/src/c++11/Makefile.in  (working copy)
@@ -70,7 +70,8 @@
 am__objects_1 = chrono.lo condition_variable.lo debug.lo \
        functexcept.lo functional.lo future.lo hash_c++0x.lo \
        hashtable_c++0x.lo limits.lo mutex.lo placeholders.lo \
-       random.lo regex.lo shared_ptr.lo system_error.lo thread.lo
+       random.lo regex.lo shared_ptr.lo snprintf_lite.lo \
+       system_error.lo thread.lo
 @ENABLE_EXTERN_TEMPLATE_TRUE@am__objects_2 = fstream-inst.lo \
 @ENABLE_EXTERN_TEMPLATE_TRUE@  string-inst.lo wstring-inst.lo
 am_libc__11convenience_la_OBJECTS = $(am__objects_1) $(am__objects_2)
@@ -319,6 +320,7 @@
   random.cc \
   regex.cc \
   shared_ptr.cc \
+  snprintf_lite.cc \
   system_error.cc \
   thread.cc

Index: libstdc++-v3/src/c++11/snprintf_lite.cc
===================================================================
--- libstdc++-v3/src/c++11/snprintf_lite.cc     (revision 0)
+++ libstdc++-v3/src/c++11/snprintf_lite.cc     (revision 0)
@@ -0,0 +1,159 @@
+// Debugging support -*- C++ -*-
+
+// Copyright (C) 2013 Free Software Foundation, Inc.
+//
+// This file is part of GCC.
+//
+// GCC is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// GCC is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+//
Re: [GOOGLE] Disable aggressive loop peeling to prevent code bloat.
I wish there were a better heuristic; for now it is ok in the future.

David

On Wed, Sep 25, 2013 at 2:48 PM, Dehao Chen wrote:
> This patch disables aggressive loop peeling when profile is available.
> This prevents extensive code bloat which leads to increased i-cache
> misses.
>
> Bootstrapped and passed regression tests.
>
> OK for google-4_8?
>
> Thanks,
> Dehao
>
> Index: gcc/loop-unroll.c
> ===================================================================
> --- gcc/loop-unroll.c (revision 202926)
> +++ gcc/loop-unroll.c (working copy)
> @@ -1574,8 +1574,7 @@ decide_peel_simple (struct loop *loop, int flags)
>      peeling it is not the case.  Also a function call inside loop is
>      also branch from branch prediction POV (and probably better reason
>      to not unroll/peel).  */
> -  if (desc->num_branches > 1
> -      && profile_status != PROFILE_READ)
> +  if (desc->num_branches > 1)
>     {
>        if (dump_file)
>          fprintf (dump_file, ";; Not peeling, contains branches\n");
cost model patch
I took the liberty to pick up Richard's original fvect-cost-model patch
and made some modifications.

What has not changed:
1) option -ftree-vect-loop-version is removed;
2) three cost models are introduced: cheap, dynamic, and unlimited;
3) unless explicitly specified, cheap model is the default at O2
   (e.g. when -ftree-loop-vectorize is used with -O2), and dynamic mode
   is the default for O3 and FDO;
4) alignment based versioning is disabled with cheap model.

What has changed:
1) peeling is also disabled with cheap model;
2) alias check condition limit is reduced with cheap model, but not
   completely suppressed.  Runtime alias check is a pretty important
   enabler;
3) tree if conversion changes are not included.

Does this patch look reasonable?

thanks,

David

Index: tree-vectorizer.h
===================================================================
--- tree-vectorizer.h   (revision 202926)
+++ tree-vectorizer.h   (working copy)
@@ -880,6 +880,14 @@ known_alignment_for_access_p (struct dat
   return (DR_MISALIGNMENT (data_ref_info) != -1);
 }

+
+/* Return true if the vect cost model is unlimited.  */
+static inline bool
+unlimited_cost_model ()
+{
+  return flag_vect_cost_model == VECT_COST_MODEL_UNLIMITED;
+}
+
 /* Source location */
 extern LOC vect_location;

Index: flag-types.h
===================================================================
--- flag-types.h        (revision 202926)
+++ flag-types.h        (working copy)
@@ -191,6 +191,15 @@ enum fp_contract_mode {
   FP_CONTRACT_FAST = 2
 };

+/* Vectorizer cost-model.  */
+enum vect_cost_model {
+  VECT_COST_MODEL_UNLIMITED = 0,
+  VECT_COST_MODEL_CHEAP = 1,
+  VECT_COST_MODEL_DYNAMIC = 2,
+  VECT_COST_MODEL_DEFAULT = 3
+};
+
+
 /* Different instrumentation modes.  */
 enum sanitize_code {
   /* AddressSanitizer.  */

Index: targhooks.c
===================================================================
--- targhooks.c (revision 202926)
+++ targhooks.c (working copy)
@@ -1057,20 +1057,17 @@ default_add_stmt_cost (void *data, int c
   unsigned *cost = (unsigned *) data;
   unsigned retval = 0;

-  if (flag_vect_cost_model)
-    {
-      tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
-      int stmt_cost = default_builtin_vectorization_cost (kind, vectype,
+  tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
+  int stmt_cost = default_builtin_vectorization_cost (kind, vectype,
                                                       misalign);
-      /* Statements in an inner loop relative to the loop being
-        vectorized are weighted more heavily.  The value here is
-        arbitrary and could potentially be improved with analysis.  */
-      if (where == vect_body && stmt_info && stmt_in_inner_loop_p (stmt_info))
-        count *= 50;  /* FIXME.  */
+  /* Statements in an inner loop relative to the loop being
+     vectorized are weighted more heavily.  The value here is
+     arbitrary and could potentially be improved with analysis.  */
+  if (where == vect_body && stmt_info && stmt_in_inner_loop_p (stmt_info))
+    count *= 50;  /* FIXME.  */

-      retval = (unsigned) (count * stmt_cost);
-      cost[where] += retval;
-    }
+  retval = (unsigned) (count * stmt_cost);
+  cost[where] += retval;

   return retval;
 }

Index: common.opt
===================================================================
--- common.opt  (revision 202926)
+++ common.opt  (working copy)
@@ -2278,13 +2278,33 @@
 ftree-slp-vectorize
 Common Report Var(flag_tree_slp_vectorize) Optimization
 Enable basic block vectorization (SLP) on trees

+fvect-cost-model=
+Common Joined RejectNegative Enum(vect_cost_model) Var(flag_vect_cost_model) Init(VECT_COST_MODEL_DEFAULT)
+Specifies the cost model for vectorization
+
+Enum
+Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown vectorizer cost model %qs)
+
+EnumValue
+Enum(vect_cost_model) String(unlimited) Value(VECT_COST_MODEL_UNLIMITED)
+
+EnumValue
+Enum(vect_cost_model) String(dynamic) Value(VECT_COST_MODEL_DYNAMIC)
+
+EnumValue
+Enum(vect_cost_model) String(cheap) Value(VECT_COST_MODEL_CHEAP)
+
 fvect-cost-model
-Common Report Var(flag_vect_cost_model) Optimization
-Enable use of cost model in vectorization
+Common RejectNegative Alias(fvect-cost-model=,dynamic)
+Enables the dynamic vectorizer cost model.  Preserved for backward compatibility.
+
+fno-vect-cost-model
+Common RejectNegative Alias(fvect-cost-model=,unlimited)
+Enables the unlimited vectorizer cost model.  Preserved for backward compatibility.

 ftree-vect-loop-version
-Common Report Var(flag_tree_vect_loop_version) Init(1) Optimization
-Enable loop versioning when doing loop vectorization on trees
+Common Ignore
+Does nothing.  Preserved for backward compatibility.

 ftree-scev-cprop
 Common Report Var(flag_tree_scev_cprop) Init(1) Optimization

Index: opts.c
===================================================================
--- opts.c      (revision 202926)
+++ opts.c      (working copy)
@@
Re: [PATCH, powerpc] Rework#2 VSX scalar floating point support, patch #3
On Tue, Sep 24, 2013 at 4:33 PM, Michael Meissner wrote:
> This patch adds the initial support for putting DI, DF, and SF values in the
> upper registers (traditional Altivec registers) using the -mupper-regs-df and
> -mupper-regs-sf patches.  Those switches will not be enabled by default until
> the rest of the changes are made.  This patch passes the bootstrap test and
> make check test.  I tested all of the targets I tested previously (power4-8,
> G4/G5, SPE, cell, e5500/e5600, and paired floating point), and all machines
> generate the same code.  Is it ok to install this patch?
>
> [gcc]
> 2013-09-24  Michael Meissner
>
>       * config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): Allow
>       DFmode, DImode, and SFmode in the upper VSX registers based on the
>       -mupper-regs-{df,sf} flags.  Fix wu constraint to be ALTIVEC_REGS
>       if -mpower8-vector.  Combine -mvsx-timode handling with the rest
>       of the VSX register handling.
>
>       * config/rs6000/rs6000.md (f32_lv): Use %x0 for VSX registers.
>       (f32_sv): Likewise.
>       (zero_extendsidi2_lfiwzx): Add support for loading into the
>       Altivec registers with -mpower8-vector.  Use wu/wv constraints to
>       only do VSX memory options on Altivec registers.
>       (extendsidi2_lfiwax): Likewise.
>       (extendsfdf2_fpr): Likewise.
>       (mov_hardfloat, SF/SD modes): Likewise.
>       (mov_hardfloat32, DF/DD modes): Likewise.
>       (mov_hardfloat64, DF/DD modes): Likewise.
>       (movdi_internal64): Likewise.
>
> [gcc/testsuite]
> 2013-09-24  Michael Meissner
>
>       * gcc.target/powerpc/p8vector-ldst.c: New test for -mupper-regs-sf
>       and -mupper-regs-df.

Okay.

Thanks, David
[PATCH] Make jump thread path carry more information
This patch conveys information about the blocks/edges in a jump
threading path.  Of particular interest is information about the source
block in each edge of the path -- the nature of the block determines our
copy strategy (empty -- no copy, normal copy, joiner copy).

Rather than rediscover the nature of those blocks during the CFG/SSA
graph updating, it's easier to just attach the information to each edge
in the path.  That's precisely what this patch does.

The SSA/CFG updating code isn't making much use of this yet, that's a
follow-on change.  This change is strictly designed to get the
information into the updating code.

Bootstrapped and regression tested on x86_64-unknown-linux-gnu.
Installed on trunk.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 5adbaeb..8871aca 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,26 @@
+2013-09-25  Jeff Law
+
+       * tree-flow.h (thread_through_all_blocks): Prototype moved into
+       tree-ssa-threadupdate.h.
+       (register_jump_thread): Similarly.
+       * tree-ssa-threadupdate.h: New header file.
+       * tree-ssa-dom.c: Include tree-ssa-threadupdate.h.
+       * tree-vrp.c: Likewise.
+       * tree-ssa-threadedge.c: Include tree-ssa-threadupdate.h.
+       (thread_around_empty_blocks): Change type of path vector argument to
+       an edge,type pair from just an edge.  Initialize both elements when
+       appending to a jump threading path.  Tweak references to elements
+       appropriately.
+       (thread_across_edge): Similarly.  Release memory for the elements
+       as needed.
+       * tree-ssa-threadupdate.c: Include tree-ssa-threadupdate.h.
+       (dump_jump_thread_path): New function broken out from
+       register_jump_thread.
+       (register_jump_thread): Use dump_jump_thread_path.  Change type of
+       path vector entries.  Search the path for NULL edges and dump
+       the path if one is found.  Tweak the conversion of path to 3-edge
+       form to use the block copy type information embedded in the path.
+
 2013-09-25  Yvan Roux

        * lra.c (update_inc_notes): Remove all REG_DEAD and REG_UNUSED notes.

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 634e747..a264056 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,6 @@
+2013-09-25  Jeff Law
+
+       * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Update expected output.

 2013-09-25  Tobias Burnus

        PR fortran/58436

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c
index d2a1fbb..222a97b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c
@@ -43,7 +43,6 @@ expand_one_var (tree var, unsigned char toplevel, unsigned char really_expand)
 }

 /* We should thread the jump, through an intermediate block.  */
 /* { dg-final { scan-tree-dump-times "Threaded" 1 "dom1"} } */
-/* { dg-final { scan-tree-dump-times "Registering jump thread .through joiner block.: \\(.*\\); \\(.*\\); \\(.*\\);" 1 "dom1"} } */
-
+/* { dg-final { scan-tree-dump-times "Registering jump thread: \\(.*\\) incoming edge; \\(.*\\) joiner; \\(.*\\) nocopy;" 1 "dom1"} } */
 /* { dg-final { cleanup-tree-dump "dom1" } } */

diff --git a/gcc/tree-flow.h b/gcc/tree-flow.h
index 2f64abc..ee69179 100644
--- a/gcc/tree-flow.h
+++ b/gcc/tree-flow.h
@@ -641,10 +641,6 @@
 bool multiplier_allowed_in_address_p (HOST_WIDE_INT, enum machine_mode,
                                      addr_space_t);
 bool may_be_nonaddressable_p (tree expr);

-/* In tree-ssa-threadupdate.c.  */
-extern bool thread_through_all_blocks (bool);
-extern void register_jump_thread (vec, bool);
-
 /* In gimplify.c  */
 tree force_gimple_operand_1 (tree, gimple_seq *, gimple_predicate, tree);
 tree force_gimple_operand (tree, gimple_seq *, bool, tree);

diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index f0cc0ee..81119c3 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "domwalk.h"
 #include "tree-pass.h"
 #include "tree-ssa-propagate.h"
+#include "tree-ssa-threadupdate.h"
 #include "langhooks.h"
 #include "params.h"

diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 2ca56342..467d982 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dumpfile.h"
 #include "tree-ssa.h"
 #include "tree-ssa-propagate.h"
+#include "tree-ssa-threadupdate.h"
 #include "langhooks.h"
 #include "params.h"

@@ -753,7 +754,7 @@ thread_around_empty_blocks (edge taken_edge,
                            bool handle_dominating_asserts,
                            tree (*simplify) (gimple, gimple),
                            bitmap visited,
-                           vec *path)
+                           vec *path)
 {
   basic_block bb = taken_edge->dest
Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
On Wed, Sep 25, 2013 at 2:33 PM, Teresa Johnson wrote: > On Tue, Sep 24, 2013 at 11:25 AM, Teresa Johnson wrote: >> On Tue, Sep 24, 2013 at 10:57 AM, Jan Hubicka wrote: I looked at one that failed after 100 as well (20031204-1.c). In this case, it was due to expansion which was creating multiple branches/bbs from a logical OR and guessing incorrectly on how to assign the counts: if (octets == 4 && (*cp == ':' || *cp == '\0')) { The (*cp == ':' || *cp == '\0') part looked like the following going into RTL expansion: [20031204-1.c : 31:33] _29 = _28 == 58; [20031204-1.c : 31:33] _30 = _28 == 0; [20031204-1.c : 31:33] _31 = _29 | _30; [20031204-1.c : 31:18] if (_31 != 0) goto ; else goto ; where the result of the OR was always true, so bb 16 had a count of 100 and bb 19 a count of 0. When it was expanded, the expanded version of the above turned into 2 bbs with a branch in between. Both comparisons were done in the first bb, but the first bb checked whether the result of the *cp == '\0' compare was true, and if not branched to the check for whether the *cp == ':' compare was true. It gave the branch to the second check against ':' a count of 0, so that bb got a count of 0 and was split out, and put the count of 100 on the fall through assuming the compare with '\0' always evaluated to true. In reality, this OR condition was always true because *cp was ':', not '\0'. Therefore, the count of 0 on the second block with the check for ':' was incorrect, we ended up trying to execute it, and failed. 
>>> >>> I see, we produce: >>> ;; if (_26 != 0) >>> >>> (insn 94 93 95 (set (reg:CCZ 17 flags) >>> (compare:CCZ (reg:QI 107 [ D.2184 ]) >>> (const_int 0 [0]))) a.c:31 -1 >>> (nil)) >>> >>> (insn 95 94 96 (set (reg:QI 122 [ D.2186 ]) >>> (eq:QI (reg:CCZ 17 flags) >>> (const_int 0 [0]))) a.c:31 -1 >>> (nil)) >>> >>> (insn 96 95 97 (set (reg:CCZ 17 flags) >>> (compare:CCZ (reg:QI 122 [ D.2186 ]) >>> (const_int 0 [0]))) a.c:31 -1 >>> (nil)) >>> >>> (jump_insn 97 96 98 (set (pc) >>> (if_then_else (ne (reg:CCZ 17 flags) >>> (const_int 0 [0])) >>> (label_ref 100) >>> (pc))) a.c:31 -1 >>> (expr_list:REG_BR_PROB (const_int 6100 [0x17d4]) >>> (nil))) >>> >>> (insn 98 97 99 (set (reg:CCZ 17 flags) >>> (compare:CCZ (reg:QI 108 [ D.2186 ]) >>> (const_int 0 [0]))) a.c:31 -1 >>> (nil)) >>> >>> (jump_insn 99 98 100 (set (pc) >>> (if_then_else (eq (reg:CCZ 17 flags) >>> (const_int 0 [0])) >>> (label_ref 0) >>> (pc))) a.c:31 -1 >>> (expr_list:REG_BR_PROB (const_int 3900 [0xf3c]) >>> (nil))) >>> >>> (code_label 100 99 0 14 "" [0 uses]) >>> >>> That is because we TER together "_26 = _25 | _24" and "if (_26 != 0)" >>> >>> First I think the logic of do_jump should really be moved to trees. It is >>> not >>> doing things that can not be adequately represented by gimple. >>> >>> I am not that certain we want to move it before profiling though. Presumably we had the correct profile data for both blocks, but the accuracy was reduced when the OR was represented as a logical computation with a single branch. We could change the expansion code to do something different, e.g. treat as a 50-50 branch. But we would still end up with integer truncation issues when there was a single training run. But that could be dealt with conservatively in the >>> >>> Yep, but it is still better than what we have now - if the test above was >>> in hot part of program (i.e. not executed once), we will end up optimizing >>> the second conditional for size. 
>>> >>> So I think it is do_jump bug to not distribute probabilities across the two >>> conditionals introduced. bbpart code as I suggested for the jump threading issue above. I.e. a cold block with incoming non-cold edges conservatively not marked cold for splitting. >>> >>> Yep, we can probably do that, but we ought to fix the individual cases >>> above at least for a reasonable number of runs. >> >> I made this change and it removed a few of the failures. >> >> I looked at another case that still failed with 1 train run but passed >> with 100. It turned out to be another truncation issue exposed by RTL >> expansion, where we created some control flow for a memset builtin >> which was in a block with an execution count of 1. Some of the blocks >> got frequencies less than half the original block, so the count was >> rounded down or truncated to 0. I noticed that in this case (as well >> as the jump threading case I fixed by looking for non-zero incoming >> edges in partitioning) that the bb frequency was non-zero. >> >> Why not just have probably_never_executed_bb_p return s