Re: [v3] plus
On 25 September 2013 06:41, Marc Glisse wrote:
> On Wed, 25 Sep 2013, Jonathan Wakely wrote:
>
>> I've had this sitting in my tree waiting to do something with,
>
> I did ask last week if someone had done it already...

Sorry :-\

>> I'm about to go to sleep so didn't check if the test covers anything yours
>> doesn't.
>
> In the test you have basic cover for all functors, and I cover only 2 cases
> (more specifically though, since I look at the return type cv-qual).
>
> In my patch, I added constexpr and noexcept, I couldn't see any reason not
> to for such basic utilities.  Yes, I did read the wiki and noticed the vote
> yesterday about constexpr, but imho that's wrong.
>
>> It looks like your patch adds the default template argument even in
>> C++98 mode, I avoided that by putting forward declarations at the top
>> of the file, in a #if block.
>
> This only lets me write:
>   std::plus<> *p;
> in C++98 mode (std::plus<> p; gives an error), doesn't seem that bad.
>
> And I also add the void specializations in C++11 mode, as an extension.
>
> Well, let's forget my patch and go with yours, though at least adding
> noexcept seems like a good idea.

Yes, noexcept is a good idea.

> (too bad we don't have noexcept(auto) here)
> (too bad we can't use decltype(auto) for the return type, as that would
> disable sfinae, it is a bad sign when the standard turns its nose up at its
> own dog food...)

Yeah, I think the idea is that concepts will make SFINAE a thing of the
past, but we're not there yet.
Re: [PATCH] Don't always instrument shifts (PR sanitizer/58413)
Ping.

On Tue, Sep 17, 2013 at 12:32:03PM +0200, Marek Polacek wrote:
> On Mon, Sep 16, 2013 at 08:35:35PM +0200, Jakub Jelinek wrote:
> > On Fri, Sep 13, 2013 at 08:01:36PM +0200, Marek Polacek wrote:
> > I'd say the above is going to be a maintenance nightmare, with all the code
> > duplication.  And you are certainly going to miss cases that way, e.g.
> >
> >   void
> >   foo (void)
> >   {
> >     int A[-2 / -1] = {};
> >   }
> >
> > I'd say instead of adding all this, you should just at the right spot insert
> > if (integer_zerop (t)) return NULL_TREE; or similar.
> >
> > For shift instrumentation, I guess you could add
> >   if (integer_zerop (t) && (tt == NULL_TREE || integer_zerop (tt)))
> >     return NULL_TREE;
> > right before:
> >   t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (t), op0, t);
>
> Yeah, this is _much_ better.  I'm glad we can live without that
> ugliness.
>
> > > +/* PR sanitizer/58413 */
> > > +/* { dg-do run } */
> > > +/* { dg-options "-fsanitize=shift -w" } */
> > > +
> > > +int x = 7;
> > > +int
> > > +main (void)
> > > +{
> > > +  /* All of the following should pass.  */
> > > +  int A[128 >> 5] = {};
> > > +  int B[128 << 5] = {};
> > > +
> > > +  static int e =
> > > +    ((int)
> > > +     (0x | ((31 & ((1 << (4)) - 1)) << (((15) + 6) + 4)) |
> > > +      ((0) << ((15) + 6)) | ((0) << (15;
> >
> > This relies on int32plus, so needs to be
> > /* { dg-do run { target int32plus } } */
>
> Fixed.
>
> > > --- gcc/testsuite/c-c++-common/ubsan/shift-5.c.mp	2013-09-13 18:25:06.195847144 +0200
> > > +++ gcc/testsuite/c-c++-common/ubsan/shift-5.c	2013-09-13 19:16:38.990211229 +0200
> > > @@ -0,0 +1,21 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-fsanitize=shift -w" } */
> > > +/* { dg-shouldfail "ubsan" } */
> > > +
> > > +int x;
> > > +int
> > > +main (void)
> > > +{
> > > +  /* None of the following should pass.  */
> > > +  switch (x)
> > > +    {
> > > +    case 1 >> -1:	/* { dg-error "" } */
> > > +    case -1 >> -1:	/* { dg-error "" } */
> > > +    case 1 << -1:	/* { dg-error "" } */
> > > +    case -1 << -1:	/* { dg-error "" } */
> > > +    case -1 >> 200:	/* { dg-error "" } */
> > > +    case 1 << 200:	/* { dg-error "" } */
> >
> > Can't you fill in the error you are expecting, or is the problem
> > that the wording is very different between C and C++?
>
> I discovered { target c } stuff, so I filled in both error messages.
>
> This patch seems to work: bootstrap-ubsan passes + ubsan testsuite
> passes too.  Ok for trunk?
>
> 2013-09-17  Marek Polacek
> 	    Jakub Jelinek
>
> 	PR sanitizer/58413
> c-family/
> 	* c-ubsan.c (ubsan_instrument_shift): Don't instrument
> 	an expression if we can prove it is correct.
> 	(ubsan_instrument_division): Likewise.  Remove unnecessary
> 	check.
>
> testsuite/
> 	* c-c++-common/ubsan/shift-4.c: New test.
> 	* c-c++-common/ubsan/shift-5.c: New test.
> 	* c-c++-common/ubsan/div-by-zero-5.c: New test.
> 	* gcc.dg/ubsan/c-shift-1.c: New test.
>
> --- gcc/c-family/c-ubsan.c.mp	2013-09-17 12:24:44.582835840 +0200
> +++ gcc/c-family/c-ubsan.c	2013-09-17 12:24:48.772849823 +0200
> @@ -51,14 +51,6 @@ ubsan_instrument_division (location_t lo
>    if (TREE_CODE (type) != INTEGER_TYPE)
>      return NULL_TREE;
>
> -  /* If we *know* that the divisor is not -1 or 0, we don't have to
> -     instrument this expression.
> -     ??? We could use decl_constant_value to cover up more cases.  */
> -  if (TREE_CODE (op1) == INTEGER_CST
> -      && integer_nonzerop (op1)
> -      && !integer_minus_onep (op1))
> -    return NULL_TREE;
> -
>    t = fold_build2 (EQ_EXPR, boolean_type_node,
> 		   op1, build_int_cst (type, 0));
>
> @@ -74,6 +66,11 @@ ubsan_instrument_division (location_t lo
>        t = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, t, x);
>      }
>
> +  /* If the condition was folded to 0, no need to instrument
> +     this expression.  */
> +  if (integer_zerop (t))
> +    return NULL_TREE;
> +
>    /* In case we have a SAVE_EXPR in a conditional context, we need to
>       make sure it gets evaluated before the condition.  */
>    t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (t), op0, t);
> @@ -138,6 +135,11 @@ ubsan_instrument_shift (location_t loc,
>        tt = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, x, tt);
>      }
>
> +  /* If the condition was folded to 0, no need to instrument
> +     this expression.  */
> +  if (integer_zerop (t) && (tt == NULL_TREE || integer_zerop (tt)))
> +    return NULL_TREE;
> +
>    /* In case we have a SAVE_EXPR in a conditional context, we need to
>       make sure it gets evaluated before the condition.  */
>    t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (t), op0, t);
> --- gcc/testsuite/c-c++-common/ubsan/shift-4.c.mp	2013-09-17 12:25:12.130931875 +0200
> +++ gcc/testsuite/c-c++-common/ubsan/shift-4.c	2013-09-17 10:19:44.66
Re: [PATCH] Handle IDENTIFIER_NODEs in ubsan.c (PR sanitizer/58420)
Ping.

On Mon, Sep 16, 2013 at 05:49:55PM +0200, Marek Polacek wrote:
> This patch amends the chunk of code where we are determining the
> type name; I hadn't considered that IDENTIFIER_NODEs require special
> handling, since we need to omit the DECL_NAME.  I had something similar
> in the http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00917.html patch, but
> there I had a thinko: I need to check that TYPE_NAME is non-NULL first.
>
> Regtested/ran bootstrap-ubsan on x86_64-linux.
>
> Ok for trunk?
>
> 2013-09-16  Marek Polacek
>
> 	PR sanitizer/58420
> 	* ubsan.c (ubsan_type_descriptor): Handle IDENTIFIER_NODEs
> 	when determining the type name.
>
> --- gcc/ubsan.c.mp	2013-09-16 14:22:07.195918175 +0200
> +++ gcc/ubsan.c	2013-09-16 14:22:10.503929477 +0200
> @@ -260,11 +260,18 @@ ubsan_type_descriptor (tree type)
>    unsigned short tkind, tinfo;
>
>    /* At least for INTEGER_TYPE/REAL_TYPE/COMPLEX_TYPE, this should work.
> -     ??? For e.g. type_unsigned_for (type), the TYPE_NAME would be NULL.  */
> +     For e.g. type_unsigned_for (type) or bit-fields, the TYPE_NAME
> +     would be NULL.  */
>    if (TYPE_NAME (type) != NULL)
> -    tname = IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type)));
> +    {
> +      if (TREE_CODE (TYPE_NAME (type)) == IDENTIFIER_NODE)
> +	tname = IDENTIFIER_POINTER (TYPE_NAME (type));
> +      else
> +	tname = IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type)));
> +    }
>    else
>      tname = "";
> +
>    if (TREE_CODE (type) == INTEGER_TYPE)
>      {
>        /* For INTEGER_TYPE, this is 0x.  */
>
> 	Marek
Re: [PATCH] Don't always instrument shifts (PR sanitizer/58413)
On Wed, Sep 25, 2013 at 10:34:42AM +0200, Marek Polacek wrote:
> > 2013-09-17  Marek Polacek
> > 	    Jakub Jelinek
> >
> > 	PR sanitizer/58413
> > c-family/
> > 	* c-ubsan.c (ubsan_instrument_shift): Don't instrument
> > 	an expression if we can prove it is correct.
> > 	(ubsan_instrument_division): Likewise.  Remove unnecessary
> > 	check.
> >
> > testsuite/
> > 	* c-c++-common/ubsan/shift-4.c: New test.
> > 	* c-c++-common/ubsan/shift-5.c: New test.
> > 	* c-c++-common/ubsan/div-by-zero-5.c: New test.
> > 	* gcc.dg/ubsan/c-shift-1.c: New test.

Ok.

	Jakub
Re: [PATCH] Handle IDENTIFIER_NODEs in ubsan.c (PR sanitizer/58420)
On Wed, Sep 25, 2013 at 10:35:40AM +0200, Marek Polacek wrote:
> > 2013-09-16  Marek Polacek
> >
> > 	PR sanitizer/58420
> > 	* ubsan.c (ubsan_type_descriptor): Handle IDENTIFIER_NODEs
> > 	when determining the type name.

Ok.

	Jakub
Re: [patch] Separate immediate uses and phi routines from tree-flow*.h
On Tue, Sep 24, 2013 at 4:39 PM, Andrew MacLeod wrote:
> This larger patch moves all the immediate use and operand routines from
> tree-flow.h into tree-ssa-operands.h.
> It also moves the basic phi routines and prototypes into a newly created
> tree-phinodes.h, or tree-ssa-operands.h if they belong there.
> And finally it shuffles a couple of other routines, which allows
> tree-ssa-operands.h to be removed from the gimple.h header file.
>
> Of note or interest:
>
> 1 - dump_decl_set() was defined in tree-into-ssa.c, but isn't really SSA
> specific.  It's tree-specific, so normally I'd throw it into tree.c.
> Looking forward a little, it's only used in a gimple context, so when we
> map to gimple_types it will need to be converted to/created for those.
> If it is in tree.c, I'll have to create a new version for gimple types,
> and then the routine in tree.c will become unused.  Based on that, I
> figured gimple.c is the right place for it.
>
> 2 - has_zero_uses_1() and single_imm_use_1() were both in tree-cfg.c for
> some reason... they've been moved to tree-ssa-operands.c.
>
> 3 - a few routines seem like basic gimple routines, but really turn out
> to require the operand infrastructure to implement... so they are moved
> to tree-ssa-operands.[ch] as well.  This sort of thing showed up when
> removing tree-ssa-operands.h from the gimple.h include file.  These were
> things like gimple_vuse_op, gimple_vdef_op, update_stmt, and
> update_stmt_if_modified.

Note that things like gimple_vuse_op are on the interface border between
gimple (where the SSA operands are stored) and SSA operands.  So it's not
so clear for them, given they access internal gimple fields directly but
use the regular SSA operand API.  I'd prefer gimple_vuse_op and
gimple_vdef_op to stay in gimple.[ch].

If you want to remove the tree-ssa-operands.h include from gimple.h
(which in some way makes sense), then we need a new gimple-ssa.[ch] for
routines that operate on the "gimple in SSA" IL (not purely on gimple,
which could be not in SSA form, and not purely on the SSA web, which could
in theory be constructed over a non-GIMPLE representation).

Actually I like the idea of gimple-ssa.[ch].  tree-ssa-operands.[ch] would
then be solely the place for the operand infrastructure internals
implementation, not a place for generic SSA utility routines related to
SSA operands.

Moving update_stmt/update_stmt_if_modified is a similar case but maybe
less obvious (unless we have gimple-ssa.[ch]).

What do you think?

> 4 - gimple_phi_arg_def() was oddly defined:
>
> static inline tree
> gimple_phi_arg_def (gimple gs, size_t index)
> {
>   struct phi_arg_d *pd = gimple_phi_arg (gs, index);
>   return get_use_from_ptr (&pd->imm_use);
> }
>
> /* Return a pointer to the tree operand for argument I of PHI node GS.  */
>
> static inline tree *
> gimple_phi_arg_def_ptr (gimple gs, size_t index)
> {
>   return &gimple_phi_arg (gs, index)->def;
> }
>
> I'm not sure why it goes through the immediate use routines... it should
> be the same as '*gimple_phi_arg_def_ptr (x, y)'.  And by using the
> immediate-use routine, it couldn't be put in tree-phinodes.h where it
> belongs.  I changed it to be simply
>   return gimple_phi_arg (gs, index)->def;
> and bootstrapped that change, along with an assert that it was the same
> value as the immediate-use method.  Everything worked as expected.  Old
> wart I guess.

Nice cleanup.

> I didn't bother with a Makefile.in change since only its dependencies
> change, and Tromey's pending auto-dependency patch would just delete
> those anyway.
>
> This bootstraps on x86_64-unknown-linux-gnu and has no new regressions.
> OK?

> 	(swap_tree_operands): Move prototype to tree-ssa-operands.h.

Rename to swap_ssa_operands and remove the !ssa_operands_active path?
(It should be no longer needed, I think.)

> 	(gimple_phi_arg_def, gimple_phi_arg_def_ptr, gimple_phi_arg_edge,
> 	gimple_phi_arg_location, gimple_phi_arg_location_from_edge,
> 	gimple_phi_arg_set_location, gimple_phi_arg_has_location,
> 	phi_nodes, phi_nodes_ptr, set_phi_nodes, phi_ssa_name_p): Move to
> 	tree-phinodes.h.

I always found the separation of the PHI node API from the gimple API odd
... now, PHI nodes are an SSA speciality.  Unless the gimple_phi class
moves to tree-phinodes.h, the same comments as for gimple_vuse_op apply.
But maybe everything will be more obvious once we separate the core gimple
data structure from the gimple API stuff ...

Overall the patch looks ok; either massage it according to my comments
above (thinking about gimple-ssa.[ch]) or follow up with a separate patch
(not that moving things twice will hurt).

Thanks,
Richard.

> Andrew
Re: [PATCH, PR 57748] Check for out of bounds access, Part 2
On Tue, Sep 24, 2013 at 8:06 PM, Martin Jambor wrote:
> Hi,
>
> On Tue, Sep 24, 2013 at 12:02:17PM +0200, Richard Biener wrote:
>> On Tue, Sep 24, 2013 at 4:52 AM, Bernd Edlinger wrote:
>> > Hi,
>> >
>> > with the attached patch the read-side of the out of bounds accesses
>> > are fixed.  There is a single new test case pr57748-3.c that is
>> > derived from Martin's test case.  The test case does only test the
>> > read access and does not depend on part 1 of the patch.
>> >
>> > This patch was boot-strapped and regression tested on
>> > x86_64-unknown-linux-gnu.
>> >
>> > Additionally I generated eCos and an eCos-application (on ARMv5 using
>> > packed structures) with an arm-eabi cross compiler, and looked for
>> > differences in the disassembled code with and without this patch, but
>> > there were none.
>> >
>> > OK for trunk?
>>
>> Index: gcc/expr.c
>> ===================================================================
>> --- gcc/expr.c	(revision 202778)
>> +++ gcc/expr.c	(working copy)
>> @@ -9878,7 +9878,7 @@
>>  			 && modifier != EXPAND_STACK_PARM
>>  			 ? target : NULL_RTX),
>>  			VOIDmode,
>> -			modifier == EXPAND_SUM ? EXPAND_NORMAL : modifier);
>> +			EXPAND_MEMORY);
>>
>>  	/* If the bitfield is volatile, we want to access it in the
>>  	   field's mode, not the computed mode.
>>
>> Context suggests that we may arrive with EXPAND_STACK_PARM here,
>> which is a correctness modifier (see its docs).  But I'm not too
>> familiar with the details of the various expand modifiers; Eric may be
>> though.
>>
>> That said, I still believe that fixing the misalign path in
>> expand_assignment would be better than trying to avoid it.  For this
>> testcase the issue is again that expand_assignment passes the wrong
>> mode/target to the movmisalign optab.
>
> Perhaps I'm stating the obvious, but this is supposed to address a
> separate bug in the expansion of the RHS (as opposed to the first bug,
> which was in the expansion of the LHS), and so we would have to make
> expand_assignment also examine potential misalignments in the RHS,
> which it so far does not do, despite its complexity.
>
> Having said that, I also dislike the patch, but I am quite convinced
> that if we allow non-BLK structures with zero sized arrays, the fix
> will be ugly - but I'll be glad to be shown I've been wrong.

I don't think the issues have anything to do with zero sized arrays or
non-BLK structures.  The issue is just that we are feeding movmisalign
with the wrong mode (the mode of the base object) if we are expanding a
base object which is unaligned and has non-BLK mode.

So what we maybe need is another expand modifier that tells us not to use
movmisalign when expanding the base object.  Or we need to stop expanding
the base object with VOIDmode and instead pass down the mode of the access
(TYPE_MODE (TREE_TYPE (exp))), which will probably confuse other code.  So
eventually not handling the misaligned case by expansion of the base, but
inlined, similar to how it was in expand_assignment, would be needed here.

Richard.

> Martin
Improve pretty printing of obj type ref.
Hi,
the type of the class whose method is called used to be determinable (in a
wrong way) by looking up the type of the OBJ_TYPE_REF_OBJECT parameter.
Now we determine it from the method type of the call, and this is not
printed anywhere.  This adds a "cast" of the OBJ_TYPE_REF_OBJECT so I can
see it easily.

Bootstrapped/regtested x86_64-linux, OK?

Honza

	* tree-pretty-print.c (dump_generic_node): Print class type of
	OBJ_TYPE_REF.
Index: tree-pretty-print.c
===================================================================
--- tree-pretty-print.c	(revision 202838)
+++ tree-pretty-print.c	(working copy)
@@ -2040,6 +2040,12 @@ dump_generic_node (pretty_printer *buffe
       pp_string (buffer, "OBJ_TYPE_REF(");
       dump_generic_node (buffer, OBJ_TYPE_REF_EXPR (node), spc, flags, false);
       pp_semicolon (buffer);
+      if (virtual_method_call_p (node))
+	{
+	  pp_string (buffer, "(");
+	  dump_generic_node (buffer, obj_type_ref_class (node), spc, flags, false);
+	  pp_string (buffer, ")");
+	}
       dump_generic_node (buffer, OBJ_TYPE_REF_OBJECT (node), spc, flags, false);
       pp_arrow (buffer);
       dump_generic_node (buffer, OBJ_TYPE_REF_TOKEN (node), spc, flags, false);
[PATCH] Fix PR58521
This hopefully fixes the bootstrap issue people were seeing.  It fixes the
ICE on the file I reproduced it on with i586 bootstrap.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
I'm still reducing a testcase.

Richard.

2013-09-25  Richard Biener

	PR middle-end/58521
	* tree.c (iterative_hash_expr): Remove MEM_REF special handling.

Index: gcc/tree.c
===================================================================
--- gcc/tree.c	(revision 202885)
+++ gcc/tree.c	(working copy)
@@ -7280,21 +7280,6 @@ iterative_hash_expr (const_tree t, hashv
 	  }
 	return val;
       }
-    case MEM_REF:
-      {
-	/* The type of the second operand is relevant, except for
-	   its top-level qualifiers.  */
-	tree type = TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (t, 1)));
-
-	val = iterative_hash_object (TYPE_HASH (type), val);
-
-	/* We could use the standard hash computation from this point
-	   on.  */
-	val = iterative_hash_object (code, val);
-	val = iterative_hash_expr (TREE_OPERAND (t, 1), val);
-	val = iterative_hash_expr (TREE_OPERAND (t, 0), val);
-	return val;
-      }
     case FUNCTION_DECL:
       /* When referring to a built-in FUNCTION_DECL, use the __builtin__ form.
 	 Otherwise nodes that compare equal according to operand_equal_p might
Re: Improve pretty printing of obj type ref.
On Wed, 25 Sep 2013, Jan Hubicka wrote:

> Hi,
> the type of class whose method is called used to be determinable (in
> wrong way) by looking up type of OBJ_TYPE_REF_OBJECT parameter.  Now we
> determine it from the method type of the call and this is not printed
> anywhere.  This adds a "cast" of the OBJ_TYPE_REF_OBJECT so i can see it
> easily.
>
> Bootstrapped/regtested x86_64-linux, OK?
>
> Honza
>
> 	* tree-pretty-print.c (dump_generic_node): Print class type of
> 	OBJ_TYPE_REF.
> Index: tree-pretty-print.c
> ===================================================================
> --- tree-pretty-print.c	(revision 202838)
> +++ tree-pretty-print.c	(working copy)
> @@ -2040,6 +2040,12 @@ dump_generic_node (pretty_printer *buffe
>        pp_string (buffer, "OBJ_TYPE_REF(");
>        dump_generic_node (buffer, OBJ_TYPE_REF_EXPR (node), spc, flags, false);
>        pp_semicolon (buffer);
> +      if (virtual_method_call_p (node))
> +	{
> +	  pp_string (buffer, "(");
> +	  dump_generic_node (buffer, obj_type_ref_class (node), spc, flags, false);

I think you want flags | TDF_SLIM here.  Ok with that change.

Richard.
Strengthen condition in cgraph_resolve_speculation
Hi,
when the target of a polymorphic call was first determined speculatively
and later determined exactly, we compare the targets for a match and, if
they match, we use the original speculative edge.  This is because we may
have already optimized through this edge.  The test is done by comparing
decls, which brings unnecessary failures when we use local aliases.
Fixed thus.

Regtested/bootstrapped x86_64-linux, committed.

Index: ChangeLog
===================================================================
--- ChangeLog	(revision 202887)
+++ ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2013-09-25  Jan Hubicka
+
+	* cgraph.c (cgraph_resolve_speculation): Use semantical equivalency
+	test.
+
 2013-09-25  Marek Polacek
 
 	PR sanitizer/58420
Index: cgraph.c
===================================================================
--- cgraph.c	(revision 202838)
+++ cgraph.c	(working copy)
@@ -1188,7 +1188,9 @@ cgraph_resolve_speculation (struct cgrap
   gcc_assert (edge->speculative);
 
   cgraph_speculative_call_info (edge, e2, edge, ref);
-  if (ref->referred->symbol.decl != callee_decl)
+  if (!callee_decl
+      || !symtab_semantically_equivalent_p ((symtab_node) ref->referred,
+					    symtab_get_node (callee_decl)))
     {
       if (dump_file)
 	{
Context sensitive type inheritance graph walking
Hi,
this is an updated version of
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00936.html

It updates the type inheritance graph walking algorithm to be context
sensitive.  The context is stored in ipa_polymorphic_call_context,
containing:

 - OUTER_TYPE: the type of the memory object that contains the object
   whose method is called, NULL if unknown
 - OFFSET: the offset within OUTER_TYPE where the object is located
 - MAYBE_IN_CONSTRUCTION: true if the object may be in construction,
   i.e. it is a local or static variable.  At the moment we do not try to
   disprove construction; that is partly done by ipa-prop and is planned
   to be merged here in the future.
 - MAYBE_DERIVED_TYPE: set if the OUTER_TYPE object actually may be a
   derivation of the type, for example when OUTER_TYPE was determined
   from the THIS parameter of a method.

There is cgraph_indirect_call_info that walks GIMPLE code and attempts to
determine the context of a given call.  It looks for objects located in
declarations (static vars/automatic vars/parameters), objects passed by
invisible reference, and objects passed as THIS pointers.  The second two
cases are new; the first case is already done by
gimple_extract_devirt_binfo_from_cst.

The context is determined by the cgraph construction code and stored in
cgraph edges.  In the future I expect ipa-prop to manipulate and update
the contexts and propagate across them, but that is not implemented.  For
this reason I did not add the context itself into the cgraph edge; I
merely added the new info and hacked ipa-prop (temporarily) to simply
throw away the actual info.

The high-level idea is to make get_class_context fill in info for the
known-type and ancestor functions, too, and then we can have a function
combining types + doing propagation across them, replacing the current
code that deals with BINFOs directly and thus cannot deal with all the
cases above very well.

possible_polymorphic_call_targets is now a bit more complex.  It is split
into:

 - get_class_context: this function walks the OUTER_TYPE (if present) and
   looks up the type of the class actually containing the method being
   called.  Previously similar logic was part of
   gimple_extract_devirt_binfo_from_cst.
 - walk_bases: when the type is in construction, all base types are
   walked and for those containing OTR_TYPE at the proper memory location
   the corresponding virtual methods are added to the list.
 - record_binfos now walks the inner binfo from OUTER_TYPE to OTR_TYPE
   via get_binfo_at_offset.

Bootstrapped/regtested x86_64-linux.  The patch is tested by the
ipa-devirt9 testcase.  I have an extra four, but I would like to first fix
the case where the devirtualization happens in the TODO of
early_local_passes, which is not dumped anywhere.  So I plan to post these
incrementally once this code is hooked also into gimple folding.

The patch results in 60% more devirtualizations on Firefox and 10% more
speculative devirtualizations.  I think the main component missing now is
code determining the dynamic type from a call to a constructor.  I have
some prototype for this, too, which I would like to discuss incrementally.
I am not 100% sure how much harder tracking of dynamic type changes
becomes here.

Martin, does it seem to make sense?

Honza

	* cgraph.c (cgraph_create_indirect_edge): Use
	get_polymorphic_call_info.
	* cgraph.h (cgraph_indirect_call_info): Add outer_type,
	maybe_in_construction and maybe_derived_type.
	* ipa-utils.h (ipa_polymorphic_call_context): New structure.
	(ipa_dummy_polymorphic_call_context): New global var.
	(possible_polymorphic_call_targets): Add context parameter.
	(dump_possible_polymorphic_call_targets): Likewise; update
	wrappers.
	(possible_polymorphic_call_target_p): Likewise.
	(get_polymorphic_call_info): New function.
	* ipa-devirt.c (ipa_dummy_polymorphic_call_context): New function.
	(add_type_duplicate): Remove forgotten debug output.
	(method_class_type): Add sanity check.
	(maybe_record_node): Add FINALP parameter.
	(record_binfo): Add OUTER_TYPE and OFFSET; walk the inner
	binfo by get_binfo_at_offset.
	(possible_polymorphic_call_targets_1): Add OUTER_TYPE/OFFSET
	parameters; pass them to record_binfo.
	(polymorphic_call_target_d): Add context and FINAL.
	(polymorphic_call_target_hasher::hash): Hash context.
	(polymorphic_call_target_hasher::equal): Compare context.
	(free_polymorphic_call_targets_hash):
	(get_class_context): New function.
	(contains_type_p): New function.
	(get_polymorphic_call_info): New function.
	(walk_bases): New function.
	(possible_polymorphic_call_targets): Add context parameter;
	honor it.
	(dump_possible_polymorphic_call_targets): Dump context.
	(possible_polymorphic_call_target_p): Add context.
	(update_type_inheritance_graph): Update comments.
	(ipa_set_jf_known_type): Asser
[C++ PATCH] Splice when giving an error (PR c++/58510)
The following testcase ICEd because complete_ctor_at_level_p got a union
with two initializers - and didn't like that.  I think we can get away
with splicing when sorting the initializers: we already gave an error and
the program isn't accepted.

Regtested/bootstrapped on x86_64-linux, ok for trunk?

2013-09-25  Marek Polacek

	PR c++/58510
cp/
	* init.c (sort_mem_initializers): Splice when giving an error.
testsuite/
	* g++.dg/cpp0x/pr58510.C: New test.

--- gcc/cp/init.c.mp	2013-09-25 11:50:18.246432664 +0200
+++ gcc/cp/init.c	2013-09-25 11:50:18.262432728 +0200
@@ -980,9 +980,12 @@ sort_mem_initializers (tree t, tree mem_
 	  else if (TREE_VALUE (*last_p) && !TREE_VALUE (init))
 	    goto splice;
 	  else
-	    error_at (DECL_SOURCE_LOCATION (current_function_decl),
-		      "initializations for multiple members of %qT",
-		      ctx);
+	    {
+	      error_at (DECL_SOURCE_LOCATION (current_function_decl),
+			"initializations for multiple members of %qT",
+			ctx);
+	      goto splice;
+	    }
 	}
 
       last_p = p;
--- gcc/testsuite/g++.dg/cpp0x/pr58510.C.mp	2013-09-25 12:19:02.612137551 +0200
+++ gcc/testsuite/g++.dg/cpp0x/pr58510.C	2013-09-25 12:45:13.157119958 +0200
@@ -0,0 +1,11 @@
+// PR c++/58510
+// { dg-do compile { target c++11 } }
+
+void foo()
+{
+  union
+  {	// { dg-error "multiple" }
+    int i = 0;
+    char c = 0;
+  };
+}

	Marek
[PATCH] Improve out-of-SSA coalescing
This loosens the restriction of only coalescing SSA names with the same
base variable by ignoring that restriction for DECL_IGNORED_P base
variables (ok, all of them can and should be anonymous SSA names now, but
code obviously hasn't caught up 100%).

This improves the code generated for the loop in the testcase to

	.p2align 4,,10
	.p2align 3
.L4:
	xorps	%xmm1, %xmm1
	cvtsi2ss	%eax, %xmm1
	addl	$1, %eax
	cmpl	%edi, %eax
	addss	%xmm1, %xmm0
	jne	.L4

from

	jmp	.L4
	.p2align 4,,10
	.p2align 3
.L6:
	movaps	%xmm0, %xmm1
.L4:
	xorps	%xmm0, %xmm0
	cvtsi2ss	%eax, %xmm0
	addl	$1, %eax
	cmpl	%edi, %eax
	addss	%xmm1, %xmm0
	jne	.L6

avoiding the copy on the backedge and the loop entry jump.  Overall this
is similar to what Jeff was after with his latest adjustment of this code.

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

Richard.

2013-09-25  Richard Biener

	* tree-ssa-live.c (var_map_base_init): Handle SSA names with
	DECL_IGNORED_P base variables like anonymous SSA names.
	(loe_visit_block): Use gcc_checking_assert.
	* tree-ssa-coalesce.c (create_outofssa_var_map): Use
	gimple_assign_ssa_name_copy_p.
	(gimple_can_coalesce_p): Adjust according to the var_map_base_init
	change.

	* gcc.dg/tree-ssa/coalesce-2.c: New testcase.

Index: gcc/tree-ssa-live.c
===================================================================
*** gcc/tree-ssa-live.c	(revision 202883)
--- gcc/tree-ssa-live.c	(working copy)
*************** var_map_base_init (var_map map)
*** 104,110 ****
        struct tree_int_map **slot;
        unsigned baseindex;
        var = partition_to_var (map, x);
!       if (SSA_NAME_VAR (var))
  	m->base.from = SSA_NAME_VAR (var);
        else
  	/* This restricts what anonymous SSA names we can coalesce
--- 104,111 ----
        struct tree_int_map **slot;
        unsigned baseindex;
        var = partition_to_var (map, x);
!       if (SSA_NAME_VAR (var)
! 	  && !DECL_IGNORED_P (SSA_NAME_VAR (var)))
  	m->base.from = SSA_NAME_VAR (var);
        else
  	/* This restricts what anonymous SSA names we can coalesce
*************** loe_visit_block (tree_live_info_p live,
*** 990,998 ****
    edge_iterator ei;
    basic_block pred_bb;
    bitmap loe;
-   gcc_assert (!bitmap_bit_p (visited, bb->index));
    bitmap_set_bit (visited, bb->index);
    loe = live_on_entry (live, bb);
  
    FOR_EACH_EDGE (e, ei, bb->preds)
--- 993,1002 ----
    edge_iterator ei;
    basic_block pred_bb;
    bitmap loe;
+   gcc_checking_assert (!bitmap_bit_p (visited, bb->index));
    bitmap_set_bit (visited, bb->index);
+ 
    loe = live_on_entry (live, bb);
  
    FOR_EACH_EDGE (e, ei, bb->preds)
Index: gcc/tree-ssa-coalesce.c
===================================================================
*** gcc/tree-ssa-coalesce.c	(revision 202883)
--- gcc/tree-ssa-coalesce.c	(working copy)
*************** create_outofssa_var_map (coalesce_list_p
*** 980,989 ****
  	    {
  	      tree lhs = gimple_assign_lhs (stmt);
  	      tree rhs1 = gimple_assign_rhs1 (stmt);
! 
! 	      if (gimple_assign_copy_p (stmt)
! 		  && TREE_CODE (lhs) == SSA_NAME
! 		  && TREE_CODE (rhs1) == SSA_NAME
  		  && gimple_can_coalesce_p (lhs, rhs1))
  		{
  		  v1 = SSA_NAME_VERSION (lhs);
--- 982,988 ----
  	    {
  	      tree lhs = gimple_assign_lhs (stmt);
  	      tree rhs1 = gimple_assign_rhs1 (stmt);
! 	      if (gimple_assign_ssa_name_copy_p (stmt)
  		  && gimple_can_coalesce_p (lhs, rhs1))
  		{
  		  v1 = SSA_NAME_VERSION (lhs);
*************** gimple_can_coalesce_p (tree name1, tree
*** 1347,1353 ****
  {
    /* First check the SSA_NAME's associated DECL.  We only want to
       coalesce if they have the same DECL or both have no associated DECL.  */
!   if (SSA_NAME_VAR (name1) != SSA_NAME_VAR (name2))
      return false;
  
    /* Now check the types.  If the types are the same, then we should
--- 1346,1356 ----
  {
    /* First check the SSA_NAME's associated DECL.  We only want to
       coalesce if they have the same DECL or both have no associated DECL.  */
!   tree var1 = SSA_NAME_VAR (name1);
!   tree var2 = SSA_NAME_VAR (name2);
!   var1 = (var1 && !DECL_IGNORED_P (var1)) ? var1 : NULL_TREE;
!   var2 = (var2 && !DECL_IGNORED_P (var2)) ? var2 : NULL_TREE;
!   if (var1 != var2)
      return false;
  
    /* Now check the types.  If the types are the same, then we should
Index: gcc/testsuite/gcc.dg/tree-ssa/coalesce-2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/coalesce-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/coalesce-2.c	(w
Re: [PATCH GCC] Tweak gimple-ssa-strength-reduction.c:backtrace_base_for_ref () to cover different cases as seen on AArch64
Hello, Please find the updated version of the patch in the attachment. It has addressed the previous comments and also included some changes in order to pass the bootstrapping on x86_64. It's also passed the regtest on arm-none-eabi and aarch64-none-elf. It will also fix the test failure as reported here: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01317.html OK for the trunk? Thanks, Yufeng gcc/ * gimple-ssa-strength-reduction.c (safe_to_multiply_p): New function. (backtrace_base_for_ref): Call get_unwidened, check 'base_in' again and set unwidend_p with true; call safe_to_multiply_p to avoid unsafe unwidened cases. gcc/testsuite/ * gcc.dg/tree-ssa/slsr-40.c: New test. On 09/11/13 13:39, Bill Schmidt wrote: On Wed, 2013-09-11 at 10:32 +0200, Richard Biener wrote: On Tue, Sep 10, 2013 at 5:53 PM, Yufeng Zhang wrote: Hi, Following Bin's patch in http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00695.html, this patch tweaks backtrace_base_for_ref () to strip of any widening conversion after the first TREE_CODE check fails. Without this patch, the test (gcc.dg/tree-ssa/slsr-39.c) in Bin's patch will fail on AArch64, as backtrace_base_for_ref () will stop if not seeing an ssa_name since the tree code can be nop_expr instead. Regtested on arm and aarch64; still bootstrapping x86_64. OK for the trunk if the x86_64 bootstrap succeeds? Please add a testcase. Also, the comment "Strip of" should read "Strip off". Otherwise I have no comments. Thanks, Bill Richard.diff --git a/gcc/gimple-ssa-strength-reduction.c b/gcc/gimple-ssa-strength-reduction.c index 8d48add..1c04382 100644 --- a/gcc/gimple-ssa-strength-reduction.c +++ b/gcc/gimple-ssa-strength-reduction.c @@ -750,6 +750,40 @@ slsr_process_phi (gimple phi, bool speed) add_cand_for_stmt (phi, c); } +/* Utility function for backtrace_base_for_ref. 
+ + Given + + T2 = T2' + CST + RES = (wider_type) T2 * C3 + + where TYPE is TREE_TYPE (T2), this function returns false when it is + _not_ safe to carry out the following transformation. + + RES = (wider_type) T2' * C3 + (wider_type) CST * C3 + + One example unsafe case is: + + int array[40]; + array[n - 1] + + where n is a 32-bit unsigned int and pointers are 64 bits long. In this + case, the gimple for (n - 1) is: + + _2 = n_1(D) + 4294967295; // 0xffffffff + + and it is wrong to multiply the large constant by 4 in the 64-bit space. */ + +static bool +safe_to_multiply_p (tree type, double_int cst) +{ + if (TYPE_UNSIGNED (type) + && ! double_int_fits_to_tree_p (signed_type_for (type), cst)) +return false; + + return true; +} + /* Given PBASE which is a pointer to tree, look up the defining statement for it and check whether the candidate is in the form of: @@ -766,10 +800,19 @@ backtrace_base_for_ref (tree *pbase) { tree base_in = *pbase; slsr_cand_t base_cand; + bool unwidened_p = false; STRIP_NOPS (base_in); if (TREE_CODE (base_in) != SSA_NAME) -return tree_to_double_int (integer_zero_node); +{ + /* Strip off widening conversion(s) to handle cases where +e.g. 'B' is widened from an 'int' in order to calculate +a 64-bit address. */ + base_in = get_unwidened (base_in, NULL_TREE); + if (TREE_CODE (base_in) != SSA_NAME) + return tree_to_double_int (integer_zero_node); + unwidened_p = true; +} base_cand = base_cand_from_table (base_in); @@ -777,7 +820,10 @@ backtrace_base_for_ref (tree *pbase) { if (base_cand->kind == CAND_ADD && base_cand->index.is_one () - && TREE_CODE (base_cand->stride) == INTEGER_CST + && TREE_CODE (base_cand->stride) == INTEGER_CST + && (! unwidened_p + || safe_to_multiply_p (TREE_TYPE (base_cand->stride), +tree_to_double_int (base_cand->stride)))) { /* X = B + (1 * S), S is integer constant.
*/ *pbase = base_cand->base_expr; @@ -785,8 +831,11 @@ backtrace_base_for_ref (tree *pbase) } else if (base_cand->kind == CAND_ADD && TREE_CODE (base_cand->stride) == INTEGER_CST - && integer_onep (base_cand->stride)) -{ + && integer_onep (base_cand->stride) + && (! unwidened_p + || safe_to_multiply_p (TREE_TYPE (base_cand->base_expr), + base_cand->index))) + { /* X = B + (i * S), S is integer one. */ *pbase = base_cand->base_expr; return base_cand->index; diff --git a/gcc/testsuite/gcc.dg/tree-ssa/slsr-40.c b/gcc/testsuite/gcc.dg/tree-ssa/slsr-40.c new file mode 100644 index 000..72726a3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/slsr-40.c @@ -0,0 +1,27 @@ +/* Verify straight-line strength reduction for array + subscripting. + + elems[n-1] is reduced to elems + n * 4 + 0x * 4, only w
RE: [PATCH, PR 57748] Check for out of bounds access, Part 2
Hmm., On Tue, 24 Sep 2013 20:06:51, Martin Jambor wrote: > Hi, > > On Tue, Sep 24, 2013 at 12:02:17PM +0200, Richard Biener wrote: >> On Tue, Sep 24, 2013 at 4:52 AM, Bernd Edlinger >> wrote: >>> Hi, >>> >>> with the attached patch the read-side of the out of bounds accesses are >>> fixed. >>> There is a single new test case pr57748-3.c that is derived from Martin's >>> test case. >>> The test case does only test the read access and does not depend on part 1 >>> of the >>> patch. >>> >>> This patch was boot-strapped and regression tested on >>> x86_64-unknown-linux-gnu. >>> >>> Additionally I generated eCos and an eCos-application (on ARMv5 using packed >>> structures) with an arm-eabi cross compiler, and looked for differences in >>> the >>> disassembled code with and without this patch, but there were none. >>> >>> OK for trunk? >> >> Index: gcc/expr.c >> === >> --- gcc/expr.c (revision 202778) >> +++ gcc/expr.c (working copy) >> @@ -9878,7 +9878,7 @@ >> && modifier != EXPAND_STACK_PARM >> ? target : NULL_RTX), >> VOIDmode, >> - modifier == EXPAND_SUM ? EXPAND_NORMAL : modifier); >> + EXPAND_MEMORY); >> >> /* If the bitfield is volatile, we want to access it in the >> field's mode, not the computed mode. >> >> context suggests that we may arrive with EXPAND_STACK_PARM here >> which is a correctness modifier (see its docs). But I'm not too familiar >> with the details of the various expand modifiers, Eric may be though. " EXPAND_STACK_PARM is used when expanding to a TARGET on the stack for a call parameter. Such targets require special care as we haven't yet marked TARGET so that it's safe from being trashed by libcalls. We don't want to use TARGET for anything but the final result; Intermediate values must go elsewhere. Additionally, calls to emit_block_move will be flagged with BLOCK_OP_CALL_PARM." 
This should not be a problem: if modifier == EXPAND_STACK_PARM, the target is NOT passed down, and thus not giving the modifier either should not cause incorrect code. But I might be wrong... On the other hand, in the case where tem is an unaligned MEM_REF and exp is not exactly the same object as tem, EXPAND_STACK_PARM may do exactly the wrong thing.

What I can tell is that this patch does not change a single bit in the complete GCC bootstrap process, stage 2 and stage 3 are binary identical with or without this patch, and all test cases pass, even the Ada test cases...

>> That said, I still believe that fixing the misalign path in expand_assignment
>> would be better than trying to avoid it. For this testcase the issue is
>> again that expand_assignment passes the wrong mode/target to the
>> movmisalign optab.
>
> Perhaps I'm stating the obvious, but this is supposed to address a
> separate bug in the expansion of the RHS (as opposed to the first bug,
> which was in the expansion of the LHS), and so we would have to make
> expand_assignment also examine potential misalignments in the RHS,
> which it so far does not do, despite its complexity.

Thanks for pointing that out. This is true.

> Having said that, I also dislike the patch, but I am quite convinced
> that if we allow non-BLK structures with zero sized arrays, the fix
> will be ugly - but I'll be glad to be shown I've been wrong.
>
> Martin

I could of course look at the type of tem, and only mess with the expand modifier in the case of a MEM_REF. But I am not sure what to do about the VIEW_CONVERT_EXPR: if the code at normal_inner_ref is wrong, the code at VIEW_CONVERT_EXPR can't possibly be any better, although I do not yet have any idea how to write a test case for this:

/* ??? We should work harder and deal with non-zero offsets. */ if (!offset && (bitpos % BITS_PER_UNIT) == 0 && bitsize >= 0 && compare_tree_int (TYPE_SIZE (type), bitsize) == 0) { /* See the normal_inner_ref case for the rationale.
*/ orig_op0 = expand_expr (tem, (TREE_CODE (TREE_TYPE (tem)) == UNION_TYPE && (TREE_CODE (TYPE_SIZE (TREE_TYPE (tem))) != INTEGER_CST) && modifier != EXPAND_STACK_PARM ? target : NULL_RTX), VOIDmode, modifier == EXPAND_SUM ? EXPAND_NORMAL : modifier);

Especially for EXPAND_STACK_PARM? If this really does allocate something in a register, then the result will be thrown away, because it is not MEM_P? Right?

Note the RHS bug can also affect ordinary unions on ARM and all STRICT_ALIGNMENT targets. Please check the attached test case. It may not be obvious at first sight, but the assembler code for check is definitely wrong, because it reads the whole structure and uses only one byte of it. And that is not OK, because x was volatile... check:
Re: [PATCH][ubsan] Add VLA bound instrumentation
On Thu, Sep 12, 2013 at 02:26:55PM +0200, Marek Polacek wrote:
> This patch adds the instrumentation of VLA bounds. Basically, it just checks that the size of a VLA is positive. That is, we also issue an error if the size of the VLA is 0. It catches e.g.
>
> int i = 1;
> int a[i][i - 2];
>
> It is pretty straightforward, but I had issues in the C++ FE, mainly choosing the right spot where to instrument... Hopefully I picked the right one. Also note that in C++1y we throw an exception when the size of a VLA is negative; hence no need to perform the instrumentation if -std=c++1y is in effect.
>
> Regtested/ran bootstrap-ubsan on x86_64-linux, also make check -C gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} ubsan.exp' passes.
>
> Ok for trunk?

I'd like to ping this patch; below is a rebased version with the ubsan.c hunk omitted, since that part was already fixed by another patch. (It still doesn't contain alloca/SIZE_MAX/... checking, since that very much relies on libubsan. Still, it'd be felicitous to get at least the basic VLA checking in.) Ran ubsan testsuite + bootstrap-ubsan on x86_64-linux.

2013-09-25 Marek Polacek

* opts.c (common_handle_option): Handle vla-bound.
* sanitizer.def (BUILT_IN_UBSAN_HANDLE_VLA_BOUND_NOT_POSITIVE): Define.
* flag-types.h (enum sanitize_code): Add SANITIZE_VLA.
* asan.c (initialize_sanitizer_builtins): Build BT_FN_VOID_PTR_PTR.

c-family/
* c-ubsan.c: Don't include hash-table.h.
(ubsan_instrument_vla): New function.
* c-ubsan.h: Declare it.

cp/
* decl.c (create_array_type_for_decl): Add VLA instrumentation.

c/
* c-decl.c (grokdeclarator): Add VLA instrumentation.

testsuite/
* g++.dg/ubsan/cxx1y-vla.C: New test.
* c-c++-common/ubsan/vla-3.c: New test.
* c-c++-common/ubsan/vla-2.c: New test.
* c-c++-common/ubsan/vla-4.c: New test.
* c-c++-common/ubsan/vla-1.c: New test.
--- gcc/opts.c.mp 2013-09-25 14:06:58.531276511 +0200 +++ gcc/opts.c 2013-09-25 14:07:03.580294566 +0200 @@ -1428,6 +1428,7 @@ common_handle_option (struct gcc_options { "undefined", SANITIZE_UNDEFINED, sizeof "undefined" - 1 }, { "unreachable", SANITIZE_UNREACHABLE, sizeof "unreachable" - 1 }, + { "vla-bound", SANITIZE_VLA, sizeof "vla-bound" - 1 }, { NULL, 0, 0 } }; const char *comma; --- gcc/c-family/c-ubsan.c.mp 2013-09-25 14:06:58.535276527 +0200 +++ gcc/c-family/c-ubsan.c 2013-09-25 14:07:03.580294566 +0200 @@ -25,7 +25,6 @@ along with GCC; see the file COPYING3. #include "alloc-pool.h" #include "cgraph.h" #include "gimple.h" -#include "hash-table.h" #include "output.h" #include "toplev.h" #include "ubsan.h" @@ -86,8 +85,7 @@ ubsan_instrument_division (location_t lo return t; } -/* Instrument left and right shifts. If not instrumenting, return - NULL_TREE. */ +/* Instrument left and right shifts. */ tree ubsan_instrument_shift (location_t loc, enum tree_code code, @@ -157,4 +155,23 @@ ubsan_instrument_shift (location_t loc, t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node); return t; +} + +/* Instrument variable length array bound. */ + +tree +ubsan_instrument_vla (location_t loc, tree size) +{ + tree type = TREE_TYPE (size); + tree t, tt; + + t = fold_build2 (LE_EXPR, boolean_type_node, size, build_int_cst (type, 0)); + tree data = ubsan_create_data ("__ubsan_vla_data", +loc, ubsan_type_descriptor (type), NULL_TREE); + data = build_fold_addr_expr_loc (loc, data); + tt = builtin_decl_explicit (BUILT_IN_UBSAN_HANDLE_VLA_BOUND_NOT_POSITIVE); + tt = build_call_expr_loc (loc, tt, 2, data, ubsan_encode_value (size)); + t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node); + + return t; } --- gcc/c-family/c-ubsan.h.mp 2013-09-25 14:06:58.538276539 +0200 +++ gcc/c-family/c-ubsan.h 2013-09-25 14:07:03.595294628 +0200 @@ -23,5 +23,6 @@ along with GCC; see the file COPYING3. 
extern tree ubsan_instrument_division (location_t, tree, tree); extern tree ubsan_instrument_shift (location_t, enum tree_code, tree, tree); +extern tree ubsan_instrument_vla (location_t, tree); #endif /* GCC_C_UBSAN_H */ --- gcc/sanitizer.def.mp2013-09-25 14:06:58.542276558 +0200 +++ gcc/sanitizer.def 2013-09-25 14:07:03.628294753 +0200 @@ -297,3 +297,7 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HAN "__ubsan_handle_builtin_unreachable", BT_FN_VOID_PTR, ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST) +DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_VLA_BOUND_NOT_POSITIVE, + "__ubsan_handle_vla_bound_not_positive", + BT_FN_VOID_PTR_PTR, + ATTR_COLD_NOTHROW_LEAF_LIST) --- gcc/flag-types.h.mp 2013-09-25 14:06:58.546276575 +0200 +++ gcc/flag-types.h
Re: [PATCH] Improve out-of-SSA coalescing
On Wed, 25 Sep 2013, Richard Biener wrote:
> This loosens the restriction of only coalescing SSA names with the same base variable by ignoring that restriction for DECL_IGNORED_P base variables (ok, all of them can and should be anonymous SSA names now, but code obviously hasn't caught up 100%).
>
> This improves the code generated for the loop in the testcase to
>
> .p2align 4,,10
> .p2align 3
> .L4:
> xorps %xmm1, %xmm1
> cvtsi2ss %eax, %xmm1
> addl $1, %eax
> cmpl %edi, %eax
> addss %xmm1, %xmm0
> jne .L4
>
> from
>
> jmp .L4
> .p2align 4,,10
> .p2align 3
> .L6:
> movaps %xmm0, %xmm1
> .L4:
> xorps %xmm0, %xmm0
> cvtsi2ss %eax, %xmm0
> addl $1, %eax
> cmpl %edi, %eax
> addss %xmm1, %xmm0
> jne .L6
>
> avoiding the copy on the backedge and the loop entry jump. Overall this is similar to what Jeff was after with his latest adjustment of this code.
>
> Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

For some reason it miscompiles GCC itself. Hmm. Cannot spot the obvious error yet.

Richard.
Re: [PATCH, ARM, LRA] Prepare ARM build with LRA
Hmm, I don't see clearly where we lose the XEXP (x, n) information when calling must_be_base_p (*inner) and/or must_be_index_p (*inner) in set_address_base and set_address_index.

BTW, the validation on ARM (AArch32 and AArch64) is clean.

Thanks, Yvan

On 24 September 2013 18:36, Richard Sandiford wrote:
> Eric Botcazou writes:
>>> Sorry, I'm not sure I understand well, it means that you prefer the shift_and_rotate_code_p notation, right?
>>
>> Let's do the following in addition to the lsb_bitfield_op_p thing:
>> 1. Replace the LO_SUM test in set_address_base by a call to must_be_base_p,
>> 2. Replace the MULT || ASHIFT test in set_address_index by a call to must_be_index_p,
>> 3. Add the new cases to must_be_index_p directly, with a comment saying that they are e.g. for the ARM.
>
> FWIW, I'd prefer to keep it as-is, since must_be_base_p (x) and must_be_index_p (x) don't imply that we should look specifically at XEXP (x, 0) (rather than just X, or XEXP (x, 1), etc.). I think it's better to keep the code tests and the associated XEXPs together.
>
> Thanks,
> Richard
Re: [patch] Separate immediate uses and phi routines from tree-flow*.h
On 09/25/2013 04:49 AM, Richard Biener wrote:
On Tue, Sep 24, 2013 at 4:39 PM, Andrew MacLeod wrote:

3 - a few routines seem like basic gimple routines, but really turn out to require the operand infrastructure to implement... so they are moved to tree-ssa-operands.[ch] as well. This sort of thing showed up when removing tree-ssa-operands.h from the gimple.h include file. These were things like gimple_vuse_op, gimple_vdef_op, update_stmt, and update_stmt_if_modified.

Note that things like gimple_vuse_op are on the interface border between gimple (where the SSA operands are stored) and SSA operands. So it's not so clear for them, given they access internal gimple fields directly but use the regular SSA operand API. I'd prefer gimple_vuse_op and gimple_vdef_op to stay in gimple.[ch]. If you want to remove the tree-ssa-operands.h include from gimple.h (which in some way makes sense), then we need a new gimple-ssa.[ch] for routines that operate on the "gimple in SSA" IL (not purely on gimple, which could be not in SSA form, and not purely on the SSA web, which could in theory be constructed over a non-GIMPLE representation).

Actually I like the idea of gimple-ssa.[ch]. tree-ssa-operands.[ch] would then be solely the place for the operand infrastructure internals implementation, not a place for generic SSA utility routines related to SSA operands. Moving update_stmt/update_stmt_if_modified is a similar case but maybe less obvious (unless we have gimple-ssa.[ch]). What do you think?

Yeah, I was mulling something similar. I wasn't super thrilled that those were going into tree-ssa-operands.h... but they didn't belong in gimple.h either. I think adding gimple-ssa.[ch] is the way to go.

(swap_tree_operands): Move prototype to tree-ssa-operands.h.

rename to swap_ssa_operands and remove the !ssa_operands_active path? (should be no longer needed I think)

Yeah, I think you are right. It's not being used by tree-only routines, and it's trivial in that case anyway.
(gimple_phi_arg_def, gimple_phi_arg_def_ptr, gimple_phi_arg_edge, gimple_phi_arg_location, gimple_phi_arg_location_from_edge, gimple_phi_arg_set_location, gimple_phi_arg_has_location, phi_nodes, phi_nodes_ptr, set_phi_nodes, phi_ssa_name_p): Move to tree-phinodes.h.

I always found the separation of the PHI node API from the gimple API odd ... now, PHI nodes are an SSA speciality. Unless the gimple_phi class moves to tree-phinodes.h, the same comments as for gimple_vuse_op apply. But maybe everything will be more obvious once we separate the core gimple data structure from the gimple API stuff ...

Hrm. Yeah, this one isn't as clear as it might seem at first. gimple.h does have a few accessors for the base statement kind... and these are accessors for the argument, which is from core-types.h. I imagine that tree-phinodes.h should do all the phi-node related stuff short of the actual stmt itself... but it's not immediately obvious. I'd like to leave these in tree-phinodes.h and I'll make a stab at how to manage that next. It would actually be nice to get struct phi_arg_d *out* of core-types.h... but maybe that's too grand. I can move them again later if it turns out they should be in the border file or elsewhere.

Overall the patch looks ok; either massage it according to my comments above (thinking about gimple-ssa.[ch]) or follow up with a separate patch (not that moving things twice will hurt).

I'll do the massage. I like the gimple-ssa border files. I suspect there will be more shuffling of some of these routines down the road... right now it's a pretty high-level separation that's happening. Once things are better clustered, I expect more appropriate fine-grained breakdowns will become apparent.

Andrew
RE: [PATCH] Portable Volatility Warning
Hi Sandra, thanks a lot, your comments are very welcome, especially as I am not a native English speaker... On Tue, 24 Sep 2013 15:46:22, Sandra Loosemore wrote: > > I have some nit-picky documentation suggestions about this patch > > http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00100.html > >> + warning_at (input_location, OPT_Wportable_volatility, >> + "the code to accesses this volatile member is dependent on" >> + " whether -fstrict-volatile-bitfields is used;" >> + " accessing device registers may depend on this option" >> + " however it contradicts the C11/C++11 memory model"); > > I'd simplify this to something like > > "access to volatile member may not conform to the C11/C++11" > " memory model if -fstrict-volatile-bitfields is in effect" > > >> --- gcc/c-family/c.opt 2013-05-28 21:55:10.0 +0200 >> +++ gcc/c-family/c.opt 2013-07-13 11:02:38.0 +0200 >> @@ -629,6 +629,10 @@ Wpointer-to-int-cast >> C ObjC Var(warn_pointer_to_int_cast) Init(1) Warning >> Warn when a pointer is cast to an integer of a different size >> >> +Wportable-volatility >> +C ObjC C++ ObjC++ Var(warn_portable_volatility) Warning >> +Warn about code for which separate incompatible definitions exist even >> within gcc >> + > > This seems too vague. How about just > > Warn about potentially nonportable volatile memory accesses > >> +@item -Wportable-volatility >> +@opindex Wportable-volatility >> +@opindex Wno-portable-volatility >> +Warn about code which accesses volatile structure members for > > s/which/that/ > >> +which different ABI specifications may exist whether in some >> +language standard or in a target-specific ABI document. >> + >> +For instance on the ARM architecture AAPCS specifies how to >> +access volatile bit-fields. But for C/C++ there exists a >> +language standard, the C11/C++11 memory model. GCC tries to >> +follow the latter by default unless -fstrict-volatile-bitfields >> +is used.
>> + >> +As an example this option will warn about code which accesses >> +packed volatile structure members which may be dependent on >> +whether -fstrict-volatile-bitfields is used or not. > > I'd replace the last two paragraphs with just: > > In particular, this option causes GCC to warn about volatile memory > accesses for which use of the @option{-fstrict-volatile-bitfields} > option causes code to be generated that does not conform to the > C11/C++11 memory model. > > I'd also like to see a cross-reference added to the documentation for > -fstrict-volatile-bitfields, like: > > You can use @option{-Wportable-volatility} to make GCC issue warnings > about volatile structure accesses affected by > @option{-fstrict-volatile-bitfields}. > > -Sandra Richard: I do not know, is this a political issue, that is blocking the whole of Sandra's patch? Actually we (softing.com) do not really care what happens to the default setting of -fstrict-volatile-bitfields. Maybe you could look at reviewing Sandra's part 1,2,3 independently of the rest? Once that is approved, I could start again my work on the warnings part. Meanwhile I've got an idea how to solve your first objection: On Tue, 3 Sep 2013 10:53:10, Richard Biener wrote: > Just a quick first note: > > @@ -3470,6 +3470,13 @@ optimize_bit_field_compare (location_t l >|| offset != 0 || TREE_CODE (linner) == PLACEHOLDER_EXPR) > return 0; > > + /* Do not try to optimize on volatiles, as it would defeat the > + -Wportable-volatility warning later: > + If the COMPONENT_REF is replaced by a BIT_FIELD_REF we loose > + information and thus the warning cannot be generated correctly. */ > + if (warn_portable_volatility && lvolatilep) > +return 0; > > changing code-generation with a warning option looks wrong to me. Note > that I completely removed this code once (it also generates strange > BIT_FIELD_REFs) but re-instantiated it on request because of lost > optimizations.
The idea is, I'd emit the warning at the time when the COMPONENT_REF is replaced by a BIT_FIELD_REF. Then the warning code would not have access to !MEM_P (op0) || !MEM_VOLATILE_P (op0) but this should not be a big issue. Thanks Bernd.
Re: [gomp4] Dumping gimple for offload.
On 24 Sep 11:02, Richard Biener wrote: > On Mon, Sep 23, 2013 at 3:29 PM, Ilya Tocar wrote: > > Hi, > > > > I've rebased my patch. > > Is it ok for gomp4 > > Passing through "is_omp" looks bad - please find a more suitable place > to attach this meta-information. From a quick look you only need it to > produce an alternate section name, Mostly for name, but there are other uses (e. g. choosing decl states vector). > thus consider assigning the section > name in a different place. > > Richard. What do you mean by different place? I can add global dumping_omp_target variable to choose correct name, depending on it's value (patch below). Is it better? --- gcc/lto-cgraph.c | 10 ++- gcc/lto-section-out.c | 6 +- gcc/lto-streamer-out.c | 182 + gcc/lto-streamer.c | 4 +- gcc/lto-streamer.h | 6 ++ gcc/passes.c | 2 +- gcc/passes.def | 2 + gcc/timevar.def| 2 + gcc/tree-pass.h| 2 + 9 files changed, 198 insertions(+), 18 deletions(-) diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c index 952588d..5187190 100644 --- a/gcc/lto-cgraph.c +++ b/gcc/lto-cgraph.c @@ -907,9 +907,15 @@ output_symtab (void) { symtab_node node = lto_symtab_encoder_deref (encoder, i); if (cgraph_node *cnode = dyn_cast (node)) -lto_output_node (ob, cnode, encoder); + { + if (!dumping_omp_target || lookup_attribute ("omp declare target", + DECL_ATTRIBUTES (node->symbol.decl))) + lto_output_node (ob, cnode, encoder); + } else -lto_output_varpool_node (ob, varpool (node), encoder); + if (!dumping_omp_target || lookup_attribute ("omp declare target", + DECL_ATTRIBUTES (node->symbol.decl))) + lto_output_varpool_node (ob, varpool (node), encoder); } diff --git a/gcc/lto-section-out.c b/gcc/lto-section-out.c index 59eed71..ad247c1 100644 --- a/gcc/lto-section-out.c +++ b/gcc/lto-section-out.c @@ -49,6 +49,7 @@ static vec decl_state_stack; vec lto_function_decl_states; +vec omp_function_decl_states; /* Output routines shared by all of the serialization passes. 
@@ -443,5 +444,8 @@ lto_record_function_out_decl_state (tree fn_decl, state->streams[i].tree_hash_table = NULL; } state->fn_decl = fn_decl; - lto_function_decl_states.safe_push (state); + if (dumping_omp_target) +omp_function_decl_states.safe_push (state); + else +lto_function_decl_states.safe_push (state); } diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c index 8f823f2..66a0747 100644 --- a/gcc/lto-streamer-out.c +++ b/gcc/lto-streamer-out.c @@ -48,6 +48,9 @@ along with GCC; see the file COPYING3. If not see #include "cfgloop.h" +/* Off by default. */ +bool dumping_omp_target = false; + /* Clear the line info stored in DATA_IN. */ static void @@ -2014,6 +2017,93 @@ lto_output (void) #endif } +bool +gate_omp_out (void) +{ + return flag_openmp; +} + +static void +omp_output (void) +{ + /* We need omp names for sections, turn this on. */ + dumping_omp_target = true; + + lto_streamer_hooks_init(); + struct lto_out_decl_state *decl_state; + int i, n_nodes; + lto_symtab_encoder_t encoder = lto_get_out_decl_state ()->symtab_node_encoder; + + /* Initialize the streamer. */ + lto_streamer_init (); + + n_nodes = lto_symtab_encoder_size (encoder); + /* Process only the functions with bodies. */ + for (i = 0; i < n_nodes; i++) +{ + symtab_node snode = lto_symtab_encoder_deref (encoder, i); + cgraph_node *node = dyn_cast (snode); + if (node + && lto_symtab_encoder_encode_body_p (encoder, node) + && !node->symbol.alias + && gimple_has_body_p (node->symbol.decl) + && lookup_attribute ("omp declare target", + DECL_ATTRIBUTES (node->symbol.decl))) + { + decl_state = lto_new_out_decl_state (); + lto_push_out_decl_state (decl_state); + output_function (node); + lto_pop_out_decl_state (); + lto_record_function_out_decl_state (node->symbol.decl, decl_state); + } +} + + /* Emit the callgraph after emitting function bodies. 
This needs to + be done now to make sure that all the statements in every function + have been renumbered so that edges can be associated with call + statements using the statement UIDs. */ + output_symtab (); +} + +namespace { + +const pass_data pass_data_ipa_omp_gimple_out = +{ + IPA_PASS, /* type */ + "omp_gimple_out", /* name */ + OPTGROUP_NONE, /* optinfo_flags */ + true, /* has_gate */ + false, /* has_execute */ + TV_IPA_OMP_GIMPLE_OUT, /* tv_id */ + 0, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_sta
Re: [gomp4] Dumping gimple for offload.
On Wed, Sep 25, 2013 at 3:29 PM, Ilya Tocar wrote: > On 24 Sep 11:02, Richard Biener wrote: >> On Mon, Sep 23, 2013 at 3:29 PM, Ilya Tocar wrote: >> > Hi, >> > >> > I've rebased my patch. >> > Is it ok for gomp4 >> >> Passing through "is_omp" looks bad - please find a more suitable place >> to attach this meta-information. From a quick look you only need it to >> produce an alternate section name, > > Mostly for name, but there are other uses > (e. g. choosing decl states vector). > >> thus consider assigning the section >> name in a different place. >> >> Richard. > > What do you mean by different place? > I can add global dumping_omp_target variable to choose correct name, > depending on it's value (patch below). Is it better? More like passing down a different abstraction, like for > @@ -907,9 +907,15 @@ output_symtab (void) > { >symtab_node node = lto_symtab_encoder_deref (encoder, i); >if (cgraph_node *cnode = dyn_cast (node)) > -lto_output_node (ob, cnode, encoder); > + { > + if (!dumping_omp_target || lookup_attribute ("omp declare target", > + DECL_ATTRIBUTES > (node->symbol.decl))) > + lto_output_node (ob, cnode, encoder); > + } >else > -lto_output_varpool_node (ob, varpool (node), encoder); > + if (!dumping_omp_target || lookup_attribute ("omp declare target", > + DECL_ATTRIBUTES > (node->symbol.decl))) > + lto_output_varpool_node (ob, varpool (node), encoder); > > } have the symtab encoder already not contain the varpool nodes you don't need. And instead of looking up attributes, mark the symtab node with a flag. Maybe Honza has some ideas on how to design this into the machinery rather than trying to fit in from the outside as well. Richard.
Re: [PATCH, PR 57748] Check for out of bounds access, Part 2
Hi, On Wed, Sep 25, 2013 at 11:05:21AM +0200, Richard Biener wrote: > On Tue, Sep 24, 2013 at 8:06 PM, Martin Jambor wrote: > > Hi, > > > > On Tue, Sep 24, 2013 at 12:02:17PM +0200, Richard Biener wrote: > >> On Tue, Sep 24, 2013 at 4:52 AM, Bernd Edlinger > >> wrote: > >> > Hi, > >> > > >> > with the attached patch the read-side of the out of bounds accesses are > >> > fixed. > >> > There is a single new test case pr57748-3.c that is derived from > >> > Martin's test case. > >> > The test case does only test the read access and does not depend on part > >> > 1 of the > >> > patch. > >> > > >> > This patch was boot-strapped and regression tested on > >> > x86_64-unknown-linux-gnu. > >> > > >> > Additionally I generated eCos and an eCos-application (on ARMv5 using > >> > packed > >> > structures) with an arm-eabi cross compiler, and looked for differences > >> > in the > >> > disassembled code with and without this patch, but there were none. > >> > > >> > OK for trunk? > >> > >> Index: gcc/expr.c > >> === > >> --- gcc/expr.c (revision 202778) > >> +++ gcc/expr.c (working copy) > >> @@ -9878,7 +9878,7 @@ > >> && modifier != EXPAND_STACK_PARM > >> ? target : NULL_RTX), > >> VOIDmode, > >> -modifier == EXPAND_SUM ? EXPAND_NORMAL : > >> modifier); > >> +EXPAND_MEMORY); > >> > >> /* If the bitfield is volatile, we want to access it in the > >>field's mode, not the computed mode. > >> > >> context suggests that we may arrive with EXPAND_STACK_PARM here > >> which is a correctness modifier (see its docs). But I'm not too familiar > >> with the details of the various expand modifiers, Eric may be though. > >> > >> That said, I still believe that fixing the misalign path in > >> expand_assignment > >> would be better than trying to avoid it. For this testcase the issue is > >> again that expand_assignment passes the wrong mode/target to the > >> movmisalign optab. 
> > Perhaps I'm stating the obvious, but this is supposed to address a
> > separate bug in the expansion of the RHS (as opposed to the first bug
> > which was in the expansion of the LHS), and so we would have to make
> > expand_assignment also examine potential misalignments in the RHS,
> > which it so far does not do, despite its complexity.
> >
> > Having said that, I also dislike the patch, but I am quite convinced
> > that if we allow non-BLK structures with zero sized arrays, the fix
> > will be ugly - but I'll be glad to be shown I've been wrong.
>
> I don't think the issues have anything to do with zero sized arrays
> or non-BLK structures. The issue is just that we are feeding movmisalign
> with the wrong mode (the mode of the base object) if we are expanding
> a base object which is unaligned and has non-BLK mode.

I disagree. Consider a slightly modified testcase:

#include <stdlib.h>
typedef long long V __attribute__ ((vector_size (2 * sizeof (long long)), may_alias));
#if 1
typedef struct S { V a; V b[0]; } P __attribute__((aligned (1)));
struct __attribute__((packed)) T { char c; P s; };
#else
typedef struct S { V a; V b[0]; } P;
struct T { char c; P s; };
#endif
void __attribute__((noinline, noclone)) good_value (V b) { if (b[0] != 3 || b[1] != 4) __builtin_abort (); }
void __attribute__((noinline, noclone)) check (P *p) { good_value (p->b[0]); }
int main () { struct T *t = (struct T *) calloc (128, 1); V a = { 3, 4 }; t->s.b[0] = a; check (&t->s); free (t); return 0; }

The problem is the expansion of the memory load in function check. All involved modes, the one of the base structure and of the loaded component, are V2DI, so even if we somehow mixed them up, in this example it should not matter, yet the unaligned load gives wrong results. I'm still convinced that the problem is that we have a V2DImode structure which is larger than the mode tells and we want to load the data from outside of its mode. That is only happening because of zero-sized arrays.
> > So what we maybe need is another expand modifier that tells us > to not use movmisalign when expanding the base object. In that case we can just as well use EXPAND_MEMORY. If so, I'd do that only when there is a zero-sized array involved, in order not to lose movmisalign where we can use it. > Or we need to stop expanding the base object with VOIDmode and > instead pass down the mode of the access (TYPE_MODE (TREE_TYPE > (exp)) which will probably confuse other code. Well, because I think the problem is not (just) mixing up modes, I don't think this will help either. > So eventually not > handling the misaligned case by expansion of the base but inlined > similar to how it was in expand_assignment would be needed here. At least now I fail to see how this would be different from copying a large portion of expand_expr_real_1 (all of the MEM_REF
Re: [PATCH] Portable Volatility Warning
On 09/25/2013 07:23 AM, Bernd Edlinger wrote: Richard: I do not know; is this a political issue that is blocking the whole of Sandra's patch? Actually we (softing.com) do not really care what happens to the default setting of -fstrict-volatile-bitfields. Maybe you could look at reviewing Sandra's parts 1, 2, 3 independently of the rest? I can't speak for all of Mentor Graphics, but I personally do not really care what the default setting of -fstrict-volatile-bitfields is, either. Looking at it from our customers' point of view, though, this option currently causes code to be generated that is just broken and wrong and not conforming to either the AAPCS or the C11/C++11 memory model. And because it's enabled by default on ARM, that means GCC generates broken code by default on ARM. I think users would rather live with having to pass -fstrict-volatile-bitfields explicitly than to have a default that is not useful in any way. BTW, it was pointed out to me offline that there is precedent for GCC choosing to ignore details of a target-specific ABI in favor of uniform behavior across targets -- see the rant against unsigned bit-fields here: http://gcc.gnu.org/onlinedocs/gcc/Non_002dbugs.html#Non_002dbugs Anyway, I am hoping that we can reach some closure on this issue before it is too late to get the fix into GCC 4.9. It's very frustrating to me that I have been working on it since May or June, tried hard to incorporate all the feedback I received on the initial patches, spent a lot of time testing, etc., and the whole process just seems stuck. :-( -Sandra
Re: [PATCH, LRA] Remove REG_DEAD and REG_UNUSED notes.
On 09/24/2013 10:40 AM, Yvan Roux wrote: > Hi, > > This patch removes REG_DEAD and REG_UNUSED notes in update_inc_notes, > as that is what the function is supposed to do (see the comments) and as > keeping these notes produces some failures, at least on ARM. > > Thanks, > Yvan > > 2013-09-24 Yvan Roux > > * lra.c (update_inc_notes): Remove all REG_DEAD and REG_UNUSED notes. Ok. Thanks, Yvan. Another possibility would be to run the LRA live pass one more time at the end of LRA to update the notes correctly, but that would waste CPU time, as the same work will be done by the DF framework at the end of the whole RA, and the LRA live pass does much more than update notes.
Re: [PATCH, LRA] Remove REG_DEAD and REG_UNUSED notes.
On 09/24/2013 03:40 PM, Mike Stump wrote: > On Sep 24, 2013, at 12:23 PM, Steven Bosscher wrote: >> On Tue, Sep 24, 2013 at 5:03 PM, Eric Botcazou wrote: This patch removes REG_DEAD and REG_UNUSED notes >>> DF framework is supposed to do it for you. >> Unfortunately LRA uses its own DF framework. I'd say fortunately. Otherwise, LRA would decrease whole-compiler speed by up to 10%. I can say this because the first version of LRA was based on the DF framework. I guess that was one reason why reload was never moved to DF either. LRA/reload are the most live-analysis-intensive passes (too many changes in several sub-passes). LRA also uses REG notes internally and recreates them several times as a by-product of the pseudo live-range analysis used for (reload) pseudo assignments. I'd be glad if we could move to DF without compilation-time degradation. I already wrote that the DF design (although it has a nice API, which is good for prototyping) is bad from a resource-usage point of view. Keeping a lot of duplicated info alongside the RTL increases the memory footprint very much, worsens data locality, and slows down the compiler. If I remember correctly, the GCC steering committee permitted 5% compile-time degradation as a criterion to include it into the trunk, and DF achieved this with difficulty. > Is that a bug, or a feature?
Ping^6/update: contribute Synopsys Designware ARC port (6/7): documentation
The extend.texi context has changed due to the addition of the MSP430 port. OK to apply? 2013-09-24 Joern Rennecke Jeremy Bennett * doc/install.texi (--with-cpu): Mention ARC. (arc-*-elf32): New paragraph. (arc-linux-uclibc): Likewise. * doc/md.texi (Machine Constraints): Add ARC part. * doc/invoke.texi: (menu): Add ARC Options. (Machine Dependent Options) : Add synopsis. (node ARC Options): Add. * doc/extend.texi (long_call / short_call attribute): Add ARC. (ARC Built-in Functions): New section defining generic ARC built-in functions. (ARC SIMD Built-in Functions): New section defining SIMD specific built-in functions. (Declaring Attributes of Functions): Extended description of short_call and long_call attributes for ARC and added index entries. diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index edf0e28..ef10f4c 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -2813,8 +2813,9 @@ least version 2.20.1), and GNU C library (at least version 2.11.1). @item interrupt @cindex interrupt handler functions -Use this attribute on the ARM, AVR, CR16, Epiphany, M32C, M32R/D, m68k, MeP, MIPS, -MSP430, RL78, RX and Xstormy16 ports to indicate that the specified function is an +Use this attribute on the ARC, ARM, AVR, CR16, Epiphany, M32C, M32R/D, +m68k, MeP, MIPS, MSP430, RL78, RX and Xstormy16 ports to indicate that +the specified function is an interrupt handler. The compiler generates function entry and exit sequences suitable for use in an interrupt handler when this attribute is present. With Epiphany targets it may also generate a special section with @@ -2823,6 +2824,16 @@ code to initialize the interrupt vector table. Note, interrupt handlers for the Blackfin, H8/300, H8/300H, H8S, MicroBlaze, and SH processors can be specified via the @code{interrupt_handler} attribute. 
+Note, on the ARC, you must specify the kind of interrupt to be handled +in a parameter to the interrupt attribute like this: + +@smallexample +void f () __attribute__ ((interrupt ("ilink1"))); +@end smallexample + +Permissible values for this parameter are: @w{@code{ilink1}} and +@w{@code{ilink2}}. + Note, on the AVR, the hardware globally disables interrupts when an interrupt is executed. The first instruction of an interrupt handler declared with this attribute is a @code{SEI} instruction to @@ -3021,18 +3032,33 @@ unit. This is to allow easy merging of multiple compilation units into one, for example, by using the link-time optimization. For this reason the attribute is not allowed on types to annotate indirect calls. -@item long_call/short_call +@item long_call/medium_call/short_call +@cindex indirect calls on ARC @cindex indirect calls on ARM -This attribute specifies how a particular function is called on -ARM and Epiphany. Both attributes override the -@option{-mlong-calls} (@pxref{ARM Options}) -command-line switch and @code{#pragma long_calls} settings. The +@cindex indirect calls on Epiphany +These attributes specify how a particular function is called on +ARC, ARM and Epiphany - with @code{medium_call} being specific to ARC. +These attributes override the +@option{-mlong-calls} (@pxref{ARM Options} and @ref{ARC Options}) +and @option{-mmedium-calls} (@pxref{ARC Options}) +command-line switches and @code{#pragma long_calls} settings. For ARM, the @code{long_call} attribute indicates that the function might be far away from the call site and require a different (more expensive) calling sequence. The @code{short_call} attribute always places the offset to the function from the call site into the @samp{BL} instruction directly. +For ARC, a function marked with the @code{long_call} attribute is +always called using register-indirect jump-and-link instructions, +thereby enabling the called function to be placed anywhere within the +32-bit address space. 
A function marked with the @code{medium_call} +attribute will always be close enough to be called with an unconditional +branch-and-link instruction, which has a 25-bit offset from +the call site. A function marked with the @code{short_call} +attribute will always be close enough to be called with a conditional +branch-and-link instruction, which has a 21-bit offset from +the call site. + @item longcall/shortcall @cindex functions called via pointer on the RS/6000 and PowerPC On the Blackfin, RS/6000 and PowerPC, the @code{longcall} attribute @@ -8870,6 +8896,8 @@ instructions, but allow the compiler to schedule those calls. @menu * Alpha Built-in Functions:: +* ARC Built-in Functions:: +* ARC SIMD Built-in Functions:: * ARM iWMMXt Built-in Functions:: * ARM NEON Intrinsics:: * AVR Built-in Functions:: @@ -8977,6 +9005,430 @@ void *__builtin_thread_pointer (void) void __builtin_set_thread_pointer (void *) @end smallexample +@node ARC Built-in Functions +@subsection ARC Bu
[PATCH] Trivial cleanup
I was looking at the vec class to figure out the best way to do some things and realized we have a "last" member function. Using foo.last() is clearer than foo[foo.length() - 1] On a related note, our current standards require a space between a function name and the open paren for its argument list. However, it leads to fugly code in some cases. Assume foo is a vec instance and we want to look at something in the last vector element. Our current standards would imply something like this: foo.last ()->bar Which is how the current patch formats that code. However, I typically see this more often in C++ codebases as foo.last()->bar But then, what if we just want the last element without peeking deeper inside it? foo.last ()? or foo.last()? Do we want to make any changes in our style guidelines to handle this better? Anyway, bootstrapped & regression tested on x86_64-unknown-linux-gnu. Installed onto the trunk. Jeff * tree-ssa-threadedge.c (thread_across_edge): Use foo.last () rather than foo[foo.length () - 1] to access last member in a vec. * tree-ssa-threadupdate.c (register_jump_thread): Similarly. diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c index 47db280..2ca56342 100644 --- a/gcc/tree-ssa-threadedge.c +++ b/gcc/tree-ssa-threadedge.c @@ -947,8 +947,7 @@ thread_across_edge (gimple dummy_cond, } remove_temporary_equivalences (stack); - propagate_threaded_block_debug_into (path[path.length () - 1]->dest, - e->dest); + propagate_threaded_block_debug_into (path.last ()->dest, e->dest); register_jump_thread (path, false); path.release (); return; @@ -987,7 +986,7 @@ thread_across_edge (gimple dummy_cond, path.safe_push (taken_edge); found = false; if ((e->flags & EDGE_DFS_BACK) == 0 - || ! cond_arg_set_in_bb (path[path.length () - 1], e->dest) + || ! 
cond_arg_set_in_bb (path.last (), e->dest)) found = thread_around_empty_blocks (taken_edge, dummy_cond, handle_dominating_asserts, @@ -999,7 +998,7 @@ thread_across_edge (gimple dummy_cond, record the jump threading opportunity. */ if (found) { - propagate_threaded_block_debug_into (path[path.length () - 1]->dest, + propagate_threaded_block_debug_into (path.last ()->dest, taken_edge->dest); register_jump_thread (path, true); } diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index 4131128..fd5234c 100644 --- a/gcc/tree-ssa-threadupdate.c +++ b/gcc/tree-ssa-threadupdate.c @@ -1401,7 +1401,7 @@ register_jump_thread (vec path, bool through_joiner) if (!through_joiner) e3 = NULL; else -e3 = path[path.length () - 1]; +e3 = path.last (); /* This can occur if we're jumping to a constant address or or something similar. Just get out now. */
Re: [PATCH, IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.
On 09/24/2013 07:57 PM, Wei Mi wrote: > Hi, > > This patch is to address the problem described here: > http://gcc.gnu.org/ml/gcc/2013-09/msg00187.html > > The patch changes ALLOCNO_MODE of a pseudo reg to be outermode if the > pseudo reg is used in a paradoxical subreg, so IRA will not mistakenly > assign an operand with a bigger mode to a smaller hardreg which > couldn't find a pair register. > > No test is added because I cannot create a small testcase to reproduce > the problem on trunk, the difficulty of which was described in the > above post. > > bootstrap and regression pass. ok for trunk? > > Thanks, > Wei Mi. > > 2013-09-24 Wei Mi > > * ira-build.c (create_insn_allocnos): Fix ALLOCNO_MODE > in the case of paradoxical subreg. > > Index: ira-build.c > === > --- ira-build.c (revision 201963) > +++ ira-build.c (working copy) > @@ -1688,6 +1688,30 @@ create_insn_allocnos (rtx x, bool output > } >return; > } > + else if (code == SUBREG) > +{ > + int regno; > + rtx subreg_reg = SUBREG_REG (x); > + enum machine_mode outermode, innermode; > + > + create_insn_allocnos (subreg_reg, output_p); > + /* For paradoxical subreg, set allocno's mode to be > +the outermode. */ > + outermode = GET_MODE (x); > + innermode = GET_MODE (subreg_reg); > + if (REG_P (subreg_reg) > + && (GET_MODE_SIZE (outermode) > GET_MODE_SIZE (innermode))) > + { > + regno = REGNO (subreg_reg); > + if (regno >= FIRST_PSEUDO_REGISTER) > + { > + ira_allocno_t a = ira_curr_regno_allocno_map[regno]; > + ira_assert (a != NULL); > + ALLOCNO_MODE (a) = outermode; > + } > + } > + return; > +} >else if (code == SET) > { >create_insn_allocnos (SET_DEST (x), true); Wei Mi, sorry for misleading you. I thought more about the problem. It is not a trivial one. The patch is ok for the code correctness but it might hurt code performance. For example, we have code ... (reg:DI) ... ... ... (subreg:TI (reg:DI)) ... 
...(reg:DI) We need two hard regs only for the second place by transforming p = (reg:DI) ...(subreg:TI p) With this patch we require two hard regs for the whole live range of the original pseudo (which can be quite long). It might considerably worsen code performance. So the problem could be fixed in LRA, which can make this transformation, or even in IRA (that would be even better, as we put pseudo P into the global picture vs. probably spilling some pseudos for P in LRA). I need some time to think what is better (e.g. I don't know how to implement it in IRA without a big compilation slowdown). In any case, the solution for the problem will not be as easy as in the patch. Sorry again.
Re: [PATCH] Trivial cleanup
On 09/25/2013 11:33 AM, Jeff Law wrote: I was looking at the vec class to figure out the best way to do some things and realized we have a "last" member function. Using foo.last() is clearer than foo[foo.length() - 1] On a related note, our current standards require a space between a function name and the open paren for its argument list. However, it leads to fugly code in some cases. Assume foo is a vec instance and we want to look at something in the last vector element. Our current standards would imply something like this: foo.last ()->bar Which is how the current patch formats that code. However, I typically see this more often in C++ codebases as foo.last()->bar But then, what if we just want the last element without peeking deeper inside it? foo.last ()? or foo.last()? Do we want to make any changes in our style guidelines to handle this better? I noticed that with the wrapper conversion, often you will get a sequence of 3 or more method calls, and it's quite unbearable to have the spaces. Simple things like int unsignedsrcp = ptrvar.type().type().type_unsigned(); vs int unsignedsrcp = ptrvar.type ().type ().type_unsigned (); I was going to bring it up at some point too. My preference is strongly to simply eliminate the space on methods... Andrew
Re: [PATCH] Trivial cleanup
On 9/25/13 10:46 AM, Andrew MacLeod wrote: I was going to bring it up at some point too. My preference is strongly to simply eliminate the space on methods... Which wouldn't be so weird: in the libstdc++-v3 code we do it all the time. Paolo.
Re: [PATCH] Trivial cleanup
On 09/25/2013 09:48 AM, Paolo Carlini wrote: On 9/25/13 10:46 AM, Andrew MacLeod wrote: I was going to bring it up at some point too. My preference is strongly to simply eliminate the space on methods... Which wouldn't be so weird: in the libstdc++-v3 code we do it all the time. Yea. I actually reviewed the libstdc++ guidelines to see where they differed from GNU's C guidelines. I'm strongly in favor of dropping the horizontal whitespace between the method name and its open paren when the result is then dereferenced. ie foo.last()->e rather than foo.last ()->e. However, do we want an exception when the result isn't immediately dereferenced? ie, foo.last () or foo.last()? jeff
Re: [C++1y] [PATCH 3/4] ... canonical type workaround and refactoring
On 09/24/2013 02:05 AM, Adam Butcher wrote: Shall I push the patch below to trunk as an intermediate workaround whilst I get to refactoring to support on-the-fly template parm synthesis? I think let's wait for a better fix. On the subject of on-the-fly synthesis: I haven't started yet but I'm thinking to trigger in 'cp_parser_simple_type_specifier' when 'current_binding_level->kind == sk_function_parms'. I can foresee a potential issue with packs in that, upon reading the 'auto' (potentially nested in some other type), a plain template parm will be synthesized; but it may need to be a pack parm type if '...' is found later. Hmm, good point. I think there are two options: 1) Build up the type as normal and use tsubst to replace the non-pack template parameter with a pack if needed. 2) If we see 'auto', scan ahead (possibly via tentative parsing) to see if there's a ... Jason
Re: RFA: Store the REG_BR_PROB probability directly as an int
On Tue, 2013-09-24 at 21:07 +0200, Andreas Schwab wrote: > Richard Sandiford writes: > > > Sorry for the breakage. I think we need to handle INT_LIST in the same way > > as INSN_LIST though, and eliminate in XEXP (x, 1). > > > > How about the attached? Testing in progress... > > Works for me as well. > > Andreas. This patch worked for me as well on MIPS. I did a complete build and test overnight. Steve Ellcey sell...@mips.com
Re: [PATCH] Trivial cleanup
On Wed, Sep 25, 2013 at 11:46:12AM -0400, Andrew MacLeod wrote: > I noticed that with the wrapper conversion, often you will get a > sequence of 3 or more method calls, and its quite unbearable to have > the spaces. > simple things like > int unsignedsrcp = ptrvar.type().type().type_unsigned(); > vs > int unsignedsrcp = ptrvar.type ().type ().type_unsigned (); > > I was going to bring it up at some point too. My preference is > strongly to simply eliminate the space on methods... My preference is to keep the spaces, it makes code more readable, no space before ( looks very ugly to me. Jakub
Re: [PATCH v4 00/20] resurrect automatic dependencies
Paolo> The series is good! Thanks! I'm checking it in now. I'll be around to fix up any bugs I introduce. I'll send a note to the gcc list when I'm done, just to warn people. Tom
Re: [PATCH, IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.
> performance. For example, we have code > > ... (reg:DI) ... > ... > ... (subreg:TI (reg:DI)) > ... > ...(reg:DI) > > We need two hard regs only for the second place by transforming > > p = (reg:DI) > > ...(subreg:TI p) > > With this patch we require two hard regs for the whole live range of the > original pseudo (which can be quite long). It might considerably worsen > code performance. > > So the problem could be fixed in LRA which can make this transformation > or even in IRA (that would be even better as we put pseudo P into the > global picture vs. probably spilling some pseudos for P in LRA). > Thanks for your detailed explanation. Now I understand your concern here. > > I need some time to think what is better (e.g. I don't know how to > implement it in IRA without big compilation slow down). In any case, > the solution for the problem will not be as easy as in the patch. To fix it in IRA, it looks like we want a live range splitting pass for pseudos used in paradoxical subregs here. Is the potential compilation slowdown you mention here caused by more allocnos introduced by the live range splitting, or something else? Thanks, Wei Mi.
Re: [PATCH v4 00/20] resurrect automatic dependencies
> Paolo> The series is good! > > Thanks! > > I'm checking it in now. I'll be around to fix up any bugs I introduce. > I'll send a note to the gcc list when I'm done, just to warn people. Thank you for working on this! Honza > > Tom
[gomp4] Compiler side of depend clause support
Hi! This is the compiler side of the depend clause support. If a task has any depend clauses, we pass an array of pointers as the 8th argument to GOMP_task, with the value 8 ORed into the 7th argument (flags) to signal the presence of the 8th argument. The array starts with two integers (cast to void * pointers): the first is the total number of depend clauses, the second the number of inout/out depend clauses (those are treated the same), followed by the addresses of the first bytes of all the objects, first from the depend(inout:) or depend(out:) clauses, then from the depend(in:) clauses. Will commit tomorrow unless somebody chimes in. 2013-09-25 Jakub Jelinek * omp-low.c (expand_task_call): If there are depend clauses, pass bit 8 in 7th argument and pass pointer to depend array as 8th argument. (lower_depend_clauses): New function. (lower_omp_taskreg): Handle depend clauses. * omp-builtins.def (BUILT_IN_GOMP_TASK): Use BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR instead of BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT. * builtin-types.def (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT): Remove. (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR): New. fortran/ * types.def (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT): Remove. (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR): New. 
--- gcc/omp-low.c.jj2013-09-25 09:51:45.0 +0200 +++ gcc/omp-low.c 2013-09-25 15:34:53.705241334 +0200 @@ -4203,7 +4203,7 @@ expand_parallel_call (struct omp_region static void expand_task_call (basic_block bb, gimple entry_stmt) { - tree t, t1, t2, t3, flags, cond, c, c2, clauses; + tree t, t1, t2, t3, flags, cond, c, c2, clauses, depend; gimple_stmt_iterator gsi; location_t loc = gimple_location (entry_stmt); @@ -4217,8 +4217,9 @@ expand_task_call (basic_block bb, gimple c = find_omp_clause (clauses, OMP_CLAUSE_UNTIED); c2 = find_omp_clause (clauses, OMP_CLAUSE_MERGEABLE); + depend = find_omp_clause (clauses, OMP_CLAUSE_DEPEND); flags = build_int_cst (unsigned_type_node, -(c ? 1 : 0) + (c2 ? 4 : 0)); +(c ? 1 : 0) + (c2 ? 4 : 0) + (depend ? 8 : 0)); c = find_omp_clause (clauses, OMP_CLAUSE_FINAL); if (c) @@ -4229,6 +4230,10 @@ expand_task_call (basic_block bb, gimple build_int_cst (unsigned_type_node, 0)); flags = fold_build2_loc (loc, PLUS_EXPR, unsigned_type_node, flags, c); } + if (depend) +depend = OMP_CLAUSE_DECL (depend); + else +depend = build_int_cst (ptr_type_node, 0); gsi = gsi_last_bb (bb); t = gimple_omp_task_data_arg (entry_stmt); @@ -4244,9 +4249,10 @@ expand_task_call (basic_block bb, gimple t3 = build_fold_addr_expr_loc (loc, t); t = build_call_expr (builtin_decl_explicit (BUILT_IN_GOMP_TASK), - 7, t1, t2, t3, + 8, t1, t2, t3, gimple_omp_task_arg_size (entry_stmt), - gimple_omp_task_arg_align (entry_stmt), cond, flags); + gimple_omp_task_arg_align (entry_stmt), cond, flags, + depend); force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, false, GSI_CONTINUE_LINKING); @@ -9232,6 +9238,68 @@ create_task_copyfn (gimple task_stmt, om pop_cfun (); } +static void +lower_depend_clauses (gimple stmt, gimple_seq *iseq, gimple_seq *oseq) +{ + tree c, clauses; + gimple g; + size_t n_in = 0, n_out = 0, idx = 2, i; + + clauses = find_omp_clause (gimple_omp_task_clauses (stmt), +OMP_CLAUSE_DEPEND); + gcc_assert (clauses); + for (c = clauses; c; c = OMP_CLAUSE_CHAIN 
(c)) +if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEPEND) + switch (OMP_CLAUSE_DEPEND_KIND (c)) + { + case OMP_CLAUSE_DEPEND_IN: + n_in++; + break; + case OMP_CLAUSE_DEPEND_OUT: + case OMP_CLAUSE_DEPEND_INOUT: + n_out++; + break; + default: + gcc_unreachable (); + } + tree type = build_array_type_nelts (ptr_type_node, n_in + n_out + 2); + tree array = create_tmp_var (type, NULL); + tree r = build4 (ARRAY_REF, ptr_type_node, array, size_int (0), NULL_TREE, + NULL_TREE); + g = gimple_build_assign (r, build_int_cst (ptr_type_node, n_in + n_out)); + gimple_seq_add_stmt (iseq, g); + r = build4 (ARRAY_REF, ptr_type_node, array, size_int (1), NULL_TREE, + NULL_TREE); + g = gimple_build_assign (r, build_int_cst (ptr_type_node, n_out)); + gimple_seq_add_stmt (iseq, g); + for (i = 0; i < 2; i++) +{ + if ((i ? n_in : n_out) == 0) + continue; + for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c)) + if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEPEND + && ((OMP_CLAUSE_DEPEND_KIND (c) != OMP_CLAUSE_DEPEND_IN) ^ i)) + { + tree t = OMP_CL
Re: [PATCH, IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.
On 09/25/2013 12:42 PM, Wei Mi wrote: >> performance. For example, we have code >> >> ... (reg:DI) ... >> ... >> ... (subreg:TI (reg:DI)) >> ... >> ...(reg:DI) >> >> We need two hard regs only for the second place by transforming >> >> p = (reg:DI) >> >> ...(subreg:TI p) >> >> With this patch we require two hard regs for the whole live range of the >> original pseudo (which can be quite long). It might considerably worsen >> code performance. >> >> So the problem could be fixed in LRA which can make this transformation >> or even in IRA (that would be even better as we put pseudo P into the >> global picture vs. probably spilling some pseudos for P in LRA). >> > Thanks for your detailed explanation. Now I understand your concern here. > >> I need some time to think what is better (e.g. I don't know how to >> implement it in IRA without big compilation slow down). In any case, >> the solution for the problem will not be as easy as in the patch. > To fix it in IRA, it looks like we want a live range splitting pass > for pseudos used in paradoxical subregs here. Is the potential > compilation slowdown you mention here caused by more allocnos > introduced by the live range splitting, or something else? > > To determine for which occurrences of the pseudo we should do the transformation, we need to create allocnos and calculate reg classes, to know which paradoxical subregs need more hard regs (the transformation cannot be done for all paradoxical subregs, as my experience shows that many RTL changes result in worse RA even if we have heuristics to remove the generated changes, which in this case would mean trying to assign the same hard reg to the original and the new pseudo). After doing the transformations, we need to recalculate reg classes and rebuild the allocnos (both are expensive). To speed up the process it could be implemented as some kind of update of already existing data, but that would complicate the code a lot. 
So right now I think implementing this in LRA would be easier. Still, LRA has a pretty good register (re-)allocation (although it is worse than in IRA).
Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL
> Sorry for missing this problem when committing Kugan's patch. > > I have just committed the attached patch, which I hope fixes all the > spaces/indentation issues introduced. Thanks a lot! -- Eric Botcazou
Re: [PATCH, ARM, LRA] Prepare ARM build with LRA
> 2013-09-24 Yvan Roux > Vladimir Makarov > > * rtlanal.c (lsb_bitfield_op_p): New predicate for bitfield > operations from the least significant bit. > (strip_address_mutations): Add bitfield operations handling. > (must_be_index_p): Add shifting and rotate operations handling. > (set_address_base): Use must_be_base_p predicate. > (set_address_index): Use must_be_index_p predicate. OK for mainline, if you add the missing space after GET_MODE in enum machine_mode mode = GET_MODE(XEXP (x, 0)); -- Eric Botcazou
Re: [PATCH, ARM, LRA] Prepare ARM build with LRA
> FWIW, I'd prefer to keep it as-is, since must_be_base_p (x) and > must_be_index_p (x) don't imply that we should look specifically at > XEXP (x, 0) (rather than just X, or XEXP (x, 1), etc.). I think it's > better to keep the code tests and the associated XEXPs together. Feel free to revert this part, but then add appropriate comments explaining why we are interested in LO_SUM for set_address_base and in MULT and the 5 others for set_address_index. If it's because the former is rather a base than an index and vice-versa for the latter, then it's even clearer with the predicates I think. -- Eric Botcazou
[PATCH] fortran/PR58113
Hi, this test case fails very often, and the reason is not in GCC but in missing glibc rounding support for strtod. This patch fixes the test case to first determine whether rounding support is available. This is often the case for real(16) through libquadmath. So even in cases where the test case usually fails, it still tests something with this patch. Ok for trunk? Regards, Bernd Edlinger 2013-09-25 Bernd Edlinger PR fortran/58113 * gfortran.dg/round_4.f90: Check for rounding support. --- gcc/testsuite/gfortran.dg/round_4.f90 2013-07-21 13:54:27.0 +0200 +++ gcc/testsuite/gfortran.dg/round_4.f90 2013-08-23 10:16:32.0 +0200 @@ -27,6 +27,17 @@ real(xp) :: r10p, r10m, ref10u, ref10d real(qp) :: r16p, r16m, ref16u, ref16d character(len=20) :: str, round + logical :: rnd4, rnd8, rnd10, rnd16 + + ! Test for which types glibc's strtod function supports rounding + str = '0.01 0.01 0.01 0.01' + read (str, *, round='up') r4p, r8p, r10p, r16p + read (str, *, round='down') r4m, r8m, r10m, r16m + rnd4 = r4p /= r4m + rnd8 = r8p /= r8m + rnd10 = r10p /= r10m + rnd16 = r16p /= r16m +! write (*, *) rnd4, rnd8, rnd10, rnd16 ref4u = 0.10001_4 ref8u = 0.10001_8 @@ -55,40 +66,40 @@ round = 'up' call t() - if (r4p /= ref4u .or. r4m /= -ref4d) call abort() - if (r8p /= ref8u .or. r8m /= -ref8d) call abort() - if (r10p /= ref10u .or. r10m /= -ref10d) call abort() - if (r16p /= ref16u .or. r16m /= -ref16d) call abort() + if (rnd4 .and. (r4p /= ref4u .or. r4m /= -ref4d)) call abort() + if (rnd8 .and. (r8p /= ref8u .or. r8m /= -ref8d)) call abort() + if (rnd10 .and. (r10p /= ref10u .or. r10m /= -ref10d)) call abort() + if (rnd16 .and. (r16p /= ref16u .or. r16m /= -ref16d)) call abort() round = 'down' call t() - if (r4p /= ref4d .or. r4m /= -ref4u) call abort() - if (r8p /= ref8d .or. r8m /= -ref8u) call abort() - if (r10p /= ref10d .or. r10m /= -ref10u) call abort() - if (r16p /= ref16d .or. r16m /= -ref16u) call abort() + if (rnd4 .and. (r4p /= ref4d .or. 
r4m /= -ref4u)) call abort() + if (rnd8 .and. (r8p /= ref8d .or. r8m /= -ref8u)) call abort() + if (rnd10 .and. (r10p /= ref10d .or. r10m /= -ref10u)) call abort() + if (rnd16 .and. (r16p /= ref16d .or. r16m /= -ref16u)) call abort() round = 'zero' call t() - if (r4p /= ref4d .or. r4m /= -ref4d) call abort() - if (r8p /= ref8d .or. r8m /= -ref8d) call abort() - if (r10p /= ref10d .or. r10m /= -ref10d) call abort() - if (r16p /= ref16d .or. r16m /= -ref16d) call abort() + if (rnd4 .and. (r4p /= ref4d .or. r4m /= -ref4d)) call abort() + if (rnd8 .and. (r8p /= ref8d .or. r8m /= -ref8d)) call abort() + if (rnd10 .and. (r10p /= ref10d .or. r10m /= -ref10d)) call abort() + if (rnd16 .and. (r16p /= ref16d .or. r16m /= -ref16d)) call abort() round = 'nearest' call t() - if (r4p /= ref4u .or. r4m /= -ref4u) call abort() - if (r8p /= ref8u .or. r8m /= -ref8u) call abort() - if (r10p /= ref10u .or. r10m /= -ref10u) call abort() - if (r16p /= ref16u .or. r16m /= -ref16u) call abort() + if (rnd4 .and. (r4p /= ref4u .or. r4m /= -ref4u)) call abort() + if (rnd8 .and. (r8p /= ref8u .or. r8m /= -ref8u)) call abort() + if (rnd10 .and. (r10p /= ref10u .or. r10m /= -ref10u)) call abort() + if (rnd16 .and. (r16p /= ref16u .or. r16m /= -ref16u)) call abort() ! Same as nearest (but rounding towards zero if there is a tie ! [does not apply here]) round = 'compatible' call t() - if (r4p /= ref4u .or. r4m /= -ref4u) call abort() - if (r8p /= ref8u .or. r8m /= -ref8u) call abort() - if (r10p /= ref10u .or. r10m /= -ref10u) call abort() - if (r16p /= ref16u .or. r16m /= -ref16u) call abort() + if (rnd4 .and. (r4p /= ref4u .or. r4m /= -ref4u)) call abort() + if (rnd8 .and. (r8p /= ref8u .or. r8m /= -ref8u)) call abort() + if (rnd10 .and. (r10p /= ref10u .or. r10m /= -ref10u)) call abort() + if (rnd16 .and. (r16p /= ref16u .or. r16m /= -ref16u)) call abort() contains subroutine t() !print *, round
[gomp4] Fix UDR handling on combined constructs (PR libgomp/58482)
Hi! This patch fixes a problem with UDRs, where if we copy reduction clauses to multiple constructs (on combined constructs), we weren't copying OMP_CLAUSE_REDUCTION_PLACEHOLDER (which holds IDENTIFIER_NODE, TREE_VEC or some similar magic used by {,c_}finish_omp_clauses later on to properly finalize the clause. Will commit tomorrow. 2013-09-25 Jakub Jelinek PR libgomp/58482 * c-omp.c (c_omp_split_clauses) : Copy also OMP_CLAUSE_REDUCTION_PLACEHOLDER. * testsuite/libgomp.c/simd-6.c: New test. * testsuite/libgomp.c++/simd-8.C: New test. --- gcc/c-family/c-omp.c.jj 2013-09-25 09:51:45.0 +0200 +++ gcc/c-family/c-omp.c2013-09-25 19:08:05.776289461 +0200 @@ -832,6 +832,8 @@ c_omp_split_clauses (location_t loc, enu OMP_CLAUSE_DECL (c) = OMP_CLAUSE_DECL (clauses); OMP_CLAUSE_REDUCTION_CODE (c) = OMP_CLAUSE_REDUCTION_CODE (clauses); + OMP_CLAUSE_REDUCTION_PLACEHOLDER (c) + = OMP_CLAUSE_REDUCTION_PLACEHOLDER (clauses); OMP_CLAUSE_CHAIN (c) = cclauses[C_OMP_CLAUSE_SPLIT_SIMD]; cclauses[C_OMP_CLAUSE_SPLIT_SIMD] = c; } @@ -844,6 +846,8 @@ c_omp_split_clauses (location_t loc, enu OMP_CLAUSE_DECL (c) = OMP_CLAUSE_DECL (clauses); OMP_CLAUSE_REDUCTION_CODE (c) = OMP_CLAUSE_REDUCTION_CODE (clauses); + OMP_CLAUSE_REDUCTION_PLACEHOLDER (c) + = OMP_CLAUSE_REDUCTION_PLACEHOLDER (clauses); OMP_CLAUSE_CHAIN (c) = cclauses[C_OMP_CLAUSE_SPLIT_PARALLEL]; cclauses[C_OMP_CLAUSE_SPLIT_PARALLEL] = c; s = C_OMP_CLAUSE_SPLIT_TEAMS; --- libgomp/testsuite/libgomp.c/simd-6.c.jj 2013-09-25 19:15:09.572135004 +0200 +++ libgomp/testsuite/libgomp.c/simd-6.c2013-09-25 19:15:05.748158078 +0200 @@ -0,0 +1,44 @@ +/* PR libgomp/58482 */ +/* { dg-do run } */ +/* { dg-options "-O2" } */ +/* { dg-additional-options "-msse2" { target sse2_runtime } } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +extern void abort (); +int a[1024] __attribute__((aligned (32))) = { 1 }; +struct S { int s; }; +#pragma omp declare reduction (+:struct S:omp_out.s += omp_in.s) +#pragma omp declare reduction 
(foo:struct S:omp_out.s += omp_in.s) +#pragma omp declare reduction (foo:int:omp_out += omp_in) + +__attribute__((noinline, noclone)) int +foo (void) +{ + int i, u = 0; + struct S s, t; + s.s = 0; t.s = 0; + #pragma omp parallel for simd aligned(a : 32) reduction(+:s) \ + reduction(foo:t, u) + for (i = 0; i < 1024; i++) +{ + int x = a[i]; + s.s += x; + t.s += x; + u += x; +} + if (t.s != s.s || u != s.s) +abort (); + return s.s; +} + +int +main () +{ + int i; + for (i = 0; i < 1024; i++) +a[i] = (i & 31) + (i / 128); + int s = foo (); + if (s != 19456) +abort (); + return 0; +} --- libgomp/testsuite/libgomp.c++/simd-8.C.jj 2013-09-25 19:15:42.069968629 +0200 +++ libgomp/testsuite/libgomp.c++/simd-8.C 2013-09-25 19:16:03.615860022 +0200 @@ -0,0 +1,47 @@ +// PR libgomp/58482 +// { dg-do run } +// { dg-options "-O2" } +// { dg-additional-options "-msse2" { target sse2_runtime } } +// { dg-additional-options "-mavx" { target avx_runtime } } + +extern "C" void abort (); +int a[1024] __attribute__((aligned (32))) = { 1 }; +struct S +{ + int s; + S () : s (0) {} + ~S () {} +}; +#pragma omp declare reduction (+:S:omp_out.s += omp_in.s) +#pragma omp declare reduction (foo:S:omp_out.s += omp_in.s) +#pragma omp declare reduction (foo:int:omp_out += omp_in) + +__attribute__((noinline, noclone)) int +foo () +{ + int i, u = 0; + S s, t; + #pragma omp parallel for simd aligned(a : 32) reduction(+:s) \ + reduction(foo:t, u) + for (i = 0; i < 1024; i++) +{ + int x = a[i]; + s.s += x; + t.s += x; + u += x; +} + if (t.s != s.s || u != s.s) +abort (); + return s.s; +} + +int +main () +{ + int i; + for (i = 0; i < 1024; i++) +a[i] = (i & 31) + (i / 128); + int s = foo (); + if (s != 19456) +abort (); +} Jakub
Re: [PATCH, IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.
> To define for what occurrence of the pseudo we should do the
> transformation, we need to create allocnos and calculate reg classes to
> know what paradoxical subreg needs more hard regs (the transformations
> cannot be done for all paradoxical subregs, as my experience shows many
> RTL changes result in worse RA even if we have heuristics to remove the
> generated changes, as in this case would be trying to assign the same
> hard reg for the original and the new pseudo).
> After doing the transformations, we need to recalculate reg classes
> and rebuild allocnos (both are expensive). To speed up the process it
> could be implemented as some kind of update of already existing data but
> it would complicate the code much.

I see, thanks!

> So right now I think implementing this in LRA would be easier. Still, LRA
> has a pretty good register (re-)allocation (although it is worse than in
> IRA).

When you get an idea of how to fix it in LRA, if you are still busy, I
would be happy to do the implementation if you could outline your idea.

Thanks,
Wei Mi
Re: [PATCH i386 3/8] [AVX512] [1/n] Add AVX-512 patterns: VF iterator extended.
On 24 Sep 10:04, Richard Henderson wrote:
> On 08/27/2013 11:37 AM, Kirill Yukhin wrote:
> > Hello,
> >
> >> This patch is still far too large.
> >>
> >> I think you should split it up based on every single mode iterator that
> >> you need to add or change.
> >
> > Problem is that some iterators depend on each other, so patches are
> > not going to be tiny.
> >
> > Here is the 1st one. It extends the VF iterator - biggest impact, I believe.
> >
> > Is it Ok?
> >
> > Testing:
> > 1. Bootstrap pass.
> > 2. make check shows no regressions.
> > 3. Spec 2000 & 2006 build show no regressions both with and without
> > -mavx512f option.
> > 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f
> > option.
>
> Ok.
>
> r~

Checked into main trunk by Kirill Yukhin:
http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00779.html

-- Ilya
Re: Context sensitive type inheritance graph walking
On 2013.09.25 at 12:20 +0200, Jan Hubicka wrote: > this is updated version of > http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00936.html > > Bootstrapped/regtested x86_64-linux. The patch is tested by ipa-devirt9 > testcase. I have extra four, but I would like to first fix the case where the > devirtualization happens in TODO of early_local_passes that is not dumped > anywhere. So I plan to post these incrementally once this code is hooked also > into gimple folding. > > The patch results in 60% more devirtualizations on Firefox and 10% more > speculative devirtualization. I think main component missing now is code > determining dynamic type from a call to constructor. I have some prototype > for > this, too, I would like to discuss incrementally. I am not 100% sure how much > harder tracking of dynamic type changes becomes here. Hi Honza, I've tested your patch and it failed during the "profile generate" phase of an LTO/PGO build of Firefox. Reduced: markus@x4 /tmp % cat test.ii class A { public: virtual void m_fn1(); }; class B final : A { ~B(); virtual void m_fn2() { m_fn1(); } }; B::~B() {} markus@x4 /tmp % g++ -c -std=c++11 -O2 -c test.ii test.ii: In member function ‘virtual void B::m_fn2()’: test.ii:7:16: error: stmt (0x7f85504c3130) marked modified after optimization pass: virtual void m_fn2() { m_fn1(); } ^ # .MEM_6 = VDEF <.MEM_2(D)> A::m_fn1 (_5); test.ii:7:16: internal compiler error: verify_ssa failed 0xc62364 verify_ssa(bool) ../../gcc/gcc/tree-ssa.c:1046 0xa305a1 execute_function_todo ../../gcc/gcc/passes.c:1834 0xa30d07 execute_todo ../../gcc/gcc/passes.c:1866 0xa32af9 execute_one_ipa_transform_pass ../../gcc/gcc/passes.c:2049 0xa32af9 execute_all_ipa_transforms() ../../gcc/gcc/passes.c:2079 0x7e3cc0 expand_function ../../gcc/gcc/cgraphunit.c:1743 0x7e5d96 expand_all_functions ../../gcc/gcc/cgraphunit.c:1855 0x7e5d96 compile() ../../gcc/gcc/cgraphunit.c:2192 0x7e6379 finalize_compilation_unit() ../../gcc/gcc/cgraphunit.c:2269 0x5f816e 
cp_write_global_declarations() ../../gcc/gcc/cp/decl2.c:4360 Please submit a full bug report, -- Markus
Re: [PATCH, LRA, AARCH64] Switching LRA on for AArch64
Hi,

the needed LRA analyser patch was committed as r202914.

Thanks,
Yvan

On 24 September 2013 11:03, Yvan Roux wrote:
> Hi,
>
> The following patch switches LRA on for AArch64. The patch introduces
> an undocumented option -mlra to use LRA instead of reload, for
> testing purposes. Please notice that this patch is dependent on the
> one submitted in the thread below:
>
> http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00805.html
>
> Thanks,
> Yvan
>
> 2013-09-24  Yvan Roux
>
>         * config/aarch64/aarch64.opt (mlra): New option.
>         * config/aarch64/aarch64.c (aarch64_lra_p): New function.
>         (TARGET_LRA_P): Define.
*PING* Re: [Patch, Fortran] PR57697/58469 - Fix another defined-assignment issue
* PING * http://gcc.gnu.org/ml/fortran/2013-09/msg00039.html

Additionally pinging for: http://gcc.gnu.org/ml/fortran/2013-09/msg00031.html

On September 19, 2013 21:11, Tobias Burnus wrote:

This patch fixes two issues:

a) It could happen that no code change had happened. In that case, one
freed an expression which should still be used.

b) In my previous patch, I used a pointer assignment to the temporary of
the LHS (after its allocation) [only if the LHS was initially unassigned].
That led to a problem with double deallocation (temporary + LHS). In the
previous test case, it didn't matter as the LHS wasn't freed (implicit
SAVE in the main program). That's now solved by a NULL-pointer assignment.

Finally, I corrected some indenting issues and removed unreachable code.

Built and regtested on x86-64-gnu-linux. OK for the trunk and the 4.8
branch?

Tobias

PS: For the testcase of (a), I am not quite sure whether the intrinsic
assignment should invoke the defined assignment. It currently doesn't for
gfortran and crayftn. In any case, the invalid freeing is wrong.
Re: [PATCH] fortran/PR58113
Bernd Edlinger wrote:
> this test case fails very often, and the reason is not in GCC but in
> missing glibc rounding support for strtod. This patch fixes the test
> case, to first determine if the rounding support is available. This is
> often the case for real(16) through libquadmath. So even in cases where
> the test case usually fails it still tests something with this patch.
> Ok for trunk?

First, for Fortran patches, it helps if you CC fortran@ as otherwise the
email might be missed.

Your change doesn't really directly check whether strtod handles rounding
but whether libgfortran (invoking strtod) supports up/down rounding.
Hence, after your patch, one effectively checks - given that up/down
rounding works (or at least produces different values) - that
nearest/zero/up/down give the correct result.

As only a few strtod implementations currently support rounding, it is
probably the best approach. However, I think it merits a comment making
clear what is now checked (and what isn't). Maybe something along the
lines of my paragraph (but polished) - and removing the then-obsolete
parts of the existing comment.

Except for the comment update, the patch looks fine to me.

Tobias

PS: I wonder whether there is a good way to do rounded strtod without
relying on the system's libc to handle it.

changelog-round4.txt

2013-09-25  Bernd Edlinger

	PR fortran/58113
	* gfortran.dg/round_4.f90: Check for rounding support.

patch-round4.diff

--- gcc/testsuite/gfortran.dg/round_4.f90	2013-07-21 13:54:27.0 +0200
+++ gcc/testsuite/gfortran.dg/round_4.f90	2013-08-23 10:16:32.0 +0200
@@ -27,6 +27,17 @@
   real(xp) :: r10p, r10m, ref10u, ref10d
   real(qp) :: r16p, r16m, ref16u, ref16d
   character(len=20) :: str, round
+  logical :: rnd4, rnd8, rnd10, rnd16
+
+  !
Test for which types glibc's strtod function supports rounding + str = '0.01 0.01 0.01 0.01' + read (str, *, round='up') r4p, r8p, r10p, r16p + read (str, *, round='down') r4m, r8m, r10m, r16m + rnd4 = r4p /= r4m + rnd8 = r8p /= r8m + rnd10 = r10p /= r10m + rnd16 = r16p /= r16m +! write (*, *) rnd4, rnd8, rnd10, rnd16 ref4u = 0.10001_4 ref8u = 0.10001_8 @@ -55,40 +66,40 @@ round = 'up' call t() - if (r4p /= ref4u .or. r4m /= -ref4d) call abort() - if (r8p /= ref8u .or. r8m /= -ref8d) call abort() - if (r10p /= ref10u .or. r10m /= -ref10d) call abort() - if (r16p /= ref16u .or. r16m /= -ref16d) call abort() + if (rnd4 .and. (r4p /= ref4u .or. r4m /= -ref4d)) call abort() + if (rnd8 .and. (r8p /= ref8u .or. r8m /= -ref8d)) call abort() + if (rnd10 .and. (r10p /= ref10u .or. r10m /= -ref10d)) call abort() + if (rnd16 .and. (r16p /= ref16u .or. r16m /= -ref16d)) call abort() round = 'down' call t() - if (r4p /= ref4d .or. r4m /= -ref4u) call abort() - if (r8p /= ref8d .or. r8m /= -ref8u) call abort() - if (r10p /= ref10d .or. r10m /= -ref10u) call abort() - if (r16p /= ref16d .or. r16m /= -ref16u) call abort() + if (rnd4 .and. (r4p /= ref4d .or. r4m /= -ref4u)) call abort() + if (rnd8 .and. (r8p /= ref8d .or. r8m /= -ref8u)) call abort() + if (rnd10 .and. (r10p /= ref10d .or. r10m /= -ref10u)) call abort() + if (rnd16 .and. (r16p /= ref16d .or. r16m /= -ref16u)) call abort() round = 'zero' call t() - if (r4p /= ref4d .or. r4m /= -ref4d) call abort() - if (r8p /= ref8d .or. r8m /= -ref8d) call abort() - if (r10p /= ref10d .or. r10m /= -ref10d) call abort() - if (r16p /= ref16d .or. r16m /= -ref16d) call abort() + if (rnd4 .and. (r4p /= ref4d .or. r4m /= -ref4d)) call abort() + if (rnd8 .and. (r8p /= ref8d .or. r8m /= -ref8d)) call abort() + if (rnd10 .and. (r10p /= ref10d .or. r10m /= -ref10d)) call abort() + if (rnd16 .and. (r16p /= ref16d .or. r16m /= -ref16d)) call abort() round = 'nearest' call t() - if (r4p /= ref4u .or. 
r4m /= -ref4u) call abort() - if (r8p /= ref8u .or. r8m /= -ref8u) call abort() - if (r10p /= ref10u .or. r10m /= -ref10u) call abort() - if (r16p /= ref16u .or. r16m /= -ref16u) call abort() + if (rnd4 .and. (r4p /= ref4u .or. r4m /= -ref4u)) call abort() + if (rnd8 .and. (r8p /= ref8u .or. r8m /= -ref8u)) call abort() + if (rnd10 .and. (r10p /= ref10u .or. r10m /= -ref10u)) call abort() + if (rnd16 .and. (r16p /= ref16u .or. r16m /= -ref16u)) call abort() ! Same as nearest (but rounding towards zero if there is a tie ! [does not apply here]) round = 'compatible' call t() - if (r4p /= ref4u .or. r4m /= -ref4u) call abort() - if (r8p /= ref8u .or. r8m /= -ref8u) call abort() - if (r10p /= ref10u .or. r10m /= -ref10u) call abort() - if (r16p /= ref16u .or. r16m /= -ref16u) call abort() + if (rnd4 .and. (r4p /= ref4u .or. r4m /= -ref4u)) call abort() + if (rnd8 .and. (r8p /= ref8u .or. r8m /= -ref8u)) call abort() + if (rnd10 .and. (r10p /= ref10u .or. r10m /= -ref10u)) call abort()
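On Tobias's PS: one hedged sketch of a fallback, when the libc only parses to nearest, is to bracket the nearest result with its representable neighbors via nextafter. Picking the right neighbor still requires knowing the sign of the parse error, which plain strtod does not report, and that is the hard part; this sketch (names invented) shows the bracketing step only:

```cpp
#include <cmath>
#include <cstdlib>

// Parse with the libc's round-to-nearest strtod, then form the two
// neighboring representable values.  A rounding-mode-aware reader would
// return *hi for round-up and *lo for round-down, but only if it also
// knew whether the nearest result over- or under-shot the exact decimal
// value, which strtod alone cannot tell us.
static void bracket_parse (const char *s, double *lo, double *mid, double *hi)
{
  *mid = strtod (s, NULL);
  *lo = std::nextafter (*mid, -HUGE_VAL);
  *hi = std::nextafter (*mid, HUGE_VAL);
}
```

Determining the error sign needs an extended-precision (or exact bignum) comparison against the decimal string, which is presumably why so few libcs implement rounded strtod.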
Re: *PING* Re: [Patch, Fortran] PR57697/58469 - Fix another defined-assignment issue
Hi Tobias, > * PING * http://gcc.gnu.org/ml/fortran/2013-09/msg00039.html > > Additionally pinging for: > http://gcc.gnu.org/ml/fortran/2013-09/msg00031.html Both are OK. Thanks a lot for the patches! Thomas
Re: [PATCH] Trivial cleanup
On 09/25/2013 10:04 AM, Jakub Jelinek wrote: On Wed, Sep 25, 2013 at 11:46:12AM -0400, Andrew MacLeod wrote: I noticed that with the wrapper conversion, often you will get a sequence of 3 or more method calls, and its quite unbearable to have the spaces. simple things like int unsignedsrcp = ptrvar.type().type().type_unsigned(); vs int unsignedsrcp = ptrvar.type ().type ().type_unsigned (); I was going to bring it up at some point too. My preference is strongly to simply eliminate the space on methods... My preference is to keep the spaces, it makes code more readable, no space before ( looks very ugly to me. I generally find the lack of a space before the open-paren ugly as well, but I find something like Andrew's example with the horizontal spaces insanely bad. Jeff
Re: [PATCH] Trivial cleanup
On Wed, Sep 25, 2013 at 01:18:06PM -0600, Jeff Law wrote:
> On 09/25/2013 10:04 AM, Jakub Jelinek wrote:
> > On Wed, Sep 25, 2013 at 11:46:12AM -0400, Andrew MacLeod wrote:
> > > I noticed that with the wrapper conversion, often you will get a
> > > sequence of 3 or more method calls, and its quite unbearable to have
> > > the spaces. simple things like
> > >   int unsignedsrcp = ptrvar.type().type().type_unsigned();
> > > vs
> > >   int unsignedsrcp = ptrvar.type ().type ().type_unsigned ();
> > >
> > > I was going to bring it up at some point too. My preference is
> > > strongly to simply eliminate the space on methods...
> >
> > My preference is to keep the spaces, it makes code more readable,
> > no space before ( looks very ugly to me.
>
> I generally find the lack of a space before the open-paren ugly as
> well, but I find something like Andrew's example with the horizontal
> spaces insanely bad.

Then perhaps we don't want the above style and write

  int unsignedsrcp = type_unsigned (type (type (ptrvar)));

instead? If we really need to have C++ uglification everywhere, perhaps
through:

  template <typename T> tree type (T t) { return t.get_type (t); }

or similar.

	Jakub
Re: Context sensitive type inheritance graph walking
> On 2013.09.25 at 12:20 +0200, Jan Hubicka wrote:
> > this is updated version of
> > http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00936.html
> >
> > Bootstrapped/regtested x86_64-linux. The patch is tested by ipa-devirt9
> > testcase. I have extra four, but I would like to first fix the case where
> > the devirtualization happens in TODO of early_local_passes that is not
> > dumped anywhere. So I plan to post these incrementally once this code is
> > hooked also into gimple folding.
> >
> > The patch results in 60% more devirtualizations on Firefox and 10% more
> > speculative devirtualization. I think main component missing now is code
> > determining dynamic type from a call to constructor. I have some
> > prototype for this, too, I would like to discuss incrementally. I am not
> > 100% sure how much harder tracking of dynamic type changes becomes here.
>
> Hi Honza,
>
> I've tested your patch and it failed during the "profile generate" phase
> of an LTO/PGO build of Firefox.
>
> Reduced:
>
> markus@x4 /tmp % cat test.ii
> class A {
> public:
>   virtual void m_fn1();
> };
> class B final : A {
>   ~B();
>   virtual void m_fn2() { m_fn1(); }
> };
> B::~B() {}
>
> markus@x4 /tmp % g++ -c -std=c++11 -O2 -c test.ii
> test.ii: In member function ‘virtual void B::m_fn2()’:
> test.ii:7:16: error: stmt (0x7f85504c3130) marked modified after
> optimization pass:

Thanks, it looks like a latent problem in remove_unreachable_nodes that
does not update SSA after changing a call to a direct call. I will fix it
tomorrow.

Honza
Re: [PATCH, IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.
On 09/25/2013 02:00 PM, Wei Mi wrote: >> To define for what occurrence of the pseudo we should do the >> transformation, we need to create allocnos and calculate reg classes to >> know what paradoxical subreg needs more hard regs (the transformations >> can not be done for all paradoxical subregs as my experience shows many >> RTL changes result in worse RA even if we have heuristics to remove the >> generated changes as in this case would be trying to assign the same >> hard reg for the original and the new pseudo). >> After doing the transformations, we need to recalculate reg classes >> and rebuild allocnos (both are expensive). To speed up the process it >> could be implemented as some kind of update of already existing data but >> it will complicate code much. >> > I see, thanks! > >> So right now I think implementing this in LRA would be easier Still LRA >> has a pretty good register (re-)allocation (although it is worse than in >> IRA). >> > When you get an idea how to fix it in LRA, if you are still busy, I > would be happy to do the implementation if you could brief your idea. > Probably the best place to add a code for this is in lra-constraints.c::simplify_operand_subreg by permitting subreg reload for paradoxical subregs whose hard regs are not fully in allocno class of the inner pseudo. It needs a good testing (i'd check that the generated code is not changed on variety benchmarks to see that the change has no impact on the most programs performance) and you need to add a good comment describing why this change is needed.
Re: [PATCH, LRA] Remove REG_DEAD and REG_UNUSED notes.
> > The description is too terse. In the RTL middle-end, you shouldn't have
> > to manually deal with the REG_DEAD and REG_UNUSED notes (unlike
> > REG_EQUAL and REG_EQUIV notes), as the DF framework is supposed to do
> > it for you.
>
> Unfortunately LRA uses its own DF framework.

In fact reload does just the same:

  /* Make a pass over all the insns and delete all USEs which we inserted
     only to tag a REG_EQUAL note on them.  Remove all REG_DEAD and
     REG_UNUSED notes.  Delete all CLOBBER insns, except those that refer
     to the return value and the special mem:BLK CLOBBERs added to prevent
     the scheduler from misarranging variable-array code, and simplify
     (subreg (reg)) operands.  Strip and regenerate REG_INC notes that may
     have been moved around.  */
  [...]
      pnote = &REG_NOTES (insn);
      while (*pnote != 0)
	{
	  if (REG_NOTE_KIND (*pnote) == REG_DEAD
	      || REG_NOTE_KIND (*pnote) == REG_UNUSED
	      || REG_NOTE_KIND (*pnote) == REG_INC)
	    *pnote = XEXP (*pnote, 1);
	  else
	    pnote = &XEXP (*pnote, 1);
	}

so I guess LRA is entitled to do it as well. But the proper fix to this is
to recompute the REG_DEAD/REG_UNUSED notes at the beginning of postreload.

-- 
Eric Botcazou
Re: [PATCH, ARM, LRA] Prepare ARM build with LRA
Eric Botcazou writes: >> FWIW, I'd prefer to keep it as-is, since must_be_base_p (x) and >> must_be_index_p (x) don't imply that we should look specifically at >> XEXP (x, 0) (rather than just X, or XEXP (x, 1), etc.). I think it's >> better to keep the code tests and the associated XEXPs together. > > Feel free to revert this part, but then add appropriate comments explaining > why we are interested in LO_SUM for set_address_base and in MULT and the 5 > others for set_address_index. If it's because the former is rather a base > than an index and vice-versa for the latter, then it's even clearer with the > predicates I think. The idea is that must_be_base_p and must_be_index_p are used when deciding how to classify the operands of a PLUS. set_address_base and set_address_index are called once we've decided which is which and want to record the choice. There's no backing out by that stage. So in the set_* routines it isn't about whether the value is definitely a base or a definitely an index. It's just about drilling down through what we've already decided is a base or index to get the inner reg or mem, and knowing which XEXPs to look at. We could instead have used a for_each_rtx, or something like that, without any code checks. But I wanted to be precise about the types of address we allow, so that we can assert for things we don't understand. In other words, it was "designed" to require the kind of extension Yvan is adding here. E.g. although it's a bit of a stretch, it might be in future that someone wants to allow NOT in an index. It would then make sense to add NOT to must_be_index_p. But the existing check: if ((GET_CODE (*inner) == MULT || GET_CODE (*inner) == ASHIFT) && CONSTANT_P (XEXP (*inner, 1))) inner = strip_address_mutations (&XEXP (*inner, 0)); wouldn't make sense as: if (must_be_index_p (*inner) && CONSTANT_P (XEXP (*inner, 1))) inner = strip_address_mutations (&XEXP (*inner, 0)); in that case, since NOT only has one operand. 
And it might be that the NOT is allowed only outside the scale, only inside the scale, or in both positions. Not the best example, sorry. Thanks, Richard
Re: [PATCH, ARM, LRA] Prepare ARM build with LRA
> So in the set_* routines it isn't about whether the value is definitely > a base or a definitely an index. It's just about drilling down through > what we've already decided is a base or index to get the inner reg or mem, > and knowing which XEXPs to look at. We could instead have used a > for_each_rtx, or something like that, without any code checks. But I > wanted to be precise about the types of address we allow, so that we can > assert for things we don't understand. In other words, it was "designed" > to require the kind of extension Yvan is adding here. Does this mean that the design is to require a parallel implementation in the predicates and in the set routines, i.e. each time you add a new case to the predicates, you need to add it (or do something) to the set routines as well? If so, that's a little weird, but OK, feel free to revert the de-duplication part, but add comments saying that the functions must be kept synchronized. -- Eric Botcazou
Re: [PATCH, LRA] Remove REG_DEAD and REG_UNUSED notes.
On Wed, Sep 25, 2013 at 4:55 PM, Vladimir Makarov wrote:
> On 09/24/2013 03:40 PM, Mike Stump wrote:
>> On Sep 24, 2013, at 12:23 PM, Steven Bosscher wrote:
>>> On Tue, Sep 24, 2013 at 5:03 PM, Eric Botcazou wrote:
>>>>> This patch removes REG_DEAD and REG_UNUSED notes
>>>> DF framework is supposed to do it for you.
>>> Unfortunately LRA uses its own DF framework.
> I'd say fortunately. Otherwise, LRA would decrease whole compiler speed
> by up to 10%. I can say it because the first version of LRA was based
> on the DF framework.

You've said that before, but I don't think you ever really gave the DF
framework a fair chance. There are many knobs and switches on DF that can
make or ruin its performance. I don't remember seeing your DF-based LRA
code, or you seeking advice from DF maintainers.

As you may remember, LRA had a great number of relatively easy compiler
speed improvements left just before the merge from the lra-branch to the
trunk. We will never know whether your DF-based LRA would have been
competitive, given a little more care and effort, and some help from
people who know the DF framework well.

But it doesn't matter anymore now, anyway. IRA+LRA are fine the way they
are, IMHO. IRA+LRA need to have a different view on registers than almost
all other passes need: allocnos vs. REGs, insn operand number matters
more, LRA needs to see the MEM if the registers are used in an address,
etc. I'm not sure DF is the right framework for IRA+LRA to do their job!

It's only unfortunate that LRA doesn't, and maybe can't, use DF, because I
just would have liked to see the whole compiler use a single framework, to
improve maintainability (as in this case: managing REG_DEAD/REG_UNUSED
notes in only one place) and to let all passes benefit from any work done
to improve the framework.

> I'd be glad if we move to DF without compilation time degradation.
>
> I already wrote that the DF design (although it has a nice API which is
> good for prototyping) is a bad design from a resource usage point of
> view. Keeping a lot of duplicated info on the RTL side increases the
> memory footprint very much, worsens data locality, and slows down the
> compiler.

The memory footprint is still a significant downside, but the other two
points are not. Data locality can be a lot better if the DF caches are
used. For example: I rewrote the RTL CPROP pass using the DF operand
caches and that took the pass off the list of compile time bottlenecks
(before, 3-4% compile time on preprocessed GCC source; now less than 1%).
Most of the speedup came from fewer L2 cache misses, because CPROP doesn't
walk insn patterns anymore in its dataflow analysis phase.

The DF operand caches are compiler speed killers if used incorrectly, e.g.
automatic updating after each change for passes that make a lot of
changes. Manual updating is the solution used by such passes. I can only
speculate, but perhaps the DF-based LRA used an inappropriate DF caches
update mode. (What's still not done well in the DF framework is re-using
the memory allocated for the DF caches...)

> As I remember correctly, the GCC steering committee permitted 5%
> compile time degradation as a criterion to include it into the trunk and
> DF achieved this with difficulties.

In fact, as I remember it, the df-branch at the time of the merge was
still above that 5% threshold. Was it below 5% slowdown before the merge?

Either way: much of that slowdown was due to having the LR, RU, and LIVE
problems separate. The RU and LIVE problems were merged before GCC 4.3 and
that solved much of the slowdown (and a significant portion of the memory
footprint issues). Moreover, the solver itself was rewritten before GCC
4.3 was released, and all DF problems benefited from that.

Another big compiler speed problem was with the RD problem that fwprop
used at the time. The RD problem can now be much faster if reaching defs
are pruned using liveness, and all RD users benefit from this. Single
framework change, many passes benefit!

Ciao!
Steven
Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
On Tue, Sep 24, 2013 at 11:25 AM, Teresa Johnson wrote: > On Tue, Sep 24, 2013 at 10:57 AM, Jan Hubicka wrote: >>> >>> I looked at one that failed after 100 as well (20031204-1.c). In this >>> case, it was due to expansion which was creating multiple branches/bbs >>> from a logical OR and guessing incorrectly on how to assign the >>> counts: >>> >>> if (octets == 4 && (*cp == ':' || *cp == '\0')) { >>> >>> The (*cp == ':' || *cp == '\0') part looked like the following going >>> into RTL expansion: >>> >>> [20031204-1.c : 31:33] _29 = _28 == 58; >>> [20031204-1.c : 31:33] _30 = _28 == 0; >>> [20031204-1.c : 31:33] _31 = _29 | _30; >>> [20031204-1.c : 31:18] if (_31 != 0) >>> goto ; >>> else >>> goto ; >>> >>> where the result of the OR was always true, so bb 16 had a count of >>> 100 and bb 19 a count of 0. When it was expanded, the expanded version >>> of the above turned into 2 bbs with a branch in between. Both >>> comparisons were done in the first bb, but the first bb checked >>> whether the result of the *cp == '\0' compare was true, and if not >>> branched to the check for whether the *cp == ':' compare was true. It >>> gave the branch to the second check against ':' a count of 0, so that >>> bb got a count of 0 and was split out, and put the count of 100 on the >>> fall through assuming the compare with '\0' always evaluated to true. >>> In reality, this OR condition was always true because *cp was ':', not >>> '\0'. Therefore, the count of 0 on the second block with the check for >>> ':' was incorrect, we ended up trying to execute it, and failed. 
>> >> I see, we produce: >> ;; if (_26 != 0) >> >> (insn 94 93 95 (set (reg:CCZ 17 flags) >> (compare:CCZ (reg:QI 107 [ D.2184 ]) >> (const_int 0 [0]))) a.c:31 -1 >> (nil)) >> >> (insn 95 94 96 (set (reg:QI 122 [ D.2186 ]) >> (eq:QI (reg:CCZ 17 flags) >> (const_int 0 [0]))) a.c:31 -1 >> (nil)) >> >> (insn 96 95 97 (set (reg:CCZ 17 flags) >> (compare:CCZ (reg:QI 122 [ D.2186 ]) >> (const_int 0 [0]))) a.c:31 -1 >> (nil)) >> >> (jump_insn 97 96 98 (set (pc) >> (if_then_else (ne (reg:CCZ 17 flags) >> (const_int 0 [0])) >> (label_ref 100) >> (pc))) a.c:31 -1 >> (expr_list:REG_BR_PROB (const_int 6100 [0x17d4]) >> (nil))) >> >> (insn 98 97 99 (set (reg:CCZ 17 flags) >> (compare:CCZ (reg:QI 108 [ D.2186 ]) >> (const_int 0 [0]))) a.c:31 -1 >> (nil)) >> >> (jump_insn 99 98 100 (set (pc) >> (if_then_else (eq (reg:CCZ 17 flags) >> (const_int 0 [0])) >> (label_ref 0) >> (pc))) a.c:31 -1 >> (expr_list:REG_BR_PROB (const_int 3900 [0xf3c]) >> (nil))) >> >> (code_label 100 99 0 14 "" [0 uses]) >> >> That is because we TER together "_26 = _25 | _24" and "if (_26 != 0)" >> >> First I think the logic of do_jump should really be moved to trees. It is >> not >> doing things that can not be adequately represented by gimple. >> >> I am not that certain we want to move it before profiling though. >>> >>> Presumably we had the correct profile data for both blocks, but the >>> accuracy was reduced when the OR was represented as a logical >>> computation with a single branch. We could change the expansion code >>> to do something different, e.g. treat as a 50-50 branch. But we would >>> still end up with integer truncation issues when there was a single >>> training run. But that could be dealt with conservatively in the >> >> Yep, but it is still better than what we have now - if the test above was >> in hot part of program (i.e. not executed once), we will end up optimizing >> the second conditional for size. 
>> >> So I think it is do_jump bug to not distribute probabilities across the two >> conditionals introduced. >>> bbpart code as I suggested for the jump threading issue above. I.e. a >>> cold block with incoming non-cold edges conservatively not marked cold >>> for splitting. >> >> Yep, we can probably do that, but we ought to fix the individual cases >> above at least for a reasonable number of runs. > > I made this change and it removed a few of the failures. > > I looked at another case that still failed with 1 train run but passed > with 100. It turned out to be another truncation issue exposed by RTL > expansion, where we created some control flow for a memset builtin > which was in a block with an execution count of 1. Some of the blocks > got frequencies less than half the original block, so the count was > rounded down or truncated to 0. I noticed that in this case (as well > as the jump threading case I fixed by looking for non-zero incoming > edges in partitioning) that the bb frequency was non-zero. > > Why not just have probably_never_executed_bb_p simply return > false when bb->frequency is non-zero (right now it does the opposite - > returns true when bb->frequency is 0)? Making this change removed a > bunch of other failures.
[GOOGLE] Disable aggressive loop peeling to prevent code bloat.
This patch disables aggressive loop peeling when profile is available.
This prevents extensive code bloat which leads to increased i-cache
misses.

Bootstrapped and passed regression tests.

OK for google-4_8?

Thanks,
Dehao

Index: gcc/loop-unroll.c
===================================================================
--- gcc/loop-unroll.c   (revision 202926)
+++ gcc/loop-unroll.c   (working copy)
@@ -1574,8 +1574,7 @@ decide_peel_simple (struct loop *loop, int flags)
      peeling it is not the case.  Also a function call inside loop is
      also branch from branch prediction POV (and probably better reason
      to not unroll/peel).  */
-  if (desc->num_branches > 1
-      && profile_status != PROFILE_READ)
+  if (desc->num_branches > 1)
     {
       if (dump_file)
         fprintf (dump_file, ";; Not peeling, contains branches\n");
[google gcc-4_8] Applied partial backport of r202818, r202832 and r202836.
Greetings,

I committed partial backport of r202818, r202832 and r202836 to
google/gcc-4_8 branch as r202927.

2013-09-25  Paul Pluzhnikov

        * libstdc++-v3/config/abi/pre/gnu.ver: Add
        _ZSt24__throw_out_of_range_fmtPKcz.
        * libstdc++-v3/src/c++11/Makefile.am: Add snprintf_lite.
        * libstdc++-v3/src/c++11/Makefile.in: Regenerate.
        * libstdc++-v3/src/c++11/snprintf_lite.cc: New.
        * libstdc++-v3/src/c++11/functexcept.cc (__throw_out_of_range_fmt): New.

Index: libstdc++-v3/config/abi/pre/gnu.ver
===================================================================
--- libstdc++-v3/config/abi/pre/gnu.ver (revision 202926)
+++ libstdc++-v3/config/abi/pre/gnu.ver (working copy)
@@ -1357,6 +1357,13 @@

 } GLIBCXX_3.4.18;

+GLIBCXX_3.4.20 {
+
+    # std::__throw_out_of_range_fmt(char const*, ...)
+    _ZSt24__throw_out_of_range_fmtPKcz;
+
+} GLIBCXX_3.4.19;
+
 # Symbols in the support library (libsupc++) have their own tag.
 CXXABI_1.3 {

Index: libstdc++-v3/src/c++11/Makefile.am
===================================================================
--- libstdc++-v3/src/c++11/Makefile.am  (revision 202926)
+++ libstdc++-v3/src/c++11/Makefile.am  (working copy)
@@ -42,6 +42,7 @@
   random.cc \
   regex.cc \
   shared_ptr.cc \
+  snprintf_lite.cc \
   system_error.cc \
   thread.cc

Index: libstdc++-v3/src/c++11/functexcept.cc
===================================================================
--- libstdc++-v3/src/c++11/functexcept.cc       (revision 202926)
+++ libstdc++-v3/src/c++11/functexcept.cc       (working copy)
@@ -31,6 +31,7 @@
 #include
 #include
 #include
+#include

 #ifdef _GLIBCXX_USE_NLS
 # include
@@ -39,6 +40,12 @@
 # define _(msgid)   (msgid)
 #endif

+namespace __gnu_cxx
+{
+  int __snprintf_lite(char *__buf, size_t __bufsize, const char *__fmt,
+                      va_list __ap);
+}
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -80,6 +87,22 @@
   { _GLIBCXX_THROW_OR_ABORT(out_of_range(_(__s))); }

   void
+  __throw_out_of_range_fmt(const char* __fmt, ...)
+  {
+    const size_t __len = __builtin_strlen(__fmt);
+    // We expect at most 2 numbers, and 1 short string.  The additional
+    // 512 bytes should provide more than enough space for expansion.
+    const size_t __alloca_size = __len + 512;
+    char *const __s = static_cast(__builtin_alloca(__alloca_size));
+    va_list __ap;
+
+    va_start(__ap, __fmt);
+    __gnu_cxx::__snprintf_lite(__s, __alloca_size, __fmt, __ap);
+    _GLIBCXX_THROW_OR_ABORT(out_of_range(_(__s)));
+    va_end(__ap);  // Not reached.
+  }
+
+  void
   __throw_runtime_error(const char* __s __attribute__((unused)))
   { _GLIBCXX_THROW_OR_ABORT(runtime_error(_(__s))); }

Index: libstdc++-v3/src/c++11/Makefile.in
===================================================================
--- libstdc++-v3/src/c++11/Makefile.in  (revision 202926)
+++ libstdc++-v3/src/c++11/Makefile.in  (working copy)
@@ -70,7 +70,8 @@
 am__objects_1 = chrono.lo condition_variable.lo debug.lo \
        functexcept.lo functional.lo future.lo hash_c++0x.lo \
        hashtable_c++0x.lo limits.lo mutex.lo placeholders.lo \
-       random.lo regex.lo shared_ptr.lo system_error.lo thread.lo
+       random.lo regex.lo shared_ptr.lo snprintf_lite.lo \
+       system_error.lo thread.lo
 @ENABLE_EXTERN_TEMPLATE_TRUE@am__objects_2 = fstream-inst.lo \
 @ENABLE_EXTERN_TEMPLATE_TRUE@  string-inst.lo wstring-inst.lo
 am_libc__11convenience_la_OBJECTS = $(am__objects_1) $(am__objects_2)
@@ -319,6 +320,7 @@
   random.cc \
   regex.cc \
   shared_ptr.cc \
+  snprintf_lite.cc \
   system_error.cc \
   thread.cc

Index: libstdc++-v3/src/c++11/snprintf_lite.cc
===================================================================
--- libstdc++-v3/src/c++11/snprintf_lite.cc     (revision 0)
+++ libstdc++-v3/src/c++11/snprintf_lite.cc     (revision 0)
@@ -0,0 +1,159 @@
+// Debugging support -*- C++ -*-
+
+// Copyright (C) 2013 Free Software Foundation, Inc.
+//
+// This file is part of GCC.
+//
+// GCC is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// GCC is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+//
Re: [GOOGLE] Disable aggressive loop peeling to prevent code bloat.
I wish there were a better heuristic; for now it is ok in the future.

David

On Wed, Sep 25, 2013 at 2:48 PM, Dehao Chen wrote:
> This patch disables aggressive loop peeling when profile is available.
> This prevents extensive code bloat which leads to increased i-cache
> misses.
>
> Bootstrapped and passed regression tests.
>
> OK for google-4_8?
>
> Thanks,
> Dehao
>
> Index: gcc/loop-unroll.c
> ===================================================================
> --- gcc/loop-unroll.c (revision 202926)
> +++ gcc/loop-unroll.c (working copy)
> @@ -1574,8 +1574,7 @@ decide_peel_simple (struct loop *loop, int flags)
>      peeling it is not the case.  Also a function call inside loop is
>      also branch from branch prediction POV (and probably better reason
>      to not unroll/peel).  */
> -  if (desc->num_branches > 1
> -      && profile_status != PROFILE_READ)
> +  if (desc->num_branches > 1)
>     {
>        if (dump_file)
>          fprintf (dump_file, ";; Not peeling, contains branches\n");
cost model patch
I took the liberty to pick up Richard's original fvect-cost-model patch
and made some modifications.

What has not changed:
1) option -ftree-vect-loop-version is removed;
2) three cost models are introduced: cheap, dynamic, and unlimited;
3) unless explicitly specified, cheap model is the default at O2
   (e.g. when -ftree-loop-vectorize is used with -O2), and dynamic mode
   is the default for O3 and FDO;
4) alignment based versioning is disabled with cheap model.

What has changed:
1) peeling is also disabled with cheap model;
2) alias check condition limit is reduced with cheap model, but not
   completely suppressed.  Runtime alias check is a pretty important
   enabler;
3) tree if conversion changes are not included.

Does this patch look reasonable?

thanks,

David

Index: tree-vectorizer.h
===================================================================
--- tree-vectorizer.h   (revision 202926)
+++ tree-vectorizer.h   (working copy)
@@ -880,6 +880,14 @@ known_alignment_for_access_p (struct dat
   return (DR_MISALIGNMENT (data_ref_info) != -1);
 }

+
+/* Return true if the vect cost model is unlimited.  */
+static inline bool
+unlimited_cost_model ()
+{
+  return flag_vect_cost_model == VECT_COST_MODEL_UNLIMITED;
+}
+
 /* Source location */
 extern LOC vect_location;

Index: flag-types.h
===================================================================
--- flag-types.h        (revision 202926)
+++ flag-types.h        (working copy)
@@ -191,6 +191,15 @@ enum fp_contract_mode {
   FP_CONTRACT_FAST = 2
 };

+/* Vectorizer cost-model.  */
+enum vect_cost_model {
+  VECT_COST_MODEL_UNLIMITED = 0,
+  VECT_COST_MODEL_CHEAP = 1,
+  VECT_COST_MODEL_DYNAMIC = 2,
+  VECT_COST_MODEL_DEFAULT = 3
+};
+
+
 /* Different instrumentation modes.  */
 enum sanitize_code {
   /* AddressSanitizer.  */

Index: targhooks.c
===================================================================
--- targhooks.c (revision 202926)
+++ targhooks.c (working copy)
@@ -1057,20 +1057,17 @@ default_add_stmt_cost (void *data, int c
   unsigned *cost = (unsigned *) data;
   unsigned retval = 0;

-  if (flag_vect_cost_model)
-    {
-      tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
-      int stmt_cost = default_builtin_vectorization_cost (kind, vectype,
+  tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
+  int stmt_cost = default_builtin_vectorization_cost (kind, vectype,
                                                       misalign);
-      /* Statements in an inner loop relative to the loop being
-        vectorized are weighted more heavily.  The value here is
-        arbitrary and could potentially be improved with analysis.  */
-      if (where == vect_body && stmt_info && stmt_in_inner_loop_p (stmt_info))
-        count *= 50;  /* FIXME.  */
+  /* Statements in an inner loop relative to the loop being
+     vectorized are weighted more heavily.  The value here is
+     arbitrary and could potentially be improved with analysis.  */
+  if (where == vect_body && stmt_info && stmt_in_inner_loop_p (stmt_info))
+    count *= 50;  /* FIXME.  */

-      retval = (unsigned) (count * stmt_cost);
-      cost[where] += retval;
-    }
+  retval = (unsigned) (count * stmt_cost);
+  cost[where] += retval;

   return retval;
 }

Index: common.opt
===================================================================
--- common.opt  (revision 202926)
+++ common.opt  (working copy)
@@ -2278,13 +2278,33 @@
 ftree-slp-vectorize
 Common Report Var(flag_tree_slp_vectorize) Optimization
 Enable basic block vectorization (SLP) on trees

+fvect-cost-model=
+Common Joined RejectNegative Enum(vect_cost_model) Var(flag_vect_cost_model) Init(VECT_COST_MODEL_DEFAULT)
+Specifies the cost model for vectorization
+
+Enum
+Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown vectorizer cost model %qs)
+
+EnumValue
+Enum(vect_cost_model) String(unlimited) Value(VECT_COST_MODEL_UNLIMITED)
+
+EnumValue
+Enum(vect_cost_model) String(dynamic) Value(VECT_COST_MODEL_DYNAMIC)
+
+EnumValue
+Enum(vect_cost_model) String(cheap) Value(VECT_COST_MODEL_CHEAP)
+
 fvect-cost-model
-Common Report Var(flag_vect_cost_model) Optimization
-Enable use of cost model in vectorization
+Common RejectNegative Alias(fvect-cost-model=,dynamic)
+Enables the dynamic vectorizer cost model.  Preserved for backward compatibility.
+
+fno-vect-cost-model
+Common RejectNegative Alias(fvect-cost-model=,unlimited)
+Enables the unlimited vectorizer cost model.  Preserved for backward compatibility.

 ftree-vect-loop-version
-Common Report Var(flag_tree_vect_loop_version) Init(1) Optimization
-Enable loop versioning when doing loop vectorization on trees
+Common Ignore
+Does nothing.  Preserved for backward compatibility.

 ftree-scev-cprop
 Common Report Var(flag_tree_scev_cprop) Init(1) Optimization

Index: opts.c
===================================================================
--- opts.c      (revision 202926)
+++ opts.c      (working copy)
@@
Re: [PATCH, powerpc] Rework#2 VSX scalar floating point support, patch #3
On Tue, Sep 24, 2013 at 4:33 PM, Michael Meissner wrote:
> This patch adds the initial support for putting DI, DF, and SF values in the
> upper registers (traditional Altivec registers) using the -mupper-regs-df and
> -mupper-regs-sf patches.  Those switches will not be enabled by default until
> the rest of the changes are made.  This patch passes the bootstrap test and
> make check test.  I tested all of the targets I tested previously (power4-8,
> G4/G5, SPE, cell, e5500/e5600, and paired floating point), and all machines
> generate the same code.  Is it ok to install this patch?
>
> [gcc]
> 2013-09-24  Michael Meissner
>
>       * config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): Allow
>       DFmode, DImode, and SFmode in the upper VSX registers based on the
>       -mupper-regs-{df,sf} flags.  Fix wu constraint to be ALTIVEC_REGS
>       if -mpower8-vector.  Combine -mvsx-timode handling with the rest
>       of the VSX register handling.
>
>       * config/rs6000/rs6000.md (f32_lv): Use %x0 for VSX registers.
>       (f32_sv): Likewise.
>       (zero_extendsidi2_lfiwzx): Add support for loading into the
>       Altivec registers with -mpower8-vector.  Use wu/wv constraints to
>       only do VSX memory options on Altivec registers.
>       (extendsidi2_lfiwax): Likewise.
>       (extendsfdf2_fpr): Likewise.
>       (mov_hardfloat, SF/SD modes): Likewise.
>       (mov_hardfloat32, DF/DD modes): Likewise.
>       (mov_hardfloat64, DF/DD modes): Likewise.
>       (movdi_internal64): Likewise.
>
> [gcc/testsuite]
> 2013-09-24  Michael Meissner
>
>       * gcc.target/powerpc/p8vector-ldst.c: New test for -mupper-regs-sf
>       and -mupper-regs-df.

Okay.

Thanks, David
[PATCH] Make jump thread path carry more information
This patch conveys information about the blocks/edges in a jump
threading path.  Of particular interest is information about the source
block in each edge of the path -- the nature of the block determines our
copy strategy (empty -- no copy, normal copy, joiner copy).

Rather than rediscover the nature of those blocks during the CFG/SSA
graph updating, it's easier to just attach the information to each edge
in the path.  That's precisely what this patch does.

The SSA/CFG updating code isn't making much use of this yet, that's a
follow-on change.  This change is strictly designed to get the
information into the updating code.

Bootstrapped and regression tested on x86_64-unknown-linux-gnu.
Installed on trunk.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 5adbaeb..8871aca 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,26 @@
+2013-09-25  Jeff Law
+
+       * tree-flow.h (thread_through_all_blocks): Prototype moved into
+       tree-ssa-threadupdate.h.
+       (register_jump_thread): Similarly.
+       * tree-ssa-threadupdate.h: New header file.
+       * tree-ssa-dom.c: Include tree-ssa-threadupdate.h.
+       * tree-vrp.c: Likewise.
+       * tree-ssa-threadedge.c: Include tree-ssa-threadupdate.h.
+       (thread_around_empty_blocks): Change type of path vector argument to
+       an edge,type pair from just an edge.  Initialize both elements when
+       appending to a jump threading path.  Tweak references to elements
+       appropriately.
+       (thread_across_edge): Similarly.  Release memory for the elements
+       as needed.
+       * tree-ssa-threadupdate.c: Include tree-ssa-threadupdate.h.
+       (dump_jump_thread_path): New function broken out from
+       register_jump_thread.
+       (register_jump_thread): Use dump_jump_thread_path.  Change type of
+       path vector entries.  Search the path for NULL edges and dump
+       the path if one is found.  Tweak the conversion of path to 3-edge
+       form to use the block copy type information embedded in the path.
+
 2013-09-25  Yvan Roux

        * lra.c (update_inc_notes): Remove all REG_DEAD and REG_UNUSED notes.

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 634e747..a264056 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,6 @@
+2013-09-25  Jeff Law
+
+       * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Update expected output.

 2013-09-25  Tobias Burnus

        PR fortran/58436

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c
index d2a1fbb..222a97b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c
@@ -43,7 +43,6 @@ expand_one_var (tree var, unsigned char toplevel, unsigned char really_expand)
 }

 /* We should thread the jump, through an intermediate block.  */
 /* { dg-final { scan-tree-dump-times "Threaded" 1 "dom1"} } */
-/* { dg-final { scan-tree-dump-times "Registering jump thread .through joiner block.: \\(.*\\); \\(.*\\); \\(.*\\);" 1 "dom1"} } */
-
+/* { dg-final { scan-tree-dump-times "Registering jump thread: \\(.*\\) incoming edge; \\(.*\\) joiner; \\(.*\\) nocopy;" 1 "dom1"} } */
 /* { dg-final { cleanup-tree-dump "dom1" } } */

diff --git a/gcc/tree-flow.h b/gcc/tree-flow.h
index 2f64abc..ee69179 100644
--- a/gcc/tree-flow.h
+++ b/gcc/tree-flow.h
@@ -641,10 +641,6 @@
 bool multiplier_allowed_in_address_p (HOST_WIDE_INT, enum machine_mode,
                                      addr_space_t);
 bool may_be_nonaddressable_p (tree expr);

-/* In tree-ssa-threadupdate.c.  */
-extern bool thread_through_all_blocks (bool);
-extern void register_jump_thread (vec, bool);
-
 /* In gimplify.c  */
 tree force_gimple_operand_1 (tree, gimple_seq *, gimple_predicate, tree);
 tree force_gimple_operand (tree, gimple_seq *, bool, tree);

diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index f0cc0ee..81119c3 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "domwalk.h"
 #include "tree-pass.h"
 #include "tree-ssa-propagate.h"
+#include "tree-ssa-threadupdate.h"
 #include "langhooks.h"
 #include "params.h"

diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 2ca56342..467d982 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dumpfile.h"
 #include "tree-ssa.h"
 #include "tree-ssa-propagate.h"
+#include "tree-ssa-threadupdate.h"
 #include "langhooks.h"
 #include "params.h"

@@ -753,7 +754,7 @@ thread_around_empty_blocks (edge taken_edge,
                            bool handle_dominating_asserts,
                            tree (*simplify) (gimple, gimple),
                            bitmap visited,
-                           vec *path)
+                           vec *path)
 {
   basic_block bb = taken_edge->dest
Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
On Wed, Sep 25, 2013 at 2:33 PM, Teresa Johnson wrote: > On Tue, Sep 24, 2013 at 11:25 AM, Teresa Johnson wrote: >> On Tue, Sep 24, 2013 at 10:57 AM, Jan Hubicka wrote: I looked at one that failed after 100 as well (20031204-1.c). In this case, it was due to expansion which was creating multiple branches/bbs from a logical OR and guessing incorrectly on how to assign the counts: if (octets == 4 && (*cp == ':' || *cp == '\0')) { The (*cp == ':' || *cp == '\0') part looked like the following going into RTL expansion: [20031204-1.c : 31:33] _29 = _28 == 58; [20031204-1.c : 31:33] _30 = _28 == 0; [20031204-1.c : 31:33] _31 = _29 | _30; [20031204-1.c : 31:18] if (_31 != 0) goto ; else goto ; where the result of the OR was always true, so bb 16 had a count of 100 and bb 19 a count of 0. When it was expanded, the expanded version of the above turned into 2 bbs with a branch in between. Both comparisons were done in the first bb, but the first bb checked whether the result of the *cp == '\0' compare was true, and if not branched to the check for whether the *cp == ':' compare was true. It gave the branch to the second check against ':' a count of 0, so that bb got a count of 0 and was split out, and put the count of 100 on the fall through assuming the compare with '\0' always evaluated to true. In reality, this OR condition was always true because *cp was ':', not '\0'. Therefore, the count of 0 on the second block with the check for ':' was incorrect, we ended up trying to execute it, and failed. 
>>> >>> I see, we produce: >>> ;; if (_26 != 0) >>> >>> (insn 94 93 95 (set (reg:CCZ 17 flags) >>> (compare:CCZ (reg:QI 107 [ D.2184 ]) >>> (const_int 0 [0]))) a.c:31 -1 >>> (nil)) >>> >>> (insn 95 94 96 (set (reg:QI 122 [ D.2186 ]) >>> (eq:QI (reg:CCZ 17 flags) >>> (const_int 0 [0]))) a.c:31 -1 >>> (nil)) >>> >>> (insn 96 95 97 (set (reg:CCZ 17 flags) >>> (compare:CCZ (reg:QI 122 [ D.2186 ]) >>> (const_int 0 [0]))) a.c:31 -1 >>> (nil)) >>> >>> (jump_insn 97 96 98 (set (pc) >>> (if_then_else (ne (reg:CCZ 17 flags) >>> (const_int 0 [0])) >>> (label_ref 100) >>> (pc))) a.c:31 -1 >>> (expr_list:REG_BR_PROB (const_int 6100 [0x17d4]) >>> (nil))) >>> >>> (insn 98 97 99 (set (reg:CCZ 17 flags) >>> (compare:CCZ (reg:QI 108 [ D.2186 ]) >>> (const_int 0 [0]))) a.c:31 -1 >>> (nil)) >>> >>> (jump_insn 99 98 100 (set (pc) >>> (if_then_else (eq (reg:CCZ 17 flags) >>> (const_int 0 [0])) >>> (label_ref 0) >>> (pc))) a.c:31 -1 >>> (expr_list:REG_BR_PROB (const_int 3900 [0xf3c]) >>> (nil))) >>> >>> (code_label 100 99 0 14 "" [0 uses]) >>> >>> That is because we TER together "_26 = _25 | _24" and "if (_26 != 0)" >>> >>> First I think the logic of do_jump should really be moved to trees. It is >>> not >>> doing things that can not be adequately represented by gimple. >>> >>> I am not that certain we want to move it before profiling though. Presumably we had the correct profile data for both blocks, but the accuracy was reduced when the OR was represented as a logical computation with a single branch. We could change the expansion code to do something different, e.g. treat as a 50-50 branch. But we would still end up with integer truncation issues when there was a single training run. But that could be dealt with conservatively in the >>> >>> Yep, but it is still better than what we have now - if the test above was >>> in hot part of program (i.e. not executed once), we will end up optimizing >>> the second conditional for size. 
>>> >>> So I think it is do_jump bug to not distribute probabilities across the two >>> conditionals introduced. bbpart code as I suggested for the jump threading issue above. I.e. a cold block with incoming non-cold edges conservatively not marked cold for splitting. >>> >>> Yep, we can probably do that, but we ought to fix the individual cases >>> above at least for a reasonable number of runs. >> >> I made this change and it removed a few of the failures. >> >> I looked at another case that still failed with 1 train run but passed >> with 100. It turned out to be another truncation issue exposed by RTL >> expansion, where we created some control flow for a memset builtin >> which was in a block with an execution count of 1. Some of the blocks >> got frequencies less than half the original block, so the count was >> rounded down or truncated to 0. I noticed that in this case (as well >> as the jump threading case I fixed by looking for non-zero incoming >> edges in partitioning) that the bb frequency was non-zero. >> >> Why not just have probably_never_executed_bb_p return s