date:20131111

Re: [PATCH] Bug fix for PR59050

2013-11-11 Thread Richard Biener

On Fri, 8 Nov 2013, Cong Hou wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59050
> 
> This is my bad. I forget to check the test result for gfortran. With
> this patch the bug should be fixed (tested on x86-64).

Ok.

Btw, requirements are to bootstrap and test with all default
languages enabled (that is, without any --enable-languages or
--enable-languages=all).  That
gets you c,c++,objc,java,fortran,lto and misses obj-c++ ada and go.
I am personally using --enable-languages=all,ada,obj-c++.

Thanks,
Richard.

> thanks,
> Cong
> 
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 90b01f2..e62c672 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,8 @@
> +2013-11-08  Cong Hou  
> +
> +   PR tree-optimization/59050
> +   * tree-vect-data-refs.c (comp_dr_addr_with_seg_len_pair): Bug fix.
> +
>  2013-11-07  Cong Hou  
> 
> * tree-vect-loop-manip.c (vect_create_cond_for_alias_checks):
> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> index b2a31b1..b7eb926 100644
> --- a/gcc/tree-vect-data-refs.c
> +++ b/gcc/tree-vect-data-refs.c
> @@ -2669,9 +2669,9 @@ comp_dr_addr_with_seg_len_pair (const void *p1_,
> const void *p2_)
>if (comp_res != 0)
> return comp_res;
>  }
> -  if (tree_int_cst_compare (p11.offset, p21.offset) < 0)
> +  else if (tree_int_cst_compare (p11.offset, p21.offset) < 0)
>  return -1;
> -  if (tree_int_cst_compare (p11.offset, p21.offset) > 0)
> +  else if (tree_int_cst_compare (p11.offset, p21.offset) > 0)
>  return 1;
>if (TREE_CODE (p12.offset) != INTEGER_CST
>|| TREE_CODE (p22.offset) != INTEGER_CST)
> @@ -2680,9 +2680,9 @@ comp_dr_addr_with_seg_len_pair (const void *p1_,
> const void *p2_)
>if (comp_res != 0)
> return comp_res;
>  }
> -  if (tree_int_cst_compare (p12.offset, p22.offset) < 0)
> +  else if (tree_int_cst_compare (p12.offset, p22.offset) < 0)
>  return -1;
> -  if (tree_int_cst_compare (p12.offset, p22.offset) > 0)
> +  else if (tree_int_cst_compare (p12.offset, p22.offset) > 0)
>  return 1;
> 
>return 0;
> 
> 

-- 
Richard Biener 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend

Re: Simplify code in gimple_equal_p

2013-11-11 Thread Richard Biener

On Sat, 9 Nov 2013, Tom de Vries wrote:

> Richard,
> 
> This patch simplifies code in gimple_equal_p.
> 
> Bootstrapped and regtested on x86_64.
> 
> OK for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> - Tom
> 
> 2013-11-06  Tom de Vries  
> 
>   * tree-ssa-tail-merge.c (gimple_equal_p): Remove equal variable.

Re: [RFA][PATCH] Isolate erroneous paths optimization

2013-11-11 Thread Eric Botcazou

> However, that brings up an couple interesting questions.
> 
> Let's say we find a NULL pointer which reaches a return statement in a
> function which is marked as returns_nonnull.  In that case there is no
> dereference.  Presumably for that kind of scenario we'll just keep the
> builtin trap.
> 
> Similarly, assume we extend this pass to detect out-of-bounds array
> indexing.  It's fairly simple to do and has always been in my plan.  In
> that case leaving in the array indexing  won't necessarily generate a
> fault.  For those presumably we'll just want the builtin_trap as well?
> 
> Again, I don't mind inserting a *0, I just want to have a plan for the
> other undefined behaviours we currently detect and those which I plan on
> catching soon.

The more general problem is that, with -fnon-call-exceptions, we really expect 
a fully-fledged exception to be raised when something goes wrong.  Inserting 
__builtin_trap doesn't work because it's simply not equivalent to a throw.  In 
other words, if __builtin_throw would be inserted instead of __builtin_trap 
with -fnon-call-exceptions, things would probably be acceptable as-is.

-- 
Eric Botcazou

Re: [PATCH] Factor out gimple_dont_merge_p

2013-11-11 Thread Richard Biener

On Sat, 9 Nov 2013, Tom de Vries wrote:

> Richard,
> 
> This patch factors out gimple_dont_merge_p from gimple_equal_p and
> find_duplicate.
> 
> Bootstrapped and regtested on x86_64.
> 
> OK for trunk?

+static bool
+gimple_dont_merge_p (gimple stmt)
+{
+  switch (gimple_code (stmt))
+{
+case GIMPLE_CALL:
+  /* Eventually, we'll significantly complicate the CFG by adding
+back edges to properly model the effects of transaction restart.
+For the bulk of optimization this does not matter, but what we
+cannot recover from is tail merging blocks between two separate
+transactions.  Avoid that by making commit not match.  */
+  if (gimple_call_builtin_p (stmt, BUILT_IN_TM_COMMIT))
+   return true;
+
+  /* We cannot tail-merge the builtins that end transactions.
+??? The alternative being unsharing of BBs in the tm_init pass.  
*/
+  if (flag_tm
+ && (gimple_call_flags (stmt) & ECF_TM_BUILTIN)
+ && is_tm_ending_fndecl (gimple_call_fndecl (stmt)))
+   return true;


1) BUILT_IN_TM_COMMIT is handled in is_tm_ending_fndecl,
2) fndecl may be NULL

I'd simply get rid of gimple_dont_merge_p and call a
is_tm_ending (gimple) function you'd add to TM.

Richard.

> Thanks,
> - Tom
> 
> 2013-11-06  Tom de Vries  
> 
>   * tree-ssa-tail-merge.c (gimple_dont_merge_p): Factor function out
>   of ...
>   (gimple_equal_p): ... here, and ...
>   (find_duplicate): ... here.  Use gimple_dont_merge_p.

Re: [PATCH] Factor out gimple_operand_equal_value_p from gimple_equal_p

2013-11-11 Thread Richard Biener

On Sat, 9 Nov 2013, Tom de Vries wrote:

> Richard,
> 
> This patch factors out gimple_operand_equal_value_p from gimple_equal_p.
> 
> Bootstrapped and regtested on x86_64.
> 
> OK for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> - Tom
> 
> 2013-11-06  Tom de Vries  
> 
>   * tree-ssa-tail-merge.c (gimple_operand_equal_value_p): Factor new
>   function out of ...
>   (gimple_equal_p): ... here.

Re: [PATCH] Handle GIMPLE_ASSIGNs with different vuse in gimple_equal_p

2013-11-11 Thread Richard Biener

On Sat, 9 Nov 2013, Tom de Vries wrote:

> Richard,
> 
> Consider the test-case test.c:
> ...
> int z;
> int x;
> 
> void
> f (int c, int d)
> {
>   if (c)
> z = 5;
>   else
> {
>   if (d)
>   x = 4;
>   z = 5;
> }
> }
> ...
> 
> Atm, we don't tail-merge the 'z = 5' blocks, because gimple_equal_p returns
> false for the 'z = 5' statements. The relevant code is this:
> ...
>   if (TREE_CODE (lhs1) != SSA_NAME
>   && TREE_CODE (lhs2) != SSA_NAME)
> return (vn_valueize (gimple_vdef (s1))
> == vn_valueize (gimple_vdef (s2)));
> ...
> The vdefs of the 'z = 5' statements are different, because the incoming vuses
> are different.
> 
> This patch handles GIMPLE_ASSIGNs with different vuse in gimple_equal_p, by
> doing a structural comparison.
> 
> Bootstrapped and regtested on x86_64.
> 
> OK for trunk?

Comments inline


diff --git a/gcc/tree-ssa-tail-merge.c b/gcc/tree-ssa-tail-merge.c
index 98b5882..43516a7 100644
--- a/gcc/tree-ssa-tail-merge.c
+++ b/gcc/tree-ssa-tail-merge.c
@@ -1173,8 +1173,47 @@ gimple_equal_p (same_succ same_succ, gimple s1, gimple 
s2)
   lhs2 = gimple_get_lhs (s2);
   if (TREE_CODE (lhs1) != SSA_NAME
  && TREE_CODE (lhs2) != SSA_NAME)
-   return (vn_valueize (gimple_vdef (s1))
-   == vn_valueize (gimple_vdef (s2)));
+   {
+ /* If the vdef is the same, it's the same statement.  */
+ if (vn_valueize (gimple_vdef (s1))
+ == vn_valueize (gimple_vdef (s2)))
+   return true;
+
+ /* If the vdef is not the same but the vuse is the same, it's not the
+same stmt.  */
+ if (vn_valueize (gimple_vuse (s1))
+ == vn_valueize (gimple_vuse (s2)))
+   return false;

What's the logic behind this?  We want to use VN to get more "positive"
results - doing a negative early out looks suspicious to me ...

+ /* If the vdef is not the same and the vuse is not the same, it might 
be
+same stmt.  */
+
+ /* Test for structural equality.  */
+ if (gimple_assign_rhs_code (s1) != gimple_assign_rhs_code (s1)

s2

+ || (gimple_assign_nontemporal_move_p (s1)
+ != gimple_assign_nontemporal_move_p (s2)))

I don't think we should care (it'll be false - a later pass sets it,
it's an optimization hint, not a correctness issue).  More interesting
would be to assert that has_volatile_ops is the same if the operands
turned out to be the same.

+   return false;
+
+ if (!operand_equal_p (lhs1, lhs2, 0))
+   return false;
+
+ t1 = gimple_assign_rhs1 (s1);
+ t2 = gimple_assign_rhs1 (s2);
+ if (!gimple_operand_equal_value_p (t1, t2))
+   return false;
+
+ t1 = gimple_assign_rhs2 (s1);
+ t2 = gimple_assign_rhs2 (s2);
+ if (!gimple_operand_equal_value_p (t1, t2))
+   return false;
+
+ t1 = gimple_assign_rhs3 (s1);
+ t2 = gimple_assign_rhs3 (s2);
+ if (!gimple_operand_equal_value_p (t1, t2))
+   return false;

  for (i = 1; i < gimple_num_ops (s1); ++i)
t1 = gimple_op (s1, i);
...

but I think you should only compare rhs1 and thus only handle
GIMPLE_ASSIGN_SINGLEs this way - the others have a SSA name
lhs.

That makes the whole thing just

  if (TREE_CODE (lhs1) != SSA_NAME
  && TREE_CODE (lhs2) != SSA_NAME)
{
   if (vn_valueize (gimple_vdef (s1))
   == vn_valueize (gimple_vdef (s2)))
 return true;
   return operand_equal_p (lhs1, lhs2, 0)
   && gimple_operand_equal_value_p (gimple_assign_rhs1 (s1),
 gimple_assign_rhs2 (s2));
}

Ok with doing it this way.

Thanks,
Richard.

+ /* Same structure.  */
+ return true;
+   }
   else if (TREE_CODE (lhs1) == SSA_NAME
   && TREE_CODE (lhs2) == SSA_NAME)
return vn_valueize (lhs1) == vn_valueize (lhs2);

Re: [RFC] replace malloc with a decl on the stack

2013-11-11 Thread Ondřej Bílka

On Sun, Nov 10, 2013 at 04:27:00PM +0100, Marc Glisse wrote:
> Hello,
> 
> I am posting this patch to get some feedback on the approach. The
> goal is to replace malloc+free with a stack allocation (a decl
> actually) when the size is a small constant.
>
Why constraint yourself to small sizes. Stack allocation benefits is
speed and less memory comsumption due lack of fragmentation.

A possible way is to have thread local bounds to stack size and call
function with custom logic when it is outside of bounds.

Below is a simple implementation which creates a separate stack for
that (for simplicity and because it does not need to find bounds on
thread stack.)

With bit of more work it could do allocations in similar way as in
splitstack.

 
> For testing, I highjacked the "leaf" attribute, but it isn't right,
> I'll remove it from the list (not sure what I'll do for the
> testcases then). What I'd want instead is a "returns" attribute that
> means the function will return (either by "return" or an exception),
> as opposed to having an infinite loop, calling exit or longjmp, etc
> (sched-deps.c has a related call_may_noreturn_p). The situation I am
> trying to avoid is:
> p=malloc(12);
> f(p)
> free(p)
> 
> where f contains somewhere:
> free(p); exit(42);
> (or longjmp or something else that takes the regular call to free
> out of the execution flow).
> 
One of plans to extend malloc is add custom free handler, interface
would be something like dalloc(amount, destructor) which would invoke
destructor on free. Main motivation is memory pool that can be returned
by free.

With that extension it would be possible to mark pointer so its free
would be a nop.
> 
> 
> The size above which the malloc->stack transformation is not applied
> should depend on a parameter, I don't know if it should get its own
> or depend on an existing one. In any case, I'd like to avoid ending
> up with a ridiculously low threshold (my programs use GMP, which
> internally uses alloca up to 65536 bytes (even in recursive
> functions that have a dozen allocations), so I don't want gcc to
> tell me that 50 bytes are too much).
> 
> A program with a double-free may, with this patch, end up crashing
> on the first free instead of the second, but that's an invalid
> program anyway.
> 
> 

#include 
__thread void *__stack_from;
__thread void *__stack_cur;
__thread void *__stack_to;


#define STACK_ALLOC(size) ({ \
  void *__stack_new = __stack_cur + size; \
  if (__stack_new < __stack_cur || __stack_to > __stack_new) \
__stack_alloc (size); \
  else \
  { \
void *__s = __stack_cur; \
__stack_cur = __stack_new; \
__s; \
  } \
})

#define STACK_FREE(__stack_new) ({ \
  if (__stack_new < __stack_from || __stack_to > __stack_new) \
__stack_free (size); \
  else \
__stack_cur = __stack_new; \
})

static pthread_key_t key;
void
__stack_destroy (void *x)
{
  free (stack_from);
}
void *
__stack_alloc (size_t size)
{
  if (!__stack_from)
{
  __stack_from = malloc (1 << 18);
  __stack_to = __stack_from + (1 << 18);
  __stack_cur = __stack_from; _ pthread_key_create (&key, destroy);
  pthread_setspecific (key, &key);
}
  return malloc (size);
}
void
__stack_free (void *p)
{
  free (p);
}

[patch] Fix PR ada/35998

2013-11-11 Thread Eric Botcazou

Hi,

this is an old bug report from Jan, which was closed, then reopened by Tom at 
some point, but the patch never got reviewed.  The original submission is at:
  http://gcc.gnu.org/ml/gcc-patches/2008-05/msg01857.html

Tested on x86_64-suse-linux, OK for the mainline?


2013-11-11  Jan Kratochvil  

PR ada/35998
* dwarf2out.c (add_byte_size_attribute): Omit attribute for size -1.


-- 
Eric BotcazouIndex: dwarf2out.c
===
--- dwarf2out.c	(revision 20)
+++ dwarf2out.c	(working copy)
@@ -16355,9 +16355,10 @@ add_byte_size_attribute (dw_die_ref die,
 
   /* Note that `size' might be -1 when we get to this point.  If it is, that
  indicates that the byte size of the entity in question is variable.  We
- have no good way of expressing this fact in Dwarf at the present time,
- so just let the -1 pass on through.  */
-  add_AT_unsigned (die, DW_AT_byte_size, size);
+ have no good way of expressing this fact in Dwarf at the present time
+ when location description was not used by the caller code instead.  */
+  if (size != (unsigned) -1)
+add_AT_unsigned (die, DW_AT_byte_size, size);
 }
 
 /* For a FIELD_DECL node which represents a bit-field, output an attribute

Re: [RFC] replace malloc with a decl on the stack

2013-11-11 Thread Jakub Jelinek

On Mon, Nov 11, 2013 at 11:08:14AM +0100, Ondřej Bílka wrote:
> On Sun, Nov 10, 2013 at 04:27:00PM +0100, Marc Glisse wrote:
> > I am posting this patch to get some feedback on the approach. The
> > goal is to replace malloc+free with a stack allocation (a decl
> > actually) when the size is a small constant.
> >
> Why constraint yourself to small sizes. Stack allocation benefits is
> speed and less memory comsumption due lack of fragmentation.

Because you can hardly predict what the program will have as stack
requirements in functions you call?  In leaf functions sure, you only care
not to create too large allocations that would go over the stack size,
but if you call other functions, you usually can't know (at least without
sufficient IPA analysis, but that is really hard because it is too early)
how much stack will it really need (both fixed requirements for non-VLA
vars on the stack, spill space, function call arguments and other overhead
and VLAs and also these malloc turned into stack allocation).
So, if you say have a malloc with corresponding free shortly afterwards
but some call in between and decide that it is fine to change it into stack
allocation when it is half the size of the remaining stack space, but then
two frames down there will be some non-VLA var that needs 3/4 of the old
remaining stack space, you've turned a correct program into a broken one.

Jakub

Re: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals

2013-11-11 Thread Dodji Seketeli

Hello,

As it appeared that concerns about the speed of
location_get_source_line were as present as the need of just fixing
this bug, I have conflated the two concerns in a new attempt below,
trying to address the points you guys have raised during the previous
reviews.

The patch below introduces a cache for the data read from the file we
want to emit caret diagnostic for.  In that cache it stashes the bytes
read from the file as well as a number of positions of line delimiters
that we encountered while reading the file.  It keeps a number of the
last file caches in memory in case location_get_source_line is later
asked for lines from the same file.

To avoid exploding the memory consumption, the number line delimiter
position saved is fixed (100).  So if a file is smaller than 100 lines
all of its line positions can be saved.  That is, if
location_get_source_line is first asked to return line 20, all the
position of the lines encountered since the beginning of the file --
up to line 20 -- are going to be saved in the cache.  Next time, if
location_get_source_line is asked to return line 10, as the position
of the beginning/end of line 10 is saved in the cache, returning that
line is fast.  If it's asked to return line 25, it will have to start
reading from line 20, not from the beginning of the file.

If the file is bigger than 100, then the patch just saves 100 line
positions.  To evenly spread the line position saved, it needs to know
the total number lines of the file.  Luckily we can usually get this
information from the line map subsystem (from libcpp).  The patch thus
adds a new entry point in the line map
(linemap_get_file_highest_location) that gives the greatest
source_location seen for a given file and uses that to decide what
line position to save in the cache.

The speed gain I have seen is variable, depending on the size (in
number of quasi adjacent lines) of the diagnostics, but on some
pathological cases I have seen, it can divide the time spend
displaying the diagnostics by two ore more.  I had to add hackery in
the code to measure this, unfortunately :-(

The patch doesn't try to reuse the same infrastructure for gcov for
now.  I am letting that for later now when I have more time.

Bootstrapped on x86_64-unknown-linux-gnu against trunk.

PS: To ease the review (especially for Tom Tromey who I am CC-ing
because of the new entry point in the line map sub-system) I am
attaching the cover letter of the patch itself that does the analysis
of the initial bug.  Sorry to the other addressees of this message for
the redundancy.

Thanks.


>8<---
In this problem report, the compiler is fed a (bogus) translation unit
in which some literals contain bytes whose value is zero.  The
preprocessor detects that and proceeds to emit diagnostics for that
king of bogus literals.  But then when the diagnostics machinery
re-reads the input file again to display the bogus literals with a
caret, it attempts to calculate the length of each of the lines it got
using fgets.  The line length calculation is done using strlen.  But
that doesn't work well when the content of the line can have several
zero bytes.  The result is that the read_line never sees the end of
the line because strlen repeatedly reports that the line ends before
the end-of-line character; so read_line thinks its buffer for reading
the line is too small; it thus increases the buffer, leading to a huge
memory consumption and disaster.

Here is what this patch does.

location_get_source_line is modified to return the length of a source
line that can now contain bytes with zero value.
diagnostic_show_locus() is then modified to consider that a line can
have characters of value zero, and so just shows a white space when
instructed to display one of these characters.

Additionally location_get_source_line is modified to avoid re-reading
each and every line from the beginning of the file until it reaches
the line number N that it is instructed to get; this was leading to
annoying quadratic behaviour when reading adjacent lines near the end
of (big) files.  So a cache is now associated to the file opened in
text mode.  When the content of the file is read, that content is
stashed in the file cache.  That file cache is searched for line
delimiters.  A number of line positions are saved in the cache and a
number of file caches are kept in memory.  That way when
location_get_source_line is asked to read line N + 1, it just has to
start reading from line N that it has already read.
>8<---

And now the real patch.

libcpp/ChangeLog:

* include/line-map.h (linemap_get_file_highest_location): Declare
new function.
* line-map.c (linemap_get_file_highest_location): Define it.

gcc/ChangeLog:

* input.h (location_get_source_line): Take an additional line_size
parameter.
(void diagno

Committed: config/arc/arc.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Define.

2013-11-11 Thread Joern Rennecke



2013-11-11  Joern Rennecke  

* config/arc/arc.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Define.

Index: config/arc/arc.h
===
--- config/arc/arc.h(revision 204665)
+++ config/arc/arc.h(working copy)
@@ -1087,6 +1087,22 @@ #define MEMORY_MOVE_COST(MODE,CLASS,IN)
expensive than reg->reg moves.  */
 #define BRANCH_COST(speed_p, predictable_p) 2
 
+/* Scc sets the destination to 1 and then conditionally zeroes it.
+   Best case, ORed SCCs can be made into clear - condset - condset.
+   But it could also end up as five insns.  So say it costs four on
+   average.
+   These extra instructions - and the second comparison - will also be
+   an extra cost if the first comparison would have been decisive.
+   So get an average saving, with a probability of the first branch
+   beging decisive of p0, we want:
+   p0 * (branch_cost - 4) > (1 - p0) * 5
+   ??? We don't get to see that probability to evaluate, so we can
+   only wildly guess that it might be 50%.
+   ??? The compiler also lacks the notion of branch predictability.  */
+#define LOGICAL_OP_NON_SHORT_CIRCUIT \
+  (BRANCH_COST (optimize_function_for_speed_p (cfun), \
+   false) > 9)
+
 /* Nonzero if access to memory by bytes is slow and undesirable.
For RISC chips, it means that access to memory by bytes is no
better than access by words when possible, so grab a whole word

Re: RFA: Fix PR middle-end/59049

2013-11-11 Thread Richard Biener

On Fri, Nov 8, 2013 at 5:02 PM, Jeff Law  wrote:
> On 11/08/13 07:45, Steven Bosscher wrote:
>>
>> On Fri, Nov 8, 2013 at 3:40 PM, Joern Rennecke
>>  wrote:
>>>
>>> bootstrapped / regtested on i686-pc-linux-gnu.
>>
>>
>> Not a very elaborate description of the patch, eh? :-)
>>
>> This is IMHO not OK without at least an explanation of why the
>> comparison of two const_ints is not folded. Better yet would be to fix
>> that underlying problem. We should not present such non-sense to the
>> RTL parts of the middle end.
> Agreed.

In this case it's fold-all-builtins folding a strlen call with a
PHI <"foo", "bar"> argument.  IMHO not presenting RTL with such
non-sense is best achieved by not letting TER do constant propagation
(because it doesn't "fold" the result).  We can never rule out such
stray non-propagated constants, so that makes expand more robust
(and hopes for RTL CCP).

Index: gcc/tree-ssa-ter.c
===
--- gcc/tree-ssa-ter.c  (revision 204664)
+++ gcc/tree-ssa-ter.c  (working copy)
@@ -438,6 +439,12 @@ ter_is_replaceable_p (gimple stmt)
  && !is_gimple_val (gimple_assign_rhs1 (stmt)))
return false;

+  /* Do not propagate "modeless" constants - we may end up
confusing the RTL
+expanders.  Leave the optimization to RTL CCP.  */
+  if (gimple_assign_single_p (stmt)
+ && CONSTANT_CLASS_P (gimple_assign_rhs1 (stmt)))
+   return false;
+
   return true;
 }
   return false;

does that make sense?  I'll test it then.

Thanks,
Richard.

> jeff
>

Re: [RFC] replace malloc with a decl on the stack

2013-11-11 Thread Ondřej Bílka

On Mon, Nov 11, 2013 at 11:19:05AM +0100, Jakub Jelinek wrote:
> On Mon, Nov 11, 2013 at 11:08:14AM +0100, Ondřej Bílka wrote:
> > On Sun, Nov 10, 2013 at 04:27:00PM +0100, Marc Glisse wrote:
> > > I am posting this patch to get some feedback on the approach. The
> > > goal is to replace malloc+free with a stack allocation (a decl
> > > actually) when the size is a small constant.
> > >
> > Why constraint yourself to small sizes. Stack allocation benefits is
> > speed and less memory comsumption due lack of fragmentation.
> 
> Because you can hardly predict what the program will have as stack
> requirements in functions you call?  In leaf functions sure, you only care
> not to create too large allocations that would go over the stack size,
> but if you call other functions, you usually can't know (at least without
> sufficient IPA analysis, but that is really hard because it is too early)

Which is completely irrelevant. Checking it in run time is easy as in my
prototype which uses a alternative stack so added stack space does not
coun.

> how much stack will it really need (both fixed requirements for non-VLA
> vars on the stack, spill space, function call arguments and other overhead
> and VLAs and also these malloc turned into stack allocation).

This is argument to turn a alloca and VLA to ones that check if there is
enough stack and get space by other means. This can be with same logic
as described above. 

> So, if you say have a malloc with corresponding free shortly afterwards
> but some call in between and decide that it is fine to change it into stack
> allocation when it is half the size of the remaining stack space, but then

Of total stack space, also if we use main stack user should enlarge stacks
accordingly.

> two frames down there will be some non-VLA var that needs 3/4 of the old
> remaining stack space, you've turned a correct program into a broken one.
> 
>   Jakub

[patch] Fix debug info for modified parameter

2013-11-11 Thread Eric Botcazou

Hi,

since the switch to SSA form at -O0, the compiler generates wrong debug info 
for something like:

void
foo (int i)
{
  int j = 0;
  i = 1;
  j = j + 1;  /* BREAK */
}

If you try to display the value of i after breaking in GDB, you don't get 1.
The reason is that there is no default def SSA_NAME for i in this case, so no 
partitions get the RTL location of the parameter.

Tentative patch attached, it's admittedly a little heavy, but I don't see any 
other solution.  Tested on x86_64-suse-linux, OK for the mainline?


2013-11-11  Eric Botcazou  

* tree-outof-ssa.c (remove_ssa_form): For a parameter without default
def, pretend that the single partition that contains all the SSA_NAMEs
for this parameter, if it exists, also contains the default def.


2013-11-11  Eric Botcazou  

* gcc.dg/guality/param-4.c: New test.


-- 
Eric BotcazouIndex: tree-outof-ssa.c
===
--- tree-outof-ssa.c	(revision 20)
+++ tree-outof-ssa.c	(working copy)
@@ -972,7 +972,10 @@ expand_phi_nodes (struct ssaexpand *sa)
 static void
 remove_ssa_form (bool perform_ter, struct ssaexpand *sa)
 {
+  tree fndecl = current_function_decl, arg;
   bitmap values = NULL;
+  bitmap parameter_has_default_def;
+  struct pointer_map_t *parameter_map;
   var_map map;
   unsigned i;
 
@@ -999,17 +1002,85 @@ remove_ssa_form (bool perform_ter, struc
 
   sa->map = map;
   sa->values = values;
+
+  /* Find out which partitions contain a default def.  If, for a parameter, no
+ partitions contain a default def for this parameter, then we pretend that
+ the single partition that contains all the SSA_NAMEs for this parameter
+ also contains the (ghost) default def, if such a partition exists.  This
+ makes it possible to have correct debug info, without optimization, for
+ parameters written to before being read.  */
   sa->partition_has_default_def = BITMAP_ALLOC (NULL);
+  parameter_has_default_def = BITMAP_ALLOC (NULL);
+  parameter_map = pointer_map_create ();
+
   for (i = 1; i < num_ssa_names; i++)
 {
-  tree t = ssa_name (i);
-  if (t && SSA_NAME_IS_DEFAULT_DEF (t))
+  tree t, var;
+
+  t = ssa_name (i);
+  if (t == NULL_TREE)
+	continue;
+
+  var = SSA_NAME_VAR (t);
+
+  /* If this is a default def in a partition, set the appropriate flag for
+	 the partition and, if this is a SSA_NAME for a parameter, record that
+	 there is a default def for the parameter.  */
+  if (SSA_NAME_IS_DEFAULT_DEF (t))
+	{
+	  int p = var_to_partition (map, t);
+	  if (p != NO_PARTITION)
+	{
+	  bitmap_set_bit (sa->partition_has_default_def, p);
+	  if (var && TREE_CODE (var) == PARM_DECL)
+		bitmap_set_bit (parameter_has_default_def, DECL_UID (var));
+	}
+	}
+
+  /* Otherwise, if this is a SSA_NAME for a parameter without default def,
+	 at least yet, and in a partition, record the partition as associated
+	 with the parameter, unless there is already a different one recorded,
+	 in which case record NO_PARTITION.  */
+  else if (var && TREE_CODE (var) == PARM_DECL
+	   && !bitmap_bit_p (parameter_has_default_def, DECL_UID (var)))
 	{
 	  int p = var_to_partition (map, t);
 	  if (p != NO_PARTITION)
-	bitmap_set_bit (sa->partition_has_default_def, p);
+	{
+	  void **slot = pointer_map_contains (parameter_map, var);
+	  if (slot)
+		{
+		  int oldp = (int) (intptr_t) *slot;
+		  if (oldp != p)
+		*slot = (void *) (intptr_t) NO_PARTITION;
+		}
+	  else
+		{
+		  slot = pointer_map_insert (parameter_map, var);
+		  *slot = (void *) (intptr_t) p;
+		}
+	}
 	}
 }
+
+  /* Finally walk the scalar parameters without default def and mark that
+ the partition that contains all the SSA_NAMEs for a parameter, if it
+ exists, contains the (ghost) default def.  */
+  for (arg = DECL_ARGUMENTS (fndecl); arg; arg = DECL_CHAIN (arg))
+if (is_gimple_reg_type (TREE_TYPE (arg))
+	&& !bitmap_bit_p (parameter_has_default_def, DECL_UID (arg)))
+  {
+	void **slot = pointer_map_contains (parameter_map, arg);
+	if (slot)
+	  {
+	int p = (int) (intptr_t) *slot;
+	if (p != NO_PARTITION)
+	  bitmap_set_bit (sa->partition_has_default_def, p);
+	  }
+  }
+
+  pointer_map_destroy (parameter_map);
+  BITMAP_FREE (parameter_has_default_def);
 }
 
 /* { dg-do run } */
/* { dg-options "-g" } */
/* { dg-skip-if "" { *-*-* }  { "*" } { "-O0" } } */

void
foo (int i)
{
  int j = 0;
  i = 1;
  j = j + 1;  /* BREAK */
}

int
main (void)
{
  foo (0);
  return 0;
}

/* { dg-final { gdb-test 10 "i" "1" } } */

[patch] Add CONSTRUCTOR_NO_CLEARING flag

2013-11-11 Thread Eric Botcazou

Hi,

in Ada 2012 it is allowed to omit components of aggregates (the equivalent of 
initializers/constructors); in this case, the contents of the corresponding 
fields in the record become undefined.  Now the gimplifier implements the C 
semantics of clearing the missing components, so this patch introduces a new 
flag on constructors to modify that.

Tested on x86_64-suse-linux, OK for the mainline?


2013-11-11  Tristan Gingold  
Eric Botcazou  

* tree.h (CONSTRUCTOR_NO_CLEARING): Define.
* tree-core.h (CONSTRUCTOR_NO_CLEARING): Document it.
* tree.def (CONSTRUCTOR): Likewise.
* gimplify.c (gimplify_init_constructor): Do not clear the object when
the constructor is incomplete and CONSTRUCTOR_NO_CLEARING is set.
ada/
* gcc-interface/utils2.c (gnat_build_constructor): Also set the flag
CONSTRUCTOR_NO_CLEARING on the constructor.


-- 
Eric BotcazouIndex: tree-core.h
===
--- tree-core.h	(revision 20)
+++ tree-core.h	(working copy)
@@ -823,6 +823,9 @@ struct GTY(()) tree_base {
VAR_DECL, FUNCTION_DECL
IDENTIFIER_NODE
 
+   CONSTRUCTOR_NO_CLEARING in
+   CONSTRUCTOR
+
ASM_VOLATILE_P in
ASM_EXPR
 
Index: tree.h
===
--- tree.h	(revision 20)
+++ tree.h	(working copy)
@@ -957,6 +957,8 @@ extern void omp_clause_range_check_faile
   (&(*CONSTRUCTOR_ELTS (NODE))[IDX])
 #define CONSTRUCTOR_NELTS(NODE) \
   (vec_safe_length (CONSTRUCTOR_ELTS (NODE)))
+#define CONSTRUCTOR_NO_CLEARING(NODE) \
+  (CONSTRUCTOR_CHECK (NODE)->base.public_flag)
 
 /* Iterate through the vector V of CONSTRUCTOR_ELT elements, yielding the
value of each element (stored within VAL). IX must be a scratch variable
Index: tree.def
===
--- tree.def	(revision 20)
+++ tree.def	(working copy)
@@ -458,7 +458,10 @@ DEFTREECODE (OBJ_TYPE_REF, "obj_type_ref
value in a SAVE_EXPR if you want to evaluate side effects only once.)
 
For RECORD_TYPE, UNION_TYPE, or QUAL_UNION_TYPE:
-   The field INDEX of each node is a FIELD_DECL.  */
+   The field INDEX of each node is a FIELD_DECL.
+   Components that aren't present are cleared as per the C semantics,
+   unless the CONSTRUCTOR_NO_CLEARING flag is set, in which case they
+   become undefined.  */
 DEFTREECODE (CONSTRUCTOR, "constructor", tcc_exceptional, 0)
 
 /* The expression types are mostly straightforward, with the fourth argument
Index: gimplify.c
===
--- gimplify.c	(revision 20)
+++ gimplify.c	(working copy)
@@ -4052,7 +4052,7 @@ gimplify_init_constructor (tree *expr_p,
 	 objects.  Initializers for such objects must explicitly set
 	 every field that needs to be set.  */
 	  cleared = false;
-	else if (!complete_p)
+	else if (!complete_p && !CONSTRUCTOR_NO_CLEARING (ctor))
 	  /* If the constructor isn't complete, clear the whole object
 	 beforehand.
 
Index: ada/gcc-interface/utils2.c
===
--- ada/gcc-interface/utils2.c	(revision 20)
+++ ada/gcc-interface/utils2.c	(working copy)
@@ -1874,6 +1874,7 @@ gnat_build_constructor (tree type, vecqsort (compare_elmt_bitpos);
 
   result = build_constructor (type, v);
+  CONSTRUCTOR_NO_CLEARING (result) = 1;
   TREE_CONSTANT (result) = TREE_STATIC (result) = allconstant;
   TREE_SIDE_EFFECTS (result) = side_effects;
   TREE_READONLY (result) = TYPE_READONLY (type) || allconstant;

[ARM] Use standard t-elf libgcc fragment on VxWorks

2013-11-11 Thread Eric Botcazou

Hi,

this is something I forgot to submit right after submitting
  http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01906.html
We want to use the standard t-elf fragment on VxWorks as well.

Tested on ARM/VxWorks, OK for the mainline?


2013-11-11  Eric Botcazou  

* config.host (arm-wrs-vxworks): Replace arm/t-vxworks with arm/t-elf
in tmake_file.
* config/arm/t-vxworks: Delete.


-- 
Eric BotcazouIndex: config.host
===
--- config.host	(revision 20)
+++ config.host	(working copy)
@@ -333,7 +333,7 @@ arc*-*-linux-uclibc*)
 	extra_parts="crti.o crtn.o crtend.o crtbegin.o crtendS.o crtbeginS.o libgmon.a crtg.o crtgend.o"
 	;;
 arm-wrs-vxworks)
-	tmake_file="$tmake_file arm/t-arm arm/t-vxworks t-softfp-sfdf t-softfp-excl arm/t-softfp t-softfp"
+	tmake_file="$tmake_file arm/t-arm arm/t-elf t-softfp-sfdf t-softfp-excl arm/t-softfp t-softfp"
 	extra_parts="$extra_parts crti.o crtn.o"
 	;;
 arm*-*-netbsdelf*)
Index: config/arm/t-vxworks
===
--- config/arm/t-vxworks	(revision 20)
+++ config/arm/t-vxworks	(working copy)
@@ -1 +0,0 @@
-LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_tls _bb_init_func _call_via_rX _interwork_call_via_rX _clzsi2 _clzdi2 _ctzsi2

Re: RFA: Fix PR middle-end/59049

2013-11-11 Thread Eric Botcazou

> In this case it's fold-all-builtins folding a strlen call with a
> PHI <"foo", "bar"> argument.  IMHO not presenting RTL with such
> non-sense is best achieved by not letting TER do constant propagation
> (because it doesn't "fold" the result).  We can never rule out such
> stray non-propagated constants, so that makes expand more robust
> (and hopes for RTL CCP).
> 
> Index: gcc/tree-ssa-ter.c
> ===
> --- gcc/tree-ssa-ter.c  (revision 204664)
> +++ gcc/tree-ssa-ter.c  (working copy)
> @@ -438,6 +439,12 @@ ter_is_replaceable_p (gimple stmt)
>   && !is_gimple_val (gimple_assign_rhs1 (stmt)))
> return false;
> 
> +  /* Do not propagate "modeless" constants - we may end up
> confusing the RTL
> +expanders.  Leave the optimization to RTL CCP.  */
> +  if (gimple_assign_single_p (stmt)
> + && CONSTANT_CLASS_P (gimple_assign_rhs1 (stmt)))
> +   return false;
> +
>return true;
>  }
>return false;
> 
> does that make sense?  I'll test it then.

I agree with Joern that we want more constant propagation/folding during 
RTL expansion, not less, so IMO that's the wrong direction.

-- 
Eric Botcazou

Re: RFA: Fix PR middle-end/59049

2013-11-11 Thread Richard Biener

On Mon, Nov 11, 2013 at 12:21 PM, Eric Botcazou  wrote:
>> In this case it's fold-all-builtins folding a strlen call with a
>> PHI <"foo", "bar"> argument.  IMHO not presenting RTL with such
>> non-sense is best achieved by not letting TER do constant propagation
>> (because it doesn't "fold" the result).  We can never rule out such
>> stray non-propagated constants, so that makes expand more robust
>> (and hopes for RTL CCP).
>>
>> Index: gcc/tree-ssa-ter.c
>> ===
>> --- gcc/tree-ssa-ter.c  (revision 204664)
>> +++ gcc/tree-ssa-ter.c  (working copy)
>> @@ -438,6 +439,12 @@ ter_is_replaceable_p (gimple stmt)
>>   && !is_gimple_val (gimple_assign_rhs1 (stmt)))
>> return false;
>>
>> +  /* Do not propagate "modeless" constants - we may end up
>> confusing the RTL
>> +expanders.  Leave the optimization to RTL CCP.  */
>> +  if (gimple_assign_single_p (stmt)
>> + && CONSTANT_CLASS_P (gimple_assign_rhs1 (stmt)))
>> +   return false;
>> +
>>return true;
>>  }
>>return false;
>>
>> does that make sense?  I'll test it then.
>
> I agree with Joern that we want more constant propagation/folding during
> RTL expansion, not less, so IMO that's the wrong direction.

The question is whether you for example want to handle

  a_2 = 1 + 0;

at RTL expansion time?  I'd say it's better to have

  b_1 = 0;
  a_2 = 1 + b_1;

not to say the proposed patch would be a way to ensure the first
didn't happen - it just makes it less likely that TER gets you
into this.

OTOH as TER is now "explicit" (the expander has to lookup SSA defs)
those uses should better deal with constants they receive from that.

Richard.

> --
> Eric Botcazou

Re: [PATCH 2/3] libstdc++-v3: ::tmpnam depends on uClibc SUSV4_LEGACY

2013-11-11 Thread Jonathan Wakely

On 8 November 2013 10:29, Bernhard Reutner-Fischer wrote:
>> On 04/11/2013 02:04 PM, Bernhard Reutner-Fischer wrote:
>>>
>>> I would have expected that somebody would tell me that omitting ::tmpnam
>>> violates 27.9.2  from the spec but noone yelled at me yet?

std::tmpnam, like std::gets, should be killed with fire. If a target C
library doesn't provide it then I have no problem is libstdc++ doesn't
provide it either.

> Attaching an updated patch that i was using since March (without
> regressions) which takes Rainer's comments about _GLIBCXX_USE_TMPNAM
> into account.
> Ok for trunk?

Thanks for following this up.

I'm curious why you use tmpnam("NULL") rather than tmpnam(NULL) or
tmpnam("something")?  Using the string literal "NULL" is a bit
confusing (although not a problem.)

How does __UCLIBC_SUSV4_LEGACY__ get defined?  We'd have a problem if
users defined that at configure time but not later when using the
library.

Re: [ARM] Use standard t-elf libgcc fragment on VxWorks

2013-11-11 Thread Ramana Radhakrishnan


On 11/11/13 11:11, Eric Botcazou wrote:

Hi,

this is something I forgot to submit right after submitting
   http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01906.html
We want to use the standard t-elf fragment on VxWorks as well.

Tested on ARM/VxWorks, OK for the mainline?


Ok, please apply.

regards
Ramana

Re: RFA: Fix PR middle-end/59049

2013-11-11 Thread Jakub Jelinek

On Mon, Nov 11, 2013 at 12:26:09PM +0100, Richard Biener wrote:
> The question is whether you for example want to handle
> 
>   a_2 = 1 + 0;
> 
> at RTL expansion time?  I'd say it's better to have

I think we already handle that just fine, there are tons of various
simplify_gen_* calls during expansion, and we know there the mode etc.

Just Joern hit a place which wasn't prepared to handle it properly,
so either we handle it as you are suggesting by forcing one of the
constants into a register, or we simplify the comparison and if it
simplifies into a constant, we transform it to something simpler.

Jakub

Re: [PATCH] Enhance ifcombine to recover non short circuit branches

2013-11-11 Thread Richard Biener

On Fri, Nov 8, 2013 at 6:41 PM, Steven Bosscher  wrote:
> On Fri, Nov 8, 2013 at 6:20 PM, Steven Bosscher wrote:
>> On Wed, Oct 30, 2013 at 5:03 AM, Andrew Pinski wrote:
>>> Here is what I applied in the end; Jeff told me just to remove the
>>> testcase.  I added the comment trying to explain why it was the
>>> opposite order of PHI-opt.
>>>
>>> Thanks,
>>> Andrew Pinski
>>>
>>> ChangeLog:
>>> * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
>>
>> Eh, why???
>>
>> The file has this comment:
>>
>> 25/* rtl is needed only because arm back-end requires it for
>> 26   BRANCH_COST.  */
>> 27#include "rtl.h"
>> 28#include "tm_p.h"
>>
>> Can you please clarify why this is not something to be fixed in the
>> ARM back end?
>
>
> Could be fixed like attached. Can you please have a look and
> foster-parent it if you like it?

Looks good to me - Richard, can you pick it up?

Thanks,
Richard.

> Thanks,
>
> Ciao!
> Steven

Re: [RFA][PATCH] Minor fix to aliasing machinery

2013-11-11 Thread Richard Biener

On Sat, Nov 9, 2013 at 12:18 AM, Marc Glisse  wrote:
> On Wed, 6 Nov 2013, Richard Biener wrote:
>
>> So the only thing that remains is the mem_ref_offset thing and yes, I
>> guess
>> I'd prefer to use double-ints because we deal with bit offsets in the end.
>
>
> Here it is (bootstrap+testsuite on x86_64-unknown-linux-gnu). It feels
> rather artificial to do this small bit of computation with double_int
> when so much else assumes HWI is enough, but why not...

Ok.

Thanks,
Richard.

> 2013-11-09  Marc Glisse  
>
> Jeff Law  
>
> gcc/
> * tree-ssa-alias.c (stmt_kills_ref_p_1): Use
> ao_ref_init_from_ptr_and_size for builtins.
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/alias-27.c: New testcase.
>
> --
> Marc Glisse
>
> Index: gcc/testsuite/gcc.dg/tree-ssa/alias-27.c
> ===
> --- gcc/testsuite/gcc.dg/tree-ssa/alias-27.c(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/alias-27.c(working copy)
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-tree-optimized" } */
> +
> +void f (long *p) {
> +  *p = 42;
> +  p[4] = 42;
> +  __builtin_memset (p, 0, 100);
> +}
> +
> +/* { dg-final { scan-tree-dump-not "= 42" "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
>
> Property changes on: gcc/testsuite/gcc.dg/tree-ssa/alias-27.c
> ___
> Added: svn:keywords
> ## -0,0 +1 ##
> +Author Date Id Revision URL
> \ No newline at end of property
> Added: svn:eol-style
> ## -0,0 +1 ##
> +native
> \ No newline at end of property
> Index: gcc/tree-ssa-alias.c
> ===
> --- gcc/tree-ssa-alias.c(revision 204608)
> +++ gcc/tree-ssa-alias.c(working copy)
> @@ -2001,23 +2001,24 @@ stmt_may_clobber_ref_p (gimple stmt, tre
>return stmt_may_clobber_ref_p_1 (stmt, &r);
>  }
>
>  /* If STMT kills the memory reference REF return true, otherwise
> return false.  */
>
>  static bool
>  stmt_kills_ref_p_1 (gimple stmt, ao_ref *ref)
>  {
>/* For a must-alias check we need to be able to constrain
> - the access properly.  */
> -  ao_ref_base (ref);
> -  if (ref->max_size == -1)
> + the access properly.
> + FIXME: except for BUILTIN_FREE.  */
> +  if (!ao_ref_base (ref)
> +  || ref->max_size == -1)
>  return false;
>
>if (gimple_has_lhs (stmt)
>&& TREE_CODE (gimple_get_lhs (stmt)) != SSA_NAME
>/* The assignment is not necessarily carried out if it can throw
>  and we can catch it in the current function where we could inspect
>  the previous value.
>  ???  We only need to care about the RHS throwing.  For aggregate
>  assignments or similar calls and non-call exceptions the LHS
>  might throw as well.  */
> @@ -2090,37 +2091,47 @@ stmt_kills_ref_p_1 (gimple stmt, ao_ref
>   case BUILT_IN_MEMPCPY:
>   case BUILT_IN_MEMMOVE:
>   case BUILT_IN_MEMSET:
>   case BUILT_IN_MEMCPY_CHK:
>   case BUILT_IN_MEMPCPY_CHK:
>   case BUILT_IN_MEMMOVE_CHK:
>   case BUILT_IN_MEMSET_CHK:
> {
>   tree dest = gimple_call_arg (stmt, 0);
>   tree len = gimple_call_arg (stmt, 2);
> - tree base = NULL_TREE;
> - HOST_WIDE_INT offset = 0;
>   if (!host_integerp (len, 0))
> return false;
> - if (TREE_CODE (dest) == ADDR_EXPR)
> -   base = get_addr_base_and_unit_offset (TREE_OPERAND (dest,
> 0),
> - &offset);
> - else if (TREE_CODE (dest) == SSA_NAME)
> -   base = dest;
> - if (base
> - && base == ao_ref_base (ref))
> + tree rbase = ref->base;
> + double_int roffset = double_int::from_shwi (ref->offset);
> + ao_ref dref;
> + ao_ref_init_from_ptr_and_size (&dref, dest, len);
> + tree base = ao_ref_base (&dref);
> + double_int offset = double_int::from_shwi (dref.offset);
> + double_int bpu = double_int::from_uhwi (BITS_PER_UNIT);
> + if (!base || dref.size == -1)
> +   return false;
> + if (TREE_CODE (base) == MEM_REF)
> +   {
> + if (TREE_CODE (rbase) != MEM_REF)
> +   return false;
> + // Compare pointers.
> + offset += bpu * mem_ref_offset (base);
> + roffset += bpu * mem_ref_offset (rbase);
> + base = TREE_OPERAND (base, 0);
> + rbase = TREE_OPERAND (rbase, 0);
> +   }
> + if (base == rbase)
> {
> - HOST_WIDE_INT size = TREE_INT_CST_LOW (len);
> - if (offset <= ref->offset / BITS_PER_UNIT
> - && (

Re: Some wide-int review comments

2013-11-11 Thread Richard Biener

On Sat, Nov 9, 2013 at 3:23 PM, Kenneth Zadeck  wrote:
> On 11/08/2013 05:30 AM, Richard Sandiford wrote:
>>  From tree-vrp.c:
>>
>> @@ -1893,6 +1884,10 @@ vrp_int_const_binop (enum tree_code code
>>   /* If the singed operation wraps then int_const_binop has done
>>  everything we want.  */
>>   ;
>> +  /* Signed division of -1/0 overflows and by the time it gets here
>> + returns NULL_TREE.  */
>> +  else if (!res)
>> +return NULL_TREE;
>> else if ((TREE_OVERFLOW (res)
>> && !TREE_OVERFLOW (val1)
>> && !TREE_OVERFLOW (val2))
>>
>> Why is this case different from trunk?  Or is it a bug-fix independent
>> of wide-int?
>
> the api for division is different for wide-int than it was for double-int.

But this is using a tree API (int_const_binop) that didn't change
(it returned NULL for / 0 previously).  So what makes us arrive here
now?  (I agree there is a bug in VRP, but it shouldn't manifest itself
only on wide-int)

Richard.

>>
>> Thanks,
>> Richard
>
>

Re: some prep work to make JUMP_TABLE_DATA a non-active_insn_p object

2013-11-11 Thread Richard Biener

On Sun, Nov 10, 2013 at 6:11 PM, Eric Botcazou  wrote:
>> This is the first patch of what I think will be four to fix those few
>> places.
>>
>> Bootstrapped&tested on powerpc64-unknown-linux-gnu. Also built SH to be
>> sure.
>>
>> OK for trunk?
>
> The generic part is OK (modulo the additional space after ! in the 3rd hunk of
> the haifa-sched.c patch).

The rest is ok as well.

Richard.

> --
> Eric Botcazou

Re: [PATCH] make has_gate and has_execute useless

2013-11-11 Thread Richard Biener

On Mon, Nov 11, 2013 at 1:39 AM, Trevor Saunders  wrote:
> On Fri, Nov 08, 2013 at 10:37:00AM +0100, Richard Biener wrote:
>> On Thu, Nov 7, 2013 at 5:00 PM,   wrote:
>> > From: Trevor Saunders 
>> >
>> > Hi,
>> >
>> >  This is the result of seeing what it would take to get rid of the 
>> > has_gate and
>> > has_execute flags on pass_data.  It turns out not much, but I wanted
>> > confirmation this part is ok before I go deal with all the places that
>> > initialize the fields.
>> >
>> > I bootstrapped this part on x86_64-unknown-linux-gnu and saw no new test 
>> > suite
>> > regressions (ignoring the silk stuff because the full paths in its test 
>> > names
>> > break my test script for now) Any reason this patch with the actual 
>> > removal of the flags wouldn't be ok?
>>
>> The has_gate flag is easy to remove (without a TODO_ hack), right?
>> No gate simply means that the gate returns always true.  The only
>> weird thing is
>>
>>   /* If we have a gate, combine the properties that we could have with
>>  and without the pass being examined.  */
>>   if (pass->has_gate)
>> properties &= new_properties;
>>   else
>> properties = new_properties;
>>
>> which I don't understand (and you just removed all properties handling 
>> there).
>>
>> So can you split out removing has_gate?  This part is obviously ok.
>>
>> Then, for ->execute () I'd have refactored the code to make
>> ->sub passes explicitely executed by the default ->execute ()
>> implementation only.  That is, passes without overriding execute
>> are containers only.  Can you quickly check whether that would
>> work out?
>
> Ok, I've now given this a shot and wasn't terribly successful, if I just
> change execute_pass_list and execute_ipa_pass_list to handle container
> passes executing their sub passes I get the following ICE
>
> ./gt-passes.h:77:2: internal compiler error: Segmentation fault
>  };
>^
>0xd43d96 crash_signal
>/src/gcc/gcc/toplev.c:334
>0xd901a9 ssa_default_def(function*, tree_node*)
>/src/gcc/gcc/tree-dfa.c:318
>0xb56d77 ipa_analyze_params_uses
>/src/gcc/gcc/ipa-prop.c:2094
>0xb57275 ipa_analyze_node(cgraph_node*)
>/src/gcc/gcc/ipa-prop.c:2179
>0x13e2b6d ipcp_generate_summary
>/src/gcc/gcc/ipa-cp.c:3615
>0xc55a2a
>
> execute_ipa_summary_passes(ipa_opt_pass_d*)
>/src/gcc/gcc/passes.c:1991
>0x943341 ipa_passes
>
> /src/gcc/gcc/cgraphunit.c:2011
>0x943675
>compile()
>
> /src/gcc/gcc/cgraphunit.c:2118
>
>now
>  Which is because fn->gimple_df is null.  I expect that's because pass
>  ordering changed somehow, but I'm not sure off hand how or ifthat's
>  something worth figuring out right now.

Yeah, some of the walking doesn't seem to work ... probably a
pass with sub-passes already has an execute member function ;)

>  Another issue I realized is that doing this will change the order of
>  plugin events from
>  start container pass a
>  end container pass a
>  start contained pass b
>  end contained pass b
>  ...
>
>  to
>
>  start container pass a
>  start contained pass b
>  end contained pass b
>  end container pass a
>
>  Arguably that's an improvement, but I'm not sure if changing that APi
>  like that is acceptable.

Indeed an improvement.  Can you attach your patch?

Thanks,
Richard.

>  Trev
>
>>
>> Thanks,
>> Richard.
>>
>> > Trev
>> >
>> > 2013-11-06  Trevor Saunders  
>> >
>> > * pass_manager.h (pass_manager): Adjust.
>> > * passes.c (opt_pass::execute): Tell the pass manager it doesn't 
>> > need
>> > to do anything for this pass.
>> > (pass_manager::register_dump_files_1): Don't uselessly deal with
>> > properties of passes.
>> > (pass_manager::register_dump_files): Adjust.
>> > (dump_one_pass): Just call pass->gate ().
>> > (execute_ipa_summary_passes): Likewise.
>> > (execute_one_pass): Don't check pass->has_execute flag.
>> > (ipa_write_summaries_2): Don't check pass->has_gate flag.
>> > (ipa_write_optimization_summaries_1): Likewise.
>> > (ipa_read_summaries_1): Likewise.
>> > (ipa_read_optimization_summaries_1): Likewise.
>> > (execute_ipa_stmt_fixups): Likewise.
>> > * tree-pass.h (pass_data): Rename has_gate to useless_has_ga

Re: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 1/2

2013-11-11 Thread Richard Biener

On Thu, Oct 31, 2013 at 1:46 AM, Sandra Loosemore
 wrote:
> On 10/29/2013 02:51 AM, Bernd Edlinger wrote:
>>
>>
>> On Mon, 28 Oct 2013 21:29:24, Sandra Loosemore wrote:
>>>
>>> On 10/28/2013 03:20 AM, Bernd Edlinger wrote:

 I have attached an update to your patch, that should
 a) fix the recursion problem.
 b) restrict the -fstrict-volatile-bitfields to not violate the C++
 memory model.
>
>
> Here's a new version of the update patch.
>
>
>>> Alternatively, if strict_volatile_bitfield_p returns false but
>>> flag_strict_volatile_bitfields> 0, then always force to word_mode and
>>> change the -fstrict-volatile-bitfields documentation to indicate that's
>>> the fallback if the insertion/extraction cannot be done in the declared
>>> mode, rather than claiming that it tries to do the same thing as if
>>> -fstrict-volatile-bitfields were not enabled at all.
>
>
> I decided that this approach was more expedient, after all.
>
> I've tested this patch (in conjunction with my already-approved but
> not-yet-applied patch) on mainline for arm-none-eabi, x86_64-linux-gnu, and
> mips-linux gnu.  I also backported the entire series to GCC 4.8 and tested
> there on arm-none-eabi and x86_64-linux-gnu.  OK to apply?

Hm, I can't seem to find the context for

@@ -923,6 +935,14 @@
store_fixed_bit_field (str_rtx, bitsize, bitnum, 0, 0, value);
   return;
 }
+  else if (MEM_P (str_rtx)
+  && MEM_VOLATILE_P (str_rtx)
+  && flag_strict_volatile_bitfields > 0)
+/* This is a case where -fstrict-volatile-bitfields doesn't apply
+   because we can't do a single access in the declared mode of the field.
+   Since the incoming STR_RTX has already been adjusted to that mode,
+   fall back to word mode for subsequent logic.  */
+str_rtx = adjust_address (str_rtx, word_mode, 0);

   /* Under the C++0x memory model, we must not touch bits outside the
  bit region.  Adjust the address to start at the beginning of the

and the other similar hunk.  I suppose they apply to earlier patches
in the series?  I suppose the above applies to store_bit_field (diff -p
really helps!).  Why would using word_mode be any good as
fallback?  That is, why is "Since the incoming STR_RTX has already
been adjusted to that mode" not the thing to fix?

Richard.

> -Sandra

Re: [PATCH GCC]Refactor force_expr_to_var_cost and handle type conversion

2013-11-11 Thread Richard Biener

On Mon, Nov 11, 2013 at 8:29 AM, Bin.Cheng  wrote:
> On Fri, Nov 8, 2013 at 10:06 PM, Richard Biener
>  wrote:
>> On Fri, Nov 8, 2013 at 2:41 PM, bin.cheng  wrote:
>>> Hi,
>>> This patch refactors force_expr_to_var_cost and handles type conversion
>>> along with other tree nodes.  It is split from the patch posted at
>>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00546.html
>>> Bootstrap and test with the patch lowering address expressions on
>>> x86/x86_64/arm.  Is it OK?
>>
>> ENOPATCH
>>
> Attached here.
> I think it should be stated that this patch and the lowering one are
> logically one because we rely on this patch to compute cost of lowered
> address expression like "&arr + offset".
> Moreover, address_cost on x86/x86_64 (is 1) are small, so this patch
> has small impact on these two targets.  While address_cost on arm (is
> 9) is non-trivial, it has greater impact on ARM.  The statement is in
> line with various benchmark data.

Ok.

Thanks,
Richard.

> Thanks,
> bin
> --
> Best Regards.

Re: [RFA][PATCH] Isolate erroneous paths optimization

2013-11-11 Thread Richard Biener

On Mon, Nov 11, 2013 at 4:11 AM, Jeff Law  wrote:
> On 11/10/13 05:34, Eric Botcazou wrote:

 But I think that you cannot transform

 foo ()
 {

*0 = 1;

 }

 to __builtin_trap as you can catch the trap via an exception handler
 in a caller of foo, no?
>>>
>>>
>>> That is true.  OK, I can see an argument that when using
>>> -fnon-call-exceptions that kind of code should not be changed to call
>>> __builtin_trap.
>>
>>
>> That's exactly the reason why gnat.dg/memtrap.adb started to fail after
>> Jeff's
>> patch went it.  So, if -fisolate-erroneous-paths isn't amended, we'll need
>> to
>> disable it in Ada like in Go.
>>
>>> In that case I think it would be fine to run the isolate paths
>>> optimization, but to not omit the actual dereference of the NULL
>>> pointer (possibly the dereference could be followed by a trap).
>>
>>
>> This would probably work for Ada as well.
>
> OK.  It sounds like there's a pretty general consensus that we ought ot go
> ahead and leave in a load/store of a NULL pointer.  I'll go ahead and run
> with that.  I'll probably just emit SSA_NAME = *0, unless someone thinks we
> ought ot distinguish between loads and stores, emitting SSA_NAME = *0 and *0
> = 0 for the two cases respectively.

Can't you simply keep the trapping stmt as-is?

Richard.

> However, that brings up an couple interesting questions.
>
> Let's say we find a NULL pointer which reaches a return statement in a
> function which is marked as returns_nonnull.  In that case there is no
> dereference.  Presumably for that kind of scenario we'll just keep the
> builtin trap.
>
> Similarly, assume we extend this pass to detect out-of-bounds array
> indexing.  It's fairly simple to do and has always been in my plan.  In that
> case leaving in the array indexing  won't necessarily generate a fault.  For
> those presumably we'll just want the builtin_trap as well?
>
> Again, I don't mind inserting a *0, I just want to have a plan for the other
> undefined behaviours we currently detect and those which I plan on catching
> soon.
>
> Jeff
>

Re: [patch] Fix PR ada/35998

2013-11-11 Thread Richard Biener

On Mon, Nov 11, 2013 at 11:10 AM, Eric Botcazou  wrote:
> Hi,
>
> this is an old bug report from Jan, which was closed, then reopened by Tom at
> some point, but the patch never got reviewed.  The original submission is at:
>   http://gcc.gnu.org/ml/gcc-patches/2008-05/msg01857.html
>
> Tested on x86_64-suse-linux, OK for the mainline?

Due to the different interfaces of int_size_in_bytes and
simple_type_size_in_bits (and 'size' in add_byte_size_attribute being
unsigned and not [unsigned] HWI) it would be cleaner to
add an early return after the call to int_size_in_bytes if its
return value is -1 (and make sure the return value doesn't
overflow an unsigned int - likewise for simple_type_size_in_bits,
not sure why that case doesn't use int_size_in_bytes as well ...)?

Can you apply some TLC here?

Thanks,
Richard.

>
> 2013-11-11  Jan Kratochvil  
>
> PR ada/35998
> * dwarf2out.c (add_byte_size_attribute): Omit attribute for size -1.
>
>
> --
> Eric Botcazou

Re: [PATCH] Handle GIMPLE_ASSIGNs with different vuse in gimple_equal_p

2013-11-11 Thread Tom de Vries

On 11/11/13 10:50, Richard Biener wrote:
> On Sat, 9 Nov 2013, Tom de Vries wrote:
>> Bootstrapped and regtested on x86_64.
>>
>> OK for trunk?
> 
> Comments inline
> 
> 
> diff --git a/gcc/tree-ssa-tail-merge.c b/gcc/tree-ssa-tail-merge.c
> index 98b5882..43516a7 100644
> --- a/gcc/tree-ssa-tail-merge.c
> +++ b/gcc/tree-ssa-tail-merge.c
> @@ -1173,8 +1173,47 @@ gimple_equal_p (same_succ same_succ, gimple s1, gimple 
> s2)
>lhs2 = gimple_get_lhs (s2);
>if (TREE_CODE (lhs1) != SSA_NAME
> && TREE_CODE (lhs2) != SSA_NAME)
> - return (vn_valueize (gimple_vdef (s1))
> - == vn_valueize (gimple_vdef (s2)));
> + {
> +   /* If the vdef is the same, it's the same statement.  */
> +   if (vn_valueize (gimple_vdef (s1))
> +   == vn_valueize (gimple_vdef (s2)))
> + return true;
> +
> +   /* If the vdef is not the same but the vuse is the same, it's not the
> +  same stmt.  */
> +   if (vn_valueize (gimple_vuse (s1))
> +   == vn_valueize (gimple_vuse (s2)))
> + return false;
> 

Richard,

> What's the logic behind this?
> We want to use VN to get more "positive"
> results

We can use it to get both positive and negative results faster.

> - doing a negative early out looks suspicious to me ...
> 

If the vdefs are the same, the VN has concluded that the statements are the
same. We have a positive early out.

If the vdefs are not the same, the VN has concluded that the statements are
different. But if the vuses are the same, the VN has concluded that the
statements are different because of structural difference. Which means there's
no sense in us repeating the structural comparison. We have a negative early 
out.

If the vdefs are not the same, the VN has concluded that the statements are
different. But if the vuses are different, the VN may have concluded that the
statements are different because of the different vuses. So it still makes sense
to try structural comparison.

> +   /* If the vdef is not the same and the vuse is not the same, it might 
> be
> +  same stmt.  */
> +
> +   /* Test for structural equality.  */
> +   if (gimple_assign_rhs_code (s1) != gimple_assign_rhs_code (s1)
> 
> s2
> 
> +   || (gimple_assign_nontemporal_move_p (s1)
> +   != gimple_assign_nontemporal_move_p (s2)))
> 
> I don't think we should care (it'll be false - a later pass sets it,
> it's an optimization hint, not a correctness issue).  More interesting
> would be to assert that has_volatile_ops is the same if the operands
> turned out to be the same.
> 

OK.

> + return false;
> +
> +   if (!operand_equal_p (lhs1, lhs2, 0))
> + return false;
> +
> +   t1 = gimple_assign_rhs1 (s1);
> +   t2 = gimple_assign_rhs1 (s2);
> +   if (!gimple_operand_equal_value_p (t1, t2))
> + return false;
> +
> +   t1 = gimple_assign_rhs2 (s1);
> +   t2 = gimple_assign_rhs2 (s2);
> +   if (!gimple_operand_equal_value_p (t1, t2))
> + return false;
> +
> +   t1 = gimple_assign_rhs3 (s1);
> +   t2 = gimple_assign_rhs3 (s2);
> +   if (!gimple_operand_equal_value_p (t1, t2))
> + return false;
> 
>   for (i = 1; i < gimple_num_ops (s1); ++i)
> t1 = gimple_op (s1, i);
> ...
> 
> but I think you should only compare rhs1 and thus only handle
> GIMPLE_ASSIGN_SINGLEs this way - the others have a SSA name
> lhs.
> 

I see.

> That makes the whole thing just
> 
>   if (TREE_CODE (lhs1) != SSA_NAME
>   && TREE_CODE (lhs2) != SSA_NAME)
> {
>if (vn_valueize (gimple_vdef (s1))
>== vn_valueize (gimple_vdef (s2)))
>return true;
>return operand_equal_p (lhs1, lhs2, 0)
>&& gimple_operand_equal_value_p (gimple_assign_rhs1 (s1),
>  gimple_assign_rhs2 (s2));
> }
> 
> Ok with doing it this way.

I'll retest and checkin like this, unless you agree about the early out, which I
still think is a good idea, although the structural test is now much smaller.

Thanks,
- Tom

> 
> Thanks,
> Richard.
> 
> +   /* Same structure.  */
> +   return true;
> + }
>else if (TREE_CODE (lhs1) == SSA_NAME
>  && TREE_CODE (lhs2) == SSA_NAME)
>   return vn_valueize (lhs1) == vn_valueize (lhs2);
>

Re: [RFC] replace malloc with a decl on the stack

2013-11-11 Thread Richard Biener

On Sun, Nov 10, 2013 at 4:27 PM, Marc Glisse  wrote:
> Hello,
>
> I am posting this patch to get some feedback on the approach. The goal is to
> replace malloc+free with a stack allocation (a decl actually) when the size
> is a small constant.
>
> For testing, I highjacked the "leaf" attribute, but it isn't right, I'll
> remove it from the list (not sure what I'll do for the testcases then). What
> I'd want instead is a "returns" attribute that means the function will
> return (either by "return" or an exception), as opposed to having an
> infinite loop, calling exit or longjmp, etc (sched-deps.c has a related
> call_may_noreturn_p). The situation I am trying to avoid is:
> p=malloc(12);
> f(p)
> free(p)
>
> where f contains somewhere:
> free(p); exit(42);
> (or longjmp or something else that takes the regular call to free out of the
> execution flow).

Instead of a new attribute IPA should be used to compute this call-graph
property.  An infinite loop wouldn't be an issue, but the real issue is
that it shouldn't free the object which would be only valid if the free
after the call isn't reached.

IPA pure-const is used to compute similar properties (may loop,
may throw).

You also have to make sure to properly align the replacement.

>
> It passes most of the testsuite, but breaks a couple __builtin_object_size
> tests:
>
> struct A
> {
>   char a[10];
>   int b;
>   char c[10];
> };
> int main(){
>   struct A *p = malloc (2 * sizeof (struct A));
>   assert (__builtin_object_size (&p->a, 1) == sizeof (p->a));
>   free (p);
> }
> __builtin_object_size now returns 56 instead of 10. I am not sure what to do
> about that.
>
>
> The size above which the malloc->stack transformation is not applied should
> depend on a parameter, I don't know if it should get its own or depend on an
> existing one. In any case, I'd like to avoid ending up with a ridiculously
> low threshold (my programs use GMP, which internally uses alloca up to 65536
> bytes (even in recursive functions that have a dozen allocations), so I
> don't want gcc to tell me that 50 bytes are too much).
>
> A program with a double-free may, with this patch, end up crashing on the
> first free instead of the second, but that's an invalid program anyway.
>
>
> I don't know if tree-ssa-forwprop is a good place for it, but it was
> convenient for a prototype. I like the idea of running it several times:
> replacing malloc with a decl early can help other optimizations, and other
> optimizations can make this one possible later.

We have a similar transform in CCP (fold_builtin_alloca_with_align) which
is there because CCP is a good place where size arguments may
become constant (the case you handle - you don't seem to handle
variable malloc -> alloca transform which would be possible if for example
VRP figures out a acceptable upper bound for the size).

Richard.

> The walk could be a bit expensive, but we only do it if we detected a malloc
> of a small constant and at least one matching free. I guess I should mark
> the malloc somehow to avoid performing the walk twice if there are several
> (but not enough) matching free.
>
>
> stack_vec is nice, it would be convenient if bitmaps also had a version with
> a destructor so we don't need to explicitly deallocate them.
>
> --
> Marc Glisse
> Index: gcc/testsuite/g++.dg/tree-ssa/heapstack-1.C
> ===
> --- gcc/testsuite/g++.dg/tree-ssa/heapstack-1.C (revision 0)
> +++ gcc/testsuite/g++.dg/tree-ssa/heapstack-1.C (working copy)
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +struct A {
> +  void*p;
> +  A(void*q) : p(q) {}
> +  ~A(){ __builtin_free(p); }
> +};
> +void f(void*)__attribute__((__leaf__));
> +void h(void*)__attribute__((__leaf__,__nothrow__));
> +void g(){
> +  void*p=__builtin_malloc(12);
> +  A a(p);
> +  f(p);
> +}
> +
> +void i(){
> +  void*p=__builtin_malloc(12);
> +  h(p);
> +  __builtin_free(p);
> +}
> +
> +/* { dg-final { scan-tree-dump-not "malloc" "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
>
> Property changes on: gcc/testsuite/g++.dg/tree-ssa/heapstack-1.C
> ___
> Added: svn:keywords
> ## -0,0 +1 ##
> +Author Date Id Revision URL
> \ No newline at end of property
> Added: svn:eol-style
> ## -0,0 +1 ##
> +native
> \ No newline at end of property
> Index: gcc/testsuite/g++.dg/tree-ssa/heapstack-2.C
> ===
> --- gcc/testsuite/g++.dg/tree-ssa/heapstack-2.C (revision 0)
> +++ gcc/testsuite/g++.dg/tree-ssa/heapstack-2.C (working copy)
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +void f(void*)__attribute__((__leaf__));
> +void g(){
> +  void*p=__builtin_malloc(12);
> +  f(p);
> +  __builtin_free(p);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "malloc" 1 "op

Re: [patch] Fix debug info for modified parameter

2013-11-11 Thread Richard Biener

On Mon, Nov 11, 2013 at 11:56 AM, Eric Botcazou  wrote:
> Hi,
>
> since the switch to SSA form at -O0, the compiler generates wrong debug info
> for something like:
>
> void
> foo (int i)
> {
>   int j = 0;
>   i = 1;
>   j = j + 1;  /* BREAK */
> }
>
> If you try to display the value of i after breaking in GDB, you don't get 1.
> The reason is that there is no default def SSA_NAME for i in this case, so no
> partitions get the RTL location of the parameter.
>
> Tentative patch attached, it's admittedly a little heavy, but I don't see any
> other solution.  Tested on x86_64-suse-linux, OK for the mainline?

Hmm, at -O0 we should be able to coalesce all SSA names of a
DECL.  So in theory the following should work:

Index: gcc/cfgexpand.c
===
--- gcc/cfgexpand.c (revision 204664)
+++ gcc/cfgexpand.c (working copy)
@@ -1619,7 +1619,7 @@ expand_used_vars (void)
 we don't do anything here.  But those which don't contain the
 default def (representing a temporary based on the parm/result)
 we need to allocate space just like for normal VAR_DECLs.  */
- if (!bitmap_bit_p (SA.partition_has_default_def, i))
+ if (optimize && !bitmap_bit_p (SA.partition_has_default_def, i))
{
  expand_one_var (var, true, true);
  gcc_assert (SA.partition_to_pseudo[i]);

eventually restricting handling to !DECL_IGNORED_P / !DECL_ARTIFICIAL
variables.

Richard.

>
> 2013-11-11  Eric Botcazou  
>
> * tree-outof-ssa.c (remove_ssa_form): For a parameter without default
> def, pretend that the single partition that contains all the SSA_NAMEs
> for this parameter, if it exists, also contains the default def.
>
>
> 2013-11-11  Eric Botcazou  
>
> * gcc.dg/guality/param-4.c: New test.
>
>
> --
> Eric Botcazou

Re: [patch] Add CONSTRUCTOR_NO_CLEARING flag

2013-11-11 Thread Richard Biener

On Mon, Nov 11, 2013 at 11:55 AM, Eric Botcazou  wrote:
> Hi,
>
> in Ada 2012 it is allowed to omit components of aggregates (the equivalent of
> initializers/constructors); in this case, the contents of the corresponding
> fields in the record become undefined.  Now the gimplifier implements the C
> semantics of clearing the missing components, so this patch introduces a new
> flag on constructors to modify that.
>
> Tested on x86_64-suse-linux, OK for the mainline?

Ok.  Can you update doc/generic.texi?

Thanks,
Richard.

>
> 2013-11-11  Tristan Gingold  
> Eric Botcazou  
>
> * tree.h (CONSTRUCTOR_NO_CLEARING): Define.
> * tree-core.h (CONSTRUCTOR_NO_CLEARING): Document it.
> * tree.def (CONSTRUCTOR): Likewise.
> * gimplify.c (gimplify_init_constructor): Do not clear the object when
> the constructor is incomplete and CONSTRUCTOR_NO_CLEARING is set.
> ada/
> * gcc-interface/utils2.c (gnat_build_constructor): Also set the flag
> CONSTRUCTOR_NO_CLEARING on the constructor.
>
>
> --
> Eric Botcazou

Re: [PATCH] Handle GIMPLE_ASSIGNs with different vuse in gimple_equal_p

2013-11-11 Thread Richard Biener

On Mon, 11 Nov 2013, Tom de Vries wrote:

> On 11/11/13 10:50, Richard Biener wrote:
> > On Sat, 9 Nov 2013, Tom de Vries wrote:
> >> Bootstrapped and regtested on x86_64.
> >>
> >> OK for trunk?
> > 
> > Comments inline
> > 
> > 
> > diff --git a/gcc/tree-ssa-tail-merge.c b/gcc/tree-ssa-tail-merge.c
> > index 98b5882..43516a7 100644
> > --- a/gcc/tree-ssa-tail-merge.c
> > +++ b/gcc/tree-ssa-tail-merge.c
> > @@ -1173,8 +1173,47 @@ gimple_equal_p (same_succ same_succ, gimple s1, 
> > gimple s2)
> >lhs2 = gimple_get_lhs (s2);
> >if (TREE_CODE (lhs1) != SSA_NAME
> >   && TREE_CODE (lhs2) != SSA_NAME)
> > -   return (vn_valueize (gimple_vdef (s1))
> > -   == vn_valueize (gimple_vdef (s2)));
> > +   {
> > + /* If the vdef is the same, it's the same statement.  */
> > + if (vn_valueize (gimple_vdef (s1))
> > + == vn_valueize (gimple_vdef (s2)))
> > +   return true;
> > +
> > + /* If the vdef is not the same but the vuse is the same, it's not the
> > +same stmt.  */
> > + if (vn_valueize (gimple_vuse (s1))
> > + == vn_valueize (gimple_vuse (s2)))
> > +   return false;
> > 
> 
> Richard,
> 
> > What's the logic behind this?
> > We want to use VN to get more "positive"
> > results
> 
> We can use it to get both positive and negative results faster.

I can get a negative result for free: "false" ;)

VN proves equivalency it doesn't prove non-equivalency - that's
simply its conservative fallback.

> > - doing a negative early out looks suspicious to me ...
> > 
> 
> If the vdefs are the same, the VN has concluded that the statements are the
> same. We have a positive early out.
> 
> If the vdefs are not the same, the VN has concluded that the statements are
> different. But if the vuses are the same, the VN has concluded that the
> statements are different because of structural difference. Which means there's
> no sense in us repeating the structural comparison. We have a negative early 
> out.

Well, see above - it merely failed to prove equivalency, it didn't
actually conclude they are different.
 
> If the vdefs are not the same, the VN has concluded that the statements are
> different. But if the vuses are different, the VN may have concluded that the
> statements are different because of the different vuses. So it still makes 
> sense
> to try structural comparison.
> 
> > + /* If the vdef is not the same and the vuse is not the same, it might 
> > be
> > +same stmt.  */
> > +
> > + /* Test for structural equality.  */
> > + if (gimple_assign_rhs_code (s1) != gimple_assign_rhs_code (s1)
> > 
> > s2
> > 
> > + || (gimple_assign_nontemporal_move_p (s1)
> > + != gimple_assign_nontemporal_move_p (s2)))
> > 
> > I don't think we should care (it'll be false - a later pass sets it,
> > it's an optimization hint, not a correctness issue).  More interesting
> > would be to assert that has_volatile_ops is the same if the operands
> > turned out to be the same.
> > 
> 
> OK.
> 
> > +   return false;
> > +
> > + if (!operand_equal_p (lhs1, lhs2, 0))
> > +   return false;
> > +
> > + t1 = gimple_assign_rhs1 (s1);
> > + t2 = gimple_assign_rhs1 (s2);
> > + if (!gimple_operand_equal_value_p (t1, t2))
> > +   return false;
> > +
> > + t1 = gimple_assign_rhs2 (s1);
> > + t2 = gimple_assign_rhs2 (s2);
> > + if (!gimple_operand_equal_value_p (t1, t2))
> > +   return false;
> > +
> > + t1 = gimple_assign_rhs3 (s1);
> > + t2 = gimple_assign_rhs3 (s2);
> > + if (!gimple_operand_equal_value_p (t1, t2))
> > +   return false;
> > 
> >   for (i = 1; i < gimple_num_ops (s1); ++i)
> > t1 = gimple_op (s1, i);
> > ...
> > 
> > but I think you should only compare rhs1 and thus only handle
> > GIMPLE_ASSIGN_SINGLEs this way - the others have a SSA name
> > lhs.
> > 
> 
> I see.
> 
> > That makes the whole thing just
> > 
> >   if (TREE_CODE (lhs1) != SSA_NAME
> >   && TREE_CODE (lhs2) != SSA_NAME)
> > {
> >if (vn_valueize (gimple_vdef (s1))
> >== vn_valueize (gimple_vdef (s2)))
> >  return true;
> >return operand_equal_p (lhs1, lhs2, 0)
> >&& gimple_operand_equal_value_p (gimple_assign_rhs1 (s1),
> >  gimple_assign_rhs2 (s2));
> > }
> > 
> > Ok with doing it this way.
> 
> I'll retest and checkin like this, unless you agree about the early out, 
> which I
> still think is a good idea, although the structural test is now much smaller.

I think the early out is premature.

Richard.

[patch] fix libstdc++/54562

2013-11-11 Thread Jonathan Wakely

This ensures that the standard timed mutexes use steady_clock for
relative timeouts and the same clock as pthread_mutex_timedlock for
absolute timeouts.

I'm not adding a new test as the only ones I could come up with
involve waiting for longer than is suitable in a testcase, and don't
reliably fail on all systems anyway.

PR libstdc++/54562
* include/std/mutex (__timed_mutex_impl::__clock_t): Use
high_resolution_clock for absolute timeouts, because
pthread_mutex_timedlock uses CLOCK_REALTIME not CLOCK_MONOTONIC.
(__timed_mutex_impl::_M_try_lock_for): Use steady_clock for relative
timeouts as per [thread.req.timing].
(__timed_mutex_impl::_M_try_lock_until): Convert to
__clock_t time point instead of using _M_try_lock_for.

Tested x86_64-linux, committed to trunk.  I would like to commit this
to the 4.8 branch too, but I'm undecided whether it's suitable.
commit 1f2401b0338fefaaa521ebf202e563a6c673b47e
Author: Jonathan Wakely 
Date:   Mon Nov 11 12:40:01 2013 +

PR libstdc++/54562
* include/std/mutex (__timed_mutex_impl::__clock_t): Use
high_resolution_clock for absolute timeouts, because
pthread_mutex_timedlock uses CLOCK_REALTIME not CLOCK_MONOTONIC.
(__timed_mutex_impl::_M_try_lock_for): Use steady_clock for relative
timeouts as per [thread.req.timing].
(__timed_mutex_impl::_M_try_lock_until): Convert to
__clock_t time point instead of using _M_try_lock_for.

diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex
index 40b2e31..da2ca0c 100644
--- a/libstdc++-v3/include/std/mutex
+++ b/libstdc++-v3/include/std/mutex
@@ -203,21 +203,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 class __timed_mutex_impl
 {
 protected:
-#ifdef _GLIBCXX_USE_CLOCK_MONOTONIC
-  typedef chrono::steady_clock __clock_t;
-#else
   typedef chrono::high_resolution_clock__clock_t;
-#endif
 
   template
bool
_M_try_lock_for(const chrono::duration<_Rep, _Period>& __rtime)
{
- auto __rt = chrono::duration_cast<__clock_t::duration>(__rtime);
- if (ratio_greater<__clock_t::period, _Period>())
+ using chrono::steady_clock;
+ auto __rt = chrono::duration_cast(__rtime);
+ if (ratio_greater())
++__rt;
-
- return _M_try_lock_until(__clock_t::now() + __rt);
+ return _M_try_lock_until(steady_clock::now() + __rt);
}
 
   template
@@ -225,11 +221,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_M_try_lock_until(const chrono::time_point<__clock_t,
   _Duration>& __atime)
{
- chrono::time_point<__clock_t, chrono::seconds> __s =
-   chrono::time_point_cast(__atime);
-
- chrono::nanoseconds __ns =
-   chrono::duration_cast(__atime - __s);
+ auto __s = chrono::time_point_cast(__atime);
+ auto __ns = chrono::duration_cast(__atime - __s);
 
  __gthread_time_t __ts = {
static_cast(__s.time_since_epoch().count()),
@@ -243,7 +236,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
bool
_M_try_lock_until(const chrono::time_point<_Clock, _Duration>& __atime)
-   { return _M_try_lock_for(__atime - _Clock::now()); }
+   {
+ auto __rtime = __atime - _Clock::now();
+ return _M_try_lock_until(__clock_t::now() + __rtime);
+   }
 };
 
   /// timed_mutex

patch ping: diagnostics finalization and plugins

2013-11-11 Thread Basile Starynkevitch

Hello all,

I'm pinging the patch

http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00056.html

##
Index: gcc/toplev.c
===
--- gcc/toplev.c(revision 204671)
+++ gcc/toplev.c(working copy)
@@ -1968,11 +1968,13 @@ toplev_main (int argc, char **argv)
 
   if (warningcount || errorcount || werrorcount)
 print_ignored_options ();
-  diagnostic_finish (global_dc);
 
-  /* Invoke registered plugin callbacks if any.  */
+  /* Invoke registered plugin callbacks if any.  Some plugins could
+ emit some diagnostics here.  */
   invoke_plugin_callbacks (PLUGIN_FINISH, NULL);
 
+  diagnostic_finish (global_dc);
+
   finalize_plugins ();
   location_adhoc_data_fini (line_table);
   if (seen_error () || werrorcount)
 gcc/ChangeLog entry

2013-11-11  Basile Starynkevitch  

* toplev.c (toplev_main): Move PLUGIN_FINISH invocation before 
  diagnostic_finish.



Ok for trunk?

-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***

Re: patch ping: diagnostics finalization and plugins

2013-11-11 Thread Diego Novillo

On Mon, Nov 11, 2013 at 8:36 AM, Basile Starynkevitch
 wrote:

> 2013-11-11  Basile Starynkevitch  
>
> * toplev.c (toplev_main): Move PLUGIN_FINISH invocation before
>   diagnostic_finish.

OK.


Diego.

Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces

2013-11-11 Thread Richard Biener

On Thu, Nov 7, 2013 at 10:07 PM, Ilya Enkovich  wrote:
> 2013/11/7 Jeff Law :
>> On 11/04/13 05:40, Richard Biener wrote:


 Effectively the bounds are passed "on the side".
>>>
>>>
>>> Well, not exactly.  I can see __bound_tmp.0_4 passed to access_and_store.
>>
>> I'm referring to how they're dealt with in FUNCTION_ARG and friends, ie, the
>> low level aspects.  Maybe that's why we're crossing wires here.
>>
>>
>>
>>>
>>> Are __builtin_ia32_bndcu and access_and_store supposed to be
>>> called directly after each other?  What if for example profile
>>> instrumentation
>>> inserts a call inbetween them?
>>
>> That's a question for Ilya.
>
> I think there is some misunderstanding. __builtin_ia32_bndcu and
> access_and_store calls are completely independent in this example.
> This is just a code from example on how user-level builtins may be
> used with legacy code.
>
>>
>>

 You can't actually interleave them -- that results in MPX and normal code
 not being able to interact.   Passing the bound at the end doesn't really
 work either -- varargs and the desire to pass some of the bounds around
 in
 bound registers.
>>>
>>>
>>> I'm only looking at how bound arguments are passed at the GIMPLE
>>> level - which I think is arbitrary given they are passed on-the-side
>>> during code-generation.  So I'm looking for a more "sensible" option
>>> for the GIMPLE representation which looks somewhat fragile to me
>>> (it looks the argument is only there to have a data dependence
>>> between the bndcu intrinsic and the actual call?)
>>
>> OK, we're clearly looking at two different aspects here.  While I think we
>> can express them arbitrarily in GIMPLE, we have to pass them on the side
>> once we get into the low level code for compatibility purposes.
>>
>> I think argument passing for MPX is going to ultimately be ugly/fragile
>> regardless of what we do given the constraints.  Given we're passing on the
>> side at the low level, I'd prefer them passed on the side in GIMPLE, but I'm
>> willing to consider other alternatives.
>>
>>
>>>
>
> /* Return the number of arguments used by call statement GS
>  ignoring bound ones.  */
>
> static inline unsigned
> gimple_call_num_nobnd_args (const_gimple gs)
> {
> unsigned num_args = gimple_call_num_args (gs);
> unsigned res = num_args;
> for (unsigned n = 0; n < num_args; n++)
>   if (POINTER_BOUNDS_P (gimple_call_arg (gs, n)))
> res--;
> return res;
> }
>
> the choice means that gimple_call_num_nobnd_args is not O(1).


 Yes, but I don't see that's terribly problematical.
>>>
>>>
>>> Well, consider compile.exp limits-fnargs.c written with int * parameters
>>> and bound support ;)  [there is a limit on the number of bound registers
>>> available, but the GIMPLE parts doesn't put any limit on instrumented
>>> args?]
>>
>> Right.  Go back to Ilya's earlier messages :-)
>>
>> There's these two magic hooks TARGET_{LOAD,STORE}_BOUNDS_FOR_ARG.  At some
>> point when you run out of regs, you start shoving these things into memory.
>> I think we're ultimately going to need further clarification here.
>>
>>
>
> GIMPLE layout depending on flag_check_pointer_bounds sounds
> like a recipie for desaster if you consider TUs compiled with and
> TUs compiled without and LTO.  Or if you consider using
> optimized attribute with that flag.


 Sorry, I don't follow.  Can you elaborate please.
>>>
>>>
>>> Compile TU1 with -fno-chk, TU2 with -fchk, LTO both object files
>>> which gives you - well, either -fno-chk or -fchk.  Then you read in
>>> the GIMPLE and when calling gimple_call_nobnd_arg you get
>>> a mismatch between where you generated the gimple and where
>>> you use the gimple.
>>>
>>> A flag on whether a call is instrumented is better (consider inlining
>>> from TU1 into TU2 where a flag on struct function for example wouldn't
>>> be enough either).
>>
>> OK.  I see what you're saying.  Whether or not we have these extra args is a
>> property of the call site, not the CFUN, target DECL or the TU. That makes
>> sense.
>>
>>
>> Ilya, I think there's enough confusion & issues around the call ABI/API that
>> we need to sort it out before moving forward in any significant way.
>>
>> Let's go back to argument passing and go through some concrete examples of
>> the various cases.  Make sure to tie them into the calls to
>> TARGET_{LOAD,STORE}_BOUNDS_FOR_ARG and the new gimple_call_* routines.
>
> OK. Lets see at how expand pass handles instrumented calls.  As usual,
> for each passed argument expand calls target hook to determine where
> argument is passed.  Now target may return not only slot for arg, but
> also a slot for bounds in case arg may have them (PARALLEL expr is
> used to hold all slots).  If bounds slot is register, then expand
> simply puts bounds to it.  If it is not a register, target hook is
> called to sto

Re: [PATCH, MPX, 2/X] Pointers Checker [10/25] Calls copy and verification

2013-11-11 Thread Richard Biener

On Fri, Nov 8, 2013 at 11:03 AM, Ilya Enkovich  wrote:
> 2013/11/8 Richard Biener :
>> On Thu, Nov 7, 2013 at 7:55 PM, Jeff Law  wrote:
>>> On 11/07/13 04:50, Ilya Enkovich wrote:

 Hi,

 Here is an updated patch version.
>>>
>>> I think this needs to hold until we have a consensus on what the parameter
>>> passing looks like for bounded pointers.
>>
>> I still think the best thing to do on GIMPLE is
>>
>> arg_2 = __builtin_ia32_bnd_arg (arg_1(D));
>> foo (arg_2);
>>
>> that is, make the parameter an implicit pair of {value, bound} where
>> the bound is determined by the value going through a bound association
>> builtin.  No extra explicit argument to the calls so arguments match
>> the fndecl and fntype.  All the complexity is defered to the expander
>> which can trivially lookup bound arguments via the SSA def (I suppose
>> it does that anyway now for getting at the explicit bound argument now).
>>
>> As far as I can see (well, think), all currently passed bound arguments
>> are the return value of such builtin already.
>
> All bounds are result of different builtin calls. Small example:
>
> int *global_p;
> void foo (int *p)
> {
>   int buf[10];
>   bar (p, buf, global_p);
> }
>
>
> It is translated into:
>
>   __bound_tmp.1_7 = __builtin_ia32_bndmk (&buf, 40);
>   __bound_tmp.1_6 = __builtin_ia32_arg_bnd (p_3(D)(ab));
>   global_p.0_2 = global_p;
>   __bound_tmp.1_8 = __builtin_ia32_bndldx (&global_p, global_p.0_2);
>   bar (p_3(D)(ab), __bound_tmp.1_6, &buf, __bound_tmp.1_7,
> global_p.0_2, __bound_tmp.1_8);
>
> Bounds binding via calls as you suggest may be done as following:
>
>   __bound_tmp.1_7 = __builtin_ia32_bndmk (&buf, 40);
>   __bound_tmp.1_6 = __builtin_ia32_arg_bnd (p_3(D)(ab));
>   global_p.0_2 = global_p;
>   __bound_tmp.1_8 = __builtin_ia32_bndldx (&global_p, global_p.0_2);
>   _9 = __builtin_bind_bounds (p_3(D)(ab), __bound_tmp.1_6);
>   _10 = __builtin_bind_bounds (&buf, __bound_tmp.1_7);
>   _11 = __builtin_bind_bounds (global_p.0_2, __bound_tmp.1_8);
>   bar (_9, _10, _11);
>
> Is it close to what you propose?

Yes.

Richard.

> Ilya
>>
>> Richard.
>>
>>
>>
>>> Jeff

Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.

2013-11-11 Thread Kirill Yukhin

Hello,
On 05 Nov 16:05, Kirill Yukhin wrote:
> Is it ok with that change?

Ping?

--
Thanks, K

[PATCH][ARM] Fix "control reached end of non-void function" warning and boostrap

2013-11-11 Thread Kyrill Tkachov


Hi all,

My patch last Friday introduced a warning about reaching the end of a non-void 
function which breaks bootstrap. On second thought, instead of breaking at the 
end of the comparisons handling, we should just return instead.


Tested arm-none-eabi on qemu and checked the build log to make sure the warning 
disappears.


Ok for trunk?

Thanks,
Kyrill

2013-11-11  Kyrylo Tkachov  

* config/arm/arm.c (arm_new_rtx_costs): Return after handling
comparisons.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 2aa739d..62ec7f7 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -10154,7 +10154,7 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code,
 	  *cost = 0;
 	  return true;
 	}
-  break;
+  return false;
 
 case ABS:
   if (TARGET_HARD_FLOAT && GET_MODE_CLASS (mode) == MODE_FLOAT

Re: [PATCH][ARM] Fix "control reached end of non-void function" warning and boostrap

2013-11-11 Thread Ramana Radhakrishnan


On 11/11/13 13:48, Kyrill Tkachov wrote:

Hi all,

My patch last Friday introduced a warning about reaching the end of a non-void
function which breaks bootstrap. On second thought, instead of breaking at the
end of the comparisons handling, we should just return instead.

Tested arm-none-eabi on qemu and checked the build log to make sure the warning
disappears.

Ok for trunk?


Ouch !

Ok.

Ramana



Thanks,
Kyrill

2013-11-11  Kyrylo Tkachov  

  * config/arm/arm.c (arm_new_rtx_costs): Return after handling
  comparisons.

Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces

2013-11-11 Thread Ilya Enkovich

2013/11/11 Richard Biener :
> On Thu, Nov 7, 2013 at 10:07 PM, Ilya Enkovich  wrote:
>> 2013/11/7 Jeff Law :
>>> On 11/04/13 05:40, Richard Biener wrote:
>
>
> Effectively the bounds are passed "on the side".


 Well, not exactly.  I can see __bound_tmp.0_4 passed to access_and_store.
>>>
>>> I'm referring to how they're dealt with in FUNCTION_ARG and friends, ie, the
>>> low level aspects.  Maybe that's why we're crossing wires here.
>>>
>>>
>>>

 Are __builtin_ia32_bndcu and access_and_store supposed to be
 called directly after each other?  What if for example profile
 instrumentation
 inserts a call inbetween them?
>>>
>>> That's a question for Ilya.
>>
>> I think there is some misunderstanding. __builtin_ia32_bndcu and
>> access_and_store calls are completely independent in this example.
>> This is just a code from example on how user-level builtins may be
>> used with legacy code.
>>
>>>
>>>
>
> You can't actually interleave them -- that results in MPX and normal code
> not being able to interact.   Passing the bound at the end doesn't really
> work either -- varargs and the desire to pass some of the bounds around
> in
> bound registers.


 I'm only looking at how bound arguments are passed at the GIMPLE
 level - which I think is arbitrary given they are passed on-the-side
 during code-generation.  So I'm looking for a more "sensible" option
 for the GIMPLE representation which looks somewhat fragile to me
 (it looks the argument is only there to have a data dependence
 between the bndcu intrinsic and the actual call?)
>>>
>>> OK, we're clearly looking at two different aspects here.  While I think we
>>> can express them arbitrarily in GIMPLE, we have to pass them on the side
>>> once we get into the low level code for compatibility purposes.
>>>
>>> I think argument passing for MPX is going to ultimately be ugly/fragile
>>> regardless of what we do given the constraints.  Given we're passing on the
>>> side at the low level, I'd prefer them passed on the side in GIMPLE, but I'm
>>> willing to consider other alternatives.
>>>
>>>

>>
>> /* Return the number of arguments used by call statement GS
>>  ignoring bound ones.  */
>>
>> static inline unsigned
>> gimple_call_num_nobnd_args (const_gimple gs)
>> {
>> unsigned num_args = gimple_call_num_args (gs);
>> unsigned res = num_args;
>> for (unsigned n = 0; n < num_args; n++)
>>   if (POINTER_BOUNDS_P (gimple_call_arg (gs, n)))
>> res--;
>> return res;
>> }
>>
>> the choice means that gimple_call_num_nobnd_args is not O(1).
>
>
> Yes, but I don't see that's terribly problematical.


 Well, consider compile.exp limits-fnargs.c written with int * parameters
 and bound support ;)  [there is a limit on the number of bound registers
 available, but the GIMPLE parts doesn't put any limit on instrumented
 args?]
>>>
>>> Right.  Go back to Ilya's earlier messages :-)
>>>
>>> There's these two magic hooks TARGET_{LOAD,STORE}_BOUNDS_FOR_ARG.  At some
>>> point when you run out of regs, you start shoving these things into memory.
>>> I think we're ultimately going to need further clarification here.
>>>
>>>
>>
>> GIMPLE layout depending on flag_check_pointer_bounds sounds
>> like a recipie for desaster if you consider TUs compiled with and
>> TUs compiled without and LTO.  Or if you consider using
>> optimized attribute with that flag.
>
>
> Sorry, I don't follow.  Can you elaborate please.


 Compile TU1 with -fno-chk, TU2 with -fchk, LTO both object files
 which gives you - well, either -fno-chk or -fchk.  Then you read in
 the GIMPLE and when calling gimple_call_nobnd_arg you get
 a mismatch between where you generated the gimple and where
 you use the gimple.

 A flag on whether a call is instrumented is better (consider inlining
 from TU1 into TU2 where a flag on struct function for example wouldn't
 be enough either).
>>>
>>> OK.  I see what you're saying.  Whether or not we have these extra args is a
>>> property of the call site, not the CFUN, target DECL or the TU. That makes
>>> sense.
>>>
>>>
>>> Ilya, I think there's enough confusion & issues around the call ABI/API that
>>> we need to sort it out before moving forward in any significant way.
>>>
>>> Let's go back to argument passing and go through some concrete examples of
>>> the various cases.  Make sure to tie them into the calls to
>>> TARGET_{LOAD,STORE}_BOUNDS_FOR_ARG and the new gimple_call_* routines.
>>
>> OK. Lets see at how expand pass handles instrumented calls.  As usual,
>> for each passed argument expand calls target hook to determine where
>> argument is passed.  Now target may return not only slot for arg, but
>> also a slot for bounds in case arg may have them (PARALLEL expr i

Re: [PATCH, MPX, 2/X] Pointers Checker [10/25] Calls copy and verification

2013-11-11 Thread Ilya Enkovich

2013/11/11 Richard Biener :
> On Fri, Nov 8, 2013 at 11:03 AM, Ilya Enkovich  wrote:
>> 2013/11/8 Richard Biener :
>>> On Thu, Nov 7, 2013 at 7:55 PM, Jeff Law  wrote:
 On 11/07/13 04:50, Ilya Enkovich wrote:
>
> Hi,
>
> Here is an updated patch version.

 I think this needs to hold until we have a consensus on what the parameter
 passing looks like for bounded pointers.
>>>
>>> I still think the best thing to do on GIMPLE is
>>>
>>> arg_2 = __builtin_ia32_bnd_arg (arg_1(D));
>>> foo (arg_2);
>>>
>>> that is, make the parameter an implicit pair of {value, bound} where
>>> the bound is determined by the value going through a bound association
>>> builtin.  No extra explicit argument to the calls so arguments match
>>> the fndecl and fntype.  All the complexity is defered to the expander
>>> which can trivially lookup bound arguments via the SSA def (I suppose
>>> it does that anyway now for getting at the explicit bound argument now).
>>>
>>> As far as I can see (well, think), all currently passed bound arguments
>>> are the return value of such builtin already.
>>
>> All bounds are result of different builtin calls. Small example:
>>
>> int *global_p;
>> void foo (int *p)
>> {
>>   int buf[10];
>>   bar (p, buf, global_p);
>> }
>>
>>
>> It is translated into:
>>
>>   __bound_tmp.1_7 = __builtin_ia32_bndmk (&buf, 40);
>>   __bound_tmp.1_6 = __builtin_ia32_arg_bnd (p_3(D)(ab));
>>   global_p.0_2 = global_p;
>>   __bound_tmp.1_8 = __builtin_ia32_bndldx (&global_p, global_p.0_2);
>>   bar (p_3(D)(ab), __bound_tmp.1_6, &buf, __bound_tmp.1_7,
>> global_p.0_2, __bound_tmp.1_8);
>>
>> Bounds binding via calls as you suggest may be done as following:
>>
>>   __bound_tmp.1_7 = __builtin_ia32_bndmk (&buf, 40);
>>   __bound_tmp.1_6 = __builtin_ia32_arg_bnd (p_3(D)(ab));
>>   global_p.0_2 = global_p;
>>   __bound_tmp.1_8 = __builtin_ia32_bndldx (&global_p, global_p.0_2);
>>   _9 = __builtin_bind_bounds (p_3(D)(ab), __bound_tmp.1_6);
>>   _10 = __builtin_bind_bounds (&buf, __bound_tmp.1_7);
>>   _11 = __builtin_bind_bounds (global_p.0_2, __bound_tmp.1_8);
>>   bar (_9, _10, _11);
>>
>> Is it close to what you propose?
>
> Yes.

Is there a way to bind bounds with structures in a similar way?  For
SSA names I may easily find definition and check if it is a binding
builtin call. But for structures it is not possible.  The way I see it
to mark all such args as addressable and load required bounds on
expand pass.

Ilya
>
> Richard.
>
>> Ilya
>>>
>>> Richard.
>>>
>>>
>>>
 Jeff

[PATCH] Fix asan regtest failures c-c++-common/asan/{memcmp-1.c,strncpy-overflow-1.c}

2013-11-11 Thread Dodji Seketeli

Hello,

Since a couple of days I am seeing failure on the tests above on my
Fedora system.  The errors look like:

FAIL: c-c++-common/asan/memcmp-1.c  -O0  output pattern test, is 
=
==21832==ERROR: AddressSanitizer: stack-buffer-overflow on address 
0x7fff75df96f4 at pc 0x7f98ecbab68b bp 0x7fff75df96b0 sp 0x7fff75df95b8
READ of size 6 at 0x7fff75df96f4 thread T0
#0 0x7f98ecbab68a in __interceptor_memcmp 
/home/dodji/git/gcc/master/libsanitizer/asan/asan_interceptors.cc:295 
(discriminator 7)
#1 0x7f98ecbb6393 in __asan_report_error 
/home/dodji/git/gcc/master/libsanitizer/asan/asan_report.cc:774 (discriminator 
9)
#2 0x7f98ecbab6d0 in __interceptor_memcmp 
/home/dodji/git/gcc/master/libsanitizer/asan/asan_interceptors.cc:295 
(discriminator 7)
#3 0x400b0a in main 
/home/dodji/git/gcc/master/gcc/testsuite/c-c++-common/asan/memcmp-1.c:14
#4 0x3380821a04 in __libc_start_main ??:?
#5 0x4007b8 in _start ??:?
Address 0x7fff75df96f4 is located in stack of thread T0 at offset 36 in frame
#0 0x40089f in main 
/home/dodji/git/gcc/master/gcc/testsuite/c-c++-common/asan/memcmp-1.c:11

[...]

and

FAIL: c-c++-common/asan/strncpy-overflow-1.c  -O0  output pattern test, is 
=
==30666==ERROR: AddressSanitizer: heap-buffer-overflow on address 
0x6020efd9 at pc 0x7f65c09fd8f9 bp 0x7fffd820d170 sp 0x7fffd820d0a0
WRITE of size 10 at 0x6020efd9 thread T0
#0 0x7f65c09fd8f8 in __interceptor_strncpy 
/home/dodji/git/gcc/master/libsanitizer/asan/asan_interceptors.cc:509 
(discriminator 7)
#1 0x7f65c0a06393 in __asan_report_error 
/home/dodji/git/gcc/master/libsanitizer/asan/asan_report.cc:774 (discriminator 
9)
#2 0x7f65c09fd93e in __interceptor_strncpy 
/home/dodji/git/gcc/master/libsanitizer/asan/asan_interceptors.cc:509 
(discriminator 7)
#3 0x400a9c in main 
/home/dodji/git/gcc/master/gcc/testsuite/c-c++-common/asan/strncpy-overflow-1.c:11
#4 0x3380821a04 in __libc_start_main ??:?
#5 0x400898 in _start ??:?
0x6020efd9 is located 0 bytes to the right of 9-byte region 
[0x6020efd0,0x6020efd9)
allocated by thread T0 here:
#0 0x7f65c0a00cd9 in __interceptor_malloc 
/home/dodji/git/gcc/master/libsanitizer/asan/asan_malloc_linux.cc:72 
(discriminator 9)
#1 0x400a80 in main 
/home/dodji/git/gcc/master/gcc/testsuite/c-c++-common/asan/strncpy-overflow-1.c:10

[...]

It turned out this is just due to the output emitted by the
AddressSanitizer that has changed.

The patch below just adapts the regexps of the above test to match
the fact that a few new line patterns were inserted in the middle of
the output.

It also fixes what I think are typos namely, un-escaped ']' as in
"\[something]".  I believe the closing square bracket should be
escaped as well (but well, how do I know about tcl really).

Tested on x86_64-unknown-linux-gnu against trunk.

gcc/testsuite/ChangeLog:

* c-c++-common/asan/memcmp-1.c: Adapt the pattern for changed
libsanitizer output.
* c-c++-common/asan/strncpy-overflow-1.c: Likewise.
---
 gcc/testsuite/c-c++-common/asan/memcmp-1.c   | 8 ++--
 gcc/testsuite/c-c++-common/asan/strncpy-overflow-1.c | 9 +
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/asan/memcmp-1.c 
b/gcc/testsuite/c-c++-common/asan/memcmp-1.c
index 03f32e9..9a62e33 100644
--- a/gcc/testsuite/c-c++-common/asan/memcmp-1.c
+++ b/gcc/testsuite/c-c++-common/asan/memcmp-1.c
@@ -16,5 +16,9 @@ main ()
 }
 
 /* { dg-output "ERROR: AddressSanitizer: stack-buffer-overflow.*(\n|\r\n|\r)" 
} */
-/* { dg-output "#0 0x\[0-9a-f\]+ (in _*(interceptor_|)memcmp 
|\[(\])\[^\n\r]*(\n|\r\n|\r)" } */
-/* { dg-output "#1 0x\[0-9a-f\]+ (in _*main|\[(\])\[^\n\r]*(\n|\r\n|\r)" } 
*/
+/* { dg-output "#0 0x\[0-9a-f\]+ (in _*(interceptor_|)memcmp 
|\[(\])\[^\n\r\]*(\n|\r\n|\r)" } */
+/* { dg-output "(\[^\r\n\]*(\n|\r\n|\r))+" } */
+/* { dg-output "#\[0-9\]+ 0x\[0-9a-f\]+ in main"} */
+/* { dg-output "(\[^\r\n\]*(\n|\r\n|\r))+" } */
+/* { dg-output "Address 0x\[0-9a-f\] is located in stack of thread T0 at 
offset \[0-9\]+ in frame (\n|\r\n|\r)"} */
+/* { dg-output "#0 0x\[0-9a-f\]+ in main" } */
diff --git a/gcc/testsuite/c-c++-common/asan/strncpy-overflow-1.c 
b/gcc/testsuite/c-c++-common/asan/strncpy-overflow-1.c
index 3ed9fd6..b1a4cda 100644
--- a/gcc/testsuite/c-c++-common/asan/strncpy-overflow-1.c
+++ b/gcc/testsuite/c-c++-common/asan/strncpy-overflow-1.c
@@ -13,9 +13,10 @@ int main(int argc, char **argv) {
 }
 
 /* { dg-output "WRITE of size \[0-9\]* at 0x\[0-9a-f\]+ thread 
T0\[^\n\r]*(\n|\r\n|\r)" } */
-/* { dg-output "#0 0x\[0-9a-f\]+ (in 
_*(interceptor_|)strncpy|\[(\])\[^\n\r]*(\n|\r\n|\r)" } */
-/* { dg-output "#1 0x\[0-9a-f\]+ (in _*main 
(\[^\n\r]*strncpy-overflow-1.c:11|\[^\n\r]*:0)|\[(\]).*(\n|\r\n|\r)" } */
-/* { dg-output "\[^\n\r]*0x\[0-9a-f\]+

Re: [PATCH] Fix asan regtest failures c-c++-common/asan/{memcmp-1.c,strncpy-overflow-1.c}

2013-11-11 Thread Jakub Jelinek

On Mon, Nov 11, 2013 at 03:01:53PM +0100, Dodji Seketeli wrote:
> Since a couple of days I am seeing failure on the tests above on my
> Fedora system.  The errors look like:
> 
> FAIL: c-c++-common/asan/memcmp-1.c  -O0  output pattern test, is 
> =
> ==21832==ERROR: AddressSanitizer: stack-buffer-overflow on address 
> 0x7fff75df96f4 at pc 0x7f98ecbab68b bp 0x7fff75df96b0 sp 0x7fff75df95b8
> READ of size 6 at 0x7fff75df96f4 thread T0
> #0 0x7f98ecbab68a in __interceptor_memcmp 
> /home/dodji/git/gcc/master/libsanitizer/asan/asan_interceptors.cc:295 
> (discriminator 7)
> #1 0x7f98ecbb6393 in __asan_report_error 
> /home/dodji/git/gcc/master/libsanitizer/asan/asan_report.cc:774 
> (discriminator 9)
> #2 0x7f98ecbab6d0 in __interceptor_memcmp 
> /home/dodji/git/gcc/master/libsanitizer/asan/asan_interceptors.cc:295 
> (discriminator 7)
> #3 0x400b0a in main 
> /home/dodji/git/gcc/master/gcc/testsuite/c-c++-common/asan/memcmp-1.c:14

That looks like a bug in libasan, __asan_report_error doesn't call memcmp
again.  So something is wrong with getting proper backtrace it seems.

Jakub

Re: [PATCH, MPX, 2/X] Pointers Checker [10/25] Calls copy and verification

2013-11-11 Thread Richard Biener

On Mon, Nov 11, 2013 at 3:00 PM, Ilya Enkovich  wrote:
> 2013/11/11 Richard Biener :
>> On Fri, Nov 8, 2013 at 11:03 AM, Ilya Enkovich  
>> wrote:
>>> 2013/11/8 Richard Biener :
 On Thu, Nov 7, 2013 at 7:55 PM, Jeff Law  wrote:
> On 11/07/13 04:50, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> Here is an updated patch version.
>
> I think this needs to hold until we have a consensus on what the parameter
> passing looks like for bounded pointers.

 I still think the best thing to do on GIMPLE is

 arg_2 = __builtin_ia32_bnd_arg (arg_1(D));
 foo (arg_2);

 that is, make the parameter an implicit pair of {value, bound} where
 the bound is determined by the value going through a bound association
 builtin.  No extra explicit argument to the calls so arguments match
 the fndecl and fntype.  All the complexity is defered to the expander
 which can trivially lookup bound arguments via the SSA def (I suppose
 it does that anyway now for getting at the explicit bound argument now).

 As far as I can see (well, think), all currently passed bound arguments
 are the return value of such builtin already.
>>>
>>> All bounds are result of different builtin calls. Small example:
>>>
>>> int *global_p;
>>> void foo (int *p)
>>> {
>>>   int buf[10];
>>>   bar (p, buf, global_p);
>>> }
>>>
>>>
>>> It is translated into:
>>>
>>>   __bound_tmp.1_7 = __builtin_ia32_bndmk (&buf, 40);
>>>   __bound_tmp.1_6 = __builtin_ia32_arg_bnd (p_3(D)(ab));
>>>   global_p.0_2 = global_p;
>>>   __bound_tmp.1_8 = __builtin_ia32_bndldx (&global_p, global_p.0_2);
>>>   bar (p_3(D)(ab), __bound_tmp.1_6, &buf, __bound_tmp.1_7,
>>> global_p.0_2, __bound_tmp.1_8);
>>>
>>> Bounds binding via calls as you suggest may be done as following:
>>>
>>>   __bound_tmp.1_7 = __builtin_ia32_bndmk (&buf, 40);
>>>   __bound_tmp.1_6 = __builtin_ia32_arg_bnd (p_3(D)(ab));
>>>   global_p.0_2 = global_p;
>>>   __bound_tmp.1_8 = __builtin_ia32_bndldx (&global_p, global_p.0_2);
>>>   _9 = __builtin_bind_bounds (p_3(D)(ab), __bound_tmp.1_6);
>>>   _10 = __builtin_bind_bounds (&buf, __bound_tmp.1_7);
>>>   _11 = __builtin_bind_bounds (global_p.0_2, __bound_tmp.1_8);
>>>   bar (_9, _10, _11);
>>>
>>> Is it close to what you propose?
>>
>> Yes.
>
> Is there a way to bind bounds with structures in a similar way?

Not to have them easy to lookup in the SSA web.  A long time ago
I proposed to make SSA aggregates possible, so you could do

 tem_2 = __internal_bind_bounds (aggr(D), __bound_tmp.1_3,
__bound_tmp.1_4, ...);
 bar (tem_2);

(originally the SSA aggregates were supposed to make copy-propgagation
possible using the SSA copy propagator, and of course I needed it for
the middle-end array work)

Not sure if that will give other people the creeps (expand would
never expand the "load" from tem_2 but instead handle aggr as
parameter).  A slight complication is due to alias analysis
which would be need to taught that bar performs a load of aggr.

Richard.

>  For
> SSA names I may easily find definition and check if it is a binding
> builtin call. But for structures it is not possible.  The way I see it
> to mark all such args as addressable and load required bounds on
> expand pass.
>
> Ilya
>>
>> Richard.
>>
>>> Ilya

 Richard.



> Jeff

Re: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals

2013-11-11 Thread Jakub Jelinek

On Mon, Nov 11, 2013 at 11:19:21AM +0100, Dodji Seketeli wrote:
>  .../c-c++-common/cpp/warning-zero-in-literals-1.c  | Bin 0 -> 240 bytes
>  libcpp/include/line-map.h  |   8 +
>  libcpp/line-map.c  |  40 ++
>  8 files changed, 585 insertions(+), 39 deletions(-)
>  create mode 100644 
> gcc/testsuite/c-c++-common/cpp/warning-zero-in-literals-1.c
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 49285e5..50c2482 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1469,7 +1469,7 @@ OBJS = \
>  
>  # Objects in libcommon.a, potentially used by all host binaries and with
>  # no target dependencies.
> -OBJS-libcommon = diagnostic.o diagnostic-color.o pretty-print.o intl.o 
> input.o version.o
> +OBJS-libcommon = diagnostic.o diagnostic-color.o pretty-print.o intl.o vec.o 
> input.o version.o

Too long line?

> +  if (c == '\0')
> + c = ' ';
>pp_character (context->printer, c);

Does that match how libcpp counts the embedded '\0' character in column
computation?

> +/* The position (byte count) the the last byte of the line.  This
> +   normally points to the '\n' character, or to one byte after the
> +   last byte of the file, if the file doesn't contain a '\n'
> +   character.  */
> +size_t end_pos;

Does it really help to note this?  You can always just walk the line from
start_pos looking for '\n' or end of file.

> +static fcache*
> +add_file_to_cache_tab (const char *file_path)
> +{
> +  static size_t idx;
> +  fcache *r;
> +
> +  FILE *fp = fopen (file_path, "r");
> +  if (ferror (fp))
> +{
> +  fclose (fp);
> +  return NULL;
> +}
> +
> +  r = &fcache_tab[idx];

Wouldn't it be better to use some LRU algorithm to determine which
file to kick out of the cache?  Have some tick counter of cache lookups (or
creation) and assign the tick counter to the just created resp. used
cache entry, and remove the one with the smallest counter?
> +  fcache * r = lookup_file_in_cache_tab (file_path);
> +  if (r ==  NULL)

Formatting (no space after *, extra space after ==).

Jakub

Re: Some wide-int review comments

2013-11-11 Thread Kenneth Zadeck


On 11/11/2013 06:49 AM, Richard Biener wrote:

On Sat, Nov 9, 2013 at 3:23 PM, Kenneth Zadeck  wrote:

On 11/08/2013 05:30 AM, Richard Sandiford wrote:

  From tree-vrp.c:

@@ -1893,6 +1884,10 @@ vrp_int_const_binop (enum tree_code code
   /* If the singed operation wraps then int_const_binop has done
  everything we want.  */
   ;
+  /* Signed division of -1/0 overflows and by the time it gets here
+ returns NULL_TREE.  */
+  else if (!res)
+return NULL_TREE;
 else if ((TREE_OVERFLOW (res)
 && !TREE_OVERFLOW (val1)
 && !TREE_OVERFLOW (val2))

Why is this case different from trunk?  Or is it a bug-fix independent
of wide-int?

the api for division is different for wide-int than it was for double-int.

But this is using a tree API (int_const_binop) that didn't change
(it returned NULL for / 0 previously).  So what makes us arrive here
now?  (I agree there is a bug in VRP, but it shouldn't manifest itself
only on wide-int)

Richard.
My reading of the code is that is that i changed int_const_binop to 
return null_tree for this case.
On the trunk, only rem returns null_tree for divide by 0, on the wide 
int branch, both div and rem return null tree.


I know that this is going to bring on a string of questions that i do 
not remember the answers to as to why i made that change. but 
fold-const.c:int_const_binop_1 now returns null_tree and this is just 
fallout from that change.


Kenny

Thanks,
Richard

[PATCH] Fix failing assertion in calls.c:store_unaligned_arguments_into_pseudos

2013-11-11 Thread Ulrich Weigand

Hello,

when implementing the new ABI for powerpc64le-linux, I ran into an assertion
failure in store_unaligned_arguments_into_pseudos:
gcc_assert (args[i].partial % UNITS_PER_WORD == 0);

This can happen in the new ABI since we pass "homogeneous structures"
consisting of soleley floating point elements of the same type in
floating-point registers until they run out, and the rest of the
structure in memory.  If the structure member type is a 4-byte float,
and we only have an odd number of floating point registers available,
then args[i].partial can legitimately end up not being a multiple
of UNITS_PER_WORD (i.e. 8 on the platform).

Now, there are a number of similar checks that args[i].partial is
aligned elsewhere in calls.c and functions.c.  But for all of those
the logic is: if args[i].reg is a REG, then args[i].partial must be
a multiple of the word size; but if args[i].regs is a PARALLEL, then
args[i].partial can be any arbitrary value.  In the powerpc64le-linux
use case, the back-end always generates PARALLEL in this situation,
so it seemed the middle-end ought to support this -- and it does,
except for this one case.

However, looking more closely, it seems store_unaligned_arguments_into_pseudos
is not really useful for PARALLEL arguments in the first place.  What this
routine does is load arguments into args[i].aligned_regs.  But if we have
an argument where args[i].reg is a PARALLEL, args[i].aligned_regs will in
fact never be used later on at all!   Instead, PARALLEL will always be
handled directly via emit_group_move (in load_register_parameters), so
the code generated by store_unaligned_arguments_into_pseudos for such cases
is simply dead anyway.

Thus, I'd suggest to simply have store_unaligned_arguments_into_pseudos
skip PARALLEL arguments.

Tested on powerpc64-linux and powerpc64le-linux.

OK for mainline?

Bye,
Ulrich



ChangeLog:

2013-11-11  Ulrich Weigand  

* calls.c (store_unaligned_arguments_into_pseudos): Skip PARALLEL
arguments.

Index: gcc/gcc/calls.c
===
--- gcc.orig/gcc/calls.c
+++ gcc/gcc/calls.c
@@ -981,6 +981,7 @@ store_unaligned_arguments_into_pseudos (
 
   for (i = 0; i < num_actuals; i++)
 if (args[i].reg != 0 && ! args[i].pass_on_stack
+   && GET_CODE (args[i].reg) != PARALLEL
&& args[i].mode == BLKmode
&& MEM_P (args[i].value)
&& (MEM_ALIGN (args[i].value)
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

[wwwdocs] backend -> back end

2013-11-11 Thread Gerald Pfeifer

Working to address a user question, I noticed that many of our pages use 
the spelling of "backend" when http://gcc.gnu.org/codingconventions.html
suggest "back end" (noun) and "back-end" (adjective).

Joseph, if you confirm that back end it is (as a noun), I'll apply
the patch below.

Gerald

Index: contribute.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/contribute.html,v
retrieving revision 1.81
diff -u -3 -p -r1.81 contribute.html
--- contribute.html 18 Aug 2013 20:59:25 -  1.81
+++ contribute.html 10 Nov 2013 20:34:10 -
@@ -308,11 +308,11 @@ is a good candidate for being mentioned 
 
 Larger accomplishments, either as part of a specific project, or long
 term commitment, merit mention on the front page.  Examples include projects
-like tree-ssa, new backends, major advances in optimization or standards
+like tree-ssa, new back ends, major advances in optimization or standards
 compliance.
 
 The gcc-announce mailing list serves to announce new releases and changes
-like frontends or backends being dropped.
+like front ends or back ends being dropped.
 
 
 
Index: frontends.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/frontends.html,v
retrieving revision 1.38
diff -u -3 -p -r1.38 frontends.html
--- frontends.html  27 Mar 2013 21:53:00 -  1.38
+++ frontends.html  10 Nov 2013 20:34:10 -
@@ -33,14 +33,14 @@ are very mature.
 href="http://www.mercurylang.org/download/gcc-backend.html";>Mercury,
 a declarative logic/functional language. The University of Melbourne Mercury
 compiler is written in Mercury; originally it compiled via C but now it also
-has a backend that generates assembler directly, using the GCC backend.
+has a back end that generates assembler directly, using the GCC back end.
 
 http://CobolForGCC.sourceforge.net/";>Cobol For GCC
 (at an early stage of development).
 
 http://www.nongnu.org/gm2/";>GNU Modula-2 implements
 the PIM2, PIM3, PIM4 and ISO dialects of the language.  The compiler
-is fully operational with the GCC 4.1.2 backend (on GNU/Linux x86
+is fully operational with the GCC 4.1.2 back end (on GNU/Linux x86
 systems).  Work is in progress to move the frontend to the GCC trunk.
 The frontend is mostly written in Modula-2, but includes a bootstrap
 procedure via a heavily modified version of p2c.
Index: lists.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/lists.html,v
retrieving revision 1.106
diff -u -3 -p -r1.106 lists.html
--- lists.html  24 Oct 2013 01:31:10 -  1.106
+++ lists.html  10 Nov 2013 20:34:10 -
@@ -95,7 +95,7 @@ before subscribing<
   http://gcc.gnu.org/ml/jit/";>jit
   is for discussion and development of
   http://gcc.gnu.org/wiki/JIT";>libgccjit, an experimental
-  library for implementing Just-In-Time compilation using gcc as a backend.
+  library for implementing Just-In-Time compilation using GCC as a back end.
   The list is intended for both users and developers of the library.
   Patches for the jit branch should go to both this list and
   gcc-patches.
Index: news.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/news.html,v
retrieving revision 1.137
diff -u -3 -p -r1.137 news.html
--- news.html   12 Aug 2013 00:01:45 -  1.137
+++ news.html   10 Nov 2013 20:34:11 -
@@ -1079,10 +1079,10 @@ noticed or fixed defects or made other u
 May 1, 2000
 
 Richard Earnshaw of ARM Ltd, and Nick Clifton of Cygnus, a Red Hat
-company, have contributed a new backend for the Arm and Thumb
+company, have contributed a new back end for the Arm and Thumb
 processors.
 
-The new backend combines code generation for the Arm, the Thumb and
+The new back end combines code generation for the Arm, the Thumb and
 the StrongArm into one compiler, with the target processor and
 instruction sets being selectable via command line switches.
 
@@ -1260,7 +1260,7 @@ changes from the old GCC 2 sources.
 
 September 2, 1999
 
-Richard Henderson has finished merging the ia32 backend rewrite into the
+Richard Henderson has finished merging the ia32 back end rewrite into the
 mainline GCC sources.  The rewrite is designed to improve optimization
 opportunities for the Pentium II target, but also provides a cleaner
 way to optimize for the Pentium III, AMD-K7 and other high end ia32
@@ -1537,7 +1537,7 @@ Cygnus donates August 25, 1998
 
-  David Miller donates rewritten sparc 
backend.
+  David Miller donates rewritten SPARC back 
end.
 
 
 August 19, 1998
Index: svn.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/svn.html,v
retrieving revision 1.189
diff -u -3 -p -r1.189 svn.html
--- svn.html14 Sep 2013 00:42:41 -  1.189
+++ svn.html10 Nov 2013 20:34:11 -
@@ -880,7 +880,7 @@ be prefixed with the initials of the dis

[PATCH] Avoid duplicate calls to REG_PARM_STACK_SPACE

2013-11-11 Thread Ulrich Weigand

Hello,

this is another tweak to the middle-end to help support the new
powerpc64le-linux ABI.

In the new ABI, we make a distinction between functions that pass
all arguments solely in registers, and those that do not.  Only when
calling one the latter type (or a varags routine) does the caller
need to provide REG_PARM_STACK_SPACE; in the former case, this can
be omitted to save stack space.

This leads to a definition of REG_PARM_STACK_SPACE that is a lot
more complex than usual, which means it would be helpful if the
middle-end were to evaluate it sparingly (e.g. once per function
definition / call).

The middle-end already does cache REG_PARM_STACK_SPACE results,
however, this cache it not consistently used.  In particular,
the locate_and_pad_parm routine will re-evaluate the macro
every time it is called, even though all callers of the routine
already have a cached copy.

This patch adds a new argument to locate_and_pad_parm which is
used instead of evaluating REG_PARM_STACK_SPACE, and updates
the callers to pass in the correct value.

There should be no change in generated code on any platform.

Tested on powerpc64-linux and powerpc64le-linux.

OK for mainline?

Bye,
Ulrich

2013-11-11  Ulrich Weigand  
Alan Modra  

ChangeLog:

* function.c (assign_parms): Use all.reg_parm_stack_space instead
of re-evaluating REG_PARM_STACK_SPACE target macro.
(locate_and_pad_parm): New parameter REG_PARM_STACK_SPACE.  Use it
instead of evaluating target macro REG_PARM_STACK_SPACE every time.
(assign_parm_find_entry_rtl): Update call.
* calls.c (initialize_argument_information): Update call.
(emit_library_call_value_1): Likewise.
* expr.h (locate_and_pad_parm): Update prototype.

Index: gcc/gcc/expr.h
===
--- gcc.orig/gcc/expr.h
+++ gcc/gcc/expr.h
@@ -521,8 +521,8 @@ extern rtx expand_divmod (int, enum tree
  rtx, int);
 #endif
 
-extern void locate_and_pad_parm (enum machine_mode, tree, int, int, tree,
-struct args_size *,
+extern void locate_and_pad_parm (enum machine_mode, tree, int, int, int,
+tree, struct args_size *,
 struct locate_and_pad_arg_data *);
 
 /* Return the CODE_LABEL rtx for a LABEL_DECL, creating it if necessary.  */
Index: gcc/gcc/calls.c
===
--- gcc.orig/gcc/calls.c
+++ gcc/gcc/calls.c
@@ -1326,6 +1326,7 @@ initialize_argument_information (int num
 #else
 args[i].reg != 0,
 #endif
+reg_parm_stack_space,
 args[i].pass_on_stack ? 0 : args[i].partial,
 fndecl, args_size, &args[i].locate);
 #ifdef BLOCK_REG_PADDING
@@ -3736,7 +3737,8 @@ emit_library_call_value_1 (int retval, r
 #else
   argvec[count].reg != 0,
 #endif
-  0, NULL_TREE, &args_size, &argvec[count].locate);
+  reg_parm_stack_space, 0,
+  NULL_TREE, &args_size, &argvec[count].locate);
 
   if (argvec[count].reg == 0 || argvec[count].partial != 0
  || reg_parm_stack_space > 0)
@@ -3823,7 +3825,7 @@ emit_library_call_value_1 (int retval, r
 #else
   argvec[count].reg != 0,
 #endif
-  argvec[count].partial,
+  reg_parm_stack_space, argvec[count].partial,
   NULL_TREE, &args_size, &argvec[count].locate);
  args_size.constant += argvec[count].locate.size.constant;
  gcc_assert (!argvec[count].locate.size.var);
Index: gcc/gcc/function.c
===
--- gcc.orig/gcc/function.c
+++ gcc/gcc/function.c
@@ -2523,6 +2523,7 @@ assign_parm_find_entry_rtl (struct assig
 }
 
   locate_and_pad_parm (data->promoted_mode, data->passed_type, in_regs,
+  all->reg_parm_stack_space,
   entry_parm ? data->partial : 0, current_function_decl,
   &all->stack_args_size, &data->locate);
 
@@ -3511,11 +3512,7 @@ assign_parms (tree fndecl)
   /* Adjust function incoming argument size for alignment and
  minimum length.  */
 
-#ifdef REG_PARM_STACK_SPACE
-  crtl->args.size = MAX (crtl->args.size,
-   REG_PARM_STACK_SPACE (fndecl));
-#endif
-
+  crtl->args.size = MAX (crtl->args.size, all.reg_parm_stack_space);
   crtl->args.size = CEIL_ROUND (crtl->args.size,
   PARM_BOUNDARY / BITS_PER_UNIT);
 
@@ -3719,6 +3716,9 @@ gimplify_parameters (void)
IN_REGS is nonzero if the argument will be passed in registers.  It will
never be set if REG_PARM_STACK_SPACE is not defined.
 
+   REG_PARM_STACK_SPACE is the number o

[PATCH, rs6000] Fix little-endian bug in linux-unwind.h

2013-11-11 Thread Ulrich Weigand

Hello,

this fixes another issue related to CR unwinding, this time only on
64-bit little-endian systems.

The problem is linux-unwind.h:ppc_fallback_frame_state, which notes
that the kernel places the CR value in the low bits of a 64-bit
field in the signal handler frame; but the offset of that field
is different in little-endian.

Tested on powerpc64-linux and powerpc64le-linux.

OK for mainline?

Bye,
Ulrich


libgcc/ChangeLog:

2013-11-11  Ulrich Weigand  
Alan Modra  

* config/rs6000/linux-unwind.h (ppc_fallback_frame_state): Correct
location of CR save area for 64-bit little-endian systems.

Index: gcc/libgcc/config/rs6000/linux-unwind.h
===
--- gcc.orig/libgcc/config/rs6000/linux-unwind.h
+++ gcc/libgcc/config/rs6000/linux-unwind.h
@@ -185,6 +185,7 @@ ppc_fallback_frame_state (struct _Unwind
 {
   struct gcc_regs *regs = get_regs (context);
   struct gcc_vregs *vregs;
+  long cr_offset;
   long new_cfa;
   int i;
 
@@ -206,11 +207,13 @@ ppc_fallback_frame_state (struct _Unwind
   fs->regs.reg[i].loc.offset = (long) ®s->gpr[i] - new_cfa;
 }
 
+  /* The CR is saved in the low 32 bits of regs->ccr.  */
+  cr_offset = (long) ®s->ccr - new_cfa;
+#ifndef __LITTLE_ENDIAN__
+  cr_offset += sizeof (long) - 4;
+#endif
   fs->regs.reg[R_CR2].how = REG_SAVED_OFFSET;
-  /* CR? regs are always 32-bit and PPC is big-endian, so in 64-bit
- libgcc loc.offset needs to point to the low 32 bits of regs->ccr.  */
-  fs->regs.reg[R_CR2].loc.offset = (long) ®s->ccr - new_cfa
-  + sizeof (long) - 4;
+  fs->regs.reg[R_CR2].loc.offset = cr_offset;
 
   fs->regs.reg[R_LR].how = REG_SAVED_OFFSET;
   fs->regs.reg[R_LR].loc.offset = (long) ®s->link - new_cfa;
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

[PATCH, rs6000] Fix corner case in unwinding CR values

2013-11-11 Thread Ulrich Weigand

Hello,

this patch fixes a corner case bug in unwinding CR values.  The
underlying problem is as follows:

In order to save the CR in the prologue, two insns are necessary.
The first is a mfcr that loads the CR value into a GPR, and the
second is a store of the GPR to the stack.  The back-end currently
tags the first insn with a REG_FRAME_RELATED_EXPR, and marks both
as RTX_FRAME_RELATED_P.  This causes the middle-end to emit two
CFI records, showing the CR saved first in the GPR, and later
in its stack slot.  (The first record is omitted if there is no
possible exception in between.)

Now, assume we use -fnon-call-exceptions, and get a signal on
an instruction in between those two points, and the signal handler
attempts to throw.  The unwind logic will read CFI records and
determine that in this frame, CR was saved into a GPR; and in
the frame below (which must be a signal frame), that GPR was
saved onto the stack.

What the unwind logic then decides to do is to restore CR from
the save slot of that latter GPR; i.e. the unwinder will copy
that GPR save slot over the CR save slot of the unwind routine
that calls __builtin_eh_return.

Unfortunately, this is a problem on a 64-bit big-endian platform,
since that GPR save slot is 8 bytes, while the CR save slot is
just 4 bytes.  The unwinder therefore copies the wrong half of
the GPR save slot over the CR save slot ...


As second, related, problem will come up with the new ABI.  Here,
we want to track each CR field in a separate CFI record, even
though the values end up sharing the same stack slot.  This is
not a problem for a CFI record pointing to memory.  However, the
current dwarf2out.c middle-end logic is unable to handle the
situation where multiple registers are saved in the same *register*
at the same time ...


Both problems can be fixed in the same way, by never emitting
a CFI record showing CR saved into a GPR.  This is easily done
by simply marking only the store to memory as RTX_FRAME_RELATED_P.
However, to avoid generating incorrect code we must then prevent
and exception point to occur in between the two insns to save CR.
The easiest way to do so is to mark the store to the stack slot
as an additional user of all CR fields that need to be saved by
the current function.

This does restrict scheduling other instructions in between the
prologue a bit more than otherwise.  But since this affects only
very special cases (e.g. an instruction that may triggers a floating-
point exception, while -fnon-call-exceptions is active), this
seems not too bad ...


Note that in the epilogue, this is not really a problem, even though
we also need two instructions to restore CR.  We can (and do) simply
leave the CFI record point to the stack slot all the time, which
remains valid even after the CR value was read from there (and even
after the stack pointer itself was modified).


The patch below fixes this problem as described above, and adds
a test case that fails without the patch.

Tested on powerpc64-linux and powerpc64le-linux.

OK for mainline?

Bye,
Ulrich



ChangeLog:

2013-11-11  Ulrich Weigand  

* config/rs6000/rs6000.c (rs6000_emit_prologue): Do not place a
RTX_FRAME_RELATED_P marker on the UNSPEC_MOVESI_FROM_CR insn.
Instead, add USEs of all modified call-saved CR fields to the
insn storing the result to the stack slot, and provide an
appropriate REG_FRAME_RELATED_EXPR for that insn.
* config/rs6000/rs6000.md ("*crsave"): New insn pattern.
* config/rs6000/predicates.md ("crsave_operation"): New predicate.

testsuite/ChangeLog:

2013-11-11  Ulrich Weigand  

* g++.dg/eh/ppc64-sighandle-cr.C: New test.


Index: gcc/gcc/config/rs6000/rs6000.c
===
--- gcc.orig/gcc/config/rs6000/rs6000.c
+++ gcc/gcc/config/rs6000/rs6000.c
@@ -21761,21 +21761,9 @@ rs6000_emit_prologue (void)
   && REGNO (frame_reg_rtx) != cr_save_regno
   && !(using_static_chain_p && cr_save_regno == 11))
 {
-  rtx set;
-
   cr_save_rtx = gen_rtx_REG (SImode, cr_save_regno);
   START_USE (cr_save_regno);
-  insn = emit_insn (gen_movesi_from_cr (cr_save_rtx));
-  RTX_FRAME_RELATED_P (insn) = 1;
-  /* Now, there's no way that dwarf2out_frame_debug_expr is going
-to understand '(unspec:SI [(reg:CC 68) ...] UNSPEC_MOVESI_FROM_CR)'.
-But that's OK.  All we have to do is specify that _one_ condition
-code register is saved in this stack slot.  The thrower's epilogue
-will then restore all the call-saved registers.
-We use CR2_REGNO (70) to be compatible with gcc-2.95 on Linux.  */
-  set = gen_rtx_SET (VOIDmode, cr_save_rtx,
-gen_rtx_REG (SImode, CR2_REGNO));
-  add_reg_note (insn, REG_FRAME_RELATED_EXPR, set);
+  emit_insn (gen_movesi_from_cr (cr_save_rtx));
 }
 
   /* Do any required saving of fpr's.  If only one or two to save, do
@@ -22086,26 +22074,

[PATCH] Remove dead code in input_gimple_stmt

2013-11-11 Thread Richard Biener


The following patch removes now dead code from input_gimple_stmt
(which also could be quite slow).  Type mismatches can only
occur at the decl level after the new tree merging code went
in and those are handed by wrapping all decls in MEM_REFs to
transparently have them view-converted.

Bootstrapped and tested on x86_64-unknown-linux-gnu, LTO bootstrap
in progress.

Richard.

2013-11-11  Richard Biener  

* gimple-streamer-in.c (input_gimple_stmt): Remove now dead
code dealing with type mismatches inside component reference
chains.

Index: gcc/gimple-streamer-in.c
===
*** gcc/gimple-streamer-in.c.orig   2013-11-11 12:16:43.0 +0100
--- gcc/gimple-streamer-in.c2013-11-11 12:20:00.956191322 +0100
*** input_gimple_stmt (struct lto_input_bloc
*** 158,242 
  if (TREE_CODE (*opp) == ADDR_EXPR)
opp = &TREE_OPERAND (*opp, 0);
  while (handled_component_p (*opp))
!   {
! if (TREE_CODE (*opp) == COMPONENT_REF)
!   {
! /* Fixup FIELD_DECLs in COMPONENT_REFs, they are not handled
!by decl merging.  */
! tree field, type, tem;
! tree closest_match = NULL_TREE;
! field = TREE_OPERAND (*opp, 1);
! type = DECL_CONTEXT (field);
! for (tem = TYPE_FIELDS (type); tem; tem = TREE_CHAIN (tem))
!   {
! if (TREE_CODE (tem) != FIELD_DECL)
!   continue;
! if (tem == field)
!   break;
! if (DECL_NONADDRESSABLE_P (tem)
! == DECL_NONADDRESSABLE_P (field)
! && gimple_compare_field_offset (tem, field))
!   {
! if (types_compatible_p (TREE_TYPE (tem),
! TREE_TYPE (field)))
!   break;
! else
!   closest_match = tem;
!   }
!   }
! /* In case of type mismatches across units we can fail
!to unify some types and thus not find a proper
!field-decl here.  */
! if (tem == NULL_TREE)
!   {
! /* Thus, emit a ODR violation warning.  */
! if (warning_at (gimple_location (stmt), 0,
! "use of type %<%E%> with two mismatching "
! "declarations at field %<%E%>",
! type, TREE_OPERAND (*opp, 1)))
!   {
! if (TYPE_FIELDS (type))
!   inform (DECL_SOURCE_LOCATION (TYPE_FIELDS (type)),
!   "original type declared here");
! inform (DECL_SOURCE_LOCATION (TREE_OPERAND (*opp, 1)),
! "field in mismatching type declared here");
! if (TYPE_NAME (TREE_TYPE (field))
! && (TREE_CODE (TYPE_NAME (TREE_TYPE (field)))
! == TYPE_DECL))
!   inform (DECL_SOURCE_LOCATION
! (TYPE_NAME (TREE_TYPE (field))),
!   "type of field declared here");
! if (closest_match
! && TYPE_NAME (TREE_TYPE (closest_match))
! && (TREE_CODE (TYPE_NAME
!  (TREE_TYPE (closest_match))) == TYPE_DECL))
!   inform (DECL_SOURCE_LOCATION
! (TYPE_NAME (TREE_TYPE (closest_match))),
!   "type of mismatching field declared here");
!   }
! /* And finally fixup the types.  */
! TREE_OPERAND (*opp, 0)
!   = build1 (VIEW_CONVERT_EXPR, type,
! TREE_OPERAND (*opp, 0));
!   }
! else
!   TREE_OPERAND (*opp, 1) = tem;
!   }
! else if ((TREE_CODE (*opp) == ARRAY_REF
!   || TREE_CODE (*opp) == ARRAY_RANGE_REF)
!  && (TREE_CODE (TREE_TYPE (TREE_OPERAND (*opp, 0)))
!  != ARRAY_TYPE))
!   {
! /* And ARRAY_REFs to objects that had mismatched types
!during symbol merging to avoid ICEs.  */
! TREE_OPERAND (*opp, 0)
!   = build1 (VIEW_CONVERT_EXPR,
! build_array_type (TREE_TYPE (*opp), NULL_TREE),
! TREE_OPERAND (*opp, 0));
!   }
! 
!

[PATCH, rs6000] ELFv2 ABI preparation: Refactor call expanders

2013-11-11 Thread Ulrich Weigand

Hello,

this patch refactors code to expand calls in preparation for adding support
for the new little-endian ABI.

Currently, the call expanders always generate the same RTX pattern, which
is matched by different insns patterns depending on circumstances (ABI,
local vs. global calls, ...).   This makes it difficult if in some of
these circumstances, we'd really need a different RTX pattern, e.g.
becuase the call uses certain special registers, or because the call
must perform extra actions like saving or restoring the TOC pointer.

There is one exception to this logic already in place: when expanding
an indirect call on ABI_AIX, the expander calls a routine in rs6000.c.

This patch expands on that by handling *all* calls (including sibling
calls) on ABI_AIX via a routine in rs6000.c.   This should be fully
transparent to generated code, although generated RTL is a bit different:
we use CALL_INSN_FUNCTION_USAGE to track uses of ABI-defined registers
by the call, which has a couple of other advantages:

- we can remove some patterns that were duplicated solely to track
  whether or not a call uses r11;

- we can simply the sibcall patterns so they no longer explicitly
  refer to LR_REGNO;

- we can precisely track which calls use r2 to ensure we have correct
  data flow for the register (this doesn't actually matter for the
  current code, but will become necessary with the new ABI).


Tested on powerpc64-linux and powerpc64le-linux.

OK for mainline?

Bye,
Ulrich



ChangeLog:

2013-11-11  Ulrich Weigand  

* config/rs6000/rs6000.c (rs6000_call_indirect_aix): Rename to ...
(rs6000_call_aix): ... this.  Handle both direct and indirect calls.
Create call insn directly instead of via various gen_... routines.
Mention special registers used by the call in CALL_INSN_FUNCTION_USAGE.
(rs6000_sibcall_aix): New function.
* config/rs6000/rs6000.md (TOC_SAVE_OFFSET_32BIT): Remove.
(TOC_SAVE_OFFSET_64BIT): Likewise.
(AIX_FUNC_DESC_TOC_32BIT): Likewise.
(AIX_FUNC_DESC_TOC_64BIT): Likewise.
(AIX_FUNC_DESC_SC_32BIT): Likewise.
(AIX_FUNC_DESC_SC_64BIT): Likewise.
("call" expander): Call rs6000_call_aix.
("call_value" expander): Likewise.
("call_indirect_aix"): Replace this pattern ...
("call_indirect_aix_nor11"): ... and this pattern ...
("*call_indirect_aix"): ... by this insn pattern.
("call_value_indirect_aix"): Replace this pattern ...
("call_value_indirect_aix_nor11"): ... and this pattern ...
("*call_value_indirect_aix"): ... by this insn pattern.
("*call_nonlocal_aix32", "*call_nonlocal_aix64"): Replace by ...
("*call_nonlocal_aix"): ... this pattern.
("*call_value_nonlocal_aix32", "*call_value_nonlocal_aix64"): Replace
("*call_value_nonlocal_aix"): ... by this pattern.
("*call_local_aix"): New insn pattern.
("*call_value_local_aix"): Likewise.
("sibcall" expander): Call rs6000_sibcall_aix.
("sibcall_value" expander): Likewise.  Move earlier in file.
("*sibcall_nonlocal_aix"): Replace by ...
("*sibcall_aix"): ... this pattern.
("*sibcall_value_nonlocal_aix"): Replace by ...
("*sibcall_value_aix"): ... this pattern.
* config/rs6000/rs6000-protos.h (rs6000_call_indirect_aix): Remove.
(rs6000_call_aix): Add prototype.
(rs6000_sibcall_aix): Likewise.

Index: gcc/gcc/config/rs6000/rs6000.c
===
--- gcc.orig/gcc/config/rs6000/rs6000.c
+++ gcc/gcc/config/rs6000/rs6000.c
@@ -30599,118 +30599,138 @@ rs6000_legitimate_constant_p (enum machi
 }
 
 
-/* A function pointer under AIX is a pointer to a data area whose first word
-   contains the actual address of the function, whose second word contains a
-   pointer to its TOC, and whose third word contains a value to place in the
-   static chain register (r11).  Note that if we load the static chain, our
-   "trampoline" need not have any executable code.  */
+
+/* Expand code to perform a call under the AIX ABI.  */
 
 void
-rs6000_call_indirect_aix (rtx value, rtx func_desc, rtx flag)
+rs6000_call_aix (rtx value, rtx func_desc, rtx flag, rtx cookie)
 {
+  rtx toc_reg = gen_rtx_REG (Pmode, TOC_REGNUM);
+  rtx toc_load = NULL_RTX;
+  rtx toc_restore = NULL_RTX;
   rtx func_addr;
-  rtx toc_reg;
-  rtx sc_reg;
-  rtx stack_ptr;
-  rtx stack_toc_offset;
-  rtx stack_toc_mem;
-  rtx func_toc_offset;
-  rtx func_toc_mem;
-  rtx func_sc_offset;
-  rtx func_sc_mem;
+  rtx abi_reg = NULL_RTX;
+  rtx call[4];
+  int n_call;
   rtx insn;
-  rtx (*call_func) (rtx, rtx, rtx, rtx);
-  rtx (*call_value_func) (rtx, rtx, rtx, rtx, rtx);
-
-  stack_ptr = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
-  toc_reg = gen_rtx_REG (Pmode, TOC_REGNUM);
 
-  /* Load up address of the actual function.  */
-  func_desc = force_reg (Pmode, func_desc);
-  func_addr = gen_reg_rtx

[PATCH, rs6000] ELFv2 ABI preparation: Refactor rs6000_function_arg

2013-11-11 Thread Ulrich Weigand

Hello,

another patch in preparation for the new ABI.

When calling a function in the absence of a prototype, we need to pass
arguments that are normally passed in floating-point or vector registers
also on the stack and/or in GPRs.

The code currently handling this in rs6000_function_arg is implemented
in two separate places, one that handles floating-point arguments and
one that handles vector arguments.  This is probably reasonable with
the current ABI, since in the vector case, everything is much simpler
than in the floating-point case.

However, with the new ABI, we may now pass more complex arguments in
vector registers, in which case we might need to duplicate more of
the logic currently used only for floating-point arguments.

To avoid that duplication, this patch moves the logic to handle
argument duplication on stack or in GPRs out to a pair of new
helper routines, and uses them for both floating-point and vector
arguments.  For the current ABI, the result should be exactly
the same.

No change in generated code intented.

Tested on powerpc64-linux and powerpc64le-linux.

OK for mainline?

Bye,
Ulrich


ChangeLog:

2013-11-11  Ulrich Weigand  

* config/rs6000/rs6000.c (rs6000_psave_function_arg): New function.
(rs6000_finish_function_arg): Likewise.
(rs6000_function_arg): Use rs6000_psave_function_arg and
rs6000_finish_function_arg to handle both vector and floating
point arguments that are also passed in GPRs / the stack.

Index: gcc/gcc/config/rs6000/rs6000.c
===
--- gcc.orig/gcc/config/rs6000/rs6000.c
+++ gcc/gcc/config/rs6000/rs6000.c
@@ -9511,6 +9511,83 @@ rs6000_mixed_function_arg (enum machine_
   return gen_rtx_PARALLEL (mode, gen_rtvec_v (k, rvec));
 }
 
+/* We have an argument of MODE and TYPE that goes into FPRs or VRs,
+   but must also be copied into the parameter save area starting at
+   offset ALIGN_WORDS.  Fill in RVEC with the elements corresponding
+   to the GPRs and/or memory.  Return the number of elements used.  */
+
+static int
+rs6000_psave_function_arg (enum machine_mode mode, const_tree type,
+  int align_words, rtx *rvec)
+{
+  int k = 0;
+
+  if (align_words < GP_ARG_NUM_REG)
+{
+  int n_words = rs6000_arg_size (mode, type);
+
+  if (align_words + n_words > GP_ARG_NUM_REG
+ || (TARGET_32BIT && TARGET_POWERPC64))
+   {
+ /* If this is partially on the stack, then we only
+include the portion actually in registers here.  */
+ enum machine_mode rmode = TARGET_32BIT ? SImode : DImode;
+ int i = 0;
+
+ if (align_words + n_words > GP_ARG_NUM_REG)
+   {
+ /* Not all of the arg fits in gprs.  Say that it goes in memory
+too, using a magic NULL_RTX component.  Also see comment in
+rs6000_mixed_function_arg for why the normal
+function_arg_partial_nregs scheme doesn't work in this case. */
+ rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, NULL_RTX, const0_rtx);
+   }
+
+ do
+   {
+ rtx r = gen_rtx_REG (rmode, GP_ARG_MIN_REG + align_words);
+ rtx off = GEN_INT (i++ * GET_MODE_SIZE (rmode));
+ rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, r, off);
+   }
+ while (++align_words < GP_ARG_NUM_REG && --n_words != 0);
+   }
+  else
+   {
+ /* The whole arg fits in gprs.  */
+ rtx r = gen_rtx_REG (mode, GP_ARG_MIN_REG + align_words);
+ rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, r, const0_rtx);
+   }
+}
+  else
+{
+  /* It's entirely in memory.  */
+  rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, NULL_RTX, const0_rtx);
+}
+
+  return k;
+}
+
+/* RVEC is a vector of K components of an argument of mode MODE.
+   Construct the final function_arg return value from it.  */
+
+static rtx
+rs6000_finish_function_arg (enum machine_mode mode, rtx *rvec, int k)
+{
+  gcc_assert (k >= 1);
+
+  /* Avoid returning a PARALLEL in the trivial cases.  */
+  if (k == 1)
+{
+  if (XEXP (rvec[0], 0) == NULL_RTX)
+   return NULL_RTX;
+
+  if (GET_MODE (XEXP (rvec[0], 0)) == mode)
+   return XEXP (rvec[0], 0);
+}
+
+  return gen_rtx_PARALLEL (mode, gen_rtvec_v (k, rvec));
+}
+
 /* Determine where to put an argument to a function.
Value is zero to push the argument on the stack,
or a hard register in which to store the argument.
@@ -9580,32 +9657,25 @@ rs6000_function_arg (cumulative_args_t c
 }
 
   if (USE_ALTIVEC_FOR_ARG_P (cum, mode, named))
-if (TARGET_64BIT && ! cum->prototype)
-  {
-   /* Vector parameters get passed in vector register
-  and also in GPRs or memory, in absence of prototype.  */
-   int align_words;
-   rtx slot;
-   align_words = (cum->words + 1) & ~1;
+{
+  rtx rvec[GP_ARG_NUM_REG + 1];
+  rtx r;
+  int k = 0;
 
-

[PATCH] Fix alias machinery to deal with DECL_BIT_FIELD_REPRESENTATIVE in COMPONENT_REFs

2013-11-11 Thread Richard Biener


The following patch is necessary to make DECL_BIT_FIELD_REPRESENTATIVE
useable in COMPONENT_REFs (which is the easiest way to get
some bitfield lowering).

An earlier patch passed bootstrapped and testing on 
x86_64-unknown-linux-gnu (with bitfield lowering), re-testing
after minor changes.

2013-11-11  Richard Biener  

* tree-ssa-alias.c (nonoverlapping_component_refs_of_decl_p):
Properly handle DECL_BIT_FIELD_REPRESENTATIVE occuring as
COMPONENT_REF operand.
* alias.c (nonoverlapping_component_refs_p): Likewise.
* stor-layout.c (start_bitfield_representative): Mark
DECL_BIT_FIELD_REPRESENTATIVE as DECL_NONADDRESSABLE_P.

Index: gcc/tree-ssa-alias.c
===
*** gcc/tree-ssa-alias.c.orig   2013-11-11 10:32:25.0 +0100
--- gcc/tree-ssa-alias.c2013-11-11 11:35:16.042386275 +0100
*** nonoverlapping_component_refs_of_decl_p
*** 829,841 
if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
 goto may_overlap;
  
-   /* Different fields of the same record type cannot overlap.
-??? Bitfields can overlap at RTL level so punt on them.  */
if (field1 != field2)
{
  component_refs1.release ();
  component_refs2.release ();
! return !(DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2));
}
  }
  
--- 829,848 
if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
 goto may_overlap;
  
if (field1 != field2)
{
  component_refs1.release ();
  component_refs2.release ();
! /* A field and its representative need to be considered the
!same.  */
! if (DECL_BIT_FIELD_REPRESENTATIVE (field1) == field2
! || DECL_BIT_FIELD_REPRESENTATIVE (field2) == field1)
!   return false;
! /* Different fields of the same record type cannot overlap.
!??? Bitfields can overlap at RTL level so punt on them.  */
! if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2))
!   return false;
! return true;
}
  }
  
Index: gcc/stor-layout.c
===
*** gcc/stor-layout.c.orig  2013-11-11 12:05:16.0 +0100
--- gcc/stor-layout.c   2013-11-11 13:42:14.947161958 +0100
*** start_bitfield_representative (tree fiel
*** 1747,1752 
--- 1747,1757 
DECL_SIZE_UNIT (repr) = DECL_SIZE_UNIT (field);
DECL_PACKED (repr) = DECL_PACKED (field);
DECL_CONTEXT (repr) = DECL_CONTEXT (field);
+   /* There are no indirect accesses to this field.  If we introduce
+  some then they have to use the record alias set.  This makes
+  sure to properly conflict with [indirect] accesses to addressable
+  fields of the bitfield group.  */
+   DECL_NONADDRESSABLE_P (repr) = 1;
return repr;
  }
  
Index: gcc/alias.c
===
*** gcc/alias.c.orig2013-11-11 15:29:12.0 +0100
--- gcc/alias.c 2013-11-11 15:29:17.054068720 +0100
*** nonoverlapping_component_refs_p (const_r
*** 2302,2310 
  
  found:
/* If we're left with accessing different fields of a structure, then no
!possible overlap, unless they are both bitfields.  */
if (TREE_CODE (typex) == RECORD_TYPE && fieldx != fieldy)
!   return !(DECL_BIT_FIELD (fieldx) && DECL_BIT_FIELD (fieldy));
  
/* The comparison on the current field failed.  If we're accessing
 a very nested structure, look at the next outer level.  */
--- 2302,2316 
  
  found:
/* If we're left with accessing different fields of a structure, then no
!possible overlap, unless they are both bitfields or one is
!the representative of the other.  */
if (TREE_CODE (typex) == RECORD_TYPE && fieldx != fieldy)
!   {
! if (DECL_BIT_FIELD_REPRESENTATIVE (fieldx) == fieldy
! || DECL_BIT_FIELD_REPRESENTATIVE (fieldy) == fieldx)
!   return false;
! return !(DECL_BIT_FIELD (fieldx) && DECL_BIT_FIELD (fieldy));
!   }
  
/* The comparison on the current field failed.  If we're accessing
 a very nested structure, look at the next outer level.  */

[PATCH, rs0000] ELFv2 ABI preparation: Refactor some uses of ABI_AIX

2013-11-11 Thread Ulrich Weigand

Hello,

this is another patch to prepare for the new ABI.  Since this will introduce
a new setting of DEFAULT_ABI, which will have to be treated like ABI_AIX in
many cases, it would be better if there were fewer explicit references to
ABI_AIX in the back-end.  This patch eliminates a number of them:

- DEFAULT_ABI is either ABI_V4, ABI_DARWIN, or ABI_AIX, so any test for
  two of these can be replaced by a test for the third

- there are some code paths that are never used on ABI_DARWIN.  Here,
  any test for ABI_AIX can be replaced by one for ABI_V4.  (This is
  particularly obvious for the load_toc_v4_... patterns that even
  have v4 in their name ...)

- rs6000_elf_declare_function_name had a duplicated test for ABI_AIX in
  a code block that already tests for ABI_AIX

None of this is intended to change generated code on any platform.

Tested on powerpc64-linux and powerpc64le-linux.

OK for mainline?

Bye,
Ulrich


ChangeLog:

2013-11-11  Ulrich Weigand  

* config/rs6000/rs6000.c (rs6000_option_override_internal): Replace
"DEFAULT_ABI != ABI_AIX" test by testing for ABI_V4 or ABI_DARWIN.
(rs6000_savres_strategy): Likewise.
(rs6000_return_addr): Likewise.
(rs6000_emit_load_toc_table): Replace "DEFAULT_ABI != ABI_AIX" by
testing for ABI_V4 (since ABI_DARWIN is impossible here).
(rs6000_emit_prologue): Likewise.
(legitimate_lo_sum_address_p): Simplify DEFAULT_ABI test.
(rs6000_elf_declare_function_name): Remove duplicated test.
* config/rs6000/rs6000.md ("load_toc_v4_PIC_1"): Explicitly test
for ABI_V4 (instead of "DEFAULT_ABI != ABI_AIX" test).
("load_toc_v4_PIC_1_normal"): Likewise.
("load_toc_v4_PIC_1_476"): Likewise.
("load_toc_v4_PIC_1b"): Likewise.
("load_toc_v4_PIC_1b_normal"): Likewise.
("load_toc_v4_PIC_1b_476"): Likewise.
("load_toc_v4_PIC_2"): Likewise.
("load_toc_v4_PIC_3b"): Likewise.
("load_toc_v4_PIC_3c"): Likewise.
* config/rs6000/rs6000.h (RS6000_REG_SAVE): Simplify DEFAULT_ABI test.
(RS6000_SAVE_AREA): Likewise.
(FP_ARG_MAX_REG): Likewise.
(RETURN_ADDRESS_OFFSET): Likewise.
* config/rs6000/sysv.h (TARGET_TOC): Test for ABI_V4 instead
of ABI_AIX.
(SUBTARGET_OVERRIDE_OPTIONS): Likewise.
(MINIMAL_TOC_SECTION_ASM_OP): Likewise.

Index: gcc/gcc/config/rs6000/rs6000.c
===
--- gcc.orig/gcc/config/rs6000/rs6000.c
+++ gcc/gcc/config/rs6000/rs6000.c
@@ -3669,7 +3669,7 @@ rs6000_option_override_internal (bool gl
 
   /* We should always be splitting complex arguments, but we can't break
 Linux and Darwin ABIs at the moment.  For now, only AIX is fixed.  */
-  if (DEFAULT_ABI != ABI_AIX)
+  if (DEFAULT_ABI == ABI_V4 || DEFAULT_ABI == ABI_DARWIN)
targetm.calls.split_complex_arg = NULL;
 }
 
@@ -6384,7 +6384,7 @@ legitimate_lo_sum_address_p (enum machin
 {
   bool large_toc_ok;
 
-  if (DEFAULT_ABI != ABI_AIX && DEFAULT_ABI != ABI_DARWIN && flag_pic)
+  if (DEFAULT_ABI == ABI_V4 && flag_pic)
return false;
   /* LRA don't use LEGITIMIZE_RELOAD_ADDRESS as it usually calls
 push_reload from reload pass code.  LEGITIMIZE_RELOAD_ADDRESS
@@ -19605,7 +19605,8 @@ rs6000_savres_strategy (rs6000_stack_t *
  by the static chain.  It would require too much fiddling and the
  static chain is rarely used anyway.  FPRs are saved w.r.t the stack
  pointer on Darwin, and AIX uses r1 or r12.  */
-  if (using_static_chain_p && DEFAULT_ABI != ABI_AIX)
+  if (using_static_chain_p
+  && (DEFAULT_ABI == ABI_V4 || DEFAULT_ABI == ABI_DARWIN))
 strategy |= ((DEFAULT_ABI == ABI_DARWIN ? 0 : SAVE_INLINE_FPRS)
 | SAVE_INLINE_GPRS
 | SAVE_INLINE_VRS | REST_INLINE_VRS);
@@ -20301,7 +20302,8 @@ rs6000_return_addr (int count, rtx frame
   /* Currently we don't optimize very well between prolog and body
  code and for PIC code the code can be actually quite bad, so
  don't try to be too clever here.  */
-  if (count != 0 || (DEFAULT_ABI != ABI_AIX && flag_pic))
+  if (count != 0
+  || ((DEFAULT_ABI == ABI_V4 || DEFAULT_ABI == ABI_DARWIN) && flag_pic))
 {
   cfun->machine->ra_needs_full_frame = 1;
 
@@ -20449,7 +20451,7 @@ rs6000_emit_load_toc_table (int fromprol
   rtx dest;
   dest = gen_rtx_REG (Pmode, RS6000_PIC_OFFSET_TABLE_REGNUM);
 
-  if (TARGET_ELF && TARGET_SECURE_PLT && DEFAULT_ABI != ABI_AIX && flag_pic)
+  if (TARGET_ELF && TARGET_SECURE_PLT && DEFAULT_ABI == ABI_V4 && flag_pic)
 {
   char buf[30];
   rtx lab, tmp1, tmp2, got;
@@ -20477,7 +20479,7 @@ rs6000_emit_load_toc_table (int fromprol
   emit_insn (gen_load_toc_v4_pic_si ());
   emit_move_insn (dest, gen_rtx_REG (Pmode, LR_REGNO));
 }
-  else if (TARGET_ELF && DEFAULT_ABI != ABI_AIX && flag_pic == 2)
+  else if (TARGET_ELF && DEFA

[PATCH, rs6000] ELFv2 ABI preparation: Remove USE_FP/ALTIVEC_FOR_ARG_P type arg

2013-11-11 Thread Ulrich Weigand

Hello,

another patch in preparation for the new ABI.

USE_FP_FOR_ARG_P and USE_ALTIVEC_FOR_ARG_P take a TYPE argument
which they never use.  Since passing a correct TYPE for the
homogeneous struct case would be a bit problematic, it seems
cleaner to just remove the unused argument.

No change in generated code intented.

Tested on powerpc64-linux and powerpc64le-linux.

OK for mainline?

Bye,
Ulrich


ChangeLog:

2013-11-11  Ulrich Weigand  

* config/rs6000/rs6000.c (USE_FP_FOR_ARG_P): Remove TYPE argument.
(USE_ALTIVEC_FOR_ARG_P): Likewise.
(rs6000_darwin64_record_arg_advance_recurse): Update uses.
(rs6000_function_arg_advance_1):Likewise.
(rs6000_darwin64_record_arg_recurse): Likewise.
(rs6000_function_arg): Likewise.
(rs6000_arg_partial_bytes): Likewise.

Index: gcc/gcc/config/rs6000/rs6000.c
===
--- gcc.orig/gcc/config/rs6000/rs6000.c
+++ gcc/gcc/config/rs6000/rs6000.c
@@ -8434,13 +8434,13 @@ rs6000_member_type_forces_blk (const_tre
 }
 
 /* Nonzero if we can use a floating-point register to pass this arg.  */
-#define USE_FP_FOR_ARG_P(CUM,MODE,TYPE)\
+#define USE_FP_FOR_ARG_P(CUM,MODE) \
   (SCALAR_FLOAT_MODE_P (MODE)  \
&& (CUM)->fregno <= FP_ARG_MAX_REG  \
&& TARGET_HARD_FLOAT && TARGET_FPRS)
 
 /* Nonzero if we can use an AltiVec register to pass this arg.  */
-#define USE_ALTIVEC_FOR_ARG_P(CUM,MODE,TYPE,NAMED) \
+#define USE_ALTIVEC_FOR_ARG_P(CUM,MODE,NAMED)  \
   (ALTIVEC_OR_VSX_VECTOR_MODE (MODE)   \
&& (CUM)->vregno <= ALTIVEC_ARG_MAX_REG \
&& TARGET_ALTIVEC_ABI   \
@@ -8889,7 +8889,7 @@ rs6000_darwin64_record_arg_advance_recur
 
if (TREE_CODE (ftype) == RECORD_TYPE)
  rs6000_darwin64_record_arg_advance_recurse (cum, ftype, bitpos);
-   else if (USE_FP_FOR_ARG_P (cum, mode, ftype))
+   else if (USE_FP_FOR_ARG_P (cum, mode))
  {
unsigned n_fpregs = (GET_MODE_SIZE (mode) + 7) >> 3;
rs6000_darwin64_record_arg_advance_flush (cum, bitpos, 0);
@@ -8930,7 +8930,7 @@ rs6000_darwin64_record_arg_advance_recur
else
  cum->words += n_fpregs;
  }
-   else if (USE_ALTIVEC_FOR_ARG_P (cum, mode, type, 1))
+   else if (USE_ALTIVEC_FOR_ARG_P (cum, mode, 1))
  {
rs6000_darwin64_record_arg_advance_flush (cum, bitpos, 0);
cum->vregno++;
@@ -8993,7 +8993,7 @@ rs6000_function_arg_advance_1 (CUMULATIV
 {
   bool stack = false;
 
-  if (USE_ALTIVEC_FOR_ARG_P (cum, mode, type, named))
+  if (USE_ALTIVEC_FOR_ARG_P (cum, mode, named))
{
  cum->vregno++;
  if (!TARGET_ALTIVEC)
@@ -9366,7 +9366,7 @@ rs6000_darwin64_record_arg_recurse (CUMU
 
if (TREE_CODE (ftype) == RECORD_TYPE)
  rs6000_darwin64_record_arg_recurse (cum, ftype, bitpos, rvec, k);
-   else if (cum->named && USE_FP_FOR_ARG_P (cum, mode, ftype))
+   else if (cum->named && USE_FP_FOR_ARG_P (cum, mode))
  {
unsigned n_fpreg = (GET_MODE_SIZE (mode) + 7) >> 3;
 #if 0
@@ -9394,7 +9394,7 @@ rs6000_darwin64_record_arg_recurse (CUMU
if (mode == TFmode || mode == TDmode)
  cum->fregno++;
  }
-   else if (cum->named && USE_ALTIVEC_FOR_ARG_P (cum, mode, ftype, 1))
+   else if (cum->named && USE_ALTIVEC_FOR_ARG_P (cum, mode, 1))
  {
rs6000_darwin64_record_arg_flush (cum, bitpos, rvec, k);
rvec[(*k)++]
@@ -9579,7 +9579,7 @@ rs6000_function_arg (cumulative_args_t c
   /* Else fall through to usual handling.  */
 }
 
-  if (USE_ALTIVEC_FOR_ARG_P (cum, mode, type, named))
+  if (USE_ALTIVEC_FOR_ARG_P (cum, mode, named))
 if (TARGET_64BIT && ! cum->prototype)
   {
/* Vector parameters get passed in vector register
@@ -9707,7 +9707,7 @@ rs6000_function_arg (cumulative_args_t c
   if (mode == TDmode && (cum->fregno % 2) == 1)
cum->fregno++;
 
-  if (USE_FP_FOR_ARG_P (cum, mode, type))
+  if (USE_FP_FOR_ARG_P (cum, mode))
{
  rtx rvec[GP_ARG_NUM_REG + 1];
  rtx r;
@@ -9823,7 +9823,7 @@ rs6000_arg_partial_bytes (cumulative_arg
   if (DEFAULT_ABI == ABI_V4)
 return 0;
 
-  if (USE_ALTIVEC_FOR_ARG_P (cum, mode, type, named)
+  if (USE_ALTIVEC_FOR_ARG_P (cum, mode, named)
   && cum->nargs_prototype >= 0)
 return 0;
 
@@ -9833,7 +9833,7 @@ rs6000_arg_partial_bytes (cumulative_arg
 
   align_words = rs6000_parm_start (mode, type, cum->words);
 
-  if (USE_FP_FOR_ARG_P (cum, mode, type))
+  if (USE_FP_FOR_ARG_P (cum, mode))
 {
   /* If we are passing this arg in the fixed parameter save area
 (gprs or memory) as well as fprs, then this function should
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.w

[PATCH, rs6000] ELFv2 ABI preparation: Refactor rs6000_arg_partial_bytes

2013-11-11 Thread Ulrich Weigand

Hello,

this is the final patch preparing for the new ABI.

The logic in rs6000_arg_partial_bytes is a bit complex, since it apparently
still contains remnants from the time where this routine was used even for
arguments that are returned both in GPRs and FPRs/VRs (*and* memory).

These days, all such cases are handled completely in rs6000_function_arg
so rs6000_arg_partial_bytes should always return 0; which it *does*, even
though in a somewhat complex way.

This complex logic would make handling homogeneous structs more difficult
than it needs to be.  Therefore, this patch simplifies the logic by making
explicit the fact that rs6000_arg_partial_bytes does not actually need to
handle the cases described above.

No change in generated code expected.

Tested on powerpc64-linux and powerpc64le-linux.

OK for mainline?

Bye,
Ulrich


ChangeLog:

2013-11-11  Ulrich Weigand  

* config/rs6000/rs6000.c (rs6000_arg_partial_bytes): Simplify logic
by making use of the fact that for vector / floating point arguments
passed both in VRs/FPRs and in the fixed parameter area, the partial
bytes mechanism is in fact not used.

Index: gcc/gcc/config/rs6000/rs6000.c
===
--- gcc.orig/gcc/config/rs6000/rs6000.c
+++ gcc/gcc/config/rs6000/rs6000.c
@@ -9835,15 +9835,25 @@ rs6000_arg_partial_bytes (cumulative_arg
  tree type, bool named)
 {
   CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
+  bool passed_in_gprs = true;
   int ret = 0;
   int align_words;
 
   if (DEFAULT_ABI == ABI_V4)
 return 0;
 
-  if (USE_ALTIVEC_FOR_ARG_P (cum, mode, named)
-  && cum->nargs_prototype >= 0)
-return 0;
+  if (USE_ALTIVEC_FOR_ARG_P (cum, mode, named))
+{
+  /* If we are passing this arg in the fixed parameter save area
+ (gprs or memory) as well as VRs, we do not use the partial
+bytes mechanism; instead, rs6000_function_arg wil return a
+PARALLEL including a memory element as necessary.  */
+  if (TARGET_64BIT && ! cum->prototype)
+   return 0;
+
+  /* Otherwise, we pass in VRs only.  No partial copy possible.  */
+  passed_in_gprs = false;
+}
 
   /* In this complicated case we just disable the partial_nregs code.  */
   if (TARGET_MACHO && rs6000_darwin64_struct_check_p (mode, type))
@@ -9853,24 +9863,27 @@ rs6000_arg_partial_bytes (cumulative_arg
 
   if (USE_FP_FOR_ARG_P (cum, mode))
 {
+  unsigned long n_fpreg = (GET_MODE_SIZE (mode) + 7) >> 3;
+
   /* If we are passing this arg in the fixed parameter save area
-(gprs or memory) as well as fprs, then this function should
-return the number of partial bytes passed in the parameter
-save area rather than partial bytes passed in fprs.  */
+ (gprs or memory) as well as FPRs, we do not use the partial
+bytes mechanism; instead, rs6000_function_arg wil return a
+PARALLEL including a memory element as necessary.  */
   if (type
  && (cum->nargs_prototype <= 0
  || (DEFAULT_ABI == ABI_AIX
  && TARGET_XL_COMPAT
  && align_words >= GP_ARG_NUM_REG)))
return 0;
-  else if (cum->fregno + ((GET_MODE_SIZE (mode) + 7) >> 3)
-  > FP_ARG_MAX_REG + 1)
+
+  /* Otherwise, we pass in FPRs only.  Check for partial copies.  */
+  passed_in_gprs = false;
+  if (cum->fregno + n_fpreg > FP_ARG_MAX_REG + 1)
ret = (FP_ARG_MAX_REG + 1 - cum->fregno) * 8;
-  else if (cum->nargs_prototype >= 0)
-   return 0;
 }
 
-  if (align_words < GP_ARG_NUM_REG
+  if (passed_in_gprs
+  && align_words < GP_ARG_NUM_REG
   && GP_ARG_NUM_REG < align_words + rs6000_arg_size (mode, type))
 ret = (GP_ARG_NUM_REG - align_words) * (TARGET_32BIT ? 4 : 8);
 
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PING^2] Re: [PATCH] Caller instrumentation with -finstrument-calls

2013-11-11 Thread Jan Hubicka

Hi,
I apologize for long time for the review.

> > diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
> > index 8f7f5e5..f3ad003 100644
> > --- a/gcc/c-family/c-common.c
> > +++ b/gcc/c-family/c-common.c
> > @@ -343,6 +343,8 @@ static tree handle_tls_model_attribute (tree *, tree, 
> > tree, int,
> > bool *);
> >  static tree handle_no_instrument_function_attribute (tree *, tree,
> >  tree, int, bool *);
> > +static tree handle_no_instrument_calls_attribute (tree *, tree,
> > +tree, int, bool *);
> >  static tree handle_malloc_attribute (tree *, tree, tree, int, bool *);
> >  static tree handle_returns_twice_attribute (tree *, tree, tree, int, bool 
> > *);
> >  static tree handle_no_limit_stack_attribute (tree *, tree, tree, int,
> > @@ -658,6 +660,9 @@ const struct attribute_spec c_common_attribute_table[] =
> >{ "no_instrument_function", 0, 0, true,  false, false,
> >   handle_no_instrument_function_attribute,
> >   false },
> > +  { "no_instrument_calls", 0, 0, true,  false, false,
> > + handle_no_instrument_calls_attribute,
> > + false },

So you are adding no_instrument_calls in addition to no_instrument_function and 
moreover you add
command line option to disable/enable this per line basis.  I see this follow 
the pre-existing
functio ninstrumentatio ncode, but can't we just make this part of the function 
instrumentation
instead of adding three new knobs for this? (i.e. have just -finstrument-calls 
that will be controlled
same way as function instrumentation).

> > diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> > index 8170a80..4657e48 100644
> > --- a/gcc/c/c-decl.c
> > +++ b/gcc/c/c-decl.c
> > @@ -2287,6 +2287,8 @@ merge_decls (tree newdecl, tree olddecl, tree 
> > newtype, tree oldtype)
> >   DECL_NO_LIMIT_STACK (newdecl) |= DECL_NO_LIMIT_STACK (olddecl);
> >   DECL_NO_INSTRUMENT_FUNCTION_ENTRY_EXIT (newdecl)
> > |= DECL_NO_INSTRUMENT_FUNCTION_ENTRY_EXIT (olddecl);
> > + DECL_NO_INSTRUMENT_CALLS_BEFORE_AFTER (newdecl)
> > +   |= DECL_NO_INSTRUMENT_CALLS_BEFORE_AFTER (olddecl);
> >   TREE_THIS_VOLATILE (newdecl) |= TREE_THIS_VOLATILE (olddecl);
> >   DECL_IS_MALLOC (newdecl) |= DECL_IS_MALLOC (olddecl);
> >   DECL_IS_OPERATOR_NEW (newdecl) |= DECL_IS_OPERATOR_NEW (olddecl);

You do not need to introduce new DECL flag for this.  We used to do this in 
dark ages before
we had attributes in place, but this time we usually just look up the given 
attribute.

> > +@item -finstrument-calls
> > +@opindex finstrument-calls
> > +Generate instrumentation calls immediately before and after each
> > +function call. The following profiling functions will be called with
> > +the address of the function that is called between them. Use
> > +@code{__builtin_return_address(0)} inside the profiling functions to
> > +get the addresses from where they are called.
> > +
> > +@smallexample
> > +void __gnu_profile_call_before (void *fn);
> > +void __gnu_profile_call_after  (void *fn);
> > +@end smallexample

I would expect this to be useful i.e. for dynamic callgraph construction.
In this case I would expect to have two parameters (calling function
and called function) instead of just one (calling function, right?)
> > diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> > index e2ae893..8401278 100644
> > --- a/gcc/gimplify.c
> > +++ b/gcc/gimplify.c
> > @@ -89,6 +89,8 @@ static struct gimplify_omp_ctx *gimplify_omp_ctxp;
> >  
> >  /* Forward declaration.  */
> >  static enum gimplify_status gimplify_compound_expr (tree *, gimple_seq *, 
> > bool);
> > +static bool flag_instrument_calls_exclude_p (tree fndecl);
> > +static bool flag_instrument_functions_exclude_p (tree fndecl);
> >  
> >  /* Mark X addressable.  Unlike the langhook we expect X to be in gimple
> > form and we don't do any syntax checking.  */
> > @@ -1153,6 +1155,63 @@ build_stack_save_restore (gimple *save, gimple 
> > *restore)
> >  1, tmp_var);
> >  }
> >  
> > +/* Returns the function decl that corresponds the function called in
> > +   CALL_EXPR if call instrumentation is enabled.  */
> > +
> > +static tree
> > +addr_expr_for_call_instrumentation (tree call_expr)
> > +{
> > +  tree addr_expr = NULL_TREE;
> > +
> > +  if (!gimplify_ctxp->into_ssa && flag_instrument_calls_before_after
> > +  && !DECL_NO_INSTRUMENT_CALLS_BEFORE_AFTER (current_function_decl)
> > +  && !flag_instrument_calls_exclude_p (current_function_decl))
> > +{
> > +  tree fndecl = get_callee_fndecl (call_expr);
> > +  if (fndecl)
> > +{
> > + if (!DECL_NO_INSTRUMENT_FUNCTION_ENTRY_EXIT (fndecl)
> > + && !flag_instrument_functions_exclude_p (fndecl))

Here you can just use lookup_attribute ("no_instrument", attributes) != NULL
instead of DECL_NO_

Re: RFA: Fix PR middle-end/59049

2013-11-11 Thread Jakub Jelinek

On Fri, Nov 08, 2013 at 05:11:39PM +, Joern Rennecke wrote:
> On 8 November 2013 15:50, Steven Bosscher  wrote:
> > Even with the gimple opts disabled, a const-const comparison would
> > normally be folded by the RTL expanders.
> 
> Well, in this spirit, attached is another way to address the RTL side
> of the problem.
> As mention in the PR, the tree side of the problem started showing up
> in r204194.

> 2013-11-08  Joern Rennecke  
> 
>   PR middle-end/59049
>   * expmed.c (emit_store_flag): Fail for const-const comparison.

LGTM (I'd just write return NULL_RTX; instead of return 0;).

> --- expmed.c  (revision 204568)
> +++ expmed.c  (working copy)
> @@ -5401,6 +5401,13 @@ emit_store_flag (rtx target, enum rtx_co
>rtx subtarget;
>rtx tem, last, trueval;
>  
> +  /* If we compare constants, we shouldn't use a store-flag operation,
> + but a constant load.  We can get there via the vanilla route that
> + usually generates a compare-branch sequence, but will in this case
> + fold the comparison to a constant, and thus elide the branch.  */
> +  if (CONSTANT_P (op0) && CONSTANT_P (op1))
> +return 0;
> +
>tem = emit_store_flag_1 (target, code, op0, op1, mode, unsignedp, 
> normalizep,
>  target_mode);
>if (tem)


Jakub

[PATCH, rs6000] ELFv2 ABI 3/8: Track single CR fields in DWARF CFI

2013-11-11 Thread Ulrich Weigand

Hello,

this patch adds another ELFv2 ABI feature: explicit tracking of CR fields
in DWARF CFI.

In the current ABI, DWARF CFI contains only a single record describing
the save location of the whole CR field.  It is implicit that all (or
at least all call-clobbered) fields are present at that location.

Now, if you use the instructions that save and restore the whole CR
at once, this approach might seem reasonable.  Unfortunately, with
current POWER processors, those instructions tend to be significantly
slower that those that access only single CR fields.

In particular in routines where only one or two CR fields are actually
clobbered and need to be saved, we could improve performance of prolog
and epilog code by saving/restoring only selected CR fields.  However,
this is not possible in the current ABI since there is no way to
describe this fact in the CFI.

With the ELFv2 ABI, every CR field gets its own CFI record (using the
register numbers 68 .. 75 to stand for CR0 .. CR7).  Now, those fields
will still usually be saved in the same 4-byte field on the stack.
The semantics of a CFI record for field CRx is that the memory location
holds 4 bytes, and the 4-bit nibble corresponding to CRx within those
4 bytes hold the CRx value to be restored.

The one problem with this scheme is the way uw_install_context tries
to modify saved valued when unwinding the stack: it will simply copy
over the whole field into the save slot of the unwinder routine
(that calls __builtin_eh_return).  This clearly does not work if
multiple CR fields need to be restored independently.

To fix this, the prolog/epilog code for unwinder routines will use
*multiple* stack slots, one for each call-saved CR fields, and save
and restore those fields to and from their own slot.  This will
allow uw_install_context to install values for multiple fields.
(Note that there is already precedent for unwinder routines being
treated specially in the rs6000.c prologue/epilogue code ...)

Bye,
Ulrich


gcc/ChangeLog:

2013-11-11  Ulrich Weigand  

* config/rs6000/rs6000.c (struct rs6000_stack): New member ehcr_offset.
(rs6000_stack_info): For ABI_ELFv2, allocate space for separate CR
field save areas if the function calls __builtin_eh_return.
(rs6000_emit_move_from_cr): New function.
(rs6000_emit_prologue): Use it.  For ABI_ELFv2, generate separate
CFI records for each saved CR field.  For functions that call
__builtin_eh_return, save all CR fields into separate slots.
(restore_saved_cr): For ABI_ELFv2, generate separate CFA_RESTORE
entries for each saved CR field.
(add_crlr_cfa_restore): Likewise.
(rs6000_emit_epilogue): For ABI_ELFv2, if the function calls
__builtin_eh_return, restore each CR field from its own slot.

libgcc/ChangeLog:

2013-11-11  Ulrich Weigand  

* config/rs6000/linux-unwind.h (R_CR3, R_CR4): New macros.
(ppc_fallback_frame_state) [_CALL_ELF == 2]: Create CFI entry
for CR3 and CR4.


Index: gcc/gcc/config/rs6000/rs6000.c
===
--- gcc.orig/gcc/config/rs6000/rs6000.c
+++ gcc/gcc/config/rs6000/rs6000.c
@@ -97,6 +97,7 @@ typedef struct rs6000_stack {
   int spe_gp_save_offset;  /* offset to save spe 64-bit gprs  */
   int varargs_save_offset; /* offset to save the varargs registers */
   int ehrd_offset; /* offset to EH return data */
+  int ehcr_offset; /* offset to EH CR field data */
   int reg_size;/* register size (4 or 8) */
   HOST_WIDE_INT vars_size; /* variable save area size */
   int parm_size;   /* outgoing parameter size */
@@ -19847,6 +19848,7 @@ rs6000_stack_info (void)
   rs6000_stack_t *info_ptr = &stack_info;
   int reg_size = TARGET_32BIT ? 4 : 8;
   int ehrd_size;
+  int ehcr_size;
   int save_align;
   int first_gp;
   HOST_WIDE_INT non_fixed_size;
@@ -19940,6 +19942,18 @@ rs6000_stack_info (void)
   else
 ehrd_size = 0;
 
+  /* In the ELFv2 ABI, we also need to allocate space for separate
+ CR field save areas if the function calls __builtin_eh_return.  */
+  if (DEFAULT_ABI == ABI_ELFv2 && crtl->calls_eh_return)
+{
+  /* This hard-codes that we have three call-saved CR fields.  */
+  ehcr_size = 3 * reg_size;
+  /* We do *not* use the regular CR save mechanism.  */
+  info_ptr->cr_save_p = 0;
+}
+  else
+ehcr_size = 0;
+
   /* Determine various sizes.  */
   info_ptr->reg_size = reg_size;
   info_ptr->fixed_size   = RS6000_SAVE_AREA;
@@ -20009,6 +20023,8 @@ rs6000_stack_info (void)
}
   else
info_ptr->ehrd_offset  = info_ptr->gp_save_offset - ehrd_size;
+
+  info_ptr->ehcr_offset  = info_ptr->ehrd_offset - ehcr_size;
   info_ptr->cr_save_offset   = reg_size; /* first word when 64-bit.  */
   info_ptr->lr_save_offset   = 2*reg_size;
   break;
@@ -20071,6 +20087,7 @@ rs6000_st

[PATCH, rs6000] ELFv2 ABI 2/8: Remove function descriptors

2013-11-11 Thread Ulrich Weigand

Hello,

this patch adds the first major feature of the ELFv2 ABI: removal
of function descriptors.


In the current ABI, it is the responsibility of a caller to set
up the TOC pointer needed by the callee by loading the correct
value into r2.  This typically happens in one of these ways:

- For a direct function call within the same module, caller and
  callee share the same TOC, so r2 already contains the correct
  value.

- For a direct function call to a function in another module,
  the linker will interpose a PLT stub which reloads r2 to be
  appropriate for the callee, with help from ld.so.

- For an indirect function call, the "function pointer" points
  to function descriptor that holds the target function address
  as well as the appropriate TOC value (and a static chain
  value for nested functions).  The caller must load those
  values up before transfering control to the callee.


With the new ABI, it is the responsibility of the callee to
set up its own TOC pointer.  This has a number of advantages,
in particular for a callee that doesn't actually need a TOC,
and makes the function pointer call sequence more effective
by avoiding pipeline hazards.

However, it remains true that with the PowerPC ISA, it is
difficult to actually establish addressability to the TOC
without any outside help.  To avoid introducing new performance
regressions, the new ABI instead provides a dual entry point
scheme.

Any function that needs a TOC may provide two entry points:

- The first ("global") entry point is intended to called via
  indirect calls.  It expects r12 to be set up to point to
  the entry point address itself.  (Since in order to perform
  an indirect call, the address must be loaded into some GPR
  anyway, this does not impose a substantial impact on the
  caller ...)  Code at the global entry point is expected
  to load up r2 with the appropriate TOC value for this
  function (and it may use the r12 value to compute that),
  and then fall through to the local entry point.

- The second ("local") entry point expects r2 to be set up
  to the current TOC, much like an entry point in the current
  ABI does.  However, it must not expect r12 to hold any
  particular value.  This means that a local direct function
  call within the same module can be done via a simple "bl"
  instruction just as today.  Note that the linker will
  automatically direct a "bl func" to the *local* entry
  point associated with the symbol "func"; not the global one.

If local and global entry points coincide, the function may
expect no particular value in either r12 or r2; this can be
used by functions that do not need a TOC.

There exists just one single ELF symbol for both local and
global entry points; its symbol value refers to the global
entry point, and flag bits in ELF st_other encode the
offset from there to the local entry point.  The compiler
emits a .localentry directive to announce this location
to the assembler, which will encode it as appropriate.


This patch implements all aspects needed for this scheme:

- Emit ".abiversion 2" assembler directive to help the assembler
  and linker understand the ABI implemented in this file.
- Remove all handling of function descriptors / .opd section /
  dot symbols for the ELFv2 ABI.
- Output a global entry point prologue and local entry point
  marker for functions that use the TOC.
- Implement the ELFv2 function pointer call sequence.
- Use trampolines to handle nested functions, just like on
  32-bit (including marking the stack as executable).
- No longer scan for the old ABI indirect call sequence in
  linux-unwind.h.
- Update assembler code to the new ABI:
   - gcc/config/rs6000/ppc-asm.h (used by libgcc and elsewhere)
   - lititm/config/powerpc/sjlj.S
   - gcc/testsuite/gcc.target/powerpc/ppc64-abi-dfp-1.c

In addition, some related changes are necessary:

- Move the code generating the call to _mcount for -mprofile-kernel
  from output_function_profiler to rs6000_output_function_prologue,
  since this call must happen at the local entry point, not the
  global entry point.
- Disable (now useless) tests for -mpointers-to-nested-functions.
- Fix the libstc++ extract_symvers.in script to ignore markers
  emitted by readelf --symbols that indicate local entry points.
- Fix the "gotest" script to no longer expect function symbols
  to reside in the data segment.  (I understand this last bit
  needs to go into the Go repository first ...)

Bye,
Ulrich


gcc/ChangeLog:

2013-11-11  Ulrich Weigand  

* config/rs6000/rs6000.c (machine_function): New member
r2_setup_needed.
(rs6000_emit_prologue): Set r2_setup_needed if necessary.
(rs6000_output_mi_thunk): Set r2_setup_needed.
(rs6000_output_function_prologue): Output global entry point
prologue and local entry point marker if needed for ABI_ELFv2.
Output -mprofile-kernel code here.
(output_function_profiler): Do not output -mprofile-kernel
code here; mov

Re: Some wide-int review comments

2013-11-11 Thread Richard Biener

On Mon, Nov 11, 2013 at 3:26 PM, Kenneth Zadeck
 wrote:
> On 11/11/2013 06:49 AM, Richard Biener wrote:
>>
>> On Sat, Nov 9, 2013 at 3:23 PM, Kenneth Zadeck 
>> wrote:
>>>
>>> On 11/08/2013 05:30 AM, Richard Sandiford wrote:

   From tree-vrp.c:

 @@ -1893,6 +1884,10 @@ vrp_int_const_binop (enum tree_code code
/* If the singed operation wraps then int_const_binop has done
   everything we want.  */
;
 +  /* Signed division of -1/0 overflows and by the time it gets here
 + returns NULL_TREE.  */
 +  else if (!res)
 +return NULL_TREE;
  else if ((TREE_OVERFLOW (res)
  && !TREE_OVERFLOW (val1)
  && !TREE_OVERFLOW (val2))

 Why is this case different from trunk?  Or is it a bug-fix independent
 of wide-int?
>>>
>>> the api for division is different for wide-int than it was for
>>> double-int.
>>
>> But this is using a tree API (int_const_binop) that didn't change
>> (it returned NULL for / 0 previously).  So what makes us arrive here
>> now?  (I agree there is a bug in VRP, but it shouldn't manifest itself
>> only on wide-int)
>>
>> Richard.
>
> My reading of the code is that is that i changed int_const_binop to return
> null_tree for this case.

Trunk has:

case TRUNC_DIV_EXPR:
case FLOOR_DIV_EXPR: case CEIL_DIV_EXPR:
case EXACT_DIV_EXPR:
  /* This is a shortcut for a common special case.  */
  if (op2.high == 0 && (HOST_WIDE_INT) op2.low > 0
  && !TREE_OVERFLOW (arg1)
  && !TREE_OVERFLOW (arg2)
  && op1.high == 0 && (HOST_WIDE_INT) op1.low >= 0)
{
  if (code == CEIL_DIV_EXPR)
op1.low += op2.low - 1;

  res.low = op1.low / op2.low, res.high = 0;
  break;
}

  /* ... fall through ...  */

case ROUND_DIV_EXPR:
  if (op2.is_zero ())
return NULL_TREE;

so it already returns NULL_TREE on divide by zero.

> On the trunk, only rem returns null_tree for divide by 0, on the wide int
> branch, both div and rem return null tree.
>
> I know that this is going to bring on a string of questions that i do not
> remember the answers to as to why i made that change. but
> fold-const.c:int_const_binop_1 now returns null_tree and this is just
> fallout from that change.
>
> Kenny

 Thanks,
 Richard
>>>
>>>
>

Re: PING^3 Re: [PATCH] Add -fno-instrument-function v2

2013-11-11 Thread Andi Kleen

Andi Kleen  writes:


PING^3

Since it doesn't look like a generic solution for the
LTO options problem will appear this development cycle,
I would still like to pursue this option for 4.9.
This would help fixing parts of the linux kernel LTO
build.

Can someone please review the patch?

Thanks

> Andi Kleen  writes:
>
> PING^2
>
>> Andi Kleen  writes:
>>
>>> From: Andi Kleen 
>>>
>>> [I posted this originally quite some time ago.
>>> This version fixes all review problems, particularly
>>> it works for C++ too and the test case really works.]
>>
>> Ping!
>>
>> Could someone please review it.
>>
>> Note this might be obsolete with Honza's LTO option work, but if it's
>> not covered in his first iteration I would still have it earlier for the
>> Linux kernel LTO build (fixed ftrace)
>>
>> -Andi
>>
>>> This adds a new C/C++ option to force
>>> __attribute__((no_instrument_function)) on every function compiled.
>>>
>>> This is useful together with LTO. You may want to have the whole
>>> program compiled with -pg and have to specify that in the LTO
>>> link, but want to disable it for some specific files. As the
>>> option works on the frontend level it is already passed through
>>> properly by LTO.
>>>
>>> Without LTO it is equivalent to not specifing -pg or -mfentry.
>>>
>>> This fixes some missing functionality in the Linux kernel LTO port.
>>>
>>> Passed bootstrap and test suite on x86_64-linux. Ok?
>>>
>>> gcc/:
>>>
>>> 2013-08-10  Andi Kleen 
>>>
>>> * c.opt (fno-instrument-function): Document.
>>>
>>> gcc/c:
>>>
>>> 2013-08-10  Andi Kleen 
>>>
>>> * c-decl.c (start_function): Handle force_no_instrument_function
>>>
>>> gcc/cp:
>>>
>>> 2013-08-10  Andi Kleen 
>>>
>>> * decl.c (start_preparsed_function): Handle
>>> force_no_instrument_function
>>>
>>> gcc/testsuite:
>>>
>>> 2013-08-10  Andi Kleen 
>>>
>>> * g++.dg/fno-instrument-function.C: Add.
>>> * gcc.dg/fno-instrument-function.c: Add.
>>> ---
>>>  gcc/c-family/c.opt |  4 
>>>  gcc/c/c-decl.c |  3 +++
>>>  gcc/cp/decl.c  |  3 +++
>>>  gcc/doc/invoke.texi|  8 +++-
>>>  gcc/testsuite/g++.dg/fno-instrument-function.C | 18 ++
>>>  gcc/testsuite/gcc.dg/fno-instrument-function.c | 24 
>>> 
>>>  6 files changed, 59 insertions(+), 1 deletion(-)
>>>  create mode 100644 gcc/testsuite/g++.dg/fno-instrument-function.C
>>>  create mode 100644 gcc/testsuite/gcc.dg/fno-instrument-function.c
>>>
>>> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
>>> index 10ae84d..2159f89 100644
>>> --- a/gcc/c-family/c.opt
>>> +++ b/gcc/c-family/c.opt
>>> @@ -1014,6 +1014,10 @@ fnil-receivers
>>>  ObjC ObjC++ Var(flag_nil_receivers) Init(1)
>>>  Assume that receivers of Objective-C messages may be nil
>>>  
>>> +fno-instrument-function
>>> +C C++ ObjC ObjC++ RejectNegative Report Var(force_no_instrument_function)
>>> +Force __attribute__((no_instrument_function)) for all functions in 
>>> translation unit.
>>> +
>>>  fnonansi-builtins
>>>  C++ ObjC++ Var(flag_no_nonansi_builtin, 0)
>>>  
>>> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
>>> index d9bbf5c..15717a9 100644
>>> --- a/gcc/c/c-decl.c
>>> +++ b/gcc/c/c-decl.c
>>> @@ -7876,6 +7876,9 @@ start_function (struct c_declspecs *declspecs, struct 
>>> c_declarator *declarator,
>>>if (current_scope == file_scope)
>>>  maybe_apply_pragma_weak (decl1);
>>>  
>>> +  if (force_no_instrument_function)
>>> +DECL_NO_INSTRUMENT_FUNCTION_ENTRY_EXIT (decl1) = 1;  
>>> +
>>>/* Warn for unlikely, improbable, or stupid declarations of `main'.  */
>>>if (warn_main && MAIN_NAME_P (DECL_NAME (decl1)))
>>>  {
>>> diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
>>> index 01804d2..103188b 100644
>>> --- a/gcc/cp/decl.c
>>> +++ b/gcc/cp/decl.c
>>> @@ -13023,6 +13023,9 @@ start_preparsed_function (tree decl1, tree attrs, 
>>> int flags)
>>>&& lookup_attribute ("noinline", attrs))
>>>  warning (0, "inline function %q+D given attribute noinline", decl1);
>>>  
>>> +  if (force_no_instrument_function)
>>> +DECL_NO_INSTRUMENT_FUNCTION_ENTRY_EXIT (decl1) = 1;
>>> +
>>>/* Handle gnu_inline attribute.  */
>>>if (GNU_INLINE_P (decl1))
>>>  {
>>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>>> index 782b569..bc20a77 100644
>>> --- a/gcc/doc/invoke.texi
>>> +++ b/gcc/doc/invoke.texi
>>> @@ -169,7 +169,7 @@ in the following sections.
>>>  -aux-info @var{filename} -fallow-parameterless-variadic-functions @gol
>>>  -fno-asm  -fno-builtin  -fno-builtin-@var{function} @gol
>>>  -fhosted  -ffreestanding -fopenmp -fms-extensions -fplan9-extensions @gol
>>> --trigraphs  -traditional  -traditional-cpp @gol
>>> +-trigraphs  -traditional  -traditional-cpp -fno-instrument-function @gol
>>>  -fallow-single-precision  -fcond-mismatch -flax-vector-conversions @gol
>>>  -fsigned-bitfields  -fsigned-char @gol
>>>  -funsigned-bitf

[PATCH, rs6000] ELFv2 ABI 1/8: Add options and infrastructure

2013-11-11 Thread Ulrich Weigand

Hello,

this is the first patch in the series to add support for the ELFv2 ABI.

The ELFv2 ABI is the intended ABI for the new powerpc64le-linux port.
However, it is not inherently tied to the byte order; it it possible
in principle to use the ELFv2 ABI in big-endian mode too.

Therefore, it is introduces via a new pair of options
   -mabi=elfv1 / -mabi=elfv2
where -mabi=elfv1 select the current Linux ABI, and -mabi=elfv2
selects the new one.

If neither option is given, the compiler defaults to whatever ABI
was specified at configure time using the
   --with-abi=elfv1 / --with-abi=elfv2
configure option.

If this was not specified either, every subtarget can provide a
master default by defining (or not) the LINUX64_DEFAULT_ABI_ELFv2
macro.

The patch series is structured as follows:

- This first patch adds the new enum rs6000_abi value ABI_ELFv2
  and all the configure logic needed to set it.  It is treated
  just like ABI_AIX (on a modern system) at this point; no change
  in generated code is expected yet.

- A series of follow on patches will add the various features
  defining the ELFv2 ABI.  Note that they will still not be
  active by default anywhere.

- The final patch will simply turn the switch to make ELFv2
  the default ABI on powerpc64le-linux.

The whole series was tested on:

  - powerpc64-linux with no regression, including no regression
in an ALT_CC_UNDER_TEST/ALT_CXX_UNDER_TEST run of compat.exp
and struct-layout-1.exp against a current compiler.

  - powerpc64-linux --with-abi=elfv2 in mock-up big-endian ELFv2
environment running with some hacks on an existing system.

  - powerpc64le-linux, using the new default ELFv2 in a rebuilt
small OS image using the new ABI everywhere.
 
(Of course, patches to binutils, glibc, and libffi to support the
new ELFv2 ABI are also required.)

The tests were using all languages (including Java, Go, and
Objective-C++; on powerpc64le-linux also Ada), and all target
libraries except libsanitizer (which currently seems broken
on mainline).

There is a small number of regressions on powerpc64le-linux
which seem to be due to unrelated issues:

- FAIL: g++.dg/cpp1y/vla-initlist1.C execution test
  This is an invalid assumption in the test case

- FAIL: go.test/test/fixedbugs/bug296.go execution,  -O2 -g
  The ELFv2 ABI  seems to be exposing a Go frontend bug here

- FAIL: runtime/pprof   (in libgo)
  This is due to a glibc problem with unwinding through a
  context created via makecontext

I'll address those separately.  (Of course, there are also
other pre-existing regressions on powerpc64le-linux simply
due to little-endian issues.  Those are unaffected by ELFv2.)


I'll add more detailed descriptions of the actual ABI features
with the various follow-on patches that implement those.

This patch specifically only adds the options as described above.
ABI_ELFv2 is implemented to be equivalent to ABI_AIX on a modern
system; since the ABI is new, there is no need to continue to
support legacy features.  This means specifically:
- DOT_SYMBOLS is assumed to be false
- rs6000_compat_align_parm is assumed to be false
- code in linux-unwind.h that handles old kernels/linkers is removed

The patch also adds two new pre-defined macros:
- _CALL_ELF is set to 1 or 2 to denote the ELFv1 / ELFv2 ABI
- __STRUCT_PARM_ALIGN__ is set to 16 if aggregates passed by value
  are aligned to 16 if their native alignment requires that
  (this is necessary to implement libffi, for example)

In addition, the patch adds a testsuite effective target check
for the ELFv2 ABI, and disables the pr57949 tests (these verify
the operation of the -mcompat-align-parm option, which is no
longer useful in the ELFv2 ABI).


To avoid having a partial ABI implementation in tree, it seems best
to commit this whole patch series as a single commit.

Is the series OK for mainline?

Bye,
Ulrich




gcc/ChangeLog:

2013-11-11  Ulrich Weigand  

* config.gcc [powerpc*-*-* | rs6000-*-*]: Support --with-abi=elfv1
and --with-abi=elfv2.
* config/rs6000/option-defaults.h (OPTION_DEFAULT_SPECS): Add "abi".
* config/rs6000/rs6000.opt (mabi=elfv1): New option.
(mabi=elfv2): Likewise.
* config/rs6000/rs6000-opts.h (enum rs6000_abi): Add ABI_ELFv2.
* config/rs6000/linux64.h (DEFAULT_ABI): Do not hard-code to AIX_ABI
if !RS6000_BI_ARCH.
(ELFv2_ABI_CHECK): New macro.
(SUBSUBTARGET_OVERRIDE_OPTIONS): Use it to decide whether to set
rs6000_current_abi to ABI_AIX or ABI_ELFv2.
(GLIBC_DYNAMIC_LINKER64): Support ELFv2 ld.so version.
* config/rs6000/rs6000-c.c (rs6000_cpu_cpp_builtins): Predefine
_CALL_ELF and __STRUCT_PARM_ALIGN__ if appropriate.

* config/rs6000/rs6000.c (rs6000_debug_reg_global): Handle ABI_ELFv2.
(debug_stack_info): Likewise.
(rs6000_file_start): Treat ABI_ELFv2 the same as ABI_AIX.
(rs6000_legitimize_tls_address): Likewise.
(rs6000_condit

[PATCH, rs6000] ELFv2 ABI 4/8: Struct passing calling convention changes

2013-11-11 Thread Ulrich Weigand

Hello,

this patch implements the new struct calling convention for ELFv2.

With the new ABI, so-called "homogeneous" aggregates, i.e. struct, arrays,
or unions that (recursively) contain only elements of the same floating-
point or vector type are passed as if those elements were passed as
separate arguments.  (This is done for up to 8 such elements.)

After the refactoring of the rs6000_function_arg / rs6000_arg_partial_bytes
logic that was done in a prior patch, implementing support for
homogenous aggregates is now mostly straightforward.  Note that
rs6000_psave_function_arg can now get called for BLKmode arguments,
and has to handle them using a PARALLEL.

Bye,
Ulrich


ChangeLog:

2013-11-11  Ulrich Weigand  
Michael Gschwind  

* config/rs6000/rs6000.h (AGGR_ARG_NUM_REG): Define.
* config/rs6000/rs6000.c (rs6000_aggregate_candidate): New function.
(rs6000_discover_homogeneous_aggregate): Likewise.
(rs6000_function_arg_boundary): Handle homogeneous aggregates.
(rs6000_function_arg_advance_1): Likewise.
(rs6000_function_arg): Likewise.
(rs6000_arg_partial_bytes): Likewise.
(rs6000_psave_function_arg): Handle BLKmode arguments.

Index: gcc/gcc/config/rs6000/rs6000.h
===
--- gcc.orig/gcc/config/rs6000/rs6000.h
+++ gcc/gcc/config/rs6000/rs6000.h
@@ -1641,6 +1641,9 @@ extern enum reg_class rs6000_constraints
 #define ALTIVEC_ARG_MAX_REG (ALTIVEC_ARG_MIN_REG + 11)
 #define ALTIVEC_ARG_NUM_REG (ALTIVEC_ARG_MAX_REG - ALTIVEC_ARG_MIN_REG + 1)
 
+/* Maximum number of registers per ELFv2 homogeneous aggregate argument.  */
+#define AGGR_ARG_NUM_REG 8
+
 /* Return registers */
 #define GP_ARG_RETURN GP_ARG_MIN_REG
 #define FP_ARG_RETURN FP_ARG_MIN_REG
Index: gcc/gcc/config/rs6000/rs6000.c
===
--- gcc.orig/gcc/config/rs6000/rs6000.c
+++ gcc/gcc/config/rs6000/rs6000.c
@@ -8460,6 +8460,219 @@ rs6000_member_type_forces_blk (const_tre
&& TARGET_ALTIVEC_ABI   \
&& (NAMED))
 
+/* Walk down the type tree of TYPE counting consecutive base elements.
+   If *MODEP is VOIDmode, then set it to the first valid floating point
+   or vector type.  If a non-floating point or vector type is found, or
+   if a floating point or vector type that doesn't match a non-VOIDmode
+   *MODEP is found, then return -1, otherwise return the count in the
+   sub-tree.  */
+
+static int
+rs6000_aggregate_candidate (const_tree type, enum machine_mode *modep)
+{
+  enum machine_mode mode;
+  HOST_WIDE_INT size;
+
+  switch (TREE_CODE (type))
+{
+case REAL_TYPE:
+  mode = TYPE_MODE (type);
+  if (!SCALAR_FLOAT_MODE_P (mode))
+   return -1;
+
+  if (*modep == VOIDmode)
+   *modep = mode;
+
+  if (*modep == mode)
+   return 1;
+
+  break;
+
+case COMPLEX_TYPE:
+  mode = TYPE_MODE (TREE_TYPE (type));
+  if (!SCALAR_FLOAT_MODE_P (mode))
+   return -1;
+
+  if (*modep == VOIDmode)
+   *modep = mode;
+
+  if (*modep == mode)
+   return 2;
+
+  break;
+
+case VECTOR_TYPE:
+  if (!TARGET_ALTIVEC_ABI || !TARGET_ALTIVEC)
+   return -1;
+
+  /* Use V4SImode as representative of all 128-bit vector types.  */
+  size = int_size_in_bytes (type);
+  switch (size)
+   {
+   case 16:
+ mode = V4SImode;
+ break;
+   default:
+ return -1;
+   }
+
+  if (*modep == VOIDmode)
+   *modep = mode;
+
+  /* Vector modes are considered to be opaque: two vectors are
+equivalent for the purposes of being homogeneous aggregates
+if they are the same size.  */
+  if (*modep == mode)
+   return 1;
+
+  break;
+
+case ARRAY_TYPE:
+  {
+   int count;
+   tree index = TYPE_DOMAIN (type);
+
+   /* Can't handle incomplete types.  */
+   if (!COMPLETE_TYPE_P (type))
+ return -1;
+
+   count = rs6000_aggregate_candidate (TREE_TYPE (type), modep);
+   if (count == -1
+   || !index
+   || !TYPE_MAX_VALUE (index)
+   || !host_integerp (TYPE_MAX_VALUE (index), 1)
+   || !TYPE_MIN_VALUE (index)
+   || !host_integerp (TYPE_MIN_VALUE (index), 1)
+   || count < 0)
+ return -1;
+
+   count *= (1 + tree_low_cst (TYPE_MAX_VALUE (index), 1)
+ - tree_low_cst (TYPE_MIN_VALUE (index), 1));
+
+   /* There must be no padding.  */
+   if (!host_integerp (TYPE_SIZE (type), 1)
+   || (tree_low_cst (TYPE_SIZE (type), 1)
+   != count * GET_MODE_BITSIZE (*modep)))
+ return -1;
+
+   return count;
+  }
+
+case RECORD_TYPE:
+  {
+   int count = 0;
+   int sub_count;
+   tree field;
+
+   /* Can't handle incomplete types.  */
+   if (!COMPLETE_TYPE_P (type))
+ return -1;
+
+   for (field = TY

[PATCH, rs6000] ELFv2 ABI 7/8: Eliminate some stack frame fields

2013-11-11 Thread Ulrich Weigand

Hello,

this is the second part of reducing stack space consumption for the
ELFv2 ABI: the old ABI reserved six words in every stack frame for
various purposes, and two of those were basically unused: one word
for "compiler use" and one word for "linker use".

Since neither the compiler nor the linker actually ever made any
nontrivial use of these fields, they're now eliminated in the new
ABI.

This patch implements this change, which is mostly straightforward
except for the fact that due to the change, the stack offset of
the TOC save area changes, which was hard-coded in a number of
places ...

The patch also updates a number of test cases that hardcoded the
stack layout.

Bye,
Ulrich


gcc/ChangeLog:

2013-11-11  Ulrich Weigand  
Alan Modra  

* config/rs6000/rs6000.h (RS6000_SAVE_AREA): Handle ABI_ELFv2.
(RS6000_SAVE_TOC): Remove.
(RS6000_TOC_SAVE_SLOT): New macro.
* config/rs6000/rs6000.c (rs6000_parm_offset): New function.
(rs6000_parm_start): Use it.
(rs6000_function_arg_advance_1): Likewise.
(rs6000_emit_prologue): Use RS6000_TOC_SAVE_SLOT.
(rs6000_emit_epilogue): Likewise.
(rs6000_call_aix): Likewise.
(rs6000_output_function_prologue): Do not save/restore r11
around calling _mcount for ABI_ELFv2.

libgcc/ChangeLog:

2013-11-11  Ulrich Weigand  
Alan Modra  

* config/rs6000/linux-unwind.h (TOC_SAVE_SLOT): Define.
(frob_update_context): Use it.

gcc/testsuite/ChangeLog:

2013-11-11  Ulrich Weigand  

* gcc.target/powerpc/ppc64-abi-1.c (stack_frame_t): Remove
compiler and linker field if _CALL_ELF == 2.
* gcc.target/powerpc/ppc64-abi-2.c (stack_frame_t): Likewise.
* gcc.target/powerpc/ppc64-abi-dfp-1.c (stack_frame_t): Likewise.
* gcc.dg/stack-usage-1.c (SIZE): Update value for _CALL_ELF == 2.


Index: gcc/gcc/config/rs6000/rs6000.h
===
--- gcc.orig/gcc/config/rs6000/rs6000.h
+++ gcc/gcc/config/rs6000/rs6000.h
@@ -1529,12 +1529,12 @@ extern enum reg_class rs6000_constraints
 
 /* Size of the fixed area on the stack */
 #define RS6000_SAVE_AREA \
-  ((DEFAULT_ABI == ABI_V4 ? 8 : 24) << (TARGET_64BIT ? 1 : 0))
+  ((DEFAULT_ABI == ABI_V4 ? 8 : DEFAULT_ABI == ABI_ELFv2 ? 16 : 24)\
+   << (TARGET_64BIT ? 1 : 0))
 
-/* MEM representing address to save the TOC register */
-#define RS6000_SAVE_TOC gen_rtx_MEM (Pmode, \
-plus_constant (Pmode, stack_pointer_rtx, \
-   (TARGET_32BIT ? 20 : 40)))
+/* Stack offset for toc save slot.  */
+#define RS6000_TOC_SAVE_SLOT \
+  ((DEFAULT_ABI == ABI_ELFv2 ? 12 : 20) << (TARGET_64BIT ? 1 : 0))
 
 /* Align an address */
 #define RS6000_ALIGN(n,a) (((n) + (a) - 1) & ~((a) - 1))
Index: gcc/gcc/config/rs6000/rs6000.c
===
--- gcc.orig/gcc/config/rs6000/rs6000.c
+++ gcc/gcc/config/rs6000/rs6000.c
@@ -9029,6 +9029,16 @@ rs6000_function_arg_boundary (enum machi
 return PARM_BOUNDARY;
 }
 
+/* The offset in words to the start of the parameter save area.  */
+
+static unsigned int
+rs6000_parm_offset (void)
+{
+  return (DEFAULT_ABI == ABI_V4 ? 2
+ : DEFAULT_ABI == ABI_ELFv2 ? 4
+ : 6);
+}
+
 /* For a function parm of MODE and TYPE, return the starting word in
the parameter area.  NWORDS of the parameter area are already used.  */
 
@@ -9037,11 +9047,9 @@ rs6000_parm_start (enum machine_mode mod
   unsigned int nwords)
 {
   unsigned int align;
-  unsigned int parm_offset;
 
   align = rs6000_function_arg_boundary (mode, type) / PARM_BOUNDARY - 1;
-  parm_offset = DEFAULT_ABI == ABI_V4 ? 2 : 6;
-  return nwords + (-(parm_offset + nwords) & align);
+  return nwords + (-(rs6000_parm_offset () + nwords) & align);
 }
 
 /* Compute the size (in words) of a function argument.  */
@@ -9281,15 +9289,13 @@ rs6000_function_arg_advance_1 (CUMULATIV
{
  int align;
 
- /* Vector parameters must be 16-byte aligned.  This places
-them at 2 mod 4 in terms of words in 32-bit mode, since
-the parameter save area starts at offset 24 from the
-stack.  In 64-bit mode, they just have to start on an
-even word, since the parameter save area is 16-byte
-aligned.  Space for GPRs is reserved even if the argument
-will be passed in memory.  */
+ /* Vector parameters must be 16-byte aligned.  In 32-bit
+mode this means we need to take into account the offset
+to the parameter save area.  In 64-bit mode, they just
+have to start on an even word, since the parameter save
+area is 16-byte aligned.  */
  if (TARGET_32BIT)
-   align = (2 - cum->words) & 3;
+   align = -(rs6000_parm_offset () + cum->words) & 3;
  else

[PATCH, rs6000] ELFv2 ABI 5/8: Struct return calling convention changes

2013-11-11 Thread Ulrich Weigand

Hello,

analogously to the previous patch handling homogeneous aggregates
as arguments, this patch handles such aggregates as return values.

In addition, the ELFv2 ABI returns any aggregate of up to 16 bytes
in size (that is not otherwise handled) in up to two GPRs.

In either case, the return value is returned formatted exactly as if
it were being passed as first argument.  In order to get this correct
in the big-endian ELFv2 case, we need to provide a non-default
implementation of TARGET_RETURN_IN_MSB.

Bye,
Ulrich


ChangeLog:

2013-11-11  Ulrich Weigand  
Michael Gschwind  

* config/rs6000/rs6000.h (FP_ARG_MAX_RETURN): New macro.
(ALTIVEC_ARG_MAX_RETURN): Likewise.
(FUNCTION_VALUE_REGNO_P): Use them.
* config/rs6000/rs6000.c (TARGET_RETURN_IN_MSB): Define.
(rs6000_return_in_msb): New function.
(rs6000_return_in_memory): Handle ELFv2 homogeneous aggregates.
Handle aggregates of up to 16 bytes for ELFv2.
(rs6000_function_value): Handle ELFv2 homogeneous aggregates.


Index: gcc/gcc/config/rs6000/rs6000.h
===
--- gcc.orig/gcc/config/rs6000/rs6000.h
+++ gcc/gcc/config/rs6000/rs6000.h
@@ -1648,6 +1648,10 @@ extern enum reg_class rs6000_constraints
 #define GP_ARG_RETURN GP_ARG_MIN_REG
 #define FP_ARG_RETURN FP_ARG_MIN_REG
 #define ALTIVEC_ARG_RETURN (FIRST_ALTIVEC_REGNO + 2)
+#define FP_ARG_MAX_RETURN (DEFAULT_ABI != ABI_ELFv2 ? FP_ARG_RETURN\
+  : (FP_ARG_RETURN + AGGR_ARG_NUM_REG - 1))
+#define ALTIVEC_ARG_MAX_RETURN (DEFAULT_ABI != ABI_ELFv2 ? ALTIVEC_ARG_RETURN \
+   : (ALTIVEC_ARG_RETURN + AGGR_ARG_NUM_REG - 1))
 
 /* Flags for the call/call_value rtl operations set up by function_arg */
 #define CALL_NORMAL0x  /* no special processing */
@@ -1667,8 +1671,10 @@ extern enum reg_class rs6000_constraints
On RS/6000, this is r3, fp1, and v2 (for AltiVec).  */
 #define FUNCTION_VALUE_REGNO_P(N)  \
   ((N) == GP_ARG_RETURN
\
-   || ((N) == FP_ARG_RETURN && TARGET_HARD_FLOAT && TARGET_FPRS)   \
-   || ((N) == ALTIVEC_ARG_RETURN && TARGET_ALTIVEC && TARGET_ALTIVEC_ABI))
+   || ((N) >= FP_ARG_RETURN && (N) <= FP_ARG_MAX_RETURN
\
+   && TARGET_HARD_FLOAT && TARGET_FPRS)\
+   || ((N) >= ALTIVEC_ARG_RETURN && (N) <= ALTIVEC_ARG_MAX_RETURN  \
+   && TARGET_ALTIVEC && TARGET_ALTIVEC_ABI))
 
 /* 1 if N is a possible register number for function argument passing.
On RS/6000, these are r3-r10 and fp1-fp13.
Index: gcc/gcc/config/rs6000/rs6000.c
===
--- gcc.orig/gcc/config/rs6000/rs6000.c
+++ gcc/gcc/config/rs6000/rs6000.c
@@ -1449,6 +1449,9 @@ static const struct attribute_spec rs600
 #undef TARGET_RETURN_IN_MEMORY
 #define TARGET_RETURN_IN_MEMORY rs6000_return_in_memory
 
+#undef TARGET_RETURN_IN_MSB
+#define TARGET_RETURN_IN_MSB rs6000_return_in_msb
+
 #undef TARGET_SETUP_INCOMING_VARARGS
 #define TARGET_SETUP_INCOMING_VARARGS setup_incoming_varargs
 
@@ -8725,6 +8728,16 @@ rs6000_return_in_memory (const_tree type
   /* Otherwise fall through to more conventional ABI rules.  */
 }
 
+  /* The ELFv2 ABI returns homogeneous VFP aggregates in registers */
+  if (rs6000_discover_homogeneous_aggregate (TYPE_MODE (type), type,
+NULL, NULL))
+return false;
+
+  /* The ELFv2 ABI returns aggregates up to 16B in registers */
+  if (DEFAULT_ABI == ABI_ELFv2 && AGGREGATE_TYPE_P (type)
+  && (unsigned HOST_WIDE_INT) int_size_in_bytes (type) <= 16)
+return false;
+
   if (AGGREGATE_TYPE_P (type)
   && (aix_struct_return
  || (unsigned HOST_WIDE_INT) int_size_in_bytes (type) > 8))
@@ -8756,6 +8769,19 @@ rs6000_return_in_memory (const_tree type
   return false;
 }
 
+/* Specify whether values returned in registers should be at the most
+   significant end of a register.  We want aggregates returned by
+   value to match the way aggregates are passed to functions.  */
+
+static bool
+rs6000_return_in_msb (const_tree valtype)
+{
+  return (DEFAULT_ABI == ABI_ELFv2
+ && BYTES_BIG_ENDIAN
+ && AGGREGATE_TYPE_P (valtype)
+ && FUNCTION_ARG_PADDING (TYPE_MODE (valtype), valtype) == upward);
+}
+
 #ifdef HAVE_AS_GNU_ATTRIBUTE
 /* Return TRUE if a call to function FNDECL may be one that
potentially affects the function calling ABI of the object file.  */
@@ -29897,6 +29923,8 @@ rs6000_function_value (const_tree valtyp
 {
   enum machine_mode mode;
   unsigned int regno;
+  enum machine_mode elt_mode;
+  int n_elts;
 
   /* Special handling for structs in darwin64.  */
   if (TARGET_MACHO 
@@ -29916,6 +29944,36 @@ rs6000_function_value (const_tree valtyp
   /* Otherwise fall through to standar

Re: [PATCH, MPX, 2/X] Pointers Checker [10/25] Calls copy and verification

2013-11-11 Thread Ilya Enkovich

2013/11/11 Richard Biener :
> On Mon, Nov 11, 2013 at 3:00 PM, Ilya Enkovich  wrote:
>> 2013/11/11 Richard Biener :
>>> On Fri, Nov 8, 2013 at 11:03 AM, Ilya Enkovich  
>>> wrote:
 2013/11/8 Richard Biener :
> On Thu, Nov 7, 2013 at 7:55 PM, Jeff Law  wrote:
>> On 11/07/13 04:50, Ilya Enkovich wrote:
>>>
>>> Hi,
>>>
>>> Here is an updated patch version.
>>
>> I think this needs to hold until we have a consensus on what the 
>> parameter
>> passing looks like for bounded pointers.
>
> I still think the best thing to do on GIMPLE is
>
> arg_2 = __builtin_ia32_bnd_arg (arg_1(D));
> foo (arg_2);
>
> that is, make the parameter an implicit pair of {value, bound} where
> the bound is determined by the value going through a bound association
> builtin.  No extra explicit argument to the calls so arguments match
> the fndecl and fntype.  All the complexity is defered to the expander
> which can trivially lookup bound arguments via the SSA def (I suppose
> it does that anyway now for getting at the explicit bound argument now).
>
> As far as I can see (well, think), all currently passed bound arguments
> are the return value of such builtin already.

 All bounds are result of different builtin calls. Small example:

 int *global_p;
 void foo (int *p)
 {
   int buf[10];
   bar (p, buf, global_p);
 }


 It is translated into:

   __bound_tmp.1_7 = __builtin_ia32_bndmk (&buf, 40);
   __bound_tmp.1_6 = __builtin_ia32_arg_bnd (p_3(D)(ab));
   global_p.0_2 = global_p;
   __bound_tmp.1_8 = __builtin_ia32_bndldx (&global_p, global_p.0_2);
   bar (p_3(D)(ab), __bound_tmp.1_6, &buf, __bound_tmp.1_7,
 global_p.0_2, __bound_tmp.1_8);

 Bounds binding via calls as you suggest may be done as following:

   __bound_tmp.1_7 = __builtin_ia32_bndmk (&buf, 40);
   __bound_tmp.1_6 = __builtin_ia32_arg_bnd (p_3(D)(ab));
   global_p.0_2 = global_p;
   __bound_tmp.1_8 = __builtin_ia32_bndldx (&global_p, global_p.0_2);
   _9 = __builtin_bind_bounds (p_3(D)(ab), __bound_tmp.1_6);
   _10 = __builtin_bind_bounds (&buf, __bound_tmp.1_7);
   _11 = __builtin_bind_bounds (global_p.0_2, __bound_tmp.1_8);
   bar (_9, _10, _11);

 Is it close to what you propose?
>>>
>>> Yes.
>>
>> Is there a way to bind bounds with structures in a similar way?
>
> Not to have them easy to lookup in the SSA web.  A long time ago
> I proposed to make SSA aggregates possible, so you could do
>
>  tem_2 = __internal_bind_bounds (aggr(D), __bound_tmp.1_3,
> __bound_tmp.1_4, ...);
>  bar (tem_2);
>
> (originally the SSA aggregates were supposed to make copy-propgagation
> possible using the SSA copy propagator, and of course I needed it for
> the middle-end array work)
>
> Not sure if that will give other people the creeps (expand would
> never expand the "load" from tem_2 but instead handle aggr as
> parameter).  A slight complication is due to alias analysis
> which would be need to taught that bar performs a load of aggr.

It would require bounds loading for aggr before __internal_bind_bounds
anyway.  So, why not to do it in expand?  I just need to mark calls
with a flag (which you've proposed few times already) to let expand
know when it should load bounds.  Having SSA aggregates would be nice
but I suspect it has much higher impact then loading bounds in expand.
 I want to try simpler variant first.

Thanks,
Ilya

>
> Richard.
>
>>  For
>> SSA names I may easily find definition and check if it is a binding
>> builtin call. But for structures it is not possible.  The way I see it
>> to mark all such args as addressable and load required bounds on
>> expand pass.
>>
>> Ilya
>>>
>>> Richard.
>>>
 Ilya
>
> Richard.
>
>
>
>> Jeff

[PATCH, rs6000] ELFv2 ABI 6/8: Eliminate register save area in some cases

2013-11-11 Thread Ulrich Weigand

Hello,

the final area of changes implemented by the ELFv2 ABI is stack space
consumption.  Reducing the miminum stack size requirements is important
for environments where stack space is restricted, e.g. Linux kernel code,
or where we have a large number of stacks (e.g. heavily multi-threaded
applications).

One reason for the large stack space requirement of the current ABI is
that every caller must provide an argument save area of 64 bytes to its
caller.

This is actually useful in one specific case: if the called function
uses variable arguments and constructs a va_list.  Because of the way
the ABI is designed, the callee can in this case simply store the 8
GPRs (potentially) carrying arguments into that save area, and is
then guaranteed to have all function arguments in a linear arrangement
on the stack, which makes a trivial va_arg implementation possible.

If the callee is not vararg, this wouldn't really be necessary.  However,
if we were to skip that area then, vararg and non-vararg routines would
have incompatible ABIs, which would make it impossible to generate code
for a function call if the caller does not have a prototype of the called
function.  Therefore the old ABI requires the save area to be present
always.

With the new ABI, we wanted to retain the possibility of using a
trivial va_arg implementation, and also the possibility of calling
non-prototyped routines safely.  However, even so, there is a set
of cases where it is still not necessary to provide a save area:
when calling a function we have a prototype for and know that it
is neither vararg nor uses on-stack arguments.

This patch implement this change by having REG_PARM_STACK_SPACE
return a different value depending on whether the called routine
requires on-stack arguments (and we have a prototype).


There was one problem exposed by this change: rs6000_function_arg
contained this piece of code:

  if (mode == BLKmode)
mode = Pmode;

which marked the mode of a GPR holding an incoming struct argument
as Pmode.  This is now causing a problem.  When REG_PARM_STACK_SPACE
returns 0, function.c must allocate a stack slot within the callee's
frame to serve as DECL_RTL for the parameter.  This is done in
assign_parm_setup_block.  This routine attempts to use the mode
of the incoming register as the mode of that stack slot, if the
sizes match.

This means that the DECL_RTL for an argument of struct type may
now have mode Pmode, even if that type cannot reliably be operated
on in Pmode, which causes ICEs in several test cases for me.

Note that when REG_PARM_STACK_SPACE is non-zero, this problem does
not occur, because in this case, function.c allocates the stack slot
for the parameter's DECL_RTL in the register save area, using a
different function assign_parm_find_stack_rtl.   *That* function
will keep the mode as BLKmode even if the incoming register is
in another mode.

Removing the above two lines from rs6000_function_arg fixes this
problem, and does not appear to introduce any problem for the
ELFv1 ABI case either (full bootstrap / regtest still passes).

Looking back at the ChangeLog, those two lines where added by
Alan Modra back in 2004:
http://gcc.gnu.org/ml/gcc/2004-11/msg00617.html
to fix ICEs after a change by Richard Henderson:
http://gcc.gnu.org/ml/gcc/2004-11/msg00569.html
but Richard shortly afterwards reverted that change again
since it caused problems elsewhere too:
http://gcc.gnu.org/ml/gcc/2004-11/msg00751.html

So all in all, it seems this change by Alan was simply not
necessary and can be reverted.


Bye,
Ulrich


ChangeLog:

2013-11-11  Ulrich Weigand  
Alan Modra  

* config/rs6000/rs6000-protos.h (rs6000_reg_parm_stack_space):
Add prototype.
* config/rs6000/rs6000.h (RS6000_REG_SAVE): Remove.
(REG_PARM_STACK_SPACE): Call rs6000_reg_parm_stack_space.
* config/rs6000/rs6000.c (rs6000_parm_needs_stack): New function.
(rs6000_function_parms_need_stack): Likewise.
(rs6000_reg_parm_stack_space): Likewise.
(rs6000_function_arg): Do not replace BLKmode by Pmode when
returning a register argument.

Index: gcc/gcc/config/rs6000/rs6000-protos.h
===
--- gcc.orig/gcc/config/rs6000/rs6000-protos.h
+++ gcc/gcc/config/rs6000/rs6000-protos.h
@@ -158,6 +158,7 @@ extern tree altivec_resolve_overloaded_b
 extern rtx rs6000_libcall_value (enum machine_mode);
 extern rtx rs6000_va_arg (tree, tree);
 extern int function_ok_for_sibcall (tree);
+extern int rs6000_reg_parm_stack_space (tree);
 extern void rs6000_elf_declare_function_name (FILE *, const char *, tree);
 extern bool rs6000_elf_in_small_data_p (const_tree);
 #ifdef ARGS_SIZE_RTX
Index: gcc/gcc/config/rs6000/rs6000.h
===
--- gcc.orig/gcc/config/rs6000/rs6000.h
+++ gcc/gcc/config/rs6000/rs6000.h
@@ -1527,10 +1527,6 @@ extern enum reg_class rs6000_cons

[PATCH, rs6000] ELFv2 ABI 8/8: Enable by default on little-endian

2013-11-11 Thread Ulrich Weigand

Hello,

this patch finally throws the switch and enables the ELFv2 ABI
by default on powerpc64le-linux.

Bye,
Ulrich


ChangeLog:

2013-11-11  Ulrich Weigand  

* config/rs6000/sysv4le.h (LINUX64_DEFAULT_ABI_ELFv2): Define.


Index: gcc/gcc/config/rs6000/sysv4le.h
===
--- gcc.orig/gcc/config/rs6000/sysv4le.h
+++ gcc/gcc/config/rs6000/sysv4le.h
@@ -34,3 +34,7 @@
 
 #undef MULTILIB_DEFAULTS
 #defineMULTILIB_DEFAULTS { "mlittle", "mcall-sysv" }
+
+/* Little-endian PowerPC64 Linux uses the ELF v2 ABI by default.  */
+#define LINUX64_DEFAULT_ABI_ELFv2
+
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH] Time profiler - phase 1

2013-11-11 Thread Jan Hubicka

> +2013-10-29  Martin Liska  
> + Jan Hubicka  
> +
> + * cgraph.c (dump_cgraph_node): Profile dump added.
> + * cgraph.h (struct cgraph_node): New time profile variable added.
> + * cgraphclones.c (cgraph_clone_node): Time profile is cloned.
> + * gcov-io.h (gcov_type): New profiler type introduced.
> + * ipa-profile.c (lto_output_node): Streaming for time profile added.
> + (input_node): Time profiler is read from LTO stream.
> + * predict.c (maybe_hot_count_p): Hot prediction changed.
> + * profile.c (instrument_values): New case for time profiler added.
> + (compute_value_histograms): Read of time profile.
> + * tree-pretty-print.c (dump_function_header): Time profiler is dumped.
> + * tree-profile.c (init_ic_make_global_vars): Time profiler function 
> added.
> + (gimple_init_edge_profiler): TP function instrumentation.
> + (gimple_gen_time_profiler): New.
> + * value-prof.c (gimple_add_histogram_value): Support for time profiler
> + added.
> + (dump_histogram_value): TP type added to dumps.
> + (visit_hist): More sensitive check that takes TP into account.
> + (gimple_find_values_to_profile): TP instrumentation.
> + * value-prof.h (hist_type): New histogram type added.
> + (struct histogram_value_t): Pointer to struct function added.
> + * libgcc/Makefile.in: New GCOV merge function for TP added.
> + * libgcov.c: function_counter variable introduced.
> + (_gcov_merge_time_profile): New.
> + (_gcov_time_profiler): New.
> +
>  2013-11-05  Steven Bosscher  
>  
>  
> diff --git a/gcc/ipa-profile.c b/gcc/ipa-profile.c
> index 1260069..eff4b51 100644
> --- a/gcc/ipa-profile.c
> +++ b/gcc/ipa-profile.c
> @@ -465,6 +465,7 @@ ipa_propagate_frequency (struct cgraph_node *node)
>if (d.maybe_unlikely_executed)
>  {
>node->frequency = NODE_FREQUENCY_UNLIKELY_EXECUTED;
> +  node->tp_first_run = 0;
>if (dump_file)
>   fprintf (dump_file, "Node %s promoted to unlikely executed.\n",
>cgraph_node_name (node));

Why do you put tp_first_run to 0?  The function may be unlikely but it still 
may have
average possition in the program execution. I.e. when it is executed once per 
many
invocations during the train run)
> @@ -895,9 +907,22 @@ compute_value_histograms (histogram_values values, 
> unsigned cfg_checksum,
>hist->hvalue.counters =  XNEWVEC (gcov_type, hist->n_counters);
>for (j = 0; j < hist->n_counters; j++)
>  if (aact_count)
> -   hist->hvalue.counters[j] = aact_count[j];
> - else
> -   hist->hvalue.counters[j] = 0;
> +  hist->hvalue.counters[j] = aact_count[j];
> +else
> +  hist->hvalue.counters[j] = 0;
> +
> +  /* Time profiler counter is not related to any statement,
> +   * so that we have to read the counter and set the value to
> +   * the corresponding call graph node.  */

Reformat comment to GNU style.
> +2013-10-28  Martin Liska 
> +
> + * gcc.dg/time-profiler-1.c: New test.
> + * gcc.dg/time-profiler-2.c: Ditto.
> +
It is custom to put all changelogs to the beggining of your patch (just 
nitpicking ;))
> @@ -222,6 +225,18 @@ gimple_init_edge_profiler (void)
>   = tree_cons (get_identifier ("leaf"), NULL,
>DECL_ATTRIBUTES (tree_indirect_call_profiler_fn));
>  
> +  /* void (*) (gcov_type *, gcov_type, void *)  */
> +  time_profiler_fn_type
> += build_function_type_list (void_type_node,
> +   gcov_type_ptr, NULL_TREE);
> +  tree_time_profiler_fn
> +   = build_fn_decl ("__gcov_time_profiler",
> +  time_profiler_fn_type);
> +  TREE_NOTHROW (tree_time_profiler_fn) = 1;
> +  DECL_ATTRIBUTES (tree_time_profiler_fn)
> + = tree_cons (get_identifier ("leaf"), NULL,
> +  DECL_ATTRIBUTES (tree_time_profiler_fn));

We should udpate this code to use set_call_expr_flags but by incremental patch.

The patch is OK with changes above.  Do you have write permissin to commit it?
If not, just send updated version and I will commit it + follow the write access
instructions on gcc.gnu.org homepage.

Honza

Re: [RFC] libgcov.c re-factoring and offline profile-tool

2013-11-11 Thread Jan Hubicka

> 2013-11-04  Rong Xu  
> 
>   * libgcc/libgcov.c: Delete as part of re-factoring.
>   * libgcc/libgcov-profiler.c (__gcov_interval_profiler): Moved from
>   libgcov.c
>   (__gcov_pow2_profiler): Ditto.
>   (__gcov_one_value_profiler_body): Ditto.
>   (__gcov_one_value_profiler): Ditto.
>   (__gcov_indirect_call_profiler): Ditto.
>   (__gcov_indirect_call_profiler_v2): Ditto.
>   (__gcov_average_profiler): Ditto.
>   (__gcov_ior_profiler): Ditto.
>   * libgcc/libgcov-driver.c (this_prg): Make it file scope
>   static variable.
>   (all_prg): Ditto.
>   (crc32): Ditto.
>   (gi_filename): Ditto.
>   (fn_buffer): Ditto.
>   (sum_buffer): Ditto.
> (struct gcov_filename_aux): New types to store auxiliary information
> for gi_filename.
>   (gcov_version): Moved from libgcov.c.
>   (crc32_unsigned): Ditto.
>   (gcov_histogram_insert): Ditto.
>   (gcov_compute_histogram): Ditto.
>   (gcov_exit): Ditto.
>   (gcov_clear): Ditto.
>   (__gcov_init): Ditto.
>   (gcov_exit_compute_summary): New function split from gcov_exit().
>   (gcov_exit_merge_gcda): Ditto.
>   (gcov_exit_write_gcda): Ditto.
>   (gcov_exit_dump_gcov): Ditto.
>   * libgcc/libgcov-interface.c (init_mx): Moved from libgcov.c.
>   (init_mx_once): Ditto.
>   (__gcov_flush): Ditto.
>   (__gcov_reset): Ditto.
>   (__gcov_dump): Ditto.
>   (__gcov_fork): Ditto.
>   (__gcov_execl): Ditto.
>   (__gcov_execlp): Ditto.
>   (__gcov_execle): Ditto.
>   (__gcov_execv): Ditto.
>   (__gcov_execvp): Ditto.
>   (__gcov_execve): Ditto.
>   * libgcc/libgcov-driver-system.c (gcov_error): New utility function.
>   (allocate_filename_struct): New function split from gcov_exit().
>   (gcov_exit_open_gcda_file): Ditto.
> (create_file_directory): Moved from libgcov.c.
>   * libgcc/libgcov-merge.c:
>   (__gcov_merge_add): Moved from libgcov.c.
>   (__gcov_merge_ior): Ditto.
>   (__gcov_merge_single): Ditto.
>   (__gcov_merge_delta): Ditto.
>   * libgcc/Makefile.in: Change to build newly added files. 
>   * gcc/gcov-io.h (__gcov_merge_ior): Add the decl to avoid warning.
> 

Hi,
the patch looks resonable. I take your word on the fact that there are no 
changes
in the code, I did not read it all ;)))
> Index: libgcc/Makefile.in
> ===
> --- libgcc/Makefile.in(revision 204285)
> +++ libgcc/Makefile.in(working copy)
> @@ -853,17 +853,37 @@ include $(iterator)
>  # Build libgcov components.
>  
>  # Defined in libgcov.c, included only in gcov library
> -LIBGCOV = _gcov _gcov_merge_add _gcov_merge_single _gcov_merge_delta \
> -_gcov_fork _gcov_execl _gcov_execlp _gcov_execle \
> -_gcov_execv _gcov_execvp _gcov_execve _gcov_reset _gcov_dump \
> -_gcov_interval_profiler _gcov_pow2_profiler _gcov_one_value_profiler \
> +##LIBGCOV = _gcov _gcov_merge_add _gcov_merge_single _gcov_merge_delta \
> +##_gcov_fork _gcov_execl _gcov_execlp _gcov_execle \
> +##_gcov_execv _gcov_execvp _gcov_execve _gcov_reset _gcov_dump \
> +##_gcov_interval_profiler _gcov_pow2_profiler _gcov_one_value_profiler \
> +##_gcov_indirect_call_profiler _gcov_average_profiler _gcov_ior_profiler 
> \
> +##_gcov_merge_ior _gcov_indirect_call_profiler_v2

Probably no need for this commnet block.
> +
> +LIBGCOV_MERGE = _gcov_merge_add _gcov_merge_single _gcov_merge_delta 
> _gcov_merge_ior
> +LIBGCOV_PROFILER = _gcov_interval_profiler _gcov_pow2_profiler 
> _gcov_one_value_profiler \
>  _gcov_indirect_call_profiler _gcov_average_profiler _gcov_ior_profiler \
> -_gcov_merge_ior _gcov_indirect_call_profiler_v2
> +_gcov_indirect_call_profiler_v2
> +LIBGCOV_INTERFACE = _gcov_flush _gcov_fork _gcov_execl _gcov_execlp 
> _gcov_execle \
> +_gcov_execv _gcov_execvp _gcov_execve _gcov_reset _gcov_dump
> +LIBGCOV_DRIVER = _gcov 
>  
> -libgcov-objects = $(patsubst %,%$(objext),$(LIBGCOV))
> +libgcov-merge-objects = $(patsubst %,%$(objext),$(LIBGCOV_MERGE))
> +libgcov-profiler-objects = $(patsubst %,%$(objext),$(LIBGCOV_PROFILER))
> +libgcov-interface-objects = $(patsubst %,%$(objext),$(LIBGCOV_INTERFACE))
> +libgcov-driver-objects = $(patsubst %,%$(objext),$(LIBGCOV_DRIVER))
> +libgcov-objects = $(libgcov-merge-objects) $(libgcov-profiler-objects) \
> + $(libgcov-interface-objects) $(libgcov-driver-objects)
>  
> -$(libgcov-objects): %$(objext): $(srcdir)/libgcov.c
> - $(gcc_compile) -DL$* -c $(srcdir)/libgcov.c
> +$(libgcov-merge-objects): %$(objext): $(srcdir)/libgcov-merge.c
> + $(gcc_compile) -DL$* -c $(srcdir)/libgcov-merge.c
> +$(libgcov-profiler-objects): %$(objext): $(srcdir)/libgcov-profiler.c
> + $(gcc_compile) -DL$* -c $(srcdir)/libgcov-profiler.c
> +$(libgcov-interface-objects): %$(objext): $(srcdir)/libgcov-interface.c
> + $(gcc_compile) -DL$*

Re: Some wide-int review comments

2013-11-11 Thread Kenneth Zadeck


On 11/11/2013 09:42 AM, Richard Biener wrote:

On Mon, Nov 11, 2013 at 3:26 PM, Kenneth Zadeck
 wrote:

On 11/11/2013 06:49 AM, Richard Biener wrote:

On Sat, Nov 9, 2013 at 3:23 PM, Kenneth Zadeck 
wrote:

On 11/08/2013 05:30 AM, Richard Sandiford wrote:

   From tree-vrp.c:

@@ -1893,6 +1884,10 @@ vrp_int_const_binop (enum tree_code code
/* If the singed operation wraps then int_const_binop has done
   everything we want.  */
;
+  /* Signed division of -1/0 overflows and by the time it gets here
+ returns NULL_TREE.  */
+  else if (!res)
+return NULL_TREE;
  else if ((TREE_OVERFLOW (res)
  && !TREE_OVERFLOW (val1)
  && !TREE_OVERFLOW (val2))

Why is this case different from trunk?  Or is it a bug-fix independent
of wide-int?

the api for division is different for wide-int than it was for
double-int.

But this is using a tree API (int_const_binop) that didn't change
(it returned NULL for / 0 previously).  So what makes us arrive here
now?  (I agree there is a bug in VRP, but it shouldn't manifest itself
only on wide-int)

Richard.

My reading of the code is that is that i changed int_const_binop to return
null_tree for this case.

Trunk has:

 case TRUNC_DIV_EXPR:
 case FLOOR_DIV_EXPR: case CEIL_DIV_EXPR:
 case EXACT_DIV_EXPR:
   /* This is a shortcut for a common special case.  */
   if (op2.high == 0 && (HOST_WIDE_INT) op2.low > 0
   && !TREE_OVERFLOW (arg1)
   && !TREE_OVERFLOW (arg2)
   && op1.high == 0 && (HOST_WIDE_INT) op1.low >= 0)
 {
   if (code == CEIL_DIV_EXPR)
 op1.low += op2.low - 1;

   res.low = op1.low / op2.low, res.high = 0;
   break;
 }

   /* ... fall through ...  */

 case ROUND_DIV_EXPR:
   if (op2.is_zero ())
 return NULL_TREE;

so it already returns NULL_TREE on divide by zero.
I found the reason  This is one of the many "tree-vrp was not 
properly tested for TImode bugs."


on the trunk, the case 0/(smallest negative number) case will only 
trigger overflow in TImode. On the wide-int branch, tree-vrp works 
at the precision of the operands so overflow is triggered properly for 
this case.So for HImode, the trunk produces the a result for 0/0x80 
and then force_fit code at the bottom of int_const_binop_1 turns this 
into an overflow tree value rather than a null tree.


on the wide-int branch, this case causes the overflow bit to be returned 
from the wide-int divide because the overflow case is properly handled 
for all types and that overflow is turned into null_tree by the wide-int 
version of int_const_binop_1.


apparently there are no test cases that exercise the true divide by 0 
case but there are test cases that hit the 0/ largest negative number 
case for modes smaller than TImode.


Kenny



On the trunk, only rem returns null_tree for divide by 0, on the wide int
branch, both div and rem return null tree.

I know that this is going to bring on a string of questions that i do not
remember the answers to as to why i made that change. but
fold-const.c:int_const_binop_1 now returns null_tree and this is just
fallout from that change.

Kenny

Thanks,
Richard

Re: [PATCH, MPX, 2/X] Pointers Checker [10/25] Calls copy and verification

2013-11-11 Thread Richard Biener

On Mon, Nov 11, 2013 at 3:45 PM, Ilya Enkovich  wrote:
> 2013/11/11 Richard Biener :
>> On Mon, Nov 11, 2013 at 3:00 PM, Ilya Enkovich  
>> wrote:
>>> 2013/11/11 Richard Biener :
 On Fri, Nov 8, 2013 at 11:03 AM, Ilya Enkovich  
 wrote:
> 2013/11/8 Richard Biener :
>> On Thu, Nov 7, 2013 at 7:55 PM, Jeff Law  wrote:
>>> On 11/07/13 04:50, Ilya Enkovich wrote:

 Hi,

 Here is an updated patch version.
>>>
>>> I think this needs to hold until we have a consensus on what the 
>>> parameter
>>> passing looks like for bounded pointers.
>>
>> I still think the best thing to do on GIMPLE is
>>
>> arg_2 = __builtin_ia32_bnd_arg (arg_1(D));
>> foo (arg_2);
>>
>> that is, make the parameter an implicit pair of {value, bound} where
>> the bound is determined by the value going through a bound association
>> builtin.  No extra explicit argument to the calls so arguments match
>> the fndecl and fntype.  All the complexity is defered to the expander
>> which can trivially lookup bound arguments via the SSA def (I suppose
>> it does that anyway now for getting at the explicit bound argument now).
>>
>> As far as I can see (well, think), all currently passed bound arguments
>> are the return value of such builtin already.
>
> All bounds are result of different builtin calls. Small example:
>
> int *global_p;
> void foo (int *p)
> {
>   int buf[10];
>   bar (p, buf, global_p);
> }
>
>
> It is translated into:
>
>   __bound_tmp.1_7 = __builtin_ia32_bndmk (&buf, 40);
>   __bound_tmp.1_6 = __builtin_ia32_arg_bnd (p_3(D)(ab));
>   global_p.0_2 = global_p;
>   __bound_tmp.1_8 = __builtin_ia32_bndldx (&global_p, global_p.0_2);
>   bar (p_3(D)(ab), __bound_tmp.1_6, &buf, __bound_tmp.1_7,
> global_p.0_2, __bound_tmp.1_8);
>
> Bounds binding via calls as you suggest may be done as following:
>
>   __bound_tmp.1_7 = __builtin_ia32_bndmk (&buf, 40);
>   __bound_tmp.1_6 = __builtin_ia32_arg_bnd (p_3(D)(ab));
>   global_p.0_2 = global_p;
>   __bound_tmp.1_8 = __builtin_ia32_bndldx (&global_p, global_p.0_2);
>   _9 = __builtin_bind_bounds (p_3(D)(ab), __bound_tmp.1_6);
>   _10 = __builtin_bind_bounds (&buf, __bound_tmp.1_7);
>   _11 = __builtin_bind_bounds (global_p.0_2, __bound_tmp.1_8);
>   bar (_9, _10, _11);
>
> Is it close to what you propose?

 Yes.
>>>
>>> Is there a way to bind bounds with structures in a similar way?
>>
>> Not to have them easy to lookup in the SSA web.  A long time ago
>> I proposed to make SSA aggregates possible, so you could do
>>
>>  tem_2 = __internal_bind_bounds (aggr(D), __bound_tmp.1_3,
>> __bound_tmp.1_4, ...);
>>  bar (tem_2);
>>
>> (originally the SSA aggregates were supposed to make copy-propgagation
>> possible using the SSA copy propagator, and of course I needed it for
>> the middle-end array work)
>>
>> Not sure if that will give other people the creeps (expand would
>> never expand the "load" from tem_2 but instead handle aggr as
>> parameter).  A slight complication is due to alias analysis
>> which would be need to taught that bar performs a load of aggr.
>
> It would require bounds loading for aggr before __internal_bind_bounds
> anyway.  So, why not to do it in expand?

Ah, ok I wasn't aware of that.

>  I just need to mark calls
> with a flag (which you've proposed few times already) to let expand
> know when it should load bounds.  Having SSA aggregates would be nice
> but I suspect it has much higher impact then loading bounds in expand.
>  I want to try simpler variant first.

Good.

Richard.

> Thanks,
> Ilya
>
>>
>> Richard.
>>
>>>  For
>>> SSA names I may easily find definition and check if it is a binding
>>> builtin call. But for structures it is not possible.  The way I see it
>>> to mark all such args as addressable and load required bounds on
>>> expand pass.
>>>
>>> Ilya

 Richard.

> Ilya
>>
>> Richard.
>>
>>
>>
>>> Jeff

Recent Go patch broke CentOS 5.10 bootstrap

2013-11-11 Thread Uros Bizjak

Hello!

Building latest gcc+go on CentOS 5.10 breaks with following build failure:

../../../gcc-svn/trunk/libgo/go/net/fd_unix.go:414:72: error:
reference to undefined identifier ‘syscall.F_DUPFD_CLOEXEC’
   r0, _, e1 := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd),
syscall.F_DUPFD_CLOEXEC, 0)
^
gmake[4]: *** [net.lo] Error 1
gmake[4]: Leaving directory
`/home/uros/gcc-build/x86_64-unknown-linux-gnu/libgo'
gmake[3]: *** [all-recursive] Error 1
gmake[3]: Leaving directory
`/home/uros/gcc-build/x86_64-unknown-linux-gnu/libgo'
gmake[2]: *** [all] Error 2
gmake[2]: Leaving directory
`/home/uros/gcc-build/x86_64-unknown-linux-gnu/libgo'
gmake[1]: *** [all-target-libgo] Error 2
gmake[1]: Leaving directory `/home/uros/gcc-build'
gmake: *** [all] Error 2

Uros.

Re: [RFC] libgcov.c re-factoring and offline profile-tool

2013-11-11 Thread Jan Hubicka

> Here is the patch that includes profile-tool.
> Profile-tool now has two functions: merge and rewrite. I'll add diff later.
> 
> Compiler is tested with spec2006 and profiledbootstrap.
> profile-tool is tested with spec2006 profiles.

Hi,
it would be nice if you could elaborate bit more on the tool and what it does.
I plan to implement relink only instrumentatino for LTO and for that I will need
profile reading/handling that is independent on functio nbodies, so perhaps
we can unify the infrastructure.  My plans always was to merge the gcov and
profile.c profile handling logic and make it bit more generic.

Also I think profile-tool is a bit generic name. I.e. it does not realy say it
is related to GCC's coverage profiling.  Maybe gcc-profile-tool or
gcov-profile-tool may be better?
Perhaps an resonable option would also be to bundle it into gcov binary...
> Index: libgcc/libgcov-tool.c
> ===
> --- libgcc/libgcov-tool.c (revision 0)
> +++ libgcc/libgcov-tool.c (revision 0)
> @@ -0,0 +1,800 @@
> +/* GCC instrumentation plugin for ThreadSanitizer.
> +   Copyright (C) 2011-2013 Free Software Foundation, Inc.
> +   Contributed by Dmitry Vyukov 

You need to update this...
Why there needs to be libgcov changes?

Honza

Re: [PATCH] decide edge's hotness when there is profile info

2013-11-11 Thread Jan Hubicka

> 2013-11-08  Teresa Johnson  
> Jan Hubicka  
> 
> * predict.c (drop_profile): New function.
> (handle_missing_profiles): Ditto.
> (counts_to_freqs): Don't overwrite estimated frequencies
> when function has no profile counts.
> * predict.h (handle_missing_profiles): Declare.
> * tree-inline.c (freqs_to_counts): New function.
> (copy_cfg_body): Invoke freqs_to_counts as needed.
> * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.
> 
> Index: predict.c
> ===
> --- predict.c   (revision 204516)
> +++ predict.c   (working copy)
> @@ -2765,6 +2765,107 @@ estimate_loops (void)
>BITMAP_FREE (tovisit);
>  }
> 
> +/* Drop the profile for NODE to guessed, and update its frequency based on
> +   whether it is expected to be HOT.  */
> +
> +static void
> +drop_profile (struct cgraph_node *node, bool hot)
> +{
> +  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> +
> +  if (dump_file)
> +fprintf (dump_file,
> + "Dropping 0 profile for %s/%i. %s based on calls.\n",
> + cgraph_node_name (node), node->order,
> + hot ? "Function is hot" : "Function is normal");
> +  /* We only expect to miss profiles for functions that are reached
> + via non-zero call edges in cases where the function may have
> + been linked from another module or library (COMDATs and extern
> + templates). See the comments below for handle_missing_profiles.  */
> +  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl))
> +warning (0,
> + "Missing counts for called function %s/%i",
> + cgraph_node_name (node), node->order);
> +
> +  profile_status_for_function (fn)
> +  = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
> +  node->frequency
> +  = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
> +}
> +
> +/* In the case of COMDAT routines, multiple object files will contain the 
> same
> +   function and the linker will select one for the binary. In that case
> +   all the other copies from the profile instrument binary will be missing
> +   profile counts. Look for cases where this happened, due to non-zero
> +   call counts going to 0-count functions, and drop the profile to guessed
> +   so that we can use the estimated probabilities and avoid optimizing only
> +   for size.
> +
> +   The other case where the profile may be missing is when the routine
> +   is not going to be emitted to the object file, e.g. for "extern template"
> +   class methods. Those will be marked DECL_EXTERNAL. Emit a warning in
> +   all other cases of non-zero calls to 0-count functions.  */
> +
> +void
> +handle_missing_profiles (void)
> +{
> +  struct cgraph_node *node;
> +  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
> +  vec worklist;
> +  worklist.create (64);
> +
> +  /* See if 0 count function has non-0 count callers.  In this case we
> + lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
> +  FOR_EACH_DEFINED_FUNCTION (node)
> +{
> +  struct cgraph_edge *e;
> +  gcov_type call_count = 0;
> +  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> +
> +  if (node->count)
> +continue;
> +  for (e = node->callers; e; e = e->next_caller)
> +call_count += e->count;
> +  if (call_count
> +  && fn && fn->cfg
> +  && (call_count * unlikely_count_fraction >= profile_info->runs))
> +{
> +  bool maybe_hot = maybe_hot_count_p (NULL, call_count);
> +
> +  drop_profile (node, maybe_hot);
> +  worklist.safe_push (node);
> +}

Perhaps we should add an note/error on mishandled profile when function is not 
COMDAT?
That way we may notice further bugs in this area.
> +}
> +
> +  /* Propagate the profile dropping to other 0-count COMDATs that are
> + potentially called by COMDATs we already dropped the profile on.  */
> +  while (worklist.length () > 0)
> +{
> +  struct cgraph_edge *e;
> +
> +  node = worklist.pop ();
> +  for (e = node->callees; e; e = e->next_caller)
> +{
> +  struct cgraph_node *callee = e->callee;
> +  struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
> +
> +  if (callee->count > 0)
> +continue;
> +  if (DECL_COMDAT (callee->decl) && fn && fn->cfg
> +  && profile_status_for_function (fn) == PROFILE_READ)

Perhaps we can check here maybe_hot_bb_p for the caller and rely on profile 
estimate
to give us false only for really known to be unlikely paths? (i.e. EH handling, 
noreturns
etc.)
> +{
> +  /* Since there are no non-0 call counts to this function,
> + we don't know for sure whether it is hot. Indicate to
> + the drop_profile routine that function should be marked
> +

Re: Some wide-int review comments

2013-11-11 Thread Richard Biener

On Mon, Nov 11, 2013 at 4:04 PM, Kenneth Zadeck
 wrote:
> On 11/11/2013 09:42 AM, Richard Biener wrote:
>>
>> On Mon, Nov 11, 2013 at 3:26 PM, Kenneth Zadeck
>>  wrote:
>>>
>>> On 11/11/2013 06:49 AM, Richard Biener wrote:

 On Sat, Nov 9, 2013 at 3:23 PM, Kenneth Zadeck
 
 wrote:
>
> On 11/08/2013 05:30 AM, Richard Sandiford wrote:
>>
>>From tree-vrp.c:
>>
>> @@ -1893,6 +1884,10 @@ vrp_int_const_binop (enum tree_code code
>> /* If the singed operation wraps then int_const_binop has done
>>everything we want.  */
>> ;
>> +  /* Signed division of -1/0 overflows and by the time it gets here
>> + returns NULL_TREE.  */
>> +  else if (!res)
>> +return NULL_TREE;
>>   else if ((TREE_OVERFLOW (res)
>>   && !TREE_OVERFLOW (val1)
>>   && !TREE_OVERFLOW (val2))
>>
>> Why is this case different from trunk?  Or is it a bug-fix independent
>> of wide-int?
>
> the api for division is different for wide-int than it was for
> double-int.

 But this is using a tree API (int_const_binop) that didn't change
 (it returned NULL for / 0 previously).  So what makes us arrive here
 now?  (I agree there is a bug in VRP, but it shouldn't manifest itself
 only on wide-int)

 Richard.
>>>
>>> My reading of the code is that is that i changed int_const_binop to
>>> return
>>> null_tree for this case.
>>
>> Trunk has:
>>
>>  case TRUNC_DIV_EXPR:
>>  case FLOOR_DIV_EXPR: case CEIL_DIV_EXPR:
>>  case EXACT_DIV_EXPR:
>>/* This is a shortcut for a common special case.  */
>>if (op2.high == 0 && (HOST_WIDE_INT) op2.low > 0
>>&& !TREE_OVERFLOW (arg1)
>>&& !TREE_OVERFLOW (arg2)
>>&& op1.high == 0 && (HOST_WIDE_INT) op1.low >= 0)
>>  {
>>if (code == CEIL_DIV_EXPR)
>>  op1.low += op2.low - 1;
>>
>>res.low = op1.low / op2.low, res.high = 0;
>>break;
>>  }
>>
>>/* ... fall through ...  */
>>
>>  case ROUND_DIV_EXPR:
>>if (op2.is_zero ())
>>  return NULL_TREE;
>>
>> so it already returns NULL_TREE on divide by zero.
>
> I found the reason  This is one of the many "tree-vrp was not properly
> tested for TImode bugs."
>
> on the trunk, the case 0/(smallest negative number) case will only trigger
> overflow in TImode. On the wide-int branch, tree-vrp works at the
> precision of the operands so overflow is triggered properly for this case.
> So for HImode, the trunk produces the a result for 0/0x80 and then force_fit
> code at the bottom of int_const_binop_1 turns this into an overflow tree
> value rather than a null tree.
>
> on the wide-int branch, this case causes the overflow bit to be returned
> from the wide-int divide because the overflow case is properly handled for
> all types and that overflow is turned into null_tree by the wide-int version
> of int_const_binop_1.
>
> apparently there are no test cases that exercise the true divide by 0 case
> but there are test cases that hit the 0/ largest negative number case for
> modes smaller than TImode.

You probably mean  / -1?  I don't see where
NULL_TREE is returned for any overflow case in int_const_binop_1.

Ah, you made it so.  That looks like a bogus change.  What's the
reason?  int_const_binop_1 is supposed to return a value with
TREE_OVERFLOW set in these cases, also required for frontend
constant folding.  Try compiling

const int i = (-__INT_MAX__ - 1) / -1;

which should say

> ./cc1 -quiet t.c
t.c:1:34: warning: integer overflow in expression [-Woverflow]
 const int i = (-__INT_MAX__ - 1) / -1;
  ^

but not error or ICE.  Seems to work on the branch, but only
because the expression is still folded by

12357 /* X / -1 is -X.  */
12358 if (!TYPE_UNSIGNED (type)
12359 && TREE_CODE (arg1) == INTEGER_CST
12360 && wi::eq_p (arg1, -1))
12361   return fold_convert_loc (loc, type, negate_expr (arg0));


Thus

case TRUNC_DIV_EXPR:
case EXACT_DIV_EXPR:
  res = wi::div_trunc (arg1, arg2, sign, &overflow);
  if (overflow)
return NULL_TREE;
  break;

should retain the arg2 == 0 checks (and return NULL_TREE) but
otherwise keep overflow handling the same.

Richard.

> Kenny
>
>>
>>> On the trunk, only rem returns null_tree for divide by 0, on the wide int
>>> branch, both div and rem return null tree.
>>>
>>> I know that this is going to bring on a string of questions that i do not
>>> remember the answers to as to why i made that change. but
>>> fold-const.c:int_const_binop_1 now returns null_tree and this is just
>>> fallout from that change.
>>>
>>> Kenny
>>
>> Thanks,
>> Richard
>
>
>

Re: Some wide-int review comments

2013-11-11 Thread Kenneth Zadeck


On 11/11/2013 10:04 AM, Kenneth Zadeck wrote:

On 11/11/2013 09:42 AM, Richard Biener wrote:

On Mon, Nov 11, 2013 at 3:26 PM, Kenneth Zadeck
 wrote:

On 11/11/2013 06:49 AM, Richard Biener wrote:
On Sat, Nov 9, 2013 at 3:23 PM, Kenneth Zadeck 


wrote:

On 11/08/2013 05:30 AM, Richard Sandiford wrote:

   From tree-vrp.c:

@@ -1893,6 +1884,10 @@ vrp_int_const_binop (enum tree_code code
/* If the singed operation wraps then int_const_binop has 
done

   everything we want.  */
;
+  /* Signed division of -1/0 overflows and by the time it gets here
+ returns NULL_TREE.  */
+  else if (!res)
+return NULL_TREE;
  else if ((TREE_OVERFLOW (res)
  && !TREE_OVERFLOW (val1)
  && !TREE_OVERFLOW (val2))

Why is this case different from trunk?  Or is it a bug-fix 
independent

of wide-int?

the api for division is different for wide-int than it was for
double-int.

But this is using a tree API (int_const_binop) that didn't change
(it returned NULL for / 0 previously).  So what makes us arrive here
now?  (I agree there is a bug in VRP, but it shouldn't manifest itself
only on wide-int)

Richard.
My reading of the code is that is that i changed int_const_binop to 
return

null_tree for this case.

Trunk has:

 case TRUNC_DIV_EXPR:
 case FLOOR_DIV_EXPR: case CEIL_DIV_EXPR:
 case EXACT_DIV_EXPR:
   /* This is a shortcut for a common special case.  */
   if (op2.high == 0 && (HOST_WIDE_INT) op2.low > 0
   && !TREE_OVERFLOW (arg1)
   && !TREE_OVERFLOW (arg2)
   && op1.high == 0 && (HOST_WIDE_INT) op1.low >= 0)
 {
   if (code == CEIL_DIV_EXPR)
 op1.low += op2.low - 1;

   res.low = op1.low / op2.low, res.high = 0;
   break;
 }

   /* ... fall through ...  */

 case ROUND_DIV_EXPR:
   if (op2.is_zero ())
 return NULL_TREE;

so it already returns NULL_TREE on divide by zero.
I found the reason  This is one of the many "tree-vrp was not 
properly tested for TImode bugs."


it is an overstatement to say that this was a timode undertesting 
issue.  I made the change to tree-vrp because of wide-int's treating of 
all divide/rem overflows in a uniform manner AND the fact that there was 
a untested divide by zero hole in tree-vrp.  I most likely never saw 
a true divide by zero pass into that section of code.



on the trunk, the case 0/(smallest negative number) case will only 
trigger overflow in TImode. On the wide-int branch, tree-vrp works 
at the precision of the operands so overflow is triggered properly for 
this case.So for HImode, the trunk produces the a result for 
0/0x80 and then force_fit code at the bottom of int_const_binop_1 
turns this into an overflow tree value rather than a null tree.


on the wide-int branch, this case causes the overflow bit to be 
returned from the wide-int divide because the overflow case is 
properly handled for all types and that overflow is turned into 
null_tree by the wide-int version of int_const_binop_1.


apparently there are no test cases that exercise the true divide by 0 
case but there are test cases that hit the 0/ largest negative number 
case for modes smaller than TImode.


Kenny


On the trunk, only rem returns null_tree for divide by 0, on the 
wide int

branch, both div and rem return null tree.

I know that this is going to bring on a string of questions that i 
do not

remember the answers to as to why i made that change. but
fold-const.c:int_const_binop_1 now returns null_tree and this is just
fallout from that change.

Kenny

Thanks,
Richard

Re: Some wide-int review comments

2013-11-11 Thread Kenneth Zadeck


On 11/11/2013 10:29 AM, Richard Biener wrote:

On Mon, Nov 11, 2013 at 4:04 PM, Kenneth Zadeck
 wrote:

On 11/11/2013 09:42 AM, Richard Biener wrote:

On Mon, Nov 11, 2013 at 3:26 PM, Kenneth Zadeck
 wrote:

On 11/11/2013 06:49 AM, Richard Biener wrote:

On Sat, Nov 9, 2013 at 3:23 PM, Kenneth Zadeck

wrote:

On 11/08/2013 05:30 AM, Richard Sandiford wrote:

From tree-vrp.c:

@@ -1893,6 +1884,10 @@ vrp_int_const_binop (enum tree_code code
 /* If the singed operation wraps then int_const_binop has done
everything we want.  */
 ;
+  /* Signed division of -1/0 overflows and by the time it gets here
+ returns NULL_TREE.  */
+  else if (!res)
+return NULL_TREE;
   else if ((TREE_OVERFLOW (res)
   && !TREE_OVERFLOW (val1)
   && !TREE_OVERFLOW (val2))

Why is this case different from trunk?  Or is it a bug-fix independent
of wide-int?

the api for division is different for wide-int than it was for
double-int.

But this is using a tree API (int_const_binop) that didn't change
(it returned NULL for / 0 previously).  So what makes us arrive here
now?  (I agree there is a bug in VRP, but it shouldn't manifest itself
only on wide-int)

Richard.

My reading of the code is that is that i changed int_const_binop to
return
null_tree for this case.

Trunk has:

  case TRUNC_DIV_EXPR:
  case FLOOR_DIV_EXPR: case CEIL_DIV_EXPR:
  case EXACT_DIV_EXPR:
/* This is a shortcut for a common special case.  */
if (op2.high == 0 && (HOST_WIDE_INT) op2.low > 0
&& !TREE_OVERFLOW (arg1)
&& !TREE_OVERFLOW (arg2)
&& op1.high == 0 && (HOST_WIDE_INT) op1.low >= 0)
  {
if (code == CEIL_DIV_EXPR)
  op1.low += op2.low - 1;

res.low = op1.low / op2.low, res.high = 0;
break;
  }

/* ... fall through ...  */

  case ROUND_DIV_EXPR:
if (op2.is_zero ())
  return NULL_TREE;

so it already returns NULL_TREE on divide by zero.

I found the reason  This is one of the many "tree-vrp was not properly
tested for TImode bugs."

on the trunk, the case 0/(smallest negative number) case will only trigger
overflow in TImode. On the wide-int branch, tree-vrp works at the
precision of the operands so overflow is triggered properly for this case.
So for HImode, the trunk produces the a result for 0/0x80 and then force_fit
code at the bottom of int_const_binop_1 turns this into an overflow tree
value rather than a null tree.

on the wide-int branch, this case causes the overflow bit to be returned
from the wide-int divide because the overflow case is properly handled for
all types and that overflow is turned into null_tree by the wide-int version
of int_const_binop_1.

apparently there are no test cases that exercise the true divide by 0 case
but there are test cases that hit the 0/ largest negative number case for
modes smaller than TImode.

You probably mean  / -1?  I don't see where
NULL_TREE is returned for any overflow case in int_const_binop_1.

Ah, you made it so.  That looks like a bogus change.  What's the
reason?  int_const_binop_1 is supposed to return a value with
TREE_OVERFLOW set in these cases, also required for frontend
constant folding.  Try compiling

const int i = (-__INT_MAX__ - 1) / -1;

which should say


./cc1 -quiet t.c

t.c:1:34: warning: integer overflow in expression [-Woverflow]
  const int i = (-__INT_MAX__ - 1) / -1;
   ^

but not error or ICE.  Seems to work on the branch, but only
because the expression is still folded by

12357 /* X / -1 is -X.  */
12358 if (!TYPE_UNSIGNED (type)
12359 && TREE_CODE (arg1) == INTEGER_CST
12360 && wi::eq_p (arg1, -1))
12361   return fold_convert_loc (loc, type, negate_expr (arg0));


Thus

 case TRUNC_DIV_EXPR:
 case EXACT_DIV_EXPR:
   res = wi::div_trunc (arg1, arg2, sign, &overflow);
   if (overflow)
 return NULL_TREE;
   break;

should retain the arg2 == 0 checks (and return NULL_TREE) but
otherwise keep overflow handling the same.
I see your point.   i did not think that it was necessary to distinguish 
the two types of behavior by division.


i will make this fix today and test and install it.

kenny

Richard.


Kenny


On the trunk, only rem returns null_tree for divide by 0, on the wide int
branch, both div and rem return null tree.

I know that this is going to bring on a string of questions that i do not
remember the answers to as to why i made that change. but
fold-const.c:int_const_binop_1 now returns null_tree and this is just
fallout from that change.

Kenny

Thanks,
Richard

[gomp4] Fix two typos

2013-11-11 Thread Jakub Jelinek

Hi!

Dunno how I've managed to break the branch on Friday, anyway, here is an
obvious fix, committed to the branch.

2013-11-11  Jakub Jelinek  

* tree-vect-data-refs.c (vect_analyze_data_refs): Check loop->safelen
rather than loop->simdlen.
* tree-vect-stmts.c (vectorizable_simd_clone_call): Likewise.

--- gcc/tree-vect-data-refs.c.jj2013-11-08 17:12:49.0 +0100
+++ gcc/tree-vect-data-refs.c   2013-11-11 16:25:15.363311414 +0100
@@ -2982,7 +2982,7 @@ vect_analyze_data_refs (loop_vec_info lo
  gimple stmt = gsi_stmt (gsi);
  if (!find_data_references_in_stmt (loop, stmt, &datarefs))
{
- if (is_gimple_call (stmt) && loop->simdlen)
+ if (is_gimple_call (stmt) && loop->safelen)
{
  tree fndecl = gimple_call_fndecl (stmt), op;
  if (fndecl != NULL_TREE)
--- gcc/tree-vect-stmts.c.jj2013-11-08 17:12:49.0 +0100
+++ gcc/tree-vect-stmts.c   2013-11-11 16:26:43.254861184 +0100
@@ -2334,7 +2340,7 @@ vectorizable_simd_clone_call (gimple stm
   /* If the function isn't const, only allow it in simd loops where user
  has asserted that at least nunits consecutive iterations can be
  performed using SIMD instructions.  */
-  if ((loop == NULL || loop->simdlen < nunits) && gimple_vuse (stmt))
+  if ((loop == NULL || loop->safelen < nunits) && gimple_vuse (stmt))
 {
   arginfo.release ();
   return false;

Jakub

Re: [PATCH] decide edge's hotness when there is profile info

2013-11-11 Thread Teresa Johnson

On Mon, Nov 11, 2013 at 7:23 AM, Jan Hubicka  wrote:
>> 2013-11-08  Teresa Johnson  
>> Jan Hubicka  
>>
>> * predict.c (drop_profile): New function.
>> (handle_missing_profiles): Ditto.
>> (counts_to_freqs): Don't overwrite estimated frequencies
>> when function has no profile counts.
>> * predict.h (handle_missing_profiles): Declare.
>> * tree-inline.c (freqs_to_counts): New function.
>> (copy_cfg_body): Invoke freqs_to_counts as needed.
>> * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.
>>
>> Index: predict.c
>> ===
>> --- predict.c   (revision 204516)
>> +++ predict.c   (working copy)
>> @@ -2765,6 +2765,107 @@ estimate_loops (void)
>>BITMAP_FREE (tovisit);
>>  }
>>
>> +/* Drop the profile for NODE to guessed, and update its frequency based on
>> +   whether it is expected to be HOT.  */
>> +
>> +static void
>> +drop_profile (struct cgraph_node *node, bool hot)
>> +{
>> +  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>> +
>> +  if (dump_file)
>> +fprintf (dump_file,
>> + "Dropping 0 profile for %s/%i. %s based on calls.\n",
>> + cgraph_node_name (node), node->order,
>> + hot ? "Function is hot" : "Function is normal");
>> +  /* We only expect to miss profiles for functions that are reached
>> + via non-zero call edges in cases where the function may have
>> + been linked from another module or library (COMDATs and extern
>> + templates). See the comments below for handle_missing_profiles.  */
>> +  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl))
>> +warning (0,
>> + "Missing counts for called function %s/%i",
>> + cgraph_node_name (node), node->order);
>> +
>> +  profile_status_for_function (fn)
>> +  = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>> +  node->frequency
>> +  = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
>> +}
>> +
>> +/* In the case of COMDAT routines, multiple object files will contain the 
>> same
>> +   function and the linker will select one for the binary. In that case
>> +   all the other copies from the profile instrument binary will be missing
>> +   profile counts. Look for cases where this happened, due to non-zero
>> +   call counts going to 0-count functions, and drop the profile to guessed
>> +   so that we can use the estimated probabilities and avoid optimizing only
>> +   for size.
>> +
>> +   The other case where the profile may be missing is when the routine
>> +   is not going to be emitted to the object file, e.g. for "extern template"
>> +   class methods. Those will be marked DECL_EXTERNAL. Emit a warning in
>> +   all other cases of non-zero calls to 0-count functions.  */
>> +
>> +void
>> +handle_missing_profiles (void)
>> +{
>> +  struct cgraph_node *node;
>> +  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
>> +  vec worklist;
>> +  worklist.create (64);
>> +
>> +  /* See if 0 count function has non-0 count callers.  In this case we
>> + lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
>> +  FOR_EACH_DEFINED_FUNCTION (node)
>> +{
>> +  struct cgraph_edge *e;
>> +  gcov_type call_count = 0;
>> +  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>> +
>> +  if (node->count)
>> +continue;
>> +  for (e = node->callers; e; e = e->next_caller)
>> +call_count += e->count;
>> +  if (call_count
>> +  && fn && fn->cfg
>> +  && (call_count * unlikely_count_fraction >= profile_info->runs))
>> +{
>> +  bool maybe_hot = maybe_hot_count_p (NULL, call_count);
>> +
>> +  drop_profile (node, maybe_hot);
>> +  worklist.safe_push (node);
>> +}
>
> Perhaps we should add an note/error on mishandled profile when function is 
> not COMDAT?
> That way we may notice further bugs in this area.

I have a warning like that already in drop_profile(). Is that
sufficient? Also, Steven Bosscher suggested putting that warning under
OPT_Wdisabled_optimization instead of '0', what do you prefer for
that?

>> +}
>> +
>> +  /* Propagate the profile dropping to other 0-count COMDATs that are
>> + potentially called by COMDATs we already dropped the profile on.  */
>> +  while (worklist.length () > 0)
>> +{
>> +  struct cgraph_edge *e;
>> +
>> +  node = worklist.pop ();
>> +  for (e = node->callees; e; e = e->next_caller)
>> +{
>> +  struct cgraph_node *callee = e->callee;
>> +  struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
>> +
>> +  if (callee->count > 0)
>> +continue;
>> +  if (DECL_COMDAT (callee->decl) && fn && fn->cfg
>> +  && profile_status_for_function (fn) == PROFILE_READ)
>
> Perhaps we can check here maybe_hot_bb_p for the caller and rely on

Re: [RFA][PATCH] Isolate erroneous paths optimization

2013-11-11 Thread Jeff Law


On 11/11/13 05:16, Richard Biener wrote:

On Mon, Nov 11, 2013 at 4:11 AM, Jeff Law  wrote:

On 11/10/13 05:34, Eric Botcazou wrote:


But I think that you cannot transform

foo ()
{

*0 = 1;

}

to __builtin_trap as you can catch the trap via an exception handler
in a caller of foo, no?



That is true.  OK, I can see an argument that when using
-fnon-call-exceptions that kind of code should not be changed to call
__builtin_trap.



That's exactly the reason why gnat.dg/memtrap.adb started to fail after
Jeff's
patch went it.  So, if -fisolate-erroneous-paths isn't amended, we'll need
to
disable it in Ada like in Go.


In that case I think it would be fine to run the isolate paths
optimization, but to not omit the actual dereference of the NULL
pointer (possibly the dereference could be followed by a trap).



This would probably work for Ada as well.


OK.  It sounds like there's a pretty general consensus that we ought ot go
ahead and leave in a load/store of a NULL pointer.  I'll go ahead and run
with that.  I'll probably just emit SSA_NAME = *0, unless someone thinks we
ought ot distinguish between loads and stores, emitting SSA_NAME = *0 and *0
= 0 for the two cases respectively.


Can't you simply keep the trapping stmt as-is?
You eliminate less dead code if the trapping statement is a store.  You 
want all the RHS nonsense in that case to just "go away".


jeff

Re: [PATCH] make has_gate and has_execute useless

2013-11-11 Thread Trevor Saunders

On Mon, Nov 11, 2013 at 12:58:36PM +0100, Richard Biener wrote:
> On Mon, Nov 11, 2013 at 1:39 AM, Trevor Saunders  
> wrote:
> > On Fri, Nov 08, 2013 at 10:37:00AM +0100, Richard Biener wrote:
> >> On Thu, Nov 7, 2013 at 5:00 PM,   wrote:
> >> > From: Trevor Saunders 
> >> >
> >> > Hi,
> >> >
> >> >  This is the result of seeing what it would take to get rid of the 
> >> > has_gate and
> >> > has_execute flags on pass_data.  It turns out not much, but I wanted
> >> > confirmation this part is ok before I go deal with all the places that
> >> > initialize the fields.
> >> >
> >> > I bootstrapped this part on x86_64-unknown-linux-gnu and saw no new test 
> >> > suite
> >> > regressions (ignoring the silk stuff because the full paths in its test 
> >> > names
> >> > break my test script for now) Any reason this patch with the actual 
> >> > removal of the flags wouldn't be ok?
> >>
> >> The has_gate flag is easy to remove (without a TODO_ hack), right?
> >> No gate simply means that the gate returns always true.  The only
> >> weird thing is
> >>
> >>   /* If we have a gate, combine the properties that we could have with
> >>  and without the pass being examined.  */
> >>   if (pass->has_gate)
> >> properties &= new_properties;
> >>   else
> >> properties = new_properties;
> >>
> >> which I don't understand (and you just removed all properties handling 
> >> there).
> >>
> >> So can you split out removing has_gate?  This part is obviously ok.
> >>
> >> Then, for ->execute () I'd have refactored the code to make
> >> ->sub passes explicitely executed by the default ->execute ()
> >> implementation only.  That is, passes without overriding execute
> >> are containers only.  Can you quickly check whether that would
> >> work out?
> >
> > Ok, I've now given this a shot and wasn't terribly successful, if I just
> > change execute_pass_list and execute_ipa_pass_list to handle container
> > passes executing their sub passes I get the following ICE
> >
> > ./gt-passes.h:77:2: internal compiler error: Segmentation fault
> >  };
> >^
> >0xd43d96 crash_signal
> >/src/gcc/gcc/toplev.c:334
> >0xd901a9 ssa_default_def(function*, tree_node*)
> >/src/gcc/gcc/tree-dfa.c:318
> >0xb56d77 ipa_analyze_params_uses
> >/src/gcc/gcc/ipa-prop.c:2094
> >0xb57275 ipa_analyze_node(cgraph_node*)
> >/src/gcc/gcc/ipa-prop.c:2179
> >0x13e2b6d ipcp_generate_summary
> >/src/gcc/gcc/ipa-cp.c:3615
> >0xc55a2a
> >
> > execute_ipa_summary_passes(ipa_opt_pass_d*)
> >
> > /src/gcc/gcc/passes.c:1991
> >0x943341 ipa_passes
> >
> > /src/gcc/gcc/cgraphunit.c:2011
> >0x943675
> >compile()
> >
> > /src/gcc/gcc/cgraphunit.c:2118
> >
> >now
> >  Which is because fn->gimple_df is null.  I expect that's because pass
> >  ordering changed somehow, but I'm not sure off hand how or ifthat's
> >  something worth figuring out right now.
> 
> Yeah, some of the walking doesn't seem to work ... probably a
> pass with sub-passes already has an execute member function ;)

Sounds like a reasonable guess.

> >  Another issue I realized is that doing this will change the order of
> >  plugin events from
> >  start container pass a
> >  end container pass a
> >  start contained pass b
> >  end contained pass b
> >  ...
> >
> >  to
> >
> >  start container pass a
> >  start contained pass b
> >  end contained pass b
> >  end container pass a
> >
> >  Arguably that's an improvement, but I'm not sure if changing that APi
> >  like that is acceptable.
> 
> Indeed an improvement.  Can you attach your patch?

ok, done

thanks

Trev

> 
> Thanks,
> Richard.
> 
> >  Trev
> >
> >>
> >> Thanks,
> >> Richard.
> >>
> >> > Trev
> >> >
> >> > 2013-11-06  Trevor Saunders  
> >> >
> >> > * pass_manager.h (pass_manager): Adjust.
> >> > * passes.c (opt_pass::execute): Tell the pass manager it doesn't 
> >> > need
> >> > to do anything for this pass.
> >> > (pass_manager::register_dump_files_1): Don't uselessly deal with
> >> > properties of passes.
> >> > (pass_manager::register_dump_files): Adjust.
> >> > (dump_one_pass): Just call pass->gate ().
> >> > (execute_ipa_summary_passes): Likewise.
> >> > (execute_one_pass): Don't check pass->has_exec

Re: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals

2013-11-11 Thread Dodji Seketeli

Jakub Jelinek  writes:

>> -OBJS-libcommon = diagnostic.o diagnostic-color.o pretty-print.o intl.o 
>> input.o version.o
>> +OBJS-libcommon = diagnostic.o diagnostic-color.o pretty-print.o intl.o 
>> vec.o input.o version.o
>
> Too long line?

Fixed in my local copy of the patch, thanks.

>
>> +  if (c == '\0')
>> +c = ' ';
>>pp_character (context->printer, c);
>
> Does that match how libcpp counts the embedded '\0' character in column
> computation?

Yes, I think so.  _cpp_lex_direct in libcpp/lex.c considers '\0' just
like a white space and so the column number is incremented when it's
encountered.

>> +/* The position (byte count) the the last byte of the line.  This
>> +   normally points to the '\n' character, or to one byte after the
>> +   last byte of the file, if the file doesn't contain a '\n'
>> +   character.  */
>> +size_t end_pos;
>
> Does it really help to note this?  You can always just walk the line from
> start_pos looking for '\n' or end of file.

Yes you are right, it's not strictly necessary.  But with that end_pos,
copying a line is even faster; no need of walking.  I thought the goal
was to avoid re-doing the work we've already done, as much as possible.

>
>> +static fcache*
>> +add_file_to_cache_tab (const char *file_path)
>> +{
>> +  static size_t idx;
>> +  fcache *r;
>> +
>> +  FILE *fp = fopen (file_path, "r");
>> +  if (ferror (fp))
>> +{
>> +  fclose (fp);
>> +  return NULL;
>> +}
>> +
>> +  r = &fcache_tab[idx];
>
> Wouldn't it be better to use some LRU algorithm to determine which
> file to kick out of the cache?  Have some tick counter of cache lookups (or
> creation) and assign the tick counter to the just created resp. used
> cache entry, and remove the one with the smallest counter?

Hehe, the LRU idea occurred to me too, but I dismissed the idea as
something probably over-engineered.  But now that you are mentioning it
I guess I should give it a try ;-) I'll post a patch about that later
then.

>> +  fcache * r = lookup_file_in_cache_tab (file_path);
>> +  if (r ==  NULL)
>
> Formatting (no space after *, extra space after ==).

Fixed in my local copy.  Thanks.

-- 
Dodji

Re: [PATCH] Fix asan regtest failures c-c++-common/asan/{memcmp-1.c,strncpy-overflow-1.c}

2013-11-11 Thread Dodji Seketeli

Jakub Jelinek  writes:

> On Mon, Nov 11, 2013 at 03:01:53PM +0100, Dodji Seketeli wrote:
>> Since a couple of days I am seeing failure on the tests above on my
>> Fedora system.  The errors look like:
>> 
>> FAIL: c-c++-common/asan/memcmp-1.c  -O0  output pattern test, is 
>> =
>> ==21832==ERROR: AddressSanitizer: stack-buffer-overflow on address
>> 0x7fff75df96f4 at pc 0x7f98ecbab68b bp 0x7fff75df96b0 sp
>> 0x7fff75df95b8
>> READ of size 6 at 0x7fff75df96f4 thread T0
>> #0 0x7f98ecbab68a in __interceptor_memcmp 
>> /home/dodji/git/gcc/master/libsanitizer/asan/asan_interceptors.cc:295 
>> (discriminator 7)
>> #1 0x7f98ecbb6393 in __asan_report_error 
>> /home/dodji/git/gcc/master/libsanitizer/asan/asan_report.cc:774 
>> (discriminator 9)
>> #2 0x7f98ecbab6d0 in __interceptor_memcmp 
>> /home/dodji/git/gcc/master/libsanitizer/asan/asan_interceptors.cc:295 
>> (discriminator 7)
>> #3 0x400b0a in main 
>> /home/dodji/git/gcc/master/gcc/testsuite/c-c++-common/asan/memcmp-1.c:14
>
> That looks like a bug in libasan, __asan_report_error doesn't call memcmp
> again.  So something is wrong with getting proper backtrace it seems.

Correct.  I'll need to get familiar with the backtracing part then,
unless someone beats me to it.

-- 
Dodji

Re: [PATCH] decide edge's hotness when there is profile info

2013-11-11 Thread Jan Hubicka

> I have a warning like that already in drop_profile(). Is that

I think it should be warning (or silent) for COMDAT and error/note for
other functions (depending on flag_profile_correction).
I guess drop_profile is better place for it indeed.

> sufficient? Also, Steven Bosscher suggested putting that warning under
> OPT_Wdisabled_optimization instead of '0', what do you prefer for
> that?

It is a bit different case than other disabled optimizations we have (where
the optimization does not happen because of --param tunable limits), but I
am fine with both alternatives.
The warning may end up quite noisy so having way to silence it definitely
makes sense.
> 
> >> +}
> >> +
> >> +  /* Propagate the profile dropping to other 0-count COMDATs that are
> >> + potentially called by COMDATs we already dropped the profile on.  */
> >> +  while (worklist.length () > 0)
> >> +{
> >> +  struct cgraph_edge *e;
> >> +
> >> +  node = worklist.pop ();
> >> +  for (e = node->callees; e; e = e->next_caller)
> >> +{
> >> +  struct cgraph_node *callee = e->callee;
> >> +  struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
> >> +
> >> +  if (callee->count > 0)
> >> +continue;
> >> +  if (DECL_COMDAT (callee->decl) && fn && fn->cfg
> >> +  && profile_status_for_function (fn) == PROFILE_READ)
> >
> > Perhaps we can check here maybe_hot_bb_p for the caller and rely on profile 
> > estimate
> > to give us false only for really known to be unlikely paths? (i.e. EH 
> > handling, noreturns
> > etc.)
> 
> Ok, let me try this.
> 
> >> +{
> >> +  /* Since there are no non-0 call counts to this function,
> >> + we don't know for sure whether it is hot. Indicate to
> >> + the drop_profile routine that function should be marked
> >> + normal, rather than hot.  */
> >> +  drop_profile (node, false);
> >> +  worklist.safe_push (callee);
> >> +}
> >> +}
> >> +}
> >> +  worklist.release ();
> >
> > I would add a pointer set so we avoid duplicates in worklist.  This is 
> > potentially quadratic
> > for large programs.
> 
> I'll add a visited_nodes set to keep track of processed nodes so they
> don't get added to the worklist multiple times.

Perhaps you can also track this by testing profile_status_for_function. Both 
solutions are fine for me.

Honza
> 
> Thanks,
> Teresa
> 
> >
> > OK, with these changes.
> >
> > Honza
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: [RFA][PATCH] Isolate erroneous paths optimization

2013-11-11 Thread Jeff Law


On 11/11/13 02:33, Eric Botcazou wrote:

However, that brings up an couple interesting questions.

Let's say we find a NULL pointer which reaches a return statement in a
function which is marked as returns_nonnull.  In that case there is no
dereference.  Presumably for that kind of scenario we'll just keep the
builtin trap.

Similarly, assume we extend this pass to detect out-of-bounds array
indexing.  It's fairly simple to do and has always been in my plan.  In
that case leaving in the array indexing  won't necessarily generate a
fault.  For those presumably we'll just want the builtin_trap as well?

Again, I don't mind inserting a *0, I just want to have a plan for the
other undefined behaviours we currently detect and those which I plan on
catching soon.


The more general problem is that, with -fnon-call-exceptions, we really expect
a fully-fledged exception to be raised when something goes wrong.  Inserting
__builtin_trap doesn't work because it's simply not equivalent to a throw.  In
other words, if __builtin_throw would be inserted instead of __builtin_trap
with -fnon-call-exceptions, things would probably be acceptable as-is.
Hmm, maybe that's a better soultion then.  When non-call-exceptions is 
active, throw rather than trap.


Seems fairly reasonable.

Jeff

Re: [RFA][PATCH] Isolate erroneous paths optimization

2013-11-11 Thread Jakub Jelinek

On Mon, Nov 11, 2013 at 09:24:27AM -0700, Jeff Law wrote:
> On 11/11/13 02:33, Eric Botcazou wrote:
> >>However, that brings up an couple interesting questions.
> >>
> >>Let's say we find a NULL pointer which reaches a return statement in a
> >>function which is marked as returns_nonnull.  In that case there is no
> >>dereference.  Presumably for that kind of scenario we'll just keep the
> >>builtin trap.
> >>
> >>Similarly, assume we extend this pass to detect out-of-bounds array
> >>indexing.  It's fairly simple to do and has always been in my plan.  In
> >>that case leaving in the array indexing  won't necessarily generate a
> >>fault.  For those presumably we'll just want the builtin_trap as well?
> >>
> >>Again, I don't mind inserting a *0, I just want to have a plan for the
> >>other undefined behaviours we currently detect and those which I plan on
> >>catching soon.
> >
> >The more general problem is that, with -fnon-call-exceptions, we really 
> >expect
> >a fully-fledged exception to be raised when something goes wrong.  Inserting
> >__builtin_trap doesn't work because it's simply not equivalent to a throw.  
> >In
> >other words, if __builtin_throw would be inserted instead of __builtin_trap
> >with -fnon-call-exceptions, things would probably be acceptable as-is.
> Hmm, maybe that's a better soultion then.  When non-call-exceptions
> is active, throw rather than trap.

But throw what?  It is up to the runtimes of -fnon-call-exceptions languages
to decide if they actually want to throw or do something else in the signal
handlers, and what exactly to throw.

Jakub

Re: [RFA][PATCH] Isolate erroneous paths optimization

2013-11-11 Thread Ian Lance Taylor

On Mon, Nov 11, 2013 at 8:27 AM, Jakub Jelinek  wrote:
> On Mon, Nov 11, 2013 at 09:24:27AM -0700, Jeff Law wrote:
>> On 11/11/13 02:33, Eric Botcazou wrote:
>> >>However, that brings up an couple interesting questions.
>> >>
>> >>Let's say we find a NULL pointer which reaches a return statement in a
>> >>function which is marked as returns_nonnull.  In that case there is no
>> >>dereference.  Presumably for that kind of scenario we'll just keep the
>> >>builtin trap.
>> >>
>> >>Similarly, assume we extend this pass to detect out-of-bounds array
>> >>indexing.  It's fairly simple to do and has always been in my plan.  In
>> >>that case leaving in the array indexing  won't necessarily generate a
>> >>fault.  For those presumably we'll just want the builtin_trap as well?
>> >>
>> >>Again, I don't mind inserting a *0, I just want to have a plan for the
>> >>other undefined behaviours we currently detect and those which I plan on
>> >>catching soon.
>> >
>> >The more general problem is that, with -fnon-call-exceptions, we really 
>> >expect
>> >a fully-fledged exception to be raised when something goes wrong.  Inserting
>> >__builtin_trap doesn't work because it's simply not equivalent to a throw.  
>> >In
>> >other words, if __builtin_throw would be inserted instead of __builtin_trap
>> >with -fnon-call-exceptions, things would probably be acceptable as-is.
>> Hmm, maybe that's a better soultion then.  When non-call-exceptions
>> is active, throw rather than trap.
>
> But throw what?  It is up to the runtimes of -fnon-call-exceptions languages
> to decide if they actually want to throw or do something else in the signal
> handlers, and what exactly to throw.

Yes.  At least for Go it would work fine to call a language defined
function for a nil pointer dereference (in fact I already have one you
could call), but then LTO might be an issue.

Ian

Re: PING Cilk Plus failures on non-LTO targets

2013-11-11 Thread David Edelsohn

On Mon, Nov 4, 2013 at 11:06 AM, David Edelsohn  wrote:
> Balaji,
>
> I am seeing a large number of libcilkrts failures on AIX.  These all
> are of the form:
>
> Executing on host: /tmp/20131103/gcc/xgcc -B/tmp/20131103/gcc/ 
> /nasfarm/edelsohn
> /src/src/gcc/testsuite/c-c++-common/cilk-plus/CK/sync_wo_spawn.c  
> -fno-diagnosti
> cs-show-caret -fdiagnostics-color=never   -O0 -flto -g -fcilkplus 
> -L/tmp/2013110
> 3/powerpc-ibm-aix7.1.0.0/./libcilkrts/.libs -fcilkplus -S  -o sync_wo_spawn.s
>  (timeout = 300)
> cc1: error: LTO support has not been enabled in this configuration
> compiler exited with status 1
>
> If the feature / functionality requires LTO, the tests should check
> that LTO is supported with check_effective_target_lto, either in each
> relevant test or in cilk-plus.exp.

Balaji,

I continue to see extra Cilk Plus failures on AIX.  Tests that require
LTO should check that the target supports LTO.

Thanks, David

Re: [patch] Add CONSTRUCTOR_NO_CLEARING flag

2013-11-11 Thread Eric Botcazou

> Ok.  Can you update doc/generic.texi?

Thanks, done (it was still talking about TREE_LIST).

-- 
Eric Botcazou

Re: [PATCH] decide edge's hotness when there is profile info

2013-11-11 Thread Teresa Johnson

On Mon, Nov 11, 2013 at 8:22 AM, Jan Hubicka  wrote:
>> I have a warning like that already in drop_profile(). Is that
>
> I think it should be warning (or silent) for COMDAT and error/note for
> other functions (depending on flag_profile_correction).
> I guess drop_profile is better place for it indeed.

I don't want to warn for COMDAT since this is currently an expected
issue in that case, so I'm afraid it will be too noisy (but there is a
note emitted to the dump file to track how often this occurs). So
currently the warning is only emitted in cases where we don't expect
it to occur (e.g. non-comdat).

>
>> sufficient? Also, Steven Bosscher suggested putting that warning under
>> OPT_Wdisabled_optimization instead of '0', what do you prefer for
>> that?
>
> It is a bit different case than other disabled optimizations we have (where
> the optimization does not happen because of --param tunable limits), but I
> am fine with both alternatives.
> The warning may end up quite noisy so having way to silence it definitely
> makes sense.

I didn't find any warnings being emitted when running this for spec or
internal benchmarks, so how about instead of a warning, I handle it
like you suggest above: depending on the setting of
flag_profile_correction either error or emit a note to the dump file
under MSG_MISSED_OPTIMIZATION?

>>
>> >> +}
>> >> +
>> >> +  /* Propagate the profile dropping to other 0-count COMDATs that are
>> >> + potentially called by COMDATs we already dropped the profile on.  */
>> >> +  while (worklist.length () > 0)
>> >> +{
>> >> +  struct cgraph_edge *e;
>> >> +
>> >> +  node = worklist.pop ();
>> >> +  for (e = node->callees; e; e = e->next_caller)
>> >> +{
>> >> +  struct cgraph_node *callee = e->callee;
>> >> +  struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
>> >> +
>> >> +  if (callee->count > 0)
>> >> +continue;
>> >> +  if (DECL_COMDAT (callee->decl) && fn && fn->cfg
>> >> +  && profile_status_for_function (fn) == PROFILE_READ)
>> >
>> > Perhaps we can check here maybe_hot_bb_p for the caller and rely on 
>> > profile estimate
>> > to give us false only for really known to be unlikely paths? (i.e. EH 
>> > handling, noreturns
>> > etc.)
>>
>> Ok, let me try this.
>>
>> >> +{
>> >> +  /* Since there are no non-0 call counts to this function,
>> >> + we don't know for sure whether it is hot. Indicate to
>> >> + the drop_profile routine that function should be marked
>> >> + normal, rather than hot.  */
>> >> +  drop_profile (node, false);
>> >> +  worklist.safe_push (callee);
>> >> +}
>> >> +}
>> >> +}
>> >> +  worklist.release ();
>> >
>> > I would add a pointer set so we avoid duplicates in worklist.  This is 
>> > potentially quadratic
>> > for large programs.
>>
>> I'll add a visited_nodes set to keep track of processed nodes so they
>> don't get added to the worklist multiple times.
>
> Perhaps you can also track this by testing profile_status_for_function. Both 
> solutions are fine for me.

Ah - I just realized I am already checking profile_status_for_function
above and adding to the worklist if it is still PROFILE_READ. Since I
call drop_profile before adding to the worklist, we can't end up
adding multiple times. So I don't think there is currently an issue
with this.

Thanks,
Teresa

>
> Honza
>>
>> Thanks,
>> Teresa
>>
>> >
>> > OK, with these changes.
>> >
>> > Honza
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: [patch] Fix PR ada/35998

2013-11-11 Thread Eric Botcazou

> Due to the different interfaces of int_size_in_bytes and
> simple_type_size_in_bits (and 'size' in add_byte_size_attribute being
> unsigned and not [unsigned] HWI) it would be cleaner to
> add an early return after the call to int_size_in_bytes if its
> return value is -1 (and make sure the return value doesn't
> overflow an unsigned int - likewise for simple_type_size_in_bits,
> not sure why that case doesn't use int_size_in_bytes as well ...)?

Both calls are present in the first version of the function, but I agree that 
the discrepancy looks strange.

Revised version attached, tested {GCC,GDB} on x86_64-suse-linux.


2013-11-11  Eric Botcazou  

PR ada/35998
* dwarf2out.c (add_byte_size_attribute): Use int_size_in_bytes also
for fields.  Do not add the attribute if the size is negative.


-- 
Eric BotcazouIndex: dwarf2out.c
===
--- dwarf2out.c	(revision 20)
+++ dwarf2out.c	(working copy)
@@ -16319,11 +16319,13 @@ add_subscript_info (dw_die_ref type_die,
 }
 }
 
+/* Add a DW_AT_byte_size attribute to DIE with TREE_NODE's size.  */
+
 static void
 add_byte_size_attribute (dw_die_ref die, tree tree_node)
 {
   dw_die_ref decl_die;
-  unsigned size;
+  HOST_WIDE_INT size;
 
   switch (TREE_CODE (tree_node))
 {
@@ -16347,7 +16349,7 @@ add_byte_size_attribute (dw_die_ref die,
 	 generally given as the number of bytes normally allocated for an
 	 object of the *declared* type of the member itself.  This is true
 	 even for bit-fields.  */
-  size = simple_type_size_in_bits (field_type (tree_node)) / BITS_PER_UNIT;
+  size = int_size_in_bytes (field_type (tree_node));
   break;
 default:
   gcc_unreachable ();
@@ -16356,8 +16358,9 @@ add_byte_size_attribute (dw_die_ref die,
   /* Note that `size' might be -1 when we get to this point.  If it is, that
  indicates that the byte size of the entity in question is variable.  We
  have no good way of expressing this fact in Dwarf at the present time,
- so just let the -1 pass on through.  */
-  add_AT_unsigned (die, DW_AT_byte_size, size);
+ when location description was not used by the caller code instead.  */
+  if (size >= 0)
+add_AT_unsigned (die, DW_AT_byte_size, size);
 }
 
 /* For a FIELD_DECL node which represents a bit-field, output an attribute

Re: [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-11-11 Thread Bill Schmidt

Hi Yufeng,

The idea is a good one but I don't like your implementation of adding an
extra expression parameter to look at on the find_basis_for_candidate
lookup.  This goes against the design of the pass and may not be
sufficiently general (might there be situations where a third possible
basis could exist?).

The overall design is set up to have alternate interpretations of
candidates in the candidate table to handle this sort of ambiguity.  The
goal for your example is create a second candidate (chained to the first
one by way of the next_interp field) so that the candidate table looks
like this:

   8  [2] *_10[j_7(D)] = 2;
  REF  : _10 + ((sizetype) j_7(D) * 4) + 0 : int[20] *
  basis: 0  dependent: 0  sibling: 0
  next-interp: 9  dead-savings: 0

   9  [2] *_10[j_7(D)] = 2;
  REF  : _5 + ((sizetype) j_7(D) * 4) + 800 : int[20] *
  basis: 5  dependent: 0  sibling: 0
  next-interp: 0  dead-savings: 0

This will in turn allow subsequent candidates to be seen in terms of
either _5 or _10, which may be necessary to avoid missed opportunities.
There may be a subsequent REF _15 +... that can be an affine expression
of either of these, for example.

If you fail to find a basis for a candidate with its first
interpretation, you can then follow the next-interp chain to look for a
basis for the next one, without the messy passing of extra possibilities
to the find-basis routine.

I haven't read the patch in detail, but I think this should give you
enough to work with to re-design the idea to fit better with the
existing framework.  Please let me know if you need more information, or
if you feel I've misunderstood something.

Thanks,
Bill

On Mon, 2013-11-04 at 18:41 +, Yufeng Zhang wrote:
> Hi,
> 
> This patch extends the slsr pass to optionally use an alternative base 
> expression in finding basis for CAND_REFs.  Currently the pass uses 
> hash-based algorithm to match the base_expr in a candidate.  Given a 
> test case like the following, slsr will not be able to recognize the two 
> CAND_REFs have the same basis, as their base_expr are of different 
> SSA_NAMEs:
> 
> typedef int arr_2[20][20];
> 
> void foo (arr_2 a2, int i, int j)
> {
>a2[i][j] = 1;
>a2[i + 10][j] = 2;
> }
> 
> The gimple dump before slsr is like the following (using an 
> arm-none-eabi gcc):
> 
>i.0_2 = (unsigned int) i_1(D);
>_3 = i.0_2 * 80;
>_5 = a2_4(D) + _3;
>*_5[j_7(D)] = 1;  <
>_9 = _3 + 800;
>_10 = a2_4(D) + _9;
>*_10[j_7(D)] = 2; <
> 
> Here are the dumps for the two CAND_REFs generated for the two 
> statements pointed by the arrows:
> 
> 
>4  [2] _5 = a2_4(D) + _3;
>   ADD  : a2_4(D) + (80 * i_1(D)) : int[20] *
>   basis: 0  dependent: 0  sibling: 0
>   next-interp: 0  dead-savings: 0
> 
>8  [2] *_10[j_7(D)] = 2;
>   REF  : _10 + ((sizetype) j_7(D) * 4) + 0 : int[20] *
>   basis: 5  dependent: 0  sibling: 0
>   next-interp: 0  dead-savings: 0
> 
> As mentioned previously, slsr cannot establish that candidate 4 is the 
> basis for the candidate 8, as they have different base_exprs: a2_4(D) 
> and _10, respectively.  However, the two references actually only differ 
> by an immediate offset (800).
> 
> This patch uses the tree affine combination facilities to create an 
> optional alternative base expression to be used in finding (as well as 
> recording) the basis.  It calls tree_to_aff_combination_expand on 
> base_expr, reset the offset field of the generated aff_tree to 0 and 
> generate a tree from it by calling aff_combination_to_tree.
> 
> The new tree is recorded as a potential basis, and when 
> find_basis_for_candidate fails to find a basis for a CAND_REF in its 
> normal approach, it searches again using a tree expanded in such way. 
> Such an expanded tree usually discloses the expression behind an 
> SSA_NAME.  In the example above, instead of seeing the strength 
> reduction candidate chains like this:
> 
>_5 -> 5
>_10 -> 8
> 
> we are now having:
> 
>_5 -> 5
>_10 -> 8
>a2_4(D) + (sizetype) i_1(D) * 80 -> 5 -> 8
> 
> With the candidates 5 and 8 linked to the same tree expression (a2_4(D) 
> + (sizetype) i_1(D) * 80), slsr is now able to establish that 5 is the 
> basis of 8.
> 
> The patch doesn't attempt to change the content of any CAND_REF though. 
>   It only enables CAND_REFs which (1) have the same stride and (2) have 
> the underlying expressions of their base_expr only differ in immediate 
> offsets,  to be recognized to have the same basis.  The statements with 
> such CAND_REFs will be lowered to MEM_REFs, and later on the RTL 
> expander shall be able to fold and re-associate the immediate offsets to 
> the rightmost side of the addressing expression, and therefore exposes 
> the common sub-expression successfully.
> 
> The code-gen difference of the example code on arm with -O2 
> -mcpu=cortex-15 is:
> 
>  mov r3, r1, asl #6
> -   add ip, r0, r2, asl #2
>

1 2 >

1 - 100 of 163 matches

Mail list logo