date:20120903

Re: [PATCH] Improve VPR for some builtins and non pointer checks

2012-09-03 Thread Jakub Jelinek

On Sun, Sep 02, 2012 at 10:18:15PM -0700, Andrew Pinski wrote:
>   While fixing some code not to have aliasing violations in it, I came
> across some builtins that were not causing their arguments or their
> return values to be marked as non-null.  This patch implements just
> that in VRP, which allows some null pointer checks to be removed later
> on.
> 
> OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> @@ -1057,6 +1057,20 @@ vrp_stmt_computes_nonzero (gimple stmt,
>   }
>  }
>  
> +  /* With some builtins, we can infer if the pointer return value
> + will be non null.  */
> +  if (flag_delete_null_pointer_checks
> +  && is_gimple_call (stmt) && gimple_call_fndecl (stmt)
> +  && DECL_BUILT_IN_CLASS (gimple_call_fndecl (stmt)) == BUILT_IN_NORMAL)
> +{
> +  switch (DECL_FUNCTION_CODE (gimple_call_fndecl (stmt)))
> + {
> +   case BUILT_IN_MEMCPY:
> +   case BUILT_IN_MEMMOVE:
> + return true;
> + }
> +}
> +
>return false;
>  }
>  

That is too hackish and lists way too few builtins.
If you rely on nonnull attribute marked builtins, I'd say you want
flags = gimple_call_return_flags (stmt);
if ((flags & ERF_RETURNS_ARG)
&& (flags & ERF_RETURN_ARG_MASK) < gimple_call_num_args (stmt))
  {
/* Test nonnull attribute on the decl, either argument-less or
   on the (flags & ERF_RETURN_ARG_MASK)th argument.  */
  }
Or at least handle builtins e.g. CCP handles as pass-thru arg1:
BUILT_IN_MEMCPY, BUILT_IN_MEMMOVE, BUILT_IN_MEMSET, BUILT_IN_STRCPY,
BUILT_IN_STRNCPY, BUILT_IN_MEMCPY_CHK, BUILT_IN_MEMMOVE_CHK,
BUILT_IN_MEMSET_CHK, BUILT_IN_STRCPY_CHK, BUILT_IN_STRNCPY_CHK
(which reminds me that some of these apparently aren't marked
with ATTR_RET1_NOTHROW_NONNULL_LEAF, why?).

> @@ -4231,6 +4245,32 @@ infer_value_range (gimple stmt, tree op,
>   }
>  }
>  
> +  /* With some builtins, we can infer if the pointer argument
> + will be non null.  */
> +  if (flag_delete_null_pointer_checks
> +  && is_gimple_call (stmt) && gimple_call_fndecl (stmt))
> +{
> +  tree callee = gimple_call_fndecl (stmt);
> +  if (DECL_BUILT_IN_CLASS (callee) == BUILT_IN_NORMAL)
> + {
> +   switch (DECL_FUNCTION_CODE (callee))
> + {
> +   case BUILT_IN_MEMCPY:
> +   case BUILT_IN_MEMMOVE:
> +   case BUILT_IN_STRCMP:
> +   case BUILT_IN_MEMCMP:
> + /* The first and second arguments of memcpy and memmove will be 
> non null after the call. */
> + if (gimple_call_arg (stmt, 0) == op
> + || gimple_call_arg (stmt, 1) == op)
> +   {
> + *val_p = build_int_cst (TREE_TYPE (op), 0);
> + *comp_code_p = NE_EXPR;
> + return true;
> +   }
> + }
> + }
> +}

Again, what are you looking for here?  Passing pointers to nonnull
attributes, or something more specific?  What exactly?  There are tons of
builtins that behave similarly to memcpy/memmove/strcmp/memcmp.

Jakub
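
The flag test suggested above can be modelled outside the compiler.  The following is only a sketch: the two macro values, struct call_model and call_returns_nonnull_arg are stand-ins invented for illustration, not GCC's own definitions, and the per-argument nonnull information is assumed to be already available as a plain array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in values modelled on the flag names quoted above; the real
   definitions live in the GCC sources and may differ.  */
#define ERF_RETURN_ARG_MASK 3u
#define ERF_RETURNS_ARG (1u << 2)

/* Hypothetical, simplified description of a call statement.  */
struct call_model
{
  unsigned return_flags;      /* models gimple_call_return_flags ()  */
  unsigned num_args;          /* models gimple_call_num_args ()  */
  const bool *arg_is_nonnull; /* assumed per-argument nonnull info  */
};

/* Return true if the call returns one of its own arguments and that
   argument is known to be non-null.  */
static bool
call_returns_nonnull_arg (const struct call_model *call)
{
  assert (call != NULL);

  /* The call must be flagged as returning one of its arguments.  */
  if (!(call->return_flags & ERF_RETURNS_ARG))
    return false;

  /* The low bits select which argument is returned; reject indexes
     that do not name an actual argument of this call.  */
  unsigned argnum = call->return_flags & ERF_RETURN_ARG_MASK;
  if (argnum >= call->num_args)
    return false;

  return call->arg_is_nonnull[argnum];
}
```

A memcpy-like call would be described with return_flags set to ERF_RETURNS_ARG (argument 0) and a non-null first argument, in which case the helper answers true without any per-builtin switch.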

Re: [PATCH 4/6] Thread pointer built-in functions, s390

2012-09-03 Thread Andreas Krebbel

On 28/08/12 10:14, Chung-Lin Tang wrote:
> On 12/7/18 8:05 PM, Andreas Krebbel wrote:
>> On 07/12/2012 08:52 AM, Chung-Lin Tang wrote:
>>> * config/s390/s390.c (s390_builtin,code_for_builtin_64,
>>> code_for_builtin_31,s390_init_builtins,s390_expand_builtin):
>>> Remove.
>>> (s390_expand_builtin_thread_pointer): Add hook function for
>>> TARGET_EXPAND_BUILTIN_THREAD_POINTER.
>>> (s390_expand_builtin_set_thread_pointer): Add hook function for
>>> TARGET_EXPAND_BUILTIN_SET_THREAD_POINTER.
>>
>> I've tested your patches on s390x. No regressions.
>>
>> The patch is ok.
>>
>> Bye,
>>
>> -Andreas-
>>
> 
> S390 parts updated to use MD pattern. Sorry Andreas, would you mind
> testing the updated patches again?

Retested on s390 and s390x. No regressions.

Ok to apply.

Thanks!

Bye,

-Andreas-

Re: [PATCH] Improve VPR for some builtins and non pointer checks

2012-09-03 Thread Andrew Pinski

On Sun, Sep 2, 2012 at 11:36 PM, Georg-Johann Lay  wrote:
> Andrew Pinski schrieb:
>>
>> Hi,
>>   While fixing some code not to have aliasing violations in it, I came
>> across some builtins that were not causing their arguments or their
>> return values to be marked as non-null.  This patch implements just
>> that in VRP, which allows some null pointer checks to be removed later
>> on.
>>
>> OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>>
>> Thanks,
>> Andrew Pinski
>>
>> ChangeLog:
>> * tree-vrp.c (vrp_stmt_computes_nonzero): Return true for some
>> builtins (memcpy and memmove).
>> (infer_value_range): Infer nonzero for some arguments to
>> some builtins (memcpy, memmove, strcmp and memcmp).
>>
>> testsuite/ChangeLog:
>> * gcc.dg/tree-ssa/vrp-builtins1.c: New testcase.
>>
>>
>>
>> Index: tree-vrp.c
>> ===
>> --- tree-vrp.c  (revision 190868)
>> +++ tree-vrp.c  (working copy)
>> @@ -1057,6 +1057,20 @@ vrp_stmt_computes_nonzero (gimple stmt,
>> }
>>  }
>>  +  /* With some builtins, we can infer if the pointer return value
>> + will be non null.  */
>> +  if (flag_delete_null_pointer_checks
>> +  && is_gimple_call (stmt) && gimple_call_fndecl (stmt)
>> +  && DECL_BUILT_IN_CLASS (gimple_call_fndecl (stmt)) ==
>> BUILT_IN_NORMAL)
>> +{
>> +  switch (DECL_FUNCTION_CODE (gimple_call_fndecl (stmt)))
>> +   {
>> + case BUILT_IN_MEMCPY:
>> + case BUILT_IN_MEMMOVE:
>> +   return true;
>> +   }
>> +}
>> +
>>return false;
>>  }
>>  @@ -4231,6 +4245,32 @@ infer_value_range (gimple stmt, tree op,
>> }
>>  }
>>  +  /* With some builtins, we can infer if the pointer argument
>> + will be non null.  */
>> +  if (flag_delete_null_pointer_checks
>> +  && is_gimple_call (stmt) && gimple_call_fndecl (stmt))
>> +{
>> +  tree callee = gimple_call_fndecl (stmt);
>> +  if (DECL_BUILT_IN_CLASS (callee) == BUILT_IN_NORMAL)
>> +   {
>> + switch (DECL_FUNCTION_CODE (callee))
>> +   {
>> + case BUILT_IN_MEMCPY:
>> + case BUILT_IN_MEMMOVE:
>> + case BUILT_IN_STRCMP:
>> + case BUILT_IN_MEMCMP:
>> +   /* The first and second arguments of memcpy and memmove
>> will be non null after the call. */
>> +   if (gimple_call_arg (stmt, 0) == op
>> +   || gimple_call_arg (stmt, 1) == op)
>> + {
>> +   *val_p = build_int_cst (TREE_TYPE (op), 0);
>> +   *comp_code_p = NE_EXPR;
>> +   return true;
>> + }
>> +   }
>> +   }
>> +}
>> +
>>return false;
>>  }
>>  Index: testsuite/gcc.dg/tree-ssa/vrp-builtins1.c
>> ===
>> --- testsuite/gcc.dg/tree-ssa/vrp-builtins1.c   (revision 0)
>> +++ testsuite/gcc.dg/tree-ssa/vrp-builtins1.c   (revision 0)
>> @@ -0,0 +1,30 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-vrp1" } */
>
>
> Shouldn't this xfail for avr and other targets that set
> flag_delete_null_pointer_checks = 0, cf.
> avr.c:avr_option_override() for example.

This is taken care of by the check keeps_null_pointer_checks which
returns true for avr-*-*.

Thanks,
Andrew Pinski


>
> Johann
>
>> +
>> +struct f1
>> +{
>> +char a[4];
>> +};
>> +
>> +int f(int *a, struct f1 b)
>> +{
>> +  int *c = __builtin_memcpy(a, b.a, 4);
>> +  if (c == 0)
>> +return 0;
>> +  return *a;
>> +}
>> +
>> +
>> +int f1(int *a, struct f1 b)
>> +{
>> +  int *c = __builtin_memcpy(a, b.a, 4);
>> +  if (a == 0)
>> +return 0;
>> +  return *a;
>> +}
>> +
>> +/* Both the if statements should be folded when the target does not keep
>> around null pointer checks. */
>> +/* { dg-final { scan-tree-dump-times "Folding predicate" 0 "vrp1" {
>> target {   keeps_null_pointer_checks } } } } */
>> +/* { dg-final { scan-tree-dump-times "Folding predicate" 2 "vrp1" {
>> target { ! keeps_null_pointer_checks } } } } */
>> +/* { dg-final { cleanup-tree-dump "vrp1" } } */
>> +
>
>
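
The property the testcase relies on is that memcpy returns its destination argument, so a null check on the result is equivalent to a null check on the destination.  A minimal sketch of that property in plain C follows; copy4 is a hypothetical helper, and a byte buffer is used as the destination so that no assumption about sizeof (int) is needed.

```c
#include <assert.h>
#include <string.h>

struct f1
{
  char a[4];
};

/* Copy four bytes from SRC to DST and hand back whatever memcpy
   returned.  Both pointers must be valid and non-null.  */
static void *
copy4 (void *dst, const void *src)
{
  assert (dst != NULL && src != NULL);
  return memcpy (dst, src, 4);
}
```

Calling copy4 (out, in.a) with a four-byte buffer out yields a pointer equal to out, which is why "c == 0" and "a == 0" in the testcase describe the same condition.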

Re: [PATCH] Improve VPR for some builtins and non pointer checks

2012-09-03 Thread Andrew Pinski

On Mon, Sep 3, 2012 at 12:03 AM, Jakub Jelinek  wrote:
> On Sun, Sep 02, 2012 at 10:18:15PM -0700, Andrew Pinski wrote:
>>   While fixing some code not to have aliasing violations in it, I came
>> across some builtins that were not causing their arguments or their
>> return values to be marked as non-null.  This patch implements just
>> that in VRP, which allows some null pointer checks to be removed later
>> on.
>>
>> OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>> @@ -1057,6 +1057,20 @@ vrp_stmt_computes_nonzero (gimple stmt,
>>   }
>>  }
>>
>> +  /* With some builtins, we can infer if the pointer return value
>> + will be non null.  */
>> +  if (flag_delete_null_pointer_checks
>> +  && is_gimple_call (stmt) && gimple_call_fndecl (stmt)
>> +  && DECL_BUILT_IN_CLASS (gimple_call_fndecl (stmt)) == BUILT_IN_NORMAL)
>> +{
>> +  switch (DECL_FUNCTION_CODE (gimple_call_fndecl (stmt)))
>> + {
>> +   case BUILT_IN_MEMCPY:
>> +   case BUILT_IN_MEMMOVE:
>> + return true;
>> + }
>> +}
>> +
>>return false;
>>  }
>>
>
> That is too hackish and lists way too few builtins.
> If you rely on nonnull attribute marked builtins, I'd say you want
> flags = gimple_call_return_flags (stmt);
> if ((flags & ERF_RETURNS_ARG)
> && (flags & ERF_RETURN_ARG_MASK) < gimple_call_num_args (stmt))
>   {
> /* Test nonnull attribute on the decl, either argument-less or
>on the (flags & ERF_RETURN_ARG_MASK)th argument.  */
>   }
> Or at least handle builtins e.g. CCP handles as pass-thru arg1:
> BUILT_IN_MEMCPY, BUILT_IN_MEMMOVE, BUILT_IN_MEMSET, BUILT_IN_STRCPY,
> BUILT_IN_STRNCPY, BUILT_IN_MEMCPY_CHK, BUILT_IN_MEMMOVE_CHK,
> BUILT_IN_MEMSET_CHK, BUILT_IN_STRCPY_CHK, BUILT_IN_STRNCPY_CHK
> (which reminds me that some of these apparently aren't marked
> with ATTR_RET1_NOTHROW_NONNULL_LEAF, why?).


I did not know of these attributes and flags on the decl.  Yes, I know
it lists only a few of the builtins.
I will look into those flags and see if I can improve it.

>
>> @@ -4231,6 +4245,32 @@ infer_value_range (gimple stmt, tree op,
>>   }
>>  }
>>
>> +  /* With some builtins, we can infer if the pointer argument
>> + will be non null.  */
>> +  if (flag_delete_null_pointer_checks
>> +  && is_gimple_call (stmt) && gimple_call_fndecl (stmt))
>> +{
>> +  tree callee = gimple_call_fndecl (stmt);
>> +  if (DECL_BUILT_IN_CLASS (callee) == BUILT_IN_NORMAL)
>> + {
>> +   switch (DECL_FUNCTION_CODE (callee))
>> + {
>> +   case BUILT_IN_MEMCPY:
>> +   case BUILT_IN_MEMMOVE:
>> +   case BUILT_IN_STRCMP:
>> +   case BUILT_IN_MEMCMP:
>> + /* The first and second arguments of memcpy and memmove will 
>> be non null after the call. */
>> + if (gimple_call_arg (stmt, 0) == op
>> + || gimple_call_arg (stmt, 1) == op)
>> +   {
>> + *val_p = build_int_cst (TREE_TYPE (op), 0);
>> + *comp_code_p = NE_EXPR;
>> + return true;
>> +   }
>> + }
>> + }
>> +}
>
> Again, what are you looking for here?  Passing pointers to nonnull
> attributes, or something more specific?  What exactly?  There are tons of
> builtins that behave similarly to memcpy/memmove/strcmp/memcmp.

Passing pointer arguments to nonnull-attributed functions.  I will
see if I can use the nonnull attribute on those functions so that this
can be done without checking the builtin functions.

Thanks,
Andrew Pinski

>
> Jakub
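
For readers unfamiliar with the attribute being discussed, here is a small user-level sketch.  count_matching and common_prefix_is_empty are hypothetical names, and whether a given compiler version actually removes the marked check is not something this example demonstrates; it only shows where such a check would sit.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__)
# define ATTR_NONNULL_ALL __attribute__ ((nonnull))
#else
# define ATTR_NONNULL_ALL
#endif

/* Hypothetical helper: both pointer parameters are declared non-null.  */
static size_t count_matching (const char *a, const char *b) ATTR_NONNULL_ALL;

/* Length of the common prefix of the two strings.  */
static size_t
count_matching (const char *a, const char *b)
{
  size_t n = 0;
  while (a[n] != '\0' && a[n] == b[n])
    n++;
  return n;
}

/* Return 1 if the strings share no prefix, 0 if they do, and -1 if A
   is null.  Because A has already been passed to a nonnull-attributed
   function, the null test below is the kind of check the thread
   proposes to treat as removable.  */
static int
common_prefix_is_empty (const char *a, const char *b)
{
  size_t n = count_matching (a, b);
  if (a == NULL)
    return -1;
  return n == 0;
}
```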

[SH] PR 51244 - Add CANONICALIZE_COMPARISON macro

2012-09-03 Thread Oleg Endo

Hello,

This adds an implementation of the CANONICALIZE_COMPARISON macro to the SH
target.  So far, it doesn't seem to have an impact on the generated
code, but it might be useful to have it in place in the future.
Moreover, it changes the behavior of the cbranchsi4 expander, which was
checking for TARGET_CBRANCHDI4, which seems a bit odd to do.
TARGET_CBRANCHDI4 is disabled for -Os (another story) and thus would
affect the behavior of the cbranchsi4 expander.  I've briefly checked
CSiBE result-size with this change and the only changes are in a few
files, in the range of -20...+4.
Tested on rev 190865 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
OK?

Cheers,
Oleg

ChangeLog:

PR target/51244
* config/sh/sh.c (prepare_cbranch_operands): Pull out 
comparison canonicalization code into...
* (sh_canonicalize_comparison): This new function.
* config/sh/sh-protos.h: Declare it.
* config/sh/sh.h: Use it in new macro CANONICALIZE_COMPARISON.
* config/sh/sh.md (cbranchsi4): Remove TARGET_CBRANCHDI4 check 
and always invoke expand_cbranchsi4.

Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 190840)
+++ gcc/config/sh/sh.md	(working copy)
@@ -881,10 +881,9 @@
   if (TARGET_SHMEDIA)
 emit_jump_insn (gen_cbranchint4_media (operands[0], operands[1],
 	   operands[2], operands[3]));
-  else if (TARGET_CBRANCHDI4)
-expand_cbranchsi4 (operands, LAST_AND_UNUSED_RTX_CODE, -1);
   else
-sh_emit_compare_and_branch (operands, SImode);
+expand_cbranchsi4 (operands, LAST_AND_UNUSED_RTX_CODE, -1);
+
   DONE;
 })
 
Index: gcc/config/sh/sh-protos.h
===
--- gcc/config/sh/sh-protos.h	(revision 190840)
+++ gcc/config/sh/sh-protos.h	(working copy)
@@ -106,6 +106,9 @@
 extern rtx sh_gen_truncate (enum machine_mode, rtx, int);
 extern bool sh_vector_mode_supported_p (enum machine_mode);
 extern bool sh_cfun_trap_exit_p (void);
+extern void sh_canonicalize_comparison (enum rtx_code&, rtx&, rtx&,
+	enum machine_mode mode = VOIDmode);
+
 #endif /* RTX_CODE */
 
 extern const char *output_jump_label_table (void);
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 190840)
+++ gcc/config/sh/sh.c	(working copy)
@@ -21,6 +21,12 @@
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
+/* FIXME: This is a temporary hack, so that we can include <algorithm>
+   below.  <algorithm> will try to include <cstdlib> which will reference
+   malloc & co, which are poisoned by "system.h".  The proper solution is
+   to include <cstdlib> in "system.h" instead of <stdlib.h>.  */
+#include <cstdlib>
+
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -56,6 +62,7 @@
 #include "tm-constrs.h"
 #include "opts.h"
 
+#include <algorithm>
 
 int code_for_indirect_jump_scratch = CODE_FOR_indirect_jump_scratch;
 
@@ -1791,65 +1798,124 @@
 }
 }
 
-enum rtx_code
-prepare_cbranch_operands (rtx *operands, enum machine_mode mode,
-			  enum rtx_code comparison)
+// Implement the CANONICALIZE_COMPARISON macro for the combine pass.
+// This function is also re-used to canonicalize comparisons in cbranch
+// pattern expanders.
+void
+sh_canonicalize_comparison (enum rtx_code& cmp, rtx& op0, rtx& op1,
+			enum machine_mode mode)
 {
-  rtx op1;
-  rtx scratch = NULL_RTX;
+  // When invoked from within the combine pass the mode is not specified,
+  // so try to get it from one of the operands.
+  if (mode == VOIDmode)
+mode = GET_MODE (op0);
+  if (mode == VOIDmode)
+mode = GET_MODE (op1);
 
-  if (comparison == LAST_AND_UNUSED_RTX_CODE)
-comparison = GET_CODE (operands[0]);
-  else
-scratch = operands[4];
-  if (CONST_INT_P (operands[1])
-  && !CONST_INT_P (operands[2]))
+  // We need to have a mode to do something useful here.
+  if (mode == VOIDmode)
+return;
+
+  // Currently, we don't deal with floats here.
+  if (GET_MODE_CLASS (mode) == MODE_FLOAT)
+return;
+
+  // Make sure that the constant operand is the second operand.
+  if (CONST_INT_P (op0) && !CONST_INT_P (op1))
 {
-  rtx tmp = operands[1];
+  std::swap (op0, op1);
+  cmp = swap_condition (cmp);
+}
 
-  operands[1] = operands[2];
-  operands[2] = tmp;
-  comparison = swap_condition (comparison);
-}
-  if (CONST_INT_P (operands[2]))
+  if (CONST_INT_P (op1))
 {
-  HOST_WIDE_INT val = INTVAL (operands[2]);
-  if ((val == -1 || val == -0x81)
-	  && (comparison == GT || comparison == LE))
+  // Try to adjust the constant operand in such a way that available
+  // comparison insns can be utilized better and the constant can be
+  // loaded with a 'mov #imm,Rm' insn.  This avoids a load from the
+  // constant pool.
+  const HOST_WIDE_INT

Re: Re-implement VEC_* to be member functions of vec_t

2012-09-03 Thread Richard Guenther

On Thu, 23 Aug 2012, Diego Novillo wrote:

> This patch is the first step towards making the API for VEC use
> member functions.
> 
> There are no user code modifications in this patch.  Everything
> is still using the VEC_* macros, but this time they expand into
> member function calls.
> 
> Because of the way VECs are used, this required some trickery.
> The API allows VECs to be NULL.  This means that services like
> VEC_length(V) will return 0 when V is a NULL pointer.  This is,
> of course, not possible to do if we call V->length().
> 
> For functions that either need to allocate/re-allocate the
> vector, or they need to handle NULL vectors, I implemented them
> as static member functions or free functions.
> 
> Another wart that I did not address in this patch is the fact
> that vectors of pointers and vectors of objects have slightly
> different semantics when handling elements in the vector.  In
> vector of pointers, we pass them around by value, but in vectors
> of objects, they are passed around via pointers.  That's why we
> need TYPE * and TYPE ** overloads for some functions (e.g.,
> vec_t::iterate).
> 
> I will fix these two warts in a subsequent patch.  The idea is to
> make vec_t a single-word structure, which acts as a handler for
> the structure containing the actual vector.  Something like this:
> 
> template<typename T>
> struct vec_t
> {
>   struct vec_internal *vec_;
> };
> 
> This has the advantage that we can now declare the actual vector
> instances as regular variables, instead of pointers.  They will
> use the same amount of memory when embedded in other structures,
> and we will be able to allocate and reallocate the actual data
> without having to mutate the vector instance.
> 
> All the functions that are now static members in vec_t, will
> become instance members in the new vec_t.  This will mean that
> all the callers will need to be changed, of course.
> 
> There is another issue that I need to address and I'm not quite
> sure how to go about it: with the macro-based API, we make use of
> pre-processor trickery to insert __FILE__, __LINE__ and
> __FUNCTION__ into the argument list of functions.
> 
> When I change VEC_pop(V) with V->pop(), the macro expansion no
> longer exists and we lose the caller references.  Richi, I
> understand that your __builtin_FILE patch would allow me to
> declare default values for these arguments? Something like:
> 
> T vec_t::pop(const char *file_ = __builtin_FILE,
>   unsigned line_ = __builtin_LINE,
>   const char *function_ = __builtin_FUNCTION)
> 
> which would then be evaluated at the call site and get the right
> values.  Is that more or less how your solution works?

Yes.  I'll pick up on this patch again after I have recovered from
my vacation.

> If so, then we could get away with that in most cases.  However,
> we would still have the problem of operator functions (e.g.,
> vec_t::operator[]).
> 
> I think I would like to explore the idea of implementing a stack
> unwinder that's used by gcc_assert().  This way: (a) we do not
> need to uglify all the APIs with these extra arguments, (b) we
> can control how much of the call stack we show on an assertion.
> 
> Would that be something difficult to implement?  I don't think we
> need something as generic as libunwind.  Thoughts?

It won't work and it is not portable.  So I don't think we want to
go that way.  operator[] never does allocation, so you won't need
the file/line/function arguments.

Richard.
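
Two points from the quoted proposal can be shown in a few lines of plain C: a free function can tolerate a NULL vector where a member call could not, and a macro can capture the caller's location at the point of use.  This is a sketch only; struct int_vec, int_vec_length, int_vec_pop_at and INT_VEC_POP are hypothetical names and not part of GCC's VEC API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified vector of ints.  */
struct int_vec
{
  size_t num;
  int data[8];
  const char *last_file; /* where the last pop was requested from  */
  unsigned last_line;
};

/* Free function: safe to call on a NULL vector, which has length 0.  */
static size_t
int_vec_length (const struct int_vec *v)
{
  return v ? v->num : 0;
}

/* Pop the last element, recording the caller's location.  */
static int
int_vec_pop_at (struct int_vec *v, const char *file, unsigned line)
{
  assert (v != NULL && v->num > 0);
  v->last_file = file;
  v->last_line = line;
  return v->data[--v->num];
}

/* The macro expands at the call site, so __FILE__ and __LINE__ name
   the caller rather than the implementation.  */
#define INT_VEC_POP(V) int_vec_pop_at ((V), __FILE__, __LINE__)
```

Once the macro is replaced by a plain member call there is no expansion at the call site any more, which is the gap the default-argument builtins discussed above are meant to fill.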

Re: Re-implement VEC_* to be member functions of vec_t

2012-09-03 Thread Richard Guenther

On Fri, 24 Aug 2012, Diego Novillo wrote:

> On 2012-08-24 12:03 , Gabriel Dos Reis wrote:
> 
> > I would just use C++ standard function `at()' (e.g. as found in vector)
> > for this.
> 
> Sure.  For regular functions, using default-valued arguments would be fine.
> But I think the mechanism would be much more transparent if the compiler did
> the heavy lifting.
> 
> 1- Add a class/function attribute that makes the compiler add 3 hidden
>args for the caller location.
> 
> 2- During code generation, the compiler fills in these values at call
>sites.

I don't think we want this ... how do you handle passing on this
hidden argument to callees in the function?  How do you distinguish
between pass-through and passing a new pack?  Consider

bar () __attribute__((pack));

foo () __attribute__((pack))
{
  malloc ();
  record (__builtin_file());
  bar ();
}

so we want to record calls of foo but also calls of bar.  How do
you distinguish the case where the pack needs to be passed down to
bar from the case where bar itself needs to be tracked?

Richard.

Re: [PATCH, ARM] Constant vector permute for the Neon vext insn

2012-09-03 Thread Christophe Lyon

On 31 August 2012 17:59, Richard Henderson  wrote:
> On 2012-08-31 07:25, Christophe Lyon wrote:
>> +  offset = gen_rtx_CONST_INT (VOIDmode, location);
>
> Never call gen_rtx_CONST_INT directly.  Use GEN_INT.
>
>
Here is an updated patch with that small change.
For the record, there are quite a few existing calls to
gen_rtx_CONST_INT, maybe a cleanup pass is needed?

Thanks,

Christophe.

2012-09-03  Christophe Lyon  

gcc/
* config/arm/arm.c (arm_evpc_neon_vext): New
function.
(arm_expand_vec_perm_const_1): Add call to
arm_evpc_neon_vext.

gcc/testsuite/
* gcc.target/arm/neon-vext.c,
gcc.target/arm/neon-vext-execute.c:
New tests.

gcc-vec-permute-vext.patch
Description: Binary data

Re: [Patch ARM testsuite] fix 3 tests for big-endian

2012-09-03 Thread Christophe Lyon

On 31 August 2012 18:14, Janis Johnson  wrote:
>
> do something like
>
> /* { dg-final { scan-assembler-times "fmrrd\[\\t \]+r0,\[\\t \]*r1,\[\\t \]*d0" 2 } { target arm_little_endian } } */
> /* { dg-final { scan-assembler-times "fmrrd\[\\t \]+r1,\[\\t \]*r0,\[\\t \]*d0" 2  } {target { ! arm_little_endian } } } */
>
> That's untested, but you get the idea.
>
> Janis
>
>

Thanks for your review. Here is an updated patch.

Christophe.

2012-09-03  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/neon-vset_lanes8.c, gcc.target/arm/pr51835.c,
gcc.target/arm/pr48252.c: Fix for big-endian support.


big-endian-tests.patch
Description: Binary data

Re: [wwwdocs] PATCH for Re: [PATCH] Remove matrix-reorg

2012-09-03 Thread Richard Guenther

On Sun, 2 Sep 2012, Gerald Pfeifer wrote:

> Hi Richi,
> 
> On Fri, 10 Aug 2012, Richard Guenther wrote:
> > This removes matrix-reorg which is today useless and possibly
> > dangerous.  It follows struct-reorg down the kitchen-sink.
> 
> how about the following patch for the GCC 4.8 release notes?

I'd not mention the command-line flags.

> Would you like to propose a (politically correct ;-) snippet
> on why you removed the two ?

They were not working correctly and they did not work with LTO
which made them useless apart from for single-TU programs.

Richard.

> Gerald
> 
> Index: changes.html
> ===
> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
> retrieving revision 1.26
> diff -u -3 -p -r1.26 changes.html
> --- changes.html  2 Sep 2012 15:56:24 -   1.26
> +++ changes.html  2 Sep 2012 18:39:27 -
> @@ -38,7 +38,7 @@ explicit use of vector types may be inco
>  built with older versions of GCC.  Auto-vectorized code is not affected
>  by this change.
>  
> -General Optimizer Improvements
> +General Optimizer Improvements (and Changes)
>  
>
>  A new option -ftree-partial-pre was added to control
> @@ -46,6 +46,9 @@ by this change.
>This option is enabled by default at the -O3 optimization
>level, and it makes PRE more aggressive.
>  
> +The struct reorg and matrix reorg optimizations (command-line
> +options -fipa-struct-reorg and 
> +-fipa-matrix-reorg) have been removed.
>
>  
>  
> 
> 

-- 
Richard Guenther 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend

[avr]: Disable libquadmath

2012-09-03 Thread Georg-Johann Lay

This patch disables libquadmath.

128 bit wide floats are not really sensible on avr.

Ok for trunk?

Johann

--

* configure.ac (noconfigdirs,target=avr): Add target-libquadmath.
* configure: Regenerate.
Index: configure.ac
===
--- configure.ac	(revision 190873)
+++ configure.ac	(working copy)
@@ -544,6 +544,13 @@ case "${target}" in
 ;;
 esac
 
+# Disable libquadmath for some systems.
+case "${target}" in
+  avr-*-*)
+noconfigdirs="$noconfigdirs target-libquadmath"
+;;
+esac
+
 # Disable libstdc++-v3 for some systems.
 case "${target}" in
   *-*-vxworks*)
Index: configure
===
--- configure	(revision 190873)
+++ configure	(working copy)
@@ -3153,6 +3153,13 @@ case "${target}" in
 ;;
 esac
 
+# Disable libquadmath for some systems.
+case "${target}" in
+  avr-*-*)
+noconfigdirs="$noconfigdirs target-libquadmath"
+;;
+esac
+
 # Disable libstdc++-v3 for some systems.
 case "${target}" in
   *-*-vxworks*)

Re: Fold VEC_PERM_EXPR a little more

2012-09-03 Thread Marc Glisse


On Sun, 2 Sep 2012, Hans-Peter Nilsson wrote:

> On Sat, 1 Sep 2012, Marc Glisse wrote:
>> gcc/
>> * fold-const.c (fold_ternary_loc): Constant-propagate after
>> removing dead operands.
>>
>> gcc/testsuite/
>> * gcc.dg/fold-perm.c: Improve test.
>
> (adding a line and a parameter to the function containing the test-code)
>
> JFTR: generally speaking, editing existing tests is frowned
> upon.  Adding a new separate test is almost always better.
> Not sure there's reason for an exception this time, but I see
> you're the author of the original test and it's < 2 weeks since
> it was added.


Thanks for the comment. I am not sure there is much I can add to that. I 
know it is better to avoid modifying existing tests. And the fact that I 
am completing my patch from 2 weeks ago made me think it was ok this time. 
I'll break it up into fold-perm-2.c if the reviewer asks for it.


--
Marc Glisse

Re: [SH] PR 51244 - Add CANONICALIZE_COMPARISON macro

2012-09-03 Thread Kaz Kojima

Oleg Endo  wrote:
> --- gcc/config/sh/sh.c(revision 190840)
> +++ gcc/config/sh/sh.c(working copy)
> @@ -21,6 +21,12 @@
>  along with GCC; see the file COPYING3.  If not see
>  <http://www.gnu.org/licenses/>.  */
>  
> +/* FIXME: This is a temporary hack, so that we can include <algorithm>
> +   below.  <algorithm> will try to include <cstdlib> which will reference
> +   malloc & co, which are poisoned by "system.h".  The proper solution is
> +   to include <cstdlib> in "system.h" instead of <stdlib.h>.  */
> +#include <cstdlib>
> +
>  #include "config.h"
>  #include "system.h"
>  #include "coretypes.h"
[snip]
> @@ -1791,65 +1798,124 @@
>  }
>  }
>  
> -enum rtx_code
> -prepare_cbranch_operands (rtx *operands, enum machine_mode mode,
> -   enum rtx_code comparison)
> +// Implement the CANONICALIZE_COMPARISON macro for the combine pass.
> +// This function is also re-used to canonicalize comparisons in cbranch
> +// pattern expanders.
> +void
> +sh_canonicalize_comparison (enum rtx_code& cmp, rtx& op0, rtx& op1,
> + enum machine_mode mode)
>  {
> -  rtx op1;
> -  rtx scratch = NULL_RTX;
> +  // When invoked from within the combine pass the mode is not specified,
> +  // so try to get it from one of the operands.
> +  if (mode == VOIDmode)
> +mode = GET_MODE (op0);

I'm not sure that the mixture of C and C++ style long comments
in one .c file is OK with the current gcc coding style.  Could
you point me to a reference for that?


Regards,
kaz

Re: [SH] PR 51244 - Add CANONICALIZE_COMPARISON macro

2012-09-03 Thread Oleg Endo


On 3 Sep 2012, at 11:18, Kaz Kojima  wrote:

> Oleg Endo  wrote:
>> --- gcc/config/sh/sh.c(revision 190840)
>> +++ gcc/config/sh/sh.c(working copy)
>> @@ -21,6 +21,12 @@
>>  along with GCC; see the file COPYING3.  If not see
>>  <http://www.gnu.org/licenses/>.  */
>>
>> +/* FIXME: This is a temporary hack, so that we can include <algorithm>
>> +   below.  <algorithm> will try to include <cstdlib> which will reference
>> +   malloc & co, which are poisoned by "system.h".  The proper solution is
>> +   to include <cstdlib> in "system.h" instead of <stdlib.h>.  */
>> +#include <cstdlib>
>> +
>>  #include "config.h"
>>  #include "system.h"
>>  #include "coretypes.h"
> [snip]
>> @@ -1791,65 +1798,124 @@
>>  }
>>  }
>>
>> -enum rtx_code
>> -prepare_cbranch_operands (rtx *operands, enum machine_mode mode,
>> -  enum rtx_code comparison)
>> +// Implement the CANONICALIZE_COMPARISON macro for the combine pass.
>> +// This function is also re-used to canonicalize comparisons in cbranch
>> +// pattern expanders.
>> +void
>> +sh_canonicalize_comparison (enum rtx_code& cmp, rtx& op0, rtx& op1,
>> +enum machine_mode mode)
>>  {
>> -  rtx op1;
>> -  rtx scratch = NULL_RTX;
>> +  // When invoked from within the combine pass the mode is not specified,
>> +  // so try to get it from one of the operands.
>> +  if (mode == VOIDmode)
>> +mode = GET_MODE (op0);
>
> I'm not sure that the mixture of C and C++ style long comments
> in one .c file is OK with the current gcc coding style.  Could
> you point me to a reference for that?


Sorry, I can't.  At least the C++ conventions wiki page doesn't
mention this.  Maybe somebody else can comment on this, please?

In any case, I have no problem with changing the multi-line comments
to /* ... */.  Just let me know.


Cheers,
Oleg

Re: [wwwdocs] PATCH for Re: Commit: XStormy16: Add support for -fstack-usage

2012-09-03 Thread nick clifton


Hi Gerald,

> Anything you'd like to add or tweak?

No, it is fine thanks.

> (Is "stormy" fine as an anchor, or should we go for "xstormy" or
> the full "xstormy16"?  I used what we have for the port itself...)

I like the full "xstormy16" as well.  I think that the fact that the gcc
backend sources are in a directory called "stormy16" is just a
historical curiosity...


Cheers
  Nick

[AArch64] Correct cache line size calculation

2012-09-03 Thread Marcus Shawcroft

I've just committed the attached patch to correct the i and d cache line 
size calculation used in sync_cache_range() for AArch64.


/Marcus

2012-09-03  Marcus Shawcroft  

* config/aarch64/sync-cache.c (__aarch64_sync_cache_range): Lift
declarations to top of function.  Update comment.  Correct
icache_linesize and dcache_linesize calculation.
diff --git a/libgcc/config/aarch64/sync-cache.c b/libgcc/config/aarch64/sync-cache.c
index 089439d..1636b94 100644
--- a/libgcc/config/aarch64/sync-cache.c
+++ b/libgcc/config/aarch64/sync-cache.c
@@ -22,20 +22,22 @@ void
 __aarch64_sync_cache_range (const void *base, const void *end)
 {
   unsigned int cache_info = 0;
+  unsigned int icache_lsize;
+  unsigned int dcache_lsize;
+  const char *address;
 
-  /* CTR_EL0 is the same as AArch32's CTR which contains log2 of the
- icache size in [3:0], and log2 of the dcache line in [19:16].  */
+  /* CTR_EL0 [3:0] contains log2 of icache line size in words.
+ CTR_EL0 [19:16] contains log2 of dcache line size in words.  */
   asm volatile ("mrs\t%0, ctr_el0":"=r" (cache_info));
 
-  unsigned int icache_lsize = 1 << (cache_info & 0xF);
-  unsigned int dcache_lsize = 1 << ((cache_info >> 16) & 0xF);
+  icache_lsize = 4 << (cache_info & 0xF);
+  dcache_lsize = 4 << ((cache_info >> 16) & 0xF);
 
   /* Loop over the address range, clearing one cache line at once.
  Data cache must be flushed to unification first to make sure the
  instruction cache fetches the updated data.  'end' is exclusive,
  as per the GNU definition of __clear_cache.  */
 
-  const char *address;
   for (address = base; address < (const char *) end; address += dcache_lsize)
 asm volatile ("dc\tcvau, %0"
 		  :
-- 
1.7.12.rc0.22.gcdd159b
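
The corrected arithmetic can be checked on any host by feeding a register value to the same expressions.  A small sketch, assuming only what the updated comment in the patch states (each field holds log2 of the line size in 4-byte words); the function names and the sample value 0x8444c004 used below are illustrative and not taken from the patch.

```c
#include <assert.h>

/* Convert a log2-of-words field into a line size in bytes.
   One word is 4 bytes, hence "4 <<" rather than "1 <<".  */
static unsigned
line_bytes_from_field (unsigned log2_words)
{
  assert (log2_words <= 0xFu);
  return 4u << log2_words;
}

/* Instruction cache line size in bytes, from bits [3:0].  */
static unsigned
icache_line_bytes (unsigned ctr)
{
  return line_bytes_from_field (ctr & 0xFu);
}

/* Data cache line size in bytes, from bits [19:16].  */
static unsigned
dcache_line_bytes (unsigned ctr)
{
  return line_bytes_from_field ((ctr >> 16) & 0xFu);
}
```

For the sample value both fields hold 4, so both helpers return 64 bytes, whereas the old "1 <<" form would have given 16 and made the loops step four times more often than necessary.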

[AArch64] cache CTR_EL0 in sync_cache_range()

2012-09-03 Thread Marcus Shawcroft

I've just committed the attached patch to cache the CTR_EL0 register 
between calls to sync_cache_range().


/Marcus

2012-09-03  Marcus Shawcroft  

* config/aarch64/sync-cache.c (__aarch64_sync_cache_range):
Cache the ctr_el0 register.
diff --git a/libgcc/config/aarch64/sync-cache.c b/libgcc/config/aarch64/sync-cache.c
index 1636b94..d7b621e 100644
--- a/libgcc/config/aarch64/sync-cache.c
+++ b/libgcc/config/aarch64/sync-cache.c
@@ -21,14 +21,15 @@
 void
 __aarch64_sync_cache_range (const void *base, const void *end)
 {
-  unsigned int cache_info = 0;
-  unsigned int icache_lsize;
-  unsigned int dcache_lsize;
+  unsigned icache_lsize;
+  unsigned dcache_lsize;
+  static unsigned int cache_info = 0;
   const char *address;
 
-  /* CTR_EL0 [3:0] contains log2 of icache line size in words.
- CTR_EL0 [19:16] contains log2 of dcache line size in words.  */
-  asm volatile ("mrs\t%0, ctr_el0":"=r" (cache_info));
+  if (! cache_info)
+/* CTR_EL0 [3:0] contains log2 of icache line size in words.
+   CTR_EL0 [19:16] contains log2 of dcache line size in words.  */
+asm volatile ("mrs\t%0, ctr_el0":"=r" (cache_info));
 
   icache_lsize = 4 << (cache_info & 0xF);
   dcache_lsize = 4 << ((cache_info >> 16) & 0xF);
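
The line-size arithmetic in the two patches above can be checked without an AArch64 machine. The following is a minimal sketch that decodes a CTR_EL0 value handed in as a plain integer; the helper names and the sample value in the note below are illustrative assumptions and are not part of libgcc. Because the fields hold log2 of the line size in 4-byte words, the byte size is `4 << field`, not `1 << field` as in the code the first patch replaces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not part of libgcc: decode cache line sizes
   (in bytes) from a CTR_EL0 value supplied by the caller.  */

/* CTR_EL0 [3:0] holds log2 of the icache line size in 4-byte words.  */
unsigned int
icache_line_bytes (uint32_t ctr)
{
  return 4u << (ctr & 0xF);
}

/* CTR_EL0 [19:16] holds log2 of the dcache line size in 4-byte words.  */
unsigned int
dcache_line_bytes (uint32_t ctr)
{
  return 4u << ((ctr >> 16) & 0xF);
}

/* Number of iterations a loop of the form
     for (address = base; address < end; address += line_bytes)
   performs, i.e. how many cache lines it touches.  'end' is exclusive.  */
unsigned int
lines_to_clean (uintptr_t base, uintptr_t end, unsigned int line_bytes)
{
  assert (line_bytes != 0 && base <= end);
  return (unsigned int) ((end - base + line_bytes - 1) / line_bytes);
}
```

With a sample value of 0x00040004 both fields are 4, so both helpers return 64 bytes, where the old `1 << field` form would have produced 16. The committed caching patch uses 0 as the "not yet read" marker for the static cache_info, which is why the read is guarded by `if (! cache_info)`.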

Re: [PATCH, ARM] Constant vector permute for the Neon vext insn

2012-09-03 Thread Ramana Radhakrishnan


On 09/03/12 09:59, Christophe Lyon wrote:

On 31 August 2012 17:59, Richard Henderson  wrote:

On 2012-08-31 07:25, Christophe Lyon wrote:

+  offset = gen_rtx_CONST_INT (VOIDmode, location);


Never call gen_rtx_CONST_INT directly.  Use GEN_INT.



Here is an updated patch with that small change.
For the record, there are quite a few existing calls to
gen_rtx_CONST_INT, maybe a cleanup pass is needed?


A set of cleanup patches is welcome.

This looks OK - thanks.

Ramana

Re: [SH] PR 51244 - Add CANONICALIZE_COMPARISON macro

2012-09-03 Thread Kaz Kojima

Oleg Endo  wrote:
> In any case, I have no problem with changing the multi line comments  
> to /* ... */.  Just let me know.

Other than that, the patch is OK.

Regards,
kaz

Re: [avr]: Disable libquadmath

2012-09-03 Thread Denis Chertykov

2012/9/3 Georg-Johann Lay :
> This patch disables libquadmath.
>
> 128 bit wide floats are not really sensible on avr.
>
> Ok for trunk?
>

Ok.
Please commit.

Denis

Re: PATCH: PR driver/54335: -dm doesn't work

2012-09-03 Thread Richard Guenther

On Thu, Aug 23, 2012 at 3:38 PM, H.J. Lu  wrote:
> On Thu, Aug 23, 2012 at 1:45 AM, Richard Guenther
>  wrote:
>> On Wed, Aug 22, 2012 at 10:09 PM, H.J. Lu  wrote:
>>> Hi,
>>>
>>> -dm hasn't worked for a long time, at least dating back to GCC 3.4.
>>> This patch removes -dm and puts back -da, which was removed by accident.
>>> OK to install?
>>
>> Ok.
>>
>> Thanks,
>> Richard.
>
> I'd like to backport it to 4.6 and 4.7 branches.  OK for 4.6/4.7

Ok.

Richard.

> Thanks.
>
>>> Thanks.
>>>
>>>
>>> H.J.
>>> ---
>>> 2012-08-22  H.J. Lu  
>>>
>>> PR driver/54335
>>> * doc/invoke.texi: Add -da and remove -dm.
>>>
>>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>>> index ae22ca9..e2feb6d 100644
>>> --- a/gcc/doc/invoke.texi
>>> +++ b/gcc/doc/invoke.texi
>>> @@ -5610,7 +5610,9 @@ Dump after live range splitting.
>>>  @opindex fdump-rtl-dfinish
>>>  These dumps are defined but always produce empty files.
>>>
>>> -@item -fdump-rtl-all
>>> +@item -da
>>> +@itemx -fdump-rtl-all
>>> +@opindex da
>>>  @opindex fdump-rtl-all
>>>  Produce all the dumps listed above.
>>>
>>> @@ -5627,11 +5629,6 @@ normal output.
>>>  @opindex dH
>>>  Produce a core dump whenever an error occurs.
>>>
>>> -@item -dm
>>> -@opindex dm
>>> -Print statistics on memory usage, at the end of the run, to
>>> -standard error.
>>> -
>>>  @item -dp
>>>  @opindex dp
>>>  Annotate the assembler output with a comment indicating which
>
>
>
> --
> H.J.

Re: Ping^3 Re: Add --no-sysroot-suffix driver option

2012-09-03 Thread Richard Guenther

On Thu, Aug 23, 2012 at 4:43 PM, Joseph S. Myers
 wrote:
> Ping^3.  This patch
>  is still pending
> review.

Ok.

Thanks,
Richard.

> --
> Joseph S. Myers
> jos...@codesourcery.com

Re: [PATCH] Don't ICE if COMPOUND_LITERAL_EXPR's DECL_INITIAL isn't CONSTRUCTOR (PR c/54363)

2012-09-03 Thread Richard Guenther

On Fri, Aug 24, 2012 at 11:39 PM, Joseph S. Myers
 wrote:
> On Fri, 24 Aug 2012, Jakub Jelinek wrote:
>
>> Hi!
>>
>> On this testcase, we ICE in optimize_compound_literals_in_ctor
>> because init isn't CONSTRUCTOR, but the recursive call relies on it
>> being a CONSTRUCTOR.
>>
>> Either we can add that check as the patch does, making the optimization
>> tiny bit more robust (the rest of the gimplifier handles non-CONSTRUCTOR
>> DECL_INITIAL of COMPOUND_LITERAL_EXPR just fine), or the C FE would need to
>> be somehow fixed to always emit a CONSTRUCTOR.
>>
>> Joseph, what do you prefer here?
>
> I prefer the gimplifier approach here (allowing COMPOUND_LITERAL_EXPRs to
> use anything that would be a valid initializer for the relevant type).

The patch is ok.

Thanks,
Richard.

> --
> Joseph S. Myers
> jos...@codesourcery.com

Re: [PATCH] Fix emit_conditional_add and documentation for add@var{mode}cc

2012-09-03 Thread Richard Guenther

On Sat, Aug 25, 2012 at 12:43 AM, Andrew Pinski  wrote:
> Forgot to attach the patch.
>
> -- Andrew
>
> On Fri, Aug 24, 2012 at 3:42 PM, Andrew Pinski  wrote:
>> Hi,
>>   I decided to split this patch from the other patch which uses
>> emit_conditional_add in expand as that part of the patch needs some
>> work.  This part of the patch can be applied separately and it fixes a
>> few things dealing with conditional adds.
>>
>> First the documentation is wrong for the pattern as we do the addition
>> if operand 0 is true rather than false.
>> Then emit_conditional_add is wrong as you cannot switch around op2 and op3.
>>
>> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Which does not have conditional add ...

Well, the patch looks ok to me.  I suppose the op2 and op3 swapping was
supposed to canonicalize operands, but I see that addcc does not have a separate
argument slot for the value to use when the comparison is false.

Thanks,
Richard.

>> Thanks,
>> Andrew Pinski
>>
>> ChangeLog:
>>  * optabs.c (emit_conditional_add): Correct comment about the arguments.
>> Remove code which might swap op2 and op3 since they cannot be swapped.
>> * doc/md.texi (add@var{mode}cc): Fix document about how the arguments are 
>> used.
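
To illustrate why op2 and op3 cannot be swapped, here is a minimal sketch that models the conditional add on plain integers. It assumes the semantics stated in the thread (the addition happens when the comparison is true, and there is no separate operand for the false case); the function names are hypothetical and nothing here is taken from optabs.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of a conditional add as described in the thread: the addition is
   performed when the comparison holds, otherwise op2 passes through.  */
int
cond_add (bool cond, int op2, int op3)
{
  return cond ? op2 + op3 : op2;
}

/* What swapping op2 and op3 (with the comparison reversed) would compute.  */
int
cond_add_swapped (bool cond, int op2, int op3)
{
  return cond_add (!cond, op3, op2);
}

/* Self-check: with the comparison false the original form yields op2,
   while the swapped form performs the addition instead.  */
void
check_swap_is_not_equivalent (void)
{
  assert (cond_add (false, 10, 1) == 10);
  assert (cond_add_swapped (false, 10, 1) == 11);
}
```

A conditional move can swap its two value operands when the comparison is reversed because both arms are explicit; here the "false" arm is implicitly op2, so the same trick changes the result.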

Re: [PATCH] fix for PR53986 - missing vrp on bit-mask test, LSHIFT_EXPR not handled

2012-09-03 Thread Richard Guenther

On Sun, Aug 26, 2012 at 4:28 PM, Tom de Vries  wrote:
> Richard,
>
> this patch fixes PR53986.
>
> The patch calculates the range of an LSHIFT_EXPR in case both operands are
> constants ranges, and the operation is guaranteed not to overflow.
>
> F.i., it evaluates [1, 2] << [1, 8] to [2, 512].
>
> Bootstrapped and reg-tested (ada inclusive) on x86_64.
>
> Ok for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> - Tom
>
> 2012-08-25  Tom de Vries  
>
> PR tree-optimization/53986
> * tree-vrp.c (extract_range_from_multiplicative_op_1): Allow
> LSHIFT_EXPR.
> (extract_range_from_binary_expr_1): Handle LSHIFT with constant range 
> as
> shift amount.
>
> * gcc.dg/tree-ssa/vrp80.c: New test.
> * gcc.dg/tree-ssa/vrp80-2.c: Same.
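
The [1, 2] << [1, 8] example can be reproduced with a small stand-alone sketch. It is not the tree-vrp.c code: it assumes 32-bit signed values, handles only non-negative operands, and gives up whenever the largest possible result would not fit. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Closed interval [min, max] of 32-bit signed values.  */
struct range
{
  int32_t min, max;
};

/* Compute the range of VAL << AMOUNT into *OUT.  Returns false when the
   operands are negative, the shift count is too large, or the result
   could overflow; in those cases *OUT is left untouched.  */
bool
shift_range (struct range val, struct range amount, struct range *out)
{
  assert (val.min <= val.max && amount.min <= amount.max);
  if (val.min < 0 || amount.min < 0 || amount.max > 30)
    return false;
  /* The largest result is val.max << amount.max; check that it fits.  */
  if (val.max > (INT32_MAX >> amount.max))
    return false;
  /* For non-negative operands the shift is monotonic in both arguments,
     so the extremes come from the interval end points.  */
  out->min = (int32_t) ((int64_t) val.min << amount.min);
  out->max = (int32_t) ((int64_t) val.max << amount.max);
  return true;
}
```

For val = [1, 2] and amount = [1, 8] this yields [1 << 1, 2 << 8] = [2, 512], matching the example in the mail.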

Re: another wrong-code problem with -fstrict-volatile-bitfields

2012-09-03 Thread Richard Guenther

On Sat, Aug 25, 2012 at 10:15 PM, Sandra Loosemore
 wrote:
> While I was grovelling around trying to swap in more state on the bitfield
> store/extract code for the patch rewrite being discussed here:
>
> http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01546.html
>
> I found a reference to PR23623 and found that it is broken again, but in a
> different way.  On ARM EABI, the attached test case correctly does 32-bit
> reads for the volatile int bit-field with -fstrict-volatile-bitfields, but
> incorrectly does 8-bit writes.  I thought I should try to track this down
> and fix it first, as part of making the bit-field read/extract code more
> consistent with each other, before trying to figure out a new place to hook
> in the packedp attribute stuff.  The patch I previously submitted does not
> fix the behavior of this test case for writing, nor does reverting the older
> patch that added the packedp attribute for reading break that case.
>
> After I tweaked a couple other places in store_bit_field_1 to handle
> -fstrict-volatile-bitfields consistently with extract_bit_field_1, I've
> gotten it into store_fixed_bit_field, to parallel the read case where it is
> getting into extract_fixed_bit_field.  But there it's failing to reach the
> special case for using the declared mode of the field with
> -fstrict-volatile-bitfields because it's been passed bitregion_start = 0 and
> bitregion_end = 7 so it thinks it must not write more than 8 bits no matter
> what.  Those values are coming in from expand_assignment, which is in turn
> getting them from get_bit_range.
>
> I'm really confused -- where is the right place to reconcile the new C++
> memory model with -fstrict-volatile-bitfields?

Apparently they conflict.  The proper place to fix this is in stor-layout.c
where we compute DECL_BIT_FIELD_REPRESENTATIVE.

Your testcase is

extern struct
{
   unsigned int b : 1;
} bf1;

void writeb(void)
{
   bf1.b = 1;
}

What is sizeof (bf1) / alignof (bf1)?  I suppose with
-fstrict-volatile-bitfields it is 4.  Thus make the
DECL_BIT_FIELD_REPRESENTATIVE cover tail padding in the structure for
-fstrict-volatile-bitfields.  If the size is _not_ 4 then

extern struct
{
   unsigned int b : 1;
} bf1;
char c;

may miscompile

c = 1;
bf1.b = 1;
return c;

if bf1 and c are adjacent in .data.

Richard.

> -Sandra
>

Re: Patch ping

2012-09-03 Thread Richard Guenther

On Mon, Aug 27, 2012 at 9:42 AM, Jakub Jelinek  wrote:
> http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01100.html
>   - C++ -Wsizeof-pointer-memaccess support (C is already in)
>
> http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01376.html
>   - valtrack ICE fix

This one is ok.

Thanks,
Richard.

> Jakub

Re: [PATCH] Fix PR 53395: tree-if-conv.c not producing MAX_EXPR

2012-09-03 Thread Richard Guenther

On Tue, Aug 28, 2012 at 9:09 AM, Andrew Pinski  wrote:
> On Mon, May 21, 2012 at 2:56 AM, Richard Guenther
>  wrote:
>> On Sun, May 20, 2012 at 1:40 AM, Andrew Pinski  wrote:
>>> The problem here is that tree-if-conv.c produces COND_EXPR instead of
>>> the MAX/MIN EXPRs.  When I added the expansion from COND_EXPR to
> >>> conditional moves, I assumed that the expressions which should have
>>> been converted into MAX_EXPR/MIN_EXPR have already happened.
>>>
>>> This fixes the problem by having tree-if-conv fold the expression so
>>> the MIN_EXPR/MAX_EXPR appears in the IR rather than COND_EXPR and the
>>> expansion happens correctly to the min/max rtl rather than just
>>> through the conditional move ones.
>>>
>>> OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>>
>> As we are unconditionally building a gimple_assign from the folding result
>> you need to re-gimplify it.  The code was using build3 instead of fold_build3
>> to avoid that and to make sure we create a plain COND_EXPR we can
>> later CSE / simplify properly.  Generating a MAX_EXPR directly is certainly
>> fine (can we always vectorize that?), but I suppose simply changing the
> >> code to use fold_build3 will have unwanted fallout (consider fold's habit
> >> to insert conversions that are not required on gimple).
>>
>> So I'd rather prefer to abstract the build3 (COND_EXPR,... into a
>> helper function that uses fold_ternary and only if the result is an
>> invariant/register or a MIN/MAX_EXPR use the result, canonicalizing
>> it properly.
>
> Here is an updated patch which does exactly that.
>
> Note I noticed another regression dealing with my expansion for
> COND_EXPR dealing with min/max and I will be submitting patches for
> those issue tomorrow.
>
> OK? Bootstrap and tested on x86_64-linux-gnu.

Ok.

Thanks,
Richard.

> Thanks,
> Andrew Pinski
>
> ChangeLog:
> * tree-if-conv.c (constant_or_ssa_name): New function.
> (fold_build_cond_expr): New function.
> (predicate_scalar_phi): Use fold_build_cond_expr instead of build3.
> (predicate_mem_writes): Likewise.

[Committed] S/390: Use load relative to access GOT slots

2012-09-03 Thread Andreas Krebbel

Hi,

so far we have not been using load relative when accessing GOT slots.
Fixed with the attached patch.

Committed to mainline after testing on s390 and s390x.

Bye,

-Andreas-

2012-09-03  Andreas Krebbel  

* config/s390/s390.c (s390_loadrelative_operand_p): New function.
(s390_check_qrst_address, print_operand_address): Use
s390_loadrelative_operand_p instead of s390_symref_operand_p.
(s390_check_symref_alignment): Accept pointer size alignment for GOT 
slots.
(legitimize_pic_address): Use load relative on z10 or later.

---
 gcc/config/s390/s390.c |   36 +++-
 1 file changed, 23 insertions(+), 1 deletion(-), 12 modifications(!)

Index: gcc/config/s390/s390.c
===
*** gcc/config/s390/s390.c.orig
--- gcc/config/s390/s390.c
*** s390_symref_operand_p (rtx addr, rtx *sy
*** 2123,2128 
--- 2123,2144 
return true;
  }
  
+ /* Return TRUE if ADDR is an operand valid for a load/store relative
+instructions.  Be aware that the alignment of the operand needs to
+be checked separately.  */
+ static bool
+ s390_loadrelative_operand_p (rtx addr)
+ {
+   if (GET_CODE (addr) == CONST)
+ addr = XEXP (addr, 0);
+ 
+   /* Enable load relative for symbol@GOTENT.  */
+   if (GET_CODE (addr) == UNSPEC
+   && XINT (addr, 1) == UNSPEC_GOTENT)
+ return true;
+ 
+   return s390_symref_operand_p (addr, NULL, NULL);
+ }
  
  /* Return true if the address in OP is valid for constraint letter C
 if wrapped in a MEM rtx.  Set LIT_POOL_OK to true if it literal
*** s390_check_qrst_address (char c, rtx op,
*** 2137,2143 
  
/* This check makes sure that no symbolic address (except literal
   pool references) are accepted by the R or T constraints.  */
!   if (s390_symref_operand_p (op, NULL, NULL))
  return 0;
  
/* Ensure literal pool references are only accepted if LIT_POOL_OK.  */
--- 2153,2159 
  
/* This check makes sure that no symbolic address (except literal
   pool references) are accepted by the R or T constraints.  */
!   if (s390_loadrelative_operand_p (op))
  return 0;
  
/* Ensure literal pool references are only accepted if LIT_POOL_OK.  */
*** s390_check_symref_alignment (rtx addr, H
*** 2941,2946 
--- 2957,2969 
HOST_WIDE_INT addend;
rtx symref;
  
+   /* Accept symbol@GOTENT with pointer size alignment.  */
+   if (GET_CODE (addr) == CONST
+   && GET_CODE (XEXP (addr, 0)) == UNSPEC
+   && XINT (XEXP (addr, 0), 1) == UNSPEC_GOTENT
+   && alignment <= UNITS_PER_LONG)
+ return true;
+ 
if (!s390_symref_operand_p (addr, &symref, &addend))
  return false;
  
*** legitimize_pic_address (rtx orig, rtx re
*** 3398,3406 
  
new_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, addr), 
UNSPEC_GOTENT);
new_rtx = gen_rtx_CONST (Pmode, new_rtx);
-   emit_move_insn (temp, new_rtx);
  
!   new_rtx = gen_const_mem (Pmode, temp);
emit_move_insn (reg, new_rtx);
new_rtx = reg;
  }
--- 3421,3434 
  
new_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, addr), 
UNSPEC_GOTENT);
new_rtx = gen_rtx_CONST (Pmode, new_rtx);
  
! if (!TARGET_Z10)
!   {
! emit_move_insn (temp, new_rtx);
! new_rtx = gen_const_mem (Pmode, temp);
!   }
! else
!   new_rtx = gen_const_mem (GET_MODE (reg), new_rtx);
emit_move_insn (reg, new_rtx);
new_rtx = reg;
  }
*** print_operand_address (FILE *file, rtx a
*** 5250,5256 
  {
struct s390_address ad;
  
!   if (s390_symref_operand_p (addr, NULL, NULL))
  {
if (!TARGET_Z10)
{
--- 5278,5284 
  {
struct s390_address ad;
  
!   if (s390_loadrelative_operand_p (addr))
  {
if (!TARGET_Z10)
{

Re: [PATCH 3/3] Compute predicates for phi node results in ipa-inline-analysis.c

2012-09-03 Thread Richard Guenther

On Fri, Aug 31, 2012 at 7:24 PM, Martin Jambor  wrote:
> Hi,
>
> On Thu, Aug 30, 2012 at 05:11:35PM +0200, Martin Jambor wrote:
>> this is a new version of the patch which makes ipa analysis produce
>> predicates for PHI node results, at least at the bottom of the
>> simplest diamond and semi-diamond CFG subgraphs.  This time I also
>> analyze the conditions again rather than extracting information from
>> CFG edges, which means I can reason about substantially more PHI
>> nodes.
>>
>> This patch makes us produce loop bounds hint for the pr48636.f90
>> testcase.
>>
>> Bootstrapped and tested on x86_64-linux.  OK for trunk?
>>
>> Thanks,
>>
>> Martin
>>
>>
>> 2012-08-29  Martin Jambor  
>>
>>   * ipa-inline-analysis.c (phi_result_unknown_predicate): New function.
>>   (predicate_for_phi_result): Likewise.
>>   (estimate_function_body_sizes): Use the above two functions.
>>
>
> This patch, on top of the one doing loop calculations almost always,
> introduces a number of testsuite failures which somehow I had not
> caught during my testing.  The problem is that either
> calculate_dominance_info or loop_optimizer_init introduce new SSA
> names for which there is no index in nonconstant_names which is
> allocated before the dominance and loop computations.  I'm currently
> bootstrapping and testing the following fix which simply allocates the
> vector after doing the two computations.  If it passes I will commit
> it straight away so that the regression is fixed before I leave for
> the weekend, I hope it's obvious enough for that.
>
> On the other hand, it would really be better if we did not change
> function bodies during IPA summary generation phase...

Um ... we shouldn't do this.  Can you track down where it happens?  I
suppose it might come from CFG manipulations loop_optimizer_init
performs when not passing AVOID_CFG_MODIFICATIONS.

Richard.

> Sorry for the breakage,
>
> Martin
>
>
> 2012-08-31  Martin Jambor  
>
> * ipa-inline-analysis.c (estimate_function_body_sizes): Allocate
> nonconstant_names after calculate_dominance_info and
> loop_optimizer_init.
>
> Index: src/gcc/ipa-inline-analysis.c
> ===
> --- src.orig/gcc/ipa-inline-analysis.c
> +++ src/gcc/ipa-inline-analysis.c
> @@ -2185,13 +2185,6 @@ estimate_function_body_sizes (struct cgr
>struct ipa_node_params *parms_info = NULL;
>VEC (predicate_t, heap) *nonconstant_names = NULL;
>
> -  if (ipa_node_params_vector && !early && optimize)
> -{
> -  parms_info = IPA_NODE_REF (node);
> -  VEC_safe_grow_cleared (predicate_t, heap, nonconstant_names,
> -VEC_length (tree, SSANAMES (my_function)));
> -}
> -
>info->conds = 0;
>info->entry = 0;
>
> @@ -2199,6 +2192,13 @@ estimate_function_body_sizes (struct cgr
>  {
>calculate_dominance_info (CDI_DOMINATORS);
>loop_optimizer_init (LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS);
> +
> +  if (ipa_node_params_vector)
> +   {
> + parms_info = IPA_NODE_REF (node);
> + VEC_safe_grow_cleared (predicate_t, heap, nonconstant_names,
> +VEC_length (tree, SSANAMES (my_function)));
> +   }
>  }
>
>if (dump_file)
>
>
>

Re: [patch] Make every GIMPLE switch have a default case, always

2012-09-03 Thread Richard Guenther

On Sat, Aug 25, 2012 at 1:14 AM, Steven Bosscher  wrote:
> Hello,
>
> This patch restores the old invariant that every GIMPLE switch has a
> default case. This invariant is only broken by the SJLJ exception
> dispatch code, and it's resulted in some code accepting a switch
> without a default while others still assume there is _always_ a
> default case. The patch enforces the invariant, fixes some fall-out,
> and cleans up the code in a couple of places. It makes the follow-up
> work on switch code generation that I still have planned a bit easier.
>
> Bootstrapped&tested on x86_64-unknown-linux-gnu (with Java to torture
> the exception handling code) and did a non-bootstrap build (including
> Java again) with --enable-sjlj-exceptions. OK for trunk?

Ok.

Thanks,
Richard.

> Ciao!
> Steven

Re: [PATCH, C] Mixed pointer types in call to streamer_tree_cache_lookup() in gcc/lto-streamer-out.c

2012-09-03 Thread Richard Guenther

On Sat, Sep 1, 2012 at 2:21 PM, Andris Pavenis  wrote:
> uint32_t * is used as the 3rd parameter in calls to
> streamer_tree_cache_lookup()
> in 2 places in gcc/lto-streamer-out.c while the procedure prototype has
> unsigned *. They are not guaranteed to be the same for all targets
> (I got an error when building for DJGPP)

Ok.

Thanks,
Richard.

> Andris
>
> ChangeLog entry
>
> 2012-09-01  Andris Pavenis 
>
> * lto-streamer-out.c (write_global_references,
> lto_output_decl_state_refs):
> Fix parameter type in call to streamer_tree_cache_lookup

Re: combine BIT_FIELD_REF and VEC_PERM_EXPR

2012-09-03 Thread Richard Guenther

On Sat, Sep 1, 2012 at 8:54 PM, Marc Glisse  wrote:
> Hello,
>
> this patch makes it so that instead of taking the k-th element of a shuffle
> of V, we directly take the k'-th element of V.
>
> Note that I am *not* checking that the shuffle is only used once. There can
> be some circumstances where this optimization will make us miss other
> opportunities later, but restricting it to the single use case would make it
> much less useful.
>
> This is also my first contact with BIT_FIELD_REF, I may have missed some
> properties of that tree.
>
> bootstrap+testsuite ok.
>
> 2012-09-01  Marc Glisse  
>
> gcc/
> * tree-ssa-forwprop.c (simplify_bitfield): New function.
> (ssa_forward_propagate_and_combine): Call it.
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/forwprop-21.c: New testcase.
>
> (-21 because I have another patch pending review that adds a -20 testcase)
> (simplify_bitfield appears before simplify_permutation to minimize conflicts
> with that same patch, I can put it back in order when I commit if you
> prefer)
>
> --
> Marc Glisse
> Index: testsuite/gcc.dg/tree-ssa/forwprop-21.c
> ===
> --- testsuite/gcc.dg/tree-ssa/forwprop-21.c (revision 0)
> +++ testsuite/gcc.dg/tree-ssa/forwprop-21.c (revision 0)
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-optimized" } */
> +typedef int v4si __attribute__ ((vector_size (4 * sizeof(int))));
> +
> +int
> +test (v4si *x, v4si *y)
> +{
> +  v4si m = { 2, 3, 6, 5 };
> +  v4si z = __builtin_shuffle (*x, *y, m);
> +  return z[2];
> +}
> +/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
>
> Property changes on: testsuite/gcc.dg/tree-ssa/forwprop-21.c
> ___
> Added: svn:eol-style
>+ native
> Added: svn:keywords
>+ Author Date Id Revision URL
>
> Index: tree-ssa-forwprop.c
> ===
> --- tree-ssa-forwprop.c (revision 190847)
> +++ tree-ssa-forwprop.c (working copy)
> @@ -2570,20 +2570,88 @@ combine_conversions (gimple_stmt_iterato
>   gimple_assign_set_rhs_code (stmt, CONVERT_EXPR);
>   update_stmt (stmt);
>   return remove_prop_source_from_use (op0) ? 2 : 1;
> }
> }
>  }
>
>return 0;
>  }
>
> +/* Combine an element access with a shuffle.  Returns true if there were
> +   any changes made, else it returns false.  */
> +
> +static bool
> +simplify_bitfield (gimple_stmt_iterator *gsi)
> +{
> +  gimple stmt = gsi_stmt (*gsi);
> +  gimple def_stmt;
> +  tree op, op0, op1, op2;
> +  unsigned idx, n, size;
> +  enum tree_code code;
> +
> +  op = gimple_assign_rhs1 (stmt);
> +  gcc_checking_assert (TREE_CODE (op) == BIT_FIELD_REF);
> +
> +  op0 = TREE_OPERAND (op, 0);
> +  op1 = TREE_OPERAND (op, 1);
> +  op2 = TREE_OPERAND (op, 2);
> +
> +  if (TREE_CODE (TREE_TYPE (op0)) != VECTOR_TYPE)
> +return false;
> +
> +  size = TREE_INT_CST_LOW (TYPE_SIZE (TREE_TYPE (TREE_TYPE (op0))));
> +  n = TREE_INT_CST_LOW (op1) / size;
> +  idx = TREE_INT_CST_LOW (op2) / size;
> +
> +  if (n != 1)
> +return false;
> +
> +  if (TREE_CODE (op0) != SSA_NAME)
> +return false;

Please do the early outs where you compute the arguments.  Thus, right
after getting op0 in this case or right after computing n for the n != 1 check.

I think you need to verify that the type of 'op' is actually the element type
of op0.  The BIT_FIELD_REF can happily access elements two and three
of { 1, 2, 3, 4 } as a long for example.  See the BIT_FIELD_REF foldings
in fold-const.c.
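
As a rough illustration of the concern, the sketch below models only the size and position of an access in bits. This is an assumption-laden simplification: the check requested above compares types, which sizes alone cannot express, and the function names are hypothetical. It does show that a test based on integer division, like n != 1 in the patch, accepts accesses that are not exactly one element.

```c
#include <assert.h>
#include <stdbool.h>

/* True when an access of SIZE_BITS at bit offset POS_BITS covers exactly
   one element of ELEM_BITS bits.  */
bool
is_single_element_access (unsigned size_bits, unsigned pos_bits,
                          unsigned elem_bits)
{
  assert (elem_bits != 0);
  return size_bits == elem_bits && pos_bits % elem_bits == 0;
}

/* The element-count test from the patch (n == 1), for comparison.  */
bool
passes_count_check (unsigned size_bits, unsigned elem_bits)
{
  assert (elem_bits != 0);
  return size_bits / elem_bits == 1;
}
```

For 32-bit elements, a 48-bit access passes the count check because 48 / 32 == 1, yet it is not a single element; a 64-bit access at offset 64 is the "elements two and three as a long" case and covers two elements.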

> +  def_stmt = SSA_NAME_DEF_STMT (op0);
> +  if (!def_stmt || !is_gimple_assign (def_stmt)
> +  || !can_propagate_from (def_stmt))
> +return false;
> +
> +  code = gimple_assign_rhs_code (def_stmt);
> +
> +  if (code == VEC_PERM_EXPR)
> +{
> +  tree p, m, index, tem;
> +  unsigned nelts;
> +  m = gimple_assign_rhs3 (def_stmt);
> +  if (TREE_CODE (m) != VECTOR_CST)
> +   return false;
> +  nelts = VECTOR_CST_NELTS (m);
> +  idx = TREE_INT_CST_LOW (VECTOR_CST_ELT (m, idx));
> +  idx %= 2 * nelts;
> +  if (idx < nelts)
> +   {
> + p = gimple_assign_rhs1 (def_stmt);
> +   }
> +  else
> +   {
> + p = gimple_assign_rhs2 (def_stmt);
> + idx -= nelts;
> +   }
> +  index = build_int_cst (TREE_TYPE (TREE_TYPE (m)), idx * size);
> +  tem = fold_build3 (BIT_FIELD_REF, TREE_TYPE (op), p, op1, index);

This shouldn't simplify, so you can use build3 instead.  Please also add
handling of code == CONSTRUCTOR.

Thanks,
Richard.

> +  gimple_assign_set_rhs1 (stmt, tem);
> +  update_stmt (stmt);
> +  return true;
> +}
> +
> +  return false;
> +}
> +
>  /* Determine whether applying the 2 permutations (mask1 then mask2)
> gives back one of t
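
The index mapping performed in the VEC_PERM_EXPR branch of the quoted patch can be sketched on plain arrays. This is an illustration only, with hypothetical names; it mirrors the `idx %= 2 * nelts` step and the choice between the first and second shuffle operand.

```c
#include <assert.h>

/* Which shuffle input an element comes from, and at which index.  */
struct lane
{
  unsigned vec;  /* 0 = first operand, 1 = second operand.  */
  unsigned idx;  /* Element index within that operand.  */
};

/* Resolve element K of a two-operand shuffle with constant MASK of
   NELTS entries to the input element it reads.  */
struct lane
resolve_shuffle_lane (const unsigned *mask, unsigned nelts, unsigned k)
{
  struct lane r;
  unsigned idx;

  assert (nelts > 0 && k < nelts);
  /* Selector values are taken modulo twice the vector length.  */
  idx = mask[k] % (2 * nelts);
  if (idx < nelts)
    {
      r.vec = 0;
      r.idx = idx;
    }
  else
    {
      r.vec = 1;
      r.idx = idx - nelts;
    }
  return r;
}
```

For the new test case, the mask is { 2, 3, 6, 5 } and the function returns z[2]; mask[2] is 6, which resolves to element 2 of the second operand, so z[2] can be read directly from *y without building the shuffle.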

Re: Fold VEC_PERM_EXPR a little more

2012-09-03 Thread Richard Guenther

On Sat, Sep 1, 2012 at 4:38 PM, Marc Glisse  wrote:
> Hello,
>
> I noticed while writing the patch posted at:
> http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01755.html
>
> that fold_ternary_loc sometimes doesn't go all the way for VEC_PERM_EXPR,
> calling it a second time would fold even more. Fixed by a simple reordering.
>
> (bootstrap and testsuite are ok)

Ok.

Thanks,
Richard.

> 2012-09-01  Marc Glisse  
>
> gcc/
> * fold-const.c (fold_ternary_loc): Constant-propagate after
> removing dead operands.
>
> gcc/testsuite/
> * gcc.dg/fold-perm.c: Improve test.
>
> --
> Marc Glisse
> Index: fold-const.c
> ===
> --- fold-const.c(revision 190845)
> +++ fold-const.c(working copy)
> @@ -14189,40 +14189,40 @@ fold_ternary_loc (location_t loc, enum t
> }
>
>   if (maybe_identity)
> {
>   if (all_in_vec0)
> return op0;
>   if (all_in_vec1)
> return op1;
> }
>
> - if ((TREE_CODE (arg0) == VECTOR_CST
> -  || TREE_CODE (arg0) == CONSTRUCTOR)
> - && (TREE_CODE (arg1) == VECTOR_CST
> - || TREE_CODE (arg1) == CONSTRUCTOR))
> -   {
> - t = fold_vec_perm (type, arg0, arg1, sel);
> - if (t != NULL_TREE)
> -   return t;
> -   }
> -
>   if (all_in_vec0)
> op1 = op0;
>   else if (all_in_vec1)
> {
>   op0 = op1;
>   for (i = 0; i < nelts; i++)
> sel[i] -= nelts;
>   need_mask_canon = true;
> }
>
> + if ((TREE_CODE (op0) == VECTOR_CST
> +  || TREE_CODE (op0) == CONSTRUCTOR)
> + && (TREE_CODE (op1) == VECTOR_CST
> + || TREE_CODE (op1) == CONSTRUCTOR))
> +   {
> + t = fold_vec_perm (type, op0, op1, sel);
> + if (t != NULL_TREE)
> +   return t;
> +   }
> +
>   if (op0 == op1 && !single_arg)
> changed = true;
>
>   if (need_mask_canon && arg2 == op2)
> {
>   tree *tsel = XALLOCAVEC (tree, nelts);
>   tree eltype = TREE_TYPE (TREE_TYPE (arg2));
>   for (i = 0; i < nelts; i++)
> tsel[i] = build_int_cst (eltype, sel[i]);
>   op2 = build_vector (TREE_TYPE (arg2), tsel);
> Index: testsuite/gcc.dg/fold-perm.c
> ===
> --- testsuite/gcc.dg/fold-perm.c(revision 190845)
> +++ testsuite/gcc.dg/fold-perm.c(working copy)
> @@ -1,19 +1,20 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O -fdump-tree-ccp1" } */
>
>  typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
>
> -void fun (veci *f, veci *g, veci *h)
> +void fun (veci *f, veci *g, veci *h, veci *i)
>  {
>veci m = { 7, 7, 4, 6 };
>veci n = { 0, 1, 2, 3 };
>veci p = { 1, 1, 7, 6 };
> +  *i = __builtin_shuffle (*i,  p, m);
>*h = __builtin_shuffle (*h, *h, p);
>*g = __builtin_shuffle (*f, *g, m);
>*f = __builtin_shuffle (*f, *g, n);
>  }
>
>  /* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 3, 3, 0, 2 }" "ccp1" } }
> */
>  /* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 1, 1, 3, 2 }" "ccp1" } }
> */
>  /* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 2 "ccp1" } } */
>  /* { dg-final { cleanup-tree-dump "ccp1" } } */
>
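
The effect of the reordering can be followed on plain arrays. The sketch below is not the fold-const.c code: it uses hypothetical names and assumes the two operands are distinct. It canonicalizes a selector that only reads the second operand and then applies it to that operand, which is the order the patch establishes (drop the dead operand first, constant-fold afterwards).

```c
#include <assert.h>
#include <stdbool.h>

enum operand_use { USES_BOTH, ALL_IN_VEC0, ALL_IN_VEC1 };

/* Reduce SEL modulo 2 * NELTS and report which operands it reads.  When
   only the second operand is read, rebase the selector onto it.  */
enum operand_use
canonicalize_selector (unsigned *sel, unsigned nelts)
{
  bool all_in_vec0 = true, all_in_vec1 = true;
  unsigned i;

  assert (nelts > 0);
  for (i = 0; i < nelts; i++)
    {
      sel[i] %= 2 * nelts;
      if (sel[i] < nelts)
        all_in_vec1 = false;
      else
        all_in_vec0 = false;
    }
  if (all_in_vec0)
    return ALL_IN_VEC0;
  if (all_in_vec1)
    {
      for (i = 0; i < nelts; i++)
        sel[i] -= nelts;
      return ALL_IN_VEC1;
    }
  return USES_BOTH;
}

/* Apply a single-operand selector SEL to VEC, writing NELTS results.  */
void
permute_one (const int *vec, const unsigned *sel, unsigned nelts, int *out)
{
  unsigned i;

  for (i = 0; i < nelts; i++)
    {
      assert (sel[i] < nelts);
      out[i] = vec[sel[i]];
    }
}
```

With m = { 7, 7, 4, 6 } the selector only reads the second operand and becomes { 3, 3, 0, 2 }, the mask the existing scan-tree-dump line expects. In the added statement `*i = __builtin_shuffle (*i, p, m)` that second operand is the constant p = { 1, 1, 7, 6 }, so once the unused first operand is out of the way the whole shuffle reduces to the constant { 6, 6, 1, 7 }.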

[PATCH] Fix parts of PR54362

2012-09-03 Thread Richard Guenther


Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2012-09-03  Richard Guenther  

PR tree-optimization/54362
* tree-ssa-structalias.c (find_func_aliases): Handle COND_EXPR.

Index: gcc/tree-ssa-structalias.c
===
--- gcc/tree-ssa-structalias.c  (revision 190620)
+++ gcc/tree-ssa-structalias.c  (working copy)
@@ -4527,6 +4527,18 @@ find_func_aliases (gimple origt)
 && !POINTER_TYPE_P (TREE_TYPE (rhsop))))
   || gimple_assign_single_p (t))
get_constraint_for_rhs (rhsop, &rhsc);
+ else if (code == COND_EXPR)
+   {
+ /* The result is a merge of both COND_EXPR arms.  */
+ VEC (ce_s, heap) *tmp = NULL;
+ struct constraint_expr *rhsp;
+ unsigned i;
+ get_constraint_for_rhs (gimple_assign_rhs2 (t), &rhsc);
+ get_constraint_for_rhs (gimple_assign_rhs3 (t), &tmp);
+ FOR_EACH_VEC_ELT (ce_s, tmp, i, rhsp)
+   VEC_safe_push (ce_s, heap, rhsc, rhsp);
+ VEC_free (ce_s, heap, tmp);
+   }
  else if (truth_value_p (code))
/* Truth value results are not pointer (parts).  Or at least
   very very unreasonable obfuscation of a part.  */
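
The "merge of both arms" in the hunk above amounts to concatenating two constraint lists. The following is a minimal sketch of that idea using fixed-size arrays and integer IDs instead of GCC's VEC API and constraint_expr; all names and the capacity limit are hypothetical.

```c
#include <assert.h>

#define MAX_CONSTRAINTS 16

/* A small list of constraint IDs standing in for VEC (ce_s, heap).  */
struct constraint_list
{
  int items[MAX_CONSTRAINTS];
  unsigned len;
};

/* Append every entry of SRC to DST.  */
void
append_all (struct constraint_list *dst, const struct constraint_list *src)
{
  unsigned i;

  assert (dst->len + src->len <= MAX_CONSTRAINTS);
  for (i = 0; i < src->len; i++)
    dst->items[dst->len++] = src->items[i];
}

/* The value of "cond ? a : b" may be either arm, so its constraints are
   those of A followed by those of B; the condition contributes nothing.  */
void
constraints_for_cond_expr (const struct constraint_list *then_arm,
                           const struct constraint_list *else_arm,
                           struct constraint_list *result)
{
  result->len = 0;
  append_all (result, then_arm);
  append_all (result, else_arm);
}
```

This mirrors the shape of the patch, which collects the constraints of rhs2 into rhsc, those of rhs3 into a temporary, and then pushes the temporary's entries onto rhsc.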

Re: Speedup loop header copying [part of PR 46590]

2012-09-03 Thread Richard Guenther

On Sun, Sep 2, 2012 at 9:35 PM, Michael Matz  wrote:
> Hi,
>
> as the bug report tells us one speed problem is loop header copying, in
> particular the update_ssa call that is done for each and every copied loop
> header but touches all blocks in a function.
>
> Now, one idea was to use an optimized update_ssa that works only on the
> relevant subset of blocks (it's dominance frontiers that are the problem).
> I've experimented with the original formulation of frontiers as per Cytron
> which allows calculating the domfrontier of one basic block lazily.  The
> end result was none, no speedup, no slowdown.  I haven't investigated,
> but I guess the problem is that too often most of the blocks are relevant
> for most of the header copies, either because of virtual ops or because
> calculating the domfrontier of a block needs all domfrontiers of all
> dominated children (i.e. the domfrontier of the entry needs domfrontiers of
> all blocks).  So, no cake there.
>
> But actually there's no reason that we need to keep SSA form up to date
> during the multiple header copyings.  We use gimple_duplicate_sese_region
> to do the copying which updates the SSA web before returning (actually
> loop header copying is the only caller of it).  The next thing done is
> just another call to gimple_duplicate_sese_region to copy some other BBs,
> then some split edges, repeat from start.  We can just as well defer the
> whole SSA web updating to after we've duplicated everything we want.
>
> That's what this patch does.  Time for various things on the testcase
> (with -O1):
>
>  without with patch
> tree SSA rewrite 26.2s 4.8s
> tree SSA incremental 21.7s 4.6s
> dominance computation15.0s 4.2s
> dominance frontiers  25.6s 6.7s
> TOTAL   135.6s67.8s
>
> Regstrapped on x86_64-linux, no regressions.  Okay for trunk?

Ok.

Thanks,
Richard.

>
> Ciao,
> Michael.
> --
> PR tree-optimization/46590
>
> * tree-cfg.c (gimple_duplicate_sese_region): Don't update
> SSA web here ...
> * tree-ssa-loop-ch.c (copy_loop_headers): ... but here.
>
> Index: tree-cfg.c
> ===
> --- tree-cfg.c  (revision 190803)
> +++ tree-cfg.c  (working copy)
> @@ -5530,9 +5530,10 @@ add_phi_args_after_copy (basic_block *re
> important exit edge EXIT.  By important we mean that no SSA name defined
> inside region is live over the other exit edges of the region.  All entry
> edges to the region must go to ENTRY->dest.  The edge ENTRY is redirected
> -   to the duplicate of the region.  SSA form, dominance and loop information
> -   is updated.  The new basic blocks are stored to REGION_COPY in the same
> -   order as they had in REGION, provided that REGION_COPY is not NULL.
> +   to the duplicate of the region.  Dominance and loop information is
> +   updated, but not the SSA web.  The new basic blocks are stored to
> +   REGION_COPY in the same order as they had in REGION, provided that
> +   REGION_COPY is not NULL.
> The function returns false if it is unable to copy the region,
> true otherwise.  */
>
> @@ -5593,8 +5594,6 @@ gimple_duplicate_sese_region (edge entry
>free_region_copy = true;
>  }
>
> -  gcc_assert (!need_ssa_update_p (cfun));
> -
>/* Record blocks outside the region that are dominated by something
>   inside.  */
>doms = NULL;
> @@ -5663,9 +5662,6 @@ gimple_duplicate_sese_region (edge entry
>/* Add the other PHI node arguments.  */
>add_phi_args_after_copy (region_copy, n_region, NULL);
>
> -  /* Update the SSA web.  */
> -  update_ssa (TODO_update_ssa);
> -
>if (free_region_copy)
>  free (region_copy);
>
> Index: tree-ssa-loop-ch.c
> ===
> --- tree-ssa-loop-ch.c  (revision 190803)
> +++ tree-ssa-loop-ch.c  (working copy)
> @@ -241,6 +241,7 @@ copy_loop_headers (void)
>split_edge (loop_latch_edge (loop));
>  }
>
> +  update_ssa (TODO_update_ssa);
>free (bbs);
>free (copied_bbs);
>

Re: [PATCH][build] Fix PR54138, make --without-cloog work

2012-09-03 Thread Richard Guenther

On Tue, Aug 14, 2012 at 1:27 PM, Richard Guenther  wrote:
>
> This makes --without-cloog and --without-isl disable GRAPHITE support
> as intended.
>
> Tested up to building stage2 with --without-isl, verified ISL was not
> used or checked for, tested up to building stage2 without --without-isl,
> verified system ISL was picked up.
>
> Ok for trunk?

Ping.

> Thanks,
> Richard.
>
> 2012-08-14  Richard Guenther  
>
> PR bootstrap/54138
> * configure.ac: Re-organize ISL / CLOOG checks to allow
> disabling with either --without-isl or --without-cloog.
> * configure: Regenerated.
> * config/cloog.m4: Adjust.
> * config/isl.m4: Adjust.
>
> Index: configure.ac
> ===
> *** configure.ac(revision 190376)
> --- configure.ac(working copy)
> *** AC_ARG_WITH(boot-ldflags,
> *** 1520,1563 
>fi])
>   AC_SUBST(poststage1_ldflags)
>
> ! # Check for ISL
> ! dnl Provide configure switches and initialize islinc & isllibs
> ! dnl with user input.
> ! ISL_INIT_FLAGS
> ! if test "x$with_isl" != "xno"; then
> dnl The minimal version of ISL required for Graphite.
> ISL_CHECK_VERSION(0,10)
> -
> dnl Only execute fail-action, if ISL has been requested.
> ISL_IF_FAILED([
>   AC_MSG_ERROR([Unable to find a usable ISL.  See config.log for 
> details.])])
> - fi
>
> ! # Check for CLOOG
> ! dnl Provide configure switches and initialize clooginc & clooglibs
> ! dnl with user input.
> ! CLOOG_INIT_FLAGS
> ! if test "x$isllibs" = x && test "x$islinc" = x; then
> !   clooglibs=
> !   clooginc=
> ! elif test "x$with_cloog" != "xno"; then
> !   dnl The minimal version of CLooG required for Graphite.
> !   dnl
> !   dnl If we use CLooG-Legacy, the provided version information is
> !   dnl ignored.
> !   CLOOG_CHECK_VERSION(0,17,0)
> !
> !   dnl Only execute fail-action, if CLooG has been requested.
> !   CLOOG_IF_FAILED([
> ! AC_MSG_ERROR([Unable to find a usable CLooG.  See config.log for 
> details.])])
>   fi
>
>   # If either the ISL or the CLooG check failed, disable builds of in-tree
>   # variants of both
> ! if test "x$clooglibs" = x && test "x$clooginc" = x; then
> noconfigdirs="$noconfigdirs cloog isl"
>   fi
>
>   # Check for LTO support.
>   AC_ARG_ENABLE(lto,
>   [AS_HELP_STRING([--enable-lto], [enable link time optimization support])],
> --- 1520,1590 
>fi])
>   AC_SUBST(poststage1_ldflags)
>
> ! # GCC GRAPHITE dependences, ISL and CLOOG which in turn requires ISL.
> ! # Basic setup is inlined here, actual checks are in config/cloog.m4 and
> ! # config/isl.m4
> !
> ! AC_ARG_WITH(cloog,
> !   [AS_HELP_STRING(
> ! [--with-cloog=PATH],
> ! [Specify prefix directory for the installed CLooG-ISL package.
> !  Equivalent to --with-cloog-include=PATH/include
> !  plus --with-cloog-lib=PATH/lib])])
> ! AC_ARG_WITH(isl,
> !   [AS_HELP_STRING(
> ![--with-isl=PATH],
> ![Specify prefix directory for the installed ISL package.
> ! Equivalent to --with-isl-include=PATH/include
> ! plus --with-isl-lib=PATH/lib])])
> !
> ! # Treat either --without-cloog or --without-isl as a request to disable
> ! # GRAPHITE support and skip all following checks.
> ! if test "x$with_isl" != "xno" &&
> !test "x$with_cloog" != "xno"; then
> !   # Check for ISL
> !   dnl Provide configure switches and initialize islinc & isllibs
> !   dnl with user input.
> !   ISL_INIT_FLAGS
> dnl The minimal version of ISL required for Graphite.
> ISL_CHECK_VERSION(0,10)
> dnl Only execute fail-action, if ISL has been requested.
> ISL_IF_FAILED([
>   AC_MSG_ERROR([Unable to find a usable ISL.  See config.log for 
> details.])])
>
> !   if test "x$gcc_cv_isl" != "xno"; then
> ! # Check for CLOOG
> ! dnl Provide configure switches and initialize clooginc & clooglibs
> ! dnl with user input.
> ! CLOOG_INIT_FLAGS
> ! dnl The minimal version of CLooG required for Graphite.
> ! dnl
> ! dnl If we use CLooG-Legacy, the provided version information is
> ! dnl ignored.
> ! CLOOG_CHECK_VERSION(0,17,0)
> !
> ! dnl Only execute fail-action, if CLooG has been requested.
> ! CLOOG_IF_FAILED([
> !   AC_MSG_ERROR([Unable to find a usable CLooG.  See config.log for 
> details.])])
> !   fi
>   fi
>
>   # If either the ISL or the CLooG check failed, disable builds of in-tree
>   # variants of both
> ! if test "x$with_isl" == xno ||
> !test "x$with_cloog" == xno ||
> !test "x$gcc_cv_cloog" = xno ||
> !test "x$gcc_cv_isl" = xno; then
> noconfigdirs="$noconfigdirs cloog isl"
> +   islinc=
> +   clooginc=
> +   clooglibs=
>   fi
>
> + AC_SUBST(islinc)
> + AC_SUBST(clooglibs)
> + AC_SUBST(clooginc)
> +
> +
>   # Check for LTO support.
>   AC_ARG_ENABLE(lto,
>   [AS_HELP_STRING([--enable-lto], [enable link time optimization support])],
> Index: config/isl.m4
> =

Re: [PATCH][RFC] Add -Og

2012-09-03 Thread Richard Guenther

On Fri, Aug 10, 2012 at 1:30 PM, Richard Guenther  wrote:
>
> This adds a new optimization level, -Og, as previously discussed.
> It aims at providing fast compilation, a superior debugging
> experience and reasonable runtime performance.  Instead of making
> -O1 this optimization level this adds a new -Og.
>
> It's a first cut, highlighting that our fixed pass pipeline and
> simply enabling/disabling individual passes (but not pass copies
> for example) doesn't scale to properly differentiate between
> -Og and -O[23].  -O1 should get similar treatment, eventually
> just building on -Og but not focusing on debugging experience.
> That is, I expect that in the end we will at least have two post-IPA
> optimization pipelines.  It also means that you cannot enable
> PRE or VRP with -Og at the moment because these passes are not
> anywhere scheduled (similar to the situation with -O0).
>
> It has some funny effect on dump-file naming of the pass copies
> though, which hints at that the current setup is too static.
> For that reason the new queue comes after the old, to not confuse
> too many testcases.
>
> It also does not yet disable any of the early optimizations that
> make debugging harder (SRA comes to my mind here, as does
> switch-conversion and partial inlining).
>
> The question arises if we want to support in any reasonable
> way using profile-feedback or LTO for -O[01g], thus if we
> rather want to delay some of the early opts to after IPA
> optimizations.
>
> Not bootstrapped or fully tested, but it works for the compile
> torture.
>
> Comments welcome,

No comments?  Then I'll drop this idea for 4.8.

Richard.

> Thanks,
> Richard.
>
> 2012-08-10  Richard Guenther  
>
> PR other/53316
> * common.opt (optimize_debug): New variable.
> (Og): New optimization level.
> * doc/invoke.texi (Og): Document.
> * opts.c (maybe_default_option): Add debug parameter.
> (maybe_default_options): Likewise.
> (default_options_optimization): Handle -Og.
> (common_handle_option): Likewise.
> * passes.c (gate_all_optimizations): Do not run with -Og.
> (gate_all_optimizations_g): New gate, run with -Og.
> (pass_all_optimizations_g): New container pass, run with -Og.
> (init_optimization_passes): Schedule pass_all_optimizations_g
> alongside pass_all_optimizations.
>
> * gcc/testsuite/lib/c-torture.exp: Add -Og -g to default
> TORTURE_OPTIONS.
>
> Index: trunk/gcc/common.opt
> ===
> *** trunk.orig/gcc/common.opt   2012-07-19 10:39:47.0 +0200
> --- trunk/gcc/common.opt2012-08-10 11:58:22.218122816 +0200
> *** int optimize
> *** 32,37 
> --- 32,40 
>   Variable
>   int optimize_size
>
> + Variable
> + int optimize_debug
> +
>   ; Not used directly to control optimizations, only to save -Ofast
>   ; setting for "optimize" attributes.
>   Variable
> *** Ofast
> *** 446,451 
> --- 449,458 
>   Common Optimization
>   Optimize for speed disregarding exact standards compliance
>
> + Og
> + Common Optimization
> + Optimize for debugging experience rather than speed or size
> +
>   Q
>   Driver
>
> Index: trunk/gcc/opts.c
> ===
> *** trunk.orig/gcc/opts.c   2012-07-24 10:35:57.0 +0200
> --- trunk/gcc/opts.c2012-08-10 12:48:38.986018411 +0200
> *** init_options_struct (struct gcc_options
> *** 314,328 
>   }
>
>   /* If indicated by the optimization level LEVEL (-Os if SIZE is set,
> !-Ofast if FAST is set), apply the option DEFAULT_OPT to OPTS and
> !OPTS_SET, diagnostic context DC, location LOC, with language mask
> !LANG_MASK and option handlers HANDLERS.  */
>
>   static void
>   maybe_default_option (struct gcc_options *opts,
>   struct gcc_options *opts_set,
>   const struct default_options *default_opt,
> ! int level, bool size, bool fast,
>   unsigned int lang_mask,
>   const struct cl_option_handlers *handlers,
>   location_t loc,
> --- 314,328 
>   }
>
>   /* If indicated by the optimization level LEVEL (-Os if SIZE is set,
> !-Ofast if FAST is set, -Og if DEBUG is set), apply the option DEFAULT_OPT
> !to OPTS and OPTS_SET, diagnostic context DC, location LOC, with language
> !mask LANG_MASK and option handlers HANDLERS.  */
>
>   static void
>   maybe_default_option (struct gcc_options *opts,
>   struct gcc_options *opts_set,
>   const struct default_options *default_opt,
> ! int level, bool size, bool fast, bool debug,
>   unsigned int lang_mask,
>   const struct cl_option_handlers *handlers,
>   location_t loc,
> *

Re: combine BIT_FIELD_REF and VEC_PERM_EXPR

2012-09-03 Thread Marc Glisse


On Mon, 3 Sep 2012, Richard Guenther wrote:

Please do the early outs where you compute the arguments.  Thus, right 
after getting op0 in this case or right after computing n for the n != 1 
check.


Ok.


I think you need to verify that the type of 'op' is actually the element type
of op0.  The BIT_FIELD_REF can happily access elements two and three
of { 1, 2, 3, 4 } as a long for example.


Indeed I missed that.
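
To illustrate the check being asked for, here is a standalone sketch in plain C, not GCC code. It only models widths and bit positions, whereas the real test would compare the type of 'op' with the element type of op0. The function name and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when a reference of REF_BITS bits starting at bit POS_BITS
   selects exactly one element of a vector whose elements are ELT_BITS wide.
   All names here are illustrative; none of them exist in GCC.  */
bool
ref_selects_one_element (unsigned ref_bits, unsigned pos_bits,
                         unsigned elt_bits)
{
  assert (elt_bits != 0);
  /* The access must be as wide as one element and start on an element
     boundary; otherwise it straddles or spans several elements.  */
  return ref_bits == elt_bits && pos_bits % elt_bits == 0;
}
```

With 32-bit elements, a 64-bit read at bit 64 (elements two and three of { 1, 2, 3, 4 } read as a long) is rejected, while a 32-bit read at bit 64 is accepted.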


See the BIT_FIELD_REF foldings in fold-const.c.


That's what I was looking at (picked the same variable names size, idx, n) 
but I forgot that test :-(



+  if (code == VEC_PERM_EXPR)
+{
+  tree p, m, index, tem;
+  unsigned nelts;
+  m = gimple_assign_rhs3 (def_stmt);
+  if (TREE_CODE (m) != VECTOR_CST)
+   return false;
+  nelts = VECTOR_CST_NELTS (m);
+  idx = TREE_INT_CST_LOW (VECTOR_CST_ELT (m, idx));
+  idx %= 2 * nelts;
+  if (idx < nelts)
+   {
+ p = gimple_assign_rhs1 (def_stmt);
+   }
+  else
+   {
+ p = gimple_assign_rhs2 (def_stmt);
+ idx -= nelts;
+   }
+  index = build_int_cst (TREE_TYPE (TREE_TYPE (m)), idx * size);
+  tem = fold_build3 (BIT_FIELD_REF, TREE_TYPE (op), p, op1, index);


This shouldn't simplify, so you can use build3 instead.


I think that it is possible for p to be a VECTOR_CST, if the shuffle 
involves one constant and one non-constant vectors, no?


Now that I look at this line, I wonder if I am missing some unshare_expr 
for p and/or op1.
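
For readers following the hunk above, the operand selection it performs can be modelled in plain C. This is only a sketch under stated assumptions: the struct and function names are hypothetical, the mask is a plain array instead of a VECTOR_CST, and nothing here touches GCC's tree API.

```c
#include <assert.h>

/* Result of looking through a constant permutation mask.  */
struct perm_pick
{
  int operand;          /* 0: first shuffle input, 1: second input */
  unsigned bit_offset;  /* bit position of the element in that input */
};

/* Model of the quoted logic: element IDX of the shuffle result comes from
   mask[IDX] taken modulo 2 * NELTS; values below NELTS select from the
   first input, the rest from the second input after subtracting NELTS.
   ELT_BITS plays the role of 'size' in the patch.  */
struct perm_pick
pick_through_perm (const unsigned *mask, unsigned nelts, unsigned idx,
                   unsigned elt_bits)
{
  struct perm_pick r;
  unsigned sel;

  assert (nelts != 0 && idx < nelts);
  sel = mask[idx] % (2 * nelts);
  if (sel < nelts)
    r.operand = 0;
  else
    {
      r.operand = 1;
      sel -= nelts;
    }
  r.bit_offset = sel * elt_bits;
  return r;
}
```

For a two-element mask { 3, 0 } with 64-bit elements, result element 0 maps to the second input at bit offset 64, and result element 1 maps to the first input at bit offset 0.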



Please also add handling of code == CONSTRUCTOR.


The cases I tried were already handled by fre1. I can add code for 
constructor, but I'll need to look for a testcase first. Can that go to a 
different patch?


--
Marc Glisse

Re: combine vec_perm_expr with constructor

2012-09-03 Thread Richard Guenther

On Sat, Aug 25, 2012 at 10:54 PM, Marc Glisse  wrote:
> Hello,
>
> this patch (bootstrapped and regtested on x86_64) deals with the same issue
> as the one at:
>
> http://gcc.gnu.org/ml/gcc-patches/2012-08/msg00205.html
>
> that is combining a shuffle of a constructor into a constructor, but at the
> tree-ssa level. An advantage is that it works with any size of vectors (the
> RTL patch only handles size 2 IIRC). A drawback is that it only applies to
> __builtin_shuffle, not the builtins that the x86 front-end uses.
>
> Note that fold already knew this optimization (my thanks to whoever wrote
> it, it helped a lot), it just never got a chance to apply it.
>
> In the call to fold_ternary, I am not sure if TREE_TYPE(op0) is the right
> argument, I could also use the type of the lhs, I don't know if that would
> make any difference.
>
>
> While I am here, I would like to write a patch that converts w={v[1],v[0]}
> to a builtin_shuffle (under the same conditions where a builtin_shuffle is
> not lowered to a constructor of elements, obviously). Is forwprop still the
> right pass to add it, or is there another more relevant one?
>
>
> 2012-08-22  Marc Glisse  
>
> gcc/
> * tree-ssa-forwprop.c (simplify_permutation): Handle CONSTRUCTOR.
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/forwprop-20.c: New testcase.
>
> --
> Marc Glisse
> Index: testsuite/gcc.dg/tree-ssa/forwprop-20.c
> ===
> --- testsuite/gcc.dg/tree-ssa/forwprop-20.c (revision 0)
> +++ testsuite/gcc.dg/tree-ssa/forwprop-20.c (revision 0)
> @@ -0,0 +1,70 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target double64 } */
> +/* { dg-options "-O2 -fdump-tree-optimized" }  */
> +
> +#include <stdint.h>
> +
> +/* All of these optimizations happen for unsupported vector modes as a
> +   consequence of the lowering pass. We need to test with a vector mode
> +   that is supported by default on at least some architectures, or make
> +   the test target specific so we can pass a flag like -mavx.  */
> +
> +typedef double vecf __attribute__ ((vector_size (2 * sizeof (double))));
> +typedef int64_t veci __attribute__ ((vector_size (2 * sizeof (int64_t))));
> +
> +void f (double d, vecf* r)
> +{
> +  vecf x = { -d, 5 };
> +  vecf y = {  1, 4 };
> +  veci m = {  2, 0 };
> +  *r = __builtin_shuffle (x, y, m); // { 1, -d }
> +}
> +
> +void g (float d, vecf* r)
> +{
> +  vecf x = { d, 5 };
> +  vecf y = { 1, 4 };
> +  veci m = { 2, 1 };
> +  *r = __builtin_shuffle (x, y, m); // { 1, 5 }
> +}
> +
> +void h (double d, vecf* r)
> +{
> +  vecf x = { d + 1, 5 };
> +  vecf y = {   1  , 4 };
> +  veci m = {   2  , 0 };
> +  *r = __builtin_shuffle (y, x, m); // { d + 1, 1 }
> +}
> +
> +void i (float d, vecf* r)
> +{
> +  vecf x = { d, 5 };
> +  veci m = { 1, 0 };
> +  *r = __builtin_shuffle (x, m); // { 5, d }
> +}
> +
> +void j (vecf* r)
> +{
> +  vecf y = {  1, 2 };
> +  veci m = {  0, 0 };
> +  *r = __builtin_shuffle (y, m); // { 1, 1 }
> +}
> +
> +void k (vecf* r)
> +{
> +  vecf x = {  3, 4 };
> +  vecf y = {  1, 2 };
> +  veci m = {  3, 0 };
> +  *r = __builtin_shuffle (x, y, m); // { 2, 3 }
> +}
> +
> +void l (double d, vecf* r)
> +{
> +  vecf x = { -d, 5 };
> +  vecf y = {  d, 4 };
> +  veci m = {  2, 0 };
> +  *r = __builtin_shuffle (x, y, m); // { d, -d }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
>
> Property changes on: testsuite/gcc.dg/tree-ssa/forwprop-20.c
> ___
> Added: svn:keywords
>+ Author Date Id Revision URL
> Added: svn:eol-style
>+ native
>
> Index: tree-ssa-forwprop.c
> ===
> --- tree-ssa-forwprop.c (revision 190666)
> +++ tree-ssa-forwprop.c (working copy)
> @@ -2602,75 +2602,130 @@ is_combined_permutation_identity (tree m
>if (j == i)
> maybe_identity2 = false;
>else if (j == i + nelts)
> maybe_identity1 = false;
>else
> return 0;
>  }
>return maybe_identity1 ? 1 : maybe_identity2 ? 2 : 0;
>  }
>
> -/* Combine two shuffles in a row.  Returns 1 if there were any changes
> -   made, 2 if cfg-cleanup needs to run.  Else it returns 0.  */
> +/* Combine a shuffle with its arguments.  Returns 1 if there were any
> +   changes made, 2 if cfg-cleanup needs to run.  Else it returns 0.  */
>
>  static int
>  simplify_permutation (gimple_stmt_iterator *gsi)
>  {
>gimple stmt = gsi_stmt (*gsi);
>gimple def_stmt;
> -  tree op0, op1, op2, op3;
> -  enum tree_code code = gimple_assign_rhs_code (stmt);
> -  enum tree_code code2;
> +  tree op0, op1, op2, op3, arg0, arg1;
> +  enum tree_code code;
>
> -  gcc_checking_assert (code == VEC_PERM_EXPR);
> +  gcc_checking_assert (gimple_assign_rhs_code (stmt) == VEC_PERM_EXPR);
>
>op0 = gimple_assign_rhs1 (stmt);
>op1 = gimple_assign_rhs2 (stmt)

Re: combine vec_perm_expr with constructor

2012-09-03 Thread Marc Glisse


On Mon, 3 Sep 2012, Richard Guenther wrote:


You shouldn't need the VECTOR_CST handling - constant propagation should
already ensure properly simplified code here (and is the more canonical place
to handle this).


IIRC, I added VECTOR_CST because of mixed constructor/vector_cst shuffles 
(and because it wasn't too hard). If I remove it (I can), I guess some of 
the testcases won't work anymore.



You do work above and then bail late here.  Always do early exits early
to reduce useless compile-time.


Ok.


+  opt = fold_ternary (VEC_PERM_EXPR, TREE_TYPE(op0), arg0, arg1, op2);
+  if (!opt)
+   return 0;
+  gimple_assign_set_rhs_from_tree (gsi, opt);


You need to verify that fold_ternary returns something that is valid GIMPLE.
fold () in general happily returns trees that are in the need of
re-gimplification.
You expect a CONSTRUCTOR or VECTOR_CST here, so you should check
for that.


Ok.

Thank you for the reviews,

--
Marc Glisse

[Patch, Fortran, committed] Fix PR54467 (TBP ICEs due to _final wrapper disabling)

2012-09-03 Thread Tobias Burnus

Committed as obvious (Rev. 190892). My patch which disabled the 
generation of the FINAL wrapper subroutine cut out too much, namely also 
normal type-bound procedures. Fixed by the obvious patch.


(The wrapper is currently disabled as it would be an ABI changing patch; 
the current plan is to enable it once the complete FINAL implementation 
is ready - and use a .mod version update to effectively force a 
recompilation.)


Tobias
Index: ChangeLog
===
--- ChangeLog	(Revision 190888)
+++ ChangeLog	(Arbeitskopie)
@@ -1,5 +1,11 @@
 2012-09-03  Tobias Burnus  
 
+	PR fortran/54467
+	* class.c (gfc_find_derived_vtab): Fix disabling of _final
+	by continuing to generate normal type-bound procedures.
+
+2012-09-03  Tobias Burnus  
+
 	* class.c (gfc_find_derived_vtab): Disable ABI-breaking
 	generation of the "_final" subroutine for now.
 
Index: class.c
===
--- class.c	(Revision 190888)
+++ class.c	(Arbeitskopie)
@@ -1634,10 +1634,10 @@ gfc_find_derived_vtab (gfc_symbol *derived)
 	  c->tb = XCNEW (gfc_typebound_proc);
 	  c->tb->ppc = 1;
 	  generate_finalization_wrapper (derived, ns, tname, c);
+	}
 
 	  /* Add procedure pointers for type-bound procedures.  */
 	  add_procs_to_declared_vtab (derived, vtype);
-	}
 	  }
 
 have_vtype:

Re: [PATCH][build] Fix PR54138, make --without-cloog work

2012-09-03 Thread Jakub Jelinek

On Mon, Sep 03, 2012 at 03:27:29PM +0200, Richard Guenther wrote:
> On Tue, Aug 14, 2012 at 1:27 PM, Richard Guenther  wrote:
> >
> > This makes --without-cloog and --without-isl disable GRAPHITE support
> > as intended.
> >
> > Tested up to building stage2 with --without-isl, verified ISL was not
> > used or checked for, tested up to building stage2 without --without-isl,
> > verified system ISL was picked up.
> >
> > Ok for trunk?
> 
> Ping.

Ok.

> > 2012-08-14  Richard Guenther  
> >
> > PR bootstrap/54138
> > * configure.ac: Re-organize ISL / CLOOG checks to allow
> > disabling with either --without-isl or --without-cloog.
> > * configure: Regenerated.
> > * config/cloog.m4: Adjust.
> > * config/isl.m4: Adjust.

Jakub

Re: combine BIT_FIELD_REF and VEC_PERM_EXPR

2012-09-03 Thread Richard Guenther

On Mon, Sep 3, 2012 at 3:39 PM, Marc Glisse  wrote:
> On Mon, 3 Sep 2012, Richard Guenther wrote:
>
>> Please do the early outs where you compute the arguments.  Thus, right
>> after getting op0 in this case or right after computing n for the n != 1
>> check.
>
>
> Ok.
>
>> I think you need to verify that the type of 'op' is actually the element
>> type
>> of op0.  The BIT_FIELD_REF can happily access elements two and three
>> of { 1, 2, 3, 4 } as a long for example.
>
>
> Indeed I missed that.
>
>
>> See the BIT_FIELD_REF foldings in fold-const.c.
>
>
> That's what I was looking at (picked the same variable names size, idx, n)
> but I forgot that test :-(
>
>
>>> +  if (code == VEC_PERM_EXPR)
>>> +{
>>> +  tree p, m, index, tem;
>>> +  unsigned nelts;
>>> +  m = gimple_assign_rhs3 (def_stmt);
>>> +  if (TREE_CODE (m) != VECTOR_CST)
>>> +   return false;
>>> +  nelts = VECTOR_CST_NELTS (m);
>>> +  idx = TREE_INT_CST_LOW (VECTOR_CST_ELT (m, idx));
>>> +  idx %= 2 * nelts;
>>> +  if (idx < nelts)
>>> +   {
>>> + p = gimple_assign_rhs1 (def_stmt);
>>> +   }
>>> +  else
>>> +   {
>>> + p = gimple_assign_rhs2 (def_stmt);
>>> + idx -= nelts;
>>> +   }
>>> +  index = build_int_cst (TREE_TYPE (TREE_TYPE (m)), idx * size);
>>> +  tem = fold_build3 (BIT_FIELD_REF, TREE_TYPE (op), p, op1, index);
>>
>>
>> This shouldn't simplify, so you can use build3 instead.
>
>
> I think that it is possible for p to be a VECTOR_CST, if the shuffle
> involves one constant and one non-constant vectors, no?

Well, constant propagation should have handled it ...

If you use fold_build3 you need to check that the result is in expected form
(a is_gimple_invariant or an SSA_NAME).

> Now that I look at this line, I wonder if I am missing some unshare_expr for
> p and/or op1.

If either is a CONSTRUCTOR and its def stmt is not removed and it survives
into tem then yes ...

>> Please also add handling of code == CONSTRUCTOR.
>
>
> The cases I tried were already handled by fre1. I can add code for
> constructor, but I'll need to look for a testcase first. Can that go to a
> different patch?

Yes.

Thanks,
Richard.

> --
> Marc Glisse

Re: combine vec_perm_expr with constructor

2012-09-03 Thread Richard Guenther

On Mon, Sep 3, 2012 at 4:00 PM, Marc Glisse  wrote:
> On Mon, 3 Sep 2012, Richard Guenther wrote:
>
>> You shouldn't need the VECTOR_CST handling - constant propagation should
>> already ensure properly simplified code here (and is the more canonical
>> place
>> to handle this).
>
>
> IIRC, I added VECTOR_CST because of mixed constructor/vector_cst shuffles
> (and because it wasn't too hard). If I remove it (I can), I guess some of
> the testcases won't work anymore.

I see.  If you still have a testcase can you look if CCP does not do something
it should?

>
>> You do work above and then bail late here.  Always do early exits early
>> to reduce useless compile-time.
>
>
> Ok.
>
>
>>> +  opt = fold_ternary (VEC_PERM_EXPR, TREE_TYPE(op0), arg0, arg1,
>>> op2);
>>> +  if (!opt)
>>> +   return 0;
>>> +  gimple_assign_set_rhs_from_tree (gsi, opt);
>>
>>
>> You need to verify that fold_ternary returns something that is valid
>> GIMPLE.
>> fold () in general happily returns trees that are in the need of
>> re-gimplification.
>> You expect a CONSTRUCTOR or VECTOR_CST here, so you should check
>> for that.
>
>
> Ok.
>
> Thank you for the reviews,
>
> --
> Marc Glisse

[Patch contrib] check_GNU_style: remove tmp file

2012-09-03 Thread Christophe Lyon

Hi,

check_GNU_style.sh currently leaves a temporary file in the current directory.
This patch removes it upon exit.

Christophe.

2012-09-03   Christophe Lyon  

* check_GNU_style.sh: Remove temporary file upon exit.


check-gnu-style.patch
Description: Binary data

Re: [PATCH][RFC] Add -Og

2012-09-03 Thread Andi Kleen

Richard Guenther  writes:
>>
>> Comments welcome,
>
> No comments?  Then I'll drop this idea for 4.8.

FWIW I liked the idea. But I'm not really competent to review the 
implementation.

On x86 I would enable frame pointers. Even though gdb doesn't need
them, some other profiling and debugging tools do.

Also I would add a bootstrap-Og build config to make
it easier to test for bootstrap (or maybe change the existing bootstrap-debug?)

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only

[Patch, ARM] Replace gen_rtx_CONST_INT by GEN_INT

2012-09-03 Thread Christophe Lyon

Hi,

As discussed in
http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00077.html, this patch is
a cleanup pass to replace calls to gen_rtx_CONST_INT by GEN_INT.

Tested on arm-linuxeabi with qemu.

Christophe.

2012-09-03  Christophe Lyon  

* config/arm/arm.c (arm_expand_builtin): Replace gen_rtx_CONST_INT
by GEN_INT.
(arm_emit_coreregs_64bit_shift): Likewise.


gen-int.patch
Description: Binary data

[patch,libgcc] fp-bit.c: filter-out LIB2FUNCS_EXCLUDE

2012-09-03 Thread Georg-Johann Lay

Currently LIB2FUNCS_EXCLUDE is ignored for the bits of libgcc*.a
that come from fp-bit.c; this patch fixes that.

Ok to install?

Johann

* Makefile.in (FPBIT_FUNCS): filter-out LIB2FUNCS_EXCLUDE.
(DPBIT_FUNCS): Ditto.
(TPBIT_FUNCS): Ditto.
Index: libgcc/Makefile.in
===
--- libgcc/Makefile.in	(revision 190873)
+++ libgcc/Makefile.in	(working copy)
@@ -516,6 +516,10 @@ FPBIT_FUNCS := $(filter-out _sf_to_tf,$(
 DPBIT_FUNCS := $(filter-out _df_to_tf,$(DPBIT_FUNCS))
 endif
 
+FPBIT_FUNCS := $(filter-out $(LIB2FUNCS_EXCLUDE),$(FPBIT_FUNCS))
+DPBIT_FUNCS := $(filter-out $(LIB2FUNCS_EXCLUDE),$(DPBIT_FUNCS))
+TPBIT_FUNCS := $(filter-out $(LIB2FUNCS_EXCLUDE),$(TPBIT_FUNCS))
+
 fpbit-src := $(srcdir)/fp-bit.c
 
 # Build FPBIT.

Re: combine vec_perm_expr with constructor

2012-09-03 Thread Marc Glisse


On Mon, 3 Sep 2012, Richard Guenther wrote:


On Mon, Sep 3, 2012 at 4:00 PM, Marc Glisse  wrote:

On Mon, 3 Sep 2012, Richard Guenther wrote:


You shouldn't need the VECTOR_CST handling - constant propagation should
already ensure properly simplified code here (and is the more canonical
place
to handle this).



IIRC, I added VECTOR_CST because of mixed constructor/vector_cst shuffles
(and because it wasn't too hard). If I remove it (I can), I guess some of
the testcases won't work anymore.


I see.  If you still have a testcase can you look if CCP does not do 
something it should?


I think CCP is working fine, the fold_ternary patch you approved today 
tests some of that (without that patch, sometimes ccp1 does half the work 
and fre1 finishes it, and since forwprop1 is before fre1, I hit that 
case there). Is there a particular scenario you have in mind that might 
not be handled?


Here I was concerned with:
x={a,b}; // constructor
y={18,42}; // vector_cst
m={0,3};
__builtin_shuffle(x,y,m) // should be {a,42}
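
The expected result of that mixed case can be checked with a portable model of the two-input shuffle. This is a sketch in plain C over arrays, with a hypothetical function name; it deliberately avoids the GCC vector extensions and only mirrors the documented index rule (mask values taken modulo twice the element count).

```c
#include <assert.h>
#include <stddef.h>

/* Portable model of a two-input shuffle: OUT[i] is taken from X when the
   reduced mask value is below NELTS, and from Y otherwise.  The name
   shuffle2_model is illustrative only.  */
void
shuffle2_model (const double *x, const double *y, const unsigned long *m,
                size_t nelts, double *out)
{
  size_t i;

  assert (nelts != 0);
  for (i = 0; i < nelts; i++)
    {
      size_t sel = (size_t) (m[i] % (2 * nelts));
      out[i] = sel < nelts ? x[sel] : y[sel - nelts];
    }
}
```

With x = { a, b }, y = { 18, 42 } and m = { 0, 3 }, the model yields { a, 42 }: one element from the constructor and one from the constant vector.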

--
Marc Glisse

Re: [PATCH 3/3] Compute predicates for phi node results in ipa-inline-analysis.c

2012-09-03 Thread Jan Hubicka

> On Fri, Aug 31, 2012 at 7:24 PM, Martin Jambor  wrote:
> > Hi,
> >
> > On Thu, Aug 30, 2012 at 05:11:35PM +0200, Martin Jambor wrote:
> >> this is a new version of the patch which makes ipa analysis produce
> >> predicates for PHI node results, at least at the bottom of the
> >> simplest diamond and semi-diamond CFG subgraphs.  This time I also
> >> analyze the conditions again rather than extracting information from
> >> CFG edges, which means I can reason about substantially more PHI
> >> nodes.
> >>
> >> This patch makes us produce loop bounds hint for the pr48636.f90
> >> testcase.
> >>
> >> Bootstrapped and tested on x86_64-linux.  OK for trunk?
> >>
> >> Thanks,
> >>
> >> Martin
> >>
> >>
> >> 2012-08-29  Martin Jambor  
> >>
> >>   * ipa-inline-analysis.c (phi_result_unknown_predicate): New function.
> >>   (predicate_for_phi_result): Likewise.
> >>   (estimate_function_body_sizes): Use the above two functions.
> >>
> >
> > This patch, on top of the one doing loop calculations almost always,
> > introduces a number of testsuite failures which somehow I had not
> > caught during my testing.  The problem is that either
> > calculate_dominance_info or loop_optimizer_init introduce new SSA
> > names for which there is no index in nonconstant_names which is
> > allocated before the dominance and loop computations.  I'm currently
> > bootstrapping and testing the following fix which simply allocates the
> > vector after doing the two computations.  If it passes I will commit
> > it straight away so that the regression is fixed before I leave for
> > the weekend, I hope it's obvious enough for that.
> >
> > On the other hand, it would really be better if we did not change
> > function bodies during IPA summary generation phase...
> 
> Um ... we shouldn't do this.  Can you track down where it happens?  I
> suppose it might come from CFG manipulations loop_optimizer_init
> performs when not passing AVOID_CFG_MODIFICATIONS.

I bet it comes from loop normalization :) (i.e. loop closed form
or preheader construction, both of which need new SSA names.)
I think it would be best to make the pass manager handle this and make
loop normalization happen once, before all SSA IPA analysis.

Honza

Re: [Patch, ARM] Replace gen_rtx_CONST_INT by GEN_INT

2012-09-03 Thread Ramana Radhakrishnan


On 09/03/12 16:28, Christophe Lyon wrote:

Hi,

As discussed in
http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00077.html, this patch is
a cleanup pass to replace calls to gen_rtx_CONST_INT by GEN_INT.

Tested on arm-linuxeabi with qemu.


Ok. Thanks for doing this so quickly.

Thanks,
Ramana

Re: [PATCH][RFC] Add -Og

2012-09-03 Thread Michael Matz

Hi,

On Mon, 3 Sep 2012, Richard Guenther wrote:

> > Comments welcome,
> 
> No comments?  Then I'll drop this idea for 4.8.

Hey, don't discard my face2face comments :)  Regarding -Og it's about 
time, I like it, and your implementation is a start.  The pass list (e.g. 
whether to include LIM or not, or in a limited form) can be incrementally 
changed.

Ciao,
Michael.

Re: [PATCH, C] Mixed pointer types in call to streamer_tree_cache_lookup() in gcc/lto-streamer-out.c

2012-09-03 Thread Andris Pavenis


On 09/03/2012 03:27 PM, Richard Guenther wrote:

On Sat, Sep 1, 2012 at 2:21 PM, Andris Pavenis  wrote:

uint32_t * is used as the 3rd parameter in calls to
streamer_tree_cache_lookup()
in 2 places in gcc/lto-streamer-out.c, while the procedure prototype has
unsigned *. The two are not guaranteed to be the same for all targets
(I got an error when building for DJGPP)


Ok.


I do not have SVN write access, so I cannot commit myself

Andris



Thanks,
Richard.


Andris

ChangeLog entry

2012-09-01  Andris Pavenis 

 * lto-streamer-out.c (write_global_references,
lto_output_decl_state_refs):
 Fix parameter type in call to streamer_tree_cache_lookup
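
The mismatch described above can be reproduced in isolation. The following is a minimal sketch, not the committed change: demo_cache_lookup is a hypothetical stand-in for a function whose prototype takes unsigned *, and the caller uses a local of exactly that type before converting to uint32_t. On targets where uint32_t is a different type from unsigned int (for example unsigned long), passing a uint32_t * directly is an incompatible-pointer error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a lookup whose prototype uses 'unsigned *'.
   It stores a slot number through IX_P and reports whether KEY was found.  */
bool
demo_cache_lookup (int key, unsigned *ix_p)
{
  assert (ix_p != 0);
  if (key < 0)
    return false;
  *ix_p = (unsigned) key * 2u;
  return true;
}

/* Caller that wants a uint32_t result.  It passes the address of a local
   'unsigned' so the pointer type matches the prototype on every target,
   then converts the value explicitly.  */
uint32_t
demo_slot_for (int key)
{
  unsigned ix = 0;

  if (!demo_cache_lookup (key, &ix))
    return UINT32_MAX;
  return (uint32_t) ix;
}
```

Converting the value instead of the pointer keeps the code valid regardless of how the target defines uint32_t.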

Re: combine BIT_FIELD_REF and VEC_PERM_EXPR

2012-09-03 Thread Marc Glisse


On Mon, 3 Sep 2012, Richard Guenther wrote:


+  if (code == VEC_PERM_EXPR)
+{
+  tree p, m, index, tem;
+  unsigned nelts;
+  m = gimple_assign_rhs3 (def_stmt);
+  if (TREE_CODE (m) != VECTOR_CST)
+   return false;
+  nelts = VECTOR_CST_NELTS (m);
+  idx = TREE_INT_CST_LOW (VECTOR_CST_ELT (m, idx));
+  idx %= 2 * nelts;
+  if (idx < nelts)
+   {
+ p = gimple_assign_rhs1 (def_stmt);
+   }
+  else
+   {
+ p = gimple_assign_rhs2 (def_stmt);
+ idx -= nelts;
+   }
+  index = build_int_cst (TREE_TYPE (TREE_TYPE (m)), idx * size);
+  tem = fold_build3 (BIT_FIELD_REF, TREE_TYPE (op), p, op1, index);



This shouldn't simplify, so you can use build3 instead.



I think that it is possible for p to be a VECTOR_CST, if the shuffle
involves one constant and one non-constant vectors, no?


Well, constant propagation should have handled it ...


When it sees __builtin_shuffle(cst1,var,cst2)[cst3], CCP should basically 
do the same thing I am doing here, in the hope that the element will be 
part of cst1 instead of var? What if builtin_shuffle takes 2 constructors, 
one of which contains at least one constant? It looks easier to handle it 
here and let the next run of CCP notice the simplified expression. Or do 
you mean I should add the new function to CCP (or even fold) instead of 
forwprop? (wouldn't be the first time CCP does more than constant 
propagation)



If you use fold_build3 you need to check that the result is in expected form
(a is_gimple_invariant or an SSA_NAME).


Now that I look at this line, I wonder if I am missing some unshare_expr for
p and/or op1.


If either is a CONSTRUCTOR and its def stmt is not removed and it survives
into tem then yes ...


But the integer_cst doesn't need it. Ok, thanks.

--
Marc Glisse

[Patch,avr] PR54461: Better AVR-Libc integration

2012-09-03 Thread Georg-Johann Lay

AVR-Libc comes with hand-optimized float support functions written
in assembler.  These functions use the same naming conventions as
libgcc.  There are situations where this name clash leads to performance
regressions because the functions from libgcc are linked.  One example
is the new fixed-point support functions that convert fixed-point to/from
float and reference float/int conversion functions from within libgcc.

The float implementation in libm.a has been discussed several times,
with the only result that it is very unlikely that the code will
ever be integrated into libgcc because the original authors are no
longer around.  And it is much less work to add a new configure switch
than to port and integrate the code, given there were no license issues.
One point against such an extension was that such a change to the compiler
establishes a dependency between the compiler and AVR-Libc, but this
decision was made long ago by accepting code that actually should
have been added to libgcc -- but was not, for whatever reason.

This patch removes that performance regressions by removing the
doubly implemented functions from libgcc by means of a new configure
option --with-avrlibc.

Moreover, some specs are adjusted so that -lm is treated very much like
-lgcc so that the user need not specify -lm by hand for core float
support like int/float conversions.

Without this patch, LTO compilations also lead to performance regressions
because LTO adds -plugin-opt=-pass-through=-lgcc etc. prior to the -lm
specified by the user.

Other cases where code is improved are C++ programs, see PR28718 for a
discussion.


There are fewer failures in gcc.dg/fixed-point, presumably because the rounding
is as expected by the test cases, i.e. there are no rounding errors as
mentioned in the review for PR54222:
http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01586.html

Ok to install?


Johann


PR target/54461
* configure.ac (noconfigdirs,target=avr-*-*): Add target-newlib,
target-libgloss if configured --with-avrlibc.
* configure: Regenerate.

libgcc/
PR target/54461
* config.host (tmake_file,host=avr-*-*): Add avr/t-avrlibc if
configured --with-avrlibc.
* config/avr/t-avrlibc: New file.

gcc/
PR target/54461
* config.gcc (tm_file,target=avr-*-*): Add avr/avrlibc.h if
configured --with-avrlibc.
(tm_defines,target=avr-*-*): Add WITH_AVRLIBC if configured
--with-avrlibc.
* config/avr/avrlibc.h: New file.
* config/avr/avr-c.c: Built-in define __WITH_AVRLIBC__ if
configured --with-avrlibc.
Index: configure
===
--- configure	(revision 190887)
+++ configure	(working copy)
@@ -3500,6 +3500,13 @@ case "${target}" in
   arm-*-riscix*)
 noconfigdirs="$noconfigdirs ld target-libgloss"
 ;;
+  avr-*-rtems*)
+;;
+  avr-*-*)
+if test x${with_avrlibc} = xyes; then
+  noconfigdirs="$noconfigdirs target-newlib target-libgloss"
+fi
+;;
   c4x-*-* | tic4x-*-*)
 noconfigdirs="$noconfigdirs target-libgloss"
 ;;
Index: configure.ac
===
--- configure.ac	(revision 190887)
+++ configure.ac	(working copy)
@@ -891,6 +891,13 @@ case "${target}" in
   arm-*-riscix*)
 noconfigdirs="$noconfigdirs ld target-libgloss"
 ;;
+  avr-*-rtems*)
+;;
+  avr-*-*)
+if test x${with_avrlibc} = xyes; then
+  noconfigdirs="$noconfigdirs target-newlib target-libgloss"
+fi
+;;
   c4x-*-* | tic4x-*-*)
 noconfigdirs="$noconfigdirs target-libgloss"
 ;;
Index: libgcc/config/avr/t-avrlibc
===
--- libgcc/config/avr/t-avrlibc	(revision 0)
+++ libgcc/config/avr/t-avrlibc	(revision 0)
@@ -0,0 +1,66 @@
+# This file is used with --with-avrlibc=yes
+#
+# AVR-Libc comes with hand-optimized float routines.
+# For historical reasons, these routines live in AVR-Libc
+# and not in libgcc and use the same function names like libgcc.
+# To get the best support, i.e. always use the routines from
+# AVR-Libc, we remove these routines from libgcc.
+#
+# See also PR54461.
+#
+#
+# Arithmetic:
+# __addsf3 __subsf3 __divsf3 __mulsf3 __negsf2
+#
+# Comparison:
+# __cmpsf2 __unordsf2
+# __eqsf2 __lesf2 __ltsf2 __nesf2 __gesf2 __gtsf2
+#
+# Conversion:
+# __fixsfdi __fixunssfdi __floatdisf __floatundisf
+# __fixsfsi __fixunssfsi __floatsisf __floatunsisf
+#
+#
+# These functions are contained in modules:
+#
+# _addsub_sf.o:   __addsf3  __subsf3
+# _mul_sf.o:  __mulsf3
+# _div_sf.o:  __divsf3
+# _negate_sf.o:   __negsf2
+#
+# _compare_sf.o:  __cmpsf2
+# _unord_sf.o:__unordsf2
+# _eq_sf.o:   __eqsf2
+# _ne_sf.o:   __nesf2
+# _ge_sf.o:   __gesf2
+# _gt_sf.o:   __gtsf2
+# _le_sf.o:   __lesf2
+# _lt_sf.o:   __ltsf2
+#
+# _fixsfdi.o: __fixsfdi
+# _fixunssfdi.o:  __fixunssfdi
+# _fixun

[Patch,avr] Fix PR54220

2012-09-03 Thread Georg-Johann Lay

This implements TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS as an
obvious fix for PR54220.

Ok to install?

Johann


PR target/54220
* config/avr/avr.c (TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS): New
define to...
(avr_allocate_stack_slots_for_args): ...this new static function.
Index: gcc/config/avr/avr.c
===
--- gcc/config/avr/avr.c	(revision 190873)
+++ gcc/config/avr/avr.c	(working copy)
@@ -700,6 +700,16 @@ avr_regs_to_save (HARD_REG_SET *set)
   return count;
 }
 
+
+/* Implement `TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS' */
+
+static bool
+avr_allocate_stack_slots_for_args (void)
+{
+  return !cfun->machine->is_naked;
+}
+
+
 /* Return true if register FROM can be eliminated via register TO.  */
 
 static bool
@@ -11339,6 +11349,9 @@ avr_fold_builtin (tree fndecl, int n_arg
 #undef  TARGET_CAN_ELIMINATE
 #define TARGET_CAN_ELIMINATE avr_can_eliminate
 
+#undef  TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
+#define TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS avr_allocate_stack_slots_for_args
+
 #undef TARGET_WARN_FUNC_RETURN
 #define TARGET_WARN_FUNC_RETURN avr_warn_func_return

Re: [PATCH] Fix emit_conditional_add and documentation for add@var{mode}cc

2012-09-03 Thread Andrew Pinski

On Mon, Sep 3, 2012 at 4:12 AM, Richard Guenther wrote:
> On Sat, Aug 25, 2012 at 12:43 AM, Andrew Pinski wrote:
>> Forgot to attach the patch.
>>
>> -- Andrew
>>
>> On Fri, Aug 24, 2012 at 3:42 PM, Andrew Pinski  wrote:
>>> Hi,
>>>   I decided to split this patch from the other patch which uses
>>> emit_conditional_add in expand as that part of the patch needs some
>>> work.  This part of the patch can be applied separately and it fixes a
>>> few things dealing with conditional adds.
>>>
>>> First the documentation is wrong for the pattern as we do the addition
>>> if operand 0 is true rather than false.
>>> Then emit_conditional_add is wrong as you cannot switch around op2 and op3.
>>>
>>> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> Which does not have conditional add ...

It has a pattern for conditional add (addcc).

Thanks,
Andrew

>
> Well, the patch looks ok to me.  I suppose the op2 and op3 swapping was
> supposed to canonicalize operands, but I see that addcc does not have a
> separate argument slot for the value to use when the comparison is false.



>
> Thanks,
> Richard.
>
>>> Thanks,
>>> Andrew Pinski
>>>
>>> ChangeLog:
>>>  * optabs.c (emit_conditional_add): Correct comment about the arguments.
>>> Remove code which might swap op2 and op3 since they cannot be swapped.
>>> * doc/md.texi (add@var{mode}cc): Fix document about how the arguments are 
>>> used.
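
The operand semantics discussed in this thread can be sketched in plain C.
This is a minimal model, not the optabs.c code: the function names are
hypothetical, and it assumes the corrected reading that the addition
happens when the comparison is true and that op2 is passed through
unchanged otherwise.  It shows why op2 and op3 cannot be swapped: the
swap is harmless in the true case but changes the result in the false
case, because there is no separate slot for the false value.

```c
#include <assert.h>

/* Model of a conditional add: when COND holds, the result is OP2 + OP3;
   otherwise it is OP2 unchanged.  */
static int
cond_add (int cond, int op2, int op3)
{
  return cond ? op2 + op3 : op2;
}

/* The same operation with op2 and op3 exchanged, as the removed
   canonicalization could have done.  */
static int
cond_add_swapped (int cond, int op2, int op3)
{
  return cond_add (cond, op3, op2);
}

/* Self-check: the swap only matters when the comparison is false.  */
static void
check_cond_add (void)
{
  assert (cond_add (1, 10, 3) == 13);
  assert (cond_add_swapped (1, 10, 3) == 13);
  assert (cond_add (0, 10, 3) == 10);
  assert (cond_add_swapped (0, 10, 3) == 3);
}
```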

Re: [Patch,avr] PR54461: Better AVR-Libc integration

2012-09-03 Thread Gabriel Dos Reis

On Mon, Sep 3, 2012 at 11:29 AM, Georg-Johann Lay wrote:
> AVR-Libc comes with hand-optimized float support functions written
> in assembler.  These functions use the same naming conventions as
> libgcc.  There are situations where this name clash leads to performance
> regressions because the functions from libgcc are linked.  One example
> is the new fixed-point support, which converts fixed-point to/from float
> and references float/int conversion functions from within libgcc.
>
> The float implementation in libm.a has been discussed several times,
> with the only result that it is very unlikely that the code will
> ever be integrated into libgcc because the original authors are no
> longer around.  And it is much less work to add a new configure switch
> than to port and integrate the code, assuming there were no license issues.
> One point against such an extension was that such a change to the compiler
> establishes a dependency between the compiler and AVR-Libc, but this
> decision was made long ago by accepting code that actually should
> have been added to libgcc -- but was not, for whatever reason.
>
> This patch removes these performance regressions by removing the
> doubly implemented functions from libgcc by means of a new configure
> option --with-avrlibc.

Johann,

as I stated yesterday, I do not understand why there needs to be yet another
configure option. The NATURAL libc for AVR targets is AVR-libc.  We
should not need a switch for that.

-- Gaby


Re: [PATCH][RFC] Add -Og

2012-09-03 Thread H.J. Lu

On Mon, Sep 3, 2012 at 6:28 AM, Richard Guenther wrote:
> On Fri, Aug 10, 2012 at 1:30 PM, Richard Guenther wrote:
>>
>> This adds a new optimization level, -Og, as previously discussed.
>> It aims at providing fast compilation, a superior debugging
>> experience and reasonable runtime performance.  Instead of making
>> -O1 this optimization level this adds a new -Og.
>>
>> It's a first cut, highlighting that our fixed pass pipeline and
>> simply enabling/disabling individual passes (but not pass copies
>> for example) doesn't scale to properly differentiate between
>> -Og and -O[23].  -O1 should get similar treatment, eventually
>> just building on -Og but not focusing on debugging experience.
>> That is, I expect that in the end we will at least have two post-IPA
>> optimization pipelines.  It also means that you cannot enable
>> PRE or VRP with -Og at the moment because these passes are not
>> anywhere scheduled (similar to the situation with -O0).
>>
>> It has some funny effect on dump-file naming of the pass copies
>> though, which hints at that the current setup is too static.
>> For that reason the new queue comes after the old, to not confuse
>> too many testcases.
>>
>> It also does not yet disable any of the early optimizations that
>> make debugging harder (SRA comes to my mind here, as does
>> switch-conversion and partial inlining).
>>
>> The question arises if we want to support in any reasonable
>> way using profile-feedback or LTO for -O[01g], thus if we
>> rather want to delay some of the early opts to after IPA
>> optimizations.
>>
>> Not bootstrapped or fully tested, but it works for the compile
>> torture.
>>
>> Comments welcome,
>
> No comments?  Then I'll drop this idea for 4.8.
>

When I debug binutils, I have to use -O0 -g to get precise
line and variable info.  Also glibc has to be compiled with
-O, which makes debugging a challenge.  Will -Og help with debugging
binutils and glibc?


H.J.

Re: [Patch,avr] PR54461: Better AVR-Libc integration

2012-09-03 Thread Georg-Johann Lay


Gabriel Dos Reis wrote:

> On Mon, Sep 3, 2012 at 11:29 AM, Georg-Johann Lay wrote:
>
>> AVR-Libc comes with hand-optimized float support functions written
>> in assembler.  These functions use the same naming conventions as
>> libgcc.  There are situations where this name clash leads to performance
>> regressions because the functions from libgcc are linked.  One example
>> is the new fixed-point support, which converts fixed-point to/from float
>> and references float/int conversion functions from within libgcc.
>>
>> The float implementation in libm.a has been discussed several times,
>> with the only result that it is very unlikely that the code will
>> ever be integrated into libgcc because the original authors are no
>> longer around.  And it is much less work to add a new configure switch
>> than to port and integrate the code, assuming there were no license issues.
>> One point against such an extension was that such a change to the compiler
>> establishes a dependency between the compiler and AVR-Libc, but this
>> decision was made long ago by accepting code that actually should
>> have been added to libgcc -- but was not, for whatever reason.
>>
>> This patch removes these performance regressions by removing the
>> doubly implemented functions from libgcc by means of a new configure
>> option --with-avrlibc.
>
> Johann,
>
> as I stated yesterday, I do not understand why there needs to be yet another
> configure option. The NATURAL libc for AVR targets is AVR-libc.  We
> should not need a switch for that.

There is also newlib that is used with avr-gcc.  I know this because
some bugs are only triggered for newlib.  If there are users that
report bugs if avr-gcc is built for newlib, I'd guess these users are
actually interested in using newlib.

It's clear that the proposed changes do *not* work with newlib
because newlib does not mimic parts of libgcc.

Johann



Re: [PATCH][RFC] Add -Og

2012-09-03 Thread rguenther

"H.J. Lu"  wrote:

>On Mon, Sep 3, 2012 at 6:28 AM, Richard Guenther wrote:
>> On Fri, Aug 10, 2012 at 1:30 PM, Richard Guenther wrote:
>>>
>>> This adds a new optimization level, -Og, as previously discussed.
>>> It aims at providing fast compilation, a superior debugging
>>> experience and reasonable runtime performance.  Instead of making
>>> -O1 this optimization level this adds a new -Og.
>>>
>>> It's a first cut, highlighting that our fixed pass pipeline and
>>> simply enabling/disabling individual passes (but not pass copies
>>> for example) doesn't scale to properly differentiate between
>>> -Og and -O[23].  -O1 should get similar treatment, eventually
>>> just building on -Og but not focusing on debugging experience.
>>> That is, I expect that in the end we will at least have two post-IPA
>>> optimization pipelines.  It also means that you cannot enable
>>> PRE or VRP with -Og at the moment because these passes are not
>>> anywhere scheduled (similar to the situation with -O0).
>>>
>>> It has some funny effect on dump-file naming of the pass copies
>>> though, which hints at that the current setup is too static.
>>> For that reason the new queue comes after the old, to not confuse
>>> too many testcases.
>>>
>>> It also does not yet disable any of the early optimizations that
>>> make debugging harder (SRA comes to my mind here, as does
>>> switch-conversion and partial inlining).
>>>
>>> The question arises if we want to support in any reasonable
>>> way using profile-feedback or LTO for -O[01g], thus if we
>>> rather want to delay some of the early opts to after IPA
>>> optimizations.
>>>
>>> Not bootstrapped or fully tested, but it works for the compile
>>> torture.
>>>
>>> Comments welcome,
>>
>> No comments?  Then I'll drop this idea for 4.8.
>>
>
>When I debug binutils, I have to use -O0 -g to get precise
>line and variable info.  Also glibc has to be compiled with
>-O, which makes debugging a challenge.  Will -Og help with debugging
>binutils and glibc?

I suppose so, but it is hard to tell without knowing more about the issues.

Richard.
>
>H.J.



Re: [PATCH][RFC] Add -Og

2012-09-03 Thread H.J. Lu

On Mon, Sep 3, 2012 at 11:50 AM, rguenther wrote:
> "H.J. Lu" wrote:
>
>>On Mon, Sep 3, 2012 at 6:28 AM, Richard Guenther wrote:
>>> On Fri, Aug 10, 2012 at 1:30 PM, Richard Guenther wrote:
>>>>
>>>> This adds a new optimization level, -Og, as previously discussed.
>>>> It aims at providing fast compilation, a superior debugging
>>>> experience and reasonable runtime performance.  Instead of making
>>>> -O1 this optimization level this adds a new -Og.
>>>>
>>>> It's a first cut, highlighting that our fixed pass pipeline and
>>>> simply enabling/disabling individual passes (but not pass copies
>>>> for example) doesn't scale to properly differentiate between
>>>> -Og and -O[23].  -O1 should get similar treatment, eventually
>>>> just building on -Og but not focusing on debugging experience.
>>>> That is, I expect that in the end we will at least have two post-IPA
>>>> optimization pipelines.  It also means that you cannot enable
>>>> PRE or VRP with -Og at the moment because these passes are not
>>>> anywhere scheduled (similar to the situation with -O0).
>>>>
>>>> It has some funny effect on dump-file naming of the pass copies
>>>> though, which hints at that the current setup is too static.
>>>> For that reason the new queue comes after the old, to not confuse
>>>> too many testcases.
>>>>
>>>> It also does not yet disable any of the early optimizations that
>>>> make debugging harder (SRA comes to my mind here, as does
>>>> switch-conversion and partial inlining).
>>>>
>>>> The question arises if we want to support in any reasonable
>>>> way using profile-feedback or LTO for -O[01g], thus if we
>>>> rather want to delay some of the early opts to after IPA
>>>> optimizations.
>>>>
>>>> Not bootstrapped or fully tested, but it works for the compile
>>>> torture.
>>>>
>>>> Comments welcome,
>>>
>>> No comments?  Then I'll drop this idea for 4.8.
>>>
>>
>>When I debug binutils, I have to use -O0 -g to get precise
>>line and variable info.  Also glibc has to be compiled with
>>-O, which makes debugging a challenge.  Will -Og help with debugging
>>binutils and glibc?
>
> I suppose so, but it is hard to tell without knowing more about the issues.
>

The main issues are

1. I need to know precise values for all local variables at all times.
2. Compiler shouldn't inline a function or move lines around.

-- 
H.J.

Re: [Patch,avr] PR54461: Better AVR-Libc integration

2012-09-03 Thread Gabriel Dos Reis

On Mon, Sep 3, 2012 at 1:46 PM, Georg-Johann Lay wrote:
> Gabriel Dos Reis wrote:
>
>> On Mon, Sep 3, 2012 at 11:29 AM, Georg-Johann Lay wrote:
>>>
>>> AVR-Libc comes with hand-optimized float support functions written
>>> in assembler.  These functions use the same naming conventions as
>>> libgcc.  There are situations where this name clash leads to performance
>>> regressions because the functions from libgcc are linked.  One example
>>> is the new fixed-point support, which converts fixed-point to/from float
>>> and references float/int conversion functions from within libgcc.
>>>
>>> The float implementation in libm.a has been discussed several times,
>>> with the only result that it is very unlikely that the code will
>>> ever be integrated into libgcc because the original authors are no
>>> longer around.  And it is much less work to add a new configure switch
>>> than to port and integrate the code, assuming there were no license issues.
>>> One point against such an extension was that such a change to the compiler
>>> establishes a dependency between the compiler and AVR-Libc, but this
>>> decision was made long ago by accepting code that actually should
>>> have been added to libgcc -- but was not, for whatever reason.
>>>
>>> This patch removes these performance regressions by removing the
>>> doubly implemented functions from libgcc by means of a new configure
>>> option --with-avrlibc.
>>
>>
>> Johann,
>>
>> as I stated yesterday, I do not understand why there needs to be yet another
>> configure option. The NATURAL libc for AVR targets is AVR-libc.  We
>> should not need a switch for that.
>
>
> There is also newlib that is used with avr-gcc.  I know this because
> some bugs are only triggered for newlib.  If there are users that
> report bugs if avr-gcc is built for newlib, I'd guess these users are
> actually interested in using newlib.

I did not say there was no other libc library.  I said
that the *natural* libc appears to be AVR-libc.

We don't configure GCC/g++ saying --with-libstdc++.

-- Gaby


Re: [Patch,avr] PR54461: Better AVR-Libc integration

2012-09-03 Thread Georg-Johann Lay


Gabriel Dos Reis wrote:

> Georg-Johann Lay wrote:
>
>> Gabriel Dos Reis wrote:
>>
>>> Georg-Johann Lay wrote:
>>>
>>>> AVR-Libc comes with hand-optimized float support functions written
>>>> in assembler.  These functions use the same naming conventions as
>>>> libgcc.  There are situations where this name clash leads to performance
>>>> regressions because the functions from libgcc are linked.  One example
>>>> is the new fixed-point support, which converts fixed-point to/from float
>>>> and references float/int conversion functions from within libgcc.
>>>>
>>>> The float implementation in libm.a has been discussed several times,
>>>> with the only result that it is very unlikely that the code will
>>>> ever be integrated into libgcc because the original authors are no
>>>> longer around.  And it is much less work to add a new configure switch
>>>> than to port and integrate the code, assuming there were no license issues.
>>>> One point against such an extension was that such a change to the compiler
>>>> establishes a dependency between the compiler and AVR-Libc, but this
>>>> decision was made long ago by accepting code that actually should
>>>> have been added to libgcc -- but was not, for whatever reason.
>>>>
>>>> This patch removes these performance regressions by removing the
>>>> doubly implemented functions from libgcc by means of a new configure
>>>> option --with-avrlibc.
>>>
>>> as I stated yesterday, I do not understand why there needs to be yet another
>>> configure option. The NATURAL libc for AVR targets is AVR-libc.  We
>>> should not need a switch for that.
>>
>> There is also newlib that is used with avr-gcc.  I know this because
>> some bugs are only triggered for newlib.  If there are users that
>> report bugs if avr-gcc is built for newlib, I'd guess these users are
>> actually interested in using newlib.
>
> I did not say there was no other libc library.  I said
> that the *natural* libc appears to be AVR-libc.
>
> We don't configure GCC/g++ saying --with-libstdc++.


That's a different story because these libraries support in-tree
builds, just like newlib does.  This is not true for AVR-Libc, which
does not support in-tree builds.

I agree that AVR-Libc is the most common libc implementation
used with avr-gcc and it has many advantages over other libc
implementations (except that it does not support in-tree builds).

However, --with-avrlibc is not needed to *get* the support
from AVR-Libc; it's just used to fix some problems that arise
in certain use cases.

Besides that, the proposed arrangement does not affect the
configuration if the switch is *not* specified, thus the patch
is appropriate to be backported.

My intention is to backport it to 4.7 as indicated by the milestone,
but if the change were unconditional, I don't think it would be
appropriate for a backport.

And after all it's just a *configure* option that some distribution
maintainers can set if they want to.  The tool chain user is not
bothered at all by the new option and won't even notice it.
From the user perspective it's just as if some optimizations
had been added to the tool chain.

What do you propose?

Use the setting by default and support --with-avrlibc=no if
the user wants full libgcc support and nothing removed from it?


It's clear that the proposed changes do *not* work with newlib
because newlib does not mimic parts of libgcc.

Johann

Re: [patch] Fix PR rtl-optimization/54290

2012-09-03 Thread Eric Botcazou

> 2012-09-02  Eric Botcazou  
> 
>   PR rtl-optimization/54290
>   * reload1.c (choose_reload_regs): Also take into account secondary MEMs
>   to remove address replacements for inherited reloads.

I forgot to attach the testcase...

* gcc.c-torture/execute/20120902-1.c: New test.


-- 
Eric Botcazou

/* PR rtl-optimization/54290 */
/* Testcase by Eric Volk  */

double vd[2] = {1., 0.};
int vi[2] = {1234567890, 0};
double *pd = vd;
int *pi = vi;

extern void abort(void);

void init (int *n, int *dummy) __attribute__ ((noinline,noclone));

void init (int *n, int *dummy)
{
  if(0 == n) dummy[0] = 0;
}

int main (void)
{
  int dummy[1532];
  int i = -1, n = 1, s = 0;
  init (&n, dummy);
  while (i < n) {
    if (i == 0) {
      if (pd[i] > 0) {
        if (pi[i] > 0) {
          s += pi[i];
        }
      }
      pd[i] = pi[i];
    }
    ++i;
  }
  if (s != 1234567890)
    abort ();
}

[PATCH] Improve the mode which is used for insertion

2012-09-03 Thread Andrew Pinski

Hi,
  On MIPS it is sometimes better not to use the word mode (DImode) but
rather SImode.  The main reason is that a 32-bit value has to be
sign-extended to 64 bits, so using the 32-bit instruction for the insertion
allows that to happen automatically.

This patch implements this by adding a target hook which returns the
modes which are valid for doing the insertion and modifying
store_bit_field_1 to use them if they exist.

OK?  Bootstrapped and tested on both x86_64-linux-gnu and
mips64-linux-gnu with no regressions.

Thanks,
Andrew Pinski
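
The sign-extension argument above can be illustrated with a small host-side
model in C.  This is only a sketch under stated assumptions: the function
names are hypothetical, plain integer arithmetic stands in for the target
instructions, and the model assumes that a 32-bit operation leaves its
result sign-extended in the 64-bit register.  It shows that inserting into
bit 31 with a 64-bit operation leaves stale upper bits, whereas the 32-bit
variant keeps the register in sign-extended form.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 32-bit pattern to 64 bits, modelling what a 32-bit
   operation is assumed to do to its result register.  */
static uint64_t
sext32 (uint32_t v)
{
  return (v & 0x80000000u) ? (0xFFFFFFFF00000000ull | v) : (uint64_t) v;
}

/* Model of a DImode insertion: put the low WIDTH bits of FIELD at bit POS,
   operating on all 64 bits.  Assumes WIDTH < 32 and POS + WIDTH <= 32.  */
static uint64_t
insert_di (uint64_t reg, unsigned pos, unsigned width, uint32_t field)
{
  uint64_t mask = ((1ull << width) - 1) << pos;
  return (reg & ~mask) | (((uint64_t) field << pos) & mask);
}

/* Model of an SImode insertion: operate on the low 32 bits only, then
   sign-extend the 32-bit result.  Same assumptions as insert_di.  */
static uint64_t
insert_si (uint64_t reg, unsigned pos, unsigned width, uint32_t field)
{
  uint32_t mask = ((1u << width) - 1) << pos;
  uint32_t low = ((uint32_t) reg & ~mask) | ((field << pos) & mask);
  return sext32 (low);
}

/* Self-check: set bit 31 of a positive 32-bit value both ways.  */
static void
check_insert (void)
{
  uint64_t reg = sext32 (0x00001234u);
  uint64_t di = insert_di (reg, 31, 1, 1);
  uint64_t si = insert_si (reg, 31, 1, 1);

  assert (di == 0x0000000080001234ull);   /* upper half no longer matches bit 31 */
  assert (si == 0xFFFFFFFF80001234ull);   /* properly sign-extended */
  assert (si == sext32 ((uint32_t) di));  /* DImode result needs an extra extension */
}
```

With the 64-bit variant, a separate sign extension would be needed before
the value could be used as a 32-bit quantity again, which is the cost the
proposed hook is meant to avoid.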


* target.h (gcc_target): Add mode_for_extraction_insv.
* expmed.c (store_bit_field_1): Use mode_for_extraction_insv
and split out into ...
(store_bit_field_2): Here.
* target-def.h (TARGET_MODE_FOR_EXTRACTION_INSV): Define.
(TARGET_INITIALIZER): Add TARGET_MODE_FOR_EXTRACTION_INSV.
* config/mips/mips.c (mips_mode_for_extraction_insv): New function.
(TARGET_MODE_FOR_EXTRACTION_INSV): Define.
Index: doc/tm.texi
===
--- doc/tm.texi (revision 190863)
+++ doc/tm.texi (working copy)
@@ -8853,6 +8853,10 @@ The default is that no label is emitted.
 If the target implements @code{TARGET_ASM_UNWIND_EMIT}, this hook may be used 
to emit a directive to install a personality hook into the unwind info.  This 
hook should not be used if dwarf2 unwind info is used.
 @end deftypefn
 
+@deftypefn {Target Hook} {enum machine_mode *} TARGET_MODE_FOR_EXTRACTION_INSV 
(void)
+Returns a list of the modes to try for doing a bitfield insertation
+@end deftypefn
+
 @deftypefn {Target Hook} void TARGET_ASM_UNWIND_EMIT (FILE *@var{stream}, rtx 
@var{insn})
 This target hook emits assembly directives required to unwind the
 given instruction.  This is only used when @code{TARGET_EXCEPT_UNWIND_INFO}
Index: doc/tm.texi.in
===
--- doc/tm.texi.in  (revision 190863)
+++ doc/tm.texi.in  (working copy)
@@ -8734,6 +8734,8 @@ The default is that no label is emitted.
 
 @hook TARGET_ASM_EMIT_EXCEPT_PERSONALITY
 
+@hook TARGET_MODE_FOR_EXTRACTION_INSV
+
 @hook TARGET_ASM_UNWIND_EMIT
 This target hook emits assembly directives required to unwind the
 given instruction.  This is only used when @code{TARGET_EXCEPT_UNWIND_INFO}
Index: target.def
===
--- target.def  (revision 190863)
+++ target.def  (working copy)
@@ -2765,6 +2765,13 @@ DEFHOOKPOD
  @code{bool} @code{true}.",
  unsigned char, 1)
  
+/* Returns a list to the modes to try for doing a bitfield insertation. */
+DEFHOOK
+(mode_for_extraction_insv,
+ "Returns a list of the modes to try for doing a bitfield insertation",
+ enum machine_mode *, (void),
+ NULL)
+
 /* Leave the boolean fields at the end.  */
 
 /* True if we can create zeroed data by switching to a BSS section
Index: expmed.c
===
--- expmed.c(revision 190863)
+++ expmed.c(working copy)
@@ -394,29 +394,35 @@ mode_for_extraction (enum extraction_pat
   return data->operand[opno].mode;
 }
 
-/* A subroutine of store_bit_field, with the same arguments.  Return true
-   if the operation could be implemented.
+static bool
+store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
+  unsigned HOST_WIDE_INT bitnum, 
+  unsigned HOST_WIDE_INT bitregion_start,
+  unsigned HOST_WIDE_INT bitregion_end,
+  enum machine_mode fieldmode,
+  rtx value, bool fallback_p);
+
+/* A subroutine of store_bit_field, with the same arguments, except OP_MODE is 
the
+   mode which tried to use for the inseration.  Return true if the operation
+   could be implemented.
 
If FALLBACK_P is true, fall back to store_fixed_bit_field if we have
no other way of implementing the operation.  If FALLBACK_P is false,
return false instead.  */
 
 static bool
-store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
+store_bit_field_2 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
   unsigned HOST_WIDE_INT bitnum,
   unsigned HOST_WIDE_INT bitregion_start,
   unsigned HOST_WIDE_INT bitregion_end,
   enum machine_mode fieldmode,
-  rtx value, bool fallback_p)
+  rtx value, enum machine_mode op_mode, bool fallback_p)
 {
-  unsigned int unit
-= (MEM_P (str_rtx)) ? BITS_PER_UNIT : BITS_PER_WORD;
   unsigned HOST_WIDE_INT offset, bitpos;
   rtx op0 = str_rtx;
   int byte_offset;
   rtx orig_value;
-
-  enum machine_mode op_mode = mode_for_extraction (EP_insv, 3);
+  unsigned int unit = (MEM_P (str_rtx)) ? BITS_PER_UNIT : GET_MODE_BITSIZE 
(op_mode);
 
   while (GET_CODE (op0) == SUBREG)
 {
@@ -876,6 +882,46 @@ store_bit_field_1 (rtx str_rtx, unsigned
   return true;
 }
 
+/* A subroutine of store

[PATCH] PR45070: Fix wrong epilogue code for cortex-m0/Os

2012-09-03 Thread Bin Cheng

Hi,
I found that the regression tests below fail on trunk for cortex-m0/Os:

FAIL: gcc.c-torture/execute/pr45070.c execution,  -Os
FAIL: gcc.dg/compat/struct-layout-1 c_compat_x_tst.o-c_compat_y_tst.o execute
FAIL: gcc.dg/compat/struct-return-2 c_compat_x_tst.o-c_compat_y_tst.o execute
FAIL: tmpdir-gcc.dg-struct-layout-1/t001 c_compat_x_tst.o-c_compat_y_tst.o execute
WARNING: program timed out.
FAIL: tmpdir-gcc.dg-struct-layout-1/t002 c_compat_x_tst.o-c_compat_y_tst.o execute
FAIL: tmpdir-gcc.dg-struct-layout-1/t024 c_compat_x_tst.o-c_compat_y_tst.o execute
FAIL: tmpdir-gcc.dg-struct-layout-1/t025 c_compat_x_tst.o-c_compat_y_tst.o execute
FAIL: tmpdir-gcc.dg-struct-layout-1/t027 c_compat_x_tst.o-c_compat_y_tst.o execute
FAIL: tmpdir-gcc.dg-struct-layout-1/t028 c_compat_x_tst.o-c_compat_y_tst.o execute

It seems the patch for PR45070 does not cover the Thumb1 instruction set;
this patch fixes that.

The failures are caused by a wrong epilogue being generated for cortex-m0/Os.
In thumb1_extra_regs_pushed, GCC calculates reg_base as follows, which does
not account for return values smaller than 4 bytes and results in wrong pop
sequences:
reg_base = arm_size_return_regs () / UNITS_PER_WORD;
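
To illustrate the difference, here is a minimal, self-contained sketch (not
code from arm.c).  It assumes 4-byte words and defines ARM_NUM_INTS with the
same rounding expression that the patch replaces in
thumb1_unexpanded_epilogue.  The helper names reg_base_truncating,
reg_base_rounded and skip_return_regs are hypothetical and exist only for
this example.

```c
/* Assumption: 4-byte words, as on 32-bit ARM.  */
#define UNITS_PER_WORD 4

/* Number of whole words needed to hold X bytes, rounding up.  This is the
   expression the patch replaces in thumb1_unexpanded_epilogue.  */
#define ARM_NUM_INTS(X) (((X) + UNITS_PER_WORD - 1) / UNITS_PER_WORD)

/* Old computation: truncating division.  A 1-, 2- or 3-byte return value
   yields 0, as if no register carried a return value.  */
int
reg_base_truncating (int size)
{
  return size / UNITS_PER_WORD;
}

/* New computation: round up, so any non-empty return value occupies at
   least one register.  */
int
reg_base_rounded (int size)
{
  return ARM_NUM_INTS (size);
}

/* Hypothetical helper: drop the return-value registers from a mask of
   live low registers, mirroring "live_regs_mask >>= reg_base".  */
unsigned long
skip_return_regs (unsigned long live_regs_mask, int reg_base)
{
  return live_regs_mask >> reg_base;
}
```

With a 2-byte return value, the truncating version leaves the mask unshifted,
so r0 is still treated like any other register, while the rounded version
skips it.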

I ran the regression tests with and without -Os for cortex-m0 and everything
passes.  OK for trunk and the 4.7/4.6 release branches?

Thanks.


2012-09-04  Bin Cheng  

PR target/45070
* config/arm/arm.c (thumb1_extra_regs_pushed): Handle return value
of size less than 4 bytes by using macro ARM_NUM_INTS.
(thumb1_unexpanded_epilogue): Use macro ARM_NUM_INTS.

Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c(revision 190830)
+++ gcc/config/arm/arm.c(working copy)
@@ -21862,7 +21862,7 @@ thumb1_extra_regs_pushed (arm_stack_offsets *offse
   unsigned long l_mask = live_regs_mask & (for_prologue ? 0x40ff : 0xff);
   /* Then count how many other high registers will need to be pushed.  */
   unsigned long high_regs_pushed = bit_count (live_regs_mask & 0x0f00);
-  int n_free, reg_base;
+  int n_free, reg_base, size;
 
   if (!for_prologue && frame_pointer_needed)
 amount = offsets->locals_base - offsets->saved_regs;
@@ -21901,7 +21901,8 @@ thumb1_extra_regs_pushed (arm_stack_offsets *offse
   n_free = 0;
   if (!for_prologue)
 {
-  reg_base = arm_size_return_regs () / UNITS_PER_WORD;
+  size = arm_size_return_regs ();
+  reg_base = ARM_NUM_INTS (size);
   live_regs_mask >>= reg_base;
 }
 
@@ -21955,8 +21956,7 @@ thumb1_unexpanded_epilogue (void)
   if (extra_pop > 0)
 {
   unsigned long extra_mask = (1 << extra_pop) - 1;
-  live_regs_mask |= extra_mask << ((size + UNITS_PER_WORD - 1) 
-  / UNITS_PER_WORD);
+  live_regs_mask |= extra_mask << ARM_NUM_INTS (size);
 }
 
   /* The prolog may have pushed some high registers to use as

beta distribution

2012-09-03 Thread Ulrich Drepper

Another missing distribution is beta, which is related to the gamma
distribution.  Instead of the complex formula I've used an iterative
process, similar to the one used for the normal distribution.  There
are no real surprises here: there are two scalar parameters.  Unless
someone thinks this distribution for some reason does not belong in
the library, I'll check in the patch.


d-random-beta
Description: Binary data

[PATCH, M68K] Fix ICE from scheduler improvement

2012-09-03 Thread Maxim Kuvyrkov

Andreas,

This patch updates scheduling support for m68k to handle the propagation of
scheduler automaton states between basic blocks that Bernd checked in a couple
of weeks ago.  Currently the m68k build fails on an assert that checks that a
basic block starts with a clean scheduler state, which no longer holds true.

Tested by building a ColdFire toolchain.  OK to apply?

Thank you,

--
Maxim Kuvyrkov
Mentor Graphics




gcc-m68k-sched_ib.ChangeLog
Description: Binary data


gcc-m68k-sched_ib.patch
Description: Binary data

Re: [Patch,avr] PR54461: Better AVR-Libc integration

2012-09-03 Thread Joerg Wunsch

As Georg-Johann Lay wrote:

> What do you propose?

I'm fine with that option, and think it's a good idea.

-- 
cheers, J"org   .-.-.   --... ...--   -.. .  DL8DTL

http://www.sax.de/~joerg/NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)

Re: beta distribution

2012-09-03 Thread Daniel Krügler

2012/9/4 Ulrich Drepper :
> Another missing distribution is beta, which is related to the gamma
> distribution.  Instead of the complex formula I've used an iterative
> process, similar to the one used for the normal distribution.  There
> are no real surprises here: there are two scalar parameters.  Unless
> someone thinks this distribution for some reason does not belong in
> the library, I'll check in the patch.

I think this contribution is very appropriate. I would only recommend
fixing the typo in the description that says:

"The formula for the gamma probability density function"

- Daniel

RE: [Patch ARM] Update the test case to differ movs and lsrs for ARM mode and non-ARM mode

2012-09-03 Thread Terry Guo



> -Original Message-
> From: Richard Earnshaw
> Sent: Wednesday, August 22, 2012 10:00 PM
> To: Terry Guo
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [Patch ARM] Update the test case to differ movs and lsrs
> for ARM mode and non-ARM mode
> 
> On 22/08/12 12:16, Terry Guo wrote:
> >
> >>>
> >>> Due to the impact of ARM UAL, the Thumb1 and Thumb2 mode use LSRS
> >>> instruction while the ARM mode uses MOVS instruction. So the
> >> following case
> >>> is updated accordingly. Is it OK to trunk?
> >>>
> >>> BR,
> >>> Terry
> >>>
> >>> 2012-08-21  Terry Guo  
> >>>
> >>> * gcc.target/arm/combine-movs.c: Check movs for ARM mode
> >>> and lsrs for other mode.
> >>>
> >>
> >> This can't be right.  Thumb1 doesn't use unified syntax.
> >>
> >> R.
> >>
> >
> > Oops. You are right. Sorry for making such an obvious mistake.
> > Here is the patch, updated to distinguish ARM and Thumb2.
> > Tested for Thumb1, Thumb2 and ARM modes. No regression.
> >
> > Is it OK?
> >
> > BR,
> > Terry
> >
> > 2012-08-21  Terry Guo  
> >
> > * gcc.target/arm/combine-movs.c: Check movs for ARM mode
> > and lsrs for Thumb2 mode.
> >
> >
> 
> OK.
> 
> R.

Hi Richard,

Is it OK to apply this fix to the GCC 4.7 branch?

BR,
Terry

[Ping]RE: [Patch, test] Enable to prune warnings for tests defined in one exp file

2012-09-03 Thread Terry Guo

Hi Mike,

Is it OK to document this feature in README.gcc? Is it OK to backport this
feature to the 4.7 branch? Thanks.

BR,
Terry

> -Original Message-
> From: Terry Guo [mailto:terry@arm.com]
> Sent: Thursday, August 30, 2012 10:45 AM
> To: 'Mike Stump'
> Cc: gcc-patches@gcc.gnu.org; Richard Guenther
> Subject: RE: [Patch, test] Enable to prune warnings for tests defined
> in one exp file
> 
> > -Original Message-
> > From: Mike Stump [mailto:mikest...@comcast.net]
> > Sent: Tuesday, August 28, 2012 1:21 AM
> > To: Terry Guo
> > Cc: gcc-patches@gcc.gnu.org; Richard Guenther
> > Subject: Re: [Patch, test] Enable to prune warnings for tests defined
> > in one exp file
> >
> > On Aug 27, 2012, at 1:14 AM, Terry Guo wrote:
> > > This patch intends to provide a chance to prune common warning
> > messages for
> > > tests defined in an exp file.
> >
> > > Is it OK to trunk?
> >
> > Ok.
> >
> > If you can find where to document this...  :-)  That'd be nice.
> >
> 
> I checked the texi files in gcc/doc folder, but can't find a suitable
> place. So I resort to README.gcc in gcc/testsuite which is claimed to
> list notes for those writing testcases and those writing expect scripts.
> Following is the patch. Is it OK?
> 
> BR,
> Terry
> 
> 2012-08-30  Terry Guo  
> 
> * README.gcc: Document new variable dg_runtest_extra_prunes.
> 
> Index: gcc/testsuite/README.gcc
> ===
> --- gcc/testsuite/README.gcc  (revision 190795)
> +++ gcc/testsuite/README.gcc  (working copy)
> @@ -79,6 +79,11 @@
> 
>  If a test does not fit into the torture framework, use the dg framework.
> 
> +If several tests in an exp file need to prune the same warning messages,
> +define the variable dg_runtest_extra_prunes in that exp file and set it to
> +the warning message pattern.  This avoids duplicating dg-prune in each test.
> +Always remember to clear this variable when leaving the exp file.
> +
> 
> 
>  Copyright (C) 1997, 1998, 2004 Free Software Foundation, Inc.

Re: [Patch,avr] PR54461: Better AVR-Libc integration

2012-09-03 Thread Denis Chertykov

2012/9/4 Joerg Wunsch :
> As Georg-Johann Lay wrote:
>
>> What do you propose?
>
> I'm fine with that option, and think it's a good idea.
>

I have no objections to the patch.

Denis

Re: [Patch,avr] Fix PR54220

2012-09-03 Thread Denis Chertykov

2012/9/3 Georg-Johann Lay :
> This implements TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS as
> obvious fix for PR54220.
>
> Ok to install?
>
> Johann
>
>
> PR target/54220
> * config/avr/avr.c (TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS): New
> define to...
> (avr_allocate_stack_slots_for_args): ...this new static function.

Approved.

Denis.

Re: [Patch,avr] PR54461: Better AVR-Libc integration

2012-09-03 Thread Gabriel Dos Reis

On Mon, Sep 3, 2012 at 4:23 PM, Georg-Johann Lay  wrote:
> Gabriel Dos Reis wrote:
>>
>> Georg-Johann Lay wrote:
>>>
>>> Gabriel Dos Reis wrote:
>>>>
>>>> Georg-Johann Lay wrote:
>>>>>
>>>>> AVR-Libc comes with hand-optimized float support functions written
>>>>> in assembler.  These functions use the same naming conventions as
>>>>> libgcc.  There are situations where this name clash leads to
>>>>> performance regressions because the functions from libgcc are
>>>>> linked.  One example is the new fixed-point support, which converts
>>>>> fixed-point to/from float and references float/int conversion
>>>>> functions from within libgcc.
>>>>>
>>>>> The float implementation in libm.a has been discussed several times,
>>>>> with the only result that it is very unlikely that the code will
>>>>> ever be integrated into libgcc because the original authors are no
>>>>> longer around.  And it is much less work to add a new configure
>>>>> switch than to port and integrate the code, given there were no
>>>>> license issues.
>>>>> One point against such an extension was that such a change to the
>>>>> compiler establishes a dependency between the compiler and AVR-Libc,
>>>>> but this decision was made long ago by accepting code that actually
>>>>> should have been added to libgcc -- but was not, for whatever reason.
>>>>>
>>>>> This patch removes those performance regressions by removing the
>>>>> doubly implemented functions from libgcc by means of a new configure
>>>>> option --with-avrlibc.
>>>>
>>>>
>>>> As I stated yesterday, I do not understand why there needs to be yet
>>>> another configure option.  The NATURAL libc for AVR targets is
>>>> AVR-Libc.  We should not need a switch for that.
>>>
>>>
>>> There is also newlib that is used with avr-gcc.  I know this because
>>> some bugs are only triggered for newlib.  If there are users that
>>> report bugs if avr-gcc is built for newlib, I'd guess these users are
>>> actually interested in using newlib.
>>
>>
>> I did not say there was no other libc library.  I said
>> that the *natural* libc appears to be AVR-libc.
>>
>> We don't configure GCC/g++ saying --with-libstdc++.
>
>
> That's a different story because these libraries support in-tree
> build just like newlib does.  This is not true for AVR-Libc which
> does not support in-tree builds.
>
> I agree that AVR-Libc is the most common libc implementation
> used with avr-gcc and it has many advantages over other libc
> implementations (except that it does not support in-tree builds).

I think the "in-tree builds" thing is a red herring.

>
> However, a --with-avrlibc is not needed to *get* the support
> from AVR-Libc, it's just used to fix some problems that arise
> in certain use cases.

so, let's make it the default -- see below.

>
> Besides that, the proposed arrangement does not affect the
> configuration if the switch is *not* specified, thus the patch
> is appropriate to be backported.
>
> My intention is to backport it to 4.7 as indicated by the milestone,
> but if the change were unconditional, I don't think it would be
> appropriate for a backport.
>

It is perfectly reasonable and OK to make the backport more
guarded (e.g. by the configure option) than on mainline.

> And after all it's just a *configure* option that some distribution
> maintainers can set if they want to.

yes, but it is still one more configure option.

>  The tool chain user is not
> bothered at all by the new option and won't even notice it.
> From the user perspective it's just as if some optimizations
> had been added to the tool chain.
>
> What do you propose?
>
> Use the setting by default and support --with-avrlibc=no if
> the user wants full libgcc support and nothing removed from it?

Yes. Let's make the "sane" behaviour the default.

-- Gaby

[SH] Define NO_IMPLICIT_EXTERN_C for newlib targets

2012-09-03 Thread Christian Bruel

newlib uses extern "C" wrappers in its headers, so GCC can be told it is
C++ compatible.

This patch fixes:

FAIL: g++.dg/lookup/builtin5.C -std=c++11  scan-assembler _ZSt5atanhd t

Tested on the 4.7 and 4.8 branches. OK for both?

nb: newlib can be added to the list of runtimes that need it (see
http://gcc.gnu.org/ml/gcc-patches/2012-06/msg01164.html), in case this
macro is removed in the future.
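
For readers unfamiliar with the wrappers mentioned above, the following
sketch shows the usual shape of a C++-aware C header.  The file and function
names (demo_math.h, demo_add) are hypothetical and are not taken from newlib.
A header written this way declares C linkage itself, so the compiler does not
have to treat its contents as implicitly enclosed in extern "C", which is what
defining NO_IMPLICIT_EXTERN_C tells GCC.

```c
/* demo_math.h: hypothetical header, for illustration only.  */
#ifndef DEMO_MATH_H
#define DEMO_MATH_H

#ifdef __cplusplus
extern "C" {            /* Only seen by a C++ compiler.  */
#endif

/* Declared with C linkage in both C and C++ translation units.  */
int demo_add (int a, int b);

#ifdef __cplusplus
}
#endif

#endif /* DEMO_MATH_H */

/* demo_math.c: the definition, compiled as plain C.  */
int
demo_add (int a, int b)
{
  return a + b;
}
```

A C compiler never sees the extern "C" lines because __cplusplus is not
defined, so the same header serves both languages.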

Thanks

Christian



2012-09-04  Christian Bruel  

	* config/sh/newlib.h (NO_IMPLICIT_EXTERN_C): Define.

Index: config/sh/newlib.h
===
--- config/sh/newlib.h	(revision 190714)
+++ config/sh/newlib.h	(working copy)
@@ -23,3 +23,7 @@

 #undef LIB_SPEC
 #define LIB_SPEC "-lc -lgloss"
+
+#undef  NO_IMPLICIT_EXTERN_C
+#define NO_IMPLICIT_EXTERN_C 1
+

Re: [Patch,avr] PR54461: Better AVR-Libc integration

2012-09-03 Thread Georg-Johann Lay


Gabriel Dos Reis wrote:
> Georg-Johann Lay wrote:
>> Gabriel Dos Reis wrote:
>>> Georg-Johann Lay wrote:
>>>> Gabriel Dos Reis wrote:
>>>>> Georg-Johann Lay wrote:
>>>>>>
>>>>>> AVR-Libc comes with hand-optimized float support functions
>>>>>> written in assembler.  These functions use the same naming
>>>>>> conventions as libgcc.  There are situations where this
>>>>>> name clash leads to performance regressions because the
>>>>>> functions from libgcc are linked.  One example is the new
>>>>>> fixed-point support, which converts fixed-point to/from float
>>>>>> and references float/int conversion functions from within
>>>>>> libgcc.
>>>>>>
>>>>>> The float implementation in libm.a has been discussed
>>>>>> several times, with the only result that it is very unlikely
>>>>>> that the code will ever be integrated into libgcc because
>>>>>> the original authors are no longer around.  And it is much
>>>>>> less work to add a new configure switch than to port and
>>>>>> integrate the code, given there were no license issues.  One
>>>>>> point against such an extension was that such a change to the
>>>>>> compiler establishes a dependency between the compiler and
>>>>>> AVR-Libc, but this decision was made long ago by
>>>>>> accepting code that actually should have been added to
>>>>>> libgcc -- but was not, for whatever reason.
>>>>>>
>>>>>> This patch removes those performance regressions by removing
>>>>>> the doubly implemented functions from libgcc by means of a
>>>>>> new configure option --with-avrlibc.
>>>>>
>>>>> As I stated yesterday, I do not understand why there needs to
>>>>> be yet another configure option.  The NATURAL libc for AVR
>>>>> targets is AVR-Libc.  We should not need a switch for that.
>>>>
>>>> There is also newlib that is used with avr-gcc.  I know this
>>>> because some bugs are only triggered for newlib.  If there are
>>>> users that report bugs if avr-gcc is built for newlib, I'd
>>>> guess these users are actually interested in using newlib.
>>>
>>> I did not say there was no other libc library.  I said that the
>>> *natural* libc appears to be AVR-libc.
>>>
>>> We don't configure GCC/g++ saying --with-libstdc++.
>>
>> That's a different story because these libraries support in-tree
>> build just like newlib does.  This is not true for AVR-Libc which
>> does not support in-tree builds.
>>
>> I agree that AVR-Libc is the most common libc implementation used
>> with avr-gcc and it has many advantages over other libc
>> implementations (except that it does not support in-tree builds).
>
> I think the "in-tree builds" thing is a red herring.

I don't think so.  If there were an in-tree build, GCC could detect
by itself whether or not AVR-Libc is present.  With the current setup
the user has to specify that, in whichever direction: that libc is
there or that libc is not there, depending on whatever is the default.

>> However, a --with-avrlibc is not needed to *get* the support from
>> AVR-Libc, it's just used to fix some problems that arise in certain
>> use cases.
>
> so, let's make it the default -- see below.
>
>> Besides that, the proposed arrangement does not affect the
>> configuration if the switch is *not* specified, thus the patch is
>> appropriate to be backported.
>>
>> My intention is to backport it to 4.7 as indicated by the
>> milestone, but if the change were unconditional, I don't think it
>> would be appropriate for a backport.
>
> It is perfectly reasonable and OK to make the backport more
> guarded (e.g. by the configure option) than on mainline.
>
>> And after all it's just a *configure* option that some distribution
>> maintainers can set if they want to.
>
> yes, but it is still one more configure option.

Hmm.  The configure machinery was not changed; it automatically sets
with_foo if --with-foo is specified.  It's just about who is to
be blamed if he does not read the release notes ;-)

Anyway, I think the two of us are stuck now, and enough arguments have
been passed back and forth.  Let the port maintainers decide.

And Jörg, would you check the excludes list in t-avrlibc?

Johann

>> The tool chain user is not bothered at all by the new option and
>> won't even notice it.  From the user perspective it's just as if
>> some optimizations had been added to the tool chain.
>>
>> What do you propose?
>>
>> Use the setting by default and support --with-avrlibc=no if the
>> user wants full libgcc support and nothing removed from it?
>
> Yes. Let's make the "sane" behaviour the default.
>
> -- Gaby
