Fix PR ada/50842

2011-10-28 Thread Eric Botcazou
We now need to link the gnattools with libiconv on Darwin 9, since we link with 
libcpp.a and other libraries.  The patch also gets rid of EXTRA_GNATTOOLS_OBJS 
which doesn't make much sense any more.

Tested by Dave and Dominique, applied on the mainline.


2011-10-28  Eric Botcazou  

PR ada/50842
* gcc-interface/Makefile.in (SYMDEPS): Delete.
(LIBICONV): New variable.
(LIBICONV_DEP): Likewise.
(LIBS): Add $(LIBICONV).
(LIBDEPS): Add $(LIBICONV_DEP).
(EXTRA_GNATTOOLS_OBJS): Merge into...
(TOOLS_LIBS): ...this.  Add $(LIBICONV).


-- 
Eric Botcazou
Index: gcc-interface/Makefile.in
===
--- gcc-interface/Makefile.in	(revision 180423)
+++ gcc-interface/Makefile.in	(working copy)
@@ -121,7 +121,6 @@ THREAD_KIND = native
 THREADSLIB =
 GMEM_LIB =
 MISCLIB =
-SYMDEPS = $(LIBINTL_DEP)
 OUTPUT_OPTION = @OUTPUT_OPTION@
 
 objext = .o
@@ -175,13 +174,13 @@ top_builddir = ../..
 LIBINTL = @LIBINTL@
 LIBINTL_DEP = @LIBINTL_DEP@
 
+# Character encoding conversion library.
+LIBICONV = @LIBICONV@
+LIBICONV_DEP = @LIBICONV_DEP@
+
 # Any system libraries needed just for GNAT.
 SYSLIBS = @GNAT_LIBEXC@
 
-# List of extra object files linked in with various programs.
-EXTRA_GNATTOOLS_OBJS = ../../libcommon-target.a ../../libcommon.a \
-	../../../libcpp/libcpp.a
-
 # List extra gnattools
 EXTRA_GNATTOOLS =
 
@@ -242,11 +241,13 @@ LIBIBERTY = ../../libiberty/libiberty.a
 
 # How to link with both our special library facilities
 # and the system's installed libraries.
-LIBS = $(LIBINTL) $(LIBIBERTY) $(SYSLIBS)
-LIBDEPS = $(LIBINTL_DEP) $(LIBIBERTY)
+LIBS = $(LIBINTL) $(LIBICONV) $(LIBIBERTY) $(SYSLIBS)
+LIBDEPS = $(LIBINTL_DEP) $(LIBICONV_DEP) $(LIBIBERTY)
 # Default is no TGT_LIB; one might be passed down or something
 TGT_LIB =
-TOOLS_LIBS = $(EXTRA_GNATTOOLS_OBJS) targext.o link.o $(LIBGNAT) $(LIBINTL) ../../../libiberty/libiberty.a $(SYSLIBS) $(TGT_LIB)
+TOOLS_LIBS = targext.o link.o ../../libcommon-target.a ../../libcommon.a \
+  ../../../libcpp/libcpp.a $(LIBGNAT) $(LIBINTL) $(LIBICONV) \
+  ../../../libiberty/libiberty.a $(SYSLIBS) $(TGT_LIB)
 
 # Convert the target variable into a space separated list of architecture,
 # manufacturer, and operating system and assign each of those to its own


Re: [PATCH] Account for devirtualization opportunities in inliner

2011-10-28 Thread Maxim Kuvyrkov
Jan,

Attached is the updated patch.  The only major change is the addition of 
indirect_call_cost to size and time weights.  I've set the size cost of an
indirect call to 3, which is what I remember calculating when I looked into
costs a couple of months ago: one call instruction for the call itself, one
memory instruction to pull the call address out of vtable, and one ALU 
instruction to calculate the address inside vtable.  On architectures with 
base+offset addressing the above can be shrunk into 2 instructions.
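
To make the three-instruction estimate concrete, here is a rough sketch of a virtual call written out by hand in C.  The struct and function names are hypothetical and are not part of the patch; the instruction counts in the comments are the estimate from this mail, not measurements.

```c
/* Hypothetical hand-written vtable; a C++ front end lays out something
   similar for a class with virtual methods.  */
struct shape;

struct shape_vtable
{
  int (*area) (const struct shape *);
  int (*perimeter) (const struct shape *);
};

struct shape
{
  const struct shape_vtable *vptr;
  int w, h;
};

static int
rect_area (const struct shape *s)
{
  return s->w * s->h;
}

static int
rect_perimeter (const struct shape *s)
{
  return 2 * (s->w + s->h);
}

static const struct shape_vtable rect_vtable = { rect_area, rect_perimeter };

/* Indirect call.  Once the vtable pointer is at hand, the call needs an
   ALU instruction to form the slot address inside the vtable, a memory
   instruction to load the function pointer, and the call itself.  With
   base+offset addressing the first two can be folded into one.  */
int
call_perimeter_indirect (const struct shape *s)
{
  return s->vptr->perimeter (s);
}

/* Direct call: what is left once the callee is known.  */
int
call_perimeter_direct (const struct shape *s)
{
  return rect_perimeter (s);
}
```

The difference between the two functions is what indirect_call_cost is meant to approximate.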

The remapping of the known_vals and known_binfos did indeed turn out to work 
just fine.  Probably, that was a bug that has since been fixed.

The patch bootstraps and passes regtest.

Comments?  OK for trunk?

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics



On 28/10/2011, at 3:43 PM, Maxim Kuvyrkov wrote:

> On 20/10/2011, at 10:11 PM, Jan Hubicka wrote:
>> static clause_t
>> -evaluate_conditions_for_edge (struct cgraph_edge *e, bool inline_p)
>> +evaluate_conditions_vals_binfos_for_edge (struct cgraph_edge *e,
>> +  bool inline_p,
>> +  VEC (tree, heap) **known_vals_ptr,
>> +  VEC (tree, heap) **known_binfos_ptr)
>> 
>> Hmm, I would make the clause also returned by reference to be consistent and
>> perhaps call it something like edge_properties,
>> since it is not really only about evaluating the clause anymore.
> 
> Agree.
> 
>> 
>> -/* Increase SIZE and TIME for size and time needed to handle all calls in 
>> NODE.  */
>> +/* Estimate benefit devirtualizing indirect edge IE, provided KNOWN_VALS and
>> +   KNOWN_BINFOS.  */
>> +
>> +static void
>> +estimate_edge_devirt_benefit (struct cgraph_edge *ie,
>> +  int *size, int *time, int prob,
>> +  VEC (tree, heap) *known_vals,
>> +  VEC (tree, heap) *known_binfos)
>> 
>> I think this whole logic should go into estimate_edge_time_and_size.  This 
>> way
>> we will save all the duplication of scaling logic
>> Just add the known_vals/binfos arguments.
> 
> Then devirtualization benefit will not be available through 
> estimate_node_size_and_time, which is the primary interface for users of 
> ipa-inline-analysis other than the inliner itself.  I.e., 
> estimate_ipcp_clone_size_and_time, which is the only other user of the 
> analysis at the moment, will not see devirtualization benefit.
> 
>> 
>> I am not quite sure how to estimate the actual benefits.  estimate_num_insns
>> doesn't really make a difference in between direct and indirect calls.
>> 
>> I see it is a good idea to inline more when the destination is known &
>> inlinable.
>> This is an example when we have additional knowledge that we want to mix into
>> badness metric that does not directly translate to time/size.  There are 
>> multiple
>> cases like this.  I was thinking of adding kind of bonus metric for this 
>> purpose,
>> but I would suggest doing this incrementally.
> 
> I too thought about this, and decided to keep the bonus metric part to a bare
> minimum in this patch.
> 
>> 
>> What about
>> 1) extending estimate_num_insns weights to account direct calls differently
>>   from indirect calls (i.e. adding indirect_call cost value into eni weights)
>>   I would set it 2 for size metrics and 15 for time metrics for start
>> 2) make estimate_edge_time_and_size to subtract difference of those two 
>> metrics
>>   from edge costs when destination is direct.
> 
> OK, I'll try this.
> 
>> @@ -2125,25 +2207,35 @@ estimate_calls_size_and_time (struct cgraph_node 
>> *node, int *size, int *time,
>>  }
>>else
>>  estimate_calls_size_and_time (e->callee, size, time,
>> -  possible_truths);
>> +  possible_truths,
>> +  /* TODO: remap KNOWN_VALS and
>> + KNOWN_BINFOS to E->CALLEE
>> + parameters, and use them.  */
>> +  NULL, NULL);
>> 
>> Remapping should not be needed here - the jump functions are merged after
>> marking edge inline, so jump functions in inlined functions actually refer
>> to the parameters of the function they are inlined to.
> 
> I remember it crashing on some testcase and thought the lack of remapping was 
> the cause.  I'll look into this.
> 
> Thank you,
> 
> --
> Maxim Kuvyrkov
> CodeSourcery / Mentor Graphics
> 



fsf-gcc-devirt-account-2.ChangeLog
Description: Binary data


fsf-gcc-devirt-account-2.patch
Description: Binary data


Re: [PATCH i386] PR47698 no CMOV for volatile mem

2011-10-28 Thread Richard Guenther
On Thu, 27 Oct 2011, Uros Bizjak wrote:

> Hello!
> 
> > Here's a patch for PR47698, which is about CMOV should not be
> > generated for memory address marked as volatile.
> > Successfully bootstrapped and passed make check on x86_64-unknown-linux-gnu.
> 
> 
>   PR rtl-optimization/47698
>   * config/i386/i386.c (ix86_expand_int_movcc) prevent CMOV generation
>   for volatile mem
> 
>   PR rtl-optimization/47698
>   * gcc.target/i386/47698.c: New test
> 
> Please use punctuation marks and correct capitalization in ChangeLog entries.
> 
> OTOH, do we want to fix this per-target, or in the middle-end?

The middle-end pattern documentation does not say operands 2 and 3
are not evaluated if they do not end up being stored, so a middle-end
fix is more appropriate.
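
For reference, the kind of source at issue looks roughly like the sketch below.  The names are made up for illustration and are not taken from the PR testcase.

```c
/* Hypothetical device register; reading it may have side effects.  */
volatile int status_reg;

int
select_value (int cond, int fallback)
{
  /* In the abstract machine, status_reg is read only when COND is
     nonzero.  Expanding this as an unconditional load followed by a
     conditional move would read the volatile location even when COND
     is zero, which is the behavior the PR complains about.  */
  return cond ? status_reg : fallback;
}
```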

Richard.


Re: [PATCH][PING] Vectorize conversions directly

2011-10-28 Thread Dmitry Plotnikov

Here is the patch updated according to recent comments.

2011-10-28  Dmitry Plotnikov 

gcc/
* tree-cfg.c (verify_gimple_assign_unary): Allow vector conversions.
* optabs.c (supportable_convert_operation): New function.
* optabs.h (supportable_convert_operation): New prototype.
* tree-vect-stmts.c (vectorizable_conversion): Change condition and
  behavior for NONE modifier case.
* tree.h (VECTOR_INTEGER_TYPE_P): New macro.

gcc/config/arm/
* neon.md (floatv2siv2sf2): New.
  (floatunsv2siv2sf2): New.
  (fix_truncv2sfv2si2): New.
  (fixuns_truncv2sfv2si2): New.
  (floatv4siv4sf2): New.
  (floatunsv4siv4sf2): New.
  (fix_truncv4sfv4si2): New.
  (fixuns_truncv4sfv4si2): New.

gcc/testsuite/
* gcc.target/arm/vect-vcvt.c: New test.
* gcc.target/arm/vect-vcvtq.c: New test.

gcc/testsuite/lib/
* target-supports.exp (check_effective_target_vect_intfloat_cvt): True
  for ARM NEON.
  (check_effective_target_vect_uintfloat_cvt): Likewise.
  (check_effective_target_vect_intfloat_cvt): Likewise.
  (check_effective_target_vect_floatuint_cvt): Likewise.
  (check_effective_target_vect_floatint_cvt): Likewise.
  (check_effective_target_vect_extract_even_odd): Likewise.
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index ea09da2..0dd13a6 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2945,6 +2945,62 @@
(const_string "neon_fp_vadd_qqq_vabs_qq")))]
 )
 
+(define_insn "floatv2siv2sf2"
+  [(set (match_operand:V2SF 0 "s_register_operand" "=w")
+   (float:V2SF (match_operand:V2SI 1 "s_register_operand" "w")))]
+  "TARGET_NEON && !flag_rounding_math"
+  "vcvt.f32.s32\t%P0, %P1"
+)
+
+(define_insn "floatunsv2siv2sf2"
+  [(set (match_operand:V2SF 0 "s_register_operand" "=w")
+   (unsigned_float:V2SF (match_operand:V2SI 1 "s_register_operand" "w")))] 
+  "TARGET_NEON && !flag_rounding_math"
+  "vcvt.f32.u32\t%P0, %P1"
+)
+
+(define_insn "fix_truncv2sfv2si2"
+  [(set (match_operand:V2SI 0 "s_register_operand" "=w")
+(fix:V2SI (match_operand:V2SF 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.s32.f32\t%P0, %P1"
+)
+
+(define_insn "fixuns_truncv2sfv2si2"
+  [(set (match_operand:V2SI 0 "s_register_operand" "=w")
+(unsigned_fix:V2SI (match_operand:V2SF 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.u32.f32\t%P0, %P1"
+)
+
+(define_insn "floatv4siv4sf2"
+  [(set (match_operand:V4SF 0 "s_register_operand" "=w")
+   (float:V4SF (match_operand:V4SI 1 "s_register_operand" "w")))]
+  "TARGET_NEON && !flag_rounding_math"
+  "vcvt.f32.s32\t%q0, %q1"
+)
+
+(define_insn "floatunsv4siv4sf2"
+  [(set (match_operand:V4SF 0 "s_register_operand" "=w")
+   (unsigned_float:V4SF (match_operand:V4SI 1 "s_register_operand" "w")))]
+  "TARGET_NEON && !flag_rounding_math"
+  "vcvt.f32.u32\t%q0, %q1"
+)
+
+(define_insn "fix_truncv4sfv4si2"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+(fix:V4SI (match_operand:V4SF 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.s32.f32\t%q0, %q1"
+)
+
+(define_insn "fixuns_truncv4sfv4si2"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+(unsigned_fix:V4SI (match_operand:V4SF 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.u32.f32\t%q0, %q1"
+)
+
 (define_insn "neon_vcvt"
   [(set (match_operand: 0 "s_register_operand" "=w")
 	(unspec: [(match_operand:VCVTI 1 "s_register_operand" "w")
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 0ba1333..920d756 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -4727,6 +4727,60 @@ can_float_p (enum machine_mode fltmode, enum machine_mode fixmode,
   tab = unsignedp ? ufloat_optab : sfloat_optab;
   return convert_optab_handler (tab, fltmode, fixmode);
 }
+
+/* Function supportable_convert_operation
+
+   Check whether an operation represented by the code CODE is a
+   convert operation that is supported by the target platform in
+   vector form (i.e., when operating on arguments of type VECTYPE_IN
+   producing a result of type VECTYPE_OUT).
+   
+   Convert operations we currently support directly are FIX_TRUNC and FLOAT.
+   This function checks if these operations are supported
+   by the target platform either directly (via vector tree-codes), or via
+   target builtins.
+   
+   Output:
+   - CODE1 is code of vector operation to be used when
+   vectorizing the operation, if available.
+   - DECL is decl of target builtin functions to be used
+   when vectorizing the operation, if available.  In this case,
+   CODE1 is CALL_EXPR.  */
+
+bool
+supportable_convert_operation (enum tree_code code,
+tree vectype_out, tree vectype_in,
+tree *decl, enum tree_code *code1)
+{
+  enum machine_mode m1,m2;
+  int truncp;
+
+  m1 = TYPE_MODE (vectype_out);
+  m2 = TYPE_MODE (vectype_in);
+
+  /* First check if we can do the conversion directly.  */
+  if ((code == FIX_TRUNC_EXPR 

Re: [PATCH] Don't ICE on long long shifts in vectorizable_shift

2011-10-28 Thread Richard Guenther
On Thu, 27 Oct 2011, Jakub Jelinek wrote:

> Hi!
> 
> With the patch I'm going to post momentarily which adds vlshrv{4,2}di and
> vashlv{4,2}di patterns for -mavx2 vectorizable_shift ICEs, because the
> frontends for long_long_var1 << long_long_var2 emit long_long_var1 << (int) 
> long_long_var2
> and vectorizable_shift isn't prepared to handle type promotion (or
> demotion).  IMHO it would complicate it too much, so this patch just
> gives up on vectorizing in that case.
> 
> I can work on Monday on pattern recognizer that will change
> shifts/rotates where the rhs1 has different size from rhs2 into
> a pattern with def_stmt casting the rhs2 to the same type as rhs1
> and pattern_stmt that uses the temporary for rhs2, perhaps with extra
> optimization if it sees the type being promoted/demoted again to just look
> through those.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Hm, but you are testing vector modes in the path that is supposed to
handle shifts by a scalar.  That looks odd.  Also it should be easy
to promote/demote the scalar value to the proper type, so why not
do that?  Like how we do

  /* Unlike the other binary operators, shifts/rotates have
 the rhs being int, instead of the same type as the lhs,
 so make sure the scalar is the right type if we are
 dealing with vectors of short/char.  */
  if (dt[1] == vect_constant_def)
op1 = fold_convert (TREE_TYPE (vectype), op1);

just do that for all kinds of defs (and deal with the fact that you
may need a gimplified stmt for the vector shift amount generation).

?

Thanks,
Richard.


> 2011-10-27  Jakub Jelinek  
> 
>   * tree-vect-stmts.c (vectorizable_shift): Give up if op1 has different
>   vector mode from vectype's mode.
> 
> --- gcc/tree-vect-stmts.c.jj  2011-10-27 08:42:51.0 +0200
> +++ gcc/tree-vect-stmts.c 2011-10-27 17:24:15.0 +0200
> @@ -2318,6 +2318,7 @@ vectorizable_shift (gimple stmt, gimple_
>int nunits_in;
>int nunits_out;
>tree vectype_out;
> +  tree op1_vectype;
>int ncopies;
>int j, i;
>VEC (tree, heap) *vec_oprnds0 = NULL, *vec_oprnds1 = NULL;
> @@ -2387,7 +2388,8 @@ vectorizable_shift (gimple stmt, gimple_
>  return false;
>  
>op1 = gimple_assign_rhs2 (stmt);
> -  if (!vect_is_simple_use (op1, loop_vinfo, bb_vinfo, &def_stmt, &def, 
> &dt[1]))
> +  if (!vect_is_simple_use_1 (op1, loop_vinfo, bb_vinfo, &def_stmt, &def,
> +  &dt[1], &op1_vectype))
>  {
>if (vect_print_dump_info (REPORT_DETAILS))
>  fprintf (vect_dump, "use not simple.");
> @@ -2444,6 +2446,13 @@ vectorizable_shift (gimple stmt, gimple_
>optab = optab_for_tree_code (code, vectype, optab_vector);
>if (vect_print_dump_info (REPORT_DETAILS))
>  fprintf (vect_dump, "vector/vector shift/rotate found.");
> +  if (TYPE_MODE (op1_vectype) != TYPE_MODE (vectype))
> + {
> +   if (vect_print_dump_info (REPORT_DETAILS))
> + fprintf (vect_dump, "unusable type for last operand in"
> + " vector/vector shift/rotate.");
> +   return false;
> + }
>  }
>/* See if the machine has a vector shifted by scalar insn and if not
>   then see if it has a vector shifted by vector insn.  */
> 
>   Jakub
> 
> 

-- 
Richard Guenther 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer

Re: [RFC PATCH] update to libtool-2.4.2 and regenerate

2011-10-28 Thread Rainer Orth
Markus Trippelsdorf  writes:

> By popular demand, I've prepared a patch that updates the in-tree
> libtool to version 2.4.2. It is needed for lto-bootstrap with
> -fno-fat-lto-objects and FreeBSD10.x versions. 

I see that your patch doesn't deal with libgo/config, where a private
copy of libtool is kept.  Would it be possible to get rid of that, given
that 2.4.2 does support Go?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Miscompilation of __attribute__((constructor)) functions.

2011-10-28 Thread Richard Guenther
On Thu, Oct 27, 2011 at 7:24 PM, Paul Brook  wrote:
> Patch below fixes a miscompilation observed when building uclibc libpthread on
> a mips-linux system.
>
> The story starts with the ipa-split optimization, which turns:
>
> void fn()
> {
>  if (cond) {
>    DO_STUFF;
>  }
> }
>
> into:
>
> static void fn_helper()
> {
>  DO_STUFF;
> }
>
> void fn()
> {
>  if (cond)
>    fn_helper();
> }
>
>
> The idea is that the new fn() wrapper is a good candidate for inlining,
> whereas the original fn is not.
>
> The optimization uses cgraph function versioning.  The problem is that when we
> clone the cgraph node we propagate the DECL_STATIC_CONSTRUCTOR bit.  Thus both
> fn() and fn_helper() get called on startup.
>
> When fn happens to be pthread_initialize we end up calling both the original
> and a clone with the have-I-already-done-this check removed. Much
> badness ensues.
>
> Patch below fixes this by clearing the DECL_STATIC_{CON,DES}TRUCTOR bit when
> cloning a cgraph node - there's already logic to make sure we keep the
> original.  My guess is this bug is probably latent in other IPA passes.
>
> Tested on mips-linux and bootstrap+test x86_64-linux
> Ok?

Ok if you move the clearing to after

  /* Generate a new name for the new version. */
  DECL_NAME (new_decl) = clone_function_name (old_decl, clone_name);
  SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl));
  SET_DECL_RTL (new_decl, NULL);

using new_decl directly, thus add

  /* When the old decl was a con-/destructor make sure the clone isn't.  */
  DECL_STATIC_CONSTRUCTOR(new_decl) = 0;
  DECL_STATIC_DESTRUCTOR(new_decl) = 0;

Thanks,
Richard.

> Paul
>
> 2011-10-27  Paul Brook  
>
>        gcc/
>        * cgraphunit.c: Don't mark clones as static constructors.
>
>        gcc/testsuite/
>        * gcc.dg/constructor-1.c: New test.
>
> Index: gcc/cgraphunit.c
> ===
> --- gcc/cgraphunit.c    (revision 180439)
> +++ gcc/cgraphunit.c    (working copy)
> @@ -2386,6 +2386,8 @@ cgraph_function_versioning (struct cgrap
>   new_version_node->local.externally_visible = 0;
>   new_version_node->local.local = 1;
>   new_version_node->lowered = true;
> +  DECL_STATIC_CONSTRUCTOR(new_version_node->decl) = 0;
> +  DECL_STATIC_DESTRUCTOR(new_version_node->decl) = 0;
>
>   /* Update the call_expr on the edges to call the new version node. */
>   update_call_expr (new_version_node);
> Index: gcc/testsuite/gcc.dg/constructor-1.c
> ===
> --- gcc/testsuite/gcc.dg/constructor-1.c        (revision 0)
> +++ gcc/testsuite/gcc.dg/constructor-1.c        (revision 0)
> @@ -0,0 +1,37 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +/* The ipa-split pass pulls the body of the if(!x) block
> +   into a separate function to make foo a better inlining
> +   candidate.  Make sure this new function isn't also run
> +   as a static constructor.  */
> +
> +#include <stdlib.h>
> +
> +int x, y;
> +
> +void __attribute__((noinline))
> +bar(void)
> +{
> +  y++;
> +}
> +
> +void __attribute__((constructor))
> +foo(void)
> +{
> +  if (!x)
> +    {
> +      bar();
> +      y++;
> +    }
> +}
> +
> +int main()
> +{
> +  x = 1;
> +  foo();
> +  foo();
> +  if (y != 2)
> +    abort();
> +  exit(0);
> +}
>


Re: [PATCH] Pass through jump functions for addressable (scalar) parameters

2011-10-28 Thread Richard Guenther
On Thu, Oct 27, 2011 at 9:11 PM, Martin Jambor  wrote:
> Hi,
>
> On Thu, Oct 27, 2011 at 03:07:10PM +0200, Richard Guenther wrote:
>> On Wed, Oct 26, 2011 at 8:25 PM, Martin Jambor  wrote:
>> > Hi,
>> >
>> >
>> >
>> > 2011-10-26  Martin Jambor  
>> >
>> >        * ipa-prop.c (mark_modified): Moved up in the file.
>> >        (is_parm_modified_before_call): Renamed to
>> >        is_parm_modified_before_stmt, moved up in the file.
>> >        (load_from_unmodified_param): New function.
>> >        (compute_complex_assign_jump_func): Also attempt to create pass
>> >        through jump functions for values loaded from (addressable)
>> >        parameters.
>> >
>> >        * testsuite/gcc.dg/ipa/ipcp-4.c: New test.
>> >
>
> ...
>
>> > Index: src/gcc/ipa-prop.c
>> > ===
>> > --- src.orig/gcc/ipa-prop.c
>> > +++ src/gcc/ipa-prop.c
>> > @@ -419,31 +419,105 @@ detect_type_change_ssa (tree arg, gimple
>> >   return detect_type_change (arg, arg, call, jfunc, 0);
>> >  }
>> >
>> > +/* Callback of walk_aliased_vdefs.  Flags that it has been invoked to the
>> > +   boolean variable pointed to by DATA.  */
>> > +
>> > +static bool
>> > +mark_modified (ao_ref *ao ATTRIBUTE_UNUSED, tree vdef ATTRIBUTE_UNUSED,
>> > +                    void *data)
>> > +{
>> > +  bool *b = (bool *) data;
>> > +  *b = true;
>> > +  return true;
>> > +}
>> > +
>> > +/* Return true if the formal parameter PARM might have been modified in 
>> > this
>> > +   function before reaching the statement STMT.  PARM_AINFO is a pointer 
>> > to a
>> > +   structure containing temporary information about PARM.  */
>> > +
>> > +static bool
>> > +is_parm_modified_before_stmt (struct param_analysis_info *parm_ainfo,
>> > +                             gimple stmt, tree parm)
>> > +{
>> > +  bool modified = false;
>> > +  ao_ref refd;
>> > +
>> > +  if (parm_ainfo->modified)
>> > +    return true;
>> > +
>> > +  ao_ref_init (&refd, parm);
>> > +  walk_aliased_vdefs (&refd, gimple_vuse (stmt), mark_modified,
>> > +                     &modified, &parm_ainfo->visited_statements);
>> > +  if (modified)
>> > +    {
>> > +      parm_ainfo->modified = true;
>> > +      return true;
>> > +    }
>> > +  return false;
>> > +}
>> > +
>> > +/* If STMT is an assignment that loads a value from an parameter 
>> > declaration,
>> > +   return the index of the parameter in ipa_node_params which has not been
>> > +   modified.  Otherwise return -1.  */
>> > +
>> > +static int
>> > +load_from_unmodified_param (struct ipa_node_params *info,
>> > +                           struct param_analysis_info *parms_ainfo,
>> > +                           gimple stmt)
>> > +{
>> > +  int index;
>> > +  tree op1;
>> > +
>> > +  if (!gimple_assign_single_p (stmt)
>> > +      || gimple_assign_cast_p (stmt))
>>
>> The || gimple_assign_cast_p (stmt) check is redundant.  Hopefully
>> you don't want to test for VIEW_CONVERT_EXPR here?
>
> Yeah, right, I managed to confuse myself with all the gimple
> accessors.
>
>>
>> > +    return -1;
>> > +
>> > +  op1 = gimple_assign_rhs1 (stmt);
>> > +  index = ipa_get_param_decl_index (info, op1);
>>
>> That only succeeds for decls?  Where do you check this is actually
>> a (full) load?  I suppose it's a side-effect of ipa_get_param_decl_index
>> in some way, but it's not clear.  Because ...
>
> Yes, it is a side-effect of the function.  I've added the check for
> clarity and to save us the search but it is not strictly necessary.
>
>>
>> > +  if (index < 0
>> > +      || is_parm_modified_before_stmt (&parms_ainfo[index], stmt, op1))
>>
>> ... entering this without a VUSE (in case op1 is an SSA name, for example)
>> will do interesting things (likely ICE, at least compute garbage).
>
> I know.  This has to be a load, though.  I added a checking assert for
> non-NULL VUSE too, just for convenience.
>
>>
>> So, do you want to have TREE_CODE (op1) == PARM_DECL here?
>>
>> > +    return -1;
>> > +
>> > +  return index;
>> > +}
>> >
>> >  /* Given that an actual argument is an SSA_NAME (given in NAME) and is a 
>> > result
>> >    of an assignment statement STMT, try to find out whether NAME can be
>> >    described by a (possibly polynomial) pass-through jump-function or an
>> > -   ancestor jump function and if so, write the appropriate function into
>> > -   JFUNC */
>> > +   ancestor jump function and if so, write the appropriate function into 
>> > JFUNC.
>> > +   PARMS_AINFO describes state of analysis with respect to individual 
>> > formal
>> > +   parameters.  */
>> >
>> >  static void
>> >  compute_complex_assign_jump_func (struct ipa_node_params *info,
>> > +                                 struct param_analysis_info *parms_ainfo,
>> >                                  struct ipa_jump_func *jfunc,
>> >                                  gimple call, gimple stmt, tree name)
>> >  {
>> >   HOST_WIDE_INT offset, size, max_size;
>> > -  tree op1, op2, base, ssa;
>> > +  tree op1, tc_ssa, base,

Re: [PATCH] Don't ICE on long long shifts in vectorizable_shift

2011-10-28 Thread Jakub Jelinek
On Fri, Oct 28, 2011 at 10:22:15AM +0200, Richard Guenther wrote:
> Hm, but you are testing vector modes in the path that is supposed to
> handle shifts by a scalar.  That looks odd.  Also it should be easy

No, I'm testing it in the path that is supposed to handle shifts by a
vector.  That block starts with:
  /* Vector shifted by vector.  */
  if (!scalar_shift_arg)
{

Which means I'd have to duplicate there big parts of
vectorizable_type_promotion and vectorizable_type_demotion
and handle all these widening resp. narrowing cases.

I know you don't like tree-vect-pattern.c too much, but IMHO just
changing the shifts in there to have rhs2 type matching rhs1 type
would be far easier and more maintainable.  Especially when it can handle
additionally what one of the testcases does - long long shift
with long long shift count, which should be naturally vectorized without
any promotion/demotion, but the FEs insert there a cast to (int) which would
result in the promotion/demotion.
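
To illustrate the two situations, here is a small sketch.  The array and function names are hypothetical and are not the ones used in the testcases.

```c
#define N 64

long long a[N], b[N], c[N];
int d[N];

/* The shift count has the same type as the shifted value, yet the front
   ends still narrow it, as if this were written a[i] << (int) b[i].
   A pattern recognizer could look through that cast and vectorize this
   without any promotion or demotion.  */
void
shift_ll_by_ll (void)
{
  int i;

  for (i = 0; i < N; i++)
    c[i] = a[i] << b[i];
}

/* Here the count really is narrower than the shifted value, so rhs2
   would have to be widened to the type of rhs1 first.  */
void
shift_ll_by_int (void)
{
  int i;

  for (i = 0; i < N; i++)
    c[i] = a[i] << d[i];
}
```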

Jakub


[PATCH] Cleanup AVX2 vector/vector shifts (take 2)

2011-10-28 Thread Jakub Jelinek
On Thu, Oct 27, 2011 at 10:07:13PM +0200, Uros Bizjak wrote:
> Please use expressive RTX forms for expanders, similar to the above
> define_insn RTX. You can avoid calling gen_avx2_lshrv at the end
> of c code. Also, expanders can have nonimmediate_operand as operand 2
> and conditionally move it to register in C code block if needed.

Like this?

In addition to that, the patch also enables all three patterns for V2DImode
with -mxop (all this depends on some solution for the vectorizable_shift
ICE I posted about yesterday).  Except for the XOP-only left shift pattern,
the expanders use nonimmediate_operand on the last operand; even the XOP
patterns that start with a negation of the last operand can use
nonimmediate_operand, since that is what neg<mode>2 uses.

2011-10-28  Jakub Jelinek  

* config/i386/sse.md (VI4SD_AVX2): Removed.
(VI48_AVX2, VI128_128, VI48_128, VI48_256): New mode iterators.
(vashl3): Use VI12_128 iterator instead of VI124_128.
Add another expander using VI48_128 iterator for
TARGET_AVX2 || TARGET_XOP and another using VI48_256 iterator
for TARGET_AVX2.
(vlshr3): Likewise.  Change register_operand predicate to
nonimmediate_operand on last operand in the VI12_128 expander.
(vashr3): Use VI128_128 iterator instead of VI124_128.
(vashrv4si3, vashrv8si3): New expanders.
(avx2_ashrvv8si, avx2_ashrvv4si, avx2_vv8si,
avx2_vv2di): Removed.
(avx2_ashrv): New insn with VI4_AVX2 iterator.
(avx2_v): Macroize using VI48_AVX2
iterator.  Simplify pattern.

* gcc.dg/vshift-1.c: New test.
* gcc.dg/vshift-2.c: New test.
* gcc.target/i386/xop-vshift-1.c: New test.
* gcc.target/i386/xop-vshift-2.c: New test.
* gcc.target/i386/avx2-vshift-1.c: New test.

--- gcc/config/i386/sse.md.jj   2011-10-28 09:59:54.0 +0200
+++ gcc/config/i386/sse.md  2011-10-28 10:29:22.0 +0200
@@ -125,8 +125,9 @@ (define_mode_iterator VI248_AVX2
(V8SI "TARGET_AVX2") V4SI
(V4DI "TARGET_AVX2") V2DI])
 
-(define_mode_iterator VI4SD_AVX2
-  [V4SI V4DI])
+(define_mode_iterator VI48_AVX2
+  [(V8SI "TARGET_AVX2") V4SI
+   (V4DI "TARGET_AVX2") V2DI])
 
 (define_mode_iterator V48_AVX2
   [V4SF V2DF
@@ -191,11 +192,14 @@ (define_mode_iterator VI_256 [V32QI V16H
 (define_mode_iterator VI12_128 [V16QI V8HI])
 (define_mode_iterator VI14_128 [V16QI V4SI])
 (define_mode_iterator VI124_128 [V16QI V8HI V4SI])
+(define_mode_iterator VI128_128 [V16QI V8HI V2DI])
 (define_mode_iterator VI24_128 [V8HI V4SI])
 (define_mode_iterator VI248_128 [V8HI V4SI V2DI])
+(define_mode_iterator VI48_128 [V4SI V2DI])
 
 ;; Random 256bit vector integer mode combinations
 (define_mode_iterator VI124_256 [V32QI V16HI V8SI])
+(define_mode_iterator VI48_256 [V8SI V4DI])
 
 ;; Int-float size matches
 (define_mode_iterator VI4F_128 [V4SI V4SF])
@@ -11265,11 +11269,10 @@ (define_insn "xop_vrotl3"
(set_attr "mode" "TI")])
 
 ;; XOP packed shift instructions.
-;; FIXME: add V2DI back in
 (define_expand "vlshr<mode>3"
-  [(match_operand:VI124_128 0 "register_operand" "")
-   (match_operand:VI124_128 1 "register_operand" "")
-   (match_operand:VI124_128 2 "register_operand" "")]
+  [(match_operand:VI12_128 0 "register_operand" "")
+   (match_operand:VI12_128 1 "register_operand" "")
+   (match_operand:VI12_128 2 "nonimmediate_operand" "")]
   "TARGET_XOP"
 {
   rtx neg = gen_reg_rtx (<MODE>mode);
@@ -11278,10 +11281,33 @@ (define_expand "vlshr3"
   DONE;
 })
 
+(define_expand "vlshr<mode>3"
+  [(set (match_operand:VI48_128 0 "register_operand" "")
+   (lshiftrt:VI48_128
+ (match_operand:VI48_128 1 "register_operand" "")
+ (match_operand:VI48_128 2 "nonimmediate_operand" "")))]
+  "TARGET_AVX2 || TARGET_XOP"
+{
+  if (!TARGET_AVX2)
+{
+  rtx neg = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_neg<mode>2 (neg, operands[2]));
+  emit_insn (gen_xop_lshl<mode>3 (operands[0], operands[1], neg));
+  DONE;
+}
+})
+
+(define_expand "vlshr<mode>3"
+  [(set (match_operand:VI48_256 0 "register_operand" "")
+   (lshiftrt:VI48_256
+ (match_operand:VI48_256 1 "register_operand" "")
+ (match_operand:VI48_256 2 "nonimmediate_operand" "")))]
+  "TARGET_AVX2")
+
 (define_expand "vashr<mode>3"
-  [(match_operand:VI124_128 0 "register_operand" "")
-   (match_operand:VI124_128 1 "register_operand" "")
-   (match_operand:VI124_128 2 "register_operand" "")]
+  [(match_operand:VI128_128 0 "register_operand" "")
+   (match_operand:VI128_128 1 "register_operand" "")
+   (match_operand:VI128_128 2 "nonimmediate_operand" "")]
   "TARGET_XOP"
 {
   rtx neg = gen_reg_rtx (<MODE>mode);
@@ -11290,16 +11316,59 @@ (define_expand "vashr3"
   DONE;
 })
 
+(define_expand "vashrv4si3"
+  [(set (match_operand:V4SI 0 "register_operand" "")
+   (ashiftrt:V4SI (match_operand:V4SI 1 "register_operand" "")
+  (match_operand:V4SI 2 "nonimmediate_operand" "")))]
+  "TARGET_AVX2 || TARGET_XOP"
+{
+  if (!TARGET_AVX2)
+{
+  rtx neg = gen_

Re: [PATCH] Add capability to run several iterations of early optimizations

2011-10-28 Thread Richard Guenther
On Thu, Oct 27, 2011 at 11:53 PM, Matt  wrote:
 Then you'd have to analyze the compile-time impact of the IPA
 splitting on its own when not iterating.  Then you should look
 at what actually was the optimizations that were performed
 that lead to the improvement (I can see some indirect inlining
 happening, but everything else would be a bug in present
 optimizers in the early pipeline - they are all designed to be
 roughly independent on each other and _not_ expose new
 opportunities by iteration).  Thus - testcases?
>>>
>>> The initial motivation for the patch was to enable more indirect
>>> inlining and devirtualization opportunities.
>
>> Hm.
>
> It is the proprietary codebase of my employer that these optimizations were
> developed for. Multiple iterations specifically help propagate the concrete
> type information from functions that implement the Abstract Factory design
> pattern, allowing for cleaner runtime dynamic dispatch. I can verify that in
> said codebase (and in the reduced, non-proprietary examples Maxim provided
> earlier in the year) it works quite effectively.
>
> Many of the devirt examples focus on a pure top-down approach like this:
> class I { virtual void f() = 0; };
> class K : public I { virtual void f() {} };
> class L: public I { virtual void f() {} };
> void g(I& i) { i.f(); }
> int main(void) { L l; g(l); return 0; }
>
> While that strategy isn't unheard of, it implies a link-time substitution to
> inject new/different sub-classes of the parameterized interface. Besides
> limiting extensibility by requiring a rebuild/relink, it also presupposes
> that two different implementations would be mutually exclusive for that
> module. That is often not the case, hence the factory pattern expressed in
> the other examples Maxim provided.
>
>>> Since then I found the patch to be helpful in searching for
>>> optimization opportunities and bugs.  E.g., SPEC2006's 471.omnetpp drops 20%
>>> with 2 additional iterations of early optimizations [*].  Given that
>>> applying more optimizations should, theoretically, not decrease performance,
>>> there is likely a very real bug or deficiency behind that.
>
>> It is likely early SRA that messes up, or maybe convert switch.  Early
>> passes should be really restricted to always profitable cleanups.
>
>> Your experiment looks useful to track down these bugs, but in general
>> I don't think we want to expose iterating early passes.
>
> In these other more top-down examples of devirt I mention above, I agree
> with you. Once the CFG is ordered and the analyses happen, things should be
> propagated forward without issue. In the case of factory functions, my
> understanding and experience on this real-world codebase is that multiple
> passes are required. First, to "bubble up" the concrete type info coming out
> of the factory function. Depending on how many layers, it may require a
> couple. Second, to then forward propagate that concrete type information for
> the pointer.
>
> There was a surprising side-effect when I started experimenting with this
> ipa-passes feature. In a module that contains ~100KLOC, I implemented
> mega-compilation (a poor-man's LTO). At two passes, the module got larger,
> which I expected. This minor growth continued with each additional pass,
> until at about 7 passes when it decreased by over 10%. I set up a script to
> run overnight to incrementally try passes and record the module size, and
> the "sweet spot" ended up being 54 passes as far as size. I took the three
> smallest binaries and did a full performance regression at the system level,
> and the smallest binary's inclusion resulted in an ~6% performance
> improvement (measured as overall network I/O throughput) while using less
> CPU on a Transmeta Crusoe-based appliance. (This is a web proxy, with about
> 500KLOC of other code that was not compiled in this new way.)
>
> The idea of multiple passes resulting in a smaller binary and higher
> performance was like a dream. I reproduced a similar pattern on open source
> projects, namely scummvm (on which I was able to use proper LTO)*. That is,
> smaller binaries resulted as well as decreased CPU usage. On some projects,
> this could possibly be correlated with micro-level benchmarks such as
> reduced branch mispredictions and L1 cache misses as reported by callgrind.
>
> While it's possible/probable that some of the performance improvements I saw
> by increasing ipa-passes were ultimately missed-optimization bugs that
> should be fixed, I'd be very surprised if *all* of those improvements were
> the case. As such, I would still like to see this exposed. I would be happy
> to file bugs and help test any instances where it looks like an optimization
> should have been gotten within a single ipa-pass.
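
For readers following the factory-function argument quoted above, here is a
minimal sketch of the shape of code being described. All names are
hypothetical and nothing here is taken from the codebase Matt mentions. The
concrete type is visible only inside the innermost factory, so the virtual
call in count_sides() can be resolved statically only after the type
information has been carried up through both factory layers.

```cpp
// Hypothetical illustration of the factory pattern discussed in the thread.
struct Shape {
  virtual ~Shape() {}
  virtual int sides() const = 0;
};

struct Triangle : Shape {
  int sides() const override { return 3; }
};

// Innermost factory: the only place that names the concrete type.
Shape *make_shape() { return new Triangle; }

// One extra layer, standing in for "depending on how many layers".
Shape *make_default_shape() { return make_shape(); }

// The caller sees only Shape*.  The call to sides() stays a virtual call
// unless the optimizer learns that s really points to a Triangle.
int count_sides() {
  Shape *s = make_default_shape();
  int n = s->sides();
  delete s;
  return n;
}
```

Whether a given GCC version devirtualizes this in one round of early
optimizations is exactly the open question in the thread; the sketch only
shows the source-level pattern.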

I briefly discussed the idea of iterating early optimizations with Honza.
I was trying to step back a bit and look at what we try to do right now,
which is, optimize functions in topological ord

[trans-mem] Explicitly go irrevocable even if transaction will always go irrevocable.

2011-10-28 Thread Torvald Riegel
The ABI does not require a TM runtime library to immediately start a
transaction in irrevocable mode if the beginTransaction flag
"doesGoIrrevocable" is set. This patch fixes this by inserting a call to
changeTransactionMode(0) into the entry block of all transactions that
always go irrevocable eventually.
Once we generate uninstrumented code paths too, we can optimize this by
telling the runtime that it is probably preferable to use uninstrumented
code right away. The instrumented path, if we generate it, should then
go irrevocable as late as possible, so that there is actually a choice
that the TM runtime can make (e.g., if we go irrevocable close to the
end of a long transaction, then being irrevocable for a shorter time can
increase overall performance).

OK for branch?
commit 64e5f9da61ea7f45748b411aa58404628040c343
Author: Torvald Riegel 
Date:   Fri Oct 28 10:55:38 2011 +0200

Explicitly go irrevocable even if transaction will always go irrevocable.

* trans-mem.c (ipa_tm_transform_transaction): Insert explicit request
to go irrevocable even if transaction will always go irrevocable.
* testsuite/gcc.dg/tm/irrevocable-8.c: New file.
* testsuite/gcc.dg/tm/memopt-1.c: Re-focus this test on tmmemopt.

--- a/gcc/ChangeLog.tm
+++ b/gcc/ChangeLog.tm
@@ -1,5 +1,12 @@
 2011-10-27  Torvald Riegel  
 
+   * trans-mem.c (ipa_tm_transform_transaction): Insert explicit request
+   to go irrevocable even if transaction will always go irrevocable.
+   * testsuite/gcc.dg/tm/irrevocable-8.c: New file.
+   * testsuite/gcc.dg/tm/memopt-1.c: Re-focus this test on tmmemopt.
+
+2011-10-27  Torvald Riegel  
+
* trans-mem.c (struct tm_region): Extended comment.
(tm_region_init): Fix tm_region association of blocks with the "over"
label used for transcation abort.
diff --git a/gcc/testsuite/gcc.dg/tm/irrevocable-8.c b/gcc/testsuite/gcc.dg/tm/irrevocable-8.c
new file mode 100644
index 000..1fe6c0a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tm/irrevocable-8.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-fgnu-tm -fdump-ipa-tmipa" } */
+
+void unsafe(void) __attribute__((transaction_unsafe));
+
+void
+f(void)
+{
+  __transaction_relaxed {
+    unsafe();
+  }
+}
+
+/* { dg-final { scan-ipa-dump-times "changeTransactionMode \\(0\\)" 1 "tmipa" } } */
+/* { dg-final { cleanup-ipa-dump "tmipa" } } */
diff --git a/gcc/testsuite/gcc.dg/tm/memopt-1.c b/gcc/testsuite/gcc.dg/tm/memopt-1.c
index 06d4f64..9a48dcb 100644
--- a/gcc/testsuite/gcc.dg/tm/memopt-1.c
+++ b/gcc/testsuite/gcc.dg/tm/memopt-1.c
@@ -2,8 +2,8 @@
 /* { dg-options "-fgnu-tm -O -fdump-tree-tmmemopt" } */
 
 long g, xxx, yyy;
-extern george() __attribute__((transaction_callable));
-extern ringo(long int);
+extern george() __attribute__((transaction_safe));
+extern ringo(long int) __attribute__((transaction_safe));
 int i;
 
 f()
diff --git a/gcc/trans-mem.c b/gcc/trans-mem.c
index e88c7ad..994cf09 100644
--- a/gcc/trans-mem.c
+++ b/gcc/trans-mem.c
@@ -4521,6 +4521,14 @@ ipa_tm_transform_transaction (struct cgraph_node *node)
{
  transaction_subcode_ior (region, GTMA_DOES_GO_IRREVOCABLE);
  transaction_subcode_ior (region, GTMA_MAY_ENTER_IRREVOCABLE);
+ /* ??? We still have to insert a call to explicitly request
+to go irrevocable because the ABI does not require the runtime
+to immediately go irrevocable.  Once we generate an
+uninstrumented code path for transactions, we can avoid this
+extra call and only make uninstrumented code available, which
+tells the runtime that it must go irrevocable immediately.  */
+ ipa_tm_insert_irr_call (node, region, region->entry_block);
+ need_ssa_rename = true;
  continue;
}
 


Re: [PATCH] Don't ICE on long long shifts in vectorizable_shift

2011-10-28 Thread Richard Guenther
On Fri, 28 Oct 2011, Jakub Jelinek wrote:

> On Fri, Oct 28, 2011 at 10:22:15AM +0200, Richard Guenther wrote:
> > Hm, but you are testing vector modes in the path that is supposed to
> > handle shifts by a scalar.  That looks odd.  Also it should be easy
> 
> No, I'm testing it in the path that is supposed to handle shifts by a
> vector.  That block starts with:
>   /* Vector shifted by vector.  */
>   if (!scalar_shift_arg)
> {

Oh, I looked close and thought you are touching the else { part ...

> Which means I'd have to duplicate there big parts of
> vectorizable_type_promotion and vectorizable_type_demotion
> and handle all these widening resp. narrowing cases.

Yeah, agreed (though it should be always possible to just
truncate/extend the shift count).

> I know you don't like tree-vect-pattern.c too much, but IMHO just
> changing the shifts in there to have rhs2 type matching rhs1 type
> would be far easier and more maintainable.  Especially when it can handle
> additionally what one of the testcases does - long long shift
> with long long shift count, which should be naturally vectorized without
> any promotion/demotion, but the FEs insert there a cast to (int) which would
> result in the promotion/demotion.

... but we could as well forward-prop this (also for scalar code)
(well, truncations only for SHIFT_COUNT_TRUNCATED targets).

The patch is ok meanwhile.

Thanks,
Richard.
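
A small sketch of the source pattern under discussion, for reference. The
function names are made up for illustration. In shift_each both the value
and the count are long long; shift_each_narrowed spells out the narrowing of
the count to int that, per Jakub's description, the front ends add
implicitly. For in-range counts the two compute the same thing, which is why
a vectorizer could in principle use the 64-bit count vector directly.

```cpp
// Shift every element by a per-element count of the same 64-bit type.
void shift_each(long long *val, const long long *cnt, int n) {
  for (int i = 0; i < n; ++i)
    val[i] = val[i] << cnt[i];
}

// Same loop with the count narrowed explicitly, mirroring the cast the
// thread says the front ends insert.  Counts are assumed to be in [0, 63].
void shift_each_narrowed(long long *val, const long long *cnt, int n) {
  for (int i = 0; i < n; ++i)
    val[i] = val[i] << static_cast<int>(cnt[i]);
}
```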


Re: [PATCH, devirtualization] Detect the new type in type change detection

2011-10-28 Thread Richard Guenther
On Thu, Oct 27, 2011 at 9:54 PM, Martin Jambor  wrote:
> Hi,
>
> On Thu, Oct 27, 2011 at 11:06:02AM +0200, Richard Guenther wrote:
>> On Thu, Oct 27, 2011 at 1:22 AM, Martin Jambor  wrote:
>> > Hi,
>> >
>> > I've been asked by Maxim Kuvyrkov to revive the following patch which
>> > has not made it to 4.6.  Currently, when type based devirtualization
>> > detects a potential type change, it simply gives up on gathering any
>> > information on the object in question.  This patch adds an attempt to
>> > actually detect the new type after the change.
>> >
>> > Maxim claimed this (and another patch I'll post tomorrow) noticeably
>> > improved performance of some real code.  I can only offer a rather
>> > artificial example in the attachment.  When the constructors are
>> > inlined but the function multiply_matrices is not, this patch makes
>> > the produced executable run for only 7 seconds instead of about 20 on
>> > my 4 year old i686 desktop (with -Ofast).
>> >
>> > Anyway, the patch passes bootstrap and testsuite on x86_64-linux.
>> > What do you think, is it a good idea for trunk now?
>> >
>> > Thanks,
>> >
>> > Martin
>> >
>> >
>> > 2011-10-21  Martin Jambor  
>> >
>> >        * ipa-prop.c (type_change_info): New fields object, known_current_type
>> >        and multiple_types_encountered.
>> >        (extr_type_from_vtbl_ptr_store): New function.
>> >        (check_stmt_for_type_change): Use it, set multiple_types_encountered if
>> >        the result is different from the previous one.
>> >        (detect_type_change): Renamed to detect_type_change_1. New parameter
>> >        comp_type.  Set up new fields in tci, build known type jump
>> >        functions if the new type can be identified.
>> >        (detect_type_change): New function.
>> >        * tree.h (DECL_CONTEXT): Comment new use.
>> >
>> >        * testsuite/g++.dg/ipa/devirt-c-1.C: Add dump scans.
>> >        * testsuite/g++.dg/ipa/devirt-c-2.C: Likewise.
>> >        * testsuite/g++.dg/ipa/devirt-c-7.C: New test.
>> >
>> >
>> > Index: src/gcc/ipa-prop.c
>> > ===
>> > --- src.orig/gcc/ipa-prop.c
>> > +++ src/gcc/ipa-prop.c
>> > @@ -271,8 +271,17 @@ ipa_print_all_jump_functions (FILE *f)
>> >
>> >  struct type_change_info
>> >  {
>> > +  /* The declaration or SSA_NAME pointer of the base that we are checking for
>> > +     type change.  */
>> > +  tree object;
>> > +  /* If we actually can tell the type that the object has changed to, it is
>> > +     stored in this field.  Otherwise it remains NULL_TREE.  */
>> > +  tree known_current_type;
>> >   /* Set to true if dynamic type change has been detected.  */
>> >   bool type_maybe_changed;
>> > +  /* Set to true if multiple types have been encountered.  known_current_type
>> > +     must be disregarded in that case.  */
>> > +  bool multiple_types_encountered;
>> >  };
>> >
>> >  /* Return true if STMT can modify a virtual method table pointer.
>> > @@ -338,6 +347,49 @@ stmt_may_be_vtbl_ptr_store (gimple stmt)
>> >   return true;
>> >  }
>> >
>> > +/* If STMT can be proved to be an assignment to the virtual method table
>> > +   pointer of ANALYZED_OBJ and the type associated with the new table
>> > +   identified, return the type.  Otherwise return NULL_TREE.  */
>> > +
>> > +static tree
>> > +extr_type_from_vtbl_ptr_store (gimple stmt, tree analyzed_obj)
>> > +{
>> > +  tree lhs, t, obj;
>> > +
>> > +  if (!is_gimple_assign (stmt))
>>
>> gimple_assign_single_p (stmt)
>
> OK.
>
>>
>> > +    return NULL_TREE;
>> > +
>> > +  lhs = gimple_assign_lhs (stmt);
>> > +
>> > +  if (TREE_CODE (lhs) != COMPONENT_REF)
>> > +    return NULL_TREE;
>> > +  obj = lhs;
>> > +
>> > +  if (!DECL_VIRTUAL_P (TREE_OPERAND (lhs, 1)))
>> > +    return NULL_TREE;
>> > +
>> > +  do
>> > +    {
>> > +      obj = TREE_OPERAND (obj, 0);
>> > +    }
>> > +  while (TREE_CODE (obj) == COMPONENT_REF);
>>
>> You do not allow other components than component-refs (thus, for
>> example an ARRAY_REF - that is for a reason?).  Please add
>> a comment why.  Otherwise this whole sequence would look like
>> it should be replaceable by get_base_address (obj).
>>
>
> I guess I might have been overly conservative here, ARRAY_REFs are
> fine.  get_base_address only digs into MEM_REFs if they are based on
> an ADDR_EXPR while I do so always.  But I can check that either both
> obj and analyzed_obj are a MEM_REF of the same SSA_NAME or they are
> the same thing (i.e. the same decl)... which even feels a bit cleaner,
> so I did that.

Well, as you are looking for a must-change-type pattern I think you cannot
simply ignore offsets.  Consider

T a[10];

new (&a[9]) T';
a[8].foo();

where the must-type-change on a[9] is _not_ changing the type of a[8]!

Similar cases might happen with

class Compound { T a; T b; };

no?
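
To make the array case concrete, here is a self-contained sketch with
hypothetical types Base and Derived standing in for T and T'. It uses raw
storage so that re-constructing one slot as a different type is well defined,
and it assumes Derived adds no data members so both types fit the same slot.
Only slot 9 gets a new vtable pointer; the dynamic type seen through slot 8
is unchanged, which is the point about not ignoring offsets.

```cpp
#include <new>

struct Base {
  virtual ~Base() {}
  virtual int id() const { return 1; }
};

struct Derived : Base {
  int id() const override { return 2; }
};

// Construct ten Base objects, rebuild only slot 9 as a Derived, and report
// the dynamic types observed through slots 8 and 9 afterwards.
void retype_last_slot(int &id8, int &id9) {
  static_assert(sizeof(Derived) == sizeof(Base), "slots must fit either type");
  alignas(Base) unsigned char storage[10 * sizeof(Base)];
  Base *slot[10];
  for (int i = 0; i < 10; ++i)
    slot[i] = new (storage + i * sizeof(Base)) Base;

  slot[9]->~Base();
  slot[9] = new (storage + 9 * sizeof(Base)) Derived;  // type change at slot 9 only

  id8 = slot[8]->id();  // still Base
  id9 = slot[9]->id();  // now Derived

  for (int i = 0; i < 10; ++i)
    slot[i]->~Base();
}
```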

Please think about the difference must vs. may-type-change for these
cases.  I'm not convinced that the must-type-change code is

Re: [PATCH] Cleanup AVX2 vector/vector shifts (take 2)

2011-10-28 Thread Uros Bizjak
On Fri, Oct 28, 2011 at 10:57 AM, Jakub Jelinek  wrote:
> On Thu, Oct 27, 2011 at 10:07:13PM +0200, Uros Bizjak wrote:
>> Please use expressive RTX forms for expanders, similar to the above
>> define_insn RTX. You can avoid calling gen_avx2_lshrv at the end
>> of C code. Also, expanders can have nonimmediate_operand as operand 2
>> and conditionally move it to register in C code block if needed.
>
> Like this?

Yes.

> In addition to that the patch also enables all 3 patterns for V2DImode
> for -mxop too (all this depends on some solution for the vectorizable_shift
> ICE I posted yesterday), and except for the left shift XOP-only pattern
> it uses nonimmediate_operand on the last arg - even the XOP patterns that
> start with negation of the last operand can use nonimmediate_operand, which
> neg2 uses.

I see some more cleanup opportunities with XOP patterns, I will try to
create a follow-up patch.

> 2011-10-28  Jakub Jelinek  
>
>        * config/i386/sse.md (VI4SD_AVX2): Removed.
>        (VI48_AVX2, VI128_128, VI48_128, VI48_256): New mode iterators.
>        (vashl3): Use VI12_128 iterator instead of VI124_128.
>        Add another expander using VI48_128 iterator for
>        TARGET_AVX2 || TARGET_XOP and another using VI48_256 iterator
>        for TARGET_AVX2.
>        (vlshr3): Likewise.  Change register_operand predicate to
>        nonimmediate_operand on last operand in the VI12_128 expander.
>        (vashr3): Use VI128_128 iterator instead of VI124_128.
>        (vashrv4si3, vashrv8si3): New expanders.
>        (avx2_ashrvv8si, avx2_ashrvv4si, avx2_vv8si,
>        avx2_vv2di): Removed.
>        (avx2_ashrv): New insn with VI4_AVX2 iterator.
>        (avx2_v): Macroize using VI48_AVX2
>        iterator.  Simplify pattern.
>
>        * gcc.dg/vshift-1.c: New test.
>        * gcc.dg/vshift-2.c: New test.
>        * gcc.target/i386/xop-vshift-1.c: New test.
>        * gcc.target/i386/xop-vshift-2.c: New test.
>        * gcc.target/i386/avx2-vshift-1.c: New test.

OK.

Thanks,
Uros.


Re: [RFC PATCH] Gather vectorization (PR tree-optimization/50789)

2011-10-28 Thread Kirill Yukhin
Hi Jakub,
this looks really cool. I have a little question, since I do not
understand the vectorizer that well.

Say, we have a snippet:
int *p;
int idx[N];
int arr[M];
for (...)
{
  p[i%4] += arr[idx[i]];
}
As far as I understand, we cannot do a gather here, since p may point
somewhere into arr, and idx may select the same element twice.
E.g. let's take
  idx = {0, 1, 1, 2, 3, 4, 5, 6}
  arr = {0, 1, 0, 0, 0, 0, 0, 0}
  p = arr;
In the correct case we'll have something like arr = {0, 2, 2, 0, ...}
If we use a gather, arr may look like arr = {0, 2, 1, 0, ...}

So my question: does your patch catch such cases?

Thanks, K
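
The hazard in the question can be reproduced without any vector hardware.
The sketch below is only a model: it runs the first four iterations of the
loop (one vector of four ints) in two orders, with p aliasing arr as in the
example. scalar_order lets each load see earlier stores; gather_order does
all four loads before any store, which is what a gather followed by a vector
store would amount to. The function names and the fixed length of 8 are
assumptions made for the illustration.

```cpp
#include <array>

using Arr = std::array<int, 8>;

// Scalar order: each load of arr[idx[i]] sees the stores of earlier iterations.
Arr scalar_order(Arr arr, const Arr &idx) {
  int *p = arr.data();  // p aliases arr, as in the question
  for (int i = 0; i < 4; ++i)
    p[i % 4] += arr[idx[i]];
  return arr;
}

// Gather-style order: all four loads are performed before any store.
Arr gather_order(Arr arr, const Arr &idx) {
  int *p = arr.data();
  int loaded[4];
  for (int i = 0; i < 4; ++i)
    loaded[i] = arr[idx[i]];
  for (int i = 0; i < 4; ++i)
    p[i % 4] += loaded[i];
  return arr;
}
```

With the idx and arr values from the message, the two orders disagree at
element 2: the scalar loop reads the already-updated arr[1], the gathered
version reads the old one.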


Re: [Patch]: PR49868: Named address space support for AVR

2011-10-28 Thread Georg-Johann Lay
Georg-Johann Lay schrieb:

> This patch adds named address space support to read data from flash (aka.
> progmem) to target AVR.
> 
> The patch has two parts:
> 
> The first part is a repost of Ulrich's work from
>http://gcc.gnu.org/ml/gcc/2011-08/msg00131.html
> with the needed changes to ./gcc and ./gcc/doc
> 
> This patch is needed because the target hooks MODE_CODE_BASE_REG_CLASS and
> REGNO_MODE_CODE_OK_FOR_BASE_P don't distinguish between different address
> spaces.  Ulrich's patch adds respective support to these hooks.
> 
> The second part is the AVR dependent part that adds __pgm as address space
> qualifier for address space AS1.
> 
> The AVR part is just the worker code.  If there is agreement that AS support
> for AVR is okay in principle and Ulrich's work will go into GCC, I will supply
> test programs and updates to the user manual, of course.
> 
> The major drawbacks of the current AS implementation are:
> 
> - It works only for C.
>   For C++, a language extension would be needed as indicated in
>  ISO/IEC DTR 18037
>  Annex F - C++ Compatibility and Migration issues
>  F.2 Multiple Address Spaces Support
> 
> - Register allocation does not do a good job. AS1 can only be addressed
>   byte-wise by one single address register (Z) as per *Z or *Z++.

This register allocator flaw is now filed as PR50775.

> The AVR part does several things:
> 
> - It locates data in AS1 into appropriate section, i.e. somewhere in
>   .progmem
> 
> - It does early sanity checks to ensure that __pgm is always accompanied
>   with const so that writing to AS1 is not possible.
> 
> - It prints LPM instructions to access flash memory.

The attached patch is an update merge so that it fits without conflicts.

The patch requires Ulrich's work, which is still in review:
   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50775

The regression tests were run with this patch, and the new ChangeLog entry is
written as if Ulrich's patch had been applied.

Tests pass without regression.

Besides the update to a more up-to-date SVN version, the patch sets a built-in
define __PGM so that it is easy for users to test whether the feature is
available.

Documentation and test cases will follow in separate patch.

Ok for trunk after Ulrich's work has been approved?

Johann

PR target/49868
* config/avr/avr.h (ADDR_SPACE_PGM): New define for address space AS1.
(REGISTER_TARGET_PRAGMAS): New define.
* config/avr/avr-protos.h (avr_mem_pgm_p): New prototype.
(avr_register_target_pragmas): New prototype.
(avr_log_t): Add field "progmem".  Order alphabetically.
* config/avr/avr-log.c (avr_log_set_avr_log): Set avr_log.progmem.
* config/avr/avr-c.c (langhooks.h): New include.
(avr_register_target_pragmas): New function. Register address
space AS1 as "__pgm".
(avr_cpu_cpp_builtins): Add built-in define __PGM.
* config/avr/avr.c: Include "c-family/c-common.h".
(TARGET_LEGITIMATE_ADDRESS_P): Remove define.
(TARGET_LEGITIMIZE_ADDRESS): Remove define.
(TARGET_ADDR_SPACE_SUBSET_P): Define to...
(avr_addr_space_subset_p): ...this new static function.
(TARGET_ADDR_SPACE_CONVERT): Define to...
(avr_addr_space_convert): ...this new static function.
(TARGET_ADDR_SPACE_ADDRESS_MODE): Define to...
(avr_addr_space_address_mode): ...this new static function.
(TARGET_ADDR_SPACE_POINTER_MODE): Define to...
(avr_addr_space_pointer_mode): ...this new static function.
(TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P): Define to...
(avr_addr_space_legitimate_address_p): ...this new static function.
(TARGET_ADDR_SPACE_LEGITIMIZE_ADDRESS): Define to...
(avr_addr_space_legitimize_address): ...this new static function.
(avr_mode_code_base_reg_class): Handle AS1.
(avr_regno_mode_code_ok_for_base_p): Handle AS1.
(lpm_addr_reg_rtx, lpm_reg_rtx): New static GTYed variables.
(avr_decl_pgm_p): New static function.
(avr_mem_pgm_p): New function.
(avr_asm_len): Return always "" instead of void.
(avr_out_lpm_no_lpmx): New static function.
(avr_out_lpm): New static function.
(output_movqi, output_movhi, output_movsisf): Call avr_out_lpm to
handle loads from progmem.
(avr_progmem_p): Test if decl is in AS1.
(avr_pgm_pointer_const_p): New static function.
(avr_pgm_check_var_decl): New static function.
(avr_insert_attributes): Use it.  Change error message to report
cause (progmem or AS1) when code wants to write to AS1.
(avr_section_type_flags): Unset section flag SECTION_BSS for
data in progmem.
* config/avr/avr.md (LPM_REGNO): New define_constants.
(movqi, movhi, movsi, movsf): Skip if code would write to AS1.
(movmemhi): Ditto.  Propagate address space information to newly
created MEM.
(split-l

Re: [Patch Darwin/PPC] implement out-of-line FPR/GPR saves/restores.

2011-10-28 Thread Iain Sandoe


On 14 Oct 2011, at 10:29, Mike Stump wrote:

> On Oct 14, 2011, at 2:05 AM, Iain Sandoe wrote:
>> This implements their use and also the GPRs - the latter makes an
>> appreciable reduction in code size,
>>
>> OK for trunk?
>
> Ok.  Watch for problems with async stack walking (hitting sample in
> Activity Monitor, or the walking done by CrashReporter)...  that's
> the only thing I can think of that might be strange.

This has taken some time to apply because of various bootstrap issues
(version applied is attached).

In answer to your observation:
I didn't expect problems with FPR saves because the vendor's tools
implement those.

To test what you suggested I built some code that dropped down a few
stack levels (with saves of FPR/GPR) and then either aborts or spins
on a sleep.

The crashlogs from the abort() and the instrumentation samples from
the sleep were OK.

While doing this (and checking crosses to aix and eabisim) I noticed
the following in rs6000/sysv4.h:


/* And similarly for general purpose registers.  */
#define GP_SAVE_INLINE(FIRST_REG) ((FIRST_REG) < 32 \
   && !optimize_size)

which gives rise to code (with -Os) like:

main:
mr 11,1  #,
stwu 1,-504(1)   #,,
mflr 0   #,
bl _savegpr_31   #
lis 31,.LANCHOR0@ha  # tmp137,

which I doubt is what was intended 
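
To spell out why the macro produces that call, here is a tiny model of it.
This only evaluates the condition quoted above; the real decision in
rs6000.c involves more than this macro, and the helper names and the bool
standing in for the compiler's optimize_size flag are assumptions for the
illustration. With -Os the condition is false for every FIRST_REG, so even a
function that saves only r31 goes through the out-of-line routine.

```cpp
// Stand-in for the compiler's global flag that -Os sets.
static bool optimize_size = false;

// The macro as quoted from rs6000/sysv4.h.
#define GP_SAVE_INLINE(FIRST_REG) ((FIRST_REG) < 32 \
                                   && !optimize_size)

// Number of GPRs saved when FIRST_REG is the first callee-saved one in use.
inline int gprs_to_save(int first_reg) { return 32 - first_reg; }

// True when the macro alone would select the out-of-line _savegpr_N routine.
bool uses_out_of_line_save(int first_reg, bool os) {
  optimize_size = os;
  return !GP_SAVE_INLINE(first_reg);
}
```

For first_reg == 31 a single register is saved, and the out-of-line path is
still selected whenever size optimization is on, matching the
"bl _savegpr_31" in the listing.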

... copying David in case he feels that should be amended.

cheers
Iain

Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 180609)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,23 @@
+2011-10-28  Iain Sandoe  
+
+   * config/rs6000/t-darwin (LIB2FUNCS_STATIC_EXTRA): 
+   Move darwin-fpsave.asm from here to ... LIB2FUNCS_EXTRA.
+   (LIB2FUNCS_EXTRA):  Add darwin-gpsave.asm.
+   (TARGET_LIBGCC2_CFLAGS): Ensure that fPIC and -pipe are inherited from
+   config/t-darwin.
+   * config/rs6000/darwin.h (FP_SAVE_INLINE): Adjust to enable.
+   (GP_SAVE_INLINE): Likewise.
+   (SAVE_FP_PREFIX,  SAVE_FP_SUFFIX, RESTORE_FP_PREFIX,
+   RESTORE_FP_SUFFIX): Set to empty strings.
+   * config/rs6000/rs6000.c (rs6000_savres_strategy): Implement for Darwin.
+   (debug_stack_info): Print savres_strategy.
+   (rs6000_savres_routine_name): Implement for Darwin.
+   (rs6000_make_savres_rtx): Adjust used register for Darwin.
+   (rs6000_emit_prologue): Implement out-of-line saves for Darwin.
+   (rs6000_output_function_prologue): Don't emit .extern for Mach-O.
+   (rs6000_emit_epilogue): Implement out-of-line saves for Darwin.
+   * config/rs6000/darwin-gpsave.asm: New file.
+
 2011-10-28  Jakub Jelinek  
 
* config/i386/sse.md (VI4SD_AVX2): Removed.
Index: gcc/config/rs6000/t-darwin
===
--- gcc/config/rs6000/t-darwin  (revision 180609)
+++ gcc/config/rs6000/t-darwin  (working copy)
@@ -19,21 +19,21 @@
 
 LIB2FUNCS_EXTRA = $(srcdir)/config/rs6000/darwin-tramp.asm \
$(srcdir)/config/darwin-64.c \
+   $(srcdir)/config/rs6000/darwin-fpsave.asm  \
+   $(srcdir)/config/rs6000/darwin-gpsave.asm  \
$(srcdir)/config/rs6000/darwin-world.asm
 
 LIB2FUNCS_STATIC_EXTRA = \
-   $(srcdir)/config/rs6000/darwin-fpsave.asm  \
$(srcdir)/config/rs6000/darwin-vecsave.asm
 
-# The .asm files above are designed to run on all processors,
-# even though they use AltiVec instructions.  -Wa is used because
-# -force_cpusubtype_ALL doesn't work with -dynamiclib.
-#
-# -pipe because there's an assembler bug, 4077127, which causes
-# it to not properly process the first # directive, causing temporary
-# file names to appear in stabs, causing the bootstrap to fail.  Using -pipe
-# works around this by not having any temporary file names.
-TARGET_LIBGCC2_CFLAGS = -Wa,-force_cpusubtype_ALL -pipe -mmacosx-version-min=10.4
+# The .asm files above are designed to run on all processors, even though
+# they use AltiVec instructions.
+# -Wa is used because -force_cpusubtype_ALL doesn't work with -dynamiclib.
+# -mmacosx-version-min=10.4 is used to provide compatibility for code from
+# earlier OSX versions.
 
+TARGET_LIBGCC2_CFLAGS += -Wa,-force_cpusubtype_ALL -mmacosx-version-min=10.4
+
 darwin-fpsave.o:   $(srcdir)/config/rs6000/darwin-asm.h
+darwin-gpsave.o:   $(srcdir)/config/rs6000/darwin-asm.h
 darwin-tramp.o:$(srcdir)/config/rs6000/darwin-asm.h
Index: gcc/config/rs6000/darwin.h
===
--- gcc/config/rs6000/darwin.h  (revision 180609)
+++ gcc/config/rs6000/darwin.h  (working copy)
@@ -173,18 +173,27 @@ extern int darwin_emit_branch_islands;
   (RS6000_ALIGN (crtl->outgoing_args_size, 16) \
+ (STACK_POINTER_OFFSET))
 
-/* Define cutoff for using external functions to save floating point.
-   Currently on Darwin, always use inline stores.  */
+/* Define c

Re: [RFC PATCH] Gather vectorization (PR tree-optimization/50789)

2011-10-28 Thread Jakub Jelinek
On Fri, Oct 28, 2011 at 02:01:36PM +0400, Kirill Yukhin wrote:
> this looks really cool. I have a little question, since I do not
> understand the vectorizer that well.
> 
> Say, we have a snippet:
> int *p;
> int idx[N];
> int arr[M];
> for (...)
> {
>   p[i%4] += arr[idx[i]];
> }
> As far as I understand, we cannot do a gather here, since p may point
> somewhere into arr, and idx may select the same element twice.
> E.g. let's take
>   idx = {0, 1, 1, 2, 3, 4, 5, 6}
>   arr = {0, 1, 0, 0, 0, 0, 0, 0}
>   p = arr;
> In the correct case we'll have something like arr = {0, 2, 2, 0, ...}
> If we use a gather, arr may look like arr = {0, 2, 1, 0, ...}
> 
> So my question: does your patch catch such cases?

Yes, it does.  It should be caught by the vect_analyze_data_refs
part after vect_check_gather succeeds, which updates all the data
dependence relations with the new dr.  If the other dr in a
dependence relation has DR_IS_WRITE and dr_may_alias_p
in initialize_data_dependence_relation could not show that the two
do not alias, we set bad and give up on vectorizing
the loop.  I'm doing it there, because the vectorizer would otherwise
attempt later on to version for aliasing, but for gather it is not possible
(or sometimes is at least much more difficult) to add such a runtime
check.  For the above it will not say chrec_known and thus we give up.

A very unfortunate thing is that dr_may_alias_p doesn't use TBAA
if one dr is a write and the other is a read, I think that is because
of placement new.  It would be nice if we could use TBAA, at least
in some cases (e.g. when you read pointers/ints through gather
and store floats/doubles, or vice versa etc.).  I wonder if we couldn't
resurrect a gimple stmt for placement new, and simply if the loop to be
vectorized doesn't contain any of those stmts (and doesn't call any
functions or just calls const/pure functions - after all otherwise
we wouldn't vectorize it anyway), then we could use TBAA even for that.
But that is definitely out of the scope of this patch.
We could then also vectorize:
void
foo (int *p, float *q)
{
  int i;
  for (i = 0; i < 1024; i++)
    p[i] = q[i] * 2;
}
without versioning for aliasing, etc., and we wouldn't do that if there is
a placement new in that loop somewhere which could make it invalid.

Jakub


Re: [PATCH] Add capability to run several iterations of early optimizations

2011-10-28 Thread Richard Guenther
On Fri, Oct 28, 2011 at 1:05 AM, Maxim Kuvyrkov  wrote:
> Richard,
>
> Just as Matt posted his findings about the effect of iterating early 
> optimizations, I've got the new patch ready.  This patch is essentially a 
> complete rewrite and addresses the comments you made.
>
> On 18/10/2011, at 9:56 PM, Richard Guenther wrote:
>

 If we'd want to iterate early optimizations we'd want to do it by iterating
>>>> an IPA pass so that we benefit from more precise size estimates
>>>> when trying to inline a function the second time.
>>>
>>> Could you elaborate on this a bit?  Early optimizations are gimple passes, 
>>> so I'm missing your point here.
>>
>> pass_early_local_passes is an IPA pass, you want to iterate
>> fn1, fn2, fn1, fn2, ..., not fn1, fn1 ..., fn2, fn2 ... precisely for better
>> inlining.  Thus you need to split pass_early_local_passes into pieces
>> so you can iterate one of the IPA pieces.
>
> Early_local_passes are now split into _main, _iter and _late parts.  To avoid 
> changing the default case, _late part is merged into _main when no iterative 
> optimizations are requested.
>
>>
>>>> Also statically
>>>> scheduling the passes will mess up dump files and you have no
>>>> chance of say, noticing that nothing changed for function f and its
>>>> callees in iteration N and thus you can skip processing them in
>>>> iteration N + 1.
>>>
>>> Yes, these are the shortcomings.  The dump files name changes can be fixed, 
>>> e.g., by adding a suffix to the passes on iterations after the first one.  
>>> The analysis to avoid unnecessary iterations is more complex problem.
>
> To avoid changing the dump file names the patch appends "_iter" suffix to the 
> dumps of iterative passes.
>
>>
>> Sure.  I analyzed early passes by manually duplicating them and
>> test that they do nothing for tramp3d, which they pretty much all did
>> at some point.
>>

>>>> So, at least you should split the pass_early_local_passes IPA pass
>>>> into three, you'd iterate over the 2nd (definitely not over
>>>> pass_split_functions though), the third would be pass_profile and
>>>> pass_split_functions only.
>>>> And you'd iterate from the place the 2nd IPA pass is executed, not
>>>> by scheduling them N times.
>>>
>>> OK, I will look into this.
>
> Done.
>
>>>

>>>> Then you'd have to analyze the compile-time impact of the IPA
>>>> splitting on its own when not iterating.
>
> I decided to avoid this and keep the pass pipeline effectively the same when 
> not running iterative optimizations.  This is achieved by scheduling 
> pass_early_optimizations_late in different places in the pipeline depending 
> on whether iterative optimizations are enabled or not.
>
> The patch bootstraps and passes regtest on i686-pc-linux-gnu {-m32/-m64} with 
> 3 iterations enabled by default.  The only failures are 5 scan-dump tests 
> that are due to more functions being inlined than expected.  With iterative 
> optimizations disabled there is no change.
>
> I've kicked off SPEC2000/SPEC2006 benchmark runs to see the performance 
> effect of the patch, and those will be posted in the same Google Docs 
> spreadsheet in several days.
>
> OK for trunk?

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index f056d3d..4738b28 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2416,7 +2416,7 @@ cgraph_add_new_function (tree fndecl, bool lowered)
tree_lowering_passes (fndecl);
bitmap_obstack_initialize (NULL);
if (!gimple_in_ssa_p (DECL_STRUCT_FUNCTION (fndecl)))
- execute_pass_list (pass_early_local_passes.pass.sub);
+ execute_early_local_passes_for_current_function ();
bitmap_obstack_release (NULL);
pop_cfun ();
current_function_decl = NULL;
@@ -2441,7 +2441,7 @@ cgraph_add_new_function (tree fndecl, bool lowered)
gimple_register_cfg_hooks ();
bitmap_obstack_initialize (NULL);
if (!gimple_in_ssa_p (DECL_STRUCT_FUNCTION (fndecl)))
- execute_pass_list (pass_early_local_passes.pass.sub);
+ execute_early_local_passes_for_current_function ();

I think these should only execute the lowering pieces of early local passes,
let me see if that's properly split ...

@@ -255,7 +255,7 @@ cgraph_process_new_functions (void)
  /* When not optimizing, be sure we run early local passes anyway
 to expand OMP.  */
  || !optimize)
-   execute_pass_list (pass_early_local_passes.pass.sub);
+   execute_early_local_passes_for_current_function ();

similar.

About all this -suffix stuff, I'd like to have the iterations simply re-use
the existing dump-files, thus statically sub-divide pass_early_local_passes
like

NEXT_PASS (pass_early_local_lowering_passes);
  {
  NEXT_PASS (pass_fixup_cfg);
  NEXT_PASS (pass_init_datastructures);
  NEXT_PASS (pass_expand_omp);

  NEXT_PASS (pass_referenced_vars);
  NEXT_PASS (pass_build_ssa);
  NEXT_PASS (pass_lower_vecto

Re: [PATCH] Fix computed gotos on m68k

2011-10-28 Thread Julian Brown
On Tue, 25 Oct 2011 14:49:09 +0200
Eric Botcazou  wrote:

> These labels are on the nonlocal_goto_handler_labels chain.  You
> presumably just need to apply the same treatment to them in
> set_initial_label_offsets as the one applied to forced labels.

> OK for the adjusted patch if it works, mainline and 4.6 branch once
> it reopens. Please mention PR rtl-optimization/47918 in the ChangeLog.

Thanks -- I've committed this (much cleaner) version to mainline,
after re-testing. I'll see about testing it on 4.6 also.

Cheers,

Julian

Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 180610)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,10 @@
+2011-10-28  Julian Brown  
+
+	PR rtl-optimization/47918
+
+	* reload1.c (set_initial_label_offsets): Use initial offsets
+	for labels on the nonlocal_goto_handler_labels chain.
+
 2011-10-28  Iain Sandoe  
 
 	* config/rs6000/t-darwin (LIB2FUNCS_STATIC_EXTRA): 
Index: gcc/reload1.c
===
--- gcc/reload1.c	(revision 180610)
+++ gcc/reload1.c	(working copy)
@@ -3918,6 +3918,10 @@ set_initial_label_offsets (void)
 if (XEXP (x, 0))
   set_label_offsets (XEXP (x, 0), NULL_RTX, 1);
 
+  for (x = nonlocal_goto_handler_labels; x; x = XEXP (x, 1))
+if (XEXP (x, 0))
+  set_label_offsets (XEXP (x, 0), NULL_RTX, 1);
+
   for_each_eh_label (set_initial_eh_label_offset);
 }
 


Re: [RFC PATCH] Gather vectorization (PR tree-optimization/50789)

2011-10-28 Thread Richard Guenther
On Fri, 28 Oct 2011, Jakub Jelinek wrote:

> On Fri, Oct 28, 2011 at 02:01:36PM +0400, Kirill Yukhin wrote:
> > this looks really cool. I have a little question, since I do not
> > understand the vectorizer that well.
> > 
> > Say, we have a snippet:
> > int *p;
> > int idx[N];
> > int arr[M];
> > for (...)
> > {
> >   p[i%4] += arr[idx[i]];
> > }
> > As far as I understand, we cannot do a gather here, since p may point
> > somewhere into arr, and idx may pick the same element twice.
> > E.g. let's take
> >   idx = {0, 1, 1, 2, 3, 4, 5, 6}
> >   arr = {0, 1, 0, 0, 0, 0, 0, 0}
> >   p = arr;
> > In the correct case, we'll have something like arr = {0, 2, 2, 0, ...}
> > If we use a gather, arr may look like arr = {0, 2, 1, 0, ...}
> > 
> > So my question: does your patch catch such cases?
> 
> Yes, it does.  It should be caught by the vect_analyze_data_refs
> part after vect_check_gather succeeds, which updates all the data
> dependence relations with the new dr, and if the other dr in the
> dependence relation has DR_IS_WRITE, then if dr_may_alias_p
> in initialize_data_dependent_relation didn't say that those two
> aren't known not to alias, we set bad and give up on vectorizing
> the loop.  I'm doing it there, because the vectorizer would otherwise
> attempt later on to version for aliasing, but for gather it is not possible
> (or sometimes is at least much more difficult) to add such a runtime
> check.  For the above it will not say chrec_known and thus we give up.
> 
> A very unfortunate thing is that dr_may_alias_p doesn't use TBAA
> if one dr is a write and the other is a read, I think that is because
> of placement new.  It would be nice if we could use TBAA, at least

It is also because of re-use of memory via memcpy (yes, some dubious
TBAA case from C, but essentially we don't want to break that).  Thus
we can't use TBAA on anonymous memory.

Consider a loop like (gimple IL, thus gimple memory model TBAA):

 for (i=0;i<256;++i)
   {
 tem = intptr[i];
 floatptr[i] = (float)tem;
   }

with intptr == floatptr.  In C you'd have to use a memcpy but
I think we'd just fold that to an assignment.

Richard.
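
[Editorial illustration, not part of the original thread.]  The aliasing
question quoted above can be tried out on a host machine.  The sketch below
is only a model under stated assumptions: gather_model_update is a
hand-written stand-in for a 4-wide gather (all four loads of a group happen
before any of the four stores) and is not what GCC emits; all function
names are made up for this example.  It uses the idx/arr data from the
question with p aliasing arr and reports whether the two orderings
disagree.

```c
#include <string.h>

#define N 8

/* Scalar reference: each iteration loads arr[idx[i]], then stores p[i % 4]. */
static void scalar_update(int *p, const int *arr, const int *idx, int n)
{
    for (int i = 0; i < n; i++)
        p[i % 4] += arr[idx[i]];
}

/* Hypothetical model of a 4-wide gather: the four loads of a group are
   all performed before any of the four stores of that group. */
static void gather_model_update(int *p, const int *arr, const int *idx, int n)
{
    for (int i = 0; i + 4 <= n; i += 4) {
        int lanes[4];
        for (int k = 0; k < 4; k++)
            lanes[k] = arr[idx[i + k]];
        for (int k = 0; k < 4; k++)
            p[(i + k) % 4] += lanes[k];
    }
}

/* Returns nonzero when the two orderings give different results for the
   data from the question, with p pointing at arr. */
int orderings_differ(void)
{
    const int idx[N] = {0, 1, 1, 2, 3, 4, 5, 6};
    int a[N] = {0, 1, 0, 0, 0, 0, 0, 0};
    int b[N] = {0, 1, 0, 0, 0, 0, 0, 0};

    scalar_update(a, a, idx, N);        /* p == arr */
    gather_model_update(b, b, idx, N);  /* p == arr */
    return memcmp(a, b, sizeof a) != 0;
}
```

A nonzero result from orderings_differ() is the situation in which the
vectorizer has to give up (or version the loop) when it cannot prove that
the store and the gathered loads do not alias.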


Re: [Patch]: PR49868: Named address space support for AVR

2011-10-28 Thread Georg-Johann Lay
> The patch requires Ulrich's work, which is still in review

Now the correct link:

http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01874.html



Re: [cxx-mem-model][PATCH 0/9] Convert i386 to new atomic optabs.

2011-10-28 Thread Jakub Jelinek
On Thu, Oct 27, 2011 at 09:07:29PM -0700, Richard Henderson wrote:
> Jakub, in the seventh patch, is there any good reason why OMP is
> making the decision of whether or not to generate a compare-and-swap
> loop?  Why shouldn't we simply always generate the __sync_fetch_op
> builtin and let optabs.c generate the compare-and-swap loop?

It just wants a guarantee that the builtin will actually be implemented
in hw.  I guess if __sync_fetch_op (new/old) isn't supported but
__sync_compare_and_swap_* is, we could just use the former and let
optabs.c deal with that.  But we have to handle the CAS case anyway
for most of the operations that don't have a __sync_fetch_op defined
(and for the cases where we e.g. VCE floating point data to integer
of the same size for CAS).
For ABI reasons we should keep using GOMP_{start,end}_atomic for the
types that don't have CAS in hw, shouldn't replace it with some generic
C++11 atomic helper in some library (libgcc or elsewhere?).

BTW, I believe all #pragma omp atomic ops we want in the relaxed model
or weaker, I think OpenMP only guarantees that the memory is modified
or loaded atomically (that you don't see half of something and half of
something else), there is nothing that requires ordering the atomic
vs. any other memory location stores/loads.

Jakub
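
[Editorial illustration, not part of the original thread.]  The
compare-and-swap fallback discussed above, including the view-conversion of
floating-point data to an integer of the same size, can be sketched in
portable C11.  This is a minimal sketch under assumptions: it uses
<stdatomic.h> rather than the __sync builtins or the GOMP runtime, the
function names are made up here, and relaxed ordering is chosen only
because the message above argues that OpenMP atomics need no more than
that.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "sketch assumes a 32-bit float");

/* Reinterpret a float as its bit pattern and back (view conversion). */
uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Atomically add val to the float stored in *obj using a CAS loop with
   relaxed ordering.  Returns the value that was stored before the add. */
float atomic_float_fetch_add(_Atomic uint32_t *obj, float val)
{
    uint32_t old_bits = atomic_load_explicit(obj, memory_order_relaxed);
    for (;;) {
        float old_val = bits_to_float(old_bits);
        uint32_t new_bits = float_to_bits(old_val + val);
        /* On failure, old_bits is refreshed with the current contents,
           so the next iteration recomputes from the new value. */
        if (atomic_compare_exchange_weak_explicit(obj, &old_bits, new_bits,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
            return old_val;
    }
}
```

The comparison is done on the integer bit pattern, which is why the loop
also terminates for values such as NaN that do not compare equal to
themselves as floats.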


Re: [Patch]: PR49868: Named address space support for AVR

2011-10-28 Thread Denis Chertykov
2011/10/28 Georg-Johann Lay :
> Georg-Johann Lay schrieb:
>
>> This patch adds named address space support to read data from flash (aka.
>> progmem) to target AVR.
>>
>> The patch has two parts:
>>
>> The first part is a repost of Ulrich's work from
>>    http://gcc.gnu.org/ml/gcc/2011-08/msg00131.html
>> with the needed changes to ./gcc and ./gcc/doc
>>
>> This patch is needed because the target hooks MODE_CODE_BASE_REG_CLASS and
>> REGNO_MODE_CODE_OK_FOR_BASE_P don't distinguish between different address
>> spaces.  Ulrich's patch adds respective support to these hooks.
>>
>> The second part is the AVR dependent part that adds __pgm as address space
>> qualifier for address space AS1.
>>
>> The AVR part is just the worker code.  If there is agreement that AS support
>> for AVR is okay in principle and Ulrich's work will go into GCC, I will
>> supply test programs and updates to the user manual, of course.
>>
>> The major drawbacks of the current AS implementation are:
>>
>> - It works only for C.
>>   For C++, a language extension would be needed as indicated in
>>      ISO/IEC DTR 18037
>>      Annex F - C++ Compatibility and Migration issues
>>      F.2 Multiple Address Spaces Support
>>
>> - Register allocation does not do a good job. AS1 can only be addressed
>>   byte-wise by one single address register (Z) as per *Z or *Z++.
>
> This flaw in the register allocator is now filed as PR50775.
>
>> The AVR part does several things:
>>
>> - It locates data in AS1 into appropriate section, i.e. somewhere in
>>   .progmem
>>
>> - It does early sanity checks to ensure that __pgm is always accompanied
>>   with const so that writing to AS1 is not possible.
>>
>> - It prints LPM instructions to access flash memory.
>
> The attached patch is an update merge so that it fits without conflicts.
>
> The patch requires Ulrich's work, which is still in review
>   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50775
>
> The regression tests run with this patch and the new ChangeLog entry is
> written as if Ulrich's patch was applied.
>
> Tests pass without regression.
>
> Besides the update to a more up-to-date SVN version, the patch sets a built-in
> define __PGM so that it is easy for users to test whether the feature is
> available.
>
> Documentation and test cases will follow in a separate patch.
>
> Ok for trunk after Ulrich's work has been approved?
>
> Johann
>
>        PR target/49868
>        * config/avr/avr.h (ADDR_SPACE_PGM): New define for address space AS1.
>        (REGISTER_TARGET_PRAGMAS): New define.
>        * config/avr/avr-protos.h (avr_mem_pgm_p): New prototype.
>        (avr_register_target_pragmas): New prototype.
>        (avr_log_t): Add field "progmem".  Order alphabetically.
>        * config/avr/avr-log.c (avr_log_set_avr_log): Set avr_log.progmem.
>        * config/avr/avr-c.c (langhooks.h): New include.
>        (avr_register_target_pragmas): New function. Register address
>        space AS1 as "__pgm".
>        (avr_cpu_cpp_builtins): Add built-in define __PGM.
>        * config/avr/avr.c: Include "c-family/c-common.h".
>        (TARGET_LEGITIMATE_ADDRESS_P): Remove define.
>        (TARGET_LEGITIMIZE_ADDRESS): Remove define.
>        (TARGET_ADDR_SPACE_SUBSET_P): Define to...
>        (avr_addr_space_subset_p): ...this new static function.
>        (TARGET_ADDR_SPACE_CONVERT): Define to...
>        (avr_addr_space_convert): ...this new static function.
>        (TARGET_ADDR_SPACE_ADDRESS_MODE): Define to...
>        (avr_addr_space_address_mode): ...this new static function.
>        (TARGET_ADDR_SPACE_POINTER_MODE): Define to...
>        (avr_addr_space_pointer_mode): ...this new static function.
>        (TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P): Define to...
>        (avr_addr_space_legitimate_address_p): ...this new static function.
>        (TARGET_ADDR_SPACE_LEGITIMIZE_ADDRESS): Define to...
>        (avr_addr_space_legitimize_address): ...this new static function.
>        (avr_mode_code_base_reg_class): Handle AS1.
>        (avr_regno_mode_code_ok_for_base_p): Handle AS1.
>        (lpm_addr_reg_rtx, lpm_reg_rtx): New static GTYed variables.
>        (avr_decl_pgm_p): New static function.
>        (avr_mem_pgm_p): New function.
>        (avr_asm_len): Return always "" instead of void.
>        (avr_out_lpm_no_lpmx): New static function.
>        (avr_out_lpm): New static function.
>        (output_movqi, output_movhi, output_movsisf): Call avr_out_lpm to
>        handle loads from progmem.
>        (avr_progmem_p): Test if decl is in AS1.
>        (avr_pgm_pointer_const_p): New static function.
>        (avr_pgm_check_var_decl): New static function.
>        (avr_insert_attributes): Use it.  Change error message to report
>        cause (progmem or AS1) when code wants to write to AS1.
>        (avr_section_type_flags): Unset section flag SECTION_BSS for
>        data in progmem.
>        * config/avr/avr.md (LPM_REGNO): New define_constants.
>        (movqi, movhi, movsi, movsf
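
[Editorial illustration, not part of the original thread.]  The quoted
description says that the patch introduces the qualifier __pgm, requires it
to be accompanied by const, and sets a built-in define __PGM so that users
can test for the feature.  A minimal sketch of how user code might use that
test is shown below.  The macro FLASH and the function names are made up
for this example; only __pgm and __PGM come from the message.  On a
compiler without the feature the code falls back to ordinary const data, so
it also builds on a host machine.

```c
#include <stddef.h>

/* FLASH is a hypothetical helper macro: it expands to the address-space
   qualifier when the compiler advertises it through __PGM. */
#ifdef __PGM
#  define FLASH __pgm
#else
#  define FLASH /* no named address space: plain const data */
#endif

/* Per the description, data qualified with __pgm must also be const. */
static const FLASH char greeting[] = "hello";

/* Count characters up to the terminating NUL.  With __pgm, each read of
   s[n] would be done with an LPM instruction according to the description. */
size_t flash_strlen(const FLASH char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

size_t greeting_length(void)
{
    return flash_strlen(greeting);
}
```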

Re: [PATCH] Add gcc-ar/nm/ranlib wrappers for slim LTO v2

2011-10-28 Thread Richard Guenther
On Fri, Oct 21, 2011 at 1:55 AM, Andi Kleen  wrote:
> From: Andi Kleen 
>
> Slim LTO requires running ar/nm/ranlib with the LTO plugin. The most
> convenient way to get this into existing Makefiles is using small
> wrappers that pass the plugin. This matches how other compilers
> (LLVM, icc) do this too.
>
> My previous attempt at using shell scripts for this
> http://gcc.gnu.org/ml/gcc-patches/2010-10/msg02471.html
> was not approved. Here's another attempt using wrappers written
> in C.  These wrappers add a --plugin argument before calling the
> respective binutils utilities.
>
> The logic gcc.c uses to find the files is very complicated. I didn't
> try to replicate it 100% and left out some magic. I would be interested
> if this simple method works for everyone or if more code needs
> to be added. This only needs to support LTO supporting hosts of course.
>
> I didn't add any documentation because the syntax is exactly the same as
> the native ar/ranlib/nm.
>
> v2: Address review comments. Makefile follows go now, use own binaries
> for each sub program.
>
> Passed bootstrap and test suite on x86_64-linux.

Ok.

We can improve/fix things if there is a need to as followup.

Thanks,
Richard.

> gcc/:
> 2011-10-19  Andi Kleen  
>
>        * Makefile.in (MOSTLYCLEANFILES): Add gcc-ar/nm/ranlib.
>        (native): Add gcc-ar, gcc-nm, gcc-ranlib.
>        (AR_LIBS, gcc-ar, gcc-ar.o, gcc-ranlib, gcc-ranlib.o,
>         gcc-nm, gcc-nm.o, gcc-ranlib.c, gcc-nm.c): Add.
>        (install): Depend on install-gcc-ar.
>        (install-gcc-ar): Add.
>        (uninstall): Uninstall gcc-ar, gcc-nm, gcc-ranlib.
>        * gcc-ar.c: Add new file.
> ---
>  gcc/Makefile.in |   71 +++--
>  gcc/gcc-ar.c    |   96 
> +++
>  2 files changed, 164 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/gcc-ar.c
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 6b28ef5..1b9987a 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1545,7 +1545,8 @@ MOSTLYCLEANFILES = insn-flags.h insn-config.h 
> insn-codes.h \
>  genrtl.h gt-*.h gtype-*.h gtype-desc.c gtyp-input.list \
>  xgcc$(exeext) cpp$(exeext) cc1$(exeext) $(EXTRA_PASSES) \
>  $(EXTRA_PARTS) $(EXTRA_PROGRAMS) gcc-cross$(exeext) \
> - $(SPECS) collect2$(exeext) lto-wrapper$(exeext) \
> + $(SPECS) collect2$(exeext) gcc-ar$(exeext) gcc-nm$(exeext) \
> + gcc-ranlib$(exeext) \
>  gcov-iov$(build_exeext) gcov$(exeext) gcov-dump$(exeext) \
>  gengtype$(exeext) *.[0-9][0-9].* *.[si] *-checksum.c libbackend.a \
>  libcommon-target.a libcommon.a libgcc.mk
> @@ -1791,7 +1792,8 @@ rest.encap: lang.rest.encap
>  # This is what is made with the host's compiler
>  # whether making a cross compiler or not.
>  native: config.status auto-host.h build-@POSUB@ $(LANGUAGES) \
> -       $(EXTRA_PASSES) $(EXTRA_PROGRAMS) $(COLLECT2) lto-wrapper$(exeext)
> +       $(EXTRA_PASSES) $(EXTRA_PROGRAMS) $(COLLECT2) lto-wrapper$(exeext) \
> +       gcc-ar$(exeext) gcc-nm$(exeext) gcc-ranlib$(exeext)
>
>  ifeq ($(enable_plugin),yes)
>  native: gengtype$(exeext)
> @@ -2049,6 +2051,46 @@ sbitmap.o: sbitmap.c sbitmap.h $(CONFIG_H) $(SYSTEM_H) 
> coretypes.h $(BASIC_BLOCK
>  ebitmap.o: ebitmap.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(EBITMAP_H)
>  sparseset.o: sparseset.c $(SYSTEM_H) sparseset.h $(CONFIG_H)
>
> +AR_LIBS = @COLLECT2_LIBS@
> +
> +gcc-ar$(exeext): gcc-ar.o $(LIBDEPS)
> +       +$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) gcc-ar.o -o $@ \
> +               $(LIBS) $(AR_LIBS)
> +
> +gcc-nm$(exeext): gcc-nm.o $(LIBDEPS)
> +       +$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) gcc-nm.o -o $@ \
> +               $(LIBS) $(AR_LIBS)
> +
> +gcc-ranlib$(exeext): gcc-ranlib.o $(LIBDEPS)
> +       +$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) gcc-ranlib.o -o $@ \
> +               $(LIBS) $(AR_LIBS)
> +
> +CFLAGS-gcc-ar.o += $(DRIVER_DEFINES) \
> +       -DTARGET_MACHINE=\"$(target_noncanonical)\" \
> +       @TARGET_SYSTEM_ROOT_DEFINE@ -DPERSONALITY=\"ar\"
> +
> +gcc-ar.o: gcc-ar.c $(CONFIG_H) $(SYSTEM_H) $(LIBIBERTY_H)
> +
> +CFLAGS-gcc-ranlib.o += $(DRIVER_DEFINES) \
> +       -DTARGET_MACHINE=\"$(target_noncanonical)\" \
> +       @TARGET_SYSTEM_ROOT_DEFINE@ -DPERSONALITY=\"ranlib\"
> +
> +gcc-ranlib.o: gcc-ranlib.c $(CONFIG_H) $(SYSTEM_H) $(LIBIBERTY_H)
> +
> +CFLAGS-gcc-nm.o += $(DRIVER_DEFINES) \
> +       -DTARGET_MACHINE=\"$(target_noncanonical)\" \
> +       @TARGET_SYSTEM_ROOT_DEFINE@ -DPERSONALITY=\"nm\"
> +
> +gcc-nm.o: gcc-nm.c $(CONFIG_H) $(SYSTEM_H) $(LIBIBERTY_H)
> +
> +# ??? the implicit rules dont trigger if the source file has a different name
> +# so copy instead
> +gcc-ranlib.c: gcc-ar.c
> +       cp $^ $@
> +
> +gcc-nm.c: gcc-ar.c
> +       cp $^ $@
> +
>  COLLECT2_OBJS = collect2.o collect2-aix.o tlink.o
>  COLLECT2_LIBS = @COLLECT2_LIBS@
>  collect2$(exeext): $(COLLECT2_OBJS) $(LIBDEPS)
> @@ -4576,7 +4618,7 @@ maintainer-clean:
>  # broken is small.
>  install: install-common $(INSTALL_HEAD
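
[Editorial illustration, not part of the original thread.]  The core of the
wrapper idea described in the quoted message, adding a --plugin argument
before handing control to the real binutils tool, can be sketched as
follows.  This is not the code from gcc-ar.c: the function name is made up,
and the plugin path is whatever the caller passes in (how the real wrapper
locates the plugin and the tool is not shown in this excerpt).

```c
#include <stdlib.h>
#include <string.h>

/* Build a new NULL-terminated argument vector of the form
     { tool, "--plugin", plugin_path, argv[1], ..., argv[argc-1], NULL }.
   Returns NULL on allocation failure.  The caller frees the array itself;
   the strings are borrowed from the arguments. */
const char **build_plugin_argv(const char *tool, const char *plugin_path,
                               int argc, char **argv)
{
    /* 3 leading entries + user arguments (without argv[0]) + final NULL. */
    size_t n = 3 + (size_t)(argc > 0 ? argc - 1 : 0) + 1;
    const char **nargv = malloc(n * sizeof *nargv);
    size_t k = 0;

    if (nargv == NULL)
        return NULL;

    nargv[k++] = tool;
    nargv[k++] = "--plugin";
    nargv[k++] = plugin_path;
    for (int i = 1; i < argc; i++)
        nargv[k++] = argv[i];
    nargv[k] = NULL;
    return nargv;
}
```

A wrapper built on this would pass the resulting vector to the real ar, nm
or ranlib, for example with execvp on a POSIX host.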

Re: Another ggc anti fragmentation patchkit

2011-10-28 Thread Richard Guenther
On Sat, Oct 22, 2011 at 7:54 AM, Andi Kleen  wrote:
> This version addresses all earlier review comments. Passes bootstrap
> and testing on x86-64. Ok?

Ok.

Thanks,
Richard.

> -Andi
>
>


Re: [RFC PATCH] Gather vectorization (PR tree-optimization/50789)

2011-10-28 Thread Jakub Jelinek
On Fri, Oct 28, 2011 at 12:59:48PM +0200, Richard Guenther wrote:
> It is also because of re-use of memory via memcpy (yes, some dubious
> TBAA case from C, but essentially we don't want to break that).  Thus
> we can't use TBAA on anonymous memory.

No, IMHO we always use a ref_all mem access in that case.
If you meant something like:

void
foo (int *intptr, float *floatptr)
{
  int i;
  for (i = 0; i < 256; ++i)
{
  int tem;
  __builtin_memcpy (&tem, &intptr[i], sizeof (tem));
  floatptr[i] = (float) tem;
}
}

which is valid C even if intptr == floatptr, we have:

:

:
  # i_21 = PHI 
  # ivtmp.12_27 = PHI 
  D.2709_3 = (long unsigned int) i_21;
  D.2710_4 = D.2709_3 * 4;
  D.2711_6 = intptr_5(D) + D.2710_4;
  D.2712_7 = MEM[(char * {ref-all})D.2711_6];
  D.2713_11 = floatptr_10(D) + D.2710_4;
  D.2715_13 = (float) D.2712_7;
  *D.2713_11 = D.2715_13;
  i_14 = i_21 + 1;
  ivtmp.12_26 = ivtmp.12_27 - 1;
  if (ivtmp.12_26 != 0)
goto ;
  else
goto ;

:
  goto ;

which is just fine even with TBAA.
And similarly for
void
bar (int *intptr, float *floatptr)
{
  int i;
  for (i = 0; i < 256; ++i)
{
  float tem;
  tem = (float) intptr[i];
  __builtin_memcpy (&floatptr[i], &tem, sizeof (tem));
}
}

where the ref-all isn't used for load, but for store.

Jakub
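
[Editorial illustration, not part of the original thread.]  The first
function above can be exercised with intptr == floatptr on a host machine.
The sketch below is an assumption-laden restatement, not GCC testsuite
code: it uses memcpy from <string.h> instead of __builtin_memcpy, takes the
element count as a parameter, and the names convert_in_place and
check_in_place are made up here.  The buffer comes from malloc, so the
float stores are allowed to change the effective type of the storage.

```c
#include <stdlib.h>
#include <string.h>

/* Convert n ints to floats; intptr and floatptr may refer to the same
   buffer because the load goes through memcpy. */
void convert_in_place(int *intptr, float *floatptr, int n)
{
    for (int i = 0; i < n; i++) {
        int tem;
        memcpy(&tem, &intptr[i], sizeof tem);
        floatptr[i] = (float) tem;
    }
}

/* Returns 1 if an in-place conversion yields the expected floats,
   0 if it does not, and -1 if the allocation fails. */
int check_in_place(void)
{
    enum { COUNT = 256 };
    _Static_assert(sizeof(int) == sizeof(float),
                   "sketch assumes int and float have the same size");

    void *buf = malloc(COUNT * sizeof(int));
    int ok = 1;

    if (buf == NULL)
        return -1;

    int *ip = buf;
    for (int i = 0; i < COUNT; i++)
        ip[i] = i;

    convert_in_place(buf, buf, COUNT);   /* intptr == floatptr */

    float *fp = buf;
    for (int i = 0; i < COUNT; i++)
        if (fp[i] != (float) i)
            ok = 0;

    free(buf);
    return ok;
}
```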


Re: [RFC PATCH] Gather vectorization (PR tree-optimization/50789)

2011-10-28 Thread Richard Guenther
On Fri, 28 Oct 2011, Jakub Jelinek wrote:

> On Fri, Oct 28, 2011 at 12:59:48PM +0200, Richard Guenther wrote:
> > It is also because of re-use of memory via memcpy (yes, some dubious
> > TBAA case from C, but essentially we don't want to break that).  Thus
> > we can't use TBAA on anonymous memory.
> 
> No, IMHO we always use a ref_all mem access in that case.
> If you meant something like:
> 
> void
> foo (int *intptr, float *floatptr)
> {
>   int i;
>   for (i = 0; i < 256; ++i)
> {
>   int tem;
>   __builtin_memcpy (&tem, &intptr[i], sizeof (tem));
>   floatptr[i] = (float) tem;
> }
> }
> 
> which is valid C even if intptr == floatptr, we have:
> 
> :
> 
> :
>   # i_21 = PHI 
>   # ivtmp.12_27 = PHI 
>   D.2709_3 = (long unsigned int) i_21;
>   D.2710_4 = D.2709_3 * 4;
>   D.2711_6 = intptr_5(D) + D.2710_4;
>   D.2712_7 = MEM[(char * {ref-all})D.2711_6];
>   D.2713_11 = floatptr_10(D) + D.2710_4;
>   D.2715_13 = (float) D.2712_7;
>   *D.2713_11 = D.2715_13;
>   i_14 = i_21 + 1;
>   ivtmp.12_26 = ivtmp.12_27 - 1;
>   if (ivtmp.12_26 != 0)
> goto ;
>   else
> goto ;
> 
> :
>   goto ;
> 
> which is just fine even with TBAA.
> And similarly for
> void
> bar (int *intptr, float *floatptr)
> {
>   int i;
>   for (i = 0; i < 256; ++i)
> {
>   float tem;
>   tem = (float) intptr[i];
>   __builtin_memcpy (&floatptr[i], &tem, sizeof (tem));
> }
> }
> 
> where the ref-all isn't used for load, but for store.

Well, yeah.  I said it's probably difficult to generate a
C testcase.  It's still valid middle-end IL (and well-defined) to have
intptr == floatptr and  MEM[(int *)..] and MEM[(float *)...].

Richard.


Re: [RFC PATCH] Gather vectorization (PR tree-optimization/50789)

2011-10-28 Thread Richard Guenther
On Fri, 28 Oct 2011, Richard Guenther wrote:

> On Fri, 28 Oct 2011, Jakub Jelinek wrote:
> 
> > On Fri, Oct 28, 2011 at 12:59:48PM +0200, Richard Guenther wrote:
> > > It is also because of re-use of memory via memcpy (yes, some dubious
> > > TBAA case from C, but essentially we don't want to break that).  Thus
> > > we can't use TBAA on anonymous memory.
> > 
> > No, IMHO we always use a ref_all mem access in that case.
> > If you meant something like:
> > 
> > void
> > foo (int *intptr, float *floatptr)
> > {
> >   int i;
> >   for (i = 0; i < 256; ++i)
> > {
> >   int tem;
> >   __builtin_memcpy (&tem, &intptr[i], sizeof (tem));
> >   floatptr[i] = (float) tem;
> > }
> > }
> > 
> > which is valid C even if intptr == floatptr, we have:
> > 
> > :
> > 
> > :
> >   # i_21 = PHI 
> >   # ivtmp.12_27 = PHI 
> >   D.2709_3 = (long unsigned int) i_21;
> >   D.2710_4 = D.2709_3 * 4;
> >   D.2711_6 = intptr_5(D) + D.2710_4;
> >   D.2712_7 = MEM[(char * {ref-all})D.2711_6];
> >   D.2713_11 = floatptr_10(D) + D.2710_4;
> >   D.2715_13 = (float) D.2712_7;
> >   *D.2713_11 = D.2715_13;
> >   i_14 = i_21 + 1;
> >   ivtmp.12_26 = ivtmp.12_27 - 1;
> >   if (ivtmp.12_26 != 0)
> > goto ;
> >   else
> > goto ;
> > 
> > :
> >   goto ;
> > 
> > which is just fine even with TBAA.
> > And similarly for
> > void
> > bar (int *intptr, float *floatptr)
> > {
> >   int i;
> >   for (i = 0; i < 256; ++i)
> > {
> >   float tem;
> >   tem = (float) intptr[i];
> >   __builtin_memcpy (&floatptr[i], &tem, sizeof (tem));
> > }
> > }
> > 
> > where the ref-all isn't used for load, but for store.
> 
> Well, yeah.  I said it's probably difficult to generate a
> C testcase.  It's still valid middle-end IL (and well-defined) to have
> intptr == floatptr and  MEM[(int *)..] and MEM[(float *)...].

Btw, only the exact overlap case is critical, for non-exact overlap
like

  for (i)
   {
float[i] = int[i-1] + int[i];
   }

you can reason that there cannot be aliasing as if you execute this
loop more than once(!) then you'd have

  float[i] = int[i-1] + int[i];
  float[i+1] = int[i] + int[i+1];
...

where the 2nd load from int[i] would load from float-initialized
memory which is undefined.  Thus you can assume that float != int.
But that requires more thorough analysis that we don't do at the
moment and knowledge that the loop will iterate at least N
times (when called from the vectorizer, the vectorization factor,
which is at least 2).

Richard.


[PATCH][2/n] LTO option handling/merging rewrite

2011-10-28 Thread Richard Guenther

This moves the existing processing of user options from lto1 to
the lto driver (lto-wrapper).  It also changes the way we stream
user options from some custom binary format over to simply
streaming the original command-line as passed to the compiler
by the driver as a COLLECT_GCC_OPTIONS-like string.

lto-wrapper in this patch version (tries to) perform exactly
what lto1 tried to do - re-issue all target specific
options and a selected set of switches.

I chose to do this as an incremental step which should not
change behavior or regress in any way (fingers crossed).

From here we can think about what would be the most sensible behavior
(and maybe start tagging options in the .opt files so they
get treatment based on some flag).

LTO bootstrapped and tested on x86_64-unknown-linux-gnu, ok?

Thanks,
Richard.

2011-10-28  Richard Guenther  

* lto-opts.c: Re-implement.
* lto-streamer.h (lto_register_user_option): Remove.
(lto_read_file_options): Likewise.
(lto_reissue_options): Likewise.
(lto_clear_user_options): Likewise.
(lto_clear_file_options): Likewise.
* opts-global.c (post_handling_callback): Remove.
(set_default_handlers): Do not set post_handling_callback.
(decode_options): Remove LTO specific code.
* lto-wrapper.c (merge_and_complain): New function.
(run_gcc): Read all input file options and
prepend a merged set before the linker driver options.
* gcc.c (driver_post_handling_callback): Remove.
(set_option_handlers): Do not set post_handling_callback.
* opts-common.c (handle_option): Do not call post_handling_callback.
* opts.h (struct cl_option_handlers): Remove post_handling_callback.

lto/
* lto-lang.c (lto_post_options): Do not read file options.
* lto.c (lto_read_all_file_options): Remove.
(lto_init): Call lto_set_in_hooks here.


Index: trunk/gcc/lto-opts.c
===
*** trunk.orig/gcc/lto-opts.c   2011-10-27 15:24:59.0 +0200
--- trunk/gcc/lto-opts.c2011-10-28 12:15:52.0 +0200
***
*** 1,6 
  /* LTO IL options.
  
!Copyright 2009, 2010, 2011 Free Software Foundation, Inc.
 Contributed by Simon Baldwin 
  
  This file is part of GCC.
--- 1,6 
  /* LTO IL options.
  
!Copyright 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
 Contributed by Simon Baldwin 
  
  This file is part of GCC.
*** along with GCC; see the file COPYING3.
*** 33,422 
  #include "common/common-target.h"
  #include "diagnostic.h"
  #include "lto-streamer.h"
! 
! /* When a file is initially compiled, the options used when generating
!the IL are not necessarily the same as those used when linking the
!objects into the final executable.  In general, most build systems
!will proceed with something along the lines of:
! 
!   $ gcc  -flto -c f1.c -o f1.o
!   $ gcc  -flto -c f2.c -o f2.o
!   ...
!   $ gcc  -flto -c fN.c -o fN.o
! 
!And the final link may or may not include the same  used
!to generate the initial object files:
! 
!   $ gcc  -flto -o prog f1.o ... fN.o
! 
!Since we will be generating final code during the link step, some
!of the flags used during the compile step need to be re-applied
!during the link step.  For instance, flags in the -m family.
! 
!The idea is to save a selected set of  in a special
!section of the initial object files.  This section is then read
!during linking and the options re-applied.
! 
!FIXME lto.  Currently the scheme is limited in that only the
!options saved on the first object file (f1.o) are read back during
!the link step.  This means that the options used to compile f1.o
!will be applied to ALL the object files in the final link step.
!More work needs to be done to implement a merging and validation
!mechanism, as this will not be enough for all cases.  */
! 
! /* Saved options hold the type of the option (currently CL_TARGET or
!CL_COMMON), and the code, argument, and value.  */
! 
! typedef struct GTY(()) opt_d
! {
!   unsigned int type;
!   size_t code;
!   char *arg;
!   int value;
! } opt_t;
! 
! DEF_VEC_O (opt_t);
! DEF_VEC_ALLOC_O (opt_t, heap);
! 
! 
! /* Options are held in two vectors, one for those registered by
!command line handling code, and the other for those read in from
!any LTO IL input.  */
! static VEC(opt_t, heap) *user_options = NULL;
! static VEC(opt_t, heap) *file_options = NULL;
! 
! /* Iterate FROM in reverse, writing option codes not yet in CODES into *TO.
!Mark each new option code encountered in CODES.  */
! 
! static void
! reverse_iterate_options (VEC(opt_t, heap) *from, VEC(opt_t, heap) **to,
!bitmap codes)
! {
!   int i;
! 
!   for (i = VEC_length (opt_t, from); i > 0; i--)
! {
!   const opt_t *const o = VEC_index (opt_t

[v3] Add some basic tests for associative/unordered::count()

2011-10-28 Thread Paolo Carlini

Hi,

checked x86_64-linux, committed to mainline.

Thanks,
Paolo.

//

2011-10-28  Paolo Carlini  

* testsuite/23_containers/unordered_map/operations/count.cc: New.
* testsuite/23_containers/multimap/operations/count.cc: Likewise.
* testsuite/23_containers/set/operations/count.cc: Likewise.
* testsuite/23_containers/unordered_multimap/operations/count.cc:
Likewise.
* testsuite/23_containers/unordered_set/operations/count.cc: Likewise.
* testsuite/23_containers/multiset/operations/count.cc: Likewise.
* testsuite/23_containers/unordered_multiset/operations/count.cc:
Likewise.
* testsuite/23_containers/map/operations/count.cc: Likewise.

Index: testsuite/23_containers/unordered_map/operations/count.cc
===
--- testsuite/23_containers/unordered_map/operations/count.cc   (revision 0)
+++ testsuite/23_containers/unordered_map/operations/count.cc   (revision 0)
@@ -0,0 +1,108 @@
+// { dg-options "-std=gnu++0x" }
+
+// 2011-10-28  Paolo Carlini  
+
+// Copyright (C) 2011 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+//
+
+#include <unordered_map>
+#include <testsuite_hooks.h>
+
+void test01()
+{
+  bool test __attribute__((unused)) = true;
+  using namespace std;
+
+  typedef unordered_map<int, int>::value_type value_type;
+
+  unordered_map<int, int> um0;
+  VERIFY( um0.count(0) == 0 );
+  VERIFY( um0.count(1) == 0 );
+
+  um0.insert(value_type(1, 1));
+  VERIFY( um0.count(0) == 0 );
+  VERIFY( um0.count(1) == 1 );
+
+  um0.insert(value_type(1, 2));
+  VERIFY( um0.count(0) == 0 );
+  VERIFY( um0.count(1) == 1 );
+
+  um0.insert(value_type(2, 1));
+  VERIFY( um0.count(2) == 1 );
+
+  um0.insert(value_type(3, 1));
+  um0.insert(value_type(3, 2));
+  um0.insert(value_type(3, 3));
+  VERIFY( um0.count(3) == 1 );
+
+  um0.erase(2);
+  VERIFY( um0.count(2) == 0 );
+
+  um0.erase(0);
+  VERIFY( um0.count(0) == 0 );
+
+  unordered_map<int, int> um1(um0);
+  VERIFY( um1.count(0) == 0 );
+  VERIFY( um1.count(1) == 1 );
+  VERIFY( um1.count(2) == 0 );
+  VERIFY( um1.count(3) == 1 );
+
+  um0.clear();
+  VERIFY( um0.count(0) == 0 );
+  VERIFY( um0.count(1) == 0 );
+  VERIFY( um0.count(2) == 0 );
+  VERIFY( um0.count(3) == 0 );
+
+  um1.insert(value_type(4, 1));
+  um1.insert(value_type(5, 1));
+  um1.insert(value_type(5, 2));
+  um1.insert(value_type(5, 3));
+  um1.insert(value_type(5, 4));
+  VERIFY( um1.count(4) == 1 );
+  VERIFY( um1.count(5) == 1 );
+
+  um1.erase(1);
+  VERIFY( um1.count(1) == 0 );
+
+  um1.erase(um1.find(5));
+  VERIFY( um1.count(5) == 0 );
+
+  um1.insert(value_type(1, 1));
+  um1.insert(value_type(1, 2));
+  VERIFY( um1.count(1) == 1 );
+
+  um1.erase(5);
+  VERIFY( um1.count(5) == 0 );
+
+  um1.erase(um1.find(4));
+  VERIFY( um1.count(4) == 0 );
+
+  um1.clear();
+  VERIFY( um1.count(0) == 0 );
+  VERIFY( um1.count(1) == 0 );
+  VERIFY( um1.count(2) == 0 );
+  VERIFY( um1.count(3) == 0 );
+  VERIFY( um1.count(4) == 0 );
+  VERIFY( um1.count(5) == 0 );
+}
+
+int main()
+{
+  test01();
+  return 0;
+}
Index: testsuite/23_containers/multimap/operations/count.cc
===
--- testsuite/23_containers/multimap/operations/count.cc(revision 0)
+++ testsuite/23_containers/multimap/operations/count.cc(revision 0)
@@ -0,0 +1,106 @@
+// 2011-10-28  Paolo Carlini  
+
+// Copyright (C) 2011 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+//
+
+#include <map>
+#include <testsuite_hooks.h>
+
+void test01()
+{
+  bool test __attribute__((unused)) = true;
+  using namespace 

Re: [Patch Darwin/Ada] work around PR target/50678

2011-10-28 Thread Iain Sandoe


On 18 Oct 2011, at 13:31, Arnaud Charlet wrote:

It's broken in all Libc versions that are in the wild (AFAICT from looking
at the released sources).

We will need to deal with
configury/__ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ stuff once
there is a fixed Libc.


OK, would be good to follow up with such patch when/if this is fixed.


We are waiting for input re. the actual bug (in system code).

In the meantime, I've applied the patch as a (hopefully temporary)
workaround.


cheers
Iain

Index: gcc/ada/ChangeLog
===
--- gcc/ada/ChangeLog   (revision 180612)
+++ gcc/ada/ChangeLog   (working copy)
@@ -1,3 +1,11 @@
+2011-10-28  Iain Sandoe  
+   Eric Botcazou  
+
+   PR target/50678
+   * init.c (Darwin/__gnat_error_handler): Apply a work-around to the
+   bug [filed as radar #10302855], which is inconsistent unwind data
+   for sigtramp.
+
 2011-10-28  Eric Botcazou  
 
PR ada/50842
Index: gcc/ada/init.c
===
--- gcc/ada/init.c  (revision 180612)
+++ gcc/ada/init.c  (working copy)
@@ -2287,6 +2287,16 @@ __gnat_error_handler (int sig, siginfo_t *si, void
 {
   struct Exception_Data *exception;
   const char *msg;
+#if defined (__x86_64__)
+  /* Work around radar #10302855/pr50678, where the unwinders (libunwind or
+ libgcc_s depending on the system revision) and the DWARF unwind data for
+ the sigtramp have different ideas about register numbering (causing rbx
+ and rdx to be transposed).  */
+  ucontext_t *uc = (ucontext_t *)ucontext ;
+  unsigned long t = uc->uc_mcontext->__ss.__rbx;
+  uc->uc_mcontext->__ss.__rbx = uc->uc_mcontext->__ss.__rdx;
+  uc->uc_mcontext->__ss.__rdx = t;
+#endif
 
   switch (sig)
 {
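
The workaround above boils down to exchanging two saved register slots
before the unwinder looks at them.  A minimal stand-alone sketch of that
step follows; struct saved_regs and swap_rbx_rdx are hypothetical names
standing in for Darwin's real uc->uc_mcontext->__ss layout, which is not
reproduced here:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the saved thread state; the patch reaches
   the real one through uc->uc_mcontext->__ss.  */
struct saved_regs
{
  uint64_t rbx;
  uint64_t rdx;
};

/* Exchange the saved rbx and rdx values, mirroring what the patch does
   at the top of __gnat_error_handler on x86_64.  */
void
swap_rbx_rdx (struct saved_regs *ss)
{
  uint64_t t;

  assert (ss != 0);
  t = ss->rbx;
  ss->rbx = ss->rdx;
  ss->rdx = t;
}
```

Applying the swap twice restores the original values, which is why the
workaround has to go away again once the system unwind data is fixed.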




Re: Use of vector instructions in memmov/memset expanding

2011-10-28 Thread Michael Zolotukhin
Hi Jan!
Thanks for the review; you can find my answers to some of your
remarks below. I'll send a corrected patch soon, along with answers to
the rest of your remarks.

> -  {{rep_prefix_1_byte, {{-1, rep_prefix_1_byte}}},
> +  {{{rep_prefix_1_byte, {{-1, rep_prefix_1_byte}}},
>{rep_prefix_1_byte, {{-1, rep_prefix_1_byte,
> -  {{rep_prefix_1_byte, {{-1, rep_prefix_1_byte}}},
> +   {{rep_prefix_1_byte, {{-1, rep_prefix_1_byte}}},
> +   {rep_prefix_1_byte, {{-1, rep_prefix_1_byte},
> +  {{{rep_prefix_1_byte, {{-1, rep_prefix_1_byte}}},
>{rep_prefix_1_byte, {{-1, rep_prefix_1_byte,
> +   {{rep_prefix_1_byte, {{-1, rep_prefix_1_byte}}},
> +   {rep_prefix_1_byte, {{-1, rep_prefix_1_byte},
>
> I am a bit concerned about the explosion of variants, but adding aligned
> variants probably makes sense.  I guess we should specify what alignment
> needs to be known.  I.e. is an alignment of 2 enough, or shall the alignment
> match the size of the loads/stores produced?

Yes, the alignment should match the size of the loads/stores, and the
offset from the alignment boundary should be known as well. Otherwise,
the strategies for unknown alignment would be chosen.


> This hunk seems dangerous in that emitting the explicit loadq/storeq
> pairs (and similar) will prevent use of integer registers for 64bit/128bit
> arithmetic.
>
> I guess we could play such tricks for memory-memory moves & constant stores.
> With gimple optimizations we already know pretty well that the moves will
> stay as they are.  That might be enough for you?

Yes, theoretically it could harm 64/128-bit arithmetic, but what can we
actually do if we have a DImode mem-to-mem move and our mode is
32-bit? Ideally, RA should be able to make decisions on how to perform
such moves, but currently it doesn't generate SSE moves. Once it is
able to do so, I think we could remove this part and rely on RA.
And one more point: this is quite a special case, since here we want to
perform the move via half of a vector register. This is the main reason why
these particular cases are handled in a special, rather than the common, way.


> I wrote the original function, but it is not really clear to me what the
> function does now.  I.e. what is the code for updating addresses, and what
> does reusing iter mean?  I guess reusing iter means that we won't start the
> loop from 0.  Could you expand the comments a bit more?
>
> I know I did not document them originally, but all the parameters ought to
> be explicitly documented in a function comment.

Yep, you're right - we just don't start the loop from 0. I'll send a
version with the comments soon.


> -/* Output code to copy at most count & (max_size - 1) bytes from SRC to 
> DEST.  */
> +/* Emit strset instuction.  If RHS is constant, and vector mode will be used,
> +   then move this consatnt to a vector register before emitting strset.  */
> +static void
> +emit_strset (rtx destmem, rtx value,
> +rtx destptr, enum machine_mode mode, int offset)
>
> This seems to more naturally belong into gen_strset expander?

I don't think it matters here, but to make emit_strset look similar to
emit_strmov, most of the emit_strset body really could be moved to
gen_strset.


>   if (max_size > 16)
> {
>   rtx label = ix86_expand_aligntest (count, 16, true);
>   if (TARGET_64BIT)
>{
> - dest = change_address (destmem, DImode, destptr);
> - emit_insn (gen_strset (destptr, dest, value));
> - emit_insn (gen_strset (destptr, dest, value));
> + destmem = change_address (destmem, DImode, destptr);
> + emit_insn (gen_strset (destptr, destmem, gen_lowpart (DImode,
> +   value)));
> + emit_insn (gen_strset (destptr, destmem, gen_lowpart (DImode,
> +   value)));
>
> No use for 128bit moves here?
>}
>   else
>{
> - dest = change_address (destmem, SImode, destptr);
> - emit_insn (gen_strset (destptr, dest, value));
> - emit_insn (gen_strset (destptr, dest, value));
> - emit_insn (gen_strset (destptr, dest, value));
> - emit_insn (gen_strset (destptr, dest, value));
> + destmem = change_address (destmem, SImode, destptr);
> + emit_insn (gen_strset (destptr, destmem, gen_lowpart (SImode,
> +   value)));
> + emit_insn (gen_strset (destptr, destmem, gen_lowpart (SImode,
> +   value)));
> + emit_insn (gen_strset (destptr, destmem, gen_lowpart (SImode,
> +   value)));
> + emit_insn (gen_strset (destptr, destmem, gen_lowpart (SImode,
> +   value)));
>
> And here?

For memset prologues/epilogues I avoid using vector moves as it could
require 

Re: [PATCH i386] PR47698 no CMOV for volatile mem

2011-10-28 Thread Sergey Ostanevich
On Fri, Oct 28, 2011 at 12:16 PM, Richard Guenther  wrote:
> On Thu, 27 Oct 2011, Uros Bizjak wrote:
>
>> Hello!
>>
>> > Here's a patch for PR47698, which is about CMOV should not be
>> > generated for memory address marked as volatile.
>> > Successfully bootstrapped and passed make check on 
>> > x86_64-unknown-linux-gnu.
>>
>>
>>       PR rtl-optimization/47698
>>       * config/i386/i386.c (ix86_expand_int_movcc) prevent CMOV generation
>>       for volatile mem
>>
>>       PR rtl-optimization/47698
>>       * gcc.target/i386/47698.c: New test
>>
>> Please use punctuation marks and correct capitalization in ChangeLog entries.
>>
>> OTOH, do we want to fix this per-target, or in the middle-end?
>
> The middle-end pattern documentation does not say operands 2 and 3
> are not evaluated if they do not end up being stored, so a middle-end
> fix is more appropriate.
>
> Richard.
>

I have two observations:

- the code for CMOV is under #ifdef in the middle-end, which is
explicitly marked as "have to be removed" (ifcvt.c:1446)
- I have no clear evidence all platforms that support conditional move
have the same semantics that lead to the PR

I think the best way to address both concerns is to implement code
that relies on a new hook, "volatile-safe CMOV", that is false by
default.

regards,
Sergos


Re: [PATCH i386] PR47698 no CMOV for volatile mem

2011-10-28 Thread Richard Guenther
On Fri, 28 Oct 2011, Sergey Ostanevich wrote:

> On Fri, Oct 28, 2011 at 12:16 PM, Richard Guenther  wrote:
> > On Thu, 27 Oct 2011, Uros Bizjak wrote:
> >
> >> Hello!
> >>
> >> > Here's a patch for PR47698, which is about CMOV should not be
> >> > generated for memory address marked as volatile.
> >> > Successfully bootstrapped and passed make check on 
> >> > x86_64-unknown-linux-gnu.
> >>
> >>
> >>       PR rtl-optimization/47698
> >>       * config/i386/i386.c (ix86_expand_int_movcc) prevent CMOV generation
> >>       for volatile mem
> >>
> >>       PR rtl-optimization/47698
> >>       * gcc.target/i386/47698.c: New test
> >>
> >> Please use punctuation marks and correct capitalization in ChangeLog 
> >> entries.
> >>
> >> OTOH, do we want to fix this per-target, or in the middle-end?
> >
> > The middle-end pattern documentation does not say operands 2 and 3
> > are not evaluated if they do not end up being stored, so a middle-end
> > fix is more appropriate.
> >
> > Richard.
> >
> 
> I have two observations:
> 
> - the code for CMOV is under #ifdef in the middle-end, which is
> explicitly marked as "have to be removed" (ifcvt.c:1446)
> - I have no clear evidence all platforms that support conditional move
> have the same semantics that lead to the PR
> 
> I think the best way to address both concerns is to implement code
> that relies on a new hook, "volatile-safe CMOV", that is false by
> default.

I suppose it's never safe for all architectures that support
memory operands in the source operand.

Richard.
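
For readers following the PR, the hazard can be pictured with a small C
function.  This is only an illustrative sketch, not the testcase from the
patch (47698.c is not shown in this thread), and select_volatile is a
made-up name:

```c
#include <assert.h>

/* If the conditional below is turned into a CMOV with a memory source
   operand, *p is loaded even when cond is zero.  For a volatile object
   that extra read is an access the source program never asked for.  */
int
select_volatile (int cond, volatile int *p, int fallback)
{
  assert (p != 0);
  return cond ? *p : fallback;
}
```

The result is the same either way; what differs is whether the volatile
location is touched when cond is false, which is the point of the PR.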

Re: [trans-mem] Explicitly go irrevocable even if transaction will always go irrevocable.

2011-10-28 Thread Aldy Hernandez



diff --git a/gcc/testsuite/gcc.dg/tm/memopt-1.c b/gcc/testsuite/gcc.dg/tm/memopt-1.c
index 06d4f64..9a48dcb 100644
--- a/gcc/testsuite/gcc.dg/tm/memopt-1.c
+++ b/gcc/testsuite/gcc.dg/tm/memopt-1.c
@@ -2,8 +2,8 @@
 /* { dg-options "-fgnu-tm -O -fdump-tree-tmmemopt" } */

 long g, xxx, yyy;
-extern george() __attribute__((transaction_callable));
-extern ringo(long int);
+extern george() __attribute__((transaction_safe));
+extern ringo(long int) __attribute__((transaction_safe));
 int i;


The patch looks fine, but...

Was the original test wrong, or are you testing something new?

If the original test was wrong, this patch is OK.

If the original test was not wrong, you need to add a new test (and 
bonus points for finding out why this test is currently failing :)).


Thanks.
Aldy


[PATCH] Fix early inliner inlining uninlinable functions

2011-10-28 Thread Richard Guenther

We fail to keep the cannot-inline flag up-to-date when turning
indirect to direct calls.  The following patch arranges to do
this during statement folding (which should always be called
when that happens).  It also makes sure to copy the updated flag
to the edge when iterating early inlining.

Bootstrap and regtest running on x86_64-unknown-linux-gnu, ok?

Thanks,
Richard.

2010-10-28  Richard Guenther  

PR tree-optimization/50890
* gimple.h (gimple_fold_call): Remove.
* gimple-fold.c (fold_stmt_1): Move all call related code to ...
(gimple_fold_call): ... here.  Make static.  Update the
cannot-inline flag on direct calls.
* ipa-inline.c (early_inliner): Copy the cannot-inline flag
from the statements to the edges.

* gcc.dg/torture/pr50890.c: New testcase.

Index: gcc/gimple.h
===
*** gcc/gimple.h(revision 180608)
--- gcc/gimple.h(working copy)
*** unsigned get_gimple_rhs_num_ops (enum tr
*** 909,915 
  #define gimple_alloc(c, n) gimple_alloc_stat (c, n MEM_STAT_INFO)
  gimple gimple_alloc_stat (enum gimple_code, unsigned MEM_STAT_DECL);
  const char *gimple_decl_printable_name (tree, int);
- bool gimple_fold_call (gimple_stmt_iterator *gsi, bool inplace);
  tree gimple_get_virt_method_for_binfo (HOST_WIDE_INT, tree);
  void gimple_adjust_this_by_delta (gimple_stmt_iterator *, tree);
  tree gimple_extract_devirt_binfo_from_cst (tree);
--- 909,914 
Index: gcc/gimple-fold.c
===
*** gcc/gimple-fold.c   (revision 180608)
--- gcc/gimple-fold.c   (working copy)
*** gimple_extract_devirt_binfo_from_cst (tr
*** 1057,1109 
 simplifies to a constant value. Return true if any changes were made.
 It is assumed that the operands have been previously folded.  */
  
! bool
  gimple_fold_call (gimple_stmt_iterator *gsi, bool inplace)
  {
gimple stmt = gsi_stmt (*gsi);
tree callee;
  
!   /* Check for builtins that CCP can handle using information not
!  available in the generic fold routines.  */
!   callee = gimple_call_fndecl (stmt);
!   if (!inplace && callee && DECL_BUILT_IN (callee))
! {
!   tree result = gimple_fold_builtin (stmt);
! 
!   if (result)
!   {
!   if (!update_call_from_tree (gsi, result))
!   gimplify_and_update_call_from_tree (gsi, result);
! return true;
!   }
! }
  
/* Check for virtual calls that became direct calls.  */
callee = gimple_call_fn (stmt);
if (callee && TREE_CODE (callee) == OBJ_TYPE_REF)
  {
-   tree binfo, fndecl, obj;
-   HOST_WIDE_INT token;
- 
if (gimple_call_addr_fndecl (OBJ_TYPE_REF_EXPR (callee)) != NULL_TREE)
{
  gimple_call_set_fn (stmt, OBJ_TYPE_REF_EXPR (callee));
! return true;
}
  
!   obj = OBJ_TYPE_REF_OBJECT (callee);
!   binfo = gimple_extract_devirt_binfo_from_cst (obj);
!   if (!binfo)
!   return false;
!   token = TREE_INT_CST_LOW (OBJ_TYPE_REF_TOKEN (callee));
!   fndecl = gimple_get_virt_method_for_binfo (token, binfo);
!   if (!fndecl)
!   return false;
!   gimple_call_set_fndecl (stmt, fndecl);
!   return true;
  }
  
!   return false;
  }
  
  /* Worker for both fold_stmt and fold_stmt_inplace.  The INPLACE argument
--- 1057,1138 
 simplifies to a constant value. Return true if any changes were made.
 It is assumed that the operands have been previously folded.  */
  
! static bool
  gimple_fold_call (gimple_stmt_iterator *gsi, bool inplace)
  {
gimple stmt = gsi_stmt (*gsi);
tree callee;
+   bool changed = false;
+   unsigned i;
  
!   /* Fold *& in call arguments.  */
!   for (i = 0; i < gimple_call_num_args (stmt); ++i)
! if (REFERENCE_CLASS_P (gimple_call_arg (stmt, i)))
!   {
!   tree tmp = maybe_fold_reference (gimple_call_arg (stmt, i), false);
!   if (tmp)
! {
!   gimple_call_set_arg (stmt, i, tmp);
!   changed = true;
! }
!   }
  
/* Check for virtual calls that became direct calls.  */
callee = gimple_call_fn (stmt);
if (callee && TREE_CODE (callee) == OBJ_TYPE_REF)
  {
if (gimple_call_addr_fndecl (OBJ_TYPE_REF_EXPR (callee)) != NULL_TREE)
{
  gimple_call_set_fn (stmt, OBJ_TYPE_REF_EXPR (callee));
! changed = true;
}
+   else
+   {
+ tree obj = OBJ_TYPE_REF_OBJECT (callee);
+ tree binfo = gimple_extract_devirt_binfo_from_cst (obj);
+ if (binfo)
+   {
+ HOST_WIDE_INT token
+   = TREE_INT_CST_LOW (OBJ_TYPE_REF_TOKEN (callee));
+ tree fndecl = gimple_get_virt_method_for_binfo (token, binfo);
+ if (fndecl)
+   {
+ gimple_call_set_fndecl (stmt, fndecl);
+ changed = true;
+  

Re: [RFC PATCH] update to libtool-2.4.2 and regenerate

2011-10-28 Thread Ian Lance Taylor
Rainer Orth  writes:

> Markus Trippelsdorf  writes:
>
>> By popular demand, I've prepared a patch that updates the in-tree
>> libtool to version 2.4.2. It is needed for lto-bootstrap with
>> -fno-fat-lto-objects and FreeBSD10.x versions. 
>
> I see that your patch doesn't deal with libgo/config, where a private
> copy of libtool is kept.  Would it be possible to get rid of that, given
> that 2.4.2 does support Go?

I hope so, but that can probably be a separate patch after the main one
is in.

Ian


[google] ThreadSanitizer instrumentation pass (issue5303083)

2011-10-28 Thread Dmitriy Vyukov
The patch is for google/main branch.
ThreadSanitizer is a data race detector for C/C++ programs.
http://code.google.com/p/data-race-test/wiki/ThreadSanitizer

The tool consists of two parts:
instrumentation module (this file) and a run-time library.
The instrumentation module maintains shadow call stacks
and intercepts interesting memory accesses.
The instrumentation is enabled with -ftsan flag.

Instrumentation for shadow stack maintenance is as follows:
void somefunc ()
{
  __tsan_shadow_stack [-1] = __builtin_return_address (0);
  __tsan_shadow_stack++;
  // function body
  __tsan_shadow_stack--;
}

Instrumentation for memory access interception is as follows:
*addr = 1;
__tsan_handle_mop (addr, flags);
where flags are (is_sblock | (is_store << 1) | ((sizeof (*addr) - 1) << 2)).
is_sblock is used merely for optimization purposes and can always
be set to 1, see comments in instrument_mops function.
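
As a rough illustration of the flags layout described above, the packing
can be written as a pair of small helpers.  The names tsan_mop_flags and
tsan_mop_size are hypothetical and are not taken from the patch or the
run-time library; only the bit positions follow the formula given here:

```c
#include <assert.h>
#include <stddef.h>

/* Pack the three fields: bit 0 = is_sblock, bit 1 = is_store,
   bits 2 and up = access size in bytes minus one.  */
unsigned
tsan_mop_flags (int is_sblock, int is_store, size_t size)
{
  assert (size >= 1);
  return (unsigned) (is_sblock != 0)
	 | ((unsigned) (is_store != 0) << 1)
	 | ((unsigned) (size - 1) << 2);
}

/* Recover the access size in bytes from packed flags.  */
size_t
tsan_mop_size (unsigned flags)
{
  return (size_t) (flags >> 2) + 1;
}
```

For example, a 4-byte store that starts an sblock packs to
1 | 2 | (3 << 2), i.e. 15.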

Ignore files can be used to selectively skip instrumentation of some functions.
An ignore file is specified with the -ftsan-ignore=filename flag.
There are 3 types of ignores: (1) do not instrument memory accesses
in the function, (2) do not create sblocks in the function
and (3) recursively ignore memory accesses in the function.
That last ignore type requires additional instrumentation of the form:
void somefunc ()
{
  __tsan_thread_ignore++;
  // function body
  __tsan_thread_ignore--;
}

The run-time library provides __tsan_handle_mop function,
definitions of __tsan_shadow_stack and __tsan_thread_ignore variables,
and intercepts synchronization related functions.

2011-10-28   Dmitriy Vyukov  

* gcc/doc/invoke.texi:
* gcc/tree-tsan.c (enum tsan_ignore_e):
(enum bb_state_e):
(struct bb_data_t):
(struct mop_desc_t):
(struct tsan_ignore_desc_t):
(lookup_name):
(shadow_stack_def):
(thread_ignore_def):
(rtl_mop_def):
(ignore_append):
(ignore_match):
(ignore_load):
(tsan_ignore):
(decl_name):
(build_stack_op):
(build_rec_ignore_op):
(build_stack_assign):
(instr_mop):
(instr_vptr_store):
(instr_func):
(set_location):
(is_dtor_vptr_store):
(is_vtbl_read):
(is_load_of_const):
(handle_expr):
(handle_gimple):
(instrument_bblock):
(instrument_mops):
(instrument_function):
(tsan_pass):
(tsan_gate):
* gcc/tree-pass.h:
* gcc/testsuite/gcc.dg/tsan-ignore.ignore:
* gcc/testsuite/gcc.dg/tsan.h (__tsan_init):
(__tsan_expect_mop):
(__tsan_handle_mop):
* gcc/testsuite/gcc.dg/tsan-ignore.c (foo):
(int bar):
(int baz):
(int bla):
(int xxx):
(main):
* gcc/testsuite/gcc.dg/tsan-ignore.h (in_tsan_ignore_header):
* gcc/testsuite/gcc.dg/tsan-stack.c (foobar):
* gcc/testsuite/gcc.dg/tsan-mop.c:
* gcc/common.opt:
* gcc/Makefile.in:
* gcc/passes.c:

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 180522)
+++ gcc/doc/invoke.texi (working copy)
@@ -308,6 +308,7 @@
 -fdump-tree-ssa@r{[}-@var{n}@r{]} -fdump-tree-pre@r{[}-@var{n}@r{]} @gol
 -fdump-tree-ccp@r{[}-@var{n}@r{]} -fdump-tree-dce@r{[}-@var{n}@r{]} @gol
 -fdump-tree-gimple@r{[}-raw@r{]} -fdump-tree-mudflap@r{[}-@var{n}@r{]} @gol
+-fdump-tree-tsan@r{[}-@var{n}@r{]} @gol
 -fdump-tree-dom@r{[}-@var{n}@r{]} @gol
 -fdump-tree-dse@r{[}-@var{n}@r{]} @gol
 -fdump-tree-phiprop@r{[}-@var{n}@r{]} @gol
@@ -381,8 +382,8 @@
 -floop-parallelize-all -flto -flto-compression-level @gol
 -flto-partition=@var{alg} -flto-report -fmerge-all-constants @gol
 -fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves @gol
--fmove-loop-invariants fmudflap -fmudflapir -fmudflapth -fno-branch-count-reg @gol
--fno-default-inline @gol
+-fmove-loop-invariants -fmudflap -fmudflapir -fmudflapth -fno-branch-count-reg @gol
+-ftsan -ftsan-ignore -fno-default-inline @gol
 -fno-defer-pop -fno-function-cse -fno-guess-branch-probability @gol
 -fno-inline -fno-math-errno -fno-peephole -fno-peephole2 @gol
 -fno-sched-interblock -fno-sched-spec -fno-signed-zeros @gol
@@ -5896,6 +5897,11 @@
 Dump each function after adding mudflap instrumentation.  The file name is
 made by appending @file{.mudflap} to the source file name.
 
+@item tsan
+@opindex fdump-tree-tsan
+Dump each function after adding ThreadSanitizer instrumentation.  The file name is
+made by appending @file{.tsan} to the source file name.
+
 @item sra
 @opindex fdump-tree-sra
 Dump each function after performing scalar replacement of aggregates.  The
@@ -6674,6 +6680,12 @@
 some protection against outright memory corrupting writes, but allows
 erroneously read data to propagate within a program.
 
+@item -ftsan -ftsan-ignore
+@opindex ftsan
+@opindex ftsan-ignore
+Add ThreadSanitizer instrumentation. Use @o

[Patch,AVR]: Tweak 8-bit parity expansion

2011-10-28 Thread Georg-Johann Lay
This is a minor tweak to support 8-bit parity.

Otherwise, the input operand of 8-bit values will be extended before parity
computation.

The final representation as a libgcc call is now generated in split1 and no longer
in expand. Notice that

- combine is not allowed to propagate hard regs into zero-extends.
- combine does not try parity:QI
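
To make the 8-bit case concrete, here is what parity of a byte means at
the C level.  This is a sketch of the operation only, not the libgcc
routine the insn maps to; parity8 is a hypothetical name:

```c
#include <stdint.h>

/* Parity of an 8-bit value by XOR folding: returns 1 if an odd number
   of bits are set, 0 otherwise.  The input never needs to be widened
   to 16 bits first, which is what the tweak avoids.  */
uint8_t
parity8 (uint8_t x)
{
  x ^= x >> 4;
  x ^= x >> 2;
  x ^= x >> 1;
  return x & 1;
}
```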

Ok for trunk?

Johann

* config/avr/avr.md (parityhi2): Expand allowing pseudos.
(*parityhi2): New pre-reload insn-and-split to map 16-bit parity
to the libgcc insn.
(*parityqihi2): Same for 8-bit parity.


Index: config/avr/avr.md
===
--- config/avr/avr.md	(revision 180605)
+++ config/avr/avr.md	(working copy)
@@ -4288,15 +4288,41 @@ (define_insn "delay_cycles_4"
 
 ;; Parity
 
+;; Postpone expansion of 16-bit parity to libgcc call until after combine for
+;; better 8-bit parity recognition.
+
 (define_expand "parityhi2"
+  [(parallel [(set (match_operand:HI 0 "register_operand" "")
+   (parity:HI (match_operand:HI 1 "register_operand" "")))
+  (clobber (reg:HI 24))])])
+
+(define_insn_and_split "*parityhi2"
+  [(set (match_operand:HI 0 "register_operand"   "=r")
+(parity:HI (match_operand:HI 1 "register_operand" "r")))
+   (clobber (reg:HI 24))]
+  "!reload_completed"
+  { gcc_unreachable(); }
+  "&& 1"
   [(set (reg:HI 24)
-(match_operand:HI 1 "register_operand" ""))
+(match_dup 1))
(set (reg:HI 24)
 (parity:HI (reg:HI 24)))
-   (set (match_operand:HI 0 "register_operand" "")
-(reg:HI 24))]
-  ""
-  "")
+   (set (match_dup 0)
+(reg:HI 24))])
+
+(define_insn_and_split "*parityqihi2"
+  [(set (match_operand:HI 0 "register_operand"   "=r")
+(parity:HI (match_operand:QI 1 "register_operand" "r")))
+   (clobber (reg:HI 24))]
+  "!reload_completed"
+  { gcc_unreachable(); }
+  "&& 1"
+  [(set (reg:QI 24)
+(match_dup 1))
+   (set (reg:HI 24)
+(zero_extend:HI (parity:QI (reg:QI 24))))
+   (set (match_dup 0)
+(reg:HI 24))])
 
 (define_expand "paritysi2"
   [(set (reg:SI 22)


Re: [Patch, fortran] [00/66] PR fortran/43829 Inline sum and product (AKA scalarization of reductions)

2011-10-28 Thread Jack Howarth
Mikael,
The complete patch bootstraps current FSF gcc trunk on x86_64-apple-darwin11,
and the resulting gfortran compiler can compile the Polyhedron 2005 benchmarks
using...

Compile Command : gfortran-fsf-4.7 -O3 -ffast-math -funroll-loops -flto -fwhole-program %n.f90 -o %n

without runtime regressions. However, I don't see any particular performance
improvements with your patches applied. In fact, a few benchmarks, including nf
and test_fpu, seem to show slower runtimes (~8-11%). Have you done any
benchmarking with and without the proposed patches?
  Jack


Re: [PATCH, rs6000] Update Power7 scheduling

2011-10-28 Thread David Edelsohn
On Thu, Oct 27, 2011 at 6:14 PM, Pat Haugen  wrote:
> The following patch fixes some issues with the Power7 scheduling
> description. The patch is neutral on cpu2006 (was actually hoping to see
> some improvements, but it's still the right thing to do since it more
> accurately describes the hardware).
>
> Bootstrap/regtest on powerpc64-linux with no new regressions. Ok for trunk?
>
> -Pat
>
>
> 2011-10-27  Pat Haugen 
>
>        * config/rs6000/rs6000.md (define_attr "type"): Add vecdouble.
>        * config/rs6000/vsx.md (VStype_simple, VStype_mul): Use vecdouble
>        type for V2DF.
>        (VStype_div): Use vector types for V2DF/V4SF.
>        (VStype_sqrt): Use *sqrt types.
>        (VS_spdp_type): Change type to vecdouble.
>        (*vsx_fmav2df4, *vsx_nfmsv2df4, vsx_xvcvdpsxws, vsx_xvcvdpuxws,
>        vsx_xvcvuxdsp, vsx_xvcvsxwdp, vsx_xvcvuxwdp, vsx_xvcvspsxds,
>        vsx_xvcvspuxds): Likewise.
>        (*vsx_fms4): Set type via .
>        (*vsx_eq__p, *vsx_gt__p, *vsx_ge__p): Set type via
>        .
>        * config/rs6000/power7.md (power7-vecstore): Correct VSU pipe.
>        (power7-fpcompare, power7-sdiv, power7-ddiv, power7-sqrt,
>        power7-dsqrt): Correct insn latency.
>        (power7-vecsimple): Add veccmp type and correct dispatch/VSU values.
>        (power7-veccmp): Delete.
>        (power7-vecfloat): Correct latency/dispatch/VSU values.
>        (define_bypass "power7-vecfloat"): Correct latency and types.
>        (power7-veccomplex, power7-vecperm): Correct dispatch/VSU values.
>        (power7-vecdouble, power7-vecfdiv, power7-vecdiv): New.

Okay.

Thanks, David


Re: [Patch,AVR]: Tweak 8-bit parity expansion

2011-10-28 Thread Denis Chertykov
2011/10/28 Georg-Johann Lay :
> This is a minor tweak to support 8-bit parity.
>
> Otherwise, the input operand of 8-bit values will be extended before parity
> computation.
>
> The final representation as a libgcc call is now generated in split1 and no longer
> in expand. Notice that
>
> - combine is not allowed to propagate hard regs into zero-extends.
> - combine does not try parity:QI
>
> Ok for trunk?
>
> Johann
>
>        * config/avr/avr.md (parityhi2): Expand allowing pseudos.
>        (*parityhi2): New pre-reload insn-and-split to map 16-bit parity
>        to the libgcc insn.
>        (*parityqihi2): Same for 8-bit parity.
>

Approved.

Denis.


Re: [trans-mem] Explicitly go irrevocable even if transaction will always go irrevocable.

2011-10-28 Thread Patrick Marlier

On 10/28/2011 08:53 AM, Aldy Hernandez wrote:

If the original test was not wrong, you need to add a new test (and
bonus points for finding out why this test is currently failing :)).


long g, xxx, yyy;

/* { dg-final { scan-tree-dump-times "transforming: .*_ITM_RaWU8 \\(&g\\);" 1 "tmmemopt" } } */


At least, maybe one of the problems is that g has type "long" (4 bytes on
32-bit targets) while the test is looking for an 8-byte access?
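
A small sketch of that guess: the size suffix the dump would show follows
sizeof (long), which differs between ILP32 and LP64 targets.  The helper
name tm_size_suffix is hypothetical, and the "U4" spelling is an
assumption that 4-byte accesses follow the same naming scheme as the
_ITM_RaWU8 pattern in the test:

```c
#include <stddef.h>

/* Return the size suffix a barrier on an object of SIZE bytes would
   carry, or NULL for sizes this sketch does not cover.  */
const char *
tm_size_suffix (size_t size)
{
  switch (size)
    {
    case 4:
      return "U4";
    case 8:
      return "U8";
    default:
      return NULL;
    }
}
```

Calling tm_size_suffix (sizeof (long)) on the failing target would show
whether the scan pattern can match there at all.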


Patrick.


ping: [RFA:] testsuite infrastructure for options implied by dg-final methods

2011-10-28 Thread Hans-Peter Nilsson
Ping.
Subject changed from '[RFA:] fix breakage with "Update testsuite
to run with slim LTO"' except it doesn't fix *all* breakage
introduced by that patch, only the one I observed and intended
to fix.

> Date: Fri, 21 Oct 2011 04:29:20 +0200
> From: Hans-Peter Nilsson 

> > Date: Fri, 21 Oct 2011 00:19:32 +0200
> > From: Jan Hubicka 
> > Yes, if we scan assembler, we likely want -fno-fat-lto-objects.
> 
> > > then IIUC you need to patch *all* torture tests that use
> > > scan-assembler and scan-assembler-not.  Alternatively, patch
> > > somewhere else, like not passing it if certain directives are
> > > used, like scan-assembler{,-not}.  And either way, is it safe to
> > > add that option always, not just when also passing "-flto" or
> > > something?
> > 
> > Hmm, some of assembler scans still works because they check for
> > presence of symbols we output anyway, but indeed, it would make more
> > sense to automatically imply -ffat-lto-object when scan-assembler
> > is used.  I am not sure if my dejagnu skill is on par here however.
> 
> Maybe you could make amends ;) by testing the following, which
> seems to work at least for dg-torture.exp and cris-elf/cris-sim,
> in which -ffat-lto-object is automatically added for each
> scan-assembler and scan-assembler-not test, extensible for other
> dg-final actions without polluting with checking LTO options and
> whatnot across the files.  I checked (and corrected) so it also
> works when !check_effective_target_lto by commenting out the
> setting in the second chunk.
> 
> gcc/testsuite:
> 
>   * lib/gcc-dg.exp (gcc_force_conventional_output): New global
>   variable, default empty, -ffat-lto-objects for effective_target_lto.
>   (gcc-dg-test-1): Add options from dg-final methods.
>   * lib/scanasm.exp (scan-assembler_required_options)
>   (scan-assembler-not_required_options): New procs.


Ok to commit?

brgds, H-P


[v3] Trivial formatting changes to a recently added testcase

2011-10-28 Thread Paolo Carlini

Hi,

committed to mainline.

Thanks,
Paolo.

///
2011-10-28  Paolo Carlini  

* testsuite/30_threads/condition_variable_any/50862.cc: Trivial
formatting fixes.


Index: testsuite/30_threads/condition_variable_any/50862.cc
===
--- testsuite/30_threads/condition_variable_any/50862.cc(revision 180616)
+++ testsuite/30_threads/condition_variable_any/50862.cc(working copy)
@@ -41,8 +41,8 @@
 
   std::mutex  m;
   std::condition_variable_any cond;
-  unsigned intproduct=0;
-  const unsigned int  count=10;
+  unsigned intproduct = 0;
+  const unsigned int  count = 10;
 
   // writing to stream causes timing changes which makes deadlock easier
   // to reproduce - do not remove
@@ -50,27 +50,31 @@
 
   // create consumers
   std::array threads;
-  for(size_t i=0; i

Re: [PATCH][RFC] Re-write LTO option merging

2011-10-28 Thread Diego Novillo

On 11-10-27 01:46 , Richard Guenther wrote:

On Wed, 26 Oct 2011, Richard Guenther wrote:



This completely rewrites LTO option merging.  At compile (uselessly
now at WPA?) time we now stream a COLLECT_GCC_OPTIONS like string
as it comes from argv of the compiler binary.  Those options are
read in by the LTO driver (lto-wrapper), merged into a single
set (very simple merge function right now ;)) and given a place to
complain about incompatible arguments.  The merged set is then
prepended to the arguments from the linker driver line
(what we get in COLLECT_GCC_OPTIONS for lto-wrapper), thus the
linker command-line may override what the compiler command-line(s)
provided.

One visible change is that the absence of an optimization option on the link
line no longer means -O0, unless you explicitly specify -O0 at link time.

There are probably more obscure differences, especially due to the
very simple merge and complain function ;))  But this is a RFC ...

If WPA partitioning at any point wants to do something clever with
a set of incompatible functions it can re-parse the options and
do that (we then have to arrange for lto-wrapper to let the options
slip through).

I'm LTO bootstrapping and testing this simple variant right now
(I believe we do not exercise funny option combinations right now).

I'll still implement a very simple merge/complain function.
Suggestions for that welcome (I'll probably simply compute the
intersection of options ... in the long run we'd want to annotate
our options as to whether they should be unioned/intersected).


Are you thinking of having some table of options with hints?  An NxN 
matrix of options?  Given two arbitrary options OPT1 and OPT2, how do we 
decide whether they can go together?  That's one big matrix.


Perhaps we could group options in classes?  There's really only a subset 
of options that need to be checked: -f, -m, -O, -g, ...
Perhaps start with an if-tree checking an incoming option against the 
set of accumulated options so far.


In fact, if we simply cataloged the set of options that can affect 
gimple bytecode generation, we can then make sure that those don't 
change at link time.
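
The "intersection of options" idea mentioned above could look roughly
like the following.  This is a sketch under the assumption that options
are compared as plain strings; it does not use the cl_decoded_option
structures from the patch, and intersect_options is a hypothetical name:

```c
#include <stddef.h>
#include <string.h>

/* Keep in A only the options that also appear in B, preserving the
   order of A.  Returns the new number of elements in A.  */
size_t
intersect_options (const char **a, size_t na,
		   const char *const *b, size_t nb)
{
  size_t kept = 0;
  size_t i, j;

  for (i = 0; i < na; ++i)
    for (j = 0; j < nb; ++j)
      if (strcmp (a[i], b[j]) == 0)
	{
	  a[kept++] = a[i];
	  break;
	}
  return kept;
}
```

A string comparison like this treats -O2 and -O3 as simply different,
which is where the per-option union/intersect annotations would come in.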





!   if (i != 1)
!   obstack_grow (&temporary_obstack, " ", 1);
!   obstack_grow (&temporary_obstack, "'", 1);
!   q = option->canonical_option[0];
!   while ((p = strchr (q, '\'')))
!   {
! obstack_grow (&temporary_obstack, q, p - q);
! obstack_grow (&temporary_obstack, "'\\''", 4);
! q = ++p;
!   }
!   obstack_grow (&temporary_obstack, q, strlen (q));
!   obstack_grow (&temporary_obstack, "'", 1);

!   for (j = 1; j<  option->canonical_option_num_elements; ++j)
{
! obstack_grow (&temporary_obstack, " '", 2);
! q = option->canonical_option[j];
! while ((p = strchr (q, '\'')))
!   {
! obstack_grow (&temporary_obstack, q, p - q);
! obstack_grow (&temporary_obstack, "'\\''", 4);
! q = ++p;
!   }
! obstack_grow (&temporary_obstack, q, strlen (q));
! obstack_grow (&temporary_obstack, "'", 1);


Ugh.
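
The quoting step in the hunk above appears twice; one way to picture the
shared part is a small helper.  This is a sketch that writes into a plain
caller-supplied buffer rather than the obstack the patch uses, and
quote_arg is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write ARG to OUT wrapped in single quotes, replacing each embedded
   single quote with the four characters quote, backslash, quote, quote
   so that a POSIX shell reads back the original text.  OUT must have
   room for 4 * strlen (ARG) + 3 bytes.  Returns the number of
   characters written, not counting the terminating NUL.  */
size_t
quote_arg (char *out, const char *arg)
{
  char *o = out;

  assert (out != 0 && arg != 0);
  *o++ = '\'';
  for (; *arg; ++arg)
    {
      if (*arg == '\'')
	{
	  memcpy (o, "'\\''", 4);
	  o += 4;
	}
      else
	*o++ = *arg;
    }
  *o++ = '\'';
  *o = '\0';
  return (size_t) (o - out);
}
```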


+   /* ???  For now the easiest thing would be to warn about
+  mismatches.  */
+
+   if (*decoded_options_count != fdecoded_options_count)
+ {
+   /* ???  Warn?  */
+   return;
+ }


Yes, please.  We don't want to silently accept anything we don't fully 
understand.




+   for (i = 0; i<  *decoded_options_count; ++i)
+ {
+   struct cl_decoded_option *option =&(*decoded_options)[i];
+   struct cl_decoded_option *foption =&fdecoded_options[i];
+   if (strcmp (option->orig_option_with_args_text,
+ foption->orig_option_with_args_text) != 0)
+   {
+ /* ???  Warn?  */
+ return;


Likewise.  If the warning proves too noisy in common scenarios, we can 
then adjust.





+
 /* Initalize the common arguments for the driver.  */
!   new_argv = (const char **) xmalloc ((15


15?



One thing I like about this is that it moves option processing out of 
lto1.  If we are getting the same behaviour as today, I'd say commit and 
we can refine later.



Diego.


Re: [PATCH][2/n] LTO option handling/merging rewrite

2011-10-28 Thread Diego Novillo
Isn't this the same patch as
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg02348.html?


Diego.


[trans-mem] fix C++ transaction_wrap attribute

2011-10-28 Thread Aldy Hernandez
The C++ front-end gives us a DECL for transaction_wrap attribute's 
argument.  The C front-end OTOH gives us an IDENTIFIER_NODE.  I have no 
idea why this changed after the merge, but I have fixed the attribute 
handler to work with both front-ends.


Also, for this case, distilled from testsuite/c-c++-common/tm/wrap-2.c, 
the C++ FE correctly complains that "f4" was not declared in this scope. 
 The C front-end does not.


void g4(void) __attribute__((transaction_wrap(f4)))

Suffice to say that both front-ends are sufficiently different that we 
should probably have two versions of testsuite/c-c++-common/tm/wrap-2.c.


The following patch fixes the attribute handler to work with both 
front-ends and separates wrap-2.c into C and C++ versions.  With it, the 
wrap-* failures are fixed for C++.


Including the patches I have queued on my end, we are now down to 3 
distinct C++ TM failures.


OK for branch?
* c-family/c-common.c (handle_tm_wrap_attribute): Handle decl
argument.
* testsuite/c-c++-common/tm/wrap-2.c: Move...
* testsuite/gcc.dg/tm/wrap-2.c: ...here.
* testsuite/g++.dg/tm/wrap-2.C: New.

Index: c-family/c-common.c
===
--- c-family/c-common.c (revision 180614)
+++ c-family/c-common.c (working copy)
@@ -7489,12 +7489,15 @@ handle_tm_wrap_attribute (tree *node, tr
 warning (OPT_Wattributes, "%qE attribute ignored", name);
   else
 {
-  tree wrap_id = TREE_VALUE (args);
-  if (TREE_CODE (wrap_id) != IDENTIFIER_NODE)
+  tree wrap_decl = TREE_VALUE (args);
+  if (TREE_CODE (wrap_decl) != IDENTIFIER_NODE
+ && TREE_CODE (wrap_decl) != VAR_DECL
+ && TREE_CODE (wrap_decl) != FUNCTION_DECL)
error ("%qE argument not an identifier", name);
   else
{
- tree wrap_decl = lookup_name (wrap_id);
+ if (TREE_CODE (wrap_decl) == IDENTIFIER_NODE)
+   wrap_decl = lookup_name (wrap_decl);
  if (wrap_decl && TREE_CODE (wrap_decl) == FUNCTION_DECL)
{
  if (lang_hooks.types_compatible_p (TREE_TYPE (decl),
@@ -7504,7 +7507,7 @@ handle_tm_wrap_attribute (tree *node, tr
error ("%qD is not compatible with %qD", wrap_decl, decl);
}
  else
-   error ("%qE is not a function", wrap_id);
+   error ("transaction_wrap argument is not a function");
}
 }
 
Index: testsuite/gcc.dg/tm/wrap-2.c
===
--- testsuite/gcc.dg/tm/wrap-2.c(revision 0)
+++ testsuite/gcc.dg/tm/wrap-2.c(revision 0)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-fgnu-tm" } */
+
+#define W(X)   __attribute__((transaction_wrap(X)))
+void f1(void);
+void f2(int);
+int i3;
+int f7(void);
+
+void g1(void) W(f1);
+void g2(void) W(f2);   /* { dg-error "is not compatible" } */
+void g3(void) W(i3);   /* { dg-error "is not a function" } */
+void g4(void) W(f4);   /* { dg-error "is not a function" } */
+void g5(void) W(1);/* { dg-error "not an identifier" } */
+void g6(void) W("f1"); /* { dg-error "not an identifier" } */
+void g7(void) W(f7);   /* { dg-error "is not compatible" } */
Index: testsuite/g++.dg/tm/wrap-2.C
===
--- testsuite/g++.dg/tm/wrap-2.C(revision 0)
+++ testsuite/g++.dg/tm/wrap-2.C(revision 0)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-fgnu-tm" } */
+
+#define W(X)   __attribute__((transaction_wrap(X)))
+void f1(void);
+void f2(int);
+int i3;
+int f7(void);
+
+void g1(void) W(f1);
+void g2(void) W(f2);   /* { dg-error "is not compatible" } */
+void g3(void) W(i3);   /* { dg-error "is not a function" } */
+void g4(void) W(f4);   /* { dg-error "not declared in this scope\|not an identifier" } */
+void g5(void) W(1);/* { dg-error "not an identifier" } */
+void g6(void) W("f1"); /* { dg-error "not an identifier" } */
+void g7(void) W(f7);   /* { dg-error "is not compatible" } */
Index: testsuite/c-c++-common/tm/wrap-2.c
===
--- testsuite/c-c++-common/tm/wrap-2.c  (revision 180614)
+++ testsuite/c-c++-common/tm/wrap-2.c  (working copy)
@@ -1,16 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-fgnu-tm" } */
-
-#define W(X)   __attribute__((transaction_wrap(X)))
-void f1(void);
-void f2(int);
-int i3;
-int f7(void);
-
-void g1(void) W(f1);
-void g2(void) W(f2);   /* { dg-error "is not compatible" } */
-void g3(void) W(i3);   /* { dg-error "is not a function" } */
-void g4(void) W(f4);   /* { dg-error "is not a function" } */
-void g5(void) W(1);/* { dg-error "not an identifier" } */
-void g6(void) W("f1"); /* { dg-error "not an identifier" } */
-void g7(void) W(f7);   /* { dg-error "is not compatible" } */


Re: [cxx-mem-model][PATCH 0/9] Convert i386 to new atomic optabs.

2011-10-28 Thread Richard Henderson
On 10/28/2011 04:06 AM, Jakub Jelinek wrote:
> It just wants a guarantee that the builtin will actually be implemented
> in hw.  I guess if __sync_fetch_op (new/old) isn't supported but
> __sync_compare_and_swap_* is, we could just use the former and let
> optabs.c deal with that.  But we have to handle the CAS case anyway
> for most of the operations that don't have a __sync_fetch_op defined
> (and for the cases where we e.g. VCE floating point data to integer
> of the same size for CAS).

I was just thinking that the data structure with the 6 optabs that we're
exporting from optabs.c is somewhat over the top, when simply testing
can_compare_and_swap_p is just about equivalent.

On reflection, I think I'll revert that patch and try it with just that
one test...

> BTW, I believe all #pragma omp atomic ops we want in the relaxed model
> or weaker, I think OpenMP only guarantees that the memory is modified
> or loaded atomically (that you don't see half of something and half of
> something else), there is nothing that requires ordering the atomic
> vs. any other memory location stores/loads.

... possibly with switching to the new builtins in relaxed mode?


r~


Re: [PATCH][PING] Vectorize conversions directly

2011-10-28 Thread Richard Henderson
On 10/28/2011 01:22 AM, Dmitry Plotnikov wrote:
> gcc/
> * tree-cfg.c (verify_gimple_assign_unary): Allow vector conversions.
> * optabs.c (supportable_convert_operation): New function.
> * optabs.h (supportable_convert_operation): New prototype.
> * tree-vect-stmts.c (vectorizable_conversion): Change condition and 
> behavior
>   for NONE modifier case.
> * tree.h (VECTOR_INTEGER_TYPE_P): New macro.
...
> gcc/testsuite/
> * gcc.target/arm/vect-vcvt.c: New test.
> * gcc.target/arm/vect-vcvtq.c: New test.
> 
> gcc/testsuite/lib/
> * target-supports.exp (check_effective_target_vect_intfloat_cvt): True
>   for ARM NEON.
>   (check_effective_target_vect_uintfloat_cvt): Likewise.
>   (check_effective_target_vect_intfloat_cvt): Likewise.
>   (check_effective_target_vect_floatuint_cvt): Likewise.
>   (check_effective_target_vect_floatint_cvt): Likewise.
>   (check_effective_target_vect_extract_even_odd): Likewise. 



Ok.


r~


[trans-mem] Fix outer transactions to be considered abortable too.

2011-10-28 Thread Torvald Riegel
Atomic transactions marked as outer(-atomic) transactions can abort too
if they are calling functions whose type has the may_cancel_outer
attribute. Given that outer transactions are probably rare, this patch
just assumes that all outer transactions might abort irrespective of
whether they are actually calling may_cancel_outer functions.
Previously, there was no abort-handling code generated for outer
transactions, except if there was a __transaction_cancel in lexical
scope.
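
The resulting rule can be stated as a small predicate. This is only a sketch: the SK_* bit values are hypothetical stand-ins, not the real GTMA_* constants.

```c
#include <stdbool.h>

/* Hypothetical bit values for the sketch; the real GTMA_* constants
   are defined in GCC and are not reproduced here.  */
enum
{
  SK_HAVE_ABORT = 1 << 0,
  SK_IS_OUTER = 1 << 1
};

/* A transaction may be flagged as never aborting only if it has no
   abort in lexical scope and is not marked as an outer transaction.  */
static bool
sk_has_no_abort (unsigned int subcode)
{
  return (subcode & SK_HAVE_ABORT) == 0
         && (subcode & SK_IS_OUTER) == 0;
}
```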

OK for branch?
commit 5ca679dfdc10038e6fe9bf9b3a73df5088d6cf21
Author: Torvald Riegel 
Date:   Fri Oct 28 17:01:25 2011 +0200

Fix outer transactions to be considered abortable too.

* trans-mem.c (lower_transaction): Also add an "over" label for outer
transactions.
(expand_transaction): Do not set hasNoAbort for outer transactions.
* testsuite/gcc.dg/tm/props-4.c: New file.

--- a/gcc/ChangeLog.tm
+++ b/gcc/ChangeLog.tm
@@ -1,3 +1,10 @@
+2011-10-28  Torvald Riegel  
+
+   * trans-mem.c (lower_transaction): Also add an "over" label for outer
+   transactions.
+   (expand_transaction): Do not set hasNoAbort for outer transactions.
+   * testsuite/gcc.dg/tm/props-4.c: New file.
+
 2011-10-27  Torvald Riegel  
 
* trans-mem.c (ipa_tm_transform_transaction): Insert explicit request
diff --git a/gcc/testsuite/gcc.dg/tm/props-4.c b/gcc/testsuite/gcc.dg/tm/props-4.c
new file mode 100644
index 000..c9d0c2b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tm/props-4.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-fgnu-tm -fdump-tree-tmedge -fdump-tree-tmmark" } */
+
+int a, b;
+
+void __attribute((transaction_may_cancel_outer,noinline)) cancel1()
+{
+  __transaction_cancel [[outer]];
+}
+
+void
+foo(void)
+{
+  __transaction_atomic [[outer]] {
+a = 2;
+__transaction_atomic {
+  b = 2;
+  cancel1();
+}
+  }
+}
+
+/* { dg-final { scan-tree-dump-times " instrumentedCode" 1 "tmedge" } } */
+/* { dg-final { scan-tree-dump-times "hasNoAbort" 0 "tmedge" } } */
+/* { dg-final { scan-tree-dump-times "LABEL=" 1 "tmmark" } } */
+/* { dg-final { cleanup-tree-dump "tmedge" } } */
+/* { dg-final { cleanup-tree-dump "tmmark" } } */
diff --git a/gcc/trans-mem.c b/gcc/trans-mem.c
index 994cf09..bb98273 100644
--- a/gcc/trans-mem.c
+++ b/gcc/trans-mem.c
@@ -1603,8 +1603,10 @@ lower_transaction (gimple_stmt_iterator *gsi, struct walk_stmt_info *wi)
 
   gimple_transaction_set_body (stmt, NULL);
 
-  /* If the transaction calls abort, add an "over" label afterwards.  */
-  if (this_state & GTMA_HAVE_ABORT)
+  /* If the transaction calls abort or if this is an outer transaction,
+ add an "over" label afterwards.  */
+  if ((this_state & (GTMA_HAVE_ABORT))
+  || (gimple_transaction_subcode(stmt) & GTMA_IS_OUTER))
 {
   tree label = create_artificial_label (UNKNOWN_LOCATION);
   gimple_transaction_set_label (stmt, label);
@@ -2563,7 +2565,10 @@ expand_transaction (struct tm_region *region)
 flags = PR_INSTRUMENTEDCODE;
   if ((subcode & GTMA_MAY_ENTER_IRREVOCABLE) == 0)
 flags |= PR_HASNOIRREVOCABLE;
-  if ((subcode & GTMA_HAVE_ABORT) == 0)
+  /* If the transaction does not have an abort in lexical scope and is not
+ marked as an outer transaction, then it will never abort.  */
+  if ((subcode & GTMA_HAVE_ABORT) == 0
+  && (subcode & GTMA_IS_OUTER) == 0)
 flags |= PR_HASNOABORT;
   if ((subcode & GTMA_HAVE_STORE) == 0)
 flags |= PR_READONLY;


Re: [PATCH, rs6000] Preserve link stack for 476 cpus

2011-10-28 Thread Richard Henderson
On 10/27/2011 06:43 PM, Peter Bergner wrote:
> Ok, here's a patch to implement that, and it passes bootstrap and
> regtesting.  Richard, is this what you had in mind?  I'll note that
> I disabled rs6000_code_end for TARGET_POWERPC64, since I was running
> into linker errors when building libgcc.  The merging of the thunk
> routines with comdat worked fine, but the thunk function also has a
> function descriptor and I couldn't figure out a way to get those
> merged properly (if it's even possible), so they led to multiply
> defined symbol linker errors.

That's something you might have to discuss with David and Alan.

You might wind up bypassing some of the normal boilerplate
that gets added by final_start_function etc.

It does look like you're missing the stub for ppc64, and yet
you invoke it?  At least, I don't see anything earlier that
tests ppc64, only in rs6000_code_end.


r~


Re: [trans-mem] fix C++ transaction_wrap attribute

2011-10-28 Thread Richard Henderson
On 10/28/2011 07:50 AM, Aldy Hernandez wrote:
>   * c-family/c-common.c (handle_tm_wrap_attribute): Handle decl
>   argument.
>   * testsuite/c-c++-common/tm/wrap-2.c: Move...
>   * testsuite/gcc.dg/tm/wrap-2.c: ...here.
>   * testsuite/g++.dg/tm/wrap-2.C: New.

Ok.


r~


Re: [trans-mem] Fix outer transactions to be considered abortable too.

2011-10-28 Thread Richard Henderson
On 10/28/2011 08:08 AM, Torvald Riegel wrote:
> Fix outer transactions to be considered abortable too.
> 
>   * trans-mem.c (lower_transaction): Also add an "over" laber for outer
>   transactions.
>   (expand_transactions): Do not set hasNoAbort for outer transactions.
>   * testsuite/gcc.dg/tm/props-4.c: New file.

Ok.


r~


Re: [PATCH i386] PR47698 no CMOV for volatile mem

2011-10-28 Thread Sergey Ostanevich
On Fri, Oct 28, 2011 at 4:52 PM, Richard Guenther  wrote:
> On Fri, 28 Oct 2011, Sergey Ostanevich wrote:
>
>> On Fri, Oct 28, 2011 at 12:16 PM, Richard Guenther  wrote:
>> > On Thu, 27 Oct 2011, Uros Bizjak wrote:
>> >
>> >> Hello!
>> >>
>> >> > Here's a patch for PR47698, which is about CMOV should not be
>> >> > generated for memory address marked as volatile.
>> >> > Successfully bootstrapped and passed make check on 
>> >> > x86_64-unknown-linux-gnu.
>> >>
>> >>
>> >>       PR rtl-optimization/47698
>> >>       * config/i386/i386.c (ix86_expand_int_movcc) prevent CMOV generation
>> >>       for volatile mem
>> >>
>> >>       PR rtl-optimization/47698
>> >>       * gcc.target/i386/47698.c: New test
>> >>
>> >> Please use punctuation marks and correct capitalization in ChangeLog 
>> >> entries.
>> >>
>> >> OTOH, do we want to fix this per-target, or in the middle-end?
>> >
>> > The middle-end pattern documentation does not say operands 2 and 3
>> > are not evaluated if they do not end up being stored, so a middle-end
>> > fix is more appropriate.
>> >
>> > Richard.
>> >
>>
>> I have two observations:
>>
>> - the code for CMOV is under #ifdef in the mddle-end, which is
>> explicitly marked as "have to be removed" (ifcvt.c:1446)
>> - I have no clear evidence all platforms that support conditional move
>> have the same semantics that lead to the PR
>>
>> I think the best way to address both concerns is to implement code
>> that relies on а new hookup "volatile-safe CMOV" that is false by
>> default.
>
> I suppose it's never safe for all architectures that support
> memory operands in the source operand.
>
> Richard.

OK, at least missing an optimization around volatile memory should not
be a big problem.

Apparently the problem is here:

At ifcvt.c:2539 there is a test for side effects of the source (which
is 'a' in this case):

2539  if (! noce_operand_ok (a) || ! noce_operand_ok (b))
(gdb) p debug_rtx(a)
(mem/v/c/i:DI (symbol_ref:DI ("mmio") [flags 0x40] ) [2 mmio+0 S8 A64])

but inside noce_operand_ok() the tests are in the wrong order:

2332  if (MEM_P (op))
2333    return ! side_effects_p (XEXP (op, 0));
2334
2335  if (side_effects_p (op))
2336    return FALSE;
2337
2337

where XEXP (op, 0) strips the MEM and leaves just the symbol reference,
which carries no volatile flag:
#0  side_effects_p (x=0x7149c660) at ../../gcc/rtlanal.c:2152
(gdb) p debug_rtx(x)
(symbol_ref:DI ("mmio") [flags 0x40] )
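
The effect of the test order can be modelled outside GCC. The following is a toy sketch only: struct toy_rtx and the toy_* functions are invented for the illustration and are not GCC's rtx or its real predicates, and the may_trap_p check is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an rtx; not GCC's real data structure.  */
enum toy_code { TOY_SYMBOL_REF, TOY_MEM };

struct toy_rtx
{
  enum toy_code code;
  bool volatil;               /* models the volatile flag on a MEM */
  const struct toy_rtx *addr; /* models XEXP (op, 0) for a MEM */
};

/* Models side_effects_p: a volatile MEM has side effects, and so does
   any MEM whose address has them.  */
static bool
toy_side_effects_p (const struct toy_rtx *x)
{
  if (x->code == TOY_MEM)
    return x->volatil || toy_side_effects_p (x->addr);
  return false;
}

/* MEM test first: only the address is inspected, so the volatile flag
   on the MEM itself is never seen.  */
static bool
toy_operand_ok_mem_first (const struct toy_rtx *op)
{
  if (op->code == TOY_MEM)
    return !toy_side_effects_p (op->addr);
  return !toy_side_effects_p (op);
}

/* Whole-operand test first: a volatile MEM is rejected before its
   address is ever looked at.  */
static bool
toy_operand_ok_whole_first (const struct toy_rtx *op)
{
  if (toy_side_effects_p (op))
    return false;
  if (op->code == TOY_MEM)
    return !toy_side_effects_p (op->addr);
  return true;
}
```

For a volatile MEM over a plain symbol, the first variant accepts the operand and the second rejects it; for a non-volatile MEM both accept.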

Is the following fix OK?
I'm still testing it.

Sergos

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 784e2e8..3b05c2a 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -2329,12 +2329,12 @@ noce_operand_ok (const_rtx op)
 {
   /* We special-case memories, so handle any of them with
  no address side effects.  */
-  if (MEM_P (op))
-return ! side_effects_p (XEXP (op, 0));
-
   if (side_effects_p (op))
 return FALSE;

+  if (MEM_P (op))
+return ! side_effects_p (XEXP (op, 0));
+
   return ! may_trap_p (op);
 }

diff --git a/gcc/testsuite/gcc.target/i386/47698.c b/gcc/testsuite/gcc.target/i386/47698.c
new file mode 100644
index 000..2c75109
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/47698.c
@@ -0,0 +1,10 @@
+/* { dg-options "-Os" } */
+/* { dg-final { scan-assembler-not "cmov" } } */
+
+extern volatile unsigned long mmio;
+unsigned long foo(int cond)
+{
+  if (cond)
+    return mmio;
+  return 0;
+}




Re: [RFC PATCH] update to libtool-2.4.2 and regenerate

2011-10-28 Thread Joseph S. Myers
On Fri, 28 Oct 2011, Markus Trippelsdorf wrote:

> > What about the issues with libtool's notion of sysroots, as mentioned in 
> >  - have those 
> > been resolved, and if so, how?  And what about the directories only 
> > present in the src repository that will also need updating for such a 
> > toplevel change?
> 
> Oops, you're right, libtools "with_sysroot" still clashes with gcc's
> notion. But that should be easily fixable by just reverting one commit
> (3334f7ed5851ef1) in libtool.
> I'm afraid I don't understand your second question. Can you elaborate on
> that?

Because of the shared toplevel build system, the gcc and src repositories 
need to have their configure scripts etc. regenerated at the same time for 
a change such as this (and it should be tested that binutils, newlib and 
the other directories in the src repository do still build properly after 
the regeneration).

-- 
Joseph S. Myers
jos...@codesourcery.com


PING 2 : [Patch Darwin/PR49992 1/2] remove ranlib special-casing from the darwin port.

2011-10-28 Thread Iain Sandoe
Since this is approved by Mike, if there is no further comment by  
Monday, I plan to apply it.


On 22 Oct 2011, at 08:36, Iain Sandoe wrote:



On 14 Oct 2011, at 10:36, Iain Sandoe wrote:

As per the PR audit trail, there is no reason to retain this  
special-casing for Darwin.
(given that current GCC is not build-able using Darwin toolsets of  
the vintage that required the case).


Mike has OK'd this off-list - but, since Ralf commented on the  
previous version, I'd like to give him the opportunity to comment  
here.

OK for trunk?
Iain

* configure.ac: Remove ranlib special case for Darwin port.
* gcc/configure.ac: Likewise.
* configure: Regenerate.
* gcc/configure: Regenerate.

Index: configure.ac
===
--- configure.ac(revision 179962)
+++ configure.ac(working copy)
@@ -2274,10 +2274,6 @@ case "${target}" in
   extra_arflags_for_target=" -X32_64"
   extra_nmflags_for_target=" -B -X32_64"
   ;;
-  *-*-darwin[[3-9]]*)
-# ranlib before Darwin10 requires the -c flag to look at common symbols.
-extra_ranlibflags_for_target=" -c"
-;;
esac

alphaieee_frag=/dev/null
Index: gcc/configure.ac
===
--- gcc/configure.ac(revision 179962)
+++ gcc/configure.ac(working copy)
@@ -829,17 +829,7 @@ esac
gcc_AC_PROG_LN_S
ACX_PROG_LN($LN_S)
AC_PROG_RANLIB
-case "${host}" in
-*-*-darwin*)
-  # By default, the Darwin ranlib will not treat common symbols as
-  # definitions when  building the archive table of contents.  Other
-  # ranlibs do that; pass an option to the Darwin ranlib that makes
-  # it behave similarly.
-  ranlib_flags="-c"
-  ;;
-*)
-  ranlib_flags=""
-esac
+ranlib_flags=""
AC_SUBST(ranlib_flags)

gcc_AC_PROG_INSTALL









PING 2 : [Patch Darwin/PR49992 2/2] remove ranlib special-casing from the darwin port.

2011-10-28 Thread Iain Sandoe

This is unreviewed for 2 weeks.

I am sure that this issue will be affecting Ada on Darwin10/11 with  
the latest toolchains.


It might be subtle without LTO - OTOH when LTO is engaged it breaks  
things completely.



On 22 Oct 2011, at 08:37, Iain Sandoe wrote:



On 14 Oct 2011, at 10:37, Iain Sandoe wrote:

As per the PR audit trail, there is no reason to retain this in the  
building of GCC.


As for its use as a general option in tool-builds;
With current darwin toolsets it has the potential to cause issues  
when using convenience libs containing common.

OK for trunk?
Iain

gcc/ada:

PR target/49992
* mlib-tgt-specific-darwin.adb: Remove ranlib special case.
* gcc-interface/Makefile.in (darwin): Likewise.


Index: gcc/ada/mlib-tgt-specific-darwin.adb
===
--- gcc/ada/mlib-tgt-specific-darwin.adb(revision 179962)
+++ gcc/ada/mlib-tgt-specific-darwin.adb(working copy)
@@ -68,7 +68,7 @@ package body MLib.Tgt.Specific is

  function Archive_Indexer_Options return String_List_Access is
  begin
-  return new String_List'(1 => new String'("-c"));
+  return new String_List'(1 => new String'(""));
  end Archive_Indexer_Options;

  ---
Index: gcc/ada/gcc-interface/Makefile.in
===
--- gcc/ada/gcc-interface/Makefile.in   (revision 179962)
+++ gcc/ada/gcc-interface/Makefile.in   (working copy)
@@ -2179,7 +2179,6 @@ ifeq ($(strip $(filter-out darwin%,$(osys))),)

 EH_MECHANISM=-gcc
 GNATLIB_SHARED = gnatlib-shared-darwin
-  RANLIB = ranlib -c
 GMEM_LIB = gmemlib
 LIBRARY_VERSION := $(LIB_VERSION)
 soext = .dylib








Re: [PATCH][2/n] LTO option handling/merging rewrite

2011-10-28 Thread Joseph S. Myers
On Fri, 28 Oct 2011, Richard Guenther wrote:

> +   /* Fallthru.  */
> + case OPT_fPIC:
> + case OPT_fpic:
> + case OPT_fpie:
> + case OPT_fcommon:
> + case OPT_fexceptions:
> +   append_option (decoded_options, decoded_options_count, foption);
> +   break;

No doubt this is what the previous code did, but in this case shouldn't 
"union" mean the biggest PIC status of any file wins (thus, if -fPIC was 
the PIC option that actually had effect on some object, that wins over an 
explicit -fno-PIC or -fpic on another object)?  In general whether the 
options are positive or negative matters, and I don't see that handled 
here.

(Actually, maybe the smallest PIC status should win - i.e. if any object 
is not PIC then the final code can be presumed to be non-PIC.)

(Using Negative in .opt files for groups of options such as -fPIC/-fpic 
would ensure that at most one survives from any one object, but you still 
need to work out what you want to do for merging.)
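
To make the "smallest PIC status wins" variant concrete, here is a minimal sketch. It assumes each object's effective PIC status has already been reduced to a single level (0 for non-PIC, 1 for -fpic, 2 for -fPIC); that encoding and the name merge_pic_levels are assumptions made for the example.

```c
#include <stddef.h>

/* PIC level of one object: 0 = non-PIC, 1 = -fpic, 2 = -fPIC.
   This encoding is an assumption made for the sketch.  */
typedef int pic_level;

/* Merge by taking the smallest level: if any object is non-PIC, the
   final code is presumed non-PIC.  With no objects, return 0.  */
static pic_level
merge_pic_levels (const pic_level *levels, size_t n)
{
  size_t i;
  pic_level merged;

  if (n == 0)
    return 0;
  merged = levels[0];
  for (i = 1; i < n; ++i)
    if (levels[i] < merged)
      merged = levels[i];
  return merged;
}
```

The "biggest wins" alternative is the same loop with the comparison reversed; the point is that a merge rule has to look at the effective value per object, not at the mere presence of an option.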

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][2/n] LTO option handling/merging rewrite

2011-10-28 Thread Jack Howarth
On Fri, Oct 28, 2011 at 01:43:07PM +0200, Richard Guenther wrote:
> 
> This moves the existing processing of user options from lto1 to
> the lto driver (lto-wrapper).  It also changes the way we stream
> user options from some custom binary format over to simply
> streaming the original command-line as passed to the compiler
> by the driver as a COLLECT_GCC_OPTIONS-like string.
> 
> lto-wrapper in this patch version (tries to) performs exactly
> the same as lto1 tried to do - re-issue all target specific
> options and a selected set of switches.
> 
> I chose to do this as an incremental step which should not
> change behavior or regress in any way (fingers crossing).
> 
> From here we can think what would be the most sensible behavior
> (and maybe start tagging options in the .opt files so they
> get treatment based on some flag).
> 
> LTO bootstrapped and tested on x86_64-unknown-linux-gnu, ok?

LTO bootstraps on x86_64-apple-darwin11 applied to r180613 with no
regressions in lto.exp.
  Jack

> 
> Thanks,
> Richard.
> 
> 2011-10-28  Richard Guenther  
> 
>   * lto-opts.c: Re-implement.
>   * lto-streamer.h (lto_register_user_option): Remove.
>   (lto_read_file_options): Likewise.
>   (lto_reissue_options): Likewise.
>   (lto_clear_user_options): Likewise.
>   (lto_clear_file_options): Likewise.
>   * opts-global.c (post_handling_callback): Remove.
>   (set_default_handlers): Do not set post_handling_callback.
>   (decode_options): Remove LTO specific code.
>   * lto-wrapper.c (merge_and_complain): New function.
>   (run_gcc): Read all input file options and
>   prepend a merged set before the linker driver options.
>   * gcc.c (driver_post_handling_callback): Remove.
>   (set_option_handlers): Do not set post_handling_callback.
>   * opts-common.c (handle_option): Do not call post_handling_callback.
>   * opts.h (struct cl_option_handlers): Remove post_handling_callback.
> 
>   lto/
>   * lto-lang.c (lto_post_options): Do not read file options.
>   * lto.c (lto_read_all_file_options): Remove.
>   (lto_init): Call lto_set_in_hooks here.
> 
> 
> Index: trunk/gcc/lto-opts.c
> ===
> *** trunk.orig/gcc/lto-opts.c 2011-10-27 15:24:59.0 +0200
> --- trunk/gcc/lto-opts.c  2011-10-28 12:15:52.0 +0200
> ***
> *** 1,6 
>   /* LTO IL options.
>   
> !Copyright 2009, 2010, 2011 Free Software Foundation, Inc.
>  Contributed by Simon Baldwin 
>   
>   This file is part of GCC.
> --- 1,6 
>   /* LTO IL options.
>   
> !Copyright 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
>  Contributed by Simon Baldwin 
>   
>   This file is part of GCC.
> *** along with GCC; see the file COPYING3.
> *** 33,422 
>   #include "common/common-target.h"
>   #include "diagnostic.h"
>   #include "lto-streamer.h"
> ! 
> ! /* When a file is initially compiled, the options used when generating
> !the IL are not necessarily the same as those used when linking the
> !objects into the final executable.  In general, most build systems
> !will proceed with something along the lines of:
> ! 
> ! $ gcc  -flto -c f1.c -o f1.o
> ! $ gcc  -flto -c f2.c -o f2.o
> ! ...
> ! $ gcc  -flto -c fN.c -o fN.o
> ! 
> !And the final link may or may not include the same  used
> !to generate the initial object files:
> ! 
> ! $ gcc  -flto -o prog f1.o ... fN.o
> ! 
> !Since we will be generating final code during the link step, some
> !of the flags used during the compile step need to be re-applied
> !during the link step.  For instance, flags in the -m family.
> ! 
> !The idea is to save a selected set of  in a special
> !section of the initial object files.  This section is then read
> !during linking and the options re-applied.
> ! 
> !FIXME lto.  Currently the scheme is limited in that only the
> !options saved on the first object file (f1.o) are read back during
> !the link step.  This means that the options used to compile f1.o
> !will be applied to ALL the object files in the final link step.
> !More work needs to be done to implement a merging and validation
> !mechanism, as this will not be enough for all cases.  */
> ! 
> ! /* Saved options hold the type of the option (currently CL_TARGET or
> !CL_COMMON), and the code, argument, and value.  */
> ! 
> ! typedef struct GTY(()) opt_d
> ! {
> !   unsigned int type;
> !   size_t code;
> !   char *arg;
> !   int value;
> ! } opt_t;
> ! 
> ! DEF_VEC_O (opt_t);
> ! DEF_VEC_ALLOC_O (opt_t, heap);
> ! 
> ! 
> ! /* Options are held in two vectors, one for those registered by
> !command line handling code, and the other for those read in from
> !any LTO IL input.  */
> ! static VEC(opt_t, heap) *user_options = NULL;
> ! static VEC(opt_t, heap) *file_options = NULL;
> ! 
> ! /* Iterat

Re: [PATCH i386] PR47698 no CMOV for volatile mem

2011-10-28 Thread Richard Henderson
On 10/28/2011 05:49 AM, Sergey Ostanevich wrote:
> - the code for CMOV is under #ifdef in the mddle-end, which is
> explicitly marked as "have to be removed" (ifcvt.c:1446)
> - I have no clear evidence all platforms that support conditional move
> have the same semantics that lead to the PR

We're on solid ground here.  The arguments are assumed to always be
evaluated in RTL, *except* in the case of COND_EXEC.  Only true
predication can avoid an exception or side effects of touching memory.

> I think the best way to address both concerns is to implement code
> that relies on а new hookup "volatile-safe CMOV" that is false by
> default.

We do not need a new hook.


r~


Re: [RFC PATCH] update to libtool-2.4.2 and regenerate

2011-10-28 Thread Markus Trippelsdorf
On 2011.10.28 at 15:34 +, Joseph S. Myers wrote:
> On Fri, 28 Oct 2011, Markus Trippelsdorf wrote:
> 
> > > What about the issues with libtool's notion of sysroots, as mentioned in 
> > >  - have those 
> > > been resolved, and if so, how?  And what about the directories only 
> > > present in the src repository that will also need updating for such a 
> > > toplevel change?
> > 
> > Oops, you're right, libtools "with_sysroot" still clashes with gcc's
> > notion. But that should be easily fixable by just reverting one commit
> > (3334f7ed5851ef1) in libtool.
> > I'm afraid I don't understand your second question. Can you elaborate on
> > that?
> 
> Because of the shared toplevel build system, the gcc and src repositories 
> need to have their configure scripts etc. regenerated at the same time for 
> a change such as this (and it should be tested that binutils, newlib and 
> the other directories in the src repository do still build properly after 
> the regeneration).

OK. Although it appears that it was handled in a cascaded fashion the
last time:
Last gcc update was on : 2009-12-05
Last binutils update   : 2010-01-09

-- 
Markus


Re: Use of vector instructions in memmov/memset expanding

2011-10-28 Thread Richard Henderson
On 10/28/2011 05:41 AM, Michael Zolotukhin wrote:
>> > +/* Target hook.  Returns rtx of mode MODE with promoted value VAL, that is
>> > +   supposed to represent one byte.  MODE could be a vector mode.
>> > +   Example:
>> > +   1) VAL = const_int (0xAB), mode = SImode,
>> > +   the result is const_int (0xABABABAB).
>> >
>> > This can be handled in machine independent way, right?
>> >
>> > +   2) if VAL isn't const, then the result will be the result of 
>> > MUL-instruction
>> > +   of VAL and const_int (0x01010101) (for SImode).  */
>> >
>> > This would probably go better as named expansion pattern, like we do for 
>> > other
>> > machine description interfaces.
> I don't think it could be done in machine-independent way - e.g. if
> AVX is available, we could use broadcast-instructions, if not - we
> need to use multiply-instructions, on other architectures there
> probably some other more efficient ways to duplicate byte value across
> the entire vector register. So IMO it's a good place to have a hook.
> 
> 

Certainly it can be done machine-independently.
See expand_vector_broadcast in optabs.c for a start.
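
For the scalar case, the promotion described in the quoted comment is plain arithmetic and needs nothing target-specific. A minimal sketch (the function names are made up for this example):

```c
#include <stdint.h>

/* Replicate VAL into all four bytes of a 32-bit word by multiplying
   with 0x01010101.  No carries occur because VAL fits in one byte.  */
static uint32_t
broadcast_byte32 (uint8_t val)
{
  return (uint32_t) val * UINT32_C (0x01010101);
}

/* Same idea for a 64-bit word.  */
static uint64_t
broadcast_byte64 (uint8_t val)
{
  return (uint64_t) val * UINT64_C (0x0101010101010101);
}
```

So 0xAB becomes 0xABABABAB in the 32-bit case, matching example 1) in the quoted comment.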


r~


[PING] Pass address space to REGNO_MODE_CODE_OK_FOR_BASE_P

2011-10-28 Thread Ulrich Weigand

The following patch still needs maintainer review:
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01874.html

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: [C++ Patch / RFC] PR 50870

2011-10-28 Thread Jason Merrill

On 10/27/2011 08:22 PM, Paolo Carlini wrote:

I'm trying to figure out where the very different args argument is
coming from.



Earlier than that things are different: in mainline the same args, as
arglist, comes from fixup_template_parm, and earlier we have
fixup_template_parms which creates the arglist itself


Right, this is new.  I guess the COMPONENT_REF code needs to be fixed to 
handle partial instantiation.


Jason


Re: [C++ Patch / RFC] PR 50870

2011-10-28 Thread Paolo Carlini

On 10/28/2011 06:07 PM, Jason Merrill wrote:

On 10/27/2011 08:22 PM, Paolo Carlini wrote:

I'm trying to figure out where the very different args argument is
coming from.



Earlier than that things are different: in mainline the same args, as
arglist, comes from fixup_template_parm, and earlier we have
fixup_template_parms which creates the arglist itself


Right, this is new.  I guess the COMPONENT_REF code needs to be fixed 
to handle partial instantiation.


I see. Something I didn't tell you yesterday, is that, in 4_5-branch, 
tsubst_template_arg is called like this, by coerce_template_parms:


  /* There must be a default arg in this case.  */
  arg = tsubst_template_arg (TREE_PURPOSE (parm), new_args,
 complain, in_decl);

which, what can I say, looks right ;)

By the way, your hint that probably we are also passing a wrong first 
argument to qualified_name_lookup_error may be useful for the other 
issue, the ice-on-invalid, which I was trying to fix in the parser ;) 
Let me test something...


Paolo.


Re: [Patch, fortran] [00/66] PR fortran/43829 Inline sum and product (AKA scalarization of reductions)

2011-10-28 Thread Mikael Morin
On Friday 28 October 2011 15:56:36 Jack Howarth wrote:
> Mikael,
> The complete patch bootstraps current FSF gcc trunk on
> x86_64-apple-darwin11 and the resulting gfortran compiler can compile the
> Polyhedron 2005 benchmarks using...
> 
> Compile Command : gfortran-fsf-4.7 -O3 -ffast-math -funroll-loops -flto
> -fwhole-program %n.f90 -o %n
> 
> without runtime regressions. However I don't seem to see any particular
> performance improvements with your patches applied. In fact, a few
> benchmarks including nf and test_fpu seem to show slower runtimes
> (~8-11%). Have you done any benchmarking with and without the proposed
> patches? Jack

Not myself, but the previous versions of the patch have been reported to give 
a noticeable improvement on "tonto" here:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43829#c26
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43829#c35

Since those versions, the array constructor handling has been improved, and a 
few mostly cosmetic changes have been applied, so I expect the posted patch to 
be on par with the previous ones, possibly slightly better.

Now regarding your regressions, it is quite a lot worse, and quite unexpected.
I have just looked at test_fpu.f90 and nf.f90 from a polyhedron source I have 
found at http://www.polyhedron.com/web_images/documents/pb05.zip. 
There is no call to product in them, and both use only single-argument sum 
calls, which are not (or shouldn't be) impacted by my patch (scalar cases). 
Indeed, if I compare the code produced using -fdump-tree-original, there is 
zero difference in nf.f90, and in test_fpu.f90 only slight variations which 
are very very unlikely to cause the regression you see (see attached diff).
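
To make the argument above concrete, here is a minimal sketch of one kind of change visible in the attached dump diff: the two temporaries that hold the stride and the offset are assigned in the opposite order, and the index expression swaps them accordingly. The function and parameter names are hypothetical stand-ins for the D.NNNN temporaries, and the sketch models only the arithmetic, not the compiler's actual output.

```c
#include <assert.h>

/* Unpatched dump: the first temporary holds the offset and the second
   holds the stride.  */
long index_master(long s, long delta, long stride, long off)
{
  long t1 = off;
  long t2 = stride;
  return (s + delta) * t2 + t1;
}

/* Patched dump: the temporaries are assigned the other way round, and
   the index expression swaps them too.  */
long index_patched(long s, long delta, long stride, long off)
{
  long t1 = stride;
  long t2 = off;
  return (s + delta) * t1 + t2;
}

/* Compare the two forms over a small range of inputs.  */
void check_same_index(void)
{
  for (long s = 0; s < 8; s++)
    for (long stride = 1; stride < 5; stride++)
      for (long off = -3; off < 4; off++)
        assert(index_master(s, 2, stride, off)
               == index_patched(s, 2, stride, off));
}
```

Both functions compute the same value for every input, which is why a renaming of this kind would not be expected to change run time.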

Could you double check your figures, and/or that the regressions are really 
caused by my patch?

Mikael
--- test_fpu.f90.003t.original.master	2011-10-28 18:08:53.0 +0200
+++ test_fpu.f90.003t.original.patched	2011-10-28 18:22:28.0 +0200
@@ -1929,6 +1929,7 @@
   D.2297 = offset.65 + -1;
   atmp.64.dim[0].ubound = D.2297;
   pos.61 = D.2297 >= 0 ? 1 : 0;
+  offset.62 = 1;
   {
 integer(kind=8) S.67;
 
@@ -1936,7 +1937,6 @@
 while (1)
   {
 if (S.67 > D.2297) goto L.133;
-offset.62 = 1;
 if (ABS_EXPR <(*(real(kind=8)[0] * restrict) atmp.64.data)[S.67]> > limit.63)
   {
 limit.63 = ABS_EXPR <(*(real(kind=8)[0] * restrict) atmp.64.data)[S.67]>;
@@ -2406,14 +2406,14 @@
   integer(kind=8) D.2457;
   integer(kind=8) S.104;
 
-  D.2457 = D.2436 + D.2442;
-  D.2458 = stride.45;
+  D.2457 = stride.45;
+  D.2458 = D.2436 + D.2442;
   D.2459 = D.2443 * stride.45 + D.2439;
   S.104 = 0;
   while (1)
 {
   if (S.104 > D.2444) goto L.149;
-  (*(real(kind=8)[0:] * restrict) atmp.103.data)[S.104] = (*b)[(S.104 + D.2454) * D.2458 + D.2457];
+  (*(real(kind=8)[0:] * restrict) atmp.103.data)[S.104] = (*b)[(S.104 + D.2454) * D.2457 + D.2458];
   S.104 = S.104 + 1;
 }
   L.149:;
@@ -2486,13 +2486,13 @@
   integer(kind=8) D.2479;
   integer(kind=8) S.106;
 
-  D.2479 = D.2473 + D.2476;
-  D.2480 = stride.45;
+  D.2479 = stride.45;
+  D.2480 = D.2473 + D.2476;
   S.106 = D.2471;
   while (1)
 {
   if (S.106 > D.2472) goto L.152;
-  (*b)[(S.106 + D.2477) * D.2480 + D.2479] = (*temp)[S.106 + -1];
+  (*b)[(S.106 + D.2477) * D.2479 + D.2480] = (*temp)[S.106 + -1];
   S.106 = S.106 + 1;
 }
   L.152:;
@@ -2756,13 +2756,13 @@
   integer(kind=8) D.2549;
   integer(kind=8) S.112;
 
-  D.2549 = D.2543 + D.2546;
-  D.2550 = stride.45;
+  D.2549 = stride.45;
+  D.2550 = D.2543 + D.2546;
   S.112 = 1;
   while (1)
 {
   if (S.112 > D.2542) goto L.168;
-  (*b)[(S.112 + D.2547) * D.2550 + D.2549] = (*temp)[S.112 + -1];
+

Re: [PATCH, rs6000] Preserve link stack for 476 cpus

2011-10-28 Thread Peter Bergner
On Fri, 2011-10-28 at 08:20 -0700, Richard Henderson wrote:
> On 10/27/2011 06:43 PM, Peter Bergner wrote:
> > Ok, here's a patch to implement that, and it passes bootstrap and
> > regtesting.  Richard, is this what you had in mind?  I'll note that
> > I disabled rs6000_code_end for TARGET_POWERPC64, since I was running
> > into linker errors when building libgcc.  The merging of the thunk
> > routines with comdat worked fine, but the thunk function also has a
> > function descriptor and I couldn't figure out a way to get those
> > merged properly (if it's even possible), so they led to multiply
> > defined symbol linker errors.
> 
> That's something you might have to discuss with David and Alan.

So David, do we even want to bother trying to support this on -m64
given the only cpu that needs this is a 32-bit only cpu?  If so, I
can try and work with Alan to figure out how we can merge the
function descriptors for the thunk routines when using -m64.



> It does look like you're missing the stub for ppc64, and yet
> you invoke it?  At least, I don't see anything earlier that
> tests ppc64, only in rs6000_code_end.

Oops, you're right.  Had I compiled with -m64 -mcpu=power7 -mtune=476fp,
I would have caught that.  I guess (supposing we don't want to support
64-bit) I should have the following hunks instead, correct?

+  /* If not explicitly specified via option, decide whether to generate the
+ extra blr's required to preserve the link stack on some cpus (eg, 476).  */
+  if (TARGET_LINK_STACK == -1)
+SET_TARGET_LINK_STACK (rs6000_cpu == PROCESSOR_PPC476
+  && flag_pic
+  && !TARGET_POWERPC64);
+


+static void
+rs6000_code_end (void)
+{
+  char name[32];
+  tree decl;
+
+  if (!TARGET_LINK_STACK)
+return;
...



Peter





Re: [PATCH, rs6000] Preserve link stack for 476 cpus

2011-10-28 Thread Richard Henderson
On 10/28/2011 09:36 AM, Peter Bergner wrote:
> Oops, you're right.  Had I compiled with -m64 -mcpu=power7 -mtune=476fp,
> I would have caught that.  I guess (supposing we don't want to support
> 64-bit) I should have the following hunks instead, correct?
> 
> +  /* If not explicitly specified via option, decide whether to generate the
> + extra blr's required to preserve the link stack on some cpus (eg, 476).  */
> +  if (TARGET_LINK_STACK == -1)
> +SET_TARGET_LINK_STACK (rs6000_cpu == PROCESSOR_PPC476
> +  && flag_pic
> +  && !TARGET_POWERPC64);

Not quite.  You can't allow the user to set TARGET_LINK_STACK either,
for 64-bit.  Because it won't work without further fixups.  More like

  if (TARGET_POWERPC64)
SET_TARGET_LINK_STACK (0);
  if (TARGET_LINK_STACK == -1)
SET_TARGET_LINK_STACK (rs6000_cpu == PROCESSOR_PPC476 && flag_pic);

That first test could possibly be more refined, like testing AIX
calling conventions or DOT_SYMBOLS.  But it hardly seems worthwhile.
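
The difference between the two orderings can be sketched as a small standalone model. This is an illustration under assumptions, not the rs6000 code: the option is modelled as a plain int where -1 means "not given on the command line", and the function names and boolean parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Tri-state option value: -1 means "not given on the command line".  */
enum { LINK_STACK_UNSET = -1, LINK_STACK_OFF = 0, LINK_STACK_ON = 1 };

/* First proposal: the 64-bit test only guards the computed default, so
   an explicit user request survives on a 64-bit target.  */
int resolve_default_only(int setting, bool is_ppc476, bool pic, bool ppc64)
{
  assert(setting >= LINK_STACK_UNSET && setting <= LINK_STACK_ON);
  if (setting == LINK_STACK_UNSET)
    setting = is_ppc476 && pic && !ppc64;
  return setting;
}

/* Suggested ordering: force the feature off for 64-bit first, then
   compute the default only if the option is still unset.  */
int resolve_force_off_first(int setting, bool is_ppc476, bool pic, bool ppc64)
{
  assert(setting >= LINK_STACK_UNSET && setting <= LINK_STACK_ON);
  if (ppc64)
    setting = LINK_STACK_OFF;
  if (setting == LINK_STACK_UNSET)
    setting = is_ppc476 && pic;
  return setting;
}
```

With an explicit "on" request and a 64-bit target, the first function returns 1 and the second returns 0; that is the case the second ordering is meant to close.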


r~


Re: [RFC PATCH] update to libtool-2.4.2 and regenerate

2011-10-28 Thread Andreas Tobler

On 28.10.11 01:35, Markus Trippelsdorf wrote:

By popular demand, I've prepared a patch that updates the in-tree
libtool to version 2.4.2. It is needed for lto-bootstrap with
-fno-fat-lto-objects and FreeBSD10.x versions.
It's a pretty big update as you can see by the following diffstat. I
cannot attach the patch even as a gzip file, because of its size:

  417745 Oct 28 00:47 0001-update-to-libtool-2.4.2-and-regenerate.patch.gz

Bootstrapped on x86_64-pc-linux-gnu.


For the record:

http://gcc.gnu.org/ml/gcc-testresults/2011-10/msg03138.html

Thanks!!!
Andreas



Re: Go patch committed: Implement new syscall package

2011-10-28 Thread Rainer Orth
Ian,

>> I committed this patch which should fix this problem.  Bootstrapped and
>> ran Go testsuite on x86_64-unknown-linux-gnu.
>
> thanks, but this is not enough:
>
> nawk: syntax error at source line 173
>  context is
>  ([^ >>>  ]*)$", <<<  cparam) == 0) {
> nawk: illegal statement at source line 173
> nawk: syntax error at source line 179
>
> and there is another instance on l.210.  I haven't tried fixing this
> myself since I'm fighting with other issues.

even if I work around this by installing gawk 4.0.0 on Solaris 8/9, I
run into another issue:

/vol/gcc/src/hg/trunk/local/libgo/go/syscall/errstr_nor.go:22:8: error: reference to undefined name 'libc_strerror'
make[4]: *** [syscall/syscall.lo] Error 1

Replacing libc_strerror (which doesn't exist anywhere) by strerror isn't
enough, though:

/vol/gcc/src/hg/trunk/local/libgo/go/syscall/errstr_nor.go:22:2: error: variable has no type
/vol/gcc/src/hg/trunk/local/libgo/go/syscall/errstr_nor.go:22:2: error: incompatible type in initialization (non-value used as value)
make[2]: *** [syscall/syscall.lo] Error 1

I couldn't figure out what's wrong here; I'll need considerably more
time with the Go tutorial etc.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [RFC PATCH] update to libtool-2.4.2 and regenerate

2011-10-28 Thread Matthias Klose
On 10/28/2011 10:33 AM, Rainer Orth wrote:
> Markus Trippelsdorf  writes:
> 
>> By popular demand, I've prepared a patch that updates the in-tree
>> libtool to version 2.4.2. It is needed for lto-bootstrap with
>> -fno-fat-lto-objects and FreeBSD10.x versions. 
> 
> I see that your patch doesn't deal with libgo/config, where a private
> copy of libtool is kept.  Would it be possible to get rid of that, given
> that 2.4.2 does support Go?

same for libjava/libltdl


[C++ Patch] PR 50864

2011-10-28 Thread Paolo Carlini

Hi,

as per the recent discussion. This also changes c++/50870 from 
ice-on-valid to reject-valid, a *tad* better I think (but I mean to 
continue working on it, for a while). Tested x86_64-linux.


Ok for mainline?

Thanks,
Paolo.


/cp
2011-10-28  Paolo Carlini  

PR c++/50864
* pt.c (tsubst_copy_and_build): Fix qualified_name_lookup_error
call in case COMPONENT_REF.

/testsuite
2011-10-28  Paolo Carlini  

PR c++/50864
* testsuite/g++.dg/template/crash109.C: New.
Index: testsuite/g++.dg/template/crash109.C
===
--- testsuite/g++.dg/template/crash109.C(revision 0)
+++ testsuite/g++.dg/template/crash109.C(revision 0)
@@ -0,0 +1,10 @@
+// PR c++/50864
+
+namespace impl
+{
+  template  T create();
+}
+
+template () -> impl::create())>  // { dg-error "not a member" }
+struct foo;
Index: cp/pt.c
===
--- cp/pt.c (revision 180619)
+++ cp/pt.c (working copy)
@@ -13741,14 +13741,12 @@ tsubst_copy_and_build (tree t,
else if (TREE_CODE (member) == SCOPE_REF
 && TREE_CODE (TREE_OPERAND (member, 1)) == TEMPLATE_ID_EXPR)
  {
-   tree tmpl;
-   tree args;
-
/* Lookup the template functions now that we know what the
   scope is.  */
-   tmpl = TREE_OPERAND (TREE_OPERAND (member, 1), 0);
-   args = TREE_OPERAND (TREE_OPERAND (member, 1), 1);
-   member = lookup_qualified_name (TREE_OPERAND (member, 0), tmpl,
+   tree scope = TREE_OPERAND (member, 0);
+   tree tmpl = TREE_OPERAND (TREE_OPERAND (member, 1), 0);
+   tree args = TREE_OPERAND (TREE_OPERAND (member, 1), 1);
+   member = lookup_qualified_name (scope, tmpl,
/*is_type_p=*/false,
/*complain=*/false);
if (BASELINK_P (member))
@@ -13762,7 +13760,7 @@ tsubst_copy_and_build (tree t,
  }
else
  {
-   qualified_name_lookup_error (object_type, tmpl, member,
+   qualified_name_lookup_error (scope, tmpl, member,
 input_location);
return error_mark_node;
  }


[pph] Various Merging Fixes (issue5330048)

2011-10-28 Thread Lawrence Crowl
Add namespace merging.  This change generalizes pph_out_merge_keys on
the global namespace to all namespaces.  Preallocate decl_lang_specific
for namespaces with streaming in their keys.  Stream out namespace
members in declaration order.  This change mysteriously fixes
mysterious bugs.

Add initial support for type merging.  This support includes
references to merge keys.  It modifies the key hash to include the
tree code, because type decls and types otherwise have the same
hash.  Types do not appear on a chain, so merging into a null chain is
avoided.  Type merging is off at the moment to test namespaces.
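
As a rough illustration of why the tree code is folded into the key hash, here is a self-contained sketch. It does not use the pph or libiberty hashing routines; the FNV-style mixing, the enum values and all function names are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for tree codes; not GCC's real enumerators.  */
enum toy_code { TOY_TYPE_DECL = 1, TOY_RECORD_TYPE = 2 };

/* FNV-1a over a NUL-terminated string.  */
uint32_t hash_string(const char *s)
{
  assert(s != NULL);
  uint32_t h = 2166136261u;
  for (; *s; s++)
    {
      h ^= (unsigned char) *s;
      h *= 16777619u;
    }
  return h;
}

/* Fold one more value into a running hash.  */
uint32_t hash_mix(uint32_t h, uint32_t v)
{
  return (h * 16777619u) ^ v;
}

/* Key hash built from the context and the name only: a type decl and
   its type, sharing both, collide.  */
uint32_t key_hash_without_code(uint32_t context_hash, const char *name)
{
  return hash_mix(context_hash, hash_string(name));
}

/* Key hash that also folds in the code, so the two kinds of node are
   told apart.  */
uint32_t key_hash_with_code(uint32_t context_hash, const char *name,
                            enum toy_code code)
{
  return hash_mix(key_hash_without_code(context_hash, name),
                  (uint32_t) code);
}
```

In this sketch the code is mixed in last, so two keys with the same context and name but different codes always hash differently.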

Handle unnamed decls by using the location instead of the mangled name
in creating the hash string.

Change the pph_trace_tree function from a set of bool parameters to a
single enum pph_trace_kind.  This change captures more information and
avoids printing trees before their contents are streamed in.  Change
the call sites to uniformly use a postorder traversal for tracing.
This makes in and out traces directly comparable.

Bootstrapped on x64.


Index: gcc/cp/ChangeLog.pph

2011-10-28   Lawrence Crowl  

* pph.c (pph_dump_tree_name): Remove dead code.  Dump tree_code also.
* pph-streamer.h (enum pph_trace_kind): New.
(pph_trace_tree): Change bool parameters to a single enum parameter.
Update callers to match.
(pph_tree_is_mergeable): All decls and types are mergeable.
* pph-streamer.c (pph_trace_tree): Avoid printing names for unmerge
keys, as they are too sparse to print.  Change bool parameters to a
single enum parameter.  Update body to match.
* pph-streamer-out.c (pph_out_start_merge_key_record): Handle reference
merge keys.  Return status.
(pph_out_merge_body_vec): New.
(pph_out_merge_body_chain): New.
(pph_out_merge_keys): Replace with general binding routines below.
(pph_out_binding_merge_keys): New.
(pph_out_binding_merge_bodies): New.
(pph_out_global_binding): Use the above.
(pph_merge_name): Handle types as well as decls.  Handle unnamed decls.
Handle namespaces.  Add disabled handling of types.
(pph_out_tree): Move tracing to postorder traversal to match tracing of
input streaming.
* pph-streamer-in.c (htab_merge_key_hash): Hash in tree code.
(pph_merge_into_chain): Do not merge into null chains.
(pph_in_binding_level): Split ALLOC_AND_REGISTER into constituents
for future registration of previously allocated binding level.
(pph_in_merge_keys): Replace with general binding routines below.
(pph_in_binding_merge_keys): New.
(pph_in_binding_merge_bodies): New.
(pph_in_global_binding): Use the above.
(pph_in_lang_specific): Avoid reallocating DECL_LANG_SPECIFIC.
(pph_in_merge_key_tree): Reformat comment.  Handle null and reference
markers, which may be needed for types.  Add handling of namespaces.
Add disabled handling of classes and types.


Index: gcc/cp/pph.c
===
--- gcc/cp/pph.c(revision 180550)
+++ gcc/cp/pph.c(working copy)
@@ -78,34 +78,20 @@ pph_dump_min_decl (FILE *file, tree decl
 void
 pph_dump_tree_name (FILE *file, tree t, int flags)
 {
-#if 0
   enum tree_code code = TREE_CODE (t);
-  fprintf (file, "%s\t", pph_tree_code_text (code));
-  if (code == FUNCTION_TYPE || code == METHOD_TYPE)
-{
-  dump_function_to_file (t, file, flags);
-}
-  else
-{
-  print_generic_expr (file, TREE_TYPE (t), flags);
-  /* FIXME pph: fprintf (file, " ", cxx_printable_name (t, 0)); */
-  fprintf (file, " " );
-  print_generic_expr (file, t, flags);
-}
-  fprintf (file, "\n");
-#else
+  const char *text = pph_tree_code_text (code);
   if (DECL_P (t))
-fprintf (file, "%s\n", decl_as_string (t, flags));
+fprintf (file, "%s %s\n", text, decl_as_string (t, flags));
   else if (TYPE_P (t))
-fprintf (file, "%s\n", type_as_string (t, flags));
+fprintf (file, "%s %s\n", text, type_as_string (t, flags));
   else if (EXPR_P (t))
-fprintf (file, "%s\n", expr_as_string (t, flags));
+fprintf (file, "%s %s\n", text, expr_as_string (t, flags));
   else
 {
+  fprintf (file, "%s ", text );
   print_generic_expr (file, t, flags);
   fprintf (file, "\n");
 }
-#endif
 }
 
 
Index: gcc/cp/pph-streamer-in.c
===
--- gcc/cp/pph-streamer-in.c(revision 180550)
+++ gcc/cp/pph-streamer-in.c(working copy)
@@ -750,7 +750,8 @@ htab_merge_key_hash (const void *p)
   const merge_toc_entry *key = (const merge_toc_entry *) p;
   hashval_t context_val = htab_hash_pointer (key->context);
   hashval_t name_val = htab_hash_string (key->name);
-  return iterative_hash_hashval_t (context_val, name_val);
+  hashval_t id_val = iterative_hash_hashval_t (name_val, TR

[Patch Ada RFA] make sure that multilibs are built with correct s-oscons.ads

2011-10-28 Thread Iain Sandoe
The sizes of items represented in s-oscons.ads can (and do) change  
with the multi-lib on targets that support  libada as a multi-lib.


At present, s-oscons.ads is only built once (in gcc/ada) and sym- 
linked to rts*/


This is causing a bunch of failures on i686-darwin9 where the m64  
multi-lib has generally larger structures than the m32 native.


On m64 targets with m32 multi-libs, this tends to be hidden by the  
fact that (generally) the m32 entities are smaller than their m64  
counterparts.  However, it's still wrong (at least insofar as wasting  
memory - if not on any more serious scale).
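
To see why a single generated file cannot serve both multilibs, consider this small sketch, which prints a few target-dependent sizes as Ada-style named numbers. It is not the real s-oscons-tmplt.c; the constant names and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format one Ada-style named number into BUF; returns the number of
   characters written, as reported by snprintf.  */
int format_constant(char *buf, size_t len, const char *name, size_t value)
{
  assert(buf != NULL && name != NULL);
  return snprintf(buf, len, "   %s : constant := %zu;", name, value);
}

/* Print a few sizes that depend on the multilib being compiled.  */
void print_sizes(void)
{
  char line[128];

  format_constant(line, sizeof line, "SIZEOF_long", sizeof(long));
  puts(line);
  format_constant(line, sizeof line, "SIZEOF_pointer", sizeof(void *));
  puts(line);
}
```

On targets where the two multilibs use different data models, compiling this once with -m32 and once with -m64 and calling print_sizes yields different values, so each runtime directory needs its own generated copy.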


The attached patch moves the generation and use of the xoscons tool to  
the library makefile (as suggested by Thomas) and adjusts the  
libada/Makefile dependency to point to this tool.  The remaining  
dependencies should (AFAICT) be handled by the gnatlib target - which  
thence depends on the required objects.


I don't have that many targets to test - and would very much welcome  
any more-Ada-build-system-aware  eyes cast over this.


This DTRT on i686-darwin9 (no unexpected fails at m64 when it's  
applied).


OK for trunk ?
what about 4.6 - given that this is a wrong code scenario?

cheers
Iain


ada:

* gcc-interface/Makefile.in (stamp-gnatlib-$(RTSDIR)): Don't
link s-oscons.ads.
(OSCONS_CPP, OSCONS_EXTRACT): New.
(./bldtools/oscons/xoscons): New Target.
($(RTSDIR)/s-oscons.ads): New Target.
(gnatlib): Depend on  $(RTSDIR)/s-oscons.ads.
* gcc-interface/Make-lang.in (ada/s-oscons.ads): Remove as dependency.
* Make-generated.in: Remove machinery to generate xoscons and
ada/s-oscons.ads.

libada:

Makefile.in: Change dependency on oscons to depend on the generator
tool.

Index: gcc/ada/gcc-interface/Makefile.in
===
--- gcc/ada/gcc-interface/Makefile.in   (revision 180619)
+++ gcc/ada/gcc-interface/Makefile.in   (working copy)
@@ -2498,21 +2498,50 @@ install-gnatlib: ../stamp-gnatlib-$(RTSDIR)
$(RTSDIR)/$(word 1,$(subst <, ,$(PAIR)));)
 # Copy tsystem.h
$(CP) $(srcdir)/tsystem.h $(RTSDIR)
-# Copy generated target dependent sources
-   $(RM) $(RTSDIR)/s-oscons.ads
-   (cd $(RTSDIR); $(LN_S) ../s-oscons.ads s-oscons.ads)
$(RM) ../stamp-gnatlib-$(RTSDIR)
touch ../stamp-gnatlib1-$(RTSDIR)
 
 # GNULLI End #
 
+ifeq ($(strip $(filter-out alpha64 ia64 dec hp vms% openvms% alphavms%,$(subst -, ,$(host)))),)
+OSCONS_CPP=../../$(DECC) -E /comment=as_is -DNATIVE \
+ -DTARGET='""$(target)""' $(fsrcpfx)ada/s-oscons-tmplt.c
+
+OSCONS_EXTRACT=../../$(DECC) -DNATIVE \
+ -DTARGET='""$(target)""' $(fsrcpfx)ada/s-oscons-tmplt.c ; \
+  ld -o s-oscons-tmplt.exe s-oscons-tmplt.obj; \
+  ./s-oscons-tmplt.exe > s-oscons-tmplt.s
+
+else
+# GCC_FOR_TARGET has paths relative to the gcc directory, so we need to adjust
+# for running it from $(RTSDIR)
+OSCONS_CC=`echo "$(GCC_FOR_TARGET)" \
+  | sed -e 's^\./xgcc^../../xgcc^' -e 's^-B./^-B../../^'`
+OSCONS_CPP=$(OSCONS_CC) $(GNATLIBCFLAGS) -E -C \
+  -DTARGET=\"$(target)\" $(fsrcpfx)ada/s-oscons-tmplt.c > s-oscons-tmplt.i
+OSCONS_EXTRACT=$(OSCONS_CC) -S s-oscons-tmplt.i
+endif
+
+./bldtools/oscons/xoscons: xoscons.adb xutil.ads xutil.adb
+   -$(MKDIR) ./bldtools/oscons
+   $(RM) $(addprefix ./bldtools/oscons/,$(notdir $^))
+   $(CP) $^ ./bldtools/oscons
+   (cd ./bldtools/oscons ; gnatmake -q xoscons)
+
+$(RTSDIR)/s-oscons.ads: ../stamp-gnatlib1-$(RTSDIR) s-oscons-tmplt.c gsocket.h ./bldtools/oscons/xoscons
+   $(RM) $(RTSDIR)/s-oscons-tmplt.i $(RTSDIR)/s-oscons-tmplt.s
+   (cd $(RTSDIR) ; \
+   $(OSCONS_CPP) ; \
+   $(OSCONS_EXTRACT) ; \
+   ../bldtools/oscons/xoscons)
+
 # Don't use semicolon separated shell commands that involve list expansions.
 # The semicolon triggers a call to DCL on VMS and DCL can't handle command
 # line lengths in excess of 256 characters.
 # Example: cd $(RTSDIR); ar rc libfoo.a $(LONG_LIST_OF_OBJS)
 # is guaranteed to overflow the buffer.
 
-gnatlib: ../stamp-gnatlib1-$(RTSDIR) ../stamp-gnatlib2-$(RTSDIR)
+gnatlib: ../stamp-gnatlib1-$(RTSDIR) ../stamp-gnatlib2-$(RTSDIR) $(RTSDIR)/s-oscons.ads
$(MAKE) -C $(RTSDIR) \
CC="`echo \"$(GCC_FOR_TARGET)\" \
| sed -e 's,\./xgcc,../../xgcc,' -e 's,-B\./,-B../../,'`" \
Index: gcc/ada/gcc-interface/Make-lang.in
===
--- gcc/ada/gcc-interface/Make-lang.in  (revision 180619)
+++ gcc/ada/gcc-interface/Make-lang.in  (working copy)
@@ -568,7 +568,7 @@ canadian-gnattools: force
$(MAKE) -C ada $(ADA_TOOLS_FLAGS_TO_PASS) gnattools1-re
$(MAKE) -C ada $(ADA_TOOLS_FLAGS_TO_PASS) gnattools2
 
-gnatlib gnatlib-sjlj gnatlib-zcx gnatlib-shared: ada/s-oscons.ads force
+g

Re: [C++ Patch] PR 50864

2011-10-28 Thread Jason Merrill

OK.

Jason


[PATCH, i386]: Remove lshift_insn and lshift code attributes

2011-10-28 Thread Uros Bizjak
Hello!

We can extend existing code attributes.  Also, the patch includes some
stylistic changes in XOP shift patterns.

No functional changes.

2011-10-28  Uros Bizjak  

* config/i386/i386.md (shift_insn): Rename code attribute from
shiftrt_insn.  Also handle ashift RTX.
(shift): Rename code attribute from shiftrt.  Also handle ashift RTX.
(*): Rename from *. Update asm templates.
* config/i386/sse.md (any_lshift): Rename code iterator from lshift.
(lshift_insn): Remove code attribute.
(lshift): Remove code attribute.
(vlshr3): Use lshiftrt RTX.
(vashr3): Use ashiftrt RTX.
(vashl3): Use ashift RTX.
(avx2_v): Rename from avx2_v.  Use
any_lshift code iterator.  Update asm template.
* config/i386/i386.c (bdesc_args) <__builtin_ia32_psll>: Update.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
{,-m32}  and committed to mainline SVN.

Uros.
Index: i386.md
===
--- i386.md (revision 180619)
+++ i386.md (working copy)
@@ -776,10 +776,11 @@
 (define_code_iterator any_shiftrt [lshiftrt ashiftrt])
 
 ;; Base name for define_insn
-(define_code_attr shiftrt_insn [(lshiftrt "lshr") (ashiftrt "ashr")])
+(define_code_attr shift_insn
+  [(ashift "ashl") (lshiftrt "lshr") (ashiftrt "ashr")])
 
 ;; Base name for insn mnemonic.
-(define_code_attr shiftrt [(lshiftrt "shr") (ashiftrt "sar")])
+(define_code_attr shift [(ashift "sll") (lshiftrt "shr") (ashiftrt "sar")])
 
 ;; Mapping of rotate operators
 (define_code_iterator any_rotate [rotate rotatert])
@@ -9579,7 +9580,7 @@
 
 ;; See comment above `ashl3' about how this works.
 
-(define_expand "3"
+(define_expand "3"
   [(set (match_operand:SDWIM 0 "" "")
(any_shiftrt:SDWIM (match_operand:SDWIM 1 "" "")
   (match_operand:QI 2 "nonmemory_operand" "")))]
@@ -9587,7 +9588,7 @@
   "ix86_expand_binary_operator (, mode, operands); DONE;")
 
 ;; Avoid useless masking of count operand.
-(define_insn_and_split "*3_mask"
+(define_insn_and_split "*3_mask"
   [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm")
(any_shiftrt:SWI48
  (match_operand:SWI48 1 "nonimmediate_operand" "0")
@@ -9613,7 +9614,7 @@
   [(set_attr "type" "ishift")
(set_attr "mode" "")])
 
-(define_insn_and_split "*3_doubleword"
+(define_insn_and_split "*3_doubleword"
   [(set (match_operand:DWI 0 "register_operand" "=r")
(any_shiftrt:DWI (match_operand:DWI 1 "register_operand" "0")
 (match_operand:QI 2 "nonmemory_operand" "c")))
@@ -9622,7 +9623,7 @@
   "#"
   "(optimize && flag_peephole2) ? epilogue_completed : reload_completed"
   [(const_int 0)]
-  "ix86_split_ (operands, NULL_RTX, mode); DONE;"
+  "ix86_split_ (operands, NULL_RTX, mode); DONE;"
   [(set_attr "type" "multi")])
 
 ;; By default we don't ask for a scratch register, because when DWImode
@@ -9639,7 +9640,7 @@
(match_dup 3)]
   "TARGET_CMOVE"
   [(const_int 0)]
-  "ix86_split_ (operands, operands[3], mode); DONE;")
+  "ix86_split_ (operands, operands[3], mode); DONE;")
 
 (define_insn "x86_64_shrd"
   [(set (match_operand:DI 0 "nonimmediate_operand" "+r*m")
@@ -9755,16 +9756,16 @@
   DONE;
 })
 
-(define_insn "*bmi2_3_1"
+(define_insn "*bmi2_3_1"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
(any_shiftrt:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "rm")
   (match_operand:SWI48 2 "register_operand" "r")))]
   "TARGET_BMI2"
-  "x\t{%2, %1, %0|%0, %1, %2}"
+  "x\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "ishiftx")
(set_attr "mode" "")])
 
-(define_insn "*3_1"
+(define_insn "*3_1"
   [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
(any_shiftrt:SWI48
  (match_operand:SWI48 1 "nonimmediate_operand" "0,rm")
@@ -9780,9 +9781,9 @@
 default:
   if (operands[2] == const1_rtx
  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
-   return "{}\t%0";
+   return "{}\t%0";
   else
-   return "{}\t{%2, %0|%0, %2}";
+   return "{}\t{%2, %0|%0, %2}";
 }
 }
   [(set_attr "isa" "*,bmi2")
@@ -9807,17 +9808,17 @@
(any_shiftrt:SWI48 (match_dup 1) (match_dup 2)))]
   "operands[2] = gen_lowpart (mode, operands[2]);")
 
-(define_insn "*bmi2_si3_1_zext"
+(define_insn "*bmi2_si3_1_zext"
   [(set (match_operand:DI 0 "register_operand" "=r")
(zero_extend:DI
  (any_shiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "rm")
  (match_operand:SI 2 "register_operand" "r"]
   "TARGET_64BIT && TARGET_BMI2"
-  "x\t{%2, %1, %k0|%k0, %1, %2}"
+  "x\t{%2, %1, %k0|%k0, %1, %2}"
   [(set_attr "type" "ishiftx")
(set_attr "mode" "SI")])
 
-(define_insn "*si3_1_zext"
+(define_insn "*si3_1_zext"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
(zero_extend:DI
  (any_shiftrt:SI (match_operand:SI 1 "n

Re: scalar vector shift expansion problem on 64-bit

2011-10-28 Thread Jakub Jelinek
On Fri, Oct 28, 2011 at 06:50:49PM +0200, Jakub Jelinek wrote:
> On Fri, Oct 28, 2011 at 09:07:31AM -0700, Richard Henderson wrote:
> > I think this is the same problem as Jakub is attacking here:
> > 
> >   http://gcc.gnu.org/ml/gcc-patches/2011-10/msg02503.html
> 
> It has been checked in already.  But my patch only deals
> with the vector << vector case, vector << scalar (including
> vector << scalar implemented using vector << vector) supposedly still
> needs fold_const somewhere if the type sizes disagree.

A wild guess, though untested, because I don't have a reproducer:

2011-10-28  Jakub Jelinek  

* tree-vect-stmts.c (vectorizable_shift): If op1 is vect_external_def
and has different type from op0, cast it to op0's type before the
loop first.

--- gcc/tree-vect-stmts.c.jj2011-10-28 16:21:06.0 +0200
+++ gcc/tree-vect-stmts.c   2011-10-28 20:19:27.0 +0200
@@ -2483,6 +2483,13 @@ vectorizable_shift (gimple stmt, gimple_
  dealing with vectors of short/char.  */
   if (dt[1] == vect_constant_def)
 op1 = fold_convert (TREE_TYPE (vectype), op1);
+ else if (!useless_type_conversion_p (TREE_TYPE (vectype),
+  TREE_TYPE (op1)))
+   {
+ op1 = fold_convert (TREE_TYPE (vectype), op1);
+ op1 = vect_init_vector (stmt, op1, TREE_TYPE (vectype),
+ NULL);
+   }
 }
 }
 }


Jakub


Re: [PATCH, rs6000] Preserve link stack for 476 cpus

2011-10-28 Thread Peter Bergner
On Fri, 2011-10-28 at 09:44 -0700, Richard Henderson wrote:
> Not quite.  You can't allow the user to set TARGET_LINK_STACK either,
> for 64-bit.  Because it won't work without further fixups.  More like
> 
>   if (TARGET_POWERPC64)
> SET_TARGET_LINK_STACK (0);
>   if (TARGET_LINK_STACK == -1)
> SET_TARGET_LINK_STACK (rs6000_cpu == PROCESSOR_PPC476 && flag_pic);

Ah, I forgot about if the user explicitly uses -mpreserve-ppc476-link-stack.
Ok, so how about if we also spit out a warning that we're implicitly disabling
the link stack code rather than doing it silently?  Like so:

  if (TARGET_POWERPC64)
{
  if (TARGET_LINK_STACK > 0)
warning (0, "-m64 disables -mpreserve-ppc476-link-stack");
  SET_TARGET_LINK_STACK (0);
}
  else if (TARGET_LINK_STACK == -1)
SET_TARGET_LINK_STACK (rs6000_cpu == PROCESSOR_PPC476 && flag_pic);


Peter





Re: scalar vector shift expansion problem on 64-bit

2011-10-28 Thread Richard Henderson
On 10/28/2011 11:41 AM, Jakub Jelinek wrote:
> On Fri, Oct 28, 2011 at 06:50:49PM +0200, Jakub Jelinek wrote:
>> On Fri, Oct 28, 2011 at 09:07:31AM -0700, Richard Henderson wrote:
>>> I think this is the same problem as Jakub is attacking here:
>>>
>>>   http://gcc.gnu.org/ml/gcc-patches/2011-10/msg02503.html
>>
>> It has been checked in already.  But my patch only deals
>> with the vector << vector case, vector << scalar (including
>> vector << scalar implemented using vector << vector) supposedly still
>> needs fold_const somewhere if the type sizes disagree.
> 
> A wild guess, though untested, because I don't have a reproducer:
> 
> 2011-10-28  Jakub Jelinek  
> 
>   * tree-vect-stmts.c (vectorizable_shift): If op1 is vect_external_def
>   and has different type from op0, cast it to op0's type before the
>   loop first.

I suspect the problem is in optabs.c, not here.

I'll try to look at it later today.


r~


[PATCH] Pattern recognize shifts with different rhs1/rhs2 types

2011-10-28 Thread Jakub Jelinek
Hi!

This patch implements what I've talked about, with this we can now
with -mavx2 as well as -mxop vectorize long long/unsigned long long
shifts by int or long long/unsigned long long shifts by long long
(where the FE casts it to int first).  Already covered by the *vshift-*
testcases I've committed recently (eyeballed for -mxop plus link tested,
for -mavx2 tested on sde).
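
For illustration only, the kind of loop in question and the rewritten form with both shift operands of the same type can be modelled in plain C as below. This is a scalar sketch with hypothetical function names, not the vectorizer code; it assumes every shift count is in the range 0..63.

```c
#include <assert.h>
#include <stddef.h>

/* Shift counts are narrower (int) than the shifted values.  */
void shift_mixed(unsigned long long *res, const unsigned long long *b,
                 const int *a, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      assert(a[i] >= 0 && a[i] < 64);
      res[i] = b[i] << a[i];
    }
}

/* Same loop after widening the count to the type of the shifted
   value, so both operands of the shift have the same type.  */
void shift_same_type(unsigned long long *res, const unsigned long long *b,
                     const int *a, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      assert(a[i] >= 0 && a[i] < 64);
      unsigned long long count = (unsigned long long) a[i];
      res[i] = b[i] << count;
    }
}
```

The two loops produce identical results for in-range counts; the second form is the shape in which both operands can live in vectors of the same element type.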

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-10-28  Jakub Jelinek  

* tree-vectorizer.h (NUM_PATTERNS): Bump to 9.
* tree-vect-patterns.c (vect_recog_vector_vector_shift_pattern): New
function.
(vect_vect_recog_func_ptrs): Add it.

--- gcc/tree-vectorizer.h.jj2011-10-27 08:42:51.0 +0200
+++ gcc/tree-vectorizer.h   2011-10-28 16:26:30.0 +0200
@@ -902,7 +902,7 @@ extern void vect_slp_transform_bb (basic
Additional pattern recognition functions can (and will) be added
in the future.  */
 typedef gimple (* vect_recog_func_ptr) (VEC (gimple, heap) **, tree *, tree *);
-#define NUM_PATTERNS 8
+#define NUM_PATTERNS 9
 void vect_pattern_recog (loop_vec_info);
 
 /* In tree-vectorizer.c.  */
--- gcc/tree-vect-patterns.c.jj 2011-10-26 14:19:11.0 +0200
+++ gcc/tree-vect-patterns.c2011-10-28 17:41:26.0 +0200
@@ -51,6 +51,8 @@ static gimple vect_recog_over_widening_p
  tree *);
 static gimple vect_recog_widen_shift_pattern (VEC (gimple, heap) **,
tree *, tree *);
+static gimple vect_recog_vector_vector_shift_pattern (VEC (gimple, heap) **,
+ tree *, tree *);
 static gimple vect_recog_mixed_size_cond_pattern (VEC (gimple, heap) **,
  tree *, tree *);
 static gimple vect_recog_bool_pattern (VEC (gimple, heap) **, tree *, tree *);
@@ -61,6 +63,7 @@ static vect_recog_func_ptr vect_vect_rec
vect_recog_pow_pattern,
vect_recog_over_widening_pattern,
vect_recog_widen_shift_pattern,
+   vect_recog_vector_vector_shift_pattern,
vect_recog_mixed_size_cond_pattern,
vect_recog_bool_pattern};
 
@@ -1439,6 +1442,133 @@ vect_recog_widen_shift_pattern (VEC (gim
   return pattern_stmt;
 }
 
+/* Detect a vector by vector shift pattern that wouldn't be otherwise
+   vectorized:
+
+   type a_t;
+   TYPE b_T, res_T;
+
+   S1 a_t = ;
+   S2 b_T = ;
+   S3 res_T = b_T op a_t;
+
+  where type 'TYPE' is a type with different size than 'type',
+  and op is <<, >> or rotate.
+
+  Also detect cases:
+
+   type a_t;
+   TYPE b_T, c_T, res_T;
+
+   S0 c_T = ;
+   S1 a_t = (type) c_T;
+   S2 b_T = ;
+   S3 res_T = b_T op a_t;
+
+  Input/Output:
+
+  * STMTS: Contains a stmt from which the pattern search begins,
+i.e. the shift/rotate stmt.  The original stmt (S3) is replaced
+with a shift/rotate which has same type on both operands, in the
+second case just b_T op c_T, in the first case with added cast
+from a_t to c_T in STMT_VINFO_PATTERN_DEF_STMT.
+
+  Output:
+
+  * TYPE_IN: The type of the input arguments to the pattern.
+
+  * TYPE_OUT: The type of the output of this pattern.
+
+  * Return value: A new stmt that will be used to replace the shift/rotate
+S3 stmt.  */
+
+static gimple
+vect_recog_vector_vector_shift_pattern (VEC (gimple, heap) **stmts,
+   tree *type_in, tree *type_out)
+{
+  gimple last_stmt = VEC_pop (gimple, *stmts);
+  tree oprnd0, oprnd1, lhs, var;
+  gimple pattern_stmt, def_stmt;
+  enum tree_code rhs_code;
+  stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
+  enum vect_def_type dt;
+  tree def;
+
+  if (!is_gimple_assign (last_stmt))
+return NULL;
+
+  rhs_code = gimple_assign_rhs_code (last_stmt);
+  switch (rhs_code)
+{
+case LSHIFT_EXPR:
+case RSHIFT_EXPR:
+case LROTATE_EXPR:
+case RROTATE_EXPR:
+  break;
+default:
+  return NULL;
+}
+
+  if (STMT_VINFO_IN_PATTERN_P (stmt_vinfo))
+return NULL;
+
+  lhs = gimple_assign_lhs (last_stmt);
+  oprnd0 = gimple_assign_rhs1 (last_stmt);
+  oprnd1 = gimple_assign_rhs2 (last_stmt);
+  if (TREE_CODE (oprnd0) != SSA_NAME
+  || TREE_CODE (oprnd1) != SSA_NAME
+  || TYPE_MODE (TREE_TYPE (oprnd0)) == TYPE_MODE (TREE_TYPE (oprnd1))
+  || TYPE_PRECISION (TREE_TYPE (oprnd1))
+!= GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (oprnd1)))
+  || TYPE_PRECISION (TREE_TYPE (lhs))
+!= TYPE_PRECISION (TREE_TYPE (oprnd0)))
+return NULL;
+
+  if (!vect_is_simple_use (oprnd1, loop_vinfo, NULL, &def_stmt, &def, &dt))
+return NULL;
+
+  if (dt != vect_internal_def)
+return NULL;
+
+  *type_in = get_vectype_for_scalar_type (TREE_TYPE (oprnd0));
+  *type_out = *type_in;
+  if (*type_in == NULL_TREE)
+return NULL;
+
+  def = NULL_TREE;
+  if (gimple_assign_cast_p (def_stmt))
+{

Re: [Patch Darwin/PPC] implement out-of-line FPR/GPR saves/restores.

2011-10-28 Thread Mike Stump
On Oct 28, 2011, at 3:40 AM, Iain Sandoe wrote:
> To test what you suggested I built some code that dropped down a few stack 
> levels (with saves of FPR/GPR) and then either aborts or spins on a sleep.

Uhm, that's not enough.  For async, you need to spawn threads that do the
interesting stuff and then have the main code abort.


Re: PING 2 : [Patch Darwin/PR49992 2/2] remove ranlib special-casing from the darwin port.

2011-10-28 Thread Mike Stump
On Oct 28, 2011, at 8:41 AM, Iain Sandoe wrote:
> This is unreviewed for 2 weeks.

Odd, usually the Ada people are fairly responsive.  If they want me to weigh 
in, I approve of the concept behind the work.


Re: [PATCH, rs6000] Preserve link stack for 476 cpus

2011-10-28 Thread Richard Henderson
On 10/28/2011 11:35 AM, Peter Bergner wrote:
> On Fri, 2011-10-28 at 09:44 -0700, Richard Henderson wrote:
>> Not quite.  You can't allow the user to set TARGET_LINK_STACK either,
>> for 64-bit.  Because it won't work without further fixups.  More like
>>
>>   if (TARGET_POWERPC64)
>>     SET_TARGET_LINK_STACK (0);
>>   if (TARGET_LINK_STACK == -1)
>>     SET_TARGET_LINK_STACK (rs6000_cpu == PROCESSOR_PPC476 && flag_pic);
> 
> Ah, I forgot about if the user explicitly uses -mpreserve-ppc476-link-stack.
> Ok, so how about if we also spit out a warning that we're implicitly disabling
> the link stack code rather than doing it silently?  Like so:
> 
>   if (TARGET_POWERPC64)
>     {
>       if (TARGET_LINK_STACK > 0)
>         warning (0, "-m64 disables -mpreserve-ppc476-link-stack");
>       SET_TARGET_LINK_STACK (0);
>     }
>   else if (TARGET_LINK_STACK == -1)
>     SET_TARGET_LINK_STACK (rs6000_cpu == PROCESSOR_PPC476 && flag_pic);

Fine by me.  Final rs6000 approval is dje's bivouac.


r~
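
For readers following the thread, the tri-state option handling agreed on above can be modelled outside of GCC.  This is only a sketch under stated assumptions: struct target_config, resolve_link_stack and the OPT_* names are hypothetical stand-ins for TARGET_POWERPC64, rs6000_cpu, flag_pic and TARGET_LINK_STACK, and the real code lives in the rs6000 option override logic.

```c
#include <stdbool.h>
#include <stdio.h>

/* Tri-state option value: -1 means "not set on the command line".  */
enum { OPT_UNSET = -1, OPT_OFF = 0, OPT_ON = 1 };

/* Hypothetical stand-in for the target state; these fields are
   illustrative and are not the real rs6000 variables.  */
struct target_config
{
  bool is_64bit;   /* models TARGET_POWERPC64 */
  bool is_ppc476;  /* models rs6000_cpu == PROCESSOR_PPC476 */
  bool pic;        /* models flag_pic */
  int link_stack;  /* models TARGET_LINK_STACK: -1, 0 or 1 */
};

/* Resolve the option in place.  Returns true if a warning was issued
   because an explicit request had to be overridden.  */
static bool
resolve_link_stack (struct target_config *cfg)
{
  bool warned = false;

  if (cfg->is_64bit)
    {
      /* Only warn when the user explicitly asked for the feature.  */
      if (cfg->link_stack > 0)
        {
          fprintf (stderr,
                   "warning: -m64 disables -mpreserve-ppc476-link-stack\n");
          warned = true;
        }
      cfg->link_stack = OPT_OFF;
    }
  else if (cfg->link_stack == OPT_UNSET)
    /* No explicit setting: pick the default from the CPU and PIC mode.  */
    cfg->link_stack = (cfg->is_ppc476 && cfg->pic) ? OPT_ON : OPT_OFF;

  return warned;
}
```

The point of the -1 sentinel is that an explicit "off" from the user is left alone on 32-bit, while an explicit "on" for 64-bit is overridden loudly rather than silently.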


Re: [PATCH, i386]: Remove lshift_insn and lshift code attributes

2011-10-28 Thread Uros Bizjak
On Fri, Oct 28, 2011 at 8:21 PM, Uros Bizjak  wrote:
> Hello!
>
> We can extend existing code attributes.  Also, the patch includes some
> stylistic changes in XOP shift patterns.
>
> No functional changes.

Eh, that was the old patch; attached is the updated patch with an updated ChangeLog:

2011-10-28  Uros Bizjak  

* config/i386/i386.md (shift_insn): Rename code attribute from
shiftrt_insn.  Also handle ashift RTX.
(shift): Rename code attribute from shiftrt.  Also handle ashift RTX.
(vshift): New code attribute.
(*): Rename from *. Update asm templates.
(any_lshift): Move and rename code iterator from ...
* config/i386/sse.md (lshift): ... here.
(lshift_insn): Remove code attribute.
(lshift): Remove code attribute.
(vlshr3): Use lshiftrt RTX.
(vashr3, ashrv16qi3, ashrv2di3): Use ashiftrt RTX.
(vashl3, ashlv16qi3): Use ashift RTX.
(avx2_v): Rename from avx2_v.  Use
any_lshift code iterator.  Update asm template.
(3): Macroize insn from lshr3 and ashl3
using any_lshift code iterator.
* config/i386/mmx.md (mmx_3): Macroize insn from
mmx_lshr3 and mmx_ashl3 using any_lshift code iterator.
* config/i386/i386.c (bdesc_args) <__builtin_ia32_psll>: Update.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
{,-m32}  and committed to mainline SVN.

Uros.
Index: i386.md
===
--- i386.md (revision 180622)
+++ i386.md (working copy)
@@ -772,6 +772,9 @@
 ;; Base name for insn mnemonic.
 (define_code_attr logic [(and "and") (ior "or") (xor "xor")])
 
+;; Mapping of logic-shift operators
+(define_code_iterator any_lshift [ashift lshiftrt])
+
 ;; Mapping of shift-right operators
 (define_code_iterator any_shiftrt [lshiftrt ashiftrt])
 
@@ -781,6 +784,7 @@
 
 ;; Base name for insn mnemonic.
 (define_code_attr shift [(ashift "sll") (lshiftrt "shr") (ashiftrt "sar")])
+(define_code_attr vshift [(ashift "sll") (lshiftrt "srl") (ashiftrt "sra")])
 
 ;; Mapping of rotate operators
 (define_code_iterator any_rotate [rotate rotatert])
Index: mmx.md
===
--- mmx.md  (revision 180621)
+++ mmx.md  (working copy)
@@ -1037,13 +1037,13 @@
(const_string "0")))
(set_attr "mode" "DI")])
 
-(define_insn "mmx_lshr3"
+(define_insn "mmx_3"
   [(set (match_operand:MMXMODE248 0 "register_operand" "=y")
-(lshiftrt:MMXMODE248
+(any_lshift:MMXMODE248
  (match_operand:MMXMODE248 1 "register_operand" "0")
  (match_operand:SI 2 "nonmemory_operand" "yN")))]
   "TARGET_MMX"
-  "psrl\t{%2, %0|%0, %2}"
+  "p\t{%2, %0|%0, %2}"
   [(set_attr "type" "mmxshft")
(set (attr "length_immediate")
  (if_then_else (match_operand 2 "const_int_operand" "")
@@ -1051,20 +1051,6 @@
(const_string "0")))
(set_attr "mode" "DI")])
 
-(define_insn "mmx_ashl3"
-  [(set (match_operand:MMXMODE248 0 "register_operand" "=y")
-(ashift:MMXMODE248
- (match_operand:MMXMODE248 1 "register_operand" "0")
- (match_operand:SI 2 "nonmemory_operand" "yN")))]
-  "TARGET_MMX"
-  "psll\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxshft")
-   (set (attr "length_immediate")
- (if_then_else (match_operand 2 "const_int_operand" "")
-   (const_string "1")
-   (const_string "0")))
-   (set_attr "mode" "DI")])
-
 ;
 ;;
 ;; Parallel integral comparisons
Index: sse.md
===
--- sse.md  (revision 180622)
+++ sse.md  (working copy)
@@ -167,9 +167,6 @@
(V4SI "vec") (V8SI "avx2")
(V2DI "vec") (V4DI "avx2")])
 
-;; Mapping of logic-shift operators
-(define_code_iterator any_lshift [ashift lshiftrt])
-
 (define_mode_attr ssedoublemode
   [(V16HI "V16SI") (V8HI "V8SI")])
 
@@ -5826,15 +5823,15 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "")])
 
-(define_insn "lshr3"
+(define_insn "3"
   [(set (match_operand:VI248_AVX2 0 "register_operand" "=x,x")
-   (lshiftrt:VI248_AVX2
+   (any_lshift:VI248_AVX2
  (match_operand:VI248_AVX2 1 "register_operand" "0,x")
  (match_operand:SI 2 "nonmemory_operand" "xN,xN")))]
   "TARGET_SSE2"
   "@
-   psrl\t{%2, %0|%0, %2}
-   vpsrl\t{%2, %1, %0|%0, %1, %2}"
+   p\t{%2, %0|%0, %2}
+   v\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
(set_attr "type" "sseishft")
(set (attr "length_immediate")
@@ -5845,25 +5842,6 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "")])
 
-(define_insn "ashl3"
-  [(set (match_operand:VI248_AVX2 0 "register_operand" "=x,x")
-   (ashift:VI248_AVX2
- (match_operand:VI248_AVX2 1 "register_operand" "0,x")
- (match_operand:SI 2 "nonmemory_operand" "xN,xN")))]
-  "TARGET_SSE2"
-  "@
-   psll\t{%2, %0|%0, %2}
-   vpsll\t{%2, %

[C++ Patch] Avoid uninitialized warning in pt.c

2011-10-28 Thread Paolo Carlini

Hi,

I think we have to do this, the warning I'm seeing doesn't seem bogus: 
in principle comp_template_args_with_info may leave bad_old_arg and 
bad_new_arg uninitialized.


Ok?

Thanks,
Paolo.
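
The shape of the problem can be sketched in plain C.  The helper below is hypothetical and only models the situation described above: a comparison routine that reports a mismatching pair through out-parameters, but has a failure path that never writes them.  It is not the real comp_template_args_with_info.

```c
#include <stddef.h>

/* Hypothetical comparison helper.  Returns 1 if the two arrays match.
   On a per-element mismatch it stores the offending pair in *BAD_OLD
   and *BAD_NEW; on a length mismatch it returns 0 without touching
   the out-parameters at all.  */
static int
compare_with_info (const int *old_args, size_t n_old,
                   const int *new_args, size_t n_new,
                   const int **bad_old, const int **bad_new)
{
  if (n_old != n_new)
    return 0;  /* Failure path that never writes the out-parameters.  */

  for (size_t i = 0; i < n_old; i++)
    if (old_args[i] != new_args[i])
      {
        *bad_old = &old_args[i];
        *bad_new = &new_args[i];
        return 0;
      }
  return 1;
}

/* Returns the index of the first mismatching element, -1 if the arrays
   match, or -2 if the comparison failed without naming an element.  */
static long
first_mismatch (const int *old_args, size_t n_old,
                const int *new_args, size_t n_new)
{
  /* Initialized up front, in the spirit of the NULL_TREE
     initialization in the patch.  */
  const int *bad_old = NULL, *bad_new = NULL;

  if (compare_with_info (old_args, n_old, new_args, n_new,
                         &bad_old, &bad_new))
    return -1;
  if (bad_old == NULL || bad_new == NULL)
    return -2;
  return (long) (bad_old - old_args);
}
```

Without the initializers, the -2 case would read indeterminate pointers, which is the kind of path the warning is pointing at.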


2011-10-28  Paolo Carlini  

* pt.c (unify_pack_expansion): Initialize bad_old_arg and bad_new_arg.
Index: pt.c
===
--- pt.c(revision 180623)
+++ pt.c(working copy)
@@ -15715,7 +15715,7 @@ unify_pack_expansion (tree tparms, tree targs, tre
 }
   else
{
- tree bad_old_arg, bad_new_arg;
+ tree bad_old_arg = NULL_TREE, bad_new_arg = NULL_TREE;
  tree old_args = ARGUMENT_PACK_ARGS (old_pack);
 
  if (!comp_template_args_with_info (old_args, new_args,


Re: [C++ Patch] Avoid uninitialized warning in pt.c

2011-10-28 Thread Jason Merrill

OK.

Jason


Re: [PATCH, rs6000] Preserve link stack for 476 cpus

2011-10-28 Thread David Edelsohn
On Fri, Oct 28, 2011 at 12:36 PM, Peter Bergner  wrote:

> So David, do we even want to bother trying to support this on -m64
> given the only cpu that needs this is a 32-bit only cpu?  If so, I
> can try and work with Alan to figure out how we can merge the
> function descriptors for the thunk routines when using -m64.

I barely want to bother with this ;-).  So, no, I don't want to bother
with -m64 support.

- David


Re: Require canonical type comparison for typedefs again.

2011-10-28 Thread H.J. Lu
On Fri, Dec 17, 2010 at 12:11 PM, H.J. Lu  wrote:
> On Wed, Oct 27, 2010 at 7:24 AM, Dodji Seketeli  wrote:
>> Hello,
>>
>> So I forgot to remove the wrong "optimization" on the parms of
>> template template parameters. In the patch at the end of this message
>> this hunks fixes that:
>>
> ...
>>
>> Below is the fully updated patch. Fully bootstrapped and tested on
>> x86_64-unknown-linux-gnu.
>>
>> --
>>        Dodji
>>
>> commit c21c5f024f47a9a55facbd70c0d1f36956cff7c4
>> Author: Dodji Seketeli 
>> Date:   Mon Sep 13 12:12:21 2010 +0200
>>
>>    Restore canonical type comparison for dependent typedefs
>>
>>    gcc/cp/ChangeLog:
>>        PR c++/45606
>>        * cp-tree.h (TEMPLATE_TYPE_PARM_SIBLING_PARMS): Remove.
>>        (struct template_parm_index_s): New field.
>>        (TEMPLATE_PARM_NUM_SIBLINGS): New accessor.
>>        (process_template_parm): Extend the API to accept the number of
>>        template parms in argument.
>>        (cp_set_underlying_type): Remove this.
>>        * class.c (build_self_reference): Require canonical type equality
>>        back on the self reference of class.
>>        * decl2.c (grokfield): Require canonical type equality back on
>>        typedef class fields.
>>        * name-lookup.c (pushdecl_maybe_friend): Require canonical type
>>        equality back on typedefs.
>>        * parser.c (cp_parser_template_parameter_list): Do not require
>>        canonical type equality on dependent types created during
>>        template parameters parsing.
>>        * pt.c (fixup_template_type_parm_type, fixup_template_parm_index)
>>        (fixup_template_parm, fixup_template_parms): New private
>>        functions.
>>        (current_template_args): Declare this.
>>        (process_template_parm): Pass the total number of template parms
>>        to canonical_type_parameter.
>>        (build_template_parm_index): Add a new argument to carry the total
>>        number of template parms.
>>        (reduce_template_parm_level, process_template_parm, make_auto): 
>> Adjust.
>>        (current_template_args): Fix this for template template
>>        parameters.
>>        (tsubst_template_parm): Split out of ...
>>        (tsubst_template_parms): ... this.
>>        (reduce_template_parm_level): Don't lose
>>        TEMPLATE_PARM_NUM_SIBLINGS when cloning a TEMPLATE_PARM_INDEX.
>>        (template_parm_to_arg): Extracted this function from
>>        current_template_args. Make it represent invalid template parms
>>        with an error_mark_node instead of a LIST_TREE containing an
>>        error_mark_node.
>>        (current_template_args): Use template_parm_to_arg.
>>        (dependent_template_arg_p): Consider an invalid template argument
>>        as dependent.
>>        (end_template_parm_list): Do not update template sibling parms
>>        here anymore. Use fixup_template_parms instead.
>>        (process_template_parm): Pass the number of template parms to
>>        canonical_type_parameter.
>>        (make_auto): Require structural equality on auto
>>        TEMPLATE_TYPE_PARM for now.
>>        (unify): Coerce template parameters
>>        using all the arguments deduced so far.
>>        (tsubst): Pass the number of sibling parms to
>>        canonical_type_parameter.
>>        * tree.c (cp_set_underlying_type): Remove.
>>        * typeck.c (get_template_parms_of_dependent_type)
>>        (incompatible_dependent_types_p): Remove.
>>        (structural_comptypes): Do not call incompatible_dependent_types_p
>>        anymore.
>>        (comp_template_parms_position): Re-organized. Take the length of
>>        template parms list in account.
>>
>>    gcc/testsuite/ChangeLog:
>>        PR c++/45606
>>        * g++.dg/template/typedef36.C: New test.
>>        * gcc/testsuite/g++.dg/template/canon-type-9.C: Likewise.
>>        * g++.dg/template/canon-type-10.C: Likewise.
>>        * g++.dg/template/canon-type-11.C: Likewise.
>>        * g++.dg/template/canon-type-12.C: Likewise.
>>        * g++.dg/template/canon-type-13.C: Likewise.
>>
>
>
> This caused:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46394
>

It also caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50870

-- 
H.J.


Re: [patch][google] Allow static const floats unless -pedantic is passed. (issue 5306071)

2011-10-28 Thread Jeffrey Yasskin
Thanks Diego,

Here's a new version of the patch with fixes for your comments. I'll
submit it in a couple hours unless I hear objections.

On Fri, Oct 28, 2011 at 7:57 AM, Diego Novillo  wrote:
> On Thu, Oct 27, 2011 at 09:27,   wrote:
>> Reviewers: Diego Novillo,
>>
>> Message:
>> This patch is intended for the google/gcc-4_6 branch. Tested with make
>> check-c++ on ubuntu x86-64.
>>
>> Should this go to gcc-patches@gcc.gnu.org too, or just the internal
>> list?
>
> As you prefer.  Strictly speaking, yes, in case other C++ maintainers
> have feedback on your patch.  But given that it is a patch that you
> intend to keep in a google release only, then it does not really
> matter all that much.
>
>>
>> Description:
This patch allows us to migrate to C++11 more incrementally, since we can leave
the static const float initializations in place, flip the switch, and then
change them to use constexpr.

We should NOT forward-port this to any gcc-4.7 branches.


gcc/cp/ChangeLog.google-4_6
2011-10-28  Jeffrey Yasskin  

google ref 5514746; backport of r179121

Modified locally to only block static const literals in -pedantic
mode.

2011-09-23  Paolo Carlini  

* decl.c (check_static_variable_definition): Allow in-class
initialization of static data member of non-integral type in
permissive mode.


gcc/testsuite/ChangeLog.google-4_6
2011-10-28  Jeffrey Yasskin  

google ref 5514746; backport of r179121

Modified locally to only block static const literals in -pedantic
mode.

* g++.dg/cpp0x/constexpr-static8_nonpedantic.C: New.

2011-09-23  Paolo Carlini  

* g++.dg/cpp0x/constexpr-static8.C: New.
>>
>> You can review this at http://codereview.appspot.com/5306071/
>>
>> Affected files:
>>  M     gcc/cp/ChangeLog.google-4_6
>>  M     gcc/cp/decl.c
>>  M     gcc/testsuite/ChangeLog.google-4_6
>>  A     gcc/testsuite/g++.dg/cpp0x/constexpr-static8.C
>>  A     gcc/testsuite/g++.dg/cpp0x/constexpr-static8_nonpedantic.C
>>
...
>> Index: gcc/cp/decl.c
>> ===
>> --- gcc/cp/decl.c       (revision 180546)
>> +++ gcc/cp/decl.c       (working copy)
>> @@ -7508,8 +7508,12 @@ check_static_variable_definition (tree decl, tree
>>   else if (cxx_dialect >= cxx0x && !INTEGRAL_OR_ENUMERATION_TYPE_P (type))
>>     {
>>       if (literal_type_p (type))
>> -       error ("%<constexpr%> needed for in-class initialization of static "
>> -              "data member %q#D of non-integral type", decl);
>> +        {
>> +          pedwarn (input_location, OPT_pedantic,
>> +                   "%<constexpr%> needed for in-class initialization of "
>> +                   "static data member %q#D of non-integral type", decl);
>> +          return 0;
>> +        }
>
> Add a 'FIXME google' here?  Describe why this is different than
> upstream.  Helps with merge conflicts.

Done.

> OK with those changes.
>
>
> Diego.
>
Index: gcc/testsuite/g++.dg/cpp0x/constexpr-static8.C
===
--- gcc/testsuite/g++.dg/cpp0x/constexpr-static8.C	(revision 0)
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-static8.C	(revision 0)
@@ -0,0 +1,7 @@
+// PR c++/50258
+// { dg-options "-std=c++0x -pedantic" }
+
+struct Foo {
+  static const double d = 3.14; // { dg-warning "constexpr" }
+};
+const double Foo::d;
Index: gcc/testsuite/g++.dg/cpp0x/constexpr-static8_nonpedantic.C
===
--- gcc/testsuite/g++.dg/cpp0x/constexpr-static8_nonpedantic.C	(revision 0)
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-static8_nonpedantic.C	(revision 0)
@@ -0,0 +1,7 @@
+// PR c++/50258
+// { dg-options "-std=c++0x" }
+
+struct Foo {
+  static const double d = 3.14; // no warning
+};
+const double Foo::d;
Index: gcc/cp/decl.c
===
--- gcc/cp/decl.c	(revision 180546)
+++ gcc/cp/decl.c	(working copy)
@@ -7508,8 +7508,18 @@
   else if (cxx_dialect >= cxx0x && !INTEGRAL_OR_ENUMERATION_TYPE_P (type))
 {
   if (literal_type_p (type))
-	error ("%<constexpr%> needed for in-class initialization of static "
-	   "data member %q#D of non-integral type", decl);
+{
+  /* FIXME google: This local modification allows us to
+ transition from C++98 to C++11 without moving static
+ const floats out of the class during the transition.  It
+ should not be forward-ported to a 4.7 branch, since by
+ then we should be able to just fix the code to use
+ constexpr.  */
+  pedwarn (input_location, OPT_pedantic,
+   "%<constexpr%> needed for in-class initialization of "
+   "static data member %q#D of non-integral type", decl);
+  return 0;
+}
   else
 	error ("in-class initialization of static data member %q#D of "
 	   "non-literal type", decl);


Re: [PATCH i386] PR47698 no CMOV for volatile mem

2011-10-28 Thread Sergey Ostanevich
On Fri, Oct 28, 2011 at 7:25 PM, Sergey Ostanevich  wrote:
> On Fri, Oct 28, 2011 at 4:52 PM, Richard Guenther  wrote:
>> On Fri, 28 Oct 2011, Sergey Ostanevich wrote:
>>
>>> On Fri, Oct 28, 2011 at 12:16 PM, Richard Guenther  
>>> wrote:
>>> > On Thu, 27 Oct 2011, Uros Bizjak wrote:
>>> >
>>> >> Hello!
>>> >>
>>> >> > Here's a patch for PR47698, which is about CMOV should not be
>>> >> > generated for memory address marked as volatile.
>>> >> > Successfully bootstrapped and passed make check on 
>>> >> > x86_64-unknown-linux-gnu.
>>> >>
>>> >>
>>> >>       PR rtl-optimization/47698
>>> >>       * config/i386/i386.c (ix86_expand_int_movcc) prevent CMOV 
>>> >> generation
>>> >>       for volatile mem
>>> >>
>>> >>       PR rtl-optimization/47698
>>> >>       * gcc.target/i386/47698.c: New test
>>> >>
>>> >> Please use punctuation marks and correct capitalization in ChangeLog 
>>> >> entries.
>>> >>
>>> >> OTOH, do we want to fix this per-target, or in the middle-end?
>>> >
>>> > The middle-end pattern documentation does not say operands 2 and 3
>>> > are not evaluated if they do not end up being stored, so a middle-end
>>> > fix is more appropriate.
>>> >
>>> > Richard.
>>> >
>>>
>>> I have two observations:
>>>
>>> - the code for CMOV is under #ifdef in the middle-end, which is
>>> explicitly marked as "have to be removed" (ifcvt.c:1446)
>>> - I have no clear evidence all platforms that support conditional move
>>> have the same semantics that lead to the PR
>>>
>>> I think the best way to address both concerns is to implement code
>>> that relies on a new hook "volatile-safe CMOV" that is false by
>>> default.
>>
>> I suppose it's never safe for all architectures that support
>> memory operands in the source operand.
>>
>> Richard.
>
> ok, at least there should be no big problem of missing optimization
> around volatile memory.
>
> apparently the problem is here:
>
> ifcvt.c:2539 there is a test for side effects of source (which is 'a'
> in this case)
>
> 2539      if (! noce_operand_ok (a) || ! noce_operand_ok (b))
> (gdb) p debug_rtx(a)
> (mem/v/c/i:DI (symbol_ref:DI ("mmio") [flags 0x40]  0x71339140 mmio>) [2 mmio+0 S8 A64])
>
> but inside noce_operand_ok() there is a wrong order of tests:
>
> 2332      if (MEM_P (op))
> 2333        return ! side_effects_p (XEXP (op, 0));
> 2334
> 2335      if (side_effects_p (op))
> 2336        return FALSE;
> 2337
>
> where XEXP strips the memory reference, leaving just the symbol reference,
> which has no volatile attribute
> #0  side_effects_p (x=0x7149c660) at ../../gcc/rtlanal.c:2152
> (gdb) p debug_rtx(x)
> (symbol_ref:DI ("mmio") [flags 0x40] )
>
> Is the following fix OK?
> I'm testing it so far.
>
> Sergos
>
> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
> index 784e2e8..3b05c2a 100644
> --- a/gcc/ifcvt.c
> +++ b/gcc/ifcvt.c
> @@ -2329,12 +2329,12 @@ noce_operand_ok (const_rtx op)
>  {
>   /* We special-case memories, so handle any of them with
>      no address side effects.  */
> -  if (MEM_P (op))
> -    return ! side_effects_p (XEXP (op, 0));
> -
>   if (side_effects_p (op))
>     return FALSE;
>
> +  if (MEM_P (op))
> +    return ! side_effects_p (XEXP (op, 0));
> +
>   return ! may_trap_p (op);
>  }
>
> diff --git a/gcc/testsuite/gcc.target/i386/47698.c
> b/gcc/testsuite/gcc.target/i386/47698.c
> new file mode 100644
> index 000..2c75109
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/47698.c
> @@ -0,0 +1,10 @@
> +/* { dg-options "-Os" } */
> +/* { dg-final { scan-assembler-not "cmov" } } */
> +
> +extern volatile unsigned long mmio;
> +unsigned long foo(int cond)
> +{
> +      if (cond)
> +              return mmio;
> +        return 0;
> +}
>

bootstrapped and passed make check successfully on x86_64-unknown-linux-gnu

Sergos
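
To make the ordering issue concrete, here is a toy model of the two variants of the check.  Everything in it is an assumption made for illustration: struct toy_rtx and the toy_* functions are simplified stand-ins, not GCC's rtx, side_effects_p or noce_operand_ok, and the may_trap_p step is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy expression node; a simplified model, not GCC's rtx.  */
enum toy_code { TOY_SYMBOL, TOY_MEM };

struct toy_rtx
{
  enum toy_code code;
  bool is_volatile;            /* only meaningful for TOY_MEM here */
  const struct toy_rtx *addr;  /* address operand of a TOY_MEM */
};

/* Simplified stand-in for side_effects_p: a volatile memory reference
   counts as a side effect, as does one whose address has side effects.  */
static bool
toy_side_effects_p (const struct toy_rtx *x)
{
  if (x->code == TOY_MEM)
    return x->is_volatile || toy_side_effects_p (x->addr);
  return false;
}

/* Old order: memories are special-cased first, so only the address is
   inspected and the volatile flag on the MEM itself is never seen.  */
static bool
operand_ok_old (const struct toy_rtx *op)
{
  if (op->code == TOY_MEM)
    return !toy_side_effects_p (op->addr);
  if (toy_side_effects_p (op))
    return false;
  return true;
}

/* New order: the whole operand is checked first, so a volatile MEM is
   rejected before the address-only special case is reached.  */
static bool
operand_ok_new (const struct toy_rtx *op)
{
  if (toy_side_effects_p (op))
    return false;
  if (op->code == TOY_MEM)
    return !toy_side_effects_p (op->addr);
  return true;
}
```

With a volatile memory operand whose address is a plain symbol, the old order accepts the operand and the new order rejects it; non-volatile memories are accepted by both.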


Re: scalar vector shift expansion problem on 64-bit

2011-10-28 Thread Jakub Jelinek
On Fri, Oct 28, 2011 at 11:44:17AM -0700, Richard Henderson wrote:
> > A wild guess, though untested, because I don't have a reproducer:
> > 
> > 2011-10-28  Jakub Jelinek  
> > 
> > * tree-vect-stmts.c (vectorizable_shift): If op1 is vect_external_def
> > and has different type from op0, cast it to op0's type before the
> > loop first.
> 
> I suspect the problem is in optabs.c, not here.

Possible.  Though, if I disable all "ashl3", "lshr3" and
"ashr3" expanders in sse.md, I get without the above patch ICEs on
e.g.

long long d[64], e, j[64];

void
f4 (void)
{
  int i;
  for (i = 0; i < 64; i++)
    j[i] = d[i] << e;
}

with -O3 -mxop and -O3 -mavx2 and the patch fixes those.

Jakub


[PR50869] don't attempt to expand CFA within cselib

2011-10-28 Thread Alexandre Oliva
An assertion check meant to verify that var loc expansions that didn't
involve VALUEs (say constants, REGs, etc) didn't push values onto the
dependency stack failed in an expansion of the argp reg, because
equivalences for it are preserved at cselib table resets, and cselib
later tries to expand it to equivalent expressions.

It's not profitable to expand it within var-tracking, and that's the
only user of the CFA-base special-casing in cselib, so I arranged for
argp to be preserved in expansions, just like other stack base
registers.

While debugging it, I noticed it was theoretically possible for the
expression depth to remain uninitialized, and added an initialization
and an assertion check to make sure it only remains zero when no
location is found.

Regstrapped on x86_64-linux-gnu and i686-linux-gnu.  Ok to install?
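
As an aside on the new assertion, the "!depth == !result" form checks that two values are either both zero or both non-zero.  The sketch below illustrates only that idiom; struct expansion and expand_first are hypothetical names and have nothing to do with the actual var-tracking data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result of an expansion attempt.  */
struct expansion
{
  const char *result;  /* NULL when no location was found */
  int depth;           /* zero when no location was found */
};

/* Pick the first non-NULL candidate.  The depth recorded here is just
   the 1-based position of that candidate, chosen for illustration.  */
static struct expansion
expand_first (const char *const *candidates, size_t n)
{
  /* Depth starts out initialized to zero, so the "nothing found" path
     never returns an indeterminate value.  */
  struct expansion e = { NULL, 0 };

  for (size_t i = 0; i < n; i++)
    if (candidates[i] != NULL)
      {
        e.result = candidates[i];
        e.depth = (int) i + 1;
        break;
      }

  /* Both are zero/NULL, or both are non-zero.  */
  assert (!e.depth == !e.result);
  return e;
}
```

Applying "!" to each side first normalizes both to 0 or 1, so the comparison works even though one side is an integer and the other a pointer.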

for  gcc/ChangeLog
from  Alexandre Oliva  

	PR debug/50869
	* cselib.c (cfa_base_preserved_regno): Initialize.
	(cselib_expand_value_rtx_1): Don't expand it.
	* var-tracking.c (vt_expand_var_loc_chain): Initialize depth.
	Check it's only zero if result is NULL.

Index: gcc/cselib.c
===
--- gcc/cselib.c.orig	2011-10-27 18:32:20.137366314 -0200
+++ gcc/cselib.c	2011-10-27 18:27:05.387597000 -0200
@@ -185,7 +185,7 @@ static cselib_val dummy_val;
that is constant through the whole function and should never be
eliminated.  */
 static cselib_val *cfa_base_preserved_val;
-static unsigned int cfa_base_preserved_regno;
+static unsigned int cfa_base_preserved_regno = INVALID_REGNUM;
 
 /* Used to list all values that contain memory reference.
May or may not contain the useless values - the list is compacted
@@ -1451,7 +1451,7 @@ cselib_expand_value_rtx_1 (rtx orig, str
 	  if (GET_MODE (l->elt->val_rtx) == GET_MODE (orig))
 	{
 	  rtx result;
-	  int regno = REGNO (orig);
+	  unsigned regno = REGNO (orig);
 
 	  /* The only thing that we are not willing to do (this
 		 is requirement of dse and if others potential uses
@@ -1471,7 +1471,8 @@ cselib_expand_value_rtx_1 (rtx orig, str
 		 make the frame assumptions.  */
 	  if (regno == STACK_POINTER_REGNUM
 		  || regno == FRAME_POINTER_REGNUM
-		  || regno == HARD_FRAME_POINTER_REGNUM)
+		  || regno == HARD_FRAME_POINTER_REGNUM
+		  || regno == cfa_base_preserved_regno)
 		return orig;
 
 	  bitmap_set_bit (evd->regs_active, regno);
Index: gcc/var-tracking.c
===
--- gcc/var-tracking.c.orig	2011-10-27 18:32:20.141366261 -0200
+++ gcc/var-tracking.c	2011-10-27 18:28:03.823813000 -0200
@@ -7764,7 +7764,7 @@ vt_expand_var_loc_chain (variable var, b
   bool pending_recursion;
   rtx loc_from = NULL;
   struct elt_loc_list *cloc = NULL;
-  int depth, saved_depth = elcd->depth;
+  int depth = 0, saved_depth = elcd->depth;
 
   /* Clear all backlinks pointing at this, so that we're not notified
  while we're active.  */
@@ -7842,6 +7842,8 @@ vt_expand_var_loc_chain (variable var, b
   VAR_LOC_FROM (var) = loc_from;
   VAR_LOC_DEPTH (var) = depth;
 
+  gcc_checking_assert (!depth == !result);
+
   elcd->depth = update_depth (saved_depth, depth);
 
   /* Indicate whether any of the dependencies are pending recursion


-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist  Red Hat Brazil Compiler Engineer


[wwwdocs] Update home page with reference to release announcement

2011-10-28 Thread Gerald Pfeifer
Committed.

Gerald

Index: index.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/index.html,v
retrieving revision 1.818
diff -u -r1.818 index.html
--- index.html  27 Oct 2011 12:07:48 -  1.818
+++ index.html  28 Oct 2011 20:45:33 -
@@ -128,7 +128,7 @@
 
   Status:
   
-  http://gcc.gnu.org/ml/gcc/2011-10/msg00314.html";>2011-10-19
+  http://gcc.gnu.org/ml/gcc/2011-10/msg00486.html";>2011-10-27
   
   (regression fixes and docs only).
   


Re: [Patch, libfortran, 1/3] Simplify handling of special files

2011-10-28 Thread Mikael Morin
On Tuesday 18 October 2011 16:42:45 Janne Blomqvist wrote:
> Hi,
> 
> in a few places in libgfortran we have some code for handling special
> and/or non-seekable files differently. The problem is that special
> files don't all have some nice consistent behavior. E.g. wrt. seeking,
> some allow seeking just fine, others allow some seeks and not others,
> others allow them but always return an offset of 0, and yet others
> fail the seek completely.
> 
> The Fortran standard doesn't really help here except for noting that
> some files may not be positionable, and thus statements requiring the
> file position to be modified may fail on such files.
> 
> Obviously, libgfortran itself cannot enumerate all the possible
> variations for how a special file may behave, and trying to impose
> some kind of least common denominator may hide essential capability.
> Having thought about this, my conclusion is that the only thing that
> makes sense is that we do what the caller asks us to do, and if that
> fails, we report the error back to the caller and let the caller
> handle it. The attached patch implements this.
> 
> Regtested on x86_64-unknown-linux-gnu, Ok for trunk?
> 
I know little about the library, but your approach looks good, and I have seen 
nothing obviously wrong in the patch.
Thus, as no one else has complained so far: OK.

Mikael
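
The "do what the caller asks and report the failure" approach can be illustrated at the POSIX level.  This is a minimal sketch, assuming a POSIX system; try_seek is a hypothetical helper and is not taken from libgfortran.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Attempt the seek exactly as requested.  On success return the new
   offset and set *ERR to 0; on failure return (off_t) -1 and store the
   errno value in *ERR so the caller can decide what to do.  */
static off_t
try_seek (int fd, off_t offset, int whence, int *err)
{
  off_t pos = lseek (fd, offset, whence);
  *err = (pos == (off_t) -1) ? errno : 0;
  return pos;
}
```

On a pipe, for example, lseek fails with ESPIPE, and the helper simply hands that error back instead of guessing in advance whether the file is seekable.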


Re: [PATCH 3/6] Implement interleave via permutation.

2011-10-28 Thread Hans-Peter Nilsson
On Mon, 24 Oct 2011, Richard Henderson wrote:

> From: Richard Henderson 

> +  /* Certain vector operations can be implemented with vector permutation.  */
> +  if (VECTOR_MODE_P (mode))
> +{
> +  enum tree_code tcode = ERROR_MARK;
> +  rtx sel;
> +
> +  if (binoptab == vec_interleave_high_optab)
> + tcode = VEC_INTERLEAVE_HIGH_EXPR;
> +  else if (binoptab == vec_interleave_low_optab)
> + tcode = VEC_INTERLEAVE_LOW_EXPR;
> +  else if (binoptab == vec_extract_even_optab)
> + tcode = VEC_EXTRACT_EVEN_EXPR;
> +  else if (binoptab == vec_extract_odd_optab)
> + tcode = VEC_EXTRACT_ODD_EXPR;

Also VEC_UNPACK_HI_EXPR, VEC_UNPACK_LO_EXPR, and
VEC_PACK_TRUNC_EXPR to mention some.

brgds, H-P


Patch committed: Fix -fsplit-stack unwind

2011-10-28 Thread Ian Lance Taylor
The CFI code for the -fsplit-stack support was incomplete.  It did not
correctly unwind throughout the function.  This patch fixes that by
adding the CFI information at the start.  This is probably still not
precisely correct at all times, but it is better.

In 32-bit mode, the exception cleanup routine has to change %ebx when
running in a shared library.  Therefore, it is necessary that the
unwinder restore %ebx to the old value.  This patch implements that in a
simple way, by having the main routine save %ebx with appropriate CFI
information.

Bootstrapped and ran testsuite on x86_64-unknown-linux-gnu.  Committed
to mainline.

Ian


2011-10-28  Ian Lance Taylor  

* config/i386/morestack.S: Correct CFI information to do proper
returns throughout function.  In 32-bit mode, save %ebx so that it
is restored on unwind.


Index: config/i386/morestack.S
===
--- config/i386/morestack.S	(revision 180342)
+++ config/i386/morestack.S	(working copy)
@@ -139,44 +139,68 @@ __morestack:
 	.cfi_lsda 0x1b,.LLSDA1
 #endif
 
-	# Set up a normal backtrace.
-	pushl	%ebp
-	.cfi_def_cfa_offset 8
-	.cfi_offset %ebp, -8
-	movl	%esp, %ebp
-	.cfi_def_cfa_register %ebp
-
 	# We return below with a ret $8.  We will return to a single
 	# return instruction, which will return to the caller of our
 	# caller.  We let the unwinder skip that single return
 	# instruction, and just return to the real caller.
-	.cfi_offset 8, 8
+
+	# Here CFA points just past the return address on the stack,
+	# e.g., on function entry it is %esp + 4.  Later we will
+	# change it to %ebp + 8, as set by .cfi_def_cfa_register and
+	# .cfi_def_cfa_offset above.  The stack looks like this:
+	#	CFA + 12:	stack pointer after two returns
+	#	CFA + 8:	return address of morestack caller's caller
+	#	CFA + 4:	size of parameters
+	#	CFA:		new stack frame size
+	#	CFA - 4:	return address of this function
+	#	CFA - 8:	previous value of %ebp; %ebp points here
+	# We want to set %esp to the stack pointer after the double
+	# return, which is CFA + 12.
+	.cfi_offset 8, 8		# New PC stored at CFA + 8
 	.cfi_escape 0x15, 4, 0x7d	# DW_CFA_val_offset_sf, %esp, 12/-4
+	# i.e., next %esp is CFA + 12
+
+	# Set up a normal backtrace.
+	pushl	%ebp
+	.cfi_def_cfa_offset 8
+	.cfi_offset %ebp, -8
+	movl	%esp,%ebp
+	.cfi_def_cfa_register %ebp
 
 	# In 32-bit mode the parameters are pushed on the stack.  The
 	# argument size is pushed then the new stack frame size is
 	# pushed.
 
+	# Align stack to 16-byte boundary with enough space for saving
+	# registers and passing parameters to functions we call.
+	subl	$40,%esp
+
+	# Because our cleanup code may need to clobber %ebx, we need
+	# to save it here so the unwinder can restore the value used
+	# by the caller.  Note that we don't have to restore the
+	# register, since we don't change it, we just have to save it
+	# for the unwinder.
+	movl	%ebx,-4(%ebp)
+	.cfi_offset %ebx, -12
+
 	# In 32-bit mode the registers %eax, %edx, and %ecx may be
 	# used for parameters, depending on the regparm and fastcall
 	# attributes.
 
-	pushl	%eax
-	pushl	%edx
-	pushl	%ecx
+	movl	%eax,-8(%ebp)
+	movl	%edx,-12(%ebp)
+	movl	%ecx,-16(%ebp)
 
 	call	__morestack_block_signals
 
-	pushl	12(%ebp)		# The size of the parameters.
+	movl	12(%ebp),%eax		# The size of the parameters.
+	movl	%eax,8(%esp)
 	leal	20(%ebp),%eax		# Address of caller's parameters.
-	pushl	%eax
+	movl	%eax,4(%esp)
 	addl	$BACKOFF,8(%ebp)	# Ask for backoff bytes.
 	leal	8(%ebp),%eax		# The address of the new frame size.
-	pushl	%eax
+	movl	%eax,(%esp)
 
-	# Note that %esp is exactly 32 bytes below the CFA -- perfect for
-	# a 16-byte aligned stack.  That said, we still ought to compile
-	# generic-morestack.c with -mpreferred-stack-boundary=2.  FIXME.
 	call	__generic_morestack
 
 	movl	%eax,%esp		# Switch to the new stack.
@@ -191,8 +215,8 @@ __morestack:
 
 	call	__morestack_unblock_signals
 
-	movl	-8(%ebp),%edx		# Restore registers.
-	movl	-12(%ebp),%ecx
+	movl	-12(%ebp),%edx		# Restore registers.
+	movl	-16(%ebp),%ecx
 
 	movl	4(%ebp),%eax		# Increment the return address
 	cmpb	$0xc3,(%eax)		# to skip the ret instruction;
@@ -200,12 +224,12 @@ __morestack:
 	addl	$2,%eax
 1:	inc	%eax
 
-	movl	%eax,-8(%ebp)		# Store return address in an
+	movl	%eax,-12(%ebp)		# Store return address in an
 	# unused slot.
 
-	movl	-4(%ebp),%eax		# Restore the last register.
+	movl	-8(%ebp),%eax		# Restore the last register.
 
-	call	*-8(%ebp)		# Call our caller!
+	call	*-12(%ebp)		# Call our caller!
 
 	# The caller will return here, as predicted.
 
@@ -255,9 +279,13 @@ __morestack:
 	popl	%eax
 
 	.cfi_remember_state
+
+	# We never changed %ebx, so we don't have to actually restore it.
+	.cfi_restore %ebx
+
 	popl	%ebp
 	.cfi_restore %ebp
-	.cfi_def_cfa %esp, 12
+	.cfi_def_cfa %esp, 4
 	ret	$8			# Return to caller, which will
 	# immediately return.  Pop
 	# arguments as we go.
@@ -300,13 +328,

Go patch committed: Add rune type

2011-10-28 Thread Ian Lance Taylor
The Go language has a new type, rune, which is currently an alias for
int.  In the future it will be an alias for int32.  This type represents
a Unicode character.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline.

Ian

diff -r b1a5c9a0b9ff go/gogo.cc
--- a/go/gogo.cc	Wed Oct 26 21:55:17 2011 -0700
+++ b/go/gogo.cc	Fri Oct 28 15:04:19 2011 -0700
@@ -85,6 +85,10 @@
   Named_object* byte_type = this->declare_type("byte", loc);
   byte_type->set_type_value(uint8_type);
 
+  // "rune" is an alias for "int".
+  Named_object* rune_type = this->declare_type("rune", loc);
+  rune_type->set_type_value(int_type);
+
   this->add_named_type(Type::make_integer_type("uintptr", true,
 	   pointer_size,
 	   RUNTIME_TYPE_KIND_UINTPTR));

