Re: PATCH: PR target/50603: [x32] Unnecessary lea

2011-10-04 Thread Paolo Bonzini

On 10/04/2011 01:00 AM, H.J. Lu wrote:

+  else
+    {
+      /* Improve address combine in x32 mode.  */
+      if (TARGET_X32
+          && code == PLUS
+          && !MEM_P (dst)
+          && !MEM_P (src1)
+          && MEM_P (src2))
+        src2 = force_reg (mode, src2);
+    }


Perhaps this is worthwhile also on non-x32?

Paolo



[PATCH] Handle side-effects of EH optimizations of callees

2011-10-04 Thread Maxim Kuvyrkov
Richard,

The following patch fixes a CFG consistency problem.

When early IPA optimizations (e.g., early SRA) create a version of a function 
that no longer throws, versioning machinery deletes EH landing pads in 
_callers_ of the function [*].  However, the EH cfg edges are not updated in 
the callers, which eventually leads to an ICE in verify_eh_edges.

I've tried several ways to fix this, including trying to make early 
optimizations not delete callers' landing pads.  You are the original author of 
the gimple EH code, and I wonder if you know the rationale behind removing EH 
landing pads from callers of a function being optimized.

The failure triggers on several libstdc++ testcases, but requires an unrelated 
patch of mine (soon to be posted).

In any case, attached is what appears to be the best way to fix this up.

Bootstrapped and regtested on x86_64-linux-gnu.  OK for trunk?

Thank you,

[*] this is done in cgraphunit.c: update_call_expr() and 
ipa_modify_call_arguments() via maybe_clean[_and_replace]_eh_stmt().

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics




fsf-gcc-fixup_eh.ChangeLog
Description: Binary data


fsf-gcc-fixup_eh.patch
Description: Binary data


[patch][committed] Fix check_effective_target_vect_multiple_sizes and check_effective_target_vect64

2011-10-04 Thread Ira Rosen
Hi,

Michael pointed out this problem in
check_effective_target_vect_multiple_sizes and
check_effective_target_vect64 that I added lately.

Tested on powerpc64-suse-linux.
Committed as obvious.

Thanks,
Ira

testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_vect_multiple_sizes):
Make et_vect_multiple_sizes_saved global.
(check_effective_target_vect64): Make et_vect64_saved global.


Index: testsuite/lib/target-supports.exp
===
--- testsuite/lib/target-supports.exp   (revision 179489)
+++ testsuite/lib/target-supports.exp   (working copy)
@@ -3375,7 +3375,7 @@ foreach N {2 3 4 8} {
 # Return 1 if the target supports multiple vector sizes

 proc check_effective_target_vect_multiple_sizes { } {
-global et_vect_multiple_sizes
+global et_vect_multiple_sizes_saved

 if [info exists et_vect_multiple_sizes_saved] {
        verbose "check_effective_target_vect_multiple_sizes: using cached result" 2
@@ -3393,7 +3393,7 @@ proc check_effective_target_vect_multiple_sizes {
 # Return 1 if the target supports vectors of 64 bits.

 proc check_effective_target_vect64 { } {
-global et_vect64
+global et_vect64_saved

 if [info exists et_vect64_saved] {
 verbose "check_effective_target_vect64: using cached result" 2


[SPARC] Fix build failure

2011-10-04 Thread Eric Botcazou
Tested on SPARC/Solaris, applied on the mainline.


2011-10-04  Eric Botcazou  

* config/sparc/sparc.c (sparc_fold_builtin): Use a sequence of tests.


-- 
Eric Botcazou
Index: config/sparc/sparc.c
===
--- config/sparc/sparc.c	(revision 179488)
+++ config/sparc/sparc.c	(working copy)
@@ -9716,24 +9716,23 @@ sparc_fold_builtin (tree fndecl, int n_a
 
   if (ignore)
 {
-  switch (icode)
-	{
-	case CODE_FOR_alignaddrsi_vis:
-	case CODE_FOR_alignaddrdi_vis:
-	case CODE_FOR_wrgsr_vis:
-	case CODE_FOR_bmasksi_vis:
-	case CODE_FOR_bmaskdi_vis:
-	case CODE_FOR_cmask8si_vis:
-	case CODE_FOR_cmask8di_vis:
-	case CODE_FOR_cmask16si_vis:
-	case CODE_FOR_cmask16di_vis:
-	case CODE_FOR_cmask32si_vis:
-	case CODE_FOR_cmask32di_vis:
-	  break;
-
-	default:
-	  return build_zero_cst (rtype);
-	}
+  /* Note that a switch statement instead of the sequence of tests would
+	 be incorrect as many of the CODE_FOR values could be CODE_FOR_nothing
+	 and that would yield multiple alternatives with identical values.  */
+  if (icode == CODE_FOR_alignaddrsi_vis
+	  || icode == CODE_FOR_alignaddrdi_vis
+	  || icode == CODE_FOR_wrgsr_vis
+	  || icode == CODE_FOR_bmasksi_vis
+	  || icode == CODE_FOR_bmaskdi_vis
+	  || icode == CODE_FOR_cmask8si_vis
+	  || icode == CODE_FOR_cmask8di_vis
+	  || icode == CODE_FOR_cmask16si_vis
+	  || icode == CODE_FOR_cmask16di_vis
+	  || icode == CODE_FOR_cmask32si_vis
+	  || icode == CODE_FOR_cmask32di_vis)
+	;
+  else
+	return build_zero_cst (rtype);
 }
 
   switch (icode)


Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.

2011-10-04 Thread Tom de Vries
On 10/01/2011 05:46 PM, Tom de Vries wrote:
> On 09/30/2011 03:29 PM, Richard Guenther wrote:
>> On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries  wrote:
>>> On 09/28/2011 11:53 AM, Richard Guenther wrote:
 On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries  
 wrote:
> Richard,
>
> I got a patch for PR50527.
>
> The patch prevents the alignment of vla-related allocas to be set to
> BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after folding
> the alloca.
>
> Bootstrapped and regtested on x86_64.
>
> OK for trunk?

 Hmm.  As gfortran with -fstack-arrays uses VLAs it's probably bad that
 the vectorizer then will no longer see that the arrays are properly 
 aligned.

 I'm not sure what the best thing to do is here, other than trying to record
 the alignment requirement of the VLA somewhere.

 Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT
 has the issue that it will force stack-realignment which isn't free (and 
 the
 point was to make the decl cheaper than the alloca).  But that might
 possibly be the better choice.

 Any other thoughts?
>>>
>>> How about the approach in this (untested) patch? Using the DECL_ALIGN of 
>>> the vla
>>> for the new array prevents stack realignment for folded vla-allocas, also 
>>> for
>>> large vlas.
>>>
>>> This will not help in vectorizing large folded vla-allocas, but I think 
>>> it's not
>>> reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that has
>>> been the case up until we started to fold). If you want to trigger 
>>> vectorization
>>> for a vla, you can still use the aligned attribute on the declaration.
>>>
>>> Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also without 
>>> using
>>> an attribute on the decl. This patch exploits this by setting it at the end 
>>> of
>>> the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective in
>>> propagation though, because although the ptr_info of the lhs is propagated 
>>> via
>>> copy_prop afterwards, it's not propagated anymore via ccp.
>>>
>>> Another way to do this would be to set BIGGEST_ALIGNMENT at the end of ccp2 
>>> and
>>> not fold during ccp3.
>>
>> Ugh, somehow I like this the least ;)
>>
>> How about lowering VLAs to
>>
>>   p = __builtin_alloca (...);
>>   p = __builtin_assume_aligned (p, DECL_ALIGN (vla));
>>
>> and not assume anything for alloca itself if it feeds a
>> __builtin_assume_aligned?
>>
>> Or rather introduce a __builtin_alloca_with_align () and for VLAs do
>>
>>  p = __builtin_alloca_with_align (..., DECL_ALIGN (vla));
>>
>> that's less awkward to use?
>>
>> Sorry for not having a clear plan here ;)
>>
> 
> Using assume_aligned is a more orthogonal way to represent this in gimple, but
> indeed harder to use.
> 
> Another possibility is to add a 'tree vla_decl' field to struct
> gimple_statement_call, which is probably the easiest to implement.
> 
> But I think __builtin_alloca_with_align might have a use beyond vlas, so I
> decided to try this one. Attached patch implements my first stab at this  (now
> testing on x86_64).
> 
> Is this an acceptable approach?
> 

bootstrapped and reg-tested (including ada) on x86_64.

Ok for trunk?

Thanks,
- Tom

2011-10-03  Tom de Vries  

PR middle-end/50527
* tree.c (build_common_builtin_nodes): Add local_define_builtin for
BUILT_IN_ALLOCA_WITH_ALIGN.
* builtins.c (expand_builtin_alloca): Handle BUILT_IN_ALLOCA_WITH_ALIGN
arglist.
(expand_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.  Expand
BUILT_IN_ALLOCA_WITH_ALIGN as BUILT_IN_ALLOCA.
(is_inexpensive_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
* tree-pass.h (pass_ccp_last): Declare.
* passes.c (init_optimization_passes): Replace 3rd pass_ccp with
pass_ccp_last.
* gimplify.c (gimplify_vla_decl): Lower vla to
BUILT_IN_ALLOCA_WITH_ALIGN.
* tree-ssa-ccp.c (ccp_last): Define.
(ccp_finalize): Improve align ptr_info for BUILT_IN_ALLOCA_WITH_ALIGN.
(evaluate_stmt): Set align for BUILT_IN_ALLOCA_WITH_ALIGN.
(builtin_alloca_with_align_p): New function.
(fold_builtin_alloca_with_align_p): New function, factored out of ...
(fold_builtin_alloca_for_var): Renamed to ...
(fold_builtin_alloca_with_align): Use fold_builtin_alloca_with_align_p.
Set DECL_ALIGN from BUILT_IN_ALLOCA_WITH_ALIGN argument.
(ccp_fold_stmt): Try folding using fold_builtin_alloca_with_align if
builtin_alloca_with_align_p.
(do_ssa_ccp_last): New function.
(pass_ccp_last): Define.
(optimize_stack_restore): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
* builtins.def (BUILT_IN_ALLOCA_WITH_ALIGN): Declare using
DEF_BUILTIN_STUB.
* ipa-pure-const.c (special_builtin_state): Handle
BUILT_IN_ALLOCA_WITH_ALIGN.
* tree-ssa-alias.c (ref_ma

Re: [3/4] SMS: Record moves in the partial schedule

2011-10-04 Thread Richard Sandiford
Ayal Zaks  writes:
> On Wed, Sep 28, 2011 at 4:53 PM, Richard Sandiford
>  wrote:
>> Ayal Zaks  writes:
> Only request is to document that the register moves are
> placed/assigned-id's in a specific order.

I suppose this is the downside of splitting the patches up, sorry,
but the ids are only ordered for the throw-away loop:

 FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps->reg_moves, i, move)
   add_insn_before (move->insn, ps_first_note (ps, move->def), NULL);

and for the prologue/epilogue code.  Both are replaced in patch 4
with code that doesn't rely on the ordering of ids.
>>> Ok then, very well. I was mostly referring to the following in
>>> prologue/epiloque code, which merely relies on assigning all regmoves
>>> of a node consecutive id's:
>>
>> FWIW, the 4/4 that I just posted did finally get rid of the first_reg_move
>> & nreg_moves fields.
>>
>> Here's a slightly updated patch in line with your 4/4 comments.
>> The move->def is now always the id of the predecessor, rather than
>> the id of the original ddg node producer.  I've adapted the throw-away
>> loop accordingly, but there are no other changes.
>>
>
> This is ok.

Thanks.

> Just to make sure I follow, the changes were (for this patch):
>
> 1. setting up a different move->def for each move
>
>> + move->def = i_reg_move > 0 ? first_move + i_reg_move - 1 : i;
>
> instead of the same original def for all
>
>> +  move->def = i;
>
> 2. inserting each move right before its own def, bottom-up:
>
>> +  FOR_EACH_VEC_ELT (ps_reg_move_info, ps->reg_moves, i, move)
>> +add_insn_before (move->insn, ps_first_note (ps, move->def), NULL);
>
> instead of inserting each move right before the original def, top-down:
>
 FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps->reg_moves, i, move)
   add_insn_before (move->insn, ps_first_note (ps, move->def), NULL);

Yeah, those were the ones.

Richard



Re: PATCH: PR target/50603: [x32] Unnecessary lea

2011-10-04 Thread Uros Bizjak
On Tue, Oct 4, 2011 at 8:17 AM, Uros Bizjak  wrote:
> On Tue, Oct 4, 2011 at 1:00 AM, H.J. Lu  wrote:
>> This patch improves address combine for x32 by forcing the memory
>> operand of PLUS operation into register.  Tested on Linux/x86-64 with
>> -mx32.  OK for trunk?
>
> Does the patch fix
>
> FAIL: gcc.target/i386/pr45670.c scan-assembler-not lea[lq]
>
> on x32 ?

It does.

The following patch is the same, but takes into account that non-matching
memory can only be in src2, so it avoids a bunch of unnecessary
checks. Can you please check the effects of the patch with some
codesize benchmark?

Please also add the test from the PR that checks for lea (similar to
pr45670.c test).

Index: i386.c
===
--- i386.c  (revision 179489)
+++ i386.c  (working copy)
@@ -15727,6 +15727,10 @@
   if (MEM_P (src1) && !rtx_equal_p (dst, src1))
 src1 = force_reg (mode, src1);

+  /* Improve address combine in x32 mode.  */
+  if (TARGET_X32 && code == PLUS && MEM_P (src2))
+src2 = force_reg (mode, src2);
+
   operands[1] = src1;
   operands[2] = src2;
   return dst;

Uros.


Re: PATCH: PR target/50603: [x32] Unnecessary lea

2011-10-04 Thread Uros Bizjak
On Tue, Oct 4, 2011 at 10:06 AM, Uros Bizjak  wrote:
>>> This patch improves address combine for x32 by forcing the memory
>>> operand of PLUS operation into register.  Tested on Linux/x86-64 with
>>> -mx32.  OK for trunk?
>>
>> Does the patch fix
>>
>> FAIL: gcc.target/i386/pr45670.c scan-assembler-not lea[lq]
>>
>> on x32 ?
>
> It does.
>
> The following patch is the same, but takes into account that non-matching
> memory can only be in src2, so it avoids a bunch of unnecessary
> checks. Can you please check the effects of the patch with some
> codesize benchmark?

OTOH, x86_64 and i686 targets can also benefit from this change. If
combine can't create more complex address (covered by lea), then it
will simply propagate memory operand back into the add insn. It looks
to me that we can't lose here, so:

  /* Improve address combine.  */
  if (code == PLUS && MEM_P (src2))
src2 = force_reg (mode, src2);

Any opinions?

Uros.


Re: [testsuite] Don't XFAIL gcc.dg/graphite/interchange-14.c (PR tree-optimization/49662)

2011-10-04 Thread Richard Guenther
On Fri, 30 Sep 2011, Rainer Orth wrote:

> It seems that the following three tests don't fail anymore anywhere for
> some time, so the following patch removes the three xfail's to avoid the
> noise from XPASSes.
> 
> Tested with the approrpriate runtest invocation on i386-pc-solaris2.11.
> 
> Ok for mainline?

Ok.

Thanks,
Richard.

>   Rainer
> 
> 
> 2011-09-30  Rainer Orth  
> 
>   PR tree-optimization/49662
>   * gcc.dg/graphite/interchange-14.c: Remove xfail *-*-*.
>   * gcc.dg/graphite/interchange-15.c: Likewise.
>   * gcc.dg/graphite/interchange-mvt.c: Likewise.
> 
> 

-- 
Richard Guenther 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer

Re: [PATCH] Look at restrict disambiguation in tree-ssa-alias.c unconditionally (PR tree-optimization/50522)

2011-10-04 Thread Richard Guenther
On Fri, 30 Sep 2011, Joseph S. Myers wrote:

> On Mon, 26 Sep 2011, Jakub Jelinek wrote:
> 
> > Hi!
> > 
> > Adding Joseph and Jason to CC.
> > 
> > On Mon, Sep 26, 2011 at 04:56:20PM +0200, Richard Guenther wrote:
> > > Let's see what kind of fallout we get ;)  For example, if the
> > > following is valid C code I expect we will vectorize the second
> > > loop (disambiguating p[i] and q[i]) bogously:
> > > 
> > > void foo (int *p)
> > > {
> > >   int * __restrict p1 = p;
> > >   int * __restrict p2 = p + 32;
> > >   int *q;
> > >   int i;
> > >   for (i = 0; i < 32; ++i)
> > > p1[i] = p2[i];
> > >   p = p1;
> > >   q = p2 - 31;
> > >   for (i = 0; i < 32; ++i)
> > > p[i] = q[i];
> > > }
> > > 
> > > because p and q base on different restrict qualified pointers
> > > (p1 and p2 respective).  At the moment we are safe from this
> > > because of the TYPE_RESTRICT checks.
> > > 
> > > Any opinion on the above?  Is it valid to base non-restrict
> > > pointers on restrict ones?  It would be sort-of weird at least,
> > > but at least I don't think the first loop use is bogus (even
> > > though the pointed-to objects are the same).
> > 
> > If the last loop was
> >   for (i = 0; i < 32; i++)
> > q[i] = p[i];
> > then I believe the above would be clearly invalid C99, because
> > an object X (say incoming p[4]) would be modified in the same block
> > using a pointer based on p1 and using a pointer not based on p1
> > (q), which would violate the requirements that if the object is
> > modified through lvalue whose address is based on p1, all modifications
> > to B in that block should be done through lvalues whose address is
> > based on p1.  In the above testcase all modifications are made through
> > lvalues whose addresses are p1 based though, so it is less clear.
> > Joseph?
> 
> If an object that is accessed by a restricted pointer is also modified, 
> then all accesses (not just all modifications) must be through pointers 
> based on the restricted pointer.  So in the original loop with p[i] = 
> q[i], q[i] for i from 0 to 30 is an object that was previously modified 
> through p1 and is now being accessed through p2.  So this code appears 
> invalid to me.

In the above first loop the restrict pointers p1 and p2 access
distinct object pieces.  The second loop uses non-restrict qualified
pointers p and q (that are based on the restrict variants p1 and p2
though) to access overlapping pieces.  Is the second loop invalid
because p and q are based on p1 and p2 even though they are not
restrict qualified?  Or is the loop ok because p and q are not
restrict qualified?

Thanks,
Richard.

-- 
Richard Guenther 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer

Re: [PATCH] Restrict fixes (take 2)

2011-10-04 Thread Richard Guenther
On Fri, 30 Sep 2011, Jakub Jelinek wrote:

> On Fri, Sep 30, 2011 at 10:57:25AM +0200, Richard Guenther wrote:
> > Definitely.  Seeing a decl will enable better offset-based
> > disambiguation.
> 
> Ok, here is an updated patch.  Bootstrapped/regtested on x86_64-linux
> and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2011-09-30  Jakub Jelinek  
> 
>   * fold-const.c (fold_unary_loc): Don't optimize
>   POINTER_PLUS_EXPR casted to TYPE_RESTRICT pointer by
>   casting the inner pointer if it isn't TYPE_RESTRICT.
>   * tree-ssa-forwprop.c (forward_propagate_addr_expr_1): Don't through
>   casts from non-TYPE_RESTRICT pointer to TYPE_RESTRICT pointer.
> 
>   * gcc.dg/tree-ssa/restrict-4.c: New test.
> 
> --- gcc/fold-const.c.jj   2011-09-29 14:25:46.0 +0200
> +++ gcc/fold-const.c  2011-09-29 18:20:04.0 +0200
> @@ -7929,6 +7929,7 @@ fold_unary_loc (location_t loc, enum tre
>that this happens when X or Y is NOP_EXPR or Y is INTEGER_CST. */
>if (POINTER_TYPE_P (type)
> && TREE_CODE (arg0) == POINTER_PLUS_EXPR
> +   && (!TYPE_RESTRICT (type) || TYPE_RESTRICT (TREE_TYPE (arg0)))
> && (TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST
> || TREE_CODE (TREE_OPERAND (arg0, 0)) == NOP_EXPR
> || TREE_CODE (TREE_OPERAND (arg0, 1)) == NOP_EXPR))
> --- gcc/tree-ssa-forwprop.c.jj2011-09-15 12:18:54.0 +0200
> +++ gcc/tree-ssa-forwprop.c   2011-09-29 19:08:03.0 +0200
> @@ -804,6 +804,11 @@ forward_propagate_addr_expr_1 (tree name
>&& ((rhs_code == SSA_NAME && rhs == name)
> || CONVERT_EXPR_CODE_P (rhs_code)))
>  {
> +  /* Don't propagate restrict pointer's RHS.  */
> +  if (TYPE_RESTRICT (TREE_TYPE (lhs))
> +   && !TYPE_RESTRICT (TREE_TYPE (name))
> +   && !is_gimple_min_invariant (def_rhs))
> + return false;
>/* Only recurse if we don't deal with a single use or we cannot
>do the propagation to the current statement.  In particular
>we can end up with a conversion needed for a non-invariant
> --- gcc/testsuite/gcc.dg/tree-ssa/restrict-4.c.jj 2011-09-29 
> 20:21:00.0 +0200
> +++ gcc/testsuite/gcc.dg/tree-ssa/restrict-4.c2011-09-29 
> 20:21:57.0 +0200
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +int
> +foo (int *x, int y)
> +{
> +  int *__restrict p1 = x;
> +  int *__restrict p2 = x + 32;
> +  p1[y] = 1;
> +  p2[4] = 2;
> +  return p1[y];
> +}
> +
> +int
> +bar (int *x, int y)
> +{
> +  int *__restrict p1 = x;
> +  int *p3 = x + 32;
> +  int *__restrict p2 = p3;
> +  p1[y] = 1;
> +  p2[4] = 2;
> +  return p1[y];
> +}
> +
> +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> 
>   Jakub
> 
> 

-- 
Richard Guenther 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer

Re: [PATCH] Support -m{cpu,tune}=native on Linux/Sparc

2011-10-04 Thread Gerald Pfeifer
On Sun, 2 Oct 2011, David Miller wrote:
> We always use "Sparc" instead of "SPARC" for the cpu names output
> by the kernel in /proc/cpuinfo.

Ah, I see.  Thanks for the explanation.

What confused me a bit is that a bit earlier in the same table
there is one (and just one) entry using "SPARC":

  { "SuperSPARC",   "supersparc" }

I assume this is for historical reasons?

Gerald


Re: [wwwdocs] IA-32/x86-64 Changes for upcoming 4.7.0 series

2011-10-04 Thread Gerald Pfeifer
On Mon, 3 Oct 2011, H.J. Lu wrote:
> I checked it in.

Thanks, H.J.

This needed a small markup fix which I just applied; see below.

Gerald

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.7/changes.html,v
retrieving revision 1.43
diff -u -r1.43 changes.html
--- changes.html3 Oct 2011 19:38:23 -   1.43
+++ changes.html4 Oct 2011 09:27:15 -
@@ -407,9 +407,9 @@
 Support for new Intel rdrnd instruction is available via 
-mrdrnd.
 Two additional AVX vector conversion instructions are available via 
-mf16c.
 Support for new Intel processor codename IvyBridge with RDRND, 
FSGSBASE and F16C
-  is available through -march=core-avx-i.
+  is available through -march=core-avx-i.
 Support for the new Intel processor codename Haswell with AVX2, FMA, 
BMI,
-  BMI2, LZCNT is available through -march=core-avx2.
+  BMI2, LZCNT is available through -march=core-avx2.
 ...
   
 


Re: Stream cgraph order

2011-10-04 Thread Richard Guenther
On Mon, 3 Oct 2011, Jan Hubicka wrote:

> Hi,
> this patch makes us to stream out and stream in the order fields of cgraph
> correctly, so -fno-toplevel-reorder works within single compilation unit
> with -flto-partition=none.
> 
> This is currently needed to build kernel with LTO and it is useful otherwise
> (i.e. I made the patch originally for some experiments with Mozilla load 
> times)
> 
> Andi has patch to fix stream in code to stream in linker order that fixes
> the order with -flto-partition=none completely and I will followup with patch
> to make partitioning to honor the streaming order.
> 
> Boostrapped/regtested x86_64-linux, OK?

Ok.

Thanks,
Richard.

> 
> Honza
>   * lto-streamer.h (lto_input_toplevel_asms): Add order_base parameter.
>   * lto-streamer-in.c (lto_input_toplevel_asms): Stream in order.
>   * lto-streamer-out.c (lto_output_toplevel_asms): Stream out order.
>   * lto-cgraph.c (order_base): New static var.
>   (lto_output_node): Stream out order.
>   (lto_output_varpool_node): Stream out order.
>   (input_node): Stream in order.
>   (input_varpool_node): Stream in order.
>   (input_cgraph_1): Initialize order base; update call of
>   lto_input_toplevel_asms.
> 
> Index: lto-streamer.h
> ===
> --- lto-streamer.h(revision 179413)
> +++ lto-streamer.h(working copy)
> @@ -807,7 +807,7 @@ extern void lto_input_function_body (str
>const char *);
>  extern void lto_input_constructors_and_inits (struct lto_file_decl_data *,
> const char *);
> -extern void lto_input_toplevel_asms (struct lto_file_decl_data *);
> +extern void lto_input_toplevel_asms (struct lto_file_decl_data *, int);
>  extern struct data_in *lto_data_in_create (struct lto_file_decl_data *,
>   const char *, unsigned,
>   VEC(ld_plugin_symbol_resolution_t,heap) *);
> Index: lto-streamer-in.c
> ===
> --- lto-streamer-in.c (revision 179413)
> +++ lto-streamer-in.c (working copy)
> @@ -1144,7 +1144,7 @@ lto_input_tree (struct lto_input_block *
>  /* Input toplevel asms.  */
>  
>  void
> -lto_input_toplevel_asms (struct lto_file_decl_data *file_data)
> +lto_input_toplevel_asms (struct lto_file_decl_data *file_data, int 
> order_base)
>  {
>size_t len;
>const char *data = lto_get_section_data (file_data, LTO_section_asm,
> @@ -1173,7 +1173,12 @@ lto_input_toplevel_asms (struct lto_file
>header->lto_header.minor_version);
>  
>while ((str = streamer_read_string_cst (data_in, &ib)))
> -cgraph_add_asm_node (str);
> +{
> +  struct cgraph_asm_node *node = cgraph_add_asm_node (str);
> +  node->order = streamer_read_hwi (&ib) + order_base;
> +  if (node->order >= cgraph_order)
> + cgraph_order = node->order + 1;
> +}
>  
>clear_line_info (data_in);
>lto_data_in_delete (data_in);
> Index: lto-streamer-out.c
> ===
> --- lto-streamer-out.c(revision 179413)
> +++ lto-streamer-out.c(working copy)
> @@ -954,7 +954,10 @@ lto_output_toplevel_asms (void)
>streamer_write_char_stream (ob->string_stream, 0);
>  
>for (can = cgraph_asm_nodes; can; can = can->next)
> -streamer_write_string_cst (ob, ob->main_stream, can->asm_str);
> +{
> +  streamer_write_string_cst (ob, ob->main_stream, can->asm_str);
> +  streamer_write_hwi (ob, can->order);
> +}
>  
>streamer_write_string_cst (ob, ob->main_stream, NULL_TREE);
>  
> Index: lto-cgraph.c
> ===
> --- lto-cgraph.c  (revision 179413)
> +++ lto-cgraph.c  (working copy)
> @@ -54,6 +54,9 @@ static void input_cgraph_opt_summary (VE
>  /* Number of LDPR values known to GCC.  */
>  #define LDPR_NUM_KNOWN (LDPR_RESOLVED_DYN + 1)
>  
> +/* All node orders are offset by ORDER_BASE.  */
> +static int order_base;
> +
>  /* Cgraph streaming is organized as set of record whose type
> is indicated by a tag.  */
>  enum LTO_cgraph_tags
> @@ -425,6 +428,7 @@ lto_output_node (struct lto_simple_outpu
>  
>streamer_write_enum (ob->main_stream, LTO_cgraph_tags, LTO_cgraph_last_tag,
>  tag);
> +  streamer_write_hwi_stream (ob->main_stream, node->order);
>  
>/* In WPA mode, we only output part of the call-graph.  Also, we
>   fake cgraph node attributes.  There are two cases that we care.
> @@ -548,6 +552,7 @@ lto_output_varpool_node (struct lto_simp
>struct bitpack_d bp;
>int ref;
>  
> +  streamer_write_hwi_stream (ob->main_stream, node->order);
>lto_output_var_decl_index (ob->decl_state, ob->main_stream, node->decl);
>bp = bitpack_create (ob->main_stream);
>bp_pack_value (&bp, node->externally_visib

RE: [Ping] RE: CR16 Port addition

2011-10-04 Thread Jayant R. Sonar
PING 4: For review
Hi,
Can someone please review the patch:
http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01441.html

The previous discussion about the patch can be referred in the earlier 
communication below.

Thanks,
Jayant Sonar


= Begin Message ==
-Original Message-
From: Sumanth Gundapaneni
Sent: Monday, May 30, 2011 6:57 PM
To: 'Joseph Myers'
Cc: gcc-patches@gcc.gnu.org; r...@redhat.com; Jayant R. Sonar
Subject: RE: CR16 Port addition

Hi Joseph and Richard,

Thanks for your time in reviewing the CR16 port and once again I am grateful 
for your valuable suggestions.

Please find attached the patch "cr16-gcc.patch" which contains modifications 
as suggested by Joseph in his previous mail.

For your kind information, I am providing the recent posts regarding
CR16 patches.
Initial post : http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01676.html
Second post  : http://gcc.gnu.org/ml/gcc-patches/2011-01/msg00803.html
Third post   : http://gcc.gnu.org/ml/gcc-patches/2011-04/msg00624.html
Fourth post  : http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01441.html

Please review the patch and let me know if there should be any modifications in 
it.

Joseph, please go through my explanation for your comments.

>Please remove the remaining top level configure changes from the patch.

The top level configure changes have been removed as advised.


>Now you seem to have a stray change to the generated file
>libstdc++-v3/configure. Again, you can probably just eliminate this

The configure changes in libstdc++-v3 are removed too and you can find 
the same in the updated patch.


>and make the code in unwind-dw2-* use those macros, 
>instead of having separate copies of the files.

A new file unwind-helper.h has been created in the libgcc/config/cr16 
directory as per your suggestion; it defines a few macros which are called 
from gcc/unwind-dw2.c. Please review it and let me know of any further 
modifications. We have identified issues related to exception handling using 
prints in the unwind code and will debug them further in the near future as 
the GNU-based tools develop.


>Since you first submitted the port, a file contrib/config-list.mk

"cr16-elf" is added to config-list.mk file.


gcc/ChangeLog:
--
2011-05-28  Sumanth G 
Jayant R Sonar 

* config.gcc: Add cr16-* support.

* doc/extend.texi: Document cr16 extensions.
* doc/install.texi: Document cr16 install.
* doc/invoke.texi: Document cr16 options.
* doc/md.texi: Document cr16 constraints.

* config/cr16/cr16.c: New file.
* config/cr16/cr16.h: New file.
* config/cr16/cr16.md: New file.
* config/cr16/cr16.opt: New file.
* config/cr16/cr16-libgcc.s: New file.
* config/cr16/cr16-protos.h: New file.

* config/cr16/divmodhi3.c: New file.
* config/cr16/predicates.md: New file.
* config/cr16/constraints.md: New file.
* config/cr16/t-cr16: New file.

libgcc/ChangeLog:
-
2011-05-28  Sumanth G 
Jayant R Sonar 

* config.host: Add National Semiconductor CR16 target (cr16-*-*).
* config/cr16/crti.S: New file.
* config/cr16/crtlibid.S: New file.
* config/cr16/crtn.S: New file.
* config/cr16/t-cr16: New file.
* config/cr16/unwind-helper.h: New file.

contrib/ChangeLog:
--
2011-05-28  Sumanth G 
Jayant R Sonar 

* config-list.mk: Add National Semiconductor CR16 target.
   
Thanks in advance,
Sumanth G,
PUNE (India).

= End Message ==




Re: Vector Comparison patch

2011-10-04 Thread Georg-Johann Lay
Artem Shinkarov schrieb:
> On Fri, Sep 30, 2011 at 4:54 PM, Jakub Jelinek  wrote:
>> On Fri, Sep 30, 2011 at 04:48:41PM +0100, Artem Shinkarov wrote:
>>> Most likely we can. The question is what do we really want to check
>>> with this test. My intention was to check that a programmer can
>>> statically get correspondence of the types, in a sense that sizeof
>>> (float) == sizeof (int) and sizeof (double) == sizeof (long long). As
>>> it seems my original assumption does not hold. Before using __typeof,
>>> I would try to make sure that there is no other way to determine these
>>> correspondences.
>> You can use preprocessor too, either just surround the whole test
>> with #if __SIZEOF_INT__ == __SIZEOF_FLOAT__ and similar,
>> or select the right type through preprocessor
>> #if __SIZEOF_INT__ == __SIZEOF_FLOAT__
>> #define FLOATCMPTYPE int
>> #elif __SIZEOF_LONG__ == __SIZEOF_FLOAT__
>> #define FLOATCMPTYPE long
>> #else
>> ...
>> or __typeof, etc.
>>
>>Jakub
>>
> 
> Ok, here is a patch which uses __typeof. Passes on x86_64.
> 
> Artem.

The patch from
  http://gcc.gnu.org/ml/gcc-patches/2011-09/msg02060.html
  http://gcc.gnu.org/ml/gcc-patches/2011-09/txt00337.txt
works for me.

If it's ok from maintainer I can apply it for you.

Johann


Re: [PATCH] Look at restrict disambiguation in tree-ssa-alias.c unconditionally (PR tree-optimization/50522)

2011-10-04 Thread Jakub Jelinek
On Tue, Oct 04, 2011 at 11:01:27AM +0200, Richard Guenther wrote:
> > > > void foo (int *p)
> > > > {
> > > >   int * __restrict p1 = p;
> > > >   int * __restrict p2 = p + 32;
> > > >   int *q;
> > > >   int i;
> > > >   for (i = 0; i < 32; ++i)
> > > > p1[i] = p2[i];
> > > >   p = p1;
> > > >   q = p2 - 31;
> > > >   for (i = 0; i < 32; ++i)
> > > > p[i] = q[i];
> > > > }
> > > > 

> In the above first loop the restrict pointers p1 and p2 access
> distinct object pieces.  The second loop uses non-restrict qualified
> pointers p and q (that are based on the restrict variants p1 and p2
> though) to access overlapping pieces.  Is the second loop invalid
> because p and q are based on p1 and p2 even though they are not
> restrict qualified?

IMHO yes.  The standard doesn't seem to talk about the accesses being done
through the restricted pointer, but about accesses that are based on
the restricted pointer, and as soon as you access in the associated block
(here the foo function) some object through an lvalue whose address is
based on some restricted pointer and the value is modified by any means,
then all accesses to that object need to be done through something
based on that restricted pointer.

Jakub


Re: Vector Comparison patch

2011-10-04 Thread Jakub Jelinek
On Tue, Oct 04, 2011 at 11:32:37AM +0200, Georg-Johann Lay wrote:
> The patch from
>   http://gcc.gnu.org/ml/gcc-patches/2011-09/msg02060.html
>   http://gcc.gnu.org/ml/gcc-patches/2011-09/txt00337.txt
> works for me.
> 
> If it's ok from maintainer I can apply it for you.

It is fine with a suitable ChangeLog entry.

Jakub


Re: [PATCH] Look at restrict disambiguation in tree-ssa-alias.c unconditionally (PR tree-optimization/50522)

2011-10-04 Thread Richard Guenther
On Tue, 4 Oct 2011, Jakub Jelinek wrote:

> On Tue, Oct 04, 2011 at 11:01:27AM +0200, Richard Guenther wrote:
> > > > > void foo (int *p)
> > > > > {
> > > > >   int * __restrict p1 = p;
> > > > >   int * __restrict p2 = p + 32;
> > > > >   int *q;
> > > > >   int i;
> > > > >   for (i = 0; i < 32; ++i)
> > > > > p1[i] = p2[i];
> > > > >   p = p1;
> > > > >   q = p2 - 31;
> > > > >   for (i = 0; i < 32; ++i)
> > > > > p[i] = q[i];
> > > > > }
> > > > > 
> 
> > In the above first loop the restrict pointers p1 and p2 access
> > distinct object pieces.  The second loop uses non-restrict qualified
> > pointers p and q (that are based on the restrict variants p1 and p2
> > though) to access overlapping pieces.  Is the second loop invalid
> > because p and q are based on p1 and p2 even though they are not
> > restrict qualified?
> 
> IMHO yes.  The standard doesn't seem to talk about the accesses being done
> through the restricted pointer, but about accesses that are based on
> the restricted pointer, and as soon as you access in the associated block
> (here the foo function) some object through an lvalue whose address is
> based on some restricted pointer and the value is modified by any means,
> then all accesses to that object need to be done through something
> based on that restricted pointer.

So when I change the above to

 /*p = p;*/
 q = (p + 32) - 31;

then the code will be valid?  When I obfuscate that enough I
can get GCC CSEing p + 32 and thus effectively q will look
like it is based on p2.

Richard.


Re: [Patch] Support DEC-C extensions

2011-10-04 Thread Pedro Alves
On Monday 03 October 2011 21:23:43, Joseph S. Myers wrote:
> On Mon, 3 Oct 2011, Douglas Rupp wrote:
> 
> > On 9/30/2011 8:19 AM, Joseph S. Myers wrote:
> > > On Fri, 30 Sep 2011, Tristan Gingold wrote:
> > > 
> > > > If you prefer a target hook, I'm fine with that.  I will write such a
> > > > patch.
> > > > 
> > > > I don't think it must be restricted to system headers, as it is possible
> > > > that the user 'imports' such a function (and define it in one of VMS
> > > > favorite languages such as macro-32 or bliss).
> > > If it's not restricted to system headers, then probably the option is
> > > better than the target hook.
> > > 
> > I'm not sure I understand the reasoning here.  This seems fairly VMS 
> > specific
> > so what is the downside for a target hook and user written headers?
> 
> The language accepted by the compiler in the user's source code (as 
> opposed to in system headers) shouldn't depend on the target except for 
> certain well-defined areas such as target attributes and built-in 
> functions; behaving the same across different systems is an important 
> feature of GCC.  This isn't one of those areas of target-dependence; it's 
> generic syntax rather than e.g. exploiting a particular processor feature.

So I take it a `void foo(...);' declaration to mean the same thing
as an unprototyped `void foo();'?  (If so, I think that should be
explicit in the documentation).  That is, decc system headers are
probably declaring with:

extern void foo(...);

and then the function is defined somewhere with, say:

void foo(int a, int b, void *c)
{
  ...
}

Do we need to consider ABIs that have calling conventions that
treat unprototyped and varargs functions differently? (is there any?)

-- 
Pedro Alves


Re: Vector Comparison patch

2011-10-04 Thread Georg-Johann Lay
Jakub Jelinek schrieb:
> On Tue, Oct 04, 2011 at 11:32:37AM +0200, Georg-Johann Lay wrote:
>> The patch from
>>   http://gcc.gnu.org/ml/gcc-patches/2011-09/msg02060.html
>>   http://gcc.gnu.org/ml/gcc-patches/2011-09/txt00337.txt
>> works for me.
>>
>> If it's ok from maintainer I can apply it for you.
> 
> It is fine with a suitable ChangeLog entry.
> 
>   Jakub

It's here:

http://gcc.gnu.org/viewcvs?view=revision&revision=179497

Johann



Re: [PATCH] Look at restrict disambiguation in tree-ssa-alias.c unconditionally (PR tree-optimization/50522)

2011-10-04 Thread Jakub Jelinek
On Tue, Oct 04, 2011 at 11:55:17AM +0200, Richard Guenther wrote:
> > On Tue, Oct 04, 2011 at 11:01:27AM +0200, Richard Guenther wrote:
> > > > > > void foo (int *p)
> > > > > > {
> > > > > >   int * __restrict p1 = p;
> > > > > >   int * __restrict p2 = p + 32;
> > > > > >   int *q;
> > > > > >   int i;
> > > > > >   for (i = 0; i < 32; ++i)
> > > > > > p1[i] = p2[i];
> > > > > >   p = p1;
> > > > > >   q = p2 - 31;
> > > > > >   for (i = 0; i < 32; ++i)
> > > > > > p[i] = q[i];
> > > > > > }
> > > > > > 
> > 
> > > In the above first loop the restrict pointers p1 and p2 access
> > > distinct object pieces.  The second loop uses non-restrict qualified
> > > pointers p and q (that are based on the restrict variants p1 and p2
> > > though) to access overlapping pieces.  Is the second loop invalid
> > > because p and q are based on p1 and p2 even though they are not
> > > restrict qualified?
> > 
> > IMHO yes.  The standard doesn't seem to talk about the accesses being done
> > through the restricted pointer, but about accesses that are based on
> > the restricted pointer, and as soon as you access in the associated block
> > (here the foo function) some object through an lvalue whose address is
> > based on some restricted pointer and the value is modified by any means,
> > then all accesses to that object need to be done through something
> > based on that restricted pointer.
> 
> So when I change the above to
> 
>  /*p = p;*/
>  q = (p + 32) - 31;

void foo (int *p)
{
  int * __restrict p1 = p;
  int * __restrict p2 = p + 32;
  int *q;
  int i;
  for (i = 0; i < 32; ++i)
p1[i] = p2[i];
  q = (p + 32) - 31;
  for (i = 0; i < 32; ++i)
p[i] = q[i];
}

> then the code will be valid?  When I obfuscate that enough I
> can get GCC CSEing p + 32 and thus effectively q will look
> like it is based on p2.

The above is still invalid. p[0] through p[31] is modified and
is accessed both through lvalue whose address is based on p1 (p1[i])
and through lvalues whose address is not based on p1 (p[i] and
q[i] (the latter only for p[0] through p[30])).  If you take
the first loop out, it would be valid though.

Jakub


Re: [Patch] Support DEC-C extensions

2011-10-04 Thread Tristan Gingold

On Oct 4, 2011, at 11:59 AM, Pedro Alves wrote:

> On Monday 03 October 2011 21:23:43, Joseph S. Myers wrote:
>> On Mon, 3 Oct 2011, Douglas Rupp wrote:
>> 
>>> On 9/30/2011 8:19 AM, Joseph S. Myers wrote:
 On Fri, 30 Sep 2011, Tristan Gingold wrote:
 
> If you prefer a target hook, I'm fine with that.  I will write such a
> patch.
> 
> I don't think it must be restricted to system headers, as it is possible
> that the user 'imports' such a function (and define it in one of VMS
> favorite languages such as macro-32 or bliss).
 If it's not restricted to system headers, then probably the option is
 better than the target hook.
 
>>> I'm not sure I understand the reasoning here.  This seems fairly VMS 
>>> specific
>>> so what is the downside for a target hook and user written headers?
>> 
>> The language accepted by the compiler in the user's source code (as 
>> opposed to in system headers) shouldn't depend on the target except for 
>> certain well-defined areas such as target attributes and built-in 
>> functions; behaving the same across different systems is an important 
>> feature of GCC.  This isn't one of those areas of target-dependence; it's 
>> generic syntax rather than e.g. exploiting a particular processor feature.
> 
> So I take it a `void foo(...);' declaration to mean the same thing
> as an unprototyped `void foo();'?

No, that's not the case.  This is a full prototype.

>  (If so, I think that should be
> explicit in the documentation).  That is, decc system headers are
> probably declaring with:
> 
> extern void foo(...);
> 
> and then the function is defined somewhere with, say:
> 
> void foo(int a, int b, void *c)
> {
>  ...
> }

No.

> Do we need to consider ABIs that have calling conventions that
> treat unprototyped and varargs functions differently? (is there any?)

(Yes: x86-64)

Tristan.




[patch RFC,PR50038]

2011-10-04 Thread Ilya Tocar
Hi everyone,

This patch fixes PR 50038 (redundant zero extensions) by modifying the
implicit-zee pass to also remove unneeded zero extensions from QImode
to SImode.
There is a 6% improvement in the rgbyiqv test from the EEMBC 2.0
benchmark on x86-64.
I am not sure if this is the correct approach (to modify the
implicit-zee pass), so comments are welcome.
Also, if this approach is correct, we will need to enable the
implicit-zee pass on some new targets (for example, 32-bit x86).

It passes bootstrap and make check.

Here is a Changelog:

2011-09-27  Ilya Tocar  

* implicit-zee.c: Added 2011 to copyright.
(combine_set_zero_extend): Add QImode.
(merge_def_and_ze): Likewise.
(add_removable_zero_extend): Likewise.
(not_qi_to_si): New.
(make_defs_and_copies_lists): Add check for QImode.


zee.patch
Description: Binary data


Re: [Patch] Support DEC-C extensions

2011-10-04 Thread Gabriel Dos Reis
On Tue, Oct 4, 2011 at 4:59 AM, Pedro Alves  wrote:
> On Monday 03 October 2011 21:23:43, Joseph S. Myers wrote:
>> On Mon, 3 Oct 2011, Douglas Rupp wrote:
>>
>> > On 9/30/2011 8:19 AM, Joseph S. Myers wrote:
>> > > On Fri, 30 Sep 2011, Tristan Gingold wrote:
>> > >
>> > > > If you prefer a target hook, I'm fine with that.  I will write such a
>> > > > patch.
>> > > >
>> > > > I don't think it must be restricted to system headers, as it is 
>> > > > possible
>> > > > that the user 'imports' such a function (and define it in one of VMS
>> > > > favorite languages such as macro-32 or bliss).
>> > > If it's not restricted to system headers, then probably the option is
>> > > better than the target hook.
>> > >
>> > I'm not sure I understand the reasoning here.  This seems fairly VMS 
>> > specific
>> > so what is the downside for a target hook and user written headers?
>>
>> The language accepted by the compiler in the user's source code (as
>> opposed to in system headers) shouldn't depend on the target except for
>> certain well-defined areas such as target attributes and built-in
>> functions; behaving the same across different systems is an important
>> feature of GCC.  This isn't one of those areas of target-dependence; it's
>> generic syntax rather than e.g. exploiting a particular processor feature.
>
> So I take it a `void foo(...);' declaration to mean the same thing
> as an unprototyped `void foo();'?  (If so, I think that should be
> explicit in the documentation).  That is, decc system headers are
> probably declaring with:
>
> extern void foo(...);
>
> and then the function is defined somewhere with, say:
>
> void foo(int a, int b, void *c)
> {
>  ...
> }
>
> Do we need to consider ABIs that have calling conventions that
> treat unprototyped and varargs functions differently? (is there any?)

Could you elaborate on the equivalence of these declarations?


Re: [PATCH] Maintain order of LTO sections

2011-10-04 Thread Jan Hubicka
> From: Andi Kleen 
> 
> Currently when reading in LTO sections from ld -r files they can
> get randomly reordered based on hash tables and random IDs.
> This causes reordering later when the final code is generated and
> also makes crashes harder to reproduce.
> 
> This patch maintains explicit lists based on the input order and uses
> those lists to preserve that order when starting the rest of the
> LTO passes.
> 
> This is the first step to working -fno-toplevel-reorder for
> LTO. But this needs more changes because the LTO partitioner
> can still reorder.
> 
> This add two lists: one for the section and another one for
> the file_decl_datas. This is needed because the sections are
> walked twice through different data structures.
> 
> In addition some code becomes slightly cleaner because we don't need
> to pass state through abstract callbacks anymore, but
> can just use direct type safe calls.
> 
> Passes LTO bootstrap and testsuite on x86_64-linux. Ok?

Note that this is also needed to make lto-symtab decisions consistent with 
linker
decisions. So it makes a lot of sense to me, but I can't approve it.
I was running into funny cases where things got resolved differently on
openoffice, so hopefully this will solve it.


Honza
> 
> gcc/lto/:
> 
> 2011-10-02   Andi Kleen 
> 
>   * lto-object.c (lto_obj_add_section_data): Add list.
>   (lto_obj_add_section): Fill in list.
>   (ltoobj_build_section_table): Pass through list.
>   * lto.c (file_data_list): Declare.
>   (create_subid_section_table): Pass arguments directly.
>   Fill in list of file_datas.
>   (lwstate): Delete.
>   (lto_create_files_from_ids): Pass in direct arguments.
>   Don't maintain list.
>   (lto_file_read): Use explicit section and file data lists.
>   (lto_read_all_file_options): Pass in section_list.
>   * lto.h (lto_obj_build_section_table): Add list.
>   (lto_section_slot): Add next.
>   (lto_section_list): Declare.
> 
> diff --git a/gcc/lto/lto-object.c b/gcc/lto/lto-object.c
> index 3be58de..daf3bd0 100644
> --- a/gcc/lto/lto-object.c
> +++ b/gcc/lto/lto-object.c
> @@ -204,6 +204,8 @@ struct lto_obj_add_section_data
>htab_t section_hash_table;
>/* The offset of this file.  */
>off_t base_offset;
> +  /* List in linker order */
> +  struct lto_section_list *list;
>  };
>  
>  /* This is called for each section in the file.  */
> @@ -218,6 +220,7 @@ lto_obj_add_section (void *data, const char *name, off_t offset,
>char *new_name;
>struct lto_section_slot s_slot;
>void **slot;
> +  struct lto_section_list *list = loasd->list;
>  
>if (strncmp (name, LTO_SECTION_NAME_PREFIX,
>  strlen (LTO_SECTION_NAME_PREFIX)) != 0)
> @@ -228,12 +231,21 @@ lto_obj_add_section (void *data, const char *name, off_t offset,
>slot = htab_find_slot (section_hash_table, &s_slot, INSERT);
>if (*slot == NULL)
>  {
> -  struct lto_section_slot *new_slot = XNEW (struct lto_section_slot);
> +  struct lto_section_slot *new_slot = XCNEW (struct lto_section_slot);
>  
>new_slot->name = new_name;
>new_slot->start = loasd->base_offset + offset;
>new_slot->len = length;
>*slot = new_slot;
> +
> +  if (list != NULL)
> +{
> +  if (!list->first)
> +list->first = new_slot;
> +  if (list->last)
> +list->last->next = new_slot;
> +  list->last = new_slot;
> +}
>  }
>else
>  {
> @@ -248,7 +260,7 @@ lto_obj_add_section (void *data, const char *name, off_t offset,
> the start and size of each section in the .o file.  */
>  
>  htab_t
> -lto_obj_build_section_table (lto_file *lto_file)
> +lto_obj_build_section_table (lto_file *lto_file, struct lto_section_list *list)
>  {
>struct lto_simple_object *lo = (struct lto_simple_object *) lto_file;
>htab_t section_hash_table;
> @@ -261,6 +273,7 @@ lto_obj_build_section_table (lto_file *lto_file)
>gcc_assert (lo->sobj_r != NULL && lo->sobj_w == NULL);
>loasd.section_hash_table = section_hash_table;
>loasd.base_offset = lo->base.offset;
> +  loasd.list = list;
>errmsg = simple_object_find_sections (lo->sobj_r, lto_obj_add_section,
>   &loasd, &err);
>if (errmsg != NULL)
> diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
> index 778e33e..a77eeb4 100644
> --- a/gcc/lto/lto.c
> +++ b/gcc/lto/lto.c
> @@ -1052,6 +1052,12 @@ lto_resolution_read (splay_tree file_ids, FILE *resolution, lto_file *file)
>  }
>  }
>  
> +/* List of file_decl_datas */
> +struct file_data_list
> +  {
> +struct lto_file_decl_data *first, *last;
> +  };
> +
>  /* Is the name for a id'ed LTO section? */
>  
>  static int 
> @@ -1068,11 +1074,10 @@ lto_section_with_id (const char *name, unsigned HOST_WIDE_INT *id)
>  /* Create file_data of each sub file id */
>  
>  static int 
> -create_subid_section_table (void **slot, void *data)
> +create_subid_section_

Re: [PATCH] Look at restrict disambiguation in tree-ssa-alias.c unconditionally (PR tree-optimization/50522)

2011-10-04 Thread Richard Guenther
On Tue, 4 Oct 2011, Jakub Jelinek wrote:

> On Tue, Oct 04, 2011 at 11:55:17AM +0200, Richard Guenther wrote:
> > > On Tue, Oct 04, 2011 at 11:01:27AM +0200, Richard Guenther wrote:
> > > > > > > void foo (int *p)
> > > > > > > {
> > > > > > >   int * __restrict p1 = p;
> > > > > > >   int * __restrict p2 = p + 32;
> > > > > > >   int *q;
> > > > > > >   int i;
> > > > > > >   for (i = 0; i < 32; ++i)
> > > > > > > p1[i] = p2[i];
> > > > > > >   p = p1;
> > > > > > >   q = p2 - 31;
> > > > > > >   for (i = 0; i < 32; ++i)
> > > > > > > p[i] = q[i];
> > > > > > > }
> > > > > > > 
> > > 
> > > > In the above first loop the restrict pointers p1 and p2 access
> > > > distinct object pieces.  The second loop uses non-restrict qualified
> > > > pointers p and q (that are based on the restrict variants p1 and p2
> > > > though) to access overlapping pieces.  Is the second loop invalid
> > > > because p and q are based on p1 and p2 even though they are not
> > > > restrict qualified?
> > > 
> > > IMHO yes.  The standard doesn't seem to talk about the accesses being done
> > > through the restricted pointer, but about accesses that are based on
> > > the restricted pointer, and as soon as you access in the associated block
> > > (here the foo function) some object through an lvalue whose address is
> > > based on some restricted pointer and the value is modified by any means,
> > > then all accesses to that object need to be done through something
> > > based on that restricted pointer.
> > 
> > So when I change the above to
> > 
> >  /*p = p;*/
> >  q = (p + 32) - 31;
> 
> void foo (int *p)
> {
>   int * __restrict p1 = p;
>   int * __restrict p2 = p + 32;
>   int *q;
>   int i;
>   for (i = 0; i < 32; ++i)
> p1[i] = p2[i];
>   q = (p + 32) - 31;
>   for (i = 0; i < 32; ++i)
> p[i] = q[i];
> }
> 
> > then the code will be valid?  When I obfuscate that enough I
> > can get GCC CSEing p + 32 and thus effectively q will look
> > like it is based on p2.
> 
> The above is still invalid. p[0] through p[31] is modified and
> is accessed both through lvalue whose address is based on p1 (p1[i])
> and through lvalues whose address is not based on p1 (p[i] and
> q[i] (the latter only for p[0] through p[30])).  If you take
> the first loop out, it would be valid though.

So

  int *x;

> void foo (int *p)
> {
>   int * __restrict p1 = p;
>   int * __restrict p2 = p + 32;
>   int *q;
>   int i;
x = p2;
>   q = p + 32;
q = q - 31;
>   for (i = 0; i < 32; ++i)
> p[i] = q[i];
> }

would be valid and we'd rely on CSE not to replace q = p + 32
with q = p2 (ignoring the fact that for a miscompile we need
similar tricks for p1).  It doesn't do that at the moment
because we fold int * __restrict p2 = p + 32 to
((int * __restrict)p) + 32 and thus see

  p.0_4 = (int * restrict) p_2(D);
  p2_5 = p.0_4 + 128;

vs.

  q_6 = p_2(D) + 128;

but you are going to change that ;)

Richard.


Re: [PATCH] Maintain order of LTO sections

2011-10-04 Thread Richard Guenther
On Tue, 4 Oct 2011, Jan Hubicka wrote:

> > From: Andi Kleen 
> > 
> > Currently when reading in LTO sections from ld -r files they can
> > get randomly reordered based on hash tables and random IDs.
> > This causes reordering later when the final code is generated and
> > also makes crashes harder to reproduce.
> > 
> > This patch maintains explicit lists based on the input order and uses
> > those lists to preserve that order when starting the rest of the
> > LTO passes.
> > 
> > This is the first step to working -fno-toplevel-reorder for
> > LTO. But this needs more changes because the LTO partitioner
> > can still reorder.
> > 
> > This add two lists: one for the section and another one for
> > the file_decl_datas. This is needed because the sections are
> > walked twice through different data structures.
> > 
> > In addition some code becomes slightly cleaner because we don't need
> > to pass state through abstract callbacks anymore, but
> > can just use direct type safe calls.
> > 
> > Passes LTO bootstrap and testsuite on x86_64-linux. Ok?
> 
> Note that this is also needed to make lto-symtab decisions consistent with 
> linker
> decisions. So it makes a lot of sense to me, but I can't approve it.
> I was running into funny cases where things got resolved differently on
> openoffice, so hopefully this will solve it.

Well, I'm not sure we should jump through too many hoops to make
-flto work with -fno-toplevel-reorder.  Simply because I think nothing
defines any "toplevel order" for multiple object files.  So, I think
both ld and gcc/collect2 need to first document they will never
break toplevel order (what about ld with -ffunction-sections and
-gc-sections? Or -relax?)

Richard.

> 
> Honza
> > 
> > gcc/lto/:
> > 
> > 2011-10-02   Andi Kleen 
> > 
> > * lto-object.c (lto_obj_add_section_data): Add list.
> > (lto_obj_add_section): Fill in list.
> > (ltoobj_build_section_table): Pass through list.
> > * lto.c (file_data_list): Declare.
> > (create_subid_section_table): Pass arguments directly.
> > Fill in list of file_datas.
> > (lwstate): Delete.
> > (lto_create_files_from_ids): Pass in direct arguments.
> > Don't maintain list.
> > (lto_file_read): Use explicit section and file data lists.
> > (lto_read_all_file_options): Pass in section_list.
> > * lto.h (lto_obj_build_section_table): Add list.
> > (lto_section_slot): Add next.
> > (lto_section_list): Declare.
> > 
> > diff --git a/gcc/lto/lto-object.c b/gcc/lto/lto-object.c
> > index 3be58de..daf3bd0 100644
> > --- a/gcc/lto/lto-object.c
> > +++ b/gcc/lto/lto-object.c
> > @@ -204,6 +204,8 @@ struct lto_obj_add_section_data
> >htab_t section_hash_table;
> >/* The offset of this file.  */
> >off_t base_offset;
> > +  /* List in linker order */
> > +  struct lto_section_list *list;
> >  };
> >  
> >  /* This is called for each section in the file.  */
> > @@ -218,6 +220,7 @@ lto_obj_add_section (void *data, const char *name, off_t offset,
> >char *new_name;
> >struct lto_section_slot s_slot;
> >void **slot;
> > +  struct lto_section_list *list = loasd->list;
> >  
> >if (strncmp (name, LTO_SECTION_NAME_PREFIX,
> >strlen (LTO_SECTION_NAME_PREFIX)) != 0)
> > @@ -228,12 +231,21 @@ lto_obj_add_section (void *data, const char *name, off_t offset,
> >slot = htab_find_slot (section_hash_table, &s_slot, INSERT);
> >if (*slot == NULL)
> >  {
> > -  struct lto_section_slot *new_slot = XNEW (struct lto_section_slot);
> > +  struct lto_section_slot *new_slot = XCNEW (struct lto_section_slot);
> >  
> >new_slot->name = new_name;
> >new_slot->start = loasd->base_offset + offset;
> >new_slot->len = length;
> >*slot = new_slot;
> > +
> > +  if (list != NULL)
> > +{
> > +  if (!list->first)
> > +list->first = new_slot;
> > +  if (list->last)
> > +list->last->next = new_slot;
> > +  list->last = new_slot;
> > +}
> >  }
> >else
> >  {
> > @@ -248,7 +260,7 @@ lto_obj_add_section (void *data, const char *name, off_t offset,
> > the start and size of each section in the .o file.  */
> >  
> >  htab_t
> > -lto_obj_build_section_table (lto_file *lto_file)
> > +lto_obj_build_section_table (lto_file *lto_file, struct lto_section_list *list)
> >  {
> >struct lto_simple_object *lo = (struct lto_simple_object *) lto_file;
> >htab_t section_hash_table;
> > @@ -261,6 +273,7 @@ lto_obj_build_section_table (lto_file *lto_file)
> >gcc_assert (lo->sobj_r != NULL && lo->sobj_w == NULL);
> >loasd.section_hash_table = section_hash_table;
> >loasd.base_offset = lo->base.offset;
> > +  loasd.list = list;
> >errmsg = simple_object_find_sections (lo->sobj_r, lto_obj_add_section,
> > &loasd, &err);
> >if (errmsg != NULL)
> > diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
> > index 778e

[Committed] S/390: longlong.h fix for zarch

2011-10-04 Thread Andreas Krebbel
Hi,

the attached patch fixes the smul_ppmm and sdiv_qrnnd patterns for
zarch.  Underscores were missing in local variables.  DR referenced
the wrong operand and the source and result regs were swapped.

Tested on s390 with -mzarch.

Applied to 4.6 and mainline.

Bye,

-Andreas-


2011-10-04  Andreas Krebbel  

* longlong.h (smul_ppmm, sdiv_qrnnd): Add underscores to the local
variables.  Fix second operand of DR.  Swap inputs for sdiv_qrnnd.


Index: gcc/longlong.h
===================================================================
*** gcc/longlong.h.orig
--- gcc/longlong.h
*************** UDItype __umulsidi3 (USItype, USItype);
*** 365,387 ****
  #else
  #define smul_ppmm(xh, xl, m0, m1) \
    do { \
!     register SItype r0 __asm__ ("0"); \
!     register SItype r1 __asm__ ("1") = m0; \
  \
      __asm__ ("mr\t%%r0,%3" \
!              : "=r" (r0), "=r" (r1) \
!              : "r"  (r1),  "r" (m1)); \
!     (xh) = r0; (xl) = r1; \
    } while (0)
  #define sdiv_qrnnd(q, r, n1, n0, d) \
!   do { \
!     register SItype r0 __asm__ ("0") = n0; \
!     register SItype r1 __asm__ ("1") = n1; \
  \
!     __asm__ ("dr\t%%r0,%3" \
!              : "=r" (r0), "=r" (r1) \
!              : "r" (r0), "r" (r1), "r" (d)); \
!     (q) = r0; (r) = r1; \
    } while (0)
  #endif /* __zarch__ */
  #endif
--- 365,388 ----
  #else
  #define smul_ppmm(xh, xl, m0, m1) \
    do { \
!     register SItype __r0 __asm__ ("0"); \
!     register SItype __r1 __asm__ ("1") = (m0); \
  \
      __asm__ ("mr\t%%r0,%3" \
!              : "=r" (__r0), "=r" (__r1) \
!              : "r"  (__r1),  "r" (m1)); \
!     (xh) = __r0; (xl) = __r1; \
    } while (0)
+ 
  #define sdiv_qrnnd(q, r, n1, n0, d) \
!   do { \
!     register SItype __r0 __asm__ ("0") = (n1); \
!     register SItype __r1 __asm__ ("1") = (n0); \
  \
!     __asm__ ("dr\t%%r0,%4" \
!              : "=r" (__r0), "=r" (__r1) \
!              : "r" (__r0), "r" (__r1), "r" (d)); \
!     (q) = __r1; (r) = __r0; \
    } while (0)
  #endif /* __zarch__ */
  #endif


Re: Explicitly record which registers are inaccessible

2011-10-04 Thread Bernd Schmidt
On 09/25/11 19:16, Richard Sandiford wrote:
> The last bit is indirect, via a new HARD_REG_SET called operand_reg_set.
> And this set is the reason why I'm sending the patch now.  The MIPS16 port
> has always had a problem with the HI and LO registers: they can only be
> set by multiplication and division instructions, and only read by MFHI
> and MFLO.  Unlike normal MIPS, there are no MTHI and MTLO instructions.
[...]
 >  Now that we use pressure classes
> instead (a good thing), I'm finally going to try to fix this "properly".
> And that means (a) fixing HI and LO and (b) stopping them from being
> treated as register operands.  (b) is important because if we start
> out with this (valid) instruction before reload:

The only slightly nonobvious thing about this is that mfhi/mflo can't
have their operand represented using a register_operand. I haven't
looked; I assume that's the case. Ok.


Bernd


Re: [Patch] Support DEC-C extensions

2011-10-04 Thread Pedro Alves
On Tuesday 04 October 2011 11:16:30, Gabriel Dos Reis wrote:

> > Do we need to consider ABIs that have calling conventions that
> > treat unprototyped and varargs functions differently? (is there any?)
> 
> Could you elaborate on the equivalence of these declarations?

I expected that with:

  extern void foo();
  extern void bar(...);
  foo (1, 2, 0.3f, NULL, 5);
  bar (1, 2, 0.3f, NULL, 5);

the compiler would emit the same for both of those
calls (calling convention wise).  That is, for example,
on x86-64, %rax is set to 1 (number of floating point
parameters passed to the function in SSE registers) in
both cases.

But not to be equivalent at the source level, that is:

  extern void foo();
  extern void foo(int a);
  extern void bar(...);
  extern void bar(int a);

should be a "conflicting types for 'bar'" error in C.

-- 
Pedro Alves


[PATCH] Fix ICE with strlen optimization (PR tree-optimization/50604)

2011-10-04 Thread Jakub Jelinek
Hi!

The following testcase ICEs on the trunk, as strlen optimization was
assuming memcpy arguments will have expected type (size_type_node),
but they had ssizetype instead.  The following patch fixes it
both in the builtins.c folders that create memcpy and also in the
strlen pass to no longer assume that.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-10-04  Jakub Jelinek  

PR tree-optimization/50604
* builtins.c (fold_builtin_strcpy, fold_builtin_stpcpy,
fold_builtin_strncpy, fold_builtin_stxcpy_chk): Ensure
last argument to memcpy has size_type_node type instead of
ssizetype.
* tree-ssa-strlen.c (handle_builtin_memcpy): Use size_type_node
instead of TREE_TYPE (len) as type for newlen.

* gcc.dg/pr50604.c: New test.

--- gcc/builtins.c.jj   2011-08-26 18:41:44.0 +0200
+++ gcc/builtins.c  2011-10-04 09:50:11.0 +0200
@@ -8288,7 +8288,8 @@ fold_builtin_strcpy (location_t loc, tre
return NULL_TREE;
 }
 
-  len = size_binop_loc (loc, PLUS_EXPR, len, ssize_int (1));
+  len = fold_convert_loc (loc, size_type_node, len);
+  len = size_binop_loc (loc, PLUS_EXPR, len, build_int_cst (size_type_node, 1));
   return fold_convert_loc (loc, TREE_TYPE (TREE_TYPE (fndecl)),
   build_call_expr_loc (loc, fn, 3, dest, src, len));
 }
@@ -8319,7 +8320,9 @@ fold_builtin_stpcpy (location_t loc, tre
   if (!fn)
 return NULL_TREE;
 
-  lenp1 = size_binop_loc (loc, PLUS_EXPR, len, ssize_int (1));
+  lenp1 = size_binop_loc (loc, PLUS_EXPR,
+ fold_convert_loc (loc, size_type_node, len),
+ build_int_cst (size_type_node, 1));
   /* We use dest twice in building our expression.  Save it from
  multiple expansions.  */
   dest = builtin_save_expr (dest);
@@ -8375,6 +8378,8 @@ fold_builtin_strncpy (location_t loc, tr
   fn = implicit_built_in_decls[BUILT_IN_MEMCPY];
   if (!fn)
 return NULL_TREE;
+
+  len = fold_convert_loc (loc, size_type_node, len);
   return fold_convert_loc (loc, TREE_TYPE (TREE_TYPE (fndecl)),
   build_call_expr_loc (loc, fn, 3, dest, src, len));
 }
@@ -12127,7 +12132,9 @@ fold_builtin_stxcpy_chk (location_t loc,
  if (!fn)
return NULL_TREE;
 
- len = size_binop_loc (loc, PLUS_EXPR, len, ssize_int (1));
+ len = fold_convert_loc (loc, size_type_node, len);
+ len = size_binop_loc (loc, PLUS_EXPR, len,
+   build_int_cst (size_type_node, 1));
  return fold_convert_loc (loc, TREE_TYPE (TREE_TYPE (fndecl)),
   build_call_expr_loc (loc, fn, 4,
dest, src, len, size));
--- gcc/tree-ssa-strlen.c.jj2011-09-29 14:25:46.0 +0200
+++ gcc/tree-ssa-strlen.c   2011-10-04 09:52:50.0 +0200
@@ -1297,7 +1297,7 @@ handle_builtin_memcpy (enum built_in_fun
   if (si != NULL)
 newlen = si->length;
   else
-newlen = build_int_cst (TREE_TYPE (len), ~idx);
+newlen = build_int_cst (size_type_node, ~idx);
   oldlen = NULL_TREE;
   if (olddsi != NULL)
 {
--- gcc/testsuite/gcc.dg/pr50604.c.jj   2011-10-04 09:55:38.0 +0200
+++ gcc/testsuite/gcc.dg/pr50604.c  2011-10-04 09:53:39.0 +0200
@@ -0,0 +1,19 @@
+/* PR tree-optimization/50604 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include "strlenopt.h"
+
+typedef char T;
+extern const T s[];
+
+void
+foo (T *x)
+{
+  char *r = malloc (strlen (x));
+  strcpy (r, s);
+  strcat (r, x);
+  strcat (r, "/");
+}
+
+const T s[] = "abcdefghijklmnopq";

Jakub


Re: Avoid optimized out references to appear in lto symbol table

2011-10-04 Thread Richard Guenther
2011/10/2 Jan Hubicka :
>> > Index: tree-sra.c
>> > ===
>> > *** tree-sra.c  (revision 179423)
>> > --- tree-sra.c  (working copy)
>> > *** modify_function (struct cgraph_node *nod
>> > *** 4622,4627 
>> > --- 4622,4628 
>> >    VEC (cgraph_edge_p, heap) * redirect_callers = collect_callers_of_node 
>> > (node);
>> >
>> >    rebuild_cgraph_edges ();
>> > +   free_dominance_info (CDI_DOMINATORS);
>> >    pop_cfun ();
>> >    current_function_decl = NULL_TREE;
>>
>>
>> Extra change to tree-sra, not in ChangeLog. Is this hunk needed? Or
>> unrelated fix for something else?
>
> It is needed - cgraph_remove_function sanity-checks that dominance info is
> clear and it is leaking here.
> In the version I committed there is a changelog entry, so this must've been
> the next-to-last diff. Sorry
> for the confusion.

But we don't free dominance information when it is correct.  So please don't
do that.  Instead, remove the sanity check and simply discard dominance
information when you remove the function.

Thanks,
Richard.

> Honza
>
>>
>> Ciao!
>> Steven
>


Re: [Patch, Fortran] PR 35831: [F95] Shape mismatch check missing for dummy procedure argument

2011-10-04 Thread Mikael Morin
On Monday 03 October 2011 23:02:15 Janus Weil wrote:
> Hi all,
> 
> here is a patch for a rather long-standing PR. It continues my ongoing
> campaign of improving the checks for "procedure characteristics" (cf.
> F08 chapter 12.3), which are relevant for dummy procedures, procedure
> pointer assignments, overriding of type-bound procedures, etc.
> 
> This particular patch checks for the correct shape of array arguments,
> in a manner similar to the recently added check for the string length
> (PR 49638), namely via 'gfc_dep_compare_expr'.
> 
> The hardest thing about this PR was to find out what exactly the
> standard requires (cf. c.l.f. thread linked in comment #12): Only the
> shape of the argument has to match (i.e. upper minus lower bound), not
> the bounds themselves (no matter if the bounds are constant or not).
> 
> I also added a FIXME, in order to remind myself of adding the same
> check for function results soon.
> 
> The patch was regtested on x86_64-unknown-linux-gnu. Ok for trunk?
> 

The patch is basically OK.
Any reason to keep this disabled?

+ case -2:
+   /* FIXME: Implement a warning for this case.
+   gfc_warning ("Possible shape mismatch in argument '%s'",
+   s1->name);*/
+   break;
+

Mikael


Re: [RFC PATCH] restrict_based_on_field attribute

2011-10-04 Thread Michael Matz
Hi,

On Mon, 3 Oct 2011, Jakub Jelinek wrote:

> std::vector acts roughly as something having a restrict pointer field,
> i.e. two different std::vector objects will have the pointers pointing to
> a different array, unfortunately unlike e.g. std::valarray we have 3
> different pointers pointing into the array instead of 1, and we can't change
> it without breaking the ABI.  This patch adds an extension, where several
> pointer fields in the same structure can be marked as being a group for
> restrict purposes (so the ISO C99 restrict wording would for them not
> mandate that all accesses are made through pointers based on say _M_start,
> but through pointers based on any of the fields in the group (_M_start,
> _M_finish or _M_end_of_storage in the std::vector case).

Ugh, that's stretching our fragile-to-unsafe restrict support quite a lot.  
IMO beyond what we can reasonably be confident will not cause 
miscompilations.  Essentially what you're implicitly assuming is that the 
sources of restrict tags can ultimately be memory, and that this is 
somehow reliable.  It unfortunately isn't.

Keep in mind that the whole current restrict-tag support was designed to 
support Fortran's specific needs, where we can rely on the compiler 
generating code we most probably (!) won't miscompile later.  Further, we 
interpret some of the points where ISO C says nothing in our favor.  
Together this means that e.g. (intentionally _not_ ISO C, but sort of 
middle-end speak):

foo (struct s)
{ restrict int *p1 = s.p;
  restrict int *p2 = s.p;
  *p1 = 0; *p2 = 1; return *p1;
}

depends on optimization.  We either generate two restrict tags for p1 and 
p2 (this currently happens when there's no CSE of s.p), or we generate 
only one for p1 and p2 (that's what currently happens with CSE).

Note that both interpretations make sense.  One tag makes sense to make 
the above function work as the user expected (returning 1, not 0).  Two 
tags make sense that this case could be optimized:

foo (struct s)
{ restrict int *p1 = s.p;
  restrict int *p2 = s.p;
  ... accesses of p1[2*i] and p2[2*i+1] intermixed ...
  ... we expect to be able to disambiguate both sets purely because of 
  restrict tags ...
}

We will generate new tags when the source of the load is _not_ restrict 
qualified (conversions that introduce restrict generate new tags).  We will 
generate only one tag when the load _is_ restrict qualified _and_ memory 
CSE has eliminated one of the loads.

In any case, right now the middle end sees memory-loads of restrict 
qualified pointers as source of new restrict tags (and depends on 
optimizations if that isn't what is wanted).  This all works more or less 
for fortran because the array->data pointer usually is loaded only once 
for one array, or at least later memory optimizations come and optimize 
away superflous loads.  For fortran at least we currently _rely_ on such 
memory optimizations to take place in case there are multiple accesses to 
one array (as each of them should better have the same restrict tag).  
Luckily if we disable such optimizations we still can't write 
miscompiled testcases because we would as well disable the transformations 
that make use of the invalid restrict info.  But we certainly should be 
able to create fortran testcases where the restrict tags clearly are 
wrong with disabled memory optimizations.

So, loads generate new tags, and we rely on optimizations that this 
doesn't cause miscompilations via invalid disambiguations, which works 
fine for the restricted set of cases coming from the fortran frontend.

You introduce even more freedom to this mud.  Effectively:

foo (struct s)
{ restrict int *p1 = s.p;
  restrict int *p2 = s.q; // same restrict tag as s.p
}

You're giving the user rope to specify the comment part.  For 
this to not cause miscompilations you absolutely rely on that request to 
not be merely a hint.  The problem is that there's no guarantee that it 
isn't lost:

foo_abstract (restrict *p1, restrict *p2)
{ ... p1 and p2 have different tags ... }
foo { foo_abstract (s.p, s.q); }

or

user does pointer to member games:

foo (struct s)
{ restrict **pp1 = &s.p;
  restrict **pp2 = &s.q;
  ... mem clobber ...
  restrict *p1 = *pp1;
  restrict *p2 = *pp2;  // generate new tag for p2
}

Or our own lowering for MEM_EXPR:

foo (struct s)
{ restrict *p1 = MEM[s, 0];
  restrict *p2 = MEM[s, 4]; // new tag for p2
}

Granted, you're trying to look at the type of s to reconstruct the 
field_decl, but that might be misleading when enough obfuscations are 
present.  For fortran it's impossible to generate any of the breaking 
situations above.

All in all the restrict-tag support is extremely fragile when it comes to 
tags sourced from loads, and deep down even unsafe for generic cases, hence 
I don't think increasing its exposure in that generic direction is a 
good idea.
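
For concreteness, the well-defined end of the spectrum discussed above looks
like this in plain ISO C - two restrict pointers that genuinely address
disjoint storage, so the accesses may be disambiguated purely from the
qualifiers.  This is a minimal standalone sketch (my own illustration, not
GCC-internal code):

```c
#include <assert.h>

/* The safe restrict pattern: lo and hi promise to address disjoint
   storage for the duration of the call, so the two assignments in the
   loop body may be freely reordered based on the qualifiers alone.
   The problematic cases in the discussion above arise when both
   restrict pointers are derived from the *same* source, e.g. loaded
   from the same struct field.  */
void
scale_halves (int *restrict lo, int *restrict hi, int n)
{
  for (int i = 0; i < n; i++)
    {
      lo[i] *= 2;
      hi[i] *= 3;
    }
}
```

Calling scale_halves (a, a + 4, 4) on an 8-element array keeps the promise;
scale_halves (a, a + 1, 4) would break it and invoke undefined behavior -
exactly the kind of guarantee a restrict "group" attribute would have to
preserve through all the lowerings listed above.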


Ciao,
Michael.


Re: Explicitly record which registers are inaccessible

2011-10-04 Thread Richard Sandiford
Bernd Schmidt  writes:
> On 09/25/11 19:16, Richard Sandiford wrote:
>> The last bit is indirect, via a new HARD_REG_SET called operand_reg_set.
>> And this set is the reason why I'm sending the patch now.  The MIPS16 port
>> has always had a problem with the HI and LO registers: they can only be
>> set by multiplication and division instructions, and only read by MFHI
>> and MFLO.  Unlike normal MIPS, there are no MTHI and MTLO instructions.
> [...]
>  >  Now that we use pressure classes
>> instead (a good thing), I'm finally going to try to fix this "properly".
>> And that means (a) fixing HI and LO and (b) stopping them from being
>> treated as register operands.  (b) is important because if we start
>> out with this (valid) instruction before reload:
>
> The only slightly nonobvious thing about this is that mfhi/mflo can't
> have their operand represented using a register_operand. I haven't
> looked; I assume that's the case. Ok.

Right.  The follow-up MIPS patch (which I've been sitting on, I suppose
I should post it when I get home, sorry) removes HI and LO from the new
operand_reg_set and extends move_operand to explicitly allow LO.

A lot of the multiplication patterns need rejigging to expose LO early
when required, so it's not pretty...

Richard


[patch tree-optimization]: 1 of 2: Add normalization of bitwise-operations to tree-ssa-reassoc pass

2011-10-04 Thread Kai Tietz
Hello,

This patch (one of two) adds code to the tree-ssa-reassoc pass for
expanding packed bitwise-binary operations - like (X | Y) == 0, etc.
It also normalizes bitwise-not operations on bitwise-binary tree
chains - like ~(X | Y) -> ~X & ~Y.
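
The normalization relies on De Morgan's laws; the identities themselves can
be sanity-checked in isolation with plain C (illustrative only, unrelated to
the pass internals):

```c
/* De Morgan identities the pass uses to normalize bitwise-not over
   bitwise-binary chains: ~(X | Y) == ~X & ~Y and ~(X & Y) == ~X | ~Y.
   Returns nonzero iff both identities hold for the given inputs
   (they hold for all inputs).  */
int
check_demorgan (unsigned x, unsigned y)
{
  return (~(x | y) == (~x & ~y)) && (~(x & y) == (~x | ~y));
}
```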

ChangeLog

2011-10-04  Kai Tietz  

* tree-ssa-reassoc.c (build_and_add_sum): Add forwarder declaration
and add support for unary expressions.
(remove_stmt_chain): New function for removing a statement chain
recursively without checking for visited state.
(make_new_tmp_statement): New helper function to generate a
temporary register statement.
(expand_not_bitwise_binary): Helper for applying bitwise-not
on a tree chain.
(expand_cmp_ior): Helper for expanding a packed bitwise-binary
combined statement.
(break_up_bitwise_combined_stmt): New function.
(break_up_subtract_bb): Rename to
(break_up_expr_bb): this.
(do_reassoc): Call break_up_expr_bb instead of break_up_subtract_bb.


ChangeLog

2011-10-04  Kai Tietz  


* gcc.dg/tree-ssa/reassoc-30.c: New file.
* gcc.dg/tree-ssa/reassoc-31.c: New file.
* gcc.dg/tree-ssa/reassoc-26.c: New file.
* gcc.dg/tree-ssa/reassoc-27.c: New file.
* gcc.dg/tree-ssa/reassoc-28.c: New file.
* gcc.dg/tree-ssa/reassoc-29.c: New file.

Bootstrapped and regression tested for all languages (including Ada
and Obj-C++) on host x86_64-pc-linux-gnu.
Ok for apply?

Regards,
Kai

Index: gcc/gcc/tree-ssa-reassoc.c
===
--- gcc.orig/gcc/tree-ssa-reassoc.c
+++ gcc/gcc/tree-ssa-reassoc.c
@@ -44,6 +44,10 @@ along with GCC; see the file COPYING3.
 #include "params.h"
 #include "diagnostic-core.h"

+/* Forwarders.  */
+static gimple build_and_add_sum (tree, tree, tree, enum tree_code);
+static void remove_stmt_chain (tree);
+
 /*  This is a simple global reassociation pass.  It is, in part, based
 on the LLVM pass of the same name (They do some things more/less
 than we do, in different orders, etc).
@@ -51,7 +55,9 @@ along with GCC; see the file COPYING3.
 It consists of five steps:

 1. Breaking up subtract operations into addition + negate, where
-it would promote the reassociation of adds.
+it would promote the reassociation of adds.  Additionally breaking
+up combined expression made out of boolean-typed bitwise expressions
+for improving simplification.

 2. Left linearization of the expression trees, so that (A+B)+(C+D)
 becomes (((A+B)+C)+D), which is easier for us to rewrite later.
@@ -557,6 +563,412 @@ get_unary_op (tree name, enum tree_code
   return NULL_TREE;
 }

+/* Create a temporary register expression with type TYPE, tree code CODE, and
+   operands OP1 and OP2.  If REF_DEF is a valid gimple statement, we use its
+   location information for new generated temporary.
+   Function returns left-hand-side of new generated temporary register.  */
+
+static tree
+make_new_tmp_statement (tree type, enum tree_code code, tree op1, tree op2,
+   gimple ref_def)
+{
+  gimple sum;
+  tree tmpvar = create_tmp_reg (type, NULL);
+  add_referenced_var (tmpvar);
+  sum = build_and_add_sum (tmpvar, op1, op2, code);
+  if (ref_def)
+gimple_set_location (sum, gimple_location (ref_def));
+  return gimple_get_lhs (sum);
+}
+
+/* Perform on tree LHS with optional definition statement EXPR
+   the logic-not operation.  TYPE is of kind boolean.  */
+
+static tree
+expand_not_bitwise_binary (tree type, tree lhs, gimple expr)
+{
+  enum tree_code code = ERROR_MARK;
+  tree op1 = NULL, op2 = NULL;
+  gimple s1 = NULL, s2 = NULL;
+
+  if (TREE_CODE (lhs) == INTEGER_CST)
+return fold_build1 (BIT_NOT_EXPR, type, lhs);
+
+  if (expr && is_gimple_assign (expr))
+code = gimple_assign_rhs_code (expr);
+
+  /* If statement lhs isn't a single-use statement,
+ we don't want to modify it. So we can do only default-case
+ operation for it.  */
+  if (code != ERROR_MARK && !has_single_use (lhs))
+code = ERROR_MARK;
+
+  if (TREE_CODE_CLASS (code) == tcc_comparison
+  || code == BIT_XOR_EXPR
+  || code == BIT_AND_EXPR
+  || code == BIT_IOR_EXPR)
+{
+  op1 = gimple_assign_rhs1 (expr);
+  op2 = gimple_assign_rhs2 (expr);
+}
+  else if (code == BIT_NOT_EXPR)
+op1 = gimple_assign_rhs1 (expr);
+  else
+return make_new_tmp_statement (TREE_TYPE (lhs), BIT_NOT_EXPR, lhs,
+  NULL_TREE, expr);
+
+  /* ~(~X) -> X.  */
+  if (code == BIT_NOT_EXPR)
+return op1;
+
+  /* Invert comparison if possible, otherwise fall through to
+ default case.  */
+  else if (TREE_CODE_CLASS (code) == tcc_comparison)
+{
+  enum tree_code ncode;
+  tree op1type = TREE_TYPE (op1);
+
+  ncode = invert_tree_comparison (code,
+ HONOR_NANS (TYPE_MODE (op1type)));
+  if (ncode != ERROR_MARK)
+   ret

[patch tree-optimization]: 2 of 2: Add denormalization of bitwise-operations to tree-ssa-reassoc pass

2011-10-04 Thread Kai Tietz
Hello,

This patch (two of two) adds code to the tree-ssa-reassoc pass for
repropagating unpacked bitwise-binary operations - like
(X == 0) & (Y == 0), etc.
It also denormalizes bitwise-not operations on bitwise-binary tree
chains - e.g. ~X & ~Y -> ~(X | Y).
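
The core repropagation identity - merging comparisons against zero - can be
checked in isolation as plain C; this is just the arithmetic fact the pass
exploits, not its implementation:

```c
/* (A == 0) & (B == 0)  is equivalent to  (A | B) == 0, and
   (A != 0) | (B != 0)  is equivalent to  (A | B) != 0.
   Returns nonzero iff both equivalences hold for the given inputs
   (they hold for all inputs).  */
int
check_cmp_zero_merge (unsigned a, unsigned b)
{
  int packed_eq = ((a == 0) & (b == 0)) == ((a | b) == 0);
  int packed_ne = ((a != 0) | (b != 0)) == ((a | b) != 0);
  return packed_eq && packed_ne;
}
```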

ChangeLog

2011-10-04  Kai Tietz  

* tree-ssa-reassoc.c (walk_bitwise_stmt_elems): New helper
routine to build a vectored representation of a bitwise-binary tree chain.
(rebuild_vector_tree): Rebuild a tree chain from its vectored
representation.
(operate_bitwise_xor_stmt): Handle bitwise-xor denormalization.
(merge_bitwise_compares): Helper for rebuilding the denormalized
form of a list of comparisons against zero.
(operate_bitwise_stmt): Handle bitwise-binary denormalization.
(repropagate_bitwise): New static function.
(execute_reassoc): Use repropagate_bitwise.


ChangeLog

2011-10-04  Kai Tietz  

* gcc.dg/tree-ssa/reassoc-32.c: New file.
* gcc.dg/tree-ssa/reassoc-33.c: New file.
* gcc.dg/tree-ssa/reassoc-34.c: New file.
* gcc.dg/tree-ssa/reassoc-35.c: New file.

Bootstrapped and regression tested for all languages (including Ada
and Obj-C++) on host x86_64-pc-linux-gnu.
Ok for apply?

Regards,
Kai

Index: gcc/gcc/tree-ssa-reassoc.c
===
--- gcc.orig/gcc/tree-ssa-reassoc.c
+++ gcc/gcc/tree-ssa-reassoc.c
@@ -2658,6 +3113,648 @@ linearize_expr_tree (VEC(operand_entry_t
   add_to_ops_vec (ops, binrhs);
 }

+/* Split up a binary tree-chain of code CODE - starting at STMT - into four
+   different kinds:
+   - vector VCST stores constant values.
+   - vector VNOT stores bitwise-not expressions.
+   - vector VCMP_ZERO stores comparisons to zero.
+   - vector VEXRR stores the remaining rest. */
+
+static void
+walk_bitwise_stmt_elems (gimple stmt, enum tree_code code,
+VEC(tree, heap) **vcst,
+VEC(tree, heap) **vnot,
+VEC(tree, heap) **vcmp_zero,
+VEC(tree, heap) **vexpr)
+{
+  gimple s;
+  tree l;
+
+  l = gimple_assign_rhs1 (stmt);
+  if (TREE_CODE (l) == INTEGER_CST)
+VEC_safe_push (tree, heap, *vcst, l);
+  else if (TREE_CODE (l) != SSA_NAME
+  || (s = SSA_NAME_DEF_STMT (l)) == NULL
+  || !is_gimple_assign (s)
+  || !has_single_use (l))
+VEC_safe_push (tree, heap, *vexpr, l);
+  else if (gimple_assign_rhs_code (s) == code)
+walk_bitwise_stmt_elems (s, code, vcst, vnot, vcmp_zero, vexpr);
+  else if (gimple_assign_rhs_code (s) == BIT_NOT_EXPR)
+VEC_safe_push (tree, heap, *vnot, l);
+  /* (A == 0) & (B == 0) -> (A | B) == 0
+ (A != 0) | (B != 0) -> (A | B) != 0.  */
+  else if (((code == BIT_AND_EXPR
+&& gimple_assign_rhs_code (s) == EQ_EXPR)
+   || (code == BIT_IOR_EXPR
+   && gimple_assign_rhs_code (s) == NE_EXPR))
+  && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (s)))
+  && TREE_CODE (gimple_assign_rhs2 (s)) == INTEGER_CST
+  && integer_zerop (gimple_assign_rhs2 (s)))
+VEC_safe_push (tree, heap, *vcmp_zero, l);
+  else
+VEC_safe_push (tree, heap, *vexpr, l);
+
+  l = gimple_assign_rhs2 (stmt);
+
+  if (TREE_CODE (l) == INTEGER_CST)
+{
+  VEC_safe_push (tree, heap, *vcst, l);
+  return;
+}
+  if (TREE_CODE (l) != SSA_NAME
+  || (s = SSA_NAME_DEF_STMT (l)) == NULL
+  || !is_gimple_assign (s)
+  || !has_single_use (l))
+VEC_safe_push (tree, heap, *vexpr, l);
+  else if (gimple_assign_rhs_code (s) == code)
+walk_bitwise_stmt_elems (s, code, vcst, vnot, vcmp_zero, vexpr);
+  else if (gimple_assign_rhs_code (s) == BIT_NOT_EXPR)
+VEC_safe_push (tree, heap, *vnot, l);
+  /* (A == 0) & (B == 0) -> (A | B) == 0
+ (A != 0) | (B != 0) -> (A | B) != 0.  */
+  else if (((code == BIT_AND_EXPR
+&& gimple_assign_rhs_code (s) == EQ_EXPR)
+   || (code == BIT_IOR_EXPR
+   && gimple_assign_rhs_code (s) == NE_EXPR))
+  && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (s)))
+  && TREE_CODE (gimple_assign_rhs2 (s)) == INTEGER_CST
+  && integer_zerop (gimple_assign_rhs2 (s)))
+VEC_safe_push (tree, heap, *vcmp_zero, l);
+  else
+VEC_safe_push (tree, heap, *vexpr, l);
+}
+
+/* Helper function to rebuild a binary tree of CODE elements
+   from vector VEC.
+   If LASTP is NULL, then all elements are combined and the result is
+   returned.  Otherwise the last element of vector VEC is stored in LASTP
+   and all - but the last - elements are merged and returned.
+   Note: for vector with just one element, this element is returned
+   and LASTP is set to NULL, if provided.
+   If INNER_LEFT has value TRUE, then the RHS1 operand of VEC elements
+   are used for combining.  Otherwise the VEC elements itself are used.  */
+
+static tree
+rebuild_vector_tree (VEC(tree, heap) *vec,
+enum t

Re: [RFC] Builtin infrastructure change

2011-10-04 Thread Richard Guenther
On Tue, Oct 4, 2011 at 2:07 AM, Michael Meissner
 wrote:
> On Sat, Oct 01, 2011 at 02:11:27PM +, Joseph S. Myers wrote:
>> On Fri, 30 Sep 2011, Michael Meissner wrote:
>>
>> > Is this enough of a savings to continue on?  I'm of two minds about it, 
>> > one is
>>
>> The thing to measure is not so much memory as startup time (how long it
>> takes to compile an empty source file), which is important for libraries
>> and programs using a coding style with lots of small source files.
>
> With my current changes which has modified the standard builtins to be lazy,
> but I haven't yet done the machine dependent builtins, I saw a difference of
> 0.0022 seconds (0.0170 vs. 0.0148) on my 3 year old Intel core 2 laptop.  I 
> did
> 14 runs in total, and skipped the fastest 2 runs and slowest 2 runs, and then
> averaged the 10 runs in the middle.  I built bootstrap builds with release
> checking with the top of subversion head without changes and with my changes.
>
> So at this point, I'm wondering whether it is worth it to finish the
> optimization for lazy builtins.

I think it is.

+/* Return the tree node for a builtin function or NULL, indexing into the
+   array.  */
+static inline tree
+built_in_set_decl_add (enum built_in_function fncode, int addend)
+{
+  return built_in_decls ((enum built_in_function)(((int)fncode) + addend));
+}

this doesn't seem to "set" anything.  Mistake?

+static inline tree
+built_in_decls (enum built_in_function fncode)

I'd prefer singular, built_in_decl () instead of plural.

+static inline tree
+implicit_built_in_decls (enum built_in_function fncode)

Likewise.

+/* Initialize an implicit builtin function.  */
+static inline void
+built_in_set_implicit (enum built_in_function fncode, tree implicit)

and built_in_set_implicit_decl (or rather set_implicit_built_in_decl - hm,
I guess I'm now bikeshedding ;)).

The patch is ok with the built_in_set_decl_add mystery resolved
(I'd rather not have the function at all but perform the ugly add and casting
in the few callers -- it _is_ ugly to do that after all, so it might as well
look so ;)).

It would be nice to use singular/plural consistently (different currently
for set and get, I prefer singular).  The rest of the bikeshedding - oh well ;)

Thanks,
Richard.

> --
> Michael Meissner, IBM
> 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
> meiss...@linux.vnet.ibm.com     fax +1 (978) 399-6899
>


Re: [PATCH] Handle side-effects of EH optimizations of callees

2011-10-04 Thread Richard Guenther
On Tue, Oct 4, 2011 at 9:17 AM, Maxim Kuvyrkov  wrote:
> Richard,
>
> The following patch fixes a CFG consistency problem.
>
> When early IPA optimizations (e.g., early SRA) create a version of a function 
> that no longer throws, versioning machinery deletes EH landing pads in 
> _callers_ of the function [*].  However, the EH cfg edges are not updated in 
> the callers, which eventually leads to an ICE in verify_eh_edges.

It needs to update EH edges in the callers via the common idiom

  if (maybe_clean_or_replace_eh_stmt (stmt, stmt)
  && gimple_purge_dead_eh_edges (gimple_bb (stmt)))
... arrange for cfg-cleanup to be run ...

where the cfg-cleanup part probably explains why it doesn't.  So one option
is to not delete the landing pads.  Where does it do that?  I suppose it
merely fails to properly transfer the EH info from the old to the new call
(thus, fails to forcefully do what maybe_duplicate_eh_stmt does).

Richard.

> I've tried several ways to fix this, including trying to make early 
> optimizations not delete callers' landing pads.  You are the original author 
> of the gimple EH code, and I wonder if you know the rationale behind removing 
> EH landing pads from callers or a function being optimized.
>
> The failure triggers on several libstdc++ testcases, but requires an 
> unrelated patch of mine (soon to be posted).
>
> In any case, attached is what appears to be the best way to fix this up.
>
> Bootstrapped and regtested on x86_64-linux-gnu.  OK for trunk?
>
> Thank you,
>
> [*] this is done in cgraphunit.c: update_call_expr() and 
> ipa_modify_call_arguments() via maybe_clean[_and_replace]_eh_stmt().
>
> --
> Maxim Kuvyrkov
> CodeSourcery / Mentor Graphics
>
>
>


Re: [PATCH] Look at restrict disambiguation in tree-ssa-alias.c unconditionally (PR tree-optimization/50522)

2011-10-04 Thread Jakub Jelinek
On Tue, Oct 04, 2011 at 12:17:30PM +0200, Richard Guenther wrote:
>   int *x;
> 
> > void foo (int *p)
> > {
> >   int * __restrict p1 = p;
> >   int * __restrict p2 = p + 32;
> >   int *q;
> >   int i;
> x = p2;
> >   q = p + 32;
> q = q - 31;
> >   for (i = 0; i < 32; ++i)
> > p[i] = q[i];
> > }

Yes, this is valid and so is a modified version of the earlier
testcase where all accesses in the first loop are biased
(bar below, assuming y > 32 or y <= -32).

int *x;

void
foo (int *p)
{
  int *__restrict p1 = p;
  int *__restrict p2 = p + 32;
  int *q;
  int i;
  x = p2;
  q = p + 32;
  q = q - 31;
  for (i = 0; i < 32; ++i)
p[i] = q[i];
}

void
bar (int *p, int y)
{
  int *__restrict p1 = p;
  int *__restrict p2 = p + 32;
  int *q;
  int i;
  for (i = 0; i < 32; ++i)
p1[i + y] = p2[i + y];
  q = (p + 32) - 31;
  for (i = 0; i < 32; ++i)
p[i] = q[i];
}
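
For readers wanting to see what foo above actually computes: q ends up as
p + 1, so the ascending copy shifts the array left by one element, and none
of the accesses go through the restrict-qualified p1/p2, which is why the
function is valid.  A minimal standalone harness (the driver and the
33-element array size are my own, not from the thread):

```c
int *x;

/* foo, verbatim from above: p1 is never accessed through and p2 is
   only stored to x, so no access conflicts with the restrict
   qualifiers.  q == p + 1, and the ascending copy shifts p[0..31]
   left by one element.  */
void
foo (int *p)
{
  int *__restrict p1 = p;
  int *__restrict p2 = p + 32;
  int *q;
  int i;
  x = p2;
  q = p + 32;
  q = q - 31;
  for (i = 0; i < 32; ++i)
    p[i] = q[i];
}
```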

> 
> would be valid and we'd rely on CSE not to replace q = p + 32
> with q = p2 (ignoring the fact that for a miscompile we need
> similar tricks for p1).  It doesn't do that at the moment
> because we fold int * __restrict p2 = p + 32 to
> ((int * __restrict)p) + 32 and thus see
> 
>   p.0_4 = (int * restrict) p_2(D);
>   p2_5 = p.0_4 + 128;
> 
> vs.
> 
>   q_6 = p_2(D) + 128;
> 
> but you are going to change that ;)

But even with the "Restrict fixes" patch I've just checked in
and with the TYPE_RESTRICT check removal patch I don't see anything
wrong in the IL, the only thing that is PT (restr) is the stmt
computing p2, which is just stored into x and nothing else, and
in the second function only the first loop.

Jakub


Re: [PATCH] Maintain order of LTO sections

2011-10-04 Thread Andi Kleen
> Well, I'm not sure we should jump through too much hoops to make
> -flto work with -fno-toplevel-reorder.  Simply because I think nothing
> defines any "toplevel order" for multiple object files.  So, I think

In practice it seems to work because real programs rely on it.

I can just say with this patch and Honza's my program works
(with -flto-partition=none), without it it doesn't.

Also as Honza pointed out it has other benefits, like making
compiles more reproducible. For example if you have a memory corruption
somewhere the random order currently will randomly move it from
run to run and make it harder to debug.

> both ld and gcc/collect2 need to first document they will never
> break toplevel order (what about ld with -ffunction-sections and
> -gc-sections? Or -relax?)

-ffunction-sections and -gc-sections do not reorder as far as I know.
Not sure about -relax.

But programs that rely on ordering should follow "don't do it when it hurts"
and not set overly magic options. I'm sure there are other creative ways to
break the order too, but in practice it doesn't seem too hard to avoid
them.


-Andi


Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.

2011-10-04 Thread Richard Guenther
On Tue, Oct 4, 2011 at 9:43 AM, Tom de Vries  wrote:
> On 10/01/2011 05:46 PM, Tom de Vries wrote:
>> On 09/30/2011 03:29 PM, Richard Guenther wrote:
>>> On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries  
>>> wrote:
 On 09/28/2011 11:53 AM, Richard Guenther wrote:
> On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries  
> wrote:
>> Richard,
>>
>> I got a patch for PR50527.
>>
>> The patch prevents the alignment of vla-related allocas to be set to
>> BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after 
>> folding
>> the alloca.
>>
>> Bootstrapped and regtested on x86_64.
>>
>> OK for trunk?
>
> Hmm.  As gfortran with -fstack-arrays uses VLAs it's probably bad that
> the vectorizer then will no longer see that the arrays are properly 
> aligned.
>
> I'm not sure what the best thing to do is here, other than trying to 
> record
> the alignment requirement of the VLA somewhere.
>
> Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT
> has the issue that it will force stack-realignment which isn't free (and 
> the
> point was to make the decl cheaper than the alloca).  But that might
> possibly be the better choice.
>
> Any other thoughts?

 How about the approach in this (untested) patch? Using the DECL_ALIGN of 
 the vla
 for the new array prevents stack realignment for folded vla-allocas, also 
 for
 large vlas.

 This will not help in vectorizing large folded vla-allocas, but I think 
 it's not
 reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that 
 has
 been the case up until we started to fold). If you want to trigger 
 vectorization
 for a vla, you can still use the aligned attribute on the declaration.

 Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also without 
 using
 an attribute on the decl. This patch exploits this by setting it at the 
 end of
 the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective in
 propagation though, because although the ptr_info of the lhs is propagated 
 via
 copy_prop afterwards, it's not propagated anymore via ccp.

 Another way to do this would be to set BIGGEST_ALIGNMENT at the end of 
 ccp2 and
 not fold during ccp3.
>>>
>>> Ugh, somehow I like this the least ;)
>>>
>>> How about lowering VLAs to
>>>
>>>   p = __builtin_alloca (...);
>>>   p = __builtin_assume_aligned (p, DECL_ALIGN (vla));
>>>
>>> and not assume anything for alloca itself if it feeds a
>>> __builtin_assume_aligned?
>>>
>>> Or rather introduce a __builtin_alloca_with_align () and for VLAs do
>>>
>>>  p = __builtin_alloca_with_align (..., DECL_ALIGN (vla));
>>>
>>> that's less awkward to use?
>>>
>>> Sorry for not having a clear plan here ;)
>>>
>>
>> Using assume_aligned is a more orthogonal way to represent this in gimple, 
>> but
>> indeed harder to use.
>>
>> Another possibility is to add a 'tree vla_decl' field to struct
>> gimple_statement_call, which is probably the easiest to implement.
>>
>> But I think __builtin_alloca_with_align might have a use beyond vlas, so I
>> decided to try this one. Attached patch implements my first stab at this  
>> (now
>> testing on x86_64).
>>
>> Is this an acceptable approach?
>>
>
> bootstrapped and reg-tested (including ada) on x86_64.
>
> Ok for trunk?

The idea is ok I think.  But

 case BUILT_IN_ALLOCA:
+case BUILT_IN_ALLOCA_WITH_ALIGN:
   /* If the allocation stems from the declaration of a variable-sized
 object, it cannot accumulate.  */
   target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp));
   if (target)
return target;
+  if (DECL_FUNCTION_CODE (get_callee_fndecl (exp))
+ == BUILT_IN_ALLOCA_WITH_ALIGN)
+   {
+ tree new_call = build_call_expr_loc (EXPR_LOCATION (exp),
+  built_in_decls[BUILT_IN_ALLOCA],
+  1, CALL_EXPR_ARG (exp, 0));
+ CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp);
+ exp = new_call;
+   }

Ick.  Why can't the rest of the compiler handle
BUILT_IN_ALLOCA_WITH_ALIGN the same as BUILT_IN_ALLOCA?
(thus, arrange things so the assembler name of alloca-with-align is alloca?)

I don't see why you still need the special late CCP pass.

-static tree
-fold_builtin_alloca_for_var (gimple stmt)
+static bool
+builtin_alloca_with_align_p (gimple stmt)
 {
-  unsigned HOST_WIDE_INT size, threshold, n_elem;
-  tree lhs, arg, block, var, elem_type, array_type;
-  unsigned int align;
+  tree fndecl;
+  if (!is_gimple_call (stmt))
+return false;
+
+  fndecl = gimple_call_fndecl (stmt);
+
+  return (fndecl != NULL_TREE
+ && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN);
+}

equivalent to gimple_call_builtin_p (stmt, BUILT_IN_ALL

Re: [PATCH] Look at restrict disambiguation in tree-ssa-alias.c unconditionally (PR tree-optimization/50522)

2011-10-04 Thread Richard Guenther
On Tue, 4 Oct 2011, Jakub Jelinek wrote:

> On Tue, Oct 04, 2011 at 12:17:30PM +0200, Richard Guenther wrote:
> >   int *x;
> > 
> > > void foo (int *p)
> > > {
> > >   int * __restrict p1 = p;
> > >   int * __restrict p2 = p + 32;
> > >   int *q;
> > >   int i;
> > x = p2;
> > >   q = p + 32;
> > q = q - 31;
> > >   for (i = 0; i < 32; ++i)
> > > p[i] = q[i];
> > > }
> 
> Yes, this is valid and so is a modified version of the earlier
> testcase where all accesses in the first loop are biased
> (bar below, assuming y > 32 or y <= -32).
> 
> int *x;
> 
> void
> foo (int *p)
> {
>   int *__restrict p1 = p;
>   int *__restrict p2 = p + 32;
>   int *q;
>   int i;
>   x = p2;
>   q = p + 32;
>   q = q - 31;
>   for (i = 0; i < 32; ++i)
> p[i] = q[i];
> }
> 
> void
> bar (int *p, int y)
> {
>   int *__restrict p1 = p;
>   int *__restrict p2 = p + 32;
>   int *q;
>   int i;
>   for (i = 0; i < 32; ++i)
> p1[i + y] = p2[i + y];
>   q = (p + 32) - 31;
>   for (i = 0; i < 32; ++i)
> p[i] = q[i];
> }
> 
> > 
> > would be valid and we'd rely on CSE not to replace q = p + 32
> > with q = p2 (ignoring the fact that for a miscompile we need
> > similar tricks for p1).  It doesn't do that at the moment
> > because we fold int * __restrict p2 = p + 32 to
> > ((int * __restrict)p) + 32 and thus see
> > 
> >   p.0_4 = (int * restrict) p_2(D);
> >   p2_5 = p.0_4 + 128;
> > 
> > vs.
> > 
> >   q_6 = p_2(D) + 128;
> > 
> > but you are going to change that ;)
> 
> But even with the "Restrict fixes" patch I've just checked in
> and with the TYPE_RESTRICT check removal patch I don't see anything
> wrong in the IL, the only thing that is PT (restr) is the stmt
> computing p2, which is just stored into x and nothing else, and
> in the second function only the first loop.

But that's by pure luck and not by design, no?

Richard.


Re: [PATCH] Maintain order of LTO sections

2011-10-04 Thread Richard Guenther
On Tue, 4 Oct 2011, Andi Kleen wrote:

> > Well, I'm not sure we should jump through too much hoops to make
> > -flto work with -fno-toplevel-reorder.  Simply because I think nothing
> > defines any "toplevel order" for multiple object files.  So, I think
> 
> In practice it seems to work because real programs rely on it.
> 
> I can just say with this patch and Honza's my program works
> (with -flto-partition=none), without it it doesn't.
> 
> Also as Honza pointed out it has other benefits, like making
> compiles more reproducible. For example if you have a memory corruption
> somewhere the random order currently will randomly move it from
> run to run and make it harder to debug.
> 
> > both ld and gcc/collect2 need to first document they will never
> > break toplevel order (what about ld with -ffunction-sections and
> > -gc-sections? Or -relax?)
> 
> -ffunction-sections and -gc-sections do not reorder as far as I know.
> Not sure about -relax.
> 
> But the programs that rely on order should "don't do it when it hurts"
> so not set too magic options. I'm sure there are other creative ways to break
> order too, but it doesn't seem to be too hard in practice to avoid
> them.

Sure, the question is if "-flto" counts as magic and thus
"don't do it when it hurts" ;))  I suppose with -flto-partition=none
(or 1to1) it would be reasonable to make -fno-toplevel-reorder work
(and thus maybe -fno-toplevel-reorder should simply force 1to1 
partitioning).

Richard.


Re: [PATCH] Maintain order of LTO sections

2011-10-04 Thread Andi Kleen
On Tue, Oct 04, 2011 at 03:08:02PM +0200, Richard Guenther wrote

> Sure, the question is if "-flto" counts as magic and thus
> "don't do it when it hurts" ;))  I suppose with -flto-partition=none
> (or 1to1) it would be reasonable to make -fno-toplevel-reorder work
> (and thus maybe -fno-toplevel-reorder should simply force 1to1 
> partitioning).

At least =none is incredibly slow for larger programs. I would really
prefer normal whopr, otherwise at some point LTO becomes impractical
due to the extreme compile time penalty. I can use it right now as 
a workaround, but it's not a long term solution.

What I really want is not full -fno-toplevel-reorder anyways, but just
marking a few special variables with a special attribute to not
reorder. But this patch is still needed even for that and likely a good 
idea anyways for the other reasons.

-Andi


Re: [PATCH] Fix ICE with strlen optimization (PR tree-optimization/50604)

2011-10-04 Thread Richard Guenther
On Tue, Oct 4, 2011 at 1:00 PM, Jakub Jelinek  wrote:
> Hi!
>
> The following testcase ICEs on the trunk, as strlen optimization was
> assuming memcpy arguments will have expected type (size_type_node),
> but they had ssizetype instead.  The following patch fixes it
> both in the builtins.c folders that create memcpy and also in the
> strlen pass to no longer assume that.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2011-10-04  Jakub Jelinek  
>
>        PR tree-optimization/50604
>        * builtins.c (fold_builtin_strcpy, fold_builtin_stpcpy,
>        fold_builtin_strncpy, fold_builtin_stxcpy_chk): Ensure
>        last argument to memcpy has size_type_node type instead of
>        ssizetype.
>        * tree-ssa-strlen.c (handle_builtin_memcpy): Use size_type_node
>        instead of TREE_TYPE (len) as type for newlen.
>
>        * gcc.dg/pr50604.c: New test.
>
> --- gcc/builtins.c.jj   2011-08-26 18:41:44.0 +0200
> +++ gcc/builtins.c      2011-10-04 09:50:11.0 +0200
> @@ -8288,7 +8288,8 @@ fold_builtin_strcpy (location_t loc, tre
>        return NULL_TREE;
>     }
>
> -  len = size_binop_loc (loc, PLUS_EXPR, len, ssize_int (1));
> +  len = fold_convert_loc (loc, size_type_node, len);
> +  len = size_binop_loc (loc, PLUS_EXPR, len, build_int_cst (size_type_node, 
> 1));
>   return fold_convert_loc (loc, TREE_TYPE (TREE_TYPE (fndecl)),
>                           build_call_expr_loc (loc, fn, 3, dest, src, len));
>  }
> @@ -8319,7 +8320,9 @@ fold_builtin_stpcpy (location_t loc, tre
>   if (!fn)
>     return NULL_TREE;
>
> -  lenp1 = size_binop_loc (loc, PLUS_EXPR, len, ssize_int (1));
> +  lenp1 = size_binop_loc (loc, PLUS_EXPR,
> +                         fold_convert_loc (loc, size_type_node, len),
> +                         build_int_cst (size_type_node, 1));
>   /* We use dest twice in building our expression.  Save it from
>      multiple expansions.  */
>   dest = builtin_save_expr (dest);
> @@ -8375,6 +8378,8 @@ fold_builtin_strncpy (location_t loc, tr
>   fn = implicit_built_in_decls[BUILT_IN_MEMCPY];
>   if (!fn)
>     return NULL_TREE;
> +
> +  len = fold_convert_loc (loc, size_type_node, len);
>   return fold_convert_loc (loc, TREE_TYPE (TREE_TYPE (fndecl)),
>                           build_call_expr_loc (loc, fn, 3, dest, src, len));
>  }
> @@ -12127,7 +12132,9 @@ fold_builtin_stxcpy_chk (location_t loc,
>              if (!fn)
>                return NULL_TREE;
>
> -             len = size_binop_loc (loc, PLUS_EXPR, len, ssize_int (1));
> +             len = fold_convert_loc (loc, size_type_node, len);
> +             len = size_binop_loc (loc, PLUS_EXPR, len,
> +                                   build_int_cst (size_type_node, 1));
>              return fold_convert_loc (loc, TREE_TYPE (TREE_TYPE (fndecl)),
>                                       build_call_expr_loc (loc, fn, 4,
>                                                        dest, src, len, size));
> --- gcc/tree-ssa-strlen.c.jj    2011-09-29 14:25:46.0 +0200
> +++ gcc/tree-ssa-strlen.c       2011-10-04 09:52:50.0 +0200
> @@ -1297,7 +1297,7 @@ handle_builtin_memcpy (enum built_in_fun
>   if (si != NULL)
>     newlen = si->length;
>   else
> -    newlen = build_int_cst (TREE_TYPE (len), ~idx);
> +    newlen = build_int_cst (size_type_node, ~idx);
>   oldlen = NULL_TREE;
>   if (olddsi != NULL)
>     {
> --- gcc/testsuite/gcc.dg/pr50604.c.jj   2011-10-04 09:55:38.0 +0200
> +++ gcc/testsuite/gcc.dg/pr50604.c      2011-10-04 09:53:39.0 +0200
> @@ -0,0 +1,19 @@
> +/* PR tree-optimization/50604 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#include "strlenopt.h"
> +
> +typedef char T;
> +extern const T s[];
> +
> +void
> +foo (T *x)
> +{
> +  char *r = malloc (strlen (x));
> +  strcpy (r, s);
> +  strcat (r, x);
> +  strcat (r, "/");
> +}
> +
> +const T s[] = "abcdefghijklmnopq";
>
>        Jakub
>


Re: [RFC PATCH] restrict_based_on_field attribute

2011-10-04 Thread Richard Guenther
On Tue, Oct 4, 2011 at 1:55 PM, Michael Matz  wrote:
> Hi,
>
> On Mon, 3 Oct 2011, Jakub Jelinek wrote:
>
>> std::vector acts roughly as something having a restrict pointer field,
>> i.e. two different std::vector objects will have the pointers pointing to
>> a different array, unfortunately unlike e.g. std::valarray we have 3
>> different pointers pointing into the array instead of 1, and we can't change
>> it without breaking the ABI.  This patch adds an extension, where several
>> pointer fields in the same structure can be marked as being a group for
>> restrict purposes (so the ISO C99 restrict wording would for them not
>> mandate that all accesses are made through pointers based on say _M_start,
>> but through pointers based on any of the fields in the group (_M_start,
>> _M_finish or _M_end_of_storage in the std::vector case).
>
> Ugh, that's stretching our fragile to unsafe restrict support quite much.
> IMO beyond what we can reasonably be confident to not cause
> miscompilations.  Essentially what you're implicitly assuming is that the
> sources of restrict tags can ultimately be memory, and that this is
> somehow reliable.  It unfortunately isn't.
>
> Keep in mind that the whole current restrict-tag support was designed to
> support fortran's specific needs where we can rely on the compiler
> generating code we most probably (!) won't miscompile later.  Further we
> interpret some of the part where ISO-C says nothing in our favor.
> Together this means that e.g. (intentionally _not_ ISO-C, but sort of
> middle-end speak):
>
> foo (struct s)
> { restrict int *p1 = s.p;
>  restrict int *p2 = s.p;
>  *p1 = 0; *p2 = 1; return *p1;
> }
>
> depends on optimization.  We either generate two restrict tags for p1 and
> p2 (this currently happens when there's no CSE of s.p), or we generate
> only one for p1 and p2 (that's what currently happens with CSE).
>
> Note that both interpretations make sense.  One tag makes sense to make
> the above function work as the user expected (returning 1, not 0).  Two
> tags make sense that this case could be optimized:
>
> foo (struct s)
> { restrict int *p1 = s.p;
>  restrict int *p2 = s.p;
>  ... accesses of p1[2*i] and p2[2*i+1] intermixed ...
>  ... we expect to be able to disambiguate both sets purely because of
>      restrict tags ...
> }
>
> We will generate new tags when the source of the load is _not_ restrict
> qualified (conversion that introduce restrict generate new tags).  We will
> generate only one tag when the load _is_ restrict qualified _and_ memory
> CSE has eliminated one of the loads.
>
> In any case, right now the middle end sees memory-loads of restrict
> qualified pointers as source of new restrict tags (and depends on
> optimizations if that isn't what is wanted).  This all works more or less
> for fortran because the array->data pointer usually is loaded only once
> for one array, or at least later memory optimizations come and optimize
> away superfluous loads.  For fortran at least we currently _rely_ on such
> memory optimizations to take place in case there are multiple accesses to
> one array (as each of them should better have the same restrict tag).
> Luckily if we disable such optimizations we still can't write
> miscompiled testcases because we would as well disable the transformations
> that make use of the invalid restrict info.  But we certainly should be
> able to create fortran testcases where the restrict tags clearly are
> wrong with disabled memory optimizations.
>
> So, loads generate new tags, and we rely on optimizations that this
> doesn't cause miscompilations via invalid disambiguations, which works
> fine for the restricted set of cases coming from the fortran frontend.
>
> You introduce even more freedom to this mud.  Effectively:
>
> foo (struct s)
> { restrict int *p1 = s.p;
>  restrict int *p2 = s.q; // same restrict tag as s.p
> }
>
> You're giving the user rope to specify the comment part.  For
> this to not cause miscompilations you absolutely rely on that request to
> not be merely a hint.  The problem is that there's no guarantee that it
> isn't lost:
>
> foo_abstract (restrict *p1, restrict *p2)
> { ... p1 and p2 have different tags ... }
> foo { foo_abstract (s.p, s.q); }
>
> or
>
> user does pointer to member games:
>
> foo (struct s)
> { restrict **pp1 = &s.p;
>  restrict **pp2 = &s.q;
>  ... mem clobber ...
>  restrict *p1 = *pp1;
>  restrict *p2 = *pp2;  // generate new tag for p2
> }
>
> Or our own lowering for MEM_EXPR:
>
> foo (struct s)
> { restrict *p1 = MEM[s, 0];
>  restrict *p2 = MEM[s, 4]; // new tag for p2
> }
>
> Granted, you're trying to look at the type of s to reconstruct the
> field_decl, but that might be misleading when enough obfuscations are
> present.  For fortran it's impossible to generate any of the breaking
> situations above.
>
> All in all the restrict tag support is extremely fragile when it comes to
> tag source from loads, and deep down even unsafe for generic cases, hence
> I

Re: [RFC PATCH] restrict_based_on_field attribute

2011-10-04 Thread Jakub Jelinek
On Tue, Oct 04, 2011 at 01:55:27PM +0200, Michael Matz wrote:
> Ugh, that's stretching our fragile to unsafe restrict support quite much.  
> IMO beyond what we can reasonably be confident to not cause 
> miscompilations.  Essentially what you're implicitly assuming is that the 
> sources of restrict tags can ultimately be memory, and that this is 
> somehow reliable.  It unfortunately isn't.

If it causes miscompilations, then GCC is buggy and needs to be fixed.
restrict structure fields are not some kind of extension, they are in ISO
C99 (see 6.7.3/12).

> Keep in mind that the whole current restrict-tag support was designed to 
> support fortran's specific needs where we can rely on the compiler 
> generating code we most probably (!) won't miscompile later.  Further we 
> interpret some of the part where ISO-C says nothing in our favor.  
> Together this means that e.g. (intentionally _not_ ISO-C, but sort of 
> middle-end speak):
> 
> foo (struct s)
> { restrict int *p1 = s.p;
>   restrict int *p2 = s.p;
>   *p1 = 0; *p2 = 1; return *p1;
> }

Not sure what you mean here.  restrict int doesn't make sense,
int *restrict p1 would make sense, but then it would be invalid C.

> So, loads generate new tags, and we rely on optimizations that this 
> doesn't cause miscompilations via invalid disambiguations, which works 
> fine for the restricted set of cases coming from the fortran frontend.

Loads don't generate new tags.
Try:
struct S { int a; int *__restrict p; };

int *a, *b, *c, *d;

void
foo (S x, S &__restrict y, int z)
{
  if (z > 64)
{
  a = x.p;
  b = y.p;
  x.p[z] = y.p[z];
}
  else if (z < 20)
{
  c = x.p;
  d = y.p;
  y.p[z] = x.p[z];
}
}

Both the pointers loaded from x.p will have the same heap var in PT info,
similarly both pointers loaded from y.p will have the same heap var in PT
info.

> You introduce even more freedom to this mud.  Effectively:
> 
> foo (struct s)
> { restrict int *p1 = s.p;
>   restrict int *p2 = s.q; // same restrict tag as s.p
> }

Again, in your above syntax it is unclear what is TYPE_RESTRICT, if it is
TYPE_RESTRICT (TREE_TYPE (p1)) or something else.  If the user writes the
above in C and then accesses both through p1 and p2 the same object, it is
invalid.

> Or our own lowering for MEM_EXPR:
> 
> foo (struct s)
> { restrict *p1 = MEM[s, 0];
>   restrict *p2 = MEM[s, 4]; // new tag for p2
> }

Again, if the user writes it this way and accesses through both, it is
invalid.  Additionally, the attribute intentionally doesn't force
TYPE_RESTRICT anywhere (and complains if the field is TYPE_RESTRICT),
so all you'll have is
int *p1 = MEM[s, 0];
int *p2 = MEM[s, 4];
and just PTA will say that both p1 and p2 include the same restrict tag
in the PT set.

I don't see how my patch could make things worse than they already are.

Jakub


Re: [PATCH] Look at restrict disambiguation in tree-ssa-alias.c unconditionally (PR tree-optimization/50522)

2011-10-04 Thread Richard Guenther
On Tue, 4 Oct 2011, Richard Guenther wrote:

> On Tue, 4 Oct 2011, Jakub Jelinek wrote:
> 
> > On Tue, Oct 04, 2011 at 12:17:30PM +0200, Richard Guenther wrote:
> > >   int *x;
> > > 
> > > > void foo (int *p)
> > > > {
> > > >   int * __restrict p1 = p;
> > > >   int * __restrict p2 = p + 32;
> > > >   int *q;
> > > >   int i;
> > > x = p2;
> > > >   q = p + 32;
> > > q = q - 31;
> > > >   for (i = 0; i < 32; ++i)
> > > > p[i] = q[i];
> > > > }
> > 
> > Yes, this is valid and so is a modified version of the earlier
> > testcase where all accesses in the first loop are biased
> > (bar below, assuming y > 32 or y <= -32).
> > 
> > int *x;
> > 
> > void
> > foo (int *p)
> > {
> >   int *__restrict p1 = p;
> >   int *__restrict p2 = p + 32;
> >   int *q;
> >   int i;
> >   x = p2;
> >   q = p + 32;
> >   q = q - 31;
> >   for (i = 0; i < 32; ++i)
> > p[i] = q[i];
> > }
> > 
> > void
> > bar (int *p, int y)
> > {
> >   int *__restrict p1 = p;
> >   int *__restrict p2 = p + 32;
> >   int *q;
> >   int i;
> >   for (i = 0; i < 32; ++i)
> > p1[i + y] = p2[i + y];
> >   q = (p + 32) - 31;
> >   for (i = 0; i < 32; ++i)
> > p[i] = q[i];
> > }
> > 
> > > 
> > > would be valid and we'd rely on CSE not to replace q = p + 32
> > > with q = p2 (ignoring the fact that for a miscompile we need
> > > similar tricks for p1).  It doesn't do that at the moment
> > > because we fold int * __restrict p2 = p + 32 to
> > > ((int * __restrict)p) + 32 and thus see
> > > 
> > >   p.0_4 = (int * restrict) p_2(D);
> > >   p2_5 = p.0_4 + 128;
> > > 
> > > vs.
> > > 
> > >   q_6 = p_2(D) + 128;
> > > 
> > > but you are going to change that ;)
> > 
> > But even with the "Restrict fixes" patch I've just checked in
> > and with the TYPE_RESTRICT check removal patch I don't see anything
> > wrong in the IL, the only thing that is PT (restr) is the stmt
> > computing p2, which is just stored into x and nothing else, and
> > in the second function only the first loop.
> 
> But that's by pure luck and not by design, no?

Well, updating my tree and playing with it a bit, the above case
looks safe.

So, I think the patch removing the TYPE_RESTRICT checks is ok.
We can revert it again when issues show up.

Thanks,
Richard.


Re: [PATCH] Maintain order of LTO sections

2011-10-04 Thread Richard Guenther
On Tue, 4 Oct 2011, Andi Kleen wrote:

> On Tue, Oct 04, 2011 at 03:08:02PM +0200, Richard Guenther wrote
> 
> > Sure, the question is if "-flto" counts as magic and thus
> > "don't do it when it hurts" ;))  I suppose with -flto-partition=none
> > (or 1to1) it would be reasonable to make -fno-toplevel-reorder work
> > (and thus maybe -fno-toplevel-reorder should simply force 1to1 
> > partitioning).
> 
> At least =none is incredibly slow for larger programs. I would really
> prefer normal whopr, otherwise at some point LTO becomes impractical
> due to the extreme compile time penalty. I can use it right now as 
> a workaround, but it's not a long term solution.
> 
> What I really want is not full -fno-toplevel-reorder anyways, but just
> marking a few special variables with a special attribute to not
> reorder. But this patch is still needed even for that and likely a good 
> idea anyways for the other reasons.

Yes, of course - I was just arguing about the supposed final state
we want to arrive at with respect to -fno-toplevel-reorder and LTO.

The patch is ok (it probably will also help testcase reductions).

Thanks,
Richard.


Re: [RFC PATCH] restrict_based_on_field attribute

2011-10-04 Thread Michael Matz
Hi,

On Tue, 4 Oct 2011, Jakub Jelinek wrote:

> If it causes miscompilations, then GCC is buggy and needs to be fixed.

That's not easy without effectively making the whole restrict support 
ineffective.

> > Keep in mind that the whole current restrict-tag support was designed to 
> > support fortran's specific needs where we can rely on the compiler 
> > generating code we most probably (!) won't miscompile later.  Further we 
> > interpret some of the part where ISO-C says nothing in our favor.  
> > Together this means that e.g. (intentionally _not_ ISO-C, but sort of 
> > middle-end speak):
> > 
> > foo (struct s)
> > { restrict int *p1 = s.p;
> >   restrict int *p2 = s.p;
> >   *p1 = 0; *p2 = 1; return *p1;
> > }
> 
> Not sure what you mean here.  restrict int doesn't make sense,

I said that it wasn't C.  Read all my examples as "int * restrict pX", ...

> int *restrict p1 would make sense, but then it would be invalid C.

... and invalidity in C doesn't necessarily enter the picture.  The above 
is middle-end code, and we need to say what we wish to happen with 
the above (which _is_ generated by valid fortran use, for instance, 
and could conceivably be generated by other middle-end transformations, 
e.g. code duplication).

> Loads don't generate new tags.

See Richi's (upcoming) answer; that's one of the various hacks^Wdesign 
decisions made to at least get something useful out of the restrict tag 
things for fortran without breaking too much.

> > You introduce even more freedom to this mud.  Effectively:
> > 
> > foo (struct s)
> > { restrict int *p1 = s.p;
> >   restrict int *p2 = s.q; // same restrict tag as s.p
> > }
> 
> Again, in your above syntax it is unclear what is TYPE_RESTRICT, if it is
> TYPE_RESTRICT (TREE_TYPE (p1)) or something else.  If the user writes the
> above in C and then accesses both through p1 and p2 the same object, it is
> invalid.

Hmm?  I thought the point of your patch was precisely to make accesses 
through _M_start and _M_final have the same restrict tag, so that users 
can access the same object through both pointers, but still enjoy 
disambiguation with e.g. _M_start from different objects.  Possibly I've 
misunderstood.


Ciao,
Michael.


Re: [RFC PATCH] restrict_based_on_field attribute

2011-10-04 Thread Richard Guenther
On Tue, Oct 4, 2011 at 3:28 PM, Jakub Jelinek  wrote:
> On Tue, Oct 04, 2011 at 01:55:27PM +0200, Michael Matz wrote:
>> Ugh, that's stretching our fragile to unsafe restrict support quite much.
>> IMO beyond what we can reasonably be confident to not cause
>> miscompilations.  Essentially what you're implicitly assuming is that the
>> sources of restrict tags can ultimately be memory, and that this is
>> somehow reliable.  It unfortunately isn't.
>
> If it causes miscompilations, then GCC is buggy and needs to be fixed.
> restrict structure fields are not some kind of extension, they are in ISO
> C99, see 6.7.3/12).
>
>> Keep in mind that the whole current restrict-tag support was designed to
>> support fortrans specific needs where we can rely on the compiler
>> generating code we most probably (!) won't miscompile later.  Further we
>> interpret some of the part where ISO-C says nothing in our favor.
>> Together this means that e.g. (intentionally _not_ ISO-C, but sort of
>> middle-end speak):
>>
>> foo (struct s)
>> { restrict int *p1 = s.p;
>>   restrict int *p2 = s.p;
>>   *p1 = 0; *p2 = 1; return *p1;
>> }
>
> Not sure what you mean here.  restrict int doesn't make sense,
> int *restrict p1 would make sense, but then it would be invalid C.
>
>> So, loads generate new tags, and we rely on optimizations that this
>> doesn't cause miscompilations via invalid disambiguations, which works
>> fine for the restricted set of cases coming from the fortran frontend.
>
> Loads don't generate new tags.
> Try:
> struct S { int a; int *__restrict p; };
>
> int *a, *b, *c, *d;
>
> void
> foo (S x, S &__restrict y, int z)
> {
>  if (z > 64)
>    {
>      a = x.p;
>      b = y.p;
>      x.p[z] = y.p[z];
>    }
>  else if (z < 20)
>    {
>      c = x.p;
>      d = y.p;
>      y.p[z] = x.p[z];
>    }
> }
>
> Both the pointers loaded from x.p will have the same heap var in PT info,
> similarly both pointers loaded from y.p will have the same heap var in PT
> info.

You are right - we currently limit restrict tag generation from memory
to parameters and globals, to avoid having to rely on optimization.
Which is why

struct S { int * restrict p; };

void foo (int *q)
{
  struct S s;
  s.p = q;
  int * restrict x = s.p;
  *x = 1;
}

does not get you a restrict tag for s.p but only for the conversion of
q before the store to s.p (this tag is propagated to x though).

Basically restrict tag generation happens at the time we generate
the internal PTA representation for 'memory' which happens once
for each 'memory' we can name (and not for memory we can't name,
apart from DECL_BY_REFERENCE restrict-qualified parameters).

Conversions OTOH are a source of restrict (I added this to not
lose restrict information during inlining as we'd keep the casts
to the argument types), which ultimately is the reason for PR48764
and PR49279 ...

> I don't see how my patch could make things worse than they already are.

I still have to have a detailled look at the patch.

Richard.


Re: [RFC] Builtin infrastructure change

2011-10-04 Thread Michael Meissner
On Tue, Oct 04, 2011 at 02:44:00PM +0200, Richard Guenther wrote:
> On Tue, Oct 4, 2011 at 2:07 AM, Michael Meissner
>  wrote:
> > On Sat, Oct 01, 2011 at 02:11:27PM +, Joseph S. Myers wrote:
> >> On Fri, 30 Sep 2011, Michael Meissner wrote:
> >>
> >> > Is this enough of a savings to continue on?  I'm of two minds about it, 
> >> > one is
> >>
> >> The thing to measure is not so much memory as startup time (how long it
> >> takes to compile an empty source file), which is important for libraries
> >> and programs using a coding style with lots of small source files.
> >
> > With my current changes which has modified the standard builtins to be lazy,
> > but I haven't yet done the machine dependent builtins, I saw a difference of
> > 0.0022 seconds (0.0170 vs. 0.0148) on my 3 year old Intel core 2 laptop.  I 
> > did
> > 14 runs in total, and skipped the fastest 2 runs and slowest 2 runs, and 
> > then
> > averaged the 10 runs in the middle.  I built bootstrap builds with release
> > checking with the top of subversion head without changes and with my 
> > changes.
> >
> > So at this point, I'm wondering whether it is worth it to finish the
> > optimization for lazy builtins.
> 
> I think it is.
> 
> +/* Return the tree node for a builtin function or NULL, indexing into the
> +   array.  */
> +static inline tree
> +built_in_set_decl_add (enum built_in_function fncode, int addend)
> +{
> +  return built_in_decls ((enum built_in_function)(((int)fncode) + addend));
> +}
> 
> this doesn't seem to "set" anything.  Mistake?

Yes, it was a mistake (though once the lazy builtins are done, it may do a set
to create the function decl the first time the builtin is touched).

> +static inline tree
> +built_in_decls (enum built_in_function fncode)
> 
> I'd prefer singular, built_in_decl () instead of plural.

Ok.  I was just using the array name, but singular is better.

> +static inline tree
> +implicit_built_in_decls (enum built_in_function fncode)
> 
> Likewise.

Ok, if I'm going to change the name anyway, I think I will make it
built_in_decl_implicit instead.

> +/* Initialize an implicit builtin function.  */
> +static inline void
> +built_in_set_implicit (enum built_in_function fncode, tree implicit)
> 
> and built_in_set_implicit_decl (or rather set_implicit_built_in_decl - hm,
> I guess I'm now bikeshedding ;)).

> The patch is ok with the built_in_set_decl_add mystery resolved
> (I'd best not have the function at all but perform the ugly add and casting
> in the few callers -- it _is_ ugly to do that after all, so it might as well
> look so ;)).

Originally I had kept the math inline, but some of the OMP functions were
running into the 80 character line limit, particularly with the need to make
sure everything was correctly cast to int or enum for C++.

> It would be nice to use singular/plural consistently (different currently
> for set and get, I prefer singular).  The rest of the bikeshedding - oh well 
> ;)

And whether it is builtin or built_in, since the compiler is inconsistent.

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899


Re: [SPARC] Fix build failure

2011-10-04 Thread David Miller
From: Eric Botcazou 
Date: Tue, 4 Oct 2011 09:37:01 +0200

> Tested on SPARC/Solaris, applied on the mainline.
> 
> 
> 2011-10-04  Eric Botcazou  
> 
>   * config/sparc/sparc.c (sparc_fold_builtin): Use a sequence of tests.

Thanks for fixing this Eric.


Re: [PATCH] Support -m{cpu,tune}=native on Linux/Sparc

2011-10-04 Thread David Miller
From: Gerald Pfeifer 
Date: Tue, 4 Oct 2011 11:25:11 +0200 (CEST)

> On Sun, 2 Oct 2011, David Miller wrote:
>> We always use "Sparc" instead of "SPARC" for the cpu names output
>> by the kernel in /proc/cpuinfo.
> 
> Ah, I see.  Thanks for the explanation.
> 
> What confused me a bit is that a bit earlier in the same table
> there is one (and just one) entry using "SPARC":
> 
>   { "SuperSPARC",   "supersparc" }
> 
> I assume this is for historical reasons?

No, it's a bug, it should be "SuperSparc" too.  I'll fix this up.


PATCH: Update --param case-values-threshold

2011-10-04 Thread H.J. Lu
Hi,

I checked in this patch to fix a typo.

-- 
H.J.
---
Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.7/changes.html,v
retrieving revision 1.44
diff -u -p -r1.44 changes.html
--- changes.html	4 Oct 2011 09:27:31 -0000	1.44
+++ changes.html	4 Oct 2011 14:33:01 -0000
@@ -54,7 +54,7 @@
 General Optimizer Improvements

   
-Support for a new parameter --param case-value-threshold=n
+Support for a new parameter --param
case-values-threshold=n
 was added to allow users to control the cutoff between doing
switch statements
 as a series of if statements and using a jump table.
 


Re: PATCH: PR target/50603: [x32] Unnecessary lea

2011-10-04 Thread H.J. Lu
On Tue, Oct 4, 2011 at 1:19 AM, Uros Bizjak  wrote:
> On Tue, Oct 4, 2011 at 10:06 AM, Uros Bizjak  wrote:
 This patch improves address combine for x32 by forcing the memory
 operand of a PLUS operation into a register.  Tested on Linux/x86-64 with
 -mx32.  OK for trunk?
>>>
>>> Does the patch fix
>>>
>>> FAIL: gcc.target/i386/pr45670.c scan-assembler-not lea[lq]
>>>
>>> on x32 ?
>>
>> It does.
>>
>> Following patch is the same, but takes into account that non-matching
>> memory can only be in src2, so it avoids a bunch of unnecessary
>> checks. Can you please check the effects of the patch with some
>> codesize benchmark?
>
> OTOH, x86_64 and i686 targets can also benefit from this change. If
> combine can't create a more complex address (covered by lea), then it
> will simply propagate the memory operand back into the add insn. It looks
> to me that we can't lose here, so:
>
>  /* Improve address combine.  */
>  if (code == PLUS && MEM_P (src2))
>    src2 = force_reg (mode, src2);
>
> Any opinions?

This patch should fix PR 50603 as well as gcc.target/i386/pr45670.c.
I will check its code size impact on SPEC CPU 2K/2006.

-- 
H.J.


Re: [wwwdocs] IA-32/x86-64 Changes for upcoming 4.7.0 series

2011-10-04 Thread H.J. Lu
On Tue, Oct 4, 2011 at 2:28 AM, Gerald Pfeifer  wrote:
> On Mon, 3 Oct 2011, H.J. Lu wrote:
>> I checked it in.
>
> Thanks, H.J.
>
> This needed a small markup fix which I just applied; see below.
>

Thanks.

-- 
H.J.


patch ping: thread testing infrastructure

2011-10-04 Thread Aldy Hernandez

http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01272.html


[pph] Add pph_cache_find (issue5169052)

2011-10-04 Thread Diego Novillo
This implements the clean up Gab suggested in
http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01882.html

Tested on x86_64.  Committed.


Diego.


* pph-streamer.h (pph_cache_find): New.
Replace pph_cache_select/pph_cache_get combinations with it.

diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c
index 197937c..66af782 100644
--- a/gcc/cp/pph-streamer-in.c
+++ b/gcc/cp/pph-streamer-in.c
@@ -442,10 +442,8 @@ pph_in_cxx_binding_1 (pph_stream *stream)
   if (marker == PPH_RECORD_END)
 return NULL;
   else if (pph_is_reference_marker (marker))
-{
-  pph_cache *cache = pph_cache_select (stream, marker, image_ix);
-  return (cxx_binding *) pph_cache_get (cache, ix);
-}
+return (cxx_binding *) pph_cache_find (stream, marker, image_ix, ix,
+  PPH_cxx_binding);
 
   value = pph_in_tree (stream);
   type = pph_in_tree (stream);
@@ -494,10 +492,8 @@ pph_in_class_binding (pph_stream *stream)
   if (marker == PPH_RECORD_END)
 return NULL;
   else if (pph_is_reference_marker (marker))
-{
-  pph_cache *cache = pph_cache_select (stream, marker, image_ix);
-  return (cp_class_binding *) pph_cache_get (cache, ix);
-}
+return (cp_class_binding *) pph_cache_find (stream, marker, image_ix, ix,
+   PPH_cp_class_binding);
 
   ALLOC_AND_REGISTER (&stream->cache, ix, PPH_cp_class_binding, cb,
   ggc_alloc_cleared_cp_class_binding ());
@@ -521,10 +517,8 @@ pph_in_label_binding (pph_stream *stream)
   if (marker == PPH_RECORD_END)
 return NULL;
   else if (pph_is_reference_marker (marker))
-{
-  pph_cache *cache = pph_cache_select (stream, marker, image_ix);
-  return (cp_label_binding *) pph_cache_get (cache, ix);
-}
+return (cp_label_binding *) pph_cache_find (stream, marker, image_ix, ix,
+   PPH_cp_label_binding);
 
   ALLOC_AND_REGISTER (&stream->cache, ix, PPH_cp_label_binding, lb,
   ggc_alloc_cleared_cp_label_binding ());
@@ -566,10 +560,8 @@ pph_in_binding_level (pph_stream *stream, cp_binding_level 
*to_register)
   if (marker == PPH_RECORD_END)
 return NULL;
   else if (pph_is_reference_marker (marker))
-{
-  pph_cache *cache = pph_cache_select (stream, marker, image_ix);
-  return (cp_binding_level *) pph_cache_get (cache, ix);
-}
+return (cp_binding_level *) pph_cache_find (stream, marker, image_ix, ix,
+   PPH_cp_binding_level);
 
   /* If TO_REGISTER is set, register that binding level instead of the newly
  allocated binding level into slot IX.  */
@@ -656,10 +648,9 @@ pph_in_language_function (pph_stream *stream)
   if (marker == PPH_RECORD_END)
 return NULL;
   else if (pph_is_reference_marker (marker))
-{
-  pph_cache *cache = pph_cache_select (stream, marker, image_ix);
-  return (struct language_function *) pph_cache_get (cache, ix);
-}
+return (struct language_function *) pph_cache_find (stream, marker,
+   image_ix, ix,
+   PPH_language_function);
 
   ALLOC_AND_REGISTER (&stream->cache, ix, PPH_language_function, lf,
   ggc_alloc_cleared_language_function ());
@@ -753,8 +744,8 @@ pph_in_struct_function (pph_stream *stream, tree decl)
 return;
   else if (pph_is_reference_marker (marker))
 {
-  pph_cache *cache = pph_cache_select (stream, marker, image_ix);
-  fn = (struct function *) pph_cache_get (cache, ix);
+  fn = (struct function *) pph_cache_find (stream, marker, image_ix, ix,
+  PPH_function);
   gcc_assert (DECL_STRUCT_FUNCTION (decl) == fn);
   return;
 }
@@ -863,9 +854,9 @@ pph_in_lang_specific (pph_stream *stream, tree decl)
 return;
   else if (pph_is_reference_marker (marker))
 {
-  pph_cache *cache = pph_cache_select (stream, marker, image_ix);
-  DECL_LANG_SPECIFIC (decl) = (struct lang_decl *) pph_cache_get (cache,
-  ix);
+  DECL_LANG_SPECIFIC (decl) =
+   (struct lang_decl *) pph_cache_find (stream, marker, image_ix, ix,
+PPH_lang_decl);
   return;
 }
 
@@ -959,10 +950,8 @@ pph_in_sorted_fields_type (pph_stream *stream)
   if (marker == PPH_RECORD_END)
 return NULL;
   else if (pph_is_reference_marker (marker))
-{
-  pph_cache *cache = pph_cache_select (stream, marker, image_ix);
-  return (struct sorted_fields_type *) pph_cache_get (cache, ix);
-}
+return (struct sorted_fields_type *)
+ pph_cache_find (stream, marker, image_ix, ix, PPH_sorted_fields_type);
 
   num_fields = pph_in_uint (stream);
   ALLOC_AND_REGISTER (&stream->cache, ix, PPH_sorted_fields_type, v
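
The back-reference pattern in the hunks above (a record is either materialized and registered at a cache slot, or a reference back to an earlier slot) can be modeled with a small self-contained sketch; the names are illustrative, not GCC's:

```c
#include <assert.h>
#include <stddef.h>

enum marker { RECORD_START, RECORD_REF };

#define CACHE_SIZE 16
static void *cache_slots[CACHE_SIZE];

/* Read one record: a RECORD_START registers FRESH at slot IX (as
   ALLOC_AND_REGISTER does above), a RECORD_REF returns the object
   previously registered there (as pph_cache_find does).  */
void *read_record (enum marker m, unsigned ix, void *fresh)
{
  if (m == RECORD_REF)
    return cache_slots[ix];
  cache_slots[ix] = fresh;
  return fresh;
}
```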

Re: [pph] Prepare for mutation detection [3/3] (issue5139052)

2011-10-04 Thread Diego Novillo
On Wed, Sep 28, 2011 at 17:28, Gabriel Charette  wrote:
> Very nice!
>
> I really like where this is heading :)!
>
> I think it would be great to instrument this to know how many times we
> need to use a PPH_RECORD_MREF, to avoid thrashing the cache (i.e.
> potentially there are specific cases where only a small field's value
> changes and pickling the entire tree again is sub-optimal; if instead
> we could detect those cases and simply output the required change
> which could then be applied on the reader side it would potentially be
> better... it is possible that we haven't been affected by these
> specific cases up until now, but that they would all massively result
> in PPH_RECORD_MREFs now (the instrumentation would also potentially
> help us find some of those tricky mutation, if there are any caused by
> other things than overloads, which we aren't aware of yet...just a
> thought).

Yes, that's in the backburner, but I will wait until we have a more
complete implementation to start looking into these optimizations.

One thing that is very easy to do is to only try to handle mutations
of shared trees (tree_node_can_be_shared). Those are the only trees we
should even bother putting in the cache.


Diego.


Re: Vector shuffling

2011-10-04 Thread Artem Shinkarov
Ping.

Richard, the patch in the attachment should be submitted asap. The
other problem could wait for a while.

Thanks,
Artem.

On Tue, Oct 4, 2011 at 12:04 AM, Artem Shinkarov
 wrote:
> On Mon, Oct 3, 2011 at 6:12 PM, Richard Henderson  wrote:
>> On 10/03/2011 09:43 AM, Artem Shinkarov wrote:
>>> Hi, Richard
>>>
>>> There is a problem with the testcases of the patch you have committed
>>> for me. The code in every test-case is doubled. Could you please,
>>> apply the following patch, otherwise all the tests from
>>> the vector-shuffle patch would fail.
>>
>> Huh.  Dunno what happened there.  Fixed.
>>
>>> Also, if it is possible, could you change my name from in the
>>> ChangeLog from "Artem Shinkarov" to "Artjoms Sinkarovs". The last
>>> version is the way I am spelled in the passport, and the name I use in
>>> the ChangeLog.
>>
>> Fixed.
>>
>>
>> r~
>>
>
> Richard, there was a problem causing segfault in ix86_expand_vshuffle
> which I have fixed with the patch attached.
>
> Another thing I cannot figure out is the following case:
> #define vector(elcount, type)  \
> __attribute__((vector_size((elcount)*sizeof(type)))) type
>
> vector (8, short) __attribute__ ((noinline))
> f (vector (8, short) x, vector (8, short) y, vector (8, short) mask) {
>    return  __builtin_shuffle (x, y, mask);
> }
>
> int main (int argc, char *argv[]) {
>    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
>    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
>    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
>    vector (8, short) v2;
>    int i;
>
>    v2 = f (v0, v1,  mask0);
>    /* v2 =  __builtin_shuffle (v0, v1, mask0); */
>    for (i = 0; i < 8; i ++)
>      __builtin_printf ("%i, ", v2[i]);
>
>    return 0;
> }
>
> I am compiling with support of ssse3, in my case it is ./xgcc -B. b.c
> -O3 -mtune=core2 -march=core2
>
> And I get 1, 1, 1, 3, 4, 5, 1, 7, on the output, which is wrong.
>
> But if I will call __builtin_shuffle directly, then the answer is correct.
>
> Any ideas?
>
>
> Thanks,
> Artem.
>


Re: [patch, arm] Fix PR target/50305 (arm_legitimize_reload_address problem)

2011-10-04 Thread Ulrich Weigand
Ramana Radhakrishnan wrote:
> On 26 September 2011 15:24, Ulrich Weigand  wrote:
> > Is this sufficient, or should I test any other set of options as well?
> 
> Could you run one set of tests with neon ?

Sorry for the delay, but I had to switch to my IGEP board for Neon
support, and that's a bit slow ...   In any case, I've now completed
testing the patch with Neon with no regressions.

> > Just to clarify: in the presence of the other options that are already
> > in dg-options, the test case now fails (with the unpatched compiler)
> > for *any* setting of -mfloat-abi (hard, soft, or softfp).  Do you still
> > want me to add a specific setting to the test case?
> 
> No the mfpu=vfpv3 is fine.

OK, thanks.

> Instead of skipping I was wondering if we
> could prune the outputs to get this through all the testers we have.

Well, the problem is that with certain -march options (e.g. armv7) we get:
/home/uweigand/gcc-head/gcc/testsuite/gcc.target/arm/pr50305.c:1:0:
error: target CPU does not support ARM mode

Since this is an *error*, pruning the output doesn't really help, the
test isn't being run in any case.

> Otherwise this is OK.

Given the above, is the patch now OK as-is?

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


[PATCH] Fix PR50609

2011-10-04 Thread Richard Guenther

This fixes PR50609, we need to do something sensible for VECTOR_TYPE
constructors, which is not dispatch to fold_nonarray_ctor_reference.
Instead fold_array_ctor_reference can be conveniently re-used.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2011-10-04  Richard Guenther  

PR tree-optimization/50609
* gimple-fold.c (fold_array_ctor_reference): Also handle
vector typed constructors.
(fold_ctor_reference): Dispatch to fold_array_ctor_reference
for vector typed constructors.

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c   (revision 179505)
+++ gcc/gimple-fold.c   (working copy)
@@ -2747,10 +2747,12 @@ fold_array_ctor_reference (tree type, tr
   double_int low_bound, elt_size;
   double_int index, max_index;
   double_int access_index;
-  tree domain_type = TYPE_DOMAIN (TREE_TYPE (ctor));
+  tree domain_type = NULL_TREE;
   HOST_WIDE_INT inner_offset;
 
   /* Compute low bound and elt size.  */
+  if (TREE_CODE (TREE_TYPE (ctor)) == ARRAY_TYPE)
+domain_type = TYPE_DOMAIN (TREE_TYPE (ctor));
   if (domain_type && TYPE_MIN_VALUE (domain_type))
 {
   /* Static constructors for variably sized objects makes no sense.  */
@@ -2917,7 +2919,8 @@ fold_ctor_reference (tree type, tree cto
   if (TREE_CODE (ctor) == CONSTRUCTOR)
 {
 
-  if (TREE_CODE (TREE_TYPE (ctor)) == ARRAY_TYPE)
+  if (TREE_CODE (TREE_TYPE (ctor)) == ARRAY_TYPE
+ || TREE_CODE (TREE_TYPE (ctor)) == VECTOR_TYPE)
return fold_array_ctor_reference (type, ctor, offset, size);
   else
return fold_nonarray_ctor_reference (type, ctor, offset, size);
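
The case the patch handles can be reproduced at the source level with the GCC vector extension; a minimal sketch (not the PR's actual testcase):

```c
#include <assert.h>

/* Reading one element of a constant vector constructor.  With the fix,
   fold_ctor_reference dispatches this VECTOR_TYPE constructor to
   fold_array_ctor_reference, so the read can fold like an array read
   (GCC vector extension; the function also works unfolded).  */
typedef int v4si __attribute__ ((vector_size (16)));
static const v4si cv = { 10, 20, 30, 40 };

int third_element (void)
{
  return cv[2];
}
```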


Re: [RFC] Builtin infrastructure change

2011-10-04 Thread Joseph S. Myers
On Mon, 3 Oct 2011, Michael Meissner wrote:

> On Sat, Oct 01, 2011 at 02:11:27PM +, Joseph S. Myers wrote:
> > On Fri, 30 Sep 2011, Michael Meissner wrote:
> > 
> > > Is this enough of a savings to continue on?  I'm of two minds about it, 
> > > one is
> > 
> > The thing to measure is not so much memory as startup time (how long it 
> > takes to compile an empty source file), which is important for libraries 
> > and programs using a coding style with lots of small source files.
> 
> With my current changes which has modified the standard builtins to be lazy,
> but I haven't yet done the machine dependent builtins, I saw a difference of
> 0.0022 seconds (0.0170 vs. 0.0148) on my 3 year old Intel core 2 laptop.  I 
> did
> 14 runs in total, and skipped the fastest 2 runs and slowest 2 runs, and then
> averaged the 10 runs in the middle.  I built boostrap builds with release
> checking with the top of subversion head without changes and with my changes.
> 
> So at this point, I'm wondering whether it is worth it to finish the
> optimization for lazy builtins.

That's a 13% improvement.  It's very definitely worthwhile - most compile 
time improvements are probably 1% or less.

-- 
Joseph S. Myers
jos...@codesourcery.com
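
The measurement protocol described above (14 runs, drop the two fastest and two slowest, average the middle 10) is a trimmed mean; a minimal sketch:

```c
#include <assert.h>
#include <stdlib.h>

static int cmp_double (const void *a, const void *b)
{
  double x = *(const double *) a, y = *(const double *) b;
  return (x > y) - (x < y);
}

/* Average T[TRIM .. N-TRIM-1] after sorting: drops the TRIM smallest
   and TRIM largest samples.  N must exceed 2*TRIM.  */
double trimmed_mean (double *t, int n, int trim)
{
  double sum = 0.0;
  int i;
  qsort (t, n, sizeof *t, cmp_double);
  for (i = trim; i < n - trim; i++)
    sum += t[i];
  return sum / (n - 2 * trim);
}
```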


Re: [patch RFC,PR50038]

2011-10-04 Thread Joseph S. Myers
On Tue, 4 Oct 2011, Ilya Tocar wrote:

> Hi everyone,
> 
> This patch fixes PR 50038 (redundant zero extensions) by modifying
> implicit-zee pass
> to also remove unneeded zero extensions from QImode to SImode.

Hardcoding particular modes like this in the target-independent parts of 
the compiler is fundamentally ill-conceived.  Right now it hardcodes the 
(SImode, DImode) pair.  You're adding hardcoding of (QImode, SImode) as 
well.  But really it should consider all pairs of (integer mode, wider 
integer mode), with the machine description (or target hooks) determining 
which pairs are relevant on a particular target.  Changing it not to 
hardcode particular modes would be better than adding a second pair.

-- 
Joseph S. Myers
jos...@codesourcery.com
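
At the source level, the two extension pairs under discussion look roughly like this (illustrative functions, not the PR testcase):

```c
#include <assert.h>

/* QImode -> SImode: the pair the patch adds; on many targets the byte
   load already zero-extends, making a later movzbl redundant.  */
unsigned int qi_to_si (unsigned char c)
{
  return c;
}

/* SImode -> DImode: the pair the pass currently hardcodes; on x86-64 a
   32-bit operation already zeroes the upper 32 bits.  */
unsigned long long si_to_di (unsigned int i)
{
  return i;
}
```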


RE: [Ping] RE: CR16 Port addition

2011-10-04 Thread Joseph S. Myers
On Tue, 4 Oct 2011, Jayant R. Sonar wrote:

> PING 4: For review
> Hi,
> Can someone please review the patch:
> http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01441.html

Could someone with general back-end expertise look at this?  I don't have 
the expertise to review whether the back end follows good practice for 
back-end implementation (beyond the general sort of issues I reviewed a 
previous iteration of this patch for).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: PATCH: PR target/50603: [x32] Unnecessary lea

2011-10-04 Thread H.J. Lu
On Tue, Oct 4, 2011 at 1:19 AM, Uros Bizjak  wrote:
> On Tue, Oct 4, 2011 at 10:06 AM, Uros Bizjak  wrote:
 This patch improves address combine for x32 by forcing the memory
 operand of the PLUS operation into a register.  Tested on Linux/x86-64 with
 -mx32.  OK for trunk?
>>>
>>> Does the patch fix
>>>
>>> FAIL: gcc.target/i386/pr45670.c scan-assembler-not lea[lq]
>>>
>>> on x32 ?
>>
>> It does.
>>
>> Following patch is the same, but takes into account that non-matching
>> memory can only be in src2, so it avoids a bunch of unnecessary
>> checks. Can you please check the effects of the patch with some
>> codesize benchmark?
>
> OTOH, x86_64 and i686 targets can also benefit from this change. If
> combine can't create more complex address (covered by lea), then it
> will simply propagate memory operand back into the add insn. It looks
> to me that we can't lose here, so:
>
>  /* Improve address combine.  */
>  if (code == PLUS && MEM_P (src2))
>    src2 = force_reg (mode, src2);
>
> Any opinions?
>

It doesn't work with 64bit libstdc++:

/export/gnu/import/git/gcc/libstdc++-v3/src/strstream.cc: In member
function ‘void std::strstream::_ZTv0_n24_NSt9strstreamD1Ev()’:
/export/gnu/import/git/gcc/libstdc++-v3/src/strstream.cc:418:1: error:
insn does not satisfy its constraints:
(insn 3 2 4 (set (reg:DI 59)
(mem:DI (plus:DI (reg:DI 39 r10)
(const_int -24 [0xffe8])) [0 S8 A8])) 62
{*movdi_internal_rex64}
 (nil))
/export/gnu/import/git/gcc/libstdc++-v3/src/strstream.cc:418:1:
internal compiler error: in final_scan_insn, at final.c:2648
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
make[7]: *** [strstream.lo] Error 1
make[7]: *** Waiting for unfinished jobs


-- 
H.J.
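
For reference, a minimal shape of the address arithmetic involved (an assumed example, not the PR's testcase):

```c
#include <assert.h>

/* combine sees (plus (reg) (mem ...)); forcing the memory operand into
   a register in ix86_fixup_binary_operands lets it form a plain add
   instead of leaving a separate lea behind.  */
long add_from_mem (long x, const long *p)
{
  return x + *p;
}
```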


Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.

2011-10-04 Thread Tom de Vries
On 10/04/2011 03:03 PM, Richard Guenther wrote:
> On Tue, Oct 4, 2011 at 9:43 AM, Tom de Vries  wrote:
>> On 10/01/2011 05:46 PM, Tom de Vries wrote:
>>> On 09/30/2011 03:29 PM, Richard Guenther wrote:
 On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries  
 wrote:
> On 09/28/2011 11:53 AM, Richard Guenther wrote:
>> On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries  
>> wrote:
>>> Richard,
>>>
>>> I got a patch for PR50527.
>>>
>>> The patch prevents the alignment of vla-related allocas to be set to
>>> BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after 
>>> folding
>>> the alloca.
>>>
>>> Bootstrapped and regtested on x86_64.
>>>
>>> OK for trunk?
>>
>> Hmm.  As gfortran with -fstack-arrays uses VLAs it's probably bad that
>> the vectorizer then will no longer see that the arrays are properly 
>> aligned.
>>
>> I'm not sure what the best thing to do is here, other than trying to 
>> record
>> the alignment requirement of the VLA somewhere.
>>
>> Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT
>> has the issue that it will force stack-realignment which isn't free (and 
>> the
>> point was to make the decl cheaper than the alloca).  But that might
>> possibly be the better choice.
>>
>> Any other thoughts?
>
> How about the approach in this (untested) patch? Using the DECL_ALIGN of 
> the vla
> for the new array prevents stack realignment for folded vla-allocas, also 
> for
> large vlas.
>
> This will not help in vectorizing large folded vla-allocas, but I think 
> it's not
> reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that 
> has
> been the case up until we started to fold). If you want to trigger 
> vectorization
> for a vla, you can still use the aligned attribute on the declaration.
>
> Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also without 
> using
> an attribute on the decl. This patch exploits this by setting it at the 
> end of
> the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective in
> propagation though, because although the ptr_info of the lhs is 
> propagated via
> copy_prop afterwards, it's not propagated anymore via ccp.
>
> Another way to do this would be to set BIGGEST_ALIGNMENT at the end of 
> ccp2 and
> not fold during ccp3.

 Ugh, somehow I like this the least ;)

 How about lowering VLAs to

   p = __builtin_alloca (...);
   p = __builtin_assume_aligned (p, DECL_ALIGN (vla));

 and not assume anything for alloca itself if it feeds a
 __builtin_assume_aligned?

 Or rather introduce a __builtin_alloca_with_align () and for VLAs do

  p = __builtin_alloca_with_align (..., DECL_ALIGN (vla));

 that's less awkward to use?

 Sorry for not having a clear plan here ;)

>>>
>>> Using assume_aligned is a more orthogonal way to represent this in gimple, 
>>> but
>>> indeed harder to use.
>>>
>>> Another possibility is to add a 'tree vla_decl' field to struct
>>> gimple_statement_call, which is probably the easiest to implement.
>>>
>>> But I think __builtin_alloca_with_align might have a use beyond vlas, so I
>>> decided to try this one. Attached patch implements my first stab at this  
>>> (now
>>> testing on x86_64).
>>>
>>> Is this an acceptable approach?
>>>
>>
>> bootstrapped and reg-tested (including ada) on x86_64.
>>
>> Ok for trunk?
> 
> The idea is ok I think.  But
> 
>  case BUILT_IN_ALLOCA:
> +case BUILT_IN_ALLOCA_WITH_ALIGN:
>/* If the allocation stems from the declaration of a variable-sized
>  object, it cannot accumulate.  */
>target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp));
>if (target)
> return target;
> +  if (DECL_FUNCTION_CODE (get_callee_fndecl (exp))
> + == BUILT_IN_ALLOCA_WITH_ALIGN)
> +   {
> + tree new_call = build_call_expr_loc (EXPR_LOCATION (exp),
> +  
> built_in_decls[BUILT_IN_ALLOCA],
> +  1, CALL_EXPR_ARG (exp, 0));
> + CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp);
> + exp = new_call;
> +   }
> 
> Ick.  Why can't the rest of the compiler not handle
> BUILT_IN_ALLOCA_WITH_ALIGN the same as BUILT_IN_ALLOCA?
> (thus, arrange things so the assembler name of alloca-with-align is alloca?)
> 

We can set the assembler name in the local_define_builtin call. But that will
still create a call alloca (12, 4). How do we deal with the second argument?

> I don't see why you still need the special late CCP pass.
> 

For alloca_with_align, the align will minimally be the 2nd argument. This is
independent of folding, and we can propagate this information in every ccp
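
The alignment guarantee being discussed can be illustrated at the source level; `vla_char_alignment_ok` is a hypothetical example, and the builtin itself stays GCC-internal:

```c
#include <assert.h>
#include <stdint.h>

/* True if P is aligned to A (A a power of two).  */
static int aligned_to (const void *p, uintptr_t a)
{
  return ((uintptr_t) p & (a - 1)) == 0;
}

/* A folded VLA only guarantees the element type's alignment; code that
   assumed BIGGEST_ALIGNMENT for it after ccp would be wrong here.  */
int vla_char_alignment_ok (int n)
{
  char vla[n];
  return aligned_to (vla, _Alignof (char));
}
```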

Re: Vector shuffling

2011-10-04 Thread Richard Henderson
On 10/04/2011 08:18 AM, Artem Shinkarov wrote:
> Ping.
> 
> Richard, the patch in the attachment should be submitted asap. The
> other problem could wait for a while.

The patch in the attachment is wrong too.  I've re-written the x86
backend support, adding TARGET_XOP in the process.  I've also re-written
the test cases so that they actually test what we wanted.

Patch to follow once testing is complete.


r~


Re: [PATCH] [mingw] fix typo: s/_REENTRANCE/_REENTRANT/

2011-10-04 Thread Kai Tietz
2011/10/3 Ozkan Sezer :
> On Mon, Oct 3, 2011 at 5:56 PM, Kai Tietz  wrote:
>> 2011/10/3 Ozkan Sezer :
>>> PING?
>>>
>>> On Thu, Sep 22, 2011 at 2:28 PM, Ozkan Sezer  wrote:
 Hi:

 Unless I'm missing something, the mingw CPP_SPEC changes introduced in
 r171833 have a typo: -D_REENTRANCE should read -D_REENTRANT . Patchlet
 below.  Please review, and apply if it's OK.


 config/i386/mingw-w64.h (CPP_SPEC): Rename _REENTRANCE to _REENTRANT.
 config/i386/mingw32.h (CPP_SPEC): Likewise.

 Index: config/i386/mingw-w64.h
 ===
 --- config/i386/mingw-w64.h     (revision 171833)
 +++ config/i386/mingw-w64.h     (working copy)
 @@ -25,8 +25,8 @@
  #undef CPP_SPEC
  #define CPP_SPEC "%{posix:-D_POSIX_SOURCE} %{mthreads:-D_MT} " \
                 "%{municode:-DUNICODE} " \
 -                "%{" SPEC_PTHREAD1 ":-D_REENTRANCE} " \
 -                "%{" SPEC_PTHREAD2 ":-U_REENTRANCE} "
 +                "%{" SPEC_PTHREAD1 ":-D_REENTRANT} " \
 +                "%{" SPEC_PTHREAD2 ":-U_REENTRANT} "

  #undef STARTFILE_SPEC
  #define STARTFILE_SPEC "%{shared|mdll:dllcrt2%O%s} \
 Index: config/i386/mingw32.h
 ===
 --- config/i386/mingw32.h       (revision 177789)
 +++ config/i386/mingw32.h       (working copy)
 @@ -87,7 +87,7 @@

  #undef CPP_SPEC
  #define CPP_SPEC "%{posix:-D_POSIX_SOURCE} %{mthreads:-D_MT} " \
 -                "%{" SPEC_PTHREAD1 ":-D_REENTRANCE} " \
 +                "%{" SPEC_PTHREAD1 ":-D_REENTRANT} " \
                 "%{" SPEC_PTHREAD2 ": } "

  /* For Windows applications, include more libraries, but always include


 --
 O.S.

>>
>> Patch is ok together with a ChangeLog.
>>
>> Thanks,
>> Kai
>>
>
> Here it is again with a changelog entry (Thought that I included one
> the first time..)  Thanks.
>
>        * config/i386/mingw-w64.h (CPP_SPEC): Rename _REENTRANCE to _REENTRANT.
>        * config/i386/mingw32.h (CPP_SPEC): Likewise.
>
> Index: config/i386/mingw-w64.h
> ===
> --- config/i386/mingw-w64.h     (revision 171833)
> +++ config/i386/mingw-w64.h     (working copy)
> @@ -25,8 +25,8 @@
>  #undef CPP_SPEC
>  #define CPP_SPEC "%{posix:-D_POSIX_SOURCE} %{mthreads:-D_MT} " \
>                 "%{municode:-DUNICODE} " \
> -                "%{" SPEC_PTHREAD1 ":-D_REENTRANCE} " \
> -                "%{" SPEC_PTHREAD2 ":-U_REENTRANCE} "
> +                "%{" SPEC_PTHREAD1 ":-D_REENTRANT} " \
> +                "%{" SPEC_PTHREAD2 ":-U_REENTRANT} "
>
>  #undef STARTFILE_SPEC
>  #define STARTFILE_SPEC "%{shared|mdll:dllcrt2%O%s} \
> Index: config/i386/mingw32.h
> ===
> --- config/i386/mingw32.h       (revision 177789)
> +++ config/i386/mingw32.h       (working copy)
> @@ -87,7 +87,7 @@
>
>  #undef CPP_SPEC
>  #define CPP_SPEC "%{posix:-D_POSIX_SOURCE} %{mthreads:-D_MT} " \
> -                "%{" SPEC_PTHREAD1 ":-D_REENTRANCE} " \
> +                "%{" SPEC_PTHREAD1 ":-D_REENTRANT} " \
>                 "%{" SPEC_PTHREAD2 ": } "
>
>  /* For Windows applications, include more libraries, but always include
>
> --
> O.S.

Applied for you at revision 179511.

Regards,
Kai


Two patches for gcj on ARM backported to gcc 4.6 branch

2011-10-04 Thread Andrew Haley
Two patches, both in trunk for some time, which fix crashes on ARM.

Andrew.


2011-10-04  Andrew Haley  

* src/arm/ffi.c (FFI_INIT_TRAMPOLINE): Clear icache.

Index: libffi/src/arm/ffi.c
===
--- libffi/src/arm/ffi.c(revision 179511)
+++ libffi/src/arm/ffi.c(working copy)
@@ -341,12 +341,16 @@
 ({ unsigned char *__tramp = (unsigned char*)(TRAMP);   \
unsigned int  __fun = (unsigned int)(FUN);  \
unsigned int  __ctx = (unsigned int)(CTX);  \
+   unsigned char *insns = (unsigned char *)(CTX);   \
*(unsigned int*) &__tramp[0] = 0xe92d000f; /* stmfd sp!, {r0-r3} */ \
*(unsigned int*) &__tramp[4] = 0xe59f; /* ldr r0, [pc] */   \
*(unsigned int*) &__tramp[8] = 0xe59ff000; /* ldr pc, [pc] */   \
*(unsigned int*) &__tramp[12] = __ctx;  \
*(unsigned int*) &__tramp[16] = __fun;  \
-   __clear_cache((&__tramp[0]), (&__tramp[19]));   \
+   __clear_cache((&__tramp[0]), (&__tramp[19])); /* Clear data mapping.  */ \
+   __clear_cache(insns, insns + 3 * sizeof (unsigned int)); \
+ /* Clear instruction   \
+mapping.  */\
  })


2011-07-12  Andrew Haley  

* java/lang/natClass.cc (finalize): Make sure that the class
really has an engine.

Index: libjava/java/lang/natClass.cc
===
--- libjava/java/lang/natClass.cc   (revision 179511)
+++ libjava/java/lang/natClass.cc   (working copy)
@@ -668,7 +668,9 @@
 void
 java::lang::Class::finalize (void)
 {
-  engine->unregister(this);
+  // Array classes don't have an engine, and don't need to be finalized.
+  if (engine)
+engine->unregister(this);
 }

 #ifdef INTERPRETER
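
The dual flush in the FFI_INIT_TRAMPOLINE hunk can be sketched generically; `flush_both` is illustrative, not a libffi function:

```c
#include <assert.h>
#include <stddef.h>

/* When code is written through one mapping and executed through
   another, both address ranges must be flushed so the instruction
   cache sees the new trampoline.  Returns LEN for convenience.  */
size_t flush_both (char *write_view, char *exec_view, size_t len)
{
  __builtin___clear_cache (write_view, write_view + len);
  __builtin___clear_cache (exec_view, exec_view + len);
  return len;
}
```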



Re: [Patch, Fortran] PR 35831: [F95] Shape mismatch check missing for dummy procedure argument

2011-10-04 Thread Janus Weil
Hi Mikael,

>> here is a patch for a rather long-standing PR. It continues my ongoing
>> campaign of improving the checks for "procedure characteristics" (cf.
>> F08 chapter 12.3), which are relevant for dummy procedures, procedure
>> pointer assignments, overriding of type-bound procedures, etc.
>>
>> This particular patch checks for the correct shape of array arguments,
>> in a manner similar to the recently added check for the string length
>> (PR 49638), namely via 'gfc_dep_compare_expr'.
>>
>> The hardest thing about this PR was to find out what exactly the
>> standard requires (cf. c.l.f. thread linked in comment #12): Only the
>> shape of the argument has to match (i.e. upper minus lower bound), not
>> the bounds themselves (no matter if the bounds are constant or not).
>>
>> I also added a FIXME, in order to remind myself of adding the same
>> check for function results soon.
>>
>> The patch was regtested on x86_64-unknown-linux-gnu. Ok for trunk?
>>
>
> The patch is basically OK.

thanks for the review!


> Any reason to keep this disabled?
>
> +             case -2:
> +               /* FIXME: Implement a warning for this case.
> +               gfc_warning ("Possible shape mismatch in argument '%s'",
> +                           s1->name);*/
> +               break;

Well, in principle not. As you might have noticed, such a warning is
already in effect e.g. when calling 'gfc_dep_compare_expr' from
'gfc_check_typebound_override'.

The problem with 'check_dummy_characteristics' is that it does not
directly throw an error, but only writes a message into a string
variable and returns FAILURE to the calling procedure, which itself is
responsible for throwing the error.

This mechanism is used to enhance flexibility and improve the error
message by prepending a string describing the context in which the
error message occurs (e.g. we might see a mismatch of characteristics
either in a procedure pointer assignment or when passing an actual
argument for a dummy procedure, etc).

The situation is further complicated by the fact that
'check_dummy_characteristics' is also called by
'gfc_compare_interfaces', which itself does not throw an error either,
but again just passes a string to the calling procedure.

If you have a cute idea how to elegantly introduce warnings into this
mechanism, I'm all ears. Otherwise I'll just start by committing the
patch as posted ...

Thanks again for the feedback,
Janus


Re: PATCH: PR target/50603: [x32] Unnecessary lea

2011-10-04 Thread Uros Bizjak
On Tue, Oct 4, 2011 at 6:06 PM, H.J. Lu  wrote:

>> OTOH, x86_64 and i686 targets can also benefit from this change. If
>> combine can't create more complex address (covered by lea), then it
>> will simply propagate memory operand back into the add insn. It looks
>> to me that we can't lose here, so:
>>
>>  /* Improve address combine.  */
>>  if (code == PLUS && MEM_P (src2))
>>    src2 = force_reg (mode, src2);
>>
>> Any opinions?
>>
>
> It doesn't work with 64bit libstdc++:

Yeah, yeah. ix86_output_mi_thunk has some ...  issues.

Please try attached patch that introduces ix86_emit_binop and uses it
in a bunch of places.

Uros.
Index: i386-protos.h
===
--- i386-protos.h   (revision 179506)
+++ i386-protos.h   (working copy)
@@ -94,6 +94,7 @@ extern bool ix86_lea_outperforms (rtx, unsigned in
  unsigned int, unsigned int);
 extern bool ix86_avoid_lea_for_add (rtx, rtx[]);
 extern bool ix86_avoid_lea_for_addr (rtx, rtx[]);
+extern void ix86_emit_binop (enum rtx_code, enum machine_mode, rtx, rtx);
 extern void ix86_split_lea_for_addr (rtx[], enum machine_mode);
 extern bool ix86_lea_for_add_ok (rtx, rtx[]);
 extern bool ix86_vec_interleave_v2df_operator_ok (rtx operands[3], bool high);
Index: i386.c
===
--- i386.c  (revision 179506)
+++ i386.c  (working copy)
@@ -15727,6 +15727,10 @@ ix86_fixup_binary_operands (enum rtx_code code, en
   if (MEM_P (src1) && !rtx_equal_p (dst, src1))
 src1 = force_reg (mode, src1);
 
+  /* Improve address combine.  */
+  if (code == PLUS && MEM_P (src2))
+src2 = force_reg (mode, src2);
+
   operands[1] = src1;
   operands[2] = src2;
   return dst;
@@ -16470,6 +16474,20 @@ ix86_avoid_lea_for_addr (rtx insn, rtx operands[])
   return !ix86_lea_outperforms (insn, regno0, regno1, regno2, split_cost);
 }
 
+/* Emit x86 binary operand CODE in mode MODE, where the first operand
+   matches destination.  RTX includes clobber of FLAGS_REG.  */
+
+extern void ix86_emit_binop (enum rtx_code code, enum machine_mode mode,
+rtx dst, rtx src)
+{
+  rtx op, clob;
+
+  op = gen_rtx_SET (VOIDmode, dst, gen_rtx_fmt_ee (code, mode, dst, src));
+  clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG));
+  
+  emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, op, clob)));
+}
+
 /* Split lea instructions into a sequence of instructions
which are executed on ALU to avoid AGU stalls.
It is assumed that it is allowed to clobber flags register
@@ -16482,8 +16500,7 @@ ix86_split_lea_for_addr (rtx operands[], enum mach
   unsigned int regno1 = INVALID_REGNUM;
   unsigned int regno2 = INVALID_REGNUM;
   struct ix86_address parts;
-  rtx tmp, clob;
-  rtvec par;
+  rtx tmp;
   int ok, adds;
 
   ok = ix86_decompose_address (operands[1], &parts);
@@ -16515,14 +16532,7 @@ ix86_split_lea_for_addr (rtx operands[], enum mach
  gcc_assert (regno2 != regno0);
 
  for (adds = parts.scale; adds > 0; adds--)
-   {
- tmp = gen_rtx_PLUS (mode, operands[0], parts.index);
- tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
- clob = gen_rtx_CLOBBER (VOIDmode,
- gen_rtx_REG (CCmode, FLAGS_REG));
- par = gen_rtvec (2, tmp, clob);
- emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
-   }
+   ix86_emit_binop (PLUS, mode, operands[0], parts.index);
}
   else
{
@@ -16531,30 +16541,14 @@ ix86_split_lea_for_addr (rtx operands[], enum mach
emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.index));
 
  /* Use shift for scaling.  */
- tmp = gen_rtx_ASHIFT (mode, operands[0],
-   GEN_INT (exact_log2 (parts.scale)));
- tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
- clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG));
- par = gen_rtvec (2, tmp, clob);
- emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
+ ix86_emit_binop (ASHIFT, mode, operands[0],
+  GEN_INT (exact_log2 (parts.scale)));
 
  if (parts.base)
-   {
- tmp = gen_rtx_PLUS (mode, operands[0], parts.base);
- tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
- clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, 
FLAGS_REG));
- par = gen_rtvec (2, tmp, clob);
- emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
-   }
+   ix86_emit_binop (PLUS, mode, operands[0], parts.base);
 
  if (parts.disp && parts.disp != const0_rtx)
-   {
- tmp = gen_rtx_PLUS (mode, operands[0], parts.disp);
- tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
- clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, 
FLAGS_REG));
- par = gen_rtvec

Re: [PATCH] Support -m{cpu,tune}=native on Linux/Sparc

2011-10-04 Thread David Miller
From: Gerald Pfeifer 
Date: Tue, 4 Oct 2011 11:25:11 +0200 (CEST)

> On Sun, 2 Oct 2011, David Miller wrote:
>> We always use "Sparc" instead of "SPARC" for the cpu names output
>> by the kernel in /proc/cpuinfo.
> 
> Ah, I see.  Thanks for the explanation.
> 
> What confused me a bit is that a bit earlier in the same table
> there is one (and just one) entry using "SPARC":
> 
>   { "SuperSPARC",   "supersparc" }
> 
> I assume this is for historical reasons?

Committed to trunk, as discussed:


Small -m{cpu,tune}=native fix under Linux/Sparc.

* config/sparc/driver-sparc.c (cpu_names): Fix string for supersparc
under Linux.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@179510 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog   |5 +
 gcc/config/sparc/driver-sparc.c |2 +-
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 2f4d9a6..788c5f5 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2011-10-04  David S. Miller  
+
+   * config/sparc/driver-sparc.c (cpu_names): Fix string for supersparc
+   under Linux.
+
 2011-10-04  Jakub Jelinek  
 
PR tree-optimization/50604
diff --git a/gcc/config/sparc/driver-sparc.c b/gcc/config/sparc/driver-sparc.c
index 9622728..3973fbc 100644
--- a/gcc/config/sparc/driver-sparc.c
+++ b/gcc/config/sparc/driver-sparc.c
@@ -58,7 +58,7 @@ static const struct cpu_names {
   { "SPARC-T3","niagara3" },
   { "SPARC-T4","niagara4" },
 #else
-  { "SuperSPARC",  "supersparc" },
+  { "SuperSparc",  "supersparc" },
   { "HyperSparc",  "hypersparc" },
   { "SpitFire","ultrasparc" },
   { "BlackBird",   "ultrasparc" },
-- 
1.7.6.401.g6a319



Re: [Patch] Support DEC-C extensions

2011-10-04 Thread Douglas Rupp

On 10/3/2011 8:35 AM, Gabriel Dos Reis wrote:

"unnamed variadic functions" sounds as if the function itself is
unnamed, so not good.


-funnamed-variadic-parameter


How about
-fvariadic-parameters-unnamed

there's already a -fvariadic-macros, so maybe putting variadic first is 
more consistent?




Re: [RFC PATCH] restrict_based_on_field attribute

2011-10-04 Thread Jason Merrill
This won't affect vectors that are passed by non-restrict reference, 
correct?  We don't want to break code that, say, takes the address of an 
element and then uses it later unless we've made that invalid by marking 
the vector as restrict.


Jason


Re: PATCH: PR target/50603: [x32] Unnecessary lea

2011-10-04 Thread H.J. Lu
On Tue, Oct 4, 2011 at 10:32 AM, Uros Bizjak  wrote:
> On Tue, Oct 4, 2011 at 6:06 PM, H.J. Lu  wrote:
>
>>> OTOH, x86_64 and i686 targets can also benefit from this change. If
>>> combine can't create more complex address (covered by lea), then it
>>> will simply propagate memory operand back into the add insn. It looks
>>> to me that we can't loose here, so:
>>>
>>>  /* Improve address combine.  */
>>>  if (code == PLUS && MEM_P (src2))
>>>    src2 = force_reg (mode, src2);
>>>
>>> Any opinions?
>>>
>>
>> It doesn't work with 64bit libstdc++:
>
> Yeah, yeah. ix86_output_mi_thunk has some ...  issues.
>
> Please try attached patch that introduces ix86_emit_binop and uses it
> in a bunch of places.
>
> Uros.
>

I tried it on GCC.  There are no regressions.  The bugs are fixed for x32.
Here are size comparison with GCC runtime libraries on ia32, x32 and
x86-64:

   text    data     bss     dec     hex filename
 103304     480     368  104152   196d8 32/libgcc/32/libgcc_s.so
 103048     480     368  103896   195d8 ../../build-x86_64-linux/x86_64-unknown-linux-gnu/32/libgcc/32/libgcc_s.so
   text    data     bss     dec     hex filename
1053846    3100     360 1057306  10221a 32/libgfortran/.libs/libgfortran.so
1053574    3100     360 1057034  10210a ../../build-x86_64-linux/x86_64-unknown-linux-gnu/32/libgfortran/.libs/libgfortran.so
   text    data     bss     dec     hex filename
  56817     536     156   57509    e0a5 32/libgomp/.libs/libgomp.so
  56817     536     156   57509    e0a5 ../../build-x86_64-linux/x86_64-unknown-linux-gnu/32/libgomp/.libs/libgomp.so
   text    data     bss     dec     hex filename
 107815    1832  839016  948663   e79b7 32/libmudflap/.libs/libmudflap.so
 107815    1832  839016  948663   e79b7 ../../build-x86_64-linux/x86_64-unknown-linux-gnu/32/libmudflap/.libs/libmudflap.so
   text    data     bss     dec     hex filename
 110890    1976  839052  951918   e866e 32/libmudflap/.libs/libmudflapth.so
 110890    1976  839052  951918   e866e ../../build-x86_64-linux/x86_64-unknown-linux-gnu/32/libmudflap/.libs/libmudflapth.so
   text    data     bss     dec     hex filename
  99113    2912    4748  106773   1a115 32/libobjc/.libs/libobjc.so
  99113    2912    4748  106773   1a115 ../../build-x86_64-linux/x86_64-unknown-linux-gnu/32/libobjc/.libs/libobjc.so
   text    data     bss     dec     hex filename
 357466    1148      12  358626   578e2 32/libquadmath/.libs/libquadmath.so
 357218    1148      12  358378   577ea ../../build-x86_64-linux/x86_64-unknown-linux-gnu/32/libquadmath/.libs/libquadmath.so
   text    data     bss     dec     hex filename
   6018     388       8    6414    190e 32/libssp/.libs/libssp.so
   6018     388       8    6414    190e ../../build-x86_64-linux/x86_64-unknown-linux-gnu/32/libssp/.libs/libssp.so
   text    data     bss     dec     hex filename
 884093   18600   27064  929757   e2fdd 32/libstdc++-v3/src/.libs/libstdc++.so
 884189   18600   27064  929853   e303d ../../build-x86_64-linux/x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs/libstdc++.so
   text    data     bss     dec     hex filename
  83012     944     696   84652   14aac libgcc/libgcc_s.so
  83012     944     696   84652   14aac ../../build-x86_64-linux/x86_64-unknown-linux-gnu/libgcc/libgcc_s.so
   text    data     bss     dec     hex filename
1144593    6520     568 1151681  1192c1 libgfortran/.libs/libgfortran.so
1144561    6520     568 1151649  1192a1 ../../build-x86_64-linux/x86_64-unknown-linux-gnu/libgfortran/.libs/libgfortran.so
   text    data     bss     dec     hex filename
  51958    1080     260   53298    d032 libgomp/.libs/libgomp.so
  51958    1080     260   53298    d032 ../../build-x86_64-linux/x86_64-unknown-linux-gnu/libgomp/.libs/libgomp.so
   text    data     bss     dec     hex filename
 111831    3332 1634728 1749891  1ab383 libmudflap/.libs/libmudflap.so
 111831    3332 1634728 1749891  1ab383 ../../build-x86_64-linux/x86_64-unknown-linux-gnu/libmudflap/.libs/libmudflap.so
   text    data     bss     dec     hex filename
 116430    3612 1634768 1754810  1ac6ba libmudflap/.libs/libmudflapth.so
 116430    3612 1634768 1754810  1ac6ba ../../build-x86_64-linux/x86_64-unknown-linux-gnu/libmudflap/.libs/libmudflapth.so
   text    data     bss     dec     hex filename
 102240    4872    9320  116432   1c6d0 libobjc/.libs/libobjc.so
 102240    4872    9320  116432   1c6d0 ../../build-x86_64-linux/x86_64-unknown-linux-gnu/libobjc/.libs/libobjc.so
   text    data     bss     dec     hex filename
 213215    1744      16  214975   347bf libquadmath/.libs/libquadmath.so
 213231    1744      16  214991   347cf ../../build-x86_64-linux/x86_64-unknown-linux-gnu/libquadmath/.libs/libquadmath.so
   text    data     bss     dec     hex filename
   6395     768      16    7179    1c0b libssp/.libs/libssp.so
   6395     768      16    7179    1c0b ../../build-x86_64-linux/x86_64-unknown-linux-gnu/libssp/.libs/libssp.so
   text    data     bss     de

Re: [RFC PATCH] restrict_based_on_field attribute

2011-10-04 Thread Jakub Jelinek
On Tue, Oct 04, 2011 at 02:28:13PM -0400, Jason Merrill wrote:
> This won't affect vectors that are passed by non-restrict reference,
> correct?  We don't want to break code that, say, takes the address

It will ATM also affect global std::vector variables, and std::vector
variables passed by invisible reference.

> of an element and then uses it later unless we've made that invalid
> by marking the vector as restrict.

If you take an address of an element, that would be an expression based
on the restricted field(s).  As such, either PTA would figure out it
is based on it and would use the same restrict tag, or it should figure out
it doesn't know what the pointer points to and shouldn't have a (restr)
on it.

Jakub


Re: patch ping: thread testing infrastructure

2011-10-04 Thread Jeff Law

On 10/04/11 08:43, Aldy Hernandez wrote:
> http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01272.html
So others have already mentioned the name "memmodel" as being
unsatisfactory...  From the list, I liked simulate-thread (before
looking at your message where you voted for simulate-thread).

in memmodel-gdb-test, is there any reasonable way to detect if we're
not exactly 3 frames down from dg-test and that the local variable
"name" holds the filename of the testcase?  Are these requirements
documented anywhere?

In the same code, don't you want to use something other than
GUALITY_GDB_NAME?  Clearly we've got two gcc testing frameworks that
need gdb.  A generic name might work better: GDB_FOR_GCC_TESTING?  Or
something like that.

In remote_expect_target, don't you need to reorder the cases to have
the too-old-gdb case first.  Otherwise you can get inconsistent
results due to the way matching works in tcl/expect.  On a larger
note, can you use the routines from gcc-gdb-test.exp rather than
duplicating it?


In memmodel/guality.h, rather than including guality.h, would it make
sense to move the bits into a centralized place for use by any GCC
testing framework that wants to fire up gdb?

In memmodel.exp, do we really want to continue to have references to
guality (check_guality, GUALITY_GDB_NAME, GUALCHKVAL)?  Are any of
those commonizable with the guality framework?


Do we have allow-store-data-races as a parameter in GCC right now?  If
not you'll need to do something about the tests themselves.


It's mostly OK -- just those nits related to copying the guality
stuff.  tcl may not make it feasible to share as much between guality
and simulate-thread as we'd like.


Jeff


Re: PATCH: PR target/50603: [x32] Unnecessary lea

2011-10-04 Thread Uros Bizjak
On Tue, Oct 4, 2011 at 8:37 PM, H.J. Lu  wrote:

 OTOH, x86_64 and i686 targets can also benefit from this change. If
 combine can't create more complex address (covered by lea), then it
 will simply propagate memory operand back into the add insn. It looks
>  to me that we can't lose here, so:

  /* Improve address combine.  */
  if (code == PLUS && MEM_P (src2))
    src2 = force_reg (mode, src2);

 Any opinions?

>>>
>>> It doesn't work with 64bit libstdc++:
>>
>> Yeah, yeah. ix86_output_mi_thunk has some ...  issues.
>>
>> Please try attached patch that introduces ix86_emit_binop and uses it
>> in a bunch of places.

> I tried it on GCC.  There are no regressions.  The bugs are fixed for x32.
> Here are size comparison with GCC runtime libraries on ia32, x32 and
> x86-64:

>  884093   18600   27064  929757   e2fdd old libstdc++.so
>  884189   18600   27064  929853   e303d new libs/libstdc++.so
>
> The new code is
>
> mov    0xc(%edi),%eax
> mov    %eax,0x8(%esi)
> mov    -0xc(%eax),%eax
> mov    0x10(%edi),%edx
> lea    0x8(%esi,%eax,1),%eax
>
> The old one is
>
> mov    0xc(%edi),%edx
> lea    0x8(%esi),%eax
> mov    %edx,0x8(%esi)
> add    -0xc(%edx),%eax
> mov    0x10(%edi),%edx

The new code merged lea+add into one lea, so it looks quite OK to me.

Do you have some performance numbers?

Thanks,
Uros.


[cxx-mem-model] don't issue non-atomic load/store

2011-10-04 Thread Andrew MacLeod
I think I had a similar patch earlier that got forgotten, but I've made 
some changes.


If sizeof(atomic) > target_word_size, then defaulting to a load or
store with barriers will result in non-atomic code, which is bad.
Instead, the default has now been changed.


Store will try an exchange, and throw away the result.  If that fails, 
then default to a function call.
Load will try to do a compare_and_swap (mem, 0, 0), which will get the
current value of *mem if it is not 0, and if it is 0, then it will cause
a 'mostly' harmless store of 0 to a location already containing 0.


bootstraps with no new regressions on x86_64-unknown-linux-gnu

Andrew


* optabs.c (expand_sync_mem_load): Don't expand into a default load if
the type is larger than a word. Try a compare_and_swap with 0.
(expand_sync_mem_store): Return const0_rtx if a store is generated. If
type is larger than a word try an exchange, then fail.
* builtins.c (expand_builtin_sync_mem_store): Return a value.
(expand_builtin): If no store generated, leave a function call.
* expr.h (expand_sync_mem_store): Prototype returns value.

* testsuite/gcc.dg/memmodel/sync-other-int128.c: Don't xfail any more.


Index: optabs.c
===
*** optabs.c(revision 178916)
--- optabs.c(working copy)
*** expand_sync_mem_load (rtx target, rtx me
*** 7032,7038 
return ops[0].value;
  }
  
!   /* If there is no load, default to a move with barriers. */
if (target == const0_rtx)
  target = gen_reg_rtx (mode);
  
--- 7032,7050 
return ops[0].value;
  }
  
!   /* If there is no load pattern, default to a move with barriers. If the size
!  of the object is greater than word size on this target, a default load 
!  will not be atomic.  */
!   if (GET_MODE_PRECISION(mode) > BITS_PER_WORD)
! {
!   /* Issue val = compare_and_swap (mem, 0 , 0).
!This may cause the occasional harmless store of 0 when the value is
!already 0, but do it anyway until its determined to be invalid.  */
!   target = expand_val_compare_and_swap (mem, const0_rtx, const0_rtx, 
!   target);
!   return target;
! }
! 
if (target == const0_rtx)
  target = gen_reg_rtx (mode);
  
*** expand_sync_mem_load (rtx target, rtx me
*** 7048,7060 
return target;
  }
  
! /* This function expands the atomic load operation:
!return the atomically loaded value in MEM.
! 
 MEMMODEL is the memory model variant to use.
!TARGET is an option place to stick the return value.  */
  
! void
  expand_sync_mem_store (rtx mem, rtx val, enum memmodel model)
  {
enum machine_mode mode = GET_MODE (mem);
--- 7060,7071 
return target;
  }
  
! /* This function expands the atomic store operation:
!Atomically store VAL in MEM.
 MEMMODEL is the memory model variant to use.
!function returns const0_rtx if a pattern was emitted.  */
  
! rtx
  expand_sync_mem_store (rtx mem, rtx val, enum memmodel model)
  {
enum machine_mode mode = GET_MODE (mem);
*** expand_sync_mem_store (rtx mem, rtx val,
*** 7070,7076 
create_input_operand (&ops[1], val, mode);
create_integer_operand (&ops[2], model);
if (maybe_expand_insn (icode, 3, ops))
!   return;
  }
  
/* A store of 0 is the same as __sync_lock_release, try that.  */
--- 7081,7087 
create_input_operand (&ops[1], val, mode);
create_integer_operand (&ops[2], model);
if (maybe_expand_insn (icode, 3, ops))
!   return const0_rtx;
  }
  
/* A store of 0 is the same as __sync_lock_release, try that.  */
*** expand_sync_mem_store (rtx mem, rtx val,
*** 7086,7095 
  /* lock_release is only a release barrier.  */
  if (model == MEMMODEL_SEQ_CST)
expand_builtin_mem_thread_fence (model);
! return;
}
}
  }
/* If there is no mem_store, default to a move with barriers */
if (model == MEMMODEL_SEQ_CST || model == MEMMODEL_RELEASE)
  expand_builtin_mem_thread_fence (model);
--- 7097,7119 
  /* lock_release is only a release barrier.  */
  if (model == MEMMODEL_SEQ_CST)
expand_builtin_mem_thread_fence (model);
! return const0_rtx;
}
}
  }
+ 
+   /* If the size of the object is greater than word size on this target,
+  a default store will not be atomic, Try a mem_exchange and throw away
+  the result.  If that doesn't work, don't do anything.  */
+   if (GET_MODE_PRECISION(mode) > BITS_PER_WORD)
+ {
+   rtx target = expand_sync_mem_exchange (NULL_RTX, mem, val, model);
+   if (target)
+ return const0_rtx;
+   else
+ return NULL_RTX;
+ }
+ 
/* If there is no mem_store,

Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.

2011-10-04 Thread Richard Henderson
On 10/04/2011 09:28 AM, Tom de Vries wrote:
> Well, that was the idea. But now I wonder, isn't it better to do this in
> expand_builtin_alloca:
> ...
>/* Compute the argument.  */
>op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
> 
> +  align =
> +(DECL_FUNCTION_CODE (get_callee_fndecl (exp)) ==
> + BUILT_IN_ALLOCA_WITH_ALIGN)
> +? TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 1))
> +: BIGGEST_ALIGNMENT;
> +
> +
>/* Allocate the desired space.  */
> -  result = allocate_dynamic_stack_space (op0, 0, BIGGEST_ALIGNMENT,
> -  cannot_accumulate);
> +  result = allocate_dynamic_stack_space (op0, 0, align, cannot_accumulate);

Yes, this is better.

I've lost track of what you're trying to do with "folding" alloca?


r~


Re: [Patch, Fortran] PR 35831: [F95] Shape mismatch check missing for dummy procedure argument

2011-10-04 Thread Janus Weil
>>> The patch was regtested on x86_64-unknown-linux-gnu. Ok for trunk?
>>>
>> The patch is basically OK.


> Otherwise I'll just start by committing the
> patch as posted ...

Just did so (r179520).

Cheers,
Janus


Re: PATCH: PR target/50603: [x32] Unnecessary lea

2011-10-04 Thread H.J. Lu
On Tue, Oct 4, 2011 at 11:51 AM, Uros Bizjak  wrote:
> On Tue, Oct 4, 2011 at 8:37 PM, H.J. Lu  wrote:
>
> OTOH, x86_64 and i686 targets can also benefit from this change. If
> combine can't create more complex address (covered by lea), then it
> will simply propagate memory operand back into the add insn. It looks
> to me that we can't lose here, so:
>
>  /* Improve address combine.  */
>  if (code == PLUS && MEM_P (src2))
>    src2 = force_reg (mode, src2);
>
> Any opinions?
>

 It doesn't work with 64bit libstdc++:
>>>
>>> Yeah, yeah. ix86_output_mi_thunk has some ...  issues.
>>>
>>> Please try attached patch that introduces ix86_emit_binop and uses it
>>> in a bunch of places.
>
>> I tried it on GCC.  There are no regressions.  The bugs are fixed for x32.
>> Here are size comparison with GCC runtime libraries on ia32, x32 and
>> x86-64:
>
>>  884093   18600   27064  929757   e2fdd old libstdc++.so
>>  884189   18600   27064  929853   e303d new libs/libstdc++.so
>>
>> The new code is
>>
>> mov    0xc(%edi),%eax
>> mov    %eax,0x8(%esi)
>> mov    -0xc(%eax),%eax
>> mov    0x10(%edi),%edx
>> lea    0x8(%esi,%eax,1),%eax
>>
>> The old one is
>>
>> mov    0xc(%edi),%edx
>> lea    0x8(%esi),%eax
>> mov    %edx,0x8(%esi)
>> add    -0xc(%edx),%eax
>> mov    0x10(%edi),%edx
>
> The new code merged lea+add into one lea, so it looks quite OK to me.
>
> Do you have some performance numbers?
>

I will report performance numbers in a few days.

-- 
H.J.


Re: [PATCH] Maintain order of LTO sections

2011-10-04 Thread Mike Stump
On Oct 4, 2011, at 6:03 AM, Andi Kleen wrote:
> Also as Honza pointed out it has other benefits, like making
> compiles more reproducible. For example if you have a memory corruption
> somewhere the random order currently will randomly move it from
> run to run and make it harder to debug.

I like deterministic compilers.  :-)  I like an ordering that has a specific 
deterministic reason for it.


[cxx-mem-model] gdb testing changes to avoid infinite loops.

2011-10-04 Thread Andrew MacLeod
The framework for invoking gdb and simulating other threads can run into
a problem when memmodel_other_thread() is written in a hostile manner.


memmodel_other_thread ()
{
  mem++; /* increment mem every instruction.  */
}

and it is tested against something like a compare_and_swap loop 
implementing add.


r1 = mem
loop:
r2 = r2 + value;
if (r1 = Compare_and_Swap (&mem, r1, r2))
   goto loop;

the value of mem will always change between loading it into r1 and the 
execution of the compare and swap after the add.  This causes the 
testcase to go into an infinite loop trying to make forward progress.


This adds a hostile thread check to the framework such that if more 
instructions are executed by the test than a specified threshold, a 
pause is inserted in calling memmodel_other_thread allowing forward 
progress to be made.


A testcase can opt to make this an error condition, but defaults to not 
an error.  It could be used to test for forward progress down the road 
perhaps.


Andrew

* gcc.dg/memmodel/memmodel.h (__gdb_memmodel_fini): Rename from 
memmodel_fini.
(HOSTILE_THREAD_THRESHOLD, HOSTILE_THREAD_PAUSE): New.
(__gdb_hostile_pause): New. Indicates a pause was triggered.
(__gdb_wrapper_other_threads): New.  Wrapper for memmodel_other_threads.
Avoid infinite loops.
(__gdb_wrapper_final_verify): New. Wrapper for memmodel_final_verify.
* gcc.dg/memmodel/memmodel.gdb: Use renamed __gdb_ routines instead.
* gcc.dg/memmodel/speculative-store.c (memmodel_other_threads): Correct
return type.

Index: memmodel.h
===
*** memmodel.h  (revision 178916)
--- memmodel.h  (working copy)
***
*** 1,7 
! int memmodel_fini = 0;
  
  void __attribute__((noinline))
  memmodel_done ()
  {
!   memmodel_fini = 1;
  }
--- 1,107 
! int __gdb_memmodel_fini = 0;
  
  void __attribute__((noinline))
  memmodel_done ()
  {
!   __gdb_memmodel_fini = 1;
! }
! 
! 
! /* A hostile thread is one which changes a memory location so quickly that 
!another thread may never see the same value again.  This is simulated when
!memmodel_other_thread() is defined to modify a memory location every cycle.
! 
!A process implementing a dependency on this value can run into difficulties
!with such a hostile thread.  for instance, implementing an add with a 
!compare_and_swap loop goes something like:
!  expected = *mem;
!loop:
!  new = expected += value;
!  if (!succeed (expected = compare_and_swap (mem, expected, new)))
!goto loop;
! 
!if the content of 'mem' are changed every cycle by memodel_other_thread ()
!this will become an infinite loop since the value *mem will never be 
!'expected' by the time the compare_and_swap is executed.
! 
!HOSTILE_THREAD_THRESHOLD defines the number of intructions which a program 
!will execute before triggering the hostile threead pause. The pause will 
!last for HOSTILE_THREAD_PAUSE instructions, and then the counter will reset
!and begin again.  During the pause period, memodel_other_thread will not
!be called. 
! 
!This provides a chance for forward progress to be made and the infinite 
loop
!to be avoided.
! 
!if the testcase defines HOSTILE_PAUSE_ERROR, then it will be considered an
!RUNTIME FAILURE if the hostile pause is triggered.  This will allow to test
!for guaranteed forward progress routines.
! 
!If the default values for HOSTILE_THREAD_THRESHOLD or HOSTILE_THREAD_PAUSE
!are insufficient, then the testcase may override these by defining the
!values before including this file.   
! 
!Most testcase are intended to run for very short periods of time, so these
!defaults are considered to be high enough to not trigger on a typical case,
!but not drag the test time out too much if a hostile condition is 
!interferring.  */
! 
!   
! /* Define the threshold to start pausing the hostile thread.  */
! #if !defined (HOSTILE_THREAD_THRESHOLD)
! #define HOSTILE_THREAD_THRESHOLD  500
! #endif
! 
! /* Define the length of pause in cycles for the hostile thread to pause to
!allow forward progress to be made.  */
! #if !defined (HOSTILE_THREAD_PAUSE)
! #define HOSTILE_THREAD_PAUSE  20
! #endif
! 
! void memmodel_other_threads (void);
! int memmodel_final_verify (void);
! 
! static int __gdb_hostile_pause = 0;
! 
! /* This function wraps memodel_other_threads an monitors for an infinite loop.
!If the threshold value HOSTILE_THREAD_THRESHOLD is reached, the 
other_thread
!process is paused for HOSTILE_THREAD_PAUSE cycles before resuming, and the
!counters start again.  */
! void
! __gdb_wrapper_other_threads()
! {
!   static int count = 0;
!   static int pause = 0;
! 
!   if (++count >= HOSTILE_THREAD_THRESHOLD)
! {
!   if (!__gdb_hostile_pause)
! __gdb_hostile_pause = 1;
! 
!   /* Count cycle

Re: [cxx-mem-model] don't issue non-atomic load/store

2011-10-04 Thread Richard Henderson
On 10/04/2011 11:51 AM, Andrew MacLeod wrote:
>   * optabs.c (expand_sync_mem_load): Don't expand into a default load if
>   the type is larger than a word. Try a compare_and_swap with 0.
>   (expand_sync_mem_store): Return const0_rtx if a store is generated. If
>   type is larger than a word try an exchange, then fail.
>   * builtins.c (expand_builtin_sync_mem_store): Return a value.
>   (expand_builtin): If no store generated, leave a function call.
>   * expr.h (expand_sync_mem_store): Prototype returns value.
> 
>   * testsuite/gcc.dg/memmodel/sync-other-int128.c: Don't xfail any more.

Ok.


r~


Re: [PATCH] Handle side-effects of EH optimizations of callees

2011-10-04 Thread Maxim Kuvyrkov
On 5/10/2011, at 1:49 AM, Richard Guenther wrote:

> On Tue, Oct 4, 2011 at 9:17 AM, Maxim Kuvyrkov  wrote:
>> Richard,
>> 
>> The following patch fixes a CFG consistency problem.
>> 
>> When early IPA optimizations (e.g., early SRA) create a version of a 
>> function that no longer throws, versioning machinery deletes EH landings 
>> pads in _callers_ of the function [*].  However, the EH cfg edges are not 
>> updated in the callers, which eventually leads to an ICE in verify_eh_edges.
> 
> It needs to update EH edges in the callers via the common idiom
> 
>  if (maybe_clean_or_replace_eh_stmt (stmt, stmt)
>  && gimple_purge_dead_eh_edges (gimple_bb (stmt)))
>... arrange for cfg-cleanup to be run ...
> 
> where the cfg-cleanup part probably explains why it doesn't.  So one option
> is to not delete the landing pads.

There are several places in the compiler where code does not follow the above
idiom and calls maybe_clean[_or_replace]_eh_stmt without following it up with
cfg-cleanup.

>  Where does it do that?

The cases that I investigated are:
- cgraphunit.c: update_call_expr().  The right thing to do /seems/ to be to
remove the call to maybe_clean_eh_stmt_fn().
- ipa-prop.c: ipa_modify_call_arguments() -> gsi_replace().  The right thing to 
do /seems/ to be to add follow up call to gimple_purge_dead_eh_edges().

Both cases above are triggered from early SRA.

The other places that appear not to update CFG after calling 
maybe_clean_or_replace_eh_stmt are:
- tree-cfg.c: replace_uses_by()
- tree-ssa-dce.c: eliminate_unnecessary_stmts().

>  I suppose it
> merely fails to properly transfer the EH info from the old to the new call
> (thus, fails to forcefully do what maybe_duplicate_eh_stmt does).

I don't follow you here.  The EH CFG edges that get out-of-date are in the
_caller_ function.  It is the _callee_ function that is duplicated.

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics



[wwwdocs] add libstdc++/1773 change to gcc-4.7/changes.html

2011-10-04 Thread Jonathan Wakely
I've committed this, which documents the fix for
http://gcc.gnu.org/PR1773 in gcc-4.7/changes.html, and also replaces
some > characters with the &gt; entity.
Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.7/changes.html,v
retrieving revision 1.45
diff -r1.45 changes.html
84c84
<   if (a>10)
---
>   if (a>10)
211c211
< template 
---
> template 
215c215
< template 
---
> template 
218c218
<   void f() { g(T()); } // fix by using this->g or A::g
---
>   void f() { g(T()); } // fix by using this->g or A::g
225,226c225,226
<   f();
<   A().f();
---
>   f();
>   A().f();
232c232
< template
---
> template
242c242
<   int ar[Q::I];
---
>   int ar[Q::I];
268a269,272
> 
>   G++ now sets the predefined macro __cplusplus to the
> correct value, 199711L.
>   


Re: [PATCH] Fix ICE with strlen optimization (PR tree-optimization/50604)

2011-10-04 Thread Jeff Law

On 10/04/11 05:00, Jakub Jelinek wrote:
> Hi!
> 
> The following testcase ICEs on the trunk, as strlen optimization
> was assuming memcpy arguments will have expected type
> (size_type_node), but they had ssizetype instead.  The following
> patch fixes it both in the builtins.c folders that create memcpy
> and also in the strlen pass to no longer assume that.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?
> 
> 2011-10-04  Jakub Jelinek  
> 
> PR tree-optimization/50604 * builtins.c (fold_builtin_strcpy,
> fold_builtin_stpcpy, fold_builtin_strncpy,
> fold_builtin_stxcpy_chk): Ensure last argument to memcpy has
> size_type_node type instead of ssizetype. * tree-ssa-strlen.c
> (handle_builtin_memcpy): Use size_type_node instead of TREE_TYPE
> (len) as type for newlen.
> 
> * gcc.dg/pr50604.c: New test.
OK.
jeff


[doc] fix broken anchor in install.texi

2011-10-04 Thread Jonathan Wakely
I want to commit the attached patch to install.texi, but have been
unable to test it successfully.

The problem is that the link for "x86_64-*-solaris2.1[0-9]*" in the
Host/Target Specific notes is
http://gcc.gnu.org/install/specific.html#x86-64-x-solaris210
but the corresponding anchor is "x86-64-x-solaris2_002e10".

Note the "_002e" difference, which is caused by the texinfo uref being:
@uref{#x86-64-x-solaris210,,x86_64-*-solaris2.1[0-9]*}
but the anchor target having an extra dot:
@anchor{x86-64-x-solaris2.10}

My patch removes that dot.

I tried to test it by regenerating the HTML docs (using
texinfo-4.13a-15 from Fedora 15), but I get different results: the dot
is gone as expected, but every hyphen gets expanded to _002d, so none
of the anchors work!

I'm pretty sure the patch is correct but will check the link in the
GCC onlinedocs once they're regenerated.

OK to commit?

2011-10-04  Jonathan Wakely  

* doc/install.texi (Specific): Fix anchor for
x86_64-*-solaris2.1[0-9]*
Index: install.texi
===
--- install.texi(revision 179520)
+++ install.texi(working copy)
@@ -4448,7 +4448,7 @@ both 64-bit x86-64 and 32-bit x86 code (
 @html
 
 @end html
-@heading @anchor{x86-64-x-solaris2.10}x86_64-*-solaris2.1[0-9]*
+@heading @anchor{x86-64-x-solaris210}x86_64-*-solaris2.1[0-9]*
 
 GCC also supports the x86-64 architecture implemented by the AMD64
 processor (@samp{amd64-*-*} is an alias for @samp{x86_64-*-*}) on


Re: [patch RFC,PR50038]

2011-10-04 Thread Richard Henderson
On 10/04/2011 08:42 AM, Joseph S. Myers wrote:
> On Tue, 4 Oct 2011, Ilya Tocar wrote:
> 
>> Hi everyone,
>>
>> This patch fixes PR 50038 (redundant zero extensions) by modifying
>> implicit-zee pass
>> to also remove unneeded zero extensions from QImode to SImode.
> 
> Hardcoding particular modes like this in the target-independent parts of 
> the compiler is fundamentally ill-conceived.  Right now it hardcodes the 
> (SImode, DImode) pair.  You're adding hardcoding of (QImode, SImode) as 
> well.  But really it should consider all pairs of (integer mode, wider 
> integer mode), with the machine description (or target hooks) determining 
> which pairs are relevant on a particular target.  Changing it not to 
> hardcode particular modes would be better than adding a second pair.
> 

That along with not hard-coding ZERO_EXTEND.

Both MIPS and Alpha have much the same free operations, but with SIGN_EXTEND.

I remember rejecting one iteration of this pass with this hard-coded, but the
pass was apparently approved by someone else without that being corrected.


r~


Re: [PATCH 3/7] Emit macro expansion related diagnostics

2011-10-04 Thread Jason Merrill

On 10/03/2011 06:49 PM, Dodji Seketeli wrote:

Good question.  Is the below better?



+  if (linemap_location_from_macro_expansion_p (set, pre))
+pre = linemap_resolve_location (set, pre,
+   LRK_MACRO_EXPANSION_POINT, NULL);
+
+  if (linemap_location_from_macro_expansion_p (set, post))
+post = linemap_resolve_location (set, post,
+LRK_MACRO_EXPANSION_POINT,
+NULL);


This looks like two different virtual locations from the same macro 
expansion will compare equal.


Jason


Re: [PATCH 3/7] Emit macro expansion related diagnostics

2011-10-04 Thread Jason Merrill

On 10/03/2011 04:08 PM, Dodji Seketeli wrote:

Jason Merrill  writes:

-finish_declspecs (struct c_declspecs *specs)
+finish_declspecs (struct c_declspecs *specs,
+ location_t where)


I'm not sure the beginning of the declspecs is a better place for
these diagnostics than the beginning of the declarator.


To fix the particular example of PR 7263 (and the example
gcc/testsuite/gcc.dg/cpp/syshdr3.c that I added to that particular
patch) where the declspec is what is defined in the system header, and
the declaration (declspec + declarator) gets assembled in the source, we
want pedwarn[1] (really diagnostic_report_diagnostic) to detect that the
declspec is spelled in a system header.  So we want pedwarn to be passed
the location of a declspec.


So then this change would make

_Complex c;

OK, but not

static _Complex c;

because the first declspec is not from a macro, right?


I believe you noted this at some point and
agreed with me that ideally each declspec should come with its location
but that's work for another occasion.


If I'm right about the above example, I think I'd rather hold off this 
declspecs change until that time.


Jason


Re: Explicitly record which registers are inaccessible

2011-10-04 Thread Richard Sandiford
Richard Sandiford  writes:
> Bernd Schmidt  writes:
>> On 09/25/11 19:16, Richard Sandiford wrote:
>>> The last bit is indirect, via a new HARD_REG_SET called operand_reg_set.
>>> And this set is the reason why I'm sending the patch now.  The MIPS16 port
>>> has always had a problem with the HI and LO registers: they can only be
>>> set by multiplication and division instructions, and only read by MFHI
>>> and MFLO.  Unlike normal MIPS, there are no MTHI and MTLO instructions.
>> [...]
>>  >  Now that we use pressure classes
>>> instead (a good thing), I'm finally going to try to fix this "properly".
>>> And that means (a) fixing HI and LO and (b) stopping them from being
>>> treated as register operands.  (b) is important because if we start
>>> out with this (valid) instruction before reload:
>>
>> The only slightly nonobvious thing about this is that mfhi/mflo can't
>> have their operand represented using a register_operand. I haven't
>> looked; I assume that's the case. Ok.
>
> Right.  The follow-up MIPS patch (which I've been sitting on, I suppose
> I should post it when I get home, sorry) removes HI and LO from the new
> operand_reg_set and extends move_operand to explicitly allow LO.
>
> A lot of the multiplication patterns need rejigging to expose LO early
> when required, so it's not pretty...

Here it is.  Like I say, most of it is just exposing LO at expand time
for MIPS16, while not affecting normal mode.  The changes relevant to
this patch are the ones to use muldiv_target_operand and the change
to move_operand.

Richard


gcc/
* config/mips/mips-protos.h (mips_emit_binary): Declare.
* config/mips/mips.c (mips_emit_binary): Make global.
(mips_set_mips16_mode): Turn off -mfix-r4000 in MIPS16 mode.
(mips_conditional_register_usage): Don't treat LO and HI as
register operands in MIPS16 mode.
(mips_mulsidi3_gen_fn): Use {u,}mulsidi3_{32,64}bit_mips16
for MIPS16 code.
* config/mips/predicates.md (muldiv_target_operand): New predicate.
(move_operand): Allow hilo_operand.
* config/mips/mips.md (mul3): Explicitly specify LO as the
target of MIPS16 multiplies, then move it into the target register.
(mul<mode>3_internal, *macc2, *msac2): Use muldiv_target_operand.
(mulsidi3_32bit_mips16): New expander.
(mulsidi3_32bit): Use muldiv_target_operand.
(mulsidi3_32bit_r4000): Disable for ISA_HAS_DSP.
(mulsidi3_64bit): Require !TARGET_MIPS16.  Split into
mulsidi3_64bit_split.
(mulsidi3_64bit_mips16): New expander.
(mulsidi3_64bit_split): Likewise, using expansions from
two previous define_splits.
(mulsidi3_64bit_hilo, *muls_di, msubsidi4): Use
muldiv_target_operand.
(mulsi3_highpart): Use mulsi3_highpart_split for MIPS16 code.
(mulsi3_highpart_internal): Require !TARGET_MIPS16.
Split into mulsi3_highpart_split.
(mulsi3_highpart_split): New expander.
(muldi3_highpart): Turn into a define_expand.
Use muldi3_highpart_split for MIPS16 code.
(muldi3_highpart_internal): Renamed from muldi3_highpart.
Require !TARGET_MIPS16.  Split into muldi3_highpart_split.
(muldi3_highpart_split): New expander.
(mulditi3): Explicitly specify LO as the target of MIPS16
multiplies, then move it into the target register.
(mulditi3_internal, maddsidi4): Use muldiv_target_operand.
(divmod<mode>4, udivmod<mode>4): Turn into define_expands.
Use <u>divmod<mode>4_split for MIPS16 code, then explicitly
move LO into operand 0.
(divmod<mode>4_internal, udivmod<mode>4_internal): Renamed
from <u>divmod<mode>4.  Use muldiv_target_operand.
Require !TARGET_MIPS16.  Split into <u>divmod<mode>4_split.
(<u>divmod<mode>4_split): New expander.
(<u>divmod<mode>4_hilo_<HILO:mode>): Use muldiv_target_operand.
(mfhi<GPR:mode>_<HILO:mode>): Use hilo_operand.

gcc/testsuite/
* gcc.target/mips/mult-2.c, gcc.target/mips/mult-3.c,
gcc.target/mips/mult-4.c, gcc.target/mips/mult-5.c,
gcc.target/mips/mult-6.c, gcc.target/mips/mult-7.c,
gcc.target/mips/mult-8.c, gcc.target/mips/mult-9.c,
gcc.target/mips/mult-10.c, gcc.target/mips/mult-11.c,
gcc.target/mips/mult-12.c, gcc.target/mips/mult-13.c,
gcc.target/mips/mult-14.c, gcc.target/mips/mult-15.c,
gcc.target/mips/mult-16.c, gcc.target/mips/mult-17.c,
gcc.target/mips/mult-18.c, gcc.target/mips/mult-19.c,
gcc.target/mips/div-1.c, gcc.target/mips/div-2.c,
gcc.target/mips/div-3.c, gcc.target/mips/div-4.c,
gcc.target/mips/div-5.c, gcc.target/mips/div-6.c,
gcc.target/mips/div-7.c, gcc.target/mips/div-8.c,
gcc.target/mips/div-9.c, gcc.target/mips/div-10.c,
gcc.target/mips/div-11.c, gcc.target/mips/div-12.c: New tests.
* gcc.target/mips/fix-r4000-1.c (foo, bar): Add NOMIPS16.
* gcc.target/mips/fix-r4000-2.c (foo): Likewise.
* gcc.target/mips/fix-r4000

Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.

2011-10-04 Thread Tom de Vries
On 10/04/2011 08:51 PM, Richard Henderson wrote:
> On 10/04/2011 09:28 AM, Tom de Vries wrote:
>> Well, that was the idea. But now I wonder, isn't it better to do this in
>> expand_builtin_alloca:
>> ...
>>/* Compute the argument.  */
>>op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
>>
>> +  align =
>> +(DECL_FUNCTION_CODE (get_callee_fndecl (exp)) ==
>> + BUILT_IN_ALLOCA_WITH_ALIGN)
>> +? TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 1))
>> +: BIGGEST_ALIGNMENT;
>> +
>> +
>>/* Allocate the desired space.  */
>> -  result = allocate_dynamic_stack_space (op0, 0, BIGGEST_ALIGNMENT,
>> - cannot_accumulate);
>> +  result = allocate_dynamic_stack_space (op0, 0, align, cannot_accumulate);
> 
> Yes, this is better.
> 
> I've lost track of what you're trying to do with "folding" alloca?
> 

In general, I'm trying to fold vlas (which are lowered to allocas) into normal
declarations when the alloca argument is constant.

This particular patch tries to address 2 issues:
- A bug in the current implementation (PR50527). We assume a different alignment
  while propagating in ccp, than during folding the alloca to a decl.
- We try to use the DECL_ALIGN of the vla as the DECL_ALIGN of the folded alloca
  decl. We try to carry the DECL_ALIGN from lowering to folding and later using
  the newly introduced __builtin_alloca_with_align. There are other ways to do
  this carrying:
  - introduce a vla_decl field in the call gimple
  - use a composition of __builtin_alloca and __builtin_assume_aligned
  I chose __builtin_alloca_with_align because it's simpler to use than
  the latter.  The former is the simplest, but I think
  __builtin_alloca_with_align might have a use beyond vlas.

Any guidance on what to do if we have to expand the __builtin_alloca_with_align
to a function call, specifically, with the second argument? This is currently
handled like this, which is not very nice:
...
@@ -5559,11 +5573,21 @@ expand_builtin (tree exp, rtx target, rt
return XEXP (DECL_RTL (DECL_RESULT (current_function_decl)), 0);

 case BUILT_IN_ALLOCA:
+case BUILT_IN_ALLOCA_WITH_ALIGN:
   /* If the allocation stems from the declaration of a variable-sized
 object, it cannot accumulate.  */
   target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp));
   if (target)
return target;
+  if (DECL_FUNCTION_CODE (get_callee_fndecl (exp))
+ == BUILT_IN_ALLOCA_WITH_ALIGN)
+   {
+ tree new_call = build_call_expr_loc (EXPR_LOCATION (exp),
+  built_in_decls[BUILT_IN_ALLOCA],
+  1, CALL_EXPR_ARG (exp, 0));
+ CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp);
+ exp = new_call;
+   }
   break;

 case BUILT_IN_STACK_SAVE:
...

Thanks,
- Tom

> 
> r~


Re: [PATCH 3/7] Emit macro expansion related diagnostics

2011-10-04 Thread Dodji Seketeli
Jason Merrill  writes:

> So then this change would make
>
> _Complex c;
>
> OK, but not
>
> static _Complex c;
>
> because the first declspec is not from a macro, right?

Yes.

>
>> I believe you noted this at some point and
>> agreed with me that ideally each declspec should come with its location
>> but that's work for another occasion.
>
> If I'm right about the above example, I think I'd rather hold off this
> declspecs change until that time.

OK.

-- 
Dodji


Re: [PATCH 3/7] Emit macro expansion related diagnostics

2011-10-04 Thread Dodji Seketeli
Jason Merrill  writes:

> On 10/03/2011 06:49 PM, Dodji Seketeli wrote:
>> Good question.  Is the below better?
>
>> +  if (linemap_location_from_macro_expansion_p (set, pre))
>> +pre = linemap_resolve_location (set, pre,
>> +   LRK_MACRO_EXPANSION_POINT, NULL);
>> +
>> +  if (linemap_location_from_macro_expansion_p (set, post))
>> +post = linemap_resolve_location (set, post,
>> +LRK_MACRO_EXPANSION_POINT,
>> +NULL);
>
> This looks like two different virtual locations from the same macro
> expansion will compare equal.

Yes.  I thought this was enough to not regress compared to the current
situation where two tokens from the same macro expansion have the
spelling location of macro expansion point anyway.  This is used for
#pragma diagnostic changes in diagnostic_report_diagnostic.

Otherwise, I am not sure how this should behave wrt nested macro
expansions.

-- 
Dodji
