[PATCH, i386]: Improve recip sequences a bit

2011-10-21 Thread Uros Bizjak
Hello!

While eyeballing following testcase:

float a[256], b[256], c[256];

void foo(void)
{
  int i;

  for (i=0; i<256; ++i)
c[i] = a[i] / b[i];
}

-O2 -ftree-vectorize -ffast-math

I noticed that for some reason CSE doesn't eliminate the memory read, resulting in:

.L2:
vrcpps  b(%rax), %ymm0
vmulps  b(%rax), %ymm0, %ymm1
vmulps  %ymm1, %ymm0, %ymm1
vaddps  %ymm0, %ymm0, %ymm0
vsubps  %ymm1, %ymm0, %ymm1
vmulps  a(%rax), %ymm1, %ymm1
vmovaps %ymm1, c(%rax)
addq    $32, %rax
cmpq    $1024, %rax
jne .L2

The attached patch forces the memory operand into a register, producing:

.L2:
vmovaps b(%rax), %ymm1
vrcpps  %ymm1, %ymm0
vmulps  %ymm1, %ymm0, %ymm1
vmulps  %ymm1, %ymm0, %ymm1
vaddps  %ymm0, %ymm0, %ymm0
vsubps  %ymm1, %ymm0, %ymm1
vmulps  a(%rax), %ymm1, %ymm1
vmovaps %ymm1, c(%rax)
addq    $32, %rax
cmpq    $1024, %rax
jne .L2

The same cure could be applied for rsqrt sequences.
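For reference, a minimal scalar sketch of the arithmetic the rcp sequence implements (hedged; the actual back end emits vector RTL), written the same way as the comment in ix86_emit_swdivsf:

/* One Newton-Raphson refinement step of the hardware reciprocal estimate:
   a / b ~= a * ((x0 + x0) - (b * x0 * x0)), where x0 = rcp(b).
   The division below only stands in for the vrcpps estimate.  */
static inline float
swdiv (float a, float b)
{
  float x0 = 1.0f / b;                     /* rcp(b) estimate */
  float e  = (x0 + x0) - (b * x0 * x0);    /* refined reciprocal */
  return a * e;
}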

2011-10-21  Uros Bizjak  

* config/i386/i386.c (ix86_emit_swdivsf): Force b into register.
(ix86_emit_swsqrtsf): Force a into register.

Patch was tested on x86_64-pc-linux-gnu, committed to mainline SVN.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 180255)
+++ config/i386/i386.c  (working copy)
@@ -33682,6 +33682,8 @@ void ix86_emit_swdivsf (rtx res, rtx a, rtx b, enu
 
   /* a / b = a * ((rcp(b) + rcp(b)) - (b * rcp(b) * rcp (b))) */
 
+  b = force_reg (mode, b);
+
   /* x0 = rcp(b) estimate */
   emit_insn (gen_rtx_SET (VOIDmode, x0,
  gen_rtx_UNSPEC (mode, gen_rtvec (1, b),
@@ -33737,6 +33739,8 @@ void ix86_emit_swsqrtsf (rtx res, rtx a, enum mach
   /* sqrt(a)  = -0.5 * a * rsqrtss(a) * (a * rsqrtss(a) * rsqrtss(a) - 3.0)
  rsqrt(a) = -0.5 * rsqrtss(a) * (a * rsqrtss(a) * rsqrtss(a) - 3.0) */
 
+  a = force_reg (mode, a);
+
   /* x0 = rsqrt(a) estimate */
   emit_insn (gen_rtx_SET (VOIDmode, x0,
  gen_rtx_UNSPEC (mode, gen_rtvec (1, a),


[C++ Patch] PR 30066

2011-10-21 Thread Roberto Agostino Vitillo
With this patch, -fvisibility-inlines-hidden also affects non-member inline functions, e.g.:


inline void foo() {}

int main(){
  foo();
  return 0;
} 

when compiled with -fvisibility-inlines-hidden, foo has hidden visibility:
...
   10:  6 FUNC    WEAK   HIDDEN     6 _Z3foov
...
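An explicitly specified visibility should still win, since the patch only applies when no visibility was specified for the declaration; a hypothetical companion example (not part of the submitted testcase):

// Hedged sketch: with -fvisibility-inlines-hidden plus this patch,
// foo is expected to become hidden, while baz should keep its
// explicitly requested default visibility.
inline void __attribute__ ((visibility ("default"))) baz () { }

inline void foo () { }

int main ()
{
  foo ();
  baz ();
  return 0;
}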


Tested on x86_64-linux. I should have a signed copyright assignment in a few 
weeks but I think this patch is small enough that it doesn't need it.
r

2011-10-21  Roberto Agostino Vitillo  

PR c++/30066
* gcc/doc/invoke.texi: Documentation change for 
fvisibility-inlines-hidden.

* gcc/c-family/c.opt: Description change for fvisibility-inlines-hidden.

* gcc/cp/decl2.c (determine_hidden_inline): New function.
  (determine_visibility): fvisibility-inlines-hidden affects inline 
functions.

* gcc/testsuite/g++.dg/ext/visibility/fvisibility-inlines-hidden-4.C: 
New test.

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 180234)
+++ gcc/doc/invoke.texi (working copy)
@@ -2120,7 +2120,7 @@
 @item -fvisibility-inlines-hidden
 @opindex fvisibility-inlines-hidden
 This switch declares that the user does not attempt to compare
-pointers to inline methods where the addresses of the two functions
+pointers to inline functions or methods where the addresses of the two 
functions
 were taken in different shared objects.
 
 The effect of this is that GCC may, effectively, mark inline methods with
Index: gcc/c-family/c.opt
===
--- gcc/c-family/c.opt  (revision 180234)
+++ gcc/c-family/c.opt  (working copy)
@@ -1043,7 +1043,7 @@
 
 fvisibility-inlines-hidden
 C++ ObjC++
-Marks all inlined methods as having hidden visibility
+Marks all inlined functions and methods as having hidden visibility
 
 fvisibility-ms-compat
 C++ ObjC++ Var(flag_visibility_ms_compat)
Index: gcc/testsuite/g++.dg/ext/visibility/fvisibility-inlines-hidden-4.C
===
--- gcc/testsuite/g++.dg/ext/visibility/fvisibility-inlines-hidden-4.C  
(revision 0)
+++ gcc/testsuite/g++.dg/ext/visibility/fvisibility-inlines-hidden-4.C  
(revision 0)
@@ -0,0 +1,18 @@
+/* PR c++/30066: Test that -fvisibility-inlines-hidden affects functions. */
+/* { dg-do compile } */
+/* { dg-require-visibility "" } */
+/* { dg-options "-fvisibility-inlines-hidden" } */
+/* { dg-final { scan-hidden "_Z3barv" } } */
+/* { dg-final { scan-hidden "_Z3fooIvEvv" } } */
+
+inline void bar() { }
+
+template <typename T>
+inline void foo() { }
+
+int main(void)
+{
+  bar();
+  foo<void>();
+  return 0;
+}
Index: gcc/cp/decl2.c
===
--- gcc/cp/decl2.c  (revision 180234)
+++ gcc/cp/decl2.c  (working copy)
@@ -86,6 +86,7 @@
 static void import_export_class (tree);
 static tree get_guard_bits (tree);
 static void determine_visibility_from_class (tree, tree);
+static bool determine_hidden_inline (tree);
 static bool decl_defined_p (tree);
 
 /* A list of static class variables.  This is needed, because a
@@ -2131,13 +2132,16 @@
}
   else if (use_template)
/* Template instantiations and specializations get visibility based
-  on their template unless they override it with an attribute.  */;
+  on their template unless they override it with an attribute.  */;
   else if (! DECL_VISIBILITY_SPECIFIED (decl))
{
- /* Set default visibility to whatever the user supplied with
-#pragma GCC visibility or a namespace visibility attribute.  */
- DECL_VISIBILITY (decl) = default_visibility;
- DECL_VISIBILITY_SPECIFIED (decl) = visibility_options.inpragma;
+  if (!determine_hidden_inline (decl))
+{
+ /* Set default visibility to whatever the user supplied with
+#pragma GCC visibility or a namespace visibility attribute.  */
+ DECL_VISIBILITY (decl) = default_visibility;
+ DECL_VISIBILITY_SPECIFIED (decl) = visibility_options.inpragma;
+}
}
 }
 
@@ -2155,7 +2159,9 @@
  int depth = TMPL_ARGS_DEPTH (args);
  tree pattern = DECL_TEMPLATE_RESULT (TI_TEMPLATE (tinfo));
 
- if (!DECL_VISIBILITY_SPECIFIED (decl))
+ if (!DECL_VISIBILITY_SPECIFIED (decl)
+ && (DECL_VISIBILITY_SPECIFIED (pattern) 
+ || !determine_hidden_inline (decl)))
{
  DECL_VISIBILITY (decl) = DECL_VISIBILITY (pattern);
  DECL_VISIBILITY_SPECIFIED (decl)
@@ -2214,17 +2220,7 @@
   if (DECL_VISIBILITY_SPECIFIED (decl))
 return;
 
-  if (visibility_options.inlines_hidden
-  /* Don't do this for inline templates; specializations might not be
-inline, and we don't want them to inherit the hidden
-visibility.  We'll set it here for all inline instan

Re: [v3] libstdc++/50196 - enable std::thread, std::mutex etc. on darwin

2011-10-21 Thread Jonathan Wakely
On 21 October 2011 00:43, Jonathan Wakely wrote:
> This patch should enable macosx support for <thread> and partial
> support for <mutex>, by defining _GLIBCXX_HAS_GTHREADS on POSIX
> systems without the _POSIX_TIMEOUTS option, and only disabling the
> types which rely on the Timeouts option, std::timed_mutex and
> std::recursive_timed_mutex, instead of disabling all thread support.

I've just realised this patch will disable the timed mutexes on
non-POSIX platforms - I should only check for _POSIX_TIMEOUTS when
thread-model = posix, and set HAS_MUTEX_TIMEDLOCK unconditionally
elsewhere.
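Roughly, the intended logic could be sketched as below; this is only an illustration, since the real thread-model test happens at configure time and GLIBCXX_THREAD_MODEL_IS_POSIX is a hypothetical stand-in for its result:

/* Hedged preprocessor sketch; macro names follow the wording above.  */
#if defined (_GLIBCXX_HAS_GTHREADS) && defined (GLIBCXX_THREAD_MODEL_IS_POSIX)
# include <unistd.h>
# if defined (_POSIX_TIMEOUTS) && _POSIX_TIMEOUTS > 0
#  define HAS_MUTEX_TIMEDLOCK 1   /* std::timed_mutex can be provided */
# else
#  define HAS_MUTEX_TIMEDLOCK 0   /* disable only the timed mutexes */
# endif
#else
# define HAS_MUTEX_TIMEDLOCK 1    /* non-POSIX models: set unconditionally */
#endif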

New patch coming soon...


Re: new patches using -fopt-info (issue5294043)

2011-10-21 Thread Richard Guenther
On Thu, Oct 20, 2011 at 7:11 PM, Xinliang David Li  wrote:
> On Thu, Oct 20, 2011 at 1:21 AM, Richard Guenther
>  wrote:
>> On Thu, Oct 20, 2011 at 1:33 AM, Andi Kleen  wrote:
>>> x...@google.com (Rong Xu) writes:
>>>
 After some off-line discussion, we decided to use a more general approach
 to control the printing of optimization messages/warnings. We will
 introduce a new option -fopt-info:
  * fopt-info=0 or fno-opt-info: no message will be emitted.
  * fopt-info or fopt-info=1: emit important warnings and optimization
    messages with large performance impact.
  * fopt-info=2: warnings and optimization messages targeting power users.
  * fopt-info=3: informational messages for compiler developers.
>>
>> This doesn't look scalable if you consider that each pass would print
>> as much of a mess like -fvectorizer-verbose=5.
>
> What is not scalable? For level 1 dump, only the summary of
> vectorization will be printed just like other loop transformations.
>
>>
>> I think =2 and =3 should be omitted - we do have dump-files for a reason.
>
> Dump files are not easy to use -- they are big, and slow, especially for
> people with large distributed build systems.  Having both level 2 and
> 3 is debatable, but it will be useful to have at least one level above
> level 1. Dump files are mainly for compiler developers, while
> -fopt-info are for compiler developers *and* power users who know
> performance tuning.
>>
>> Also the coverage/profile cases you changed do not at all match
>> "... with large performance impact".  In fact the impact is completely
>> unknown (as it would be the case usually).
>
> Impact of any transformations is just 'potential', coverage problems
> are no different from that.
>
>>
>> I'd rather have a way to make dump-files more structured (so, following
>> some standard reporting scheme) than introducing yet another way
>> of output.  [after making dump-files more consistent it will be easy
>> to revisit patches like this, there would be a natural general central
>> way to implement it]
>
> Yes, I remember we have discussed about this before -- currently dump
> files are a big mess -- debug tracing, IR are all mixed up, but as I
> said above, this is a different matter -- it is for compiler
> developers.
>
> For more structured optimization report, we should use option
> -fopt-report which dump optimization information based on category --
> the info data base can also be shared across modules:
>
> Example:
>
> [Loop Interchange]
> File a, line x,   yyy
> File b, line xx, yyy
> 
> File c, line z,   It is beneficial to interchange the loop, but not
> done because of possible carried dependency (caused by false aliasing
> ...)
>
> [Loop Vectorization]
> 
>
> [Loop Unroll]
> ...
>
> [SRA]
>
> [Alias summary]
>  [Global Vars]
>   a: addr exposed
>   b: addr not exposed
>   ..
>  [Global Pointers]
>    ..
>  ...

I very well understand the intent.  But I disagree with where you start
to implement this.  Dump files are _not_ only for developers - after
all we don't have anything else.  -fopt-report can get as big and unmanageable
to read as dump files - in fact I argue it will be worse than dump files if
you go beyond very very coarse reporting.

Yes, dump files are a "mess".  So - why not clean them up, and at the
same time annotate dump file pieces so _automatic_ filtering and
redirecting to stdout with something like -fopt-report would do something
sensible?  I don't see why dump files have to stay messy while you at
the same time would need to add _new_ code to dump to stdout for
-fopt-report.

So, no, please do it the right way that benefits both compiler developers
and your "power users".

And yes, the right way is not to start adding that -fopt-report switch.
The right way is to make dump-files consumable by mere mortals first.

Thanks,
Richard.

>
> Thanks,
>
> David
>
>>
>> So, please fix dump-files instead.  And for coverage/profiling, fill
>> in stuff in a dump-file!
>>
>> Richard.
>>
>>> It would be interested to have some warnings about missing SRA
>>> opportunities in =1 or =2. I found that sometimes fixing those can give a
>>> large speedup.
>>>
>>> Right now a common case that prevents SRA on structure field
>>> is simply a memset or memcpy.
>>>
>>> -Andi
>>>
>>>
>>> --
>>> a...@linux.intel.com -- Speaking for myself only
>>>
>>
>


Fwd: [patch] Fix PR tree-optimization/49960 ,Fix self data dependence

2011-10-21 Thread Richard Guenther
Forwarded to the list, gcc.gnu.org doesn't like gmail anymore.

-- Forwarded message --
From: Richard Guenther 
Date: Fri, Oct 21, 2011 at 11:03 AM
Subject: Re: [patch] Fix PR tree-optimization/49960 ,Fix self data dependence
To: Razya Ladelsky 
Cc: gcc-patches@gcc.gnu.org, gcc-patches-ow...@gcc.gnu.org, Sebastian
Pop 


On Wed, Oct 19, 2011 at 5:42 PM, Razya Ladelsky  wrote:
> gcc-patches-ow...@gcc.gnu.org wrote on 17/10/2011 09:03:59 AM:
>
>> From: Richard Guenther 
>> To: Razya Ladelsky/Haifa/IBM@IBMIL
>> Cc: gcc-patches@gcc.gnu.org, Sebastian Pop 
>> Date: 17/10/2011 09:04 AM
>> Subject: Re: [patch] Fix PR tree-optimization/49960 ,Fix self data
> dependence
>> Sent by: gcc-patches-ow...@gcc.gnu.org
>>
>> On Mon, Oct 17, 2011 at 8:23 AM, Razya Ladelsky 
> wrote:
>> > This patch fixes the failures described in
>> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49960
>> > It also fixes bzips when run with autopar enabled.
>> >
>> > In both cases the self dependences are not handled correctly.
>> > In the first case, a non affine access is analyzed:
>> > in the second, the distance vector is not calculated correctly (the
>> > distance vector considered for self dependences is always
> (0,0,...))
>> >
>> > As a result, the loops get wrongly parallelized.
>> >
>> > The patch avoids the special handling of  self dependences, and
> analyzes
>> > all dependences in the same way. Specific adjustments
>> > and support for the self dependence cases were made.
>>
>> Can you elaborate on
>>
>> @@ -3119,8 +3135,11 @@ add_other_self_distances (struct
> data_dependence_r
>>         {
>>           if (DDR_NUM_SUBSCRIPTS (ddr) != 1)
>>        {
>> -        DDR_ARE_DEPENDENT (ddr) = chrec_dont_know;
>> -        return;
>> +        if (DDR_NUM_SUBSCRIPTS (ddr) != 2 || !integer_zerop
> (DR_ACCESS_FN
>> (DDR_A (ddr), 1)))
>> +          {
>> +            DDR_ARE_DEPENDENT (ddr) = chrec_dont_know;
>> +            return;
>> +          }
>>        }
>>
>>           access_fun = DR_ACCESS_FN (DDR_A (ddr), 0);
>>
>> ?  It needed a comment before, and now so even more.
>>
>> The rest of the patch is ok, I suppose the above hunk is to enhance
>> something, not
>> to fix the bug?
>
> For fortran code like:
>
>      DO 140 J=1,MB
>         DO 130 K=1,NA
>            BKJ=B(K,J)
>            IF(BKJ.EQ.ZERO) GO TO 130
>               DO 120 I=1,MA
>                  C(I,J)=C(I,J)+A(K,I)*BKJ
>  120          CONTINUE
>  130    CONTINUE
>  140 CONTINUE
>      RETURN
>
>
> The access functions for the C(i,j) self dependence are:
>
> (Data Dep:
> #(Data Ref:
> #  bb: 9
> #  stmt: D.1427_79 = *c_78(D)[D.1426_77];
> #  ref: *c_78(D)[D.1426_77];
> #  base_object: *c_78(D);
> #  Access function 0: {{(stride.12_25 + 1) + offset.13_36, +,
> stride.12_25}_1, +, 1}_3
> #  Access function 1: 0B
> #)
> #(Data Ref:
> #  bb: 9
> #  stmt: *c_78(D)[D.1426_77] = D.1433_88;
> #  ref: *c_78(D)[D.1426_77];
> #  base_object: *c_78(D);
> #  Access function 0: {{(stride.12_25 + 1) + offset.13_36, +,
> stride.12_25}_1, +, 1}_3
> #  Access function 1: 0B
> #)
>
>
> Two dimensions are created to describe C(i,j) although there's no need for
> access function 1 which is just 0B.
>
>
> If this was a C code, we would have these two access functions for
> C[i][j]:
>
> (Data Dep:
> #(Data Ref:
> #  bb: 5
> #  stmt: t_10 = C[i_33][j_37];
> #  ref: C[i_33][j_37];
> #  base_object: C
> #  Access function 0: {3, +, 1}_3
> #  Access function 1: {3, +, 1}_2
> #)
> #(Data Ref:
> #  bb: 5
> #  stmt: C[i_33][j_37] = D.3852_15;
> #  ref: C[i_33][j_37];
> #  base_object: C
> #  Access function 0: {3, +, 1}_3
> #  Access function 1: {3, +, 1}_2
> #)
>
>
> In order to handle the Fortran data accesses, even for simple cases as
> above,
> I would need to handle multivariate accesses.
> The data dependence analysis doesn't know how to handle such dependences
> if there's more than one subscript.
> The above Fortran code doesn't actually have two subscripts, but one, and
> thus should be handled.
>
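A hedged C analog of the Fortran kernel above may make this concrete: every array is reached through a pointer with one linearized, column-major subscript, so the data-ref machinery sees a single non-trivial access function plus a degenerate one, unlike a true C two-dimensional array.  The names and leading-dimension parameters are illustrative only:

/* Hedged sketch, not code from the patch.  */
void
mxm (float *c, const float *a, const float *b,
     int ma, int na, int mb, int ldc, int lda, int ldb)
{
  for (int j = 0; j < mb; j++)
    for (int k = 0; k < na; k++)
      {
        float bkj = b[k + j * ldb];                 /* B(K,J) */
        if (bkj == 0.0f)
          continue;
        for (int i = 0; i < ma; i++)
          c[i + j * ldc] += a[k + i * lda] * bkj;   /* C(I,J) += A(K,I)*BKJ */
      }
}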
> The reason this issue came up only with the changes of this patch is that
> now
> add_other_self_distances is called from build_classic_dist_vector, which
> is called also for self dependences.
> Before the patch, the distance vector for self dependences was always
> determined as a vector of 0's, and build_classic_dist_vector
> was not called.
>
> I hope it's clearer now, I will add a comment to the code, and submit it
> before committing it.

No, it's not clearer, because it is not clear why you need to add the hack
instead of avoiding the 2nd access function. And iff you add the hack it
needs a comment why zero should be special (any other constant would
be the same I suppose).

Btw, your fortran example does not compile and I don't believe the issue
is still present after my last changes to dr_analyze_indices.  So, did
you verify this on trunk?

Richard.

> Thanks,
> Razya
>
>
>>
>> Thanks,
>> Richard.
>>
>> > Bootstrap and testsuite pass successfully for ppc64-redha

Re: [PATCH, PR50763] Fix for ICE in verify_gimple

2011-10-21 Thread Richard Guenther
On Thu, Oct 20, 2011 at 3:48 PM, Tom de Vries  wrote:
> Richard,
>
> I have a fix for PR50763.
>
> The second example from the PR looks like this:
> ...
> int bar (int i);
>
> void
> foo (int c, int d)
> {
>  if (bar (c))
>    bar (c);
>  d = 33;
>  while (c == d);
> }
> ...
>
> When compiled with -O2 -fno-dominator-opt, the gimple representation before
> ftree-tail-merge looks like this:
> ...
> foo (intD.6 cD.1606, intD.6 dD.1607)
> {
>  intD.6 D.2730;
>
>  # BLOCK 2 freq:900
>  # PRED: ENTRY [100.0%]  (fallthru,exec)
>  # .MEMD.2733_6 = VDEF <.MEMD.2733_5(D)>
>  # USE = nonlocal
>  # CLB = nonlocal
>  D.2730_2 = barD.1605 (cD.1606_1(D));
>  if (D.2730_2 != 0)
>    goto <bb 3>;
>  else
>    goto <bb 7>;
>  # SUCC: 3 [29.0%]  (true,exec) 7 [71.0%]  (false,exec)
>
>  # BLOCK 7 freq:639
>  # PRED: 2 [71.0%]  (false,exec)
>  goto <bb 4>;
>  # SUCC: 4 [100.0%]  (fallthru)
>
>  # BLOCK 3 freq:261
>  # PRED: 2 [29.0%]  (true,exec)
>  # .MEMD.2733_7 = VDEF <.MEMD.2733_6>
>  # USE = nonlocal
>  # CLB = nonlocal
>  barD.1605 (cD.1606_1(D));
>  # SUCC: 4 [100.0%]  (fallthru,exec)
>
>  # BLOCK 4 freq:900
>  # PRED: 7 [100.0%]  (fallthru) 3 [100.0%]  (fallthru,exec)
>  # .MEMD.2733_4 = PHI <.MEMD.2733_6(7), .MEMD.2733_7(3)>
>  if (cD.1606_1(D) == 33)
>    goto <bb 8>;
>  else
>    goto <bb 9>;
>  # SUCC: 8 [91.0%]  (true,exec) 9 [9.0%]  (false,exec)
>
>  # BLOCK 9 freq:81
>  # PRED: 4 [9.0%]  (false,exec)
>  goto <bb 6>;
>  # SUCC: 6 [100.0%]  (fallthru)
>
>  # BLOCK 8 freq:819
>  # PRED: 4 [91.0%]  (true,exec)
>  # SUCC: 5 [100.0%]  (fallthru)
>
>  # BLOCK 5 freq:9100
>  # PRED: 8 [100.0%]  (fallthru) 10 [100.0%]  (fallthru)
>  if (cD.1606_1(D) == 33)
>    goto <bb 10>;
>  else
>    goto <bb 11>;
>  # SUCC: 10 [91.0%]  (true,exec) 11 [9.0%]  (false,exec)
>
>  # BLOCK 10 freq:8281
>  # PRED: 5 [91.0%]  (true,exec)
>  goto <bb 5>;
>  # SUCC: 5 [100.0%]  (fallthru)
>
>  # BLOCK 11 freq:819
>  # PRED: 5 [9.0%]  (false,exec)
>  # SUCC: 6 [100.0%]  (fallthru)
>
>  # BLOCK 6 freq:900
>  # PRED: 11 [100.0%]  (fallthru) 9 [100.0%]  (fallthru)
>  # VUSE <.MEMD.2733_4>
>  return;
>  # SUCC: EXIT [100.0%]
>
> }
> ...
>
> During the first iteration, tail_merge_optimize finds that blocks 9 and 11,
> and blocks 8 and 10, are equal, and removes blocks 10 and 11.
> During the second iteration it finds that blocks 4 and 5 are equal, and it
> removes block 5.
>
> Since pre had no effect, the responsibility for updating the vops lies with
> tail_merge_optimize.
>
> Block 4 starts with a virtual PHI which needs updating, but replace_block_by
> decides that an update is not necessary, because vop_at_entry returns 
> NULL_TREE
> for block 5 (the vop_at_entry for block 4 is .MEMD.2733_4).
> What is different from normal is that block 4 dominates block 5.
>
> The patch makes sure that the vops are also updated if vop_at_entry is defined
> for only one of bb1 and bb2.
>
> This also forced me to rewrite the code that updates the uses, which uses
> dominator info now. This forced me to keep the dominator info up-to-date,
> which in turn forced me to move the actual deletion of the basic block and
> some additional bookkeeping related to that from purge_bbs to
> replace_block_by.
>
> Additionally, I fixed the case that update_vuses leaves virtual phis with only
> one argument (see unlink_virtual_phi).
>
> bootstrapped and reg-tested on x86_64. The tested patch had one addition to 
> the
> attached patch: calling verify_dominators at the end of replace_block_by.
>
> OK for trunk?

+  if (gimple_code (stmt) != GIMPLE_PHI &&
+ !dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt), bb2))
  continue;

&&s go to the next line please.
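That is, in GNU style the operator begins the continuation line; the same lines re-wrapped (a fragment of the patch, shown only to illustrate the requested style):

  if (gimple_code (stmt) != GIMPLE_PHI
      && !dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt), bb2))
    continue;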

The unlink_virtual_phi function needs a comment.

Ok with those changes.

Richard.

> Thanks,
> - Tom
>
> 2011-10-20  Tom de Vries  
>
>        PR tree-optimization/50763
>        * tree-ssa-tail-merge.c (same_succ_flush_bb): New function, factored 
> out
>        of ...
>        (same_succ_flush_bbs): Use same_succ_flush_bb.
>        (purge_bbs): Remove argument.  Remove calls to same_succ_flush_bbs,
>        release_last_vdef and delete_basic_block.
>        (unlink_virtual_phi): New function.
>        (update_vuses): Add and use vuse1_phi_args argument.  Set var to
>        SSA_NAME_VAR of vuse1 or vuse2, and use var.  Handle case that 
> def_stmt2
>        is NULL.  Use phi result as phi arg in case vuse1 or vuse2 is 
> NULL_TREE.
>        Replace uses of vuse1 if vuse2 is NULL_TREE.  Fix code to limit
>        replacement of uses.  Propagate phi argument for phis with a single
>        argument.
>        (replace_block_by): Update vops if phi_vuse1 or phi_vuse2 is NULL_TREE.
>        Set vuse1_phi_args if vuse1 is a phi defined in bb1.  Add 
> vuse1_phi_args
>        as argument to call to update_vuses.  Call release_last_vdef,
>        same_succ_flush_bb, delete_basic_block.  Update CDI_DOMINATORS info.
>        (tail_merge_optimize): Remove argument in call to purge_bbs.  Remove
>        call to free_dominance_info.  Only call calculate_dominance_info once

Re: [PATCH] Extend vect_recog_bool_pattern also to stores into bool memory (PR tree-optimization/50596)

2011-10-21 Thread Richard Guenther
On Thu, Oct 20, 2011 at 12:31 PM, Jakub Jelinek  wrote:
> On Thu, Oct 20, 2011 at 11:42:01AM +0200, Richard Guenther wrote:
>> > +  if (TREE_CODE (scalar_dest) == VIEW_CONVERT_EXPR
>> > +      && is_pattern_stmt_p (stmt_info))
>> > +    scalar_dest = TREE_OPERAND (scalar_dest, 0);
>> >    if (TREE_CODE (scalar_dest) != ARRAY_REF
>> >        && TREE_CODE (scalar_dest) != INDIRECT_REF
>> >        && TREE_CODE (scalar_dest) != COMPONENT_REF
>>
>> Just change the if () stmt to
>>
>>  if (!handled_component_p (scalar_dest)
>>      && TREE_CODE (scalar_dest) != MEM_REF)
>>    return false;
>
> That will accept BIT_FIELD_REF and ARRAY_RANGE_REF (as well as VCE outside of 
> pattern stmts).
> The VCEs I hope don't appear, but the first two might, and I'm not sure
> we are prepared to handle them.  Certainly not BIT_FIELD_REFs.
>
>> > +      rhs = adjust_bool_pattern (var, TREE_TYPE (vectype), NULL_TREE, 
>> > stmts);
>> > +      if (TREE_CODE (lhs) == MEM_REF || TREE_CODE (lhs) == TARGET_MEM_REF)
>> > +   {
>> > +     lhs = copy_node (lhs);
>>
>> We don't handle TARGET_MEM_REF in vectorizable_store, so no need to
>> do it here.  In fact, just unconditionally do ...
>>
>> > +     TREE_TYPE (lhs) = TREE_TYPE (vectype);
>> > +   }
>> > +      else
>> > +   lhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (vectype), lhs);
>>
>> ... this (wrap it in a V_C_E).  No need to special-case any
>> MEM_REFs.
>
> Ok.  After all it seems vectorizable_store pretty much ignores it
> (except for the scalar_dest check above).  For aliasing it uses the type
> from DR_REF and otherwise it uses the vectorized type.
>
>> > +      if (!useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs)))
>>
>> This should never be false, so you can as well unconditionally build
>> the conversion stmt.
>
> You mean because currently adjust_bool_pattern will prefer signed types
> over unsigned while here lhs will be unsigned?  I guess I should
> change it to use signed type for the memory store too to avoid the extra
> cast instead.  Both types can be certainly the same precision, e.g. for:
> unsigned char a[N], b[N];
> unsigned int d[N], e[N];
> bool c[N];
> ...
>  for (i = 0; i < N; ++i)
>    c[i] = a[i] < b[i];
> or different precision, e.g. for:
>  for (i = 0; i < N; ++i)
>    c[i] = d[i] < e[i];
>
>> > @@ -347,6 +347,28 @@ vect_determine_vectorization_factor (loo
>> >           gcc_assert (STMT_VINFO_DATA_REF (stmt_info)
>> >                       || is_pattern_stmt_p (stmt_info));
>> >           vectype = STMT_VINFO_VECTYPE (stmt_info);
>> > +         if (STMT_VINFO_DATA_REF (stmt_info))
>> > +           {
>> > +             struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
>> > +             tree scalar_type = TREE_TYPE (DR_REF (dr));
>> > +             /* vect_analyze_data_refs will allow bool writes through,
>> > +                in order to allow vect_recog_bool_pattern to transform
>> > +                those.  If they couldn't be transformed, give up now.  */
>> > +             if (((TYPE_PRECISION (scalar_type) == 1
>> > +                   && TYPE_UNSIGNED (scalar_type))
>> > +                  || TREE_CODE (scalar_type) == BOOLEAN_TYPE)
>>
>> Shouldn't it be always possible to vectorize those?  For loads
>> we can assume the memory contains only 1 or 0 (we assume that for
>> scalar loads), for stores we can mask out all other bits explicitly
>> if you add support for truncating conversions to non-mode precision
>> (in fact, we could support non-mode precision vectorization that way,
>> if not support bitfield loads or extending conversions).
>
> Not without the pattern recognizer transforming it into something.
> That is something we've discussed on IRC before I started working on the
> first vect_recog_bool_pattern patch, we'd need to special case bool and
> one-bit precision types in way too many places all around the vectorizer.
> Another reason for that was that what vect_recog_bool_pattern does currently
> is certainly way faster than what would we end up with if we just handled
> bool as unsigned (or signed?) char with masking on casts and stores
> - the ability to use any integer type for the bools rather than char
> as appropriate means we can avoid many VEC_PACK_TRUNC_EXPRs and
> corresponding VEC_UNPACK_{LO,HI}_EXPRs.
> So the chosen solution was attempt to transform some of bool patterns
> into something the vectorizer can handle easily.
> And that can be extended over time what it handles.
>
> The above just reflects it, probably just me trying to be too cautious,
> the vectorization would likely fail on the stmt feeding the store, because
> get_vectype_for_scalar_type would fail on it.
>
> If we wanted to support general TYPE_PRECISION != GET_MODE_BITSIZE (TYPE_MODE)
> vectorization (hopefully with still preserving the pattern bool recognizer
> for the above stated reasons), we'd start with changing
> get_vectype_for_scalar_type to handle those types (then the
> tree-vect-data-refs.c and tree-vect-loop.c changes fr

Re: [patch tree-optimization]: allow branch-cost optimization for truth-and/or on mode-expanded simple boolean-operands

2011-10-21 Thread Richard Guenther
On Thu, Oct 20, 2011 at 3:08 PM, Kai Tietz  wrote:
> Hello,
>
> this patch re-enables the branch-cost optimization on simple boolean-typed
> operands that are cast to a wider integral type.  This happens because casts
> from boolean types are preserved, but the FE might expand a simple expression
> to a wider mode.
>
> I added two tests for the already-working branch-cost optimization on the
> IA architecture and two that explicitly check the boolean-typed case.
>
> ChangeLog
>
> 2011-10-20  Kai Tietz  
>
>        * fold-const.c (simple_operand_p_2): Handle integral
>        casts from boolean-operands.
>
> 2011-10-20  Kai Tietz  
>
>        * gcc.target/i386/branch-cost1.c: New test.
>        * gcc.target/i386/branch-cost2.c: New test.
>        * gcc.target/i386/branch-cost3.c: New test.
>        * gcc.target/i386/branch-cost4.c: New test.
>
> Bootstrapped and regression tested on x86_64-unknown-linux-gnu for all 
> languages including Ada and Obj-C++.  Ok to apply?
>
> Regards,
> Kai
>
> Index: gcc-head/gcc/testsuite/gcc.target/i386/branch-cost2.c
> ===
> --- /dev/null
> +++ gcc-head/gcc/testsuite/gcc.target/i386/branch-cost2.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-gimple -mbranch-cost=2" } */
> +
> +extern int doo (void);
> +
> +int
> +foo (int a, int b)
> +{
> +  if (a && b)
> +   return doo ();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "if " 1 "gimple" } } */
> +/* { dg-final { scan-tree-dump-times " & " 1 "gimple" } } */
> +/* { dg-final { cleanup-tree-dump "gimple" } } */
> Index: gcc-head/gcc/testsuite/gcc.target/i386/branch-cost3.c
> ===
> --- /dev/null
> +++ gcc-head/gcc/testsuite/gcc.target/i386/branch-cost3.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-gimple -mbranch-cost=2" } */
> +
> +extern int doo (void);
> +
> +int
> +foo (_Bool a, _Bool b)
> +{
> +  if (a && b)
> +   return doo ();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "if " 1 "gimple" } } */
> +/* { dg-final { scan-tree-dump-times " & " 1 "gimple" } } */
> +/* { dg-final { cleanup-tree-dump "gimple" } } */
> Index: gcc-head/gcc/testsuite/gcc.target/i386/branch-cost4.c
> ===
> --- /dev/null
> +++ gcc-head/gcc/testsuite/gcc.target/i386/branch-cost4.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-gimple -mbranch-cost=0" } */
> +
> +extern int doo (void);
> +
> +int
> +foo (_Bool a, _Bool b)
> +{
> +  if (a && b)
> +   return doo ();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "if " 2 "gimple" } } */
> +/* { dg-final { scan-tree-dump-not " & " "gimple" } } */
> +/* { dg-final { cleanup-tree-dump "gimple" } } */
> Index: gcc-head/gcc/fold-const.c
> ===
> --- gcc-head.orig/gcc/fold-const.c
> +++ gcc-head/gcc/fold-const.c
> @@ -3706,6 +3706,19 @@ simple_operand_p_2 (tree exp)
>   /* Strip any conversions that don't change the machine mode.  */
>   STRIP_NOPS (exp);
>
> +  /* Handle integral widening casts from boolean-typed
> +     expressions as simple.  This happens because casts from
> +     boolean types are preserved, but the FE might expand a
> +     simple expression to a wider mode.  */
> +  if (INTEGRAL_TYPE_P (TREE_TYPE (exp))
> +      && CONVERT_EXPR_P (exp)
> +      && TREE_CODE (TREE_TYPE (TREE_OPERAND (exp, 0)))
> +        == BOOLEAN_TYPE)
> +    {
> +      exp = TREE_OPERAND (exp, 0);
> +      STRIP_NOPS (exp);
> +    }
> +

Huh, well.  I think the above is just too special and you instead should
replace the existing STRIP_NOPS by

while (CONVERT_EXPR_P (exp))
  exp = TREE_OPERAND (exp, 0);

with a comment that conversions are considered simple.

Ok with that change, if it bootstraps & tests ok.

Richard.

>   code = TREE_CODE (exp);
>
>   if (TREE_SIDE_EFFECTS (exp)
> Index: gcc-head/gcc/testsuite/gcc.target/i386/branch-cost1.c
> ===
> --- /dev/null
> +++ gcc-head/gcc/testsuite/gcc.target/i386/branch-cost1.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-gimple -mbranch-cost=0" } */
> +
> +extern int doo (void);
> +
> +int
> +foo (int a, int b)
> +{
> +  if (a && b)
> +   return doo ();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "if " 2 "gimple" } } */
> +/* { dg-final { scan-tree-dump-not " & " "gimple" } } */
> +/* { dg-final { cleanup-tree-dump "gimple" } } */
>


Re: [PATCH] Fix PR46556 (poor address generation)

2011-10-21 Thread Richard Guenther
On Tue, Oct 18, 2011 at 4:14 PM, William J. Schmidt
 wrote:
> Greetings,
>
> Here is a new revision of the tree portions of this patch.  I moved the
> pattern recognizer to expand, and added additional logic to look for the
> same pattern in gimple form.  I added two more tests to verify the new
> logic.
>
> I didn't run into any problems with the RTL CSE phases.  I can't recall
> for certain what caused me to abandon the expand version previously.
> There may not have been good reason; too many versions to keep track of
> and too many interruptions, I suppose.  In any case, I'm much happier
> having this code in the expander.
>
> Paolo's RTL logic for unpropagating the zero-offset case is not going to
> work out as is.  It causes a number of performance degradations, which I
> suspect are due to the pass reordering.  That's a separate issue,
> though, and not needed for this patch.
>
> Bootstrapped and regression-tested on powerpc64-linux.  SPEC cpu2000
> shows a number of small improvements and no significant degradations.
> SPEC cpu2006 testing is pending.
>
> Thanks,
> Bill
>
>
> 2011-10-18  Bill Schmidt  
>
> gcc:
>
>        PR rtl-optimization/46556
>        * expr.c (tree-pretty-print.h): New include.
>        (restructure_base_and_offset): New function.
>        (restructure_mem_ref): New function.
>        (expand_expr_real_1): In MEM_REF case, attempt restructure_mem_ref
>        first.  In normal_inner_ref case, attempt restructure_base_and_offset
>        first.
>        * Makefile.in: Update dependences for expr.o.
>
> gcc/testsuite:
>
>        PR rtl-optimization/46556
>        * gcc.dg/tree-ssa-pr46556-1.c: New test.
>        * gcc.dg/tree-ssa-pr46556-2.c: Likewise.
>        * gcc.dg/tree-ssa-pr46556-3.c: Likewise.
>        * gcc.dg/tree-ssa-pr46556-4.c: Likewise.
>        * gcc.dg/tree-ssa-pr46556-5.c: Likewise.
>
>
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c
> ===
> --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c   (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c   (revision 0)
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-expand" } */
> +
> +struct x
> +{
> +  int a[16];
> +  int b[16];
> +  int c[16];
> +};
> +
> +extern void foo (int, int, int);
> +
> +void
> +f (struct x *p, unsigned int n)
> +{
> +  foo (p->a[n], p->c[n], p->b[n]);
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 2 "expand" } } */
> +/* { dg-final { scan-rtl-dump-times "const_int 128" 1 "expand" } } */
> +/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 1 
> "expand" } } */
> +/* { dg-final { cleanup-rtl-dump "expand" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c
> ===
> --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c   (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c   (revision 0)
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-expand" } */
> +
> +struct x
> +{
> +  int a[16];
> +  int b[16];
> +  int c[16];
> +};
> +
> +extern void foo (int, int, int);
> +
> +void
> +f (struct x *p, unsigned int n)
> +{
> +  foo (p->a[n], p->c[n], p->b[n]);
> +  if (n > 12)
> +    foo (p->a[n], p->c[n], p->b[n]);
> +  else if (n > 3)
> +    foo (p->b[n], p->a[n], p->c[n]);
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 6 "expand" } } */
> +/* { dg-final { scan-rtl-dump-times "const_int 128" 3 "expand" } } */
> +/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 3 
> "expand" } } */
> +/* { dg-final { cleanup-rtl-dump "expand" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c
> ===
> --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c   (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c   (revision 0)
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-expand" } */
> +struct x
> +{
> +  int a[16];
> +  int b[16];
> +  int c[16];
> +};
> +
> +extern void foo (int, int, int);
> +
> +void
> +f (struct x *p, unsigned int n)
> +{
> +  foo (p->a[n], p->c[n], p->b[n]);
> +  if (n > 3)
> +    {
> +      foo (p->a[n], p->c[n], p->b[n]);
> +      if (n > 12)
> +       foo (p->b[n], p->a[n], p->c[n]);
> +    }
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 6 "expand" } } */
> +/* { dg-final { scan-rtl-dump-times "const_int 128" 3 "expand" } } */
> +/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 3 
> "expand" } } */
> +/* { dg-final { cleanup-rtl-dump "expand" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-4.c
> ===
> --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-4.c   (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-4.c   (revision 0)
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-

Re: [PATCH] Extend vect_recog_bool_pattern also to stores into bool memory (PR tree-optimization/50596)

2011-10-21 Thread Jakub Jelinek
On Fri, Oct 21, 2011 at 11:19:32AM +0200, Richard Guenther wrote:
> I'll try to poke at that a bit, thus support general bit-precision types for
> loads and stores and the few operations that are safe on them.  If you
> have a store to a bool like
> 
> int *a, *b;
> _Bool *c;
> 
> for (;;)
>   c[i] = a[i] < b[i];
> 
> will the compare choose an int vector type and then demote it to
> char for the store?

Yes.  The pattern recognizer would turn this into:
int *a, *b;
for (;;)
  {
int tmp = a[i] < b[i] ? 1 : 0;
((char *)c)[i] = (char) tmp;  // Still using _Bool for TBAA purposes
  }

>  I suppose trying to generally handle loads/stores
> for these types shouldn't interfere too much with this.  But I'll see ...

If you manage to get the generic stuff working (remove the condition from
get_vectype_from_scalar_type about TYPE_PRECISION and handle what is
needed), then vect_recog_bool_pattern would need to be adjusted slightly
(to not start on a cast from some kind of bool to another kind of bool,
which now results in return NULL because get_vectype_from_scalar_type
returns NULL_TREE) and from the patch I've posted we'd need just the
tree-vect-patterns.c bits (adjusted as you say to unconditionally create
VCE instead of special casing MEM_REF, and additionally attempting to use
signed instead of unsigned type to avoid unnecessary casts) and something
in vectorizable_store so that it doesn't fail on VCEs, at least not
in pattern stmts.

Jakub


Re: [RFA:] fix breakage with "Update testsuite to run with slim LTO"

2011-10-21 Thread Jan Hubicka
> > Date: Fri, 21 Oct 2011 00:19:32 +0200
> > From: Jan Hubicka 
> > Yes, if we scan assembler, we likely want -fno-fat-lto-objects.
> 
> > > then IIUC you need to patch *all* torture tests that use
> > > scan-assembler and scan-assembler-not.  Alternatively, patch
> > > somewhere else, like not passing it if certain directives are
> > > used, like scan-assembler{,-not}.  And either way, is it safe to
> > > add that option always, not just when also passing "-flto" or
> > > something?
> > 
> > Hmm, some of the assembler scans still work because they check for
> > presence of symbols we output anyway, but indeed, it would make more
> > sense to automatically imply -ffat-lto-objects when scan-assembler
> > is used.  I am not sure my dejagnu skills are on par here, however.
> 
> Maybe you could make amends ;) by testing the following, which
> seems to work at least for dg-torture.exp and cris-elf/cris-sim,
in which -ffat-lto-objects is automatically added for each
> scan-assembler and scan-assembler-not test, extensible for other
> dg-final actions without polluting with checking LTO options and
> whatnot across the files.  I checked (and corrected) so it also
> works when !check_effective_target_lto by commenting out the
> setting in the second chunk.

Thanks. It looks good to me.  If we ever start scanning LTO assembler output,
we may simply add scan-lto-assembler variants or so...

Honza


Re: [PATCH 1/3] Add missing page rounding of a page_entry

2011-10-21 Thread Richard Guenther
On Fri, Oct 21, 2011 at 7:52 AM, Andi Kleen  wrote:
> From: Andi Kleen 
>
> This one place in ggc forgot to round page_entry->bytes to the
> next page boundary, which led the heuristics in freeing that
> check for contiguous memory to fail. Round here too, like the other
> allocators already do. The memory consumed should be the same
> for MMAP because the kernel rounds up anyway. It may slightly
> increase memory usage when malloc groups are used.
>
> This will also increase the hitrate on the free page list
> slightly.

Ok.

Thanks,
RIchard.

> gcc/:
>
> 2011-10-18  Andi Kleen  
>
>        * ggc-page.c (alloc_pages): Always round up entry_size.
> ---
>  gcc/ggc-page.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/gcc/ggc-page.c b/gcc/ggc-page.c
> index 2da99db..ba88e3f 100644
> --- a/gcc/ggc-page.c
> +++ b/gcc/ggc-page.c
> @@ -736,6 +736,7 @@ alloc_page (unsigned order)
>   entry_size = num_objects * OBJECT_SIZE (order);
>   if (entry_size < G.pagesize)
>     entry_size = G.pagesize;
> +  entry_size = ROUND_UP (entry_size, G.pagesize);
>
>   entry = NULL;
>   page = NULL;
> --
> 1.7.5.4
>
>


Re: [C++ Patch] PR 30066

2011-10-21 Thread Roberto Agostino Vitillo
I am resubmitting the ChangeLog since it did not follow the conventions.

gcc/Changelog:
2011-10-21  Roberto Agostino Vitillo  
PR c++/30066
* doc/invoke.texi (fvisibility-inlines-hidden): Documentation change.

gcc/c-family/Changelog:
2011-10-21  Roberto Agostino Vitillo  
PR c++/30066
* c.opt (fvisibility-inlines-hidden): Description change.

gcc/cp/Changelog:
2011-10-21  Roberto Agostino Vitillo  
PR c++/30066
* decl2.c (determine_hidden_inline): New function.
  (determine_visibility): fvisibility-inlines-hidden affects inline 
functions.

gcc/testsuite/Changelog:
2011-10-21  Roberto Agostino Vitillo  
PR c++/30066
* g++.dg/ext/visibility/fvisibility-inlines-hidden-4.C: New test.

r



Re: [PATCH 1/3] Add missing page rounding of a page_entry

2011-10-21 Thread Jakub Jelinek
On Fri, Oct 21, 2011 at 11:42:26AM +0200, Richard Guenther wrote:
> On Fri, Oct 21, 2011 at 7:52 AM, Andi Kleen  wrote:
> > From: Andi Kleen 
> >
> > This one place in ggc forgot to round page_entry->bytes to the
> > next page boundary, which led the heuristics in freeing that
> > check for contiguous memory to fail. Round here too, like the other
> > allocators already do. The memory consumed should be the same
> > for MMAP because the kernel rounds up anyway. It may slightly
> > increase memory usage when malloc groups are used.
> >
> > This will also increase the hitrate on the free page list
> > slightly.
> 
> > 2011-10-18  Andi Kleen  
> >
> >        * ggc-page.c (alloc_pages): Always round up entry_size.

As I said in the PR, ROUND_UP should make the previous
  if (entry_size < G.pagesize)
entry_size = G.pagesize;
completely unnecessary.  Additionally, seeing what ROUND_UP does, it seems
horribly expensive when the second argument is not a constant.
#define ROUND_UP(x, f) (CEIL (x, f) * (f))
#define CEIL(x,y) (((x) + (y) - 1) / (y))
as G.pagesize is variable, I'm afraid the compiler has to divide and
multiply (or perhaps divide and modulo), there is nothing hinting that
G.pagesize is a power of two and thus
(entry_size + G.pagesize - 1) & ~(G.pagesize - 1);
will work.  ggc-page.c relies on G.pagesize to be a power of two though
(and I hope no sane host uses something else), as otherwise
G.lg_pagesize would be -1 and we shift by that amount, so that would be
undefined behavior.
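A hedged sketch of the difference (not the proposed ggc-page.c change): with a power-of-two page size the round-up collapses to a single mask, while the generic CEIL-based macro needs a division and a multiplication when its second operand is not constant.

#include <stddef.h>

/* Generic round-up, as ROUND_UP/CEIL expand: divide then multiply.  */
static size_t
round_up_generic (size_t x, size_t f)
{
  return ((x + f - 1) / f) * f;
}

/* Power-of-two fast path: a single mask.  Valid only when f == 2^k,
   which ggc-page.c already assumes for G.pagesize.  */
static size_t
round_up_pow2 (size_t x, size_t f)
{
  return (x + f - 1) & ~(f - 1);
}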

> > diff --git a/gcc/ggc-page.c b/gcc/ggc-page.c
> > index 2da99db..ba88e3f 100644
> > --- a/gcc/ggc-page.c
> > +++ b/gcc/ggc-page.c
> > @@ -736,6 +736,7 @@ alloc_page (unsigned order)
> >   entry_size = num_objects * OBJECT_SIZE (order);
> >   if (entry_size < G.pagesize)
> >     entry_size = G.pagesize;
> > +  entry_size = ROUND_UP (entry_size, G.pagesize);
> >
> >   entry = NULL;
> >   page = NULL;
> > --
> > 1.7.5.4
> >
> >

Jakub


Re: [PATCH 2/3] Free large chunks in ggc

2011-10-21 Thread Richard Guenther
On Fri, Oct 21, 2011 at 7:52 AM, Andi Kleen  wrote:
> From: Andi Kleen 
>
> This implements the freeing back of large chunks in the ggc madvise path
> Richard Guenther asked for.  This way on systems with limited
> address space malloc() and other allocators still have
> a chance to get back at some of the memory ggc freed. The
> fragmented pages are still just given back, but the address space
> stays allocated.
>
> I tried freeing only aligned 2MB areas to optimize for 2MB huge
> pages, but the hit rate was quite low, so I switched to 1MB+
> unaligned areas. The target size is a param now.
>
> Passed bootstrap and testing on x86_64-linux
>
> gcc/:
> 2011-10-18  Andi Kleen  
>
>        * ggc-page (release_pages): First free large continuous
>        chunks in the madvise path.
>        * params.def (GGC_FREE_UNIT): Add.
>        * doc/invoke.texi (ggc-free-unit): Add.
> ---
>  gcc/doc/invoke.texi |    5 +
>  gcc/ggc-page.c      |   48 
>  gcc/params.def      |    5 +
>  3 files changed, 58 insertions(+), 0 deletions(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 4f55dbc..e622552 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -8858,6 +8858,11 @@ very large effectively disables garbage collection.  
> Setting this
>  parameter and @option{ggc-min-expand} to zero causes a full collection
>  to occur at every opportunity.
>
> +@item  ggc-free-unit
> +
> +Continuous areas in OS pages to free back to OS immediately. Default is 256
> +pages, which is 1MB with 4K pages.
> +
>  @item max-reload-search-insns
>  The maximum number of instruction reload should look backward for equivalent
>  register.  Increasing values mean more aggressive optimization, making the
> diff --git a/gcc/ggc-page.c b/gcc/ggc-page.c
> index ba88e3f..eb0eeef 100644
> --- a/gcc/ggc-page.c
> +++ b/gcc/ggc-page.c
> @@ -972,6 +972,54 @@ release_pages (void)
>   page_entry *p, *start_p;
>   char *start;
>   size_t len;
> +  size_t mapped_len;
> +  page_entry *next, *prev, *newprev;
> +  size_t free_unit = PARAM_VALUE (GGC_FREE_UNIT) * G.pagesize;
> +
> +  /* First free larger continuous areas to the OS.
> +     This allows other allocators to grab these areas if needed.
> +     This is only done on larger chunks to avoid fragmentation.
> +     This does not always work because the free_pages list is only
> +     sorted over a single GC cycle. */

But release_pages is only called from ggc_collect, or what do you
mean with the above?  Would the hitrate using the quire size increase
if we change how we allocate from the freelist or is it real fragmentation
that causes it?

I'm a bit hesitant to approve the new param, I'd be ok if we just hard-code
quire-size / 2.

Richard.

> +
> +  p = G.free_pages;
> +  prev = NULL;
> +  while (p)
> +    {
> +      start = p->page;
> +      start_p = p;
> +      len = 0;
> +      mapped_len = 0;
> +      newprev = prev;
> +      while (p && p->page == start + len)
> +        {
> +          len += p->bytes;
> +         if (!p->discarded)
> +             mapped_len += p->bytes;
> +         newprev = p;
> +          p = p->next;
> +        }
> +      if (len >= free_unit)
> +        {
> +          while (start_p != p)
> +            {
> +              next = start_p->next;
> +              free (start_p);
> +              start_p = next;
> +            }
> +          munmap (start, len);
> +         if (prev)
> +           prev->next = p;
> +          else
> +            G.free_pages = p;
> +          G.bytes_mapped -= mapped_len;
> +         continue;
> +        }
> +      prev = newprev;
> +   }
> +
> +  /* Now give back the fragmented pages to the OS, but keep the address
> +     space to reuse it next time. */
>
>   for (p = G.free_pages; p; )
>     {
> diff --git a/gcc/params.def b/gcc/params.def
> index 5e49c48..edbf0de 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -561,6 +561,11 @@ DEFPARAM(GGC_MIN_HEAPSIZE,
>  #undef GGC_MIN_EXPAND_DEFAULT
>  #undef GGC_MIN_HEAPSIZE_DEFAULT
>
> +DEFPARAM(GGC_FREE_UNIT,
> +        "ggc-free-unit",
> +        "Continuous areas in OS pages to free back immediately",
> +        256, 0, 0)
> +
>  DEFPARAM(PARAM_MAX_RELOAD_SEARCH_INSNS,
>         "max-reload-search-insns",
>         "The maximum number of instructions to search backward when looking 
> for equivalent reload",
> --
> 1.7.5.4
>
>


[C++ Patch] __builtin_choose_expr *bump*

2011-10-21 Thread Andy Gibbs
Hi,

Please can I "bump" this patch and ask for it to be approved and committed:
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01711.html

The patch is to implement the C built-in function __builtin_choose_expr(...) 
in C++.

I'm afraid I am new to contributing to GCC, so I hope I am going about this 
the right way.  I also appreciate that this is a very busy mailing list, but 
would be very grateful if the patch could be committed so that it can make 
it into 4.7.0.

Many thanks

Andy





Re: [C++ Patch] __builtin_choose_expr *bump*

2011-10-21 Thread Richard Guenther
On Fri, Oct 21, 2011 at 11:57 AM, Andy Gibbs  wrote:
> Hi,
>
> Please can I "bump" this patch and ask for it to be approved and committed:
> http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01711.html
>
> The patch is to implement the C built-in function __builtin_choose_expr(...)
> in C++.
>
> I'm afraid I am new to contributing to GCC, so I hope I am going about this
> the right way.  I also appreciate that this is a very busy mailing list, but
> would be very grateful if the patch could be committed so that it can make
> it into 4.7.0.

What's the motivation for this?  Why can't it be implemented using C++
features such as templates and specialization?

Richard.
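The kind of alternative hinted at above can be sketched with a class template plus a partial specialization; a hedged illustration only, and not a drop-in replacement, since unlike __builtin_choose_expr both operands here are evaluated and type-checked:

#include <iostream>

template <bool Cond, typename T, typename F>
struct choose
{
  typedef T type;
  static T value (T t, F) { return t; }
};

template <typename T, typename F>
struct choose<false, T, F>
{
  typedef F type;
  static F value (T, F f) { return f; }
};

int main ()
{
  /* Selects the long expression at compile time on LP64 targets.  */
  std::cout << choose<(sizeof (long) > 4), long, int>::value (1L, 2) << '\n';
  return 0;
}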


> Many thanks
>
> Andy
>
>
>
>


[Patch,AVR]: Use EIND consistently

2011-10-21 Thread Georg-Johann Lay
This patch adds support to consistently use EIND.

The compiler never sets this SFR but uses it in table jumps and EIJMP/EICALL
instructions.

Custom startup code could set EIND to a value other than 0, and the compiler
should use EIND consistently given that EIND might not be zero.

EIND != 0 will need support from a custom linker script to locate the jump pads
in another segment, but that's a different story.

The patch undoes some changes from r179760 and uses EIND the other way round:
instead of trying to avoid EIND altogether and assuming code is supposed to
work in the lower segment only, it uses EIND rather than the zero register when
simulating an indirect jump by means of RET.

With this patch, the application may set EIND to a custom value and provide its
own linker script to place the jump pads.  The assumption is that EIND never
changes throughout the application and therefore ISR prologue/epilogue need not care.

With the gs() magic, code using indirect jumps works fine with that, e.g.
- Indirect calls
- Computed goto
- Jumping to 1: in prologue_saves

What does not work as expected is to jump to const_int addresses like

int main()
{
   ((void(*)(void))0)();
   return 0;
}

Instead, code must read

extern void my_address (void);

int main()
{
my_address();
return 0;
}

and compiled with, say -Wl,--defsym,my_address=0x2, so that a jump pad is
generated.

Patch ok for trunk?

Johann

* config/avr/libgcc.S (__EIND__): New define to 0x3C.
(__tablejump__): Consistently use EIND for indirect jump/call.
(__tablejump_elpm__): Ditto.

Index: config/avr/libgcc.S
===
--- config/avr/libgcc.S (revision 180262)
+++ config/avr/libgcc.S (working copy)
@@ -28,6 +28,7 @@ see the files COPYING3 and COPYING.RUNTI
 #define __SP_H__ 0x3e
 #define __SP_L__ 0x3d
 #define __RAMPZ__ 0x3B
+#define __EIND__  0x3C

 /* Most of the functions here are called directly from avr.md
patterns, instead of using the standard libcall mechanisms.
@@ -821,17 +822,12 @@ ENDF __tablejump2__

 DEFUN __tablejump__
 #if defined (__AVR_HAVE_LPMX__)
-#if defined (__AVR_HAVE_EIJMP_EICALL__)
-   lpm  __tmp_reg__, Z+
-   push __tmp_reg__
-   lpm  __tmp_reg__, Z 
-   push __tmp_reg__
-   push __zero_reg__
-   ret
-#else
lpm __tmp_reg__, Z+
lpm r31, Z
mov r30, __tmp_reg__
+#if defined (__AVR_HAVE_EIJMP_EICALL__)
+   eijmp
+#else
ijmp
 #endif

@@ -842,7 +838,8 @@ DEFUN __tablejump__
lpm
push r0
 #if defined (__AVR_HAVE_EIJMP_EICALL__)
-   push __zero_reg__
+   in   __tmp_reg__, __EIND__
+   push __tmp_reg__
 #endif
ret
 #endif /* !HAVE_LPMX */
@@ -1034,7 +1031,8 @@ DEFUN __tablejump_elpm__
elpm
push r0
 #if defined (__AVR_HAVE_EIJMP_EICALL__)
-   push __zero_reg__
+   in  __tmp_reg__, __EIND__
+   push __tmp_reg__
 #endif
ret
 #endif


[C++ Patch / RFC] PR 45385

2011-10-21 Thread Paolo Carlini

Hi,

this one is a bit subtler. It's actually a regression due to the fix for 
PR35602, which was about a bogus warning for:


struct c
{
  ~c();
  c();
};

int
main()
{
  c x[0UL][0UL] =  // { dg-bogus "warning: conversion to .long unsigned int. from .long int. may change the sign of the result" }
    {
    };
}

The way we did it, we added a check at the beginning of 
c-common.c:conversion_warning:


   /* If any operand is artificial, then this expression was generated
  by the compiler and we do not warn.  */
  for (i = 0; i < expr_num_operands; i++)
 {
   tree op = TREE_OPERAND (expr, i);
   if (op && DECL_P (op) && DECL_ARTIFICIAL (op))
 return;
 }

which catches the artificial (only) operand of the expr (expr is a 
BIT_NOT_EXPR and the operand a VAR_DECL).


Now, however, for testcases like PR45385, where member functions are 
involved, we easily fail to produce warnings, simply because the this 
pointer is artificial! Thus I had the idea of restricting the above 
check to the single operand case which matters for PR35602: for sure 
it's a safe change, and passes the testsuite, but I cannot exclude that 
more complex situations can occur for which the loop would avoid more 
bogus warnings... What do you think, is the change good enough for now?


Thanks,
Paolo.


/c-family
2011-10-21  Paolo Carlini  

PR c++/45385
* c-common.c (conversion_warning): Early return only if the
only operand is DECL_ARTIFICIAL.

testsuite/
2011-10-21  Paolo Carlini  

PR c++/45385
* g++.dg/warn/Wconversion4.C: New.
Index: c-family/c-common.c
===
--- c-family/c-common.c (revision 180288)
+++ c-family/c-common.c (working copy)
@@ -2121,19 +2121,17 @@ unsafe_conversion_p (tree type, tree expr, bool pr
 static void
 conversion_warning (tree type, tree expr)
 {
-  int i;
-  const int expr_num_operands = TREE_OPERAND_LENGTH (expr);
   tree expr_type = TREE_TYPE (expr);
   location_t loc = EXPR_LOC_OR_HERE (expr);
 
   if (!warn_conversion && !warn_sign_conversion)
 return;
 
-  /* If any operand is artificial, then this expression was generated
- by the compiler and we do not warn.  */
-  for (i = 0; i < expr_num_operands; i++)
+  /* If the only operand is artificial, then the expression was generated
+ by the compiler and we do not warn.   */
+  if (TREE_OPERAND_LENGTH (expr) == 1)
 {
-  tree op = TREE_OPERAND (expr, i);
+  tree op = TREE_OPERAND (expr, 0);
   if (op && DECL_P (op) && DECL_ARTIFICIAL (op))
return;
 }
Index: testsuite/g++.dg/warn/Wconversion4.C
===
--- testsuite/g++.dg/warn/Wconversion4.C(revision 0)
+++ testsuite/g++.dg/warn/Wconversion4.C(revision 0)
@@ -0,0 +1,17 @@
+// PR c++/45385
+// { dg-options "-Wconversion" } 
+
+void foo(unsigned char);
+
+class Test
+{
+  void eval()
+  {
+foo(bar());  // { dg-warning "may alter its value" }
+  }
+
+  unsigned int bar() const
+  {
+return __INT_MAX__ * 2U + 1;
+  }
+};


[commit][arm] Fix pr50809: build failure with --enable-build-with-cxx

2011-10-21 Thread Andrew Stubbs
I've just committed this patch to fix PR50809, in which driver-arm.c 
failed to build with a C++ compiler and -Werror.


The patch is pre-approved by Ramana, and anyway probably qualifies as 
obvious.


Andrew
2011-10-21  Andrew Stubbs  

	PR target/50809

	gcc/
	* config/arm/driver-arm.c (vendors): Make static.

--- a/gcc/config/arm/driver-arm.c
+++ b/gcc/config/arm/driver-arm.c
@@ -49,7 +49,7 @@ static struct vendor_cpu arm_cpu_table[] = {
 {NULL, NULL, NULL}
 };
 
-struct {
+static struct {
   const char *vendor_no;
   const struct vendor_cpu *vendor_parts;
 } vendors[] = {


[Patch] Add support of AIX response files in collect2

2011-10-21 Thread Tristan Gingold
Hi,

The AIX linker supports response files via the '-fFILE' command-line option.
With this patch, collect2 reads the contents of the response files, so the
listed object files are handled and no longer forgotten by the collect2
machinery.

Although AIX ld supports glob(3) patterns, this isn't handled by this patch 
because glob() isn't available on all hosts, isn't present in libiberty and its 
use is not common.  To be considered for the future.

I haven't added a testcase for this because I think this is not possible with 
only one file.  But hints are welcome.

Reduced bootstrap on rs6000-aix

Ok for trunk ?

Tristan.

2011-10-21  Tristan Gingold  

* collect2.c (main): Add support of -f (response file) on AIX.

diff --git a/gcc/collect2.c b/gcc/collect2.c
index cf39693..fd747b5 100644
--- a/gcc/collect2.c
+++ b/gcc/collect2.c
@@ -1091,6 +1091,7 @@ main (int argc, char **argv)
   const char **ld2;
   char **object_lst;
   const char **object;
+  int object_nbr = argc;
   int first_file;
   int num_c_args;
   char **old_argv;
@@ -1440,6 +1441,57 @@ main (int argc, char **argv)
 "configuration");
 #endif
}
+#ifdef TARGET_AIX_VERSION
+ else
+   {
+ /* File containing a list of input files to process.  */
+ 
+ FILE *stream;
+ char buf[MAXPATHLEN + 2];
+ /* Number of additional object files.  */
+ int add_nbr = 0;
+ /* Maximum of additional object files before vector
+expansion.  */
+ int add_max = 0;
+ const char *list_filename = arg + 2;
+ 
+ /* Accept -fFILENAME and -f FILENAME.  */
+ if (*list_filename == '\0' && argv[1])
+   {
+ ++argv;
+ list_filename = *argv;
+ *ld1++ = *ld2++ = *argv;
+   }
+ 
+ stream = fopen (list_filename, "r");
+ if (stream == NULL)
+   fatal_error ("can't open %s: %m", list_filename);
+ 
+ while (fgets (buf, sizeof buf, stream) != NULL)
+   {
+ /* Remove end of line.  */
+ int len = strlen (buf);
+ if (len >= 1 && buf[len - 1] =='\n')
+   buf[len - 1] = '\0';
+ 
+ /* Put on object vector.
+Note: we only expand the vector here, so we must keep
+extra space for remaining arguments.  */
+ if (add_nbr >= add_max)
+   {
+ int pos = object - (const char **)object_lst;
+ add_max = (add_max == 0) ? 16 : add_max * 2;
+ object_lst = XRESIZEVEC(char *, object_lst,
+ object_nbr + add_max);
+ object = (const char **)object_lst + pos;
+ object_nbr += add_max;
+   }
+ *object++ = xstrdup(buf);
+ add_nbr++;
+   }
+ fclose (stream);
+   }
+#endif
   break;
 
case 'l':



[i386, PR50740] CPUID leaf 7 for BMI/BMI2/AVX2 feature detection not qualified with max_level and doesn't use subleaf

2011-10-21 Thread Kirill Yukhin
Hello,
Here is the patch which checks CPUID correctly to get BMI/BMI2/AVX2 feature.
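
The idea, sketched here in plain C (not the actual driver-i386.c code; the
function name and the use of the <cpuid.h> helpers are only for
illustration), is to query leaf 7 only when the maximum supported leaf
allows it, and with subleaf 0:

#include <cpuid.h>

static unsigned int
leaf7_ebx (void)
{
  unsigned int eax, ebx, ecx, edx;

  /* Leaf 0 reports the highest supported standard leaf.  */
  if (__get_cpuid_max (0, NULL) < 7)
    return 0;

  /* Leaf 7 is indexed by ECX; the feature bits live in subleaf 0.  */
  __cpuid_count (7, 0, eax, ebx, ecx, edx);
  return ebx;   /* BMI, BMI2 and AVX2 are reported in EBX.  */
}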

ChangeLog entry is:
2011-10-21  H.J. Lu  
Kirill Yukhin  

* config/i386/driver-i386.c (host_detect_local_cpu): Do cpuid 7 only
if max_level allows that.

testsuite/ChangeLog entry is
2011-10-21  H.J. Lu  
Kirill Yukhin  

* gcc.target/i386/avx2-check.h (main): Check CPU level
correctly.
* gcc.target/i386/bmi2-check.h: Ditto.

Bootstrap has passed.
Could you please have a look?

Thanks, K


cpuid7.gcc.patch
Description: Binary data


Re: [Patch,AVR]: Use EIND consistently

2011-10-21 Thread Denis Chertykov
2011/10/21 Georg-Johann Lay :
> This patch adds support to consistently use EIND.
>
> The compiler never sets this SFR but uses it in table jumps and EIJMP/EICALL
> instructions.
>
> Custom startup code could set EIND to an other value than 0 and the compiler
> should use EIND consistently given that EIND might not be zero.
>
> EIND != 0 will need support of custom linker script to locate jump pads in an
> other segment, but that's a different story.
>
> The patch undoes some changes from r179760 and introduces using EIND the other
> way round: Not trying to avoid EIND altogether and assume code is supposed to
> work in the lower segment only, but instead use EIND and not zero-reg when
> simulating indirect jump by means of RET.
>
> With this patch, the application may set EIND to a custom value and invent own
> linker script to place jump pads.  The assertion is that EIND never changes
> throughout the application and therefore ISR prologue/epilogue need not care.
>
> With the gs() magic, code using indirect jumps works fine with that, e.g.
> - Indirect calls
> - Computed goto
> - Jumping to 1: in prologue_saves
>
> What does not work as expected is to jump to const_int addresses like
>
> int main()
> {
>   ((void(*)(void))0)();
>   return 0;
> }
>
> Instead, code must read
>
> extern void my_address (void);
>
> int main()
> {
>    my_address();
>    return 0;
> }
>
> and compiled with, say -Wl,--defsym,my_address=0x2, so that a jump pad is
> generated.
>
> Patch ok for trunk?
>
> Johann
>
>        * config/avr/libgcc.S (__EIND__): New define to 0x3C.
>        (__tablejump__): Consistently use EIND for indirect jump/call.
>        (__tablejump_elpm__): Ditto.

Approved.

Denis.


Handle weakrefs from callgraph

2011-10-21 Thread Jan Hubicka
Hi,
this patch makes weakrefs that have no declaration of the symbol they alias
go the callgraph way instead of the alias-pair way.  This should make
practically all sane aliases go the callgraph way, but we still do not
handle aliases from functions to variables and vice versa.
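
For illustration, the kind of weakref this is about looks like the following
(hypothetical example, not taken from the patch); the target has no
declaration in the current unit, so before this change it could only be
represented through the alias-pair list:

/* Weak reference to a symbol that may or may not exist at link time;
   only the weakref attribute mentions it, there is no declaration.  */
static void helper (void) __attribute__ ((weakref ("external_helper")));

void
run (void)
{
  if (helper)     /* resolves to NULL when external_helper is absent */
    helper ();
}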

They are not too hard to represent in callgraph+varpool, but they complicate
the handling.  I would lean towards making them unsupported, but if
necessary I can add support for them.

Bootstrapped/regtested x86_64-linux, committed.
Honza

* cgraph.c (dump_cgraph_node): Dump alias flag.
* cgraphunit.c (handle_alias_pairs): Handle weakrefs with no 
destination.
(get_alias_symbol): New function.
(output_weakrefs): Also output weakrefs with no destination.
(lto_output_node): Output weakref alias flag when at function boundary.
Index: cgraph.c
===
--- cgraph.c(revision 180181)
+++ cgraph.c(working copy)
@@ -1838,6 +1838,8 @@ dump_cgraph_node (FILE *f, struct cgraph
 fprintf (f, " only_called_at_startup");
   if (node->only_called_at_exit)
 fprintf (f, " only_called_at_exit");
+  else if (node->alias)
+fprintf (f, " alias");
 
   fprintf (f, "\n");
 
Index: cgraphunit.c
===
--- cgraphunit.c(revision 180181)
+++ cgraphunit.c(working copy)
@@ -1249,6 +1249,18 @@ handle_alias_pairs (void)
  varpool_create_variable_alias (p->decl, target_vnode->decl);
  VEC_unordered_remove (alias_pair, alias_pairs, i);
}
+  /* Weakrefs with target not defined in current unit are easy to handle; they
+     behave just as external variables except we need to note the alias flag
+     to later output the weakref pseudo op into asm file.  */
+  else if (lookup_attribute ("weakref", DECL_ATTRIBUTES (p->decl)) != NULL)
+   {
+ if (TREE_CODE (p->decl) == FUNCTION_DECL)
+   cgraph_get_create_node (p->decl)->alias = true;
+ else
+   varpool_get_node (p->decl)->alias = true;
+ DECL_EXTERNAL (p->decl) = 1;
+ VEC_unordered_remove (alias_pair, alias_pairs, i);
+   }
   else
{
  if (dump_file)
@@ -2064,24 +2078,36 @@ ipa_passes (void)
   bitmap_obstack_release (NULL);
 }
 
+/* Return string alias is alias of.  */
+
+static tree
+get_alias_symbol (tree decl)
+{
+  tree alias = lookup_attribute ("alias", DECL_ATTRIBUTES (decl));
+  return get_identifier (TREE_STRING_POINTER
+ (TREE_VALUE (TREE_VALUE (alias))));
+}
+
 /* Weakrefs may be associated to external decls and thus not output
at expansion time.  Emit all neccesary aliases.  */
 
-void
+static void
 output_weakrefs (void)
 {
   struct cgraph_node *node;
   struct varpool_node *vnode;
   for (node = cgraph_nodes; node; node = node->next)
-if (node->alias && node->thunk.alias && DECL_EXTERNAL (node->decl)
+if (node->alias && DECL_EXTERNAL (node->decl)
 && !TREE_ASM_WRITTEN (node->decl))
   assemble_alias (node->decl,
- DECL_ASSEMBLER_NAME (node->thunk.alias));
+ node->thunk.alias ? DECL_ASSEMBLER_NAME (node->thunk.alias)
+ : get_alias_symbol (node->decl));
   for (vnode = varpool_nodes; vnode; vnode = vnode->next)
-if (vnode->alias && vnode->alias_of && DECL_EXTERNAL (vnode->decl)
+if (vnode->alias && DECL_EXTERNAL (vnode->decl)
 && !TREE_ASM_WRITTEN (vnode->decl))
   assemble_alias (vnode->decl,
- DECL_ASSEMBLER_NAME (vnode->alias_of));
+ vnode->alias_of ? DECL_ASSEMBLER_NAME (vnode->alias_of)
+ : get_alias_symbol (vnode->decl));
 }
 
 
Index: lto-cgraph.c
===
--- lto-cgraph.c(revision 180181)
+++ lto-cgraph.c(working copy)
@@ -512,7 +512,13 @@ lto_output_node (struct lto_simple_outpu
 || referenced_from_other_partition_p (&node->ref_list, 
set, vset)), 1);
   bp_pack_value (&bp, node->lowered, 1);
   bp_pack_value (&bp, in_other_partition, 1);
-  bp_pack_value (&bp, node->alias && !boundary_p, 1);
+  /* Real aliases in a boundary become non-aliases. However we still stream
+ alias info on weakrefs. 
+ TODO: We lose a bit of information here - when we know that variable is
+ defined in other unit, we may use the info on aliases to resolve 
+ symbol1 != symbol2 type tests that we can do only for locally defined objects
+ otherwise.  */
+  bp_pack_value (&bp, node->alias && (!boundary_p || DECL_EXTERNAL (node->decl)), 1);
   bp_pack_value (&bp, node->frequency, 2);
   bp_pack_value (&bp, node->only_called_at_startup, 1);
   bp_pack_value (&bp, node->only_called_at_exit, 1);
@@ -530,7 +536,8 @@ lto_output_node (struct lto_simple_outpu
   streamer_write_uhwi_stream (ob->main_stream, node->thunk.fixed

Re: [RFA:] fix breakage with "Update testsuite to run with slim LTO"

2011-10-21 Thread Iain Sandoe


On 21 Oct 2011, at 10:31, Jan Hubicka wrote:


Date: Fri, 21 Oct 2011 00:19:32 +0200
From: Jan Hubicka 
Yes, if we scan assembler, we likely want -fno-fat-lto-objects.



then IIUC you need to patch *all* torture tests that use
scan-assembler and scan-assembler-not.  Alternatively, patch
somewhere else, like not passing it if certain directives are
used, like scan-assembler{,-not}.  And either way, is it safe to
add that option always, not just when also passing "-flto" or
something?


Hmm, some of the assembler scans still work because they check for the
presence of symbols we output anyway, but indeed, it would make more
sense to automatically imply -ffat-lto-object when scan-assembler
is used.  I am not sure my dejagnu skills are on par here, however.


Maybe you could make amends ;) by testing the following, which
seems to work at least for dg-torture.exp and cris-elf/cris-sim,
in which -ffat-lto-object is automatically added for each
scan-assembler and scan-assembler-not test, extensible for other
dg-final actions without polluting with checking LTO options and
whatnot across the files.  I checked (and corrected) so it also
works when !check_effective_target_lto by commenting out the
setting in the second chunk.


Thanks. It looks good to me.  If we ever start scanning LTO assembler output,
we may simply add scan-lto-assembler variants or so...


It looks like the gnat testsuite is also broken - but HP's fix doesn't  
recover that.

.. will try and take a look - but short on time today,
Iain




Re: [PATCH][PING] Vectorize conversions directly

2011-10-21 Thread Ramana Radhakrishnan
> Otherwise the generic parts of the patch look good.
> Please get separate approval for the arm portions of the patch.

Is it just me or has no one else seen this patch on the archives at
gcc-patches@. ?


Ramana


Re: [RFA:] fix breakage with "Update testsuite to run with slim LTO"

2011-10-21 Thread Rainer Orth
Iain Sandoe  writes:

> It looks like the gnat testsuite is also broken - but HP's fix doesn't
> recover that.
> .. will try and take a look - but short on time today,

I think I see what's going on: in gnat.log, I find

Running /vol/gcc/src/hg/trunk/local/gcc/testsuite/gnat.dg/dg.exp ...
ERROR: tcl error sourcing library file 
/vol/gcc/src/hg/trunk/local/gcc/testsuite/lib/gcc-dg.exp.
can't read "GCC_UNDER_TEST": no such variable
can't read "GCC_UNDER_TEST": no such variable
while executing
"lappend options "compiler=$GCC_UNDER_TEST""
(procedure "gcc_target_compile" line 37)
invoked from within
"gcc_target_compile $source $dest $type $options"
invoked from within
"if [ string match "*.c" $source ] then {
return [gcc_target_compile $source $dest $type $options]
}"
(procedure "gnat_target_compile" line 12)
invoked from within
"${tool}_target_compile $src $output $compile_type "$options""
(procedure "check_compile" line 39)
invoked from within
"check_compile linker_plugin executable {
 int main() { return 0; }
  } {-flto -fuse-linker-plugin}"
("eval" body line 1)
invoked from within
"eval check_compile $args"
(procedure "check_no_compiler_messages_nocache" line 2)
invoked from within
"check_no_compiler_messages_nocache linker_plugin executable {
 int main() { return 0; }
  } "-flto -fuse-linker-plugin""
(procedure "check_linker_plugin_available" line 2)
invoked from within
"check_linker_plugin_available"
invoked from within
"if [check_effective_target_lto] {
# When having plugin test both slim and fat LTO and plugin/nonplugin
# path.
if [check_linker_plugin_ava..."
(file "/vol/gcc/src/hg/trunk/local/gcc/testsuite/lib/gcc-dg.exp" line 71)
invoked from within
"source /vol/gcc/src/hg/trunk/local/gcc/testsuite/lib/gcc-dg.exp"
("uplevel" body line 1)
invoked from within
"uplevel #0 source /vol/gcc/src/hg/trunk/local/gcc/testsuite/lib/gcc-dg.exp"

If running the gnat.dg testsuite, lib/gcc-dg.exp is now calling
check_linker_plugin_available early, which ultimately calls
${tool}_target_compile.  For all languages but Ada,
${tool}_target_compile can compile .c files just fine, but
gnat_target_compile (which uses gnatmake) cannot, so it falls back to
directly calling gcc_target_compile in that case.  gcc_target_compile
relies on GCC_UNDER_TEST being set, which in this case hasn't yet
happened, thus the error.

My solution (a hack, actually) is to move the initialization of
GCC_UNDER_TEST in gcc-dg.exp before the calls to
check_linker_plugin_available.  x86_64-unknown-linux-gnu testing in
progress, will commit once that's finished.

Btw., the ChangeLog entry for Jan's patch was riddled with typos and
wrong pathnames.  I've corrected that with a separate checkin.

Rainer


2011-10-21  Rainer Orth  

* lib/gcc-dg.exp (GCC_UNDER_TEST): Set before calling
check_linker_plugin_available.

# HG changeset patch
# Parent 9c45ed5cb653fa8053d3c7a9d6502a85b0ffbafc
Fix gnat.dg testing with linker plugin check

diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -41,6 +41,11 @@ if { [ishost "*-*-cygwin*"] } {
   setenv LANG C.ASCII
 }
 
+global GCC_UNDER_TEST
+if ![info exists GCC_UNDER_TEST] {
+set GCC_UNDER_TEST "[find_gcc]"
+}
+
 if [info exists TORTURE_OPTIONS] {
 set DG_TORTURE_OPTIONS $TORTURE_OPTIONS
 } else {
@@ -84,12 +89,6 @@ if [check_effective_target_lto] {
 }
 }
 
-
-global GCC_UNDER_TEST
-if ![info exists GCC_UNDER_TEST] {
-set GCC_UNDER_TEST "[find_gcc]"
-}
-
 global orig_environment_saved
 
 # This file may be sourced, so don't override environment settings

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [cxx-mem-model] Handle x86-64 with -m32

2011-10-21 Thread Andrew MacLeod

On 10/20/2011 06:50 PM, H.J. Lu wrote:

On Thu, Oct 20, 2011 at 3:38 PM, Joseph S. Myers
  wrote:


Do these operations exist for x32 as well as for -m64?  If they do, then
lp64 isn't the right test either; if not, then it is.


X32 has native int64 and int128.

I presume there is no atomic support for int128 though, and that's what the
check_effective_target_sync_int_128 condition is testing for.
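
For reference, the operation class being gated here is 16-byte atomics,
e.g. (illustrative only, not a testsuite excerpt):

__int128 counter;

void
bump (void)
{
  /* Needs 16-byte atomic support (cmpxchg16b on x86-64).  */
  __sync_fetch_and_add (&counter, 1);
}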


Andrew


Re: [patch tree-optimization]: allow branch-cost optimization for truth-and/or on mode-expanded simple boolean-operands

2011-10-21 Thread Kai Tietz
2011/10/21 Richard Guenther :
> On Thu, Oct 20, 2011 at 3:08 PM, Kai Tietz  wrote:
>> Hello,
>>
>> this patch re-enables the branch-cost optimization on simple boolean-typed 
>> operands, which are casted to a wider integral type.  This happens due casts 
>> from
>> boolean-types are preserved, but FE might expands simple-expression to wider 
>> mode.
>>
>> I added two tests for already working branch-cost optimization for 
>> IA-architecture and
>> two for explicit checking for boolean-type.
>>
>> ChangeLog
>>
>> 2011-10-20  Kai Tietz  
>>
>>        * fold-const.c (simple_operand_p_2): Handle integral
>>        casts from boolean-operands.
>>
>> 2011-10-20  Kai Tietz  
>>
>>        * gcc.target/i386/branch-cost1.c: New test.
>>        * gcc.target/i386/branch-cost2.c: New test.
>>        * gcc.target/i386/branch-cost3.c: New test.
>>        * gcc.target/i386/branch-cost4.c: New test.
>>
>> Bootstrapped and regression tested on x86_64-unknown-linux-gnu for all 
>> languages including Ada and Obj-C++.  Ok for apply?
>>
>> Regards,
>> Kai
>>
>> Index: gcc-head/gcc/testsuite/gcc.target/i386/branch-cost2.c
>> ===
>> --- /dev/null
>> +++ gcc-head/gcc/testsuite/gcc.target/i386/branch-cost2.c
>> @@ -0,0 +1,16 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-gimple -mbranch-cost=2" } */
>> +
>> +extern int doo (void);
>> +
>> +int
>> +foo (int a, int b)
>> +{
>> +  if (a && b)
>> +   return doo ();
>> +  return 0;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times "if " 1 "gimple" } } */
>> +/* { dg-final { scan-tree-dump-times " & " 1 "gimple" } } */
>> +/* { dg-final { cleanup-tree-dump "gimple" } } */
>> Index: gcc-head/gcc/testsuite/gcc.target/i386/branch-cost3.c
>> ===
>> --- /dev/null
>> +++ gcc-head/gcc/testsuite/gcc.target/i386/branch-cost3.c
>> @@ -0,0 +1,16 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-gimple -mbranch-cost=2" } */
>> +
>> +extern int doo (void);
>> +
>> +int
>> +foo (_Bool a, _Bool b)
>> +{
>> +  if (a && b)
>> +   return doo ();
>> +  return 0;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times "if " 1 "gimple" } } */
>> +/* { dg-final { scan-tree-dump-times " & " 1 "gimple" } } */
>> +/* { dg-final { cleanup-tree-dump "gimple" } } */
>> Index: gcc-head/gcc/testsuite/gcc.target/i386/branch-cost4.c
>> ===
>> --- /dev/null
>> +++ gcc-head/gcc/testsuite/gcc.target/i386/branch-cost4.c
>> @@ -0,0 +1,16 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-gimple -mbranch-cost=0" } */
>> +
>> +extern int doo (void);
>> +
>> +int
>> +foo (_Bool a, _Bool b)
>> +{
>> +  if (a && b)
>> +   return doo ();
>> +  return 0;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times "if " 2 "gimple" } } */
>> +/* { dg-final { scan-tree-dump-not " & " "gimple" } } */
>> +/* { dg-final { cleanup-tree-dump "gimple" } } */
>> Index: gcc-head/gcc/fold-const.c
>> ===
>> --- gcc-head.orig/gcc/fold-const.c
>> +++ gcc-head/gcc/fold-const.c
>> @@ -3706,6 +3706,19 @@ simple_operand_p_2 (tree exp)
>>   /* Strip any conversions that don't change the machine mode.  */
>>   STRIP_NOPS (exp);
>>
>> +  /* Handle integral widening casts from boolean-typed
>> +     expressions as simple.  This happens due casts from
>> +     boolean-types are preserved, but FE might expands
>> +     simple-expression to wider mode.  */
>> +  if (INTEGRAL_TYPE_P (TREE_TYPE (exp))
>> +      && CONVERT_EXPR_P (exp)
>> +      && TREE_CODE (TREE_TYPE (TREE_OPERAND (exp, 0)))
>> +        == BOOLEAN_TYPE)
>> +    {
>> +      exp = TREE_OPERAND (exp, 0);
>> +      STRIP_NOPS (exp);
>> +    }
>> +
>
> Huh, well.  I think the above is just too special and you instead should
> replace the existing STRIP_NOPS by
>
> while (CONVERT_EXPR_P (exp))
>  exp = TREE_OPERAND (exp, 0);
>
> with a comment that conversions are considered simple.
>
> Ok with that change, if it bootstraps & tests ok.
>
> Richard.

Ok, bootstrapped and regression-tested on x86_64-unknown-linux-gnu and
applied to trunk with the modification you suggested.

One question I have about the handling of TRUTH binaries in general in
fold-const.c: why aren't we already enforcing in fold_binary that the
operands of those operations have boolean type?  I see some advantages
for C AST folding here.  I've tested it and saw that even later, in the
SSA passes, we get slightly better results that way.

Regards,
Kai


Re: [PATCH] Fix PR46556 (poor address generation)

2011-10-21 Thread William J. Schmidt


On Fri, 2011-10-21 at 11:26 +0200, Richard Guenther wrote:
> On Tue, Oct 18, 2011 at 4:14 PM, William J. Schmidt
>  wrote:



> > +
> > +  /* We don't use get_def_for_expr for S1 because TER doesn't forward
> > + S1 in some situations where this transform is useful, such as
> > + when S1 is the base of two MEM_REFs fitting the pattern.  */
> > +  s1_stmt = SSA_NAME_DEF_STMT (TREE_OPERAND (exp, 0));
> 
> You can't do this - this will possibly generate wrong code.  You _do_
> have to use get_def_for_expr.  Or do it when we are still in "true" SSA 
> form...
> 
> Richard.
> 

OK.  get_def_for_expr always returns NULL here for the cases I was
targeting, so doing this in expand isn't going to be helpful.

Rather than cram this in somewhere else upstream, it might be better to
just wait and let this case be handled by the new strength reduction
pass.  This is one of the easy cases with explicit multiplies in the
instruction stream, so it shouldn't require any special handling there.
Seem reasonable?

Bill



Re: [Patch,AVR]: Use EIND consistently

2011-10-21 Thread Georg-Johann Lay
>> This patch adds support to consistently use EIND.
>>
>> The compiler never sets this SFR but uses it in table jumps and EIJMP/EICALL
>> instructions.
>>
>> Custom startup code could set EIND to an other value than 0 and the compiler
>> should use EIND consistently given that EIND might not be zero.
>>
>> EIND != 0 will need support of custom linker script to locate jump pads in an
>> other segment, but that's a different story.
>>
>> The patch undoes some changes from r179760 and introduces using EIND the 
>> other
>> way round: Not trying to avoid EIND altogether and assume code is supposed to
>> work in the lower segment only, but instead use EIND and not zero-reg when
>> simulating indirect jump by means of RET.
>>
>> With this patch, the application may set EIND to a custom value and invent 
>> own
>> linker script to place jump pads.  The assertion is that EIND never changes
>> throughout the application and therefore ISR prologue/epilogue need not care.
>>
>> With the gs() magic, code using indirect jumps works fine with that, e.g.
>> - Indirect calls
>> - Computed goto
>> - Jumping to 1: in prologue_saves
>>
>> What does not work as expected is to jump to const_int addresses like
>>
>> int main()
>> {
>>   ((void(*)(void))0)();
>>   return 0;
>> }
>>
>> Instead, code must read
>>
>> extern void my_address (void);
>>
>> int main()
>> {
>>my_address();
>>return 0;
>> }
>>
>> and compiled with, say -Wl,--defsym,my_address=0x2, so that a jump pad is
>> generated.
>>
>> Patch ok for trunk?
>>
>> Johann
>>
>>* config/avr/libgcc.S (__EIND__): New define to 0x3C.
>>(__tablejump__): Consistently use EIND for indirect jump/call.
>>(__tablejump_elpm__): Ditto.
> 
> Approved.
> 
> Denis.

Is this a thing to back port?

Johann






Re: [PATCHv2, RFA] Pass address space to REGNO_MODE_CODE_OK_FOR_BASE_P

2011-10-21 Thread Georg-Johann Lay
Ulrich Weigand schrieb:
> Georg-Johann Lay wrote:
>> Ulrich Weigand schrieb:
>>> Hello,
>>>
>>> Georg-Johann Lay has proposed a patch to add named address space support
>>> to the AVR target here:
>>> http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00471.html
>>>
>>> Since the target needs to make register allocation decisions for
>>> address base registers depending on the target address space, a
>>> prerequiste for this is a patch of mine that I posted a while ago
>>> to add the address space to the MODE_CODE_BASE_REG_CLASS and
>>> REGNO_MODE_CODE_OK_FOR_BASE_P target macros.
>>>
>>> I've updated the patch for current mainline and re-tested on SPU
>>> with no regressions.
>> Meanwhile, there was some code clean-up to avr backend. Would you add this?

And there is the following lines in reload.c:

#ifdef LEGITIMIZE_RELOAD_ADDRESS
  do
{
  if (memrefloc && ADDR_SPACE_GENERIC_P (as))
{
  LEGITIMIZE_RELOAD_ADDRESS (ad, GET_MODE (*memrefloc), opnum, type,
 ind_levels, win);
}
  break;
win:
  *memrefloc = copy_rtx (*memrefloc);
  XEXP (*memrefloc, 0) = ad;
  move_replacements (&ad, &XEXP (*memrefloc, 0));
  return -1;
}
  while (0);
#endif

Does it make sense to extend LEGITIMIZE_RELOAD_ADDRESS, too?

For the target that needs your extension (AVR) there are different addressing
capabilities depending on AS and there is an implementation of L_R_A.

Johann


Re: [RFC PATCH] SLP vectorize calls

2011-10-21 Thread Ira Rosen
On 20 October 2011 23:50, Jakub Jelinek  wrote:
> Hi!

Hi,

>
> While looking at *.vect dumps from Polyhedron, I've noticed the lack
> of SLP vectorization of builtin calls.
>
> This patch is an attempt to handle at least 1 and 2 operand builtin calls
> (SLP doesn't handle ternary stmts either yet),

This is on the top of my todo list :).

> where all the types are the
> same.  E.g. it can handle
> extern float copysignf (float, float);
> extern float sqrtf (float);
> float a[8], b[8], c[8], d[8];
>
> void
> foo (void)
> {
>  a[0] = copysignf (b[0], c[0]) + 1.0f + sqrtf (d[0]);
>  a[1] = copysignf (b[1], c[1]) + 2.0f + sqrtf (d[1]);
>  a[2] = copysignf (b[2], c[2]) + 3.0f + sqrtf (d[2]);
>  a[3] = copysignf (b[3], c[3]) + 4.0f + sqrtf (d[3]);
>  a[4] = copysignf (b[4], c[4]) + 5.0f + sqrtf (d[4]);
>  a[5] = copysignf (b[5], c[5]) + 6.0f + sqrtf (d[5]);
>  a[6] = copysignf (b[6], c[6]) + 7.0f + sqrtf (d[6]);
>  a[7] = copysignf (b[7], c[7]) + 8.0f + sqrtf (d[7]);
> }
> and compile it into:
>        vmovaps .LC0(%rip), %ymm0
>        vandnps b(%rip), %ymm0, %ymm1
>        vandps  c(%rip), %ymm0, %ymm0
>        vorps   %ymm0, %ymm1, %ymm0
>        vsqrtps d(%rip), %ymm1
>        vaddps  %ymm1, %ymm0, %ymm0
>        vaddps  .LC1(%rip), %ymm0, %ymm0
>        vmovaps %ymm0, a(%rip)
> I've bootstrapped/regtested it on x86_64-linux and i686-linux, but
> am not 100% sure about all the changes, e.g. that
> || PURE_SLP_STMT (stmt_info) part.
>
> 2011-10-20  Jakub Jelinek  
>
>        * tree-vect-stmts.c (vectorizable_call): Add SLP_NODE argument.
>        Handle vectorization of SLP calls.
>        (vect_analyze_stmt): Adjust caller, add call to it for SLP too.
>        (vect_transform_stmt): Adjust vectorizable_call caller, remove
>        assertion.
>        * tree-vect-slp.c (vect_get_and_check_slp_defs): Handle one
>        and two argument calls too.
>        (vect_build_slp_tree): Allow CALL_EXPR.
>        (vect_get_slp_defs): Handle calls.
>
> --- gcc/tree-vect-stmts.c.jj    2011-10-20 14:13:34.0 +0200
> +++ gcc/tree-vect-stmts.c       2011-10-20 18:02:43.0 +0200
> @@ -1483,7 +1483,8 @@ vectorizable_function (gimple call, tree
>    Return FALSE if not a vectorizable STMT, TRUE otherwise.  */
>
>  static bool
> -vectorizable_call (gimple stmt, gimple_stmt_iterator *gsi, gimple *vec_stmt)
> +vectorizable_call (gimple stmt, gimple_stmt_iterator *gsi, gimple *vec_stmt,
> +                  slp_tree slp_node)
>  {
>   tree vec_dest;
>   tree scalar_dest;
> @@ -1494,6 +1495,7 @@ vectorizable_call (gimple stmt, gimple_s
>   int nunits_in;
>   int nunits_out;
>   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> +  bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
>   tree fndecl, new_temp, def, rhs_type;
>   gimple def_stmt;
>   enum vect_def_type dt[3]
> @@ -1505,19 +1507,12 @@ vectorizable_call (gimple stmt, gimple_s
>   size_t i, nargs;
>   tree lhs;
>
> -  /* FORNOW: unsupported in basic block SLP.  */
> -  gcc_assert (loop_vinfo);
> -
> -  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> +  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
>     return false;
>
>   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
>     return false;
>
> -  /* FORNOW: SLP not supported.  */
> -  if (STMT_SLP_TYPE (stmt_info))
> -    return false;
> -
>   /* Is STMT a vectorizable call?   */
>   if (!is_gimple_call (stmt))
>     return false;
> @@ -1558,7 +1553,7 @@ vectorizable_call (gimple stmt, gimple_s
>       if (!rhs_type)
>        rhs_type = TREE_TYPE (op);
>
> -      if (!vect_is_simple_use_1 (op, loop_vinfo, NULL,
> +      if (!vect_is_simple_use_1 (op, loop_vinfo, bb_vinfo,
>                                 &def_stmt, &def, &dt[i], &opvectype))
>        {
>          if (vect_print_dump_info (REPORT_DETAILS))
> @@ -1620,7 +1615,13 @@ vectorizable_call (gimple stmt, gimple_s
>
>   gcc_assert (!gimple_vuse (stmt));
>
> -  if (modifier == NARROW)
> +  if (slp_node || PURE_SLP_STMT (stmt_info))
> +    {
> +      if (modifier != NONE)
> +       return false;
> +      ncopies = 1;
> +    }

If you want to bail out if it's SLP and modifier != NONE, this check
is not enough. PURE_SLP means the stmt is not used outside the SLP
instance, so for hybrid SLP stmts (those that have uses outside SLP)
this check will not work. You need

  if (modifier != NONE && STMT_SLP_TYPE (stmt_info))
 return false;

But I wonder why not allow different type sizes? I see that we fail in
such cases in vectorizable_conversion too, but I think we should
support this as well.

> +  else if (modifier == NARROW)
>     ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_out;
>   else
>     ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_in;
> @@ -1659,6 +1660,43 @@ vectorizable_call (gimple stmt, gimple_s
>          else
>            VEC_truncate (tree, vargs, 0);
>
> +         if (slp_node)
> +           {
> +             VEC(tree,heap) *vec_oprnds0 = NULL, *vec_oprnds1 = NULL;
> +
> +             gcc_

[RFC ARM] Use vcvt.f32/64.s32 with immediate bits to do fixed to floating point conversions better.

2011-10-21 Thread Ramana Radhakrishnan
Hi,

Some time back Michael pointed out that the ARM backend doesn't generate
vcvt.f32.s32 with an immediate fbits operand where you have a conversion
from fixed point to floating point, as in the example below.  It should also
be possible to generate the vector forms of this, which will be the subject
of a follow-up patch.

I've chosen to implement this in the following manner in the backend, using
these interfaces from real.c.  The reason I've chosen not to allow this
transformation when flag_rounding_math is true is that this instruction
always rounds to nearest rather than obeying what is in the FPSCR, and is
thus not safe for programs that want to set their rounding mode dynamically.
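
A small example of the kind of program this protects (hand-written here to
illustrate the point, not from the testsuite): once the program changes the
rounding mode via fesetround, folding the divide into a vcvt with fractional
bits would silently go back to round-to-nearest.

#include <fenv.h>

float
scale (int i)
{
  fesetround (FE_UPWARD);       /* dynamic rounding mode */
  return (float) i / (1 << 11); /* must honour the FPSCR mode */
}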


I have chosen to use the unified assembler syntax for this patch and
have a set of follow up patches that I've been working on that try to
replace all the old assembler mnemonics with the newer UAL ones. I
think gas has matured to a point where most of the new syntax for VFP
is now fully recognized and there's no reason why we shouldn't move
forward. What is the opinion in this regard ?

The benefits are quite obvious in that we eliminate a load from the
constant pool and a floating point multiply, and thus essentially shave
a floating point multiply plus load latency off these sequences.  This
instruction can only write the output into the same register as the
input register, which is why I've modelled it as below by tying op1 into
op0.  Also, the i32 -> f64 cases were quite impossible to model with
insn_and_splits and subreg modes, which is what Richard and I tried to
cook up.

If someone has an idea as to how this might be achieved I'm all ears,
compared to the current way in which it's all sort of tied together.

Also, if there's a simpler way of using the interfaces into real.c,
then I'm all ears.

OK for trunk ?

cheers
Ramana



* config/arm/arm.c (vfp3_const_double_for_fract_bits): Define.
* config/arm/arm-protos.h (vfp3_const_double_for_fract_bits): Declare.
* config/arm/constraints.md ("Dt"): New constraint.
* config/arm/predicates.md (const_double_vcvt_power_of_two_reciprocal):
New.
* config/arm/vfp.md (*arm_combine_vcvt_f32_s32): New.
(*arm_combine_vcvt_f32_u32): New.

For the following testcases I see the code as follows with
-mfloat-abi=hard -mfpu=vfpv3 and -mcpu=cortex-a9

float foo (int i)
{
 float v = (float)i / (1 << 11);
 return v;
}
float foa_unsigned (unsigned int i)
{
 float v = (float)i / (1 << 5);
 return v;
}


After the patch:

foo:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
fmsr    s0, r0  @ int
vcvt.f32.s32    s0, s0, #11
bx  lr
.size   foo, .-foo
.align  2
.global foa_unsigned
.type   foa_unsigned, %function
foa_unsigned:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
fmsr    s0, r0  @ int
vcvt.f32.u32    s0, s0, #5
bx  lr
.size   foa_unsigned, .-foa_unsigned
.align  2
.global foo1
.type   foo1, %function

rather than
.type   foo, %function
foo:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
fmsr    s15, r0 @ int
fsitos  s0, s15
flds    s15, .L2
fmuls   s0, s0, s15
bx  lr
.L3:
.align  2
.L2:
.word   973078528
.size   foo, .-foo
.align  2
.global foa_unsigned
.type   foa_unsigned, %function
foa_unsigned:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
fmsr    s15, r0 @ int
fuitos  s0, s15
flds    s15, .L5
fmuls   s0, s0, s15
bx  lr
.L6:
.align  2
.L5:
.word   1023410176
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 23a29c6..c933704 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -242,6 +242,7 @@ struct tune_params
 };
 
 extern const struct tune_params *current_tune;
+extern int vfp3_const_double_for_fract_bits (rtx);
 #endif /* RTX_CODE */
 
 #endif /* ! GCC_ARM_PROTOS_H */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f1ada6f..266b757 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -17606,6 +17606,11 @@ arm_print_operand (FILE *stream, rtx x, int code)
   }
   return;
 
+case 'v':
+	gcc_assert (GET_CODE (x) == CONST_DOUBLE);
+	fprintf (stream, "#%d", vfp3_const_double_for_fract_bits (x));
+	return;
+
 /* Register specifier for vld1.16/vst1.16.  Translate the S register
number into a D register number and element index.  */
 case 'z':
@@ -24972,4 +24977,27 @@ arm_count_output_move_double_insns (rtx *operands)
   

Re: [RFA/ARM][Patch 01/02]: Thumb2 epilogue in RTL

2011-10-21 Thread Ramana Radhakrishnan
Hi Sameera,

The comment about REG_FRAME_RELATED_EXPR vs REG_CFA_RESTORE from one
of your later patches
applies here as well.

>diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>index 3162b30..f86a3e6 100644
>--- a/gcc/config/arm/arm.c
>+++ b/gcc/config/arm/arm.c
>@@ -8754,6 +8754,140 @@ neon_valid_immediate (rtx op, enum machine_mode mode, 
>int inverse,
> #undef CHECK
> }
>
>+/* Return true if OP is a valid load multiple operation for MODE mode.
>+   CONSECUTIVE is true if the registers in the operation must form
>+   a consecutive sequence in the register bank.  STACK_ONLY is true
>+   if the base register must be the stack pointer.  RETURN_PC is true
>+   if value is to be loaded in PC.  */
>+bool
>+load_multiple_operation_p (rtx op, bool consecutive, enum machine_mode mode,
>+   bool stack_only, bool return_pc)
>+{

<...> snip

>+
>+  /* If DFMode, we must be asking for consecutive,
>+ since FLDMDD can only do consecutive regs.  */

s/DFMode/DFmode
s/FLDMDD/fldmdd (vldm.f64)

Why are you differentiating on stack_only ? Does it really matter ?


>+  gcc_assert ((mode != DFmode) || consecutive);
>+
>+  /* Set up the increments and the regs per val based on the mode.  */
>+  reg_increment = mode == DFmode ? 8 : 4;

Can't you just get the reg_increment based on GET_MODE_SIZE (mode) ?

>+  regs_per_val = mode == DFmode ? 2 : 1;
>+  offset_adj = return_pc ? 1 : 0;
>+
>+  if (count <= 1
>+  || GET_CODE (XVECEXP (op, 0, offset_adj)) != SET
>+  || !REG_P (SET_DEST (XVECEXP (op, 0, offset_adj
>+return false;
>+
>+  /* Check to see if this might be a write-back.  */
>+  if (GET_CODE (SET_SRC (elt = XVECEXP (op, 0, offset_adj))) == PLUS)
>+{
>+  i++;
>+  base = 1;
>+  update = true;
>+
>+  /* Now check it more carefully.  */
>+  if (!REG_P (SET_DEST (elt))
>+  || !REG_P (XEXP (SET_SRC (elt), 0))
>+  || !CONST_INT_P (XEXP (SET_SRC (elt), 1))
>+  || INTVAL (XEXP (SET_SRC (elt), 1)) !=
>+  ((count - 1 - offset_adj) * reg_increment))
>+return false;

A comment here explaining that you are checking for the
increment amount being sane would be good.


>+
>+  /* Check the nature of the base_register being written to.  */
>+  if (stack_only && (REGNO (SET_DEST (elt)) != SP_REGNUM))
>+return false;
>+}
>+
>+  i = i + offset_adj;
>+  base = base + offset_adj;
>+  /* Perform a quick check so we don't blow up below.  */
>+  if (GET_CODE (XVECEXP (op, 0, i - 1)) != SET
>+  || !REG_P (SET_DEST (XVECEXP (op, 0, i - 1)))
>+  || !MEM_P (SET_SRC (XVECEXP (op, 0, i - 1
>+return false;
>+
>+  /* If only one reg being loaded, success depends on the type:
>+ FLDMDD can do just one reg, LDM must do at least two.  */

Hmmm isn't this true of only LDM's in Thumb state ? Though it could be argued
that this patch is only T2 epilogues.

>+  if (count <= i)
>+return mode == DFmode ? true : false;

Again a comment here would be useful.

>+
>+  first_dest_regno = REGNO (SET_DEST (XVECEXP (op, 0, i - 1)));
>+  dest_regno = first_dest_regno;
>+
>+  src_addr = XEXP (SET_SRC (XVECEXP (op, 0, i - 1)), 0);
>+
>+  if (GET_CODE (src_addr) == PLUS)
>+{
>+  if (!CONST_INT_P (XEXP (src_addr, 1)))
>+  return false;

Watch out for the indentation of the return.

<...snip>
>+)
>+
>+(define_insn "*floating_point_pop_multiple_with_stack_update"

s/floating_point/vfp

>+  [(match_parallel 0 "load_multiple_operation_stack_fp"
>+[(set (match_operand:SI 1 "s_register_operand" "=k")
>+  (plus:SI (match_operand:SI 2 "s_register_operand" "1")
>+   (match_operand:SI 3 "const_int_operand" "I")))
>+ (set (match_operand:DF 4 "arm_hard_register_operand" "")
>+  (mem:DF (match_dup 2)))])]
>+  "TARGET_THUMB2"

&& TARGET_HARD_FLOAT && TARGET_VFP

>+  "*
>+  {
>+int num_regs = XVECLEN (operands[0], 0);
>+static const struct { const char *const name; } table[]
>+  = { {\"d0\"}, {\"d1\"}, {\"d2\"}, {\"d3\"},
>+  {\"d4\"}, {\"d5\"}, {\"d6\"}, {\"d7\"},
>+  {\"d8\"}, {\"d9\"}, {\"d10\"}, {\"d11\"},
>+  {\"d12\"}, {\"d13\"}, {\"d14\"}, {\"d15\"},
>+  {\"d16\"}, {\"d17\"}, {\"d18\"}, {\"d19\"},
>+  {\"d20\"}, {\"d21\"}, {\"d22\"}, {\"d23\"},
>+  {\"d24\"}, {\"d25\"}, {\"d26\"}, {\"d27\"},
>+  {\"d28\"}, {\"d29\"}, {\"d30\"}, {\"d31\"} };


>+int i;
>+char pattern[100];
>+strcpy (pattern, \"fldmfdd\\t\");
>+strcat (pattern,
>+reg_names[REGNO (SET_DEST (XVECEXP (operands[0], 0, 
>0)))]);
>+strcat (pattern, \"!, {\");
>+strcat (pattern, table[(REGNO (XEXP (XVECEXP (operands[0], 0, 1), 0))
>+   - FIRST_VFP_REGNUM) / 2].name);

Can't you reuse names from arm.h and avoid the table here ?


>+for (i = 2; i < num_regs; i++)
>+  {
>+s

Re: [RFA/ARM][Patch 01/05]: Create tune for Cortex-A15.

2011-10-21 Thread Ramana Radhakrishnan
> 2011-10-11  Sameera Deshpande
> 
>
>        * config/arm/arm-cores.def (cortex_a15): Update.
>        * config/arm/arm-protos.h (struct tune_params): Add new field...
>          (arm_gen_ldrd_strd): ... this.
>        * config/arm/arm.c (arm_slowmul_tune): Add
>          arm_gen_ldrd_strd field settings.
>          (arm_fastmul_tune): Likewise.
>          (arm_strongarm_tune): Likewise.
>          (arm_xscale_tune): Likewise.
>          (arm_9e_tune): Likewise.
>          (arm_v6t2_tune): Likewise.
>          (arm_cortex_tune): Likewise.
>          (arm_cortex_a5_tune): Likewise.
>          (arm_cortex_a9_tune): Likewise.
>          (arm_fa726te_tune): Likewise.
>          (arm_cortex_a15_tune): New variable.
> --

OK.

Ramana

>
>


Re: [RFA/ARM][Patch 03/05]: STRD generation instead of PUSH in A15 Thumb2 prologue.

2011-10-21 Thread Ramana Radhakrishnan
On 11 October 2011 10:27, Sameera Deshpande  wrote:
> Hi!
>
> This patch generates STRD instruction instead of PUSH in thumb2 mode for
> A15.
>
> For optimize_size, original prologue is generated for A15.
> The work involves defining new functions, predicates and patterns.
>


> +/* Generate and emit a pattern that will be recognized as STRD pattern.  If 
> even
> +   number of registers are being pushed, multiple STRD patterns are created 
> for
> +   all register pairs.  If odd number of registers are pushed, first 
> register is

numchar > 80

> +   stored by using STR pattern.  */

s/stored/Stored.
A better comment would be

"Emit a combination of strd and str's for the prologue saves.  "

> +static void
> +thumb2_emit_strd_push (unsigned long saved_regs_mask)
> +{
> +  int num_regs = 0;
> +  int i, j;
> +  rtx par = NULL_RTX;
> +  rtx insn = NULL_RTX;
> +  rtx dwarf = NULL_RTX;
> +  rtx tmp, reg, tmp1;
> +
> +  for (i = 0; i <= LAST_ARM_REGNUM; i++)
> +if (saved_regs_mask & (1 << i))
> +  num_regs++;
> +
> +  gcc_assert (num_regs && num_regs <= 16);
> +
> +  /* Pre-decrement the stack pointer, based on there being num_regs 4-byte
> + registers to push.  */
> +  tmp = gen_rtx_SET (VOIDmode,
> + stack_pointer_rtx,
> + plus_constant (stack_pointer_rtx, -4 * num_regs));
> +  RTX_FRAME_RELATED_P (tmp) = 1;
> +  insn = emit_insn (tmp);
> +
> +  /* Create sequence for DWARF info.  */
> +  dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (num_regs + 1));
> +
> +  /* RTLs cannot be shared, hence create new copy for dwarf.  */
> +  tmp1 = gen_rtx_SET (VOIDmode,
> + stack_pointer_rtx,
> + plus_constant (stack_pointer_rtx, -4 * num_regs));
> +  RTX_FRAME_RELATED_P (tmp1) = 1;
> +  XVECEXP (dwarf, 0, 0) = tmp1;
> +
> +  for (i = num_regs - 1, j = LAST_ARM_REGNUM; i >= (num_regs % 2); j--)
> +/* Var j iterates over all the registers to gather all the registers in
> +   saved_regs_mask.  Var i gives index of register R_j in stack frame.
> +   A PARALLEL RTX of register-pair is created here, so that pattern for
> +   STRD can be matched.  If num_regs is odd, 1st register will be pushed
> +   using STR and remaining registers will be pushed with STRD in pairs.
> +   If num_regs is even, all registers are pushed with STRD in pairs.
> +   Hence, skip first element for odd num_regs.  */

Comment before the loop please.

> +if (saved_regs_mask & (1 << j))
> +  {
> +gcc_assert (j != SP_REGNUM);
> +gcc_assert (j != PC_REGNUM);
> +
> +/* Create RTX for store.  New RTX is created for dwarf as
> +   they are not sharable.  */
> +reg = gen_rtx_REG (SImode, j);
> +tmp = gen_rtx_SET (SImode,
> +   gen_frame_mem
> +   (SImode,
> +plus_constant (stack_pointer_rtx, 4 * i)),
> +   reg);
> +
> +tmp1 = gen_rtx_SET (SImode,
> +   gen_frame_mem
> +   (SImode,
> +plus_constant (stack_pointer_rtx, 4 * i)),
> +   reg);
> +RTX_FRAME_RELATED_P (tmp) = 1;
> +RTX_FRAME_RELATED_P (tmp1) = 1;
> +
> +if (((i - (num_regs % 2)) % 2) == 1)
> +  /* When (i - (num_regs % 2)) is odd, the RTX to be emitted is yet 
> to
> + be created.  Hence create it first.  The STRD pattern we are
> + generating is :
> + [ (SET (MEM (PLUS (SP) (NUM))) (reg_t1))
> +   (SET (MEM (PLUS (SP) (NUM + 4))) (reg_t2)) ]
> + were target registers need not be consecutive.  */
> +  par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
> +
> +/* Register R_j is added in PARALLEL RTX.  If (i - (num_regs % 2)) is
> +   even, the reg_j is added as 0th element and if it is odd, reg_i is
> +   added as 1st element of STRD pattern shown above.  */
> +XVECEXP (par, 0, ((i - (num_regs % 2)) % 2)) = tmp;
> +XVECEXP (dwarf, 0, (i + 1)) = tmp1;
> +
> +if (((i - (num_regs % 2)) % 2) == 0)
> +  /* When (i - (num_regs % 2)) is even, RTXs for both the registers
> + to be loaded are generated in above given STRD pattern, and the
> + pattern can be emitted now.  */
> +  emit_insn (par);
> +
> +i--;
> +  }
> +
> +  if ((num_regs % 2) == 1)
> +{
> +  /* If odd number of registers are pushed, generate STR pattern to store
> + lone register.  */
> +  for (; (saved_regs_mask & (1 << j)) == 0; j--);
> +
> +  tmp1 = gen_frame_mem (SImode, plus_constant (stack_pointer_rtx, 4 * 
> i));
> +  reg = gen_rtx_REG (SImode, j);
> +  tmp = gen_rtx_SET (SImode, tmp1, reg);
> +  RTX_FRAME_RELATED_P (tmp) = 1;
> +
> +  emit_insn (tmp);
> +
> +  tmp1 = gen_rtx_SET (SImode,
> + gen_f

Re: [RFA/ARM][Patch 04/05]: STRD generation instead of PUSH in A15 ARM prologue.

2011-10-21 Thread Ramana Radhakrishnan
>+/* STRD in ARM mode needs consecutive registers to be stored.  This function
>+   keeps accumulating non-consecutive registers until first consecutive 
>register

numchar > 80.

>+   pair is found.  It then generates multi-reg PUSH for all accumulated
>+   registers, and then generates STRD with write-back for consecutive register
>+   pair.  This process is repeated until all the registers are stored on 
>stack.

And again.

>+   multi-reg PUSH takes care of lone registers as well.  */

s/multi-reg/Multi register

>+static void
>+arm_emit_strd_push (unsigned long saved_regs_mask)

How different is this from the thumb2 version you sent out in Patch 03/05 ?

>+{
>+  int num_regs = 0;
>+  int i, j;
>+  rtx par = NULL_RTX;
>+  rtx dwarf = NULL_RTX;
>+  rtx insn = NULL_RTX;
>+  rtx tmp, tmp1;
>+  unsigned long regs_to_be_pushed_mask;
>+
>+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
>+if (saved_regs_mask & (1 << i))
>+  num_regs++;
>+
>+  gcc_assert (num_regs && num_regs <= 16);
>+
>+  for (i=0, j = LAST_ARM_REGNUM, regs_to_be_pushed_mask = 0; i < num_regs; 
>j--)
>+/* Var j iterates over all registers to gather all registers in
>+   saved_regs_mask.  Var i is used to count number of registers stored on
>+   stack.  regs_to_be_pushed_mask accumulates non-consecutive registers
>+   that can be pushed using multi-reg PUSH before STRD is generated.  */

Comment above loop.

<...snip...>

>@@ -15958,7 +16081,8 @@ arm_get_frame_offsets (void)
>use 32-bit push/pop instructions.  */
> if (! any_sibcall_uses_r3 ()
> && arm_size_return_regs () <= 12
>-&& (offsets->saved_regs_mask & (1 << 3)) == 0)
>+&& (offsets->saved_regs_mask & (1 << 3)) == 0
>+  && (TARGET_THUMB2 || !current_tune->prefer_ldrd_strd))

Not sure I completely follow this change yet.

>@@ -16427,9 +16551,12 @@ arm_expand_prologue (void)
>   }
>   }
>
>-  if (TARGET_THUMB2 && current_tune->prefer_ldrd_strd && !optimize_size)
>+  if (current_tune->prefer_ldrd_strd && !optimize_size)

s/optimize_size/optimize_function_for_size ()

> {
>-  thumb2_emit_strd_push (live_regs_mask);
>+  if (TARGET_THUMB2)
>+thumb2_emit_strd_push (live_regs_mask);
>+  else
>+arm_emit_strd_push (live_regs_mask);
> }
>   else
> {
>diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md
>index e3dcd4f..3c729bb 100644
>--- a/gcc/config/arm/ldmstm.md
>+++ b/gcc/config/arm/ldmstm.md
>@@ -73,6 +73,42 @@
>   [(set_attr "type" "store2")
>(set_attr "predicable" "yes")])
>
>+(define_insn "*arm_strd_base"
>+  [(set (match_operand:SI 0 "arm_hard_register_operand" "+rk")
>+(plus:SI (match_dup 0)
>+ (const_int -8)))
>+   (set (mem:SI (match_dup 0))
>+(match_operand:SI 1 "arm_hard_register_operand" "r"))
>+   (set (mem:SI (plus:SI (match_dup 0)
>+ (const_int 4)))
>+(match_operand:SI 2 "arm_hard_register_operand" "r"))]
>+  "(TARGET_ARM && current_tune->prefer_ldrd_strd
>+ && (!bad_reg_pair_for_arm_ldrd_strd (operands[1], operands[2]))
>+ && (REGNO (operands[1]) != REGNO (operands[0]))
>+ && (REGNO (operands[2]) != REGNO (operands[0])))"
>+  "str%(d%)\t%1, %2, [%0, #-8]!"
>+  [(set_attr "type" "store2")
>+   (set_attr "predicable" "yes")])


Hmmm the question remains if we want to put these into ldmstm.md since
it was theoretically
auto-generated from ldmstm.ml. If this has to be marked to be separate
then I'd like
to regenerate ldmstm.md from ldmstm.ml and differentiate between the
bits that can be auto-generated
and the bits that have been added since.

Otherwise OK.

Ramana


Re: [RFA/ARM][Patch 05/05]: LDRD generation instead of POP in A15 ARM epilogue.

2011-10-21 Thread Ramana Radhakrishnan
> 2011-10-11  Sameera Deshpande
> 
>
>        * config/arm/arm.c (arm_emit_ldrd_pop): New static function.
>          (arm_expand_epilogue): Update.
>        * config/arm/ldmstm.md (arm_ldrd_base): New pattern.
>          (arm_ldr_with_update): Likewise.

rth's comment about REG_CFA_RESTORE applies here as well. Please
change that. Other than that this patch looks OK and please watch out
for stylistic issues from the previous patch.

Ramana

> --
>
>
>


Re: [RFC PATCH] SLP vectorize calls

2011-10-21 Thread Jakub Jelinek
On Fri, Oct 21, 2011 at 02:37:06PM +0200, Ira Rosen wrote:
> > @@ -1620,7 +1615,13 @@ vectorizable_call (gimple stmt, gimple_s
> >
> >   gcc_assert (!gimple_vuse (stmt));
> >
> > -  if (modifier == NARROW)
> > +  if (slp_node || PURE_SLP_STMT (stmt_info))
> > +    {
> > +      if (modifier != NONE)
> > +       return false;
> > +      ncopies = 1;
> > +    }
> 
> If you want to bail out if it's SLP and modifier != NONE, this check
> is not enough. PURE_SLP means the stmt is not used outside the SLP
> instance, so for hybrid SLP stmts (those that have uses outside SLP)
> this check will not work. You need
> 
>   if (modifier != NONE && STMT_SLP_TYPE (stmt_info))
>  return false;

I just blindly copied what vectorizable_operation does, without
too much understanding what PURE_SLP_STMT or STMT_SLP_TYPE etc. mean.
Didn't get that far.
But modifier != NONE && something would sometimes allow modifier != NONE
through, which at least the current code isn't prepared to handle.
Did you mean || instead?

> But I wonder why not allow different type sizes? I see that we fail in
> such cases in vectorizable_conversion too, but I think we should
> support this as well.

Merely because I don't know SLP enough, vectorizable_operation also
handles just same size to same size, so I didn't have good examples
on how to do it.  For loops narrowing or widening operations are
handled through ncopies != 1, but for SLP it seems it is always
asserted it is 1...

> No need in \n.

Ok.

> >   for (i = 0; i < number_of_oprnds; i++)
> >     {
> > -      oprnd = gimple_op (stmt, i + 1);
> > +      if (is_gimple_call (stmt))
> > +       oprnd = gimple_call_arg (stmt, i);
> > +      else
> > +       oprnd = gimple_op (stmt, i + 1);
> >
> >       if (!vect_is_simple_use (oprnd, loop_vinfo, bb_vinfo, &def_stmt, 
> > &def[i],
> >                                &dt[i])
> 
> I think you forgot to check that all the calls are to the same function.

Right, that is easy to add, but modifier != NONE is something I have no idea
how to do currently.

Jakub


Re: [i386, PR50740] CPUID leaf 7 for BMI/BMI2/AVX2 feature detection not qualified with max_level and doesn't use subleaf

2011-10-21 Thread Uros Bizjak
On Fri, Oct 21, 2011 at 12:46 PM, Kirill Yukhin  wrote:

> Here is the patch which checks CPUID correctly to get BMI/BMI2/AVX2 feature.
>
> ChangeLog entry is:
> 2011-10-21  H.J. Lu  
>            Kirill Yukhin  
>
>        * config/i386/driver-i386.c (host_detect_local_cpu): Do cpuid 7 only
>        if max_level allows that.
>
> testsuite/ChangeLg entry is
> 2011-10-21  H.J. Lu  
>            Kirill Yukhin  
>
>        * gcc.target/i386/avx2-check.h (main): Check CPU level
>        correctly.

Check CPUID level ...

>        * gcc.target/i386/bmi2-check.h: Ditto.
>
> Bootstrap has passed.

OK.

Thanks,
Uros.


Re: [PATCHv2, RFA] Pass address space to REGNO_MODE_CODE_OK_FOR_BASE_P

2011-10-21 Thread Ulrich Weigand
Georg-Johann Lay wrote:

> Does it make sense to extend LEGITIMIZE_RELOAD_ADDRESS, too?
> 
> For the target that needs your extension (AVR) there are different addressing
> capabilities depending on AS and there is an implementation of L_R_A.

I'd say that it is an independent issue.  LEGITIMIZE_RELOAD_ADDRESS is
supposed to be used for performance improvements only; it should never be
necessary for correctness.  With the current code, the target just doesn't
get the opportunity to optimize address reloads for non-generic address
spaces -- so far, this has never come up as a requirement.

If you'd like to add such optimizations for AVR, it would be straightforward
to pass the address space argument to LEGITIMIZE_RELOAD_ADDRESS.  But that
can certainly be done as an independent patch.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


strlen optimization of decompose strcat?!

2011-10-21 Thread Andreas Krebbel
Hi,

on s390 a strcat is already decomposed by fold_builtin_strcat into a
strlen and a strcpy.  Due to that 3 strlenopt testcases currently
fail: strlenopt-17g.c, strlenopt-4.c, strlenopt-4g.c.
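
In other words, on such targets a call like strcat (dst, src) reaches the
strlen pass roughly in this already-decomposed shape (illustrative sketch,
not the exact gimple; the function name is made up):

#include <string.h>

char *
cat (char *dst, const char *src)
{
  strcpy (dst + strlen (dst), src);
  return dst;
}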

For strlenopt-4.c no optimization is expected anyway (stpcpy
disabled). So this can easily be fixed by adding s390 specific scan
values.

The other two probably could be fixed by modifying the strlen
optimization to be able to reconstruct the strcat semantics.

The attached quick hack fixes strlenopt-17g.c for s390 without
regressions on x86_64.

For strlenopt-4g.c I could only get it working by setting (too) many
dont_invalidate flags on strinfos.  Otherwise the information needed
to reconstruct the strcat semantics doesn't survive long enough.

Perhaps we can remove the strcat folding relying on the strlen
optimization pass to catch all relevant cases?

Bye,

-Andreas-

Index: gcc/tree-ssa-strlen.c
===
*** gcc/tree-ssa-strlen.c.orig
--- gcc/tree-ssa-strlen.c
*** get_string_length (strinfo si)
*** 397,403 
callee = gimple_call_fndecl (stmt);
gcc_assert (callee && DECL_BUILT_IN_CLASS (callee) == BUILT_IN_NORMAL);
lhs = gimple_call_lhs (stmt);
!   gcc_assert (builtin_decl_implicit_p (BUILT_IN_STRCPY));
/* unshare_strinfo is intentionally not called here.  The (delayed)
 transformation of strcpy or strcat into stpcpy is done at the place
 of the former strcpy/strcat call and so can affect all the strinfos
--- 397,403 
callee = gimple_call_fndecl (stmt);
gcc_assert (callee && DECL_BUILT_IN_CLASS (callee) == BUILT_IN_NORMAL);
lhs = gimple_call_lhs (stmt);
!   gcc_assert (builtin_decl_implicit_p (BUILT_IN_STPCPY));
/* unshare_strinfo is intentionally not called here.  The (delayed)
 transformation of strcpy or strcat into stpcpy is done at the place
 of the former strcpy/strcat call and so can affect all the strinfos
*** handle_builtin_strcpy (enum built_in_fun
*** 1115,1127 
dsi->writable = true;
dsi->dont_invalidate = true;
  
!   if (dsi->length == NULL_TREE)
  {
/* If string length of src is unknown, use delayed length
 computation.  If string lenth of dst will be needed, it
 can be computed by transforming this strcpy call into
 stpcpy and subtracting dst from the return value.  */
!   dsi->stmt = stmt;
return;
  }
  
--- 1115,1144 
dsi->writable = true;
dsi->dont_invalidate = true;
  
!   if (builtin_decl_implicit_p (BUILT_IN_STPCPY) && dsi->length == NULL_TREE)
  {
+   strinfo psi;
+ 
/* If string length of src is unknown, use delayed length
 computation.  If string lenth of dst will be needed, it
 can be computed by transforming this strcpy call into
 stpcpy and subtracting dst from the return value.  */
!   psi = get_strinfo (dsi->prev);
!   if (psi
! && psi->next == dsi->idx
! && psi->first == dsi->first
! && psi->endptr == dsi->ptr)
!   {
! strinfo npsi = unshare_strinfo (psi);
! npsi->stmt = stmt;
! npsi->next = 0;
! npsi->length = NULL_TREE;
! npsi->endptr = NULL_TREE;
! npsi->dont_invalidate = true;
!   }
!   else
!   dsi->stmt = stmt;
! 
return;
  }
  


Re: [Patch,AVR]: Use EIND consistently

2011-10-21 Thread Denis Chertykov
2011/10/21 Georg-Johann Lay :
>>> This patch adds support to consistently use EIND.
>>>
>>> The compiler never sets this SFR but uses it in table jumps and EIJMP/EICALL
>>> instructions.
>>>
>>> Custom startup code could set EIND to an other value than 0 and the compiler
>>> should use EIND consistently given that EIND might not be zero.
>>>
>>> EIND != 0 will need support of custom linker script to locate jump pads in 
>>> an
>>> other segment, but that's a different story.
>>>
>>> The patch undoes some changes from r179760 and introduces using EIND the 
>>> other
>>> way round: Not trying to avoid EIND altogether and assume code is supposed 
>>> to
>>> work in the lower segment only, but instead use EIND and not zero-reg when
>>> simulating indirect jump by means of RET.
>>>
>>> With this patch, the application may set EIND to a custom value and invent 
>>> own
>>> linker script to place jump pads.  The assertion is that EIND never changes
>>> throughout the application and therefore ISR prologue/epilogue need not 
>>> care.
>>>
>>> With the gs() magic, code using indirect jumps works fine with that, e.g.
>>> - Indirect calls
>>> - Computed goto
>>> - Jumping to 1: in prologue_saves
>>>
>>> What does not work as expected is to jump to const_int addresses like
>>>
>>> int main()
>>> {
>>>   ((void(*)(void))0)();
>>>   return 0;
>>> }
>>>
>>> Instead, code must read
>>>
>>> extern void my_address (void);
>>>
>>> int main()
>>> {
>>>    my_address();
>>>    return 0;
>>> }
>>>
>>> and compiled with, say -Wl,--defsym,my_address=0x2, so that a jump pad 
>>> is
>>> generated.
>>>
>>> Patch ok for trunk?
>>>
>>> Johann
>>>
>>>        * config/avr/libgcc.S (__EIND__): New define to 0x3C.
>>>        (__tablejump__): Consistently use EIND for indirect jump/call.
>>>        (__tablejump_elpm__): Ditto.
>>
>> Approved.
>>
>> Denis.
>
> Is this a thing to back port?
>

As you please.

Denis.


Re: [C++ Patch] PR 50811 (rejects class-virt-specifier if class-head-name includes nested-name-specifier)

2011-10-21 Thread Jason Merrill

Applied, thanks.

Jason


Re: [RFC PATCH] SLP vectorize calls

2011-10-21 Thread Ira Rosen
On 21 October 2011 14:52, Jakub Jelinek  wrote:
> On Fri, Oct 21, 2011 at 02:37:06PM +0200, Ira Rosen wrote:
>> > @@ -1620,7 +1615,13 @@ vectorizable_call (gimple stmt, gimple_s
>> >
>> >   gcc_assert (!gimple_vuse (stmt));
>> >
>> > -  if (modifier == NARROW)
>> > +  if (slp_node || PURE_SLP_STMT (stmt_info))
>> > +    {
>> > +      if (modifier != NONE)
>> > +       return false;
>> > +      ncopies = 1;
>> > +    }
>>
>> If you want to bail out if it's SLP and modifier != NONE, this check
>> is not enough. PURE_SLP means the stmt is not used outside the SLP
>> instance, so for hybrid SLP stmts (those that have uses outside SLP)
>> this check will not work. You need
>>
>>   if (modifier != NONE && STMT_SLP_TYPE (stmt_info))
>>      return false;
>
> I just blindly copied what vectorizable_operation does, without
> too much understanding what PURE_SLP_STMT or STMT_SLP_TYPE etc. mean.
> Didn't get that far.
> But modifier != NONE && something would sometimes allow modifier != NONE
> through, which at least the current code isn't prepared to handle.
> Did you mean || instead?

But it's OK to allow modifier != NONE if it's not SLP, so we need &&, no?
Something like:

if (modifier != NONE && STMT_SLP_TYPE (stmt_info))
   return false;

if (slp_node || PURE_SLP_STMT (stmt_info))
   ncopies = 1;
else if (modifier == NARROW)
   ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_out;
else
   ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_in;

>
>> But I wonder why not allow different type sizes? I see that we fail in
>> such cases in vectorizable_conversion too, but I think we should
>> support this as well.
>
> Merely because I don't know SLP enough, vectorizable_operation also
> handles just same size to same size, so I didn't have good examples
> on how to do it.  For loops narrowing or widening operations are
> handled through ncopies != 1, but for SLP it seems it is always
> asserted it is 1...

There are vectorizable_type_promotion/demotion, and for the rest the
copies are "hidden" inside multiple vector operands that you get from
vect_get_vec_defs. But, of course, there is no need to handle
modifier == NARROW for SLP at the moment. I was just wondering out
loud.

Ira

>
>        Jakub
>


Re: [C++ Patch] __builtin_choose_expr *bump*

2011-10-21 Thread Andy Gibbs
On Friday, October 21, 2011 12:04 PM, Richard Guenther wrote:
> What's the motivation for this?  Why can't it be implemented using C++
> features such as templates and specialization?

Sorry for the delay in my reply.

The motivation, simply put, is to be able to create both templates and 
macros that choose between two or more courses of action at compile time and 
do not generate code for the unchosen action(s), nor (importantly) cause 
compiler errors if the unchosen action(s) should be ill-formed.  The end aim 
is in the generation of source code libraries for compile-time evaluation.

An example in the template domain would be having template specialisation 
without actually needing to specialise the template.  Ordinarily, 
specialising a template requires a copy to be made and modified according to 
the specialisation.  If the "base" template is changed, the specialisation 
also needs to be so adapted.  Missing one specialisation in a long list 
leads to unpredictable problems (yes, there is bitter experience here!).  If 
the majority of code is identical then it makes sense to try to either 
factor out the identical code and use some meta-template If struct or other 
such magic to limit the scope of the template specialisation.  This can 
become brain-bending if not impossible if, for example, any of the relevant 
expressions cannot be parameters into a meta-template construct.  It can 
also make the code very difficult to follow and debug.  In these cases, 
__builtin_choose_expr can be beneficial to avoid the need to specialise, 
while keeping the clarity and breadth of the template's intended function.

When it comes to macros, the use of __builtin_choose_expr can be very useful 
where one of the expressions would cause a compiler error.  An example of 
such a macro is as follows:

#define CONST(v) \
   __builtin_choose_expr(util::can_use_const_wrapper::value, \
   (util::const_wrapper::value) : \
   ([&](){ constexpr auto o = (v); return o; })())

namespace util
 {
 template 
 struct const_wrapper
  { static constexpr T value = t; };

 template 
 struct can_use_const_wrapper; /* code removed to keep example short! */
 /* value = true for types that can be used in template value parameters */
 }

This is a part of a larger library of code for constexpr functions, a 
majority of which implement algorithms in such a way that the compiler can 
evaluate them but which if evaluated at run-time would have a huge 
performance hit (e.g. CRC calculation on a string).  In this library, a 
second macro FORCE_CONSTEXPR is used to ensure these functions cannot be 
called at run-time or even inlined, but only evaluated at compile-time.  The 
CONST macro is then used, among other things, to hint to the compiler that 
it needs to evaluate at compile-time.

If __builtin_choose_expr in the CONST macro was replaced with "? :" then the 
code using it would fail to compile in certain situations, notably because 
C++ only permits certain types as template value parameters.  However, it is 
not sufficient either to simply use the lambda expression alone since this 
fails with compiler errors in other situations.  This is, I think, one 
example where it is not possible to use templates to get round the 
problem?...

There is a further use of __builtin_choose_expr in this constexpr library, 
which is similar to its use in C, which is to enable a macro function to 
wrap and choose between a compile-time version of the algorithm and a 
run-time version of the algorithm, for example:

#define CrcString(str) (__builtin_choose_expr(__builtin_constant_p(str), \
   CONST(CrcString_compiler(str)),   \
   CrcString_runtime(str)))

auto string = CrcString("test");

This might of course be implemented using "? :", but only if 
CrcString_compiler(...) and CrcString_runtime(...) return compatible types, 
which need not be so with __builtin_choose_expr.
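
A minimal sketch of that last point, in GNU C (this example is not from the
original mail; PICK, pick_wide and pick_int are made-up names): with a plain
conditional expression both operands would have to be converted to a common
type, while __builtin_choose_expr gives the whole expression the type of the
chosen operand only.

struct wide { unsigned long long hi, lo; };

static struct wide pick_wide (void) { struct wide w = { 0, 42 }; return w; }
static int pick_int (void) { return 7; }

#define PICK(use_wide) \
  __builtin_choose_expr ((use_wide), pick_wide (), pick_int ())

int use_pick (void)
{
  /* PICK (0) has type int; the unchosen struct-returning operand never
     has to be convertible to int, which "? :" would require.  */
  return PICK (0) + 1;
}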

I hope this gives you some feeling for how __builtin_choose_expr may be 
used, or for that matter how I use it.  I expect that others will find much 
more impressive uses!

Thanks

Andy





Re: [i386, PR50740] CPUID leaf 7 for BMI/BMI2/AVX2 feature detection not qualified with max_level and doesn't use subleaf

2011-10-21 Thread Kirill Yukhin
Thanks,

Updated testsuite/ChangeLog:
2011-10-21  H.J. Lu  
Kirill Yukhin  

* gcc.target/i386/avx2-check.h (main): Check CPUID level
correctly.
* gcc.target/i386/bmi2-check.h: Ditto.


Could anybody please commit that?

K

On Fri, Oct 21, 2011 at 4:56 PM, Uros Bizjak  wrote:
> On Fri, Oct 21, 2011 at 12:46 PM, Kirill Yukhin  
> wrote:
>
>> Here is the patch which checks CPUID correctly to get BMI/BMI2/AVX2 feature.
>>
>> ChangeLog entry is:
>> 2011-10-21  H.J. Lu  
>>            Kirill Yukhin  
>>
>>        * config/i386/driver-i386.c (host_detect_local_cpu): Do cpuid 7 only
>>        if max_level allows that.
>>
>> testsuite/ChangeLog entry is
>> 2011-10-21  H.J. Lu  
>>            Kirill Yukhin  
>>
>>        * gcc.target/i386/avx2-check.h (main): Check CPU level
>>        correctly.
>
> Check CPUID level ...
>
>>        * gcc.target/i386/bmi2-check.h: Ditto.
>>
>> Bootstrap has passed.
>
> OK.
>
> Thanks,
> Uros.
>


Re: [C++ Patch] PR 30066

2011-10-21 Thread Jason Merrill

Need to make sure that this comment is still accurate:


  /* Local statics and classes get the visibility of their
 containing function by default, except that
 -fvisibility-inlines-hidden doesn't affect them.  */


i.e. given

inline int * f() { static int i; return &i; }

int main()
{
  f();
}

f()::i should not be hidden, nor should a local class.

Jason


Re: strlen optimization of decompose strcat?!

2011-10-21 Thread Jakub Jelinek
On Fri, Oct 21, 2011 at 03:20:54PM +0200, Andreas Krebbel wrote:
> on s390 a strcat is already decomposed by fold_builtin_strcat into a
> strlen and a strcpy.  Due to that 3 strlenopt testcases currently
> fail: strlenopt-17g.c, strlenopt-4.c, strlenopt-4g.c.

Well, fold_builtin_strcat does such kind of optimization on all targets,
but only for known lengths of the second argument.

> Perhaps we can remove the strcat folding relying on the strlen
> optimization pass to catch all relevant cases?

I think that would be the best thing to do, nuke the
if (optimize_insn_for_speed_p ()) part of fold_builtin_strcat and
add it instead into handle_builtin_strcat if we'd keep it as strcat
after that pass.  That said, your change makes sense and the nuking
of that folding can be done separately.
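
For reference, a source-level sketch of the decomposition being discussed
(illustrative only; the real transformation works on trees inside
fold_builtin_strcat):

#include <string.h>

/* When the length of the second argument is known, a call like  */
void append_1 (char *dst) { strcat (dst, "abc"); }
/* is decomposed into roughly  */
void append_2 (char *dst) { strcpy (dst + strlen (dst), "abc"); }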

> *** gcc/tree-ssa-strlen.c.orig
> --- gcc/tree-ssa-strlen.c
> *** get_string_length (strinfo si)
> *** 397,403 
> callee = gimple_call_fndecl (stmt);
> gcc_assert (callee && DECL_BUILT_IN_CLASS (callee) == 
> BUILT_IN_NORMAL);
> lhs = gimple_call_lhs (stmt);
> !   gcc_assert (builtin_decl_implicit_p (BUILT_IN_STRCPY));
> /* unshare_strinfo is intentionally not called here.  The (delayed)
>transformation of strcpy or strcat into stpcpy is done at the place
>of the former strcpy/strcat call and so can affect all the strinfos
> --- 397,403 
> callee = gimple_call_fndecl (stmt);
> gcc_assert (callee && DECL_BUILT_IN_CLASS (callee) == 
> BUILT_IN_NORMAL);
> lhs = gimple_call_lhs (stmt);
> !   gcc_assert (builtin_decl_implicit_p (BUILT_IN_STPCPY));
> /* unshare_strinfo is intentionally not called here.  The (delayed)
>transformation of strcpy or strcat into stpcpy is done at the place
>of the former strcpy/strcat call and so can affect all the strinfos

The above hunk is correct.

> *** handle_builtin_strcpy (enum built_in_fun
> *** 1115,1127 
> dsi->writable = true;
> dsi->dont_invalidate = true;
>   
> !   if (dsi->length == NULL_TREE)
>   {
> /* If string length of src is unknown, use delayed length
>computation.  If string lenth of dst will be needed, it
>can be computed by transforming this strcpy call into
>stpcpy and subtracting dst from the return value.  */
> !   dsi->stmt = stmt;
> return;
>   }
>   
> --- 1115,1144 
> dsi->writable = true;
> dsi->dont_invalidate = true;
>   
> !   if (builtin_decl_implicit_p (BUILT_IN_STPCPY) && dsi->length == NULL_TREE)

Why this?  dsi->length will be NULL only if src length is unknown.
And at that point
  case BUILT_IN_STRCPY:
  case BUILT_IN_STRCPY_CHK:
if (lhs != NULL_TREE || !builtin_decl_implicit_p (BUILT_IN_STPCPY))
  return;
break;
should have already returned if it is not true.

>   {
> +   strinfo psi;
> + 
> /* If string length of src is unknown, use delayed length
>computation.  If string lenth of dst will be needed, it
>can be computed by transforming this strcpy call into
>stpcpy and subtracting dst from the return value.  */
> !   psi = get_strinfo (dsi->prev);

I think you should do
  if (dsi->prev != 0 && verify_related_strinfos (dsi) != NULL)
{
  psi = get_strinfo (dsi->prev);
  if (psi->endptr == dsi->ptr)
...
}

> !   strinfo npsi = unshare_strinfo (psi);

No need to add a new pointer for that.  Just do psi = unshare_strinfo (psi);

> !   npsi->stmt = stmt;
> !   npsi->next = 0;
> !   npsi->length = NULL_TREE;
> !   npsi->endptr = NULL_TREE;
> !   npsi->dont_invalidate = true;
> ! }
> !   else
> ! dsi->stmt = stmt;

Do you really need to clear npsi->next and not set dsi->stmt
if you tweak npsi->stmt?  I mean, if you have:
#define _GNU_SOURCE /* to make stpcpy prototype visible */
#include 

size_t
foo (char *p, char *q)
{
  size_t l1, l2;
  char *r = strchr (p, '\0');
  strcpy (r, q);
  l1 = strlen (p);
  l2 = strlen (r); /* Or with swapped order.  */
  return l1 + l2;
}

you can compute either l1, or l2, or both using get_string_length
strcpy -> stpcpy transformation (and in any order).
Perhaps we could avoid invalidating not just the previous one,
but also all earlier ones with endptr == dsi->ptr, say if you have

size_t
bar (char *p, char *q)
{
  size_t l1, l2, l3;
  char *r = strchr (p, '\0');
  strcpy (r, "abcde");
  char *s = strchr (r, '\0');
  strcpy (s, q);
  l1 = strlen (p);
  l2 = strlen (r);
  l3 = strlen (s);
  return l1 + l2 + l3;
}

No matter what, it needs to have sufficient testsuite coverage added,
using something that will not depend on the HAVE_movstr strcat
folding (i.e. strchr + strcpy (or strlen + strcpy)).

Jakub


Re: [RFC PATCH] SLP vectorize calls

2011-10-21 Thread Jakub Jelinek
On Fri, Oct 21, 2011 at 03:44:11PM +0200, Ira Rosen wrote:
> But it's OK to allow modifier != NONE if it's not SLP, so we need &&, no?

Well, in my patch that check was guarded by the if (slp_node ...),
so presumably it would allow modifier == NARROW vectorization in the loops
(otherwise some testcases would fail I'd hope).

Is gcc_assert ((slp_node != NULL) == (STMT_SLP_TYPE (stmt_info) != 0));
always true?  If not, when it is not?  When would be slp_node == NULL
and PURE_SLP_STMT true?

Jakub


Re: [Patch,AVR]: Use EIND consistently

2011-10-21 Thread Georg-Johann Lay
 This patch adds support to consistently use EIND.

 The compiler never sets this SFR but uses it in table jumps and EIJMP/EICALL
 instructions.

 Custom startup code could set EIND to a value other than 0 and the compiler
 should use EIND consistently given that EIND might not be zero.

 EIND != 0 will need support of custom linker script to locate jump pads in
 another segment, but that's a different story.

 The patch undoes some changes from r179760 and introduces using EIND the
 other way round: Not trying to avoid EIND altogether and assume code is
 supposed to work in the lower segment only, but instead use EIND and not
 zero-reg when simulating indirect jump by means of RET.

 With this patch, the application may set EIND to a custom value and invent
 its own linker script to place jump pads.  The assertion is that EIND never
 changes throughout the application and therefore ISR prologue/epilogue need
 not care.

 With the gs() magic, code using indirect jumps works fine with that, e.g.
 - Indirect calls
 - Computed goto
 - Jumping to 1: in prologue_saves

 What does not work as expected is to jump to const_int addresses like

 int main()
 {
   ((void(*)(void))0)();
   return 0;
 }

 Instead, code must read

 extern void my_address (void);

 int main()
 {
my_address();
return 0;
 }

 and compiled with, say, -Wl,--defsym,my_address=0x2, so that a jump pad is
 generated.

 Patch ok for trunk?

 Johann

* config/avr/libgcc.S (__EIND__): New define to 0x3C.
(__tablejump__): Consistently use EIND for indirect jump/call.
(__tablejump_elpm__): Ditto.
>>> Approved.
>>>
>>> Denis.
>> Is this a thing to back port?
> 
> As you please.
> 
> Denis.

It's here:

http://gcc.gnu.org/viewcvs?view=revision&revision=180303

Opened PR50820 for this.

Johann



Re: [i386, PR50740] CPUID leaf 7 for BMI/BMI2/AVX2 feature detection not qualified with max_level and doesn't use subleaf

2011-10-21 Thread Uros Bizjak
On Fri, Oct 21, 2011 at 3:58 PM, Kirill Yukhin  wrote:

> Updated testsuite/ChangeLog:
> 2011-10-21  H.J. Lu  
>            Kirill Yukhin  
>
>        * gcc.target/i386/avx2-check.h (main): Check CPUID level
>        correctly.
>        * gcc.target/i386/bmi2-check.h: Ditto.
>
>
> Could anybody please commit that?

Done.

BTW: You should also refer to PR target/50740 in the ChangeLog entry,
I have added that.

Uros.


Re: [i386, PR50740] CPUID leaf 7 for BMI/BMI2/AVX2 feature detection not qualified with max_level and doesn't use subleaf

2011-10-21 Thread Kirill Yukhin
Thanks!

K

On Fri, Oct 21, 2011 at 6:34 PM, Uros Bizjak  wrote:
> On Fri, Oct 21, 2011 at 3:58 PM, Kirill Yukhin  
> wrote:
>
>> Updated testsuite/ChangeLog:
>> 2011-10-21  H.J. Lu  
>>            Kirill Yukhin  
>>
>>        * gcc.target/i386/avx2-check.h (main): Check CPUID level
>>        correctly.
>>        * gcc.target/i386/bmi2-check.h: Ditto.
>>
>>
>> Could anybody please commit that?
>
> Done.
>
> BTW: You should also refer to PR target/50740 in the ChangeLog entry,
> I have added that.
>
> Uros.
>


Re: [C++ Patch] __builtin_choose_expr *bump*

2011-10-21 Thread Joseph S. Myers
On Fri, 21 Oct 2011, Andy Gibbs wrote:

> Hi,
> 
> Please can I "bump" this patch and ask for it to be approved and committed:
> http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01711.html

Have you sent in your copyright assignment papers to the FSF?  The patch 
is large enough to need them.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [C++ Patch / RFC] PR 45385

2011-10-21 Thread Jason Merrill
I think the fix for 35602 was wrong; instead of trying to suppress the 
warning, we should avoid building expressions that trip it.  In this 
case, the problem is a type mismatch in build_vec_init between 
maxindex/iterator (ptrdiff_type_node) and array_type_nelts_total 
(sizetype).  And indeed, converting (ptrdiff_t)-1 to unsigned changes 
its sign.
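
A tiny self-contained illustration of that sign change (this example is not
part of the original mail):

#include <stdio.h>
#include <stddef.h>

int main (void)
{
  ptrdiff_t maxindex = -1;            /* the "no elements" marker */
  size_t as_sizetype = (size_t) maxindex;

  /* Prints SIZE_MAX rather than -1: the signed-to-unsigned conversion
     that trips -Wconversion.  */
  printf ("%zu\n", as_sizetype);
  return 0;
}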


I think a better fix for 35602 would be to bail out of build_vec_init
early if maxindex is -1.


Jason


Re: [C++ Patch] __builtin_choose_expr *bump*

2011-10-21 Thread Andy Gibbs
On Friday, October 21, 2011 4:42 PM, Joseph S. Myers wrote:
> On Fri, 21 Oct 2011, Andy Gibbs wrote:
>
>> Hi,
>>
>> Please can I "bump" this patch and ask for it to be approved and 
>> committed:
>> http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01711.html
>
> Have you sent in your copyright assignment papers to the FSF?  The patch
> is large enough to need them.
>

Yes, I have.

Regards,

Andy





[PR translation/47064] Fix committed as obvious

2011-10-21 Thread Paolo Carlini

Hi,

committed.

Thanks,
Paolo.

//
2011-10-21  Roland Stigge  

PR translation/47064
* params.def: Fix typo "compilatoin" -> "compilation".
Index: params.def
===
--- params.def  (revision 180288)
+++ params.def  (working copy)
@@ -111,7 +111,7 @@
 /* Limit on probability of entry BB.  */
 DEFPARAM (PARAM_COMDAT_SHARING_PROBABILITY,
  "comdat-sharing-probability",
- "Probability that COMDAT function will be shared with different 
compilatoin unit",
+ "Probability that COMDAT function will be shared with different 
compilation unit",
  20, 0, 0)
 
 /* Limit on probability of entry BB.  */


Re: [RFC PATCH] SLP vectorize calls

2011-10-21 Thread Ira Rosen
On 21 October 2011 16:25, Jakub Jelinek  wrote:
> On Fri, Oct 21, 2011 at 03:44:11PM +0200, Ira Rosen wrote:
>> But it's OK to allow modifier != NONE if it's not SLP, so we need &&, no?
>
> Well, in my patch that check was guarded by the if (slp_node ...),
> so presumably it would allow modifier == NARROW vectorization in the loops
> (otherwise some testcases would fail I'd hope).

The problem with that is that slp_node can be NULL but it can still be
an SLP stmt (as you probably have guessed judging by the following
questions ;))

>
> Is gcc_assert ((slp_node != NULL) == (STMT_SLP_TYPE (stmt_info) != 0));
> always true?

No.

> If not, when it is not?

STMT_SLP_TYPE (stmt_info) != 0 may mean HYBRID_SLP_STMT, meaning that
we are vectorizing the stmt both as SLP and as regular loop
vectorization. So in the regular loop transformation of a hybrid stmt
(STMT_SLP_TYPE (stmt_info) != 0) doesn't entail (slp_node != NULL).

The other direction is always true.
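
For illustration only (this loop is not from the thread, and whether the
vectorizer actually classifies the statement as hybrid depends on the target
and cost model): a statement whose result is stored as part of an SLP group
but also feeds a reduction is the kind of statement that gets marked hybrid.

float a[256], b[256];

float hybrid_example (void)
{
  float sum = 0.0f;
  int i;
  for (i = 0; i < 128; i++)
    {
      float t = b[2 * i] + 1.0f;
      a[2 * i] = t;                  /* store belonging to an SLP group */
      a[2 * i + 1] = b[2 * i + 1];   /* second member of the group */
      sum += t;                      /* use of t outside the SLP instance */
    }
  return sum;
}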

> When would be slp_node == NULL
> and PURE_SLP_STMT true?

In the analysis of loop SLP. In loop SLP we analyze all the stmts of
the loop in their original order (and not as in BB SLP where we just
analyze SLP nodes). A stmt can belong to more than one SLP node, and
we may also need to vectorize it in a regular loop-vectorization way
at the same time. So, during the analysis we don't have stmt's SLP
node. (Note that during the analysis we need to know ncopies only to
verify that the operation is supported and for cost estimation).
And this is another case when 'if (STMT_SLP_TYPE (stmt_info) != 0)
then (slp_node != NULL)' is false.

I hope this makes sense.
Ira

>
>        Jakub
>


Re: [cxx-mem-model] Handle x86-64 with -m32

2011-10-21 Thread H.J. Lu
On Fri, Oct 21, 2011 at 5:11 AM, Andrew MacLeod  wrote:
> On 10/20/2011 06:50 PM, H.J. Lu wrote:
>>
>> On Thu, Oct 20, 2011 at 3:38 PM, Joseph S. Myers
>>   wrote:
>>>
>>> Do these operations exist for x32 as well as for -m64?  If they do, then
>>> lp64 isn't the right test either; if not, then it is.
>>>
>> X32 has native int64 and int128.
>>
> I presume there is no atomic support for int128 though, and that's what
> 'condition check_effective_target_sync_int_128' is testing for.
>

X32 uses x86-64 instruction set with 32bit pointers.   It has the same
atomic support as x86-64 and has atomic support for int128.
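
A hedged sketch (not from the thread; it assumes a compiler configured with
x32 support and built with -mcx16): the same 16-byte __sync builtin expands
to a cmpxchg16b-based sequence whether the code is compiled for the 64-bit
ABI or for x32.

__int128 shared;

__int128 cas128 (__int128 oldval, __int128 newval)
{
  /* Requires -mcx16; behaves the same with -m64 and -mx32.  */
  return __sync_val_compare_and_swap (&shared, oldval, newval);
}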

-- 
H.J.


[Patch,AVR]: Fix thinko in LEGITIMIZE_RELOAD_ADDRESS

2011-10-21 Thread Georg-Johann Lay
This fixes avr_legitimize_reload_address:

Since breaking out the code from LEGITIMIZE_RELOAD_ADDRESS, the prototype of the above is
   avr_legitimize_reload_address (rtx x, ...
but must be
   avr_legitimize_reload_address (rtx *px, ...
because at one place &x is used as input to push_reload which is now px.

Ok to install?

Johann

* config/avr/avr.h (LEGITIMIZE_RELOAD_ADDRESS): Pass address of X
instead of X to avr_legitimize_reload_address.
* config/avr/avr-protos.h (avr_legitimize_reload_address): Change
first argument from rtx to rtx*.
* config/avr/avr.c (avr_legitimize_reload_address): Ditto.
Pass PX to push_reload instead of &X.  Change log messages for
better distinction.

Index: config/avr/avr-protos.h
===
--- config/avr/avr-protos.h	(revision 180298)
+++ config/avr/avr-protos.h	(working copy)
@@ -110,7 +110,7 @@ extern void out_shift_with_cnt (const ch
 extern reg_class_t avr_mode_code_base_reg_class (enum machine_mode, RTX_CODE, RTX_CODE);
 extern bool avr_regno_mode_code_ok_for_base_p (int, enum machine_mode, RTX_CODE, RTX_CODE);
 extern rtx avr_incoming_return_addr_rtx (void);
-extern rtx avr_legitimize_reload_address (rtx, enum machine_mode, int, int, int, int, rtx (*)(rtx,int));
+extern rtx avr_legitimize_reload_address (rtx*, enum machine_mode, int, int, int, int, rtx (*)(rtx,int));
 #endif /* RTX_CODE */
 
 #ifdef REAL_VALUE_TYPE
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 180298)
+++ config/avr/avr.c	(working copy)
@@ -1356,11 +1356,13 @@ avr_legitimize_address (rtx x, rtx oldx,
than 63 bytes or for R++ or --R addressing.  */
 
 rtx
-avr_legitimize_reload_address (rtx x, enum machine_mode mode,
+avr_legitimize_reload_address (rtx *px, enum machine_mode mode,
int opnum, int type, int addr_type,
int ind_levels ATTRIBUTE_UNUSED,
rtx (*mk_memloc)(rtx,int))
 {
+  rtx x = *px;
+  
   if (avr_log.legitimize_reload_address)
 avr_edump ("\n%?:%m %r\n", mode, x);
   
@@ -1372,7 +1374,7 @@ avr_legitimize_reload_address (rtx x, en
opnum, RELOAD_OTHER);
   
   if (avr_log.legitimize_reload_address)
-avr_edump (" RCLASS = %R\n IN = %r\n OUT = %r\n",
+avr_edump (" RCLASS.1 = %R\n IN = %r\n OUT = %r\n",
POINTER_REGS, XEXP (x, 0), XEXP (x, 0));
   
   return x;
@@ -1398,7 +1400,7 @@ avr_legitimize_reload_address (rtx x, en
1, addr_type);
   
   if (avr_log.legitimize_reload_address)
-avr_edump (" RCLASS = %R\n IN = %r\n OUT = %r\n",
+avr_edump (" RCLASS.2 = %R\n IN = %r\n OUT = %r\n",
POINTER_REGS, XEXP (mem, 0), NULL_RTX);
   
   push_reload (mem, NULL_RTX, &XEXP (x, 0), NULL,
@@ -1406,7 +1408,7 @@ avr_legitimize_reload_address (rtx x, en
opnum, type);
   
   if (avr_log.legitimize_reload_address)
-avr_edump (" RCLASS = %R\n IN = %r\n OUT = %r\n",
+avr_edump (" RCLASS.2 = %R\n IN = %r\n OUT = %r\n",
BASE_POINTER_REGS, mem, NULL_RTX);
   
   return x;
@@ -1415,12 +1417,12 @@ avr_legitimize_reload_address (rtx x, en
   else if (! (frame_pointer_needed
   && XEXP (x, 0) == frame_pointer_rtx))
 {
-  push_reload (x, NULL_RTX, &x, NULL,
+  push_reload (x, NULL_RTX, px, NULL,
POINTER_REGS, GET_MODE (x), VOIDmode, 0, 0,
opnum, type);
   
   if (avr_log.legitimize_reload_address)
-avr_edump (" RCLASS = %R\n IN = %r\n OUT = %r\n",
+avr_edump (" RCLASS.3 = %R\n IN = %r\n OUT = %r\n",
POINTER_REGS, x, NULL_RTX);
   
   return x;
Index: config/avr/avr.h
===
--- config/avr/avr.h	(revision 180298)
+++ config/avr/avr.h	(working copy)
@@ -375,7 +375,7 @@ typedef struct avr_args {
 
 #define LEGITIMIZE_RELOAD_ADDRESS(X,MODE,OPNUM,TYPE,IND_L,WIN)  \
   do {  \
-rtx new_x = avr_legitimize_reload_address (X, MODE, OPNUM, TYPE,\
+rtx new_x = avr_legitimize_reload_address (&(X), MODE, OPNUM, TYPE, \
ADDR_TYPE (TYPE),\
IND_L, make_memloc); \
 if (new_x)  \


Re: [Patch,AVR]: Fix thinko in LEGITIMIZE_RELOAD_ADDRESS

2011-10-21 Thread Denis Chertykov
2011/10/21 Georg-Johann Lay :
> This fixes avr_legitimize_reload_address:
>
> Since breaking out the code from LEGITIMIZE_RELOAD_ADDRESS, the prototype of
> the above is
>   avr_legitimize_reload_address (rtx x, ...
> but must be
>   avr_legitimize_reload_address (rtx *px, ...
> because at one place &x is used as input to push_reload which is now px.
>
> Ok to install?
>
> Johann
>
>        * config/avr/avr.h (LEGITIMIZE_RELOAD_ADDRESS): Pass address of X
>        instead of X to avr_legitimize_reload_address.
>        * config/avr/avr-protos.h (avr_legitimize_reload_address): Change
>        first argument from rtx to rtx*.
>        * config/avr/avr.c (avr_legitimize_reload_address): Ditto.
>        Pass PX to push_reload instead of &X.  Change log messages for
>        better distinction.
>

Approved.

Denis.


Re: [RFA:] fix breakage with "Update testsuite to run with slim LTO"

2011-10-21 Thread Jan Hubicka
> If running the gnat.dg testsuite, lib/gcc-dg.exp is now calling
> check_linker_plugin_available early, which ultimately calls
> ${tool}_target_compile.  For all languages but Ada,
> ${tool}_target_compile can compile .c files just fine, but
> gnat_target_compile (which uses gnatmake) cannot, so it falls back to
> directly calling gcc_target_compile in that case.  gcc_target_compile
> relies on GCC_UNDER_TEST being set, which in this case hasn't yet
> happened, thus the error.
> 
> My solution (a hack, actually) is to move the initialization of
> GCC_UNDER_TEST in gcc-dg.exp before the calls to
> check_linker_plugin_available.  x86_64-unknown-linux-gnu testing in
> progress, will commit once that's finished.

Oops, I was under the impression that the GNAT testsuite is not really using the
dejagnu lib, so I did not expect a change here.
Thanks for fixing that!

I also noticed that tests scanning output of late optimization passes are
now getting UNRESOLVED state with slim LTO.  We don't really lose coverage
here because we test fat LTO with the other compilation, but probably easiest
is to enforce fat LTO all the time.

Does the following seem reasonable?

Honza

* gcc.dg/torture/pta-ptrarith-1.c: Force fat LTO.
* gcc.dg/torture/pta-ptrarith-2.c: Likewise.
* gcc.dg/torture/pr23821.c: Likewise.
* gcc.dg/torture/pta-ptrarith-3.c: Likewise.
* gcc.dg/torture/pr45704.c: Likewise.
* gcc.dg/torture/pr50472.c: Likewise.
* gcc.dg/torture/ipa-pta-1.c: Likewise.
* gcc.dg/torture/pta-callused-1.c: Likewise.
* gcc.dg/torture/pr39074-2.c: Likewise.
* gcc.dg/torture/pr39074.c: Likewise.
* gcc.dg/torture/pr42898-2.c: Likewise.
* gcc.dg/torture/pr42898.c: Likewise.
* gcc.dg/torture/pta-escape-1.c: Likewise.
* gcc.dg/torture/ssa-pta-fn-1.c: Likewise.
Index: gcc.dg/torture/pta-ptrarith-1.c
===
*** gcc.dg/torture/pta-ptrarith-1.c (revision 180289)
--- gcc.dg/torture/pta-ptrarith-1.c (working copy)
***
*** 1,5 
  /* { dg-do run } */
! /* { dg-options "-fdump-tree-alias" } */
  /* { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
  
  struct Foo {
--- 1,5 
  /* { dg-do run } */
! /* { dg-options "-fdump-tree-alias -ffat-lto-objects" } */
  /* { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
  
  struct Foo {
Index: gcc.dg/torture/pta-ptrarith-2.c
===
*** gcc.dg/torture/pta-ptrarith-2.c (revision 180289)
--- gcc.dg/torture/pta-ptrarith-2.c (working copy)
***
*** 1,5 
  /* { dg-do run } */
! /* { dg-options "-fdump-tree-alias" } */
  /* { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
  
  struct Foo {
--- 1,5 
  /* { dg-do run } */
! /* { dg-options "-fdump-tree-alias -ffat-lto-objects" } */
  /* { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
  
  struct Foo {
Index: gcc.dg/torture/pr23821.c
===
*** gcc.dg/torture/pr23821.c(revision 180289)
--- gcc.dg/torture/pr23821.c(working copy)
***
*** 3,9 
  /* At -O1 DOM threads a jump in a non-optimal way which leads to
 the bogus propagation.  */
  /* { dg-skip-if "" { *-*-* } { "-O1" } { "" } } */
! /* { dg-options "-fdump-tree-ivcanon-details" } */
  
  int a[199];
  
--- 3,9 
  /* At -O1 DOM threads a jump in a non-optimal way which leads to
 the bogus propagation.  */
  /* { dg-skip-if "" { *-*-* } { "-O1" } { "" } } */
! /* { dg-options "-fdump-tree-ivcanon-details -ffat-lto-objects" } */
  
  int a[199];
  
Index: gcc.dg/torture/pta-ptrarith-3.c
===
*** gcc.dg/torture/pta-ptrarith-3.c (revision 180289)
--- gcc.dg/torture/pta-ptrarith-3.c (working copy)
***
*** 1,5 
  /* { dg-do run } */
! /* { dg-options "-fdump-tree-alias" } */
  /* { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
  
  extern void abort (void);
--- 1,5 
  /* { dg-do run } */
! /* { dg-options "-fdump-tree-alias -ffat-lto-objects" } */
  /* { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
  
  extern void abort (void);
Index: gcc.dg/torture/pr45704.c
===
*** gcc.dg/torture/pr45704.c(revision 180289)
--- gcc.dg/torture/pr45704.c(working copy)
***
*** 1,5 
  /* { dg-do compile } */
! /* { dg-options "-fdump-tree-optimized" } */
  
  struct st {
  int ptr;
--- 1,5 
  /* { dg-do compile } */
! /* { dg-options "-fdump-tree-optimized -ffat-lto-objects" } */
  
  struct st {
  int ptr;
Index: gcc.dg/torture/pr50472.c
===
*** gcc.dg/torture/pr50472.c(revision 180289)
--- gcc.dg/torture/pr50472.c(working copy)
***
*** 1,5 
  /* { dg-do compile } */
! /* { dg-options "-fdump-tre

Re: [Patch] Add support of AIX response files in collect2

2011-10-21 Thread David Edelsohn
On Fri, Oct 21, 2011 at 6:39 AM, Tristan Gingold  wrote:
> Hi,
>
> the AIX linker supports response files with the '-fFILE' command line option.
> With this patch, collect2 reads the content of the response files and listed 
> object files are handled.  Therefore these files are not forgotten by the 
> collect2 machinery.
>
> Although AIX ld supports glob(3) patterns, this isn't handled by this patch 
> because glob() isn't available on all hosts, isn't present in libiberty and 
> its use is not common.  To be considered for the future.
>
> I haven't added a testcase for this because I think this is not possible with 
> only one file.  But hints are welcome.
>
> Reduced bootstrap on rs6000-aix
>
> Ok for trunk ?
>
> Tristan.
>
> 2011-10-21  Tristan Gingold  
>
>        * collect2.c (main): Add support of -f (response file) on AIX.

Okay.

Thanks, David


Re: [cxx-mem-model] Handle x86-64 with -m32

2011-10-21 Thread Andrew MacLeod

On 10/21/2011 11:28 AM, H.J. Lu wrote:

On Fri, Oct 21, 2011 at 5:11 AM, Andrew MacLeod  wrote:


X32 has native int64 and int128.


 I presume there is no atomic support for int128 though, and that's what
'condition check_effective_target_sync_int_128' is testing for.


X32 uses x86-64 instruction set with 32bit pointers.   It has the same
atomic support as x86-64 and has atomic support for int128.


Oh, you aren't talking about 32 bit, but a 32 bit abi on a 64 bit machine.





Re: [C++-11] User defined literals

2011-10-21 Thread 3dw4rd
Jason,

I split the changelog into four parts and tweaked some of the content and 
formatting.

Ed


CL_udlit_gcc_c-family
Description: Binary data


CL_udlit_gcc_cp
Description: Binary data


CL_udlit_gcc_testsuite
Description: Binary data


CL_udlit_libcpp
Description: Binary data


Re: [cxx-mem-model] Handle x86-64 with -m32

2011-10-21 Thread H.J. Lu
On Fri, Oct 21, 2011 at 9:08 AM, Andrew MacLeod  wrote:
> On 10/21/2011 11:28 AM, H.J. Lu wrote:
>>
>> On Fri, Oct 21, 2011 at 5:11 AM, Andrew MacLeod
>>  wrote:

 X32 has native int64 and int128.

>>> I presume there is no atomic support for int128 though, and that's what
>>> 'condition check_effective_target_sync_int_128' is testing for.
>>>
>> X32 uses x86-64 instruction set with 32bit pointers.   It has the same
>> atomic support as x86-64 and has atomic support for int128.
>
> Oh, you aren't talking about 32 bit, but a 32 bit abi on a 64 bit machine.
>

Yes.

-- 
H.J.


Re: [RFA:] fix breakage with "Update testsuite to run with slim LTO"

2011-10-21 Thread Hans-Peter Nilsson
> Date: Fri, 21 Oct 2011 17:44:15 +0200
> From: Jan Hubicka 

> I also noticed that tests scanning output of late optimization passes are
> now getting UNRESOLVED state with slim LTO.  We don't really lose coverage
> here because we test fat LTO with the other compilation, but probably easiest
> is to enforce fat LTO all the time.
> 
> Does the following seem reasonable?
> 
> Honza
> 
>   * gcc.dg/torture/pta-ptrarith-1.c: Force fat LTO.
>   * gcc.dg/torture/pta-ptrarith-2.c: Likewise.
>   * gcc.dg/torture/pr23821.c: Likewise.
>   * gcc.dg/torture/pta-ptrarith-3.c: Likewise.
>   * gcc.dg/torture/pr45704.c: Likewise.
>   * gcc.dg/torture/pr50472.c: Likewise.
>   * gcc.dg/torture/ipa-pta-1.c: Likewise.
>   * gcc.dg/torture/pta-callused-1.c: Likewise.
>   * gcc.dg/torture/pr39074-2.c: Likewise.
>   * gcc.dg/torture/pr39074.c: Likewise.
>   * gcc.dg/torture/pr42898-2.c: Likewise.
>   * gcc.dg/torture/pr42898.c: Likewise.
>   * gcc.dg/torture/pta-escape-1.c: Likewise.
>   * gcc.dg/torture/ssa-pta-fn-1.c: Likewise.

Meh...  Please no, this was the kind of scatter-patches my patch
aimed to avoid... for example, easy to miss some tests.

Instead, on top of my patch, just copy the
scan-assembler_required_options proc to a
scan-tree-dump_required_options.  ...no wait, should forcing
fat-lto be done for all tree-dumps?  If only for a subset of
tree-dumps augment the *_required_options proc API to take
arguments that let you check for that.

brgds, H-P


Re: [C++ Patch / RFC] PR 45385

2011-10-21 Thread Paolo Carlini

On 10/21/2011 04:56 PM, Jason Merrill wrote:
I think the fix for 35602 was wrong; instead of trying to suppress the 
warning, we should avoid building expressions that trip it.  In this 
case, the problem is a type mismatch in build_vec_init between 
maxindex/iterator (ptrdiff_type_node) and array_type_nelts_total 
(sizetype).  And indeed, converting (ptrdiff_t)-1 to unsigned changes 
its sign.


I think a better fix for 35602 would be to bail out of build_vec_init
early if maxindex is -1.

Ah great, thanks a lot. The below passes testing; if it's OK I would be
tempted to backport it to the 4_6-branch too after 4.6.2 is out.


Thanks again,
Paolo.

//
/cp
2011-10-21  Paolo Carlini  

PR c++/45385
* init.c (build_vec_init): Early return error_mark_node if
maxindex is -1.

/c-family
2011-10-21  Paolo Carlini  

PR c++/45385
* c-common.c (conversion_warning): Remove code looking for
artificial operands.

/testsuite
2011-10-21  Paolo Carlini  

PR c++/45385
* g++.dg/warn/Wconversion4.C: New.
Index: c-family/c-common.c
===
--- c-family/c-common.c (revision 180307)
+++ c-family/c-common.c (working copy)
@@ -2121,23 +2121,12 @@ unsafe_conversion_p (tree type, tree expr, bool pr
 static void
 conversion_warning (tree type, tree expr)
 {
-  int i;
-  const int expr_num_operands = TREE_OPERAND_LENGTH (expr);
   tree expr_type = TREE_TYPE (expr);
   location_t loc = EXPR_LOC_OR_HERE (expr);
 
   if (!warn_conversion && !warn_sign_conversion)
 return;
 
-  /* If any operand is artificial, then this expression was generated
- by the compiler and we do not warn.  */
-  for (i = 0; i < expr_num_operands; i++)
-{
-  tree op = TREE_OPERAND (expr, i);
-  if (op && DECL_P (op) && DECL_ARTIFICIAL (op))
-   return;
-}
-
   switch (TREE_CODE (expr))
 {
 case EQ_EXPR:
Index: testsuite/g++.dg/warn/Wconversion4.C
===
--- testsuite/g++.dg/warn/Wconversion4.C(revision 0)
+++ testsuite/g++.dg/warn/Wconversion4.C(revision 0)
@@ -0,0 +1,17 @@
+// PR c++/45385
+// { dg-options "-Wconversion" } 
+
+void foo(unsigned char);
+
+class Test
+{
+  void eval()
+  {
+foo(bar());  // { dg-warning "may alter its value" }
+  }
+
+  unsigned int bar() const
+  {
+return __INT_MAX__ * 2U + 1;
+  }
+};
Index: cp/init.c
===
--- cp/init.c   (revision 180307)
+++ cp/init.c   (working copy)
@@ -2998,7 +2998,8 @@ build_vec_init (tree base, tree maxindex, tree ini
   if (TREE_CODE (atype) == ARRAY_TYPE && TYPE_DOMAIN (atype))
 maxindex = array_type_nelts (atype);
 
-  if (maxindex == NULL_TREE || maxindex == error_mark_node)
+  if (maxindex == NULL_TREE || maxindex == error_mark_node
+  || tree_int_cst_equal (maxindex, integer_minus_one_node))
 return error_mark_node;
 
   if (explicit_value_init_p)


Re: new patches using -fopt-info (issue5294043)

2011-10-21 Thread Xinliang David Li
There are two proposals here. One is -fopt-info which prints out
informational notes to stderr, and the other is -fopt-report, which is a
more elaborate form of dump files. Do you object to both or just the
opt-report one?  The former is no different from any other
informational notes we already have -- the only difference is that
they are suppressed by default.

>>    ..
>>  ...
>
> I very well understand the intent.  But I disagree with where you start
> to implement this.  Dump files are _not_ only for developers - after
> all we don't have anything else.  -fopt-report can get as big and unmanageable
> to read as dump files - in fact I argue it will be worse than dump files if
> you go beyond very very coarse reporting.

The problem of using dump files for the optimization report is that all
optimization decisions are 'distributed' across phase-specific dump files.
For a whole program report, the number of files that are created is
not manageable (think about a program with 4000 sources each dumping
200 files).  If we create a dummy pass and suck in all optimization
decisions in that pass's dump file -- it will be no different from
opt-report.

>
> Yes, dump files are a "mess".  So - why not clean them up, and at the
> same time annotate dump file pieces so _automatic_ filtering and
> redirecting to stdout with something like -fopt-report would do something
> sensible?  I don't see why dump files have to stay messy while you at
> the same time would need to add _new_ code to dump to stdout for
> -fopt-report.

In my mind, I would like to separate all dumps into three categories.

1) IR dumps, and support dump before and after (this reminds me my
patches are still pending :) ) -fdump-tree-pre-[before|after]-
 Dump into .after, .before files
2) debug tracing etc: -fdump-tree-pre-debug-...  Dump
into .debug files.
3) opt report : -fdump-opt or -fopt-report

Changes for 1) and 2) are mechanical but require lots of work.

>
> So, no, please do it the right way that benefits both compiler developers
> and your "power users".
>
> And yes, the right way is not to start adding that -fopt-report switch.
> The right way is to make dump-files consumable by mere mortals first.

I agree we need to do it the right way, which needs to be discussed first.
I would argue that mere mortals will really appreciate opt-info
(separate from dump file and opt-report).

thanks,

David

>
> Thanks,
> Richard.
>
>>
>> Thanks,
>>
>> David
>>
>>>
>>> So, please fix dump-files instead.  And for coverage/profiling, fill
>>> in stuff in a dump-file!
>>>
>>> Richard.
>>>
 It would be interesting to have some warnings about missing SRA
 opportunities in =1 or =2. I found that sometimes fixing those can give a
 large speedup.

 Right now a common case that prevents SRA on structure field
 is simply a memset or memcpy.

 -Andi


 --
 a...@linux.intel.com -- Speaking for myself only

>>>
>>
>


Re: [patch] dwarf2out crash: missing GTY? (PR 50806)

2011-10-21 Thread Steve Ellcey
FYI: I am seeing this same ICE on the hppa64-hp-hpux11.11 bootstrap.

(debug_expr:DI D#49)
/ctires/gcc/nightly/src/trunk/gcc/cselib.c: In function 'void 
cselib_record_sets(rtx)':
/ctires/gcc/nightly/src/trunk/gcc/cselib.c:2424:1: internal compiler error: in 
mem_loc_descriptor, at dwarf2out.c:12379
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.

I am trying to cut down the test case and find out exactly when
it started failing.  The last successful bootstrap I had was r180174 and
the first known bad one is r180233.

Steve Ellcey
s...@cup.hp.com



Re: [PATCH 1/3] Add missing page rounding of a page_entry

2011-10-21 Thread Andi Kleen
On Fri, Oct 21, 2011 at 11:52:44AM +0200, Jakub Jelinek wrote:
> On Fri, Oct 21, 2011 at 11:42:26AM +0200, Richard Guenther wrote:
> > On Fri, Oct 21, 2011 at 7:52 AM, Andi Kleen  wrote:
> > > From: Andi Kleen 
> > >
> > > This one place in ggc forgot to round page_entry->bytes to the
> > > next page boundary, which led to all the heuristics in freeing to
> > > check for contiguous memory failing. Round here too, like all other
> > > allocators already do. The memory consumed should be the same
> > > for MMAP because the kernel would round anyways. It may slightly
> > > increase memory usage when malloc groups are used.
> > >
> > > This will also increase the hitrate on the free page list
> > > slightly.
> > 
> > > 2011-10-18  Andi Kleen  
> > >
> > >        * ggc-page.c (alloc_pages): Always round up entry_size.
> 
> As I said in the PR, ROUND_UP should make the previous
>   if (entry_size < G.pagesize)
> entry_size = G.pagesize;
> completely unnecessary.  

AFAIK there are objects > pagesize in GGC, but not too many

But you're right it will be somewhat expensive (although the mmap
is not very common anyways and most other allocations
already do the roundup). I can drop it.

-Andi


Re: [patch, Fortran] Fix PR 50690

2011-10-21 Thread Thomas Koenig

Jakub Jelinek wrote:

Though, what could be done is just special case OpenMP workshare regions,
insert everything into BLOCK local vars unless in OpenMP workshare, in that
case put the BLOCK with the temporary around the workshare rather than
inside of it.  In the case of omp parallel workshare it would need
to go in between omp parallel and omp workshare.


Well, here's a patch which implements this concept.  I chose to insert
the BLOCK in a separate pass because it was the cleanest way to avoid
infinite recursion when inserting a block.

Regression-tested.  OK for trunk?

Thomas

2011-10-21  Thomas Koenig  

PR fortran/50690
* frontend-passes.c (workshare_level):  New variable.
(create_var):  Put the newly created variable into the block
around the WORKSHARE.
(enclose_workshare):  New callback function to enclose
WORKSHAREs in blocks.
(optimize_namespace):  Use it.
(gfc_code_walker):  Save/restore current namespace when
following a BLOCK.  Keep track of workshare level.

2011-10-21  Thomas Koenig  

PR fortran/50690
* gfortran.dg/gomp/workshare2.f90:  New test.


Index: frontend-passes.c
===
--- frontend-passes.c	(Revision 180063)
+++ frontend-passes.c	(Arbeitskopie)
@@ -66,6 +66,10 @@ static gfc_namespace *current_ns;
 
 static int forall_level;
 
+/* If we are within an OMP WORKSHARE or OMP PARALLEL WORKSHARE.  */
+
+static int workshare_level;
+
 /* Entry point - run all passes for a namespace.  So far, only an
optimization pass is run.  */
 
@@ -245,8 +249,16 @@ create_var (gfc_expr * e)
   gfc_namespace *ns;
   int i;
 
+  /* Special treatment for WORKSHARE: The variable goes into the block
+ created by the earlier pass around it.  */
+
+  if (workshare_level > 0)
+{
+  ns = current_ns;
+  changed_statement = current_code;
+}
   /* If the block hasn't already been created, do so.  */
-  if (inserted_block == NULL)
+  else if (inserted_block == NULL)
 {
   inserted_block = XCNEW (gfc_code);
   inserted_block->op = EXEC_BLOCK;
@@ -497,6 +509,38 @@ convert_do_while (gfc_code **c, int *walk_subtrees
   return 0;
 }
 
+/* Callback function to enclose OMP workshares into BLOCKs.  This is done
+   so that later front end optimization can insert temporary variables into
+   the outer block scope.  */
+
+static int
+enclose_workshare (gfc_code **c, int *walk_subtrees,
+		   void *data ATTRIBUTE_UNUSED)
+{
+  gfc_code *co;
+  gfc_code *new_block;
+  gfc_namespace *ns;
+
+  co = *c;
+
+  if (co->op != EXEC_OMP_WORKSHARE && co->op != EXEC_OMP_PARALLEL_WORKSHARE)
+return 0;
+
+  /* Create the block.  */
+  new_block = XCNEW (gfc_code);
+  new_block->op = EXEC_BLOCK;
+  new_block->loc = co->loc;
+  ns = gfc_build_block_ns (current_ns);
+  new_block->ext.block.ns = ns;
+  new_block->ext.block.assoc = NULL;
+  ns->code = co;
+
+  /* Insert the BLOCK at the right position.  */
+  *c = new_block;
+  *walk_subtrees = false;
+  return 0;
+}
+
 /* Optimize a namespace, including all contained namespaces.  */
 
 static void
@@ -507,6 +551,12 @@ optimize_namespace (gfc_namespace *ns)
   forall_level = 0;
 
   gfc_code_walker (&ns->code, convert_do_while, dummy_expr_callback, NULL);
+  if (gfc_option.gfc_flag_openmp)
+{
+  workshare_level = 0;
+  gfc_code_walker (&ns->code, enclose_workshare, dummy_expr_callback, NULL);
+}
+
   gfc_code_walker (&ns->code, cfe_code, cfe_expr_0, NULL);
   gfc_code_walker (&ns->code, optimize_code, optimize_expr, NULL);
 
@@ -1148,6 +1198,7 @@ gfc_code_walker (gfc_code **c, walk_code_fn_t code
 	  gfc_code *b;
 	  gfc_actual_arglist *a;
 	  gfc_code *co;
+	  gfc_namespace *save_ns;
 	  gfc_association_list *alist;
 
 	  /* There might be statement insertions before the current code,
@@ -1159,7 +1210,11 @@ gfc_code_walker (gfc_code **c, walk_code_fn_t code
 	{
 
 	case EXEC_BLOCK:
+	  save_ns = current_ns;
+	  current_ns = co->ext.block.ns;
 	  WALK_SUBCODE (co->ext.block.ns->code);
+	  current_ns = save_ns;
+
 	  for (alist = co->ext.block.assoc; alist; alist = alist->next)
 		WALK_SUBEXPR (alist->target);
 	  break;
@@ -1329,14 +1384,18 @@ gfc_code_walker (gfc_code **c, walk_code_fn_t code
 	  WALK_SUBEXPR (co->ext.dt->extra_comma);
 	  break;
 
+	case EXEC_OMP_PARALLEL_WORKSHARE:
+	case EXEC_OMP_WORKSHARE:
+	  workshare_level ++;
+
+	  /* Fall through.  */
+
 	case EXEC_OMP_DO:
 	case EXEC_OMP_PARALLEL:
 	case EXEC_OMP_PARALLEL_DO:
 	case EXEC_OMP_PARALLEL_SECTIONS:
-	case EXEC_OMP_PARALLEL_WORKSHARE:
 	case EXEC_OMP_SECTIONS:
 	case EXEC_OMP_SINGLE:
-	case EXEC_OMP_WORKSHARE:
 	case EXEC_OMP_END_SINGLE:
 	case EXEC_OMP_TASK:
 	  if (co->ext.omp_clauses)
@@ -1365,6 +1424,9 @@ gfc_code_walker (gfc_code **c, walk_code_fn_t code
 	  if (co->op == EXEC_FORALL)
 	forall_level --;

Re: [cxx-mem-model] Handle x86-64 with -m32

2011-10-21 Thread Aldy Hernandez

On 10/21/11 11:08, Andrew MacLeod wrote:

On 10/21/2011 11:28 AM, H.J. Lu wrote:

On Fri, Oct 21, 2011 at 5:11 AM, Andrew MacLeod
wrote:


X32 has native int64 and int128.


 I presume there is no atomic support for int128 though, and that's what
'condition check_effective_target_sync_int_128' is testing for.


X32 uses x86-64 instruction set with 32bit pointers. It has the same
atomic support as x86-64 and has atomic support for int128.


Oh, you aren't talking about 32 bit, but a 32 bit abi on a 64 bit machine.


Thanks for pointing this out Joseph.

The following patch handles both x86_64 and i?86, but only returns true 
for LP64.  Is this what you had in mind?


Aldy
* lib/target-supports.exp (check_effective_target_sync_int_128):
Handle both 32-bit and 64-bit triplets on x86.
(check_effective_target_sync_long_long): Same.
* gcc.dg/simulate-thread/atomic-load-int128.c: Handle i?86-*-*.
* gcc.dg/simulate-thread/atomic-other-int128.c: Same.

Index: lib/target-supports.exp
===
--- lib/target-supports.exp (revision 180156)
+++ lib/target-supports.exp (working copy)
@@ -3456,7 +3456,8 @@ proc check_effective_target_sync_int_128
 verbose "check_effective_target_sync_int_128: using cached result" 2
 } else {
 set et_sync_int_128_saved 0
-if { [istarget x86_64-*-*] } {
+if { ([istarget x86_64-*-*] || [istarget i?86-*-*])
+&& [is-effective-target lp64] } {
set et_sync_int_128_saved 1
 }
 }
@@ -3474,7 +3475,8 @@ proc check_effective_target_sync_long_lo
 verbose "check_effective_target_sync_long_long: using cached result" 2
 } else {
 set et_sync_long_long_saved 0
-if { [istarget x86_64-*-*] } {
+if { ([istarget x86_64-*-*] || [istarget i?86-*-*])
+&& [is-effective-target lp64] } {
set et_sync_long_long_saved 1
 }
 }
Index: gcc.dg/simulate-thread/atomic-load-int128.c
===
--- gcc.dg/simulate-thread/atomic-load-int128.c (revision 180156)
+++ gcc.dg/simulate-thread/atomic-load-int128.c (working copy)
@@ -1,6 +1,6 @@
 /* { dg-do link } */
 /* { dg-require-effective-target sync_int_128 } */
-/* { dg-options "-mcx16" { target { x86_64-*-* } } } */
+/* { dg-options "-mcx16" { target { x86_64-*-* i?86-*-* } } } */
 /* { dg-final { simulate-thread } } */
 
 #include 
Index: gcc.dg/simulate-thread/atomic-other-int128.c
===
--- gcc.dg/simulate-thread/atomic-other-int128.c(revision 180156)
+++ gcc.dg/simulate-thread/atomic-other-int128.c(working copy)
@@ -1,6 +1,6 @@
 /* { dg-do link } */
 /* { dg-require-effective-target sync_int_128 } */
-/* { dg-options "-mcx16" { target { x86_64-*-* } } } */
+/* { dg-options "-mcx16" { target { x86_64-*-* i?86-*-*] } } } */
 /* { dg-final { simulate-thread } } */
 
 #include 


Re: [C++ Patch] PR 48630 (PR 31423)

2011-10-21 Thread Paolo Carlini

Hi again,

Another acceptable fix is to
   -- leave the current diagnostic as is if -fms-extensions
   -- suggest '()' if member function
   -- otherwise suggest '&'.
Thanks for your help Gaby, in particular about the MS extension which 
I had overlooked completely (as any hard-code Linux guy would ;). 
Anyway, seriously, I'll try to come up with an improved proposal over 
the next days.

Thus I tested on x86_64-linux the below. Ok for mainline?

Thanks,
Paolo.

/
/cp
2011-10-21  Paolo Carlini  

PR c++/31423
* typeck2.c (cxx_incomplete_type_diagnostic): Improve error message
for invalid use of member function.

/testsuite
2011-10-21  Paolo Carlini  

PR c++/31423
* g++.dg/parse/error43.C: New.
* g++.dg/parse/error44.C: Likewise.
Index: testsuite/g++.dg/parse/error43.C
===
--- testsuite/g++.dg/parse/error43.C(revision 0)
+++ testsuite/g++.dg/parse/error43.C(revision 0)
@@ -0,0 +1,5 @@
+// PR c++/31423
+// { dg-options "" }
+
+class C { public: C* f(); int get(); };
+int f(C* p) { return p->f->get(); }  // { dg-error "forget the '\\(\\)'|base operand" }
Index: testsuite/g++.dg/parse/error44.C
===
--- testsuite/g++.dg/parse/error44.C(revision 0)
+++ testsuite/g++.dg/parse/error44.C(revision 0)
@@ -0,0 +1,11 @@
+// PR c++/31423
+// { dg-options "-fms-extensions" }
+
+struct C {
+   int f() { return 1; }
+   int g() { return 2; }
+};
+
+int f(C& c) {
+   return c.g == &c.f; // { dg-error "forget the '&'" }
+}
Index: cp/typeck2.c
===
--- cp/typeck2.c(revision 180307)
+++ cp/typeck2.c(working copy)
@@ -428,8 +428,15 @@ cxx_incomplete_type_diagnostic (const_tree value,
 
 case OFFSET_TYPE:
 bad_member:
-  emit_diagnostic (diag_kind, input_location, 0,
-  "invalid use of member (did you forget the %<&%> ?)");
+  if (DECL_FUNCTION_MEMBER_P (TREE_OPERAND (value, 1))
+ && ! flag_ms_extensions)
+   emit_diagnostic (diag_kind, input_location, 0,
+"invalid use of member function "
+"(did you forget the %<()%> ?)");
+  else
+   emit_diagnostic (diag_kind, input_location, 0,
+"invalid use of member "
+"(did you forget the %<&%> ?)");
   break;
 
 case TEMPLATE_TYPE_PARM:


[patch] dwarf2out: Use DW_FORM_ref_udata (.debug -= 7.53%)

2011-10-21 Thread Jan Kratochvil
Hi,

this is a standalone patch to replace DW_FORM_ref4 by DW_FORM_ref_udata
(uleb128).  It saves for libstdc++.so.debug: 5254792 -> 4859136 = 7.53%
Tested with: -O2 -gdwarf-4 -fdebug-types-section

The references are already intra-CU ones, they are never relocated, they are
pre-computed by dwarf2out.c as plain numbers already.  IIRC I got this idea
after reading Google "Fission" but it does not deal with intra-CU refs.

The change does not apply to CFI references, those are unchanged, generated by
different code of dwarf2out.c.

I had a draft patch to use DW_FORM_ref1, DW_FORM_ref2, DW_FORM_ref_udata and
DW_FORM_ref4 which should be slightly smaller than just DW_FORM_ref_udata.
But it produced larger files than just DW_FORM_ref_udata.  Assuming it was due
to multiplied abbrev definitions.  One would need to decide when if it is
worth to create a new smaller-sized abbrev but that seems too complicated for
such kind of optimization.  There exist other project(s) in development for
DWARF optimizations as a post-processing tool, this patch is meant just as
a very simple way to reduce the (possibly intermediate) DWARF files.

It is incompatible with GNU binutils readelf, a pending simple fix is:
[patch] Fix readelf for DW_FORM_ref_udata
http://sourceware.org/ml/binutils/2011-10/msg00201.html

No regressions on {x86_64,x86_64-m32,i686}-fedora16pre-linux-gnu
(4.7.0 20111002).  No changes in readelf -w output on libstdc++.so.

My former DW_AT_sibling patch will be updated on top of this one:
Re: [patch#2] dwarf2out: Drop the size + performance overhead of 
DW_AT_sibling
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01869.html
These two optimizations are not incremental to each other as both benefit
primarily from common DW_AT_sibling.  That size decrease 3.49% will be less
after this patch; moreover when not all DW_AT_sibling attrs will be dropped.


Thanks,
Jan


gcc/
2011-10-21  Jan Kratochvil  

* dwarf2out.c (DW_FORM_ref): Remove.
(size_of_die) : Use size_of_uleb128 for
!AT_ref_external references.
(calc_die_sizes_change): New variable.
(calc_die_sizes): Permit die_offset incrementals.  Set
CALC_DIE_SIZES_CHANGE accordingly.
(value_format) : Return DW_FORM_ref_udata for
!AT_ref_external references.
(output_die) : Use dw2_asm_output_data_uleb128
for !AT_ref_external references.
(output_unit_prep): New function, call calc_die_sizes repeatedly based
on CALC_DIE_SIZES_CHANGE.
(output_comp_unit, output_comdat_type_unit): Move some code to
output_unit_prep.

--- gcc/dwarf2out.c
+++ gcc/dwarf2out.c
@@ -226,7 +226,6 @@ static GTY(()) rtx current_unit_personality;
 
 /* Data and reference forms for relocatable data.  */
 #define DW_FORM_data (DWARF_OFFSET_SIZE == 8 ? DW_FORM_data8 : DW_FORM_data4)
-#define DW_FORM_ref (DWARF_OFFSET_SIZE == 8 ? DW_FORM_ref8 : DW_FORM_ref4)
 
 #ifndef DEBUG_FRAME_SECTION
 #define DEBUG_FRAME_SECTION".debug_frame"
@@ -7700,7 +7699,7 @@ size_of_die (dw_die_ref die)
size += DWARF_OFFSET_SIZE;
}
  else
-   size += DWARF_OFFSET_SIZE;
+   size += size_of_uleb128 (AT_ref (a)->die_offset);
  break;
case dw_val_class_fde_ref:
  size += DWARF_OFFSET_SIZE;
@@ -7735,6 +7734,10 @@ size_of_die (dw_die_ref die)
   return size;
 }
 
+/* Has calc_die_sizes changed any DIE_OFFSET?  */
+
+static bool calc_die_sizes_change;
+
 /* Size the debugging information associated with a given DIE.  Visits the
DIE's children recursively.  Updates the global variable next_die_offset, on
each time through.  Uses the current value of next_die_offset to update the
@@ -7745,9 +7748,12 @@ calc_die_sizes (dw_die_ref die)
 {
   dw_die_ref c;
 
-  gcc_assert (die->die_offset == 0
- || (unsigned long int) die->die_offset == next_die_offset);
-  die->die_offset = next_die_offset;
+  gcc_assert ((unsigned long int) die->die_offset <= next_die_offset);
+  if ((unsigned long int) die->die_offset != next_die_offset)
+{
+  die->die_offset = next_die_offset;
+  calc_die_sizes_change = true;
+}
   next_die_offset += size_of_die (die);
 
   FOR_EACH_CHILD (die, c, calc_die_sizes (c));
@@ -8018,7 +8024,7 @@ value_format (dw_attr_ref a)
   if (AT_ref_external (a))
return use_debug_types ? DW_FORM_ref_sig8 : DW_FORM_ref_addr;
   else
-   return DW_FORM_ref;
+   return DW_FORM_ref_udata;
 case dw_val_class_fde_ref:
   return DW_FORM_data;
 case dw_val_class_lbl_id:
@@ -8397,8 +8403,7 @@ output_die (dw_die_ref die)
  else
{
  gcc_assert (AT_ref (a)->die_offset);
- dw2_asm_output_data (DWARF_OFFSET_SIZE, AT_ref (a)->die_offset,
-  "%s", name);
+ dw2_asm_output_data_uleb128 (AT_ref (a)->die_offset, "%s", name);
}
  break;
 
@@ -8496,6 +8501,28 @@

Re: [patch] dwarf2out: Use DW_FORM_ref_udata (.debug -= 7.53%)

2011-10-21 Thread Jan Kratochvil
On Fri, 21 Oct 2011 20:01:29 +0200, Jan Kratochvil wrote:
> No regressions on {x86_64,x86_64-m32,i686}-fedora16pre-linux-gnu
> (4.7.0 20111002).

A typo, only tested for x86_64-fedora16pre-linux-gnu, sorry.


Jan


Re: [C++ Patch] PR 48630 (PR 31423)

2011-10-21 Thread Jason Merrill

OK.

Jason


Re: [C++ Patch / RFC] PR 45385

2011-10-21 Thread Jason Merrill

On 10/21/2011 12:20 PM, Paolo Carlini wrote:

+  || tree_int_cst_equal (maxindex, integer_minus_one_node))


Use integer_all_onesp instead.  OK with that change.

Jason


Re: [PATCH 2/3] Free large chunks in ggc

2011-10-21 Thread Andi Kleen
> > diff --git a/gcc/ggc-page.c b/gcc/ggc-page.c
> > index ba88e3f..eb0eeef 100644
> > --- a/gcc/ggc-page.c
> > +++ b/gcc/ggc-page.c
> > @@ -972,6 +972,54 @@ release_pages (void)
> >   page_entry *p, *start_p;
> >   char *start;
> >   size_t len;
> > +  size_t mapped_len;
> > +  page_entry *next, *prev, *newprev;
> > +  size_t free_unit = PARAM_VALUE (GGC_FREE_UNIT) * G.pagesize;
> > +
> > +  /* First free larger continuous areas to the OS.
> > +     This allows other allocators to grab these areas if needed.
> > +     This is only done on larger chunks to avoid fragmentation.
> > +     This does not always work because the free_pages list is only
> > +     sorted over a single GC cycle. */
> 
> But release_pages is only called from ggc_collect, or what do you

If there was a spike in GC usage and we end up with lots of free
space in the free list afterward we free it back on the next GC cycle.
Then if there's a malloc or other allocator later it can grab
the address space we freed.

That was done to address your earlier concern.

This will only happen on ggc_collect of course.

So one difference from before the madvise patch is that different
generations of free pages can accumulate in the freelist. Before madvise
the freelist would never contain more than one generation.
Normally it's sorted by address due to the way GC works, but there's no 
attempt to keep the sort order over multiple generations.

The "free in batch" heuristic requires sorting, so it will only
work if all the pages are freed in a single gc cycle.

I considered sorting, but it seemed to be too slow.

I can expand the comment on that.
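
For the record, the batch heuristic boils down to something like the
following -- a simplified, self-contained sketch with a hypothetical struct
and helper name, not the actual ggc-page.c code.  As noted above it assumes
the list is sorted by address, which only holds for pages freed within a
single GC cycle:

#include <stddef.h>
#include <sys/mman.h>

struct free_page
{
  struct free_page *next;
  char *addr;                   /* start of the mapping */
  size_t bytes;                 /* size of this page group */
};

/* Release to the OS every address-contiguous run of free pages that is at
   least FREE_UNIT bytes long; keep smaller runs on the list to limit
   fragmentation.  LIST must be sorted by address.  */
static void
release_large_runs (struct free_page **list, size_t free_unit)
{
  struct free_page **link = list;

  while (*link)
    {
      struct free_page *start = *link, *p = start, *last = NULL;
      size_t len = 0;

      /* Grow the run while the next entry is physically adjacent.  */
      while (p && p->addr == start->addr + len)
        {
          len += p->bytes;
          last = p;
          p = p->next;
        }

      if (len >= free_unit)
        {
          /* Hand the whole run back to the OS; real code would also free
             or recycle the page descriptors unlinked here.  */
          munmap (start->addr, len);
          *link = p;
        }
      else
        /* Run too small: keep it and move past it.  */
        link = &last->next;
    }
}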


> mean with the above?  Would the hitrate using the quire size increase
> if we change how we allocate from the freelist or is it real fragmentation
> that causes it?

Not really sure about the hit rate; I haven't measured it.  If the hit rate
were a concern, the free list should probably be split into an array.
I'm sure there are lots of other tunings that could be done on the GC,
but probably not by me for now :)

> 
> I'm a bit hesitant to approve the new param, I'd be ok if we just hard-code
> quire-size / 2.

Ok, replacing it with a hardcoded value.

-Andi


Re: [RFA:] fix breakage with "Update testsuite to run with slim LTO"

2011-10-21 Thread Jan Hubicka
> 
> Meh...  Please no, this was the kind of scatter-patches my patch
> aimed to avoid... for example, easy to miss some tests.
> 
> Instead, on top of my patch, just copy the
> scan-assembler_required_options proc to a
> scan-tree-dump_required_options.  ...no wait, should forcing
> fat-lto be done for all tree-dumps?  If only for a subset of

Yep, the problem is that early tree passes and the analysis part of IPA passes
are run with fat LTO, while late and RTL passes and the execution part of IPA
are not.

I guess we could make the ipa-dump/rtl-dump/tree-dump scanning disable fat LTO
and introduce variants intended to scan late tree dumps and IPA execution
dumps...
Not sure if that would make more sense than just doing it explicitly in the tests.
> tree-dumps augment the *_required_options proc API to take
> arguments that let you check for that.
Well, the list of all late tree passes is quite long and it changes over time...

Honza
> 
> brgds, H-P


Re: Predication during scheduling

2011-10-21 Thread Bernd Schmidt
On 10/21/11 15:42, Bernd Schmidt wrote:
> On 10/14/11 17:35, Vladimir Makarov wrote:
>> The scheduler part of the patch is ok for me (other part changes are
>> obvious).  Could you only commit it at the beginning of the next week.
> 
> I've committed this variant. It's updated for some recent changes in trunk:

And this fixlet prevents ports that don't expect REG_DEP_CONTROL from
seeing this type of dependency. Committed as obvious.


Bernd
Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 180309)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,9 @@
+2011-10-21  Bernd Schmidt  
+
+   PR bootstrap/50825
+   * sched-deps.c (add_dependence): If not doing predication, promote
+   REG_DEP_CONTROL to REG_DEP_ANTI.
+
 2011-10-21  Georg-Johann Lay  
 
* config/avr/avr.h (LEGITIMIZE_RELOAD_ADDRESS): Pass address of X
Index: gcc/sched-deps.c
===
--- gcc/sched-deps.c(revision 180302)
--- gcc/sched-deps.c    (revision 180302)
+++ gcc/sched-deps.c    (working copy)
 void
 add_dependence (rtx con, rtx pro, enum reg_note dep_type)
 {
+  if (dep_type == REG_DEP_CONTROL
+  && !(current_sched_info->flags & DO_PREDICATION))
+dep_type = REG_DEP_ANTI;
+
   /* A REG_DEP_CONTROL dependence may be eliminated through predication,
  so we must also make the insn dependent on the setter of the
  condition.  */


[PATCH] Fix V4DImode/V8SImode extract even/odd permutation with -mavx (PR target/50813)

2011-10-21 Thread Jakub Jelinek
Hi!

No idea how I've missed this.  With -mavx V8SImode and V4DImode
are valid, but the choice of shuffle insns for them is limited.

This patch will just pay the reinterpretation penalty and reshuffle
them as the corresponding V4DFmode resp. V8SFmode instead of ICEing.

Additionally, I've added two new (unrelated) interesting permutations
to the tests.

Bootstrapped/regtested on x86_64-linux and i686-linux and additionally
regtested with
GCC_TEST_RUN_EXPENSIVE=1 make check-gcc 
RUNTESTFLAGS='--target_board=unix\{-m32/-msse2,-m32/-msse4,-m32/-mavx,-m64/-msse2,-m64/-msse4,-m64/-mavx\}
 dg-torture.exp=vshuf*'
and compiling/linking all tests at -O2 -DEXPENSIVE with -mavx2 and
testing in sde.  The only failures are, with -m64 -Os, some of the identity
permutations, which aren't using VEC_PERM_EXPR at all and so are an
unrelated, possibly generic, bug.
Ok for trunk?

2011-10-21  Jakub Jelinek  

PR target/50813
* config/i386/i386.c (expand_vec_perm_even_odd_1): Handle
V4DImode and V8SImode for !TARGET_AVX2.

* gcc.dg/torture/vshuf-32.inc: Add broadcast permutation
from element other than first and reverse permutation.
* gcc.dg/torture/vshuf-16.inc: Likewise.
* gcc.dg/torture/vshuf-8.inc: Likewise.
* gcc.dg/torture/vshuf-4.inc: Likewise.

--- gcc/config/i386/i386.c.jj   2011-10-21 09:39:22.0 +0200
+++ gcc/config/i386/i386.c  2011-10-21 10:03:43.0 +0200
@@ -36023,6 +36023,16 @@ expand_vec_perm_even_odd_1 (struct expan
   return expand_vec_perm_vpshufb2_vpermq_even_odd (d);
 
 case V4DImode:
+  if (!TARGET_AVX2)
+   {
+ struct expand_vec_perm_d d_copy = *d;
+ d_copy.vmode = V4DFmode;
+ d_copy.target = gen_lowpart (V4DFmode, d->target);
+ d_copy.op0 = gen_lowpart (V4DFmode, d->op0);
+ d_copy.op1 = gen_lowpart (V4DFmode, d->op1);
+ return expand_vec_perm_even_odd_1 (&d_copy, odd);
+   }
+
   t1 = gen_reg_rtx (V4DImode);
   t2 = gen_reg_rtx (V4DImode);
 
@@ -36039,6 +36049,16 @@ expand_vec_perm_even_odd_1 (struct expan
   break;
 
 case V8SImode:
+  if (!TARGET_AVX2)
+   {
+ struct expand_vec_perm_d d_copy = *d;
+ d_copy.vmode = V8SFmode;
+ d_copy.target = gen_lowpart (V8SFmode, d->target);
+ d_copy.op0 = gen_lowpart (V8SFmode, d->op0);
+ d_copy.op1 = gen_lowpart (V8SFmode, d->op1);
+ return expand_vec_perm_even_odd_1 (&d_copy, odd);
+   }
+
   t1 = gen_reg_rtx (V8SImode);
   t2 = gen_reg_rtx (V8SImode);
 
--- gcc/testsuite/gcc.dg/torture/vshuf-8.inc.jj 2011-10-20 14:13:38.0 +0200
+++ gcc/testsuite/gcc.dg/torture/vshuf-8.inc    2011-10-21 09:58:47.0 +0200
@@ -17,7 +17,9 @@ T (13,14, 8, 12, 3, 13, 9, 5, 4) \
 T (14, 15, 3, 13, 6, 14, 12, 10, 0) \
 T (15, 0, 5, 11, 7, 4, 6, 14, 1) \
 T (16, 0, 2, 4, 6, 8, 10, 12, 14) \
-T (17, 1, 3, 5, 7, 9, 11, 13, 15)
+T (17, 1, 3, 5, 7, 9, 11, 13, 15) \
+T (18, 3, 3, 3, 3, 3, 3, 3, 3) \
+T (19, 7, 6, 5, 4, 3, 2, 1, 0)
 #define EXPTESTS \
 T (116,9, 3, 9, 4, 7, 0, 0, 6) \
 T (117,4, 14, 12, 8, 9, 6, 0, 10) \
--- gcc/testsuite/gcc.dg/torture/vshuf-4.inc.jj 2011-10-20 14:13:38.0 +0200
+++ gcc/testsuite/gcc.dg/torture/vshuf-4.inc    2011-10-21 09:59:14.0 +0200
@@ -17,7 +17,9 @@ T (13,2, 3, 0, 4) \
 T (14, 7, 6, 4, 2) \
 T (15, 6, 1, 3, 4) \
 T (16, 0, 2, 4, 6) \
-T (17, 1, 3, 5, 7)
+T (17, 1, 3, 5, 7) \
+T (18, 3, 3, 3, 3) \
+T (19, 3, 2, 1, 0)
 #define EXPTESTS \
 T (116,1, 2, 4, 3) \
 T (117,7, 3, 3, 0) \
--- gcc/testsuite/gcc.dg/torture/vshuf-32.inc.jj 2011-10-20 14:13:38.0 +0200
+++ gcc/testsuite/gcc.dg/torture/vshuf-32.inc   2011-10-21 09:57:21.0 +0200
@@ -17,7 +17,9 @@ T (13,7, 51, 13, 61, 25, 4, 19, 58, 35,
 T (14, 22, 53, 28, 42, 45, 38, 49, 13, 54, 61, 21, 52, 7, 16, 34, 9, 1, 43, 
62, 43, 35, 50, 47, 58, 20, 3, 30, 15, 37, 53, 43, 36) \
 T (15, 2, 43, 49, 34, 28, 35, 29, 36, 51, 9, 17, 48, 10, 37, 45, 21, 52, 19, 
25, 33, 60, 31, 30, 42, 12, 26, 27, 46, 5, 40, 14, 36) \
 T (16, 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 
38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62) \
-T (17, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 
39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63)
+T (17, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 
39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63) \
+T (18, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 3, 3, 3, 3, 3) \
+T (19, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 
13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
 #define EXPTESTS \
 T (116,13, 38, 47, 3, 17, 8, 38, 20, 59, 61, 39, 26, 7, 49, 63, 43, 
57, 16, 40, 19, 4, 32, 27, 7, 52, 19, 46, 55, 36, 41, 48, 6) \
 T (117,39, 35, 59, 20, 56, 18, 58, 63, 57, 14, 2, 16, 5, 61, 35, 4, 
53, 9, 52, 51, 27

Re: [pph] Make libcpp symbol validation a warning (issue5235061)

2011-10-21 Thread Lawrence Crowl
On 10/20/11, Gabriel Charette  wrote:
> I just thought about something..
>
> Earlier I said that ALL line_table issues were resolved after this
> patch (as it ignores the re-included headers that were guarded, as the
> non-pph compiler does naturally).
>
> One problem remains however, I'm pretty sure that re-included
> non-pph'ed header's line_table entries are still showing up multiple
> times (as the direct non-pph children of a given pph_include have their
> line_table entries copied one by one from the pph file).
>
> I think we were talking about somehow remembering guards context in
> which DECLs were declared and then ignoring DECLs streamed in if they
> belong to a given "header guard type" that was previously seen in a
> prior include using the same non-pph header, allowing us to ignore
> those DECLs that are re-declared when they should have been guarded
> out the second time.
>
> I'm not sure whether there is machinery to handle non-pph re-includes
> yet... but... at the very least, I'm pretty sure those non-pph entries
> still show up multiple times in the line_table.
>
> Now, we can't just remove/ignore those entries either... doing so
> would alter the expected location offset (pph_loc_offset) applied to
> all tokens streamed in directly from the pph header.
>
> What we could potentially do is:
> - ignore the repeated non-pph entry
> - remember the number of locations this entry was "supposed" to take
> (call that pph_loc_ignored_offset)
> - then for DECLs imported after it we would then need an offset of
> pph_loc_offset - pph_loc_ignored_offset, to compensate for the missing
> entries in the line_table
>
> The problem here obviously is that I don't think we have a way of
> knowing which DECLs come before, inside, and after a given non-pph
> header included in the parent pph header which we are currently
> reading.
>
> Furthermore, a DECL coming after the non-pph header could potentially
> refer to something inside the ignored non-pph header and the
> source_location of the referred token would now be invalid (although
> that might already be fixed by the cache hit which would redirect that
> token reference to the same token in the first included copy of that
> same header which wasn't actually skipped as it was first and which is
> valid)
>
>
> On Tue, Oct 11, 2011 at 4:26 PM, Diego Novillo  wrote:
>> @@ -328,8 +327,6 @@ pph_in_line_table_and_includes (pph_stream *stream)
>>   int entries_offset = line_table->used -
>> PPH_NUM_IGNORED_LINE_TABLE_ENTRIES;
>>   enum pph_linetable_marker next_lt_marker = pph_in_linetable_marker
>> (stream);
>>
>> -  pph_reading_includes++;
>> -
>>   for (first = true; next_lt_marker != PPH_LINETABLE_END;
>>next_lt_marker = pph_in_linetable_marker (stream))
>> {
>> @@ -373,19 +370,33 @@ pph_in_line_table_and_includes (pph_stream *stream)
>>  else
>>lm->included_from += entries_offset;
>>
>
> Also, if we do ignore some non-pph entries, the included_from
> calculation is going to need some trickier logic as well (it's fine
> for the pph includes though as each child calculates its own offset)
>
>> - gcc_assert (lm->included_from < (int) line_table->used);
>> -
>
> Also, I think this slipped in my previous comment, but I don't see how
> this assert could trigger in the current code. If it did trigger
> something was definitely wrong as it asserts the offseted
> included_from is referring to an entry that is actually in the
> line_table...
>
>>  lm->start_location += pph_loc_offset;

I'm wondering if we shouldn't just whitelist the problematic cases
that we know about in the system/standard headers.  For all the others,
it seems we could reasonably complain to the maintainers of the code.

-- 
Lawrence Crowl


Ping: [PATCH] non-GNU C++ compilers

2011-10-21 Thread Marc Glisse

Hello,

anyone willing to commit this?


On Sat, 24 Sep 2011, Marc Glisse wrote:


On Sat, 17 Sep 2011, Joseph S. Myers wrote:


These are OK (with ChangeLog entries properly omitting the "include/",
since they go in include/ChangeLog) in the absence of libiberty maintainer
objections within 72 hours.


Thanks. Is someone willing to commit them now that they have been accepted? I am
attaching them as a single patch and copying the ChangeLog entries here for
convenience (I wrote Monday's date because it looks like a day when
someone might have time to commit...).


include/ChangeLog:

2011-09-26  Ulrich Drepper  

* obstack.h [!GNUC] (obstack_free): Avoid cast to int.

2011-09-26  Marc Glisse  

* ansidecl.h (ENUM_BITFIELD): Always use enum in C++.


--
Marc Glisse

Index: include/ansidecl.h
===
--- include/ansidecl.h  (revision 179146)
+++ include/ansidecl.h  (working copy)
@@ -416,10 +416,12 @@
 #define EXPORTED_CONST const
 #endif
 
-/* Be conservative and only use enum bitfields with GCC.
+/* Be conservative and only use enum bitfields with C++ or GCC.
FIXME: provide a complete autoconf test for buggy enum bitfields.  */
 
-#if (GCC_VERSION > 2000)
+#ifdef __cplusplus
+#define ENUM_BITFIELD(TYPE) enum TYPE
+#elif (GCC_VERSION > 2000)
 #define ENUM_BITFIELD(TYPE) __extension__ enum TYPE
 #else
 #define ENUM_BITFIELD(TYPE) unsigned int
Index: include/obstack.h
===
--- include/obstack.h   (revision 179146)
+++ include/obstack.h   (working copy)
@@ -532,9 +532,9 @@
 # define obstack_free(h,obj)   \
 ( (h)->temp = (char *) (obj) - (char *) (h)->chunk,\
   (((h)->temp > 0 && (h)->temp < (h)->chunk_limit - (char *) (h)->chunk)\
-   ? (int) ((h)->next_free = (h)->object_base  \
-   = (h)->temp + (char *) (h)->chunk)  \
-   : (((obstack_free) ((h), (h)->temp + (char *) (h)->chunk), 0), 0)))
+   ? (((h)->next_free = (h)->object_base   \
+   = (h)->temp + (char *) (h)->chunk), 0)  \
+   : ((obstack_free) ((h), (h)->temp + (char *) (h)->chunk), 0)))
 
 #endif /* not __GNUC__ or not __STDC__ */
 


[Patch,AVR] Clean-up avr.c: Break long lines, move targetm to end of file.

2011-10-21 Thread Georg-Johann Lay
No functional change, just a bit of clean-up.

Definition of targetm is moved towards the end of the file.

Ok for trunk?

Johann

* config/avr/avr.c: Break long lines.
Define target hooks on the fly if applicable.
(TARGET_ASM_FUNCTION_RODATA_SECTION): Remove first definition
overridden later.
(targetm): Move definition to end of file.
(avr_can_eliminate): Make static on the fly.
(avr_frame_pointer_required_p): Ditto.
(avr_hard_regno_scratch_ok): Ditto.
(avr_builtin_setjmp_frame_value): Make static on the fly.
Indent according to coding rules.
(avr_case_values_threshold): Ditto.
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 180308)
+++ config/avr/avr.c	(working copy)
@@ -56,7 +56,9 @@
 
 #define AVR_SECTION_PROGMEM (SECTION_MACH_DEP << 0)
 
-static void avr_option_override (void);
+
+/* Prototypes for local helper functions.  */
+
 static int avr_naked_function_p (tree);
 static int interrupt_function_p (tree);
 static int signal_function_p (tree);
@@ -69,52 +71,18 @@ static const char *ptrreg_to_str (int);
 static const char *cond_string (enum rtx_code);
 static int avr_num_arg_regs (enum machine_mode, const_tree);
 
-static rtx avr_legitimize_address (rtx, rtx, enum machine_mode);
 static tree avr_handle_progmem_attribute (tree *, tree, tree, int, bool *);
 static tree avr_handle_fndecl_attribute (tree *, tree, tree, int, bool *);
 static tree avr_handle_fntype_attribute (tree *, tree, tree, int, bool *);
-static bool avr_assemble_integer (rtx, unsigned int, int);
-static void avr_file_start (void);
-static void avr_file_end (void);
-static bool avr_legitimate_address_p (enum machine_mode, rtx, bool);
-static void avr_asm_function_end_prologue (FILE *);
-static void avr_asm_function_begin_epilogue (FILE *);
-static bool avr_cannot_modify_jumps_p (void);
-static rtx avr_function_value (const_tree, const_tree, bool);
-static rtx avr_libcall_value (enum machine_mode, const_rtx);
-static bool avr_function_value_regno_p (const unsigned int);
-static void avr_insert_attributes (tree, tree *);
-static void avr_asm_init_sections (void);
-static unsigned int avr_section_type_flags (tree, const char *, int);
-
-static void avr_reorg (void);
-static void avr_asm_out_ctor (rtx, int);
-static void avr_asm_out_dtor (rtx, int);
-static int avr_register_move_cost (enum machine_mode, reg_class_t, reg_class_t);
-static int avr_memory_move_cost (enum machine_mode, reg_class_t, bool);
 static int avr_operand_rtx_cost (rtx, enum machine_mode, enum rtx_code,
  int, bool);
-static bool avr_rtx_costs (rtx, int, int, int, int *, bool);
-static int avr_address_cost (rtx, bool);
-static bool avr_return_in_memory (const_tree, const_tree);
 static struct machine_function * avr_init_machine_status (void);
-static void avr_init_builtins (void);
-static rtx avr_expand_builtin (tree, rtx, rtx, enum machine_mode, int);
-static rtx avr_builtin_setjmp_frame_value (void);
-static bool avr_hard_regno_scratch_ok (unsigned int);
-static unsigned int avr_case_values_threshold (void);
-static bool avr_frame_pointer_required_p (void);
-static bool avr_can_eliminate (const int, const int);
-static bool avr_class_likely_spilled_p (reg_class_t c);
-static rtx avr_function_arg (cumulative_args_t , enum machine_mode,
-			 const_tree, bool);
-static void avr_function_arg_advance (cumulative_args_t, enum machine_mode,
-  const_tree, bool);
-static bool avr_function_ok_for_sibcall (tree, tree);
-static void avr_asm_named_section (const char *name, unsigned int flags, tree decl);
-static void avr_encode_section_info (tree, rtx, int);
-static section* avr_asm_function_rodata_section (tree);
-static section* avr_asm_select_section (tree, int, unsigned HOST_WIDE_INT);
+
+
+/* Prototypes for hook implementors if needed before their implementation.  */
+
+static bool avr_rtx_costs (rtx, int, int, int, int *, bool);
+
 
 /* Allocate registers from r25 to r8 for parameters for function calls.  */
 #define FIRST_CUM_REG 26
@@ -197,8 +165,6 @@ static const struct attribute_spec avr_a
 
 #undef TARGET_ATTRIBUTE_TABLE
 #define TARGET_ATTRIBUTE_TABLE avr_attribute_table
-#undef TARGET_ASM_FUNCTION_RODATA_SECTION
-#define TARGET_ASM_FUNCTION_RODATA_SECTION default_no_function_rodata_section
 #undef TARGET_INSERT_ATTRIBUTES
 #define TARGET_INSERT_ATTRIBUTES avr_insert_attributes
 #undef TARGET_SECTION_TYPE_FLAGS
@@ -274,7 +240,6 @@ static const struct attribute_spec avr_a
 #undef TARGET_ASM_FUNCTION_RODATA_SECTION
 #define TARGET_ASM_FUNCTION_RODATA_SECTION avr_asm_function_rodata_section
 
-struct gcc_target targetm = TARGET_INITIALIZER;
 
 
 /* Custom function to replace string prefix.
@@ -535,7 +500,7 @@ avr_regs_to_save (HARD_REG_SET *set)
 
 /* Return true if register FROM can be eliminated via register TO.  */
 
-bool
+static bool
 avr_can_eliminate (const int from, const int to)
 {
   re

[PATCH, libcpp] Fix cpp_peek_token behaviour (PR bootstrap/50778)

2011-10-21 Thread Dodji Seketeli
Hello,

cpp_peek_token can sometimes fail to peek tokens because
_cpp_remaining_tokens_num_in_context counts tokens only from the current
context; worse, _cpp_token_from_context_at also gets tokens only from the
current context.  They should both operate on the context passed down by
cpp_peek_token.  I made that mistake in the first place.

This was breaking bootstrap on ppc-darwin.

Fixed thus, bootstrapped on x86_64-unknown-linux-gnu, tested on
ppc-darwin, and a full (very slow) bootstrap is ongoing on ppc-darwin.

OK for trunk when ppc-darwin finishes? (or even before?)

libcpp/

* include/internal.h (_cpp_remaining_tokens_num_in_context): Take the
context to act upon.
* lex.c (_cpp_remaining_tokens_num_in_context): Likewise.  Update
comment.
(cpp_token_from_context_at): Likewise.
(cpp_peek_token): Use the context to peek tokens from.
---
 libcpp/internal.h |2 +-
 libcpp/lex.c  |   16 +++-
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/libcpp/internal.h b/libcpp/internal.h
index 6fb2606..e60330df 100644
--- a/libcpp/internal.h
+++ b/libcpp/internal.h
@@ -652,7 +652,7 @@ extern cpp_token *_cpp_lex_direct (cpp_reader *);
 extern int _cpp_equiv_tokens (const cpp_token *, const cpp_token *);
 extern void _cpp_init_tokenrun (tokenrun *, unsigned int);
 extern cpp_hashnode *_cpp_lex_identifier (cpp_reader *, const char *);
-extern int _cpp_remaining_tokens_num_in_context (cpp_reader *);
+extern int _cpp_remaining_tokens_num_in_context (cpp_context *);
 
 /* In init.c.  */
 extern void _cpp_maybe_push_include_file (cpp_reader *);
diff --git a/libcpp/lex.c b/libcpp/lex.c
index 527368b..896a3be 100644
--- a/libcpp/lex.c
+++ b/libcpp/lex.c
@@ -1703,12 +1703,11 @@ next_tokenrun (tokenrun *run)
   return run->next;
 }
 
-/* Return the number of not yet processed token in the the current
+/* Return the number of not yet processed token in a given
context.  */
 int
-_cpp_remaining_tokens_num_in_context (cpp_reader *pfile)
+_cpp_remaining_tokens_num_in_context (cpp_context *context)
 {
-  cpp_context *context = pfile->context;
   if (context->tokens_kind == TOKENS_KIND_DIRECT)
 return (LAST (context).token - FIRST (context).token);
   else if (context->tokens_kind == TOKENS_KIND_INDIRECT
@@ -1718,12 +1717,11 @@ _cpp_remaining_tokens_num_in_context (cpp_reader *pfile)
   abort ();
 }
 
-/* Returns the token present at index INDEX in the current context.
-   If INDEX is zero, the next token to be processed is returned.  */
+/* Returns the token present at index INDEX in a given context.  If
+   INDEX is zero, the next token to be processed is returned.  */
 static const cpp_token*
-_cpp_token_from_context_at (cpp_reader *pfile, int index)
+_cpp_token_from_context_at (cpp_context *context, int index)
 {
-  cpp_context *context = pfile->context;
   if (context->tokens_kind == TOKENS_KIND_DIRECT)
 return &(FIRST (context).token[index]);
   else if (context->tokens_kind == TOKENS_KIND_INDIRECT
@@ -1744,10 +1742,10 @@ cpp_peek_token (cpp_reader *pfile, int index)
   /* First, scan through any pending cpp_context objects.  */
   while (context->prev)
 {
-  ptrdiff_t sz = _cpp_remaining_tokens_num_in_context (pfile);
+  ptrdiff_t sz = _cpp_remaining_tokens_num_in_context (context);
 
   if (index < (int) sz)
-return _cpp_token_from_context_at (pfile, index);
+return _cpp_token_from_context_at (context, index);
   index -= (int) sz;
   context = context->prev;
 }
-- 
1.7.6.4


-- 
Dodji


Ping: demangle C++ extern "C" functions

2011-10-21 Thread Marc Glisse


Ping (changing the Cc to bother a different person...).

On Sat, 3 Sep 2011, Marc Glisse wrote:


Hello,

this patch is obviously related to PR c++/2316 ("g++ fails to overload on 
language linkage") but seems fairly independent. Currently, the demangler 
recognizes 'Y' for extern "C" functions and ignores it. The patch makes it 
print extern "C" after the function type:

_Z1aIFYviEEvPT_
void a(void (*)(int) extern "C")

Writing it before the return type seems more natural, but it is ambiguous. I 
guess it could also be printed in the middle (next to the star that indicates 
a function pointer), but placing it like the cv-qualifiers of member 
functions seemed good (plus, that's where Oracle puts it in its 
implementation of c++filt).
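
For reference, here is my reading of how the example mangling above breaks
down (illustration only, not part of the patch):

_Z1aIFYviEEvPT_
  1a       the function name "a"
  I...E    its template argument list, containing one type:
  FYviE    a function type; 'Y' marks extern "C", 'v' the void return,
           'i' the int parameter
  v        the return type of 'a' itself (void)
  PT_      one parameter: pointer to the first template argument

which is why the linkage has to be printed somewhere inside
"void a(void (*)(int) extern "C")".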


Since g++ doesn't generate such mangling, the effect should be hard to notice 
;-)


(Even if the patch was ok, I am not a committer)

2011-09-03  Marc Glisse  

   * include/demangle.h (demangle_component_type)
   [DEMANGLE_COMPONENT_EXTERN_C]: Handle extern "C".
   * libiberty/cp-demangle.c (d_dump): Likewise.
   (d_make_comp): Likewise.
   (d_function_type): Likewise.
   (d_print_comp): Likewise.
   (d_print_mod_list): Likewise.
   (d_print_mod): Likewise.
   (d_print_function_type): Likewise.
   * libiberty/testsuite/demangle-expected: Test it.


--
Marc Glisse

Index: include/demangle.h
===
--- include/demangle.h  (revision 178498)
+++ include/demangle.h  (working copy)
@@ -288,6 +288,9 @@
   /* The const qualifier.  The one subtree is the type which is being
  qualified.  */
   DEMANGLE_COMPONENT_CONST,
+  /* extern "C" linkage.  The one subtree is the function type which
+ is being qualified.  */
+  DEMANGLE_COMPONENT_EXTERN_C,
   /* The restrict qualifier modifying a member function.  The one
  subtree is the type which is being qualified.  */
   DEMANGLE_COMPONENT_RESTRICT_THIS,
Index: libiberty/testsuite/demangle-expected
===
--- libiberty/testsuite/demangle-expected   (revision 178498)
+++ libiberty/testsuite/demangle-expected   (working copy)
@@ -4151,3 +4151,8 @@
 --format=auto
 
_ZN3Psi7VariantIIcPKcEE5visitIIRZN11VariantTest9TestVisit11test_methodEvEUlS2_E0_RZNS6_11test_methodEvEUlcE1_RZNS6_11test_methodEvEUlNS_4NoneEE_EEENS_13VariantDetail19SelectVisitorResultIIDpT_EE4typeEDpOSG_
 
Psi::VariantDetail::SelectVisitorResult::type 
Psi::Variant::visit((VariantTest::TestVisit::test_method()::{lambda(Psi::None)#1}&)...)
+#
+# extern "C" linkage for function types.
+--format=gnu-v3
+_Z1aIFYviEEvPT_
+void a(void (*)(int) extern "C")
Index: libiberty/cp-demangle.c
===
--- libiberty/cp-demangle.c (revision 178498)
+++ libiberty/cp-demangle.c (working copy)
@@ -591,6 +591,9 @@
 case DEMANGLE_COMPONENT_CONST:
   printf ("const\n");
   break;
+case DEMANGLE_COMPONENT_EXTERN_C:
+  printf ("extern \"C\"\n");
+  break;
 case DEMANGLE_COMPONENT_RESTRICT_THIS:
   printf ("restrict this\n");
   break;
@@ -807,6 +810,7 @@
   break;
 
   /* These types only require one parameter.  */
+case DEMANGLE_COMPONENT_EXTERN_C:
 case DEMANGLE_COMPONENT_VTABLE:
 case DEMANGLE_COMPONENT_VTT:
 case DEMANGLE_COMPONENT_TYPEINFO:
@@ -2324,18 +2328,22 @@
 d_function_type (struct d_info *di)
 {
   struct demangle_component *ret;
+  int is_extern_c = 0;
 
   if (! d_check_char (di, 'F'))
 return NULL;
   if (d_peek_char (di) == 'Y')
 {
-  /* Function has C linkage.  We don't print this information.
-FIXME: We should print it in verbose mode.  */
+  /* Function has C linkage.  */
+  is_extern_c = 1;
   d_advance (di, 1);
+  di->expansion += sizeof "extern \"C\"";
 }
   ret = d_bare_function_type (di, 1);
   if (! d_check_char (di, 'E'))
 return NULL;
+  if (is_extern_c)
+ret = d_make_comp (di, DEMANGLE_COMPONENT_EXTERN_C, ret, NULL);
   return ret;
 }
 
@@ -3925,6 +3933,7 @@
 case DEMANGLE_COMPONENT_RESTRICT_THIS:
 case DEMANGLE_COMPONENT_VOLATILE_THIS:
 case DEMANGLE_COMPONENT_CONST_THIS:
+case DEMANGLE_COMPONENT_EXTERN_C:
 case DEMANGLE_COMPONENT_VENDOR_TYPE_QUAL:
 case DEMANGLE_COMPONENT_POINTER:
 case DEMANGLE_COMPONENT_COMPLEX:
@@ -4537,7 +4546,8 @@
   || (! suffix
  && (mods->mod->type == DEMANGLE_COMPONENT_RESTRICT_THIS
  || mods->mod->type == DEMANGLE_COMPONENT_VOLATILE_THIS
- || mods->mod->type == DEMANGLE_COMPONENT_CONST_THIS)))
+ || mods->mod->type == DEMANGLE_COMPONENT_CONST_THIS
+ || mods->mod->type == DEMANGLE_COMPONENT_EXTERN_C)))
 {
   d_print_mod_list (dpi, options, mods->next, suffix);
   return;
@@ -4628,6 +4638,9 @@
 case DEMANGLE_COMPONENT_CONST_THIS:
   d_append_string (dpi, " const");
   return;
+case

RE: [Patch,AVR] Clean-up avr.c: Break long lines, move targetm to end of file.

2011-10-21 Thread Weddington, Eric


> -Original Message-
> From: Georg-Johann Lay [mailto:a...@gjlay.de]
> Sent: Friday, October 21, 2011 12:52 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Weddington, Eric; Denis Chertykov
> Subject: [Patch,AVR] Clean-up avr.c: Break long lines, move targetm to end
> of file.
> 
> No functional change, just a bit of clean-up.
> 
> Definition of targetm is moved towards end of the file.
> 
> Ok for trunk?

Please commit. :-)


Re: [patch] dwarf2out: Use DW_FORM_ref_udata (.debug -= 7.53%)

2011-10-21 Thread Jakub Jelinek
On Fri, Oct 21, 2011 at 08:01:29PM +0200, Jan Kratochvil wrote:
> I had a draft patch to use DW_FORM_ref1, DW_FORM_ref2, DW_FORM_ref_udata and
> DW_FORM_ref4 which should be slightly smaller than just DW_FORM_ref_udata.
> But it produced larger files than just DW_FORM_ref_udata.  Assuming it was due
> to multiplied abbrev definitions.  One would need to decide when if it is
> worth to create a new smaller-sized abbrev but that seems too complicated for
> such kind of optimization.  There exist other project(s) in development for
> DWARF optimizations as a post-processing tool, this patch is meant just as
> a very simple way to reduce the (possibly intermediate) DWARF files.

Well, you calculate the sizes multiple times anyway, so I don't see why,
during the size calculations, you couldn't start with DW_FORM_ref_udata
as a first guess and also compute on the side the total size of those
DW_FORM_ref_udata bytes and use that number plus the guessed length
of the whole CU to decide if replacing all DW_FORM_ref_udata with
DW_FORM_ref{1,2,4} wouldn't be beneficial.  Small complication for that is
that when there are multiple .debug_info/.debug_types sections that share
the same .debug_abbrev, the decision needs to be done for all of them
together.
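
In rough code the decision could look something like this -- hypothetical
names, not actual dwarf2out.c code, only a first-order estimate (switching
forms also shifts the offsets themselves), and it ignores the shared
.debug_abbrev complication:

/* CU_SIZE is the guessed CU length computed with DW_FORM_ref_udata,
   UDATA_REF_BYTES the bytes the intra-CU references took and NUM_REFS
   their count.  Return the fixed reference size to switch to, or 0 to
   keep DW_FORM_ref_udata.  */
static unsigned int
preferred_fixed_ref_size (unsigned long cu_size,
                          unsigned long udata_ref_bytes,
                          unsigned long num_refs)
{
  /* Smallest fixed form able to address any offset inside the CU.  */
  unsigned int fixed
    = cu_size < (1UL << 8) ? 1 : cu_size < (1UL << 16) ? 2 : 4;

  return fixed * num_refs < udata_ref_bytes ? fixed : 0;
}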

BTW, is your 

> +  /* The DIE sizes can increase, due to DW_FORM_ref_udata size increase
> + dependent on increases of other DIE_OFFSETs.  */
> +  do
> +{
> +  /* Initialize the beginning DIE offset - and calculate sizes/offsets.  
> */
> +  next_die_offset = init_die_offset;
> +  calc_die_sizes_change = false;
> +  calc_die_sizes (die);
> +}
> +  while (calc_die_sizes_change);

loop guaranteed to terminate?  If the CU is either only growing or only
shrinking then it hopefully should, but it would be nice to assert that.
For references to DIEs with lower offsets you start with a roughly correct
guess, for references to DIEs with higher offsets you start with 0 and then
just keep growing?

Jakub


Re: [libcpp] Correctly define __cplusplus (PR libstdc++-v3/1773)

2011-10-21 Thread Marc Glisse

On Tue, 9 Aug 2011, Jason Merrill wrote:


On 08/09/2011 09:14 AM, Marc Glisse wrote:

I don't think we should define the C++ 2011 value yet. In my opinion, we
should wait until:
1) the standard is official
2) gcc implements most of it: people will want to use __cplusplus as a
test to know whether they can use C++0X features, not whether the compiler
makes some effort to implement half of them.


I'm of two minds about this, but I see that clang and edg still use 199711L 
in C++0x mode, so let's stick with that for now.


Note that at least clang now defines __cplusplus to its new C++11 value 
(in experimental C++0X mode only). Apparently they switched around last 
June and say they are not the only ones. So if you want to follow their 
lead...


--
Marc Glisse


Re: your mail

2011-10-21 Thread Jakub Jelinek
On Fri, Oct 21, 2011 at 09:54:56AM -0700, Steve Ellcey wrote:
> FYI: I am seeing this same ICE on the hppa64-hp-hpux11.11 bootstrap.
> 
> (debug_expr:DI D#49)
> /ctires/gcc/nightly/src/trunk/gcc/cselib.c: In function 'void 
> cselib_record_sets(rtx)':
> /ctires/gcc/nightly/src/trunk/gcc/cselib.c:2424:1: internal compiler error: 
> in mem_loc_descriptor, at dwarf2out.c:12379
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See  for instructions.
> 
> I am trying to cut down the test case and find out exactly when
> it started failing.  The last successful bootstrap I had was r180174 and
> the first known bad one is r180233.

My guess would be
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=180194
DEBUG_EXPR certainly shouldn't make it through into mem_loc_descriptor;
var-tracking is supposed to resolve that.

Jakub


Re: [PATCH] Fix V4DImode/V8SImode extract even/odd permutation with -mavx (PR target/50813)

2011-10-21 Thread Richard Henderson
On 10/21/2011 11:42 AM, Jakub Jelinek wrote:
>   PR target/50813
>   * config/i386/i386.c (expand_vec_perm_even_odd_1): Handle
>   V4DImode and V8SImode for !TARGET_AVX2.
> 
>   * gcc.dg/torture/vshuf-32.inc: Add broadcast permutation
>   from element other than first and reverse permutation.
>   * gcc.dg/torture/vshuf-16.inc: Likewise.
>   * gcc.dg/torture/vshuf-8.inc: Likewise.
>   * gcc.dg/torture/vshuf-4.inc: Likewise.

Ok.


r~


Re: [RFA:] fix breakage with "Update testsuite to run with slim LTO"

2011-10-21 Thread Hans-Peter Nilsson
> Date: Fri, 21 Oct 2011 20:34:05 +0200
> From: Jan Hubicka 

> I guess we could make ipa-dump/rtl-dump/tree-dump scanning to disable fat lto
> and introduce variants intended to scan late tree dumps and ipa execution 
> dumps...

Ok, sounds like a plan.  Are there any such
scan-tests-with-late-thin-lto at present?

> Not sure if it would make more sense than just doing it
> explicitely in tests.

Depends on the number of tests there'll eventually be; I just
see it as strictly increasing and even now a bit above
break-even for improving the machinery instead of patching
separate tests.

BTW, was your ok an approval, or do I need an ok from a testsuite
maintainer?

brgds, H-P


Re: [v3] libstdc++/50196 - enable std::thread, std::mutex etc. on darwin

2011-10-21 Thread Jonathan Wakely
On 21 October 2011 09:15, Jonathan Wakely wrote:
> On 21 October 2011 00:43, Jonathan Wakely wrote:
>> This patch should enable macosx support for  and partial
>> support for , by defining _GLIBCXX_HAS_GTHREADS on POSIX
>> systems without the _POSIX_TIMEOUTS option, and only disabling the
>> types which rely on the Timeouts option, std::timed_mutex and
>> std::recursive_timed_mutex, instead of disabling all thread support.
>
> I've just realised this patch will disable the timed mutexes on
> non-posix platforms - I should only check for _POSIX_TIMEOUTS when
> thread-model = posix, and set HAS_MUTEX_TIMEDLOCK unconditionally
> elsewhere.

Updated patch so that _GTHREADS_HAS_MUTEX_TIMED_LOCK is 1 for
non-posix systems and posix systems that support the Timeouts option.

Tested x86_64-linux.  I think this is OK now and plan to commit it
over the weekend.

* acinclude.m4 (GLIBCXX_HAS_GTHREADS): Don't depend on _POSIX_TIMEOUTS.
* configure: Regenerate.
* include/std/mutex (timed_mutex, recursive_timed_mutex): Define
conditionally on GTHREADS_HAS_MUTEX_TIMEDLOCK.
* testsuite/lib/libstdc++.exp (check_v3_target_gthreads_timed): Define.
* testsuite/lib/dg-options.exp (dg-require-gthreads-timed): Define.
* testsuite/30_threads/recursive_timed_mutex/dest/destructor_locked.cc:
Use dg-require-gthreads-timed instead of dg-require-gthreads.
* testsuite/30_threads/recursive_timed_mutex/native_handle/
typesizes.cc: Likewise.
* testsuite/30_threads/recursive_timed_mutex/native_handle/1.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/try_lock_until/1.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/try_lock_until/2.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/cons/assign_neg.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/cons/1.cc: Likewise.
* testsuite/30_threads/recursive_timed_mutex/cons/copy_neg.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/requirements/typedefs.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/try_lock/1.cc: Likewise.
* testsuite/30_threads/recursive_timed_mutex/try_lock/2.cc: Likewise.
* testsuite/30_threads/recursive_timed_mutex/lock/1.cc: Likewise.
* testsuite/30_threads/recursive_timed_mutex/lock/2.cc: Likewise.
* testsuite/30_threads/recursive_timed_mutex/unlock/1.cc: Likewise.
* testsuite/30_threads/recursive_timed_mutex/try_lock_for/1.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/try_lock_for/2.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/try_lock_for/3.cc:
Likewise.
* testsuite/30_threads/timed_mutex/dest/destructor_locked.cc: Likewise.
* testsuite/30_threads/timed_mutex/native_handle/typesizes.cc:
Likewise.
* testsuite/30_threads/timed_mutex/native_handle/1.cc: Likewise.
* testsuite/30_threads/timed_mutex/try_lock_until/1.cc: Likewise.
* testsuite/30_threads/timed_mutex/try_lock_until/2.cc: Likewise.
* testsuite/30_threads/timed_mutex/cons/assign_neg.cc: Likewise.
* testsuite/30_threads/timed_mutex/cons/1.cc: Likewise.
* testsuite/30_threads/timed_mutex/cons/copy_neg.cc: Likewise.
* testsuite/30_threads/timed_mutex/requirements/standard_layout.cc:
Likewise.
* testsuite/30_threads/timed_mutex/requirements/typedefs.cc: Likewise.
* testsuite/30_threads/timed_mutex/try_lock/1.cc: Likewise.
* testsuite/30_threads/timed_mutex/try_lock/2.cc: Likewise.
* testsuite/30_threads/timed_mutex/lock/1.cc: Likewise.
* testsuite/30_threads/timed_mutex/unlock/1.cc: Likewise.
* testsuite/30_threads/timed_mutex/try_lock_for/1.cc: Likewise.
* testsuite/30_threads/timed_mutex/try_lock_for/2.cc: Likewise.
* testsuite/30_threads/timed_mutex/try_lock_for/3.cc: Likewise.
Index: acinclude.m4
===
--- acinclude.m4	(revision 180278)
+++ acinclude.m4	(working copy)
@@ -3358,11 +3358,19 @@
   ac_save_CXXFLAGS="$CXXFLAGS"
   CXXFLAGS="$CXXFLAGS -fno-exceptions -I${toplevel_srcdir}/gcc"
 
-  AC_MSG_CHECKING([check whether it can be safely assumed that mutex_timedlock is available])
+  target_thread_file=`$CXX -v 2>&1 | sed -n 's/^Thread model: //p'`
+  case $target_thread_file in
+posix)
+  CXXFLAGS="$CXXFLAGS -DSUPPORTS_WEAK -DGTHREAD_USE_WEAK -D_PTHREADS"
+  esac
 
+  AC_MSG_CHECKING([whether it can be safely assumed that mutex_timedlock is available])
+
   AC_TRY_COMPILE([#include ],
 [
-  #if !defined(_POSIX_TIMEOUTS) || _POSIX_TIMEOUTS < 0
+  // In case of POSIX threads check _POSIX_TIMEOUTS.
+  #if (defined(_PTHREADS) \
+  && (!defined(_POSIX_TIMEOUTS) || _POSIX_TIMEOUTS <= 0))
   #error
   #endif
 ], [ac_g

Re: [patch] dwarf2out: Use DW_FORM_ref_udata (.debug -= 7.53%)

2011-10-21 Thread Jan Kratochvil
On Fri, 21 Oct 2011 21:11:16 +0200, Jakub Jelinek wrote:
> Well, you calculate the sizes multiple times anyway, so I don't see why you
> during the size calculations you couldn't start with DW_FORM_ref_udata
> as first guess and compute on the side also total sizes of those
> DW_FORM_ref_udata bytes and use that number plus the guessed length
> of the whole CU to decide if replacing all DW_FORM_ref_udata with
> DW_FORM_ref{1,2,4} wouldn't be beneficial.

The optimal sizes are:
value less than 1 <<  8: DW_FORM_ref1
value less than 1 << 16: DW_FORM_ref2
value less than 1 << 21: DW_FORM_ref_udata
value less than 1 << 32: DW_FORM_ref4

One would have to decide for each size specifically whether it isn't worth
using the larger size for the few instances.  It could be done, but the
expected (not measured) gain over plain DW_FORM_ref_udata is not big.  The
currently suboptimal ranges are:
(1 <<  7) <= value < (1 <<  8)
(1 << 14) <= value < (1 << 16)
(1 << 28) <= value

This suboptimal loss is at most 9316 bytes = 0.177% of the .debug size, and it
will be less in practice, as I do not account for the abbrev multiplication
here at all:
readelf -wi libstdc++.so|perl -nle 'next if !/^.*<(0x[0-9a-f]+)>.*$/;$x=eval $1;$b++ if (1<<7)<=$x&&$x<(1<<8)||(1<<14)<=$x&&$x<(1<<16)||(1<<28)<=$x;END{print $b;}'
9316

IMHO it is not worth it.
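
For reference, the uleb128 size behind the table above is simply one byte
per 7 bits of value -- essentially what size_of_uleb128 computes -- which is
where the 1 << 7 / 1 << 14 / 1 << 28 break points come from, and it is
monotone in the value, which the termination argument below also relies on.
A sketch:

#include <stddef.h>

/* Bytes a value occupies as unsigned LEB128: 7 value bits per byte, so
   1 byte below 1 << 7, 2 bytes below 1 << 14, and so on.  */
static size_t
uleb128_size (unsigned long value)
{
  size_t size = 0;

  do
    {
      value >>= 7;
      size++;
    }
  while (value != 0);

  return size;
}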

Also this problem is already present there.  The following code may enlarge the
output in specific cases due to duplication of the abbrev definitions:
  /* If the string is shorter or equal to the size of the reference, it is
 always better to put it inline.  */
 ^^ = not always
  if (len <= DWARF_OFFSET_SIZE || node->refcount == 0)
return node->form = DW_FORM_string;

This patch has a size regression for DIE_OFFSETs larger than 1 << 28 = 256MB,
where DW_FORM_ref_udata becomes 5 bytes while DW_FORM_ref4 was 4 bytes.
Still, I do not think we need to care about optimal output for a single CU
larger than 256MB.


> > +  /* The DIE sizes can increase, due to DW_FORM_ref_udata size increase
> > + dependent on increases of other DIE_OFFSETs.  */
> > +  do
> > +{
> > +  /* Initialize the beginning DIE offset - and calculate 
> > sizes/offsets.  */
> > +  next_die_offset = init_die_offset;
> > +  calc_die_sizes_change = false;
> > +  calc_die_sizes (die);
> > +}
> > +  while (calc_die_sizes_change);
> 
> loop guaranteed to terminate?  If the CU is either only growing or only
> shrinking then it hopefully should, but it would be nice to assert that.
> For references to DIEs with lower offsets you start with a roughly correct
> guess, for references to DIEs with higher offsets you start with 0 and then
> just keep growing?

Yes.  The initial DIE_OFFSETs are 0, therefore their sizeof (uleb128) is 1.
As I assume
VAL1 <= VAL2 implies sizeof (uleb128 (VAL1)) <= sizeof (uleb128 (VAL2))

the updated assertion must hold on each calc_die_sizes call:
gcc_assert ((unsigned long int) die->die_offset <= next_die_offset);

I do not see how anything could shrink anywhere.


Thanks,
Jan


Re: [libcpp] Correctly define __cplusplus (PR libstdc++-v3/1773)

2011-10-21 Thread Jason Merrill

On 10/21/2011 03:11 PM, Marc Glisse wrote:

Note that at least clang now defines __cplusplus to its new C++11 value
(in experimental C++0X mode only). Apparently they switched around last
June and say they are not the only ones. So if you want to follow their
lead...


Hmm, between that and the fact that 4.7 will in fact have almost all of 
the C++11 features, I think changing the value makes sense.
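
For user code the practical upshot is being able to write, for example:

#if __cplusplus >= 201103L
/* C++11 (or later) language mode.  */
#else
/* C++98/03 mode.  */
#endif

with 201103L being the value the 2011 standard specifies (illustration only,
not part of any patch in this thread).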


Jason


Re: [PATCH, m68k] Fix floating-point comparisons with zero

2011-10-21 Thread Andreas Schwab
Julian Brown  writes:

> gcc/
> * config/m68k/m68k.c (notice_update_cc): Tighten condition for
> setting CC_REVERSED for FP comparisons.

Ok.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [C++-11] User defined literals

2011-10-21 Thread Tom Tromey
> "Ed" == Ed Smith-Rowland <3dw...@verizon.net> writes:

Ed> +  /* Nonzero for the 2011 C++ Standard.  */
Ed> +  unsigned char cxx11;

I think it would be better if the new field name reflected its purpose,
so something like "user_literals".

Ed> +  if (ISIDST(*cur))
Ed> +   {
Ed> + type = cpp_userdef_string_add_type (type);
Ed> + ++cur;
Ed> +   }
Ed> +  while (ISIDNUM(*cur))

There are a few spots like this that are missing a space before an open
paren.

Otherwise the libcpp changes seem fine to me.  I don't actually know the
C++0x user-defined literal spec, though, so someone else will have to
review it for correctness against that.
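
For context, a user-defined literal in C++11 looks roughly like this -- an
illustration of the feature only, not code taken from the patch or its
tests:

#include <cstddef>

// Literal operator: 4_kb becomes 4 * 1024, evaluated at compile time.
constexpr std::size_t operator"" _kb (unsigned long long n)
{
  return n * 1024;
}

static_assert (4_kb == 4096, "user-defined literal works");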

Tom


Re: [cxx-mem-model] Handle x86-64 with -m32

2011-10-21 Thread Joseph S. Myers
On Fri, 21 Oct 2011, Aldy Hernandez wrote:

> > > X32 uses x86-64 instruction set with 32bit pointers. It has the same
> > > atomic support as x86-64 and has atomic support for int128.
> > 
> > Oh, you aren't talking about 32 bit, but a 32 bit abi on a 64 bit machine.
> 
> Thanks for pointing this out Joseph.
> 
> The following patch handles both x86_64 and i?86, but only returns true for
> LP64.  Is this what you had in mind?

My understanding from the x32 discussion is that the relevant condition is 
"using 64-bit instructions", not "using an LP64 ABI".  That might be "! 
ia32" in effective-target terms.
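
In a testcase that would look something along the lines of

/* Run on x86-64 and x32, but not on plain 32-bit ia32.  */
/* { dg-do run { target { ! ia32 } } } */

which is a sketch of the dg syntax, not the final patch.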

-- 
Joseph S. Myers
jos...@codesourcery.com

