date:20110820

[patch, committed] gfortran.dg/graphite/interchange-1.f: Remove xfail

2011-08-20 Thread Tobias Burnus

I saw that the test case now XPASSes, and I also find some XPASSes in
gcc-testresults. (Not in all of them, but I think those builds do not
include Graphite.)


I think the XPASS is due to 
http://gcc.gnu.org/ml/fortran/2011-08/msg00023.html


Committed patch as Rev. 177923.

Tobias
Index: gcc/testsuite/gfortran.dg/graphite/interchange-1.f
===
--- gcc/testsuite/gfortran.dg/graphite/interchange-1.f	(Revision 177922)
+++ gcc/testsuite/gfortran.dg/graphite/interchange-1.f	(Arbeitskopie)
@@ -41,5 +41,5 @@
 ! known to be 4 in the inner two loops.  See interchange-2.f for the
 ! kernel from bwaves.
 
-! { dg-final { scan-tree-dump-times "will be interchanged" 1 "graphite" { xfail *-*-* } } }
+! { dg-final { scan-tree-dump-times "will be interchanged" 1 "graphite" } }
 ! { dg-final { cleanup-tree-dump "graphite" } }
Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog	(Revision 177922)
+++ gcc/testsuite/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,7 @@
+2011-08-20  Tobias Burnus  
+
+	* gfortran.dg/graphite/interchange-1.f: Remove xfail.
+
 2011-08-19  Mikael Morin  
 
 	PR fortran/50129

Re: Vector Comparison patch

2011-08-20 Thread Richard Guenther

On Fri, Aug 19, 2011 at 5:22 PM, Artem Shinkarov
 wrote:
> On Fri, Aug 19, 2011 at 3:54 PM, Richard Guenther
>  wrote:
>> On Fri, Aug 19, 2011 at 2:29 AM, Artem Shinkarov
>>  wrote:
>>> Hi, I had the problem with passing information about single variable
>>> from expand_vec_cond_expr optab into ix86_expand_*_vcond.
>>>
>>> I looked into this problem for quite a while and found a solution.
>>> Now the question is whether it could be done better.
>>>
>>> First of all the problem:
>>>
>>> If we represent any vector comparison with VEC_COND_EXPR < v0  v1
>>> ? {-1,...} : {0,...} >, then in the assembler we do not want to see
>>> this useless comparison with {-1...}.
>>>
>>> Now it is easy to fix the problem about excessive masking. The real
>>> challenge starts when the comparison inside vcond is expressed as a
>>> variable. In that case in order to construct correct vector expression
>>> we need to adjust cond in cond ? v0 : v1 to  cond == {-1...} or as we
>>> agreed recently cond != {0,..}. But that we need to do only to make
>>> vec_cond_expr happy. On the level of assembler we don't want this
>>> condition.
>>>
>>> Now, if I just construct the tree, then in x86, rtx_equal_p, does not
>>> know that this is a constant vector full of -1, because the comparison
>>> operands are not immediate. So I need somehow to mark the fact in
>>> optabs, and then check the information in the x86.
>>
>> Well, this is why I was suggesting the bitwise semantic for a mask
>> operand.  What we should do on the tree level (and that should happen
>> already), is forward the comparison into the COND_EXPR.  Thus,
>>
>> mask = v1 < v2;
>> v3 = mask ? v4 : v5;
>>
>> should get changed to
>>
>> v3 = v1 < v2 ? v4 : v5;
>>
>> by tree-ssa-forwprop.c.  If that is not happening we have to fix that there.
>
> Yeah, that is something I am working on.
>
>> Because we _don't_ know the mask is all -1 or 0 ;)  The user might
>> put in {3, 5, 1, 3} and expect it to be treated like {-1,...} but it isn't
>> so already.
>>
>>> At the moment I do something like this:
>>>
>>> optabs:
>>>
>>> if (!COMPARISON_CLASS_P (op0))
>>>  ops[3] = gen_rtx_EQ (mode, NULL_RTX, NULL_RTX);
>>>
>>> This expression is preserved while checking and verifying.
>>>
>>> ix86:
>>> if (GET_CODE (comp) == EQ && XEXP (comp, 0) == NULL_RTX
>>>      && XEXP (comp, 1) == NULL_RTX)
>>>
>>> See the patch attached for more details. The patch is just to give you
>>> an idea of the way I am doing it and it seems to work. Please don't
>>> criticise the patch itself, better help me to understand if there is a
>>> better way to pass the information from optabs to ix86.
>>
>> Hm, I'm not sure the expand_vec_cond_expr will work that way,
>> I'd have to play with it myself (but will now be running for weekend).
>>
>> Is the special-casing of a < b ? {-1,-1,-1} : {0,0,0,0} in the backend
>> working for you?  I think there are probably some rtl all-ones and all-zeros
>> predicates you can re-use.
>>
>> Richard.
>
> It works fine. Masks all ones and all zeroes are predefined, all -1
> are not, but I am switching to all zeroes. The real question is that

All -1 is the same as all ones.

> this special case of comparison with two empty operands is a little
> bit hackish. On the other hand there should be no problem with that,

I didn't mean this special case which I believe is incorrect anyways
due to the above comment, but the special case resulting from
expanding v1 < v2 as v1 < v2 ? {-1,-1...} : {0,0,...}.

> because operand 3 is used only to get the code of comparison, no one is
> looking inside the arguments, so we could use this fact. The question
> is whether there is a better way.

As I said above, we can't rely on the mask being either {-1,...} or {0,...}.
If we can, then we should have propagated a comparison, otherwise
we need a real != compare with {0,...}.
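
A small sketch of why the two readings of the mask differ. This is not code
from the patch: plain arrays stand in for a four-lane integer vector, and
select_bitwise / select_nonzero are made-up names used only for illustration.
With a mask of {3, 5, 1, 3} a bitwise select and a select on mask != 0 give
different results; for a mask whose lanes really are all-ones or all-zeros
they agree.

```c
#include <stdint.h>

#define LANES 4

/* Bitwise semantic: every result bit comes from A where the mask bit is
   set and from B where it is clear.  */
static void
select_bitwise (const int32_t *mask, const int32_t *a, const int32_t *b,
                int32_t *out)
{
  for (int i = 0; i < LANES; i++)
    out[i] = (a[i] & mask[i]) | (b[i] & ~mask[i]);
}

/* Element-wise semantic: a whole lane comes from A when the mask lane is
   non-zero, otherwise from B.  */
static void
select_nonzero (const int32_t *mask, const int32_t *a, const int32_t *b,
                int32_t *out)
{
  for (int i = 0; i < LANES; i++)
    out[i] = mask[i] != 0 ? a[i] : b[i];
}
```

With a = {0x10, 0x20, 0x30, 0x40} and b = {0, 0, 0, 0}, the mask {3, 5, 1, 3}
selects nothing from a bitwise, but selects all of a under the != 0 reading.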

> Thanks,
> Artem.
>

Re: Dump stats about hottest hash tables when -fmem-report

2011-08-20 Thread Richard Guenther

On Fri, Aug 19, 2011 at 6:40 PM, Tom Tromey  wrote:
>> "Dimitrios" == Dimitrios Apostolou  writes:
>
> Richard> Note that sparsely populated hashes come at the cost of increased
> Richard> cache footprint.  Not sure what is more important here though, memory
> Richard> access or hash computation.
>
> Tom> I was only approving the change to the dumping.
> Tom> I am undecided about making the hash tables more sparse.
>
> Dimitrios> Since my Core Quad processor has large caches and the i386
> Dimitrios> has small pointer size, the few extra empty buckets impose
> Dimitrios> small overhead, which as it seems is minor in comparison to
> Dimitrios> gains due to less rehashes.
>
> Dimitrios> Maybe that's not true on older or alternate equipment. I'd be
> Dimitrios> very interested to hear about runtime measurements on various
> Dimitrios> equipment, please let me know if you do any.
>
> I think you are the most likely person to do this sort of testing.
> You can use machines on the GCC compile farm for this.
>
> Your patch to change the symbol table's load factor is fine technically.
> I think the argument for putting it in is lacking; what I would like to
> see is either some rationale explaining that the increased memory use is
> not important, or some numbers showing that it still performs well on
> more than a single machine.  My reason for wanting this is just that,
> historically, GCC has been very sensitive to increases in memory use.
> Alternatively, comments from more active maintainers indicating that
> they don't care about this would also help your case.
>
> I can't approve or reject the libiberty change, just the libcpp one.

Yes, memory usage is as important as compile-time.  We still have testcases
that show a vast imbalance of them.  I don't know if the symbol table hash
is ever the problem, but changing the generic load factor in libiberty doesn't
sound like a good idea - maybe instead have a way of specifying that factor
per hashtable instance.  Or, as usual, improve the hash function to
reduce the collision rate and/or to make re-hashing cheaper.
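
To make the per-instance idea concrete, here is a rough sketch. It is not
the libiberty hashtab interface: struct table_params, its fields and
table_should_grow are hypothetical names, and the load factor is assumed to
be stored as a fraction load_num / load_den next to the table size.

```c
#include <stddef.h>

/* Hypothetical per-table parameters.  Each table carries its own maximum
   load factor instead of sharing one hard-wired constant.  */
struct table_params
{
  size_t size;        /* number of slots */
  size_t n_elements;  /* slots currently in use */
  unsigned load_num;  /* maximum load factor numerator */
  unsigned load_den;  /* maximum load factor denominator */
};

/* Return nonzero if inserting one more element would push the table
   above its own maximum load factor, i.e. a rehash is due.  */
static int
table_should_grow (const struct table_params *t)
{
  return (t->n_elements + 1) * t->load_den > t->size * t->load_num;
}
```

A table that is known to be hot could then be created with 1/2 while
everything else keeps a denser setting such as 3/4, so only that table pays
the extra memory.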

Richard.

> Tom
>

Re: [PATCH] Fix execute_update_addresses_taken for loop closed SSA form (PR tree-optimization/48739)

2011-08-20 Thread Richard Guenther

On Fri, Aug 19, 2011 at 9:07 PM, Jakub Jelinek  wrote:
> Hi!
>
> If some variable is optimized from TREE_ADDRESSABLE into a gimple var
> during execute_update_addresses_taken while in loop closed SSA form,
> it might not be rewritten into loop closed SSA form, thus either fail
> verification, or following loop passes might miscompile something.
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
> ok for trunk/4.6?

Ok.

Thanks,
Richard.

> 2011-08-19  Jakub Jelinek  
>
>        PR tree-optimization/48739
>        * tree-ssa.c: Include cfgloop.h.
>        (execute_update_addresses_taken): When updating ssa, if in
>        loop closed SSA form, call rewrite_into_loop_closed_ssa instead of
>        update_ssa.
>        * Makefile.in (tree-ssa.o): Depend on $(CFGLOOP_H).
>
>        * gcc.dg/pr48739-1.c: New test.
>        * gcc.dg/pr48739-2.c: New test.
>
> --- gcc/tree-ssa.c.jj   2011-08-18 08:36:00.0 +0200
> +++ gcc/tree-ssa.c      2011-08-19 18:51:18.0 +0200
> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-dump.h"
>  #include "tree-pass.h"
>  #include "diagnostic-core.h"
> +#include "cfgloop.h"
>
>  /* Pointer map of variable mappings, keyed by edge.  */
>  static struct pointer_map_t *edge_var_maps;
> @@ -2208,7 +2209,10 @@ execute_update_addresses_taken (void)
>          }
>
>       /* Update SSA form here, we are called as non-pass as well.  */
> -      update_ssa (TODO_update_ssa);
> +      if (number_of_loops () > 1 && loops_state_satisfies_p (LOOP_CLOSED_SSA))
> +       rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
> +      else
> +       update_ssa (TODO_update_ssa);
>     }
>
>   BITMAP_FREE (not_reg_needs);
> --- gcc/Makefile.in.jj  2011-08-18 08:36:01.0 +0200
> +++ gcc/Makefile.in     2011-08-19 18:55:17.0 +0200
> @@ -2405,7 +2405,7 @@ tree-ssa.o : tree-ssa.c $(TREE_FLOW_H) $
>    $(TREE_DUMP_H) langhooks.h $(TREE_PASS_H) $(BASIC_BLOCK_H) $(BITMAP_H) \
>    $(FLAGS_H) $(GGC_H) $(HASHTAB_H) pointer-set.h \
>    $(GIMPLE_H) $(TREE_INLINE_H) $(TARGET_H) tree-pretty-print.h \
> -   gimple-pretty-print.h
> +   gimple-pretty-print.h $(CFGLOOP_H)
>  tree-into-ssa.o : tree-into-ssa.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
>    $(TREE_H) $(TM_P_H) $(EXPR_H) output.h $(DIAGNOSTIC_H) \
>    $(FUNCTION_H) $(TIMEVAR_H) $(TM_H) coretypes.h $(TREE_DUMP_H) \
> --- gcc/testsuite/gcc.dg/pr48739-1.c.jj 2011-08-19 18:53:43.0 +0200
> +++ gcc/testsuite/gcc.dg/pr48739-1.c    2011-08-19 18:53:26.0 +0200
> @@ -0,0 +1,27 @@
> +/* PR tree-optimization/48739 */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target pthread } */
> +/* { dg-options "-O1 -ftree-parallelize-loops=2 -fno-tree-dominator-opts" } */
> +
> +extern int g;
> +extern void bar (void);
> +
> +int
> +foo (int x)
> +{
> +  int a, b, *c = (int *) 0;
> +  for (a = 0; a < 10; ++a)
> +    {
> +      bar ();
> +      for (b = 0; b < 5; ++b)
> +       {
> +         x = 0;
> +         c = &x;
> +         g = 1;
> +       }
> +    }
> +  *c = x;
> +  for (x = 0; x != 10; x++)
> +    ;
> +  return g;
> +}
> --- gcc/testsuite/gcc.dg/pr48739-2.c.jj 2011-08-19 18:53:43.0 +0200
> +++ gcc/testsuite/gcc.dg/pr48739-2.c    2011-08-19 18:54:00.0 +0200
> @@ -0,0 +1,27 @@
> +/* PR tree-optimization/48739 */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target pthread } */
> +/* { dg-options "-O1 -ftree-parallelize-loops=2 -fno-tree-dominator-opts" } */
> +
> +extern int g, v[10];
> +extern void bar (void);
> +
> +int
> +foo (int x)
> +{
> +  int a, b, *c = (int *) 0;
> +  for (a = 0; a < 10; ++a)
> +    {
> +      bar ();
> +      for (b = 0; b < 5; ++b)
> +       {
> +         x = 0;
> +         c = &x;
> +         g = 1;
> +       }
> +    }
> +  *c = x;
> +  for (x = 0; x != 10; x++)
> +    v[x] = x;
> +  return g;
> +}
>
>        Jakub
>

Re: Announcing the Port of Intel(r) Cilk (TM) Plus into GCC

2011-08-20 Thread Richard Guenther

On Sat, Aug 20, 2011 at 8:12 AM, Mike Stump  wrote:
> On Aug 15, 2011, at 1:30 PM, Iyer, Balaji V wrote:
>>   This letter describes the recently created GCC branch called "cilkplus" 
>> that ports the Intel(R) Cilk(TM) Plus language extensions to the C and C++ 
>> front-ends of gcc-4.7. We are looking for collaborators and advice as we 
>> proceed
>
> Enhance the gcc plugin infrastructure to permit the extension to be a pure 
> plugin.  :-)  I'm thinking about doing this for the Objective-C and 
> Objective-C++ languages, as a fun, get the feet wet project.  We can rely 
> upon -flto to improve performance, should performance be a concern.
>
> The actual goal however, is to provide a way for people to play around and 
> add extensions, like say for example, the Apple Blocks extension, but without 
> rebuilding gcc, only using the standard plugin interface.  I think longer 
> term, this can enhance the design and layout of gcc itself.

I of course like the notion of having data-parallel array statements
in C just like in Fortran.  If only because that makes developing
middle-end arrays easier and a cross-frontend thing ;)  I suppose the
present implementation scalarizes those in the C frontend, but I didn't
yet look at the branch (and seriously, a short overview of the code
changes, like posting a ChangeLog, would be nice).

Thanks,
Richard.

Re: [PATCH, i386]: Expand round(a) = sgn(a) * floor(fabs(a) + 0.5) using SSE4 ROUND insn

2011-08-20 Thread Uros Bizjak

On Mon, Aug 15, 2011 at 5:25 PM, Michael Matz  wrote:

>> > > .LFB0:
>> > >        .cfi_startproc
>> > >        movsd   .LC0(%rip), %xmm2
>> > >        movapd  %xmm0, %xmm1
>> > >        andpd   %xmm2, %xmm1
>> > >        andnpd  %xmm0, %xmm2
>> > >        addsd   .LC1(%rip), %xmm1
>> > >        roundsd $1, %xmm1, %xmm1
>> > >        orpd    %xmm2, %xmm1
>> > >        movapd  %xmm1, %xmm0
>> > >        ret
>> >
>> > Hm, why do we need the sign-copy?  If I read the docs correctly
>> > we can simply use roundsd directly, no?
>>
>> round-half-away-from-zero breaks your neck.  round[ps][sd] only supports
>> the usual four IEEE rounding modes.
>
> But, you should be able to apply the sign to the 0.5, which wouldn't
> require building the absolute value of input:
>
> round(x) = trunc(x + (copysign (0.5, x)))
>
> which should roughly be expanded to:
>
>        movsd   signbits(%rip), %xmm1
>       andpd   %xmm0, %xmm1
>       movsd   nextof0.5(%rip), %xmm2
>       orpd    %xmm1, %xmm2
>       addpd   %xmm2, %xmm0
>       roundsd $1, %xmm0, %xmm0
>        ret
>
> Which has one logical operation less (and one move because I chose a more
> optimal register assignment).

x86 logical insns can load constants directly from the memory, so your
proposal creates even better code:

myround:
.LFB0:
        .cfi_startproc
        movapd  %xmm0, %xmm1
        andpd   .LC1(%rip), %xmm1
        orpd    .LC0(%rip), %xmm1
        addsd   %xmm0, %xmm1
        roundsd $3, %xmm1, %xmm0
        ret

Please also note that copysign expander has special code path for op0
being constant.
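
For reference, the formula can be written out in portable C roughly as
follows. This is only a sketch and not the expander: round_sketch,
trunc_sketch and PRED_HALF are made-up names, the sign is chosen with a
plain comparison instead of copysign (so -0.0 and NaN are not treated
specially), and truncation is done through an integer cast to keep the
example free of libm calls. PRED_HALF is nextafter (0.5, 0.0) written as a
hex float; using it instead of 0.5 keeps the largest double below 0.5 from
being rounded up to 1.0 by the addition.

```c
/* Largest double below 0.5, i.e. nextafter (0.5, 0.0).  */
#define PRED_HALF 0x1.fffffffffffffp-2

/* Truncate toward zero.  Values with magnitude >= 2^52 (and NaN or
   infinity) are already integral or have no integral value, so they are
   returned unchanged.  */
static double
trunc_sketch (double x)
{
  if (!(x > -0x1p52 && x < 0x1p52))
    return x;
  return (double) (long long) x;
}

/* round (a) = trunc (a + copysign (nextafter (0.5, 0.0), a)),
   with the sign picked by comparison in this sketch.  */
static double
round_sketch (double a)
{
  double half = a < 0.0 ? -PRED_HALF : PRED_HALF;
  return trunc_sketch (a + half);
}
```

Halfway cases still round away from zero, e.g. round_sketch (2.5) gives 3.0
and round_sketch (-2.5) gives -3.0.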

I am testing following patch:

2011-08-20  Uros Bizjak
	    Michael Matz

	* config/i386/i386.c (ix86_expand_round_sse4): Expand as
	trunc (a + copysign (nextafter (0.5, 0.0), a)).

Uros.
Index: i386/i386.c
===
--- i386/i386.c (revision 177925)
+++ i386/i386.c (working copy)
@@ -32700,42 +32700,44 @@ void
 ix86_expand_round_sse4 (rtx op0, rtx op1)
 {
   enum machine_mode mode = GET_MODE (op0);
-  rtx e1, e2, e3, res, half, mask;
+  rtx e1, e2, res, half, mask;
   const struct real_format *fmt;
   REAL_VALUE_TYPE pred_half, half_minus_pred_half;
+  rtx (*gen_copysign) (rtx, rtx, rtx);
   rtx (*gen_round) (rtx, rtx, rtx);
 
   switch (mode)
 {
 case SFmode:
+  gen_copysign = gen_copysignsf3;
   gen_round = gen_sse4_1_roundsf2;
   break;
 case DFmode:
+  gen_copysign = gen_copysigndf3;
   gen_round = gen_sse4_1_rounddf2;
   break;
 default:
   gcc_unreachable ();
 }
 
-  /* e1 = fabs(op1) */
-  e1 = ix86_expand_sse_fabs (op1, &mask);
+  /* round (a) = trunc (a + copysign (0.5, a)) */
 
   /* load nextafter (0.5, 0.0) */
   fmt = REAL_MODE_FORMAT (mode);
   real_2expN (&half_minus_pred_half, -(fmt->p) - 1, mode);
   REAL_ARITHMETIC (pred_half, MINUS_EXPR, dconsthalf, half_minus_pred_half);
+  half = const_double_from_real_value (pred_half, mode);
 
-  /* e2 = e1 + 0.5 */
-  half = force_reg (mode, const_double_from_real_value (pred_half, mode));
-  e2 = expand_simple_binop (mode, PLUS, e1, half, NULL_RTX, 0, OPTAB_DIRECT);
+  /* e1 = copysign (0.5, op1) */
+  e1 = gen_reg_rtx (mode);
+  emit_insn (gen_copysign (e1, half, op1));
 
-  /* e3 = trunc(e2) */
-  e3 = gen_reg_rtx (mode);
-  emit_insn (gen_round (e3, e2, GEN_INT (ROUND_TRUNC)));
+  /* e2 = op1 + e1 */
+  e2 = expand_simple_binop (mode, PLUS, op1, e1, NULL_RTX, 0, OPTAB_DIRECT);
 
-  /* res = copysign (e3, op1) */
+  /* res = trunc (e2) */
   res = gen_reg_rtx (mode);
-  ix86_sse_copysign_to_positive (res, e3, op1, mask);
+  emit_insn (gen_round (res, e2, GEN_INT (ROUND_TRUNC)));
 
   emit_move_insn (op0, res);
 }

Re: Vector Comparison patch

2011-08-20 Thread Uros Bizjak

On Wed, Aug 17, 2011 at 11:49 AM, Richard Guenther
 wrote:

>>> Hm, ok ... let's hope we can sort-out the backend issues before this
>>> patch goes in so we can remove this converting stuff.
>>
>> Hm, I would hope that we could commit this patch even with this issue,
>> because my feeling is that this case would produce errors on all the
>> other architectures as well, as VEC_COND_EXPR is the feature heavily
>> used in auto-vectorizer. So it means that all the backends must be
>> fixed. And another argument, that this conversion is harmless.
>
> It shouldn't be hard to fix all the backends.  And if we don't do it now
> it will never happen.  I would expect that the codegen part of the
> backends doesn't need any adjustments, just the patterns that
> match what is supported.
>
> Uros, can you convert x86 as an example?  Thus, for
>
> (define_expand "vcond"
>  [(set (match_operand:VF 0 "register_operand" "")
>        (if_then_else:VF
>          (match_operator 3 ""
>            [(match_operand:VF 4 "nonimmediate_operand" "")
>             (match_operand:VF 5 "nonimmediate_operand" "")])
>          (match_operand:VF 1 "general_operand" "")
>          (match_operand:VF 2 "general_operand" "")))]
>  "TARGET_SSE"
> {
>  bool ok = ix86_expand_fp_vcond (operands);
>  gcc_assert (ok);

> allow any vector mode of the same size (and same number of elements?)
> for the vcond mode and operand 1 and 2?  Thus, only restrict the
> embedded comparison to VF?

I am a bit late to this discussion, but I see no problem for the
backend to relax this restriction. I will look into it.

Uros.

Re: [PATCH, i386, testsuite] FMA intrinsics

2011-08-20 Thread Uros Bizjak

Hello!

> This patch adds intrinsics for FMA instruction set along with tests for them.
> Bootstraps and passes make check (including make check on simulator
> for new runtime tests).

>        * config/i386/fmaintrin.h: New.

It is not included in the patch.

>        * config.gcc: Add fmaintrin.h.
>        * config/i386/i386.c
>        *  (IX86_BUILTIN_VFMADDSS3): New.
>        (IX86_BUILTIN_VFMADDSD3): Likewise.
>        (X86_BUILTIN_VFNMADDSS3): Likewise.
>        (X86_BUILTIN_VFNMADDSD3): Likewise.
>        (X86_BUILTIN_VFMSUBSS3): Likewise.
>        (X86_BUILTIN_VFMSUBSD3): Likewise.
>        (X86_BUILTIN_VFNMSUBSS3): Likewise.
>        (X86_BUILTIN_VFNMSUBSD3): Likewise.
>        (X86_BUILTIN_VFMSUBPS): Likewise.
>        (X86_BUILTIN_VFMSUBPD): Likewise.
>        (X86_BUILTIN_VFMSUBPS256): Likewise.
>        (X86_BUILTIN_VFMSUBPD256): Likewise.
>        (X86_BUILTIN_VFNMADDPS): Likewise.
>        (X86_BUILTIN_VFNMADDPD): Likewise.
>        (X86_BUILTIN_VFNMADDPS256): Likewise.
>        (X86_BUILTIN_VFNMADDPD256): Likewise.
>        (X86_BUILTIN_VFNMSUBPS): Likewise.
>        (X86_BUILTIN_VFNMSUBPD): Likewise.
>        (X86_BUILTIN_VFNMSUBPS256): Likewise.
>        (X86_BUILTIN_VFNMSUBPD256): Likewise.
>        (X86_BUILTIN_VFMSUBADDPS): Likewise.
>        (X86_BUILTIN_VFMSUBADDPD): Likewise.
>        (X86_BUILTIN_VFMSUBADDPS256): Likewise.
>        (X86_BUILTIN_VFMSUBADDPD256): Likewise.

You don't need to add "negated" versions, one FMA builtin per mode is
enough, please see existing FMA4 descriptions. Just put unary minus
sign in the intrinsics header for "negated" operand and let GCC do its
job. Please see existing FMA4 intrinsics header.
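
A scalar sketch of what I mean, with made-up names (fmadd_sketch and
friends are not the builtins or the intrinsics from any header, and the
stand-in below is an ordinary multiply and add, not a fused operation).
The point is only that the three "negated" forms are the one multiply-add
with a unary minus on an operand:

```c
/* Stand-in for the single multiply-add builtin: a * b + c.  */
static double
fmadd_sketch (double a, double b, double c)
{
  return a * b + c;
}

/* -(a * b) + c: negate one multiplicand.  */
static double
fnmadd_sketch (double a, double b, double c)
{
  return fmadd_sketch (-a, b, c);
}

/* a * b - c: negate the addend.  */
static double
fmsub_sketch (double a, double b, double c)
{
  return fmadd_sketch (a, b, -c);
}

/* -(a * b) - c: negate both.  */
static double
fnmsub_sketch (double a, double b, double c)
{
  return fmadd_sketch (-a, b, -c);
}
```

In the real header the minus would sit on the vector operand passed to the
builtin, and the negation is left to the compiler to fold into the insn.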

>        * config/i386/sse.md (fmai_fnmadd_): New.
>        (fmai_fmsub_): Likewise.
>        (fmai_fnmsub_): Likewise.
>        (fmai_fmadd_s_): Likewise.
>        (fmai_vmfmadd_s_): Likewise.
>        (fmai_vmfmsub_s_): Likewise.
>        (fmai_vmfnmadd_s_): Likewise.
>        (fmai_vmfnmsub_s_): Likewise.
>        (*fmai_fmadd_s_): Likewise.
>        (*fmai_fmsub_s_): Likewise.
>        (*fmai_fnmadd_s_): Likewise.
>        (*fmai_fnmsub_s_): Likewise.
>        (fmsubadd_): Likewise.

Also here. All your FMAMODE patterns should be expanded through
existing "fma4i_fmadd_" expander (you can rename it to
"fmai_fmadd..." to make its name more generic). This includes new
"fmsubadd_" pattern that should be expanded through existing
"fmaddsub_" expander.

vec_merge scalar versions also need only one expander, again follow
existing FMA4 version. Also, there is no need to include "_s_" in the
name. We know that these are scalar versions.

>        * gcc.target/i386/fma-check.h: New.
>        * gcc.target/i386/fma-256-fmaddXX.c: New testcase.
>        * gcc.target/i386/fma-256-fmaddsubXX.c: Likewise.
>        * gcc.target/i386/fma-256-fmsubXX.c: Likewise.
>        * gcc.target/i386/fma-256-fmsubaddXX.c: Likewise.
>        * gcc.target/i386/fma-256-fnmaddXX.c: Likewise.
>        * gcc.target/i386/fma-256-fnmsubXX.c: Likewise.
>        * gcc.target/i386/fma-fmaddXX.c: Likewise.
>        * gcc.target/i386/fma-fmaddsubXX.c: Likewise.
>        * gcc.target/i386/fma-fmsubXX.c: Likewise.
>        * gcc.target/i386/fma-fmsubaddXX.c: Likewise.
>        * gcc.target/i386/fma-fnmaddXX.c: Likewise.
>        * gcc.target/i386/fma-fnmsubXX.c: Likewise.
>        * gcc.target/i386/fma-compile.c: Likewise.
>        * gcc.target/i386/i386.exp (check_effective_target_fma): New.

Is there a reason that all runtime tests are compiled with -O0, other
than that some existing FMA tests in the testsuite use -O0?
Usually, these kinds of tests are compiled using -O2, so optimizations
are applied also to the builtins.

Uros.

Re: patch to solve PR49936

2011-08-20 Thread Richard Sandiford

Hi Vlad,

Vladimir Makarov  writes:
> The following patch makes gcc4.7 behave like gcc4.6 for the case
> described on http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49936.
>
> The patch was successfully bootstrapped on x86_64 and ppc64.
>
> Committed as rev 177916.
>
> 2011-08-19  Vladimir Makarov 
>
>  PR rtl-optimization/49936
>  * ira.c (ira_init_register_move_cost): Ignore too small subclasses
>  for calculation of max register move costs.

Thanks for the patch.  The allocno class costs for MIPS look
much better now.

However, the patch seems to expose a latent problem with the use of
ira_reg_class_max_nregs.  We set the number of allocno objects based
on the ira_reg_class_max_nregs of the allocno class, but often
expect that to be the same as the ira_reg_class_max_nregs of the
pressure class.  I can't see anything in the calculation of the
pressure classes to enforce that though.

In current trunk, this shows up as a failure to build libgcc
on mips64-linux-gnu.  We abort on:

  pclass = ira_pressure_class_translate[ALLOCNO_CLASS (a)];
  nregs = ira_reg_class_max_nregs[pclass][ALLOCNO_MODE (a)];
  gcc_assert (nregs == n);

in ira-lives.c:mark_pseudo_regno_subword_live for the attached
testcase, compiled with -O2 -mabi=64.

In this case it's a MIPS backend bug.  The single pressure class
for MIPS is ALL_REGS, and CLASS_MAX_NREGS (ALL_REGS, TImode)
is returning 4, based on the fact that ALL_REGS includes the
floating-point condition codes.  (CCmode is hard-wired to 4 bytes,
so for CCV2 and CCV4, the correct number of registers is the size
of the mode divided by 4.)  Since floating-point condition codes
can't store TImode, the backend should be ignoring them and
returning 2 instead.  I'm testing a fix for that now.

However, there are other situations where different register banks
really do need different numbers of registers to store the same thing.
E.g. MIPS has a mode in which the core registers are 32 bits but the
floating-point registers are 64 bits.  Thus:

   CLASS_MAX_NREGS (GR_REGS, DFmode) == 2
   CLASS_MAX_NREGS (FP_REGS, DFmode) == 1
   CLASS_MAX_NREGS (ALL_REGS, DFmode) == 2

Moves between GR_REGS and FP_REGS are cheaper than moves between memory
-- MIPS32r2 provides special move instructions -- so the two classes
still end up in the same pressure class.
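
The arithmetic behind those numbers, as a sketch only: nregs_for and
max_nregs are hypothetical helpers, not the target macro, and each register
bank is reduced to nothing but its register width in bytes.

```c
/* Number of registers of REG_BYTES each needed to hold a value of
   MODE_BYTES, rounded up.  */
static unsigned
nregs_for (unsigned mode_bytes, unsigned reg_bytes)
{
  return (mode_bytes + reg_bytes - 1) / reg_bytes;
}

/* Maximum of nregs_for over the banks that make up a union class.  */
static unsigned
max_nregs (unsigned mode_bytes, const unsigned *bank_reg_bytes,
           unsigned n_banks)
{
  unsigned max = 0;
  for (unsigned i = 0; i < n_banks; i++)
    {
      unsigned n = nregs_for (mode_bytes, bank_reg_bytes[i]);
      if (n > max)
        max = n;
    }
  return max;
}
```

With 4-byte core registers and 8-byte floating-point registers, an 8-byte
DFmode value needs 2, 1 and (for the union) 2 registers.  The TImode case
works the same way: counting a 4-byte bank next to 8-byte registers gives 4
for a 16-byte value, while ignoring that bank gives 2.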

Richard

typedef int DItype __attribute__((mode(DI)));
typedef int TItype __attribute__((mode(TI)));

DItype
__mulvdi3 (DItype a, DItype b)
{
  const TItype w = (TItype) a * (TItype) b;

  if ((DItype) (w >> (8 * 8)) != (DItype) w >> ((8 * 8) - 1))
    abort ();

  return w;
}

Re: [RFC] Cleanup DW_CFA_GNU_args_size handling

2011-08-20 Thread Gerald Pfeifer

I'm afraid this patch causes i386 bootstraps to fail:

  Comparing stages 2 and 3
  warning: gcc/cc1-checksum.o differs
  warning: gcc/cc1plus-checksum.o differs
  warning: gcc/cc1obj-checksum.o differs
  Bootstrap comparison failure!
  libiberty/pic/cplus-dem.o differs
  libiberty/pic/crc32.o differs

Here is part of my binary search:

  r177170 2011-08-02 14:55:47 okay
  r177212 2011-08-02 20:26:57 okay
  r177216 2011-08-02 21:09:26 okay
  r177218 2011-08-02 22:18:35 comparison failure

This is also http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50010 .


Gerald


2011-08-02  Richard Henderson  

	PR target/49864
	* reg-notes.def (REG_ARGS_SIZE): New.
	* calls.c (emit_call_1): Emit REG_ARGS_SIZE for call_pop.
	(expand_call): Add REG_ARGS_SIZE to emit_stack_restore.
	* cfgcleanup.c (old_insns_match_p): Don't allow cross-jumping to
	different stack levels.
	* combine-stack-adj.c (adjust_frame_related_expr): Remove.
	(maybe_move_args_size_note): New.
	(combine_stack_adjustments_for_block): Use it.
	* combine.c (distribute_notes): Place REG_ARGS_SIZE.
	* dwarf2cfi.c (dw_cfi_row_struct): Remove args_size member.
	(dw_trace_info): Add beg_true_args_size, end_true_args_size,
	beg_delay_args_size, end_delay_args_size, eh_head, args_size_undefined.
	(cur_cfa): New.
	(queued_args_size): Remove.
	(add_cfi_args_size): Assert size is non-negative.
	(stack_adjust_offset, dwarf2out_args_size): Remove.
	(dwarf2out_stack_adjust, dwarf2out_notice_stack_adjust): Remove.
	(notice_args_size, notice_eh_throw): New.
	(dwarf2out_frame_debug_def_cfa): Use cur_cfa.
	(dwarf2out_frame_debug_adjust_cfa): Likewise.
	(dwarf2out_frame_debug_cfa_offset): Likewise.
	(dwarf2out_frame_debug_expr): Likewise.  Don't stack_adjust_offset.
	(dwarf2out_frame_debug): Don't handle non-frame-related-p insns.
	(change_cfi_row): Don't emit args_size.
	(maybe_record_trace_start_abnormal): Split out from ...
	(maybe_record_trace_start): Here.  Set args_size_undefined.
	(create_trace_edges): Update to match.
	(scan_trace): Handle REG_ARGS_SIZE.
	(connect_traces): Connect args_size between EH insns.
	* emit-rtl.c (try_split): Handle REG_ARGS_SIZE.
	* explow.c (suppress_reg_args_size): New.
	(adjust_stack_1): Split out from ...
	(adjust_stack): ... here.
	(anti_adjust_stack): Use it.
	(allocate_dynamic_stack_space): Suppress REG_ARGS_SIZE.
	* expr.c (mem_autoinc_base): New.
	(fixup_args_size_notes): New.
	(emit_single_push_insn_1): Rename from emit_single_push_insn.
	(emit_single_push_insn): New.  Generate REG_ARGS_SIZE.
	* recog.c (peep2_attempt): Handle REG_ARGS_SIZE.
	* reload1.c (reload_as_needed): Likewise.
	* rtl.h (fixup_args_size_notes): Declare.

[PATCH, i386]: Use satisfies_constraint_L in ix86_binary_operator_ok

2011-08-20 Thread Uros Bizjak

Hello!

No functional change.

2011-08-20  Uros Bizjak  

	* config/i386/i386.c (ix86_binary_operator_ok): Use
	satisfies_constraint_L.

Tested on x86_64-pc-linux-gnu, committed to mainline.

Uros.
Index: i386.c
===
--- i386.c  (revision 177927)
+++ i386.c  (working copy)
@@ -15787,16 +15787,12 @@ ix86_binary_operator_ok (enum rtx_code code, enum
 
   /* Source 1 cannot be a non-matching memory.  */
   if (MEM_P (src1) && !rtx_equal_p (dst, src1))
-{
-  /* Support "andhi/andsi/anddi" as a zero-extending move.  */
-  return (code == AND
- && (mode == HImode
- || mode == SImode
- || (TARGET_64BIT && mode == DImode))
- && CONST_INT_P (src2)
- && (INTVAL (src2) == 0xff
- || INTVAL (src2) == 0x));
-}
+/* Support "andhi/andsi/anddi" as a zero-extending move.  */
+return (code == AND
+   && (mode == HImode
+   || mode == SImode
+   || (TARGET_64BIT && mode == DImode))
+   && satisfies_constraint_L (src2));
 
   return true;
 }

[wwwdocs] Buildstat update for 4.6

2011-08-20 Thread Tom G. Christensen

Latest results for 4.6.x

-tgc

Testresults for 4.6.1:
  alphaev68-dec-osf5.1a
  hppa64-hp-hpux11.00
  i386-pc-mingw32
  powerpc-apple-darwin8.11.0
  x86_64-apple-darwin10.7.0
  x86_64-apple-darwin11.0.0
Index: buildstat.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.6/buildstat.html,v
retrieving revision 1.5
diff -u -r1.5 buildstat.html
--- buildstat.html  5 Jul 2011 22:58:58 -   1.5
+++ buildstat.html  20 Aug 2011 10:54:56 -
@@ -32,6 +32,14 @@
 
 
 
+alphaev68-dec-osf5.1a
+ 
+Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2011-08/msg00587.html";>4.6.1
+
+
+
+
 armv7l-unknown-linux-gnueabi
  
 Test results:
@@ -50,6 +58,14 @@
 
 
 
+hppa64-hp-hpux11.00
+ 
+Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2011-07/msg00569.html";>4.6.1
+
+
+
+
 hppa2.0w-hp-hpux11.11
  
 Test results:
@@ -68,6 +84,14 @@
 
 
 
+i386-pc-mingw32
+ 
+Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2011-07/msg00696.html";>4.6.1
+
+
+
+
 i386-pc-solaris2.8
  
 Test results:
@@ -149,6 +173,14 @@
 
 
 
+powerpc-apple-darwin8.11.0
+ 
+Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2011-07/msg01092.html";>4.6.1
+
+
+
+
 s390-ibm-linux-gnu
  
 Test results:
@@ -211,11 +243,20 @@
 x86_64-apple-darwin10.7.0
  
 Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2011-07/msg00454.html";>4.6.1,
 http://gcc.gnu.org/ml/gcc-testresults/2011-03/msg02755.html";>4.6.0
 
 
 
 
+x86_64-apple-darwin11.0.0
+ 
+Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2011-07/msg00877.html";>4.6.1
+
+
+
+
 x86_64-unknown-linux-gnu
  
 Test results:

[wwwdocs] Buildstat update for 4.5

2011-08-20 Thread Tom G. Christensen

Latest results for 4.5.x

-tgc

Testresults for 4.5.3:
  mips-sgi-irix5.3
Index: buildstat.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.5/buildstat.html,v
retrieving revision 1.12
diff -u -r1.12 buildstat.html
--- buildstat.html  8 Jul 2011 23:00:16 -   1.12
+++ buildstat.html  20 Aug 2011 10:54:48 -
@@ -209,6 +209,7 @@
 mips-sgi-irix5.3
  
 Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2011-08/msg01356.html";>4.5.3,
 http://gcc.gnu.org/ml/gcc-testresults/2011-01/msg00185.html";>4.5.2,
 http://gcc.gnu.org/ml/gcc-testresults/2010-11/msg01539.html";>4.5.1,
 http://gcc.gnu.org/ml/gcc-testresults/2010-11/msg01424.html";>4.5.1

[wwwdocs] Buildstat update for 4.4

2011-08-20 Thread Tom G. Christensen

Latest results for 4.4.x.

-tgc

Testresults for 4.4.6:
  alphaev68-dec-osf5.1a

Testresults for 4.4.5:
  x86_64-unknown-linux-gnu
Index: buildstat.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.4/buildstat.html,v
retrieving revision 1.24
diff -u -r1.24 buildstat.html
--- buildstat.html  8 Jul 2011 09:29:55 -   1.24
+++ buildstat.html  20 Aug 2011 10:54:34 -
@@ -34,6 +34,7 @@
 alphaev68-dec-osf5.1a
  
 Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2011-08/msg00586.html";>4.4.6,
 http://gcc.gnu.org/ml/gcc-testresults/2011-05/msg00074.html";>4.4.6,
 http://gcc.gnu.org/ml/gcc-testresults/2010-12/msg01338.html";>4.4.5,
 http://gcc.gnu.org/ml/gcc-testresults/2010-07/msg01437.html";>4.4.4,
@@ -462,6 +463,7 @@
 x86_64-unknown-linux-gnu
  
 Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2011-08/msg01134.html";>4.4.5,
 http://gcc.gnu.org/ml/gcc-testresults/2010-10/msg00494.html";>4.4.5,
 http://gcc.gnu.org/ml/gcc-testresults/2010-09/msg02197.html";>4.4.4,
 http://gcc.gnu.org/ml/gcc-testresults/2010-09/msg00871.html";>4.4.4,

[wwwdocs] Buildstat update for 4.3

2011-08-20 Thread Tom G. Christensen

Latest results for 4.3.x.

-tgc

Testresults for 4.3.6:
  i386-pc-solaris2.6
  sparc-sun-solaris2.6
  sparc-sun-solaris2.7
Index: buildstat.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.3/buildstat.html,v
retrieving revision 1.35
diff -u -r1.35 buildstat.html
--- buildstat.html  4 Jun 2011 10:41:03 -   1.35
+++ buildstat.html  20 Aug 2011 10:54:25 -
@@ -219,6 +219,7 @@
 i386-pc-solaris2.6
  
 Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2011-08/msg01975.html";>4.3.6,
 http://gcc.gnu.org/ml/gcc-testresults/2010-09/msg00074.html";>4.3.5,
 http://gcc.gnu.org/ml/gcc-testresults/2009-09/msg00680.html";>4.3.4,
 http://gcc.gnu.org/ml/gcc-testresults/2009-02/msg00797.html";>4.3.3,
@@ -471,6 +472,7 @@
 sparc-sun-solaris2.6
  
 Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2011-07/msg00410.html";>4.3.6,
 http://gcc.gnu.org/ml/gcc-testresults/2010-05/msg02366.html";>4.3.5,
 http://gcc.gnu.org/ml/gcc-testresults/2009-08/msg00982.html";>4.3.4,
 http://gcc.gnu.org/ml/gcc-testresults/2009-01/msg02898.html";>4.3.3,
@@ -484,6 +486,7 @@
 sparc-sun-solaris2.7
  
 Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2011-07/msg00409.html";>4.3.6,
 http://gcc.gnu.org/ml/gcc-testresults/2011-05/msg03066.html";>4.3.5,
 http://gcc.gnu.org/ml/gcc-testresults/2009-09/msg01055.html";>4.3.4,
 http://gcc.gnu.org/ml/gcc-testresults/2009-01/msg03291.html";>4.3.3,

Re: [PATCH, testsuite, i386] BMI2 support for GCC

2011-08-20 Thread Uros Bizjak

On Fri, Aug 19, 2011 at 4:51 PM, Kirill Yukhin  wrote:

>>> Updated patch is attached.

Comments inline.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 53c5944..bff1a05 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -79,6 +79,7 @@ along with GCC; see the file COPYING3.  If not see
   (OPTION_MASK_ISA_ABM | OPTION_MASK_ISA_POPCNT)

 #define OPTION_MASK_ISA_BMI_SET OPTION_MASK_ISA_BMI
+#define OPTION_MASK_ISA_BMI2_SET OPTION_MASK_ISA_BMI2

Are you sure that -mbmi2 does not imply -mbmi?

@@ -13285,6 +13291,7 @@ put_condition_code (enum rtx_code code, enum
machine_mode mode, int reverse,
If CODE is 't', pretend the mode is V8SFmode.
If CODE is 'h', pretend the reg is the 'high' byte register.
If CODE is 'y', print "st(0)" instead of "st", if the reg is stack op.
+   If CODE is 'N', print the half mode high register.
If CODE is 'd', duplicate the operand for AVX instruction.
  */

   If CODE is 'N', print the high register of a double word register pair.

@@ -13294,6 +13301,15 @@ print_reg (rtx x, int code, FILE *file)
   const char *reg;
   bool duplicated = code == 'd' && TARGET_AVX;

+  if (code == 'N')
+{
+  enum machine_mode mode = GET_MODE (x);
+  enum machine_mode half_mode = mode == TImode ? DImode : SImode;
+  x = simplify_gen_subreg (half_mode, x, mode,
+  GET_MODE_SIZE (half_mode));
+  code = 0;
+}
+

No need to check modes, we _KNOW_ that DWI expands to double word
modes. Also, handling of 'N' should be put a couple of lines lower,
like:

 code = 16;
   else if (code == 't')
 code = 32;
+  else if (code == 'N')
+{
+  gcc_assert (mode == GET_MODE_WIDER_MODE (word_mode));
+  x = gen_highpart (word_mode, x);
+  code = GET_MODE_SIZE (word_mode);
+}
   else
 code = GET_MODE_SIZE (GET_MODE (x));

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e7ae397..05f7666 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md

 (define_c_enum "unspecv" [
@@ -751,14 +756,17 @@
 ;; Base name for insn mnemonic.
 (define_code_attr logic [(and "and") (ior "or") (xor "xor")])

+;; Mapping of shift operators
+(define_code_iterator any_shift [ashift lshiftrt ashiftrt])
+
 ;; Mapping of shift-right operators
 (define_code_iterator any_shiftrt [lshiftrt ashiftrt])

 ;; Base name for define_insn
-(define_code_attr shiftrt_insn [(lshiftrt "lshr") (ashiftrt "ashr")])
+(define_code_attr shift_insn [(ashift "ashl") (lshiftrt "lshr") (ashiftrt "ashr")])

 ;; Base name for insn mnemonic.
-(define_code_attr shiftrt [(lshiftrt "shr") (ashiftrt "sar")])
+(define_code_attr shift [(ashift "shl") (lshiftrt "shr") (ashiftrt "sar")])

These renames should be part of another follow-up patch.

 ;; Mapping of rotate operators
 (define_code_iterator any_rotate [rotate rotatert])
@@ -777,6 +785,8 @@

 ;; Used in signed and unsigned widening multiplications.
 (define_code_iterator any_extend [sign_extend zero_extend])
+(define_code_attr any_extend [(sign_extend "SIGN_EXTEND")
+ (zero_extend "ZERO_EXTEND")])

No. The pattern should be split instead.

 ;; Various insn prefixes for signed and unsigned operations.
 (define_code_attr u [(sign_extend "") (zero_extend "u")
@@ -6837,7 +6847,17 @@
   (match_operand:DWIH 1 "nonimmediate_operand" ""))
 (any_extend:
   (match_operand:DWIH 2 "register_operand" ""
- (clobber (reg:CC FLAGS_REG))])])
+ (clobber (reg:CC FLAGS_REG))])]
+  ""
+{
+  if (TARGET_BMI2 &&  == ZERO_EXTEND)
+{
+  emit_insn (gen_bmi2_umul3_1 (operands[0],
+ operands[1],
+ operands[2]));
+  DONE;
+}
+})

Please split the expander instead!

+;; Update pattern if BMI2 is available
+(define_split
+  [(set (match_operand:SWI48 0 "register_operand" "")
+   (any_shift:SWI48
+ (match_operand:SWI48 1 "nonimmediate_operand" "")
+ (subreg:QI
+ (match_operand:SI 2 "register_operand" "") 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_BMI2 && ix86_binary_operator_ok (, mode,
operands) && !reload_completed"
+  [(set (match_dup 0)
+(any_shift:SWI48 (match_dup 1) (match_dup 2)))]
+{
+  if (can_create_pseudo_p () && mode != SImode)
+{
+  rtx tmp = gen_rtx_REG (mode, 0);
+  emit_insn (gen_extendsidi2 (tmp, operands[2]));
+  operands[2] = tmp;
+}
+})

Why splitters? Generate the shifts directly from the expander, fixing
the operands on-the-fly if necessary. Also, do not rename half of the
shift expanders and insn patterns just to introduce *ONE* extra RTX...

@@ -15745,8 +15763,23 @@ ix86_expand_binary_operator (enum rtx_code
code, enum machine_mode mode,
 }

Don't expand RORX through ix86_expand_binary_operator, generate it
directly from expander. You are complicating things with splitters too
much!

I will rewrite this part of i386.md.

[trans-mem] Add futex-based serial lock

2011-08-20 Thread Torvald Riegel

This adds a futex-based version of the serial lock for use on Linux. The
futex code is basically old code of libitm (it got removed in SVN rev
157758) with one fix for sysfutex0 on x86_64 and one change that returns
the number of woken processes (futex_wake).

The gtm_rwlock is similar in concept to the mutex-based version, but
adapted to futexes. It performs better than the mutex-based version. Not
really great yet, but there is no spinning yet, so on contention we'll
always have the overhead of waiting via the futexes.
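
To make the wait/wake primitives concrete, here is a minimal sketch of futex wrappers on Linux. It is not the libitm code from the patch: the names sketch_futex_wait and sketch_futex_wake are made up for illustration, and the sketch assumes a Linux system where SYS_futex is available through syscall(2). Like the patched futex_wake, the wake wrapper returns the number of woken waiters.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: sleep while *ADDR still equals VAL.
   Returns 0 when woken, or -1 with errno set; errno is EAGAIN if
   *ADDR did not equal VAL when the kernel checked it.  */
static long
sketch_futex_wait (int *addr, int val)
{
  return syscall (SYS_futex, addr, FUTEX_WAIT, val, NULL, NULL, 0);
}

/* Hypothetical wrapper: wake at most COUNT threads waiting on ADDR.
   Returns the number of waiters actually woken.  */
static long
sketch_futex_wake (int *addr, int count)
{
  return syscall (SYS_futex, addr, FUTEX_WAKE, count, NULL, NULL, 0);
}
```

Since there is no spinning in this scheme, every contended acquisition pays for at least one such system call, which is the overhead mentioned above.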

Again, RBTree with 1/2/4/6 threads, with different update %:
  0%:  49    89    120   120
  1%:  48    65    75    80
 20%:  35    10    10    9
100%:  16.5  2.5   2.5   2.5

For comparison, the mutex-based version:
  0%:  49 / 90 / 120
  1%:  47 / 59 /  27
 20%:  34 /  6 /   3
100%:  15 /  1 /   1

OK for branch?
commit 0b95e53c6da549032ebf7533a4dfea75a7ccb1b2
Author: Torvald Riegel 
Date:   Fri Aug 19 15:56:43 2011 +0200

Add futex-based serial lock.

* config/linux/rwlock.h: New file.
* config/linux/rwlock.c: New file.
* configure.ac: Reenable futex support (undo SVN rev 157758).
* Makefile.am: Same.
* configure.tgt: Same.
* config/linux/alpha/futex_bits.h: Same.
* config/linux/futex.h: Same. Return number of woken processes.
* config/linux/futex.cc: Same.
(futex_wait): Remove spinning.
* config/linux/x86/futex_bits.h: Same. Set futex timeout to zero.
* aclocal.m4: Include generic futex checks.
* configure: Rebuild.
* Makefile.in: Rebuild.
* testsuite/Makefile.in: Rebuild.
* beginend.cc: Include pthread.h.
* config/posix/cachepage.cc: Same.

diff --git a/libitm/Makefile.am b/libitm/Makefile.am
index 09c21ff..ee1822b 100644
--- a/libitm/Makefile.am
+++ b/libitm/Makefile.am
@@ -51,6 +51,9 @@ x86_sse.lo : XCFLAGS += -msse
 x86_avx.lo : XCFLAGS += -mavx
 endif
 
+if ARCH_FUTEX
+libitm_la_SOURCES += futex.cc
+endif
 
 # Automake Documentation:
 # If your package has Texinfo files in many directories, you can use the
diff --git a/libitm/Makefile.in b/libitm/Makefile.in
index 57f76f6..524753e 100644
--- a/libitm/Makefile.in
+++ b/libitm/Makefile.in
@@ -37,6 +37,7 @@ build_triplet = @build@
 host_triplet = @host@
 target_triplet = @target@
 @ARCH_X86_TRUE@am__append_1 = x86_sse.cc x86_avx.cc
+@ARCH_FUTEX_TRUE@am__append_2 = futex.cc
 subdir = .
 DIST_COMMON = $(am__configure_deps) $(srcdir)/../config.guess \
$(srcdir)/../config.sub $(srcdir)/../depcomp \
@@ -49,6 +50,7 @@ ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
 am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
$(top_srcdir)/../config/depstand.m4 \
$(top_srcdir)/../config/enable.m4 \
+   $(top_srcdir)/../config/futex.m4 \
$(top_srcdir)/../config/lead-dot.m4 \
$(top_srcdir)/../config/mmap.m4 \
$(top_srcdir)/../config/multi.m4 \
@@ -95,12 +97,14 @@ am__libitm_la_SOURCES_DIST = aatree.cc alloc.cc alloc_c.cc \
alloc_cpp.cc barrier.cc beginend.cc clone.cc cacheline.cc \
cachepage.cc eh_cpp.cc local.cc query.cc retry.cc rwlock.cc \
useraction.cc util.cc sjlj.S tls.cc method-serial.cc \
-   x86_sse.cc x86_avx.cc
+   x86_sse.cc x86_avx.cc futex.cc
 @ARCH_X86_TRUE@am__objects_1 = x86_sse.lo x86_avx.lo
+@ARCH_FUTEX_TRUE@am__objects_2 = futex.lo
 am_libitm_la_OBJECTS = aatree.lo alloc.lo alloc_c.lo alloc_cpp.lo \
barrier.lo beginend.lo clone.lo cacheline.lo cachepage.lo \
eh_cpp.lo local.lo query.lo retry.lo rwlock.lo useraction.lo \
-   util.lo sjlj.lo tls.lo method-serial.lo $(am__objects_1)
+   util.lo sjlj.lo tls.lo method-serial.lo $(am__objects_1) \
+   $(am__objects_2)
 libitm_la_OBJECTS = $(am_libitm_la_OBJECTS)
 DEFAULT_INCLUDES = -I.@am__isrc@
 depcomp = $(SHELL) $(top_srcdir)/../depcomp
@@ -369,7 +373,8 @@ libitm_la_LDFLAGS = $(libitm_version_info) $(libitm_version_script) \
 libitm_la_SOURCES = aatree.cc alloc.cc alloc_c.cc alloc_cpp.cc \
barrier.cc beginend.cc clone.cc cacheline.cc cachepage.cc \
eh_cpp.cc local.cc query.cc retry.cc rwlock.cc useraction.cc \
-   util.cc sjlj.S tls.cc method-serial.cc $(am__append_1)
+   util.cc sjlj.S tls.cc method-serial.cc $(am__append_1) \
+   $(am__append_2)
 
 # Automake Documentation:
 # If your package has Texinfo files in many directories, you can use the
@@ -499,6 +504,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/cachepage.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/clone.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/eh_cpp.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/futex.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/local.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/method-serial.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/query.Plo@am__quote@
diff --git a/libitm/aclocal.m4

Re: [PATCH] non-GNU C++ compilers

2011-08-20 Thread Marc Glisse


On Mon, 8 Aug 2011, Joseph S. Myers wrote:


On Mon, 8 Aug 2011, Marc Glisse wrote:


* include/obstack.h (obstack_free): Cast to char* instead of int


This header comes from glibc/gnulib.  Although some local changes have
been made to it in the past, sending any fixes to upstream glibc is still
a good idea.


http://sourceware.org/bugzilla/show_bug.cgi?id=13067

(doesn't help gcc much, but it is in glibc's bugzilla now)

--
Marc Glisse

Re: [rtl, delay-slot] Fix overload of "unchanging" bit

2011-08-20 Thread Richard Sandiford

Richard Henderson  writes:
> As found by a c6x build failure, INSN_ANNULLED_BRANCH_P and RTL_CONST_CALL_P
> both resolve to the same bit for CALL_INSNs.  I want to fix this by
> restricting INSN_ANNULLED_BRANCH_P to JUMP_INSNs, since annulling the slots
> for a call or a plain insn doesn't really make sense.
>
> The following has passed stage2-gcc on sparc64-linux host (full build still
> in progress), with --enable-checking=yes,rtl.  It surely needs more than that,
> and I'm asking for help from the relevant maintainers to give this a try.

'spect you'll have noticed this by now, but there was a typo in:

> +rtx annul_p = JUMP_P (control) && INSN_ANNULLED_BRANCH_P (control);

(should be bool).  Otherwise it tests fine on mips64-linux-gnu with the
additional patch below, which I've just applied.
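
The overload being discussed can be pictured with a toy model. This is only a sketch: the struct and the sk_ names are invented for illustration and are not GCC's rtx representation. The point is that a single flag bit carries two meanings, so each accessor has to check the insn code before reading it.

```c
#include <stdbool.h>

/* Toy insn codes; GCC's real codes are JUMP_INSN, CALL_INSN, etc.  */
enum sk_code { SK_INSN, SK_JUMP_INSN, SK_CALL_INSN };

/* Toy insn with one shared flag bit whose meaning depends on CODE.  */
struct sk_insn
{
  enum sk_code code;
  unsigned unchanging : 1;
};

/* "Annulled branch" is only meaningful for jumps.  */
static bool
sk_annulled_branch_p (const struct sk_insn *insn)
{
  return insn->code == SK_JUMP_INSN && insn->unchanging;
}

/* "Const call" is only meaningful for calls.  */
static bool
sk_const_call_p (const struct sk_insn *insn)
{
  return insn->code == SK_CALL_INSN && insn->unchanging;
}
```

With the code check in place, a const call is never mistaken for an annulled branch, which is what the JUMP_P test in the patch below achieves.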

Richard


gcc/
* config/mips/mips.c (mips_reorg_process_insns): Check for jumps
before checking for annulled branches.

Index: gcc/config/mips/mips.c
===
--- gcc/config/mips/mips.c  2011-08-20 19:44:06.0 +0100
+++ gcc/config/mips/mips.c  2011-08-20 19:44:44.0 +0100
@@ -14831,6 +14831,7 @@ mips_reorg_process_insns (void)
 executed.  */
  else if (recog_memoized (insn) == CODE_FOR_r10k_cache_barrier
   && last_insn
+  && JUMP_P (SEQ_BEGIN (last_insn))
   && INSN_ANNULLED_BRANCH_P (SEQ_BEGIN (last_insn)))
delete_insn (insn);
  else

[MIPS, committed] Fix mips_class_max_nregs

2011-08-20 Thread Richard Sandiford

Richard Sandiford  writes:
> In this case it's a MIPS backend bug.  The single pressure class
> for MIPS is ALL_REGS, and CLASS_MAX_NREGS (ALL_REGS, TImode)
> is returning 4, based on the fact that ALL_REGS includes the
> floating-point condition codes.  (CCmode is hard-wired to 4 bytes,
> so for CCV2 and CCV4, the correct number of registers is the size
> of the mode divided by 4.)  Since floating-point condition codes
> can't store TImode, the backend should be ignoring them and
> returning 2 instead.  I'm testing a fix for that now.

Here's what I applied after testing mips64-linux-gnu.  As well as fixing
the wrong value for valid combinations, it has the side-effect of
returning an over-the-top value for more invalid combinations than
before.  That's semi-intentional though.  I don't think this macro is
required to detect invalid modes, or return a specific value for them.
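
As a rough illustration of the arithmetic, here is a toy model of the calculation. It is a sketch under assumptions, not the mips.c code: the sk_ names are hypothetical, and the per-register sizes (4 bytes for condition-code registers, 8 bytes for FPRs and GPRs) are assumed values for a 64-bit configuration.

```c
#include <stdbool.h>

/* Toy register-class bits (hypothetical names).  */
enum { SK_ST_REGS = 1, SK_FP_REGS = 2, SK_GP_REGS = 4 };

static unsigned
sk_min (unsigned a, unsigned b)
{
  return a < b ? a : b;
}

/* Number of registers needed to hold a MODE_BYTES-wide value in the
   class CLASS_MASK.  ST_OK and FP_OK say whether the mode can live in
   the condition-code and floating-point registers at all; a subclass
   that cannot hold the mode no longer shrinks the per-register size.  */
static unsigned
sk_class_max_nregs (unsigned class_mask, unsigned mode_bytes,
                    bool st_ok, bool fp_ok)
{
  unsigned size = 0x8000;      /* larger than any mode */
  unsigned left = class_mask;

  if (left & SK_ST_REGS)
    {
      if (st_ok)
        size = sk_min (size, 4);
      left &= ~(unsigned) SK_ST_REGS;
    }
  if (left & SK_FP_REGS)
    {
      if (fp_ok)
        size = sk_min (size, 8);
      left &= ~(unsigned) SK_FP_REGS;
    }
  if (left)
    size = sk_min (size, 8);   /* remaining registers: one word each */
  return (mode_bytes + size - 1) / size;
}
```

For a 16-byte mode in a class covering all three subclasses, this gives 2 when the condition-code registers are excluded and 4 when they are wrongly taken into account.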

Richard


gcc/
* config/mips/mips.c (mips_class_max_nregs): Check that the mode is
OK for ST_REGS and FP_REGS before taking those classes into account.

Index: gcc/config/mips/mips.c
===
--- gcc/config/mips/mips.c  2011-08-20 19:44:44.0 +0100
+++ gcc/config/mips/mips.c  2011-08-20 19:49:06.0 +0100
@@ -10630,12 +10630,14 @@ mips_class_max_nregs (enum reg_class rcl
   COPY_HARD_REG_SET (left, reg_class_contents[(int) rclass]);
   if (hard_reg_set_intersect_p (left, reg_class_contents[(int) ST_REGS]))
 {
-  size = MIN (size, 4);
+  if (HARD_REGNO_MODE_OK (ST_REG_FIRST, mode))
+   size = MIN (size, 4);
   AND_COMPL_HARD_REG_SET (left, reg_class_contents[(int) ST_REGS]);
 }
   if (hard_reg_set_intersect_p (left, reg_class_contents[(int) FP_REGS]))
 {
-  size = MIN (size, UNITS_PER_FPREG);
+  if (HARD_REGNO_MODE_OK (FP_REG_FIRST, mode))
+   size = MIN (size, UNITS_PER_FPREG);
   AND_COMPL_HARD_REG_SET (left, reg_class_contents[(int) FP_REGS]);
 }
   if (!hard_reg_set_empty_p (left))

Re: [Patch, Fortran, OOP] PR 49638: [OOP] length parameter is ignored when overriding type bound character functions with constant length.

2011-08-20 Thread Janus Weil

>> > There is for example (currently) no special handling for operators.
>>
>> Well, unfortunately one cannot just return "-3" for non-matching
>> operators. Just think of cases like A*(B+C) vs A*B+A*C.
> Ah yes. I was thinking expressions themselves were compared; but only their
> values are.

I'm not sure I'm getting you right here. Of course we do compare the
expressions themselves. However, for example things like commutativity
of operators are taken into account, meaning we compare "A+B" equal to
"B+A" (A and B being arbitrary expressions).

Taking care of other algebraic transformations (like e.g.
distributivity as mentioned above) will be a bit harder. And the
question is whether we are even allowed to do it. Earlier in this thread Steve
mentioned restrictions like

Note 7.18.  X*(Y-Z) -> X*Y - X*Z is a forbidden transformation
(there is no noted restriction on Z > 0).
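
The commutativity handling described above can be sketched on a toy expression tree. This is not gfortran's comparison routine: the sk_ types and the return convention (0 for "known equal", 1 for "cannot show equality") are simplifications made up for this example.

```c
#include <stddef.h>
#include <string.h>

enum sk_kind { SK_VAR, SK_ADD, SK_SUB, SK_MUL };

/* Toy expression node: a named variable or a binary operator.  */
struct sk_expr
{
  enum sk_kind kind;
  const char *name;              /* used for SK_VAR only */
  const struct sk_expr *l, *r;   /* used for operators only */
};

/* Return 0 if A and B are known to be equal, 1 otherwise.  */
static int
sk_compare (const struct sk_expr *a, const struct sk_expr *b)
{
  if (a->kind != b->kind)
    return 1;
  if (a->kind == SK_VAR)
    return strcmp (a->name, b->name) != 0;

  /* Same operator, operands in the same order.  */
  if (sk_compare (a->l, b->l) == 0 && sk_compare (a->r, b->r) == 0)
    return 0;

  /* Commutative operators: also try the operands swapped.  */
  if ((a->kind == SK_ADD || a->kind == SK_MUL)
      && sk_compare (a->l, b->r) == 0 && sk_compare (a->r, b->l) == 0)
    return 0;

  return 1;
}
```

This treats "A+B" and "B+A" as equal but makes no attempt at distributivity, so A*(B+C) and A*B+A*C compare as not known to be equal.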



>> I'll commit the patch (as posted) tomorrow, if Mikael agrees that the
>> description is ok.
> It's fine. Thanks.

Committed as r177932. Thanks again for your review and comments.

Cheers,
Janus

[patch] support for multiarch systems

2011-08-20 Thread Matthias Klose

Multiarch [1] is the term being used to refer to the capability of a system to
install and run applications of multiple different binary targets on the same
system.  The idea and name of multiarch dates back to 2004/2005 [2] (not to be
confused with multiarch in glibc).

Multiarch defines new system directories for headers and libraries/object files:

  /usr/include/<multiarch>
  /lib/<multiarch>
  /usr/lib/<multiarch>

  /usr/local/include/<multiarch>
  /usr/local/lib/<multiarch>

The attached patch

 - searches for multiarch subdirectories in the list of
   startfile_prefixes
 - passes the option -imultiarch to the compiler binaries
 - the compiler binaries add the multiarch include paths
   to the system include path.
 - adds a driver option -print-multiarch

The multiarch triplets are defined in the target specific tmake files, and
provided for all known existing multiarch implementations (currently Debian,
Ubuntu and derivatives).  For non-multilib'd configurations, the triplet is
defined in MULTIARCH_DIRNAME, for multilib'd configurations each directory in
MULTILIB_OSDIRNAMES gets a multiarch directory associated, separated by a colon
(e.g. ../lib:x86_64-linux-gnu).  The multiarch names are as used by Debian, the
mips names go back to a discussion from 2006 [3] to match the ones for glibc.
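
To illustrate the colon-separated form, here is a small sketch that splits one such entry into its OS directory and multiarch parts. The helper name sk_split_osdirname is invented for this example; it is not how genmultilib or gcc.c parse the value.

```c
#include <stdio.h>
#include <string.h>

/* Split ENTRY of the form "osdir:multiarch" into OSDIR and MULTIARCH.
   Returns 1 if a multiarch part was present, 0 otherwise (MULTIARCH is
   then set to the empty string).  Output longer than the buffers is
   truncated.  */
static int
sk_split_osdirname (const char *entry,
                    char *osdir, size_t osdir_len,
                    char *multiarch, size_t multiarch_len)
{
  const char *colon = strchr (entry, ':');
  size_t n = colon ? (size_t) (colon - entry) : strlen (entry);

  snprintf (osdir, osdir_len, "%.*s", (int) n, entry);
  snprintf (multiarch, multiarch_len, "%s", colon ? colon + 1 : "");
  return colon != NULL;
}
```

Given the example entry ../lib:x86_64-linux-gnu, this yields ../lib as the OS directory and x86_64-linux-gnu as the multiarch name.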

Tested on non-multilib'd and multilib'd systems, both native and cross builds.
Ok for the trunk?

  Matthias

[1] http://wiki.debian.org/Multiarch
[2] http://debconf5.debconf.org/comas/general/proposals/27.html
[3] http://lists.debian.org/debian-mips/2006/03/msg4.html


2011-08-20  Matthias Klose  

* doc/invoke.texi: Document -print-multiarch.
* Makefile.in (s-mlib): Pass MULTIARCH_DIRNAME to genmultilib.
* genmultilib: Add new option for the multiarch name.
* gcc.c (multiarch_dir): Define.
(for_each_path): Search for multiarch suffixes.
(driver_handle_option): Handle multiarch option.
(do_spec_1): Pass -imultiarch if defined.
(main): Print multiarch.
(set_multilib_dir): Separate multilib and multiarch names
from multilib_select.
(print_multilib_info): Ignore multiarch names in multilib_select.
* incpath.c (add_standard_paths): Search the multiarch include dirs.
* cppdefault.h (default_include): Document multiarch in multilib
member.
* cppdefault.c: [LOCAL_INCLUDE_DIR, STANDARD_INCLUDE_DIR] Add an
include directory for multiarch directories.
* common.opt: New options --print-multiarch and -imultilib.
* config/s390/t-linux64: Add multiarch names in MULTILIB_OSDIRNAMES.
* config/sparc/t-linux64: Likewise.
* config/powerpc/t-linux64: Likewise.
* config/i386/t-linux64: Likewise.
* config/mips/t-linux64: Likewise.
* config/alpha/t-linux: Define MULTIARCH_DIRNAME.
* config/arm/t-linux: Likewise.
* config/i386/t-linux: Likewise.
* config/pa/t-linux: Likewise.
* config/sparc/t-linux: Likewise.
* config/ia64/t-glibc: Define MULTIARCH_DIRNAME for linux target.
* gcc/config/i386/t-gnu: New, Define MULTIARCH_DIRNAME.
* gcc/config/i386/t-kfreebsd: New, Define MULTIARCH_DIRNAME and
MULTILIB_OSDIRNAMES.



Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 177846)
+++ gcc/doc/invoke.texi (working copy)
@@ -5937,6 +5937,11 @@
 @file{../lib32}, or if OS libraries are present in @file{lib/@var{subdir}}
 subdirectories it prints e.g.@: @file{amd64}, @file{sparcv9} or @file{ev6}.
 
+@item -print-multiarch
+@opindex print-multiarch
+Print the path to OS libraries for the selected multiarch,
+relative to some @file{lib} subdirectory.
+
 @item -print-prog-name=@var{program}
 @opindex print-prog-name
 Like @option{-print-file-name}, but searches for a program such as @samp{cpp}.
Index: gcc/incpath.c
===
--- gcc/incpath.c   (revision 177846)
+++ gcc/incpath.c   (working copy)
@@ -150,8 +150,14 @@
  if (!filename_ncmp (p->fname, cpp_GCC_INCLUDE_DIR, len))
{
  char *str = concat (iprefix, p->fname + len, NULL);
- if (p->multilib && imultilib)
+ if (p->multilib == 1 && imultilib)
str = concat (str, dir_separator_str, imultilib, NULL);
+ else if (p->multilib == 2)
+   {
+ if (!imultiarch)
+   continue;
+ str = concat (str, dir_separator_str, imultiarch, NULL);
+   }
  add_path (str, SYSTEM, p->cxx_aware, false);
}
}
@@ -195,8 +201,14 @@
  else
str = update_path (p->fname, p->component);
 
- if (p->multilib && imultilib)
+ if (p->multilib == 1 && imultilib)
str = concat (str, dir_separator_str, imultilib, NULL);
+

Re: PING: PATCH: PR target/46770: Use .init_array/.fini_array sections

2011-08-20 Thread H.J. Lu

On Fri, Aug 19, 2011 at 7:55 AM, Jakub Jelinek  wrote:
> On Fri, Aug 19, 2011 at 07:47:40AM -0700, H.J. Lu wrote:
>> 2011-08-19  H.J. Lu  
>>
>>       PR target/46770
>>       * config.gcc (tm_file): Add initfini-array.h if
>>       .init_arary/.fini_array supported.
>
> s/arary/array/
>
> Ok if nobody objects within 24 hours, but please watch for any fallouts.
>
>        Jakub
>

This is the patch I checked in. I moved
default_elf_init_array_asm_out_constructor and
default_elf_fini_array_asm_out_destructor from config/initfini-array.h
to output.h so that we won't get warnings if .init_array/.fini_array
sections aren't enabled.
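
For readers who want to see the user-visible side of this, here is a minimal sketch using the GCC constructor and destructor function attributes. The sk_ names are made up for the example. Whether the function pointers end up in .ctors/.dtors or in .init_array/.fini_array depends on how the compiler was configured; the behaviour seen by the program should be the same either way.

```c
#include <stdio.h>

/* Counts how many times the constructor has run.  */
static int sk_init_count;

/* Runs before main.  */
__attribute__ ((constructor))
static void
sk_ctor (void)
{
  sk_init_count++;
}

/* Runs after main returns or exit is called.  */
__attribute__ ((destructor))
static void
sk_dtor (void)
{
  fputs ("sk_dtor ran\n", stderr);
}
```

Inspecting the resulting object file with readelf -S should show which of the two section schemes the compiler used.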

Thanks.

-- 
H.J.
---
2011-08-20  H.J. Lu  

PR target/46770
* config.gcc (tm_file): Add initfini-array.h if
.init_array/.fini_array are supported.

* crtstuff.c: Don't generate .ctors nor .dtors sections if
USE_INITFINI_ARRAY is defined.

* output.h (default_elf_init_array_asm_out_constructor): New.
(default_elf_fini_array_asm_out_destructor): Likewise.
* varasm.c (elf_init_array_section): Likewise.
(elf_fini_array_section): Likewise.
(get_elf_initfini_array_priority_section): Likewise.
(default_elf_init_array_asm_out_constructor): Likewise.
(default_elf_fini_array_asm_out_destructor): Likewise.

* config/initfini-array.h: New.
2011-08-20  H.J. Lu  

PR other/46770
* config.gcc (tm_file): Add initfini-array.h if
.init_array/.fini_array are supported.

* crtstuff.c: Don't generate .ctors nor .dtors sections if
USE_INITFINI_ARRAY is defined.

* output.h (default_elf_init_array_asm_out_constructor): New.
(default_elf_fini_array_asm_out_destructor): Likewise.
* varasm.c (elf_init_array_section): Likewise.
(elf_fini_array_section): Likewise.
(get_elf_initfini_array_priority_section): Likewise.
(default_elf_init_array_asm_out_constructor): Likewise.
(default_elf_fini_array_asm_out_destructor): Likewise.

* config/initfini-array.h: New.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index b92ce3d..7f29213 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3058,6 +3058,11 @@ if test x$with_schedule = x; then
esac
 fi
 
+# Support --enable-initfini-array.
+if test x$enable_initfini_array = xyes; then
+  tm_file="${tm_file} initfini-array.h"
+fi
+
 # Validate and mark as valid any --with options supported
 # by this target.  In order to use a particular --with option
 # you must list it in supported_defaults; validating the value
diff --git a/gcc/config/initfini-array.h b/gcc/config/initfini-array.h
new file mode 100644
index 000..8aaadf6
--- /dev/null
+++ b/gcc/config/initfini-array.h
@@ -0,0 +1,37 @@
+/* Definitions for ELF systems with .init_array/.fini_array section
+   support.
+   Copyright (C) 2011
+   Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#define USE_INITFINI_ARRAY
+
+#undef INIT_SECTION_ASM_OP
+#undef FINI_SECTION_ASM_OP
+
+#undef INIT_ARRAY_SECTION_ASM_OP
+#define INIT_ARRAY_SECTION_ASM_OP
+
+#undef FINI_ARRAY_SECTION_ASM_OP
+#define FINI_ARRAY_SECTION_ASM_OP
+
+/* Use .init_array/.fini_array section for constructors and destructors. */
+#undef TARGET_ASM_CONSTRUCTOR
+#define TARGET_ASM_CONSTRUCTOR default_elf_init_array_asm_out_constructor
+#undef TARGET_ASM_DESTRUCTOR
+#define TARGET_ASM_DESTRUCTOR default_elf_fini_array_asm_out_destructor
diff --git a/gcc/crtstuff.c b/gcc/crtstuff.c
index b65f490..010d472 100644
--- a/gcc/crtstuff.c
+++ b/gcc/crtstuff.c
@@ -1,7 +1,8 @@
 /* Specialized bits of code needed to support construction and
destruction of file-scope objects in C++ code.
Copyright (C) 1991, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001
-   2002, 2003, 2004, 2005, 2006, 2007, 2009, 2010 Free Software Foundation, Inc.
+   2002, 2003, 2004, 2005, 2006, 2007, 2009, 2010, 2011
+   Free Software Foundation, Inc.
Contributed by Ron Guilmette (r...@monkeys.com).
 
 This file is part of GCC.
@@ -189,6 +190,9 @@ typedef void (*func_ptr) (void);
refer to only the __CTOR_END__ symbol in crtend.o and the __DTOR_LIST__
symbol in crtbegin.o, where they are defined.  */
 
+/* No need for .ctors/.dtors section if linker can place them in
+   .init_array/.fini_array section.  */
+#ifndef

Re: [patch] support for multiarch systems

2011-08-20 Thread Jakub Jelinek

On Sat, Aug 20, 2011 at 09:51:33PM +0200, Matthias Klose wrote:
> Tested on non-multilib'd and multilib'd systems, both native and cross builds.
> Ok for the trunk?

I don't think we want to do this unconditionally, we already search way too
many directories by default.  This is a Debian/Ubuntu specific setup, I
don't think many others are going to use such a setup.
So, IMHO you should make it configure time selectable whether those extra
dirs are searched or not.  And by default either don't enable it, or enable
it only on Debian/Ubuntu.

Jakub

Re: [Patch, Fortran, OOP] PR 49638: [OOP] length parameter is ignored when overriding type bound character functions with constant length.

2011-08-20 Thread Mikael Morin

On Saturday 20 August 2011 21:29:21 Janus Weil wrote:
> >> > There is for example (currently) no special handling for operators.
> >> 
> >> Well, unfortunately one cannot just return "-3" for non-matching
> >> operators. Just think of cases like A*(B+C) vs A*B+A*C.
> > 
> > Ah yes. I was thinking expressions themselves were compared; but only
> > their values are.
> 
> I'm not sure I'm getting you right here. Of course we do compare the
> expressions themselves. 
Yes, what I mean is...

> However, for example things like commutativity
> of operators are taken into account, meaning we compare "A+B" equal to
> "B+A" (A and B being arbitrary expressions).
... "A+B" and "B+A" are different expressions, with the same value.
And we return 0 (<=> equality) in that case. So we are interested in same-
value-ness, not same-expression-ness.
And we do compare expressions, because same expression ===> same value.
But different expression =/=> different value (as your example shows).

Oh well, nevermind.

Mikael

Re: [trans-mem] Add futex-based serial lock

2011-08-20 Thread Richard Henderson

On 08/20/2011 08:51 AM, Torvald Riegel wrote:
> Add futex-based serial lock.
> 
>   * config/linux/rwlock.h: New file.
>   * config/linux/rwlock.c: New file.
>   * configure.ac: Reenable futex support (undo SVN rev 157758).
>   * Makefile.am: Same.
>   * configure.tgt: Same.
>   * config/linux/alpha/futex_bits.h: Same.
>   * config/linux/futex.h: Same. Return number of woken processes.
>   * config/linux/futex.cc: Same.
>   (futex_wait): Remove spinning.
>   * config/linux/x86/futex_bits.h: Same. Set futex timeout to zero.
>   * aclocal.m4: Include generic futex checks.
>   * configure: Rebuild.
>   * Makefile.in: Rebuild.
>   * testsuite/Makefile.in: Rebuild.
>   * beginend.cc: Include pthread.h.
>   * config/posix/cachepage.cc: Same.

Ok.


r~

Re: [patch] support for multiarch systems

2011-08-20 Thread Joseph S. Myers

On Sat, 20 Aug 2011, Matthias Klose wrote:

> The multiarch triplets are defined in the target specific tmake files, and
> provided for all known existing multiarch implementations (currently Debian,
> Ubuntu and derivatives).  For non-multilib'd configurations, the triplet is

Is there a specification somewhere of what the various triplets mean?

> defined in MULTIARCH_DIRNAME, for multilib'd configurations each directory in
> MULTILIB_OSDIRNAMES gets a multiarch directory associated, separated by a colon

I don't see any documentation in fragments.texi for this 
(MULTIARCH_DIRNAME is new so certainly needs documenting, even if you get 
away with not adding to the nonexistent documentation for 
MULTILIB_OSDIRNAMES (PR 25508)).

> (e.g. ../lib:x86_64-linux-gnu).  The multiarch names are as used by Debian, the

Does this work with the "gccdir=osdir" and "gccdir=!osdir" cases before 
the colon?

> mips names go back to a discussion from 2006 [3] to match the ones for glibc.

For x86, shouldn't a name be allocated for x32?

For m68k, classic m68k and ColdFire have incompatible ABIs.  So you need 
to define what m68k-linux-gnu means of the two ABIs.  Unfortunately 
building for ColdFire has been broken for some time, since 
 (this ought to 
have been dependent on the --with-arch configurey).

For 32-bit Power, hard-float and soft-float ABIs are incompatible.  
Furthermore, the soft-float ABI is used at function-calling level for 
e500v1 and e500v2 - but there are differences in the details of the glibc 
symbols exported (and at least the fenv.h ABI is incompatible between 
soft-float and e500).  So actually there are four variants at the glibc 
level.  You need to define what powerpc-linux-gnu means and avoid it being 
used for anything incompatible.

For MIPS, the hard-float and soft-float ABIs are incompatible.  So you 
need twelve triplets, not six.

For ARM, you have a ChangeLog entry with no corresponding patch.  You need 
to distinguish big and little endian; old ABI, EABI soft-float ABI and 
EABI hard-float ABI (six triplets).

Not all of those variants necessarily are configurable in a multilib 
configuration in the FSF tree (the e500 variants can be achieved with 
powerpc-linux-gnuspe triplets, for example, but those don't have other 
multilibs).  So maybe some of the names won't actually appear in the FSF 
sources - but you still need to define the semantics of the names that do 
appear (whether in the manuals, on the GCC wiki or elsewhere) and 
preferably have somewhere to define semantics for the names not used in 
multilib configurations in FSF GCC.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)

2011-08-20 Thread Richard Henderson

On 08/19/2011 02:04 AM, Richard Guenther wrote:
> So make sure that __cpu_indicator initially has a conservative correct
> value?  I'd still prefer the constructor-in-libgcc option - if only because
> then the compiler-side is much simplified.
> 

Err, I thought __cpu_indicator was a function, not data.

I think we need to discuss this more...


r~

[patch] PR25508 - document MULTILIB_OSDIRNAMES

2011-08-20 Thread Matthias Klose

document MULTILIB_OSDIRNAMES, copied from genmultilib.

Ok for the trunk?

  Matthias

PR bootstrap/25508
* doc/fragments.texi: Document MULTILIB_OSDIRNAMES.

Index: gcc/doc/fragments.texi
===
--- gcc/doc/fragments.texi  (revision 177846)
+++ gcc/doc/fragments.texi  (working copy)
@@ -128,6 +128,19 @@
 of options to be used for all builds.  If you set this, you should
 probably set @code{CRTSTUFF_T_CFLAGS} to a dash followed by it.
 
+@findex MULTILIB_OSDIRNAMES
+@item MULTILIB_OSDIRNAMES
+If @code{MULTILIB_OPTIONS} is used, this variable specifies the list
+of OS subdirectory names.  The format is either the same as of
+@code{MULTILIB_DIRNAMES}, or a set of mappings.  When it is the same
+as @code{MULTILIB_DIRNAMES}, it describes the multilib directories
+using OS conventions, rather than GCC conventions.  When it is a set
+of mappings of the form @code{gccdir=osdir}, the left side gives the
+GCC convention and the right gives the equivalent OS defined location.
+If the osdir part begins with a !, the os directory names are used
+exclusively.  Use the mapping when there is no one-to-one equivalence
+between GCC levels and the OS.
+
 @findex NATIVE_SYSTEM_HEADER_DIR
 @item NATIVE_SYSTEM_HEADER_DIR
 If the default location for system headers is not @file{/usr/include},
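
To make the two entry formats described in the patch concrete, here is a minimal C sketch that splits one MULTILIB_OSDIRNAMES entry into its parts. It only mirrors the wording of the proposed documentation: the struct, the function name and the sample entries are hypothetical and are not taken from genmultilib (which is a shell script).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical illustration only; genmultilib contains no such code.  */
struct osdir_entry
{
  char gccdir[64];   /* left side of "gccdir=osdir"; empty if no mapping */
  char osdir[64];    /* OS directory name, without a leading '!' */
  bool exclusive;    /* true if the osdir part began with '!' */
};

/* Split ENTRY, which is either a plain directory name or a mapping of
   the form "gccdir=osdir" or "gccdir=!osdir".  */
static struct osdir_entry
parse_osdir_entry (const char *entry)
{
  struct osdir_entry e = { "", "", false };
  const char *eq = strchr (entry, '=');
  const char *os = entry;

  if (eq != NULL)
    {
      /* Copy everything before '=' as the GCC-side directory.  */
      snprintf (e.gccdir, sizeof e.gccdir, "%.*s", (int) (eq - entry), entry);
      os = eq + 1;
    }
  if (*os == '!')
    {
      e.exclusive = true;
      os++;
    }
  snprintf (e.osdir, sizeof e.osdir, "%s", os);
  return e;
}
```

With this sketch, an assumed entry such as "m64=../lib64" yields gccdir "m64" and osdir "../lib64", while "m32=!lib32" additionally sets the exclusive flag.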

Re: [PATCH, testsuite, i386] BMI2 support for GCC

2011-08-20 Thread Uros Bizjak

On Sat, Aug 20, 2011 at 2:09 PM, Uros Bizjak  wrote:

> Don't expand RORX through ix86_expand_binary_operator, generate it
> directly from expander. You are complicating things with splitters too
> much!
>
> I will rewrite this part of i386.md.

So, attached RFC patch handles BMI2 mul, shift and ror stuff.

Some remarks:
- M and N register modifiers are added to print low and high register
of a double word register pair. This is needed for mulx insn.
- ishiftx and rotatex instruction type attributes are added.
- "w" mode attribute is added to add register prefix for word mode.
This is needed to output QImode count register of shift insns.

- mulx is expanded directly from expander, IMO it is always a win to
generate this insn if available.

- Yb register constraint is added to conditionally enable generation
of BMI alternatives in generic shift and rotate patterns. The BMI
variant is generated only if RA chooses it as the most profitable
alternative.
- shift and rotate instructions are split post-reload from generic
patterns to strip flags clobber.
- zero-extended 64bit variants are also handled for shift and rotate insns.
- rotate right AND rotate left instructions are handled through rorx.
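
The last remark relies on the fact that a left rotate can be rewritten as a right rotate with an adjusted count. The following C sketch shows only that identity for 32-bit values; the function names are hypothetical and the code says nothing about how the patch itself performs the rewrite.

```c
#include <stdint.h>

/* Rotate X right by N bits (N taken modulo 32).  */
static uint32_t
rotr32 (uint32_t x, unsigned n)
{
  n &= 31;
  return (x >> n) | (x << ((32 - n) & 31));
}

/* Rotate left expressed through rotate right: rotating left by N is the
   same as rotating right by (32 - N) modulo 32.  */
static uint32_t
rotl32_via_rotr (uint32_t x, unsigned n)
{
  return rotr32 (x, (32 - (n & 31)) & 31);
}
```

For example, rotating 0x12345678 left by 8 bits through rotl32_via_rotr gives 0x34567812, the same value a direct left rotate would produce.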

2011-08-20  Uros Bizjak  

* config/i386/i386.md (type): Add ishiftx and rotatex.
(length_immediate): Handle ishiftx and rotatex.
(imm_disp): Ditto.
(w): New mode attribute.

(mul3): Split from mul3.
(umul3): Ditto.  Generate bmi2_umul3_1 pattern
for TARGET_BMI2.
(bmi2_umul3_1): New insn pattern.

(*bmi2_ashl3_1): New insn pattern.
(*ashl3_1): Add ishiftx BMI2 alternative.
(*ashl3_1 splitter): New splitter to avoid flags dependency.
(*bmi2_ashlsi3_1_zext): New insn pattern.
(*ashlsi3_1_zext): Add ishiftx BMI2 alternative.
(*ashlsi3_1_zext splitter): New splitter to avoid flags dependency.

(*bmi2_3_1): New insn pattern.
(*3_1): Add ishiftx BMI2 alternative.
(*3_1 splitter): New splitter to avoid
flags dependency.
(*bmi2_si3_1_zext): New insn pattern.
(*si3_1_zext): Add ishiftx BMI2 alternative.
(*si3_1_zext splitter): New splitter to avoid
flags dependency.

(*bmi2_rorx3_1): New insn pattern.
(*3_1): Add rotatex BMI2 alternative.
(*rotate3_1 splitter): New splitter to avoid flags dependency.
(*rotatert3_1 splitter): Ditto.
(*bmi2_rorxsi3_1_zext): New insn pattern.
(*si3_1_zext): Add rotatex BMI2 alternative.
(*rotatesi3_1_zext  splitter): New splitter to avoid flags dependency.
(*rotatertsi3_1_zext splitter): Ditto.

* config/i386/constraints.md (Yb): New register constraint.
* config/i386/i386.c (print_reg): Handle 'M' and 'N' modifiers.
(print_operand): Ditto.

The patch is currently in RFC/RFT state, since I have no way to
properly test it. The patch bootstraps OK and regression test is clean
on x86_64-pc-linux-gnu {,-m32}. I tested the patch lightly on provided
testcases, so expected patterns are generated. Oh, and all insn
constraints should be changed from TARGET_BMI to TARGET_BMI2.

Uros.
Index: i386.md
===
--- i386.md (revision 177925)
+++ i386.md (working copy)
@@ -50,6 +50,8 @@
 ;; t --  likewise, print the V8SFmode name of the register.
 ;; h -- print the QImode name for a "high" register, either ah, bh, ch or dh.
 ;; y -- print "st(0)" instead of "st" as a register.
+;; M -- print the low register of a double word register pair.
+;; N -- print the high register of a double word register pair.
 ;; d -- print duplicated register operand for AVX instruction.
 ;; D -- print condition for SSE cmp instruction.
 ;; P -- if PIC, print an @PLT suffix.
@@ -377,7 +379,7 @@
 (define_attr "type"
   "other,multi,
alu,alu1,negnot,imov,imovx,lea,
-   incdec,ishift,ishift1,rotate,rotate1,imul,idiv,
+   incdec,ishift,ishiftx,ishift1,rotate,rotatex,rotate1,imul,idiv,
icmp,test,ibr,setcc,icmov,
push,pop,call,callv,leave,
str,bitmanip,
@@ -414,8 +416,8 @@
   (const_int 0)
 (eq_attr "unit" "i387,sse,mmx")
   (const_int 0)
-(eq_attr "type" "alu,alu1,negnot,imovx,ishift,rotate,ishift1,rotate1,
- imul,icmp,push,pop")
+(eq_attr "type" "alu,alu1,negnot,imovx,ishift,ishiftx,ishift1,
+ rotate,rotatex,rotate1,imul,icmp,push,pop")
   (symbol_ref "ix86_attr_length_immediate_default (insn, true)")
 (eq_attr "type" "imov,test")
   (symbol_ref "ix86_attr_length_immediate_default (insn, false)")
@@ -675,7 +677,7 @@
  (and (match_operand 0 "memory_displacement_operand" "")
   (match_operand 1 "immediate_operand" "")))
   (const_string "true")
-(and (eq_attr "type" "alu,ishift,rotate,imul,idiv")
+(and (eq_attr "type" "alu,ishift,ishiftx,

Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)

2011-08-20 Thread H.J. Lu

On Sat, Aug 20, 2011 at 2:02 PM, Richard Henderson  wrote:
> On 08/19/2011 02:04 AM, Richard Guenther wrote:
>> So make sure that __cpu_indicator initially has a conservative correct
>> value?  I'd still prefer the constructor-in-libgcc option - if only because
>> then the compiler-side is much simplified.
>>
>
> Err, I thought __cpu_indicator was a function, not data.
>
> I think we need to discuss this more...
>

In glibc, we export function __get_cpu_features as a private interface
used for IFUNC.  We can do something similar with libgcc very carefully.


-- 
H.J.

Re: [PATCH, testsuite, i386] BMI2 support for GCC

2011-08-20 Thread H.J. Lu

On Sat, Aug 20, 2011 at 2:16 PM, Uros Bizjak  wrote:
> On Sat, Aug 20, 2011 at 2:09 PM, Uros Bizjak  wrote:
>
>> Don't expand RORX through ix86_expand_binary_operator, generate it
>> directly from expander. You are complicating things with splitters too
>> much!
>>
>> I will rewrite this part of i386.md.
>
> So, attached RFC patch handles BMI2 mul, shift and ror stuff.
>
> Some remarks:
> - M and N register modifiers are added to print low and high register
> of a double word register pair. This is needed for mulx insn.
> - ishiftx and rotatex instruction type attributes are added.
> - "w" mode attribute is added to add register prefix for word mode.
> This is needed to output QImode count register of shift insns.
>
> - mulx is expanded directly from expander, IMO it is always a win to
> generate this insn if available.
>
> - Yb register constraint is added to conditionally enable generation
> of BMI alternatives in generic shift and rotate patterns. The BMI
> variant is generated only if RA chooses it as the most profitable
> alternative.
> - shift and rotate instructions are split post-reload from generic
> patterns to strip flags clobber.
> - zero-extended 64bit variants are also handled for shift and rotate insns.
> - rotate right AND rotate left instructions are handled through rorx.
>
> 2011-08-20  Uros Bizjak  
>
>        * config/i386/i386.md (type): Add ishiftx and rotatex.
>        (length_immediate): Handle ishiftx and rotatex.
>        (imm_disp): Ditto.
>        (w): New mode attribute.
>
>        (mul3): Split from mul3.
>        (umul3): Ditto.  Generate bmi2_umul3_1 pattern
>        for TARGET_BMI2.
>        (bmi2_umul3_1): New insn pattern.
>
>        (*bmi2_ashl3_1): New insn pattern.
>        (*ashl3_1): Add ishiftx BMI2 alternative.
>        (*ashl3_1 splitter): New splitter to avoid flags dependency.
>        (*bmi2_ashlsi3_1_zext): New insn pattern.
>        (*ashlsi3_1_zext): Add ishiftx BMI2 alternative.
>        (*ashlsi3_1_zext splitter): New splitter to avoid flags dependency.
>
>        (*bmi2_3_1): New insn pattern.
>        (*3_1): Add ishiftx BMI2 alternative.
>        (*3_1 splitter): New splitter to avoid
>        flags dependency.
>        (*bmi2_si3_1_zext): New insn pattern.
>        (*si3_1_zext): Add ishiftx BMI2 alternative.
>        (*si3_1_zext splitter): New splitter to avoid
>        flags dependency.
>
>        (*bmi2_rorx3_1): New insn pattern.
>        (*3_1): Add rotatex BMI2 alternative.
>        (*rotate3_1 splitter): New splitter to avoid flags dependency.
>        (*rotatert3_1 splitter): Ditto.
>        (*bmi2_rorxsi3_1_zext): New insn pattern.
>        (*si3_1_zext): Add rotatex BMI2 alternative.
>        (*rotatesi3_1_zext  splitter): New splitter to avoid flags dependency.
>        (*rotatertsi3_1_zext splitter): Ditto.
>
>        * config/i386/constraints.md (Yb): New register constraint.
>        * config/i386/i386.c (print_reg): Handle 'M' and 'N' modifiers.
>        (print_operand): Ditto.
>
> The patch is currently in RFC/RFT state, since I have no way to
> properly test it. The patch bootstraps OK and regression test is clean

We are using HSW emulator (SDE):

http://software.intel.com/en-us/articles/pre-release-license-agreement-for-intel-software-development-emulator-accept-end-user-license-agreement-and-download/

to test FMA, BMI/BMI2.  I have a SDE sim for dejagnu so that I can run
GCC testsuite under SDE.

> on x86_64-pc-linux-gnu {,-m32}. I tested the patch lightly on provided
> testcases, so expected patterns are generated. Oh, and all insn
> constraints should be changed from TARGET_BMI to TARGET_BMI2.
>
> Uros.
>

We can also implement MULX with split:

(define_split
  [(parallel [(set (match_operand:<DWI> 0 "register_operand" "")
                   (mult:<DWI>
                     (zero_extend:<DWI>
                       (match_operand:DWIH 1 "nonimmediate_operand" ""))
                     (zero_extend:<DWI>
                       (match_operand:DWIH 2 "nonimmediate_operand" ""))))
              (clobber (reg:CC FLAGS_REG))])]
  "TARGET_BMI2
   && ix86_binary_operator_ok (MULT, <MODE>mode, operands)"
  [(set (match_operand:<DWI> 0 "register_operand" "")
        (mult:<DWI>
          (zero_extend:<DWI>
            (match_operand:DWIH 1 "register_operand" ""))
          (zero_extend:<DWI>
            (match_operand:DWIH 2 "nonimmediate_operand" ""))))])

(define_insn "*bmi2_umul<mode><dwi>3_1"
  [(set (match_operand:<DWI> 0 "register_operand" "=r")
        (mult:<DWI>
          (zero_extend:<DWI>
            (match_operand:DWIH 1 "register_operand" "d"))
          (zero_extend:<DWI>
            (match_operand:DWIH 2 "nonimmediate_operand" "rm"))))]
  "TARGET_BMI2"
{
  if (<MODE>mode == DImode)
    return "mulx\t{%2, %M0, %N0|%N0, %M0, %2}";
  else
    return "mulx\t{%2, %M0, %N0|%N0, %M0, %2}";
}
  [(set_attr "type" "imul")
   (set_attr "prefix" "vex")
   (set_attr "mode" "<MODE>")])

-- 
H.J.

Re: [trans-mem] Add futex-based serial lock

2011-08-20 Thread H.J. Lu

On Sat, Aug 20, 2011 at 8:51 AM, Torvald Riegel  wrote:
> This adds a futex-based version of the serial lock for use on Linux. The
> futex code is basically old code of libitm (it got removed in SVN rev
> 157758) with one fix for sysfutex0 on x86_64 and one change that returns
> the number of woken processes (futex_wake).
>
> The gtm_rwlock is similar in concept to the mutex-based version, but
> adapted to futexes. It performs better than the mutex-based version. Not
> really great yet, but there is no spinning yet, so on contention we'll
> always have the overhead of waiting via the futexes.
>
> Again, RBTree with 1/2/4/6 threads, with different update %:
>  0%:  49    89   120   120
>  1%:  48    65    75    80
>  20%:  35    10    10     9
> 100%:  16.5   2.5   2.5   2.5
>
> For comparison, the mutex-based version:
> 0%:    49 / 90 / 120
> 1%:    47 / 59 /  27
> 20%    34 /  6 /   3
> 100%:  15 /  1 /   1
>
> OK for branch?
>

For x86, please use

#ifdef __x86_64__
# ifndef SYS_futex
#  define SYS_futex 202
# endif

so that it works with x32. See libgomp/config/linux/x86/futex.h
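
As a rough illustration of why the fallback is guarded, here is a minimal, Linux-only C sketch that uses the suggested definition in a wake helper. The helper name futex_wake and its exact signature are assumptions for this example; this is not the libitm or libgomp code.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallback as suggested above: supply the x86-64 number only when the
   system headers did not already define SYS_futex, so that a value
   provided by the headers (for instance on x32) is never overridden.  */
#ifdef __x86_64__
# ifndef SYS_futex
#  define SYS_futex 202
# endif
#endif

/* Hypothetical helper: wake up to COUNT waiters blocked on ADDR and
   return how many were woken, or -1 on error.  */
static long
futex_wake (int *addr, int count)
{
  return syscall (SYS_futex, addr, FUTEX_WAKE, count, NULL, NULL, 0);
}
```

Calling futex_wake on a word that nobody is waiting on simply returns 0, which makes it a cheap way to check that the syscall number is usable.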

Thanks.

-- 
H.J.

Re: [PATCH, testsuite, i386] BMI2 support for GCC

2011-08-20 Thread Uros Bizjak

On Sat, Aug 20, 2011 at 11:31 PM, H.J. Lu  wrote:

> We can also implement MULX with split:
>
> (define_split
>  [(parallel [(set (match_operand: 0 "register_operand" "")
>                   (mult:
>                     (zero_extend:
>                       (match_operand:DWIH 1 "nonimmediate_operand" ""))
>                     (zero_extend:
>                       (match_operand:DWIH 2 "nonimmediate_operand" ""
>              (clobber (reg:CC FLAGS_REG))])]
>  "TARGET_BMI2
>   && ix86_binary_operator_ok (MULT, mode, operands)"
>  [(set (match_operand: 0 "register_operand" "")
>        (mult:
>          (zero_extend:
>            (match_operand:DWIH 1 "register_operand" ""))
>          (zero_extend:
>            (match_operand:DWIH 2 "nonimmediate_operand" ""])

Well, this is unconditional splitter, no better than current approach
where the pattern is expanded directly.

If you want to squeeze out the last 0.005% of performance, you should
add BMI alternative to existing umul pattern, leave the choice of
alternative to RA and split the exact alternative (that is, you need
some true_regnum calls in splitter constraint) after reload to mulx
pattern. Please, see new patterns for how this should be done.

I'm not against this approach, but after 10 hours of hacking, I just
wanted to leave it to an interested reader ;)

Uros.

Re: [PATCH, testsuite, i386] BMI2 support for GCC

2011-08-20 Thread Richard Henderson

On 08/20/2011 02:16 PM, Uros Bizjak wrote:
> - Yb register constraint is added to conditionally enable generation
> of BMI alternatives in generic shift and rotate patterns. The BMI
> variant is generated only if RA chooses it as the most profitable
> alternative.

We really should use the (relatively new) enabled attribute instead
of adding more and more conditional register constraints.

r~

Re: [PATCH, testsuite, i386] BMI2 support for GCC

2011-08-20 Thread H.J. Lu

On Sat, Aug 20, 2011 at 2:44 PM, Uros Bizjak  wrote:
> On Sat, Aug 20, 2011 at 11:31 PM, H.J. Lu  wrote:
>
>> We can also implement MULX with split:
>>
>> (define_split
>>  [(parallel [(set (match_operand: 0 "register_operand" "")
>>                   (mult:
>>                     (zero_extend:
>>                       (match_operand:DWIH 1 "nonimmediate_operand" ""))
>>                     (zero_extend:
>>                       (match_operand:DWIH 2 "nonimmediate_operand" ""
>>              (clobber (reg:CC FLAGS_REG))])]
>>  "TARGET_BMI2
>>   && ix86_binary_operator_ok (MULT, mode, operands)"
>>  [(set (match_operand: 0 "register_operand" "")
>>        (mult:
>>          (zero_extend:
>>            (match_operand:DWIH 1 "register_operand" ""))
>>          (zero_extend:
>>            (match_operand:DWIH 2 "nonimmediate_operand" ""])
>
> Well, this is unconditional splitter, no better than current approach
> where the pattern is expanded directly.
>
> If you want to squeeze out the last 0.005% of performance, you should
> add BMI alternative to existing umul pattern, leave the choice of
> alternative to RA and split the exact alternative (that is, you need
> some true_regnum calls in splitter constraint) after reload to mulx
> pattern. Please, see new patterns for how this should be done.
>
> I'm not against this approach, but after 10 hours of hacking, I just
> wanted to leave it to an interested reader ;)

We won't use split then.

Thanks.


-- 
H.J.

Re: [PATCH, testsuite, i386] BMI2 support for GCC

2011-08-20 Thread Richard Henderson

On 08/20/2011 02:16 PM, Uros Bizjak wrote:
> +(define_insn "bmi2_umul3_1"
> +  [(set (match_operand: 0 "register_operand" "=r")
> + (mult:
> +   (zero_extend:
> + (match_operand:DWIH 1 "nonimmediate_operand" "%d"))
> +   (zero_extend:
> + (match_operand:DWIH 2 "nonimmediate_operand" "rm"]
> +  "TARGET_BMI
> +   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
> +  "mulx\t{%2, %M0, %N0|%N0, %M0, %2}"
> +  [(set_attr "type" "imul")
> +   (set_attr "prefix" "vex")
> +   (set_attr "mode" "")])

You can do better than this, and avoid the %M %N specifiers.
The outputs are truly independent and do not need to be a pair.

See the mn10300 umulsidi3{,_internal} patterns.
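
The point about independent outputs can be modelled in plain C, under the assumption that a 32x32->64 multiply stands in for the double-word case. The function name is hypothetical and this is neither RTL nor the mn10300 pattern; it only shows the high and low halves going to two separate destinations instead of one double-word pair.

```c
#include <stdint.h>

/* Widening unsigned multiply with two independent results, in the
   spirit of MULX: the high and low halves are written to separate
   destinations rather than to one double-word register pair.  */
static void
umul32_wide (uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo)
{
  uint64_t product = (uint64_t) a * b;
  *lo = (uint32_t) product;
  *hi = (uint32_t) (product >> 32);
}
```

Because hi and lo are distinct objects here, nothing forces them to live in adjacent registers, which is the freedom the review comment asks the pattern to expose.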


r~

Re: [PATCH, testsuite, i386] BMI2 support for GCC

2011-08-20 Thread H.J. Lu

On Sat, Aug 20, 2011 at 2:52 PM, Richard Henderson  wrote:
> On 08/20/2011 02:16 PM, Uros Bizjak wrote:
>> +(define_insn "bmi2_umul3_1"
>> +  [(set (match_operand: 0 "register_operand" "=r")
>> +     (mult:
>> +       (zero_extend:
>> +         (match_operand:DWIH 1 "nonimmediate_operand" "%d"))
>> +       (zero_extend:
>> +         (match_operand:DWIH 2 "nonimmediate_operand" "rm"]
>> +  "TARGET_BMI
>> +   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
>> +  "mulx\t{%2, %M0, %N0|%N0, %M0, %2}"
>> +  [(set_attr "type" "imul")
>> +   (set_attr "prefix" "vex")
>> +   (set_attr "mode" "")])
>
> You can do better than this, and avoid the %M %N specifiers.
> The outputs are truly independent and do not need to be a pair.
>

Since RA use register pairs for TImode/DImode, should requiring
TI/DI registers in pairs generate better does?


-- 
H.J.

Re: [PATCH, testsuite, i386] BMI2 support for GCC

2011-08-20 Thread H.J. Lu

On Sat, Aug 20, 2011 at 3:02 PM, H.J. Lu  wrote:
> On Sat, Aug 20, 2011 at 2:52 PM, Richard Henderson  wrote:
>> On 08/20/2011 02:16 PM, Uros Bizjak wrote:
>>> +(define_insn "bmi2_umul3_1"
>>> +  [(set (match_operand: 0 "register_operand" "=r")
>>> +     (mult:
>>> +       (zero_extend:
>>> +         (match_operand:DWIH 1 "nonimmediate_operand" "%d"))
>>> +       (zero_extend:
>>> +         (match_operand:DWIH 2 "nonimmediate_operand" "rm"]
>>> +  "TARGET_BMI
>>> +   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
>>> +  "mulx\t{%2, %M0, %N0|%N0, %M0, %2}"
>>> +  [(set_attr "type" "imul")
>>> +   (set_attr "prefix" "vex")
>>> +   (set_attr "mode" "")])
>>
>> You can do better than this, and avoid the %M %N specifiers.
>> The outputs are truly independent and do not need to be a pair.
>>
>
> Since RA use register pairs for TImode/DImode, should requiring
> TI/DI registers in pairs generate better does?
  ^^ codes.

Without register pairs, we are generating very strange codes.

-- 
H.J.

Re: [patch] PR25508 - document MULTILIB_OSDIRNAMES

2011-08-20 Thread Joseph S. Myers

On Sat, 20 Aug 2011, Matthias Klose wrote:

> +@findex MULTILIB_OSDIRNAMES
> +@item MULTILIB_OSDIRNAMES
> +If @code{MULTILIB_OPTIONS} is used, this variable specifies the list
> +of OS subdirectory names.  The format is either the same as of
> +@code{MULTILIB_DIRNAMES}, or a set of mappings.  When it is the same
> +as @code{MULTILIB_DIRNAMES}, it describes the multilib directories
> +using OS conventions, rather than GCC conventions.  When it is a set

I think more explanation is needed of what this means (where OS 
conventions are used and where GCC conventions are used).

> +of mappings of the form @code{gccdir=osdir}, the left side gives the

@var{gccdir}, @var{osdir}.

> +GCC convention and the right gives the equivalent OS defined location.
> +If the osdir part begins with a !, the os directory names are used

@var{osdir}, @samp{!}.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [patch] PR25508 - document MULTILIB_OSDIRNAMES

2011-08-20 Thread Matthias Klose

On 08/21/2011 12:21 AM, Joseph S. Myers wrote:
> On Sat, 20 Aug 2011, Matthias Klose wrote:
> 
>> +@findex MULTILIB_OSDIRNAMES
>> +@item MULTILIB_OSDIRNAMES
>> +If @code{MULTILIB_OPTIONS} is used, this variable specifies the list
>> +of OS subdirectory names.  The format is either the same as of
>> +@code{MULTILIB_DIRNAMES}, or a set of mappings.  When it is the same
>> +as @code{MULTILIB_DIRNAMES}, it describes the multilib directories
>> +using OS conventions, rather than GCC conventions.  When it is a set
> 
> I think more explanation is needed of what this means (where OS 
> conventions are used and where GCC conventions are used).

well, could you point me to the GCC conventions?

Re: patch to solve PR49936

2011-08-20 Thread Vladimir Makarov


On 08/20/2011 06:13 AM, Richard Sandiford wrote:

Hi Vlad,

Vladimir Makarov  writes:

The following patch makes gcc4.7 behaving as gcc4.6 for the case
described on http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49936.

The patch was successfully bootstrapped on x86_64 and ppc64.

Committed as rev 177916.

2011-08-19  Vladimir Makarov

  PR rtl-optimization/49936
  * ira.c (ira_init_register_move_cost): Ignore too small subclasses
  for calculation of max register move costs.

Thanks for the patch.  The allocno class costs for MIPS look
much better now.

However, the patch seems to expose a latent problem with the use of
ira_reg_class_max_nregs.  We set the number of allocno objects based
on the ira_reg_class_max_nregs of the allocno class, but often
expect that to be the same as the ira_reg_class_max_nregs of the
pressure class.  I can't see anything in the calculation of the
pressure classes to enforce that though.

In current trunk, this shows up as a failure to build libgcc
on mips64-linux-gnu.  We abort on:

   pclass = ira_pressure_class_translate[ALLOCNO_CLASS (a)];
   nregs = ira_reg_class_max_nregs[pclass][ALLOCNO_MODE (a)];
   gcc_assert (nregs == n);

in ira-lives.c:mark_pseudo_regno_subword_live for the attached
testcase, compiled with -O2 -mabi=64.

In this case it's a MIPS backend bug.  The single pressure class
for MIPS is ALL_REGS, and CLASS_MAX_NREGS (ALL_REGS, TImode)
is returning 4, based on the fact that ALL_REGS includes the
floating-point condition codes.  (CCmode is hard-wired to 4 bytes,
so for CCV2 and CCV4, the correct number of registers is the size
of the mode divided by 4.)  Since floating-point condition codes
can't store TImode, the backend should be ignoring them and
returning 2 instead.  I'm testing a fix for that now.
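
The arithmetic behind the 4-versus-2 discrepancy is a rounded-up division of the mode size by the per-register size. A minimal C sketch, assuming a 16-byte TImode, 4-byte condition-code registers and 8-byte general registers under -mabi=64 (the function name is hypothetical and this is not the MIPS backend code):

```c
/* Number of registers of UNIT_SIZE bytes needed to hold a value of
   MODE_SIZE bytes, rounding up.  */
static unsigned
regs_needed (unsigned mode_size, unsigned unit_size)
{
  return (mode_size + unit_size - 1) / unit_size;
}
```

With these assumptions, regs_needed (16, 4) is 4, the value currently returned because of the condition-code registers, while regs_needed (16, 8) is 2, the value expected once they are ignored.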

Thanks, Richard.  It looks like my merge with Bernd's introduction of
objects about a year ago resulted in a few typos (they are present in
mark_pseudo_subword_{live|dead} but absent in other places).  It is
obvious that the allocno class should be used instead of the pressure
class for estimating how many registers are used.  I have a patch to fix
it too.  You could use it for your patch if you want.

I also found another typo in mark_pseudo_subword_live (strangely, it is
absent from mark_pseudo_subword_dead).  The pressure should be increased
by 1, not by nregs.
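
The effect of the second typo can be seen in a toy model. This is one reading of the message, not IRA code, and the function name is hypothetical: each subword object of an allocno is marked live once, and every call adds a fixed increment to the pressure.

```c
/* Toy model (not IRA code): total pressure after each of the
   N_SUBWORDS subword objects of one allocno has been marked live once,
   when every call adds INCREMENT to the pressure.  */
static int
pressure_after_marking_all (int n_subwords, int increment)
{
  int pressure = 0;
  for (int i = 0; i < n_subwords; i++)
    pressure += increment;
  return pressure;
}
```

For a pseudo occupying two registers, an increment of nregs = 2 per subword counts 4 registers in total, whereas an increment of 1 per subword counts the expected 2.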



Index: ira-lives.c
===
--- ira-lives.c (revision 177915)
+++ ira-lives.c (working copy)
@@ -285,7 +285,7 @@ static void
 mark_pseudo_regno_subword_live (int regno, int subword)
 {
   ira_allocno_t a = ira_curr_regno_allocno_map[regno];
-  int n, nregs;
+  int n;
   enum reg_class pclass;
   ira_object_t obj;
 
@@ -303,14 +303,14 @@ mark_pseudo_regno_subword_live (int regn
 }
 
   pclass = ira_pressure_class_translate[ALLOCNO_CLASS (a)];
-  nregs = ira_reg_class_max_nregs[pclass][ALLOCNO_MODE (a)];
-  gcc_assert (nregs == n);
+  gcc_assert
+(n == ira_reg_class_max_nregs[ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)]);
   obj = ALLOCNO_OBJECT (a, subword);
 
   if (sparseset_bit_p (objects_live, OBJECT_CONFLICT_ID (obj)))
 return;
 
-  inc_register_pressure (pclass, nregs);
+  inc_register_pressure (pclass, 1);
   make_object_born (obj);
 }
 
@@ -414,7 +414,7 @@ static void
 mark_pseudo_regno_subword_dead (int regno, int subword)
 {
   ira_allocno_t a = ira_curr_regno_allocno_map[regno];
-  int n, nregs;
+  int n;
   enum reg_class cl;
   ira_object_t obj;
 
@@ -430,8 +430,8 @@ mark_pseudo_regno_subword_dead (int regn
 return;
 
   cl = ira_pressure_class_translate[ALLOCNO_CLASS (a)];
-  nregs = ira_reg_class_max_nregs[cl][ALLOCNO_MODE (a)];
-  gcc_assert (nregs == n);
+  gcc_assert
+(n == ira_reg_class_max_nregs[ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)]);
 
   obj = ALLOCNO_OBJECT (a, subword);
   if (!sparseset_bit_p (objects_live, OBJECT_CONFLICT_ID (obj)))

Re: [PATCH, testsuite, i386] BMI2 support for GCC

2011-08-20 Thread Uros Bizjak

On Sat, Aug 20, 2011 at 11:47 PM, Richard Henderson  wrote:

>> - Yb register constraint is added to conditionally enable generation
>> of BMI alternatives in generic shift and rotate patterns. The BMI
>> variant is generated only if RA chooses it as the most profitable
>> alternative.
>
> We really should use the (relatively new) enabled attribute instead
> of adding more and more conditional register constraints.

Indeed. New version is attached - this one also implements imul
splitting to mulx.

2011-08-20  Uros Bizjak  

* config/i386/i386.md (type): Add imulx, ishiftx and rotatex.
(length_immediate): Handle imulx, ishiftx and rotatex.
(imm_disp): Ditto.
(isa): Add bmi2.
(enabled): Handle bmi2.
(w): New mode attribute.

(*mul3): Split from *mul3.
(*umul3): Ditto.  Add imulx BMI2 alternative.
(bmi2_umul3_1): New insn pattern.
(*umul3 splitter): New splitter to avoid flags dependency.

(*bmi2_ashl3_1): New insn pattern.
(*ashl3_1): Add ishiftx BMI2 alternative.
(*ashl3_1 splitter): New splitter to avoid flags dependency.
(*bmi2_ashlsi3_1_zext): New insn pattern.
(*ashlsi3_1_zext): Add ishiftx BMI2 alternative.
(*ashlsi3_1_zext splitter): New splitter to avoid flags dependency.

(*bmi2_3_1): New insn pattern.
(*3_1): Add ishiftx BMI2 alternative.
(*3_1 splitter): New splitter to avoid
flags dependency.
(*bmi2_si3_1_zext): New insn pattern.
(*si3_1_zext): Add ishiftx BMI2 alternative.
(*si3_1_zext splitter): New splitter to avoid
flags dependency.

(*bmi2_rorx3_1): New insn pattern.
(*3_1): Add rotatex BMI2 alternative.
(*rotate3_1 splitter): New splitter to avoid flags dependency.
(*rotatert3_1 splitter): Ditto.
(*bmi2_rorxsi3_1_zext): New insn pattern.
(*si3_1_zext): Add rotatex BMI2 alternative.
(*rotatesi3_1_zext  splitter): New splitter to avoid flags dependency.
(*rotatertsi3_1_zext splitter): Ditto.

* config/i386/i386.c (print_reg): Handle 'M' and 'N' modifiers.
(print_operand): Ditto.

Bootstrapped on x86_64-pc-linux-gnu, regression test in progress.

Uros.
Index: i386/i386.md
===
--- i386/i386.md(revision 177925)
+++ i386/i386.md(working copy)
@@ -50,6 +50,8 @@
 ;; t --  likewise, print the V8SFmode name of the register.
 ;; h -- print the QImode name for a "high" register, either ah, bh, ch or dh.
 ;; y -- print "st(0)" instead of "st" as a register.
+;; M -- print the low register of a double word register pair.
+;; N -- print the high register of a double word register pair.
 ;; d -- print duplicated register operand for AVX instruction.
 ;; D -- print condition for SSE cmp instruction.
 ;; P -- if PIC, print an @PLT suffix.
@@ -377,7 +379,7 @@
 (define_attr "type"
   "other,multi,
alu,alu1,negnot,imov,imovx,lea,
-   incdec,ishift,ishift1,rotate,rotate1,imul,idiv,
+   incdec,ishift,ishiftx,ishift1,rotate,rotatex,rotate1,imul,imulx,idiv,
icmp,test,ibr,setcc,icmov,
push,pop,call,callv,leave,
str,bitmanip,
@@ -414,8 +416,8 @@
   (const_int 0)
 (eq_attr "unit" "i387,sse,mmx")
   (const_int 0)
-(eq_attr "type" "alu,alu1,negnot,imovx,ishift,rotate,ishift1,rotate1,
- imul,icmp,push,pop")
+(eq_attr "type" "alu,alu1,negnot,imovx,ishift,ishiftx,ishift1,
+ rotate,rotatex,rotate1,imul,imulx,icmp,push,pop")
   (symbol_ref "ix86_attr_length_immediate_default (insn, true)")
 (eq_attr "type" "imov,test")
   (symbol_ref "ix86_attr_length_immediate_default (insn, false)")
@@ -675,7 +677,7 @@
  (and (match_operand 0 "memory_displacement_operand" "")
   (match_operand 1 "immediate_operand" "")))
   (const_string "true")
-(and (eq_attr "type" "alu,ishift,rotate,imul,idiv")
+(and (eq_attr "type" "alu,ishift,ishiftx,rotate,rotatex,imul,imulx,idiv")
  (and (match_operand 0 "memory_displacement_operand" "")
   (match_operand 2 "immediate_operand" "")))
   (const_string "true")
@@ -699,11 +701,12 @@
 (define_attr "movu" "0,1" (const_string "0"))

 ;; Used to control the "enabled" attribute on a per-instruction basis.
-(define_attr "isa" "base,noavx,avx"
+(define_attr "isa" "base,bmi2,noavx,avx"
   (const_string "base"))

 (define_attr "enabled" ""
-  (cond [(eq_attr "isa" "noavx") (symbol_ref "!TARGET_AVX")
+  (cond [(eq_attr "isa" "bmi2") (symbol_ref "TARGET_BMI")
+(eq_attr "isa" "noavx") (symbol_ref "!TARGET_AVX")
 (eq_attr "isa" "avx") (symbol_ref "TARGET_AVX")
]
(const_int 1)))
@@ -947,6 +950,9 @@
 ;; Instruction suffix for REX 64bit operators.
 (define_mode_attr rex64suffix [(SI "") (DI "{q}")])

+;; Register prefix for w

Re: [patch] support for multiarch systems

2011-08-20 Thread Matthias Klose

On 08/20/2011 10:07 PM, Jakub Jelinek wrote:
> On Sat, Aug 20, 2011 at 09:51:33PM +0200, Matthias Klose wrote:
>> Tested on non-multilib'd and multilib'd systems, both native and cross 
>> builds.
>> Ok for the trunk?
> 
> I don't think we want to do this unconditionally, we already search way too
> many directories by default.  This is a Debian/Ubuntu specific setup, I
> don't think many others are going to use such a setup.
> So, IMHO you should make it configure time selectable whether those extra
> dirs are searched or not.  And by default either don't enable it, or enable
> it only on Debian/Ubuntu.

Ok, I made it conditional, enabled only if the crti.o file is found in a
multiarch path.

Re: [PATCH, testsuite, i386] BMI2 support for GCC

2011-08-20 Thread Richard Henderson

On 08/20/2011 03:03 PM, H.J. Lu wrote:
> On Sat, Aug 20, 2011 at 3:02 PM, H.J. Lu  wrote:
>>> You can do better than this, and avoid the %M %N specifiers.
>>> The outputs are truly independent and do not need to be a pair.
>>>
>>
>> Since RA use register pairs for TImode/DImode, should requiring
>> TI/DI registers in pairs generate better does?
>   ^^ codes.
> 
> Without register pairs, we are generating very strange codes.
> 

We ought to be making better use of the lower-subregs pass.
Representing independent outputs when possible enables that.

Admittedly, the i386 port needs more attention to really make
this happen properly.  But we don't need to make things even
worse in the meantime.

r~

Re: [patch] support for multiarch systems

2011-08-20 Thread Matthias Klose

On 08/20/2011 10:39 PM, Joseph S. Myers wrote:
> On Sat, 20 Aug 2011, Matthias Klose wrote:
> 
>> The multiarch triplets are defined in the target specific tmake files, and
>> provided for all known existing multiarch implementations (currently Debian,
>> Ubuntu and derivatives).  For non-multilib'd configurations, the triplet is
> 
> Is there a specification somewhere of what the various triplets mean?

there is
https://lists.linux-foundation.org/pipermail/lsb-discuss/2011-February/006674.html
http://wiki.debian.org/Multiarch/Tuples

but the documentation is not up to date. The tuples in use are:

$ for a in alpha amd64 armel armhf hppa i386 ia64 mips mipsel powerpc powerpcspe
ppc64 s390 s390x sh4 sparc sparc64 kfreebsd-i386 kfreebsd-amd64 hurd-i386; do
dpkg-architecture -a$a -qDEB_HOST_MULTIARCH 2>/dev/null; done
alpha-linux-gnu
x86_64-linux-gnu
arm-linux-gnueabi
arm-linux-gnueabihf
hppa-linux-gnu
i386-linux-gnu
ia64-linux-gnu
mips-linux-gnu
mipsel-linux-gnu
powerpc-linux-gnu
powerpc-linux-gnuspe
powerpc64-linux-gnu
s390-linux-gnu
s390x-linux-gnu
sh4-linux-gnu
sparc-linux-gnu
sparc64-linux-gnu
i386-kfreebsd-gnu
x86_64-kfreebsd-gnu
i386-gnu

>> defined in MULTIARCH_DIRNAME, for multilib'd configurations each directory in
>> MULTILIB_OSDIRNAMES gets an multiarch directory associated, separated by a 
>> colon
> 
> I don't see any documentation in fragments.texi for this 
> (MULTIARCH_DIRNAME is new so certainly needs documenting, even if you get 
> away with not adding to the nonexistent documentation for 
> MULTILIB_OSDIRNAMES (PR 25508)).

well, I hope I get away with copying it from genmultilib without closing the
report ;)

>> (e.g. ../lib:x86_64-linux-gnu).  The multiarch names are as used by Debian, 
>> the
> 
> Does this work with the "gccdir=osdir" and "gccdir=!osdir" cases before 
> the colon?

amd64 is configured this way, and I don't handle the !osdir case other than for
the multilib osdir.

>> mips names go back to a discussion from 2006 [3] to match thee, ones for 
>> glibc.
> 
> For x86, shouldn't a name be allocated for x32?

maybe, but I didn't see a port yet.

> For m68k, classic m68k and ColdFire have incompatible ABIs.  So you need 
> to define what m68k-linux-gnu means of the two ABIs.  Unfortunately 
> building for ColdFire has been broken for some time, since 
>  (this ought to 
> have been dependent on the --with-arch configurey).

it's the classic m68k. yes, it has to be defined.

> For 32-bit Power, hard-float and soft-float ABIs are incompatible.  
> Furthermore, the soft-float ABI is used at function-calling level for 
> e500v1 and e500v2 - but there are differences in the details of the glibc 
> symbols exported (and at least the fenv.h ABI is incompatible between 
> soft-float and e500).  So actually there are four variants at the glibc 
> level.  You need to define what powerpc-linux-gnu means and avoid it being 
> used for anything incompatible.

same here. powerpc-linux-gnu is the hard-float one. Debian has an e500 port in
development which currently uses powerpc-linux-gnuspe.

> For MIPS, the hard-float and soft-float ABIs are incompatible.  So you 
> need twelve triplets, not six.

yes. but I didn't see a soft-float mips port yet.

> For ARM, you have a ChangeLog entry with no corresponding patch.  You need 
> to distinguish big and little endian; old ABI, EABI soft-float ABI and 
> EABI hard-float ABI (six triplets).

ok, added. Debian has little endian ports only. I see that dpkg treats the
obsolete armeb port as armeb-linux-gnu.

> Not all of those variants necessarily are configurable in a multilib 
> configuration in the FSF tree (the e500 variants can be achieved with 
> powerpc-linux-gnuspe triplets, for example, but those don't have other 
> multilibs).  So maybe some of the names won't actually appear in the FSF 
> sources - but you still need to define the semantics of the names that do 
> appear (whether in the manuals, on the GCC wiki or elsewhere) and 
> preferably have somewhere to define semantics for the names not used in 
> multilib configurations in FSF GCC.

For now, the multiarch documentation should be consolidated; I would like to add
a link from the GCC wiki to the documentation mentioned above.

  Matthias

Re: [patch] support for multiarch systems

2011-08-20 Thread Matthias Klose

On 08/20/2011 09:51 PM, Matthias Klose wrote:
> Multiarch [1] is the term being used to refer to the capability of a system to
> install and run applications of multiple different binary targets on the same
> system.  The idea and name of multiarch dates back to 2004/2005 [2] (not to be
> confused with multiarch in glibc).

attached is an updated patch which includes feedback from Jakub and Joseph.

  Matthias

2011-08-21  Matthias Klose  

* doc/invoke.texi: Document -print-multiarch.
* doc/install.texi: Document --enable-multiarch.
* doc/fragments.texi (MULTILIB_OSDIRNAMES): Document optional
multiarch name. (MULTIARCH_DIRNAME): Document.
* configure.ac: New option --enable-multiarch. Substitute with_float.
* configure: Regenerate.
* Makefile.in (s-mlib): Pass MULTIARCH_DIRNAME to genmultilib.
(if_multiarch): Helper macro for use in tmake_files.
(with_float): Define.
* genmultilib: Add new option for the multiarch name.
* gcc.c (multiarch_dir): Define.
(for_each_path): Search for multiarch suffixes.
(driver_handle_option): Handle multiarch option.
(do_spec_1): Pass -imultiarch if defined.
(main): Print multiarch.
(set_multilib_dir): Separate multilib and multiarch names
from multilib_select.
(print_multilib_info): Ignore multiarch names in multilib_select.
* incpath.c (add_standard_paths): Search the multiarch include dirs.
* cppdefault.h (default_include): Document multiarch in multilib
member.
* cppdefault.c: [LOCAL_INCLUDE_DIR, STANDARD_INCLUDE_DIR] Add an
include directory for multiarch directories.
* common.opt: New options --print-multiarch and -imultiarch.
* config/s390/t-linux64: Add multiarch names in MULTILIB_OSDIRNAMES.
* config/sparc/t-linux64: Likewise.
* config/rs6000/t-linux64: Likewise.
* config/i386/t-linux64: Likewise.
* config/mips/t-linux64: Likewise.
* config/alpha/t-linux: Define MULTIARCH_DIRNAME.
* config/arm/t-linux: Likewise.
* config/i386/t-linux: Likewise.
* config/pa/t-linux: Likewise.
* config/sparc/t-linux: Likewise.
* config/ia64/t-glibc: Define MULTIARCH_DIRNAME for linux target.
* gcc/config/i386/t-gnu: New, Define MULTIARCH_DIRNAME.
* gcc/config/i386/t-kfreebsd: New, Define MULTIARCH_DIRNAME and
MULTILIB_OSDIRNAMES.

Index: gcc/doc/fragments.texi
===
--- gcc/doc/fragments.texi  (revision 177846)
+++ gcc/doc/fragments.texi  (working copy)
@@ -128,6 +128,33 @@
 of options to be used for all builds.  If you set this, you should
 probably set @code{CRTSTUFF_T_CFLAGS} to a dash followed by it.

+@findex MULTILIB_OSDIRNAMES
+@item MULTILIB_OSDIRNAMES
+If @code{MULTILIB_OPTIONS} is used, this variable specifies the list
+of OS subdirectory names.  The format is either the same as of
+@code{MULTILIB_DIRNAMES}, or a set of mappings.  When it is the same
+as @code{MULTILIB_DIRNAMES}, it describes the multilib directories
+using OS conventions, rather than GCC conventions.  When it is a set
+of mappings of the form @var{gccdir}=@var{osdir}, the left side gives
+the GCC convention and the right gives the equivalent OS defined
+location.  If the @var{osdir} part begins with a @samp{!}, the os
+directory names are used exclusively.  Use the mapping when there is
+no one-to-one equivalence between GCC levels and the OS.
+
+For multilib enabled configurations (see @code{MULTIARCH_DIRNAME}
+below), the multiarch name is appended to each directory name, separated
+by a colon (e.g. @samp{../lib:x86_64-linux-gnu}).
+
+@findex MULTIARCH_DIRNAME
+@item MULTIARCH_DIRNAME
+If @code{MULTIARCH_DIRNAME} is used, this variable specifies the
+multiarch name for this configuration.  For multiarch enabled
+configurations it is used to search libraries and crt files in
+@file{/lib/@var{multiarch}} and @file{/usr/lib/@var{multiarch}}, and
+system header files in @file{/usr/include/@var{multiarch}}.
+@code{MULTIARCH_DIRNAME} is not used for multilib enabled
+configurations, but encoded in @code{MULTILIB_OSDIRNAMES} instead.
+
 @findex NATIVE_SYSTEM_HEADER_DIR
 @item NATIVE_SYSTEM_HEADER_DIR
 If the default location for system headers is not @file{/usr/include},
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 177846)
+++ gcc/doc/invoke.texi (working copy)
@@ -5937,6 +5937,11 @@
 @file{../lib32}, or if OS libraries are present in @file{lib/@var{subdir}}
 subdirectories it prints e.g.@: @file{amd64}, @file{sparcv9} or @file{ev6}.

+@item -print-multiarch
+@opindex print-multiarch
+Print the path to OS libraries for the selected multiarch,
+relative to some @file{lib} subdirectory.
+
 @item -print-prog-name=@var{program}
 @opindex print-prog-name
 Like @option

Re: [PATCH, testsuite, i386] BMI2 support for GCC

2011-08-20 Thread Uros Bizjak

On Sun, Aug 21, 2011 at 1:58 AM, Richard Henderson  wrote:
> On 08/20/2011 03:03 PM, H.J. Lu wrote:
>> On Sat, Aug 20, 2011 at 3:02 PM, H.J. Lu  wrote:
>>>> You can do better than this, and avoid the %M %N specifiers.
>>>> The outputs are truly independent and do not need to be a pair.
>>>>
>>>
>>> Since RA use register pairs for TImode/DImode, should requiring
>>> TI/DI registers in pairs generate better does?
>>                                                           ^^ codes.
>>
>> Without register pairs, we are generating very strange codes.
>>
>
> We ought to be making better use of the lower-subregs pass.
> Representing independent outputs when possible enables that.
>
> Admittedly, the i386 port needs more attention to really make
> this happen properly.  But we don't need to make things even
> worse in the meantime.

I will investigate this.

BTW: The latest patch has a small error. The insn mnemonic in the following
pattern should be "mul" instead of "imul", so the correct version
reads:

+(define_insn "*umul<mode><dwi>3_1"
+  [(set (match_operand:<DWI> 0 "register_operand" "=A,r")
+	(mult:<DWI>
+	  (zero_extend:<DWI>
+	    (match_operand:DWIH 1 "nonimmediate_operand" "%0,d"))
+	  (zero_extend:<DWI>
+	    (match_operand:DWIH 2 "nonimmediate_operand" "rm,rm"))))
+   (clobber (reg:CC FLAGS_REG))]
+  "!(MEM_P (operands[1]) && MEM_P (operands[2]))"
+  "@
+   mul{<imodesuffix>}\t%2
+   #"

Uros.

Re: [PATCH, testsuite, i386] BMI2 support for GCC

2011-08-20 Thread H.J. Lu

On Sat, Aug 20, 2011 at 5:47 PM, Uros Bizjak  wrote:
> On Sun, Aug 21, 2011 at 1:58 AM, Richard Henderson  wrote:
>> On 08/20/2011 03:03 PM, H.J. Lu wrote:
>>> On Sat, Aug 20, 2011 at 3:02 PM, H.J. Lu  wrote:
>>>>> You can do better than this, and avoid the %M %N specifiers.
>>>>> The outputs are truly independent and do not need to be a pair.
>>>>>
>>>>
>>>> Since RA use register pairs for TImode/DImode, should requiring
>>>> TI/DI registers in pairs generate better does?
>>>                                                           ^^ codes.
>>>
>>> Without register pairs, we are generating very strange codes.
>>>
>>
>> We ought to be making better use of the lower-subregs pass.
>> Representing independent outputs when possible enables that.
>>
>> Admittedly, the i386 port needs more attention to really make
>> this happen properly.  But we don't need to make things even
>> worse in the meantime.
>
> I will investigate this.
>

One problem is 32bit movdi and 64bit movti.  They require
register pairs.  We may need to split them before RA.


-- 
H.J.

Re: [PATCH, testsuite, i386] BMI2 support for GCC

2011-08-20 Thread Richard Henderson

On 08/20/2011 05:52 PM, H.J. Lu wrote:
> One problem is 32bit movdi and 64bit movti.  They require
> register pairs.  We may need to split them before RA.

lower-subreg ought to be able to look through plain moves...


r~

Re: patch to solve PR49936

2011-08-20 Thread Vladimir Makarov


On 08/20/2011 06:39 PM, Vladimir Makarov wrote:

On 08/20/2011 06:13 AM, Richard Sandiford wrote:

Hi Vlad,

Vladimir Makarov  writes:

The following patch makes gcc4.7 behave like gcc4.6 for the case
described on http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49936.

The patch was successfully bootstrapped on x86_64 and ppc64.

Committed as rev 177916.

2011-08-19  Vladimir Makarov

  PR rtl-optimization/49936
  * ira.c (ira_init_register_move_cost): Ignore too small subclasses
  for calculation of max register move costs.

Thanks for the patch.  The allocno class costs for MIPS look
much better now.

However, the patch seems to expose a latent problem with the use of
ira_reg_class_max_nregs.  We set the number of allocno objects based
on the ira_reg_class_max_nregs of the allocno class, but often
expect that to be the same as the ira_reg_class_max_nregs of the
pressure class.  I can't see anything in the calculation of the
pressure classes to enforce that though.

In current trunk, this shows up as a failure to build libgcc
on mips64-linux-gnu.  We abort on:

   pclass = ira_pressure_class_translate[ALLOCNO_CLASS (a)];
   nregs = ira_reg_class_max_nregs[pclass][ALLOCNO_MODE (a)];
   gcc_assert (nregs == n);

in ira-lives.c:mark_pseudo_regno_subword_live for the attached
testcase, compiled with -O2 -mabi=64.

In this case it's a MIPS backend bug.  The single pressure class
for MIPS is ALL_REGS, and CLASS_MAX_NREGS (ALL_REGS, TImode)
is returning 4, based on the fact that ALL_REGS includes the
floating-point condition codes.  (CCmode is hard-wired to 4 bytes,
so for CCV2 and CCV4, the correct number of registers is the size
of the mode divided by 4.)  Since floating-point condition codes
can't store TImode, the backend should be ignoring them and
returning 2 instead.  I'm testing a fix for that now.
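
The arithmetic behind the 4-versus-2 mismatch can be sketched as follows. This is a Python illustration only, not the MIPS backend code: the function names are invented, the 8-byte general register size is an assumption for -mabi=64, and the 4-byte condition-code unit and 16-byte TImode size are the figures given in the explanation above.

```python
TI_MODE_SIZE = 16   # bytes in TImode
GPR_SIZE = 8        # assumed: 64-bit general registers under -mabi=64
CC_SIZE = 4         # CCmode is hard-wired to 4 bytes


def regs_needed(mode_size, reg_size):
    """Number of registers of reg_size bytes needed to hold mode_size bytes."""
    return -(-mode_size // reg_size)  # ceiling division


def class_max_nregs(mode_size, reg_sizes):
    """Worst case over all register kinds counted as part of the class."""
    return max(regs_needed(mode_size, size) for size in reg_sizes)


# Counting the floating-point condition codes as part of ALL_REGS.
with_cc = class_max_nregs(TI_MODE_SIZE, [GPR_SIZE, CC_SIZE])
# Ignoring them, since they cannot hold TImode anyway.
without_cc = class_max_nregs(TI_MODE_SIZE, [GPR_SIZE])

if __name__ == "__main__":
    print(with_cc, without_cc)
```

With these figures the first result is 4 and the second is 2, which is the disagreement the gcc_assert in ira-lives.c trips over.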

Thanks, Richard.  It looks like my merge with Bernd's introduction
of objects about a year ago resulted in a few typos (they are present in
mark_pseudo_regno_subword_{live|dead} but absent in other places).  It is
obvious that the allocno class should be used instead of the pressure class
for estimating how many registers are used.  I have the patch to fix
it too.  You could use it for your patch if you want.


I also found another typo in mark_pseudo_regno_subword_live (strangely, it is
absent from mark_pseudo_regno_subword_dead).  The pressure should be increased
by 1, not by nregs.




I've just committed the patch as rev. 177939.

I successfully bootstrapped it on x86-64 and ppc64.

2011-08-20  Vladimir Makarov 

* ira-lives.c (mark_pseudo_regno_subword_live): Use allocno class
for ira_reg_class_max_nregs.  Increase pressure by 1.
(mark_pseudo_regno_subword_dead): Use allocno class
for ira_reg_class_max_nregs.

Re: [patch] support for multiarch systems

2011-08-20 Thread Andrew Pinski

On Sat, Aug 20, 2011 at 5:11 PM, Matthias Klose  wrote:
>> For MIPS, the hard-float and soft-float ABIs are incompatible.  So you
>> need twelve triplets, not six.
>
> yes. but I didn't see a soft-float mips port yet.

We at Cavium have a soft-float mips port and in fact use Debian as a
base OS for o32 (but hard float) and have our own n32/n64 libraries
which are soft-float.  mips64octeon-linux-gnu is a soft-float port
which can be used right now; I use that triplet right now to build GCC
on this soft-float based port.

Thanks,
Andrew Pinski
