Re: Question on mips multiply patterns in md file

2010-03-16 Thread Amker.Cheng
> If you don't know anything about register class preferencing or reload as
> yet, then this is probably not going to make much sense to you, but it isn't
> anything important you need to worry about at this point.  It is a very
> minor performance optimization.
>
It makes sense to me now, though I haven't read the code for IRA and reload yet.
Thanks for the detailed explanation.
>
> A define_split can only match something generated by a define_insn, and the
> mul_acc_si define_insn is testing "GENERATE_MADD_MSUB && !TARGET_MIPS16"
> so there is no serious problem.  We are just running a define_split that can
> never match anything.  This could be cleaned up a little by adding an
> appropriate condition to the define_split, or by combining the define_insn
> and define_split patterns into a define_insn_and_split pattern.

In other words, you mean that the define_split only gets a chance to split
insns generated by the corresponding pattern "define_insn \"*mul_acc_si\"",
even though the split condition is rather weak (only "reload_completed"),
because that kind of insn can only be generated by the
"define_insn \"*mul_acc_si\"" pattern.
Did I get it right? If so, I'm afraid this is not actually my question.

What I want to know is:
MIPS processors normally implement the following kinds of mult/mult-acc insns:
mult: HILO <-- s * t
mul : HILO <-- s * t ; d <-- LO
madd  : HILO <-- HILO + s * t
madd2: HILO <-- HILO + s * t ;  d <-- HILO
cut here-
In my understanding, the macro GENERATE_MADD_MSUB is true when the processor
has the madd insn rather than madd2, and the macro "ISA_HAS_MUL3" is false if
it has no mul insn.

For this kind of processor, gcc will:
step 1 : generate an insn using gen_mul3_internal, according to the
pattern "mul3";
step 2 : have the combiner try to combine by matching against the pattern
"*mul_acc_si";
step 3 : possibly fail to get the LO register allocated for the combined
"*mul_acc_si" insn;
step 4 : after reload, split the combined insn according to the split
pattern listed in the previous mail;
step 5 : end up with a split insn that is actually a "mul3_internal" but
has no LO allocated, which breaks the constraints in the "mul3_internal"
pattern.

So, what should I do to handle this case? I see no way other than
adding a new split pattern like:

(define_split
  [(set (match_operand:SI 0 "d_operand")
        (plus:SI (mult:SI (match_operand:SI 1 "d_operand")
                          (match_operand:SI 2 "d_operand"))
                 (match_operand:SI 3 "d_operand")))
   (clobber (match_operand:SI 4 "lo_operand"))
   (clobber (match_operand:SI 5 "d_operand"))]
  "SPECIAL_PROCESSOR && reload_completed"
  [(parallel [(set (match_dup 4)
                   (mult:SI (match_dup 1) (match_dup 2)))
              (clobber (match_dup 4))])
   (set (match_dup 5) (match_dup 4))
   (set (match_dup 0) (plus:SI (match_dup 5) (match_dup 3)))]
  "")
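
To make the intent concrete, here is a rough C model of the instruction
semantics listed earlier and of the three-instruction sequence the split above
would produce (multiply into LO, copy LO to a scratch register, add). It is
only a sketch: the struct and every function name are hypothetical and none of
this is GCC code; it just mirrors the "HILO <-- ..." notation.

```c
#include <stdint.h>

/* Hypothetical software model of the HI/LO accumulator pair.  */
struct hilo { uint32_t hi, lo; };

void set_hilo (struct hilo *acc, int64_t v)
{
  acc->hi = (uint32_t) ((uint64_t) v >> 32);
  acc->lo = (uint32_t) (uint64_t) v;
}

int64_t get_hilo (const struct hilo *acc)
{
  return (int64_t) (((uint64_t) acc->hi << 32) | acc->lo);
}

/* mult: HILO <-- s * t  */
void model_mult (struct hilo *acc, int32_t s, int32_t t)
{
  set_hilo (acc, (int64_t) s * t);
}

/* mul: HILO <-- s * t ; d <-- LO  */
int32_t model_mul (struct hilo *acc, int32_t s, int32_t t)
{
  model_mult (acc, s, t);
  return (int32_t) acc->lo;
}

/* madd: HILO <-- HILO + s * t (wrapping 64-bit addition)  */
void model_madd (struct hilo *acc, int32_t s, int32_t t)
{
  uint64_t sum = (uint64_t) get_hilo (acc) + (uint64_t) ((int64_t) s * t);
  set_hilo (acc, (int64_t) sum);
}

/* The split sequence: LO <-- s * t ; tmp <-- LO ; d <-- tmp + a.
   Only the low 32 bits of the product matter for an SImode result.  */
int32_t model_mul_acc_split (struct hilo *acc, int32_t s, int32_t t, int32_t a)
{
  model_mult (acc, s, t);
  uint32_t tmp = acc->lo;
  return (int32_t) (tmp + (uint32_t) a);
}
```

The point of the model is that the split needs LO as the multiply destination
and a separate general register for the copy, which is exactly what operands 4
and 5 are for in the pattern.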

Thanks again, looking forward to your further explanation.


-- 
Best Regards.


Re: GCC 4.5 Status Report (2010-03-15)

2010-03-16 Thread Ramana Radhakrishnan


>  42509, arm-gnueabi doesn't bootstrap but is a primary target

I haven't had the time in the past few weeks to work on this
effectively. I'll be able to find some time to work on this during this
week and will get back on this.

cheers
Ramana




Re: fixed-point support in c++

2010-03-16 Thread Sean D'Epagnier
>   The problem is that it won't be as simple as that.  You'll have to extend
> the C++ parser to accept those new RID_ values that it was previously never
> expecting to see in those contexts, I would think (but haven't verified
> against the source yet).  The C++ parser is a hand-coded recursive-descent
> parser, so I wouldn't expect it to be generically able to deal with
> previously-unknown token types suddenly appearing in its input stream at
> arbitrary points.
>
> cheers,
>   DaveK
>

I went through the c++ parser and added support for fixed point there.
 Everything seems to be working, and I am able to use fixed-point
numbers in c++.

The c++ parser is kind of complex and it is possible I missed
something.  I would love to get feedback on this patch, and hopefully
it can get committed to gcc.

Sean
Index: gcc/builtins.c
===
--- gcc/builtins.c	(revision 157409)
+++ gcc/builtins.c	(working copy)
@@ -1708,6 +1708,7 @@
 case INTEGER_TYPE:	   return integer_type_class;
 case ENUMERAL_TYPE:	   return enumeral_type_class;
 case BOOLEAN_TYPE:	   return boolean_type_class;
+case FIXED_POINT_TYPE: return fixed_point_type_class;
 case POINTER_TYPE:	   return pointer_type_class;
 case REFERENCE_TYPE:   return reference_type_class;
 case OFFSET_TYPE:	   return offset_type_class;
Index: gcc/fold-const.c
===
--- gcc/fold-const.c	(revision 157409)
+++ gcc/fold-const.c	(working copy)
@@ -12303,6 +12303,11 @@
   if (TREE_CODE (arg1) == INTEGER_CST && tree_int_cst_sgn (arg1) < 0)
 	return NULL_TREE;
 
+  /* Since fixed point types cannot perform bitwise and, or, etc..
+	 don't try to convert to an expression with them.  */
+  if (TREE_CODE(type) == FIXED_POINT_TYPE)
+	return NULL_TREE;
+
   /* Turn (a OP c1) OP c2 into a OP (c1+c2).  */
   if (TREE_CODE (op0) == code && host_integerp (arg1, false)
 	  && TREE_INT_CST_LOW (arg1) < TYPE_PRECISION (type)
Index: gcc/cp/typeck.c
===
--- gcc/cp/typeck.c	(revision 157409)
+++ gcc/cp/typeck.c	(working copy)
@@ -316,6 +316,91 @@
   if (code2 == REAL_TYPE && code1 != REAL_TYPE)
 return build_type_attribute_variant (t2, attributes);
 
+  /* Deal with fixed-point types.  */
+  if (code1 == FIXED_POINT_TYPE || code2 == FIXED_POINT_TYPE)
+{
+  unsigned int unsignedp = 0, satp = 0;
+  enum machine_mode m1, m2;
+  unsigned int fbit1, ibit1, fbit2, ibit2, max_fbit, max_ibit;
+
+  m1 = TYPE_MODE (t1);
+  m2 = TYPE_MODE (t2);
+
+  /* If one input type is saturating, the result type is saturating.  */
+  if (TYPE_SATURATING (t1) || TYPE_SATURATING (t2))
+	satp = 1;
+
+  /* If both fixed-point types are unsigned, the result type is unsigned.
+	 When mixing fixed-point and integer types, follow the sign of the
+	 fixed-point type.
+	 Otherwise, the result type is signed.  */
+  if ((TYPE_UNSIGNED (t1) && TYPE_UNSIGNED (t2)
+	   && code1 == FIXED_POINT_TYPE && code2 == FIXED_POINT_TYPE)
+	  || (code1 == FIXED_POINT_TYPE && code2 != FIXED_POINT_TYPE
+	  && TYPE_UNSIGNED (t1))
+	  || (code1 != FIXED_POINT_TYPE && code2 == FIXED_POINT_TYPE
+	  && TYPE_UNSIGNED (t2)))
+	unsignedp = 1;
+
+  /* The result type is signed.  */
+  if (unsignedp == 0)
+	{
+	  /* If the input type is unsigned, we need to convert to the
+	 signed type.  */
+	  if (code1 == FIXED_POINT_TYPE && TYPE_UNSIGNED (t1))
+	{
+	  enum mode_class mclass = (enum mode_class) 0;
+	  if (GET_MODE_CLASS (m1) == MODE_UFRACT)
+		mclass = MODE_FRACT;
+	  else if (GET_MODE_CLASS (m1) == MODE_UACCUM)
+		mclass = MODE_ACCUM;
+	  else
+		gcc_unreachable ();
+	  m1 = mode_for_size (GET_MODE_PRECISION (m1), mclass, 0);
+	}
+	  if (code2 == FIXED_POINT_TYPE && TYPE_UNSIGNED (t2))
+	{
+	  enum mode_class mclass = (enum mode_class) 0;
+	  if (GET_MODE_CLASS (m2) == MODE_UFRACT)
+		mclass = MODE_FRACT;
+	  else if (GET_MODE_CLASS (m2) == MODE_UACCUM)
+		mclass = MODE_ACCUM;
+	  else
+		gcc_unreachable ();
+	  m2 = mode_for_size (GET_MODE_PRECISION (m2), mclass, 0);
+	}
+	}
+
+  if (code1 == FIXED_POINT_TYPE)
+	{
+	  fbit1 = GET_MODE_FBIT (m1);
+	  ibit1 = GET_MODE_IBIT (m1);
+	}
+  else
+	{
+	  fbit1 = 0;
+	  /* Signed integers need to subtract one sign bit.  */
+	  ibit1 = TYPE_PRECISION (t1) - (!TYPE_UNSIGNED (t1));
+	}
+
+  if (code2 == FIXED_POINT_TYPE)
+	{
+	  fbit2 = GET_MODE_FBIT (m2);
+	  ibit2 = GET_MODE_IBIT (m2);
+	}
+  else
+	{
+	  fbit2 = 0;
+	  /* Signed integers need to subtract one sign bit.  */
+	  ibit2 = TYPE_PRECISION (t2) - (!TYPE_UNSIGNED (t2));
+	}
+
+  max_ibit = ibit1 >= ibit2 ?  ibit1 : ibit2;
+  max_fbit = fbit1 >= fbit2 ?  fbit1 : fbit2;
+  return c_common_fixed_point_type_for_size (max_ibit, max_fbit, unsignedp,
+		 satp);
+ 

Re: GCC 4.5 Status Report (2010-03-15)

2010-03-16 Thread Richard Guenther
On Mon, 15 Mar 2010, NightStrike wrote:

> On Mon, Mar 15, 2010 at 12:18 PM, Richard Guenther  wrote:
> > As maintainers do not care for P1 bugs in their maintenance area
> > so will the release managers not consider them P1.
> 
> Probably not the best reason to downgrade a bug, eh?

Well - patches welcome!

Richard.


Re: GCC 4.5 Status Report (2010-03-15)

2010-03-16 Thread Steven Bosscher
On Tue, Mar 16, 2010 at 11:12 AM, Richard Guenther  wrote:
> On Mon, 15 Mar 2010, NightStrike wrote:
>
>> On Mon, Mar 15, 2010 at 12:18 PM, Richard Guenther  wrote:
>> > As maintainers do not care for P1 bugs in their maintenance area
>> > so will the release managers not consider them P1.
>>
>> Probably not the best reason to downgrade a bug, eh?
>
> Well - patches welcome!

Indeed. And one has to realize that fixing all these bugs becomes a
real problem for GCC, as a project, if the company with the largest
listed number of maintainers (many of them of components with P1 bugs)
chooses to not contribute at all to the bug-fixing effort before the
release...

Ciao!
Steven


Re: GCC 4.5 Status Report (2010-03-15)

2010-03-16 Thread Richard Guenther
On Tue, 16 Mar 2010, Steven Bosscher wrote:

> On Tue, Mar 16, 2010 at 11:12 AM, Richard Guenther  wrote:
> > On Mon, 15 Mar 2010, NightStrike wrote:
> >
> >> On Mon, Mar 15, 2010 at 12:18 PM, Richard Guenther  
> >> wrote:
> >> > As maintainers do not care for P1 bugs in their maintenance area
> >> > so will the release managers not consider them P1.
> >>
> >> Probably not the best reason to downgrade a bug, eh?
> >
> > Well - patches welcome!
> 
> Indeed. And one has to realize that fixing all these bugs becomes a
> real problem for GCC, as a project, if the company with the largest
> listed number of maintainers (many of them of components with P1 bugs)
> chooses to not contribute at all to the bug-fixing effort before the
> release...

To be fair the people of that company do not expose bugs proportional
to their headcount either.

Richard.


Re: GCC 4.5 Status Report (2010-03-15)

2010-03-16 Thread Steven Bosscher
On Tue, Mar 16, 2010 at 12:25 PM, Richard Guenther  wrote:
> On Tue, 16 Mar 2010, Steven Bosscher wrote:
>
>> On Tue, Mar 16, 2010 at 11:12 AM, Richard Guenther  wrote:
>> > On Mon, 15 Mar 2010, NightStrike wrote:
>> >
>> >> On Mon, Mar 15, 2010 at 12:18 PM, Richard Guenther  
>> >> wrote:
>> >> > As maintainers do not care for P1 bugs in their maintenance area
>> >> > so will the release managers not consider them P1.
>> >>
>> >> Probably not the best reason to downgrade a bug, eh?
>> >
>> > Well - patches welcome!
>>
>> Indeed. And one has to realize that fixing all these bugs becomes a
>> real problem for GCC, as a project, if the company with the largest
>> listed number of maintainers (many of them of components with P1 bugs)
>> chooses to not contribute at all to the bug-fixing effort before the
>> release...
>
> To be fair the people of that company do not expose bugs proportional
> to their headcount either.

Neither do I, and yet I try to help ;-)

Ciao!
Steven


Re: GCC 4.5 Status Report (2010-03-15)

2010-03-16 Thread Paul Richard Thomas
Richi, Steven,

>> To be fair the people of that company do not expose bugs proportional
>> to their headcount either.
>
> Neither do I, and yet I try to help ;-)


Now, now, you two :-)

Paul


Re: GCC 4.5 Status Report (2010-03-15)

2010-03-16 Thread Joseph S. Myers
On Mon, 15 Mar 2010, Richard Guenther wrote:

>  42509, arm-gnueabi doesn't bootstrap but is a primary target

The primary target is arm-eabi, which is a bare-metal target; the arm-eabi 
and mipsisa64-elf references must be understood as referring to building 
and testing a cross compiler from some other primary platform, since you 
can't bootstrap on those systems.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: fixincl 'make check' regressions...

2010-03-16 Thread Bruce Korb
The intent was to clear up some stuff in the README.
When I noticed that I had affected other files, I had tried to put
everything back.  Obviously a glitch.  I'll fix it when I get home
tonight.

On Mon, Mar 15, 2010 at 11:00 PM, David Miller  wrote:
>
> Ever since your changes installed on March 12th, I've been getting
> fixincludes testsuite failures of the form below.
>
> I also notice that none of these changes added ChangeLog entries, and
> furthermore the SVN commit messages were extremely terse so it was
> hard to diagnose the intent or reasoning behind your changes.
>
> iso/math_c99.h 
> /home/davem/src/GIT/GCC/gcc/fixincludes/tests/base/iso/math_c99.h differ: 
> char 1366, line 52
> *** iso/math_c99.h      Mon Mar 15 22:55:36 2010
> --- /home/davem/src/GIT/GCC/gcc/fixincludes/tests/base/iso/math_c99.h   Thu 
> Jan 21 04:06:11 2010
> ***
> *** 49,55 
>                           ? __builtin_signbitf(x) \
>                           : sizeof(x) == sizeof(long double) \
>                             ? __builtin_signbitl(x) \
> !                            : __builtin_signbit(x));
>  #endif  /* SOLARIS_MATH_8_CHECK */
>
>
> --- 49,55 
>                           ? __builtin_signbitf(x) \
>                           : sizeof(x) == sizeof(long double) \
>                             ? __builtin_signbitl(x) \
> !                            : __builtin_signbit(x))
>  #endif  /* SOLARIS_MATH_8_CHECK */
>
>
>
> There were fixinclude test FAILURES
>


Re: (un)aligned accesses on x86 platform.

2010-03-16 Thread H.J. Lu
2010/3/8 Paweł Sikora :
> hi,
>
> during development of a cross platform application on an x86 workstation
> i've enabled alignment checking [1] to catch possible erroneous
> code before it appears on client's sparc/arm cpu with sigbus ;)
>
> it works pretty fine and catches alignment violations but Jakub Jelinek
> had told me (on glibc bugzilla) that gcc on x86 can still dereference
> an unaligned pointer (except for vector insns).
> i suppose it means that gcc can emit e.g. movl to access a short int
> (or maybe other scenarios) in some cases and violate cpu alignment rules.
>
> so, is it possible to instruct gcc-x86 to always use suitable loads/stores
> like on sparc/arm?
>
> [1] "AC" bit - http://en.wikipedia.org/wiki/FLAGS_register_(computing)
>

I am interested in an -mstrict-alignment option for x86.

-- 
H.J.


Re: (un)aligned accesses on x86 platform.

2010-03-16 Thread Tristan Gingold

On Mar 16, 2010, at 3:50 PM, H.J. Lu wrote:

> 2010/3/8 Paweł Sikora :
>> hi,
>> 
>> during development of a cross platform application on an x86 workstation
>> i've enabled alignment checking [1] to catch possible erroneous
>> code before it appears on client's sparc/arm cpu with sigbus ;)
>> 
>> it works pretty fine and catches alignment violations but Jakub Jelinek
>> had told me (on glibc bugzilla) that gcc on x86 can still dereference
>> an unaligned pointer (except for vector insns).
>> i suppose it means that gcc can emit e.g. movl to access a short int
>> (or maybe other scenarios) in some cases and violate cpu alignment rules.
>> 
>> so, is it possible to instruct gcc-x86 to always use suitable loads/stores
>> like on sparc/arm?
>> 
>> [1] "AC" bit - http://en.wikipedia.org/wiki/FLAGS_register_(computing)
>> 
> 
> I am interested in an -mstrict-alignment option for x86.

Not sure it will be useful.  The libc still does unaligned accesses IIRC.



Questions about "Handle constant exponents." in gcc/builtins.c

2010-03-16 Thread Dominique Dhumieres
In the block "Handle constant exponents." in gcc/builtins.c, the condition
!optimize_size has been replaced with optimize_insn_for_speed_p () between
gcc 4.3 and 4.4, but I have not been able to find when and why.
Does anybody remember when and why?

This change makes the optimization sensitive to PR40106, and unless there are
compelling reasons for it, it should be reverted in this piece of code.

My second question is: why use optimize_size at all? I think it would be better
to define an upper bound, instead of POWI_MAX_MULTS, that depends on the kind
of optimisation. I cannot see any situation in which sqrt(a) would not be
better than pow(a,0.5) for speed, size, and accuracy.

TIA

Dominique



Re: Questions about "Handle constant exponents." in gcc/builtins.c

2010-03-16 Thread Richard Guenther
On Tue, Mar 16, 2010 at 4:11 PM, Dominique Dhumieres  wrote:
> In the block "Handle constant exponents." in gcc/builtins.c, the condition
> !optimize_size has been replaced with optimize_insn_for_speed_p () between
> gcc 4.3 and 4.4, but I have not been able to find when and why.
> Does anybody remember when and why?
>
> This change makes the optimization sensitive to PR40106, and unless there are
> compelling reasons for it, it should be reverted in this piece of code.
>
> My second question is: why use optimize_size at all? I think it would be better
> to define an upper bound, instead of POWI_MAX_MULTS, that depends on the kind
> of optimisation. I cannot see any situation in which sqrt(a) would not be
> better than pow(a,0.5) for speed, size, and accuracy.

pow (a, 0.5) is always expanded to sqrt(a).  It is when we require
additional multiplications, pow (a, n) -> sqrt (a) * a**(n/2), that
optimize_insn_for_speed_p () is checked.

Richard.

> TIA
>
> Dominique
>
>


Re: (un)aligned accesses on x86 platform.

2010-03-16 Thread Alexey Salmin
On Tue, Mar 16, 2010 at 9:05 PM, Tristan Gingold  wrote:
>
> On Mar 16, 2010, at 3:50 PM, H.J. Lu wrote:
>
>> 2010/3/8 Paweł Sikora :
>>> hi,
>>>
>>> during development of a cross platform application on an x86 workstation
>>> i've enabled alignment checking [1] to catch possible erroneous
>>> code before it appears on client's sparc/arm cpu with sigbus ;)
>>>
>>> it works pretty fine and catches alignment violations but Jakub Jelinek
>>> had told me (on glibc bugzilla) that gcc on x86 can still dereference
>>> an unaligned pointer (except for vector insns).
>>> i suppose it means that gcc can emit e.g. movl to access a short int
>>> (or maybe other scenarios) in some cases and violate cpu alignment rules.
>>>
>>> so, is it possible to instruct gcc-x86 to always use suitable loads/stores
>>> like on sparc/arm?
>>>
>>> [1] "AC" bit - http://en.wikipedia.org/wiki/FLAGS_register_(computing)
>>>
>>
>> I am interested in an -mstrict-alignment option for x86.
>
> Not sure it will be useful.  The libc still does unaligned accesses IIRC.
>
>

Wow. What for?

Alexey


Re: (un)aligned accesses on x86 platform.

2010-03-16 Thread Tristan Gingold

On Mar 16, 2010, at 4:37 PM, Alexey Salmin wrote:
>>> I am interested in an -mstrict-alignment option for x86.
>> 
>> Not sure it will be useful.  The libc still does unaligned accesses IIRC.
>> 
> 
> Wow. What for?

Well, simply because it is not compiled with strict alignment.  There might
also be some optimization in memory operation that does unaligned accesses.


Question about removing multiple elements from VEC

2010-03-16 Thread Jie Zhang

Hi,

I'm looking at this FIXME in cp/typeck2.c.

  /* FIXME: Ordered removal is O(1) so the whole function is
 worst-case quadratic. This could be fixed using an aside
 bitmap to record which elements must be removed and remove
 them all at the same time. Or by merging
 split_non_constant_init into process_init_constructor_array,
 that is separating constants from non-constants while building
 the vector.  */
  VEC_ordered_remove (constructor_elt, CONSTRUCTOR_ELTS (init),
  idx);

It seems there is no VEC function which can use a bitmap to do an ordered
multiple remove. Did I miss something, or do I have to write one?
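
For reference, the single-pass idea from the FIXME (record which elements must
go, then remove them all at once) could look roughly like the sketch below. It
deliberately uses a plain int array and a bool mask instead of GCC's VEC and
bitmap types, and the function name is made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Remove every element whose flag in REMOVE is set, keeping the surviving
   elements in their original order.  Runs in one O(n) pass and returns the
   new length.  Hypothetical helper; not part of GCC's VEC API.  */
size_t ordered_remove_marked (int *elts, size_t len, const bool *remove)
{
  size_t dst = 0;

  for (size_t src = 0; src < len; src++)
    if (!remove[src])
      elts[dst++] = elts[src];   /* compact survivors towards the front */

  return dst;
}
```

Calling it once after marking all elements replaces the repeated
VEC_ordered_remove calls, each of which has to shift the tail of the vector.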



Regards,
--
Jie Zhang
CodeSourcery
(650) 331-3385 x735


Re: (un)aligned accesses on x86 platform.

2010-03-16 Thread Alexey Salmin
On Tue, Mar 16, 2010 at 9:48 PM, Tristan Gingold  wrote:
>
> On Mar 16, 2010, at 4:37 PM, Alexey Salmin wrote:
>>>> I am interested in an -mstrict-alignment option for x86.
>>>
>>> Not sure it will be useful.  The libc still does unaligned accesses IIRC.
>>>
>>
>> Wow. What for?
>
> Well, simply because it is not compiled with strict alignment.  There might
> also be some optimization in memory operation that does unaligned accesses.

I always thought that unaligned access is much slower than aligned
one. You mean code-size optimizations?

Alexey


Re: Question about removing multiple elements from VEC

2010-03-16 Thread Richard Guenther
On Tue, Mar 16, 2010 at 5:02 PM, Jie Zhang  wrote:
> Hi,
>
> I'm looking at this FIXME in cp/typeck2.c.
>
>      /* FIXME: Ordered removal is O(1) so the whole function is
>         worst-case quadratic. This could be fixed using an aside
>         bitmap to record which elements must be removed and remove
>         them all at the same time. Or by merging
>         split_non_constant_init into process_init_constructor_array,
>         that is separating constants from non-constants while building
>         the vector.  */
>      VEC_ordered_remove (constructor_elt, CONSTRUCTOR_ELTS (init),
>                          idx);
>
> It seems there is no VEC function which can use a bitmap to do an ordered
> multiple remove. Did I miss something, or do I have to write one?

You have to write one.

Richard.


Re: Question about removing multiple elements from VEC

2010-03-16 Thread Jie Zhang

On 03/17/2010 12:08 AM, Richard Guenther wrote:
> On Tue, Mar 16, 2010 at 5:02 PM, Jie Zhang  wrote:
>> Hi,
>>
>> I'm looking at this FIXME in cp/typeck2.c.
>>
>>   /* FIXME: Ordered removal is O(1) so the whole function is
>>  worst-case quadratic. This could be fixed using an aside
>>  bitmap to record which elements must be removed and remove
>>  them all at the same time. Or by merging
>>  split_non_constant_init into process_init_constructor_array,
>>  that is separating constants from non-constants while building
>>  the vector.  */
>>   VEC_ordered_remove (constructor_elt, CONSTRUCTOR_ELTS (init),
>>   idx);
>>
>> It seems there is no VEC function which can use a bitmap to do an ordered
>> multiple remove. Did I miss something, or do I have to write one?
>
> You have to write one.


Thanks!

--
Jie Zhang
CodeSourcery
(650) 331-3385 x735


Re: (un)aligned accesses on x86 platform.

2010-03-16 Thread Piotr Wyderski
Alexey Salmin wrote:

> I always thought that unaligned access is much slower than aligned one.

It is not *MUCH* slower, just slower (unless you cross cache line
boundary). Unaligned accesses are very useful for improving
performance of, among other things, certain hash functions (e.g. Paul
Hsieh's one).

Best regards,
Piotr Wyderski


Re: (un)aligned accesses on x86 platform.

2010-03-16 Thread Jakub Jelinek
On Tue, Mar 16, 2010 at 10:04:04PM +0600, Alexey Salmin wrote:
> >> Wow. What for?
> >
> > Well, simply because it is not compiled with strict alignment.  There might
> > also be some optimization in memory operation that does unaligned accesses.
> 
> I always thought that unaligned access is much slower than aligned
> one. You mean code-size optimizations?

It is, but if you need to choose between doing say an unaligned 32-bit
read access and reading it in 4 8-bit reads and assembling those together,
on many targets that do allow unaligned accesses the former is much faster.
Especially if in most cases the read is actually aligned and only in rare
cases it is unaligned...
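
To illustrate the two alternatives, here is a small C sketch with made-up
function names. The first variant expresses a possibly unaligned 32-bit read
with memcpy, which compilers typically turn into a single load on targets that
allow unaligned accesses; the second assembles the value from four 8-bit reads
(little-endian byte order is assumed here purely for the example).

```c
#include <stdint.h>
#include <string.h>

/* One (possibly unaligned) 32-bit read, in host byte order.  */
uint32_t load_u32_unaligned (const unsigned char *p)
{
  uint32_t v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Four 8-bit reads assembled into a little-endian 32-bit value.  */
uint32_t load_u32_le_bytes (const unsigned char *p)
{
  return (uint32_t) p[0]
         | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[3] << 24);
}
```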

Jakub


Re: LTO and asm specs...

2010-03-16 Thread Richard Henderson
On 03/12/2010 09:33 PM, David Miller wrote:
> I couldn't figure out immediately how to fix this as the
> way LTO does spec overriding and such looked non-trivial.

It would not be a bad thing, IMO, if the sparc assembler
were extended to be able to emit any reloc directly, without
needing a specific command-line option.  Then you'd only
encounter this problem with legacy assemblers.


r~


Re: LTO and asm specs...

2010-03-16 Thread David Miller
From: Richard Henderson 
Date: Tue, 16 Mar 2010 11:31:44 -0700

> On 03/12/2010 09:33 PM, David Miller wrote:
>> I couldn't figure out immediately how to fix this as the
>> way LTO does spec overriding and such looked non-trivial.
> 
> It would not be a bad thing, IMO, if the sparc assembler
> were extended to be able to emit any reloc directly, without
> needing a specific command-line option.  Then you'd only
> encounter this problem with legacy assemblers.

It's not the assembler's fault.

We're using %hi() and expecting the assembler to emit a
PC relative relocation just because the symbol name happens
to be _GLOBAL_OFFSET_TABLE_.  And it will do this, but only
when -PIC.  Changing that is pretty dangerous.

But even if we got past that, we need to get the assembler options
right in order to enable instruction classes.  For example we have to
get -Av9a there when using VIS instructions.

Other platforms are going to hit things like this too.

LTO really needs to evaluate the specs correctly.


Why is __i686 undefined for x86_64 -m32 (in mainline)

2010-03-16 Thread Paolo Carlini
Hi,

I'm rather surprised that now, in the "sane default world", only __i386
is defined on x86_64 -m32, whereas __i686 is not; I need -march=i686 on
the command line (together with -m32) to get it.

I noticed that while analyzing libstdc++/43394, where I was surprised
that some preprocessor lines, legacy code actually, in the library code
for parallel mode do not "notice" that we have now a better default:

#elif defined(__GNUC__) && defined(__i386) &&   \
  (defined(__i686) || defined(__pentium4) || defined(__athlon))
return __sync_fetch_and_add(__ptr, __addend);

... indeed, such lines want __i686 in order to safely enable the builtin
and still find it undefined.

If - as it's probably the case - I'm a bit confused about the meaning of
those __i?86 macros, what do people suggest instead? I suspect my
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_* could be put to good use, but I'm
still curious about the exact semantics of the __i?86 macros...
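
Something along these lines is what I have in mind: keying the builtin on the
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 feature macro instead of on processor
macros. This is only a sketch, not the actual parallel-mode code; the function
name is hypothetical, a 4-byte int is assumed, and the fallback branch is
deliberately non-atomic and there for illustration only.

```c
/* Add ADDEND to *PTR and return the previous value.  */
int fetch_and_add_sketch (volatile int *ptr, int addend)
{
#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4)
  /* The compiler advertises native 4-byte atomics for this target.  */
  return __sync_fetch_and_add (ptr, addend);
#else
  /* NOT atomic: placeholder for whatever fallback the library provides.  */
  int old = *ptr;
  *ptr = old + addend;
  return old;
#endif
}
```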

Thanks in advance,
Paolo.


Re: LTO and asm specs...

2010-03-16 Thread Richard Henderson
On 03/16/2010 12:28 PM, David Miller wrote:
> It's not the assembler's fault.
> 
> We're using %hi() and expecting the assembler to emit a
> PC relative relocation just because the symbol name happens
> to be _GLOBAL_OFFSET_TABLE_.  And it will do this, but only
> when -PIC.  Changing that is pretty dangerous.

It is the assembler's fault because it doesn't provide %pcrelhi() or
some such to allow the compiler (or asm programmer) to emit exactly
the relocation that's desired.

> But even if we got past that, we need to get the assembler options
> right in order to enable instruction classes.  For example we have to
> get -Av9a there when using VIS instructions.

How about ".arch v9a" like other platforms emit?

Command-line options that control what the assembler emits for
the exact same bit of text are a Really Bad Idea, as we've seen
from other platforms time and time again.


r~


Re: LTO and asm specs...

2010-03-16 Thread David Miller
From: Richard Henderson 
Date: Tue, 16 Mar 2010 12:53:47 -0700

> On 03/16/2010 12:28 PM, David Miller wrote:
>> It's not the assembler's fault.
>> 
>> We're using %hi() and expecting the assembler to emit a
>> PC relative relocation just because the symbol name happens
>> to be _GLOBAL_OFFSET_TABLE_.  And it will do this, but only
>> when -PIC.  Changing that is pretty dangerous.
> 
> It is the assembler's fault because it doesn't provide %pcrelhi() or
> some such to allow the compiler (or asm programmer) to emit exactly
> the relocation that's desired.

There is %pc22() and %pc10.  I don't know if it's safe to
change gcc to use them in all cases though.

>> But even if we got past that, we need to get the assembler options
>> right in order to enable instruction classes.  For example we have to
>> get -Av9a there when using VIS instructions.
> 
> How about ".arch v9a" like other platforms emit?
> 
> Command-line options that control what the assembler emits for
> the exact same bit of text are a Really Bad Idea, as we've seen
> from other platforms time and time again.

I think this distracts from the issue that LTO needs to
process specs properly.

Are you seriously against fixing that LTO bug?


Re: Why is __i686 undefined for x86_64 -m32 (in mainline)

2010-03-16 Thread H.J. Lu
On Tue, Mar 16, 2010 at 12:32 PM, Paolo Carlini
 wrote:
> Hi,
>
> I'm rather surprised that now, in the "sane default world", only __i386 is
> defined on x86_64 -m32, whereas __i686 is not; I need -march=i686 on the
> command line (together with -m32) to get it.
>
> I noticed that while analyzing libstdc++/43394, where I was surprised that
> some preprocessor lines, legacy code actually, in the library code for
> parallel mode do not "notice" that we have now a better default:
>
> #elif defined(__GNUC__) && defined(__i386) &&   \
>   (defined(__i686) || defined(__pentium4) || defined(__athlon))
>     return __sync_fetch_and_add(__ptr, __addend);
>
> ... indeed, such lines want __i686 in order to safely enable the builtin and
> still find it undefined.
>
> If - as it's probably the case - I'm a bit confused about the meaning of
> those __i?86 macros, what do people suggest instead? I suspect my
> __GCC_HAVE_SYNC_COMPARE_AND_SWAP_* could be put to good use, but I'm still
> curious about the exact semantics of the __i?86 macros...
>
> Thanks in advance,
> Paolo.

The question is what processor macros should "-march=x86-64" define. There
is

  {"x86-64", PROCESSOR_K8, CPU_K8,
PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_NO_SAHF},

For -march=x86-64, __k8 is defined.  However, real K8 supports:

  {"k8", PROCESSOR_K8, CPU_K8,
PTA_64BIT | PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE
| PTA_SSE2 | PTA_NO_SAHF},

It isn't an issue in i386.c since PROCESSOR_K8 isn't used to check
ISAs. But using __k8 to check ISAs is a problem.
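
As an illustration of the difference, code that cares about instruction sets
can test the per-ISA macros rather than a processor macro such as __k8. The
sketch below uses GCC's __MMX__, __SSE__, __SSE2__ and __3dNOW__ macros; the
enum and function names are made up for the example.

```c
/* Hypothetical flag bits for the result.  */
enum { ISA_MMX = 1, ISA_SSE = 2, ISA_SSE2 = 4, ISA_3DNOW = 8 };

/* Report which ISA extensions this translation unit was compiled for,
   based on per-ISA macros instead of the processor name.  */
unsigned compiled_isa_flags (void)
{
  unsigned flags = 0;
#ifdef __MMX__
  flags |= ISA_MMX;
#endif
#ifdef __SSE__
  flags |= ISA_SSE;
#endif
#ifdef __SSE2__
  flags |= ISA_SSE2;
#endif
#ifdef __3dNOW__
  flags |= ISA_3DNOW;
#endif
  return flags;
}
```

With -march=x86-64 and -march=k8 both defining __k8, only a check like this
tells the two apart as far as 3DNow! is concerned.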

-- 
H.J.


Re: LTO and asm specs...

2010-03-16 Thread H.J. Lu
On Tue, Mar 16, 2010 at 12:28 PM, David Miller  wrote:
> From: Richard Henderson 
> Date: Tue, 16 Mar 2010 11:31:44 -0700
>
>> On 03/12/2010 09:33 PM, David Miller wrote:
>>> I couldn't figure out immediately how to fix this as the
>>> way LTO does spec overriding and such looked non-trivial.
>>
>> It would not be a bad thing, IMO, if the sparc assembler
>> were extended to be able to emit any reloc directly, without
>> needing a specific command-line option.  Then you'd only
>> encounter this problem with legacy assemblers.
>
> It's not the assembler's fault.
>
> We're using %hi() and expecting the assembler to emit a
> PC relative relocation just because the symbol name happens
> to be _GLOBAL_OFFSET_TABLE_.  And it will do this, but only
> when -PIC.  Changing that is pretty dangerous.
>
> But even if we got past that, we need to get the assembler options
> right in order to enable instruction classes.  For example we have to
> get -Av9a there when using VIS instructions.
>
> Other platforms are going to hit things like this too.
>
> LTO really needs to evaluate the specs correctly.
>

Can you store assembler options in some LTO section?


-- 
H.J.


Re: Why is __i686 undefined for x86_64 -m32 (in mainline)

2010-03-16 Thread Paolo Carlini
On 03/16/2010 08:53 PM, H.J. Lu wrote:
> The question is what processor macros should "-march=x86-64" define. There
> is
>
>   {"x86-64", PROCESSOR_K8, CPU_K8,
> PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_NO_SAHF},
>
> For -march=x86-64, __k8 is defined.  However, real K8 supports:
>
>   {"k8", PROCESSOR_K8, CPU_K8,
> PTA_64BIT | PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE
> | PTA_SSE2 | PTA_NO_SAHF},
>
> It isn't an issue in i386.c since PROCESSOR_K8 isn't used to check
> ISAs. But using __k8 to check ISAs is a problem.
>   
I'm not sure I follow the gory details of your reply, but to me it
seems *really* strange that *now*, on x86_64, "-m32" is not the same as
"-m32 -march=i686" as far as __i686 is concerned...

Paolo.


Re: Why is __i686 undefined for x86_64 -m32 (in mainline)

2010-03-16 Thread H.J. Lu
On Tue, Mar 16, 2010 at 1:13 PM, Paolo Carlini  wrote:
> On 03/16/2010 08:53 PM, H.J. Lu wrote:
>> The question is what processor macros should "-march=x86-64" define. There
>> is
>>
>>       {"x86-64", PROCESSOR_K8, CPU_K8,
>>         PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_NO_SAHF},
>>
>> For -march=x86-64, __k8 is defined.  However, real K8 supports:
>>
>>       {"k8", PROCESSOR_K8, CPU_K8,
>>         PTA_64BIT | PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE
>>         | PTA_SSE2 | PTA_NO_SAHF},
>>
>> It isn't an issue in i386.c since PROCESSOR_K8 isn't used to check
>> ISAs. But using __k8 to check ISAs is a problem.
>>
> I'm not sure I follow the gory details of your reply, but to me it
> seems *really* strange that *now*, on x86_64, "-m32" is not the same as
> "-m32 -march=i686" as far as __i686 is concerned...
>

We never defined __i686 for -m32 by default on x86_64. Here is
a patch to define __i686 for -m32 if the processor supports it.

-- 
H.J.
2010-03-16  H.J. Lu  

* config/i386/i386-c.c (ix86_target_macros_internal): Define
__i686/__i686__ for PROCESSOR_K8, PROCESSOR_AMDFAM10,
PROCESSOR_PENTIUM4, PROCESSOR_NOCONA, PROCESSOR_CORE2 and
PROCESSOR_ATOM.

diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 35eab49..f6dad14 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -100,26 +100,53 @@ ix86_target_macros_internal (int isa_flag,
         def_or_undef (parse_in, "__athlon_sse__");
       break;
     case PROCESSOR_K8:
+      if (!TARGET_64BIT)
+        {
+          def_or_undef (parse_in, "__i686");
+          def_or_undef (parse_in, "__i686__");
+        }
       def_or_undef (parse_in, "__k8");
       def_or_undef (parse_in, "__k8__");
       break;
     case PROCESSOR_AMDFAM10:
+      if (!TARGET_64BIT)
+        {
+          def_or_undef (parse_in, "__i686");
+          def_or_undef (parse_in, "__i686__");
+        }
       def_or_undef (parse_in, "__amdfam10");
       def_or_undef (parse_in, "__amdfam10__");
       break;
     case PROCESSOR_PENTIUM4:
+      def_or_undef (parse_in, "__i686");
+      def_or_undef (parse_in, "__i686__");
       def_or_undef (parse_in, "__pentium4");
       def_or_undef (parse_in, "__pentium4__");
       break;
     case PROCESSOR_NOCONA:
+      if (!TARGET_64BIT)
+        {
+          def_or_undef (parse_in, "__i686");
+          def_or_undef (parse_in, "__i686__");
+        }
       def_or_undef (parse_in, "__nocona");
       def_or_undef (parse_in, "__nocona__");
       break;
     case PROCESSOR_CORE2:
+      if (!TARGET_64BIT)
+        {
+          def_or_undef (parse_in, "__i686");
+          def_or_undef (parse_in, "__i686__");
+        }
       def_or_undef (parse_in, "__core2");
       def_or_undef (parse_in, "__core2__");
       break;
     case PROCESSOR_ATOM:
+      if (!TARGET_64BIT)
+        {
+          def_or_undef (parse_in, "__i686");
+          def_or_undef (parse_in, "__i686__");
+        }
       def_or_undef (parse_in, "__atom");
       def_or_undef (parse_in, "__atom__");
       break;
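
The effect of the patch can be summarized with a small stand-alone model. This is only a sketch: the enum and the function would_define_i686 are invented for illustration and are not GCC code. They mirror the cases touched by the diff above, including the fact that the PROCESSOR_PENTIUM4 case is not guarded by !TARGET_64BIT.

```c
#include <stdbool.h>

/* Illustrative stand-ins for the processor cases touched by the patch;
   these are not the real GCC definitions.  */
enum model_processor {
    MODEL_K8,
    MODEL_AMDFAM10,
    MODEL_PENTIUM4,
    MODEL_NOCONA,
    MODEL_CORE2,
    MODEL_ATOM
};

/* Returns true if, with the patch applied, __i686/__i686__ would be
   defined for the given processor and word size.  */
static bool would_define_i686(enum model_processor cpu, bool target_64bit)
{
    switch (cpu) {
    case MODEL_PENTIUM4:
        /* The patch adds the definitions unconditionally here.  */
        return true;
    case MODEL_K8:
    case MODEL_AMDFAM10:
    case MODEL_NOCONA:
    case MODEL_CORE2:
    case MODEL_ATOM:
        /* Only defined when not targeting 64-bit code.  */
        return !target_64bit;
    }
    return false;
}
```

For example, would_define_i686 (MODEL_K8, false) is true, which corresponds to the -m32 default on x86_64 discussed in this thread, while the same processor in 64-bit mode yields false.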


Re: Why is __i686 undefined for x86_64 -m32 (in mainline)

2010-03-16 Thread Paolo Carlini
On 03/16/2010 09:40 PM, H.J. Lu wrote:
> We never defined __i686 for -m32 by default on x86_64. Here is
> a patch to define __i686 for -m32 if the processor supports it.
>   
If I understand correctly the logic underlying the recent work in this
area, I think we certainly want your patch, because otherwise we have
kind of an inconsistent situation: the i686 facilities *are* available,
but __i686 is undefined.

Maybe the patch should go to gcc-patches too...

Thanks,
Paolo.



Re: Why is __i686 undefined for x86_64 -m32 (in mainline)

2010-03-16 Thread Jakub Jelinek
On Tue, Mar 16, 2010 at 09:53:30PM +0100, Paolo Carlini wrote:
> On 03/16/2010 09:40 PM, H.J. Lu wrote:
> > We never defined __i686 for -m32 by default on x86_64. Here is
> > a patch to define __i686 for -m32 if the processor supports it.
> >   
> If I understand correctly the logic underlying the recent work in this
> area, I think we certainly want your patch, because otherwise we have
> kind of an inconsistent situation: the i686 facilities *are* available,
> but __i686 is undefined.
>
> Maybe the patch should go to gcc-patches too...

I don't think it is a good idea to change the meaning of the macros years
after they have been introduced.
You could add a different macro if you want.
Why should __i686 be special?  i686 does have __i586 features too; should it
also define __i586, __i486?  Should __core2 define __pentium4?  Etc., etc.

Jakub


Re: Why is __i686 undefined for x86_64 -m32 (in mainline)

2010-03-16 Thread Paolo Carlini
On 03/16/2010 09:58 PM, Jakub Jelinek wrote:
> I don't think it is a good idea to change the meaning of the macros years
> after they have been introduced.
> You could add a different macro if you want.
> Why should __i686 be special?  i686 does have __i586 features too; should it
> also define __i586, __i486?
Probably it should, in my opinion.

But maybe I'm missing something about the whole logic of the recent
changes: wasn't it about making the default for an i686 target similar, if
not identical, to passing -march=i686 by hand? I'm really, really
confused... How are people supposed to figure out with macros that the
new default configuration supports everything -march=i686 supports, vs
the previous status when it was identical to -march=i386?!?

Paolo.


Re: Why is __i686 undefined for x86_64 -m32 (in mainline)

2010-03-16 Thread H.J. Lu
On Tue, Mar 16, 2010 at 2:03 PM, Paolo Carlini  wrote:
> On 03/16/2010 09:58 PM, Jakub Jelinek wrote:
>> I don't think it is a good idea to change the meaning of the macros years
>> after they have been introduced.
>> You could add a different macro if you want.
>> Why should __i686 be special?  i686 does have __i586 features too; should it
>> also define __i586, __i486?
> Probably it should, in my opinion.
>
> But maybe I'm missing something about the whole logic of the recent
> changes: wasn't it about making the default for an i686 target similar, if
> not identical, to passing -march=i686 by hand? I'm really, really
> confused... How are people supposed to figure out with macros that the
> new default configuration supports everything -march=i686 supports, vs
> the previous status when it was identical to -march=i386?!?
>
> Paolo.
>

Checking __iX86 is a good idea for ISAs since its meaning isn't well defined
nor enforced.  For libstdc++ purposes, can you check __SSE2__ in addition to
__i686?


-- 
H.J.


Re: Why is __i686 undefined for x86_64 -m32 (in mainline)

2010-03-16 Thread H.J. Lu
On Tue, Mar 16, 2010 at 2:06 PM, H.J. Lu  wrote:
> On Tue, Mar 16, 2010 at 2:03 PM, Paolo Carlini  
> wrote:
>> On 03/16/2010 09:58 PM, Jakub Jelinek wrote:
>>> I don't think it is a good idea to change the meaning of the macros years
>>> after they have been introduced.
>>> You could add a different macro if you want.
>>> Why should __i686 be special?  i686 does have __i586 features too; should it
>>> also define __i586, __i486?
>> Probably it should, in my opinion.
>>
>> But maybe I'm missing something about the whole logic of the recent
>> changes: wasn't it about making the default for an i686 target similar, if
>> not identical, to passing -march=i686 by hand? I'm really, really
>> confused... How are people supposed to figure out with macros that the
>> new default configuration supports everything -march=i686 supports, vs
>> the previous status when it was identical to -march=i386?!?
>>
>> Paolo.
>>
>
> Checking __iX86 is a good idea for ISAs since its meaning isn't well defined

I mean "isn't a good idea".

> nor enforced.  For libstdc++ purposes, can you check __SSE2__ in addition to
> __i686?
>
>
> --
> H.J.
>



-- 
H.J.


Re: Why is __i686 undefined for x86_64 -m32 (in mainline)

2010-03-16 Thread H.J. Lu
On Tue, Mar 16, 2010 at 1:58 PM, Jakub Jelinek  wrote:
> On Tue, Mar 16, 2010 at 09:53:30PM +0100, Paolo Carlini wrote:
>> On 03/16/2010 09:40 PM, H.J. Lu wrote:
>> > We never defined __i686 for -m32 by default on x86_64. Here is
>> > a patch to define __i686 for -m32 if the processor supports it.
>> >
>> If I understand correctly the logic underlying the recent work in this
>> area, I think we certainly want your patch, because otherwise we have
>> kind of an inconsistent situation: the i686 facilities *are* available,
>> but __i686 is undefined.
>>
>> Maybe the patch should go to gcc-patches too...
>
> I don't think it is a good idea to change the meaning of the macros years
> after they have been introduced.
> You could add a different macro if you want.
> Why should __i686 be special?  i686 does have __i586 features too; should it
> also define __i586, __i486?  Should __core2 define __pentium4?  Etc., etc.
>

I don't think we should add those at all. i386.c has

  /* For sane SSE instruction set generation we need fcomi instruction.
     It is safe to enable all CMOVE instructions.  */
  if (TARGET_SSE)
    TARGET_CMOVE = 1;

Why not check __SSE__ or __SSE2__?


-- 
H.J.


Re: Why is __i686 undefined for x86_64 -m32 (in mainline)

2010-03-16 Thread Paolo Carlini
On 03/16/2010 10:08 PM, H.J. Lu wrote:
>> I don't think it is a good idea to change the meaning of the macros years
>> after they have been introduced.
>> You could add a different macro if you want.
>> Why should __i686 be special?  i686 does have __i586 features too; should it
>> also define __i586, __i486?  Should __core2 define __pentium4?  Etc., etc.
>>
>> 
> I don't think we should add those at all.
>   
About i586 & co, I see now that you are right.

To recapitulate my point: it just seemed strange to me that, before and
after the recent changes, __i386 is defined, whereas __i686 is defined
only if I pass -march=i686. On the other hand, after the recent changes,
which essentially change the default subtarget to -march=i686, __i686 is
not defined by default.

Paolo.


Re: Why is __i686 undefined for x86_64 -m32 (in mainline)

2010-03-16 Thread H.J. Lu
On Tue, Mar 16, 2010 at 1:14 PM, Paolo Carlini  wrote:
> On 03/16/2010 10:08 PM, H.J. Lu wrote:
>>> I don't think it is a good idea to change the meaning of the macros years
>>> after they have been introduced.
>>> You could add a different macro if you want.
>>> Why should __i686 be special?  i686 does have __i586 features too; should it
>>> also define __i586, __i486?  Should __core2 define __pentium4?  Etc., etc.
>>>
>>>
>> I don't think we should add those at all.
>>
> About i586 & co, I see now that you are right.
>
> To recapitulate my point: it just seemed strange to me that, before and
> after the recent changes, __i386 is defined, whereas __i686 is defined
> only if I pass -march=i686. On the other hand, after the recent changes,
> which essentially change the default subtarget to -march=i686, __i686 is
> not defined by default.
>

That is not true. The new -m32 default ISA on x86-64 is i686 + MMX + SSE + SSE2.
It is Pentium 4, not i686.  For historical reasons, we define __k8
instead of __pentium4.


-- 
H.J.


Re: Why is __i686 undefined for x86_64 -m32 (in mainline)

2010-03-16 Thread Paolo Carlini
On 03/16/2010 10:20 PM, H.J. Lu wrote:
> That is not true. The new -m32 default ISA on x86-64 is i686 + MMX + SSE + 
> SSE2.
> It is Pentium 4, not i686.  For historical reasons, we define __k8
> instead of __pentium4.
>   
Ah, ok, this is what I was missing! We have *more* than i686. Thus I can
check for __k8.

Thanks again,
Paolo.


Re: Why is __i686 undefined for x86_64 -m32 (in mainline)

2010-03-16 Thread H.J. Lu
On Tue, Mar 16, 2010 at 1:30 PM, Paolo Carlini  wrote:
> On 03/16/2010 10:20 PM, H.J. Lu wrote:
>> That is not true. The new -m32 default ISA on x86-64 is i686 + MMX + SSE + 
>> SSE2.
>> It is Pentium 4, not i686.  For historical reasons, we define __k8
>> instead of __pentium4.
>>
> Ah, ok, this is what I was missing! We have *more* than i686. Thus I can
> check for __k8.
>

Please check __SSE__ since __k8 won't be defined for -march=atom.
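
To illustrate the difference, here is a small sketch, assuming a GCC-compatible compiler. The function names are hypothetical. The first check has to list processor macros one by one and misses every -march value that is not listed; the second relies on the single feature macro __SSE__.

```c
#include <stdbool.h>

/* Processor-name check: true only for the -march values listed here.
   Anything else (for example -march=atom) falls through to false.  */
static bool cpu_name_check(void)
{
#if defined(__k8) || defined(__core2)
    return true;
#else
    return false;
#endif
}

/* Feature check: true whenever SSE code generation is enabled,
   whichever -march or -msse option selected it.  */
static bool sse_feature_check(void)
{
#ifdef __SSE__
    return true;
#else
    return false;
#endif
}
```

With this shape, supporting another processor means editing cpu_name_check, while sse_feature_check stays as it is.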

-- 
H.J.


Re: Why is __i686 undefined for x86_64 -m32 (in mainline)

2010-03-16 Thread Paolo Carlini
On 03/16/2010 10:33 PM, H.J. Lu wrote:
> Please check __SSE__ since __k8 won't be defined for -march=atom.
I don't care about Atom.

Paolo.


Re: Why is __i686 undefined for x86_64 -m32 (in mainline)

2010-03-16 Thread H.J. Lu
On Tue, Mar 16, 2010 at 2:36 PM, Paolo Carlini  wrote:
> On 03/16/2010 10:33 PM, H.J. Lu wrote:
>> Please check __SSE__ since __k8 won't be defined for -march=atom.
> I don't care about Atom.
>

Do you care about -march=core2?


-- 
H.J.


Re: Why is __i686 undefined for x86_64 -m32 (in mainline)

2010-03-16 Thread Paolo Carlini
On 03/16/2010 11:27 PM, H.J. Lu wrote:
> Do you care about -march=core2?
Ok, thanks, let's check __core2 too, but really, I don't want to fiddle
too much with these macros in the 4.5.0 timeframe. This is code for
parallel-mode which really is tailored by and large to modern 64-bit
machines. For further enhancements we have libstdc++/34106.

Paolo.


Re: Why is __i686 undefined for x86_64 -m32 (in mainline)

2010-03-16 Thread H.J. Lu
On Tue, Mar 16, 2010 at 2:32 PM, Paolo Carlini  wrote:
> On 03/16/2010 11:27 PM, H.J. Lu wrote:
>> Do you care about -march=core2?
> Ok, thanks, let's check __core2 too, but really, I don't want to fiddle
> too much with these macros in the 4.5.0 timeframe. This is code for
> parallel-mode which really is tailored by and large to modern 64-bit
> machines. For further enhancements we have libstdc++/34106.
>

As I said, you should check __SSE__ and be done with it. Otherwise you
will need to keep adding more checks for no good reasons.


-- 
H.J.


Re: Why is __i686 undefined for x86_64 -m32 (in mainline)

2010-03-16 Thread Paolo Carlini
On 03/16/2010 11:36 PM, H.J. Lu wrote:
> As I said, you should check __SSE__ and be done with it. Otherwise you
> will need to keep adding more checks for no good reasons.
>   
As I said, that file will be reworked *completely* by its maintainers;
we have another PR for this, and I don't want __SSE__, which by itself
tells me nothing about atomic operations.
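
For comparison, GCC does predefine macros that speak directly about atomic builtins: __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 is defined when the target supports a 4-byte compare-and-swap. The sketch below only illustrates testing such a macro. It is not what the parallel-mode code does, the helper names are made up, and it assumes int is 4 bytes wide.

```c
#include <stdbool.h>

/* True if the compiler advertises a native 4-byte compare-and-swap.  */
static bool have_native_cas4(void)
{
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
    return true;
#else
    return false;
#endif
}

/* Replace *p with new_val if it still holds old_val.
   Uses the GCC __sync builtin when advertised; otherwise falls back to a
   plain, non-atomic update that is only safe in single-threaded code.  */
static bool cas_int(int *p, int old_val, int new_val)
{
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
    return __sync_bool_compare_and_swap(p, old_val, new_val);
#else
    if (*p != old_val)
        return false;
    *p = new_val;
    return true;
#endif
}
```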

Paolo.


gcc-4.4-20100316 is now available

2010-03-16 Thread gccadmin
Snapshot gcc-4.4-20100316 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20100316/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch 
revision 157496

You'll find:

gcc-4.4-20100316.tar.bz2            Complete GCC (includes all of below)

gcc-core-4.4-20100316.tar.bz2       C front end and core compiler

gcc-ada-4.4-20100316.tar.bz2        Ada front end and runtime

gcc-fortran-4.4-20100316.tar.bz2    Fortran front end and runtime

gcc-g++-4.4-20100316.tar.bz2        C++ front end and runtime

gcc-java-4.4-20100316.tar.bz2       Java front end and runtime

gcc-objc-4.4-20100316.tar.bz2       Objective-C front end and runtime

gcc-testsuite-4.4-20100316.tar.bz2  The GCC testsuite

Diffs from 4.4-20100309 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: Why is __i686 undefined for x86_64 -m32 (in mainline)

2010-03-16 Thread H.J. Lu
On Tue, Mar 16, 2010 at 3:39 PM, Paolo Carlini  wrote:
> On 03/16/2010 11:36 PM, H.J. Lu wrote:
>> As I said, you should check __SSE__ and be done with it. Otherwise you
>> will need to keep adding more checks for no good reasons.
>>
> As I said, that file will be reworked *completely* by its maintainers;
> we have another PR for this, and I don't want __SSE__, which by itself
> tells me nothing about atomic operations.
>

__SSE__/-msse enables i686 ISA. Does i686 ISA support
atomic operations?

-- 
H.J.


Re: Why is __i686 undefined for x86_64 -m32 (in mainline)

2010-03-16 Thread Paolo Carlini
On 03/17/2010 12:04 AM, H.J. Lu wrote:
> __SSE__/-msse enables i686 ISA. Does i686 ISA support
> atomic operations?
>   
If you are willing to contribute to this issue, please add your
comments to the audit trail of libstdc++/34106 and figure out with
Johannes a good clean-up for 4.6.0 (including a good amount of comments,
of course)

Thanks,
Paolo.


Re: constant hoisting out of loops

2010-03-16 Thread fanqifei
On Mon, Mar 15, 2010 at 5:24 AM, Jim Wilson  wrote:
> On 03/10/2010 10:48 PM, fanqifei wrote:
>>
>> For below piece of code, the instruction "clr.w a15" obviously doesn't
>> belong to the inner loop.
>>    6:   bd f4            clr.w a15; #clear to zero
>>    8:   80 af 00        std.w a10 0x0 a15;
>
> There is info lacking here.  Did you compile with optimization?  What does
> the RTL look like before and after the loop opt passes?
>
> I'd guess that your movsi pattern is defined wrong.  You probably have
> predicates that allow either registers or constants in the set source, which
> is normal, and constraints that only allow registers when the dest is a mem.
>  But constraints are only used by the reload pass, so a store zero to mem
> rtl insn will be generated early, and then fixed late during the reload
> pass.  So the loop opt did not move the clear insn out of the loop because
> there was no clear insn at this time.
>
> The way to fix this is to add a condition to the movsi pattern that excludes
> this case.  For instance, something like this:
>   "(register_operand (operands[0], SImode)
>     || register_operand (operands[1], SImode))"
> This will prevent a store zero to mem RTL insn from being accepted.  In
> order to make this work, you need to make movsi an expander that accepts
> anything, and then forces the source to a register if you have a store
> constant to memory.  See for instance the sparc_expand_move function or the
> mips_legitimize_move function.
>
> Use -da (old) or -fdump-rtl-all (new) to see the RTL dumps to see what is
> going on.
>
> Jim
>
It's compiled with -O2.
You are correct. The reload pass emitted the clr.w insn.
However, I can see loop opt passes after reload:
problem1.c.174r.loop2_invariant1
problem1.c.174r.redo_loop2_invariant
problem1.c.175r.loop2_unswitch
problem1.c.177r.redo_loop2_invariant
After the reload pass, the clr.w insn is in the loop, and after the above
loop2 passes the insn has still not been moved out of the loop.
I am not sure whether the issue is in these loop2 passes, but I guess it is.

For the definition of the movsi expander, I will try to do what you pointed out.
(I am not very familiar with this code, so that may take me some time.)
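
As a rough way to picture Jim's suggestion, here is a toy model in plain C. It is not GCC code: the operand kinds, move_insn_ok and expand_move are invented for illustration. The condition accepts a move only if at least one operand is a register, and the expander splits a constant-to-memory store into two moves up front, so the constant load exists as a separate insn before the loop passes run.

```c
#include <stdbool.h>

/* Toy operand classification; real GCC works on rtx operands.  */
enum operand_kind { OP_REG, OP_MEM, OP_CONST };

struct move { enum operand_kind dest, src; };

/* Models the insn condition: at least one side must be a register.  */
static bool move_insn_ok(enum operand_kind dest, enum operand_kind src)
{
    return dest == OP_REG || src == OP_REG;
}

/* Models the expander: writes one or two moves into out[] and returns
   how many were emitted.  A move with no register operand is split by
   first copying the source into a temporary register.  */
static int expand_move(enum operand_kind dest, enum operand_kind src,
                       struct move out[2])
{
    int n = 0;
    if (!move_insn_ok(dest, src)) {
        out[n].dest = OP_REG;   /* temporary register <- source */
        out[n].src = src;
        n++;
        src = OP_REG;           /* the final move now reads the register */
    }
    out[n].dest = dest;
    out[n].src = src;
    n++;
    return n;
}
```

In this model a store of a constant to memory always comes out as two moves, each of which satisfies move_insn_ok, which is the property the real insn condition is meant to enforce.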

current definition of mov pattern:
(define_insn "mov"
  [(set
        (match_operand:BWD 0 "nonimmediate_operand"  "=r,m,r,r,r,r,r,r,x,r")
        (match_operand:BWD 1 "move_source_operand"   "Z,r,L,I,Q,P,ni,x,r,r"))]
  ""
  "@
 %L1 %0 %1;
 %S0 %0 %1;
 clr %0;
 mv %0 %1;
   ... ...

Thanks!

-- 
-Qifei Fan
http://freshtime.org