Re: Question on mips multiply patterns in md file
> If you don't know anything about register class preferencing or reload as
> yet, then this is probably not going to make much sense to you, but it isn't
> anything important you need to worry about at this point. It is a very
> minor performance optimization.
>
It makes sense to me now, though I haven't read the code for IRA and reload yet. Thanks for the detailed explanation.
>
> A define_split can only match something generated by a define_insn, and the
> mul_acc_si define_insn is testing "GENERATE_MADD_MSUB && !TARGET_MIPS16"
> so there is no serious problem. We are just running a define_split that can
> never match anything. This could be cleaned up a little by adding an
> appropriate condition to the define_split, or by combining the define_insn
> and define_split patterns into a define_insn_and_split pattern.

In other words, you mean that the define_split only gets a chance to split insns generated by the corresponding pattern "define_insn \"*mul_acc_si\"", even though the split condition is rather weak (only "reload_completed"), because that kind of insn can only be generated by the "define_insn \"*mul_acc_si\"" pattern. Did I get it right?

If so, I'm afraid this is actually not my question. What I want to know is this. MIPS processors normally implement the following kinds of mult/mult-acc insns:

  mult:  HILO <-- s * t
  mul:   HILO <-- s * t ; d <-- LO
  madd:  HILO <-- HILO + s * t
  madd2: HILO <-- HILO + s * t ; d <-- HILO

In my understanding, the macro GENERATE_MADD_MSUB is true when the processor has the madd insn rather than madd2, and the macro "ISA_HAS_MUL3" is false if it has no mul insn.
For this kind of processor, GCC will:

  step 1: generate an insn using gen_mul3_internal, according to the pattern "mul3";
  step 2: the combiner tries to combine it by matching against the pattern "*mul_acc_si";
  step 3: it's possible that GCC fails to allocate the LO register for the combined "*mul_acc_si" insn;
  step 4: after reload, the combined insn will be split according to the split pattern listed in my previous mail;
  step 5: the split insn is actually a "mul3_internal", but has no LO allocated, which breaks the constraints of the "mul3_internal" pattern.

So, what should I do to handle this case? I see no method except adding a new split pattern like:

(define_split
  [(set (match_operand:SI 0 "d_operand")
        (plus:SI (mult:SI (match_operand:SI 1 "d_operand")
                          (match_operand:SI 2 "d_operand"))
                 (match_operand:SI 3 "d_operand")))
   (clobber (match_operand:SI 4 "lo_operand"))
   (clobber (match_operand:SI 5 "d_operand"))]
  "SPECIAL_PROCESSOR && reload_completed"
  [(parallel [(set (match_dup 4)
                   (mult:SI (match_dup 1) (match_dup 2)))
              (clobber (match_dup 4))])
   (set (match_dup 5) (match_dup 4))
   (set (match_dup 0) (plus:SI (match_dup 5) (match_dup 3)))]
  "")

Thanks again; looking forward to your further explanations.

--
Best Regards.
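To double-check the idea behind the proposed split, here is a small C model of the HI/LO semantics listed above. It is an illustration only, not GCC or MIPS code: the `hilo_t` type and the `hilo_set`, `mips_mult`, `mips_madd`, `via_madd` and `via_split` helpers are hypothetical names, and a signed 32x32->64-bit product is assumed. It shows that `mult; mflo; addu` leaves the same 32-bit result in d as preloading LO with operand 3 and issuing `madd; mflo`, because HI never influences the low word of the sum.

```c
#include <stdint.h>

/* Hypothetical model of the MIPS HI/LO register pair; not GCC code.  */
typedef struct
{
  uint32_t hi, lo;
} hilo_t;

/* Store a 64-bit value into HI:LO.  */
static void
hilo_set (hilo_t *acc, uint64_t v)
{
  acc->hi = (uint32_t) (v >> 32);
  acc->lo = (uint32_t) v;
}

/* mult: HILO <-- s * t (signed 32x32 -> 64-bit product).  */
static void
mips_mult (hilo_t *acc, int32_t s, int32_t t)
{
  hilo_set (acc, (uint64_t) ((int64_t) s * (int64_t) t));
}

/* madd: HILO <-- HILO + s * t, wrapping modulo 2^64.  */
static void
mips_madd (hilo_t *acc, int32_t s, int32_t t)
{
  uint64_t cur = ((uint64_t) acc->hi << 32) | acc->lo;
  hilo_set (acc, cur + (uint64_t) ((int64_t) s * (int64_t) t));
}

/* d = s * t + r done with madd: mtlo r; madd s,t; mflo d.
   HI is simply zeroed here; it cannot affect the low word.  */
static uint32_t
via_madd (int32_t s, int32_t t, uint32_t r)
{
  hilo_t acc = { 0, r };
  mips_madd (&acc, s, t);
  return acc.lo;
}

/* The proposed split: mult s,t; mflo tmp; addu d,tmp,r.  */
static uint32_t
via_split (int32_t s, int32_t t, uint32_t r)
{
  hilo_t acc;
  uint32_t tmp;

  mips_mult (&acc, s, t);
  tmp = acc.lo;
  return tmp + r;   /* addu wraps modulo 2^32 */
}
```

The model says nothing about register allocation itself; it only confirms that the three-insn sequence is value-equivalent, so the remaining question is purely about which constraints the split insns satisfy after reload.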
Re: GCC 4.5 Status Report (2010-03-15)
> 42509, arm-gnueabi doesn't bootstrap but is a primary target

I haven't had the time in the past few weeks to work on this effectively. I'll be able to find some time to work on it during this week and will get back to you on this.

cheers
Ramana
Re: fixed-point support in c++
> The problem is that it won't be as simple as that. You'll have to extend
> the C++ parser to accept those new RID_ values that it was previously never
> expecting to see in those contexts, I would think (but haven't verified
> against the source yet). The C++ parser is a hand-coded recursive-descent
> parser, so I wouldn't expect it to be generically able to deal with
> previously-unknown token types suddenly appearing in its input stream at
> arbitrary points.
>
> cheers,
> DaveK
>

I went through the C++ parser and added support for fixed point there. Everything seems to be working, and I am able to use fixed-point numbers in C++.

The C++ parser is fairly complex, and it is possible I missed something. I would love to get feedback on this patch, and hopefully it can get committed to GCC.

Sean

Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c (revision 157409)
+++ gcc/builtins.c (working copy)
@@ -1708,6 +1708,7 @@
     case INTEGER_TYPE: return integer_type_class;
     case ENUMERAL_TYPE: return enumeral_type_class;
     case BOOLEAN_TYPE: return boolean_type_class;
+    case FIXED_POINT_TYPE: return fixed_point_type_class;
     case POINTER_TYPE: return pointer_type_class;
     case REFERENCE_TYPE: return reference_type_class;
     case OFFSET_TYPE: return offset_type_class;
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c (revision 157409)
+++ gcc/fold-const.c (working copy)
@@ -12303,6 +12303,11 @@
       if (TREE_CODE (arg1) == INTEGER_CST && tree_int_cst_sgn (arg1) < 0)
         return NULL_TREE;
 
+      /* Since fixed point types cannot perform bitwise and, or, etc..
+         don't try to convert to an expression with them. */
+      if (TREE_CODE(type) == FIXED_POINT_TYPE)
+        return NULL_TREE;
+
       /* Turn (a OP c1) OP c2 into a OP (c1+c2). */
       if (TREE_CODE (op0) == code && host_integerp (arg1, false)
           && TREE_INT_CST_LOW (arg1) < TYPE_PRECISION (type)
Index: gcc/cp/typeck.c
===================================================================
--- gcc/cp/typeck.c (revision 157409)
+++ gcc/cp/typeck.c (working copy)
@@ -316,6 +316,91 @@
   if (code2 == REAL_TYPE && code1 != REAL_TYPE)
     return build_type_attribute_variant (t2, attributes);
 
+  /* Deal with fixed-point types. */
+  if (code1 == FIXED_POINT_TYPE || code2 == FIXED_POINT_TYPE)
+    {
+      unsigned int unsignedp = 0, satp = 0;
+      enum machine_mode m1, m2;
+      unsigned int fbit1, ibit1, fbit2, ibit2, max_fbit, max_ibit;
+
+      m1 = TYPE_MODE (t1);
+      m2 = TYPE_MODE (t2);
+
+      /* If one input type is saturating, the result type is saturating. */
+      if (TYPE_SATURATING (t1) || TYPE_SATURATING (t2))
+        satp = 1;
+
+      /* If both fixed-point types are unsigned, the result type is unsigned.
+         When mixing fixed-point and integer types, follow the sign of the
+         fixed-point type.
+         Otherwise, the result type is signed. */
+      if ((TYPE_UNSIGNED (t1) && TYPE_UNSIGNED (t2)
+           && code1 == FIXED_POINT_TYPE && code2 == FIXED_POINT_TYPE)
+          || (code1 == FIXED_POINT_TYPE && code2 != FIXED_POINT_TYPE
+              && TYPE_UNSIGNED (t1))
+          || (code1 != FIXED_POINT_TYPE && code2 == FIXED_POINT_TYPE
+              && TYPE_UNSIGNED (t2)))
+        unsignedp = 1;
+
+      /* The result type is signed. */
+      if (unsignedp == 0)
+        {
+          /* If the input type is unsigned, we need to convert to the
+             signed type. */
+          if (code1 == FIXED_POINT_TYPE && TYPE_UNSIGNED (t1))
+            {
+              enum mode_class mclass = (enum mode_class) 0;
+              if (GET_MODE_CLASS (m1) == MODE_UFRACT)
+                mclass = MODE_FRACT;
+              else if (GET_MODE_CLASS (m1) == MODE_UACCUM)
+                mclass = MODE_ACCUM;
+              else
+                gcc_unreachable ();
+              m1 = mode_for_size (GET_MODE_PRECISION (m1), mclass, 0);
+            }
+          if (code2 == FIXED_POINT_TYPE && TYPE_UNSIGNED (t2))
+            {
+              enum mode_class mclass = (enum mode_class) 0;
+              if (GET_MODE_CLASS (m2) == MODE_UFRACT)
+                mclass = MODE_FRACT;
+              else if (GET_MODE_CLASS (m2) == MODE_UACCUM)
+                mclass = MODE_ACCUM;
+              else
+                gcc_unreachable ();
+              m2 = mode_for_size (GET_MODE_PRECISION (m2), mclass, 0);
+            }
+        }
+
+      if (code1 == FIXED_POINT_TYPE)
+        {
+          fbit1 = GET_MODE_FBIT (m1);
+          ibit1 = GET_MODE_IBIT (m1);
+        }
+      else
+        {
+          fbit1 = 0;
+          /* Signed integers need to subtract one sign bit. */
+          ibit1 = TYPE_PRECISION (t1) - (!TYPE_UNSIGNED (t1));
+        }
+
+      if (code2 == FIXED_POINT_TYPE)
+        {
+          fbit2 = GET_MODE_FBIT (m2);
+          ibit2 = GET_MODE_IBIT (m2);
+        }
+      else
+        {
+          fbit2 = 0;
+          /* Signed integers need to subtract one sign bit. */
+          ibit2 = TYPE_PRECISION (t2) - (!TYPE_UNSIGNED (t2));
+        }
+
+      max_ibit = ibit1 >= ibit2 ? ibit1 : ibit2;
+      max_fbit = fbit1 >= fbit2 ? fbit1 : fbit2;
+      return c_common_fixed_point_type_for_size (max_ibit, max_fbit, unsignedp,
+                                                 satp);
+
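The result-type rule in the typeck.c hunk can be restated as a standalone C sketch, which may help when reviewing it. This is not GCC code: `struct operand_type`, `struct result_type`, `type_bits` and `combine_fixed_types` are hypothetical, and the step that maps an unsigned fixed-point mode to its signed counterpart (the `mode_for_size` calls) is deliberately left out, so the caller is assumed to pass IBIT/FBIT values that already reflect it.

```c
#include <stdbool.h>

/* Hypothetical, simplified description of one operand type.  */
struct operand_type
{
  bool is_fixed;        /* FIXED_POINT_TYPE vs. integer type */
  bool is_unsigned;     /* TYPE_UNSIGNED */
  bool is_saturating;   /* TYPE_SATURATING (fixed-point only) */
  unsigned precision;   /* TYPE_PRECISION, used for integers */
  unsigned ibit, fbit;  /* mode IBIT/FBIT, used for fixed-point;
                           assumed already converted to the signed
                           mode when the result will be signed */
};

struct result_type
{
  bool is_unsigned, is_saturating;
  unsigned ibit, fbit;
};

/* Integral and fractional bit counts contributed by one operand.  */
static void
type_bits (const struct operand_type *t, unsigned *ibit, unsigned *fbit)
{
  if (t->is_fixed)
    {
      *ibit = t->ibit;
      *fbit = t->fbit;
    }
  else
    {
      /* Signed integers need to subtract one sign bit.  */
      *fbit = 0;
      *ibit = t->precision - (t->is_unsigned ? 0 : 1);
    }
}

static struct result_type
combine_fixed_types (const struct operand_type *t1,
                     const struct operand_type *t2)
{
  struct result_type r;
  unsigned ibit1, fbit1, ibit2, fbit2;

  /* Saturating if either input is saturating.  */
  r.is_saturating = t1->is_saturating || t2->is_saturating;

  /* Unsigned if both fixed-point inputs are unsigned or, when mixing
     with an integer, if the fixed-point input is unsigned.  */
  r.is_unsigned = (t1->is_fixed && t2->is_fixed
                   && t1->is_unsigned && t2->is_unsigned)
                  || (t1->is_fixed && !t2->is_fixed && t1->is_unsigned)
                  || (!t1->is_fixed && t2->is_fixed && t2->is_unsigned);

  /* Keep the wider integral part and the wider fractional part.  */
  type_bits (t1, &ibit1, &fbit1);
  type_bits (t2, &ibit2, &fbit2);
  r.ibit = ibit1 >= ibit2 ? ibit1 : ibit2;
  r.fbit = fbit1 >= fbit2 ? fbit1 : fbit2;
  return r;
}
```

For example, a signed 16-bit fract (ibit 0, fbit 15) combined with a signed 32-bit int (ibit 31) gives a signed type needing 31 integral and 15 fractional bits, which is then handed to c_common_fixed_point_type_for_size in the real patch.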
Re: GCC 4.5 Status Report (2010-03-15)
On Mon, 15 Mar 2010, NightStrike wrote: > On Mon, Mar 15, 2010 at 12:18 PM, Richard Guenther wrote: > > As maintainers do not care for P1 bugs in their maintenance area > > so will the release managers not consider them P1. > > Probably not the best reason to downgrade a bug, eh? Well - patches welcome! Richard.
Re: GCC 4.5 Status Report (2010-03-15)
On Tue, Mar 16, 2010 at 11:12 AM, Richard Guenther wrote: > On Mon, 15 Mar 2010, NightStrike wrote: > >> On Mon, Mar 15, 2010 at 12:18 PM, Richard Guenther wrote: >> > As maintainers do not care for P1 bugs in their maintenance area >> > so will the release managers not consider them P1. >> >> Probably not the best reason to downgrade a bug, eh? > > Well - patches welcome! Indeed. And one has to realize that fixing all these bugs becomes a real problem for GCC, as a project, if the company with the largest listed number of maintainers (many of them of components with P1 bugs) chooses not to contribute at all to the bug-fixing effort before the release... Ciao! Steven
Re: GCC 4.5 Status Report (2010-03-15)
On Tue, 16 Mar 2010, Steven Bosscher wrote: > On Tue, Mar 16, 2010 at 11:12 AM, Richard Guenther wrote: > > On Mon, 15 Mar 2010, NightStrike wrote: > > > >> On Mon, Mar 15, 2010 at 12:18 PM, Richard Guenther > >> wrote: > >> > As maintainers do not care for P1 bugs in their maintenance area > >> > so will the release managers not consider them P1. > >> > >> Probably not the best reason to downgrade a bug, eh? > > > > Well - patches welcome! > > Indeed. And one has to realize that fixing all these bugs becomes a > real problem for GCC, as a project, if the company with the largest > listed number of maintainers (many of them of components with P1 bugs) > chooses not to contribute at all to the bug-fixing effort before the > release... To be fair, the people of that company do not expose bugs proportional to their headcount either. Richard.
Re: GCC 4.5 Status Report (2010-03-15)
On Tue, Mar 16, 2010 at 12:25 PM, Richard Guenther wrote: > On Tue, 16 Mar 2010, Steven Bosscher wrote: > >> On Tue, Mar 16, 2010 at 11:12 AM, Richard Guenther wrote: >> > On Mon, 15 Mar 2010, NightStrike wrote: >> > >> >> On Mon, Mar 15, 2010 at 12:18 PM, Richard Guenther >> >> wrote: >> >> > As maintainers do not care for P1 bugs in their maintenance area >> >> > so will the release managers not consider them P1. >> >> >> >> Probably not the best reason to downgrade a bug, eh? >> > >> > Well - patches welcome! >> >> Indeed. And one has to realize that fixing all these bugs becomes a >> real problem for GCC, as a project, if the company with the largest >> listed number of maintainers (many of them of components with P1 bugs) >> chooses not to contribute at all to the bug-fixing effort before the >> release... > > To be fair, the people of that company do not expose bugs proportional > to their headcount either. Neither do I, and yet I try to help ;-) Ciao! Steven
Re: GCC 4.5 Status Report (2010-03-15)
Richi, Steven, >> To be fair the people of that company do not expose bugs proportional >> to their headcount either. > > Neither do I, and yet I try to help ;-) Now, now, you two :-) Paul
Re: GCC 4.5 Status Report (2010-03-15)
On Mon, 15 Mar 2010, Richard Guenther wrote: > 42509, arm-gnueabi doesn't bootstrap but is a primary target The primary target is arm-eabi, which is a bare-metal target; the arm-eabi and mipsisa64-elf references must be understood as referring to building and testing a cross compiler from some other primary platform, since you can't bootstrap on those systems. -- Joseph S. Myers jos...@codesourcery.com
Re: fixincl 'make check' regressions...
The intent was to clear up some stuff in the README. When I noticed that I had affected other files, I tried to put everything back. Obviously there was a glitch. I'll fix it when I get home tonight.

On Mon, Mar 15, 2010 at 11:00 PM, David Miller wrote:
>
> Ever since your changes installed on March 12th, I've been getting
> fixincludes testsuite failures of the form below.
>
> I also notice that none of these changes added ChangeLog entries, and
> furthermore the SVN commit messages were extremely terse so it was
> hard to diagnose the intent or reasoning behind your changes.
>
> iso/math_c99.h
> /home/davem/src/GIT/GCC/gcc/fixincludes/tests/base/iso/math_c99.h differ:
> char 1366, line 52
> *** iso/math_c99.h Mon Mar 15 22:55:36 2010
> --- /home/davem/src/GIT/GCC/gcc/fixincludes/tests/base/iso/math_c99.h Thu
> Jan 21 04:06:11 2010
> ***
> *** 49,55
> ? __builtin_signbitf(x) \
> : sizeof(x) == sizeof(long double) \
> ? __builtin_signbitl(x) \
> ! : __builtin_signbit(x));
> #endif /* SOLARIS_MATH_8_CHECK */
>
>
> --- 49,55
> ? __builtin_signbitf(x) \
> : sizeof(x) == sizeof(long double) \
> ? __builtin_signbitl(x) \
> ! : __builtin_signbit(x))
> #endif /* SOLARIS_MATH_8_CHECK */
>
>
>
> There were fixinclude test FAILURES
>
Re: (un)aligned accesses on x86 platform.
2010/3/8 Paweł Sikora : > hi, > > during development a cross platform application on x86 workstation > i've enabled an alignment checking [1] to catch possible erroneous > code before it appears on client's sparc/arm cpu with sigbus ;) > > it works pretty fine and catches alignment violations but Jakub Jelinek > had told me (on glibc bugzilla) that gcc on x86 can still dereference > an unaligned pointer (except for vector insns). > i suppose it means that gcc can emit e.g. movl to access a short int > (or maybe other scenarios) in some cases and violates cpu alignment rules. > > so, is it possible to instruct gcc-x86 to always use suitable loads/stores > like on sparc/arm? > > [1] "AC" bit - http://en.wikipedia.org/wiki/FLAGS_register_(computing) > I am interested in an -mstrict-alignment option for x86. -- H.J.
Re: (un)aligned accesses on x86 platform.
On Mar 16, 2010, at 3:50 PM, H.J. Lu wrote: > 2010/3/8 Paweł Sikora : >> hi, >> >> during development a cross platform application on x86 workstation >> i've enabled an alignment checking [1] to catch possible erroneous >> code before it appears on client's sparc/arm cpu with sigbus ;) >> >> it works pretty fine and catches alignment violations but Jakub Jelinek >> had told me (on glibc bugzilla) that gcc on x86 can still dereference >> an unaligned pointer (except for vector insns). >> i suppose it means that gcc can emit e.g. movl to access a short int >> (or maybe other scenarios) in some cases and violates cpu alignment rules. >> >> so, is it possible to instruct gcc-x86 to always use suitable loads/stores >> like on sparc/arm? >> >> [1] "AC" bit - http://en.wikipedia.org/wiki/FLAGS_register_(computing) >> > > I am interested in an -mstrict-alignment option for x86. Not sure it will be useful. The libc still does unaligned accesses IIRC.
Questions about "Handle constant exponents." in gcc/builtins.c
In the block "Handle constant exponents." in gcc/builtins.c, the condition !optimize_size has been replaced with optimize_insn_for_speed_p () between gcc 4.3 and 4.4, but I have not been able to find when and why. Does anybody remember when and why? This change makes the optimization sensitive to PR40106, and unless it has compelling reasons it should be reverted in this piece of code. My second question is: why use optimize_size? I think it would be better to define an upper bound instead of POWI_MAX_MULTS that depends on the kind of optimisation. I cannot see any situation in which sqrt(a) would not be better than pow(a,0.5) for speed, size, and accuracy. TIA Dominique
Re: Questions about "Handle constant exponents." in gcc/builtins.c
On Tue, Mar 16, 2010 at 4:11 PM, Dominique Dhumieres wrote: > In the block "Handle constant exponents." in gcc/builtins.c, the condition > !optimize_size has been replaced with optimize_insn_for_speed_p () between > gcc 4.3 and 4.4, but I have not been able to find when and why. > Does anybody remember when and why? > > This change makes the optimization sensitive to PR40106, and unless it has > compelling reasons it should be reverted in this piece of code. > > My second question is: why use optimize_size? I think it would be better > to define an upper bound instead of POWI_MAX_MULTS that depends on the kind > of optimisation. I cannot see any situation in which sqrt(a) would not be > better than pow(a,0.5) for speed, size, and accuracy. pow (a, 0.5) is always expanded to sqrt(a). It is when we require additional multiplications, pow (a, n) -> sqrt (a) * a**(n/2), that optimize_insn_for_speed_p () is checked. Richard. > TIA > > Dominique > >
Re: (un)aligned accesses on x86 platform.
On Tue, Mar 16, 2010 at 9:05 PM, Tristan Gingold wrote: > > On Mar 16, 2010, at 3:50 PM, H.J. Lu wrote: > >> 2010/3/8 Paweł Sikora : >>> hi, >>> >>> during development a cross platform application on x86 workstation >>> i've enabled an alignment checking [1] to catch possible erroneous >>> code before it appears on client's sparc/arm cpu with sigbus ;) >>> >>> it works pretty fine and catches alignment violations but Jakub Jelinek >>> had told me (on glibc bugzilla) that gcc on x86 can still dereference >>> an unaligned pointer (except for vector insns). >>> i suppose it means that gcc can emit e.g. movl to access a short int >>> (or maybe other scenarios) in some cases and violates cpu alignment rules. >>> >>> so, is it possible to instruct gcc-x86 to always use suitable loads/stores >>> like on sparc/arm? >>> >>> [1] "AC" bit - http://en.wikipedia.org/wiki/FLAGS_register_(computing) >>> >> >> I am interested in an -mstrict-alignment option for x86. > > Not sure it will be useful. The libc still does unaligned accesses IIRC. > > Wow. What for? Alexey
Re: (un)aligned accesses on x86 platform.
On Mar 16, 2010, at 4:37 PM, Alexey Salmin wrote: >>> I am interested in an -mstrict-alignment option for x86. >> >> Not sure it will be useful. The libc still does unaligned accesses IIRC. >> > > Wow. What for? Well, simply because it is not compiled with strict alignment. There might also be some optimizations in memory operations that do unaligned accesses.
Question about removing multiple elements from VEC
Hi,

I'm looking at this FIXME in cp/typeck2.c:

          /* FIXME: Ordered removal is O(1) so the whole function is
             worst-case quadratic. This could be fixed using an aside
             bitmap to record which elements must be removed and remove
             them all at the same time. Or by merging
             split_non_constant_init into process_init_constructor_array,
             that is separating constants from non-constants while building
             the vector. */
          VEC_ordered_remove (constructor_elt, CONSTRUCTOR_ELTS (init),
                              idx);

It seems there is no VEC function that can use a bitmap to do an ordered removal of multiple elements. Did I miss something, or do I have to write one?

Regards,
--
Jie Zhang
CodeSourcery
(650) 331-3385 x735
Re: (un)aligned accesses on x86 platform.
On Tue, Mar 16, 2010 at 9:48 PM, Tristan Gingold wrote: > > On Mar 16, 2010, at 4:37 PM, Alexey Salmin wrote: I am interested in an -mstrict-alignment option for x86. >>> >>> Not sure it will be useful. The libc still does unaligned accesses IIRC. >>> >> >> Wow. What for? > > Well, simply because it is not compiled with strict alignment. There might > also be some optimizations in > memory operations that do unaligned accesses. I always thought that unaligned access is much slower than an aligned one. You mean code-size optimizations? Alexey
Re: Question about removing multiple elements from VEC
On Tue, Mar 16, 2010 at 5:02 PM, Jie Zhang wrote: > Hi, > > I'm looking at this FIXME in cp/typeck2.c. > > /* FIXME: Ordered removal is O(1) so the whole function is > worst-case quadratic. This could be fixed using an aside > bitmap to record which elements must be removed and remove > them all at the same time. Or by merging > split_non_constant_init into process_init_constructor_array, > that is separating constants from non-constants while building > the vector. */ > VEC_ordered_remove (constructor_elt, CONSTRUCTOR_ELTS (init), > idx); > > It seems there is no VEC function that can use a bitmap to do an ordered > removal of multiple elements. Did I miss something, or do I have to write one? You have to write one. Richard.
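An ordered removal shifts the tail of the vector down, so each call is O(n), and removing k elements one at a time costs O(n*k). The bitmap approach suggested by the FIXME boils down to a single compaction pass. The sketch below uses a plain int array and a bool array in place of GCC's VEC and bitmap types; `ordered_remove_marked` is a hypothetical name, not an existing VEC function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Remove every element whose REMOVE flag is set, keeping the survivors
   in their original order, in one O(n) pass.  Returns the new length;
   the caller is expected to truncate the vector to it.  */
static size_t
ordered_remove_marked (int *elts, size_t len, const bool *remove)
{
  size_t dst = 0;

  for (size_t src = 0; src < len; src++)
    if (!remove[src])
      elts[dst++] = elts[src];   /* slide each survivor down once */
  return dst;
}
```

Each element is moved at most once, so the cost no longer depends on how many elements are removed; in GCC the final truncation would presumably be done with VEC_truncate.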
Re: Question about removing multiple elements from VEC
On 03/17/2010 12:08 AM, Richard Guenther wrote: On Tue, Mar 16, 2010 at 5:02 PM, Jie Zhang wrote: Hi, I'm looking at this FIXME in cp/typeck2.c. /* FIXME: Ordered removal is O(1) so the whole function is worst-case quadratic. This could be fixed using an aside bitmap to record which elements must be removed and remove them all at the same time. Or by merging split_non_constant_init into process_init_constructor_array, that is separating constants from non-constants while building the vector. */ VEC_ordered_remove (constructor_elt, CONSTRUCTOR_ELTS (init), idx); It seems there is no VEC function that can use a bitmap to do an ordered removal of multiple elements. Did I miss something, or do I have to write one? You have to write one. Thanks! -- Jie Zhang CodeSourcery (650) 331-3385 x735
Re: (un)aligned accesses on x86 platform.
Alexey Salmin wrote: > I always thought that unaligned access is much slower than an aligned one. It is not *MUCH* slower, just slower (unless you cross a cache line boundary). Unaligned accesses are very useful for improving the performance of, among other things, certain hash functions (e.g. Paul Hsieh's). Best regards, Piotr Wyderski
Re: (un)aligned accesses on x86 platform.
On Tue, Mar 16, 2010 at 10:04:04PM +0600, Alexey Salmin wrote: > >> Wow. What for? > > > > Well, simply because it is not compiled with strict alignment. There might > > also be some optimizations in > > memory operations that do unaligned accesses. > > I always thought that unaligned access is much slower than an aligned > one. You mean code-size optimizations? It is, but if you need to choose between doing, say, an unaligned 32-bit read access and reading it with four 8-bit reads and assembling those together, on many targets that do allow unaligned accesses the former is much faster. Especially if in most cases the read is actually aligned and only in rare cases it is unaligned... Jakub
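Jakub's trade-off can be seen by writing the two ways of reading a possibly unaligned 32-bit value side by side. `load32_memcpy` and `load32_le_bytes` are hypothetical helpers. The memcpy form leaves the choice of access to the compiler; with optimization enabled GCC typically turns it into a single load on x86 and into byte loads on strict-alignment targets, though the exact code depends on the target and options. The byte form always performs four 8-bit reads and assembles them. Note that the memcpy form returns the value in host byte order, so the two agree only on little-endian hosts such as x86.

```c
#include <stdint.h>
#include <string.h>

/* Unaligned 32-bit load via memcpy: well-defined for any alignment,
   and the compiler picks the cheapest access the target allows.  */
static uint32_t
load32_memcpy (const void *p)
{
  uint32_t v;

  memcpy (&v, p, sizeof v);
  return v;
}

/* The explicit alternative: four 8-bit reads assembled into a
   little-endian 32-bit value.  */
static uint32_t
load32_le_bytes (const unsigned char *p)
{
  return (uint32_t) p[0]
         | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[3] << 24);
}
```

Dereferencing a cast pointer such as `*(const uint32_t *) p` instead would be undefined behavior for misaligned p in ISO C, which is why the memcpy idiom is the usual portable choice.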
Re: LTO and asm specs...
On 03/12/2010 09:33 PM, David Miller wrote: > I couldn't figure out immediately how to fix this as the > way LTO does spec overriding and such looked non-trivial. It would not be a bad thing, IMO, if the sparc assembler were extended to be able to emit any reloc directly, without needing a specific command-line option. Then you'd only encounter this problem with legacy assemblers. r~
Re: LTO and asm specs...
From: Richard Henderson Date: Tue, 16 Mar 2010 11:31:44 -0700 > On 03/12/2010 09:33 PM, David Miller wrote: >> I couldn't figure out immediately how to fix this as the >> way LTO does spec overriding and such looked non-trivial. > > It would not be a bad thing, IMO, if the sparc assembler > were extended to be able to emit any reloc directly, without > needing a specific command-line option. Then you'd only > encounter this problem with legacy assemblers. It's not the assembler's fault. We're using %hi() and expecting the assembler to emit a PC relative relocation just because the symbol name happens to be _GLOBAL_OFFSET_TABLE_. And it will do this, but only when -PIC. Changing that is pretty dangerous. But even if we got past that, we need to get the assembler options right in order to enable instruction classes. For example we have to get -Av9a there when using VIS instructions. Other platforms are going to hit things like this too. LTO really needs to evaluate the specs correctly.
Why is __i686 undefined for x86_64 -m32 (in mainline)
Hi,

I'm rather surprised that now, in the "sane default world", only __i386 is defined on x86_64 with -m32, whereas __i686 is not: I need -march=i686 on the command line (together with -m32).

I noticed that while analyzing libstdc++/43394, where I was surprised that some preprocessor lines (legacy code, actually) in the library code for parallel mode do not "notice" that we now have a better default:

#elif defined(__GNUC__) && defined(__i386) && \
  (defined(__i686) || defined(__pentium4) || defined(__athlon))
      return __sync_fetch_and_add(__ptr, __addend);

... indeed, such lines want __i686 in order to safely enable the builtin, and they still find it undefined.

If, as is probably the case, I'm a bit confused about the meaning of those __i?86 macros, what do people suggest instead? I suspect my __GCC_HAVE_SYNC_COMPARE_AND_SWAP_* macros could be put to good use, but I'm still curious about the exact semantics of the __i?86 macros...

Thanks in advance,
Paolo.
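A minimal sketch of the alternative Paolo mentions: select the atomic path from GCC's `__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4` feature macro instead of CPU-name macros such as `__i686`. `fetch_and_add_int` is a hypothetical helper; it assumes `int` is 4 bytes, and its fallback branch is a non-atomic placeholder standing in for whatever lock-based code a real library would use.

```c
/* Pick the implementation from what the target can do atomically,
   not from which CPU name happens to be predefined.  */
static int
fetch_and_add_int (volatile int *ptr, int addend)
{
#if defined(__GNUC__) && defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4)
  /* A 4-byte compare-and-swap is available inline, so the __sync
     builtin can be expanded without a library call.  */
  return __sync_fetch_and_add (ptr, addend);
#else
  /* Placeholder only: NOT thread-safe.  */
  int old = *ptr;
  *ptr = old + addend;
  return old;
#endif
}
```

Because the macro tracks the selected -march rather than a CPU name, the same source picks the atomic path for -m32 -march=i686 and for x86_64, and the fallback for plain -march=i386.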
Re: LTO and asm specs...
On 03/16/2010 12:28 PM, David Miller wrote: > It's not the assembler's fault. > > We're using %hi() and expecting the assembler to emit a > PC relative relocation just because the symbol name happens > to be _GLOBAL_OFFSET_TABLE_. And it will do this, but only > when -PIC. Changing that is pretty dangerous. It is the assembler's fault because it doesn't provide %pcrelhi() or some such to allow the compiler (or asm programmer) to emit exactly the relocation that's desired. > But even if we got past that, we need to get the assembler options > right in order to enable instruction classes. For example we have to > get -Av9a there when using VIS instructions. How about ".arch v9a" like other platforms emit? Command-line options that control what the assembler emits for the exact same bit of text are a Really Bad Idea, as we've seen from other platforms time and time again. r~
Re: LTO and asm specs...
From: Richard Henderson Date: Tue, 16 Mar 2010 12:53:47 -0700 > On 03/16/2010 12:28 PM, David Miller wrote: >> It's not the assembler's fault. >> >> We're using %hi() and expecting the assembler to emit a >> PC relative relocation just because the symbol name happens >> to be _GLOBAL_OFFSET_TABLE_. And it will do this, but only >> when -PIC. Changing that is pretty dangerous. > > It is the assembler's fault because it doesn't provide %pcrelhi() or > some such to allow the compiler (or asm programmer) to emit exactly > the relocation that's desired. There are %pc22() and %pc10(). I don't know if it's safe to change gcc to use them in all cases though. >> But even if we got past that, we need to get the assembler options >> right in order to enable instruction classes. For example we have to >> get -Av9a there when using VIS instructions. > > How about ".arch v9a" like other platforms emit? > > Command-line options that control what the assembler emits for > the exact same bit of text are a Really Bad Idea, as we've seen > from other platforms time and time again. I think this distracts from the issue that LTO needs to process specs properly. Are you seriously against fixing that LTO bug?
Re: Why is __i686 undefined for x86_64 -m32 (in mainline)
On Tue, Mar 16, 2010 at 12:32 PM, Paolo Carlini wrote: > Hi, > > I'm rather surprised that now, in the "sane default world", only __i386 is > defined, whereas __i686 is not on x86_64 -m32; I need -march=i686 on the > command line (together with -m32). > > I noticed that while analyzing libstdc++/43394, where I was surprised that > some preprocessor lines, legacy code actually, in the library code for > parallel mode do not "notice" that we now have a better default: > > #elif defined(__GNUC__) && defined(__i386) && \ > (defined(__i686) || defined(__pentium4) || defined(__athlon)) > return __sync_fetch_and_add(__ptr, __addend); > > ... indeed, such lines want __i686 in order to safely enable the builtin and > still find it undefined. > > If - as it's probably the case - I'm a bit confused about the meaning of > those __i?86 macros, what do people suggest instead? I suspect my > __GCC_HAVE_SYNC_COMPARE_AND_SWAP_* could be put to good use, but I'm still > curious about the exact semantics of the __i?86 macros... > > Thanks in advance, > Paolo. The question is what processor macros should "-march=x86-64" define. There is {"x86-64", PROCESSOR_K8, CPU_K8, PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_NO_SAHF}, For -march=x86-64, __k8 is defined. However, real K8 supports: {"k8", PROCESSOR_K8, CPU_K8, PTA_64BIT | PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE | PTA_SSE2 | PTA_NO_SAHF}, It isn't an issue in i386.c since PROCESSOR_K8 isn't used to check ISAs. But using __k8 to check ISAs is a problem. -- H.J.
Re: LTO and asm specs...
On Tue, Mar 16, 2010 at 12:28 PM, David Miller wrote: > From: Richard Henderson > Date: Tue, 16 Mar 2010 11:31:44 -0700 > >> On 03/12/2010 09:33 PM, David Miller wrote: >>> I couldn't figure out immediately how to fix this as the >>> way LTO does spec overriding and such looked non-trivial. >> >> It would not be a bad thing, IMO, if the sparc assembler >> were extended to be able to emit any reloc directly, without >> needing a specific command-line option. Then you'd only >> encounter this problem with legacy assemblers. > > It's not the assembler's fault. > > We're using %hi() and expecting the assembler to emit a > PC relative relocation just because the symbol name happens > to be _GLOBAL_OFFSET_TABLE_. And it will do this, but only > when -PIC. Changing that is pretty dangerous. > > But even if we got past that, we need to get the assembler options > right in order to enable instruction classes. For example we have to > get -Av9a there when using VIS instructions. > > Other platforms are going to hit things like this too. > > LTO really needs to evaluate the specs correctly. > Can you store assembler options in some LTO section? -- H.J.
Re: Why is __i686 undefined for x86_64 -m32 (in mainline)
On 03/16/2010 08:53 PM, H.J. Lu wrote: > The question is what processor macros should "-march=x86-64" define. There > is > > {"x86-64", PROCESSOR_K8, CPU_K8, > PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_NO_SAHF}, > > For -march=x86-64, __k8 is defined. However, real K8 supports: > > {"k8", PROCESSOR_K8, CPU_K8, > PTA_64BIT | PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE > | PTA_SSE2 | PTA_NO_SAHF}, > > It isn't an issue in i386.c since PROCESSOR_K8 isn't used to check > ISAs. But using __k8 to check ISAs is a problem. > I'm not sure I follow the gory details of your reply, but to me, it seems *really* strange that *now*, on x86_64, "-m32" is not the same as "-m32 -march=i686" as far as __i686 is concerned... Paolo.
Re: Why is __i686 undefined for x86_64 -m32 (in mainline)
On Tue, Mar 16, 2010 at 1:13 PM, Paolo Carlini wrote:
> On 03/16/2010 08:53 PM, H.J. Lu wrote:
>> The question is what processor macros should "-march=x86-64" define. There
>> is
>>
>> {"x86-64", PROCESSOR_K8, CPU_K8,
>> PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_NO_SAHF},
>>
>> For -march=x86-64, __k8 is defined. However, real K8 supports:
>>
>> {"k8", PROCESSOR_K8, CPU_K8,
>> PTA_64BIT | PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE
>> | PTA_SSE2 | PTA_NO_SAHF},
>>
>> It isn't an issue in i386.c since PROCESSOR_K8 isn't used to check
>> ISAs. But using __k8 to check ISAs is a problem.
>>
> I'm not sure I follow the gory details of your reply, but to me, it
> seems *really* strange that *now*, on x86_64, "-m32" is not the same as
> "-m32 -march=i686" as far as __i686 is concerned...
>

We never defined __i686 for -m32 by default on x86_64. Here is
a patch to define __i686 for -m32 if the processor supports it.

--
H.J.

2010-03-16  H.J. Lu

        * config/i386/i386-c.c (ix86_target_macros_internal): Define
        __i686/__i686__ for PROCESSOR_K8, PROCESSOR_AMDFAM10,
        PROCESSOR_PENTIUM4, PROCESSOR_NOCONA, PROCESSOR_CORE2 and
        PROCESSOR_ATOM.
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 35eab49..f6dad14 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -100,26 +100,53 @@ ix86_target_macros_internal (int isa_flag,
         def_or_undef (parse_in, "__athlon_sse__");
       break;
     case PROCESSOR_K8:
+      if (!TARGET_64BIT)
+        {
+          def_or_undef (parse_in, "__i686");
+          def_or_undef (parse_in, "__i686__");
+        }
       def_or_undef (parse_in, "__k8");
       def_or_undef (parse_in, "__k8__");
       break;
     case PROCESSOR_AMDFAM10:
+      if (!TARGET_64BIT)
+        {
+          def_or_undef (parse_in, "__i686");
+          def_or_undef (parse_in, "__i686__");
+        }
       def_or_undef (parse_in, "__amdfam10");
       def_or_undef (parse_in, "__amdfam10__");
       break;
     case PROCESSOR_PENTIUM4:
+      def_or_undef (parse_in, "__i686");
+      def_or_undef (parse_in, "__i686__");
       def_or_undef (parse_in, "__pentium4");
       def_or_undef (parse_in, "__pentium4__");
       break;
     case PROCESSOR_NOCONA:
+      if (!TARGET_64BIT)
+        {
+          def_or_undef (parse_in, "__i686");
+          def_or_undef (parse_in, "__i686__");
+        }
       def_or_undef (parse_in, "__nocona");
       def_or_undef (parse_in, "__nocona__");
       break;
     case PROCESSOR_CORE2:
+      if (!TARGET_64BIT)
+        {
+          def_or_undef (parse_in, "__i686");
+          def_or_undef (parse_in, "__i686__");
+        }
       def_or_undef (parse_in, "__core2");
       def_or_undef (parse_in, "__core2__");
       break;
     case PROCESSOR_ATOM:
+      if (!TARGET_64BIT)
+        {
+          def_or_undef (parse_in, "__i686");
+          def_or_undef (parse_in, "__i686__");
+        }
       def_or_undef (parse_in, "__atom");
       def_or_undef (parse_in, "__atom__");
       break;
Re: Why is __i686 undefined for x86_64 -m32 (in mainline)
On 03/16/2010 09:40 PM, H.J. Lu wrote:
> We never defined __i686 for -m32 by default on x86_64. Here is
> a patch to define __i686 for -m32 if the processor supports it.
>
If I understand correctly the logic underlying the recent work in this
area, I think we certainly want your patch, because otherwise we have
kind of an inconsistent situation: the i686 facilities *are* available,
but __i686 is undefined.

Maybe the patch should go to gcc-patches too...

Thanks,
Paolo.
Re: Why is __i686 undefined for x86_64 -m32 (in mainline)
On Tue, Mar 16, 2010 at 09:53:30PM +0100, Paolo Carlini wrote:
> On 03/16/2010 09:40 PM, H.J. Lu wrote:
> > We never defined __i686 for -m32 by default on x86_64. Here is
> > a patch to define __i686 for -m32 if the processor supports it.
> >
> If I understand correctly the logic underlying the recent work in this
> area, I think we certainly want your patch, because otherwise we have
> kind of an inconsistent situation: the i686 facilities *are* available,
> but __i686 is undefined.
>
> Maybe the patch should go to gcc-patches too...

I don't think it is a good idea to change the meaning of the macros years
after they have been introduced.
You could add a different macro if you want.
Why should __i686 be special? i686 does have __i586 features too, should it
define also __i586, __i486? Should __core2 define __pentium4? Etc., etc.

	Jakub
Re: Why is __i686 undefined for x86_64 -m32 (in mainline)
On 03/16/2010 09:58 PM, Jakub Jelinek wrote:
> I don't think it is a good idea to change the meaning of the macros years
> after they have been introduced.
> You could add a different macro if you want.
> Why should __i686 be special? i686 does have __i586 features too, should it
> define also __i586, __i486?
Probably it should, in my opinion.

But maybe I'm missing something about the whole logic of the recent
changes: wasn't it about having the default for an i686 target similar, if
not identical, to passing by hand -march=i686? I'm really, really
confused... How are people supposed to figure out with macros that the
new default configuration supports everything -march=i686 supports vs
the previous status when it was identical to -march=i386?!?

Paolo.
Re: Why is __i686 undefined for x86_64 -m32 (in mainline)
On Tue, Mar 16, 2010 at 2:03 PM, Paolo Carlini wrote:
> On 03/16/2010 09:58 PM, Jakub Jelinek wrote:
>> I don't think it is a good idea to change the meaning of the macros years
>> after they have been introduced.
>> You could add a different macro if you want.
>> Why should __i686 be special? i686 does have __i586 features too, should it
>> define also __i586, __i486?
> Probably it should, in my opinion.
>
> But maybe I'm missing something about the whole logic of the recent
> changes: wasn't it about having the default for an i686 target similar, if
> not identical, to passing by hand -march=i686? I'm really, really
> confused... How are people supposed to figure out with macros that the
> new default configuration supports everything -march=i686 supports vs
> the previous status when it was identical to -march=i386?!?
>
> Paolo.
>

Checking __iX86 is a good idea for ISAs since its meaning isn't well defined
nor enforced. For libstdc++ purposes, can you check __SSE2__ in addition to
__i686?

--
H.J.
Re: Why is __i686 undefined for x86_64 -m32 (in mainline)
On Tue, Mar 16, 2010 at 2:06 PM, H.J. Lu wrote:
> On Tue, Mar 16, 2010 at 2:03 PM, Paolo Carlini
> wrote:
>> On 03/16/2010 09:58 PM, Jakub Jelinek wrote:
>>> I don't think it is a good idea to change the meaning of the macros years
>>> after they have been introduced.
>>> You could add a different macro if you want.
>>> Why should __i686 be special? i686 does have __i586 features too, should it
>>> define also __i586, __i486?
>> Probably it should, in my opinion.
>>
>> But maybe I'm missing something about the whole logic of the recent
>> changes: wasn't it about having the default for an i686 target similar, if
>> not identical, to passing by hand -march=i686? I'm really, really
>> confused... How are people supposed to figure out with macros that the
>> new default configuration supports everything -march=i686 supports vs
>> the previous status when it was identical to -march=i386?!?
>>
>> Paolo.
>>
>
> Checking __iX86 is a good idea for ISAs since its meaning isn't well defined

I mean "isn't a good idea".

> nor enforced. For libstdc++ purposes, can you check __SSE2__ in addition to
> __i686?
>
>
> --
> H.J.
>

--
H.J.
Re: Why is __i686 undefined for x86_64 -m32 (in mainline)
On Tue, Mar 16, 2010 at 1:58 PM, Jakub Jelinek wrote:
> On Tue, Mar 16, 2010 at 09:53:30PM +0100, Paolo Carlini wrote:
>> On 03/16/2010 09:40 PM, H.J. Lu wrote:
>> > We never defined __i686 for -m32 by default on x86_64. Here is
>> > a patch to define __i686 for -m32 if the processor supports it.
>> >
>> If I understand correctly the logic underlying the recent work in this
>> area, I think we certainly want your patch, because otherwise we have
>> kind of an inconsistent situation: the i686 facilities *are* available,
>> but __i686 is undefined.
>>
>> Maybe the patch should go to gcc-patches too...
>
> I don't think it is a good idea to change the meaning of the macros years
> after they have been introduced.
> You could add a different macro if you want.
> Why should __i686 be special? i686 does have __i586 features too, should it
> define also __i586, __i486? Should __core2 define __pentium4? Etc., etc.
>

I don't think we should add those at all. i386.c has

  /* For sane SSE instruction set generation we need fcomi instruction.
     It is safe to enable all CMOVE instructions.  */
  if (TARGET_SSE)
    TARGET_CMOVE = 1;

Why not check __SSE__ or __SSE2__?

--
H.J.
Re: Why is __i686 undefined for x86_64 -m32 (in mainline)
On 03/16/2010 10:08 PM, H.J. Lu wrote:
>> I don't think it is a good idea to change the meaning of the macros years
>> after they have been introduced.
>> You could add a different macro if you want.
>> Why should __i686 be special? i686 does have __i586 features too, should it
>> define also __i586, __i486? Should __core2 define __pentium4? Etc., etc.
>>
>>
> I don't think we should add those at all.
>
About i586 & co, I see now that you are right.

To recapitulate my point, it just seemed strange to me that, before and
after the recent changes, __i386 is defined, whereas __i686 is defined
only if I pass -march=i686. On the other hand, after the recent changes,
which essentially change the default subtarget to -march=i686, __i686 is
not defined by default.

Paolo.
Re: Why is __i686 undefined for x86_64 -m32 (in mainline)
On Tue, Mar 16, 2010 at 1:14 PM, Paolo Carlini wrote:
> On 03/16/2010 10:08 PM, H.J. Lu wrote:
>>> I don't think it is a good idea to change the meaning of the macros years
>>> after they have been introduced.
>>> You could add a different macro if you want.
>>> Why should __i686 be special? i686 does have __i586 features too, should it
>>> define also __i586, __i486? Should __core2 define __pentium4? Etc., etc.
>>>
>>>
>> I don't think we should add those at all.
>>
> About i586 & co, I see now that you are right.
>
> To recapitulate my point, it just seemed strange to me that, before and
> after the recent changes, __i386 is defined, whereas __i686 is defined
> only if I pass -march=i686. On the other hand, after the recent changes,
> which essentially change the default subtarget to -march=i686, __i686 is
> not defined by default.
>

That is not true. The new -m32 default ISA on x86-64 is i686 + MMX + SSE +
SSE2.
It is Pentium 4, not i686. For historical reasons, we define __k8
instead of __pentium4.

--
H.J.
Re: Why is __i686 undefined for x86_64 -m32 (in mainline)
On 03/16/2010 10:20 PM, H.J. Lu wrote:
> That is not true. The new -m32 default ISA on x86-64 is i686 + MMX + SSE +
> SSE2.
> It is Pentium 4, not i686. For historical reasons, we define __k8
> instead of __pentium4.
>
Ah, ok, this is what I was missing! We have *more* than i686. Thus I can
check for __k8.

Thanks again,
Paolo.
Re: Why is __i686 undefined for x86_64 -m32 (in mainline)
On Tue, Mar 16, 2010 at 1:30 PM, Paolo Carlini wrote:
> On 03/16/2010 10:20 PM, H.J. Lu wrote:
>> That is not true. The new -m32 default ISA on x86-64 is i686 + MMX + SSE +
>> SSE2.
>> It is Pentium 4, not i686. For historical reasons, we define __k8
>> instead of __pentium4.
>>
> Ah, ok, this is what I was missing! We have *more* than i686. Thus I can
> check for __k8.
>

Please check __SSE__ since __k8 won't be defined for -march=atom.

--
H.J.
Re: Why is __i686 undefined for x86_64 -m32 (in mainline)
On 03/16/2010 10:33 PM, H.J. Lu wrote:
> Please check __SSE__ since __k8 won't be defined for -march=atom.
I don't care about Atom.

Paolo.
Re: Why is __i686 undefined for x86_64 -m32 (in mainline)
On Tue, Mar 16, 2010 at 2:36 PM, Paolo Carlini wrote:
> On 03/16/2010 10:33 PM, H.J. Lu wrote:
>> Please check __SSE__ since __k8 won't be defined for -march=atom.
> I don't care about Atom.
>

Do you care about -march=core2?

--
H.J.
Re: Why is __i686 undefined for x86_64 -m32 (in mainline)
On 03/16/2010 11:27 PM, H.J. Lu wrote:
> Do you care about -march=core2?
Ok, thanks, let's check __core2 too, but really, I don't want to fiddle
too much with these macros in the 4.5.0 timeframe. This is code for
parallel-mode which really is tailored by and large to modern 64-bit
machines. For further enhancements we have libstdc++/34106.

Paolo.
Re: Why is __i686 undefined for x86_64 -m32 (in mainline)
On Tue, Mar 16, 2010 at 2:32 PM, Paolo Carlini wrote:
> On 03/16/2010 11:27 PM, H.J. Lu wrote:
>> Do you care about -march=core2?
> Ok, thanks, let's check __core2 too, but really, I don't want to fiddle
> too much with these macros in the 4.5.0 timeframe. This is code for
> parallel-mode which really is tailored by and large to modern 64-bit
> machines. For further enhancements we have libstdc++/34106.
>

As I said, you should check __SSE__ and be done with it. Otherwise you
will need to keep adding more checks for no good reason.

--
H.J.
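H.J.'s advice to test an ISA macro instead of an ever-growing list of processor macros (__k8, __core2, ...) can be shown with a minimal C sketch. The helper name `sse_level` is hypothetical and not taken from libstdc++; the only assumption is that GCC defines __SSE__ and __SSE2__ whenever those instruction sets are enabled, whichever -march value enabled them.

```c
/* Hypothetical helper: report the SSE level this translation unit was
   compiled for, using only ISA macros.  Unlike a processor-name test
   (__k8, __core2, __pentium4, ...), it needs no update when a new
   -march value that enables SSE/SSE2 is added. */
static int sse_level(void)
{
#if defined(__SSE2__)
  return 2;   /* SSE2 enabled (per the thread, the new -m32 default on x86-64) */
#elif defined(__SSE__)
  return 1;   /* SSE but not SSE2 */
#else
  return 0;   /* no SSE: take the generic code path */
#endif
}
```

The same build flags that select the instruction set select the code path, so -march=atom and -march=core2 are covered without naming them.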
Re: Why is __i686 undefined for x86_64 -m32 (in mainline)
On 03/16/2010 11:36 PM, H.J. Lu wrote:
> As I said, you should check __SSE__ and be done with it. Otherwise you
> will need to keep adding more checks for no good reason.
>
As I said, that file will be reworked *completely* by its maintainers,
we have another PR for this, and I don't want __SSE__, which by itself
tells me nothing about atomic operations.

Paolo.
gcc-4.4-20100316 is now available
Snapshot gcc-4.4-20100316 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20100316/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch revision 157496

You'll find:

gcc-4.4-20100316.tar.bz2             Complete GCC (includes all of below)
gcc-core-4.4-20100316.tar.bz2        C front end and core compiler
gcc-ada-4.4-20100316.tar.bz2         Ada front end and runtime
gcc-fortran-4.4-20100316.tar.bz2     Fortran front end and runtime
gcc-g++-4.4-20100316.tar.bz2         C++ front end and runtime
gcc-java-4.4-20100316.tar.bz2        Java front end and runtime
gcc-objc-4.4-20100316.tar.bz2        Objective-C front end and runtime
gcc-testsuite-4.4-20100316.tar.bz2   The GCC testsuite

Diffs from 4.4-20100309 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Re: Why is __i686 undefined for x86_64 -m32 (in mainline)
On Tue, Mar 16, 2010 at 3:39 PM, Paolo Carlini wrote:
> On 03/16/2010 11:36 PM, H.J. Lu wrote:
>> As I said, you should check __SSE__ and be done with it. Otherwise you
>> will need to keep adding more checks for no good reason.
>>
> As I said, that file will be reworked *completely* by its maintainers,
> we have another PR for this, and I don't want __SSE__, which by itself
> tells me nothing about atomic operations.
>

__SSE__/-msse enables i686 ISA. Does i686 ISA support
atomic operations?

--
H.J.
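Paolo's objection is that __SSE__ says nothing about atomics. A test that addresses atomics directly is sketched below; neither participant proposes it in the thread. It assumes GCC's predefined __GCC_HAVE_SYNC_COMPARE_AND_SWAP_N macros (present since GCC 4.3), which are defined when the target supports an inline N-byte compare-and-swap through the __sync builtins. On 32-bit x86 the 8-byte case relies on cmpxchg8b, first available on the Pentium, so it is a separate property from SSE. The function names `have_cas8` and `fetch_add_64` are hypothetical.

```c
#include <stdint.h>

/* 1 if GCC reports an inline 8-byte compare-and-swap for this target. */
static int have_cas8(void)
{
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
  return 1;
#else
  return 0;
#endif
}

#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
/* Atomically add DELTA to *COUNTER and return the previous value, using a
   compare-and-swap retry loop.  __sync_val_compare_and_swap is a full
   barrier, so *COUNTER is re-read on every iteration; if another thread
   changed it in between, the CAS fails and the loop retries. */
static int64_t fetch_add_64(int64_t *counter, int64_t delta)
{
  int64_t old;
  do
    old = *counter;
  while (__sync_val_compare_and_swap(counter, old, old + delta) != old);
  return old;
}
#endif
```

A caller would use fetch_add_64 only when the macro is defined and fall back to something else (for example a lock) otherwise; that fallback is not shown here.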
Re: Why is __i686 undefined for x86_64 -m32 (in mainline)
On 03/17/2010 12:04 AM, H.J. Lu wrote:
> __SSE__/-msse enables i686 ISA. Does i686 ISA support
> atomic operations?
>
If you are willing to contribute to these issues, please add your
comments to the audit trail of libstdc++/34106 and figure out with
Johannes a good clean-up for 4.6.0 (including a good amount of
comments, of course).

Thanks,
Paolo.
Re: constant hoisting out of loops
On Mon, Mar 15, 2010 at 5:24 AM, Jim Wilson wrote:
> On 03/10/2010 10:48 PM, fanqifei wrote:
>>
>> For below piece of code, the instruction "clr.w a15" obviously doesn't
>> belong to the inner loop.
>> 6: bd f4 clr.w a15; #clear to zero
>> 8: 80 af 00 std.w a10 0x0 a15;
>
> There is info lacking here. Did you compile with optimization? What does
> the RTL look like before and after the loop opt passes?
>
> I'd guess that your movsi pattern is defined wrong. You probably have
> predicates that allow either registers or constants in the set source, which
> is normal, and constraints that only allow registers when the dest is a mem.
> But constraints are only used by the reload pass, so a store zero to mem
> rtl insn will be generated early, and then fixed late during the reload
> pass. So the loop opt did not move the clear insn out of the loop because
> there was no clear insn at this time.
>
> The way to fix this is to add a condition to the movsi pattern that excludes
> this case. For instance, something like this:
> "(register_operand (operands[0], SImode)
> || register_operand (operands[1], SImode))"
> This will prevent a store zero to mem RTL insn from being accepted. In
> order to make this work, you need to make movsi an expander that accepts
> anything, and then forces the source to a register if you have a store
> constant to memory. See for instance the sparc_expand_move function or the
> mips_legitimize_move function.
>
> Use -da (old) or -fdump-rtl-all (new) to see the RTL dumps to see what is
> going on.
>
> Jim
>

It's compiled with -O2. You are correct: the reload pass emitted the
clr.w insn. However, I can see loop opt passes after reload:

problem1.c.174r.loop2_invariant1
problem1.c.174r.redo_loop2_invariant
problem1.c.175r.loop2_unswitch
problem1.c.177r.redo_loop2_invariant

After the reload pass, the clr.w insn is in the loop, and after the above
loop2 passes, the insn is still not moved outside of the loop. I am not
sure whether the issue is in these loop2 passes; I guess it is.

For the definition of the movsi expander, I will try to do what you
pointed out. (I am not very familiar with this code, so that may take me
some time.)

Current definition of the mov pattern:

(define_insn "mov<mode>"
  [(set (match_operand:BWD 0 "nonimmediate_operand" "=r,m,r,r,r,r,r,r,x,r")
        (match_operand:BWD 1 "move_source_operand"  "Z,r,L,I,Q,P,ni,x,r,r"))]
  ""
  "@
   %L1 %0 %1;
   %S0 %0 %1;
   clr %0;
   mv %0 %1;
   ...
   ...

Thanks!
--
-Qifei Fan
http://freshtime.org
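Jim's suggestion can be modeled in a toy C sketch to show why it helps the loop optimizer: with the extra insn condition, a store of zero to memory is never accepted as a single move, so the expander emits a separate "register <- 0" move early, and that move is the invariant the loop passes can hoist. The operand kinds and the functions `movsi_insn_ok` and `expand_movsi` are hypothetical simplifications, not GCC internals. In a real port the condition goes in the define_insn, and the expander copies the source into a register (for example with force_reg), in the spirit of mips_legitimize_move and sparc_expand_move.

```c
#include <stdbool.h>

/* Toy model of the operand forms a movsi insn can see. */
enum operand_kind { OP_REG, OP_MEM, OP_CONST };

/* Models the suggested define_insn condition
     register_operand (operands[0], SImode)
     || register_operand (operands[1], SImode)
   so a store of a constant (or of memory) straight to memory is
   rejected instead of being left for reload to fix. */
static bool movsi_insn_ok(enum operand_kind dest, enum operand_kind src)
{
  return dest == OP_REG || src == OP_REG;
}

/* Models the movsi expander: if a single move would fail the insn
   condition, first load the source into a new register, then store that
   register.  Returns how many move insns are emitted. */
static int expand_movsi(enum operand_kind dest, enum operand_kind src)
{
  if (movsi_insn_ok(dest, src))
    return 1;   /* one move insn: dest <- src */
  return 2;     /* reg <- src; dest <- reg */
}
```

In this model, `expand_movsi(OP_MEM, OP_CONST)` yields two moves, and the first one (`reg <- 0`) exists before the loop passes run, which is the point of Jim's fix.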