On Sun, Nov 14, 2010 at 07:21:50PM -0800, Linus Torvalds wrote:
> So when Richard Guenther says "a memory clobber doesn't cover automatic
> storage", to me that very clearly spells "gcc is buggy as hell".
> Because automatic storage with its address taken _very_ much gets
> clobbered by things like
> That said, changing the inline asm to just clobber one less register
> would be completely sufficient to make it work well with all gccs out there,
> just push/pop one of the register around the whole body. I doubt calling
> out SMM BIOS is actually so performance critical that one push and one
On Mon, Nov 15, 2010 at 09:56:05AM +0100, Jakub Jelinek wrote:
> Yes, reload should figure out it has address of regs already tied to %eax,
> unfortunately starting with IRA it doesn't (I'll file a GCC bug about that;
http://gcc.gnu.org/PR46479
Jakub
With 638 macros documented by @defmac, and 475 files that include tm.h,
our current approach to hookization is too slow to get the tree optimizers
and front ends independent of target macros in any useful timeframe.
Therefore, I propose the following approach:
The target macros currently require
On Mon, Nov 15, 2010 at 09:56:05AM +0100, Jakub Jelinek wrote:
> On Sun, Nov 14, 2010 at 07:21:50PM -0800, Linus Torvalds wrote:
> > So when Richard Guenther says "a memory clobber doesn't cover automatic
> > storage", to me that very clearly spells "gcc is buggy as hell".
> > Because automatic stor
On Mon, Nov 15, 2010 at 9:56 AM, Jakub Jelinek wrote:
> On Sun, Nov 14, 2010 at 07:21:50PM -0800, Linus Torvalds wrote:
>> So when Richard Guenther says "a memory clobber doesn't cover automatic
>> storage", to me that very clearly spells "gcc is buggy as hell".
>> Because automatic storage with it
> And for this the starting point should be what has been requested,
> i.e. preprocessed source + gcc options + gcc version and some hints what
> actually misbehaves (with the , "+m" (*regs) change reverted)
> in gcc bugzilla. Only with that we can actually look at what has been
> happening, see w
On Mon, Nov 15, 2010 at 11:54:46AM +0100, Andi Kleen wrote:
> > And for this the starting point should be what has been requested,
> > i.e. preprocessed source + gcc options + gcc version and some hints what
> > actually misbehaves (with the , "+m" (*regs) change reverted)
> > in gcc bugzilla. Onl
> Guess we need somebody who actually reported the problem, state what
> gcc was actually used and post preprocessed source, gcc options
> from his case.
Jim Bos,
Can you please supply that?
Please use
rm drivers/char/i8k.o
make V=1 drivers/char/i8k.o
make drivers/char/i8k.i
and supply the
On 11/13/2010 08:40 PM, Peter Bergner wrote:
On Sat, 2010-11-13 at 11:27 +0100, Paolo Bonzini wrote:
On 11/12/2010 03:25 PM, H.J. Lu wrote:
IRA may move instructions across an unspec_volatile,
Do you have a testcase?
Are you sure it's IRA and not our old friend update_equiv_regs()
which IRA
Hello,
On 14.11.2010 0:08, Xinliang David Li wrote:
I re-measured the performance difference using trunk gcc and trunk
clang/llvm on a core-2 box. -fno-strict-aliasing is added to gcc
because clang/llvm's type-based aliasing is incomplete and not
enabled by default. I also added -fomit-fram
Is there any other way to give attributes to inline assembly insns?
2010/10/26 Ian Lance Taylor :
> roy rosen writes:
>
>> If I want the compiler to understand the inline assembly is it
>> possible to write define_insn which would match the pattern that GCC
>> creates for the inline assembly an
Quoting roy rosen :
Is there any other way to give attributes to inline assembly insns?
See define_asm_attributes.
> It appears that reload_combine does not take exceptions into account.
> When it encounters a BARRIER it forgets all register uses after this
> point. But an exception can transfer control to any of the CODE_LABELs
> and jump back to after the BARRIER, with the registers still in use.
There shou
On Mon, 15 Nov 2010, Joern Rennecke wrote:
> With 638 macros documented by @defmac, and 475 files that include tm.h,
> our current approach to hookization is too slow to get the tree optimizers
> and front ends independent of target macros in any useful timeframe.
I think it's perfectly feasible
But this lets you just set default attributes.
I want to set real attributes so that the compiler would be able to
know which insn can be parallelized with another.
Is there a different way?
Are you saying that an inline assembly statement would stay as is, and
would not be touched by the compiler
On Mon, Nov 15, 2010 at 3:16 AM, Jakub Jelinek wrote:
>
> I don't see any problems on the assembly level. i8k_smm is
> not inlined in this case and checks all 3 conditions.
If it really is related to gcc not understanding that "*regs" has
changed due to the memory being an automatic variable, an
Eric Botcazou writes:
>> It appears that reload_combine does not take exceptions into account.
>> When it encounters a BARRIER it forgets all register uses after this
>> point. But an exception can transfer control to any of the CODE_LABELs
>> and jump back to after the BARRIER, with the registe
> JUMP_INSNs already invalidate the register use information. The problem
> is that CALL_INSNs that can throw don't.
Sure, that's precisely what I was suggesting to change, like in rev 162301.
--
Eric Botcazou
For peak, FDO is the most effective option. It can boost performance
by 7-10% depending on the program. The options you suggested probably
won't make too big a dent. -funroll-loops can hurt performance
without profiling. More aggressive inlining, ipa-cp, unswitching etc
enabled by O3 may help a l
Quoting "Joseph S. Myers" :
On Mon, 15 Nov 2010, Joern Rennecke wrote:
With 638 macros documented by @defmac, and 475 files that include tm.h,
our current approach to hookization is too slow to get the tree optimizers
and front ends independent of target macros in any useful timeframe.
I th
Eric Botcazou writes:
>> JUMP_INSNs already invalidate the register use information. The problem
>> is that CALL_INSNs that can throw don't.
>
> Sure, that's precisely what I was suggesting to change, like in rev 162301.
Ahh, you mean something like this? (Fixes the testcase, but not
properly
On 11/15/2010 05:04 PM, Linus Torvalds wrote:
> On Mon, Nov 15, 2010 at 3:16 AM, Jakub Jelinek wrote:
>>
>> I don't see any problems on the assembly level. i8k_smm is
>> not inlined in this case and checks all 3 conditions.
>
> If it really is related to gcc not understanding that "*regs" has
>
On Mon, Nov 15, 2010 at 06:36:06PM +0100, Jim Bos wrote:
> On 11/15/2010 12:37 PM, Andi Kleen wrote:
> See attached, note this is the vanilla 2.6.36 i8k.c (without any patch).
> And to be 100% sure, if I build this (make drivers/char/i8k.ko) it won't
> work.
>
> [ The i8k.i is rather big, even gzi
> For peak, FDO is the most effective option. It can boost performance
> by 7-10% depending on the program. The options you suggested probably
> won't make too big a dent. -funroll-loops can hurt performance
> without profiling. More aggressive inlining, ipa-cp, unswitching etc
-funroll-loops ov
On Mon, Nov 15, 2010 at 9:40 AM, Jim Bos wrote:
>
> Hmm, that doesn't work.
>
> [ Not sure if you read the whole thread but initial workaround was to
> change the asm(..) to asm volatile(..) which did work. ]
Since I have a different gcc than yours (and I'm not going to compile
my own), have you p
On 11/15/2010 06:44 PM, Jakub Jelinek wrote:
> On Mon, Nov 15, 2010 at 06:36:06PM +0100, Jim Bos wrote:
>> On 11/15/2010 12:37 PM, Andi Kleen wrote:
>> See attached, note this is the vanilla 2.6.36 i8k.c (without any patch).
>> And to be 100% sure, if I build this (make drivers/char/i8k.ko) it won'
On Mon, Nov 15, 2010 at 07:17:31PM +0100, Jim Bos wrote:
> # gcc -v
> Reading specs from /usr/lib/gcc/i486-slackware-linux/4.5.1/specs
> COLLECT_GCC=gcc
> COLLECT_LTO_WRAPPER=/usr/libexec/gcc/i486-slackware-linux/4.5.1/lto-wrapper
> Target: i486-slackware-linux
> Configured with: ../gcc-4.5.1/conf
On 11/15/2010 07:08 PM, Linus Torvalds wrote:
> On Mon, Nov 15, 2010 at 9:40 AM, Jim Bos wrote:
>>
>> Hmm, that doesn't work.
>>
>> [ Not sure if you read the whole thread but initial workaround was to
>> change the asm(..) to asm volatile(..) which did work. ]
>
> Since I have a different gcc tha
On 11/15/2010 07:30 PM, Jim Bos wrote:
> On 11/15/2010 07:08 PM, Linus Torvalds wrote:
>> On Mon, Nov 15, 2010 at 9:40 AM, Jim Bos wrote:
>>>
>>> Hmm, that doesn't work.
>>>
>>> [ Not sure if you read the whole thread but initial workaround was to
>>> change the asm(..) to asm volatile(..) which di
On 11/07/10 15:41, Andreas Schwab wrote:
Andi Kleen writes:
Jim writes:
After upgrading my Dell laptop (both OS and kernel), the i8k interface was giving
nonsensical output. As it turned out, it's not the kernel but the compiler
upgrade that broke this.
Guys at Archlinux have found the underlying ca
On Mon, 15 Nov 2010, Joern Rennecke wrote:
> Quoting "Joseph S. Myers" :
>
> > On Mon, 15 Nov 2010, Joern Rennecke wrote:
> >
> > > With 638 macros documented by @defmac, and 475 files that include tm.h,
> > > our current approach to hookization is too slow to get the tree optimizers
> > > and
On 11/08/10 03:49, Richard Guenther wrote:
On Mon, Nov 8, 2010 at 12:03 AM, Andi Kleen wrote:
Andreas Schwab writes:
The asm fails to mention that it modifies *regs.
It has a memory clobber, that should be enough, no?
No. A memory clobber does not cover automatic storage.
A memory clobber
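The change discussed in this thread for i8k adds '"+m" (*regs)' to the asm's operands (the earlier workaround was turning asm(..) into asm volatile(..)). A minimal, portable sketch of that operand pattern follows. The struct and function names are hypothetical, and the empty asm template stands in for the real SMM call sequence, so the sketch only shows how the operand tells the compiler that the asm reads and writes *regs.

```c
/* Hypothetical register block, loosely modelled on the one the i8k
   driver hands to its SMM call; not the driver's real definition. */
struct smm_regs_sketch {
    unsigned int eax;
    unsigned int ebx;
};

/* Returns regs->eax as seen after the asm statement. */
static unsigned int smm_call_sketch(struct smm_regs_sketch *regs)
{
    regs->eax = 0xa3;   /* hypothetical command code */

    /* Empty template: a stand-in for the real SMM instructions.
       The "+m" (*regs) operand declares that the asm both reads and
       writes *regs, so GCC must store regs->eax before the asm and
       must reload it afterwards rather than reuse the constant 0xa3.
       The "memory" clobber is kept as in the original code. */
    __asm__ __volatile__("" : "+m" (*regs) : : "memory");

    return regs->eax;
}
```

Naming the object explicitly as an in/out memory operand does not rely on how a particular GCC release interprets the "memory" clobber, which is what went wrong with the 4.5.1 compiler reported here.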
roy rosen writes:
> Is there any other way to give attributes to inline assembly insns?
Not that I know of. It would be a useful feature in some cases, though
difficult to document.
For specific cases a backend can normally do better by providing builtin
functions.
Ian
On Mon, Nov 15, 2010 at 10:30 AM, Jim Bos wrote:
>
> Attached version with plain 2.6.36 source and version with the committed
> patch, i.e with the '"+m" (*regs)'
Looks 100% identical in i8k_smm() itself, and I'm not seeing anything
bad. The asm has certainly not been optimized away as implied in
On Mon, Nov 15, 2010 at 07:30:35PM +0100, Jim Bos wrote:
> On 11/15/2010 07:08 PM, Linus Torvalds wrote:
> > On Mon, Nov 15, 2010 at 9:40 AM, Jim Bos wrote:
> >>
> >> Hmm, that doesn't work.
> >>
> >> [ Not sure if you read the whole thread but initial workaround was to
> >> change the asm(..) to a
On Mon, Nov 15, 2010 at 10:45 AM, Jeff Law wrote:
>
> A memory clobber should clobber anything in memory, including autos in
> memory; if it doesn't, then that seems like a major problem. I'd like to
> see the rationale behind not clobbering autos in memory.
Yes. It turns out that the "asm optim
On 11/15/2010 07:26 PM, Jakub Jelinek wrote:
> On Mon, Nov 15, 2010 at 07:17:31PM +0100, Jim Bos wrote:
>> # gcc -v
>> Reading specs from /usr/lib/gcc/i486-slackware-linux/4.5.1/specs
>> COLLECT_GCC=gcc
>> COLLECT_LTO_WRAPPER=/usr/libexec/gcc/i486-slackware-linux/4.5.1/lto-wrapper
>> Target: i486-
On Mon, Nov 15, 2010 at 07:58:48PM +0100, Jakub Jelinek wrote:
> Now, not sure why this happens, as there is
> case GIMPLE_ASM:
> for (i = 0; i < gimple_asm_nclobbers (stmt); i++)
> {
> tree op = gimple_asm_clobber_op (stmt, i);
> if (simple_cst_equal(TREE_VALU
On Mon, Nov 15, 2010 at 11:12 AM, Jakub Jelinek wrote:
>
> Ah, the problem is that memory_identifier_string is only initialized in
> ipa-reference.c's initialization, so it can be (and is in this case) NULL in
> ipa-pure-const.c.
Ok. And I guess you can verify that all versions of gcc do this
cor
On Mon, Nov 15, 2010 at 11:21:30AM -0800, Linus Torvalds wrote:
> On Mon, Nov 15, 2010 at 11:12 AM, Jakub Jelinek wrote:
> >
> > Ah, the problem is that memory_identifier_string is only initialized in
> > ipa-reference.c's initialization, so it can be (and is in this case) NULL in
> > ipa-pure-con
On 11/15/2010 11:12 AM, Jakub Jelinek wrote:
> - if (simple_cst_equal(TREE_VALUE (op), memory_identifier_string) == 1)
> + if (strcmp (TREE_STRING_POINTER (TREE_VALUE (link)), "memory") == 0)
I prefer this solution. I think memory_identifier_string is over-engineering.
Patch to remove
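The failure mode described above (a cached identifier that only ipa-reference initialises, so it is still NULL when ipa-pure-const looks at the asm) can be modelled in a few lines. This is a toy, not GCC code: clobbers are plain C strings, memory_identifier_sketch is a hypothetical stand-in for memory_identifier_string, and init_reference_pass_sketch() stands in for ipa-reference's initialisation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for GCC's cached memory_identifier_string,
   which only one pass initialises. */
static const char *memory_identifier_sketch = NULL;

/* Stand-in for ipa-reference's initialisation filling in the cache. */
static void init_reference_pass_sketch(void)
{
    memory_identifier_sketch = "memory";
}

/* Old-style check: compare against the cached identifier.  Like
   simple_cst_equal, a NULL operand yields "not equal", so if the cache
   was never set up, a "memory" clobber is silently ignored. */
static int clobbers_memory_old(const char *const *clobbers, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (memory_identifier_sketch != NULL
            && strcmp(clobbers[i], memory_identifier_sketch) == 0)
            return 1;
    return 0;
}

/* New-style check: compare the clobber text directly, with no hidden
   dependency on which pass happened to run first. */
static int clobbers_memory_new(const char *const *clobbers, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(clobbers[i], "memory") == 0)
            return 1;
    return 0;
}
```

With a clobber list of { "cc", "memory" }, the old check reports no memory clobber until the "other pass" has run, while the strcmp version gives the same answer regardless of pass order.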
On Mon, Nov 15, 2010 at 11:53:05AM -0800, Richard Henderson wrote:
> On 11/15/2010 11:12 AM, Jakub Jelinek wrote:
> > - if (simple_cst_equal(TREE_VALUE (op), memory_identifier_string) == 1)
> > + if (strcmp (TREE_STRING_POINTER (TREE_VALUE (link)), "memory") == 0)
>
> I prefer this solutio
On 11/15/2010 08:51 PM, Jakub Jelinek wrote:
> On Mon, Nov 15, 2010 at 11:21:30AM -0800, Linus Torvalds wrote:
>> On Mon, Nov 15, 2010 at 11:12 AM, Jakub Jelinek wrote:
>>>
>>> Ah, the problem is that memory_identifier_string is only initialized in
>>> ipa-reference.c's initialization, so it can b
The only targets that are using textual prologues and epilogues are now
arc, cris, pdp11 and vax. ARC should probably have been deprecated long
ago, any plans to convert the others or (for cris) to flip the default?
Paolo
> Ahh, you mean something like this? (Fixes the testcase, but not
> properly tested yet.)
Yes, but I think you still need the regular treatment for these CALL_INSNs:
Index: postreload.c
===
--- postreload.c (revision 166701)
We currently have 3 non-algorithmic maintainers:
loop optimizer Zdenek Dvorak o...@ucw.cz
loop optimizer Daniel Berlin dber...@dberlin.org
libcpp Tom Tromey tro...@redhat.com
Especially for the loop optimizer, the situation is a
On 11/15/2010 04:48 PM, Joseph S. Myers wrote:
* The macro is tested with #if/#ifdef/#ifndef/#elif in a source file
outside of config/ (but including front-end subdirectories). Care is
needed in identifying such macros through grep because of
backslash-newline line continuations and because it's
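The grep pitfall mentioned above is that a macro may only show up on a conditional line after backslash-newline splicing, so a line-oriented grep for "#if.*MACRO" misses it. A minimal sketch of a scanner that splices continuations first and then looks for a whole-word match on #if/#ifdef/#ifndef/#elif lines; macro_in_conditional and its helpers are hypothetical, and the sketch deliberately ignores comments, string literals and macros reached only through other macros.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Copy SRC with every backslash-newline removed (line splicing).
   Caller frees the result. */
static char *splice_lines(const char *src)
{
    size_t len = strlen(src), j = 0;
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '\\' && src[i + 1] == '\n') {
            i++;                        /* drop both characters */
            continue;
        }
        out[j++] = src[i];
    }
    out[j] = '\0';
    return out;
}

static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Does NAME occur as a whole identifier in [p, end)? */
static int has_word(const char *p, const char *end, const char *name)
{
    size_t n = strlen(name);
    if (n == 0)
        return 0;
    for (const char *q = p; (size_t)(end - q) >= n; q++)
        if (strncmp(q, name, n) == 0
            && (q == p || !is_ident_char(q[-1]))
            && (q + n == end || !is_ident_char(q[n])))
            return 1;
    return 0;
}

/* Is [p, end) an #if, #ifdef, #ifndef or #elif directive? */
static int is_conditional_directive(const char *p, const char *end)
{
    static const char *const dirs[] = { "if", "ifdef", "ifndef", "elif" };
    while (p < end && (*p == ' ' || *p == '\t'))
        p++;
    if (p == end || *p != '#')
        return 0;
    p++;
    while (p < end && (*p == ' ' || *p == '\t'))
        p++;
    const char *w = p;
    while (p < end && is_ident_char(*p))
        p++;
    size_t n = (size_t)(p - w);
    for (size_t i = 0; i < sizeof dirs / sizeof dirs[0]; i++)
        if (strlen(dirs[i]) == n && strncmp(w, dirs[i], n) == 0)
            return 1;
    return 0;
}

/* 1 if MACRO is tested in a conditional, 0 if not, -1 on OOM. */
int macro_in_conditional(const char *src, const char *macro)
{
    char *buf = splice_lines(src);
    int found = 0;
    if (buf == NULL)
        return -1;
    for (char *line = buf; *line != '\0' && !found; ) {
        char *nl = strchr(line, '\n');
        char *end = nl ? nl : line + strlen(line);
        if (is_conditional_directive(line, end) && has_word(line, end, macro))
            found = 1;
        line = nl ? nl + 1 : end;
    }
    free(buf);
    return found;
}
```

For input such as "#if defined (FOO) \" followed by "  && BAR" on the next line, the scanner reports BAR as tested even though BAR never shares a physical line with "#if".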
On Nov 15, 2010, at 3:50 PM, Paolo Bonzini wrote:
> The only targets that are using textual prologues and epilogues are now arc,
> cris, pdp11 and vax. ARC should probably have been deprecated long ago, any
> plans to convert the others or (for cris) to flip the default?
Learning how to do th
Quoting Paolo Bonzini :
Augmenting libcpp with a new kind of poisoning that only affects
preprocessor conditionals would probably make this a lot simpler.
If we don't have a wrapper macro, we can just poison the macro.
I still have to find out how practical that is, but for now let's assume
we
2010/11/15 Jan Hubicka :
>> For peak, FDO is the most effective option. It can boost performance
>> by 7-10% depending on the program. The options you suggested probably
>> won't make too big a dent. -funroll-loops can hurt performance
>> without profiling. More aggressive inlining, ipa-cp, unswi
On Mon, Nov 15, 2010 at 7:45 PM, Jeff Law wrote:
> On 11/08/10 03:49, Richard Guenther wrote:
>>
>> On Mon, Nov 8, 2010 at 12:03 AM, Andi Kleen wrote:
>>>
>>> Andreas Schwab writes:
The asm fails to mention that it modifies *regs.
>>>
>>> It has a memory clobber, that should be enough,
On Mon, 15 Nov 2010, Paolo Bonzini wrote:
> The only targets that are using textual prologues and epilogues are now arc,
> cris, pdp11 and vax. ARC should probably have been deprecated long ago, any
> plans to convert the others or (for cris) to flip the default?
What code are you looking at; wher
On Mon, Nov 15, 2010 at 10:00 PM, Paolo Bonzini wrote:
> We currently have 3 non-algorithmic maintainers:
>
> loop optimizer Zdenek Dvorak o...@ucw.cz
> loop optimizer Daniel Berlin dber...@dberlin.org
> libcpp Tom Tromey tro...@r
I did some measurement (64bit).
Experiment 1:
-O2 -funroll-loops vs -O2
It improves performance (geomean) by 0.56%, not too much:
           O2     O2 unroll-loops   change
164.gzip   1324   1331              0.56%
175.v
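The "(geomean)" figure is, assuming the usual SPEC convention (the message does not say exactly how it was computed), the geometric mean of the per-benchmark score ratios expressed as a relative change, where $S_i$ is the score of benchmark $i$ (higher is better) and $n$ is the number of benchmarks:

```latex
\Delta_{\text{geomean}}
  = \left( \prod_{i=1}^{n} \frac{S_i^{\text{new}}}{S_i^{\text{base}}} \right)^{1/n} - 1
  = \exp\!\left( \frac{1}{n} \sum_{i=1}^{n} \ln \frac{S_i^{\text{new}}}{S_i^{\text{base}}} \right) - 1
```

The log form is the one usually used in code, since summing logarithms avoids overflow or underflow when many ratios are multiplied.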
> On Mon, Nov 15, 2010 at 10:00 PM, Paolo Bonzini wrote:
> > We currently have 3 non-algorithmic maintainers:
> >
> > loop optimizer Zdenek Dvorak o...@ucw.cz
> > loop optimizer Daniel Berlin dber...@dberlin.org
> > libcpp Tom Tromey
> I did some measurement (64bit).
>
> Experiment 1:
>
> -O2 -funroll-loops vs -O2
>
> It improves performance (geomean) by 0.56%, not too much:
>            O2     O2 unroll-loops   change
> 164.gzip   1324   1331              0.56%
> testcase shows that in 4.1/4.2/4.3/4.4 this is miscompiled only when using
> -fno-ipa-reference, in 4.5 it is miscompiled always when optimizing
> unless -fno-ipa-pure-const (as 4.5 added local-pure-const pass which is run
> before ipa-reference) and in 4.6 this has been fixed by Honza when
> doi
On Mon, Nov 15, 2010 at 11:43:22PM +0100, Andi Kleen wrote:
> > testcase shows that in 4.1/4.2/4.3/4.4 this is miscompiled only when using
> > -fno-ipa-reference, in 4.5 it is miscompiled always when optimizing
> > unless -fno-ipa-pure-const (as 4.5 added local-pure-const pass which is run
> > befo
On 11/15/10 15:07, Richard Guenther wrote:
On Mon, Nov 15, 2010 at 7:45 PM, Jeff Law wrote:
On 11/08/10 03:49, Richard Guenther wrote:
On Mon, Nov 8, 2010 at 12:03 AM, Andi Kleen wrote:
Andreas Schwab writes:
The asm fails to mention that it modifies *regs.
It has a memory clobber, th
On 11/15/2010 11:10 PM, Hans-Peter Nilsson wrote:
There *is* an option to omit the prologue and epilogue
controlling the TARGET_PROLOGUE_EPILOGUE; I'm guessing that
could cause confusion.
That's what confused me.
Is that getting in the way of something?
Yes, there is code conditionalized on
On Mon, Nov 15, 2010 at 11:58 PM, Jeff Law wrote:
> On 11/15/10 15:07, Richard Guenther wrote:
>>
>> On Mon, Nov 15, 2010 at 7:45 PM, Jeff Law wrote:
>>>
>>> On 11/08/10 03:49, Richard Guenther wrote:
On Mon, Nov 8, 2010 at 12:03 AM, Andi Kleen
wrote:
>
> Andreas Schwab
Just measured: lto +O3 improves over O2 by a decent 4.8% geomean. More
data come later.
164.gzip   1324   1322   -0.10%
175.vpr    1694   1703    0.51%
176.gcc    2293   2347
This means O3 level inlining should be turned on also for lto build by
default -- as -O2 lto performance is too unimpressive.
David
On Mon, Nov 15, 2010 at 3:36 PM, Xinliang David Li wrote:
> Just measured: lto +O3 improves over O2 by a decent 4.8% geomean. More
> data come later.
>
>
> This means O3 level inlining should be turned on also for lto build by
> default -- as -O2 lto performance is too unimpressive.
I am just re-tuning the inliner and hope to get more speedups for smaller
costs than we get right now. I however don't think we can reasonably enable it
as it is at LT
> > This means O3 level inlining should be turned on also for lto build by
> > default -- as -O2 lto performance is too unimpressive.
>
> I am just re-tuning the inliner and hope to get more speedups for smaller
> costs than we get right now. I however don't think we can reasonably enable it
> as
On Mon, Nov 15, 2010 at 4:25 PM, Jan Hubicka wrote:
>> This means O3 level inlining should be turned on also for lto build by
>> default -- as -O2 lto performance is too unimpressive.
>
> I am just re-tuning the inliner and hope to get more speedups for smaller
> costs than we get right now. I h
> On Mon, Nov 15, 2010 at 4:25 PM, Jan Hubicka wrote:
> >> This means O3 level inlining should be turned on also for lto build by
> >> default -- as -O2 lto performance is too unimpressive.
> >
> > I am just re-tuning the inliner and hope to get more speedups for smaller
> > costs than we get rig
> Fortunately linker plugin solves the problem here and this is why I want to
> have it by default. GCC then can do effectively -fwhole-program for binaries
> (since linker knows what will be bound elsewhere) and take advantage of
> visibility((hidden)) hints for shared libraries same way. Most o
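The visibility((hidden)) hint mentioned above is GCC's visibility attribute. In the sketch below both function names are hypothetical; the idea is that a hidden function cannot be bound from another shared object, so with LTO and the linker plugin the compiler may treat it much like a static function (inline it, specialise it, or drop the out-of-line copy), while default-visibility functions must stay available to other modules.

```c
/* Internal helper of a hypothetical library: hidden visibility keeps
   it out of the dynamic symbol table, so no other module can bind to
   it or interpose a replacement. */
__attribute__((visibility("hidden")))
int internal_scale_sketch(int x)
{
    return 3 * x;
}

/* Exported entry point: default visibility, callable from outside. */
__attribute__((visibility("default")))
int public_entry_sketch(int x)
{
    return internal_scale_sketch(x) + 1;
}
```

A build along the lines of -O2 -flto -fuse-linker-plugin -fPIC -shared (optionally with -fvisibility=hidden, marking only exported symbols as "default") is the kind of setup being discussed; how much a given GCC release actually exploits the hint depends on that release.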
I know it is debatable and I could be convinced otherwise, but I would suggest:
#ifdef __cplusplus
extern "C" {
#endif
...
#ifdef __cplusplus
} /* extern "C" */
#endif
be applied liberally in gcc.
Not "around" #includes, it is the job of each .h file, and mindful of #ifdefs
(ie: cor
Jay K writes:
> Any folks that get to see the mangled names, debugging, working on
> binutils, whatever, are saved from them.
Demangling is easy and readily available.
I don't see any reason to add extern "C" indiscriminately at this time.
If there are concrete problems, then those are prob
> > Fortunately linker plugin solves the problem here and this is why I want to
> > have it by default. GCC then can do effectively -fwhole-program for
> > binaries
> > (since linker knows what will be bound elsewhere) and take advantage of
> > visibility((hidden)) hints for shared libraries same
On Mon, Nov 15, 2010 at 5:39 PM, Jan Hubicka wrote:
>> > Fortunately linker plugin solves the problem here and this is why I want to
>> > have it by default. GCC then can do effectively -fwhole-program for
>> > binaries
>> > (since linker knows what will be bound elsewhere) and take advantage of
> On Mon, Nov 15, 2010 at 5:39 PM, Jan Hubicka wrote:
> >> > Fortunately linker plugin solves the problem here and this is why I want
> >> > to
> >> > have it by default. GCC then can do effectively -fwhole-program for
> >> > binaries
> >> > (since linker knows what will be bound elsewhere) and
More performance data:
-O2 -funroll-all-loops vs O2: +1.1% geomean
           O2     O2 unroll-all-loops   change
164.gzip   1324   1336                  0.94%
175.vpr    1694   1670                  -1.44%
On 11/15/10 16:07, Richard Guenther wrote:
If the address of the auto isn't taken, then why is the object in memory to
begin with (with the obvious exception for aggregates)?
Exactly sort of my point. If people pass the address of &x to an asm
and modify &x + 8 expecting the "adjacent" stack loc
target.h and function.h include tm.h, and they have data structure dependencies
that are painful to untangle. function.h requires x_rtl to be split into
a target-type tainted part and one that can be used in tree optimizers
/ frontends
(via the inline functions in emit-rtl.c). To get rid of t