Re: [RFA] optimizing predictable branches on x86

2008-03-03, Jan Hubicka
> > >>>I hope so too. For the kernel we have some parts where
> >>>__builtin_expect is used quite a lot and noticeably helps, and could
> >>>help even more if we cut down the use of cmov too. I guess on
> >>>architectures with even more predicated instructions it could be
> >>>even more useful too

Re: [RFA] optimizing predictable branches on x86

2008-03-03, Paolo Bonzini
I hope so too. For the kernel we have some parts where __builtin_expect is used quite a lot and noticeably helps, and could help even more if we cut down the use of cmov too. I guess on architectures with even more predicated instructions it could be even more useful too. Looking at kernel's __
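For context on the kernel usage mentioned here: Linux wraps `__builtin_expect` in `likely()`/`unlikely()` macros. The sketch below defines equivalent macros locally rather than copying the kernel headers; `parse_digit` and `sum_digits` are hypothetical example functions. The hint only tells GCC which outcome is expected; whether it then emits a jump or a cmov depends on the compiler version and flags, which is what this thread is debating.

```c
#include <stddef.h>

/* Local equivalents of kernel-style hint macros. The !! turns any nonzero
 * value into 1 so it can be compared with the expected value. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical helper: converts an ASCII digit to its value, or returns -1.
 * The error path is marked as rarely taken. */
int parse_digit(char c)
{
    if (unlikely(c < '0' || c > '9'))
        return -1;
    return c - '0';
}

/* Sums the digits in a NUL-terminated string, skipping other characters.
 * The digit case is hinted as the common one. */
int sum_digits(const char *s)
{
    int total = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        int d = parse_digit(s[i]);
        if (likely(d >= 0))
            total += d;
    }
    return total;
}
```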

Re: [RFA] optimizing predictable branches on x86

2008-03-03, Nick Piggin
On Tuesday 04 March 2008 00:01, Jan Hubicka wrote:
> > On Monday 03 March 2008 22:38, Jan Hubicka wrote:
> > I hope so too. For the kernel we have some parts where
> > __builtin_expect is used quite a lot and noticeably helps, and could
> > help even more if we cut down the use of cmov too. I guess

Re: [RFA] optimizing predictable branches on x86

2008-03-03, Jan Hubicka
> On Monday 03 March 2008 22:38, Jan Hubicka wrote:
> > Hi,
> > I had to tweak the testcase a bit to not compute minimum: GCC optimizes
> > this early into MIN_EXPR throwing away any profile information. If we
> > get serious here we can maintain it via histogram, but I am not sure it
> > is worth

Re: [RFA] optimizing predictable branches on x86

2008-03-03, Nick Piggin
On Monday 03 March 2008 22:38, Jan Hubicka wrote:
> Hi,
> I had to tweak the testcase a bit to not compute minimum: GCC optimizes
> this early into MIN_EXPR throwing away any profile information. If we
> get serious here we can maintain it via histogram, but I am not sure it
> is worth the effort

Re: [RFA] optimizing predictable branches on x86

2008-03-03, Jan Hubicka
> > But I can also hide the cfun->function_frequency trick in
> > DEFAULT_BRANCH_COST macro if it seems to help. (in longer term I hope
> > they will all go away as expansion needs to be aware of hotness info
> > anyway)
>
> Well, it definitely helps. I originally hoped there will be fewer places

Re: [RFA] optimizing predictable branches on x86

2008-03-03, Jan Hubicka
> But I can also hide the cfun->function_frequency trick in
> DEFAULT_BRANCH_COST macro if it seems to help. (in longer term I hope
> they will all go away as expansion needs to be aware of hotness info
> anyway)
Well, it definitely helps. I originally hoped there will be fewer places querying BRA

Re: [RFA] optimizing predictable branches on x86

2008-03-03, Jan Hubicka
> > >/* High branch cost, expand as the bitwise OR of the conditions.
> > Do the same if the RHS has side effects, because we're effectively
> > turning a TRUTH_OR_EXPR into a TRUTH_ORIF_EXPR. */
> >! if (BRANCH_COST (!optimize_size, false)>= 4
> >! || TREE_SIDE_EFFEC

Re: [RFA] optimizing predictable branches on x86

2008-03-03, Paolo Bonzini
/* High branch cost, expand as the bitwise OR of the conditions.
Do the same if the RHS has side effects, because we're effectively
turning a TRUTH_OR_EXPR into a TRUTH_ORIF_EXPR. */
! if (BRANCH_COST (!optimize_size, false)>= 4
! || TREE_SIDE_EFFECTS (TR
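The quoted comment relies on the difference between GCC's TRUTH_ORIF_EXPR (short-circuit, like C's `||`) and TRUTH_OR_EXPR (both operands evaluated). The following source-level analogy is plain C illustrating the semantics, not GCC's expander code; the side-effecting `rhs()` and its call counter are hypothetical. It shows why a right-hand side with side effects cannot simply be expanded as a jump sequence: the jump form would skip the call whenever the left operand is already true.

```c
/* Counts how many times the right-hand operand has been evaluated. */
int rhs_calls = 0;

/* Hypothetical operand with a side effect. */
static int rhs(int v)
{
    rhs_calls++;
    return v;
}

/* Analogue of TRUTH_ORIF_EXPR: || short-circuits, so rhs() is skipped
 * whenever a is nonzero. This is the form naturally expanded with jumps. */
int or_short_circuit(int a, int b)
{
    return a || rhs(b);
}

/* Analogue of TRUTH_OR_EXPR: both operands are always evaluated and the
 * results combined with a bitwise OR, so no branch is required. */
int or_bitwise(int a, int b)
{
    return (a != 0) | (rhs(b) != 0);
}
```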

[RFA] optimizing predictable branches on x86

2008-03-03, Jan Hubicka
Hi,
I had to tweak the testcase a bit to not compute minimum: GCC optimizes this early into MIN_EXPR throwing away any profile information. If we get serious here we can maintain it via histogram, but I am not sure it is worth the effort at least until IL is sanitized and expansion cleaned up with
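The MIN_EXPR remark can be illustrated at the C level. This is a minimal sketch, assuming a testcase that selects between values; `pick_min` and `pick_other` are hypothetical functions, not the testcase from the thread. GCC's folder recognizes the first pattern as a minimum, while the second selects values other than the compared operands, so it is not a minimum and is not folded that way.

```c
/* A plain minimum. GCC folds this pattern into MIN_EXPR early, so any
 * profile information about the comparison no longer applies. */
int pick_min(int a, int b)
{
    return a < b ? a : b;
}

/* Hypothetical "tweaked" variant: the selected values x and y are not the
 * compared operands, so this remains an ordinary conditional whose
 * expansion (jump or cmov) can still take branch probabilities into account. */
int pick_other(int a, int b, int x, int y)
{
    return a < b ? x : y;
}
```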

Re: optimizing predictable branches on x86

2008-03-02, Nick Piggin
On Wednesday 27 February 2008 03:06, J.C. Pizarro wrote:
> Compiling and executing the code of Nick Piggin at
> http://gcc.gnu.org/ml/gcc/2008-02/msg00601.html
>
> on my old Athlon64 Venice 3200+ 2.0 GHz,
> 3 GiB DDR400, 32-bit kernel, gcc 3.4.6, I got
>
> $ gcc -O3 -falign-functions=64 -falign-loo

Re: optimizing predictable branches on x86

2008-02-27, Jan Hubicka
> > At least on x86 it should also be a good idea to know which way
> > the branch is going to go, because it doesn't have explicit branch
> > hints, you really want to be able to optimize the cold branch
> > predictor case if converting from cmov to conditional branches.
>
> x86 as of Pentium 4 d

Re: optimizing predictable branches on x86

2008-02-27, Kenny Simpson
> At least on x86 it should also be a good idea to know which way
> the branch is going to go, because it doesn't have explicit branch
> hints, you really want to be able to optimize the cold branch
> predictor case if converting from cmov to conditional branches.
x86 as of Pentium 4 does have bra

Re: optimizing predictable branches on x86

2008-02-26, J.C. Pizarro
On Tuesday 26 February 2008 21:14, Jan Hubicka wrote:
> Only cases we do so quite reliably IMO are:
> 1) loop branches that are not interesting for cmov conversion
> 2) branches leading to noreturn calls, also not interesting
> 3) builtin_expect mentioned.
> 4) when profile feedback is arou
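Cases 1 and 2 from Jan's list can be shown in a few lines of C. This is a sketch with hypothetical functions (`die`, `checked_div`, `sum_below`). GCC's static predictor treats a path that ends in a `noreturn` call as unlikely, and treats a loop back-edge as taken on most iterations, so neither branch needs an explicit `__builtin_expect`.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical fatal-error helper. noreturn tells GCC that control never
 * comes back, and paths leading to such calls are predicted as unlikely. */
static __attribute__((noreturn)) void die(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    exit(EXIT_FAILURE);
}

/* Integer division that aborts on a zero divisor. The branch leading to
 * die() is statically predicted not taken, without any explicit hint. */
int checked_div(int num, int den)
{
    if (den == 0)
        die("division by zero");
    return num / den;
}

/* Sum of 0..n-1. The loop back-edge is predicted taken, which is correct
 * on every iteration except the last. */
long sum_below(int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += i;
    return total;
}
```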

Re: optimizing predictable branches on x86

2008-02-26, J.C. Pizarro
Here is a final summary of what performs well on the tested machines:
+ unpredictable:
    * don't use conditional jmp (the worst).
    * use cmov or C version.
+ predictable:
    + no deps: * use cmov or C version.
    + has deps: * do
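For the unpredictable case, the summary recommends cmov or a "C version". The snippet does not show what that C version was. One common branchless idiom builds an all-ones or all-zeros mask from the comparison and selects with bitwise operations, so the source contains no branch that could be mispredicted. This is a sketch under that assumption; `select_branchless` and `clamp_all` are hypothetical names.

```c
#include <stdint.h>

/* Returns a if cond is nonzero, otherwise b. mask is all ones when cond
 * holds and zero otherwise, so the selection needs no conditional jump. */
uint32_t select_branchless(int cond, uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (a & mask) | (b & ~mask);
}

/* Hypothetical use: clamp each element to a ceiling. For random input
 * this data-dependent select is hard for a branch predictor. */
void clamp_all(uint32_t *v, int n, uint32_t ceiling)
{
    for (int i = 0; i < n; i++)
        v[i] = select_branchless(v[i] > ceiling, ceiling, v[i]);
}
```

A compiler may still turn either the masked form or a plain `?:` into a cmov or a branch. Only the generated assembly (for example from `gcc -O2 -S`) shows what was actually emitted.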

Re: optimizing predictable branches on x86

2008-02-26, J.C. Pizarro
On 2008/2/26, J.C. Pizarro <[EMAIL PROTECTED]>, I wrote:
> 4. C > cmov >> jmp when it's unpredictable and has not data dependencies.
Sorry for my typo; the correct line is (without the "not")
4. C > cmov >> jmp when it's unpredictable and has data dependencies.
and my forgotten 3rd annotatio

Re: optimizing predictable branches on x86

2008-02-26, J.C. Pizarro
Compiling and executing the code of Nick Piggin at
http://gcc.gnu.org/ml/gcc/2008-02/msg00601.html
on my old Athlon64 Venice 3200+ 2.0 GHz,
3 GiB DDR400, 32-bit kernel, gcc 3.4.6, I got
$ gcc -O3 -falign-functions=64 -falign-loops=64 -falign-jumps=64 -falign-labels=64 -march=i686 foo.c -o foo
$ .

Re: optimizing predictable branches on x86

2008-02-26, Nick Piggin
On Tuesday 26 February 2008 21:14, Jan Hubicka wrote:
> Hi,
>
> > Core2 follows a similar pattern, although it's not seeing any
> > slowdown in the "no deps, predictable, jmp" case like K8 does.
> >
> > Any comments? (please cc me) Should gcc be using conditional jumps
> > more often eg. in the cas

Re: optimizing predictable branches on x86

2008-02-26, Jan Hubicka
> Hi,
> > Core2 follows a similar pattern, although it's not seeing any
> > slowdown in the "no deps, predictable, jmp" case like K8 does.
> >
> > Any comments? (please cc me) Should gcc be using conditional jumps
> > more often eg. in the case of __builtin_expect())?
>
> The problem is that in g

Re: optimizing predictable branches on x86

2008-02-26, Jan Hubicka
Hi,
> Core2 follows a similar pattern, although it's not seeing any
> slowdown in the "no deps, predictable, jmp" case like K8 does.
>
> Any comments? (please cc me) Should gcc be using conditional jumps
> more often eg. in the case of __builtin_expect())?
The problem is that in general GCC's bra

optimizing predictable branches on x86

2008-02-25, Nick Piggin
Hi list,
gcc-4.3 appears to make quite heavy use of cmov to eliminate conditional branches on x86(-64) architecture, even for those branches that are determined to be predictable. The problem with this is that the data dependency introduced by the cmov can restrict execution, whereas a predicted b
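The data-dependency argument can be sketched with a loop whose select feeds the next iteration. The loop below is hypothetical and is not the benchmark from the thread. If the select is compiled to cmov, each new `x` has to wait for the comparison and for both candidate values. If it is compiled to a well-predicted conditional jump, the CPU can speculate down the hot path and start the next iteration without waiting for the comparison result. Whether GCC emits cmov or a jump here depends on the version and options, so check the output of `gcc -O2 -S` rather than assuming either.

```c
/* Loop-carried dependency through a select whose condition is rarely true.
 * x advances by 3 on the hot path, so it only occasionally lands on a
 * multiple of 1024 and takes the cold path. */
unsigned long chain(unsigned long x, unsigned long n)
{
    for (unsigned long i = 0; i < n; i++) {
        if (__builtin_expect(x % 1024 == 0, 0))
            x = x / 2 + 7;   /* cold path, taken rarely */
        else
            x = x + 3;       /* hot path */
    }
    return x;
}
```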