RE: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-21 Thread Zamyatin, Igor
> I noticed in prologue/epilogue, GCC prefers to use MOVs followed by a > SP adjustment instead of a sequence of pushes/pops. The preference to > the MOVs are good for old CPU micro-architectures (before pentium-4, > K10), because it breaks the dat

RE: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-21 Thread Zamyatin, Igor
Ahmad has helped doing some atom performance testing (ChromeOS benchmarks) with this patch. In summary, there is no statistically significant regression seen. There is one improvement of about +1.9% (v8 benchmark

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-20 Thread Xinliang David Li
Ahmad has helped with some Atom performance testing (ChromeOS benchmarks) with this patch. In summary, no statistically significant regression is seen. There is one improvement of about +1.9% (v8 benchmark) which looks real. David On Wed, Dec 12, 2012 at 9:24 AM, Xinliang David Li wrote:

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-20 Thread H.J. Lu
On Thu, Dec 20, 2012 at 7:06 AM, Jan Hubicka wrote: >> > Hi Areg, >> > >> > Did you mean inlined memcpy/memset are as fast as >> > the ones in libc.so on both ia32 and Intel64? >> >> I would be interested in output of the stringop script. > > Also as far as I can remember, none of spec2k6 benchmar

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-20 Thread Jan Hubicka
> > Hi Areg, > > > > Did you mean inlined memcpy/memset are as fast as > > the ones in libc.so on both ia32 and Intel64? > > I would be interested in output of the stringop script. Also as far as I can remember, none of spec2k6 benchmarks is really stringop bound. On Spec2k GCC was quite bound

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-20 Thread Jan Hubicka
> Hi Areg, > > Did you mean inlined memcpy/memset are as fast as > the ones in libc.so on both ia32 and Intel64? I would be interested in output of the stringop script. > > Please keep in mind that memcpy/memset in libc.a > may not be optimized. You must not use -static for > linking. In my se

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-20 Thread H.J. Lu
> On Thu, Dec 13, 2012 at 12:40 PM, Jan Hubicka wrote: >>> > Here we speak about memcpy/memset only. I never got around to >

RE: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-20 Thread Melik-adamyan, Areg
On Thu, Dec 13, 2012 at 12:40 PM, Jan Hubicka wrote: >> > Here we speak about memcpy/memset only. I never got around to >> > modernize strlen and friends, unfortunately... >> > >> > memcmp and friends are differ

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-13 Thread Jan Hubicka
> > me libc starts to be win only for rather large blocks (i.e. >8KB) > > > > Which glibc are you using? 2.15 as it comes with opensuse 12.2 Honza > > -- > H.J.

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-13 Thread H.J. Lu
On Thu, Dec 13, 2012 at 12:40 PM, Jan Hubicka wrote: >> > Here we speak about memcpy/memset only. I never got around to modernize >> > strlen and friends, unfortunately... >> > >> > memcmp and friends are different beasts. They really need some TLC... >> >> memcpy and memset in glibc are also extr

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-13 Thread Jan Hubicka
> > Here we speak about memcpy/memset only. I never got around to modernize > > strlen and friends, unfortunately... > > > > memcmp and friends are different beasts. They really need some TLC... > > memcpy and memset in glibc are also extremely fast. The default strategy now is to inline only whe

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-13 Thread H.J. Lu
On Thu, Dec 13, 2012 at 12:26 PM, Jan Hubicka wrote: >> On Wed, Dec 12, 2012 at 10:21 PM, Jakub Jelinek wrote: >> > On Wed, Dec 12, 2012 at 10:09:14PM -0800, Xinliang David Li wrote: >> >> On Wed, Dec 12, 2012 at 5:19 PM, Jan Hubicka wrote: >> >> >> > libcall is not faster up to 8KB to rep seque

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-13 Thread Jan Hubicka
> On Wed, Dec 12, 2012 at 10:21 PM, Jakub Jelinek wrote: > > On Wed, Dec 12, 2012 at 10:09:14PM -0800, Xinliang David Li wrote: > >> On Wed, Dec 12, 2012 at 5:19 PM, Jan Hubicka wrote: > >> >> > libcall is not faster up to 8KB to rep sequence that is better for > >> >> > regalloc/code > >> >> >

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-13 Thread H.J. Lu
On Wed, Dec 12, 2012 at 10:21 PM, Jakub Jelinek wrote: > On Wed, Dec 12, 2012 at 10:09:14PM -0800, Xinliang David Li wrote: >> On Wed, Dec 12, 2012 at 5:19 PM, Jan Hubicka wrote: >> >> > libcall is not faster up to 8KB to rep sequence that is better for >> >> > regalloc/code >> >> > cache than f

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-13 Thread Jan Hubicka
> Try the following one. 1) -minline-all-stringops > -mstringop-strategy=rep_8byte -O2 vs 2) -mstringop-strategy=libcall > -O2. > > David > > > #include > #include > #include > #ifndef LEN > #define LEN 16 > #endif > > void copy(char* s1, char* s2,int len) __attribute__((noinline)); > void c

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-13 Thread Richard Biener
On Thu, Dec 13, 2012 at 7:21 AM, Jakub Jelinek wrote: > On Wed, Dec 12, 2012 at 10:09:14PM -0800, Xinliang David Li wrote: >> On Wed, Dec 12, 2012 at 5:19 PM, Jan Hubicka wrote: >> >> > libcall is not faster up to 8KB to rep sequence that is better for >> >> > regalloc/code >> >> > cache than fu

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Xinliang David Li
Try the following one. 1) -minline-all-stringops -mstringop-strategy=rep_8byte -O2 vs 2) -mstringop-strategy=libcall -O2. David #include #include #include #ifndef LEN #define LEN 16 #endif void copy(char* s1, char* s2,int len) __attribute__((noinline)); void copy(char* s1, char* s2,int len)
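The test program above is cut off in this archive listing, so the following is only a minimal sketch of what such a comparison could look like, not the code that was actually posted. The memcpy body of copy(), the assumption that s1 is the destination, and the time_copies() helper are all hypothetical; only the copy() prototype, the LEN default, and the two flag sets come from the message.

```c
#include <string.h>
#include <time.h>

#ifndef LEN
#define LEN 16
#endif

/* noinline keeps the memcpy expansion inside copy(), so the string-op
   strategy chosen on the command line decides how it is emitted.
   Assumption: s1 is the destination and s2 the source. */
void copy(char *s1, char *s2, int len) __attribute__((noinline));
void copy(char *s1, char *s2, int len)
{
    memcpy(s1, s2, (size_t)len);
}

/* Hypothetical helper: run `iters` copies of `len` bytes and return the
   CPU time spent, in seconds, as reported by clock(). */
double time_copies(char *s1, char *s2, int len, long iters)
{
    clock_t start = clock();
    for (long i = 0; i < iters; i++)
        copy(s1, s2, len);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To compare the two strategies, one would build the same file twice, once with -minline-all-stringops -mstringop-strategy=rep_8byte -O2 and once with -mstringop-strategy=libcall -O2, vary -DLEN=..., and call time_copies() from a small main() with a large iteration count. Linking dynamically matters here, since a later message in the thread warns that memcpy/memset in libc.a may not be optimized.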

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Jakub Jelinek
On Wed, Dec 12, 2012 at 10:09:14PM -0800, Xinliang David Li wrote: > On Wed, Dec 12, 2012 at 5:19 PM, Jan Hubicka wrote: > >> > libcall is not faster up to 8KB to rep sequence that is better for > >> > regalloc/code > >> > cache than fully blown function call. > >> > >> Be careful with this. My

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Xinliang David Li
On Wed, Dec 12, 2012 at 5:19 PM, Jan Hubicka wrote: >> > libcall is not faster up to 8KB to rep sequence that is better for >> > regalloc/code >> > cache than fully blown function call. >> >> Be careful with this. My recollection is that REP sequence is good for >> any size -- for smaller size,

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Jan Hubicka
> > libcall is not faster up to 8KB to rep sequence that is better for > > regalloc/code > > cache than fully blown function call. > > Be careful with this. My recollection is that REP sequence is good for > any size -- for smaller size, the REP initial set up cost is too high > (10s of cycles),

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Xinliang David Li
On Wed, Dec 12, 2012 at 4:16 PM, Xinliang David Li wrote: > On Wed, Dec 12, 2012 at 10:30 AM, Jan Hubicka wrote: >> Concerning 1 push per cycle, I think it is same as K7 hardware did, so move >> prologue should be a win. >>> > Index: config/i386/i386.c >>> > ===

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Xinliang David Li
On Wed, Dec 12, 2012 at 10:30 AM, Jan Hubicka wrote: > Concerning 1 push per cycle, I think it is same as K7 hardware did, so move > prologue should be a win. >> > Index: config/i386/i386.c >> > === >> > --- config/i386/i386.c (revisi

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Andi Kleen
Andi Kleen writes: > >>> >/* X86_TUNE_FOUR_JUMP_LIMIT: Some CPU cores are not able to predict >>> > more >>> > than 4 branch instructions in the 16 byte window. */ >>> > - m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC, >>> > + m_PPRO | m_P4_NOCONA | m_ATOM |

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Jan Hubicka
> Jan Hubicka writes: > > > > libcall is not faster up to 8KB to rep sequence that is better for > > regalloc/code > > cache than fully blown function call. > > I noticed btw that some of the generated string instructions are slower > than just calling the C library. > > rep scasb etc. is rar

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Andi Kleen
Jan Hubicka writes: > > libcall is not faster up to 8KB to rep sequence that is better for > regalloc/code > cache than fully blown function call. I noticed btw that some of the generated string instructions are slower than just calling the C library. rep scasb etc. is rarely a win over an op

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Jan Hubicka
Concerning 1 push per cycle, I think it is the same as what K7 hardware did, so move prologue should be a win. > > Index: config/i386/i386.c > > === > > --- config/i386/i386.c (revision 194452) > > +++ config/i386/i386.c (working copy) > > @@

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Xinliang David Li
Honza, can you explain each change and point to the reference? thanks, David On Wed, Dec 12, 2012 at 8:37 AM, Jan Hubicka wrote: >> I noticed in prologue/epilogue, GCC prefers to use MOVs followed by a >> SP adjustment instead of a sequence of pushes/pops. The preference to >> the MOVs are good

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Xinliang David Li
On Wed, Dec 12, 2012 at 8:37 AM, Jan Hubicka wrote: >> I noticed in prologue/epilogue, GCC prefers to use MOVs followed by a >> SP adjustment instead of a sequence of pushes/pops. The preference to >> the MOVs are good for old CPU micro-architectures (before pentium-4, >> K10), because it breaks t

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Jan Hubicka
> I noticed in prologue/epilogue, GCC prefers to use MOVs followed by a > SP adjustment instead of a sequence of pushes/pops. The preference to > the MOVs are good for old CPU micro-architectures (before pentium-4, > K10), because it breaks the data dependency. In modern > micro-architecture, push

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Richard Biener
On Tue, Dec 11, 2012 at 11:53 PM, Xinliang David Li wrote: > The following the O2 size data from SPEC2k. Note that with push/pop, > it is a always a net win (negative delta) in terms of total binary or > total loadable section size. Thanks for the data! Richard. > thanks, > > David > >

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-11 Thread Xinliang David Li
Some SPEC2k performance numbers (with 3 runs on core2): push wins over move on 3 benchmarks. The others are noise. perlbmk: ~+1.9% gap: ~+1.4% vortex: ~+0.7% David On Tue, Dec 11, 2012 at 2:53 PM, Xinliang David Li wrote: > The following the O2 size data from SPEC2k. Note that with pus

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-11 Thread Xinliang David Li
The following is the O2 size data from SPEC2k. Note that with push/pop, it is always a net win (negative delta) in terms of total binary or total loadable section size. thanks, David

             .text    .eh_frame  Total_binary
vortex-move  440252   40796      584066
vortex-push  415436   57452      5759

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-11 Thread Xinliang David Li
On Tue, Dec 11, 2012 at 1:49 AM, Richard Biener wrote: > On Mon, Dec 10, 2012 at 10:07 PM, Mike Stump wrote: >> On Dec 10, 2012, at 12:42 PM, Xinliang David Li wrote: >>> I have not measured the CFI size impact -- but conceivably it should >>> be larger -- which is unfortunate. >> >> Code speed

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-11 Thread Richard Biener
On Mon, Dec 10, 2012 at 10:07 PM, Mike Stump wrote: > On Dec 10, 2012, at 12:42 PM, Xinliang David Li wrote: >> I have not measured the CFI size impact -- but conceivably it should >> be larger -- which is unfortunate. > > Code speed and size are preferable to optimizing dwarf size… :-) I'd let

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-10 Thread Mike Stump
On Dec 10, 2012, at 12:42 PM, Xinliang David Li wrote: > I have not measured the CFI size impact -- but conceivably it should > be larger -- which is unfortunate. Code speed and size are preferable to optimizing dwarf size… :-) I'd let dwarf 5 fix it!

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-10 Thread Xinliang David Li
I have not measured the CFI size impact -- but conceivably it should be larger -- which is unfortunate. David On Mon, Dec 10, 2012 at 1:23 AM, Richard Biener wrote: > On Sun, Dec 9, 2012 at 2:50 PM, Uros Bizjak wrote: >> Hello! >> >>> I noticed in prologue/epilogue, GCC prefers to use MOVs foll

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-10 Thread Richard Biener
On Sun, Dec 9, 2012 at 2:50 PM, Uros Bizjak wrote: > Hello! > >> I noticed in prologue/epilogue, GCC prefers to use MOVs followed by a >> SP adjustment instead of a sequence of pushes/pops. The preference to >> the MOVs are good for old CPU micro-architectures (before pentium-4, >> K10), because i

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-09 Thread Дмитрий Дьяченко
s/Eanble/Enable/ Thanks, Dmitry 2012/12/9 Uros Bizjak : > Hello! > >> I noticed in prologue/epilogue, GCC prefers to use MOVs followed by a >> SP adjustment instead of a sequence of pushes/pops. The preference to >> the MOVs are good for old CPU micro-architectures (before pentium-4, >> K10), be

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-09 Thread Uros Bizjak
Hello! > I noticed in prologue/epilogue, GCC prefers to use MOVs followed by a > SP adjustment instead of a sequence of pushes/pops. The preference to > the MOVs are good for old CPU micro-architectures (before pentium-4, > K10), because it breaks the data dependency. In modern > micro-architectu

[PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-08 Thread Xinliang David Li
I noticed that in the prologue/epilogue, GCC prefers to use MOVs followed by an SP adjustment instead of a sequence of pushes/pops. The preference for MOVs is good for old CPU micro-architectures (before Pentium 4, K10), because it breaks the data dependency. In modern micro-architectures, push/pop is im
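To make the two prologue styles concrete, here is a small sketch of a function that needs several callee-saved registers, with the two save sequences shown schematically in a comment. The function, the callback, and the assembly listing are illustrative assumptions rather than output from the patch; which form a given GCC emits depends on its version and -mtune setting and has not been verified here.

```c
/* Illustrative x86-64 prologues (AT&T syntax), hand-written for comparison:
 *
 *   move style                    push style
 *     subq  $24, %rsp               pushq %rbx
 *     movq  %rbx, 0(%rsp)           pushq %r12
 *     movq  %r12, 8(%rsp)           pushq %r13
 *     movq  %r13, 16(%rsp)
 *
 * Each push updates %rsp, so consecutive pushes form a chain through the
 * stack pointer; the stores in the move style all use the same %rsp value.
 */

/* Sample callback, defined only so the example is self-contained. */
long triple_plus_one(long x)
{
    return x * 3 + 1;
}

/* f is called through a pointer, so the compiler must assume it clobbers
   every caller-saved register. b, c, d, f and sum are live across calls
   and therefore typically end up in callee-saved registers, which the
   prologue has to save. */
long accumulate(long (*f)(long), long a, long b, long c, long d)
{
    long sum = f(a);
    sum += f(b);
    sum += f(c);
    sum += f(d);
    return sum;
}
```

Compiling this with gcc -O2 -S and different -mtune= values, then reading the start of accumulate in the generated assembly, is one way to see which save sequence a particular compiler picks.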