On May 20, 2010, at 8:04 AM, Steven Bosscher wrote:

> On Mon, May 17, 2010 at 8:44 AM, Maxim Kuvyrkov <ma...@codesourcery.com>
> wrote:
>> CodeSourcery is working on improving performance for Intel's Core 2 and Core
>> i7 families of processors.
>>
>> CodeSourcery plans to add support for unaligned vector instructions, to
>> provide fine-tuned scheduling support and to update instruction selection
>> and instruction cost models for Core i7 and Core 2 families of processors.
>>
>> As usual, CodeSourcery will be contributing its work to GCC. Currently, our
>> target is the end of GCC 4.6 Stage1.
>>
>> If your favorite benchmark significantly under-performs on Core 2 or Core i7
>> CPUs, don't hesitate asking us to take a look at it.
>
> I'd like to ask you to look at ffmpeg (missed core2 vectorization
> opportunities), polyhedron (PR34501, like, duh! :-), and Apache
> benchmark (-mtune=core2 results in lower scores).
>
> You could check overall effects on an openly available benchmark suite
> such as http://www.phoronix-test-suite.com/
>
> Good luck with this project, it'll be great when -mtune=core2 actually
> improves performance rather than degrading it!
>
> Ciao!
> Steven

ffmpeg builds with -fno-tree-vectorize - there was some miscompilation
with it on PPC and the maintainer is too shy to file compiler bugs
about it - and that probably won't change. But it's still worth looking
at, since it might improve other programs.

Some numbers decoding H264 on Core i5 x86-64:

asm on:                                                             8.78s
asm off (./configure --disable-asm):                               15.61s
asm off + -ftree-vectorize -ftree-slp-vectorize -fstrict-aliasing: 14.84s

So there's a lot of room there. I haven't investigated, but I guess some
useful missing features are small-vector vectorization using MMX (ffmpeg
uses it everywhere) and scalar write-combining
(http://x264dev.multimedia.cx/?p=32). And better scheduling/shorter code
in general.
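
To make the write-combining point a bit more concrete, here is a rough sketch of what I mean by it: two adjacent narrow stores that could be emitted as one wider store. This is only an illustration, not code from ffmpeg or x264. The `mv_pair` type and all function names are made up, and the packing order assumes a little-endian target such as x86-64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical pair of adjacent 16-bit fields (illustrative only). */
typedef struct {
    int16_t x;
    int16_t y;
} mv_pair;

/* Straightforward form: two separate 16-bit stores. */
static void store_separate(mv_pair *dst, int16_t x, int16_t y)
{
    dst->x = x;
    dst->y = y;
}

/* Hand-combined form: pack both halves into one 32-bit value and
 * store it once. Assumes little-endian layout (x in the low half).
 * memcpy keeps the store valid under -fstrict-aliasing. */
static void store_combined(mv_pair *dst, int16_t x, int16_t y)
{
    uint32_t packed = (uint32_t)(uint16_t)x | ((uint32_t)(uint16_t)y << 16);

    assert(sizeof(mv_pair) == sizeof packed); /* no padding expected */
    memcpy(dst, &packed, sizeof packed);
}

/* Returns 1 when both forms leave identical bytes in memory. */
static int stores_match(int16_t x, int16_t y)
{
    mv_pair a, b;

    store_separate(&a, x, y);
    store_combined(&b, x, y);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

Ideally the compiler would turn the first form into the second by itself, so nobody has to write the packing by hand.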