On May 20, 2010, at 8:04 AM, Steven Bosscher wrote:

> On Mon, May 17, 2010 at 8:44 AM, Maxim Kuvyrkov <ma...@codesourcery.com> 
> wrote:
>> CodeSourcery is working on improving performance for Intel's Core 2 and Core
>> i7 families of processors.
>> 
>> CodeSourcery plans to add support for unaligned vector instructions, to
>> provide fine-tuned scheduling support and to update instruction selection
>> and instruction cost models for Core i7 and Core 2 families of processors.
>> 
>> As usual, CodeSourcery will be contributing its work to GCC.  Currently, our
>> target is the end of GCC 4.6 Stage1.
>> 
>> If your favorite benchmark significantly under-performs on Core 2 or Core i7
>> CPUs, don't hesitate to ask us to take a look at it.
> 
> I'd like to ask you to look at ffmpeg (missed core2 vectorization
> opportunities), polyhedron (PR34501, like, duh! :-), and Apache
> benchmark (-mtune=core2 results in lower scores).
> 
> You could check overall effects on an openly available benchmark suite
> such as http://www.phoronix-test-suite.com/
> 
> Good luck with this project, it'll be great when -mtune=core2 actually
> improves performance rather than degrading it!
> 
> Ciao!
> Steven

ffmpeg builds with -fno-tree-vectorize: there was some miscompilation with it 
on PPC, the maintainer is too shy to file compiler bugs about it, and that 
probably won't change. But it's still worth looking at, since improvements there 
might help other programs too.

Some numbers for decoding H.264 on a Core i5 (x86-64):
  asm on: 8.78s
  asm off (./configure --disable-asm): 15.61s
  asm off + -ftree-vectorize -ftree-slp-vectorize -fstrict-aliasing: 14.84s

So there's a lot of room for improvement there.
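To put those timings in perspective, here is a trivial C sketch of the arithmetic. The function and macro names are made up for illustration; the only data are the three times quoted above.

```c
#include <stdio.h>

/* H.264 decode times (seconds) from the measurements above. */
#define T_ASM    8.78   /* asm on */
#define T_C     15.61   /* ./configure --disable-asm */
#define T_C_VEC 14.84   /* --disable-asm + -ftree-vectorize -ftree-slp-vectorize -fstrict-aliasing */

/* How many times slower a build is than the hand-written asm build. */
double slowdown_vs_asm(double t) { return t / T_ASM; }

/* Fraction of the plain-C vs. asm gap that the vectorizer flags close. */
double gap_closed(void) { return (T_C - T_C_VEC) / (T_C - T_ASM); }

/* Hypothetical helper: print the comparison. */
void print_summary(void) {
    printf("plain C:        %.2fx slower than asm\n", slowdown_vs_asm(T_C));
    printf("C + vectorizer: %.2fx slower than asm\n", slowdown_vs_asm(T_C_VEC));
    printf("gap closed by vectorizer flags: %.0f%%\n", 100.0 * gap_closed());
}
```

With these numbers the plain C build is about 1.78x slower than asm, the vectorized build about 1.69x, so the current auto-vectorizer closes only about 11% of the gap.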

I haven't investigated, but I guess some useful missing features are 
small-vector vectorization using MMX (ffmpeg uses it everywhere) and scalar 
write-combining (http://x264dev.multimedia.cx/?p=32), along with better 
scheduling and shorter code in general.
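A rough C sketch of the two code patterns mentioned above, as I understand them. This is an illustration only, not ffmpeg, x264 or GCC code, and all function names are hypothetical. The first function is an 8-pixel loop that fits exactly in one 64-bit MMX register, so a small-vector vectorizer could ideally turn it into a single packed-average instruction (e.g. pavgb). The second pair shows scalar write-combining: four adjacent byte stores of the same value merged by hand into one 32-bit store.

```c
#include <stdint.h>
#include <string.h>

/* Small-vector candidate: rounded average of two 8-pixel rows.
   8 x uint8_t = 64 bits, i.e. one MMX register's worth of data. */
void avg_pixels8(uint8_t *dst, const uint8_t *a, const uint8_t *b) {
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1); /* int math, no overflow */
}

/* Write-combining candidate: four separate narrow stores... */
void fill4_narrow(uint8_t *p, uint8_t v) {
    p[0] = v; p[1] = v; p[2] = v; p[3] = v;
}

/* ...and the same effect done by hand as one 32-bit store.
   Every byte of `word` equals v, so the result is endian-neutral;
   memcpy avoids aliasing and alignment problems. */
void fill4_combined(uint8_t *p, uint8_t v) {
    uint32_t word = v * 0x01010101u; /* replicate v into all 4 bytes */
    memcpy(p, &word, sizeof word);
}
```

The point is that a compiler could, in principle, do these transformations automatically instead of relying on hand-written asm or manual tricks like fill4_combined.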
