On Fri, 2014-05-30 at 07:21 +0900, Charles Plessy wrote:
> For a lot of scientific packages, -O3 is chosen by the upstream
> author, and I always feel bad that if we make the programs slower
> by overriding it to -O2, it will reflect poorly on Debian as a
> distribution for scientific works.

In particular, -O3 turns on auto-vectorisation. It can provide a big speed up to programs that can take advantage of it - and yes, many scientific programs fall into that category. Big as in 300% [0]. So you are correct in saying that not turning it on will make Debian look slow compared to a system that takes advantage of it.

Unfortunately the instructions needed to get the speed up vary by CPU. Not only is AMD different to Intel, Intel turns them on and off depending on the intended market. This breaks Debian's "one binary rules them all" model unless the upstream has gone to extraordinary lengths, as in providing multiple compiled versions of the same code path and choosing the best one at run time based on CPU model. Projects that do that generally use hand crafted assembler, usually inlined in C code. Note that this means they will run fast without -O3.

As others have pointed out, -O3 turns on optimisations that help on some architectures and hinder on others. Vectorisation sort of falls into that category, except that "hinder" becomes "fail with a SIGILL". That doesn't normally happen because of another fail safe: even with -O3, gcc only generates instructions the target CPU can execute [1]. Debian tells gcc to generate code for a generic CPU.

Bottom line: the vectorisation provided by -O3 can give big speed ups to some scientific programs, but it is ineffective on Debian because Debian by necessity tells gcc to compile for a lowest common denominator CPU, which doesn't have the necessary instructions.

[0] http://felix.abecassis.me/2012/08/sse-vectorizing-conditional-code/
[1] See the -march option of gcc. In particular, -march=native.
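
To make the "multiple compiled versions, chosen at run time" idea above concrete, here is a minimal sketch in C. It is not what any particular upstream does: instead of hand crafted assembler it leans on gcc's target attribute and the __builtin_cpu_supports() builtin, and it assumes gcc (or a compatible compiler) on x86. The names add_generic, add_avx2, add_arrays and HAVE_X86_DISPATCH are made up for the illustration, and AVX2 is just one example of an instruction set a baseline build cannot assume.

```c
#include <stddef.h>

/* Assumption: only gcc-compatible compilers on x86 get the dispatch path. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#endif

/* Portable version: built for the baseline CPU, so it runs everywhere. */
static void add_generic(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

#ifdef HAVE_X86_DISPATCH
/* Same loop, but the compiler is allowed to use AVX2 in this one function. */
__attribute__((target("avx2")))
static void add_avx2(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
#endif

/* Entry point: ask the CPU what it supports and pick a version. */
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) {
        add_avx2(dst, a, b, n);
        return;
    }
#endif
    add_generic(dst, a, b, n);
}
```

Callers only ever see add_arrays(), so the binary stays safe on a baseline CPU while newer CPUs take the faster path. Whether the compiler actually vectorises the AVX2 copy still depends on the optimisation level it is built with.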