Bruno Haible wrote:
> Btw, how do you need to write code such that gcc uses the SSE3 instructions?

You mean auto-vectorization, as opposed to explicitly using the *mmintrin.h intrinsics headers (pmmintrin.h for SSE3) or the __builtin_foo APIs? I think you need to specify a -march= that names an architecture that has SSE3 (or just -msse3, but that should be implied by an appropriate -march=), as well as -ftree-vectorize. I think that -ftree-vectorize is enabled at -O3, but I'm not positive.

Two other notes. Starting with 4.2, the gcc default -mtune= is 'generic' (instead of the old default of pentiumpro), which is meant to be a blended tuning appropriate for a wide class of today's most common architectures: Athlon, Opteron, Pentium M, Pentium 4, and Core 2. Thus with gcc >= 4.2 you would expect to see less difference between [no -mtune= specified] and [-mtune=athlon specified] than with older versions.

Also, gcc >= 4.2 offers -march=native and -mtune=native, which set the arch and tune respectively to whatever is appropriate for the host machine, based on cpuid.

Brian
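
For illustration, here is a minimal sketch of the kind of loop the auto-vectorizer can usually handle. The function name scale_add and the file name vec.c are made up for this example. Which instructions gcc actually emits depends on the compiler version and flags, and a loop this simple may well use only SSE/SSE2 instructions even when SSE3 is enabled.

```c
#include <stddef.h>

/* Hypothetical example: y[i] += a * x[i] for i in [0, n).
 * 'restrict' (C99) promises that x and y do not overlap, which spares
 * the vectorizer a runtime aliasing check. The loop body is independent
 * across iterations, so several floats can be processed per instruction. */
void scale_add(float *restrict y, const float *restrict x, float a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Assuming the file is saved as vec.c, something like

    gcc -std=c99 -O3 -msse3 -ftree-vectorize -S vec.c

writes the assembly to vec.s, where packed instructions such as mulps and addps would indicate that the loop was vectorized.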