"Jakub Jelinek" <ja...@redhat.com> wrote:

> On Fri, May 26, 2023 at 10:59:03AM +0200, Stefan Kanthak wrote:
>> 3) SSE4.1 is supported since Core2, but -march=core2 fails to enable it.
>>    That's bad, REALITY CHECK, please!
>
> You're wrong.
> SSE4.1 first appeared in the 45nm versions of Core2, the 65nm versions
> didn't have it.

That's correct, I failed to see this difference.

> The supported CPU names don't distinguish between core2 submodels,
> so if you have core2 with sse4.1, you should either be using -march=native
> if compiling on such a machine, or use -march=core2 -msse4.1,

This is one of the combinations I didn't test until now; with it (and with
-m32 -msse4.1 too) GCC generates SSE4.1 instructions, but FAILS to optimise:

# Compilation provided by Compiler Explorer at https://godbolt.org/
ispowerof2(unsigned long long):
        movq    xmm1, QWORD PTR [esp+4]
        pcmpeqd xmm0, xmm0
        xor     eax, eax
        paddq   xmm0, xmm1
        pand    xmm0, xmm1              # SUPERFLUOUS!
        punpcklqdq      xmm0, xmm0      # SUPERFLUOUS!
        ptest   xmm0, xmm0              # ptest xmm0, xmm1
        sete    al
        ret

9 instructions in 36 bytes instead of 7 instructions in 26 bytes.

JFTR: the documentation of MOVQ specifies

| when the destination operand is an XMM register, the quadword is
| stored to the low quadword of the register, and the high quadword
| is cleared to all 0s.

> there is no -march={conroe,allendale,wolfdale,merom,penryn,...}.
>
>> 4) If the documentation is right, then the behaviour of GCC is wrong: it
>>    doesn't allow to use SSE4.1 without SSE4.2!
>
> If you aren't able to read the documentation, it is hard to argue.

When the documentation is wrong or incomplete it's hard to trust it!

| -m32
...
| The -m32 option sets int, long, and pointer types to 32 bits, and
| generates code that runs on any i386 system.
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

OUCH: as shown in https://godbolt.org/z/b43cjGdY9, -m32 ALONE generates
      SSE2 instructions which DON'T run on ANY i386 system!

OOPS: as shown above, -m32 -msse4.1 (or another -msse*) also generates
      code that does NOT run on ANY i386 system!

Where is the precedence of the different -m* options for the CPU type
documented?
Where is their influence on each other documented?

| -march=cpu-type
...
| Specifying -march=cpu-type implies -mtune=cpu-type, except where noted
| otherwise.
...
| -mtune=cpu-type
...
| the compiler does not generate any code that cannot run on the default
| machine type unless you use a -march=cpu-type option.

Why is the "default machine type" not mentioned/specified with -march=?

Stefan
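
PS: the C source behind the ispowerof2() listing is not reproduced in this
mail. The following is only a minimal sketch of what it presumably looks
like, assuming the usual x & (x - 1) idiom, which matches the
pcmpeqd/paddq/pand sequence in the listing (all-ones is -1, so the paddq
computes x - 1). The int return type and the selfcheck helper are
assumptions made for illustration; the listing has no test for zero, so
this sketch also reports 0 as a power of 2.

```c
#include <assert.h>

/* Assumed source for the listing: true when x has at most one bit set.
 * Like the generated code, it does not special-case x == 0. */
int ispowerof2(unsigned long long x)
{
    /* x - 1 flips the lowest set bit and all bits below it, so the
     * AND is zero exactly when no higher bit was set. */
    return (x & (x - 1)) == 0;
}

/* Hypothetical helper: a few spot checks of the idiom. */
void ispowerof2_selfcheck(void)
{
    assert(ispowerof2(1ULL));
    assert(ispowerof2(1ULL << 40));
    assert(!ispowerof2(12ULL));
    assert(ispowerof2(0ULL));   /* zero passes, as in the listing */
}
```

With such a source, the 7-instruction variant would fold the AND into
PTEST, which sets ZF from the bitwise AND of its two operands; that is why
the separate pand and the punpcklqdq are marked SUPERFLUOUS above.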