"Jonathan Wakely" <jwakely....@gmail.com> wrote: > On Fri, 26 May 2023 at 12:29, Stefan Kanthak <stefan.kant...@nexgo.de> wrote: >> >> "Jakub Jelinek" <ja...@redhat.com> wrote: >> >> > On Fri, May 26, 2023 at 10:59:03AM +0200, Stefan Kanthak wrote: >> >> 3) SSE4.1 is supported since Core2, but -march=core2 fails to enable it. >> >> That's bad, REALITY CHECK, please! >> > >> > You're wrong. >> > SSE4.1 first appeared in the 45nm versions of Core2, the 65nm versions >> > didn't have it. >> >> That's correct, I failed to see this difference. > > REALITY CHECK please!
Dumbass check please!

>> > The supported CPU names don't distinguish between core2 submodels,
>> > so if you have core2 with sse4.1, you should either be using -march=native
>> > if compiling on such a machine, or use -march=core2 -msse4.1,
>>
>> This is one of the combinations I didn't test until now; with it (and with
>> -m32 -msse4.1 too) GCC generates SSE4.1 instructions, but FAILS to optimise:
>>
>> # Compilation provided by Compiler Explorer at https://godbolt.org/
>> ispowerof2(unsigned long long):
>>         movq       xmm1, QWORD PTR [esp+4]
>>         pcmpeqd    xmm0, xmm0
>>         xor        eax, eax
>>         paddq      xmm0, xmm1
>>         pand       xmm0, xmm1       # SUPERFLUOUS!
>>         punpcklqdq xmm0, xmm0       # SUPERFLUOUS!
>>         ptest      xmm0, xmm0       # ptest xmm0, xmm1
>>         sete       al
>>         ret
>>
>> 9 instructions in 36 bytes instead of 7 instructions in 26 bytes.

No comment here?

>> JFTR: the documentation of MOVQ specifies
>>
>> | when the destination operand is an XMM register, the quadword is
>> | stored to the low quadword of the register, and the high quadword
>> | is cleared to all 0s.
>>
>> > there is no -march={conroe,allendale,wolfdale,merom,penryn,...}.
>> >
>> >> 4) If the documentation is right, then the behaviour of GCC is wrong: it
>> >>    doesn't allow to use SSE4.1 without SSE4.2!
>> >
>> > If you aren't able to read the documentation, it is hard to argue.
>>
>> When the documentation is wrong or incomplete it's hard to trust it!
>
> Just like when you make incorrect statements and assume everybody else is
> wrong.

Do I assume that? Or did you just make this up?

> The documentation isn't perfect, but you should not just ignore it and
> assume you know better in all cases.
>
>> | -m32
>> ...
>> | The -m32 option sets int, long, and pointer types to 32 bits, and
>> | generates code that runs on any i386 system.
>>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> OUCH: as shown in https://godbolt.org/z/b43cjGdY9 -m32 ALONE but
>> generates SSE2 instructions which DON'T run on ANY i386 system!
>
> That's https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109954

I posted this here some years ago; see for example
<https://skanthak.homepage.t-online.de/gcc.html#case27>
Ignorance is bliss?!

>> OOPS: as shown above, -m32 -msse4.1 (or another -msse*) also generates
>> code that does NOT run on ANY i386 system!
>>
>> Where is the precedence of the different -m* options for the CPU type
>> documented?
>> Where is their influence on each other documented?
>
> -march enables the instructions listed for the relevant cpu family,
> then using -mxxx or -mno-xxx adds or removes particular instruction
> sets from the ones enabled by -march.

ADD THIS TO THE DOCUMENTATION!

> If you give an option twice, e.g. -march=core2 -march=nehalem, then
> the second one wins. If you use -msse2 -mno-sse2 then the second one
> wins.

ARGH: not repetitions of ONE particular option or its negation, stupid!

> You can check this using e.g.
>
> gcc -Q --help=target -march=core2 -msse2
>
>> | -march=cpu-type
>> ...
>> | Specifying -march=cpu-type implies -mtune=cpu-type, except where noted
>> | otherwise.
>> ...
>> | -mtune=cpu-type
>> ...
>> | the compiler does not generate any code that cannot run on the default
>> | machine type unless you use a -march=cpu-type option.
>>
>> Why is the "default machine type" not mentioned/specified with -march=?
>
> Using -march overrides it. The default is set during configure.

And exactly this is missing in the documentation for -march=!
Guess why I cited the documentation for -mtune= where it is mentioned?

> Adding -v to the compilation will show what -march option is used by cc1 by
> default.

Not reliable unless documented elsewhere!

Stefan
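
A minimal sketch, not taken from the thread, of one way to cross-check
which instruction-set extensions a given combination of -march= and -m*
options really enables, in addition to the gcc -Q --help=target query
quoted above. It relies on the macros __SSE2__, __SSE4_1__ and __SSE4_2__
that GCC predefines when the respective extension is enabled; the helper
names have_sse2, have_sse4_1, have_sse4_2 and isa_summary are made up for
this illustration.

```c
#include <stdio.h>

/* Each helper returns 1 if the compiler predefines the corresponding
   macro for the options in effect, and 0 otherwise.
   The function names are hypothetical, chosen for this sketch only. */
static int have_sse2(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

static int have_sse4_1(void)
{
#ifdef __SSE4_1__
    return 1;
#else
    return 0;
#endif
}

static int have_sse4_2(void)
{
#ifdef __SSE4_2__
    return 1;
#else
    return 0;
#endif
}

/* Writes a one-line summary such as "SSE2=. SSE4.1=. SSE4.2=." into buf,
   with each dot replaced by 0 or 1, and returns snprintf's result. */
static int isa_summary(char *buf, size_t size)
{
    return snprintf(buf, size, "SSE2=%d SSE4.1=%d SSE4.2=%d",
                    have_sse2(), have_sse4_1(), have_sse4_2());
}
```

Calling isa_summary from a small main and printing the buffer, once built
with -m32 -march=core2 and once with -m32 -march=core2 -msse4.1, should
make the effect of the additional -msse4.1 visible in the SSE4.1 field.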