"Jonathan Wakely" <jwakely....@gmail.com> wrote: > On Fri, 26 May 2023 at 09:00, Stefan Kanthak <stefan.kant...@nexgo.de> wrote: >> >> "Jonathan Wakely" <jwakely....@gmail.com> wrote: >> >> > On Fri, 26 May 2023, 08:01 Andrew Pinski via Gcc, <gcc@gcc.gnu.org> wrote: >> > >> >> On Thu, May 25, 2023 at 11:56?PM Stefan Kanthak <stefan.kant...@nexgo.de> >> >> wrote: >> >>> >> >>> Hi, >> >>> >> >>> compile the following function on a system with Core2 processor >> >>> (released January 2008) for the 32-bit execution environment: >> >>> >> >>> --- demo.c --- >> >>> int ispowerof2(unsigned long long argument) >> >>> { >> >>> return (argument & argument - 1) == 0; >> >>> } >> >>> --- EOF --- >> >>> >> >>> GCC 13.3: gcc -m32 -O3 demo.c >> >>> >> >>> NOTE: -mtune=native is the default! >> >> >> >> You need to use -march=native and not -mtune=native .... to turn on >> >> the architecture features. >> >> (Un)fortunately this changes nothing! >> >> STOP: that's wrong, it makes it even WORSE! >> >> # Compilation provided by Compiler Explorer at https://godbolt.org/ >> ispowerof2(unsigned long long): >> vmovq xmm1, QWORD PTR [esp+4] >> vpcmpeqd xmm0, xmm0, xmm0 >> xor eax, eax >> vpaddq xmm0, xmm1, xmm0 >> vpand xmm0, xmm0, xmm1 >> vpunpcklqdq xmm0, xmm0, xmm0 >> vptest xmm0, xmm0 >> sete al >> ret >> >> That's what I call a REALLY EPIC FAILURE! >> >> Compare this unefficient BLOAT to the SSE4.1 code from my original post! >> >> > Yes this is just user error. You didn't use the right options to say you >> > want SSE2. >> >> ARGH: please read CAREFULLY what I wrote! > > You wrote "Now add the -mtune=core2 option to EXPLICITLY enable the > NATIVE SSE4.1 > alias "Penryn New Instruction Set" of the Core2 processor" which is > wrong, that's not what -mtune does. > > Read the docs CAREFULLY: https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html

3) SSE4.1 is supported since Core2, but -march=core2 fails to enable it.
   That's bad, REALITY CHECK, please!

4) If the documentation is right, then the behaviour of GCC is wrong:
   it doesn't allow using SSE4.1 without SSE4.2!

5) Compile the function with -march=nehalem (which according to the
   documentation enables support for BOTH SSE4.1 and SSE4.2) and notice
   that GCC fails to use SSE4.1!

>> 1) I didn't tell GCC to use SSE at all (I DON'T want any compiler to use
>> SSE per default, especially when the generated code is SLOWER and BIGGER
>> than conventional code using the general purpose registers)!
>>
>> 2) GCC uses SSE2 on its own, but doesn't support it well: it FAILS to use
>> PMOVMSKB here, despite -O3!
>
> So report a bug to bugzilla, not via an email to the wrong list.
>
>>
>> 3) -march=core2 doesn't help too, GCC fails to use SSE4.1 at all!
>
> core2 doesn't enable SSE4.1, as clearly shown in the docs:
> https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
>
> If you send emails full of confused mistakes, don't be surprised if
> the replies aren't what you want.
>
> If you think GCC is generating bad code, file a bug. But make sure
> you're actually using the right options to enable the right
> instruction sets before complaining about the instructions used.

See above: GCC fails to use SSE4.1, despite -march=nehalem.
And (if the documentation is right, then) GCC fails to support SSE4.1
without SSE4.2.

Stefan
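
Since much of this thread turns on which instruction sets a given
-march/-mtune combination really enables, here is a minimal sketch for
checking that at compile time. It assumes only the __SSE2__, __SSE4_1__
and __SSE4_2__ macros that GCC predefines when the corresponding
extension is enabled; the helper names (have_sse2 and so on) are made up
for illustration and are not part of any GCC interface.

```c
#include <stdio.h>

/* Hypothetical helpers: each returns 1 if the compiler was allowed to
   emit the named extension for this translation unit, else 0.
   They test macros predefined by the compiler, not the CPU at run time. */
int have_sse2(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

int have_sse4_1(void)
{
#ifdef __SSE4_1__
    return 1;
#else
    return 0;
#endif
}

int have_sse4_2(void)
{
#ifdef __SSE4_2__
    return 1;
#else
    return 0;
#endif
}

/* Print the result, one line per extension. */
void report_sse_levels(void)
{
    printf("SSE2:   %d\n", have_sse2());
    printf("SSE4.1: %d\n", have_sse4_1());
    printf("SSE4.2: %d\n", have_sse4_2());
}
```

Call report_sse_levels() from main() and build the file once per option
set under discussion (for example with -m32 -O3 plus -mtune=core2,
-march=core2 or -march=nehalem) to compare what each one enables. The
same macros can be listed without any source file via
"gcc -m32 -march=core2 -dM -E - < /dev/null".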