"Jonathan Wakely" <jwakely....@gmail.com> wrote:

> On Fri, 26 May 2023 at 09:00, Stefan Kanthak <stefan.kant...@nexgo.de> wrote:
>>
>> "Jonathan Wakely" <jwakely....@gmail.com> wrote:
>>
>> > On Fri, 26 May 2023, 08:01 Andrew Pinski via Gcc, <gcc@gcc.gnu.org> wrote:
>> >
>> >> On Thu, May 25, 2023 at 11:56 PM Stefan Kanthak <stefan.kant...@nexgo.de>
>> >> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> compile the following function on a system with Core2 processor
>> >>> (released January 2008) for the 32-bit execution environment:
>> >>>
>> >>> --- demo.c ---
>> >>> int ispowerof2(unsigned long long argument)
>> >>> {
>> >>>     return (argument & argument - 1) == 0;
>> >>> }
>> >>> --- EOF ---
>> >>>
>> >>> GCC 13.3: gcc -m32 -O3 demo.c
>> >>>
>> >>> NOTE: -mtune=native is the default!
>> >>
>> >> You need to use -march=native and not -mtune=native .... to turn on
>> >> the architecture features.
>>
>> (Un)fortunately this changes nothing!
>>
>> STOP: that's wrong, it makes it even WORSE!
>>
>> # Compilation provided by Compiler Explorer at https://godbolt.org/
>> ispowerof2(unsigned long long):
>>         vmovq   xmm1, QWORD PTR [esp+4]
>>         vpcmpeqd        xmm0, xmm0, xmm0
>>         xor     eax, eax
>>         vpaddq  xmm0, xmm1, xmm0
>>         vpand   xmm0, xmm0, xmm1
>>         vpunpcklqdq     xmm0, xmm0, xmm0
>>         vptest  xmm0, xmm0
>>         sete    al
>>         ret
>>
>> That's what I call a REALLY EPIC FAILURE!
>>
>> Compare this inefficient BLOAT to the SSE4.1 code from my original post!
>>
>> > Yes this is just user error. You didn't use the right options to say you
>> > want SSE2.
>>
>> ARGH: please read CAREFULLY what I wrote!
> 
> You wrote "Now add the -mtune=core2 option to EXPLICITLY enable the
> NATIVE SSE4.1 alias "Penryn New Instruction Set" of the Core2 processor",
> which is wrong; that's not what -mtune does.
> 
> Read the docs CAREFULLY: https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html

3) SSE4.1 has been supported since Core2, but -march=core2 fails to enable it.
   That's bad, REALITY CHECK, please!

4) If the documentation is right, then the behaviour of GCC is wrong: it
   doesn't allow using SSE4.1 without SSE4.2!

5) Compile the function with -march=nehalem (which according to the
   documentation enables support for BOTH SSE4.1 and SSE4.2) and notice
   that GCC fails to use SSE4.1!
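
To see which of the two extensions a given -march= value really switches
on, one can query GCC's predefined macros __SSE4_1__ and __SSE4_2__. The
following is only a minimal sketch; the function names have_sse4_1,
have_sse4_2 and report_isa are made up for this illustration and are not
part of GCC. It shows what the compiler was told to assume, not which
instructions it finally selects.

```c
#include <stdio.h>

/* Hypothetical helper: 1 if the compiler predefines __SSE4_1__
   (SSE4.1 enabled by the current options), 0 otherwise. */
int have_sse4_1(void)
{
#ifdef __SSE4_1__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: the same check for __SSE4_2__. */
int have_sse4_2(void)
{
#ifdef __SSE4_2__
    return 1;
#else
    return 0;
#endif
}

/* Prints one line per extension; call it from main() after compiling
   with the -march= value under test. */
void report_isa(void)
{
    printf("SSE4.1: %s\n", have_sse4_1() ? "enabled" : "disabled");
    printf("SSE4.2: %s\n", have_sse4_2() ? "enabled" : "disabled");
}
```

Building this once with -march=core2 and once with -march=nehalem, and
calling report_isa() from main(), shows what each option enables.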

>> 1) I didn't tell GCC to use SSE at all (I DON'T want any compiler to use
>>    SSE per default, especially when the generated code is SLOWER and BIGGER
>>    than conventional code using the general purpose registers)!
>>
>> 2) GCC uses SSE2 on its own, but doesn't support it well: it FAILS to use
>>    PMOVMSKB here, despite -O3!
> 
> So report a bug to bugzilla, not via an email to the wrong list.
> 
>>
>> 3) -march=core2 doesn't help either, GCC fails to use SSE4.1 at all!
> 
> core2 doesn't enable SSE4.1, as clearly shown in the docs:
> https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
> 
> If you send emails full of confused mistakes, don't be surprised if
> the replies aren't what you want.
> 
> If you think GCC is generating bad code, file a bug. But make sure
> you're actually using the right options to enable the right
> instruction sets before complaining about the instructions used.

See above: GCC fails to use SSE4.1, despite -march=nehalem.
And (if the documentation is right, then) GCC fails to support SSE4.1
without SSE4.2.
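
For reference, the "conventional code using the general purpose
registers" from point 1 can be sketched at the C level by splitting the
64-bit argument into two 32-bit halves. This is only an illustration
under my own naming (ispowerof2_halves is hypothetical); it makes no
claim about the code GCC emits for it.

```c
#include <stdint.h>

/* The function from demo.c, unchanged in behaviour
   (note that it also returns 1 for argument == 0). */
int ispowerof2(unsigned long long argument)
{
    return (argument & (argument - 1)) == 0;
}

/* Hypothetical equivalent on two 32-bit halves, as a 32-bit target
   would hold the value in a register pair. */
int ispowerof2_halves(unsigned long long argument)
{
    uint32_t lo = (uint32_t)argument;
    uint32_t hi = (uint32_t)(argument >> 32);

    /* argument - 1: subtract 1 from the low half and propagate the
       borrow into the high half when the low half was 0. */
    uint32_t lo_m1 = lo - 1u;
    uint32_t hi_m1 = hi - (uint32_t)(lo == 0u);

    /* argument & (argument - 1) is zero iff both halves are zero. */
    return ((lo & lo_m1) | (hi & hi_m1)) == 0u;
}
```

Both functions return the same result for every input, including 0 and
values with only the upper half set.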

Stefan
