On Fri, 26 May 2023 at 10:06, Stefan Kanthak <stefan.kant...@nexgo.de> wrote:
>
> "Jonathan Wakely" <jwakely....@gmail.com> wrote:
>
> > On Fri, 26 May 2023 at 09:00, Stefan Kanthak <stefan.kant...@nexgo.de> 
> > wrote:
> >>
> >> "Jonathan Wakely" <jwakely....@gmail.com> wrote:
> >>
> >> > On Fri, 26 May 2023, 08:01 Andrew Pinski via Gcc, <gcc@gcc.gnu.org> 
> >> > wrote:
> >> >
> > >> >> On Thu, May 25, 2023 at 11:56 PM Stefan Kanthak
> >> >> <stefan.kant...@nexgo.de>
> >> >> wrote:
> >> >>>
> >> >>> Hi,
> >> >>>
> >> >>> compile the following function on a system with Core2 processor
> >> >>> (released January 2008) for the 32-bit execution environment:
> >> >>>
> >> >>> --- demo.c ---
> >> >>> int ispowerof2(unsigned long long argument)
> >> >>> {
> >> >>>     return (argument & argument - 1) == 0;
> >> >>> }
> >> >>> --- EOF ---
> >> >>>
> >> >>> GCC 13.3: gcc -m32 -O3 demo.c
> >> >>>
> >> >>> NOTE: -mtune=native is the default!
> >> >>
> >> >> You need to use -march=native and not -mtune=native .... to turn on
> >> >> the architecture features.
> >>
> >> (Un)fortunately this changes nothing!
> >>
> >> STOP: that's wrong, it makes it even WORSE!
> >>
> >> # Compilation provided by Compiler Explorer at https://godbolt.org/
> >> ispowerof2(unsigned long long):
> >>         vmovq   xmm1, QWORD PTR [esp+4]
> >>         vpcmpeqd        xmm0, xmm0, xmm0
> >>         xor     eax, eax
> >>         vpaddq  xmm0, xmm1, xmm0
> >>         vpand   xmm0, xmm0, xmm1
> >>         vpunpcklqdq     xmm0, xmm0, xmm0
> >>         vptest  xmm0, xmm0
> >>         sete    al
> >>         ret
> >>
> >> That's what I call a REALLY EPIC FAILURE!
> >>
> > >> Compare this inefficient BLOAT to the SSE4.1 code from my original post!
> >>
> > >> > Yes, this is just user error. You didn't use the right options to say you
> > >> > want SSE2.
> >>
> >> ARGH: please read CAREFULLY what I wrote!
> >
> > You wrote "Now add the -mtune=core2 option to EXPLICITLY enable the
> > NATIVE SSE4.1
> > alias "Penryn New Instruction Set" of the Core2 processor" which is
> > wrong, that's not what -mtune does.
> >
> > Read the docs CAREFULLY: https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
>
> 3) SSE4.1 is supported since Core2, but -march=core2 fails to enable it.
>    That's bad, REALITY CHECK, please!

Are you sure about that?

My understanding is that Core2 introduced SSSE3 and Penryn introduced
SSE4.1. The list at
https://en.wikipedia.org/wiki/List_of_Intel_Core_2_processors shows a
lot of Core2 processors without SSE4.1, is it wrong?

e.g. Intel Core2 E6400 doesn't support SSE4.1

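Whether a particular machine has SSE4.1 can be checked at run time instead of being inferred from the model name. A minimal sketch, assuming GCC (or Clang) on x86: `cpu_sse_level` is a hypothetical helper name, while `__builtin_cpu_init` and `__builtin_cpu_supports` are GCC's documented x86 built-ins. Going by the list above, a Conroe-based Core2 such as the E6400 would be expected to report SSSE3, and a Penryn-based one SSE4.1.

```c
/* Hypothetical helper: returns the highest SSE level reported by the CPU
   this code is running on, using GCC's x86 CPU-detection built-ins. */
const char *cpu_sse_level(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();   /* harmless if the runtime already initialised it */
    if (__builtin_cpu_supports("sse4.2"))
        return "SSE4.2";
    if (__builtin_cpu_supports("sse4.1"))
        return "SSE4.1";
    if (__builtin_cpu_supports("ssse3"))
        return "SSSE3";
    return "SSE3 or older";
#else
    return "not x86";       /* built-ins above are x86-only */
#endif
}
```

Calling it from main() and printing the string on the machine in question is enough; the result reflects the CPU the binary runs on, not the -march it was compiled with.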

>
> 4) If the documentation is right, then the behaviour of GCC is wrong: it
>    doesn't allow using SSE4.1 without SSE4.2!

It's not "wrong"; it just means GCC has chosen not to add customized
behaviour for the models that only support SSE4.1 and not SSE4.2.
That's not "wrong" unless it's leaving real performance on the floor
for real hardware used by real users.

How common are those models, and is there any significant performance
benefit in adding yet another arch option for those models?
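Individual ISA extensions can also be enabled on top of an -march value, so SSE4.1 without SSE4.2 is available as, for example, `gcc -m32 -O3 -march=core2 -msse4.1 demo.c`. A minimal sketch for checking what a given set of options actually enables, assuming GCC or Clang, which predefine `__SSSE3__`, `__SSE4_1__` and `__SSE4_2__` for the enabled extensions; `compiled_sse_level` is a hypothetical name. -mtune alone does not change these macros, since it only affects tuning, not which instructions may be used.

```c
/* Hypothetical helper: reports the highest SSE level the compiler was
   allowed to use, based on the macros predefined by -march / -m<isa>. */
const char *compiled_sse_level(void)
{
#if defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE4_1__)
    return "SSE4.1";
#elif defined(__SSSE3__)
    return "SSSE3";
#else
    return "below SSSE3";
#endif
}
```

With -march=core2 this returns "SSSE3", with -march=core2 -msse4.1 it returns "SSE4.1", and with -march=nehalem it returns "SSE4.2".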


> 5) Compile the function with -march=nehalem (which according to the
>    documentation enables support for BOTH SSE4.1 and SSE4.2) and notice
>    that GCC fails to use SSE4.1!

If you think the code would perform better with SSE4.1 instructions
and GCC doesn't use them for -march=nehalem, PLEASE FILE A BUG.
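A bug report like that is most useful with a minimal reproducer. A hypothetical sketch of one: the function from the original post next to a hand-written SSE4.1 variant built on the `_mm_testz_si128` intrinsic (which corresponds to PTEST), so the assembly GCC emits for each can be compared, e.g. with `gcc -m32 -O3 -march=nehalem -S repro.c`. The file and function names are illustrative, and the SSE4.1 variant is only compiled when `__SSE4_1__` is defined. Note that the expression is also true for an argument of 0.

```c
/* The function from the original post (`argument & argument - 1` parses
   as below). Note that (x & (x - 1)) == 0 also holds for x == 0. */
int ispowerof2(unsigned long long argument)
{
    return (argument & (argument - 1)) == 0;
}

#if defined(__SSE4_1__)
#include <smmintrin.h>

/* Hypothetical hand-written SSE4.1 variant, only for comparing codegen.
   _mm_testz_si128(a, b) returns 1 when (a & b) is all zero bits. */
int ispowerof2_sse41(unsigned long long argument)
{
    __m128i a = _mm_set_epi64x(0, (long long)argument);
    __m128i b = _mm_set_epi64x(0, (long long)(argument - 1));
    return _mm_testz_si128(a, b);
}
#endif
```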

Stop yelling about it on the mailing list; it just makes you look like
a troll who isn't actually interested in improving anything, just
complaining.

If you think there's something that should be fixed in GCC, file a bug.
File a bug. File a bug.

Did anybody mention yet that you should file a bug?
