On Mon, Aug 17, 2020 at 7:09 PM Stefan Kanthak <stefan.kant...@nexgo.de> wrote:
>
> "Allan Sandfeld Jensen" <li...@carewolf.com> wrote:
>
> > On Freitag, 14. August 2020 18:43:12 CEST Stefan Kanthak wrote:
> >> Hi @ll,
> >>
> >> in his ACM queue article <https://queue.acm.org/detail.cfm?id=3372264>,
> >> Matt Godbolt used the function
> >>
> >> | bool isWhitespace(char c)
> >> | {
> >> |     return c == ' '
> >> |       || c == '\r'
> >> |       || c == '\n'
> >> |       || c == '\t';
> >> | }
> >>
> >> as an example, for which GCC 9.1 emits the following assembly for AMD64
> >> processors (see <https://godbolt.org/z/acm19_conds>):
> >>
> >> |    xor    eax, eax              ; result = false
> >> |    cmp    dil, 32               ; is c > 32
> >> |    ja     .L4                   ; if so, exit with false
> >> |    movabs rax, 4294977024       ; rax = 0x100002600
> >> |    shrx   rax, rax, rdi         ; rax >>= c
> >> |    and    eax, 1                ; result = rax & 1
> >> |.L4:
> >> |    ret
> >>
> > No, it doesn't. As your example shows, if you took the time to read it, it is
> > what gcc emits when generating code to run on a _haswell_ architecture.
>
> Matt's article does NOT specify the architecture for THIS example.
> He specified it for another example he named "(q)":
>
> | When targeting the Haswell microarchitecture, GCC 8.2 compiles this code
> | to the assembly in (q) (https://godbolt.org/z/acm19_bits):
>
> What about CAREFUL reading?
>
> > If you remove -march=haswell from the command line you get:
> >
> >        xor     eax, eax
> >        cmp     dil, 32
> >        ja      .L1
> >        movabs  rax, 4294977024
> >        mov     ecx, edi
> >        shr     rax, cl
> >        and     eax, 1
> >
> > It uses one mov more, but no shrx.
>
> The SHRX is NOT the point here; it's the avoidable conditional branch that
> matters!

Whether the conditional branch sequence is faster depends on whether the
branch is well predicted, which in turn depends very much on the data you
feed to the isWhitespace function.  Since this is the c == ' ' test, I
guess it _will_ be a well-predicted branch, which means the conditional
branch sequence will usually be faster.  The proposed change turns the
control dependence into a data dependence, which constrains instruction
scheduling and retirement.  Admittedly, a mispredicted branch will likely
be more costly.

x86 CPUs do not perform data speculation.
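
For reference, the two shapes under discussion can be sketched in C roughly
as follows.  This is only an illustration of the logic, not what GCC is
required to emit: the compiler is free to pick either sequence for either
function.  The names WS_MASK, is_ws_branch and is_ws_branchless are made up
for this sketch, and unsigned char is assumed so that the range check
matches the unsigned compare in the assembly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit n is set when character code n is whitespace:
   '\t' = 9, '\n' = 10, '\r' = 13, ' ' = 32.
   The value is 4294977024 (0x100002600), the constant in the assembly. */
#define WS_MASK ((1ULL << '\t') | (1ULL << '\n') | (1ULL << '\r') | (1ULL << ' '))

/* Control dependence: the range check guards the shift, so the shift
   only happens when c <= 32. */
bool is_ws_branch(unsigned char c)
{
    if (c > 32)
        return false;
    return (WS_MASK >> c) & 1;
}

/* Data dependence: the shift is always performed (count reduced modulo 64,
   as the x86 shift instruction does) and its result is ANDed with the
   range check, so the result has to wait for both inputs. */
bool is_ws_branchless(unsigned char c)
{
    uint64_t bit = (WS_MASK >> (c & 63)) & 1;
    return bit & (c < 33);
}
```

Both functions return the same result for every input; which one runs
faster depends on how predictable the c > 32 test is for the data at hand.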

Richard.

>          mov     ecx, edi
>          movabs  rax, 4294977024
>          shr     rax, cl
>          xor     edi, edi
>          cmp     ecx, 33
>          setb    dil
>          and     eax, edi
>
> Stefan
