On Mon, Aug 17, 2020 at 7:09 PM Stefan Kanthak <stefan.kant...@nexgo.de> wrote:
>
> "Allan Sandfeld Jensen" <li...@carewolf.com> wrote:
>
> > On Freitag, 14. August 2020 18:43:12 CEST Stefan Kanthak wrote:
> >> Hi @ll,
> >>
> >> in his ACM queue article <https://queue.acm.org/detail.cfm?id=3372264>,
> >> Matt Godbolt used the function
> >>
> >> | bool isWhitespace(char c)
> >> | {
> >> |     return c == ' '
> >> |         || c == '\r'
> >> |         || c == '\n'
> >> |         || c == '\t';
> >> | }
> >>
> >> as an example, for which GCC 9.1 emits the following assembly for AMD64
> >> processors (see <https://godbolt.org/z/acm19_conds>):
> >>
> >> |     xor     eax, eax          ; result = false
> >> |     cmp     dil, 32           ; is c > 32
> >> |     ja      .L4               ; if so, exit with false
> >> |     movabs  rax, 4294977024   ; rax = 0x100002600
> >> |     shrx    rax, rax, rdi     ; rax >>= c
> >> |     and     eax, 1            ; result = rax & 1
> >> | .L4:
> >> |     ret
> >>
> > No it doesn't. As your example shows if you took the time to read it, it is
> > what gcc emits when generating code to run on a _haswell_ architecture.
>
> Matt's article does NOT specify the architecture for THIS example.
> He specified it for another example he named "(q)":
>
> | When targeting the Haswell microarchitecture, GCC 8.2 compiles this code
> | to the assembly in (q) (https://godbolt.org/z/acm19_bits):
>
> What about CAREFUL reading?
>
> > If you remove -march=haswell from the command line you get:
> >
> >     xor     eax, eax
> >     cmp     dil, 32
> >     ja      .L1
> >     movabs  rax, 4294977024
> >     mov     ecx, edi
> >     shr     rax, cl
> >     and     eax, 1
> >
> > It uses one mov more, but no shrx.
>
> The SHRX is NOT the point here; it's the avoidable conditional branch that
> matters!

Whether or not the conditional branch sequence is faster depends on
whether the branch is well predicted, which in turn depends very much
on the data you feed the isWhitespace function with. But since this is
the c == ' ' test, I guess it _will_ be a well-predicted branch, which
means the conditional branch sequence will usually be faster. The
proposed change turns the control dependence into a data dependence,
which constrains instruction scheduling and retirement. Indeed, a
mispredicted branch will likely be more costly. x86 CPUs do not
perform data speculation.

Richard.

>     mov     ecx, edi
>     movabs  rax, 4294977024
>     shr     rax, cl
>     xor     edi, edi
>     cmp     ecx, 33
>     setb    dil
>     and     eax, edi
>
> Stefan
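
For readers who want to experiment with the two shapes being compared, here is a rough C sketch of both. The function names and the WS_MASK macro are made up for illustration and appear nowhere in the thread. The compiler is free to turn either variant into branchy or branch-free code, so this only shows the control-dependence versus data-dependence structure at the source level, not what GCC will actually emit. The shift count is masked with 63 because shifting a 64-bit value by 64 or more is undefined in C, whereas the x86 shr instruction masks the count implicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 9 ('\t'), 10 ('\n'), 13 ('\r') and 32 (' ') set: 0x100002600. */
#define WS_MASK UINT64_C(4294977024)

/* Reference: the function from the article. */
static bool is_whitespace_ref(char c)
{
    return c == ' ' || c == '\r' || c == '\n' || c == '\t';
}

/* Shape of the GCC output: range check first (control dependence),
 * then the bit test only for c <= 32. */
static bool is_whitespace_branchy(char c)
{
    unsigned char u = (unsigned char)c;
    if (u > 32)
        return false;
    return (WS_MASK >> u) & 1;
}

/* Shape of the proposed sequence: shift unconditionally, then AND the
 * result with the range check (data dependence). */
static bool is_whitespace_branchless(char c)
{
    unsigned char u = (unsigned char)c;
    uint64_t bit = (WS_MASK >> (u & 63)) & 1;   /* count masked as shr does */
    uint64_t in_range = (uint64_t)(u < 33);     /* the cmp/setb pair */
    return (bit & in_range) != 0;
}

/* Exhaustively check that all three variants agree on every char value. */
static void check_variants_agree(void)
{
    for (int i = 0; i < 256; ++i) {
        char c = (char)(unsigned char)i;
        bool expected = is_whitespace_ref(c);
        assert(is_whitespace_branchy(c) == expected);
        assert(is_whitespace_branchless(c) == expected);
    }
}
```

Calling check_variants_agree() confirms the variants are equivalent. Which one is faster has to be measured on representative input, for the reasons given above.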