On 8/14/20 12:43 PM, Stefan Kanthak wrote:
Hi @ll,

in his ACM queue article <https://queue.acm.org/detail.cfm?id=3372264>,
Matt Godbolt used the function

| bool isWhitespace(char c)
| {
|     return c == ' '
|       || c == '\r'
|       || c == '\n'
|       || c == '\t';
| }

as an example, for which GCC 9.1 emits the following assembly for AMD64
processors (see <https://godbolt.org/z/acm19_conds>):

|    xor    eax, eax              ; result = false
|    cmp    dil, 32               ; is c > 32
|    ja     .L4                   ; if so, exit with false
|    movabs rax, 4294977024       ; rax = 0x100002600
|    shrx   rax, rax, rdi         ; rax >>= c
|    and    eax, 1                ; result = rax & 1
|.L4:
|    ret
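For readers who find the assembly opaque, here is a minimal C sketch (not from the article) of what it appears to compute. It does a range check, then tests a single bit of the constant 4294977024. The names isWhitespaceBitTest, kWhitespaceMask and checkAllChars are hypothetical, made up for illustration. The final loop just compares the sketch against the original isWhitespace for every char value.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Reference version, as in the article. */
bool isWhitespace(char c)
{
    return c == ' ' || c == '\r' || c == '\n' || c == '\t';
}

/* Hypothetical name. Bit n is set iff character code n is ' ', '\r',
   '\n' or '\t'; equals 0x100002600 == 4294977024, the movabs immediate. */
static const uint64_t kWhitespaceMask =
    (UINT64_C(1) << ' ') | (UINT64_C(1) << '\r') |
    (UINT64_C(1) << '\n') | (UINT64_C(1) << '\t');

/* Hypothetical C rendering of the GCC 9.1 output above. */
bool isWhitespaceBitTest(char c)
{
    unsigned char u = (unsigned char)c;  /* ja is an unsigned compare */
    if (u > 32)                          /* cmp dil, 32 / ja .L4 */
        return false;                    /* also keeps the shift below defined */
    return (kWhitespaceMask >> u) & 1;   /* shrx rax, rax, rdi / and eax, 1 */
}

/* Exhaustive check against the reference for every char value. */
void checkAllChars(void)
{
    for (int i = CHAR_MIN; i <= CHAR_MAX; ++i)
        assert(isWhitespaceBitTest((char)i) == isWhitespace((char)i));
}
```

A 64-bit mask is needed because ' ' is code 32, one bit past what a 32-bit register can hold.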

This code is, however, not optimal!

What evidence do you have that your alternative sequence performs better? Have you benchmarked it? (I tried, but your code doesn't assemble.)

It uses more instructions, and the CPU cannot speculate past the setnz. (As I understand it, x86_64 speculates past branch instructions but doesn't speculate cmov; so, perversely, branches are faster!)

The following equivalent and branchless code works on i386 too,
it needs neither an AMD64 processor nor the SHRX instruction,
which is not available on older processors:


      mov    ecx, edi
      mov    eax, 2600h            ; eax = (1 << '\r') | (1 << '\n') | (1 << '\t')
      test   cl, cl
      setnz  al                    ; eax |= (c != '\0')
      shr    eax, cl               ; eax >>= (c % ' ')

^^ operand type mismatch on this instruction

      xor    edx, edx
      cmp    ecx, 33               ; CF = c <= ' '
      adc    edx, edx              ; edx = (c <= ' ')
      and    eax, edx
      ret
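One way to check the claimed equivalence (correctness only, not speed) is to model the sequence in C one instruction at a time and compare it exhaustively against isWhitespace. This is a sketch under stated assumptions. It assumes c arrives in edi sign-extended to 32 bits, and it does not model the i386 stack-based calling convention. The names isWhitespaceBranchless and checkAllChars are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Reference version, as in the article. */
bool isWhitespace(char c)
{
    return c == ' ' || c == '\r' || c == '\n' || c == '\t';
}

/* Hypothetical instruction-by-instruction model of the proposed sequence.
   Assumption: c arrives in edi sign-extended to 32 bits. */
bool isWhitespaceBranchless(char c)
{
    uint32_t ecx = (uint32_t)(int32_t)c;  /* mov ecx, edi */
    uint32_t eax = 0x2600;                /* mov eax, 2600h: bits 13, 10, 9 */
    eax |= (ecx & 0xFFu) != 0;            /* test cl, cl / setnz al (al was 0) */
    eax >>= ecx & 31u;                    /* shr eax, cl: count masked to 5 bits */
    uint32_t edx = ecx < 33u;             /* xor edx, edx / cmp ecx, 33 / adc edx, edx */
    return (eax & edx) != 0;              /* and eax, edx */
}

/* Exhaustive check against the reference for every char value. */
void checkAllChars(void)
{
    for (int i = CHAR_MIN; i <= CHAR_MAX; ++i)
        assert(isWhitespaceBranchless((char)i) == isWhitespace((char)i));
}
```

Bit 0 stands in for ' ' (code 32), because the shift count wraps to 0 at 32. The final cmp/adc is what rejects other codes whose low five bits happen to match, such as '@' (64). The model only speaks to correctness. Whether the sequence is actually faster still has to be measured.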


regards
Stefan Kanthak



--
Nathan Sidwell
