On 2023-05-26 14:46, Stefan Kanthak wrote:
OOPS: why does GCC (ab)use the SSE2 alias "Willamette New Instruction Set"
(... ...)
OUCH: why does it FAIL to REALLY use SSE2, as shown in the comments on the
       right side?

Please stop yelling like that. It makes you look like a naughty pupil.


14 instructions in 33 bytes    # 11 instructions in 32 bytes

OUCH: why does GCC abuse EBX (and ECX too) and performs a superfluous
       memory write?

Apart from the SSE question: you are performing 64-bit arithmetic on a 32-bit machine, which GCC isn't good at. The preferred way to check whether a 64-bit integer is a power of two is to convert it to a float, then examine whether its 23-bit mantissa is all zeros:

Like yours, this also mistakes zero for a 'power of two', which it isn't.
   ```
   sub   esp, 0x0C                  ; 83 EC 0C
   fild  qword ptr [esp + 0x10]     ; DF 6C 24 10
   xor   eax, eax                   ; 33 C0
   fstp  dword ptr [esp]            ; D9 1C 24
   shl   dword ptr [esp], 9         ; C1 24 24 09
   setz  al                         ; 0F 94 C0
   add   esp, 0x0C                  ; 83 C4 0C
   ret                              ; C3
   ```
That's 8 instructions and 23 bytes in total.

In 64-bit mode, 64-bit integers can be converted to floats directly:
   ```
   cvtsi2ss  xmm0, qword ptr [rsp + 0x08]   ; F3 48 0F 2A 44 24 08
   xor       eax, eax                       ; 33 C0
   movd      ecx, xmm0                      ; 66 0F 7E C1
   shl       ecx, 9                         ; C1 E1 09
   setz      al                             ; 0F 94 C0
   ret                                      ; C3
   ```
That's 6 instructions and 20 bytes in total.
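
For readers who prefer C, the check performed by the two listings above can be sketched roughly as follows. This is only an illustration, not compiler output: the function names `is_pow2_via_float` and `is_pow2_exact` are hypothetical, the sketch assumes IEEE 754 binary32 `float` with the default round-to-nearest mode, and `is_pow2_exact` is the usual bit trick, added here purely as a baseline for comparison.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical name. Mirrors the listings: convert the 64-bit value as a
 * signed integer to float, then test whether the 23 mantissa bits are zero.
 * Assumes IEEE 754 binary32 float and round-to-nearest. */
static int is_pow2_via_float(uint64_t x)
{
    /* fild / cvtsi2ss treat the operand as signed. The cast to int64_t is
     * implementation-defined for x >= 2^63 (GCC wraps). */
    float f = (float)(int64_t)x;
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);      /* same role as movd ecx, xmm0 */

    /* Shifting left by 9 discards the sign bit and the 8 exponent bits,
     * leaving only the mantissa, like shl ecx, 9 followed by setz. */
    return (uint32_t)(bits << 9) == 0;
}

/* Baseline for comparison (not from the listings): the standard bit trick,
 * which rejects zero and involves no rounding. */
static int is_pow2_exact(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}
```

Two caveats follow from this sketch. Zero is reported as a power of two, as already noted. In addition, a float holds only 24 significant bits, so the conversion rounds larger values: 16777217 (2^24 + 1), for example, becomes 16777216.0f and is reported as a power of two although it is not. The float-based check is therefore only exact for inputs below 2^24 or inputs otherwise known to fit in 24 significant bits.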

GCC has its own limitations, so if you want aggressive optimization like this, you must do it yourself.


--
Best regards,
LIU Hao
