https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59429

--- Comment #16 from Mathias Stearn <redbeard0531 at gmail dot com> ---
Trunk still generates different code for all cases (in some cases subtly so)
for both aarch64 and x86_64: https://www.godbolt.org/z/1s6sxrMWq. For both
arches, it seems like LE and LG generate the best code.
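
The sources for the six variants are only available through the godbolt link,
so what follows is a hypothetical reconstruction, not the actual test file. It
assumes the two letters name the order of the tests (L: a < b, E: a == b,
G: a > b) and that the third outcome is the fallthrough. The comGL and comGE
listings later in this comment are consistent with that reading.

```cpp
// Hypothetical reconstruction of the six comparison variants.
// Assumption: the suffix gives the order in which the conditions are tested.
int comLE(int a, int b) { if (a < b) return -1; if (a == b) return 0; return 1; }
int comLG(int a, int b) { if (a < b) return -1; if (a > b) return 1; return 0; }
int comEL(int a, int b) { if (a == b) return 0; if (a < b) return -1; return 1; }
int comEG(int a, int b) { if (a == b) return 0; if (a > b) return 1; return -1; }
int comGL(int a, int b) { if (a > b) return 1; if (a < b) return -1; return 0; }
int comGE(int a, int b) { if (a > b) return 1; if (a == b) return 0; return -1; }
```

All six return the same value for every input, so ideally they would all
compile to the same code.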

On aarch64, they probably all have the same throughput, but EL and EG have a
size penalty with an extra instruction.

On x86_64, there is much more variety. EL and EG both end up with a branch
rather than being branchless, which is probably bad since comparison functions
are often called where branches on the result are unpredictable. GE and GL
appear to have regressed since this ticket was created. They now do the
comparison twice rather than reusing the flags from the first comparison:

comGL(int, int):
        xor     eax, eax
        cmp     edi, esi
        mov     edx, 1
        setl    al
        neg     eax
        cmp     edi, esi
        cmovg   eax, edx
        ret
comGE(int, int):
        xor     eax, eax
        cmp     edi, esi
        mov     edx, 1
        setne   al
        neg     eax
        cmp     edi, esi
        cmovg   eax, edx
        ret
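
For reference, the same three-way result can also be written without any
explicit control flow. This is a minimal sketch of a common idiom, not one of
the functions from the godbolt link; the name comBranchless is made up here,
and this comment does not measure how GCC compiles it relative to the variants
above.

```cpp
// Common branch-free idiom for a three-way int comparison.
// Each relational expression yields 0 or 1, so the difference is -1, 0 or 1.
// comBranchless is a hypothetical name used only for illustration.
int comBranchless(int a, int b) {
    return (a > b) - (a < b);
}
```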
