How to efficiently unpack 8 bytes from a 64-bit integer?

Phil Ruffwind Thu, 18 Feb 2016 22:25:09 -0800

Hello all,

I am trying to analyze the optimized output of the following code.  The
intent is to unpack a 64-bit integer into a struct containing eight
8-bit integers.  The optimized result was very promising at first, but
I then discovered that whenever the unpacking function gets inlined
into another function, the optimization no longer works.


    #include <stdint.h>  /* int8_t, uint64_t */
    #include <string.h>  /* memcpy */

    /* a struct of eight 8-bit integers */
    struct alpha {
        int8_t a;
        int8_t b;
        ...
        int8_t h;
    };

    struct alpha unpack(uint64_t x)
    {
        struct alpha r;
        memcpy(&r, &x, 8);
        return r;
    }

    struct alpha wrapper(uint64_t y)
    {
        return unpack(y);
    }

The code was compiled with gcc 5.3.0 on Linux 4.4.1 with -O3 on x86-64.
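
For anyone who wants to reproduce this, a self-contained version of the snippet might look like the sketch below.  The members c through g are assumed here to stand in for the elided part of the struct, and `pack` is a hypothetical helper (not part of the original code) that is only there to check the round trip.

```c
#include <stdint.h>
#include <string.h>

/* Members c through g are an assumption standing in for the elided ones. */
struct alpha {
    int8_t a, b, c, d, e, f, g, h;
};

/* Copy the 8 bytes of x into the struct; byte order follows the host. */
struct alpha unpack(uint64_t x)
{
    struct alpha r;
    memcpy(&r, &x, sizeof r);
    return r;
}

/* Trivial wrapper; this is the function whose output looks poor. */
struct alpha wrapper(uint64_t y)
{
    return unpack(y);
}

/* Hypothetical helper: copy the struct back into a 64-bit integer. */
uint64_t pack(struct alpha s)
{
    uint64_t x;
    memcpy(&x, &s, sizeof x);
    return x;
}
```

Compiling this with something like `gcc -O3 -S -masm=intel` should produce assembly in the same syntax as the listings in this message, though the exact instructions will depend on the compiler version.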

The `unpack` function optimizes fine.  It produces the following
assembly as expected:

    mov rax, rdi
    ret

Given that `wrapper` is a trivial wrapper around `unpack`, I would
expect the same.  But in reality this is what I got from gcc:

    mov eax, edi
    xor ecx, ecx
    mov esi, edi
    shr ax, 8
    mov cl, dil
    shr esi, 24
    mov ch, al
    mov rax, rdi
    movzx edx, sil
    and eax, 16711680
    and rcx, -16711681
    sal rdx, 24
    movabs rsi, -4278190081
    or rcx, rax
    mov rax, rcx
    movabs rcx, -1095216660481
    and rax, rsi
    or rax, rdx
    movabs rdx, 1095216660480
    and rdx, rdi
    and rax, rcx
    movabs rcx, -280375465082881
    or rax, rdx
    movabs rdx, 280375465082880
    and rdx, rdi
    and rax, rcx
    movabs rcx, -71776119061217281
    or rax, rdx
    movabs rdx, 71776119061217280
    and rdx, rdi
    and rax, rcx
    shr rdi, 56
    or rax, rdx
    sal rdi, 56
    movabs rdx, 72057594037927935
    and rax, rdx
    or rax, rdi
    ret

This seems quite strange.  Somehow the inlining process seems to have
screwed up the potential optimizations.  Is there some way to prevent
this from happening, short of disabling inlining?  Or perhaps there is
a better way to write this code so that gcc would optimize it more
predictably?
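
One thing that could be tried, short of disabling inlining globally, is GCC's per-function `noinline` attribute.  The sketch below is untested against the gcc 5.3.0 output above and only illustrates the idea; the `_noinline` names are hypothetical, and members c through g are again assumed in place of the elided ones.

```c
#include <stdint.h>
#include <string.h>

/* Members c through g are an assumption standing in for the elided ones. */
struct alpha {
    int8_t a, b, c, d, e, f, g, h;
};

/* Ask GCC not to inline this one function; other functions are unaffected. */
__attribute__((noinline))
struct alpha unpack_noinline(uint64_t x)
{
    struct alpha r;
    memcpy(&r, &x, sizeof r);
    return r;
}

/* The wrapper now has to call (or tail-jump to) unpack_noinline. */
struct alpha wrapper_noinline(uint64_t y)
{
    return unpack_noinline(y);
}
```

The trade-off is that every caller pays for a real call instead of getting the body inlined, so this would only help if the call overhead is cheaper than the code gcc generates after inlining.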

I would appreciate any advice, thanks.

Phil
