Hello all, I am trying to analyze the optimized output of the following code. The intent is to unpack a 64-bit integer into a struct containing eight 8-bit integers. The optimized result looked very promising at first, but I then discovered that whenever the unpacking function gets inlined into another function, the optimization no longer works.

    /* a struct of eight 8-bit integers */
    struct alpha {
        int8_t a;
        int8_t b;
        ...
        int8_t h;
    };

    struct alpha unpack(uint64_t x)
    {
        struct alpha r;
        memcpy(&r, &x, 8);
        return r;
    }

    struct alpha wrapper(uint64_t y)
    {
        return unpack(y);
    }

The code was compiled with gcc 5.3.0 on Linux 4.4.1 with -O3 on x86-64.

The `unpack` function optimizes fine. It produces the following assembly, as expected:

    mov rax, rdi
    ret

Given that `wrapper` is a trivial wrapper around `unpack`, I would expect the same output. But this is what I actually got from gcc:

    mov eax, edi
    xor ecx, ecx
    mov esi, edi
    shr ax, 8
    mov cl, dil
    shr esi, 24
    mov ch, al
    mov rax, rdi
    movzx edx, sil
    and eax, 16711680
    and rcx, -16711681
    sal rdx, 24
    movabs rsi, -4278190081
    or rcx, rax
    mov rax, rcx
    movabs rcx, -1095216660481
    and rax, rsi
    or rax, rdx
    movabs rdx, 1095216660480
    and rdx, rdi
    and rax, rcx
    movabs rcx, -280375465082881
    or rax, rdx
    movabs rdx, 280375465082880
    and rdx, rdi
    and rax, rcx
    movabs rcx, -71776119061217281
    or rax, rdx
    movabs rdx, 71776119061217280
    and rdx, rdi
    and rax, rcx
    shr rdi, 56
    or rax, rdx
    sal rdi, 56
    movabs rdx, 72057594037927935
    and rax, rdx
    or rax, rdi
    ret

This seems quite strange. Inlining `unpack` into `wrapper` appears to have defeated the optimization: instead of a single register move, gcc rebuilds the result one byte at a time with masks and shifts.

Is there some way to prevent this from happening, short of disabling inlining? Or perhaps there is a better way to write this code so that gcc would optimize it more predictably?

I would appreciate any advice, thanks.

Phil
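
P.S. In case it helps anyone reproduce this, here is a self-contained version of the test case that should compile as a single file. It is only a sketch: the fields `c` through `g` are my assumption for what the `...` in the listing stands for, and `same_result` is a hypothetical helper added purely to confirm that the two functions agree at run time. It says nothing about the generated code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A struct of eight 8-bit integers. Fields c..g are assumed here;
   the listing above elides them with "...". */
struct alpha {
    int8_t a, b, c, d, e, f, g, h;
};

/* The 8-byte memcpy is only valid if the struct is exactly 8 bytes. */
static_assert(sizeof(struct alpha) == 8, "struct alpha must be 8 bytes");

/* Copy the 64-bit value byte for byte into the struct. */
struct alpha unpack(uint64_t x)
{
    struct alpha r;
    memcpy(&r, &x, sizeof r);
    return r;
}

/* Trivial wrapper; this is where the poor code generation shows up. */
struct alpha wrapper(uint64_t y)
{
    return unpack(y);
}

/* Hypothetical helper: returns 1 if unpack and wrapper produce
   byte-identical structs for the given input, 0 otherwise. */
int same_result(uint64_t v)
{
    struct alpha p = unpack(v);
    struct alpha q = wrapper(v);
    return memcmp(&p, &q, sizeof p) == 0;
}
```

Saved under a name such as `unpack.c` (the file name is arbitrary), something like `gcc -O3 -S -masm=intel unpack.c` should emit Intel-syntax assembly comparable to the listings above, though the exact output will depend on the compiler version.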