http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51837
Bug #: 51837
Summary: Use of result from 64*64->128 bit multiply via __uint128_t not optimized
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: middle-end
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: svfue...@gmail.com

unsigned long long foo(unsigned long long x, unsigned long long y)
{
    __uint128_t z = (__uint128_t)x * y;
    return z ^ (z >> 64);
}

Compiles into

    mov %rsi, %rax
    mul %rdi
    mov %rax, %r9
    mov %rdx, %rax
    xor %r9, %rax
    retq

The final two mov instructions are not needed, and the above is equivalent to

    mov %rsi, %rax
    mul %rdi
    xor %rdx, %rax
    retq
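
For reference, a minimal sketch of why the shorter sequence is valid: after truncation to 64 bits, z ^ (z >> 64) is just the low half of the product XORed with the high half, which are the two values mul leaves in %rax and %rdx. The names foo_halves and check_forms_agree are hypothetical helpers added for illustration and are not part of the original report; the sketch assumes a compiler that supports __uint128_t (GCC or Clang on x86-64).

```c
#include <assert.h>
#include <stdint.h>

/* The function from the report. */
unsigned long long foo(unsigned long long x, unsigned long long y)
{
    __uint128_t z = (__uint128_t)x * y;
    return z ^ (z >> 64);
}

/* Hypothetical helper: the same result written as explicit halves. */
unsigned long long foo_halves(unsigned long long x, unsigned long long y)
{
    __uint128_t z = (__uint128_t)x * y;
    uint64_t lo = (uint64_t)z;         /* low 64 bits: %rax after mul  */
    uint64_t hi = (uint64_t)(z >> 64); /* high 64 bits: %rdx after mul */
    return lo ^ hi;
}

/* Hypothetical helper: asserts that both forms agree on sample inputs. */
void check_forms_agree(void)
{
    static const unsigned long long samples[] = {
        0ULL, 1ULL, 0xFFFFFFFFULL, 0x123456789ABCDEF0ULL, ~0ULL
    };
    const int n = (int)(sizeof samples / sizeof samples[0]);

    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            assert(foo(samples[i], samples[j]) ==
                   foo_halves(samples[i], samples[j]));
}
```

Calling check_forms_agree() from a small main and building with -O2 should let the two forms be compared in the generated assembly as well; XOR is commutative, so a single xor %rdx, %rax on the mul result is sufficient.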