http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49526
--- Comment #1 from Siarhei Siamashka <siarhei.siamashka at gmail dot com> 2011-06-24 22:48:46 UTC ---
And clang 2.9 has no problems optimizing this code:

$ cat test.c
int smmul(int a, int b)
{
    return ((long long)a * b) >> 32;
}

$ clang -ccc-host-triple arm-none-linux -O2 -mcpu=cortex-a8 -S test.c
$ cat test.s
        .syntax unified
        .cpu cortex-a8
        .eabi_attribute 6, 10
        .eabi_attribute 7, 65
        .eabi_attribute 8, 1
        .eabi_attribute 9, 2
        .fpu neon
        .eabi_attribute 10, 3
        .eabi_attribute 12, 1
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .file "test.c"
        .text
        .globl smmul
        .align 2
        .type smmul,%function
smmul:
        smmul r0, r1, r0
        bx lr
.Ltmp0:
        .size smmul, .Ltmp0-smmul
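
For reference, here is a minimal sketch of what the C expression in test.c computes: the upper 32 bits of the signed 64-bit product, which is the value the single smmul instruction above leaves in r0. The names smmul_c and smmul_ref are made up for this illustration and are not part of the test case. The sketch assumes that >> on a negative signed value is an arithmetic shift, which is implementation-defined in C but is what GCC and clang do.

```c
#include <assert.h>
#include <stdint.h>

/* Same expression as in test.c, written with fixed-width types.
 * Assumes >> on a negative int64_t is an arithmetic shift
 * (implementation-defined in C; true for GCC and clang). */
static int32_t smmul_c(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 32);
}

/* Hypothetical reference that avoids shifting a negative value:
 * floor(a * b / 2^32), computed with truncating division plus a
 * correction when the remainder is negative. */
static int32_t smmul_ref(int32_t a, int32_t b)
{
    const int64_t two32 = 4294967296LL;   /* 2^32 */
    int64_t p = (int64_t)a * b;           /* full 64-bit signed product */
    int64_t q = p / two32;                /* truncates toward zero */
    if (p % two32 < 0)
        q--;                              /* adjust down to floor */
    return (int32_t)q;                    /* always fits in 32 bits */
}
```

With these definitions, smmul_c(65536, 65536) is 1 and smmul_c(-65536, 65536) is -1, and the two functions should agree for every pair of inputs on a compiler where the shift assumption holds.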