http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55953
--- Comment #3 from Evgeniy Dushistov <dushistov at mail dot ru> 2013-01-12 00:13:09 UTC ---

Cross-compiling for ARM with a g++ of almost the same version:
arm-angstrom-linux-gnueabi-g++ (Linaro GCC 4.7-2012.10) 4.7.3 20121001

Variant one (for loop):

    movw     r3, #2280           ; 0x8e8
    movt     r3, #1
    vmov.i8  q8, #48             ; 0x30
    mov      r2, #48             ; 0x30
    vst1.64  {d16-d17}, [r3 :64]
    vstr     d16, [r3, #16]
    vstr     d17, [r3, #24]
    vstr     d16, [r3, #32]
    vstr     d17, [r3, #40]      ; 0x28
    vstr     d16, [r3, #48]      ; 0x30
    vstr     d17, [r3, #56]      ; 0x38
    vstr     d16, [r3, #64]      ; 0x40
    vstr     d17, [r3, #72]      ; 0x48
    vstr     d16, [r3, #80]      ; 0x50
    vstr     d17, [r3, #88]      ; 0x58
    strb     r2, [r3, #96]       ; 0x60
    strb     r2, [r3, #97]       ; 0x61
    strb     r2, [r3, #98]       ; 0x62
    strb     r2, [r3, #99]       ; 0x63
    bx       lr

Variant two (memset):

    movw     r0, #2272           ; 0x8e0
    mov      r1, #48             ; 0x30
    movt     r0, #1
    mov      r2, #100            ; 0x64
    b        0x8494 <memset>

The time difference is about 5%, in favor of the first variant. Command-line options:
-march=armv7-a -mtune=cortex-a8 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=hard -Ofast
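The comment does not show the source that was compiled. As a hypothetical sketch only, assuming the test fills a 100-byte global buffer with the character '0' (the length #100 and fill byte #48 = 0x30 visible in the listings), the two variants might look like this. The names buf_loop, buf_memset, fill_loop, fill_memset and kLen are illustrative and do not come from the report.

```cpp
#include <cassert>   // assert, used when checking that both variants agree
#include <cstddef>
#include <cstring>

// Hypothetical reconstruction: 100-byte global buffers, matching the
// length (#100) and fill byte (#48 == '0') seen in the disassembly.
constexpr std::size_t kLen = 100;
char buf_loop[kLen];
char buf_memset[kLen];

// Variant one (assumed source): an explicit byte-by-byte loop.
// In the reported build the compiler expanded the fill inline as
// NEON stores (vst1/vstr, 96 bytes) plus four trailing strb.
void fill_loop() {
    for (std::size_t i = 0; i < kLen; ++i) {
        buf_loop[i] = '0';
    }
}

// Variant two (assumed source): a library call. In the reported build
// the compiler emitted a tail call ("b <memset>") instead of inline stores.
void fill_memset() {
    std::memset(buf_memset, '0', kLen);
}

// Both variants must leave identical buffer contents.
bool buffers_match() {
    return std::memcmp(buf_loop, buf_memset, kLen) == 0;
}
```

Whether the loop is expanded inline or turned into a memset call may vary with the GCC version and optimization flags; the listings in this comment reflect only the Linaro 4.7 build and options given.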