http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50693
--- Comment #10 from David Edelsohn <dje at gcc dot gnu.org> 2011-10-11 01:35:20 UTC ---
Sorry, I was looking at the loop1 and loop2 functions, not the code inlined
into the benchmark for main.

LLVM generates:

        movq    %r12, %rdi
        movl    $99, %esi
        movq    %rbx, %rdx
        callq   memset

GCC vectorizes loop1:

.L22:
        addq    $1, %rdx
        movdqa  %xmm0, (%rcx)
        addq    $16, %rcx
        cmpq    %rsi, %rdx
        jb      .L22

but not loop2:

.L28:
.L29:
        movb    $99, (%rbx,%rax)
        addq    $1, %rax
        cmpq    %rbp, %rax
        jne     .L28
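
For readers without the test case at hand, here is a minimal C sketch of the kind of loop the listings describe. The source of loop1 and loop2 is not quoted in this comment, so the function names fill_bytes and fill_memset are hypothetical and the loop shape is an assumption inferred from the assembly: a store of the byte value 99 to each element of a buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a byte-fill loop; NOT the test case from the PR.
 * Compiled without vectorization, each iteration is a single byte store,
 * comparable to "movb $99, (%rbx,%rax)" in the loop2 listing. */
void fill_bytes(char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p[i] = 99;
}

/* The library call a compiler may substitute once it recognizes the
 * fill idiom, comparable to the "callq memset" sequence with $99 in %esi. */
void fill_memset(char *p, size_t n)
{
    memset(p, 99, n);
}
```

Both functions leave the buffer in the same state. The three outcomes shown in the listings differ only in how the stores are issued: a call to memset, a 16-byte movdqa store per iteration, or one byte store per iteration.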