Till Straumann wrote:
> Andrew Haley wrote:
>> H.J. Lu wrote:
>>>
>>> That may be too old. Gcc 4.3.4 revision 148680
>>> generates:
>>>
>>> .L5:
>>>         leaq    (%rsi,%rdx), %rax
>>>         movzbl  (%rax), %eax
>>>         movb    %al, (%rdi,%rdx)
>>>         addq    $1, %rdx
>>>         cmpq    $32, %rdx
>>>         jne     .L5
>>>
>>
>> 4.4.0 20090307 generates truly bizarre code, though:
> That's roughly the same that 4.3.3 produces.
> I had not quoted the full assembly code but just
> the essential part that is executed when
> source and destination are 4-byte aligned
> and are more than 4-bytes apart.
> Otherwise (not longword-aligned) the
> (correct) code labeled '.L5' is executed.

Right. I suspect this is just a matter of finding the place where the
vectorization happens and turning it off if source or dest are volatile.
Should be easy enough.

Andrew.
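
For reference, the kind of loop under discussion can be sketched roughly as below. This is an assumed reconstruction based on the '.L5' assembly quoted above (a 32-byte, byte-at-a-time copy), not the original test case; the function name copy_bytes_volatile and the constant COPY_LEN are hypothetical. Since both pointers are volatile-qualified, each byte access is expected to be performed individually, so merging them into wider loads and stores would change the access pattern.

```c
#include <stddef.h>

/* Hypothetical constant: the quoted assembly compares the index with 32. */
enum { COPY_LEN = 32 };

/*
 * Assumed shape of the loop being compiled (not the poster's actual code).
 * Both pointers are volatile-qualified, so every iteration should perform
 * exactly one byte load from src and one byte store to dst.
 */
void copy_bytes_volatile(volatile unsigned char *dst,
                         const volatile unsigned char *src)
{
    for (size_t i = 0; i < COPY_LEN; i++)
        dst[i] = src[i];
}
```

Compiling something like this with optimization and inspecting the generated assembly (for example with gcc -O3 -S) should show whether the byte accesses are kept separate, as in the '.L5' loop, or combined into wider ones.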