Till Straumann wrote:

> Andrew Haley wrote:
>> H.J. Lu wrote:
>>>
>>> That may be too old.  Gcc 4.3.4 revision 148680
>>> generates:
>>>
>>> .L5:
>>>     leaq    (%rsi,%rdx), %rax
>>>     movzbl    (%rax), %eax
>>>     movb    %al, (%rdi,%rdx)
>>>     addq    $1, %rdx
>>>     cmpq    $32, %rdx
>>>     jne    .L5
>>>     
>>
>> 4.4.0 20090307 generates truly bizarre code, though:

> That's roughly the same as what 4.3.3 produces.
> I had not quoted the full assembly code, just
> the essential part that is executed when
> source and destination are 4-byte aligned
> and are more than 4 bytes apart.
> Otherwise (not longword-aligned) the
> (correct) code labeled '.L5' is executed.

Right.  I suspect this is just a matter of finding the place where the
vectorization happens and disabling it when the source or destination
is volatile.  Should be easy enough.
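
For reference, the kind of loop under discussion probably looks something
like the sketch below.  The original test case isn't quoted in this
thread, so the function name and signature are hypothetical; only the
count of 32 comes from the "cmpq $32" in the listing above.

```c
#include <stddef.h>

/* Hypothetical reconstruction: the real test case is not quoted in this
 * thread.  The length of 32 is taken from the "cmpq $32, %rdx" above. */
enum { COPY_LEN = 32 };

/* Byte-wise copy through volatile-qualified pointers.  The expectation
 * discussed here is one byte load and one byte store per iteration, as
 * in the .L5 loop, not accesses merged into wider loads and stores. */
void copy_bytes(volatile unsigned char *dst,
                const volatile unsigned char *src)
{
    size_t i;

    for (i = 0; i < COPY_LEN; i++)
        dst[i] = src[i];
}
```

Compiling this with -O2 -S, with and without the vectorizer enabled,
should show whether the byte accesses are kept separate.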

Andrew.
