On Mon, Jun 22, 2009 at 11:14 AM, Till
Straumann<strau...@slac.stanford.edu> wrote:
> Andrew Haley wrote:
>>
>> Till Straumann wrote:
>>
>>>
>>> gcc-4.3.2 seems to produce bad code when
>>> accessing an array of small 'volatile'
>>> objects -- it may try to access multiple
>>> such objects in a 'parallel' fashion.
>>> E.g., instead of reading two consecutive
>>> 'volatile short's sequentially it reads
>>> a single 32-bit longword. This may crash
>>> e.g., when accessing a memory-mapped device
>>> which allows only 16-bit accesses.
>>>
>>> If I compile this code fragment
>>>
>>> void volarrcpy(short *d, volatile short *s, int n)
>>> {
>>> int i;
>>>  for (i=0; i<n; i++)
>>>   d[i] = s[i];
>>> }
>>>
>>>
>>> with '-O3' (the critical option seems to be '-ftree-vectorize')
>>> then gcc-4.3.2 produces quite complicated code
>>> but the essential section is (powerpc)
>>>
>>> .L7:
>>>   lhz 0,0(11)
>>>   addi 11,11,2
>>>   lwzx 0,4,9
>>>   stwx 0,3,9
>>>   addi 9,9,4
>>>   bdnz .L7
>>>
>>> or i386
>>>
>>> .L7:
>>>   movw    (%ecx), %ax
>>>   movl    (%esi,%edx,4), %eax
>>>   movl    %eax, (%ebx,%edx,4)
>>>   incl    %edx
>>>   addl    $2, %ecx
>>>   cmpl    %edx, -20(%ebp)
>>>   ja  .L7
>>>
>>>
>>> Disassembled back into C-code, this reads
>>>
>>> uint32_t *dst_l = (uint32_t*)d;
>>> uint32_t *src_l = (uint32_t*)s;
>>>
>>> for (i=0; i<n/2; i++) {
>>>   d[i]     = s[i];
>>>   dst_l[i] = src_l[i];
>>> }
>>>
>>> This code seems neither optimal nor correct.
>>> Besides reading half of the locations twice
>>> which violates the semantics of volatile
>>> objects accessing such objects in a 'vectorized'
>>> way (in this case: instead of reading
>>> two adjacent short addresses gcc emits
>>> a single 32-bit read) seems illegal to me.
>>>
>>> Similar behavior seems to be present in 4.3.3.
>>>
>>> Does anybody have some insight? Should I file
>>> a bug report?
>>>
>>
>> I can't reproduce this with "GCC: (GNU) 4.3.3 20081110 (prerelease)"
>>
>> .L8:
>>        movzwl  (%ecx), %eax
>>        addl    $1, %ebx
>>        addl    $2, %ecx
>>        movw    %ax, (%edx)
>>        addl    $2, %edx
>>        cmpl    %ebx, 16(%ebp)
>>        jg      .L8
>>
>> I think you should upgrade.
>>
>> Andrew.
>>
>
> OK, try this then:
>
> void
> c(char *d, volatile char *s)
> {
> int i;
>   for ( i=0; i<32; i++ )
>       d[i]=s[i];
> }
>
>
> (gcc --version: gcc (Ubuntu 4.3.3-5ubuntu4) 4.3.3)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

That may be too old.  Gcc 4.3.4 revision 148680
generates:

.L5:
        leaq    (%rsi,%rdx), %rax
        movzbl  (%rax), %eax
        movb    %al, (%rdi,%rdx)
        addq    $1, %rdx
        cmpq    $32, %rdx
        jne     .L5



-- 
H.J.

Reply via email to