Till Straumann wrote:
> gcc-4.3.2 seems to produce bad code when
> accessing an array of small 'volatile'
> objects -- it may try to access multiple
> such objects in a 'parallel' fashion.
> E.g., instead of reading two consecutive
> 'volatile short's sequentially, it reads
> a single 32-bit longword. This may crash,
> e.g., when accessing a memory-mapped device
> which allows only 16-bit accesses.
> 
> If I compile this code fragment
> 
> void volarrcpy(short *d, volatile short *s, int n)
> {
> int i;
>  for (i=0; i<n; i++)
>    d[i] = s[i];
> }
> 
> 
> with '-O3' (the critical option seems to be '-ftree-vectorize')
> then gcc-4.3.2 produces quite complicated code
> but the essential section is (powerpc)
> 
> .L7:
>    lhz 0,0(11)
>    addi 11,11,2
>    lwzx 0,4,9
>    stwx 0,3,9
>    addi 9,9,4
>    bdnz .L7
> 
> or i386
> 
> .L7:
>    movw    (%ecx), %ax
>    movl    (%esi,%edx,4), %eax
>    movl    %eax, (%ebx,%edx,4)
>    incl    %edx
>    addl    $2, %ecx
>    cmpl    %edx, -20(%ebp)
>    ja  .L7
> 
> 
> Disassembled back into C-code, this reads
> 
> uint32_t *dst_l = (uint32_t*)d;
> uint32_t *src_l = (uint32_t*)s;
> 
> for (i=0; i<n/2; i++) {
>    d[i]     = s[i];
>    dst_l[i] = src_l[i];
> }
> 
> This code seems neither optimal nor correct.
> Besides reading half of the locations twice,
> which violates the semantics of volatile
> objects, accessing such objects in a 'vectorized'
> way (in this case: instead of reading
> two adjacent short addresses, gcc emits
> a single 32-bit read) seems illegal to me.
> 
> Similar behavior seems to be present in 4.3.3.
> 
> Does anybody have some insight? Should I file
> a bug report?

I can't reproduce this with "GCC: (GNU) 4.3.3 20081110 (prerelease)"

.L8:
        movzwl  (%ecx), %eax
        addl    $1, %ebx
        addl    $2, %ecx
        movw    %ax, (%edx)
        addl    $2, %edx
        cmpl    %ebx, 16(%ebp)
        jg      .L8

I think you should upgrade.

Andrew.
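
For anyone who wants to check their own compiler, here is a minimal
sketch of a reproduction file. volarrcpy() is the function from the
report above; the file name and the volarrcpy_selfcheck() helper are
hypothetical and not part of the original thread. Note that the helper
only verifies the copied values. It cannot detect a wrong access width,
which still has to be read off the generated assembly.

```c
/* volarrcpy_check.c: hypothetical file name, not from the thread. */
#include <assert.h>

/* The function from the report, unchanged apart from whitespace. */
void volarrcpy(short *d, volatile short *s, int n)
{
    int i;
    for (i = 0; i < n; i++)
        d[i] = s[i];
}

/* Hypothetical helper: copies a small buffer and asserts that every
 * element arrived intact. The odd length (7) is chosen so that any
 * pairwise handling of elements would also have to cope with a
 * leftover element. This checks values only, not access width. */
void volarrcpy_selfcheck(void)
{
    volatile short src[7] = { 1, -2, 3, -4, 5, -6, 7 };
    short dst[7] = { 0 };
    int i;

    volarrcpy(dst, src, 7);
    for (i = 0; i < 7; i++)
        assert(dst[i] == src[i]);
}
```

Compiling this with "gcc -O3 -S volarrcpy_check.c" and again with
"-O3 -fno-tree-vectorize" should make it possible to compare the two
loops. In a correct build, the loop in volarrcpy() should contain only
16-bit loads from the source (lhz on powerpc, movw or movzwl on i386)
and no 32-bit load such as lwzx or movl.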
