On Mon, Jun 22, 2009 at 11:14 AM, Till Straumann<strau...@slac.stanford.edu> wrote: > Andrew Haley wrote: >> >> Till Straumann wrote: >> >>> >>> gcc-4.3.2 seems to produce bad code when >>> accessing an array of small 'volatile' >>> objects -- it may try to access multiple >>> such objects in a 'parallel' fashion. >>> E.g., instead of reading two consecutive >>> 'volatile short's sequentially it reads >>> a single 32-bit longword. This may crash >>> e.g., when accessing a memory-mapped device >>> which allows only 16-bit accesses. >>> >>> If I compile this code fragment >>> >>> void volarrcpy(short *d, volatile short *s, int n) >>> { >>> int i; >>> for (i=0; i<n; i++) >>> d[i] = s[i]; >>> } >>> >>> >>> with '-O3' (the critical option seems to be '-ftree-vectorize') >>> then gcc-4.3.2 produces quite complicated code >>> but the essential section is (powerpc) >>> >>> .L7: >>> lhz 0,0(11) >>> addi 11,11,2 >>> lwzx 0,4,9 >>> stwx 0,3,9 >>> addi 9,9,4 >>> bdnz .L7 >>> >>> or i386 >>> >>> .L7: >>> movw (%ecx), %ax >>> movl (%esi,%edx,4), %eax >>> movl %eax, (%ebx,%edx,4) >>> incl %edx >>> addl $2, %ecx >>> cmpl %edx, -20(%ebp) >>> ja .L7 >>> >>> >>> Disassembled back into C-code, this reads >>> >>> uint32_t *dst_l = (uint32_t*)d; >>> uint32_t *src_l = (uint32_t*)s; >>> >>> for (i=0; i<n/2; i++) { >>> d[i] = s[i]; >>> dst_l[i] = src_l[i]; >>> } >>> >>> This code seems neither optimal nor correct. >>> Besides reading half of the locations twice >>> which violates the semantics of volatile >>> objects accessing such objects in a 'vectorized' >>> way (in this case: instead of reading >>> two adjacent short addresses gcc emits >>> a single 32-bit read) seems illegal to me. >>> >>> Similar behavior seems to be present in 4.3.3. >>> >>> Does anybody have some insight? Should I file >>> a bug report? >>> >> >> I can't reproduce this with "GCC: (GNU) 4.3.3 20081110 (prerelease)" >> >> .L8: >> movzwl (%ecx), %eax >> addl $1, %ebx >> addl $2, %ecx >> movw %ax, (%edx) >> addl $2, %edx >> cmpl %ebx, 16(%ebp) >> jg .L8 >> >> I think you should upgrade. >> >> Andrew. >> > > OK, try this then: > > void > c(char *d, volatile char *s) > { > int i; > for ( i=0; i<32; i++ ) > d[i]=s[i]; > } > > > (gcc --version: gcc (Ubuntu 4.3.3-5ubuntu4) 4.3.3) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
That may be too old. Gcc 4.3.4 revision 148680 generates: .L5: leaq (%rsi,%rdx), %rax movzbl (%rax), %eax movb %al, (%rdi,%rdx) addq $1, %rdx cmpq $32, %rdx jne .L5 -- H.J.