Till Straumann wrote: > gcc-4.3.2 seems to produce bad code when > accessing an array of small 'volatile' > objects -- it may try to access multiple > such objects in a 'parallel' fashion. > E.g., instead of reading two consecutive > 'volatile short's sequentially it reads > a single 32-bit longword. This may crash > e.g., when accessing a memory-mapped device > which allows only 16-bit accesses. > > If I compile this code fragment > > void volarrcpy(short *d, volatile short *s, int n) > { > int i; > for (i=0; i<n; i++) > d[i] = s[i]; > } > > > with '-O3' (the critical option seems to be '-ftree-vectorize') > then gcc-4.3.2 produces quite complicated code > but the essential section is (powerpc) > > .L7: > lhz 0,0(11) > addi 11,11,2 > lwzx 0,4,9 > stwx 0,3,9 > addi 9,9,4 > bdnz .L7 > > or i386 > > .L7: > movw (%ecx), %ax > movl (%esi,%edx,4), %eax > movl %eax, (%ebx,%edx,4) > incl %edx > addl $2, %ecx > cmpl %edx, -20(%ebp) > ja .L7 > > > Disassembled back into C-code, this reads > > uint32_t *dst_l = (uint32_t*)d; > uint32_t *src_l = (uint32_t*)s; > > for (i=0; i<n/2; i++) { > d[i] = s[i]; > dst_l[i] = src_l[i]; > } > > This code seems neither optimal nor correct. > Besides reading half of the locations twice > which violates the semantics of volatile > objects accessing such objects in a 'vectorized' > way (in this case: instead of reading > two adjacent short addresses gcc emits > a single 32-bit read) seems illegal to me. > > Similar behavior seems to be present in 4.3.3. > > Does anybody have some insight? Should I file > a bug report?
I can't reproduce this with "GCC: (GNU) 4.3.3 20081110 (prerelease)" .L8: movzwl (%ecx), %eax addl $1, %ebx addl $2, %ecx movw %ax, (%edx) addl $2, %edx cmpl %ebx, 16(%ebp) jg .L8 I think you should upgrade. Andrew.