gcc-4.3.2 seems to produce bad code when accessing an array of small 'volatile' objects: it may access several such objects in a 'parallel' fashion. For example, instead of reading two consecutive 'volatile short's sequentially, it reads a single 32-bit longword. This can crash, for instance when accessing a memory-mapped device that allows only 16-bit accesses.
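To make the distinction concrete, here is a minimal sketch of the two access patterns. The helper names are hypothetical and only for illustration, and uint16_t stands in for a 16-bit short. In ordinary RAM both return the same values; the difference only shows up on the bus, which is exactly what matters for a 16-bit-only device.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Expected pattern: two separate 16-bit loads, issued in order.
 * The volatile qualifier should keep the compiler from merging them. */
void read_pair_sequential(volatile uint16_t *s, uint16_t out[2])
{
    assert(s != NULL && out != NULL);
    out[0] = s[0];   /* first 16-bit access  */
    out[1] = s[1];   /* second 16-bit access */
}

/* Pattern resembling what the vectorized loop does: a single 32-bit
 * load covering both shorts. Harmless in RAM, but not on a device
 * that only accepts 16-bit accesses. */
void read_pair_combined(const uint16_t *s, uint16_t out[2])
{
    uint32_t w;

    assert(s != NULL && out != NULL);
    memcpy(&w, s, sizeof w);     /* one 4-byte read                 */
    memcpy(out, &w, sizeof w);   /* split back into two 16-bit halves */
}
```

Called on a plain two-element array, both helpers fill 'out' identically, so the problem cannot be detected by comparing results in normal memory.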
If I compile this code fragment

    void volarrcpy(short *d, volatile short *s, int n)
    {
        int i;
        for (i=0; i<n; i++)
            d[i] = s[i];
    }

with '-O3' (the critical option seems to be '-ftree-vectorize'), then gcc-4.3.2 produces quite complicated code, but the essential section is (powerpc)

    .L7:
        lhz 0,0(11)
        addi 11,11,2
        lwzx 0,4,9
        stwx 0,3,9
        addi 9,9,4
        bdnz .L7

or (i386)

    .L7:
        movw (%ecx), %ax
        movl (%esi,%edx,4), %eax
        movl %eax, (%ebx,%edx,4)
        incl %edx
        addl $2, %ecx
        cmpl %edx, -20(%ebp)
        ja .L7

Disassembled back into C code, this reads

    uint32_t *dst_l = (uint32_t*)d;
    uint32_t *src_l = (uint32_t*)s;

    for (i=0; i<n/2; i++) {
        d[i] = s[i];
        dst_l[i] = src_l[i];
    }

This code seems neither optimal nor correct. First, it reads half of the locations twice, which violates the semantics of volatile objects. Second, accessing such objects in a 'vectorized' way (in this case, gcc emits a single 32-bit read instead of reading two adjacent short addresses) seems illegal to me.

Similar behavior seems to be present in 4.3.3.

Does anybody have some insight? Should I file a bug report?

Regards
-- Till

PS: I'm not subscribed to the gcc mailing list; please CC me on any replies, thanks.