OK, I think I have tracked this down to my having broken the aliasing rules. For the sake of completeness, here was the problem:
Recall that the (big picture) code works fine at -O2, but fails at -O3. The problem seemed to stem from this inline assembly function:

    void longcpy(long* _dst, long* _src, unsigned _numwords)
    {
        asm volatile (
            "cld    \n\t"
            "rep    \n\t"
            "movsl  \n\t"
            // Outputs
            :
            // Inputs
            : "S" (_src), "D" (_dst), "c" (_numwords)
            // Clobbers
            : "cc", "memory"
        );
    }

My interpretation of the problem: the registers holding _dst, _src and _numwords get modified by the instruction, but I declared them only as inputs because I didn't care about their final values. If the compiler then inlines the function and later re-uses the register-cached values, assuming them to be intact, it all goes horribly wrong.

But if I specify the outputs like this:

    // Outputs
    : "=&S" (_src), "=&D" (_dst), "=&c" (_numwords)

then the compiler is warned that the registers are overwritten and now contain some (undefined and unused) return values, and it won't expect _src, _dst and _numwords to be intact in esi, edi and ecx. Now everything works fine at -O3.

However, I really don't understand the '&' early-clobber constraint modifier. What use is it?

The GCC inline assembly HOWTO seems woefully out of date. (Indeed, several of its examples simply add esi, edi and ecx to the clobber list, but that seems to be illegal now; apparently anything mentioned in the inputs or outputs cannot also appear in the clobber list.)

Thanks,
Andrew Walrond
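
For reference, the same intent ("these registers are read and then modified by the instruction") can also be expressed with GCC's '+' read-write constraint modifier instead of separate input and output lists. The following is only a minimal sketch under some assumptions, not a drop-in replacement: the name longcpy32 is hypothetical, uint32_t is used so that one element matches the 4 bytes that movsl copies on both i386 and x86-64, the count is a size_t so that it fills the whole count register, and the plain C loop is there only so the sketch still compiles on non-x86 targets.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant of longcpy(): copies numwords 32-bit words
 * from src to dst. Not taken from the original code. */
static void longcpy32(uint32_t *dst, const uint32_t *src, size_t numwords)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile (
        "cld\n\t"        /* clear direction flag: copy upwards */
        "rep movsl"      /* copy (e/r)cx words from [(e/r)si] to [(e/r)di] */
        /* Read-write operands: each register is consumed as an input
         * and is left holding a modified value afterwards, so the
         * compiler must not assume the old values survive. */
        : "+S" (src), "+D" (dst), "+c" (numwords)
        : /* no pure inputs */
        : "cc", "memory"
    );
#else
    /* Portable fallback so the sketch builds on other targets. */
    while (numwords--)
        *dst++ = *src++;
#endif
}
```

Because src, dst and numwords are parameters passed by value, the modified copies are simply discarded on return; the caller's variables are not affected.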