Ok, I think I have tracked this down to having broken the aliasing
rules, and for the sake of completeness, here was the problem:

Recall that the (big picture) code works fine at -O2, but fails at
-O3. The problem seemed to stem from this inline assembly function:

void longcpy(long* _dst, long* _src, unsigned _numwords)
{
    asm volatile (
        "cld         \n\t"
        "rep         \n\t"
        "movsl       \n\t"
        // Outputs
        :
        // Inputs
        : "S" (_src), "D" (_dst), "c" (_numwords)
        // Clobbers
        : "cc", "memory"
        );
}

My interpretation of the problem:

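The rep/movsl sequence modifies esi, edi and ecx, so the registers
holding _src, _dst and _numwords get clobbered. I didn't care about
that, but because they are listed only as inputs, the compiler is
entitled to assume they are unchanged. If it then inlines the function
and later re-uses the register-cached values, assuming them to be
intact, it all goes horribly wrong.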

But, if I specify the outputs like this:

        // Outputs
        : "=&S" (_src), "=&D" (_dst), "=&c" (_numwords)

then the compiler is warned that the registers are overwritten and now
contain some (undefined and unused) return values, so it won't expect
_src, _dst and _numwords to be intact in esi, edi and ecx.
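
For comparison, here is a minimal sketch of another way to tell the
compiler the same thing, using '+' (read-write) constraints so that each
register is declared once as both an input and an output. This is not
the code from my project: the name wordcpy, the uint32_t/size_t types
and the memcpy fallback for non-x86 targets are assumptions made so that
the example is self-contained.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variant of longcpy: copies numwords 32-bit words. */
static inline void wordcpy(uint32_t *dst, const uint32_t *src,
                           size_t numwords)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile (
        "cld         \n\t"
        "rep movsl   \n\t"
        /* Read-write operands: the asm consumes the initial values and
           leaves modified values behind, so the compiler must not
           assume the registers still hold the original arguments. */
        : "+S" (src), "+D" (dst), "+c" (numwords)
        /* No separate inputs: '+' already covers them. */
        :
        /* The asm reads and writes memory and touches the flags. */
        : "cc", "memory"
        );
#else
    /* Assumed fallback so the sketch builds on other targets. */
    memcpy(dst, src, numwords * sizeof *dst);
#endif
}
```

Only the local copies of the arguments are modified, so a caller's own
pointers and count are unaffected after wordcpy(dst, src, n) returns.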

Now everything works fine at -O3. However, I really don't understand
the '&' early clobber constraint modifier. What use is it?

The GCC inline assembly HOWTO seems woefully out of date. (Indeed,
several of its examples simply add esi, edi and ecx to the clobber
list, but that seems to be illegal now; it seems that anything
mentioned in the inputs or outputs cannot also appear in the clobber
list.)

Thanks,

Andrew Walrond
