Arnd Bergmann <[EMAIL PROTECTED]> writes:

> a) The patch gets merged upstream. It won't hurt anyone who is
>    building i486+ optimized binaries and fixes a real bug.

Upstream won't accept the patch, because of the performance penalty.
Even if upstream accepts the patch, that won't be before gcc 3.4.
However, the gcc 3.2/3.3 ABI will stay with us for a long time (most
likely until after the next Debian release).

> b) We provide a libstdc++-i386.so.$(version) file that contains
>    only the __exchange_and_add function and is linked to
>    libstdc++.so.

That would work, yes.

> We can shave a bit off by making the function __attribute__((regparm(2)))

Even with that change, and -fomit-frame-pointer, I get

    inline: 2.39809
    out-of-line: 4.0224

i.e. this is still a 60% slowdown (in a test case where the processor
does branch prediction correctly all the time, and everything is in
the cache). The assembler code is

    _Z11atomic_add2PVii:
            lock; addl %edx,(%eax)
            ret

so it can't get any better. The performance hit is still unacceptable.

> and perhaps by using a trivial non-locking variant when compiling
> without threads, as the i386 version uses the mutex only in those
> cases and AFAICS it is compatible with the i486 version otherwise.

That won't help anything. "Compiling without threads" isn't really
supported on Linux: if threads are not used, this is always a
link-time/runtime issue, not a compile-time issue.

> If we know at compile time that locking (neither 'lock;' prefix nor
> the mutex call) is never needed, we can even get much faster than the
> current i486 code.

We never know that.

> Also, if an application or library cares about this sort of
> micro-optimization, it probably should be provided in an optimized
> version anyway.

I think the performance loss for applications like KDE will be
significant. I doubt that providing two versions of KDE (i386 and
i486+) would be feasible.

Regards,
Martin
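
For readers who want to reproduce the kind of comparison quoted above, the following is a minimal sketch, not the test program that produced those figures. All names (atomic_add_inline, atomic_add_outofline, time_adds, slowdown_percent) are hypothetical, and it uses std::atomic rather than the hand-written "lock; addl" from the message, on the assumption that the compiler emits an equivalent locked add on x86. The noinline attribute is a GCC/Clang extension and only stands in for a call into a separate shared library; a real cross-library call through the PLT would likely cost somewhat more.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

// Inline variant: the compiler may expand this at every call site.
static inline void atomic_add_inline(std::atomic<int>* counter, int value) {
    counter->fetch_add(value, std::memory_order_seq_cst);
}

// Out-of-line variant: noinline (GCC/Clang extension) forces a real call.
// Assumption: this approximates a function exported from another library.
__attribute__((noinline))
void atomic_add_outofline(std::atomic<int>* counter, int value) {
    counter->fetch_add(value, std::memory_order_seq_cst);
}

// Calls add(counter, 1) `iterations` times and returns elapsed seconds.
template <typename AddFn>
double time_adds(AddFn add, std::atomic<int>* counter, long iterations) {
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        add(counter, 1);
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Relative slowdown of the out-of-line time over the inline time, in percent.
double slowdown_percent(double inline_seconds, double outofline_seconds) {
    assert(inline_seconds > 0.0);
    return (outofline_seconds - inline_seconds) / inline_seconds * 100.0;
}
```

To use it, one would call time_adds once with a lambda wrapping atomic_add_inline and once with atomic_add_outofline, on the same counter and a large iteration count, then pass both times to slowdown_percent. Measured as (out-of-line - inline) / inline, the two figures quoted in the message come to roughly 68%; results on other hardware and compilers will differ.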