On 8/20/09, Boehm, Hans <hans.bo...@hp.com> wrote:
> > -----Original Message-----
> > From: Lawrence Crowl [mailto:cr...@google.com]
> > The problem is that gcc does support 80386.  It also supports
> > other processors that have less-than-complete support for
> > concurrency.  Just in the x86 line, we get some additional
> > capability in many new layers.
> >
> >    8086       LOCK XCHG
> >    80486      CMPXCHG XADD
> >    Pentium    CMPXCHG8B
> >    SSE        SFENCE
>
> Aside to an interesting discussion:
>
> I believe the current conclusion is that SFENCE should be ignored,
> except for library or compiler-generated code that uses
> non-temporal/coalescing stores, which I believe are also a recent
> addition.  Normal stores are ordered anyway, so it's not needed.
> Thus you are faced with a choice of either (a) implementing fences
> on the assumption that ordinary code may contain non-temporal stores,
> or (b) making sure that non-temporal stores are always surrounded by
> the appropriate fences.  This is really an important ABI issue, but
> it's something that I believe no ABI currently specifies.  Our
> conclusion in earlier discussions among a different group of people
> was that (b) made more sense, since non-temporal stores of various
> kinds seemed to be largely confined to a few library routines.
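
A minimal sketch of what option (b) might look like, assuming x86 with
SSE2 and the <emmintrin.h> intrinsics.  The routine fill_streaming, the
buffer, and the ready flag are hypothetical names for illustration, not
anything from an existing library or ABI.  The routine issues its own
SFENCE before returning, so the caller can publish with an ordinary
release store (a plain MOV on x86):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64) || defined(_M_AMD64)
#include <emmintrin.h>   // _mm_stream_si32 (MOVNTI), _mm_sfence (SFENCE)
#define HAVE_NT_STORES 1
#else
#define HAVE_NT_STORES 0 // fallback: ordinary stores, no fence needed
#endif

// Hypothetical library routine following option (b): any non-temporal
// stores it issues are fenced before it returns to the caller.
void fill_streaming(int* dst, std::size_t n, int value) {
    assert(dst != nullptr);
#if HAVE_NT_STORES
    for (std::size_t i = 0; i < n; ++i)
        _mm_stream_si32(dst + i, value);  // weakly ordered store
    _mm_sfence();  // order the streaming stores before any later store
#else
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = value;
#endif
}

int buffer[1024];
std::atomic<bool> ready{false};

// Caller side: because fill_streaming fenced its own stores, the
// release store below does not need an additional fence.
void producer() {
    fill_streaming(buffer, 1024, 42);
    ready.store(true, std::memory_order_release);
}

// Spins with acquire loads only, then checks the published data.
bool consumer_sees_all() {
    while (!ready.load(std::memory_order_acquire)) { /* spin */ }
    for (int v : buffer)
        if (v != 42) return false;
    return true;
}
```

Under option (a) the SFENCE would instead have to be part of every
release operation, whether or not streaming stores preceded it.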

Hm.  I would expect that given the C++0x memory model, compilers
could be much more aggressive about using non-temporal stores,
potentially improving performance substantially.  That is, it may
be better to accept a slightly less efficient ABI for today's
compilers to gain a more efficient ABI for tomorrow's compilers.

> It would be really nice if everyone somehow managed to agree on this.
> Inconsistency here, probably even between Windows and Linux, seems
> likely to result in really subtle bugs.
>
> Note that this also affects correctness of spinlock implementations,
> not just atomics.  A simple store to release a lock doesn't work if
> the critical section may contain unfenced non-temporal stores.

Yes, but the spinning acquire doesn't require the fence, only the
release.  So, is this additional instruction a performance problem?

> >    SSE2       MFENCE
> >    late AMD64 CMPXCHG16B
> >
> > So, we do not get to ignore the problem as a relic of 80386.

This email seems to have gotten side-tracked by my filters.
Sorry for the delay.

--
Lawrence Crowl