Paul Rubin wrote:
> "Martin v. Löwis" <[EMAIL PROTECTED]> writes:
>> Ah, but in the case where the lock# signal is used, it's known that
>> the data is not in the cache of the CPU performing the lock operation;
>> I believe it is also known that the data is not in the cache of any
>> other CPU. So the CPU performing the LOCK INC sequence just has
>> to perform two memory cycles. No cache coherency protocol runs
>> in that case.
>
> How can any CPU know in advance that the data is not in the cache of
> some other CPU?
AFAIU, the lock# line, in P6, is only used for memory regions that are
marked non-cacheable. The CPU treats a memory region as non-cacheable
if either the memory type range register (MTRR) says it is
non-cacheable, or if the cache-disable (PCD) bit in the page-table
entry is set.

If a certain location is known to be modified a lot from different CPUs
(e.g. a spin lock), it might be best if the operating system marks the
page where this location lives as non-cacheable - it might be faster to
always modify it through main memory than to have the cache-coherency
protocol deal with it. (A minimal C sketch of the locked increment
under discussion is at the end of this mail.)

> OK, this is logical, but it already implies a cache miss, which costs
> many dozen (100?) cycles. But this case may be uncommon, since one
> hopes that cache misses are relatively rare.

Right - I'm completely uncertain what a cache miss costs in terms of
internal cycles. E.g. how many memory cycles does the CPU have to
perform to fill a cache line? And what is the ratio between memory
cycles and CPU cycles? (A rough way to measure this is sketched at the
end of this mail as well.)

> IIRC, the SPJ paper that I linked claims that lock-free protocols
> outperform traditional lock-based ones even with just two processors.
> But maybe things are better with a dual core processor (shared cache)
> than with two separate packages.

Likely, yes - although different dual-core designs are out there, and
some of them use two individual caches rather than a shared one.

Regards,
Martin
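
P.S. Below are the two sketches referred to above; both are my own
illustrations rather than anything from the Intel manuals, and the
compiler intrinsics, names and sizes in them are assumptions. The first
shows the kind of locked increment being discussed: with GCC or Clang
on x86, the __atomic builtin should compile to a lock-prefixed
read-modify-write, whereas the plain increment does not.

/* Sketch of the locked increment under discussion (illustrative only). */
#include <stdio.h>

static long refcount = 0;

static void incref_plain(void)
{
    refcount++;        /* separate load, add, store - not atomic across CPUs */
}

static void incref_atomic(void)
{
    /* Atomic read-modify-write; on x86 this is where the LOCK prefix
     * (and, for uncacheable memory, the LOCK# bus signal) comes in. */
    __atomic_fetch_add(&refcount, 1, __ATOMIC_SEQ_CST);
}

int main(void)
{
    incref_plain();
    incref_atomic();
    printf("refcount = %ld\n", refcount);
    return 0;
}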
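
The second sketch is a rough pointer-chasing microbenchmark for the
cache-miss question: a buffer much larger than the last-level cache is
walked in a random cyclic order, so nearly every dependent load should
miss, and the time per load approximates the miss latency. The 64 MiB
buffer and the step count are arbitrary guesses; the random cycle also
defeats the hardware prefetcher, which a sequential walk would not.

/* Pointer-chasing estimate of cache-miss latency (illustrative only);
 * compile with something like: cc -O2 chase.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N     (64UL * 1024 * 1024 / sizeof(size_t))  /* 64 MiB of slots */
#define STEPS 10000000UL

int main(void)
{
    size_t *next = malloc(N * sizeof *next);
    if (next == NULL)
        return 1;

    /* Sattolo's shuffle builds a single random cycle, so following
     * next[i] eventually visits every slot exactly once per cycle. */
    for (size_t i = 0; i < N; i++)
        next[i] = i;
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* j in [0, i) */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    size_t p = 0;
    for (unsigned long s = 0; s < STEPS; s++)
        p = next[p];                     /* each load depends on the previous one */

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
    printf("~%.1f ns per dependent load (end index %zu)\n", ns / STEPS, p);

    free(next);
    return 0;
}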