Paul Rubin wrote:
> Ross Ridge <rri...@csclub.uwaterloo.ca> writes:
>> Scott David Daniels <scott.dani...@acm.org> wrote:
>>> The opcode cannot simply talk to its cache; it must either go
>>> directly to off-chip memory or communicate to other processors
>>> that it (and it alone) owns the increment target.
>> The cache coherency mechanism automatically prevents two or more
>> processors that have cached the same area of memory from
>> simultaneously modifying data in that area.
>>
>> The same cache coherency mechanism that prevents ordinary
>> "unlocked" instructions from simultaneously modifying the same
>> cache line on two different processors also provides the guarantee
>> with "locked" instructions.  There are no additional hardware locks
>> involved, and no additional communication required.
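
As a concrete illustration, here is a minimal C11 sketch of that kind
of "locked" increment (my own example, not code from any real
implementation).  On x86, atomic_fetch_add typically compiles to a
single lock-prefixed instruction, and the exclusivity comes from the
cache coherency protocol rather than from any separate lock:

#include <stdatomic.h>
#include <stdio.h>

/* Shared counter; atomic_long is C11's atomic long type. */
atomic_long refcount = 1;

int main(void)
{
    /* On x86 this typically becomes "lock xadd": the issuing core
       takes exclusive ownership of the cache line through the
       coherency protocol, so no concurrent update can be lost and,
       in the common cached case, no bus lock is asserted. */
    atomic_fetch_add(&refcount, 1);
    printf("refcount = %ld\n", atomic_load(&refcount));
    return 0;
}
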
> The cache coherency mechanism is what Scott described as
> "communicat[ing] to other processors that it (and it alone) owns the
> increment target".  The cache coherency mechanism is not a trivial
> thing at all.  It introduces its own hazards and delays, and it is
> getting more complicated all the time as processors and caches get
> faster and larger.  Some time ago, CPUs hit their megahertz limits,
> and that's why we're using multicores now.  Some programming-language
> researchers think cache coherency is going to be the next limit, and
> are advocating languages like Erlang, which avoid shared memory and
> give each thread its own heap; or alternatively, approaches like the
> MS Singularity research OS, which relies on something like a linear
> type system to statically ensure that a given object is accessible
> to only one thread at a time.  (That approach allows transferring
> objects between threads with no locks or copying required.)
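
To make the no-shared-memory idea concrete, here is a hypothetical C
sketch of transferring ownership of an object between threads; the
one-slot mailbox and all the names are mine, not Erlang's or
Singularity's actual machinery.  Only a pointer crosses the thread
boundary, so nothing is copied, and only the handoff itself needs a
lock; the object's contents are never touched by two threads at once:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { long data[1024]; } big_object;

static big_object *mailbox;       /* one-slot channel */
static pthread_mutex_t mb_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  mb_full = PTHREAD_COND_INITIALIZER;

static void *consumer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mb_lock);
    while (mailbox == NULL)               /* wait for the handoff */
        pthread_cond_wait(&mb_full, &mb_lock);
    big_object *obj = mailbox;            /* take sole ownership */
    mailbox = NULL;
    pthread_mutex_unlock(&mb_lock);

    printf("consumer got %ld\n", obj->data[0]);  /* no lock needed */
    free(obj);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, consumer, NULL);

    big_object *obj = malloc(sizeof *obj);
    obj->data[0] = 42;                    /* producer still owns it */

    pthread_mutex_lock(&mb_lock);
    mailbox = obj;                        /* ownership transferred */
    pthread_cond_signal(&mb_full);
    pthread_mutex_unlock(&mb_lock);
    /* the producer must not touch obj after this point */

    pthread_join(t, NULL);
    return 0;
}

Erlang and Singularity enforce that discipline in the language or
type system; in plain C it is only a convention.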
How much difference would it make if the reference counts weren't in
cached memory? I'm thinking that an object could have a pointer to its
reference count, which would be stored elsewhere in some uncached memory.
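
Roughly this layout, as a hypothetical C sketch; malloc stands in for
whatever uncached mapping would actually be used, and the names are
made up for illustration:

#include <stdlib.h>

typedef struct {
    long *refcnt;    /* points into a separate (uncached) region */
    /* ... object payload ... */
} object;

/* Stand-in for an allocator over uncached memory. */
static long *alloc_refcnt(void)
{
    long *rc = malloc(sizeof *rc);
    *rc = 1;
    return rc;
}

/* Note: these are not atomic.  Even on uncached memory, a
   multiprocessor build would still need a locked instruction or
   C11 atomics, and every incref/decref would now pay a full
   memory round-trip instead of a likely cache hit. */
static void incref(object *o) { ++*o->refcnt; }

static void decref(object *o)
{
    if (--*o->refcnt == 0) {
        free(o->refcnt);
        free(o);
    }
}

int main(void)
{
    object *o = malloc(sizeof *o);
    o->refcnt = alloc_refcnt();   /* refcount lives elsewhere */
    incref(o);                    /* two references now */
    decref(o);
    decref(o);                    /* hits zero; both freed */
    return 0;
}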