Paul Rubin wrote:
Ross Ridge <rri...@csclub.uwaterloo.ca> writes:
Scott David Daniels <scott.dani...@acm.org> wrote:
The opcode cannot simply talk to its cache; it must either go directly to off-chip memory or communicate to other processors that it (and it alone) owns the increment target.

The cache coherency mechanism automatically prevents two or more processors that have cached the same area of memory from simultaneously modifying data in that area.

The same cache coherency mechanism that prevents ordinary "unlocked" instructions on two different processors from simultaneously modifying the same cache line also provides the guarantee for "locked" instructions. There are no additional hardware locks involved, and no additional communication required.
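
To make the "locked instruction" point concrete, here is a minimal sketch in C11 of an atomic reference-count increment (the variable and function names are illustrative, not CPython's actual API). On x86, atomic_fetch_add typically compiles to a single LOCK-prefixed read-modify-write instruction, and it is the cache coherency protocol that makes it atomic:

    /* Minimal sketch: atomic refcount operations using C11 atomics. */
    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_long refcount = 1;    /* illustrative shared count */

    static void incref(void)
    {
        /* Atomic read-modify-write; no separate mutex needed. */
        atomic_fetch_add(&refcount, 1);
    }

    static long decref(void)
    {
        /* Returns the new count so the caller can detect zero. */
        return atomic_fetch_add(&refcount, -1) - 1;
    }

    int main(void)
    {
        incref();
        if (decref() == 1)
            printf("count back to 1\n");
        return 0;
    }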

The cache coherency mechanism is what Scott described as "communicat[ing] to other processors that it (and it alone) owns the increment target". The cache coherency mechanism is not a trivial thing at all. It introduces its own hazards and delays, and it is getting more complicated all the time as processors and caches get faster and larger. Some time ago, CPUs hit their clock-speed limits, which is why we use multicores now. Some programming-language researchers think cache coherency is going to be the next limit, and are advocating either languages like Erlang, which avoid shared memory and give each thread its own heap, or approaches like the Microsoft Singularity research OS, which relies on something like a linear type system to statically ensure that a given object is accessible to only one thread at a time. (That approach allows transferring objects between threads with no locks or copying required.)
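
As a rough illustration of that no-locks-no-copying idea (not Singularity's actual mechanism; the Msg and worker names here are made up), a pointer can be sent through a pipe so that the object itself is never copied and is touched by only one thread at a time:

    /* Rough sketch: ownership transfer by message passing in C.
     * Only the pointer crosses the channel; the object is never
     * copied and is used by one thread at a time. */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    typedef struct { long payload; } Msg;   /* made-up message type */

    static int chan[2];   /* pipe: chan[0] read end, chan[1] write end */

    static void *worker(void *arg)
    {
        Msg *m;
        (void)arg;
        /* Receive the pointer; this thread now owns *m. */
        if (read(chan[0], &m, sizeof m) == (ssize_t)sizeof m) {
            printf("worker got payload %ld\n", m->payload);
            free(m);   /* the owner frees it; no lock, no refcount */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        Msg *m;

        if (pipe(chan) != 0)
            return 1;
        pthread_create(&t, NULL, worker, NULL);

        m = malloc(sizeof *m);
        m->payload = 42;
        /* Send the pointer and stop using it: ownership moves. */
        write(chan[1], &m, sizeof m);
        m = NULL;

        pthread_join(t, NULL);
        return 0;
    }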

How much difference would it make if the reference counts weren't in
cached memory? I'm thinking that an object could have a pointer to its
reference count, which would be stored elsewhere in some uncached memory.
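
A sketch of the layout you describe, in C (PyObjLike and alloc_refcount are hypothetical names; mapping truly uncached pages needs OS or driver support, so an ordinary malloc pool stands in for it here):

    /* Sketch: the object holds only a pointer to its refcount,
     * which could then live in a separate (e.g. uncached) region. */
    #include <stdatomic.h>
    #include <stdlib.h>

    typedef struct {
        atomic_long *refcnt;   /* stored elsewhere, not in the object */
        /* ... object data ... */
    } PyObjLike;               /* hypothetical name */

    static atomic_long *alloc_refcount(void)
    {
        /* Stand-in for an allocator over an uncached memory pool. */
        atomic_long *rc = malloc(sizeof *rc);
        if (rc)
            atomic_init(rc, 1);
        return rc;
    }

One trade-off worth noting: every refcount operation would then pay full memory latency instead of a cache hit, so this approach swaps coherency traffic for uncached-access cost.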
