Re: deadlocks and race conditions

Mark Volkmann Wed, 19 Aug 2009 10:44:31 -0700

On Wed, Aug 19, 2009 at 12:18 PM, Rich Hickey<richhic...@gmail.com> wrote:
>
> On Mon, Aug 17, 2009 at 4:16 PM, Mark Volkmann<r.mark.volkm...@gmail.com> 
> wrote:
>>
>> On Mon, Aug 17, 2009 at 2:47 PM, Michael Reid<kid.me...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> Ditto on the ordering example. Clojure can't infer which order your
>>> code needs to run any more than it can figure out what your code is
>>> supposed to do.
>>
>> Okay. I was pretty sure that was the case, but just wanted to verify
>> that there wasn't something special going on that I was missing.
>>
>>> On the deadlock question, it is my understanding from a prior post by
>>> Rich that Clojure's STM acquires locks in such a way that deadlocks
>>> cannot occur (as I recall, the locks are sorted by numeric ID an
>>> acquired/released in that order). If a deadlock occurs in the STM, my
>>> understanding is that would be a bug.
>>>
>>> So yes, the Clojure STM should guarantee that a deadlock situation
>>> does not occur.
>>
>> I think it's a common misconception that STM in general just manages
>> locks for you and basically uses them in the same way you would if you
>> were hand-coding lock-based concurrency.
>
> How could that possibly be the case? Lock strategies are open and
> arbitrary. Also, some STMs don't use locks at all. What an STM that
> uses locks should do is use them correctly. Every time you hand-code
> lock-based concurrency you risk getting it wrong. If an STM
> implementation gets it wrong, it gets fixed and everyone benefits from
> the fix.


I agree with everything you said above. I'm explaining what I have
witnessed as a misconception. For example, a C++ guy I know said that
STM implementation "use locks like crazy". I'm trying to explain that
that isn't the case in the Clojure STM implementation.

>> I can say with certainty
>> after studying the Clojure STM implementation for over a month that
>> that isn't the case.
>
> What exactly are you saying?

I'm saying that the Clojure STM implemenation doesn't "use locks like
crazy". For example, a lock is not held for every Ref that is modified
in a transaction. Instead, the Ref gets its tinfo field set to
indicate that a transaction has modified it, but hasn't yet committed
the change.

>> Refs in Clojure STM each have a ReentrantReadWriteLock object. Any
>> number of concurrent transactions can hold the read lock for a Ref OR
>> a single transaction can hold the write lock for a Ref. The only time
>> one of these locks is held for the duration of a transaction is when
>> you call ensure on a Ref. In that case it will hold a read lock until
>> either the Ref is modified in the transaction or until the transaction
>> commits. There is no circumstance under which a write lock will be
>> held for the duration of a transaction. Write locks are obtained only
>> briefly when certain things occur during a transaction and then
>> quickly released. They are acquired again when the transaction commits
>> and then released when the commit completes. The way that one
>> transaction knows that another one has modified a Ref and hasn't yet
>> committed is that the tinfo field of the Ref is set to refer to an
>> object that describes the transaction that modified the Ref (including
>> the order in which it started relative to others and its current
>> status).
>>
>> So locks are not sorted by a numeric ID.
>
> This is an oversimplification and simply wrong. You'll need to spend
> more time studying the implementation. In particular, commuting uses
> an ordered lock strategy.

Is there something specific that I said above that is wrong, because
based on my reading of the code I think it is all correct. I just
discovered what you are talking about with ordered locking for
commutes! See my next comment below.

>> It is transaction starts,
>> tries and commits that are numbered that way. It's a beautiful and
>> complicated strategy ... but it leaves me wondering whether there is
>> any possibility of deadlock. I don't see how it could happen. I'm just
>> asking in case I overlooked something.
>
> Deadlock can be precluded by avoiding mutually incompatible multi-lock
> acquisitions, and/or utilizing non-forever-blocking (timeout-based)
> lock acquisition. In Clojure's STM several strategies are combined.
> ensures use mutually compatible readlocks, and can thus hold them over
> time. Writes use write locks, but - a) try to obtain them only with a
> timeout, and b) don't hold more than one at a time other than during a
> commit, c) use a tagging system during the body of the transaction.
> So, they obtain a write lock temporarily, then tag as owned by this
> transaction, then release. On commit, everything written to will have
> been successfully tagged, and those locks can then be obtained in any
> order, since any other writing transaction vying for the same refs
> will have failed. Thus there can be no lock A wait for B/lock B wait
> for A deadlocks. Commutes always proceed (without tagging), but need
> to coordinate with other writes and commutes on commit. Thus commute
> locks are obtained in order

Ah ... I just saw a detail I had overlooked. The commutes are stored
in a TreeMap, not a HashMap. So that's where the order comes from.
They are sorted on the Ref ids which are assigned sequentially as the
Ref objects are created.

> (with timeouts to avoid clashes with
> writes), contrary to what you've said, based upon comparability of
> refs utilizing numeric IDs. This ensures that multiple commutes
> involving overlapping sets of refs use compatible lock orders.

Cool! I get that now.

> These are well known techniques for safe manual lock based
> concurrency, simply automated by Clojure's STM, so I don't know what
> you are talking about above. I've done a ton of manual concurrency and
> have used all of these techniques. That said, there are many ways to
> combine these and others and the implementation is subject to change.
> Marrying your understanding of STM to understanding this particular
> implementation is tenuous. Not that there's anything wrong with trying
> to understand it, but going from the specific to the general is
> dangerous, and many details *are likely to change* as it gets refined.
>
> If a transaction can't succeed in delivering the semantics, there will
> be retries, which are not problems but the expected behavior. Livelock
> would be possible, but is avoided by the retry limit. If you encounter
> the retry limit it means you will have to re-architect your units of
> work (sizes, durations, degrees of contention, commute vs set), and
> would have to make similar changes with any strategy.

Thanks for the feedback Rich!

-- 
R. Mark Volkmann
Object Computing, Inc.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---

Re: deadlocks and race conditions

Reply via email to