Re: STM criticisms from Bryan Cantrill (Sun)

cliffc Wed, 05 Nov 2008 09:40:40 -0800

On Nov 4, 10:53 am, Rich Hickey <[EMAIL PROTECTED]> wrote:
> Once detected, a deadlock can still be a bear to reproduce/debug, and
> often does not appear until the worst possible time - production.

So far, my experience (both direct & observed from others) with
deadlocks has been:
- indeed you get nailed in production
- the post-mortem stack traces are good enough to figure out what
happened
- the fix forces you to think through the concurrency aspects of your
code, but if you manage to do that then the bug stays fixed (if you
fail to think it thru, e.g. "I'll just yank this lock" you are
generally in for a world of hurt)
- HotSpot is a ~500KLoC highly concurrent program with ~100
named unique locks; we use aggressive lock-ranking asserts, and in all
the years of hacking HotSpot I've only ever seen 2 deadlocks (there
were lots before the aggressive lock-ranking asserts - and those 2
deadlocks happened because some more junior engineer skirted the
asserts rather than fixing the potential deadlock).
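
For readers who haven't met lock-ranking asserts, here is a minimal sketch of the idea in Java. It is not HotSpot's code: the class names, the integer ranks, and the "strictly increasing rank" rule are all assumptions made for illustration. Each thread remembers the ranks it holds, and taking a lock out of order fails immediately, whether or not a deadlock would actually have happened on that run.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of a lock-ranking check; not HotSpot's actual code.
final class RankedLock {
    // Ranks of the locks held by the current thread, most recent on top.
    private static final ThreadLocal<Deque<Integer>> HELD =
            ThreadLocal.withInitial(ArrayDeque::new);

    private final String name;
    private final int rank;
    private final ReentrantLock lock = new ReentrantLock();

    RankedLock(String name, int rank) {
        this.name = name;
        this.rank = rank;
    }

    void lock() {
        Deque<Integer> held = HELD.get();
        // Assumed rule: a thread may only take locks of strictly increasing rank.
        if (!held.isEmpty() && held.peek() >= rank) {
            throw new IllegalStateException(
                    "lock-rank violation: acquiring " + name + " (rank " + rank
                    + ") while holding rank " + held.peek());
        }
        lock.lock();
        held.push(rank);
    }

    // Assumes locks are released in the reverse order they were taken.
    void unlock() {
        HELD.get().pop();
        lock.unlock();
    }
}

final class LockRankDemo {
    // Returns true if taking `first` and then `second` passes the rank check.
    static boolean orderAllowed(RankedLock first, RankedLock second) {
        first.lock();
        try {
            second.lock();
            second.unlock();
            return true;
        } catch (IllegalStateException e) {
            return false;
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        RankedLock low = new RankedLock("low", 1);    // hypothetical lock names
        RankedLock high = new RankedLock("high", 2);
        System.out.println(orderAllowed(low, high));  // true
        System.out.println(orderAllowed(high, low));  // false
    }
}
```

The useful property is that the check is deterministic: a single-threaded test that walks the bad path trips it, so the ordering bug shows up long before two threads happen to interleave badly in production.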


> What's even more insidious is the memory not accessed under a lock
> that should have been, and the ensuing corruption.

No argument there.  The #1 bug for Azul (and we train all our SEs to
look for it) is HashMap corruption leading to a closed-cycle linked
list, with threads stuck forever spinning down the infinite list.
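
As a rough illustration of the usual fix, the sketch below has several threads update one shared map through ConcurrentHashMap, whose merge() is documented to be atomic. The class name, thread count, and key range are made up for the example. Passing a plain java.util.HashMap to the same method is exactly the unsynchronized access described above, so don't expect a stable answer from it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; the names and sizes are illustrative only.
final class SharedCounterDemo {
    // Starts `threads` writers that each perform `perThread` increments
    // spread over 64 keys, waits for them, and returns the sum of all counts.
    static int hammer(Map<Integer, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    // Atomic on ConcurrentHashMap; NOT safe on a plain HashMap.
                    map.merge(i % 64, 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        }
        int total = 0;
        for (int v : map.values()) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(hammer(new ConcurrentHashMap<>(), 4, 10_000)); // 40000
    }
}
```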


> As far as livelock, if you get it to happen
> it is usually due to long-running transactions competing with short
> ones, and you can readily see/reproduce it under test loads.

Here I'll disagree.  Clojure uses a particular STM implementation, but
does not dictate the implementation at the language level.  Different
STM implementations have wildly different performance characteristics,
and there's a rich academic literature on how bad it gets.  "In
practice"  (for such little "practice" as there is; Clojure might well
rapidly have the most "practice") STMs indeed "get bad" unless they
are pampered well (i.e. expert tuning).  The problem is: what "gets
bad" varies wildly by STM implementation - so fixing the "long
transactions are getting live-locked by short runs" problem via hacking
the STM will surely performance-break some other program.
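
To make the "long transactions live-locked by short ones" case concrete, here is a toy, single-threaded simulation of one possible policy: first committer wins, and the loser retries. It is not Clojure's STM or any real one; the versioned cell and the fixed interference schedule are assumptions chosen so the outcome is repeatable. Other STMs pick other contention policies, which is the point above about "what gets bad" varying by implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of optimistic commit-or-retry; not any real STM.
final class OptimisticRetryDemo {
    static final class VersionedCell {
        final AtomicLong version = new AtomicLong();

        // A commit succeeds only if nobody else committed since `seenVersion` was read.
        boolean tryCommit(long seenVersion) {
            return version.compareAndSet(seenVersion, seenVersion + 1);
        }
    }

    // Counts the attempts a long transaction needs when a short transaction
    // commits during each of its first `interferingCommits` attempts.
    static int attemptsNeeded(int interferingCommits) {
        VersionedCell cell = new VersionedCell();
        int attempts = 0;
        while (true) {
            attempts++;
            long seen = cell.version.get();           // long transaction starts
            if (attempts <= interferingCommits) {
                cell.tryCommit(cell.version.get());   // a short transaction gets in first
            }
            if (cell.tryCommit(seen)) {               // long transaction tries to commit
                return attempts;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(attemptsNeeded(0)); // 1
        System.out.println(attemptsNeeded(5)); // 6
    }
}
```

In this model the long transaction makes progress only once the short ones stop arriving; under a steady stream of them it would retry forever.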

Here's where I think the root of my anguish lies: the STM
runtime & performance is opaque, so changing either it OR my program to
get performance is a hit-or-miss affair.  At least with locks I can
tell you something about what's going on, and make some kind of
engineering guess as to what hacks will help & why.

Random almost-unrelated correlation: making a long transaction into a
series of short ones for performance looks a whole lot like making
fine-grained locking from coarse-grained locking:
- You are breaking the atomic/locked region into parts
- You wouldn't bother except performance sucks otherwise
- The danger is in missing a path that needs to be atomic
- Failure will be rare, unpredictable (untestable), unreproducible and
generally only under heavy load (i.e., production).
- Clojure's Immutability won't save you here; you still must transact
around the correct set of Refs.
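
The list above can be sketched with locks in Java; splitting one transaction into two has the same shape. Everything here is hypothetical: the two balances, the invariant that their sum stays at 100, and the method names. The split version samples the total in the gap between its two short regions, which is what any other thread could observe at that moment.

```java
// Hypothetical example: two balances whose sum is supposed to stay constant.
final class SplitRegionDemo {
    private final Object lock = new Object();
    private int from = 100;
    private int to = 0;

    int total() {
        synchronized (lock) {
            return from + to;
        }
    }

    // Coarse: both updates happen inside one atomic region.
    void transferCoarse(int amount) {
        synchronized (lock) {
            from -= amount;
            to += amount;
        }
    }

    // Split: two short regions. Returns the total that was visible in the gap
    // between them, i.e. what another thread could have read at that point.
    int transferSplit(int amount) {
        synchronized (lock) {
            from -= amount;
        }
        int seenInGap = total();
        synchronized (lock) {
            to += amount;
        }
        return seenInGap;
    }

    public static void main(String[] args) {
        SplitRegionDemo d = new SplitRegionDemo();
        d.transferCoarse(30);
        System.out.println(d.total());            // 100
        System.out.println(d.transferSplit(30));  // 70: the invariant is broken in the gap
        System.out.println(d.total());            // 100 again once both halves finish
    }
}
```

Each half is individually correct and the end state is fine, so a test that only checks before and after passes. The broken total is visible only to a reader that lands in the gap, which is why this kind of failure tends to show up under load.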


> Rich

Don't get me wrong: I think Clojure is on to something good here -
Immutability IS the Big Hammer For Concurrency here, and the STM is
just one of several flavors of "Big Hammer isn't working so I need
another approach".  Given the Immutability-By-Default, STM has a much
better chance of being performant than in other languages so it makes
sense to give it a go.

Cliff
