Author: allison
Date: Wed Oct 11 13:37:08 2006
New Revision: 14903

Modified:
   trunk/docs/pdds/clip/pdd25_threads.pod
Changes in other areas also in this revision:
Modified:
   trunk/ (props changed)

Log:
[EMAIL PROTECTED]: allison | 2006-10-11 13:18:42 -0700
Partial update of Threads PDD with collected wisdom from prior discussion.

Modified: trunk/docs/pdds/clip/pdd25_threads.pod
==============================================================================
--- trunk/docs/pdds/clip/pdd25_threads.pod	(original)
+++ trunk/docs/pdds/clip/pdd25_threads.pod	Wed Oct 11 13:37:08 2006
@@ -20,7 +20,748 @@

=head1 DEFINITIONS

So we can all talk about things the same way, the following definitions
apply. Some of these are drawn from the POSIX thread spec, and as such we
should have a translation section at the end.

=over 4

=item THREAD

An OS-level thread. If that makes no sense, neither will any of the rest of
this, in which case I recommend picking up "Programming with POSIX Threads"
by Dave Butenhof, and coming back when you have.

=item MUTEX

This is a low-level, under-the-hood thing that can be locked; it is not
exposed to users. Mutexes are non-recursive, non-read/write, exclusive
things. When a thread holds a mutex, any other attempt to get that mutex
will block until the owning thread releases it. The platform-native lock
construct will be used for this.

{{ - Nigel Sandever: Will this be macroised to hide the platform-native
implementation from the main body of the code?
- Dan: Yes. }}

=item LOCK

This is an exposed-to-HLL-code thing that can be locked. Only PMCs can be
locked, and the lock may or may not be recursive or read/write.

=item CONDITION VARIABLE

The "sleep until something pings me" construct. Useful for queue
construction. Not conceptually associated with a MUTEX. (POSIX threads
require one, so we're going to hide that behind macros and/or functions.)

{{ - Nigel Sandever: Could you elaborate on the nature of what would
constitute a "ping"?
- Dan: POSIX has a function cond_signal (and cond_broadcast, which is
similar). What happens is a thread grabs the condition variable and goes to
sleep, and sleeps until another thread does a cond_signal, which then wakes
it up. If there are multiple threads sleeping on the condition variable,
only one is woken. (cond_broadcast wakes them all up.) }}
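The POSIX behavior Dan describes maps onto C<pthread_cond_wait> and
C<pthread_cond_signal> like this (a minimal sketch, not Parrot's eventual
macro layer; the queue names are purely illustrative):

  #include <pthread.h>

  /* The "sleep until something pings me" pattern.  POSIX ties the
   * condition variable to a mutex and allows spurious wakeups, which
   * is why the sleeper re-checks its predicate in a loop. */
  static pthread_mutex_t queue_lock     = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t  queue_ping     = PTHREAD_COND_INITIALIZER;
  static int             queue_has_work = 0;

  void consumer_wait(void)
  {
      pthread_mutex_lock(&queue_lock);
      while (!queue_has_work)            /* guard against spurious wakeups */
          pthread_cond_wait(&queue_ping, &queue_lock);
      queue_has_work = 0;
      pthread_mutex_unlock(&queue_lock);
  }

  void producer_ping(void)
  {
      pthread_mutex_lock(&queue_lock);
      queue_has_work = 1;
      pthread_cond_signal(&queue_ping);  /* wakes exactly one sleeper */
      /* pthread_cond_broadcast() would wake all of them */
      pthread_mutex_unlock(&queue_lock);
  }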
=item RENDEZVOUS POINT

An HLL version of a condition variable.

=item INTERPRETER

Those bits of the Parrot_Interp structure that are absolutely required to be
thread-specific. This includes the current register sets and stack pointers,
as well as security context information. Basically, if a continuation
captures it, it's the interpreter.

=item INTERPRETER ENVIRONMENT

Those bits of the Parrot_Interp structure that aren't required to be
thread-specific (though I'm not sure there are any) I<PLUS> anything pointed
to that doesn't have to be thread-specific.

The environment includes the global namespaces, pads, stack chunks, memory
allocation regions, arenas, and whatnots. Just because the pointer to the
current pad is thread-specific doesn't mean the pad I<itself> has to be. It
can be shared.

=item INDEPENDENT THREAD

A thread that has no contact I<AT ALL> with the internal data of any other
thread in the current process. Independent threads need no synchronization
for anything other than what few global things we have. And the fewer the
better, though alas we can't have none at all.

Note that independent threads may still communicate back and forth by
passing either atomic things (ints, floats, and pointers) or static buffers
that can become the property of the destination thread.

=item SHARED THREAD

A thread that's part of a group of threads sharing a common interpreter
environment.

{{ - Leo: I presume that this "group of threads" is one thread pool or
interpreter pool. Could you expand the definition to cover "pool"?
- Dan: Ah, damn, I needed to fix that up before sending it out. Should've
been "SHARED INTERPRETER". A shared interpreter is one that's in an
interpreter pool.

An interpreter pool is a set of interpreters that share common resources.
They're essentially threads in the OS sense, and they have shared access to
pretty much everything. The memory pools, arenas, and global namespace are
shared--pretty much everything except what's in the interpreter structure
itself. }}

=back

=head1 REQUIREMENTS

=head2 Supported Models

=over 4

=item POSIX threads

The threading scheme must be sufficient to support a POSIX share-everything
style of threading, such as is used in perl 5's "pthread" model, as well as
the thread models for Ruby and Python.

=item "Process-type" threads

The scheme must also support the perl 5 "iThreads" threading model. In this
model no data is shared implicitly, and all sharing must be done on purpose
and explicitly. It very much resembles the Unix
fork-process-with-shared-memory-segment model, which is no surprise, as it
was originally developed with emulation of Unix's fork system in mind.

{{ - Nigel Sandever: Pseudo-forks?
- Dan: Yeah. On Win32, when Perl forks it starts a new thread and clones the
interpreter and its contents. If you later do an exec in that thread it
starts a new process and grabs hold of it. Definitely a clever trick. }}

=back

=head2 Guarantees

{{ - Dave Whipp: Maybe this isn't strictly a threading thing, but are we
going to make any guarantees about event orderings? For example, will we
guarantee that a sequence of events sent from one thread to another will
always be received in the order they are sent? (Obviously no guarantees
about interleaving of events from other threads.) This is usually only
important in distributed environments where multiple paths exist between a
pair of endpoints, but it would be nice for user code to be able to rely on
it. }}

=over 4

=item No Crashes

The interpreter guarantees that no user program errors of any sort will
crash the interpreter. This includes threading problems. As such,
synchronization issues (where multiple interpreters are accessing the same
shared data) must not crash the interpreter or corrupt its internal state.

=back

=head2 Assumptions

=over 4

=item System memory allocation routines are threadsafe

We are assuming that the memory allocation system of the base OS is
threadsafe. While some of the C runtime libraries are notoriously thread
dangerous, memory allocation code generally is threadsafe, and we'll assume
that on all platforms. (Though we will, in general, manage our own memory.)
{{ - Nigel Sandever: Is there any likelihood that memory allocation will be
hidden behind macros at two levels:

  - ALLOCPOOL() for allocating large chunks of memory (pools) that are
    later sub-allocated (and managed) by:
  - SUBALLOC() for sub-allocating within a pool

- Dan: We'll generally hide behind the memory.c routines, if for no other
reason than to allow the embedder to override the memory functions. We need
to define that at some point...
- Gordon Henriksen: Are you wanting something akin to Apache 2 pools, which
are hierarchical and designed to reduce path length when freeing blocks of
objects? For instance, all objects and sub-pools allocated during an HTTP
request cycle can be deallocated just by free()'ing the top-level request
pool.

I don't think parrot could make use of that model, since it can't very well
guarantee that the user cannot retain references past the lifetime of the
pool. Apache trusts modules not to make such errors; parrot can't trust the
bytecode it's executing any further than it can throw it. A generational
collector is a more likely means by which parrot might reduce memory-related
overhead.
- Nigel Sandever: Nothing to do with Apache memory pools.

I believe that parrot already has the concept of memory pools in its memory
management. The idea is that by allocating similarly sized objects from
separate (large) allocations, you can reduce the fragmentation of chunks and
reduce the incidences where the memory needs to be GC'd and compacted.

Allocating an 8-byte chunk from a common memory pool is quite likely to nip
a little off from a previously freed large chunk. When it comes time to
reallocate another chunk the same size as that large, freed chunk, although
there is enough room in the overall freespace chain to accommodate it, the
largest available chunk is now 8 bytes or so too small for the requirement.

That induces either a compaction cycle or the need to extend the memory pool
by the size of the large request.

Allocating all small requests from one pool, and large requests from
another, means that you are less likely to fragment the memory and more
likely to be able to re-use an existing slot in the free-space chain for any
given request.

If the allocation of pools, and the allocation of bits-of-a-pool, are
macroised, it makes it possible for OSes where there are multiple APIs for
memory allocation to bypass the CRT memory allocation routines and use
whichever native APIs are best suited to the type of allocation.

Personally, I would like to see memory allocation for each class type be
managed by the class constructor itself. This would theoretically allow each
class that has a fixed instance size to manage its own pool on OSes where
that makes sense. The class would allocate a pool for itself when loaded,
then allocate instances from that pool on new() and deallocate upon DESTROY.
If its memory pool was exhausted when new() was called, it would invoke the
GC on *its pool only*.

This separation would mean that each run of the GC would have a much smaller
pool of memory to compact and garbage collect when it was invoked. It would
also be less likely to be called, as each allocation from a pool of
fixed-sized sub-allocations will only ever need to call the GC when its pool
is entirely exhausted.

But that is a radical departure :), so I would just like to see separate
calls for pool allocation/reallocation and element allocation/reallocation,
rather than having calls to malloc() scattered throughout the codebase.

- Gordon Henriksen: Don't presume that [allocating similarly sized objects
reduces fragmentation and GC]. General-purpose memory allocators tend to
handle this scenario... generically. Use of custom allocators is usually a
premature optimization; the best generic memory allocators are very, very
good.

But the ultimate effect of the overuse of custom allocators or pools is to
move memory fragmentation to a higher level, preventing the generic
allocator from addressing it even when the memory allocation patterns would
otherwise allow it.

I would much rather see [the allocation of pools] abstracted, as they are
now, beneath an API. This reduces the number of ways in which compiled
parrot extensions can be incompatible with the parrot core.

A PMC class is free to [manage the memory allocation for its own class
constructor]. Only the PMC header cannot be thus managed, and that's already
pooled.

Th[e idea that such a separation means a smaller pool to compact and garbage
collect] is false. The mark phase will still need to run over the entire
process, else it cannot detect all references into the pool. (Unless you can
provide a type-safety guarantee that your type can only be referenced by
other instances of itself. In which case, your type is necessarily "garbage"
and the best optimization strategy would be dead-code elimination. :)

I predict the best overall throughput would be with the sweep or copy phase
run immediately after the mark phase, across the entire process: it would be
wasteful to do all that marking and yet leave known garbage uncollected.

Statistically, multiple pools will individually exhaust themselves *MORE*
frequently than a global pool. The ideal case for collection frequency is
that there is only one pool, or that all pools rise to capacity
concurrently. In these ideal cases, the segregated pools will exhaust
themselves precisely *AS* frequently as a global pool. In any case, there is
no possibility for a decrease in collection events through the use of pooled
memory. As per above, collection events will also not become less expensive.
Thus, expect garbage collection to have a negative net impact on performance
if pooled memory is used.

- Leo: The problem is not in the fixed-sized header pools; it's with the
*memory* pool used, e.g., for string memory.

During GC we are walking the *header* pools, and if we find a header's
buffer memory being used, we move that buffer memory to a new store, thereby
compacting it. The old memory store(s) are freed then.

So string memory can move around beyond your code. }}
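Dan's point about hiding allocation behind the memory.c routines, so that
the embedder can override the memory functions, might look something like
the following sketch (all names here are hypothetical; as Dan says, the
actual interface is still to be defined):

  #include <stdlib.h>

  /* Hypothetical embedder-overridable allocation entry points.  All
   * internal code calls the wrappers, never malloc() directly, so an
   * embedder can swap the underlying allocator in exactly one place. */
  typedef struct Parrot_Allocator {
      void *(*alloc)(size_t size);
      void *(*resize)(void *ptr, size_t size);
      void  (*dealloc)(void *ptr);
  } Parrot_Allocator;

  /* Defaults to the (assumed threadsafe) C runtime allocator. */
  static Parrot_Allocator parrot_allocator = { malloc, realloc, free };

  /* Embedder override; call before interpreter initialization. */
  void Parrot_set_allocator(const Parrot_Allocator *a)
  {
      parrot_allocator = *a;
  }

  void *mem_sys_allocate(size_t size)
  {
      return parrot_allocator.alloc(size);
  }

  void mem_sys_free(void *ptr)
  {
      parrot_allocator.dealloc(ptr);
  }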
=back

=head1 PROPOSAL

The proposal is as follows:

=over 4

=item All global state shall be protected by mutexes

Straightforward enough. This allows independent threads to coexist without
threatening the state of the process. A sketch of what this looks like,
using the platform-lock macros from the MUTEX definition, follows.
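The platform-native mutex Dan agreed to macroise, wrapped around a piece of
global state (a sketch; the macro and table names are hypothetical):

  #ifdef _WIN32
  #  include <windows.h>
     typedef CRITICAL_SECTION Parrot_mutex;
  #  define MUTEX_INIT(m)    InitializeCriticalSection(&(m))
  #  define MUTEX_LOCK(m)    EnterCriticalSection(&(m))
  #  define MUTEX_UNLOCK(m)  LeaveCriticalSection(&(m))
  #else
  #  include <pthread.h>
     typedef pthread_mutex_t Parrot_mutex;
  #  define MUTEX_INIT(m)    pthread_mutex_init(&(m), NULL)
  #  define MUTEX_LOCK(m)    pthread_mutex_lock(&(m))
  #  define MUTEX_UNLOCK(m)  pthread_mutex_unlock(&(m))
  #endif

  #define MAX_INTERPS 1024             /* arbitrary, for the sketch */

  struct Interp;                       /* opaque here */

  /* Example global: the table of live interpreters.  Every access goes
   * through the mutex (MUTEX_INIT runs once at global initialization),
   * so independent threads can coexist safely. */
  static Parrot_mutex   interp_table_lock;
  static struct Interp *interp_table[MAX_INTERPS];
  static int            interp_count;

  int register_interpreter(struct Interp *interp)
  {
      int id;
      MUTEX_LOCK(interp_table_lock);
      id = interp_count++;
      interp_table[id] = interp;
      MUTEX_UNLOCK(interp_table_lock);
      return id;
  }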
=item Multiple independent interpreters will be allowed

Once again, straightforward enough. With threadsafe, protected global state,
there's no issue here.

=item Only one OS thread in an interpreter at once

While there is no requirement that any interpreter be tied to an underlying
OS thread, under no circumstances may multiple OS threads use a single
interpreter simultaneously.

=item A stop-and-copy communication method will be provided

Parrot will provide a function to make a call into another interpreter and
wait for that call to complete. This call may pass in data and have data
returned to it. The interpreter making the call will block until the call is
complete. The data passed in as parameters will be copied into the called
interpreter, and any return values will be copied back into the calling
interpreter. The called interpreter will block while the return data is
copied back into the calling interpreter. The semantics are sketched below.
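In outline (every name here is hypothetical; this is a sketch of the
copy-in/copy-out semantics only, not a proposed API):

  typedef struct Interp Interp;
  typedef struct PMC    PMC;

  /* Hypothetical helpers: deep-copy a value into an interpreter's
   * pools, and run a named sub to completion. */
  PMC *deep_copy_into(Interp *dest, PMC *value);
  PMC *run_sub(Interp *interp, const char *sub_name, PMC *args);

  PMC *parrot_call_sync(Interp *caller, Interp *callee,
                        const char *sub_name, PMC *args)
  {
      /* Arguments are copied into the callee; nothing is shared. */
      PMC *args_copy = deep_copy_into(callee, args);

      /* The calling interpreter blocks until the sub completes. */
      PMC *result = run_sub(callee, sub_name, args_copy);

      /* The callee blocks while the result is copied back. */
      return deep_copy_into(caller, result);
  }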
=item Inter-interpreter events will be provided

Interpreters will be able to post events to other interpreters.

=item Each interpreter will have a unique id

This ID will be independent of the process or OS thread, and will be
constant across the lifetime of the interpreter. Interpreter IDs I<may> be
reused as interpreters are destroyed and recreated, and as such are only
guaranteed valid while an interpreter is in use.

(Note that we may decide to relax this requirement, but doing so likely
means moving to at least 64-bit integers to mark interpreter IDs.)

{{ - Nigel Sandever: The provision of method(s) to obtain the native
TIDs/HANDLEs would make life easier for those writing
implementation-specific extensions.
- Dan: TIDs, definitely. It'll be a sysinfo or interpinfo function. There's
the issue of interpreters binding to different threads, but we'll deal with
that. I'm not sure what the HANDLE is, but explain it and we can work
something out. :) }}

=item Each interpreter shows the same process id

All the interpreters within a process will share a process ID. On those
systems where each thread has its own unique ID (such as many versions of
Linux), Parrot will still report a single process ID for all interpreters.

This process ID will be the ID of the process that first instantiated
Parrot.

{{ - Leo: The term "process id" is really misleading. Again I presume that
one pool is meant here. I'd vote for keeping a process ID what it is and
using "pool id" or such here.
- Dan: Nope. It's the process ID. Nothing misleading there, unless you've
done threading work under Linux, since for a while it gave each thread a
separate PID.
- Leo:

  $ ps | grep [p]arrot
  17472 pts/0    00:00:00 parrot
  17473 pts/0    00:00:00 parrot
  17474 pts/0    00:00:00 parrot

So the "unless" applies ;) This is a single parrot interpreter, with main,
thread-manager, and event-handler threads. 3 PIDs.

- Liz Mattijsen: This _all_ depends on which version of Linux you're using.
Older versions don't do it that way, and newer versions don't do it either
(the specific versions escape me at the moment, but I know RH9 does _not_
have pids for threads).

I know, because my Thread::Signal module depends on threads having pids. But
fewer and fewer Linux systems "support" it (and Linux was the only one that
worked this way to begin with).

- Dan: Right. Linux is, as I said, arguably broken here, but I don't want to
argue that point. That's why I specified that the PID is the process ID of
whatever instantiated parrot -- in this case, no matter which thread you're
in, when you ask for the pid from parrot you should get 17472. (In this
case, at least.)

Parrot needs to grab the pid at global initialization time and store it away
for later pid queries. }}

=item Interpreter pools will share allocation pools

All the interpreters in an interpreter pool will share header and memory
allocation pools. This means that when there is more than one interpreter in
a pool, the memory allocation and collection system needs to be swapped out,
as a copying collector is generally untenable in a threaded environment.

{{ - Dan: For a copying collector to work, all the mutators must be blocked,
and arguably all readers should be blocked as well. (I think they need to
be; otherwise you may be accessing reclaimed and reused memory.) That means
if we keep a copying collector, everything that accesses strings or PMCs
must get a lock on the GC before every access of every string or PMC. A
touch excessive, I think.
- Gordon Henriksen: True of non-moving collectors, too. Or, let me put it
this way: non-copying *GC* (the sweep or copy phase) can be threadsafe, but
the mark phase is never threadsafe. The manner in which marking is not
threadsafe is a bit more pathological (i.e., it's not the common case as it
is with the copying collector), but a standard tracing DOD cannot be correct
when competing with mutators. It WILL collect non-garbage (those are
MUTATORS, remember), and the result WILL be Heisenbugs and crashes.

Some of what I've written up [AR see next comment] addresses why. It's
pretty simple to demonstrate a single case to prove the point, but I don't
feel like re-creating the ASCII art right now. :) I'll send that section
when I get out of the office.

parrot will have to be able to suspend all threads in the environment.
Unfortunate, but really quite unavoidable.

- Gordon Henriksen: Consider this simple object graph:

  Root set = { &A, &B }

  [ A=NULL]
  [ B=&C  ] ---> [ C=....]

Collection begins in thread 1 and marks A as reachable before being
preempted by thread 2:

  [xA=NULL]
  [ B=&C  ] ---> [ C=....]

Thread 2 sets A to &C, and nullifies B, before being preempted again by
thread 1:

  [xA=&C  ] ---> [ C=....]
  [ B=NULL]

Thread 1 marks B as reachable:

  [xA=&C  ] ---> [ C=....]
  [xB=NULL]

The mark phase complete, thread 1 sweeps:

  [ A=&???] ---> ???
  [ B=NULL]

C was not marked reachable (although it was reachable) and was thus
erroneously collected, leaving a dangling pointer. This problem applies
equally to copying and mark-sweep collectors.

- Dan: Ah, OK, I see. The problem comes in where we've got an object in the
transient root set (basically the processor stack and registers) that gets
anchored into the base root set (stash, pads, or whatever) after the DOD has
traced where it's going into, and falls out of the transient root set before
the DOD traces over to it.

Race condition. Dammit.

Okay, I'd not wrapped my brain around that possibility, which will make for
some interesting DOD tracing, especially on SMP systems. I was *really*
hoping a single lock on the arena allocation system, held by the DOD while
tracing, would be sufficient, but I see that it isn't.

That means we're going to have to have either a really forgiving DOD system
that takes multiple passes before it collects up a PMC or buffer (which
still isn't safe) or some way to force a low-overhead rendezvous.

The obvious rendezvous point is the arena lock, but that's going to see a
lot of contention anyway, and we'd as soon not have a single one for speed
reasons. Feh.

Okay, we're going to mandate that we have read/write locks, that each
interpreter pool has one, and that mutating vtable entries must get a read
lock on the pool read/write lock. Pricey (ick) but isolated to mutators.
The DOD gets a write lock on it, which'll block all read/write access, so no
mutators can be in process while the pool DOD runs.

I think that'll work. The .1j thread spec requires r/w locks, and we can
fake them on platforms that don't implement them. Hopefully Nigel's got the
Windows scoop, so we can see if Win32 has anything like this (which I'd
expect it does).

- Leo: That is well known in the literature as the "tri-color invariant":
black are the already-marked (live) PMCs, grey the PMCs on the next_for_GC
list, white the not-yet-reached PMCs. The strong tri-color invariant states
that no black object may point to a white object; the weak invariant states
that at least one path from a black to a white object must contain a grey
one.

This can be handled either by stop-the-world GCs or by intercepting each
read or write access that would change the color of an object and updating
the color accordingly. This is, e.g., used for incremental GC. As soon as we
have a thread in the background that runs GC, we have to cope with these
issues.

- Dan: Yeah, point. And since we want to be able to have an incremental DOD
at some point, we need to get support for it in now.

- Leo: Stopping all interpreters seems to be cheaper. The rwlock will sooner
or later stop all interpreters anyway (on first PMC access), so we can omit
the price for the rwlock and just hold the world(s).
- Dan: The rwlock only stops all the interpreters when the DOD runs.
Anything that mutates a PMC gets a *read* lock, so mutators don't interfere
with each other and only pause if the DOD is running. The DOD getting a
*write* lock will block any read lock attempts, so when the DOD is running
no mutation can take place. Since mutation doesn't require any global
exclusion, it doesn't need a write lock -- the read lock is sufficient.
- Leo: Sure. But that would still need to acquire a reader's rwlock for each
PMC access. This is more expensive than a mutex. During DOD any PMC access
will halt the interpreter, so we can do that explicitly too and save the
cost of the rwlock overhead. Though I can imagine that aggregates will need
a rwlock anyway.

- Leo: An alternative would be real background incremental GC, *when*
running multiple threads. I estimate the overhead to be in the region of a
rwlock (with no contention, of course).
- Dan: If we have the facilities to do incremental DOD runs then this is
definitely a possibility, except for finalizers. Finalizers make things
interesting, though if the background thread doing the DOD is a member of
the interpreter pool then it'd work out OK.
- Leo: Finalizers and incremental DOD don't play together. The DOD must run
to the end to be sure that an object isn't referenced any more.

- Leo: What about temporary PMCs (or strings)? Evaluating a non-trivial
expression can have a lot of these. Each new temp would need a lock on the
memory subsystem.
- Dan: Those won't get marked as shared unless we're unconditionally marking
things as shared. (Though we may just give 'em a mutex anyway.) [New temps
would need a lock] only on allocation. We could have a per-thread freelist
if we wanted. Wouldn't be unreasonable.
- Leo: This needs either one check per PMC (is it shared or not?) or the
additional cost of locking temps. Both are rather expensive, compared to the
raw working functionality of a vtable.

That already smells like separate memory subsystems. The freelist has to be
filled first. During DOD runs, it has to be refilled.
To achieve some locality of the temp PMCs, you can't just hand these out to
arbitrary interpreters. Separate free lists also seem to imply separate DOD
runs to clean up the temps.

- Leo: Combined with the cost of "All shared PMCs must have a threadsafe
vtable - The first thing that any vtable function of a shared PMC must do is
to acquire the mutex of the PMCs in its parameter list" -- i.e., typically 3
mutexes -- it could be that the vtable of a shared PMC should just grab one
interpreter lock (of the owner of that PMC) and that not everything is
shared (especially not the temps).
- Dan: Yeah, but since PMCs aren't owned by any one interpreter in the pool
but are rather pool-wide, you run into either the global-lock problem (which
kills performance on SMP systems) or interpreters potentially deadlocking on
unrelated data access.
- Leo: Could be. But the performance is the overall throughput. When a lot
of fine-grained locks (plus memory subsystem locks for temps) take much
longer than one global lock, then that scheme can nevertheless be slower on
SMP machines. It would scale better, though, for more CPUs.

Deadlocks shouldn't be a problem if exactly one vtable (like in SharedRef)
just grabs one and only one lock. }}

As the allocation and collection system is a black box to user programs and
much of the interpreter internals, this isn't a big deal outside of needing
swappable allocation systems, the potential issue of COW'd shared memory
leaking, and the need to switch allocation schemes mid-execution.
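Dan's pool-wide read/write lock scheme from the discussion above maps onto
the POSIX .1j primitives like this (a sketch; the type and function names
are hypothetical):

  #include <pthread.h>

  typedef struct Interp_Pool {
      pthread_rwlock_t dod_lock;   /* one per interpreter pool */
      /* ... shared arenas, memory pools, global namespace ... */
  } Interp_Pool;

  /* Mutators take the lock in read mode, so they never block each
   * other; they only pause while the DOD holds the write lock. */
  void pmc_mutate(Interp_Pool *pool /*, PMC *pmc, ... */)
  {
      pthread_rwlock_rdlock(&pool->dod_lock);
      /* ... mutate the PMC ... */
      pthread_rwlock_unlock(&pool->dod_lock);
  }

  /* The DOD takes the lock in write mode, which drains and excludes
   * all mutators for the duration of the trace. */
  void pool_dod_run(Interp_Pool *pool)
  {
      pthread_rwlock_wrlock(&pool->dod_lock);
      /* ... trace the root set, then sweep or copy ... */
      pthread_rwlock_unlock(&pool->dod_lock);
  }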
=item Each interpreter has a separate event queue

Some events, such as timers, may be interpreter-specific and, as such, each
interpreter has its own event queue.

=item Each interpreter pool has a shared event queue

Some events, such as IO callbacks, may not be interpreter-specific, and can
be serviced by any interpreter in the interpreter pool. For these events,
there is a pool-wide event queue.

=item PMCs are the coordination point for threads

That is, only PMCs are shared as such between threads. Strings,
specifically, are I<not> shared between interpreters as such.

=item All PMCs shared amongst interpreters in a pool must be marked shared

A PMC which is not marked shared may not be handed to another interpreter.
Parrot will prevent this from happening either by marking the PMC as shared,
or by throwing an exception when the PMC is placed in a spot where it may be
shared but is not shareable.

=item All shared PMCs must have a threadsafe vtable

The first thing that any vtable function of a shared PMC must do is acquire
the mutexes of the PMCs in its parameter list, in ascending address order.
The mutexes need not be released in any particular order. (A sketch of the
ordering follows the discussion below.)

{{ - Uri: Why the ascending address order to grab mutexes? Is this to help
avoid deadlocks?
- Dan: Yes. The recommendation I've always seen for deadlock avoidance is to
always have all your code grab its mutexes in some fixed order. With source,
it's generally recommended that you grab mutex variables in lexical order.
Since we're all binary, we need a different order, and ascending addresses
are reasonably simple to do.
- Gordon Henriksen: Note that this locking strategy cannot avoid deadlock if
the user is allowed to acquire these locks--HLL locks must be altogether
different beasts from automatic PMC locks. That's okay. Just a design
consequence worth noting for everyone.
- Dan: Oh, arguably it can't avoid deadlock at all, what with vtable methods
having access to the full environment. I can live with deadlocks, only
because there's no real alternative.
- Gordon Henriksen: But PMC implementations have to fall inside of the
trusted environment, so that's not really a failure. Of course uncooperative
code can break a cooperative algorithm.

- Sam Vilain: RE: "have all your code grab its mutexes in some fixed order."

Yes; otherwise, you need to back off and start again if one lock acquisition
fails.

Consider these functions; for the purpose of this example, lexical lock
ordering is implied:

  func1($AAAA, $CCCC, $FFFF, $GGGG, $KKKK);
  func2($BBBB, $DDDD, $FFFF, $IIII, $KKKK);
  func3($FFFF, $KKKK);

So long as the locks are ramped up from the lowest to the highest, there is
no chance of func1 acquiring a lock to be held by func2 (e.g., $KKKK) if
that other function already has one of the shared dependencies (e.g.,
$FFFF).

All well and good. But what about recursive locks? I.e.:

  sub func1($one is locked, $two is locked, $three is locked) {
      for my $x ($one, $two, $three) {
          func2($x.property) if $x.property;
      }
  }

  sub func2($eins is locked) {
      if ($eins.property) {
          func2($eins.property);
      }
  }

  $aaa.property = { };
  $bbb.property.property = $aaa;
  $ccc = { };

  if (thork()) {  # fork a thread
      # thread A
      func1($aaa, $bbb, $ccc);
  } else {
      # thread B
      func2($bbb.property);
  }

OK, the execution order that causes the deadlock is:

  1. Thread B - func2() acquires a lock on the $bbb.property PMC.
  2. Thread A - func1() acquires locks on $aaa, $bbb, $ccc.
  3. Thread A - func2() acquires a lock on $aaa.property, which
     returns quickly.
  4. Thread A - func2() blocks waiting for a lock on $bbb.property,
     held by func2() in thread B.
  5. Thread B - func2() blocks waiting for a lock on
     $bbb.property.property, held by func1() in thread A.

So there are still possibilities for deadlocks, as the non-linear nature of
subroutine calls screws up your nice lock ramping.

In summary, acquiring mutexes in a defined order as a means to avoid
deadlocks only works when you are acquiring them all up front. If you are
acquiring any later, and you detect a deadlock (i.e., a loop of threads
blocking on locks held by each other), you *must* be able to tell one of
them to "ramp off" to holding no locks at all. I.e., ROLLBACK :).

The bugger is, winding back execution state automatically to when you last
started acquiring locks is probably an intractable problem from an
interpreter's perspective...

Sounds like a job for an exception to me ;-).

  for (1..50) {
      eval {
          func_that_acquires_first_lock($args);
      };
      last unless $@ and $@ =~ m/mutex deadlock/i;
  }
}}
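In C, the ascending-address rule for a two-PMC vtable entry might look like
this (a sketch; the PMC layout and function names are hypothetical):

  #include <pthread.h>
  #include <stdint.h>

  typedef struct PMC {
      pthread_mutex_t lock;
      /* ... */
  } PMC;

  /* Sorting by address gives every thread the same global acquisition
   * order, which prevents the classic A-then-B vs. B-then-A deadlock
   * between concurrent vtable calls. */
  static void lock_pair(PMC *a, PMC *b)
  {
      if ((uintptr_t)a > (uintptr_t)b) { PMC *t = a; a = b; b = t; }
      pthread_mutex_lock(&a->lock);
      if (b != a)
          pthread_mutex_lock(&b->lock);
  }

  /* Release order is unconstrained. */
  static void unlock_pair(PMC *a, PMC *b)
  {
      pthread_mutex_unlock(&a->lock);
      if (b != a)
          pthread_mutex_unlock(&b->lock);
  }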
=item Automatic PMC sharing will be provided

When a PMC is placed into a container which is shared (including lexical
pads and global namespaces), that PMC will automatically be marked as
shared. It is acceptable for this to trigger an exception if for some reason
a PMC should not be shared between interpreters.

PMCs are, by default, not shared. This avoids sharing overhead for PMCs
which are only used as temporaries and never shared. (Note that this is
dangerous, and may end up not being done, due to the sharing of
continuations.)

{{ - Luke: I don't know why this is dangerous. A continuation is a data
structure just like an array or a hash. When you share it, everything
"inside" it gets shared, too. For a continuation, this could be an awful lot
of stuff, but it's the safe way.
- Dan: True, but the danger part is if we don't mark everything grabbed by a
continuation as shared by default. If we do, we might as well mark
everything as shared, as there'll be less overhead. }}

=item All interpreter constructs in a pool are shareable

This means that a PMC or string may be used by any interpreter in a pool. It
additionally means that, if full sharing is enabled, any interpreter in a
pool may invoke a continuation, assuming the continuation is valid. (That
is, a continuation taken at parrot's top level. Continuations taken within
vtable functions, user-defined ops, or extension code may not be shareable.)

=item The embedding API will allow posting events to a pool

For events that don't have to go to any particular interpreter, they can go
into the pool's event loop.

=item The embedding API will allow posting events to an interpreter

Many events are interpreter-specific, often caused by one particular
interpreter requesting an async event that later completes.

=item The embedding API will allow calling a parrot sub with a pool

In those cases where there is an interpreter pool, embedders may call a
parrot sub using the pool as a whole, rather than an individual interpreter,
to run the sub. In that case Parrot may either choose a dormant interpreter
(if there is one) or create a new interpreter in the pool to run the
subroutine.

When the sub is done, Parrot may either cache the created interpreter or
destroy it as it needs to, though in no case will Parrot ever leave a pool
with no interpreters at all.

=back

=head1 IMPLEMENTATION