This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Thread Programming Model

=head1 VERSION

  Maintainer: Steven McDougall <[EMAIL PROTECTED]>
  Date: 31 Aug 2000
  Last Modified: 05 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Version: 2
  Number: 185
  Status: Developing

=head1 ABSTRACT

This RFC describes the programming interface to Perl6 threads.
It documents the function calls, operators, classes, methods, or
whatever else the language provides for programming with threads.

=head1 SYNOPSIS

    use Thread;

    $sub    = sub { ... };
    $thread = new Thread \&func     , @args;
    $thread = new Thread $sub       , @args;
    $thread = new Thread sub { ... }, @args;
              async { ... };
    $result = join $thread;

    $thread  = this Thread;
    @threads = all  Thread;

    $thread1 == $thread2 and ...
    Thread::yield();

    critical { ... };   # one thread at a time in this block

    lock $scalar;
    lock @array;
    lock %hash;
    lock &sub;

    $ok = try $scalar;
    $ok = try @array;
    $ok = try %hash;
    $ok = try &sub;

    unlock $scalar;
    unlock @array;
    unlock %hash;
    unlock &sub;

    $event = auto   Thread::Event;
    $event = manual Thread::Event;
    set   $event;
    reset $event;
    wait  $event;

    $semaphore = new Thread::Semaphore $initial;
    $ok        = $semaphore->up($n);
                 $semaphore->down;
    $count     = $semaphore->count;

    $timer = Thread::Timer->delay($seconds);
    $timer = Thread::Timer->alarm($time);
    $timer->wait;

    $event = $fh->readable;
    $event = $fh->writable;
    $event = $fh->failure;

    $ok = wait_all(@references);
    $i  = wait_any(@references);

    $queue = new Thread::Queue;
    $queue->enqueue($a);
    $a     = $queue->dequeue;
    $empty = $queue->empty;

=head1 DESCRIPTION

=head2 Thread

=over 4

=item I<$thread> = C<new> C<Thread> \&I<func>, I<@args>

Executes I<func>(I<@args>) in a separate thread. The return value is a
reference to the C<Thread> object that manages the thread.

The subroutine executes in its enclosing lexical context. This means
that lexical variables declared in that context may be shared between
threads. See RFC 178 for examples.

=item I<$thread> = C<new> C<Thread> I<$sub>, I<@args>

=item I<$thread> = C<new> C<Thread> C<sub> { ... }, I<@args>

Executes an anonymous subroutine in a separate thread, passing it
I<@args>. The return value is a reference to the C<Thread> object that
manages the thread.

The subroutine is a closure. References to variables in its lexical
context are bound when the C<sub> operator executes. See RFC 178 for
examples.

=item C<async> BLOCK

Executes BLOCK in a separate thread. Syntactically, C<async> BLOCK
works like C<do> BLOCK. C<async> creates a C<Thread> object to manage
the thread, but it does not return a reference to it. If you want the
C<Thread> object, use one of the C<new> C<Thread> forms shown above.

The BLOCK executes in its enclosing lexical context. This means that
lexical variables declared in that context may be shared between
threads.

=item I<$thread> = C<this> C<Thread>

Returns a reference to the C<Thread> object that manages the current
thread.

=item I<@threads> = C<all> C<Thread>

Returns a list of references to all existing C<Thread> objects in the
program. This includes C<Thread> objects created for C<async> blocks.

=item I<$result> = C<join> I<$thread>

=item I<@result> = C<join> I<$thread>

Blocks until I<$thread> terminates. May be called repeatedly, by any
number of threads.

Returns the last expression evaluated in I<$thread>. This expression
is evaluated in list context inside the thread. If C<join> is called
in list context, it returns the entire list; if C<join> is called in
scalar context, it returns the first element of the list.

=item I<$thread1> == I<$thread2>

Evaluates to true iff I<$thread1> and I<$thread2> reference the same
C<Thread> object.

=item C<Thread::yield>()

Gives the interpreter an opportunity to switch to another thread. The
interpreter is not obligated to take this opportunity, and the calling
thread may regain control after an arbitrarily short period of time.

=back

=head2 Critical section

C<critical> is a new keyword.
Syntactically, it works like C<do>.

    critical { ... };

The interpreter guarantees that only one thread at a time can execute
a C<critical> block.

=head2 Lock

=over 4

=item C<lock> I<$scalar>

=item C<lock> I<@array>

=item C<lock> I<%hash>

=item C<lock> I<&sub>

Applies a lock to a variable.

If there are no locks applied to the variable, applies a lock and
returns immediately.

If there are locks applied by another thread, blocks until there are
no locks applied.

If there are locks applied by the calling thread, applies another lock
and returns immediately.

The lock is automatically removed at the end of the lexical scope in
which the C<lock> operator executes.

=item I<$ok> = C<try> I<$scalar>

=item I<$ok> = C<try> I<@array>

=item I<$ok> = C<try> I<%hash>

=item I<$ok> = C<try> I<&sub>

Tries to apply a lock to a variable.

If there are no locks applied to the variable, applies a lock and
returns true.

If there are locks applied by another thread, returns false.

If there are locks applied by the calling thread, applies another lock
and returns true.

The lock is automatically removed at the end of the lexical scope in
which the C<try> operator executes.

C<try> never blocks.

=item C<unlock> I<$scalar>

=item C<unlock> I<@array>

=item C<unlock> I<%hash>

=item C<unlock> I<&sub>

Removes a lock from a variable.

If there are locks applied by the calling thread, removes one.

If there are locks applied by another thread, does nothing.

If there are no locks applied to the variable, does nothing.

C<unlock> never blocks.

=back

A consequence of these rules is that only one thread at a time may
have locks applied to a variable.

=head2 Event

Events allow one thread to wait until something happens in another
thread. Events have two states: I<set> and I<reset>. Threads I<wait>
on an event; the C<wait> call blocks until the event is set.

There are two kinds of events: I<manual> and I<automatic>.

When a manual event is set, it remains set until a C<reset> call is
made on it.
All waiting threads are immediately unblocked, and subsequent calls to
C<wait> return immediately.

When an automatic event is set, one waiting thread is unblocked and
the event is immediately reset. If there are no waiting threads, the
first call to C<wait> resets the event and returns immediately.

=over 4

=item I<$event> = C<auto> C<Thread::Event>

Creates and returns an automatic C<Event> object. The event is
initially reset.

=item I<$event> = C<manual> C<Thread::Event>

Creates and returns a manual C<Event> object. The event is initially
reset.

=item C<set> I<$event>

Sets I<$event>.

When a manual event is set, it remains set until a C<reset> call is
made on it. During this time, any number of threads may C<wait> on it
without blocking.

When an automatic event is set, it is reset by the first thread that
C<wait>s on it.

=item C<reset> I<$event>

Resets I<$event>.

=item C<wait> I<$event>

Blocks until I<$event> is set.

=back

=head2 Semaphore

A semaphore manages a number, called a I<count>. The count is always
between zero and a system-dependent maximum. C<up> and C<down> are
guaranteed to execute atomically.

=over 4

=item I<$semaphore> = C<new> C<Thread::Semaphore> I<$n>

Creates and returns a new C<Semaphore> object, with an initial count
of I<$n>. If I<$n> is omitted, the initial count is zero.

=item I<$ok> = I<$semaphore>->C<up>(I<$n>)

If the count can be increased by I<$n> without exceeding the maximum,
does so and returns true. Otherwise, does nothing and returns false.
If I<$n> is omitted, it defaults to 1.

=item I<$semaphore>->C<down>

Blocks until the count is positive, then decrements the count and
returns.

=item I<$count> = I<$semaphore>->C<count>

Returns the current count of I<$semaphore>.

C<count> is typically used at program termination to release all the
resources controlled by a semaphore. Code that uses C<count> while
more than one thread has access to I<$semaphore> is subject to race
conditions.

=back

=head2 Timer

=over 4

=item I<$timer> = C<Thread::Timer>->C<delay>(I<$seconds>)

Creates and returns a new C<Timer> object. The timer will expire
I<$seconds> seconds after it is created. I<$seconds> may be a floating
point number, so this interface supports whatever time resolution the
platform provides.

=item I<$timer> = C<Thread::Timer>->C<alarm>(I<$time>)

Creates and returns a new C<Timer> object. The timer will expire at
I<$time> seconds after the epoch. I<$time> may be a floating point
number, so this interface supports whatever time resolution the
platform provides.

=item C<wait> I<$timer>

Blocks until the timer expires.

=back

=head2 I/O

=over 4

=item I<$event> = I<$fh>->C<readable>

Returns a reference to a C<Thread::Event> object that is set when a
read on the file handle will not block.

=item I<$event> = I<$fh>->C<writable>

Returns a reference to a C<Thread::Event> object that is set when a
write on the file handle will not block.

=item I<$event> = I<$fh>->C<failure>

Returns a reference to a C<Thread::Event> object that is set when I/O
on the file handle will fail.

=back

=head2 Wait functions

Threads, variables, events, semaphores, and timers are
I<synchronization objects>. A synchronization object is I<signaled>
when a thread would not block on it. The conditions for an object to
be signaled depend on the kind of object.

=over 4

=item *

Threads are signaled after they have completed execution.

=item *

Variables are signaled while there are no locks applied to them.

=item *

Events are signaled while they are set.

=item *

Semaphores are signaled while they have a positive count.

=item *

Timers are signaled after they expire.

=back

=over 4

=item I<$ok> = C<Thread::wait_all>(I<@references>)

C<wait_all> takes a list of references to synchronization objects.
It blocks until all of the objects are signaled.
C<wait_all> changes the state of the objects as follows:

=over 4

=item *

it applies a lock to variables

=item *

it resets automatic events

=item *

it decrements the count of semaphores

=item *

it does not change the state of other kinds of synchronization objects

=back

C<wait_all> does not change the state of any object until all the
objects are signaled. This prevents deadlock, at least between
competing wait functions.

Returns true on success, false on error. An error occurs if an element
of I<@references> is not a reference.

=item I<$i> = C<Thread::wait_any>(I<@references>)

C<wait_any> takes a list of references to synchronization objects.
It blocks until at least one of the objects is signaled.

On success, returns the index in I<@references> of one of the objects
that was signaled. The state of this object may be changed, as
described above for C<wait_all>.

Returns -1 on error. An error occurs if an element of I<@references>
is not a reference.

=back

=head2 Queue

C<Thread::Queue> implements a thread-safe queue of scalars.

=over 4

=item I<$queue> = C<new> C<Thread::Queue>

Creates and returns a new C<Thread::Queue> object. The queue is
initially empty.

=item I<$queue>->C<enqueue>(I<$a>)

Places I<$a> at the tail of the queue.

=item I<$a> = I<$queue>->C<dequeue>

Blocks while I<$queue> is empty. Then removes and returns the scalar
at the head of the queue.

=item I<$empty> = I<$queue>->C<empty>

Returns true iff the queue is currently empty.

C<empty> is typically used at program termination to remove all
elements from a queue. Code that uses C<empty> while more than one
thread has access to I<$queue> is subject to race conditions.

=back

=head1 IMPLEMENTATION

All of these features should be doable if threads are built into Perl.

Implementing the C<readable>, C<writable> and C<failure> methods on
file handles probably requires asynchronous I/O.

Not everything has to be in the core. For example, Semaphores can be
built from Events, and Queues can be built from Semaphores.

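The claim that Queues can be built from Semaphores can be sketched as
follows. The interface in this RFC is only a proposal, so the sketch
uses Python's C<threading> module as a stand-in for it. The names
C<SemaphoreQueue> and C<demo> are hypothetical, and this is one
possible construction under those assumptions, not the intended Perl6
implementation.

```python
import threading
from collections import deque


class SemaphoreQueue:
    """FIFO queue built from two semaphores (hypothetical name)."""

    def __init__(self):
        self._items = deque()
        # Binary semaphore used as a mutex around the item list.
        self._mutex = threading.Semaphore(1)
        # Counting semaphore: its count equals the number of queued items.
        self._available = threading.Semaphore(0)

    def enqueue(self, item):
        self._mutex.acquire()        # "down" on the mutex
        self._items.append(item)
        self._mutex.release()        # "up" on the mutex
        self._available.release()    # "up": one more item can be dequeued

    def dequeue(self):
        self._available.acquire()    # "down": blocks while the queue is empty
        self._mutex.acquire()
        item = self._items.popleft()
        self._mutex.release()
        return item


def demo():
    """One producer, one consumer; returns items in dequeue order."""
    queue = SemaphoreQueue()
    received = []

    def consumer():
        for _ in range(3):
            received.append(queue.dequeue())

    worker = threading.Thread(target=consumer)
    worker.start()
    for value in ("a", "b", "c"):
        queue.enqueue(value)
    worker.join()
    return received
```

The counting semaphore plays the role of C<down> blocking while the
queue is empty, which is the behavior C<dequeue> is documented to have.
The sketch has no C<empty> method; adding one would inherit the race
conditions noted for C<count> and C<empty> above.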
=head1 DISCUSSION

This interface is an amalgam of

=over 4

=item *

the C<Thread.pm> interface from Perl 5.6.0

=item *

the Win32 thread interface

=item *

my own wish list (you can't get it if you don't ask...)

=back

Here are some issues to consider.

=head2 Thread creation

Threads are created by

    new Thread \&func
    new Thread sub { ... }
    async { ... }

We arguably don't need three different ways to create threads.
However, the different syntaxes fit into the language in slightly
different ways, and I'm not sure which one I'd be willing to give up.

The first is the most fundamental; losing it would be a serious
inconvenience. Perl generally allows an anonymous subroutine wherever
it allows a code ref, so the second also seems appropriate. And the
third allows us to create threads with the kind of lightweight syntax
that makes Perl such a lucid language.

=head2 C<join>

The calling context of C<join> can't be propagated into the thread,
for several reasons.

=over 4

=item *

The thread can compute only one return value, but C<join> can be
called repeatedly in different contexts.

=item *

The thread might terminate before the first call to C<join>. C<join>
can return the last expression evaluated in the thread, but it can't
retroactively affect the context in which that expression was
evaluated.

=back

Not allowing multiple C<join>s on a thread might help with the first
problem; I can't see any way around the second.

=head2 Critical sections

This interface provides the

    critical { ... }

construct. In principle, we don't need this: you can do the same thing
with scoped locks

    sub foo
    {
        lock &foo;
        ...
    }

    sub bar
    {
        { lock $bar::a; ... }
        { lock $bar::b; ... }
    }

Nonetheless, critical sections have several attractive features.

=over 4

=item *

They reduce clutter. No named variable to lock.

=item *

Along with less clutter comes fewer chances for bugs. There isn't a
locked variable floating around to get locked by the wrong thread, or
locked and never unlocked, or deadlocked, or...

=item *

The implementation can be highly optimized. Internally, a critical
section is protected by some kind of mutex. However, this mutex isn't
user visible: the interpreter has complete control over it. Therefore,
it can be very lightweight.

=back

Efficiency matters, because critical sections are used to manage
things that are...well...critical. Important, global, high-contention
resources like memory managers and process schedulers. Granted, these
are poor examples for Perl, but you get the idea.

Whether to implement C<critical> depends partly on whether serializing
execution of a block of code is common enough to merit its own keyword
and syntax. C<Thread.pm> in Perl 5.6.0 documents a C<:locked> attribute
for subroutines; given a choice, I'd rather have C<critical> than
C<:locked>.

=head2 Locked variables

Following some discussion on perl6-language-flow, I traded mutexes
back for locked variables. This led to some changes in the wait
functions.

=head2 Events

I dropped the C<cond_xxx> mechanism in C<Thread.pm> in favor of
Events. Events do essentially the same thing with a simpler interface.
In particular, Events don't expose a locked variable the way
C<cond_xxx> does. (As far as I can tell, this variable is an artifact
of the PThreads implementation.)

One substantive difference between Events and C<cond_xxx> is that the
manual/automatic distinction for Events is a property of the Event
object, while the corresponding broadcast/signal distinction for
C<cond_xxx> is a property of the signaling call. It isn't clear to me
whether one model is preferable to the other.

Another difference is that C<set> and C<reset> I<change the state> of
an Event object, while C<cond_signal> and C<cond_broadcast> I<send a
signal> to threads that are blocked on C<cond_wait>. This reminds me
of the level sensitive/edge sensitive distinction for hardware
interrupts. Having seen the grief caused by edge sensitive interrupts
in the original IBM PC, I am leery of things that resemble them.
Perhaps someone can make a case that this is a false analogy.

Following some discussion on perl6-language-flow, I reviewed the
documentation for the C<cond_xxx> functions in Perl 5.6.0. It begins:

    The C<cond_wait> function takes a B<locked> variable as a
    parameter, unlocks the variable,

However, C<Thread/Semaphore.pm> in the same release uses the
C<cond_xxx> functions I<without> locking the condition variable.

All I can say at this point is that I do not understand the
C<cond_xxx> functions. Perhaps someone can submit an RFC documenting
them and showing how to use them.

=head2 C<die>

I dropped the I<$thread>->C<eval> call from this interface, and didn't
say what happens if a thread C<die>s. There are several possibilities.

=over 4

=item *

The exception is propagated to any thread that C<join>s it.

This has a certain logic to it, but it suffers from the fact that a
program needn't C<join> its threads, so it doesn't guarantee that
exceptions will actually be handled.

=item *

The interpreter prints C<$@> on stderr and exits.

This is what C++ does. It ensures that exceptions won't just disappear
into the void; however, it also causes a good deal of anxiety and
paranoia, because I<any> thread can potentially blow your program out
of the water. (I speak from experience here.)

=item *

The thread just quietly goes away.

After working with threads in C++, I'm actually partial to this one.
We still need some way to recover C<$@> when a thread C<die>s.
Returning C<$@> to C<join> is probably the Wrong Thing.

=back

=head2 C<==>

I dropped I<$thread>->C<equal> in favor of overloading C<==> to
compare threads. This seems more natural, and should be easy to
implement if threads are built into the language.

=head2 Thread IDs

I dropped thread IDs from the interface. You don't want thread IDs.
Thread IDs are an implementation artifact. Carrying around explicit
numerical indices isn't the Perl way. They were broken anyway (wrap at
2^32, with no guarantee of uniqueness after that).

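To make the first of the C<die> possibilities listed above concrete,
here is a rough sketch of a thread whose failure is held back until
someone joins it. It is written in Python because the interface in
this RFC is only a proposal; C<JoinableThread>, C<join_result> and
C<demo> are hypothetical names, and the RFC itself leaves the choice
among the three possibilities open.

```python
import threading


class JoinableThread(threading.Thread):
    """Thread that hands its result or its exception to whoever joins it.

    Hypothetical helper: models "the exception is propagated to any
    thread that joins it".
    """

    def __init__(self, fn, *fn_args):
        super().__init__()
        self._fn = fn
        self._fn_args = fn_args
        self.result = None
        self.error = None

    def run(self):
        try:
            self.result = self._fn(*self._fn_args)
        except Exception as exc:
            # Keep the failure instead of printing it or exiting.
            self.error = exc

    def join_result(self):
        """Block until the thread ends; re-raise its exception, if any."""
        self.join()
        if self.error is not None:
            raise self.error
        return self.result


def demo():
    ok = JoinableThread(lambda x: x * 2, 21)
    bad = JoinableThread(lambda: 1 / 0)
    ok.start()
    bad.start()
    value = ok.join_result()
    try:
        bad.join_result()
        caught = None
    except ZeroDivisionError as exc:
        caught = type(exc).__name__
    return value, caught
```

The sketch also shows the weakness noted above: if nothing ever calls
C<join_result> on the failing thread, the stored exception is never
seen.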
=head2 Detach

I dropped C<detach> from the interface. Detach is an artifact of
languages that require programmers to manage their own storage. It has
rigorous semantics, there's no going back, and if you get it wrong,
you either leak threads or you crash.

In Perl, detachment is more a state of mind. We have threads, and we
have C<Thread> objects to manage them. The thread holds a reference on
its C<Thread> object until it terminates. The C<Thread> object holds a
reference on its thread as long as the C<Thread> object exists.

If there are no user-visible references to a C<Thread> object (i.e.
the only reference on the C<Thread> object is the one held by the
thread), then the thread is said to be detached. A call to
C<Thread>->C<all> or C<Thread>->C<this> could recover a reference to
the C<Thread> object of a detached thread; when this happens, the
thread is no longer detached.

In any case, you don't have to worry about it. Like so many others,
C<detach> is a problem that Perl doesn't have.

=head2 Import

To minimize namespace pollution, we could @EXPORT_OK the functions
that appear in this interface.

    use Thread qw(yield wait_all wait_any);

On the other hand, if they get moved into the core the issue may be
moot.

=head2 I/O

C<readable>, C<writable>, and C<failure> are documented as being file
handle methods; however, it is anticipated that file handles will
subsume sockets in Perl6. For an unconnected socket, the semantics of
C<readable> are extended so that it is signaled when a C<connect> or
C<accept> call will not block.

Allowing applications to block on network I/O in a controlled fashion
is an important use of the wait functions.

An outstanding problem with the interface documented here is that it
does not guarantee that a socket will still be readable or writable at
the time the application actually attempts I/O, nor does it indicate
I<how many> bytes may be read or written without blocking.
A better approach might be to do asynchronous I/O, and have
C<readable> and C<writable> return an Event that is set when the I/O
operation completes. I hesitate to specify such an interface until
there is more definition for file handles and asynchronous I/O in
Perl6.

=head2 Wait functions

C<wait_all> and C<wait_any> are generalizations of the C<select>(2)
Unix system call and the C<WaitForMultipleObjects> Win32 call.

It would seem natural to pass locked variables directly to the wait
functions

    wait_all($a, ...

The problem with this is that there is no way to indicate whether the
function is to wait for a lock on I<$a>, or wait for (say) an Event
that I<$a> refers to. Instead, the wait functions take references; to
wait on a locked variable, write

    wait_all(\$a, ...

The wait functions may seem overdone; however, applications really do
need these features, and they can be I<very> difficult to implement
without language support. For example, C<select>(2) doesn't work with
file descriptors for the console, and C<WaitForMultipleObjects>
doesn't work with sockets. I have direct experience with the
difficulty of programming around these deficiencies.

=head2 Timer

There are two kinds of timers: relative and absolute. Obviously, you
can always build one kind out of the other, but I wanted to
distinguish them with different constructors. I named the constructors
C<delay> and C<alarm>, respectively. These are short, and read fairly
naturally.

=head2 C<this Thread>

C++ partisans will get brain freeze reading code like

    my $thread = this Thread;

but that's not why I traded in C<self> for C<this>. Really, it's not.
I did it because it reads more naturally to me.

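Returning to the wait functions: the rule that C<wait_all> changes no
object until every object is signaled can be modeled as below. This is
a toy in Python, since the RFC's interface is only a proposal. It
covers semaphore-like counters only, puts them all under one shared
monitor, and omits the error returns; C<Counter>, C<up> and C<demo>
are hypothetical names, and a real implementation would need to handle
locks, events, threads and timers as well.

```python
import threading


class Counter:
    """Toy semaphore: signaled while its count is positive (hypothetical)."""

    def __init__(self, count=0):
        self.count = count


# Assumption: a single monitor guards every Counter in the program.
_monitor = threading.Condition()


def up(counter, n=1):
    """Increase a counter and wake any waiters so they can re-check."""
    with _monitor:
        counter.count += n
        _monitor.notify_all()


def wait_all(counters):
    """Block until every counter is positive, then decrement them together."""
    with _monitor:
        _monitor.wait_for(lambda: all(c.count > 0 for c in counters))
        # No counter was touched until this point.
        for c in counters:
            c.count -= 1
    return True


def wait_any(counters):
    """Block until some counter is positive; decrement it, return its index."""
    with _monitor:
        _monitor.wait_for(lambda: any(c.count > 0 for c in counters))
        for i, c in enumerate(counters):
            if c.count > 0:
                c.count -= 1
                return i


def demo():
    a, b = Counter(1), Counter(0)
    waiter = threading.Thread(target=wait_all, args=([a, b],))
    waiter.start()
    with _monitor:
        before = a.count   # wait_all cannot have taken a yet: b is still 0
    up(b)
    waiter.join()
    return before, a.count, b.count
```

Because a waiter in C<wait_all> holds nothing while it waits, two
threads waiting on the same objects in different orders cannot each
grab one object and then block on the other.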
=head1 CHANGES

=head2 v2

=over 4

=item *

Added SYNOPSIS, and wrote a proper ABSTRACT

=item *

Detailed C<async>

=item *

Detailed sharing of lexicals between threads

=item *

Traded Mutexes back for C<lock>, C<try>, and C<unlock>

=item *

Pushed C<Semaphore>, C<Event>, and C<Timer> down into C<Thread::>

=item *

Specified C<readable>, C<writable> and C<failure> to return Events

=item *

Reworked the wait functions

=item *

Added C<Queue>

=back

=head1 REFERENCES

    RFC 1: Implementation of Threads in Perl

    RFC 27: Coroutines for Perl

    RFC 31: Co-routines

    RFC 47: Universal Asynchronous I/O

    RFC 178: Lightweight Threads

    Thread.pm

    PThreads info page