Does the following sound like a reasonable module to contribute to CPAN? If so, may I please be registered as a developer with the following information?

    Name:              Byron Darrah
    Email:             [EMAIL PROTECTED]
    Preferred user id: BDARRAH

Description of proposed contribution:

A module that implements parallel loop constructs. It is similar in nature to Proc::Swarm (managed subprocesses), but the semantics are very different. Using Pardo strongly resembles using a standard foreach or for loop, which makes parallelizing existing code very easy. Output and return statuses from parallel subprocesses are presented in exactly the same order they would have if the loops really were ordinary foreach and for loops. It's even possible (and natural-looking) to nest parallel loops with Pardo.

If this module would be acceptable for hosting on CPAN, then I think a reasonable name would be Proc::Pardo.

If you like, you can get a copy of the module from:

    http://home.pacbell.net/bdarrah/Proc-Pardo-0.5.tgz

I'll affix a copy of the full docs below my signature, in case you'd like to know more.

Oh, and by the way, the existing code has been tested successfully on a variety of systems, some with Perl 5.00x and some with 5.6.x:

    Solaris 2.5.1 SPARC
    Solaris 2.6 SPARC
    Solaris 2.7 SPARC
    Solaris 2.8 SPARC
    Solaris 2.7 Intel
    Solaris 2.8 Intel
    FreeBSD 2.2.8
    FreeBSD 3.2-STABLE
    FreeBSD 3.4
    FreeBSD 4.x (20011014 snapshot)
    Linux 2.2.19, 2.4.7, and 2.4.14

Thank you for your time. Any comments are welcome,

-- 
Byron Darrah
[EMAIL PROTECTED]


NAME
    Proc::Pardo - Parallel looping construct for Perl programs

SYNOPSIS
    use Proc::Pardo;

    pardo sub{loop_test}, sub{loop_update}, sub{ loop_body };

    pareach array_ref, sub{ loop_body };

DESCRIPTION
    This module provides a way to easily write for loops and foreach loops that run with a controlled degree of parallelism.

    One very nice feature is that buffering is used when necessary, so that the output on STDERR and STDOUT looks exactly as if it had been produced by running your subroutine on each parameter in plain old sequential fashion. The return status from each loop iteration is also preserved.
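The ordering guarantee in the DESCRIPTION can be illustrated with a small standalone sketch. This is not Proc::Pardo's implementation: the helper name ordered_parallel is hypothetical, only STDOUT is captured, and temporary files are swapped in for whatever buffering the module really uses, purely to keep the example short. It assumes each loop body returns a small integer usable as an exit status, and it borrows the documented default of 5 workers.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Handle;
use POSIX ();

# Hypothetical helper (not part of Proc::Pardo): run $body once per item in
# forked children, at most $max at a time, and return one record per item,
# in list order, holding that child's STDOUT text and exit status.
sub ordered_parallel {
    my ($items, $body, $max) = @_;
    $max ||= 5;                      # mirrors the documented default
    my (@files, @status, %index_of);
    my $running = 0;

    # Block until one child exits, then record its status by item index.
    my $reap_one = sub {
        my $pid = wait();
        die "no child to wait for" if $pid < 0;
        $status[ $index_of{$pid} ] = $? >> 8;
        $running--;
    };

    for my $i (0 .. $#$items) {
        $reap_one->() while $running >= $max;    # throttle

        my ($fh, $name) = tempfile();
        close $fh;
        $files[$i] = $name;

        STDOUT->flush;               # keep buffered parent output out of the child
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: send STDOUT to this iteration's private file.
            open(STDOUT, '>', $name) or POSIX::_exit(255);
            my $rc = $body->($items->[$i]) || 0;
            close STDOUT;            # flushes the child's output
            POSIX::_exit($rc & 255); # skip the parent's END blocks
        }
        $index_of{$pid} = $i;
        $running++;
    }
    $reap_one->() while $running > 0;

    # Replay the captured output in list order, not completion order.
    my @results;
    for my $i (0 .. $#$items) {
        open(my $in, '<', $files[$i]) or die "cannot read $files[$i]: $!";
        local $/;                    # slurp mode
        my $text = <$in>;
        close $in;
        unlink $files[$i];
        push @results, { output => defined $text ? $text : '',
                         status => $status[$i] };
    }
    return @results;
}

# Item 1 finishes before item 3, yet the output comes back in list order.
my @results = ordered_parallel(
    [3, 1, 2],
    sub {
        my ($n) = @_;
        select(undef, undef, undef, 0.1 * $n);   # simulate uneven work
        print "item $n\n";
        return $n;
    },
    2,
);
print $_->{output} for @results;
```

Writing each iteration's output to its own file sidesteps pipe-buffer deadlocks at the cost of disk I/O; a pipe-based design needs an event loop in the parent instead.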

USAGE
    The degree of parallelism defaults to 5: no more than that many subprocesses are allowed to run at any time. The default can be overridden by setting {"Max_Workers"=>n} after a loop body.

    There are two interfaces to this package: pardo and pareach. The first approximates the semantics of a typical for loop; pareach is more like a typical foreach loop in Perl. (Actually, for and foreach are synonyms in Perl, so I emphasize "typical" because they're usually used as if they had different semantics.)

BUGS
    Perl's signal handling is documented to be unreliable. To get around this, Proc::Pardo assumes that a child process closing its output descriptors means the child is finished, and that an IO event will be observable via select when this happens. It remains to be seen whether this will turn out to be a more reliable approach, though it seems to be holding up so far.

AUTHOR
    Byron C. Darrah [EMAIL PROTECTED]

COPYRIGHT
    Copyright (c) 2002 Byron C. Darrah. All rights reserved.

    This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

    The full text of the license can be found in the LICENSE file included with this module.

SEE ALSO
    perl(1).

PUBLIC METHODS
    You pass pardo() three args: a loop test, an update function, and a loop body. It behaves mostly like a for loop, but be careful that your loop test and update functions don't assume sequential execution. For example:

        for (my $i=0; $i<100; $i++) { ... }

    can be parallelized as:

        { my $i=0; pardo sub{ $i<100 }, sub{ $i++ }, sub{ ... }; }

    You pass pareach() two args: a reference to an array of parameters, and a subroutine reference. The subroutine will be called once for each item in the array, with the item passed as the arg. For example:

        foreach my $i ( @stuff ) { ... }

    can be parallelized as:

        pareach [ @stuff ], sub{ my $i=shift; ... };

    Both pardo and pareach return an array containing the return status of each iteration of the loop body, in the order the iterations would have run had the loop been executed sequentially.

PRIVATE METHODS
    And of course, here are all the methods you should never call.

  wait_for_all_jobs_to_finish
    Usage     : wait_for_all_jobs_to_finish()
    Purpose   : Wait for pending jobs to finish.
    Returns   : N/A.
    Argument  : None.
    Throws    : No exceptions.
    Comments  : Call this just before returning from a pardo-like loop.

  init_state
    Usage     : init_state()
    Purpose   : Initialize global loop state.
    Returns   : N/A.
    Argument  : None.
    Throws    : No exceptions.
    Comments  : Note that even though pardo loops may nest, or be used by
              : modules that know nothing of each other, it is safe to
              : use global variables to store the loop state, because:
              : 1. pardo is a synchronous function which does not
              :    return until it no longer needs the state
              :    information.
              : 2. Child processes do not depend on the state
              :    variables.
              : 3. Pardo is not recursive, and even if the outer program
              :    calling pardo is, each pardo task executes in an
              :    isolated subprocess.
              : Of course, pardo is not re-entrant or thread-safe, but if
              : you are doing anything in Perl that could try to invoke
              : pardo from a signal handler or a (non-process) thread,
              : you probably need to see the BOFH about increasing your
              : disk quota.

  dispatch
    Usage     : dispatch($subroutine, $parm)
    Purpose   : Assign a worker process to execute a loop body.
    Returns   : N/A.
    Argument  : A subroutine representing a loop body, and a parameter to
              : be passed to the loop body as $_[0].
    Throws    : No exceptions.
    Comments  : If a loop body throws an exception, it will go uncaught.

  wait_for_available_queue
    Usage     : wait_for_available_queue($slots)
    Purpose   : Sleep until we are allowed to start a new subprocess.
    Returns   : N/A.
    Argument  : Number of queue slots that must be available before
              : returning.
    Throws    : No exceptions.

  check_for_death
    Usage     : check_for_death()
    Purpose   : Nonblocking check and handling for death of any worker
              : process.
    Returns   : N/A.
    Argument  : None.
    Throws    : No exceptions.
    See Also  : waitpid

  handle_event
    Usage     : handle_event()
    Purpose   : Wait for a child to die or for output to be available.
    Returns   : N/A.
    Argument  : None.
    Throws    : No exceptions.
    Comments  : Makes the assumption that child process death will cause
              : an IO event on that process's output descriptors.

  cleanup_worker
    Usage     : cleanup_worker($worker_index)
    Purpose   : Clean up after a worker has been reaped.
    Returns   : N/A.
    Argument  : Index of the reaped worker in Proc_Order and other
              : hashes.
    Throws    : No exceptions.

  reclaim_worker_io
    Usage     : reclaim_worker_io($worker_index)
    Purpose   : Reclaim resources no longer needed for a long-dead worker
              : process.
    Returns   : N/A.
    Argument  : Index of the dead worker in Proc_Order and other hashes.
    Throws    : No exceptions.

  gather_all_output
    Usage     : gather_all_output()
    Purpose   : Gather any output that may have been produced by child
              : processes and flush the output buffers of the current
              : process.
    Returns   : N/A.
    Argument  : None.
    Throws    : No exceptions.

  gather_proc_output
    Usage     : gather_proc_output($worker_index)
    Purpose   : Collect error and standard output from a worker process.
    Returns   : N/A.
    Argument  : Index of a worker process in Proc_Order and other hashes.
    Throws    : No exceptions.

  make_names
    Usage     : make_names()
    Purpose   : Make up some names for use as file handles.
    Returns   : A list of three names.
    Argument  : N/A.
    Throws    : No exceptions.
    Comments  : Reuse reclaimed names when possible, so we don't bloat
              : the symbol table needlessly.
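
The assumption noted under BUGS and handle_event, that a finished child shows up as an IO event on its output descriptor, can be sketched with core Perl alone. The helper name run_and_collect is hypothetical and the code is only an illustration of the select-plus-EOF idea, not the module's own event loop. It handles STDOUT only and starts all jobs at once, with no worker limit.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;
use POSIX ();

# Hypothetical helper (not part of Proc::Pardo): fork one child per job, give
# each child its own pipe, and use select to notice both pending output and
# end-of-file. EOF on a pipe is taken to mean that the child is finished.
sub run_and_collect {
    my (@jobs) = @_;
    my $sel = IO::Select->new;
    my (%index_of, @buffer);

    for my $i (0 .. $#jobs) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        STDOUT->flush;               # keep buffered parent output out of the child
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            close $reader;
            open(STDOUT, '>&', $writer) or POSIX::_exit(255);
            close $writer;
            $jobs[$i]->();
            close STDOUT;            # flushes and closes the last write end
            POSIX::_exit(0);
        }
        # The parent keeps only the read end, so EOF arrives once the child
        # has closed (or exited with) its copy of the write end.
        close $writer;
        $index_of{ fileno $reader } = $i;
        $buffer[$i] = '';
        $sel->add($reader);
    }

    while ($sel->count) {
        for my $fh ($sel->can_read) {
            my $n = sysread($fh, my $chunk, 4096);
            if ($n) {
                $buffer[ $index_of{ fileno $fh } ] .= $chunk;
                next;
            }
            # Zero bytes (EOF) or a read error: treat the child as done.
            $sel->remove($fh);
            close $fh;
        }
    }
    1 while wait() != -1;            # reap every child
    return @buffer;                  # one string per job, in job order
}

my @out = run_and_collect(
    sub { print "first\n" },
    sub { print "second\n" },
);
print @out;
```

Because the parent drains every pipe as data becomes readable, a child that writes more than one pipe buffer's worth of output does not block forever, which is the situation a simple read-after-wait design would run into.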