Module for parallel loops: Pardo

Byron Darrah Sun, 05 May 2002 17:40:54 -0700

Does the following sound like a reasonable module to contribute to CPAN?  If 
so, may I please be registered as a developer with the following information?


Name: Byron Darrah
Email: [EMAIL PROTECTED]
Preferred user id: BDARRAH
Description of proposed contribution:
A module that implements parallel loop constructs.  In nature, 
it is similar to Proc::Swarm (managed subprocesses), but the semantics are 
very different.  Using Pardo very strongly resembles using a standard foreach 
or for loop.  It makes parallelization of existing code extremely easy.  
Output and return statuses from parallel subprocesses are presented in 
exactly the same order they would have if the loops really were ordinary 
foreach and for loops.  It's even possible (and natural-looking) to nest 
parallel loops with Pardo.

If this module would be acceptable for hosting on CPAN, then I think a 
reasonable name would be Proc::Pardo.  If you like, you can get a copy of the 
module from:

   http://home.pacbell.net/bdarrah/Proc-Pardo-0.5.tgz

I'll afix a copy of the full docs below my signature, in case you'd like to 
know more.

Oh, and by the way, the existing code has been tested successfully on a 
variety of systems, some with Perl 5.00x, and some with 5.6.x:

   Solaris 2.5.1 SPARC
   Solaris 2.6 SPARC
   Solaris 2.7 SPARC
   Solaris 2.8 SPARC
   Solaris 2.7 Intel
   Solaris 2.8 Intel
   FreeBSD 2.2.8
   FreeBSD 3.2-STABLE
   FreeBSD 3.4
   FreeBSD 4.x (20011014 snapshot)
   Linux 2.2.19, 2.4.7, and 2.4.14


Thank you for your time.  Any comments are welcome,
-- 
Byron Darrah
[EMAIL PROTECTED]




NAME
    Proc::Pardo - Parallel looping construct for Perl programs

SYNOPSIS
      use Proc::Pardo;

      pardo sub{loop_test}, sub{loop_update}, sub{
         loop_body
      };

      pareach array_ref, sub{
         loop_body
      };

DESCRIPTION
    This module provides a way to easily write for loops and foreach loops
    that run with a controlled degree of parallelism. One very nice feature
    is that bufferring is used when necessary such that the output from
    STDERR and STDOUT looks exactly as if it was produced by running your
    subroutine on each parameter in plain old sequential fashion. Return
    status from each loop iteration is also preserved.

USAGE
    The degree of parallelism defaults to 5. No more than that many
    subprocesses will be allowed to run at any time. The default can be
    overridden by setting {"Max_Workers"=>n} after a loop body.

    There are two interfaces to this package: pardo and pareach. The first
    approximates the semantics of a typical for loop. pareach is more like a
    typical foreach loop in Perl. (Actually, for and foreach are synonyms in
    Perl, so I emphasize "typical" because they're usually used as if they
    have different semantics.)

BUGS
    Perl's signal handling is documented to be unreliable. To get around
    this, Proc::Pardo makes the assumption that a child process closing it's
    output descriptors means the child is finished, and that an IO event
    will be observable via select when this happens. It remains to be seen
    whether this will turn out to be a more reliable approach, though it
    seems to be holding up so far.

AUTHOR
            Byron C. Darrah
            [EMAIL PROTECTED]

COPYRIGHT
    Copyright (c) 2002 Byron C. Darrah. All rights reserved. This program is
    free software; you can redistribute it and/or modify it under the same
    terms as Perl itself.

    The full text of the license can be found in the LICENSE file included
    with this module.

SEE ALSO
    perl(1).

PUBLIC METHODS
    You pass pardo() three args: a loop test, an update function, and a loop
    body. It behaves mostly like a for loop but be careful that your loop
    test and update functions don't assume sequential execution.

    For example:

       for (my $i=0; $i<100; $i++) {
          ...
       }

    can be parallelized as:

       { my $i=0; pardo sub{ $i<100 }, sub{ $i++ }, sub{
          ...
       };}

    You pass pareach() two args: A subroutine reference and an array of
    parameters. The subroutine will be called once for each item in the
    array, with the item passed as the arg.

    For example:

       foreach my $i ( @stuff ) {
          ...
       }

    can be parallelized as:

       pareach [ @stuff ], sub{
          my $i=shift;
          ...
       };

    Both pardo and pareach return an array containing the return statuses of
    each iteration of the loop body, in order as if the loop had been
    executed sequentially.

PRIVATE METHODS
    And of course, here are all the methods you should never call.

  wait_for_all_jobs_to_finish

     Usage     : wait_for_all_jobs_to_finish()
     Purpose   : Wait for pending jobs to finish.
     Returns   : N/A.
     Argument  : None.
     Throws    : No exceptions.
     Comments  : Call this just before returning from a pardo-like loop.

  init_state

     Usage     : init_state()
     Purpose   : Initialize global loop state.
     Returns   : N/A.
     Argument  : None.
     Throws    : No exceptions.
     Comments  : Note that even though pardo loops may nest, or be used by
               : modules that know nothing of each other, it is safe to
               : use global variables to store the loop state, because:
               :    1.  pardo is a synchronous function which does not
               :        return until it no longer needs the state
               :        information.
               :    2.  Child processes do not depend on the state
               :        variables.
               :    3.  Pardo is not recursive and even if the outer program
               :        calling pardo is, each pardo task executes in an
               :        isolated suprocess.
               : Of course, pardo is not re-entrant or thread-safe, but if
               : you are doing anything in Perl that could try to invoke
               : pardo from a signal handler or a (non-process) thread,
               : you probably need to see the BOFH about increasing your
               : disk quota.

  dispatch

     Usage     : dispatch($subroutine, $parm)
     Purpose   : Assign a worker process to execute a loop body.
     Returns   : N/A.
     Argument  : A subroutine representing a loop body, and a parameter to
               : be passed to the loop body as $_[0].
     Throws    : No exceptions.
     Comments  : If a loop body throws an exception, it will go uncaught.

  wait_for_available_queue

     Usage     : wait_for_available_queue($slots)
     Purpose   : Sleep until we are allowed to start a new subprocess.
     Returns   : N/A.
     Argument  : Number of queue slots that must be available before 
returning.
     Throws    : No exceptions.

  check_for_death()

     Usage     : check_for_death()
     Purpose   : Nonblocking check and handling for death of any worker
               : process.
     Returns   : N/A.
     Argument  : Nothing.
     Throws    : No exceptions.

    See Also : waitpid

  handle_event()

     Usage     : handle_event()
     Purpose   : Wait for a child to die or for output to be available.
     Returns   : N/A.
     Argument  : None.
     Throws    : No exceptions.
     Comments  : Makes the assumption that child process death will cause
               : an IO event on that process's output descriptors.

  cleanup_worker

     Usage     : cleanup_worker($worker_index)
     Purpose   : Clean up after a worker has been reaped.
     Returns   : N/A.
     Argument  : The index of the Proc_Order and other hashes.
     Throws    : No exceptions.

  reclaim_worker_io

     Usage     : reclaim_worker_io($worker_index)
     Purpose   : Reclaim resources no longer needed for a long-dead worker 
process.
     Returns   : N/A.
     Argument  : Index of the dead worker in Proc_Order and other hashes.
     Throws    : No exceptions.

  gather_all_output

     Usage     : gather_all_output()
     Purpose   : Gather any output that may have been produced by child
               : processes and flush the output buffers of the current
               : process.
     Returns   : N/A.
     Argument  : None.
     Throws    : No exceptions.

  gather_proc_output

     Usage     : gather_proc_output($worker_index)
     Purpose   : Collect error and standard output from a worker process.
     Returns   : N/A.
     Argument  : Index of a worker process in Proc_Order and other hashes.
     Throws    : No exceptions.

  make_names

     Usage     : make_names()
     Purpose   : Make up some names for use as file handles.
     Returns   : A list of three names.
     Argument  : N/A.
     Throws    : No exceptions.
     Comments  : Reuse reclaimed names when possible, so we don't bloat
               : the symbol table needlessly.

Module for parallel loops: Pardo

Reply via email to