Hello, I would like to announce dispy (http://dispy.sourceforge.net), a Python framework for distributing computations for parallel execution, from processors/cores on a single node to many nodes over a network. The computations can be Python functions or programs. Any dependencies, such as other Python functions, modules, classes, objects or files, are distributed as well. The result of each computation, along with its output, error messages and exception trace, if any, is made available to the client program for further processing. Popular map/reduce style programs can be easily developed and deployed with dispy.
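As a taste of the map/reduce style mentioned above, here is a minimal single-machine sketch of that pattern using only the standard library's multiprocessing module. This is not dispy's API, just the shape of computation dispy distributes across nodes; the word_count/reduce_counts names are illustrative:

```python
# Illustrative map/reduce sketch using only the stdlib on one machine;
# dispy generalizes this shape so the "map" step runs on remote nodes.
from multiprocessing import Pool

def word_count(text):
    # "map" step: count words in one chunk of text
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def reduce_counts(partials):
    # "reduce" step: merge the per-chunk dictionaries
    total = {}
    for part in partials:
        for word, n in part.items():
            total[word] = total.get(word, 0) + n
    return total

if __name__ == '__main__':
    chunks = ['a b a', 'b c', 'a c c']
    pool = Pool(2)
    try:
        partials = pool.map(word_count, chunks)
    finally:
        pool.close()
        pool.join()
    print(reduce_counts(partials))
```

With dispy, a function like word_count would instead be given to a JobCluster and each chunk submitted as a job, so the "map" step runs on whichever nodes are free.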
There is also an implementation of dispy, called discopy, that uses asynchronous I/O and coroutines, so that discopy scales efficiently to large numbers of network connections (right now this is a bit academic, until it has been tested with such setups). The framework providing asynchronous I/O and coroutines, called asyncoro, is independent of dispy; discopy is an implementation of dispy using asyncoro. Others may find asyncoro itself useful.

Salient features of dispy/discopy:

* Computations (Python functions or standalone programs) and their dependencies (files, Python functions, classes, modules) are distributed automatically.
* Computation nodes can be anywhere on the network (local or remote). For security, either simple hash-based authentication or SSL encryption can be used.
* A computation may specify which nodes are allowed to execute it (for now, using simple patterns of IP addresses).
* After each job finishes, its result, output, errors and exception trace are made available for further processing.
* If a callback function is provided, dispy executes it when a job finishes; this is useful for further processing of job results.
* Nodes may become available dynamically: dispy schedules jobs whenever a node is available and the computation can use that node.
* Client-side and server-side fault recovery are supported. If the user program (client) terminates unexpectedly (e.g., due to an uncaught exception), the nodes continue to execute scheduled jobs; if the fault recovery option was used when creating the cluster, the results of jobs that were scheduled but unfinished at the time of the crash can easily be retrieved later. If a computation is marked re-entrant (with the 'resubmit=True' option) when a cluster is created and a node (server) executing jobs for that computation fails, dispy automatically resubmits those jobs to other available nodes.
* In optimization problems it is useful for computations to send (successive) provisional results back to the client, so it can, for example, terminate the computation early. If computations are Python functions, they can use the 'dispy_provisional_result' function for this purpose.
* dispy can be used in a single process that uses all the nodes exclusively (with JobCluster - simpler to use) or in multiple processes simultaneously sharing the nodes (with SharedJobCluster and dispyscheduler).

dispy works with Python 2.7. It has been tested on Linux and Mac OS X and is known to work on Windows. discopy has been tested on Linux and Mac OS X.

I am not subscribed to the list, so please Cc me if you have comments.

Cheers,
Giri