I am developing a Python program that submits a command to each node of a cluster and consumes the stdout and stderr of each. I want all the processes to run in parallel, so I start a thread for each node. A node can produce a lot of output, so I also have a thread reading each stream, for a total of three threads per node. (I could probably reduce this to two threads per node by having the process thread handle stdout or stderr itself.)
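The three-threads-per-node structure described above can be sketched roughly as follows. This is a simplified illustration, not the actual program. It assumes a Python recent enough to provide the subprocess module (which 2.2.2 does not have). The names run_on_cluster, run_on_node, drain and local_stand_in are hypothetical, and a local child process stands in for the real remote command.

```python
import subprocess
import sys
import threading


def drain(stream, sink):
    """Reader thread: consume one stream until EOF so the pipe never fills."""
    for line in stream:
        sink.append(line.rstrip("\n"))
    stream.close()


def run_on_node(node, argv, results):
    """Node thread: start the process, then one reader thread per stream."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    out, err = [], []
    readers = [threading.Thread(target=drain, args=(proc.stdout, out)),
               threading.Thread(target=drain, args=(proc.stderr, err))]
    for t in readers:
        t.start()
    for t in readers:
        t.join()
    # Both streams hit EOF, so wait() returns promptly with the exit status.
    results[node] = (proc.wait(), out, err)


def run_on_cluster(nodes, make_argv):
    """Run make_argv(node) on every node in parallel; return {node: result}."""
    results = {}
    threads = [threading.Thread(target=run_on_node,
                                args=(node, make_argv(node), results))
               for node in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def local_stand_in(node):
    # Hypothetical stand-in for the remote command: a local Python child
    # that writes one line to stdout and one line to stderr.
    code = ("import sys; sys.stdout.write('out from %s\\n'); "
            "sys.stderr.write('err from %s\\n')" % (node, node))
    return [sys.executable, "-c", code]


if __name__ == "__main__":
    results = run_on_cluster(["node1", "node2"], local_stand_in)
    for node in sorted(results):
        status, out, err = results[node]
        print(node, status, out, err)
```

Here the node thread only launches the process and waits. Having it read one of the streams itself would give the two-threads-per-node variant.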
I've written some code, have run into problems with the threading module, and have questions at several levels of detail.
1) How should I solve this problem? I'm an experienced Java programmer but new to Python, so my solution looks very Java-like (hence the use of the threading module). Any advice on the right way to approach the problem in Python would be useful.
2) How many active Python threads is it reasonable to have at one time? Our clusters have up to 50 nodes -- is 100-150 threads known to work? (I'm using Python 2.2.2 on RedHat 9.)
3) I've run into a number of problems with the threading module. My program seems to work about 90% of the time. In the remaining 10%, it looks as if notify or notifyAll doesn't wake up waiting threads, or I hit some other problem that makes me wonder about the stability of the threading module. I can post details on the problems I'm seeing, but I thought it would be good to get general feedback first. (Googling doesn't turn up any signs of trouble.)
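To make the wait/notify symptom in 3) concrete, here is a minimal sketch of the usual condition-variable pattern. It is not the code from my program, and it may or may not correspond to the failure I'm seeing. The Mailbox class is hypothetical, and the sketch assumes a modern Python (it uses the with statement). The point it illustrates is that notify() only wakes threads that are already waiting, so the waiter has to re-check shared state in a loop rather than rely on the notification itself.

```python
import threading


class Mailbox:
    """Hypothetical hand-off queue guarded by a condition variable."""

    def __init__(self):
        self._cond = threading.Condition()
        self._items = []

    def put(self, item):
        with self._cond:
            self._items.append(item)
            # notify() is a no-op if nobody is waiting yet; the appended
            # item is what a consumer arriving later will see instead.
            self._cond.notify()

    def get(self):
        with self._cond:
            # Re-check the predicate in a loop: if put() already ran,
            # wait() is skipped entirely instead of blocking forever.
            while not self._items:
                self._cond.wait()
            return self._items.pop(0)
```

If get() called wait() unconditionally, a put() that ran first would have notified no one, and the consumer would then block until some later notification.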
Thanks.
Jack Orenstein