Amit N wrote:

Around 800 files of 10-15 MB each are generated daily and need to be processed. The processing consists of several steps that each file must go through:

-Uncompress
-FilterA
-FilterB
-Parse
-Possibly compress parsed files for archival

You can implement one of two straightforward approaches:

1 - Create one program, start N instances of it, where N is the number of CPUs/cores, and let each instance process one file to completion. You'll probably need an "overseer" program to start them and dispatch jobs to them. The easiest way is to start your processes with the first N files, then monitor them for completion, and whenever one finishes, start another with the next file in the queue, and so on (see the first sketch after approach 2 below).

2 - Create a program / process for each of these steps and let the steps operate independently, feeding the output of one step into the input of the next. You'll probably need some buffering and flow control, so that if (for example) "FilterA" is slower than "Uncompress", the "Uncompress" process is signaled to wait a little until "FilterA" needs more data. The key is that, as long as all the steps run at approximately the same speed, they can run in parallel (see the second sketch, after the note below).
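Here is a minimal sketch of approach 1, using multiprocessing.Pool as the "overseer". The process_file() function and the 'incoming/*.gz' glob are placeholders; substitute your real per-file pipeline and input location:

import glob
import multiprocessing

def process_file(path):
    # Run one file through every step: uncompress, FilterA, FilterB,
    # parse, and (optionally) compress the parsed output for archival.
    return path

if __name__ == '__main__':
    files = glob.glob('incoming/*.gz')
    # One worker per CPU/core; the Pool hands the next file to
    # whichever worker finishes first.
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    for done in pool.imap_unordered(process_file, files):
        print('finished %s' % done)
    pool.close()
    pool.join()

With imap_unordered() you get each result back as soon as that file is done, regardless of submission order, which is exactly the "start the next file when any worker finishes" behaviour described above.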

Note that both approaches are in principle independent of whether you use threads or processes, with the exception of how the steps/stages communicate. However, CPython's GIL prevents threads from executing Python bytecode in parallel, so if your goal is true parallel execution you should use processes rather than threads.
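And a minimal sketch of approach 2 with multiprocessing, assuming hypothetical stage functions and a hard-coded file list. The bounded queues provide the back-pressure mentioned above: when a downstream stage falls behind, put() blocks until it catches up. FilterB and the archival step would follow the same pattern as the stages shown:

import multiprocessing

def uncompress(in_q, out_q):
    for path in iter(in_q.get, None):      # None is the shutdown sentinel
        data = open(path, 'rb').read()     # stand-in for real decompression
        out_q.put(data)
    out_q.put(None)                        # pass the sentinel downstream

def filter_a(in_q, out_q):
    for data in iter(in_q.get, None):
        out_q.put(data)                    # stand-in for the real filter
    out_q.put(None)

def parse(in_q):
    for data in iter(in_q.get, None):
        pass                               # stand-in for parsing/archiving

if __name__ == '__main__':
    q1 = multiprocessing.Queue(maxsize=10)  # bounded => back-pressure
    q2 = multiprocessing.Queue(maxsize=10)
    q3 = multiprocessing.Queue(maxsize=10)

    stages = [
        multiprocessing.Process(target=uncompress, args=(q1, q2)),
        multiprocessing.Process(target=filter_a, args=(q2, q3)),
        multiprocessing.Process(target=parse, args=(q3,)),
    ]
    for p in stages:
        p.start()

    for path in ['file1.gz', 'file2.gz']:   # your 800+ daily files
        q1.put(path)
    q1.put(None)                            # stop the first stage

    for p in stages:
        p.join()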

