On Tue, Sep 18, 2012 at 10:26 AM, Dhananjay <dhananjay.c.jo...@gmail.com> wrote:
> Dear all,
>
> I am trying to use the multiprocessing module.
> I have 5 functions and 2000 input files.
>
> First, I want to make sure that these 5 functions execute one after the
> other.
> Is there any way that I could queue these 5 functions within the same
> script?
>
> Next, as there are 2000 input files, I could queue them with queue.put()
> and get them back to run one by one using queue.get(), as follows:
>
>     for file in files:
>         if '.dat.gz' in file:
>             q.put(file)
>
>     while True:
>         item = q.get()
>         x1 = f1(item)
>         x2 = f2(x1)
>         x3 = f3(x2)
>         x4 = f4(x3)
>         final_output = f5(x4)
>
> However, how can I run them on my 8-core machine, so that 8 files at a
> time are processed (each through the set of 5 functions, one function
> after the other)?
The multiprocessing.Pool class seems to be what you need. Documentation at
http://docs.python.org/py3k/library/multiprocessing.html#using-a-pool-of-workers

Example:

    #!/usr/bin/env python3

    import multiprocessing

    def file_handler(filename):
        # do processing on filename, return the final_output
        print('working on {}'.format(filename))
        return 'processed-{}'.format(filename)

    def main():
        p = multiprocessing.Pool(8)
        files = ['a', 'b', 'c']
        result = p.map(file_handler, files)
        print(result)

    if __name__ == '__main__':
        main()

If you want, you can also implement everything using multiprocessing.Process
and multiprocessing.Queue, but using pools should be simpler.

-- 
regards, kushal
-- 
http://mail.python.org/mailman/listinfo/python-list
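To connect this back to your original five-step pipeline: make the worker
function chain f1 through f5 on a single file, and let Pool(8) spread the
files over your 8 cores. A minimal sketch (the f1..f5 bodies here are
placeholder stand-ins for your real functions, and the filenames are made up):

```python
import multiprocessing

# Placeholder stand-ins for the poster's five processing functions;
# each one takes the previous step's output and returns its own.
def f1(x): return x + '-1'
def f2(x): return x + '-2'
def f3(x): return x + '-3'
def f4(x): return x + '-4'
def f5(x): return x + '-5'

def pipeline(filename):
    # Run the five steps strictly in order for one file.
    x1 = f1(filename)
    x2 = f2(x1)
    x3 = f3(x2)
    x4 = f4(x3)
    return f5(x4)

def main():
    files = ['a.dat.gz', 'b.dat.gz', 'c.dat.gz']
    # 8 worker processes; each call to pipeline() handles one whole
    # file, so up to 8 files are in flight at any moment.
    pool = multiprocessing.Pool(8)
    results = pool.map(pipeline, files)
    pool.close()
    pool.join()
    print(results)

if __name__ == '__main__':
    main()
```

Note that the ordering guarantee is inside pipeline(): the five functions
always run sequentially for a given file, while different files run in
parallel, which is exactly the behaviour you asked for.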