Hi All, I have a web service that needs to handle a bunch of work requests. Each job makes blocking I/O calls (to a DB and to external web services) to fetch some data, so part of its time is spent waiting on I/O. Once the data arrives, the job does a computational part (using numpy/pandas on time-series dataframes). The service runs on a multicore machine, so I want to use parallelism as much as possible (especially considering Python's GIL), and because there is a decent amount of I/O, I want to run multiple threads inside each process so that none of the CPUs stall on I/O delays.
The best scenario would be a pool of processes, each running its own thread pool (each worker needs to keep some state, such as DB connections). I already have my own thread pool implementation, which uses some load-balancing and fair-scheduling techniques that are specific to my problem domain. I'm curious whether there is a multiprocessing pool module that I missed and could reuse. As it turned out, the one in the multiprocessing module doesn't support a custom Process class (if it did, I could subclass it and add the functionality I need) ( http://stackoverflow.com/questions/740844/python-multiprocessing-pool-of-custom-processes).

Is there any alternative module that I can reuse? If not, what's the best way to notify the caller that a task has finished executing (i.e. the behavior of multiprocessing.Pool's apply() function)? Which primitives are best suited for that purpose, in case I have to go with my own implementation of a multiprocessing pool?

Any reference to a good blog or educational resource will be highly appreciated! If you believe that my solution is not optimal and you have a better or easier one (I hope I specified my problem well enough), please share your thoughts.

Thanks in advance!
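
To make the question concrete, here is a minimal sketch of the layout described above, assuming Python 3 and only the standard library. It does not subclass Process. Instead it relies on multiprocessing.Pool's initializer argument to give each worker process its own thread pool, and on apply_async with a callback for completion notification. The names init_worker, fetch, run_job and main are hypothetical, fetch only simulates a blocking I/O call, and a stock ThreadPoolExecutor stands in for my custom thread pool.

```python
import multiprocessing as mp
from concurrent.futures import ThreadPoolExecutor

# Per-process state, created once by the initializer in each worker process.
# In a real service this is also where DB connections could be opened.
_threads = None


def init_worker(n_threads):
    """Run once per worker process: build that process's own thread pool."""
    global _threads
    _threads = ThreadPoolExecutor(max_workers=n_threads)


def fetch(key):
    """Hypothetical stand-in for a blocking I/O call (DB or web service)."""
    return list(range(key))


def run_job(keys):
    """Fetch all inputs on threads, then do the CPU-bound part in-process."""
    chunks = list(_threads.map(fetch, keys))
    return sum(sum(chunk) for chunk in chunks)


def main():
    """Submit one job to a 2-process pool and collect its result two ways."""
    notified = []
    with mp.Pool(processes=2, initializer=init_worker, initargs=(4,)) as pool:
        # apply_async returns an AsyncResult immediately; the callback is
        # invoked in the parent process once the job has succeeded.
        pending = pool.apply_async(run_job, ([3, 4],), callback=notified.append)
        value = pending.get(timeout=30)  # block until the job is done
    return value, notified
```

On platforms that start workers with spawn, main() should be called under an `if __name__ == "__main__":` guard. The AsyncResult plus callback pair is one possible notification primitive; whether it fits next to domain-specific scheduling is an open question.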
-- https://mail.python.org/mailman/listinfo/python-list