Dear all,

I need some advice on using the multiprocessing module. The scenario is as follows:

 * I am running gradient descent to estimate parameters of a pairwise
   grid CRF (or a grid-based graphical model). There are 106 data
   points. Each data point can be analyzed in parallel.
 * To calculate the gradient for each data point, I need to perform
   approximate inference since this is a loopy model. I am using Gibbs
   sampling.
 * My grid is 9x9 so there are 81 variables that I am sampling in one
   sweep of Gibbs sampling. I perform 1000 iterations of Gibbs sampling.
 * My laptop has a quad-core Intel i5 processor, so I thought I could
   use the multiprocessing module to parallelize my code (basically,
   calculate the gradients on multiple cores simultaneously).
 * I did not use the threading library because of the GIL, which does
   not allow multiple threads to run Python code at the same time.
 * As a result, I end up creating a process for each data point
   (instead of a thread, which I would ideally prefer in order to
   avoid the process creation overhead).
 * I am using only basic NumPy array functionality.

Previously I was running this code in MATLAB, where it is considerably faster: one iteration of gradient descent takes around 14 sec in MATLAB using a parfor loop (a parallel loop; the data points are analyzed inside it). However, the same program takes almost 215 sec in Python.

I am quite amazed at the slowness of the multiprocessing module. Is this because of the process creation overhead for each data point?
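
For illustration only, here is a minimal sketch of the alternative to one process per data point: a fixed pool of four worker processes that is reused for all 106 data points. This is not the actual code. The names gradient_for_point and total_gradient, the uniform resampling step, the returned value and the reduced sweep count are all hypothetical placeholders for the real Gibbs sampler and gradient.

```python
import multiprocessing as mp
import random

GRID = 9        # 9x9 grid of variables, as described above
N_SWEEPS = 10   # assumption: kept small here; the real run uses 1000


def gradient_for_point(seed):
    """Hypothetical stand-in for the per-data-point Gibbs sampling."""
    rng = random.Random(seed)
    state = [[rng.choice((0, 1)) for _ in range(GRID)] for _ in range(GRID)]
    for _ in range(N_SWEEPS):
        for i in range(GRID):
            for j in range(GRID):
                # Placeholder update: resample each variable uniformly.
                # A real sampler would condition on the grid neighbours.
                state[i][j] = rng.choice((0, 1))
    # Placeholder "gradient": the mean of the final state.
    return sum(map(sum, state)) / (GRID * GRID)


def total_gradient(seeds, workers=4):
    # One pool of `workers` processes handles every data point, so the
    # process start-up cost is paid `workers` times, not once per point.
    with mp.Pool(processes=workers) as pool:
        return sum(pool.map(gradient_for_point, seeds, chunksize=8))


if __name__ == "__main__":
    print(total_gradient(range(106)))
```

With this layout, each worker receives data points in chunks of 8, so the cost of starting processes does not grow with the number of data points. Whether that accounts for the gap to MATLAB would have to be measured.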

Please keep my email in the replies as I am not a member of this mailing list.

Thanks,
Abhinav



-- 
http://mail.python.org/mailman/listinfo/python-list