Bump On 27 February 2011 20:32, David Mitchell <monch1...@gmail.com> wrote:
> Hello everyone, > > I've read through the message archive and there seems to be a fairly clear > message: don't using the multiprocessing module within web2py. > > However, I'm hoping I might have a use case that's a bit different... > > I've got an app that basically does analytics on moderately large datasets. > I've got a number of controller methods that look like the following: > > def my_method(): > # Note: all data of interest has previously been loaded into > 'session.data' > results = [] > d = local_import('analysis') > results += d.my_1st_analysis_method(session) > results += d.my_2nd_analysis_method(session, date=date) > results += d.my_3rd_analysis_method(session) > results += d.my_4th_analysis_method(session, date=date) > results += d.my_5th_analysis_method(session, date=date) > return dict(results=results) > > The problem I have is that all of the methods in my 'analysis' module, when > run in sequence as per the above, simply take too long to execute and give > me a browser timeout. I can mitigate this to some extent by extending the > timeout on my browser, but I need to be able to use an iPad's Safari browser > and it appears to be impossible to increase the browser timeout on the iPad. > Even if it can be done, that approach seems pretty ugly and I'd rather not > have to do it. What I really want to do is run all of these analysis > methods *simultaneously*, capturing the results of each analysis_method into > a single variable once they've finished. > > All of the methods within the 'analysis' module are designed to run > concurrently - although they reference session variables, I've consciously > avoided updating any session variables within any of these methods. While > all the data is stored in a database, it's loaded into a session variable > (session.data) before my_method is called; this data never gets changed as > part of the analysis. > > Is it reasonable to replace the above code with something like this: > > def my_method(): > import multiprocessing > d = local_import('analysis') > > tasks = [ > ('job': 'd.my_1st_analysis_method', 'params': ['session']), > ('job': 'd.my_2nd_analysis_method', 'params': ['session', > 'date=date']), > ('job': 'd.my_3rd_analysis_method', 'params': ['session']), > ('job': 'd.my_4th_analysis_method', 'params': ['session', > 'date=date']), > ('job': 'd.my_5th_analysis_method', 'params': ['session', > 'date=date']), > ] > > task_queue = multiprocessing.Queue() > for t in tasks: > task_queue.put(t['job']) > > result_queue = multiprocessing.Queue() > > for t in tasks: > args = (arg for arg in t['params']) > worker = multiprocessing.Worker(work_queue, result_queue, > args=args) > worker.start() > > results = [] > while len(results) < len(tasks): > result = result_queue.get() > results.append(result) > > return dict(results=results) > > Note: I haven't tried anything using the multiprocessing module before, so > if you've got any suggestions as to how to improve the above code, I'd > greatly appreciate it... > > Is introducing multiprocessing as I've outlined above a reasonable way to > optimise code in this scenario, or is there something in web2py that makes > this a bad idea? If it's a bad idea, do you have any suggestions what else > I could try? > > Thanks in advance > > David Mitchell >