Bump

On 27 February 2011 20:32, David Mitchell <monch1...@gmail.com> wrote:

> Hello everyone,
>
> I've read through the message archive and there seems to be a fairly clear
> message: don't use the multiprocessing module within web2py.
>
> However, I'm hoping I might have a use case that's a bit different...
>
> I've got an app that basically does analytics on moderately large datasets.
>  I've got a number of controller methods that look like the following:
>
> def my_method():
>     # Note: all data of interest has previously been loaded
>     # into session.data
>     results = []
>     d = local_import('analysis')
>     results += d.my_1st_analysis_method(session)
>     results += d.my_2nd_analysis_method(session, date=date)
>     results += d.my_3rd_analysis_method(session)
>     results += d.my_4th_analysis_method(session, date=date)
>     results += d.my_5th_analysis_method(session, date=date)
>     return dict(results=results)
>
> The problem I have is that all of the methods in my 'analysis' module, when
> run in sequence as per the above, simply take too long to execute and give
> me a browser timeout.  I can mitigate this to some extent by extending the
> timeout on my browser, but I need to be able to use an iPad's Safari browser
> and it appears to be impossible to increase the browser timeout on the iPad.
>  Even if it can be done, that approach seems pretty ugly and I'd rather not
> have to do it.  What I really want to do is run all of these analysis
> methods *simultaneously*, capturing the results of each analysis_method into
> a single variable once they've finished.
>
> All of the methods within the 'analysis' module are designed to run
> concurrently - although they reference session variables, I've consciously
> avoided updating any session variables within any of these methods.  While
> all the data is stored in a database, it's loaded into a session variable
> (session.data) before my_method is called; this data never gets changed as
> part of the analysis.
>
> Is it reasonable to replace the above code with something like this:
>
> def my_method():
>     import multiprocessing
>     d = local_import('analysis')
>
>     tasks = [
>         {'job': d.my_1st_analysis_method, 'params': {}},
>         {'job': d.my_2nd_analysis_method, 'params': {'date': date}},
>         {'job': d.my_3rd_analysis_method, 'params': {}},
>         {'job': d.my_4th_analysis_method, 'params': {'date': date}},
>         {'job': d.my_5th_analysis_method, 'params': {'date': date}},
>     ]
>
>     result_queue = multiprocessing.Queue()
>
>     # each worker runs one analysis method and puts its result
>     # list on the shared queue
>     def run_job(job, params):
>         result_queue.put(job(session, **params))
>
>     workers = []
>     for t in tasks:
>         worker = multiprocessing.Process(target=run_job,
>                                          args=(t['job'], t['params']))
>         worker.start()
>         workers.append(worker)
>
>     # collect one result list per task, in completion order
>     results = []
>     for _ in tasks:
>         results += result_queue.get()
>
>     for worker in workers:
>         worker.join()
>
>     return dict(results=results)
>
> Note: I haven't tried anything using the multiprocessing module before, so
> if you've got any suggestions as to how to improve the above code, I'd
> greatly appreciate it...
>
> Is introducing multiprocessing as I've outlined above a reasonable way to
> optimise code in this scenario, or is there something in web2py that makes
> this a bad idea?  If it's a bad idea, do you have any suggestions what else
> I could try?
>
> Thanks in advance
>
> David Mitchell
>
