On Mar 1, 2011, at 2:48 AM, David Mitchell wrote:
> Bump

I tried something like that a while back with Python threads, for much the same 
reason you describe. In my case, each thread farmed out an xml-rpc request to a 
different server: an ideal case for this kind of thing, since all of my threads 
were in IO wait. (That is to say, I didn't need MP, since the processing didn't 
happen locally, but I could benefit from threads of control.)
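
For what it's worth, the shape of it was roughly this (from memory, not the 
actual code; the URLs and the remote method name are made up):

import threading
import xmlrpclib  # Python 2 standard library, which web2py runs on

def fetch(url, results, index):
    # Each thread blocks in IO wait on its own server, so plain
    # threads are enough; no local CPU work is being parallelised.
    proxy = xmlrpclib.ServerProxy(url)
    results[index] = proxy.run_report()  # hypothetical remote method

urls = ['http://server1/RPC2', 'http://server2/RPC2']  # illustrative
results = [None] * len(urls)
threads = [threading.Thread(target=fetch, args=(u, results, i))
           for i, u in enumerate(urls)]
for t in threads:
    t.start()
for t in threads:
    t.join()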

The problem comes when the user/browser gives up and cancels the request, and 
immediately resubmits (or simply gets impatient and does a reload). In my case, 
things would get so tangled that I'd have to restart web2py. 

I think multiprocessing might be a reasonable solution, but I'd look for a way 
to redefine the approach to be asynchronous: fill in the results in the 
database, perhaps, and poll from the client with Ajax. But do it in such a way 
that the initial request (to web2py) completes more or less immediately.
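
Roughly what I mean, as a sketch only (the table, field, and function names 
here are invented, and I haven't run this):

# In a model file: a table to hold the deferred results.
db.define_table('analysis_job',
    Field('status', default='pending'),
    Field('results', 'text'))

# In a controller:
def start_analysis():
    # Record the job and return immediately; something outside the
    # request (cron, a background script) finds pending rows, runs
    # the analysis, stores the results and sets status to 'done'.
    job_id = db.analysis_job.insert(status='pending')
    return dict(job_id=job_id)

def check_analysis():
    # Polled from the browser with Ajax until the job finishes.
    from gluon.serializers import json
    job = db.analysis_job(request.args(0))
    if job and job.status == 'done':
        return json(dict(done=True, results=job.results))
    return json(dict(done=False))

The client then hits check_analysis every few seconds until done comes back 
true, and the original request never blocks on the analysis.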

> 
> On 27 February 2011 20:32, David Mitchell <monch1...@gmail.com> wrote:
> Hello everyone,
> 
> I've read through the message archive and there seems to be a fairly clear 
> message: don't use the multiprocessing module within web2py.
> 
> However, I'm hoping I might have a use case that's a bit different...
> 
> I've got an app that basically does analytics on moderately large datasets.  
> I've got a number of controller methods that look like the following:
> 
> def my_method():
>     # Note: all data of interest has previously been loaded into 'session.data'
>     results = []
>     d = local_import('analysis')
>     results += d.my_1st_analysis_method(session)
>     results += d.my_2nd_analysis_method(session, date=date)
>     results += d.my_3rd_analysis_method(session)
>     results += d.my_4th_analysis_method(session, date=date)
>     results += d.my_5th_analysis_method(session, date=date)
>     return dict(results=results)
> 
> The problem I have is that all of the methods in my 'analysis' module, when 
> run in sequence as per the above, simply take too long to execute and give me 
> a browser timeout.  I can mitigate this to some extent by extending the 
> timeout on my browser, but I need to be able to use an iPad's Safari browser 
> and it appears to be impossible to increase the browser timeout on the iPad.  
> Even if it can be done, that approach seems pretty ugly and I'd rather not 
> have to do it.  What I really want to do is run all of these analysis methods 
> *simultaneously*, capturing the results of each analysis_method into a single 
> variable once they've finished.
> 
> All of the methods within the 'analysis' module are designed to run 
> concurrently - although they reference session variables, I've consciously 
> avoided updating any session variables within any of these methods.  While 
> all the data is stored in a database, it's loaded into a session variable 
> (session.data) before my_method is called; this data never gets changed as 
> part of the analysis.
> 
> Is it reasonable to replace the above code with something like this:
> 
> def my_method():
>     import multiprocessing
>     d = local_import('analysis')
> 
>     tasks = [
>         {'job': d.my_1st_analysis_method, 'args': (session,), 'kwargs': {}},
>         {'job': d.my_2nd_analysis_method, 'args': (session,), 'kwargs': {'date': date}},
>         {'job': d.my_3rd_analysis_method, 'args': (session,), 'kwargs': {}},
>         {'job': d.my_4th_analysis_method, 'args': (session,), 'kwargs': {'date': date}},
>         {'job': d.my_5th_analysis_method, 'args': (session,), 'kwargs': {'date': date}},
>     ]
> 
>     result_queue = multiprocessing.Queue()
> 
>     def run_job(job, args, kwargs):
>         # Each worker runs one analysis method and posts its result list.
>         result_queue.put(job(*args, **kwargs))
> 
>     # One process per analysis method; this relies on the child
>     # inheriting 'd' and 'session' via fork, so Unix only as written.
>     workers = []
>     for t in tasks:
>         worker = multiprocessing.Process(
>             target=run_job, args=(t['job'], t['args'], t['kwargs']))
>         worker.start()
>         workers.append(worker)
> 
>     results = []
>     for t in tasks:
>         results += result_queue.get()  # one result list per task
> 
>     for worker in workers:
>         worker.join()
> 
>     return dict(results=results)
> 
> Note: I haven't tried anything using the multiprocessing module before, so if 
> you've got any suggestions as to how to improve the above code, I'd greatly 
> appreciate it...
> 
> Is introducing multiprocessing as I've outlined above a reasonable way to 
> optimise code in this scenario, or is there something in web2py that makes 
> this a bad idea?  If it's a bad idea, do you have any suggestions for what 
> else I could try?
> 
> Thanks in advance
> 
> David Mitchell
> 

