Ask Solem <a...@opera.com> added the comment: New patch attach (termination-trackjobs3.patch).
> Hmm, a few notes. I have a bunch of nitpicks, but those > can wait for a later iteration. (Just one style nit: I > noticed a few unneeded whitespace changes... please try > not to do that, as it makes the patch harder to read.) Yeah, nitpicks can wait. We need a satisfactory solution first. I forgot about the whitespace, the reason is that the patch was started from the previous trackjobs patch. > - Am I correct that you handle a crashed worker > by aborting all running jobs? No. The job's result is marked with the WorkerLostError, the process is replaced by a new one, and the pool continue to be functional. >- If you're going to the effort of ACKing, why not record >the mapping of tasks to workers so you can be more selective in your >>termination? I does have access to that. There's ApplyResult.worker_pids(). It doesn't terminate anything, it just clean up after whatever terminated. The MapResult could very well discard the job as a whole, but my patch doesn't do that (at least not yet). > Otherwise, what does the ACKing do towards fixing this particular > issue? It's what lets us find out what PID is processing the job. (It also happens to be a required feature to reliably take advantage of external ack semantics (like in AMQP), and also used by my job timeout patch to know when a job was started, and then it shows to be useful in this problem. > - I think in the final version you'd need to introduce some > interthread locking, because otherwise you're going to have weird race > > conditions. I haven't thought too hard about whether you can > get away with just catching unexpected exceptions, but it's > probably better to do the locking. Where is this required? > - I'm getting hangs infrequently enough to make debugging annoying, > and I don't have time to track down the bug right now. Try this: for i in 1 2 3 4 5; ./python.exe test.regrtest -v test_multiprocessing it should show up quickly enough (at least on os x) > Why don't you strip out any changes that are not needed (e.g. AFAICT, > the > ACK logic), make sure there aren't weird race conditions, > and if we start converging on a patch that looks right from a high > level we > can try to make it work on all the corner case? See the updated patch. I can't remove the ACK, but I removed the accept_callback, as it's not strictly needed to solve this problem. ---------- Added file: http://bugs.python.org/file18664/multiprocessing-tr...@82502-termination-trackjobs3.patch _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue9205> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com