Ask Solem <a...@opera.com> added the comment:

New patch attach (termination-trackjobs3.patch).

> Hmm, a few notes. I have a bunch of nitpicks, but those
> can wait for a later iteration. (Just one style nit: I
> noticed a few unneeded whitespace changes... please try
> not to do that, as it makes the patch harder to read.)

Yeah, nitpicks can wait. We need a satisfactory solution first.
I forgot about the whitespace, the reason is that the patch was started
from the previous trackjobs patch.

> - Am I correct that you handle a crashed worker
> by aborting all running jobs?

No. The job's result is marked with the WorkerLostError, the
process is replaced by a new one, and the pool continue to be
functional.

>- If you're going to the effort of ACKing, why not record
>the mapping of tasks to workers so you can be more selective in your 
>>termination?

I does have access to that. There's ApplyResult.worker_pids().
It doesn't terminate anything, it just clean up after whatever terminated. The 
MapResult could very well discard the job as a whole,
but my patch doesn't do that (at least not yet).

> Otherwise, what does the ACKing do towards fixing this particular
> issue?

It's what lets us find out what PID is processing the job. (It also
happens to be a required feature to reliably take advantage of
external ack semantics (like in AMQP), and also used by my job timeout
patch to know when a job was started, and then it shows to be useful
in this problem.

> - I think in the final version you'd need to introduce some
> interthread locking, because otherwise you're going to have weird race > 
> conditions. I haven't thought too hard about whether you can
> get away with just catching unexpected exceptions, but it's
> probably better to do the locking.

Where is this required?

> - I'm getting hangs infrequently enough to make debugging annoying,
> and I don't have time to track down the bug right now.

Try this:

for i in 1 2 3 4 5; ./python.exe test.regrtest -v test_multiprocessing

it should show up quickly enough (at least on os x)

> Why don't you strip out any changes that are not needed (e.g. AFAICT, > the 
> ACK logic), make sure there aren't weird race conditions,
> and if we start converging on a patch that looks right from a high > level we 
> can try to make it work on all the corner case?

See the updated patch. I can't remove the ACK, but I removed the 
accept_callback, as it's not strictly needed to solve this problem.

----------
Added file: 
http://bugs.python.org/file18664/multiprocessing-tr...@82502-termination-trackjobs3.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9205>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to