On Apr 27, 2:54 pm, John Nagle <na...@animats.com> wrote:
> I have a multi-threaded CPython program, which has up to four
> threads. One thread is simply a wait loop monitoring the other
> three and waiting for them to finish, so it can give them more
> work to do. When the work threads, which read web pages and
> then parse them, are compute-bound, I've had the monitoring thread
> starved of CPU time for as long as 120 seconds.
How exactly are you determining that this is the case?

> I know that the CPython thread dispatcher sucks, but I didn't
> realize it sucked that bad. Is there a preference for running
> threads at the head of the list (like UNIX, circa 1979) or
> something like that?

Not in CPython, which is at the mercy of what the operating system
does. Under the covers, CPython uses a semaphore on Windows, and
Windows semaphores do not have FIFO ordering, per
http://msdn.microsoft.com/en-us/library/windows/desktop/ms685129(v=vs.85).aspx.
As a result, I think your thread is succumbing to the same issues that
affect signal delivery, as described on slides 22-24 and 35-41 of
http://www.dabeaz.com/python/GIL.pdf.

I'm not sure there's any easy or reliable way to "fix" that from your
code. I am not a WinAPI programmer, though, and I'd suggest finding
one to help you out. It doesn't appear possible to change the
scheduling policy for semaphores programmatically, and I don't know
whether they pay any attention to thread priority. That's just a
guess, though, and finding out for sure would take some low-level
debugging. However, it seems to be the most probable explanation,
assuming your code is correct.

> (And yes, I know about "multiprocessing". These threads are already
> in one of several service processes. I don't want to launch even more
> copies of the Python interpreter.

Why? There's little harm in launching more instances. Processes have
some additional startup and memory overhead compared to threads, but I
can't imagine it would be an issue. Given what you're trying to do,
I'd expect to run out of other resources long before I ran out of
memory because I created too many processes or threads.

> The threads are usually I/O bound,
> but when they hit unusually long web pages, they go compute-bound
> during parsing.)

If your concern is oversubscribing the CPU by using lots of processes,
I suspect it's misplaced. A whole mess of CPU-bound tasks is pretty
much the easiest case for a scheduler to handle.

Adam
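P.S. If you want to make "starved of CPU time" concrete, a rough check
is to have the monitor thread timestamp its own wakeups and complain
when the gap is much larger than the sleep interval. A minimal sketch
(the names and the 5-second threshold are made up, not from your code):

    import threading
    import time

    def monitor(stop_event, interval=1.0):
        # Timestamp every pass through the wait loop.  time.sleep()
        # releases the GIL, so any extra delay beyond `interval` is
        # time spent waiting on the OS scheduler and on reacquiring
        # the GIL.
        last = time.time()
        while not stop_event.is_set():
            time.sleep(interval)
            now = time.time()
            late = now - last - interval
            if late > 5.0:
                print("monitor starved an extra %.1f seconds" % late)
            last = now

    stop = threading.Event()
    t = threading.Thread(target=monitor, args=(stop,))
    t.daemon = True
    t.start()

That at least tells you whether the monitor is genuinely starved,
rather than blocked on something of its own.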
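P.P.S. To be concrete about the multiprocessing suggestion: you can
keep the fetching threads where they are and hand only the CPU-bound
parse to a small pool. A sketch, with parse_page standing in for your
real parser:

    import multiprocessing

    def parse_page(html):
        # Stand-in for the real parser; it must live at module level
        # so it can be pickled and sent to a worker process.
        return len(html.split())

    if __name__ == '__main__':
        pool = multiprocessing.Pool(processes=3)
        pages = ['<html>one</html>', '<html>one two three</html>']
        # The parse runs in a worker process, so it never holds the
        # GIL in your service process and the monitor thread stays
        # responsive.
        results = [pool.apply_async(parse_page, (p,)) for p in pages]
        for r in results:
            print(r.get())
        pool.close()
        pool.join()

The pool processes are started once and reused, so you're not paying
interpreter startup costs per page.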