Kristján Valur Jónsson <krist...@ccpgames.com> added the comment:

Sorry, what I meant by the "original problem" was the phenomenon observed by
Antoine (IIRC) that the same CPU thread tends to hog the GIL, even when
releasing it in ceval.c.
What I have been looking at up to now is chiefly IO performance, using David's
iotest.py, and how to improve the poor performance of IO.  IO does not suffer as
badly on Windows, because the IO thread gets its fair slice of execution
time.  Prompted by you, I added this bit of code to iotest.py:
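import time

spins = 0
laststat = 0
def spin():
    # CPU-bound worker: runs the pidigits task in a loop and prints the
    # number of iterations per second, roughly once a second.
    # task_pidigits() is assumed to be defined elsewhere in the script
    # and to return a (function, args) pair.
    global spins, laststat
    task, args = task_pidigits()
    laststat = time.time()   # start the first interval now, not at 0
    while True:
        task(*args)
        spins += 1
        t = time.time()      # wall clock; time.clock() is CPU time on Unix
        if t - laststat > 1:
            print(spins / (t - laststat))
            spins = 0
            laststat = t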
       

You are right, however, that the throughput of multiple CPU-bound threads
suffers.  In fact, on Windows, it appears to suffer the least with the
LEGACY_GIL implementation.  This is, I conjecture, because there are far fewer
context switches (the releasing thread usually gets the GIL right back, so the
attempt to relinquish it fails).  My conjecture is that context switches
between threads on two cores are so expensive as to dramatically affect
performance.  Normal multithreaded programs don't suffer from this because
their threads are kept busy.  But in our case, we are stopping one thread on
one core and starting another on a separate core, and this causes latency.

Now, I've improved my patch somewhat.  First, I fixed some minor errors in
the PRIORITY_GIL implementation.  More importantly, I added something
called FIFOCOND, a condition variable that guarantees the FIFO property.
This was prompted by my observation that even Windows' Semaphore doesn't
guarantee it; rather, the Windows scheduler may allow the currently executing
thread to jump ahead in the semaphore queue.  The FIFOCOND condition variable
fixes that using explicit scheduling, and is intended as a diagnostic tool.
(Antoine, regarding your comment from 13:04 about "roundrobin", to the effect
that we don't know anything about the condition variable's behaviour: I was
assuming FIFO behaviour for the sake of argument, and I thought I'd stated in
the comments that we assume a general 'fairness' there.  Put in the FIFOCOND
and you will have that fairness guaranteed.)
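
The FIFO guarantee can be illustrated with a minimal Python sketch of a lock
that hands ownership directly to its oldest waiter.  This is not the FIFOCOND
from the patch (which is C code); the class name FifoLock and its structure
are hypothetical, and it models only the fairness property: because release()
passes ownership straight to the head of the queue, the releasing thread
cannot jump ahead by immediately re-acquiring.

```python
import collections
import threading


class FifoLock:
    """Hypothetical FIFO lock: waiters acquire in strict arrival order.

    release() hands ownership directly to the oldest waiter instead of
    merely unlocking, so a thread that releases and immediately calls
    acquire() again queues behind everyone already waiting.
    """

    def __init__(self):
        self._mutex = threading.Lock()        # protects _owned and _waiters
        self._owned = False
        self._waiters = collections.deque()   # one private gate per waiter, oldest first

    def acquire(self):
        with self._mutex:
            if not self._owned:
                # Free, and by construction nobody is queued: take it.
                self._owned = True
                return
            gate = threading.Lock()
            gate.acquire()                    # gate starts closed
            self._waiters.append(gate)
        gate.acquire()                        # sleep until release() opens our gate

    def release(self):
        with self._mutex:
            if self._waiters:
                # Explicit scheduling: ownership passes to the head of the
                # queue; _owned stays True so no running thread can barge in.
                self._waiters.popleft().release()
            else:
                self._owned = False
```

The per-waiter gate is the same building block CPython's threading.Condition
uses internally; the difference here is that ownership is handed over rather
than released, which closes the window in which a running thread can
overtake a woken one.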


At any rate, I believe my patch provides a useful platform for further
experimentation:
1) factoring out the GIL as a separate type of lock (which it must be)
2) allowing for different implementations of the GIL
3) shoring up the condition variable implementation on Windows
4) providing a FIFOCOND_T type to enforce a particular scheduling order, and
demonstrating how we can be explicit about thread scheduling.

I have already demonstrated that the PRIORITY_GIL method fixes the
problem with IO threads in the presence of CPU-bound threads.  Your iotest.py
script is perfect for this, using 2 worker threads.  On Windows, the problem
with IO wasn't as grave, as I have explained (Windows by default behaves like
the ROUNDROBIN_GIL implementation, not the LEGACY_GIL mode used on pthreads).
The PRIORITY_GIL solution is particularly effective with multicore on Windows,
but it also improves IO throughput if the CPU affinity of the server is fixed
to one CPU, i.e. on a single core.
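
As a rough illustration of the PRIORITY_GIL idea, here is a hypothetical
Python model, not the patch's C implementation: a thread that has just
finished IO asks for the lock with priority=True, which queues it ahead of all
ordinary waiters and sets a drop_request flag that the holder is assumed to
poll at its next check, the way the eval loop polls at each check interval.
The class and attribute names are invented for this sketch.

```python
import collections
import threading


class PriorityGil:
    """Hypothetical model of a 'priority' GIL.

    On release, ownership is handed directly to the oldest priority waiter
    (e.g. an IO thread whose data just arrived) before any ordinary waiter.
    """

    def __init__(self):
        self._mutex = threading.Lock()       # protects the fields below
        self._held = False
        self._priority = collections.deque() # gates of priority (IO) waiters
        self._normal = collections.deque()   # gates of ordinary (CPU) waiters
        self.drop_request = False            # holder should yield at its next check

    def acquire(self, priority=False):
        with self._mutex:
            if not self._held:
                # Waiters only exist while the lock is held, so take it.
                self._held = True
                return
            gate = threading.Lock()
            gate.acquire()                   # gate starts closed
            if priority:
                self.drop_request = True     # ask the holder to drop the GIL
                self._priority.append(gate)
            else:
                self._normal.append(gate)
        gate.acquire()                       # wait for a direct handoff

    def release(self):
        with self._mutex:
            queue = self._priority or self._normal  # priority waiters first
            if queue:
                queue.popleft().release()    # hand over; _held stays True
            else:
                self._held = False
            self.drop_request = bool(self._priority)
```

In this model an IO thread never waits behind queued CPU-bound threads, only
behind the current holder, and only until that holder's next check.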

I have no fix for CPU-bound threads, and I honestly don't think such a fix
exists, except by causing switches to happen far less frequently, e.g. by
raising the checkinterval, and so mitigating the problem (which is what the new
GIL in py3k does with its timeout implementation).  But the IO fix for pthreads
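
Switching less often is a runtime setting rather than a patch.  In the
Python 2 interpreter discussed here the knob is sys.setcheckinterval(),
counted in bytecode instructions (default 100); the py3k GIL replaced it with
the time-based sys.setswitchinterval().  A minimal Python 3 sketch (the helper
name switch_less_often is made up for illustration):

```python
import sys


def switch_less_often(factor):
    """Hypothetical helper: multiply the interpreter's thread switch
    interval by `factor` and return the previous value so it can be
    restored later. Requires Python 3.2+ (sys.setswitchinterval)."""
    previous = sys.getswitchinterval()      # seconds; CPython default is 0.005
    sys.setswitchinterval(previous * factor)
    return previous
```

Restore the old behaviour with sys.setswitchinterval(previous).  A longer
interval means fewer cross-core handoffs between CPU-bound threads, at the
cost of longer waits for any other thread that wants the GIL.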

To summarise, then:
1) The GIL has two problems on multicore machines:
 a) performance of CPU threads goes down
 b) performance of IO in the presence of CPU threads is abysmal (but not on
Windows)
2) We can fix problem b) on pthreads with the ROUNDROBIN_GIL implementation.
3) We can improve IO performance in the presence of CPU threads on pthreads and
Windows using the PRIORITY_GIL implementation, even making it faster than on a
single core.
4) We cannot do anything about the decreased performance of cooperatively
switching CPU threads on multicore except switching less frequently.  But this
is quite feasible now with the PRIORITY_GIL implementation, because it can
request an immediate GIL drop when IO is ready.  So raising the checkinterval
will not hurt IO performance.
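
Point 4 can be made concrete with a toy, deterministic model (hypothetical,
not measured): a CPU-bound thread holds the GIL and only checks for waiters
when a countdown, similar in spirit to the check-interval counter in Python
2's eval loop, reaches zero.  Without a drop request, an IO thread that
becomes ready waits for the rest of the current interval; if it can force the
countdown to expire after the next instruction, as the PRIORITY_GIL is
described as doing, its wait no longer depends on the interval at all.

```python
def io_wait_ticks(ready_at, interval, drop_request):
    """Toy model: return how many instructions the CPU-bound GIL holder
    executes between the IO thread becoming ready (at instruction
    `ready_at`, >= 0) and the holder's next check, where it would hand
    over the GIL. Ticks are abstract instructions, not real timings."""
    countdown = interval        # instructions left until the next check
    tick = 0
    waited = None               # None until the IO thread is ready
    while True:
        if tick == ready_at:
            waited = 0
            if drop_request:
                countdown = 1   # force a check after the very next instruction
        tick += 1               # holder executes one instruction
        countdown -= 1
        if waited is not None:
            waited += 1
        if countdown == 0:
            if waited is not None:
                return waited   # the check sees the waiter and drops the GIL
            countdown = interval


for interval in (100, 1000):
    # prints "100 70 1" and then "1000 970 1"
    print(interval, io_wait_ticks(30, interval, False),
          io_wait_ticks(30, interval, True))
```

Raising the interval tenfold makes the IO wait grow tenfold in the plain
case, but leaves it at one instruction when a drop can be requested.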


Please have a look at the latest patch with IO thread performance in mind.  It
is currently configured to enable the PRIORITY_GIL implementation, without the
FIFOCOND, on both Windows and pthreads.

----------
Added file: http://bugs.python.org/file16770/gil2.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8299>
_______________________________________