Kristján Valur Jónsson <krist...@ccpgames.com> added the comment:

David, in search of more realistic IO benchmarks I ran some more tests.  The 
idea is to have a threaded socket server serving requests that take different 
amounts of time to process, and to see how IO response holds up when two 
classes of requests are being serviced simultaneously.

Please see evalsrv.rar for the client and server scripts.  The client uses 
multiprocessing to keep its own timing clear of the GIL issue.  The results on 
my dual-core Windows box are as follows (LEGACY_GIL is the original, unfair 
GIL; ROUNDROBIN_GIL is the same with the fairness fix; "with affinity" means 
that the server process is restricted to running on one core).
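
For readers without the attachment, here is a minimal sketch of the shape of 
the benchmark.  It is not the attached code: the port, the 4-byte request 
protocol and the burn() workload are illustrative assumptions only.

    import multiprocessing
    import socket
    import struct
    import threading
    import time

    HOST, PORT = "127.0.0.1", 9901

    def burn(units):
        # CPU-bound work proportional to 'units'; stands in for request
        # processing of different durations.
        x = 0
        for i in range(units * 1000):
            x += i
        return x

    def recv_exact(conn, n):
        # Read exactly n bytes (or less on EOF).
        buf = b""
        while len(buf) < n:
            chunk = conn.recv(n - len(buf))
            if not chunk:
                break
            buf += chunk
        return buf

    def handle(conn):
        # Serve one client: each request is a 4-byte work amount, reply b"ok".
        data = recv_exact(conn, 4)
        while len(data) == 4:
            (units,) = struct.unpack("!i", data)
            burn(units)
            conn.sendall(b"ok")
            data = recv_exact(conn, 4)
        conn.close()

    def serve():
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((HOST, PORT))
        srv.listen(5)
        while True:
            conn, _ = srv.accept()
            # One server thread per client; GIL scheduling decides who runs.
            t = threading.Thread(target=handle, args=(conn,))
            t.daemon = True
            t.start()

    def client(n_requests, units, out):
        # Each request class runs in its own process, so client-side timing
        # is independent of the server's GIL.
        s = socket.create_connection((HOST, PORT))
        times = []
        for _ in range(n_requests):
            t0 = time.time()
            s.sendall(struct.pack("!i", units))
            recv_exact(s, 2)
            times.append(time.time() - t0)
        s.close()
        out.put(((n_requests, units), (sum(times), sum(times) / len(times))))

    if __name__ == "__main__":
        t = threading.Thread(target=serve)
        t.daemon = True
        t.start()
        time.sleep(0.5)
        out = multiprocessing.Queue()
        # Two request classes, serviced simultaneously: 30 slow vs. 300 fast.
        procs = [multiprocessing.Process(target=client, args=(30, 500, out)),
                 multiprocessing.Process(target=client, args=(300, 10, out))]
        t0 = time.time()
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(out.get())
        print(out.get())
        print("end-to-end: %.2fs" % (time.time() - t0))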

label (requests, work)    total time    avg time     std.dev

serial
  (30, 500)               2.7145s       0.09048s     0.00419
  (300, 10)               0.46543s      0.00155s     0.00023
  3.36s (3.18s)  (end-to-end time, sum of individual classes)
simultaneous
  (30, 500)               2.8820s       0.09606s     0.00443
  (300, 10)               3.2083s       0.01069s     0.01442
  3.21s (6.09s)

(For each test you get the individual timing for each request class, followed 
by the end-to-end time and the sum of the individual times.)
Please don't read too much into small differences; this is roughly a one-off 
test and likely contains noise.
A few things become apparent:
1) With LEGACY_GIL, affinity appears not to matter.  The 300 fast requests 
take longer to complete than the 30 slow requests when run in parallel, even 
though their serial execution time is roughly one fifth of the slow class's.
2) With ROUNDROBIN_GIL, serial performance appears unaffected, but 
simultaneous performance is much better: end-to-end time is the same, but the 
sum over the individual classes is lower.  That means the clients had to wait 
less for their IO results.
3) With ROUNDROBIN_GIL, if we turn affinity on, we get the same kind of 
performance as with LEGACY_GIL.  (A sketch of one way to pin the process to a 
single core follows this list.)
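
For reference, pinning a process to one core on Windows can be done with the 
Win32 call SetProcessAffinityMask; a minimal ctypes sketch (the attached 
scripts may do this differently):

    import ctypes

    kernel32 = ctypes.windll.kernel32
    # The affinity mask has one bit per logical CPU; 0x1 = run only on CPU 0.
    if not kernel32.SetProcessAffinityMask(kernel32.GetCurrentProcess(), 1):
        raise ctypes.WinError()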


The most important points here are the last two, I think.  The fact that the 
sum of the individual request waits goes down is significant, and it drops by 
no small amount.  But equally perplexing is the fact that forcing the server 
onto one CPU removes the "fairness" again.  It would appear that the behaviour 
of the synchronization object (a Windows semaphore in this case) changes 
depending on the number of cores, just as you previously mentioned.  This is, 
however, a Windows-only effect, I think.  I must try to find out what is 
going on.
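
One way to probe this would be to measure the handoff order of a raw Windows 
semaphore directly, with and without affinity.  A sketch of such an experiment 
(Windows-only; ctypes releases the GIL around the Wait call, so both threads 
really block in the kernel; using wakeup run-lengths as a fairness proxy is an 
assumption, and GIL contention between the two Python threads also colors the 
result):

    import ctypes
    import threading
    import time

    kernel32 = ctypes.windll.kernel32
    INFINITE = 0xFFFFFFFF

    # Count 1 / max 1: the semaphore acts like the lock guarding the GIL.
    sem = kernel32.CreateSemaphoreA(None, 1, 1, None)
    wakeups = []

    def worker(ident, rounds):
        for _ in range(rounds):
            kernel32.WaitForSingleObject(sem, INFINITE)
            wakeups.append(ident)   # record who got the semaphore this round
            time.sleep(0)           # hold it briefly, then hand it back
            kernel32.ReleaseSemaphore(sem, 1, None)

    if __name__ == "__main__":
        # Run once normally and once pinned to one core (see the affinity
        # sketch above), then compare the wakeup patterns.
        threads = [threading.Thread(target=worker, args=(i, 1000))
                   for i in range(2)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # Long runs of the same id mean one thread keeps re-acquiring the
        # semaphore (unfair); strict alternation means round-robin handoff.
        runs, cur = [], 1
        for a, b in zip(wakeups, wakeups[1:]):
            if a == b:
                cur += 1
            else:
                runs.append(cur)
                cur = 1
        runs.append(cur)
        print("max consecutive wakeups by one thread: %d" % max(runs))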

----------
Added file: http://bugs.python.org/file17034/evalsrv.rar

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8299>
_______________________________________