New submission from Phillip Sitbon <phillip.sitbon+python-...@gmail.com>:
At the suggestion of others on the Python-Dev list, I'm going to outline what I've been doing with the GIL and where I see possibility for improvement.

Discussion: http://mail.python.org/pipermail/python-dev/2009-May/089746.html

Detail:

Currently, Python's global interpreter lock is implemented with the same mechanism that provides locks for the threading module. Moving the GIL's low-level handling to a separate mechanism would free core developers from the feature requirements imposed upon thread locks, specifically:

1. Blocking recursion
   A thread can acquire a lock it has already acquired, and this action will block.

2. Release-anywhere
   Any thread not owning a lock can release it (provided it is currently owned), thus preventing permanent deadlock in #1.

These semantics are those of a semaphore with an initial value of 1, where acquisitions decrease its value and releases increase it. This behavior cannot change without changing the fundamental behavior of the threading.Lock object that many users have come to expect.

I have been looking into ways to minimize GIL overhead in high-concurrency situations. At the moment, I'm only looking at Windows, partly because it was an easy target and I have experience with its synchronization objects. While I have come up with a faster alternative for GIL handling, it breaks the expectations of thread lock objects. I then got to wondering:

- Does anyone actually use the GIL locking/unlocking functions in a manner that requires the features listed above?
- If so, could the same thing be accomplished a different way if the GIL were expected to behave more like a recursive mutex?

I'm not going to make any assumptions about what extensions do with the GIL, but I can hope that at least they use the API functions. If so, the question is whether we can safely apply different locking semantics to the GIL:

- Non-blocking recursion
  A thread that already owns the lock can re-acquire it without blocking, increasing a recursion count.
- Owner-only releasing
  Only the thread owning the lock can release it, and only as many times as it has acquired it.

In Windows, this is how critical sections behave. I'm not sure about the Linux futex, but I know pthread mutexes and others like them can be made recursive. I'm mainly interested in separating the GIL from thread locks so we can say "do x and y with it, but not z" without changing the threading module's lock mechanism.

I decided to make a case of this when I was testing my ISAPI extension under high load. The current mechanism is as fast as it can possibly be for a single thread, but it does not scale well at all. This has a lot to do with its use of event synchronization objects. Critical sections, on the other hand, scale well even when contention is unavoidable.

My first experiment involved modifying the thread lock functions in thread_nt.h to use critical sections. Yes, it broke the behavior of threading.Lock, but it still worked for the GIL. The performance improvement varies, but it's always noticeable in the presence of concurrency: I saw the performance of my server extension nearly double when serving Django pages from 10-12 threads concurrently.

Obviously we can't just slap on a critical section and call it done, unless nobody cares about changing what threading.Lock does. That's what led me to believe that a separate GIL mechanism is very important to have.

For your hacking pleasure, I've included a patch for thread_nt.h that will facilitate GIL performance testing. It includes the critical section implementation and a semaphore implementation (which is about as fast as the current behavior). I've also included a _very_ basic script for some ballpark GIL contention measurements. The numbers will be a bit exaggerated, especially with a _Py_CheckInterval of 1, but you will see the difference between the current method and the critical section method right away, especially in terms of scalability.
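As an aside, the contrast between the two sets of semantics discussed above can be demonstrated from pure Python, since threading.RLock already exposes recursive-mutex behavior while threading.Lock has the semaphore-like behavior the GIL currently inherits. A minimal sketch (not part of the attached patch or test script):

```python
import threading

# threading.Lock: semaphore-like semantics (what the GIL uses today).
lock = threading.Lock()
lock.acquire()

# 1. Blocking recursion: a second acquire by the owner would block,
#    shown here with a non-blocking attempt that fails.
assert lock.acquire(False) is False

# 2. Release-anywhere: a thread that never acquired the lock may release it.
t = threading.Thread(target=lock.release)
t.start()
t.join()
assert lock.acquire(False) is True  # the other thread freed it
lock.release()

# threading.RLock: recursive-mutex semantics (the proposed GIL behavior).
rlock = threading.RLock()
rlock.acquire()
rlock.acquire()   # non-blocking recursion; recursion count is now 2
rlock.release()
rlock.release()

# Owner-only releasing: a non-owning thread cannot release an RLock.
rlock.acquire()
result = []

def try_release():
    try:
        rlock.release()
    except RuntimeError:
        result.append("refused")

t2 = threading.Thread(target=try_release)
t2.start()
t2.join()
assert result == ["refused"]
rlock.release()
print("ok")
```

This only illustrates the semantics at the Python level, of course; the patch is about applying the RLock-style rules to the GIL's low-level implementation.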
If you want a baseline cost for the iteration loops, change ThreadCount to 1 and make CheckInterval something big (>1 million). There are other costs included in the reported numbers, but switching between the two lock implementations should make it clear that any big difference you see is due to locking/unlocking/contention. Please feel free to correct me if I did it all wrong - I'm sure there are other/better ways to test GIL performance.

My testing was done with Python 2.6.2 x86 on Vista x64. I can provide measurements if it's not convenient for anyone to try out the patch.

So where to go from here? It'd be good to know whether changing the restrictions on GIL usage will break anything, and whether the same performance increase can be had on Linux/Mac/others. I'll continue to look into giving the GIL its own locking API, and hopefully my efforts are not for naught.

----------
components: Interpreter Core, Windows
files: thread_nt.h.patch
keywords: patch
messages: 88452
nosy: sitbon
severity: normal
status: open
title: Implement the GIL with critical sections in Windows
type: performance
versions: Python 2.7, Python 3.1
Added file: http://bugs.python.org/file14096/thread_nt.h.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6132>
_______________________________________