Python has a GIL that impairs scalability on computers with more than one processor. The root of the problem is that there is only one GIL per process. Proposals to remove the GIL have always foundered on the need for fine-grained locking of reference counts. I believe there is a second way, which has been overlooked: having one GIL per interpreter instead of one GIL per process.
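To make the scalability problem concrete, here is a rough sketch in present-day Python (the function name and loop count are just illustrative, and exact timings will vary by machine): two threads running a pure-Python, CPU-bound loop take at least as long as doing the same work serially, because the GIL lets only one thread execute bytecode at a time.

```python
import threading
import time

def count(n):
    # pure-Python CPU-bound work; the GIL serializes it across threads
    while n > 0:
        n -= 1

N = 2 * 10**6

t0 = time.time()
count(N)
count(N)
serial = time.time() - t0

t0 = time.time()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.time() - t0

# Even on a multi-core machine, 'threaded' is not meaningfully smaller
# than 'serial': the two threads never run bytecode in parallel.
print(serial, threaded)
```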
Currently, the Python C API - as I understand it - only allows for a single interpreter per process. Here is how Python would be embedded in a multi-threaded C program today, with the GIL shared among the C threads:

    #include <windows.h>
    #include <process.h>
    #include <Python.h>

    /* interpreter state shared by all threads; set up in main() */
    static PyInterpreterState *interp = NULL;

    unsigned __stdcall threadproc(void *data)
    {
        /* create a thread state for this thread */
        PyThreadState *threadstate = PyThreadState_New(interp);

        /* swap this thread in, do whatever we need */
        PyEval_AcquireLock();
        PyThreadState_Swap(threadstate);
        PyRun_SimpleString("print 'Hello World1'\n");
        PyThreadState_Swap(NULL);
        PyEval_ReleaseLock();

        /* clear the thread state for this thread */
        PyEval_AcquireLock();
        PyThreadState_Clear(threadstate);
        PyThreadState_Delete(threadstate);
        PyEval_ReleaseLock();

        /* tell Windows this thread is done */
        _endthreadex(0);
        return 0;
    }

    int main(int argc, char *argv[])
    {
        HANDLE threads[3];
        int i;

        Py_Initialize();
        PyEval_InitThreads();            /* main thread now holds the GIL */
        interp = PyThreadState_Get()->interp;
        PyEval_ReleaseLock();            /* let the worker threads run */

        for (i = 0; i < 3; i++)
            threads[i] = (HANDLE)_beginthreadex(NULL, 0, threadproc,
                                                NULL, 0, NULL);
        WaitForMultipleObjects(3, threads, TRUE, INFINITE);
        for (i = 0; i < 3; i++)
            CloseHandle(threads[i]);

        PyEval_AcquireLock();
        Py_Finalize();
        return 0;
    }

(Note that the threads must be created with _beginthreadex rather than _beginthread, because _beginthread closes its handle when the thread exits, making it unsafe to pass to WaitForMultipleObjects.)

In the Java Native Interface (JNI), every function takes an environment pointer for the VM. The same thing could be done for Python, with the VM - GIL included - encapsulated in a single object:

    #include <windows.h>
    #include <process.h>
    #include <Python.h>

    unsigned __stdcall threadproc(void *data)
    {
        /* create a new interpreter, private to this thread */
        PyVM *vm = Py_Initialize();
        PyRun_SimpleString(vm, "print 'Hello World1'\n");
        Py_Finalize(vm);
        _endthreadex(0);
        return 0;
    }

    int main(int argc, char *argv[])
    {
        HANDLE threads[3];
        int i;

        for (i = 0; i < 3; i++)
            threads[i] = (HANDLE)_beginthreadex(NULL, 0, threadproc,
                                                NULL, 0, NULL);
        WaitForMultipleObjects(3, threads, TRUE, INFINITE);
        for (i = 0; i < 3; i++)
            CloseHandle(threads[i]);
        return 0;
    }

Doesn't that look a lot nicer?
If one can have more than one interpreter in a single process, it becomes possible to create a pool of them and implement concurrent programming paradigms such as fork/join (to appear in Java 7, already in C# 3.0). It would also be possible to emulate a fork on platforms that lack a native fork(), such as Windows; Perl does this in 'perlfork'. This would deal with the GIL issue on computers with more than one CPU.

One could even use ctypes to embed a pool of Python interpreters in a process already running Python. Most of the conversion of the current Python C API could be automated. Python would also need to be linked against a multi-threaded version of the C library.

-- 
http://mail.python.org/mailman/listinfo/python-list