Sorry, just a few more thoughts: does anybody know why the GIL can't be made more fine-grained? I mean, use different locks for different parts of the code. That way there would be far less blocking, and the plugin interface could remain the same (the interpreter would know which lock it used for the plugin, so the actual functions for releasing / reacquiring the lock could stay as they are).

On second thought, forget this. This is probably exactly the cause of the free-threading mod's reduced performance: fine-graining the locks increases the lock count, and their implementation is rather slow per se.

It's strange that the *nix variants don't have InterlockedExchange, probably because they aren't x86-specific. I find it strange that other architectures wouldn't have such instructions, though... Also, an OS should still be able to provide such a function even if the underlying architecture doesn't have it. After all, the kernel knows what it's currently running, and kernels typically aren't preempted themselves.
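Just to be concrete about the kind of primitive I mean, here is a rough, untested sketch of a spinlock built on atomic exchange. On Windows that's InterlockedExchange; elsewhere I'm assuming a compiler builtin like GCC's __sync_lock_test_and_set does the same job:

/* Rough, untested sketch of a spinlock built on atomic exchange.
 * InterlockedExchange on Windows; GCC's __sync builtins assumed elsewhere. */
#ifdef _WIN32
#include <windows.h>
typedef volatile LONG spinlock_t;
#define atomic_take(p)  InterlockedExchange((p), 1)   /* returns the old value */
#define atomic_drop(p)  InterlockedExchange((p), 0)
#else
typedef volatile long spinlock_t;
#define atomic_take(p)  __sync_lock_test_and_set((p), 1)
#define atomic_drop(p)  __sync_lock_release((p))
#endif

static void spin_acquire(spinlock_t *lock)
{
    while (atomic_take(lock) != 0)
        ;   /* spin until the old value was 0, i.e. we took the lock */
}

static void spin_release(spinlock_t *lock)
{
    atomic_drop(lock);
}

One of these per structure instead of a single GIL is the fine-graining I was talking about; the cost is that every access then goes through its own acquire / release pair.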
Also, a side question: why does Python like to use events instead of "true" synchronization objects so much? Almost every library I looked at did that. IMHO it's quite irrational to press objects intended for something else into this job when there are plenty of "true" options supported by every OS out there.

Still, the free-threading mod could work just fine if one more global variable were added: the current Python thread count. A simple check for a value greater than 1 would trigger the synchronization code, while a single-threaded program would see no locking at all. Even so, I didn't like the performance figures of the mod (0.6x execution speed, pretty bad core / processor scaling).

I don't know why it's so hard to do simple locking just for writes to globals. I used to do it massively and it always worked with almost no penalty at all. It's true that those were all Windows programs, using critical sections.
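Something like this is all I have in mind (Windows flavour, since that's what I know; the names are invented, not real CPython internals):

/* Sketch only; these names are made up, not actual CPython internals.
 * InitializeCriticalSection(&globals_lock) would run once at interpreter start-up. */
#include <windows.h>

static volatile LONG py_thread_count = 1;   /* invented: number of live Python threads */
static CRITICAL_SECTION globals_lock;       /* invented: guards writes to interpreter globals */

static void set_global(void **slot, void *value)
{
    if (py_thread_count > 1) {
        EnterCriticalSection(&globals_lock);  /* only pay for locking when other threads exist */
        *slot = value;
        LeaveCriticalSection(&globals_lock);
    } else {
        *slot = value;                        /* single thread: no locking at all */
    }
}

The count would of course have to be bumped before a new thread actually starts running, so the check can't race with thread creation.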