Hi,

I've recently been working on an application[1] that does quite a bit of searching through large data structures and string matching. I was thinking it would help to move some of this CPU-intensive work to another thread, but of course that won't help, because Python's GIL keeps threads from running Python code in parallel.
There's a lot of past discussion on this, and I want to bring it up again because, with the work on Python 3000, I think it is worth taking a look at what can be done to address portions of the problem through language changes. The recent hardware trend towards multicore processors is another reason to look at the problem again.

= dynamic objects, locking and __slots__ =

I remember reading (though I can't find it now) about one person's attempt at true multithreaded programming that involved adding a mutex to all object access. The obvious question, though, is: why don't other truly multithreaded languages like Java need to lock an object when making changes? The answer is that they don't support adding random attributes to objects; in other words, they default to the equivalent of __slots__.

== Why hasn't __slots__ been successful? ==

I very rarely see Python code use __slots__. I think there are a couple of reasons for this. The first is that a lot of programs don't need to optimize at this level. The second is that it's annoying to use, because it means you have to type your member variables *another* time (in addition to __init__, for example), which feels very un-Pythonic.

== Defining object attributes ==

In my Python code, one restriction I try to follow is to set all the attributes I use for an object in __init__. You could do this as class member variables, but often I want to set them in __init__ anyway from constructor arguments, so "defining" them in __init__ means I only type them once, not twice.

One random idea is, for Python 3000, to make the equivalent of __slots__ the default, *but* gather the set of attributes from all member variables set in __init__. For example, if I write:

    import time

    class Foo(object):
        def __init__(self, bar=None):
            self.__baz = 20
            if bar:
                self.__bar = bar
            else:
                self.__bar = time.time()

    f = Foo()
    f.otherattr = 40  # this would be an error!
                      # Can't add random attributes not defined in __init__

I would argue that the current Python default of supporting adding random attributes is almost never what you really want. If you *do* want to set random attributes, you almost certainly want to be using a dictionary or a subclass of one, not an object. What's nice about current Python is that you don't need to redundantly type things, and we should preserve that while still allowing more efficient implementation strategies.

= Limited threading =

Now, I realize there are a ton of other things the GIL protects besides object dictionaries; with true threading you would have to touch the importer and the garbage collector, verify all the C extension modules, etc. Obviously non-trivial. What if, as an initial push towards real threading, Python had support for "restricted threads"? Essentially, restricted threads would be limited to a subset of the standard library that had been verified for thread safety, would not be able to import new modules, etc. Something like this:

    import queue
    import threading

    def datasearcher(items, results):
        for item in items:
            if item.startswith('foo'):
                results.put(item)
        results.done()  # made-up method: no more items coming

    vals = ['foo', 'bar']
    results = queue.Queue()
    # made-up function: run datasearcher in a restricted thread
    threading.start_restricted_thread(datasearcher, vals, results)

    def print_item(item):
        print(item)
    results.set_callback(print_item)  # made-up method

I know I'm making up some API above, but the point is that "datasearcher" could pretty easily run in a true thread and touch very little of the interpreter; only support for atomic reference counting and a concurrent garbage collector would be needed.

Thoughts?

[1] http://submind.verbum.org/hotwire/wiki

--
http://mail.python.org/mailman/listinfo/python-list
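
P.S. On the __slots__ point: for anyone who hasn't used it, here is a minimal sketch of what __slots__ already does today, including the duplicated attribute names I was complaining about. The class name Point and its attributes x and y are made up purely for illustration.

```python
class Point(object):
    # Attribute names are typed once here...
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        # ...and a second time here.
        self.x = x
        self.y = y


p = Point(1, 2)
try:
    p.z = 3  # 'z' is not listed in __slots__, so this assignment is rejected
except AttributeError as exc:
    error = exc
else:
    error = None

print(type(error).__name__)  # prints "AttributeError"
```

The proposal in the post would amount to getting this behaviour without writing the __slots__ line by hand.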
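
P.S. On the restricted-thread example: start_restricted_thread, done() and set_callback() don't exist, so as a rough sketch here is the same shape written with what the standard library does provide (threading.Thread and queue.Queue). The DONE sentinel and the drain() helper are my own stand-ins for the made-up done() and set_callback(). This still runs under the GIL, so it only shows how the worker and consumer would be wired together, not any speedup.

```python
import queue
import threading

DONE = object()  # hypothetical sentinel standing in for the made-up queue.done()


def datasearcher(items, results):
    """Put every item starting with 'foo' on the results queue."""
    for item in items:
        if item.startswith('foo'):
            results.put(item)
    results.put(DONE)  # signal that no more items are coming


def drain(results):
    """Collect items from the queue until the worker signals completion."""
    found = []
    while True:
        item = results.get()  # blocks until the worker puts something
        if item is DONE:
            return found
        found.append(item)


vals = ['foo', 'bar', 'foobar']
results = queue.Queue()
worker = threading.Thread(target=datasearcher, args=(vals, results))
worker.start()
matches = drain(results)
worker.join()
print(matches)  # prints ['foo', 'foobar']
```

Note that the worker only touches its arguments, string methods and the queue, which is the kind of small footprint a restricted thread would be limited to.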