[EMAIL PROTECTED] wrote:
> I've recently been working on an application[1] which does quite a bit
> of searching through large data structures and string matching, and I
> was thinking that it would help to put some of this CPU-intensive work
> in another thread, but of course this won't work because of Python's
> GIL.

If you are doing string searching, implement the algorithm in C, and
call out to the C (remembering to release the GIL).
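
A rough sketch of that route using ctypes, with libc's strstr() standing
in for whatever search routine you would actually write in C (in a
hand-written extension module you would instead bracket the C work with
Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS yourself):

    import ctypes
    import ctypes.util
    import threading

    # ctypes.CDLL drops the GIL for the duration of each foreign call,
    # so the C search can overlap with Python running in other threads.
    libc = ctypes.CDLL(ctypes.util.find_library('c'))
    libc.strstr.restype = ctypes.c_char_p
    libc.strstr.argtypes = [ctypes.c_char_p, ctypes.c_char_p]

    def search(haystack, needle, results):
        hit = libc.strstr(haystack, needle)  # GIL released during the call
        if hit is not None:
            results.append(hit)

    results = []
    worker = threading.Thread(target=search,
                              args=('a large buffer with foo in it',
                                    'foo', results))
    worker.start()
    worker.join()
    print results
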
> There's a lot of past discussion on this, and I want to bring it up
> again because with the work on Python 3000, I think it is worth trying
> to take a look at what can be done to address portions of the problem
> through language changes.

Not going to happen.  All Python 3000 PEPs had a due date at least a
month ago (possibly even two), so you are too late to get *any*
substantial change in.

> I remember reading (though I can't find it now) one person's attempt
> at true multithreaded programming involved adding a mutex to all
> object access.  The obvious question though is - why don't other true
> multithreaded languages like Java need to lock an object when making
> changes?

From what I understand, the Java runtime uses fine-grained locking on
all objects.  You just don't notice it because you don't need to write
the acquire()/release() calls yourself; it is done for you (in a
similar fashion to Python's GIL acquisition/release when switching
threads).

They also have a nice little decorator-like keyword (I'm not a Java
guy, so I don't know the details exactly) called 'synchronized', which
locks and unlocks the object when it is accessed through a method.

> == Why hasn't __slots__ been successful? ==
>
> I very rarely see Python code use __slots__.  I think there are
> several reasons for this.  The first is that a lot of programs don't
> need to optimize on this level.  The second is that it's annoying to
> use, because it means you have to type your member variables *another*
> time (in addition to __init__ for example), which feels very
> un-Pythonic.
>
> == Defining object attributes ==
>
> In my Python code, one restriction I try to follow is to set all the
> attributes I use for an object in __init__.  You could do this as
> class member variables, but often I want to set them in __init__
> anyway from constructor arguments, so "defining" them in __init__
> means I only type them once, not twice.
>
> One random idea for Python 3000 is to make the equivalent of
> __slots__ the default, *but* instead gather the set of attributes
> from all member variables set in __init__.  For example, if I write:
>
> class Foo(object):
>     def __init__(self, bar=None):
>         self.__baz = 20
>         if bar:
>             self.__bar = bar
>         else:
>             self.__bar = time.time()
>
> f = Foo()
> f.otherattr = 40   # this would be an error!  Can't add random
>                    # attributes not defined in __init__
>
> I would argue that the current Python default of supporting adding
> random attributes is almost never what you really want.  If you *do*
> want to set random attributes, you almost certainly want to be using
> a dictionary or a subclass of one, not an object.  What's nice about
> the current Python is that you don't need to redundantly type things,
> and we should preserve that while still allowing more efficient
> implementation strategies.
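
For what it's worth, the "no random attributes" behaviour can be
approximated today with __slots__.  This is only an approximation of
the proposal above, since the names still have to be typed out by hand
(exactly the redundancy being complained about), and plain attribute
names are used here to sidestep private-name mangling:

    import time

    class Foo(object):
        __slots__ = ('bar', 'baz')   # attribute set fixed at class creation

        def __init__(self, bar=None):
            self.baz = 20
            if bar:
                self.bar = bar
            else:
                self.bar = time.time()

    f = Foo()
    f.otherattr = 40   # AttributeError: 'Foo' object has no attribute 'otherattr'
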
> = Limited threading =
>
> Now, I realize there are a ton of other things the GIL protects other
> than object dictionaries; with true threading you would have to touch
> the importer, the garbage collector, verify all the C extension
> modules, etc.  Obviously non-trivial.  What if, as an initial push
> towards real threading, Python had support for "restricted threads"?
> Essentially, restricted threads would be limited to a subset of the
> standard library that had been verified for thread safety, would not
> be able to import new modules, etc.
>
> Something like this:
>
> def datasearcher(list, queue):
>     for item in list:
>         if item.startswith('foo'):
>             queue.put(item)
>     queue.done()
>
> vals = ['foo', 'bar']
> queue = queue.Queue()
> threading.start_restricted_thread(datasearcher, vals, queue)
>
> def print_item(item):
>     print item
>
> queue.set_callback(print_item)
>
> Making up some API above I know, but the point here is "datasearcher"
> could pretty easily run in a true thread and touch very little of the
> interpreter; only support for atomic reference counting and a
> concurrent garbage collector would be needed.
>
> Thoughts?
>
> [1] http://submind.verbum.org/hotwire/wiki
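
For comparison, here is roughly the same pipeline written against the
threading and Queue modules that exist today, with a sentinel value
standing in for the hypothetical queue.done()/set_callback() API and
the list/queue arguments renamed to items/results so they don't shadow
anything.  It runs fine right now; it just cannot put the pure-Python
search onto a second CPU because of the GIL, which is of course the
point of the post:

    import threading
    import Queue

    def datasearcher(items, results):
        for item in items:
            if item.startswith('foo'):
                results.put(item)
        results.put(None)              # sentinel in place of queue.done()

    vals = ['foo', 'bar']
    results = Queue.Queue()
    threading.Thread(target=datasearcher, args=(vals, results)).start()

    while True:                        # consume in place of set_callback()
        item = results.get()
        if item is None:
            break
        print item

 - Josiah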