Jeffrey C. Jacobs <[EMAIL PROTECTED]> added the comment: Thanks Raymond for checking on the status of this issue and Guido for thinking about it too. I still support the basic concept for the reasons I specified in the original description, namely maintaining a sorted list (or random-access iterable). Thus, although in some ways it seems like syntactic sugar to make bisect and heap match the parameter list of sort, there is good reason to do so.
That having been said, the re-computation of the /key/ function for typically the same values at each invocation of an insort_* method can potentially add a lot of otherwise unnecessary calls. So, having thought about this further, it seems to me that, somehow, the bisect/insort routines functions need to cache the results of each evaluation of key in the same way that sort can maintain that information in a local variable. Ideally, IMHO, the keys computed in sort could be cached and stored with the list upon which it was operated, but this is cumbersome, would not work for generic types and would be invalid if for some reason the sort key was different from the bisect key. Typically the later case would not occur but since it might lead to unexpected behavior it should definitely be avoided. Instead, I am now thinking that the only way we can safely maintain caching information is to implement a persistent object to contain it that can be shared between each invocation of bisect/insort so as to not force re-evaluation of the same key argument. This clearly could be done with a class. So I would propose that a bisect class be created that would hold the list, the current key function and the cached key values. Of course, special care would need to be taken to take into account mutable sequences that could be changed between bisect calls. The class could be implements in a number of ways. For instance, it could create a callable instance that accepts a list and other parameters. It will then look at its state, including a check of /is/ with the input and the last list operated upon, and if that test fails, the 'key cache' is invalidated. Next, if a key method is provided, it is checked against the last key function used (/is/) and if it is not, the 'key cache' is invalidated. Then, to prevent mutability problems, the 'key cache' would actually be implemented as a dictionary mapping item to key-value and each time a key needed to be computed, the input would first be checked against the cache and only if it was not there would the key function be called. This should allow for preventing undefined behavior when handling invalid cases. Of course, if the client wishes to use bisect on 2 different lists or 2 different keys, sie could always just create 2 different bisect objects. There are of course other ways to accomplish this, such as binding the bisect object to a list and key at construction that cannot be reset (at least, should not be) once invoked. I am also wary of implementing the 'key cache' as a dictionary as it adds a hash-table lookup which is already potentially expensive. Ideally, if the bisect object could force its associated list to be immutable for the life of the bisect, this would get around the problem of external inserts and deletes that would invalidate the 'key cache'. Obviously, there may be another way to define such a class but I think the only way you can safely make sure the bisect functions don't unnecessarily compute key without modifying the list is to allow for the caching of values between invocations, and the only way you can do that safely is by putting the cache in some helper class. Of course, a class is only needed for when key is used. comp and reverse, AFAIK, are not cached when sorting. So the existing, generic bisect algorithms would still be useful and in fact, you'd want the bisect class to be optional, IMHO. Of course, Guido's suggestion of just mapping your list into a list of (key, value) tuples is a good workaround for just the key case (assuming key, comp and reverse flags are not all used at the same time for sorting). It probably should be a cookbook item that could be documented with the bisect (and heap) libraries. It can't completely solve the problem, but it will solve some conditions in the short term. Long term, I still think we should consider bisect, potentially with a class definition. _____________________________________ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1619060> _____________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com