Feature Requests item #1185383, was opened at 2005-04-18 14:26
Message generated for change (Comment added) made by rhettinger
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1185383&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request, not just the latest
update.

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Marcin Ciura (mciura)
Assigned to: Nobody/Anonymous (nobody)
Summary: Make bisect.* functions accept an optional compare function

Initial Comment:
The usability of bisect.bisect_right, bisect.bisect_left, bisect.insort_right
and bisect.insort_left would increase if they accepted an optional less_than
function to compare the keys.

Here is my use case: I have a sorted list of reversed words and a parallel
list of flags associated with these words (taken from ispell). The list of
reversed words is the one searched; the flags are the result of a search.

Issue #1: Currently I cannot simply use a list of tuples (rev_word, flags)
with a custom less_than function that compares only the first elements of two
tuples.

Issue #2: When a word is not found in the list, I'd like to make an educated
guess as to its flags (this makes more sense in non-English languages, where
e.g. infinitives have a special ending), using bisect_left and bisect_right:

    from bisect import *
    less_than = lambda x, y: x[:3] < y[:3]
    lp = bisect_left(word_list, given_rev_word, lt=less_than)
    rp = bisect_right(word_list, given_rev_word, lt=less_than)
    # return the union of flag_list[lp:rp]

An example (given_rev_word = 'abcpqr'):

    word_list:
    'abbx',
    'abcaa',  <- lp
    'abcdd',
    'abcss',
    'abdf'    <- rp

Currently, the first search could be replaced with

    lp = bisect_left(word_list, given_rev_word[:3])

but I can see only non-nice ways to replace the other search. Rolling my own
class that stores a word and its flags, with __lt__ depending on some global
setting, is not thread-safe, and I find such a solution too heavyweight.

I hope that I expressed myself clearly enough. If not, let me know, and I'll
try to clarify my point.

----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2005-06-02 20:25

Message:
Logged In: YES
user_id=80475

Overall, I'm -1 on this RFE.

The comparison to nsmallest() and nlargest() is inaccurate. They run
start-to-finish in one function call. The other heapq functions do not use
key functions because they have to leave the original data structure
unmolested between calls; hence, there is no ability to limit the key
function calls to one per record.

Likewise, with this request, the key function calls get wasted. The sort()
method calls key() for every record and tosses the result afterwards. Each
subsequent call to bisect() would need to repeat those calls for a log(N)
subset of the data. Hence, accepting this request would create an API that
encourages a wasteful design.

----------------------------------------------------------------------

Comment By: Jonas Kölker (jonaskoelker)
Date: 2005-04-28 15:13

Message:
Logged In: YES
user_id=1153395

In a similar vein, I'd like to see a `key' keyword argument added to bisect.*
(and perhaps `reversed' too); it's really annoying to sort something by a key
(rather than by __lt__) and not be able to bsearch it. It got added to
min/max and heapq.n(smallest|largest) in 2.5, fwiw ;)

----------------------------------------------------------------------
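For illustration only, a minimal pure-Python sketch of the kind of key-aware
bisection being requested. The name bisect_left_key, its signature, and the
convention that x is passed as an already-computed key value are assumptions
made for this sketch; no such function existed in the bisect module at the
time:

    def bisect_left_key(a, x, lo=0, hi=None, key=lambda item: item):
        """Like bisect.bisect_left, but compares key(a[i]) against x.

        Assumes a is sorted by key and that x is already a key value.
        """
        if hi is None:
            hi = len(a)
        while lo < hi:
            mid = (lo + hi) // 2
            if key(a[mid]) < x:
                lo = mid + 1
            else:
                hi = mid
        return lo

    # Hypothetical use against the word list from the initial report:
    # lp = bisect_left_key(word_list, given_rev_word[:3], key=lambda w: w[:3])
    # A bisect_right_key variant would use "key(a[mid]) <= x" in place of the
    # strict "<" comparison.

Note that key() is called once per probe, i.e. O(log N) times per search,
which is exactly the repeated-work pattern discussed in the comment dated
2005-06-02 above.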
Comment By: Raymond Hettinger (rhettinger)
Date: 2005-04-18 15:09

Message:
Logged In: YES
user_id=80475

A few thoughts:

* If bisect provided an optional lt argument, it would still not be
thread-safe. The code for the lt function can call arbitrary Python code that
runs through the eval loop and is hence subject to a thread switch.

* An advantage of the class wrapper approach is that the prefix function gets
computed just once per word. This is not a big deal for the specific case of
[:3], but it could be a significant benefit for other functions that are
expensive to compute (because repeated calls to bisect will invoke the lt
function more than once).

* My own approach to the particular use case would be to map prefixes to a
list of revwords or (revword, flag) pairs:

    d = dict(abb=['abbx'],
             abc=['abcaa', 'abcdd', 'abcss'],
             abd=['abdf'])

This gives O(1) lookup time and limits the calls to the prefix function to
once per word.

Will leave this request open, as it may yet be a good idea. My concern is
that it will clutter the code and the API for only a small benefit.

Also, I'm looking at a more general-purpose solution that would make this
feature request moot. The idea is to create a fast comparison decorator class
used like this:

    dwordlist = map(ComparisonDecorator(lambda x: x[:3]), wordlist)
    lp = bisect_left(dwordlist, given_rev_word)
    rp = bisect_right(dwordlist, given_rev_word)

----------------------------------------------------------------------
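For illustration, one possible sketch of the comparison decorator idea from
the comment above. The factory/wrapper split, the helper class name
_KeyedValue, and the handling of unwrapped operands are assumptions made for
this sketch, not an implementation referenced in the thread:

    from bisect import bisect_left, bisect_right

    class ComparisonDecorator:
        """Factory: ComparisonDecorator(key) returns a callable that wraps
        each value so that comparisons go through key()."""
        def __init__(self, key):
            self.key = key

        def __call__(self, value):
            return _KeyedValue(value, self.key)

    class _KeyedValue:
        def __init__(self, value, key):
            self.value = value
            self._key = key
            self._keyed = key(value)    # key computed once per wrapped record

        def __lt__(self, other):
            # `other` is either another wrapped record or a plain search value.
            if isinstance(other, _KeyedValue):
                return self._keyed < other._keyed
            return self._keyed < self._key(other)

        def __gt__(self, other):
            # Needed so bisect_right's "x < a[mid]" test works when x is a
            # plain (unwrapped) value: Python falls back to a[mid].__gt__(x).
            if isinstance(other, _KeyedValue):
                return self._keyed > other._keyed
            return self._keyed > self._key(other)

    # Hypothetical usage, mirroring the pattern in the comment above
    # (sample data taken from the initial report):
    wordlist = ['abbx', 'abcaa', 'abcdd', 'abcss', 'abdf']
    given_rev_word = 'abcpqr'
    dwordlist = list(map(ComparisonDecorator(lambda x: x[:3]), wordlist))
    lp = bisect_left(dwordlist, given_rev_word)     # -> 1
    rp = bisect_right(dwordlist, given_rev_word)    # -> 4

In this sketch the key is computed once per wrapped record, which is the
point of the decorator approach; the search value's key is still recomputed
on each comparison.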