On Sat, 17 Dec 2016 08:31 pm, Peter Otten wrote:

> Steve D'Aprano wrote:
>
>> I have experimented with two solutions, and hope somebody might be able
>> to suggest some improvements. Attached is the test code I ran,
>> suggestions for improving the performance will be appreciated.
>
> If there was an attachment to this text -- that didn't make it to the
> mailing list or the news group.
I see it on comp.lang.python but not the mailing list or gmane. That's
annoying because it's a *text* attachment and should be allowed. I'll
attach it again, inline this time, at the end of this post.

[...]

>> I tested the memory consumption and speed of these two solutions with
>> (approximately) one million keys. I found:
>>
>> - the dict solution uses a lot more memory, about 24 bytes per key,
>>   compared to the pair of lists solution, which is about 0.3 bytes
>>   per key;
>>
>> - but on my computer, dict lookups are about 3-4 times faster.
>
> Only three to four times? You basically get that from a no-op function
> call:

[...]

> Even adding a dummy value to V to go from
>
>> V[bisect.bisect_right(L, i) - 1]
>
> to
>
> V[bisect.bisect_right(L, i)]
>
> might be visible in your benchmark.

Good thinking! I'll try that.

And... it doesn't appear to make any real difference.

>> Any suggestions for improving the speed of the binary search version,
>> or the memory consumption of the dict?
>
> Depending on the usage pattern you might try bisect combined with a LRU
> cache.

Adding a cache will probably eat up the memory savings, but I may play
around with that; there's a rough sketch of what I have in mind in the
P.S. below.

And here's the code, this time inline. I've added the dummy value to the
values list V as suggested.

# --- cut ---
import bisect
import random
import sys
from timeit import default_timer as timer


def make_keys():
    """Yield (start, end) values suitable for passing to range().

    Distance between start and end increases from 1 to 50, then
    repeats, for a total of 800*25*51 = 1020000 values altogether.
    """
    n = 1
    for j in range(800):
        for i in range(50):
            yield (n, n+i+1)
            n += i+1


def populate():
    D = {}
    L = []
    V = [None]
    value = 1
    last_b = 1
    for a, b in make_keys():
        assert a == last_b
        for i in range(a, b):
            D[i] = value
        L.append(a)
        V.append(value)
        last_b = b
    L.append(last_b)
    return (D, L, V)


class Stopwatch:
    """Context manager for timing long running chunks of code."""
    def __enter__(self):
        self._start = timer()
        return self

    def __exit__(self, *args):
        elapsed = timer() - self._start
        del self._start
        if elapsed < 0.01:
            print("Elapsed time is very small, "
                  "consider using timeit instead.")
        print('time taken: %f seconds' % elapsed)


D, L, V = populate()
assert len(D) == 800*25*51
assert len(L) == len(V)
print("size of dict", sys.getsizeof(D))
print("size of two lists", sys.getsizeof(L) + sys.getsizeof(V))

# Confirm that values are the same whether using dict lookup
# or binary search.
for i in range(1, 800*25*51 + 1):
    index = bisect.bisect_right(L, i)
    assert D[i] == V[index]

# Simulate a large number of lookups in random order.
nums = list(range(1, 800*25*51 + 1))
random.shuffle(nums)

with Stopwatch():
    for i in nums:
        x = D[i]

with Stopwatch():
    for i in nums:
        x = V[bisect.bisect_right(L, i)]
# --- cut ---


--
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and
sure enough, things got worse.
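
P.S. Here is the sort of thing I have in mind for Peter's
bisect-plus-LRU-cache suggestion. It's only a rough, untested sketch,
not a finished implementation: it assumes the L and V lists built by
populate() in the script above, it wraps the binary search in a
function so that functools.lru_cache can memoise it, and the maxsize
of 10000 is an arbitrary guess on my part.

# --- cut ---
import bisect
from functools import lru_cache

# L and V are the sorted start-keys and (padded) values lists
# returned by populate() in the script above.

@lru_cache(maxsize=10000)  # cache size is a guess; tune to taste
def lookup(i):
    """Return the value for key i, memoising repeated lookups."""
    return V[bisect.bisect_right(L, i)]
# --- cut ---

Of course, in the benchmark above each key is looked up exactly once,
so the cache can never hit and only adds call overhead; it would only
pay off under an access pattern that revisits keys, and as I said, the
cached entries are likely to eat into the memory savings.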