On Fri, 16 Dec 2016 04:06 am, Steve D'Aprano wrote:

> I have some key:value data where the keys often are found in
> contiguous ranges with identical values.

[...]
Thank you to everyone who gave suggestions!

I have experimented with two solutions, and hope somebody might be able
to suggest some improvements. Attached is the test code I ran;
suggestions for improving the performance will be appreciated.

I decided on these two data structures:

(1) The naive solution: don't worry about duplicate values, or the fact
that keys come in contiguous groups, and just store each individual key
and value in a dict. This is faster but uses more memory.

(2) The clever solution: use a pair of lists, one holding the starting
key of each group, and the other holding the common values. This saves
a lot of memory, but is slower.

A concrete example might help. Suppose I have 15 keys in five groups:

D = {0: 10,
     1: 20, 2: 20,
     3: 30, 4: 30, 5: 30,
     6: 40, 7: 40, 8: 40, 9: 40,
     10: 50, 11: 50, 12: 50, 13: 50, 14: 50}

(Remember, in reality I could have as many as a million or two keys.
This is just a simple toy example.)

Instead of using a dict, I also set up a pair of lists:

L = [0, 1, 3, 6, 10, 15]  # starting key of each group
V = [10, 20, 30, 40, 50]  # value associated with each group

Note that the first list has one extra item, namely the number one past
the end of the final group.

I can do a key look-up using either of these:

    D[key]

    V[bisect.bisect_right(L, key) - 1]

I tested the memory consumption and speed of these two solutions with
(approximately) one million keys. I found:

- the dict solution uses a lot more memory, about 24 bytes per key,
  compared to the pair of lists solution, which is about 0.3 bytes
  per key;

- but on my computer, dict look-ups are about 3-4 times faster.

Any suggestions for improving the speed of the binary search version,
or the memory consumption of the dict?

By the way: the particular pattern of groups in the sample code (groups
of size 1, 2, 3, ... up to 50, then repeating from 1, 2, 3, ... again)
is just for demonstration. In my real data, the sizes of the groups are
all over the place, in an unpredictable pattern.

Thanks in advance.


-- 
Steve
"Cheer up," they said, "things could be worse." So I cheered up, and
sure enough, things got worse.

-- 
https://mail.python.org/mailman/listinfo/python-list
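
For reference, here is a minimal self-contained sketch of the two
look-up strategies described above. This is not the attached test code:
the group boundaries are just the toy example from the post, and the
helper names (dict_lookup, bisect_lookup) are illustrative.

    import bisect

    # Group starts and their shared values, as in the toy example.
    L = [0, 1, 3, 6, 10, 15]   # starting key of each group, plus one past the end
    V = [10, 20, 30, 40, 50]   # value shared by each group

    # Naive solution: expand every key into an ordinary dict.
    D = {}
    for start, end, value in zip(L, L[1:], V):
        for key in range(start, end):
            D[key] = value

    def dict_lookup(key):
        # O(1) average time, but one dict entry per key.
        return D[key]

    def bisect_lookup(key):
        # O(log n) in the number of groups, using only two small lists.
        return V[bisect.bisect_right(L, key) - 1]

    # Sanity check: both strategies agree on every key.
    assert all(dict_lookup(k) == bisect_lookup(k) for k in range(15))

Timing both helpers with the timeit module on a larger data set should
reproduce the rough comparison above, though the exact numbers will
depend on the Python version and platform.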