On Sun, Mar 15, 2020 at 12:28 PM Alex Hall <[email protected]> wrote:
> How about http://www.grantjenks.com/docs/sortedcontainers/sorteddict.html
> ?
>
For my part, I didn't look for that, as I was having fun playing around
with the idea. But yes, that looks like it would fit the bill.
And at a glance, they were smarter than me :-) -- no need to keep the
underlying dict in sorted order, just find the key in the list, then use
the regular O(1) dict access.
But since I'm having fun, enclosed is an (incomplete) implementation of a a
SortedMap. This one keeps a list of keys and values in sorted order, then
can access things in O(logN). Insert is O(N), as it's using a regular old
list internally. But that is all in C, and not too bad in practice.
With the OP's example, it's slower than the basic dict method for creating,
but much faster for finding items.
(not formally timed)
But back the python_ideas topic:
It might be good to have a set of real, tree-based data structures -- they
are the best way to go for some use cases, and are non-trivial to
implement well.
(and yes, starting out at a third party package, maybe one that's already
out there, would be the way to go)
-CHB
--
Christopher Barker, PhD
Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
import bisect
from random import randint
N1 = 100_000 # N1 > N2 > N3
N2 = N1 // 10
N3 = N1 // 100
class SortedMap:
"""
a mapping that keeps the data in sorted order, for
faster retreval of "nth" items
note: not complete, but it has the core methods
"""
def __init__(self):
"""
for now, can only initialize an empty one
"""
# keys and values kept as in-sync lists
# can't put them in tuples in s alist, as the
# values may not be comparible for use in bisect
self._keys = []
self._values = []
def items(self):
return zip(self.keys, self.items)
def keys(self):
return iter(self._keys)
def values(self):
return iter(self._values)
def __getitem__(self, key):
"""
get an item
"""
keys = self._keys
idx = bisect.bisect_left(keys, key)
if idx != len(keys) and keys[idx] == key:
return self._values[idx]
else:
raise KeyError(repr(key))
def __setitem__(self, key, value):
"""
add an item, in sorted order in the keys
"""
# find the insertion point
idx = bisect.bisect(self._keys, key)
# put it in both lists:
self._keys.insert(idx, key)
self._values.insert(idx, value)
def nth_smallest_key(self, n):
"""
return the nth smallest key
"""
return self._keys[n]
def nth_smallest_value(self, n):
return self._values[n]
def nth_smallest_item(self, n):
return (self._keys[n], self._values[n])
def test_sorted_map_preserve_items():
sd = SortedMap()
sd[1] = "mary"
sd[3] = "fred"
sd[2] = "bob"
print(sd._keys)
print(sd._values)
# make sure it acts like a regular dict()
assert sd[1] == "mary"
assert sd[3] == "fred"
assert sd[2] == "bob"
def test_sorted_dict_preserve_sort_order():
sd = SortedMap()
sd[1] = "mary"
sd[3] = "fred"
sd[2] = "bob"
# make sure the keys are in order
assert list(sd.keys()) == sorted(sd.keys())
def test_nth_smallest_key():
sd = SortedMap()
for i in range(20):
sd[i] = i + 2
assert sd.nth_smallest_key(2) == 2
assert sd.nth_smallest_key(6) == 6
def test_nth_smallest_item():
sd = SortedMap()
for i in range(20):
sd[i] = i + 2
assert sd.nth_smallest_item(2) == (2, 4)
assert sd.nth_smallest_item(6) == (6, 8)
def test_nth_smallest_value():
sd = SortedMap()
for i in range(20):
sd[i] = i + 2
assert sd.nth_smallest_value(2) == 4
assert sd.nth_smallest_value(6) == 8
def nth_smallest_key(n, m):
return sorted(m.keys())[n]
def main():
dist = lambda: randint(0, 2147483647)
my_map = SortedMap()
# fills a map with N1 random mappings of type (int, int)
print("populating map with random data")
for i in range(0, N1):
my_map[dist()] = dist()
# prints out the N3th smallest key and its value
print("Getting an nth item")
target_key = my_map.nth_smallest_key(N3)
print("({}: {})".format(target_key, my_map[target_key]))
# writes a new random mapping to the map
# then prints out the N3th smallest key and its value if that key
# has changed
# 100000 times
print("Testing adding one new item at a time")
for i in range(N3):
my_map[dist()] = dist()
test_key = my_map.nth_smallest_key(N3)
if target_key != test_key:
target_key = test_key
print(i, target_key, test_key)
print("({}: {})".format(target_key, my_map[target_key]))
# print an indicator every N3 iterations for comparison
if i % N3 == 0:
print("iteration: {}".format(i))
if __name__ == "__main__":
main()
_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at
https://mail.python.org/archives/list/[email protected]/message/7BZJOSELPYSOPSYTER5NORB253TRHE6D/
Code of Conduct: http://python.org/psf/codeofconduct/