In other languages, there are built-in data structures with binary search tree
semantics, as an example, std::map in C++.
By binary search tree semantics we mean the following:
* We can insert elements into the structure in time faster than O(n), i.e.
O(logn) for std::map
* We can iterate over all of the elements of the structure in sorted order in
linear time, i.e. O(n) for std::map
Is there currently any data structure in Python with binary search tree
semantics?
To illustrate the need, can the following program be written elegantly and
efficiently with built-in Python data structures?
Imagine that you are a fund manager at a bank. Bank accounts are added to your
fund very frequently. A bank account has an ID and a value. A smaller ID means
that the bank account was created earlier than one with a larger ID. Each time
that an account is added to your fund, you would like to know the ID number of
the 1000th oldest account, for your records.
Here are implementations in Python 3 and C++, for comparison:
```python3
from random import randint
dist = lambda: randint(0, 2147483647)
def main():
my_map = {}
# fills a map with 1000000 random mappings of type (int, int)
for i in range(0, 1000000):
my_map[dist()] = dist()
# prints out the 1000th smallest key and its value
target_key = nth_smallest_key(1000, my_map)
print("({}: {})".format(target_key, my_map[target_key]))
# writes a new random mapping to the map
# then prints out the 1000th smallest key and its value if that key
# has changed
# 100000 times
for i in range(100000):
my_map[dist()] = dist()
test_key = nth_smallest_key(1000, my_map)
if target_key != test_key:
target_key = test_key
print("({}: {})".format(target_key, my_map[target_key]))
# print an indicator every 100 iterations for comparison
if i % 100 == 0:
print("iteration: {}".format(i))
def nth_smallest_key(n, m):
return sorted(m.keys())[n]
if __name__ == "__main__":
main()
```
```c++
#include <cstdio>
#include <map>
#include <random>
using namespace std;
int nth_smallest_key(int n, map<int, int>& m);
int main() {
random_device rd;
mt19937 psrng(rd());
uniform_int_distribution<> dist(0, 2147483647);
map<int, int> my_map;
// fills a map with 1000000 random mappings of type (<int>: <int>)
for (int i = 0; i < 1000000; i++) {
my_map[dist(psrng)] = dist(psrng);
}
// prints out the 1000th smallest key and its value
int target_key = nth_smallest_key(1000, my_map);
printf("(%d: %d)\n", target_key, my_map[target_key]);
// writes a new random mapping to the map
// then prints out the 1000th smallest key and its value if that key
// has changed
// 100000 times
for (int i = 0; i <= 100000; i++) {
my_map[dist(psrng)] = dist(psrng);
int test_key = nth_smallest_key(1000, my_map);
if (target_key != test_key) {
target_key = test_key;
printf("(%d: %d)\n", target_key, my_map[target_key]);
}
}
return 0;
}
int nth_smallest_key(int n, map<int, int>& m) {
map<int, int>::iterator curr = m.begin();
for (int i = 0; i < n; i++) {
curr = next(curr);
}
return curr->first;
}
```
Makefile:
```make
runcpp: buildcpp
./main
buildcpp:
g++ -o main bst_semantics_cpp.cpp
runpython:
python3 bst_semantics_python.py
```
The C++ program runs on my machine in approximately 5 seconds
```bash
$ time ./main
(2211193: 2021141747)
(2208771: 1079444337)
(2208700: 1187119077)
(2208378: 1447743503)
...
(1996019: 1378401665)
(1989217: 1457497754)
(1989042: 1336229409)
real 0m4.915s
user 0m4.750s
sys 0m0.094s
$
```
The Python program does not reach 1% of completion after 120 seconds
```bash
$ time make runpython
python3 bst_semantics_python.py
(2158070: 1498305903)
iteration: 0
iteration: 100
iteration: 200
^CTraceback (most recent call last):
File "bst_semantics_python.py", line 36, in <module>
main()
File "bst_semantics_python.py", line 23, in main
test_key = nth_smallest_key(1000, my_map)
File "bst_semantics_python.py", line 33, in nth_smallest_key
return sorted(m.keys())[n]
KeyboardInterrupt
Makefile:8: recipe for target 'runpython' failed
make: *** [runpython] Error 1
real 2m2.040s
user 1m59.063s
sys 0m0.375s
$
```
I would like to propose adding such a data structure to the Python standard
library. It could be called a SortedDict.
_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at
https://mail.python.org/archives/list/[email protected]/message/VFBM6UFVDUEPVCUTSTKGLB66MRIMOTCO/
Code of Conduct: http://python.org/psf/codeofconduct/