Maciej Blizi?ski wrote: > Hi Pythonistas! > > I've got a question about storing tuples in a dictionary. First, a > small test case which creates a list of dictionaries: > > import time > > list_of_dicts = [] > keys = [str(x) for x in range(200000)] > prev_clk = time.clock() > for i in range(20): > my_dict = {} > for key in keys: > my_dict[key] = key > list_of_dicts.append(my_dict) > new_clk = time.clock() > print i, new_clk - prev_clk > prev_clk = new_clk > > It creates dictionaries and stores them in a list, printing out > execution times. The size of each dictionary is constant, so is the > execution time for each iteration. > > 0 0.1 > 1 0.1 > 2 0.1 > 3 0.08 > 4 0.09 > > ...and so on. > > Then, just one line is changed: > my_dict[key] = key > into: > my_dict[key] = (key, key) > > Full code: > > list_of_dicts = [] > keys = [str(x) for x in range(200000)] > prev_clk = time.clock() > for i in range(20): > my_dict = {} > for key in keys: > my_dict[key] = (key, key) > list_of_dicts.append(my_dict) > new_clk = time.clock() > print i, new_clk - prev_clk > prev_clk = new_clk > > The difference is that instead of single values, tuples are added to > the dictionary instead. When the program is run again: > > 0 0.27 > 1 0.37 > 2 0.49 > 3 0.6 > ... > 16 2.32 > 17 2.45 > 18 2.54 > 19 2.68 > > The execution time is rising linearly with every new dictionary > created. > > Next experiment: dictionaries are not stored in a list, they are just > left out when an iteration has finished. It's done by removing two > lines: > > list_of_dicts = [] > > and > > list_of_dicts.append(my_dict) > > Full code: > > keys = [str(x) for x in range(200000)] > prev_clk = time.clock() > for i in range(20): > my_dict = {} > for key in keys: > my_dict[key] = (key, key) > new_clk = time.clock() > print i, new_clk - prev_clk > prev_clk = new_clk > > The time is constant again: > > 0 0.28 > 1 0.28 > 2 0.28 > 3 0.26 > 4 0.26 > > I see no reason for this kind of performance problem, really. It > happens when both things are true: dictionaries are kept in a list (or > more generally, in memory) and they store tuples. > > As this goes beyond my understanding of Python internals, I would like > to kindly ask, if anyone has an idea about how to create this data > structure (list of dictionaries of tuples, assuming that size of all > dictionaries is the same), in constant time?
Disable garbage collection: import gc gc.disable() It uses inappropriate heuristics in this case. Peter -- http://mail.python.org/mailman/listinfo/python-list