On May 31, 8:30 pm, Maciej BliziĆski <[EMAIL PROTECTED]> wrote: > Hi Pythonistas! > > I've got a question about storing tuples in a dictionary. First, a > small test case which creates a list of dictionaries: > > import time > > list_of_dicts = [] > keys = [str(x) for x in range(200000)] > prev_clk = time.clock() > for i in range(20): > my_dict = {} > for key in keys: > my_dict[key] = key > list_of_dicts.append(my_dict) > new_clk = time.clock() > print i, new_clk - prev_clk > prev_clk = new_clk > > It creates dictionaries and stores them in a list, printing out > execution times. The size of each dictionary is constant, so is the > execution time for each iteration. > > 0 0.1 > 1 0.1 > 2 0.1 > 3 0.08 > 4 0.09 > > ...and so on. > > Then, just one line is changed: > my_dict[key] = key > into: > my_dict[key] = (key, key) > > Full code: > > list_of_dicts = [] > keys = [str(x) for x in range(200000)] > prev_clk = time.clock() > for i in range(20): > my_dict = {} > for key in keys: > my_dict[key] = (key, key) > list_of_dicts.append(my_dict) > new_clk = time.clock() > print i, new_clk - prev_clk > prev_clk = new_clk > > The difference is that instead of single values, tuples are added to > the dictionary instead. When the program is run again: > > 0 0.27 > 1 0.37 > 2 0.49 > 3 0.6 > ... > 16 2.32 > 17 2.45 > 18 2.54 > 19 2.68 > > The execution time is rising linearly with every new dictionary > created. > > Next experiment: dictionaries are not stored in a list, they are just > left out when an iteration has finished. It's done by removing two > lines: > > list_of_dicts = [] > > and > > list_of_dicts.append(my_dict) > > Full code: > > keys = [str(x) for x in range(200000)] > prev_clk = time.clock() > for i in range(20): > my_dict = {} > for key in keys: > my_dict[key] = (key, key) > new_clk = time.clock() > print i, new_clk - prev_clk > prev_clk = new_clk > > The time is constant again: > > 0 0.28 > 1 0.28 > 2 0.28 > 3 0.26 > 4 0.26 > > I see no reason for this kind of performance problem, really. It > happens when both things are true: dictionaries are kept in a list (or > more generally, in memory) and they store tuples. > > As this goes beyond my understanding of Python internals, I would like > to kindly ask, if anyone has an idea about how to create this data > structure (list of dictionaries of tuples, assuming that size of all > dictionaries is the same), in constant time? > > Regards, > Maciej
Let me comment on what happens in you're code: The place where you create new objects is keys = [str(x) for x in range(200000)] # here you create 200000 strings which will be reused ( by reference ) and my_dict[key] = (key, key) # here you create a new tuple with 2 elements ( both are key, so you're taking a reference of existing key object twice ) The tricky part is where you wrote: for key in keys: my_dict[key] = (key, key) list_of_dicts.append(my_dict) # note that list_of_dicts.append is in the loop! check upstairs! This means that my_dict reference will be stored 200000 times, and it won't be released. statement my_dict = {} will always create new my_dict ( 20 times means 20 new dictionaries ) and start over. Since python caches free dictionaries ( after delete - they're used everywhere ), reuse won't happen, and memory will have to be allocated again. Lists are internally like arrays, when there is not enough space for next element, pointer array is doubled, so there is no huge penalty in the append function. Lists are also reused from some internal cache. Dictionaries also have a some growth function. When there is no space for next key, internal hash map doubles. The reason why you have a growing time comes from the fact that memory allocation takes place instead of object being used by reference. Check the memory usage, and you'll see that test time is pretty much proportional to overall memory usage. Regards, Bosko -- http://mail.python.org/mailman/listinfo/python-list