Did you try it with an array object using the array module? On Tue, Feb 2, 2010 at 3:14 PM, Mitchell L Model <mlm...@comcast.net> wrote: > An instructive lesson in YAGNI ("you aren't going to need it"), premature > optimization, and not making assumptions about Python data structure > implementations. > > I need a 1000 x 1000 two-dimensional array of objects. (Since they are > instances of application classes it appears that the array module is > useless; likewise, since I am using Python 3.1, so among other things, I > can't use numpy or its relatives.) The usage pattern is that the array is > first completely filled with objects. Later, objects are sometimes accessed > individually by row and column and often the entire array is iterated over. > > Worried (unnecessarily, as it turns out) by the prospect of 1,000,000 > element list I started by constructing a dictionary with the keys 1 through > 1000, each of which had as its value another dictionary with the keys 1 > through 1000. Actual values were the values of the second level dictionary. > > Using numbers to fill the array to minimize the effect of creating my more > complex objects, and running Python 3.1.1 on an 8-core Mac Pro with 8Gb > memory, I tried the following > > #create and fill the array: > t1 = time.time() > d2 = {} > for j in range(1000): > d2[j] = dict() > for k in range(1000): > d2[j][k] = k > print( round(time.time() - t1, 2)) > > 0.41 > > # access each element of the array: > t1 = time.time() > for j in range(1000): > for k in range(1000): > elt = d2[j][k] > print( round(time.time() - t1, 2)) > > 0.55 > > My program was too slow, so I started investigating whether I could improve > on the two-level dictionary, which got used a lot. To get another baseline I > tried a pure 1,000,000-element list, expecting the times to be be > horrendous, but look! > > # fill a list using append > t1 = time.time() > lst = [] > for n in range(1000000): > lst.append(n) > print( round(time.time() - t1, 2)) > > 0.26 > > # access every element of a list > t1 = time.time() > for n in range(1000000): > elt = lst[n] > print( round(time.time() - t1, 2)) > > 0.25 > > What a shock! I could save half the execution time and all my clever work > and awkward double-layer dictionary expressions by just using a list! > > Even better, look what happens using a comprehension to create the list > instead of a loop with list.append: > > t1 = time.time() > lst = [n for n in range(1000000)] > print( round(time.time() - t1, 2)) > > 0.11 > > Half again to create the list. > > Iterating over the whole list is easier and faster than iterating over the > double-level dictionary, in particular because it doesn't involve a > two-level loop. But what about individual access given a row and a column? > > t1 = time.time() > for j in range(1000): > for k in range(1000): > elt = lst[j * 1000 + k] > print( round(time.time() - t1, 2)) > > 0.45 > > This is the same as for the dictionary. > > I tried a two-level list and a few other things but still haven't found > anything that works better than a single long list -- just like 2-D arrays > are coded in old-style languages, with indices computed as offsets from the > beginning of the linear sequence of all the values. What's amazing is that > creating and accessing 1,000,000-element list in Python is so efficient. The > usual moral obtains: start simple, analyze problems (functional or > performance) as they arise, decide whether they are worth the cost of > change, then change in very limited ways. And of course abstract and > modularize so that, in my case, for example, none of the program's code > would be affected by the change from a two-level dictionary representation > to using a single long list. > > I realize there are many directions an analysis like this can follow, and > many factors affecting it, including patterns of use. I just wanted to > demonstrate the basics for a situation that I just encountered. In > particular, if the array was sparse, rather than completely full, the > two-level dictionary implementation would be the natural representation. > -- > http://mail.python.org/mailman/listinfo/python-list >
-- Gerald Britton -- http://mail.python.org/mailman/listinfo/python-list