On Mon, Jan 17, 2011 at 2:20 PM, Jake Biesinger <jake.biesin...@gmail.com> wrote: > Hi all, > > Using numpy, I can create large 2-dimensional arrays quite easily. >>>> import numpy >>>> mylist = numpy.zeros((100000000,2), dtype=numpy.int32) > > Unfortunately, my target audience may not have numpy so I'd prefer not to use > it. > > Similarly, a list-of-tuples using standard python syntax. >>>> mylist = [(0,0) for i in xrange(100000000) > > but this method uses way too much memory (>4GB for 100 million items, > compared to 1.5GB for numpy method). > > Since I want to keep the two elements together during a sort, I *can't* use > array.array. >>>> mylist = [array.array('i',xrange(100000000)), >>>> array.array('i',xrange(100000000))] > > If I knew the size in advance, I could use ctypes arrays. >>>> from ctypes import * >>>> class myStruct(Structure): >>>> _fields_ = [('x',c_int),('y',c_int)] >>>> mylist_type = myStruct * 100000000 >>>> mylist = mylist_type() > > but I don't know that size (and it can vary between 1 million-200 million), > so preallocating doesn't seem to be an option. > > Is there a python standard library way of creating *efficient* 2-dimensional > lists/arrays, still allowing me to sort and append? > > Thanks! > -- > http://mail.python.org/mailman/listinfo/python-list >
I recently had need of a two dimensional array also, but I only needed small ones, so I just used a dictionary indexed by tuples. If you need to sort a row as an aggregate type, it seems that you probably either want to: 1) Use a list of lists, where the outer list is the easier one to sort by - this is analogous to the array of pointers in C - sorting by the outer would want to compare "elements" by looking at the 0th inner, then 1st inner, etc. So if you can organize things this way, things might be pretty easy. 2) Use a list of instances, where the instances (rows) might just include a list 3) Use array.array with a custom sort so that you can move multiple things when a more common sort would move "one" (possibly an aggregate). If you want some sorting code to start from for #3, I have a variety of sorts written in Python (pure python or cython, your option, each automatically derived from the same m4 file for a given sort) at http://stromberg.dnsalias.org/svn/sorts/compare/trunk/ - however, even the cython timsort provided there doesn't perform quite as well as the standard library's timsort. HTH -- http://mail.python.org/mailman/listinfo/python-list