On Feb 11, 6:50 pm, Steven D'Aprano <st...@remove-this-cybersource.com.au> wrote:
> On Thu, 11 Feb 2010 15:39:09 -0800, Jeremy wrote:
> > My Python program now consumes over 2 GB of memory and then I get a
> > MemoryError.  I know I am reading lots of files into memory, but not
> > 2 GB worth.
>
> Are you sure?
>
> Keep in mind that Python has a comparatively high overhead due to its
> object-oriented nature.  If you have a list of characters:
>
>     ['a', 'b', 'c', 'd']
>
> there is the (small) overhead of the list structure itself, but each
> individual character is not a single byte, but a relatively large
> object:
>
>     >>> sys.getsizeof('a')
>     32
>
> So if you read (say) a 500 MB file into a single giant string, you
> will have 500 MB plus the overhead of a single string object (which is
> negligible).  But if you read it into a list of 500 million single
> characters, you will have the overhead of a single list, plus 500
> million strings, and that's *not* negligible: 32 bytes each instead of
> one.
>
> So try to avoid breaking a single huge string into vast numbers of
> tiny strings all at once.
>
> > I thought I didn't have to worry about memory allocation in Python
> > because of the garbage collector.
>
> You don't have to worry about explicitly allocating memory, and you
> almost never have to worry about explicitly freeing memory (unless you
> are making objects that, directly or indirectly, contain themselves --
> see below); but unless you have an infinite amount of RAM available,
> of course you can run out of memory if you use it all up :)
>
> > On this note I have a few questions.  FYI I am using Python 2.6.4 on
> > my Mac.
> >
> > 1. When I pass a variable to the constructor of a class does it copy
> > that variable or is it just a reference/pointer?  I was under the
> > impression that it was just a pointer to the data.
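[The per-object overhead quoted above is easy to measure directly. A
minimal sketch, in Python 3 syntax; the exact byte counts differ from
the 32 bytes shown for 2.x and vary by version and platform:]

```python
import sys

# The same text held as one string object versus as many small string
# objects.  str.split() creates a fresh object for every word, so the
# second layout pays one object header per word.
one_big = "spam and eggs " * 1000   # a single str object, ~14000 chars
many_small = one_big.split()        # 3000 separate small str objects

size_big = sys.getsizeof(one_big)
size_small = sys.getsizeof(many_small) + sum(sys.getsizeof(s) for s in many_small)

print(size_big)    # payload plus one object header
print(size_small)  # list header plus 3000 object headers -- much larger
```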
> Python's calling model is the same whether you pass to a class
> constructor or any other function or method:
>
>     x = ["some", "data"]
>     obj = f(x)
>
> The function f (which might be a class constructor) sees the exact
> same list as you assigned to x -- the list is not copied first.
> However, there's no promise made about what f does with that list --
> it might copy the list, or make one or more additional lists:
>
>     def f(a_list):
>         another_copy = a_list[:]
>         another_list = map(int, a_list)
>
> > 2. When do I need to manually allocate/deallocate memory and when
> > can I trust Python to take care of it?
>
> You never need to manually allocate memory.
>
> You *may* need to deallocate memory if you make "reference loops",
> where one object refers to itself:
>
>     l = []         # make an empty list
>     l.append(l)    # add the list l to itself
>
> Python can break such simple reference loops itself, but for more
> complicated ones, you may need to break them yourself:
>
>     a = []
>     b = {2: a}
>     c = (None, b)
>     d = [1, 'z', c]
>     a.append(d)    # a reference loop
>
> Python will deallocate objects when they are no longer in use.  They
> are always considered in use any time you have them assigned to a
> name, or in a list or dict or other structure which is in use.
>
> You can explicitly remove a name with the del command.  For example:
>
>     x = ['my', 'data']
>     del x
>
> After deleting the name x, the list object itself is no longer in use
> anywhere and Python will deallocate it.  But consider:
>
>     x = ['my', 'data']
>     y = x    # y now refers to THE SAME list object
>     del x
>
> Although you have deleted the name x, the list object is still bound
> to the name y, and so Python will *not* deallocate the list.
>
> Likewise:
>
>     x = ['my', 'data']
>     y = [None, 1, x, 'hello world']
>     del x
>
> Although now the list isn't bound to a name, it is inside another
> list, and so Python will not deallocate it.
>
> > 3. Any good practice suggestions?
>
> Write small functions.
> Any temporary objects created by the function will be automatically
> deallocated when the function returns.
>
> Avoid global variables.  They are a good way to inadvertently end up
> with multiple long-lasting copies of data.
>
> Try to keep data in one big piece rather than lots of little pieces.
>
> But contradicting the above, if the one big piece is too big, it will
> be hard for the operating system to swap it in and out of virtual
> memory, causing thrashing, which is *really* slow.  So aim for big,
> but not huge.
>
> (By "big" I mean megabyte-sized; by "huge" I mean hundreds of
> megabytes.)
>
> If possible, avoid reading the entire file in at once, and instead
> process it line by line.
>
> Hope this helps,
>
> --
> Steven
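[Steven's closing suggestion, processing a file line by line instead of
reading it whole, might look like the sketch below; the word-counting
task is made up purely for illustration:]

```python
import os
import tempfile

def count_words(path):
    """Count whitespace-separated words without loading the whole file."""
    total = 0
    with open(path) as handle:   # closed automatically, even on error
        for line in handle:      # the loop reads one line at a time
            total += len(line.split())
    return total

# Self-contained demo using a temporary file:
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as handle:
    handle.write("spam and eggs\nspam\n")
print(count_words(path))  # 4
os.remove(path)
```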
Wow, what a great bunch of responses.  Thank you very much.  If I
understand correctly, the suggestions seem to be:

1. Write algorithms that read a file one line at a time instead of
   reading the whole thing.
2. Use lots of little functions so that memory can fall out of scope.

You also confirmed what I thought was true: all variables are passed
"by reference", so I don't need to worry about the data being copied
(unless I do that explicitly).

Thanks!
Jeremy
--
http://mail.python.org/mailman/listinfo/python-list
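[Jeremy's last point -- nothing is copied unless you copy it explicitly
-- can be sketched as follows; the scale function is invented for
illustration, in Python 3 syntax:]

```python
import copy

def scale(values):
    # Mutates the caller's list in place -- no implicit copy is made.
    for i, v in enumerate(values):
        values[i] = v * 2

data = [1, 2, 3]
scale(data)
print(data)  # [2, 4, 6] -- the caller sees the change

# Copy explicitly when the original must stay intact.
original = [1, 2, 3]
backup = copy.deepcopy(original)  # original[:] would do for a flat list
scale(original)
print(backup)  # [1, 2, 3] -- untouched
```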