Well, I am using Python 2.5 (and the IDLE shell) on Windows XP, which ships 
with ESRI's ArcGIS. In addition, I am using some functions from the 
arcgisscripting Python geoprocessing module for geographic information systems 
(GIS) applications, which can complicate things. I am currently isolating the 
standard-library Python code (e.g., os.walk()) from the arcgisscripting module 
to determine whether the crash occurs in the standard library or in the 
geoprocessing module.
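A minimal pure-stdlib test along those lines (no arcgisscripting at all) might look like the sketch below; the helper name and the throwaway directory tree are my own illustration, not code from the original script:

```python
import os
import tempfile

def collect_dirs(root):
    """Walk 'root' and return a list of every directory path found."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for d in dirnames:
            found.append(os.path.join(dirpath, d))
    return found

# Exercise the stdlib-only path on a small, known tree first,
# then point it at the large directory to see if it alone crashes.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "a", "b"))
dirs = collect_dirs(root)
print(len(dirs))  # 2: 'a' and 'a/b'
```

If this survives the big directory tree but the full script does not, the problem is more likely in the geoprocessing calls.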

-----Original Message-----
From: Dave Angel [mailto:da...@dejaviewphoto.com] 
Sent: Monday, November 07, 2011 1:20 PM
To: Juan Declet-Barreto
Cc: python-list@python.org
Subject: Re: memory management


On 11/07/2011 02:43 PM, Juan Declet-Barreto wrote:
> Hi,
>
> Can anyone provide links or basic info on memory management, variable 
> dereferencing, or the like? I have a script that traverses a file structure 
> using os.walk and adds directory names to a list. It works for a small number 
> of directories, but when I set it loose on a directory with thousands of 
> dirs/subdirs, it crashes the DOS session and also the Python shell (when I 
> run it from the shell).  This makes it difficult to figure out if the 
> allocated memory or heap space for the DOS/shell session have overflown, or 
> why it is crashing.
>
> Juan Declet-Barreto
I don't have any reference to point you to, but CPython's memory management is 
really pretty simple.  However, it's important to tell us which build of Python 
you're using, as there are several, with very different memory rules.  For 
example Jython, which is Python running in a Java VM, lets the Java garbage 
collector handle things, and it behaves entirely differently.

Likewise, the OS may be relevant.  You're using Windows-kind of terminology, 
but that doesn't prove you're on Windows, nor does it say what version.

Assuming 32-bit CPython 2.7 on XP, the principles are simple.  When an 
object is no longer accessible, it gets garbage collected*.   So if you 
build a list inside a function, and the only reference to it is a function's 
local variable, then the whole list is freed when the function exits.  Common 
mistakes are using globals unnecessarily, and using lists where 
iterables would work just as well.
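The list-versus-iterable point can be made concrete with a sketch like this (my own example, not code from the original script):

```python
# A list materializes every item in memory at once;
# a generator yields one item at a time, so peak memory stays tiny.
def squares_list(n):
    return [i * i for i in range(n)]   # n results held simultaneously

def squares_gen(n):
    for i in range(n):
        yield i * i                    # one result alive at a time

total = sum(squares_gen(1000))  # consumes the stream without storing it
print(total)  # 332833500
```

For a directory walk, the same idea means processing each directory as os.walk yields it instead of appending everything to one giant list first.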

The tool on XP for checking how much memory is in use is Task Manager.  
As you point out, it's hard to catch a short-running app in the act.  So you 
want to add a (global) counter to your code, and see how high it gets when the 
program crashes.  Then put a test on that counter value in your code, and pause 
with raw_input() somewhat before the crash point.
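A sketch of that instrumentation (the function name and pause_at parameter are my own invention for illustration):

```python
import os
import tempfile

count = 0  # global progress counter, the kind suggested above

def walk_and_count(root, pause_at=None):
    """Collect directory paths under root, pausing at a chosen count."""
    global count
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        for d in dirnames:
            count += 1
            if pause_at is not None and count == pause_at:
                # Pause so memory use can be read off in Task Manager.
                # (On Python 2 this would be raw_input.)
                input("at dir #%d -- check Task Manager, then press Enter"
                      % count)
            results.append(os.path.join(dirpath, d))
    return results

# Small self-check on a throwaway tree; pause_at=None never blocks.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "x", "y"))
found = walk_and_count(root)
print(count)  # 2
```

Run the real script once to learn the crash count, then rerun with pause_at set a little below it and read the process's memory in Task Manager while it waits.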

At that point, see how much memory the program is actually using.

Now, when an object is freed, a new one of the same size is likely to 
re-use the space immediately.  But if the objects are all different sizes, it's 
somewhat statistical; you can get fragmentation, for example.  When Python's 
pool is full, it asks the OS for more (perhaps using swap space), but I don't 
think it ever gives memory back.  So your memory use is effectively a 
high-water mark.  That's why it's problematic to build a huge data structure, 
walk through it, and then delete it: the process will probably continue to show 
the peak memory use indefinitely.

* (Technically, objects are reference counted; when the refcount reaches zero, 
the object is freed immediately.  The "real" garbage collector is a lazier 
scan whose job is to find and break reference cycles.)


-- 
http://mail.python.org/mailman/listinfo/python-list
