Python memory handling

2007-05-31 Thread frederic . pica
Greets,

I'm having trouble getting Python to free memory; how can I force
it to release it? I've tried del and gc.collect() with no success.
Here is a code sample that parses an XML file, under Linux with
Python 2.4 (same problem on Windows with 2.5, tried with the first
example):
# Python interpreter memory usage: 1.1 Mb private, 1.4 Mb shared
# Using http://www.pixelbeat.org/scripts/ps_mem.py to get memory information
import cElementTree as ElementTree  # meminfo: 2.3 Mb private, 1.6 Mb shared
import gc  # no memory change

et = ElementTree.parse('primary.xml')  # meminfo: 34.6 Mb private, 1.6 Mb shared
del et  # no memory change
gc.collect()  # no memory change

So how can I free the 32.3 Mb taken by ElementTree?

The same problem occurs with a simple file.readlines():
# Python interpreter memory usage: 1.1 Mb private, 1.4 Mb shared
import gc  # no memory change

f = open('primary.xml')  # no memory change
data = f.readlines()  # meminfo: 12 Mb private, 1.4 Mb shared
del data  # meminfo: 11.5 Mb private, 1.4 Mb shared
gc.collect()  # no memory change

But it works fine with file.read():
# Python interpreter memory usage: 1.1 Mb private, 1.4 Mb shared
import gc  # no memory change

f = open('primary.xml')  # no memory change
data = f.read()  # meminfo: 7.3 Mb private, 1.4 Mb shared
del data  # meminfo: 1.1 Mb private, 1.4 Mb shared
gc.collect()  # no memory change

So as far as I can see, Python maintains a memory pool for lists.
In my first example, if I reparse the XML file, the memory barely
grows (by exactly 0.1 Mb), so I think I'm right about the pool.
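
Here's a minimal sketch of how I take these measurements from inside
the process; the rss_kb() helper is my own and assumes a Linux /proc
filesystem, so adjust as needed:

import cElementTree as ElementTree

def rss_kb():
    # Resident set size of the current process, in kB (Linux only).
    for line in open('/proc/self/status'):
        if line.startswith('VmRSS:'):
            return int(line.split()[1])

print rss_kb()                         # baseline
et = ElementTree.parse('primary.xml')
print rss_kb()                         # grows by roughly 33 Mb
del et
et = ElementTree.parse('primary.xml')
print rss_kb()                         # grows by only ~0.1 Mb: the pool is reused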

But is there a way to force Python to release this memory?

Regards,
FP



Re: Python memory handling

2007-05-31 Thread frederic . pica
On 31 mai, 14:16, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
> In <[EMAIL PROTECTED]>, frederic.pica
> wrote:
>
> > So as far as I can see, Python maintains a memory pool for lists.
> > In my first example, if I reparse the XML file, the memory barely
> > grows (by exactly 0.1 Mb), so I think I'm right about the pool.
>
> > But is there a way to force Python to release this memory?
>
> AFAIK not.  But why is this important as long as the memory consumption
> doesn't grow constantly?  The operating system's virtual memory
> management usually ensures that only actually used memory sits in
> physical RAM.
>
> Ciao,
> Marc 'BlackJack' Rintsch

Because I'm a believer in "small is beautiful". Of course the OS will
swap out the unused memory if needed. But if I daemonize this
application, it will hold a constant 40 Mb that is not free for other
applications. If another application needs this memory, the OS will
have to swap and lose time on that application's behalf... And I'm not
sure the system will swap out this unused memory first; it could just
as well swap out another application first... AFAIK.
And these 40 Mb are for a 7 Mb XML file; what about parsing a big one,
say 50 Mb?

I would have preferred to have the choice of manually freeing this
unused memory, or of setting the size of the memory pool manually.
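
One glibc-specific thing I may experiment with (and I'm not sure it can
reach memory that Python's own allocator is still holding, so it may do
nothing at all here) is asking the C allocator to hand its free heap
pages back to the kernel with malloc_trim, via ctypes (in the standard
library since 2.5):

import ctypes

libc = ctypes.CDLL('libc.so.6')  # glibc only
libc.malloc_trim(0)              # return free memory at the top of the heap to the kernel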

Regards,
FP



Re: Python memory handling

2007-05-31 Thread frederic . pica
On 31 mai, 16:22, Paul Melis <[EMAIL PROTECTED]> wrote:
> Hello,
>
> [EMAIL PROTECTED] wrote:
> > I'm having trouble getting Python to free memory; how can I force
> > it to release it? I've tried del and gc.collect() with no success.
>
> [...]
>
>
>
> > The same problem occurs with a simple file.readlines():
> > # Python interpreter memory usage: 1.1 Mb private, 1.4 Mb shared
> > import gc  # no memory change
> > f = open('primary.xml')  # no memory change
> > data = f.readlines()  # meminfo: 12 Mb private, 1.4 Mb shared
> > del data  # meminfo: 11.5 Mb private, 1.4 Mb shared
> > gc.collect()  # no memory change
>
> > But it works fine with file.read():
> > # Python interpreter memory usage: 1.1 Mb private, 1.4 Mb shared
> > import gc  # no memory change
> > f = open('primary.xml')  # no memory change
> > data = f.read()  # meminfo: 7.3 Mb private, 1.4 Mb shared
> > del data  # meminfo: 1.1 Mb private, 1.4 Mb shared
> > gc.collect()  # no memory change
>
> > So as far as I can see, Python maintains a memory pool for lists.
> > In my first example, if I reparse the XML file, the memory barely
> > grows (by exactly 0.1 Mb), so I think I'm right about the pool.
>
> > But is there a way to force Python to release this memory?
>
> This is from the 2.5 series release notes
> (http://www.python.org/download/releases/2.5.1/NEWS.txt):
>
> "[...]
>
> - Patch #1123430: Python's small-object allocator now returns an arena to
>   the system ``free()`` when all memory within an arena becomes unused
>   again.  Prior to Python 2.5, arenas (256KB chunks of memory) were never
>   freed.  Some applications will see a drop in virtual memory size now,
>   especially long-running applications that, from time to time, temporarily
>   use a large number of small objects.  Note that when Python returns an
>   arena to the platform C's ``free()``, there's no guarantee that the
>   platform C library will in turn return that memory to the operating
>   system.  The effect of the patch is to stop making that impossible, and
>   in tests it appears to be effective at least on Microsoft C and
>   gcc-based systems.  Thanks to Evan Jones for hard work and patience.
>
> [...]"
>
> So with 2.4 under Linux (as you tested) you will indeed not always get
> the used memory back, particularly when lots of small objects are
> being collected.
>
> The difference you see between doing an f.read() and an f.readlines()
> is therefore (I think) that the former reads in the whole file as one
> large string object (i.e. not a small object), while the latter returns
> a list of lines where each line is a Python object.
>
> I wonder how 2.5 would work out on linux in this situation for you.
>
> Paul


Hello,

I will try later with Python 2.5 under Linux, but as far as I can see,
it's the same problem with my Windows Python 2.5.
After reading this document:
http://evanjones.ca/memoryallocator/python-memory.pdf

I think it's because the parser uses lists or dictionaries, and Python
uses an internal memory pool (not pymalloc) for them...
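
For example (this is just my reading of that document, so take it with
a grain of salt), small ints live in blocks that are kept for reuse and
apparently never returned to the system, which is easy to see with the
rss_kb() helper from my first message:

import gc

print rss_kb()         # baseline
ints = range(5000000)  # a list of five million small int objects
print rss_kb()         # jumps by tens of Mb
del ints
gc.collect()
print rss_kb()         # most of it stays: the int blocks are kept around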

Regards,
FP



Re: Python memory handling

2007-05-31 Thread frederic . pica
On 31 mai, 17:29, "Josh Bloom" <[EMAIL PROTECTED]> wrote:
> If the memory usage is that important to you, you could break this out
> into two programs: one that starts the jobs when needed, and one that
> does the processing and then quits, as long as the Python startup time
> isn't an issue for you.
>
> On 31 May 2007 04:40:04 -0700, [EMAIL PROTECTED] wrote:
> > I'm having trouble getting Python to free memory; how can I force
> > it to release it? I've tried del and gc.collect() with no success.
> > [...]


Yes, it's a solution, but I don't think it's a good way to go; I didn't
want to use ugly hacks to work around a Python-specific problem. And
the problem is everywhere: it affects every Python program that has to
handle big files.
I've tried xml.dom.minidom with a 66 Mb XML file => 675 Mb of memory
that will never be freed, and that time I got many unreachable objects
when running gc.collect().
Using the same file with cElementTree took 217 Mb, with no unreachable
objects.
For me this is not good behavior: letting the system swap out this
unused memory instead of freeing it is not a good approach.
I think a memory pool is a really good idea for performance reasons,
but why is there no limit on the number of free blocks?
Python is a really good language that can do many things in a clear,
easy, and fast way, and it has always met my needs. But I can't believe
there is no good solution to this problem: limiting the size of the
free-block pool, or better, letting the user specify that limit, or
best of all, letting the user free the pool completely (along with
setting the limit manually).

Something like:

import pool
pool.free()
pool.limit(size_in_megabytes)

Why not let the user choose, and give the user more flexibility?
I will try again later under Linux with the latest stable Python.
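
In the meantime, here is roughly what the two-program idea could look
like in a single script, using os.fork (Unix only, and just a sketch):
the parse happens in a child process, and all of the child's memory
goes back to the OS when it exits.

import os

def parse_in_child(filename):
    pid = os.fork()
    if pid == 0:
        # Child: do the memory-hungry parse here, write whatever
        # results you need to a file or a pipe, then exit.
        import cElementTree as ElementTree
        et = ElementTree.parse(filename)
        os._exit(0)
    os.waitpid(pid, 0)  # parent: wait for the child; our own memory never grows

parse_in_child('primary.xml')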

Regards,
FP
