Bugs item #1399099, was opened at 2006-01-07 05:04
Message generated for change (Comment added) made by tim_one
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1399099&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Closed
Resolution: Wont Fix
Priority: 5
Submitted By: Leo Jay (leojay)
Assigned to: Nobody/Anonymous (nobody)
Summary: memory leak after using the split() function on Windows

Initial Comment:
My environment is Python 2.4.2 on Windows XP
Professional with SP2.

All I do is open a text file and count how many
lines are in that file:

D:\>python
Python 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> f = file('b.txt', 'rt')   # the size of b.txt is about 100 megabytes
>>> print len(f.read().split('\n'))
899830
>>> f.close()
>>> del f
>>>

After these steps, Task Manager shows that the
Python process still hogs about 125 megabytes of
memory, and Python doesn't release it until I quit
the interpreter.

But I find that if I remove the split() call,
Python behaves correctly:

D:\>python
Python 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> f = file('b.txt', 'rt')
>>> print len(f.read())
95867667
>>>


So, is there something wrong with the split()
function, or am I just misusing it?


Best Regards, 
Leo Jay

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2006-01-07 18:54

Message:
Logged In: YES 
user_id=31435

Specifically, you end up with about a million string objects
simultaneously alive.

That consumes about 24 million bytes for string object
headers, + another 100 million bytes for the string contents.

Python can reuse all that memory later, but pymalloc does
not (as perky said) "give it back".

Do note that this is an extraordinarily slow and wasteful
way to count lines.  If that's what you want and you don't
care about peak memory use, then f.read().count('\n') is one
simpler and faster way that creates only one string object.    

The string object is so large in that case (about 100
million bytes) that pymalloc delegates memory management to
the platform malloc.  Whether or not the platform malloc
"gives back" its memory to the OS is a wholly different
question, and one over which Python has no control.  On
Windows NT+ it's very likely to, though.
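A small self-contained sketch of the two approaches Tim contrasts, using a tiny stand-in file rather than the reporter's 100 MB b.txt (modern Python syntax is assumed; the original report used Python 2.4):

```python
import os
import tempfile

# Create a small stand-in for b.txt so the sketch is self-contained.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('line one\nline two\nline three\n')

# split('\n') materializes one string object per line -- for the
# reporter's file, roughly a million simultaneously alive objects.
with open(path) as f:
    n_split = len(f.read().split('\n'))

# count('\n') keeps only the single large string alive.
with open(path) as f:
    n_count = f.read().count('\n')

print(n_split, n_count)
os.remove(path)
```

Note that for a file ending in a newline, `split('\n')` reports one element more than the newline count, because the final newline produces a trailing empty string.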

----------------------------------------------------------------------

Comment By: Hye-Shik Chang (perky)
Date: 2006-01-07 10:54

Message:
Logged In: YES 
user_id=55188

That comes from pymalloc's behavior: pymalloc never
returns allocated heap memory to the OS. For more
information, see this mailing-list thread:
http://mail.python.org/pipermail/python-dev/2004-October/049480.html

The usual way to avoid the problem is to use an
iterator-style loop instead of reading all the data
at once.
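The iterator-style approach suggested above can be sketched as follows (a minimal example with a stand-in file; iterating the file object keeps only one line in memory at a time):

```python
import os
import tempfile

# Small stand-in file so the sketch is self-contained.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('alpha\nbeta\ngamma\n')

# Iterating over the file yields one line at a time, so peak memory
# is proportional to the longest line, not the whole file.
with open(path) as f:
    line_count = sum(1 for _ in f)

print(line_count)
os.remove(path)
```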

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list 