Bugs item #1399099, was opened at 2006-01-07 05:04
Message generated for change (Comment added) made by tim_one
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1399099&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Python Library
Group: Python 2.4
Status: Closed
Resolution: Wont Fix
Priority: 5
Submitted By: Leo Jay (leojay)
Assigned to: Nobody/Anonymous (nobody)
Summary: I get a memory leak after using the split() function on Windows

Initial Comment:
My environment is Python 2.4.2 on Windows XP Professional with SP2. All I do is open a text file and count how many lines it has:

D:\>python
Python 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> f = file('b.txt', 'rt')  # the size of b.txt is about 100 megabytes
>>> print len(f.read().split('\n'))
899830
>>> f.close()
>>> del f
>>>

After these steps, Task Manager shows that the Python process still holds about 125 megabytes of memory, and Python doesn't release that memory until I quit. But I find that if I remove the split() call, Python behaves correctly:

D:\>python
Python 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> f = file('b.txt', 'rt')
>>> print len(f.read())
95867667
>>>

So, is there something wrong with the split() function, or is this just my misuse of split()?

Best Regards,
Leo Jay

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2006-01-07 18:54

Message:
Logged In: YES
user_id=31435

Specifically, you end up with about a million string objects simultaneously alive. That consumes about 24 million bytes for string object headers, plus another 100 million bytes for the string contents. Python can reuse all that memory later, but pymalloc does not (as perky said) "give it back". Do note that this is an extraordinarily slow and wasteful way to count lines.
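Tim's per-object arithmetic can be checked with a quick, hedged sketch: in CPython, sys.getsizeof() reports an object's total size, so subtracting a string's character count exposes its fixed header overhead. Exact sizes are implementation- and version-dependent (the 24-bytes-per-header figure above is for Python 2.4's str type), so this is an illustration, not a precise reproduction.

```python
import sys

# CPython-specific illustration: every str object carries a fixed header
# in addition to its character data, so a million short strings cost far
# more memory than the sum of their lengths alone.  Sizes vary by Python
# version and build; treat the printed number as indicative only.
line = "x" * 100                          # a typical ~100-byte line
overhead = sys.getsizeof(line) - len(line)
print("per-string overhead:", overhead, "bytes")
```

Multiplying that overhead by the roughly 900,000 lines in the report shows why the headers alone account for tens of megabytes.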
If that's what you want and you don't care about peak memory use, then f.read().count('\n') is one simpler and faster way that creates only one string object. The string object is so large in that case (about 100 million bytes) that pymalloc delegates memory management to the platform malloc. Whether or not the platform malloc "gives back" its memory to the OS is a wholly different question, and one over which Python has no control. On Windows NT+ it's very likely to, though.

----------------------------------------------------------------------

Comment By: Hye-Shik Chang (perky)
Date: 2006-01-07 10:54

Message:
Logged In: YES
user_id=55188

That comes from pymalloc's behavior: pymalloc never returns allocated heap memory to the OS. For more information, see this mailing list thread:
http://mail.python.org/pipermail/python-dev/2004-October/049480.html

The usual way to resolve the problem is to use an iterator-style loop instead of reading all of the data at once.

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
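The iterator-style loop perky recommends can be sketched as follows. Iterating over the file object yields one line at a time, so only a single line is ever held in memory, instead of every line at once as with read().split('\n'). This is a minimal self-contained example: 'sample.txt' is a small stand-in created here for the 100 MB b.txt from the report, and the modern with-statement is used for cleanup (on Python 2.4 itself you would call f.close() explicitly).

```python
# Create a small stand-in file so the sketch is runnable anywhere.
with open('sample.txt', 'w') as f:
    f.write('first\nsecond\nthird\n')

# Count lines by iterating: the file object yields one line at a time,
# so peak memory stays at roughly one line rather than the whole file.
count = 0
with open('sample.txt') as f:
    for line in f:
        count += 1
print(count)  # → 3
```

Because no intermediate list of line strings is ever built, pymalloc never has to grow its heap to hold a million simultaneous string objects in the first place.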