Suitability for long-running text processing?
I have a pair of Python programs that parse and index files on my computer to make them searchable. The problem I have is that they grow continually until my system is out of memory, and then things get ugly.

I remember, when I was first learning Python, reading that the interpreter doesn't garbage-collect small strings, but I assumed that was outdated and more or less forgot about it. Unfortunately, it seems this is still the case. A sample program (to type or paste into the Python REPL):

a = []
for i in xrange(33, 127):
    for j in xrange(33, 127):
        for k in xrange(33, 127):
            for l in xrange(33, 127):
                a.append(chr(i) + chr(j) + chr(k) + chr(l))

del a
import gc
gc.collect()

The loop is deep enough that I always interrupt it once Python's size is around 250 MB. Once the gc.collect() call finishes, Python's size has not changed at all. Even though there are no locals and no references at all to the strings that were created, Python will not reduce its size. This example is obviously artificial, but I am getting exactly the same behaviour in my real programs.

Is there some way to convince Python to get rid of all the data that is no longer referenced, or do I need to use a different language? This has been tried under Python 2.4.3 on Gentoo Linux and Python 2.3 under OS X 10.3. Any suggestions or workarounds would be much appreciated.
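(For what it's worth, here is a rough, Linux-only sketch of how to watch the process size from inside the interpreter instead of eyeballing top; the helper name is mine and it assumes /proc is available.)

def current_rss_mib():
    """Current resident set size in MiB, read from /proc (Linux only)."""
    f = open('/proc/self/status')
    try:
        for line in f:
            if line.startswith('VmRSS:'):
                # Line looks like: "VmRSS:     12345 kB"
                return int(line.split()[1]) / 1024.0
    finally:
        f.close()
    return None

print "RSS before: %s MiB" % current_rss_mib()
a = [chr(i) + chr(j) + chr(k)
     for i in xrange(33, 127)
     for j in xrange(33, 127)
     for k in xrange(33, 127)]
print "RSS after building: %s MiB" % current_rss_mib()
del a
print "RSS after del: %s MiB" % current_rss_mib()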
Re: Suitability for long-running text processing?
After reading http://www.python.org/doc/faq/general/#how-does-python-manage-memory, I tried modifying the program as below:

a = []
for i in xrange(33, 127):
    for j in xrange(33, 127):
        for k in xrange(33, 127):
            for l in xrange(33, 127):
                a.append(chr(i) + chr(j) + chr(k) + chr(l))

import sys
sys.exc_clear()
sys.exc_traceback = sys.last_traceback = None
del a
import gc
gc.collect()

And it still never frees up its memory.
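(One workaround that does not depend on the interpreter's allocator at all, just a sketch with made-up names: push each batch of indexing work into a forked child process, so whatever memory the batch allocates is returned to the operating system when the child exits. run_in_child and index_batch below are illustrative, not real APIs.)

import os

def run_in_child(func, *args):
    """Run func(*args) in a forked child so its memory is freed on exit.

    The parent blocks until the child finishes and only sees the exit
    status, not a return value, so results must go to disk or a pipe.
    """
    pid = os.fork()
    if pid == 0:
        # Child: do the work, then exit without running cleanup handlers.
        try:
            func(*args)
        finally:
            os._exit(0)
    else:
        # Parent: wait for the child to finish.
        os.waitpid(pid, 0)

def index_batch(filenames):
    # Placeholder for the real parse-and-index step.
    for name in filenames:
        pass

run_in_child(index_batch, ['a.txt', 'b.txt'])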
Re: Suitability for long-running text processing?
I just tried on my system:

(Python is using 2.9 MiB)
>>> a = ['a' * (1 << 20) for i in xrange(300)]
(Python is using 304.1 MiB)
>>> del a
(Python is using 2.9 MiB -- as before)

And I didn't even need to tell the garbage collector to do its job.

It looks like the big difference between our two programs is that you have one huge string repeated 300 times, whereas I have thousands of four-character strings. Are small strings ever collected by Python?
Re: Suitability for long-running text processing?
$ python
Python 2.4.4c1 (#2, Oct 11 2006, 21:51:02)
[GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> # Python is using 2.7 MiB
... a = ['1234' for i in xrange(10 << 20)]
>>> # Python is using 42.9 MiB
... del a
>>> # Python is using 2.9 MiB

With 10,485,760 strings of 4 chars, it still works as expected.

Have you tried running the code I posted? Is there any explanation as to why the code I posted fails to ever be cleaned up?

In your specific example, you have a huge array of pointers to a single string. Try doing "a[0] is a[1]"; you'll get True. Try "a[0] is '1'+'2'+'3'+'4'"; you'll get False. Every element of a is a pointer to the exact same string. When you delete a, you're getting rid of a huge array of pointers, but probably not actually losing the four-byte (plus gc overhead) string '1234'.

So, does anybody know how to get Python to free up _all_ of its allocated strings?
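(To make the shared-versus-distinct point concrete, here is a small sketch of my own: a literal repeated by a comprehension is one string object referenced many times, while strings built at run time are separate objects that each take their own memory.)

# A literal repeated by a comprehension: one string object, many references.
shared = ['1234' for i in xrange(5)]
print shared[0] is shared[1]          # True -- same object

# Strings built at run time: a distinct object per element.
distinct = [chr(49) + chr(50) + chr(51) + chr(52) for i in xrange(5)]
print distinct[0] == distinct[1]      # True -- equal contents
print distinct[0] is distinct[1]      # False -- different objects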
Re: Suitability for long-running text processing?
My first thought was that interned strings were causing the growth, but that doesn't seem to be the case. Interned strings, as of 2.3, are no longer immortal, right? The intern() documentation says you have to keep a reference around to the string yourself now, anyhow. I really wish I could find that thing I read a year and a half ago about Python never collecting small strings, but I just can't find it anymore. Maybe it's time for me to go source diving...
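(A quick illustration of interning, in case it's unfamiliar; this is just a sketch using the Python 2 built-in intern(). Interning maps equal strings to a single shared object, which is why it was a plausible suspect for memory growth.)

# Two runtime-built strings with equal contents are normally distinct objects.
s1 = 'key_' + str(42)
s2 = 'key_' + str(42)
print s1 == s2, s1 is s2        # True False

# intern() returns the canonical copy, so interned equals share one object.
i1 = intern(s1)
i2 = intern(s2)
print i1 == i2, i1 is i2        # True True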
Re: Suitability for long-running text processing?
I remember something about it coming up in some of the discussions of free lists and better behaviour in this regard in 2.5, but I don't remember the details. Under Python 2.5, my original code posting no longer exhibits the bug: upon calling del(a), Python's size shrinks back to ~4 MB, which is its starting size. I guess I'll see how painful it is to migrate a Gentoo system to 2.5... Thanks for the hint :)
Malformed big5 reading bug
Python enters some sort of infinite loop when attempting to read data from a malformed file that is big5 encoded (using the codecs library). This behaviour can be observed under Linux and FreeBSD, using Python 2.4 and 2.5. A really simple example illustrating the bug follows:

Python 2.4.4 (#1, May 15 2007, 13:33:55)
[GCC 4.1.1 (Gentoo 4.1.1-r3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import codecs
>>> fname = 'out'
>>> outfd = open(fname, 'w')
>>> outfd.write(chr(243))
>>> outfd.close()
>>> infd = codecs.open(fname, encoding='big5')
>>> infd.read(1024)

And then it hangs forever. If I instead use the following code:

Python 2.5 (r25:51908, Jan 8 2007, 19:09:28)
[GCC 3.4.5 (Gentoo 3.4.5-r1, ssp-3.4.5-1.0, pie-8.7.9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import codecs, signal
>>> fname = 'out'
>>> def handler(*args):
...     raise Exception("boo!")
...
>>> signal.signal(signal.SIGALRM, handler)
0
>>> outfd = open(fname, 'w')
>>> outfd.write(chr(243))
>>> outfd.close()
>>> infd = codecs.open(fname, encoding='big5')
>>> signal.alarm(5)
0
>>> infd.read(1024)

the program still hangs forever. The program can be made to crash if I don't install a signal handler at all, but that's pretty lame. It looks like the entire interpreter is being locked up by this read, so I don't think there's likely to be a pure-Python workaround, but I thought it would be good to get this bug out there so a future version of Python can (hopefully) fix it.
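(A possible way to sidestep the hang when best-effort decoding is acceptable, assuming the problem is specific to the codecs stream reader's read loop: skip codecs.open() entirely, read the raw bytes, and decode them with a lenient error handler so malformed bytes become replacement characters instead of looping. Just a sketch, not a fix for the underlying bug.)

# Read the raw bytes and decode with a lenient error handler instead of
# going through codecs.open(); 'replace' turns bad bytes into U+FFFD.
f = open('out', 'rb')
raw = f.read()
f.close()

text = raw.decode('big5', 'replace')
print repr(text)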
Re: generating objects of a type from a name.
I'm not sure what a visual object is, but to create an instance of an object whose name is known, you can use "eval":

>>> oname = 'list'
>>> obj = eval(oname)()
>>> obj
[]
>>> type(obj)
<type 'list'>

Hope that helps!

On 26/07/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> I'm trying to generate visual python objects from django objects and
> therefore have objects called 'Ring' and 'Cylinder' as django objects
> and I want to create objects of those names in visual.
> I can kludge it in various ways by using dir and lots of if lookups but
> is there a way of doing this that allows the name to generate a
> visual object of the appropriate name or fail nicely if the visual
> object doesn't exist?
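(A variant that fails more gracefully when the name doesn't exist, sketched with a hypothetical make_object helper: look the class up on the module with getattr instead of eval. The demonstration uses the standard datetime module; with VPython you would pass the imported visual module and whatever class names it actually defines.)

def make_object(module, name, *args, **kwargs):
    """Instantiate module.<name>, or return None if no such attribute exists."""
    cls = getattr(module, name, None)
    if cls is None or not callable(cls):
        print "no such object in %s: %r" % (module.__name__, name)
        return None
    return cls(*args, **kwargs)

import datetime
d = make_object(datetime, 'date', 2007, 7, 26)
print d                                       # 2007-07-26
print make_object(datetime, 'NoSuchThing')    # prints a message, returns None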
zip files as nested modules?
Supposing that I have a directory tree like so:

a/
    __init__.py
    b/
        __init__.py
        c.py

and b.py has some method (let's call it d) within it. I can, from Python, do:

from a.b.c import d
d()

And that works. Now, suppose I want to have a zipped module under a, called b.zip. Is there any way that I can accomplish the same thing, but using the zip file as the inner module? My directory layout is then:

a/
    __init__.py
    b.zip

And b.zip is a zip file laid out like:

b/
    __init__.py
    c.py

I tried populating a's __init__.py with this:

import zipimport
import os

here = os.path.join(os.getcwd(), __path__[0])
zips = [f for f in os.listdir(here) if f.endswith('.zip')]
zips = [os.path.join(here, z) for z in zips]
for z in zips:
    print z
    mod = os.path.split(z)[-1][:-4]
    print mod
    globals()[mod] = zipimport.zipimporter(z).load_module(mod)

All the zip modules appear (I actually have a few zips, but that shouldn't be important), but their contents do not seem to be accessible in any way. I could probably put import statements in all the __init__.py files to import everything in the level below, but I am under the impression that relative imports are frowned upon, and it seems pretty bug-prone anyhow. Any pointers on how to accomplish zip modules being nested within normal ones?
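(One approach that may be worth trying, though I have not verified it on every 2.x release: rather than loading the zips by hand with zipimport, add each archive to the package's __path__ and let the normal PEP 302 machinery find submodules inside it. This assumes the zip import hook is consulted for package __path__ entries the same way it is for sys.path entries, and that b.zip contains b/__init__.py at its top level.)

# a/__init__.py -- sketch: let the import machinery look inside the zips.
import os

_here = os.path.dirname(os.path.abspath(__file__))

# Appending an archive path to __path__ means "also search inside this zip"
# when resolving submodules of this package (handled by the zipimport hook).
for _name in os.listdir(_here):
    if _name.endswith('.zip'):
        __path__.append(os.path.join(_here, _name))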
Re: zip files as nested modules?
> and b.py has some method (let's call it d) within it. I can, from python, do:

That should be c.py, of course.

Is this message getting no replies because it's confusing, it's poorly worded, it's a dumb question, or is it just that nobody knows the answer? I'm stuck on this, so any suggestions at all would be very much appreciated.
Re: f---ing typechecking
Agreed. This would be similar to:

py> 1 + 1.0
Traceback: can only add int to int.

Etc. But then again, the unimaginative defense would be that it wouldn't be Python if you could concatenate a list and a tuple. Of course, that behaviour would be quite defensible; auto-casting int to float is _wrong_, especially with Python implementing arbitrary-precision integers. Integers are exact and unbounded while floats have only 53 bits of mantissa, so why would you automatically cast in the lossy direction? Seeing

>>> 0xffffffffffffffff + 1.0 == float(0xffffffffffffffff)
True

is considerably more irritating than your hypothetical traceback would be.
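(For concreteness, a small illustration of my own: list/tuple concatenation is rejected outright, while int/float mixing silently coerces, and the coercion is where precision quietly disappears for large integers.)

# Mixing sequence types raises immediately...
try:
    [1, 2] + (3, 4)
except TypeError, e:
    print "TypeError:", e

# ...while mixing int and float silently converts the int to a float,
# which cannot represent every large integer exactly.
big = 2 ** 64 - 1
print big == int(float(big))     # False: the round trip lands on 2**64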
Re: Urgent : How to do memory leaks detection in python ?
> Python doesn't have memory leaks.

Yeah, interesting bit of trivia: Python is the world's only non-trivial program that's totally free of bugs. Pretty exciting!

But seriously, Python 2.4, at least, does have some pretty trivially exposed memory leaks when working with strings. A simple example is this:

>>> letters = [chr(c) for c in range(ord('a'), ord('z')) + range(ord('A'), ord('Z'))]
>>> ary = []
>>> for a in letters:
...     for b in letters:
...         for c in letters:
...             for d in letters:
...                 ary.append(a+b+c+d)
...
>>> del(ary)
>>> import gc
>>> gc.collect()
0

The VM's memory usage will never drop from its high point of (on my computer) ~200 MB. Since you're using GIS data, this could be what you're running into.

I haven't been able to upgrade my systems to Python 2.5, but from my tests, that version did not have that memory leak. Nobody seems interested in backporting fixes from 2.5 to 2.4, so you're probably on your own in that case as well, if upgrading to Python 2.5 isn't an option or isn't applicable to your situation.
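(Back to the original question of detecting leaks: one low-tech sketch using only the standard gc module is to snapshot object counts by type and compare across iterations of whatever long-running loop you suspect; steadily growing counts point at what is being retained. Note that the cyclic collector only tracks container objects, so this catches growth in lists, dicts, and instances rather than the small-string problem above.)

import gc

def type_counts():
    """Count objects tracked by the cyclic GC, grouped by type name.

    Only container objects (lists, dicts, instances, ...) are tracked;
    plain strings and numbers will not show up here.
    """
    counts = {}
    for obj in gc.get_objects():
        name = type(obj).__name__
        counts[name] = counts.get(name, 0) + 1
    return counts

def report_growth(before, after, top=10):
    """Print the types whose tracked-object count grew the most."""
    grown = [(after.get(name, 0) - before.get(name, 0), name) for name in after]
    grown.sort()
    grown.reverse()
    for delta, name in grown[:top]:
        if delta > 0:
            print "%8d more %s" % (delta, name)

before = type_counts()
retained = [{'id': i} for i in xrange(10000)]   # stand-in for the suspect code
after = type_counts()
report_growth(before, after)                    # should show ~10000 more dicts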
Conditionally skipping the contents of a with-statement
I'd like to write a Fork class to wrap os.fork that allows something like this:

with Fork():
    # do child stuff, end of block will automatically os._exit()

# parent stuff goes here

This would require (I think) that the __enter__ method of my Fork class be able to return a value or raise an exception indicating that the block should not be run. It looks like, from PEP 343, any exception thrown in __enter__ isn't handled by the with statement, and my basic tests confirm this. I could have __enter__ raise a custom exception and wrap the entire with statement in a try/except block, but that sort of defeats the purpose of the with statement. Is there a clean way for the context manager to signal that the execution of the block should be skipped altogether?
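(As far as I know there is no supported way for __enter__ to skip the body, so the usual fallback is to have the block itself test a flag returned by __enter__. A sketch, with Fork being a hypothetical class name; the body runs in both processes, and the child exits in __exit__, which is a sketch-level simplification that also swallows exceptions raised in the child's block.)

from __future__ import with_statement   # needed on Python 2.5
import os

class Fork(object):
    """Context manager whose body runs in both processes; the body must
    check which side it is on via the value bound by the with statement."""

    def __enter__(self):
        self.pid = os.fork()
        return self.pid == 0          # True in the child, False in the parent

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.pid == 0:
            os._exit(0)               # child never continues past the block
        os.waitpid(self.pid, 0)       # parent: reap the child
        return False

with Fork() as in_child:
    if in_child:
        print "child: pid %d" % os.getpid()

print "parent continues here, pid %d" % os.getpid()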