Suitability for long-running text processing?

2007-01-08 Thread tsuraan

I have a pair of python programs that parse and index files on my computer
to make them searchable.  The problem that I have is that they continually
grow until my system is out of memory, and then things get ugly.  I
remember, when I was first learning python, reading that the python
interpreter doesn't gc small strings, but I assumed that was outdated and
sort of forgot about it.  Unfortunately, it seems this is still the case.  A
sample program (to type/copy and paste into the python REPL):

a = []
for i in xrange(33, 127):
    for j in xrange(33, 127):
        for k in xrange(33, 127):
            for l in xrange(33, 127):
                a.append(chr(i) + chr(j) + chr(k) + chr(l))

del(a)
import gc
gc.collect()

The loop is deep enough that I always interrupt it once python's size is
around 250 MB.  Once the gc.collect() call is finished, python's size has
not changed a bit.  Even though there are no locals and no references at all
to the strings that were created, python will not reduce its size.  This
example is obviously artificial, but I am getting the exact same behaviour
in my real programs.  Is there some way to convince python to get rid of all
the data that is no longer referenced, or do I need to use a different
language?

This has been tried under Python 2.4.3 on Gentoo Linux and Python 2.3 under
OS X 10.3.  Any suggestions/workarounds would be much appreciated.
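
For completeness, the only mitigation I've come up with on my own is to push
each parsing/indexing batch into a short-lived child process, so whatever the
interpreter is hoarding goes back to the OS when the child exits (results then
have to come back through a pipe, a file, or the index itself).  A sketch,
with do_batch standing in for the real parsing work:

import os

def run_batch_in_child(do_batch, *args):
    # the child does all the string-heavy work and then exits; its
    # memory is returned to the OS no matter what python itself frees
    pid = os.fork()
    if pid == 0:
        try:
            do_batch(*args)
        finally:
            os._exit(0)
    os.waitpid(pid, 0)

But that feels like working around the interpreter rather than with it.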

Re: Suitability for long-running text processing?

2007-01-08 Thread tsuraan

After reading
http://www.python.org/doc/faq/general/#how-does-python-manage-memory, I
tried modifying this program as below:

a = []

for i in xrange(33, 127):
    for j in xrange(33, 127):
        for k in xrange(33, 127):
            for l in xrange(33, 127):
                a.append(chr(i) + chr(j) + chr(k) + chr(l))



import sys
sys.exc_clear()
sys.exc_traceback = sys.last_traceback = None

del(a)

import gc
gc.collect()



And it still never frees up its memory.

Re: Suitability for long-running text processing?

2007-01-08 Thread tsuraan

I just tried on my system

(Python is using 2.9 MiB)
>>> a = ['a' * (1 << 20) for i in xrange(300)]
(Python is using 304.1 MiB)
>>> del a
(Python is using 2.9 MiB -- as before)

And I didn't even need to tell the garbage collector to do its job. Some
info:



It looks like the big difference between our two programs is that you have
300 one-megabyte strings, whereas I have millions of four-character
strings.  Are small strings ever collected by python?
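
For what it's worth, here is a quick probe of which small strings CPython
shares; this is implementation behaviour, not anything the language
guarantees:

>>> a = 'abcd'                   # identifier-like literal, interned at compile time
>>> b = ''.join(['ab', 'cd'])    # built at runtime: a fresh object
>>> a is b
False
>>> 'abcd' is 'abcd'             # two equal literals share one interned string
True

So every small string my indexer builds at runtime is a distinct object; the
question is whether those objects are ever freed.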

Re: Suitability for long-running text processing?

2007-01-08 Thread tsuraan

$ python
Python 2.4.4c1 (#2, Oct 11 2006, 21:51:02)
[GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> # Python is using 2.7 MiB
... a = ['1234' for i in xrange(10 << 20)]
>>> # Python is using 42.9 MiB
... del a
>>> # Python is using 2.9 MiB

With 10,485,760 strings of 4 chars, it still works as expected.



Have you tried running the code I posted?  Is there any explanation as to
why the code I posted fails to ever be cleaned up?
In your specific example, you have a huge array of pointers to a single
string.  Try doing "a[0] is a[1]".  You'll get True.  Try "a[0] is
'1'+'2'+'3'+'4'".  You'll get False.  Every element of a is a pointer to the
exact same string.  When you delete a, you're getting rid of a huge array of
pointers, but probably not actually losing the four-byte (plus object
overhead) string '1234'.
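
For a one-line version of my test: '%07d' % i builds a new, distinct small
string on every iteration, so nothing can be shared.  This is my
reconstruction, and it will eat a few hundred MB:

>>> a = ['%07d' % i for i in xrange(10 << 20)]
>>> del a
>>> # on 2.4, the interpreter's size stays at its high-water mark here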

So, does anybody know how to get python to free up _all_ of its allocated
strings?

Re: Suitability for long-running text processing?

2007-01-08 Thread tsuraan

My first thought was that interned strings were causing the growth,
but that doesn't seem to be the case.



Interned strings, as of 2.3, are no longer immortal, right?  The intern doc
says you have to keep a reference around to the string now, anyhow.  I
really wish I could find that thing I read a year and a half ago about
python never collecting small strings, but I just can't find it anymore.
Maybe it's time for me to go source diving...

Re: Suitability for long-running text processing?

2007-01-08 Thread tsuraan

I remember something about it coming up in some of the discussions of
free lists and better behavior in this regard in 2.5, but I don't
remember the details.



Under Python 2.5, my original code posting no longer exhibits the bug - upon
calling del(a), python's size shrinks back to ~4 MB, which is its starting
size.  I guess I'll see how painful it is to migrate a gentoo system to
2.5... Thanks for the hint :)

Malformed big5 reading bug

2007-05-29 Thread tsuraan

Python enters some sort of infinite loop when attempting to read data from a
malformed file that is big5 encoded (using the codecs library).  This
behaviour can be observed under Linux and FreeBSD, using Python 2.4 and 2.5.
A really simple example illustrating the bug follows:

Python 2.4.4 (#1, May 15 2007, 13:33:55)
[GCC 4.1.1 (Gentoo 4.1.1-r3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import codecs
>>> fname = 'out'
>>> outfd = open(fname, 'w')
>>> outfd.write(chr(243))
>>> outfd.close()
>>> infd = codecs.open(fname, encoding='big5')
>>> infd.read(1024)


And then, it hangs forever.  If I instead use the following code:

Python 2.5 (r25:51908, Jan  8 2007, 19:09:28)
[GCC 3.4.5 (Gentoo 3.4.5-r1, ssp-3.4.5-1.0, pie-8.7.9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import codecs, signal
>>> fname = 'out'
>>> def handler(*args):
...     raise Exception("boo!")
...
>>> signal.signal(signal.SIGALRM, handler)
0
>>> outfd = open(fname, 'w')
>>> outfd.write(chr(243))
>>> outfd.close()
>>> infd = codecs.open(fname, encoding='big5')
>>> signal.alarm(5)
0
>>> infd.read(1024)


The program still hangs forever.  The program can be made to crash if I
don't install a signal handler at all, but that's pretty lame.  It looks
like the entire interpreter is being locked up by this read, so I don't
think there's likely to be a pure-python workaround, but I thought it would
be good to have this bug on record so a future version of python can
(hopefully) fix it.
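
In the meantime, a workaround sketch that seems to avoid the hang (mine,
only lightly tested): skip the codecs stream wrapper, read the raw bytes
yourself, and decode them in one shot, so the malformed tail raises a normal
exception instead of looping:

raw = open('out', 'rb').read()
try:
    text = raw.decode('big5')
except UnicodeDecodeError:
    # substitute U+FFFD for the malformed bytes (or use 'ignore')
    text = raw.decode('big5', 'replace')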

Re: generating objects of a type from a name.

2007-07-26 Thread tsuraan
I'm not sure what a visual object is, but to create an instance of an
object whose name is known, you can use "eval":

>>> oname = 'list'
>>> obj = eval(oname)()
>>> obj
[]
>>> type(obj)
<type 'list'>

Hope that helps!
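
If you'd rather not use eval (it will happily execute whatever expression it
is handed), a getattr lookup on the module that defines the classes fails
more politely.  A sketch, where visual stands in for whichever module
actually holds your Ring/Cylinder classes:

import visual  # assumption: the module defining Ring, Cylinder, etc.

def make_object(name):
    # fetch the class by name; getattr returns None if it doesn't exist
    cls = getattr(visual, name, None)
    if cls is None:
        raise ValueError('no visual object named %r' % name)
    return cls()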


On 26/07/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> I'm trying to generate visual python objects from django objects and
> therefore have objects called  'Ring' and 'Cylinder' as django objects
> and I want to create objects of those names in visual.
> I can kludge it in various ways by using dir and lots of if lookups, but
> is there a way of doing this that allows the name to generate a
> visual object of the appropriate name or fail nicely if the visual
> object doesn't exist?


zip files as nested modules?

2007-04-01 Thread tsuraan
Supposing that I have a directory tree like so:

a/
  __init__.py
  b/
    __init__.py
    c.py

and b.py has some method (let's call it d) within it.  I can, from python, do:

from a.b.c import d
d()

And, that works.  Now, suppose I want to have a zipped module under a,
called b.zip.  Is there any way that I can accomplish the same thing,
but using the zip file as the inner module?

My directory layout is then

a/
  __init__.py
  b.zip

And b is a zipfile laid out like

b/
  __init__.py
  c.py

I tried populating a's __init__ with this:

import zipimport
import os

# find every .zip sitting next to this package's __init__.py
here = os.path.join(os.getcwd(), __path__[0])
zips = [f for f in os.listdir(here) if f.endswith('.zip')]
zips = [os.path.join(here, z) for z in zips]

for z in zips:
  print z
  # the module name is the zip's basename minus the '.zip' suffix
  mod = os.path.split(z)[-1][:-4]
  print mod
  # load the zip's top-level module and bind it into this package's namespace
  globals()[mod] = zipimport.zipimporter(z).load_module(mod)

All the zip modules appear (I actually have a few zips, but that
shouldn't be important), but their contents do not seem to be
accessible in any way.  I could probably put import statements in all
the __init__.py files to import everything in the level below, but I
am under the impression that relative imports are frowned upon, and it
seems pretty bug-prone anyhow.

Any pointers on how to accomplish zip modules being nested within normal ones?
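
One idea I haven't fully verified: since PEP 302 path hooks treat a
package's __path__ entries just like sys.path entries, and zipimport installs
such a hook, it might be enough to append each zip to the package's __path__
in a's __init__.py and let the import machinery do the loading:

import os

# make every sibling .zip part of this package's search path
here = __path__[0]
for f in os.listdir(here):
    if f.endswith('.zip'):
        __path__.append(os.path.join(here, f))

Assuming b.zip really has b/__init__.py at its top level, "from a.b.c import
d" should then find b inside the zip.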


Re: zip files as nested modules?

2007-04-02 Thread tsuraan

and b.py has some method (let's call it d) within it.  I can, from python,
do:



That should be c.py, of course.

Is this message getting no replies because it's confusing, it's poorly
worded, it's a dumb question, or is it just that nobody knows the answer?
I'm stuck on this, so any suggestions at all would be very appreciated.

Re: f---ing typechecking

2007-02-16 Thread tsuraan

Agreed. This would be similar to:

py> 1 + 1.0
Traceback: can only add int to int. Etc.

But then again, the unimaginative defense would be that it wouldn't be
python if you could concatenate a list and a tuple.



Of course, that behaviour would be quite defensible; auto-casting int to
float is _wrong_, especially with python implementing arbitrary precision
integers.  Integers are more precise than floats, so why would you
automatically cast them in that direction?

Seeing

>>> 0xffffffffffffffff + 1.0 == float(0xffffffffffffffff)
True

Is considerably more irritating than your hypothetical Traceback would be.
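
Another way to watch the promotion silently destroy information (my own
example; any integer above 2**53 shows the same thing):

>>> big = 10**17
>>> big + 1 == big      # plain integer arithmetic keeps every digit
False
>>> big + 1.0 == big    # the int is promoted to float and the 1 vanishes
True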

Re: Urgent : How to do memory leaks detection in python ?

2008-03-15 Thread tsuraan
> Python doesn't have memory leaks.

Yeah, interesting bit of trivia: python is the world's only non-trivial
program that's totally free of bugs.  Pretty exciting!  But seriously,
python 2.4, at least, does have some pretty trivially exposed memory leaks
when working with strings.  A simple example is this:

>>> letters = [chr(c) for c in
...            range(ord('a'), ord('z')) + range(ord('A'), ord('Z'))]
>>> ary = []
>>> for a in letters:
...     for b in letters:
...         for c in letters:
...             for d in letters:
...                 ary.append(a + b + c + d)
...
>>> del(ary)
>>> import gc
>>> gc.collect()
0

The VM's memory usage will never drop from its high point of (on my
computer) ~200MB.  Since you're using GIS data, this could be what you're
running into.  I haven't been able to upgrade my systems to python 2.5, but
from my tests, that version did not have that memory leak.  Nobody seems
interested in backporting fixes from 2.5 to 2.4, so you're probably on your
own there as well, if upgrading to python 2.5 isn't an option or isn't
applicable to your situation.

Conditionally skipping the contents of a with-statement

2009-08-21 Thread tsuraan
I'd like to write a Fork class to wrap os.fork that allows something like this:

with Fork():
    # do child stuff; the end of the block will automatically os._exit()
# parent stuff goes here

This would require (I think) that the __enter__ method of my Fork
class be able to return a value or raise an exception indicating
that the block should not be run.  It looks like, from PEP 343, any
exception thrown in the __enter__ isn't handled by with, and my basic
tests confirm this.  I could have __enter__ raise a custom exception
and wrap the entire with statement in a try/except block, but that
sort of defeats the purpose of the with statement.  Is there a clean
way for the context manager to signal that the execution of the block
should be skipped altogether?
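
The only clean fallback I can see (a sketch; it gives up the with-syntax
rather than skipping the block) is to put the child's work into a function,
so the parent's control flow never needs to be suppressed at all:

import os

def run_in_child(func, *args):
    # fork: the child runs func and always exits; the parent gets the pid
    pid = os.fork()
    if pid == 0:
        try:
            func(*args)
        finally:
            os._exit(0)
    return pid

pid = run_in_child(do_child_stuff)  # do_child_stuff is hypothetical
# parent stuff goes here

But I'd still prefer the with-statement form if there's a supported way.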