On Tue, 24 Aug 2010, Bill Janssen wrote:

I'm starting to see traces like the following in my UpLib (OS X 10.5.8,
32-bit Python 2.5, Java 6, JCC-2.6, PyLucene-2.9.3) that indicate an
out-of-memory issue.  I spawn a lot of short-lived threads in Python,
and each of them is "attached" to Java, and "detached" after the "run"
method returns.  I've run test programs that do nothing but repeatedly
start new threads that then invoke PyLucene to index a document, and seen
no problems.

I'm trying to come up with a hypothesis for this.  One thing I'm
wondering: if my Python memory space is approaching its limit, does
PyLucene arrange for the Java garbage collector to invoke the Python
garbage collector when it can't allocate memory?

No, not that I know of. The only fancy exchange between the Python world and the Java world is for 'extensions' of Java classes in Python. These are in a deadly embrace, since each side keeps track of the other. A proxy object and some weak reference tricks resolve this cleanly. But this assumes that either the ref count on the Python side drops to 0 or the finalize() method on the Java side is invoked (which, according to the spec, is not guaranteed).
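Why a weak reference on one side matters can be shown in plain Python. This is a generic sketch, not JCC's proxy code: `Peer`, `make_pair` and `survives_refcounting` are hypothetical names, and it assumes CPython, where reference counting frees an object as soon as its count reaches 0. With the cyclic garbage collector switched off, a pair of objects holding strong references to each other outlives its last outside reference, while a pair whose back-reference is a weakref is freed immediately.

```python
import gc
import weakref


class Peer:
    """Hypothetical stand-in for one side of a Python/Java object pair."""


def make_pair(weak_back_reference):
    """Build a pair and return only a weak reference to the Python side."""
    py_side, java_side = Peer(), Peer()
    py_side.other = java_side  # the Python object always owns its peer
    if weak_back_reference:
        # Back-reference through a weakref: no reference cycle is formed.
        java_side.other = weakref.ref(py_side)
    else:
        # Strong back-reference: the two objects keep each other alive.
        java_side.other = py_side
    return weakref.ref(py_side)


def survives_refcounting(weak_back_reference):
    """Report whether the Python side outlives its last external reference
    when only reference counting (no cycle collector) is in effect."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        probe = make_pair(weak_back_reference)
        alive = probe() is not None
    finally:
        gc.collect()  # explicitly reclaim any cycle left behind
        if was_enabled:
            gc.enable()
    return alive


print(survives_refcounting(weak_back_reference=False))  # True: the cycle outlives its owner
print(survives_refcounting(weak_back_reference=True))   # False: freed at once
```

The strong pair is only reclaimed by the explicit `gc.collect()`, which is the Python analogue of depending on a collector pass rather than on ref counts reaching 0.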

Everything else Java that is held from Python loses its permanent reference in the JVM when the Python wrapper object's ref count reaches 0.
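A toy model of that lifetime rule, under stated assumptions: `FakeJVM` and `JavaWrapper` are hypothetical classes, the set of integers stands in for whatever permanent reference the JVM keeps (for example a JNI global reference), and `__del__` running as soon as the ref count hits 0 is CPython behavior.

```python
class FakeJVM:
    """Hypothetical stand-in for the JVM's table of permanent references."""

    def __init__(self):
        self.global_refs = set()
        self._next_id = 0

    def new_global_ref(self):
        self._next_id += 1
        self.global_refs.add(self._next_id)
        return self._next_id

    def delete_global_ref(self, ref):
        self.global_refs.discard(ref)


class JavaWrapper:
    """Hypothetical Python wrapper that pins one Java object."""

    def __init__(self, jvm):
        self._jvm = jvm
        self._ref = jvm.new_global_ref()

    def __del__(self):
        # Runs when the wrapper's ref count drops to 0 (immediately in CPython).
        self._jvm.delete_global_ref(self._ref)


jvm = FakeJVM()
w = JavaWrapper(jvm)
alias = w
print(len(jvm.global_refs))  # 1
del w
print(len(jvm.global_refs))  # 1: `alias` still holds the wrapper
del alias
print(len(jvm.global_refs))  # 0: the Java object is no longer pinned
```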

I keep a lot of objects via weak references in my Python memory space, and I may just be filling up VM so that Java can't allocate enough heap/stack space for a new thread. Note that the thread being unsuccessfully started isn't mine; it's being started by Java.

It is generally better practice to pool threads and reuse them instead of allocating new ones for short-lived tasks. Personally, I have no confidence in the JNI thread detaching mechanism... If it works, great, but...
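A minimal sketch of the pooling approach in modern Python (`concurrent.futures` did not exist in Python 2.5, so this assumes Python 3.7+ for the `initializer` argument). `attach_worker` and `index_document` are hypothetical placeholders: with PyLucene, the initializer is where a call such as `lucene.getVMEnv().attachCurrentThread()` would go, so each pool thread attaches to the JVM once rather than once per task.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

attach_count = 0
_count_lock = threading.Lock()


def attach_worker():
    """Hypothetical per-thread setup, run once when each pool thread starts.

    With PyLucene, lucene.getVMEnv().attachCurrentThread() would go here;
    in this sketch it only counts how many times it runs.
    """
    global attach_count
    with _count_lock:
        attach_count += 1


def index_document(doc_id):
    """Hypothetical short-lived task standing in for indexing one document."""
    return doc_id * 2


with ThreadPoolExecutor(max_workers=4, initializer=attach_worker) as pool:
    results = list(pool.map(index_document, range(100)))

print(results[:3])        # [0, 2, 4]
print(attach_count <= 4)  # True: at most one attach per pool thread
```

One hundred tasks run on at most four long-lived threads, so the attach step runs at most four times and there is no per-task attach/detach cycle for the JNI layer to get wrong.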

As an aside, here is what I found out about using Java-created threads in Python:

When Java creates a thread, Python is not told about it, and the Python VM considers it a dummy thread, that is, a thread without a thread state object. In other words, Python has no documented 'attachCurrentThread()' call.

Instead, on such a dummy thread, a Python thread state object is allocated for every call entering the Python VM from the Java VM, and it is freed when that call returns.

The buggy side effect of this is that you lose your thread-local storage between such calls and pay an extra thread state allocation cost for every such call into Python when the GIL is acquired.

A workaround is to create this thread state object and increment its ref count when the Java thread is first created, and to decrement it when the thread completes. This is what the PythonVM.acquireThreadState() and PythonVM.releaseThreadState() methods in jcc.cpp are for. The PythonVM class is used when embedding a Python VM in a Java VM, for example when running Python code in a Tomcat process. Maybe these methods should move elsewhere if they have potential uses outside this scenario...
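The difference between per-call allocation and the ref-counted workaround can be modeled in a few lines. This is a hypothetical simulation, not the jcc.cpp code: `DummyThreadModel` stands for one Java-created thread, `ensure()`/`release()` are loosely modeled on CPython's `PyGILState_Ensure()`/`PyGILState_Release()`, and the `thread_local` dict stands in for thread-local storage. Without pinning, every call allocates a fresh state and sees none of the previous call's data; pinning the state for the thread's lifetime allocates it once and keeps the data.

```python
class DummyThreadModel:
    """Hypothetical model of one Java-created thread calling into Python."""

    def __init__(self):
        self.refcount = 0
        self.state = None  # stands in for a Python thread state object
        self.allocations = 0

    def ensure(self):
        # Allocate a new state only if nobody is currently holding one.
        if self.refcount == 0:
            self.state = {"thread_local": {}}
            self.allocations += 1
        self.refcount += 1

    def release(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.state = None  # state and its thread-local data are freed

    def call_into_python(self, key, value):
        """One Java-to-Python call; returns the value left by the previous call."""
        self.ensure()
        local = self.state["thread_local"]
        previous = local.get(key)
        local[key] = value
        self.release()
        return previous


def run_calls(pin_for_thread_lifetime, n=5):
    t = DummyThreadModel()
    if pin_for_thread_lifetime:
        t.ensure()   # like acquireThreadState() when the thread starts
    seen = [t.call_into_python("counter", i) for i in range(n)]
    if pin_for_thread_lifetime:
        t.release()  # like releaseThreadState() when the thread completes
    return t.allocations, seen


print(run_calls(False))  # (5, [None, None, None, None, None])
print(run_calls(True))   # (1, [None, 0, 1, 2, 3])
```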

Andi..


Bill

thr1730: Running document rippers raised the following exception:
thr1730: Traceback (most recent call last):
thr1730:    File "/local/share/UpLib-1.7.9/code/uplib/newFolder.py", line 282, in _run_rippers
thr1730:     ripper.rip(folderpath, id)
thr1730:    File "/local/share/UpLib-1.7.9/code/uplib/createIndexEntry.py", line 187, in rip
thr1730:     index_folder(location, self.repository().index_path())
thr1730:    File "/local/share/UpLib-1.7.9/code/uplib/createIndexEntry.py", line 82, in index_folder
thr1730:     c.index(folder, doc_id)
thr1730:    File "/local/share/UpLib-1.7.9/code/uplib/indexing.py", line 813, in index
thr1730:     self.reopen()
thr1730:    File "/local/share/UpLib-1.7.9/code/uplib/indexing.py", line 635, in reopen
thr1730:     self.current_writer.flush()
thr1730:  JavaError: java.lang.OutOfMemoryError: unable to create new native thread
thr1730:     Java stacktrace:
thr1730: java.lang.OutOfMemoryError: unable to create new native thread
thr1730:        at java.lang.Thread.start0(Native Method)
thr1730:        at java.lang.Thread.start(Thread.java:592)
thr1730:        at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:221)
thr1730:        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:3070)
thr1730:        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:3065)
thr1730:        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:3061)
thr1730:        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:4256)
thr1730:        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:4060)
