Re: do the Java and Python garbage collectors talk to each other, with JCC?

Andi Vajda Tue, 24 Aug 2010 19:28:25 -0700


On Tue, 24 Aug 2010, Bill Janssen wrote:

I'm starting to see traces like the following in my UpLib (OS X 10.5.8,
32-bit Python 2.5, Java 6, JCC-2.6, PyLucene-2.9.3) that indicate an
out-of-memory issue.  I spawn a lot of short-lived threads in Python,
and each of them is "attached" to Java, and "detached" after the "run"
method returns.  I've run test programs that do nothing but repeatedly
start new threads that then invoke pylucene to index a document, and see
no problems.

I'm trying to come up with a hypothesis for this.  One of the things I'm
wondering is if my Python memory space is approaching the limit, does
PyLucene arrange for the Java garbage collector to invoke the Python
garbage collector if it can't allocate memory?

No, not that I know of. The only fancy exchange between the Python world and the Java world is for 'extensions' of Java classes in Python. These are in a deadly embrace since they keep track of each other. A proxy object and some weak reference tricks do their work to resolve this cleanly. But this assumes the ref count on the Python side becomes 0 or that the finalize() method on the Java side is invoked (for which there is no guarantee according to the spec).
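To illustrate the general idea of breaking such an embrace with a weak reference, here is a minimal pure-Python sketch. It is not JCC's actual implementation: the Wrapper class and the pair() helper are hypothetical stand-ins, and it assumes CPython's reference counting, where an object is freed as soon as its last strong reference goes away.

```python
import weakref


class Wrapper:
    """Hypothetical stand-in for one side of a Python/Java object pair."""

    def __init__(self, name):
        self.name = name
        self.peer = None


def pair(strong_side, weak_side):
    """Link two objects so they track each other without a strong cycle."""
    # strong_side keeps weak_side alive with an ordinary (strong) reference.
    strong_side.peer = weak_side
    # weak_side only gets a weak proxy back, so it does not keep strong_side alive.
    weak_side.peer = weakref.proxy(strong_side)


freed = []  # names of objects that have been reclaimed

py_obj = Wrapper("python-extension")
java_obj = Wrapper("java-instance")
pair(py_obj, java_obj)

# Record when each object is actually reclaimed.
weakref.finalize(py_obj, freed.append, "python-extension")
weakref.finalize(java_obj, freed.append, "java-instance")

del java_obj  # still reachable through py_obj.peer, so nothing is freed yet
del py_obj    # last strong reference gone: both objects can now be reclaimed
```

If both links were strong, neither ref count could reach 0 on its own and the pair would have to wait for a cycle collector, which is exactly what does not exist across the Python/Java boundary.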

Everything else Java that is held from Python loses its permanent reference in the JVM when the Python wrapper object's ref count reaches 0.

I keep a lot of objects via weak references in my Python memory space, and I may just be filling up VM so that Java can't allocate enough heap/stack space for a new thread. Note that the thread being unsuccessfully started isn't mine; it's being started by Java.

It is generally better practice to pool threads and to reuse them instead of allocating them for short-lived tasks. I personally have no confidence in the JNI thread detaching mechanism... If it works, great but...
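A rough sketch of what pooling could look like on the Python side follows. The run_pool() function and its attach/detach parameters are hypothetical names made up for this illustration; the point is only that each worker thread attaches once, handles many tasks, and detaches once, rather than attaching and detaching per task. With PyLucene, the attach callback would presumably wrap the VM environment's attachCurrentThread() call; that part is an assumption and is not exercised here.

```python
import queue
import threading


def run_pool(tasks, num_workers=4, attach=None, detach=None):
    """Run callables on a fixed set of reused worker threads.

    attach/detach are optional hooks (hypothetical) called once per worker
    thread, not once per task. Results come back in task order.
    """
    work = queue.Queue()
    results = [None] * len(tasks)

    def worker():
        if attach is not None:
            attach()  # one attach for the lifetime of this thread
        try:
            while True:
                item = work.get()
                if item is None:  # sentinel: no more work
                    break
                index, task = item
                try:
                    results[index] = task()
                except Exception as exc:
                    # Keep the worker alive; hand the error back as the result.
                    results[index] = exc
        finally:
            if detach is not None:
                detach()  # one detach when the thread exits

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for index, task in enumerate(tasks):
        work.put((index, task))
    for _ in threads:
        work.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

With this shape the number of native threads stays fixed at num_workers no matter how many documents are queued, which keeps the thread count out of the picture when chasing an "unable to create new native thread" error.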

As an aside, here is what I found out about using Java-created threads in Python:

When Java creates a thread, Python is not being told about it and the Python VM considers this thread dummy, that is, without a thread state object. In other words, Python doesn't have a documented 'attachCurrentThread()' call.

Instead, a Python thread state object is allocated at every call entering the Python VM from the Java VM running on such a dummy thread and is freed upon return.

The buggy side effect of this is that you lose your thread-local storage between such calls and pay an extra thread state allocation cost for every such call into Python when the GIL is acquired.

A workaround for this is to create and increment this thread state object's ref count when the Java thread is first created and to decrement it upon thread completion. This is what the PythonVM.acquire/releaseThreadState() methods are for in jcc.cpp. The PythonVM class is used when embedding a Python VM in a Java VM as when running Python code in a Tomcat process, for example. Maybe these methods should move elsewhere if they have potential uses outside this scenario...


Andi..


Bill

thr1730: Running document rippers raised the following exception:
thr1730: Traceback (most recent call last):
thr1730:    File "/local/share/UpLib-1.7.9/code/uplib/newFolder.py", line 282, in _run_rippers
thr1730:     ripper.rip(folderpath, id)
thr1730:    File "/local/share/UpLib-1.7.9/code/uplib/createIndexEntry.py", line 187, in rip
thr1730:     index_folder(location, self.repository().index_path())
thr1730:    File "/local/share/UpLib-1.7.9/code/uplib/createIndexEntry.py", line 82, in index_folder
thr1730:     c.index(folder, doc_id)
thr1730:    File "/local/share/UpLib-1.7.9/code/uplib/indexing.py", line 813, in index
thr1730:     self.reopen()
thr1730:    File "/local/share/UpLib-1.7.9/code/uplib/indexing.py", line 635, in reopen
thr1730:     self.current_writer.flush()
thr1730:  JavaError: java.lang.OutOfMemoryError: unable to create new native thread
thr1730:     Java stacktrace:
thr1730: java.lang.OutOfMemoryError: unable to create new native thread
thr1730:        at java.lang.Thread.start0(Native Method)
thr1730:        at java.lang.Thread.start(Thread.java:592)
thr1730:        at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:221)
thr1730:        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:3070)
thr1730:        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:3065)
thr1730:        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:3061)
thr1730:        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:4256)
thr1730:        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:4060)
