On Wed, 20 May 2009, Moshe Cohen wrote:
Thanks. Version being used: 2.4.1. I have already tried most of the well-documented Lucene ideas. The seemingly weird thing is that the index is always quite small; I have experience with much larger indices on Solr and no such errors. It started with a memory error; after increasing the JVM heap on init I got the "too many open files" error, and after increasing the OS limit I got a memory error again :-) I got further along at each stage, but ultimately I hit an error. I can work around the problem by just restarting the program. This is what led me to suspect resource leaks specific to PyLucene.
If it's a small enough program, it might be interesting to see if you can reproduce the problem in pure Java.
Are there any useful monitoring functions that can retrieve the resource usage state along the way?
PyLucene is only a wrapper around Java Lucene and the JVM. The one thing you can track in that context is how many Java objects escaped the JVM to Python and how many references Python holds to them. Use env._dumpRefs(); env is what initVM() returns. _dumpRefs() dumps the hashtable of Java objects that escaped the VM to Python, listing their java.lang.System::identityHashCode() and how many references Python holds to each of them.
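For example, a minimal sketch of how this might look in a long-running program (the workload in the middle is a placeholder; the only PyLucene-specific pieces are initVM() and _dumpRefs(), and the exact return shape of _dumpRefs() may vary between versions, so print it and inspect):

    import lucene

    env = lucene.initVM()   # initVM() returns the JCC environment object

    # ... run one iteration of your indexing/searching workload here ...

    # Dump the table of Java objects that escaped the JVM to Python:
    # entries are keyed by java.lang.System.identityHashCode() and record
    # how many references Python holds to each object. Calling this after
    # every iteration and comparing sizes shows whether the table grows.
    print env._dumpRefs()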
If that dump grows beyond reason, you have a clue about what could be going wrong, provided you can then track down what the actual objects in question are (log their identityHashCode() when you use them, for example). If it doesn't grow, then the problem is most likely on the Java side, and rewriting your program in pure Java is going to help with debugging this.
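One way to do that correlation, assuming your PyLucene build wraps java.lang.System (the index path here is just a placeholder):

    from lucene import IndexReader, System

    reader = IndexReader.open("/path/to/index")
    # Log the identity hash at the point of use, so this object can be
    # matched against the corresponding entry in the _dumpRefs() output.
    print "opened reader, identityHashCode =", System.identityHashCode(reader)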
Andi..