Re: Not a valid hit number: 0

2010-08-15 Thread Herbert Roitblat
That seems not to be the cause. I went back to 2.9.2 and got the same error. I must have done something else wrong. Thanks, Herb On 8/14/2010 7:33 AM, Herbert Roitblat wrote: I was setting up a new instance of my program on a new computer. I got this error: 2010-08-14 10:05:21,951 ERROR

Not a valid hit number: 0

2010-08-14 Thread Herbert Roitblat
I was setting up a new instance of my program on a new computer. I got this error: 2010-08-14 10:05:21,951 ERROR Thread LuceneThread: java.lang.IndexOutOfBoundsException: Not a valid hit number: 0 Java stacktrace: java.lang.IndexOutOfBoundsException: Not a valid hit number: 0 at org.a

Re: Stemming and Wildcard Queries

2010-05-20 Thread Herbert Roitblat
At a general level, we have found that stemming during indexing is not advisable. Sometimes users want the exact form and if you have removed the exact form during indexing, obviously, you cannot provide that. Rather, we have found that stemming during search is more useful, or maybe it should
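A minimal sketch of the search-time stemming idea from the message above, assuming a toy in-memory index rather than Lucene itself. The names `naive_stem` and `ExactFormIndex` are hypothetical, and the suffix stripper is a stand-in for a real stemmer. The index keeps every exact form, and a query is expanded to its stem variants only when the caller asks for that.

```python
from collections import defaultdict


def naive_stem(word):
    """Toy suffix stripper for illustration only (not a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


class ExactFormIndex:
    """Hypothetical index that stores exact tokens, never stemmed ones."""

    def __init__(self):
        self.postings = defaultdict(set)  # exact term -> set of doc ids
        self.by_stem = defaultdict(set)   # stem -> exact terms seen so far

    def add(self, doc_id, text):
        for token in text.lower().split():
            self.postings[token].add(doc_id)
            self.by_stem[naive_stem(token)].add(token)

    def search(self, term, stem=False):
        term = term.lower()
        if not stem:
            # Exact-form search is still possible because nothing was lost.
            return set(self.postings.get(term, set()))
        # Search-time stemming: union the postings of every indexed variant.
        docs = set()
        for variant in self.by_stem.get(naive_stem(term), ()):
            docs |= self.postings[variant]
        return docs


index = ExactFormIndex()
index.add(1, "indexing documents")
index.add(2, "indexed document")
index.add(3, "index")

exact_hits = index.search("indexing")
stemmed_hits = index.search("indexing", stem=True)
```

Because the exact forms stay in the index, the choice between exact and stemmed matching is made per query instead of being fixed at indexing time.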

Re: Trouble compiling JCC

2010-04-27 Thread Herbert Roitblat
der on my Ubuntu 8.04 box. ----- Original Message - From: "Herbert Roitblat" To: Sent: Tuesday, April 27, 2010 2:37 PM Subject: Trouble compiling JCC I'm trying to compile JCC, using python setup.py build This is what I get: ~/pylucene-2.9.2-1/jcc$ python setup.py build

Trouble compiling JCC

2010-04-27 Thread Herbert Roitblat
I'm trying to compile JCC, using python setup.py build This is what I get: ~/pylucene-2.9.2-1/jcc$ python setup.py build running build running build_py copying jcc/config.py -> build/lib.linux-x86_64-2.5/jcc running build_ext building 'jcc._jcc' extension gcc -pthread -fno-s

Re: HTMLStripReader, HTMLStripCharFilter

2010-04-27 Thread Herbert Roitblat
Oops. Sorry. replied to wrong message. - Original Message - From: "Herbert Roitblat" To: Sent: Tuesday, April 27, 2010 12:01 PM Subject: Re: HTMLStripReader, HTMLStripCharFilter Great, I will look forward to it. Thanks, Herb - Original Message - From: "Just

Re: HTMLStripReader, HTMLStripCharFilter

2010-04-27 Thread Herbert Roitblat
Great, I will look forward to it. Thanks, Herb - Original Message - From: "Justin" To: Sent: Tuesday, April 27, 2010 11:47 AM Subject: Re: HTMLStripReader, HTMLStripCharFilter Thanks for the help. No more exception. Seems odd that I need to add a filter to make reset apply to the

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2010-04-14 Thread Herbert Roitblat
Mon, Apr 12, 2010 at 10:54 AM, Herbert Roitblat wrote: Update: reusing the reader and searcher made almost no difference. It still eats up the heap. - Original Message - From: "Herbert L Roitblat" To: Sent: Monday, April 12, 2010 6:50 AM Subject: Re: java.lang.OutOfMemoryEr

How to get the tokens for a given document

2010-04-12 Thread Herbert Roitblat
Hi, folks. I appreciate the help people have been offering. Here is my problem. My immediate need is to get the tokens for a document from the Lucene index. I have a list of documents that I walk, one at a time. Right now, I am getting the tokens and their frequencies and the problem is that

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2010-04-12 Thread Herbert Roitblat
http://lists.osafoundation.org/pipermail/pylucene-dev/2008-January/002171.html ## self.opCount += 1 lReader.close() lSearcher.close() retFields = copy.deepcopy(tFields) #return a copy of tFields to free up references to it and its contents Herbert Roitblat wrote: H

_dumpRefs problem, tracking heap leak

2010-04-10 Thread Herbert Roitblat
I'm trying to get a listing of the Java items that Python is holding. I tried this: print lucene.JCCEnv._dumpRefs(classes=True).items() I get the message: '_dumpRefs' of 'jcc.JCCEnv' object needs an argument What argument does it need? The heap histo gave me this: num #instances #byte

java.lang.OutOfMemoryError: GC overhead limit exceeded

2010-04-09 Thread Herbert Roitblat
Hi, folks. I am using PyLucene and fetching a lot of tokens. lucene.py reports version 2.4.0. It is rPath Linux with 8 GB of memory. Python is 2.4. I'm not sure what the maxheap is; I think that it is maxheap='2048m'. I think that it's running in a 64-bit environment. It indexes a set of 116,