Re: How to get the tokens for a given document

2010-04-12 Thread Herbert L Roitblat
Thanks David. I think that I neglected to say that I am using pyLucene 2.4.0. Your suggestion is almost what we're doing: indexReader.getTermFreqVector(ID, fieldName) self.hits = list(self.lSearcher.search(self.query)) if self.hits: self.hit = lucene.Hit.cast_(self.hi

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2010-04-12 Thread Herbert L Roitblat
t to open a single reader, but run that TermQuery over and over and over again, do you still hit OOME? Mike On Sun, Apr 11, 2010 at 1:28 PM, Herbert L Roitblat wrote: Hi, Folks. Thanks, Ruben, for your help. It let me get a ways down the road. The problem is that the heap is filling up when I

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2010-04-11 Thread Herbert L Roitblat
Hi, Folks. Thanks, Ruben, for your help. It let me get a ways down the road. The problem is that the heap is filling up when I am doing a lucene.TermQuery. What I am trying to accomplish is to get the terms in one field of each document and their frequency in the document. A code snippet i