Hi, you are walking from indexReader.terms() then on indexReader.termDocs(Term t) for each term and then match your docID on the termsDocs enum? So you walk the whole index?
You need a forward index and lucene is inverted but you have IMHO 2 solutions with lucene (sadly, they both require re-indexing): - Store the text you indexed, when you have to walk terms inside a doc, just, load that field and analyze it again. - Use a TermVector, when you create your content field use the constructor which accept the TermVector enum. You can then walk on it at search time : indexReader.getTermFreqVector(ID, fieldName) Hope it helps. On Mon, Apr 12, 2010 at 11:15:13AM -0700, Herbert Roitblat wrote: > Hi, folks. > I appreciate the help people have been offering. > Here is my problem. My immediate need is to get the tokens for a document > from the Lucene index. I have a list of documents that I walk, one at a > time. Right now, I am getting the tokens and their frequencies and the > problem is that these stay in the heap as I move from document to document. > > Is there another way to get the tokens given a document ID? > > Thanks, > I'm looking for alternative ways to skin this cat. > > Herb -- David Causse Spotter http://www.spotter.com/ --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org