Hi,

Are you walking indexReader.terms(), then calling indexReader.termDocs(Term t)
for each term and matching your docID against the TermDocs enum? If so, you
are walking the whole index.

What you need is a forward index, and Lucene's index is inverted. IMHO you
have 2 solutions with Lucene (sadly, they both require re-indexing):
 - Store the text you indexed. When you have to walk the terms inside a doc,
   just load that field and analyze it again.
 - Use a term vector. When you create your content field, use the Field
   constructor that accepts a Field.TermVector value. You can then walk it
   at search time: indexReader.getTermFreqVector(docID, fieldName)
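
To make the difference concrete, here is a minimal plain-Java sketch (no
Lucene calls). Everything in it is an assumption for illustration: the class
ForwardIndexSketch, its method names, and the lowercase/split tokenizer that
stands in for whatever Analyzer you really use. It compares finding a doc's
terms by walking every term of an inverted index with the first option above,
re-analyzing only that doc's stored text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ForwardIndexSketch {

    // Hypothetical stand-in for a Lucene Analyzer:
    // lowercase the text, then split on anything that is not a letter or digit.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Toy inverted index: term -> (docId -> frequency in that doc).
    public static Map<String, Map<Integer, Integer>> buildInverted(List<String> docs) {
        Map<String, Map<Integer, Integer>> inverted = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String term : analyze(docs.get(docId))) {
                inverted.computeIfAbsent(term, k -> new TreeMap<>())
                        .merge(docId, 1, Integer::sum);
            }
        }
        return inverted;
    }

    // Slow path: visit every term in the index and keep those that contain docId.
    // The cost grows with the number of distinct terms in the whole index.
    public static Map<String, Integer> termsByWalkingIndex(
            Map<String, Map<Integer, Integer>> inverted, int docId) {
        Map<String, Integer> freqs = new TreeMap<>();
        for (Map.Entry<String, Map<Integer, Integer>> e : inverted.entrySet()) {
            Integer f = e.getValue().get(docId);
            if (f != null) {
                freqs.put(e.getKey(), f);
            }
        }
        return freqs;
    }

    // Forward path: re-analyze the stored text of one document only.
    // The cost grows with the length of that document.
    public static Map<String, Integer> termsFromStoredText(String storedText) {
        Map<String, Integer> freqs = new TreeMap<>();
        for (String term : analyze(storedText)) {
            freqs.merge(term, 1, Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "the quick brown fox",
                "the lazy dog and the fox");
        Map<String, Map<Integer, Integer>> inverted = buildInverted(docs);
        // Both calls yield the same term -> frequency map for doc 1.
        System.out.println(termsByWalkingIndex(inverted, 1));
        System.out.println(termsFromStoredText(docs.get(1)));
    }
}
```

Both methods return the same map for a given doc, but only the second one
does work proportional to the size of that doc. The map it builds is local
to the call, so nothing has to stay on the heap once you move on to the
next document.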

Hope it helps.

On Mon, Apr 12, 2010 at 11:15:13AM -0700, Herbert Roitblat wrote:
> Hi, folks.
> I appreciate the help people have been offering.
> Here is my problem.  My immediate need is to get the tokens for a document 
> from the Lucene index.  I have a list of documents that I walk, one at a 
> time.  Right now, I am getting the tokens and their frequencies and the 
> problem is that these stay in the heap as I move from document to document.
> 
> Is there another way to get the tokens given a document ID?
> 
> Thanks,
> I'm looking for alternative ways to skin this cat.
> 
> Herb

-- 
David Causse
Spotter
http://www.spotter.com/
