It is possible to reconstruct a document from its indexed terms, but it's a lossy process. Luke does this (you can see it in the UI, and the source code is available). There's no utility that I know of to make this easy.
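To make "lossy" concrete, here is a toy sketch in plain Java. It does not use Lucene's API and it is not how Luke is implemented. The class name `LossyReconstruction`, the methods `index`, `stem` and `reconstruct`, the stop list, and the strip-a-trailing-"s" stemmer are all made up for illustration. It only assumes that an index keeps analyzed terms with their positions, and it rebuilds a text by sorting those terms back into position order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model only: none of this is Lucene code or Lucene API.
final class LossyReconstruction {

    // Hypothetical stop list; a real analyzer's list will differ.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "of"));

    // "Index" a text: map each analyzed term to the positions where it occurred.
    static Map<String, List<Integer>> index(String text) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        int position = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty()) {
                continue; // artifact of split() when the text starts with a non-letter
            }
            if (!STOP_WORDS.contains(token)) {
                postings.computeIfAbsent(stem(token), k -> new ArrayList<>()).add(position);
            }
            position++; // a dropped stop word still consumes a position, leaving a gap
        }
        return postings;
    }

    // Crude stand-in for a stemmer: strip a trailing "s" from longer words.
    static String stem(String token) {
        return token.length() > 3 && token.endsWith("s")
                ? token.substring(0, token.length() - 1)
                : token;
    }

    // Rebuild a text from the postings; "_" marks positions with no indexed term.
    static String reconstruct(Map<String, List<Integer>> postings) {
        int size = 0;
        for (List<Integer> positions : postings.values()) {
            for (int p : positions) {
                size = Math.max(size, p + 1);
            }
        }
        String[] slots = new String[size];
        Arrays.fill(slots, "_");
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            for (int p : entry.getValue()) {
                slots[p] = entry.getKey();
            }
        }
        return String.join(" ", slots);
    }

    public static void main(String[] args) {
        String original = "The cats were chasing the dogs";
        System.out.println(reconstruct(index(original)));
    }
}
```

For the sample sentence this rebuilds `_ cat were chasing _ dog`: case, punctuation, the stop words and the original word forms are gone, and only the analyzed terms and their positions survive.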

It's an open question whether this is more or less work than re-parsing the document (I infer that you have the originals available). Before trying to reconstruct the document, I'd ask how often you need to do this. The gremlins that come out of the woodwork during reconstruction would consume a lot of resources. For instance, could you guarantee term positions when you re-index the reconstructed document? What about stemmed words? How about synonyms? And are you willing to spend the time tracking down the bugs?

I wouldn't go there until forced.

FWIW
Erick

On Wed, Dec 30, 2009 at 5:08 PM, tsuraan <tsur...@gmail.com> wrote:

> Suppose I have a (useful) document stored in a Lucene index, and I
> have a variant that I'd also like to be able to search. This variant
> has the exact same data as the original document, but with some extra
> fields. I'd like to be able to use an IndexReader to get the document
> that I stored, use the document's add method to put my extra fields
> in, and then add that document to the index using an IndexWriter.
> This doesn't seem to work, in general. Non-stored fields of the
> original document are not in the variant document. This makes sense
> from an OO point of view (how would that document object possibly have
> the non-stored data of the original doc), but is there some
> lower-level way to do what I want to do?
>
> It's somewhat expensive to completely re-create my document, as it
> relies on data parsed from (often large) pdf and MS Office files. I'd
> like to be able to use the already-stored terms that are in my index
> and associated with my existing document. Can I iterate through the
> terms of my index and add references to my newly-added document? Is
> there any utility to make this work nicely?