Hello Friends,

I have a somewhat unusual requirement: merging two or more Lucene indexed documents
into a single indexed document. For example:

Document newDocument = doc1 + doc2 + doc3

In order to do this I am planning to extract the token streams from each document
(i.e. doc1, doc2 and doc3) and use them to construct newDocument. The reason is
that I do not have access to the content of the original documents
(doc1, doc2, doc3). A rough sketch of what I mean follows.
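
To make this concrete, here is roughly how I plan to read the tokens back out,
assuming the original fields were indexed with term vectors that store positions
and offsets, and assuming a recent (5.x-style) Lucene API. The field name
"contents", the method name, and the reader/docId parameters are just
placeholders of mine:

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

// Sketch: pull (term, position, start, end) tuples out of one source
// document's term vector, to be replayed later into newDocument.
static void collectTokens(IndexReader reader, int docId) throws IOException {
    Terms vector = reader.getTermVector(docId, "contents");
    if (vector == null) {
        return; // field was not indexed with term vectors
    }
    TermsEnum termsEnum = vector.iterator();
    BytesRef term;
    while ((term = termsEnum.next()) != null) {
        // Ask for positions and offsets along with each term.
        PostingsEnum postings = termsEnum.postings(null, PostingsEnum.ALL);
        postings.nextDoc(); // a term vector holds exactly one document
        for (int i = 0; i < postings.freq(); i++) {
            int position = postings.nextPosition();
            int start = postings.startOffset(); // relative to the ORIGINAL doc
            int end = postings.endOffset();
            // ... collect (term, position, start, end); note that terms come
            // back in term order, so they must be re-sorted by position
            // before being replayed through a custom TokenStream
        }
    }
}

I would then replay the collected tokens through a custom TokenStream and index
them into newDocument, e.g. via new TextField("contents", myTokenStream).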


My questions are:

1. Is this the correct approach?
2. Do I have to update the start and end offsets of the tokens, since the
tokens from the original documents (doc1, doc2, doc3) were relative to those
documents, and in newDocument these offsets may be wrong?
3. If yes, how do I make sure that the merged tokens have the correct start
and end offsets? (See the sketch after this list.)
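
For (3), I was thinking of a small TokenFilter that shifts every offset by a
fixed base, where the base for doc2's tokens would be the character length of
doc1's text, and so on. A hypothetical sketch (the class name is my own, not
anything from Lucene):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

// Hypothetical filter: shifts each token's offsets by a fixed base so that
// tokens copied from doc2/doc3 land after the text preceding them in
// newDocument.
public final class ShiftOffsetsFilter extends TokenFilter {
    private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
    private final int base;

    public ShiftOffsetsFilter(TokenStream input, int base) {
        super(input);
        this.base = base;
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        // Rebase both offsets relative to the start of newDocument's text.
        offsetAtt.setOffset(offsetAtt.startOffset() + base,
                            offsetAtt.endOffset() + base);
        return true;
    }
}

The merged field would then be indexed roughly like this (doc2Tokens and
doc1Length are again placeholders for my replayed stream and the preceding
text length):

TokenStream shifted = new ShiftOffsetsFilter(doc2Tokens, doc1Length);
newDocument.add(new TextField("contents", shifted));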

Thanks!
