We have a document tagging system where documents are composed of two types of data:
1. Rarely changed (hereafter "immutable") data: the document text and metadata that we upload and almost never change. The text can run to hundreds of pages.
2. User-created (hereafter "mutable") data: document properties that are set by users of our system. In total, a document's properties are generally several dozen bytes at most. Even viewing a document changes this data (e.g. the document's "viewed" property).

At present, all data is part of a single Lucene document. The problem is that whenever any piece of mutable data is updated (which happens relatively frequently), we have to reindex the entire document.

We'd like to have two separately indexed Lucene documents per logical document: one containing the immutable data and the other containing the much smaller, more transient mutable data. When the mutable data changes, we can delete that logical document's mutable Lucene document and index a new one very quickly.

There are two major difficulties when actually performing a search, though:

1. We are providing complex queries that retrieve logical documents based on information in either of their Lucene documents. It seems non-trivial to fetch a logical document with a BooleanQuery whose Occur.MUST clauses refer to fields in both Lucene documents.
2. We need to sort results (logical document IDs) by fields in either of a logical document's Lucene documents.

Has anyone done anything like this before? Is there functionality I'm overlooking that could make this easier?
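To make the two search difficulties concrete, here is a minimal sketch in plain Java (16 or later, for records) of an application-side join over the proposed split. It deliberately does not use Lucene: `SplitDocSketch`, `ImmutablePart`, `MutablePart`, `search`, `byMutableProp`, `byAuthor` and `demo` are all hypothetical names, and the field names (`author`, `tag`, `viewed`) are illustrative assumptions. Each MUST clause is evaluated only against its own part, the matching logical IDs are intersected, and the sort comparator looks up keys in whichever part holds the field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/**
 * Hypothetical in-memory model of the proposed split (NOT Lucene API):
 * each logical document is one immutable part plus one mutable part,
 * both keyed by the same logical ID.
 */
public class SplitDocSketch {
    /** Large, rarely changed part: document text and upload metadata. */
    record ImmutablePart(String id, String text, String author) {}

    /** Small, user-edited part: properties such as "viewed". */
    record MutablePart(String id, Map<String, String> props) {}

    private final Map<String, ImmutablePart> immutableParts = new HashMap<>();
    private final Map<String, MutablePart> mutableParts = new HashMap<>();

    void addImmutable(ImmutablePart p) {
        immutableParts.put(p.id(), p);
    }

    /** Replaces only the small part, analogous to deleting and re-adding the mutable document. */
    void replaceMutable(MutablePart p) {
        mutableParts.put(p.id(), p);
    }

    /**
     * Evaluates "immutableClause MUST AND mutableClause MUST": each clause is
     * tested against its own part type, then the logical IDs are intersected.
     */
    List<String> search(Predicate<ImmutablePart> immutableClause,
                        Predicate<MutablePart> mutableClause,
                        Comparator<String> order) {
        Set<String> ids = new HashSet<>();
        for (ImmutablePart p : immutableParts.values()) {
            if (immutableClause.test(p)) ids.add(p.id());
        }
        Set<String> mutableIds = new HashSet<>();
        for (MutablePart p : mutableParts.values()) {
            if (mutableClause.test(p)) mutableIds.add(p.id());
        }
        ids.retainAll(mutableIds); // logical documents matching both sides
        List<String> result = new ArrayList<>(ids);
        result.sort(order); // sort keys may come from either part
        return result;
    }

    /** Sort key read from the mutable part; a missing property sorts as "". */
    Comparator<String> byMutableProp(String name) {
        return Comparator.comparing((String id) -> mutableParts.containsKey(id)
                ? mutableParts.get(id).props().getOrDefault(name, "") : "");
    }

    /** Sort key read from the immutable part. */
    Comparator<String> byAuthor() {
        return Comparator.comparing((String id) -> immutableParts.get(id).author());
    }

    static List<String> demo() {
        SplitDocSketch index = new SplitDocSketch();
        index.addImmutable(new ImmutablePart("doc1", "quarterly report text", "carol"));
        index.addImmutable(new ImmutablePart("doc2", "annual report text", "alice"));
        index.addImmutable(new ImmutablePart("doc3", "meeting notes", "bob"));
        index.addImmutable(new ImmutablePart("doc4", "budget report text", "bob"));
        index.replaceMutable(new MutablePart("doc1", Map.of("viewed", "true", "tag", "urgent")));
        index.replaceMutable(new MutablePart("doc2", Map.of("viewed", "true", "tag", "review")));
        index.replaceMutable(new MutablePart("doc3", Map.of("viewed", "true", "tag", "review")));
        index.replaceMutable(new MutablePart("doc4", Map.of("viewed", "false", "tag", "review")));
        // Viewing doc4 rewrites only its small mutable part; its text is untouched.
        index.replaceMutable(new MutablePart("doc4", Map.of("viewed", "true", "tag", "review")));
        // text contains "report" AND viewed == true, sorted by mutable "tag", then immutable author.
        return index.search(
                p -> p.text().contains("report"),
                p -> "true".equals(p.props().get("viewed")),
                index.byMutableProp("tag").thenComparing(index.byAuthor()));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [doc2, doc4, doc1]
    }
}
```

The sketch only illustrates the shape of the problem: with two Lucene documents per logical document, a cross-document MUST becomes "match each side separately, then join on a shared logical-ID field", and sorting needs per-ID lookups into whichever side holds the sort field. Materializing full ID sets like this may be costly for large result sets, so it is a starting point for discussion rather than a recommended design.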