Split mutable logical document into two Lucene documents

Brandon Mintern Wed, 07 Dec 2011 12:46:52 -0800

We have a document tagging system where documents are composed of two
types of data:


Rarely changed (hereafter: "immutable") data - document text and
metadata that we upload and almost never change. The text can be
hundreds of pages.

User-created (hereafter: "mutable") data - document properties that
are set by users of our system. In total, a document's properties are
generally several dozen bytes at most. Even viewing a document changes
the data (e.g. the document's "viewed" property).


At present, all data is part of a single Lucene document. The problem
is that when any piece of mutable data is updated (this happens
relatively frequently), we have to reindex the entire document. We'd
like to have two separate indexed Lucene documents per logical
document, one containing the immutable data and the other containing
the much smaller and more transient mutable data. When the mutable
data changes, we can delete that document's mutable Lucene document
and index a new one very quickly.
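
To make the intended split concrete, here is a minimal sketch in plain
Java. It uses in-memory maps as a stand-in for the two kinds of Lucene
documents and is not Lucene API; the class and method names
(SplitDocumentStore, replaceProperties, and so on) are hypothetical.
It only shows that replacing the mutable part leaves the large
immutable part untouched.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java stand-in for two indexes that share a logical document ID.
// Not Lucene API; every name here is hypothetical.
class SplitDocumentStore {
    // Large, rarely changed part: document text and uploaded metadata.
    private final Map<String, String> immutableIndex = new HashMap<>();
    // Small, frequently changed part: user-set properties.
    private final Map<String, Map<String, String>> mutableIndex = new HashMap<>();
    // Counts how often the large part had to be written.
    private int immutableWrites = 0;

    void addDocument(String logicalId, String text, Map<String, String> properties) {
        immutableIndex.put(logicalId, text);
        immutableWrites++;
        mutableIndex.put(logicalId, new HashMap<>(properties));
    }

    // Delete and re-add only the small mutable entry; the text is not touched.
    void replaceProperties(String logicalId, Map<String, String> properties) {
        if (!immutableIndex.containsKey(logicalId)) {
            throw new IllegalArgumentException("unknown logical document: " + logicalId);
        }
        mutableIndex.remove(logicalId);
        mutableIndex.put(logicalId, new HashMap<>(properties));
    }

    String text(String logicalId) {
        return immutableIndex.get(logicalId);
    }

    String property(String logicalId, String name) {
        Map<String, String> properties = mutableIndex.get(logicalId);
        return properties == null ? null : properties.get(name);
    }

    int immutableWrites() {
        return immutableWrites;
    }
}
```

In this sketch, marking a document as viewed calls replaceProperties
and never increments the immutable write counter.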

There are two major difficulties when actually performing a search, though:

1. We provide complex queries that retrieve a logical document based
on information in either of its Lucene documents. It seems
non-trivial to fetch a logical document with a BooleanQuery whose
Occur.MUST clauses refer to fields in both of its Lucene
documents.

2. We need to sort results (logical document IDs) based on fields in
either of each logical document's Lucene documents.
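
The semantics we are after in both points can be sketched in plain
Java as a join on the logical document ID. This is only an
illustration of the desired behavior, assuming in-memory field maps
and a full scan; it is not Lucene API and says nothing about how
Lucene would evaluate such a query. All names (LogicalDocumentSearch,
search, sortValue) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// In-memory illustration of "MUST match in both parts, sort by a field from either".
// Not Lucene API; every name here is hypothetical.
class LogicalDocumentSearch {
    // Field values per logical document ID, one map per part.
    final Map<String, Map<String, String>> immutableFields = new HashMap<>();
    final Map<String, Map<String, String>> mutableFields = new HashMap<>();

    // Returns the IDs whose immutable part satisfies immutableMust AND whose
    // mutable part satisfies mutableMust, sorted by sortField (ties broken by ID).
    List<String> search(Predicate<Map<String, String>> immutableMust,
                        Predicate<Map<String, String>> mutableMust,
                        String sortField) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : immutableFields.entrySet()) {
            Map<String, String> mutable = mutableFields.get(entry.getKey());
            if (mutable != null
                    && immutableMust.test(entry.getValue())
                    && mutableMust.test(mutable)) {
                hits.add(entry.getKey());
            }
        }
        hits.sort((a, b) -> {
            int bySortField = sortValue(a, sortField).compareTo(sortValue(b, sortField));
            return bySortField != 0 ? bySortField : a.compareTo(b);
        });
        return hits;
    }

    // Looks the sort field up in the mutable part first, then the immutable part.
    // Only called for hits, which always have both parts.
    private String sortValue(String id, String field) {
        String value = mutableFields.get(id).get(field);
        if (value == null) {
            value = immutableFields.get(id).get(field);
        }
        return value == null ? "" : value;
    }
}
```

The difficulty is getting this behavior out of the index itself,
since each clause and each sort field lives in a different Lucene
document.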

Has anyone done anything like this before? Is there functionality I'm
overlooking that could make this easier?

