You can just add a document. I used this technique in an application, and it hinges upon realizing that not all documents in an index need to have the same fields. So, say your regular documents have fields f1, f2, f3...fn. Create a special document with fields s1, s2, s3, s4 that contain your meta data. Whenever you add more "regular" documents to the index you can modify the special document as necessary.
The beauty of this is that as long as the special document contains no fields in common with your regular documents, you'll never have it returned by searches because the fields are disjoint. And searches to find it will be very fast because there's only one. You can take this as far as you like. For instance, you could store a field (no need to even index it!) that contains, say, an XML version of all the meta-data you want to use in your special document. Perhaps you want to read this document in at startup and store it in a convenient form. Or..... If you go this route, you may want to consider creating and storing the meta-document as a post-build step. I was surprised at how quickly I could traverse an index and build up the meta-data document after I'd finished with all of the "regular" processing. One caution, however; I'd be very careful about storing Lucene document Ids in my meta-data document since they may change if you delete documents and then optimize your index. In fact, they WILL change. BTW, I thoroughly approve of keeping all the parts you can in the index, since that's fewer things to keep track of. Hope this helps Erick On 2/27/07, Mike O'Leary <[EMAIL PROTECTED]> wrote:
Is there a standard programming idiom for adding extra data to an index that has been created? I am trying to write code to index and search a set of documents using the BM25 algorithm, so (as I understand it) I need to store the length of each document somewhere and the average document length for the collection somewhere (and, I guess, the number of documents that have been indexed at any point so I can keep a running average). It seems like it would make sense to store these values in the index somehow so they are available to the search code. Is there sample code somewhere that describes how to do something like this? Or is there a better way that I'm not thinking of? Thanks. Mike O'Leary