William Mee wrote:
I'd like to add metadata which I get *after* indexing a document's
contents to the index. To be more specific: I'm implementing
shingling (detection of near-duplicate documents) and want to add the
document fingerprint (which is based on the sequence of tokens) to
the index.

There doesn't seem to be an easy way to do this in the Lucene API -
in particular, I can't easily update a document which is already
indexed. The only way I could get this information *before* adding a
document to an index is to create a token stream manually (and then
have this happen all over again when the document is indexed). This
isn't a satisfying solution. Does anyone have any suggestions of how
I could get the fingerprint information into the index? I'd
appreciate any input. Thanks!

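You could perhaps build a second index, adding its documents in the same order as the first, store the new fields there, and then use ParallelReader to "glue" the two together at read time. ParallelReader matches documents by document number, so both indexes need to contain the same number of documents in the same order.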
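
To illustrate the idea without tying it to a particular Lucene version, here is a minimal plain-Java sketch. It does not call the Lucene API. The class name, the field names "contents" and "fingerprint", the shingle size, and the min-hash-over-FNV-1a fingerprint are all assumptions made for the example, not the method described in the question. Two in-memory lists stand in for the two indexes, and glue() stands in for what ParallelReader would do at read time.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for two parallel indexes; not Lucene code.
final class ParallelFingerprintSketch {

    // 64-bit FNV-1a hash of a string (an assumed choice of hash function).
    static long fnv1a(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // Assumed fingerprint: the smallest hash over all k-token shingles.
    // Returns Long.MAX_VALUE when the document has fewer than k tokens.
    static long fingerprint(List<String> tokens, int k) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i + k <= tokens.size(); i++) {
            long h = fnv1a(String.join(" ", tokens.subList(i, i + k)));
            if (h < min) {
                min = h;
            }
        }
        return min;
    }

    // Second pass: one side document per main document, in the same order.
    static List<Map<String, String>> buildSideIndex(List<List<String>> tokenizedDocs, int k) {
        List<Map<String, String>> side = new ArrayList<>();
        for (List<String> tokens : tokenizedDocs) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("fingerprint", Long.toHexString(fingerprint(tokens, k)));
            side.add(doc);
        }
        return side;
    }

    // Read-time "glue": merge the fields stored under the same document number.
    static Map<String, String> glue(List<Map<String, String>> mainIndex,
                                    List<Map<String, String>> sideIndex,
                                    int docId) {
        if (mainIndex.size() != sideIndex.size()) {
            throw new IllegalStateException("indexes must hold the same number of documents");
        }
        Map<String, String> merged = new LinkedHashMap<>(mainIndex.get(docId));
        merged.putAll(sideIndex.get(docId));
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("the", "quick", "brown", "fox"),
                Arrays.asList("the", "quick", "brown", "fox"),
                Arrays.asList("a", "slow", "green", "turtle"));

        // First pass: the "main" index holds only the contents.
        List<Map<String, String>> mainIndex = new ArrayList<>();
        for (List<String> tokens : docs) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("contents", String.join(" ", tokens));
            mainIndex.add(doc);
        }

        // Second pass: fingerprints go into the parallel "side" index.
        List<Map<String, String>> sideIndex = buildSideIndex(docs, 3);

        for (int i = 0; i < mainIndex.size(); i++) {
            System.out.println(i + " -> " + glue(mainIndex, sideIndex, i));
        }
    }
}
```

The point of the sketch is the alignment constraint: the fingerprint field never touches the first index, so nothing already indexed has to be updated, but the scheme only works while document number i means the same document in both stores.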

Daniel



--
Daniel Noll

Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia    Ph: +61 2 9280 0699
Web: http://nuix.com/                               Fax: +61 2 9212 6902
