I'd like to add metadata which I get *after* indexing a document's contents to the index. To be more specific: I'm implementing shingling (detection of near-duplicate documents) and want to add the document fingerprint (which is based on the sequence of tokens) to the index.
There doesn't seem to be an easy way to do this in the Lucene API - in particular, I can't easily update a document which is already indexed. The only way I could get this information *before* adding a document to an index is to create a token stream manually (and then have this happen all over again when the document is indexed). This isn't a satisfying solution. Does anyone have any suggestions of how I could get the fingerprint information into the index? I'd appreciate any input. Thanks! - Will -- "Feel free" - 10 GB Mailbox, 100 FreeSMS/Monat ... Jetzt GMX TopMail testen: http://www.gmx.net/de/go/topmail --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]