I'd like to add metadata to the index that only becomes available *after* a 
document's contents have been indexed. More specifically, I'm implementing 
shingling (detection of near-duplicate documents) and want to store each 
document's fingerprint (which is computed from its sequence of tokens) in the 
index.
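
To make the fingerprint concrete, here is a minimal sketch of the kind of 
computation I have in mind. It is plain Java with no Lucene calls. The class 
name ShingleFingerprint, the shingle width w, and the simple polynomial hash 
are illustrative assumptions only, and a single min-hash value stands in for 
whatever fingerprint scheme ends up being used.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: computes a one-value
// min-hash fingerprint over the w-token shingles of a document.
class ShingleFingerprint {

    // Hash one shingle, i.e. the w consecutive tokens starting at 'start'.
    // The seed 17 and multiplier 31 are arbitrary choices for this sketch.
    static long hashShingle(List<String> tokens, int start, int w) {
        long h = 17L;
        for (int i = start; i < start + w; i++) {
            h = 31L * h + tokens.get(i).hashCode();
        }
        return h;
    }

    // Fingerprint = smallest hash over all shingles of width w.
    // Documents with the same set of shingles get the same value.
    static long fingerprint(List<String> tokens, int w) {
        if (w <= 0 || tokens.size() < w) {
            throw new IllegalArgumentException("need at least w tokens");
        }
        long min = Long.MAX_VALUE;
        for (int i = 0; i + w <= tokens.size(); i++) {
            min = Math.min(min, hashShingle(tokens, i, w));
        }
        return min;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList(
                "a", "rose", "is", "a", "rose", "is", "a", "rose");
        // Print the fingerprint as a hex string, e.g. for storing in a field.
        System.out.println(Long.toHexString(fingerprint(doc, 4)));
    }
}
```

The point is that the input is the full token sequence, which is why the 
fingerprint is only known once the document has been tokenized.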

There doesn't seem to be an easy way to do this in the Lucene API. In 
particular, I can't easily update a document that is already indexed. The only 
way I can see to get this information *before* adding a document to the index 
is to create a token stream manually, which means the document gets tokenized 
all over again when it is indexed. This isn't a satisfying solution. Does 
anyone have suggestions for how I could get the fingerprint information into 
the index? I'd appreciate any input. Thanks!

- Will