Any tips on this issue?
Thanks
Marco
- Original Message -
From: Marco Dissel
To: java-user@lucene.apache.org
Sent: Friday, May 13, 2005 9:05 AM
Subject: finding potential duplicate documents
Hello,
I've got many documents that are potential duplicates (the result of merging
several external systems). Any tips on how to find documents that are likely
duplicates (using a variable threshold, e.g. a match score > 0.5)?
I can use the similarity (MoreLikeThis) method from the Sandbox, but that's always
comparing