Re: finding potential duplicate documents

2005-05-29 Thread Marco Dissel
Any tips on this issue?

Thanks,
Marco

----- Original Message -----
From: Marco Dissel
To: java-user@lucene.apache.org
Sent: Friday, May 13, 2005 9:05 AM
Subject: finding potential duplicate documents

Hello I've got many documents that are potentially duplicate (merging se

finding potential duplicate documents

2005-05-13 Thread Marco Dissel
Hello, I've got many documents that are potential duplicates (a result of merging several external systems). Any tips on how to find documents that are potential duplicates, using an adjustable ranking threshold (e.g. a match score above 0.5)? I can use the similarity (MoreLikeThis) method from the Sandbox, but that's always comparing
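
For illustration only, here is a minimal sketch of the thresholding idea in the question. It does not use MoreLikeThis or any Lucene API. It swaps in plain Jaccard similarity over term sets, and the class and method names (DuplicateCandidates, terms, jaccard, candidatePairs) are hypothetical. It compares every pair of documents, so it is only reasonable for small collections.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: flags document pairs whose
// term overlap exceeds a caller-chosen threshold.
public class DuplicateCandidates {

    // Lowercase the text and split on non-alphanumeric runs to get its term set.
    static Set<String> terms(String text) {
        Set<String> result = new HashSet<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // Jaccard similarity: size of intersection divided by size of union, in [0, 1].
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // two empty documents are treated as identical
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    // Return index pairs {i, j} with i < j whose similarity is strictly above the threshold.
    static List<int[]> candidatePairs(List<String> docs, double threshold) {
        List<Set<String>> termSets = new ArrayList<>();
        for (String doc : docs) {
            termSets.add(terms(doc));
        }
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            for (int j = i + 1; j < docs.size(); j++) {
                if (jaccard(termSets.get(i), termSets.get(j)) > threshold) {
                    pairs.add(new int[] {i, j});
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "Lucene in Action second edition",
                "Lucene in Action, Second Edition!",
                "Hibernate reference guide");
        for (int[] p : candidatePairs(docs, 0.5)) {
            System.out.println("possible duplicates: " + p[0] + " and " + p[1]);
        }
    }
}
```

The threshold argument plays the role of the ">0.5 match" cutoff: raising it yields fewer, more confident candidates, and lowering it yields more candidates to review by hand.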
