Help on Similarity class

2006-01-18 Thread Marco Dissel
Hello, I'm using Lucene for searching in a CRM application. For example, when searching for a company name I want to show similar company names: a search for "Microsoft International" would return (in this order): microsoft international, microsoft benelux, microsoft. Currently it would also return

Re: finding potential duplicate documents

2005-05-29 Thread Marco Dissel
Any tips on this issue? Thanks, Marco. - Original Message - From: Marco Dissel To: java-user@lucene.apache.org Sent: Friday, May 13, 2005 9:05 AM Subject: finding potential duplicate documents. Hello, I've got many documents that are potentially duplicate (merging se

finding potential duplicate documents

2005-05-13 Thread Marco Dissel
Hello, I've got many documents that are potentially duplicates (from merging several external systems). Any tips on how to find documents that are potentially duplicates (using a variable ranking threshold, e.g. a match score > 0.5)? I can use the similarity (MoreLikeThis) method from the Sandbox, but that's always comparing
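
As a rough illustration of the thresholded comparison asked about above, here is a minimal sketch in plain Java. It does not use Lucene or MoreLikeThis, and it is not the approach suggested in the thread: it swaps in a simple Jaccard overlap of lowercased word tokens as the match score. The class and method names (DuplicateSketch, tokens, jaccard, candidatePairs) and the sample records are hypothetical, and the all-pairs loop is only reasonable for small collections.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class DuplicateSketch {

    // Split text into a set of lowercased word tokens (letters and digits only).
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // Jaccard score: shared tokens divided by all distinct tokens, in [0, 1].
    static double jaccard(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        if (union.isEmpty()) {
            return 0.0; // two empty texts: treat as no match (an assumption)
        }
        Set<String> shared = new HashSet<>(ta);
        shared.retainAll(tb);
        return (double) shared.size() / union.size();
    }

    // Return index pairs {i, j} with i < j whose score is strictly above the threshold.
    static List<int[]> candidatePairs(List<String> docs, double threshold) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            for (int j = i + 1; j < docs.size(); j++) {
                if (jaccard(docs.get(i), docs.get(j)) > threshold) {
                    pairs.add(new int[] {i, j});
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical sample records, not taken from the thread.
        List<String> docs = List.of(
                "Acme Trading Co Ltd", "Acme Trading Ltd", "Globex Corporation");
        for (int[] p : candidatePairs(docs, 0.5)) {
            System.out.println(docs.get(p[0]) + " <-> " + docs.get(p[1]));
        }
    }
}
```

The threshold is passed in as a parameter so it can be varied, matching the "variable ranking" in the question. Comparing every pair is quadratic in the number of documents, so a larger collection would need an index to narrow down the candidates first.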