On Mar 5, 2011, at 8:35 PM, Mark wrote: > I'm familiar with Deduplication however I do not wish to remove my duplicates > and my needs are slightly different. I would like to mark the first document > with signature 'xyz' as unique but the next one as a duplicate. This way I > can filter out "duplicates" during searching using a filter query but still > return the original document.
My understanding is that you can have it mark duplicates. > > The only thing I know of at the moment is to use field collapsing but I tried > the patch on 1.4.1 and it was terribly slow. > > On 3/5/11 4:43 AM, Grant Ingersoll wrote: >> See http://wiki.apache.org/solr/Deduplication. Should be fairly easy to >> pull out if you are doing just Lucene. >> >> On Mar 5, 2011, at 1:49 AM, Mark wrote: >> >>> Is there a way one could detect duplicates (say by using some unique hash >>> of certain fields) and marking a document as a duplicate but not remove it. >>> >>> Here is an example: >>> >>> Doc 1) This is my test >>> Doc 2) This is my test >>> Doc 3) Another test >>> Doc 4) This is my test >>> >>> Doc 1 and 3 should be considered unique whereas 2 and 4 should be marked as >>> duplicates (of doc 1). >>> >>> Can this be easily accomplished? >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>> >> -------------------------- >> Grant Ingersoll >> http://www.lucidimagination.com/ >> >> Search the Lucene ecosystem docs using Solr/Lucene: >> http://www.lucidimagination.com/search >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > -------------------------- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem docs using Solr/Lucene: http://www.lucidimagination.com/search --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org