Mark, Keep in mind that there are actually multiple patches for this. SOLR-236 and SOLR-1086 IIRC. Also, I just noticed this is java-user@lucene. You may want to continue on solr-user@lucene.
Otis ---- Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ ----- Original Message ---- > From: Mark <static.void....@gmail.com> > To: java-user@lucene.apache.org > Sent: Sat, March 5, 2011 8:35:13 PM > Subject: Re: Detecting duplicates > > I'm familiar with Deduplication however I do not wish to remove my > duplicates and my needs are slightly different. I would like to mark the > first document with signature 'xyz' as unique but the next one as a > duplicate. This way I can filter out "duplicates" during searching using > a filter query but still return the original document. > > The only thing I know of at the moment is to use field collapsing but I > tried the patch on 1.4.1 and it was terribly slow. > > On 3/5/11 4:43 AM, Grant Ingersoll wrote: > > See http://wiki.apache.org/solr/Deduplication. Should be fairly easy to >pull out if you are doing just Lucene. > > > > On Mar 5, 2011, at 1:49 AM, Mark wrote: > > > >> Is there a way one could detect duplicates (say by using some unique hash >of certain fields) and marking a document as a duplicate but not remove it. > >> > >> Here is an example: > >> > >> Doc 1) This is my test > >> Doc 2) This is my test > >> Doc 3) Another test > >> Doc 4) This is my test > >> > >> Doc 1 and 3 should be considered unique whereas 2 and 4 should be marked > >> as >duplicates (of doc 1). > >> > >> Can this be easily accomplished? > >> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > >> For additional commands, e-mail: java-user-h...@lucene.apache.org > >> > > -------------------------- > > Grant Ingersoll > > http://www.lucidimagination.com/ > > > > Search the Lucene ecosystem docs using Solr/Lucene: > > http://www.lucidimagination.com/search > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org