Did you check http://wiki.apache.org/solr/Deduplication ?
Best Regards
Alexander Aristov

On 10 March 2011 18:35, Mark <static.void....@gmail.com> wrote:

> My understanding is that it can mark documents with the same signature,
> indicating that they are similar; however, there is no way at query time to
> return only 1 "unique" document per signature. Am I missing something?
>
> Doc 1) This is my test
> Doc 2) This is my test
> Doc 3) Another test
> Doc 4) This is my test
>
> If I run a query for "test" it should return:
>
> Doc 1) This is my test
> Doc 3) Another test
>
> On 3/10/11 6:25 AM, Grant Ingersoll wrote:
>
>> On Mar 5, 2011, at 8:35 PM, Mark wrote:
>>
>>> I'm familiar with Deduplication; however, I do not wish to remove my
>>> duplicates and my needs are slightly different. I would like to mark the
>>> first document with signature 'xyz' as unique but the next one as a
>>> duplicate. This way I can filter out "duplicates" during searching using a
>>> filter query but still return the original document.
>>
>> My understanding is that you can have it mark duplicates.
>>
>>> The only thing I know of at the moment is to use field collapsing, but I
>>> tried the patch on 1.4.1 and it was terribly slow.
>>>
>>> On 3/5/11 4:43 AM, Grant Ingersoll wrote:
>>>
>>>> See http://wiki.apache.org/solr/Deduplication. Should be fairly easy
>>>> to pull out if you are doing just Lucene.
>>>>
>>>> On Mar 5, 2011, at 1:49 AM, Mark wrote:
>>>>
>>>>> Is there a way one could detect duplicates (say by using some unique
>>>>> hash of certain fields) and mark a document as a duplicate without
>>>>> removing it?
>>>>>
>>>>> Here is an example:
>>>>>
>>>>> Doc 1) This is my test
>>>>> Doc 2) This is my test
>>>>> Doc 3) Another test
>>>>> Doc 4) This is my test
>>>>>
>>>>> Docs 1 and 3 should be considered unique, whereas 2 and 4 should be
>>>>> marked as duplicates (of doc 1).
>>>>>
>>>>> Can this be easily accomplished?
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>
>>>> --------------------------
>>>> Grant Ingersoll
>>>> http://www.lucidimagination.com/
>>>>
>>>> Search the Lucene ecosystem docs using Solr/Lucene:
>>>> http://www.lucidimagination.com/search
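As a rough illustration of the two options discussed in this thread (flagging the first document per signature as unique at index time, versus collapsing to one hit per signature at query time), here is a plain-Java sketch. It does not use Solr's Deduplication support or any Lucene API. The class `DuplicateMarker`, the `signature` and `duplicate` fields, and the choice of an MD5 digest over trimmed, lower-cased text are all assumptions made for illustration. In a real index, the in-memory `seen` set would have to be replaced by a check against signatures already indexed, which is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical, self-contained sketch: plain Java, no Solr/Lucene classes.
public class DuplicateMarker {

    // Hypothetical document: a text field plus the two extra fields
    // ("signature" and "duplicate") the thread proposes storing.
    static final class Doc {
        final String id;
        final String text;
        String signature;
        boolean duplicate;

        Doc(String id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    // Assumed signature: MD5 hex digest of the trimmed, lower-cased text.
    static String signature(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            String normalized = text.trim().toLowerCase(Locale.ROOT);
            byte[] digest = md.digest(normalized.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Index-time marking: the first document seen with a signature stays
    // unique; every later document with the same signature is flagged.
    // The in-memory "seen" set stands in for a lookup against the index.
    static void markDuplicates(List<Doc> docs) {
        Set<String> seen = new HashSet<>();
        for (Doc d : docs) {
            d.signature = signature(d.text);
            // Set.add returns false if the signature was already present.
            d.duplicate = !seen.add(d.signature);
        }
    }

    // Query-time alternative (field-collapsing style): keep only the first
    // hit per signature, preserving the original hit order.
    static List<Doc> collapse(List<Doc> hits) {
        Map<String, Doc> firstBySignature = new LinkedHashMap<>();
        for (Doc d : hits) {
            firstBySignature.putIfAbsent(signature(d.text), d);
        }
        return new ArrayList<>(firstBySignature.values());
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("1", "This is my test"));
        docs.add(new Doc("2", "This is my test"));
        docs.add(new Doc("3", "Another test"));
        docs.add(new Doc("4", "This is my test"));

        markDuplicates(docs);
        // Prints one line per doc: 1 unique, 2 duplicate, 3 unique, 4 duplicate
        for (Doc d : docs) {
            System.out.println("Doc " + d.id + (d.duplicate ? " duplicate" : " unique"));
        }

        // Collapsing the same hits keeps Doc 1 and Doc 3, the result Mark asks for.
        for (Doc d : collapse(docs)) {
            System.out.println("Result: Doc " + d.id + ") " + d.text);
        }
    }
}
```

Marking at index time depends on indexing order (whichever copy is indexed first is treated as the original), but it lets a filter query on the hypothetical `duplicate` field drop the rest cheaply. Collapsing at query time needs no extra field but does extra work on every request.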