Re: How to avoid duplicate records in lucene

2008-07-23 Thread Chris Lu
> > the original poster about his notion of what a duplicate document meant to him. You're right it would be useful to understand more about the intention of the original message. Cheers Mark

Re: How to avoid duplicate records in lucene

2008-07-23 Thread Erick Erickson
> > asking the original poster about his notion of what a duplicate document meant to him. You're right it would be useful to understand more about the intention of the original message. Cheers Mark

Re: How to avoid duplicate records in lucene

2008-07-22 Thread Sebastin
> him. You're right it would be useful to understand more about the intention of the original message. Cheers Mark ----- Original Message From: Erick Erickson <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Tuesd

Re: How to avoid duplicate records in lucene

2008-07-22 Thread Erick Erickson
> at a duplicate document meant to him. You're right it would be useful to understand more about the intention of the original message. Cheers Mark - Original Message From: Erick Erickson <[EMAIL PROTECTED]> To: ja

Re: How to avoid duplicate records in lucene

2008-07-22 Thread mark harwood
a duplicate document meant to him. You're right it would be useful to understand more about the intention of the original message. Cheers Mark - Original Message From: Erick Erickson <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Tuesday, 22 July, 2008 2:37:50

Re: How to avoid duplicate records in lucene

2008-07-22 Thread Erick Erickson
Well, the point of my question was to ensure that we were all using common terms. For all we know, the original questioner considered "duplicate" records to be ones that had identical, or even similar, text. Nothing in the original question indicated any de-dup happening. I've often found that assumption

Re: How to avoid duplicate records in lucene

2008-07-21 Thread eks dev
> : markharw00d <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Monday, 21 July, 2008 8:44:26 PM Subject: Re: How to avoid duplicate records in lucene >>could you define duplicate? That's your choice of field that you want to de-dup on.

Re: How to avoid duplicate records in lucene

2008-07-21 Thread markharw00d
>>could you define duplicate? That's your choice of field that you want to de-dup on. That could be a field such as "DatabasePrimaryKey" or perhaps a field containing an MD5 hash of document content. The DuplicateFilter ensures only one document can exist in results for each unique value for th
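For the "MD5 hash of document content" option mentioned above, a minimal sketch of computing such a value with only the JDK follows. The class and method names (`ContentHash`, `md5Hex`) are hypothetical and not part of Lucene; the assumption is that the resulting hex string would be stored in its own untokenized field and used as the de-dup key.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not a Lucene class): derives a de-dup key from document text.
final class ContentHash {
    private ContentHash() {}

    // Returns the MD5 digest of the UTF-8 bytes of content as 32 lowercase hex characters.
    static String md5Hex(String content) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every conforming Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Note that a hash like this only catches byte-for-byte identical content; "similar" text, as raised elsewhere in the thread, would need a different notion of duplicate.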

Re: How to avoid duplicate records in lucene

2008-07-21 Thread Erick Erickson
Could you define duplicate? As far as I know, you don't get the same (internal) doc id back more than once, so what is a duplicate? Best Erick On Mon, Jul 21, 2008 at 9:40 AM, Sebastin <[EMAIL PROTECTED]> wrote: > at search time, while querying the data > markrmiller wrote: > > Sebast

Re: How to avoid duplicate records in lucene

2008-07-21 Thread Sebastin
At search time, while querying the data. markrmiller wrote: > Sebastin wrote: >> Hi All, >> Is there any possibility to avoid duplicate records in lucene 2.3.1? > I don't believe that there is a very high performance way to do this. You are basically going to have to query the

Re: How to avoid duplicate records in lucene

2008-07-20 Thread Mark Miller
Sebastin wrote: Hi All, Is there any possibility to avoid duplicate records in lucene 2.3.1? I don't believe that there is a very high performance way to do this. You are basically going to have to query the index for an id before adding a new doc. The best way I can think of off the top
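The "query the index for an id before adding a new doc" approach can be sketched roughly as below. This is an in-memory stand-in rather than Lucene code: `DedupingIndex` and its methods are hypothetical names, and the map lookup plays the role that a search on a unique id field would play against a real index.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for an index keyed by a unique id field.
final class DedupingIndex {
    private final Map<String, String> docsById = new LinkedHashMap<>();

    // Looks up the id first; adds the doc only if no doc with that id exists.
    // Returns true if the doc was added, false if it was skipped as a duplicate.
    boolean addIfAbsent(String id, String body) {
        if (docsById.containsKey(id)) {
            return false;
        }
        docsById.put(id, body);
        return true;
    }

    int size() {
        return docsById.size();
    }

    String get(String id) {
        return docsById.get(id);
    }
}
```

In Lucene itself, `IndexWriter.updateDocument(Term, Document)` deletes any documents matching the term and then adds the new one, which may be an alternative when "last write wins" is acceptable; whether it suits the original poster's case is not clear from the thread.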

Re: How to avoid duplicate records in lucene

2008-07-19 Thread markharw00d
Sebastin wrote: Hi All, Is there any possibility to avoid duplicate records in lucene 2.3.1? At index-time or query time? See DuplicateFilter in contrib/queries for a query-time filter Cheers Mark
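To illustrate what a query-time de-dup does conceptually (one result kept per unique value of a chosen field), here is a small sketch over a plain list of hits. It is not the DuplicateFilter implementation and does not use the Lucene API; `QueryTimeDedup`, `firstPerKey` and the `keyOf` function are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical illustration of query-time de-duplication over a result list.
final class QueryTimeDedup {
    private QueryTimeDedup() {}

    // Keeps the first hit seen for each distinct key, preserving the original order.
    static <T> List<T> firstPerKey(List<T> hits, Function<T, String> keyOf) {
        Set<String> seen = new HashSet<>();
        List<T> unique = new ArrayList<>();
        for (T hit : hits) {
            // Set.add returns false when the key was already present.
            if (seen.add(keyOf.apply(hit))) {
                unique.add(hit);
            }
        }
        return unique;
    }
}
```

Keeping the first hit is just one policy; which occurrence a real filter retains is a separate choice.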