…asking the original poster about his notion of what a duplicate document meant to him. You're right, it
would be useful to understand more about the intention of the original message.
Cheers
Mark
----- Original Message
From: Erick Erickson <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, 22 July, 2008 2:37:50
Well, the point of my question was to ensure that we were all using common
terms. For all we know, the original questioner considered "duplicate"
records to be ones that had identical, or even similar, text. Nothing in the
original question indicated any de-dup happening.
I've often found that assumption
> From: markharw00d <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Monday, 21 July, 2008 8:44:26 PM
> Subject: Re: How to avoid duplicate records in lucene
>>could you define duplicate?
That's your choice of field that you want to de-dup on.
That could be a field such as "DatabasePrimaryKey" or perhaps a field
containing an MD5 hash of document content.
The DuplicateFilter ensures only one document can exist in results for
each unique value for th
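A rough illustration of the de-dup-on-a-field idea, written as plain Java with no Lucene dependency: it derives an MD5 hex key from each hit's content and keeps one hit per key (here, arbitrarily, the first one seen), which is the outcome described for DuplicateFilter. `Hit`, `md5Hex` and `keepFirstPerKey` are hypothetical names, and the list of hits stands in for real search results; this models the result only, not how DuplicateFilter is implemented.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupSketch {

    // Hypothetical stand-in for a search hit: an internal id plus its stored content.
    static final class Hit {
        final int docId;
        final String content;

        Hit(int docId, String content) {
            this.docId = docId;
            this.content = content;
        }
    }

    // Hex-encoded MD5 of the content. A value like this could be stored in an
    // untokenized field and used as the de-dup key (an assumption, not Lucene API).
    static String md5Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Keeps only the first hit for each distinct key, preserving the original hit order.
    static List<Hit> keepFirstPerKey(List<Hit> hits) {
        Set<String> seenKeys = new HashSet<>();
        List<Hit> result = new ArrayList<>();
        for (Hit h : hits) {
            // Set.add returns false when the key was already present, i.e. a duplicate.
            if (seenKeys.add(md5Hex(h.content))) {
                result.add(h);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit(0, "lucene in action"),
                new Hit(1, "duplicate records"),
                new Hit(2, "lucene in action"));
        for (Hit h : keepFirstPerKey(hits)) {
            System.out.println(h.docId); // prints 0, then 1
        }
    }
}
```

Running `main` prints 0 and then 1, because hits 0 and 2 have identical content and therefore the same key. A field such as a database primary key would work the same way, with the stored key value used directly instead of a hash.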
Could you define duplicate? As far as I know, you don't
get the same (internal) doc id back more than once, so what
is a duplicate?
Best
Erick
On Mon, Jul 21, 2008 at 9:40 AM, Sebastin <[EMAIL PROTECTED]> wrote:
At search time, while querying the data.
markrmiller wrote:
Sebastin wrote:
Hi All,
Is there any possibility to avoid duplicate records in lucene 2.3.1?
I don't believe that there is a very high performance way to do this.
You are basically going to have to query the index for an id before
adding a new doc. The best way I can think of off the top
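The check-before-add approach markrmiller describes can be sketched as below. This is a minimal plain-Java model, assuming an in-memory map stands in for the index; `IndexTimeDedupSketch`, `containsId` and `addIfAbsent` are hypothetical names, not Lucene API. Against a real index, `containsId` would have to run a search on the id field, so every add pays for one lookup, which is the performance cost mentioned above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IndexTimeDedupSketch {

    // Hypothetical in-memory stand-in for the index: unique id -> document text.
    private final Map<String, String> docsById = new LinkedHashMap<>();

    // Stand-in for "query the index for an id"; a real index would need a search here.
    boolean containsId(String id) {
        return docsById.containsKey(id);
    }

    // Adds the document only if no document with the same id exists yet.
    // Returns true if it was added, false if it was skipped as a duplicate.
    boolean addIfAbsent(String id, String text) {
        if (containsId(id)) {
            return false;
        }
        docsById.put(id, text);
        return true;
    }

    int size() {
        return docsById.size();
    }

    public static void main(String[] args) {
        IndexTimeDedupSketch index = new IndexTimeDedupSketch();
        System.out.println(index.addIfAbsent("42", "first version"));  // true
        System.out.println(index.addIfAbsent("42", "second version")); // false
        System.out.println(index.size());                               // 1
    }
}
```

The sketch is not thread-safe: two concurrent adds with the same id could both pass the check, so concurrent indexing would need locking around the check and the add.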
Sebastin wrote:
Hi All,
Is there any possibility to avoid duplicate records in lucene 2.3.1?
At index-time or query time?
See DuplicateFilter in contrib/queries for a query-time filter
Cheers
Mark