Erick,

Here is an example:

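import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

// Create the index at C:/index (true = create, replacing any existing index).
IndexWriter writer = new IndexWriter("C:/index", new StandardAnalyzer(), true);

String records = "Lucene action book";

Document doc = new Document();
// Store the text and index it as analyzed tokens.
doc.add(new Field("contents", records, Field.Store.YES, Field.Index.TOKENIZED));

writer.addDocument(doc);
writer.optimize();
writer.close();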


When the same record is inserted twice, a query for "Lucene" returns
that record twice.
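
One way to get a field to de-dup on, following the MD5 suggestion in the
quoted thread below, is to hash the record text. This is only a minimal
sketch, assuming MD5 over the UTF-8 bytes of the text is an acceptable key.
The names ContentKey and md5Hex are made up for illustration and are not
part of Lucene.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not a Lucene class): derives a de-dup key from text.
class ContentKey {

    // Returns the MD5 digest of the text as 32 lowercase hex characters.
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Identical record text always yields the same key.
        System.out.println(md5Hex("Lucene action book"));
    }
}
```

The key could then be stored in its own untokenized field. Assuming
IndexWriter.updateDocument(Term, Document) is available in your version
(I believe it was added in Lucene 2.1), passing a Term built from that field
should replace the earlier copy rather than add a second one.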

mark harwood wrote:
> 
>>>Well, the point of my question was to insure that we were all using common terms.
> 
> Sorry, Erick. I thought your "define duplicate" question was asking me
> about DuplicateFilter's concept of duplicates rather than asking the
> original poster about his notion of what a duplicate document meant to
> him. You're right it would be useful to understand more about the
> intention of the original message.
> 
> Cheers
> Mark
> 
> 
> 
> 
> 
> ----- Original Message ----
> From: Erick Erickson <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Tuesday, 22 July, 2008 2:37:50 PM
> Subject: Re: How to avoid duplicate records in lucene
> 
> Well, the point of my question was to insure that we were all using common
> terms. For all we know, the original questioner considered "duplicate"
> records ones that had identical, or even similar text. Nothing in the
> original question indicated any de-dup happening.
> 
> I've often found that assumptions that we are all talking about the same
> thing are...er...incorrect. And I don't want to waste my time answering
> questions that weren't what was asked......
> 
> Best
> Erick
> 
> On Mon, Jul 21, 2008 at 2:44 PM, markharw00d <[EMAIL PROTECTED]>
> wrote:
> 
>> >>could you define duplicate?
>>
>> That's your choice of field that you want to de-dup on.
>> That could be a field such as "DatabasePrimaryKey" or perhaps a field
>> containing an MD5 hash of document content.
>> The DuplicateFilter ensures only one document can exist in results for
>> each unique value for the choice of field.
>>
>> Cheers
>> Mark
>>
>> Erick Erickson wrote:
>>
>>> could you define duplicate? As far as I know, you don't
>>> get the same (internal) doc id back more than once, so what
>>> is a duplicate?
>>>
>>> Best
>>> Erick
>>>
>>> On Mon, Jul 21, 2008 at 9:40 AM, Sebastin <[EMAIL PROTECTED]> wrote:
>>>
>>>
>>>
>>>> At search time, while querying the data.
>>>> markrmiller wrote:
>>>>
>>>>
>>>>> Sebastin wrote:
>>>>>
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Is there any possibility to avoid duplicate records in lucene  2.3.1?
>>>>>>
>>>>>>
>>>>>>
>>>>> I don't believe that there is a very high performance way to do this.
>>>>> You are basically going to have to query the index for an id before
>>>>> adding a new doc. The best way I can think of off the top of my head
>>>>> is to batch - first check that ids in the batch are unique, then check
>>>>> all ids in the batch against the IndexReader, then add the ones that
>>>>> are not dupes. Of course all of your docs would have to be added
>>>>> through this single choke point so that you knew other threads had not
>>>>> added that id after the first thread had looked but before it added
>>>>> the doc.
>>>>>
>>>>> I think Mark H has you covered if getting the dupes out afterwards is
>>>>> okay.
>>>>>
>>>>> - Mark
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>> --
>>>> View this message in context:
>>>>
>>>> http://www.nabble.com/How-to-avoid-duplicate-records-in-lucene-tp18543588p18568862.html
>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>>
>>>>
>>>
>>
>>
>>
> 

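The batching approach markrmiller describes in the quoted thread above could
be sketched roughly as follows. This is an illustration only: an in-memory
Set stands in for the lookup against the IndexReader, and the names
BatchDeduper and addBatch are hypothetical, not Lucene API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical single choke point through which all documents are added.
class BatchDeduper {

    // Stand-in for the ids already in the index; a real version would
    // look each id up through an IndexReader instead.
    private final Set<String> indexedIds = new HashSet<String>();

    // synchronized, so no other thread can add an id between the
    // "is it already there?" check and the add itself.
    synchronized List<String> addBatch(List<String> batchIds) {
        List<String> added = new ArrayList<String>();
        Set<String> seenInBatch = new HashSet<String>();
        for (String id : batchIds) {
            if (!seenInBatch.add(id)) {
                continue; // duplicate within this batch
            }
            if (indexedIds.contains(id)) {
                continue; // already present in the "index"
            }
            indexedIds.add(id); // here a real version would call addDocument
            added.add(id);
        }
        return added;
    }

    public static void main(String[] args) {
        BatchDeduper deduper = new BatchDeduper();
        List<String> batch = new ArrayList<String>();
        batch.add("1");
        batch.add("2");
        batch.add("1");
        System.out.println(deduper.addBatch(batch));
    }
}
```

The method returns the ids that were actually added, in their original order,
so the caller can tell which records were skipped as duplicates.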
-- 
View this message in context: 
http://www.nabble.com/How-to-avoid-duplicate-records-in-lucene-tp18543588p18603752.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

