Erick, example,
IndexWriter writer = new IndexWriter("C:/index",new StandardAnalyzer(),true); String records = "Lucene" +" " +"action"+" "+"book" ; Document doc = new Document(); doc.add(new Field("contents",records,Field.Store.YES,Field.Index.TOKENIZED)); writer.addDocument(doc); writer.optimize(); writer.close(); when the records is inserted twice,while querying for "Lucene" it will display the same record twice. mark harwood wrote: > >>>Well, the point of my question was to insure that we were all using common terms. > > Sorry, Erick. I thought your "define duplicate" question was asking me > about DuplicateFilter's concept of duplicates rather than asking the > original poster about his notion of what a duplicate document meant to > him. You're right it would be useful to understand more about the > intention of the original message. > > Cheers > Mark > > > > > > ----- Original Message ---- > From: Erick Erickson <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Tuesday, 22 July, 2008 2:37:50 PM > Subject: Re: How to avoid duplicate records in lucene > > Well, the point of my question was to insure that we were all using common > terms. For all we know, the original questioner considered "duplicate" > records ones that had identical, or even similar text. Nothing in the > original question indicated any de-dup happening. > > I've often found that assumptions that we are all talking about the same > thing are...er...incorrect. And I don't want to waste my time answering > questions that weren't what was asked...... > > Best > Erick > > On Mon, Jul 21, 2008 at 2:44 PM, markharw00d <[EMAIL PROTECTED]> > wrote: > >> >>could you define duplicate? >> >> That's your choice of field that you want to de-dup on. >> That could be a field such as "DatabasePrimaryKey" or perhaps a field >> containing an MD5 hash of document content. >> The DuplicateFilter ensures only one document can exist in results for >> each >> unique value for the choice of field. >> >> Cheers >> Mark >> >> Erick Erickson wrote: >> >>> could you define duplicate? As far as I know, you don't >>> get the same (internal) doc id back more than once, so what >>> is a duplicate? >>> >>> Best >>> Erick >>> >>> On Mon, Jul 21, 2008 at 9:40 AM, Sebastin <[EMAIL PROTECTED]> wrote: >>> >>> >>> >>>> at the time search , while querying the data >>>> markrmiller wrote: >>>> >>>> >>>>> Sebastin wrote: >>>>> >>>>> >>>>>> Hi All, >>>>>> >>>>>> Is there any possibility to avoid duplicate records in lucene 2.3.1? >>>>>> >>>>>> >>>>>> >>>>> I don't believe that there is a very high performance way to do this. >>>>> You are basically going to have to query the index for an id before >>>>> adding a new doc. The best way I can think of off the top of my head >>>>> is >>>>> to batch - first check that ids in the batch are unique, then check >>>>> all >>>>> ids in the batch against the IndexReader, then add the ones that are >>>>> not >>>>> dupes. Of course all of your docs would have to be added through this >>>>> single choke point so that you knew other threads had not added that >>>>> id >>>>> after the first thread had looked but before it added the doc. >>>>> >>>>> I think Mark H has you covered if getting the dupes out after are >>>>> okay. >>>>> >>>>> - Mark >>>>> >>>>> --------------------------------------------------------------------- >>>>> To unsubscribe, e-mail: [EMAIL PROTECTED] >>>>> For additional commands, e-mail: [EMAIL PROTECTED] >>>>> >>>>> >>>>> >>>>> >>>>> >>>> -- >>>> View this message in context: >>>> >>>> http://www.nabble.com/How-to-avoid-duplicate-records-in-lucene-tp18543588p18568862.html >>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com. >>>> >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: [EMAIL PROTECTED] >>>> For additional commands, e-mail: [EMAIL PROTECTED] >>>> >>>> >>>> >>>> >>> >>> >>> ------------------------------------------------------------------------ >>> >>> No virus found in this incoming message. >>> Checked by AVG. Version: 7.5.526 / Virus Database: 270.5.3/1563 - >>> Release >>> Date: 20/07/2008 12:59 >>> >>> >> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: [EMAIL PROTECTED] >> For additional commands, e-mail: [EMAIL PROTECTED] >> >> > > > > __________________________________________________________ > Not happy with your email address?. > Get the one you really want - millions of new email addresses available > now at Yahoo! http://uk.docs.yahoo.com/ymail/new.html > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > > -- View this message in context: http://www.nabble.com/How-to-avoid-duplicate-records-in-lucene-tp18543588p18603752.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]