NP, if my original reply had included my second one, then you'd have known what I was talking about <G>...
I *love* it when I unknowingly demonstrate the issue I'm trying to clarify <G>. Best Erick On Tue, Jul 22, 2008 at 2:09 PM, mark harwood <[EMAIL PROTECTED]> wrote: > >>Well, the point of my question was to insure that we were all using > common terms. > > Sorry, Erick. I thought your "define duplicate" question was asking me > about DuplicateFilter's concept of duplicates rather than asking the > original poster about his notion of what a duplicate document meant to him. > You're right it would be useful to understand more about the intention of > the original message. > > Cheers > Mark > > > > > > ----- Original Message ---- > From: Erick Erickson <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Tuesday, 22 July, 2008 2:37:50 PM > Subject: Re: How to avoid duplicate records in lucene > > Well, the point of my question was to insure that we were all using common > terms. For all we know, the original questioner considered "duplicate" > records ones that had identical, or even similar text. Nothing in the > original question indicated any de-dup happening. > > I've often found that assumptions that we are all talking about the same > thing are...er...incorrect. And I don't want to waste my time answering > questions that weren't what was asked...... > > Best > Erick > > On Mon, Jul 21, 2008 at 2:44 PM, markharw00d <[EMAIL PROTECTED]> > wrote: > > > >>could you define duplicate? > > > > That's your choice of field that you want to de-dup on. > > That could be a field such as "DatabasePrimaryKey" or perhaps a field > > containing an MD5 hash of document content. > > The DuplicateFilter ensures only one document can exist in results for > each > > unique value for the choice of field. > > > > Cheers > > Mark > > > > Erick Erickson wrote: > > > >> could you define duplicate? As far as I know, you don't > >> get the same (internal) doc id back more than once, so what > >> is a duplicate? > >> > >> Best > >> Erick > >> > >> On Mon, Jul 21, 2008 at 9:40 AM, Sebastin <[EMAIL PROTECTED]> wrote: > >> > >> > >> > >>> at the time search , while querying the data > >>> markrmiller wrote: > >>> > >>> > >>>> Sebastin wrote: > >>>> > >>>> > >>>>> Hi All, > >>>>> > >>>>> Is there any possibility to avoid duplicate records in lucene 2.3.1? > >>>>> > >>>>> > >>>>> > >>>> I don't believe that there is a very high performance way to do this. > >>>> You are basically going to have to query the index for an id before > >>>> adding a new doc. The best way I can think of off the top of my head > is > >>>> to batch - first check that ids in the batch are unique, then check > all > >>>> ids in the batch against the IndexReader, then add the ones that are > not > >>>> dupes. Of course all of your docs would have to be added through this > >>>> single choke point so that you knew other threads had not added that > id > >>>> after the first thread had looked but before it added the doc. > >>>> > >>>> I think Mark H has you covered if getting the dupes out after are > okay. > >>>> > >>>> - Mark > >>>> > >>>> --------------------------------------------------------------------- > >>>> To unsubscribe, e-mail: [EMAIL PROTECTED] > >>>> For additional commands, e-mail: [EMAIL PROTECTED] > >>>> > >>>> > >>>> > >>>> > >>>> > >>> -- > >>> View this message in context: > >>> > >>> > http://www.nabble.com/How-to-avoid-duplicate-records-in-lucene-tp18543588p18568862.html > >>> Sent from the Lucene - Java Users mailing list archive at Nabble.com. > >>> > >>> > >>> --------------------------------------------------------------------- > >>> To unsubscribe, e-mail: [EMAIL PROTECTED] > >>> For additional commands, e-mail: [EMAIL PROTECTED] > >>> > >>> > >>> > >>> > >> > >> > ------------------------------------------------------------------------ > >> > >> No virus found in this incoming message. > >> Checked by AVG. Version: 7.5.526 / Virus Database: 270.5.3/1563 - > Release > >> Date: 20/07/2008 12:59 > >> > >> > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > > __________________________________________________________ > Not happy with your email address?. > Get the one you really want - millions of new email addresses available now > at Yahoo! http://uk.docs.yahoo.com/ymail/new.html > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > >