RE: Deleting duplicates from a Lucene index

2005-05-27 Thread Omar Didi
ay, May 26, 2005 9:18 PM
To: java-user@lucene.apache.org
Subject: Re: Deleting duplicates from a Lucene index
: The two symptoms of this not behaving as expected are
: 1) ir.docFreq(t) does not always equal the value returned by
: ir.termDocs(t).read(docs, freqs) (see below for actual syntax used).
:

Re: Deleting duplicates from a Lucene index

2005-05-26 Thread Chris Hostetter
: The two symptoms of this not behaving as expected are
: 1) ir.docFreq(t) does not always equal the value returned by
: ir.termDocs(t).read(docs, freqs) (see below for actual syntax used).
: 2) Even after optimizing, I still have the same dupes in my index.
As far as #1, I don't know much about

Deleting duplicates from a Lucene index

2005-05-26 Thread Dan Climan
I noticed in my Lucene index that I had mistakenly indexed some documents multiple times. I wrote the following piece of code to find and eliminate the duplicates, but it did not behave as expected.
Background: Every document has an ItemId field that was indexed as a keyword. Two or more documents
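For readers skimming the thread, the general idea of "keep one document per ItemId, delete the rest" can be sketched without touching the Lucene API. This is not the poster's code, which is cut off in this listing. The class DuplicateSketch and the method findDuplicateDocs are hypothetical names, and the index is modelled as a plain array in which the position stands for the document number and the value for that document's ItemId. In a real index, the returned document numbers would still have to be deleted through the index reader.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: the "index" is modelled as an array
// whose position is the document number and whose value is that document's ItemId.
class DuplicateSketch {

    // Returns the document numbers to delete: every document whose ItemId
    // has already been seen at a lower document number.
    static List<Integer> findDuplicateDocs(String[] itemIdByDoc) {
        Set<String> seen = new HashSet<>();
        List<Integer> toDelete = new ArrayList<>();
        for (int doc = 0; doc < itemIdByDoc.length; doc++) {
            // Set.add returns false when the ItemId is already present.
            if (!seen.add(itemIdByDoc[doc])) {
                toDelete.add(doc);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        String[] itemIds = {"A1", "B2", "A1", "C3", "B2", "A1"};
        System.out.println(findDuplicateDocs(itemIds)); // prints [2, 4, 5]
    }
}
```

Keeping the first occurrence of each ItemId is an arbitrary choice made for this sketch; keeping the most recently indexed copy instead would only require iterating the document numbers in reverse.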