Thursday, May 26, 2005 9:18 PM
To: java-user@lucene.apache.org
Subject: Re: Deleting duplicates from a Lucene index
: The two symptoms of this not behaving as expected are
: 1) ir.docFreq(t) does not always equal the value returned by
: ir.termDocs(t).read(docs, freqs) (see below for actual syntax used).
: 2) Even after optimizing, I still have the same dupes in my index.
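This hedged model shows one way the two numbers in symptom #1 can differ. It assumes Lucene 1.4-era behaviour, in which:

- `docFreq` still counts documents that have been deleted but not yet merged away.
- `TermDocs` skips deleted documents.
- A single `read(docs, freqs)` call fills at most `docs.length` entries. On a multi-segment index it may also return fewer before the enumeration is exhausted.

The class is stdlib-only and is not Lucene code. `PostingModel`, `readAll` and the sample doc numbers are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, stdlib-only model of one term's postings (not Lucene code).
 * Assumed behaviours: docFreq() counts every posting, including deleted
 * documents, while read() skips deleted documents and copies at most
 * docs.length entries per call, returning 0 only when exhausted.
 */
public class PostingModel {
    private final int[] postings;      // doc numbers that contain the term
    private final boolean[] deleted;   // deleted[doc] is true if doc is deleted
    private int pos = 0;               // cursor into postings

    public PostingModel(int[] postings, boolean[] deleted) {
        this.postings = postings;
        this.deleted = deleted;
    }

    /** Model of docFreq: deleted documents are still counted. */
    public int docFreq() {
        return postings.length;
    }

    /** Model of TermDocs.read(docs, freqs): returns entries copied, 0 when done. */
    public int read(int[] docs, int[] freqs) {
        int n = 0;
        while (n < docs.length && pos < postings.length) {
            int doc = postings[pos++];
            if (deleted[doc]) {
                continue;              // deleted documents are skipped
            }
            docs[n] = doc;
            freqs[n] = 1;              // term frequency is not modelled
            n++;
        }
        return n;
    }

    /** Drains the enumeration by calling read() until it returns 0. */
    public static List<Integer> readAll(PostingModel td, int batchSize) {
        int[] docs = new int[batchSize];
        int[] freqs = new int[batchSize];
        List<Integer> result = new ArrayList<>();
        int n;
        while ((n = td.read(docs, freqs)) > 0) {
            for (int i = 0; i < n; i++) {
                result.add(docs[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Five postings for the term; doc 2 has been deleted.
        boolean[] deleted = new boolean[6];
        deleted[2] = true;
        PostingModel td = new PostingModel(new int[] {0, 2, 3, 4, 5}, deleted);
        System.out.println("docFreq = " + td.docFreq());        // counts the deleted doc too
        System.out.println("live docs = " + readAll(td, 2));    // only live docs, read in batches
    }
}
```

The same loop shape should apply to a real `TermDocs`: keep calling `read(docs, freqs)` until it returns 0 rather than relying on a single call.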
As far as #1, i don't know much about
I noticed in my Lucene index that I had mistakenly indexed some documents
multiple times. I wrote the following piece of code to find and eliminate
the duplicates, but it did not behave as expected.
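This is not the code referred to above. It is a hypothetical, stdlib-only sketch of the "keep the first, delete the rest" approach, keyed on the ItemId field described under Background. Here `itemIds[doc]` stands in for the ItemId keyword of document number `doc`, and `DedupSketch` and `duplicatesToDelete` are invented names. Against a real index, the grouping would come from enumerating the ItemId terms and their `TermDocs`. Each returned number would then go to the reader's delete method: `IndexReader.delete(int)` in 1.4-era releases, or `deleteDocument(int)` in later ones.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, stdlib-only sketch (not Lucene code): keep the lowest
 * document number for each ItemId and report every other one as a
 * duplicate to delete. itemIds[doc] stands in for the ItemId of doc.
 */
public class DedupSketch {

    /** Returns the document numbers that duplicate an earlier document. */
    public static List<Integer> duplicatesToDelete(String[] itemIds) {
        // Group document numbers by ItemId, preserving first-seen key order.
        Map<String, List<Integer>> docsById = new LinkedHashMap<>();
        for (int doc = 0; doc < itemIds.length; doc++) {
            docsById.computeIfAbsent(itemIds[doc], k -> new ArrayList<>()).add(doc);
        }
        // Keep the first (lowest) document in each group; mark the rest.
        List<Integer> toDelete = new ArrayList<>();
        for (List<Integer> docs : docsById.values()) {
            toDelete.addAll(docs.subList(1, docs.size()));
        }
        return toDelete;
    }

    public static void main(String[] args) {
        String[] itemIds = {"A", "B", "A", "C", "B", "A"};
        System.out.println(duplicatesToDelete(itemIds)); // prints [2, 5, 1, 4]
    }
}
```

If the deletions go through an `IndexReader`, they are only saved to disk when that reader is closed. The reader therefore needs to be closed before the index is optimized, which may be worth checking for symptom #2.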
Background:
Every document has an ItemId field that was indexed as a keyword. Two or
more documents