Re: DuplicateFilter filters not only duplicates

2012-08-30 Thread mark harwood
DuplicateFilter has been mostly broken since Lucene's switch to segment-level filtering. Since v2.9, calls to Filter.getDocIdSet no longer pass a "top-level" reader that can access the whole index; instead they pass a reader restricted to a single segment's contents. Becaus

Re: DuplicateFilter filters not only duplicates

2012-08-30 Thread Ian Lea
https://issues.apache.org/jira/browse/LUCENE-2348 suggests there are long-standing and probably still current issues with DuplicateFilter and multiple segments. I'm not sure if this could explain what you are seeing. You could try calling optimize(1) on your index writer and see if that makes a d
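
To make the segment-level point in this thread concrete, here is a minimal sketch in plain Java of one way a de-duplicating pass can go wrong when it only ever sees one segment at a time. It is a simulation under stated assumptions, not Lucene code: the class SegmentDedupSketch, both method names, and the modelling of a segment as a list of key strings are hypothetical, and it does not claim to reproduce DuplicateFilter's actual internals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation: each "segment" is just a list of duplicate-key values.
// No Lucene classes are used.
class SegmentDedupSketch {

    // Keeps the first occurrence of each key, but only looks at one segment
    // at a time, so nothing is remembered across segment boundaries.
    static List<String> dedupPerSegment(List<List<String>> segments) {
        List<String> kept = new ArrayList<>();
        for (List<String> segment : segments) {
            Set<String> seen = new HashSet<>(); // reset for every segment
            for (String key : segment) {
                if (seen.add(key)) {
                    kept.add(key);
                }
            }
        }
        return kept;
    }

    // Keeps the first occurrence of each key across the whole index,
    // which is what a "top-level" view would allow.
    static List<String> dedupWholeIndex(List<List<String>> segments) {
        Set<String> seen = new HashSet<>(); // shared by all segments
        List<String> kept = new ArrayList<>();
        for (List<String> segment : segments) {
            for (String key : segment) {
                if (seen.add(key)) {
                    kept.add(key);
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<String>> index = Arrays.asList(
                Arrays.asList("a", "b", "a"),
                Arrays.asList("b", "c"));
        System.out.println(dedupPerSegment(index));  // [a, b, b, c]
        System.out.println(dedupWholeIndex(index));  // [a, b, c]

        // The same keys held in a single segment.
        List<List<String>> merged = Arrays.asList(
                Arrays.asList("a", "b", "a", "b", "c"));
        System.out.println(dedupPerSegment(merged)); // [a, b, c]
    }
}
```

In this sketch the key "b" survives twice when the two segments are processed separately, and the per-segment and whole-index results agree once everything sits in one segment. That is consistent with the suggestion above to try optimize(1) as a diagnostic, though whether it explains the behaviour reported in this thread is not something the sketch can confirm.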