Hi all. I notice that Filter.getDocIdSet() is now documented as follows:

    Note: This method will be called once per segment in the index during searching. The returned {@link DocIdSet} must refer to document IDs for that segment, not for the top-level reader.

If I look at Lucene's own DuplicateFilter, isn't it assuming that it will only be called once, for the whole index?

A related question: for those of us who want to implement something *like* DuplicateFilter (as I did before discovering this new Javadoc), is there a good way to go about it? It seems we now need to keep a hash set of all terms seen in previous segments so that, when we walk the next segment's term enum, we can check which terms have already been seen. This will dramatically increase memory usage compared to a single BitSet/OpenBitSet. Is there a better way?

Also, I presume this means that Filter is now explicitly not thread-safe. Our filters never kept any state before, but now they will have to, so there is potential for a lot of new bugs if a filter is somehow used by two queries running at the same time.

Daniel

--
Daniel Noll                  Forensic and eDiscovery Software
Senior Developer             The world's most advanced
Nuix                         email data analysis
http://nuix.com/             and eDiscovery software
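
A minimal, Lucene-free sketch of the seen-terms approach described in the post: duplicates are removed across segments by keeping a hash set of key terms already seen in earlier segments, and one BitSet of segment-local doc IDs is produced per segment. Everything here is hypothetical: PerSegmentDedupSketch, DedupState, and modelling each segment as a String[] holding one key term per document are illustrative assumptions, not Lucene API. It also assumes a single-valued key field and that "first occurrence" means first in whatever order the segments happen to be visited. The seen-terms set lives in a per-search DedupState object rather than in the filter instance, which is one way to avoid sharing mutable state between concurrent queries.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (not Lucene API) of cross-segment duplicate removal.
 * Each segment is modelled as a String[] whose index is the segment-local
 * doc ID and whose value is that document's single key term.
 */
public class PerSegmentDedupSketch {

    /** Per-search state: create one per search and never share it between threads. */
    static final class DedupState {
        // Every distinct key term seen so far, across all visited segments.
        private final Set<String> seenTerms = new HashSet<>();

        /** Returns segment-local doc IDs to keep: the first doc (in visiting order) per key term. */
        BitSet docIdSetForSegment(String[] keyTermPerDoc) {
            BitSet bits = new BitSet(keyTermPerDoc.length);
            for (int localDoc = 0; localDoc < keyTermPerDoc.length; localDoc++) {
                // Set.add returns false if the term was already seen,
                // either earlier in this segment or in a previous one.
                if (seenTerms.add(keyTermPerDoc[localDoc])) {
                    bits.set(localDoc);
                }
            }
            return bits;
        }
    }

    /** Visits the segments in list order with a fresh DedupState for this "search". */
    public static List<BitSet> filterAllSegments(List<String[]> segments) {
        DedupState state = new DedupState();
        List<BitSet> result = new ArrayList<>();
        for (String[] segment : segments) {
            result.add(state.docIdSetForSegment(segment));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> segments = List.of(
                new String[] {"a", "b", "a"},
                new String[] {"b", "c"});
        System.out.println(filterAllSegments(segments));
    }
}
```

Running main prints [{0, 1}, {1}]: the second "a" in the first segment and the "b" in the second segment are dropped, and the surviving doc IDs are segment-local. The set still holds every distinct key term, so the sketch illustrates the memory cost raised in the post rather than avoiding it.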