On 15.04.2013 10:42, Uwe Schindler wrote:
> Not every DocIdSet supports bits(). If it returns null, then bits are not
> supported. To enforce that a bitset is available, use CachingWrapperFilter
> (which internally uses a BitSet to cache).
> It might also happen that Filter.getDocIdSet() returns null, which means that
> no document matches the filter.

I've been using a ChainedFilter so far. I think this should also support
bits(), right?

> AcceptDocs in Lucene are generally all non-deleted documents. For your call
> to Filter.getDocIdSet you should therefore pass AtomicReader.getLiveDocs() and
> not Bits.MatchAllBits.

I see. As far as I understand the documentation, getLiveDocs() returns null
if there are no deleted documents, and otherwise returns the Bits matching all
available (not deleted) documents:

  "Returns the Bits representing live (not deleted) docs. A set bit
  indicates the doc ID has not been deleted. If this method returns null
  it means there are no deleted documents."

I understand that if there are no deleted documents, I need to replace the
result (null) with a Bits.MatchAllBits instance, right? If there are deleted
documents, however, I can pass on the result, which has all available (not
deleted) document bits set.

> You are somehow "misusing" acceptDocs and DocIdSet here, so you have to take
> care, semantics are different:
> - For acceptDocs "null" means "all documents allowed" -> no deleted documents
> - For DocIdSet "null" means "no documents matched"

Okay, as described above, I would now pass either the result of
getLiveDocs() or a Bits.MatchAllBits instance as the acceptDocs argument to
getDocIdSet():

    Map<Term, TermContext> termContexts = new HashMap<>();
    AtomicReaderContext atomic = ...
    ChainedFilter filter = ...
    Bits allDocs = atomic.reader().getLiveDocs();
    if (allDocs == null) { // no deleted documents
        allDocs = new Bits.MatchAllBits(atomic.reader().maxDoc());
    }
    Bits bits = filter.getDocIdSet(atomic, allDocs).bits();
    if (bits == null) { // no documents matching filter
        continue; // skip this iteration
    }
    Spans spans = sq.getSpans(atomic, bits, termContexts);

> Finally: The trick here is to make Spans think that there are more deleted
> docs than AtomicReader returns as deleted docs (if you would directly pass
> getLiveDocs() to getSpans()). The filter is applied to the deleted docs
> BitSet.
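
The intersection that the quoted trick relies on can be sketched in
isolation. This is only a minimal illustration under stated assumptions:
DocBits is a hypothetical stand-in for Lucene's Bits interface (so the
snippet compiles without Lucene on the classpath), and AcceptDocsSketch,
of(), liveOrAll() and restrict() are names invented for this example, not
Lucene API. Whether a given filter already applies acceptDocs itself is not
modelled here.

```java
import java.util.BitSet;

/** Hypothetical stand-in for Lucene's Bits interface (not the real class). */
interface DocBits {
    boolean get(int index);
    int length();
}

final class AcceptDocsSketch {
    private AcceptDocsSketch() {}

    /** Wraps a java.util.BitSet covering doc IDs 0..maxDoc-1 (helper for this sketch only). */
    static DocBits of(final BitSet set, final int maxDoc) {
        return new DocBits() {
            @Override public boolean get(int index) { return set.get(index); }
            @Override public int length() { return maxDoc; }
        };
    }

    /** acceptDocs convention: null means "no deletions", so every document is accepted. */
    static DocBits liveOrAll(DocBits liveDocs, final int maxDoc) {
        if (liveDocs != null) {
            return liveDocs;
        }
        return new DocBits() {
            @Override public boolean get(int index) { return true; }
            @Override public int length() { return maxDoc; }
        };
    }

    /** A document stays visible only if it is live AND matched by the filter. */
    static DocBits restrict(DocBits liveDocs, final DocBits filterBits) {
        final DocBits live = liveOrAll(liveDocs, filterBits.length());
        return new DocBits() {
            @Override public boolean get(int index) {
                return live.get(index) && filterBits.get(index);
            }
            @Override public int length() { return filterBits.length(); }
        };
    }
}
```

In the real code, the DocIdSet returned by getDocIdSet() would presumably
need its own null check before bits() is called: per the quoted explanation,
a null DocIdSet means no document matched, whereas a null result from bits()
only means that bits are not supported.
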

Yep, I think I've tried to simulate that now. It is pretty hard to test this
systematically, so please let me know if you see an obvious flaw in my code.

Thanks!

Best,
Carsten

-- 
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP                 | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789      | schno...@ids-mannheim.de
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform