Hi,

> > AcceptDocs in Lucene are generally all non-deleted documents. For your
> > call to Filter.getDocIdSet you should therefore pass
> > AtomicReader.getLiveDocs() and not Bits.MatchAllBits.
>
> I see. As far as I understand the documentation, getLiveDocs() returns null if
> there are no deleted documents and returns the Bits matching all available
> (not deleted) documents otherwise:
>
> "Returns the Bits representing live (not deleted) docs. A set bit indicates
> the doc ID has not been deleted. If this method returns null it means there
> are no deleted documents."
>
> I understand that if there are no deleted documents, I need to replace the
> result (null) with Bits.MatchAllDocuments(), right? If there are deleted
> documents however, I can pass on the result having all available (not
> deleted) document bits set.

No. If acceptDocs == null, the filter/query/... assumes that there are no
deleted documents. Just pass null.

> > You are somehow "misusing" acceptDocs and DocIdSet here, so you have
> > to take care, semantics are different:
> > - For acceptDocs "null" means "all documents allowed" -> no deleted
> >   documents
> > - For DocIdSet "null" means "no documents matched"
>
> Okay, as described above, I would now pass either the result of
> getLiveDocs() or Bits.MatchAllDocuments() as the acceptDocs argument to
> getDocIdSet():
>
> Map<Term, TermContext> termContexts = new HashMap<>();
> AtomicReaderContext atomic = ...
> ChainedFilter filter = ...

Just pass getLiveDocs() directly; no null check is needed. Your version
would slow things down for indexes without deletions.
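
To make the two "null" conventions concrete, here is a small stand-alone
sketch in plain Java (no Lucene on the classpath). The Bits interface in it
is only a stand-in that mirrors the two methods of
org.apache.lucene.util.Bits, the filter result is simplified to a Bits as
well, and AcceptDocsSketch, wrap, isAccepted, isMatched and countVisible are
names made up for the example. Treat it as a model of the semantics, not as
real Lucene code:

```java
import java.util.BitSet;

// Stand-alone model of the two "null" conventions; not Lucene code.
class AcceptDocsSketch {

    // Stand-in mirroring the two methods of org.apache.lucene.util.Bits.
    interface Bits {
        boolean get(int index);
        int length();
    }

    // Hypothetical helper: view a java.util.BitSet as Bits over maxDoc documents.
    static Bits wrap(final BitSet set, final int maxDoc) {
        return new Bits() {
            @Override public boolean get(int index) { return set.get(index); }
            @Override public int length() { return maxDoc; }
        };
    }

    // acceptDocs convention: null means "no deletions", so every document is allowed.
    static boolean isAccepted(Bits acceptDocs, int doc) {
        return acceptDocs == null || acceptDocs.get(doc);
    }

    // Result convention (simplified): null means "no documents matched".
    static boolean isMatched(Bits matches, int doc) {
        return matches != null && matches.get(doc);
    }

    // Counts documents that are both live and matched by the filter.
    static int countVisible(Bits liveDocs, Bits matches, int maxDoc) {
        int count = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (isAccepted(liveDocs, doc) && isMatched(matches, doc)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        BitSet hits = new BitSet();
        hits.set(1);
        hits.set(2);
        // Index without deletions: liveDocs is null and is passed through as-is.
        System.out.println(countVisible(null, wrap(hits, 4), 4));
    }
}
```

In this model a null acceptDocs short-circuits and get() is never called,
whereas a match-all Bits would be consulted once per document without
changing the result.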

> Bits allDocs = atomic.reader().getLiveDocs();
> if (allDocs == null) {
>   // no deleted documents
>   allDocs = new Bits.MatchAllBits(atomic.reader().maxDoc());
> }
> Bits bits = filter.getDocIdSet(atomic, allDocs).bits();
> if (bits == null) {
>   // no documents matching filter
>   continue; // skip this iteration
> }
> Spans spans = sq.getSpans(atomic, bits, termContexts);
>
> > Finally: The trick here is to make Spans think that there are more deleted
> > docs than AtomicReader returns as deleted docs (if you would directly pass
> > getLiveDocs() to getSpans()). The filter is applied to the deleted docs
> > BitSet.
>
> Yep, I think I've tried to simulate that now. It is pretty hard to test this
> systematically, so please let me know if you see an obvious flaw in my code.
> Thanks!
>
> Best,
> Carsten
>
> --
> Institut für Deutsche Sprache | http://www.ids-mannheim.de
> Projekt KorAP | http://korap.ids-mannheim.de
> Tel. +49-(0)621-43740789 | schno...@ids-mannheim.de
> Korpusanalyseplattform der nächsten Generation
> Next Generation Corpus Analysis Platform

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org