Hi,

> > AcceptDocs in Lucene are generally all non-deleted documents. For your
> > call to Filter.getDocIdSet you should therefore pass
> > AtomicReader.getLiveDocs() and not Bits.MatchAllBits.
> 
> I see. As far as I understand the documentation, getLiveDocs() returns null if
> there are no deleted documents and returns the Bits matching all available
> (not deleted) documents otherwise:
> "Returns the Bits representing live (not deleted) docs. A set bit indicates
> the doc ID has not been deleted. If this method returns null it means there
> are no deleted documents."
> I understand that if there are no deleted documents, I need to replace the
> result (null) with a Bits.MatchAllBits instance, right? If there are deleted
> documents however, I can pass on the result having all available (not
> deleted) document bits set.

No. If acceptDocs == null, the filter/query/... assumes that there are no
deleted documents, so every document is accepted. Just pass null.

> > You are somehow "misusing" acceptDocs and DocIdSet here, so you have
> > to take care, the semantics are different:
> > - For acceptDocs "null" means "all documents allowed" -> no deleted
> > documents
> > - For DocIdSet "null" means "no documents matched"
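
The two null conventions can be sketched in plain Java. This is only an illustration: java.util.BitSet stands in for Lucene's Bits and DocIdSet, and the class and method names are made up for this example, not Lucene API.

```java
import java.util.BitSet;

final class NullConventions {
    /** acceptDocs convention: null means every document is allowed. */
    static boolean isAccepted(BitSet acceptDocs, int doc) {
        return acceptDocs == null || acceptDocs.get(doc);
    }

    /** DocIdSet convention: null means the filter matched no document. */
    static boolean isMatched(BitSet matchedDocs, int doc) {
        return matchedDocs != null && matchedDocs.get(doc);
    }

    public static void main(String[] args) {
        // The same null input gives opposite answers under the two conventions.
        System.out.println("accepted: " + isAccepted(null, 3));
        System.out.println("matched:  " + isMatched(null, 3));
    }
}
```

So a null that is fine as acceptDocs must not be forwarded unchanged where a DocIdSet result is expected, and vice versa.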
> 
> Okay, as described above, I would now pass either the result of
> getLiveDocs() or a Bits.MatchAllBits instance as the acceptDocs argument to
> getDocIdSet():
> 
> Map<Term, TermContext> termContexts = new HashMap<>();
> AtomicReaderContext atomic = ...
> ChainedFilter filter = ...

Just pass the result of getLiveDocs() as is; no null check is needed. Replacing
null with a MatchAllBits instance, as your code does, would slow things down
for indexes without deletions.
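
To illustrate where the extra cost comes from, here is a rough sketch in plain Java. The Bits interface below is a hypothetical stand-in with the same two methods as Lucene's, and countLive() and CountingMatchAll are invented for this example. It assumes a consumer that treats null as "no deletions" and can therefore skip the per-document check.

```java
final class LiveDocsFastPath {
    /** Hypothetical stand-in for Lucene's Bits interface. */
    interface Bits {
        boolean get(int index);
        int length();
    }

    /** Matches every document and counts how often get() is called. */
    static final class CountingMatchAll implements Bits {
        private final int len;
        int calls = 0;

        CountingMatchAll(int len) { this.len = len; }

        @Override public boolean get(int index) { calls++; return true; }
        @Override public int length() { return len; }
    }

    /** Counts accepted documents; null acceptDocs means no deletions. */
    static int countLive(Bits acceptDocs, int maxDoc) {
        if (acceptDocs == null) {
            return maxDoc; // fast path: nothing to check per document
        }
        int live = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (acceptDocs.get(doc)) {
                live++;
            }
        }
        return live;
    }
}
```

Both inputs give the same count, but the match-all wrapper forces one get() call per document, while null costs nothing.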

> Bits allDocs = atomic.reader().getLiveDocs();
> if (allDocs == null) {
>   // no deleted documents
>   allDocs = new Bits.MatchAllBits(atomic.reader().maxDoc());
> }
> Bits bits = filter.getDocIdSet(atomic, allDocs).bits();
> if (bits == null) {
>   // no documents matching filter
>   continue; // skip this iteration
> }
> Spans spans = sq.getSpans(atomic, bits, termContexts);
> 
> 
> > Finally: The trick here is to make Spans think that there are more deleted
> > docs than AtomicReader returns as deleted docs (if you would directly pass
> > getLiveDocs() to getSpans()). The filter is applied to the deleted docs
> > BitSet.
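
Conceptually, the trick is an intersection of the live documents with the filter's matches. A minimal sketch in plain Java, again with java.util.BitSet as a stand-in and a made-up restrict() helper (not Lucene API); it assumes null liveDocs means "no deletions" and null filterDocs means "no documents matched":

```java
import java.util.BitSet;

final class RestrictedAcceptDocs {
    /**
     * Builds the set to hand on as acceptDocs: documents that are both
     * live and matched by the filter. Everything else looks "deleted".
     */
    static BitSet restrict(BitSet liveDocs, BitSet filterDocs, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        if (filterDocs == null) {
            return result; // filter matched nothing: accept no document
        }
        result.or(filterDocs); // start from the filter's matches
        if (liveDocs != null) {
            result.and(liveDocs); // drop documents that are really deleted
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet live = new BitSet(4);
        live.set(0);
        live.set(1);
        live.set(3); // document 2 is deleted

        BitSet filter = new BitSet(4);
        filter.set(1);
        filter.set(2); // filter matches documents 1 and 2

        // Only document 1 is both live and matched.
        System.out.println(restrict(live, filter, 4));
    }
}
```

In the real setup the filter already receives liveDocs as its acceptDocs, so the intersection happens inside the filter; the sketch only spells out the resulting set.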
> 
> Yep, I think I've tried to simulate that now. It is pretty hard to test this
> systematically, so please let me know if you see an obvious flaw in my code.
> Thanks!
> Best,
> Carsten
> 
> --
> Institut für Deutsche Sprache | http://www.ids-mannheim.de
> Projekt KorAP                 | http://korap.ids-mannheim.de
> Tel. +49-(0)621-43740789      | schno...@ids-mannheim.de
> Korpusanalyseplattform der nächsten Generation Next Generation Corpus
> Analysis Platform
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org


