RE: Lucene 4 getSpans not retrieving spans

Uwe Schindler Wed, 25 Jan 2012 14:22:15 -0800

Hi,
 
> Goofing off with my index, I ran across this example
> http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a-positional-match-in-lucene/
> for using span queries to see what else is around a word that hits. Noticeably,
> there's a nice getSpans(IndexReader) method that just takes in the index reader
> and returns all the span objects, something not present in Lucene 4.
> I'm trying to replicate this in Lucene 4.0 to see how viable it is and despite
> having my span query hit on 10 documents, I cannot retrieve any spans. The API
> for doing this got remarkably more complex!
> 
> My code reads as follows:
> IndexReader ir = search.getIndexReader();
> TermContext tmctxt = TermContext.build(ir.getTopReaderContext(),
>     testSpan.getTerm(), false);
> Map termMap = new HashMap();
> termMap.put(testSpan.getTerm(), tmctxt);
> AtomicReaderContext ac = new IndexReader.AtomicReaderContext(ir);


Don't do this. To get the top-level context of an IndexReader, use
IndexReader.getTopReaderContext(). What you are doing here is creating an
atomic context on an index reader that might not be atomic; this can be the
reason for the failure. It may also throw random exceptions.

BTW: There is currently a lot of work going on to refactor IndexReader into
two separate classes (CompositeIndexReader and AtomicIndexReader), so the
many methods that throw UnsupportedOperationException will go away; see
https://issues.apache.org/jira/browse/LUCENE-2858. After that, you can only
get and execute spans/queries/filters/TermsEnum/DocsEnum on an
AtomicIndexReader, and the corresponding contexts will be type safe.
Currently this is one of the most inconsistent and programmer-unfriendly
parts of the Lucene API, because most IndexReaders in Lucene (like
DirectoryReader or MultiReader) are composite readers that no longer have
low-level terms/postings APIs. The new API will strictly separate the two
types. Also, things like reopen will move out of the abstract IndexReader
interface.
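To make the composite/atomic split concrete, here is a toy, Lucene-free model of it in plain Java. The class names ToyAtomicReader, ToyLeaf and ToyCompositeReader are hypothetical and are not Lucene's API. Only the atomic type exposes postings; the composite type can only hand out its leaves, each tagged with a docBase (the number of documents in the preceding leaves) that turns segment-local document numbers into index-wide ones.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, not Lucene's API: only atomic readers expose postings.
final class ToyAtomicReader {
    final int maxDoc;
    // term -> segment-local doc IDs that contain the term
    private final Map<String, List<Integer>> postings = new HashMap<>();

    ToyAtomicReader(int maxDoc) { this.maxDoc = maxDoc; }

    void add(String term, int localDoc) {
        List<Integer> docs = postings.get(term);
        if (docs == null) {
            docs = new ArrayList<>();
            postings.put(term, docs);
        }
        docs.add(localDoc);
    }

    List<Integer> docs(String term) {
        List<Integer> docs = postings.get(term);
        return docs == null ? Collections.<Integer>emptyList() : docs;
    }
}

// One leaf of a composite reader: an atomic reader plus its offset in the index.
final class ToyLeaf {
    final ToyAtomicReader reader;
    final int docBase;

    ToyLeaf(ToyAtomicReader reader, int docBase) {
        this.reader = reader;
        this.docBase = docBase;
    }
}

// Toy composite reader: deliberately has no postings methods, only leaves.
final class ToyCompositeReader {
    private final List<ToyLeaf> leaves = new ArrayList<>();
    private int maxDoc = 0;

    void addSegment(ToyAtomicReader segment) {
        leaves.add(new ToyLeaf(segment, maxDoc)); // docBase = docs in earlier segments
        maxDoc += segment.maxDoc;
    }

    List<ToyLeaf> leaves() { return Collections.unmodifiableList(leaves); }

    int maxDoc() { return maxDoc; }
}
```

Because ToyCompositeReader has no docs(term) method, trying to query it directly is a compile error rather than a runtime UnsupportedOperationException; callers must walk leaves() and add each leaf's docBase to the local doc IDs they find.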

The above code will not compile at all after the IndexReader refactoring :-)
The problem is that the IndexReader you get from the IndexSearcher is a
composite reader, but you try to execute queries on it. This is no longer
possible. You have to ask the reader for its index segments and run the
search on each low-level atomic SegmentReader separately. Alternatively,
wrap your IndexReader with SlowMultiReaderWrapper, which creates an atomic
"view" of the index; it is simply slow, but it emulates the behavior that is
still possible in Lucene 3.x [but also slow there] :-)

> Bits bits = new Bits.MatchAllBits(0);
> Spans spans = testSpan.getSpans(ac, bits, termMap);

This asks for spans with no deleted documents in an index of size 0, which
cannot work.
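A toy, Lucene-free sketch of what a working per-segment loop does for a single-term span query: visit each segment separately, use that segment's own live docs (null when nothing is deleted) as the accept set instead of a zero-length MatchAllBits, and add the segment's docBase to turn local doc IDs into index-wide ones. ToySegment and ToySpanCollector are hypothetical names for illustration only. In later Lucene 4.x releases the real counterparts are IndexReader.leaves(), AtomicReaderContext.docBase and AtomicReader.getLiveDocs(), which may not have existed in this form on trunk when this mail was written.

```java
import java.util.ArrayList;
import java.util.List;

// Toy segment: one term's hits plus the segment's deletions.
final class ToySegment {
    final int maxDoc;
    final boolean[] liveDocs;                   // null means "no deletions": every doc accepted
    final List<int[]> hits = new ArrayList<>(); // {localDoc, position}, in doc order

    ToySegment(int maxDoc, boolean[] liveDocs) {
        if (liveDocs != null && liveDocs.length != maxDoc) {
            throw new IllegalArgumentException("liveDocs must cover every doc in the segment");
        }
        this.maxDoc = maxDoc;
        this.liveDocs = liveDocs;
    }
}

final class ToySpanCollector {
    // Returns {globalDoc, start, end} for every accepted hit, segment by segment.
    static List<int[]> collect(List<ToySegment> segments) {
        List<int[]> out = new ArrayList<>();
        int docBase = 0;
        for (ToySegment seg : segments) {
            boolean[] acceptDocs = seg.liveDocs; // per segment, sized to this segment's maxDoc
            for (int[] hit : seg.hits) {
                int localDoc = hit[0];
                if (acceptDocs != null && !acceptDocs[localDoc]) {
                    continue; // deleted document: skip it
                }
                // A single-term span covers exactly one position: [start, start + 1).
                out.add(new int[] {docBase + localDoc, hit[1], hit[1] + 1});
            }
            docBase += seg.maxDoc; // next segment's docs start after this one's
        }
        return out;
    }
}
```

With SlowMultiReaderWrapper there is only one "segment" covering the whole index, so docBase stays 0; the per-segment loop avoids that wrapper's overhead.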

> However, spans never returns a spans object, spans.next() always returns false.
> 
> Am I missing anything?
> 
> Thanks!
> Stephen

