Looking over the implementation of SpanNearQuery I came upon what looked like a bug. Below is a test which fails due to it. SpanNearQuery doesn't return all matching spans; once it's found a span it always increments the span of the clause appearing first in that span (ie. in the example below the two spans should be "one two" and "one two two" where the second has a slop of 1 - unfortunately the span of "one" gets incremented after "one two" is found and so no additional spans get returned). Both in-order and out-of-order SpanNearQueries fail this test.
I think this is an undocumented feature and that the assumption is that if someone searches for "one" near "two" they're interested in the "one two" result and not necessarily the "one two two" result. However, SpanNearQueries can be combined and by not returning all matching spans this can result in problems. For example were we to intersect (ie. SpanNearQuery with 0 slop) between the results of different SpanNearQueries, it is possible that the shortest possible span won't intersect, while a longer span (with legal slop) would. In my mind this is a bug (at least until there is some documentation), and I would expect there to be an option (either a boolean parameter or a different class) which would indeed return all spans which satisfy the slop constraint. What I'd like to know is: 1) Is this a bug? 2) Is there any known workaround for this issue (besides rolling my own, of course)? 3) Could this bug/feature lead to problems with document scoring? Thanks, Moti import java.io.StringReader; import junit.framework.TestCase; import org.apache.lucene.analysis.standard.StandardAnalyzer; import org.apache.lucene.document.Document; import org.apache.lucene.document.Field ; import org.apache.lucene.index.IndexReader; import org.apache.lucene.index.IndexWriter; import org.apache.lucene.index.Term; import org.apache.lucene.search.spans.SpanNearQuery; import org.apache.lucene.search.spans.SpanQuery ; import org.apache.lucene.search.spans.SpanTermQuery; import org.apache.lucene.search.spans.Spans; import org.apache.lucene.store.RAMDirectory; public class SpanNearQueryTest extends TestCase { private RAMDirectory dir; @Override protected void setUp() throws Exception { super.setUp(); dir = new RAMDirectory(); Document doc = new Document(); doc.add(new Field("field", new StringReader("one two two"))); IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer()); writer.addDocument(doc); writer.close(); } public void testNearQueryInOrder() throws Exception { checkNearQuery(true); } public void testNearQueryNotInOrder() throws Exception { checkNearQuery(false); } private void checkNearQuery(boolean inOrder) throws Exception { SpanNearQuery query = new SpanNearQuery(new SpanQuery[] {new SpanTermQuery(new Term("field", "one")), new SpanTermQuery(new Term("field", "two"))}, 5, inOrder); IndexReader reader = IndexReader.open(dir); Spans spans = query.getSpans(reader); int numSpans = 0; while (spans.next()) numSpans++; reader.close(); assertEquals(2, numSpans); } @Override protected void tearDown() throws Exception { dir = null; // release directory super.tearDown(); }