Op Monday 12 May 2008 09:06:36 schreef Eran Sevi: > Thanks Paul, > > I'll give your code sample a try. > I still think that calling getSpans (the first line of code) that > returns millions of results is going to be much slower than calling > getSpans that's going to return only a few thousands of results. > Since the filtering is only performed after calling this method it > can't help in this case.
The given untested code (apart from possible bugs) would process exactly as many documents and their Spans as SpanScorer would in the explicitly filtered case. > I guess your suggested solution is my best option without changing > the way getSpans works (which I'm not going to change any time soon ) Before doing that, have a look at the code of SpanWeight/SpanScorer, ConjunctionScorer, and the filtering code in IndexSearcher. Regards, Paul Elschot. > Eran. > > On Wed, May 7, 2008 at 7:22 PM, Paul Elschot <[EMAIL PROTECTED]> wrote: > > Op Wednesday 07 May 2008 10:18:38 schreef Eran Sevi: > > > Thanks Paul for your reply, > > > > > > Since my index contains a couple of millions documents and the > > > filter is supposed to limit the search space to a few thousands I > > > was hoping I won't have to do the filtering myself after running > > > the query on all the index. > > > > The code I gave earlier effectively does a filtered query search > > on the index. It visits the resulting Spans, and does not provide > > a score value per document as SpanScorer would do. > > Please make sure to test that code thoroughly for reliable results. > > > > > Maybe this is the case anyway and behind the scenes the filter > > > does exactly what you suggested. > > > > Yes, a filtered query search would use skipTo() on the Spans via > > SpanScorer. But the difference between the normal case > > and your case is that you don't need SpanScorer. > > > > > From what I tested the number of results of the SpanQuery greatly > > > affects the running speed so if I'm going to use about 0.1% of > > > the results I'm loosing a lot of time and memory for gathering > > > and storing the spans I'm not going to use. > > > > > > I don't know how SpanQuery works internally but I guess that if > > > the filter is known beforehand, > > > > A Filter needs to make a BitSet available before the query search. > > > > > it could speed things up quite a bit. > > > > I would expect a substantial speedup from using skipTo() on the > > Spans when only 0.1% of the results passes the filter. > > > > Regards, > > Paul Elschot > > > > > Eran. > > > > > > > > > On Wed, May 7, 2008 at 10:34 AM, Paul Elschot > > > <[EMAIL PROTECTED]> > > > > > > wrote: > > > > Op Tuesday 06 May 2008 17:39:38 schreef Paul Elschot: > > > > > Eran, > > > > > > > > > > Op Tuesday 06 May 2008 10:15:10 schreef Eran Sevi: > > > > > > Hi, > > > > > > > > > > > > I am looking for a way to filter a SpanQuery according to > > > > > > some other query (on another field from the one used for > > > > > > the SpanQuery). I need to get access to the spans > > > > > > themselves of course. I don't care about the scoring of the > > > > > > filter results and just need the positions of hits found in > > > > > > the documents that matches the filter. > > > > > > > > > > I think you'll have to implement the filtering on the Spans > > > > > yourself. That's not really difficult, just use > > > > > Spans.skipTo(). The code to do that could look sth like this > > > > > (untested): > > > > > > > > > > Spans spans = yourSpanQuery.getSpans(reader); > > > > > BitSet bits = yourFilter.bits(reader); > > > > > int filterDoc = bits.nextSetBit(0); > > > > > while ((filterDoc >= 0) and spans.skipTo(filterDoc)) { > > > > > boolean more = true; > > > > > while (more and (spans.doc() == filterDoc)) { > > > > > // use spans.start() and spans.end() here > > > > > // ... > > > > > more = spans.next(); > > > > > } > > > > > if (! more) { > > > > > break; > > > > > } > > > > > filterDoc = bits.nextSetBit(spans.doc()); > > > > > > > > At this point, no skipping on the spans should be done when > > > > filterDoc equals spans.doc(), so this code still needs some > > > > work. But I think you get the idea. > > > > > > > > Regards, > > > > Paul Elschot > > > > > > > > > } > > > > > > > > > > Please check the javadocs of java.util.BitSet, there may > > > > > be a 1 off error in the arguments to nextSetBit(). > > > > > > > > > > Regards, > > > > > Paul Elschot > > > > > > > > > > > I tried looking through the archives and found some > > > > > > reference to a SpanQueryFilter patch, however I don't see > > > > > > how it can help me achieve what I want to do. This class > > > > > > receives only one query parameter (which I guess is the > > > > > > actual query) and not a query and a filter for example. > > > > > > > > > > > > Any help about how I can achieve this will be appreciated. > > > > > > > > > > > > Thanks, > > > > > > Eran. > > > > > > > > > > ------------------------------------------------------------- > > > > >---- ---- To unsubscribe, e-mail: > > > > > [EMAIL PROTECTED] For additional > > > > > commands, e-mail: [EMAIL PROTECTED] > > > > > > > > --------------------------------------------------------------- > > > >---- -- To unsubscribe, e-mail: > > > > [EMAIL PROTECTED] For additional > > > > commands, e-mail: [EMAIL PROTECTED] > > > > ------------------------------------------------------------------- > >-- To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]