It would be great if someone could show how to do it. For my application, I do not care about scores (a plain boolean search is sufficient).

In the meantime, I experimented with a workaround and would like to share the findings.

Problem details: On a collection of 10 million documents, I want to run boolean queries. These boolean queries act as document classifiers for us, and there are about 1500 of them (each with about 300 boolean clauses). If a document matches a query, we want to know which parts of the boolean query matched it (this is a BI application that does text analytics, and we need the counts for each matched boolean clause for statistics purposes).

As a workaround, I create a filter from the original boolean query, cache it, and then fire each boolean sub-query with that filter. This has given me a large performance gain (these are initial observations; I am still evaluating the performance).

Some pseudo-code:

    Filter filter = new QueryWrapperFilter(bigBooleanQuery);
    CachingWrapperFilter cachingFilter = new CachingWrapperFilter(filter);
    // then fire each boolean subQuery with the cached filter, e.g.
    // searcher.search(subQuery, cachingFilter, collector);

On Wed, Jul 18, 2012 at 9:25 PM, Michael McCandless <luc...@mikemccandless.com> wrote:

> This is possible, using the ScorerVisitor (3.6) / getChildren (4.0).
> You need a custom collector that, when it collects a competitive hit,
> visits the sub-scorers of your BooleanQuery and saves away which ones
> matched the current doc.
>
> But this is very expert and there are real challenges (eg not all
> scorers score document-at-a-time) ... would be nice if someone wrote
> up some example code showing how to do it...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed, Jul 18, 2012 at 7:17 AM, Ashish Jaen <ashishj...@gmail.com> wrote:
> > Is there a way to know which sub-clauses of a boolean query matched in a
> > result document? Currently I am using searcher.explain() on each of the
> > sub-clauses of the boolean query (on each of the documents returned by the
> > searcher).
> > However, this is turning out to be very slow, as I need to
> > process ALL the documents returned by the query. (A typical query returns
> > about 20 thousand documents and my collection has 10 million docs. My
> > application is not a user-facing one, so a few seconds per query is still
> > acceptable.)
> >
> > I was wondering if there is an efficient way to achieve the above which
> > does not use explain() (perhaps storing the information about which
> > sub-clause matched a document while searching). Can anyone suggest a
> > method to solve this and point to the relevant classes which need to be
> > changed?
> >
> > Thanks,
> > -Ashish
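
P.S. To make the cached-filter workaround from the top of this message a bit more concrete, here is a toy sketch in plain Java. It is an in-memory model only and does not use Lucene at all: java.util.BitSet stands in for the doc-ID set that the cached filter would hold, and the class name, the helper names, and the example data (apple/banana/spam) are all made up for illustration. The idea it shows is just this: evaluate the whole boolean query once, keep the resulting doc set, and intersect each sub-clause with it to get per-clause counts.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy in-memory model of the cached-filter idea; this is NOT Lucene code. */
final class ClauseMatchCounter {

    /** Builds a doc-ID set from the given IDs (hypothetical helper). */
    static BitSet docs(int... ids) {
        BitSet set = new BitSet();
        for (int id : ids) {
            set.set(id);
        }
        return set;
    }

    /**
     * Counts, per clause, the documents that match both the clause and the
     * cached set for the whole query. The cached set is computed once and
     * reused for every clause, which is the point of caching the filter.
     */
    static Map<String, Integer> countPerClause(Map<String, BitSet> clauses, BitSet cachedFilter) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> entry : clauses.entrySet()) {
            BitSet hits = (BitSet) entry.getValue().clone(); // leave the clause's own set untouched
            hits.and(cachedFilter);
            counts.put(entry.getKey(), hits.cardinality());
        }
        return counts;
    }

    /** Made-up example data: the whole query is (apple OR banana) AND NOT spam. */
    static Map<String, Integer> demoCounts() {
        Map<String, BitSet> clauses = new LinkedHashMap<>();
        clauses.put("apple", docs(0, 1, 2, 5));
        clauses.put("banana", docs(2, 3, 6));

        // Evaluate the whole query once; this plays the role of the cached filter.
        BitSet cachedFilter = new BitSet();
        for (BitSet clause : clauses.values()) {
            cachedFilter.or(clause);
        }
        cachedFilter.andNot(docs(1, 6)); // docs matching the prohibited "spam" clause

        return countPerClause(clauses, cachedFilter);
    }

    public static void main(String[] args) {
        System.out.println(demoCounts());
    }
}
```

In this made-up data set the whole query matches docs 0, 2, 3 and 5, so "apple" is counted for three of them and "banana" for two. In the real setup the per-clause work would be done by the searcher with the cached filter rather than by BitSet intersections, so treat this only as a picture of the approach, not as a measure of its performance.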