The cost() method on DocIdSetIterator is responsible for telling BooleanQuery how costly that clause is, and how cost() is implemented varies by query.
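To make the role of cost() concrete, here is a minimal sketch in plain Java. It does not use Lucene's classes: Postings, LeadByCostSketch, everyNth and intersect are hypothetical names invented for this illustration, and cost() here is simply the length of an in-memory array. Lucene's real conjunction code is more involved, but the idea is the same: sort the clauses by cost, let the cheapest one lead, and ask the others to advance() to each candidate instead of walking all of their documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class LeadByCostSketch {

    /** Hypothetical stand-in for one clause's iterator over sorted doc ids. */
    static final class Postings {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        private final int[] docs;   // sorted, distinct doc ids
        private int pos = -1;       // -1 means "not started yet"
        int advanceCalls = 0;       // only here to show how little work is done

        Postings(int... docs) { this.docs = docs; }

        /** Assumption for this sketch: cost is the number of matching docs. */
        long cost() { return docs.length; }

        int docID() {
            if (pos < 0) return -1;
            return pos >= docs.length ? NO_MORE_DOCS : docs[pos];
        }

        int nextDoc() { pos++; return docID(); }

        /** Jump to the first doc >= target; target must be beyond the current doc. */
        int advance(int target) {
            advanceCalls++;
            int from = Math.min(pos + 1, docs.length);
            int idx = Arrays.binarySearch(docs, from, docs.length, target);
            pos = idx >= 0 ? idx : -idx - 1;
            return docID();
        }
    }

    /** Doc ids 0, step, 2*step, ... below maxDoc. */
    static Postings everyNth(int step, int maxDoc) {
        int[] docs = new int[(maxDoc + step - 1) / step];
        for (int i = 0; i < docs.length; i++) docs[i] = i * step;
        return new Postings(docs);
    }

    /** AND of all clauses (at least one), driven by the cheapest clause. */
    static List<Integer> intersect(List<Postings> clauses) {
        List<Postings> sorted = new ArrayList<>(clauses);
        sorted.sort(Comparator.comparingLong(Postings::cost));
        Postings lead = sorted.get(0);
        List<Integer> hits = new ArrayList<>();
        int doc = lead.nextDoc();
        while (doc != Postings.NO_MORE_DOCS) {
            boolean match = true;
            for (int i = 1; i < sorted.size(); i++) {
                Postings other = sorted.get(i);
                int otherDoc = other.docID();
                if (otherDoc < doc) otherDoc = other.advance(doc);
                if (otherDoc > doc) {
                    // This clause has no such doc; skip the lead ahead to its position.
                    if (otherDoc == Postings.NO_MORE_DOCS) return hits;
                    doc = lead.advance(otherDoc);
                    match = false;
                    break;
                }
            }
            if (match) {
                hits.add(doc);
                doc = lead.nextDoc();
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Postings rare = new Postings(5, 42, 900);   // cost 3
        Postings medium = everyNth(5, 1000);        // cost 200
        Postings common = everyNth(1, 1000);        // cost 1000
        System.out.println(intersect(List.of(common, medium, rare))); // prints [5, 900]
    }
}
```

In this toy run the 1000-document clause is only asked to advance() twice, which is the effect described earlier in the thread: adding a restrictive MUST clause lets the other clauses skip most of their documents.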
For the multi-term queries, like WildcardQuery, Lucene will first visit all matched terms (during the Query.rewrite phase), and rewrite the query either into a disjunction (SHOULD of the N terms), or it will, per segment, visit all docs for all matching terms, setting them in a sparse or dense bitset, recording the cost as the number of documents.

But there is work underway now to try to improve the multi-term query cases so that we don't go and do all that up-front work (visiting all terms, and all docs matching each term) when another clause in the boolean query is more restrictive:

https://issues.apache.org/jira/browse/LUCENE-7055

Mike McCandless

http://blog.mikemccandless.com

On Fri, Jan 6, 2017 at 2:28 AM, Rajnish kamboj <rajnishk7.i...@gmail.com> wrote:
> OK, got it
>
> One thing still I need to know (which is not clear to me)....
> How does Lucene calculates the most restrictive clause?
>
> Correct me, if I am wrong in my understanding (in abstract):
> 1. During indexing, Lucene keeps information of documents count against every indexed items.
> 2. During search, it first checks, which condition has less number of documents count before actually iterating.
> 3. Then, it iterates that restricted set against other set of conditions.
>
> If the above is correct then how does Lucene calculates most restrictive clause in case of Wildcard conditions?
> Also, if Lucene first check for most restrictive clause, and then iterate to match documents to the other clauses,
> Then when will the merging of documents happen?
>
> Coming on to my main query for which I ask question in Lucene community:
> What is the search performance benchmark against Lucene version, so that I can benchmark my application throughput?
>
> Regards
> Rajnish
>
> On Tue, Jan 3, 2017 at 6:09 PM, Rajnish kamboj <rajnishk7.i...@gmail.com> wrote:
>>
>> OK, got it
>>
>> One thing still I need to know (which is not clear to me)....
>> How does Lucene calculates the most restrictive clause?
>>
>> Correct me, if I am wrong in my understanding (in abstract):
>> 1. During indexing, Lucene keeps information of documents count against every indexed items.
>> 2. During search, it first checks, which condition has less number of documents count before actually iterating.
>> 3. Then, it iterates that restricted set against other set of conditions.
>>
>> If the above is correct then how does Lucene calculates most restrictive clause in case of Wildcard conditions?
>> Also, if Lucene first check for most restrictive clause, and then iterate to match documents to the other clauses,
>> Then when will the merging of documents happen?
>>
>> Coming on to my main query for which I ask question in Lucene community:
>> What is the search performance benchmark against Lucene version, so that I can benchmark my application throughput?
>>
>> On Tue, Jan 3, 2017 at 5:12 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>>>
>>> When you add MUST sub-clauses to a BooleanQuery (AND to the query
>>> parsers) it can make the search run faster because Lucene will take
>>> the most restrictive clause and use that to "drive" the iteration of
>>> matching documents to the other clauses, allowing those other clauses
>>> to iterate much faster than they would otherwise require if they were
>>> not AND'd.
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Tue, Jan 3, 2017 at 6:33 AM, Rajnish kamboj <rajnishk7.i...@gmail.com> wrote:
>>> > The answer is not clear.
>>> >
>>> > Suppose I have following query and I want 10 records.
>>> > Condition1 AND Condition2 AND Condition3
>>> >
>>> > As per my understanding Lucene will first evaluate all conditions
>>> > separately and then merge the Documents as per AND/OR clauses.
>>> > At last it will return me 10 records.
>>> >
>>> > So, if I add one more condition, then it will add to search time and merge
>>> > time and hence increase latency, which results in decreased throughput.
>>> >
>>> > Also, what is the search performance benchmark against Lucene version?
>>> >
>>> > Regards
>>> > Rajnish
>>> >
>>> > On Tuesday 3 January 2017, Michael Wilkowski <m...@silenteight.com> wrote:
>>> >
>>> >> My guess: more conditions = less documents to score and sort to return.
>>> >>
>>> >> On Mon, Jan 2, 2017 at 7:23 PM, Rajnish kamboj <rajnishk7.i...@gmail.com> wrote:
>>> >>
>>> >> > Hi
>>> >> >
>>> >> > Is there any Lucene performance benchmark against certain set of data?
>>> >> > [i.e Is there any stats for search throughput which Lucene can provide for a certain data?]
>>> >> >
>>> >> > Search throughput Example:
>>> >> > Max. 200 TPS for 50K data on Lucene 5.3.1 on RHEL version x (with SSD)
>>> >> > Max. 150 TPS for 100K data on Lucene 5.3.1 on RHEL version x (with SSD)
>>> >> > Max. 300 TPS for 50K data on Lucene 6.0.0 on RHEL version x (with SSD)
>>> >> > etc.
>>> >> >
>>> >> > Also, does the index size matters for search throughput?
>>> >> >
>>> >> > Our observation:
>>> >> > When we increase the data size (hence index size) the search throughput decreases.
>>> >> > When we add more AND conditions, the search throughput increases. Why?
>>> >> > Ideally if we add more conditions then the Lucene should have more work to
>>> >> > do (including merging) and the throughput should decrease but the
>>> >> > throughput increases?
>>> >> >
>>> >> > Regards
>>> >> > Rajnish