Re: Lucene performance benchmark | search throughput

Rajnish kamboj Thu, 05 Jan 2017 23:28:40 -0800

OK, got it

One thing I still need to know (it is not clear to me):
How does Lucene determine the most restrictive clause?


Correct me if my understanding is wrong (in the abstract):
1. During indexing, Lucene records the number of documents for
every indexed item.
2. During search, it first checks which condition matches the fewest
documents before actually iterating.
3. Then it checks that restricted set against the other conditions.

If the above is correct, how does Lucene determine the most restrictive
clause in the case of wildcard conditions?
Also, if Lucene first finds the most restrictive clause and then iterates
to match documents against the other clauses,
        when does the merging of documents happen?

Coming back to the main question I asked the Lucene community:
Are there search performance benchmarks per Lucene version, so that I
can benchmark my application's throughput against them?


Regards
Rajnish
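
A rough illustration of the "drive the iteration" behaviour that Mike
describes in the quoted reply below. This is a minimal sketch in plain Java,
not Lucene's actual code: posting lists are modelled as sorted int arrays,
array length stands in for the per-clause cost estimate (in Lucene, roughly
what DocIdSetIterator.cost() reports), and the names ConjunctionSketch,
advance and intersect are made up for this example. It suggests why there is
no separate merge step: the cheapest clause leads, and every other clause
only skips forward to the candidate doc IDs that the lead proposes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ConjunctionSketch {

    /**
     * Returns the first index >= from whose value is >= target,
     * or postings.length if there is no such entry.
     */
    static int advance(int[] postings, int from, int target) {
        int idx = Arrays.binarySearch(postings, from, postings.length, target);
        return idx >= 0 ? idx : -idx - 1;
    }

    /**
     * Intersects sorted doc-ID lists (one per MUST clause) and stops after
     * `limit` hits. The shortest list leads; the others only skip forward.
     */
    static List<Integer> intersect(int[][] clauses, int limit) {
        List<Integer> hits = new ArrayList<>();
        if (clauses.length == 0) {
            return hits;
        }
        // Assumption for this sketch: list length is the "cost" of a clause.
        int[][] sorted = clauses.clone();
        Arrays.sort(sorted, Comparator.comparingInt((int[] p) -> p.length));
        int[] lead = sorted[0];
        int[] pos = new int[sorted.length]; // current position in each list

        outer:
        for (int doc : lead) {
            if (hits.size() >= limit) {
                break;
            }
            for (int i = 1; i < sorted.length; i++) {
                pos[i] = advance(sorted[i], pos[i], doc);
                if (pos[i] == sorted[i].length) {
                    break outer;      // a clause is exhausted: no more matches
                }
                if (sorted[i][pos[i]] != doc) {
                    continue outer;   // this clause rejects the candidate
                }
            }
            hits.add(doc);            // every clause contains this doc ID
        }
        return hits;
    }

    public static void main(String[] args) {
        int[][] clauses = {
            {1, 3, 5, 7, 9, 11, 13, 15},
            {3, 9, 15},
            {2, 3, 4, 9, 10, 15, 20}
        };
        System.out.println(intersect(clauses, 10));
    }
}
```

With the sample data in main, the shortest list {3, 9, 15} leads, so each of
the two longer lists is probed only three times, and the program prints
[3, 9, 15].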

On Tue, Jan 3, 2017 at 6:09 PM, Rajnish kamboj <rajnishk7.i...@gmail.com>
wrote:

> On Tue, Jan 3, 2017 at 5:12 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> When you add MUST sub-clauses to a BooleanQuery (AND in the query
>> parsers) it can make the search run faster because Lucene will take
>> the most restrictive clause and use that to "drive" the iteration of
>> matching documents to the other clauses, allowing those other clauses
>> to iterate much faster than they would if they were not AND'd.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Tue, Jan 3, 2017 at 6:33 AM, Rajnish kamboj <rajnishk7.i...@gmail.com>
>> wrote:
>> > The answer is not clear.
>> >
>> > Suppose I have the following query and I want 10 records:
>> > Condition1 AND Condition2 AND Condition3
>> >
>> > As per my understanding, Lucene will first evaluate all conditions
>> > separately and then merge the documents as per the AND/OR clauses.
>> > Finally, it will return 10 records to me.
>> >
>> > So, if I add one more condition, it will add to the search time and
>> > merge time and hence increase latency, which results in decreased
>> > throughput.
>> >
>> >
>> > Also, what is the search performance benchmark against Lucene version?
>> >
>> >
>> > Regards
>> > Rajnish
>> >
>> > On Tuesday 3 January 2017, Michael Wilkowski <m...@silenteight.com> wrote:
>> >
>> >> My guess: more conditions = fewer documents to score and sort to return.
>> >>
>> >> On Mon, Jan 2, 2017 at 7:23 PM, Rajnish kamboj <rajnishk7.i...@gmail.com>
>> >> wrote:
>> >>
>> >> > Hi
>> >> >
>> >> > Is there any Lucene performance benchmark against a certain set of data?
>> >> > [i.e. are there any stats for the search throughput that Lucene can
>> >> > provide for a given data set?]
>> >> >
>> >> > Search throughput example:
>> >> > Max. 200 TPS for 50K data on Lucene 5.3.1 on RHEL version x (with SSD)
>> >> > Max. 150 TPS for 100K data on Lucene 5.3.1 on RHEL version x (with SSD)
>> >> > Max. 300 TPS for 50K data on Lucene 6.0.0 on RHEL version x (with SSD)
>> >> > etc.
>> >> >
>> >> > Also, does the index size matter for search throughput?
>> >> >
>> >> > Our observation:
>> >> > When we increase the data size (hence index size), the search throughput
>> >> > decreases.
>> >> > When we add more AND conditions, the search throughput increases. Why?
>> >> > Ideally, if we add more conditions, Lucene should have more work to
>> >> > do (including merging) and the throughput should decrease, yet the
>> >> > throughput increases?
>> >> >
>> >> >
>> >> > Regards
>> >> > Rajnish
>> >> >
>> >>
>>
>
>
