Testing out a smaller "rows" param is key. Then you can isolate the
performance difference due to the 500 rows. Adding more shards is going to
increase the penalty for having 500 rows, so it's good to understand how
big that penalty is.

Then test smaller result sets by adjusting the query, and gradually
increase the result set size. That gives you a feel for how result set
size affects performance, and an indication of how much more shards
would help.
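
One way to run the "rows" comparison is a small timing script. The sketch below is only an illustration under stated assumptions: the host and collection name in SOLR_SELECT_URL are placeholders, defType=edismax is assumed because the example query uses qf/pf/boost, and the boost parameter is written as boost=def(boostFieldA,1) as suggested further down the thread. It reads QTime from the responseHeader of the JSON response and reports the median per "rows" value. Repeated identical requests may be answered from Solr's caches, so treat the numbers as rough.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: host and collection name are placeholders.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycollection/select"

# Parameters taken from the example query in the thread.
# defType=edismax is an assumption, not stated in the original mail.
BASE_PARAMS = {
    "defType": "edismax",
    "q": "hello world",
    "qf": "title description keywords",
    "pf": "title^0.5",
    "ps": "0",
    "fq": "type:P",
    "boost": "def(boostFieldA,1)",
    "bf": "mul(termfreq(termScoreFieldB,$q),1000.0)",
    "fl": "id,score",
    "wt": "json",
}


def build_url(rows, base_url=SOLR_SELECT_URL, params=BASE_PARAMS):
    """Return the select URL for the base query with the given rows value."""
    query = dict(params, rows=str(rows))
    return base_url + "?" + urllib.parse.urlencode(query)


def http_fetch(url):
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def measure(rows_values, fetch=http_fetch, repeats=5):
    """Map each rows value to the median QTime (ms) over `repeats` requests."""
    results = {}
    for rows in rows_values:
        url = build_url(rows)
        qtimes = [fetch(url)["responseHeader"]["QTime"] for _ in range(repeats)]
        results[rows] = sorted(qtimes)[len(qtimes) // 2]
    return results
```

Calling measure([10, 50, 100, 500]) against a test node returns a dict of rows value to median QTime. The same loop can be reused with a narrower q or an extra fq to step the result set size up gradually.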





Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Jan 19, 2022 at 6:19 AM Charlie Hull <
ch...@opensourceconnections.com> wrote:

> Hi Ashwin,
>
> What happens if you reduce the number of rows requested? Do you really
> need 500 results each time? I think this will ask for 500 results from
> *each shard* too.
> https://solr.apache.org/guide/8_7/pagination-of-results.html
>
> Also it looks like you mean boost=def(boostFieldA,1) not
> boost:def(boostFieldA,1), am I right?
>
> Cheers
>
> Charlie
>
> On 19/01/2022 02:43, Ashwin Ramesh wrote:
> > Gentle ping! Promise it's my final one! :)
> >
> > On Thu, Jan 13, 2022 at 8:01 AM Ashwin Ramesh<ash...@canva.com>  wrote:
> >
> >> Hi everyone,
> >>
> >> I have a few questions about how we can improve our solr query
> >> performance, especially for boosts (BF, BQ, boost, etc).
> >>
> >> *System Specs:*
> >> Solr Version: 7.7.x
> >> Heap Size: 31gb
> >> Num Docs: >100M
> >> Shards: 8
> >> Replication Factor: 6
> >> Index is completely mapped into memory
> >>
> >>
> >> Example query:
> >> {
> >> q=hello world
> >> qf=title description keywords
> >> pf=title^0.5
> >> ps=0
> >> fq=type:P
> >> boost:def(boostFieldA,1) // boostFieldA is docValue float type
> >> bf=mul(termfreq(termScoreFieldB,$q),1000.0) // termScoreFieldB is a
> >> textField. No docValue, just indexed
> >> rows:500
> >> fl=id,score
> >> }
> >>
> >> numFound: >21M
> >> qTime: 800ms
> >>
> >> Experimentation of params:
> >>
> >>     - When I remove the boost parameter, the qTime drops to 525ms
> >>     - When I remove the bf parameter, the qTime drops to 650ms
> >>     - When I remove both the boost & bf parameters, the qTime drops to
> >>     400ms
> >>
> >>
> >> Questions:
> >>
> >>     1. Is there any way to improve the performance of the boosts
> >>     (specific field types, etc)?
> >>     2. Will sharding further such that each core only has to score a
> >>     smaller subset of documents help with query performance?
> >>     3. Is there any performance impact when boosting/querying against
> >>     sparse fields, both indexed=true or docValues=true?
> >>     4. It seems the base case scoring is 400ms, which is already quite
> >>     high. Is this because the query (hello world) implicitly gets
> >>     parsed as (hello OR world)? Thus it would be more computationally
> >>     expensive?
> >>     5. Any other advice :) ?
> >>
> >>
> >> Thanks in advance,
> >>
> >> Ash
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> --
> Charlie Hull - Managing Consultant at OpenSource Connections Limited
> Founding member of The Search Network <http://www.thesearchnetwork.com>
> and co-author of Searching the Enterprise
> <https://opensourceconnections.com/wp-content/uploads/2020/08/ES_book_final_journal_version.pdf>
> tel/fax: +44 (0)8700 118334
> mobile: +44 (0)7767 825828
>
> OpenSource Connections Europe GmbH | Pappelallee 78/79 | 10437 Berlin
> Amtsgericht Charlottenburg | HRB 230712 B
> Geschäftsführer: John M. Woodell | David E. Pugh
> Finanzamt: Berlin Finanzamt für Körperschaften II
>
