Update: Solr 6 and Solr 8 parse the same input query differently. Is this
difference expected, given that the config and schema are the same? I
suspect it could also be a reason for the difference in CPU performance.

How can I tweak Solr 8 to parse the given query as *BoostedQuery* instead of
*FunctionScoreQuery* to confirm this?

*Solr 6 Debug Information:*
"querystring": "aliases:\"love\"^700 OR tedge1:love^900 OR
title_variation_edge:\"love\"^900 OR titleNgram:\"love\"^10 OR
artistEdge:\"love\"^250 OR keyword:\"love\"^10 OR title:\"love\"^1000 OR
(entity_edge:love^800)","parsedquery":
"BoostedQuery(boost(+((aliases:love)^700.0
(tedge1:love)^900.0 (title_variation_edge:love)^900.0
(titleNgram:love)^10.0 (artistEdge:love)^250.0 (keyword:love)^10.0
(title:love)^1000.0 (entity_edge:love)^800.0) ((keywords:\"700 900 (10 ten)
250 (10 ten) 1000\"~2)^25.0 | (aliases:\"700 900 (10 ten) 250 (10 ten)
1000\"~2)^50.0 | (title:\"700 900 (10 ten) 250 (10 ten)
1000\"~2)^60.0),product(sum(const(1),double(n_c_pop)),product(const(1.0E-4),sum(const(1),sum(product(const(0.01),long(search_click)),product(const(1),long(search_clickone)),product(const(10),long(search_clicktwo)),product(const(100),long(search_clickthree))))))))",
*Solr 8 Debug Information:*
"querystring": "aliases:\"love\"^700 OR
tedge:love^900 OR title_variation_edge:\"love\"^900 OR
titleNgram:\"love\"^10 OR artistEdge:\"love\"^250 OR keyword:\"love\"^10 OR
title:\"love\"^1000 OR (entity_edge:love^800)","parsedquery": "
FunctionScoreQuery(FunctionScoreQuery(+((aliases:love)^700.0
(tedge:love)^900.0 (title_variation_edge:love)^900.0 (titleNgram:love)^10.0
(artistEdge:love)^250.0 (keyword:love)^10.0 (title:love)^1000.0
(entity_edge:love)^800.0), scored by
boost(product(sum(const(1),double(n_c_pop)),product(const(1.0E-4),sum(const(1),sum(product(const(0.01),long(search_click)),product(const(1),long(search_clickone)),product(const(10),long(search_clicktwo)),product(const(100),long(search_clickthree)))))))))"
,
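
To compare the two outputs mechanically, the outermost wrapper can be pulled
out of each "parsedquery" string. A minimal sketch in Python, assuming the
JSON response (wt=json with debug enabled, as in the params in this message)
has already been loaded into a dict. The helper names
parsed_query_from_response and outer_query_class are hypothetical and not
part of Solr:

```python
import re
from typing import Optional


def parsed_query_from_response(response: dict) -> str:
    """Return the 'parsedquery' string from a Solr JSON response dict."""
    return response["debug"]["parsedquery"]


def outer_query_class(parsedquery: str) -> Optional[str]:
    """Return the outermost wrapper name, e.g. 'BoostedQuery'.

    Returns None when the string does not start with Name( at all,
    for example a bare clause such as '+title:love'.
    """
    match = re.match(r"\s*([A-Za-z_]\w*)\(", parsedquery)
    return match.group(1) if match else None


# Shortened versions of the two debug strings, for illustration only.
solr6 = {"debug": {"parsedquery": "BoostedQuery(boost(+((title:love)^1000.0),const(1)))"}}
solr8 = {"debug": {"parsedquery": " FunctionScoreQuery(FunctionScoreQuery(+((title:love)^1000.0), scored by boost(const(1))))"}}

for name, resp in (("Solr 6", solr6), ("Solr 8", solr8)):
    print(name, outer_query_class(parsed_query_from_response(resp)))
```

Applied to the full debug strings above, this yields BoostedQuery for Solr 6
and FunctionScoreQuery for Solr 8.
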
*Solr 6 Query Params:*
"responseHeader": {"status": 0,"QTime": 366,"params": {"q":
"aliases:\"love\"^700
OR tedge:love^900 OR title_variation_edge:\"love\"^900 OR
titleNgram:\"love\"^10 OR artistEdge:\"love\"^250 OR keyword:\"love\"^10 OR
title:\"love\"^1000 OR (entity_edge:love^800)","debug": "true","fl":
"doc_id,recording_id,title,type,status,keywords,aliases,score,search_click,thumbnail,year,album,artist,primaryArtist,language,original_title,is_hellotune,isCurated,explicitType"
,"start": "0","boost":
"product(sum(1,n_c_pop),product(0.0001,sum(1,sum(product(0.01,search_click),product(1,search_clickone),product(10,search_clicktwo),product(100,search_clickthree)))))"
,"fq": ["status:(PUBLISH_PUBLISHED LIVE ENRICHED)","((type:song type:artist
type:playlist type:podcast) OR (type:album AND ((creationTime:[* TO
NOW/DAY-31DAYS ] AND size:[2 TO *]) OR creationTime:[NOW/DAY-31DAYS TO *
])))"],"rows": "40","wt": "json","debug.explain.structured": "true"}
*Solr 8 Query Params:*
"responseHeader": {"status": 0,"QTime": 40,"params": {"q":
"aliases:\"love\"^700
OR tedge:love^900 OR title_variation_edge:\"love\"^900 OR
titleNgram:\"love\"^10 OR artistEdge:\"love\"^250 OR keyword:\"love\"^10 OR
title:\"love\"^1000 OR (entity_edge:love^800)","debug": "true","fl":
"doc_id,recording_id,title,type,status,keywords,aliases,score,search_click,thumbnail,year,album,artist,primaryArtist,language,original_title,is_hellotune,isCurated,explicitType"
,"start": "0","boost":
"product(sum(1,n_c_pop),product(0.0001,sum(1,sum(product(0.01,search_click),product(1,search_clickone),product(10,search_clicktwo),product(100,search_clickthree)))))"
,"fq": ["status:(PUBLISH_PUBLISHED LIVE ENRICHED)","((type:song type:artist
type:playlist type:podcast) OR (type:album AND ((creationTime:[* TO
NOW/DAY-31DAYS ] AND size:[2 TO *]) OR creationTime:[NOW/DAY-31DAYS TO *
])))"],"rows": "40","wt": "json","debug.explain.structured": "true"}}
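
For reference, the "boost" function in the params above can be written out as
plain arithmetic. This is only a sketch in Python of what the nested
product()/sum() calls evaluate to for one document. The function name
boost_multiplier is hypothetical, and I am assuming the usual behaviour of
the boost parameter, where the base relevance score is multiplied by this
value:

```python
def boost_multiplier(n_c_pop: float,
                     search_click: float,
                     search_clickone: float,
                     search_clicktwo: float,
                     search_clickthree: float) -> float:
    """Evaluate the boost function from the request for one document.

    Mirrors:
      product(sum(1,n_c_pop),
              product(0.0001,
                      sum(1, sum(product(0.01,search_click),
                                 product(1,search_clickone),
                                 product(10,search_clicktwo),
                                 product(100,search_clickthree)))))
    """
    # Inner sum: weighted click counters.
    click_score = (0.01 * search_click
                   + 1 * search_clickone
                   + 10 * search_clicktwo
                   + 100 * search_clickthree)
    # Outer product: popularity term times the scaled click term.
    return (1 + n_c_pop) * (0.0001 * (1 + click_score))


# Example with made-up field values, for illustration only.
print(boost_multiplier(n_c_pop=1, search_click=100, search_clickone=2,
                       search_clicktwo=1, search_clickthree=0))
```

The same function is applied in both versions, so any difference should come
from how the wrapper query evaluates it, not from the formula itself.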

On Sat, Aug 27, 2022 at 2:38 AM Shawn Heisey <elyog...@elyograg.org.invalid>
wrote:

> On 8/26/22 14:18, Sidharth Negi wrote:
> > The disk space taken by the index of both Solr versions was about ~35 GB
> > and the number of docs ~30 million in both.
>
> Unless that system is handling insanely complex queries that chew up
> lots of memory, I would not expect it to need more than about 8GB of
> heap with that index size, and quite possibly even less.
>
> 48GB total system memory would probably work if the server is not
> handling anything other than that Solr index.
>
> If you share solr GC logs that cover a day or more of heavy usage
> (several days being better), I might be able to determine a more precise
> heap size recommendation.  Unfortunately there is no generic way to
> calculate heap requirements based on data statistics.  GC logs provide
> the best info, otherwise it requires experimentation.  My guess of 8GB
> is just that -- a guess.  It might be wrong, either too high or too low.
>
> > Let me run an experiment using the same GC settings on both to see if that
> > works. Is there anything else we can do to narrow down the reason for sure?
> > All slaves combined will have to serve over 80k requests per second once we
> > set the number of slaves such that the CPU usage of all remains well below
> > 70% at peaks.
>
> Very high query rate, but it sounds like you're going to size
> appropriately to deal with it.
>
> > Interesting to note that when I ran the experiment with Solr 9, the CPU
> > usage was about the same as Solr 6.
>
> That is interesting.  Solr 9 also uses G1 and log4j2, so those are
> probably not the culprit.  Deploying the latest release would be our
> recommendation, because only extremely bad bugs, mostly security issues,
> are going to be fixed in new 8.x releases.  It is very unlikely that a
> new 8.11.x version will address this issue.
>
> Thanks,
> Shawn
>
>
