Update: Solr 6 and Solr 8 parse the same input query differently. Is this difference expected, given that the config and schema are the same? I suspect it could also explain the difference in CPU usage.
How can I make Solr 8 parse the given query as a *BoostedQuery* (as Solr 6 does) instead of a *FunctionScoreQuery*, so I can confirm whether this is the cause?

*Solr 6 Debug Information:*
"querystring": "aliases:\"love\"^700 OR tedge1:love^900 OR title_variation_edge:\"love\"^900 OR titleNgram:\"love\"^10 OR artistEdge:\"love\"^250 OR keyword:\"love\"^10 OR title:\"love\"^1000 OR (entity_edge:love^800)",
"parsedquery": "BoostedQuery(boost(+((aliases:love)^700.0 (tedge1:love)^900.0 (title_variation_edge:love)^900.0 (titleNgram:love)^10.0 (artistEdge:love)^250.0 (keyword:love)^10.0 (title:love)^1000.0 (entity_edge:love)^800.0) ((keywords:\"700 900 (10 ten) 250 (10 ten) 1000\"~2)^25.0 | (aliases:\"700 900 (10 ten) 250 (10 ten) 1000\"~2)^50.0 | (title:\"700 900 (10 ten) 250 (10 ten) 1000\"~2)^60.0),product(sum(const(1),double(n_c_pop)),product(const(1.0E-4),sum(const(1),sum(product(const(0.01),long(search_click)),product(const(1),long(search_clickone)),product(const(10),long(search_clicktwo)),product(const(100),long(search_clickthree))))))))",

*Solr 8 Debug Information:*
"querystring": "aliases:\"love\"^700 OR tedge:love^900 OR title_variation_edge:\"love\"^900 OR titleNgram:\"love\"^10 OR artistEdge:\"love\"^250 OR keyword:\"love\"^10 OR title:\"love\"^1000 OR (entity_edge:love^800)",
"parsedquery": " FunctionScoreQuery(FunctionScoreQuery(+((aliases:love)^700.0 (tedge:love)^900.0 (title_variation_edge:love)^900.0 (titleNgram:love)^10.0 (artistEdge:love)^250.0 (keyword:love)^10.0 (title:love)^1000.0 (entity_edge:love)^800.0), scored by boost(product(sum(const(1),double(n_c_pop)),product(const(1.0E-4),sum(const(1),sum(product(const(0.01),long(search_click)),product(const(1),long(search_clickone)),product(const(10),long(search_clicktwo)),product(const(100),long(search_clickthree)))))))))",

*Solr 6 Query Params:*
"responseHeader": {"status": 0,"QTime": 366,"params": {
"q": "aliases:\"love\"^700 OR tedge:love^900 OR title_variation_edge:\"love\"^900 OR titleNgram:\"love\"^10 OR artistEdge:\"love\"^250 OR keyword:\"love\"^10 OR title:\"love\"^1000 OR (entity_edge:love^800)",
"debug": "true",
"fl": "doc_id,recording_id,title,type,status,keywords,aliases,score,search_click,thumbnail,year,album,artist,primaryArtist,language,original_title,is_hellotune,isCurated,explicitType",
"start": "0",
"boost": "product(sum(1,n_c_pop),product(0.0001,sum(1,sum(product(0.01,search_click),product(1,search_clickone),product(10,search_clicktwo),product(100,search_clickthree)))))",
"fq": ["status:(PUBLISH_PUBLISHED LIVE ENRICHED)","((type:song type:artist type:playlist type:podcast) OR (type:album AND ((creationTime:[* TO NOW/DAY-31DAYS ] AND size:[2 TO *]) OR creationTime:[NOW/DAY-31DAYS TO * ])))"],
"rows": "40",
"wt": "json",
"debug.explain.structured": "true"}

*Solr 8 Query Params:*
"responseHeader": {"status": 0,"QTime": 40,"params": {
"q": "aliases:\"love\"^700 OR tedge:love^900 OR title_variation_edge:\"love\"^900 OR titleNgram:\"love\"^10 OR artistEdge:\"love\"^250 OR keyword:\"love\"^10 OR title:\"love\"^1000 OR (entity_edge:love^800)",
"debug": "true",
"fl": "doc_id,recording_id,title,type,status,keywords,aliases,score,search_click,thumbnail,year,album,artist,primaryArtist,language,original_title,is_hellotune,isCurated,explicitType",
"start": "0",
"boost": "product(sum(1,n_c_pop),product(0.0001,sum(1,sum(product(0.01,search_click),product(1,search_clickone),product(10,search_clicktwo),product(100,search_clickthree)))))",
"fq": ["status:(PUBLISH_PUBLISHED LIVE ENRICHED)","((type:song type:artist type:playlist type:podcast) OR (type:album AND ((creationTime:[* TO NOW/DAY-31DAYS ] AND size:[2 TO *]) OR creationTime:[NOW/DAY-31DAYS TO * ])))"],
"rows": "40",
"wt": "json",
"debug.explain.structured": "true"}}

On Sat, Aug 27, 2022 at 2:38 AM Shawn Heisey <elyog...@elyograg.org.invalid> wrote:

> On 8/26/22 14:18, Sidharth Negi wrote:
> > The disk space taken by the index of both Solr versions was about ~35 GB
> > and the number of docs ~30 million in both.
>
> Unless that system is handling insanely complex queries that chew up
> lots of memory, I would not expect it to need more than about 8GB of
> heap with that index size, and quite possibly even less.
>
> 48GB total system memory would probably work if the server is not
> handling anything other than that Solr index.
>
> If you share solr GC logs that cover a day or more of heavy usage
> (several days being better), I might be able to determine a more precise
> heap size recommendation. Unfortunately there is no generic way to
> calculate heap requirements based on data statistics. GC logs provide
> the best info, otherwise it requires experimentation. My guess of 8GB
> is just that -- a guess. It might be wrong, either too high or too low.
>
> > Let me run an experiment using the same GC settings on both to see if that
> > works. Is there anything else we can do to narrow down the reason for sure?
> > All slaves combined will have to serve over 80k requests per second once we
> > set the number of slaves such that the CPU usage of all remains well below
> > 70% at peaks.
>
> Very high query rate, but it sounds like you're going to size
> appropriately to deal with it.
>
> > Interesting to note that when I ran the experiment with Solr 9, the CPU
> > usage was about the same as Solr 6.
>
> That is interesting. Solr 9 also uses G1 and log4j2, so those are
> probably not the culprit. Deploying the latest release would be our
> recommendation, because only extremely bad bugs, mostly security issues,
> are going to be fixed in new 8.x releases. It is very unlikely that a
> new 8.11.x version will address this issue.
>
> Thanks,
> Shawn
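
For reference, the boost parameter is identical in both sets of query params, and both parsed queries wrap the main query with the same function (Solr 6 as BoostedQuery(boost(...)), Solr 8 as FunctionScoreQuery(... scored by boost(...))). As far as I understand, both forms multiply the main query's score by the function's value, so the per-document boost factor itself should be the same in both versions. The Python sketch below mirrors that function so the factor can be checked by hand; the Python function names are hypothetical, and treating both wrappers as a plain multiplication is my assumption, not something the debug output confirms.

```python
def boost_factor(n_c_pop: float, search_click: int, search_clickone: int,
                 search_clicktwo: int, search_clickthree: int) -> float:
    """Evaluate the boost function from the query params:

    product(sum(1,n_c_pop),product(0.0001,sum(1,sum(product(0.01,search_click),
        product(1,search_clickone),product(10,search_clicktwo),
        product(100,search_clickthree)))))
    """
    # Inner sum(...): click counts weighted 0.01, 1, 10 and 100.
    weighted_clicks = (0.01 * search_click
                       + 1 * search_clickone
                       + 10 * search_clicktwo
                       + 100 * search_clickthree)
    # product(sum(1, n_c_pop), product(0.0001, sum(1, weighted_clicks)))
    return (1 + n_c_pop) * (0.0001 * (1 + weighted_clicks))


def boosted_score(main_query_score: float, n_c_pop: float, search_click: int,
                  search_clickone: int, search_clicktwo: int,
                  search_clickthree: int) -> float:
    # Assumption: BoostedQuery (Solr 6) and FunctionScoreQuery "scored by boost"
    # (Solr 8) both multiply the main query's score by the function value.
    return main_query_score * boost_factor(n_c_pop, search_click, search_clickone,
                                           search_clicktwo, search_clickthree)


if __name__ == "__main__":
    # Hypothetical field values for one document, for illustration only.
    print(boost_factor(n_c_pop=1.0, search_click=100, search_clickone=0,
                       search_clicktwo=0, search_clickthree=0))
    print(boosted_score(2.0, 1.0, 100, 0, 0, 0))
```

Note that the constant 0.0001 scales every document's score by the same amount, so it does not change the ranking; only the (1 + n_c_pop) and weighted-click terms do.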