On 8/26/22 02:55, Sidharth Negi wrote:
We set up Solr 6 and Solr 8 on two identical AWS instances (16 cores, 128 GB of RAM, of which Solr was given Xmx=50GB), indexed the same data on both, and tested them under the same traffic load. The schema and solrconfig.xml are exactly identical; the schema file is just renamed to managed-schema in Solr 8. Neither machine is indexing data or taking replication, and both have about the same number of segments (42 and 45 for Solr 6 and Solr 8 respectively).
Are you really sure that the heap needs to be that big? It really is huge. Because of the way Java works, a heap of 32GB or larger cannot use compressed object pointers and requires full 64-bit pointers, so every object reference takes more space. As a result, a heap size of 31GB actually has more usable memory than a heap size of 32GB. At 50GB, you have likely passed the break-even point. But unless you're dealing with hundreds of millions of documents, it is very unlikely that you need a heap that big.
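If you want to confirm which side of the pointer-size threshold a given heap setting lands on, a small check like the one below can help. It is only a sketch: the class name `HeapPointerCheck` is made up for illustration, it assumes a HotSpot-based JVM (the `UseCompressedOops` option is HotSpot-specific), and it reports on the JVM it runs in, so it would need to be started with the same `-Xmx` value you give Solr.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Solr: reports heap size and pointer mode
// for the JVM that runs it.
final class HeapPointerCheck {

    /** True when this HotSpot JVM reports UseCompressedOops=true. */
    static boolean compressedOopsEnabled() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption throws IllegalArgumentException if the option is unknown.
        return Boolean.parseBoolean(bean.getVMOption("UseCompressedOops").getValue());
    }

    /** Converts a byte count to GiB for display. */
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("max heap: %.1f GiB, compressed oops: %b%n",
                toGiB(maxHeap), compressedOopsEnabled());
    }
}
```

Running it once with `-Xmx31g` and once with `-Xmx50g` should show whether the larger heap has lost compressed pointers on your JVM.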
What's surprising is that Solr 6.6.1 CPU usage is considerably lower than Solr 8.11.2. Just look at the screenshot attached. The blue line is Solr 8.11.2 while the orange one is Solr 6.6.1. Note that the Solr 8 CPU usage is considerably higher with identical traffic.
You have higher CPU usage, but does Solr 8 actually perform worse than Solr 6? What do other metrics show, like CPU iowait percentage?
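One rough way to get an iowait number is to sample the aggregate `cpu` line of `/proc/stat` twice and look at how much of the elapsed CPU time went to the iowait counter. The sketch below assumes the instances run Linux (the original post does not say), and the class name `IoWaitSample` and the 5 second interval are arbitrary choices for illustration. Tools like `top` or `iostat` report the same figure if they are available.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: samples /proc/stat (Linux only) to estimate iowait.
final class IoWaitSample {

    /** Parses the aggregate "cpu ..." line; keeps user, nice, system, idle,
     *  iowait, irq, softirq, steal (guest time is already counted in user). */
    static long[] parseCpuLine(String line) {
        String[] parts = line.trim().split("\\s+");
        int n = Math.min(parts.length - 1, 8);
        long[] ticks = new long[n];
        for (int i = 0; i < n; i++) {
            ticks[i] = Long.parseLong(parts[i + 1]); // parts[0] is the "cpu" label
        }
        return ticks;
    }

    /** Share of elapsed CPU time spent in iowait between two samples, in percent. */
    static double iowaitPercent(long[] before, long[] after) {
        int n = Math.min(before.length, after.length);
        if (n < 5) return 0.0; // iowait is the fifth counter (index 4)
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += after[i] - before[i];
        }
        if (total <= 0) return 0.0;
        return 100.0 * (after[4] - before[4]) / total;
    }

    static long[] readCpuTicks() throws IOException {
        return parseCpuLine(Files.readAllLines(Paths.get("/proc/stat")).get(0));
    }

    public static void main(String[] args) throws Exception {
        long[] before = readCpuTicks();
        Thread.sleep(5000);
        long[] after = readCpuTicks();
        System.out.printf("iowait over 5 s: %.1f%%%n", iowaitPercent(before, after));
    }
}
```

If the Solr 8 machine shows noticeably higher iowait under the same traffic, that points at disk access rather than at Solr's own CPU work.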
You've talked about segment counts, but haven't talked about index size. Is the total disk space consumed by the index about the same on both?
I can think of two differences between 6 and 8 that are fundamental:
First: 6 uses CMS for garbage collection and 8 uses G1. G1 has better overall performance because more of its work can run concurrently with the application, and I can imagine that it uses a bit more of resources like memory and CPU.
Second: 6 uses log4j 1 and 8 uses log4j 2. The newer logging library is much faster because it takes advantage of threads, which could increase the overall CPU usage. Whether that would cause a significant impact depends mostly on how busy the server is and whether the logging configuration has been changed. With default settings, at least one log message is created for almost every request that Solr receives.
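To double-check which collector a particular set of JVM flags actually selects, you can list the garbage collector MXBeans. This is a sketch only: `GcReport` is a made-up class name, the name matching assumes the bean names HotSpot normally uses ("ConcurrentMarkSweep" for CMS, names starting with "G1 " for G1), and it reports on the JVM it runs in, so you would start it with the same GC options that each Solr install uses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: prints the collectors active in the JVM that runs it.
final class GcReport {

    /** Maps collector bean names to a family label.
     *  The name patterns are assumptions based on usual HotSpot naming. */
    static String collectorFamily(List<String> beanNames) {
        for (String name : beanNames) {
            if (name.startsWith("G1 ")) return "G1";
            if (name.equals("ConcurrentMarkSweep")) return "CMS";
        }
        return "other";
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
            // Cumulative collection count and time since this JVM started.
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("collector family: " + collectorFamily(names));
    }
}
```

The collection times printed here are elapsed time spent collecting, not CPU time, so treat them as a hint about GC activity rather than a direct explanation of the CPU graph.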
There have also been a lot of advancements in other areas, and those probably contribute. Higher CPU usage does not automatically mean that performance is worse. Sometimes applications actually perform better when using more CPU.
Thanks, Shawn