Errata: I meant 80k requests per minute and NOT 80k per second.

On Sat, Aug 27, 2022 at 1:48 AM Sidharth Negi <sidharth.negi...@gmail.com>
wrote:
> Interesting to note that when I ran the experiment with Solr 9, the CPU
> usage was about the same as Solr 6.
>
> On Fri, Aug 26, 2022 at 7:02 PM Shawn Heisey <apa...@elyograg.org.invalid>
> wrote:
>
>> On 8/26/22 02:55, Sidharth Negi wrote:
>> > We set up Solr 6 and Solr 8 on two identical AWS instances (16 cores,
>> > 128 GB RAM, of which Solr was given Xmx=50GB), indexed the same data on
>> > them, and tested them under the same traffic load. The schema and
>> > solrconfig.xml are exactly identical - the schema file is just renamed
>> > as managed-schema in Solr 8. Neither machine is indexing data or
>> > receiving replication, and both have about the same number of segments
>> > (42 and 45 segments for Solr 6 and Solr 8 respectively).
>>
>> Are you really sure that the heap needs to be that big? It really is
>> huge, and due to the way that Java works, anything 32GB or larger
>> requires 64-bit pointers. So a heap size of 31GB actually has more
>> memory available than a heap size of 32GB. At 50GB, you have likely
>> passed the break-even point. But unless you're dealing with hundreds of
>> millions of documents, it is very unlikely that you need a heap that big.
>
> I agree - the number of documents we are dealing with is ~30 million, so
> most of the heap (over 30 GB) is unused.
>
>> > What's surprising is that Solr 6.6.1 CPU usage is considerably lower
>> > than Solr 8.11.2. Just look at the screenshot attached. The blue line
>> > is Solr 8.11.2 while the orange one is Solr 6.6.1. Note that the Solr
>> > 8 CPU usage is considerably higher with identical traffic.
>>
>> You have higher CPU usage, but does Solr 8 actually perform worse than
>> Solr 6? What do other metrics show, like CPU iowait percentage?
>
> I don't think Solr 8 performs any "worse" in terms of the time taken for
> a query as such - it's just that CPU usage increases linearly with
> traffic, and the screenshot is for 30% of our traffic. At full-scale
> traffic, Solr 6 will therefore win out because it will need fewer
> machines: we want to keep CPU usage well under 70% on a production
> instance, even though the query times are about the same.
>
>> You've talked about segment counts, but haven't talked about index
>> size. Is the total disk space consumed by the index about the same on
>> both?
>
> The disk space taken by the index was ~35 GB and the number of docs ~30
> million on both Solr versions.
>
>> I can think of two differences between 6 and 8 that are fundamental:
>> First: 6 uses CMS for garbage collection and 8 uses G1. G1 has better
>> overall performance because more of its work can run in parallel with
>> the application, and I can imagine that it uses a little more memory
>> and CPU. Second: 6 uses log4j 1 and 8 uses log4j 2. The newer logging
>> library is much faster because it takes advantage of threads, which
>> could increase the overall CPU usage. Whether that would cause a
>> significant impact depends mostly on how busy the server is and whether
>> the logging configuration has been changed. With default settings, at
>> least one log message is created for almost every request that Solr
>> receives.
>
> Let me run an experiment using the same GC settings on both to see if
> that works. Is there anything else we can do to narrow down the reason
> for sure?
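>
> For that GC experiment, I'll probably start by overriding solr.in.sh on
> the Solr 8 machine with something like the following. This is just a
> sketch - these are not necessarily Solr 6's exact default flags, and
> they assume a JDK that still ships CMS:
>
>     # Keep the heap under 32GB so compressed oops stay enabled
>     SOLR_HEAP="31g"
>
>     # Override the G1 defaults with CMS-style settings closer to Solr 6
>     GC_TUNE="-XX:+UseConcMarkSweepGC \
>       -XX:+UseCMSInitiatingOccupancyOnly \
>       -XX:CMSInitiatingOccupancyFraction=50 \
>       -XX:+ParallelRefProcEnabled"
>
> Separately, to rule out logging overhead, I could raise the root logger
> in server/resources/log4j2.xml from INFO to WARN on both machines and
> compare again.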
> All slaves combined will have to serve over 80k requests per second once
> we set the number of slaves such that the CPU usage of all of them stays
> well below 70% at peak.
>
>> There have also been a lot of advancements in other areas, and those
>> probably contribute. Higher CPU usage does not automatically mean that
>> performance is worse. Sometimes applications actually perform better
>> when using more CPU.
>
> I agree - higher CPU usage does not directly mean worse performance, but
> as mentioned above, for us it would translate into more infrastructure
> and hence added cost.
>
>> Thanks,
>> Shawn
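
To put the infra point in numbers (back-of-envelope, with a hypothetical
per-node figure R): per the correction above, 80k requests per minute is
roughly 1,333 requests per second across all slaves combined. If one
16-core slave can sustain R requests per second while staying under 70%
CPU, we need about ceil(1333 / R) slaves. Any per-request CPU increase in
Solr 8 lowers R and raises the slave count and cost, even though
individual query times stay about the same.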