Correction: I meant 80k requests per minute, NOT 80k per second.
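
For scale, the corrected figure can be converted to a per-second rate. The sketch below is only illustrative arithmetic: the function name is made up here, and the only input taken from this thread is the 80k figure. Since default settings produce at least one log message for almost every request (as noted in the quoted reply), the per-second rate is also a rough lower bound on log messages per second across all slaves.

```python
def per_second(requests_per_minute: float) -> float:
    """Convert a per-minute request rate to a per-second rate."""
    return requests_per_minute / 60.0


corrected_rpm = 80_000   # corrected figure: requests per minute, all slaves combined
mistaken_rps = 80_000    # figure as first written: requests per second

corrected_rps = per_second(corrected_rpm)

# How far apart the two readings of "80k" are.
overstatement = mistaken_rps / corrected_rps

print(f"corrected rate: about {corrected_rps:.0f} requests/s across all slaves")
print(f"the per-second reading overstated the load by about {overstatement:.0f}x")
```

The load to plan for is therefore on the order of a thousand requests per second in total, not tens of thousands.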

On Sat, Aug 27, 2022 at 1:48 AM Sidharth Negi <sidharth.negi...@gmail.com>
wrote:

> Interesting to note that when I ran the experiment with Solr 9, the CPU
> usage was about the same as Solr 6.
>
> On Fri, Aug 26, 2022 at 7:02 PM Shawn Heisey <apa...@elyograg.org.invalid>
> wrote:
>
>> On 8/26/22 02:55, Sidharth Negi wrote:
>> > We set up Solr 6 and Solr 8 on two identical AWS instances (16 cores,
>> > 128 GB of which Solr was given Xmx=50GB) and indexed the same data on
>> > them and tested under the same load of traffic. The schema and
>> > solrconfig.xml are exactly identical - the schema file is just renamed
>> > as managed-schema in Solr 8. Neither of the two machines is indexing
>> > data or receiving replication, and both have about the same number of
>> > segments (42 and 45 segments for Solr 6 and Solr 8 respectively).
>>
>> Are you really sure that the heap needs to be that big?  It really is
>> huge, and due to the way that Java works, anything 32GB or larger
>> requires 64-bit pointers.  So a heap size of 31GB actually has more
>> memory available than a heap size of 32GB.  At 50 GB, you have likely
>> passed the break-even point.  But unless you're dealing with hundreds of
>> millions of documents, it is very unlikely that you need a heap that big.
>
>
> I agree - the number of documents we are dealing with is ~30 million so
> most of the heap is unused (over 30 GB).
>
>
>
>
>> > What's surprising is that Solr 6.6.1 CPU usage is considerably lower
>> > than Solr 8.11.2. Just look at the screenshot attached. The blue line
>> > is Solr 8.11.2 while the orange one is Solr 6.6.1. Note that the Solr
>> > 8 CPU usage is considerably higher with identical traffic.
>>
>> You have higher CPU usage, but does Solr 8 actually perform worse than
>> Solr 6?  What do other metrics show, like CPU iowait percentage?
>
>
> I don't think Solr 8 performs any "worse" in terms of query times. It's
> just that CPU usage increases linearly with traffic, and the screenshot
> is for 30% traffic. At full-scale traffic, Solr 6 will therefore win out
> because it will need fewer machines: we want to keep CPU usage well under
> 70% on a production instance, even though the query times are about the
> same.
>
>
>
>
>> You've talked about segment counts, but haven't talked about index
>> size.  Is the total disk space consumed by the index about the same on
>> both?
>
>
> The disk space taken by the index was about 35 GB on both Solr versions,
> and the number of docs was ~30 million on both.
>
>
>> I can think of two differences between 6 and 8 that are fundamental:
>> First: 6 uses CMS for garbage collection and 8 uses G1.  G1 has better
>> overall performance because more of its work can function in parallel
>> with the application, and I can imagine that it uses a little more of
>> resources such as memory and CPU. Second:  6 uses log4j 1 and 8 uses
>> log4j 2.  The newer logging library is much faster because it takes
>> advantage of threads, which could increase the overall CPU usage.
>> Whether that would cause a significant impact depends mostly on how busy
>> the server is and whether the logging configuration has been changed.
>> With default settings, at least one log message is created for almost
>> every request that Solr receives.
>>
>
> Let me run an experiment using the same GC settings on both to see if that
> works. Is there anything else we can do to narrow down the reason for sure?
> All slaves combined will have to serve over 80k requests per second once we
> set the number of slaves such that the CPU usage of all remains well below
> 70% at peaks.
>
>
>> There have also been a lot of advancements in other areas, and those
>> probably contribute.  Higher CPU usage does not automatically mean that
>> performance is worse.  Sometimes applications actually perform better
>> when using more CPU.
>>
>
> I agree - higher CPU usage does not directly mean worse performance, but
> as mentioned above, for us it would translate into more infrastructure
> and hence added cost.
>
>
>> Thanks,
>> Shawn
>>
>>
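
The sizing argument in this thread (CPU rising linearly with traffic, a 30% traffic sample, a 70% CPU ceiling) can be written out as a small calculation. This is a sketch under stated assumptions, not a measurement: the CPU readings below are hypothetical placeholders, since the actual values are only in the attached screenshot, and the function names are invented for illustration. It also assumes the linear trend holds all the way to full traffic, which a load test would need to confirm.

```python
def projected_cpu(cpu_percent: float, traffic_fraction: float) -> float:
    """Extrapolate CPU usage to full traffic, assuming linear scaling."""
    return cpu_percent / traffic_fraction


def fleet_scale(cpu_percent: float, traffic_fraction: float,
                ceiling: float = 70.0) -> float:
    """Factor by which the fleet must grow (or may shrink) so that
    per-instance CPU at full traffic sits at the ceiling."""
    return projected_cpu(cpu_percent, traffic_fraction) / ceiling


TRAFFIC_FRACTION = 0.30      # from the thread: the screenshot is at 30% traffic

# HYPOTHETICAL readings at 30% traffic; replace with the real values.
solr6_cpu = 15.0
solr8_cpu = 24.0

solr6_scale = fleet_scale(solr6_cpu, TRAFFIC_FRACTION)
solr8_scale = fleet_scale(solr8_cpu, TRAFFIC_FRACTION)

print(f"Solr 6 projected CPU at full traffic: "
      f"{projected_cpu(solr6_cpu, TRAFFIC_FRACTION):.1f}%")
print(f"Solr 8 projected CPU at full traffic: "
      f"{projected_cpu(solr8_cpu, TRAFFIC_FRACTION):.1f}%")
print(f"relative fleet size, Solr 8 vs Solr 6: {solr8_scale / solr6_scale:.2f}x")
```

Under the linear assumption the traffic fraction and the ceiling cancel out, so the ratio of fleet sizes equals the ratio of the two CPU readings. That is why a CPU gap at equal query times still translates directly into machine count.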
