On 10/26/21 6:10 AM, Paul Russell wrote:
> I have a current SOLR cluster running SOLR 6.6 on RHEL 6 servers. All SOLR
> instances use a 25G JVM on the RHEL 6 server configured with 64G of memory
> managing a 900G collection. Measured response times to queries average
> about 100ms.
Congrats on getting that performance. With the numbers you have
described, I would not expect to see anything that good.
> On the RHEL 7 servers the kswapd0 process is consuming up to 30% of the CPU
> and response time is being measured at 500-1000 ms for queries.
How long are you giving the system, and how many queries have been
handled by the cluster before you begin benchmarking? The only way the
old cluster could see performance that good is handling a LOT of queries
... enough that the OS can figure out how to effectively cache the index
with limited memory. By my calculations (64GB of RAM minus the 25GB heap),
your systems have less than 40GB of free memory to cache a 900GB index. And
that assumes that Solr is the only software running on these systems.
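If it helps, a quick way to see how much of that memory the OS actually has
left for the page cache on each node (this assumes the standard procps tools
that ship with RHEL; the column names differ slightly between RHEL 6 and 7):

# Memory in gigabytes; "buff/cache" (RHEL 7) or "cached" (RHEL 6)
# is roughly what the OS can use to cache the index files.
free -g
grep -i ^cached /proc/meminfo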
> I tried using the vm.swappiness setting at both 0 and 1 and have been
> unable to change the behavior.
Did you see any information other than kswapd0 CPU usage that led you to
this action? I would not expect swap to be the problem with this, and
your own experiments seem to say the same.
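Worth noting: kswapd0 reclaims page cache as well as swap space, so high
kswapd0 CPU does not by itself prove that the box is swapping. A quick way
to check (a sketch; sar needs the sysstat package, which may not be
installed):

# Current swappiness value
cat /proc/sys/vm/swappiness
# The si/so columns show actual swap-in/swap-out per interval
vmstat 5 5
# pgscank/s shows how hard kswapd is scanning for reclaimable pages
sar -B 5 5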
> If I trim the SOLR JVM to 16GB, response
> times get better and GC logs show the JVM is operating correctly.
Sounds like you have a solution. Is there a problem with simply
changing the heap size? If everything works with a lower heap size,
then the lower heap size is strongly encouraged. You seem to be making
a point here about the JVM operating correctly with a 16GB heap. Are
you seeing something in GC logs to indicate incorrect operation with the
higher heap? Solr 6.x uses CMS for garbage collection by default. You might
see better GC performance by switching to G1, which is already available in
Java 8. Switching to anything newer than G1 (ZGC, Shenandoah) would require
a much newer Java version, one that is probably not compatible with Solr
6.x. Here is the GC_TUNE setting (goes in solr.in.sh) for newer Solr
versions:
GC_TUNE=('-XX:+UseG1GC' \
'-XX:+PerfDisableSharedMem' \
'-XX:+ParallelRefProcEnabled' \
'-XX:MaxGCPauseMillis=250' \
'-XX:+UseLargePages' \
'-XX:+AlwaysPreTouch' \
'-XX:+ExplicitGCInvokesConcurrent')
If your servers have more than one physical CPU and NUMA architecture,
then I would strongly recommend adding "-XX:+UseNUMA" to the argument
list. Adding it on systems with only one NUMA node will not cause problems.
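As an aside, the solr.in.sh that ships with Solr 6.x expects GC_TUNE as a
single string rather than a bash array, so if you want to try G1 on the
existing 6.6 cluster, the equivalent would look something like this (a
sketch only -- double-check the options against the exact Java 8 build you
are running):

# Single-string form for Solr 6.x solr.in.sh
# Add -XX:+UseNUMA on multi-socket NUMA hardware.
# -XX:+UseLargePages only helps if huge pages are configured in the OS.
GC_TUNE="-XX:+UseG1GC \
-XX:+PerfDisableSharedMem \
-XX:+ParallelRefProcEnabled \
-XX:MaxGCPauseMillis=250 \
-XX:+UseLargePages \
-XX:+AlwaysPreTouch \
-XX:+ExplicitGCInvokesConcurrent"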
I would not expect the problem to be in the OS, but I could be wrong.
It is possible that changes in the newer kernel make it less efficient
at figuring out proper cache operation, and that would affect Solr.
Usually things get better with an upgrade, but you never know.
It seems more likely to be some other difference between the systems.
Top culprit in my mind is Java. Are the two systems running the same
version of Java from the same vendor? What I would recommend for Solr
6.x is the latest OpenJDK 8. In the past I would have recommended
Oracle Java, but they changed their licensing, so now I go with
OpenJDK. Avoid IBM Java or anything that descends from it -- it is
known to have bugs running Lucene software. If you want to use a newer
Java version than Java 8, you'll need to upgrade Solr. Upgrading from
6.x to 8.x is something that requires extensive testing, and a complete
reindex from scratch.
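A quick way to confirm what each cluster is actually running (this assumes
the Solr service sees the same java that a login shell does, which RHEL's
alternatives system does not always guarantee):

# Vendor and version of the JVM on the PATH
java -version
# Which JDK the alternatives system has selected (RPM-installed JDKs only)
update-alternatives --display java
# The full command line of the running Solr JVM
ps -C java -o pid,args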
I would be interested in seeing the screenshot described here:
https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems#SolrPerformanceProblems-Askingforhelponamemory/performanceissue
RHEL uses GNU top, so the instructions on that page apply as written.
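For the screenshot itself: run top, press shift-M to sort by resident
memory, make the terminal wide enough that nothing is cut off, and capture
the whole window. On RHEL 7's procps-ng the sort can also be requested up
front (the -o option does not exist in RHEL 6's older top):

# Sort by memory from the start (procps-ng / RHEL 7 only)
top -o %MEM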
My own deployments use Ubuntu. Back when I did have access to large
Solr installs, they were running on CentOS, which is effectively the
same as RHEL. I do not recall whether they were CentOS 6 or 7.
Thanks,
Shawn