Thanks for all the helpful information. Currently we are averaging about 5.5k requests a minute for this collection, which is served by a 3-node SOLR cluster. The RHEL 6 (current) and RHEL 7 (new) servers both use OpenJDK 8. The older servers have an older JDK build, 8.131; the new servers have 8.302.
GC is configured the same on all servers.

GC_TUNE="-XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=200 -XX:+AggressiveOpts -XX:+AlwaysPreTouch -XX:+PerfDisableSharedMem -XX:MetaspaceSize=64M"

Because I can bring the nodes online during off-peak hours and load test, I'll take a look at the 'swap-off' option. I don't control the hardware, but I think a larger SSD-based swap filesystem is also an option if turning swap off doesn't work.

Thanks again.

On Tue, Oct 26, 2021 at 9:20 AM Shawn Heisey <apa...@elyograg.org> wrote:

> On 10/26/21 6:10 AM, Paul Russell wrote:
> > I have a current SOLR cluster running SOLR 6.6 on RHEL 6 servers. All SOLR
> > instances use a 25G JVM on the RHEL 6 server configured with 64G of memory
> > managing a 900G collection. Measured response time to queries average about
> > 100ms.
>
> Congrats on getting that performance. With the numbers you have
> described, I would not expect to see anything that good.
>
> > On the RHEL 7 servers the kswapd0 process is consuming up to 30% of the CPU
> > and response time is being measured at 500-1000 ms for queries.
>
> How long are you giving the system, and how many queries have been
> handled by the cluster before you begin benchmarking? The only way the
> old cluster could see performance that good is handling a LOT of queries
> ... enough that the OS can figure out how to effectively cache the index
> with limited memory. By my calculations, your systems have less than
> 40GB of free memory to cache a 900GB index. And that assumes that Solr
> is the only software running on these systems.
>
> > I tried using the vm.swappiness setting at both 0 and 1 and have been
> > unable to change the behavior.
>
> Did you see any information other than kswapd0 CPU usage that led you to
> this action? I would not expect swap to be the problem with this, and
> your own experiments seem to say the same.
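
The memory estimate in the quoted reply above (under 40GB of page cache for a 900GB index) follows from simple subtraction. Here is a minimal sketch in Python, assuming Solr is the only significant process on the host and ignoring JVM off-heap and kernel overhead. The function names are hypothetical; the 64G, 25G, and 900G figures are the ones quoted in this thread.

```python
def page_cache_budget_gb(total_ram_gb, jvm_heap_gb):
    """Upper bound on RAM left for the OS page cache.

    Assumption: nothing else on the host uses meaningful memory, so the
    real figure will be somewhat lower than this.
    """
    return total_ram_gb - jvm_heap_gb


def cacheable_fraction(cache_gb, index_gb):
    """Fraction of the on-disk index that could sit in the page cache."""
    return min(1.0, cache_gb / index_gb)


# Figures quoted in this thread: 64G of RAM, 25G heap, 900G collection.
budget = page_cache_budget_gb(64, 25)
fraction = cacheable_fraction(budget, 900)
print(f"page cache upper bound: {budget}G ({fraction:.1%} of the index)")
```

Passing a smaller heap size as the second argument shows how much additional memory would be returned to the page cache.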
>
> > If I trim the SOLR JVM to 16Gb response
> > times get better and GC logs show the JVM is operating correctly..
>
> Sounds like you have a solution. Is there a problem with simply
> changing the heap size? If everything works with a lower heap size,
> then the lower heap size is strongly encouraged. You seem to be making
> a point here about the JVM operating correctly with a 16GB heap. Are
> you seeing something in GC logs to indicate incorrect operation with the
> higher heap? Solr 6.x uses CMS for garbage collection. You might see
> better GC performance by switching to G1. Switching to another collector
> would require a much newer Java version, one that is probably not
> compatible with Solr 6.x. Here is the GC_TUNE setting (goes in
> solr.in.sh) for newer Solr versions:
>
> GC_TUNE=('-XX:+UseG1GC' \
>   '-XX:+PerfDisableSharedMem' \
>   '-XX:+ParallelRefProcEnabled' \
>   '-XX:MaxGCPauseMillis=250' \
>   '-XX:+UseLargePages' \
>   '-XX:+AlwaysPreTouch' \
>   '-XX:+ExplicitGCInvokesConcurrent')
>
> If your servers have more than one physical CPU and NUMA architecture,
> then I would strongly recommend adding "-XX:+UseNUMA" to the argument
> list. Adding it on systems with only one NUMA node will not cause
> problems.
>
> I would not expect the problem to be in the OS, but I could be wrong.
> It is possible that changes in the newer kernel make it less efficient
> at figuring out proper cache operation, and that would affect Solr.
> Usually things get better with an upgrade, but you never know.
>
> It seems more likely to be some other difference between the systems.
> Top culprit in my mind is Java. Are the two systems running the same
> version of Java from the same vendor? What I would recommend for Solr
> 6.x is the latest OpenJDK 8. In the past I would have recommended
> Oracle Java, but they changed their licensing, so now I go with
> OpenJDK. Avoid IBM Java or anything that descends from it -- it is
> known to have bugs running Lucene software.
> If you want to use a newer
> Java version than Java 8, you'll need to upgrade Solr. Upgrading from
> 6.x to 8.x is something that requires extensive testing, and a complete
> reindex from scratch.
>
> I would be interested in seeing the screenshot described here:
>
> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems#SolrPerformanceProblems-Askingforhelponamemory/performanceissue
>
> RHEL uses gnu top.
>
> My own deployments use Ubuntu. Back when I did have access to large
> Solr installs, they were running on CentOS, which is effectively the
> same as RHEL. I do not recall whether they were CentOS 6 or 7.
>
> Thanks,
> Shawn

--
Paul Russell
VP Integration/Support Services
*main:* 314.968.9906
*direct:* 314.255.2135
*cell:* 314.258.0864
9317 Manchester Rd.
St. Louis, MO 63119
qflow.com <https://www.qflow.com/>