I have always preferred completely turning off swap on Solr-dedicated machines, especially if you can't use an SSD.
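
For what it's worth, a minimal sketch of doing that on RHEL (the fstab edit and the sysctl.d file name are assumptions about your layout; adjust as needed):

  # turn off all swap devices right now (runtime change only)
  sudo swapoff -a

  # comment out the swap entry in /etc/fstab so it stays off after a reboot
  sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab

  # if you have to keep swap around, the gentler knob is vm.swappiness
  echo 'vm.swappiness = 1' | sudo tee /etc/sysctl.d/90-swappiness.conf
  sudo sysctl -p /etc/sysctl.d/90-swappiness.conf
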
> On Oct 26, 2021, at 12:59 PM, Paul Russell <paul.russ...@qflow.com> wrote:
>
> Thanks for all the helpful information.
>
> Currently we are averaging about 5.5k requests a minute for this
> collection, which is supported by a 3-node Solr cluster. RHEL 6 (current
> servers) and RHEL 7 (new servers) both run OpenJDK 8; the older servers
> have an older build, 8u131, and the new servers have 8u302.
>
> GC is configured the same on all servers:
>
> GC_TUNE="-XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=200
> -XX:+AggressiveOpts -XX:+AlwaysPreTouch -XX:+PerfDisableSharedMem
> -XX:MetaspaceSize=64M"
>
> Because I can bring the nodes online during off-peak hours and load test,
> I'll take a look at the "swap off" option. I don't control the hardware,
> but I also think a larger SSD-based swap filesystem is an option if
> turning swap off doesn't work.
>
> Thanks again.
>
>> On Tue, Oct 26, 2021 at 9:20 AM Shawn Heisey <apa...@elyograg.org> wrote:
>>
>>> On 10/26/21 6:10 AM, Paul Russell wrote:
>>> I have a current Solr cluster running Solr 6.6 on RHEL 6 servers. All
>>> Solr instances use a 25GB JVM heap on servers configured with 64GB of
>>> memory, managing a 900GB collection. Measured response time to queries
>>> averages about 100ms.
>>
>> Congrats on getting that performance. With the numbers you have
>> described, I would not expect to see anything that good.
>>
>>> On the RHEL 7 servers the kswapd0 process is consuming up to 30% of the
>>> CPU, and response time is being measured at 500-1000 ms for queries.
>>
>> How long are you giving the system, and how many queries have been
>> handled by the cluster before you begin benchmarking? The only way the
>> old cluster could see performance that good is by handling a LOT of
>> queries ... enough that the OS can figure out how to effectively cache
>> the index with limited memory. By my calculations, your systems have
>> less than 40GB of free memory to cache a 900GB index (64GB of RAM minus
>> a 25GB heap leaves 39GB). And that assumes that Solr is the only
>> software running on these systems.
>>
>>> I tried using the vm.swappiness setting at both 0 and 1 and have been
>>> unable to change the behavior.
>>
>> Did you see any information other than kswapd0 CPU usage that led you to
>> this action? I would not expect swap to be the problem here, and your
>> own experiments seem to say the same.
>>
>>> If I trim the Solr JVM to 16GB, response times get better and GC logs
>>> show the JVM is operating correctly.
>>
>> Sounds like you have a solution. Is there a problem with simply
>> changing the heap size? If everything works with a lower heap size,
>> then the lower heap size is strongly encouraged. You seem to be making
>> a point here about the JVM operating correctly with a 16GB heap. Are
>> you seeing something in the GC logs that indicates incorrect operation
>> with the larger heap? Solr 6.x uses CMS for garbage collection by
>> default. You might see better GC performance by switching to G1.
>> Switching to a collector newer than G1 would require a much newer Java
>> version, one that is probably not compatible with Solr 6.x.
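>>
>> If you want hard numbers, GC logging is cheap to turn on. As a sketch,
>> the standard HotSpot 8 flags look like this (the log path is an
>> assumption -- point it at wherever your Solr logs live; solr.in.sh also
>> has a GC_LOG_OPTS variable meant for these flags):
>>
>>   -Xloggc:/var/solr/logs/solr_gc.log \
>>   -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
>>   -XX:+PrintGCApplicationStoppedTime \
>>   -XX:+UseGCLogFileRotation \
>>   -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
>>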
>> Here is the GC_TUNE setting (it goes in solr.in.sh) for newer Solr
>> versions:
>>
>> GC_TUNE=('-XX:+UseG1GC' \
>>   '-XX:+PerfDisableSharedMem' \
>>   '-XX:+ParallelRefProcEnabled' \
>>   '-XX:MaxGCPauseMillis=250' \
>>   '-XX:+UseLargePages' \
>>   '-XX:+AlwaysPreTouch' \
>>   '-XX:+ExplicitGCInvokesConcurrent')
>>
>> If your servers have more than one physical CPU and a NUMA
>> architecture, then I would strongly recommend adding "-XX:+UseNUMA" to
>> the argument list. Adding it on systems with only one NUMA node will
>> not cause problems.
>>
>> I would not expect the problem to be in the OS, but I could be wrong.
>> It is possible that changes in the newer kernel make it less efficient
>> at figuring out proper cache behavior, and that would affect Solr.
>> Usually things get better with an upgrade, but you never know.
>>
>> It seems more likely to be some other difference between the systems.
>> The top culprit in my mind is Java. Are the two systems running the
>> same version of Java from the same vendor? What I would recommend for
>> Solr 6.x is the latest OpenJDK 8. In the past I would have recommended
>> Oracle Java, but they changed their licensing, so now I go with
>> OpenJDK. Avoid IBM Java or anything that descends from it -- it is
>> known to have bugs when running Lucene software. If you want to use a
>> Java version newer than Java 8, you'll need to upgrade Solr. Upgrading
>> from 6.x to 8.x is something that requires extensive testing and a
>> complete reindex from scratch.
>>
>> I would be interested in seeing the screenshot described here:
>>
>> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems#SolrPerformanceProblems-Askingforhelponamemory/performanceissue
>>
>> RHEL uses GNU top.
>>
>> My own deployments use Ubuntu. Back when I did have access to large
>> Solr installs, they were running on CentOS, which is effectively the
>> same as RHEL. I do not recall whether they were CentOS 6 or 7.
>>
>> Thanks,
>> Shawn
>
> --
> Paul Russell
> VP Integration/Support Services
> main: 314.968.9906
> direct: 314.255.2135
> cell: 314.258.0864
> 9317 Manchester Rd.
> St. Louis, MO 63119
> qflow.com <https://www.qflow.com/>
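
P.S. On Shawn's -XX:+UseNUMA suggestion above: before adding the flag, it
is easy to check whether the new RHEL 7 boxes actually expose more than
one NUMA node (lscpu ships with RHEL; numactl may need to be installed):

  lscpu | grep -i numa
  # or, if the numactl package is present:
  numactl --hardware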