I have always preferred turning swap off completely on dedicated Solr
machines, especially if you can't use an SSD.
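
For reference, on a RHEL-style box that usually amounts to something like
the following (a sketch, not a tested script; double-check your /etc/fstab
before editing it):

      sudo swapoff -a    # disable all active swap immediately
      # comment out swap entries so the change survives a reboot
      sudo sed -i.bak '/\sswap\s/s/^/#/' /etc/fstab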

> On Oct 26, 2021, at 12:59 PM, Paul Russell <paul.russ...@qflow.com> wrote:
> 
> Thanks for all the helpful information.
> 
> Currently we are averaging about 5.5k requests a minute for this collection,
> which is served by a 3-node SOLR cluster. RHEL 6 (current servers) and
> RHEL 7 (new servers) both use OpenJDK 8: the older servers run 8u131, the
> new servers 8u302.
> 
> GC is configured the same on all servers.
> 
> GC_TUNE="-XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=200
> -XX:+AggressiveOpts -XX:+AlwaysPreTouch -XX:+PerfDisableSharedMem
> -XX:MetaspaceSize=64M"
> 
> 
> Because I can bring the nodes online during off-peak hours and load test,
> I'll take a look at the swap-off option. I don't control the hardware, but
> a larger SSD-backed swap filesystem is also an option if turning swap off
> doesn't work.
> 
> 
> Thanks again.
> 
>> On Tue, Oct 26, 2021 at 9:20 AM Shawn Heisey <apa...@elyograg.org> wrote:
>> 
>>> On 10/26/21 6:10 AM, Paul Russell wrote:
>>> I have a current SOLR cluster running SOLR 6.6 on RHEL 6 servers. All
>>> SOLR instances use a 25G JVM on RHEL 6 servers configured with 64G of
>>> memory each, managing a 900G collection. Measured response time to
>>> queries averages about 100ms.
>> 
>> Congrats on getting that performance.  With the numbers you have
>> described, I would not expect to see anything that good.
>> 
>>> On the RHEL 7 servers the kswapd0 process is consuming up to 30% of the
>>> CPU, and response time is being measured at 500-1000 ms for queries.
>> 
>> How long are you giving the system, and how many queries have been
>> handled by the cluster before you begin benchmarking?  The only way the
>> old cluster could see performance that good is handling a LOT of queries
>> ... enough that the OS can figure out how to effectively cache the index
>> with limited memory.  By my calculations, your systems have less than
>> 40GB of free memory (64GB total minus the 25GB heap) with which to
>> cache a 900GB index.  And that assumes that Solr is the only software
>> running on these systems.
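>> 
>> A quick way to sanity-check that on a live node (the "available" column
>> needs a reasonably recent procps; older versions only show
>> buffers/cache):
>> 
>>       free -g                           # memory left for the page cache
>>       grep -i '^Cached' /proc/meminfo   # current page cache size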
>> 
>>> I tried setting vm.swappiness to both 0 and 1 and have been unable to
>>> change the behavior.
>> 
>> Did you see any information other than kswapd0 CPU usage that led you to
>> this action?  I would not expect swap to be the problem with this, and
>> your own experiments seem to say the same.
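>> 
>> For completeness, the usual way to apply that setting, both live and
>> across reboots, is:
>> 
>>       sysctl -w vm.swappiness=1
>>       echo 'vm.swappiness=1' >> /etc/sysctl.conf   # persist the setting
>> 
>> but as your own tests suggest, swappiness does not look like the
>> bottleneck here.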
>> 
>>> If I trim the SOLR JVM to 16GB, response
>>> times get better and GC logs show the JVM is operating correctly.
>> 
>> 
>> Sounds like you have a solution.  Is there a problem with simply
>> changing the heap size?  If everything works with a lower heap size,
>> then the lower heap size is strongly encouraged.  You seem to be making
>> a point here about the JVM operating correctly with a 16GB heap.  Are
>> you seeing something in GC logs to indicate incorrect operation with the
>> higher heap?  Solr 6.x uses CMS for garbage collection by default.  You
>> might see better GC performance by switching to G1.  Moving to anything
>> newer than G1 would require a much newer Java version, one that is
>> probably not compatible with Solr 6.x.  Here is the GC_TUNE setting (it
>> goes in solr.in.sh) for newer Solr versions:
>> 
>>       GC_TUNE=('-XX:+UseG1GC' \
>>         '-XX:+PerfDisableSharedMem' \
>>         '-XX:+ParallelRefProcEnabled' \
>>         '-XX:MaxGCPauseMillis=250' \
>>         '-XX:+UseLargePages' \
>>         '-XX:+AlwaysPreTouch' \
>>         '-XX:+ExplicitGCInvokesConcurrent')
>> 
>> If your servers have more than one physical CPU socket (a NUMA
>> architecture), then I would strongly recommend adding "-XX:+UseNUMA" to
>> the argument list.  Adding it on systems with only one NUMA node will
>> not cause problems.
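>> 
>> To see how many NUMA nodes a box actually has (numactl may need to be
>> installed separately):
>> 
>>       lscpu | grep -i 'numa node'   # node count and per-node CPU lists
>>       numactl --hardware            # adds per-node memory sizes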
>> 
>> I would not expect the problem to be in the OS, but I could be wrong.
>> It is possible that changes in the newer kernel make it less efficient
>> at figuring out proper cache operation, and that would affect Solr.
>> Usually things get better with an upgrade, but you never know.
>> 
>> It seems more likely to be some other difference between the systems.
>> Top culprit in my mind is Java.  Are the two systems running the same
>> version of Java from the same vendor?  What I would recommend for Solr
>> 6.x is the latest OpenJDK 8.  In the past I would have recommended
>> Oracle Java, but they changed their licensing, so now I go with
>> OpenJDK.  Avoid IBM Java or anything that descends from it -- it is
>> known to have bugs running Lucene software.  If you want to use a newer
>> Java version than Java 8, you'll need to upgrade Solr.  Upgrading from
>> 6.x to 8.x is something that requires extensive testing, and a complete
>> reindex from scratch.
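>> 
>> An easy way to confirm what each node is actually running:
>> 
>>       java -version                      # vendor and exact build
>>       readlink -f "$(command -v java)"   # which install PATH resolves to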
>> 
>> I would be interested in seeing the screenshot described here:
>> 
>> 
>> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems#SolrPerformanceProblems-Askingforhelponamemory/performanceissue
>> 
>> RHEL uses GNU top.
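>> 
>> To capture it, run "top", press shift+M to sort by resident memory, and
>> screenshot the whole window.  On newer procps you can also start it
>> pre-sorted:
>> 
>>       top -o %MEM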
>> 
>> My own deployments use Ubuntu.  Back when I did have access to large
>> Solr installs, they were running on CentOS, which is effectively the
>> same as RHEL.  I do not recall whether they were CentOS 6 or 7.
>> 
>> Thanks,
>> Shawn
>> 
>> 
>> 
> 
> -- 
> Paul
> Russell
> VP Integration/Support Services
> *main:* 314.968.9906
> *direct:* 314.255.2135
> *cell:* 314.258.0864
> 9317 Manchester Rd.
> St. Louis, MO 63119
> qflow.com <https://www.qflow.com/>
