Thanks for all the helpful information.

Currently we are averaging about 5.5k requests a minute for this collection,
which is served by a 3-node SOLR cluster. The RHEL 6 (current) and RHEL 7
(new) servers both use OpenJDK 8. The older servers run an older build,
8.131; the new servers run 8.302.

GC is configured the same on all servers.

GC_TUNE="-XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=200
-XX:+AggressiveOpts -XX:+AlwaysPreTouch -XX:+PerfDisableSharedMem
-XX:MetaspaceSize=64M"
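
As a quick sanity check on the setting above, here is a minimal sketch that
splits the GC_TUNE string on whitespace and prints one flag per line. It
assumes the start script expands the variable unquoted, so the line breaks
inside the quotes act as ordinary separators. split_gc_flags is a
hypothetical helper name, not part of Solr.

```shell
# GC_TUNE exactly as configured on the servers (wrapped across three lines).
GC_TUNE="-XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=200
-XX:+AggressiveOpts -XX:+AlwaysPreTouch -XX:+PerfDisableSharedMem
-XX:MetaspaceSize=64M"

# Hypothetical helper: print each flag of a GC_TUNE-style string on its own line.
split_gc_flags() {
  # Unquoted expansion splits on spaces, tabs, and newlines (default IFS).
  set -- $1
  printf '%s\n' "$@"
}

split_gc_flags "$GC_TUNE"
```

Running the same check on an old and a new server and diffing the two
outputs is one way to confirm the flags really are identical.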


Because I can bring the nodes online during off-peak hours and load test,
I'll take a look at the 'swap off' option. I don't control the hardware, but
I think a larger SSD-based swap filesystem is also an option if turning swap
off doesn't work.
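
For the swap test, a minimal sketch of how swap usage might be checked
before and after the change. It reads the SwapTotal and SwapFree fields from
/proc/meminfo. swap_used_kb is a hypothetical helper name, and the
swapoff/swapon steps in the comments are an assumed workflow that needs root.

```shell
# Hypothetical helper: print swap in use (kB) from a meminfo-style file.
# Defaults to /proc/meminfo when no file is given.
swap_used_kb() {
  awk '/^SwapTotal:/ {total = $2}
       /^SwapFree:/  {free = $2}
       END {print total - free}' "${1:-/proc/meminfo}"
}

# Only report when running on a system that exposes /proc/meminfo.
if [ -r /proc/meminfo ]; then
  echo "swap in use: $(swap_used_kb) kB"
fi

# Assumed workflow for the off-peak test (run as root):
#   swapoff -a    # disable all active swap devices
#   ...run the load test and watch kswapd0 in top...
#   swapon -a     # re-enable the swap devices listed in /etc/fstab
```

If the reported value is already close to zero while kswapd0 is busy, that
would suggest the kernel is reclaiming page cache rather than swapping.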


Thanks again..





On Tue, Oct 26, 2021 at 9:20 AM Shawn Heisey <apa...@elyograg.org> wrote:

> On 10/26/21 6:10 AM, Paul Russell wrote:
> > I have a current SOLR cluster running SOLR 6.6 on RHEL 6 servers. All SOLR
> > instances use a 25G JVM on the RHEL 6 server configured with 64G of memory
> > managing a 900G collection. Measured response time to queries average about
> > 100ms.
>
> Congrats on getting that performance.  With the numbers you have
> described, I would not expect to see anything that good.
>
> > On the RHEL 7 servers the kswapd0 process is consuming up to 30% of the CPU
> > and response time is being measured at 500-1000 ms for queries.
>
> How long are you giving the system, and how many queries have been
> handled by the cluster before you begin benchmarking?  The only way the
> old cluster could see performance that good is handling a LOT of queries
> ... enough that the OS can figure out how to effectively cache the index
> with limited memory.  By my calculations, your systems have less than
> 40GB of free memory to cache a 900GB index.  And that assumes that Solr
> is the only software running on these systems.
>
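The estimate above can be reproduced with a little arithmetic. This is a
rough sketch using only the figures from this thread (64G of RAM, a 25G or
16G heap, a 900G index). It treats everything outside the heap as available
for the page cache, so the result is an upper bound. cache_headroom is a
hypothetical helper name.

```shell
# cache_headroom RAM_GB HEAP_GB INDEX_GB
# Prints the RAM left for the OS page cache and the share of the index it
# could hold, ignoring memory used by the OS and any other processes.
cache_headroom() {
  awk -v ram="$1" -v heap="$2" -v idx="$3" 'BEGIN {
    free = ram - heap
    printf "%dG left for page cache, %.1f%% of a %dG index\n",
           free, 100 * free / idx, idx
  }'
}

cache_headroom 64 25 900   # current 25G heap
cache_headroom 64 16 900   # trimmed 16G heap
```

With the 25G heap this gives 39G (about 4.3% of the index); with the 16G
heap it gives 48G (about 5.3%).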
> > I tried using the vm.swappiness setting at both 0 and 1 and have been
> > unable to change the behavior.
>
> Did you see any information other than kswapd0 CPU usage that led you to
> this action?  I would not expect swap to be the problem with this, and
> your own experiments seem to say the same.
>
> > If I trim the SOLR JVM to 16Gb response
> > times get better and GC logs show the JVM is operating correctly..
>
>
> Sounds like you have a solution.  Is there a problem with simply
> changing the heap size?  If everything works with a lower heap size,
> then the lower heap size is strongly encouraged.  You seem to be making
> a point here about the JVM operating correctly with a 16GB heap.  Are
> you seeing something in GC logs to indicate incorrect operation with the
> higher heap?  Solr 6.x uses CMS for garbage collection. You might see
> better GC performance by switching to G1. Switching to another collector
> would require a much newer Java version, one that is probably not
> compatible with Solr 6.x. Here is the GC_TUNE setting (goes in
> solr.in.sh) for newer Solr versions:
>
>        GC_TUNE=('-XX:+UseG1GC' \
>          '-XX:+PerfDisableSharedMem' \
>          '-XX:+ParallelRefProcEnabled' \
>          '-XX:MaxGCPauseMillis=250' \
>          '-XX:+UseLargePages' \
>          '-XX:+AlwaysPreTouch' \
>          '-XX:+ExplicitGCInvokesConcurrent')
>
> If your servers have more than one physical CPU and NUMA architecture,
> then I would strongly recommend adding "-XX:+UseNUMA" to the argument
> list.  Adding it on systems with only one NUMA node will not cause
> problems.
>
> I would not expect the problem to be in the OS, but I could be wrong.
> It is possible that changes in the newer kernel make it less efficient
> at figuring out proper cache operation, and that would affect Solr.
> Usually things get better with an upgrade, but you never know.
>
> It seems more likely to be some other difference between the systems.
> Top culprit in my mind is Java.  Are the two systems running the same
> version of Java from the same vendor?  What I would recommend for Solr
> 6.x is the latest OpenJDK 8.  In the past I would have recommended
> Oracle Java, but they changed their licensing, so now I go with
> OpenJDK.  Avoid IBM Java or anything that descends from it -- it is
> known to have bugs running Lucene software.  If you want to use a newer
> Java version than Java 8, you'll need to upgrade Solr.  Upgrading from
> 6.x to 8.x is something that requires extensive testing, and a complete
> reindex from scratch.
>
> I would be interested in seeing the screenshot described here:
>
>
> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems#SolrPerformanceProblems-Askingforhelponamemory/performanceissue
>
> RHEL uses gnu top.
>
> My own deployments use Ubuntu.  Back when I did have access to large
> Solr installs, they were running on CentOS, which is effectively the
> same as RHEL.  I do not recall whether they were CentOS 6 or 7.
>
> Thanks,
> Shawn
>
>
>

-- 
Paul Russell
VP Integration/Support Services
*main:* 314.968.9906
*direct:* 314.255.2135
*cell:* 314.258.0864
9317 Manchester Rd.
St. Louis, MO 63119
qflow.com <https://www.qflow.com/>
