Hi All,

We have a cluster of 30 nodes, and each node holds 750 GB of data.
There are 420 shards. Shards and data are evenly distributed across all nodes.
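
For anyone who wants to double-check the distribution, here is a minimal sketch of how replicas and shard leaders per node could be counted. It assumes the JSON returned by the Collections API CLUSTERSTATUS action has already been fetched and parsed into a dict; the function name replicas_per_node is hypothetical and only for illustration.

```python
from collections import Counter


def replicas_per_node(cluster_status):
    """Count replicas and shard leaders per node.

    cluster_status is assumed to be the parsed JSON body of a
    CLUSTERSTATUS response (cluster -> collections -> shards -> replicas).
    """
    replicas = Counter()
    leaders = Counter()
    collections = cluster_status.get("cluster", {}).get("collections", {})
    for collection in collections.values():
        for shard in collection.get("shards", {}).values():
            for replica in shard.get("replicas", {}).values():
                node = replica.get("node_name", "unknown")
                replicas[node] += 1
                # The leader flag is reported as the string "true" on the leader replica.
                if str(replica.get("leader", "false")).lower() == "true":
                    leaders[node] += 1
    return replicas, leaders
```

If one node turns out to hold far more leaders than the others, that would be worth comparing against the node with the high heap usage.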
JVM settings:

JDK: Amazon.com Inc. OpenJDK 64-Bit Server VM 17.0.1 17.0.1+12-LTS
Processors: 48
JVM args:
-DSTOP.KEY=solrrocks
-DSTOP.PORT=7983
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.port=8986
-Dcom.sun.management.jmxremote.ssl=false
-Denable.packages=true
-Denable.runtime.lib=true
-Djava.net.preferIPv4Stack=true
-Djetty.home=/prod/solrCI/8.11.1-191/solr-8.11.1/server
-Djetty.port=8983
-Djute.maxbuffer=10000000
-Dsolr.data.home=
-Dsolr.data.home=/prod/solr_data/inst1
-Dsolr.default.confdir=/prod/solrCI/8.11.1-191/solr-8.11.1/server/solr/configsets/_default/conf
-Dsolr.environment=prod,label=PROD2+PRODUCTION,color=#c9fdd6
-Dsolr.install.dir=/prod/solrCI/8.11.1-191/solr-8.11.1
-Dsolr.jetty.inetaccess.excludes=
-Dsolr.jetty.inetaccess.includes=
-Dsolr.log.dir=/prod/solrCI/8.11.1-191/solr-8.11.1/server/logs
-Dsolr.solr.home=/prod/solr_home/inst1
-Duser.timezone=UTC
-DzkClientTimeout=30000
-DzkHost=<zookeeper_string>
-XX:+UseNUMA
-XX:+UseZGC
-XX:-OmitStackTraceInFastThrow
-XX:CompileCommand=exclude,com.github.benmanes.caffeine.cache.BoundedLocalCache::put
-XX:OnOutOfMemoryError=/prod/solrCI/8.11.1-191/solr-8.11.1/bin/oom_solr.sh 8983 /prod/solrCI/8.11.1-191/solr-8.11.1/server/logs
-XX:SoftMaxHeapSize=64g
-Xlog:gc*:file=/prod/solrCI/8.11.1-191/solr-8.11.1/server/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M
-Xms88g
-Xmx88g
-Xss256k

What we observe is that only one node shows high heap usage, while the other
nodes are well below the threshold.
You can see this in the attached image.

[Attached image: image001.png]


Even if we bounce the node or the entire cluster, the same issue comes back,
and it is always the same node that reports high heap usage.
We also tried reloading the collection, but that did not help.
It is also strange that only this one node takes the hit, and sometimes it
just dies.


We compared that machine with all the other machines and made sure nothing is
different.

If anyone has any pointers, they would be greatly appreciated.

Please let me know if you need more information.



Thanks,
Jigar Gajjar
