Hi Pranay,

It seems that your data is unevenly distributed across the cluster with respect to your insertion frequency. Please restructure your partition key.
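As a rough illustration of what restructuring the partition key could look like, here is a minimal Python sketch. It is not based on your actual schema. It assumes one common approach: adding a bucket component to the partition key so that writes for one hot key spread over several partitions. The names `bucketed_key`, `NUM_BUCKETS`, `hot_key` and `row-N` are hypothetical and only serve the example.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 16  # hypothetical value; choose it based on how hot the key is


def bucketed_key(base_key: str, row_id: str, num_buckets: int = NUM_BUCKETS) -> tuple:
    """Return (base_key, bucket), usable as a composite partition key.

    The bucket is derived from a stable hash of row_id, so the same row
    always maps to the same partition.
    """
    digest = hashlib.md5(row_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % num_buckets
    return (base_key, bucket)


if __name__ == "__main__":
    # Simulate 10,000 inserts against a single hot key and show how
    # they spread across the buckets.
    keys = [bucketed_key("hot_key", f"row-{i}") for i in range(10_000)]
    spread = Counter(bucket for _, bucket in keys)
    for bucket, count in sorted(spread.items()):
        print(bucket, count)
```

The trade-off is on the read side: to fetch everything for one base key, the application has to query every bucket and merge the results.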
Thanks

On Fri, Jan 20, 2017 at 6:49 AM, Pranay akula <pranay.akula2...@gmail.com> wrote:

> What I have observed is 2-3 old gen GCs in the 1-2 minutes before the OOM, which I
> rarely see. I have also seen hinted handoffs accumulate on the nodes which went
> down, and mutation drops as well.
>
> I really don't know how to analyse an hprof file. Is there any guide or blog
> that can help me analyse it? Our cluster has 2 DCs, each DC with
> 18 nodes, and 12 GB heap and 4 GB new heap.
>
> Thanks
> Pranay.
>
> On Thu, Jan 19, 2017 at 8:19 AM, Alain RODRIGUEZ <arodr...@gmail.com>
> wrote:
>
>> Hi Pranay,
>>
>>> what can be the reason for this
>>
>> It can be due to a JVM / GC misconfiguration or to some abnormal activity
>> in Cassandra. Often, GC issues are a consequence and not the root cause of
>> an issue in Cassandra.
>>
>>> how to debug that ??
>>> how to fine grain why on those particular nodes this is happening when
>>> these nodes are serving same requests like rest of the cluster ??
>>
>> You can enable GC logs on those nodes (use the cassandra-env.sh file to
>> do so) and have a look at what's happening there. You can also have a look
>> at the system.log files (search for warnings or errors - WARN / ERROR) and
>> at "nodetool tpstats". I like to use this last command as follows: "watch -d
>> nodetool tpstats", to see variations.
>>
>> Having pending or dropped threads is likely to increase the GC activity,
>> as are wide rows, many tombstones and some other cases.
>>
>> So to determine why this is happening, could you share your hardware
>> specs and the way JVM / GC is configured (cassandra-env.sh), and let us know
>> how nodes are handling threads and about any relevant information that might
>> be appearing in the logs?
>>
>> You can investigate the heap dump as well (I believe you can do this
>> using Eclipse Memory Analyzer - MAT).
>>
>> C*heers,
>> -----------------------
>> Alain Rodriguez - @arodream - al...@thelastpickle.com
>> France
>>
>> The Last Pickle - Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> 2017-01-19 14:00 GMT+01:00 Pranay akula <pranay.akula2...@gmail.com>:
>>
>>> For the last few days I have been seeing DSE shut down on some of the
>>> nodes in our Cassandra cluster due to the error below, and I need to kill
>>> the Java process and restart the DSE service.
>>>
>>> I have cross-checked reads, writes and compactions and nothing looks
>>> suspicious, but I am seeing full GC pauses on these servers just before the
>>> issue happens. What can be the reason for this, and how do I debug it? How
>>> can I narrow down why this is happening on those particular nodes when
>>> they are serving the same requests as the rest of the cluster?
>>>
>>> Is this happening because full GC is not being performed properly?
>>> We are using G1GC and DSE 4.8.3.
>>>
>>> ERROR [SharedPool-Worker-25] 2016-12-27 10:14:26,100 JVMStabilityInspector.java:117 - JVM state determined to be unstable. Exiting forcefully due to:
>>> java.lang.OutOfMemoryError: Java heap space
>>>     at java.util.Arrays.copyOf(Arrays.java:3181) ~[na:1.8.0_74]
>>>     at org.apache.cassandra.db.RangeTombstoneList.copy(RangeTombstoneList.java:112) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>>>     at org.apache.cassandra.db.DeletionInfo.copy(DeletionInfo.java:104) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>>>     at org.apache.cassandra.db.AtomicBTreeColumns.addAllWithSizeDelta(AtomicBTreeColumns.java:217) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>>>     at org.apache.cassandra.db.Memtable.put(Memtable.java:210) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>>>     at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1230) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>>>     at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>>>     at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:359) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>>>     at org.apache.cassandra.db.Mutation.apply(Mutation.java:214) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>>>     at org.apache.cassandra.db.MutationVerbHandler.doVerb(MutationVerbHandler.java:54) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>>>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_74]
>>>     at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>>>     at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>>>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]
>>>
>>> ERROR [SharedPool-Worker-25] 2016-12-27 10:14:28,100 SEPWorker.java:141 - Failed to execute task, unexpected exception killed worker: {}
>>> java.lang.IllegalStateException: Shutdown in progress
>>>     at java.lang.ApplicationShutdownHooks.remove(ApplicationShutdownHooks.java:82) ~[na:1.8.0_74]
>>>     at java.lang.Runtime.removeShutdownHook(Runtime.java:239) ~[na:1.8.0_74]
>>>     at org.apache.cassandra.service.StorageService.removeShutdownHook(StorageService.java:764) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>>>     at org.apache.cassandra.utils.JVMStabilityInspector$Killer.killCurrentJVM(JVMStabilityInspector.java:119) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>>>     at org.apache.cassandra.utils.JVMStabilityInspector$Killer.killCurrentJVM(JVMStabilityInspector.java:109) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>>>     at org.apache.cassandra.utils.JVMStabilityInspector.inspectThrowable(JVMStabilityInspector.java:68) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>>>     at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:168) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>>>     at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>>>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]
>>>
>>> INFO [Thread-6] 2016-12-27 10:14:56,150 DseDaemon.java:420 - DSE shutting down...
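
The suggestion quoted above, to search system.log for WARN / ERROR entries, can be sketched in Python roughly as follows. This is only a sketch under assumptions: it assumes log lines shaped like the ones pasted in this thread (level, thread in brackets, timestamp, source file and line number, then the message), and the helper name `summarize` and the log path passed on the command line are hypothetical.

```python
import re
import sys
from collections import Counter

# Matches lines shaped like the ones pasted in this thread, e.g.
# "ERROR [SharedPool-Worker-25] 2016-12-27 10:14:26,100 JVMStabilityInspector.java:117 - ..."
LINE_RE = re.compile(
    r"^(WARN|ERROR)\s+\[([^\]]+)\]\s+(\S+ \S+)\s+(\S+):\d+ - (.*)$"
)


def summarize(lines):
    """Count WARN/ERROR entries per (level, source file)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            level, _thread, _timestamp, source, _message = match.groups()
            counts[(level, source)] += 1
    return counts


if __name__ == "__main__":
    # Usage: python summarize_log.py /path/to/system.log  (path is an assumption)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for (level, source), count in summarize(handle).most_common():
            print(f"{count:6d} {level:5s} {source}")
```

Comparing the summary from a node that crashes with one from a healthy node may help show what is different on the affected nodes.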