Hi,

I am seeing some odd behavior after upgrading 2 nodes (out of 13) from
C* 2.1.11 to 3.0.5. After restarting a node a second time, there is a small
chance that it will die without writing anything to the logs (not even to
dmesg).

This happened on both nodes I upgraded. The only "anomalies" I see in the
logs (although they do not coincide with the moment a node dies) are:

* Many messages like the following, for every IP in the cluster (logged
every second):

DEBUG [GossipStage:1] 2016-05-05 23:52:02,260 FailureDetector.java:456 - Ignoring interval time of 2540341017 for /x.y.b.5
DEBUG [GossipStage:1] 2016-05-05 23:52:02,260 FailureDetector.java:456 - Ignoring interval time of 2000551507 for /x.y.a.7
DEBUG [GossipStage:1] 2016-05-05 23:52:02,260 FailureDetector.java:456 - Ignoring interval time of 2000479104 for /x.y.a.3
DEBUG [GossipStage:1] 2016-05-05 23:52:02,260 FailureDetector.java:456 - Ignoring interval time of 2000471247 for /x.y.b.3
DEBUG [GossipStage:1] 2016-05-05 23:52:03,259 FailureDetector.java:456 - Ignoring interval time of 2000605748 for /x.y.a.5
DEBUG [GossipStage:1] 2016-05-05 23:52:03,260 FailureDetector.java:456 - Ignoring interval time of 2000731307 for /x.y.b.6
DEBUG [GossipStage:1] 2016-05-05 23:52:03,260 FailureDetector.java:456 - Ignoring interval time of 3000404107 for /x.y.b.1
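
For reference, the interval values above appear to be in nanoseconds, so
2000551507 is roughly 2.0 s. As far as I can tell, the failure detector logs
this message when the gap between gossip heartbeats from a peer exceeds its
maximum interval (2 s by default, I believe, adjustable through the
cassandra.fd_max_interval_ms system property). A small sketch for pulling the
endpoints and intervals out of such log lines; the function name and the
sample text are only illustrative:

```python
import re

# Matches the FailureDetector debug message, even when the log line is wrapped.
PATTERN = re.compile(r"Ignoring interval time of\s+(\d+)\s+for\s+/(\S+)")

def parse_ignored_intervals(log_text):
    """Return (endpoint, interval_in_seconds) tuples, assuming nanosecond values."""
    return [(ip, int(nanos) / 1e9) for nanos, ip in PATTERN.findall(log_text)]

sample = """\
DEBUG [GossipStage:1] 2016-05-05 23:52:02,260 FailureDetector.java:456 -
Ignoring interval time of 2540341017 for /x.y.b.5
DEBUG [GossipStage:1] 2016-05-05 23:52:03,260 FailureDetector.java:456 -
Ignoring interval time of 3000404107 for /x.y.b.1
"""

intervals = parse_ignored_intervals(sample)
for ip, seconds in intervals:
    print(f"{ip}: {seconds:.3f} s")
```

Grouping the result by endpoint would show whether particular peers are
consistently slow or whether the delay is uniform across the cluster.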

* Some metrics are not being pushed to Graphite (although others do reach
the server). Also, every time the node tries to push them, I see the
following error in the logs:

ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-05 23:53:37,770 ScheduledReporter.java:119 - RuntimeException thrown from GraphiteReporter#report. Exception was suppressed.
java.lang.IllegalStateException: Unable to compute ceiling for max when histogram overflowed
    at org.apache.cassandra.utils.EstimatedHistogram.rawMean(EstimatedHistogram.java:231) ~[apache-cassandra-3.0.5.jar:3.0.5]
    at org.apache.cassandra.metrics.EstimatedHistogramReservoir$HistogramSnapshot.getMean(EstimatedHistogramReservoir.java:103) ~[apache-cassandra-3.0.5.jar:3.0.5]
    at com.codahale.metrics.graphite.GraphiteReporter.reportHistogram(GraphiteReporter.java:252) ~[metrics-graphite-3.1.0.jar:3.1.0]
    at com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:166) ~[metrics-graphite-3.1.0.jar:3.1.0]
    at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) ~[metrics-core-3.1.0.jar:3.1.0]
    at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) ~[metrics-core-3.1.0.jar:3.1.0]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_60]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_60]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_60]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_60]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
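
The exception text suggests that the histogram behind one of the metrics has
recorded a value beyond its largest bucket, after which its mean cannot be
computed. Since the exception escapes GraphiteReporter#report, the metrics
that would have been sent after the failing one in that pass are probably
skipped, which would be consistent with only some metrics reaching Graphite
(this is a guess, not something confirmed). The toy class below is a
simplified sketch of that overflow behavior, not Cassandra's actual
EstimatedHistogram; the class name and bucket bounds are hypothetical:

```python
import bisect

class OverflowingHistogram:
    """Toy fixed-bucket histogram with a trailing overflow bucket (hypothetical)."""

    def __init__(self, bucket_offsets):
        self.offsets = list(bucket_offsets)          # upper bound of each bucket
        self.counts = [0] * (len(self.offsets) + 1)  # last slot counts overflows

    def add(self, value):
        # First bucket whose upper bound is >= value; past the end means overflow.
        self.counts[bisect.bisect_left(self.offsets, value)] += 1

    def mean(self):
        if self.counts[-1] > 0:
            raise RuntimeError("histogram overflowed; mean is undefined")
        total = sum(self.counts)
        if total == 0:
            return 0.0
        # Approximate each sample by the upper bound of its bucket.
        return sum(c * o for c, o in zip(self.counts, self.offsets)) / total

h = OverflowingHistogram([1, 10, 100])
h.add(5)
h.add(50)
mean_before = h.mean()   # both samples fit, so the mean is defined

h.add(1000)              # larger than the last bucket bound: lands in overflow
try:
    h.mean()
    overflowed = False
except RuntimeError:
    overflowed = True
```

Once a single sample lands in the overflow slot, every later call to mean()
fails until the histogram is reset, which matches an error that recurs on
every reporting interval.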

Besides these, the logs are clean. I've opened a ticket
(https://issues.apache.org/jira/browse/CASSANDRA-11723), but any help
debugging this is more than welcome.

Regards,
Stefano
