Sam Tunnicliffe created CASSANDRA-21025:
-------------------------------------------
Summary: Failure detector max interval value is calculated incorrectly
Key: CASSANDRA-21025
URL: https://issues.apache.org/jira/browse/CASSANDRA-21025
Project: Apache Cassandra
Issue Type: Bug
Components: Cluster/Gossip
Reporter: Sam Tunnicliffe
If the failure detector's max interval is not overridden via the
{{cassandra.fd_max_interval_ms}} system property
({{CassandraRelevantProperties.FD_MAX_INTERVAL_MS}}), then it is seeded with
the value of {{FailureDetector.INITIAL_VALUE_NANOS}}.
However, a bug in the logic of
{{FailureDetector$ArrivalWindow::getMaxInterval}} means that in this case there
is an incorrect conversion between time units.
{code:java}
public static long getMaxInterval()
{
    long newValue = FD_MAX_INTERVAL_MS.getLong(FailureDetector.INITIAL_VALUE_NANOS);
    if (newValue != FailureDetector.INITIAL_VALUE_NANOS)
        logger.info("Overriding {} from {}ms to {}ms",
                    FD_MAX_INTERVAL_MS.getKey(), FailureDetector.INITIAL_VALUE_NANOS, newValue);
    return TimeUnit.NANOSECONDS.convert(newValue, TimeUnit.MILLISECONDS);
}
{code}
If {{FD_MAX_INTERVAL_MS}} is not set, the supplied default
{{INITIAL_VALUE_NANOS}} is used, but this is then converted as if it were a
value in millis, inflating it by a factor of 1,000,000.
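To make the unit mix-up concrete, here is a minimal, self-contained sketch that contrasts the conversion described above with one possible correction. It is not the actual patch: the class name, the method names, and the use of a nullable {{Long}} to model "property not set" are all hypothetical stand-ins for the real property lookup, and the 2-second default is taken from the description in this report.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the logic in getMaxInterval(); not Cassandra code.
final class MaxIntervalSketch
{
    // Default of 2 seconds expressed in nanoseconds, as described in the report.
    static final long INITIAL_VALUE_NANOS = TimeUnit.SECONDS.toNanos(2);

    // Mirrors the reported behaviour: the nanosecond default is used as the
    // fallback for a millisecond property and then converted as if it were millis.
    // overrideMillis == null models "system property not set" (an assumption).
    static long buggyMaxIntervalNanos(Long overrideMillis)
    {
        long newValue = overrideMillis != null ? overrideMillis : INITIAL_VALUE_NANOS;
        return TimeUnit.NANOSECONDS.convert(newValue, TimeUnit.MILLISECONDS);
    }

    // One possible correction: only convert when a millisecond override exists,
    // otherwise return the nanosecond default untouched.
    static long fixedMaxIntervalNanos(Long overrideMillis)
    {
        if (overrideMillis == null)
            return INITIAL_VALUE_NANOS;
        return TimeUnit.MILLISECONDS.toNanos(overrideMillis);
    }

    public static void main(String[] args)
    {
        System.out.println("buggy default (ns): " + buggyMaxIntervalNanos(null));
        System.out.println("fixed default (ns): " + fixedMaxIntervalNanos(null));
        System.out.println("buggy 3000ms override (ns): " + buggyMaxIntervalNanos(3000L));
        System.out.println("fixed 3000ms override (ns): " + fixedMaxIntervalNanos(3000L));
    }
}
```

In this sketch both variants agree whenever an override is supplied; they differ only in the default path, which matches the behaviour described in this report.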
The effective max interval in this case should be 2 seconds, but instead
becomes 23 days, 3 hours, 33 minutes & 20 seconds.
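The arithmetic behind that figure can be checked with a short sketch using only {{java.time.Duration}} (Java 9 or later). The class and method names are illustrative; the only input is the 2-second default, written as 2,000,000,000 ns and then misread as milliseconds.

```java
import java.time.Duration;

// Illustrative only: reproduces the unit arithmetic from the description.
final class InflatedIntervalArithmetic
{
    // Break a duration into days / hours / minutes / seconds for display.
    static String describe(Duration d)
    {
        return String.format("%d days, %d hours, %d minutes, %d seconds",
                             d.toDays(), d.toHoursPart(), d.toMinutesPart(), d.toSecondsPart());
    }

    public static void main(String[] args)
    {
        long initialValueNanos = 2_000_000_000L; // 2 seconds in nanoseconds

        // Intended interpretation: the value is in nanoseconds.
        System.out.println("intended: " + describe(Duration.ofNanos(initialValueNanos)));

        // Reported behaviour: the same number is treated as milliseconds.
        System.out.println("actual:   " + describe(Duration.ofMillis(initialValueNanos)));
    }
}
```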
The net effect is that intervals way longer than expected can be recorded if
nodes are intermittently partitioned but not restarted (meaning they retain the
same gossip generation).
In turn this can cause the phi calculation to react to those nodes much more
slowly as the mean arrival time interval is much bigger than expected, leaving
them marked as {{UP}} when they should be {{DOWN}}.
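The effect on detection time can be illustrated with a deliberately simplified model. The sketch below assumes phi is proportional to the time since the last heartbeat divided by the mean recorded interval, scaled by 1/ln(10), and that a node is convicted once that value exceeds a threshold of 8. These constants, the 10-sample window, and all names are assumptions made for illustration, not a copy of the {{FailureDetector}} implementation.

```java
// Simplified, hypothetical model of how an oversized max interval skews phi.
final class PhiSketch
{
    static final double PHI_FACTOR = 1.0 / Math.log(10.0); // assumed scaling factor
    static final double CONVICT_THRESHOLD = 8.0;           // assumed conviction threshold

    // Mean of the intervals that pass the max-interval filter; longer ones are dropped.
    static double meanSeconds(double[] intervals, double maxIntervalSeconds)
    {
        double sum = 0.0;
        int count = 0;
        for (double interval : intervals)
        {
            if (interval <= maxIntervalSeconds)
            {
                sum += interval;
                count++;
            }
        }
        return count == 0 ? 0.0 : sum / count;
    }

    // Silence needed before PHI_FACTOR * elapsed / mean reaches the threshold.
    static double secondsUntilConvicted(double meanSeconds)
    {
        return CONVICT_THRESHOLD * meanSeconds / PHI_FACTOR;
    }

    public static void main(String[] args)
    {
        // Nine regular 1s heartbeats plus one 10-minute gap from a transient partition.
        double[] intervals = { 1, 1, 1, 1, 1, 1, 1, 1, 1, 600 };

        double meanWithCap = meanSeconds(intervals, 2.0);            // intended 2s cap
        double meanInflated = meanSeconds(intervals, 2_000_000.0);   // inflated cap

        System.out.printf("2s cap:       mean=%.1fs, convicted after ~%.1fs of silence%n",
                          meanWithCap, secondsUntilConvicted(meanWithCap));
        System.out.printf("inflated cap: mean=%.1fs, convicted after ~%.1fs of silence%n",
                          meanInflated, secondsUntilConvicted(meanInflated));
    }
}
```

Under these assumptions a single long interval that slips past the cap raises the mean by well over an order of magnitude, and the silence required before conviction grows in proportion.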
If {{FD_MAX_INTERVAL_MS}} is overridden then the conversion, and so the
returned value, is correct (assuming an appropriately scaled value is
supplied; there is no guardrail to ensure that). Versions earlier than 5.0 are
not affected.