[
https://issues.apache.org/jira/browse/CASSANDRA-20984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vivekanand Koya updated CASSANDRA-20984:
----------------------------------------
Description:
On a version mismatch or protocol incompatibility between two communicating
nodes, the Cassandra server does not handle the error correctly. The
ClassCastException that occurs is not handled properly, and it is not
consistently reproducible.
In
[https://lists.apache.org/thread/ykkwhjdpgyqzw5xtol4v5ysz664bxxl3]
the "Stream failed" message points to a streaming operation, which is used for
processes like node repairs or adding new nodes (bootstrapping). I found a
similar Jira that was already raised:
https://issues.apache.org/jira/browse/CASSANDRA-19218. I am not sure what to do
with this Jira; it looks like a duplicate.
was:
*TLDR*
A generic type cast causes a ClassCastException during the streaming
compatibility check when a new node streams to an existing cluster.
*Context*
When a node is initiating and/or joining a cluster, a handshake is made to
determine whether the node is compatible with the existing message version. The
check is done for both _streaming_ and _messaging_. The possible outcomes of
the check form an enum: SUCCESS, RETRY, INCOMPATIBLE.
In [https://lists.apache.org/thread/ykkwhjdpgyqzw5xtol4v5ysz664bxxl3] the
check was attempted, but the runtime cast involving the INCOMPATIBLE result
failed. The logs show a ClassCastException instead of an INCOMPATIBLE result.
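To make the failure mode concrete, here is a minimal sketch of how a cast to a generic type parameter can surface as a ClassCastException at the call site. The class names below are simplified, hypothetical stand-ins and are not the actual Cassandra classes; the only assumption is that the result type exposes a success() method that casts itself to its success type parameter.

```java
// Hypothetical, simplified stand-ins; NOT the real OutboundConnectionInitiator types.
final class UncheckedCastSketch {
    enum Outcome { SUCCESS, RETRY, INCOMPATIBLE }

    /** Handshake result; S is the success type the caller expects. */
    static class Result<S> {
        final Outcome outcome;
        Result(Outcome outcome) { this.outcome = outcome; }

        /** The cast to S is erased at runtime, so it can never fail inside this method. */
        @SuppressWarnings("unchecked")
        S success() { return (S) this; }
    }

    static final class StreamingSuccess extends Result<StreamingSuccess> {
        StreamingSuccess() { super(Outcome.SUCCESS); }
    }

    static final class Incompatible<S> extends Result<S> {
        Incompatible() { super(Outcome.INCOMPATIBLE); }
    }

    /** Calls success() without looking at the outcome first and reports what happens. */
    static String describeFailure(Result<StreamingSuccess> result) {
        try {
            // The compiler inserts the real type check here, at the assignment.
            StreamingSuccess s = result.success();
            return "no failure: " + s.outcome;
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFailure(new Incompatible<>()));
        System.out.println(describeFailure(new StreamingSuccess()));
    }
}
```

In this sketch the exception is raised by the caller's assignment rather than by success() itself, which is one plausible reason an INCOMPATIBLE outcome never gets reported as such.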
I've taken a closer look at the code. I've been able to reproduce the issue and
have produced a fix. Along the way, I've noticed a few things.
*Code Inspection*
# The initiateStreaming() method of OutboundConnectionInitiator is called from
NettyStreamingConnectionFactory, located in the streaming.async package. The
analogous initiateMessaging() method of OutboundConnectionInitiator is called
from OutboundConnection (net package). The enum Outcome is package-private, so
a sanity check based on OutboundConnectionInitiator.Result.Outcome can be
performed from the OutboundConnection class. The same check cannot be performed
from NettyStreamingConnectionFactory, since the outcome field of Result is
inaccessible there.
# The OutboundConnectionInitiator.Result class attempts a cast to the generic
type SuccessType. This is inconsistent with the retry() and incompatible()
methods, which return their respective classes.
# In NettyStreamingConnectionFactory, there appears to be some confusion in
the invocation of the isSuccess() method. It is actually invoked on the Netty
Future, but it should have been invoked on the Result object. After a
successful connect, NettyStreamingConnectionFactory calls success() on the
value returned by the Future's getNow() without checking its type before the
cast.
# There are no tests for the initiateStreaming() method of
OutboundConnectionInitiator, as there are for its initiateMessaging() method.
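The distinction in item 3 can be sketched as follows. This is only an illustration under stated assumptions: java.util.concurrent.CompletableFuture stands in for the Netty Future, and Result, Outcome, futureSucceeded() and handshakeSucceeded() are hypothetical names, not the Cassandra API. The point is that a future completing normally only says the handshake finished; the outcome it carries may still be INCOMPATIBLE.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-ins; CompletableFuture replaces the Netty Future for illustration only.
final class FutureVersusResultSketch {
    enum Outcome { SUCCESS, RETRY, INCOMPATIBLE }

    /** Handshake result that carries only its outcome. */
    static final class Result {
        final Outcome outcome;
        Result(Outcome outcome) { this.outcome = outcome; }
        boolean isSuccess() { return outcome == Outcome.SUCCESS; }
    }

    /** Future-level check: the asynchronous operation finished without an exception. */
    static boolean futureSucceeded(CompletableFuture<Result> future) {
        return future.isDone() && !future.isCompletedExceptionally();
    }

    /** Checks both levels: the future completed normally AND the outcome is SUCCESS. */
    static boolean handshakeSucceeded(CompletableFuture<Result> future) {
        if (!futureSucceeded(future)) {
            return false;
        }
        Result result = future.getNow(null); // non-blocking; null if not yet complete
        return result != null && result.isSuccess();
    }

    public static void main(String[] args) {
        CompletableFuture<Result> future =
            CompletableFuture.completedFuture(new Result(Outcome.INCOMPATIBLE));
        System.out.println("future succeeded:    " + futureSucceeded(future));
        System.out.println("handshake succeeded: " + handshakeSucceeded(future));
    }
}
```

A caller that consults only the future-level check would treat the INCOMPATIBLE handshake above as a success.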
*Reproduction*
I wrote a test (StreamingTest) that reproduces the issue in
[https://lists.apache.org/thread/ykkwhjdpgyqzw5xtol4v5ysz664bxxl3].
*Code Change*
I used pattern matching for instanceof ([https://openjdk.org/jeps/394]) to make
incorrect comparisons a compile-time error. This is done in OutboundConnection,
where I check whether result.success() instanceof MessagingSuccess, and in
OutboundConnectionInitiator, where I return Success safely instead.
GitHub Pull request: [https://github.com/apache/cassandra/pull/4438]
_Please note: this change makes use of a feature in JDK 16 and thus needs a
higher minimum JDK._
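For readers unfamiliar with JEP 394, the following is a rough sketch of the pattern-matching style, not the code from the pull request. All class and field names (Result, StreamingSuccess, Incompatible, Retry, messagingVersion, maxMessagingVersion) are hypothetical and chosen only for illustration. It requires JDK 16 or later.

```java
// Hypothetical stand-ins to illustrate JEP 394; not the code from the pull request.
final class PatternMatchSketch {
    /** Base type for a handshake result. */
    static class Result { }

    static final class StreamingSuccess extends Result {
        final int messagingVersion; // hypothetical field
        StreamingSuccess(int messagingVersion) { this.messagingVersion = messagingVersion; }
    }

    static final class Retry extends Result { }

    static final class Incompatible extends Result {
        final int maxMessagingVersion; // hypothetical field
        Incompatible(int maxMessagingVersion) { this.maxMessagingVersion = maxMessagingVersion; }
    }

    /** Tests the runtime type and binds a typed variable in one step; no separate cast. */
    static String handle(Result result) {
        if (result instanceof StreamingSuccess success) {
            return "success at version " + success.messagingVersion;
        }
        if (result instanceof Incompatible incompatible) {
            return "incompatible, peer max version " + incompatible.maxMessagingVersion;
        }
        if (result instanceof Retry) {
            return "retry";
        }
        // An instanceof test between provably unrelated types (for example a String
        // tested against Integer) would be rejected by the compiler instead.
        return "unknown result";
    }

    public static void main(String[] args) {
        System.out.println(handle(new StreamingSuccess(12)));
        System.out.println(handle(new Incompatible(10)));
        System.out.println(handle(new Retry()));
    }
}
```

With this style an Incompatible result falls through to its own branch rather than failing in a cast.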
> Fix java.lang.ClassCastException: Streaming Incompatible versions
> -----------------------------------------------------------------
>
> Key: CASSANDRA-20984
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20984
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Consistency/Bootstrap and Decommission,
> Consistency/Streaming
> Reporter: Vivekanand Koya
> Assignee: Vivekanand Koya
> Priority: Normal
> Fix For: 5.0.3, 5.0.4, 5.0.5
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> On a version mismatch or protocol incompatibility between two communicating
> nodes, the Cassandra server does not handle the error correctly. The
> ClassCastException that occurs is not handled properly, and it is not
> consistently reproducible.
> In
> [https://lists.apache.org/thread/ykkwhjdpgyqzw5xtol4v5ysz664bxxl3]
> the "Stream failed" message points to a streaming operation, which is used
> for processes like node repairs or adding new nodes (bootstrapping). I found
> a similar Jira that was already raised:
> https://issues.apache.org/jira/browse/CASSANDRA-19218. I am not sure what to
> do with this Jira; it looks like a duplicate.