[
https://issues.apache.org/jira/browse/CASSANDRA-20984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18032913#comment-18032913
]
Vivekanand Koya commented on CASSANDRA-20984:
---------------------------------------------
The streaming side of Cassandra appears to be less robust than the message
handling side. The message handling code, i.e.
org.apache.cassandra.net.OutboundConnection, has all the information it needs
to handle every connection outcome. In particular, the enum
org.apache.cassandra.net.OutboundConnectionInitiator$Result.Outcome is package
private in org.apache.cassandra.net.
I've taken a closer look at the code. I've been able to reproduce the issue
with the unit test and have produced a fix. Here are my observations.
The streaming code, mostly located in the class
org.apache.cassandra.streaming.async.NettyStreamingConnectionFactory, invokes
the method initiateStreaming in the class
org.apache.cassandra.net.OutboundConnectionInitiator. Since the two classes
are in different packages, the streaming code cannot perform any checks
based on org.apache.cassandra.net.OutboundConnectionInitiator$Result.Outcome.
Put simply, the outcome field and the Outcome enum of Result are inaccessible
in the NettyStreamingConnectionFactory class.
I patched the code to perform checks based on Result.Outcome, overcoming this
limitation. While working on the unit test, I also noticed an inconsistency in
the way casts are performed between the retry, success and incompatible
results.
In NettyStreamingConnectionFactory, there appears to be some confusion around
the invocation of the isSuccess() method. It is actually invoked on the Netty
Future, whereas it should have been invoked on the Result object. After a
successful connect, NettyStreamingConnectionFactory calls success() on the
value returned by the Future's getNow() without checking the type being cast.
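To illustrate the distinction, here is a minimal, self-contained sketch. It
does not use the real Cassandra or Netty classes: Result, Success,
Incompatible and connect() are hypothetical stand-ins, and CompletableFuture
stands in for the Netty Future. It only shows that a future can complete
normally while carrying a non-success result, so checking the future alone and
then casting can fail.

```java
import java.util.concurrent.CompletableFuture;

public class FutureVsResultSketch {
    // Hypothetical stand-ins for the Result hierarchy; not the real Cassandra classes.
    static class Result {
        boolean isSuccess() { return this instanceof Success; }
        // Unchecked accessor: throws ClassCastException if this is not a Success.
        Success success() { return (Success) this; }
    }
    static class Success extends Result {}
    static class Incompatible extends Result {}

    // Hypothetical connect: the future completes normally even for an incompatible peer.
    static CompletableFuture<Result> connect(boolean peerCompatible) {
        Result result = peerCompatible ? new Success() : new Incompatible();
        return CompletableFuture.completedFuture(result);
    }

    // Checks only the future, then casts blindly.
    static String uncheckedConnect(boolean peerCompatible) {
        CompletableFuture<Result> future = connect(peerCompatible);
        if (future.isCompletedExceptionally()) {
            return "future failed";
        }
        try {
            future.getNow(null).success();
            return "connected";
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    // Checks the Result itself before calling success().
    static String checkedConnect(boolean peerCompatible) {
        Result result = connect(peerCompatible).getNow(null);
        if (result != null && result.isSuccess()) {
            result.success();
            return "connected";
        }
        return "not connected";
    }

    public static void main(String[] args) {
        System.out.println(uncheckedConnect(false)); // prints "ClassCastException"
        System.out.println(checkedConnect(false));   // prints "not connected"
    }
}
```

With a compatible peer both methods return "connected"; they only differ when
the future succeeds but the result it carries is not a Success.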
There are no tests for the initiateStreaming() method of
OutboundConnectionInitiator, whereas its initiateMessaging() method does have
tests.
Reproduction
The issue reported in
https://lists.apache.org/thread/ykkwhjdpgyqzw5xtol4v5ysz664bxxl3 is reproduced
by a test I wrote (StreamingTest).
Code Change
I used pattern matching for instanceof (JEP 394, https://openjdk.org/jeps/394)
to make incorrect comparisons a compile-time error. This is done in
OutboundConnection, where I check whether result.success() is an instanceof
MessagingSuccess, and in OutboundConnectionInitiator, where I return Success
safely instead.
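As a rough illustration of the JEP 394 style, and not the code in the pull
request, the sketch below uses hypothetical record types (MessagingSuccess,
StreamingSuccess, Incompatible) and a hypothetical handle() method. The
pattern variable is only in scope when the type test passes, so no separate
cast is needed. It requires JDK 16 or later.

```java
public class PatternMatchSketch {
    // Hypothetical stand-ins; not the real Cassandra types.
    interface Result {}
    record MessagingSuccess(int messagingVersion) implements Result {}
    record StreamingSuccess(int messagingVersion) implements Result {}
    record Incompatible(int maxMessagingVersion) implements Result {}

    static String handle(Result result) {
        // The test and the binding happen in one step; no explicit cast.
        if (result instanceof MessagingSuccess success) {
            return "messaging connection at version " + success.messagingVersion();
        }
        if (result instanceof Incompatible incompatible) {
            return "incompatible peer, max version " + incompatible.maxMessagingVersion();
        }
        // Anything else is reported rather than cast blindly.
        return "unexpected result: " + result.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(handle(new MessagingSuccess(12)));
        System.out.println(handle(new Incompatible(11)));
        System.out.println(handle(new StreamingSuccess(12)));
    }
}
```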
GitHub Pull request: https://github.com/apache/cassandra/pull/4438
Please note: this change makes use of a feature in JDK 16 and thus needs a
higher minimum JDK.
> Fix java.lang.ClassCastException: Streaming Incompatible versions
> -----------------------------------------------------------------
>
> Key: CASSANDRA-20984
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20984
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Consistency/Bootstrap and Decommission,
> Consistency/Streaming
> Reporter: Vivekanand Koya
> Assignee: Vivekanand Koya
> Priority: Normal
> Fix For: 5.0.3, 5.0.4, 5.0.5
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> On version mismatch or protocol incompatibility between the two communicating
> nodes, cassandra server does not handle the errors correctly. There is no
> proper error handling when ClassCastException occurs, since it is not
> consistently reproducible.
> In https://lists.apache.org/thread/ykkwhjdpgyqzw5xtol4v5ysz664bxxl3,
> the "Stream failed" message points to a streaming operation, which is used
> for processes like node repairs or adding new nodes (bootstrapping). I found
> a similar Jira that was already raised -
> https://issues.apache.org/jira/browse/CASSANDRA-19218. Not sure what to do
> with this JIRA. Looks like a duplicate.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)