[
https://issues.apache.org/jira/browse/CASSANDRA-20984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vivekanand Koya updated CASSANDRA-20984:
----------------------------------------
Description:
*TLDR*
A generic type cast causes a ClassCastException during the streaming
compatibility check when a new node streams to an existing cluster.
*Pretext*
When a node is initiating and/or joining a cluster, a handshake is made to
determine whether the node is compatible with the existing message version. The
check is done for both _streaming_ and _messaging_. Its possible outcomes are
defined by an enum: SUCCESS, RETRY, INCOMPATIBLE. In
[https://lists.apache.org/thread/ykkwhjdpgyqzw5xtol4v5ysz664bxxl3] the check
was attempted, but the runtime cast to INCOMPATIBLE failed: a
ClassCastException is observed in the logs instead of an INCOMPATIBLE result.
I've taken a closer look at the code. I've been able to reproduce the issue and
have produced a fix. Along the way, I've noticed a few things.
*Code Inspection*
# The initiateStreaming() method of OutboundConnectionInitiator is called from
NettyStreamingConnectionFactory, located in the streaming.async package. The
similar initiateMessaging() method of OutboundConnectionInitiator is called
from OutboundConnection (net package). The enum Outcome is package-private, so
the sanity check based on OutboundConnectionInitiator.Result.Outcome can be
performed from the OutboundConnection class but not from
NettyStreamingConnectionFactory, where the outcome field of Result is
inaccessible.
# The OutboundConnectionInitiator.Result class attempts to cast to the generic
type SuccessType. This behavior is inconsistent with the retry() and
incompatible() methods, which return their respective classes.
# In NettyStreamingConnectionFactory, there appears to be some confusion in
the invocation of the isSuccess() method: it is invoked on the Netty Future,
but it should have been invoked on the Result object. After a successful
connect, NettyStreamingConnectionFactory calls success() on the value returned
by the Future's getNow() without checking the type before the cast.
# There are no tests for the initiateStreaming() method of
OutboundConnectionInitiator, whereas there are tests for its
initiateMessaging() method.
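The failure mode behind points 2 and 3 can be illustrated with a minimal, self-contained sketch. It is not the Cassandra code: the classes Success, StreamingSuccess, Incompatible and Result below are simplified, hypothetical stand-ins, and the payload field is an assumption made purely to keep the example short. It only shows how an unchecked cast to a generic type parameter compiles without error and then fails at runtime.

```java
public class UncheckedCastSketch {
    // Hypothetical stand-ins for the handshake result types.
    static class Success {}
    static class StreamingSuccess extends Success {}
    static class Incompatible {}

    // Simplified Result: S plays the role of the generic SuccessType.
    static class Result<S extends Success> {
        private final Object payload;

        Result(Object payload) { this.payload = payload; }

        // The compiler accepts this cast with only an "unchecked" warning.
        // After erasure it is a cast to the bound (Success), so a
        // non-Success payload throws ClassCastException here, and a Success
        // of the wrong subtype throws at the caller's assignment instead.
        @SuppressWarnings("unchecked")
        S success() { return (S) payload; }
    }

    // Calls success() the way an unguarded caller would.
    static String describe(Result<StreamingSuccess> result) {
        try {
            StreamingSuccess s = result.success();
            return "success";
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new Result<>(new StreamingSuccess())));
        System.out.println(describe(new Result<>(new Incompatible())));
        System.out.println(describe(new Result<>(new Success())));
    }
}
```

In this sketch only the first call succeeds. The other two end in a ClassCastException, which matches the symptom described above: an exception appears where a RETRY or INCOMPATIBLE outcome was expected.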
*Reproduction*
I wrote a test (StreamingTest) that reproduces the issue in
[https://lists.apache.org/thread/ykkwhjdpgyqzw5xtol4v5ysz664bxxl3].
*Code Change*
I used pattern matching for instanceof ([https://openjdk.org/jeps/394]) to make
incorrect comparisons a compile-time error. This is done in OutboundConnection,
where I check whether result.success() instanceof MessagingSuccess, and in
OutboundConnectionInitiator, where I return Success safely instead.
GitHub Pull request: [https://github.com/apache/cassandra/pull/4438]
_Please note: this change makes use of a feature in JDK 16 and thus needs a
higher minimum JDK._
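As a rough illustration of the approach described in the Code Change section, the sketch below uses the instanceof type pattern from JEP 394 (Java 16 or later). It is not the code from the pull request: the class layout, the version field and the handle() method are hypothetical and exist only to show the guarded check.

```java
public class PatternMatchSketch {
    // Hypothetical stand-ins for the success types.
    static class Success {}

    static class MessagingSuccess extends Success {
        final int version; // illustrative field, not taken from the source
        MessagingSuccess(int version) { this.version = version; }
    }

    static class StreamingSuccess extends Success {}

    // Tests the runtime type before using the value, so a mismatch is
    // handled explicitly instead of surfacing as a ClassCastException.
    static String handle(Success result) {
        if (result instanceof MessagingSuccess messaging) {
            // The binding variable is already typed; no separate cast needed.
            return "messaging v" + messaging.version;
        }
        if (result instanceof StreamingSuccess) {
            return "streaming";
        }
        return "unexpected";
    }

    public static void main(String[] args) {
        System.out.println(handle(new MessagingSuccess(12)));
        System.out.println(handle(new StreamingSuccess()));
        System.out.println(handle(new Success()));
    }
}
```

Because instanceof between provably unrelated types is rejected by the compiler, a check written against the wrong type hierarchy fails at build time rather than at runtime.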
was:
From [~rustyrazorblade]'s recommendation
I've taken a closer look at the code. I've been able to reproduce the issue and
have produced a fix. Along the way, I've noticed a few things.
# initiateStreaming() method of OutboundConnectionInitiator is called from
NettyStreamingConnectionFactory located in the streaming.async package. Similar
method invocation initiateMessaging() method of OutboundConnectionInitiator is
made from OutboundConnection (net package). Thus, the enum Outcome is
package-private. The sanity check based on
OutboundConnectionInitiator.Result.Outcome could be performed from
OutboundConnection class. The same check cannot be performed from
NettyStreamingConnectionFactory since the outcome field of Result is
inaccessible.
# There is an attempt to cast to the generic type SuccessType in
OutboundConnectionInitiator.Result class. This behavior is inconsistent with the
retry() and incompatible() methods which return their respective classes.
# In NettyStreamingConnectionFactory, there appears to be some confusion in
the invocation of isSuccess() method. It actually is making the invocation on
Netty Future. It should have been on the Result object. On making a successful
connect, NettyStreamingConnectionFactory calls success() on Future's getNow()
without checking the type of the cast.
# There are no tests for initiateStreaming() method of
OutboundConnectionInitiator as there are for initiateMessaging() method of
OutboundConnectionInitiator.
What I've done.
# I wrote a test (StreamingTest) that reproduces the issue in
[https://lists.apache.org/thread/ykkwhjdpgyqzw5xtol4v5ysz664bxxl3].
# I used the instanceof in [https://openjdk.org/jeps/394] to make incorrect
comparisons a compile-time error. This is done in OutboundConnection where I
check if result.success() instanceof MessagingSuccess and
OutboundConnectionInitiator where I return Success safely instead.
GitHub Pull request: https://github.com/apache/cassandra/pull/4438
_Please note: this change makes use of a feature in JDK 16 and thus needs a
higher minimum JDK._
> Fix java.lang.ClassCastException: Issues while joining
> ------------------------------------------------------
>
> Key: CASSANDRA-20984
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20984
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Consistency/Bootstrap and Decommission,
> Consistency/Streaming
> Reporter: Vivekanand Koya
> Assignee: Vivekanand Koya
> Priority: Normal
> Fix For: 5.0.3, 5.0.4, 5.0.5
>
> Time Spent: 10m
> Remaining Estimate: 0h
--
This message was sent by Atlassian Jira
(v8.20.10#820010)