[
https://issues.apache.org/jira/browse/CASSANDRA-18913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773457#comment-17773457
]
Brandon Williams commented on CASSANDRA-18913:
----------------------------------------------
I think we are ready for the patches for the other branches as discussed. I
checked [repeated
runs|https://app.circleci.com/pipelines/github/driftx/cassandra/1327/workflows/4d2003d9-dfe7-4f30-a54e-3790736e1e0e]
of the new test for flakiness and started the [upgrade
tests|https://app.circleci.com/pipelines/github/driftx/cassandra/1327/workflows/39aba0cf-392d-4e0c-935f-a67a17596e73]
for trunk.
> Gossip NPE due to shutdown event corrupting empty statuses
> ----------------------------------------------------------
>
> Key: CASSANDRA-18913
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18913
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip, Cluster/Membership
> Reporter: David Capwell
> Assignee: David Capwell
> Priority: Normal
> Fix For: 5.x
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> When an instance either disables gossip or shuts down, it sends a gossip
> shutdown message. Peers ignore the message if the endpoint isn’t known;
> otherwise each peer mutates its local copy of the state to mark the shutdown.
> When an instance restarts, it populates gossip with the endpoints found in
> peers, but the state of each is empty (not null).
> So, there is a fun timing bug:
> * stop node1
> * start node1; at this point all previously known endpoints exist in gossip
> but are empty
> * node2 shuts down (gossip shutdown or node shutdown, doesn’t matter)
> * node1 sees the shutdown before any gossip messages, and gets corrupted
> * node3 tries to join the cluster, and fails due to node1 being corrupted
> The NPE can occur in 2 different patterns; in this example node1 and node3
> will have different stack traces:
> {code}
> org.apache.cassandra.distributed.shared.ShutdownException: Uncaught exceptions were thrown during test
>     Suppressed: java.lang.NullPointerException: Unable to get HOST_ID; HOST_ID is not defined, given EndpointState: HeartBeatState = HeartBeat: generation = 0, version = 2147483647, AppStateMap = {STATUS=Value(shutdown,true,37), RPC_READY=Value(false,38), STATUS_WITH_PORT=Value(shutdown,true,36)}
>         at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:1218)
>         at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:1208)
>         at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:3279)
>         at org.apache.cassandra.service.StorageService.onChange(StorageService.java:2756)
>         at org.apache.cassandra.gms.Gossiper.markAsShutdown(Gossiper.java:611)
>         at org.apache.cassandra.gms.GossipShutdownVerbHandler.doVerb(GossipShutdownVerbHandler.java:39)
>         at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
>     Suppressed: java.lang.NullPointerException: Unable to get HOST_ID; HOST_ID is not defined, given EndpointState: HeartBeatState = HeartBeat: generation = 0, version = 2147483647, AppStateMap = {STATUS=Value(shutdown,true,37), RPC_READY=Value(false,38), STATUS_WITH_PORT=Value(shutdown,true,36)}
>         at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:1218)
>         at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:1208)
>         at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:3279)
>         at org.apache.cassandra.service.StorageService.onChange(StorageService.java:2756)
>         at org.apache.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1762)
>         at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:3793)
>         at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1465)
>         at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1678)
>         at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:50)
>         at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
> {code}
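For readers following the timing bug described above, here is a minimal toy model of it in plain Java. It is only a sketch under stated assumptions: GossipShutdownSketch, Node, EndpointState, AppState and every method name are hypothetical and are not Cassandra's real classes. The onShutdownGuarded variant shows one possible mitigation (ignoring a shutdown for an endpoint whose state is still empty); it is an assumption for illustration, not necessarily what the committed patch does.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the race; none of these types are Cassandra's real classes.
public class GossipShutdownSketch {
    enum AppState { STATUS, STATUS_WITH_PORT, RPC_READY, HOST_ID }

    static final class EndpointState {
        final Map<AppState, String> values = new EnumMap<>(AppState.class);
        boolean isEmpty() { return values.isEmpty(); }
    }

    static final class Node {
        final Map<String, EndpointState> gossip = new ConcurrentHashMap<>();

        // On restart, known peers are registered with an empty (non-null) state.
        void restartFromPeers(String... peers) {
            for (String peer : peers) gossip.put(peer, new EndpointState());
        }

        // Behaviour described in the ticket: ignore unknown endpoints, else mark shutdown.
        boolean onShutdownUnguarded(String endpoint) {
            EndpointState state = gossip.get(endpoint);
            if (state == null) return false;
            markShutdown(state);
            return true;
        }

        // Hypothetical guard (an assumption, not the committed patch): also ignore empty states.
        boolean onShutdownGuarded(String endpoint) {
            EndpointState state = gossip.get(endpoint);
            if (state == null || state.isEmpty()) return false;
            markShutdown(state);
            return true;
        }

        private static void markShutdown(EndpointState state) {
            state.values.put(AppState.STATUS, "shutdown");
            state.values.put(AppState.STATUS_WITH_PORT, "shutdown");
            state.values.put(AppState.RPC_READY, "false");
        }

        // Mirrors the failure mode in the traces: a state without HOST_ID raises an NPE.
        String getHostId(String endpoint) {
            EndpointState state = gossip.get(endpoint);
            return Objects.requireNonNull(state.values.get(AppState.HOST_ID),
                                          "Unable to get HOST_ID; HOST_ID is not defined");
        }
    }

    public static void main(String[] args) {
        Node node1 = new Node();
        node1.restartFromPeers("node2");      // node1 restarted; node2's state is empty
        node1.onShutdownUnguarded("node2");   // shutdown arrives before any gossip state
        try {
            node1.getHostId("node2");         // state now has STATUS but no HOST_ID
        } catch (NullPointerException e) {
            System.out.println("corrupted state: " + e.getMessage());
        }
    }
}
```

In this model the unguarded handler leaves node2 with STATUS, STATUS_WITH_PORT and RPC_READY but no HOST_ID, which matches the shape of the AppStateMap printed in the traces. With the guarded handler the state stays empty, so a later regular gossip exchange could still populate it in full.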