[ https://issues.apache.org/jira/browse/CASSANDRA-19580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17948214#comment-17948214 ]

Brandon Williams commented on CASSANDRA-19580:
----------------------------------------------

That did make me realize the ball here is in my court, though.  Let's check CI:

||Branch||CI||
|[4.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19580-4.0]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/1865/workflows/6bddd9a5-b708-4e14-b0d1-2b935fcf02d3], [j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1865/workflows/7d32d473-a5b2-4a33-b01a-0c725db0883e]|
|[4.1|https://github.com/driftx/cassandra/tree/CASSANDRA-19580-4.1]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/1863/workflows/9c64c597-c74d-4eec-9f3b-fdfbd3dd5e86], [j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1863/workflows/f62ae556-fdab-42bd-a060-7811672a5d8d]|
|[5.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19580-5.0]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1864/workflows/8afeac25-097b-4422-85f5-a5f69442bbbe], [j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1864/workflows/b59a3c9d-da3f-4da9-b176-0a7024f689a6]|


> Unable to contact any seeds with node in hibernate status
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-19580
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19580
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip
>            Reporter: Cameron Zemek
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We have a customer running into the error 'Unable to contact any seeds!'. I
> have been able to reproduce this issue by killing Cassandra while it is
> joining, which puts the node into hibernate status. Once a node is in
> hibernate, it no longer receives any SYN messages from other nodes during
> startup, and because its outbound SYN messages contain only its own digest, it
> never receives any states in the ACK replies. So when it reaches the
> `seenAnySeed` check, the check fails because the endpointStateMap is empty.
>  
> A workaround is copying the system.peers table from another node, but this is
> less than ideal. I tested modifying maybeGossipToSeed as follows:
> {code:java}
>     /* Possibly gossip to a seed for facilitating partition healing */
>     private void maybeGossipToSeed(MessageOut<GossipDigestSyn> prod)
>     {
>         int size = seeds.size();
>         if (size > 0)
>         {
>             // Nothing to do if this node is the only seed.
>             if (size == 1 && seeds.contains(FBUtilities.getBroadcastAddress()))
>             {
>                 return;
>             }
>             if (liveEndpoints.size() == 0)
>             {
>                 List<GossipDigest> gDigests = prod.payload.gDigests;
>                 // If the only digest is our own (e.g. a node restarting from hibernate
>                 // that knows no other endpoints), send an empty digest list instead so
>                 // the seed replies with all of its states, as it does for a shadow-round SYN.
>                 if (gDigests.size() == 1 && gDigests.get(0).endpoint.equals(FBUtilities.getBroadcastAddress()))
>                 {
>                     gDigests = new ArrayList<GossipDigest>();
>                     GossipDigestSyn digestSynMessage = new GossipDigestSyn(DatabaseDescriptor.getClusterName(),
>                                                                            DatabaseDescriptor.getPartitionerName(),
>                                                                            gDigests);
>                     MessageOut<GossipDigestSyn> message = new MessageOut<GossipDigestSyn>(MessagingService.Verb.GOSSIP_DIGEST_SYN,
>                                                                                           digestSynMessage,
>                                                                                           GossipDigestSyn.serializer);
>                     sendGossip(message, seeds);
>                 }
>                 else
>                 {
>                     sendGossip(prod, seeds);
>                 }
>             }
>             else
>             {
>                 /* Gossip with the seed with some probability. */
>                 double probability = seeds.size() / (double) (liveEndpoints.size() + unreachableEndpoints.size());
>                 double randDbl = random.nextDouble();
>                 if (randDbl <= probability)
>                     sendGossip(prod, seeds);
>             }
>         }
>     }
> {code}
> The only problem is that this is the same as the SYN sent during the shadow
> round. It does resolve the issue, however, since the node then receives an ACK
> with all the states.
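
The failure mode in the report can be illustrated with a small, self-contained model. This is a sketch under heavy simplifying assumptions, not Cassandra's Gossiper: endpoints are plain strings, each endpoint's state is reduced to a single version number (real gossip compares generation, then version), and `GossipSeedSketch`, `Digest`, `ackStates`, `digestsForSeed`, and `seenAnySeed` here are hypothetical names. Following the report, the modelled seed answers an empty digest list with every state it knows, and otherwise returns only newer states for endpoints named in the SYN. A SYN carrying only the sender's own digest therefore teaches the node nothing about any seed, while the empty list from the proposed maybeGossipToSeed change does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model of the SYN/ACK digest exchange described in CASSANDRA-19580.
// All names are hypothetical; this is NOT Cassandra's Gossiper API. Each
// endpoint's state is reduced to one version number.
final class GossipSeedSketch
{
    // Hypothetical stand-in for GossipDigest: endpoint + highest version the sender knows.
    static final class Digest
    {
        final String endpoint;
        final int version;

        Digest(String endpoint, int version)
        {
            this.endpoint = endpoint;
            this.version = version;
        }
    }

    // How the seed answers a SYN in this model. An empty digest list is treated
    // like a shadow-round request and answered with every known state; otherwise
    // only endpoints named in the SYN, for which the seed holds a newer version,
    // are sent back.
    static Map<String, Integer> ackStates(Map<String, Integer> seedState, List<Digest> synDigests)
    {
        if (synDigests.isEmpty())
            return new HashMap<>(seedState);

        Map<String, Integer> reply = new HashMap<>();
        for (Digest d : synDigests)
        {
            Integer local = seedState.get(d.endpoint);
            if (local != null && local > d.version)
                reply.put(d.endpoint, local);
        }
        return reply;
    }

    // Models the proposed maybeGossipToSeed tweak: with no live endpoints and
    // only our own digest, send an empty digest list instead.
    static List<Digest> digestsForSeed(List<Digest> digests, String self, int liveEndpoints)
    {
        boolean onlySelf = digests.size() == 1 && digests.get(0).endpoint.equals(self);
        if (liveEndpoints == 0 && onlySelf)
            return new ArrayList<>();
        return digests;
    }

    // Rough analogue of the seenAnySeed check: has any seed's state been learned?
    static boolean seenAnySeed(Map<String, Integer> learnedStates, Set<String> seeds)
    {
        for (String endpoint : learnedStates.keySet())
        {
            if (seeds.contains(endpoint))
                return true;
        }
        return false;
    }

    public static void main(String[] args)
    {
        String self = "10.0.0.3";
        Map<String, Integer> seedState = new HashMap<>();
        seedState.put("10.0.0.1", 50); // the seed itself
        seedState.put("10.0.0.2", 40);
        seedState.put(self, 7);        // stale state of the hibernating node
        Set<String> seeds = new HashSet<>(Arrays.asList("10.0.0.1"));

        // The restarted node only knows itself, with a newer version than the seed holds.
        List<Digest> selfOnly = new ArrayList<>(Arrays.asList(new Digest(self, 9)));

        Map<String, Integer> plainAck = ackStates(seedState, selfOnly);
        Map<String, Integer> emptySynAck = ackStates(seedState, digestsForSeed(selfOnly, self, 0));

        System.out.println("self-only SYN: seenAnySeed=" + seenAnySeed(plainAck, seeds));   // false
        System.out.println("empty SYN:     seenAnySeed=" + seenAnySeed(emptySynAck, seeds)); // true
    }
}
```

Collapsing state to one version keeps the sketch short; it deliberately ignores generations, heartbeats, and the hibernate status itself, so it only illustrates the digest-selection logic rather than the real Gossiper.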



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
