We're seeing nodes bootstrapping but not streaming and joining a cluster in 2.2.13.
I have been looking through the MigrationManager code and the StorageService code that seems relevant based on the Bootstrap status messages that are coming through. I'll be referencing line numbers from the 2.2.13 source code. Major unit of work #1: prechecks We start our bootstrap process with StorageService:759 checking if we are autobootstrap (yes), bootstrap already in progress (no), bootstrap complete (no), and a check if seeds contains the current broadcast address (no) We proceed through the 771 shouldBootstrap() call to the "JOINING: waiting for ring information" state. Prior to this point, MigrationManager has already Gossip checked the current (nonmatching) schema version uuid. I believe in this state we only have system keyspace setup based on log messages that have already occurred like: Schema.java:421 - Adding org.apache.cassandra.config.CFMetaData@35645047 [cfId=5a1ff267-ace0-3f12-8563-cfae6103c65e,ksName=system,cfName=sstable_activity Those only have system tables listed, nothing else. Major unit of work #2: Ring Information Here we seem to poll all the nodes getting their token responsibilities. Each time for almost every node we also get: DEBUG [GossipStage:1] 2019-06-12 15:20:07,621 MigrationManager.java:96 - Not pulling schema because versions match or shouldPullSchemaFrom returned false However our schema versions DON'T match, so the shouldPullSchemaFrom must be short-circuiting this. shouldPullSChemaFrom() is a one-liner in MigrationManager on line 151, it seems to check if it knows the ip it is talking to, If that version's schema matches (which it definitely should not at this point)..... and if it is isGossipOnlyMember which I may have seen referred to as a "fat client". Perhaps we are defaulting to this state somehow for nodes/endpoints that are far away or not communicating? Or our view hasn't fully initialized yet? Anyway, I think these are all occurring in StorageService line 779, a for loop. We do not see any of the "got schema" messages that would break this for loop. So we must reach the dealy point? Is the delay parameter equal to -Dcassandra.ring_delay_ms setting? Because perhaps the successes we DID see were due to the increase to ring delay and that allowed this for loop to actually complete? Anyway, we seem to make it out of that for loop without initializing ring information, which also seems to not synchronize the schema. >From there we seem to sail through line 791, the MigrationManager.isReadyForBootstrap() for some reason. This was also mentioned in https://issues.apache.org/jira/browse/CASSANDRA-6648 as leading to bootstrap race conditions similar to ours and was allegedly fixed back in 1.2.X/2.0.X releases. Major Unit of work #3: perform bootstrap tasks lines 796+ of StorageService output ready to bootstrap / range calculation messages, and confirm ring + schema info has been performed, even though the logs are showing we didn't seem to sync schema. the strict consistency check is performed at line 803, which we are using by the way The log outputs the getting bootstrap token status message of JOINING from line 821 >From here there is a ton of log errors for UnknwonColumnFamilyExceptions. Hundreds of them 19-06-12 15:21:47,644 IncomingTcpConnection.java:100 - UnknownColumnFamilyException reading from socket; closing org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=dd5d7fa0-1e50-11e9-a62d-0d41c97b2404 At the end of these exceptions is Three log statements: INFO [main] 2019-06-12 15:23:25,515 StorageService.java:1142 - JOINING: Starting to bootstrap... INFO [main] 2019-06-12 15:23:25,525 StreamResultFuture.java:87 - [Stream #05af9ee0-8d26-11e9-85c1-bd5476090c54] Executing streaming plan for Bootstrap INFO [main] 2019-06-12 15:23:25,526 StorageService.java:1199 - Bootstrap completed! for the tokens [-7314981925085449175, <254 more tokens>, 5499447097629838103] And that is where we think our node has joined the cluster as a data-less UN node. Maybe like a "fat client"? Who knows. Bootstrap completed! message comes from a Future in StorageService:1199, which was produced on 1192 by bootstrapper.bootstrap(). It times out in 10s of millis or less, so it clearly isn't doing anything or hasn't been given anything to do. -------------------------- AFTER the bootstrp completeion message, we have messages for setting up system_traces and system_distributed These come 10s of millis after the completed future, so they must be lines 905 and 906 executing. THEN we have more messages for gossiping schema version (three or so) where FINALLY the UUID of the schema matches all the other schema uuids as shown in gossipinfo. I speculate this is occuring i the finishJoiningRing() call on line 912. --------------------------- So it appears the "guard" for loop is not sufficiently guarding, and schema is not being synchronized in our case in sufficient time. Perhaps all the other seeds or nodes are having communication issues and are acting as fat clients, so the bootstrap process assumes this is a schemaless node. Is that about correct or do I not understand some aspect of what is going on, outside of the various attempts to handle not-necessarily-up nodes, seeds, and streaming sources which I definitely do not fully understand yet.