We're seeing nodes bootstrapping but not streaming and joining a cluster in
2.2.13.

I have been looking through the MigrationManager code and the
StorageService code that seems relevant based on the Bootstrap status
messages that are coming through. I'll be referencing line numbers from the
2.2.13 source code.

Major unit of work #1: prechecks

We start our bootstrap process with StorageService:759 checking if we are
autobootstrap (yes), bootstrap already in progress (no), bootstrap complete
(no), and a check if seeds contains the current broadcast address (no)

We proceed through the 771 shouldBootstrap() call to the "JOINING: waiting
for ring information" state.

Prior to this point, MigrationManager has already Gossip checked the
current (nonmatching) schema version uuid. I believe in this state we only
have system keyspace setup based on log messages that have already occurred
like:

Schema.java:421 - Adding org.apache.cassandra.config.CFMetaData@35645047
[cfId=5a1ff267-ace0-3f12-8563-cfae6103c65e,ksName=system,cfName=sstable_activity

Those only have system tables listed, nothing else.

Major unit of work #2: Ring Information

Here we seem to poll all the nodes getting their token responsibilities.
Each time for almost every node we also get:

DEBUG [GossipStage:1] 2019-06-12 15:20:07,621 MigrationManager.java:96 -
Not pulling schema because versions match or shouldPullSchemaFrom returned
false

However our schema versions DON'T match, so the shouldPullSchemaFrom must
be short-circuiting this. shouldPullSChemaFrom() is a one-liner in
MigrationManager on line 151, it seems to check if it knows the ip it is
talking to, If that version's schema matches (which it definitely should
not at this point)..... and if it is isGossipOnlyMember which I may have
seen referred to as a "fat client". Perhaps we are defaulting to this state
somehow for nodes/endpoints that are far away or not communicating? Or our
view hasn't fully initialized yet?

Anyway, I think these are all occurring in StorageService line 779, a for
loop. We do not see any of the "got schema" messages that would break this
for loop. So we must reach the dealy point? Is the delay parameter equal
to -Dcassandra.ring_delay_ms setting? Because perhaps the successes we DID
see were due to the increase to ring delay and that allowed this for loop
to actually complete?

Anyway, we seem to make it out of that for loop without initializing ring
information, which also seems to not synchronize the schema.

>From there we seem to sail through line 791, the
MigrationManager.isReadyForBootstrap() for some reason. This was also
mentioned in https://issues.apache.org/jira/browse/CASSANDRA-6648 as
leading to bootstrap race conditions similar to ours and was allegedly
fixed back in 1.2.X/2.0.X releases.

Major Unit of work #3: perform bootstrap tasks

lines 796+ of StorageService output ready to bootstrap / range calculation
messages, and confirm ring + schema info has been performed, even though
the logs are showing we didn't seem to sync schema.

the strict consistency check is performed at line 803, which we are using
by the way

The log outputs the getting bootstrap token status message of JOINING from
line 821

>From here there is a ton of log errors for UnknwonColumnFamilyExceptions.
Hundreds of them

19-06-12 15:21:47,644 IncomingTcpConnection.java:100 -
UnknownColumnFamilyException reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find
cfId=dd5d7fa0-1e50-11e9-a62d-0d41c97b2404

At the end of these exceptions is Three log statements:

INFO  [main] 2019-06-12 15:23:25,515 StorageService.java:1142 - JOINING:
Starting to bootstrap...

INFO  [main] 2019-06-12 15:23:25,525 StreamResultFuture.java:87 - [Stream
#05af9ee0-8d26-11e9-85c1-bd5476090c54] Executing streaming plan for
Bootstrap

INFO  [main] 2019-06-12 15:23:25,526 StorageService.java:1199 - Bootstrap
completed! for the tokens [-7314981925085449175, <254 more tokens>,
5499447097629838103]

And that is where we think our node has joined the cluster as a data-less
UN node. Maybe like a "fat client"? Who knows.

Bootstrap completed! message comes from a Future in StorageService:1199,
which was produced on 1192 by bootstrapper.bootstrap(). It times out in 10s
of millis or less, so it clearly isn't doing anything or hasn't been given
anything to do.

--------------------------

AFTER the bootstrp completeion message, we have messages for setting up

system_traces and system_distributed

These come 10s of millis after the completed future, so they must be lines
905 and 906 executing.

THEN we have more messages for gossiping schema version (three or so) where
FINALLY the UUID of the schema matches all the other schema uuids as shown
in gossipinfo. I speculate this is occuring i the finishJoiningRing() call
on line 912.


---------------------------

So it appears the "guard" for loop is not sufficiently guarding, and schema
is not being synchronized in our case in sufficient time. Perhaps all the
other seeds or nodes are having communication issues and are acting as fat
clients, so the bootstrap process assumes this is a schemaless node.

Is that about correct or do I not understand some aspect of what is going
on, outside of the various attempts to handle not-necessarily-up nodes,
seeds, and streaming sources which I definitely do not fully understand yet.

Reply via email to