Hi Cameron,
Just want to make sure I understood correctly. So your observation was that, at
the end of the shadow round, some nodes have an empty endpointStateMap? But from
reading the code at GossipDigestSynVerbHandler::createShadowReply,
it seems that the receiving node will reply with a full stateMap?
return
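
To make the question above concrete, here is a toy model of the behavior as it is described in this thread: an empty digest list (shadow round) gets the whole endpointStateMap back, while a normal SYN only gets answers for the endpoints it names. This is a sketch under stated assumptions, not Cassandra's code. The class ShadowRoundSketch, the method buildReply, and the single integer version per endpoint are all hypothetical simplifications.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only. Names and types here are hypothetical and do not
// mirror Cassandra's real classes or method signatures.
final class ShadowRoundSketch {

    // endpointStateMap: what the receiving node knows (endpoint -> version).
    // synDigests: what the sender advertised in its SYN (endpoint -> version).
    static Map<String, Integer> buildReply(Map<String, Integer> endpointStateMap,
                                           Map<String, Integer> synDigests) {
        Map<String, Integer> reply = new LinkedHashMap<>();

        if (synDigests.isEmpty()) {
            // Shadow round: the sender knows nothing, so reply with everything.
            reply.putAll(endpointStateMap);
            return reply;
        }

        // Normal round (as modeled here): only endpoints named in the SYN are
        // examined. Endpoints the receiver knows but the sender did not list
        // are never added, which is the gap the thread asks about.
        for (Map.Entry<String, Integer> digest : synDigests.entrySet()) {
            Integer local = endpointStateMap.get(digest.getKey());
            if (local != null && local > digest.getValue()) {
                reply.put(digest.getKey(), local);
            }
        }
        return reply;
    }

    public static void main(String[] args) {
        Map<String, Integer> state = new LinkedHashMap<>();
        state.put("10.0.0.1", 5);
        state.put("10.0.0.2", 7);

        Map<String, Integer> syn = new LinkedHashMap<>();
        syn.put("10.0.0.1", 3);

        System.out.println("shadow reply: " + buildReply(state, new LinkedHashMap<>()));
        System.out.println("normal reply: " + buildReply(state, syn));
    }
}
```

In this model a receiver with a populated map never produces an empty shadow reply, so an empty endpointStateMap after the shadow round would have to come from somewhere else (for example, the receiver's own map being empty at the time).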
> Sounds like the request was to hit the pause button until TCM merged rather
> than skipping the work entirely so that's promising.
Correct, I was only asked to wait a few days and to rebase after TCM merged.
The issue was that I had to time box this work and the fact it hit issues kinda
beca
I'm +1 to continuing work on CASSANDRA-18917 for all the reasons Jordan listed.
Sounds like the request was to hit the pause button until TCM merged rather
than skipping the work entirely so that's promising.
On Thu, May 16, 2024, at 1:43 PM, Jon Haddad wrote:
> I have also recently worked with
I have also recently worked with a team who lost critical data as a result
of gossip issues combined with collision in our token allocation. I
haven’t filed a JIRA yet as it slipped my mind but I’ve seen it in my own
testing as well. I’ll get a JIRA in describing it in detail.
It’s severe enough
I’m a big +1 on 18917 or more testing of gossip. While I appreciate that it
makes TCM more complicated, gossip and schema propagation bugs have been
the source of our two worst data loss events in the last 3 years. Data loss
should immediately cause us to evaluate what we can do better.
We will li
So, I created https://issues.apache.org/jira/browse/CASSANDRA-18917 which lets
you do deterministic gossip simulation testing across large clusters within
seconds… I stopped this work as it conflicted with TCM (they were trying to
merge that week) and it hit issues where some nodes never converge
In looking into CASSANDRA-19580 I noticed something that raises a question.
With Gossip SYN it doesn't check for missing digests. If it's empty for shadow
round it will add everything from endpointStateMap to the reply. But why not
include missing entries in normal replies? The branching for rep