[ https://issues.apache.org/jira/browse/CASSANDRA-19948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885597#comment-17885597 ]
Bowen Song commented on CASSANDRA-19948:
----------------------------------------

Because the commitlog is persisted to the storage media before the node fails, the commitlog can be processed by CDC when the node comes back online. In the unlikely event that the node has to be replaced, the data can be re-synced via other routes, such as batch processing. This guarantees that the data seen by CDC and the data in the table are eventually consistent, but it does not guarantee real-time consistency.

However, AFAIK, there's no way to guarantee real-time consistency between the two. Even if CDC runs on all 3 nodes, real-time consistency is still not guaranteed. For example, if a QUORUM write succeeds on node A but fails on nodes B and C, CDC will see the data (node A has already persisted the mutation, even though the write as a whole failed), but a subsequent QUORUM read served by nodes B and C will not see it.

Since the CDC consumer always needs to tolerate some form of temporary desync, it's a trade-off between cost and delay. In some (many?) cases, a 33% saving on CDC processing cost (for example, processing CDC on two of the three replicas instead of all three) can justify the occasional delay on a small number of writes.

> Changing cdc table property can cause schema disagreement
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-19948
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19948
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Schema
>            Reporter: Bowen Song
>            Priority: Normal
>             Fix For: 4.1.x, 5.0.x, 5.x
>
>         Attachments: 4.1.1.txt, 4.1.6.txt, 5.0.0-corrected.txt, cdc_schema_disagreement.sh
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In the cassandra.yaml file, there is a parameter named "cdc_enabled" which allows CDC to be enabled or disabled on each individual node.
> It has been found that this can cause schema disagreement or discrepancy when an "ALTER TABLE ... WITH cdc=..." statement is run against a node that has "cdc_enabled" set to "false", in a cluster where nodes have mixed "true"/"false" values for the "cdc_enabled" setting.
> The exact behaviour of the above is version-dependent.
> On Cassandra 4.1.1, the cluster will end up in a schema disagreement state. A rolling restart will bring the schema back in sync, but the changes made to the `cdc` table property will be lost.
> On Cassandra 4.1.6, the cluster will not show any visible schema disagreement in the output of the "nodetool describecluster" command, but the "ALTER TABLE" statement only has a cosmetic effect on the node it is run on. The node with "cdc_enabled" set to "false" will show that the "cdc" table property has changed, but this does not affect its behaviour in any way. At the same time, other nodes do not see that table property change at all. This is perhaps even worse than on 4.1.1, because the ALTER TABLE statement fails silently.
> On Cassandra 5.0.0, the behaviour is the same as on 4.1.6.
> A shell script for reproducing the behaviours described above in Docker, and its outputs on 4.1.1, 4.1.6 and 5.0.0, are attached.
>
> Edit on 25 Sep: added test result on 5.0.0
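The QUORUM scenario from the comment can be made concrete with a toy model. The sketch below is a hypothetical Python simulation, not Cassandra code: `Replica`, `quorum_write`, `quorum_read` and `cdc_saving` are illustrative names, and the model deliberately ignores timestamps, hinted handoff and read repair. It assumes replication factor 3, CDC enabled on every replica, and a write that only reaches node A.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica: local rows plus the mutations its CDC consumer sees."""
    name: str
    cdc_enabled: bool
    rows: dict = field(default_factory=dict)
    cdc_log: list = field(default_factory=list)

    def apply(self, key, value):
        # The mutation is persisted locally (think commitlog + memtable)...
        self.rows[key] = value
        # ...and is visible to this node's CDC consumer only if CDC is enabled here.
        if self.cdc_enabled:
            self.cdc_log.append((key, value))


def quorum_write(replicas, reachable, key, value):
    """Send the write to every reachable replica; succeed only with a majority of acks."""
    acks = 0
    for r in replicas:
        if r.name in reachable:
            r.apply(key, value)
            acks += 1
    # No rollback: replicas that applied the write keep it even when QUORUM is not met.
    return acks >= len(replicas) // 2 + 1


def quorum_read(replicas, contacted, key):
    """Return the value if any contacted replica has it (timestamps ignored)."""
    values = [r.rows[key] for r in replicas if r.name in contacted and key in r.rows]
    return values[0] if values else None


def cdc_saving(total_replicas, cdc_replicas):
    """Fraction of CDC processing saved by enabling CDC on fewer replicas."""
    return 1 - cdc_replicas / total_replicas


replicas = [Replica("A", True), Replica("B", True), Replica("C", True)]
ok = quorum_write(replicas, reachable={"A"}, key="k1", value="v1")
print(ok)                                       # False: 1 ack out of 3
print(replicas[0].cdc_log)                      # [('k1', 'v1')]: CDC on A saw the row
print(quorum_read(replicas, {"B", "C"}, "k1"))  # None: B and C never received it
print(round(cdc_saving(3, 2), 2))               # 0.33
```

Even with CDC on all three replicas, the CDC consumer sees a row that a QUORUM read served by B and C cannot, which is the temporary desync the comment says consumers must tolerate anyway. The last line is the arithmetic behind the 33% figure under the two-of-three assumption: 1 - 2/3 ≈ 0.33.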