[ https://issues.apache.org/jira/browse/CASSANDRA-20595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ariel Weisberg updated CASSANDRA-20595:
---------------------------------------
    Description:
Right now sstable import happens independently at different nodes, causing non-deterministic reads which break Accord transaction recovery. Mutation tracking has a similar concern that should probably be tackled at the same time. With mutation tracking it's possible to read from a node that imported the sstables and then later read from one that didn't, so the import should be recorded in the mutation tracking log so that nodes can reconcile whether the import occurred before returning a result.

SSTables should be staged at each node and then imported via a transaction that validates the sstables are actually present at all nodes before committing. So {{TxnRead}} would validate that the sstables are staged and, if possible, that they match across nodes via checksums of some sort, then {{TxnQuery}} would make a go/no-go decision for the transaction outcome, and then {{TxnWrite}} would actually import the sstables if all checks pass.

The one pain point is that this requires all command stores to import at the same time, because we have no way to import sstables for just the owned ranges of a command store… but maybe we should? We could have a pre-import and a post-import version of Tracker/View, and command stores can signal which one they are going to use. We would just forbid multiple concurrent imports, so the outcome of an attempt at a concurrent import would simply be an error saying concurrent imports can’t happen.

I think {{MIXED_READS}} will be wonky in this case, because it will see some transactions executing with the pre-import state and some with the post-import state, and we have to make a call as to when non-SERIAL reads see the post-import state. Probably best for that to be after the Accord transaction doing the import commits at each node.
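
To make the proposed flow concrete, here is a minimal Java sketch of the validate / decide / import sequence, plus the "no concurrent imports" rule. Everything in it is hypothetical: `StagedImportSketch`, `decide`, `runImport`, the node names, and the checksum strings are illustration only and are not Cassandra or Accord APIs. The real work would live in {{TxnRead}}, {{TxnQuery}} and {{TxnWrite}}, and the in-process flag below is only a stand-in for however concurrent imports would actually be rejected across the cluster.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch only; none of these types exist in Cassandra or Accord. */
final class StagedImportSketch
{
    enum Decision { GO, NO_GO }

    // Stand-in for "forbid multiple concurrent imports".
    private final AtomicBoolean importInProgress = new AtomicBoolean();

    /**
     * Input mirrors the read phase: each node reports the sstables it has
     * staged as (sstable name -> checksum). The result mirrors the query
     * phase: GO only if every expected sstable is staged at every node
     * with the same checksum.
     */
    static Decision decide(Set<String> expected, Map<String, Map<String, String>> stagedByNode)
    {
        if (expected.isEmpty() || stagedByNode.isEmpty())
            return Decision.NO_GO; // nothing to import, or no node reported

        for (String sstable : expected)
        {
            String reference = null;
            for (Map<String, String> staged : stagedByNode.values())
            {
                String checksum = staged.get(sstable);
                if (checksum == null)
                    return Decision.NO_GO; // not staged at some node
                if (reference == null)
                    reference = checksum;
                else if (!reference.equals(checksum))
                    return Decision.NO_GO; // nodes staged different content
            }
        }
        return Decision.GO;
    }

    /**
     * Mirrors the write phase: run the import action only on GO.
     * A second import attempted while one is running fails with an error.
     */
    boolean runImport(Set<String> expected,
                      Map<String, Map<String, String>> stagedByNode,
                      Runnable importAction)
    {
        if (!importInProgress.compareAndSet(false, true))
            throw new IllegalStateException("concurrent imports can't happen");
        try
        {
            if (decide(expected, stagedByNode) == Decision.NO_GO)
                return false;
            importAction.run();
            return true;
        }
        finally
        {
            importInProgress.set(false);
        }
    }

    public static void main(String[] args)
    {
        Set<String> expected = Set.of("nb-1-big-Data.db");
        Map<String, Map<String, String>> staged = Map.of(
            "node1", Map.of("nb-1-big-Data.db", "c0ffee"),
            "node2", Map.of("nb-1-big-Data.db", "c0ffee"));

        StagedImportSketch sketch = new StagedImportSketch();
        boolean imported = sketch.runImport(expected, staged,
                                            () -> System.out.println("importing staged sstables"));
        System.out.println("imported = " + imported);
    }
}
```

The sketch deliberately leaves out the parts the description flags as open: importing only the owned ranges of a command store, the pre-import and post-import Tracker/View versions, and when non-SERIAL reads start seeing the post-import state.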

  was:
Right now sstable import happens independently at different nodes, causing non-deterministic reads which break Accord transaction recovery.

SSTables should be staged at each node and then imported via a transaction that validates the sstables are actually present at all nodes before committing. So {{TxnRead}} would validate that the sstables are staged and, if possible, that they match across nodes via checksums of some sort, then {{TxnQuery}} would make a go/no-go decision for the transaction outcome, and then {{TxnWrite}} would actually import the sstables if all checks pass.

The one pain point is that this requires all command stores to import at the same time, because we have no way to import sstables for just the owned ranges of a command store… but maybe we should? We could have a pre-import and a post-import version of Tracker/View, and command stores can signal which one they are going to use. We would just forbid multiple concurrent imports, so the outcome of an attempt at a concurrent import would simply be an error saying concurrent imports can’t happen.

I think {{MIXED_READS}} will be wonky in this case, because it will see some transactions executing with the pre-import state and some with the post-import state, and we have to make a call as to when non-SERIAL reads see the post-import state. Probably best for that to be after the Accord transaction doing the import commits at each node.

> SSTable import doesn't work with Accord transactions
> ----------------------------------------------------
>
>                 Key: CASSANDRA-20595
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20595
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Accord, Local/SSTable
>            Reporter: Ariel Weisberg
>            Priority: Normal