Also, you violate consistency the second the volume attaches, and you can't run repair offline first, so you may serve reads that don't return expected results (empty reads or resurrected deletes, neither of which respects monotonic quorum reads).

It's basically the same as attaching a corrupt volume. If you care about strict data correctness, you do not want to do this until you build a system that archives and restores the commitlog segments.
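Note that Cassandra already ships the hook for the archive half in conf/commitlog_archiving.properties. A minimal sketch of the archive side, assuming a /backup/commitlog destination (that path is a placeholder; in practice you'd ship each segment off the instance, e.g. to S3, since a copy living on the same EBS volume dies with it):

    # conf/commitlog_archiving.properties
    # Run for each commit log segment.
    # %path = fully qualified path of the segment to archive
    # %name = file name of the segment
    archive_command=/bin/cp %path /backup/commitlog/%name

(The stock file's example uses /bin/ln, but a hard link only helps if the backup target is on the same filesystem, which defeats the purpose here.)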
> On Feb 28, 2025, at 6:10 PM, Jeff Jirsa <jji...@apache.org> wrote:
>
>> On 2025/03/01 00:27:27 Jaydeep Chovatia wrote:
>> Hi,
>>
>> I want to reattach an asynchronously replicated EBS volume to Cassandra.
>> I want to know how to fix the delta inconsistency when reattaching, other
>> than running a repair on the dataset.
>>
>> Here is the scenario.
>> Three Cassandra nodes in three separate zones:
>> Node1 --> Zone1 (*EBS_Drive1*, Async_EBS_Drive3_Replica)
>> Node2 --> Zone2 (EBS_Drive2, *Async_EBS_Drive1_Replica*)
>> Node3 --> Zone3 (EBS_Drive3, Async_EBS_Drive2_Replica)
>>
>> EBS replicates data between zones asynchronously: EBS_Drive1 in Zone1 is
>> asynchronously copied to its replica (Async_EBS_Drive1_Replica) in Zone2,
>> and so on.
>>
>> If Node1 goes down in Zone1, I want to reattach Node1's asynchronously
>> replicated drive, *Async_EBS_Drive1_Replica*, in Zone2, which is fine. But
>> this async drive would be missing some of the latest data, say the last 15
>> minutes, which was present in EBS_Drive1. Besides going through Cassandra
>> repair, what are my options for repairing the missing data when I reattach
>> *Async_EBS_Drive1_Replica*?
>
> There is no way to do this with JUST Cassandra in 2025-available versions of
> Cassandra.
>
> You're effectively asking for point-in-time restore functionality from a
> backup system that doesn't implement point-in-time restore capability. You'd
> need the delta commitlogs and newly flushed sstables, and you'd have to
> replay them. Hints won't work, because you've acked the mutations. IR isn't
> guaranteed to work, because all replicas may have already promoted the
> unrepaired set on the synced EBS volume.
>
> (If the nodes are ALSO running their own Cassandra processes from the
> primary zone, I suspect you also end up with out-of-range data, which is
> not great, and I have no idea how future enhancements like CEP-45 would
> think about this - or point-in-time restore in general.)
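For completeness: if you did have those delta commitlog segments archived, the replay is driven by the restore keys in the same commitlog_archiving.properties file, and happens on the next node startup. The path and timestamp below are illustrative only:

    # conf/commitlog_archiving.properties on the node being restored
    # %from = full path to an archived segment, %to = live commitlog directory
    restore_command=cp -f %from %to
    restore_directories=/backup/commitlog
    # replay mutations with timestamps up to this point, GMT (yyyy:MM:dd HH:mm:ss)
    restore_point_in_time=2025:02:28 18:10:00

Even then, lining this up with the async volume's sync point is the hard part - hence the "build a system" caveat above.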