Hi all

First off, let's introduce the setup. 

- 6 x C* 1.1.2 in active DC (DC1), another 6 in another (DC2)
- keyspace's RF=3 in each DC
- Hector as client.
- client talks only to DC1 unless DC1 can't serve the request, in which case it
talks only to DC2
- commit log sync is periodic, with the default period of 10s.
- consistency level = LOCAL_QUORUM for both reads and writes.
- we are running on production Linux VMs (not ideal, but this is out of our
hands)
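
For reference, the replica counts implied by this setup can be worked out as below. This is only a sketch of the arithmetic: the class and method names are made up for illustration and are not part of Cassandra or Hector, and it assumes the usual quorum definition of floor(RF / 2) + 1.

```java
public class QuorumSketch {
    // Quorum size for a given replication factor: floor(rf / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Replicas in one DC that can be down while a LOCAL_QUORUM request
    // coordinated in that DC can still succeed.
    static int tolerableFailures(int replicationFactor) {
        return replicationFactor - quorum(replicationFactor);
    }

    public static void main(String[] args) {
        int rf = 3; // RF per DC, taken from the setup above
        System.out.println("LOCAL_QUORUM needs " + quorum(rf)
                + " of " + rf + " local replicas");
        System.out.println("tolerates " + tolerableFailures(rf)
                + " local replica(s) down");
    }
}
```

So a LOCAL_QUORUM write sent to DC1 is acknowledged once 2 of the 3 DC1 replicas have it; the DC2 replicas are not waited on for the acknowledgement.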
-----
As part of a DR exercise, we killed all 6 nodes in DC1. Hector started talking
to DC2, all the data was still there, and everything continued to work perfectly.

Then we brought the DC1 nodes back up, one by one. We saw a message saying all
the commit logs were replayed. No errors were reported. We didn't run repair at
this time.

We noticed that data that was written an hour before the exercise, around the
time the last memtables were flushed, was not found in DC1.

If we understand correctly, writes go to the commit log first, and the commit
log is synced to disk every 10s, so at worst we should have lost the last 10s
of data. What could be the cause of this behaviour?
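
To make our reasoning explicit, here is a small sketch of the worst-case loss window we are assuming with periodic sync. It is a toy model, not Cassandra code: the class, the method names, and the timestamps are hypothetical, and it assumes that syncs happen at fixed multiples of the period and that anything appended before the last completed sync survives a crash.

```java
public class CommitLogWindowSketch {
    // Time of the last completed sync at or before the crash, assuming
    // syncs happen at 0, period, 2 * period, ... (a simplification).
    static long lastSyncBefore(long crashMs, long syncPeriodMs) {
        return (crashMs / syncPeriodMs) * syncPeriodMs;
    }

    // In this model a write is at risk only if it was appended after the
    // last completed sync and no later than the crash.
    static boolean atRisk(long writeMs, long crashMs, long syncPeriodMs) {
        return writeMs > lastSyncBefore(crashMs, syncPeriodMs)
                && writeMs <= crashMs;
    }

    public static void main(String[] args) {
        long period = 10_000;                    // 10s sync period from the setup
        long crash = 3_600_000 + 7_500;          // hypothetical kill time
        long writeAnHourEarlier = crash - 3_600_000;
        long writeJustBefore = crash - 2_000;

        System.out.println(atRisk(writeAnHourEarlier, crash, period)); // false
        System.out.println(atRisk(writeJustBefore, crash, period));    // true
    }
}
```

Under this model a write made an hour before the kill is well outside the at-risk window, which is why the missing data surprises us.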

With the blessing of C* we were able to recover all of this data from DC2. But we
would like to understand why it went missing.

Many thanks in advance.

Amy

