What sort of corruption are you thinking about? Whenever the nodes first contacted to satisfy the CL do not agree on the "current" value, a process is run to resolve their differences. This can result in a node that is out of sync getting repaired.
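
As a rough illustration of that resolution step, here is a minimal Python sketch. It is not Cassandra's code: the names `Versioned`, `resolve` and `read_with_repair` are made up for the example, and the replicas are modelled as a plain dict. It only compares timestamps, as the DataStax text quoted below describes, so nothing in it checks whether the newest value is itself valid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    """A stored value plus the timestamp of the write that produced it."""
    value: str
    timestamp: int


def resolve(responses):
    """Pick the newest response and list the replicas that are behind it.

    `responses` maps a (hypothetical) replica name to the Versioned it returned.
    """
    newest = max(responses.values(), key=lambda v: v.timestamp)
    stale = sorted(
        name for name, v in responses.items() if v.timestamp < newest.timestamp
    )
    return newest, stale


def read_with_repair(replicas, contacted):
    """Read from the contacted replicas and repair the out-of-date ones.

    `replicas` is a dict standing in for the cluster; writing into it
    stands in for the repair write sent to an out-of-sync node.
    """
    responses = {name: replicas[name] for name in contacted}
    newest, stale = resolve(responses)
    for name in stale:
        replicas[name] = newest  # bring the stale replica up to date
    return newest.value


replicas = {
    "a": Versioned("old", 1),
    "b": Versioned("new", 2),
    "c": Versioned("old", 1),
}
result = read_with_repair(replicas, ["a", "b"])
```

In this run only nodes "a" and "b" are contacted, so "a" is repaired and "c" stays stale until some later read or repair touches it.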
Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 11/05/2012, at 11:17 PM, Carpenter, Curt wrote:

> I (now) understand that the point of this is to get the most recent copy (at
> least of the nodes checked) if all replicas simply haven’t been updated to
> the latest changes. But what about dealing with corruption? What if the most
> recent copy is corrupt? With a Zookeeper-based transaction system on top,
> corruption is all I’m worried about.
>
> From: Dave Brosius [mailto:dbros...@mebigfatguy.com]
> Sent: Thursday, May 10, 2012 10:03 PM
>
> If you read at Consistency of at least quorum, you are guaranteed that at
> least one of the nodes has the latest data, and so you get the right data. If
> you read with less than quorum it would be possible for all the nodes that
> respond to have stale data.
>
> On 05/10/2012 09:46 PM, Carpenter, Curt wrote:
> Hi all, newbie here. Be gentle.
>
> From
> http://www.datastax.com/docs/1.0/cluster_architecture/about_client_requests:
> “Thus, the coordinator first contacts the replicas specified by the
> consistency level. The coordinator will send these requests to the replicas
> that are currently responding most promptly. The nodes contacted will respond
> with the requested data; if multiple nodes are contacted, the rows from each
> replica are compared in memory to see if they are consistent. If they are
> not, then the replica that has the most recent data (based on the timestamp)
> is used by the coordinator to forward the result back to the client.
>
> To ensure that all replicas have the most recent version of frequently-read
> data, the coordinator also contacts and compares the data from all the
> remaining replicas that own the row in the background, and if they are
> inconsistent, issues writes to the out-of-date replicas to update the row to
> reflect the most recently written values. This process is known as read
> repair. Read repair can be configured per column family
> (using read_repair_chance), and is enabled by default.
>
> For example, in a cluster with a replication factor of 3, and a read
> consistency level of QUORUM, 2 of the 3 replicas for the given row are
> contacted to fulfill the read request. Supposing the contacted replicas had
> different versions of the row, the replica with the most recent version would
> return the requested data. In the background, the third replica is checked
> for consistency with the first two, and if needed, the most recent replica
> issues a write to the out-of-date replicas.”
>
>
> Always returns the most recent? What if the most recent write is corrupt? I
> thought the whole point of a quorum was that consistency is verified before
> the data is returned to the client. No?
>
> Thanks,
>
> Curt