What sort of corruption are you thinking about?

Whenever the nodes first contacted to satisfy the CL (consistency level) for a 
read do not agree on the "current" value, a process is run to resolve their 
differences. This can result in a node that is out of sync getting repaired.
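
As a rough illustration of that resolution step, here is a minimal Python 
sketch. It is a simplification and not Cassandra's actual code: the names 
Versioned and resolve are made up for this example, and it assumes each 
replica returns a single value with a write timestamp.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Versioned:
    """Hypothetical stand-in for one replica's answer to a read."""
    value: str
    timestamp: int  # write timestamp, e.g. microseconds since the epoch


def resolve(responses: Dict[str, Versioned]) -> Tuple[Versioned, List[str]]:
    """Return the version with the highest timestamp, plus the names of
    the replicas whose answer differs from it and would need a repair write."""
    winner = max(responses.values(), key=lambda v: v.timestamp)
    stale = sorted(node for node, v in responses.items() if v != winner)
    return winner, stale


# Two replicas disagree; the one with the newer timestamp wins.
responses = {
    "node1": Versioned("blue", 100),
    "node2": Versioned("green", 250),
}
winner, stale = resolve(responses)
print(winner.value, stale)  # green ['node1']
```

Note that nothing in this sketch inspects the content of the winning value; 
it only compares timestamps.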
Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 11/05/2012, at 11:17 PM, Carpenter, Curt wrote:

> I (now) understand that the point of this is to get the most recent copy (at 
> least of the nodes checked) if all replicas simply haven’t been updated to 
> the latest changes. But what about dealing with corruption? What if the most 
> recent copy is corrupt? With a Zookeeper-based transaction system on top, 
> corruption is all I’m worried about.
>  
> From: Dave Brosius [mailto:dbros...@mebigfatguy.com] 
> Sent: Thursday, May 10, 2012 10:03 PM
> 
> If you read at Consistency of at least quorum, you are guaranteed that at 
> least one of the nodes has the latest data, and so you get the right data. If 
> you read with less than quorum it would be possible for all the nodes that 
> respond to have stale data.
> 
> On 05/10/2012 09:46 PM, Carpenter, Curt wrote:
> Hi all, newbie here. Be gentle.
>  
> From 
> http://www.datastax.com/docs/1.0/cluster_architecture/about_client_requests:
> “Thus, the coordinator first contacts the replicas specified by the 
> consistency level. The coordinator will send these requests to the replicas 
> that are currently responding most promptly. The nodes contacted will respond 
> with the requested data; if multiple nodes are contacted, the rows from each 
> replica are compared in memory to see if they are consistent. If they are 
> not, then the replica that has the most recent data (based on the timestamp) 
> is used by the coordinator to forward the result back to the client.
> 
> To ensure that all replicas have the most recent version of frequently-read 
> data, the coordinator also contacts and compares the data from all the 
> remaining replicas that own the row in the background, and if they are 
> inconsistent, issues writes to the out-of-date replicas to update the row to 
> reflect the most recently written values. This process is known as read 
> repair. Read repair can be configured per column family 
> (using read_repair_chance), and is enabled by default.
> 
> For example, in a cluster with a replication factor of 3, and a read 
> consistency level of QUORUM, 2 of the 3 replicas for the given row are 
> contacted to fulfill the read request. Supposing the contacted replicas had 
> different versions of the row, the replica with the most recent version would 
> return the requested data. In the background, the third replica is checked 
> for consistency with the first two, and if needed, the most recent replica 
> issues a write to the out-of-date replicas.”
> 
>  
> Always returns the most recent? What if the most recent write is corrupt? I 
> thought the whole point of a quorum was that consistency is verified before 
> the data is returned to the client. No?
>  
> Thanks,
>  
> Curt
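
The quorum guarantee mentioned earlier in the thread can be checked with a 
little arithmetic. The sketch below assumes the usual overlap rule: a read 
is certain to touch at least one replica that acknowledged the latest write 
only when read replicas plus write replicas exceed the replication factor. 
The function names are made up for this example, and the write consistency 
level is an assumption, since the thread only discusses reads.

```python
def quorum(replication_factor: int) -> int:
    """Size of a quorum: a strict majority of the replicas."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor: int, write_nodes: int, read_nodes: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return read_nodes + write_nodes > replication_factor


rf = 3
q = quorum(rf)
# Quorum write + quorum read overlap; a write to one node + quorum read may not.
print(q, read_overlaps_write(rf, q, q), read_overlaps_write(rf, 1, q))  # 2 True False
```

With a replication factor of 3, a quorum read is only certain to see the 
latest write if that write was itself acknowledged by at least 2 replicas.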
