Thanks, Mark, for the very detailed explanation. However, what about timestamp checking? You say that the coordinator compares the digest of the data (cell value) from both nodes, but if the cell has a different timestamp on each node, would the coordinator still request a full data read from the node with the most recent timestamp?

On Fri, Jul 25, 2014 at 11:25 PM, Mark Reddy <[email protected]> wrote:

> Hi Brian,
>
> A read request will be handled in the following manner:
>
> Once the coordinator receives a read request it will firstly determine the
> replicas responsible for the data. From there those replicas are sorted by
> "proximity" to the coordinator. The closest node as determined by proximity
> sorting will be sent a command to perform an actual data read i.e. return
> the data to the coordinator
>
> If you have a Replication Factor (RF) of 3 and are reading at CL.QUORUM,
> one additional node will be sent a digest query. A digest query is like a
> read query except that instead of the receiving node actually returning the
> data, it only returns a digest (hash) of the would-be data. The reason for
> this is to discover whether the two nodes contacted agree on what the
> current data is, without sending the data over the network. Obviously for
> large data sets this is an effective bandwidth saver.
>
> Back on the coordinator node if the data and the digest match the data is
> returned to the client. If the data and digest do not match, a full data
> read is performed against the contacted replicas in order to guarantee that
> the most recent data is returned.
>
> Asynchronously in the background, the third replica is checked for
> consistency with the first two, and if needed, a read repair is initiated
> for that node.
>
>
> Mark
>
>
>
> On Fri, Jul 25, 2014 at 9:12 PM, Brian Tarbox <[email protected]>
> wrote:
>
>> We're considering a C* setup with very large columns and I have a
>> question about the details of read.
>>
>> I understand that a read request gets handled by the coordinator which
>> sends read requests to <quorum> of the nodes holding replicas of the data,
>> and once <quorum> nodes have replied with consistent data it is returned to
>> the client.
>>
>> My understanding is that each of the nodes actually sends the full data
>> being requested to the coordinator (which in the case of very large columns
>> would involve lots of network traffic). Is that right?
>>
>> The alternative (which I don't think is the case but I've been asked to
>> verify) is that the replicas first send meta-data to the coordinator which
>> then asks one replica to send the actual data. Again, I don't think this
>> is the case but was asked to confirm.
>>
>> Thanks.
>>
>> --
>> http://about.me/BrianTarbox
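
For illustration, the read path described in the quoted message can be sketched as a toy model in Python. This is not Cassandra code: the `Cell` class and the `digest` and `quorum_read` functions are hypothetical names, the hash function is arbitrary, and the sketch assumes that the digest covers the cell's timestamp as well as its name and value. Under that assumption, two replicas holding the same value with different timestamps produce different digests, which triggers the full data read.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Cell:
    name: str
    value: bytes
    timestamp: int  # write time, e.g. microseconds since the epoch


def digest(cell: Cell) -> bytes:
    """Hash a cell. Assumption: the timestamp is part of the hashed input."""
    h = hashlib.sha256()
    h.update(cell.name.encode("utf-8"))
    h.update(cell.value)
    h.update(cell.timestamp.to_bytes(8, "big", signed=True))
    return h.digest()


def quorum_read(replicas: List[Dict[str, Cell]], name: str) -> Tuple[Cell, bool]:
    """Model a QUORUM read with RF=3 (two replicas contacted).

    `replicas` is assumed to be sorted by proximity already. Returns the
    cell sent to the client and whether a digest mismatch occurred.
    """
    data_replica, digest_replica = replicas[0], replicas[1]

    # The closest replica returns the full data.
    data = data_replica[name]
    # The next replica returns only a digest of what it would have sent.
    remote_digest = digest(digest_replica[name])

    if digest(data) == remote_digest:
        return data, False

    # Mismatch: do full data reads against the contacted replicas
    # and keep the cell with the most recent timestamp.
    candidates = [data_replica[name], digest_replica[name]]
    return max(candidates, key=lambda c: c.timestamp), True
```

Calling `quorum_read` with two replicas that hold the same value under timestamps 100 and 200 reports a mismatch and returns the cell with timestamp 200. The background check of the third replica and the read repair are left out of the sketch.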
