> > However what's about timestamp checking ? You're saying that the
> > coordinator checks for the digest of data (cell value) from both nodes but
> > if the cell name have different timestamp would it still request a full
> > data read to the node having the most recent time ?

When generating the hash to be returned to the coordinator, the possible cell values that are used are name, value, timestamp, serialisationFlag and, depending on the cell type, possibly other values. From there the hashes are compared and if there is a mismatch the data is requested from the replicas. At this stage the RowDataResolver will compute the most recent version of each column, and send diffs to out-of-date replicas.
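To make that concrete, here is a minimal sketch in plain Java of roughly what feeds the digest and how the newest version of each column wins once a mismatch forces a full data read. The Cell fields, the MD5 choice and the class names are illustrative assumptions, not Cassandra's actual internals:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;

// Illustrative only: a toy "cell" holding the fields that feed the digest.
class Cell {
    final String name;
    final byte[] value;
    final long timestamp;   // write time
    final byte flags;       // e.g. deleted/expired markers

    Cell(String name, byte[] value, long timestamp, byte flags) {
        this.name = name; this.value = value;
        this.timestamp = timestamp; this.flags = flags;
    }
}

public class DigestSketch {

    // The digest covers name, value, timestamp and flags, so replicas that
    // differ only in timestamp still produce different digests.
    static byte[] digest(Collection<Cell> cells) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (Cell c : cells) {
                md.update(c.name.getBytes(StandardCharsets.UTF_8));
                md.update(c.value);
                md.update(longToBytes(c.timestamp));
                md.update(c.flags);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e);
        }
    }

    // On a mismatch the full cells are fetched from the contacted replicas
    // and the most recent version of each column (highest timestamp) wins.
    static Map<String, Cell> resolve(List<List<Cell>> replicaResponses) {
        Map<String, Cell> newest = new HashMap<>();
        for (List<Cell> response : replicaResponses) {
            for (Cell c : response) {
                newest.merge(c.name, c,
                        (a, b) -> a.timestamp >= b.timestamp ? a : b);
            }
        }
        return newest;
    }

    private static byte[] longToBytes(long v) {
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) { out[i] = (byte) v; v >>= 8; }
        return out;
    }
}

So, to answer the question above: because the timestamp is part of what is hashed, the same value stored under different timestamps will still produce different digests, and the mismatch path (full data read, then diffs to the out-of-date replicas) is taken.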
On Fri, Jul 25, 2014 at 11:32 PM, Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:

> Yes. Digest includes following: {name, value, timestamp, flags(deleted,
> expired, etc.)}
>
> On Fri, Jul 25, 2014 at 2:33 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> Thanks Mark for the very detailed explanation.
>>
>> However what's about timestamp checking ? You're saying that the
>> coordinator checks for the digest of data (cell value) from both nodes but
>> if the cell name have different timestamp would it still request a full
>> data read to the node having the most recent time ?
>>
>> On Fri, Jul 25, 2014 at 11:25 PM, Mark Reddy <mark.re...@boxever.com> wrote:
>>
>>> Hi Brian,
>>>
>>> A read request will be handled in the following manner:
>>>
>>> Once the coordinator receives a read request it will firstly determine
>>> the replicas responsible for the data. From there those replicas are
>>> sorted by "proximity" to the coordinator. The closest node as determined
>>> by proximity sorting will be sent a command to perform an actual data
>>> read i.e. return the data to the coordinator.
>>>
>>> If you have a Replication Factor (RF) of 3 and are reading at CL.QUORUM,
>>> one additional node will be sent a digest query. A digest query is like a
>>> read query except that instead of the receiving node actually returning
>>> the data, it only returns a digest (hash) of the would-be data. The
>>> reason for this is to discover whether the two nodes contacted agree on
>>> what the current data is, without sending the data over the network.
>>> Obviously for large data sets this is an effective bandwidth saver.
>>>
>>> Back on the coordinator node if the data and the digest match the data is
>>> returned to the client. If the data and digest do not match, a full data
>>> read is performed against the contacted replicas in order to guarantee
>>> that the most recent data is returned.
>>>
>>> Asynchronously in the background, the third replica is checked for
>>> consistency with the first two, and if needed, a read repair is initiated
>>> for that node.
>>>
>>> Mark
>>>
>>> On Fri, Jul 25, 2014 at 9:12 PM, Brian Tarbox <briantar...@gmail.com> wrote:
>>>
>>>> We're considering a C* setup with very large columns and I have a
>>>> question about the details of read.
>>>>
>>>> I understand that a read request gets handled by the coordinator which
>>>> sends read requests to <quorum> of the nodes holding replicas of the
>>>> data, and once <quorum> nodes have replied with consistent data it is
>>>> returned to the client.
>>>>
>>>> My understanding is that each of the nodes actually sends the full data
>>>> being requested to the coordinator (which in the case of very large
>>>> columns would involve lots of network traffic). Is that right?
>>>>
>>>> The alternative (which I don't think is the case but I've been asked to
>>>> verify) is that the replicas first send meta-data to the coordinator
>>>> which then asks one replica to send the actual data. Again, I don't
>>>> think this is the case but was asked to confirm.
>>>>
>>>> Thanks.
>>>>
>>>> --
>>>> http://about.me/BrianTarbox
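For anyone who wants to map Mark's step-by-step description above onto code, here is a very rough coordinator-side sketch. It reuses the Cell and DigestSketch types from the earlier snippet; the Replica interface, the proximity comparator and the blockFor parameter are stand-ins for illustration, not Cassandra's actual API:

import java.util.*;

// Illustrative only: stand-ins for replica messaging, not Cassandra's API.
interface Replica {
    List<Cell> readData(String key);   // full data read
    byte[] readDigest(String key);     // digest-only read
}

public class CoordinatorReadSketch {

    // blockFor = replicas required by the consistency level,
    // e.g. 2 of 3 for QUORUM with RF=3.
    static Map<String, Cell> read(String key, List<Replica> replicas,
                                  int blockFor, Comparator<Replica> proximity) {
        replicas.sort(proximity);                    // closest replica first

        Replica closest = replicas.get(0);
        List<Cell> data = closest.readData(key);     // one real data read
        byte[] dataDigest = DigestSketch.digest(data);

        boolean mismatch = false;
        for (int i = 1; i < blockFor; i++) {         // digest queries for the rest
            byte[] d = replicas.get(i).readDigest(key);
            if (!Arrays.equals(d, dataDigest)) {
                mismatch = true;
            }
        }

        if (!mismatch) {
            return byName(data);                     // digests agree: return to client
        }

        // Mismatch: full data reads from the contacted replicas, then the
        // newest version of each column wins. Diffs to out-of-date replicas
        // and the background check of the remaining replica would follow
        // asynchronously.
        List<List<Cell>> responses = new ArrayList<>();
        responses.add(data);
        for (int i = 1; i < blockFor; i++) {
            responses.add(replicas.get(i).readData(key));
        }
        return DigestSketch.resolve(responses);
    }

    private static Map<String, Cell> byName(List<Cell> cells) {
        Map<String, Cell> m = new HashMap<>();
        for (Cell c : cells) {
            m.put(c.name, c);
        }
        return m;
    }
}

This also speaks to Brian's original question: only the closest replica ships the actual column data, the other contacted replicas ship only a hash, so large columns cross the network once unless the digests disagree.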