OK - let me state the facts first (as I know them). I do not know the inner workings, so interpret my response with that caveat. Although, at an architectural level, one should be able to keep detailed implementation at bay.

- Quorum is floor(RF/2) + 1, where RF is the Replication Factor (the N in your questions). For odd RF this is the same as (RF+1)/2.
- Consistency is guaranteed if R(ead) + W(rite) > RF (which Quorum on both sides gives you, but it can be achieved via other combinations, depending on whether Read or Write performance is desired).
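To make the arithmetic concrete, here is a minimal sketch of the two facts above. It assumes the usual majority definition of quorum, floor(RF/2) + 1, and the function names `quorum` and `overlaps` are made up for illustration; they are not Cassandra APIs.

```python
def quorum(rf: int) -> int:
    """Smallest replica count that is a strict majority of rf replicas."""
    return rf // 2 + 1


def overlaps(r: int, w: int, rf: int) -> bool:
    """True when any r replicas read must share at least one node
    with any w replicas written, i.e. R + W > RF."""
    return r + w > rf


if __name__ == "__main__":
    for rf in (1, 2, 3, 4, 5):
        q = quorum(rf)
        # Quorum reads plus quorum writes always overlap.
        print(f"RF={rf} quorum={q} overlap={overlaps(q, q, rf)}")

    # Other combinations that still overlap for RF=3:
    print(overlaps(1, 3, 3))  # read ONE, write ALL
    print(overlaps(3, 1, 3))  # read ALL, write ONE
    # A combination that does not overlap for RF=3:
    print(overlaps(1, 1, 3))  # read ONE, write ONE
```

With RF=3, quorum is 2, so R=2 and W=2 gives 4 > 3. Reading ONE and writing ALL trades write performance for read performance and still satisfies the inequality, while ONE/ONE does not.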
Now getting to your questions:

1. If the read at Q is nondeterministic, it would likely have to read the other (RF-Q) nodes to achieve Quorum on a deterministic value. At which point, sync'ing all of them with writes should not be that expensive. But at what point precisely the read is returned, I do not know - you will have to look at the code. IMO, at this level it should not matter.
2. Should be at the granularity of the data divergence.
3. Read Repair or nodetool repair (whichever comes first).
4. All are peers - there is no primary. There might be a connected node, but it has no special role/privileges.
5. Tries to Q - returns on a deterministic read. If not, see (1).
6. The writer supplies the timestamp value - it can be any value that makes sense within the scope of the data/application.

HTH,
-JA

On Fri, Feb 18, 2011 at 10:28 AM, A J <s5a...@gmail.com> wrote:
> Couple of more related questions:
>
> 5. For reads, does Cassandra first read N nodes or just the R nodes it
> selects ? I am thinking unless it reads all the N nodes, how will it
> know which node has the latest write.
>
> 6. Who decides the timestamp that gets inserted into the timestamp
> field of every column. I would guess the coordinator node picks up its
> system's timestamp. If that is true, the clocks on all the nodes
> should be synchronized, right ? Otherwise conflict resolution cannot
> be done correctly.
> For a distributed system, this is not always possible. How do folks
> get around this issue ?
>
> Thanks.
>
> On Fri, Feb 18, 2011 at 10:23 AM, A J <s5a...@gmail.com> wrote:
> > Questions about R and N (and W):
> > 1. If I set R to Quorum and cassandra identifies a need for read
> > repair before returning, would the read repair happen on R nodes (I
> > mean subset of R that needs repair) or N nodes before the data is
> > delivered to the client ?
> > 2. Also does the repair happen at level of row (key) or at level of
> > column ?
> >
> > 3. During write, if W is met but N-W is not met for some reason; would
> > cassandra try to repair N-W nodes in the background as and when it
> > can. Or the N-W are only repaired when a read is issued ?
> >
> > 4. What is the significance of the 'primary' replica for writes from
> > usage point ? Writes to primary and non-primary replicas all happen
> > simultaneously. Ensuring W is decided irrespective of it being primary
> > or not. Ensuring R is decided by any of the R nodes out of N.
> > I know the tokens are divided per the primary replica. But other than
> > that, for read and write operations, do the primary replica play any
> > special role ?
> >
> > Thanks.