> I was/am under the impression that a node owns a particular token
> range, and does not save any data that falls outside of that range
> (with exception to any data that might be replicated to it). Based on
> what you are saying, each node owns a token range, but also maintains
> copies of data outside of the range. If this is correct, then I can
> understand how all of my previous questions seemed "wrong." Cassandra
> already does what I want, provided that I use the correct RF and CL
> values.

No. I am not entirely sure where the confusion comes from, so I will just try to summarize things from scratch, briefly.

Any piece of data you store in Cassandra is going to be in a particular row, which has a row key. That row will have a "replica set" in the Cassandra cluster. For RF=3, that replica set contains three nodes. The replica set is the set of nodes that are responsible for keeping data for a row. In other words, with RF=3, and thus a replica set containing 3 nodes for each possible row key, there will be 3 copies of the data in total.

All the consistency levels always refer to nodes *in the replica set*. For example, CL.ALL requires that all nodes *in the replica set* respond. CL.QUORUM requires that a majority of all nodes *in the replica set* respond.

From the perspective of a given node in the cluster, assuming for the example RF=3, it will contain data for its own token range as well as data for two other token ranges.

To re-iterate another point: The choice of consistency level *never* affects *which* nodes are responsible for a given row key, nor does it affect which nodes will eventually receive writes. It *only* affects how many nodes must respond before the operation (read or write) is considered successful.

Does that make it clearer?

-- 
/ Peter Schuller (@scode on twitter)
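
As a rough illustration of the summary above, here is a minimal Python sketch of a toy ring. It is not Cassandra's actual code: the token space, the MD5-based `token_for`, the five-node `ring`, and the helper names are all assumptions made up for the example, and replica placement is assumed to be "the owning node plus the next RF-1 nodes clockwise". The point it tries to show is that the replica set depends only on the row key and RF, while the consistency level only changes how many of those replicas must respond.

```python
import bisect
import hashlib

RING_SIZE = 2**32  # toy token space (assumption, not Cassandra's real one)

def token_for(row_key):
    """Map a row key to a token on the toy ring (hypothetical hashing)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE

def replica_set(row_key, ring, rf):
    """Nodes responsible for row_key; `ring` is a sorted list of (token, node).

    The first node whose token is >= the key's token owns the range; the
    next rf-1 nodes clockwise hold the other copies. Assumes rf <= len(ring).
    """
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token_for(row_key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]

def required_responses(cl, rf):
    """How many replicas must respond; the CL never changes the replica set."""
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[cl]

def ranges_held(node, ring, rf):
    """Primary owners of every token range for which `node` stores data."""
    names = [n for _, n in ring]
    count = len(names)
    return [names[i] for i in range(count)
            if node in [names[(i + k) % count] for k in range(rf)]]

# Five hypothetical nodes with evenly spaced tokens.
ring = sorted((i * RING_SIZE // 5, "node%d" % i) for i in range(5))

replicas = replica_set("user:42", ring, rf=3)
print("replica set:", replicas)
for cl in ("ONE", "QUORUM", "ALL"):
    print(cl, "needs", required_responses(cl, 3), "of", len(replicas), "replicas")
print("node2 holds ranges owned by:", ranges_held("node2", ring, 3))
```

With RF=3, each node in this toy ring ends up holding data for three token ranges: its own and two others. Changing the consistency level changes only the count returned by `required_responses`, never the list returned by `replica_set`.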