For background:
http://www.datastax.com/docs/1.0/cluster_architecture/index
http://thelastpickle.com/2011/02/07/Introduction-to-Cassandra/

> Which mechanism is used to replicate the changes from one system to another:
> statement distribution or recording the changeset via triggers or storing the
> changeset in transaction log?

Statement distribution is the closest to the truth, but we do not distribute statements. As the links above describe, the coordinator processes the request and sends messages to all the replicas at the same time. The closest RDBMS analogy is Mirroring in the SQL Server / Oracle world.

> Since replication is continuous copying of changes from one node to another,
> these changes would have to be snapshotted in order to sustain temporary
> network failures so that replication can resume after the network problem is
> healed. is there a mechanism to define how long we can store/archive the
> snaphotted changes before we discard and would demand a recreation of node
> from the scratch rather than rejoin

Snapshotting is not used. Look at the Consistency Level, Read Repair, Hinted Handoff and Repair.

> What options are available for conflict resolution since we are talking about
> master-master replication across tens of nodes?

An int64 timestamp, which is specified by the client or by the server (when using CQL). By convention, microseconds past the epoch are used. When replicas disagree, the value with the highest timestamp wins.

> If a node is rejoined after a split network where same records would have
> been modified on multiple nodes, is there a mechanism to merge the data,
> resolve conflicts and eventually reach to a consistent state?

See above. It's all part of the Eventual Consistency world. nodetool repair is the final word in repairing data, but the Consistency Level is what specifies the guarantee per request.

The best way to learn is to jump in and play with it.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/04/2012, at 3:36 AM, Samba wrote:

> Hi all,
> We are evaluating Cassandra for a geographically distributed deployment that
> requires multi master replication.
>
> We have a few questions regarding how replication is handled in Cassandra,
> like:
>
> Which mechanism is used to replicate the changes from one system to another:
> statement distribution or recording the changeset via triggers or storing the
> changeset in transaction log?
> Since replication is continuous copying of changes from one node to another,
> these changes would have to be snapshotted in order to sustain temporary
> network failures so that replication can resume after the network problem is
> healed. is there a mechanism to define how long we can store/archive the
> snaphotted changes before we discard and would demand a recreation of node
> from the scratch rather than rejoin
> What options are available for conflict resolution since we are talking about
> master-master replication across tens of nodes?
> If a node is rejoined after a split network where same records would have
> been modified on multiple nodes, is there a mechanism to merge the data,
> resolve conflicts and eventually reach to a consistent state?
> Thanks and Regards,
> Samba
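
For anyone who wants to see the timestamp-based conflict resolution from the reply in concrete terms, here is a rough sketch in plain Python. It is not Cassandra code: `Cell`, `now_micros`, `resolve` and `merge_replicas` are hypothetical names made up for illustration, and the tie-break on equal timestamps (larger value wins) is an assumption of this sketch. It only shows the idea that each value carries an int64 microsecond timestamp and the highest timestamp wins when copies from different nodes are merged.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A stored value plus the write timestamp (hypothetical type)."""
    value: str
    timestamp: int  # microseconds past the epoch, by convention


def now_micros() -> int:
    """Current time as microseconds past the epoch."""
    return time.time_ns() // 1_000


def resolve(a: Cell, b: Cell) -> Cell:
    """Pick the winner between two versions of the same cell.

    Highest timestamp wins. On a tie this sketch keeps the larger value
    so the result does not depend on argument order (an assumption).
    """
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    return a if a.value >= b.value else b


def merge_replicas(replicas):
    """Merge per-replica {key: Cell} dicts into one consistent view."""
    merged = {}
    for replica in replicas:
        for key, cell in replica.items():
            merged[key] = resolve(merged[key], cell) if key in merged else cell
    return merged


if __name__ == "__main__":
    # Two nodes accepted different writes for "user:1" during a partition.
    node_a = {"user:1": Cell("alice@old.example", 1_000)}
    node_b = {"user:1": Cell("alice@new.example", 2_000)}
    print(merge_replicas([node_a, node_b])["user:1"].value)
```

Because `resolve` gives the same answer regardless of the order in which versions arrive, every node that eventually sees both writes ends up with the same value, which is the property the repair mechanisms mentioned in the reply rely on.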