I just wanted to verify something: if I set up a multi-data-center Cassandra cluster, will each data center hold the complete data set?
Say I have two data centers, each with two nodes, and a partitioner whose tokens range from 0 to 100. The initial tokens are assigned this way:

DC1:N1 = 00
DC2:N1 = 25
DC1:N2 = 50
DC2:N2 = 75

where DCX is data center X and NX is node X.

*Which one of the following options is true?*

*Option #1:* DC1 and DC2 will each hold the complete data set, with keys bucketed as follows:

DC1:N1 = (50, 00] => 50 keys
DC1:N2 = (00, 50] => 50 keys
---- Complete data set mirrored at DC1

DC2:N1 = (75, 25] => 50 keys
DC2:N2 = (25, 75] => 50 keys
---- Complete data set mirrored at DC2

*Option #2:* DC1 and DC2 will each hold 50% of the data, with keys bucketed as follows (much the same way as in a single-DC setup):

DC1:N1 = (75, 00] => 25 keys
DC2:N1 = (00, 25] => 25 keys
DC1:N2 = (25, 50] => 25 keys
DC2:N2 = (50, 75] => 25 keys
---- Data is divided between the two data centers

Thanks,
PP
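In case it helps, here is a small Python sketch of how the two bucketings above were worked out. It only illustrates the two options on the toy 0 to 100 ring. It is not Cassandra's actual placement code, and the names (`bucket_ranges`, `option_1`, `option_2`) are made up for this example.

```python
RING_SIZE = 100  # toy token space from the example, not a real partitioner range

# (data center, node) -> initial token, as assigned in the example
TOKENS = {
    ("DC1", "N1"): 0,
    ("DC2", "N1"): 25,
    ("DC1", "N2"): 50,
    ("DC2", "N2"): 75,
}


def bucket_ranges(tokens):
    """For the nodes forming one ring, map each node to (start, end, key_count).

    Each node owns the half-open range (previous token, own token].
    """
    ordered = sorted(tokens.items(), key=lambda item: item[1])
    ranges = {}
    for i, (node, token) in enumerate(ordered):
        prev_token = ordered[i - 1][1]  # index -1 wraps around to the last node
        width = (token - prev_token) % RING_SIZE or RING_SIZE
        ranges[node] = (prev_token, token, width)
    return ranges


def option_1(tokens):
    """Option #1: each data center is its own ring over the full key space."""
    result = {}
    for dc in sorted({dc for dc, _ in tokens}):
        dc_tokens = {node: t for node, t in tokens.items() if node[0] == dc}
        result.update(bucket_ranges(dc_tokens))
    return result


def option_2(tokens):
    """Option #2: all nodes share one ring, so data is split across data centers."""
    return bucket_ranges(tokens)


if __name__ == "__main__":
    for name, fn in (("Option #1", option_1), ("Option #2", option_2)):
        print(name)
        for (dc, node), (start, end, count) in sorted(fn(TOKENS).items()):
            print(f"  {dc}:{node} = ({start:02d}, {end:02d}] => {count} keys")
```

Under Option #1 the key counts within each data center add up to 100, and under Option #2 they add up to 100 only across both data centers, which is exactly the difference I am asking about.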