As far as I know, the main thing about using NetworkTopologyStrategy with different racks is how replicas get placed across your cluster. That strategy prefers distinct racks when choosing where a row's replicas will be placed. So, if your racks have different numbers of nodes, you will probably end up with an unbalanced cluster (in terms of data held per node), not because of how the rows are partitioned, but because of where the replicas land. How bad it gets also depends on your replication factor. (You can sit down and do the math yourself.)
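If you would rather not do that math by hand, here is a rough sketch of it in Python. It is only a simplified model of rack-aware placement, not Cassandra's actual code: it assumes a single data centre, one token per node, equal-sized token ranges, and the rule "walk the ring clockwise, prefer racks not used yet, and fall back to skipped nodes once every rack is covered". The function name `replica_load` and the node and rack names are made up for illustration.

```python
def replica_load(ring, rack_of, rf):
    """Count how many token ranges each node stores a replica for.

    ring    -- node names in ring (token) order, one token per node (assumption)
    rack_of -- dict mapping node name -> rack name
    rf      -- replication factor
    """
    all_racks = set(rack_of.values())
    load = {node: 0 for node in ring}

    for start in range(len(ring)):
        replicas, seen_racks, skipped = [], set(), []
        for step in range(len(ring)):
            if len(replicas) == rf:
                break
            node = ring[(start + step) % len(ring)]
            rack = rack_of[node]
            # Rack already used and other racks still uncovered: defer this node.
            if rack in seen_racks and seen_racks != all_racks:
                skipped.append(node)
                continue
            replicas.append(node)
            seen_racks.add(rack)
            # Once every rack holds a replica, fill up from the deferred nodes.
            if seen_racks == all_racks:
                while skipped and len(replicas) < rf:
                    replicas.append(skipped.pop(0))
        for node in replicas:
            load[node] += 1
    return load


# Unbalanced layout: 4 nodes in one rack, 1 node alone in another.
unbalanced_racks = {"A": "RAC1", "B": "RAC1", "C": "RAC1", "D": "RAC1", "E": "RAC2"}
print(replica_load(["A", "B", "C", "D", "E"], unbalanced_racks, rf=2))
# {'A': 2, 'B': 1, 'C': 1, 'D': 1, 'E': 5}

# Balanced layout: 2 nodes per rack, alternating around the ring.
balanced_racks = {"A": "RAC1", "B": "RAC2", "C": "RAC1", "D": "RAC2"}
print(replica_load(["A", "B", "C", "D"], balanced_racks, rf=2))
# {'A': 2, 'B': 2, 'C': 2, 'D': 2}
```

In this toy model the lone node in RAC2 ends up holding a replica of every range, while the nodes in RAC1 split the remaining replicas between them. Real clusters with vnodes will spread the load inside a rack more evenly than this single-token model does, so treat the numbers as an illustration of the trend only.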
I had an issue like that some time ago, because I was not aware of that behavior, didn't really care where my machines were, and was using SimpleStrategy. But when I decided to go with NetworkTopologyStrategy, I realized I had a bad physical configuration (4 nodes in the same rack, 1 node in another one), so I had to fake that last node's rack, as if it were in the same rack as the other nodes. Otherwise I would have had that node alone in its rack with twice the amount of data the other ones had. (As I said, that could be even worse if I had a higher replication factor.)

To be honest, I'm not sure I fully understand the documentation you quoted in your first email, especially the last sentence. But my (limited) experience with Cassandra (2.1) tells me that if you start off with a balanced rack setup, you'll be fine. Otherwise, you'll have to either change your nodes' physical location or fake it in the config file, and then run repair and cleanup on your entire cluster (which is a pain in the ass) to get a balanced cluster again. I had to do that. =P

On Sat, Feb 28, 2015 at 6:05 AM, Amlan Roy <amlan....@cleartrip.com> wrote:
> Hi Rob,
>
> Thanks for sharing the link. I have gone through it and few other documents
> as well. Still I am confused. It seems, if we use vnodes and
> NetworkTopologyStrategy, we should use a single rack configuration in
> Cassandra. Or, it can create hotspots in the ring. Not sure if my
> understanding is correct.
>
> Regards.
>
>
> On 28-Feb-2015, at 2:42 am, Robert Coli <rc...@eventbrite.com> wrote:
>
> On Fri, Feb 27, 2015 at 7:30 AM, Amlan Roy <amlan....@cleartrip.com> wrote:
>>
>> I am new to Cassandra and trying to setup a Cassandra 2.0 cluster using 4
>> nodes, 2 each in 2 different racks. All are in same data centre. This is
>> what I see in the documentation:
>>
>> To use racks correctly:
>>
>> Use the same number of nodes in each rack. Use one rack and place the
>> nodes in different racks in an alternating pattern. This allows you to still
>> get the benefits of Cassandra's rack feature, and allows for quick and fully
>> functional expansions. Once the cluster is stable, you can swap nodes and
>> make the appropriate moves to ensure that nodes are placed in the ring in an
>> alternating fashion with respect to the racks.
>>
>> What I have understood is, in cassandra-rackdc.properties, I need to use
>> single rack name even though I have 2 racks and then place the nodes in such
>> an order that they are placed in an alternating fashion - RAC1-NODE1,
>> RAC2-NODE1, RAC1-NODE2, RAC2-NODE2.
>>
>> Just wanted to know if this is correct. If yes, how do I enforce this
>> order while adding nodes.
>
> https://issues.apache.org/jira/browse/CASSANDRA-3810
>
> =Rob