> I may be wrong on this, so anyone else feel free to jump in. Here are some
> issues to consider...
>
> - keyspace memory requirements are global, all nodes must have enough memory
>   to support the CFs.
> - During node moves, additions or deletions the token range may increase,
>   nodes with less total space than others would make this more complicated.
> - during a write the mutation is sent to all replicas, a weak node that is a
>   replica for a strong and busy node will be asked to store data from the
>   strong node.
> - read repair reads from all replicas
> - when strong nodes that replicate to a weak node are compacting or repairing
>   the dynamic snitch may order them lower than the weak node. Potentially
>   increasing read requests on the weak one.
> - down time for a strong node (or cluster partition) may result in increased
>   read traffic to a weak node if all up replicas are needed to achieve the CL.
> - nodes store their token range and the token range for RF-1 other nodes.

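
To put rough numbers on the last quoted point, here is a small Python
sketch. It is only an illustration, not how Cassandra computes anything
internally: it assumes a simple placement where each range lives on its
primary node plus the next RF-1 nodes clockwise on the ring, and the node
names and ring fractions are made up.

```python
def stored_fraction(ring, rf):
    """Return {node: fraction of all data that node stores}.

    ring: list of (node_name, primary_fraction) in ring order, where
          primary_fraction is the share of the token space the node
          owns as primary.
    rf:   replication factor.

    Assumption (hypothetical, simplified placement): each range is
    stored on its primary node and on the next rf-1 nodes clockwise.
    """
    n = len(ring)
    if rf < 1 or rf > n:
        raise ValueError("rf must be between 1 and the number of nodes")
    stored = {name: 0.0 for name, _ in ring}
    for i, (_, fraction) in enumerate(ring):
        # The range owned by node i is also written to its rf-1 successors.
        for k in range(rf):
            replica_name = ring[(i + k) % n][0]
            stored[replica_name] += fraction
    return stored


# Made-up ring: two fat nodes with large ranges, then four slim nodes.
ring = [
    ("fat1", 0.3), ("fat2", 0.3),
    ("slim1", 0.1), ("slim2", 0.1), ("slim3", 0.1), ("slim4", 0.1),
]

for rf in (1, 3):
    print(f"RF={rf}")
    for name, share in stored_fraction(ring, rf).items():
        print(f"  {name}: {share:.2f}")
```

With RF=1 each node stores exactly its own range. With RF=3 on this
made-up ring, slim1 (the slim node right after the fat ones) ends up
storing as much as fat2, while slim3 and slim4 further along store far
less.
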
The idea is to lay out your ring to account for the differences. However,
the kink is that this only works exactly as you would want for RF=1, where
you can directly control the capacity of each node by assigning it an
appropriately sized range of the ring. For RF > 1 you have to start
considering how replicas are chosen, and that a small node with a large
"neighbor" (neighbor in the sense of replica selection; the direct neighbor
in the ring in the simplest case) takes on load from that neighbor.

So there are definitely concerns with mixing nodes of arbitrarily different
performance, but it's not like you must have identically sized nodes.

Probably a reasonable way to mix nodes is to have as few classes of nodes
as possible, and to keep each class adjacent on the ring. So e.g., 15 fat
nodes followed by 20 slim nodes. The fat nodes near the fat/slim boundary
would probably not be fully utilized, because their data would spill over
(due to RF > 1) onto the slimmer nodes.

But yes, ring management and the interaction with the chosen replication
strategy become more complex. Keep in mind, though, that at worst you have
to treat some slightly better nodes as if they weren't better. So it only
becomes an issue when node capacities are sufficiently different that you
start caring about actually utilizing them fully.

I'd be interested to hear what people end up doing about this in
production, assuming people have any clusters that have survived long
enough on an evolving ring of hardware to actually have this problem yet :)

-- 
/ Peter Schuller