> I may be wrong on this, so anyone else feel free to jump in. Here are some 
> issues to consider...
>
> - keyspace memory requirements are global; all nodes must have enough memory 
> to support the CFs.
> - During node moves, additions or deletions the token range may increase; 
> nodes with less total space than others would make this more complicated.
> - during a write the mutation is sent to all replicas, so a weak node that is 
> a replica for a strong and busy node will be asked to store data from the 
> strong node.
> - read repair reads from all replicas.
> - when strong nodes that replicate to a weak node are compacting or repairing, 
> the dynamic snitch may order them lower than the weak node, potentially 
> increasing read requests on the weak one.
> - downtime for a strong node (or a cluster partition) may result in increased 
> read traffic to a weak node if all up replicas are needed to achieve the CL.
> - nodes store their own token range and the token range for RF-1 other nodes.

The idea is to lay out your ring to account for differences. However,
the kink is that this only works exactly as you would want for RF=1,
where you can directly control the capacity of each node by assigning
it an appropriately sized portion of the ring. For RF > 1 you start
having to consider how replicas are chosen, and that a large
"neighbor" (neighbor in the sense of replica selection; the direct
neighbor in the ring in the simplest case) contributes to the load on
your small node.
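To make this concrete, here is a rough sketch (plain Python, invented
node names and ring fractions, assuming SimpleStrategy-style placement
where a range's replicas are its primary owner plus the next RF-1
nodes clockwise) of how a small node that follows a big node inherits
the big node's write load:

# Rough sketch, not Cassandra code. Approximates per-node write load
# when replicas are the primary owner plus the next RF-1 nodes
# clockwise (SimpleStrategy-style). Node names/fractions are made up.
RF = 2

# (node, fraction of the ring it primarily owns): a big node followed
# by a small "neighbor" in the replica-selection sense.
ring = [("big", 0.45), ("small", 0.05), ("n3", 0.25), ("n4", 0.25)]

load = {name: 0.0 for name, _ in ring}
for i, (name, fraction) in enumerate(ring):
    # The range owned by `name` is also replicated to the next RF-1
    # nodes on the ring, so they all absorb writes for it too.
    for j in range(RF):
        replica = ring[(i + j) % len(ring)][0]
        load[replica] += fraction

for name, share in load.items():
    print(f"{name}: ~{share:.2f} of all writes")

# "small" ends up seeing ~0.50 of all writes (0.05 of its own plus
# 0.45 replicated from "big") despite owning only 5% of the ring.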

So there are definitely concerns with mixing arbitrarily performing
nodes, but it's not like you must have identically sized nodes.
Probably a reasonable way to mix nodes is to have as few classes of
nodes as possible, and to keep each class adjacent on the ring: e.g.,
15 fat nodes followed by 20 slim nodes. The fat nodes near the
fat/slim barrier would probably not be fully utilized because their
load would spill over (due to RF > 1) onto the slimmer nodes.
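As a back-of-the-envelope illustration (again just a sketch with
made-up capacities, same next-RF-1 placement assumption as above),
weighting primary ranges by capacity and summing replicated load
shows the slim nodes just past the barrier taking the hit, which is
why you would end up shrinking the adjacent fat nodes' ranges and
leaving them underused:

# Sketch with assumed numbers: 15 "fat" nodes (capacity 4) followed
# by 20 "slim" nodes (capacity 1), adjacent on the ring, RF = 3.
# Primary ranges are sized proportionally to capacity, then each
# node's load is its own range plus the ranges it replicates.
RF = 3
capacities = [4.0] * 15 + [1.0] * 20
total = sum(capacities)
fractions = [c / total for c in capacities]  # primary range per node

load = [0.0] * len(capacities)
for i, f in enumerate(fractions):
    for j in range(RF):
        load[(i + j) % len(capacities)] += f

# Utilization relative to a capacity-proportional fair share;
# values > 1.0 mean the node is asked to do more than its share.
util = [l / (RF * c / total) for l, c in zip(load, capacities)]
for i in (13, 14, 15, 16):  # nodes around the fat/slim barrier
    kind = "fat" if capacities[i] > 1 else "slim"
    print(f"node {i} ({kind}): utilization {util[i]:.2f}")

# The first slim nodes after the barrier come out well above 1.0
# because they replicate fat ranges; to bring them back down you have
# to shrink the last fat nodes' ranges, leaving those fat nodes
# underutilized.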

But yes, ring management and its interaction with the chosen
replication strategy become more complex. Keep in mind though that at
worst you have to treat some slightly better nodes as if they weren't
better. So it only becomes an issue when node capacities are
sufficiently different that you start caring about actually utilizing
them fully.

I'd be interested to hear what people end up doing about this in
production, assuming people have any clusters that have survived long
enough on an evolving ring of hardware to actually have this problem
yet :)

-- 
/ Peter Schuller
