Re: cassandra nodes with mixed hard disk sizes

Edward Capriolo Tue, 22 Mar 2011 11:13:49 -0700

On Tue, Mar 22, 2011 at 12:23 PM, Peter Schuller
<peter.schul...@infidyne.com> wrote:
>> I may be wrong on this, so anyone else feel free to jump in. Here are some 
>> issues to consider...
>>
>> - keyspace memory requirements are global, all nodes must have enough memory 
>> to support the CFs.
>> - During node moves, additions or deletions the token range may increase, 
>> nodes with less total  space than others would make this more complicated.
>> - during a write the mutation is sent to all replicas, a weak node that is a 
>> replica for a strong and busy node will be asked to store data from the 
>> strong node.
>> - read repair reads from all replicas
>> - when strong nodes that replicate to a weak node are compacting or 
>> repairing the dynamic snitch may order them lower than the weak node. 
>> Potentially increasing read requests on the weak one.
>> - down time for a strong node (or cluster partition) may result in increased 
>> read traffic to a weak node if all up replicas are needed to achieve the CL.
>> - nodes store their token range and the token range for RF-1 other nodes.
>
> The idea is to layout your ring to account for differences. However
> the kink is that this only works exactly as you would want for RF=1
> where you can directly control the capacity of each node by assigning
> an appropriately sized ring. For RF > 1 you start having to consider
> how replicas are chosen, and that a small node with a large "neighbor"
> (neighbor in the sense of replica selection; direct neighbor in the
> ring in the simplest case) contributes to the load on your small node.
>
> So there are definitely concerns with mixing arbitrarily performing
> nodes, but it's not like you must have identically sized nodes.
> Probably a reasonable way to mix nodes is to have as few classes of
> nodes as possible, and have them adjacent to each other on the ring.
> So e.g., 15 fat nodes followed by 20 slim nodes. The fat nodes near
> the fat/slim barrier would probably not be fully utilized because it
> would spill over (due to RF > 1) on the slimmer nodes.
>
> But yes, ring management and interaction with the chosen replication
> strategy becomes more complex. Keeping in mind though that at worst
> you have to treat some slightly better nodes as if they weren't. So it
> only becomes an issue where the node capacity is sufficiently
> different that you start caring about actually utilizing them fully.
>
> I'd be interested to hear what people end up doing about this in
> production, assuming people have any clusters that have survived long
> enough on an evolving ring of hardware to actually have this problem
> yet :)
>
> --
> / Peter Schuller
>


The problem is that laying out the tokens is very hard. For example,
one machine may have 40GB and another machine may have 60GB. For the
larger machine this creates (technical term coming) a double-whammy.
First, it will receive proportionally more requests (the dynamic snitch
helps here). Second, these requests will have to random-read through
more data.
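
To make the RF=1 case concrete, here is a rough sketch of picking
tokens in proportion to disk size. It assumes RandomPartitioner (token
space 0 to 2**127) and that a node owns the range (previous token, its
token]. The function names are made up for illustration; this is not a
Cassandra tool.

```python
RING_SIZE = 2 ** 127  # assumption: RandomPartitioner token space


def proportional_tokens(capacities_gb):
    """Return one token per node so that each node's primary range is
    proportional to its capacity (hypothetical helper)."""
    total = sum(capacities_gb)
    tokens = []
    running = 0
    for cap in capacities_gb:
        running += cap
        # The last node's range ends at RING_SIZE, which wraps to token 0.
        tokens.append((RING_SIZE * running // total) % RING_SIZE)
    return tokens


def primary_fractions(tokens):
    """Fraction of the ring each node owns as its primary range."""
    fractions = []
    for i in range(len(tokens)):
        # A span of 0 only happens with a single node, which owns everything.
        span = (tokens[i] - tokens[i - 1]) % RING_SIZE or RING_SIZE
        fractions.append(span / RING_SIZE)
    return fractions


tokens = proportional_tokens([40, 60])
print(tokens)
print(primary_fractions(tokens))  # roughly 0.4 and 0.6
```

This only balances the primary ranges. It says nothing about request
rate, and it stops being the whole story as soon as RF > 1.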

Actually it is more like a triple-whammy, since every setting such as
key cache size, memtable flush, etc. is now in CFMetaData and shared
across the cluster. What setting do you pick for the key cache? If you
go small you do not get the most out of the larger hardware. If you go
large your smaller nodes may have GC issues.

Wait! Maybe this is a quadruple-whammy, since we have to account for
the data being replicated to other nodes. At replication factor 3 only
1/3rd of the data on a node actually belongs to that node's token
range. So it is not as simple as giving small nodes smaller ranges; you
also have to consider the nodes around them and somehow balance those
out too. (I am not convinced it can be done.)
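
A small sketch of why the neighbors matter. It assumes the simplest
placement, where each range is also copied to the next RF-1 nodes
clockwise on the ring, so a node stores its own range plus the ranges
of the RF-1 nodes before it. The function name and the percentages are
invented for illustration.

```python
def stored_share(primary, rf):
    """Given each node's primary share of the ring (in ring order),
    return how much each node actually stores at replication factor rf.
    Assumes replicas go to the next rf-1 nodes clockwise."""
    n = len(primary)
    if rf > n:
        raise ValueError("rf cannot exceed the number of nodes")
    # Node i holds its own range plus those of the rf-1 nodes before it.
    return [sum(primary[(i - k) % n] for k in range(rf)) for i in range(n)]


# Two slim nodes (10% of the ring each) followed by two fat nodes (40% each).
primary = [10, 10, 40, 40]
print(stored_share(primary, rf=3))  # [90, 60, 60, 90]
```

In this made-up ring the first slim node ends up holding 90% of the
data set even though its own range is only 10%, because it replicates
for the two fat nodes before it.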

These "whammy" conditions may not be a big deal, since your cluster
should have extra capacity. As long as you are operating with some
headroom they may not be noticeable. However, as you start approaching
capacity your smaller nodes will start showing problems (usually IO)
first.

I would say mixed hardware is a no-no, where by mixed I mean
drastically different hardware. As to how this translates long
term...hopefully you have a 3 or 5 year hardware replacement policy.
Before then, disk and memory upgrades should serve you well.
