My assumption comes from not seeing anything in the code that explicitly supports 
nodes of different specs (I also think I saw it mentioned somewhere ages ago). AFAIK 
the dynamic snitch is there to detect nodes with temporarily reduced throughput 
and try to reduce the read load on them. 
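To illustrate what I mean by the dynamic snitch, here's a toy sketch of the idea 
(not Cassandra's actual implementation; the class and parameter names are made up): 
keep a smoothed read latency per replica and prefer the currently fastest ones.

    # Toy sketch of the dynamic snitch idea: track a smoothed read latency
    # per endpoint and order replicas by it. Names are illustrative only.
    class LatencyTracker:
        def __init__(self, alpha=0.75):
            self.alpha = alpha        # weight given to the newest sample
            self.scores = {}          # endpoint -> smoothed latency (ms)

        def record(self, endpoint, latency_ms):
            prev = self.scores.get(endpoint, latency_ms)
            self.scores[endpoint] = self.alpha * latency_ms + (1 - self.alpha) * prev

        def order_replicas(self, replicas):
            # Fastest (lowest smoothed latency) first; unknown endpoints last.
            return sorted(replicas, key=lambda e: self.scores.get(e, float("inf")))

    tracker = LatencyTracker()
    tracker.record("10.0.0.1", 4.0)
    tracker.record("10.0.0.2", 30.0)   # e.g. busy compacting
    print(tracker.order_replicas(["10.0.0.2", "10.0.0.1"]))  # ['10.0.0.1', '10.0.0.2']

The point is it only reorders reads among replicas; it does not change which data a 
node has to store.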

I may be wrong on this, so anyone else feel free to jump in. Here are some 
issues to consider...

- Keyspace memory requirements are global: all nodes must have enough memory to 
support the CFs.
- During node moves, additions or deletions a node's token range may grow; nodes 
with less total space than others make this more complicated.
- During a write the mutation is sent to all replicas, so a weak node that is a 
replica for a strong and busy node will be asked to store data from the strong 
node.
- Read repair reads from all replicas.
- When strong nodes that replicate to a weak node are compacting or repairing, 
the dynamic snitch may order them lower than the weak node, potentially 
increasing read requests on the weak one.
- Downtime for a strong node (or a cluster partition) may result in increased 
read traffic to a weak node if all up replicas are needed to achieve the CL.
- Nodes store their own token range plus the token ranges of RF-1 other nodes 
(see the placement sketch after this list).
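
To make the last two points concrete, here's a toy SimpleStrategy-style placement 
sketch (made-up tokens and node names, not real config): a key is stored on the node 
that owns its token plus the next RF-1 nodes around the ring, so a weak node also 
ends up holding data owned by its stronger neighbours.

    from bisect import bisect_left

    # Toy ring: (token, node). A key belongs to the first node whose token is
    # >= the key's token (wrapping around), and is also replicated to the next
    # RF-1 nodes walking the ring.
    ring = [(0, "weak"), (42, "strongA"), (85, "strongB"), (128, "strongC")]

    def replicas_for(key_token, rf=3):
        tokens = [t for t, _ in ring]
        i = bisect_left(tokens, key_token) % len(ring)
        return [ring[(i + n) % len(ring)][1] for n in range(rf)]

    # A key owned by strongB is also written to strongC and to the weak node.
    print(replicas_for(60))   # ['strongB', 'strongC', 'weak']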

Overall, when a node goes down the other nodes need to be able to handle the 
potential extra load (connections, reads, storing hinted handoffs). If you have some 
weak and some strong nodes there is a chance of the weak nodes being overwhelmed, 
which may reduce the availability of your cluster.
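
A back-of-the-envelope example with made-up numbers (reads assumed spread evenly 
over the replica set at a low CL):

    # Made-up numbers: extra load on survivors when one of RF=3 replicas dies.
    reads_per_sec = 900    # reads across the replica set (assume even spread)
    writes_per_sec = 300   # writes to the replica set

    before = reads_per_sec / 3      # 300 reads/sec per replica
    after = reads_per_sec / 2       # 450 reads/sec per survivor (+50%)
    hinted = writes_per_sec         # writes for the down node accumulate as hints
    print(before, after, hinted)

A weak node may not have that headroom even if it copes fine in the steady state.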

Hope that helps.
Aaron

On 22/03/2011, at 10:54 PM, Daniel Doubleday <daniel.double...@gmx.net> wrote:

> 
> On Mar 22, 2011, at 5:09 AM, aaron morton wrote:
>> 1) You should use nodes with the same capacity (CPU, RAM, HDD), cassandra 
>> assumes they are all equal. 
> 
> Care to elaborate? While equal nodes will certainly make life easier, I would 
> have thought that the dynamic snitch would take care of performance differences 
> and that manual assignment of token ranges can yield any data distribution. 
> Obviously if a node has twice as much data it will probably get twice the 
> load. But if that is no problem ...
> 
> Where does cassandra assume that all are equal?  
> 
> Cheers Daniel
> 
> 
>> 
>> 2) Not sure exactly what would happen. Am guessing either the node would 
>> shut down or writes would eventually block, probably the former. If the node 
>> were up, read performance may suffer (if more writes were being sent in). 
>> If you really want to know more, let me know and I may find time to dig into 
>> it. 
>> 
>> Also a node is responsible for storing its token range and acting as a 
>> replica for other token ranges. So reducing the token range may not have a 
>> dramatic effect on the storage requirements. 
>> 
>> Hope that helps. 
>> Aaron
>> 
>> On 22 Mar 2011, at 09:50, Jonathan Colby wrote:
>> 
>>> 
>>> This is a two part question ...
>>> 
>>> 1. If you have cassandra nodes with different-sized hard disks, how do you 
>>> deal with assigning the token ring such that the nodes with larger disks 
>>> get more data? In other words, given equally distributed token ranges, 
>>> when the smaller-disk nodes run out of space, the larger-disk nodes will 
>>> still have unused capacity. Or is installing a mixed hardware cluster a 
>>> no-no?
>>> 
>>> 2. What happens when a cassandra node runs out of disk space for its data 
>>> files?  Does it continue serving the data while not accepting new data?  Or 
>>> does the node break and require manual intervention?
>>> 
>>> This info has eluded me elsewhere.
>>> Jon
>> 
> 
