My assumption comes from not seeing anything in the code that explicitly supports nodes of different specs (I also think I saw it stated somewhere ages ago). AFAIK the dynamic snitch is there to detect nodes with temporarily reduced throughput and try to reduce the read load on them.
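If it helps to see the idea, here is a very rough sketch of that kind of latency scoring. This is not the actual DynamicEndpointSnitch code, just an illustration; the node addresses and the ALPHA weight are made up. Each replica keeps a decaying latency score, and reads prefer the lowest-scoring replicas, so a temporarily slow node drifts to the back of the list.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LatencyScoringSketch {
    // weight given to the newest latency sample - an assumed value, not Cassandra's
    private static final double ALPHA = 0.75;

    private final Map<String, Double> scores = new HashMap<String, Double>();

    // record an observed read latency (ms) for a replica, decaying the old score
    public void update(String replica, double latencyMs) {
        Double old = scores.get(replica);
        scores.put(replica, old == null ? latencyMs : (1 - ALPHA) * old + ALPHA * latencyMs);
    }

    private double score(String replica) {
        Double s = scores.get(replica);
        return s == null ? 0.0 : s;
    }

    // order replicas so the lowest-scoring (fastest) ones are asked to read first
    public List<String> sortByScore(List<String> replicas) {
        List<String> sorted = new ArrayList<String>(replicas);
        Collections.sort(sorted, new Comparator<String>() {
            public int compare(String a, String b) {
                return Double.compare(score(a), score(b));
            }
        });
        return sorted;
    }

    public static void main(String[] args) {
        LatencyScoringSketch snitch = new LatencyScoringSketch();
        snitch.update("10.0.0.1", 2.0);
        snitch.update("10.0.0.2", 40.0);  // e.g. busy compacting, temporarily slow
        snitch.update("10.0.0.3", 3.0);
        System.out.println(snitch.sortByScore(Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3")));
        // prints [10.0.0.1, 10.0.0.3, 10.0.0.2] - the slow node is asked last
    }
}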
I may be wrong on this, so anyone else feel free to jump in. Here are some issues to consider...

- keyspace memory requirements are global; all nodes must have enough memory to support the CFs.
- during node moves, additions or deletions a node's token range may grow, and nodes with less total space than the others make this more complicated (there is a rough capacity-weighted token sketch at the bottom of this mail).
- during a write the mutation is sent to all replicas, so a weak node that is a replica for a strong and busy node will be asked to store the strong node's data.
- read repair reads from all replicas.
- when strong nodes that replicate to a weak node are compacting or repairing, the dynamic snitch may order them below the weak node, potentially increasing read requests on the weak one.
- down time for a strong node (or a cluster partition) may result in increased read traffic to a weak node if all up replicas are needed to achieve the CL.
- nodes store their own token range and the token ranges for RF-1 other nodes.

Overall, when a node goes down the other nodes need to be able to handle the potential extra load (connections, reads, storing HH). If you mix weak and strong nodes there is a chance of the weak nodes being overwhelmed, which may reduce the availability of your cluster.

Hope that helps.
Aaron

On 22/03/2011, at 10:54 PM, Daniel Doubleday <daniel.double...@gmx.net> wrote:

>
> On Mar 22, 2011, at 5:09 AM, aaron morton wrote:
>> 1) You should use nodes with the same capacity (CPU, RAM, HDD), cassandra assumes they are all equal.
>
> Care to elaborate? While equal nodes will certainly make life easier I would have thought that the dynamic snitch would take care of performance differences and manual assignment of token ranges can yield any data distribution. Obviously if a node has twice as much data it will probably get twice the load. But if that is no problem ...
>
> Where does cassandra assume that all are equal?
>
> Cheers Daniel
>
>
>>
>> 2) Not sure what exactly would happen. Am guessing either the node would shut down or writes would eventually block, probably the former. If the node was up, read performance may suffer (if there were more writes being sent in). If you really want to know more let me know and I may find time to dig into it.
>>
>> Also a node is responsible for storing its token range and acting as a replica for other token ranges. So reducing the token range may not have a dramatic effect on the storage requirements.
>>
>> Hope that helps.
>> Aaron
>>
>> On 22 Mar 2011, at 09:50, Jonathan Colby wrote:
>>
>>>
>>> This is a two part question ...
>>>
>>> 1. If you have cassandra nodes with different sized hard disks, how do you deal with assigning the token ring such that the nodes with larger disks get more data? In other words, given equally distributed token ranges, when the smaller disk nodes run out of space, the larger disk nodes will still have unused capacity. Or is installing a mixed hardware cluster a no-no?
>>>
>>> 2. What happens when a cassandra node runs out of disk space for its data files? Does it continue serving the data while not accepting new data? Or does the node break and require manual intervention?
>>>
>>> This info has eluded me elsewhere.
>>> Jon
>>
>
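PS - since the original question was about getting the larger-disk nodes to own more of the ring, here is the back-of-the-envelope sketch mentioned in the list above. It is not an official tool; the node names and disk sizes are invented, and it only controls each node's primary range. Replica placement, and all the issues above, are unchanged, which is where the mixed-hardware pain really comes from.

import java.math.BigInteger;

public class WeightedTokenSketch {
    // RandomPartitioner tokens live in [0, 2**127)
    private static final BigInteger RING = BigInteger.valueOf(2).pow(127);

    public static void main(String[] args) {
        // hypothetical node names and disk sizes, purely for illustration
        String[] nodes      = { "node1", "node2", "node3", "node4" };
        long[]   capacityGb = {    500,     500,    1000,    2000  };

        long total = 0;
        for (long c : capacityGb) total += c;

        // A node owns the ring from the previous token (exclusive) up to its own
        // token (inclusive), so we place each token at the running capacity total:
        // the slice ending at node i's token is proportional to node i's disk.
        long cumulative = 0;
        for (int i = 0; i < nodes.length; i++) {
            cumulative += capacityGb[i];
            BigInteger token = RING.multiply(BigInteger.valueOf(cumulative))
                                   .divide(BigInteger.valueOf(total))
                                   .mod(RING);  // the final token wraps around to 0
            System.out.println(nodes[i] + "  initial_token: " + token);
        }
    }
}

The printed values would go into each node's initial_token setting. Given the caveats above I would still lean towards keeping the nodes equal.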