On Fri, Dec 10, 2010 at 11:39 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> On Thu, Dec 9, 2010 at 10:40 PM, Bill de hÓra <b...@dehora.net> wrote:
>>
>>
>> On Tue, 2010-12-07 at 21:25 -0500, Edward Capriolo wrote:
>>
>> The idea behind "micrandra" is, on a 6-disk system, to run 6 instances
>> of Cassandra, one per disk. Use the RackAwareSnitch to make sure no
>> replicas live on the same node.
>>
>> The downsides
>> 1) we would have to manage 6x the instances of Cassandra
>> 2) we would have some overhead for each JVM.
>>
>> The upsides ?
>> 1) A disk/instance failure only degrades overall performance by
>> 1/6th (with RAID0 you lose the entire node; RAID5 still takes a hit
>> when down a disk)
>> 2) Moves and joins have less work to do
>> 3) Can scale up a single node by adding a single disk to an existing
>> system (assuming RAM and CPU load is light)
>> 4) OPP would be "easier" to balance out hot spots (maybe not on this
>> one in not an OPP)
>>
>> What does everyone think? Does it ever make sense to run this way?
>>
>> It might for read heavy loads.
>>
>> When I looked at this, it was pointed out to me it's simpler to run fewer
>> bigger coarser nodes and take the entire node/server out when something goes
>> wrong. Basically give each Cassandra a server.
>>
>> I wonder if it would be better to rethink compaction if that's what's
>> driving the idea. It seems to be what is biting everyone, along with GC.
>>
>> Bill
>
> Having 6 IPs on a machine would be a given in this setup. That is not
> an issue for me.
>
> It is not "biting" me. We all know that going from 10-20 nodes is
> pretty simple. However organic growth from 10-16, then a couple months
> later from 16 - 22, can take some effort with 300-600 GB per node,
> since each join and clean up can take a while. I am wondering if
> dividing a single large node into multiple smaller instances would
> make this type of growth easier.
>

To explain the scenario clearly: a 5-node cluster where each node owns
20% of the ring. Each node has 6 disks and ~200 GB of data.
Going to 10 nodes is easy: you can join each new node directly between
two existing nodes.

However, going from, say, 5 -> 8 gets dicey. Do you calculate the
ideal ring positions for 10 nodes?
20% | 20% | 10% | 10% | 10% | 10% | 10% | 10%
This results in three joins and several cleanups. With this choice you
save time, but you have to hope you do not reach the point where the
first two nodes get overloaded.
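
A rough sketch of where the 20/20/10... split comes from. It assumes
RandomPartitioner with a 0 .. 2**127 token space and evenly spaced
tokens; the helper names (ideal_tokens, ownership_percent) are made up
for illustration and are not part of Cassandra.

```python
RING = 2 ** 127  # assumption: RandomPartitioner token space


def ideal_tokens(n):
    """Evenly spaced tokens for an n-node ring."""
    return [i * RING // n for i in range(n)]


def ownership_percent(tokens):
    """Percent of the ring each token owns: the range from the
    previous token (wrapping around) up to the token itself."""
    ts = sorted(tokens)
    return [round(100.0 * ((t - ts[i - 1]) % RING) / RING, 1)
            for i, t in enumerate(ts)]


slots = ideal_tokens(10)
existing = slots[0::2]                   # the 5 original nodes
joined = [slots[1], slots[3], slots[5]]  # 3 new nodes, each bisecting a range

print(ownership_percent(existing))           # before the joins
print(ownership_percent(existing + joined))  # after the joins
```

Only the three bisected ranges shrink; the two ranges that did not get
a new neighbour stay at their original size.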

If you decide to work with the ideal tokens for 8, you have many moves
and joins, at least until we have:

https://issues.apache.org/jira/browse/CASSANDRA-1418
https://issues.apache.org/jira/browse/CASSANDRA-1427
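
To put a number on "many moves and joins", here is a minimal sketch
that compares the ideal tokens for 5 and 8 nodes. Same assumptions as
usual: RandomPartitioner, a 0 .. 2**127 token space, evenly spaced
tokens. ideal_tokens is a hypothetical helper, and the count assumes
every existing node that is not already on an 8-node slot gets moved.

```python
RING = 2 ** 127  # assumption: RandomPartitioner token space


def ideal_tokens(n):
    """Evenly spaced tokens for an n-node ring."""
    return [i * RING // n for i in range(n)]


current = set(ideal_tokens(5))
target = set(ideal_tokens(8))

stay = current & target             # tokens that are already ideal
moves = len(current) - len(stay)    # existing nodes that must move
joins = len(target) - len(current)  # new nodes that must join

print("stay:", len(stay), "moves:", moves, "joins:", joins)
```

Only the node at token 0 is already in the right place, so nearly
every existing node has to move in addition to the three joins.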

Having 6 smaller instances on a node with 6 disks would make it
easier to stay close to balanced without having to double your cluster
size each time you grow, or do a series of moves to get balanced
again.
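
Back-of-the-envelope arithmetic for why the smaller instances help,
using the numbers above (5 machines, ~200 GB each, growing to 8). This
treats the 200 GB as total on-disk load per machine and assumes
perfect balance; gb_per_instance is just an illustrative helper.

```python
def gb_per_instance(machines, gb_total, instances_per_machine):
    """Average data held by one Cassandra instance on a balanced ring."""
    return gb_total / (machines * instances_per_machine)


GB_TOTAL = 5 * 200  # 5 machines at ~200 GB each

for per_machine in (1, 6):
    before = gb_per_instance(5, GB_TOTAL, per_machine)
    after = gb_per_instance(8, GB_TOTAL, per_machine)
    print(f"{per_machine} instance(s)/machine: "
          f"{before:.1f} GB -> {after:.1f} GB per instance")
```

Each individual join or move then deals with tens of GB rather than
hundreds, at the cost of having more of them to run.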
