Interesting idea.

If running 6 instances is like dividing the entire load on the system
by 6, then with the same effective load we could use an SSD for the
commit volume and get away with 1 commitlog SSD shared by all
instances. Even if these 6 instances can only handle 80% of the load
(compared to 1 instance on this machine), that might be acceptable.
Could that help?
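
Concretely, I'm picturing per-instance configs along these lines (just
a sketch against a 0.7-style cassandra.yaml; the /mnt paths and the
instance layout are hypothetical):

  # instance 1 of 6, e.g. /etc/cassandra-1/cassandra.yaml
  data_file_directories:
      - /mnt/disk1/cassandra/data              # dedicated data disk
  commitlog_directory: /mnt/ssd0/commitlog/i1  # all 6 share one SSD
  # instance 2 would swap in /mnt/disk2 and .../commitlog/i2, etc.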

I mean, the benefits of smaller Cassandra nodes do sound very enticing.
Sure, we would probably have to throw more memory/CPU at it to get
performance comparable to 1 instance on that box (or reduce the load),
but it does look better than 6 boxes.
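
Each instance would also need its own address to gossip and listen on;
I would probably give the box six IP aliases rather than juggle ports
(again just a sketch, and the addresses are hypothetical):

  # instance 1
  listen_address: 10.0.0.11   # IP alias dedicated to this instance
  rpc_address: 10.0.0.11
  # instance 2 uses 10.0.0.12, etc., so all six can keep the default
  # storage_port (7000) and rpc_port (9160)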

On Tue, Dec 7, 2010 at 10:00 PM, Jonathan Ellis <jbel...@gmail.com> wrote:

> The major downside is you're going to want to let each instance have
> its own dedicated commitlog spindle too, unless you just don't have
> many updates.
>
> On Tue, Dec 7, 2010 at 8:25 PM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
> > I am quite ready to be stoned for this thread, but I have been
> > thinking about this for a while and I just wanted to bounce these
> > ideas off some gurus.
> >
> > Cassandra does allow multiple data directories, but as far as I can
> > tell no one runs in this configuration. This is something that is
> > very different between the HBase architecture and the Cassandra
> > architecture. HBase borrows the JBOD concept from Hadoop: it has
> > many smallish (~256 MB) regions managed with ZooKeeper, while
> > Cassandra has a few (1 per node) large, node-sized token ranges
> > managed by gossip consensus.
> >
> > Let's say a node has 6 300 GB disks. You have the options of RAID5,
> > RAID6, RAID10, or RAID0. The problem I have found with these
> > configurations is that major compactions (or even large minor ones)
> > can take a long time. Even if your disks are not heavily utilized,
> > that is a lot of data to move through. Thus node joins take a long
> > time, and node moves take a long time.
> >
> > The idea behind "micrandra" is, for a 6-disk system, to run 6
> > instances of Cassandra, one per disk, and use the RackAwareSnitch
> > to make sure no replicas live on the same physical node.
> >
> > The downsides:
> > 1) We would have to manage 6x the instances of Cassandra.
> > 2) We would have some overhead for each JVM.
> >
> > The upsides?
> > 1) A disk/instance failure only degrades overall performance by
> > 1/6th (with RAID0 you lose the entire node; RAID5 still takes a hit
> > when down a disk).
> > 2) Moves and joins have less work to do.
> > 3) You can scale up a single node by adding a single disk to an
> > existing system (assuming the RAM and CPU load are light).
> > 4) With OPP it would be "easier" to balance out hot spots (maybe
> > not; I am not an OPP user).
> >
> > What does everyone think? Does it ever make sense to run this way?
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>
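
P.S. On the snitch point: if the RackAwareSnitch's IP-octet inference
doesn't line up with the addressing, the PropertyFileSnitch in 0.7 (a
different snitch than the one Edward mentions) lets you spell out "one
physical box = one rack" so the strategy never puts two replicas on
the same box. A sketch of conf/cassandra-topology.properties, with
hypothetical addresses:

  # all six instances on physical box A share one "rack"
  10.0.0.11=DC1:boxA
  10.0.0.12=DC1:boxA
  10.0.0.13=DC1:boxA
  # box B's instances likewise
  10.0.0.21=DC1:boxB
  default=DC1:unknown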
