Re: Integrity of batch_insert and also what about sharding?

Benjamin Black Wed, 07 Apr 2010 16:08:28 -0700

On Wed, Apr 7, 2010 at 3:41 PM, banks <bankse...@gmail.com> wrote:
>
> 2. Each Cassandra node essentially has the same datastore as all other nodes,
> correct?


No.  The ReplicationFactor (RF) you set determines how many copies of
each piece of data the cluster stores.  If you have more nodes than
your RF, as is common, each node will hold only part of the data.  The
exact set of nodes to which a row is replicated is determined by its
row key, the placement strategy, and the node tokens.
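A rough sketch of how the row key and node tokens can pick replicas may help here. It assumes a RandomPartitioner-style MD5 token and a rack-unaware placement: the first replica goes to the node whose token is the first one at or after the key's token (wrapping around the ring), and the remaining RF-1 copies go to the next nodes clockwise. The functions `key_token` and `replicas_for` are hypothetical illustrations, and the token computation is simplified rather than Cassandra's exact code.

```python
import bisect
import hashlib


def key_token(row_key: str) -> int:
    # Hash the row key onto a 2**128 ring with MD5. Similar in spirit to
    # RandomPartitioner, but simplified (an assumption, not Cassandra's code).
    return int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16)


def replicas_for(row_key: str, node_tokens: dict, rf: int) -> list:
    """Return the nodes holding row_key: the first node whose token is >=
    the key's token (wrapping around), plus the next rf-1 nodes clockwise."""
    ring = sorted(node_tokens.items(), key=lambda item: item[1])
    tokens = [token for _, token in ring]
    # bisect_left finds the first token >= the key's token; past the end wraps to 0.
    start = bisect.bisect_left(tokens, key_token(row_key)) % len(ring)
    count = min(rf, len(ring))  # never list the same node twice
    return [ring[(start + i) % len(ring)][0] for i in range(count)]


# Six nodes with evenly spaced tokens (hypothetical cluster), RF=3:
nodes = {f"node{i}": i * 2**128 // 6 for i in range(6)}
print(replicas_for("user:42", nodes, rf=3))  # three adjacent nodes out of six
```

Each row lands on only RF of the six nodes, which is why adding nodes beyond RF spreads the data rather than copying all of it everywhere.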

> So if I've got 3 terabytes of data and 3 Cassandra nodes, I'm
> eating 9 TB on the SAN?  Are there provisions for essentially sharding across
> nodes, so that each node only handles a given key range?  If so, where is the
> how-to on that?
>

Sharding is a concept from databases that lack native replication and
so need a term for the functionality they bolt on.  Distributing data
among nodes by key range is how Cassandra always operates.
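The 9 TB figure in the question follows from RF rather than from the node count alone. A back-of-the-envelope sketch, assuming evenly balanced tokens and ignoring compaction headroom, indexes and other overhead (the `storage_estimate` helper is hypothetical): with 3 TB of data on 3 nodes, RF=3 puts a full copy on every node (9 TB in total), while RF=1 spreads roughly 1 TB onto each node.

```python
def storage_estimate(data_tb: float, nodes: int, rf: int) -> tuple:
    """Rough (per-node, cluster-total) storage in TB.

    Assumes balanced tokens and ignores compaction headroom and other
    overhead. This sketch requires 1 <= rf <= nodes."""
    if nodes < 1 or not 1 <= rf <= nodes:
        raise ValueError("need nodes >= 1 and 1 <= rf <= nodes")
    total = data_tb * rf  # every row is stored rf times across the cluster
    return total / nodes, total


print(storage_estimate(3.0, 3, 3))  # (3.0, 9.0): every node holds everything
print(storage_estimate(3.0, 3, 1))  # (1.0, 3.0): each node holds about a third
print(storage_estimate(3.0, 6, 3))  # (1.5, 9.0): more nodes, less per node
```

Total storage grows with RF, but per-node storage shrinks as nodes are added, which is the key-range distribution described above.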


b
