Re: Load balancing

Oleg Anastasjev Fri, 18 Jun 2010 01:03:52 -0700

Mubarak Seyed <seyed <at> apple.com> writes:

> 
> - How does client (application) connect to cassandra cluster? Is it always for
> one node (and thrift can get ring info) and send the request to connected node


This depends on the client library you use. Any cassandra node can accept client
connections and forward a request to the node that owns the requested data.

> - If we send 300k records from each node, it is a over kill for a node which
> accepts client connection, does node get choked?

Of course, in your situation no single node can handle the whole load, so you
have to connect to several nodes.
The best way, I believe, is to connect directly to the node that owns the data
you need. Take a look at org/apache/cassandra/client/RingCache.java for an
example of how to read the ring state and forward requests to the right node.
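
To illustrate the idea of picking the right node on the client side, here is a
minimal sketch in Python. It is not the RingCache API; the function name
owner_for, the ring list and the addresses are made up for illustration. It
assumes integer tokens and that each node owns the keys from just after the
previous node's token up to and including its own token, with wrap-around.

```python
import bisect

def owner_for(key_token, ring):
    """Return the address of the node assumed to own key_token.

    ring is a list of (token, address) pairs sorted by token. A node is
    assumed to own the range (previous token, its own token], and keys
    above the highest token wrap around to the first node.
    """
    tokens = [token for token, _ in ring]
    # First node whose token is >= key_token; wrap to index 0 past the end.
    index = bisect.bisect_left(tokens, key_token)
    return ring[index % len(ring)][1]

# Hypothetical 4-node ring over single-byte keys (0..255).
ring = [(0, "10.0.0.1"), (64, "10.0.0.2"), (128, "10.0.0.3"), (192, "10.0.0.4")]
```

A client would call something like owner_for(65, ring) and open its connection
to the returned address instead of always talking to one fixed node.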

> - How do we design a cassandra cluster to make sure that insert get
> distributed to more than one nodes?
> - If i prefer OrderPreservingPartition as a partitioner, how does single node
> handle all the 200k records?

If you prefer OPP, you have 2 ways (manual and automatic):
1. If you know the distribution of keys in your data, you can assign token values
to your nodes in a way that ensures uniform key distribution. Imagine you have
single-byte keys ranging from 0 to 255 and 64 nodes (for simplicity I assume
data is distributed uniformly across all keys). Each node should then cover
256 / 64 = 4 keys, so you have to manually configure <Token> in the
storage-conf of the 1st node to 0, 2nd = 4, 3rd = 8, 4th = 12 and so on.
2. The automatic way is to start the cassandra cluster with a small node count,
import data into it and then bootstrap the rest of the nodes, specifying
bootstrap=true and an empty value for the token in storage-conf. This way
cassandra will try to balance the data by itself.
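
The manual token assignment from option 1 can be computed rather than written
out by hand. A small Python sketch, assuming integer keys spread uniformly over
0..255; the function name even_tokens and its parameters are hypothetical, and
the resulting values still have to be put into each node's storage-conf by hand.

```python
def even_tokens(node_count, key_space=256):
    """Return one token per node, spaced evenly over 0..key_space-1.

    Assumes keys are uniformly distributed integers, as in the
    single-byte-key example above.
    """
    return [i * key_space // node_count for i in range(node_count)]

tokens = even_tokens(64)
for node_number, token in enumerate(tokens[:4], start=1):
    print(f"node {node_number}: token {token}")
```

With 64 nodes this gives tokens 0, 4, 8, 12 for the first four nodes and 252
for the last one. If your real keys are not uniform, these values will not
balance the load and you need to pick tokens from the actual key distribution.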


200k records are not a big deal for cassandra, IMHO, but of course this depends
on your hardware and the size of the records.

Anyway, it is a good idea to test your configuration with real data first.
