> about the decommission problem, here is the link:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-decommission-two-slow-nodes-td5078455.html

The key part of that post is "and since the second node was under heavy load, and not enough ram, it was busy GCing and worked horribly slow".
> maybe I was misunderstanding the replication factor, doesn't RF=3 mean I
> could lose two nodes and still have one available (with 100% of the keys),
> once Nodes>=3?

When you start losing replicas, the Consistency Level (CL) you use dictates whether the cluster is still up for 100% of the keys. See http://thelastpickle.com/2011/06/13/Down-For-Me/

> I have the strong willing to set RF to a very high value...

As Chris said, 3 is about normal; it means the QUORUM CL is only 2 nodes.

> I am also trying to deploy cassandra across two datacenters (with 20ms
> latency).

Look up LOCAL_QUORUM in the wiki.

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 9 Jul 2011, at 02:01, Chris Goffinet wrote:

> As mentioned by Aaron, yes we run hundreds of Cassandra nodes across multiple
> clusters. We run with RF of 2 and 3 (most common).
>
> We use commodity hardware and see failure all the time at this scale. We've
> never had 3 nodes in the same replica set fail all at once. We mitigate risk
> by being rack diverse, using different vendors for our hard drives, designing
> workflows to make sure machines get serviced in certain time windows, and
> running an extensive automated burn-in process (disk, memory, drives) so we
> do not roll out nodes/clusters that could fail right away.
>
> On Sat, Jul 9, 2011 at 12:17 AM, Yan Chunlu <springri...@gmail.com> wrote:
> thank you very much for the reply, which brings me more confidence in
> cassandra. I will try the automation tools; the examples you've listed seem
> quite promising!
>
> about the decommission problem, here is the link:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-decommission-two-slow-nodes-td5078455.html
>
> I am also trying to deploy cassandra across two datacenters (with 20ms
> latency), so I am worrying that the network latency will make it even worse.
> maybe I was misunderstanding the replication factor, doesn't RF=3 mean I
> could lose two nodes and still have one available (with 100% of the keys),
> once Nodes>=3? besides, I am not sure what twitter's setting for RF is, but
> it is possible to lose 3 nodes at the same time (facebook once encountered
> photo loss because their RAID broke, though that rarely happens). I have a
> strong urge to set RF to a very high value...
>
> Thanks!
>
> On Sat, Jul 9, 2011 at 5:22 AM, aaron morton <aa...@thelastpickle.com> wrote:
> AFAIK Facebook Cassandra and Apache Cassandra diverged paths a long time ago.
> Twitter is a vocal supporter with a large Apache Cassandra install, e.g.
> "Twitter currently runs a couple hundred Cassandra nodes across a half dozen
> clusters."
> http://www.datastax.com/2011/06/chris-goffinet-of-twitter-to-speak-at-cassandra-sf-2011
>
> If you are working with a 3 node cluster, removing/rebuilding/whatever on one
> node will affect 33% of your capacity. When you scale up, the contribution
> from each individual node goes down, and the impact of one node going down is
> less. Problems that happen with a few nodes will go away at scale, to be
> replaced by a whole set of new ones.
>
>> 1): the load balancing needs to be manually performed on every node,
>> according to:
>
> Yes
>
>> 2): when adding new nodes, need to perform node repair and cleanup on every
>> node
>
> You only need to run cleanup, see
> http://wiki.apache.org/cassandra/Operations#Bootstrap
>
>> 3) when decommissioning a node, there is a chance it slows down the entire
>> cluster. (not sure why, but I saw people asking around about it.) and the
>> only way to fix it is to shut down the entire cluster, rsync the data, and
>> start all nodes without the decommissioned one.
>
> I cannot remember any specific cases where decommission requires a full
> cluster stop, do you have a link?
> With regard to slowing down, the decommission process will stream data from
> the node you are removing onto the other nodes; this can slow down the
> target nodes (I think it's more intelligent now about what is moved). This
> will be exaggerated in a 3 node cluster, as you are removing 33% of the
> processing capacity and adding some (temporary) extra load to the remaining
> nodes.
>
>> after all, I think there is a lot of human work to do to maintain the
>> cluster, which makes it impossible to scale to thousands of nodes,
>
> Automation, Automation, Automation is the only way to go.
>
> Chef, Puppet, CF Engine for general config and deployment; Cloud Kick,
> munin, ganglia etc for monitoring. And Ops Centre
> (http://www.datastax.com/products/opscenter) for cassandra specific
> management.
>
>> I hope I am totally wrong about all of this. currently I am serving 1
>> million pv every day with Cassandra and it makes me feel unsafe; I am
>> afraid one day a node crash will corrupt the data and the whole cluster
>> will go wrong....
>
> With RF 3 and a 3 node cluster you have room to lose one node and the
> cluster will be up for 100% of the keys. While better than having to worry
> about *the* database server, it's still entry level fault tolerance. With
> RF 3 in a 6 node cluster you can lose up to 2 nodes and still be up for
> 100% of the keys.
>
> Is there something you are specifically concerned about with your current
> installation?
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 8 Jul 2011, at 08:50, Yan Chunlu wrote:
>
>> hi, all:
>> I am curious about how large Cassandra can scale.
>>
>> from the information I can get, the largest usage is at facebook, which is
>> about 150 nodes. in the meantime they are using 2000+ nodes with Hadoop,
>> and yahoo is even using 4000 nodes of Hadoop.
>> I do not understand why this is the situation; I only have a little
>> knowledge of Cassandra and no knowledge of Hadoop at all.
>>
>> currently I am using cassandra with 3 nodes and having problems bringing
>> one back after it went out of sync. the problems I encountered make me
>> worry about how cassandra can scale out:
>>
>> 1): the load balancing needs to be manually performed on every node,
>> according to:
>>
>> def tokens(nodes):
>>     for x in xrange(nodes):
>>         print 2 ** 127 / nodes * x
>>
>> 2): when adding new nodes, need to perform node repair and cleanup on
>> every node
>>
>> 3) when decommissioning a node, there is a chance it slows down the entire
>> cluster. (not sure why, but I saw people asking around about it.) and the
>> only way to fix it is to shut down the entire cluster, rsync the data, and
>> start all nodes without the decommissioned one.
>>
>> after all, I think there is a lot of human work to do to maintain the
>> cluster, which makes it impossible to scale to thousands of nodes, but I
>> hope I am totally wrong about all of this. currently I am serving 1
>> million pv every day with Cassandra and it makes me feel unsafe; I am
>> afraid one day a node crash will corrupt the data and the whole cluster
>> will go wrong....
>>
>> on the contrary, relational databases make me feel safe, but they do not
>> scale well.
>>
>> thanks for any guidance here.
>
> --
> Charles
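The replica arithmetic discussed in the thread (evenly spaced tokens, QUORUM sizes, how many replicas a cluster can lose) can be checked with a short script. This is a sketch in Python 3; the thread's own snippet is Python 2, and the helper names here are illustrative, not Cassandra APIs.

```python
# Illustrative helpers (not Cassandra APIs) for the arithmetic in this thread.

def tokens(nodes):
    """Evenly spaced initial tokens for the RandomPartitioner ring (0 .. 2**127),
    the same calculation as the Python 2 snippet quoted above."""
    return [2 ** 127 // nodes * x for x in range(nodes)]

def quorum(rf):
    """Nodes a QUORUM read/write must reach: a majority of the RF replicas."""
    return rf // 2 + 1

def tolerable_down(rf, cl_nodes):
    """Replicas of a key that can be down while that CL still succeeds."""
    return rf - cl_nodes

# RF 3: QUORUM is 2 nodes, so one replica of any key can be down
# and the cluster is still up for 100% of the keys at QUORUM.
print(tokens(3))                     # three evenly spaced ring positions
print(quorum(3))                     # 2
print(tolerable_down(3, quorum(3)))  # 1
```

Note that RF 2 gives a quorum of 2, i.e. zero tolerance for a down replica at QUORUM, which is why RF 3 is the common choice mentioned above.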