> about the decommission problem, here is the link:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-decommission-two-slow-nodes-td5078455.html

The key part of that post is "and since the second node was under heavy load, and not enough ram, it was busy GCing and worked horribly slow".
> maybe I was misunderstanding the replication factor, doesn't RF=3 mean I
> could lose two nodes and still have one available (with 100% of the keys),
> once Nodes>=3?

When you start losing replicas, the Consistency Level (CL) you use dictates whether the cluster is still up for 100% of the keys. See http://thelastpickle.com/2011/06/13/Down-For-Me/

> I have the strong willing to set RF to a very high value...

As Chris said, 3 is about normal; it means the QUORUM CL is only 2 nodes.

> I am also trying to deploy cassandra across two datacenters (with 20ms
> latency).

Look up LOCAL_QUORUM in the wiki.

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 9 Jul 2011, at 02:01, Chris Goffinet wrote:

> As mentioned by Aaron, yes we run hundreds of Cassandra nodes across multiple
> clusters. We run with RF of 2 and 3 (most common).
>
> We use commodity hardware and see failure all the time at this scale. We've
> never had 3 nodes in the same replica set fail all at once. We mitigate risk
> by being rack diverse, using different vendors for our hard drives, designing
> workflows to make sure machines get serviced in certain time windows, and
> running an extensive automated burn-in process (disk, memory, drives) so we
> do not roll out nodes/clusters that could fail right away.
>
> On Sat, Jul 9, 2011 at 12:17 AM, Yan Chunlu <springri...@gmail.com> wrote:
> thank you very much for the reply, which brings me more confidence in
> cassandra. I will try the automation tools; the examples you've listed seem
> quite promising!
>
> about the decommission problem, here is the link:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-decommission-two-slow-nodes-td5078455.html
>
> I am also trying to deploy cassandra across two datacenters (with 20ms
> latency), so I am worrying that the network latency will make it even worse.
> maybe I was misunderstanding the replication factor, doesn't RF=3 mean I
> could lose two nodes and still have one available (with 100% of the keys),
> once Nodes>=3? besides, I am not sure what twitter's setting for RF is, but
> it is possible to lose 3 nodes at the same time (facebook once encountered
> photo loss because their RAID broke, though that rarely happens). I have a
> strong urge to set RF to a very high value...
>
> Thanks!
>
> On Sat, Jul 9, 2011 at 5:22 AM, aaron morton <aa...@thelastpickle.com> wrote:
> AFAIK Facebook Cassandra and Apache Cassandra diverged paths a long time ago.
> Twitter is a vocal supporter with a large Apache Cassandra install, e.g.
> "Twitter currently runs a couple hundred Cassandra nodes across a half dozen
> clusters."
> http://www.datastax.com/2011/06/chris-goffinet-of-twitter-to-speak-at-cassandra-sf-2011
>
> If you are working with a 3 node cluster, removing/rebuilding/whatever on one
> node will affect 33% of your capacity. When you scale up, the contribution
> from each individual node goes down, and the impact of one node going down is
> less. Problems that happen with a few nodes will go away at scale, to be
> replaced by a whole set of new ones.
>
>> 1): the load balancing needs to be manually performed on every node,
>> according to:
>
> Yes
>
>> 2): when adding new nodes, need to perform node repair and cleanup on every
>> node
>
> You only need to run cleanup, see
> http://wiki.apache.org/cassandra/Operations#Bootstrap
>
>> 3) when decommissioning a node, there is a chance it slows down the entire
>> cluster. (not sure why, but I saw people asking around about it.) and the
>> only way to fix it is to shut down the entire cluster, rsync the data, and
>> start all nodes without the decommissioned one.
>
> I cannot remember any specific cases where decommission requires a full
> cluster stop, do you have a link?
> With regard to slowing down, the decommission process will stream data from
> the node you are removing onto the other nodes; this can slow down the
> target nodes (I think it's more intelligent now about what is moved). This
> will be exaggerated in a 3 node cluster, as you are removing 33% of the
> processing capacity and adding some (temporary) extra load to the remaining
> nodes.
>
>> after all, I think there is a lot of human work to do to maintain the
>> cluster, which makes it impossible to scale to thousands of nodes,
>
> Automation, Automation, Automation is the only way to go.
>
> Chef, Puppet, CF Engine for general config and deployment; Cloud Kick,
> munin, ganglia etc for monitoring. And Ops Centre
> (http://www.datastax.com/products/opscenter) for cassandra specific
> management.
>
>> I hope I am totally wrong about all of this. currently I am serving 1
>> million pv every day with Cassandra and it makes me feel unsafe; I am
>> afraid one day a node crash will corrupt the data and the whole cluster
>> will go wrong....
>
> With RF 3 and a 3 node cluster you have room to lose one node and the
> cluster will be up for 100% of the keys. While better than having to worry
> about *the* database server, it's still entry level fault tolerance. With
> RF 3 in a 6 node cluster you can lose up to 2 nodes and still be up for
> 100% of the keys.
>
> Is there something you are specifically concerned about with your current
> installation?
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 8 Jul 2011, at 08:50, Yan Chunlu wrote:
>
>> hi, all:
>> I am curious about how large Cassandra can scale.
>>
>> from the information I can get, the largest usage is at facebook, which is
>> about 150 nodes. in the meantime they are using 2000+ nodes with Hadoop,
>> and yahoo is even using 4000 nodes of Hadoop.
>> I do not understand why this is the situation; I only have a little
>> knowledge of Cassandra and no knowledge of Hadoop at all.
>>
>> currently I am using cassandra with 3 nodes and having problems bringing
>> one back after it went out of sync. the problems I encountered make me
>> worry about how cassandra can scale out:
>>
>> 1): the load balancing needs to be manually performed on every node,
>> according to:
>>
>> def tokens(nodes):
>>     for x in xrange(nodes):
>>         print 2 ** 127 / nodes * x
>>
>> 2): when adding new nodes, need to perform node repair and cleanup on
>> every node
>>
>> 3) when decommissioning a node, there is a chance it slows down the entire
>> cluster. (not sure why, but I saw people asking around about it.) and the
>> only way to fix it is to shut down the entire cluster, rsync the data, and
>> start all nodes without the decommissioned one.
>>
>> after all, I think there is a lot of human work to do to maintain the
>> cluster, which makes it impossible to scale to thousands of nodes, but I
>> hope I am totally wrong about all of this. currently I am serving 1
>> million pv every day with Cassandra and it makes me feel unsafe; I am
>> afraid one day a node crash will corrupt the data and the whole cluster
>> will go wrong....
>>
>> on the contrary, relational databases make me feel safe, but they do not
>> scale well.
>>
>> thanks for any guidance here.
>
> --
> Charles
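The replica arithmetic discussed in the thread (evenly spaced tokens, QUORUM sizes, how many replicas a cluster can lose) can be checked with a short script. This is a sketch in Python 3; the thread's own snippet is Python 2, and the helper names here are illustrative, not Cassandra APIs.

```python
# Illustrative helpers (not Cassandra APIs) for the arithmetic in this thread.

def tokens(nodes):
    """Evenly spaced initial tokens for the RandomPartitioner ring (0 .. 2**127),
    the same calculation as the Python 2 snippet quoted above."""
    return [2 ** 127 // nodes * x for x in range(nodes)]

def quorum(rf):
    """Nodes a QUORUM read/write must reach: a majority of the RF replicas."""
    return rf // 2 + 1

def tolerable_down(rf, cl_nodes):
    """Replicas of a key that can be down while that CL still succeeds."""
    return rf - cl_nodes

# RF 3: QUORUM is 2 nodes, so one replica of any key can be down
# and the cluster is still up for 100% of the keys at QUORUM.
print(tokens(3))                     # three evenly spaced ring positions
print(quorum(3))                     # 2
print(tolerable_down(3, quorum(3)))  # 1
```

Note that RF 2 gives a quorum of 2, i.e. zero tolerance for a down replica at QUORUM, which is why RF 3 is the common choice mentioned above.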