Thank you very much for the reply; it gives me more confidence in Cassandra. I will try the automation tools; the examples you've listed seem quite promising!
About the decommission problem, here is the link:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-decommission-two-slow-nodes-td5078455.html

I am also trying to deploy Cassandra across two datacenters (with 20ms latency between them), so I worry that the network latency will make the problem even worse.

Maybe I am misunderstanding the replication factor: doesn't RF=3 mean I could lose two nodes and still have one replica available (with 100% of the keys), as long as the cluster has at least 3 nodes? Also, I am not sure what Twitter's RF setting is, but it is possible to lose 3 nodes at the same time (Facebook once lost photos because their RAID broke, though that rarely happens). I am strongly tempted to set RF to a very high value...
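To check my own understanding of what RF buys me, here is a rough sketch. It is plain Python, not Cassandra code, and the consistency-level rules (ONE, QUORUM, ALL) are just my reading of the docs:

    # Rough sketch of my understanding: a key stays readable/writable as long
    # as enough of its RF replicas are up for the consistency level in use.

    def replicas_required(rf, level):
        """How many replicas must respond at the given consistency level."""
        if level == "ONE":
            return 1
        if level == "QUORUM":
            return rf // 2 + 1
        if level == "ALL":
            return rf
        raise ValueError("unknown consistency level: %r" % level)

    def key_available(rf, dead_replicas, level):
        """True if a key with `dead_replicas` of its replica nodes down
        can still be served at `level`."""
        return rf - dead_replicas >= replicas_required(rf, level)

    # RF=3, worst case: two of the three nodes holding this key are down.
    print(key_available(3, 2, "ONE"))     # True  -- one live replica is enough
    print(key_available(3, 2, "QUORUM"))  # False -- quorum needs 2 of 3
    print(key_available(3, 1, "QUORUM"))  # True  -- losing one node leaves 2

If that is right, with RF=3 a key survives the loss of two of its replica nodes only for reads/writes at ONE, not at QUORUM, which is probably the gap between my "lose two nodes" and your "room to lose one node". I also put a small token-moving sketch in a P.S. at the bottom of this mail.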
Thanks!

On Sat, Jul 9, 2011 at 5:22 AM, aaron morton <aa...@thelastpickle.com> wrote:

> AFAIK Facebook Cassandra and Apache Cassandra diverged paths a long time
> ago. Twitter is a vocal supporter with a large Apache Cassandra install,
> e.g. "Twitter currently runs a couple hundred Cassandra nodes across a
> half dozen clusters."
> http://www.datastax.com/2011/06/chris-goffinet-of-twitter-to-speak-at-cassandra-sf-2011
>
> If you are working with a 3 node cluster removing/rebuilding/what ever one
> node will effect 33% of your capacity. When you scale up the contribution
> from each individual node goes down, and the impact of one node going down
> is less. Problems that happen with a few nodes will go away at scale, to
> be replaced by a whole set of new ones.
>
> 1): the load balance need to manually performed on every node, according to:
>
> Yes
>
> 2): when adding new nodes, need to perform node repair and cleanup on
> every node
>
> You only need to run cleanup, see
> http://wiki.apache.org/cassandra/Operations#Bootstrap
>
> 3) when decommission a node, there is a chance that slow down the entire
> cluster. (not sure why but I saw people ask around about it.) and the only
> way to do is shutdown the entire the cluster, rsync the data, and start
> all nodes without the decommission one.
>
> I cannot remember any specific cases where decommission requires a full
> cluster stop, do you have a link? With regard to slowing down, the
> decommission process will stream data from the node you are removing onto
> the other nodes; this can slow down the target node (I think it's more
> intelligent now about what is moved). This will be exaggerated in a 3 node
> cluster as you are removing 33% of the processing and adding some
> (temporary) extra load to the remaining nodes.
>
> after all, I think there is alot of human work to do to maintain the
> cluster which make it impossible to scale to thousands of nodes,
>
> Automation, Automation, Automation is the only way to go.
>
> Chef, Puppet, CF Engine for general config and deployment; Cloud Kick,
> munin, ganglia etc for monitoring. And Ops Centre
> (http://www.datastax.com/products/opscenter) for cassandra specific
> management.
>
> I am totally wrong about all of this, currently I am serving 1 millions pv
> every day with Cassandra and it make me feel unsafe, I am afraid one day
> one node crash will cause the data broken and all cluster goes wrong....
>
> With RF3 and a 3 Node cluster you have room to lose one node and the
> cluster will be up for 100% of the keys. While better than having to worry
> about *the* database server, it's still entry level fault tolerance. With
> RF 3 in a 6 Node cluster you can lose up to 2 nodes and still be up for
> 100% of the keys.
>
> Is there something you are specifically concerned about with your current
> installation?
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 8 Jul 2011, at 08:50, Yan Chunlu wrote:
>
> hi, all:
> I am curious about how large that Cassandra can scale?
>
> from the information I can get, the largest usage is at facebook, which is
> about 150 nodes. in the mean time they are using 2000+ nodes with Hadoop,
> and yahoo even using 4000 nodes of Hadoop.
>
> I am not understand why is the situation, I only have little knowledge
> with Cassandra and even no knowledge with Hadoop.
>
> currently I am using cassandra with 3 nodes and having problem bring one
> back after it out of sync, the problems I encountered making me worry
> about how cassandra could scale out:
>
> 1): the load balance need to manually performed on every node, according to:
>
> def tokens(nodes):
>     for x in xrange(nodes):
>         print 2 ** 127 / nodes * x
>
> 2): when adding new nodes, need to perform node repair and cleanup on
> every node
>
> 3) when decommission a node, there is a chance that slow down the entire
> cluster. (not sure why but I saw people ask around about it.) and the only
> way to do is shutdown the entire the cluster, rsync the data, and start
> all nodes without the decommission one.
>
> after all, I think there is alot of human work to do to maintain the
> cluster which make it impossible to scale to thousands of nodes, but I
> hope I am totally wrong about all of this, currently I am serving 1
> millions pv every day with Cassandra and it make me feel unsafe, I am
> afraid one day one node crash will cause the data broken and all cluster
> goes wrong....
>
> in the contrary, relational database make me feel safety but it does not
> scale well.
>
> thanks for any guidance here.

-- Charles
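P.S. About the manual load balancing (point 1 in my original mail): here is a sketch of how I might script the token moves, extending the tokens() function quoted above. The host names are placeholders for my own nodes, it assumes RandomPartitioner's 2**127 token space, and I would run the moves one node at a time:

    # Sketch only: print the `nodetool move` command for an evenly balanced
    # ring. Hosts below are placeholders, not real machines.

    hosts = ["cass1.example.com", "cass2.example.com", "cass3.example.com"]

    def tokens(nodes):
        # Same formula as the tokens() script quoted above, as a list.
        return [2 ** 127 // nodes * x for x in range(nodes)]

    for host, token in zip(hosts, tokens(len(hosts))):
        # A move streams data, so run these one at a time; afterwards I
        # believe `nodetool cleanup` drops the ranges a node no longer owns.
        print("nodetool -h %s move %d" % (host, token))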