I missed the consistency level part, thanks very much for the explanation. That is clear enough.
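(The arithmetic behind that explanation can be sketched in a few lines of Python. This is only an illustration of the quorum math discussed below, not Cassandra code; the helper names are made up.)

```python
# Sketch of the consistency-level arithmetic: with replication factor RF,
# a QUORUM read or write must reach floor(RF/2) + 1 live replicas.

def quorum(rf: int) -> int:
    """Number of replicas a QUORUM operation must reach."""
    return rf // 2 + 1

def available_at_quorum(rf: int, live_replicas: int) -> bool:
    """True if a key is still readable/writable at CL QUORUM."""
    return live_replicas >= quorum(rf)

# RF=3: QUORUM is 2 nodes, so one replica can be down...
assert quorum(3) == 2
assert available_at_quorum(3, live_replicas=2)

# ...but losing two replicas takes the key down at QUORUM, even though
# one full copy of the data still exists on the surviving node.
assert not available_at_quorum(3, live_replicas=1)
```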
On Sun, Jul 10, 2011 at 7:57 AM, aaron morton <aa...@thelastpickle.com> wrote:

> about the decommission problem, here is the link:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-decommission-two-slow-nodes-td5078455.html
>
> The key part of that post is "and since the second node was under heavy
> load, and not enough ram, it was busy GCing and worked horribly slow".
>
> maybe I was misunderstanding the replication factor; doesn't RF=3 mean
> I could lose two nodes and still have one available (with 100% of the
> keys), once Nodes >= 3?
>
> When you start losing replicas, the CL you use dictates whether the
> cluster is still up for 100% of the keys. See
> http://thelastpickle.com/2011/06/13/Down-For-Me/
>
> I have a strong urge to set RF to a very high value...
>
> As Chris said, 3 is about normal; it means the QUORUM CL is only 2 nodes.
>
>> I am also trying to deploy cassandra across two datacenters (with 20ms
>> latency).
>
> Look up LOCAL_QUORUM in the wiki.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 9 Jul 2011, at 02:01, Chris Goffinet wrote:
>
> As mentioned by Aaron, yes, we run hundreds of Cassandra nodes across
> multiple clusters. We run with an RF of 2 and 3 (most common).
>
> We use commodity hardware and see failure all the time at this scale.
> We've never had 3 nodes in the same replica set fail all at once. We
> mitigate risk by being rack diverse, using different vendors for our
> hard drives, designing workflows to make sure machines get serviced in
> certain time windows, and running an extensive automated burn-in process
> (disk, memory, drives) so we don't roll out nodes/clusters that could
> fail right away.
>
> On Sat, Jul 9, 2011 at 12:17 AM, Yan Chunlu <springri...@gmail.com> wrote:
>
>> thank you very much for the reply, which gives me more confidence in
>> cassandra.
>> I will try the automation tools; the examples you've listed seem quite
>> promising!
>>
>> about the decommission problem, here is the link:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-decommission-two-slow-nodes-td5078455.html
>> I am also trying to deploy cassandra across two datacenters (with 20ms
>> latency), so I am worried that the network latency will make it even
>> worse.
>>
>> maybe I was misunderstanding the replication factor; doesn't RF=3 mean
>> I could lose two nodes and still have one available (with 100% of the
>> keys), once Nodes >= 3? besides, I am not sure what twitter's RF
>> setting is, but it is possible to lose 3 nodes at the same time
>> (facebook once lost photos because their RAID broke, though that rarely
>> happens). I have a strong urge to set RF to a very high value...
>>
>> Thanks!
>>
>> On Sat, Jul 9, 2011 at 5:22 AM, aaron morton <aa...@thelastpickle.com> wrote:
>>
>>> AFAIK Facebook Cassandra and Apache Cassandra diverged paths a long
>>> time ago. Twitter is a vocal supporter with a large Apache Cassandra
>>> install, e.g. "Twitter currently runs a couple hundred Cassandra nodes
>>> across a half dozen clusters."
>>> http://www.datastax.com/2011/06/chris-goffinet-of-twitter-to-speak-at-cassandra-sf-2011
>>>
>>> If you are working with a 3 node cluster, removing/rebuilding/whatever
>>> one node will affect 33% of your capacity. When you scale up, the
>>> contribution from each individual node goes down, and the impact of
>>> one node going down is less. Problems that happen with a few nodes
>>> will go away at scale, to be replaced by a whole set of new ones.
>>> 1): the load balance needs to be performed manually on every node,
>>> according to:
>>>
>>> Yes
>>>
>>> 2): when adding new nodes, need to perform node repair and cleanup on
>>> every node
>>>
>>> You only need to run cleanup, see
>>> http://wiki.apache.org/cassandra/Operations#Bootstrap
>>>
>>> 3) when decommissioning a node, there is a chance that it slows down
>>> the entire cluster (not sure why, but I saw people asking around about
>>> it), and the only fix is to shut down the entire cluster, rsync the
>>> data, and start all nodes without the decommissioned one.
>>>
>>> I cannot remember any specific case where decommission requires a full
>>> cluster stop, do you have a link? With regard to slowing down, the
>>> decommission process will stream data from the node you are removing
>>> onto the other nodes; this can slow down the target node (I think it's
>>> more intelligent now about what is moved). This will be exaggerated in
>>> a 3 node cluster, as you are removing 33% of the processing and adding
>>> some (temporary) extra load to the remaining nodes.
>>>
>>> after all, I think there is a lot of human work to do to maintain the
>>> cluster, which makes it impossible to scale to thousands of nodes,
>>>
>>> Automation, Automation, Automation is the only way to go.
>>>
>>> Chef, Puppet, CF Engine for general config and deployment; Cloud Kick,
>>> munin, ganglia etc. for monitoring. And Ops Centre
>>> (http://www.datastax.com/products/opscenter) for cassandra specific
>>> management.
>>>
>>> I hope I am totally wrong about all of this; currently I am serving 1
>>> million pv every day with Cassandra and it makes me feel unsafe. I am
>>> afraid that one day one node crash will break the data and the whole
>>> cluster will go wrong....
>>>
>>> With RF 3 and a 3 node cluster you have room to lose one node and the
>>> cluster will be up for 100% of the keys. While better than having to
>>> worry about *the* database server, it's still entry level fault
>>> tolerance.
>>> With RF 3 in a 6 node cluster you can lose up to 2 nodes and still be
>>> up for 100% of the keys.
>>>
>>> Is there something you are specifically concerned about with your
>>> current installation?
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 8 Jul 2011, at 08:50, Yan Chunlu wrote:
>>>
>>> hi, all:
>>> I am curious about how far Cassandra can scale.
>>>
>>> from the information I can get, the largest deployment is at facebook,
>>> which is about 150 nodes. meanwhile they are using 2000+ nodes with
>>> Hadoop, and yahoo is even using 4000 nodes with Hadoop.
>>>
>>> I do not understand why this is the case; I have only a little
>>> knowledge of Cassandra and no knowledge of Hadoop at all.
>>>
>>> currently I am running cassandra with 3 nodes and having problems
>>> bringing one back after it got out of sync. the problems I encountered
>>> make me worry about how cassandra could scale out:
>>>
>>> 1): the load balance needs to be performed manually on every node,
>>> according to:
>>>
>>> def tokens(nodes):
>>>     for x in xrange(nodes):
>>>         print 2 ** 127 / nodes * x
>>>
>>> 2): when adding new nodes, need to perform node repair and cleanup on
>>> every node
>>>
>>> 3) when decommissioning a node, there is a chance that it slows down
>>> the entire cluster (not sure why, but I saw people asking around about
>>> it), and the only fix is to shut down the entire cluster, rsync the
>>> data, and start all nodes without the decommissioned one.
>>>
>>> after all, I think there is a lot of human work to do to maintain the
>>> cluster, which makes it impossible to scale to thousands of nodes, but
>>> I hope I am totally wrong about all of this. currently I am serving 1
>>> million pv every day with Cassandra and it makes me feel unsafe; I am
>>> afraid that one day one node crash will break the data and the whole
>>> cluster will go wrong....
>>> on the contrary, relational databases make me feel safe, but they do
>>> not scale well.
>>>
>>> thanks for any guidance here.
>>>
>>
>> --
>> Charles
>
>
--
Charles
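(For anyone reading this thread later: the token function quoted above is Python 2. A minimal Python 3 version is sketched below; it computes evenly spaced initial tokens for the RandomPartitioner's 0 .. 2**127 ring. Only the arithmetic comes from the snippet in the thread; the list-returning shape is my own.)

```python
# Python 3 sketch of the quoted token-generation snippet. The original
# relied on Python 2's truncating integer division (/); // is the
# Python 3 equivalent.

def tokens(nodes: int) -> list[int]:
    """Evenly spaced initial tokens for a cluster of `nodes` nodes."""
    return [(2 ** 127 // nodes) * i for i in range(nodes)]

# Each node gets one value as its initial_token; for 3 nodes the ring
# is split into three equal ranges:
for t in tokens(3):
    print(t)
```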