Thank you very much for the reply; it gives me more confidence in
Cassandra.
I will try the automation tools, the examples you've listed seem quite
promising!


About the decommission problem, here is the link:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-decommission-two-slow-nodes-td5078455.html
I am also trying to deploy Cassandra across two datacenters (with 20ms
latency), so I am worried that the network latency will make things even
worse.

Maybe I am misunderstanding the replication factor: doesn't RF=3 mean I
could lose two nodes and still have one replica available (with 100% of the
keys), as long as Nodes >= 3? Besides, I am not sure what Twitter's RF
setting is, but it is possible to lose 3 nodes at the same time (Facebook
once lost photos because their RAID broke, though that rarely happens). I
have a strong urge to set RF to a very high value...
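
Just to check my own understanding, here is a rough sketch in plain Python
(the quorum size RF/2 + 1 is the standard formula, the rest is simple
arithmetic) of how many replica failures a single key can survive at
different consistency levels:

def replicas_required(rf, cl):
    # live replicas a request needs in order to succeed
    if cl == "ONE":
        return 1
    if cl == "QUORUM":
        return rf / 2 + 1
    if cl == "ALL":
        return rf

def tolerable_failures(rf, cl):
    # replicas of a given key that can be down while requests still succeed
    return rf - replicas_required(rf, cl)

for cl in ("ONE", "QUORUM", "ALL"):
    print "RF=3, CL=%s: can lose %d replica(s)" % (cl, tolerable_failures(3, cl))

So if I read that right, losing two of three nodes while still serving 100%
of the keys only holds when reading at CL ONE; at QUORUM it is one node.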

Thanks!


On Sat, Jul 9, 2011 at 5:22 AM, aaron morton <aa...@thelastpickle.com> wrote:

> AFAIK Facebook Cassandra and Apache Cassandra diverged paths a long time
> ago. Twitter is a vocal supporter with a large Apache Cassandra install,
> e.g. "Twitter currently runs a couple hundred Cassandra nodes across a half
> dozen clusters. "
> http://www.datastax.com/2011/06/chris-goffinet-of-twitter-to-speak-at-cassandra-sf-2011
>
>
>
> If you are working with a 3 node cluster, removing/rebuilding/whatever one
> node will affect 33% of your capacity. When you scale up, the contribution
> from each individual node goes down, and the impact of one node going down
> is less. Problems that happen with a few nodes will go away at scale, to be
> replaced by a whole set of new ones.
>
>
> 1): the load balancing needs to be performed manually on every node,
> according to:
>
> Yes
>
> 2): when adding new nodes, I need to perform node repair and cleanup on
> every node
>
> You only need to run cleanup; see
> http://wiki.apache.org/cassandra/Operations#Bootstrap
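>
> As a rough illustration of scripting that step (the host names below are
> placeholders, and it assumes nodetool is on the PATH), running cleanup
> across the ring is just a loop:
>
> import subprocess
>
> # placeholder node addresses - replace with your own
> NODES = ["cass1.example.com", "cass2.example.com", "cass3.example.com"]
>
> for host in NODES:
>     # nodetool cleanup drops data the node no longer owns after a ring change
>     subprocess.check_call(["nodetool", "-h", host, "cleanup"])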
>
> 3) when decommissioning a node, there is a chance it will slow down the
> entire cluster (I am not sure why, but I saw people asking about it), and
> the only fix is to shut down the entire cluster, rsync the data, and start
> all nodes without the decommissioned one.
>
> I cannot remember any specific cases where decommission requires a full
> cluster stop, do you have a link? With regard to slowing down, the
> decommission process will stream data from the node you are removing onto
> the other nodes; this can slow down the target node (I think it's more
> intelligent now about what is moved). This will be exaggerated in a 3 node
> cluster as you are removing 33% of the processing and adding some
> (temporary) extra load to the remaining nodes.
>
> after all, I think there is a lot of human work needed to maintain the
> cluster, which makes it impossible to scale to thousands of nodes,
>
> Automation, Automation, Automation is the only way to go.
>
> Chef, Puppet, CF Engine for general config and deployment; Cloud Kick,
> munin, ganglia etc. for monitoring. And Ops Centre
> (http://www.datastax.com/products/opscenter) for Cassandra-specific
> management.
>
> I hope I am totally wrong about all of this. Currently I am serving 1
> million page views every day with Cassandra and it makes me feel unsafe; I
> am afraid that one day a node crash will corrupt the data and the whole
> cluster will go wrong....
>
> With RF 3 and a 3 node cluster you have room to lose one node and the
> cluster will be up for 100% of the keys. While better than having to worry
> about *the* database server, it's still entry level fault tolerance. With
> RF 3 in a 6 node cluster you can lose up to 2 nodes and still be up for
> 100% of the keys.
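>
> A toy check of that 6 node claim (assuming SimpleStrategy style placement,
> where each token range is replicated to the next RF nodes around the ring):
>
> from itertools import combinations
>
> nodes = range(6)
> rf = 3
> # range i is replicated to nodes i, i+1, i+2 (mod 6)
> replicas = [set((i + j) % len(nodes) for j in range(rf)) for i in nodes]
>
> # any 2 node failure still leaves at least one live replica for every range
> for down in combinations(nodes, 2):
>     assert all(r - set(down) for r in replicas)
> print "every key keeps at least one live replica"
>
> Whether a request then succeeds still depends on the consistency level; a
> single live replica is only enough when reading or writing at CL ONE.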
>
> Is there something you are specifically concerned about with your current
> installation?
>
> Cheers
>
>   -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 8 Jul 2011, at 08:50, Yan Chunlu wrote:
>
> hi, all:
> I am curious about how far Cassandra can scale.
>
> From the information I can find, the largest deployment is at Facebook,
> which is about 150 nodes. Meanwhile they are using 2000+ nodes with Hadoop,
> and Yahoo is even using 4000 Hadoop nodes.
>
> I do not understand why that is the case; I only have a little knowledge
> of Cassandra and no knowledge of Hadoop at all.
>
>
>
> Currently I am using Cassandra with 3 nodes and having problems bringing
> one back after it fell out of sync. The problems I encountered make me
> worry about how Cassandra could scale out:
>
> 1): the load balancing needs to be performed manually on every node,
> according to:
>
> def tokens(nodes):
>     for x in xrange(nodes):
>         print 2 ** 127 / nodes * x
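>
> Running tokens(3), for example, prints the three evenly spaced initial
> tokens (assuming the RandomPartitioner, whose token range is 0 to 2**127):
>
> 0
> 56713727820156410577229101238628035242
> 113427455640312821154458202477256070484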
>
>
>
> 2): when adding new nodes, I need to perform node repair and cleanup on
> every node
>
>
>
> 3) when decommissioning a node, there is a chance it will slow down the
> entire cluster (I am not sure why, but I saw people asking about it), and
> the only fix is to shut down the entire cluster, rsync the data, and start
> all nodes without the decommissioned one.
>
>
>
>
>
> after all, I think there is a lot of human work needed to maintain the
> cluster, which makes it impossible to scale to thousands of nodes, but I
> hope I am totally wrong about all of this. Currently I am serving 1 million
> page views every day with Cassandra and it makes me feel unsafe; I am
> afraid that one day a node crash will corrupt the data and the whole
> cluster will go wrong....
>
>
>
> On the contrary, a relational database makes me feel safe, but it does
> not scale well.
>
>
>
> thanks for any guidance here.
>
>
>


-- 
Charles
