I missed the consistency level part, thanks very much for the explanation. That is clear enough.
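(The arithmetic behind that explanation can be sketched in a few lines of Python. This is only an illustration of the quorum math discussed below, not Cassandra code; the helper names are made up.)

```python
# Sketch of the consistency-level arithmetic: with replication factor RF,
# a QUORUM read or write must reach floor(RF/2) + 1 live replicas.

def quorum(rf: int) -> int:
    """Number of replicas a QUORUM operation must reach."""
    return rf // 2 + 1

def available_at_quorum(rf: int, live_replicas: int) -> bool:
    """True if a key is still readable/writable at CL QUORUM."""
    return live_replicas >= quorum(rf)

# RF=3: QUORUM is 2 nodes, so one replica can be down...
assert quorum(3) == 2
assert available_at_quorum(3, live_replicas=2)

# ...but losing two replicas takes the key down at QUORUM, even though
# one full copy of the data still exists on the surviving node.
assert not available_at_quorum(3, live_replicas=1)
```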
On Sun, Jul 10, 2011 at 7:57 AM, aaron morton <aa...@thelastpickle.com> wrote:

> about the decommission problem, here is the link:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-decommission-two-slow-nodes-td5078455.html
>
> The key part of that post is "and since the second node was under heavy
> load, and not enough ram, it was busy GCing and worked horribly slow".
>
> maybe I was misunderstanding the replication factor; doesn't RF=3 mean
> I could lose two nodes and still have one available (with 100% of the
> keys), once Nodes >= 3?
>
> When you start losing replicas, the CL you use dictates whether the
> cluster is still up for 100% of the keys. See
> http://thelastpickle.com/2011/06/13/Down-For-Me/
>
> I have a strong urge to set RF to a very high value...
>
> As Chris said, 3 is about normal; it means the QUORUM CL is only 2 nodes.
>
>> I am also trying to deploy cassandra across two datacenters (with 20ms
>> latency).
>
> Look up LOCAL_QUORUM in the wiki.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 9 Jul 2011, at 02:01, Chris Goffinet wrote:
>
> As mentioned by Aaron, yes, we run hundreds of Cassandra nodes across
> multiple clusters. We run with an RF of 2 and 3 (most common).
>
> We use commodity hardware and see failure all the time at this scale.
> We've never had 3 nodes in the same replica set fail all at once. We
> mitigate risk by being rack diverse, using different vendors for our
> hard drives, designing workflows to make sure machines get serviced in
> certain time windows, and running an extensive automated burn-in process
> (disk, memory, drives) so we don't roll out nodes/clusters that could
> fail right away.
>
> On Sat, Jul 9, 2011 at 12:17 AM, Yan Chunlu <springri...@gmail.com> wrote:
>
>> thank you very much for the reply, which gives me more confidence in
>> cassandra.
>> I will try the automation tools; the examples you've listed seem quite
>> promising!
>>
>> about the decommission problem, here is the link:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-decommission-two-slow-nodes-td5078455.html
>> I am also trying to deploy cassandra across two datacenters (with 20ms
>> latency), so I am worried that the network latency will make it even
>> worse.
>>
>> maybe I was misunderstanding the replication factor; doesn't RF=3 mean
>> I could lose two nodes and still have one available (with 100% of the
>> keys), once Nodes >= 3? besides, I am not sure what twitter's RF
>> setting is, but it is possible to lose 3 nodes at the same time
>> (facebook once lost photos because their RAID broke, though that rarely
>> happens). I have a strong urge to set RF to a very high value...
>>
>> Thanks!
>>
>> On Sat, Jul 9, 2011 at 5:22 AM, aaron morton <aa...@thelastpickle.com> wrote:
>>
>>> AFAIK Facebook Cassandra and Apache Cassandra diverged paths a long
>>> time ago. Twitter is a vocal supporter with a large Apache Cassandra
>>> install, e.g. "Twitter currently runs a couple hundred Cassandra nodes
>>> across a half dozen clusters."
>>> http://www.datastax.com/2011/06/chris-goffinet-of-twitter-to-speak-at-cassandra-sf-2011
>>>
>>> If you are working with a 3 node cluster, removing/rebuilding/whatever
>>> one node will affect 33% of your capacity. When you scale up, the
>>> contribution from each individual node goes down, and the impact of
>>> one node going down is less. Problems that happen with a few nodes
>>> will go away at scale, to be replaced by a whole set of new ones.
>>> 1): the load balance needs to be performed manually on every node,
>>> according to:
>>>
>>> Yes
>>>
>>> 2): when adding new nodes, need to perform node repair and cleanup on
>>> every node
>>>
>>> You only need to run cleanup, see
>>> http://wiki.apache.org/cassandra/Operations#Bootstrap
>>>
>>> 3) when decommissioning a node, there is a chance that it slows down
>>> the entire cluster (not sure why, but I saw people asking around about
>>> it), and the only fix is to shut down the entire cluster, rsync the
>>> data, and start all nodes without the decommissioned one.
>>>
>>> I cannot remember any specific case where decommission requires a full
>>> cluster stop, do you have a link? With regard to slowing down, the
>>> decommission process will stream data from the node you are removing
>>> onto the other nodes; this can slow down the target node (I think it's
>>> more intelligent now about what is moved). This will be exaggerated in
>>> a 3 node cluster, as you are removing 33% of the processing and adding
>>> some (temporary) extra load to the remaining nodes.
>>>
>>> after all, I think there is a lot of human work to do to maintain the
>>> cluster, which makes it impossible to scale to thousands of nodes,
>>>
>>> Automation, Automation, Automation is the only way to go.
>>>
>>> Chef, Puppet, CF Engine for general config and deployment; Cloud Kick,
>>> munin, ganglia etc. for monitoring. And Ops Centre
>>> (http://www.datastax.com/products/opscenter) for cassandra specific
>>> management.
>>>
>>> I hope I am totally wrong about all of this; currently I am serving 1
>>> million pv every day with Cassandra and it makes me feel unsafe. I am
>>> afraid that one day one node crash will break the data and the whole
>>> cluster will go wrong....
>>>
>>> With RF 3 and a 3 node cluster you have room to lose one node and the
>>> cluster will be up for 100% of the keys. While better than having to
>>> worry about *the* database server, it's still entry level fault
>>> tolerance.
>>> With RF 3 in a 6 node cluster you can lose up to 2 nodes and still be
>>> up for 100% of the keys.
>>>
>>> Is there something you are specifically concerned about with your
>>> current installation?
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 8 Jul 2011, at 08:50, Yan Chunlu wrote:
>>>
>>> hi, all:
>>> I am curious about how far Cassandra can scale.
>>>
>>> from the information I can get, the largest deployment is at facebook,
>>> which is about 150 nodes. meanwhile they are using 2000+ nodes with
>>> Hadoop, and yahoo is even using 4000 nodes with Hadoop.
>>>
>>> I do not understand why this is the case; I have only a little
>>> knowledge of Cassandra and no knowledge of Hadoop at all.
>>>
>>> currently I am running cassandra with 3 nodes and having problems
>>> bringing one back after it got out of sync. the problems I encountered
>>> make me worry about how cassandra could scale out:
>>>
>>> 1): the load balance needs to be performed manually on every node,
>>> according to:
>>>
>>> def tokens(nodes):
>>>     for x in xrange(nodes):
>>>         print 2 ** 127 / nodes * x
>>>
>>> 2): when adding new nodes, need to perform node repair and cleanup on
>>> every node
>>>
>>> 3) when decommissioning a node, there is a chance that it slows down
>>> the entire cluster (not sure why, but I saw people asking around about
>>> it), and the only fix is to shut down the entire cluster, rsync the
>>> data, and start all nodes without the decommissioned one.
>>>
>>> after all, I think there is a lot of human work to do to maintain the
>>> cluster, which makes it impossible to scale to thousands of nodes, but
>>> I hope I am totally wrong about all of this. currently I am serving 1
>>> million pv every day with Cassandra and it makes me feel unsafe; I am
>>> afraid that one day one node crash will break the data and the whole
>>> cluster will go wrong....
>>> on the contrary, relational databases make me feel safe, but they do
>>> not scale well.
>>>
>>> thanks for any guidance here.
>>>
>>
>> --
>> Charles
>
>
--
Charles
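(For anyone reading this thread later: the token function quoted above is Python 2. A minimal Python 3 version is sketched below; it computes evenly spaced initial tokens for the RandomPartitioner's 0 .. 2**127 ring. Only the arithmetic comes from the snippet in the thread; the list-returning shape is my own.)

```python
# Python 3 sketch of the quoted token-generation snippet. The original
# relied on Python 2's truncating integer division (/); // is the
# Python 3 equivalent.

def tokens(nodes: int) -> list[int]:
    """Evenly spaced initial tokens for a cluster of `nodes` nodes."""
    return [(2 ** 127 // nodes) * i for i in range(nodes)]

# Each node gets one value as its initial_token; for 3 nodes the ring
# is split into three equal ranges:
for t in tokens(3):
    print(t)
```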