Re: Is there a way to add a new node to a cluster but not sync old data?

Ryan Svihla Thu, 22 Jan 2015 13:54:07 -0800

Usually this is about tuning, and this isn't an uncommon situation for new
users.


Potential steps to take

1) Reduce stream throughput to a point your cluster can handle. This is
probably your most important tool. The default, depending on version, is
200 or 400 megabits per second; keep dropping it until the existing nodes
cope. I've had to go as low as 15 on all nodes to get a single node
bootstrapped. Use nodetool to change this setting at runtime:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsSetStreamThroughput.html

2) Scale up. If you run out of disk space on nodes and can't compact
anymore, add more disk and change where the data is stored (make sure
your new disk is fast enough to keep up). If the problem is load, add more
CPU and RAM.

3) Do some root cause analysis. I can't tell you how many of these issues
come down to bad JVM tuning or bad Cassandra settings.
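
A rough sketch of step 1, in case it helps. The Python below only builds
the `nodetool setstreamthroughput` commands for a step-down schedule and
prints them; it does not run anything. The host addresses, the halving
schedule, and the helper names are made up for illustration. Values are in
megabits per second, and 200 and 15 are simply the numbers mentioned above.

```python
import shlex


def step_down_schedule(start, floor):
    """Return throughput caps from `start` down to `floor`, halving each step.

    The halving rule is an assumption for illustration; any decreasing
    sequence that ends at a value the cluster tolerates would do.
    """
    if start <= 0 or floor <= 0:
        # nodetool treats 0 as "no throttling", which is the opposite of
        # what we want here, so refuse non-positive values.
        raise ValueError("throughput values must be positive")
    schedule = []
    value = start
    while value > floor:
        schedule.append(value)
        value //= 2
    schedule.append(floor)
    return schedule


def throttle_commands(hosts, megabits_per_sec):
    """Build one `nodetool setstreamthroughput` command per host."""
    if megabits_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return [
        ["nodetool", "-h", host, "setstreamthroughput", str(megabits_per_sec)]
        for host in hosts
    ]


if __name__ == "__main__":
    # Hypothetical node addresses; replace with your own.
    hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
    for cap in step_down_schedule(200, 15):
        print(f"# try {cap} megabits/s on every node")
        for cmd in throttle_commands(hosts, cap):
            print(shlex.join(cmd))
```

The idea is to apply one cap to all nodes, watch whether the existing nodes
recover, and only move to the next lower value if they don't.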

On Thu, Jan 22, 2015 at 7:50 AM, Kai Wang <dep...@gmail.com> wrote:

> In last year's summit there was a presentation from Instaclustr -
> https://www.instaclustr.com/meetups/presentation-by-ben-bromhead-at-cassandra-summit-2014-san-francisco/.
> It could be the solution you are looking for. However I don't see the code
> being checked in or JIRA being created. So for now you'd better plan the
> capacity carefully.
>
>
> On Wed, Jan 21, 2015 at 11:21 PM, Yatong Zhang <bluefl...@gmail.com>
> wrote:
>
>> Yes, my cluster is almost full and there are lots of pending tasks. You
>> helped me a lot and thank you Eric~
>>
>> On Thu, Jan 22, 2015 at 11:59 AM, Eric Stevens <migh...@gmail.com> wrote:
>>
>>> Yes, bootstrapping a new node will cause read loads on your existing
>>> nodes - it is becoming the owner and replica of a whole new set of existing
>>> data.  To do that it needs to know what data it's now responsible for, and
>>> that's what bootstrapping is for.
>>>
>>> If you're at the point where bootstrapping a new node is placing a
>>> too-heavy burden on your existing nodes, you may be dangerously close to or
>>> even past the tipping point where you ought to have already grown your
>>> cluster.  You need to grow your cluster as soon as possible, and chances
>>> are you're close to no longer being able to keep up with compaction (see
>>> nodetool compactionstats, make sure pending tasks is <5, preferably 0 or
>>> 1).  Once you're falling behind on compaction, it becomes difficult to
>>> successfully bootstrap new nodes, and you're in a very tough spot.
>>>
>>>
>>> On Wed, Jan 21, 2015 at 7:43 PM, Yatong Zhang <bluefl...@gmail.com>
>>> wrote:
>>>
>>>> Thanks for the reply. Bootstrapping the new node put a heavy burden on
>>>> the whole cluster and I don't know why. So that's the issue I actually
>>>> want to fix.
>>>>
>>>> On Mon, Jan 12, 2015 at 6:08 AM, Eric Stevens <migh...@gmail.com>
>>>> wrote:
>>>>
>>>>> Yes, but it won't do what I suspect you're hoping for.  If you disable
>>>>> auto_bootstrap in cassandra.yaml the node will join the cluster and will
>>>>> not stream any old data from existing nodes.
>>>>>
>>>>> The cluster will now be in an inconsistent state.  If you bring enough
>>>>> nodes online this way to violate your read consistency level (eg RF=3,
>>>>> CL=Quorum, if you bring on 2 nodes this way), some of your queries will be
>>>>> missing data that they ought to have returned.
>>>>>
>>>>> There is no way to bring a new node online and have it be responsible
>>>>> just for new data, and have no responsibility for old data.  It *will* be
>>>>> responsible for old data, it just won't *know* about the old data it
>>>>> should be responsible for.  Executing a repair will fix this, but only
>>>>> because the existing nodes will stream all the missing data to the new
>>>>> node.  This will create more pressure on your cluster than just normal
>>>>> bootstrapping would have.
>>>>>
>>>>> I can't think of any reason you'd want to do that unless you needed to
>>>>> grow your cluster really quickly, and were ok with corrupting your old 
>>>>> data.
>>>>>
>>>>> On Sat, Jan 10, 2015 at 12:39 AM, Yatong Zhang <bluefl...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> I am using C* 2.0.10 and was trying to add a new node to a cluster
>>>>>> (actually replacing a dead node). But after I added the new node, some
>>>>>> other nodes in the cluster came under very high load, which hurt the
>>>>>> performance of the whole cluster.
>>>>>> So I am wondering: is there a way to add a new node that only takes
>>>>>> new data?
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
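
The RF=3, CL=Quorum example quoted above can be checked with a little
arithmetic. This is only a sketch with hypothetical helper names: it
assumes `empty_replicas` counts replicas of the same token range that
joined without streaming (auto_bootstrap disabled), and that every other
replica of that range holds the old data.

```python
def quorum(replication_factor):
    """Number of replicas a QUORUM read or write must reach."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def quorum_read_can_miss_data(replication_factor, empty_replicas):
    """True if a QUORUM read could be answered only by empty replicas.

    That happens once the replicas that never streamed the old data are
    numerous enough to form a quorum on their own.
    """
    if not 0 <= empty_replicas <= replication_factor:
        raise ValueError("empty_replicas must be between 0 and RF")
    return empty_replicas >= quorum(replication_factor)


if __name__ == "__main__":
    for empty in range(4):
        print(f"RF=3, empty replicas={empty}: "
              f"can miss data = {quorum_read_can_miss_data(3, empty)}")
```

With RF=3 the quorum is 2, so one node joined this way still leaves every
quorum read touching a replica that has the data, while two such nodes for
the same range can answer a read between themselves and return nothing.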


-- 

Thanks,
Ryan Svihla
