Pending 0% just means there are no pending transfers; the cluster state is stable. If you've successfully tested the process on a test cluster, there's no reason it would behave differently in production.
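As a quick sanity check before and after the production change, something like the following (run from any cluster member) should confirm the ring has settled; this is only a sketch of the checks already used in this thread:

  riak-admin member-status   # remaining nodes 'valid', Pending column at 0.0%
  riak-admin transfers       # no partitions waiting to hand off
  riak-admin ring-status     # ring ready, no unreachable nodes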
On Friday, August 14, 2015, changmao wang <wang.chang...@gmail.com> wrote:

> During the last three days, I set up a development Riak cluster with five
> nodes and used "s3cmd" to upload 18GB of test data (roughly 20,000 files).
> After that, I made one node leave the cluster, then shut it down and marked
> it down, replaced its IP address, and joined it to the cluster again. The
> whole process was successful. However, I'm not sure whether or not it can
> be done in the production environment.
>
> I followed the docs below for the steps above:
> http://docs.basho.com/riak/latest/ops/running/nodes/renaming/
>
> After I ran "riak-admin cluster leave riak@'x.x.x.x'", "riak-admin cluster
> plan" and "riak-admin cluster commit", I checked the member-status. The
> main difference between leaving the cluster in production and in the
> development environment is shown below:
>
> root@cluster-s3-dev-hd1:~# riak-admin member-status
> ================================= Membership ==================================
> Status     Ring    Pending    Node
> -------------------------------------------------------------------------------
> leaving    18.8%      0.0%    'riak@10.21.236.185'
> valid      21.9%     25.0%    'riak@10.21.236.181'
> valid      21.9%     25.0%    'riak@10.21.236.182'
> valid      18.8%     25.0%    'riak@10.21.236.183'
> valid      18.8%     25.0%    'riak@10.21.236.184'
> -------------------------------------------------------------------------------
>
> Several minutes later, I checked the status again:
>
> root@cluster-s3-dev-hd1:~# riak-admin member-status
> ================================= Membership ==================================
> Status     Ring    Pending    Node
> -------------------------------------------------------------------------------
> leaving    12.5%      0.0%    'riak@10.21.236.185'
> valid      21.9%     25.0%    'riak@10.21.236.181'
> valid      28.1%     25.0%    'riak@10.21.236.182'
> valid      18.8%     25.0%    'riak@10.21.236.183'
> valid      18.8%     25.0%    'riak@10.21.236.184'
> -------------------------------------------------------------------------------
> Valid:4 / Leaving:1 / Exiting:0 / Joining:0 / Down:0
>
> After that, I shut down Riak with "riak stop" and marked it down from the
> active nodes.
> My question is: what is the meaning of "Pending 0.0%"?
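For reference, the development procedure described above boils down to something like this (a rough sketch of the same steps; the node name is the one from the output above, and the "down" step is run from a node that is still up):

  # from any cluster member
  riak-admin cluster leave 'riak@10.21.236.185'
  riak-admin cluster plan
  riak-admin cluster commit
  # wait for ownership handoff to finish, then on the leaving node:
  riak stop
  # and finally, from a node that is still running:
  riak-admin down 'riak@10.21.236.185'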
> On the production cluster, the status is as below:
> root@cluster1-hd12:/root/scripts# riak-admin transfers
> 'riak@10.21.136.94' waiting to handoff 5 partitions
> 'riak@10.21.136.93' waiting to handoff 5 partitions
> 'riak@10.21.136.92' waiting to handoff 5 partitions
> 'riak@10.21.136.91' waiting to handoff 5 partitions
> 'riak@10.21.136.86' waiting to handoff 5 partitions
> 'riak@10.21.136.81' waiting to handoff 2 partitions
> 'riak@10.21.136.76' waiting to handoff 3 partitions
> 'riak@10.21.136.71' waiting to handoff 5 partitions
> 'riak@10.21.136.66' waiting to handoff 5 partitions
>
> And there are active transfers. In the development environment there were
> no active transfers after I ran "riak-admin cluster commit".
> Can I follow the same steps on the production cluster as I did in the
> development environment?
>
> On Wed, Aug 12, 2015 at 10:39 PM, Dmitri Zagidulin <dzagidu...@basho.com> wrote:
>
>> Responses inline.
>>
>> On Tue, Aug 11, 2015 at 12:53 PM, changmao wang <wang.chang...@gmail.com> wrote:
>>
>>> 1. About backing up the four new nodes and then using 'riak-admin
>>> force-replace': what is the status of the newly added nodes?
>>> As you know, we want to replace one of the leaving nodes.
>>
>> I don't understand the question. Doing 'riak-admin force-replace' on one
>> of the nodes that's leaving should overwrite the leave request and tell it
>> to change its node id / ip address. (If that doesn't work, stop the leaving
>> node, and do a 'riak-admin reip' command instead.)
>>
>>> 2. What's the risk of 'riak-admin force-remove' on 'riak@10.21.136.91'
>>> without a backup? As you know, that node is currently a member of the
>>> cluster and holds almost 2.5TB of data, maybe 10 percent of the whole
>>> cluster.
>>
>> The only reason I asked about backup is because it sounded like you
>> cleared the disk on it. If it currently has the data, then it'll be fine.
>> Force-remove just changes the IP address, and doesn't delete the data or
>> anything.
>>
>> On Tue, Aug 11, 2015 at 7:32 PM, Dmitri Zagidulin <dzagidu...@basho.com> wrote:
>>
>>> 1. How to force leave the "leaving" nodes without data loss?
>>>
>>> This depends on whether you backed up the data directory of the 4 new
>>> nodes before you reformatted them.
>>> If you backed them up (and then restored the data directory once you
>>> reformatted them), you can try:
>>>
>>> riak-admin force-replace 'riak@10.21.136.91' 'riak@<whatever your new ip address is for that node>'
>>> (same for the other 3)
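If force-replace is not accepted for a leaving node, the 'riak-admin reip' route mentioned above looks roughly like this; it has to be run while the node is stopped, and both node names are placeholders:

  # on the node being renamed
  riak stop
  # (set the new -name in /etc/riak/vm.args before restarting)
  riak-admin reip 'riak@<old ip address>' 'riak@<new ip address>'
  riak start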
>>> If you did not back up those nodes, the only thing you can do is force
>>> them to leave. So, for each of the 4:
>>>
>>> riak-admin force-remove 'riak@10.21.136.91' 'riak@10.21.136.66'
>>> (same for the other 3)
>>>
>>> In either case, after force-replacing or force-removing, you have to
>>> join the new nodes to the cluster before you commit:
>>>
>>> riak-admin join 'riak@<new node>' 'riak@10.21.136.66'
>>> (same for the other 3)
>>> and finally:
>>> riak-admin cluster plan
>>> riak-admin cluster commit
>>>
>>> As for the error: the reason you're seeing it is that the other nodes
>>> can't contact the 4 that are supposed to be leaving (since you wiped
>>> them).
>>> The amount of time that has passed doesn't matter; the cluster will wait
>>> for those nodes to leave indefinitely unless you force-remove or
>>> force-replace.
>>>
>>> On Tue, Aug 11, 2015 at 1:32 AM, changmao wang <wang.chang...@gmail.com> wrote:
>>>
>>>> Hi Dmitri,
>>>>
>>>> For your question:
>>>> 3) Re-formatted those four nodes and re-installed Riak. Here is where
>>>> it gets tricky though. Several questions for you:
>>>> - Did you attempt to re-join those 4 reinstalled nodes into the
>>>> cluster? What was the output of the cluster join and cluster plan commands?
>>>> - Did the IP address change after they were reformatted? If so, you
>>>> probably need to use something like 'reip' at this point:
>>>> http://docs.basho.com/riak/latest/ops/running/tools/riak-admin/#reip
>>>>
>>>> I did NOT try to re-join those 4 reinstalled nodes into the cluster.
>>>> As you know, member-status shows they are "leaving", as below:
>>>>
>>>> riak-admin member-status
>>>> ================================= Membership ==================================
>>>> Status     Ring    Pending    Node
>>>> -------------------------------------------------------------------------------
>>>> leaving    10.9%     10.9%    'riak@10.21.136.91'
>>>> leaving     9.4%     10.9%    'riak@10.21.136.92'
>>>> leaving     7.8%     10.9%    'riak@10.21.136.93'
>>>> leaving     7.8%     10.9%    'riak@10.21.136.94'
>>>> valid      10.9%     10.9%    'riak@10.21.136.66'
>>>> valid      10.9%     10.9%    'riak@10.21.136.71'
>>>> valid      14.1%     10.9%    'riak@10.21.136.76'
>>>> valid      17.2%     12.5%    'riak@10.21.136.81'
>>>> valid      10.9%     10.9%    'riak@10.21.136.86'
>>>> -------------------------------------------------------------------------------
>>>> Valid:5 / Leaving:4 / Exiting:0 / Joining:0 / Down:0
>>>>
>>>> Two weeks have elapsed and 'riak-admin member-status' still shows the
>>>> same result. I don't know at which step the ring hands off.
>>>>
>>>> I did not change the IP addresses of the four newly added nodes.
>>>>
>>>> My questions:
>>>>
>>>> 1. How do I force the "leaving" nodes to leave without data loss?
>>>> 2. I have found some errors related to handoff of partitions in
>>>> /etc/riak/log/errors. Details are as below:
>>>>
>>>> 2015-07-30 16:04:33.643 [error] <0.12872.15>@riak_core_handoff_sender:start_fold:262
>>>> ownership_transfer transfer of riak_kv_vnode from 'riak@10.21.136.76'
>>>> 45671926166590716193865151022383844364247891968 to 'riak@10.21.136.93'
>>>> 45671926166590716193865151022383844364247891968 failed because of enotconn
>>>> 2015-07-30 16:04:33.643 [error] <0.197.0>@riak_core_handoff_manager:handle_info:289
>>>> An outbound handoff of partition riak_kv_vnode
>>>> 45671926166590716193865151022383844364247891968 was terminated for
>>>> reason: {shutdown,{error,enotconn}}
>>>>
>>>> I searched for this with Google and found related threads, but no
>>>> solution:
>>>> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2014-October/016052.html
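As explained in the reply quoted below, the enotconn failures mean the sender cannot hold a handoff connection to the receiving node. A minimal way to check connectivity, assuming the default handoff port of 8099 (set by handoff_port in the riak_core section of app.config):

  # from the node reporting the failed handoff, e.g. 10.21.136.76
  riak-admin ring-status          # shows whether any nodes are unreachable
  nc -zv 10.21.136.93 8099        # TCP check against the receiver's handoff port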
>>>> On Mon, Aug 10, 2015 at 10:09 PM, Dmitri Zagidulin <dzagidu...@basho.com> wrote:
>>>>
>>>>> Hi Changmao,
>>>>>
>>>>> The state of the cluster can be determined by running 'riak-admin
>>>>> member-status' and 'riak-admin ring-status'.
>>>>> If I understand the sequence of events, you:
>>>>> 1) Joined four new nodes to the cluster (which crashed due to not
>>>>> enough disk space).
>>>>> 2) Removed them from the cluster via 'riak-admin cluster leave'. This
>>>>> is a "planned remove" command, and it expects the nodes to gradually
>>>>> hand off their partitions (to transfer ownership) before actually
>>>>> leaving. So this is probably the main problem - the ring is stuck
>>>>> waiting for those nodes to properly hand off.
>>>>> 3) Re-formatted those four nodes and re-installed Riak. Here is where
>>>>> it gets tricky though. Several questions for you:
>>>>> - Did you attempt to re-join those 4 reinstalled nodes into the
>>>>> cluster? What was the output of the cluster join and cluster plan commands?
>>>>> - Did the IP address change after they were reformatted? If so, you
>>>>> probably need to use something like 'reip' at this point:
>>>>> http://docs.basho.com/riak/latest/ops/running/tools/riak-admin/#reip
>>>>>
>>>>> The 'failed because of enotconn' error message is happening because
>>>>> the cluster is waiting to hand off partitions to .94, but cannot
>>>>> connect to it.
>>>>>
>>>>> Anyway, here's what I recommend. If you can lose the data, it's
>>>>> probably easier to format and reinstall the whole cluster.
>>>>> If not, you can 'force-remove' those four nodes, one by one (see
>>>>> http://docs.basho.com/riak/latest/ops/running/cluster-admin/#force-remove).
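Pulling the advice in this thread together, the recovery path looks roughly like the following; node names are placeholders, the cluster-style syntax is assumed for 1.4.x, and force-replace expects the replacement node to have been started and joined first:

  # from any running member, for each of the four stuck nodes
  riak-admin cluster force-replace 'riak@10.21.136.91' 'riak@<replacement node>'   # data directory preserved
  # or, if the data on the old node is gone:
  riak-admin cluster force-remove 'riak@10.21.136.91'
  # on each replacement node, join it to an existing member
  riak-admin cluster join 'riak@10.21.136.66'
  # then review and commit the staged changes from any member
  riak-admin cluster plan
  riak-admin cluster commit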
>>>>> On Thu, Aug 6, 2015 at 11:55 PM, changmao wang <wang.chang...@gmail.com> wrote:
>>>>>
>>>>>> Dmitri,
>>>>>>
>>>>>> Thanks for your quick reply. My questions are as below:
>>>>>> 1. What's the current status of the whole cluster? Is it still
>>>>>> rebalancing data?
>>>>>> 2. There are many errors in one node's error log. How should I
>>>>>> handle them?
>>>>>>
>>>>>> 2015-08-05 01:38:59.717 [error] <0.23000.298>@riak_core_handoff_sender:start_fold:262
>>>>>> ownership_transfer transfer of riak_kv_vnode from 'riak@10.21.136.81'
>>>>>> 525227150915793236229449236757414210188850757632 to 'riak@10.21.136.94'
>>>>>> 525227150915793236229449236757414210188850757632 failed because of enotconn
>>>>>> 2015-08-05 01:38:59.718 [error] <0.195.0>@riak_core_handoff_manager:handle_info:289
>>>>>> An outbound handoff of partition riak_kv_vnode
>>>>>> 525227150915793236229449236757414210188850757632 was terminated for
>>>>>> reason: {shutdown,{error,enotconn}}
>>>>>>
>>>>>> During the last 5 days there has been no change in the "riak-admin
>>>>>> member-status" output.
>>>>>> 3. How can I accelerate the data rebalancing?
>>>>>>
>>>>>> On Fri, Aug 7, 2015 at 6:41 AM, Dmitri Zagidulin <dzagidu...@basho.com> wrote:
>>>>>>
>>>>>>> Ok, I think I understand so far. So what's the question?
>>>>>>>
>>>>>>> On Thursday, August 6, 2015, Changmao.Wang <changmao.w...@datayes.com> wrote:
>>>>>>>
>>>>>>>> Hi Riak users,
>>>>>>>>
>>>>>>>> Before adding the new nodes, the cluster had only five nodes. The
>>>>>>>> member list was:
>>>>>>>> 10.21.136.66, 10.21.136.71, 10.21.136.76, 10.21.136.81, 10.21.136.86.
>>>>>>>> We did not set up an HTTP proxy for the cluster; only one node of
>>>>>>>> the cluster provides the HTTP service, so the CPU load is always
>>>>>>>> high on that node.
>>>>>>>>
>>>>>>>> After that, I added four nodes (10.21.136.[91-94]) to the cluster.
>>>>>>>> During the ring/data rebalancing, each of them failed (riak
>>>>>>>> stopped) because a disk hit 100% full.
>>>>>>>> I had used a multi-disk path for the "data_root" parameter in
>>>>>>>> '/etc/riak/app.config'. Each disk is only 580GB in size.
>>>>>>>> As you know, the bitcask storage engine does not support multiple
>>>>>>>> data paths. After one of the disks is 100% full, it cannot switch
>>>>>>>> to the next idle disk, so the "riak" service goes down.
>>>>>>>>
>>>>>>>> After that, I removed the four newly added nodes from the active
>>>>>>>> nodes with "riak-admin cluster leave riak@'10.21.136.91'",
>>>>>>>> then stopped the "riak" service on the other new nodes and
>>>>>>>> reformatted them with LVM (binding the 6 disks into one volume
>>>>>>>> group).
>>>>>>>> I replaced the "data_root" parameter with a single folder and then
>>>>>>>> started the "riak" service again. After that, the cluster began
>>>>>>>> rebalancing data again.
>>>>>>>> That's the whole story.
>>>>>>>>
>>>>>>>> Amao
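Since bitcask takes a single data_root, consolidating the small disks into one logical volume, as described above, is the usual workaround. A minimal sketch, with device names, volume names and the mount point as placeholders:

  # combine the six small disks into one volume group and a single logical volume
  pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
  vgcreate riak_vg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
  lvcreate -l 100%FREE -n riak_lv riak_vg
  mkfs.ext4 /dev/riak_vg/riak_lv
  mount /dev/riak_vg/riak_lv /var/lib/riak/bitcask
  # then point bitcask at the single mount in /etc/riak/app.config:
  #   {bitcask, [{data_root, "/var/lib/riak/bitcask"}, ...]}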
>>>>>>>> ------------------------------
>>>>>>>> From: "Dmitri Zagidulin" <dzagidu...@basho.com>
>>>>>>>> To: "Changmao.Wang" <changmao.w...@datayes.com>
>>>>>>>> Sent: Thursday, August 6, 2015 10:46:59 PM
>>>>>>>> Subject: Re: why leaving riak cluster so slowly and how to accelerate the speed
>>>>>>>>
>>>>>>>> Hi Amao,
>>>>>>>>
>>>>>>>> Can you explain a bit more which steps you've taken, and what the
>>>>>>>> problem is?
>>>>>>>>
>>>>>>>> Which nodes have been added, and which nodes are leaving the
>>>>>>>> cluster?
>>>>>>>>
>>>>>>>> On Tue, Jul 28, 2015 at 11:03 PM, Changmao.Wang <changmao.w...@datayes.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Riak user group,
>>>>>>>>>
>>>>>>>>> I'm using Riak and Riak CS 1.4.2. Last weekend, I added four
>>>>>>>>> nodes to a cluster of 5 nodes. However, it failed with one of the
>>>>>>>>> disks 100% full.
>>>>>>>>> As you know, the bitcask storage engine cannot span multiple
>>>>>>>>> folders.
>>>>>>>>>
>>>>>>>>> After that, I restarted "riak" and made the nodes leave the
>>>>>>>>> cluster with "riak-admin cluster leave" and "riak-admin cluster
>>>>>>>>> plan", followed by "riak-admin cluster commit".
>>>>>>>>> However, Riak has been rebalancing KV data ever since I committed
>>>>>>>>> the leave. I guess it is still working through the earlier join.
>>>>>>>>>
>>>>>>>>> Could you show us how to accelerate the leaving process? I have
>>>>>>>>> already tuned the "transfer-limit" parameter on all 9 nodes.
>>>>>>>>>
>>>>>>>>> Below is the output of some commands:
>>>>>>>>>
>>>>>>>>> riak-admin member-status
>>>>>>>>> ================================= Membership ==================================
>>>>>>>>> Status     Ring    Pending    Node
>>>>>>>>> -------------------------------------------------------------------------------
>>>>>>>>> leaving     6.3%     10.9%    'riak@10.21.136.91'
>>>>>>>>> leaving     9.4%     10.9%    'riak@10.21.136.92'
>>>>>>>>> leaving     6.3%     10.9%    'riak@10.21.136.93'
>>>>>>>>> leaving     6.3%     10.9%    'riak@10.21.136.94'
>>>>>>>>> valid      10.9%     10.9%    'riak@10.21.136.66'
>>>>>>>>> valid      12.5%     10.9%    'riak@10.21.136.71'
>>>>>>>>> valid      18.8%     10.9%    'riak@10.21.136.76'
>>>>>>>>> valid      18.8%     12.5%    'riak@10.21.136.81'
>>>>>>>>> valid      10.9%     10.9%    'riak@10.21.136.86'
>>>>>>>>>
>>>>>>>>> riak-admin transfer_limit
>>>>>>>>> =============================== Transfer Limit ================================
>>>>>>>>> Limit    Node
>>>>>>>>> -------------------------------------------------------------------------------
>>>>>>>>>   200    'riak@10.21.136.66'
>>>>>>>>>   200    'riak@10.21.136.71'
>>>>>>>>>   100    'riak@10.21.136.76'
>>>>>>>>>   100    'riak@10.21.136.81'
>>>>>>>>>   200    'riak@10.21.136.86'
>>>>>>>>>   500    'riak@10.21.136.91'
>>>>>>>>>   500    'riak@10.21.136.92'
>>>>>>>>>   500    'riak@10.21.136.93'
>>>>>>>>>   500    'riak@10.21.136.94'
>>>>>>>>>
>>>>>>>>> Any more details needed for diagnosing the problem?
>>>>>>>>>
>>>>>>>>> Amao
>
> --
> Amao Wang
> Best & Regards
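On the question of speeding up handoff: the per-node concurrency can be changed at runtime with the same command that produced the table above. A sketch, with the limit value of 8 purely as an example:

  riak-admin transfer_limit                         # show current limits
  riak-admin transfer_limit 'riak@10.21.136.76' 8   # change the limit on one node
  riak-admin transfer_limit 8                       # or set it cluster-wide
  riak-admin transfers                              # watch handoff progress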
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com