Over the last three days, I set up a development Riak cluster with five nodes and used "s3cmd" to upload 18GB of test data (maybe 20,000 files). After that, I had one node leave the cluster, then shut it down and marked it down, replaced its IP address, and joined it to the cluster again. The whole process was successful. However, I'm not sure whether or not the same can be done in a production environment.
I followed the docs below for the above steps: http://docs.basho.com/riak/latest/ops/running/nodes/renaming/

After I ran "riak-admin cluster leave riak@'x.x.x.x'", "riak-admin cluster plan", and "riak-admin cluster commit", I checked member-status. The main difference between leaving the cluster on production and on development is as below:

root@cluster-s3-dev-hd1:~# riak-admin member-status
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
leaving    18.8%      0.0%    'riak@10.21.236.185'
valid      21.9%     25.0%    'riak@10.21.236.181'
valid      21.9%     25.0%    'riak@10.21.236.182'
valid      18.8%     25.0%    'riak@10.21.236.183'
valid      18.8%     25.0%    'riak@10.21.236.184'
-------------------------------------------------------------------------------

Several minutes later, I checked the status again:

root@cluster-s3-dev-hd1:~# riak-admin member-status
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
leaving    12.5%      0.0%    'riak@10.21.236.185'
valid      21.9%     25.0%    'riak@10.21.236.181'
valid      28.1%     25.0%    'riak@10.21.236.182'
valid      18.8%     25.0%    'riak@10.21.236.183'
valid      18.8%     25.0%    'riak@10.21.236.184'
-------------------------------------------------------------------------------
Valid:4 / Leaving:1 / Exiting:0 / Joining:0 / Down:0

After that, I shut down Riak with "riak stop" and marked the node down from one of the active nodes. My question is: what is the meaning of "Pending 0.0%"?
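For what it's worth, in `riak-admin member-status` the Ring column is each node's current share of the ring and Pending is its target share once the planned transfers finish, so "Pending 0.0%" on the leaving node just means it will own no partitions when handoff completes. As a quick illustration (sample rows copied from the second output above; the awk one-liner is only a sketch, not a riak tool), the Pending column should still sum to 100% across the nodes:

```shell
# Sample rows copied from the member-status output above.
cat > member-status.txt <<'EOF'
leaving    12.5%      0.0%    'riak@10.21.236.185'
valid      21.9%     25.0%    'riak@10.21.236.181'
valid      28.1%     25.0%    'riak@10.21.236.182'
valid      18.8%     25.0%    'riak@10.21.236.183'
valid      18.8%     25.0%    'riak@10.21.236.184'
EOF
# Sum the Pending column (3rd field); the target ring should total 100%.
awk '{gsub(/%/,"",$3); sum+=$3} END {print sum "%"}' member-status.txt
# prints: 100%
```

Against a live node the same check could be fed from `riak-admin member-status` output directly instead of a saved file.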
On the production cluster, the status is as below:

root@cluster1-hd12:/root/scripts# riak-admin transfers
'riak@10.21.136.94' waiting to handoff 5 partitions
'riak@10.21.136.93' waiting to handoff 5 partitions
'riak@10.21.136.92' waiting to handoff 5 partitions
'riak@10.21.136.91' waiting to handoff 5 partitions
'riak@10.21.136.86' waiting to handoff 5 partitions
'riak@10.21.136.81' waiting to handoff 2 partitions
'riak@10.21.136.76' waiting to handoff 3 partitions
'riak@10.21.136.71' waiting to handoff 5 partitions
'riak@10.21.136.66' waiting to handoff 5 partitions

And there are active transfers. On the development environment, there were no active transfers after I ran "riak-admin cluster commit". Can I follow the same steps on the production cluster as I did on the development environment?

On Wed, Aug 12, 2015 at 10:39 PM, Dmitri Zagidulin <dzagidu...@basho.com> wrote:
> Responses inline.
>
> On Tue, Aug 11, 2015 at 12:53 PM, changmao wang <wang.chang...@gmail.com> wrote:
>
>> 1. About backing up the four new nodes and then using 'riak-admin force-replace': what's the status of the newly added nodes?
>> As you know, we want to replace one of the leaving nodes.
>
> I don't understand the question. Doing 'riak-admin force-replace' on one of the nodes that's leaving should overwrite the leave request and tell it to change its node id / IP address. (If that doesn't work, stop the leaving node and do a 'riak-admin reip' command instead.)
>
>> 2. What's the risk of 'riak-admin force-remove' of 'riak@10.21.136.91' without a backup?
>> As you know, the node (riak@10.21.136.91) is currently a member of the cluster and holds almost 2.5TB of data, maybe 10 percent of the whole cluster.
>
> The only reason I asked about backup is because it sounded like you cleared the disk on it. If it currently has the data, then it'll be fine. Force-remove just changes the IP address, and doesn't delete the data or anything.
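Before repeating the development-cluster steps in production, it may help to watch the handoff backlog drain. A small sketch (the sample input is the `riak-admin transfers` output quoted at the top of this message; the awk line is only an illustration):

```shell
# The transfers output quoted above, saved as sample input.
cat > transfers.txt <<'EOF'
'riak@10.21.136.94' waiting to handoff 5 partitions
'riak@10.21.136.93' waiting to handoff 5 partitions
'riak@10.21.136.92' waiting to handoff 5 partitions
'riak@10.21.136.91' waiting to handoff 5 partitions
'riak@10.21.136.86' waiting to handoff 5 partitions
'riak@10.21.136.81' waiting to handoff 2 partitions
'riak@10.21.136.76' waiting to handoff 3 partitions
'riak@10.21.136.71' waiting to handoff 5 partitions
'riak@10.21.136.66' waiting to handoff 5 partitions
EOF
# Total the next-to-last field (the per-node waiting partition count).
awk '{sum += $(NF-1)} END {print sum, "partitions waiting"}' transfers.txt
# prints: 40 partitions waiting
```

In live use the input would come straight from `riak-admin transfers`; re-running it periodically shows whether the backlog is actually shrinking.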
>
> On Tue, Aug 11, 2015 at 7:32 PM, Dmitri Zagidulin <dzagidu...@basho.com> wrote:
>
>> 1. How to force leave "leaving" nodes without data loss?
>>
>> This depends on: did you back up the data directory of the 4 new nodes before you reformatted them?
>> If you backed them up (and then restored the data directory once you reformatted them), you can try:
>>
>> riak-admin force-replace 'riak@10.21.136.91' 'riak@<whatever your new ip address is for that node>'
>> (same for the other 3)
>>
>> If you did not back up those nodes, the only thing you can do is force them to leave, and then join the new ones. So, for each of the 4:
>>
>> riak-admin force-remove 'riak@10.21.136.91' 'riak@10.21.136.66'
>> (same for the other 3)
>>
>> In either case, after force-replacing or force-removing, you have to join the new nodes to the cluster before you commit:
>>
>> riak-admin join 'riak@<new node>' 'riak@10.21.136.66'
>> (same for the other 3)
>> and finally:
>> riak-admin cluster plan
>> riak-admin cluster commit
>>
>> As for the error: the reason you're seeing it is that the other nodes can't contact the 4 that are supposed to be leaving (since you wiped them).
>> The amount of time that has passed doesn't matter; the cluster will wait for those nodes to leave indefinitely unless you force-remove or force-replace.
>>
>> On Tue, Aug 11, 2015 at 1:32 AM, changmao wang <wang.chang...@gmail.com> wrote:
>>
>>> Hi Dmitri,
>>>
>>> For your question:
>>> 3) Re-formatted those four nodes and re-installed Riak. Here is where it gets tricky though. Several questions for you:
>>> - Did you attempt to re-join those 4 reinstalled nodes into the cluster? What was the output of the cluster join and cluster plan commands?
>>> - Did the IP address change after they were reformatted?
>>> If so, you probably need to use something like 'reip' at this point:
>>> http://docs.basho.com/riak/latest/ops/running/tools/riak-admin/#reip
>>>
>>> I did NOT try to re-join those 4 reinstalled nodes into the cluster. As you know, member-status shows they're "leaving", as below:
>>>
>>> riak-admin member-status
>>> ================================= Membership ==================================
>>> Status     Ring    Pending    Node
>>> -------------------------------------------------------------------------------
>>> leaving    10.9%     10.9%    'riak@10.21.136.91'
>>> leaving     9.4%     10.9%    'riak@10.21.136.92'
>>> leaving     7.8%     10.9%    'riak@10.21.136.93'
>>> leaving     7.8%     10.9%    'riak@10.21.136.94'
>>> valid      10.9%     10.9%    'riak@10.21.136.66'
>>> valid      10.9%     10.9%    'riak@10.21.136.71'
>>> valid      14.1%     10.9%    'riak@10.21.136.76'
>>> valid      17.2%     12.5%    'riak@10.21.136.81'
>>> valid      10.9%     10.9%    'riak@10.21.136.86'
>>> -------------------------------------------------------------------------------
>>> Valid:5 / Leaving:4 / Exiting:0 / Joining:0 / Down:0
>>>
>>> Two weeks have elapsed and 'riak-admin member-status' still shows the same result. I don't know at which step the ring hands off.
>>>
>>> I did not change the IP addresses of the four newly added nodes.
>>>
>>> My questions:
>>>
>>> 1. How to force leave the "leaving" nodes without data loss?
>>> 2. I have found some errors related to handoff of partitions in /etc/riak/log/errors.
>>> Details are as below:
>>>
>>> 2015-07-30 16:04:33.643 [error] <0.12872.15>@riak_core_handoff_sender:start_fold:262 ownership_transfer transfer of riak_kv_vnode from 'riak@10.21.136.76' 45671926166590716193865151022383844364247891968 to 'riak@10.21.136.93' 45671926166590716193865151022383844364247891968 failed because of enotconn
>>> 2015-07-30 16:04:33.643 [error] <0.197.0>@riak_core_handoff_manager:handle_info:289 An outbound handoff of partition riak_kv_vnode 45671926166590716193865151022383844364247891968 was terminated for reason: {shutdown,{error,enotconn}}
>>>
>>> I searched with Google and found related articles; however, there was no solution:
>>> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2014-October/016052.html
>>>
>>> On Mon, Aug 10, 2015 at 10:09 PM, Dmitri Zagidulin <dzagidu...@basho.com> wrote:
>>>
>>>> Hi Changmao,
>>>>
>>>> The state of the cluster can be determined by running 'riak-admin member-status' and 'riak-admin ring-status'.
>>>> If I understand the sequence of events, you:
>>>> 1) Joined four new nodes to the cluster. (Which crashed due to not enough disk space.)
>>>> 2) Removed them from the cluster via 'riak-admin cluster leave'. This is a "planned remove" command, and expects the nodes to gradually hand off their partitions (to transfer ownership) before actually leaving. So this is probably the main problem: the ring is stuck waiting for those nodes to properly hand off.
>>>> 3) Re-formatted those four nodes and re-installed Riak. Here is where it gets tricky though. Several questions for you:
>>>> - Did you attempt to re-join those 4 reinstalled nodes into the cluster? What was the output of the cluster join and cluster plan commands?
>>>> - Did the IP address change after they were reformatted?
>>>> If so, you probably need to use something like 'reip' at this point:
>>>> http://docs.basho.com/riak/latest/ops/running/tools/riak-admin/#reip
>>>>
>>>> The 'failed because of enotconn' error message is happening because the cluster is waiting to hand off partitions to .94, but cannot connect to it.
>>>>
>>>> Anyway, here's what I recommend. If you can lose the data, it's probably easier to format and reinstall the whole cluster.
>>>> If not, you can 'force-remove' those four nodes, one by one (see http://docs.basho.com/riak/latest/ops/running/cluster-admin/#force-remove)
>>>>
>>>> On Thu, Aug 6, 2015 at 11:55 PM, changmao wang <wang.chang...@gmail.com> wrote:
>>>>
>>>>> Dmitri,
>>>>>
>>>>> Thanks for your quick reply.
>>>>> My questions are as below:
>>>>> 1. What's the current status of the whole cluster? Is it doing data balancing?
>>>>> 2. There are many errors in one node's error log. How should I handle them?
>>>>>
>>>>> 2015-08-05 01:38:59.717 [error] <0.23000.298>@riak_core_handoff_sender:start_fold:262 ownership_transfer transfer of riak_kv_vnode from 'riak@10.21.136.81' 525227150915793236229449236757414210188850757632 to 'riak@10.21.136.94' 525227150915793236229449236757414210188850757632 failed because of enotconn
>>>>> 2015-08-05 01:38:59.718 [error] <0.195.0>@riak_core_handoff_manager:handle_info:289 An outbound handoff of partition riak_kv_vnode 525227150915793236229449236757414210188850757632 was terminated for reason: {shutdown,{error,enotconn}}
>>>>>
>>>>> During the last 5 days, there have been no changes in the "riak-admin member-status" output.
>>>>> 3. How can I accelerate the data balancing?
>>>>>
>>>>> On Fri, Aug 7, 2015 at 6:41 AM, Dmitri Zagidulin <dzagidu...@basho.com> wrote:
>>>>>
>>>>>> Ok, I think I understand so far. So what's the question?
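Dmitri's force-remove suggestion above lends itself to a small dry-run script. This sketch only echoes the commands so they can be reviewed before running anything for real; the node names are the four from this thread, the argument form copies his earlier message, and it's worth double-checking both against `riak-admin` help on your Riak version:

```shell
# Dry run: print the force-remove sequence for the four wiped nodes.
# 'riak@10.21.136.66' is a healthy node from the thread, used as the
# node the command is addressed to (per Dmitri's example).
claimant="riak@10.21.136.66"
for node in riak@10.21.136.91 riak@10.21.136.92 riak@10.21.136.93 riak@10.21.136.94; do
    echo "riak-admin force-remove '$node' '$claimant'"
done
# After force-removing (and joining any replacement nodes), plan and commit:
echo "riak-admin cluster plan"
echo "riak-admin cluster commit"
```

Dropping the `echo`s turns the dry run into the real sequence; keeping them makes it safe to paste and inspect first.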
>>>>>>
>>>>>> On Thursday, August 6, 2015, Changmao.Wang <changmao.w...@datayes.com> wrote:
>>>>>>
>>>>>>> Hi Riak users,
>>>>>>>
>>>>>>> Before adding the new nodes, the cluster had only five nodes. The member list is as below:
>>>>>>> 10.21.136.66, 10.21.136.71, 10.21.136.76, 10.21.136.81, 10.21.136.86.
>>>>>>> We did not set up an http proxy for the cluster; only one node of the cluster provides the http service, so the CPU load is always high on that node.
>>>>>>>
>>>>>>> After that, I added four nodes (10.21.136.[91-94]) to the cluster. During the ring/data balancing process, each node failed (riak stopped) because a disk was 100% full.
>>>>>>> I had used a multi-disk path for the "data_root" parameter in '/etc/riak/app.config'. Each disk is only 580MB in size.
>>>>>>> As you know, the bitcask storage engine does not support multiple data paths. After one of the disks is 100% full, it cannot switch to the next idle disk, so the "riak" service goes down.
>>>>>>>
>>>>>>> After that, I removed the four newly added nodes from the active nodes with "riak-admin cluster leave riak@'10.21.136.91'",
>>>>>>> then stopped the "riak" service on the other new nodes and reformatted them with LVM disk management (binding 6 disks into one volume group).
>>>>>>> I replaced the "data_root" parameter with a single folder and then started the "riak" service again. After that, the cluster began data balancing again.
>>>>>>> That's the whole story.
>>>>>>>
>>>>>>> Amao
>>>>>>>
>>>>>>> ------------------------------
>>>>>>> *From: *"Dmitri Zagidulin" <dzagidu...@basho.com>
>>>>>>> *To: *"Changmao.Wang" <changmao.w...@datayes.com>
>>>>>>> *Sent: *Thursday, August 6, 2015 10:46:59 PM
>>>>>>> *Subject: *Re: why leaving riak cluster so slowly and how to accelerate the speed
>>>>>>>
>>>>>>> Hi Amao,
>>>>>>>
>>>>>>> Can you explain a bit more which steps you've taken, and what the problem is?
>>>>>>>
>>>>>>> Which nodes have been added, and which nodes are leaving the cluster?
>>>>>>>
>>>>>>> On Tue, Jul 28, 2015 at 11:03 PM, Changmao.Wang <changmao.w...@datayes.com> wrote:
>>>>>>>
>>>>>>>> Hi Riak user group,
>>>>>>>>
>>>>>>>> I'm using riak and riak-cs 1.4.2. Last weekend, I added four nodes to a cluster of 5 nodes. However, it failed with one of the disks 100% full. As you know, the bitcask storage engine cannot support multiple folders.
>>>>>>>>
>>>>>>>> After that, I restarted "riak", had the nodes leave the cluster with "riak-admin cluster leave" and "riak-admin cluster plan", and then committed.
>>>>>>>> However, riak has been doing KV balancing ever since I submitted the leave command. I guess it's doing the join-cluster process.
>>>>>>>>
>>>>>>>> Could you show us how to accelerate the leaving process? I have tuned the "transfer-limit" parameter on the 9 nodes.
>>>>>>>>
>>>>>>>> Below is some command output:
>>>>>>>>
>>>>>>>> riak-admin member-status
>>>>>>>> ================================= Membership ==================================
>>>>>>>> Status     Ring    Pending    Node
>>>>>>>> -------------------------------------------------------------------------------
>>>>>>>> leaving     6.3%     10.9%    'riak@10.21.136.91'
>>>>>>>> leaving     9.4%     10.9%    'riak@10.21.136.92'
>>>>>>>> leaving     6.3%     10.9%    'riak@10.21.136.93'
>>>>>>>> leaving     6.3%     10.9%    'riak@10.21.136.94'
>>>>>>>> valid      10.9%     10.9%    'riak@10.21.136.66'
>>>>>>>> valid      12.5%     10.9%    'riak@10.21.136.71'
>>>>>>>> valid      18.8%     10.9%    'riak@10.21.136.76'
>>>>>>>> valid      18.8%     12.5%    'riak@10.21.136.81'
>>>>>>>> valid      10.9%     10.9%    'riak@10.21.136.86'
>>>>>>>>
>>>>>>>> riak-admin transfer_limit
>>>>>>>> =============================== Transfer Limit ================================
>>>>>>>> Limit    Node
>>>>>>>> -------------------------------------------------------------------------------
>>>>>>>>   200    'riak@10.21.136.66'
>>>>>>>>   200    'riak@10.21.136.71'
>>>>>>>>   100    'riak@10.21.136.76'
>>>>>>>>   100    'riak@10.21.136.81'
>>>>>>>>   200    'riak@10.21.136.86'
>>>>>>>>   500    'riak@10.21.136.91'
>>>>>>>>   500    'riak@10.21.136.92'
>>>>>>>>   500    'riak@10.21.136.93'
>>>>>>>>   500    'riak@10.21.136.94'
>>>>>>>>
>>>>>>>> Any more details needed for diagnosing the problem?
>>>>>>>>
>>>>>>>> Amao
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> riak-users mailing list
>>>>>>>> riak-users@lists.basho.com
>>>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>
>>>>> --
>>>>> Amao Wang
>>>>> Best & Regards
>>>
>>> --
>>> Amao Wang
>>> Best & Regards
>
> --
> Amao Wang
> Best & Regards

--
Amao Wang
Best & Regards
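For reference, the single-folder fix described in the thread (one LVM-backed volume instead of several small disks) corresponds to a single bitcask data_root in /etc/riak/app.config. A sketch of the relevant fragment; the mount point shown is an assumption, not taken from the thread:

{bitcask, [
    %% Bitcask supports only one data_root; point it at the large
    %% LVM volume rather than individual small disks.
    {data_root, "/data/riak/bitcask"}   %% hypothetical mount point
]},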
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com