Hello, guys,

It seems we have run into an emergency. I wonder if anyone can help with
it.

Everything described below happened because we were trying to rebalance
the space used by nodes that were running out of space.

The cluster is now 7 machines; member_status looks like this:
Attempting to restart script through sudo -u riak
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
valid      15.6%     20.3%    'riak@192.168.135.180'
valid       0.0%      0.0%    'riak@192.168.152.90'
valid       0.0%      0.0%    'riak@192.168.153.182'
valid      26.6%     23.4%    'riak@192.168.164.133'
valid      27.3%     21.1%    'riak@192.168.177.36'
valid       8.6%     15.6%    'riak@192.168.194.138'
valid      21.9%     19.5%    'riak@192.168.194.149'
-------------------------------------------------------------------------------
Valid:7 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

The 2 nodes with 0.0% of the ring were made to force-leave the cluster.
They still hold plenty of data, which now seems to be inaccessible.
Handoffs seem to be stuck. Node 'riak@192.168.152.90' (which is in the same
situation as 'riak@192.168.153.182') tries to hand off partitions to
'riak@192.168.164.133', but fails for an unknown reason after huge timeouts
(from 5 to 40 minutes). The partition it is trying to move is about 10 GB in
size. It grows slowly on the target node, but that is probably just the
usual writes from normal operation. It doesn't get any smaller on the source
node.

Is there any way to tell the cluster that we want those partitions to
actually stay on the source node, so that there's no need to transfer them?
How can we redo the cluster ownership balance? Can this force-leave be
reverted?

Thank you,
Leonid
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
