Hi John,

Assuming things aren't back to normal... a few things:
Attach to any running node and run this:

  rpc:multicall([node() | nodes()], riak_core_vnode_manager, force_handoffs, []).

This will attempt to force handoff. If this restarts handoff, you've got a new issue that we'll need to track down. Please report back if this gets handoffs running again.

Another possible fix: take a look at https://github.com/basho/riak_core/pull/153. This was fixed in 1.1, but it might be what's hitting you (though, admittedly, your issue does seem like a perfect match for the issue from the 1.0.2 release notes). If this is what's ailing you, there's a workaround here: https://github.com/basho/riak_core/pull/153#issuecomment-4527706

If neither of these works, let us know and we'll take a deeper look. Specifically:

a) any log files you could send along would be helpful
b) the output of the following diagnostic:

  f(Members).
  Members = riak_core_ring:all_members(element(2, riak_core_ring_manager:get_raw_ring())).
  [{N, rpc:call(N, riak_core_handoff_manager, status, [])} || N <- Members].

(A combined before/after version of these snippets is sketched after the quoted message below.)

Thanks, John.

Mark

On Sun, Jun 3, 2012 at 5:06 AM, John Axel Eriksson <j...@insane.se> wrote:
> Hi.
>
> We had an issue where one of the riak servers died (it had to be force
> removed from the cluster). After we did that, things got really bad and
> most data was unreachable for hours. At one point I also added a new node
> to replace the old one - it never got any data, and even now, about a day
> later, it still hasn't.
>
> What seems to be the issue now is that a few nodes are waiting on handoff
> of 1 partition. When I look at ring_status I see this:
>
> Attempting to restart script through sudo -u riak
> ================================== Claimant ===================================
> Claimant: 'riak@r-001.x.x.x'
> Status: up
> Ring Ready: true
>
> ============================== Ownership Handoff ==============================
> Owner: riak@r-004.x.x.x
> Next Owner: riak@r-003.x.x.x
>
> Index: 930565495644285842450002452081070828921550798848
> Waiting on: []
> Complete: [riak_kv_vnode,riak_pipe_vnode,riak_search_vnode]
>
> -------------------------------------------------------------------------------
>
> ============================== Unreachable Nodes ==============================
> All nodes are up and reachable
>
> Ok, so it looks like the problem described in the release notes for 1.0.2,
> here: https://github.com/basho/riak/blob/1.0.2-release/RELEASE-NOTES.org.
> Unfortunately I've run that code (through riak attach) with no result.
>
> It's been in this state for about 12 hours now. What can we do to fix our
> cluster?
>
> I upgraded to 1.0.3 hoping it would fix our problems, but that didn't
> help. I can't upgrade to 1.1.x because we mainly use Luwak for large-object
> support, and that's discontinued in 1.1.x as far as I know.
>
> Thanks for your help,
> John
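P.S. For convenience, here's a rough combined version of the two snippets above: it records each node's handoff status, forces handoffs everywhere, then records status again so you can compare. Treat it as an untested sketch; it assumes the same modules and calls as above, and it targets the ring members rather than nodes(), which should be equivalent when all nodes are connected.

  %% Paste into `riak attach` on any running node.
  f().  %% forget any shell bindings left over from earlier attempts
  {ok, Ring} = riak_core_ring_manager:get_raw_ring().
  Members = riak_core_ring:all_members(Ring).
  %% Handoff status on every member before forcing anything.
  Before = [{N, rpc:call(N, riak_core_handoff_manager, status, [])} || N <- Members].
  io:format("before: ~p~n", [Before]).
  %% Kick handoffs on all members (same call as step one above).
  rpc:multicall(Members, riak_core_vnode_manager, force_handoffs, []).
  %% Status again; diff against Before to see whether anything moved.
  After = [{N, rpc:call(N, riak_core_handoff_manager, status, [])} || N <- Members].
  io:format("after: ~p~n", [After]).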
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com